This talk reviews some of the key statistical ideas encountered when one tries to find support for causal conclusions by applying statistical methods to observational or experimental longitudinal data. In such data, a collection of individuals is followed over time, possibly registering for each one a sequence of covariate measurements along with values of variables that will be interpreted as causes, and finally reporting the individual outcomes or responses. Particular attention is given to the important problem of potential confounding and to the conditions under which causal effects can be estimated from observational data by statistical methods. Our approach to this inferential problem is Bayesian and uses predictive distributions as summary measures of the causal effects. We draw connections to relevant recent work in this area, notably Judea Pearl's formulations based on graphical models and his calculus of *do*-probabilities. The talk is largely based on collaborative work with Jan Parner.

A CDC panel to help design experiments on anthrax vaccine was convened prior to September 11 because of concerns about the current vaccine, such as how it should be delivered to minimize adverse side effects. This question was considered especially important because of fears of the possible use of anthrax as a biological weapon. The two months since have proved these fears to be less hypothetical than we all would have wished. This presentation will discuss critical design and analysis issues in combining proposed randomized experiments with human volunteers and others with macaques, where in the former only surrogate outcomes (e.g., antibody levels) will be available, whereas in the latter both surrogate outcomes and actual survival outcomes will be available. The proper "bridging" analyses of such studies involve statistical methods that go beyond standard formulations and rely on the concept of "principal" stratification (Frangakis and Rubin, 2002).

During the last 40 years there has been considerable development in the theory of random matrices, both in mathematical physics and in statistics. The subject has connections with many parts of mathematics. The study of the spectra of random matrices gives rise to interesting probability distributions that are not so well known in the probability literature. I will give a survey of the field and describe some recent developments.

An (edge) reinforced random walk on a connected graph G moves at each step from a vertex to one of its neighbors with a probability proportional to a weight attached to the edge. The weight of the edge may be changed every time the walk traverses this edge. The sequence of positions of the walker is no longer Markovian, but the sequence of positions plus the weights of all edges is Markovian.

We will survey some of the results in the literature and discuss a recent result with R. Durrett and V. Limic. This result concerns a so-called once-reinforced walk. In this walk one starts with weight 1 for each edge. When an edge is traversed for the first time, its weight is increased to 1 + C, for some fixed C > 0. After that the weight of the edge stays at 1 + C and is never changed again. The result is that such a walk on a regular tree is transient. We shall also discuss a strong law of large numbers and a central limit theorem for the height of such a walk on a tree.
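The update rule described above is simple enough to simulate directly. The following sketch runs a once-reinforced walk on a finite rooted binary tree (a toy stand-in for the regular tree of the talk); all parameter names and the choice of tree depth are our own, and the finite depth is purely for illustration.

```python
import random

def once_reinforced_walk(depth=3, c=2.0, steps=200, seed=0):
    """Once-reinforced random walk on a rooted binary tree of given depth.

    Vertices are tuples of 0/1 labels (the root is the empty tuple).
    Every edge starts with weight 1; the first time an edge is traversed
    its weight jumps to 1 + c and is never changed again.
    """
    rng = random.Random(seed)
    weights = {}            # edge (frozenset of endpoints) -> weight; absent means 1
    pos = ()                # start at the root
    heights = []
    for _ in range(steps):
        # neighbors: parent (if not at the root) and children (if above max depth)
        nbrs = []
        if pos:
            nbrs.append(pos[:-1])
        if len(pos) < depth:
            nbrs.extend([pos + (0,), pos + (1,)])
        # move to a neighbor with probability proportional to the edge weight
        ws = [weights.get(frozenset([pos, v]), 1.0) for v in nbrs]
        nxt = rng.choices(nbrs, weights=ws, k=1)[0]
        edge = frozenset([pos, nxt])
        if edge not in weights:     # first traversal: reinforce once, then freeze
            weights[edge] = 1.0 + c
        pos = nxt
        heights.append(len(pos))
    return heights, weights

heights, weights = once_reinforced_walk()
```

Tracking `heights` corresponds to the quantity for which the abstract announces a law of large numbers and a central limit theorem.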

This talk surveys recent results on the large-scale behavior of certain interacting particle systems and interface models with drift, such as exclusion-type systems and Hammersley's process. The basic result is a law of large numbers, known as a hydrodynamic limit, which shows that when time and space are suitably scaled, the random evolution converges to a deterministic solution of a partial differential equation. In one dimension, fluctuations from such a limit are related to current work on shape fluctuations and random matrices.
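As a concrete instance of an exclusion-type system, the sketch below simulates the totally asymmetric simple exclusion process (TASEP) on a ring with random-sequential updates: a particle jumps one site clockwise only when the target site is empty. This is a toy discretization for illustration only; the lattice size, number of updates, and variable names are our own.

```python
import random

def tasep_step(occ, rng):
    """One random-sequential update of TASEP on a ring.

    Pick a site uniformly at random; if it holds a particle and the
    clockwise neighbor is empty, the particle jumps. Returns 1 when a
    jump occurred (one unit of current across that bond), else 0.
    """
    n = len(occ)
    i = rng.randrange(n)
    j = (i + 1) % n
    if occ[i] == 1 and occ[j] == 0:
        occ[i], occ[j] = 0, 1
        return 1
    return 0

rng = random.Random(1)
n = 100
occ = [1 if i < n // 2 else 0 for i in range(n)]  # a block of particles, density 1/2
current = sum(tasep_step(occ, rng) for _ in range(20000))
```

The particle number is conserved step by step; under the hydrodynamic scaling discussed in the talk, the empirical density profile of such a system converges to a solution of a deterministic conservation-law PDE.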

Cross-species gene finding is based on the observation that conserved regions between related organisms are more likely than divergent regions to be coding. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons.

In this talk we present a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and the alignment of two syntenic genomic regions. Our model is a generalized pair hidden Markov model, a hybrid of generalized hidden Markov models, which have been used previously for gene finding, and pair hidden Markov models, which have applications to sequence alignment. The theory is implemented in gene-finding and alignment software called SLAM, which aligns and identifies complete exon/intron structures in two related, but unannotated, sequences of DNA.

Molecular evolution has fundamental stochastic and combinatorial components. The study of the molecular evolution of genes and gene order demands a diverse range of combinatorial optimization, probabilistic modeling, and problem-solving techniques. This talk will present an overview of the most important models and most challenging problems, with a focus on microbial genomics.

Statistical methods for the localization of genes along the chromosomes were introduced as early as fifty years ago. However, with the availability of many genetic markers and large pedigrees, new methods have been developed. The goal is to map the disease genes that increase susceptibility to complex disorders. Often, the genetic component is the result of the interaction of many genes with small individual effects. Therefore, the statistical task is difficult and challenging. In the talk, new results regarding disease locus estimation in linkage analysis will be presented. The main working tools are:

1) To describe the inheritance pattern of each pedigree as a Markov process, defined on the space of binary vectors of a certain length (so called inheritance vectors).

2) To use argmax theory of stochastic processes to determine the asymptotic distribution of the disease locus estimator as the size of the data set (i.e. the number of pedigrees) tends to infinity.

The rate of convergence is, under certain assumptions, superefficient, and the limiting distribution is the argmax of a certain compound Poisson process. Various applications of the theoretical results will be presented.
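To illustrate the kind of limit object involved, the sketch below simulates the argmax of a generic compound Poisson process with negative linear drift. This is an illustration only: the jump distribution, drift, and horizon here are made up and are not those of the specific limit process arising in the linkage-analysis setting of the talk.

```python
import random

def compound_poisson_argmax(rate=1.0, drift=0.5, horizon=50.0, seed=0):
    """Locate the argmax of X(t) = (sum of i.i.d. jumps up to time t) - drift*t
    on [0, horizon].

    Since X decreases linearly between jumps, the maximum is attained either
    at t = 0 or immediately after some jump, so it suffices to check jump times.
    """
    rng = random.Random(seed)
    t, x = 0.0, 0.0
    best_t, best_x = 0.0, 0.0          # X(0) = 0
    while True:
        t += rng.expovariate(rate)      # next Poisson event time
        if t >= horizon:
            break
        x += rng.gauss(0.0, 1.0)        # i.i.d. jump (made-up jump law)
        val = x - drift * t             # process value just after the jump
        if val > best_x:
            best_t, best_x = t, val
    return best_t, best_x

t_star, x_star = compound_poisson_argmax()
```

In the talk's setting, the distribution of such an argmax describes the asymptotic behavior of the disease-locus estimator as the number of pedigrees grows.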

Gaussian random fields are the most common spatial models. They are either used for modeling the observed data directly or as building blocks in hierarchical models. Recently, there has been a lot of interest in extending these models to spatio-temporal processes. In this talk I will present two examples, in meteorology and spatial epidemiology. One of the main difficulties in modeling space-time phenomena lies in specifying the space-time covariance structure. These structures are mostly either isotropic in all dimensions, making no distinction between time and space, or separable in time and space. Neither assumption is always plausible. In this talk, we compare two different approaches. The first is similar to strategies used for spatial processes, in that non-separable parametric forms for the covariance functions are defined directly. The second strategy is to introduce latent structures that generate space-time correlation, and to assume that the residual noise has a very simple coloring. We compare these two strategies with respect to interpretability, inference, and computational difficulties. Whenever two different modeling strategies lead to the same model for the data, the following question arises: in practice, can we consider only one of the two approaches (i.e., covariance or latent-variable modeling)? Our findings suggest that each of the two modeling strategies can lead to models that would not likely be proposed when following the other approach.
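The separability assumption mentioned above means the covariance factorizes as a product of a purely spatial and a purely temporal term, so the full covariance matrix over a space-time grid is a Kronecker product. The sketch below builds such a matrix with squared-exponential factors; the kernel choice, grid, and length-scale names are our own, purely for illustration.

```python
import numpy as np

def separable_cov(sites, times, ls=1.0, lt=1.0):
    """Separable space-time covariance on a grid:
    C((s, t), (s', t')) = Cs(|s - s'|) * Ct(|t - t'|),
    here with squared-exponential factors of length scales ls and lt.
    The full matrix over the grid is the Kronecker product Ct (x) Cs."""
    ds = np.abs(sites[:, None] - sites[None, :])   # pairwise spatial distances
    dt = np.abs(times[:, None] - times[None, :])   # pairwise temporal distances
    Cs = np.exp(-(ds / ls) ** 2)
    Ct = np.exp(-(dt / lt) ** 2)
    return np.kron(Ct, Cs)   # covariance over the time-by-space grid

sites = np.linspace(0.0, 2.0, 4)
times = np.linspace(0.0, 1.0, 3)
C = separable_cov(sites, times)
```

A non-separable parametric form, by contrast, cannot be written as such a product, which is precisely what makes the first modeling strategy in the talk more flexible but harder to specify.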

In this talk I will discuss Gaussian Markov random fields (GMRFs) and their use in spatial statistics. This class of models is attractive for several reasons: the conditional independence structure allows very efficient computation of statistical tasks (unconditional and conditional sampling, and computation of normalizing constants) using numerical algorithms for sparse matrices; GMRFs can be used as proxies for commonly specified Gaussian fields; GMRFs can serve as the basis for constructing computationally efficient non-Gaussian approximations to hidden GMRFs; and efficient block-sampling MCMC algorithms can be constructed when GMRFs are involved. Some examples will be presented.
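The computational point above rests on the precision (inverse covariance) matrix of a GMRF being sparse, so sampling reduces to a Cholesky factorization of a sparse matrix followed by a triangular solve. The sketch below does this for a simple GMRF on a line; for clarity it uses a dense Cholesky on a small problem, whereas in practice one would exploit the sparsity with a sparse Cholesky solver. The precision structure and parameter names here are our own illustrative choices.

```python
import numpy as np

def sample_gmrf(n=50, kappa=0.1, seed=0):
    """Draw x ~ N(0, Q^{-1}) for a GMRF on a line with sparse precision
    Q = kappa*I + R, where R is the tridiagonal second-difference
    (first-order random walk) structure matrix.

    With Q = L L^T, solving L^T x = z for z ~ N(0, I) gives
    Cov(x) = L^{-T} L^{-1} = Q^{-1}, as required.
    """
    rng = np.random.default_rng(seed)
    # tridiagonal structure matrix of a first-order random walk
    R = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    R[0, 0] = R[-1, -1] = 1.0
    Q = kappa * np.eye(n) + R
    L = np.linalg.cholesky(Q)
    z = rng.standard_normal(n)
    x = np.linalg.solve(L.T, z)
    return x, Q

x, Q = sample_gmrf()
```

The tridiagonal band of Q is exactly the conditional independence structure of the field: each site interacts only with its two neighbors, which is what the sparse-matrix algorithms mentioned in the abstract exploit.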

Latest update Apr 8 2002, Per Hallberg <perh@math.kth.se>