Efficient sampling of probability distributions over large discrete spaces is a challenging problem that arises in many contexts in bioinformatics and ecology. For example, segmentation of genomes to identify putative functional elements can be cast as a multiple change-point problem involving thousands or even millions of change-points. Another example involves reconstructing the invasion history of an introduced species by embedding a phylogenetic tree in a landscape. A third example involves inferring networks of molecular interactions in cellular systems.

In this talk I describe a generalisation of the Gibbs sampler that allows this well known strategy for sampling probability distributions in R^n to be adapted for sampling discrete spaces. The technique has been successfully applied to each of the problems mentioned above. However, these problems remain highly computationally intensive. I will discuss a number of alternatives for efficient sampling of such spaces, and will be seeking collaborations to develop these and other new approaches.

BACK