The Frey Lab develops techniques that use large-scale datasets to derive predictive models of how genes and many other genomic features act in combination to produce genetic messages that control cellular activities. Most recently, we have focused on how organisms use alternative splicing to generate a tremendous level of biological complexity that cannot be explained by gene expression alone (Nature, 2010). The group is led by Brendan J. Frey, who has appointments in Engineering and Medicine. If you are interested in joining the group, click here.
Current Research Highlight
Predicting Tissue-regulated Alternative Splicing
Alternative splicing is a major contributor to cellular diversity in mammalian tissues and relates to many human diseases. An important goal in understanding this phenomenon is to infer a genetic splicing code that predicts how splicing is regulated in different cell types by features derived from RNA, DNA and epigenetic modifiers.
New results:
Following our initial draft of the mouse splicing code (Nature 2010), we developed a new computational method for inferring the code. We formulated the assembly of a splicing code as a problem of statistical inference and introduced a Bayesian method that uses an adaptively selected number of hidden variables to combine subgroups of features into a network, allows different tissues to share feature subgroups, and uses a Gibbs sampler to hedge predictions and ascertain the statistical significance of identified features. Using data for 3665 cassette exons, 1014 RNA features and 4 tissue types derived from 27 mouse tissues (data available here), we benchmarked several methods. The Bayesian method outperforms all others, achieving relative improvements of 52% in splicing code quality and up to 22% in classification error compared with the state of the art. The method also identified novel combinations of regulatory features and novel combinations of tissues that share feature subgroups.
The first figure illustrates how the Bayesian method combines predictions from many models (over 100 million in our experiments) by weighting each model according to its posterior probability. The second figure compares the Bayesian method with other techniques, including a carefully tuned support vector machine (SVM) and the original boosted decision stump method.
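The idea of combining many models by posterior weight can be illustrated with a minimal sketch. This is not the lab's implementation: the function names, the three toy models and their scores are all hypothetical, chosen only to show how each model's predicted class probabilities are averaged in proportion to its posterior probability.

```python
import math


def posterior_weights(log_scores):
    """Turn unnormalized log posterior scores into weights that sum to 1.

    Subtracting the maximum before exponentiating avoids numerical overflow.
    """
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bayesian_average(model_predictions, log_scores):
    """Average per-model class probabilities, weighted by posterior probability.

    model_predictions: one list of class probabilities per model.
    log_scores: one unnormalized log posterior score per model.
    """
    weights = posterior_weights(log_scores)
    n_classes = len(model_predictions[0])
    combined = [0.0] * n_classes
    for weight, prediction in zip(weights, model_predictions):
        for k, prob in enumerate(prediction):
            combined[k] += weight * prob
    return combined


# Hypothetical example: three models, each predicting probabilities
# over three outcome classes for a single exon in a single tissue.
predictions = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
]
log_scores = [-10.0, -11.0, -15.0]  # made-up values for illustration

combined = bayesian_average(predictions, log_scores)
print(combined)
```

When the models are drawn by a sampler such as a Gibbs sampler, each draw is typically given equal weight, because the frequency with which a model is sampled already reflects its posterior probability. In the sketch above, that corresponds to passing identical scores for every model.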
Reference
Hui Y Xiong*, Yoseph Barash*, Brendan J Frey. Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context. Bioinformatics 27 (18), pp. 2554-2562, September 2011. [pdf file].
* Joint first authors.
