What is WASP?

The Website for Alternative Splicing Prediction (WASP) is a web application that predicts whether an exon is alternatively spliced and, if so, how its splicing depends on cellular conditions such as tissue type. Users can click on the above tabs to view curated exons that were also profiled using microarray data, or to view de novo predictions for additional exons for which experimental data are not available. The application also maps putative regulatory elements in the primary transcript sequence near regulated exons. It can be used to scan specific genes or exons of interest (e.g., an exon that has been linked to a specific phenotype) for regulatory elements, and to perform genome-wide scans to identify exons with common regulatory patterns (e.g., exons regulated in embryonic tissues).

How does it work?

The research community has long sought a "splicing code" that predicts how a primary transcript will be spliced under different conditions and that can account for biological mechanisms. WASP makes use of such a splicing code, which was assembled using machine learning techniques to sift through thousands of plausible biological features and identify those that, in combination, account for high-throughput expression data. The features that make up the regulatory code were derived from an extensive literature search and a de novo (literature-unbiased) motif search. They include previously described motifs, short motif counts (1-3 nt frequencies), novel motifs (up to 10 nt long), and features describing transcript structure, such as exon and intron lengths, whether inclusion or exclusion of an exon introduces a premature termination codon (PTC), regional probabilities of secondary structures, and inter-species conservation. The assembled code can be used to predict splicing changes and regulatory sequences, and it also provides a framework for testing, comparing and visualizing biological mechanisms.
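
To make the "short motif counts (1-3 nt frequencies)" feature type more concrete, here is a minimal Python sketch of how such frequencies could be computed for a stretch of sequence. It is an illustration only, not WASP's actual feature extractor: the function name kmer_frequencies, its parameters, and the choice to normalize by the number of valid windows are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k_values=(1, 2, 3), alphabet="ACGU"):
    """Return the relative frequency of every k-mer (k in k_values) in seq.

    Hypothetical helper for illustration; not taken from WASP itself.
    DNA input is converted to RNA (T -> U). Windows containing characters
    outside the alphabet (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    valid = set(alphabet)
    features = {}
    for k in k_values:
        # Collect all overlapping windows of length k made of valid letters.
        windows = [
            seq[i:i + k]
            for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= valid
        ]
        counts = Counter(windows)
        total = len(windows)
        # Emit one feature per possible k-mer so the vector has fixed length.
        for kmer in map("".join, product(alphabet, repeat=k)):
            features[kmer] = counts[kmer] / total if total else 0.0
    return features


if __name__ == "__main__":
    # Made-up intronic fragment, used only to exercise the function.
    freqs = kmer_frequencies("GUAAGUUCUGCUUUCAG")
    print(len(freqs), "features; frequency of 'U':", round(freqs["U"], 3))
```

Emitting a value for every possible 1-3 nt motif, including those with zero count, gives each region a feature vector of the same length (4 + 16 + 64 = 84 entries), which is the form a learning algorithm would typically expect.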

When a query is submitted to the website in the form of an exon sequence or genomic coordinates, features are extracted from the primary transcript sequence proximal to that exon, and the algorithm uses those features to predict whether the exon will exhibit increased inclusion or increased exclusion in each of several tissue types (see illustration below). If the algorithm has high confidence in the prediction, the website will output a corresponding high-quality PDF image of the "feature map" for the introns flanking the query exon, which can be used to identify functional elements of interest (and can conveniently be included as a figure in a paper). For ease of use with other user data, the website also creates custom tracks that can be loaded into the UCSC Genome Browser with the click of a button.
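
For readers who want to combine such tracks with their own data, the following Python sketch shows what a UCSC Genome Browser custom track in BED format looks like in general. It is a sketch under stated assumptions: the source does not say which track format WASP emits, the function bed_custom_track is hypothetical, and the coordinates and element names are invented for illustration.

```python
def bed_custom_track(track_name, description, elements):
    """Build the text of a UCSC custom track in six-column BED format.

    Hypothetical helper for illustration; not WASP's own export code.
    Each element is (chrom, start, end, label, score, strand), with
    0-based, half-open coordinates and a score from 0 to 1000.
    """
    # The first line is the track definition line read by the browser.
    lines = [
        f'track name="{track_name}" description="{description}" visibility=2'
    ]
    for chrom, start, end, label, score, strand in elements:
        fields = (chrom, start, end, label, score, strand)
        lines.append("\t".join(str(field) for field in fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented coordinates and labels, used only to show the layout.
    example_elements = [
        ("chr1", 1000, 1008, "motif_upstream", 800, "+"),
        ("chr1", 1250, 1257, "motif_downstream", 650, "+"),
    ]
    print(bed_custom_track("example_features", "Example regulatory elements",
                           example_elements), end="")
```

The resulting text can be pasted into the browser's custom track form or saved to a file and uploaded there.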

For more information on the algorithm and derived code, please see:

Yoseph Barash, John A. Calarco, Weijun Gao, Qun Pan, Xinchen Wang, Ofer Shai, Benjamin J. Blencowe, and Brendan J. Frey. Deciphering the Splicing Code. Nature, 465(7294), May 6, 2010.

For more information on the website, please see the FAQ section.