
PTMClust - A Post-translational Modification Refinement Algorithm

Clement Chung1,2, Jian Liu3,4, Andrew Emili3,4 and Brendan J. Frey1,2,3,5
1Department of Computer Science, 2Probabilistic and Statistical Inference Group, 3Banting and Best Department of Medical Research, 4Donnelly Centre for Cellular and Biomolecular Research, 5Department of Electrical and Computer Engineering
University of Toronto, Toronto, Canada


Motivation: A post-translational modification (PTM) is a chemical modification of a protein that occurs naturally. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Consequently, PTM perturbations have been linked to diverse diseases such as Parkinson's, Alzheimer's, diabetes and cancer. To discover PTMs on a genome-wide scale, there has been a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called 'blind') PTM search methods have been reported. However, these approaches are subject to noise in mass measurements and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments.

Results: To address these issues, we devised a machine learning algorithm, PTMClust, that can be applied to the output of blind PTM search methods to improve prediction quality by suppressing noise in the data and clustering peptides with the same underlying modification to form PTM groups. We showed that our technique outperforms two standard clustering algorithms on a simulated dataset. We also showed that our algorithm significantly improves sensitivity and specificity when applied to the output of three different blind PTM search engines: SIMS, InsPecT and MODmap. In addition, PTMClust markedly outperforms another PTM refinement algorithm, PTMFinder. We demonstrate that our technique is able to reduce false PTM assignments, improve overall detection coverage and facilitate novel PTM discovery, including terminus modifications. We applied our technique to a large-scale yeast MS/MS proteome profiling dataset and found numerous known and novel PTMs. Accurately identifying modifications in protein sequences is a critical first step for PTM profiling, and thus our approach may benefit routine proteomic analysis.
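To make the idea of "PTM groups" concrete, the following is a minimal Python sketch of grouping blind-search hits whose observed modification masses agree within a tolerance, then reading off the most common modified residue per group. It is not the PTMClust algorithm and does not reproduce its model. The `Hit` record, the `group_by_mass` and `consensus_residue` helpers, the 0.5 Da tolerance and the example peptides are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Hit:
    """One blind-search result (hypothetical record layout)."""
    peptide: str       # peptide sequence
    mass_shift: float  # observed modification mass in Da
    position: int      # predicted modified residue, 0-based index


def group_by_mass(hits, tol=0.5):
    """Greedy 1-D grouping: walk hits in order of mass shift and start a
    new group when a hit is more than `tol` Da from the running group mean."""
    groups = []
    for hit in sorted(hits, key=lambda h: h.mass_shift):
        if groups and abs(hit.mass_shift - mean(h.mass_shift for h in groups[-1])) <= tol:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


def consensus_residue(group):
    """Most frequent amino acid at the predicted site within a group."""
    counts = {}
    for h in group:
        aa = h.peptide[h.position]
        counts[aa] = counts.get(aa, 0) + 1
    return max(counts, key=counts.get)


# Made-up hits: noisy shifts near 16 Da and near 80 Da.
hits = [
    Hit("AAASPEK", 79.97, 3),
    Hit("GLSPDR", 80.10, 2),
    Hit("TTSQLK", 79.80, 2),
    Hit("MKAAR", 15.99, 0),
    Hit("QMLLK", 16.05, 1),
]

groups = group_by_mass(hits)
for g in groups:
    print(round(mean(h.mass_shift for h in g), 2), consensus_residue(g), len(g))
```

A fixed tolerance and a majority vote are deliberately crude stand-ins here. The point is only that pooling hits with similar mass shifts lets noisy individual mass and site predictions be replaced by a group-level estimate.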

Availability and Implementation: PTMClust is implemented in Matlab and is freely available for academic use.

Contact: Prof. Brendan Frey

Supplementary Information

Download Latest Version

Click here for source code: Download Source Code (ver 1.1)

Click here for README: Download README File

Click here for the tool to create a PTMClust input file (read the accompanying README.txt file for instructions): Download Tool (ver 1.0)

Download Input Data Files (Output from SIMS (Liu et al. 2008) in Matlab format)

Click here for phosphopeptides (taken from Beausoleil et al. 2004): Download Matlab File

Click here for yeast protein complex dataset (taken from Krogan et al. 2006): Download Matlab File


C. Chung, J. Liu, A. Emili and B.J. Frey (2011) Computational Refinement of Post-translational Modifications Predicted from Tandem Mass Spectrometry. Bioinformatics 27 (6), pp. 797-806. Link to Publication, PDF
