NIH/NLM R15 Project
A Computational Framework for Protein Identification and Quantification in Metaproteomics Using Data-Independent Acquisition
Project Number: 1R15LM013460-01
July 2020 - June 2024
Software
- CloudProteoAnalyzer
- SEMQuant
- Transformer-DIA
- FineFDR
- WinnowNet
- MS-feature
- IDIA
- DeepFilter
- MetaLP
CloudProteoAnalyzer is a cloud-based parallel computing application that offers end-to-end proteomics data analysis software as a service.
CloudProteoAnalyzer is available at https://sipros-oscer.unt.edu/
The source code of CloudProteoAnalyzer is available at https://github.com/Biocomputing-Research-Group/CloudProteoAnalyzer
SEMQuant is a protein quantification tool with match-between-runs for metaproteomics. It converts the proteins identified by the database searching tool, Sipros, into the format accepted by the quantification tool, IonQuant.
SEMQuant is available at https://github.com/Biocomputing-Research-Group/SEMQuant
Transformer-DIA is a transformer-based de novo peptide sequencing method for data-independent acquisition mass spectrometry.
Transformer-DIA is available at https://github.com/Biocomputing-Research-Group/Transformer-DIA
FineFDR is a fast and accurate tool for fine-grained, taxonomy-specific false discovery rate (FDR) assessment. It controls the FDR separately for peptides at different taxonomic ranks.
FineFDR is available at https://github.com/Biocomputing-Research-Group/FDR
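To illustrate what controlling the FDR separately per taxonomic rank can look like, here is a minimal sketch. It is not FineFDR's actual implementation: it assumes a standard target-decoy estimate (decoys / targets), and the function name, the tuple layout, and the example scores are all hypothetical.

```python
from collections import defaultdict


def groupwise_fdr_filter(psms, fdr_threshold=0.01):
    """Filter identifications with a separate FDR cutoff per group.

    psms: list of (score, is_decoy, group) tuples, where `group` is a
          hypothetical taxonomic-rank label such as "genus" or "species".
    Returns the accepted target (non-decoy) identifications.
    Assumption: FDR is estimated as decoys / targets (target-decoy approach).
    """
    by_group = defaultdict(list)
    for psm in psms:
        by_group[psm[2]].append(psm)

    accepted = []
    for items in by_group.values():
        # Rank from best to worst score within this group only.
        items.sort(key=lambda p: p[0], reverse=True)
        targets = decoys = 0
        best_cut = 0  # largest prefix whose estimated FDR is within the threshold
        for i, (_, is_decoy, _) in enumerate(items, start=1):
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            if targets > 0 and decoys / targets <= fdr_threshold:
                best_cut = i
        accepted.extend(p for p in items[:best_cut] if not p[1])
    return accepted


# Toy data (hypothetical): the "species" group is dominated by a decoy hit.
example = [
    (10.0, False, "genus"), (9.0, False, "genus"),
    (8.0, True, "genus"), (7.0, False, "genus"),
    (5.0, True, "species"), (4.0, False, "species"),
]
kept = groupwise_fdr_filter(example, fdr_threshold=0.5)
```

In this toy data, pooling all identifications would let the low-scoring "species" target pass at the same threshold, whereas the per-group cutoff rejects it. That is the motivation for assessing each rank on its own.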
WinnowNet is a filtering algorithm with a well-calibrated re-scoring function for peptide identifications in metaproteomics.
WinnowNet is available at https://github.com/Biocomputing-Research-Group/WinnowNet
MS-feature is a deep learning-based tool for feature detection in MS2 scans.
MS-feature is available at https://github.com/Biocomputing-Research-Group/MS-feature
IDIA is a tool to generate pseudo-spectra from data-independent acquisition (DIA) mass spectrometry-based proteomics data.
IDIA is available at https://github.com/Biocomputing-Research-Group/IDIA
DeepFilter is a metaproteomics-filtering tool based on a deep learning model. It is aimed at improving peptide identifications of microbial communities from a collection of tandem mass spectra.
DeepFilter is available at https://github.com/Biocomputing-Research-Group/DeepFilter
MetaLP is a protein inference algorithm for shotgun proteomics analysis of microbial communities. Its two key innovations are the integration of taxonomic abundances as prior information and the formulation of protein inference as a linear programming problem. It is optimized for metaproteomics to address degenerate peptides and "one-hit wonders." The objective of MetaLP is to produce substantially higher numbers of protein identifications in complex metaproteomics datasets.
MetaLP is available at https://github.com/Biocomputing-Research-Group/MetaLP
Publications
(The names of students who were under the PI's supervision are underlined; undergraduates are double-underlined. Corresponding authors are asterisked.)
- Li J, Xiong Y, Feng S, Pan C, Guo X*. CloudProteoAnalyzer: scalable processing of big data from proteomics using cloud computing. Bioinformatics Advances. 2024. vbae024.
- Ebrahimi S, Guo X*. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. In 2023 IEEE 23rd International Conference on Bioinformatics and Bioengineering (BIBE), Dayton, OH, USA, 2023, pp. 28-35. IEEE.
- Li J, Pan C, Guo X*. IDIA: An Integrative Signal Extractor for Data-Independent Acquisition Proteomics. In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2022 Dec 6 (pp. 266-269). IEEE. PMCID: PMC10077956
- Jain KG, Zhao R, Liu Y, Guo X, Yi G, Ji HL*. Wnt5a/β-catenin axis is involved in the downregulation of AT2 lineage by PAI-1. American Journal of Physiology-Lung Cellular and Molecular Physiology. 2022 Nov 1;323(5):L515-24. PMCID: PMC9602939
- He J, Liu O, Guo X*. Deep Learning Based MS2 Feature Detection for Data-Independent Shotgun Proteomics. In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2022 Dec 6 (pp. 2342-2348). IEEE. PMCID: PMC10457098
- Wang S, Feng S, Pan C, Guo X*. FineFDR: Fine-grained Taxonomy-specific False Discovery Rates Control in Metaproteomics. In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2022 Dec 6 (pp. 287-292). IEEE. PMCID: PMC9998077
- Feng S, Ji HL, Wang H, Zhang B, Sterzenbach R, Pan C, Guo X*. MetaLP: An integrative linear programming method for protein inference in metaproteomics. PLOS Computational Biology. 2022 Oct 21;18(10):e1010603. PMCID: PMC9629623
- Feng S, Sterzenbach R, Guo X*. Deep learning for peptide identification from metaproteomics datasets. Journal of Proteomics. 2021 Sep 15;247:104316. PMCID: PMC8435027
Research Team
- Dr. Xuan Guo (PI)
- Dr. Armin Milker (Co-I)
- Dr. Chongle Pan (Collaborator)
- Jiancheng Li (Graduate Research Assistant)
- Shichao Feng (Graduate Research Assistant)
- Bailu Zhang (Graduate Research Assistant)
- Shiva Ebrahimi (Graduate Research Assistant)
- Ryan Sterzenbach (Undergraduate Research Assistant)
- Gloire Eunice Benga Ikambouayat (Undergraduate Research Assistant)
- Shengze Wang (Undergraduate Research Assistant)
- Kulthum Lakha (Undergraduate Research Assistant)
- Jonathan He (TAMS student)
- Olivia Liu (TAMS student)
Research Opportunities
Undergraduate Research Fellowships are available for this project. This research program aims to develop novel computational tools for identifying and quantifying proteins in microbial mixtures. Undergraduate students are given opportunities to learn mass spectrometry techniques and applied programming, and to practice MS-based protein analysis. More details are available here.