Molecular Medicine Israel

Introducing untargeted data-independent acquisition for metaproteomics of complex microbial samples

Abstract

Mass spectrometry-based metaproteomics is a relatively new field of research that enables the characterization of the functionality of microbiota. Recently, we demonstrated the applicability of data-independent acquisition (DIA) mass spectrometry to the analysis of complex metaproteomic samples. This allowed us to circumvent many of the drawbacks of the previously used data-dependent acquisition (DDA) mass spectrometry, mainly the limited reproducibility when analyzing samples with complex microbial composition. However, the DDA-assisted DIA approach still required additional DDA data on the samples to assist the analysis. Here, we introduce, for the first time, an untargeted DIA metaproteomics tool that does not require any DDA data, but instead generates a pseudospectral library directly from the DIA data. This reduces the amount of required mass spectrometry data to a single DIA run per sample. The new DIA-only metaproteomics approach is implemented as a new open-source software package named glaDIAtor, including a modern web-based graphical user interface to facilitate wide use of the tool by the community.

Introduction

Microbiome profiling has attracted increasing attention in the past few years with the recognition of the important role of microbiota in human health and disease [1,2,3] and its potential major implications for disease prediction, prevention, and treatment. Metagenome sequencing remains the most common approach to studying the microbiome, with successful applications in various studies, including large multi-center studies of thousands of samples using either 16S rRNA or whole genome sequencing [4]. By cataloguing which microbes are present in a sample and their relative abundances, metagenomics can provide important information about the taxonomic composition of the microbial communities and predict their functional potential. A major limitation of the metagenome approach is, however, that it does not directly assess the function of the microbiota. To overcome this limitation, mass spectrometry-based metaproteome analysis has emerged as a compelling option.

Metaproteomics is a relatively new field of research that aims to characterize all proteins expressed by a community of microorganisms in a complex biological sample [5]. Its major promise lies in its ability to directly measure the functionality of microbiota, while the more widely used metagenomics captures only the taxonomic composition and functional potential. Therefore, metaproteomics has emerged as an intriguing option, for example, in the study of human gut microbiota functionality in various healthy and disease states [6,7].

To date, metaproteomics has typically involved data-dependent acquisition (DDA) mass spectrometry, which is, however, known to have limitations [8]. For example, only the most intense peptide ions are selected for fragmentation, which leaves the rest of the peptides unidentified, even though MS1-based methods still allow their quantification. This is particularly challenging for metaproteomics, where the vast number of peptides increases the chance of co-elution. The selection also introduces stochasticity to the identifications, reducing the overlap between repeated analyses. For this reason, DDA often requires multiple runs from the same sample to discover all obtainable peptides. Furthermore, the ion intensities are often not consistently recorded through the whole chromatographic profile, making quantification challenging [9].
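The effect of intensity-based selection on reproducibility can be pictured with a small sketch. It is a toy model only: the peptide names, intensity values, and the function `top_n_precursors` are hypothetical and do not come from the study or from any instrument software. It shows how a modest change in measured intensity between two replicate runs changes which precursors are picked for fragmentation.

```python
def top_n_precursors(intensities, n):
    """Return the set of peptide ids with the n highest MS1 intensities."""
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return set(ranked[:n])


# Hypothetical MS1 intensities for the same four peptides in two replicate runs.
run1 = {"PEP_A": 10.0, "PEP_B": 9.0, "PEP_C": 8.0, "PEP_D": 7.0}
run2 = {"PEP_A": 10.0, "PEP_B": 7.5, "PEP_C": 8.0, "PEP_D": 9.0}

# Assume the instrument fragments only the two most intense precursors.
selected1 = top_n_precursors(run1, n=2)
selected2 = top_n_precursors(run2, n=2)

# Only peptides selected in both runs are identified reproducibly.
shared = selected1 & selected2
print(sorted(selected1), sorted(selected2), sorted(shared))
```

In this example, all four peptides are present in both runs, yet only one of them is selected for fragmentation in both.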

Data-independent acquisition (DIA) mass spectrometry, which systematically fragments all precursor peptide ions, has been proposed as an alternative method that overcomes many of the drawbacks of DDA. However, the systematic fragmentation of the precursor peptide ions produces highly convoluted fragment spectra, making peptide identification a difficult task. This is especially pronounced for complex metaproteomic samples, where multiple precursor ions are more likely to elute simultaneously.
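A minimal sketch can show why systematic fragmentation leads to convoluted spectra. It assumes fixed-width precursor isolation windows, which is one common DIA scheme. The window width, the m/z values, and the function `assign_to_windows` are illustrative assumptions, not settings from the study.

```python
from collections import defaultdict


def assign_to_windows(precursor_mz, start, width):
    """Group precursors by the fixed-width isolation window that contains them.

    Assumes every m/z value is >= start; this is a toy model, not instrument code.
    """
    windows = defaultdict(list)
    for name, mz in precursor_mz.items():
        index = int((mz - start) // width)
        lower = start + index * width
        windows[(lower, lower + width)].append(name)
    return dict(windows)


# Hypothetical co-eluting precursors and their m/z values.
precursors = {"PEP_A": 402.3, "PEP_B": 418.9, "PEP_C": 431.0, "PEP_D": 447.2}

# Every precursor that falls in the same window is fragmented together,
# so the resulting fragment spectrum is a mixture of all of them.
windows = assign_to_windows(precursors, start=400.0, width=25.0)
for bounds, names in windows.items():
    print(bounds, names)
```

Each window here contains two precursors, so each fragment spectrum mixes ions from two peptides. In a complex metaproteomic sample the number of co-fragmented precursors per window would typically be far larger.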

Recently, we were the first to demonstrate that DIA mass spectrometry can be successfully applied to analyze complex metaproteomic samples by using a spectral library constructed from corresponding DDA data to assist the peptide identification [10]. While such a DDA-assisted DIA method requires the peptides to be previously discovered through DDA, it allows reproducible identification and quantification of the detected peptides across the samples [11]. However, the requirement for a DDA-based spectral library can be considered a major drawback of the method. Creating the DDA-based spectral library consumes sample material, may not represent the content of all the samples well and, most importantly, brings the DDA-related limitations of peptide identification to DIA, as only peptides present in the library can be detected from the DIA data.

To this end, we introduce here untargeted analysis of DIA metaproteomics data without the need for any DDA data. To solve the problem of convoluted DIA spectra, we generate a pseudospectral library directly from the DIA data. This is done using the DIA-Umpire algorithm to deconvolve the DIA spectra into DDA-like pseudospectra, each consisting of a precursor and its fragments, which can then be used for peptide identification with conventional protein database searches [12]. A similar approach has not been used in complex metaproteomics studies before. Using a laboratory-assembled microbial mixture and human fecal samples, we demonstrate that our DIA-only metaproteomic approach overcomes the limitations of the DDA-assisted DIA approach and reduces the number of required mass spectrometry analyses to a single DIA analysis per sample.
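The general idea behind building a pseudospectrum can be sketched as grouping fragment signals with a precursor whose elution profile they follow. The code below is a toy illustration under that assumption. It is not the DIA-Umpire implementation, which involves considerably more than this. The intensity traces, the fragment m/z values, the 0.9 threshold, and the function names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trace carries no elution-shape information
    return cov / math.sqrt(var_x * var_y)


def build_pseudospectrum(precursor_trace, fragment_traces, min_corr=0.9):
    """Return the sorted fragment m/z values whose elution profile tracks the precursor's."""
    return sorted(
        mz
        for mz, trace in fragment_traces.items()
        if pearson(precursor_trace, trace) >= min_corr
    )


# Hypothetical intensities sampled over six consecutive retention-time points.
precursor = [0, 2, 8, 10, 7, 1]
fragments = {
    175.12: [0, 1, 4, 5, 3.5, 0.5],  # same shape as the precursor, half the height
    262.15: [0, 3, 7, 10, 8, 1],     # closely follows the precursor
    301.20: [9, 6, 2, 1, 0, 0],      # elutes earlier, likely from another precursor
}

pseudospectrum = build_pseudospectrum(precursor, fragments)
print(pseudospectrum)
```

The fragments retained for a precursor form a DDA-like spectrum that a conventional database search engine can process, while the fragment with a different elution profile is left for another precursor.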

The new DIA-only metaproteomics approach is implemented as a new open-source software package named glaDIAtor. It contains two different interfaces to facilitate wide use of the tool by the community. The easy-to-use graphical user interface (GUI) is suited to users without an extensive bioinformatics background, whereas the command line interface is more suited to high-performance computing (HPC) cluster usage and other scripted use cases. To provide a modern graphical web user interface, glaDIAtor utilizes the Pyramid and Vue frameworks. The primary intended method for its deployment is a server installation, where it can be accessed from multiple workstations by using web browsers, such as Firefox or Chrome. Alternatively, glaDIAtor can be deployed to a workstation where the web service is visible only to the local machine. Using the command line interface, glaDIAtor can be deployed on HPC clusters under workload managers such as SLURM.

Results

The glaDIAtor software package implements a complete data analysis workflow for DIA metaproteomics from raw mass spectrometry files to peptide quantification and taxonomic/functional annotation (Fig. 1). It is implemented using container technology, which provides all the required utilities and libraries in a single package enabling easy installation on multiple different platforms [13,14], including support for server and workstation deployments. To enable broad adoption of the tool, we provide both a modern web-based graphical user interface as well as a command line interface that enables HPC cluster usage…
