Mass spectrometry offers powerful methods for protein analysis, but their throughput has remained limited. These limitations are particularly apparent, for example, in applications of proteomics to low sample amounts, where coverage is limited in depth and throughput. Accordingly, there has remained a need for methods capable of providing improved throughput and achieving other improvements over existing methods.
The present invention relates to improved methods and systems for mass spectrometric analysis.
In some aspects, the invention relates to an experimental and computational framework for simultaneously multiplexing the analysis of both peptides and samples (“plexDIA”).
In some embodiments, the throughput of analysis is increased by analyzing multiple peptides simultaneously, as afforded by data independent analysis (DIA) and by analyzing multiple samples simultaneously, as afforded by labeling methods. In some embodiments, the throughput increases multiplicatively with the number of labels. In some embodiments, the multiplicative increase in the throughput is achieved while also preserving the depth of coverage (number of quantified proteins per sample) and the quantitative accuracy of label-free approaches.
In some embodiments, the invention relates to a method of analyzing a plurality of samples, each sample comprising peptides, the method comprising: (a) for each of the plurality of samples, labeling the peptides in the sample with a mass tag unique to that sample to form respective sets of labeled peptides; (b) pooling the sets of labeled peptides to form a mixture; (c) in a first mass spectrometer having a resolution of between about 70,000 and 512,000, generating labeled precursor ions corresponding to the labeled peptides in the mixture and creating a first mass spectrum; (d) selecting a range of mass-to-charge ratios from the first mass spectrum, the selected range being a mass selection window; (e) fragmenting the labeled precursor ions within the mass selection window to generate fragment ions; and (f) in a second mass spectrometer, the second mass spectrometer being in tandem with the first mass spectrometer, analyzing the fragment ions simultaneously by data independent analysis. Optionally, and in addition in some embodiments: (1) the mass tags are nonisobaric and isotopologous; (2) the mass tags are amine-specific and stable-isotope-labeled; (3) the mass tag unique to each sample differs in mass from each of the other mass tags unique to each of the other samples by at least about 30 mDa; (4) the plurality of samples is greater than 3 samples; (5) at least one of the plurality of samples comprises enzymatically-digested proteins; (6) the plurality of the peptides in at least one of the plurality of samples has a combined mass of less than about 100 μg; (7) the method further comprises identifying at least one peptide based on the data independent analysis; (8) the method further comprises obtaining a relative quantification of labeled test peptides based on the data independent analysis; (9) at least one of the first mass spectrometer and the second mass spectrometer comprises a quadrupole mass analyzer, a time of flight mass analyzer, an orbitrap mass analyzer, an electrostatic sector mass analyzer, a quadrupole ion trap mass analyzer, or an ion cyclotron resonance analyzer; (10) the identified peptide of interest is a post-translationally modified test peptide, e.g., a peptide modified by phosphorylation, acetylation, ubiquitination, O-glycosylation, N-glycosylation, sumoylation, methylation, or combinations thereof; (11) the identified peptide of interest has at least 100 post-translational modifications; (12) each of the plurality of test samples is obtained from a human; or (13) any combination of one or more of the foregoing.
In some embodiments, the invention relates to a method of determining an efficacy of a pharmaceutical compound comprising: performing the method of claim 1, wherein: a first of the plurality of samples is from a subject who has been administered the pharmaceutical compound; a second of the plurality of samples is from a subject who has not been administered the pharmaceutical compound; for each of the first and the second of the plurality of samples, determining a concentration of a peptide of interest; comparing the determined concentrations of the peptide of interest; and, based at least in part on the determined concentrations, determining the efficacy of the pharmaceutical compound.
In some embodiments, the invention relates to a method of analyzing a plurality of samples, each sample comprising peptides, the method comprising: (a) for each of the plurality of samples, labeling the peptides in the sample with a mass tag unique to that sample to form respective sets of labeled peptides; (b) pooling the sets of labeled peptides to form a mixture; (c) in a first mass spectrometer, generating labeled precursor ions corresponding to the labeled peptides in the mixture and creating a first mass spectrum; (d) selecting a range of mass-to-charge ratios from the first mass spectrum, the selected range being a mass selection window; (e) fragmenting the labeled precursor ions within the mass selection window to generate fragment ions; and (f) in a second mass spectrometer, the second mass spectrometer being in tandem with the first mass spectrometer, analyzing the fragment ions simultaneously by data independent analysis; wherein at least one of the plurality of samples has been obtained from contents of a single cell. Optionally, and in addition in some embodiments: (1) at least one of the plurality of samples obtained from contents of a single cell comprises a proteome of an organism; (2) an additional step involves characterizing the proteome; or a combination of the foregoing.
The foregoing will be apparent from the following more particular description of example embodiments, as illustrated in the accompanying drawings.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
A description of example embodiments follows.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
It should be noted that throughout this specification the terms “comprising” and “having” are used to denote that embodiments of the invention “comprise” the noted features and as such, may also include other features. However, in the context of this invention, the terms “comprising” and “having” may also encompass embodiments in which the invention “consists essentially of” the relevant features or “consists of” the relevant features.
Tandem mass spectrometry, also referred to herein as MS/MS or MS2, involves multiple steps of mass spectrometry selection, with some form of fragmentation occurring in between the stages. In a tandem mass spectrometer, ions are formed in the ion source and separated by mass-to-charge ratio in the first stage of mass spectrometry (MS1). Ions of a particular mass-to-charge ratio (precursor ions) are selected and fragment ions (product ions) are created by collision-induced dissociation, ion-molecule reaction, photodissociation, or other processes known to those skilled in the art. The resulting ions are then separated and detected in a second stage of mass spectrometry (MS2). A common use is for analysis of proteins and peptides.
One kind of proteomics, quantitative proteomics, is used to determine the relative or absolute amount of proteins in a sample. As used herein, a sample can be, for example, a sample from an animal, a mammal, a primate, or a human; and/or a blood, tissue, or cell sample.
Several quantitative proteomics methods are based on MS/MS. One method commonly used for quantitative proteomics is isobaric tag labeling. Isobaric tag labeling enables simultaneous identification and quantification of proteins from multiple samples in a single analysis. To quantify proteins, peptides are labeled with chemical tags that have the same structure and nominal mass, but vary in the distribution of heavy isotopes in their structure. These tags, commonly referred to as tandem mass tags (TMT™), are designed so that the mass tag is cleaved at a specific linker region upon higher-energy collisional dissociation (HCD) during tandem mass spectrometry, yielding reporter ions of different masses. Protein quantitation is accomplished by comparing the intensities of the reporter ions in the MS/MS spectra. Two commercially available isobaric tags are iTRAQ® and TMT™ reagents.
In isobaric labeling for tandem mass spectrometry, proteins can be, for example, extracted from cells, digested, and labeled with tags of the same mass. Cells of interest can include, without limitation, tumor or cancer cells. When fragmented during MS/MS, the reporter ions show the relative amount of the peptides in the samples.
An isobaric tag for relative and absolute quantitation (iTRAQ®), for example, is a reagent for tandem mass spectrometry that is used to determine the amount of proteins from different sources in a single experiment. iTRAQ® uses stable isotope labeled molecules that can form a covalent bond with the N-terminus and side chain amines of proteins. The iTRAQ® reagents are used to label peptides from different samples that are pooled and analyzed by liquid chromatography and tandem mass spectrometry. The fragmentation of the attached tag generates a low molecular mass reporter ion that can be used to relatively quantify the peptides and the proteins from which they originated.
A tandem mass tag (TMT™), for example, is an isobaric mass tag chemical label used for protein quantification and identification. The tags contain four regions: mass reporter, cleavable linker, mass normalization, and protein reactive group. TMT™ reagents can be used to simultaneously analyze 2 to 11 different peptide samples prepared from cells, tissues or biological fluids. Three types of TMT™ reagents are available with different chemical reactivities: (1) a reactive NHS ester functional group for labeling primary amines (TMTduplex™, TMT™Sixplex™, TMT10plex Plus™, TMT11-131C™), (2) a reactive iodoacetyl functional group for labeling free sulfhydryls (iodoTMT™) and (3) a reactive alkoxyamine functional group for labeling of carbonyls (aminoxyTMT™).
MS/MS can also be used for protein sequencing, as is understood by those skilled in the art. When intact proteins are introduced to a mass analyzer, it is called “top-down proteomics,” and when proteins are digested into smaller peptides and subsequently introduced into the mass spectrometer, it is called “bottom-up proteomics”. Shotgun proteomics is a variant of bottom-up proteomics in which proteins in a mixture are digested prior to separation and tandem mass spectrometry.
According to aspects of the invention, it was recognized by the inventors that even where existing mass-spectrometry (MS) methods could, in some cases, achieve acceptably deep proteome coverage[1,2], low missing data[3], high throughput[4,5], and high sensitivity[6], simultaneously achieving all these objectives had remained an outstanding challenge[7,8]. Methods of mass spectrometry sufficient to empower critical biomedical projects had remained lacking.
The inventors recognized that resolving this challenge would empower biomedical projects that were impractical with current methods[8], especially those that require single-cell protein analysis[9-11]. Towards this goal, the inventors developed a novel approach: (i) increasing sample throughput and robustness by chemical labeling, and (ii) decreasing MS analysis time per sample by simultaneous (parallel) analysis of multiple peptides. These strategies are complementary, and they can be combined to achieve a multiplicative increase in the rate of quantifying the proteomes of limited sample amounts.
For example, chemical labeling had been used with data-dependent acquisition (“DDA”) to increase throughput via parallel sample analysis (
The throughput of DDA analysis could be increased by decreasing the ion accumulation times for MS2 scans, though this resulted in accumulating fewer ions and thus limited sensitivity[7]. Indeed, sensitive analysis of small sample amounts required (and was thus limited by) long ion accumulation times, which were typically substantially longer than the detection time required by MS detectors[6,19,20]. Even with short ion accumulation times for unlimited sample amounts, the requirement to serially analyze hundreds of thousands of precursor ions had remained a major challenge for simultaneously achieving high throughput and deep proteome coverage by serial precursor analysis.
A fundamental solution to this challenge was isolating and analyzing multiple precursor ions simultaneously by data-independent acquisition (DIA)[21]. This concept was since implemented into powerful methods for label-free DIA (LF-DIA) protein analysis[22-26]. Such parallel analysis of peptides decreased the time needed to analyze thousands of precursor ions and made the throughput of optimized LF-DIA and TMT-DDA workflows comparable (
Still, limitations persisted, and proteomic methods remained limited in depth and throughput, particularly where only low sample amounts were available.
According to aspects of the present invention, we report improved systems and methods for proteomics by mass spectrometry, including those that increase the throughput of sensitive and/or quantitative protein analysis, thereby addressing shortcomings of existing methods.
In some embodiments, the invention relates to an experimental and computational framework, plexDIA, for simultaneously multiplexing the analysis of both peptides and samples. Multiplexed analysis with plexDIA can increase throughput multiplicatively with the number of labels without reducing proteome coverage or quantitative accuracy. The number of proteins accurately quantified by multiplexed DIA can increase multiplicatively with the number of labels used,
Increasing the throughput of sensitive DIA by multiplexing samples labeled with nonisobaric isotopologous mass tags advantageously does not increase the number of precursor ions, and concomitantly does not increase the time needed to analyze them via tandem DIA-MS; this contrasts with the increased analysis times with DDA-MS[15,21].
For example, in some embodiments, and as further described herein, by using 3-plex nonisobaric mass tags, plexDIA enabled quantifying 3-fold more protein ratios among nanogram-level samples. Using 1-hour active gradients and a first-generation Q Exactive, plexDIA quantified about 8,000 proteins in each sample of labeled 3-plex sets. plexDIA also increases data completeness, reducing missing data over 2-fold across samples.
As another example, and in some embodiments, plexDIA was used to quantify proteome dynamics during the cell division cycle in cells isolated based on their DNA content; plexDIA detected many classical cell cycle proteins and discovered new ones. When applied to single human cells, plexDIA quantified about 1,000 proteins per cell and achieved 98% data completeness within a plexDIA set while using about 5 min of active chromatography per cell.
In some aspects, the invention also addresses limitations encountered by DIA multiplexing by SILAC[29] or pulsed SILAC[30,31], achieving an increase in the number of quantitative data points that is multiplicative with the number of mass tags.
In some aspects, the invention used multiplexed DIA to increase sample throughput while preserving proteome coverage and quantification accuracy, which had heretofore not been achieved due to the increased complexity of DIA data from labeled samples[33-37]. Aspects of the invention enable the use of both isobaric and isotopologous tags to multiplex DIA with enhanced quantification of proteins[cf. 32-34].
The optimized experimental and analytical framework described herein can enable n-fold multiplexed DIA to increase n-fold the number of accurate protein data points,
A variety of mass tags can be used, and they advantageously render sets of peptides in each of the n-samples distinguishable by the detector of the mass spectrometer, e.g., that of MS2. In some embodiments, the mass tags are nonisobaric and isotopologous, as demonstrated in detail herein. Other suitable mass tags include, e.g., mass tags with different retention times or ion mobilities so that precursors from different samples may be separated and distinguished by the analysis. In some embodiments, the mass tags are selected so that they vary from one another in mass by at least about 10, 20, 30, 35, 40, 45, 50, 60, 70, 80, or 100 mDalton (mDa), and in some embodiments all multiplexed sample tags vary by at least these amounts. In other embodiments, some or all of the mass tags differ in mass by amounts defined by ranges of the foregoing.
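By way of non-limiting illustration, the minimum-spacing criterion described above can be checked with a short script. This is only a sketch: the function names and the example tag masses are hypothetical and chosen for illustration, and the 30 mDa default is one of the example thresholds listed above.

```python
from itertools import combinations


def min_pairwise_difference_mda(tag_masses_da):
    """Return the smallest pairwise mass difference among the tags, in mDa."""
    if len(tag_masses_da) < 2:
        raise ValueError("at least two tag masses are required")
    smallest_da = min(abs(a - b) for a, b in combinations(tag_masses_da, 2))
    return smallest_da * 1000.0  # convert Da to mDa


def tags_meet_spacing(tag_masses_da, threshold_mda=30.0):
    """True if every pair of tags differs by at least threshold_mda."""
    return min_pairwise_difference_mda(tag_masses_da) >= threshold_mda


# Hypothetical tag masses in Da (illustrative values only).
example_tags = [140.095, 144.102, 148.109]
print(tags_meet_spacing(example_tags))        # spacing of several Da
print(tags_meet_spacing([100.000, 100.010]))  # spacing of about 10 mDa
```

The check uses every pair of tags rather than only adjacent ones, so the tags may be supplied in any order.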
While multiple methods allow increasing proteomics throughput, plexDIA is distinct in simultaneously allowing high sensitivity, depth and accuracy. plexDIA enables a multiplicative increase (e.g., 3-fold with 3 samples, 3 labels, n-fold with n-samples, n-labels, where n can be, for example, about 3, 5, 10, 20, 30, 40, 50, 100, or more) in the rate of consistent protein quantification across limited sample amounts while preserving proteomic coverage, quantitative accuracy, precision, and repeatability of LF-DIA. The gains in throughput, data completeness, and other performance measures relative to LF-DIA as described herein can also scale with “n” times J, where J can be, for example, about 0.3, 0.5, 0.7, or 0.9.
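By way of non-limiting illustration, the n times J scaling can be written out as a small calculation. The function names below are hypothetical. The second function simply inverts the relationship to estimate J from an observed gain, for instance the approximately 2.5-fold gain reported elsewhere in this description for a 3-plex experiment.

```python
def expected_gain(n_labels, j_factor):
    """Expected fold gain relative to LF-DIA, modeled as n times J."""
    if n_labels < 1 or j_factor <= 0:
        raise ValueError("n_labels must be >= 1 and j_factor must be positive")
    return n_labels * j_factor


def estimate_j(observed_gain, n_labels):
    """Estimate J from an observed fold gain and the number of labels."""
    if n_labels < 1:
        raise ValueError("n_labels must be >= 1")
    return observed_gain / n_labels


print(expected_gain(3, 0.9))  # gain for a 3-plex set with J of about 0.9
print(estimate_j(2.5, 3))     # J implied by a 2.5-fold gain with 3 labels
```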
Similar to other labeling methods, such as TMT-DDA, parallel analysis of multiple samples by plexDIA saves LC-MS/MS time and costs. Currently, the commercially available labels for plexDIA are low-plex (mTRAQ, TMT0/TMT/shTMT, or dimethyl labeling[12]), compared to 18-plex isobaric TMTpro labels available for DDA[4]. This current plex disadvantage is offset by the parallel precursor analysis enabled by plexDIA. Indeed, quantifying about 8,000 proteins/sample took 0.5 h for 3-plexDIA (3F-3J) and 1.1 h for a highly-optimized 16-plex TMTpro workflow[51].
Furthermore, n-plex, including e.g., 3-plex, DIA affords higher sensitivity since it does not require offline fractionation and does not incur associated sample losses. In some embodiments, higher plex mass tags for plexDIA can be used for different applications, such as single-cell proteomics[7].
The parallel sample and peptide analysis by plexDIA becomes increasingly important for low-abundance samples since they require long ion accumulation times that undermine the throughput of serial acquisition methods, such as TMT-DDA, even when the vast majority of MS2 scans result in confident peptide identifications[7,52]. Thus, plexDIA is particularly attractive and advantageous when used for the analysis of nanogram samples, e.g., about 1, 2, 5, 10, 20, 50, 100, 200, 300, 500, 700, or 1000 nanograms, where it can afford accurate and deep proteome quantification without using 2-dimensional peptide separation (offline fractionation). Indeed, plexDIA can be applied to achieving sensitive and multiplexed results in the field of single-cell proteomics[7,19,28]. In some embodiments, the invention is applied to sub-nanogram samples, e.g., about 50, 100, 200, 300, or 500 picograms.
It should also be appreciated that while liquid chromatography (e.g. column) is one means of achieving a separation prior to introducing material to the mass spectrometer, other separation methods may be used as well in conjunction with aspects of the present invention, including, for example, capillary electrophoresis, or ion mobility methods, e.g., field asymmetric waveform ion mobility spectrometry (FAIMS).
The data disclosed herein demonstrate that plexDIA reduces the amount of missing data between diverse samples both within and across runs. This reduction stems from buffering sample-to-sample variability in protein composition. Furthermore, we introduced an approach for matching precursors within a run, which reduced missing data to a mere 2-3% in bulk samples (
plexDIA offers a framework that scales to n labels, and thus increases throughput n-fold, reduces costs nearly n-fold, and increases the fraction of proteins quantified across all samples. In addition, plexDIA can maintain accurate quantification and good repeatability. Here, we explicitly demonstrated this potential for n=3.
In some aspects, throughput can refer to the number of well-quantified protein data points achieved per unit time of the mass spectrometric analysis method.
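By way of non-limiting illustration, this definition of throughput can be applied to the figures given above, namely about 8,000 proteins per sample in 0.5 h for 3-plexDIA and in 1.1 h for the 16-plex TMTpro workflow. The function name is hypothetical, and the calculation treats every quantified protein as one well-quantified data point, which is a simplifying assumption.

```python
def throughput(data_points, hours):
    """Well-quantified protein data points per hour of instrument time."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return data_points / hours


plexdia_rate = throughput(8000, 0.5)  # 3-plexDIA, per sample
tmt_rate = throughput(8000, 1.1)      # 16-plex TMTpro workflow, per sample
print(plexdia_rate, tmt_rate, plexdia_rate / tmt_rate)
```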
The method also scales where n>3, as previously described. According to aspects of the invention any potential for interference can be resolved by increasing the resolving power of MS scans and/or improving data analysis. To sample sufficient ions from each peptide (given the finite capacity of MS detectors), one of skill in the art will recognize that smaller m/z ranges can be employed, e.g., quantification relying on small MS2 windows or split m/z ranges at MS1. As will be appreciated, the capacity of MS detectors is less limiting for small samples, such as single cells, and thus increasing the number of labels holds much potential for single-cell proteomics, as previously discussed[28,55].
It will be appreciated, for example, that various combinations of MS1 and MS2 scans can be used, such as, for example, an MS1 survey scan, an MS1 scan from about 300-1500 m/z, 450-850 m/z, or multiple MS1 scans e.g., 2, 3, 4, 5 or more, e.g., about 200-600 and 600-1500 m/z; in combination with MS2 scans having for example between about 3 and 100 windows, e.g., between about 5 and 20 windows. The width of the MS2 window can be, for example, about 2, 3, 4, 5, 10, 20, 50, 100, 200, 300, 500, 600 m/z units, or more.
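By way of non-limiting illustration, one simple way to lay out MS2 isolation windows is to split an m/z range into equal-width segments. The sketch below assumes contiguous, non-overlapping windows of equal width; actual acquisition methods may use overlapping or variable-width windows, and the function name is hypothetical.

```python
def equal_width_windows(mz_start, mz_end, n_windows):
    """Split [mz_start, mz_end] into n contiguous, equal-width windows."""
    if n_windows < 1 or mz_end <= mz_start:
        raise ValueError("need n_windows >= 1 and mz_end > mz_start")
    width = (mz_end - mz_start) / n_windows
    return [
        (mz_start + i * width, mz_start + (i + 1) * width)
        for i in range(n_windows)
    ]


# Example: a 450-850 m/z range covered by 8 windows of 50 m/z units each.
for low, high in equal_width_windows(450.0, 850.0, 8):
    print(f"{low:.1f}-{high:.1f} m/z")
```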
In some embodiments, the plexDIA framework uses nonisobaric isotopologous labels, which advantageously results in sample-specific precursors (allowing MS1 quantification) and in sample-specific peptide fragments (allowing MS2 quantification). A variety of fragmentation methods can be used, including e.g., collision-induced dissociation (CID), and higher energy collisional dissociation (HCD).
Therefore, the plexDIA strategy enables quantification at the MS1 and MS2 levels, which offers advantages, such as evaluation of measurement reliability[56]. This strategy is opposite to previous approaches[33,34] and would have been expected to increase interference. We have demonstrated, however, that this theoretical potential was effectively thwarted by our data analysis (
In addition, it should be appreciated that a judicious selection of mass spectrometric parameters can amplify the advantages of the methods described herein, including, for example, the selection of MS1 and MS2 resolutions. Advantageously, the parameters provide enough resolution to distinguish both peptide precursor ions and peptide fragment ions while also being implemented in a short enough time frame to enable rapid duty cycles, thereby achieving desirable degrees of depth of proteomic coverage, high throughput, and accuracy, e.g., by sampling elution peaks at multiple time points with high time resolution. Accordingly, in some embodiments, the resolution of the first mass spectrometer (MS1 resolution) is between about 120k and 512k (where k=1000), such as, for example, for an orbitrap instrument. In some embodiments, the resolution is between about 70k and 512k, such as for a time of flight instrument, such as timsTOF, as described herein. The MS1 resolution can have a lower range, for example, of about 50k, 60k, 70k, 100k, 120k, 150k, or 200k and an upper range, for example, of about 300k, 400k, 500k, 600k, 700k, 800k, or 900k. In some embodiments, the resolution of the second mass spectrometer (MS2 resolution) is between about 30k and 512k. The MS2 resolution can have a lower range, for example, of about 25k, 30k, 35k, 40k, 45k, 50k, 60k, 70k, 100k, 120k, 150k, or 200k and an upper range, for example, of about 300k, 400k, 500k, 600k, 700k, 800k, or 900k.
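By way of non-limiting illustration, the resolving power needed to separate two ions can be estimated from the general definition R = (m/z) / Δ(m/z). The sketch below is a simplification: it assumes that a mass difference between two ions of the same charge z appears as a spacing of (mass difference)/z on the m/z axis, it ignores peak shape, and it ignores the fact that the resolving power of some analyzers varies with m/z. The function name is hypothetical.

```python
def required_resolving_power(mz, mass_difference_da, charge=1):
    """Approximate resolving power needed to separate two peaks.

    mz: m/z at which the two peaks appear
    mass_difference_da: mass difference between the two ions, in Da
    charge: charge state shared by both ions
    """
    if mz <= 0 or mass_difference_da <= 0 or charge < 1:
        raise ValueError("mz and mass difference must be positive, charge >= 1")
    delta_mz = mass_difference_da / charge  # spacing on the m/z axis
    return mz / delta_mz


# Two ions differing by 30 mDa near m/z 500, at charge 1 and charge 2.
print(round(required_resolving_power(500.0, 0.030, charge=1)))
print(round(required_resolving_power(500.0, 0.030, charge=2)))
```

Under these assumptions, higher charge states compress the spacing on the m/z axis and therefore call for proportionally higher resolving power.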
Accordingly, we demonstrated the capabilities of plexDIA in providing a fold-change throughput increase for DIA proteomics, while yielding comparable data quality. While the 3-fold speed increase is a salient and sufficient advantage for many applications, plexDIA unleashes opportunities beyond sample throughput, providing, in aspects of the invention, additional advantages over existing methods. For example, plexDIA can enable gains in sensitivity important in applications for single-cell proteomics[7], and even beyond the results demonstrated in
As will be appreciated, the quantitative aspect can have a double benefit. Quantification accuracy and robustness can be improved by (i) using MS1- and MS2-level signals that are minimally affected by interference and by (ii) calculating quantities relative to the internal standard, which is likely to also significantly reduce the batch effects associated with LCMS performance variation. This makes the technology introduced by plexDIA highly promising not just for very deep profiling of selected samples using offline fractionation, but also for large-scale experiments, wherein batch effects are a significant challenge. Another avenue of plexDIA is increasing the throughput of applications seeking to quantify protein interactions, conformations and activities. For example, plexDIA is readily compatible with the recently reported covalent protein painting that enables analysis of protein conformations in living cells[57,58].
Since there are no fundamental limitations preventing the creation of non-isobaric labels which would allow a higher degree of multiplexing with DIA, we expect plexDIA to enable even higher throughput in the future. Given these considerations, we believe that plexDIA will eventually become the predominant DIA workflow, preferable over label-free approaches for most applications.
Data Interpretation with Neural Networks
To enhance MS data interpretation, the plexDIA module of DIA-NN capitalizes on the expected regular patterns in the data, such as identical retention times and known mass-shifts between the same peptide labelled with different isotopologous mass tags,
Despite the n-fold increased spectral complexity, the plexDIA framework accurately quantified peptides by calculating ratios of fragments from the most confident isotopologous precursor to the translated isotopologous precursors at the apex where the signal was greatest and the impact of interference was lowest. The mean fragment ratio was used to scale the precursor quantity of the best isotopologous precursor to the less-confident isotopologous precursors,
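By way of non-limiting illustration, the fragment-ratio scaling described above can be sketched as follows. This is not the DIA-NN implementation. It assumes that apex intensities for matching fragments are already available for both channels, and it assumes that each ratio is taken as (other channel)/(best channel), so that multiplying by the mean ratio converts the best channel's precursor quantity into an estimate for the other channel. All names are hypothetical.

```python
from statistics import mean


def scale_precursor_quantity(best_quantity, best_fragments, other_fragments):
    """Estimate a less-confident channel's precursor quantity.

    best_quantity: precursor quantity in the most confident channel
    best_fragments: apex intensities of fragments in that channel
    other_fragments: apex intensities of the matching fragments in the
        less-confident channel, in the same order
    """
    if len(best_fragments) != len(other_fragments):
        raise ValueError("fragment lists must have the same length")
    # Skip fragments with no signal in the best channel to avoid dividing by zero.
    ratios = [
        other / best
        for best, other in zip(best_fragments, other_fragments)
        if best > 0
    ]
    if not ratios:
        raise ValueError("no usable fragment pairs")
    return best_quantity * mean(ratios)


# Example: every fragment in the other channel is half as intense.
print(scale_precursor_quantity(1000.0, [200.0, 100.0, 50.0], [100.0, 50.0, 25.0]))
```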
plexDIA Benchmarks were Established
We sought to evaluate whether plexDIA can multiplicatively increase the number of quantitative data points relative to matched label-free DIA (LF-DIA) analysis while maintaining comparable quantitative accuracy. Towards that goal, we mixed proteomes in precisely specified ratios shown in
Each sample was either analyzed by label-free DIA (LF-DIA) or labeled with one of three amine-reactive isotopologous chemical labels (mTRAQ: Δ0, Δ4, or Δ8),
The combined labelled samples were analyzed by plexDIA, and the result was used to benchmark proteomic coverage, quantitative accuracy, precision, and repeatability across runs relative to LF-DIA of the same samples. LF-DIA and plexDIA were evaluated with two data acquisition methods, V1 and V2, shown in
plexDIA Increased Throughput Multiplicatively
To directly benchmark the analysis of 500 ng protein samples by plexDIA relative to LF-DIA, the multiplexed and label-free samples described in
Both V1 and V2 resulted in approximately 2.5-fold more precursors and protein data points for plexDIA compared to LF-DIA per unit time,
plexDIA Increased Data Completeness Across Samples
Next, we sought to compare LF-DIA and plexDIA in terms of the consistency of protein quantification across samples. The systematic acquisition of ions by DIA was well established as a strategy for increasing the repeatability of peptide identification relative to shotgun DDA[24]. We assessed whether, in addition to providing consistent data acquisition, plexDIA further reduced the variability between samples and runs, and thus further increased the consistency (overlap) between quantified proteins relative to LF-DIA.
Indeed, both SILAC and isobaric labeling reduce missing data by enabling the quantification of peptides identified in at least one sample from a labeled set[18,38]. Similarly, plexDIA takes advantage of the precisely known mass-shifts in the mass spectra for a peptide labeled with different tags to propagate peptide sequence identifications within a run. Specifically, confidently identified precursors in one channel (label) were matched to corresponding precursors in the other channels. This was the default analysis used with standards A, B and C.
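By way of non-limiting illustration, the known mass shift between channels allows the expected m/z of a precursor in one channel to be computed from its m/z in another. The sketch below assumes that the peptide carries a known number of labeled sites (for amine-reactive tags, the N-terminus plus each lysine) and uses nominal 4 Da and 8 Da shifts suggested by the Δ0/Δ4/Δ8 label names; the exact monoisotopic shifts differ slightly from these nominal values. The function name is hypothetical.

```python
def translate_precursor_mz(mz, charge, n_label_sites, mass_shift_da):
    """Expected m/z of the same peptide carrying a heavier label.

    mz: observed m/z of the precursor in the reference channel
    charge: precursor charge state
    n_label_sites: number of labels on the peptide
    mass_shift_da: mass difference per label between the two channels, in Da
    """
    if charge < 1 or n_label_sites < 0:
        raise ValueError("charge must be >= 1 and n_label_sites >= 0")
    return mz + n_label_sites * mass_shift_da / charge


# A doubly charged precursor with two labeled sites, observed at m/z 500.0.
print(translate_precursor_mz(500.0, 2, 2, 4.0))  # nominal shift of 4 Da per label
print(translate_precursor_mz(500.0, 2, 2, 8.0))  # nominal shift of 8 Da per label
```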
plexDIA employed an additional mode for the special case when some proteins were present only in some samples of labeled sets. In such cases, plexDIA enabled sample specific identification for each protein by using multiple MS1- and MS2-based features to rigorously evaluate the spectral matches within a run and explicitly assign confidence for the presence of each protein in each sample. Such a special case was exemplified by a plexDIA set in which one sample had both yeast and bacterial proteins while another sample had only yeast proteins,
To assess whether plexDIA could improve data completeness, the protein groups intersecting across samples A, B and C were plotted as Venn diagrams for each replicate of plexDIA and LF-DIA,
We further benchmarked the consistency of identified proteins both from the repeated analysis of the same sample (such as replicate injections of sample A) and from the analysis of different samples (such as comparing samples B and C). Consistent with prior reports for DIA data completeness, both LF-DIA and plexDIA identified largely the same proteins from replicate injections, quantified by high Jaccard indices and only about 13-15% non-overlapping proteins, as shown in
The overlap between the proteins identified in distinct samples remained similarly high for plexDIA while it was significantly reduced for the LF-DIA analysis,
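By way of non-limiting illustration, the Jaccard index used to quantify overlap between sets of identified proteins is the size of the intersection divided by the size of the union. The protein identifiers below are placeholders, not data from the experiments described herein.

```python
def jaccard_index(proteins_a, proteins_b):
    """Jaccard index of two collections of protein identifiers."""
    set_a, set_b = set(proteins_a), set(proteins_b)
    union = set_a | set_b
    if not union:
        raise ValueError("both collections are empty")
    return len(set_a & set_b) / len(union)


run_1 = {"P1", "P2", "P3", "P4"}
run_2 = {"P2", "P3", "P4", "P5"}
print(jaccard_index(run_1, run_2))  # 3 shared proteins out of 5 in total
```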
The Quantitative Accuracy of plexDIA was Comparable to LF-DIA
To benchmark the quantitative accuracy and precision of plexDIA and LF-DIA, we compared the measured protein ratios between pairs of samples to the ones expected from the study design,
For well-controlled comparisons between the quantitative accuracy of LF-DIA and plexDIA, we used the set of protein ratios quantified by both methods. The comparison results from V1 are shown in
By design, plexDIA allows quantifying precursors based on MS2- and MS1-level data, and we evaluated the quantitative accuracy for both levels of quantification,
The Repeatability of plexDIA was Comparable to LF-DIA
To assess the repeatability of plexDIA and LF-DIA quantitation, we computed the coefficient of variation (CV) for proteins quantified in triplicate runs for each method using MaxLFQ abundances[40]; we required each protein group to be quantified three times for plexDIA and LF-DIA, then the CVs for the overlapping sample-specific protein groups (n=12,863) were plotted in
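By way of non-limiting illustration, the coefficient of variation for a protein quantified in replicate runs is the standard deviation of its abundances divided by their mean. The sketch below assumes the sample standard deviation, which is a choice made here for illustration; the abundance values are placeholders.

```python
from statistics import mean, stdev


def coefficient_of_variation(abundances):
    """CV of replicate abundances, using the sample standard deviation."""
    if len(abundances) < 2:
        raise ValueError("at least two replicates are required")
    average = mean(abundances)
    if average == 0:
        raise ValueError("mean abundance is zero")
    return stdev(abundances) / average


# Placeholder abundances for one protein group in triplicate runs.
print(coefficient_of_variation([90.0, 100.0, 110.0]))
```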
Estimating Differential Protein Abundance by plexDIA and LF-DIA
We investigated the agreement of differential protein abundance between U-937 and Jurkat cell lines with plexDIA and LF-DIA. Differential protein abundance was estimated from LF-DIA data, and the differentially abundant proteins at 1% FDR were used to assess the agreement between U-937 and Jurkat protein ratios estimated by plexDIA and LF-DIA,
We also compared the ability of plexDIA and LF-DIA to recall true differentially abundant proteins as a function of each method's empirical FDR. Our experimental design from
Both methods used 3 replicates and performed comparably at 1% empirical FDR, with 643 proteins and 663 proteins found to be differentially abundant for plexDIA and LF-DIA, respectively. The slight increase of true positives for LF-DIA at higher empirical FDR may have been due to its slightly higher precision as visible in
Cell Division Cycle Analysis with plexDIA
Next, we applied plexDIA to quantify protein abundance across the cell division cycle (CDC) of U-937 monocyte cells. The CDC analysis allows further validation of plexDIA based on well-established biological processes during the CDC while simultaneously offering the possibility of new discoveries. The ability of plexDIA to analyze small samples made it possible to isolate cells from different phases of the CDC based on their DNA content,
The peptides from the sorted cells were labeled with non-isobaric isotopologous labels, combined, and analyzed both by MS1-optimized (V1) and MS2-optimized (V2) plexDIA methods,
To identify biological processes regulated across the phases of the CDC, we performed PSEA using data from both V1 and V2,
To further explore the proteome remodeling during the CDC, we identified differentially abundant proteins across G1, S, and G2/M phase,
In addition to the differential abundance of classic CDC regulators, we found that some poorly characterized proteins were also differentially abundant, such as proteins CDV3 and JPT2. To further investigate these proteins, we examined the extracted ion current (XIC) for representative peptides from these proteins,
Single-Cell Analysis with plexDIA
Next, we evaluated the potential of plexDIA to quantify proteins from single human cells. Thus, single cells from melanoma (WM989-A6-G3), pancreatic ductal adenocarcinoma (PDAC), and monocyte (U-937) cell lines were prepared into plexDIA sets using the nano-ProteOmic sample Preparation (nPOP)[44].
We aimed to test its generalizability to different types of MS detectors, an orbitrap and a TOF detector, and its ability to take advantage of ion mobility technology, such as trapped ion mobility spectrometry[45]. The technologies were implemented by analyzing single-cell plexDIA samples using two commercial platforms, timsTOF SCP (
As observed with bulk samples, plexDIA resulted in high data completeness among single-cell proteomes,
plexDIA quantified protein fold-changes spanning a 1,000-fold dynamic range and exhibited good agreement with corresponding fold-changes quantified from 100-cell bulk samples,
Sampling and detecting a sufficient number of precursor copies is key for accurate precursor quantification; otherwise quantification accuracy can be undermined by counting noise[19,48]. Since peptide fragmentation is usually incomplete, approaches like plexDIA that can perform MS1-level quantification are likely to count more copies per peptide than approaches relying on MS2 or MS3 level quantification[6]. To evaluate this expectation, we estimated the number of peptide and protein copies that the orbitrap counted from single cells,
Single-cell plexDIA data acquired from Q-Exactive and timsTOF SCP instruments were projected using a weighted PCA,
Methods of Cell Culture and Sample Preparation
Cell Culture
U-937 (monocytes) and Jurkat (T-cells) were cultured in RPMI-1640 Medium (Sigma-Aldrich, R8758), and HPAF-II cells (pancreatic ductal adenocarcinoma (PDAC) cells, ATCC CRL-1997) were cultured in EMEM (ATCC 30-2003); all three cell lines were supplemented with 10% fetal bovine serum (Gibco, 10439016) and 1% penicillin-streptomycin (Gibco, 15140122) and grown at 37° C. Melanoma cells (WM989-A6-G3, a kind gift from Arjun Raj, University of Pennsylvania) were grown as adherent cultures in TU2% media, which is composed of 80% MCDB 153 (Sigma-Aldrich M7403), 10% Leibovitz L-15 (ThermoFisher 11415064), 2% fetal bovine serum, 0.5% penicillin-streptomycin and 1.68 mM Calcium Chloride (Sigma-Aldrich 499609). All cells were harvested at a density of 10⁶ cells/mL and washed with sterile PBS. For bulk plexDIA benchmarks, U-937 and Jurkat cells were resuspended to a concentration of 5×10⁶ cells/mL in LC-MS water and stored at −80° C.
E. coli and S. cerevisiae were grown at room-temperature (21° C.) shaking at 300 rpm in Luria Broth (LB) and yeast-peptone-dextrose (YPD) media, respectively. Cell density was measured by OD600 and cells were harvested mid-log phase, pelleted by centrifugation, and stored at −80° C.
Preparation of Bulk plexDIA Samples
The harvested U-937 and Jurkat cells were heated at 90° C. in a thermal cycler for 10 min to lyse by mPOP[59]. Tetraethylammonium bromide (TEAB) was added to a final concentration of 100 mM (pH 8.5) for buffering, then proteins were reduced in tris(2-carboxyethyl)phosphine (TCEP, Supelco, 646547) at 20 mM for 30 minutes at room temperature. Iodoacetamide (Thermo Scientific, A39271) was added to a final concentration of 15 mM and incubated at room temperature for 30 minutes in the dark. Next, Benzonase Nuclease (Millipore, E1014) was added to 0.3 units/μL, Trypsin Gold (Promega, V5280) to 1:25 ratio of substrate:protease, and LysC (Promega, VA1170) to 1:40 ratio of substrate:protease, then incubated at 37° C. for 18 hours. E. coli and S. cerevisiae samples were prepared similarly; however, instead of lysis by mPOP, samples were lysed in 6 M Urea and vortexed with acid-washed glass beads, alternating between 30 seconds vortexing and 30 seconds resting on ice, repeated for a total of 5 times.
After digestion, all samples were desalted by Sep-Pak (Waters, WAT054945). Peptide abundance of the eluted digests was estimated by nanodrop A280, and then the samples were dried by speed-vacuum and resuspended in 100 mM TEAB (pH 8.5). U-937, Jurkat, E. coli, and S. cerevisiae digests were mixed to generate three samples, which we refer to as Samples A, B, and C; the mixing ratios are described in Table S1. Samples A, B, and C were split into two groups: (i) was kept label-free, and (ii) was labeled with mTRAQ Δ0, Δ4, or Δ8 (SciEx, 4440015, 4427698, 4427700), respectively. An appropriate amount of each respective mTRAQ label was added to each of Samples A-C, following the manufacturer's instructions. In short, mTRAQ was resuspended in isopropanol, then added to a concentration of 1 unit/100 μg of sample and left to incubate at room temperature for 2 hours. We added an extra step of quenching the labeling reaction with 0.25% hydroxylamine for 1 hour at room temperature, as is commonly done in TMT experiments where the labeling chemistry is the same[6,50]. After quenching, the mTRAQ-labeled samples (A-C) were pooled to produce the final multiplexed set used for benchmarking plexDIA.
Preparation of Single-Cell plexDIA Samples
Single cells were thawed from liquid nitrogen storage in 10% DMSO and culture media at a concentration of 1×10⁶ cells/mL. Cells were first washed twice in PBS to remove DMSO and media and then were suspended in PBS at 200 cells/μL for sorting and sample preparation by nPOP as detailed by Leduc et al. [44]. In brief, single cells were isolated by CellenONE and prepared in droplets on the surface of a glass slide, including lysing, digesting, and labeling individual cells. In each droplet, single cells were lysed in 100% DMSO, proteins were digested with Trypsin Gold at a concentration of 120 ng/μL and 5 mM HEPES pH 8.5, peptides were chemically labeled with mTRAQ, then finally single cells were pooled into a plexDIA set for subsequent analysis. Cells were prepared in clusters of 3 for ease of downstream pooling into plexDIA sets; a total of 48 plexDIA sets were prepared per single glass slide. (It will be appreciated that other methods of digestion, e.g., non-enzymatic digestion such as with formic acid, can be used as well in accordance with aspects of the invention.)
Each plexDIA set was composed of a single PDAC, Melanoma, and U-937 cell, except if a negative control was present in place of a cell. For samples run on the Q-Exactive, every fourth set contained a negative control that received all the same reagents but did not include a single cell. This resulted in 132 single cells prepared with 12 total negative controls. 10 additional plexDIA sets were run on the timsTOF SCP for a total of 30 single cells (no negative controls). Cell types were labeled with randomized mass-tag designs in the plexDIA sets to avoid any systematic biases with labeling. Specifically, each cell type was labeled with each mass tag as described in the single-cell metadata file.
Cell Division Cycle, FACS and Sample Preparation
U-937 monocytes were grown as described above, harvested and aliquoted to a final 1 mL suspension of approximately 1×10⁶ cells in RPMI-1640 Medium. Then DNA was stained by incubating the cells with Vybrant DyeCycle Violet Stain (Invitrogen, V35003) at a final concentration of 5 μM in the dark for 30 minutes at 37° C., as per the manufacturer's instructions. Next, the cells were centrifuged, then resuspended in PBS to a density of 1×10⁶ cells/mL. The cell suspension was stored on ice and protected from light until sorting began.
The cells were then sorted with a Beckman CytoFLEX SRT. The population of U-937s was gated to select singlets based on FSC-A and FSC-H; this population of singlets was then subgated based on DNA content using the PB-450 laser (ex=405 nm/em=450 nm). The G1 population is the most abundant population in actively dividing cells, and the G2/M population should theoretically have double the intensity (DNA content), while the S-phase population lies in between. Populations of G1, S, and G2/M cells were collected based on these subgates and sorted into 2 mL Eppendorf tubes.
Post-sorting, the cells were centrifuged at 300 g for 10 minutes, PBS was removed, then the cells were resuspended in 20 μL HPLC water to reach a density of approximately 4,000 cells/μL. The cell suspensions were lysed using the Minimal ProteOmic sample Preparation (mPOP) method, which involves freezing at −80° C. and then heating to 90° C. for 10 minutes[59]. Next, the cell lysates were prepared exactly as described in the “Sample Preparation” section. In brief, the cell lysate was buffered with 100 mM TEAB (pH 8.5), then proteins were reduced with 20 mM TCEP for 30 minutes at room temperature. Next, iodoacetamide was added to a final concentration of 15 mM and incubated at room temperature for 30 minutes in the dark, then Benzonase Nuclease was added to 0.3 units/μL. Trypsin Gold and LysC were then added to the cell lysate at 1:25 and 1:40 ratios of protease:substrate, respectively, then the samples were incubated at 37° C. for 18 hours. After digestion, the peptides were desalted by stage-tipping with C18 extraction disks (Empore, 66883-U) to remove any remaining salt that was introduced during sorting[60]. G1 cells were labeled with mTRAQ Δ0, S cells were labeled with mTRAQ Δ4, and G2/M cells were labeled with mTRAQ Δ8, then combined to form a plexDIA set of roughly 2,000 cells per cell-cycle phase (label). The combined set was analyzed with 2 hour active gradients of MS1-optimized (V1) and MS2-optimized (V2) methods as described in the “Acquisition of bulk data” section.
Acquisition of Bulk Data
Multiplexed and label-free samples were injected at 1 μL volumes via Dionex UltiMate 3000 UHPLC to enable online nLC with a 25 cm×75 μm IonOpticks Aurora Series UHPLC column (AUR2-25075C18A). These samples were subjected to electrospray ionization (ESI) and sprayed into a Thermo Q-Exactive orbitrap for MS analysis. Buffer A is made of 0.1% formic acid (Pierce, 85178) in LC-MS-grade water; Buffer B is made of 80% acetonitrile and 0.1% formic acid mixed with LC-MS-grade water.
The gradient used for LF-DIA is as follows: 4% Buffer B (minutes 0-11.5), 4%-5% Buffer B (minutes 11.5-12), 5%-28% Buffer B (minutes 12-75), 28%-95% Buffer B (minutes 75-77), 95% Buffer B (minutes 77-80), 95%-4% Buffer B (minutes 80-80.1), then hold at 4% Buffer B until minute 95, flowing at 200 nl/min throughout the gradient. The V1 duty cycle was comprised of 5×(1 MS1 full scan×5 MS2 windows) as illustrated in
mTRAQ labeling increases hydrophobicity of peptides, which is why a higher % Buffer B is used during the active gradient of multiplexed samples; in addition, the scan range was shifted 100 m/z higher than LF-DIA to account for the added mass of the label. The gradient used for plexDIA is as follows: 4% Buffer B (minutes 0-11.5), 4%-7% Buffer B (minutes 11.5-12), 7%-32% Buffer B (minutes 12-75), 32%-95% Buffer B (minutes 75-77), 95% Buffer B (minutes 77-80), 95%-4% Buffer B (minutes 80-80.1), then hold at 4% Buffer B until minute 95, flowing at 200 nl/min throughout the gradient. The plexDIA V1 duty cycle consisted of 5×(1 MS1 full scan×5 MS2 windows), for a total of 25 MS2 windows to span the full m/z scan range (480-1470 m/z) with 0.5 Th overlap between adjacent windows. The length of the windows was variable for each subcycle (20 Th for subcycles 1-3, 40 Th for subcycle 4, and 100 Th for subcycle 5). Each MS1 full scan was conducted at 140k resolving power, 3×10⁶ AGC maximum, and 500 ms maximum injection time. Each MS2 scan was conducted at 35k resolving power, 3×10⁶ AGC maximum, 110 ms maximum injection time, and 27% normalized collision energy (NCE) with a default charge of 2. The RF S-lens was set to 30%. The plexDIA V2 duty cycle consisted of one MS1 scan conducted at 70k resolving power with a 300 ms maximum injection time and 3×10⁶ AGC maximum, followed by 40 MS2 scans at 35k resolving power with 110 ms maximum injection time and 3×10⁶ AGC maximum. The window length for the first 25 MS2 scans was set to 12.5 Th; the next 7 windows were 25 Th, then the last 8 windows were 62.5 Th. Adjacent windows shared a 0.5 Th overlap. All other settings were the same as the plexDIA V1 method. Data acquired for the cell-division-cycle used 2 hour active gradients of the V1 and V2 methods.
The gradient used for mTRAQ DDA is the same as that used for plexDIA. However, the duty cycle was a shotgun DDA method. The MS1 full scan range was 450-1600 m/z, and was performed with 70k resolving power, 3×10⁶ AGC maximum, and 100 ms injection time. This shotgun DDA approach selected the top 15 precursors to send for MS2 analysis at 35k resolving power, 1×10⁵ AGC maximum, 110 ms injection time, 0.3 Th isolation window offset, 0.7 Th isolation window length, 8×10³ minimum AGC target, and 30 second dynamic exclusion.
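By way of illustration only, the V1 and V2 isolation-window layouts described above can be tabulated with a short script. This is a minimal sketch: the function name `build_windows` is hypothetical, and the convention that each window begins 0.5 Th below the upper edge of the preceding window is an assumption about how the stated overlap is placed, not a reproduction of the instrument method file.

```python
def build_windows(start_mz, widths, overlap=0.5):
    """Return (low, high) m/z bounds for consecutive DIA isolation windows.

    Assumption: each window starts `overlap` Th below the upper bound of
    the previous window. The actual method file may place edges differently.
    """
    windows = []
    low = start_mz
    for width in widths:
        high = low + width
        windows.append((low, high))
        low = high - overlap  # next window overlaps this one by `overlap` Th
    return windows


# V1: subcycles 1-3 use 20 Th windows, subcycle 4 uses 40 Th, subcycle 5 uses 100 Th,
# with 5 MS2 windows per subcycle (25 windows in total).
v1_widths = [20.0] * 15 + [40.0] * 5 + [100.0] * 5
v1_windows = build_windows(480.0, v1_widths)

# V2: 25 windows of 12.5 Th, then 7 of 25 Th, then 8 of 62.5 Th (40 windows in total).
v2_widths = [12.5] * 25 + [25.0] * 7 + [62.5] * 8
v2_windows = build_windows(480.0, v2_widths)

for name, wins in (("V1", v1_windows), ("V2", v2_windows)):
    print(name, len(wins), "windows from", wins[0][0], "to", wins[-1][1], "m/z")
```

Under this assumed edge convention the upper bound of the last window need not coincide exactly with the stated scan range, which is one reason the layout should be checked against the acquisition method itself.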
Acquisition of Single-Cell Data
Q-Exactive
plexDIA single-cell sets and 100-cell standards were injected at 1 μL volumes via Dionex UltiMate 3000 UHPLC to enable online nLC with a 15 cm×75 μm IonOpticks Aurora Series UHPLC column (AUR2-15075C18A). These samples were subjected to electrospray ionization (ESI) and sprayed into a Thermo Q-Exactive orbitrap for MS analysis. Buffer A is made of 0.1% formic acid (Pierce, 85178) in LC-MS-grade water; Buffer B is made of 80% acetonitrile and 0.1% formic acid mixed with LC-MS-grade water. The gradient used is as follows: 4% Buffer B (minutes 0-2.5), 4%-8% Buffer B (minutes 2.5-3), 8%-32% Buffer B (minutes 3-33), 32%-95% Buffer B (minutes 33-34), 95% Buffer B (minutes 34-35), 95%-4% Buffer B (minutes 35-35.1), then hold at 4% Buffer B until minute 53, flowing at 200 nl/min throughout the gradient. The plexDIA duty cycle consisted of 1 MS1 followed by 4 DIA MS2 windows of variable m/z length (specifically 120 Th, 120 Th, 200 Th, and 580 Th) spanning 378-1402 m/z. Each MS1 and MS2 scan was conducted at 70k resolving power, 3×10⁶ AGC maximum, and 300 ms maximum injection time. Normalized collision energy (NCE) was set to 27% with a default charge of 2. The RF S-lens was set to 80%.
To generate a spectral library from 100-cell standards on the Q-Exactive, the same settings were used with the exception that the duty cycle consisted of 1 MS1 and 25 MS2 windows of variable m/z length (specifically 18 windows of 20 Th, 2 windows of 40 Th, 3 windows of 80 Th, and 2 windows of 160 Th). The MS2 scans were conducted at 35k resolving power, 3×10⁶ AGC maximum, and 110 ms maximum injection time.
timsTOF SCP
The single-cell plexDIA sets were separated on a nanoElute liquid chromatography system (Bruker Daltonics, Bremen, Germany) using a 25 cm×75 μm, 1.6 μm C18 column (AUR2-25075C18A-CSI, IonOpticks, Au). The analytical column was kept at 50° C. Solvent A was 0.1% formic acid in water, and solvent B was 0.1% formic acid in acetonitrile. The column was equilibrated with 4 column volumes of mobile phase A prior to sample loading. The peptides were separated over 30 min at 250 nL/min using the following gradient: from 2% to 17% B in 15 min, from 17% to 25% B in 5 min, 25% to 37% B in 3 min, 37%-85% B in 3 min, maintained at 85% for 4 min.
The timsTOF SCP was operated in dia-PASEF mode with the following settings: Mass Range 100 to 1700 m/z, 1/k0 Start 0.6 V s/cm², End 1.2 V s/cm², ramp and accumulation times were set to 166 ms, Capillary Voltage was 1600V, dry gas 3 l/min, and dry temp 200° C. dia-PASEF settings: Each cycle consisted of 1×MS1 full scan and 5×MS2 windows covering 297.7-797.7 m/z and 0.63-1.10 1/k0. Each window was 100 Th wide by 0.2 V s/cm² high. There was no overlap in either m/z or 1/k0 (
Spectral Library Generation
The in silico predicted spectral library used in LF-DIA analysis was generated by DIA-NN's (version 1.8.1 beta 16) deep learning-based prediction of spectra, retention times (RT), and ion mobilities (IMs) based on Swiss-Prot H. sapiens, E. coli, and S. cerevisiae FASTAs (canonical & isoform) downloaded in February 2022. The spectral library used for plexDIA benchmarking was created in a similar process, with the exception of a few additional commands entered into the DIA-NN command line GUI: 1) {--fixed-mod mTRAQ 140.0949630177, nK}, 2) {--original-mods}. Two additional libraries were generated: 1) an mTRAQ-labeled spectral library from FASTAs containing only E. coli and S. cerevisiae sequences, and 2) an mTRAQ-labeled spectral library from a FASTA containing only H. sapiens sequences; the former was used to search data shown in Fig. S2, and the latter was used to search cell-division-cycle and 100-cell standards. Triplicates of 100-cell standards of PDAC, Melanoma, and U-937 cells were run with the 1 MS1×25 MS2 scans method and searched using the in silico-generated human-only spectral library. The results of this search generated a sample-specific library covering about 5,000 protein groups; this library was used to search single-cell plexDIA sets acquired on the Q-Exactive and on the timsTOF SCP, as well as 100-cell standards run on the Q-Exactive with the same method used to acquire single-cell plexDIA data.
plexDIA Module in DIA-NN
A distinct feature of DIA-MS proteomics is the complexity of the produced spectra, which are a mixture of fragment ions originating from multiple co-isolated precursors. This complexity has necessitated the rise of a variety of highly sophisticated algorithms for DIA data processing. Current DIA software, such as DIA-NN[25], aims to find peak groups in the data that best match the theoretical information about such peptide properties as the MS/MS spectrum, the retention time and the ion mobility. Once identified correctly, the peak group, that is, the set of extracted ion chromatograms of the precursor and its fragments in the vicinity of the elution apex, allows integration of either the MS1- or MS2-level signals to quantify the precursor, which is the ultimate purpose of the workflow.
Similar to match-between-runs (MBR) algorithms, plexDIA data provide the opportunity to match corresponding ions, in this case between the same peptide labeled with different mass tags. However, the use of isotopologous mass tags, such as mTRAQ, allows the retention times to be matched within a run with much higher accuracy than what can be achieved across runs. Thus, the sequence propagation can be more sensitive and reliable than with MBR[7]. This enhances sequence identifications analogously to the isobaric carrier concept introduced by TMT-based single-cell workflows[53,61]. With the isobaric carrier approach, a carrier channel is loaded with a relatively high amount of peptides originating from a pooled sample that facilitates peptide sequence identification[20,28]. We implemented a similar approach in the plexDIA module integrated in DIA-NN. Once a peptide is identified in one of the channels, its exact retention time apex can be determined, which in turn helps identify and quantify the peptide in all of the channels by integrating the respective precursor (MS1) or fragment ion (MS2) signals.
Apart from the identification performance, plexDIA also can increase quantification accuracy. The rich complex data produced by DIA promotes more accurate quantification because of algorithms that select signals from MS/MS fragment ions which are affected by interferences to the least extent[25]. For LF-DIA, DIA-NN selects fragments in a cross-run manner: fragments which tend to correlate well with other fragments across runs are retained, while those which often exhibit poor correlations due to interferences are excluded from quantification. While this approach yields good results, a limitation remains for LF-DIA: fragment ions only affected by interferences in a modest proportion of runs are still used for quantification, thus undermining the reliability of the resulting quantities in those runs. Here plexDIA provides a unique advantage. Theoretically, a single MS1- or MS2-level signal with minimal interference is sufficient to calculate the quantitative ratio between the channels. In this case, if low interference quantification is possible in at least one ‘best’ channel, this quantity can be multiplied by the respective ratios across other channels to obtain accurate estimates of quantities in all channels that share at least one low interference signal with this ‘best’ channel. This idea is implemented in DIA-NN to produce ‘translated’ quantities, which have been corrected by using ratios of high quality MS1 or MS2 signals between channels as described in
Data Analysis with DIA-NN
DIA-NN (version 1.8.1 beta 16), which is available at plexDIA.slavovlab.net and scp.slavovlab.net/plexDIA, was used to search LF-DIA and plexDIA raw files. All LF-DIA benchmarking raw files were searched together with match between runs (MBR) if the same duty cycle was used; likewise, all plexDIA benchmarking raw files were searched together with MBR if the same duty cycle was used, with the exception of the cell-division-cycle experiments, which used V1 and V2 methods; these two runs were searched together.
DIA-NN search settings: Library Generation was set to “IDs, RT, & IM Profiling”, Quantification Strategy was set to “Peak height”, scan window=1, Mass accuracy=10 ppm, and MS1 accuracy=5 ppm; “Remove likely interferences”, “Use isotopologues”, and “MBR” were enabled. Additional commands entered into the DIA-NN command line GUI for plexDIA: 1) {--fixed-mod mTRAQ 140.0949630177, nK}, 2) {--channels mTRAQ, 0, nK, 0:0; mTRAQ, 4, nK, 4.0070994:4.0070994; mTRAQ, 8, nK, 8.0141988132:8.0141988132}, 3) {--original-mods}, 4) {--peak-translation}, 5) {--ms1-isotope-quant}, 6) {--report-lib-info}, and 7) {--mass-acc-quant 5.0}. Note that #7 is only necessary when MS2 quantitation is intended to be used; this command will use the pre-defined mass accuracy (e.g. 10 ppm) to identify precursors, but restrict the mass error tolerance to the value specified for quantitation; this can help reduce the impact of interferences for MS2-level quantitation. For LF-DIA, only the following additional commands were used: 1) {--original-mods}, 2) {--peak-translation}, 3) {--ms1-isotope-quant}, 4) {--report-lib-info}, and 5) {--mass-acc-quant 5.0}. The same search settings were used for single-cell Q-Exactive and timsTOF SCP data; however, ‘scan window’ was increased to 5.
Analysis with MaxQuant, DDA
MaxQuant (version 1.6.17.0) was used to search triplicate mTRAQ DDA, bulk benchmarking runs. MBR was enabled, and ‘Type’ was selected as ‘Standard’ with ‘Multiplicity’=3; mTRAQ-Lys0 & mTRAQ-Nter0, mTRAQ-Lys4 & mTRAQ-Nter4, and mTRAQ-Lys8 & mTRAQ-Nter8 were selected for light, medium, and heavy labels. Variable modifications included Oxidation (M), Acetyl (Protein-N-term); Carbamidomethyl (C) was selected as a fixed modification. Trypsin was selected as the protease, and searched with max. missed cleavage=2.
Quantifying Proteins for Bulk plexDIA Benchmarks
MaxLFQ abundance for protein groups was calculated based on MS1 intensities (specifically the “MS1 Area” column output by DIA-NN) using the DIA-NN R package[25] for data acquired with the V1 method. However, for data acquired using the V2 method, MS2 quantitation (specifically the “Precursor Translated” column output by DIA-NN) was used for quantitation. These protein abundances were used to calculate protein ratios across samples, which were normalized by sub-setting human proteins (which are present in a 1:1 ratio, theoretically) and multiplying by a scalar such that the human protein ratios were centered on 1, and thus the other species (E. coli, S. cerevisiae) would be systematically shifted to account for any small loading differences across samples.
The quantitative comparisons between LF-DIA and plexDIA throughout this article are for intersected sets of proteins so that the results would not be influenced by proteins analyzed only by one method and not the other. For example, compared distributions were for the same set of proteins to avoid “survival biases”[62].
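By way of illustration only, the ratio normalization described above can be sketched as follows. The function name, the species labels, and the toy ratios are hypothetical, and "centered on 1" is assumed here to mean that the median human ratio equals 1; the text does not state which measure of center was used.

```python
from statistics import median


def normalize_ratios(ratios, species, reference="human"):
    """Scale all protein ratios by one scalar so the reference ratios center on 1.

    Assumption: the center is taken as the median of the reference-species ratios.
    """
    reference_ratios = [r for r, s in zip(ratios, species) if s == reference]
    scale = 1.0 / median(reference_ratios)
    # The same scalar is applied to every species, so E. coli and yeast
    # ratios shift together with the human ratios.
    return [r * scale for r in ratios]


# Toy ratios (sample B / sample C); values are invented for the example.
ratios = [1.1, 1.2, 1.3, 4.8, 0.3]
species = ["human", "human", "human", "ecoli", "yeast"]
normalized = normalize_ratios(ratios, species)
print(normalized)
```

Applying a single scalar, rather than normalizing each species separately, preserves the between-species differences that the benchmark is designed to measure.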
Protein-Set Enrichment Analysis (PSEA)
PSEA was performed across the multiplexed bulk samples corresponding to cells sorted by DNA content into cell cycle phases (G1, S, and G2/M). The reference human gene set database was acquired from GOA[63]. The Kruskal-Wallis test was used to determine whether the hypothesis that all multiplexed samples had equivalent median protein abundances for a functionally annotated group of proteins could be rejected at a q value <0.05. Only protein sets with at least 4 proteins present were statistically tested. PSEA was run separately for the multiplexed samples analyzed by V1 and V2 methods. Protein sets were combined from both data-acquisition methods if at least one method produced a q value <0.05.
Differential Protein Abundance Testing
Differential protein abundance testing was performed using precursor-level quantitation. To account for variation in sample loading amounts, precursors from each sample were normalized to their sample-median. Then, each precursor was normalized by its mean across samples to convert it to relative levels. The normalized relative precursor intensities from different replicates were grouped by their corresponding protein groups and compared by a two-tailed t-test (
Relative Protein Fold-Change Between U-937 Cells and Jurkat Cells, Bulk
Protein group abundances were calculated by MaxLFQ from triplicates of LF-DIA and plexDIA; specifically, sample B and sample C were compared to calculate relative fold-changes between the H. sapiens cell lines U-937 and Jurkat. The protein groups plotted were required to be quantified in each of the triplicates of plexDIA and LF-DIA. A Spearman correlation was calculated for all protein groups and for differentially abundant protein groups.
Correcting Isotopic Envelope of plexDIA Precursors
mTRAQ labels, which were used in this demonstration of plexDIA, are separated by 4 Daltons (Da). Because C-terminal arginine precursors are singly-labeled and have a mere 4 Da separating isotopologous precursors, there is greater potential of isotopic envelope interference from lighter channels into heavier channels than there is for C-terminal lysine precursors which would be separated by 8 Da; therefore, to improve quantitative accuracy, we correct the theoretical super-position of isotopic envelopes between channels for C-terminal arginine precursors. This can be accomplished because each precursor has a well-defined theoretical distribution of isotopes that we model with a binomial distribution; we use this theoretical distribution of isotopes to subtract and add back a precise amount of signal from heavier channels to lighter channels for MS1-level quantitation of each precursor.
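By way of illustration only, the envelope correction for singly-labeled (C-terminal arginine) precursors can be sketched as below. This is a simplified model, not the implemented procedure: it assumes a carbon-only binomial envelope with a natural 13C abundance of about 1.07%, treats the channel spacing as exactly 4 isotope positions, corrects only the monoisotopic peak, and interprets "subtract and add back" as moving the estimated spill-over from the heavier channel to the lighter channel that produced it. The function names and the example carbon count are hypothetical.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of 13C (assumed)


def isotope_fraction(n_carbons, k, p=C13_ABUNDANCE):
    """Binomial probability that exactly k of n_carbons atoms are 13C."""
    return comb(n_carbons, k) * p ** k * (1 - p) ** (n_carbons - k)


def correct_channels(observed, n_carbons, spacing=4):
    """Correct monoisotopic MS1 intensities ordered light -> heavy.

    For each heavier channel, the signal expected from the M+4 (or M+8)
    isotopic peak of each lighter channel is subtracted and credited back
    to that lighter channel, so total signal is conserved.
    """
    p0 = isotope_fraction(n_carbons, 0)
    true_mono = []              # interference-free monoisotopic estimates
    corrected = list(observed)
    for j, intensity in enumerate(observed):
        spill_total = 0.0
        for i in range(j):
            ratio = isotope_fraction(n_carbons, (j - i) * spacing) / p0
            spill = true_mono[i] * ratio
            corrected[i] += spill      # add back to the lighter channel
            spill_total += spill
        corrected[j] -= spill_total    # subtract from the heavier channel
        true_mono.append(intensity - spill_total)
    return corrected


# Hypothetical precursor with 60 carbons and observed d0, d4, d8 intensities.
example = correct_channels([1000.0, 520.0, 260.0], n_carbons=60)
print(example)
```

The spill-over ratio grows with the number of carbons, so the correction matters most for long peptides and for channels sitting above a much more abundant lighter channel.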
Extracted Ion Current (XIC)
A precursor from a subset of proteins found to be differentially abundant was selected to be plotted to display the extracted ion current at MS1 and for fragments at MS2. Ion current was extracted using the DIA-NN GUI command interface by typing {--vis 25, PEPTIDE} where “PEPTIDE” is the peptide sequence and “25” is the number of scans to extract. MS1 and MS2 XICs were plotted to show the full elution profile. The four highest correlated fragments at MS2 were plotted; y-ions from C-terminal arginine peptides were excluded from plotting at MS2-level because these fragments are a super-position across samples as the C-terminus of arginine peptides is not labeled, and therefore, not sample-specific. The lines in
Estimating Peptide and Protein Copy Numbers
Precursor copy numbers at the MS1-level were estimated based on the signal-to-noise level (S/N) of individual peaks. The noise levels of centroided spectra were used as reported by the Thermo firmware and extracted using a modified version of the ThermoRawFileParser[64]. Precursors reported by DIA-NN were matched to the S/N data based on the reported retention time with a tolerance of 5 scans and 12 ppm mass error. The number of charges in an orbitrap is proportional to the S/N level and scales with a linear factor CN. This factor has been estimated to be CN=3.5 for the Q-Exactive orbitrap[65,66] and has been confirmed by investigations with high-field orbitraps[49]. This proportionality constant was estimated at a resolving power of 240,000 and must be scaled by the square root of the ratio with the resolving power used for acquiring the spectra (R=70,000). Precursor copy numbers are then calculated based on the number of charges z per precursor.
Analogous to the quantification, copy numbers were summed over the M and M+1 peaks. Peptide-level copy numbers were calculated as the sum of all charge states found for a given peptide; protein-level copy numbers were calculated as the sum of all peptides not shared with other proteins (proteotypic).
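By way of illustration only, the copy-number estimate described above can be sketched as follows. The function names are hypothetical. The direction of the resolving-power scaling, namely multiplying by the square root of 240,000/70,000, is an assumption based on S/N for a fixed number of ions increasing with transient length; the text states only that a square-root ratio is applied.

```python
import math

C_N = 3.5          # charges per unit S/N, as stated for the Q-Exactive
R_REF = 240_000    # resolving power at which C_N was estimated
R_USED = 70_000    # resolving power used for acquiring the spectra


def peak_copies(sn, charge, c_n=C_N, r_ref=R_REF, r_used=R_USED):
    """Estimated ion copies for one isotopic peak.

    Assumption: charges = S/N * C_N * sqrt(r_ref / r_used); dividing by the
    charge state z converts charges to ion copies.
    """
    charges = sn * c_n * math.sqrt(r_ref / r_used)
    return charges / charge


def precursor_copies(sn_m, sn_m1, charge):
    """Sum copies over the M and M+1 peaks of one precursor."""
    return peak_copies(sn_m, charge) + peak_copies(sn_m1, charge)


def peptide_copies(precursors):
    """Sum over all charge states found for a peptide.

    precursors: iterable of (sn_m, sn_m1, charge) tuples.
    """
    return sum(precursor_copies(a, b, z) for a, b, z in precursors)


# Hypothetical peptide observed as 2+ and 3+ precursors.
print(peptide_copies([(10.0, 6.0, 2), (4.0, 2.0, 3)]))
```

Protein-level copy numbers would then be obtained by summing `peptide_copies` over the proteotypic peptides of each protein.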
Single Cell Data Analysis
To increase sensitivity of single-cell analysis, Ms1.Extracted quantities output by DIA-NN were used for quantitation rather than Ms1.Area. Single cells with more than 60% missing data (no extracted MS1-level quantitation) at precursor-level were considered to have failed in sample preparation and were removed from analysis. Quantitative accuracy of single-cell sets was assessed by calculating fold-change between PDAC and U-937 cell-types of averaged single-cell MaxLFQ protein quantities and calculating a Spearman correlation to 100-cell bulk comparisons. The 100-cell bulk comparisons consisted of triplicates in which each replicate alternated the labeling scheme. For a protein group to be included in the comparison, it was required to be quantified in at least 5 single cells and in ⅔ of the bulk triplicates. Both the timsTOF SCP single-cell data and QE single-cell data were benchmarked to the same 100-cell QE-acquired plexDIA sets. Because missing data in DIA is related to low protein abundance, the missing MaxLFQ protein abundances in single cells and bulk were imputed with the lowest non-zero protein abundance for that protein in the same cell-type and condition (bulk or single cells). The mean of each protein across the single-cell observations and bulk triplicates (respectively) was taken to represent that cell-type and condition-specific protein abundance.
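By way of illustration only, the cell-filtering and imputation rules described above can be sketched as follows. The function names are hypothetical, and missing values are represented as `None`, which is an assumption about the data layout rather than a feature of the DIA-NN output.

```python
def keep_cell(precursor_values, max_missing=0.60):
    """Keep a cell unless more than 60% of its precursor quantities are missing."""
    missing = sum(v is None for v in precursor_values)
    return missing / len(precursor_values) <= max_missing


def impute_lowest_nonzero(values):
    """Fill missing abundances of one protein within one cell type and condition.

    Missing entries (None) are replaced by the lowest non-zero abundance
    observed for that protein in the same group; if nothing was observed,
    the values are returned unchanged.
    """
    observed = [v for v in values if v is not None and v > 0]
    if not observed:
        return list(values)
    floor = min(observed)
    return [floor if v is None else v for v in values]


def group_mean(values):
    """Mean of the (imputed) abundances representing a cell type and condition."""
    return sum(values) / len(values)


# Hypothetical abundances of one protein across five U-937 single cells.
cells = [None, 5.0, 2.0, None, 3.0]
filled = impute_lowest_nonzero(cells)
print(filled, group_mean(filled))
```

Imputing with the group minimum, rather than with zero, reflects the stated rationale that missing values in DIA tend to arise from low abundance rather than true absence.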
Single-cell sets acquired on the timsTOF SCP and QE were prepared on different days with different batches of cells. Generally, the data is quite similar as indicated by PCA
100-cell bulk plexDIA triplicates were used to identify proteins which are differentially abundant between U-937 and PDAC cells. Three proteins were chosen, and one precursor from each protein was selected to have its ion-current extracted and plotted from single-cell Q-Exactive acquired data. Please see the “Extracted ion current (XIC)” subsection for more details about how this is performed.
PCA was performed on Ms1.Extracted timsTOF SCP single-cell, Q-Exactive single-cell, and Q-Exactive 100-cell data. The following is a brief outline of the computational workflow: the abundance of each precursor was divided by the mean abundance of all 3 isotopologous precursors within the plexDIA set; then, the precursors of each labeled cell in each plexDIA set were normalized to their median abundance; then, each normalized precursor was divided by the mean normalized abundance of that precursor across all labels and sets. These normalized precursor abundances were collapsed to the protein group level by taking the median normalized precursor abundance. The protein group data were then normalized in the same way as the precursors. Missing protein group data for each cell were imputed by K-nearest-neighbors; the data set was batch-corrected; and finally, a weighted PCA was generated from the data, as was previously described[50].
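The normalization steps outlined above can be sketched as follows. This is a minimal illustration with a hypothetical data layout, a dictionary keyed by (set, label, precursor) that holds only observed values, so each mean and median is taken over the values present. K-nearest-neighbors imputation, batch correction, and the weighted PCA are outside the scope of the sketch.

```python
from collections import defaultdict
from statistics import mean, median


def normalize_precursors(abundance):
    """Apply the three normalization steps to observed precursor abundances.

    abundance: dict {(set_id, label, precursor): value} (hypothetical layout).
    """
    # Step 1: divide by the mean of the isotopologous precursors in the set.
    by_set_precursor = defaultdict(list)
    for (set_id, _label, precursor), value in abundance.items():
        by_set_precursor[(set_id, precursor)].append(value)
    step1 = {
        key: value / mean(by_set_precursor[(key[0], key[2])])
        for key, value in abundance.items()
    }

    # Step 2: divide each labeled cell by the median of its precursors.
    by_cell = defaultdict(list)
    for (set_id, label, _precursor), value in step1.items():
        by_cell[(set_id, label)].append(value)
    step2 = {
        key: value / median(by_cell[(key[0], key[1])])
        for key, value in step1.items()
    }

    # Step 3: divide by the mean of that precursor across all labels and sets.
    by_precursor = defaultdict(list)
    for (_set_id, _label, precursor), value in step2.items():
        by_precursor[precursor].append(value)
    return {
        key: value / mean(by_precursor[key[2]]) for key, value in step2.items()
    }


def collapse_to_protein(normalized, precursor_to_protein):
    """Median of normalized precursor abundances per (set, label, protein)."""
    grouped = defaultdict(list)
    for (set_id, label, precursor), value in normalized.items():
        grouped[(set_id, label, precursor_to_protein[precursor])].append(value)
    return {key: median(values) for key, values in grouped.items()}
```

Because the protein group data are described as being normalized in the same way, the output of `collapse_to_protein` could be passed back through `normalize_precursors` with protein group identifiers taking the place of precursor identifiers.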
The relevant teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.
While example embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 63/264,237, filed on Nov. 17, 2021 and U.S. Provisional Application No. 63/209,235, filed on Jun. 10, 2021. The entire teachings of the above applications are incorporated herein by reference.
This invention was made with government support under Grant Number GM123497 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63264237 | Nov 2021 | US
63209235 | Jun 2021 | US