1. Technical Field
The present disclosure relates, in general, to a method of analyzing a protein using a mass spectrometer, and more particularly to a method of analyzing a protein, which comprises analyzing a protein by data-independent analysis (DIA or MSE) and verifying the analyzed protein by data-dependent analysis (DDA).
2. Related Art
Proteomics is a field of study that aims to identify, characterize, and quantify proteins that are expressed in cells or tissues. Proteomics began with the rapid development of mass spectrometry since the 1990s, together with the construction and availability of databases of protein amino acid sequences.
In comparison with conventional protein biochemistry that has been used to analyze individual proteins, proteomics is very different in terms of the volumes of targets, speeds, the automation of separation means, and the use of genomic/proteomic database information. Because proteomics is a large-scale, multi-stage, high-speed analysis technique that investigates total intracellular protein, it can be applied to investigate the expression, function, structure, and posttranslational modification (PTM) of proteins and protein-protein interactions, and thus it is more complex than genomics and involves a huge amount of data. Proteomics allows the analysis and understanding of the physiological changes, binding properties, and functions of cells. Thus, proteomics can be used to analyze protein isoforms, post-translational modifications such as phosphorylation, binding partners, etc., which cannot be found based on genetic information alone, and thus it can be used to analyze the mechanism of development of diseases and diagnose or treat diseases.
Generally, in proteomics, a protein mixture isolated from cells is digested by a specific method to produce peptides, which are then subjected to mass spectrometry to obtain their mass spectra, and the mass spectral information is compared with an existing database, thereby analyzing the proteins quantitatively and qualitatively. In other words, data obtained from mass spectrometry are compared with data predicted by hypothetical fragmentation of the protein sequences registered in databanks (NCBI, EXPASY, ETS, etc.), thereby identifying the proteins present in the sample. This approach is very useful, because gene information can be obtained by searching genome and gene sequence databases, and the amount of protein information registered in databanks is increasing geometrically.
A mass spectrometer is named according to its ionization source and its mass analyzer (detector). Methods that are typically used to ionize sample proteins or peptides include electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI). ESI is a method of ionizing liquid samples and is easily coupled directly to liquid chromatography separation. MALDI comprises mixing a matrix with a sample, drying the mixture to form a crystal, and ionizing the crystal with a laser.
Mass analyzers that are currently widely used include an ion trap analyzer, a time-of-flight (TOF) analyzer, a quadrupole (Q) analyzer and a Fourier transform ion cyclotron resonance (FT-ICR) analyzer, which are used alone or in a combination of two or more thereof (tandem mass spectrometer).
Among tandem mass spectrometers, a triple-quadrupole mass spectrometer consists of three quadrupole analyzers (Q1, Q2 and Q3) connected in tandem. In the central quadrupole analyzer (Q2), injected neutral gas collides with sample ions to fragment the ions. The triple-quadrupole analyzer is operated in two modes: a scan mode and a fragmentation mode. In the scan mode, only the Q1 analyzer is operated so that ions of all m/z values are recorded, and it is possible to perform the mass analysis of all ions within 1 sec. In the fragmentation mode, Q1, Q2 and Q3 are all used. In Q1 (mass filter), the voltage applied to the quadrupole is controlled such that only ions having a predetermined m/z value (or range) pass through Q1, and the passed ions enter a collision chamber (Q2). The ions that enter the collision chamber are fragmented by collision with argon gas. The fragment ions enter Q3, where they are separated by mass-to-charge ratio, and the results are recorded by the detector.
A data-dependent analysis (DDA) method is carried out using this triple-quadrupole analyzer. The DDA method comprises obtaining mass-to-charge (m/z) values for all peptide ions in a sample in a scan mode, fragmenting the peptides in a fragmentation mode (MS/MS), and obtaining mass-to-charge (m/z) values for the fragmented ions. Herein, MS and MS/MS scans are alternated to produce data (spectra).
The DDA method has an advantage in that, if accurate information about retention time and mass value (m/z) is input, only a substance in a sample corresponding to the input information can be analyzed. However, it has a disadvantage in that peptides giving intense ion signals are preferentially analyzed, and thus a peptide present in a small amount may go unanalyzed because it is never fragmented.
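The bias toward abundant ions described above can be illustrated with a minimal sketch. It assumes a simplified, intensity-ranked selection rule in which only the `top_n` most intense ions of a survey scan are sent for fragmentation; the class name, function name, and all numeric values are hypothetical and are not taken from any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecursorIon:
    mz: float         # mass-to-charge ratio observed in the survey (MS) scan
    intensity: float  # signal intensity in arbitrary units


def select_precursors(survey_scan, top_n):
    """Return the top_n most intense ions (simplified DDA selection rule)."""
    ranked = sorted(survey_scan, key=lambda ion: ion.intensity, reverse=True)
    return ranked[:top_n]


# Hypothetical survey scan: three abundant ions and one trace-level ion.
survey_scan = [
    PrecursorIon(mz=445.12, intensity=9.0e5),
    PrecursorIon(mz=512.30, intensity=7.5e5),
    PrecursorIon(mz=623.84, intensity=4.1e5),
    PrecursorIon(mz=701.40, intensity=2.0e3),  # trace peptide
]

selected = select_precursors(survey_scan, top_n=3)
selected_mz = [ion.mz for ion in selected]
```

In this sketch the trace-level ion at m/z 701.40 is never selected, which corresponds to the disadvantage noted for the DDA method.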
In recent years, as a methodology for obtaining peptide information, which has a concept different from the DDA method, a data-independent analysis method (high/low collision energy MS; MSE) has been proposed in which high collision energy and low collision energy are applied at the same time. This MSE method is also carried out using the triple-quadrupole analyzer. The MSE method comprises causing all peptides passed in unit time to collide with collision gas so as to be fragmented, and combining the information about the mixed peptide fragments with retention time in liquid chromatography and the patterns of obtained mass values, thereby producing MS/MS spectral information to be used for analysis.
This MSE method is more advantageous than the DDA method for analyzing a peptide present in a relatively small amount, because it produces peptide fragments regardless of the observed peak height (intensity) of the ions. The MSE method has shortcomings in that proteins can be analyzed only by ProteinLynx Global Server (PLGS) of Waters Inc. and in that the method is not suitable for MASCOT and similar tools, which are most frequently used by researchers. Nevertheless, the MSE method has a powerful advantage in that it can analyze even a protein that is present in a trace amount in a sample. For example, it is thought that 23 kinds of proteins account for 98% of blood protein, and biomarkers of interest are present in the remaining 2%. In order to analyze these trace proteins, a process of removing the abundant proteins to concentrate the trace proteins is required. However, blood samples cannot be obtained in large amounts, and thus there is a limit to how far blood samples can be concentrated. Also, membrane proteins are contaminated with intracellular proteins present in large amounts, which interfere with analysis of the membrane proteins. Despite the development of various methods, the analysis and verification of trace proteins (and membrane proteins) remain difficult to perform.
There is thus a need for a new method of analyzing a protein.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known in this country to a person of ordinary skill in the art.
The present invention aims to provide a method of using the MSE and DDA methods in a new way that can analyze and verify chemical changes in trace proteins. For example, as shown in
In one aspect, the present invention provides a method of quantification and qualification of a protein(s), the method comprising the steps of: (A) pre-treating at least one protein or a mixture containing at least one protein to obtain peptides; (B) obtaining information about retention times and mass values of the obtained peptides by performing data-independent analysis using a liquid chromatography-mass spectrometer (LC-MS); (C) searching a first database (e.g., PLGS) on the basis of the information obtained in step (B) to quantify and qualify a target protein or proteins; (D) extracting information about the quantified and qualified target protein or proteins; (E) obtaining information about retention times and mass values by performing data-dependent analysis using an LC-MS from the extracted information of step (D); (F) searching a second database (e.g., MASCOT) on the basis of the information obtained in step (E) to further quantify and qualify the target protein or proteins; and (G) comparing the search results of steps (C) and (F) to verify the quantification and qualification.
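One simple reading of the comparison in step (G) is a set comparison of the protein accessions reported by the two searches. The following is a minimal sketch under that assumption; the function name and the accession strings are hypothetical, and the disclosure does not prescribe this particular comparison.

```python
def verify_identifications(dia_hits, dda_hits):
    """Split protein accessions into verified and unverified groups.

    dia_hits: accessions reported by the first search (step C).
    dda_hits: accessions reported by the second search (step F).
    """
    dia, dda = set(dia_hits), set(dda_hits)
    return {
        "verified": sorted(dia & dda),  # reported by both searches
        "dia_only": sorted(dia - dda),  # not confirmed by the DDA run
        "dda_only": sorted(dda - dia),  # reported only by the DDA run
    }


# Hypothetical accessions, for illustration only.
result = verify_identifications(
    dia_hits=["P00001", "P00002", "P00003"],
    dda_hits=["P00002", "P00003", "P00004"],
)
```

A fuller comparison could also contrast scores or quantities for the accessions in the "verified" group; that is omitted here.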
This invention may comprise an additional step of selecting a protein or a protein group of interest with reference to a protein database before the step (C). In this case, preferably, as the database in step (C), a database allowing for time-efficient analysis may be used.
In another aspect, the present invention provides a program for performing said methods for quantitatively and qualitatively analyzing a protein and a storage medium storing the program.
In the present invention, the mass spectrometer may, preferably, be a triple-quadrupole mass spectrometer.
In the present invention, the protein may be a trace protein present in a cell, for example, a membrane protein. Also, the protein may be a post-translationally modified (PTM) protein, for example, a cysteine-containing protein.
The above and other aspects and features will be further described.
Hereinafter, the present invention will be described in further detail with reference to examples. However, these examples are intended to illustrate rather than limit the technical idea and scope of the present invention. It will be obvious to those skilled in the art that various modifications are possible within the scope of the technical idea of the present invention.
The term “include list” used herein is defined as a list including information about a particular set of retention times and mass values of peptides obtained from at least one protein or a mixture containing at least one protein, and this information will be used in a DDA mode to analyze a target protein or proteins.
From information obtained by MSE analysis of proteins, the retention times and mass values of the peptides that were used to analyze the target protein or proteins were taken, and a program capable of easily making an “include list” was constructed. The “include list” to be used in verification will vary depending on the meaning imparted to the proteins obtained from MSE analysis. As such, a proper include list can be constructed according to the target protein or proteins. For example, as described below, if a target protein is a cysteine-containing protein or a membrane protein, a proper include list tailored to the target protein can be constructed.
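The construction of a tailored include list can be sketched as a filter over peptide-level records. The following is a minimal sketch, not the program referred to in this disclosure; the record layout, field names, protein identifiers, and all numeric values are hypothetical placeholders (the masses are not the true masses of the example sequences).

```python
from typing import Callable, NamedTuple


class PeptideRow(NamedTuple):
    protein_id: str
    sequence: str
    retention_time_min: float
    monoisotopic_mass: float


def build_include_list(rows, is_target: Callable[[PeptideRow], bool]):
    """Return sorted, de-duplicated (retention time, mass) pairs for target peptides."""
    pairs = {
        (row.retention_time_min, row.monoisotopic_mass)
        for row in rows
        if is_target(row)
    }
    return sorted(pairs)


# Placeholder records standing in for peptide information from an MSE run.
rows = [
    PeptideRow("prot_A", "ACDEFK", 21.4, 700.0),
    PeptideRow("prot_B", "GLSDGK", 33.0, 800.0),
    PeptideRow("prot_C", "MNPCR", 45.7, 900.0),
]

# Criterion 1: peptides whose sequence contains cysteine ("C").
cysteine_list = build_include_list(rows, lambda row: "C" in row.sequence)

# Criterion 2: peptides from proteins in a (hypothetical) predicted membrane set.
predicted_membrane_ids = {"prot_B"}
membrane_list = build_include_list(
    rows, lambda row: row.protein_id in predicted_membrane_ids
)
```

Passing the selection criterion as a function keeps the list-building step unchanged when the target changes from cysteine-containing proteins to membrane proteins.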
The following examples illustrate the invention and are not intended to limit the same.
If a test to be carried out is a test for observing a specific chemical change in the amino acid cysteine, proteins containing cysteine can be selected from the protein information obtained from MSE, and the peptide information that has been used to analyze the proteins can be collected, thus producing an “include list”.
(1) Protein Pretreatment
Many proteins have an S-S covalent bond between cysteine residues. Under specific conditions, i.e., pathogenic conditions, the S-S bond breaks. To confirm this, a protein was covalently bonded with two chemical substances to make a sample. When the sample was treated with iodoacetamide, there was a change in mass of +57.02 Da in cysteine, and when the sample was treated with N-ethylmaleimide (NEM), there was a change in mass of +111.03 Da in cysteine.
After being treated with iodoacetamide, the protein sample was treated with DTT (dithiothreitol) to break the S-S bond. Then, the protein sample was treated with NEM, whereby a protein in which the S-S bond was originally broken could be distinguished from a protein in which the S-S bond was not originally broken.
(2) Data-Independent Analysis and Database Search
The sample was analyzed in a nano-HPLC-MSE mode composed of Nano-HPLC connected with Synapt HDMS tandem mass spectrometry (Waters). The analysis was performed in the following conditions:
The test was performed three times. The raw data obtained from the test was processed in PLGS to search proteins using the sprot database in an automatic mode with peptide tolerance and fragmentation tolerances.
(3) Preparation of EMRT Table and Determination of “Include List”
From the EMRT information produced by the MSE test, the retention times and monoisotopic masses of peptides for proteins containing cysteine were calculated to prepare an “include list” (see
(4) Data-Dependent Analysis
The “include list” was applied to the DDA mode to obtain the results of total ion chromatography (TIC) as shown in
In
(5) Database Search (Verification)
The search was performed in the protein database IPI_mouse_v3.44.fasta using the MASCOT v 2.2 program, with carbamidomethylation (C) and N-ethylmaleimide as variable modifications, at a peptide tolerance of 100 ppm and an MS/MS tolerance of 0.2 Da (
In
As can be seen in
In this Example, information obtained from data-independent analysis was used to verify trace proteins, and the method of this Example can provide a good method capable of more accurately obtaining information about the chemical modification of proteins.
Analyzing membrane proteins of industrial and scientific importance using a mass spectrometer is difficult due to their relatively small amounts. Accordingly, in the present invention, membrane proteins present in relatively small amounts were analyzed by the data-independent analysis method, and only information about the membrane proteins was extracted such that the membrane proteins could be analyzed by data-dependent analysis, whereby the membrane proteins could be analyzed and verified with higher reliability.
If proteins to be analyzed are membrane proteins, it is possible to use a method comprising predicting membrane proteins using a protein database and then producing an “include list” in comparison with the list of the predicted membrane proteins. From this Example, it can be seen that the present invention can be applied to analyze a mixture of proteins present in relatively small amounts.
(1) Database Search and Prediction of Membrane Proteins
The Synechocystis protein database includes information about a total of 3661 proteins. From this protein information, information about a total of 706 membrane proteins was extracted using TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP/).
The extracted information about the membrane proteins was stored in the form of a text file as follows.
(2) Data-Independent Analysis and Database Search
A sample was analyzed in a nano-HPLC-MSE mode composed of Nano-HPLC connected with Synapt HDMS tandem mass spectrometry (Waters). The analysis was performed under the following conditions:
The test was performed three times. The resulting raw data including information about peptide fragments were processed in PLGS to search proteins using the sprot database. The proteins were searched under the following conditions: fragment tolerance: 100 ppm, MS/MS tolerance: 0.1 Da, enzyme: trypsin, missed cleavages: 1, fixed modification: carbamidomethylation (C), variable modification: oxidation (M).
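Because a tolerance stated in ppm is relative while a tolerance stated in Da is absolute, it can help to convert one into the other when reading search conditions such as those above. The following is a minimal sketch of that arithmetic; the function names and the example m/z values are hypothetical.

```python
def ppm_to_absolute(mz, ppm):
    """Convert a relative tolerance in ppm into an absolute window at a given m/z."""
    return mz * ppm / 1e6


def within_ppm(observed_mz, theoretical_mz, ppm):
    """True if the observed value lies within +/- ppm of the theoretical value."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_absolute(theoretical_mz, ppm)


# A 100 ppm window is narrower at low m/z than at high m/z.
window_at_500 = ppm_to_absolute(500.0, 100.0)
window_at_1000 = ppm_to_absolute(1000.0, 100.0)

match_close = within_ppm(1000.08, 1000.0, 100.0)  # deviation of about 0.08
match_far = within_ppm(1000.15, 1000.0, 100.0)    # deviation of about 0.15
```

Under this arithmetic, 100 ppm corresponds to an absolute window of about 0.05 at m/z 500 and about 0.1 at m/z 1000.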
(3) Preparation of EMRT Table and Determination of “Include List”
In order to analyze membrane proteins of interest by comparing the gene indices predicted as membrane proteins with the independent data (EMRT table), the retention times and monoisotopic masses of the peptides used, together with their order, were extracted to produce an “include list” such that data-dependent analysis could be performed.
A program for automatic production of the “include list” is illustrated below.
<Example of Program for Production of “Include List”>
Using the above-prepared program, an “include list” having a peptide distribution as shown in
(4) Data-Dependent Analysis
Data-dependent analysis was performed under the following conditions:
The LC developing solvent and flow rate used in the data-dependent test were the same as those used in the data-independent test. 5 μl of each of the samples was injected through an autosampler, and desalted and concentrated in a C18 trapping column. As an internal standard, 100 fmol/ml Glu-fibrinopeptide B was injected at a rate of 600 nL/min and ionized. Mass spectrometry was programmed such that a region of m/z 50-1990 was scanned in the V mode and a maximum of 3 precursor ions were fragmented.
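How an “include list” can steer precursor selection toward trace peptides may be sketched as follows. This is a simplified model under stated assumptions, not the instrument's actual acquisition logic: include-list entries are assumed to be expressed as precursor m/z values, the matching tolerances (`ppm`, `rt_window_min`) are hypothetical, and all ion values are invented for illustration. Only the limit of three precursor ions is taken from the conditions above.

```python
from typing import NamedTuple


class IncludeEntry(NamedTuple):
    mz: float                  # target precursor m/z (assumed already charge-adjusted)
    retention_time_min: float  # expected elution time


class ObservedIon(NamedTuple):
    mz: float
    retention_time_min: float
    intensity: float


def matches(ion, entry, ppm, rt_window_min):
    """True if the ion agrees with the entry in both m/z and retention time."""
    mz_ok = abs(ion.mz - entry.mz) <= entry.mz * ppm / 1e6
    rt_ok = abs(ion.retention_time_min - entry.retention_time_min) <= rt_window_min
    return mz_ok and rt_ok


def select_from_include_list(ions, include_list, ppm=100.0,
                             rt_window_min=1.0, max_precursors=3):
    """Keep only ions on the include list, then take the most intense ones."""
    eligible = [
        ion for ion in ions
        if any(matches(ion, entry, ppm, rt_window_min) for entry in include_list)
    ]
    eligible.sort(key=lambda ion: ion.intensity, reverse=True)
    return eligible[:max_precursors]


include_list = [IncludeEntry(600.30, 25.0), IncludeEntry(750.40, 25.2)]
ions = [
    ObservedIon(445.12, 25.1, 9.0e5),  # abundant, but not on the list
    ObservedIon(600.31, 25.1, 3.0e3),  # trace ion matching the first entry
    ObservedIon(750.40, 40.0, 5.0e4),  # right m/z, wrong retention time
    ObservedIon(750.41, 25.3, 1.0e4),  # matches the second entry
]

selected_mz = [ion.mz for ion in select_from_include_list(ions, include_list)]
```

In this sketch the abundant ion at m/z 445.12 is excluded because it is absent from the list, while the trace ion at m/z 600.31 is selected.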
(5) Database Search (Verification)
Membrane proteins were analyzed by both the method according to the present invention (MSE-DDA analysis method) and the prior art methods (MSE and DDA analysis methods) (
As can be seen from the graphs in
It was found that proteins, which were analyzed in the MSE method (x-axis), but not analyzed in the MSE-DDA method, were distributed in small amounts. It is considered that the reliability of analysis by the MSE method is lower because there is no or less accurate information about peptide analysis.
As a specific example,
As described above, according to the present invention, the results analyzed by the existing data-independent analysis method are compared with pre-calculated biological information to obtain information about peptides to be analyzed. Also, the obtained information is substituted into a data-dependent analysis mode to produce desired peptide fragments that can be used to analyze and verify a protein.
According to the MSE-DDA analysis method, more accurate peptide information is available, so that more peptides contribute to the analysis of a specific protein, and an increase in the protein score can be seen. Because a higher protein score indicates higher reliability of the analysis of that protein, verification of proteins by the MSE-DDA method can be useful. According to the method, a modified protein and a trace protein present in a sample can be easily detected and quantitatively and qualitatively analyzed. Thus, the present invention is very useful in cell signaling studies, drug development, etc.
The invention has been described in detail with reference to preferred embodiments thereof. However, it will be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2009-48024 | Jun 2009 | KR | national |
This is a continuation of International Application No. PCT/KR2010/002745, with an international filing date of Apr. 30, 2010, which claims the benefit of Korean Application No. 10-2009-48024 filed Jun. 1, 2009, the entire contents of which are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/KR2010/002745 | Apr 2010 | US |
Child | 13309038 | US |