1. Field of the Invention
The present invention relates to the field of biomolecule separation and purification, to assays that identify protein complexes from whole cells, and to high-throughput devices for carrying out such methods.
2. Related Art
Most proteomics analyses employ a basic approach that combines a multi-dimensional separation scheme with a protein identification technique involving mass spectrometry. This reflects the simple fact that complex mixtures of proteins must be well separated and simplified prior to identification, and only mass spectrometry has adequate sensitivity and throughput for such analysis. Polyacrylamide gel electrophoresis (PAGE) is one of the most important and effective protein separation techniques and has been incorporated in various multi-dimensional schemes. For example, traditional two-dimensional gel electrophoresis (2DE) has been established for many years as a robust method for separation of complex protein lysates, especially the soluble proteins, obtained from cells. The power of 2DE stems from the fact that both isoelectric focusing (IEF) and SDS gel electrophoresis are highly resolving, dynamic, and orthogonal chromatographic techniques. Two-dimensional blue native/SDS gel electrophoresis (2D BN/SDS-PAGE) has been the “method of choice” for intact membrane proteins and protein complexes.
Similarly, gel electrophoresis is a main tool for protein purification involving “tagging” strategies. However, a major limitation of PAGE, compared with some liquid chromatography techniques, is that it is not readily interfaced with mass spectrometry because it is inconvenient to get separated protein bands out of the gel for further processing. To harvest separated proteins, one must cut them out or elute them. Large scale “gel-cutting” has been used in proteomics and protein interaction projects; it is effective but labor intensive. Gel-cutting robots exist, for instance the “Ettan Spot Picker” from GE Healthcare's Life Sciences (Piscataway, N.J.), but their accessibility is generally limited. For elution, there are at least two types of elution devices on the market, the “Model 491 Prep Cell” (also described in U.S. Pat. No. 4,877,510, hereby incorporated by reference) and the “Whole Gel Eluter”, both from Bio-Rad Co. (Hercules, Calif.). They are not widely used, partly because they can process only one or two samples at a time, sample loss is significant, and they are difficult to operate. The “Mini Prep Cell” or “Model 491 Prep Cell” can be used to purify specific proteins or nucleic acids from complex mixtures by continuous-elution electrophoresis. Separated protein bands migrate off the bottom of the gel column, and a complicated but not very effective elution chamber traps and collects the bands. Besides its poor efficiency, it is not suitable for eluting and collecting bands over a wide dynamic range of protein mass, nor for parallel multi-channel processing. Another device, the “Whole Gel Eluter”, allows simultaneous elution of multiple bands of proteins already separated by a slab gel; the elution is in the transverse direction, through the thickness of the gel. The collection and recovery of eluted protein bands are difficult and ineffective.
In addition, the experimental set-up procedure of this device is tedious and technically challenging, and it can process only one sample per run.
It is essential to find an effective way to collect eluted bands. Important issues regarding fraction collection include interruption of electrophoresis, continuity of collection, loss of sample, loss of resolution and dilution, etc. Previously several approaches, although primarily designed for capillary electrophoresis (CE), had been investigated. Three approaches of particular interest are: (1) sweep liquid: liquid was swept through a standard liquid chromatographic detector, resulting in continuous and uninterrupted collection but sample dilution; (2) on-line frit: a frit structure was attached to the outlet end of the capillary to isolate electrical conductivity from the elution buffer flow (Huang, X.; Zare, R. N. Anal. Chem. 1990, 62, 443-446); and (3) coaxial sheath flow: a sheath flow was built around the capillary to confine the sample flow but also provide the electrical connection to ground electrode (Muller, O.; Foret, F; Karger, B. Anal. Chem. 1995, 67, 2974-2980). Each of these methods has its own advantages and disadvantages.
In recognition of this need in proteomics, the US Department of Energy's program has established as a major goal the development of very high throughput methods to characterize the structures and functions of protein complexes in microbes relevant to its mission. In the Protein Complex Analysis Project (PCAP), our high throughput pipelines include methods that employ various “tagging” strategies and the 2D BN/SDS-PAGE approach; a common step in these methods is the use of SDS-gels for protein separation and purification, followed by protein identification by mass spectrometry.
It would be a significant advance in the field of proteomics if a high throughput gel-eluting tool were developed to interface with mass spectrometry. Genome sequencing projects have identified the complete set of proteins for many organisms. To take advantage of this information, the function of these proteins must now be determined. Many proteins are components of homomeric or heteromeric protein complexes, and their activity depends on the presence of the other polypeptides in the complex (Alberts, 1998). Protein complexes are further organized into pathways and interact with other macromolecular complexes. In addition, the composition, stoichiometries, and structures of complexes can be influenced by environmental change. Thus, to correctly determine the functions of all gene products and how they are regulated, it is essential to identify the interactions between individual proteins and thoroughly characterize complexes.
Two principal methods have been used to identify the physical interactions between proteins on a genome-wide basis: two-hybrid screens and tandem affinity purification (TAP) followed by mass spectrometry. Each method has its strengths and weaknesses.
Two-hybrid screening is a genetic assay that measures the interaction of two proteins expressed as heterologous fusions in yeast cells. These screens have a higher throughput than TAP and can detect transient interactions with dissociation constants in the μM range. But they cannot detect interactions that require more than two proteins; they have a false positive rate of 50%-90%, even when testing yeast proteins in yeast cells (van Merring et al, 2002; Edwards et al, 2002); and they do not give rise to a pure sample of protein complex.
TAP is a biochemical method in which a protein subunit is tagged with two separate affinity tags separated by a protease cleavage site (Puig et al, 2001). The tagged protein is expressed in vivo at natural or close to natural levels and—after cell lysis—complexes with the double tagged protein are purified over two affinity columns, resulting in very pure complex preparations. The identity of the co-purifying polypeptides is then determined by mass spectrometry. While this method cannot detect protein/protein interactions in the μM range, it has been used to detect hundreds of stable protein complexes in yeast and E. coli (Butland et al, Nature 2005, 433, 531-537). Compared to a curated set of known protein complexes, only 15% of expected interactions were not found in an analysis of about ¼ of the yeast proteome (Edwards et al, 2002), and the false positive rate of this method is regarded as being lower than that for two hybrid screens. To date this method has been limited to the analysis of heteromeric complexes, but with the addition of a simple characterization step, it could be used to detect higher multimeric homomeric complexes as well.
Despite the strength of TAP, it suffers from three deficiencies that are especially problematic given the needs of the DOE's Genomics: GTL project.
The DOE has identified a range of bacteria whose molecular pathways and regulatory networks it wishes to enumerate and model, which will in turn open the way for the use of these organisms for bioremediation and energy production. Many of these organisms, however, cannot yet be modified by genetic or recombinant techniques, and thus a method that does not require genetic/molecular manipulation of the organism would be a great advantage.
The TAP strategy requires that, for each protein tagged, a separate strain of bacteria be cultured, an extract prepared, and the protein purified, which is intrinsically laborious.
A major objective of the Genomics: GTL program is to characterize the changing interactions between proteins as the local conditions experienced by bacteria alter, and a key part of this project's goal is to analyze stress-induced changes in complexes. Comparing changes between environmental conditions for many complexes would require enormous precision and reproducibility in the growth conditions used for each strain, which would be difficult to achieve.
Thus, there is a need for a different approach that does not require the use of affinity tags, has the potential of much higher throughputs, and purifies complexes from a single large culture of cells.
Historically, stable protein complexes were identified one at a time, often as the result of purifying an enzyme activity of interest. In this traditional approach, complexes were inferred when multiple polypeptides co-migrated with an associated enzyme activity through multiple chromatographic separation steps,2-4 or demonstrated the same sedimentation velocities5,6 or electrophoretic mobilities.7 More recently, stable protein complexes have been identified using high throughput mass spectrometric detection of collections of polypeptides that are stably associated with heterologous affinity-tagged polypeptides.8 In particular, tandem affinity purification9-12 has proven to be highly effective in mapping the soluble portion of the yeast13, 14 and E. coli15 interactomes. Despite its undoubted utility, however, TAP suffers from several limitations. For example, this method is restricted to biological systems that are amenable to the genetic manipulations required to introduce the affinity tagged polypeptides into cells. Furthermore, the addition of an affinity tag may destabilize some protein-protein interactions or alter other relevant protein activities. Finally, TAP requires a distinct genetic strain to be constructed for each polypeptide, and each strain must then be separately cultured and analyzed. These and other limitations suggest that it may prove difficult to automate this strategy to achieve higher throughput than has already been attained.
The present invention provides a method of protein separation and analysis and a multi-channel gel electrophoresis instrument, capable of high resolution separation, and fast and continuous fraction collection over broad mass or size ranges. The invention employs a strategy of using multiple, short linear gels to achieve separation power similar to a long gradient gel. The fraction is then eluted in continuous and parallel fashion. The method works particularly well on SDS-gels.
Thus, in one aspect, the invention provides a multi-channel gel electrophoresis apparatus for efficiently collecting molecules isolated by gel electrophoresis so they can be further analyzed, identified, or used as reagents or medications. The device collects biomolecules (protein, DNA, RNA, or pieces thereof) as they migrate off the bottom of gels. It uses a combination of fluid dynamics, electromotive forces, and gravity to increase the efficiency, concentration, and speed at which the bands of molecules are eluted into collecting wells. The device uses multiple parallel channels, harvests molecules at an efficiency of 50% or more, and has been scaled up and automated for high throughput. The eluted fractions are delivered directly into multiwell plates, where the molecules can be digested, if necessary, and directly analyzed by mass spectrometry or other techniques. The method can be used in native (or denatured) protein electrophoresis to analyze protein complexes in biological systems.
In one embodiment of the instrument, for a typical 2.5-hour electrophoresis run, each sample can be separated and eluted into 48 to 96 fractions over the mass range from ˜10 KD to 150 KD; sample recovery rate can reach 50% or higher; each channel can be loaded with up to ˜0.5 mg material in a 0.5 mL volume and a purified band typically elutes over 2-3 fractions (200 μl/fraction). For native gel electrophoresis, the sample loading capacity may be limited to ˜50 μg per channel due to protein aggregation. The system can however be used for native gels where some aggregation and dilution are tolerable.
In another aspect, the invention provides a method for biomolecule size separation using electrophoresis comprising (a) providing a polymerized electrophoresis gel loaded with the biomolecules to be separated and purified; (b) performing electrophoresis on said gel to separate the biomolecules; and (c) capturing the separated biomolecules as they migrate off the gel.
The methods provided by the present invention are high throughput methods and can be implemented in production pipelines to rapidly purify and identify the majority of stable protein complexes in a cell. Protein complexes or “molecular machines” perform a variety of discrete and highly specialized processes that modify and dictate the molecular states of the system, which, in turn, define cellular physiology in response to genetic and environmental stresses. For systems biology, it is essential to identify and characterize the complete inventory of protein complexes in high throughput.
The invention further provides a method to rapidly purify and identify the majority of stable protein complexes in a cell without the use of affinity tags or affinity purifications. The “tagless” approach includes taking a crude protein extract prepared from a single large culture of cells, sequentially fractionating the extract by a number of orthogonal chromatographic separation steps (using ion exchange column, hydrophobic interaction column (HIC), sizing-column or native gel electrophoresis, for example). Selected fractions from each column are used as the input of the next step. At the last step, there are several hundreds of parallel sizing or native gel electrophoresis runs, generating 10,000-20,000 fractions that are proteolyzed prior to labeling the peptide products. Identifications of protein contents and their relative abundances in these fractions are obtained by tandem mass spectrometry (MS/MS). Protein complexes are inferred by analyzing elution profiles of all proteins detected and discovering proteins that co-migrate in the multi-fractionation space. In principle, this approach is suitable to detect endogenous protein complexes from wild type cells based on the shared elution profiles of polypeptides that, as components of a protein complex entity, co-migrate through multiple chromatographic steps.
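By way of illustration only, the co-migration test described above can be sketched as a correlation between per-fraction abundance profiles. The sketch below is not the inventors' analysis software: the protein names and abundance values are hypothetical, and the Pearson coefficient is used here only as one plausible measure of a shared elution profile.

```python
from math import sqrt


def normalize(profile):
    """Scale a protein's per-fraction abundances so that they sum to 1."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile contains no signal")
    return [x / total for x in profile]


def pearson(a, b):
    """Pearson correlation between two equal-length elution profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same fractions")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no co-elution information
    return cov / sqrt(var_a * var_b)


# Hypothetical relative abundances across six consecutive fractions.
profiles = {
    "subunit_A": [0, 1, 8, 4, 1, 0],
    "subunit_B": [0, 2, 16, 8, 2, 0],  # same shape as A, twice the signal
    "unrelated": [5, 3, 1, 0, 0, 6],
}

r_ab = pearson(normalize(profiles["subunit_A"]), normalize(profiles["subunit_B"]))
r_au = pearson(normalize(profiles["subunit_A"]), normalize(profiles["unrelated"]))
print(f"A vs B: {r_ab:.3f}   A vs unrelated: {r_au:.3f}")
```

In this toy data set the two subunits share an elution peak and correlate strongly, whereas the unrelated protein does not; a real analysis would apply such a measure across every separation dimension.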
The invention further provides a high throughput native gel system. Such a system shows promise for incorporation into a “tagless” approach to protein purification and isolation, where it can replace the sizing-column, which is inherently low throughput and low resolution.
Tandem affinity purification is the principal method for purifying and identifying stable protein complexes system wide in whole cells. Although highly effective, this approach is laborious, prone to artifacts, and impractical in organisms where genetic manipulation is not possible. Herein is described a novel “tagless” method that combines multi-dimensional separation of endogenous complexes with mass spectrometric monitoring of their composition. In this method, putative protein complexes are identified based on the co-migration of collections of polypeptides through multiple orthogonal separation steps.
A majority of E. coli proteins are shown to remain in stable complexes during fractionation of a crude extract through three chromatographic steps. The inventors also demonstrate that iTRAQ™ reagent-based tracking can quantify relative migration of polypeptides through chromatographic separation media. LC MALDI MS and MS/MS analysis of the iTRAQ-labeled peptides gave reliable relative quantification of 37 components of 13 known E. coli complexes: 95% of known complex components closely co-eluted and 57% were automatically grouped by a prototype computational clustering method.
The assay and method of the invention dramatically improve the efficiency of the purification and identification of protein complexes in cells. In fact, the assay itself, being essentially a single experiment with two sequential procedures (separating co-migrating complexes and analyzing each complex for protein content), allows the capture of information about whole cells that was previously unattainable without months of laborious experiments.
Accordingly, the invention provides an opportunity to generate a database of the protein complexes present (as identified by this method and this assay) in any number of whole cells from any source (e.g. cell lines, tissue, plants and all other sources of whole cells). Presently, the only cell type for which this information is available is a yeast cell, and mapping the entire protein complex profile of the yeast cell took many months of experiments and data collection.
A high throughput method of identifying protein complexes in whole cells from any organism comprising: (a) passing cell lysate from whole cells through at least two orthogonal separations under conditions that preserve interactions among polypeptide components of protein complexes in the lysate, (b) collecting polypeptide components in separate elution fractions, (c) proteolytically digesting each fraction separately to produce a plurality of peptides; (d) analyzing the peptides in each fraction for peptide identity and abundance of the peptide in the fraction relative to the other fractions, and (e) identifying co-migrating polypeptides using mathematical analysis based on peptide distribution in the fractions, wherein co-migrating polypeptides identify protein complexes in the cell.
The peptides in each fraction for protein identity and abundance relative to the other fractions may be analyzed by mass spectrometry. In another embodiment, identifying co-migrating polypeptides comprises clustering. In another embodiment, passing cell lysate from whole cells through at least two orthogonal separations comprises passing the cell lysate through a chromatographic separation. In another embodiment, the determining of a structure of at least one protein complex may be accomplished by electron microscopy.
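By way of illustration only, the clustering embodiment can be sketched as a simple single-linkage grouping: any two proteins whose elution profiles correlate at or above a threshold are placed in the same putative complex. The profiles, the protein labels P1 to P5, and the 0.9 threshold are hypothetical, and this is not the prototype clustering method referred to elsewhere in this description.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def cluster_by_coelution(profiles, threshold=0.9):
    """Group proteins linked by a profile correlation >= threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(profiles, 2):
        if pearson(profiles[p], profiles[q]) >= threshold:
            parent[find(p)] = find(q)  # merge the two groups

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical abundances across eight fractions.
example = {
    "P1": [0, 1, 9, 3, 0, 0, 0, 0],
    "P2": [0, 2, 10, 4, 0, 0, 0, 0],
    "P3": [0, 0, 0, 0, 2, 8, 3, 0],
    "P4": [0, 0, 0, 0, 1, 9, 2, 0],
    "P5": [7, 2, 0, 0, 0, 0, 0, 1],
}
clusters = cluster_by_coelution(example)
print(clusters)
```

Single linkage is chosen here only because it is short to write down; it can chain loosely related proteins together, so a production pipeline would likely need a more conservative criterion.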
The high throughput method may further comprise storing protein complex information from a plurality of whole cells in an interactive database accessible to a plurality of users, wherein the protein complex information comprises substantially all the protein complexes in the cell. In another embodiment, the method further comprises storing monomer and other single protein information.
A high throughput method of identifying protein complexes in whole cells from any organism comprising: (a) providing whole cells, (b) separating cell lysate from said whole cells under conditions that preserve interactions among polypeptide components of protein complexes in the lysate, (c) collecting said polypeptide components in separate elution fractions, (e) analyzing the peptides in each fraction for peptide identity and abundance of the peptide in the fraction relative to the other fractions, and (f) identifying co-migrating polypeptides using mathematical analysis based on peptide distribution in the fractions, wherein co-migrating polypeptides are protein complexes in the cell.
From the method described in this invention, any number of researchers could develop a database that stores the information learned by practicing the tagless method of identifying protein complexes in a cell. For example, the database could classify cells by listing the protein complexes found by the method, and reference details of the isolation procedure (e.g. how many and which orthogonal separations were conducted). The database could, in addition, provide a platform for comparing cells (e.g. diseased and healthy cells, or cells from different tissues) and for searching on given protein complexes to learn in which cells they are found. Such information is useful in planning any research using a particular cell, and for many applications of these cells, such as diagnostic purposes and the development of therapies for treating patients with a condition that can be defined at a cellular level.
Table 3: Non-Ribosomal Proteins Identified in the Course of Tagless Protein.
Table 4: Tagless Strategy-Detection of Reciprocal Protein-Protein Interactions Previously Identified by TAP.
Table 5: Results of Clustering Analysis of Elution Profiles of Non-Ribosomal Proteins
Table 6. Biochemical identity and composition of large macromolecular complexes purified from Desulfovibrio vulgaris Hildenborough by the tagless strategy. Homologs from other bacteria listed in the rightmost column are members of the same Pfam families as the D. vulgaris protein.
Important issues that needed to be resolved in fraction collection included preventing interruption of electrophoresis, maintaining the continuity of collection, and reducing sample loss, resolution loss, and dilution of protein bands.
To facilitate a direct interface between protein separation and isolation by polyacrylamide gel electrophoresis and protein identification by mass spectrometry, a multi-channel system was developed for continuous fraction collection as protein bands migrate off the bottom of gel columns. It was constructed based on a scheme that uses multiple short linear gel columns to achieve separation power similar to a long gradient gel, and an elution technique that allows continuous and simultaneous fraction collection from multiple channels at low cost. Fast and high resolution separation and fractionation of complex protein mixtures can be achieved on this system while running SDS-PAGE gels.
In a 2.5-hour electrophoresis run, for example, each sample can be separated and eluted into multiple (e.g., 48 to 96) fractions over the mass range of ˜10 KD to 150 KD; sample recovery rate can reach 50% or higher; each channel can be loaded with up to 0.5 mg material in 0.5 mL volume and a purified band typically elutes over 2-3 fractions (200 μl/fraction). Similar results could be obtained when running native gel electrophoresis on this instrument, but protein aggregation, mainly caused by sample over-loading and stacking, may limit the loading capacity to about 50 μg per channel.
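By way of illustration only, the figures quoted above imply a collection time per fraction and an average elution flow per channel. The sketch below assumes, purely for the sake of the arithmetic, that fractions are collected back to back at a constant flow rate over the entire 2.5-hour run; in practice collection would cover only part of the run, so the values are rough bounds rather than instrument settings.

```python
def fraction_schedule(run_minutes, n_fractions, fraction_volume_ul):
    """Return (minutes per fraction, flow in ul/min, total eluate in mL).

    Assumes back-to-back collection at constant flow for the whole run,
    which is a simplification made for illustration.
    """
    if run_minutes <= 0 or n_fractions <= 0 or fraction_volume_ul <= 0:
        raise ValueError("all inputs must be positive")
    minutes_per_fraction = run_minutes / n_fractions
    flow_ul_per_min = fraction_volume_ul / minutes_per_fraction
    total_ml = n_fractions * fraction_volume_ul / 1000.0
    return minutes_per_fraction, flow_ul_per_min, total_ml


# 2.5-hour run, 200 ul fractions, 48 or 96 fractions per channel.
for n in (48, 96):
    dwell, flow, total = fraction_schedule(150, n, 200)
    print(f"{n} fractions: {dwell:.2f} min each, {flow:.0f} ul/min, {total:.1f} mL")
```

Under these assumptions, 96 fractions correspond to about 1.6 minutes per fraction and 128 μl/min per channel, and a band eluting over 2-3 fractions occupies 400-600 μl, which is comparable to the 0.5 mL loading volume.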
The goal of this work was to develop an easy to use multi-channel system that is effective in separating a complex mixture of proteins while automatically capturing separated bands into a liquid fraction collector, over a broad mass/size range. Such a system can be built based on a simple scheme.
In one embodiment, as shown in
As used herein, the term “gel column” means either the gel itself or the gel together with the gel column tube or channel containing it. For example, a gel column is prepared in a glass tube using a rubber stopper and a piece of thin plastic film to seal the bottom of the tube, and the result is referred to as a “gel column.” Thus, “gel column” can mean either the column gel itself in the gel column tube or the gel together with the gel column tube containing it.
Thus, in one embodiment, the multi-channel system as in
The plurality of linear gel columns can each have different polyacrylamide concentrations, to achieve separation power similar to a typical gradient gel thereby enabling continuous fraction collections of multiple gel columns. In one embodiment, the lower gel column is inserted into the conduit in the elution device prior to electrophoresis, where it remains until all fractions are collected. In
Elution Unit. Referring now to
To simultaneously capture protein bands off multiple gel columns, a “free-flow” technique was developed. In one embodiment, the elution unit comprises a machined conduit of acrylic glass and a fused-silica capillary tube. As shown in
Below the taper is the final portion of the conduit, a straight tube with an outer diameter (od); and in the middle of the length of this tube, four holes are drilled perpendicular to the central axis of the tube. In one embodiment, the straight tube is 12-mm long, with a 4.5-mm outer diameter (od); in the middle of the length of this tube, four 1.0 mm diameter holes are drilled perpendicular to the central axis of the tube, and the inner diameter (id) of the straight tube is 3 mm above the holes and only 1.53 mm below the holes. These holes, termed conducting holes herein, allow electrical currents to flow between the gel column and the anode and running buffer to flow in. The increased electrical field within the straight tube provides additional acceleration to eluting biomolecules once they enter the gel-free buffer solution.
A narrow-bore glass capillary tube with a sleeve (e.g., a 50 mm long PEEK tube, 0.5 mm id and 1/16th inch od) is inserted into the bottom of the conduit. The PEEK tube ends about 1 mm below the four holes and the capillary tube 2 mm below the taper. When charged molecules reach the tip of the capillary tube they are subjected to an inward drag force generated by the counter-flow of buffer solution. As the rate of the counter-flow increases, it overcomes the electrical force and sweeps the biomolecules down into the capillary tube and deposits them into a fraction collector below. The buffer flow can be gravity-driven, and thus does not require expensive pumps. The buffer flow is controlled by adjusting the length and inner diameter of the capillary tube. The relative vertical position of the capillary tube within the straight tube is an important parameter that affects the capture efficiency of eluted biomolecules (see below for further details). Using this approach, many channels can be operated simultaneously.
The present elution unit uses electrophoresis buffer solution both as the medium that establishes the electrical connection between the gel column and the ground electrode and, through its bulk flow, as the means to drain separated bands migrating off the gel column. The flow is gravity-driven, and the rate can be controlled by adjusting the length and inner diameter of the capillary tube. The relative position of the capillary tube within the straight tube is an important parameter that affects the capture efficiency of eluted biomolecules (more details below).
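By way of illustration only, the dependence of the gravity-driven flow on capillary length and inner diameter can be estimated with the Hagen-Poiseuille relation for laminar flow, Q = π ΔP r^4 / (8 μ L), with ΔP = ρ g h. This relation is a standard approximation and is not stated in the present description; the hydrostatic head h, the water-like viscosity and density, and the neglect of entrance and surface-tension effects at the capillary tip are all assumptions of the sketch. The capillary dimensions used in the example (320 μm id, 15-25 cm long) are those given in this description for the capillary tubes.

```python
from math import pi


def capillary_flow_ul_per_min(inner_diameter_m, length_m, head_m,
                              viscosity_pa_s=1.0e-3, density_kg_m3=1000.0,
                              gravity_m_s2=9.81):
    """Estimate gravity-driven laminar flow through a capillary in ul/min.

    Uses Hagen-Poiseuille with a hydrostatic driving pressure rho*g*h.
    The default viscosity and density are those of water (assumed).
    """
    if min(inner_diameter_m, length_m, head_m, viscosity_pa_s) <= 0:
        raise ValueError("dimensions, head and viscosity must be positive")
    radius = inner_diameter_m / 2.0
    pressure_pa = density_kg_m3 * gravity_m_s2 * head_m
    flow_m3_per_s = pi * pressure_pa * radius ** 4 / (8.0 * viscosity_pa_s * length_m)
    return flow_m3_per_s * 1.0e9 * 60.0  # 1 m^3 = 1e9 ul; 60 s per minute


# 320 um id capillaries of 15, 20 and 25 cm with a hypothetical 25 cm head.
for length_cm in (15, 20, 25):
    q = capillary_flow_ul_per_min(320e-6, length_cm / 100.0, head_m=0.25)
    print(f"{length_cm} cm capillary: ~{q:.0f} ul/min")
```

Because the flow scales with the fourth power of the radius and inversely with the length, the inner diameter is the coarse adjustment and the length the fine one, which is consistent with controlling the rate through capillary geometry rather than with a pump.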
The elution unit, which also serves as the lower buffer container, was formed by attaching a base plate holding four capillary tubes (320 μm id, 450 μm od and 15-25 cm long [part # TSP320450, Technologies, Phoenix, Ariz.]) and an O-ring gasket to a buffer container body, which includes a Pt electrode (anode) and a buffer inlet and outlet (
Multi-Channel Apparatus. Based on the scheme described above, a prototype, 16-channel instrument (See
In other embodiments, the elution unit, gel segment boxes and gel column tube can be made of any polymer or glass material that is inert and will not react to the electrophoretic current. In a preferred embodiment, the elution unit and gel segment boxes comprise Lucite or acrylic material and the gel column tubes comprise glass. In other embodiments, the electrode is any metal that can be used as a conducting electrode including platinum,
The present multi-channel system further comprises a power supply and manual or digital control of the power supply and electrophoresis conditions. Typical electrophoresis conditions are 20-30 volts/cm, with a power limit of 1-2 watts/column. Power supplies for electrophoresis applications can be obtained commercially, such as the VWR® Power Supply Model 202 (VWR Catalog #93000-746; VWR, West Chester, Pa.), which has four sets of color-coded output terminals that allow multiple gels to be run simultaneously.
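By way of illustration only, the typical conditions above can be converted into a voltage setting and a current ceiling for a single column. The sketch assumes that the field is taken over a path length chosen by the operator (the 10 cm value in the example is hypothetical) and treats 30 volts/cm as an upper limit on the field.

```python
def run_settings(path_length_cm, field_v_per_cm, power_limit_w):
    """Return (voltage in V, maximum current in mA) for one gel column."""
    if path_length_cm <= 0 or power_limit_w <= 0:
        raise ValueError("length and power limit must be positive")
    if not 0 < field_v_per_cm <= 30:
        raise ValueError("field must be above 0 and at most 30 V/cm")
    voltage_v = field_v_per_cm * path_length_cm
    # P = V * I, so the power limit caps the current at P / V.
    max_current_ma = power_limit_w / voltage_v * 1000.0
    return voltage_v, max_current_ma


# Hypothetical 10 cm path at 25 V/cm with a 1.5 W per-column limit.
volts, milliamps = run_settings(10, 25, 1.5)
print(f"{volts:.0f} V, current limited to {milliamps:.1f} mA")
```

For these example values the column would run at 250 V with the current held at or below 6 mA; the numbers are only as good as the assumed path length.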
A motorized fraction collector using standard 96-well plates is located below the elution unit. The distance between the capillary tube and the fraction collector can vary. In one embodiment, the fraction collectors are about 15-20 mm below the capillary tube. In one embodiment, there are two identical but individually addressable fraction collectors, each of which supports two 4-channel electrophoresis units. The fraction collector can be on an XY stage that is pneumatically or digitally controlled.
Referring now to
Computer software was written in C# on a Windows computer to control the “Y” stage pneumatic actuator, which toggles between a “Left” and a “Right” position. Two separate acquisition setups could be controlled. The “Objects” are “Move 1” and motor “1” for setup 1 and “Move 2” and motor “2” for setup 2.
As shown in
Modes of operation. The instrument can be operated in two different modes. In the “single” mode, the sample is loaded above a single-piece, long gel column, and the proteins are separated and eluted directly into the fraction collector. Since there is only one segment of gel used in this mode, it can achieve effective separation over a finite mass range only, the range being chiefly determined by the concentration of polyacrylamide used. In the “multi” mode, two or three segments of gel columns, each with a different gel concentration, are stacked on top of one another during the initial electrophoresis run, the gels of lower concentration being placed above those of higher concentration. After the faster migrating protein bands have entered the lower gel segment, the upper segments are removed and put onto other elution units where the electrophoresis of slower migrating proteins is completed and the separated bands eluted and collected. The single mode is fast and effective if the application targets a specific band. The single mode can also be used where multiple gel columns of different percent acrylamide target a broader band range of polypeptides in cases where sample consumption is less of an issue. The multi mode is more efficient in separation and sample use but needs more control and parameter optimization in operation. For example, the condition and time of when to separate the two stacked gel columns must be empirically determined. The multi mode set-up functions similarly to that of a traditional analytical mode for protein separation.
Gel Preparation and Electrophoresis. In one embodiment, SDS or native PAGE gels can be cast in the gel column tubes. In one embodiment, the bottom of the glass tube column is sealed using thin plastic film, gel solution is poured into the column to the desired length, and a section of water-saturated solvent is added on top to separate the gel solution from air and to maintain a flat gel surface while the gel polymerizes. It is preferred that the gel solution be made fresh each time. A typical gel solution is made up of solutions of 30% acrylamide (Bio-Rad, Cat. #161-0156), 375 mM TrisHCl buffer (pH 7.8), 0.1% (v/v) TEMED and 0.03% (v/v) APS.
When using the gel columns in stacking mode, a stacking gel solution is added once the gel columns have solidified. The amount of stacking gel solution used depends on the desired length, and butanol is added on top again. Once the gel has solidified, the gel surface is rinsed with Millipore water to get rid of excess butanol. Electrophoresis running and loading buffer is added to the glass tubes and multi-channel system prior to an electrophoresis run. If multiple gel segment containers are used, running buffer need only be added to the top-most container and the elution unit, as buffer will flow through the gel column tube from the top container, thus also completing the electrophoresis circuit.
The maximum field that could be applied for electrophoresis was 30 volts/cm. Going beyond this threshold resulted in local over-heating, causing gas bubbles to form between the bottom of the gel column and the taper, which interrupts electrophoresis.
Our current Counter Free-Flow PAGE protocols could be further optimized for specific needs. To improve resolution of low-molecular weight proteins or peptides, one could use higher acrylamide concentrations and/or longer gels, increase the counter-flow rate, and collect smaller fractions. To reduce dilution in the high molecular range, one might use lower acrylamide concentrations in the upper gel segment in the multi mode and increase the electric field. However, there are limits to the optimization of certain variables. For example, separation properties will become less reproducible as acrylamide concentrations approach 3% or less. Similarly, higher electric fields are more likely to generate air bubbles at the interface between the gel and the conduit, which will change the electrical field, sometimes completely interrupting electrophoresis.
The eluted fractions are delivered directly into multiwell plates, where the molecules can be digested, if necessary, and directly analyzed by mass spectrometry or other techniques. The method can be used in native (or denatured) protein electrophoresis to analyze protein complexes in biological systems. In one embodiment, the fractions collected from the elution units can be further processed by other chromatography means such as hydrophobic interaction chromatography (HIC), size exclusion chromatography, or chromatofocusing.
Sample Preparation. Crude protein extracts (˜5 mg/ml, typical) are prepared with an additional volume of sample loading buffer, e.g., 60 mM Tris, 460.8 mM glycine, 60% glycerol, 0.03% Bromophenol Blue. For SDS samples, the loading buffer may contain additional SDS, and samples are denatured at 95° C. for 10 minutes prior to sample loading. For samples with high salt, such as HIC fractions, a desalting column can be used to remove salt and exchange the sample into 125 mM TrisHCl pH 6.8 buffer. To concentrate desalted HIC samples, a concentrator column such as Millipore's 3K Amicon Ultra column (UFC800324) can be used, reducing sample volume to 200˜600 μl per fraction.
Capturing Eluted Proteins. Multiple geometrical and dynamic parameters can affect the overall collection efficiency and ultimate resolution of this device; but it was found that the most critical ones were flow rate of the elution buffer and the relative location of capillary tube in the conduit.
As described above, a fraction collection stage with fraction collector containers or multi-well plates is used for capturing protein bands migrating off the gel column. The user can determine how many fractions are needed to capture and how much fluid volume should be collected in each fraction. For example, in the present embodiment, about 120 μL are captured in each fraction.
In one embodiment, given the structure and geometry of the current collection device, the optimal flow rate is set to about 125 μl/minute and the position of the capillary set to 2 mm below the tapered section. The flow rate and the position of the capillary set are empirically determined by monitoring the collection process of the band of the loading dye. The dye molecules used in the sample loading buffer form a blue band (1˜2 mm wide) that always emerges first from the bottom of gel column. When the elution parameters are less optimal, dye molecules will bypass the tip of capillary tube, and eventually leak out from the conducting holes. With optimal settings, leakage is minimal or not visible (see
This fraction collection technique does not concentrate separated bands as some other applications might require. In one sample process, proteins in fractions selected for mass spectrometry analysis will be captured by the PVDF membrane of a 96-well plate format (for example, MultiScreen-HV, Cat. MAHVN4510, Millipore Co. (Billerica, Mass.)) and digested thereafter. Therefore, a slight sample dilution is tolerable. In fact, we could even increase the flow rate to capture more eluted molecules, resulting in more dilution but still having no impact on our mass spectrometry analysis. A true advantage of the current version of the counter free-flow approach is that it is quite easy to construct and to operate in a multi-channel format.
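As a rough illustration of the trade-off between flow rate and dilution discussed above, the following Python sketch computes how long one fraction takes to fill and the buffer volume into which an eluting band is collected. The 120 μl fraction volume and 125 μl/minute flow rate are the values given above; the two-minute band elution time and the function names are hypothetical and serve only as an example.

```python
def fraction_fill_time_min(fraction_volume_ul, flow_rate_ul_per_min):
    """Minutes needed to fill one fraction at a constant elution flow rate."""
    return fraction_volume_ul / flow_rate_ul_per_min


def band_collection_volume_ul(band_elution_time_min, flow_rate_ul_per_min):
    """Buffer volume that carries a band which takes the given time to leave the gel."""
    return band_elution_time_min * flow_rate_ul_per_min


# Values from the text: about 120 ul per fraction at about 125 ul/minute.
fill_time = fraction_fill_time_min(120.0, 125.0)

# Hypothetical band that needs 2 minutes to migrate off the gel column.
volume_at_125 = band_collection_volume_ul(2.0, 125.0)
volume_at_250 = band_collection_volume_ul(2.0, 250.0)  # doubled flow rate

print(f"fill time per fraction: {fill_time:.2f} min")
print(f"band volume at 125 ul/min: {volume_at_125:.0f} ul")
print(f"band volume at 250 ul/min: {volume_at_250:.0f} ul")
```

In this simplified picture, doubling the flow rate doubles the volume into which the same band is collected, which is the dilution that the membrane capture step is said to tolerate.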
For an application where sample dilution is critical, the coaxial sheath flow (Muller, O.; Foret, F; Karger, B. Anal. Chem. 1995, 67, 2974-2980) and the sweeping approach (Hjerten, S.; Zhu, M.-D. J. Chromatogr. 1985, 327, 157-164) would be advantageous. In fact, we have investigated applying these approaches in our system, but the results were less satisfactory. The main problem was that significantly more engineering efforts, in the areas of design architecture, precision machining and fabrication as well as controls, for example, must be made, especially considering the size of the gel column (not the typical capillary tube) used in this work. It should be pointed out that if a smaller size gel column is used in applications where much less sample is involved, we can simply reduce the size of the upper cup or employ an adaptor and decrease flow rate to accommodate.
Regarding the “stacking” mode of operation, much QA/QC work is needed to obtain reproducible separation results, namely, the same protein eluting into the same or neighboring fractions from run to run. For example, the gels must be cast with highly reproducible quality and run at the same speed in order to allow reproducible separation between the two gel pieces. These require locking down all critical operation parameters, which involves a large amount of testing and evaluation. For this reason, we are running the system more frequently in “single” mode at present.
The entire system can be set-up for conducting electrophoresis and fraction collection in various settings including the laboratory or a cold room if sample preservation is a concern. Alternatively, other controls or attachments for temperature control are contemplated to be added by one having skill in the art.
Choice of iTRAQ-Based MALDI MS/MS for Protein Elution Profiling
A major challenge in establishing the feasibility of our proposed tagless strategy was to select a suitable mass spectrometry method. Because of the large number of fractions to be analyzed it was critical to adopt an approach that minimized the number of MS/MS analyses as this could otherwise become a serious rate limiting step. It was also essential to adopt a method that was able to quantitate relative abundances of polypeptides in different fractions.
The inventors chose a LC MALDI MS workflow, rather than a LC electrospray ionization (ESI) workflow, as this decouples the LC step from the MS and MS/MS steps, and thus allows repeated interrogation of archived MALDI sample plates. In the context of our proposed analysis of a series of closely related fractions of similar content (
To track changing relative abundances of polypeptides between fractions, the inventors chose an isotopic dilution method that employs the primary amine-directed iTRAQ reagent as the label.17, 23 The iTRAQ labeling methodology is the most robust high throughput means of quantifying protein relative abundances by MALDI TOF MS/MS and offers an accuracy and precision comparable with the label-free ESI-based24-27 and MALDI-based28 methods. Furthermore, unlike other potential mass spectrometry labeling methods,29 iTRAQ multiplexes four samples in one analysis, further reducing the number of MS/MS analyses required.
The iTRAQ reagent was originally developed for comparing relative levels of peptides in protein expression profiling experiments. The inventors have adapted it in the following way for our purposes (
Reproducibility of iTRAQ-Based Protein Elution Profiling
In spite of this encouraging result, if a full scale implementation of the tagless strategy is to be successful, iTRAQ-based quantitation and column chromatography will have to be sufficiently reproducible so that data from different fractions, multiplex sets, columns, and days can be compared as part of a large single dataset. Therefore, reproducibility of the tagless method was examined at three levels: (a) reproducibility of mass spectrometric data acquired on a single instrument with the same spotted samples; (b) reproducibility of tryptic digestion, labeling and other sample preparation steps; (c) reproducibility of replica chromatography separations of protein mixtures.
Repeated analysis of the same LC MALDI plate gave essentially the same iTRAQ ratio values for relative abundances of polypeptides, indicating high analytical reproducibility of the mass spectrometers employed in this study (data not shown). Duplicate proteolytic digestion and iTRAQ reagent-labeling performed on the same set of fractions and followed by separate LC MALDI MS/MS analysis produced very similar elution profiles for components of the pyruvate dehydrogenase complex. In both experiments, subunits AceE, AceF and LpdA had similar elution profiles between fractions E4 and E8, suggesting that sample preparation was fairly reproducible (
To establish the reproducibility of chromatographic separations, two protein fractionation experiments were compared. Both used gel filtration followed by anion exchange chromatography of an E. coli lysate but were carried out at different scales using differing amounts of crude extract and different size columns. Despite these significant differences, the RNA polymerase, pyruvate dehydrogenase and 2-oxoglutarate dehydrogenase complexes all eluted in the same order and maintained very similar elution patterns (
Identifying known protein complexes. Next it was necessary to more thoroughly test the feasibility of identifying protein complexes by tracking polypeptide elution profiles using the iTRAQ approach by examining profiles of members of known protein complexes. This was accomplished by assaying 15 fractions grouped into five linked multiplex sets from across the larger scale Mono Q chromatography fractions described above. All fractions were initially analyzed utilizing 4700 Proteomics Analyzer (Applied Biosystems) and then the first 10 fractions encompassed by three four-plexes were re-spotted and re-analyzed using 4800 Proteomics Analyzer (Applied Biosystems). A total of 103 non-ribosomal polypeptides were identified on the basis of at least one peptide (Table 3). Then the literature was consulted to learn how many known protein complexes and protein-protein interactions were to be expected among the polypeptides that were detected. The inventors ignored the fact that for some complexes, usually lower abundance ones, only a subset of polypeptides were identified and instead focused on whether those polypeptides that the inventors could detect were identifiable as co-migrating in the iTRAQ data. According to the EcoCyc database (found online at the BIOCYC website), 35 of the polypeptides the inventors detected with the tagless strategy were constituents of 13 known protein complexes, comprising 37 components (Table 2).
(a) Protein complex data based upon the content of the Encyclopedia of E. coli K-12 Genes and Metabolism (http://biocyc.org/ECOLI/new-image?object=Protein-Complexes).
(b) Protein identification based on one and two or more peptides for [ID1] and [ID2+] categories, respectively. MS/MS spectra of polypeptides matched to a single peptide are shown in Supplemental FIG. 3.
(c) Unique non-overlapping peptides were used to calculate protein sequence coverage defined as the ratio between the sum of amino acids encompassed by the confidently matched peptides (% CI > 95%) and the number of amino acids in a polypeptide sequence. For polypeptides observed in more than one four-plex, the best four-plex data are shown.
(d) Elution profiles of all detected non-ribosomal proteins were compared using a modified Pearson's algorithm. “Y” means that at least two complex components were clustered at a threshold of 0.92.
(e) At least two complex components shared at least one apex of elution.
(f) At least two complex components eluted in the same fraction.
According to a large scale TAP analysis of protein-protein interactions in E. coli (ref. 15), 21 of the polypeptides the inventors detected with the tagless strategy were expected to participate in 24 reciprocal pair-wise interactions (Table 4). There was a significant overlap between these two sets of polypeptides since many of the components of known protein complexes were also detected by TAP methodology.
An ad hoc approach was employed to classify co-migrating polypeptides in the iTRAQ data based on polypeptides that showed maximum concentrations in the same fraction (elution apices). Of the known complex components from the EcoCyc database, the great majority (78%) shared the same elution apex. An additional 16% were detected in the same fractions, and hence 95% of the expected protein complex components demonstrated close co-elution (
Despite the broad similarity in elution profiles for known complex components, some intriguing differences between their profiles were seen in a few cases. For example, RNA polymerase components NusA and Rho demonstrated much narrower elution peaks than the core RNA polymerase subunits RpoA, B, and C (
Discovery of complexes by automated cluster analysis. The above analyses used ad hoc criteria to judge if polypeptide elution profiles were sufficiently similar to suggest that they are members of the same protein complex. However, adaptation of the tagless strategy in a high-throughput modality which generates many more fractions will require automated statistical analyses that can identify putative protein complexes and provide confidence estimates on the likelihood of that prediction. Towards accomplishing this goal, a prototype algorithm for automatically detecting complexes, based on the clustering methods used to detect co-regulated genes in expression microarray data34, was tested.
In general, the elution profile of a polypeptide can be plotted as an intensity map in a multi-parameter grid space, where the coordinates of each grid specify a fraction and its intensity indicates the relative abundance of the polypeptide. For example, in a two-step protein complex separation scheme, the map could be plotted exactly like a 3-D geological map representing hills and mountains. The task of finding co-migrating polypeptides is then reduced to co-localizing “hill and mountain” peaks within a grid of the N-dimensional map. From the data analysis point of view, each peak is a subset of registered data points and detecting co-localized peaks can be achieved by performing clustering analysis of the subset over the entire collection of protein elution profiles.
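The idea of grouping polypeptides by co-localized elution peaks can be sketched in a few lines of Python. This is a minimal illustration and not the inventors' algorithm: the text refers to a modified Pearson's algorithm whose details are not given, so the plain Pearson correlation is used here as a stand-in, single-linkage grouping is an assumption, and the four profiles (P1 to P4) are hypothetical intensities across eight fractions. Only the 0.92 threshold is taken from the Table 2 footnotes.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length elution profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-elution information
    return sxy / sqrt(sxx * syy)


def apex(profile):
    """Index of the fraction with the highest relative abundance."""
    return max(range(len(profile)), key=profile.__getitem__)


def cluster_profiles(profiles, threshold=0.92):
    """Single-linkage grouping: link any pair whose correlation meets the threshold."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical relative abundances across eight neighboring fractions.
example_profiles = {
    "P1": [0, 1, 5, 9, 4, 1, 0, 0],
    "P2": [0, 2, 6, 10, 5, 1, 0, 0],
    "P3": [0, 0, 0, 1, 3, 8, 4, 1],
    "P4": [0, 0, 0, 1, 4, 9, 5, 1],
}

print(cluster_profiles(example_profiles))
print({name: apex(p) for name, p in example_profiles.items()})
```

With these made-up profiles, P1 and P2 share an apex and fall into one group, while P3 and P4 form a second group. A production method would also need the peak-shape and multiple-apex handling that the text attributes to the clustering algorithm.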
The results of such clustering analysis were compared to the manually curated groupings of co-migrating polypeptides. Our current clustering algorithm correctly grouped 69% and 87% of the polypeptides sharing the same apex of elution that were manually classified as members of either EcoCyc known complexes or reciprocal TAP-defined interactions, respectively (Table 2 and Table 4). The differences between the ad hoc and computational methods of grouping polypeptides reflected differences in the criteria used. The manual evaluation was based solely on shared fraction and shared apex elution, whereas the clustering algorithm also took into account additional features such as peak shape, peak resolution and the presence of multiple apices within a contiguous portion of polypeptide elution profile. These additional constraints resulted in the exclusion of some of the known complex components from certain clusters. For example, the RNA polymerase components RpoA, RpoB, RpoC, RpoD, and NusA were included in the same cluster, but RpoZ and Rho were not (
The data inputted into the clustering analysis was not limited to members of known complexes, but included all 103 non-ribosomal proteins. Not surprisingly, given the crude and complex nature of the chromatography fractions analyzed, clusters that included members of known complexes frequently contained additional polypeptides that seemed unlikely to be uncharacterized members of these complexes (Table 5). For example, YbbN, MetK and GroEL clustered with members of the RNA polymerase complex (
Purification of Protein Complexes
The tagless strategy was able to identify putative protein complexes without a need for complete purification. However, these results indicate that a majority of high and moderately abundant stable protein complexes can be purified to near homogeneity by an optimized tagless fractionation method employing four orthogonal separation steps and scaling up the amount of starting material. Even with the pilot fractionations employed here, i.e., using only anion exchange and size exclusion chromatography, three complexes (pyruvate dehydrogenase, RNA polymerase and GroEL) have been purified to apparent homogeneity from E. coli cell lysate (
The inventors have established proof of principle evidence for the feasibility of employing a tagless strategy for protein complex identification and purification. The inventors estimate that at least around 50% of bacterial polypeptides participate in complexes that are sufficiently stable to survive the multiple chromatographic steps. The range of complexes identified by the tagless strategy is likely to be comparable to those identified by TAP experiments. Out of 24 TAP-detected reciprocal interactions,15 only three (MetK-SecA; MetK-DnaJ and GyrA-GyrB) had completely dissociated during purification (Table 4). In addition, there is good reason to believe that those complexes that are disrupted by the use of an affinity tag, and therefore are not detectable by TAP, will be identifiable by a tagless approach. Relative quantification using iTRAQ reagents allowed co-migration of polypeptides to be determined, and the chromatographic separation appeared sufficiently reproducible that results across multiple parallel chromatography columns, each separating different subsets of total cellular protein, could be meaningfully compared. Even a relatively simple clustering algorithm was effective at automatically detecting members of protein complexes using data from only two dimensions of separation. Several of the more abundant complexes were purified to greater than 70% homogeneity.
The samples analyzed by iTRAQ LC MALDI MS/MS were derived from a subset of the protein fractions of a two dimensional scheme and represented only approximately ten percent, by mass, of the water soluble E. coli proteins. Thus, even at this current pilot scale it is likely that around one thousand polypeptides would have been detected had all fractions from the scheme been analyzed by mass spectrometry. The remaining two thousand or so water soluble proteins that would not have been detected are in most cases likely to be of lower abundance. Hence, by starting with a large amount of crude protein extract and employing four, rather than two, orthogonal chromatography separation steps, it should be possible to detect the great majority of these lower abundance polypeptides. Of the two constraints inherent to analysis of low abundance species, i.e., dynamic range challenges and availability of material, the former is currently being addressed by performing extensive protein separation involving multiple chromatographic steps. The latter constraint is not a major obstacle since biomass for our target organism D. vulgaris is currently produced on a 400 l scale (4×10^13 to 4×10^14 cells) that delivers ˜10 g soluble protein (˜200 μmol of total protein, assuming an average polypeptide MW of 50 kDa). Within this mixture, a low abundance polypeptide expressed at the level of 10 copies per cell will constitute ˜670 pmol material that corresponds to a 3.3×10^−6 portion of total protein. The current yield after the four protein complex separation steps, tryptic digestion and iTRAQ-labeling is estimated at ˜0.5%.
Assuming the same level of recovery of low abundance complex components and anticipating a spread of protein complex elution during a 4-step fractionation into 50 fractions, 3.35 pmol of the low abundance protein will be recovered at a level of 67 fmol per fraction or ˜130 fmol per iTRAQ multiplex, assuming the worst case situation when only two fractions within a four-plex might contain a protein complex. This scenario brings us within the current practical detection limits of a MALDI TOF/TOF instrument. With the expected increase in the sensitivity of mass spectrometers over the next five to ten years, nearly all complexes should be detectable with such a fractionation. The inventors have now established a four dimensional fractionation at this larger scale and are now optimizing each fractionation step (unpublished data). While the success of discovery of any specific low level protein complex will be highly dependent on the extent of its separation from other species, efficiency of digestion and labeling and quality of MS/MS, in principle detection of low abundance complexes is within the realm of possibility.
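The recovery estimate above can be reproduced with a short calculation. The following Python sketch uses only the figures stated in the text (4×10^13 cells, 10 copies per cell, about 10 g soluble protein, an average MW of 50 kDa, about 0.5% overall yield, 50 fractions, and two complex-containing fractions per four-plex); the function and variable names are illustrative. The unrounded results differ slightly from the rounded values quoted in the text (about 664 pmol rather than ˜670 pmol).

```python
AVOGADRO = 6.022e23  # molecules per mole


def recovery_estimate(cells=4e13, copies_per_cell=10, total_protein_g=10.0,
                      mean_mw_g_per_mol=50_000.0, overall_yield=0.005,
                      n_fractions=50, complex_fractions_per_fourplex=2):
    """Estimate how much of a low abundance polypeptide reaches the mass spectrometer."""
    total_protein_mol = total_protein_g / mean_mw_g_per_mol
    target_mol = cells * copies_per_cell / AVOGADRO
    recovered_mol = target_mol * overall_yield
    per_fraction_mol = recovered_mol / n_fractions
    return {
        "total_protein_umol": total_protein_mol * 1e6,
        "target_pmol": target_mol * 1e12,
        "target_portion": target_mol / total_protein_mol,
        "recovered_pmol": recovered_mol * 1e12,
        "per_fraction_fmol": per_fraction_mol * 1e15,
        "per_multiplex_fmol": per_fraction_mol * complex_fractions_per_fourplex * 1e15,
    }


estimate = recovery_estimate()
for key, value in estimate.items():
    print(f"{key}: {value:.3g}")
```

Because every quantity scales linearly, changing one assumption, for example the copy number per cell or the overall yield, changes the per-fraction amount by the same factor.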
In another embodiment, the gel apparatus is used to separate the proteins in the tagless survey of proteins. In another embodiment, the apparatus and steps carried out are automated and software-controlled by a computer. A major advantage of the tagless approach is that by its design it is intrinsically more amenable to automation than TAP as it consists of fewer types of operations and is highly repetitious. For example, no genetic manipulation of the organism is required and only one large culture of cells need be grown. With the automation of the sample preparation and chromatographic separations and development of a data analysis pipeline that is coupled to real time control of the mass spectrometer to eliminate redundant and time consuming analysis of peptides from the same protein, and the expected future increase in the speed of MALDI MS/MS instruments, it should be possible to achieve much higher throughput identification of protein complexes than is currently possible.
In other embodiments, additional methods to establish the accuracy and veracity of putative complexes identified by the tagless strategy will be needed. In one embodiment, an increase in the number of fractionation steps and the use of more complex clustering algorithms that employ quantitative data on the migration of polypeptides across four chromatographic dimensions are used to reduce the occurrences of “opportunistic co-eluting” of unrelated proteins seen in the pilot study. At least in some model organisms, a subset of putative complexes could and should be verified by reciprocal TAP analysis. In general, it is critical to cross-verify the predictions made by any method to identify protein complexes system wide using a combination of biological and analytical techniques.
In conclusion, the tagless protein complex identification strategy is a discovery as well as a purification tool. Its great strengths lie in the ability to analyze native systems and in the potential of highly automated high throughput execution. The inventors expect that a combination of tagless- and immunoaffinity-based complex isolation strategies will greatly expand the amount of information about the biology of organisms and provide orthogonal confirmation of the overlapping results.
Any figures or details that can not be easily viewed in this patent application can also be found online at the website of a published journal article that relates to the invention. The reference, A “Tagless” Strategy for Identification of Stable Protein Complexes Genome-wide by Multidimensional Orthogonal Chromatographic Separation and iTRAQ Reagent Tracking, J. Proteome Res., 2008, 7 (5), pp 1836-1849, is accordingly fully incorporated by reference in its entirety herein.
Based on the scheme described above, a prototype, 16-channel instrument (see
To cast either SDS or native PAGE gels, first the bottom of the column was sealed using thin plastic film, then gel solution was poured into the column to the desired length and a 5 mm long section of water-saturated butanol was added on top to separate gel solution from air and to maintain a flat gel surface. The gel solution was made fresh each time using stock solutions of 30% acrylamide (Bio-Rad, Cat. #161-0156), 375 mM TrisHCl buffer (pH 7.8), 0.1% (v/v) TEMED and 0.03% (v/v) APS. Upon solidification of the separation gel (usually about 3-4 hours), for stacking, a 4% stacking gel solution, made with 125 mM TrisHCl (pH 6.8) instead, was poured on top of the gel to the desired length and butanol was added again. The length of the stacking gel was kept at least twice that of the sample to be loaded, with a minimum of 1 cm. The gel surface was rinsed with Millipore water once the gel had solidified to get rid of excess butanol. Gels were run in 1× running buffer (10 mM Tris, 76.8 mM Glycine; for SDS gels, 0.2% (v/v) SDS was added). Typical electrophoresis conditions were 20-30 volts/cm, with a power limit of 1-2 watts/column.
Crude Desulfovibrio vulgaris (D. vulgaris) extracts (˜5 mg/ml, typical) were prepared with an additional ⅓ sample volume of 6× sample loading buffer (60 mM Tris, 460.8 mM glycine, 60% glycerol, 0.03% Bromophenol Blue). For SDS samples, the loading buffer contained an additional 18% (v/v) SDS, and samples were denatured at 95° C. for 10 minutes prior to sample loading. For samples with high salt, such as the HIC fractions, a desalting column from GE HealthCare (PD-10 Column, Cat. 17-0851-01) was used to remove salt and exchange the sample into 125 mM TrisHCl pH 6.8 buffer. To concentrate desalted HIC samples, Millipore's 3K Amicon Ultra column (UFC800324) was used, reducing sample volume to 200˜600 μl per fraction. HIC samples were prepared with ⅓ sample volume of 6× Stacking Gel Sample Loading Buffer (375 mM Tris HCl (pH 6.8), 60% (v/v) Glycerol, 0.036% (v/v) Bromophenol Blue).
To monitor and evaluate separation and elution results of this instrument, fractions were sampled and analyzed by traditional slab technique. 12.5 μl from each fraction sampled was mixed with 3 μl of 6× sample loading buffer (375 mM TrisHCl pH 6.8, 60% glycerol, 0.03% Bromophenol Blue, add 18% of SDS for SDS gel) and loaded onto a slab gel (Bio-Rad's Criterion Tris-HCl Gel, 4-15% (cat. 345-0029) for native samples, 4-20% (cat. 345-0034) for SDS samples). Native gels were run at 200V in 1× Gel Running Buffer (0.01M Tris, 76.8 mM Glycine), SDS gel at 200V in 1× gel running buffer with 0.2% SDS until the dye front reached the bottom of the gel. Gels were stained using Invitrogen's Silver Quest staining kit (LC6070).
To demonstrate that this instrument works, we first used it to separate and elute a mixture of denatured proteins by SDS gel electrophoresis.
Another test was performed using crude extract of D. vulgaris.
We have found that the protein separation reproducibility of this system is similar to other PAGE instruments, based on approximately 100 test runs performed over 2 years. In each run, the system was disassembled and reassembled with new buffer and gels, and multiple samples (identical replicates or different) were loaded and fractionated. For the same running conditions, the dye front arrival times, from lane-to-lane and day-to-day, typically varied by only 1 to 2 minutes (see
To understand the separation characteristics of linear, native polyacrylamide gels in our free flow electrophoresis system, we first developed protocols for casting and operating different percent gels in the single mode.
This suggested that a combination of 2 cm long gels of 5% and 8% might just cover the mass range of 30˜450 kD, where most of our proteins and protein complexes were located. FIG. 7 illustrates results of such a combination, including the ones obtained with samples of HIC fractions. In our “tagless” scheme, HIC fractions, all in a large volume (2.5 ml per fraction) of high salt buffer, are inputs for native gel electrophoresis. They were desalted and concentrated to 200-600 μl, without apparent loss. This allowed loading of an entire HIC fraction into a single gel column.
Clearly, the 8% portion worked as anticipated (see
Further optimization of the concentration and length of the upper gel column should achieve desired separation across the entire mass range. Also, for studying membrane protein complexes, the lower gel must be further optimized to increase resolution of smaller proteins as the membrane proteins and complexes tend to be much smaller.
We have noticed an extra tail trailing from fraction #32 and up in
Proteins are assigned to complexes based on co-elution, following the “guilt by association” principle. A complex must survive a comprehensive and complete separation, and its components must be detected by mass spectrometry. Complexes must be validated and confirmed by other assays.
This method can be high throughput, generic and sensitive—simultaneous identification and purification of many complexes, sensitive protein detection and large amount of material available.
A major rate limiting step in current mass spectrometry is sample preparation. A variety of methods have been used. In one approach, the purified complex is denatured and the constituent polypeptides separated prior to tryptic digestion (e.g. Gavin et al, 2002). Such methods, though, are inherently slow and difficult to automate. The present strategy is to employ a liquid chromatography “shot gun” approach, in which all the polypeptides in a fraction are digested with a protease and then reverse phase or two dimensional HPLC is used to separate peptides prior to analysis by MS/MS mass spectrometry (e.g. Butland et al, 2005).
The tagless strategy would basically comprise the following steps: A crude protein extract is fractionated successively by different chromatographic methods, such as size exclusion chromatography, ion exchange chromatography, hydrophobic interaction chromatography, and chromatofocusing. Selected fractions representing the full repertoire of proteins from each column are fractionated on the next column, and the process is repeated. Usually in conventional purification experiments, after each column separation is performed, only those fractions that contain the protein being assayed are pooled and used for subsequent rounds of purification. If, however, fractions were to be separately taken that collectively represented the full repertoire of proteins present on a column and each were fractionated in parallel by a second chromatography method, and this process were to be repeated successively, a large number of fractions would be produced that would contain purified or partly purified and separated proteins and protein complexes. It is estimated that with an optimized strategy, it should be possible to detect by mass spectrometry the majority of water soluble stable complexes present at a level of at least 10 molecules per cell by analysis of around 10,000-20,000 chromatographic fractions.
The multi-channel gel electrophoresis system is planned to fill a large role in separation and fractionation of proteins and protein complexes.
The tagless strategy involves the analysis of sets of neighboring fractions. It would be prohibitively slow with current protocols to exhaustively analyze all detectable peptides in each fraction by MS/MS sequencing. To overcome this problem, a MALDI TOF/TOF mass spectrometer is used as the principal screening tool and is linked to intelligent rapid data analysis algorithms that use information from each fraction and its neighbors to greatly reduce the number of peptides sequenced. By identifying ions in 1D MS spectra that derive from polypeptides whose identities have been determined in an earlier fraction, many ions can be eliminated from further MS/MS analysis. A critical advantage of MALDI over ESI that suits it for the present purposes is that it provides a ready means to archive samples, allowing quick and repeated return to the same fraction.
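The elimination of already-identified ions from further MS/MS analysis can be illustrated with a small Python sketch. It is a simplified example under stated assumptions, not the data analysis algorithm itself: the matching rule (a relative m/z tolerance in ppm), the 50 ppm default, the function name, and all m/z values are hypothetical.

```python
def select_precursors(peaks_mz, known_mz, tol_ppm=50.0):
    """Return the MS peaks that do not match any previously identified peptide ion.

    peaks_mz : m/z values observed in the 1D MS spectrum of the current fraction
    known_mz : m/z values of peptides already identified in an earlier fraction
    tol_ppm  : hypothetical matching tolerance in parts per million
    """
    selected = []
    for mz in peaks_mz:
        already_identified = any(
            abs(mz - known) / known * 1e6 <= tol_ppm for known in known_mz
        )
        if not already_identified:
            selected.append(mz)  # still needs MS/MS sequencing
    return selected


# Hypothetical ions identified in the preceding fraction.
identified_in_earlier_fraction = [1045.56, 1479.80]

# Hypothetical peaks in the current fraction's MS spectrum.
current_fraction_peaks = [1045.57, 1234.68, 1479.79, 2211.10]

to_sequence = select_precursors(current_fraction_peaks, identified_in_earlier_fraction)
print(to_sequence)
```

In this made-up case, two of the four peaks match ions identified earlier and are skipped, halving the number of MS/MS acquisitions for the fraction.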
The high throughput method, once implemented, would allow screening of a large number of samples that contain anywhere from a few to 20-30 polypeptides. However, it is unlikely that this pipeline will be 100% efficient in identifying all components of heteromeric complexes, and it will not provide quantitation of the relative stoichiometry of their polypeptide constituents. Therefore, once fractions containing sets of co-migrating polypeptides have been defined, a set of more standard low throughput mass spectrometry methods will be used to provide a more complete characterization of the putative protein complexes. Once established, our combined high throughput screen and final polishing pipeline will be generally useful for many applications in high throughput mass spectrometry.
All separations were performed at 4° C. and protein elution was monitored by UV at 280 nm. E. coli lysates were prepared as previously described.19 Protein extracts at 20-50 mg/ml were separated by gel filtration on either a 1.6 cm×60 cm (120 ml) or a 2.6 cm×60 cm (320 ml) Sephacryl S-400 column equilibrated with buffer A (25 mM HEPES, 10% glycerol, 0.01% NP-40, 2 mM DTT) containing 100 mM NaCl; either 50 or 500 mg protein was loaded for the small- and large-scale experiments, respectively. The high-molecular-weight fraction from each column, represented by the first of the two major UV peaks eluting from the sizing column (1/7 to 1/10 of the total protein eluted), was further separated by anion exchange chromatography using either an 8 ml or a 20 ml Mono Q column. The columns were developed with a NaCl gradient (from 100 mM to 600 mM) in buffer A that spanned 25 column volumes. For the 8 ml and 20 ml columns, the flow rates were 2 ml/min and 4 ml/min, with collection of 25% and 10% column volume fractions, respectively.
A portion of the Mono Q fractions was subjected to a further purification step using either a 1.0 cm×30 cm (24 ml) Superose 6 gel filtration column or a 0.46 cm×10 cm (1.7 ml) Source 15PHE 4.6/100 PE hydrophobic interaction column. The Source 15PHE column was first equilibrated with buffer B (25 mM HEPES, 10% glycerol, 2 mM DTT) with 1 M (NH4)2SO4. After sample loading, the column was developed with a linear gradient from buffer B with 1 M (NH4)2SO4 to buffer B without (NH4)2SO4.
Chromatography fractions were analyzed by PAGE using the Criterion Precast gel system (Bio-Rad); 4-15% SDS PAGE gradient gels and 4-20% native PAGE gradient gels were used. Gels were stained using a SilverQuest™ silver staining kit (Invitrogen).
Selected portions of the anion exchange chromatography eluates were sampled for mass spectrometry analyses at a frequency of 25% or 50% column volumes. Specifically, one in two or one in six fractions were assayed, a total of seven and fifteen fractions for the small- and large-scale experiments, respectively. The protein content of the fractions was estimated by using the Bradford assay.20 This information was used to ensure that protein digestion and derivatization for each experiment were performed at similar protein concentrations. Equal fraction volumes were digested and labeled when their respective protein concentrations were within 100% of each other. Otherwise, fraction volumes with equal protein concentrations were used as the starting material. Briefly, the proteins in each fraction were precipitated with acetone (6× volume excess), solubilized in 100 mM triethylammonium bicarbonate buffer (TEAB, pH 8.5) containing 0.1% SDS, reduced with tris-(2-carboxyethyl)phosphine (TCEP), alkylated with methyl methanethiosulfonate (MMTS) and digested with porcine trypsin (Pierce) at 37° C. overnight. The resulting tryptic peptide mixtures were derivatized with iTRAQ reagents in the TEAB buffer/80% ethanol for 1 hour at room temperature. The manufacturer's protocol for iTRAQ reagent labeling was followed; however, an approximately 4-5× higher iTRAQ reagent:protein ratio was used at the protein scale of ~20-25 μg. Post-labeling, four consecutive Mono Q fractions, each tagged with a different iTRAQ reagent, were combined to generate a multiplexed sample; consecutive multiplexed samples shared one common fraction. The sample volume was reduced to ~10-20 μL on a SpeedVac prior to one-step cation exchange chromatography, which was carried out using the resin-containing cartridge and buffers provided by the manufacturer.17 The eluates that contained the peptide mixtures were concentrated to a volume of 10-20 μL and stored at −20° C. prior to MALDI LC MS/MS analysis.
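The grouping of fractions into overlapping four-plex samples described above may be illustrated by the following minimal sketch. The function name and the handling of a final, incomplete group are assumptions of the sketch and are not taken from the experimental protocol.

```python
def build_multiplexes(fractions, plex=4, shared=1):
    """Group consecutive fractions into multiplexed samples of size `plex`,
    where each multiplex shares `shared` fraction(s) with the previous one.

    If the fractions do not divide evenly, the last group is simply shorter
    (an assumption of this sketch).
    """
    if not 0 <= shared < plex:
        raise ValueError("shared must be non-negative and smaller than plex")
    step = plex - shared
    groups = []
    for start in range(0, len(fractions), step):
        groups.append(fractions[start:start + plex])
        if start + plex >= len(fractions):
            break  # every fraction has now been assigned
    return groups


# Seven sampled fractions, as in the small-scale experiment.
for group in build_multiplexes(list(range(1, 8))):
    print(group)
```

With seven fractions the sketch yields two multiplexed samples that share fraction 4, which is the fraction later used to bring the two samples onto a common scale.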
A Pepmap C18 trap column and a nano-column (100 μm i.d., 15 cm length, Dionex/LC Packings) were used for desalting and reversed phase (RP) peptide separation, respectively. A 30 minute linear gradient from 2% B to 40% B was run at a flow rate of 500 nl/min, utilizing solvents A: 2% AcCN/0.1% trifluoroacetic acid (TFA) and B: 85% ACN/5% isopropanol, 1.0% TFA using an Ultimate LC System (Dionex/LC Packings). Reversed phase-separated peptides were collected directly onto a stainless steel MALDI target utilizing a Probot (Dionex/LC Packings) spotting robot. Column eluate was combined, in a mixing tee, with MALDI matrix (α-cyano-4-hydroxycinnamic acid, 6 mg/ml in 80% ACN/0.1% TFA/10 mM dibasic ammonium phosphate), containing 25 fmol/μl Glu-fibrinopeptide (GluFib) for internal calibration, delivered at 1 μl/min. Peptides were analyzed on 4700 and 4800 Proteomics Analyzer mass spectrometers (Applied Biosystems/MDS Sciex) in the positive ion mode. The 4700 and 4800 Proteomics Analyzers were equipped with TOF/TOF™ ion optics and a 200 Hz Nd:YAG laser.21 For collision-induced dissociation (CID), the collision cell was floated at 1 kV (4700) or 2 kV (4800), the resolution of the precursor ion selection was set to 200 and 300 FWHM for the 4700 and 4800 analyzers, respectively, and air was used as the collision gas at 5×10^−7 Torr. Automated acquisition of MS and MS/MS data was controlled by 4000 Series Explorer Software. Internal one-point calibration utilized the m/z of the monoisotopic molecular ion of GluFib that met the following acceptance criteria: S/N 50, mass error 50 ppm; when the acceptance criteria were not met, default calibration based on a plate model algorithm (Applied Biosystems) was employed.
Typical mass accuracy was within 10 ppm and 50 ppm for the internal and default calibration, respectively. Automated MS/MS data analysis was performed utilizing GPS Explorer software 3.5 with MASCOT 2.1.0 (Matrix Science) software for protein identification and quantitation of iTRAQ reporter ions. The following criteria were employed for generation of MS/MS peak list: S/N 5, m/z 50 to −20 from a precursor molecular ion, 50 peaks per 200 Da, a maximum number of peaks 80. E. coli taxonomy within Swiss Prot protein database, release 48.0 of 13 Sep. 2005 and release 49.6 of 2 May 2006, was interrogated for the data sets generated on 4700 for all 15 fractions and on 4800 for the first 10 fractions, respectively.
The following search parameters were utilized: precursor mass tolerance 50 ppm; fragment mass tolerance 0.15 Da; tryptic digestion with 2 missed cleavages; fixed modifications: S-MMTS, K-iTRAQ and N-term iTRAQ; variable modifications: deamidation (Asn and Gln); Met-sulfoxide. GPS Confidence Interval (C.I. %) of 95% was used as the acceptance criteria and hence identification of each polypeptide was based upon at least one peptide that scored above a threshold value set by the Mascot search engine to indicate identity or extensive homology of proposed sequence at p<0.05. The reported protein list was manually updated to reflect the UniProt protein entry names and accession numbers (release 53.2 of 26 Jun. 2007); EcoCyc database22 (http://ecocyc.org/) was utilized to facilitate this process. Average relative ratios were calculated for each polypeptide using the GPS Explorer 3.5 algorithm without invoking a “bias” correction option. Only peptides that were completely labeled with iTRAQ at N-termini and lysines and whose individual relative ratios were different from zero were considered while calculating protein average. The outliers were automatically excluded.
To evaluate the extent of side reactions, the data were re-analyzed by interrogation of the same database using the same parameters as described above with the exception of iTRAQ settings, this time specifying a flexible rather than a fixed modification type and allowing for tyrosine derivatization. Only a limited number of under-derivatized peptides was revealed and no hits carrying iTRAQ-labeled tyrosine were found. In order to minimize the number of overlapping precursors, precursor ion selection for MS/MS data acquisition was performed at the resolution as high as possible without significantly jeopardizing sensitivity and a filter of a minimum of 200 resolution between a target precursor and potential non-related molecular ions was applied. Nevertheless, given the complexity of the sample and the limitation of the TOF/TOF precursor ion selection window it is inevitable that some of the quantitation data might have been adversely affected by interfering ions. A potential presence of multiple precursors was not addressed by the GPS software and no systematic examination of all the data was undertaken to evaluate the extent of the possible problem. However, a limited number of MS and MS/MS spectra, predominantly those derived from proteins represented by a small number of peptides, were examined manually and in the great majority of cases, no significant level of unexplained (product ion) signals was observed (
Evidence for the identification of category [ID1] polypeptides that were matched on the basis of a single peptide, in the form of annotated MS/MS spectra, is not shown. Each MS/MS spectrum was accompanied by the following information: polypeptide ID # (see Table 3), polypeptide code name and entry name, peptide sequence, experimental m/z of the molecular ion and the error of mass measurement (in ppm). Theoretically expected masses of product ions are shown in the tables (in insets) and fragments that were detected are highlighted. The spectra were processed by Data Explorer 1.9 tools: baseline correction and noise filtering. The mass errors of the reported peptides were consistent with errors of other confidently identified peptides detected at the same MALDI target spots.
The final average relative ratios for the individual polypeptide components of each multiplexed set were normalized to the same fraction volume. Separate multiplexes were aligned and elution profiles for each polypeptide (over the entire chromatographic run) were drawn using the following procedure. The absolute values of the relative ratios measured for each polypeptide in a fraction that was shared between two adjacent multiplexed samples were equalized using the value of the precedent fraction as a reference point. The relative ratios of the same polypeptide in the remaining three fractions of the multiplexed sample were then adjusted to maintain the original ratio. Finally, relative polypeptide abundance was determined by arbitrarily assigning a value of 1.0 to the apex of each polypeptide elution peak and normalizing all other data points accordingly. When multiple apices of a contiguously eluting polypeptide were seen, the highest value within the original peak profile was used as a reference point and assigned a value of 1.0. By definition, all other apices had values that were less than 1.0 and their relative ratios corresponded to the abundance of the same polypeptide in fractions that were collected at different times. In this schema, local differences in apex values for different polypeptides were a consequence of the arbitrary method that was used to calculate elution profile and hence, they had no physical meaning. After normalizing and scaling, elution profiles were plotted as 2-D graphs where the ordinate values corresponded to the relative abundances of the component polypeptides in each fraction and the abscissa values represented the order in which the fractions eluted.
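The alignment and scaling procedure described above can be sketched, under stated assumptions, as follows. The function names, the dictionary representation of a multiplexed sample and the example ratios are hypothetical and serve only to illustrate the arithmetic; they are not the routine actually used.

```python
def stitch_multiplexes(multiplexes):
    """Join per-multiplex relative ratios of one polypeptide into one profile.

    multiplexes: list of dicts {fraction_number: relative_ratio}, in elution
    order; consecutive dicts must share exactly one fraction. The value in
    the earlier multiplex is used as the reference for the shared fraction.
    """
    profile = dict(multiplexes[0])
    for multiplex in multiplexes[1:]:
        shared = [f for f in multiplex if f in profile]
        if len(shared) != 1:
            raise ValueError("adjacent multiplexes must share exactly one fraction")
        anchor = shared[0]
        if multiplex[anchor] == 0:
            raise ValueError("shared fraction has zero ratio; cannot rescale")
        # Rescale so the shared fraction takes the same value in both samples,
        # preserving the ratios among the other three fractions.
        scale = profile[anchor] / multiplex[anchor]
        for fraction, ratio in multiplex.items():
            if fraction != anchor:
                profile[fraction] = ratio * scale
    return profile


def normalize_to_apex(profile):
    """Scale a stitched profile so that its highest point equals 1.0."""
    apex = max(profile.values())
    if apex <= 0:
        raise ValueError("profile has no positive apex")
    return {fraction: profile[fraction] / apex for fraction in sorted(profile)}


# Hypothetical ratios for one polypeptide in two adjacent multiplexed samples.
first = {1: 1.0, 2: 2.0, 3: 4.0, 4: 2.0}
second = {4: 1.0, 5: 0.5, 6: 0.25, 7: 0.125}
print(normalize_to_apex(stitch_multiplexes([first, second])))
```

Because each profile is scaled to its own apex, apex heights of different polypeptides are not comparable, consistent with the statement above that such differences carry no physical meaning.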
To identify putative protein complexes, a comparison of polypeptide elution profiles was performed within all the fractions where the polypeptide was observed. Average relative ratios calculated for each polypeptide by the GPS Explorer 3.5 algorithm that were normalized and scaled, as described above, were employed for clustering analysis. The first step was to identify all valid profile peaks using the following process: (i) find the center, left and right edges for all elution peaks for all polypeptides using a simple peak detection algorithm developed in our laboratory, and (ii) filter out the noise. The latter was accomplished by examination of the peak intensity ratios relative to the highest peak in the same polypeptide profile (R1) and relative to the intensities of its own left and right edges (R2). If either of the ratios, R1 or R2, was at or below its threshold (R1≦0.15 or R2≦1.20), the peak was classified as noise. The R1 and R2 threshold values depend on the data complexity and quality and might need further tuning as the data size grows. Once a set of elution peaks of all polypeptides was established, Pearson correlation coefficients between any two peaks that overlap significantly were calculated. In this work, Pearson correlation coefficients were used as a measure of similarity between two peaks; see the formula below, where (x1, x2, x3, . . . xn) and (y1, y2, y3, . . . yn) are normalized intensities of peaks x and y across fractions 1, 2, . . . n, and X and Y are their average intensities across all n fractions, respectively.
$$ r = \frac{\sum_{i=1}^{n} (x_i - X)(y_i - Y)}{\sqrt{\sum_{i=1}^{n} (x_i - X)^2 \, \sum_{i=1}^{n} (y_i - Y)^2}} $$
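As a non-limiting sketch, the noise filter and the similarity measure described above might be written as follows. The peak detection algorithm itself is not reproduced. How the left and right edge intensities are combined into the single ratio R2 is not specified in the text, so comparing the apex with the higher of the two edges is an assumption of this sketch, as are the function names.

```python
import math


def is_noise(apex, left_edge, right_edge, profile_max,
             r1_threshold=0.15, r2_threshold=1.20):
    """Classify an elution peak as noise using the R1 and R2 ratios.

    R1: peak apex relative to the highest peak in the same profile.
    R2: peak apex relative to its own edges; here the higher edge is used
        (assumption of this sketch).
    """
    r1 = apex / profile_max
    highest_edge = max(left_edge, right_edge)
    r2 = math.inf if highest_edge == 0 else apex / highest_edge
    return r1 <= r1_threshold or r2 <= r2_threshold


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized intensities of two peaks over four fractions.
print(is_noise(apex=0.10, left_edge=0.05, right_edge=0.02, profile_max=1.0))
print(round(pearson([0.1, 0.5, 1.0, 0.4], [0.2, 0.6, 1.0, 0.5]), 3))
```

A peak that survives the filter would then be compared, by way of the correlation coefficient, with every other retained peak that overlaps it significantly.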
The clustering analysis routine was based on an algorithm originally developed for evaluation of gene expression profiles (http://genetics.stanford.edu/˜sherlock/cluster.html). This algorithm was customized to accommodate our polypeptide elution profile data. Mathematical averages of coefficient values of clustered peaks were used as the metrics for similarity measurement. Based on these criteria, a putative complex is called if the average Pearson coefficient of a cluster of polypeptides exceeds a threshold value of 0.92.
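The final calling step may be illustrated by the following minimal sketch, which averages the pairwise Pearson coefficients of a candidate cluster and compares the average with the 0.92 threshold. It does not reproduce the customized clustering algorithm; the function names, the use of all pairwise coefficients and the example profiles are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_putative_complex(profiles, threshold=0.92):
    """Call a putative complex if the mean pairwise Pearson coefficient of
    the clustered elution peaks exceeds `threshold`.

    profiles: dict {polypeptide_name: [normalized intensities per fraction]},
    all measured over the same fractions.
    """
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        return False  # a single polypeptide cannot form a cluster
    mean_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return mean_r > threshold


# Hypothetical profiles: A and B co-elute, C elutes earlier.
candidates = {
    "A": [0.1, 0.5, 1.0, 0.4],
    "B": [0.2, 0.6, 1.0, 0.5],
    "C": [1.0, 0.4, 0.1, 0.05],
}
print(is_putative_complex({k: candidates[k] for k in ("A", "B")}))
print(is_putative_complex(candidates))
```

In this illustration the pair A and B would be called as a putative complex, whereas adding the non-co-eluting polypeptide C lowers the average coefficient below the threshold.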
To provide a first proof-of-principle that the strategy allows protein complexes to be detected, the inventors first quantitated the relative levels of five RNA polymerase subunits across a series of Mono Q anion exchange fractions.16 In this and subsequent iTRAQ analyses, fractions were sampled at a frequency such that they were separated by at least one fraction and by no more than 25%-50% of a column volume, as this was found to provide sufficient resolution to detect co-migration of polypeptides belonging to known complexes. The fractions themselves were quite heterogeneous, being derived from only two chromatography steps, and contained a broad mixture of many polypeptides (
The above results suggest that iTRAQ quantitation is sufficiently accurate to detect co-migrating complex components. Since the fractions analyzed contain far more proteins than the highly purified fractions envisioned being assayed in our finalized tagless strategy protocol, the fact that the iTRAQ-based method was effective in these less than optimal circumstances was encouraging.
As part of a larger program to characterize and image the ensemble of macromolecular complexes in Desulfovibrio vulgaris Hildenborough (PCAP at LBNL), a bacterium of potential use in bioremediation of soils contaminated by toxic heavy metals (8-10), we have undertaken a survey of the most abundant complexes that are large enough to be distinguishable from one another within tomographic reconstructions of single cells. A total of 15 different macromolecular complexes with particle weights of at least 400 kDa were isolated by a “tagless” strategy (11), which is “unbiased” in the sense that it makes no prior assumptions about which protein complexes should be purified. Instead, purification used a high-throughput pipeline that includes differential solubility in ammonium sulfate, ion-exchange chromatography, hydrophobic interaction chromatography, and size-exclusion chromatography. In addition, DvH ribosomes were isolated by a special-purpose protocol similar to that used for the purification of 70 S E. coli ribosomes.
Because the collection of multiprotein complexes within DvH had not been cataloged, we used a “tagless” method to purify, identify, and structurally characterize those complexes that remain stable upon cell lysis. This method makes no assumptions about what proteins might exist in the form of multiprotein complexes, or what the subunit stoichiometries and quaternary structures of these complexes should be. Instead, comigrating protein subunits are separated on the basis of their physical properties, and the constituent polypeptides are identified by mass spectrometry. The resulting proteomic survey reported here is intentionally limited to complexes with molecular mass greater than ≈400 kDa and copy number greater than ≈100 per cell, because these would be the easiest ones to identify in EM tomograms due to their size and abundance. One of the complexes (phosphoenolpyruvate synthase) that was isolated in this way proved, however, to be a 265 kDa homodimer, which eluted during size-exclusion chromatography (SEC) as an ≈370 kDa particle. Electron microscopy of this particle subsequently showed it to have an elongated shape, thus explaining its anomalously high apparent molecular weight in SEC.
The biochemical identities and subunit compositions of the 15 “largest, most abundant particles” that we found within DvH are given in Table 6. Three of this set proved to be homo-oligomeric complexes of proteins (DVU0631, DVU0671, and DVU1012, respectively) for which no biochemical function could be identified or for which only weak similarity to proteins with known functions could be detected. Ten of the remaining 12 protein complexes whose biochemical functions could be identified with confidence are ones involved either in energy metabolism or in pathways of intermediary metabolism. The two remaining particles, GroEL and RNA polymerase, were already expected to be among the set of abundant particles in the desired size range.
Three-dimensional reconstructions were obtained at a resolution of 3 nm or better for 70 S ribosomes in addition to 7 of the 15 complexes purified by the high-throughput, tagless pipeline. In addition, the values of particle weight obtained by size-exclusion chromatography and native gel electrophoresis were used to estimate the subunit stoichiometries of those complexes for which single-particle EM reconstructions were not successful. The eight successful 3D reconstructions (images not shown) illustrate that each such particle has a characteristic size and shape by which it could be identified. The extent to which diverse particles can be distinguished on the basis of their sizes and shapes supports the proposal that it will be possible to identify and localize a large number of different macromolecular complexes within cryo-EM tomograms, provided that these are obtained with a resolution in the range of 3 nm or better.
The preparation of samples for electron microscopy does not always produce specimens suitable for obtaining three-dimensional reconstructions, and as a result structures were not obtained for 8 of the 15 complexes purified by the tagless approach. In some cases it appeared that the particles might be inherently flexible or polymorphic in structure, but in other cases we believe that the particles were easily damaged at some step during preparation for electron microscopy. Our success rate in producing informative 3D reconstructions is nevertheless at least 10 times higher than that reported in an earlier survey of complexes in the yeast proteome (12), possibly because of our focus on characterizing only the largest such complexes. In addition, we took further time to optimize the details of preparing EM grids for each type of protein whenever the initial results looked promising, but there nevertheless was more heterogeneity than expected. Although the fraction of purified complexes for which we were able to get good three-dimensional reconstructions was thus relatively high, we believe that generic improvements in preparing single-particle samples for electron microscopy (rather than further biochemical purification of samples) could further improve the success rate and throughput.
Apart from GroEL and the 70 S ribosome, all of the remaining complexes whose biochemical identities can be assigned with confidence were found to have subunit stoichiometries or quaternary structures that are not fully conserved, even within bacteria, as is shown in columns 7 and 8 of Table 6. The extent to which quaternary structures vary between different bacteria is quite surprising, because tertiary structure is normally well conserved over great evolutionary distances and because the quaternary structures of some homomeric (e.g. GroEL) and heteromeric (e.g. RNAP core enzyme) protein complexes have been found to be conserved over long evolutionary distances.
The striking nature of our observation is highlighted by a further description of the following four examples. First, the majority of DvH RNAP II is purified as an unusual complex containing two copies of both the core enzyme and NusA (particle E shown in
Cell culture and biomass production. Protein complexes were isolated from cells grown as mid-logarithmic cultures in 5-L or 400-L fermentors, which were run as turbidostats. As mentioned above, up to 4 orthogonal separation methods were used to purify multiprotein complexes solely on the basis of differences in their physical properties. The subunit compositions of samples containing purified complexes that ran on native-gel electrophoresis as predominantly a single band with Mr>400 k were characterized by SDS PAGE, and mass spectroscopy was used to identify the component proteins. Further details about cell growth, the purification of each respective complex, and the identification of proteins by mass spectroscopy are provided in Han et al., “Survey of large protein complexes in D. vulgaris reveals great structural diversity,” Proc Natl Acad Sci USA. 2009 Sep. 29; 106(39): 16580-16585, published online 2009 Sep. 11 hereby incorporated by reference in its entirety for all purposes.
D. vulgaris Hildenborough (DvH) (ATCC 29579) was obtained from the American Type Culture Collection (Manassas, Va.). A defined lactate-sulfate medium, LS4D (3), is used in all cultures. The medium is sterilized by autoclaving for 45 minutes at 121° C. Before inoculation, phosphate, vitamins and reducing agent (titanium citrate) are added to the medium. Stock cultures of DvH were prepared by growing the ATCC culture to log phase, and storing at −80° C. Starter culture is prepared inside an anaerobic chamber (Coy Laboratory Products, Inc., Grass Lake, Mich.) using stock culture at a ratio of 1 ml stock/100 ml LS4D. The starter culture is incubated at 30° C. and allowed to grow for 48 hrs to log phase (optical density at 600 nm of ~0.3-0.4; ~3×10^8 cells/ml). From the starter culture, a 10% subculture for inoculating the production culture is made in LS4D, in the anaerobic chamber, and incubated at 30° C. until log phase growth is reached (around 15 hours).
The production culture is grown in 5 L customized fermentors (Electrolab, Fermac 360, United Kingdom), run as turbidostats. PEEK headplates and agitators were specially manufactured so that there are no metallic wetted parts. The fermentor is autoclaved with 4.5 L LS4D medium and cooled on the bench under a nitrogen gas blanket. Once cooled, vitamins, phosphate and reducing agent are injected into the fermentor, followed by a ten percent subculture (500 mL). The fermentor is continuously agitated at 200 rpm, maintained at 30° C., with nitrogen flowing through the headspace at 100 mL/min. Once log phase is reached, fresh medium is pumped to the fermentor at a dilution rate of 0.3 l/hr, maintaining an optical density of 0.6 (at 600 nm). The effluent passes through a chilling coil and is collected in a 20 L carboy where the temperature is maintained at 2-4° C. Effluent is collected over 12-15 hours, and then centrifuged at 11,000 g for 10 minutes, with refrigeration at 4° C. (Beckman Coulter, Avanti J-25). The supernatant is discarded, and the pellets are stored at −80° C. until further processing.
Purification of protein complexes. Overview. The tagless purification strategy was based on that previously described in the Examples above. All complexes were purified from cells derived from either a small scale culture of 20 L or a large scale culture of 400 L. Proteins were first bound to and then batch eluted from a Q-Sepharose clean-up column to remove many nonprotein impurities. The 400 L scale preparations were then fractionated into six parts by ammonium sulfate precipitation. The ammonium sulfate fractions from the large scale preparation or the cleaned-up small scale preparations were then fractionated by MonoQ chromatography. All the fractions from each MonoQ column were analyzed by both native and SDS PAGE to identify abundant protein bands that migrated at approximately 400 kDa or greater (
Experimental Methods. Extracts were prepared as described previously in Garczarek F, et al. (2007) Octomeric pyruvate-ferredoxin oxidoreductase from Desulfovibrio vulgaris, Journal of Structural Biology 159(1):9-18, hereby incorporated by reference. The 20 L bacterial cultures yielded crude extracts containing 340 mg of protein and the 400 L cultures yielded 10 g of protein. Chromatography was done using an AKTA FPLC system. All chromatography columns and media were from GE Healthcare. All separations were performed at 4° C. except hydrophobic interaction chromatography (HIC), which was run at room temperature. The concentrations of proteins were monitored by UV light at 280 nm. Mixtures of two buffers were used for ion exchange chromatography (IEC) and HIC. For IEC, buffer A contained 25 mM HEPES pH 7.6, 0 M NaCl, 10% (v/v) glycerol, 2 mM DTT, 0.01% (v/v) NP-40 and buffer B contained buffer A plus 1 M NaCl. For HIC, buffer A′ contained 25 mM HEPES pH 7.6, 10% (v/v) glycerol, 2 mM DTT and buffer B′ contained buffer A′ plus 2 M (NH4)2SO4. For SEC, the buffer used contained 25 mM HEPES pH 7.6, 0.05 M NaCl, 10% (v/v) glycerol, 2 mM DTT, 0.01% (v/v) NP-40.
Q-Sepharose clean-up: Protein extract supernatants were loaded onto either a 1.6×20 cm (small scale) or 5.0×30 cm (large scale) Q-Sepharose Fast Flow column equilibrated with 5% buffer B, and the bound proteins were eluted together with 50% buffer B. All fractions containing significant amounts of protein were pooled. The total protein amount obtained was 240 mg and 7 g for the small and large scale preparations respectively.
Ammonium sulfate precipitation: After the Q-Sepharose clean-up step, the large scale extract was fractionated into 6 parts by ammonium sulfate precipitation: 0-38%, 38-48%, 48-53%, 53-57%, 57-63% and greater than 63% ammonium sulfate saturation. Each cut, which contained between 568 mg and 1028 mg protein, was desalted into 5% buffer B by buffer exchange using a G25 desalting column (5.0×30 cm).
Anion exchange chromatography: The small scale extracts obtained after the clean-up step were applied to a 1.6×10 cm, 20 ml MonoQ column. Each desalted ammonium sulfate precipitation cut from the large scale preparations was loaded onto a 3.5×10 cm, 96 ml MonoQ column. All MonoQ columns were pre-equilibrated with 5% buffer B and developed with a linear gradient from 5% to 50% buffer B in 25 column volumes. For the 20 ml and 96 ml columns, the flow rates were 4 ml/min and 10 ml/min and the fraction sizes were 4 ml and 24 ml, respectively.
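For orientation, the gradient volume, run time and number of fractions implied by the column parameters above can be worked out with a short calculation. The function name is hypothetical, and the sketch counts only the gradient portion of each run.

```python
def gradient_plan(column_volume_ml, flow_ml_per_min, fraction_ml, gradient_cv=25):
    """Return gradient volume, run time and fraction count for one column.

    gradient_cv is the gradient length in column volumes (25 in the text).
    Only complete fractions collected during the gradient are counted.
    """
    gradient_ml = column_volume_ml * gradient_cv
    return {
        "gradient_volume_ml": gradient_ml,
        "run_time_min": gradient_ml / flow_ml_per_min,
        "n_fractions": int(gradient_ml // fraction_ml),
    }


# Small scale: 20 ml MonoQ column, 4 ml/min, 4 ml fractions.
print(gradient_plan(20, 4, 4))
# Large scale: 96 ml MonoQ column, 10 ml/min, 24 ml fractions.
print(gradient_plan(96, 10, 24))
```

Under these figures the small scale gradient spans 500 ml (125 fractions) and each large scale gradient spans 2400 ml (100 fractions).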
Protein complex survey: To quickly locate high abundance large molecular weight protein complexes, the Mono Q fractions were analyzed by native PAGE (e.g.
Protein complex molecular weight calculation: The molecular weights of purified protein complexes were determined from their migration on a 1.0×30 cm Superose6 column or a 1.6×60 cm Superdex200 column in SEC buffer. The molecular weight standards used to calibrate the SEC column were BSA (67 kDa), aldolase (158 kDa), catalase (223 kDa), ferritin (440 kDa), and thyroglobulin (669 kDa).
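One common way to convert SEC migration into an apparent molecular weight is a linear fit of log10(molecular weight) against elution volume for the standards; the following sketch assumes that approach, which is not stated explicitly above. The molecular weights of the standards are those listed, whereas the elution volumes and the function names are hypothetical placeholders.

```python
import math


def fit_sec_calibration(standards):
    """Least-squares fit of log10(MW) = slope * volume + intercept.

    standards: list of (elution_volume_ml, molecular_weight_kda) pairs.
    """
    volumes = [v for v, _ in standards]
    log_mw = [math.log10(mw) for _, mw in standards]
    n = len(standards)
    mean_v = sum(volumes) / n
    mean_l = sum(log_mw) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes)
    sxy = sum((v - mean_v) * (l - mean_l) for v, l in zip(volumes, log_mw))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_v
    return slope, intercept


def estimate_mw_kda(elution_volume_ml, slope, intercept):
    """Apparent molecular weight (kDa) at a given elution volume."""
    return 10 ** (slope * elution_volume_ml + intercept)


# Standards from the text; the elution volumes are HYPOTHETICAL placeholders.
standards = [
    (16.5, 67.0),    # BSA
    (15.2, 158.0),   # aldolase
    (14.6, 223.0),   # catalase
    (13.4, 440.0),   # ferritin
    (12.3, 669.0),   # thyroglobulin
]
slope, intercept = fit_sec_calibration(standards)
print(round(estimate_mw_kda(14.0, slope, intercept)))
```

Such a calibration yields an apparent weight only; an elongated particle elutes earlier than a globular particle of the same mass and is therefore overestimated, as noted above for phosphoenolpyruvate synthase.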
Protein copy number estimation: The copy numbers of protein complexes per cell listed in Table 1 were estimated from the amount of protein in the flow through of the QSepharose cleanup column and the Mono Q fractions; the estimated yield of total protein present after chromatography; and the number of cells used in the preparation. The amount of each complex in the MonoQ fractions or the Q-Sepharose flow through was estimated from native PAGE by comparing the target protein bands with known amounts of a BSA standard.
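The arithmetic behind such an estimate can be sketched as follows. The function name and all numerical inputs in the example are hypothetical; the sketch simply converts a recovered mass into molecules, corrects for the estimated yield, and divides by the number of cells.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(complex_mass_mg, complex_mw_kda, n_cells, recovery_fraction=1.0):
    """Estimate the copy number per cell of a purified complex.

    complex_mass_mg:   mass of the complex estimated from native PAGE
    complex_mw_kda:    molecular weight of the complex
    n_cells:           number of cells used in the preparation
    recovery_fraction: estimated yield of protein after chromatography (0-1)
    """
    if not 0 < recovery_fraction <= 1:
        raise ValueError("recovery_fraction must be in (0, 1]")
    if complex_mw_kda <= 0 or n_cells <= 0:
        raise ValueError("molecular weight and cell number must be positive")
    grams_in_cells = (complex_mass_mg * 1e-3) / recovery_fraction
    moles = grams_in_cells / (complex_mw_kda * 1e3)  # 1 kDa = 1000 g/mol
    return moles * AVOGADRO / n_cells


# Hypothetical inputs: 1 mg of a 500 kDa complex, 50% yield, 1e13 cells.
print(round(copies_per_cell(1.0, 500.0, 1e13, recovery_fraction=0.5)))
```

With these illustrative inputs the estimate is on the order of a few hundred copies per cell, which is the abundance range targeted by the survey.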
Electrophoresis and silver staining: Chromatographic fractions were analyzed by PAGE using Criterion Precast gels (Bio-Rad): 4-15% gradient gels for native PAGE and 4-20% gradient gels for SDS PAGE. Gels were stained using a SilverQuest™ silver staining kit (Invitrogen).
Identification of protein components by mass spectrometry. Reagents used. ACS/HPLC grade acetonitrile (AcCN) and HPLC water were from Honeywell Burdick & Jackson; trifluoroacetic acid (TFA) was from Pierce; Suprapur formic acid was from EMD Biosciences; sequencing grade modified porcine trypsin was from Promega; C18 ziptips and MultiScreen IP 0.45 μm Clear Non-sterile plates were from Millipore; guanidine hydrochloride, [tris-(2-carboxyethyl)-phosphine], iodoacetamide, polyvinylpyrrolidone 360 and ammonium bicarbonate were from Sigma.
Protein digestion. In-gel digestion of candidate proteins was performed according to the established protocol (7). Modified porcine trypsin from Promega was used at a final concentration of 12.5 ng/μl. In a few cases, polypeptide components of protein complexes were not separated on the gel but were directly digested with trypsin utilizing a 96-well PVDF plate format that we adapted from Papac et al. (8). Briefly, protein was captured onto the PVDF membrane of a MultiScreen IP 0.45 μm Clear Non-sterile plate, thoroughly washed, reduced and alkylated with iodoacetamide. The membrane was then blocked with polyvinylpyrrolidone 360, trypsin was added and digestion proceeded at 37° C. for 4 hr. Mixtures of proteolytic peptides were desalted using C18 ziptips, and peptides were eluted with 50% AcCN/0.1% TFA.
Sample preparation for MS. For peptide mass fingerprinting (PMF) (9-13) and MS/MS analyses, desalted mixtures of proteolytic peptides were mixed with matrix solution (α-cyano-4-hydroxycinnamic acid 5 mg/ml in 50% ACN/0.1% TFA/10 mM dibasic ammonium phosphate) at a 1:1 ratio directly on a stainless steel target. For MALDI LC MS/MS analysis, samples were separated off-line, as reported previously (4), with the modifications outlined below. An Ultimate 3000 HPLC (Dionex Corporation, Sunnyvale, Calif., USA) that was custom plumbed to accommodate a dual parallel column arrangement was employed. Tryptic digests were separated on monolithic columns (200 μm I.D., 5 cm length, LC Packings, Dionex Corporation, Sunnyvale, Calif., USA) that alternated between a separation stage and a clean up/re-equilibration stage. Following a 5 min isocratic step at 0% B, a linear gradient of 0-70% B in 14 min at a flow rate of 2.5 μl/min was used (A: 0.05% TFA; B: 95% AcCN/0.05% TFA). A SunCollect spotter (SunChrom, Friedrichsdorf, Germany) was used to collect eluate at a rate of one fraction (spot) per five seconds; collection started at 9 min and ended at 19.8 min, counting from the point of injection (129 spots total). Matrix was delivered at a 2.5 μl/min rate and mixed with the column eluate right before spotting onto the MALDI target. MALDI TOF MS and MS/MS. An Applied Biosystems 4800 Proteomics Analyzer (AB 4800) mass spectrometer (Applied Biosystems, Foster City, Calif., USA/MDS Sciex, Concord, ON, Canada) equipped with TOF/TOF™ ion optics and a 200 Hz Nd:YAG laser (14) and controlled by 4000 Series Explorer Software V3.5.28193 was utilized. MS settings were: m/z range=800-6000 Da; total shots per spectrum=800-1500; single shot protection on (signal intensity range=0-95000); fixed laser intensity=3800-4500. MS/MS data were generated using collision-induced dissociation (CID).
MS/MS settings were: m/z range=[60-(10% below the precursor m/z)]; resolution of precursor ion selector=400 FWHM; metastable suppressor: on; total shots per spectrum=1500-4000 with stop conditions (a maximum of 1500 shots collected for spectra containing >6 peaks with S/N >80); fixed laser intensity=4700-5500; the collision cell was floated at 1 kV; no collision gas was used. The AB 4800 MS mode was externally calibrated using Plate Model and Default MS Calibration Update software with a combination of six peptide standards (des-Arg1-bradykinin, angiotensin I, Glu1-fibrinopeptide B and three ACTH clips: 1-17, 18-39 and 7-38), with the requirement that at least four standards pass the criteria of S/N of 300, mass tolerance of 0.5 Da, and maximum outlier error of 25 ppm. Default calibration of AB 4800 MS/MS data was based on a minimum of five matched fragment ions of angiotensin I detected with a minimum S/N of 120, mass tolerance of 2 Da and maximum outlier error of 20 ppm. Automated acquisition of MS and MS/MS data in the batch mode employed an interpretation method with the following settings: number of shots per spot=12; minimum S/N filter=50-80; minimum chromatogram peak width=1 fraction; resolution of precursor exclusion window=200 FWHM; trypsin autolysis peaks were excluded.
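The calibration acceptance rule stated above (at least four of six standards meeting the S/N, mass tolerance and ppm criteria) can be expressed as a simple check. The sketch below is a simplified reading of those criteria and not the logic of the instrument software; the class and function names are hypothetical, and the m/z values in the usage example are placeholders rather than reference masses of the standards.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StandardPeak:
    name: str
    theoretical_mz: float
    observed_mz: float
    signal_to_noise: float


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_passes(peak: StandardPeak, min_sn: float = 300.0,
                tol_da: float = 0.5, max_ppm: float = 25.0) -> bool:
    """Apply the three stated criteria to one calibration standard."""
    within_da = abs(peak.observed_mz - peak.theoretical_mz) <= tol_da
    within_ppm = abs(ppm_error(peak.observed_mz, peak.theoretical_mz)) <= max_ppm
    return peak.signal_to_noise >= min_sn and within_da and within_ppm


def calibration_accepted(peaks: Iterable[StandardPeak], min_passing: int = 4) -> bool:
    """Accept the calibration when enough standards pass."""
    return sum(peak_passes(p) for p in peaks) >= min_passing


if __name__ == "__main__":
    # Placeholder values for illustration only.
    peaks = [
        StandardPeak("std1", 1000.0, 1000.01, 500.0),   # 10 ppm, passes
        StandardPeak("std2", 1500.0, 1500.02, 450.0),   # about 13 ppm, passes
        StandardPeak("std3", 2000.0, 2000.01, 800.0),   # 5 ppm, passes
        StandardPeak("std4", 2500.0, 2500.05, 350.0),   # 20 ppm, passes
        StandardPeak("std5", 1000.0, 1000.05, 600.0),   # 50 ppm, fails
        StandardPeak("std6", 3000.0, 3000.01, 100.0),   # low S/N, fails
    ]
    print(calibration_accepted(peaks))
```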
MS and MS/MS data analysis. PMF: Mass spectra were processed (baseline adjustment, noise filtering and monoisotopic peak filtering) using Data Explorer Software (Applied Biosystems, Foster City, Calif., USA/MDS Sciex, Concord, ON, Canada) to produce a list of monoisotopic molecular ion masses. Monoisotopic mass peak lists were submitted to the Aldente search engine (15, 16) for protein identification. A combination of two taxa, Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough (DvH) and mammalia (taxon 40674), was searched within UniProtKB/Swiss-Prot (Release 54.8 of 5 Feb. 2008) and UniProtKB/TrEMBL (Release 37.8 of 5 Feb. 2008) using the following parameters: enzyme: trypsin; one missed cleavage; fixed modification on Cys: carbamidomethyl (1 allowable; scoring factor 0.9); variable modification on Met: methionine sulfoxide (2 allowable; scoring factor 0.9); thresholds: shift=0.2, slope=200, error=25, minimum hits=4; mass range: 0-250,000 for all polypeptides except DVU101, for which a mass range of 0-350,000 was used. Polypeptide identification was considered confident when its score was higher than a threshold value equal to the score generated by searching a random database, using a pValue of 0.05 as the cutoff point; the pValue was the probability of finding, for a given spectrum, a protein with the same score in a random protein database. Identities of selected polypeptides that demonstrated relatively low (DVU0460) or below-threshold (DVU3242) scores were confirmed by MS/MS.
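The mass-matching idea behind peptide mass fingerprinting can be illustrated with a short Python sketch: an in-silico tryptic digest allowing one missed cleavage, monoisotopic [M+H]+ masses with carbamidomethylated Cys, and a tolerance match against an observed peak list. This is not the Aldente scoring algorithm. The function names, the rule of not cleaving before proline, and the use of a ppm tolerance are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # fixed modification on Cys


def tryptic_peptides(seq: str, missed_cleavages: int = 1) -> list:
    """Cleave after K or R (assumed: not before P), allowing missed cleavages."""
    cuts = [0]
    for i, aa in enumerate(seq[:-1]):
        if aa in "KR" and seq[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(seq))
    peptides = []
    for i in range(len(cuts) - 1):
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            peptides.append(seq[cuts[i]:cuts[j]])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ with every Cys carbamidomethylated."""
    residues = sum(MONO[aa] for aa in peptide)
    return residues + WATER + PROTON + peptide.count("C") * CARBAMIDOMETHYL


def match_masses(observed: list, theoretical: list, tol_ppm: float = 25.0) -> list:
    """Return (observed, theoretical) pairs that agree within tol_ppm."""
    pairs = []
    for obs in observed:
        for theo in theoretical:
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                pairs.append((obs, theo))
    return pairs


if __name__ == "__main__":
    peptides = tryptic_peptides("AKCR", missed_cleavages=1)
    masses = [mh_plus(p) for p in peptides]
    print(list(zip(peptides, masses)))
    print(match_masses([218.1500], masses))
```

A variable modification such as methionine sulfoxide could be handled by also generating a second mass for each Met-containing peptide; that extension is omitted here to keep the sketch short.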
MS/MS data were manually matched to the expected sequences. In accordance with the guidelines for publication of proteomics data (17), detailed information on the MS evidence leading to polypeptide identification is provided in Table S1 and Figures S20 to S26, as indicated below, including PMF data on PMF-only identifications and MS/MS data on identifications based upon single peptides (“one hit wonders”).
LC MALDI MS/MS: Data analysis was performed using ProteinPilot software (Version 2.0, Revision 50861, Applied Biosystems, Foster City, Calif., USA/MDS Sciex, Concord, ON, Canada) with the Paragon search engine (18). A custom database that contained all DvH polypeptides and a selection of common contaminants, the latter from Applied Biosystems, was interrogated. The following parameters were used for the ProteinPilot search: Sample Type: protein identification; Cys alkylation: iodoacetamide; ID Focus: biological modifications and amino acid substitutions; Species: none; Search Effort: thorough; Detection Protein Threshold: 1.3 (95%). Hits were considered to be of high confidence if at least one of at least two distinct peptides had a score of 2 (99% confidence). Polypeptides identified on the basis of less stringent criteria are also reported; their diagnostic MS/MS spectra are not shown.
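The score and confidence pairs quoted above (1.3 with 95%, 2 with 99%) are consistent with a score defined as the negative base-10 logarithm of one minus the confidence. The Python sketch below shows that conversion together with one possible reading of the high-confidence rule. The function names and the dictionary input format are hypothetical and are not part of ProteinPilot.

```python
import math


def confidence_from_score(score: float) -> float:
    """Confidence (0-1) implied by a score, assuming score = -log10(1 - confidence)."""
    return 1.0 - 10.0 ** (-score)


def score_from_confidence(confidence: float) -> float:
    """Inverse of confidence_from_score; confidence must be below 1."""
    return -math.log10(1.0 - confidence)


def is_high_confidence(peptide_scores: dict, min_peptides: int = 2,
                       min_best_score: float = 2.0) -> bool:
    """One reading of the rule: at least two distinct peptides, and at
    least one of them with a score of 2 or more.

    peptide_scores maps a distinct peptide sequence to its score.
    """
    if len(peptide_scores) < min_peptides:
        return False
    return max(peptide_scores.values()) >= min_best_score


if __name__ == "__main__":
    print(round(confidence_from_score(1.3), 3))
    print(round(confidence_from_score(2.0), 3))
    print(is_high_confidence({"PEPTIDEA": 2.0, "PEPTIDEB": 0.5}))
```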
Electron microscopy. Aliquots of the purified complexes were examined by single-particle electron microscopy (EM) (29) of negatively stained samples. Uranyl acetate was used as the negative stain in the majority of cases, but ammonium molybdate was tried as a second choice when the results obtained with uranyl acetate were not acceptable. Particles were selected from areas of relatively thick stain in order to minimize the risk of flattening of particles, and images were recorded on film, using a JEOL 4000 microscope operated at 400 keV. Initial models of particle structures were obtained by the random conical tilt (RCT) method (30) whenever neither low-pass filtered density maps of homologous structures (e.g., the 70S ribosome) nor intuitive models were an option. Further details are provided in Han et al., Proc Natl Acad Sci USA. 2009 Sep. 29; 106(39): 16580-16585, including polypeptide sequences, identifying spectra, representative micrographs, details of the reconstruction and refinement strategies, evaluation of the resolution of reconstructions by means of the FSC curve, and validation of results whenever possible by docking either known structures or homology models.
The above examples are provided to illustrate the invention but not to limit its scope. Other variants of the invention will be readily apparent to one of ordinary skill in the art and are encompassed by the appended claims. All references, publications, databases, and patents cited herein are hereby incorporated by reference for all purposes.
aProtein identification based on one and two or more peptides for [ID1] and [ID2+] categories,
bNumber of iTRAQ-labeled peptides that contributed to a calculation of average protein relative
cUnique non-overlapping peptides were used to calculate protein sequence coverage defined as
dGPS software (Applied Biosystems) score: confidence interval (% CI) above 95% signifies that
eProtein complex data based upon the content of the Encyclopedia of E. coli K-12 Genes and
fProtein-protein interaction data based upon the study of Butland et al.15
gElution profiles were compared using a modified Pearson's algorithm and clusters were defined
indicates data missing or illegible when filed
aProtein-protein interaction data based upon the study of Butland et al.15
bProtein identification based on one and two or more peptides for [ID1] and [ID2+] categories, respectively. MS/MS spectra of polypeptides matched to a single peptide are shown in Supplemental FIG. 3.
cUnique non-overlapping peptides were used to calculate protein sequence coverage defined as the ratio between the sum of amino acids encompassed by the confidently matched peptides (% CI > 95%) and the number of amino acids in a polypeptide sequence. For polypeptides observed in more than one four-plex, the best four-plex data are shown.
dElution profiles were compared using modified Pearson's algorithm and clusters were defined employing a threshold of 0.92. Cluster ID “0” means that no partners were found for a given polypeptide.
ePolypeptides shared at least one apex of elution.
fPolypeptides eluted in the same fractions.
gA summary of all TAP-derived protein-protein interactions. The following format was used: bN(R_D)_pM, where b = bait; N = number of interactions reported for the polypeptide acting as a bait; R = number of reciprocal interactions reported for the bait; D = number of reciprocal interactions with partners detected in our study; p = prey; M = number of interactions detected for the polypeptide as a prey only.
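Two quantities defined in the footnotes above, sequence coverage and the correlation threshold used to group elution profiles, can be sketched in Python as follows. The footnotes refer to a modified Pearson's algorithm whose details are not given here, so the sketch uses the ordinary Pearson correlation coefficient as a stand-in. All function names are hypothetical.

```python
import math


def sequence_coverage(protein: str, peptides: list) -> float:
    """Fraction of residues covered by at least one matched peptide.

    Each residue is counted once, so overlapping peptides do not
    inflate the result.
    """
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


def pearson(x: list, y: list) -> float:
    """Ordinary Pearson correlation (stand-in for the modified algorithm)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def co_elute(profile_a: list, profile_b: list, threshold: float = 0.92) -> bool:
    """True when two elution profiles correlate at or above the threshold."""
    return pearson(profile_a, profile_b) >= threshold


if __name__ == "__main__":
    print(sequence_coverage("ABCDEFGHIJ", ["ABC", "CDE"]))
    print(co_elute([0.1, 0.8, 1.0, 0.3], [0.2, 0.9, 1.0, 0.2]))
```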
aElution profiles were compared using a modified Pearson's algorithm and clusters were defined employing a threshold of 0.92.
bProtein identification based on one and two or more peptides for [ID1] and [ID2+] categories, respectively. MS/MS spectra of polypeptides matched
cUnique non-overlapping peptides were used to calculate protein sequence coverage defined as the ratio between the sum of
dProtein complex data based upon the content of the Encyclopedia of E. coli K-12 Genes and Metabolism
eProtein-protein interaction data based upon the study of Butland et al.15
fRelative abundance of each polypeptide detected in each of the analyzed fractions. Polypeptide elution profiles are derived from the iTRAQ-based
fRelative abundance of each h polypeptide within each of the separately analyzed four-ple , annotated by an asterisk. Subsequently, the nce value) within its contiguous elution chromatogram.
DVU0671
DVU1044
DVU1198
(996)b
DVU1329
DVU2928
DVU2929
DVU3242
DVU0510
DVU1833
DVU1976
700e
DVU3025
†Entries in bold font indicate protein complexes for which three-dimensional reconstructions were obtained by single-particle electron microscopy (EM) of negatively stained samples.
#Stoichiometry is derived from EM data where we have determined the structure. In other cases, the stoichiometry is derived from the SEC size estimation.
§Unless indicated by a specific literature citation, information about subunit stoichiometry was obtained from http://biocyc.org
aE. coli also contains three DAHP synthetases (AroF, AroH and AroG) with stoichiometry α2, α2 and α4, respectively. M. tuberculosis AroG has stoichiometry α5 (32). Although Pfam lists Class I aldolases such as DVU0460 in a different family than DAHP synthetases, they are all classified in the same superfamily (Aldolase) in SCOP (40), based on structural evidence of remote homology.
bContribution of the Riboflavin synthase α-subunit to the particle weight is not included.
cPyruvate carboxylase is present in some bacteria as a single polypeptide chain and in other bacteria as α and β chains that are homologous to the C- and N- terminal parts, respectively, of the single-chain form of the enzyme. In cases shown here, the α and β chains from other bacteria comprise the same Pfam domains as the single DvH protein. We use αβ to represent the single-chain form.
dEM result indicates either a dimer or tetramer. Size-exclusion chromatography cannot distinguish between these possibilities.
eParticle copy number estimated on the assumption that the protein is present in the cell as a D7 14-mer rather than as the C7 heptamer isolated in our standard buffer conditions.
fHomologs of pyruvate ferredoxin oxidoreductase are sometimes fused and sometimes split into multiple chains. In the case shown here, the α, β, γ, and δ chains from T. maritima comprise the same Pfam domains as the single DvH protein. We use αβδγ to represent the single-chain form.
This application claims priority from U.S. Provisional Patent Application No. 61/142,595, filed on Jan. 5, 2009, U.S. Provisional Patent Application No. 61/160,276, filed on Mar. 13, 2009, and International Application No. PCT/US2010/020167, filed on Jan. 5, 2010, all of which are hereby incorporated by reference in their entirety.
This invention was made during work supported under Contract No. DE-AC02-05CH11231 awarded by the U.S. Department of Energy. The government has certain rights in this invention.
Number | Date | Country
---|---|---
61142595 | Jan 2009 | US
61160276 | Mar 2009 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2010/020167 | Jan 2010 | US
Child | 13176704 | | US