1. Field of the Invention
The present invention relates to protein arrays, which allow for analysis of differentially expressed proteins in parallel.
2. Description of the Invention
A number of high throughput methods are currently used to catalogue the differential expression patterns of proteins within a given cell or tissue type, or between healthy, diseased and drug treated cells or tissues. For example, high throughput functional genomics tools such as DNA microarrays [1] are routinely used to generate large quantities of data on, for example, the transcriptomes associated with specific disease states [1], and rapid diagnosis methods based on DNA array technology are now in development [2]. At the protein level, 2-D gel electrophoresis [3] is now widely used to identify and catalogue proteins that are present or absent in different pathological states or cellular fractions. Examples of groups of differentially expressed proteins that are of interest include:
However, while the list of protein constituents of a tissue or cellular compartment at any one moment allows the gathering together of protein players in disease or specific cellular phenotype/function, it does not provide in isolation any functional data on interaction, activities and pathways conferring disease or function.
Currently there is a paucity of such functional data available to describe the exact phenotypic consequences of differential protein expression, and it is the study of the protein complement of a cell or tissue, known as proteomics, which will ultimately reveal this information. What is required, then, is knowledge not only of the relative protein expression levels but also of the functional effects or consequences of the differentially expressed proteins. Currently no method exists to provide the latter in a high throughput manner, since proteins are typically analysed for function in a one-by-one, serial manner.
Currently, differentially expressed proteins can be identified by two dimensional gel electrophoresis. Although this technique provides information on proteins present and absent and on relative protein expression levels, it does not provide functional information on the role of each differentially expressed molecule. Furthermore, this technique does not enable detection of proteins that are expressed in low copy number [4], and many proteins fail to be detected due to limitations in visualising differentially expressed molecules in gels by current staining techniques. The identification of small quantities of proteins, even by mass spectrometry, is not possible if the proteins cannot be detected and therefore are not identified as differentially expressed.
Once differentially expressed proteins have been identified, protein-protein interactions can be studied on a one-by-one basis using yeast two-hybrid methods [Y2H; ref 5]. Y2H has several disadvantages for building a comprehensive picture of differentially expressed protein pathways. At present, Y2H can be used to identify the interacting partners for one “bait” protein at a time. This limits the input to a relatively small number of proteins and therefore does not allow proteome-scale experiments. Y2H also does not permit multiplexing, as it does not allow screening of several protein baits in the same experiment in order to identify protein complexes that may form between them, which may influence further individual interactions. In addition, any non-yeast proteins that are screened for binding to ‘baits’ may not be assayed in their native state, as expression and subsequent interactions occur in yeast. Finally, Y2H is not amenable to the study of entities that block or antagonise protein-protein interactions. This type of study is an essential criterion for assessing the importance of differentially expressed protein pathways for drug discovery. Small molecules that disrupt protein-protein interactions must be assayed by an independent method in this case.
Pull down assays/immunoprecipitations can be used to study protein-protein interactions. The test proteins are affinity tagged in vitro or in vivo and are then allowed to interact with other proteins within a solution or within the cell before being precipitated, via their tag, for identification of binding [6]. Although this method allows identification of many interacting protein partners at one time (complexes), providing additional information to Y2H, it also requires cell-based assaying that limits the throughput of the system. Finally, pull down assays or immunoprecipitations do not allow for the controlled testing of the effects of peptides or antibodies on the interactions observed, as the experiments are conducted within the context of a whole cell and degradation and/or failure of local trafficking of the test molecule could prevent contact with the interacting proteins.
A number of recent publications [e.g. ref 7] have shown that arrays of functional proteins allow the individual members of an array to be screened simultaneously under identical conditions. This allows highly parallel and rapid experiments compared with other techniques, leading to directly comparable results across many proteins. These qualities set protein arrays apart from other assays for protein function, such as Y2H and immunoprecipitations/pull-downs, where the cellular compartmentalisation that is implicit in these other methods effectively divides each protein collection into individual proteins and thus individual assays.
Results obtained from the interrogation of arrays of the invention can be quantitative (e.g. measuring binding or catalytic constants KD & KM), semi-quantitative (e.g. normalising amount bound against protein quantity) or qualitative (e.g. functional vs. non-functional). By quantifying the signals for replicate arrays where the ligand (e.g. DNA, protein, antibody, peptide or small molecule) is added at several (for example, two or more) concentrations, both the binding affinities and the active concentrations of protein in the spot can be determined. Exactly the same methodology could be used to measure binding of drugs to arrayed proteins.
For example, quantitative results, KD and Bmax, which describe the affinity of the interaction between ligand and protein and the number of binding sites for that ligand respectively, can be derived from protein array data. Briefly, either quantified or relative amounts of ligand bound to each individual protein spot can be measured at different concentrations of ligand in the assay solution. Assuming a linear relationship between the amount of protein and bound ligand, the (relative) amount of ligand bound to each spot over a range of ligand concentrations used in the assay can be fitted to Equation 1, or to rearrangements or derivations of it.
Bound ligand=Bmax/((KD/[L])+1) (Equation 1)
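By way of illustration only, the fitting of spot signals to Equation 1 can be sketched as follows. This is a minimal sketch and not a prescribed procedure: the function names, the least-squares criterion, the search range of three decades either side of the tested concentrations, and the synthetic data are all assumptions made for the example. Ligand concentrations must be positive.

```python
import math

def bound(ligand_conc, kd, bmax):
    # Equation 1: bound ligand = Bmax / ((KD / [L]) + 1)
    return bmax / ((kd / ligand_conc) + 1.0)

def best_bmax(concs, signals, kd):
    # For a fixed KD the model is linear in Bmax, so the least-squares
    # Bmax has a closed form.
    shape = [bound(c, kd, 1.0) for c in concs]
    return sum(s * f for s, f in zip(signals, shape)) / sum(f * f for f in shape)

def sse(concs, signals, log_kd):
    # Sum of squared residuals at a given log10(KD), with Bmax optimised.
    kd = 10.0 ** log_kd
    bmax = best_bmax(concs, signals, kd)
    return sum((s - bound(c, kd, bmax)) ** 2 for c, s in zip(concs, signals))

def fit_kd_bmax(concs, signals):
    # Coarse scan of log10(KD) (assumed range: three decades either side
    # of the tested concentrations), then golden-section refinement.
    lo = math.log10(min(concs)) - 3.0
    hi = math.log10(max(concs)) + 3.0
    steps = 180
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    best = min(range(len(grid)), key=lambda i: sse(concs, signals, grid[i]))
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, steps)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(concs, signals, c) < sse(concs, signals, d):
            b = d
        else:
            a = c
    kd = 10.0 ** ((a + b) / 2.0)
    return kd, best_bmax(concs, signals, kd)

# Synthetic, noise-free signals for one spot (hypothetical values).
concs = [1.0, 10.0, 50.0, 100.0, 500.0, 1000.0]
signals = [bound(c, 50.0, 1000.0) for c in concs]
kd, bmax = fit_kd_bmax(concs, signals)
```

With relative rather than quantified signals, the fitted Bmax is in the same arbitrary units as the signal, while KD is still recovered in the units of the ligand concentration.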
An estimation of inhibition (IC50) by a compound on an interaction can also be measured with replicate protein array assays.
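One simple way such an IC50 estimate might be obtained from replicate arrays is sketched below. It is an assumption of this example, not a stated method, that the signal for a spot is measured at several inhibitor concentrations and that the IC50 is taken as the concentration at which the signal falls to half of the uninhibited control, interpolated on a logarithmic concentration axis. The function name and inputs are hypothetical.

```python
import math

def estimate_ic50(inhibitor_concs, signals, control_signal):
    # Interpolate, on a log10 concentration axis, the inhibitor
    # concentration at which the signal drops to 50% of the control.
    half = 0.5 * control_signal
    points = sorted(zip(inhibitor_concs, signals))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= half >= s_hi and s_lo != s_hi:
            frac = (s_lo - half) / (s_lo - s_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ic50
    # 50% inhibition is not bracketed by the tested concentrations.
    return None
```

Returning None when the half-maximal signal is not bracketed avoids extrapolating beyond the concentrations actually assayed.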
Differentially expressed proteins represent a specific subset of the overall collection of proteins in a given cell, tissue or organism that may have particular clinical and pharmaceutical relevance. There is unlikely to be any significant sequence, structural or functional similarity across such a set of proteins and, as a result, protein arrays consisting of this protein group would represent a highly versatile tool with potential applications in drug target identification and validation processes as well as in drug selectivity screens and in delineation of differentially expressed protein-protein interaction maps. However, for such applications to be viable, the differentially-expressed proteins on the array need to be correctly folded such that they are likely to retain many if not all aspects of their natural function; such an array has not previously been described and is not obvious for a number of fundamental reasons:
In vitro screening of protein interactions in an array format has been demonstrated. In their simplest form, microarrays have been generated from immunoglobulin molecules in order to capture proteins from solution [8-10]. These antibody arrays provide miniaturisation of the ELISA assay and enable high throughput analysis of e.g. cell lysates, serum samples or recombinant protein mixtures. A second example of protein array types is the antigen array, used to identify auto-antibodies in serum samples [11]. In these cases, the antigens are arrayed on a denaturing surface, making all linear epitopes available for antibody binding but destroying the native form of the arrayed molecules. Two examples of protein arrays in which the proteins were arrayed to retain correct folding and function have recently been described. In the first example, a ‘proteome on a chip’ was created for the relatively small yeast genome [12], enabling the researchers to identify activities based on binding to the proteins in their native conformations. In the second example, a small array of protein kinases was created [8]. In addition, arrays of randomly selected, functional proteins that have been specifically tagged at the N- or C-terminus, and that were created and interrogated to identify interacting partners such as DNA and small molecules, have been described [13]. There has been no description to date of an array of folded, differentially expressed proteins.
Thus, there is still a lack of high throughput tools for the functional study of differentially expressed proteins and also a lack of tools to assay the effects of drug molecules on these functions in parallel. As the numbers of differentially expressed proteins may approach the hundreds, if not the thousands, a highly parallel method of functional analysis is needed that does not require antibodies, gels or beads to perform. The present invention is based on protein arrays and will have applications for specific drug target identification and validation and drug selectivity. Additional applications will include screening of interacting species, e.g. other proteins, to identify differentially expressed protein pathways.
The present invention is a collection of proteins, which together represent a proteomic footprint of a specific disease state or cellular compartment. The individual proteins in said collection are affinity tagged, purified and in a folded conformation. In addition, the individual proteins are spatially separated and specifically immobilised on to a surface in an array format such that the folded state of the individual proteins is unlikely to be perturbed.
Thus, in a first aspect, the present invention provides a protein array comprising a surface upon which are deposited at spatially defined locations at least two protein moieties characterised in that said protein moieties represent at least part of a set of proteins which are differentially expressed.
As already indicated herein, examples of such groups of differentially expressed proteins include:
A protein array as defined herein is a spatially defined arrangement of protein moieties in a pattern on a surface. Preferably the protein moieties are attached to the surface either directly or indirectly. The attachment can be non-specific (e.g. by physical adsorption onto the surface or by formation of a non-specific covalent interaction). In a preferred embodiment the protein moieties are attached to the surface through a common marker moiety appended to each protein moiety. In another preferred embodiment, the protein moieties can be incorporated into a vesicle or liposome which is tethered to the surface.
The number of proteins attached to the arrays of the invention will be determined, at least to a certain extent, by the number of proteins that occur naturally or that are of sufficient experimental, commercial or clinical interest. An array carrying one or two proteins would be of use to the investigator. However in practice and in order to take advantage of the suitability of such arrays for high throughput assays, it is envisaged that 1 to 10000, 1 to 1000, 1 to 500, 4 to 400, 1 to 300, 1 to 200, 1 to 100, 1 to 75, 1 to 50, 1 to 25, 1 to 10 or 1 to 5 such proteins are present on an array.
A surface as defined herein is a flat or contoured area that may or may not be coated/derivatised by chemical treatment. For example, the area can be:
a glass slide,
one or more beads, for example a magnetised, derivatised and/or labelled bead as known in the art,
a polypropylene or polystyrene slide,
a polypropylene or polystyrene multi-well plate,
a gold, silica or metal object,
a membrane made of nitrocellulose, PVDF, nylon or phosphocellulose
Where a bead is used, individual proteins, pairs of proteins or pools of variant proteins (e.g., for “shotgun screening”, to initially identify groups of proteins in which a protein of interest may exist; such groups are then separated and further investigated (analogous to pooling methods known in the art of combinatorial chemistry)) may be attached to an individual bead to provide the spatial definition or separation of the array. The beads may then be assayed separately, but in parallel, in a compartmentalised way, for example in the wells of a microtitre plate or in separate test tubes.
Thus a protein array comprising a surface according to the invention may subsist as a series of separate solid phase surfaces, such as beads carrying different proteins, the array being formed by the spatially defined pattern or arrangement of the separate surfaces in the experiment.
Preferably the surface coating is capable of resisting non-specific protein adsorption. The surface coating can be porous or non-porous in nature. In addition, in a preferred embodiment the surface coating provides a specific interaction with the marker moiety on each protein moiety either directly or indirectly (e.g. through a protein or peptide or nucleic acid bound to the surface). A variety of surfaces can be used, as well as surfaces in microarray or microwell formats as known in the art.
In general, the individual members of the protein array each will contain an identical affinity tag through which they can be immobilised, thereby minimising the risk of perturbing the function of the arrayed proteins through non-specific contact with the surface.
The array format then allows the collection of differentially expressed proteins to be interrogated with a wide range of functional assays in a highly parallel manner in order to identify, for example, interactions with other proteins. In particular, the array provides a tool for screening individual entities that may block or antagonise these interactions across many hundreds or thousands of proteins simultaneously.
The collection of soluble, purified, and immobilised proteins making up a differential display array might be generated in several ways. After transcriptomic analysis such as subtractive hybridisation or cDNA microarray analysis or proteomic analysis such as two dimensional gel electrophoresis, differentially expressed cDNAs or proteins could potentially be identified by their sequence. Conventional cloning could then be used to express these genes as proteins for arraying. Alternatively the methods described in ref. 13 can be used for sequence-independent cloning and expression of a subtracted cDNA library. This latter approach does not rely on prior knowledge of the identity of the differentially expressed genes or proteins. It can therefore clearly be used to generate a set of tagged, purified, folded, differentially expressed proteins (and hence an array comprising any such set) from, for example, any given diseased or drug-treated cell or tissue since the differentially expressed cDNAs can be readily isolated by numerous methods, such as subtractive hybridisation, that are known in the art. In addition, cDNAs encoding collections of secreted proteins can be isolated using signal-trap systems known in the art.
Subtractive hybridisation represents one preferred method for isolating the set of differentially expressed cDNAs, and in particular readily enables identification of both up-regulated and down-regulated proteins.
The methods described herein are not restricted to use of recombinant protein expression hosts. As known in the art, a range of different expression hosts including bacteria, yeast, Drosophila spp. or mammalian cells could be transformed with vectors (plasmid or virus) containing the differentially expressed cDNAs fused in frame to an affinity tag. Similarly, the construction of the protein array is not constrained by the type or number of affinity tags used, although the proteins should be deposited in such a way on the solid surface as to retain their correct folding.
To make an array of purified, discrete protein spots, cell lysates can be deposited directly on to the surface, performing purification and immobilisation via the affinity tag in one step, and creating an array. Alternatively proteins can be purified via a first affinity tag and then arrayed for immobilisation through a second tag.
The protein arrays of the present invention can be probed with potential interacting partners such as crude cell lysates, other (individual) proteins, nucleic acids or small molecules/drugs to identify previously unknown interactions. The differential display protein array method allows the detection and characterisation of interactions by fluorescence, colorimetric and chemiluminescence techniques (microarray format). In addition, the array contains individual, highly purified proteins, enabling each protein interacting with a differentially expressed protein on the array to be identified by mass spectrometry (microwell plate format) and surface plasmon resonance methods (fluidics chamber format). The method is not format dependent and does not require antibodies, beads, gels or recombinant host organisms in the assays, separating it from current methods.
Thus, in a second aspect, the present invention provides a method of making a protein array comprising the steps of:
Thus, utilising the information from two dimensional gel electrophoresis, subtractive hybridisation or DNA microarray experiments for functional analysis of proteins to identify genes that are transcribed in a diseased tissue or cell that are not transcribed in the normal tissue or cell, or genes which are expressed in a discrete cellular compartment or which are secreted cell proteins, and using a protein array format to display the encoded proteins, identified through these methods, such that they retain function (folding and solubility), enables the performing of highly parallel experiments on a ‘proteomic footprint of differentially expressed genes’.
This also allows for ‘on-array’ pull down and identification of multiple protein binding partners to each array member. Binding of one or many proteins from single or complex mixtures enables multiple proteins contained, for example, in multiprotein complexes, to bind and be identified via a single arrayed, differentially expressed protein. Simultaneous screening of the effects of new chemical entities, which may disrupt or antagonise such interactions, on hundreds or thousands of proteins under the same experimental conditions is possible.
Thus, in a third aspect, the present invention provides a method for screening/evaluating the effect of a drug/chemical entity/therapeutic moiety which comprises the step of bringing said drug/chemical entity/therapeutic moiety into contact with an array of the invention.
Although the foregoing refers to particular embodiments, it will be understood that the present invention is not so limited. It will occur to those of ordinary skill in the art that various modifications may be made to the disclosed embodiments and that such modifications are intended to be within the scope of the present invention. Having now described the present invention in detail, the same will be more clearly understood by reference to the following examples, which are included herewith for purposes of illustration only and are not intended to be limiting of the invention.
The invention will now be described with reference to the following figures:
Differentially expressed cDNAs containing a signal peptide sequence are expressed and arrayed in a format compatible with parallel analysis. The method described here involves transformation of a yeast or mammalian expression host with a library of plasmids containing small fragments of genomic DNA that are fused in frame to a reporter gene construct in a plasmid. When the reporter protein is expressed and translocated to the outside of the cell it can be detected and the gene associated with the identified signal peptide can be identified, cloned and expressed for protein arraying.
A sample of cDNA (from a library or direct synthesis source) is fragmented by sonication, blunt-ended using appropriate enzymatic modification [14] and cloned en masse into an expression vector. This plasmid contains a reporter protein, such as, for example, invertase or plasminogen. The reporter gene is expressed as protein if fused in frame with a piece of cloned cDNA. If a signal peptide sequence is contained within the cloned fragment then the protein can be detected when secreted outside of the transformed expression host cell using an appropriate assay [15].
Capturing Full Length cDNAs for Secreted Proteins.
The cDNA fragments containing the signal peptide are used to capture the corresponding full-length cDNAs from a cDNA library using ClonCapture cDNA Selection kit (Clontech) following manufacturer's instructions. The captured plasmid library is then amplified through transformation of XL-1 blue E. coli and plasmids then extracted (Qiagen Maxi Plasmid Purification kit).
Full length cDNA clones are captured as above. The procedure described in ref 13 is then carried out on the pooled plasmid library encoding the secreted protein set and the final ligation mix is recovered by transformation of XL-1 blue E. coli. The secreted proteins are now encoded in the new library as fusions to a C-terminal green fluorescent protein-biotin carboxyl carrier protein [GFP-BCCP; refs. 16 & 17] tag DNA sequence. Restriction digestions are then performed [14] to remove the newly cloned inserts, together with the DNA encoding the tag, from the library vector and these are then ligated (directionally) into a suitable expression vector for use in yeast cell lines. Once transformed, these recombinant host cells are induced under appropriate conditions and those clones which express tagged proteins are identified. The secreted, tagged proteins are then purified and immobilised into an array in a single step from the cell culture (see below).
Proteins from a subtracted set of human breast tumour tissue cDNAs are arrayed in a format suitable for parallel functional analysis. The method described here involves transformation of a bacterial expression host with a library of plasmids containing differentially expressed cDNAs that are fused in frame to an affinity tag using the methodology described in ref 13. The proteins remain folded and active when expressed. Cell lysates are deposited on to a solid surface, performing purification and immobilisation via the affinity tag in one step, and creating an array of differentially expressed proteins.
Subtractive Hybridisation of Breast Tumour and Normal Breast Tissue cDNAs
Total RNA (1 μg) from female breast tumour tissue (invasive ductal carcinoma) and matched normal breast tissue (AMS Biotech) is used to synthesise first strand cDNA using the SMART cDNA Synthesis Kit (Clontech) following manufacturer's instructions. A subtractive hybridisation is then performed using PCR-Select Subtractive Hybridization kit (Clontech) with breast tumour cDNAs as ‘Tester’ and normal tissue cDNAs as ‘Driver’, and vice versa (as a control experiment) following manufacturer's instructions, to generate a subtracted set of cDNAs.
The subtracted set of tumour-associated cDNA fragments is cloned en masse into the TOPO TA cloning vector (Invitrogen). XL1 blue E. coli (Stratagene) are transformed with ligation mix and 100 cDNA clones are picked, plasmids extracted (Qiagen Mini Plasmid Purification kit) and cDNA inserts are sequenced using M13 reverse universal primer (LARK Technologies, U.K.). PCR amplification of a housekeeping gene, glyceraldehyde 3-phosphate dehydrogenase, can be attempted from the subtracted cDNA set using G3PDH-specific primers under standard conditions: 95° C. 1 min, then varying numbers of cycles of 95° C. 30 s, 58° C. 30 s, 68° C. 1.5 min. Amplicons are analysed by 1.2% agarose gel electrophoresis (
Southern blotting (using standard techniques of bacterial colony blotting [14]) is used to assess the abundance of subtracted set cDNAs (designated the probe) in normal breast tissue and breast tumour tissue cDNA pools (the target;
Capturing Full Length Breast Tumour-Associated cDNA Clones
The subtracted set of cDNA fragments is used to capture the corresponding full-length cDNAs from a human heart cDNA library (Clontech) and a human breast tumour cDNA plasmid library (unpublished) using ClonCapture cDNA Selection kit (Clontech) following manufacturer's instructions. The captured plasmid library is then amplified through transformation of XL-1 blue E. coli and plasmids then extracted (Qiagen Maxi Plasmid Purification kit).
The subtracted breast tumour cDNA plasmid library is used to transform XL-1 blue E. coli. 384 bacterial colonies can be picked at random to analyse the cDNA inserts by restriction fragment length polymorphism patterns (14; data not shown). Based on RFLP pattern in this example, 96 different cDNA clones were selected and subjected (individually) to the procedure described in ref 13.
Each resultant modified cDNA is cloned by ligation with a suitably prepared equimolar pool of three plasmid vectors designated pIFM101-A, -B, & -C; these vectors each contain a C-terminal green fluorescent protein-biotin carboxyl carrier protein [GFP-BCCP; refs. 16 & 17] tag DNA sequence in a unique reading frame relative to their common 3′-cloning site (i.e. all three possible reading frames of the GFP-BCCP tag relative to the common 3′-cloning site are represented in a pool of these three vectors). DNA from each ligation is then used to transform E. coli XL-1 blue. After overnight growth on solid media, protein expression can be induced at 30° C. for ca. 4 hrs by addition of IPTG. Green fluorescent colonies (indicating expression of a soluble, GFP-BCCP-tagged fusion) are picked and the plasmids subjected to PCR amplification of the cDNA inserts (using plasmid-specific primers). DNA sequencing, using universal M13 reverse primer, is used to identify the cloned cDNAs.
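The reason for pooling three vectors can be illustrated with a short sketch. It rests on a simplified model that is an assumption of this example and is not taken from the vector maps: each vector is treated as placing 0, 1 or 2 extra nucleotides between the cloning site and the first codon of the tag, and the insert is described only by the number of coding nucleotides lying upstream of the junction. The function names are hypothetical.

```python
def tag_frame_offset(coding_nt_before_junction):
    # Number of extra nucleotides (0, 1 or 2) needed after the insert so
    # that the tag's first codon starts in the insert's reading frame.
    return (-coding_nt_before_junction) % 3

def in_frame_vectors(coding_nt_before_junction, vector_offsets=(0, 1, 2)):
    # Offsets in the pool that give an in-frame fusion for this insert.
    # Assumption: one hypothetical vector per offset.
    return [k for k in vector_offsets
            if (coding_nt_before_junction + k) % 3 == 0]
```

Under this model every insert length matches exactly one of the three offsets, which is why a pool covering all three frames can yield an in-frame, fluorescent fusion without prior knowledge of the insert sequence.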
The complete subtracted breast tumour cDNA plasmid library is used to transform XL-1 blue E. coli. In this example 3×10⁵ bacterial colonies are scraped from agar plates, mixed and the plasmids extracted (Qiagen Maxi Plasmid Purification kit). The procedure described in ref 13 is carried out on the pooled plasmid library and the final ligation mix used to transform XL-1 blue E. coli. 244 green fluorescent colonies were picked (out of a total of 9,760 green fluorescent colonies). Southern blotting is carried out on the clones from the 244 bacterial colonies [14] with a probe consisting of 8 common genes. 184 different cDNA clones were consolidated and PCR was used to amplify the cDNA inserts which were then sized on agarose gels. 57 cDNA clones which contained cDNAs >300 bp, were identified. Extrapolation to the total number of colonies showed that the total library contains 2,262 colonies expressing soluble proteins of >12.5 kDa fused in frame to the GFP-BCCP tag.
Western blot analysis [14] of proteins expressed from final cDNA clones in both experiments confirmed, in this example, expression of 82 soluble, biotinylated proteins of >12.5 kDa. Final DNA sequencing of these cDNAs (Table 1) is carried out using universal M13 reverse primer.
Quality control experiments confirm that the initial subtraction process and subsequent modification yield a diverse collection of cDNAs, some of which are associated with breast cancer. The efficiency of subtraction is high, as analysis of the subtracted library firstly reveals that very little cDNA encoding the housekeeping gene G3PDH remains after the subtractive hybridisation when it is performed in either direction (
Examples 1 & 2 described above both create collections of solubly expressed, folded, C-terminally tagged, differentially expressed proteins in a form directly suitable for immobilisation, via an affinity tag, into a spatially defined array on a solid surface. These proteins are expressed in parallel and the cell lysates used to directly capture the tagged proteins onto an affinity capture surface to fabricate a high density protein array on capture member.
Functional assays are performed to assess, for example, DNA-binding, protein-binding, small molecule and drug binding and the effects of modifications such as phosphorylation on protein function. The plasmid DNA encoding the positive ‘hits’ is then sequenced in order to determine the identity of the protein.
Intestinal biopsies are obtained from null-CF with the d508 mutant, none-CF and residual-CF (d508 mutant). RNA is then extracted from these libraries and first strand and cDNA libraries are prepared. Subtractive hybridisation is then performed (as already described herein) to generate the following proteome arrays:
1. a subtracted null-CF/none-CF array
2. a subtracted residual-CF/none-CF array
3. a subtracted null-CF/residual-CF array
One or more of these arrays can be probed with one or more peptides derived from CFTR which are known to bind cellular proteins, as a positive control.
All references described in this application are incorporated herein by reference in their entireties.
This application claims benefit of the filing date of U.S. Provisional Application No. 60/366,550, filing date Mar. 25, 2002, the disclosure of which is incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
60366550 | Mar 2002 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10394653 | Mar 2003 | US |
Child | 12784214 | US |