This patent application describes methods of proteomic analysis and synthesis of samples that can be simple or complex mixtures of substances. One of the methods is a method of classification of samples, that can be used for example in the quality control of manufactured or biological goods. The methods include methods for the analysis of immune system V (variable) regions, with the classification of individuals with respect to various diseases (diagnosis). The diagnostic methods include measurements of binding of reference sets of reagents to immune system V regions. Immune system V region proteomics is important because the immune system V region repertoire is changed or “skewed” in many diseases, including cancer, autoimmune diseases and graft versus host disease. O'Neill, 1991, Cell. Immunol., 136, 54-61; Wucherpfennig et al. 1992, J Exp Med., 175, 993-1002; Imberti et al. 1991, Science 254, 860-862; Rebai et al. 1994, PNAS 91, 1529-33. This skewing opens possibilities for innovations in diagnostic testing. The methods include methods for preventing and/or treating diseases for which the skewing of the repertoire of immune system V regions is well characterized. These methods involve an immunization or immunizations that are tailored to reverse the particular skewing.
This invention includes the ability to classify a wide range of samples that may be simple or complex. It emerged in the context of classifying vertebrates with respect to various diseases on the basis of immune system V regions in biological samples.
A full proteomic description of the specific (V region) components of a particular immune system would constitute a list of the concentrations of each of millions of lymphocytes, antibodies and specific T cell factors, together with the isotypes, amino acid sequences and three-dimensional structures of the corresponding V regions. Even with the spectacular advances that are currently being made in proteomics, such a description is not a realistic goal, and even if it were, achieving it may not be particularly useful. Each individual has his or her own set of V regions, due to different V region genes, different MHC (major histocompatibility complex) genes that affect the expressed repertoire of T cells, and different histories of exposure to a wide range of antigens. Furthermore, different somatic mutations in each individual contribute significantly to the generation of the V region repertoire.
One recent approach to diagnostic proteomics is SELDI-MS technology coupled to pattern recognition software. Hitt et al. United States Patent Application Publication, Pub. No. US 2003/0004402 A1. This approach is not suited to V region proteomics because it is based on mass differences between molecules: although (for example) IgG antibodies with different V regions can have slightly different masses, each person has a unique spectrum of antibodies.
On the other hand, ELISA (enzyme-linked immunosorbent assay) and radioimmunoassay (RIA) technologies are available that are suitable for V region proteomics.
This patent application describes a method for proteomic analysis that builds on the previously defined concept of serological distance coefficients. Hoffmann et al. 1989 Immunology Letters, 22, 83-90. Experimentally measurable similarity coefficients S[A,B|C] specify the extent to which a pair of substances, A and B, are similar in the context of a diverse reagent, C. S[A,B|C] is defined as the fraction of C that binds both A and B, divided by the sum of (i) the fraction that binds A but not B, (ii) the fraction that binds B but not A and (iii) the fraction that binds both A and B. The value of S[A,B|C] is therefore necessarily a number between zero and one. This definition was also applied to similarities between complex mixtures of substances, such as the antibodies of two serum samples, A and B. A “distance coefficient” D[A,B|C] between two sera, A and B, in the context of C, was defined as one minus the similarity coefficient in the same context. The experimental measurement of these coefficients, and their possible use in the diagnosis and prognosis of disease conditions, was described.
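Once the three fractions of C have been measured, the definitions of S[A,B|C] and D[A,B|C] reduce to simple arithmetic. The following is a minimal Python sketch that assumes the fractions are already available as numbers between zero and one. The function names are hypothetical.

```python
def similarity_coefficient(f_both: float, f_a_only: float, f_b_only: float) -> float:
    """S[A,B|C]: the fraction of C binding both A and B, divided by the
    sum of the fractions binding A only, B only, and both."""
    total = f_a_only + f_b_only + f_both
    if total <= 0:
        raise ValueError("no fraction of C binds A or B")
    return f_both / total


def distance_coefficient(f_both: float, f_a_only: float, f_b_only: float) -> float:
    """D[A,B|C] = 1 - S[A,B|C]."""
    return 1.0 - similarity_coefficient(f_both, f_a_only, f_b_only)


# Example: 20% of C binds both sera, 10% binds only A and 10% binds only B.
print(similarity_coefficient(0.2, 0.1, 0.1))  # 0.5
print(distance_coefficient(0.2, 0.1, 0.1))    # 0.5
```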
This invention invokes the concept of shape space. An N-dimensional shape space has been discussed by Perelson et al. 1979, J. theor. Biol. 81, 645-667, and a formulation that permits an experimental determination of the dimensionality of a shape space has been described by Lapedes et al. J. theor. Biol. 2001, 212, 57-69. The N-dimensional shape space of this invention is different from both of these; the different shape spaces are contrasted near the end of the detailed description of the invention.
The antibody repertoire of the immune system is regulated by the T cell repertoire. The T cell repertoire in turn is selected by self antigens, including most notably MHC (Major Histocompatibility Complex) antigens, but possibly also the many self antigens that are much less polymorphic than MHC antigens. The impact of non-polymorphic self antigens on the T cell repertoire would not be seen in the kinds of experiments that demonstrate the high level of polymorphism in MHC antigens. A plausible evolutionary constraint on self antigens is that they should consist of a “balanced” set, such that for any self antigen impinging on the immune system and stimulating one set of clones, there are other self antigens that stimulate complementary clones. The immune system may itself (in addition) dynamically establish symmetry between each shape and complementary shapes in V region repertoires. This concept leads to the idea of a high level of similarity in the expressed antibody repertoires of young, healthy individuals of different species, in the sense that they are all “balanced” repertoires in this respect. Among other applications, this invention will enable the concept of balanced repertoires, and hence similar repertoires even in healthy individuals of different species, to be experimentally tested.
The immune system is a highly sensitive system that can be modulated by very small amounts of antigens and antibodies. Experiments in mice and rats show that the specific response of the system to a particular antigen can be significantly decreased by injections of antigen in amounts as low as picograms or even less. Shellam 1969 Immunol. 16, 45-56; Ada et al. 1968 Proc. Nat. Acad. Sci. (USA), 61, 566-561. A response consisting of antibodies with a particular idiotype can be suppressed by an injection of 10 to 100 ng of antiidiotypic antibody. Eichmann 1974 Eur. J. Immunol., 4, 296-302. The injection of nanogram amounts of monoclonal IgM antibody can induce the production of antibodies of the same specificity. Forni et al. 1980. Proc. Nat. Acad. Sci. (USA) 77, 1125-1128. The genetic manipulation of adding a single heavy chain gene, which is a marker of a particular idiotype, to the genome of a mouse results in the production of antibodies with the same idiotype, but using other genes. Weaver et al., 1985. Cell, 45, 247-259. It would seem that such manipulations of the immune system would make a marked difference to the state of the system only if it is normally precisely balanced. Only then might one expect that such very small perturbations can shift the state of the system significantly. Hence such findings suggest that a dynamically maintained balance between shapes and complementary shapes is a basic feature of the V regions of the immune system. Various diseases then correspond to various forms of a loss of balance in the system.
Further features and advantages will be apparent from the following Detailed Description of the Invention, given by way of example, of a preferred embodiment taken in conjunction with the accompanying drawings, wherein:
c is the Euclidean distance from A1av to A2av, ai is the Euclidean distance from A1av to Ai, and bi is the Euclidean distance from A2av to Ai.
This invention includes methods of classification of samples, and these methods lead to applications including quality control and methods for diagnostics and vaccine formulation. The invention utilises a number P (>>1) of reagents, rather than a single diverse reagent, where P≧N and N is the number of dimensions of a shape space with approximately orthogonal axes. Each of the reagents can be an individual substance or a mixture of substances. This produces a much larger data set than using a single diverse reagent, but it is still a very small set compared with, for example, the complete listing of V regions and their concentrations mentioned above. The result is a measure of similarity between substances or mixtures of substances (“samples”) based on the N-dimensional shape space, and is a more powerful tool for multiple applications, including diagnostics and vaccines. The new approach also has the advantage that it eliminates the need to do absorptions of the diverse reagent C, which was the most labour-intensive part of the determination of serological distance coefficients as previously described.
The members of the panel of P reagents are selected on the basis of being diverse and having well-defined, reproducible three-dimensional shapes, subject to the constraint that the N shape space axes are optimally orthogonal. They may include, for example, normal human proteins and proteins of one or more other species, but are not restricted to these.
We consider first the case P=N. We denote the reagents of this panel by X(j) (with j=1 to N) and, most simply (but not necessarily), use them all at a standard concentration C0. We measure the binding (relative affinity) of each of these reagents to each other using, for example, an ELISA or an RIA. This produces a matrix K with elements Kjk (j=1, N, k=1, N).
We next define N new reagents, denoted Y(j) (j=1, N). Each of the Y(j) reagents is made up of a linear combination of the X(k) reagents, with the amount of the kth component being proportional to Kjk. Components that bind strongly to X(j) are present at a high concentration in Y(j), while those with little or no binding are included at a low or zero concentration. For each X(j) there is a corresponding Y(j), with j=1 to N. There are two possible ways of normalizing the concentrations of the Y(j) reagents to establish a symmetry between the X(j) reagents and the Y(j) reagents. One is to make the total concentration of the components of Y(j) such that the binding signal obtained for Y(j) binding to X(j) (in the case of an ELISA assay, with Y(j) binding to X(j) on the plate), in the linear range of the assay, is equal to the converse binding signal (binding of X(j) to Y(j), also in the linear range of the assay). The other method is simply to set the total concentration of each Y(j) equal to C0. The former method leads to the definition of a convenient virtual N-dimensional origin for the shape space, namely a hypothetical sample to which X(j) and Y(j) bind equally in the assay, for all values of j.
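The composition of each Y(j) follows directly from row j of K. This minimal sketch assumes the second, simpler normalization, in which the total concentration of each Y(j) is set to C0. It also assumes that the binding signals in K are non-negative. The equal-signal normalization would require additional measurements and is not shown. The function name `y_recipes` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def y_recipes(K: Matrix, c0: float = 1.0) -> Matrix:
    """Relative concentration of each X(k) in each Y(j).

    Row j is proportional to row j of K (the binding of X(j) to every X(k)),
    scaled so that the components of Y(j) sum to the standard concentration c0.
    """
    recipes = []
    for j, row in enumerate(K):
        if any(v < 0 for v in row):
            raise ValueError(f"negative binding signal in row {j}")
        total = sum(row)
        if total == 0:
            raise ValueError(f"X({j}) binds none of the reagents; Y({j}) is undefined")
        recipes.append([c0 * v / total for v in row])
    return recipes


K = [[4.0, 1.0],
     [1.0, 3.0]]
print(y_recipes(K))  # [[0.8, 0.2], [0.25, 0.75]]
```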
Each pair of reagents X(j) and Y(j) is complementary, so the two reagents are opposite poles of an axis in the N-dimensional shape space, called the X(j)/Y(j) axis. We measure the binding of each X(j) reagent (j=1, N) to each Y(k) (k=1, N) reagent. This produces the N×N matrix J with elements Jjk. On the basis of mass action, and subject to linearity of the assay, the expected relative values of the elements of J can also be computed from the elements of K.
The diagonal elements of this matrix specify the level of binding between the reagents X(j) and Y(j), that have been specifically tailored to be complementary to each other. Hence their mutual binding will produce a strong binding signal, while there will be a relatively weak signal for off-diagonal terms. Thus J is an approximately diagonal matrix. The interpretation of this feature is that the N X(j)/Y(j) shape space axes are approximately mutually orthogonal.
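The expected form of J can be illustrated under two assumptions: the assay is linear, and each Y(k) has the C0-normalized composition described above. Under these assumptions, the binding of X(j) to Y(k) is a weighted sum of the bindings of X(j) to the components X(m) of Y(k). This is a sketch of one such reading, not the document's own equation, which is not reproduced here. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def predicted_j(K: Matrix, c0: float = 1.0) -> Matrix:
    """Expected relative J[j][k] = binding of X(j) to Y(k), assuming a linear assay.

    Y(k) contains X(m) at a concentration proportional to K[k][m] (normalized
    to a total of c0), and X(j) binds X(m) with signal K[j][m]; the signals add.
    """
    n = len(K)
    weights = [[c0 * v / sum(row) for v in row] for row in K]
    return [[sum(weights[k][m] * K[j][m] for m in range(n)) for k in range(n)]
            for j in range(n)]


def is_diagonally_dominant(J: Matrix) -> bool:
    """True if each diagonal element exceeds every off-diagonal element in its row and column."""
    n = len(J)
    return all(J[j][j] > J[j][k] and J[j][j] > J[k][j]
               for j in range(n) for k in range(n) if k != j)


K = [[4.0, 1.0],
     [1.0, 3.0]]
J = predicted_j(K)
print(is_diagonally_dominant(J))  # True
```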
We now consider samples, for example biological samples containing immune system V regions obtained from an individual i. These samples may be, for example but not exclusively, serum, T-lymphocyte extracts, B-lymphocyte extracts, saliva or urine. We measure the binding of each of the reagents X(j) (j=1 to N) to each of the samples, again using for example an ELISA or an RIA. For each sample we thus obtain N binding signals AiX(j).
We repeat this process using the set of N complementary reagents, Y(j). We measure the binding of each Y(j) reagent to components in the sample i to obtain the measured values AiY(j) for j=1 to N. Subject to an assumption of linearity of the assay, we can, however, also compute expected relative values of AiY(j) from the measured AiX(k) values (k=1 to N) and the elements of the matrix K.
The computed values of AiY(j) are then normalized such that their average over j=1 to N is the same as the average of the measured AiX(j). Hence we can have the benefit of an analysis in terms of the N X(j)/Y(j) axes in shape space without needing to prepare the Y(j) reagents, and without making measurements on all our samples using them. This is because the values of AiX(j), together with the K matrix values, already contain all the physical information. On the other hand, including the actual measurement of AiY(j) using Y(j) reagents makes the technology more robust, because the individual measurements are then automatically screened for self-consistency. This is analogous to sequencing both strands of DNA, in which case any sequencing errors are immediately revealed, since one sequence predicts the other. Including the measurements with the Y(j) reagents is expected to add only a low additional cost. To the extent that the measured and computed results differ, the best estimate of each AiY(j) may be obtained by taking the mean of the measured and computed values.
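The following sketch computes AiY(j) from the measured AiX(k) and K, rescales the result, and optionally averages it with measured values. It assumes a linear assay and Y(j) reagents built in proportion to row j of K with total concentration C0, so that each Y(j) signal is a weighted sum of X(k) signals. It is an illustration under those assumptions rather than the document's exact formula. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def computed_ay(ax: List[float], K: List[List[float]]) -> List[float]:
    """Expected A_iY(j) for one sample from its measured A_iX(k) values and K.

    Y(j) is assumed to contain X(k) in proportion to K[j][k], so its signal is a
    weighted sum of the X(k) signals. The result is rescaled so that its mean
    over j equals the mean of the measured A_iX(j).
    """
    raw = [sum(kjk / sum(row) * axk for kjk, axk in zip(row, ax)) for row in K]
    scale = mean(ax) / mean(raw)
    return [r * scale for r in raw]


def best_ay(computed: List[float], measured: Optional[List[float]] = None) -> List[float]:
    """Mean of measured and computed A_iY(j) where both exist, else the computed values."""
    if measured is None:
        return list(computed)
    return [(c + m) / 2 for c, m in zip(computed, measured)]


ax = [2.0, 1.0]
K = [[4.0, 1.0], [1.0, 3.0]]
ay = computed_ay(ax, K)
print(round(mean(ay), 6))  # 1.5
```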
The difference AiX(j)−AiY(j) is a coordinate for the sample i on the X(j)/Y(j) axis that can be either positive or negative, and will be denoted as Aij. It specifies whether the sample i is more X(j)-like (Aij<0) or more Y(j)-like (Aij>0). There are N such coordinates (j=1 to N) for each sample. The set of N coordinates Aij with j=1 to N is called the Proteomic Analyser point (“PA point”) for the sample i and, in the case of a biological sample, is a PA point for the individual or organism from whom or from which the sample was derived. This set of N coordinates for the sample i will be denoted by “Ai”.
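Given the X and Y signals for one sample, the PA point is an element-wise difference. The following minimal sketch also encodes the sign convention above. The function names are hypothetical.

```python
from typing import List


def pa_point(ax: List[float], ay: List[float]) -> List[float]:
    """PA point A_i: coordinate A_ij = A_iX(j) - A_iY(j) on each X(j)/Y(j) axis."""
    if len(ax) != len(ay):
        raise ValueError("need one X signal and one Y signal per axis")
    return [x - y for x, y in zip(ax, ay)]


def axis_reading(a_ij: float) -> str:
    """Positive coordinate: more Y(j)-like; negative: more X(j)-like."""
    if a_ij > 0:
        return "Y-like"
    if a_ij < 0:
        return "X-like"
    return "neutral"


point = pa_point([1.0, 0.5], [0.5, 1.0])
print(point, [axis_reading(a) for a in point])  # [0.5, -0.5] ['Y-like', 'X-like']
```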
The orthogonality of the shape space can be increased by using more reagents (“P reagents”) than the number of dimensions of the shape space (N) as follows. We use the set of P reagents X(j), j=1 to P, where P>N, and measure the P×P matrix “KP” with elements “KPij” (i=1 to P, j=1 to P) being the binding signals of each of the reagents to each other as before for the matrix K. We formulate a full set of reagents Y(j) (j=1 to P), using the full set of P X(j) reagents and the matrix KP to determine the relative concentration of each X(j) reagent in each Y(j) reagent. That is, each Y(j) reagent, for j=1 to P, consists of a weighted mixture of the P reagents, with the relative amount of the kth component being proportional to KPjk, for k=1 to P. We measure the binding of each of the X(j) reagents to each of the Y(j) reagents to obtain the P×P matrix JP. We then select the N X(j) and Y(j) reagent pairs that have the largest ratio of the diagonal elements of JP to the mean of the corresponding off-diagonal elements (terms in the same row and the same column). These N X(j)s and Y(j)s are then used in the experimental measurement of PA points for an N-dimensional shape space as already described. For these N X(j) and Y(j) reagents we have a K matrix and a J matrix as before. Then we obtain a set of N coordinates for the sample i denoted by “Ai” using this set of X(j) and Y(j) reagents as before. For a single shape space axis at least two reagents are needed, and making P=2N provides an additional degree of freedom for each shape space axis.
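The selection rule in the preceding paragraph ranks each X(j)/Y(j) pair by its diagonal element of JP, relative to the mean of the off-diagonal terms in the same row and column. This minimal sketch assumes JP is available as a list of rows. The function name `select_axes` is hypothetical.

```python
from statistics import mean
from typing import List


def select_axes(JP: List[List[float]], n: int) -> List[int]:
    """Indices of the n reagent pairs with the largest ratio of the diagonal element
    of JP to the mean of the off-diagonal elements in the same row and column."""
    p = len(JP)
    if p < 2 or not 0 < n <= p:
        raise ValueError("need P >= 2 and 0 < n <= P")

    def ratio(j: int) -> float:
        off = [JP[j][k] for k in range(p) if k != j] + [JP[k][j] for k in range(p) if k != j]
        return JP[j][j] / mean(off)

    return sorted(range(p), key=ratio, reverse=True)[:n]


JP = [[5.0, 1.0, 1.0],
      [1.0, 2.0, 1.0],
      [1.0, 1.0, 9.0]]
print(select_axes(JP, 2))  # [2, 0]
```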
The above methods are designed to have no a priori bias or preference for any shape space axis over any other. This is desirable, since the goal is to map samples in a shape space that is as symmetrical as possible with respect to the universe of shapes. The result is that the magnitudes of the diagonal elements of J do not differ greatly from each other. These methods are therefore preferred to strategies that may achieve orthogonality of the N axes in a more managed way, and in the process result in some of the diagonal elements of J being much larger than others. Criteria for judging which methods of selection of the X(j) and Y(j) reagents are most successful include the resulting degree of diagonal dominance of J and the amount of uniformity in the magnitudes of the diagonal elements of J.
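The text names two criteria but does not fix a formula for either. As an assumption, this sketch measures diagonal dominance as the smallest ratio of a diagonal element to the largest off-diagonal element in its row. It measures uniformity as the coefficient of variation of the diagonal, where a lower value means more uniform. Other choices are equally defensible. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def j_quality(J: List[List[float]]) -> Tuple[float, float]:
    """Return (dominance, diagonal_cv) for a square J matrix.

    dominance: smallest ratio, over rows, of J[j][j] to the largest off-diagonal
      element in row j (larger is better).
    diagonal_cv: population standard deviation of the diagonal divided by its
      mean (smaller means more uniform diagonal elements).
    """
    n = len(J)
    diag = [J[j][j] for j in range(n)]
    dominance = min(J[j][j] / max(J[j][k] for k in range(n) if k != j) for j in range(n))
    return dominance, pstdev(diag) / mean(diag)


J = [[3.4, 1.75],
     [1.4, 2.5]]
dominance, cv = j_quality(J)
print(round(dominance, 3), round(cv, 3))  # 1.786 0.153
```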
The first aspect of the invention is thus the ability to map samples experimentally in an N-dimensional shape space, whether they are simple (few component substances) or complex (many component substances). This mapping is useful because it permits one to measure the distance in the N-dimensional shape space between different samples, and permits the classification of samples based on where they map relative to each other in the space. If a category of samples maps clearly to within a defined region of the N-dimensional space, and a sample maps clearly outside of that region, the mapping can be used to exclude the possibility that the sample belongs to that category. More generally, mapping groups of samples in different categories in the N-dimensional shape space (for example giving the mean and standard deviation of the distribution in each of the N dimensions for each category) permits straightforward statistical methods to be used to estimate the relative probabilities that unclassified samples belong to the various categories, based on where they map in the N-dimensional shape space. A central aspect of the invention is therefore that it provides the basis for classifying samples with respect to categories.
An important application of this ability to classify samples with respect to categories is the diagnostic aspect of the invention, in which the different categories include sets of samples from individuals that are healthy and sets of samples from individuals with any of a variety of diseases. Each disease is expected to be characterized by Proteomic Analyser points within a disease-specific region, while healthy individuals are expected to be characterized by Proteomic Analyser points within a different, characteristic region. For this application the samples contain immune system variable regions (“V regions”), and the binding of the reference set of reagents to immune system V regions is measured.
The diagnostic aspect leads to a vaccine aspect of the invention, in which the adaptive property of the immune system makes it possible to modify the immune system, and move the Proteomic Analyser point for the V regions of a person with a given disease (or whose Proteomic Analyser point is on a trajectory towards a given disease) back towards the Proteomic Analyser point that is characteristic of a healthy person, or (in a personally customized aspect of the invention) toward the Proteomic Analyser point of that person when he or she was healthy. The same set of reagents that are used to measure the Proteomic Analyser point are used to stimulate the immune system, such that it moves in the direction back towards a Proteomic Analyser point characteristic of the healthy state. For different diseases, different (calculable) recipes (lists) of the same set of reagents are used.
The ability to classify samples with respect to categories leads to the possibility of quality control for many goods, including for example agricultural goods. Extracts of samples of meat can have their Proteomic Analyser points measured and checked for consistency. Suppliers and purchasers of such items as grains and yeast (for making bread) may similarly find it advantageous to have the items certified to have Proteomic Analyser points within a specified range of what they know, from experience, to be satisfactory values. The manufacturers of breakfast cereals may find it useful to monitor the Proteomic Analyser points of batches of their products. A farmer may find it advantageous to measure Proteomic Analyser points of soil samples, and determine which Proteomic Analyser points for the soil samples correlate with good yields for various crops.
In light of these examples, the utility of being able to measure Proteomic Analyser points is evident.
Mapping samples in an approximately orthogonal N-dimensional shape space leads to a method for classifying a wide range of samples with respect to a wide range of categories. We consider an unclassified sample U that we want to classify with respect to Q categories, where Q is an integer equal to or greater than 2, and with each of the categories labelled by a value of q, where q=1 to Q. We select M1 samples that are known by conventional criteria to belong to category 1, M2 samples that are known by conventional criteria to belong to category 2, and in general Mq samples that are known by conventional criteria to belong to category q, thus using a total of Q sets of samples that have been classified using conventional criteria. We map the samples in each category in an N-dimensional, approximately orthogonal shape space, giving coordinates Aqij with q=1 to Q, i=1 to Mq and j=1 to N, and let these PA points be denoted by Aqi. We map the unclassified sample U in the same N-dimensional shape space, giving coordinates AUj with j=1 to N, and let this PA point be denoted by AU.
We compute the N average Proteomic Analyser coordinates Aqav(j), for j=1 to N and q=1 to Q, of the Mq samples in each of the Q categories (their average PA point) as
$$A_{q\mathrm{av}}(j) = \frac{1}{M_q}\sum_{i=1}^{M_q} A_{qij}$$
and designate these average PA points “Aqav”, with q=1 to Q.
We select two of the sample set averages, for example A1av and A2av, to define a new axis in shape space, and designate the Euclidean distance between them “c”, as shown in the accompanying drawings.
We compute the mean and standard deviation of the xi for samples in category 1 and category 2, and denote them by μ1(xi), μ2(xi), σ1(xi) and σ2(xi) respectively. We denote the value of xi for the unclassified sample U by xi(U).
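The equation that defines xi is not reproduced in this text. One natural reading, given c, ai and bi, is that xi is the position of the projection of Ai onto the line from A1av to A2av, measured from A1av. By the law of cosines, that position depends only on the three distances. The following sketch rests on that assumption. It uses the sample (Mq−1) standard deviation, which is also an assumption. The function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def axis_coordinate(a1av: List[float], a2av: List[float], ai: List[float]) -> float:
    """Position of the projection of A_i onto the A1av -> A2av axis, measured from A1av.

    Uses only the Euclidean distances c, a_i and b_i (law of cosines):
    x_i = (a_i**2 - b_i**2 + c**2) / (2 c).
    """
    c = math.dist(a1av, a2av)
    if c == 0:
        raise ValueError("the two category averages coincide; no axis is defined")
    a = math.dist(a1av, ai)
    b = math.dist(a2av, ai)
    return (a * a - b * b + c * c) / (2 * c)


def category_stats(xs: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the x_i values of one category."""
    return mean(xs), stdev(xs)


print(round(axis_coordinate([0.0, 0.0], [4.0, 0.0], [1.0, 3.0]), 9))  # 1.0
```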
In the context of the model that the values of xi for samples within each of the two categories are approximately normally distributed, we calculate the z statistic, zU(q) (q=1 and q=2), for the xi of the unclassified sample U relative to the distribution of xi values for samples in each of the categories 1 and 2:
$$z_U(q) = \frac{x_i(U) - \mu_q(x_i)}{\sigma_q(x_i)}, \qquad q = 1, 2$$
From these computed statistics for xi with q=1 and 2, we determine whether the unclassified sample U can be excluded from category 1 or category 2, and if so, from which category and with what level of confidence. We repeat this process with q=1 and 3, then 1 and 4, and so on to 1 and Q, to determine whether the sample U can be excluded from each of the other categories, and if so, with what level of confidence, with category 1 in each case as the reference category. This process can also be implemented with a different category (q not equal to 1) as the reference category.
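Under the normal model, the z statistic gives the confidence with which U can be excluded from a category. This sketch uses a two-sided test and the sample standard deviation. Both choices are assumptions, and the category needs at least two samples with non-zero spread. The threshold, for example 95%, is left to the user. The function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def z_and_confidence(x_u: float, category_xs: List[float]) -> Tuple[float, float]:
    """z statistic of x_i(U) against one category, and the two-sided confidence
    with which U can be excluded from that category under the normal model."""
    mu, sigma = mean(category_xs), stdev(category_xs)
    z = (x_u - mu) / sigma
    confidence = 2 * NormalDist().cdf(abs(z)) - 1
    return z, confidence


z, conf = z_and_confidence(4.0, [1.0, 2.0, 3.0])
print(round(z, 3), round(conf, 4))  # 2.0 0.9545
```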
We can use a second approach to compute relative probabilities for the sample belonging to each of the various categories. This approach uses the distributions of the coordinates of the samples in the database along each of the N dimensions defined by the N reagent pairs X(j) and Y(j). We begin by using the N mean coordinates of each group, Aqav(j), to compute the standard deviations σqj (j=1 to N, q=1 to Q) of each of the N coordinates of the Mq samples in each group about these means.
We use the values of the coordinates AUj (j=1 to N), which are the components of AU, the computed standard deviations σqj, and the model that, for a given category (fixed q) and a given axis j, the values Aqij with i=1 to Mq are normally distributed about the mean Aqav(j). The normal probability density for the jth coordinate of the unclassified sample having the value AUj, given membership of category q, is
$$[P_{Uq}]_j = \frac{1}{\sigma_{qj}\sqrt{2\pi}}\,\exp\!\left(-\frac{\left(A_{Uj}-A_{q\mathrm{av}}(j)\right)^2}{2\sigma_{qj}^2}\right)$$
We compute the ratio [PU1/PU2]j of the probability that the unclassified sample U belongs to category 1 to the probability that it belongs to category 2, based on the data for these two categories for the jth shape space axis, according to
$$[P_{U1}/P_{U2}]_j = \frac{[P_{U1}]_j}{[P_{U2}]_j}$$
We then compute the joint probability ratio using the data for all N (approximately orthogonal, hence approximately independent) axes in shape space, [PU1/PU2]all N axes, as the product from j=1 to j=N of the probabilities for each of the axes [PU1/PU2]j according to
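$$[P_{U1}/P_{U2}]_{\text{all }N\text{ axes}} = [P_{U1}/P_{U2}]_1\,[P_{U1}/P_{U2}]_2\,[P_{U1}/P_{U2}]_3 \cdots [P_{U1}/P_{U2}]_N = \prod_{j=1}^{N}[P_{U1}/P_{U2}]_j \qquad (10)$$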
We can use this same procedure to compute the probability ratio for the sample U belonging to each of the other Q−2 categories relative to category 1. We can also compute other relative probabilities in the same way, for example the probability that the sample belongs to category 5 relative to the probability that it belongs to category 6. The more samples we have in each category, the more accurately we can determine the means and standard deviations for each category with respect to each of the N axes, and the more accurate the classification results will be.
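The per-axis normal density ratios and their product in equation (10) can be computed together. This sketch sums log densities before exponentiating, which avoids underflow when N is large. It uses the sample (Mq−1) standard deviation, which is an assumption because the document's own standard deviation formula is not reproduced here. Category data are given as lists of PA points. The function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Natural log of the normal probability density at x."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)


def probability_ratio(a_u: List[float], cat1: List[List[float]], cat2: List[List[float]]) -> float:
    """[P_U1/P_U2] over all N axes: the product of per-axis normal density ratios.

    cat1 and cat2 are lists of PA points (one list of N coordinates per sample).
    """
    log_ratio = 0.0
    for j, x in enumerate(a_u):
        c1 = [p[j] for p in cat1]
        c2 = [p[j] for p in cat2]
        log_ratio += log_normal_pdf(x, mean(c1), stdev(c1)) - log_normal_pdf(x, mean(c2), stdev(c2))
    return math.exp(log_ratio)


healthy = [[0.0, 0.0], [2.0, 2.0]]
disease = [[4.0, 4.0], [6.0, 6.0]]
print(probability_ratio([1.0, 1.0], healthy, disease))  # about e**8, i.e. roughly 2981
```

Ratios for any other pair of categories, such as 5 relative to 6, follow by passing those two sets of PA points.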
The above method of classification can be used as a diagnostic method. A premise of the diagnostic aspect of the invention is that immune system V regions in healthy individuals map to a limited, characteristic region in the N-dimensional shape space. This aspect is demonstrated using the Proteomic Analyser itself. Some diseases, such as autoimmune diseases, correspond to particular modes of aberration or collapse of the immune system network of V regions, and immune system V regions in samples from people with each of these diseases map to different, disease-specific regions of the N dimensional shape space. Some diseases are characterized by a disease-specific set of aberrant self antigens (as in the case of cancers) and are also associated with characteristic, disease-specific perturbations of the PA point relative to the healthy, young PA point for the individual. For this application category 1 refers to a set of samples from healthy, preferably young individuals. The other categories are sets of samples from people that have been classified to have various diseases.
The combination of the two classification processes as described above provides a diagnosis comprising both a list of diseases that are excluded and a list of relative probabilities for diseases. For example, a diagnosis may be that each of ten forms of cancer, Alzheimer's disease and Creutzfeldt-Jakob disease are excluded with confidence levels of 95% or higher, while lupus, diabetes and osteoarthritis are not excluded, and with the individual being one hundred times more likely to have lupus than being healthy, fifteen times as likely to have lupus as diabetes and five times as likely to have lupus as osteoarthritis.
So far we have included all of the N reagents in the analysis. We do not need to do this. For the diagnosis of a particular disease or condition we can instead include only those reagents that optimise specificity, sensitivity and simplicity, either individually or jointly.
An advantage of this diagnostic method over the precursor serological distance coefficient method is the fact that it eliminates the need to do absorptions, which was the most labour-intensive part of that earlier method.
Another advantage is that this diagnostic method is based on N-dimensional vectors, with N>>1, as opposed to the 2-dimensional map of the previously published serological distance coefficient diagnostic method, which utilised a single diverse reagent. This means that the method provides more specific diagnoses, since N-dimensional vectors with N>>1 contain much more precise information than 2-dimensional vectors.
In addition to the actual position in N-dimensional shape space, the direction of movement of the coordinates in shape space for an individual from a healthy state towards coordinates characteristic of having a particular disease is indicative of progression towards having that disease.
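The text does not say how the direction of movement should be quantified. One simple possibility is the cosine of the angle between an individual's displacement in shape space, from an earlier PA point to the current one, and the direction from the healthy average PA point to the disease average PA point. This is offered only as an illustration, and the function name is hypothetical.

```python
import math
from typing import List


def progression_cosine(past: List[float], current: List[float],
                       healthy_av: List[float], disease_av: List[float]) -> float:
    """Cosine of the angle between an individual's movement in shape space and the
    healthy -> disease direction (1: straight toward the disease region, -1: away)."""
    move = [c - p for c, p in zip(current, past)]
    target = [d - h for d, h in zip(disease_av, healthy_av)]
    norm = math.hypot(*move) * math.hypot(*target)
    if norm == 0:
        raise ValueError("no movement, or the healthy and disease averages coincide")
    return sum(m * t for m, t in zip(move, target)) / norm


print(progression_cosine([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0]))  # about 0.707
```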
An example of a disease that has historically been difficult to diagnose is systemic lupus erythematosus (SLE). The definition of SLE of 1982 (Tan et al., Arthritis Rheum. 25, 1271-1277, 1982) includes eleven classes of criteria, with multiple alternative sub-criteria for five of these, such that there is a total of twenty criteria. An individual is defined as having lupus if he or she has four or more of the eleven classes of criteria. The Proteomic Analyser method can be used to identify people who have lupus or whose immune systems are on a trajectory towards having lupus.
In addition to its diagnostic role, the formalism and method developed here is useful for the formulation of highly specific multi-component proteomic perturbations to the immune system that function as preventive and/or therapeutic vaccines. This is the case when the diagnosis involves measurements of the binding of the set of reagents to immune system V regions. The diagnosis then measures skewing of the immune system repertoire of V regions relative to the repertoire of healthy individuals, and a stimulus consisting of a combination of the X(j) and Y(j) reagents can be tailored to correct the skewing.
The V region repertoire of an individual can be changed by stimulation with the X(j) and Y(j) reagents. This involves the process of clonal selection, in which cells with specific (V region) receptors that are complementary to a substance are stimulated by that substance to proliferate. Since each X(j) is complementary to the corresponding Y(j), cells with V region receptors that are complementary to the X(j) reagents will be called “Y(j) cells” and cells with V region receptors that are complementary to the Y(j) reagents will be called “X(j) cells”. The process of correcting skewing in the system involves a computed recipe for the stimulation of X(j) cells by the Y(j) reagents and the stimulation of Y(j) cells by the X(j) reagents.
We use a set of MD samples containing immune system V regions from individuals who have been classified to have a given disease (the “D set”), and another set of MH samples containing immune system V regions from healthy individuals (the “H set”). We obtain MH×N binding signals AH(i)X(j) of the X(j) reagents to immune system V regions for the healthy group, where i is an index for the sample that goes from 1 to MH, and j is the index for the reagents X(j) that goes from 1 to N. We likewise obtain MD×N analogous results AD(i)X(j) from the disease group, where i goes from 1 to MD.
For each value of j we average the values of AH(i)X(j) for i=1 to MH:
$$A_{H\mathrm{av}X(j)} = \frac{1}{M_H}\sum_{i=1}^{M_H} A_{H(i)X(j)}$$
We likewise average the values of AD(i)X(j) for each value of j:
$$A_{D\mathrm{av}X(j)} = \frac{1}{M_D}\sum_{i=1}^{M_D} A_{D(i)X(j)}$$
Similarly, for a corresponding set of Y(j) reagents (j=1 to N) we determine, by measurement or computation, or by a combination of measurement and computation as described above, values AH(i)Y(j) for i=1 to MH, and values AD(i)Y(j) for i=1 to MD. We compute average values for each value of j, for the MH samples from healthy individuals and for the MD samples from individuals with the disease:
$$A_{H\mathrm{av}Y(j)} = \frac{1}{M_H}\sum_{i=1}^{M_H} A_{H(i)Y(j)}, \qquad A_{D\mathrm{av}Y(j)} = \frac{1}{M_D}\sum_{i=1}^{M_D} A_{D(i)Y(j)}$$
For a single pair of reagents X(j) and Y(j) and a given disease D, we can plot the values ADavX(j), AHavX(j), ADavY(j) and AHavY(j) on the axes AX(j) and AY(j), as shown in the accompanying drawings.
At first sight, we might choose a concentration of Y(j) proportional to AHavX(j)−ADavX(j) and a concentration of X(j) proportional to AHavY(j)−ADavY(j). A problem with this, however, is that some of these tentative relative concentrations will be negative, and a negative amount of a reagent cannot be included in the formulation of a more complex reagent. This problem can be resolved by substituting a positive amount of the reagent X(j) for a negative amount of any reagent Y(j) [since X(j) is complementary to Y(j)], and likewise a positive amount of Y(j) for any negative amount of X(j). The relative amount of X(j) needed in the vaccine, from the perspective of the X(j)/Y(j) pair of reagents, will be denoted by R[X(j)] and is given by
where sign x=1 for x>0, and sign x=−1 for x<0. Similarly, the relative amount of Y(j) in the vaccine, denoted by R[Y(j)], is given by
In the example of
Immunizations with the X(j) and Y(j) reagents can also be delivered together with an adjuvant, which is an agent that non-specifically boosts immune responses to specific antigens.
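The substitution rule for R[X(j)] and R[Y(j)] described above can be sketched in code. This is one reading of the prose, not a reproduction of the patent's own equations: it assumes that the tentative amount of Y(j) is the healthy-minus-disease difference of the averaged X(j) signals, that the tentative amount of X(j) is the corresponding difference for the Y(j) signals, and that a negative tentative amount of one reagent is replaced by an equal positive amount of its complementary partner. Signals are assumed to arrive as arrays with one row per sample and one column per reagent index j; the function name vaccine_relative_amounts is hypothetical.

```python
import numpy as np


def vaccine_relative_amounts(ah_x, ah_y, ad_x, ad_y):
    """Hypothetical sketch of the X(j)/Y(j) substitution rule.

    Each argument holds binding signals with one row per sample and one
    column per reagent index j: healthy (ah_*) or disease (ad_*) samples,
    measured against the X(j) or Y(j) reagents. A single 1-D row (one
    individual) is also accepted, which covers the personally tailored case.
    Returns (R[X(j)], R[Y(j)]) as two 1-D arrays of length N.
    """
    # Average over samples for each reagent j (rows -> one value per column).
    h_x, h_y, d_x, d_y = (np.atleast_2d(np.asarray(a, dtype=float)).mean(axis=0)
                          for a in (ah_x, ah_y, ad_x, ad_y))
    tentative_y = h_x - d_x  # tentative relative amount of Y(j)
    tentative_x = h_y - d_y  # tentative relative amount of X(j)
    # Keep positive tentative amounts; convert a negative amount of one
    # reagent into an equal positive amount of its complementary partner.
    r_x = np.maximum(tentative_x, 0.0) + np.maximum(-tentative_y, 0.0)
    r_y = np.maximum(tentative_y, 0.0) + np.maximum(-tentative_x, 0.0)
    return r_x, r_y


ah_x = np.array([[0.5, 0.8], [0.7, 0.6]])
ah_y = np.array([[0.3, 0.5], [0.5, 0.5]])
ad_x = np.array([[0.4, 0.9]])
ad_y = np.array([[0.6, 0.2]])
r_x, r_y = vaccine_relative_amounts(ah_x, ah_y, ad_x, ad_y)
print(np.round(r_x, 6), np.round(r_y, 6))  # [0.  0.5] [0.4 0. ]
```

Passing a single patient's signals as ad_x and ad_y, and that person's historical healthy signals as ah_x and ah_y, corresponds to the personally tailored variants described below. The outputs are relative amounts only; scaling them to an actual dose is outside this sketch.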
People's individual antibody repertoires and/or T cell V region repertoires and/or B cell V region repertoires can be characterised as points in N-dimensional shape space using the present invention, including while the individuals are still healthy. Changes in their repertoires as they age can be monitored by measuring the similarity between current and historical samples from the same individual. Any undesired changes can then be counteracted at an early stage by the vaccine method of the invention.
The preceding description concerns vaccines designed for a particular disease and for many people. Such vaccines are especially applicable as preventive immunisations for healthy people. A patient may, however, have a skewing that is unique to that individual, and in such cases a personally tailored approach is beneficial. One method is to replace the average absorbance values ADavX(j) and ADavY(j) with the patient's absorbance values AD(i)X(j) and AD(i)Y(j), respectively, in equations (15) and (16). A further step towards personally tailored vaccines is to replace AHavX(j) with AH(i)X(j) and AHavY(j) with AH(i)Y(j) in equations (15) and (16), where AH(i)X(j) and AH(i)Y(j) are obtained using historical samples from when individual i was healthy. Hence N-dimensional perturbations can be tailored to inhibit and/or reverse pathological skewing of V region repertoires at the level of both populations and individuals.
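Monitoring an individual's repertoire against historical samples can be illustrated with a short sketch. It assumes that each sample has already been reduced to a point of N coordinates in shape space, and it uses Euclidean distance from a healthy baseline as the dissimilarity measure. Both the distance measure and the alert threshold are assumptions for illustration, not the patent's definition of similarity, and the function name flag_repertoire_drift is hypothetical.

```python
import math


def flag_repertoire_drift(baseline, later_samples, threshold):
    """Compare later shape-space points with a healthy baseline point.

    baseline: the N coordinates of a sample taken while the person was healthy.
    later_samples: list of N-coordinate points from the same person over time.
    threshold: hypothetical distance above which a change is flagged.
    Returns a list of (index, distance, flagged) tuples.
    """
    results = []
    for index, point in enumerate(later_samples):
        # math.dist raises ValueError if the two points differ in dimension.
        distance = math.dist(baseline, point)
        results.append((index, distance, distance > threshold))
    return results


baseline = [0.10, 0.40, -0.20]
history = [[0.12, 0.38, -0.21], [0.30, 0.10, 0.20]]
for index, distance, flagged in flag_repertoire_drift(baseline, history, threshold=0.25):
    print(index, round(distance, 3), flagged)
```

A flagged sample would be the cue to consider the early, personally tailored counter-measures described above; choosing a threshold would in practice require data on normal within-person variation.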
The Proteomic Analyser can be used to compare the repertoires of antibodies of young, healthy individual mice of different strains, and of different species. Hence it can be used to experimentally confirm that the repertoires of healthy young individuals of different strains and different species are similar to each other.
While the concept of using X(j)/Y(j) axis coordinates emerged in the context of the V region network of interactions of the immune system, this technology can be used generally to characterise proteomes and to monitor changes in the proteome of an individual or an organism. A Proteomic Analyser point that does not include all of the components of a sample can be useful. For example, mapping the Proteomic Analyser point for immune system V regions, such as those of IgG antibodies, may require some purification of the antibodies. On the other hand, a Proteomic Analyser point that may usefully be monitored, and may have diagnostic value, could be one that includes all the serum components, or all the serum components except antibodies. Thus mapping of molecules other than immune system V regions in the N-dimensional shape space may also be useful in diagnostic applications.
The Proteomic Analyser can be used to measure similarity and dissimilarity in shape between different proteins, including proteins whose three-dimensional structure is known and proteins whose three-dimensional structure is not known. It can thus be a tool that assists in elucidating the three-dimensional structure of proteins, which in turn can assist in the design of drugs that interact with particular proteins.
The Proteomic Analyser can measure Proteomic Analyser points for both biological and non-biological samples. It can provide a method of quality control for individual substances and for mixtures of substances, whether simple or complex.
The invention utilises a diverse array of N reagents (N>>1) and the set of relative binding affinities of the substances for each other, as determined for example by an ELISA assay. A value of N in the range 20 to 1000 is anticipated, but the invention is not limited to this range, and there is no specific minimum value of N. Because the specificity of the method increases exponentially with N (see below), the larger the value of N, the better. From a practical point of view, the technology is likely to be implemented, at least initially, using ELISA plates that have 96, 384 or 1536 wells. A plausible implementation involves each plate containing N X(j) reagents and N Y(j) reagents, so that N is in the range of about 40 to about 750. This range allows for some of the wells to be used as calibration controls. The use of other technologies for measuring the binding of reagents to each other and the binding of samples to the reagents may lead to other preferred values of N that are specific to the details of those technologies.
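The plate arithmetic behind the stated range can be checked directly: if each plate holds N X(j) wells and N Y(j) wells, N is at most half the number of wells left after any calibration controls. The number of control wells used below (16) is a hypothetical choice, and the function name is illustrative.

```python
def max_reagent_pairs(wells: int, control_wells: int = 0) -> int:
    """Largest N such that N X(j) wells and N Y(j) wells fit on one plate."""
    if not 0 <= control_wells <= wells:
        raise ValueError("control_wells must be between 0 and wells")
    return (wells - control_wells) // 2


for wells in (96, 384, 1536):
    # Without controls, then with a hypothetical 16 calibration wells.
    print(wells, max_reagent_pairs(wells), max_reagent_pairs(wells, control_wells=16))
# 96 48 40
# 384 192 184
# 1536 768 760
```

With 16 control wells, the smallest and largest plates give N = 40 and N = 760, in line with the range of about 40 to about 750.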
The N reagents (X(j), j=1 to N) are substances with reproducible, stable, diverse, three-dimensional shapes and may include, for example, monoclonal antibodies and/or other proteins from one or more species. The invention optionally also utilises a second array of N reagents (Y(j), j=1 to N), consisting of mixtures of the first array of N reagents, formulated as described in the above specification.
One preferred embodiment is for all the X(j) reagents to be monoclonal antibodies, for example all of the IgG class. This creates a symmetry in the system that allows for essentially unlimited diversity in shapes, while ensuring that all the reagents have a similar intrinsic ability to cross-link complementary receptors. (The cross-linking of receptors is believed to be the mechanism for the specific stimulation of lymphocytes.) This would be in contrast to using proteins with varying degrees of polymerisation, some of which would be much stronger immunogenic stimuli than others. IgG antibodies have two V regions, and are thus able to cross-link complementary receptors. Another preferred embodiment is to use exclusively soluble proteins of a size comparable to each other and without any repeating determinants, again ensuring that they are of similar immunogenicity.
The set of reagents should optimally have an essentially random interaction matrix K (or KP). The randomness of K or KP will correlate with the matrix J (or JP) being diagonally dominant, and this diagonal dominance of J in turn correlates with the shape space axes being approximately orthogonal to each other. Thus the degree of diagonal dominance of J can be used as a measure of quality for a candidate set of X(j) reagents and, by extension, the corresponding Y(j) reagents. In order to increase the fraction of nonzero terms in the interaction matrix K, the X(j) reagents can themselves be mixtures of reagents, for example mixtures of proteins or, more specifically, of monoclonal antibodies. If the diagonal terms in the matrix J all have approximately the same size, the shape space has a high level of symmetry, which is beneficial.
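One way to turn the diagonal-dominance criterion into numbers is sketched below. It assumes J is supplied as a square N×N numeric matrix and uses the standard row-wise definition: a row is strictly dominant when its absolute diagonal entry exceeds the sum of the absolute off-diagonal entries. Reporting the smallest row ratio, and the coefficient of variation of the diagonal as a proxy for the symmetry mentioned above, are illustrative choices rather than measures defined in the patent; the function name is hypothetical.

```python
import numpy as np


def diagonal_dominance_report(J):
    """Summarise how diagonally dominant a square matrix J is.

    min_ratio: smallest |J_ii| / sum_{k != i} |J_ik| over rows; values above 1
    in every row mean strict row diagonal dominance.
    diagonal_cv: std/mean of |J_ii|; a small value means the diagonal terms
    have similar sizes.
    """
    J = np.asarray(J, dtype=float)
    if J.ndim != 2 or J.shape[0] != J.shape[1]:
        raise ValueError("J must be a square matrix")
    diag = np.abs(np.diag(J))
    # Zero the diagonal exactly, then sum the absolute off-diagonal entries per row.
    off_diag = np.abs(J - np.diag(np.diag(J))).sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Rows with no off-diagonal weight: infinitely dominant unless the row is all zero.
        ratios = np.where(off_diag > 0, diag / off_diag,
                          np.where(diag > 0, np.inf, 0.0))
    mean_diag = diag.mean()
    return {
        "min_ratio": float(ratios.min()),
        "strictly_dominant": bool((ratios > 1).all()),
        "diagonal_cv": float(diag.std() / mean_diag) if mean_diag > 0 else float("nan"),
    }


print(diagonal_dominance_report([[4, 1, 1], [1, 5, 2], [0, 1, 3]]))
```

Comparing these figures across candidate reagent sets gives a simple ranking; what threshold counts as "good enough" would have to be settled empirically.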
For applications of the Proteomic Analyser involving the binding of the reagents to V regions in serum samples, it may be necessary to purify the V-region-bearing molecules in order to decrease the noise due to binding of the reagents to molecules that do not bear V regions. A preferred embodiment is to choose the set of X(j) reagents such that they have minimal affinity for proteins in the samples being mapped, other than the V regions in those samples.
We are currently faced with an important new disease, namely SARS. A virus has been identified as the culprit, but the virus has not been found in all cases of the disease. Several years ago this seemed to be the case with AIDS and HIV, but cases of the syndrome that were negative for HIV were then defined as “idiopathic CD4+ T-lymphocytopenia” rather than AIDS. Smith et al. 1993, N. Engl. J. Med., 328, 373-379; Ho et al. 1993, N. Engl. J. Med., 328, 380-385; Spira et al. 1993, N. Engl. J. Med. 328, 386-392; Duncan et al. 1993, N. Engl. J. Med. 328, 393-398. The definition of AIDS was narrowed to include only those people who are positive for HIV. Morbidity and Mortality Weekly Report, CDC Atlanta, USA 1999, 48 (RR13), 1-31.
We may now have a similar situation with SARS. The World Health Organisation has announced that a corona virus has been shown to cause the disease (see http://www.who.int/mediacentre/releases/2003/pr31/en/), but in Canada only about 50% of confirmed SARS patients were found to be positive by direct detection of the virus, namely by polymerase chain reaction or virus culture (Frank Plummer, personal communication). About 95% of confirmed cases had developed antibodies to the SARS coronavirus by 4 weeks (Frank Plummer, personal communication). This raises the question of whether SARS can be caused by a proteomic stimulus similar to that caused by the virus, but without the virus itself. The method described here may be useful for identifying any additional causes of SARS. Responses to the corona virus would produce one form of repertoire skewing, while other agents may induce a similar but distinct skewing. The invention potentially enables a diagnosis of SARS that is independent of the detection of the corona virus or any other virus.
The specificity of the method depends on the value of N and the accuracy of the assay method. If the values of AiX(j)−AriY(j) were obtained simply as Boolean numbers, then with N=20 the shape space would have $2^{20}$ (about one million) distinguishable points. With an ELISA assay, however, the results are analogue rather than Boolean, and each coordinate might have 10 distinguishable values. Then already with N=5 the shape space would have $10^{5}$ distinguishable points, and with N=20 there would be $10^{20}$ distinguishable points. This remarkable theoretical resolution is expected to be important for applications to diagnostics and vaccines. It can be tested in experiments in which known mixtures of the X(j) reagents themselves are analysed using the method, and the experimentally determined coordinates are compared with theoretical predictions.
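The counts above follow from a simple rule: with L distinguishable values on each of N independent axes, there are L to the power N distinguishable points. This treats every combination of coordinate values as realisable and resolvable, which is an idealisation; the function name is illustrative.

```python
def distinguishable_points(levels_per_axis: int, n_axes: int) -> int:
    """Number of distinct cells in shape space when each of n_axes
    coordinates can take levels_per_axis distinguishable values."""
    return levels_per_axis ** n_axes


print(distinguishable_points(2, 20))   # Boolean coordinates, N=20: 1048576
print(distinguishable_points(10, 5))   # 10 levels per coordinate, N=5: 100000
print(distinguishable_points(10, 20))  # 10 levels per coordinate, N=20: 10**20
```

Because N appears as the exponent, each additional reagent pair multiplies the resolution by L, which is the sense in which specificity grows exponentially with N.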
In their work on shape space, Perelson et al. (1979, J. theoret. Biol. 81, 645-667) estimated limits on the size of the repertoire needed to respond reliably to antigen; they were also concerned with the need to avoid making antibodies to self. The focus of the theory is the relationship between the volume of shape space covered by the reactivity of a single antibody and the total volume of shape space, and hence the number of different antibodies needed to cover shape space reliably. The main parameters in the theory are the dimension N of their shape space, the size of the repertoire $N_{\mathrm{Ab}}$, and the distance ε in shape space within which an antibody can bind all antigens. These parameters are interdependent, and the theory did not include a method for measuring N or ε. On the basis of literature values of the frequencies of antigen-specific cells, they estimated that N could not be more than 5 or 10.
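The coverage argument can be illustrated with a simplified calculation. It assumes that each antibody recognises an independently and randomly placed region occupying a fraction p of shape space (in the picture described above, p is the volume covered by a single antibody's reactivity divided by the total volume of shape space). It is not a reproduction of the full analysis of Perelson et al., and the function names are hypothetical.

```python
import math


def escape_probability(p: float, repertoire_size: int) -> float:
    """Probability that a random antigen is bound by none of the antibodies,
    if each antibody independently covers a fraction p of shape space."""
    return (1.0 - p) ** repertoire_size


def repertoire_needed(p: float, max_escape: float) -> int:
    """Smallest repertoire size whose escape probability is <= max_escape."""
    if not (0 < p < 1 and 0 < max_escape < 1):
        raise ValueError("p and max_escape must lie strictly between 0 and 1")
    # Solve (1 - p) ** n <= max_escape for the smallest integer n.
    return math.ceil(math.log(max_escape) / math.log(1.0 - p))


n = repertoire_needed(p=0.01, max_escape=0.05)
print(n, escape_probability(0.01, n))
```

Shrinking p, for example by making each antibody's region of reactivity smaller relative to the whole space, raises the required repertoire size roughly in proportion to 1/p.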
Lapedes et al. (2001, J. theor. Biol. 212, 57-69) described a shape space whose dimensionality can be determined from experimental data. They used M×N experimental data points, namely the binding of M antigens to N antisera, to map the shapes of each of the antigens and sera to points in a D-dimensional shape space. The method involves minimizing a function of the experimental data points and the shape space coordinates. The relationship of this shape space to that of Perelson et al. is unclear, since it does not have ε or $N_{\mathrm{Ab}}$ as parameters. They found D to have a value of 4 to 5.
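The general idea of placing antigens and antisera as points whose mutual distances reproduce the data can be illustrated with a generic least-squares embedding. This is a stand-in in the style of metric multidimensional scaling, not the objective function used by Lapedes et al.; it assumes the binding data have already been converted into target antigen-to-antiserum distances, and the function name and parameters are hypothetical.

```python
import numpy as np


def embed_antigens_and_sera(target_dist, dim, steps=3000, lr=0.01, seed=0):
    """Least-squares placement of M antigens and N antisera in `dim` dimensions.

    target_dist is an M x N array of desired antigen-to-antiserum distances.
    The stress minimised is sum_ij (||a_i - s_j|| - target_ij)^2, using plain
    gradient descent. Returns (antigens, sera, initial_stress, final_stress).
    """
    target_dist = np.asarray(target_dist, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = target_dist.shape
    antigens = rng.normal(size=(m, dim))
    sera = rng.normal(size=(n, dim))

    def stress(a, s):
        d = np.linalg.norm(a[:, None, :] - s[None, :, :], axis=2)
        return float(((d - target_dist) ** 2).sum())

    initial = stress(antigens, sera)
    for _ in range(steps):
        diff = antigens[:, None, :] - sera[None, :, :]   # shape (m, n, dim)
        dist = np.linalg.norm(diff, axis=2) + 1e-12     # avoid division by zero
        coef = 2.0 * (dist - target_dist) / dist        # d(stress)/d(dist), divided by dist
        grad = coef[:, :, None] * diff                  # per-pair gradient w.r.t. antigens
        antigens -= lr * grad.sum(axis=1)               # move antigen points downhill
        sera += lr * grad.sum(axis=0)                   # sera carry the opposite sign
    return antigens, sera, initial, stress(antigens, sera)


# Synthetic check: distances generated from known 2-D points; stress should fall.
rng = np.random.default_rng(1)
true_a, true_s = rng.normal(size=(8, 2)), rng.normal(size=(6, 2))
targets = np.linalg.norm(true_a[:, None, :] - true_s[None, :, :], axis=2)
_, _, before, after = embed_antigens_and_sera(targets, dim=2)
print(f"stress before {before:.3f}, after {after:.5f}")
```

Repeating the fit for several values of dim and noting where the final stress stops improving is one heuristic way to choose a dimensionality; by contrast, in the present invention N is chosen freely rather than inferred.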
These papers by Perelson et al. and Lapedes et al. are based on the premise that there is an intrinsic dimensionality for shape space relevant to immunological recognition. This premise plays no role in this invention.
This invention is an extension of, and an improvement on, the earlier concept of serological distance coefficients, in which similarity was defined in the context of a single diverse reagent (Hoffmann et al., 1989, Immunol. Letters, 22, 83-90). Here we define similarity in the context of an approximately orthogonal set of N axes in shape space. In immunology, context is of overriding importance, since antibodies are made in the context of a set of self antigens, T cells and other antibodies. The dimension N of the shape space is something we are free to choose, and the choice determines the level of specificity: the larger the value of N, the higher the specificity of the method.
This application claims the benefit of previously filed Provisional Patent Application No. 60/563,819, filed on Apr. 21, 2004.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CA2005/000606 | 4/19/2005 | WO | 00 | 3/11/2008 |
Number | Date | Country |
---|---|---|
60563819 | Apr 2004 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11049964 | Feb 2005 | US |
Child | 11568183 | | US |