The present invention relates to diagnosing disease. More particularly, the invention relates to analyzing biological samples for gene expression values to determine a degree of health of the biological sample.
A large, complex network of interacting components is difficult to describe as a whole dynamic system. In genetics research, scientists examining large numbers of genes, or genetic networks, often focus on identifying one gene or a group of genes that appears to be important to a particular outcome or pathology. What is needed is a low-cost and efficient device, method and system for analyzing the interconnections between genes and genetic networks on a large scale to output a report of a degree of health in a patient.
To address the needs in the art, a method of diagnosing a disease is provided, according to one embodiment of the invention, that includes a gene expression reader analyzing at least one biological sample and outputting gene expression values from at least two genes based on analyzing the biological samples, calculating a scaling factor a for the biological samples using an appropriately programmed computer, where the scaling factor a is calculated from the gene expression values by counting a number of link counts Cn for groups of an individual gene's expression values at different times at a threshold value C, or for groups of genes' expression values at a single time at the threshold value C, calculating an average number Cave of the link counts Cn, calculating a largest number M of the Cn, where the M includes the largest of the number of link counts Cn for a given threshold value C for all the gene expression value groups, iteratively applying a relation Cave=M/log(M) for different threshold values C, comparing data of the Cave values versus M/log(M), and calculating a fitting to the compared data to output the scaling factor a, where the scaling factor a is the slope of the fitting. The method further includes comparing values of the scaling factor a for the biological samples with other scaling factors a′ in a database from analyzed biological samples using the appropriately programmed computer, and outputting a report using the appropriately programmed computer, where the report includes estimates of the at least one biological sample for a degree of health.
According to one aspect of the current method embodiment, the at least one biological sample can include saliva, urine, other body fluids, synovial fluid, breast ductal fluid, blood and blood components, tissue, tumors, bone marrow, stem cells, induced pluripotent cells, cell lines, plant material, or other organic material.
In another aspect of the current method embodiment, the gene expression reader includes at least two gene probes.
In a further aspect of the current method embodiment, the number of link counts Cn includes a number of link counts for each of N expression value groups, where each expression value group includes a sequence of gene expression values n1, n2, . . . nT, at a threshold value C between the expression value group and the sequence of gene expression values n1, n2, . . . nT for the other N-1 gene expression value groups.
According to another aspect of the current method embodiment, the scaling factor a is calculated by iteratively applying Cave=M/log(M) for different threshold values C, using the appropriately programmed computer, and comparing Cave values versus M/log(M), and calculating a linear fitting of the comparison to get the scaling factor a.
In yet another aspect of the current method embodiment, comparing values of a further includes comparing byproducts of the scaling factor a, comparing healthy samples against disease samples, or comparing an unknown sample with a database of values from samples with a known condition.
According to another aspect of the current method embodiment, the threshold value C is in a range between 0 and 1.
In another embodiment of the invention, a system for diagnosing disease is provided that includes a gene expression reader for analyzing at least one biological sample and outputting gene expression values of at least two genes, a computer server for receiving from the gene expression reader the gene expression values and for managing and communicating patient information to a user, and a computer program hosted on the computer server, where the computer program analyzes the gene expression values and outputs a report, where the report includes estimates of the at least one biological sample for a degree of health, where the estimate includes comparing a scaling factor a for the at least one biological sample with other scaling factors a′ in a database from previously analyzed biological samples, where the scaling factor a is calculated from the gene expression values using the computer program by counting a number of link counts Cn for groups of an individual gene's expression values at different times at a threshold value C or for groups of genes' expression values at a single time at the threshold value C, calculating an average number Cave of the link counts Cn, calculating a largest number M of the Cn, where the M includes the largest of the number of link counts Cn for a given threshold value C for all the gene expression value groups, iteratively applying a relation Cave=M/log(M) for different threshold values C, comparing the Cave data values versus M/log(M) data, and applying a fitting to the compared data to output the scaling factor a, where the scaling factor a is the slope of the fitting.
According to one aspect of the current system embodiment, the at least one biological sample can include saliva, urine, other body fluids, synovial fluid, breast ductal fluid, blood and blood components, tissue, tumors, bone marrow, stem cells, induced pluripotent cells, cell lines, plant material, or organic material.
In another aspect of the current system embodiment, the gene expression reader includes at least two gene probes.
In a further aspect of the current system embodiment, the number of link counts Cn includes a number of link counts for each of N expression value groups, where each expression value group includes a sequence of gene expression values n1, n2, . . . nT, at a threshold value C between the expression value group and the sequence of gene expression values n1, n2, . . . nT for the other N-1 gene expression value groups.
According to another aspect of the current system embodiment, the scaling factor a is calculated by iteratively applying Cave=M/log(M) for different threshold values C, using the appropriately programmed computer, and comparing Cave values versus M/log(M) and calculating a linear fitting of the comparison to get the scaling factor a.
In yet another aspect of the current system embodiment, comparing values of a further includes comparing byproducts of the scaling factor a, comparing healthy samples against disease samples, or comparing an unknown sample with a database of values from samples with a known condition.
In a further aspect of the current system embodiment, the threshold value C is in a range between 0 and 1.
In another embodiment, the invention includes a lab-on-a-chip device having a substrate for holding a biological sample receptacle, a gene expression reader and a microprocessor, where the biological sample receptacle includes a sample input to the gene expression reader, where the gene expression reader outputs gene expression values of at least two genes based on analyzing the at least one biological sample, where the microprocessor includes a computer program for analyzing gene expressions in the at least one biological sample, where the computer program compiles the gene expression values, counts a number of link counts Cn for groups of an individual gene's expression values at different times at a threshold value C or for groups of genes' expression values at a single time at the threshold value C, calculates an average number Cave of the link counts Cn, calculates a largest number M of the Cn, where the M includes the largest of the number of link counts Cn for a given threshold value C for all the gene expression value groups, iteratively applies a relation Cave=M/log(M) for different threshold values C, compares data of the Cave values versus M/log(M) data, calculates a fitting to the compared data to output the scaling factor a, where the scaling factor a is the slope of the fitting, compares values of the scaling factor a for the at least one biological sample with other stored scaling factors a′ from analyzed biological samples, and outputs a report, where the report includes estimates of the at least one biological sample for a degree of health.
According to one aspect of the current device embodiment, the at least one biological sample can include saliva, urine, other body fluids, synovial fluid, breast ductal fluid, blood and blood components, tissue, tumors, bone marrow, stem cells, induced pluripotent cells, cell lines, plant material, or organic material.
In another aspect of the current device embodiment, the gene expression reader includes at least two gene probes.
In a further aspect of the current device embodiment, the number of link counts Cn includes a number of link counts for each of N expression value groups, where each expression value group includes a sequence of gene expression values n1, n2, . . . nT, at a threshold value C between the expression value group and the sequence of gene expression values n1, n2, . . . nT for the other N-1 gene expression value groups.
According to one aspect of the current device embodiment, the scaling factor a is calculated by iteratively applying Cave=M/log(M) for different threshold values C, using the appropriately programmed computer, and comparing Cave values versus M/log(M) and calculating a linear fitting of the comparison to get the scaling factor a.
In a further aspect of the current device embodiment, comparing values of a further includes comparing byproducts of the scaling factor a, comparing healthy samples against disease samples, or comparing an unknown sample with a database of values from samples with a known condition.
In yet another aspect of the current device embodiment, the threshold value C is in a range between 0 and 1.
To address the needs in the art, a method of diagnosing a disease is provided, according to one embodiment of the invention.
According to one embodiment of the method 100, the invention uses gene expression values, for example from a microarray or genechip, for N expression value groups that can include a large number, if not all, of the genes in a genome for a given organism. In one embodiment, N does not need to contain all available expression value groups of the microarray data, only a large subset of the microarray data.
In one embodiment of the method 100, the gene expression values nT can be read from the microarray at multiple time intervals T. The dataset for quantification will include N groups of gene expression values nT of the form:
n1,n2, . . . , nT
where n is the gene expression value of one of N genes taken at T intervals.
For the sequence of gene expression values nj in the gene expression value group Ni, the absolute value is taken of a correlation between the gene expression value group Ni and every other gene expression value group (the other N-1 groups).
The total number of other gene expression value groups with a correlation above a threshold value C is called Cn and represents the number of links connecting this gene expression value group to all other gene expression value groups in the dataset with a value of C or greater. The largest of the Cn for a given C for all N gene expression value groups is then taken and called M. The average of all the Cn for a given C is also taken and called Cave. According to one embodiment of the invention, for different values of C, the values of M and Cave form the relation:
Cave=(M/log(M))a
To find the value of the scaling factor a, the method above is repeated by iteratively applying the relation Cave=M/log(M) for different threshold values C, comparing the Cave data values versus the M/log(M) data, and applying a fitting to the compared data to output the scaling factor a, where the scaling factor a is the slope of the fitting. According to the current embodiment, the threshold value C is in a range between 0 and 1.
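The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names `link_counts` and `scaling_factor` are hypothetical, the correlation is assumed to be Pearson, the comparison with C is assumed to be strict (greater than), the logarithm is assumed to be natural, and the fitting is assumed to be an ordinary least-squares line of Cave against M/log(M) whose slope is reported as a. If the intended relation were a power law, the same fit would instead be applied to the logarithms of both quantities.

```python
import numpy as np


def link_counts(expr, threshold):
    """Return Cn for every group: the number of other groups whose absolute
    Pearson correlation with it exceeds the threshold C.

    expr: array of shape (N, T), one expression value group per row.
    """
    corr = np.abs(np.corrcoef(expr))   # N x N absolute correlations
    np.fill_diagonal(corr, 0.0)        # a group is not linked to itself
    return (corr > threshold).sum(axis=1)


def scaling_factor(expr, thresholds):
    """Fit Cave against M/log(M) over several thresholds C; return the slope."""
    xs, ys = [], []
    for c in thresholds:
        cn = link_counts(expr, c)
        m = cn.max()                   # M: largest link count at this C
        if m < 2:                      # log(1) is 0 and log(0) is undefined
            continue
        xs.append(m / np.log(m))       # assumption: natural logarithm
        ys.append(cn.mean())           # Cave: average link count at this C
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct M/log(M) points to fit")
    slope, _intercept = np.polyfit(xs, ys, 1)
    return slope
```

A call such as `scaling_factor(expr, np.linspace(0.5, 0.95, 10))` would sweep ten thresholds inside the stated range of 0 to 1; the particular grid is an assumption, and thresholds that leave M below 2 are skipped because M/log(M) is not defined there.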
In one embodiment of the method 100, shown in
In one example of this embodiment, given gene expression values for 5 different genes, labeled 1-5, at a single time, three gene expression value groups (N=3) can be made containing three gene expression values each (T=3), for example the gene expression values from genes 1-3, 2-4, and 3-5. The invention calculates the absolute values of the Pearson correlation between each group and the other two (N-1=2). Assume that 4 of the correlation values calculated are >0.95. Then, for C=0.95 and N=3, Cave=4/3=1.33.
Further, assume that the largest number of absolute Pearson correlation values >0.95 for any single gene expression value group is 2. Then M for C=0.95 would be 2.
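The arithmetic of this example can be checked with a short Python sketch. The correlation matrix below is hypothetical: its values are invented only so that 4 of the 6 group-to-group correlation values exceed 0.95, as the example assumes.

```python
import numpy as np

# Hypothetical absolute Pearson correlations between the three groups
# (genes 1-3, genes 2-4, genes 3-5). The numbers are invented for illustration.
abs_corr = np.array([
    [1.00, 0.97, 0.40],
    [0.97, 1.00, 0.98],
    [0.40, 0.98, 1.00],
])
C = 0.95

above = abs_corr > C
np.fill_diagonal(above, False)   # ignore each group's correlation with itself
cn = above.sum(axis=1)           # link count per group: [1, 2, 1]
c_ave = cn.sum() / len(cn)       # 4 links over N=3 groups, i.e. 4/3
m = cn.max()                     # largest link count of any single group: 2
```

Each group is counted once for every other group it is linked to, so a pair of correlated groups contributes two of the 4 values above the threshold.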
The essence of both the single-time groups and the time series (time groups) approach is that in each case correlation values are taken between one group and all the other groups.
Then it is calculated how many correlation values are greater than the threshold C. The largest number for any single group is M. The total number for all groups divided by the number of groups (N) gives Cave. Although these two approaches may yield different values of the scaling factor a, according to one aspect of the invention the only requirement is that the same approach be used consistently when comparing values of a between biological samples.
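As a hedged sketch of how the two approaches differ only in the way groups are formed, the following Python code builds groups either from a time series or from single-time values and then applies the same correlation step to both. The function names are hypothetical, and the overlapping consecutive grouping for the single-time case is an assumption taken from the example of genes 1-3, 2-4 and 3-5.

```python
import numpy as np


def time_series_groups(expr):
    """Time groups: each gene's T time points form one group (rows of N x T)."""
    return np.asarray(expr, dtype=float)


def single_time_groups(values, size):
    """Single-time groups: overlapping runs of `size` consecutive genes."""
    values = np.asarray(values, dtype=float)
    if size > len(values):
        raise ValueError("group size exceeds the number of genes")
    return np.array([values[i:i + size] for i in range(len(values) - size + 1)])


def abs_correlations(groups):
    """Absolute Pearson correlation between every pair of groups."""
    return np.abs(np.corrcoef(groups))
```

Either kind of group array can be passed to the same thresholding and counting step; what matters for comparing samples is that one grouping scheme is chosen and kept.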
According to one aspect of the method 100, the at least one biological sample can include saliva, urine, other body fluids, synovial fluid, breast ductal fluid, blood and blood components, tissue, tumors, bone marrow, stem cells, induced pluripotent cells, cell lines, plant material, or other organic material.
In another aspect of the method 100, comparing values of a further includes comparing byproducts of the scaling factor a, comparing healthy samples against disease samples, or comparing an unknown sample with a database of values from samples with a known condition.
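One simple way such a comparison could be carried out is sketched below in Python. The description does not specify the comparison rule, so this nearest-mean rule is purely an assumption, as are the function name `closest_condition`, the dictionary layout of the database, and the placeholder values used in the usage note.

```python
from statistics import mean


def closest_condition(a, reference):
    """Return the condition whose stored scaling factors a' have the mean
    closest to the scaling factor a of the unknown sample.

    reference: dict mapping a condition label to a list of stored a' values.
    """
    centers = {label: mean(vals) for label, vals in reference.items() if vals}
    if not centers:
        raise ValueError("reference database is empty")
    return min(centers, key=lambda label: abs(a - centers[label]))
```

For instance, with made-up placeholder values `{"healthy": [0.82, 0.85, 0.80], "disease": [0.55, 0.60, 0.58]}`, a sample with a of 0.79 would be reported as closest to the "healthy" group. A real report would likely also need to convey how far the sample lies from each group.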
In another embodiment of the invention,
According to one embodiment of the system 300, the at least one biological sample can include saliva, urine, other body fluids, synovial fluid, breast ductal fluid, blood and blood components, tissue, tumors, bone marrow, stem cells, induced pluripotent cells, cell lines, plant material, or organic material.
In another aspect of the system 300, the gene expression reader includes at least two gene probes.
In a further aspect of the system 300, the number of link counts Cn includes a number of link counts for each of N expression value groups, where each expression value group includes a sequence of gene expression values n1, n2, . . . nT, at a threshold value C between the expression value group and the sequence of gene expression values n1, n2, . . . nT for the other N-1 gene expression value groups.
According to another aspect of the system 300, the scaling factor a is calculated by iteratively applying Cave=M/log(M) for different threshold values C, using the appropriately programmed computer, and comparing Cave values versus M/log(M) and calculating a linear fitting of the comparison to get the scaling factor a.
In yet another aspect of the system 300, comparing values of a further includes comparing byproducts of the scaling factor a, comparing healthy samples against disease samples, or comparing an unknown sample with a database of values from samples with a known condition.
In a further aspect of the system 300, the threshold value C is in a range between 0 and 1.
According to one aspect of the device 400, the at least one biological sample can include saliva, urine, other body fluids, synovial fluid, breast ductal fluid, blood and blood components, tissue, tumors, bone marrow, stem cells, induced pluripotent cells, cell lines, plant material, or organic material.
In another aspect of the device 400, the gene expression reader includes at least two gene probes.
In a further aspect of the device 400, the number of link counts Cn includes a number of link counts for each of N expression value groups, where each expression value group includes a sequence of gene expression values n1, n2, . . . nT, at a threshold value C between the expression value group and the sequence of gene expression values n1, n2, . . . nT for the other N-1 gene expression value groups.
According to one aspect of the device 400, the scaling factor a is calculated by iteratively applying Cave=M/log(M) for different threshold values C, using the appropriately programmed computer, and comparing Cave values versus M/log(M) and calculating a linear fitting of the comparison to get the scaling factor a.
In a further aspect of the device 400, comparing values of a further includes comparing byproducts of the scaling factor a, comparing healthy samples against disease samples, or comparing an unknown sample with a database of values from samples with a known condition.
In yet another aspect of the device 400, the threshold value C is in a range between 0 and 1.
The present invention has now been described in accordance with several exemplary embodiments, which are intended to be illustrative in all aspects, rather than restrictive. Thus, the present invention is capable of many variations in detailed implementation, which may be derived from the description contained herein by a person of ordinary skill in the art. For example, the invention can be applied to other complex interconnected networks where a single network component or node in the network can have the degree to which it is switched “on” quantified in a way similar to single gene expression values in a genetic network. Examples could include: numbers characterizing the total energy that each single protein in a protein-protein interaction network acquires from binding with other proteins in the network, other biochemical networks where the interaction between single components and other components can be similarly quantified for each component, numbers reflecting the flow of information to/from each single node in a communication or computer network, and numbers reflecting the flow of traffic through individual intersections in a city traffic network or between individual hubs in a transportation network.
All such variations are considered to be within the scope and spirit of the present invention as defined by the following claims and their legal equivalents.
This application claims priority from U.S. Provisional Patent Application 61/362676 filed Jul. 8, 2010, which is incorporated herein by reference.
Number | Date | Country
---|---|---
61362676 | Jul 2010 | US