Claims
- 1. A method of reducing noise in assay data collected in assaying measurables in a sample, comprising the steps of:
providing replicate assay data for each of a plurality of measurables for one or more assay samples; providing a filtering function that identifies noise in replicate assay data; applying the filtering function to the replicate assay data to generate noise data; modeling the noise data to generate a noise model; and applying the noise model to the replicate assay data to reduce noise present in the replicate assay data.
- 2. The method of reducing noise in assay data of claim 1, wherein:
the filtering function has filtering conditions, the filtering function being configured to operate on the replicate assay data to filter data based on the filtering conditions; and the filtering function is applied to the replicate assay data to designate the replicate assay data as being part of at least a first group and a second group, wherein the data in the first group satisfies the filtering conditions, and the data in the second group fails to meet at least one filtering condition.
- 3. The method of reducing noise in assay data of claim 2, further comprising the step of decomposing the second group to generate an eigenmatrix comprising a plurality of eigenvectors.
- 4. The method of reducing noise in assay data of claim 1, wherein modeling the noise data comprises the steps of:
decomposing the noise data to generate decomposed noise data; projecting the noise data onto the decomposed noise data to form projected noise data; providing a model distribution having model distribution parameters; and fitting the model distribution to the projected noise data by calculating the model distribution parameters to generate a model noise distribution.
- 5. The method of reducing noise in assay data of claim 3, further comprising:
providing a threshold eigendistance corresponding to a desired confidence level on the model noise distribution; projecting the replicate assay data onto the eigenmatrix to generate replicate assay data eigendistances for each of the replicate assay data; and selecting data from the replicate assay data having eigendistances greater than the threshold eigendistance; wherein the replicate assay data having eigendistances greater than the threshold eigendistance are significant data.
- 6. The method of claim 3, wherein:
the replicate assay data are expression level measurements from a gene microarray experiment; and the filtering conditions comprise whether greater than a first percentage of the plurality of data for a given sample was manually adjusted, whether each of the plurality of data associated with an individual experimental sample has the same sign as each of the other data for that experimental sample, and whether each expression level measurement for an experimental sample falls within a numerical range.
- 7. A method of generating a filtering function for selecting significant data in assay data comprising the steps of:
providing a filtering function with at least one filtering parameter that can have a plurality of possible parameter values; providing assay data comprising known false data; evaluating the ability of the filtering function to remove false data from the assay data for a plurality of possible parameter values to generate respective filtering function effectiveness values; and using the filtering function effectiveness values to select a value for at least one filtering parameter of the filtering function to remove false data better than at least one other possible value of the filtering parameter.
- 8. The method of claim 7, wherein the filtering parameter is the number of replicate measurements.
- 9. The method of claim 8, wherein the number of replicate measurements is about four to six replicate measurements.
- 10. The method of claim 7, wherein the assay data are gene expression level measurements.
- 11. The method of claim 7, wherein providing assay data comprising known false data comprises:
providing a reference sample, wherein the reference sample generates a predominant majority of true positive reference results and a predominant minority of false negative reference results when studied with the experimental system; providing a blank sample, wherein the blank sample generates a predominant majority of true negative results when studied with the experimental system; providing an assay target sample, wherein the assay target sample generates no true positives when studied with the experimental system; and studying the reference sample, the blank sample, and the assay target sample with the experimental system, wherein the reference sample is used to generate true positive results, the blank sample is used to generate true negative results, and the assay target sample is used to generate false positive results, wherein the true positive results, true negative results, and false positive results are used to select a value for the parameter of the filtering function from the possible parameter values that minimizes false positive results and false negative results.
- 12. The method of claim 11, wherein the filtering parameter is the number of replicate measurements.
- 13. The method of claim 12, wherein the number of replicate measurements is about four to six replicate measurements.
- 14. The method of claim 11, wherein the assay data are gene expression level measurements, the false positive results are measurements that falsely show changes in gene expression, and the false negative results are results that fail to show changes in gene expression.
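The filtering conditions of claims 2 and 6 can be illustrated with a short Python sketch. It assumes that each row of a NumPy array holds the replicate expression-level measurements for one measurable, with a parallel boolean array marking manually adjusted values. The function name `partition_by_filter` and the default cutoffs (at most 25% adjusted values, a permitted range of -10 to 10) are hypothetical choices, not values taken from the claims. Rows that satisfy every condition form the first group; all other rows form the second group.

```python
import numpy as np


def partition_by_filter(values, adjusted, max_adjusted_fraction=0.25,
                        value_range=(-10.0, 10.0)):
    """Hypothetical filtering function for the conditions of claim 6.

    values:   (n_measurables, n_replicates) expression-level measurements
    adjusted: same-shape boolean array, True where a value was manually adjusted
    Returns a boolean mask that is True for rows in the first (passing) group.
    """
    values = np.asarray(values, dtype=float)
    adjusted = np.asarray(adjusted, dtype=bool)

    # Condition 1: no more than max_adjusted_fraction of a row was manually adjusted.
    few_adjusted = adjusted.mean(axis=1) <= max_adjusted_fraction

    # Condition 2: every replicate in the row has the same (nonzero) sign.
    same_sign = np.all(values > 0, axis=1) | np.all(values < 0, axis=1)

    # Condition 3: every replicate lies inside the permitted numerical range.
    low, high = value_range
    in_range = np.all((values >= low) & (values <= high), axis=1)

    return few_adjusted & same_sign & in_range


# Illustrative data: one clean row, one row with mixed signs, one out-of-range row.
vals = np.array([[1.2, 0.9, 1.1, 1.0],
                 [0.5, -0.4, 0.6, 0.3],
                 [2.0, 2.1, 15.0, 1.9]])
adj = np.zeros_like(vals, dtype=bool)
mask = partition_by_filter(vals, adj)
first_group, second_group = vals[mask], vals[~mask]
```

Here only the first row passes; the second group collects the rows that fail at least one condition and is the candidate noise data for the later modeling steps.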
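Claims 1 and 3 to 5 can be read as an eigen-decomposition pipeline, sketched below in Python under several assumptions. The second (failing) group is treated as the noise data, and its singular value decomposition supplies the eigenmatrix. The "eigendistance" of a row is assumed to be the length of its projection onto those eigenvectors, with each coordinate scaled by the noise spread along that eigenvector. The model distribution is assumed to be a normal distribution whose parameters (mean and standard deviation) are fitted to the noise eigendistances. The claims fix neither choice, and all function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def build_noise_model(noise):
    """Decompose the noise data into an eigenmatrix (assumed reading of claims 3-4).

    noise: (n_noise_rows, n_replicates) measurements that failed the filter.
    Returns eigenvectors as columns and the noise variance along each one.
    """
    noise = np.asarray(noise, dtype=float)
    # Rows of vt are eigenvectors of noise.T @ noise; s**2 are the matching eigenvalues.
    _, s, vt = np.linalg.svd(noise, full_matrices=False)
    variances = s ** 2 / noise.shape[0]
    return vt.T, variances


def eigendistances(data, eigenmatrix, variances):
    """Project rows onto the eigenmatrix and scale each coordinate by the noise spread."""
    coords = np.asarray(data, dtype=float) @ eigenmatrix
    return np.sqrt(np.sum(coords ** 2 / variances, axis=1))


def significant_rows(data, noise, confidence=0.95):
    """Flag rows whose eigendistance exceeds a threshold on the fitted noise model."""
    eigenmatrix, variances = build_noise_model(noise)
    # Fit the (assumed normal) model distribution to the noise eigendistances.
    noise_dist = eigendistances(noise, eigenmatrix, variances)
    model = NormalDist(mu=float(noise_dist.mean()), sigma=float(noise_dist.std()))
    # Threshold eigendistance at the desired confidence level of the model.
    threshold = model.inv_cdf(confidence)
    return eigendistances(data, eigenmatrix, variances) > threshold, threshold


# Synthetic illustration: pure-noise replicates plus three strongly changed rows.
rng = np.random.default_rng(0)
noise = rng.normal(size=(500, 4))
candidates = np.vstack([np.full((3, 4), 5.0), rng.normal(size=(200, 4))])
flags, threshold = significant_rows(candidates, noise)
```

Rows flagged `True` play the role of the significant data of claim 5. Under this reading, discarding the unflagged rows is how the noise model is applied to reduce noise in the replicate assay data.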
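The parameter search of claims 7 to 13 can be sketched in Python as follows. The number of replicate measurements is the filtering parameter, as in claims 8 and 12. The filtering function itself, its 0.5 cutoff, and all function names are hypothetical. Following claim 11, a reference-sample row that is not called counts as a false negative, and any call on the blank or assay target sample counts as a false positive. The candidate value with the fewest total errors is selected. The toy arrays are invented for illustration only and say nothing about real replicate counts.

```python
import numpy as np


def called_changed(replicates, n_replicates, cutoff=0.5):
    """Hypothetical filtering function: call a row 'changed' when its first
    n_replicates values share a sign and their mean exceeds the cutoff in magnitude."""
    sub = np.asarray(replicates, dtype=float)[:, :n_replicates]
    same_sign = np.all(sub > 0, axis=1) | np.all(sub < 0, axis=1)
    return same_sign & (np.abs(sub.mean(axis=1)) > cutoff)


def error_counts(reference, blank, target, n_replicates):
    """Return (false positives, false negatives) for one candidate parameter value."""
    # Every reference row should be called; each miss is a false negative.
    false_neg = int(np.sum(~called_changed(reference, n_replicates)))
    # Blank and assay-target rows should never be called; each call is a false positive.
    false_pos = int(np.sum(called_changed(blank, n_replicates))
                    + np.sum(called_changed(target, n_replicates)))
    return false_pos, false_neg


def select_replicate_count(reference, blank, target, candidates=range(1, 7)):
    """Pick the candidate with the fewest total errors (ties go to the first candidate)."""
    scores = {n: sum(error_counts(reference, blank, target, n)) for n in candidates}
    return min(scores, key=scores.get), scores


# Toy data (invented): six replicates per row.
reference = np.array([[2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
                      [2.0, 2.0, 2.0, 2.0, -2.0, 2.0]])
blank = np.zeros((2, 6))
target = np.array([[1.0, 1.0, 1.0, -1.0, 1.0, 1.0]])
best, scores = select_replicate_count(reference, blank, target)
```

With these arrays, too few replicates let the assay-target row through as a false positive, while too many reject a reference row as a false negative; the per-candidate `scores` dictionary plays the role of the filtering function effectiveness values of claim 7.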
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0001] This invention was made with United States government support under Grant Nos. R01-CA81367 and R29-CA78825 from the National Cancer Institute of the National Institutes of Health. The government of the United States has certain rights in the invention.