The present invention relates to a system, apparatus, and method for transforming original measurement data to reduce overall sensitivity in an unreliable region while enhancing the sensitivity of the data in regions where this is desired.
Measurement data can have distributions that are poorly suited to certain pattern classification learning methods because their dynamic range is too large or too small. For example, consider microarrays in which a glass slide is populated with single-stranded DNA. A sample is washed over such a slide so that RNA present in the sample will preferentially bind to the DNA strands. This is often done relative to a control, with binding to a different type of fluorescing molecule being used to distinguish between the control and the target. The light color and intensity are then read to determine how the target is being expressed, with the measurement data being logs of the ratio of the intensity of a first color to that of a second color.
In a typical experiment, readings for one type of microarray data are encoded as the log of a ratio of gene expression levels in a test tissue and a control tissue. The numerical range of the resulting numbers can be very large, but the values will typically reside in a much narrower range (say plus two to minus two).
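The log-ratio encoding described above can be sketched as follows. This is a minimal illustration only: the function name `log_ratio` and the choice of base 2 are assumptions, since the text does not state which logarithm base is used.

```python
import math


def log_ratio(test_intensity: float, control_intensity: float) -> float:
    """Return log2(test / control) for a single microarray spot.

    Base 2 is an assumption made for illustration; the source only
    says that a log of the intensity ratio is taken.
    """
    if test_intensity <= 0 or control_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / control_intensity)


# Equal intensities give a log ratio of zero; a fourfold change in either
# direction gives values of equal magnitude and opposite sign.
for test, control in [(400.0, 100.0), (100.0, 100.0), (25.0, 100.0)]:
    print(f"test={test:6.1f} control={control:6.1f} "
          f"log ratio={log_ratio(test, control):+.2f}")
```

A ratio near 1.0 therefore corresponds to an encoded value near zero, which is the region discussed later as the least reliable.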
A popular pattern discrimination learning method is the multi-layer perceptron (MLP), also called a feedforward neural network. These machines require that their input data be numerical values in the range [0, 1]. Therefore, in order to present these microarray data to an MLP, one must transform the original data to conform to this input range requirement.
A function that can perform the desired transformation is a sigmoid function such as the arctan function. With suitable scaling and offset, such functions ensure that very large or very small measurement values always map into the required range [0, 1], but at the price that differences between large values can be greatly diminished. Let us call this “reduced sensitivity” in the range of large values. One can usually select a suitable parameter for the sigmoid function so that the transformation is nearly linear over the range typically expected. If the slope on the nearly linear range is greater than 45 degrees, the sensitivity will be enhanced; if it is less than 45 degrees, the sensitivity will be reduced; if it is exactly 45 degrees, the sensitivity will remain unchanged.
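As a minimal sketch of such a sigmoid mapping, the following scales and offsets the arctan function so that its output lies between 0 and 1. The function name `arctan_squash` and the parameter name `gain` are hypothetical; the particular scaling (division by pi and an offset of 0.5) is one common choice and is an assumption here, not a form stated in the text.

```python
import math


def arctan_squash(x: float, gain: float = 1.0) -> float:
    """Map any real x into the open interval (0, 1); x = 0 maps to 0.5.

    `gain` (hypothetical name) controls how steep the nearly linear
    central region is.
    """
    return 0.5 + math.atan(gain * x) / math.pi


# The same unit step in the input produces a much smaller change in the
# output when the inputs are large: the "reduced sensitivity" effect.
typical_gap = arctan_squash(1.0) - arctan_squash(0.0)
extreme_gap = arctan_squash(101.0) - arctan_squash(100.0)
print(f"output change for 0 -> 1:     {typical_gap:.6f}")
print(f"output change for 100 -> 101: {extreme_gap:.6f}")
```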
A difficulty, however, can still occur. In the example above, the sensitivity of the transformed data will be maximum (i.e., the transform sigmoid function will have maximum derivative) near zero. This is the region where the ratio of measured values is near 1.0 (a log ratio near zero), which unfortunately is where the measurement is least reliable. One would want the sensitivity of the transformation to be very low here, so that the learning machine does not exploit small differences where they are not reliable.
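The location of the maximum can be checked with a short derivation. It assumes, for illustration only, a scaled arctan sigmoid s(x) with a hypothetical gain parameter a > 0; the text does not give the exact form used.

```latex
s(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(a x),
\qquad
s'(x) = \frac{a}{\pi\left(1 + a^{2}x^{2}\right)} .
```

Under this assumption the denominator is smallest at x = 0, so s'(x) attains its maximum value a/π there and decreases monotonically as |x| grows. Choosing a = π would give a slope of exactly one (45 degrees) at the origin.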
The system, apparatus and method of the present invention provide an effective and efficient way to transform the original data so as to reduce sensitivity of the overall transformation in an unreliable region while leaving it largely unchanged or enhanced everywhere else.
The present invention overcomes at least the above-noted problem of the prior art by providing an additional Gaussian transform that includes a parameter that permits tuning of the transform's width to that desired for the application in which it is being used. Further, the present invention advantageously addresses various issues surrounding the effectiveness and efficiency of current molecular diagnostic techniques. That is, the present invention will facilitate improved disease detection (e.g., both with respect to timing and accuracy), disease treatment (e.g., clear and personalized), and disease monitoring (e.g., fast and sensitive). Accordingly, the present invention is well suited to address the continuing need for real-time, faster, more sensitive, less labour-intensive and hence more cost-effective molecular diagnostic solutions suitable to replace or complement traditional techniques.
Additional benefits associated with the present invention (e.g., ability to cope with and/or effectively manage large amounts of data) will be apparent from the detailed description which follows, particularly when reviewed together with the appended figures, which figures are referenced to assist those of ordinary skill in the art to which the subject matter of the present disclosure appertains to better understand the illustrative examples of the present disclosure, wherein:
It is to be understood by persons of ordinary skill in the art that the following descriptions are provided for purposes of illustration and not for limitation. An artisan understands that there are many variations that lie within the spirit of the invention and the scope of the appended claims. Unnecessary detail of known functions and operations may be omitted from the current description so as not to obscure the present invention.
In measurement data, the distribution of the measurements may suggest transformations. For example, if a set of measurements is strongly skewed, a logarithmic, square-root, or other power transform (with an exponent between −1 and +1) may be applied. If a set of measurements has high kurtosis but low skewness, an arctan transform may be used to reduce the influence of extreme values. However, the arctan function has its steepest slope at zero, which the present Gaussian transform repairs. That is, the system, apparatus, and method of the present invention add a second transformation that distorts the original data in such a way as to reduce the sensitivity of the overall transformation in the unreliable region while enhancing it or leaving it largely unchanged everywhere else.
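One way such a second transformation could behave is illustrated below. The functional form g(x) and the width symbol w are assumptions chosen for illustration; the text states only that the transform is Gaussian and has a tunable width.

```latex
g(x) = x\left(1 - e^{-x^{2}/w^{2}}\right),
\qquad
g'(x) = 1 - e^{-x^{2}/w^{2}}\left(1 - \frac{2x^{2}}{w^{2}}\right).
```

With this assumed form, g'(0) = 0, so the slope vanishes in the unreliable region around zero, while g'(x) tends to 1 for |x| much larger than w, leaving large values essentially unchanged. For x² > w²/2 the slope slightly exceeds 1, which corresponds to mild enhancement just outside the flattened region. If g is followed by a sigmoid s, the chain rule gives (s∘g)'(x) = s'(g(x)) g'(x), which is also zero at x = 0.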
In a preferred embodiment, an additional Gaussian transform is provided that has its own parameter, herein p1, which permits tuning the width of the Gaussian transform to that desired for the application. Referring to
A preferred embodiment of a combined transformation for input of data to a Neural Net (or other pattern discrimination method) is shown in the following computer program.
It will be clear to one of ordinary skill in the art that one can have either transform independent of the other if one's task requires one and not the other property.
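The idea of two separable transforms that can also be chained can be sketched as follows. This is an illustrative sketch under stated assumptions and not the program of the disclosure: the functional forms, the function names, and the roles given to p2 (arctan gain) and p3 (output span) are all hypothetical. Only the role of p1 as the Gaussian width comes from the text.

```python
import math


def gaussian_notch(x: float, p1: float) -> float:
    """Flatten values near zero; p1 sets the width of the flattened region.

    Assumed form: x * (1 - exp(-(x / p1)**2)).
    """
    return x * (1.0 - math.exp(-((x / p1) ** 2)))


def arctan_squash(x: float, p2: float = 1.0, p3: float = 1.0) -> float:
    """Squash x into (0.5 - p3/2, 0.5 + p3/2).

    p2 (gain) and p3 (output span, 0 < p3 <= 1) are assumed roles.
    """
    return 0.5 + p3 * math.atan(p2 * x) / math.pi


def combined_transform(x: float, p1: float, p2: float = 1.0,
                       p3: float = 1.0) -> float:
    """Apply the Gaussian notch first, then the arctan sigmoid."""
    return arctan_squash(gaussian_notch(x, p1), p2, p3)


def slope(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Compare sensitivity (slope) with and without the notch.
for x in (0.0, 0.5, 2.0, 10.0):
    plain = slope(arctan_squash, x)
    notched = slope(lambda v: combined_transform(v, p1=1.0), x)
    print(f"x={x:5.1f}  slope arctan only={plain:.6f}  combined={notched:.6f}")
```

Because each step is its own function, either one can be used alone when a task needs only one of the two properties.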
The combined transform of the present invention can be incorporated into an analysis apparatus as at least one of a software and firmware module that accepts values for parameters p1-p3 and original input values and returns transformed values. The following main program illustrates the behavior of such an embodiment wherein a main program solicits inputs for p1-p3 from a user and prints out transformed values according to the present invention for input data in the range [−20,20] that increments in steps of 0.1 over this range. In practice, actual sample data would be input and transformed by the combination.
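A driver of the kind described above might look like the following sketch. It is not the main program of the disclosure: the transform's functional form and the meanings of p2 and p3 are assumptions, the function names are hypothetical, and fixed parameter values stand in for values that would be solicited from a user.

```python
import math


def combined_transform(x: float, p1: float, p2: float, p3: float) -> float:
    """Hypothetical combined transform: Gaussian notch, then scaled arctan.

    p1 is the Gaussian width (from the text); p2 as arctan gain and p3 as
    output span are assumed roles.
    """
    notched = x * (1.0 - math.exp(-((x / p1) ** 2)))
    return 0.5 + p3 * math.atan(p2 * notched) / math.pi


def sweep(p1: float, p2: float, p3: float,
          lo: float = -20.0, hi: float = 20.0, step: float = 0.1):
    """Return (x, y) pairs for x from lo to hi inclusive in steps of `step`."""
    n_steps = round((hi - lo) / step)
    # Computing each x from its index avoids accumulating rounding error.
    return [(lo + i * step, combined_transform(lo + i * step, p1, p2, p3))
            for i in range(n_steps + 1)]


# Fixed example parameters; an interactive program could prompt for these.
rows = sweep(p1=1.0, p2=1.0, p3=1.0)
for x, y in rows[::50]:  # print every 50th row to keep the output short
    print(f"{x:7.2f}  {y:.6f}")
```

In practice the sweep over a synthetic range would be replaced by actual sample data passed through `combined_transform`.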
Referring to
Referring now to
Having identified certain preferred aspects of the analysis apparatus of the present disclosure, it will be readily apparent to one skilled in the art that such apparatus may effectively be utilized in association with a variety of known and/or yet to be discovered medical diagnostic or measurement techniques. For example, the apparatus of the present disclosure is well suited for, among other things, use in association with the identification, monitoring and/or treatment of disease, as well as the characterization of biological conditions via, for instance, gene expression data (see, e.g., U.S. Pat. Nos. 6,964,850, 6,960,439, and 6,692,916, which patents are hereby expressly incorporated by reference as part hereof, for further illustrative discussion).
The user could make decisions based on the transformed data themselves, but more likely the transformed data would go directly into the analysis system 603, whose outputs would then be used to make decisions. Initial analysis might consist merely of computing and displaying the distribution of the transformed data, but more likely it would involve applying pattern discovery methods and examining the discovered patterns according to some criterion of utility or reasonableness.
A persistent memory and database 510 provides short- and long-term storage of inputs, outputs, and intermediate results for transforming measurements by the measurement transform subsystem 500. The analysis system 600 further includes measurement analysis algorithms 603 connected to the persistent memory and database 510, which retains and makes available parameters, tolerances, decision rules, original measurements, and a longitudinal history of results of transforming the original measurement data using the apparatus and method of the present invention.
Having identified certain preferred aspects of the analysis system of the present disclosure, it will be readily apparent to one skilled in the art that such apparatus may effectively be utilized in association with a variety of known and/or yet to be discovered medical diagnostic or measurement techniques. For example, as with the apparatus of the present disclosure, the system may also be well suited for, among other things, use in association with the identification, monitoring and/or treatment of disease, as well as the characterization of biological conditions via, for instance, gene expression data.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that the system and apparatus architectures and methods as described herein are illustrative and that various changes and modifications may be made and equivalents may be substituted for elements thereof without departing from the true scope of the present invention. In addition, many modifications may be made to adapt the teachings of the present invention to a particular situation without departing from its central scope. Therefore, it is intended that the present invention not be limited to the particular embodiments disclosed as the best mode contemplated for carrying out the present invention, but that the present invention include all embodiments falling within the scope of the appended claims.
The present disclosure is related to U.S. Provisional Patent Application No. 60/691,131, entitled “Transforming Measurement Data For Classification Learning”, and filed Jun. 16, 2005, with such reference being assigned to the Assignee of the present disclosure.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2007/051283 | 4/10/2007 | WO | 00 | 11/6/2008 |
Number | Date | Country | |
---|---|---|---|
60746905 | May 2006 | US |