The present invention relates to the processing of spectral analysis data, and more particularly, to filtering spectral analysis data to remove random noise and other disturbances and/or to compensate for deficiencies in the data collection process utilized to capture the spectral data.
An assortment of analytical identification methods exist for the non-destructive testing of materials. Particularly, various spectroscopic methods, including Raman spectroscopy, can be advantageously employed in the practical identification of particulates. By way of illustration, in dispersive Raman spectroscopy, a laser is used as an excitation source to focus light onto a particle. The laser light from the excitation source interacts with the Raman active chemical bonds of the particle impinged upon by the laser light to produce Raman lines that are shifted from the wavelength of the excitation laser by corresponding vibration frequencies. Light from the sample area of the particle is filtered using a filter that blocks the excitation wavelength while passing the Raman-shifted wavelengths to a spectrometer.
The spectrometer may utilize a grating that disperses light causing different wavelengths to leave the grating at different angles. Under this arrangement, the light from the grating travels to an optical detector, such as a charge coupled device (CCD) camera to capture spectral data representative of the Raman spectrum of the particle under interrogation. The data captured by the CCD can be utilized as a signature, which is typically compared to a library of previously determined signatures to identify the material excited by the laser.
According to various aspects of the present invention, a method of locating and removing spike noise from spectral data is presented. The method comprises collecting a plurality of replicates of spectral data, where each collected replicate comprises a plurality of pixels, which have an associated intensity value. The plurality of replicates represents discrete instances of spectral data obtained sequentially from a sample under interrogation. The method further comprises performing replicate processing to condition the intensity values of the collected replicates for subsequent pixel processing and spike noise filtering and performing pixel processing to identify outliers corresponding to noise spikes within the plurality of replicates by collectively evaluating the conditioned intensity values of the plurality of replicates as a function of pixel position. The method still further comprises transforming the replicates by adjusting the intensity values of the plurality of replicates based upon a predetermined function. The method still further performs spike noise filtering by removing each identified outlier from its replicate and by replacing the removed outlier value with a computed approximation of an intensity value of a noise-free signal at that removed outlier pixel position.
According to further aspects of the present invention, a method of locating and removing noise spikes from spectral data is presented. The method comprises obtaining a first raw replicate of spectral data, wherein the first raw replicate represents a first instance of spectral data obtained from a sample under investigation; and obtaining a second raw replicate of spectral data, wherein the second raw replicate represents a second instance of spectral data obtained from the same sample under investigation. The method then computes a first smoothed replicate from the first raw replicate and a second smoothed replicate from the second raw replicate. The method compares the first raw replicate, the first smoothed replicate, the second raw replicate, the second smoothed replicate, or any combination thereof to determine whether at least one noise spike is present in one of the raw replicates. If the method locates a spike, then the method eliminates the effect of each located noise spike in the first raw replicate and the second raw replicate by locally re-smoothing the corresponding replicate around each located spike. After the effects of the noise spikes are removed, the method consolidates the first raw replicate and the second raw replicate into a single spectrum that is free of cosmic spikes.
According to still further aspects of the present invention, a method of detecting bad pixel elements in a charge coupled device is presented. The method comprises collecting a plurality of replicates of spectral data, wherein each collected replicate comprises a plurality of pixels, each pixel having an associated intensity value. The method then calculates a rolling median for each pixel of each collected replicate and estimates a noise level of the corresponding spectrum. The method also identifies pixels of interest, local minima, and the height of a negative peak of select pixels. The method identifies large negative peaks where local minima have a height greater than a predetermined threshold. Further, the method comprises assessing the negative peaks for removal and replacing each removed negative peak with a linear interpolation between the pixel previous to the peak and the next pixel after the peak.
With reference generally to
In some embodiments, the light source 12 comprises a high intensity laser capable of generating a laser beam 18 having a narrow spectral bandwidth. In many embodiments, the optics 14 comprise one or more optical components, such as lenses, reflection surfaces, and/or other optical devices necessary to direct the laser beam 18 towards a sample area 20. For instance, as illustrated, the laser beam 18 passes through one or more optional lenses and/or reflection surfaces 22, which direct the laser beam 18 along a first optical path, towards an optical device 24 such as a long pass dichroic mirror. Light from the laser beam 18 is redirected by the optical device 24 along a second optical path that is orthogonal to the first optical path so as to pass the laser beam 18 through an objective 26. The objective 26 serves to focus the laser beam 18 onto the sample within the sample area 20.
According to various aspects of the present invention, the sample area 20 may include a sample collected using a suitable sampler. In illustrative implementations, the sampler comprises an aerosol collector, such as a small area electrostatic aerosol collector. Such a small area electrostatic aerosol collector deposits a sample drawn from the ambient air onto an interrogation region 20A, e.g., a sample substrate within the sample area. In alternate implementations, other samplers and/or sampling techniques are utilized to collect a suitable sample for interrogation. For instance, in illustrative implementations, the sampler comprises a sample dispenser such as a micro liter liquid sample dispenser. As such, depending upon the collection technique implemented, the sample can be taken from the ambient air, from a water supply or other source, from an environmental sample, medical diagnostic sample, product, material or component sample, etc.
Regardless of the sample collection technology, the sample in the interrogation region 20A of the sample area 20 is illuminated by the light source 12. Scattered and dispersed light is collected from the sample area 20 back through the objective 26 along a third optical path that is generally opposite in direction of the second optical path. In this regard, the interaction between the laser light and the sample collected in the sample area 20 leads to Raman scattering of light that is shifted in wavelength from the light source 12. As such, the light directed along the third optical path includes elastic incident photons and inelastically scattered photons due to Raman scattering.
The light along the third optical path is redirected by the optical device 24 along a fourth optical path. As the light is directed along the fourth optical path, the inelastically scattered photons are separated from the elastic incident photons, e.g., using at least one appropriate filter device 28, e.g., a longpass filter, a bandpass filter, etc., such that the inelastically scattered photons are passed to a spectrometer 30 and a processing device 32, which implements one or more spike filters as described in greater detail herein.
For example, in several embodiments the spectrometer 30 includes a spectrometer grating that passes the filtered light to the image output device 16, e.g., a two dimensional charge coupled device (CCD). The divergence in angles of the light exiting the grating causes light at different wavelengths to arrive on different pixels of the CCD so as to capture spectral data representative of the Raman spectrum of the particle under interrogation. Thus, the image output device 16 receives inelastically scattered photons to output information regarding the sample interrogated on the sample substrate.
The imaging system 10 can implement any spectroscopic measurement system such as for Raman, infrared, atomic, visible, microwave and ultraviolet spectroscopy. In this regard, other system configurations may be implemented within the spirit and scope of the present invention. For instance, in numerous embodiments, the optics 14 comprise various combinations of filters, beam splitters, lenses, mirrors, and other devices. Moreover, even though CCD devices are commonly used for various types of spectroscopic measurements, in some embodiments, other devices are used to capture spectral information.
Referring to
The architecture and features of the processing device 32 are presented by way of illustration and not by way of limitation. In that regard, in several embodiments, the processor 32 may have an alternative architecture and/or features to that described with reference to
The Raman spectra collected from the image output device 16, e.g., a CCD device, are susceptible to showing spikes of random noise features caused by cosmic rays and other disturbances. These disturbances are also often referred to as ‘cosmic spike’ noise because the noise can be caused by the interaction of cosmic rays with the CCD pixel elements. However, as used herein, the term ‘cosmic spike’ or more generally, ‘noise spike’, refers generally to any random spike, noise spike, cosmic ray interference or other disturbance spike in spectral data.
Cosmic spike noise is generally random, and thus can affect any pixel element of a sensor within the CCD device. Accordingly, cosmic spike noise can distort the true signal and can thus interfere with spectral analysis and classification. By way of example, a Raman spectrum can be represented by a plurality of adjacent intensity signals aligned along a row thus defining a one dimensional array of pixels. However, a cosmic ray at the time of sensor reading can corrupt the true spectrum by producing a large spike that affects one or more of the measured intensity values. Such a cosmic spike can generate intensity values that are 10 times higher, or more, than other nearby intensity values. Moreover, each cosmic spike of interference can typically contaminate a number of pixels, e.g., 1-4 consecutive pixels in an illustrative example.
Referring to
The method 100 defines a cosmic spike filter (CSF), which may be implemented as the filter 36 executed by the processing device 32 of
In general, the method 100 can be conceptually organized as data gathering, data preprocessing, data analysis, spike elimination (where applicable) and spectral analysis preparation.
The data gathering aspect of the method comprises obtaining spectra comprising two or more replicates at 102. That is, a spectrum is collected at least two discrete times, from a given sample at a given sample location. Data characterizing the measurement of each captured spectrum defines a replicate of the spectrum, i.e., an instance of a spectrum measurement taken at a selected location of a given sample. For instance, in illustrative implementations, the image output device 16 as described with reference to
In an illustrative example, the cosmic spike filter 100 obtains the Raman spectra in pairs of replicates. If noise is introduced each time a replicate is read from the CCD array, then requiring only a low number replicates reduces the amount of noise introduced into the system. In an illustrative implementation, the cosmic spike filter 100 is able to detect and remove narrow noise spikes, e.g., spikes that are one to four pixels in width, and which are spaced more than one filter width apart. Furthermore, utilizing two replicates improves the accuracy of the spike detection process compared to when only one replicate is used, as will be described in greater detail herein.
Keeping with the above example, the method 100 obtains a first raw replicate of spectral data, designated herein as Replicate 1, where the first raw replicate represents a first instance of spectral data collected from a sample under interrogation, e.g. a sample in the interrogation region 20A of the sample area 20 illustrated in
Data preprocessing may be optionally performed on the gathered data. For instance, in illustrative implementations, each replicate is smoothed at 104. By way of illustration, smoothing is implemented by computing a first smoothed replicate from the first raw replicate and by computing a second smoothed replicate from the second raw replicate. In exemplary implementations, keeping with the above example, the processing device 32, e.g., via the microprocessor 42, executes computer readable program code stored in the tangible computer readable storage medium 60 to implement a quadratic least-squares procedure at 104, to compute the first smoothed replicate from the first raw replicate and to compute the second smoothed replicate from the second raw replicate.
Data analysis is performed to locate and identify features that are determined to be spike noise. For instance, a determination is made at 106 as to whether at least one noise spike is present in each of the first and second replicates. Keeping with the above example, the processing device 32, e.g., via the microprocessor 42, executes computer readable program code stored in the tangible computer readable storage medium 60 to determine whether at least one noise spike is present in each of the first and second raw replicates by comparing select ones of the first raw replicate, the first smoothed replicate, the second raw replicate and the second smoothed replicate. The comparison is described in greater detail in reference to section I.B. below. In this regard, noise spikes are located (if present) in each replicate. In illustrative implementations, noise spikes are located using a two-pass approach to find outliers. In many embodiments, additional processing is also performed on the results of this two-pass approach to return a final list of outliers that represent spike noise.
Upon identifying spike noise within Replicate 1 and/or Replicate 2, a noise spike elimination operation is performed at 108. In exemplary implementations, the processing device 32, e.g., via the microprocessor 42, executes computer readable program code stored in the tangible computer readable storage medium 60 to eliminate the effect of each located noise spike in the first and second raw replicates. By way of example, elimination of noise spikes is implemented by correcting the effect of each located noise spike in the first and second raw replicates by locally re-smoothing the corresponding replicate around each located spike, as will be described in greater detail herein.
After removing spike noise, the Replicates are prepared for spectral analysis at 110. In exemplary implementations, the processing device 32, e.g., via the microprocessor 42, executes computer readable program code stored in the tangible computer readable storage medium 60 to consolidate Replicate 1 and Replicate 2 into a single spectrum at 110 that is free of cosmic spikes.
Thus, in exemplary implementations, noise spikes are corrected by identifying spikes as outliers and by locally re-smoothing the corresponding raw replicate without using the outlying values. In some embodiments, the two resultant locally re-smoothed replicates are summed, combined or otherwise merged to yield a single spectrum that is free of cosmic spikes.
By way of illustration and not by way of limitation, an exemplary implementation of the method 100 comprises obtaining spectra at 102 by obtaining a pair of raw replicates, designated as Replicate 1 and Replicate 2. Further, smoothing each replicate at 104 comprises using local polynomial regression to calculate replacement values for corresponding spectral values, such as by implementing a Savitzky-Golay filter. The Savitzky-Golay filter can be executed by the processing device 32 of
As an illustration, let x1, x2, . . . , xn, and y1, y2, . . . , yn, denote the raw values for Replicate 1 and Replicate 2, respectively. Let X1, X2, . . . , Xn, and Y1, Y2, . . . , Yn, denote the corresponding smoothed values for Smoothed Replicate 1 and Smoothed Replicate 2, respectively. More specifically, Xi denotes the smoothed value that will replace xi, in the corresponding instance of smoothed spectral data in Smoothed Replicate 1. Similarly, Yi denotes the smoothed value that will replace yi, in the corresponding instance of smoothed spectral data in Smoothed Replicate 2.
For a filter width m, the quadratic function q(x) that best fits the 2m+1 data points xi−m, . . . xi+m is calculated by the method of least squares. Under this arrangement, Xi is defined as q(xi). In numerous embodiments, the quadratic function q(y) that best fits the 2m+1 data points yi−m, . . . yi+m is also calculated by the method of least squares such that Yi is defined as q(yi). In many embodiments, the smoothing operation is performed as a convolution method where the convolution coefficients are calculated ahead of time, e.g., before implementing smoothing at 104, to give the effect of performing local quadratic regression.
As described above, a determination is made as to whether a noise spike is present in a replicate at 106. In illustrative implementations, noise spikes, if present, are located using a two-pass approach. In general, the two-pass approach is implemented using a first raw replicate (e.g., Replicate 1) and a second raw replicate (e.g., Replicate 2).
A first standardized residual is calculated for the difference between a first raw replicate and a smoothed version of the first replicate (e.g., Smoothed Replicate 1). Pixel numbers corresponding to potential outliers are identified in the first raw replicate where the standardized residual value exceeds a predetermined cutoff value. That is, a potential outlier is declared for each pixel of the first raw replicate where the computed first standardized residual satisfies a predetermined condition.
In the second pass, the first raw replicate (Replicate 1) is compared with the second raw replicate (Replicate 2) at least at the identified pixel numbers corresponding to potential outliers to identify true outliers where the comparison exceeds a predetermined threshold value. As an illustrative example, comparisons can be performed in a neighborhood around potential outliers identified in the first pass. For instance, by comparing the difference between the first and second raw replicates within a filter width (pixel width of the cosmic spike filter) for each pixel declared as a potential outlier, pixels are identified as a true outlier if the difference between corresponding pixel values exceed a predetermined amount. In this manner, official outliers are identified from (discriminated from) the potential outliers.
The above-process is repeated for the second raw replicate. That is, a second standardized residual is calculated for the difference between the second raw replicate and a smoothed version of the second replicate. A potential outlier is declared for each pixel of the second raw replicate where the computed second standardized residual satisfies a predetermined condition. Then, by comparing the difference between the second and first raw replicates within a filter width of each pixel declared as a potential outlier, pixels are identified as a true outlier if the difference between corresponding pixel values of the second and first raw replicates within the filter width exceeds a predetermined amount. Locally re-smoothing is then performed over the first and second raw replicates without using the outlying values.
In some embodiments, these comparisons are implemented by forming standardized residuals between both the first raw replicate and its smoothed version and the first raw replicate and the second raw replicate. Outliers are identified where the standardized residuals exceed predetermined cutoff values. In this regard, the two-pass approach can be executed by the processing device 32 of
In illustrative implementations, the located true outliers are eliminated by locally re-smoothing the corresponding replicate, e.g., around and including the identified outliers, without using the outlying values. The above two-pass approach is repeated for the second raw replicate. That is, the second raw replicate can be compared with its smoothed version in a first pass, and the second raw replicate can be compared with the first raw replicate in the second pass, in a manner analogous to that set out above.
By way of illustration and not by way of limitation, standardized residuals are formed by calculating:
r
i
/s=(xi−Xi)/s
In illustrative examples, s is the standard deviation of the residual r. In alternative examples, s is an estimate of the standard deviation, such as:
s=(μ75−μ25)/1.3490
where μ25 and μ75 are the first and third quartiles of the distribution of residuals, respectively. A potential outlier at pixel i is identified when ri/s>c1, for some cutoff value c1.
Notably, the outlier detection method sometimes finds artificial outliers near cosmic spikes. Furthermore, the outlier detection method may not always find all pixels associated with a noise spike or other disturbance of interest. However, according to various aspects of the present invention, the deficiencies that may manifest in the outlier detection method are overcome by utilizing a second pass in which the raw values of Replicate 1 are compared to the corresponding raw values of Replicate 2. In this regard, a true outlier is declared, for example, by searching within only one filter width of a potential outlier and identifying the potential outlier as a true outlier if the difference between the two replicates at the outlier pixel location is deemed “large enough,” e.g., if the difference between the first and second raw replicates at the corresponding pixel location exceeds a predetermined amount. The difference between the two replicates is thus measured in both an absolute and relative sense.
Referring briefly to
However, referring back to
For purposes of illustration of cosmic spike filtering, an exemplary filter with width m=4 and cutoff value c1=3.5 is used to filter spectral information. In general, the filter width and/or the cutoff value may be empirically determined. Let Replicate 1 be the spectrum shown in
Referring back to
R
i
/S=(xi−yi)/S
In this illustrative example, S can be the standard deviation of the residual R or an estimate of the standard deviation, such as:
S=(μ75−μ2.5)/1.3490
where μ25 and μ75 are the first and third quartiles of the distribution of the Ris, respectively. Searching within only one filter width of pixel i (i.e., for j=i−m, . . . , i+m), a true outlier is declared if both of the following conditions are satisfied:
R
j
/S>c
1
and
R
j/max(1,yj)>c2
The first condition (Rj/S>c1) evaluates the difference between replicates in an absolute sense. The second condition (Rj/max(1,yj)>c2) evaluates the difference between replicates in a relative sense, i.e., as a percentage of the Replicate 2 value. The term “max(1, yj)” returns a “1” if yj<1 and yj if yj≧1. In some embodiments, a cutoff value of c2=0.25 is utilized.
Keeping with the above example, three pixels are identified as potential outliers in Replicate 1, including pixels 709, 713, and 717. However, after processing the second pass of outlier detection, results for Replicate 1 yield outliers at pixels 712, 713, and 714. There are at least two main results of the second pass. First, the potential outliers at pixels 709 and 717 did not pass the conditions of the second pass. Second, the outlier at pixel 713 has been expanded to cover pixels 712-714. That is, the cosmic spike at pixel 713 is supported over three consecutive pixels. For the second pass of the outlier detection method, Replicate 2 acts as a backup for deciding which values in Replicate 1 are truly outliers.
Referring to
The spike detection process is repeated for Replicate 2, where Replicate 1 acts as the backup. For the Replicate 2 illustrated in
Suppose that Replicate 1 has an outlier at pixel i: that is, at intensity value xi. Then each of the 2m+1 values smoothed using the value xi will be contaminated by this outlier. This contaminating effect is removed by recalculating the quadratic functions used in the original smoothing process without using the intensity value xi.
More specifically, the smoothed values Xi−m, . . . , Xi+m, have all been contaminated by the value xi. For example, Xi−m is obtained by finding the quadratic function q(x) that best fits the data points xi−2m, . . . , xi, and then setting Xi−m=q(xi−m). In the outlier elimination phase, Xi−m is recalculated by finding the quadratic function Q(x) that best fits the data points xi−2m, . . . , xi−1, (i.e., without xi), and then setting Xi−m=Q(xi−m). This process is repeated for all values Xi−m, . . . , Xi+m, and for all other outliers xi.
The outlier elimination process may be extended to cover the cases of two (double), three (triple), four (quadruple), or more consecutive outliers. The basic concept is the same in each case: the best-fit quadratic is recalculated without using the double, triple or quadruple outliers as data points. All contaminated values of the smoothed spectrum are then recalculated using the new quadratic functions. In this regard, the outlier elimination process can be executed by the processing device 32 of
The outlier elimination approach may exhibit a few limitations. For example, in some embodiments, a noise spike is completely removed only if it occupies a predetermined number of consecutive pixels, e.g., four consecutive pixels or fewer. The elimination method could be further extended to eliminate the effects of M consecutive pixels. However, there would be a degradation in the smoothing as M approaches 2m−2, and the least-squares problem would be underdetermined when M>2m−2. In general, high intensity noise that covers more than a few consecutive pixels, e.g., four consecutive pixels, should not be considered a noise spike and should be filtered using some other technique. Also, if two separate cosmic spikes occur within m pixels of one another, then it will not be possible to eliminate both noise spikes completely. This can be seen because, while the first cosmic spike is being corrected, the replacement values will still be contaminated by the second spike and vice versa.
The final filtered replicates are smoothed spectra. However, in certain illustrative implementations, the filtering process is modified so that raw intensity values are maintained except in the location of cosmic spikes. The replicates still need to be smoothed initially for use in the spike identification process, but otherwise the smoothed values are discarded. In this regard, identified noise spike intensity values are replaced as described above. In many embodiments, the decision to keep fully smoothed spectra or not depends on what further analysis will be done on the spectra. For instance, certain analysis techniques include smoothing in a later step. However, it may be undesirable to smooth the spectra twice.
As noted in
It has been observed that certain spectral features, e.g., that are high-intensity and very narrow, may falsely be identified as cosmic spikes based solely upon an evaluation of the absolute difference between raw replicates because the difference between replicates at the tip of the feature is large in the absolute sense. For this reason, the difference between replicates is also evaluated in a relative sense, as described in greater detail herein.
Referring to
To confirm that this feature agrees between the two replicates and is not a cosmic spike, the two replicates are plotted together in
I. F. Additional Observations with Regard to Comparing Replicates
The above-described outlier detection in its various implementations described above, exhibits at least one limitation. If a cosmic spike was to occur at the same pixel in each replicate, then the outlier detection might not detect such a pair of cosmic spikes because the two replicates would agree with each other at those pixels. However, cosmic ray events are rare, and it is highly unlikely for two such events to occur in the same pixel. Other kinds of noise peaks, such as those generated by electronic noise, are more common. However, these kinds of peaks tend to have broader pixel widths relative to cosmic spikes, and are not considered spikes for purposes of cosmic spike filtering herein. In this regard, filtering of electrical noise, etc., should be handled by some other filtering process.
Outliers consisting of too many consecutive pixels often represent noise features that are not spikes. Therefore, a designated spike feature is limited to a small fixed number of pixels (spike width). Any spikes that consist of more than a predetermined spike width threshold, e.g., four consecutive pixels in an illustrative example, are not considered spike noise for purposes of cosmic spike filtering herein. For purposes of clarity of discussion herein, the number of pixels has been limited to about three to four pixels. However, this number is for illustration only. In practice, the specific number of pixels used to define a maximum spike width can vary.
According to further aspects of the present invention, situations can arise where outlier pixels are spaced by a small “gap”. In this regard, it may not always possible to completely remove the effects of a noise spike of interest. Accordingly, “gap filling” may be performed as part of the data analysis process at 106 of
By way of illustration, assume that there is an outlier spike within m pixels of a spike of interest, where m is a relatively small number, e.g., smaller than the pixel width of typical cosmic spikes. According to illustrative implementations, the gap of m pixels between spikes is filled in and then “overlapping” spikes are removed. Assume for example, that pixels 123, 124, and 126 are identified as outliers. The naïve approach would be to view these pixels as a two-pixel spike spaced by one pixel from a one-pixel spike. However, it may not be possible to remove two spikes that are so close together (spaced by a single pixel in this example). In this regard, a better result is obtained by designating pixel 125 as an outlier pixel. Accordingly, pixels 123-126 in this illustrative example, are treated as one spike that is four pixels wide.
A missing pixel, such as pixel 125 in the above example, can often reasonably be considered as part of the noise spike, even where pixel 125 fails to be characterized as an outlier. For instance, due to the particular attributes of two raw replicates (Replicate 1 and Replicate 2), the conditions for identifying a pixel as an outlier by evaluating the difference between the raw replicates may fail to be satisfied at that pixel.
According to various aspects of the present invention, in many embodiments, missing pixels are filled using an appropriate rule. For instance, in several embodiments, a pixel gap is determined to be eligible to be filled if the total spike width (after filling gaps) does not exceed a predetermined number of consecutive pixels, e.g., four pixels.
According to further illustrative implementations of the present invention, missing pixels are filled in by recognizing that there can sometimes be more than one possibility for filling in a pixel gap and by giving higher priority to pixels that correspond, for example, to a larger mean Rj=Xj−Yj over the resulting spike. The utilization of the larger mean Rj is provided by way of illustration and not by way of limitation.
In this regard, other possibilities may be utilized for filling in a pixel gap. For example, in illustrative implementations, if pixels 103, 105, 106, and 108 are identified as outliers, and a rule says that the largest spike can be four pixels, a cosmic spike processing algorithm, e.g., as implemented during the data analysis processing at 106 of
In addition to filling in pixel gaps where possible, in numerous embodiments, some spikes are removed from consideration if they occur too close to other spikes. Two spikes are considered to be too close together, for example, if the end pixels of each spike are within m pixels of each other. Since it will not be possible to fix both such spikes, the lesser spike should be ignored so that the more severe spike can be removed successfully. There are a number of possibilities for deciding which spike is more severe and should be correspondingly passed on for spike elimination. In an illustrative example, a decision of which spike is more severe and will be passed on to the elimination part of the process is determined by the values of Rj. The spike with the higher maximum value of Rj is retained for further processing while the other spike is dropped from consideration.
Various aspects of the present invention are designed to act on Raman spectra collected in pairs, e.g., as Replicate 1 and Replicate 2 as described more fully herein. However, it is possible to collect additional replicates and combine the collected data into two master replicates e.g., by summing the data. The total Raman integration time should be the same for each of the master replicates so that the intensity values between the replicates is comparable. Furthermore, in many embodiments, multiple replicates are collapsed down to two master replicates by summing the collected replicate data in an alternating sequence. For example, if 8 replicates are collected, then Replicate 1 would be the sum of the collected replicates 1, 3, 5, and 7, and Replicate 2 would be the sum of the collected replicates 2, 4, 6, and 8. This practice reduces the chances of a spectral feature being classified as a spike because the intensity of the feature decayed sufficiently between the first half of the replicates and the second half.
According to various aspects of the present invention in several embodiments, the filter 100 finds cosmic spikes primarily in the pixel dimension as opposed to the time dimension. The exemplary filter 100 uses at least two time replicates. An advantage to using relatively few replicates is that there is less contribution to the overall spectral data (the sum of the replicates) from CCD read error, which is constant for each replicate used. The second replicate serves to improve the spike detection process. Moreover, the spike elimination process has been extended to handle spikes of consecutive pixels, e.g., four consecutive pixels.
According to further aspects of the present invention a data collection approach is provided that is implemented in combination with a statistical approach to removing cosmic spike noise from the collected spectral data without distorting the true spectral data.
Referring now to
Referring to
Referring to
More particularly, the approach of simple averaging may result in the collected data being affected by noise spikes, which are localized in time.
For instance, as illustrated in
Referring to
Referring to
Spectra are collected as a plurality of replicates at 202 in a manner analogous to that set out in greater detail herein. For instance, as described in greater detail herein, each collected replicate comprises a plurality of pixels, each pixel having an associated intensity value. In an illustrative example, eight replicates are collected.
As an illustrative example, assume a CCD such as the CCD device described with reference to
The method 200 further comprises performing replicate processing at 204 to condition the intensity values of the collected replicates for subsequent pixel processing and spike noise filtering. Further, the method 200 comprises performing pixel processing at 206 to identify outliers corresponding to noise spikes within the plurality of replicates by collectively evaluating the conditioned intensity values of the plurality of replicates as a function of pixel position. Finally, the method comprises performing spike noise filtering at 208 by removing each identified outlier from its replicate and by replacing the removed outlier value with a computed approximation of an intensity value of a noise-free signal at that removed outlier pixel position.
Referring to 18B, an exemplary detailed method of performing replicate processing at 204 is illustrated. In this regard, processing as described below may be implemented as part of the filter 36 executed by the processing device 32 of
In general, a loop is performed to prepare the collected replicate data for spectral processing. In the illustrated method, one of the replicates is selected at 212. A subset of pixel numbers from the selected replicate is optionally selected at 214. The subset of pixel numbers may include the entirety of pixels collected, or the subset may include less than the entirety of pixels. By way of illustration and not by way of limitation, the subset of pixel numbers comprises a limited range of pixels, e.g., pixels 163 to 943 of 1024 total pixels. In practice, the subset need not be limited to a single range or contiguous range of pixels. Rather, the subset can comprise any one or more pixels of a corresponding replicate.
In several embodiments, an optional compensation is performed at 216, e.g., for pixels that satisfy a predetermined condition. Compensation is used to compensate for pixel intensity values that are invalid for subsequent pixel processing and spike noise filtering, e.g., to prepare, limit, compensate, adjust or otherwise configure the values of pixels for subsequent processing as described more fully herein. A decision is made at 218 as to whether there are more replicates for processing. If all of the replicates have been processed, the replicate processing is complete. Otherwise, flow loops back to 212 to select a next replicate.
Referring to 18C, an exemplary detailed method of pixel processing at 206 is illustrated. In this regard, processing as described below may be implemented as part of the filter 36 executed by the processing device 32 of
Pixel adjustments are performed at 222 to transform the replicates by adjusting the intensity values of the plurality of replicates based upon a predetermined function. For instance, a first function is computed over at least the selected subset of pixel numbers. In an illustrative example, the spectral data is predicted to decay exponentially over time. As such, the function comprises computing the logarithm base-e of the intensity (hereinafter “log(intensity)”) at least for each pixel in the selected subset. Through this example, a curve with exponential decay is transformed into a linear curve to facilitate spike elimination. In numerous embodiments, for each replicate processed, the same subset of pixels is utilized, e.g., pixels 163 to 943 of 1024 total pixels. In this way, the first function, e.g., log(intensity) of each pixel in the subset of each replicate is calculated to implement an exponential to linear curve transformation over each replicate.
In an illustrative implementation, the optional compensation performed at 216 (
In exemplary implementations, the pixel adjustment at 222 also comprises mean centering the pixel values, such as by subtracting the mean log(intensity) from the (optionally compensated) log(intensity) values for each pixel of the subset.
At least one replicate is then identified as an outlier and is removed (if necessary) for each pixel location within the subset. In general, outliers are identified by fitting a first model at 224 to the transformed replicates that models each pixel location as a function of replicate and adjusted intensity value, computing for each pixel, a residual at 226 for each replicate and identifying whether an outlier exists at 228 for each pixel. Identifying whether an outlier exists is performed by identifying the replicate having the maximum residual value for each pixel location and declaring the pixel location for the replicate having the maximum residual value at each pixel location an outlier if the value of that maximum residual is greater than a predetermined threshold. If an outlier is identified, the intensity value of that outlier can be replaced with a predicted value of that pixel position from the first model.
By way of illustration, in some embodiments, replicates are removed by fitting a first model to the adjusted pixel data for each pixel by replicate at 224. For instance, a linear model is fit to predict log(intensity) values by replicate number, e.g., with a constant slope and intercept across all replicates per pixel in the corresponding subset. A residual is computed at 226 and outliers, if present, are then identified at 228 based upon the computed residual. For instance, referring back to
Comparatively, as illustrated in
As an example, for each replicate at each pixel within the corresponding subset, a Studentized residual is calculated. The Studentized residual is considered a residual divided by a standard error, e.g., residual divided by an estimate of its standard deviation. For each pixel, the replicate with the maximum residual is identified as an outlier at 228 and is removed or replaced with its predicted value (e.g., taken from the model at 224) if a probability, e.g., statistical p-value for that residual, is greater than a predetermined threshold. In an exemplary implementation, the threshold is a value such as 0.05.
A decision is made whether any additional residual processing is desired at 230. If additional processing is required, the flow loops back to re-fit a new model at 224 in view of previous adjustments to the replicate data, to compute residuals at 226 and to replace the value of the replicate having the maximum residual with its predicted value (e.g., taken from the new model at 224) if a probability for that residual is greater than a predetermined threshold, as set out above. For example, in many embodiments, it is desirable to repeat 224, 226 and 228 a plurality of times, e.g., three or more times to achieve the desired level of processing. Thus, for example, by looping three times, up to three replicates may be removed for each pixel.
By way of illustration, reference is made to
Further, assume that for pixel 5, the value of Replicate 3 represents the greatest residual among the replicates and that it exceeds the predetermined threshold. As such, the value of pixel 5, Replicate 6 is replaced with a predicted value. The process then loops back and repeats a second time. Although not explicitly illustrated in the Figures, with reference to
In the second loop, assume that for pixel 1, the value of replicate 1 represents the greatest residual among the replicates and that it exceeds the predetermined threshold. As such, the value of pixel 1, Replicate 1 is replaced with a predicted value. Again, although not explicitly illustrated in the Figures, with reference to
Referring to 18D, an exemplary detailed method of performing filtering at 208 is illustrated. In this regard, processing as described below may be implemented as part of the filter 36 executed by the processing device 32 of
Once identified replicate points (outliers) have been identified as described with reference to
A final model is fit to the adjusted pixel data by replicate at 234 and necessary pixels are replaced. Fitting a final model at 234 facilitates modeling each pixel location as a function of replicate and intensity value with identified outliers removed, thus representing a noise free signal. As such, the intensity value of each identified outlier can be replaced with the computed approximation of a corresponding intensity value of the noise free signal (approximated by the model), at that removed outlier pixel position.
In an illustrative implementation, a linear model is fit with all identified outlier replicate points removed and replaced with predicted values from the final model. The replaced values are transformed at 236 to their original scale, e.g., by performing the inverse of pixel adjustment processing at 222 during pixel processing. For instance, keeping with the above example of using a logarithm function, the transformed values are transformed back to the original scale by adding the mean log(intensity) that was subtracted out during the adjustment for that pixel and exponentiating the result.
Referring to
Referring to
According to various aspects of the present invention, the quality of spectral data collected from any measurement process is improved, where the spectral data is affected by sharp, random noise events that are localized to a single region of the array and occur in a space of time that is short relative to the total measurement time. In particular, any measurement employing a CCD array can be affected by spike noise, and the method 200 may be used to remove that noise.
According to various aspects of the present invention, the filter method 200 finds noise spikes in replicate data sets over time rather than along the pixel data dimension, and thus provides a good approximation of the noise-free signal. This approach avoids errors caused by distortions of the shape of the spectral data in the pixel dimension, which may occur if correction was implemented along the pixel dimension.
Signals from high-sensitivity charge-coupled device (CCD) array sensors can show a “negative peak” if certain conditions exist. Occasionally a pixel can go “bad,” i.e., malfunction, causing consistently low counts (or intensity levels) for that pixel and distorting the overall spectrum. These negative peaks can significantly distort the true spectral data. However, according to various aspects of the present invention, a statistical approach is provided to identify and remove negative peaks from a spectrum.
Referring to
Spectra are collected in a plurality of replicates at 302 in a manner analogous to that set out in greater detail herein. By way of example, the raw spectral data may be collected as eight replicates. In practice, other numbers of replicates may alternatively be selected. Moreover, the filter may be applied to the raw data that has been summed over the replicates at each pixel.
A rolling median is calculated for all pixels at 304. In this regard, a median width is specified. In an illustrative example, the median width is set to a value such as 11 pixels wide. However, other pixel widths can be utilized.
An estimate of the noise level of the spectrum is made at 306. As an illustrative example, a plurality of segments is taken, a model is fit to each segment and an estimated standard deviation is determined for the segments. In illustrative implementations, four segments are taken, such as at pixels 401-460, pixels 501-560, pixels 701-760 and pixels 926-985 of a CCD having 1024 pixels. A model such as a linear model is fit to each segment and the estimated standard deviation is set to the minimum standard deviation of the residuals of the four segments. In practice, any suitable number of segments or segment ranges may be utilized. Moreover, alternative models can be fit to each segment.
Pixels where the intensity deviates more than a predetermined amount are identified at 308. As an illustrative example, areas are identified where the intensity for a given pixel is more than one standard deviation from the rolling median. If an indentified area is greater than the rolling median, then that area may be indicated, e.g., with a “+”. Likewise, if the identified area is less than the rolling median, then the area may be indicated, e.g., with a “−.” The most recent pixel at which the values crossed the rolling median is recorded twice: once proceeding left to right and once proceeding right to left. This yields the first and last pixels for each negative peak. The width of each negative peak may thus be calculated, for example, as the number of pixels between (and including) the first and last pixels. Thus, width may be computed by subtracting the first pixel from the last pixel and adding one.
Local minima are indicated at 310. Local minima may be indicated, for example, when a pixel is less than the two pixels on either side thereof. Also, the “height” of the negative peak is determined at 312. The height of the negative peak at a pixel may be measured for example, by the absolute value of the difference between the rolling median and the minimum intensity of the negative peak.
As an illustration, the height of the negative peak at a predetermined width, e.g., a plurality of pixels, such as two pixels, is measured by the absolute value of the minimum difference between the intensity at the negative peak to the intensity of each of the adjacent pixels. If the intensities of both of the adjacent pixels are greater than the rolling median, then the height at a width of two pixels (in the current example) is equal to the height of the negative peak. Calculating the height of the negative peak at the width of a larger number of pixels, e.g., four pixels, is a bit more complicated. Such an approach starts by looking at the minimum difference between the intensity at the negative peak to the intensity of each of the pixels two away from the negative peak. Then the minimum distance between either of the adjacent pixels to the rolling median is considered. The height of the negative peak at the width of four pixels is the minimum of these four values.
Still further, large negative peaks are identified at 314. Large negative peaks may be identified, for example, when a local minimum has a height greater than a threshold, e.g., a multiple of the standard deviation. Under this arrangement, a height threshold can be specified. In an illustrated implementation, the height threshold is set to a value such as ten standard deviations.
Each large negative peak is assessed at 316. For instance, if the minimum height at the width of a predetermined number of pixels, e.g., two and four pixels, is less than one-half of the height of the negative peak, if the pixel is a local minimum within a predetermined number of pixels, e.g., ten pixels, and if the width of the negative peak is less than a specified threshold, then the peak is tagged for removal. Keeping with the above example, the width threshold may be specified as a value such as ten pixels.
Tagged peaks are removed at 318. In an illustrative implementation, each removed negative peak is replaced, e.g., with a linear interpolation between the pixel previous to the peak and the next pixel after the negative peak.
Various aspects of the present invention improve the quality of the spectral data collected from any measurement process that is affected by sharp, random noise events that are localized to a single region of the array. In particular, any measurement employing a CCD array can be affected by spike noise. CCD arrays are commonly used in spectroscopic measurements such as Raman, atomic, visible, and ultraviolet spectroscopy, and these types of measurements are employed in a wide variety of fields. However, various aspects of the present invention set out more fully herein, are provided to remove that noise.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention.
Various aspects of the present invention may be embodied as a computer program product embodied in one or more computer-readable storage medium(s) having computer-usable program code embodied thereon. Further, various aspects of the present invention may take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
Aspects of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams may be implemented by system components or computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions that implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other devices to produce a computer implemented process such that the instructions that execute on the computer, other programmable data processing apparatus, or other devices provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Computer systems programmed with instructions embodying the methods and/or systems disclosed herein, or computer systems programmed to perform various aspects of the present invention and storage or storing media that store computer readable instructions for converting a general purpose computer into a system based upon various aspects of the present invention disclosed herein, are also considered to be within the scope of the present invention. Once a computer is programmed to implement the various aspects of the present invention, including the methods of use as set out herein, such computer in effect, becomes a special purpose computer particular to the methods and/or program structures herein.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, one or more blocks in the flowchart or block diagrams may represent a component, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or in the reverse order, e.g., depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having thus described the invention of the present application in detail and by reference to embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the invention defined in the appended claims.
This application is a continuation of International Application No. PCT/US2011/027048, filed Mar. 3, 2011, entitled “FILTERS FOR SPECTRAL ANALYSIS DATA”, which claims the benefit of U.S. Provisional Patent Application Ser. No. 61/309,962, filed Mar. 3, 2010, entitled “FILTERS FOR SPECTRAL ANALYSIS DATA”, the disclosures of which are hereby incorporated by reference.
Number | Date | Country | |
---|---|---|---|
61309962 | Mar 2010 | US |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/US2011/027048 | Mar 2011 | US |
Child | 13596374 | US |