Embodiments described herein relate to physical sample characterization via spectral analysis, particularly for spectroscopic measurements such as those produced by infrared spectrometry.
Spectroscopic measurements, for example infrared absorption spectra, are often performed by co-adding a number of repeated measurements to reduce the impact of noise in the measurements. Many commercial infrared spectrometers allow the user to program a set number of measurements of a spectrum to automatically acquire and co-add (or equivalently co-average). For measurements with random noise, co-adding improves the signal to noise ratio (SNR) by a factor equal to the square root of the number of co-adds (i.e., measurements included in the co-adding). In practice, this leads to diminishing returns from co-adding or co-averaging. For example, a 10× improvement in SNR would require a 10^2 (100×) increase in the number of measurements that are co-added, while a 100× improvement in SNR would require a 10^4 (10,000×) increase in the number of measurements.
To improve the SNR by 100× for a measurement that requires 1 second per spectrum, the desired improvement accomplished by co-adding of the raw data alone would therefore require 10,000 seconds of measurement time (or nearly 3 hours) to accumulate a sufficient number of co-adds. For this reason, the number of co-adds used is often set by measurement time constraints, which can result in less than desired SNR.
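The arithmetic above can be checked with a short sketch. It assumes purely random, uncorrelated noise, so that SNR scales with the square root of the number of co-adds; the function names `coadds_for_snr_gain` and `measurement_time_s` are illustrative and do not come from the disclosure.

```python
import math


def coadds_for_snr_gain(snr_gain: float) -> int:
    """Co-adds needed for a given SNR improvement factor.

    Assumes SNR grows as sqrt(number of co-adds), i.e. random noise only.
    """
    return math.ceil(snr_gain ** 2)


def measurement_time_s(snr_gain: float, seconds_per_spectrum: float) -> float:
    """Total acquisition time when the gain comes from co-adding alone."""
    return coadds_for_snr_gain(snr_gain) * seconds_per_spectrum


if __name__ == "__main__":
    for gain in (10, 100):
        n = coadds_for_snr_gain(gain)
        t = measurement_time_s(gain, 1.0)  # 1 second per spectrum, as in the text
        print(f"{gain}x SNR gain: {n} co-adds, {t:.0f} s ({t / 3600:.2f} h)")
```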
Various noise reduction techniques are commonly applied to hyperspectral data sets, i.e. spectra measured at a plurality of locations. These techniques generally use assumptions about self-similarity of adjacent or related regions of a sample to reinforce common features between spectra while suppressing differences between spectra of similar regions, assuming that such differences are due to noise. Although this assumption may be useful in certain situations like image processing, it is an imperfect assumption for spectroscopic measurements because on the microscopic level these kinds of samples can be highly heterogeneous. Regions that are spectroscopically similar may in fact not be identical. The assumption of self-similarity can mask out minor differences between different regions of a sample, especially if the variations are spatially rare. Thus one cost to this kind of common feature/region reinforced co-averaging is that spatially small regions that exhibit different optical properties may be lost in processing of data that has insufficient SNR.
Accordingly, there is a need for an improvement in SNR of infrared spectroscopic data that results in minimal or no loss of information, and that does not acquire increased SNR at the cost of excessive measurement time requirements.
In various embodiments, a solution to one or both of these problems is provided by the use of intelligent co-adding techniques for analysis of spectroscopic measurements, rather than conventional co-adding of raw data or common feature/region reinforced data. In various embodiments, the intelligent co-adding noise reduction techniques overcome the limitations of conventional co-adding for spectroscopic measurements of spectra at the same location, rather than generating the entire hyperspectral analysis at the outset. In embodiments in which the spectroscopic measurements are repeated at the same location, the physical properties being measured are not just self-similar, but rather truly identical and invariant, so long as the sample is not altered by the measurement process or by changing environmental conditions over the duration of the data collection. In various embodiments, intelligent co-adding makes use of information about the invariance of the spectroscopic properties of a single sample location to dramatically improve the SNR and reduce the measurement time over conventional co-adding.
The disclosure herein refers to the various embodiments of this process and these techniques as “intelligent co-adding,” which can replace conventional co-adding with a significantly more efficient method of reducing noise in the accumulated spectroscopic measurements.
The above summary is not intended to describe each illustrated embodiment or every implementation of the subject matter hereof. The figures and the detailed description that follow more particularly exemplify various embodiments.
Subject matter hereof may be more completely understood in consideration of the following detailed description of various embodiments in connection with the accompanying figures, in which:
While various embodiments are amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit the claimed inventions to the particular embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the subject matter as defined by the claims.
As described herein, improvements in signal-to-noise ratio of spectroscopic measurements can be accomplished by intelligently co-adding spectral data. Intelligent co-adding refers to improving the Signal to Noise Ratio (SNR) with repeated spectroscopic measurements (i.e., co-adding or co-averaging), while maintaining components of the data that are more likely to correspond to actual properties of the sample and discarding components of the data that are more likely to correspond to noise. Intelligent co-adding makes use of the knowledge that measurements are performed at the same location on a sample and that the underlying physical properties of the sample being measured are substantially invariant during the time of a plurality of measurements. This information can be used to substantially separate the signal from the noise and then reconstruct a signal where the noise has been substantially suppressed. The resulting intelligently co-added spectra can thus achieve an order of magnitude or better SNR improvement over conventional co-adding in the same amount of time, and/or dramatically reduce the amount of measurement time required to achieve a given SNR.
Raw spectra can be deconstructed into coefficients of appropriate transform basis functions (wavelets, for example), which are then compared on the basis of their relative signal content, and wavelets with low signal content are removed or attenuated. In this way noise is selectively removed, and the resulting denoised signal provides a much more precise measurement of the infrared absorption characteristics of a sample, in a substantially shorter time than conventional co-adding.
In various embodiments, data sets can be decomposed into a sum of transform basis functions and corresponding coefficients. A well-known transform is the Fourier transform. Fourier transforms involve decomposing a function or curve as a sum of sine and cosine waves with varying frequencies, and assigning a weighting coefficient (e.g. amplitude) to each such wave, such that the sum of all of the sine and cosine functions approximates the original curve. (Fourier transforms can be equivalently constructed with sums of complex exponentials with corresponding amplitudes and phases.) For the purposes described herein, a variety of transform basis functions, including Fourier transforms, may be applied. But given the nature of IR absorption spectra, some transform basis functions will perform better than others. While Fourier transforms generally perform best at reconstructing periodic functions, IR spectra are generally composed of a series of peaks, so transform basis functions that are composed of peak-shaped features rather than oscillatory features will generally provide better decomposition of the spectra. One example of a suitable transform is wavelet deconstruction, which was first described by Holschneider et al. (M. Holschneider, R. Kronland-Martinet, J. Morlet and P. Tchamitchian). See also Wavelets, Time-Frequency Methods and Phase Space, pp. 289-297. Springer-Verlag, 1989. Wavelet basis functions comprise features that look like small waves that can vary in width and location on the X-axis. Thus they can be suitable for reconstructing the pattern of peaks in a typical IR absorption spectrum.
As used throughout this disclosure, and in order to more clearly describe these concepts, the following terms are used with these accompanying definitions relative to the acquisition, analysis, presentation and utilization of spectroscopic measurements, particularly spectroscopic measurements produced by infrared spectrometry:
Even this general rule, by which error in the measurements can be determined based upon the square root of N, applies accurately only when comparing SNR values at larger numbers of co-adds. It is not always accurate in predicting the SNR improvement from a single spectrum to a small number of co-adds, such as N=20 spectra, because of the stochastic nature of the noise in the single spectrum and hence the large variability in SNR of individual spectra.
The improvement in SNR between
In practice, to obtain spectra 200 a user of a system according to an embodiment will obtain n spectroscopic measurements Si(λ) attempting to measure a physical property of a sample, for example the absorption as a function of wavelength A(λ). Measurements performed at the same location should have the exact same physical properties A(λ), but each measurement Si(λ) will be contaminated by a variable amount of noise Ni(λ) as shown in Eq. 1.
$$S_i(\lambda) = A(\lambda) + N_i(\lambda) \qquad \text{(Eq. 1)}$$
The conventional approach of co-adding spectra simply adds up each Si(λ) and optionally divides by the number of spectra, as shown in Eq. 2.
Since A(λ) is assumed constant through the measurement, the goal of this averaging is to decrease the relative contribution of the random noise Ni(λ) via successive trials. As mentioned previously, this method is not very efficient, reducing the fractional impact of the noise term only by the square root of n, the number of measurements.
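The square-root behavior of conventional co-averaging can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the absorption A(λ) is a hypothetical single Gaussian band, the noise Ni(λ) is independent Gaussian noise, and the names `coaverage` and `residual_rms_after_coaveraging` are illustrative only.

```python
import math
import random


def coaverage(spectra):
    """Point-by-point mean of a list of equal-length spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def residual_rms_after_coaveraging(n_spectra, n_points=512, noise_sigma=1.0, seed=0):
    """RMS noise left after co-averaging n_spectra noisy copies of one band."""
    rng = random.Random(seed)
    # Hypothetical absorption A(lambda): one Gaussian band (an assumption).
    truth = [math.exp(-((k - n_points / 2) / 20.0) ** 2) for k in range(n_points)]
    # Each measurement is S_i = A + N_i with independent Gaussian noise.
    spectra = [[a + rng.gauss(0.0, noise_sigma) for a in truth]
               for _ in range(n_spectra)]
    averaged = coaverage(spectra)
    residual = [s - a for s, a in zip(averaged, truth)]
    return math.sqrt(sum(r * r for r in residual) / len(residual))


if __name__ == "__main__":
    single = residual_rms_after_coaveraging(1)
    for n in (4, 16, 100):
        gain = single / residual_rms_after_coaveraging(n)
        print(f"n={n}: noise reduced by about {gain:.1f}x (sqrt(n)={math.sqrt(n):.1f})")
```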
The current approach described herein decomposes the measurements Si(λ) into a series of m coefficients of a set of m transform basis functions w(m), as shown in Eq. 3.
$$S_i(\lambda) = \sum_{j=0}^{m-1} \alpha_m \, w(m) \qquad \text{(Eq. 3)}$$
A large variety of different basis functions w(m) may be successfully employed. Wavelets are especially suitable transform basis functions as they can be selected to have shapes similar to absorption peaks in IR spectroscopy. In one embodiment biorthogonal wavelets are employed.
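The decomposition of Eq. 3 can be sketched with the simplest wavelet family. The example below uses an orthonormal Haar transform purely as a stand-in; it is not the biorthogonal wavelet named above, and the function names `haar_forward` and `haar_inverse` are illustrative. It shows a measured spectrum being turned into a list of coefficients and recovered exactly from them.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Full orthonormal Haar decomposition of a power-of-two-length sequence.

    Returns [approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Finer details are computed first, so prepend the coarser ones.
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details


def haar_inverse(coeffs):
    """Rebuild the sequence from the coefficients produced by haar_forward."""
    approx, pos = list(coeffs[:1]), 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for a, d in zip(approx, detail)
                  for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return approx


if __name__ == "__main__":
    spectrum = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
    alphas = haar_forward(spectrum)
    rebuilt = haar_inverse(alphas)
    print(max(abs(a - b) for a, b in zip(spectrum, rebuilt)) < 1e-9)
```

Because this transform is orthonormal, the sum of squared coefficients equals the sum of squared samples, which makes it straightforward to reason about how much of a spectrum each coefficient carries.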
At 204, the coefficients of the series of transform basis functions are used to determine a signal figure of merit. The signal figure of merit is a metric for ranking the relative signal content in each transform basis function in the series. The goal of the current approach is to substantially improve upon the conventional co-averaging/co-adding technique by using a signal content figure of merit to substantially discriminate between the signal and the noise.
Using the signal content figure of merit at 206, a threshold to substantially separate transform basis functions associated with signal from those primarily associated with noise is determined.
The next step is to examine the distribution of the coefficients αm for each spectrum Si(λ) to assess the signal content of each basis function w(m). In particular, we construct a suitable signal content figure of merit that can be used to identify a subset of transform basis functions that have high signal content versus those that have low signal content and/or high noise, i.e. those that do not represent signal content with high confidence. One suitable signal content figure of merit (FOM) is given by |μm|/σm where |μm| is the absolute value of the mean of each αm and σm is the corresponding standard deviation. This ratio of |μm|/σm can then be calculated for the coefficient αm for each transform basis function w(m). A basis function that contributes strongly to the real signal A(λ) will have both a relatively high mean μm and a lower standard deviation σm, and thus a higher signal content figure of merit |μm|/σm. Conversely, basis functions that are portions of the noise Ni(λ) will often (but not always) have lower mean values μm, but will almost always have large standard deviations σm due to the fact they are random and hence not constant over successive measurements. Thus the figure of merit |μm|/σm provides a metric that can be used to efficiently separate out the basis functions ws(m) whose coefficients contribute primarily to the signal A(λ) and those basis functions wn(m) that contribute primarily to the noise Ni(λ).
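The figure of merit |μm|/σm can be computed directly from the coefficients of repeated measurements. The sketch below assumes the coefficients are held as one list per spectrum and uses the sample standard deviation (n-1 denominator), a choice the text does not specify; the name `figure_of_merit` is illustrative.

```python
import statistics


def figure_of_merit(coefficient_sets):
    """Return |mean| / std for each basis function.

    coefficient_sets: list of n lists (one per spectrum), each holding the
    m coefficients of that spectrum. At least two spectra are required.
    """
    foms = []
    for values in zip(*coefficient_sets):  # one tuple per basis function
        mu = statistics.fmean(values)
        sigma = statistics.stdev(values)  # sample std; an assumption
        # A coefficient that never varies is treated as pure signal.
        foms.append(abs(mu) / sigma if sigma > 0 else float("inf"))
    return foms


if __name__ == "__main__":
    # Column 0 is stable across spectra, column 1 fluctuates around zero.
    coeffs = [[5.0, 0.3], [5.2, -0.4], [4.8, 0.1]]
    print(figure_of_merit(coeffs))
```

In this toy input, the first basis function is stable across the three spectra and scores high, while the second fluctuates around zero and scores near zero.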
While wavelets have been referred to above, it should be understood that other types of basis function could be used in alternative embodiments. In fact, in order to determine the best approximation, in some embodiments different types of basis function can be used and compared to determine which function leads to the highest SNR.
At 208, a noise-reduced version of the spectroscopic signal is created, in which the coefficients of transform basis function primarily associated with noise are substantially attenuated while those associated primarily with signal are maintained. This noise-reduced version is an intelligently co-added spectrum, such as the one shown in
According to some embodiments, the threshold 302 is determined automatically. A computer-implemented embodiment may, for example, set the position of threshold 302 at a desired signal content level (i.e., a fixed position on the y axis). In another embodiment, a computer-implemented embodiment may determine a number of data points that are required to generate a comprehensive intelligently co-added spectrum, and write off any additional data points (i.e., retain a desired minimum number of data points to the left of threshold 302 in the embodiment shown in
In still further embodiments, the appropriate position for threshold 302 can be determined by a user directly, rather than relying upon a computer implementation. For example, a user can observe the effect on the reconstructed spectrum of moving the threshold 302 relatively higher or lower. An experienced operator will be able to diagnose the effects of attenuating too much data (choppy lines, missed narrow “peaks” in absorption spectra) or attenuating too little data (fuzzy data lines) to achieve optimum intelligent co-adding.
In other embodiments, a user and a computer-implemented system can work together, such as by permitting a user or expert to manipulate the threshold 302 and applying machine learning for the computer to determine the criteria applied by that user or expert. Over time, a machine-learning based system can detect the types of things (inflection points, line fuzziness, commonly-found peaks) that experts look for. Alternately or additionally, machine learning algorithms can learn the correlation between specific threshold settings and desired levels of denoising. After gaining sufficient experience, such machine learning algorithms can “suggest” or automatically apply an appropriate threshold 302.
The threshold selected results in a mathematical difference in how the data is calculated, as illustrated in the differences between
Once the separation is performed on the basis of the figure of merit corresponding to the threshold selected either by the user or by a computer, it is possible to attenuate the coefficients αmn of the transform basis functions wn(m) that contribute primarily to noise by multiplying these coefficients with an attenuation factor γm, as shown in Eq. 5. For example it is possible to calculate a new noise-reduced measurement Si′(λ) summing over a subset of basis functions that substantially contribute to the signal, while attenuating those that primarily contribute to the noise.
The two subsets represented by this sum are shown in
In Eq. 5, setting γm to zero is also equivalent to simply omitting the second term in the equation, i.e. omitting all the transform basis functions from the sum where the figure of merit is less than the threshold. In the context of this specification, the word “attenuate” when used to refer to attenuating or zeroing a subset of transform basis function coefficients also covers the case of simply omitting them from the reconstruction sum, omitting the second term in Eq. 5 to result in Eq. 6.
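The attenuation step can be sketched as a single pass over the coefficients. This is one reading of the description around Eq. 5 and Eq. 6, not a reproduction of those equations: coefficients whose figure of merit is at or above the threshold are kept, and the rest are multiplied by a single attenuation factor `gamma`, where `gamma=0.0` corresponds to omitting them. The name `attenuate` is illustrative.

```python
def attenuate(coefficients, foms, threshold, gamma=0.0):
    """Keep high-FOM coefficients; scale the others by gamma.

    coefficients: the m coefficients of one spectrum.
    foms: the m figures of merit, one per basis function.
    gamma: attenuation factor for low-FOM coefficients (0.0 removes them).
    """
    if len(coefficients) != len(foms):
        raise ValueError("coefficients and foms must have the same length")
    return [c if f >= threshold else gamma * c
            for c, f in zip(coefficients, foms)]


if __name__ == "__main__":
    alphas = [5.0, 0.3, -2.0]
    foms = [25.0, 0.05, 0.8]
    print(attenuate(alphas, foms, threshold=0.5))             # zeroing
    print(attenuate(alphas, foms, threshold=0.5, gamma=0.1))  # partial
```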
However, the application of an attenuation factor of zero removes the unreliable data measurements in their entirety, and in some cases there may be valuable data within the raw spectra. Alternately, therefore, it is possible to apply a series of variable attenuation factors γm that smoothly vary from high values for basis functions with high figures of merit to low values for basis functions with low figures of merit. That is, “more reliable” data having a higher likelihood of including valuable signal data can be given a higher attenuation factor (i.e., can be attenuated less) on a sliding scale.
As described above, the sliding scale can be determined by a user manually selecting attenuation factors (which would take quite some time as there can be tens or hundreds of thousands of wavelets in a typical scan) or by selecting attenuation factors as a function of figure of merit. In embodiments, the results of setting attenuation factors for wavelets or sets of wavelets can be depicted for a user in real-time. Machine learning could be applied in some embodiments such that a user of the system could have attenuation factors suggested that result in relatively higher SNR.
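Selecting attenuation factors as a function of figure of merit can be sketched as follows. The text does not specify a functional form, so the smooth curve used here (equal to 0.5 at the threshold, approaching 1 for high figures of merit and 0 for low ones) is purely an assumption, as are the names `smooth_gamma`, `apply_smooth_attenuation`, and the `sharpness` parameter.

```python
def smooth_gamma(fom, threshold, sharpness=2.0):
    """Attenuation factor in [0, 1] that rises smoothly with the FOM.

    Hypothetical form: gamma = r / (1 + r) with r = (fom / threshold) ** sharpness,
    evaluated in a way that avoids overflow for very large or small FOM.
    """
    if threshold <= 0 or fom < 0:
        raise ValueError("threshold must be positive and fom non-negative")
    if fom >= threshold:
        return 1.0 / (1.0 + (threshold / fom) ** sharpness)
    r = (fom / threshold) ** sharpness
    return r / (1.0 + r)


def apply_smooth_attenuation(coefficients, foms, threshold, sharpness=2.0):
    """Scale each coefficient by the factor derived from its own FOM."""
    return [smooth_gamma(f, threshold, sharpness) * c
            for c, f in zip(coefficients, foms)]


if __name__ == "__main__":
    for f in (0.05, 0.5, 5.0):
        print(f, round(smooth_gamma(f, threshold=0.5), 4))
```

A larger `sharpness` makes the curve approach the hard cutoff of a zero attenuation factor, while a smaller value retains more of the borderline coefficients.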
The selection of an appropriate signal content figure of merit threshold can be either under user control or can be completely automated depending on the user sophistication and the end application.
As the signal content figure of merit threshold is moved to the right to include more wavelets in the reconstruction, the spectra become more faithfully reproduced and the residuals become more featureless. In the best case the residual contains substantially only noise and no spectral features, as represented in residual 506C and corresponding reconstructed spectrum 502C. If the threshold is pushed too far to the right, all real spectral features are reproduced but at the cost of an inclusion of some of the measurement noise, as shown in spectrum 502D. These tradeoffs can be observed in real-time by providing the user a real-time display of plots similar to those in
Automated algorithms may also be used to set the threshold, as described above. One signal content figure of merit described above is f=|μm|/σm, where |μm| is the absolute value of the average coefficient of a basis function and σm is the standard deviation of that coefficient. In this case, the vertical axis of the figure of merit ranking plot represents in effect the signal to noise ratio of a specific basis function. So when this number is well below unity, the associated basis functions have little or no signal content. In practice, it is often possible to set a threshold ft to a point where the ranking intersects a fixed value on the vertical scale. Empirically we have found that threshold values of 0.1-1 usually result in well reconstructed spectra with efficient noise suppression.
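Setting the threshold at a fixed figure of merit value amounts to ranking the basis functions and cutting the ranking where it crosses that value. The sketch below assumes the figures of merit are already available as a list; the name `rank_and_select` is illustrative, and the default of 0.5 is an arbitrary pick from within the 0.1-1 range mentioned above.

```python
def rank_and_select(foms, f_t=0.5):
    """Rank basis functions by FOM and keep those at or above f_t.

    Returns (order, kept): `order` lists all indices from highest to lowest
    FOM (the ranking plot's horizontal axis), and `kept` is the leading part
    of that ranking whose FOM is at least f_t.
    """
    order = sorted(range(len(foms)), key=lambda j: foms[j], reverse=True)
    kept = [j for j in order if foms[j] >= f_t]
    return order, kept


if __name__ == "__main__":
    foms = [0.05, 12.0, 0.7, 0.2]
    order, kept = rank_and_select(foms, f_t=0.5)
    print("ranking:", order)
    print("kept at f_t=0.5:", kept)
```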
Adaptive techniques can also be applied where the threshold is adjusted while calculating a metric of the quality of the spectrum.
No matter which of these features is used to set the threshold 602, the resulting data collection schema can be sensitized for generating high-SNR spectra of solid state materials for two reasons. First, solid state materials generally have broad spectral lines in IR absorption spectra. Second, random noise usually has higher frequency content that has a much higher derivative than true spectral features. As such, the peak to peak amplitude of the derivative can provide a sensitive metric to the onset of noise content in the basis function ranking plot.
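The derivative-based metric can be sketched with a first difference. The example assumes uniformly spaced spectral points, so that the difference between neighboring points stands in for the derivative; the name `derivative_peak_to_peak` and the synthetic band used in the demonstration are illustrative.

```python
import math
import random


def derivative_peak_to_peak(spectrum):
    """Peak-to-peak amplitude of the first difference of a spectrum.

    Assumes uniformly spaced points, so the first difference is proportional
    to the derivative.
    """
    diffs = [b - a for a, b in zip(spectrum, spectrum[1:])]
    return max(diffs) - min(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical broad band, as expected for a solid state material.
    smooth = [math.exp(-((k - 128) / 20.0) ** 2) for k in range(256)]
    noisy = [s + rng.gauss(0.0, 0.1) for s in smooth]
    print("smooth:", round(derivative_peak_to_peak(smooth), 3))
    print("noisy: ", round(derivative_peak_to_peak(noisy), 3))
```

Because a broad band changes slowly from point to point while random noise changes abruptly, the metric rises sharply once noise-dominated basis functions are admitted into a reconstruction.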
Many other suitable metrics and optimization approaches can be used as well. For example, it is possible to adjust the threshold until the RMS amplitude of the residual achieves a target value. Another approach is to observe the frequency content in the residual, for example using a Fast Fourier Transform (FFT). The method described herein, separating spectral data into wavelets and associating a figure of merit that can be compared to a variety of phenomena, can be applied generally, and different phenomena can be selected depending upon various factors including the type of sample to be tested, in embodiments.
These remaining non-noise indicators can be indicators of types of suitable transformations that may be applied to the spectrum as part of the noise rejection step. As described previously and illustrated in Eq. 3, each spectrum is decomposed into a sum of appropriate transform basis functions with associated coefficients. One example of a commonly used transform is the FFT, where a signal is decomposed into a sum of sines and cosines of different frequencies and associated coefficients. In this case, the transform would be the FFT and the sines and cosines would be the transform basis functions. The FFT is often a desirable transform as the amplitudes and frequencies of the sines and cosines can be calculated extremely efficiently via a simple closed form computation. While the FFT can be excellent for periodic signals, it is less desirable for transforming signals that are non-oscillatory/non-periodic. For signals like IR absorption spectra, it may be desirable to use a basis function that has the shape of a peak or an impulse. For example, IR spectra are generally composed of a sum of molecular oscillations at specific frequencies, so transform functions that comprise a peak-like shape can better reproduce the spectral content of IR spectra. It is possible in principle to decompose a spectrum into a series of peaked functions, for example Gaussian and/or Lorentzian functions, though this can be computationally intensive as it could involve extensive curve-fitting. But alternatives are available that are very computationally efficient, for example wavelet transformations. A specific wavelet transformation that can provide good results is the undecimated wavelet transform, also known as the stationary wavelet transform or other names. It should be recognized that a variety of different types of wavelets or other transform basis functions could be used, and some may be more suited to use with particular types of spectra.
In various embodiments, the techniques may substantially eliminate the non-common noise, while retaining the true signal of the IR absorption (i.e., resulting in a denoised IR spectrum of the sample location). Therefore at 802, the n spectra from the sample location are transformed using an undecimated wavelet transform. As described above, other wavelet transforms and more generally other transforms may also be successfully used in other embodiments. A variety of different wavelet basis functions can be employed successfully, although wavelets with smoother and more symmetric “mother wavelets” tend to produce better results in accurately reproducing IR spectra. Biorthogonal wavelets, including the FBI 4,4 wavelet, as well as Coiflet and symmlet wavelets, for example, can all produce acceptable results. Other wavelets can also be used successfully.
In the case of using an undecimated wavelet transform, applying the transform at 802 results in a 2D array comprised of n sets of wavelet transform coefficients at different levels of detail. At 804 the 2D array is optionally reshaped into a 1D array of wavelet coefficients that can be sorted easily in a later step. In step 806, the average μ and standard deviation σ are calculated for each wavelet coefficient across each of the n spectra. This step is used to assess the variability of the wavelet coefficients, for use in determining which wavelets to use to reconstruct denoised spectra in following steps, which can be conducted either sequentially or in other order as appropriate. Not all steps must be performed in alternative embodiments.
At 808, a figure of merit (FOM) is calculated for each wavelet coefficient. As described above, a suitable figure of merit can be |μ|/σ, but other related figures of merit can be applied in various embodiments. At 810, the wavelet coefficients are sorted by the figure of merit, i.e. to rank the wavelets in order of high versus low signal content (or equivalently low versus high noise content). At 812, a FOM cutoff threshold is selected to determine which wavelet coefficients to maintain and which to attenuate or discard. This can be performed by a human operator and/or by an automated algorithm. At 814, low signal/high noise wavelet coefficients that are beyond the FOM cutoff are attenuated or set to zero to create a denoised set of wavelet coefficients. At 816, if 804 has been previously applied, the wavelet coefficients are reshaped into a 2D array with dimensionality matching that of the undecimated wavelet transform of 802. (If a wavelet transform has been applied that only produces a 1D output at 802, then 804 and 816 can be omitted.) At 818, an inverse undecimated wavelet transform is applied to the n sets of denoised wavelet coefficients, resulting in n denoised spectra. At 820, the n denoised spectra are added or averaged together to result in a single intelligently co-added spectrum for the location on the sample. This process can optionally be repeated at a plurality of points on the sample.
The process described above can be modified to use different wavelets, different wavelet transforms, e.g. the discrete wavelet transform, or even altogether different transformations with different basis functions, and different specific implementation algorithms can be applied. In any case, the common approach will be (1) transformation of a plurality of spectra at a single location into a set of transform coefficients; (2) evaluating the variability of the transform coefficients; (3) ranking the transform coefficients by a figure of merit based on the variability; (4) attenuating or zeroing out a subset of transform coefficients whose figure of merit indicates high variability; (5) performing an inverse transform to reconstruct de-noised spectra.
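The five-step approach above can be sketched end to end. This is a minimal illustration under stated assumptions, not the implementation of any embodiment: it substitutes a decimated orthonormal Haar transform for the undecimated wavelet transform, uses |μ|/σ with a fixed threshold of 0.5 and a zero attenuation factor, and tests on a synthetic Gaussian band with Gaussian noise. All function names are illustrative.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Orthonormal Haar transform of a power-of-two-length sequence."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    approx, pos = list(coeffs[:1]), 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for a, d in zip(approx, detail)
                  for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return approx


def intelligent_coadd(spectra, fom_threshold=0.5):
    """Sketch of steps (1)-(5) followed by averaging of the denoised spectra."""
    if len(spectra) < 2:
        raise ValueError("at least two spectra are needed to estimate variability")
    coeff_sets = [haar_forward(s) for s in spectra]            # step (1)
    keep = []
    for values in zip(*coeff_sets):                            # steps (2)-(3)
        sigma = statistics.stdev(values)
        fom = abs(statistics.fmean(values)) / sigma if sigma > 0 else float("inf")
        keep.append(fom >= fom_threshold)
    denoised = [haar_inverse([c if k else 0.0 for c, k in zip(cs, keep)])
                for cs in coeff_sets]                          # steps (4)-(5)
    return [sum(col) / len(denoised) for col in zip(*denoised)]


def rms_error(estimate, truth):
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


def demo(n_spectra=16, n_points=512, noise_sigma=0.2, seed=1):
    """Return (conventional RMS error, intelligent RMS error) on synthetic data."""
    rng = random.Random(seed)
    truth = [math.exp(-((k - n_points / 2) / 40.0) ** 2) for k in range(n_points)]
    spectra = [[a + rng.gauss(0.0, noise_sigma) for a in truth]
               for _ in range(n_spectra)]
    conventional = [sum(col) / n_spectra for col in zip(*spectra)]
    return rms_error(conventional, truth), rms_error(intelligent_coadd(spectra), truth)


if __name__ == "__main__":
    conv, intel = demo()
    print(f"conventional RMS error: {conv:.4f}, intelligent RMS error: {intel:.4f}")
```

The size of any improvement in this toy setting depends on the stand-in transform, the threshold, and the synthetic band, so it should not be read as indicative of the performance of the embodiments described herein.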
As mentioned previously, this optimization process can also be completed automatically without user adjustment, in embodiments. For example, the signal content threshold indicated by cursor 908 in the embodiment shown in
The process described herein can be implemented as an improvement over conventional co-adding on infrared spectrometers. Examples of infrared spectrometers are illustrated in
In each of these cases 10A-10C, intelligent co-averaging can be applied by implementing the process described herein on the processors shown (1013, 1022, and 1038) or alternately on another processor.
Various embodiments of systems, devices, and methods have been described herein. These embodiments are given only by way of example and are not intended to limit the scope of the claimed inventions. It should be appreciated, moreover, that the various features of the embodiments that have been described may be combined in various ways to produce numerous additional embodiments. Moreover, while various materials, dimensions, shapes, configurations and locations, etc. have been described for use with disclosed embodiments, others besides those disclosed may be utilized without exceeding the scope of the claimed inventions.
Persons of ordinary skill in the relevant arts will recognize that the subject matter hereof may comprise fewer features than illustrated in any individual embodiment described above. The embodiments described herein are not meant to be an exhaustive presentation of the ways in which the various features of the subject matter hereof may be combined. Accordingly, the embodiments are not mutually exclusive combinations of features; rather, the various embodiments can comprise a combination of different individual features selected from different individual embodiments, as understood by persons of ordinary skill in the art. Moreover, elements described with respect to one embodiment can be implemented in other embodiments even when not described in such embodiments unless otherwise noted.
Although a dependent claim may refer in the claims to a specific combination with one or more other claims, other embodiments can also include a combination of the dependent claim with the subject matter of each other dependent claim or a combination of one or more features with other dependent or independent claims. Such combinations are proposed herein unless it is stated that a specific combination is not intended.
Any incorporation by reference of documents above is limited such that no subject matter is incorporated that is contrary to the explicit disclosure herein. Any incorporation by reference of documents above is further limited such that no claims included in the documents are incorporated by reference herein. Any incorporation by reference of documents above is yet further limited such that any definitions provided in the documents are not incorporated by reference herein unless expressly included herein.
For purposes of interpreting the claims, it is expressly intended that the provisions of 35 U.S.C. § 112(f) are not to be invoked unless the specific terms “means for” or “step for” are recited in a claim.