The present invention relates to a method of and a system for filtering at least a part of gas chromatography-mass spectrometry data. More specifically, the present invention relates to a method of and a system for filtering at least a part of gas chromatography-mass spectrometry data, wherein gas chromatography data comprising data representing one or more gas chromatography elution peaks obtained for at least one sample is provided.
Gas chromatography-mass spectrometry (henceforth denoted GC-MS) is a well-known method of substance identification that combines the features of gas-liquid chromatography and mass spectrometry to identify different substances within a test sample. GC-MS is widely used and has many applications, e.g. substance identification and comparison between multiple samples.
A GC-MS system or method typically produces a complex 3D dataset. The processing steps of the GC-MS data may e.g. be divided into four steps as shown in
Analysis of exhaled breath is an area of growing interest, e.g. in the field of health and disease. Using breath as a biological sample is appealing since breath collection is relatively cheap, easy to perform, and non-invasive. GC-MS may be used to analyse exhaled breath. Other chemical analytical methods usable for the analysis of exhaled breath include e.g. Time Of Flight Mass Spectrometry (TOF-MS) and Ion-Mobility Spectrometry (IMS).
For instance, so-called Volatile Organic Compounds (VOCs) are excreted via the skin, urine, and feces, and most notably via exhaled breath. Besides having a pulmonary origin, VOCs also originate from the blood, reflecting physiological, pathological, or pathogen-related biochemical processes throughout the body. Exhaled breath analysis may therefore allow metabolic fingerprinting of disease processes anywhere inside the body. Exhaled breath analysis may also be used for purposes other than metabolic fingerprinting of disease processes.
However, analysing GC-MS data for a complex mixture such as breath is neither evident nor straightforward. Furthermore, when exhaled breath is analysed, every sample typically contains a few hundred peaks, creating a need for a fast alignment method. Additionally, known peak extraction methods are very sensitive; while this is desirable, it also means that they may derive many ‘false’ peaks that do not actually correspond to a component.
Currently commercially available software tools for GC-MS analysis are generally not designed for mixtures as complex as exhaled breath, and furthermore it is generally not transparent to the user how the data is processed. Moreover, at least some of these tools apply filtering to the extracted peaks to improve the alignment step, but this is most often done in a somewhat crude way, simply by applying a threshold, which leaves the physical meaning of the filtering unclear. In some tools, the filtering is not described or accounted for at all, and the user simply does not know what happened to the data, which reduces the quality of the data analysis.
Patent application US 2006/0125826 discloses systems and methods for correlating and displaying data produced by GC and MS. Filters for filtering displayed data, such as an Extracted Ion Filter, an Extracted Spectrum Filter, and a Search Engine Filter, are disclosed, where the Search Engine Filter is used to narrow down the list of matching spectra returned by the search engine.
It would be advantageous to provide a reliable aligned peak list, or another suitable data structure, that can be used for component identification and comparison between multiple samples. It would also be desirable to enable a reduction of the data to be processed. In general, the invention preferably seeks to mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages singly or in any combination. In particular, it may be seen as an object of the present invention to provide a method that solves, at least to an extent, one or more of the above-mentioned problems, and/or other problems, of the prior art.
To better address one or more of these concerns, in a first aspect of the invention a method of filtering at least a part of gas chromatography-mass spectrometry data is presented, the method comprising providing gas chromatography-mass spectrometry data for a gas mixture, comprising data representing one or more gas chromatography elution peaks obtained for at least one sample, and filtering the gas chromatography-mass spectrometry data to reduce the amount of data, wherein the filtering comprises taking into account predetermined data representing one or more elution peaks previously determined to be false positives and/or predetermined data representing one or more elution peaks previously determined to be true positives.
In this way, unreliable elution peaks are removed in an expedient manner, reducing the amount of data, e.g. the data used in a later alignment process, which speeds up processing and also improves data quality.
In one embodiment, the method comprises displaying the gas chromatography-mass spectrometry data on a display together with the predetermined data representing one or more elution peaks previously determined to be false positives and/or the predetermined data representing one or more elution peaks previously determined to be true positives.
In this way, a user may readily be presented with a visual representation of the GC-MS data together with the predetermined true and false positives, allowing an informed choice of which filtering method to apply.
In one embodiment, the method comprises displaying, on a display, a representation of a decision line or plane, the decision line or plane illustrating a linear or non-linear boundary of the gas chromatography-mass spectrometry data between what is kept and what is removed after filtering.
In this way, a user may readily see the effect a given filtering method will actually have on the GC-MS data, making the data processing or filtering transparent to the user and further supporting a sensible choice of which filtering method to apply.
In one embodiment, the filtering of the gas chromatography-mass spectrometry data comprises a filtering method selected from the group consisting of:
filtering the gas chromatography-mass spectrometry data by removing data under the condition that all true positives and data associated with the true positives are retained after filtering;
filtering the gas chromatography-mass spectrometry data by removing data under the condition that all false positives and data associated with the false positives are removed;
filtering the gas chromatography-mass spectrometry data using at least two threshold values selected by a user, each for a preselected parameter, where the filtering discards the gas chromatography-mass spectrometry data lying below each threshold value for each associated parameter;
filtering the gas chromatography-mass spectrometry data based on statistical or mathematical analysis;
filtering the gas chromatography-mass spectrometry data based on linear discriminant analysis for two or more classes, where the predetermined data representing one or more elution peaks previously determined to be false positives belongs to one predetermined class and/or the predetermined data representing one or more elution peaks previously determined to be true positives belongs to a different predetermined class; and
filtering the gas chromatography-mass spectrometry data based on non-linear statistical analysis for two or more classes, where the predetermined data representing one or more elution peaks previously determined to be false positives belongs to one predetermined class and/or the predetermined data representing one or more elution peaks previously determined to be true positives belongs to a different predetermined class.
In this way, one or more efficient filtering methods are provided to suit different needs.
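The threshold-based option in the list above (user-selected thresholds on at least two parameters) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the one-column-per-parameter array layout, and the example values are hypothetical, and "discards the data lying below each threshold value for each associated parameter" is read here as "a peak is discarded if it falls below the threshold for any of the selected parameters".

```python
import numpy as np


def threshold_filter(features, names, thresholds):
    """Return a boolean keep-mask over extracted elution peaks (hypothetical helper).

    features   -- array of shape (n_peaks, n_parameters), one row per peak
    names      -- parameter name for each column of `features`
    thresholds -- user-selected minimum per parameter, e.g. {"abundance": 1000.0}
    """
    if len(thresholds) < 2:
        raise ValueError("expected at least two user-selected thresholds")
    features = np.asarray(features, dtype=float)
    keep = np.ones(features.shape[0], dtype=bool)
    for name, minimum in thresholds.items():
        # Assumed reading: falling below any one threshold discards the peak.
        keep &= features[:, names.index(name)] >= minimum
    return keep


# Hypothetical peaks described by abundance, purity and signal-to-noise ratio.
names = ["abundance", "purity", "snr"]
peaks = np.array([[1200.0, 0.90, 25.0],
                  [300.0, 0.95, 40.0],
                  [5000.0, 0.40, 12.0]])
keep = threshold_filter(peaks, names, {"abundance": 1000.0, "purity": 0.5})
print(keep.tolist())  # [True, False, False]
```

With two thresholds the kept region is a rectangle in the two displayed parameters, so its decision boundary is made of axis-parallel line segments.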
In one embodiment, the method comprises registering a selection made by a user of a filtering method and presenting to the user the decision line or plane associated with the selected filtering method.
In this way, a user may readily see the effect a given filtering method will actually have on the GC-MS data, making the data processing or filtering transparent to the user and further supporting a sensible choice of which filtering method to apply.
In one embodiment, the gas mixture comprises exhaled breath.
According to another aspect, the invention also relates to a system for filtering at least a part of gas chromatography-mass spectrometry data, the system comprising: a processing unit adapted to filter gas chromatography-mass spectrometry data for a gas mixture to reduce the amount of data, the gas chromatography-mass spectrometry data comprising data representing one or more gas chromatography elution peaks obtained for at least one sample, wherein the filtering comprises taking into account predetermined data representing one or more elution peaks previously determined to be false positives and/or predetermined data representing one or more elution peaks previously determined to be true positives.
The system and embodiments thereof correspond to the method and embodiments thereof and have the same advantages for the same reasons.
In general, the various aspects of the invention may be combined and coupled in any way possible within the scope of the invention. These and other aspects, features and/or advantages of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which
FIGS. 3a-3d schematically illustrate an exemplary user interface of one embodiment of the method of filtering at least a part of GC-MS data with different filters selected and their corresponding decision lines illustrated;
An embodiment of the invention is illustrated in
The method starts or initiates at step 201 and proceeds to step 202 where obtained GC-MS data in the form of extracted elution peaks is displayed (e.g. together with additional information) to a user on a suitable display in either 2D or 3D, e.g. as illustrated by 301 in
At step 203, predetermined data representing one or more elution peaks previously determined to be false positives and/or predetermined data representing one or more elution peaks previously determined to be true positives are provided and displayed together with the GC-MS data.
The predetermined data representing one or more elution peaks previously determined to be false positives and/or predetermined data representing one or more elution peaks previously determined to be true positives may e.g. be stored in a data library or in another suitable way. The data for false positives and/or true positives may e.g. have been determined in an earlier analysis, e.g. of simpler gas mixtures, and then stored for later use.
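One possible way to hold such a library in memory and persist it for later use is sketched below. The class name, the JSON layout, and the choice of parameters are assumptions made for illustration; the text above only states that the true- and false-positive data may be stored in a data library or in another suitable way.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PeakLibrary:
    """Hypothetical library of previously classified elution peaks.

    Every entry is a feature vector over `parameters` (e.g. abundance, purity,
    SNR, width), labelled as a known true positive or a known false positive.
    """
    parameters: list
    true_positives: list = field(default_factory=list)
    false_positives: list = field(default_factory=list)

    def add(self, features, is_true_positive):
        if len(features) != len(self.parameters):
            raise ValueError("feature vector does not match the library parameters")
        target = self.true_positives if is_true_positive else self.false_positives
        target.append([float(v) for v in features])

    def to_json(self):
        return json.dumps({"parameters": self.parameters,
                           "true_positives": self.true_positives,
                           "false_positives": self.false_positives})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["parameters"], data["true_positives"], data["false_positives"])


# Peaks classified in an earlier analysis, e.g. of a simpler gas mixture.
library = PeakLibrary(["abundance", "purity"])
library.add([1500, 0.92], is_true_positive=True)
library.add([80, 0.30], is_true_positive=False)
restored = PeakLibrary.from_json(library.to_json())
```

Keeping the parameter names next to the vectors lets a later analysis check that new GC-MS data is described by the same parameters before the library is used for filtering.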
As one example,
It is to be understood that steps 202 and 203 may be carried out as a single step.
At step 204, a method of filtering is selected by the user, e.g. among a plurality of available filtering methods. After the user has selected a filtering method, a linear or non-linear decision line or plane (depending on whether the data is displayed in 2D or 3D) is displayed together with the GC-MS data and the false positives and true positives. The decision line or plane illustrates a boundary of the GC-MS data between what is kept and what is removed after filtering according to the selected filtering method.
This provides the user, in an expedient manner, with valuable feedback on which data is removed and which data is kept after applying the selected filter.
The user may select between different filters and be presented with the associated decision line or plane (i.e. the method loops back to step 204), and can thereby better see the precise effect that a particular filter has on the GC-MS data and make a more sensible choice of filter. It is to be understood that in an alternative embodiment only one predetermined filter may be used, whereby step 204 is not necessary.
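The select-and-preview loop of steps 204 and 205 can be sketched as a set of candidate filters that are each evaluated, but not applied, until the user commits to one. This is an illustration under stated assumptions, not the described user interface: each hypothetical filter builder derives a decision rule from the known true and false positives, the two builders are deliberately simplified one-parameter stand-ins for the ‘strict’ and ‘tolerant’ filters, and the reported statistics are one possible form of the feedback.

```python
import numpy as np


def strict_1d(true_positives, false_positives, col=0):
    # Simplified stand-in: keep peaks at or above the lowest known true positive.
    cut = true_positives[:, col].min()
    return lambda points: points[:, col] >= cut


def tolerant_1d(true_positives, false_positives, col=0):
    # Simplified stand-in: drop only peaks at or below the highest known false positive.
    cut = false_positives[:, col].max()
    return lambda points: points[:, col] > cut


def preview(builders, peaks, true_positives, false_positives):
    """Evaluate each candidate filter (steps 204-205) without applying any of them.

    All builders share one signature so they are interchangeable in the loop.
    """
    report = {}
    for name, build in builders.items():
        decide = build(true_positives, false_positives)
        report[name] = {
            "removed_pct": float(100.0 * (1.0 - decide(peaks).mean())),
            "known_tp_kept": float(decide(true_positives).mean()),
            "known_fp_removed": float(1.0 - decide(false_positives).mean()),
        }
    return report


builders = {"strict": strict_1d, "tolerant": tolerant_1d}
peaks = np.array([[0.2], [0.5], [0.7], [0.9]])
true_positives = np.array([[0.8], [0.95]])
false_positives = np.array([[0.1], [0.45]])

report = preview(builders, peaks, true_positives, false_positives)
chosen = "tolerant"  # in an interface, this would be the user's selection
kept = peaks[builders[chosen](true_positives, false_positives)(peaks)]
```

On this toy data the ‘strict’ stand-in removes 75% of the peaks and the ‘tolerant’ one 25%, while both keep every known true positive and remove every known false positive, which mirrors the trade-off between the two filters described in the text.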
The available filters may comprise any suitable filters that remove an appropriate part of the GC-MS data while keeping another appropriate part. Examples include a filter based on user input, a filter based on statistical or mathematical analysis (e.g. a filter based on linear discriminant analysis (LDA), non-linear statistical methods, etc.), a ‘strict’ filter preserving only the GC-MS data within an area defined by or associated with the true positives (e.g. defined by a derived linear regression line for the true positives), a ‘tolerant’ filter excluding only the GC-MS data within an area defined by or associated with the false positives (e.g. defined by a derived linear regression line for the false positives), combinations thereof, and/or any other suitable type of filter.
The ‘strict’ filter only needs the true positives, while the ‘tolerant’ filter only needs the false positives. The ‘strict’ filter is more ‘aggressive’ and removes more data than the ‘tolerant’ filter but may remove some (currently unknown) true positives, while the ‘tolerant’ filter is less ‘aggressive’ but may leave some (currently unknown) false positives. A filter using statistical or mathematical analysis may use both true and false positives or only one of them. A filter may operate on one or more parameters of the GC-MS data, e.g. one or more of abundance, purity, Signal to Noise Ratio (SNR), width, amount, models, etc. Some of these filters are explained in greater detail in connection with the following figures.
The user chooses a filter, e.g. after having seen the decision line or plane of one or more filters, and the filtering method is carried out on the GC-MS data, removing a part of the GC-MS data and thereby making a later alignment process (e.g. step 104 in
In an alternative embodiment, the user is not involved in selecting which filter to use; rather, a predetermined filter is used (whereby step 204 is not needed), e.g. a ‘strict’ or ‘tolerant’ filter or, more preferably, a filter based on statistical or mathematical analysis, e.g. a filter based on LDA, for a more automated process. The user may still be presented with the decision line or plane to know what will happen to the data after filtering, but this (step 205) may also be omitted. As another alternative, no information is displayed to the user (whereby steps 202, 203, 204, and potentially step 205 are not needed), providing an even more automated process, although without user knowledge and involvement.
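For the more automated variant, a two-class linear discriminant can be fitted to the predetermined true and false positives and then used as the filter. The sketch below uses Fisher's linear discriminant with a pooled within-class scatter matrix; placing the threshold halfway between the projected class means (i.e. assuming equal class priors), the function names, and the toy data are all assumptions for illustration. A non-linear classifier could fill the same role for the non-linear variant.

```python
import numpy as np


def fit_lda_filter(true_positives, false_positives):
    """Fisher's two-class linear discriminant: returns (w, c); a peak x is kept if w @ x > c."""
    tp = np.asarray(true_positives, dtype=float)
    fp = np.asarray(false_positives, dtype=float)
    mu_tp, mu_fp = tp.mean(axis=0), fp.mean(axis=0)
    # Pooled within-class scatter matrix (must be non-singular).
    s_w = (tp - mu_tp).T @ (tp - mu_tp) + (fp - mu_fp).T @ (fp - mu_fp)
    w = np.linalg.solve(s_w, mu_tp - mu_fp)
    # Assumed threshold: midpoint of the projected class means (equal priors).
    c = 0.5 * (w @ mu_tp + w @ mu_fp)
    return w, c


def apply_lda_filter(peaks, w, c):
    # True means "keep"; the decision line is the set of points where w @ x == c.
    return np.asarray(peaks, dtype=float) @ w > c


true_positives = [[8.0, 9.0], [9.0, 8.0], [10.0, 10.0], [9.0, 9.5]]
false_positives = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.5]]
w, c = fit_lda_filter(true_positives, false_positives)
keep = apply_lda_filter([[9.0, 9.0], [1.0, 1.0], [6.0, 6.0]], w, c)
```

In two displayed parameters the line w @ x == c is exactly the kind of linear decision line that could be drawn for the user at step 205.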
Further details, variations, and aspects are explained in connection with the other figures.
FIGS. 3a-3d schematically illustrate an exemplary user interface of one embodiment of the method of filtering at least a part of GC-MS data with different filters selected and their corresponding decision lines illustrated. The same obtained or raw GC-MS data is shown and used in these figures.
FIG. 3a schematically illustrates an exemplary user interface comprising an area displaying obtained GC-MS data (301) for multiple samples in the form of extracted elution peaks to a user on a suitable display in 2D according to two selected parameters of the multi-dimensional GC-MS data, in this case ‘models’ and ‘purity’. The multi-dimensional GC-MS data may also be shown in 3D (according to three selected parameters as e.g. seen in
The user may choose, in a selection area (302), the parameters according to which the obtained GC-MS data (301) is displayed.
The obtained GC-MS data (301) is shown together with predetermined data representing one or more elution peaks previously determined to be true positives (304) and predetermined data representing one or more elution peaks previously determined to be false positives (305). Note that in some embodiments only one of these types may be displayed.
Further shown is an area (303) for selecting which filtering method to consider. In this particular figure, the user has selected the ‘strict’ filter, and the corresponding decision line (306) is displayed on the GC-MS data (301) so the user can readily see the effect the filter will have on the GC-MS data (301) once applied. The GC-MS data (301) below the decision line (306) is removed during filtering according to the selected filtering method. The decision line (306) may be determined as being perpendicular to a derived linear regression line for, in this case, the true positives (304). Alternatively, other, e.g. non-linear, decision lines may be used.
Also shown is the amount of data removed by applying the filter, in this case 76.9441%, which reduces the amount of data significantly and thereby speeds up any later alignment process. It should be noted that, since computing the percentage reduction involves some calculation, in this particular example it is not updated automatically when a filter is selected but requires a further action, in this case pressing the button designated ‘Apply Classifier’.
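The ‘strict’ and ‘tolerant’ decision lines and the reported percentage of removed data can be sketched in 2D as below. The text states only that the decision line may be perpendicular to a linear regression line derived for the true positives (or, for the ‘tolerant’ filter, the false positives); where the line sits along that regression line is not specified. This sketch therefore assumes that the ‘strict’ line passes through the lowest-projecting known true positive and the ‘tolerant’ line through the highest-projecting known false positive, and that true positives lie towards larger values of the first displayed parameter, since the regression direction is oriented along increasing x. All names and data are hypothetical.

```python
import numpy as np


def regression_direction(reference):
    """Unit vector along the least-squares line y = a + b*x fitted to 2-D reference peaks."""
    reference = np.asarray(reference, dtype=float)
    slope, _intercept = np.polyfit(reference[:, 0], reference[:, 1], 1)
    return np.array([1.0, slope]) / np.hypot(1.0, slope)


def strict_filter(peaks, true_positives):
    # Decision line perpendicular to the true-positive regression line, placed so
    # that every known true positive is kept (assumed placement).
    d = regression_direction(true_positives)
    cut = (np.asarray(true_positives, dtype=float) @ d).min()
    return np.asarray(peaks, dtype=float) @ d >= cut


def tolerant_filter(peaks, false_positives):
    # Decision line perpendicular to the false-positive regression line; only peaks
    # at or below the highest known false positive are removed (assumed placement).
    d = regression_direction(false_positives)
    cut = (np.asarray(false_positives, dtype=float) @ d).max()
    return np.asarray(peaks, dtype=float) @ d > cut


def removed_percentage(keep):
    """Share of peaks removed by a filter, as reported next to the decision line."""
    return float(100.0 * (1.0 - np.mean(keep)))


# Toy data in two displayed parameters.
peaks = np.array([[1.0, 1.0], [6.0, 5.0], [5.0, 4.0], [8.0, 8.0]])
true_positives = np.array([[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]])
false_positives = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])

keep_strict = strict_filter(peaks, true_positives)
keep_tolerant = tolerant_filter(peaks, false_positives)
print(removed_percentage(keep_strict), removed_percentage(keep_tolerant))  # 50.0 25.0
```

As in the text, the ‘strict’ variant removes more data than the ‘tolerant’ one on the same peaks.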
FIG. 3b schematically illustrates an exemplary user interface comprising an area displaying obtained GC-MS data (301). The user interface and the data correspond to the ones in
FIG. 3c schematically illustrates an exemplary user interface comprising an area displaying obtained GC-MS data (301). The user interface and the data correspond to the ones in
FIG. 3d schematically illustrates an exemplary user interface comprising an area displaying obtained GC-MS data (301). The user interface and the data correspond to the ones in
In this way, a user may select between different types of filtering and parameters and see the effect the selected filter will have on the data. Furthermore, during a research phase, a user may e.g. initially use the more ‘tolerant’ filter until a greater understanding of the data has been achieved, and later use e.g. the ‘strict’ filter or ‘manual settings’.
In
It is to be understood that the shown user interface is merely one example of a user interface and that many other user interface designs could be used with the present invention.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2013/052242 | 3/21/2013 | WO | 00 |
Number | Date | Country |
---|---|---|
61617118 | Mar 2012 | US |