Mass spectrometry may be used to diagnose diseases or in other non-medical applications. A sample of a target to be diagnosed (or otherwise identified) may be tested by a mass spectrometer that produces a mass spectrometry profile. The mass spectrometry profile may include one or more peaks at different mass-to-charge (m/z) values or other measurement units. These peaks are representative of the physical attributes of the sample of the target. Although these peaks do not contain any diagnostic information by themselves, they can be compared to a reference database of previously tested targets that do have known characteristic patterns. However, to generate meaningful diagnostic information, the reference database must include probability distribution functions of attributes in sufficiently narrow ranges. Reference databases with overly wide distributions may prevent accurate diagnosis of diseases or accurate non-medical determinations.
Embodiments relate to an apparatus and/or method that includes deconvolving a pre-deconvoluted distribution of spectrometry reference profile peaks into at least two post-deconvoluted distributions of spectrometry reference profile peaks. Through this deconvolution, sufficiently narrow probability distribution functions may be attained, which may contribute to diagnostic accuracy. In embodiments, the pre-deconvoluted distribution of spectrometry reference profile peaks originated from spectrometry reference profiles associated with a first category. In embodiments, the at least two post-deconvoluted distributions of spectrometry reference profile peaks each originated from spectrometry reference profiles associated with a different one of at least two sub-categories of the first category. For example, the first category may be a particular cancer and the at least two sub-categories may be age groups. Each sub-category may be further deconvolved into additional sub-categories to form a cluster of probability distribution functions that is meaningful in diagnostic applications.
As the invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the present invention to specific embodiments; the invention should be understood to include all modifications, equivalents, and substitutes within the spirit and scope of the present invention. In describing the drawings, similar reference numerals may be used for similar elements in a non-limiting fashion.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to limit the present invention or the claims. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, the terms “comprise” and “have” are intended to indicate the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and do not exclude the possibility of the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
Medical diagnosis is becoming increasingly important. Early detection of diseases greatly increases the chances of successful treatment. Recently, mass spectrometry has become increasingly common in disease diagnosis. For example, matrix assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has been used as a fast, accurate, and cost-effective way of diagnosing diseases, including identifying microorganisms. In microorganism identification or disease diagnosis using mass spectrometer data, each microorganism (for instance, a bacterium) is represented by a mass spectrum produced by a mass spectrometer, for example a MALDI-TOF device. The mass spectrum of a sample under test is compared with the reference mass spectra stored in a database to determine the specific microorganism. Compared with other diagnostic tools, this library-based diagnostic method is more economical, more time-saving, more convenient to handle, and more accurate.
Embodiments relate to optimal clustering to facilitate higher accuracy. Embodiments are based on a deconvolution concept to obtain more highly clustered sets or categories of samples. Deconvolution may be defined as a process that separates a dataset into two or more independent datasets, or clusters. In embodiments, successive deconvolution may be a relatively efficient way to cluster data and/or to facilitate higher diagnostic accuracy. For example, in embodiments, sets of m/z values with newly defined subcategories are used as a basis for disease diagnosis by pattern matching analysis.
Cancer diagnosis using mass spectra has been challenging because diseases are affected by many factors, such as age and health condition. It may be difficult to identify markers that accurately indicate the progress of a particular disease (e.g., cancer). Mass spectra may be divided into many categories, such as cancer organ types, cancer stages, patients' conditions (e.g., cholesterol and blood sugar levels), and patients' disease histories. Successful diagnosis depends on how well the spectra cluster under these classifications or categories. Embodiments relate to finding optimal categories for cancer diagnosis.
MALDI-TOF MS offers rapid identification of biomolecules such as peptides, proteins and large organic molecules with very high accuracy and sensitivity. MALDI-TOF is becoming a standard for identification of micro-organisms in clinical biology.
The samples may then be provided to the MALDI-TOF MS unit 302 having an ion flight chamber 321 and/or a high voltage vacuum generator 322, in accordance with embodiments. A processing unit 323 in the MALDI-TOF MS may identify each mass-to-charge ratio and its corresponding intensity. For disease diagnosis, the acquired mass and intensity data may be reorganized into a standard mass list using the concept of a center of mass, i.e., the m/z position at which the surrounding intensities are balanced. The standard mass-to-charge list is thus defined based on the machine accuracy and the center-of-mass concept. The spectrum data stored for each laser irradiation may also be used to set up the standard mass list.
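One way to read the center-of-mass step is sketched below as a minimal Python example. It assumes that peaks pooled from many spectra (or laser shots) are grouped whenever they fall within a parts-per-million (ppm) window that stands in for the machine accuracy, and that each group is reduced to its intensity-weighted mean m/z. The names `standard_mass_list` and `center_of_mass` and the `tol_ppm` parameter are hypothetical; the document does not specify the grouping rule.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def center_of_mass(group: List[Peak]) -> float:
    """Intensity-weighted mean m/z of a group of peaks (the balance point)."""
    total = sum(intensity for _, intensity in group)
    if total == 0:
        # No intensity information: fall back to the plain mean position.
        return sum(mz for mz, _ in group) / len(group)
    return sum(mz * intensity for mz, intensity in group) / total


def standard_mass_list(peaks: List[Peak], tol_ppm: float) -> List[float]:
    """Pool peaks, group those within tol_ppm of the first peak of the group
    (hypothetical accuracy rule), and return one center of mass per group."""
    if tol_ppm <= 0:
        raise ValueError("tol_ppm must be positive")
    centers: List[float] = []
    group: List[Peak] = []
    for mz, intensity in sorted(peaks):
        # Close the current group once a peak falls outside the accuracy window.
        if group and mz - group[0][0] > group[0][0] * tol_ppm * 1e-6:
            centers.append(center_of_mass(group))
            group = []
        group.append((mz, intensity))
    if group:
        centers.append(center_of_mass(group))
    return centers
```

For example, `standard_mass_list([(1000.0, 1.0), (1000.2, 3.0), (2000.0, 2.0)], tol_ppm=500)` groups the first two peaks (0.2 apart, inside the 0.5 window at m/z 1000) and yields about `[1000.15, 2000.0]`. Anchoring the window at the first peak of each group is a simplifying choice; a sliding or iterative window would fit the description equally well.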
In embodiments, diagnostic unit 303 may then compare the spectra from a patient's sample with the pre-stored spectra and analyze the pattern differences between the two. The diagnostic unit 303 may then identify the presence and progress of the disease. In embodiments, as shown in example
The matrix 607 containing a sample may be irradiated by a laser 601. Both the matrix 607 and the sample molecules on it may be vaporized. As the matrix 607 absorbs energy from the laser 601 and becomes ionized, some of that energy is passed to the sample molecules, and a number of the sample molecules become ionized molecules 615a-c. Voltage may be applied to electrodes in a chamber containing the matrix 607, drawing the ionized molecules 615a-c into the mass spectrometer tube 603 and ultimately to detector 613.
An electrostatic field along the tube 603 of the spectrometer causes the ionized molecules 615a-c to fly down the length of the tube 603. The “time of flight” (TOF) is the time it takes the ions 615a-c to reach the detector 613 at the end of the tube 603, and it depends on the mass-to-charge ratio (m/z) of the ions 615a-c. The spectrometer converts the recorded time into an m/z ratio, where m is the mass of the ion in daltons and z is the ion's charge number.
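The document states only that the recorded time is converted to an m/z value. As an illustration, consider an idealized linear TOF instrument in which each ion gains kinetic energy z*e*U from an accelerating potential U and then drifts a distance L. This gives t = L * sqrt(m / (2*z*e*U)), and therefore m/z = 2*e*U*t^2 / L^2. The sketch below assumes that idealized model. Real instruments typically rely on empirical calibration against known standards, and the parameter values in the usage note are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs (exact SI value)
DALTON = 1.66053906660e-27   # unified atomic mass unit in kilograms (CODATA 2018)


def flight_time(mz: float, voltage: float, length: float) -> float:
    """Idealized linear TOF: seconds for an ion of m/z (Da per elementary charge)
    accelerated through `voltage` volts and drifting `length` metres."""
    # Energy balance z*e*U = m*v**2 / 2 with v = L / t  =>  t = L * sqrt(m / (2*z*e*U))
    return length * math.sqrt(mz * DALTON / (2.0 * E_CHARGE * voltage))


def mz_from_flight_time(t: float, voltage: float, length: float) -> float:
    """Inverse of the same relation, m/z = 2*e*U*t**2 / L**2, expressed in daltons."""
    return 2.0 * E_CHARGE * voltage * t ** 2 / length ** 2 / DALTON
```

For a singly charged 1000 Da ion with a 20 kV potential and a 1 m drift, `flight_time(1000.0, 20000.0, 1.0)` is roughly 16 microseconds. Quadrupling m/z doubles the flight time, which reflects the square-root dependence.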
In embodiments, mass spectrometer test data may have unknown characteristics, while a plurality of sets of mass spectrometer reference data have known characteristics. The sample may include biological molecules. The metadata information of the source may include information about the source of the biological molecules. The characteristic information of the source may include biological analysis information of the source. The biological analysis information may be a medical diagnosis of at least one of a human being, an animal, a plant, or another living organism.
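A minimal data structure for the distinction drawn above (test data with unknown characteristics versus reference data with known characteristics and source metadata) might look as follows. The class `SpectrometryProfile`, its fields, and the metadata keys are hypothetical. The peak values are made up for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SpectrometryProfile:
    """One mass spectrometry profile (hypothetical layout for illustration)."""
    peaks: List[Tuple[float, float]]                        # (m/z, intensity) pairs
    category: Optional[str] = None                          # known characteristic, e.g. "cancer"
    metadata: Dict[str, str] = field(default_factory=dict)  # source info, e.g. age group

    @property
    def is_reference(self) -> bool:
        # Reference data has a known characteristic; test data does not (yet).
        return self.category is not None


reference = SpectrometryProfile(
    peaks=[(4210.3, 0.8), (6630.1, 0.4)],
    category="cancer",
    metadata={"age_group": "60-69", "gender": "F"},
)
test_profile = SpectrometryProfile(peaks=[(4210.6, 0.7)])
```

Keeping the source metadata on each reference profile is what later allows a category to be split by attributes such as age or gender.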
However, for example, for PDF 917 associated with cancer, this distribution of peaks in the reference database may contain more information than just the general diagnosis of cancer. In accordance with embodiments, PDF 917 may be deconvolved into multiple PDFs each associated with a different kind of cancer.
In embodiments, PDFs of cancer patients and of normal subjects may be compared at a particular m/z. Accurate classification is difficult when two or more of these PDFs overlap with each other. Such overlap results from the convolution of spectra belonging to many categories.
Cancer is used only as an example disease for the purpose of illustration, and any kind of categorization, even outside the medical field, may be applicable.
For example, without PDFs 925 and 927, mass spectrometry test data from a patient may only be compared with point 923 of PDF 921. If one of the peaks of this mass spectrometry test data is within a reasonable range of point 923, it may be generally concluded that the patient under test has cancer, but not what type of cancer. By deconvolving PDF 921 into PDFs 925 and 927, the matching system is able to use much more information from the reference library. PDFs 925 and 927 form a cluster associated with the pre-deconvoluted PDF 921. Note that the sum of PDFs 925 and 927 approximately or exactly equals PDF 921, but the centers of mass of PDFs 925 and 927 are different. In this simplified example, if the mass spectrometry profile of a patient under test had a peak at approximately the center of mass of PDF 925, then it may be concluded that the patient has lung cancer. This is more information than a comparison with PDF 921 alone provides, since that comparison would only indicate the general existence of cancer.
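The deconvolution of PDF 921 into PDFs 925 and 927 can be sketched in Python under two assumptions: each reference peak at the m/z of interest is tagged with its sub-category label, and each sub-category PDF is Gaussian. The names `deconvolve`, `fit_pdf`, and `mixture_pdf` are hypothetical. The weights are the sub-category shares of the data, so the weighted sum of the sub-category PDFs plays the role of the pre-deconvoluted PDF, while each component keeps its own center of mass.

```python
from collections import defaultdict
from statistics import NormalDist, fmean, pstdev
from typing import Dict, Iterable, List, Tuple

MIN_SIGMA = 1e-6  # floor so a tight or single-sample group still gives a valid NormalDist

Component = Tuple[float, NormalDist]  # (mixture weight, fitted PDF)


def fit_pdf(values: List[float]) -> NormalDist:
    """Fit a Gaussian PDF to peak positions (assumption: Gaussian shape)."""
    return NormalDist(fmean(values), max(pstdev(values), MIN_SIGMA))


def deconvolve(peaks: Iterable[Tuple[float, str]]) -> Dict[str, Component]:
    """Split one category's (m/z, sub-category label) peaks into one weighted
    PDF per sub-category, e.g. PDF 921 -> PDFs 925 and 927."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for mz, label in peaks:
        groups[label].append(mz)
    total = sum(len(v) for v in groups.values())
    return {label: (len(v) / total, fit_pdf(v)) for label, v in groups.items()}


def mixture_pdf(components: Dict[str, Component], x: float) -> float:
    """Weighted sum of the post-deconvolution PDFs; approximates the original PDF."""
    return sum(weight * pdf.pdf(x) for weight, pdf in components.values())
```

For instance, `deconvolve([(5001.0, "lung"), (5001.4, "lung"), (5003.0, "other"), (5003.2, "other")])` returns two equally weighted components centered near 5001.2 and 5003.1. These m/z values are invented for illustration.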
In embodiments, the associative relationships within a cluster provide quality information. For every probability density function of a category or sub-category, if deconvolution can be performed to further define the cluster, diagnostic accuracy and resolution may be improved.
In embodiments, a cancer patient PDF may be deconvolved. Each spectrum may be split into two or more spectra so that at least one of the resulting spectra lies farther from the spectra of the other category (higher clustering). For example, a cancer patient category may be divided into subcategories such as different cancer stages or different types of cancer. The subcategory PDFs are then multiple PDFs spaced apart, with different centers of mass.
For example, comparing the PDF of normal subjects in a subcategory with the PDFs of cancer patients after subcategorization is one of many possible ways to subdivide the data through deconvolution. In embodiments, the normal and cancer PDFs are spaced farther apart after deconvolution than before, resulting in better clustering. The area of overlap between two PDFs represents the quality of clustering: the smaller the overlap, the better the clustering. The deconvolution process may be repeated until optimal clustering is obtained. This process of finding the optimal clustering at one m/z is repeated for all m/z values of interest. The optimal clustering is then used to derive a signature database against which an unknown patient's sample is compared.
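For Gaussian PDFs, the overlap criterion can be computed directly with `statistics.NormalDist.overlap` from the Python standard library (Python 3.8+), which returns the area shared by two normal distributions. The sketch below assumes Gaussian PDFs at a single m/z and pairs the overlap with the distance between the centers of mass. `clustering_quality` is a hypothetical name.

```python
from statistics import NormalDist
from typing import Tuple


def clustering_quality(a: NormalDist, b: NormalDist) -> Tuple[float, float]:
    """Return (overlap area, distance between centers) for two PDFs at one m/z.

    A smaller overlap and a larger distance both indicate better clustering."""
    overlap = a.overlap(b)            # shared area under both PDFs, between 0 and 1
    distance = abs(a.mean - b.mean)   # separation of the centers of mass
    return overlap, distance
```

Consider hypothetical PDFs `normal = NormalDist(5000.0, 1.0)` and `cancer_before = NormalDist(5001.0, 1.5)`, plus a deconvolved sub-category `cancer_after = NormalDist(5003.0, 0.5)`. After deconvolution the overlap with the normal PDF drops and the distance grows, which corresponds to the improvement in clustering described above.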
In embodiments, tables may be used to record all relevant m/z values and their successive clustering results. From the set of all m/z values with optimal clustering information, a set of m/z values with optimal clustering is selected as the signature database for pattern matching to accurately diagnose cancer. One metric may be the distance between the cancer and normal clusters: the farther apart they are, the better the clustering. The areas of overlap between the normal and cancer PDFs may be used as weights for pattern matching, in accordance with embodiments.
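Selecting the signature m/z values and using the overlap as a weight could be sketched as follows. The sketch assumes that, at each standard m/z, one normal PDF and one best cancer sub-category PDF are already available. It keeps an m/z when the overlap is below a threshold and uses 1 - overlap as the weight, so that better-separated m/z values count more. The threshold, the weighting rule, the scoring rule, and the names `build_signature_db` and `cancer_score` are all hypothetical. The observed value at each m/z is taken to be the test peak position measured near it, following the document's peak-position example.

```python
from statistics import NormalDist
from typing import Dict, Tuple

# Hypothetical layout: standard m/z -> (normal PDF, best cancer sub-category PDF)
Clusters = Dict[float, Tuple[NormalDist, NormalDist]]
Signature = Dict[float, Tuple[NormalDist, NormalDist, float]]


def build_signature_db(clusters: Clusters, max_overlap: float = 0.2) -> Signature:
    """Keep the m/z points whose normal/cancer PDFs overlap less than max_overlap,
    weighting each by 1 - overlap (an assumed weighting rule)."""
    signature: Signature = {}
    for mz, (normal, cancer) in clusters.items():
        overlap = normal.overlap(cancer)
        if overlap < max_overlap:
            signature[mz] = (normal, cancer, 1.0 - overlap)
    return signature


def cancer_score(signature: Signature, observed: Dict[float, float]) -> float:
    """Weighted vote over signature m/z points present in a test profile:
    +weight if the observed value is more likely under the cancer PDF, else -weight."""
    score = 0.0
    for mz, (normal, cancer, weight) in signature.items():
        if mz in observed:
            x = observed[mz]
            score += weight if cancer.pdf(x) > normal.pdf(x) else -weight
    return score
```

A positive score means that, at the selected m/z values, the test profile sits closer on balance to the cancer sub-category PDFs than to the normal PDFs.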
Embodiments relate to a method of diagnosing cancer using mass spectrometry. In embodiments, a method may include deconvolving the profile or the PDF of mass spectra within a category at an m/z point into two or more profiles of the category. In embodiments, a method may include repeating the deconvolution process until optimally clustered subcategories of each category at the m/z are obtained. In embodiments, a method may include repeating the optimal clustering process for other m/z values of interest. In embodiments, a method may include selecting an optimum set of m/z values to yield the best clustering. In embodiments, a method may include defining a pair of associated subcategories that shows the optimum clustering value. In embodiments, a method may include applying the optimum clustering and defining a subcategorization process for the remaining data profiles until an acceptable clustering outcome is achieved. In embodiments, clustered subcategories could be the existing classifications of diseases or microorganisms or the definition of a new classification.
Embodiments relate to cancer diagnostics using mass spectrometer data. Embodiments may include deconvolving the profile or the PDF of mass spectra within a category at an m/z point into profiles of two or more subcategories, where the category can be normal healthy people or cancer patients. Embodiments may include repeating the deconvolution process until desirably clustered subcategories of a category at the m/z are obtained. In this process, a single-mode profile of a category is split into two profiles with different modes, one of which is used to obtain a higher clustering value against the other category. Embodiments may include repeating the clustering process for other m/z points of interest. Embodiments may include selecting an optimum set of m/z values to obtain the best and/or optimal clustering. Embodiments may include defining a pair of associated subcategories that shows the best (optimum) clustering value. Embodiments may include applying the optimum clustering and defining a subcategorization process for the rest of the data until another acceptable clustering outcome is achieved or the data is insufficient to perform the clustering process. In embodiments, the clustered subcategories could be the existing classifications of diseases or microorganisms or the definition of a new classification.
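The repeated deconvolution described above can be sketched as a greedy search at a single m/z. The sketch assumes that each cancer-category reference peak carries source attributes (for example, stage or age group) and that the other category is summarized by one Gaussian PDF. At each step, every remaining attribute is tried and the sub-category with the smallest overlap against the other category is kept. The search stops when the overlap no longer shrinks or when a sub-category would have fewer than `min_size` samples. The greedy strategy, the Gaussian fit, and all names are assumptions, not the method prescribed by the document.

```python
from statistics import NormalDist, fmean, pstdev
from typing import Dict, List, Optional, Tuple

Record = Tuple[float, Dict[str, str]]  # (observed m/z, source attributes), hypothetical layout
MIN_SIGMA = 1e-6  # floor so a very tight group still yields a valid NormalDist


def fit_pdf(values: List[float]) -> NormalDist:
    return NormalDist(fmean(values), max(pstdev(values), MIN_SIGMA))


def best_split(records: List[Record], attributes: List[str], other: NormalDist,
               min_size: int) -> Optional[Tuple[float, str, str]]:
    """Try splitting on each attribute; return (overlap, attribute, value) for the
    sub-category whose PDF overlaps least with the other category's PDF."""
    best: Optional[Tuple[float, str, str]] = None
    for attr in attributes:
        groups: Dict[str, List[float]] = {}
        for mz, attrs in records:
            groups.setdefault(attrs[attr], []).append(mz)
        if len(groups) < 2 or min(len(v) for v in groups.values()) < min_size:
            continue  # not a real split, or too little data in some sub-category
        for value, values in groups.items():
            overlap = fit_pdf(values).overlap(other)
            if best is None or overlap < best[0]:
                best = (overlap, attr, value)
    return best


def successive_deconvolution(records: List[Record], attributes: List[str],
                             other: NormalDist, min_size: int = 3):
    """Repeat splits while the overlap with `other` keeps shrinking."""
    current = records
    current_overlap = fit_pdf([mz for mz, _ in current]).overlap(other)
    remaining = list(attributes)
    path: List[Tuple[str, str, float]] = []
    while remaining:
        candidate = best_split(current, remaining, other, min_size)
        if candidate is None or candidate[0] >= current_overlap:
            break  # no further improvement or insufficient data: stop here
        current_overlap, attr, value = candidate
        path.append((attr, value, current_overlap))
        current = [r for r in current if r[1][attr] == value]
        remaining.remove(attr)
    return path, current_overlap
```

The returned `path` lists the successive sub-categories, and the final overlap is the clustering value reached at this m/z. Running the search for every m/z of interest yields the per-m/z results from which a signature set can be selected.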
Embodiments relate to an apparatus and/or method that includes deconvolving a pre-deconvoluted distribution of spectrometry reference profile peaks into at least two post-deconvoluted distributions of spectrometry reference profile peaks. In embodiments, the pre-deconvoluted distribution of spectrometry reference profile peaks originated from spectrometry reference profiles associated with a first category. In embodiments, the at least two post-deconvoluted distributions of spectrometry reference profile peaks each originated from spectrometry reference profiles associated with a different one of at least two sub-categories of the first category.
Embodiments include receiving, from a mass spectrometer, a test mass spectrometry profile from a test on a sample. Embodiments include comparing peaks of the test mass spectrometry profile with the at least two post-deconvoluted distributions of spectrometry reference profile peaks. Embodiments include associating the test mass spectrometry profile with one of the at least two different sub-categories if at least one of the peaks of the test mass spectrometry profile approximately matches one of the at least two post-deconvoluted distributions of spectrometry reference profile peaks.
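The comparison and association steps can be illustrated with a short sketch. It assumes that each post-deconvoluted distribution is summarized as a Gaussian `NormalDist` and that a test peak counts as "approximately the same" as a distribution when it lies within `k` standard deviations of that distribution's center of mass. The function name `associate` and the default `k = 2.0` are hypothetical.

```python
from statistics import NormalDist
from typing import Dict, Iterable, Optional


def associate(test_peaks: Iterable[float], subcategories: Dict[str, NormalDist],
              k: float = 2.0) -> Optional[str]:
    """Return the sub-category whose post-deconvolution PDF best explains a test
    peak, or None if no peak is close to any sub-category center."""
    best_label: Optional[str] = None
    best_density = 0.0
    for mz in test_peaks:
        for label, dist in subcategories.items():
            # "Approximately the same": within k standard deviations of the center of mass.
            if abs(mz - dist.mean) <= k * dist.stdev:
                density = dist.pdf(mz)
                if density > best_density:
                    best_label, best_density = label, density
    return best_label
```

Take `{"lung": NormalDist(5001.2, 0.3), "other": NormalDist(5003.1, 0.3)}` as the sub-category distributions. A test profile with a peak at 5001.3 is associated with `"lung"`, and a profile with no peak near either center returns `None`.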
In embodiments, the mass spectrometer is comprised in a matrix assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) system. In embodiments, associating the test mass spectrometry profile with one of the at least two different sub-categories enhances a medical diagnosis through clustering.
In embodiments, the first category is at least one of a disease and/or a microorganism. In embodiments, at least one of the at least two different sub-categories of the first category is at least one of a characteristic and/or a trait of the at least one disease and/or microorganism.
In embodiments, the first category is a characteristic of a reference sample that can be categorized. In embodiments, at least one of the at least two different sub-categories of the first category is at least one of a sub-characteristic and/or sub-trait of the characteristic of the reference sample. In embodiments, the at least one of the at least two different sub-categories is associated with a source of the spectrometry reference profile. In embodiments, one of the at least two different sub-categories is the age of the source, the gender of the source, or another characteristic of the source.
In embodiments, the first category and the at least two different sub-categories of the first category are comprised in a first cluster.
In embodiments, the pre-deconvoluted distribution of spectrometry reference profile peaks originated from spectrometry reference profiles associated with a second category. In embodiments, the at least two post-deconvoluted distributions of spectrometry reference profile peaks each originated from spectrometry reference profiles associated with a different one of at least two sub-categories of the second category. In embodiments, the second category and the at least two different sub-categories of the second category comprise a second cluster.
In embodiments, peaks of the pre-deconvoluted distribution of spectrometry reference profile peaks and of the at least two post-deconvoluted distributions of spectrometry reference profile peaks are in units of mass-to-charge ratio.
Embodiments include deconvolving at least one of the two post-deconvoluted distributions of spectrometry reference profile peaks into at least two secondary-post-deconvoluted distributions of spectrometry reference profile peaks. In embodiments, the at least two secondary-post-deconvoluted distributions of spectrometry reference profile peaks are each associated with a different one of at least two secondary-sub-categories of at least one of the two different sub-categories of the first category. In embodiments, the first category, the at least two different sub-categories, and the at least two different secondary-sub-categories comprise a first cluster.
Embodiments include performing at least one subsequent deconvolving operation on the first cluster. In embodiments, performing the at least one subsequent deconvolving operation on the first cluster comprises performing an optimal number of deconvolving operations to optimize the first cluster.
In embodiments, the apparatus and/or method is performed on a server and/or by cloud computing. In embodiments, the apparatus and/or method is performed using artificial intelligence and/or at least one deep learning algorithm.
Although the above-described embodiments are described based on a series of steps or flowcharts, this does not limit the invention to that chronological order; the steps may be performed simultaneously or in a different order as necessary. In addition, in the above-described embodiments, each component (for example, a unit, a module, etc.) constituting the block diagram may be implemented as a hardware device or software, or a plurality of components may be combined into one hardware device or software. The above-described embodiments may be implemented in the form of program instructions that may be executed by various computer components and may be recorded in a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, etc., alone or in combination. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. The hardware device may be configured to operate as one or more software modules to perform the process according to the invention, and vice versa.
It will be obvious and apparent to those skilled in the art that various modifications and variations can be made in the embodiments disclosed. Thus, it is intended that the disclosed embodiments cover the obvious and apparent modifications and variations, provided that they are within the scope of the appended claims and their equivalents.
The present application claims priority to U.S. Provisional Patent Application No. 62/959,219 filed on Jan. 10, 2020 and U.S. Provisional Patent Application No. 62/959,223 filed on Jan. 10, 2020, both of which are hereby incorporated by reference in their entireties.
| Number | Date | Country |
|---|---|---|
| 62959219 | Jan 2020 | US |
| 62959223 | Jan 2020 | US |