The present disclosure is generally directed toward diagnosing and treating health conditions, and in some particular embodiments the present disclosure is directed toward novel systems and methods for associating biological parameters with, inter alia, wellness classifications, wellness states, treatment effectiveness, and wellness progression or digression.
Timely diagnosis and treatment of health conditions is of great importance to the healthcare community. Conventional processes for arriving at conclusions as to diagnosis and treatment of health conditions are wanting in accuracy and precision. In particular, conventional methods of interpreting mass spectra obtained from biological samples are subject to intervening human error. Human inputs are often subject to bias that can taint a conclusion drawn from an interpretation of such mass spectra. Novel systems and methods are needed that provide improved reliability, accuracy, and precision in mass spectra interpretation through unbiased and continuously validated decision making in an intelligent environment.
As used in the present specification, the following words and phrases are generally intended to have the meanings as set forth below, except to the extent that the context in which they are used indicates otherwise.
The term “biological sample” refers to any biological fluid, cell, tissue, organ, or any portion of any one or more of the foregoing, or any combination of any one or more of the foregoing. By way of example, a “biological sample” may include one or more: tissue section(s) obtained by biopsy; cell(s) that are placed in or adapted to tissue culture; sample(s) of saliva, tears, sputum, sweat, mucus, fecal material, gastric fluid, abdominal fluid, amniotic fluid, cyst fluid, peritoneal fluid, spinal fluid, urine, synovial fluid, whole blood, serum, plasma, pancreatic juice, breast milk, lung lavage, marrow, gastric acid, bile, semen, pus, aqueous humour, transudate, and the like; and any other biological matter, or any portion or combination of any one or more of the foregoing.
The term “biomarker” refers to a distinctive biological or biologically-derived indicator of one or more process(es), event(s), condition(s), or any combination of the foregoing. In general, biological indicators and biologically derived indicators are detectable, quantifiable, and/or otherwise measurable. For instance, a biomarker may include one or more measurable molecules or substances arising from, associated with, or derived from a subject, the presence of which is indicative of another quality (e.g., one or more process(es), event(s), condition(s), or any combination of the foregoing). A biomarker may include any one or more biological molecules (taken alone or together), or a fragment of any one or more biological molecules (taken alone or together), the detected presence, quantity (absolute, proportionate, relative, or otherwise), measure, or change in one or more of such presence, quantity, or measure of which can be correlated with one or more particular wellness state(s). By way of example, biomarkers may include, but are not limited to, biological molecules comprising one or more: nucleotide(s), amino acid(s), fatty acid(s), steroid(s), antibody(ies), hormone(s), peptide(s), protein(s), carbohydrate(s), and the like. Further examples may comprise one or more: glycosylated peptide fragment(s), lipoprotein(s), and the like. A biomarker may be indicative of a wellness condition, such as the presence, onset, stage or status of one or more disease(s), infection(s), syndrome(s), condition(s), or other state(s), including being at-risk of one or more disease(s), infection(s), syndrome(s), or condition(s).
The term “glycan” refers to the carbohydrate portion of a glycoconjugate, such as the carbohydrate portion of a glycopeptide, glycoprotein, glycolipid or proteoglycan.
The term “glycoform” refers to a unique primary, secondary, tertiary and quaternary structure of a protein with an attached glycan of a specific structure.
The term “glycosylated peptide fragment” refers to a glycosylated peptide (or glycopeptide) having an amino acid sequence that is the same as part (but not all) of the amino acid sequence of the glycosylated protein from which the glycosylated peptide is obtained via fragmentation, e.g., with one or more protease(s).
The term “multiple reaction monitoring mass spectrometry (MRM-MS)” refers to a highly sensitive and selective method for the targeted quantification of protein/peptide in biological samples. Unlike traditional mass spectrometry, MRM-MS is highly selective (targeted), allowing researchers to fine tune an instrument to specifically look for peptides/protein fragments of interest. MRM allows for greater sensitivity, specificity, speed and quantitation of peptides/protein fragments of interest, such as a potential biomarker. MRM-MS involves using one or more of a triple quadrupole (QQQ) mass spectrometer and a quadrupole time-of-flight (qTOF) mass spectrometer.
The term “protease” refers to an enzyme that performs proteolysis or breakdown of proteins into smaller polypeptides or amino acids. Examples of a protease include, but are not limited to, one or more of a serine protease, threonine protease, cysteine protease, aspartate protease, glutamic acid protease, metalloprotease, asparagine peptide lyase, and any combinations of the foregoing.
The term “subject” refers to a mammal. Non-limiting examples of a mammal include a human, non-human primate, mouse, rat, dog, cat, horse, or cow, and the like. Mammals other than humans can be advantageously used as subjects that represent animal models of disease, pre-disease, or a pre-disease condition. A subject can be male or female. A subject can be one who has been previously identified as having a disease or a condition, and optionally has already undergone, or is undergoing, a therapeutic intervention for the disease or condition. Alternatively, a subject can also be one who has not been previously diagnosed as having a disease or a condition. For example, a subject can be one who exhibits one or more risk factors for a disease or a condition, or a subject who does not exhibit disease risk factors, or a subject who is asymptomatic for a disease or a condition. A subject can also be one who is suffering from or at risk of developing a disease or a condition.
The term “treatment” or “treating” means any treatment of a disease or condition in a subject, such as a mammal, including: 1) preventing or protecting against the disease or condition, that is, causing the clinical symptoms not to develop; 2) inhibiting the disease or condition, that is, arresting or suppressing the development of clinical symptoms; and/or 3) relieving the disease or condition, that is, causing the regression of clinical symptoms.
As used herein, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.
The computer-readable medium 102 is intended to represent a variety of potentially applicable technologies. For example, the computer-readable medium 102 can be used to form a network or part of a network. Where two components are co-located on a device, the computer-readable medium 102 can include a bus or other data conduit or plane. Where a first component is co-located on one device and a second component is located on a different device, the computer-readable medium 102 can include a wireless or wired back-end network or LAN. The computer-readable medium 102 can also encompass a relevant portion of a WAN or other network, if applicable.
As used in this paper, a “computer-readable medium” is intended to include all mediums that are statutory (e.g., in the United States, under 35 U.S.C. 101), and to specifically exclude all mediums that are non-statutory in nature to the extent that the exclusion is necessary for a claim that includes the computer-readable medium to be valid. Known statutory computer-readable mediums include hardware (e.g., registers, random access memory (RAM), non-volatile (NV) storage, to name a few), but may or may not be limited to hardware.
The computer-readable medium 102 or portions thereof, as well as other systems, interfaces, engines, datastores, and other devices described in this paper, can be implemented as a computer system, a plurality of computer systems, or a part of a computer system or a plurality of computer systems. In general, a computer system will include a processor, memory, non-volatile storage, and an interface. A typical computer system will usually include at least a processor, memory, and a device (e.g., a bus) coupling the memory to the processor. The processor can be, for example, a general-purpose central processing unit (CPU), such as a microprocessor, or a special-purpose processor, such as a microcontroller.
The memory can include, by way of example but not limitation, random access memory (RAM), such as dynamic RAM (DRAM) and static RAM (SRAM). The memory can be local, remote, or distributed. The bus can also couple the processor to non-volatile storage. The non-volatile storage is often a magnetic floppy or hard disk, a magnetic-optical disk, an optical disk, a read-only memory (ROM), such as a CD-ROM, EPROM, or EEPROM, a magnetic or optical card, or another form of storage for large amounts of data. Some of this data is often written, by a direct memory access process, into memory during execution of software on the computer system. The non-volatile storage can be local, remote, or distributed. The non-volatile storage is optional because systems can be created with all applicable data available in memory.
Software is typically stored in non-volatile storage. Indeed, for large programs, it may not even be possible to store the entire program in memory. Nevertheless, it should be understood that for software to run, if necessary, it is moved to a computer-readable location appropriate for processing, and for illustrative purposes, that location is referred to as the memory in this paper. Even when software is moved to the memory for execution, the processor will typically make use of hardware registers to store values associated with the software, and local cache that, ideally, serves to speed up execution. As used herein, a software program is assumed to be stored at an applicable known or convenient location (from non-volatile storage to hardware registers) when the software program is referred to as “implemented in a computer-readable storage medium.” A processor is considered to be “configured to execute a program” when at least one value associated with the program is stored in a register readable by the processor.
In one example of operation, a computer system can be controlled by operating system software, which is a software program that includes a file management system, such as a disk operating system. One example of operating system software with associated file management system software is the family of operating systems known as Windows® from Microsoft Corporation of Redmond, Wash., and their associated file management systems. Another example of operating system software with its associated file management system software is the Linux operating system and its associated file management system. The file management system causes the processor to execute the various acts required by the operating system to input and output data and to store data in the memory, including storing files on the non-volatile storage.
The bus can also couple the processor to the interface. The interface can include one or more input and/or output (I/O) devices. The I/O devices can include, by way of example but not limitation, a keyboard, a mouse or other pointing device, disk drives, printers, a scanner, and other I/O devices, including a display device. The display device can include, by way of example but not limitation, a cathode ray tube (CRT), liquid crystal display (LCD), or some other applicable known or convenient display device. The interface can include one or more of a modem or network interface. It will be appreciated that a modem or network interface can be considered to be part of the computer system. The interface can include an analog modem, ISDN modem, cable modem, token ring interface, satellite transmission interface (e.g. “direct PC”), or other interfaces for coupling a computer system to other computer systems. Interfaces enable computer systems and other devices to be coupled together in a network.
The computer systems can be compatible with or implemented as part of or through a cloud-based computing system. As used in this paper, a cloud-based computing system is a system that provides virtualized computing resources, software and/or information to client devices. The computing resources, software and/or information can be virtualized by maintaining centralized services and resources that the edge devices can access over a communication interface, such as a network. “Cloud” may be a marketing term and for the purposes of this paper can include any of the networks described herein. The cloud-based computing system can involve a subscription for services or use a utility pricing model. Users can access the protocols of the cloud-based computing system through a web browser or other container application located on their client device.
A computer system can be implemented as an engine, as part of an engine, or through multiple engines. As used in this paper, an engine includes at least two components: 1) a dedicated or shared processor and 2) hardware, firmware, and/or software modules that are executed by the processor. Depending upon implementation-specific or other considerations, an engine can be centralized or its functionality distributed. An engine can include special purpose hardware, firmware, or software embodied in a computer-readable medium for execution by the processor. The processor transforms data into new data using implemented data structures and methods, such as is described with reference to the FIGS. in this paper.
The engines described in this paper, or the engines through which the systems and devices described in this paper can be implemented, can be cloud-based engines. As used in this paper, a cloud-based engine is an engine that can run applications and/or functionalities using a cloud-based computing system. All or portions of the applications and/or functionalities can be distributed across multiple computing devices, and need not be restricted to only one computing device. In some embodiments, the cloud-based engines can execute functionalities and/or modules that end users access through a web browser or container application without having the functionalities and/or modules installed locally on the end-users' computing devices.
As used in this paper, datastores are intended to include repositories having any applicable organization of data, including tables, comma-separated values (CSV) files, traditional databases (e.g., SQL), or other applicable known or convenient organizational formats. Datastores can be implemented, for example, as software embodied in a physical computer-readable medium on a general- or specific-purpose machine, in firmware, in hardware, in a combination thereof, or in an applicable known or convenient device or system. Datastore-associated components, such as database interfaces, can be considered “part of” a datastore, part of some other system component, or a combination thereof, though the physical location and other characteristics of datastore-associated components are not critical for an understanding of the techniques described in this paper.
Datastores can include data structures. As used in this paper, a data structure is associated with a particular way of storing and organizing data in a computer so that it can be used efficiently within a given context. Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address, a bit string that can be itself stored in memory and manipulated by the program. Thus, some data structures are based on computing the addresses of data items with arithmetic operations; while other data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways. The implementation of a data structure usually entails writing a set of procedures that create and manipulate instances of that structure. The datastores described in this paper can be cloud-based datastores. A cloud-based datastore is a datastore that is compatible with cloud-based computing systems and engines.
Referring once again to the example of
In some embodiments, biological samples are from one or more past studies that occurred over a span of 1 to 50 years or more. In some embodiments, the studies are accompanied by various other clinical parameters and previously known information such as a subject's age, height, weight, ethnicity, medical history, and the like. Such additional information can be useful in associating a subject with a wellness classification. In some embodiments, the biological samples are one or more clinical samples collected prospectively from subjects.
In one embodiment, a biological sample isolated from a subject is body tissue, saliva, tears, sputum, spinal fluid, urine, synovial fluid, whole blood, serum, or plasma. In another embodiment, a biological sample isolated from a subject is whole blood, serum, or plasma. In some embodiments, subjects are mammals. In some of those embodiments, the subjects are humans.
In one embodiment, glycosylated proteins considered for quantifying the glycomic parameters are one or more of alpha-1-acid glycoprotein, alpha-1-antitrypsin, alpha-1B-glycoprotein, alpha-2-HS-glycoprotein, alpha-2-macroglobulin, antithrombin-III, apolipoprotein B-100, apolipoprotein D, apolipoprotein F, beta-2-glycoprotein 1, ceruloplasmin, fetuin, fibrinogen, immunoglobulin (Ig) A, IgG, IgM, haptoglobin, hemopexin, histidine-rich glycoprotein, kininogen-1, serotransferrin, transferrin, vitronectin, and zinc-alpha-2-glycoprotein.
In one embodiment, glycosylated peptide fragments considered for quantifying glycomic parameters are one or more of O-glycosylated and N-glycosylated. In another embodiment, glycosylated peptide fragments considered for quantifying glycomic parameters have an average length of from 5 to 50 amino acid residues. In other embodiments, the glycosylated peptide fragments have an average length of from about 5 to about 45, or from about 5 to about 40, or from about 5 to about 35, or from about 5 to about 30, or from about 5 to about 25, or from about 5 to about 20, or from about 5 to about 15, or from about 5 to about 10, or from about 10 to about 50, or from about 10 to about 45, or from about 10 to about 40, or from about 10 to about 35, or from about 10 to about 30, or from about 10 to about 25, or from about 10 to about 20, or from about 10 to about 15, or from about 15 to about 45, or from about 15 to about 40, or from about 15 to about 35, or from about 15 to about 30, or from about 15 to about 25, or from about 15 to about 20 amino acid residues. In one embodiment, the glycosylated peptide fragments have an average length of about 15 amino acid residues. In another embodiment, the glycosylated peptide fragments have an average length of about 10 amino acid residues. In another embodiment, the glycosylated peptide fragments have an average length of about 5 amino acid residues.
In an embodiment, fragmentation of the glycosylated proteins is carried out using one or more proteases. In one embodiment, one or more of the proteases is a serine protease, threonine protease, cysteine protease, aspartate protease, glutamic acid protease, metalloprotease, asparagine peptide lyase or a combination thereof. A few representative examples of a protease include, but are not limited to, trypsin, chymotrypsin, endoproteinase, Asp-N, Arg-C, Glu-C, Lys-C, pepsin, thermolysin, elastase, papain, proteinase K, subtilisin, clostripain, carboxypeptidase and the like. In another embodiment, the present disclosure provides the methods as described herein, wherein the one or more proteases comprise at least two proteases. In another embodiment, fragmentation and quantification of the glycosylated proteins employs liquid chromatography-mass spectrometry (LC-MS) techniques using multiple reaction monitoring mass spectrometry (MRM-MS), which enables quantification of hundreds of glycosylated peptide fragments (and their parent proteins) in a single LC/MRM-MS analysis. The advanced mass spectroscopy techniques of the present disclosure provide effective ion sources, higher resolution, faster separations and detectors with higher dynamic ranges that allow for broad untargeted measurements that also retain the benefits of targeted measurements.
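By way of non-limiting illustration, the fragmentation step can be sketched in silico. The sketch below assumes trypsin as the protease and uses the commonly cited simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). It then keeps fragments of 5 to 50 residues that contain the N-X-S/T sequon (X not proline) generally associated with N-glycosylation. The function names, the example sequence, and the omission of missed cleavages and O-glycosylation are assumptions made for illustration only and are not taken from the present disclosure.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence after K or R, except when the next residue is P.

    Simplified rule: no missed cleavages are modelled.
    """
    # Zero-width split point: preceded by K or R and not followed by P.
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", sequence) if frag]


def has_n_glycosylation_sequon(peptide: str) -> bool:
    """Return True if the peptide contains the N-X-S/T motif with X != P."""
    return re.search(r"N[^P][ST]", peptide) is not None


def candidate_glycopeptides(sequence: str, min_len: int = 5, max_len: int = 50) -> list[str]:
    """Keep tryptic fragments within the length window that carry a sequon."""
    return [
        peptide
        for peptide in tryptic_digest(sequence)
        if min_len <= len(peptide) <= max_len and has_n_glycosylation_sequon(peptide)
    ]


# Hypothetical sequence used only to exercise the functions.
EXAMPLE_SEQUENCE = "MKNGSAAPRPLLNVTWKAAAAAK"
fragments = tryptic_digest(EXAMPLE_SEQUENCE)
candidates = candidate_glycopeptides(EXAMPLE_SEQUENCE)
```

A sequon match indicates only a potential glycosylation site; whether a fragment is actually glycosylated would be established by the mass spectrometry measurements described herein.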
The mass spectroscopy methods of the present disclosure are applicable to several glycosylated proteins at a time. For example, more than 50, or more than 60, or more than 70, or more than 80, or more than 90, or more than 100, or more than 110, or more than 120 glycosylated proteins can be analyzed at a time using the mass spectrometer.
In one embodiment, mass spectroscopy methods described in this paper employ QQQ or qTOF mass spectrometry. In another embodiment, mass spectroscopy methods described in this paper provide data with high mass accuracy of 10 ppm or better; or 5 ppm or better; or 2 ppm or better; or 1 ppm or better; or 0.5 ppm or better; or 0.2 ppm or better; or 0.1 ppm or better, at a resolving power of 5,000 or better; or 10,000 or better; or 25,000 or better; or 50,000 or better; or 100,000 or better.
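For orientation, mass accuracy in parts per million and resolving power can be computed as in the following minimal sketch. It assumes the usual conventions that ppm error is the relative deviation of the measured m/z from the theoretical m/z multiplied by one million, and that resolving power is m/z divided by the peak width at half maximum (FWHM). The function names and default thresholds are hypothetical and chosen to mirror the least stringent values recited above.

```python
def mass_error_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz: float, fwhm: float) -> float:
    """Resolving power as m/z divided by peak width at half maximum."""
    return mz / fwhm


def meets_specification(
    measured_mz: float,
    theoretical_mz: float,
    fwhm: float,
    max_ppm: float = 10.0,               # "10 ppm or better"
    min_resolving_power: float = 5000.0,  # "5,000 or better"
) -> bool:
    """Check a single peak against an accuracy and resolving-power target."""
    accurate = abs(mass_error_ppm(measured_mz, theoretical_mz)) <= max_ppm
    resolved = resolving_power(measured_mz, fwhm) >= min_resolving_power
    return accurate and resolved
```

For example, a peak measured at m/z 1000.005 against a theoretical m/z of 1000.000 has an error of about 5 ppm, and a peak width of 0.1 at m/z 1000 corresponds to a resolving power of about 10,000.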
In the example of
In the example of
In the example of
In the example of
In the example of
Although a specific implementation is contained within a clinical and laboratory ecosystem, it should be understood other parameter generation systems can be utilized, including a social media parameter generation system that pulls data from social media regarding subjects, a behavioristic parameter generation system that pulls data regarding online activities from various sources, a governmental records parameter generation system that pulls publicly-available data from government-run websites, or the like. The larger the data sample size, the more disparate data can be incorporated into parameters used for wellness classification.
In the example of
In a specific implementation, the automatic non-biased machine learning diagnosis system 116 is capable of automatically determining abundance or dearth of one or more quantifiable biological parameters as biomarkers associated with a specific wellness classification and/or existence or lack of one or more non-quantifiable biological parameters as biomarkers associated with the specific wellness classification. Depending upon implementation-specific or other considerations, the biological parameter determined as a biomarker may be a scalar value or value range of a biological parameter, or a combination of two or more biological parameters (e.g., a ratio of two biological parameters, or a vector of two or more biological parameters). For example, a certain range (e.g., higher than a certain threshold, or between a lower threshold and a higher threshold) of a metabolic product indicates a wellness condition. In another example, a specific ratio or a ratio range of an amount of one type of glycopeptide to an amount of one type of lipid may indicate a wellness condition. In another example, a range of a quantifiable biological parameter over a certain threshold with a positive non-quantifiable parameter (e.g., non-smoker) may be a biomarker.
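The three biomarker forms described above (a value range, a ratio range, and a quantifiable range combined with a non-quantifiable flag) can be represented, for illustration only, as small rule objects evaluated against a subject's measurements. The class names, field names, parameter names, and threshold values below are all hypothetical and are not taken from the present disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeBiomarker:
    """A parameter value above a lower bound and/or below an upper bound."""
    parameter: str
    lower: Optional[float] = None
    upper: Optional[float] = None

    def matches(self, sample: dict) -> bool:
        value = sample[self.parameter]
        above = self.lower is None or value >= self.lower
        below = self.upper is None or value <= self.upper
        return above and below


@dataclass(frozen=True)
class RatioBiomarker:
    """A ratio of two parameters falling inside a closed range."""
    numerator: str
    denominator: str
    lower: float
    upper: float

    def matches(self, sample: dict) -> bool:
        denominator = sample[self.denominator]
        if denominator == 0:
            return False  # ratio undefined; treated here as no match
        return self.lower <= sample[self.numerator] / denominator <= self.upper


@dataclass(frozen=True)
class CombinedBiomarker:
    """A quantifiable range combined with a non-quantifiable yes/no flag."""
    quantitative: RangeBiomarker
    flag: str
    expected: bool = True

    def matches(self, sample: dict) -> bool:
        return self.quantitative.matches(sample) and bool(sample[self.flag]) == self.expected


# Hypothetical subject measurements.
sample = {"metabolite_x": 7.2, "glycopeptide_a": 3.0, "lipid_b": 2.0, "non_smoker": True}
over_threshold = RangeBiomarker("metabolite_x", lower=5.0)
ratio_rule = RatioBiomarker("glycopeptide_a", "lipid_b", lower=1.0, upper=2.0)
combined_rule = CombinedBiomarker(over_threshold, flag="non_smoker")
```

In this sketch each rule exposes the same `matches` method, so a diagnosis step could evaluate any mix of rule types against a subject without knowing which form a given biomarker takes.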
In a specific implementation, the automatic non-biased machine learning diagnosis system 116 prohibits or restricts user alteration of parameter settings for a specific data calculation process thereof, in order to ensure automatic machine calculation without human intervention (e.g., without human bias). This is because human bias tends to make it more difficult to find biomarkers of a wellness classification, when such biomarkers seem irrelevant to a human observer (e.g., scientist). In an example, in the automatic non-biased machine learning diagnosis system 116, each biological parameter that is taken into consideration by the automatic non-biased machine learning diagnosis system 116 has equal weight at least during an initial stage of the calculation. Stated in a different manner, during an initial stage of the calculation, the automatic non-biased machine learning diagnosis system 116 ignores no biological parameter. As the calculation process proceeds, the automatic non-biased machine learning diagnosis system 116 increasingly focuses on a first subset of the biological parameters as being correlated with a specific wellness classification, and less on a second subset of the biological parameters as being uncorrelated with the specific wellness classification (i.e., a noise component). Depending upon implementation-specific or other considerations, parameter setting alteration for the machine learning operation is protected through a user authentication system to ensure non-biased operation. Depending upon implementation-specific or other considerations, the machine learning is deep learning, neural network, linear discriminant analysis, quadratic discriminant analysis, support vector machine, random forest, nearest neighbor or a combination thereof.
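The progression from equal initial weights toward a correlated subset of parameters can be illustrated with a deliberately small stand-in model. The sketch below trains a single logistic unit by gradient descent, chosen purely for brevity and not asserted to be the model used by system 116. Every parameter starts with the same weight; after training, the weight on the parameter that tracks the classification has grown while the weight on the uncorrelated (noise) parameter has shrunk. The data, function name, learning rate, and epoch count are hypothetical.

```python
import math


def train_from_equal_weights(samples, labels, epochs=200, learning_rate=0.5):
    """Gradient descent on a logistic unit whose weights all start equal."""
    n_features = len(samples[0])
    weights = [1.0] * n_features  # no parameter is favoured or ignored at the start
    for _ in range(epochs):
        gradient = [0.0] * n_features
        for features, label in zip(samples, labels):
            z = sum(w * x for w, x in zip(weights, features))
            predicted = 1.0 / (1.0 + math.exp(-z))
            for j, x in enumerate(features):
                gradient[j] += (predicted - label) * x / len(samples)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Hypothetical data: parameter 0 tracks the classification, parameter 1 is noise.
samples = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
labels = [1, 1, 0, 0]
weights = train_from_equal_weights(samples, labels)
```

Here no human chooses which parameter matters; the shift in weight arises only from the data, which is the property the restriction on user alteration of parameter settings is intended to preserve.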
In a specific implementation, the automatic non-biased machine learning diagnosis system 116 compares abundance or dearth of determined biomarkers associated with a wellness classification with quantification of the corresponding biological parameter obtained from a subject, to diagnose a wellness classification state (positive or negative) of the subject. For example, it is possible to determine that a subject has a disease when quantifications of biological parameters obtained from the subject fall within a specific range of the determined biomarkers.
In a specific implementation, the automatic non-biased machine learning diagnosis system 116 determines an effect of a medical treatment for a disease by comparing quantifications of biomarkers obtained from subjects who have the disease and have not received the treatment, subjects who have the disease and have received the treatment, and healthy subjects not having the disease (and not receiving the treatment). Here, the medical treatment can include, but is not limited to, exercise regimens, dietary supplementation, weight loss, surgical intervention, device implantation, and treatment with therapeutics or prophylactics used in subjects diagnosed or identified with a wellness condition. For example, it is possible to determine whether a medical treatment has a medically-favorable effect to treat a wellness condition when quantifications of biomarkers obtained from subjects receiving treatment are closer to quantifications of biomarkers obtained from healthy subjects, compared to quantifications of biomarkers obtained from subjects without the treatment. In a specific implementation, the automatic non-biased machine learning diagnosis system 116 is further capable of determining progress of medical treatment by comparing quantifications of biological parameters obtained from subjects who have the wellness classification and have not received treatment, subjects who have the wellness classification and have received treatment, and subjects who do not have the wellness classification (and are not receiving the treatment). For example, it is possible to determine treatment can be terminated when quantifications of biomarkers obtained from subjects receiving treatment approximately match quantifications of biomarkers obtained from healthy subjects.
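One simple way to express the three-group comparison is to measure how far the treated and untreated groups lie from the healthy group. The sketch below assumes, for illustration only, that each group is summarized by the mean of each biomarker, that closeness is the Euclidean distance between those means, and that "approximately match" is decided by a hypothetical tolerance. It further assumes the biomarkers are on comparable scales. None of these choices is prescribed by the present disclosure.

```python
import math
from statistics import mean


def group_means(group: list) -> dict:
    """Mean of each biomarker across the subjects in a group."""
    return {name: mean(subject[name] for subject in group) for name in group[0]}


def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two biomarker profiles."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


def treatment_effect(untreated: list, treated: list, healthy: list, tolerance: float = 1.0) -> dict:
    """Compare untreated and treated groups against the healthy group."""
    healthy_profile = group_means(healthy)
    untreated_distance = distance(group_means(untreated), healthy_profile)
    treated_distance = distance(group_means(treated), healthy_profile)
    return {
        "untreated_distance": untreated_distance,
        "treated_distance": treated_distance,
        # Favorable: treated subjects are closer to healthy than untreated ones.
        "favorable": treated_distance < untreated_distance,
        # Termination candidate: treated subjects approximately match healthy ones.
        "approximately_healthy": treated_distance <= tolerance,
    }


# Hypothetical biomarker quantifications.
healthy = [{"a": 1.0, "b": 2.0}, {"a": 1.0, "b": 2.0}]
untreated = [{"a": 4.0, "b": 6.0}]
treated = [{"a": 1.0, "b": 2.5}]
result = treatment_effect(untreated, treated, healthy)
```

Repeating the calculation at successive time points would give a crude measure of treatment progress, and running it once per candidate treatment would allow the treatments to be ranked by how closely each brings subjects to the healthy profile.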
In a specific implementation, the automatic non-biased machine learning diagnosis system 116 is further capable of determining progress of wellness classification in a manner similar to determination of progress of treatment. In a specific implementation, the automatic non-biased machine learning diagnosis system 116 is further capable of determining or selecting an effective treatment from a plurality of possible treatments by comparing determined progress of the possible treatments.
In the example of
Appropriate platforms include, by way of example but not limitation, web pages (e.g., the determined biological parameters and/or the diagnosis result could be presented as a message on a personal web page, such as an individual web page of a hospital), electronic messages (e.g., emails, text messages, voice messages), print media (e.g. a letter), and other platforms suitable for providing content to a subject.
A specific example of operation for determining biological parameters for a specific wellness classification and diagnosing a subject based on the biological parameters using a system such as is illustrated in the example of
The automatic non-biased machine learning diagnosis system 116 determines one or more biological parameters that are considered to be associated with one or more wellness classifications based on quantification results of at least one of the glycomic parameters received from the glycomic parameter quantification system 104, the genomic parameters received from the genomic parameter quantification system 106, the proteomic parameters received from the proteomic parameter quantification system 108, the metabolic parameters received from the metabolic parameter quantification system 110, and the lipidomic parameters received from the lipidomic parameter quantification system 112, and/or based on quantification and/or non-quantification results of the clinical parameters received from the clinical parameter generation system 114. Advantageously, the automatic non-biased machine learning diagnosis system 116 performs the determination of the one or more biological parameters as the biomarkers based on a combination of data from two or more of the glycomic parameter quantification system 104, the genomic parameter quantification system 106, the proteomic parameter quantification system 108, the metabolic parameter quantification system 110, the lipidomic parameter quantification system 112, and the clinical parameter generation system 114, to improve accuracy of the biological parameters as the biomarkers.
In a specific implementation, the automatic non-biased machine learning diagnosis system 116 carries out diagnosis of a subject based on comparison of biological parameters with measured values or inspected state of the subject. The diagnosis result presentation system 118 carries out presentation (e.g., generation of a GUI) of biological parameters determined by the automatic non-biased machine learning diagnosis system 116 and/or presentation (e.g., generation of a GUI) of a diagnostic result (e.g., positive or negative) generated by the automatic non-biased machine learning diagnosis system 116.
To quantify respective biological parameters (e.g., glycomic parameters, genomic parameters, proteomic parameters, metabolic parameters, lipidomic parameters), system 100 may perform one or more quantification operations in connection with the universe of mass spectral data obtained from the mass spectrometry technologies utilized in a given embodiment of the present disclosure. In some embodiments, for example, system 100 may utilize one or more peak picking tools and related integration methods to quantify one or more respective biological parameters within a biological sample or set of biological samples. In some embodiments, a system of the present disclosure such as system 100 may be equipped with a subsystem or platform that one or more of systems 104-112 may leverage in performing quantification. An example implementation of such an embodiment is illustrated in
Acquisition component 132 may be configured to obtain a mass spectra dataset from a source (e.g., sample data repository 122) and make such mass spectra dataset information accessible to one or more other elements of system 120, including, for example, one or more components of peak integration platform 130, such as feature extraction component 134, consensus/ensemble component 136, and peak integration component 138. Acquisition component 132 may further be configured to store copies of obtained datasets in one or more other data repositories connected thereto. Acquisition component 132 may obtain data responsive to a user-prompted command, based on an automated trigger (e.g., a preset or periodic pulling of data at a particular time and from a particular source), or on a continuous basis. For example, acquisition component 132 may receive an indication from a user (e.g., by the user making selections via a computing device) that the user desires to load a particular mass spectra dataset associated with a new biological sample from a subject under investigation. Acquisition component 132 may further be configured to make obtained datasets available for access to one or more components sequentially, simultaneously (i.e., in parallel), in series in accordance with a predefined order, or in another arrangement based on predetermined criteria. Acquisition component 132 may be a standalone application that facilitates the download of mass spectral dataset information in a specialized manner, or it may operate in concert with another application to effectuate the same.
Feature extraction component 134 may be configured to receive mass spectra data (e.g., associated with one or more biological samples from one or more subjects) from acquisition component 132, and to extract (i.e., identify) one or more proteomic features represented within the data. To effectuate feature extraction, feature extraction component 134 may be configured to extract peptide-induced signals (i.e., peaks) from the raw mass spectral data, or from pre-processed mass spectral data. A mass spectra dataset associated with a biological sample from a subject may contain tens to thousands of spectra (corresponding to intensity information for many different mass channels corresponding to isotopes) associated with many different molecular species (e.g., different molecules). Feature extraction component 134 may be configured to analyze the mass spectra dataset to determine whether any observed spectral patterns in the dataset (e.g., observed isotope distributions, peaks, etc.) correspond to a known molecular species, or to an unknown but statistically significant or apparent molecular species. Known spectral patterns and/or isotope distributions corresponding to known molecular species may be stored in transition list repository 124 and be accessible to feature extraction component 134 during operation. For example, transition list repository 124 may include information associated with known transitions between peaks and valleys that are associated with a particular feature. Transition list repository 124 may further include predetermined peak waveforms having predetermined start and stop points for integration (start and stop points generally corresponding to the valleys on either side of a peak associated with a known feature).
Because mass spectral data can often include mixtures of overlapping isotope patterns and abundant noise, feature extraction component 134 may be configured to identify combinations of overlapping individual peaks, and filter out or otherwise reduce chemical and/or detector noise in the dataset.
Feature extraction component 134 may utilize a peak picking tool known in the art, such as NITPICK, Skyline, OpenMS, DIA-Umpire, PECAN, XCMS, multiplierz, MZmine, T-BioInfo, MASS++, msInspect, MassSpecWavelet, MALDIquant, EigenMS, PrepMS, LC-IMS-MS-Feature-Finder, mMass, IMTBX (Ion Mobility Toolbox), Grppr (Grouper), mzDesktop, Cromwell, MapQuant, pParse, MzJava, HappyTools, Mass-UP, LIMPIC, SpiceHit, ProteinPilot, PROcess, GAGfinder, Intact Mass, JUMBO, Maltcms, SpectroDive, enviPick, findMF, PNNL PreProcessor, msXpertSuite, LCMS-2D, or Siren (Sparse Isotope RegressionN). Feature extraction component 134 may be configured to apply or enable only unbiased features of any one or more of the foregoing, disallowing human intervention in the peak picking process.
In some embodiments, feature extraction component 134 may apply any two or more peak picking operations to a given dataset (e.g., in parallel) to obtain two or more sets of feature extraction results for the dataset. Consensus/ensemble component 136 may be configured to obtain multiple sets of feature extraction results for a dataset from feature extraction component 134, and identify consensus or non-consensus among the multiple sets of feature extraction results, or among portions of the multiple sets of feature extraction results. Consensus may be considered on a feature-by-feature basis, across the dataset as a whole, or according to any other desired criteria. In some embodiments, consensus for a given extracted feature (i.e., for a given peak and its associated transitions) may be achieved when a predetermined number, percentage, or ratio of the applied peak picking operations arrive at an identification of the same peak within a given dataset.
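As a minimal sketch of the feature-by-feature consensus described above, the following Python groups peak positions reported by several peak picking operations and keeps only peaks that enough operations agree on. The function name `consensus_peaks`, the position tolerance `tol`, and the `min_fraction` threshold are hypothetical and chosen for illustration; the disclosure does not prescribe a particular matching rule.

```python
def consensus_peaks(picks_by_tool, min_fraction=0.5, tol=0.01):
    """Return mean positions of peaks found by at least min_fraction of tools.

    picks_by_tool: one list of peak positions (e.g., m/z values) per peak
    picking operation. tol and min_fraction are illustrative assumptions.
    """
    n_tools = len(picks_by_tool)
    # Pool every reported peak together with the index of the tool that found it.
    pooled = sorted(
        (pos, tool) for tool, picks in enumerate(picks_by_tool) for pos in picks
    )
    # Greedily group peaks lying within tol of the first peak in the group.
    clusters = []
    for pos, tool in pooled:
        if clusters and pos - clusters[-1][0][0] <= tol:
            clusters[-1].append((pos, tool))
        else:
            clusters.append([(pos, tool)])
    consensus = []
    for cluster in clusters:
        tools = {tool for _, tool in cluster}  # distinct tools that agree
        if len(tools) / n_tools >= min_fraction:
            consensus.append(sum(p for p, _ in cluster) / len(cluster))
    return consensus


picks = [
    [100.000, 250.500, 400.100],  # hypothetical output of tool A
    [100.004, 250.498],           # hypothetical output of tool B
    [100.002, 399.000],           # hypothetical output of tool C
]
agreed = consensus_peaks(picks, min_fraction=0.6)
```

With these made-up inputs, the peaks near 100.0 and 250.5 are retained because at least two of the three operations report them, while the isolated peaks near 399.0 and 400.1 are dropped.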
In some embodiments, consensus/ensemble component 136 may generate a consensus dataset comprising a single set of feature extraction results that contains data for extracted features upon which consensus was obtained across multiple peak picking operations. In some embodiments, consensus/ensemble component 136 may generate an ensemble dataset comprising a single set of feature extraction results that is representative of the extracted features for which there was substantial similarity across multiple peak picking operations. In such embodiments, consensus/ensemble component 136 may be configured to generate the ensemble dataset by combining the feature extraction results across multiple sets of feature extraction results (e.g., on a feature-specific basis) using a statistical operation to define one or more characteristics of a peak (e.g., a valley, a transition, a tip of the peak, a slope of the peak waveform at a point along the waveform, etc.). Such a statistical operation may include one or more of an average, a median, a weighted combination, or any other combination.
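The ensemble combination described above can be illustrated with a short Python sketch that merges several estimates of the same peak using the median, one of the statistical operations named in the text. The dictionary keys `start`, `apex`, and `stop` and the function name `ensemble_peak` are assumptions made for this example only.

```python
from statistics import median


def ensemble_peak(estimates):
    """Combine per-tool estimates of one peak into a single ensemble estimate.

    estimates: list of dicts, each with hypothetical keys 'start', 'apex',
    and 'stop' (e.g., retention times of the two valleys and the peak tip).
    The median is used here; an average or weighted combination would be
    equally consistent with the description.
    """
    keys = ("start", "apex", "stop")
    return {key: median(e[key] for e in estimates) for key in keys}


# Hypothetical estimates of the same feature from three peak picking operations.
estimates = [
    {"start": 10.0, "apex": 10.5, "stop": 11.0},
    {"start": 10.1, "apex": 10.6, "stop": 11.3},
    {"start": 9.9, "apex": 10.4, "stop": 11.1},
]
combined = ensemble_peak(estimates)
```

The median is less sensitive than the mean to a single operation that places a valley far from the others, which is one reason it might be preferred for boundary estimates.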
Peak integration component 138 may be configured to obtain one or more feature extraction results from one or more of feature extraction component 134 and consensus/ensemble component 136 (or another component or element of system 120), and perform an integration to determine the area under the intensity curve that defines the peak associated with a given extracted feature (e.g., a given molecule). Peak integration component 138 may employ any type of integration method (e.g., trapezoidal integration, rectangular integration, etc.). The area under the intensity curve for a given feature (even a unitless area) can be said to correspond to a quantity of molecules that are associated with that feature within a biological sample under consideration. The systems of the present disclosure need not generate a plot or graphical representation of spectra, peak waveforms, or any other data in order to operate.
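For illustration, trapezoidal integration of a peak between its start and stop points can be sketched in Python as follows. The function name `trapezoid_area` and the sample values are hypothetical; this is one possible reading of the integration step, not a definitive implementation.

```python
def trapezoid_area(times, intensities, start, stop):
    """Approximate the area under an intensity curve between start and stop.

    times and intensities are parallel sequences of sampled points; start and
    stop are the integration bounds (e.g., the valleys on either side of a
    peak). Only sampled points inside the bounds are used in this sketch.
    """
    points = [(t, y) for t, y in zip(times, intensities) if start <= t <= stop]
    area = 0.0
    # Sum the trapezoid formed by each pair of neighboring sample points.
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Hypothetical triangular peak sampled at five time points.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
intensities = [0.0, 2.0, 4.0, 2.0, 0.0]
full_area = trapezoid_area(times, intensities, 0.0, 4.0)
core_area = trapezoid_area(times, intensities, 1.0, 3.0)
```

For the triangular example, the full peak integrates to 8.0 and the central portion between 1.0 and 3.0 integrates to 6.0, matching the exact areas because the curve is piecewise linear.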
In an implementation, the non-biased deep learning engine 305 forms an artificial neural network (ANN) comprising an input layer, an output layer, and one or more hidden layers formed between the input layer and the output layer. The input layer includes a plurality of artificial neurons, and each artificial neuron of the input layer receives as input one quantification of some or all types of glycosylated peptide fragments and, optionally, one or more parameters representing a condition of a subject. Similarly, each of the one or more hidden layers includes a plurality of artificial neurons, and each artificial neuron of a hidden layer receives as input one or more outputs of artificial neurons of the immediately preceding layer (e.g., the input layer or another hidden layer). In each artificial neuron of the one or more hidden layers, inputs from the immediately preceding layer are received at certain weights according to an algorithm, and a certain calculation (e.g., XOR) is carried out. Outputs from artificial neurons of the last hidden layer are input to one or more artificial neurons of the output layer, and the output layer outputs one or more biological parameters as the candidate biomarkers to predict a classification (e.g., disease state). Depending upon implementation-specific or other considerations, the ANN of the non-biased deep learning engine 305 may be a feedforward neural network, in which connections between layers do not form a cycle, or a recurrent neural network (RNN), in which connections between layers form a directed cycle. Depending upon implementation-specific or other considerations, a single unit of the non-biased deep learning engine 305 may perform a deep learning process for multiple wellness classifications of interest.
In an alternative, a separate unit of the non-biased deep learning engine 305 may be provided for each wellness classification of interest.
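The layered structure described above (an input layer, one or more hidden layers, and an output layer, with weighted inputs passed forward without cycles) can be sketched as a small feedforward pass in Python. The sigmoid activation, the layer sizes, and all weight values are assumptions made for this illustration; the disclosure does not specify them, and this sketch emits a single score rather than a list of candidate biomarkers.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, biases):
    """Compute one layer: each neuron weights every output of the prior layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass inputs through each (weights, biases) layer in order, no cycles."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_output(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]], [0.0, 0.1]),  # hidden layer
    ([[0.7, -0.6]], [0.05]),                              # output layer
]
quantifications = [1.2, 0.8, 2.5]  # made-up glycopeptide quantifications
score = forward(quantifications, layers)
```

Training, in which the weights are maintained or modified based on validation feedback, is outside the scope of this sketch.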
In a specific implementation, the matching results obtained by the internal validation engine 306 are fed back to the data categorization engine 302, and based on the matching results, the data categorization engine 302 maintains or modifies the manner of categorizing the quantification results into a training data group and a test data group. In a specific implementation, the matching results obtained by the internal validation engine 306 are fed back to the non-biased deep learning engine 305, and based on the matching results, the non-biased deep learning engine 305 maintains or modifies weights to be applied to each artificial neuron of the ANN.
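As a hedged illustration of categorizing quantification results into a training data group and a test data group, and of modifying that categorization in response to feedback, consider the Python sketch below. The function name `categorize`, the shuffling strategy, and the `train_fraction` and `seed` parameters are assumptions; the disclosure does not state how the categorization is performed or changed.

```python
import random


def categorize(sample_ids, train_fraction=0.7, seed=0):
    """Split sample identifiers into a training group and a test group.

    train_fraction and seed are hypothetical knobs: changing either one
    models "modifying the manner of categorizing" after validation feedback.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


samples = [f"S{i:02d}" for i in range(10)]  # made-up sample identifiers
train, test = categorize(samples)

# If internal validation reports poor matching, one possible response is to
# re-categorize with a different ratio (here 80/20) and a new shuffle.
train2, test2 = categorize(samples, train_fraction=0.8, seed=1)
```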
In a specific implementation, similarly to the internal validation engine 306, the matching results obtained by the external validation engine 308 are fed back to the data categorization engine 302, and based on the matching results, the data categorization engine 302 maintains or modifies the manner of categorizing the quantification results into the training data group and the test data group, and/or the training-to-test ratio. In addition, the matching results obtained by the external validation engine 308 are fed back to the non-biased deep learning engine 305, and based on the matching results, the non-biased deep learning engine 305 maintains or modifies the weights to be applied to each artificial neuron of the ANN and/or other operational parameters of the deep learning to improve the accuracy of determining the wellness classification.
If, on the other hand, it is determined that the prediction diagnosis of the wellness classification is not performed with respect to new subjects (414-N), e.g., if the wellness classification state of the new subjects is known, the flowchart 400 proceeds to module 418, where the validated biomarkers undergo extensive validation with reference to quantification results of the new subjects. In a specific implementation, extensive validation includes determining whether a positive subject of the wellness classification has quantifications of the one or more corresponding biological parameters that match the abundance or dearth indicated by the validated biomarkers, and whether a negative subject of the wellness classification has quantifications of the one or more corresponding biological parameters that do not match the abundance or dearth indicated by the validated biomarkers.
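The matching check at the heart of this extensive validation can be sketched in Python for a single biomarker. Representing a biomarker as a threshold plus a direction ("abundance" or "dearth") is an assumption made for this example, as are the function names and the sample values.

```python
def matches_biomarker(quantification, threshold, direction):
    """Return True if a quantification shows the biomarker's expected pattern.

    direction is 'abundance' (value at or above threshold) or 'dearth'
    (value at or below threshold); both are illustrative conventions.
    """
    if direction == "abundance":
        return quantification >= threshold
    return quantification <= threshold


def validation_rate(subjects, threshold, direction):
    """Fraction of subjects whose known state agrees with the biomarker.

    subjects: list of (quantification, is_positive) pairs. A positive subject
    should match the biomarker pattern; a negative subject should not.
    """
    agreeing = sum(
        1
        for quantification, is_positive in subjects
        if matches_biomarker(quantification, threshold, direction) == is_positive
    )
    return agreeing / len(subjects)


# Made-up new subjects with known wellness classification states.
new_subjects = [(5.0, True), (4.2, True), (1.0, False), (3.5, False)]
rate = validation_rate(new_subjects, threshold=3.0, direction="abundance")
```

In this made-up cohort, three of the four subjects agree with the biomarker; the negative subject with a quantification of 3.5 shows the abundance pattern and therefore counts against it.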
In a specific implementation, the biomarker-based diagnosis engine 503 determines whether a treatment applied to a subject is effective, by determining whether a quantification of a biological parameter obtained from a biological sample of the subject approaches a specific range corresponding to a healthy state, departing from another specific range corresponding to a wellness classification state, indicated by details of the biomarker, in comparison to the quantification that was obtained before the treatment was applied to the subject.
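One simple way to express the treatment-effectiveness check described above is to compare how far the quantification lies from the healthy range before and after treatment. The Python sketch below makes that assumption explicit; the function names, the distance measure, and the example ranges are hypothetical rather than taken from the disclosure.

```python
def distance_to_range(value, low, high):
    """Distance from value to the closed interval [low, high]; 0 if inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def treatment_effective(before, after, healthy_range):
    """Return True if the post-treatment value moved closer to the healthy range.

    before and after are quantifications of the same biological parameter
    obtained before and after the treatment was applied to the subject.
    """
    low, high = healthy_range
    return distance_to_range(after, low, high) < distance_to_range(before, low, high)


healthy = (2.0, 5.0)  # made-up healthy range for a biomarker quantification
improved = treatment_effective(before=9.0, after=6.0, healthy_range=healthy)
worsened = treatment_effective(before=9.0, after=9.5, healthy_range=healthy)
```

A fuller version might also test that the value is departing from the range associated with the wellness classification state, as the text describes; that second range is omitted here for brevity.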
In a specific implementation, the biomarker-based diagnosis engine 503 determines an objective wellness classification progress of a subject, by determining whether a quantification of a biological parameter obtained from a biological sample of the subject increases or decreases in a specific range corresponding to a wellness classification state, departing from another specific range corresponding to a healthy state, indicated by details of the biomarker, in comparison to the quantification that was obtained previously after the subject was diagnosed as having the wellness classification. For example, after a subject has been diagnosed as having heart disease, a stage of the heart disease is objectively determined based on the biomarker level.
In a specific implementation, the biomarker-based diagnosis engine 503 determines (or selects) a treatment that is considered to be suitable for a subject having a wellness classification based on diagnosis results, in particular, treatment effectiveness results, stored in the diagnosis result datastore 504. For example, the biomarker-based diagnosis engine 503 retrieves from the diagnosis result datastore 504 treatment effectiveness results of a plurality of different treatments that have been applied to subjects having the wellness classification, and selects a best treatment from the plurality of treatments, based on the quantification results of the subject and the biomarkers.
The methods of the present disclosure are applicable to any disease or condition that can be detected by analyzing the biological parameters obtained from the biological samples of a subject. In some embodiments, the disease or condition is cancer. In other embodiments, the cancer is acute lymphocytic leukemia (ALL), acute myeloid leukemia (AML), adrenocortical cancer, anal cancer, bladder cancer, blood cancer, bone cancer, brain tumor, breast cancer, cancer of the female genital system, cancer of the male genital system, central nervous system lymphoma, cervical cancer, childhood rhabdomyosarcoma, childhood sarcoma, chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), colon and rectal cancer, colon cancer, endometrial cancer, endometrial sarcoma, esophageal cancer, eye cancer, gallbladder cancer, gastric cancer, gastrointestinal tract cancer, hairy cell leukemia, head and neck cancer, hepatocellular cancer, Hodgkin's disease, hypopharyngeal cancer, Kaposi's sarcoma, kidney cancer, laryngeal cancer, leukemia, liver cancer, lung cancer, malignant fibrous histiocytoma, malignant thymoma, melanoma, mesothelioma, multiple myeloma, myeloma, nasal cavity and paranasal sinus cancer, nasopharyngeal cancer, nervous system cancer, neuroblastoma, non-Hodgkin's lymphoma, oral cavity cancer, oropharyngeal cancer, osteosarcoma, ovarian cancer, pancreatic cancer, parathyroid cancer, penile cancer, pharyngeal cancer, pituitary tumor, plasma cell neoplasm, primary CNS lymphoma, prostate cancer, rectal cancer, respiratory system, retinoblastoma, salivary gland cancer, skin cancer, small intestine cancer, soft tissue sarcoma, stomach cancer, testicular cancer, thyroid cancer, urinary system cancer, uterine sarcoma, vaginal cancer, vascular system, Waldenstrom's macroglobulinemia, Wilms' tumor, and the like. In another embodiment, the cancer is breast cancer, cervical cancer or ovarian cancer.
In another embodiment, the disease is an autoimmune disease. In another embodiment, the autoimmune disease is acute disseminated encephalomyelitis, Addison's disease, agammaglobulinemia, age-related macular degeneration, alopecia areata, amyotrophic lateral sclerosis, ankylosing spondylitis, antiphospholipid syndrome, antisynthetase syndrome, atopic allergy, atopic dermatitis, autoimmune aplastic anemia, autoimmune cardiomyopathy, autoimmune enteropathy, autoimmune hemolytic anemia, autoimmune hepatitis, autoimmune inner ear disease, autoimmune lymphoproliferative syndrome, autoimmune peripheral neuropathy, autoimmune pancreatitis, autoimmune polyendocrine syndrome, autoimmune progesterone dermatitis, autoimmune thrombocytopenic purpura, autoimmune urticaria, autoimmune uveitis, Balo disease/Balo concentric sclerosis, Behcet's disease, Berger's disease, Bickerstaff's encephalitis, Blau syndrome, bullous pemphigoid, cancer, Castleman's disease, celiac disease, Chagas disease, chronic inflammatory demyelinating polyneuropathy, chronic recurrent multifocal osteomyelitis, chronic obstructive pulmonary disease, Churg-Strauss syndrome, cicatricial pemphigoid, Cogan syndrome, cold agglutinin disease, complement component 2 deficiency, contact dermatitis, cranial arteritis, CREST syndrome, Crohn's disease, Cushing's syndrome, cutaneous leukocytoclastic angiitis, Degos disease, Dercum's disease, dermatitis herpetiformis, dermatomyositis, diabetes mellitus type 1, diffuse cutaneous systemic sclerosis, Dressler's syndrome, drug-induced lupus, discoid lupus erythematosus, eczema, endometriosis, enthesitis-related arthritis, eosinophilic fasciitis, eosinophilic gastroenteritis, epidermolysis bullosa acquisita, erythema nodosum, erythroblastosis fetalis, essential mixed cryoglobulinemia, Evans syndrome, fibrodysplasia ossificans progressiva, fibrosing alveolitis, gastritis, gastrointestinal pemphigoid, glomerulonephritis, Goodpasture's syndrome, Graves' disease, Guillain-Barre syndrome, Hashimoto's encephalopathy, Hashimoto's thyroiditis, Henoch-Schonlein purpura, HIV, gestational pemphigoid, hidradenitis suppurativa, Hughes-Stovin syndrome, hypogammaglobulinemia, idiopathic inflammatory demyelinating diseases, idiopathic pulmonary fibrosis, idiopathic thrombocytopenic purpura, IgA nephropathy, inclusion body myositis, chronic inflammatory demyelinating polyneuropathy, interstitial cystitis, juvenile idiopathic arthritis, Kawasaki's disease, Lambert-Eaton myasthenic syndrome, leukocytoclastic vasculitis, lichen planus, lichen sclerosus, linear IgA disease, lupus erythematosus, Majeed syndrome, Meniere's disease, microscopic polyangiitis, mixed connective tissue disease, morphea, Mucha-Habermann disease, multiple sclerosis, myasthenia gravis, myositis, narcolepsy, neuromyelitis optica, neuromyotonia, ocular cicatricial pemphigoid, opsoclonus myoclonus syndrome, Ord's thyroiditis, palindromic rheumatism, pediatric autoimmune neuropsychiatric disorders associated with streptococcus, paraneoplastic cerebellar degeneration, paroxysmal nocturnal hemoglobinuria, Parry Romberg syndrome, Parsonage-Turner syndrome, pars planitis, pemphigus vulgaris, pernicious anemia, perivenous encephalomyelitis, POEMS syndrome, polyarteritis nodosa, polymyalgia rheumatica, polymyositis, primary biliary cirrhosis, primary sclerosing cholangitis, progressive inflammatory neuropathy, psoriasis, psoriatic arthritis, pyoderma gangrenosum, pure red cell aplasia, Rasmussen's encephalitis, Raynaud phenomenon, relapsing polychondritis, Reiter's syndrome, restless leg syndrome, retroperitoneal fibrosis, rheumatoid arthritis, rheumatic fever, sarcoidosis, schizophrenia, Schmidt syndrome, Schnitzler syndrome, scleritis, scleroderma, serum sickness, Sjogren's syndrome, spondyloarthropathy, stiff person syndrome, subacute bacterial endocarditis, Susac's syndrome, Sweet's syndrome, sympathetic ophthalmia, Takayasu's arteritis, temporal arteritis, thrombocytopenia, Tolosa-Hunt syndrome, transverse myelitis, ulcerative colitis, undifferentiated connective tissue disease, urticarial vasculitis, vasculitis, vitiligo, Wegener's granulomatosis, and the like. In another embodiment, the autoimmune disease is HIV, primary sclerosing cholangitis, primary biliary cirrhosis or psoriasis.
Quantification of IgG Glycopeptides as Biomarkers for Breast Cancer
Quantification of IgG Glycopeptides as Potential Biomarkers for PSC and PBC
Example 2 shows quantification results of changes in IgG, IgM and IgA glycopeptides in plasma samples from patients having primary biliary cirrhosis (PBC), patients having primary sclerosing cholangitis (PSC), and healthy donors (those who have neither PBC nor PSC) with reference to
In Example 2, plasma samples from patients having PSC, patients having PBC and plasma samples from healthy donors were analyzed for IgG1 and IgG2 glycopeptides and the changes in their glycopeptide ratios were compared. Specifically, 100 PBC plasma samples, 76 PSC plasma samples and plasma samples from 49 healthy donors were subjected to MRM quantitative analysis on a QQQ mass spectrometer. As can be seen from the quantitative results in
Further, a mapping of the separate and combined discriminant analysis results using K-means clustering is shown in
These and other examples provided in this paper are intended to illustrate but not necessarily to limit the described implementation. As used herein, the term “implementation” means an implementation that serves to illustrate by way of example but not limitation. The techniques described in the preceding text and figures can be mixed and matched as circumstances demand to produce alternative implementations.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US18/56574 | 10/18/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62573959 | Oct 2017 | US |