Over the past 20 years, mass spectrometry (MS) has emerged as a dynamic tool for proteomics-based biomarker discovery, providing more information than can be obtained from other high-throughput approaches. However, published biomarker candidates from MS studies often fail to translate to the clinic because promising claims from original studies cannot be independently reproduced.
Provided herein are methods and systems that provide targeted proteomics workflows that effectively identify protein biomarkers associated with diseases such as, for example, colorectal cancer. The present disclosure recognizes that the failures of past mass spectrometry studies can be attributed to various shortcomings such as in study design, sample quality, assay robustness, assay reproducibility, and/or quality control. Accordingly, certain aspects of the methods and systems disclosed herein utilize quality and/or process control metrics and procedures to enhance predictive accuracy and consistency.
Provided herein are noninvasive methods of assessing a CRC status in an individual, for example using a blood sample of an individual. Some such methods comprise the steps of obtaining a circulating blood sample from the individual; obtaining a biomarker panel level for a biomarker panel comprising a list of proteins in the sample comprising A2GL, ALS, and PTPRJ, and also including individual age and gender as biomarkers to comprise panel information from said individual, and using said panel information to make a CRC health assessment. Some approaches comprise comparing said panel information from said individual to a reference panel information set corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as having said colorectal cancer status if said individual's reference panel information does not differ significantly from said reference panel information set. Some approaches comprise using panel levels in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as having said colorectal cancer status if said individual's reference panel information does not differ significantly from said reference panel information set. 
Some approaches comprise using ratios of selected biomarkers relative to one another in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as having said colorectal cancer status if said individual's reference panel information does not differ significantly from said reference panel information set.
Some approaches comprise comparing said panel information from said individual to a reference panel information set corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as having a CRC status different from said reference panel if said individual's reference panel information differs significantly from said reference panel information set. Some approaches comprise using panel levels in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as not having said colorectal cancer status if said individual's reference panel information differs significantly from said reference panel information set. Some approaches comprise using ratios of selected biomarkers relative to one another in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as not having said colorectal cancer status if said individual's reference panel information differs significantly from said reference panel information set.
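As an illustration only, the comparison of an individual's panel score to a reference panel information set can be sketched as a z-score test. The weighted-sum scoring algorithm, the weights, the reference scores, and the 1.96 cutoff below are all hypothetical values chosen for the example; the disclosure does not specify them.

```python
from statistics import mean, stdev

# Hypothetical weights for a linear panel score; not taken from the disclosure.
WEIGHTS = {"A2GL": 0.5, "ALS": -0.3, "PTPRJ": 0.8, "age": 0.02}

def panel_score(panel_info, weights=WEIGHTS):
    """Combine panel information into one score (weighted sum, an assumed algorithm)."""
    return sum(weights[name] * panel_info[name] for name in weights)

def differs_significantly(score, reference_scores, z_cutoff=1.96):
    """True if the score lies more than z_cutoff sample SDs from the reference mean."""
    z = (score - mean(reference_scores)) / stdev(reference_scores)
    return abs(z) > z_cutoff

def categorize(score, reference_scores, status):
    """Assign the reference status unless the score differs significantly from it."""
    if differs_significantly(score, reference_scores):
        return "not " + status
    return status

# Illustrative reference scores from individuals of known "no CRC" status.
no_crc_reference = [1.0, 1.2, 0.9, 1.1, 1.0]
individual = {"A2GL": 1.0, "ALS": 1.0, "PTPRJ": 1.0, "age": 60}
print(categorize(panel_score(individual), no_crc_reference, "no CRC"))
```

The same function covers both framings described above: an individual is categorized as having the reference status when the score does not differ significantly, and as not having it when the score does.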
Some CRC panels disclosed herein demonstrate a validation area under the curve (AUC), a parameter of panel test success, of at least 0.80, such as 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, or greater than 0.90. In some cases, one observes a CRC AUC of 0.82 or about 0.82, a validation sensitivity of 0.81 or about 0.81, and a validation specificity of 0.78 or about 0.78.
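For readers less familiar with these metrics, the following minimal sketch shows how AUC, sensitivity, and specificity can be computed from panel scores of known positive and known negative samples. The scores and the threshold are made-up illustrative values, not data from the disclosure.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Call a sample positive when its score is at or above the threshold."""
    true_pos = sum(s >= threshold for s in scores_pos)
    true_neg = sum(s < threshold for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)

# Illustrative scores only.
crc_scores = [0.9, 0.8, 0.7, 0.4]
non_crc_scores = [0.6, 0.5, 0.3, 0.2]
print(auc(crc_scores, non_crc_scores))
print(sensitivity_specificity(crc_scores, non_crc_scores, threshold=0.65))
```

The AUC is independent of any threshold, whereas sensitivity and specificity describe one operating point on the curve, which is why a panel is reported with all three.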
Also provided herein are noninvasive methods of assessing an advanced adenoma status in an individual, for example using a blood sample of an individual. Some such methods comprise the steps of obtaining a circulating blood sample from the individual; obtaining a biomarker panel level for a biomarker panel comprising a list of proteins in the sample comprising A2GL, ALS, and PTPRJ, and obtaining the age of the individual as biomarkers to comprise panel information from said individual, and using said panel information to make a CRC health assessment. Some approaches comprise comparing said panel information from said individual to a reference panel information set corresponding to a known AA status; and categorizing said individual as having said AA status if said individual's reference panel information does not differ significantly from said reference panel information set. Some approaches comprise using panel levels in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known AA status; and categorizing said individual as having said AA status if said individual's reference panel information does not differ significantly from said reference panel information set. Some approaches comprise using ratios of selected biomarkers relative to one another in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known AA status; and categorizing said individual as having said AA status if said individual's reference panel information does not differ significantly from said reference panel information set.
Some approaches comprise comparing said panel information from said individual to a reference panel information set corresponding to a known AA status; and categorizing said individual as having an AA status different from said reference panel if said individual's reference panel information differs significantly from said reference panel information set. Some approaches comprise using panel levels in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known AA status; and categorizing said individual as not having said AA status if said individual's reference panel information differs significantly from said reference panel information set. Some approaches comprise using ratios of selected biomarkers relative to one another in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known AA status; and categorizing said individual as not having said AA status if said individual's reference panel information differs significantly from said reference panel information set.
In light of the above and the disclosure herein, provided herein are methods, compositions, kits, computer readable media, and systems for the diagnosis and/or treatment of at least one of advanced colorectal adenoma and colorectal cancer. Through the methods and compositions provided herein, a sample is taken from an individual. In some cases the individual presents no symptoms of colorectal cancer, or advanced adenoma, or both colorectal cancer and adenoma. Some individuals are tested as part of routine health observation or monitoring. Alternately, some individuals are tested in relation to presenting at least one symptom of a colorectal health issue such as colorectal cancer, or advanced adenoma, or both colorectal cancer and adenoma. In some cases the individual is identified as being at risk of colorectal cancer, or advanced adenoma, or both colorectal cancer and adenoma. The sample is assayed to determine the accumulation levels of a panel of markers such as proteins, or proteins and age, or proteins and gender, or proteins and age and gender, for example a panel of markers comprising or consisting of the markers in panels disclosed herein. In many cases the panels comprise proteins that individually are known to play a role in indicating the presence of advanced colorectal adenoma or colorectal cancer, while in other cases the panels comprise a protein or proteins not known to correlate with advanced colorectal adenoma or colorectal cancer. However, in all cases the identification and accumulation of markers into a panel results in a level of specificity, sensitivity or specificity and sensitivity that substantially surpasses that of individual markers or smaller or less accurate sets of markers.
Additionally, methods, panels and other tests disclosed herein substantially surpass the sensitivity, specificity, or sensitivity and specificity of many commercially available tests, in particular many currently available blood-based tests. Methods, panels and other tests disclosed herein have the further benefit of being easily executed, such that an individual in need of gastrointestinal health evaluation test results is much more likely to have this test performed, rather than collecting a stool sample or having an invasive procedure such as a colonoscopy, for example. Panel accumulation levels are measured in a number of ways in various embodiments, for example through an antibody fluorescence binding assay or an ELISA, through mass spectrometry analysis, through detection of fluorescence of an antibody set, or through alternate approaches to protein accumulation level quantification.
Panel accumulation levels are assessed through a number of approaches consistent with the disclosure herein. For example panel accumulation levels are compared to a positive control or negative control standard comprising at least one and up to 10, 100, or more than 100 standards of known colorectal health status, or to a model of advanced colorectal adenoma or colorectal cancer accumulation levels or of healthy accumulation levels, such that a prediction is made regarding an assayed individual's health status. Alternately or in combination, panel results are compared to a machine learning or other model trained on or built upon data obtained from known positive or known negative patient samples. In some cases, a panel assay result is accompanied by a recommendation regarding an intervention or an alternate verification of the panel assay results.
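A model built upon known positive and known negative samples can take many forms. The sketch below uses a nearest-centroid classifier purely as a simple stand-in; it is not stated to be the model used in the disclosure, and the two-marker sample values are invented for illustration.

```python
from math import dist
from statistics import mean

def centroid(samples):
    """Per-marker mean across a list of equally long measurement vectors."""
    return [mean(column) for column in zip(*samples)]

def train(known_positive, known_negative):
    """Build the model: one centroid per known health status."""
    return {"positive": centroid(known_positive),
            "negative": centroid(known_negative)}

def predict(model, sample):
    """Return the status whose centroid is closest (Euclidean) to the sample."""
    return min(model, key=lambda label: dist(model[label], sample))

# Invented two-marker accumulation levels for samples of known status.
model = train(known_positive=[[2.0, 1.0], [2.2, 0.8]],
              known_negative=[[1.0, 2.0], [0.8, 2.2]])
print(predict(model, [1.9, 1.1]))
```

In practice the predicted status could then be paired with a recommendation, such as a confirmatory test, as described above.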
Accordingly, provided herein are biomarker panels and assays useful for the diagnosis and/or treatment of at least one of advanced colorectal adenoma and colorectal cancer.
Also provided herein are kits, comprising a computer readable medium described herein, and instructions for use of the computer readable medium.
A number of treatment regimens are contemplated herein and known to one of skill in the art, such as chemotherapy, administration of a biologic therapeutic agent, and surgical intervention such as low anterior resection or abdominoperineal resection, or ostomy.
Also provided herein are approaches for determining a panel of biomarkers suitable for assessing colorectal health status such as colorectal cancer, advanced colorectal adenoma, and/or stage of colorectal cancer.
Described herein are the development and experimental steps of a method for identifying biomarkers relevant to disease or health status. A number of approaches are consistent with the disclosure herein, such as a large-scale dMRM-based workflow. A number of approaches include the use of at least one process control to evaluate aspects of the analytical instrumentation. In some cases, the method implements SST, using a SIS peptide mixture, a pooled plasma sample, or any combination thereof as reference material. In some cases, the instrumentation metrics that are evaluated include consistency of the response, carryover, retention time stability, signal-to-noise, or other suitable metrics. In certain instances, quality controls in the form of pooled plasma samples are used to monitor and, if needed, correct the analytical variability during sample processing and analysis. Quality control metrics can be utilized to assess the sample and/or sample processing. The use of QC markers to provide information indicative of workflow or assay performance is consistent with the present disclosure and can include markers that undergo at least one of collection, storage, elution, processing, and analysis together with the sample.
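The instrumentation metrics named above can be computed from repeated injections of a reference material. The following is a minimal sketch; the function names, example readings, and acceptance limits (20% CV, 0.2 minute drift, 1% carryover, signal-to-noise of 10) are assumptions made for illustration and are not limits stated in the disclosure.

```python
from statistics import mean, stdev

def percent_cv(peak_areas):
    """Consistency of response: coefficient of variation, in percent."""
    return 100.0 * stdev(peak_areas) / mean(peak_areas)

def max_rt_drift(retention_times, expected_rt):
    """Retention time stability: largest absolute deviation from the expected time."""
    return max(abs(rt - expected_rt) for rt in retention_times)

def carryover_percent(blank_area, preceding_area):
    """Carryover: signal in a blank as a percentage of the preceding injection."""
    return 100.0 * blank_area / preceding_area

def signal_to_noise(peak_height, noise_sd):
    """Signal-to-noise: peak height over baseline noise standard deviation."""
    return peak_height / noise_sd

def suitability_passes(areas, rts, expected_rt, blank_area, peak_height, noise_sd):
    """Apply hypothetical acceptance limits to all four metrics."""
    return (percent_cv(areas) <= 20.0
            and max_rt_drift(rts, expected_rt) <= 0.2
            and carryover_percent(blank_area, areas[-1]) <= 1.0
            and signal_to_noise(peak_height, noise_sd) >= 10.0)

# Invented readings from three injections of a reference peptide.
print(suitability_passes(areas=[1000.0, 1050.0, 980.0], rts=[10.0, 10.1, 9.95],
                         expected_rt=10.0, blank_area=5.0,
                         peak_height=500.0, noise_sd=10.0))
```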
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
Provided herein are noninvasive methods of assessing a health status in an individual, for example colorectal cancer status using a biological sample of the individual. Some such methods comprise the steps of obtaining a circulating blood sample from the individual; obtaining a biomarker panel level for a biomarker panel comprising a list of proteins in the sample selected from Table 1, and using said panel information to make a CRC health assessment. In some cases, individual age and/or gender are also selected as biomarkers to comprise panel information from said individual. Some approaches comprise comparing said panel information from said individual to a reference panel information set corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as having said colorectal cancer status if said individual's reference panel information does not differ significantly from said reference panel information set. Some approaches comprise using panel levels in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as having said colorectal cancer status if said individual's reference panel information does not differ significantly from said reference panel information set. 
Some approaches comprise using ratios of selected biomarkers relative to one another in an algorithm to obtain a panel score, and comparing the panel score to that of panel scores for at least one reference panel information set score corresponding to a known colorectal cancer status, such as at least one of no CRC, stage I CRC, Stage II CRC, stage III CRC, stage IV CRC, and more generally early CRC, advanced CRC; and categorizing said individual as having said colorectal cancer status if said individual's reference panel information does not differ significantly from said reference panel information set.
Biomarker panels as disclosed herein share a property that sensitive, specific conclusions regarding an individual's colorectal health are made using protein level information derived from circulating blood, alone or in combination with other information such as an individual's age, gender, health history or other characteristics. A benefit of the present biomarker panels is that they provide a sensitive, specific colorectal health assessment using conveniently, noninvasively obtained samples. There is no need to rely upon data obtained from an intrusive abdominal assay such as a colonoscopy or a sigmoidoscopy, or from stool sample material. As a result compliance rates are substantially higher, and colorectal health issues are more easily recognized early in their progression, so that they may be more efficiently treated. Ultimately, the effect of this benefit is measured in lives saved, and is substantial.
Biomarker panels as disclosed herein are selected such that their predictive value as panels is substantially greater than the predictive value of their individual members. Panel members generally do not co-vary with one another, such that panel members provide independent contributions to the panel's overall health signal. Accordingly, a panel is able to substantially outperform the performance of any individual constituent indicative of an individual's colorectal health status, such that a commercially and medicinally relevant degree of confidence (such as sensitivity, specificity or sensitivity and specificity) is obtained. Thus, in the panels as disclosed herein, multiple panel members indicative of a health issue provide a much stronger signal than is found, for example in a panel wherein two or more members rise or fall in strict concert such that the signal derived therefrom is effectively a single signal, repeated twice. Accordingly, panels as disclosed herein are robust to variation in single constituent measurements. For example because panel members vary independently of one another, panels herein often indicate a health risk despite the fact that one or more than one individual members of the panel would not indicate that the health risk is present if measured alone. In some cases, panels herein indicate a health risk at a significant level of confidence despite the fact that no individual panel member indicates the health risk at a significant level of confidence on its own. In some cases, panels herein indicate a health risk at a significant level of confidence despite the fact that at least one individual member indicates at a significant level of confidence that the health risk is not present.
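The advantage of panel members that do not co-vary can be illustrated with an idealized statistical model. The sketch assumes, purely for illustration, that each marker is normally distributed with equal variance in both groups, that the groups are separated by d standard deviations on every marker, that every pair of markers has correlation rho, and that the panel score is an unweighted sum. None of these assumptions comes from the disclosure.

```python
from math import sqrt
from statistics import NormalDist

def combined_effect_size(d, k, rho):
    """Group separation (in SDs) of the sum of k markers with pairwise correlation rho."""
    return d * sqrt(k / (1 + (k - 1) * rho))

def auc_from_effect_size(d):
    """AUC of a score whose two groups are unit-variance normals separated by d."""
    return NormalDist().cdf(d / sqrt(2))

single = auc_from_effect_size(0.5)
independent = auc_from_effect_size(combined_effect_size(0.5, k=4, rho=0.0))
in_concert = auc_from_effect_size(combined_effect_size(0.5, k=4, rho=1.0))
print(single, independent, in_concert)
```

Under these assumptions, four independent markers double the separation of a single marker (raising the AUC from roughly 0.64 to roughly 0.76), whereas four markers that rise and fall in strict concert perform no better than one.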
Biomarkers consistent with the panels herein comprise biological molecules that circulate in the bloodstream of an individual, such as proteins. Readily available information including demographic information such as individual's age or gender is also included in some cases. Physiological information including weight, height, body mass index, as well as other easily measured or obtained information is also eligible as a marker. In particular, some panels herein rely upon age, gender, or age and gender as biomarkers.
Common to many biomarkers herein is the ease with which they are assayed in an individual. Biomarkers herein are readily obtained by a blood draw from an artery or vein of an individual, or are obtained via interview or by simple biometric analysis. A benefit of the ease with which biomarkers herein are obtained is that invasive assays such as colonoscopy or sigmoidoscopy are not required for biomarker measurement. Similarly, stool samples are not required for biomarker determination. As a result, panel information as disclosed herein is often readily obtained through a blood draw in combination with a visit to a doctor's office. Compliance rates are accordingly substantially higher than are compliance rates for colorectal health assays involving stool samples or invasive procedures.
Exemplary panels disclosed herein comprise circulating proteins or fragments thereof that are recognizably or uniquely mapped to their parent protein, and in some cases comprise a readily obtained biomarker such as an individual's age.
Some biomarker panels comprise some or all of the protein markers recited herein, subsets thereof or listed markers in combination with additional markers or biological parameters. A lead biomarker panel relevant to colorectal cancer and/or advanced adenoma assessment comprises at least 1, 2, 3, or 4 markers, up to the full list, alone or in combination with additional markers, said list selected from the following: A2GL, ACTBM, ALS, APOC4, APOE, APOL1, CHLE, GELS, I10R1, ITIH2, KAIN, PON1, PTPRJ, SPP24, TFR1, TNF15, IBP3, THRB, GUC2A, LYNX1, PREX2, RET4, and also including age and optionally gender as biomarkers. In some cases, the ratio between a protein marker and age is utilized as a feature in the panel for making a CRC assessment, for example, PTPRJ/age and/or ALS/age ratios. As used herein, a ratio can include a ratio between a peptide fragment of a protein marker and a demographic such as age. A peptide/marker ratio can include a ratio between at least one peptide derived from any of A2GL, ACTBM, ALS, APOC4, APOE, APOL1, CHLE, GELS, I10R1, ITIH2, KAIN, PON1, PTPRJ, SPP24, TFR1, TNF15, IBP3, THRB, GUC2A, LYNX1, PREX2, and RET4 and a demographic such as age. Examples of peptide/age ratios can be found in the working examples described herein. Another lead biomarker panel relevant to colorectal cancer and/or advanced adenoma assessment comprises markers selected from the following: A2GL, ACTBM, ALS, APOC4, APOE, APOL1, CHLE, I10R1, ITIH2, KAIN, PON1, PTPRJ, SPP24, TFR1, TNF15, and also including age of the individual as a biomarker. Another lead biomarker panel, or a combination of biomarker panels having colorectal cancer and advanced adenoma assessment capabilities comprises markers selected from the following: A2GL, ALS, PTPRJ, and age, or a subset thereof optionally having at least one individual marker excluded or replaced with one or more markers.
Another lead biomarker panel, or a combination of biomarker panels having colorectal cancer and advanced adenoma assessment capabilities comprises markers selected from the following: A2GL, ALS, GELS, PTPRJ, and age, or a subset thereof optionally having at least one individual marker excluded or replaced with one or more markers. In some cases, a CRC biomarker panel comprises one or more ratios of a protein marker relative to age.
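A ratio of a protein marker relative to age can be carried as a feature alongside the raw marker levels. The sketch below shows one way this might be assembled; the function names and the measurement values are hypothetical and the units are arbitrary.

```python
def ratio_features(levels, age, markers=("PTPRJ", "ALS")):
    """Return marker/age ratios keyed by a label such as 'PTPRJ/age'."""
    return {f"{marker}/age": levels[marker] / age for marker in markers}

def panel_features(levels, age):
    """Raw marker levels, age, and marker/age ratios merged into one feature set."""
    features = dict(levels)
    features["age"] = age
    features.update(ratio_features(levels, age))
    return features

# Invented measurements for one individual.
measured = {"A2GL": 5.0, "ALS": 60.0, "PTPRJ": 120.0}
print(panel_features(measured, age=60))
```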
Often, it is convenient or efficient to combine a CRC biomarker panel and an advanced adenoma panel into a single kit or a single biomarker panel. In these cases, a kit comprising three biomarkers, or a subset or larger set thereof, including A2GL, ALS, and PTPRJ, is informative as to both colorectal cancer status and advanced adenoma status, particularly in combination with information regarding patient age. Alternate and variant colorectal cancer biomarker panels are listed below.
Much like the panel discussed above, these panels, or subsets or additions, are used alone or in combination with the above-mentioned advanced adenoma panel, optionally using markers such as A2GL, ACTBM, ALS, APOC4, APOE, APOL1, CHLE, GELS, I10R1, ITIH2, KAIN, PON1, PTPRJ, SPP24, TFR1, TNF15, IBP3, THRB, GUC2A, LYNX1, PREX2, RET4, and also in combination with age, to be indicative of colorectal cancer status and/or advanced adenoma.
Accordingly, disclosed herein are colorectal health assessment panels comprising the biomarkers mentioned above. Panels comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22, or more than 22 of the biomarkers mentioned herein such as, for example, those listed in Table 1.
In some cases, biomarker panels described herein comprise at least three biomarkers. The biomarkers can be selected from the group of identifiable polypeptides or fragments of the 22 protein biomarkers listed in Table 1, optionally used in combination with age and/or gender. Any of the biomarkers described herein can be protein biomarkers. Furthermore, the group of biomarkers in this example can in some cases additionally comprise polypeptides with the characteristics found in Table 1. In some cases, the ratio of one or more protein biomarkers described herein (e.g., one or more proteotypic peptides evaluated by mass spectrometry) to another biomarker such as age is utilized in making the assessment of health status.
Exemplary protein biomarkers and, when available, their human amino acid sequences, are listed in Table 1, below. Protein biomarkers comprise full length molecules of the polypeptide sequences of Table 1, as well as uniquely identifiable fragments of the polypeptide sequences of Table 1. Markers can be but do not need to be full length to be informative. In many cases, so long as a fragment is uniquely identifiable as being derived from or representing a polypeptide of Table 1, it is informative for purposes herein.
Biomarkers contemplated herein also include polypeptides having an amino acid sequence identical to a listed marker of Table 1 over a span of 6 residues, 7 residues, 8 residues, 9 residues, 10 residues, 20 residues, 50 residues, or alternately 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or greater than 95% of the sequence of the biomarker. Variant or alternative forms of the biomarker include for example polypeptides encoded by any splice-variants of transcripts encoding the disclosed biomarkers. In certain cases the modified forms, fragments, or their corresponding RNA or DNA, may exhibit better discriminatory power in diagnosis than the full-length protein.
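Whether a candidate polypeptide is identical to a marker over a span of a given number of residues can be checked by finding the longest contiguous stretch the two sequences share. The sketch below does this with a standard dynamic-programming pass; both sequences are invented toy strings, not sequences from Table 1.

```python
def longest_shared_span(candidate, marker):
    """Length of the longest contiguous run of identical residues in both sequences."""
    best = 0
    previous = [0] * (len(marker) + 1)
    for residue_a in candidate:
        current = [0] * (len(marker) + 1)
        for j, residue_b in enumerate(marker, start=1):
            if residue_a == residue_b:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best

def shares_span(candidate, marker, min_residues=6):
    """True if the sequences are identical over at least min_residues residues."""
    return longest_shared_span(candidate, marker) >= min_residues

def percent_of_marker(candidate, marker):
    """Longest shared span expressed as a percentage of the marker length."""
    return 100.0 * longest_shared_span(candidate, marker) / len(marker)

toy_marker = "MKTLLVAGGSPEPTIDEKR"   # hypothetical sequence
toy_candidate = "GASPEPTIDEWW"       # hypothetical sequence
print(longest_shared_span(toy_candidate, toy_marker))
print(shares_span(toy_candidate, toy_marker))
```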
Biomarkers contemplated herein also include truncated forms or polypeptide fragments of any of the proteins described herein. Truncated forms or polypeptide fragments of a protein can include N-terminally deleted or truncated forms and C-terminally deleted or truncated forms. Truncated forms or fragments of a protein can include fragments arising by any mechanism, such as, without limitation, by alternative translation, exo- and/or endo-proteolysis and/or degradation, for example, by physical, chemical and/or enzymatic proteolysis. Without limitation, a biomarker may comprise a truncated form or fragment of a protein, polypeptide or peptide that represents about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of the amino acid sequence of the protein.
Without limitation, a truncated form or fragment of a protein may include a sequence of about 5-20 consecutive amino acids, or about 10-50 consecutive amino acids, or about 20-100 consecutive amino acids, or about 30-150 consecutive amino acids, or about 50-500 consecutive amino acid residues of the corresponding full length protein.
In some instances, a fragment is N-terminally and/or C-terminally truncated by between 1 and about 20 amino acids, such as, for example, by between 1 and about 15 amino acids, or by between 1 and about 10 amino acids, or by between 1 and about 5 amino acids, compared to the corresponding mature, full-length protein or its soluble or plasma circulating form.
Any protein biomarker of the present disclosure such as a peptide, polypeptide or protein and fragments thereof may also encompass modified forms of said marker, peptide, polypeptide or protein and fragments such as bearing post-expression modifications including but not limited to, modifications such as phosphorylation, glycosylation, lipidation, methylation, selenocystine modification, cysteinylation, sulphonation, glutathionylation, acetylation, oxidation of methionine to methionine sulphoxide or methionine sulphone, and the like.
In some instances, a fragmented protein is N-terminally and/or C-terminally truncated. Such fragmented protein can comprise one or more, or all transitional ions of the N-terminally (a, b, c-ion) and/or C-terminally (x, y, z-ion) truncated protein or peptide. Exemplary human markers, nucleic acids, proteins or polypeptides as taught herein are as annotated under NCBI Genbank (accessible at the website ncbi.nlm.nih.gov) or Swissprot/Uniprot (accessible at the website uniprot.org) accession numbers. In some instances said sequences are of precursors (for example, preproteins) of the markers, nucleic acids, proteins or polypeptides as taught herein and may include parts which are processed away from mature molecules. In some instances although only one or more isoforms are disclosed, all isoforms of the sequences are intended.
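To make the N-terminal and C-terminal ion series concrete, the following sketch computes singly charged b-ion and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. Only the b and y series are shown, as an illustrative subset of the ion types named above, and the peptide "PEPTIDE" is a conventional toy sequence rather than a peptide from the disclosure.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)

def b_ions(peptide):
    """Singly charged b1..b(n-1) m/z values, built from the N-terminus."""
    total, series = 0.0, []
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        series.append(total + PROTON)
    return series

def y_ions(peptide):
    """Singly charged y1..y(n-1) m/z values, built from the C-terminus."""
    total, series = WATER, []
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        series.append(total + PROTON)
    return series

for index, (b, y) in enumerate(zip(b_ions("PEPTIDE"), y_ions("PEPTIDE")), start=1):
    print(f"b{index} = {b:.4f}   y{index} = {y:.4f}")
```

A useful sanity check is that each complementary pair (b_i and y_(n-i)) sums to the neutral peptide mass plus two protons.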
Antibodies for the detection of the biomarkers listed herein are commercially available.
For a given biomarker panel recited herein, variant biomarker panels differing in one or more than one constituent are also contemplated. Thus, turning to a lead CRC panel A2GL, ALS, PTPRJ, and also including individual age, as an example, a number of related panels are disclosed. For this and other panels disclosed herein, variants are contemplated comprising at least 3, or at least 2 of the biomarker constituents of a recited biomarker panel.
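The variant panels contemplated for a recited panel can be enumerated directly. The sketch below lists every variant that retains at least a chosen number of constituents of the lead panel; the function name is hypothetical, and variants that add or substitute markers are not covered by this example.

```python
from itertools import combinations

def variant_panels(panel, min_constituents):
    """All subsets of the panel that keep at least min_constituents members."""
    return [set(subset)
            for size in range(min_constituents, len(panel) + 1)
            for subset in combinations(panel, size)]

lead_panel = ["A2GL", "ALS", "PTPRJ", "age"]
for variant in variant_panels(lead_panel, min_constituents=3):
    print(sorted(variant))
```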
Provided herein are methods that utilize biomarker panels to assess health status such as, for example, colorectal cancer health status. The methods can provide a high AUC signal that arises from a small pool of markers in the panel. In some cases, the AUC signal arises from no more than 20, 15, 10, 9, 8, 7, 6, 5, or 4 markers in the panel. The panel may include a list of markers from which a smaller subset of markers provide an AUC signal of at least 0.70, 0.75, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, or 0.99. For example, a biomarker panel may comprise a panel of at least one marker selected from A2GL, ALS, and PTPRJ (and optionally age), and at least one additional marker such as one listed in Table 1. In some cases, the biomarker panel used to assess a colorectal health status comprises no more than 20, 15, 10, 9, 8, 7, 6, 5, or 4 markers. The biomarker panel may comprise markers selected from Table 1. In some cases, the biomarker panel consists of A2GL, ALS, PTPRJ, and age. In some cases, the biomarker panel consists essentially of A2GL, ALS, PTPRJ, and age. In some instances, the assessment of colorectal health status comprises utilizing a ratio between one or more of A2GL, ALS, and PTPRJ with age. For example, a classifier utilizing the biomarker panel to generate a prediction or classification (e.g., health status assessment) may utilize the ratio between PTPRJ and age as a feature in making the prediction. A biomarker panel comprising A2GL, ALS, PTPRJ, and age may include additional markers such as any combination of those listed in Table 1 or the list of 430 candidate markers described herein. In some cases, the biomarker panel comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or at least 23 markers from Table 1. 
The biomarker panel can comprise any reference listed in Table 2 in combination with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or at least 20 additional markers (e.g., non-redundant markers) from Table 1. In some instances, the biomarker panel comprises at least 1, 2, 3, 4, or 5 of A2GL, ALS, PTPRJ, GELS, and TFR1. An exemplary panel comprises A2GL, ACTBM, ALS, APOC4, APOE, APOL1, CHLE, IL10R, ITIH2, KAIN, PON1, PTPRJ, SPP24, TFR1, and TNF15. In some instances, a biomarker panel comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 proteins selected from A2GL, ACTBM, ALS, APOC4, APOE, APOL1, CHLE, IL10R, ITIH2, KAIN, PON1, PTPRJ, SPP24, TFR1, TNF15, and optionally including age. Another exemplary panel comprises A2GL, ALS, PTPRJ, GELS, and TFR1. Sometimes, a biomarker panel comprises at least 1, 2, 3, or 4 of A2GL, ALS, PTPRJ, GELS, and TFR1, alone or in combination with age. The biomarker panel can comprise a ratio of a biomarker and age such as, for example, PTPRJ/age.
Exemplary CRC panels consistent with the disclosure herein are listed in Table 2. Also disclosed are panels comprising the markers listed in entries of Table 2.
In some cases, the panel comprises reference 1 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 2 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 3 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 4 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 5 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 6 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 7 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 8 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 9 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 10 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 11 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 12 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 13 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the panel comprises reference 14 of Table 2 in combination with at least one additional marker from Table 1. In some cases, the biomarker panel comprises any reference of Table 2 in combination with GELS from Table 1. In some cases, the biomarker panel comprises any reference of Table 2 in combination with TFR1 from Table 1.
The present disclosure includes methods that address various shortcomings with a targeted proteomics workflow that enables Tier 2 measurements of targeted peptides using mass spectrometry. In some instances, the measurements are obtained using dynamic multiple reaction monitoring (dMRM) MS. Described herein are various steps taken, including process controls, to develop and characterize a mass spectrometric analysis such as, for example, a high-multiplex dMRM assay. Alternative assays are also consistent with the disclosure herein. For example, affinity assays using antibodies or antibody mimetics such as affibody molecules, affitins, atrimers, etc., may be used to detect and/or quantify markers. Affinity assays can include immunoassays and aptamer assays. In some cases, the assay measures proteotypic peptides from proteins related to a disease or health status. For example, described herein are assays measuring 641 proteotypic peptides from 392 colorectal cancer (CRC) related proteins. The present disclosure includes the use of quality and/or process control metrics and procedures to track and handle sample processing and instrument variations over a data collection period (e.g., of four months), during which the assay was used in the study of biological samples from patients with CRC symptoms. The biological samples can be obtained from various sources such as, for example, blood samples. The samples for 1,045 patients with CRC symptoms were analyzed in one study. After data collection, transitions can be filtered using one or more signal quality metrics before being used in receiver operating characteristic (ROC) analysis to assess univariate CRC signal. As an example, the ROC analysis demonstrated dMRM-based CRC signal carried by 127 CRC-related proteins in the symptomatic population. These dMRM assays can be developed as Tier 1 assays for clinical tests to identify individuals at elevated risk of CRC.
In some cases, transitions are filtered using at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten signal quality metrics before being used in ROC analysis for assessing univariate CRC signal.
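The filter-then-ROC step described above can be sketched as follows. This is an illustrative Python example under stated assumptions: the metric names (`signal_to_noise`, `peak_quality`), their minimum values, and the transition data are hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation rather than any specific tool used in the study.

```python
from typing import Dict, Sequence


def rank_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """ROC AUC as the probability that a case value exceeds a control value.

    Ties count as one half (Mann-Whitney formulation of the AUC).
    """
    if not case_values or not control_values:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for c in case_values:
        for n in control_values:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def passes_quality(metrics: Dict[str, float], minimums: Dict[str, float]) -> bool:
    """A transition passes only if every listed metric meets its minimum."""
    return all(
        metrics.get(name, float("-inf")) >= floor for name, floor in minimums.items()
    )


# Hypothetical signal quality metrics and made-up measurements.
minimums = {"signal_to_noise": 10.0, "peak_quality": 0.8}
transitions = {
    "T1": {
        "metrics": {"signal_to_noise": 12.0, "peak_quality": 0.90},
        "crc": [5.0, 6.0, 7.0],
        "non_crc": [1.0, 2.0, 6.0],
    },
    "T2": {
        "metrics": {"signal_to_noise": 2.0, "peak_quality": 0.95},
        "crc": [3.0, 4.0, 5.0],
        "non_crc": [3.0, 4.0, 5.0],
    },
}

# Only transitions that pass every quality metric enter the ROC analysis.
univariate_auc = {
    name: rank_auc(t["crc"], t["non_crc"])
    for name, t in transitions.items()
    if passes_quality(t["metrics"], minimums)
}
```

In this sketch, T2 is excluded because its signal-to-noise value falls below the assumed minimum. An AUC below 0.5 would indicate a marker that is lower in CRC samples than in non-CRC samples.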
Disclosed herein is a dMRM MS method with the rigor of a Tier 2 assay as defined by the CPTAC ‘fit for purpose approach’. Using quality and process control procedures, the assay was successfully used to quantify 641 proteotypic peptides representing 392 CRC-related proteins in plasma from 1045 CRC-symptomatic patients. The results showed that 127 of the proteins carried univariate CRC signal in the symptomatic population. This large number of single biomarkers demonstrates the utility of multivariate classifiers to distinguish CRC in the symptomatic population using the disclosed workflow(s). Other methodologies in addition to dMRM MS may be used. Immunoassays and aptamer assays that utilize antibodies, aptamers, or other molecules capable of binding or recognizing specific targets are consistent with the methods and workflows described herein.
Various forms of mass spectrometry are available for evaluating protein and other molecules in a sample. For example, fragmenting approaches for tandem MS include collision-induced dissociation (CID), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multiphoton dissociation (IRMPD), blackbody infrared radiative dissociation (BIRD), electron-detachment dissociation (EDD) and surface-induced dissociation (SID). Various separation techniques are available as well and include, for example, gas chromatography, liquid chromatography, and capillary electrophoresis.
Disclosed herein are quality and process control procedures that allow the generation of biomarker panels for assessing colorectal health status. Such procedures include process control and/or quality control steps for evaluating performance of the assays and/or instruments used to process samples. A process control step can include system suitability tests (SST) that are performed prior to sample processing. For example, SSTs can be performed on mass spectrometry instrumentation to evaluate performance of the liquid chromatography and/or mass spectrometer. Control samples can be used in this evaluation such as, for example, to generate standard curves of internal standards to assess the instrumentation and workflow. An example of a process control step is to determine whether a 10× dilution series of internal standards is being accurately quantified by the mass spectrometer (or by another assay such as an immunoassay or aptamer assay). The process control step may also determine whether the dynamic range spans a threshold number of log units across the standard curve. For example, a lack of accuracy in quantification and/or a low dynamic range can cause the sample to be discarded and/or gated/screened to remove data determined to be impacted by the areas of poor performance. A process control step that evaluates at least one QC marker is also consistent with the present disclosure. In some cases, a control sample includes at least one QC marker as described herein.
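One way such a standard-curve check might be expressed is sketched below in Python. The acceptance criteria are assumptions made for illustration: a log-log slope near 1, a minimum R-squared, and a minimum span in log units. The disclosure does not specify these thresholds, and the concentrations and responses are made-up values.

```python
import math
from typing import Sequence, Tuple


def loglog_fit(conc: Sequence[float], resp: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of log10(response) on log10(concentration).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(c) for c in conc]
    y = [math.log10(r) for r in resp]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


def standard_curve_passes(
    conc: Sequence[float],
    resp: Sequence[float],
    min_log_units: float = 3.0,  # assumed dynamic range threshold
    slope_tol: float = 0.1,      # assumed tolerance around a slope of 1
    min_r2: float = 0.99,        # assumed linearity threshold
) -> bool:
    """True if the dilution series is roughly proportional over enough log units."""
    if len(conc) != len(resp) or len(conc) < 3:
        raise ValueError("need at least three matched concentration/response pairs")
    slope, _, r2 = loglog_fit(conc, resp)
    dynamic_range = math.log10(max(conc) / min(conc))
    return abs(slope - 1.0) <= slope_tol and r2 >= min_r2 and dynamic_range >= min_log_units


# Made-up 10x dilution series spanning four log units.
conc = [1.0, 10.0, 100.0, 1000.0, 10000.0]
good_resp = [2.1e3, 1.9e4, 2.0e5, 2.05e6, 1.95e7]    # near-proportional response
saturated_resp = [2.0e3, 2.0e4, 1.5e5, 4.0e5, 5.0e5]  # flattens at the high end

good_ok = standard_curve_passes(conc, good_resp)
saturated_ok = standard_curve_passes(conc, saturated_resp)
```

Under these assumed criteria, the saturating series fails because its log-log slope falls well below 1, which is the kind of result that could lead to data being discarded or gated.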
Process control steps can include various forms of workflow monitoring such as, for example, monitoring flow-through AUC during immunodepletion, monitoring of TPA results for sample processing and immunodepletion efficiency, or sample preparation customization depending on the TPA result of each individual sample. Other examples of process control steps include a quality control check requiring a confidence interval of RTs of heavy transitions to be no more than a certain percentage from the margins of a chromatography mass spectrometry acquisition window. Examples of the certain percentage include 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, and 20%. Workflow monitoring utilizing QC markers to assess various conditions such as sample integrity, sample elution efficiency, sample storage condition, and internal standard monitoring are also contemplated in the present disclosure.
Biomarkers or biological markers can refer to any measurable characteristic of a biological specimen that can be evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological responses to a therapeutic intervention. In the last 30 years, a greater understanding of the underlying biology of many cancers coupled with technological advances have contributed to the investment in biomarker discovery with the hope of identifying the appropriate biological markers to guide clinicians in the detection, screening, diagnosis, treatment and monitoring of cancer treatment. Among the plethora of biomarker-related publications of recent years there have been numerous reports on the discovery and promise of novel plasma- or serum-based cancer biomarkers, intended for diagnostic, prognostic and predictive purposes. However, despite the abundance of biomarker publications and the advances in genomic and proteomic technologies, few biomarkers have been implemented in clinical practice; by some estimates the success rate for clinical translation of biomarkers is as low as 0.1%, with only a few dozen biomarkers in clinical use for the treatment of cancer. While some have speculated on the factors contributing to the failures of biomarkers reaching the clinic, it is widely recognized that a large number of these failures can be categorized as false discoveries—biomarkers that could not be independently reproduced in follow-up studies.
The present disclosure recognizes that these false discoveries can be attributed to pre-analytical, analytical, and post-analytical shortcomings. The pre-analytical problems may stem from poor sample quality and/or incomplete clinical documentation. The analytical problems may originate from varying qualities of assay platforms and sample measurements. The post-analytical problems may result from faulty bioinformatics approaches (statistical problems related to multiple testing and overfitting). In light of the poor return on investment in biomarker discovery, in recent years, the scientific community has started to focus on identifying and addressing these issues contributing to high biomarker failure rate.
In some instances, analytical variation is monitored in order to address factors contributing to false biomarker discovery. Such variation is particularly troublesome in multiplexed biomarker studies, where the variabilities of several assays must be tracked and managed to ensure success. The multi-marker assay presented in this manuscript can be classified as a Tier 2 assay under the CPTAC ‘fit for purpose approach’; it was developed to measure colorectal cancer candidate biomarker proteins with the goal of down-selecting to a much smaller protein panel, for further validation and eventual clinical implementation. A Tier 2 assay should be high-throughput, precise, reproducible and quantitative, and it is because of these requirements, as well as its multiplexing capabilities, that targeted dMRM was selected in this study with the goal of identifying a novel colorectal biomarker panel. While selecting the best technology platform for clinical utility will no doubt improve the odds of successful delivery of a clinical biomarker, it is also important to address the variability associated with the highly complex analytical process. To this end, an important consideration is the implementation of system suitability tests (SST) and quality controls to aid in monitoring and remedying the variability. Recent publications also support the growing recognition of the need for SST and quality controls as a means of addressing analytical variability and establishing confidence in analytical measurements.
Described herein are the development and experimental steps of a large-scale dMRM-based method for identifying biomarkers relevant to disease or health status. In some cases, the method implements SST, using an SIS peptide mixture and a pooled plasma sample as reference material, to evaluate aspects of the analytical instrumentation such as consistency of the response, carryover, retention time stability, and signal-to-noise. In certain instances, quality controls are used in the form of a pooled plasma sample to monitor and, if needed, correct the analytical variability during sample processing and analysis. The implementation of one or more systematic quality assessments was a critical component of the analytical process, providing confidence in over a thousand sample measurements collected on multiple instruments over an extended period of time.
Described herein are systems and methods that address analytical variability; the pre-analytical factors impacting sample quality were also an important consideration in the study design. The samples used in this study were from the same carefully curated cohort as used in previous biomarker studies and described in more detail in an earlier publication. In addition to the measures taken to monitor analytical variability in this report, described herein is a novel systematic approach used to filter peptides and rank peptide transitions, as a means to build a robust mass spectrometry analytical method such as, for example, a dMRM-based analytical method, for the measurement of proteotypic peptides representing disease or health condition related proteins. For example, disclosed herein are measurements of 641 proteotypic peptides representing 392 CRC-related proteins. Finally, with a dataset of reliable analytical measurements from various patients and under the guidance of a team of bioinformatics scientists, machine learning algorithms were used to analyze the quantitative measurements and to build candidate CRC biomarker panels suitable for identifying at-risk patients who should undergo colonoscopy. Described herein are biomarker panels generated based on measurements and analysis of 1045 CRC patients.
Candidate protein biomarkers for CRC can be selected from various sources such as one or more of: 1) an earlier targeted proteomics study performed in our laboratory, 2) analysis of publicly available proteomics datasets related to CRC, and 3) semi-automated literature searches. A non-limiting list of candidate protein biomarkers identified is shown below, which has a total of 430 proteins designated as CRC-related biomarker candidates for further experimental investigation.
1433B_HUMAN; CH60_HUMAN; H2BFS_HUMAN; PCKGM_HUMAN; TNF15_HUMAN; 1433E_HUMAN; CHK1_HUMAN; HABP2_HUMAN; PDIA3_HUMAN; TNF6B_HUMAN; 1433F_HUMAN; CHK2_HUMAN; HEMO_HUMAN; PDIA6_HUMAN; TP4A3_HUMAN; 1433G_HUMAN; CHLE_HUMAN; HEP2_HUMAN; PDLI7_HUMAN; TPA_HUMAN; 1433T_HUMAN; CLC4D_HUMAN; HGF_HUMAN; PDXK_HUMAN; TPM2_HUMAN; 1433Z_HUMAN; CLUS_HUMAN; HMGB1_HUMAN; PEBP1_HUMAN; TR10B_HUMAN; 1A68_HUMAN; CNDP1_HUMAN; HNRPF_HUMAN; PEDF_HUMAN; TRAP1_HUMAN; A1AG1_HUMAN; CNN1_HUMAN; HNRPQ_HUMAN; PGFRA_HUMAN; TREM1_HUMAN; A1AG2_HUMAN; CO3_HUMAN; HPT_HUMAN; PIPNA_HUMAN; TRFE_HUMAN; A1AT_HUMAN; CO4A_HUMAN; HRG_HUMAN; PLGF_HUMAN; TRFL_HUMAN; A1BG_HUMAN; CO6A3_HUMAN; HS90B_HUMAN; PLIN2_HUMAN; TRI33_HUMAN; A2AP_HUMAN; CO8G_HUMAN; HSPB1_HUMAN; PLMN_HUMAN; TSG6_HUMAN; A2GL_HUMAN; CO9_HUMAN; I10R1_HUMAN; PO2F1_HUMAN; TSP1_HUMAN; A2MG_HUMAN; COR1C_HUMAN; IBP2_HUMAN; PON1_HUMAN; TTHY_HUMAN; A4_HUMAN; CORIN_HUMAN; IBP3_HUMAN; POTEF_HUMAN; UGDH_HUMAN; AACT_HUMAN; CP1A1_HUMAN; IF4A3_HUMAN; PPIB_HUMAN; UGPA_HUMAN; ABCB5_HUMAN; CRDL2_HUMAN; IFT74_HUMAN; PRD16_HUMAN; UROK_HUMAN; ABCBA_HUMAN; CRP_HUMAN; IGF1_HUMAN; PRDX1_HUMAN; VCAM1_HUMAN; ACINU_HUMAN; CSF1_HUMAN; IGHA2_HUMAN; PRDX2_HUMAN; VEGFA_HUMAN; ACTBL_HUMAN; CSF1R_HUMAN; IGLL5_HUMAN; PREX2_HUMAN; VGFR1_HUMAN; ACTBM_HUMAN; CSPG2_HUMAN; IKKB_HUMAN; PRKN2_HUMAN; VILI_HUMAN; ACTG_HUMAN; CTHR1_HUMAN; IL23R_HUMAN; PRL_HUMAN; VIME_HUMAN; ACTH_HUMAN; CTNA1_HUMAN; IL26_HUMAN; PROC_HUMAN; VNN1_HUMAN; ADIPO_HUMAN; CTNB1_HUMAN; IL2RB_HUMAN; PROS_HUMAN; VP13B_HUMAN; ADT2_HUMAN; CUL1_HUMAN; IL6RA_HUMAN; PSME3_HUMAN; VTNC_HUMAN; AFAM_HUMAN; CYTC_HUMAN; IL8_HUMAN; PTEN_HUMAN; VWF_HUMAN; AGAP2_HUMAN; DAF_HUMAN; IL9_HUMAN; PTGDS_HUMAN; XBP1_HUMAN; AKA12_HUMAN; DEF1_HUMAN; ILEU_HUMAN; PTPRJ_HUMAN; ZA2G_HUMAN; AKT1_HUMAN; DESM_HUMAN; IPSP_HUMAN; PTPRT_HUMAN; ZMIZ1_HUMAN; AL1A1_HUMAN; DHRS2_HUMAN; IPYR_HUMAN; PTPRU_HUMAN; ZPI_HUMAN; AL1B1_HUMAN; DHSA_HUMAN; IRGM_HUMAN; PZP_HUMAN; ALBU_HUMAN; DPP10_HUMAN; ISK1_HUMAN; RAB38_HUMAN; ALDOA_HUMAN; DPP4_HUMAN; ITA6_HUMAN; 
RASF2_HUMAN; ALDR_HUMAN; DPYL2_HUMAN; ITA9_HUMAN; RASK_HUMAN; ALS_HUMAN; DYHC1_HUMAN; ITIH2_HUMAN; RBX1_HUMAN; AMPD1_HUMAN; ECH1_HUMAN; JAM3_HUMAN; RCAS1_HUMAN; AMPN_HUMAN; EDA_HUMAN; K1C19_HUMAN; REG4_HUMAN; AMY2B_HUMAN; EF2_HUMAN; K2C72_HUMAN; RET4_HUMAN; ANGI_HUMAN; ENOA_HUMAN; K2C73_HUMAN; RHOA_HUMAN; ANGL4_HUMAN; ENOX2_HUMAN; K2C8_HUMAN; RHOB_HUMAN; ANGT_HUMAN; ENPL_HUMAN; KAIN_HUMAN; RHOC_HUMAN; ANT3_HUMAN; ENPP1_HUMAN; KC1D_HUMAN; ROA1_HUMAN; ANXA1_HUMAN; ENPP2_HUMAN; KCRB_HUMAN; ROA2_HUMAN; ANXA3_HUMAN; EZRI_HUMAN; KISS1_HUMAN; RRBP1_HUMAN; ANXA4_HUMAN; FA10_HUMAN; KLK6_HUMAN; RSSA_HUMAN; ANXA5_HUMAN; FA5_HUMAN; KLOT_HUMAN; S100P_HUMAN; APC_HUMAN; FA7_HUMAN; KNG1_HUMAN; S10A8_HUMAN; APCD1_HUMAN; FA9_HUMAN; KPCD1_HUMAN; S10A9_HUMAN; APOA1_HUMAN; FABP5_HUMAN; KPYM_HUMAN; S10AB_HUMAN; APOA2_HUMAN; FAK1_HUMAN; LAMA2_HUMAN; S10AC_HUMAN; APOA4_HUMAN; FAK2_HUMAN; LAT1_HUMAN; S29A1_HUMAN; APOA5_HUMAN; FARP1_HUMAN; LBP_HUMAN; SAA1_HUMAN; APOC1_HUMAN; FBX4_HUMAN; LCAT_HUMAN; SAA2_HUMAN; APOC4_HUMAN; FCGBP_HUMAN; LDHA_HUMAN; SAA4_HUMAN; APOE_HUMAN; FCRL3_HUMAN; LEG2_HUMAN; SAHH_HUMAN; APOH_HUMAN; FCRL5_HUMAN; LEG3_HUMAN; SAMP_HUMAN; APOL1_HUMAN; FETA_HUMAN; LEG4_HUMAN; SBP1_HUMAN; APOM_HUMAN; FETUA_HUMAN; LEG8_HUMAN; SDCG3_HUMAN; ASAP3_HUMAN; FHL1_HUMAN; LEPR_HUMAN; SEGN_HUMAN; ATPB_HUMAN; FHR1_HUMAN; LEUK_HUMAN; SELPL_HUMAN; ATS13_HUMAN; FHR3_HUMAN; LG3BP_HUMAN; SEPP1_HUMAN; B2CL1_HUMAN; FIBA_HUMAN; LMNB1_HUMAN; SEPR_HUMAN; B2LA1_HUMAN; FIBB_HUMAN; LRRC7_HUMAN; SEPT9_HUMAN; B3GT5_HUMAN; FIBG_HUMAN; LUM_HUMAN; SF3B3_HUMAN; BANK1_HUMAN; FINC_HUMAN; LYNX1_HUMAN; SHIP1_HUMAN; BC11A_HUMAN; FLNA_HUMAN; LYSC_HUMAN; SHRPN_HUMAN; BCAR1_HUMAN; FLNB_HUMAN; MACF1_HUMAN; SIA8D_HUMAN; C1QBP_HUMAN; FLNC_HUMAN; MAP1S_HUMAN; SIAL_HUMAN; C4BPA_HUMAN; FND3B_HUMAN; MARE1_HUMAN; SIT1_HUMAN; CA195_HUMAN; FRIH_HUMAN; MASP1_HUMAN; SKP1_HUMAN; CAH1_HUMAN; FRIL_HUMAN; MASP2_HUMAN; SLAF1_HUMAN; CAH2_HUMAN; FRMD3_HUMAN; MBL2_HUMAN; SO1B3_HUMAN; CALR_HUMAN; FST_HUMAN; MCM4_HUMAN; SP110_HUMAN; 
CAPG_HUMAN; FUCO_HUMAN; MCR_HUMAN; SPB6_HUMAN; CASP9_HUMAN; FUCO2_HUMAN; MCRS1_HUMAN; SPON2_HUMAN; CATD_HUMAN; G3P_HUMAN; MIC1_HUMAN; SPP24_HUMAN; CATS_HUMAN; GAS6_HUMAN; MICA1_HUMAN; SRC_HUMAN; CATZ_HUMAN; GBRA1_HUMAN; MIF_HUMAN; SRPX2_HUMAN; CBG_HUMAN; GDF15_HUMAN; MMP2_HUMAN; STK11_HUMAN; CBPN_HUMAN; GDIR1_HUMAN; MMP7_HUMAN; SYDC_HUMAN; CBPQ_HUMAN; GELS_HUMAN; MMP9_HUMAN; SYG_HUMAN; CCD83_HUMAN; GFI1B_HUMAN; MTG16_HUMAN; SYNE1_HUMAN; CCL14_HUMAN; GGT1_HUMAN; MUC24_HUMAN; SYUG_HUMAN; CCR5_HUMAN; GHRL_HUMAN; MYL6_HUMAN; TACC1_HUMAN; CD109_HUMAN; GPNMB_HUMAN; MYL9_HUMAN; TAL1_HUMAN; CD20_HUMAN; GPX3_HUMAN; MYO9B_HUMAN; TBB1_HUMAN; CD24_HUMAN; GREM1_HUMAN; NDKA_HUMAN; TCTP_HUMAN; CD248_HUMAN; GRM6_HUMAN; NDRG1_HUMAN; TETN_HUMAN; CD28_HUMAN; GRP75_HUMAN; NFAC1_HUMAN; TF7L1_HUMAN; CD63_HUMAN; GSHR_HUMAN; NGAL_HUMAN; TFR1_HUMAN; CDD_HUMAN; GSTP1_HUMAN; NIBL2_HUMAN; THBG_HUMAN; CEA_HUMAN; GUC2A_HUMAN; NIPBL_HUMAN; THIO_HUMAN; CEAM3_HUMAN; H13_HUMAN; NNMT_HUMAN; THRB_HUMAN; CEAM5_HUMAN; H2A1D_HUMAN; NOD2_HUMAN; THTR_HUMAN; CEAM6_HUMAN; H2A2B_HUMAN; NUPR1_HUMAN; TIE2_HUMAN; CERU_HUMAN; H2AX_HUMAN; OSTP_HUMAN; TIMP1_HUMAN; CFAH_HUMAN; H2B1A_HUMAN; P53_HUMAN; TIMP2_HUMAN; CFAI_HUMAN; H2B1L_HUMAN; PAFA_HUMAN; TKT_HUMAN; CGHB_HUMAN; H2B1O_HUMAN; PAI1_HUMAN; TMG4_HUMAN; CH3L1_HUMAN; H2B3B_HUMAN; PALLD_HUMAN; TNF13_HUMAN;
Described herein are methods for carrying out CRC biomarker discovery using targeted MS measures obtained with dMRM assays. The present methods addressed a significant problem that has plagued MS-based biomarker discovery over the past few decades: few discovery results translate successfully to the clinic. To ensure a better success rate in translating the results to the clinic, a large amount of work went toward developing dMRM assays of very high quality.
The methods described herein allowed the development of Tier 2 assays as defined by the CPTAC ‘fit for purpose approach’. In some cases, a number of process and quality controls were utilized throughout assay development, study running, and study analysis; some of these control steps included novel approaches. During assay development, process control steps were implemented in early in silico peptide filtering, LC gradient optimization, transition filtering, CE optimization, and transition screening/ranking for the final method build. The transition screening/ranking process used an automated approach that is novel in the field, and that offers several advantages over manual methods. During study runs, process control steps were implemented in monitoring of flow-through AUC during immunodepletion, monitoring of TPA results for sample processing and immunodepletion efficiency, and sample preparation customization depending on each sample's TPA result. During study runs, quality control steps were implemented in SSTs run to check LC and MS performance prior to each day's planned sample runs, and in tracking PQCs' signal and reproducibility across study days. During study analysis, transitions were filtered to those with quantitative performance and with good peak quality, thus ensuring that only the best measures entered into study analysis. The peak quality tool that we employed is novel in the field; its high performance enables quick assessment of peak quality and obviates the requirement for lengthy manual peak review. In addition, we used only transitions that had valid measures across all study samples, thus avoiding the problems that accompany data imputation for missing values.
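The final filtering rule above, keeping only transitions with valid measures in every study sample so that no imputation is needed, can be sketched in Python as follows. The data layout (a mapping from transition name to per-sample values, with `None` or non-finite values marking an invalid measure) and the transition names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional


def complete_transitions(measures: Dict[str, List[Optional[float]]]) -> List[str]:
    """Names of transitions that have a finite measure in every sample."""
    kept = []
    for name, values in measures.items():
        if values and all(v is not None and math.isfinite(v) for v in values):
            kept.append(name)
    return kept


# Made-up measurements for three hypothetical transitions across four samples.
measures = {
    "peptideA.y7": [1.2, 0.9, 1.1, 1.0],
    "peptideB.y5": [0.4, None, 0.5, 0.6],          # missing in one sample
    "peptideC.b4": [2.0, 2.2, float("nan"), 2.1],  # invalid in one sample
}

kept = complete_transitions(measures)
```

Only the first transition is retained here; the other two would otherwise require imputation for the samples in which they lack a valid measure.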
The study presented here resulted in evidence for CRC signal carried individually by 127 CRC-related proteins in the CRC-symptomatic population. This large number of CRC biomarkers in the symptomatic population, combined with the very high quality assays with which they were identified, demonstrates the potential for development of new CRC diagnostic tests serving the CRC-symptomatic population using our workflow.
The present disclosure describes work related to classifier builds performed as part of the project known as Targeted Proteomics Version 2 (TPv2). The classifiers were aimed at discriminating colorectal cancer (CRC) from non-CRC samples, using data from 1,045 Endoscopy II (CRC-symptomatic) patients' plasma samples. In TPv2, the sample concentrations of targeted peptide ions were obtained using a dynamic multiple-reaction-monitoring (dMRM) method on mass spectrometry (MS) instruments (You et al., 2018). The initial goals of the work reported here were to develop CRC classifiers that 1) demonstrate an improvement of CRC signal over that reported in TPv1 (Jones et al., 2016) and/or 2) demonstrate CRC performance at least equivalent to that found in the SimpliProColon Version 1 CRC (SPCv1) test, which was developed based on ELISA measures from the same 1,045 Endoscopy II patients used in the present study. The first goal was determined to be unrealistic because of differences between the datasets used in TPv1 and TPv2. The second goal was met.
An overview of the 58 simple grids is presented in
The column “pre-noc median merged test auc” lists the discovery set CRC vs NCNF AUCs achieved in each grid, prior to any NoC analyses. Considering just these AUCs, it's clear that the lowest AUCs were obtained for the CRC vs nonCRC discrimination, performed early in the process. This is consistent with other API studies using the same patient samples (CRC05E, which gave rise to the SPCv1 test). Based on this, the majority of later builds focused on the CRC vs NCNF discrimination. The highest AUCs were obtained for the CRC vs NCNF grids using the “AK 2016 classifier” feature subset. While AK's expanded grid often gave good classifiers in the past, this finding of highest AUCs was not entirely expected—only a subset of the AK 2016 classifier features was found in the data matrices that AK distributed to the team, and the peak areas appear to have been calculated using different algorithms than used by AK for his 2016 builds. Despite these differences, the highest AUCs were uncovered with these classifiers; this is another argument in favor of either recasting the simple grid with additional feature selection capabilities, or rehydrating the expanded grid,
Rows for classifiers for which NoC analyses were performed are highlighted in blue and orange in
These observations led to a revised approach focusing on using specialized feature subsets, and using fewer features. This eventually led to model 40, which validated with sens/spec matching that of SPCv1. The other notable success using this approach was model 52.
Comparison with TPv1
One of the initial goals of the work described here was to compare TPv2 results to those of TPv1 (Jones et al., 2016). The TPv1 study examined CRC vs non-CRC signal using samples from age- and gender-matched patient pairs in discovery and validation sets of 138 and 136 patients respectively. The patients came from three different cohorts that varied in control group composition and in information provided regarding comorbidities. At least one of the cohorts had a control group approximately equivalent to TPv2's NCNF (healthiest controls) group. TPv1 generated a 15-transition classifier with a discovery AUC of 0.82, and validated with an AUC of 0.91 and sens/spec of 0.87/0.81; this was higher than TPv2's validation AUC of 0.82 and sens/spec 0.81/0.78 for model 40.
There are several notable differences between TPv1 and TPv2, making a direct comparison challenging. Whereas TPv1 used matched samples and excluded demographic factors as CRC predictors, TPv2 randomized sample distribution and allowed age and gender to contribute to classifiers. Whereas TPv1 used three patient cohorts with varying annotation quality about comorbidities and symptomology, TPv2 used a single patient cohort with high quality annotations regarding comorbidities and symptomology. Whereas TPv1 samples may have had site bias correlated with CRC status for some cohorts, TPv2 samples were shown to have no site bias. Whereas TPv1 used a non-CRC group biased toward (and possibly dominated by) healthiest controls, TPv2 final classifiers used a non-CRC group representing the range of comorbidities in the actual ITT population. Whereas TPv1 did not use any information about patient CRC symptomology, TPv2 used only patients with CRC symptomology.
Of these differences, two can explain the larger CRC signal reported for the final TPv1 classifier: 1) bias toward healthy controls for the non-CRC group in TPv1, and 2) potential site bias correlated with CRC status in TPv1. The first suggests that a more responsible comparison might be between TPv1 signal and TPv2's CRC vs NCNF signal. Considering TPv2's CRC vs NCNF discovery classifiers (Table 4) reveals that model 31 had a pre-NoC discovery AUC of 0.929, which is higher than the TPv1 discovery AUC of 0.81 at the same stage; taking model 31 forward into validation, and using just the CRC vs NCNF subset there, might serve as an acceptable comparison with TPv1. This might be considered for future work, if a comparison with TPv1 is pursued further.
Comparison with SPCv1
The second initial goal of the work described here was to demonstrate CRC performance at least equivalent to that found for the SPCv1 CRC test. The CRC05E study that gave rise to the SPCv1 test used samples from exactly the same patients as used in the current TPv2 study, with the same patients assigned to the discovery and validation sets. In addition, the SPCv1 classifier builds used the same approach as that used here—discovery CRC vs NCNF classifier builds, followed by NoC analyses in discovery ITT samples, followed by validation. Thus the results are directly comparable between the two studies. SPCv1 had a validated CRC vs non-CRC AUC of 0.83 and sens/spec of 0.81/0.78; TPv2 model 40 had a validated AUC of 0.82 (statistically indistinguishable from that of SPCv1) and sens/spec of 0.81/0.78; thus the TPv2 study demonstrated performance equivalent to that of SPCv1, meeting the goal.
The TPv2 classifier offers two advantages over that used in the SPCv1 test. First, the assay format, using targeted MRM MS measures, may prove to be more amenable to successful quality control and automation than the SPCv1 ELISAs. Second, the smaller number of features in two of the best TPv2 classifiers (3 and 5 unique transitions in models 40 and 52, respectively) will likely improve the focus and quality of any new test based on these results.
The work described here resulted in three validated CRC vs non-CRC classifiers targeted toward the CRC-symptomatic population. These classifiers were all SVMs, and arose from builds 28, 40, and 52. The classifier from build 40 is the most promising as it uses the fewest predictors and has the strongest performance in validation, matching the sens/spec of 0.81/0.78 used in the SPCv1 test. This test, if implemented commercially on an MS platform, would provide equivalent CRC performance to SPCv1, and would likely prove more amenable to automation and quality control.
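For reference, the sens/spec figures quoted above are the fraction of CRC samples called CRC (sensitivity) and the fraction of non-CRC samples called non-CRC (specificity) at a chosen score threshold. The Python sketch below shows that arithmetic on made-up classifier scores; the scores, labels, and threshold are hypothetical and are not taken from the TPv2 or SPCv1 studies.

```python
from typing import Sequence, Tuple


def sens_spec(
    scores: Sequence[float], is_crc: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called CRC."""
    if len(scores) != len(is_crc):
        raise ValueError("scores and labels must have the same length")
    tp = sum(1 for s, y in zip(scores, is_crc) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, is_crc) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, is_crc) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, is_crc) if not y and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one CRC and one non-CRC sample")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores for four CRC and four non-CRC samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
labels = [True, True, True, True, False, False, False, False]

sensitivity, specificity = sens_spec(scores, labels, threshold=0.5)
```

Raising the threshold trades sensitivity for specificity, so two classifiers are most directly comparable when evaluated at matched operating points.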
Disclosed herein are methods, systems, databases and compositions related to targeted health status assessment. Practice of the disclosure herein allows monitoring of a patient's health status, for example through the accurate, repeatable measurement of biomarkers such as proteins in an in vitro sample (e.g., derived from a patient). Monitoring may be directed toward a particular health status or condition, a set of conditions, or may be untargeted such that biomarkers are monitored and a change in biomarker levels or other signal from the biomarkers signals that a health condition indicated by the biomarkers or related to the biomarkers has changed or warrants further investigation or intervention.
Disclosed herein is a demonstration of the utility of mass spectrometry for the identification and quantitation of endogenous proteins and peptides in biological samples obtained from a human. Non-limiting examples of biological samples include dried blood or plasma spots, which can be collected using various collection methods such as special filter paper or dried plasma spot cards. In some embodiments of dried plasma spot cards, a blood sample is deposited on a filter layer that separates out the non-plasma blood components. After a specified amount of time, this filter layer is removed leaving a spot of plasma which is then left to dry prior to storage.
Biomarkers as contemplated herein encompass a broad range of data informative of patient health. Dried blood or dried plasma is an exemplary source of biomarker information, but a broad range of biomarkers and biomarker sources are compatible with the disclosure herein. In various embodiments, markers contemplated herein include at least one of patient age, gender, glucose level, blood pressure, sleep patterns, weight measurements, calorie intake, food intake constituents, vitamin or pharmaceutical intake, prescription drug use patterns, substance abuse history, exercise patterns or exercise output quantification (in terms, for example, of distance, an estimate of calories consumed, or other measure of energy consumed or exerted), and biomolecule measurement.
Additional markers employed in some embodiments include the time and place at which a sample is collected, such as at least one of time of day, time of week, date, and season in which a sample is collected. Similarly, geographic information related to the location at which the sample is collected, and/or geographical information relating to the individual from which the sample is collected, is also included in some embodiments.
A biomolecule serving as a biomarker can be measured from a sample in any number of patient tissues, for example fluids such as in at least one of a patient's blood, blood serum, urine, saliva, cerebrospinal fluid, breath exudate or any number of other tissues or fluids. In some cases, biomolecules are measured in, for example, patient urine, collected particles or fluid droplets in breath, or in saliva or blood. Preferred embodiments comprise measurement of a plurality of biomarkers from patient blood, such as protein biomarkers.
Biomarkers derived from a patient sample such as a patient fluid, for example as circulating biomarkers in patient blood, are quantified through a number of approaches consistent with the disclosure herein. When specific markers are targeted for measurement, mass spectrometric approaches or antibodies are used to detect and in some cases to quantify the level of at least one biomarker in a sample. Alternately or in combination, biomarkers such as circulating biomarkers in a blood sample or biomarkers obtained from breath aspirate are quantified, either relatively or absolutely, through mass spectrometric approaches.
Some aspects of the approaches described herein include the generation of large amounts of biomarker measurements. In various embodiments, measurements are made so that levels are determined for at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, or 200 or more biomarkers in a sample.
In some examples, label-free, label, or any other mass-shifted techniques are used to identify or quantify molecular markers in the sample. For example, label-free techniques include but are not limited to the Stable Isotope Standard (SIS) peptide response. Label techniques include but are not limited to chemical or enzymatic tagging of peptides or proteins. In some examples, molecular markers in the sample include all the proteins associated with a particular disease. In some examples, these proteins are selected based on several performance characteristics (e.g., peak abundance, coefficients of variation (CVs), and precision).
As disclosed herein, biomarkers can be accurately and repeatably measured for analyses such as in comparison to reference levels. Reference levels include levels of biomarkers determined from average levels of a plurality of individuals or samples for which at least one health condition status is known. Alternately or in combination, reference levels of biomarkers are determined from samples taken from the same individual at different times, such that temporal changes in an individual's biomarker profile are observed over time and such that a change in at least one up to a large number of biomarkers associated with a health status or condition is indicative of a change or an upcoming change in that health status or condition.
In some cases, a single biomarker is indicative of a health status, such that a change in the biomarker level is informative as to a change in health status. Alternately or in combination, a number of biomarkers, even if individually not informative of health status or informative below a confidence level upon which information is actionable, may exhibit changes in concert such that a health condition or status for which they are commonly implicated is identified as being altered or likely to be altered in the future with a level of confidence warranting action.
Biomarker measurements can be generated from mass spectrometry data or other sources such as protein or peptide array or immunological assays. In some cases, the measurements are for biomarkers corresponding to at least one of 1) known proteins or fragments mapping to known proteins of known function and known role in at least one health status or disorder, 2) known proteins or known fragments mapping to known proteins of known function but unknown role in a health status or disorder, 3) unknown or unidentified proteins or fragments, such as fragments that have not been mapped to or identified with a particular protein of known function, but that nonetheless are in some cases relevant as markers for a health status or condition, for example due to their identifiable difference in levels between samples that differ in a known or hypothesized health status or health condition.
Accordingly, in various embodiments herein, marker data is useful in identifying a protein or set of proteins that differ between samples, such as individuals of differing health status or within a single individual at different time points, such that the identity of the biomarkers indicates a health condition or health status difference between individuals or in the individual at one time point compared to another. A non-limiting list of health conditions for which biomarkers are informative includes cardiovascular diseases (heart disease), hyperproliferative diseases (for example, cancer), neural diseases (for example, Alzheimer's disease), autoimmune diseases (for example, lupus), metabolic diseases (such as obesity), inflammatory diseases (for example, arthritis), bone diseases (such as osteoporosis), gastrointestinal diseases (such as ulcers), blood diseases (such as sickle cell anemia), infections (for example, bacterial, viral, and fungal infections), and chronic fatigue syndrome. Examples of hyperproliferative diseases such as cancer include colorectal, skin, lung, throat, blood, brain, breast, and prostate cancer.
Certain approaches described herein are targeted to the identification of colorectal cancer, adenoma, or polyp health status. For example, advanced colorectal cancer can be detected using a variety of techniques, and often includes identifiable health symptoms such as rectal bleeding or bloody stool, change in bowel habits, weakness/fatigue, cramping, and weight loss. However, early stage colorectal cancer can be more difficult to detect. In some cases, the individual has not developed colorectal cancer and instead has a pre-CRC adenoma or polyp. Therefore, some of the methods described herein assess early stage colorectal cancer or pre-CRC using a biomarker panel recited herein such as, for example, A2GL, ALS, PTPRJ, and age.
A diagram showing an approach for designing and characterizing a study to identify biomarkers suitable for use in assessing health status such as colorectal cancer status is shown in
Described herein are quality control (QC) metrics informative of one or more factors having an influence on sample analysis. Such factors include sample collection, sample storage, sample elution, and other conditions or processes relevant to sample analysis. For example, certain conditions have an adverse impact on the quality, reliability, or variability of data that can be obtained from samples. Accordingly, QC metrics are indicative of at least one category of information such as sample integrity, sample elution efficiency, or filter storage condition. Sample integrity includes sample pH, sample stability, proteolytic activity, DNase activity, RNase activity, and other conditions informative of potential damage to the sample. Sample elution efficiency includes hydropathy-associated elution efficiency, overall sample elution efficiency, elution efficiency of sample constituents, and other indicators for assessing successful elution. Filter storage condition includes duration of sample storage, maximum temperature exposure, minimum temperature exposure, average temperature exposure, time-temperature exposure, light exposure, UV exposure, radiation exposure, humidity, and other conditions to which the sample has been exposed. QC metrics can be used to discard samples, discard or gate at least a portion of assay data obtained from the sample from further analysis or use in categorizing a result (e.g., CRC health status). For example, if a QC metric indicates that a threshold percentage of a marker of interest has failed to successfully elute from a collection device (e.g., greater than 10% of the marker or a corresponding internal standard or QC marker has failed to elute), then the marker may be discarded from use in categorizing a result. Alternatively, the quantification of the marker may be adjusted based on the QC metric (e.g., readjust calculated amount of marker to account for the predicted amount that was lost during elution).
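By way of non-limiting illustration, the elution-based gating and adjustment described above can be sketched in Python as follows. The function name, its parameters, and the default 10% threshold are illustrative assumptions drawn from the example above, not a required implementation.

```python
def apply_elution_qc(measured_amount, fraction_eluted, max_loss=0.10, adjust=False):
    """Gate or adjust one marker measurement using an elution QC metric.

    measured_amount: quantity of the marker measured in the eluate.
    fraction_eluted: fraction (0 < f <= 1) of the QC marker or internal
        standard that eluted from the collection device.
    max_loss: hypothetical threshold; the example in the text uses 10%.
    adjust: if True, rescale the measurement instead of discarding it.

    Returns the amount to use downstream, or None if the marker is discarded.
    """
    if not 0.0 < fraction_eluted <= 1.0:
        raise ValueError("fraction_eluted must be in the interval (0, 1]")
    loss = 1.0 - fraction_eluted
    if loss <= max_loss:
        # QC metric within tolerance: keep the measurement unchanged.
        return measured_amount
    if adjust:
        # Readjust the calculated amount to account for the predicted loss.
        return measured_amount / fraction_eluted
    # Otherwise exclude the marker from use in categorizing a result.
    return None
```

In this sketch a marker with 5% loss passes unchanged, while a marker with 20% loss is either discarded or rescaled depending on the `adjust` flag.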
QC metrics can be evaluated with the help of QC markers that provide information indicative of one or more category of information. In some embodiments, a QC marker is indicative of duration of sample storage, maximum temperature exposure, minimum temperature exposure, average temperature exposure, time-temperature exposure, sample pH, light exposure, UV exposure, radiation exposure, humidity, elution efficiency of sample constituents, hydropathy-associated elution efficiency, overall sample elution efficiency, sample stability, proteolytic activity, DNase activity, or RNase activity. Non-limiting examples of QC markers include elution markers, humidity markers, pH markers, temperature markers, time markers, proteolysis markers, nuclease markers, stability markers, radiation markers, UV markers, and light markers. Examples of QC markers can be found in international application PCT/US2018/049583, which is hereby incorporated by reference in its entirety. Specifically, at least the description of elution markers, humidity markers, pH markers, temperature markers, time markers, proteolysis markers, nuclease markers, stability markers, radiation markers, UV markers, and light markers from PCT/US2018/049583 are hereby incorporated by reference.
In some cases, the QC markers are collected and/or stored together with the sample. For example, a collection device such as a filter paper or dried blood spot filter comprising at least one QC marker is contemplated herein. Alternatively or in combination, QC markers are added to the sample after collection but before or during sample processing or analysis. Collection devices are suitable for collecting or receiving a variety of samples. Suitable samples include liquid samples such as blood, saliva, urine, tears, lymph, bile, sputum, or other biological fluids. A filter often comprises at least one layer such as a porous layer impermeable to particulates. When QC markers are used, at least one QC marker is disposed on a collection device such as a filter during device assembly, after device assembly, prior to sample deposition, during sample deposition, after sample deposition, before sample elution, during sample elution, after sample elution, before sample processing (e.g., for mass spectrometry or affinity assay analysis), during sample processing, or any combination thereof. At least one QC marker disposed on a collection device is positioned so as to co-migrate with a sample deposited on the device, co-elute from the filter with the sample, be stored on the device together with the sample, or any combination thereof. Alternatively, at least one QC marker disposed on a collection device is positioned to avoid co-elution with the sample. For example, some quality control markers provide direct information about the sample itself, which can include pH, proteolytic activity, or nuclease activity.
A filter consistent with the use of QC markers is a Noviplex Plasma Prep Card (Novilytic Labs), which comprises multiple layers that include an overlay (surface layer), a spreading layer, a separator (for filtering cells), a plasma collection reservoir, an isolation card, and a base card. In these types of filters, at least one QC marker can be disposed on at least one of the overlay, the spreading layer, the separator, and the plasma collection reservoir. Variations on filter structure are contemplated, and markers and methods are compatible with a broad range of filter structures.
A QC marker can be positioned on a collection device based on the information the marker is intended to provide. For example, a marker for measuring the efficiency of sample migration from the overlay (surface) to the plasma collection reservoir is positioned on the overlay such that it co-migrates with the sample to the reservoir following sample deposition on the filter. Quantifying the marker in eluted sample relative to a marker in the collection reservoir, for example, can provide the elution efficiency of the device.
The corresponding marker, for example, having a known mass spectrometry migration offset (e.g., due to isotope labeling or a chemical modification) can be positioned in the reservoir at a known quantity. In certain cases, both markers have a known migration offset from an endogenous molecule from the sample to allow differentiation from the endogenous molecule. After sample elution, the two markers can be quantified using mass spectrometry to determine a ratio representative of the amount or proportion of the marker that is “lost” during sample migration. This, in turn, provides an estimate of the loss of the sample or biomarker in the sample collection process. Alternatively, when at least one QC marker indicates that only a subset of the data is impaired or compromised, the sample data is optionally gated to remove the compromised subset while retaining the remaining data for subsequent analysis. For example, a QC marker may indicate temperature exposure exceeding a threshold that is predicted or known to result in degradation for certain temperature-sensitive proteins. Accordingly, the temperature-sensitive proteins or data corresponding to these proteins can be screened out from further analysis without losing the entire sample or data set.
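As a non-limiting sketch of the two operations just described, the following Python functions compute a migration recovery ratio from a pair of QC markers and gate out temperature-sensitive markers. All names, the marker identifiers, and the temperature values are hypothetical and chosen only for illustration.

```python
def migration_recovery(migrating_signal, reference_signal,
                       migrating_loaded, reference_loaded):
    """Estimate the fraction of a co-migrating QC marker that reached the eluate.

    The migrating marker is deposited on the overlay; the reference marker is
    placed in the reservoir at a known quantity. Each signal is normalized to
    the quantity loaded, and the ratio of the two yields is returned.
    """
    if migrating_loaded <= 0 or reference_loaded <= 0 or reference_signal <= 0:
        raise ValueError("loaded quantities and reference signal must be positive")
    if migrating_signal < 0:
        raise ValueError("migrating_signal must be non-negative")
    migrating_yield = migrating_signal / migrating_loaded
    reference_yield = reference_signal / reference_loaded
    return migrating_yield / reference_yield


def gate_temperature_sensitive(measurements, max_temp_c, threshold_c,
                               sensitive_markers):
    """Drop temperature-sensitive markers when the exposure threshold is exceeded.

    measurements: dict mapping marker name to measured level.
    sensitive_markers: hypothetical set of marker names treated as
        temperature-sensitive.
    """
    if max_temp_c <= threshold_c:
        return dict(measurements)
    return {name: level for name, level in measurements.items()
            if name not in sensitive_markers}
```

The proportion “lost” during migration is then one minus the returned recovery, and the gated dictionary retains every marker not flagged as sensitive.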
Internal standards can be used to evaluate a QC metric. An internal standard can be used to generate a calibration curve of multiple dilutions of a known amount of a marker. This calibration curve can be used to evaluate the sensitivity, dynamic range, and other indicators of the assay performance. For example, a calibration curve may indicate a loss of signal when the quantity of a marker is below a certain threshold. This information can be used to adjust the assay or sample processing as described above such as, for example, discarding the sample and/or gating or removing data for markers that fall below the threshold.
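For illustration only, a calibration curve of the kind described above can be sketched as an ordinary least-squares line through a dilution series, with measurements below a lower limit gated out. The linear model, the function names, and the lower limit are assumptions of this sketch; other curve forms are equally compatible with the disclosure.

```python
def fit_calibration(amounts, responses):
    """Fit response = slope * amount + intercept by ordinary least squares."""
    n = len(amounts)
    if n < 2 or n != len(responses):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(amounts) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("calibration amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(response, slope, intercept, lower_limit):
    """Convert a response to an amount; return None below the lower limit.

    lower_limit is a hypothetical threshold below which the calibration
    curve indicates a loss of signal.
    """
    amount = (response - intercept) / slope
    return amount if amount >= lower_limit else None
```

A value of None signals that the marker fell below the usable range and may be gated or removed as described above.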
Some embodiments involve machine learning as a component of database analysis, and accordingly some computer systems are configured to comprise a module having a machine learning capacity. Machine learning modules often comprise at least one of the following listed modalities, so as to constitute a machine learning functionality.
Modalities that constitute machine learning variously demonstrate a data filtering capacity, so as to be able to perform automated mass spectrometric data spot detection and calling. This modality is in some cases facilitated by the presence of marker polypeptides, such as heavy isotope labeled polypeptides or other markers in a mass spectrometric analysis output, so that native peptides are readily identified and in some cases quantified. The markers are optionally added to samples prior to proteolytic digestion or subsequent to proteolytic digestion. Markers are in some embodiments present on a solid backing onto which a blood spot or other sample is deposited for storage or transfer prior to analysis via mass spectrometry.
Modalities that constitute machine learning variously demonstrate a data treatment or data processing capacity, so as to render called data spots in a form conducive to downstream analysis. Examples of data treatment include but are not necessarily limited to log transformation, assigning of scaling ratios, or mapping data to crafted features so as to render the data in a form that is conducive to downstream analysis.
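One non-limiting example of such data treatment combines a scaling ratio with a log transformation: the peak area of a native peptide is divided by the peak area of its labeled standard, and the ratio is expressed on a log2 scale. The function name and the choice of base 2 are assumptions of this sketch.

```python
import math


def log2_native_to_standard_ratio(native_area, standard_area):
    """Return log2(native / standard) for a pair of peak areas.

    Both areas must be positive; zero or negative areas cannot be
    log-transformed and are rejected rather than silently replaced.
    """
    if native_area <= 0 or standard_area <= 0:
        raise ValueError("peak areas must be positive")
    return math.log2(native_area / standard_area)
```

On this scale a value of 0 indicates equal native and standard signal, and each unit corresponds to a two-fold difference.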
Machine learning data analysis components as disclosed herein regularly process a wide range of features in a mass spectrometric data set, such as 1 to 10,000 features, or 2 to 300,000 features, or a number of features within either of these ranges or higher than either of these ranges. In some cases, data analysis involves at least 1k, 2k, 3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 20k, 30k, 40k, 50k, 60k, 70k, 80k, 90k, 100k, 120k, 140k, 160k, 180k, 200k, 220k, 240k, 260k, 280k, 300k, or more than 300k features.
Features are selected using any number of approaches consistent with the disclosure herein. In some cases, feature selection comprises elastic net, information gain, random forest imputing or other feature selection approaches consistent with the disclosure herein and familiar to one of skill in the art.
Selected features are assembled into classifiers, again using any number of approaches consistent with the disclosure herein. In some cases, classifier generation comprises logistic regression, SVM, random forest, KNN, or other classifier approaches consistent with the disclosure herein and familiar to one of skill in the art.
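As a minimal, non-limiting sketch of one of the classifier approaches named above, a k-nearest-neighbors (KNN) classifier can be written in a few lines of Python. The function name, the Euclidean distance metric, and the default k are assumptions of this sketch, and any feature values supplied to it in testing are toy numbers rather than measured data.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify a query feature vector by majority vote of its k nearest neighbors.

    train_features: sequence of equal-length numeric feature vectors.
    train_labels: one label per training vector (e.g., a reference status).
    query: feature vector for the sample to classify.
    """
    if len(train_features) != len(train_labels):
        raise ValueError("features and labels must have the same length")
    if k < 1 or k > len(train_features):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query.
    ranked = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With two classes, an odd k avoids tied votes. In practice, features would typically be scaled before distances are computed so that no single marker dominates.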
Machine learning approaches variously comprise implementation of at least one approach selected from the list consisting of ADTree, BFTree, ConjunctiveRule, DecisionStump, Filtered Classifier, J48, J48Graft, JRip, LADTree, NNge, OneR, OrdinalClassClassifier, PART, Ridor, SimpleCart, Random Forest and SVM.
Applying machine learning, or providing a machine learning module on a computer configured for the analyses disclosed herein, allows for the detection of relevant panels for asymptomatic disease detection or early detection as part of an ongoing monitoring procedure, so as to identify a disease or disorder either ahead of symptom development or while intervention is either more easily accomplished or more likely to bring about a successful outcome. Monitoring is often but not necessarily performed in combination with or in support of a genetic assessment indicating a genetic predisposition for a disorder for which a signature of onset or progression is monitored. Similarly, in some cases machine learning is used to facilitate monitoring of or assessment of treatment efficacy for a treatment regimen, such that the treatment regimen can be modified over time, continued or resolved as indicated by the ongoing proteomics mediated monitoring.
Machine learning approaches and computer systems having modules configured to execute machine learning algorithms facilitate identification of classifiers or panels in datasets of varying complexity. In some cases the classifiers or panels are identified from an untargeted database comprising a large amount of mass spectrometric data, such as data obtained from a single individual at multiple time points, samples taken from multiple individuals such as multiple individuals of a known status for a condition of interest or known eventual treatment outcome or response, or from multiple time points and multiple individuals.
Alternately, in some cases machine learning facilitates the refinement of a panel through the analysis of a database targeted to that panel, by for example collecting panel information for that panel from a single individual over multiple time points, when a health condition for the individual is known for the time points, or collecting panel information from multiple individuals of known status for a condition of interest, or collecting panel information from multiple individuals at multiple time points. As is readily apparent, in some cases collection of panel information is facilitated through the use of mass markers, such as heavy-labeled or ‘light-labeled’ mass markers that migrate so as to identify nearby unlabeled spots corresponding to the marked polypeptides. Thus, panel information is collected either alone or in combination with untargeted mass spectrometric data collection. Panel data is subjected to machine learning, for example on a computer system configured as disclosed herein, so as to identify a subset of panel markers that either alone or in combination with one or more non-panel markers analyzed through an untargeted approach, account for a health status signal. Thus, machine learning in some cases facilitates identification of a panel that is individually informative of a health status in an individual.
Methods, databases and computers configured to receive mass spectrometric data as disclosed herein often involve processing mass spectrometric data sets that are spatially, temporally or spatially and temporally large. That is, datasets are generated that in some cases comprise large amounts of mass spectrometric data points per sample collected, are generated from large numbers of collected samples, and are in some cases generated from multiple samples derived from a single individual.
Data collection is in some cases facilitated by depositing samples such as dried blood samples (or other readily obtained samples such as urine, sweat, saliva or other fluid or tissue) onto a solid framework such as a solid backing or solid three-dimensional framework. The sample such as a blood sample is deposited on the solid backing or framework, where it is actively or passively dried, facilitating storage or transport from a collection point to a location where it may be processed.
As disclosed herein, a number of approaches are available for recovering proteomic or other biomarker information from a dried sample such as a dried blood spot sample. In some cases samples are solubilized, for example in TFE, and subjected to proteolysis to generate fragments to be visualized by mass spectrometric analysis. Proteolysis is accomplished by enzymatic or non-enzymatic treatment. Exemplary proteases include trypsin, but also enzymes such as proteinase K, enteropeptidase, furin, liprotamase, bromelain, serratipeptidase, thermolysin, collagenase, plasmin, or any number of serine proteases, cysteine proteases or other specific or nonspecific enzymatic peptidases, used singly or in combination. Nonenzymatic protease treatments, such as high temperature, pH treatment, cyanogen bromide and other treatments are also consistent with some embodiments.
When particular mass spectrometric fragments are of interest or use in analysis, such as a biomarker panel indicative of a health condition status, it is often beneficial to include heavy-labeled or other markers as standard markers as described herein. Markers, as discussed, migrate on a mass spectrometric output at a known position and at a known offset relative to the sample fragments of interest. Inclusion of these markers often leads to ‘offset doublets’ in mass spectrometric output. By detecting these doublets, one can readily, either personally or through an automated data analysis workflow, identify particular spots of interest to a health condition status among and in addition to the full range of mass spectrometric output data. When the markers have known mass and amount, and optionally when the amount loaded into a sample varies among markers, the markers are also useful as mass standards, facilitating quantification of both the marker-associated fragments and the remaining fragments in the mass spectrometric output.
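An automated workflow for detecting such ‘offset doublets’ can be sketched, by way of non-limiting example, as a search for pairs of peaks separated by the known offset within a tolerance. The function name, the tolerance, and the representation of peaks as a list of m/z values are assumptions of this sketch.

```python
def find_offset_doublets(mz_values, offset, tolerance=0.01):
    """Return (native, marker) m/z pairs separated by a known offset.

    offset: expected m/z difference between a marker and its native
        fragment. For a mass label this is the mass shift divided by
        the charge state of the ion.
    tolerance: hypothetical allowed deviation from the expected offset.
    """
    if offset <= 0 or tolerance < 0:
        raise ValueError("offset must be positive and tolerance non-negative")
    peaks = sorted(mz_values)
    doublets = []
    for i, native in enumerate(peaks):
        for marker in peaks[i + 1:]:
            delta = marker - native
            if delta > offset + tolerance:
                # Peaks are sorted, so no later peak can match this native peak.
                break
            if abs(delta - offset) <= tolerance:
                doublets.append((native, marker))
    return doublets
```

Peaks without a partner at the expected offset are simply not reported, which separates marked fragments of interest from the remainder of the output.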
Standard markers are introduced to a sample either at collection, during or subsequent to resolubilization, prior to digestion or subsequent to digestion. That is, in some cases a sample collection structure such as a solid backing or a three-dimensional volume is ‘pre-loaded’ so as to have a standard marker or standard markers present prior to sample collection. Alternately, the standard markers are added to the collection structure subsequent to sample collection, subsequent to sample drying on the structure, during or subsequent to sample collection, during or subsequent to sample resolubilization, or during or subsequent to sample proteolysis treatment. In preferred embodiments, exactly or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, or more than 300 standard markers are added to a collection structure prior to sample collection, such that standard processing of the sample results in a mass spectrometric output having the standard markers included in the output without any additional processing of the sample. Accordingly, some methods disclosed herein comprise providing a collection device having sample markers introduced onto the surface prior to sample collection, and some devices or computer systems are configured to receive mass spectrometric data having standard markers included therein, and optionally to identify the mass spectrometric markers and their corresponding native mass fragment.
As used in the specification and claims, the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a sample” includes a plurality of samples, including mixtures thereof.
The terms “determining”, “measuring”, “evaluating”, “assessing,” “assaying,” and “analyzing” are often used interchangeably herein to refer to forms of measurement, and include determining if an element is present or not (for example, detection). These terms can include quantitative, qualitative or quantitative and qualitative determinations. Assessing is alternatively relative or absolute. “Detecting the presence of” includes determining the amount of something present, as well as determining whether it is present or absent.
The terms “panel”, “biomarker panel”, and “protein panel” are used interchangeably herein to refer to a set of biomarkers, wherein the set of biomarkers comprises at least two biomarkers. Exemplary biomarkers are proteins or polypeptide fragments of proteins that are uniquely or confidently mapped to particular proteins. However, additional biomarkers are also contemplated, for example age or gender of the individual providing a sample. The biomarker panel is often predictive and/or informative of a subject's health status, disease, or condition.
The “level” of a biomarker panel refers to the absolute and relative levels of the panel's constituent markers and the relative pattern of the panel's constituent biomarkers.
The terms “colorectal cancer” and “CRC” are used interchangeably herein. The terms “colorectal cancer status” and “CRC status” can refer to the status of the disease in a subject. Examples of types of CRC statuses include, but are not limited to, the subject's risk of cancer, including colorectal carcinoma, the presence or absence of disease (for example, adenocarcinoma), the stage of disease in a patient (for example, carcinoma), and the effectiveness of treatment of disease. In some cases, a health status is the presence or absence of an adenoma or polyp that is pre-CRC.
The term “mass spectrometer” can refer to a gas phase ion spectrometer that measures a parameter that can be translated into mass-to-charge (m/z) ratios of gas phase ions. Mass spectrometers generally include an ion source and a mass analyzer. Examples of mass spectrometers are time-of-flight, magnetic sector, quadrupole filter, ion trap, ion cyclotron resonance, electrostatic sector analyzer and hybrids of these. “Mass spectrometry” can refer to the use of a mass spectrometer to detect gas phase ions.
The terms “biomarker” and “marker” are used interchangeably herein, and can refer to a polypeptide, gene, or nucleic acid (for example, DNA and/or RNA) which is differentially present in a sample taken from a subject having a disease for which a diagnosis is desired (for example, CRC), or to other data obtained from the subject with or without sample acquisition, such as patient age information or patient gender information, as compared to a comparable sample or comparable data taken from a control subject that does not have the disease (for example, a person with a negative diagnosis or undetectable CRC, a normal or healthy subject, or, for example, the same individual at a different time point). Common biomarkers herein include proteins, or protein fragments that are uniquely or confidently mapped to a particular protein (or, in cases such as SAA, above, a pair or group of closely related proteins), transition ion of an amino acid sequence, or one or more modifications of a protein such as phosphorylation, glycosylation or other post-translational or co-translational modification. In addition, a protein biomarker can be a binding partner of a protein, protein fragment, or transition ion of an amino acid sequence.
The terms “polypeptide,” “peptide” and “protein” are often used interchangeably herein in reference to a polymer of amino acid residues. A protein, generally, refers to a full-length polypeptide as translated from a coding open reading frame, or as processed to its mature form, while a polypeptide or peptide informally refers to a degradation fragment or a processing fragment of a protein that nonetheless uniquely or identifiably maps to a particular protein. A polypeptide can be a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. Polypeptides can be modified, for example, by the addition of carbohydrate, phosphorylation, etc. Proteins can comprise one or more polypeptides.
An “immunoassay” is an assay that uses an antibody to specifically bind an antigen (for example, a marker). The immunoassay can be characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
An “aptamer assay” is an assay that uses an oligonucleotide (e.g., DNA, RNA, or a nucleic acid analogue such as peptide nucleic acid, morpholino, glycol nucleic acid, or threose nucleic acid) or a peptide molecule to specifically bind a target (for example, a protein or peptide biomarker). The aptamer assay can be characterized by the use of specific binding properties of a particular aptamer molecule to isolate, target, and/or quantify the target.
The term “antibody” can refer to a polypeptide ligand substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope. Antibodies exist, for example, as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases.
The term “tumor” can refer to a solid or fluid-filled lesion or structure that may be formed by cancerous or non-cancerous cells, such as cells exhibiting aberrant cell growth or division. The terms “mass” and “nodule” are often used synonymously with “tumor”. Tumors include malignant tumors or benign tumors. An example of a malignant tumor can be a carcinoma which is known to comprise transformed cells.
The terms “subject,” “individual,” or “patient” are often used interchangeably herein. A “subject” can be a biological entity containing expressed genetic materials. The biological entity can be a plant, animal, or microorganism, including, for example, bacteria, viruses, fungi, and protozoa. The subject can be tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro. The subject can be a mammal. The mammal can be a human. The subject may be diagnosed or suspected of being at high risk for a disease. The disease can be cancer. The cancer can be colorectal cancer (CRC). In some cases, the subject is not necessarily diagnosed or suspected of being at high risk for the disease.
The term “specificity,” or true negative rate, can refer to a test's ability to exclude a condition correctly. For example, in a diagnostic test, the specificity of a test is the proportion of patients known not to have the disease who will test negative for it. In some cases, this is calculated by determining the proportion of true negatives (i.e., patients who test negative who do not have the disease) to the total number of healthy individuals in the population (i.e., the sum of patients who test negative and do not have the disease and patients who test positive and do not have the disease).
The term “sensitivity,” or true positive rate, can refer to a test's ability to identify a condition correctly. For example, in a diagnostic test, the sensitivity of a test is the proportion of patients known to have the disease who will test positive for it. In some cases, this is calculated by determining the proportion of true positives (i.e., patients who test positive who have the disease) to the total number of individuals in the population with the condition (i.e., the sum of patients who test positive and have the condition and patients who test negative and have the condition).
The quantitative relationship between sensitivity and specificity can change as different diagnostic cut-offs are chosen. This variation can be represented using ROC curves. The x-axis of a ROC curve shows the false-positive rate of an assay, which can be calculated as (1−specificity). The y-axis of a ROC curve reports the sensitivity for an assay. This allows one to easily determine a sensitivity of an assay for a given specificity, and vice versa.
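By way of non-limiting illustration, the definitions of sensitivity, specificity, and ROC curve coordinates given above can be sketched in a few lines of Python. The function names, the labels, and the scores below are hypothetical and chosen only for illustration; they are not data from any study described herein. The sketch assumes at least one individual with the condition and at least one without it, and treats a score at or above the cut-off as a positive test.

```python
from typing import List, Tuple


def sensitivity_specificity(labels: List[int], predictions: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); 1 = has the condition / tests positive, 0 = otherwise."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # true positives over all individuals with the condition
    specificity = tn / (tn + fp)  # true negatives over all individuals without the condition
    return sensitivity, specificity


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, sensitivity) pairs, one per candidate cut-off."""
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        predictions = [1 if s >= cutoff else 0 for s in scores]
        sens, spec = sensitivity_specificity(labels, predictions)
        points.append((1.0 - spec, sens))  # x-axis is (1 - specificity), y-axis is sensitivity
    return points


# Hypothetical example: four individuals with the condition, four without.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, [1 if s >= 0.6 else 0 for s in scores])
print(sens, spec)  # 0.75 0.75
for fpr, tpr in roc_points(labels, scores):
    print(f"false-positive rate={fpr:.2f}  sensitivity={tpr:.2f}")
```

Each cut-off yields one point of the ROC curve, so reading the sensitivity that corresponds to a chosen specificity amounts to locating the point whose x-coordinate equals one minus that specificity.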
As used herein, the term ‘about’ a number refers to that number plus or minus 10% of that number. The term ‘about’ a range refers to that range minus 10% of its lowest value and plus 10% of its greatest value.
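As a non-limiting arithmetic illustration of this definition, the bounds implied by ‘about’ can be computed as follows. The helper names are hypothetical, and the sketch assumes non-negative values.

```python
from typing import Tuple


def about_number(value: float) -> Tuple[float, float]:
    """Bounds for 'about' a number: the number plus or minus 10% of itself."""
    margin = 0.10 * value
    return value - margin, value + margin


def about_range(lowest: float, greatest: float) -> Tuple[float, float]:
    """Bounds for 'about' a range: lowest minus 10% of itself, greatest plus 10% of itself."""
    return lowest - 0.10 * lowest, greatest + 0.10 * greatest


low, high = about_number(70.0)          # 'about 70' spans 63 to 77
r_low, r_high = about_range(3.0, 24.0)  # 'about 3 to 24' spans 2.7 to 26.4
```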
As used herein, the terms “treatment” or “treating” are used in reference to a pharmaceutical or other intervention regimen for obtaining beneficial or desired results in the recipient. Beneficial or desired results include but are not limited to a therapeutic benefit and/or a prophylactic benefit. A therapeutic benefit may refer to eradication or amelioration of symptoms or of an underlying disorder being treated. Also, a therapeutic benefit can be achieved with the eradication or amelioration of one or more of the physiological symptoms associated with the underlying disorder such that an improvement is observed in the subject, notwithstanding that the subject may still be afflicted with the underlying disorder. A prophylactic effect includes delaying, preventing, or eliminating the appearance of a disease or condition, delaying or eliminating the onset of symptoms of a disease or condition, slowing, halting, or reversing the progression of a disease or condition, or any combination thereof. For prophylactic benefit, a subject at risk of developing a particular disease, or a subject reporting one or more of the physiological symptoms of a disease, may undergo treatment, even though a diagnosis of this disease may not have been made.
In some embodiments, the platforms, systems, media, and methods described herein include a digital processing device, or use of the same. In further embodiments, the digital processing device includes one or more hardware central processing units (CPUs) or general purpose graphics processing units (GPGPUs) that carry out the device's functions. In still further embodiments, the digital processing device further comprises an operating system configured to perform executable instructions. In some embodiments, the digital processing device is optionally connected to a computer network. In further embodiments, the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web. In still further embodiments, the digital processing device is optionally connected to a cloud computing infrastructure. In other embodiments, the digital processing device is optionally connected to an intranet. In other embodiments, the digital processing device is optionally connected to a data storage device.
In accordance with the description herein, suitable digital processing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, personal digital assistants, video game consoles, and vehicles. Those of skill in the art will recognize that many smartphones are suitable for use in the system described herein. Those of skill in the art will also recognize that select televisions, video players, and digital music players with optional computer network connectivity are suitable for use in the system described herein. Suitable tablet computers include those with booklet, slate, and convertible configurations, known to those of skill in the art.
In some embodiments, the digital processing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device's hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing. Those of skill in the art will also recognize that suitable mobile smart phone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®. Those of skill in the art will also recognize that suitable media streaming device operating systems include, by way of non-limiting examples, Apple TV®, Roku®, Boxee®, Google TV®, Google Chromecast®, Amazon Fire®, and Samsung® HomeSync®. Those of skill in the art will also recognize that suitable video game console operating systems include, by way of non-limiting examples, Sony® PS3®, Sony® PS4®, Microsoft® Xbox 360®, Microsoft Xbox One, Nintendo® Wii®, Nintendo® Wii U®, and Ouya®.
In some embodiments, the device includes a storage and/or memory device. The storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis. In some embodiments, the device is volatile memory and requires power to maintain stored information. In some embodiments, the volatile memory comprises dynamic random-access memory (DRAM). In some embodiments, the device is non-volatile memory and retains stored information when the digital processing device is not powered. In further embodiments, the non-volatile memory comprises flash memory. In some embodiments, the non-volatile memory comprises ferroelectric random access memory (FRAM). In some embodiments, the non-volatile memory comprises phase-change random access memory (PRAM). In other embodiments, the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing based storage. In further embodiments, the storage and/or memory device is a combination of devices such as those disclosed herein.
In some embodiments, the digital processing device includes a display to send visual information to a user. In some embodiments, the display is a cathode ray tube (CRT). In some embodiments, the display is a liquid crystal display (LCD). In further embodiments, the display is a thin film transistor liquid crystal display (TFT-LCD). In some embodiments, the display is an organic light emitting diode (OLED) display. In various further embodiments, an OLED display is a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display. In some embodiments, the display is a plasma display. In other embodiments, the display is a video projector. In still further embodiments, the display is a combination of devices such as those disclosed herein.
In some embodiments, the digital processing device includes an input device to receive information from a user. In some embodiments, the input device is a keyboard. In some embodiments, the input device is a pointing device including, by way of non-limiting examples, a mouse, trackball, track pad, joystick, game controller, or stylus. In some embodiments, the input device is a touch screen or a multi-touch screen. In other embodiments, the input device is a microphone to capture voice or other sound input. In other embodiments, the input device is a video camera or other sensor to capture motion or visual input. In further embodiments, the input device is a Kinect, Leap Motion, or the like. In still further embodiments, the input device is a combination of devices such as those disclosed herein.
In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device. In further embodiments, a computer readable storage medium is a tangible component of a digital processing device. In still further embodiments, a computer readable storage medium is optionally removable from a digital processing device. In some embodiments, a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
In some embodiments, the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable in the digital processing device's CPU, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.
The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
In some embodiments, a computer program includes a web application. In light of the disclosure provided herein, those of skill in the art will recognize that a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems. In some embodiments, a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR). In some embodiments, a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, and XML database systems. In further embodiments, suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, MySQL™, and Oracle®. Those of skill in the art will also recognize that a web application, in various embodiments, is written in one or more versions of one or more languages. A web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof. In some embodiments, a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or eXtensible Markup Language (XML). In some embodiments, a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS). In some embodiments, a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®. In some embodiments, a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy.
In some embodiments, a web application is written to some extent in a database query language such as Structured Query Language (SQL). In some embodiments, a web application integrates enterprise server products such as IBM® Lotus Domino®. In some embodiments, a web application includes a media player element. In various further embodiments, a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.
In some embodiments, a computer program includes a mobile application provided to a mobile digital processing device. In some embodiments, the mobile application is provided to a mobile digital processing device at the time it is manufactured. In other embodiments, the mobile application is provided to a mobile digital processing device via the computer network described herein.
In view of the disclosure provided herein, a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples, C, C++, C#, Objective-C, Java™, JavaScript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and Phonegap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
Those of skill in the art will recognize that several commercial forums are available for distribution of mobile applications including, by way of non-limiting examples, Apple® App Store, Google® Play, Chrome Web Store, BlackBerry® App World, App Store for Palm devices, App Catalog for webOS, Windows® Marketplace for Mobile, Ovi Store for Nokia® devices, Samsung® Apps, and Nintendo® DSi Shop.
In some embodiments, a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in. Those of skill in the art will recognize that standalone applications are often compiled. A compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB .NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program. In some embodiments, a computer program includes one or more executable compiled applications.
In some embodiments, the computer program includes a web browser plug-in (e.g., extension, etc.). In computing, a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types. Those of skill in the art will be familiar with several web browser plug-ins including Adobe® Flash® Player, Microsoft® Silverlight®, and Apple® QuickTime®. In some embodiments, the toolbar comprises one or more web browser extensions, add-ins, or add-ons. In some embodiments, the toolbar comprises one or more explorer bars, tool bands, or desk bands.
In view of the disclosure provided herein, those of skill in the art will recognize that several plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of non-limiting examples, C++, Delphi, Java™, PHP, Python™, and VB .NET, or combinations thereof.
Web browsers (also called Internet browsers) are software applications, designed for use with network-connected digital processing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of non-limiting examples, Microsoft® Internet Explorer®, Mozilla® Firefox®, Google® Chrome, Apple® Safari®, Opera Software® Opera®, and KDE Konqueror. In some embodiments, the web browser is a mobile web browser. Mobile web browsers (also called microbrowsers, mini-browsers, and wireless browsers) are designed for use on mobile digital processing devices including, by way of non-limiting examples, handheld computers, tablet computers, netbook computers, subnotebook computers, smartphones, music players, personal digital assistants (PDAs), and handheld video game systems. Suitable mobile web browsers include, by way of non-limiting examples, Google® Android® browser, RIM BlackBerry® Browser, Apple® Safari®, Palm® Blazer, Palm® WebOS Browser, Mozilla® Firefox® for mobile, Microsoft® Internet Explorer® Mobile, Amazon Kindle Basic Web, Nokia® Browser, Opera Software® Opera® Mobile, and Sony® PSP™ browser.
In some embodiments, the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various embodiments, a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof. In further various embodiments, a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof. In various embodiments, the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, and a standalone application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on cloud computing platforms. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for storage and retrieval of biomarker information. In various embodiments, suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object oriented databases, object databases, entity-relationship model databases, associative databases, and XML databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, and Sybase. In some embodiments, a database is internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing-based. In other embodiments, a database is based on one or more local computer storage devices.
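By way of non-limiting illustration, storage and retrieval of biomarker information in a relational database can be sketched with the SQLite module from the Python standard library. The table name, column names, sample identifier, and protein level values below are hypothetical placeholders chosen only for illustration; they are not a schema or data disclosed herein, and any of the database systems listed above could be substituted.

```python
import sqlite3
from typing import Dict

# In-memory database used only for this sketch; a file path or server could be used instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE panel_information (
        sample_id TEXT NOT NULL,
        biomarker TEXT NOT NULL,
        level     REAL NOT NULL,
        PRIMARY KEY (sample_id, biomarker)
    )
    """
)


def store_panel(db: sqlite3.Connection, sample_id: str, levels: Dict[str, float]) -> None:
    """Store one row per biomarker for the given sample."""
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT INTO panel_information (sample_id, biomarker, level) VALUES (?, ?, ?)",
            [(sample_id, name, value) for name, value in levels.items()],
        )


def load_panel(db: sqlite3.Connection, sample_id: str) -> Dict[str, float]:
    """Retrieve the stored levels for a sample as a biomarker-to-level mapping."""
    rows = db.execute(
        "SELECT biomarker, level FROM panel_information WHERE sample_id = ?",
        (sample_id,),
    ).fetchall()
    return dict(rows)


# Hypothetical levels in arbitrary units for the three panel proteins.
store_panel(conn, "sample-001", {"A2GL": 1.2, "ALS": 0.8, "PTPRJ": 2.5})
panel = load_panel(conn, "sample-001")
```

Keying each row on the sample identifier and biomarker name keeps the sketch independent of panel size, so a panel with additional proteins requires no schema change.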
The following embodiments recite nonlimiting permutations of combinations of features disclosed herein. Other permutations of combinations of features are also contemplated. 1. A method of assessing a colorectal health risk status in an individual, comprising steps of obtaining a circulating blood sample from said individual; and obtaining a biomarker panel level for at least one of A2GL, ALS, PTPRJ, and age of said individual, and assessing colorectal health risk status. 2. A method of analyzing a biological sample, comprising: obtaining protein levels in said biological sample for each protein of a biomarker panel comprising A2GL, ALS, and PTPRJ to determine a panel information for said biomarker panel; comparing said panel information to a reference panel information, wherein said reference panel information corresponds to a known colorectal cancer status; and categorizing said biological sample as having a positive colorectal cancer risk status if said panel information does not differ significantly from said reference panel information, wherein said biological sample is derived from a circulating blood sample. 3. The method of embodiment 2, wherein said biomarker panel further comprises at least one of an individual age and an individual gender. 4. The method of embodiment 2, wherein said known colorectal cancer status comprises at least one of early CRC and advanced CRC. 5. The method of embodiment 2, wherein said known colorectal cancer status comprises at least one of advanced adenoma, Stage 0 CRC, stage I CRC, Stage II CRC, stage III CRC, and stage IV CRC. 6. The method of embodiment 2, wherein said biomarker panel comprises no more than 20 proteins. 7. The method of embodiment 2, wherein said biomarker panel comprises no more than 10 proteins. 8. The method of embodiment 2, wherein said categorizing has a sensitivity of at least 70% and a specificity of at least 70%, or a sensitivity of at least 81% and a specificity of at least 78%. 9. 
The method of embodiment 2, further comprising performing a treatment regimen in response to said categorizing. 10. The method of embodiment 9, wherein said treatment regimen comprises at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 11. The method of embodiment 2, further comprising transmitting a report of results of said categorizing to a health practitioner. 12. The method of embodiment 11, wherein said report indicates a sensitivity of at least 70% or at least 81%. 13. The method of embodiment 11, wherein said report indicates a specificity of at least 70% or at least 78%. 14. The method of embodiment 11, wherein said report indicates a recommendation for a treatment regimen comprising at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 15. The method of embodiment 11, wherein said report indicates a recommendation for a colonoscopy. 16. The method of embodiment 11, wherein said report indicates a recommendation for undergoing an independent cancer assay. 17. The method of embodiment 11, wherein said report indicates a recommendation for undergoing a stool cancer assay. 18. The method of embodiment 2, further comprising performing a stool cancer assay in response to said categorizing. 19. The method of embodiment 2, further comprising continued monitoring for a period of 3 months or greater. 20. The method of embodiment 2, further comprising continued monitoring for a period of between 3 months and 24 months. 21. The method of embodiment 2, wherein said obtaining said protein levels comprises subjecting said biological sample to a mass spectrometric analysis. 22. 
The method of embodiment 2, wherein said obtaining said protein levels comprises subjecting said biological sample to an immunoassay analysis. 23. A method of analyzing a biological sample, comprising: obtaining protein levels in said biological sample for each protein of a biomarker panel comprising A2GL, ALS, and PTPRJ to determine a panel information for said biomarker panel; comparing said panel information to a reference panel information, wherein said reference panel information corresponds to a known advanced adenoma status; and categorizing said biological sample as having a positive advanced adenoma risk status if said panel information does not differ significantly from said reference panel information, wherein said biological sample is derived from a circulating blood sample. 24. The method of embodiment 23, wherein said biomarker panel further comprises at least one of an individual age and an individual gender. 25. The method of embodiment 23, wherein said biomarker panel comprises no more than 20 proteins. 26. The method of embodiment 23, wherein said biomarker panel comprises no more than 10 proteins. 27. The method of embodiment 23, wherein said categorizing has a sensitivity of at least 44% and a specificity of at least 80%. 28. The method of embodiment 23, further comprising performing a treatment regimen in response to said categorizing. 29. The method of embodiment 28, wherein said treatment regimen comprises at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 30. The method of embodiment 23, comprising transmitting a report of results of said categorizing to a health practitioner. 31. The method of embodiment 30, wherein said report indicates a sensitivity of at least 70% or at least 81%. 32. The method of embodiment 30, wherein said report indicates a specificity of at least 70% or at least 87%. 33. 
The method of embodiment 30, wherein said report indicates a recommendation for a treatment regimen comprising at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 34. The method of embodiment 30, wherein said report indicates a recommendation for a colonoscopy. 35. The method of embodiment 30, wherein said report indicates a recommendation for undergoing an independent cancer assay. 36. The method of embodiment 30, wherein said report indicates a recommendation for undergoing a stool cancer assay. 37. The method of embodiment 23, further comprising performing a stool cancer assay. 38. The method of embodiment 23, further comprising continued monitoring for a period of 3 months or greater. 39. The method of embodiment 23, further comprising continued monitoring for a period of between 3 months and 24 months. 40. The method of embodiment 23, wherein obtaining said protein levels comprises subjecting said biological sample to a mass spectrometric analysis. 41. The method of embodiment 23, wherein said obtaining said protein levels comprises subjecting said biological sample to an immunoassay analysis. 42. A method of analyzing data generated in vitro, comprising: storing, by a processor, a panel information corresponding to a biological sample, wherein said panel information comprises protein levels for each protein of a biomarker panel comprising A2GL, ALS, and PTPRJ; comparing, by said processor, said panel information to a reference panel information, wherein said reference panel information corresponds to a known colorectal cancer status; and categorizing, by said processor, said panel information as having a positive colorectal cancer risk status if said panel information does not differ significantly from said reference panel information. 43. 
The method of embodiment 42, wherein said biomarker panel further comprises at least one of an individual age and an individual gender. 44. The method of embodiment 42, wherein said known colorectal cancer status comprises at least one of early CRC and advanced CRC. 45. The method of embodiment 42, wherein said known colorectal cancer status comprises at least one of advanced adenoma, Stage 0 CRC, stage I CRC, Stage II CRC, stage III CRC, and stage IV CRC. 46. The method of embodiment 42, wherein said biomarker panel comprises no more than 20 proteins. 47. The method of embodiment 42, wherein said biomarker panel comprises no more than 10 proteins. 48. The method of embodiment 42, wherein said categorizing has a sensitivity of at least 70% and a specificity of at least 70%, or a sensitivity of at least 81% and a specificity of at least 78%. 49. The method of embodiment 42, wherein said processor is further configured to generate a report indicating said positive colorectal cancer risk status. 50. The method of embodiment 49, wherein said report further indicates recommendation for a treatment regimen in response to said categorizing. 51. The method of embodiment 49, wherein said treatment regimen comprises at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 52. The method of embodiment 49, wherein said report indicates a sensitivity of at least 70% or at least 81%. 53. The method of embodiment 49, wherein said report indicates a specificity of at least 70% or at least 78%. 54. The method of embodiment 49, wherein said report indicates recommendation for a colonoscopy. 55. The method of embodiment 49, wherein said report indicates recommendation for undergoing an independent cancer assay. 56. The method of embodiment 49, wherein said report indicates recommendation for undergoing a stool cancer assay. 57. 
A method of analyzing data generated in vitro, comprising: storing a panel information comprising protein levels for each protein of a biomarker panel comprising A2GL, ALS, and PTPRJ; comparing said panel information to a reference panel information, wherein said reference panel information corresponds to a known advanced adenoma status; and categorizing said panel information as having a positive advanced adenoma risk status if said panel information does not differ significantly from said reference panel information. 58. The method of embodiment 57, wherein said biomarker panel further comprises at least one of an individual age and an individual gender. 59. The method of embodiment 57, wherein said biomarker panel comprises no more than 20 proteins. 60. The method of embodiment 57, wherein said biomarker panel comprises no more than 10 proteins. 61. The method of embodiment 57, wherein said categorizing has a sensitivity of at least 70% and a specificity of at least 70%. 62. The method of embodiment 57, further comprising generating a report indicating said positive advanced adenoma status. 63. The method of embodiment 62, wherein said report further indicates recommendation for a treatment regimen in response to said categorizing. 64. The method of embodiment 63, wherein said treatment regimen comprises at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 65. The method of embodiment 62, wherein said report indicates a sensitivity of at least 70%. 66. The method of embodiment 62, wherein said report indicates a specificity of at least 70%. 67. The method of embodiment 62, wherein said report indicates recommendation for a colonoscopy. 68. The method of embodiment 62, wherein said report indicates recommendation for undergoing an independent cancer assay. 69. 
The method of embodiment 62, wherein said report indicates recommendation for undergoing a stool cancer assay. 70. A computer system for analyzing data generated in vitro, comprising: (a) a memory unit for receiving a panel information comprising measurement of protein levels of each protein in a biomarker panel from a biological sample, wherein the biomarker panel comprises A2GL, ALS, and PTPRJ; (b) computer-executable instructions for comparing said panel information to a reference panel information, wherein said reference panel information corresponds to a known colorectal cancer status; and (c) computer-executable instructions for categorizing said panel information as having a positive colorectal cancer status if said panel information does not differ significantly from said reference panel information. 71. The computer system of embodiment 70, further comprising computer-executable instructions to generate a report of said positive colorectal cancer status. 72. The computer system of embodiment 70, wherein said biomarker panel further comprises at least one of an individual age and an individual gender. 73. The computer system of embodiment 70, wherein said known colorectal cancer status comprises at least one of early CRC and advanced CRC. 74. The computer system of embodiment 70, wherein said known colorectal cancer status comprises at least one of advanced adenoma, Stage 0 CRC, stage I CRC, Stage II CRC, stage III CRC, and stage IV CRC. 75. The computer system of embodiment 70, wherein said biomarker panel comprises no more than 20 proteins. 76. The computer system of embodiment 70, wherein said biomarker panel comprises no more than 10 proteins. 77. The computer system of embodiment 70, wherein said categorizing has a sensitivity of at least 70% and a specificity of at least 70%. 78. The computer system of embodiment 70, further comprising generating a report indicating said positive colorectal cancer risk status. 79. 
The computer system of embodiment 78, wherein said report further indicates recommendation for a treatment regimen in response to said categorizing. 80. The computer system of embodiment 79, wherein said treatment regimen comprises at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 81. The computer system of embodiment 78, wherein said report indicates a sensitivity of at least 70%. 82. The computer system of embodiment 78, wherein said report indicates a specificity of at least 70%. 83. The computer system of embodiment 78, wherein said report indicates recommendation for a colonoscopy. 84. The computer system of embodiment 78, wherein said report indicates recommendation for undergoing an independent cancer assay. 85. The computer system of embodiment 79, wherein said report indicates recommendation for undergoing a stool cancer assay. 86. The computer system of embodiment 70, further comprising a user interface configured to communicate or display said report to a user. 87. A computer system for analyzing data generated in vitro: (a) a memory unit for receiving a panel information comprising measurement of protein levels of each protein in a biomarker panel from a biological sample, wherein said biomarker panel comprises A2GL, ALS, and PTPRJ; (b) computer-executable instructions for comparing said panel information to a reference panel information, wherein said reference panel information corresponds to a known advanced adenoma status; and (c) computer-executable instructions for categorizing said panel information as having a positive advanced adenoma status if said panel information does not differ significantly from said reference panel information. 88. The computer system of embodiment 87, wherein said biomarker panel further comprises at least one of an individual age and an individual gender. 89. 
The computer system of embodiment 87, wherein said biomarker panel comprises no more than 20 proteins. 90. The computer system of embodiment 87, wherein said biomarker panel comprises no more than 10 proteins. 91. The computer system of embodiment 87, wherein said categorizing has a sensitivity of at least 70% and a specificity of at least 70%. 92. The computer system of embodiment 87, further comprising computer-executable instructions to generate a report of said positive advanced adenoma status. 93. The computer system of embodiment 92, wherein said report further indicates recommendation for a treatment regimen in response to said categorizing. 94. The computer system of embodiment 93, wherein said treatment regimen comprises at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 95. The computer system of embodiment 92, wherein said report indicates a sensitivity of at least 70%. 96. The computer system of embodiment 92, wherein said report indicates a specificity of at least 70%. 97. The computer system of embodiment 92, wherein said report indicates recommendation for a colonoscopy. 98. The computer system of embodiment 92, wherein said report indicates recommendation for undergoing an independent cancer assay. 99. The computer system of embodiment 92, wherein said report indicates recommendation for undergoing a stool cancer assay. 100. A method of assessing colorectal health of an individual, comprising: obtaining a circulating blood sample from said individual; and detecting protein levels for each member of a list of proteins in said sample, said list of proteins comprising A2GL, ALS, and PTPRJ. 101. 
The method of embodiment 100, further comprising diagnosing said individual as having a colorectal cancer status when said protein levels from said individual do not differ significantly from a reference panel information set corresponding to a known colorectal cancer risk status. 102. The method of embodiment 101, further comprising performing colonoscopy on said individual. 103. The method of embodiment 101, wherein said known colorectal cancer status comprises at least one of early CRC and advanced CRC. 104. The method of embodiment 101, wherein said known colorectal cancer status comprises at least one of advanced adenoma, Stage 0 CRC, stage I CRC, Stage II CRC, stage III CRC, and stage IV CRC. 105. The method of embodiment 101, further performing a treatment regimen upon said individual. 106. The method of embodiment 105, wherein said treatment regimen comprises a polypectomy. 107. The method of embodiment 105, wherein said treatment regimen comprises radiation. 108. The method of embodiment 105, wherein said treatment regimen comprises chemotherapy. 109. The method of embodiment 100, wherein said list of proteins further comprises at least one additional protein selected from Table 1. 110. The method of embodiment 100, wherein said list of proteins further comprises at least two additional proteins selected from Table 1. 111. The method of embodiment 100, wherein said list of proteins further comprises at least three additional proteins selected from Table 1. 112. The method of embodiment 100, further comprising obtaining at least one of an age and a gender of said individual. 113. The method of embodiment 100, further comprising transmitting a report to a health practitioner of results of said detecting. 114. The method of embodiment 113, wherein said report indicates recommendation for a colonoscopy for said individual. 115. The method of embodiment 113, wherein said report indicates recommendation for a polypectomy for said individual. 116. 
The method of embodiment 113, wherein said report indicates recommendation for radiation for said individual. 117. The method of embodiment 113, wherein said report indicates recommendation for chemotherapy for said individual. 118. The method of embodiment 113, wherein said report indicates recommendation for undergoing an independent cancer assay. 119. The method of embodiment 113, wherein said report indicates recommendation for undergoing a stool cancer assay. 120. The method of embodiment 100, wherein said list of proteins comprises no more than 20 proteins. 121. The method of embodiment 100, wherein said list of proteins comprises no more than 10 proteins. 122. A method of assessing colorectal health of an individual, comprising: obtaining a circulating blood sample from said individual; and detecting protein levels for each member of a list of proteins in said sample, said list of proteins comprising A2GL and ALS; and obtaining an age of said individual. 123. The method of embodiment 122, further comprising diagnosing said individual as having a colorectal cancer status when said protein levels from said individual do not differ significantly from a reference panel information set corresponding to a known colorectal cancer risk status. 124. The method of embodiment 123, further comprising performing colonoscopy on said individual. 125. The method of embodiment 123, wherein said known colorectal cancer status comprises at least one of early CRC and advanced CRC. 126. The method of embodiment 123, wherein said known colorectal cancer status comprises at least one of advanced adenoma, Stage 0 CRC, stage I CRC, Stage II CRC, stage III CRC, and stage IV CRC. 127. The method of embodiment 123, further performing a treatment regimen upon said individual. 128. The method of embodiment 127, wherein said treatment regimen comprises polypectomy. 129. The method of embodiment 127, wherein said treatment regimen comprises radiation. 130. 
The method of embodiment 127, wherein said treatment regimen comprises chemotherapy. 131. The method of embodiment 122, wherein said list of proteins further comprises PTPRJ. 132. The method of embodiment 122, wherein said list of proteins further comprises at least one additional protein selected from Table 1. 133. The method of embodiment 122, wherein said list of proteins further comprises at least two additional proteins selected from Table 1. 134. The method of embodiment 122, wherein said list of proteins further comprises each additional protein selected from Table 1. 135. The method of embodiment 122, further comprising obtaining a gender of said individual. 136. The method of embodiment 122, further comprising transmitting a report to a health practitioner of results of said detecting. 137. The method of embodiment 136, wherein said report indicates recommendation for a colonoscopy for said individual. 138. The method of embodiment 136, wherein said report indicates recommendation for a polypectomy for said individual. 139. The method of embodiment 136, wherein said report indicates recommendation for radiation for said individual. 140. The method of embodiment 136, wherein said report indicates recommendation for chemotherapy for said individual. 141. The method of embodiment 136, wherein said report indicates recommendation for undergoing an independent cancer assay. 142. The method of embodiment 136, wherein said report indicates recommendation for undergoing a stool cancer assay. 143. The method of embodiment 122, wherein said list of proteins comprises no more than 15 proteins. 144. The method of embodiment 122, wherein said list of proteins comprises no more than 8 proteins. 145. A method of assessing colorectal health of an individual, comprising: obtaining a circulating blood sample from said individual; and detecting protein levels for each member of a list of proteins in the sample, said list of proteins comprising A2GL and ALS. 146. 
The method of embodiment 145, further comprising diagnosing said individual as having an advanced adenoma status when said protein levels from said individual do not differ significantly from a reference panel information set corresponding to a known advanced adenoma risk status. 147. The method of embodiment 146, further comprising performing colonoscopy on said individual. 148. The method of embodiment 146, further performing a treatment regimen upon said individual. 149. The method of embodiment 148, wherein said treatment regimen comprises polypectomy. 150. The method of embodiment 148, wherein said treatment regimen comprises radiation. 151. The method of embodiment 148, wherein said treatment regimen comprises chemotherapy. 152. The method of embodiment 145, wherein said list of proteins further comprises PTPRJ. 153. The method of embodiment 145, wherein said list of proteins further comprises at least one additional protein selected from Table 1. 154. The method of embodiment 145, wherein said list of proteins further comprises at least two additional proteins selected from Table 1. 155. The method of embodiment 145, wherein said list of proteins further comprises each additional protein selected from Table 1. 156. The method of embodiment 145, further comprising obtaining a gender of said individual. 157. The method of embodiment 145, further comprising transmitting a report to a health practitioner of results of said detecting. 158. The method of embodiment 157, wherein said report indicates recommendation for a colonoscopy for said individual. 159. The method of embodiment 157, wherein said report indicates recommendation for a polypectomy for said individual. 160. The method of embodiment 157, wherein said report indicates recommendation for radiation for said individual. 161. The method of embodiment 157, wherein said report indicates recommendation for chemotherapy for said individual. 162. 
The method of embodiment 157, wherein said report indicates recommendation for undergoing an independent cancer assay. 163. The method of embodiment 157, wherein said report indicates recommendation for undergoing a stool cancer assay. 164. The method of embodiment 145, wherein said list of proteins comprises no more than 15 proteins. 165. The method of embodiment 145, wherein said list of proteins comprises no more than 8 proteins. 166. A method of assessing colorectal health of an individual, comprising: obtaining a circulating blood sample from said individual; detecting protein levels for each member of a list of proteins in said sample, said list of proteins comprising A2GL and ALS; and obtaining an age of said individual. 167. The method of embodiment 166, further comprising diagnosing said individual as having an advanced adenoma status when said protein levels from said individual do not differ significantly from a reference panel information set corresponding to a known advanced adenoma risk status. 168. The method of embodiment 167, further comprising performing colonoscopy on said individual. 169. The method of embodiment 167, further performing a treatment regimen upon said individual. 170. The method of embodiment 169, wherein said treatment regimen comprises polypectomy. 171. The method of embodiment 169, wherein said treatment regimen comprises radiation. 172. The method of embodiment 169, wherein said treatment regimen comprises chemotherapy. 173. The method of embodiment 166, wherein said list of proteins further comprises PTPRJ. 174. The method of embodiment 173, wherein said list of proteins further comprises at least one additional protein selected from Table 1. 175. The method of embodiment 166, further comprising obtaining a gender of said individual. 176. The method of embodiment 166, further comprising transmitting a report to a health practitioner of results of said detecting. 177. 
The method of embodiment 176, wherein said report indicates recommendation for a colonoscopy for said individual. 178. The method of embodiment 176, wherein said report indicates recommendation for a polypectomy for said individual. 179. The method of embodiment 176, wherein said report indicates recommendation for radiation for said individual. 180. The method of embodiment 176, wherein said report indicates recommendation for chemotherapy for said individual. 181. The method of embodiment 176, wherein said report indicates recommendation for undergoing an independent cancer assay. 182. The method of embodiment 176, wherein said report indicates recommendation for undergoing a stool cancer assay. 183. The method of embodiment 166, wherein said list of proteins comprises no more than 20 proteins. 184. The method of embodiment 166, wherein said list of proteins comprises no more than 10 proteins. 185. A method of assessing colorectal health of an individual, comprising: obtaining a circulating blood sample from said individual; detecting protein levels for each member of a list of proteins in said sample, said list of proteins comprising A2GL and ALS. 186. The method of embodiment 185, further comprising diagnosing said individual as having a colorectal cancer status when said protein levels from said individual do not differ significantly from a reference panel information set corresponding to a known colorectal cancer risk status. 187. The method of embodiment 185 or 186, further comprising performing colonoscopy on said individual. 188. The method of any one of embodiments 185 to 187, further performing a treatment regimen upon said individual. 189. The method of embodiment 188, wherein said treatment regimen comprises polypectomy. 190. The method of embodiment 188, wherein said treatment regimen comprises radiation. 191. The method of embodiment 188, wherein said treatment regimen comprises chemotherapy. 192. 
The method of embodiment 185, wherein said list of proteins further comprises PTPRJ. 193. The method of embodiment 185, wherein said list of proteins further comprises at least one additional protein selected from Table 1. 194. The method of embodiment 185, comprising obtaining age information for said individual. 195. The method of embodiment 185, comprising obtaining gender information for said individual. 196. The method of embodiment 185, comprising obtaining age information and gender information for said individual. 197. The method of any one of embodiments 185 to 196, further comprising transmitting a report to a health practitioner of results of said detecting. 198. The method of any one of embodiments 195 to 197, further comprising diagnosing said individual as having a colorectal cancer status when said protein levels, age and gender from said individual as a whole do not differ significantly from a reference panel information set corresponding to a known colorectal cancer risk status. 199. The method of embodiment 185, wherein said report indicates recommendation for a colonoscopy for said individual. 200. The method of embodiment 197, wherein said report indicates recommendation for a polypectomy for said individual. 201. The method of embodiment 197, wherein said report indicates recommendation for radiation for said individual. 202. The method of embodiment 197, wherein said report indicates recommendation for chemotherapy for said individual. 203. The method of embodiment 197, wherein said report indicates recommendation for undergoing an independent cancer assay. 204. The method of embodiment 197, wherein said report indicates recommendation for undergoing a stool cancer assay. 205. The method of any one of embodiments 185 to 204, wherein said list of proteins comprises no more than 20 proteins. 206. The method of embodiment 185, wherein said list of proteins comprises no more than 10 proteins. 207. 208. 
A method of assessing colorectal health of an individual, comprising: obtaining a circulating blood sample from said individual; detecting protein levels for each member of a list of proteins in said sample, said list of proteins comprising A2GL and ALS. 209. The method of embodiment 208, further comprising diagnosing said individual as having an advanced adenoma status when said protein levels from said individual do not differ significantly from a reference panel information set corresponding to a known advanced adenoma risk status. 210. The method of embodiment 208 or 209, further comprising performing colonoscopy on said individual. 211. The method of any one of embodiments 208 to 210, further performing a treatment regimen upon said individual. 212. The method of embodiment 211, wherein said treatment regimen comprises polypectomy. 213. The method of embodiment 211, wherein said treatment regimen comprises radiation. 214. The method of embodiment 211, wherein said treatment regimen comprises chemotherapy. 215. The method of embodiment 208, wherein said list of proteins further comprises PTPRJ. 216. The method of embodiment 208, wherein said list of proteins further comprises at least one additional protein selected from Table 1. 217. The method of embodiment 208, comprising obtaining age information for said individual. 218. The method of embodiment 208, comprising obtaining gender information for said individual. 219. The method of embodiment 208, comprising obtaining age information and gender information for said individual. 220. The method of any one of embodiments 208 to 219, further comprising transmitting a report to a health practitioner of results of said detecting. 221. 
The method of any one of embodiments 208 to 219, further comprising diagnosing said individual as having an advanced adenoma status when said protein levels and age from said individual as a whole do not differ significantly from a reference panel information set corresponding to a known advanced adenoma risk status. 222. The method of embodiment 220, wherein said report indicates recommendation for a colonoscopy for said individual. 223. The method of embodiment 220, wherein said report indicates recommendation for a polypectomy for said individual. 224. The method of embodiment 220, wherein said report indicates recommendation for radiation for said individual. 225. The method of embodiment 220, wherein said report indicates recommendation for chemotherapy for said individual. 226. The method of embodiment 220, wherein said report indicates recommendation for undergoing an independent cancer assay. 227. The method of embodiment 220, wherein said report indicates recommendation for undergoing a stool cancer assay. 228. The method of any one of embodiments 208 to 227, wherein said list of proteins comprises no more than 20 proteins. 229. The method of any one of embodiments 208 to 227, wherein said list of proteins comprises no more than 10 proteins. 230. A method of generating a biomarker panel for assessing a health status, comprising: a) identifying candidate biomarkers having an association with the health status; and b) performing mass spectrometric processing on at least a fragment of a plurality of candidate biomarker proteins derived from the candidate biomarkers to determine biomarkers suitable for assessing a health status; wherein the processing comprises at least one process control step. 231. The method of embodiment 230, wherein the at least one process control step comprises using at least one system suitability test (SST) run to assess liquid chromatography (LC) and mass spectrometry (MS) performance prior to the mass spectrometric processing. 232. 
The method of embodiment 231, wherein the SST comprises determining LC-MS performance by running a SIS standard curve in log-serial dilution. 233. The method of embodiment 232, further comprising performing a quality control check requiring at least about a 10-fold difference in MS signal between any two adjacent concentration levels, and a dynamic range of approximately four log units across the standard curve. 234. The method of embodiment 231, wherein the SST comprises determining LC performance by monitoring heavy transitions of internal standards for RT stability. 235. The method of embodiment 234, wherein monitoring heavy transitions comprises tracking RT shift between a detected value and a scheduled RT. 236. The method of embodiment 235, further comprising performing a quality control check requiring the upper 95% confidence interval of RTs of heavy transitions are no more than 6 seconds from the margins of LC-MS acquisition windows. 237. The method of embodiment 230, wherein the at least one process control step comprises monitoring flow-through AUC during immunodepletion, monitoring of TPA results for sample processing and immunodepletion efficiency, sample preparation customization depending on TPA result of each individual sample, or any combination thereof. 238. The method of embodiment 230, further comprising analyzing results of the mass spectrometric processing. 239. The method of embodiment 238, wherein the step of analyzing results comprises filtering transitions based on quantitative performance and peak quality. 240. The method of embodiment 239, wherein peak quality is evaluated using a peak quality tool. 241. The method of embodiment 230, wherein identifying candidate biomarkers comprises at least one of: obtaining biomarkers from an internal biomarker dataset, obtaining biomarkers from public biomarker datasets, or conducting a semi-automated literature search to identify biomarkers associated with the health condition. 242. 
The method of embodiment 241, wherein the step of analyzing results comprises requiring transitions to have labeled peaks in every processed sample. 243. The method of embodiment 230, wherein the at least one process control step comprises evaluating transitions for quantitative performance, peak quality, and the presence of labeled peaks in every processed sample. 244. The method of embodiment 230, wherein the at least one process control step comprises evaluating heavy and light transition pairs for at least one quantitative metric comprising heavy transition specificity, signal to noise ratio, precision, linearity, light transition specificity, or any combination thereof. 245. The method of any one of embodiments 230-244, further comprising evaluating only transitions that passed the at least one process control step. 246. A system for generating a biomarker panel for assessing a health status, comprising: a) a module identifying candidate biomarkers having an association with the health status; and b) a module performing mass spectrometric processing on at least a fragment of a plurality of candidate biomarker proteins derived from the candidate biomarkers to determine biomarkers suitable for assessing a health status; wherein the processing comprises at least one process control step. 247. The system of embodiment 246, wherein the at least one process control step comprises using at least one system suitability test (SST) run to assess liquid chromatography (LC) and mass spectrometry (MS) performance prior to the mass spectrometric processing. 248. The system of embodiment 247, wherein the SST comprises determining LC-MS performance by running a SIS standard curve in log-serial dilution. 249. The system of embodiment 248, further comprising performing a quality control check requiring at least about a 10-fold difference in MS signal between any two adjacent concentration levels, and a dynamic range of approximately four log units across the standard curve. 
250. The system of embodiment 247, wherein the SST comprises determining LC performance by monitoring heavy transitions of internal standards for RT stability. 251. The system of embodiment 250, wherein monitoring heavy transitions comprises tracking RT shift between a detected value and a scheduled RT. 252. The system of embodiment 251, further comprising performing a quality control check requiring the upper 95% confidence interval of RTs of heavy transitions are no more than 6 seconds from the margins of LC-MS acquisition windows. 253. The system of embodiment 246, wherein the at least one process control step comprises monitoring flow-through AUC during immunodepletion, monitoring of TPA results for sample processing and immunodepletion efficiency, sample preparation customization depending on TPA result of each individual sample, or any combination thereof. 254. The system of embodiment 246, further comprising analyzing results of the mass spectrometric processing. 255. The system of embodiment 254, wherein the step of analyzing results comprises filtering transitions based on quantitative performance and peak quality. 256. The system of embodiment 255, wherein peak quality is evaluated using a peak quality tool. 257. The system of embodiment 246, wherein identifying candidate biomarkers comprises at least one of: obtaining biomarkers from an internal biomarker dataset, obtaining biomarkers from public biomarker datasets, or conducting a semi-automated literature search to identify biomarkers associated with the health condition. 258. The system of embodiment 257, wherein the step of analyzing results comprises requiring transitions to have labeled peaks in every processed sample. 259. The system of embodiment 246, wherein the at least one process control step comprises evaluating transitions for quantitative performance, peak quality, and the presence of labeled peaks in every processed sample. 260. 
The system of embodiment 246, wherein the at least one process control step comprises evaluating heavy and light transition pairs for at least one quantitative metric comprising heavy transition specificity, signal to noise ratio, precision, linearity, light transition specificity, or any combination thereof. 261. The system of any one of embodiments 246-260, wherein only transitions that passed the at least one process control step are evaluated to determine the biomarkers suitable for assessing health status. 262. A method of assessing a colorectal health risk status in an individual, comprising steps of: a) obtaining a circulating blood sample from said individual; and b) obtaining a biomarker panel level for at least two of A2GL, ALS, and PTPRJ of said circulating blood sample, and assessing colorectal health risk status. 263. The method of embodiment 262, wherein said biomarker panel further comprises an individual age. 264. The method of embodiment 262, wherein said colorectal cancer status comprises at least one of early CRC and advanced CRC. 265. The method of embodiment 262, wherein said colorectal cancer status comprises at least one of advanced adenoma, Stage 0 CRC, stage I CRC, Stage II CRC, stage III CRC, and stage IV CRC. 266. The method of embodiment 262, wherein said biomarker panel comprises no more than 20 proteins. 267. The method of embodiment 262, wherein said biomarker panel comprises no more than 10 proteins. 268. The method of embodiment 262, wherein said categorizing has a sensitivity of at least 70% and a specificity of at least 70%. 269. The method of embodiment 262, further comprising performing a treatment regimen in response to said categorizing. 270. The method of embodiment 269, wherein said treatment regimen comprises at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 271. 
The method of embodiment 262, further comprising transmitting a report of results of said categorizing to a health practitioner. 272. The method of embodiment 271, wherein said report indicates a sensitivity of at least 70%. 273. The method of embodiment 271, wherein said report indicates a specificity of at least 70%. 274. The method of embodiment 271, wherein said report indicates a recommendation for a treatment regimen comprising at least one of chemotherapy, radiation, immunotherapy, administration of a biologic therapeutic agent, polypectomy, partial colectomy, low anterior resection or abdominoperineal resection and colostomy. 275. The method of embodiment 271, wherein said report indicates a recommendation for a colonoscopy. 276. The method of embodiment 271, wherein said report indicates a recommendation for undergoing an independent cancer assay. 277. The method of embodiment 271, wherein said report indicates a recommendation for undergoing a stool cancer assay. 278. The method of embodiment 262, further comprising performing a stool cancer assay in response to said categorizing. 279. The method of embodiment 262, further comprising continued monitoring for a period of 3 months or greater. 280. The method of embodiment 262, further comprising continued monitoring for a period of between 3 months and 24 months. 281. The method of embodiment 262, wherein said obtaining said protein levels comprises subjecting said biological sample to a mass spectrometric analysis. 282. The method of embodiment 281, wherein said mass spectrometric analysis is evaluated according to at least one process control step. 283. The method of embodiment 282, wherein the process control step comprises using at least one system suitability test (SST) run to assess liquid chromatography (LC) and mass spectrometry (MS) performance prior to the mass spectrometric processing. 284. 
The method of embodiment 262, wherein said obtaining said protein levels comprises subjecting said biological sample to an affinity assay. 285. The method of embodiment 284, wherein said affinity assay comprises an immunoassay analysis of said biological sample. 286. The method of embodiment 284, wherein said affinity assay comprises an aptamer analysis of said biological sample. 287. The method of embodiment 284, wherein said affinity assay comprises assessing said biological sample according to a quality control (QC) parameter. 288. The method of embodiment 287, wherein the QC parameter comprises at least one of sample integrity, sample elution efficiency, sample storage condition, and internal standard monitoring. 289. A method of generating a biomarker panel for assessing a health status, comprising: a) identifying candidate biomarkers having an association with the health status; and b) performing mass spectrometric processing on at least a fragment of a plurality of candidate biomarker proteins derived from the candidate biomarkers to determine biomarkers suitable for assessing a health status; wherein the processing comprises at least one process control step. 290. The method of embodiment 289, wherein the at least one process control step comprises using at least one system suitability test (SST) run to assess liquid chromatography (LC) and mass spectrometry (MS) performance prior to the mass spectrometric processing. 291. The method of embodiment 290, wherein the SST comprises determining LC-MS performance by running a SIS standard curve in log-serial dilution. 292. The method of embodiment 291, further comprising performing a quality control check requiring at least about a 10-fold difference in MS signal between any two adjacent concentration levels, and a dynamic range of approximately four log units across the standard curve. 293. 
The method of embodiment 289, wherein the SST comprises determining LC performance by monitoring heavy transitions of internal standards for RT stability. 294. The method of embodiment 293, wherein monitoring heavy transitions comprises tracking RT shift between a detected value and a scheduled RT. 295. The method of embodiment 292, further comprising performing a quality control check requiring that the upper 95% confidence interval of RTs of heavy transitions be no more than 10% from the margins of LC-MS acquisition windows. 296. The method of embodiment 289, wherein the at least one process control step comprises monitoring flow-through AUC during immunodepletion, monitoring of TPA results for sample processing and immunodepletion efficiency, sample preparation customization depending on the TPA result of each individual sample, or any combination thereof. 297. The method of embodiment 289, wherein the at least a fragment comprises a proteotypic peptide. 298. The method of embodiment 289, wherein the at least a fragment comprises a full length protein.
Further understanding of the disclosure herein is gained through reference to the following embodiments.
A patient at risk of colorectal cancer is tested using a panel as disclosed herein. A blood sample is taken from the patient. The blood sample is mailed to a facility, where plasma is prepared and protein accumulation levels are measured using an antibody fluorescence binding assay to detect members of a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an at least 81% sensitivity and an at least 78% specificity as having colon cancer. A colonoscopy is recommended and evidence of colorectal cancer is detected in the individual.
The patient of Example 1 is prescribed a treatment regimen comprising a surgical intervention. A blood sample is taken from the patient prior to surgical intervention and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity and a 78% specificity as having colon cancer.
A blood sample is taken from the patient subsequent to surgical intervention and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity and a 78% specificity as having colon cancer.
The patient of Example 1 is prescribed a treatment regimen comprising a chemotherapeutic intervention comprising 5-FU administration. A blood sample is taken from the patient prior to chemotherapeutic intervention and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity and a 78% specificity as having colon cancer.
A blood sample is taken from the patient at weekly intervals during chemotherapy treatment and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status. The patient's panel results over time indicate that the cancer has responded to the chemotherapy treatment and that the colorectal cancer is no longer detectable by completion of the treatment regimen.
The patient of Example 1 is prescribed a treatment regimen comprising a chemotherapeutic intervention comprising oral capecitabine administration. A blood sample is taken from the patient prior to chemotherapeutic intervention and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity and a 78% specificity as having colon cancer.
A blood sample is taken from the patient at weekly intervals during chemotherapy treatment and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results over time indicate that the cancer has responded to the chemotherapy treatment and that the colorectal cancer is no longer detectable by completion of the treatment regimen.
The patient of Example 1 is prescribed a treatment regimen comprising a chemotherapeutic intervention comprising oral oxaliplatin administration. A blood sample is taken from the patient prior to chemotherapeutic intervention and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity and a 78% specificity as having colon cancer.
A blood sample is taken from the patient at weekly intervals during chemotherapy treatment and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status. The patient's panel results over time indicate that the cancer has responded to the chemotherapy treatment and that the colorectal cancer is no longer detectable by completion of the treatment regimen.
The patient of Example 1 is prescribed a treatment regimen comprising a chemotherapeutic intervention comprising oral oxaliplatin administration in combination with bevacizumab. A blood sample is taken from the patient prior to chemotherapeutic intervention and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity and a 78% specificity as having colon cancer.
A blood sample is taken from the patient at weekly intervals during chemotherapy treatment and protein accumulation levels are measured for a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status. The patient's panel results over time indicate that the cancer has responded to the chemotherapy treatment and that the colorectal cancer is no longer detectable by completion of the treatment regimen.
A patient at risk of colorectal cancer is tested using a panel as disclosed herein. A blood sample is taken from the patient and protein accumulation levels are measured using reagents in an ELISA kit to detect members of a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity and a 78% specificity as having colon cancer. A colonoscopy is recommended and evidence of colorectal cancer is detected in the individual.
A patient at risk of colorectal cancer is tested using a panel as disclosed herein. A blood sample is taken from the patient and protein accumulation levels are measured using mass spectrometry to detect members of a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized with an 81% sensitivity, and a 78% specificity as having colon cancer. A colonoscopy is recommended and evidence of colorectal cancer is detected in the individual.
1000 patients at risk of colorectal cancer are tested using a panel as disclosed herein. A blood sample is taken from each patient and protein accumulation levels are measured to detect members of a panel comprising A2GL, ALS, and PTPRJ, and also factoring in each patient's age. The patients' panel results are compared to panel results of known status, and the patients are categorized with an 81% sensitivity and a 78% specificity into a colon cancer category. A colonoscopy is recommended for patients categorized as positive. Of the patients categorized as having colon cancer, 80% are independently confirmed to have colon cancer. Of the patients categorized as not having colon cancer, 20% are later found to have colon cancer through an independent follow up test, confirmed via a colonoscopy.
A patient at risk of advanced adenoma is tested using a panel as disclosed herein. A blood sample is taken from the patient. The blood sample is mailed to a facility, where plasma is prepared and protein accumulation levels are measured using an antibody fluorescence binding assay to detect members of a panel comprising A2GL, ALS, and PTPRJ, and also factoring in the patient's age. The patient's panel results are compared to panel results of known status, and the patient is categorized as being at risk of advanced adenoma.
Candidate protein biomarkers can be selected from various sources. Examples of sources of candidate protein biomarkers include publicly available proteomics databases or datasets, internal datasets (e.g., from past internal studies), and scientific literature. The candidate protein biomarkers can be identified based on a known or inferred relationship with a disease or health status such as CRC. In some instances, the health status comprises the presence or absence of CRC. Alternatively or in combination, the health status comprises the grade or stage of CRC. Examples of CRC grades include low grade (e.g., the tumor has well differentiated cells that resemble normal cells and tend to be slower growing) and high grade (e.g., the tumor has poorly differentiated or undifferentiated cells that do not resemble normal cells and tend to be faster growing). In some cases, CRC grades include grade 0, grade 1, grade 2, grade 3, or grade 4. Grade 0 is the earliest stage of cancer and the tumor has not grown beyond the inner mucosal layer of the colon. Grades 1-4 are more advanced stages. In some cases, the systems and methods described herein enable detection of CRC that is grade 0, 1, 2, 3, or 4. Sometimes, the systems and methods enable detection of pre-CRC or increased risk of developing CRC that is even before grade 0. In some instances, candidate protein biomarkers for CRC are selected from one or more of three sources: 1) an earlier targeted proteomics study performed in our laboratory, 2) analysis of publicly available proteomics datasets related to CRC, and 3) semi-automated literature searches. These three approaches yielded a total of 430 proteins designated as CRC-related biomarker candidates for further experimental investigation.
1433B_HUMAN; CH60_HUMAN; H2BFS_HUMAN; PCKGM_HUMAN; TNF15_HUMAN; 1433E_HUMAN; CHK1_HUMAN; HABP2_HUMAN; PDIA3_HUMAN; TNF6B_HUMAN; 1433F_HUMAN; CHK2_HUMAN; HEMO_HUMAN; PDIA6_HUMAN; TP4A3_HUMAN; 1433G_HUMAN; CHLE_HUMAN; HEP2_HUMAN; PDLI7_HUMAN; TPA_HUMAN; 1433T_HUMAN; CLC4D_HUMAN; HGF_HUMAN; PDXK_HUMAN; TPM2_HUMAN; 1433Z_HUMAN; CLUS_HUMAN; HMGB1_HUMAN; PEBP1_HUMAN; TR10B_HUMAN; 1A68_HUMAN; CNDP1_HUMAN; HNRPF_HUMAN; PEDF_HUMAN; TRAP1_HUMAN; A1AG1_HUMAN; CNN1_HUMAN; HNRPQ_HUMAN; PGFRA_HUMAN; TREM1_HUMAN; A1AG2_HUMAN; CO3_HUMAN; HPT_HUMAN; PIPNA_HUMAN; TRFE_HUMAN; A1AT_HUMAN; CO4A_HUMAN; HRG_HUMAN; PLGF_HUMAN; TRFL_HUMAN; A1BG_HUMAN; CO6A3_HUMAN; HS90B_HUMAN; PLIN2_HUMAN; TRI33_HUMAN; A2AP_HUMAN; CO8G_HUMAN; HSPB1_HUMAN; PLMN_HUMAN; TSG6_HUMAN; A2GL_HUMAN; C09_HUMAN; I10R1_HUMAN; PO2F1_HUMAN; TSP1_HUMAN; A2MG_HUMAN; COR1C_HUMAN; IBP2_HUMAN; PON1_HUMAN; TTHY_HUMAN; A4_HUMAN; CORIN_HUMAN; IBP3_HUMAN; POTEF_HUMAN; UGDH_HUMAN; AACT_HUMAN; CP1A1_HUMAN; IF4A3_HUMAN; PPIB_HUMAN; UGPA_HUMAN; ABCB5_HUMAN; CRDL2_HUMAN; IFT74_HUMAN; PRD16_HUMAN; UROK_HUMAN; ABCBA_HUMAN; CRP_HUMAN; IGF1_HUMAN; PRDX1_HUMAN; VCAM1_HUMAN; ACINU_HUMAN; CSF1_HUMAN; IGHA2_HUMAN; PRDX2_HUMAN; VEGFA_HUMAN; ACTBL_HUMAN; CSF1R_HUMAN; IGLL5_HUMAN; PREX2_HUMAN; VGFR1_HUMAN; ACTBM_HUMAN; CSPG2_HUMAN; IKKB_HUMAN; PRKN2_HUMAN; VILI_HUMAN; ACTG_HUMAN; CTHR1_HUMAN; IL23R_HUMAN; PRL_HUMAN; VIME_HUMAN; ACTH_HUMAN; CTNA1_HUMAN; IL26_HUMAN; PROC_HUMAN; VNN1_HUMAN; ADIPO_HUMAN; CTNB1_HUMAN; IL2RB_HUMAN; PROS_HUMAN; VP13B_HUMAN; ADT2_HUMAN; CUL1_HUMAN; IL6RA_HUMAN; PSME3_HUMAN; VTNC_HUMAN; AFAM_HUMAN; CYTC_HUMAN; IL8_HUMAN; PTEN_HUMAN; VWF_HUMAN; AGAP2_HUMAN; DAF_HUMAN; IL9_HUMAN; PTGDS_HUMAN; XBP1_HUMAN; AKA12_HUMAN; DEF1_HUMAN; ILEU_HUMAN; PTPRJ_HUMAN; ZA2G_HUMAN; AKT1_HUMAN; DESM_HUMAN; IPSP_HUMAN; PTPRT_HUMAN; ZMIZ1_HUMAN; AL1A1_HUMAN; DHRS2_HUMAN; IPYR_HUMAN; PTPRU_HUMAN; ZPI_HUMAN; AL1B1_HUMAN; DHSA_HUMAN; IRGM_HUMAN; PZP_HUMAN; ALBU_HUMAN; DPP10_HUMAN; ISK1_HUMAN; RAB38_HUMAN; ALDOA_HUMAN; DPP4_HUMAN; ITA6_HUMAN; 
RASF2_HUMAN; ALDR_HUMAN; DPYL2_HUMAN; ITA9_HUMAN; RASK_HUMAN; ALS_HUMAN; DYHC1_HUMAN; ITIH2_HUMAN; RBX1_HUMAN; AMPD1_HUMAN; ECH1_HUMAN; JAM3_HUMAN; RCAS1_HUMAN; AMPN_HUMAN; EDA_HUMAN; K1C19_HUMAN; REG4_HUMAN; AMY2B_HUMAN; EF2_HUMAN; K2C72_HUMAN; RET4_HUMAN; ANGI_HUMAN; ENOA_HUMAN; K2C73_HUMAN; RHOA_HUMAN; ANGL4_HUMAN; ENOX2_HUMAN; K2C8_HUMAN; RHOB_HUMAN; ANGT_HUMAN; ENPL_HUMAN; KAIN_HUMAN; RHOC_HUMAN; ANT3_HUMAN; ENPP1_HUMAN; KC1D_HUMAN; ROA1_HUMAN; ANXA1_HUMAN; ENPP2_HUMAN; KCRB_HUMAN; ROA2_HUMAN; ANXA3_HUMAN; EZRI_HUMAN; KISS1_HUMAN; RRBP1_HUMAN; ANXA4_HUMAN; FA10_HUMAN; KLK6_HUMAN; RSSA_HUMAN; ANXA5_HUMAN; FA5_HUMAN; KLOT_HUMAN; S100P_HUMAN; APC_HUMAN; FA7_HUMAN; KNG1_HUMAN; S10A8_HUMAN; APCD1_HUMAN; FA9_HUMAN; KPCD1_HUMAN; S10A9_HUMAN; APOA1_HUMAN; FABP5_HUMAN; KPYM_HUMAN; S10AB_HUMAN; APOA2_HUMAN; FAK1_HUMAN; LAMA2_HUMAN; S10AC_HUMAN; APOA4_HUMAN; FAK2_HUMAN; LAT1_HUMAN; S29A1_HUMAN; APOA5_HUMAN; FARP1_HUMAN; LBP_HUMAN; SAA1_HUMAN; APOC1_HUMAN; FBX4_HUMAN; LCAT_HUMAN; SAA2_HUMAN; APOC4_HUMAN; FCGBP_HUMAN; LDHA_HUMAN; SAA4_HUMAN; APOE_HUMAN; FCRL3_HUMAN; LEG2_HUMAN; SAHH_HUMAN; APOH_HUMAN; FCRL5_HUMAN; LEG3_HUMAN; SAMP_HUMAN; APOL1_HUMAN; FETA_HUMAN; LEG4_HUMAN; SBP1_HUMAN; APOM_HUMAN; FETUA_HUMAN; LEG8_HUMAN; SDCG3_HUMAN; ASAP3_HUMAN; FHL1_HUMAN; LEPR_HUMAN; SEGN_HUMAN; ATPB_HUMAN; FHR1_HUMAN; LEUK_HUMAN; SELPL_HUMAN; ATS13_HUMAN; FHR3_HUMAN; LG3BP_HUMAN; SEPP1_HUMAN; B2CL1_HUMAN; FIBA_HUMAN; LMNB1_HUMAN; SEPR_HUMAN; B2LA1_HUMAN; FIBB_HUMAN; LRRC7_HUMAN; SEPT9_HUMAN; B3GT5_HUMAN; FIBG_HUMAN; LUM_HUMAN; SF3B3_HUMAN; BANK1_HUMAN; FINC_HUMAN; LYNX1_HUMAN; SHIP1_HUMAN; BC11A_HUMAN; FLNA_HUMAN; LYSC_HUMAN; SHRPN_HUMAN; BCAR1_HUMAN; FLNB_HUMAN; MACF1_HUMAN; SIA8D_HUMAN; C1QBP_HUMAN; FLNC_HUMAN; MAP1S_HUMAN; SIAL_HUMAN; C4BPA_HUMAN; FND3B_HUMAN; MARE1_HUMAN; SIT1_HUMAN; CA195_HUMAN; FRIH_HUMAN; MASP1_HUMAN; SKP1_HUMAN; CAH1_HUMAN; FRIL_HUMAN; MASP2_HUMAN; SLAF1_HUMAN; CAH2_HUMAN; FRMD3_HUMAN; MBL2_HUMAN; SO1B3_HUMAN; CALR_HUMAN; FST_HUMAN; MCM4_HUMAN; SP110_HUMAN; 
CAPG_HUMAN; FUCO_HUMAN; MCR_HUMAN; SPB6_HUMAN; CASP9_HUMAN; FUCO2_HUMAN; MCRS1_HUMAN; SPON2_HUMAN; CATD_HUMAN; G3P_HUMAN; MIC1_HUMAN; SPP24_HUMAN; CATS_HUMAN; GAS6_HUMAN; MICA1_HUMAN; SRC_HUMAN; CATZ_HUMAN; GBRA1_HUMAN; MIF_HUMAN; SRPX2_HUMAN; CBG_HUMAN; GDF15_HUMAN; MMP2_HUMAN; STK11_HUMAN; CBPN_HUMAN; GDIR1_HUMAN; MMP7_HUMAN; SYDC_HUMAN; CBPQ_HUMAN; GELS_HUMAN; MMP9_HUMAN; SYG_HUMAN; CCD83_HUMAN; GFI1B_HUMAN; MTG16_HUMAN; SYNE1_HUMAN; CCL14_HUMAN; GGT1_HUMAN; MUC24_HUMAN; SYUG_HUMAN; CCR5_HUMAN; GHRL_HUMAN; MYL6_HUMAN; TACC1_HUMAN; CD109_HUMAN; GPNMB_HUMAN; MYL9_HUMAN; TAL1_HUMAN; CD20_HUMAN; GPX3_HUMAN; MYO9B_HUMAN; TBB1_HUMAN; CD24_HUMAN; GREM1_HUMAN; NDKA_HUMAN; TCTP_HUMAN; CD248_HUMAN; GRM6_HUMAN; NDRG1_HUMAN; TETN_HUMAN; CD28_HUMAN; GRP75_HUMAN; NFAC1_HUMAN; TF7L1_HUMAN; CD63_HUMAN; GSHR_HUMAN; NGAL_HUMAN; TFR1_HUMAN; CDD_HUMAN; GSTP1_HUMAN; NIBL2_HUMAN; THBG_HUMAN; CEA_HUMAN; GUC2A_HUMAN; NIPBL_HUMAN; THIO_HUMAN; CEAM3_HUMAN; H13_HUMAN; NNMT_HUMAN; THRB_HUMAN; CEAM5_HUMAN; H2A1D_HUMAN; NOD2_HUMAN; THTR_HUMAN; CEAM6_HUMAN; H2A2B_HUMAN; NUPR1_HUMAN; TIE2_HUMAN; CERU_HUMAN; H2AX_HUMAN; OSTP_HUMAN; TIMP1_HUMAN; CFAH_HUMAN; H2B1A_HUMAN; P53_HUMAN; TIMP2_HUMAN; CFAI_HUMAN; H2B1L_HUMAN; PAFA_HUMAN; TKT_HUMAN; CGHB_HUMAN; H2B1O_HUMAN; PAI1_HUMAN; TMG4_HUMAN; CH3L1_HUMAN; H2B3B_HUMAN; PALLD_HUMAN; TNF13_HUMAN;
Protein Biomarkers from an Earlier Study
An earlier targeted proteomics study focused on measuring 187 CRC-related proteins in 274 samples. All of these proteins were translated to the current project. Fresh method development was performed to find transitions that operated well in the complete method.
Protein Biomarkers from Analysis of Public CRC Datasets
Two publicly available proteomics datasets were obtained from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) (https://cptac-data-portal.georgetown.edu/cptac/public). One offered shotgun proteomics measures from 95 CRC tumor samples analyzed earlier by The Cancer Genome Atlas (TCGA) (https://cptac-data-portal.georgetown.edu/cptac/s/S016, accessed August 2014). The second offered shotgun proteomics measures from normal colon tissue taken from 30 CRC patients (https://cptac-data-portal.georgetown.edu/cptac/s/S019, accessed August 2014). Both datasets originated from the same Proteome Characterization Center (Vanderbilt University), and were acquired using data-dependent MS2 methods on an LTQ Orbitrap Velos mass spectrometer. The datasets included relative abundance calculations for precursors and peptide sequence proposals based on MS2 spectra interpretation from database searching. Features with identical peptide sequence proposals were compared across the two datasets to find those that were significantly different using Student's t-test between normal and CRC tumor tissue. Any features found to be significantly different were then examined further to find those with peptide sequences uniquely linking them to a single protein. This procedure yielded 72 new candidate CRC-related proteins.
Protein Biomarkers from Semi-Automated Literature Searches
Semi-automated literature searches looked for co-occurrences of particular text terms in full-text PubMed Central (PMC, https://www.ncbi.nlm.nih.gov/pmc/) Open Access Subset and in PubMed abstracts. PubMed abstracts were searched for co-occurrences of common terms for CRC and of UniProt protein names and symbols, yielding 120 CRC-related proteins not used in the previous study. PMC open access articles were searched for co-occurrences of synonyms for “human”, “colon”, “cancer”, “plasma” or “serum”, and “protein”. Articles with these terms were additionally investigated to find any occurrences of UniProt protein names or symbols. The proteins were ranked by their number of mentions, and those proteins with the highest mention counts covering 95% of the total mentions were selected as candidate CRC-related proteins. This procedure yielded 172 new candidate CRC-related proteins.
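By way of non-limiting illustration, the mention-count ranking described above can be sketched as follows. The function name, the tie-breaking rule, and the example counts are hypothetical and are not taken from the study; the sketch only shows one way to keep the most-mentioned proteins until a chosen share (here 95%) of all mentions is covered.

```python
def select_by_mention_coverage(mention_counts, coverage=0.95):
    """Return proteins, most-mentioned first, until `coverage` of all mentions is reached.

    mention_counts: dict mapping a protein identifier to its number of mentions.
    """
    total = sum(mention_counts.values())
    if total == 0:
        return []
    selected = []
    running = 0
    # Sort by descending count; ties are broken alphabetically (an assumption,
    # made only so the result is deterministic).
    ranked = sorted(mention_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for protein, count in ranked:
        if running >= coverage * total:
            break
        selected.append(protein)
        running += count
    return selected


# Hypothetical counts, for illustration only.
example_counts = {"CEAM5": 60, "TIMP1": 25, "SEPT9": 8, "MMP9": 5, "LUM": 2}
print(select_by_mention_coverage(example_counts))
```

With these hypothetical counts the cumulative mentions are 60, 85, 93, and 98 out of 100, so the first four proteins are retained and the least-mentioned one is dropped.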
The peptide selection process was performed using algorithms developed for the previous study and followed the guidelines established in published MS standards. Following in silico digestion of the proteins by trypsin, proteotypic peptides favoring zero miscleavage were selected for each protein by removing homologous peptides identified via BLAST sequence analysis. Next, some peptides were excluded because they have poor LC-MS responsiveness predicted by in silico models or include cysteine and methionine residues prone to chemical modification. The remaining peptides were then filtered by length, retaining those with 6-21 amino acids to ensure effective ionization and fragmentation. After these filtering steps, 1006 candidate proteotypic peptides covered the 431 proteins, with at least two peptides per protein.
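A minimal sketch of the sequence-based part of this filtering is given below for illustration. It assumes the commonly used trypsin rule (cleavage after K or R unless the next residue is P), which the text does not state explicitly, and it does not model the BLAST homology filter or the in silico LC-MS responsiveness prediction. The function names and the example sequence are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """Fully cleaved (zero-miscleavage) peptides: cut after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def passes_filters(peptide, min_len=6, max_len=21, excluded_residues="CM"):
    """Keep peptides of 6-21 residues that contain no cysteine or methionine."""
    if not (min_len <= len(peptide) <= max_len):
        return False
    return not any(residue in peptide for residue in excluded_residues)


def candidate_peptides(sequence):
    """Digest a protein sequence and apply the length and residue filters."""
    return [p for p in tryptic_peptides(sequence) if passes_filters(p)]


# Hypothetical sequence, for illustration only.
sequence = "SAMPLEKLVEDTAGRACDEFGKTIDERAVLKPSTWR"
print(tryptic_peptides(sequence))
print(candidate_peptides(sequence))
```

In this example, SAMPLEK and ACDEFGK are rejected for containing methionine or cysteine, TIDER is rejected for length, and AVLKPSTWR is kept intact because the K is followed by P.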
The LC gradient was optimized by exploring LC gradient programs across repeated runs of a heavy peptide working solution. The working solution was a mix of stable isotope-labeled internal standards (SIS) (New England Peptide, Gardner, MA) consisting of nitrogen (15N) and carbon (13C) labeled versions (>95% purity) of the 1006 peptides with equal molar concentrations at 158 fmol/μL. Multiple reverse-phase chromatographic conditions were tested on a 1290 Infinity ultra-high performance liquid chromatography (UHPLC) system (Agilent Technologies) coupled with a 6550 quadrupole time-of-flight (Q-TOF) mass spectrometer (Agilent Technologies). Chromatographic separation was performed on a C18 column (Waters ACQUITY UPLC CSH, 2.1×150 mm, 1.7 μm particle size) with mobile phase A: 0.1% formic acid in water, and mobile phase B: 0.1% formic acid in acetonitrile. MS/MS spectra were acquired for heavy peptides exclusively and searched using in-house developed software for peptide identification and retention time assignment. The optimal LC gradient was established as that with the lowest gradient duration of less than 32 minutes, and with peptide concurrency approximately equal to 25 at any point, using an acquisition window of 42 sec and a cycle time of 500 ms. The final LC gradient used a flow rate of 450 μL/min on a 31.75 min linear gradient with the following segments: mobile phase B increased from 3% to 13% in the first 20 min, 13% to 20% in the next 7 min, 20% to 40% in the next 2 min, 40% to 80% in the next 1.25 min, and then stayed at 80% for the next 1.25 min before returning to 3% in the final 0.25 min.
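For illustration, the gradient segments listed above can be written as a small table and checked against the stated 31.75 min duration. The data layout and function names below are assumptions made for this sketch; linear interpolation within each segment follows from the description of the gradient as linear.

```python
# (duration in min, starting %B, ending %B) for each segment described in the text.
SEGMENTS = [
    (20.0, 3.0, 13.0),
    (7.0, 13.0, 20.0),
    (2.0, 20.0, 40.0),
    (1.25, 40.0, 80.0),
    (1.25, 80.0, 80.0),  # hold at 80% B
    (0.25, 80.0, 3.0),   # return to starting conditions
]


def total_duration(segments):
    """Sum of all segment durations, in minutes."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t_min, segments=SEGMENTS):
    """Mobile phase B percentage at time t_min, by linear interpolation within a segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start = 0.0
    for duration, b_start, b_end in segments:
        end = start + duration
        if t_min <= end:
            fraction = (t_min - start) / duration
            return b_start + fraction * (b_end - b_start)
        start = end
    raise ValueError("time is beyond the end of the gradient")


print(total_duration(SEGMENTS))  # 31.75
print(percent_b(10.0))           # halfway through the first segment
```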
With the final LC gradient, RTs were determined for 979 out of 1006 heavy peptides (430 out of 431 initial proteins). Skyline software (version 3.5) was used to list all possible singly charged product ion transitions for doubly charged precursor ions of the 979 peptides. From these ions, co-eluted ions with <=1 Da mass difference were removed, leaving 12733 heavy transitions. From these 12733 transitions, small product ions b1, b2, y1, and y2 were excluded due to the risk of interference. The collision energy (CE) was then empirically optimized for the remaining 8806 transitions using the heavy peptide working solution on a 1290 UHPLC coupled to a 6490 triple quadrupole (QQQ) mass spectrometer (Agilent Technologies). The CE calculated by Skyline software was used as a median value for CE optimization. CE optimization parameters were set to use 3 steps on each side of the value that was predicted by the default CE equation for each transition (CE=0.031 m/z+1), specified for Agilent QQQ mass spectrometer with the step size set to 6 V. In total, 6 collision energy voltage values were considered for each transition. The peak area under the curve (AUC) was integrated and analyzed with proprietary automated algorithms, developed at Applied Proteomics Inc. The CE that yielded the maximum peak AUC mean across 3 replicates was chosen as the optimal CE. A dynamic multiple reaction monitoring (dMRM) approach was selected for CE optimization and further experiments since it offers several advantages over the conventional segmented MRM approach for complex samples with low levels of the analytes of interest. The dMRM algorithm on the Agilent 6490 QQQ automatically constructed dMRM timetables throughout the LC-MS analysis based on the analyte RTs and acquisition windows. This approach allowed the instrument to acquire data only during specific RT windows, thus maximizing the concurrent ion transitions without compromising dwell time and sensitivity. 
The following conditions were maintained to ensure good signal to noise and sufficient data points across the peak of each transition based on our previous experience: acquisition window=42 seconds, dwell time>=2 ms, transition concurrency<=100, cycle time<=500 ms.
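The scheduling conditions above can be checked with a short sketch such as the following. It is an illustration only: it assumes each acquisition window is centered on the transition's RT, and it approximates dwell time as cycle time divided by the number of concurrent transitions, ignoring any instrument overhead between transitions. The function names are hypothetical and do not describe the instrument vendor's algorithm.

```python
def max_concurrency(retention_times_s, window_s=42.0):
    """Largest number of acquisition windows (RT +/- window/2) open at the same time."""
    events = []
    for rt in retention_times_s:
        events.append((rt - window_s / 2, 1))   # window opens
        events.append((rt + window_s / 2, -1))  # window closes
    # At identical times, process closes before opens so that windows which
    # merely touch are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def approx_dwell_ms(cycle_time_ms, concurrency):
    """Simplified dwell time: the cycle time shared equally among concurrent transitions."""
    return cycle_time_ms / concurrency


def schedule_ok(retention_times_s, window_s=42.0, cycle_time_ms=500.0,
                min_dwell_ms=2.0, max_transitions=100):
    """True if peak concurrency and approximate dwell time satisfy the stated limits."""
    concurrency = max_concurrency(retention_times_s, window_s)
    if concurrency == 0:
        return True
    dwell = approx_dwell_ms(cycle_time_ms, concurrency)
    return concurrency <= max_transitions and dwell >= min_dwell_ms


# Hypothetical retention times in seconds.
print(max_concurrency([100.0, 110.0, 120.0, 200.0]))
print(schedule_ok([100.0] * 100))
```

Under this simplification, 100 concurrent transitions in a 500 ms cycle give a dwell time of about 5 ms, which is consistent with the dwell time limit of at least 2 ms.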
The 8806 transitions represented 901 proteotypic peptides from 430 proteins. The next step was to filter these to achieve acceptable LC concurrency and quality signal, aiming for two peptides/protein and two transitions/peptide. To this end, the transitions were first ranked and filtered according to five quantitative criteria related to heavy transition specificity, endogenous transition specificity, signal/noise, precision, and linearity. To obtain the five metrics, dMRM runs were performed using two 3-point curves of a heavy peptide mixture (15.8, 50, and 158 fmol/μL) in solvent and in endogenous matrix. For the solvent curve, the heavy peptide working solution was serially diluted in the half-log scale with the LC mobile phase (0.1% formic acid in 3% acetonitrile and 97% water). For the matrix curve, BioRec plasma was immuno-depleted and digested into endogenous peptides, and these lyophilized peptides were reconstituted to 3 μg/μL in each of the above three heavy peptide solutions. SIS curves in solvent and matrix were run in three technical replicates.
Transition specificity was evaluated by using the peak AUC ratio between two transitions of the same precursor (doubly charged peptide in this paper), referred to as “branching ratio” or “relative ratio”. The triplicate ratios were considered for all the transitions of each peptide. Heavy transition specificity was determined by a t-test comparing the heavy transition ratios in heavy peptide mixture (158 fmol/μL) with and without endogenous matrix. To evaluate light transition specificity, the acceptance requirement prior to performing the t-test was that heavy and light transition peaks co-elute with <=1-second difference between peak apexes, and then the comparison was performed between the transition ratios of heavy peptide and its corresponding light peptide in endogenous matrix spiked with heavy peptide solution at 158 fmol/μL. A p-value of 0.05 after multiple-test correction was the threshold to pass transition specificity and accept lack of interference. To evaluate signal/noise for each of the 8806 heavy transitions, averaged peak abundance was compared with instrument limit of quantitation (LOQ, 10× standard deviation of solvent blank's signal+averaged blank's signal) for each concentration level in the 3-point curve of the heavy peptide mixture in solvent. Signal abundance at 50 fmol/μL must be above or equal to instrument LOQ for the transition to pass the criterion of signal/noise. Precision was measured with the triplicate 3-point curves of the heavy peptide mixture (15.8, 50, and 158 fmol/μL) in solvent. Coefficient of variation (CV) was calculated for peak AUCs of heavy transition between three repeats at each concentration level. Three peak AUC values were required for all three dilution steps with CVs <=20% for the transition to pass the metric of precision. Linearity was assessed with a linear regression applied across the three concentration levels. 
The criteria for acceptance were that the multiple-test corrected p-value for slope must be <0.05, that the slope must be >0, and that the slope confidence interval must exclude 0.
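As a non-limiting sketch, the signal/noise and precision criteria described above can be expressed as follows. The use of the sample standard deviation and the function names are assumptions made for this illustration; the transition specificity t-tests and the multiple-test-corrected linearity test are not reproduced here.

```python
from statistics import mean, stdev


def instrument_loq(blank_signals):
    """Instrument LOQ: averaged blank signal plus 10x the blank standard deviation."""
    return mean(blank_signals) + 10 * stdev(blank_signals)


def passes_signal_to_noise(auc_at_50, blank_signals):
    """Averaged abundance at 50 fmol/uL must be at or above the instrument LOQ."""
    return mean(auc_at_50) >= instrument_loq(blank_signals)


def cv_percent(values):
    """Coefficient of variation, in percent."""
    return 100 * stdev(values) / mean(values)


def passes_precision(auc_by_level, max_cv=20.0, n_levels=3, n_replicates=3):
    """Require all replicate AUCs at every level, each level with CV <= max_cv."""
    if len(auc_by_level) != n_levels:
        return False
    return all(
        len(aucs) == n_replicates and cv_percent(aucs) <= max_cv
        for aucs in auc_by_level.values()
    )


# Hypothetical peak areas, keyed by concentration in fmol/uL.
blank = [10.0, 12.0, 14.0]
curve = {15.8: [100.0, 110.0, 90.0], 50.0: [300.0, 330.0, 270.0], 158.0: [1000.0, 1050.0, 950.0]}
print(passes_signal_to_noise(curve[50.0], blank), passes_precision(curve))
```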
Following the above measurements and calculations, each transition had a binary pass/fail result for each of five metrics and was assigned to one of ten tiers based on the combination of the five binary results in the hierarchical order of heavy transition specificity, signal/noise, precision, linearity, and light transition specificity as shown in Table 3.
All 8806 transitions were automatically ranked in this novel 10-tier system. In the event of multiple transitions from a given peptide assigned to the same tier, the transition peak AUC was used as tiebreaker, such that the transition with the higher AUC would be ranked higher. Transitions were then selected by a proprietary automated algorithm with transitions from tiers 1 and 2 selected as first choice to increase assay quality, followed by a secondary transition selection from the other tiers to increase assay quantity while maximizing protein number in the final dMRM assay. Overall, one (required) to two (preferred) top-ranked peptides were chosen for each protein, and at least two top-tier transitions were picked for each peptide. These two transitions might be used in later analyses as a quantifier and a qualifier, conforming to some recommended analysis procedures. An output report was generated from the proprietary algorithm for a manual review to confirm the transition performances and selections. A minimal manual replacement was performed for the cases shown in
Analytical Performance of the Final dMRM Method
Transition analytical performance in the final method was characterized next. This process used a new heavy peptide solution consisting of the final 641 SIS peptides with equal molar concentrations at 500 fmol/μL. This mixture was diluted to give a 10-point half-log-serial dilution series with concentrations of 0.0158, 0.05, 0.158, 0.5, 1.58, 5, 15.8, 50, 158, and 500 fmol/μL. 100 μL aliquots of each heavy peptide dilution were added to 300 μg of lyophilized endogenous peptides processed from BioRec plasma to give the standard series. In addition, one plasma matrix preparation was reconstituted with solvent to serve as a blank. Standards and blanks were run in triplicate on one instrument (Agilent 1290 UHPLC-6490 QQQ) over one day. Plate- and sample-level quality metrics were assessed as described below for study runs; no quality failures were encountered.
Sensitivity assessments began by determining the Limits of Blank (LoB) and Limits of Detection (LoD) for each of the 1552 heavy transitions. These were determined by using triplicate means and standard deviations to estimate percentiles that reasonably define the LoB and LoD. Specifically, the LoB was defined as the estimate of the 95th percentile of heavy transition peak area in the blank, and the LoD was defined as the minimum standard concentration at which the estimate of the heavy transition peak area's 5th percentile was greater than or equal to the LoB. Assuming normal distributions, the LoB and LoD were calculated as follows.
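$$\mathrm{LoB} = \mathrm{mean}_{\mathrm{blank}} + (1.645 \times \mathrm{sd}_{\mathrm{blank}})$$
LoD = minimum standard concentration at which
$$\mathrm{mean}_{\mathrm{standard}} - (1.645 \times \mathrm{sd}_{\mathrm{standard}}) \geq \mathrm{LoB}$$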
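The LoB and LoD definitions above can be illustrated with the following sketch. The function names, the dictionary layout, and the example peak areas are hypothetical; the sample standard deviation is assumed, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev

Z_95 = 1.645  # one-sided 95th percentile of the standard normal distribution


def limit_of_blank(blank_areas):
    """LoB: estimated 95th percentile of the heavy transition peak area in the blank."""
    return mean(blank_areas) + Z_95 * stdev(blank_areas)


def limit_of_detection(standards, blank_areas):
    """Lowest standard concentration whose estimated 5th percentile is >= LoB.

    standards: dict mapping nominal concentration to replicate peak areas.
    Returns None if no standard meets the criterion.
    """
    lob = limit_of_blank(blank_areas)
    for concentration in sorted(standards):
        areas = standards[concentration]
        if mean(areas) - Z_95 * stdev(areas) >= lob:
            return concentration
    return None


# Hypothetical triplicate peak areas, keyed by concentration in fmol/uL.
blank = [10.0, 12.0, 14.0]
standards = {0.05: [14.0, 16.0, 18.0], 0.158: [40.0, 42.0, 44.0], 0.5: [120.0, 126.0, 132.0]}
print(limit_of_blank(blank))
print(limit_of_detection(standards, blank))
```

In this hypothetical example the LoB is 15.29; the 0.05 fmol/μL standard fails because its estimated 5th percentile (12.71) is below the LoB, so the LoD is 0.158 fmol/μL.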
Linearity assessments consisted of finding the largest set of standards that met pre-specified criteria and that supported a linear response range for each of the 1552 heavy transitions. The criteria for standard measures to be included in linearity assessment were 1) CV<=30% and 2) nominal concentration>=LoD. Using these standards' measures for each heavy transition, a robust linear model was used to fit transition peak area to nominal standard concentration. If the fit slope's 95% confidence interval matched or extended below 0, the lowest standard concentration was dropped, and the fit was attempted again. This process was repeated until 1) fewer than three concentrations remained (linear fit failure), or 2) the fit slope's 95% confidence interval was positive and excluded 0 (linear fit success). Lower Limits of Quantitation (LLoQ), an additional sensitivity metric, were determined from the linearity assessments. For successful linear fits, the LLoQ was the nominal concentration of the lowest standard used in the fit.
Finally, the linear dynamic range of each heavy transition was calculated from the ratio of the maximum and minimum standard concentrations from a successful linear fit:
$$\text{dynamic range} = \log_{10}\left(\frac{\text{standard concn}_{\max}}{\text{standard concn}_{\min}}\right)$$
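The drop-the-lowest-standard loop, the LLoQ, and the dynamic range can be illustrated together. The sketch below makes two substitutions that should be read as assumptions: it fits ordinary least squares to per-concentration mean peak areas instead of the robust linear model named in the text, and it takes the slope's 95% confidence interval from a small hard-coded Student t table. Standards are assumed to have already passed the CV and LoD criteria.

```python
import math

# Two-sided 95% Student t critical values for df = 1..8 (up to 10 standards).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776,
             5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}

def slope_ci(x, y):
    """95% confidence interval for the OLS slope of y on x (needs n >= 3)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    half = T_CRIT_95[n - 2] * math.sqrt(sse / (n - 2) / sxx)
    return slope - half, slope + half

def linear_range(concs, areas):
    """Drop the lowest standard until the slope CI lies above 0.

    Returns (LLoQ, highest concentration, dynamic range in log10 units),
    or None on linear fit failure (fewer than three standards remain).
    """
    pts = sorted(zip(concs, areas))
    while len(pts) >= 3:
        x = [p[0] for p in pts]
        y = [p[1] for p in pts]
        lower, _ = slope_ci(x, y)
        if lower > 0:
            return x[0], x[-1], math.log10(x[-1] / x[0])
        pts = pts[1:]  # CI touches or crosses 0: drop the lowest standard
    return None
```

For a transition that is linear from 0.5 to 500 fmol/μL, this returns an LLoQ of 0.5 fmol/μL and a dynamic range of 3 log units.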
All heavy and light transition pairs with successful linear fits (requiring a defined LoB, a defined LoD, at least 3 standard concentrations >=LoD and with CVs <=30%, and a positive linear slope distinguishable from 0) were considered to have quantitative performance.
The principal variables influencing the precision and accuracy of a dMRM-based quantitative experiment are often related to either the pre-analytical or analytical aspects of the study. In this study, the pre-analytical variables (sample-specific differences in collection, processing, handling, and storage procedures) were controlled by implementing standard operating procedures (SOPs) during collection of the Endoscopy II specimens. In one aspect of this disclosure, we address analytical variation and review the procedures we have used to monitor the analytical variability in a large-scale, longitudinal study using multiple instruments over four months. The quality parameters we monitor address the sample processing, LC performance, MS performance, or any combination thereof.
The patient samples used in this study were drawn from a high-quality clinical sample set, Endoscopy II, described previously. In brief, plasma samples were collected between 2010 and 2012 at seven hospitals in Denmark from patients considered high risk for CRC because of symptoms of colorectal neoplasia. The study inclusion criteria encompassed age≥18 years, scheduled for first-time colonoscopy, and any symptom of colorectal neoplasia (abnormal bowel habits, abdominal pain, rectal bleeding, unexplained weight loss, meteorism, anemia, and/or palpable mass). Colonoscopies, which followed sample collection, revealed the presence or absence of CRC, with CRC staged according to the Union for International Cancer Control (UICC) tumor node metastasis (TNM) system. Each Endoscopy II patient was placed in one of eight diagnostic groups based on colonoscopy results and comorbidities: colon cancer (all stages), rectal cancer (all stages), colon adenoma, rectal adenoma, no comorbidities and no CRC or polyps (“no comorbidity-no finding” group), comorbidities present and no CRC or polyps (“comorbidity-no finding” group), other cancer(s), or other colonoscopy findings (“other findings”). Comorbidity referred to co-existing medical ailments not related to CRC, such as Crohn's disease, colitis, diverticulitis, acute chronic inflammation, diabetes, rheumatoid arthritis, cardiovascular diseases, cirrhotic liver diseases, obstructive lung diseases, or restrictive lung diseases. A total of 1045 Endoscopy II plasma samples was used in this biomarker discovery study. The distribution of the 1045 patient samples across the diagnostic groups is presented in Table 5.
The 1045 patients were divided into separate Discovery and Validation (Test) sets, consisting of 672 and 373 patients, respectively. Data from the Discovery set were used to provide an overview of CRC signal as evidenced by univariate measures. Data from the Validation set were not analyzed in the current study; these data were retained for future validation/testing following multivariate classifier development.
Plasma samples were visually inspected to exclude lipemic and hemolytic samples. They were then processed into lyophilized protein digests as previously described. Briefly, a single 25 μL plasma aliquot from each sample was filtered to remove lipids and loaded on a 10 mm×100 mm Human 14 MAR column (Agilent Technologies) for immuno-depletion. The flow-through fractions, representing depleted plasma, were collected for buffer exchange with ammonium bicarbonate before protein concentration was determined (Quant-iT Protein Assay Kit, ThermoFisher Scientific) on a Freedom EVO 200 automated liquid handling system (Tecan); this measurement served as the total protein assay (TPA) result. The TPA result for each sample was used to determine the amount of enzyme to be added during protein digestion (trypsin to protein mass ratio=1:34), and also to calculate the volume of LC-MS sample reconstitution solution needed to reach an endogenous protein concentration of 3 μg/μL prior to LC-MS analysis. Protein digestion on a Freedom EVO 150 platform (Tecan) started with protein denaturation with 2,2,2-trifluoroethanol (Acros), followed by reduction with DL-dithiothreitol (Sigma-Aldrich) and subsequent alkylation with iodoacetamide (Acros). The appropriate amount of trypsin (Promega) was added to each sample before incubation at 37° C. for 16 hours. The reaction was stopped with 10 μL of neat formic acid (ThermoFisher Scientific), followed by lyophilization. Prior to LC-MS injection, each endogenous sample was reconstituted in the appropriate volume of heavy peptide solution (SIS mixture with equal molar concentration at 100 fmol/μL) to give 30 μg of endogenous protein and 1,000 fmol of each heavy peptide in a single injection (10 μL) loaded onto the LC column.
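The per-sample quantities derived from the TPA result follow from simple arithmetic. The sketch below is illustrative only: the function name and the example TPA value are hypothetical, and it assumes that all protein measured by the TPA is carried through to reconstitution.

```python
TRYPSIN_RATIO = 34             # trypsin:protein mass ratio of 1:34
TARGET_CONC_UG_PER_UL = 3.0    # endogenous protein after reconstitution
SIS_CONC_FMOL_PER_UL = 100.0   # each heavy peptide in the reconstitution solution
INJECTION_UL = 10.0            # single injection volume

def prep_plan(tpa_protein_ug):
    """Derive digestion and reconstitution quantities from one TPA result."""
    return {
        "trypsin_ug": tpa_protein_ug / TRYPSIN_RATIO,
        "reconstitution_ul": tpa_protein_ug / TARGET_CONC_UG_PER_UL,
        "protein_per_injection_ug": TARGET_CONC_UG_PER_UL * INJECTION_UL,
        "sis_per_injection_fmol": SIS_CONC_FMOL_PER_UL * INJECTION_UL,
    }

# Hypothetical TPA result of 102 ug of depleted plasma protein.
plan = prep_plan(102.0)
```

For this example the plan calls for 3 μg of trypsin and 34 μL of heavy peptide solution. Every 10 μL injection carries 30 μg of protein and 1,000 fmol of each heavy peptide whatever the TPA value, because the reconstitution volume scales with the measured protein mass.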
Laboratory automation was deployed for the TPA procedure, protein digestion, and LC-MS sample reconstitution to ensure reproducible operation by replacing error-prone manual procedures with automated processes requiring minimal technician involvement. Immuno-depletion efficiency was pretested by processing two 25 μL aliquots of BioRec plasma, one with and one without the immuno-depletion step. Based on TPA results, 91% of the protein mass (1365 μg of 1500 μg) was depleted, and only one peptide from the Human 14 proteins was detected in the depleted flow-through collection by LC-MS/MS.
The 1045 patient samples were randomized and divided into 66 batches of up to 16 samples each. Each batch also included four aliquots of a pooled set of plasma samples (BioreclamationIVT), referred to as process quality controls (PQCs). Two batches were run each day, one on each of two immuno-depletion systems coupled with two LC-MS workstations. Reproducibility of the sample processing was evaluated over the four-month study period. The UV (220 nm) chromatograms from protein depletion were overlaid daily for each batch to review every PQC and patient sample, using the runs from study day 1 and from the previous day as references to check uniformity of peak shape and RT. The PQCs' flow-through peak AUCs from the immuno-depletion step and their TPA results were tracked and compared with the ranges of means+/−standard deviations. After processing each batch, one of the four PQCs was analyzed by full MS and tandem MS to further monitor immuno-depletion and trypsin digestion. Immuno-depletion efficiency was evaluated by investigating the presence or absence of the top 14 human plasma proteins. Digestion consistency was assessed by monitoring the counts of molecular features (z at 2-4) detected by full MS and the missed cleavage rate in the MS2 data search.
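The batch layout can be reproduced with a short sketch. The text does not describe the randomization procedure itself, so the shuffle-then-chunk scheme, the function name, and the fixed seed below are assumptions made for illustration.

```python
import random

def make_batches(sample_ids, batch_size=16, seed=0):
    """Shuffle sample IDs and cut them into consecutive batches."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible layout
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

batches = make_batches(range(1045))
print(len(batches), len(batches[-1]))
```

With 1045 samples and a batch size of 16, this yields 65 full batches plus one batch of 5 samples, matching the 66 batches of up to 16 samples described above.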
The biomarker study was run using the optimized LC gradient and the final dMRM method on two sets of 1290 UHPLC coupled to 6490 QQQ (Agilent Technologies). Both 6490 QQQs were operated in positive mode and ionization source conditions were as follows: capillary voltage=3.5 kV, nozzle voltage=300 V, nebulizer pressure=20 psi, sheath gas flow=11 L/min and sheath gas temperature=250° C. Each LC-MS worklist comprised an initial 5-point standard curve of 641 heavy peptides in solvent (0.05-500 fmol/μL, log serial dilution), 3 PQCs at the beginning, middle and end of the run, 16 individual patient samples, and 7 Blank samples (LC solvent) interspersed throughout the worklist to evaluate carryover. A single injection per sample was loaded on LC-MS for 40-minute data collection and the entire worklist required 21 hours. The study took four months to complete data collection using two LC-MS workstations, with instrument maintenance performed daily to ensure consistent LC-MS performance.
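As a quick consistency check on the stated worklist length, and assuming that the 40 minutes per injection covers the full injection cycle (the text does not break the time down further), the injection counts above give:

$$5 + 3 + 16 + 7 = 31 \text{ injections}, \qquad 31 \times 40\ \text{min} = 1240\ \text{min} \approx 20.7\ \text{h}$$

which agrees with the roughly 21 hours reported for a complete worklist.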
MS raw data were automatically extracted, reduced, and integrated, and then visualized using a real-time analytical pipeline developed at Applied Proteomics, Inc. An internal web client, accessing the pipeline server, permitted monitoring of data reduction, reviewing dMRM traces for each targeted transition, and downloading data for further analyses. Additionally, R scripts were created specifically to consolidate processed data and automate LC-MS performance monitoring. The LC-MS system suitability test (SST) and LC-MS performance during data acquisition were monitored using reference materials consisting of processed PQC samples and heavy peptide solution (mix of the final 641 SIS peptides with equal molar concentrations at 500 fmol/μL).
Immediately prior to each of the sample batch runs, the SST was performed to determine LC-MS performance by running the 5-point SIS standard curve in log-serial dilution. LC performance was checked by monitoring all 1552 heavy transitions (internal standards) for RT stability. An RT plot was automatically generated for each data file immediately after it was processed through the pipeline, tracking RT shift between the detected value and the scheduled RT used in the method. In order to avoid truncated peaks, the main quality control check required that the upper bound of the 95% confidence interval of the 1552 heavy transitions' RTs remained >=6 seconds from the margins of the LC-MS acquisition windows. If this check failed, troubleshooting, followed by RT reassignment if necessary, was performed before further data acquisition. MS performance was checked using 176 high-performing heavy and light transition pairs that were selected during assay development to serve as QC transitions. In the SST, peak AUCs were recorded for the heavy QC transitions across the five concentration levels of the SST 5-point standard curves. The main quality control check required an approximately 10-fold difference in MS signal between any two adjacent concentration levels, and a dynamic range of approximately four log units across the full curve. If this check failed, troubleshooting was performed before further data acquisition. For each standard concentration, heavy transition peak AUCs were compared across days and between LC-MS systems to confirm consistent MS performance across the four-month data collection period.
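The signal portion of the SST check can be sketched as follows. The text specifies only "approximately" 10-fold steps and "approximately" four log units, so the tolerance parameters `fold_tol` and `min_logs` below are hypothetical placeholders, not the study's acceptance limits.

```python
import math

def sst_signal_ok(aucs, fold=10.0, fold_tol=0.5, min_logs=3.5):
    """Check one heavy QC transition across the 5-point SST curve.

    aucs: peak AUCs ordered from the lowest to the highest concentration.
    fold_tol: allowed relative deviation of each adjacent ratio from `fold`
              (hypothetical tolerance).
    min_logs: minimum log10 span between the first and last level
              (hypothetical tolerance).
    """
    ratios = [hi / lo for lo, hi in zip(aucs, aucs[1:])]
    adjacent_ok = all(abs(r / fold - 1.0) <= fold_tol for r in ratios)
    span_logs = math.log10(aucs[-1] / aucs[0])
    return adjacent_ok and span_logs >= min_logs
```

A curve such as `[1e2, 1e3, 1e4, 1e5, 1e6]` passes, whereas a curve with a compressed step (for example, 1e3 followed by 1.2e3) fails the adjacent-level test and would trigger troubleshooting.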
The sample batch set-up was leveraged to evaluate the performance of each LC-MS system during data acquisition and to establish confidence in the quality of the acquired sample measurements. This was accomplished by analyzing data from the PQCs at the beginning, middle and end of each worklist, thereby providing information on the daily performance of each of the LC-MS systems during the experimental runs. The PQCs enabled LC-MS monitoring using both signal intensity and retention time stability. Heavy and light peak AUCs were tracked for the 176 QC transition pairs in PQC samples to confirm MS performance. CVs were calculated across three PQCs in each batch to evaluate intra-batch precision. Individual PQC plots were generated daily for both heavy and light peaks of the QC transitions to demonstrate peak AUC and CV trends over the four months. In addition, RT plots tracking RT shifts of 1552 heavy transitions were generated for all the 1045-patient data files to confirm data quality.
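Intra-batch precision across the three PQC injections reduces to a coefficient of variation per QC transition. The sketch below assumes the sample standard deviation; the function names and the example transition label are hypothetical.

```python
import statistics

def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def intra_batch_cvs(pqc_areas):
    """pqc_areas maps a QC transition to its peak AUCs in the first,
    middle, and last PQC of one worklist."""
    return {transition: percent_cv(areas)
            for transition, areas in pqc_areas.items()}

# Hypothetical heavy-peak AUCs for one QC transition in one batch.
cvs = intra_batch_cvs({"QC_transition_1": [100.0, 110.0, 90.0]})
```

Here the three AUCs have a mean of 100 and a standard deviation of 10, so the intra-batch CV is 10%.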
Data were compiled for the labeled and light peaks for each of the 1552 transition pairs in the final dMRM method, across all 1045 patient samples of the study. Prior to evaluating CRC signal, transition pairs were evaluated along three quality metrics; only transitions that passed all three checks were used to assess CRC signal in the study.
First, transitions were evaluated as to their quantitative performance. Specifically, the standard curve for a transition pair's labeled peak was required to have a successful linear fit (requiring a defined LoB, a defined LoD, at least 3 standard concentrations >=LoD and with CVs <=30%, and a positive linear slope distinguishable from 0).
Second, transitions were required to have high quality peaks. Peak quality was assessed with a proprietary machine learning tool developed in-house. Instead of directly assessing peak shape itself, the in-house tool integrated information about several parameters that, together, were found to be strongly associated with clearly favorable (large and easily recognized) peak shapes. These parameters covered seven measures related to labeled peak area, the consistency of labeled peak area, light peak area, light/labeled peak ratios, the difference between labeled peak retention time and expected retention time, consistency of labeled peak retention times, and consistency of differences between labeled and light peak retention times. The tool validated with 95% accuracy in predicting manual assessments of peak quality.
Third, transitions were required to have labeled peak measured in all 1045 samples. In combination with the other two criteria, this ensured that signal measurement was valid in all samples, thus obviating any need for imputation.
For transitions that passed these three quality checks, the light peak's endogenous concentration in each sample was calculated as the ratio of light/heavy peak area multiplied by the known spike-in concentration of the heavy peak. These endogenous concentrations were used to calculate each transition's univariate CRC signal; receiver operating characteristic (ROC) analysis was used to calculate a CRC vs nonCRC AUC in the 672-sample Discovery set. ROC analysis was performed using the pROC package (version 1.10.0). In addition, statistical tests (Student's t-test and the Wilcoxon rank-sum test) were run to evaluate whether each transition's concentration was significantly different between CRC and nonCRC samples in the Discovery set. All analyses were performed using the R programming language running in Unix and OS X environments.
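The concentration calculation and the univariate AUC can be illustrated in Python. The study used the pROC package in R; the sketch below instead computes the AUC directly as the Mann-Whitney probability that a CRC sample has a higher concentration than a non-CRC sample, with ties counted as one half. Function names and example values are hypothetical.

```python
def endogenous_conc(light_area, heavy_area, spike_conc):
    """Light/heavy peak area ratio times the known heavy spike-in concentration."""
    return light_area / heavy_area * spike_conc

def roc_auc(case_values, control_values):
    """AUC as the fraction of case/control pairs in which the case is higher
    (ties count 0.5). Assumes higher values indicate CRC."""
    wins = sum((c > n) + 0.5 * (c == n)
               for c in case_values for n in control_values)
    return wins / (len(case_values) * len(control_values))

# Hypothetical peak areas: the light peak is half the heavy peak at 100 fmol/uL.
conc = endogenous_conc(50.0, 100.0, 100.0)      # 50.0 fmol/uL
auc = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5])  # 8 of 9 pairs favour the cases
```

The pairwise form is quadratic in sample count, which is acceptable at the scale of a 672-sample Discovery set.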
We previously reported an LC-dMRM method that measured 337 peptides from 187 proteins with a 29-minute gradient on an Agilent 1290 UHPLC-6490 QQQ LC-MS system. In this study, we developed a new expanded method, in which the LC gradient was further optimized to separate a new candidate list of 1006 peptides in 32 minutes on the same LC-MS workstation. In some cases, the optimal gradient program would have elution concurrency at or below 25 peptides in every 42-second acquisition window over the entire LC method. The final gradient program located RTs of 979 peptides representing 430 proteins and achieved this concurrency requirement for 63% of the 979 peptides across 82% of the entire 31.75-min LC gradient. In addition, the full width at half maximum (FWHM) of heavy peptide MS1 EIC peaks centered around 5-6 seconds (median 5.5 seconds), wide enough to obtain 15-20 data points across each peak using a 500 ms cycle time, and narrow enough to accommodate RT shifts in the 42-second acquisition window.
Following LC optimization, the optimal CE was empirically determined for each of the 8806 heavy transitions as the CE yielding the highest average labeled peak AUC. An example of CE optimization for the heavy transition SLYLGR→y5 is shown in
Transition Selection to Build the Final Multiplexed dMRM Assay
With the optimal LC-MS condition, the 8806 heavy and light transition pairs were experimentally studied to select robust and interference-free transitions. Each transition pair was evaluated for passing or failing 5 quantitative criteria in the order of priority above. The passing rate in 8806 transitions for each of the five metrics is summarized in Table 6.
Transitions were automatically categorized and selected using the 10-tier ranking system (Table 3) with a proprietary algorithm, resulting in 1552 top performing transition pairs selected to represent 641 peptides from 392 CRC proteins. In detail, 718 transitions from tiers 1 and 2 were first chosen for 359 peptides representing 183 proteins. To increase the proteins covered, a second transition selection was performed for the remaining 247 proteins. An additional 558 top-performing transitions were selected in all the tiers for 279 peptides representative of 209 proteins. Next the unselected transitions of the existing 392 proteins were backfilled for any 42-second acquisition windows with transition concurrency <90 until it was equal to 90. An additional top-ranked 276 transitions were added for 3 peptides in the final assay. Following the automatic selection, manual review was performed and 117 of 1552 transitions (7.5%) were manually replaced due to interference.
Our 10-tier transition ranking system, incorporating five quantitative criteria, used a strict cutoff for each criterion to select the highest quality targets suitable for inclusion in the final dMRM method. This automated process was found to be accurate when compared to a small-scale manual transition selection that was performed in parallel. In addition, the speed and objectivity of the automated process render it preferable to manual processes.
After method development, each transition's analytic performance was characterized by considering LoBs, LoDs, LLoQs, and dynamic ranges established on the basis of 10-point standard curves run using the finalized method. Of the 1552 total transitions, 1357 had valid measures for all of these metrics. Example standard curves are shown in
The 1357 transitions for which analytical performance could be assessed covered 87.4% of the 1552 transitions measured in the study. On the peptide level, these 1357 transitions covered 596, or 93.0%, of the 641 peptides in the study. On the protein level, these 1357 transitions covered 373, or 95.2%, of the 392 proteins in the study.
Protein Immunodepletion and Digestion
The reproducibility of sample analysis is dependent on the consistency of sample preparation prior to data collection. In this study, we evaluated two processing steps subject to sample variation: immuno-depletion and trypsin digestion. To assess the reproducibility of plasma immuno-depletion, a photodiode array (PDA) detector using ultraviolet detection (220 nm) monitored peak AUC and RT for both the flow-through and bound fractions. The consistency in immuno-depletion was observed by overlaying UV traces of samples within a run and between days. 207 PQCs' flow-through peak AUCs (depleted plasma fractions) were monitored over the four-month study period.
In addition, one of the four PQCs processed in each sample batch (16 patient samples) was reserved for monitoring immuno-depletion as well as trypsin digestion efficiency. Following sample processing and prior to the start of the biomarker study data collection, the single PQC from each sample batch was analyzed by two separate injections on a 6550 Q-TOF (Agilent Technologies). A full scan MS1 analysis provided information on the abundance of molecular features (z=2-4), whereas the MS2 data dependent acquisition (DDA) analysis provided information on the identification of immuno-depleted Human 14 proteins and the missed cleavage rate as a measure of digestion efficiency. The molecular feature counts (z=2-4) and missed cleavage rate of the PQC on a total of 47 plates demonstrated reproducibility in both the immuno-depletion and trypsin digestion.
An essential requirement of a biomarker discovery study is establishing confidence in the proteomic data set. In the study presented here, data were acquired over a four-month period across two LC-MS systems; monitoring the intra- and inter-day reproducibility within and between LC-MS systems was therefore essential to safeguarding confidence in the results. PQCs, a SIS peptide mixture, and selected QC transitions were used to test system suitability prior to data collection, and to monitor the performance of each LC-MS system during sample batch analysis.
An SST was performed using a 5-point log-serial dilution of SIS peptide mixture in solvent at the start of each worklist. This provided real-time information on the state and performance level of each LC-MS system prior to initiating sample data collection. Each set of 5 injections of the SIS peptide mixture (0.05, 0.5, 5, 50, and 500 fmol/μL) was monitored for RT shift and signal intensity. Each day, 95% of the observed RTs were within 5 seconds of expected, passing the quality criteria required to run samples. Heavy peak AUCs of 176 pre-selected QC transitions were consistent across 33 running days on two Agilent 6490 QQQs.
While confirming acceptable performance of the LC-MS system prior to data collection was essential, establishing confidence in the results acquired over a 21-hour sample batch run period was equally important. In this study, reference materials were three PQCs spiked with SIS peptide mixture, interleaved between study samples to run at the beginning, middle, and end of each day's runs. Each PQC was used to monitor both the LC and MS performance. To monitor LC performance, the peak apex elution of each heavy transition from the first PQC run each day was used to monitor RT shift; the acceptance criterion for each peak permitted a maximum 15-second shift in peak elution.
In some embodiments, the consistency in heavy transition performance was achieved by adhering to a daily maintenance checklist for the HPLC, the QQQ, or both. High intra-batch CVs of 176 light transitions would trigger an investigation into either the instrument performance or sample processing. In actuality, no failures were observed in quality controls in the sample processing or system suitability testing. In addition, automated data processing permitted real time monitoring of trends in LC retention time and MS response. This allowed the operator to stop the instrument and remedy a problem if a component of the performance test failed to meet acceptance criteria.
Upon completion of data collection for the 1045 study samples, the data were compiled across all the samples for all 1552 transition pairs. Prior to study analysis, transitions were filtered according to three quality metrics. First, transitions were filtered according to their quantitative performance (see Methods “Assay analytical performance”). As described above, 1357 of the 1552 transitions were found to have quantitative performance. Second, both light and labeled peak pairs for each transition were filtered according to peak quality, assessed using a proprietary in-house machine learning tool (see Methods “Sample data processing”). Of the 1552 transitions, 1358 were found to have good quality for both light and labeled peaks throughout the study, 1290 of which also passed the first filter for quantitative performance. Finally, transitions were filtered to exclude those for which either light or labeled peaks were not evident in one or more of the study patient samples. Of the 1290 transitions that passed the first two filters, this step removed 338 transitions with missing values in one or more samples, leaving a total of 952 transitions passing all three quality filters. These 952 transitions covered 61.3% of the full 1552 transitions measured in the study. On the peptide level, these 952 transitions covered 529, or 82.5% of the 641 peptides in the study. On the protein level, these 952 transitions covered 345, or 88.0% of the 392 proteins in the study.
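The three-filter cascade and the coverage percentages can be sketched with set operations. The transition identifiers are hypothetical; the counts passed to `coverage` are those reported above.

```python
def passing_all(quantitative, good_peaks, complete):
    """Transitions that pass all three quality filters."""
    return set(quantitative) & set(good_peaks) & set(complete)

def coverage(n_pass, n_total):
    """Percentage of items retained, rounded to one decimal place."""
    return round(100.0 * n_pass / n_total, 1)

# Toy example with hypothetical transition IDs.
kept = passing_all({"t1", "t2", "t3"}, {"t2", "t3", "t4"}, {"t3", "t5"})

# Coverage figures recomputed from the reported counts.
print(coverage(952, 1552))  # transitions
print(coverage(529, 641))   # peptides
print(coverage(345, 392))   # proteins
```

Because the filters are combined by intersection, the order in which they are applied does not change the final set of 952 transitions; only the intermediate counts depend on the order.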
For each of these 952 transitions, endogenous concentration was calculated as the ratio of light/labeled peak area times the known spike-in concentration of the labeled peak. An overall assessment of univariate CRC signal in the dataset was performed. To this end, the CRC signal carried by each transition's endogenous concentrations in the 672-sample Discovery set was assessed. Each transition's univariate CRC signal was determined using ROC analysis to calculate a CRC vs non-CRC AUC, and its 95% confidence interval, in the 672-sample Discovery set.
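A confidence interval for an AUC, and the test of whether it excludes 0.50, can be sketched with the Hanley-McNeil standard error. This is a swapped-in approximation chosen because it needs only the AUC and the two group sizes; the text does not state which interval method was used, and the group sizes in the example are hypothetical.

```python
import math

def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI for an AUC from the Hanley-McNeil variance formula."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    half = z * math.sqrt(var)
    return auc - half, auc + half

def excludes_chance(ci, chance=0.5):
    """True when the interval lies entirely above or entirely below chance."""
    lower, upper = ci
    return lower > chance or upper < chance

# Hypothetical transition: AUC 0.70 with 100 CRC and 100 non-CRC samples.
ci = hanley_mcneil_ci(0.70, 100, 100)
flagged = excludes_chance(ci)
```

For this example the interval is roughly 0.63 to 0.77, so the transition would be counted as carrying univariate CRC signal.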
Of the 952 transitions considered in this analysis, 252 transitions, covering 127 unique proteins, were found to have AUCs with confidence intervals that excluded 0.50, indicating potential as single biomarkers.
Plasma samples were taken from the Endoscopy II collection, described in Blume et al., 2016. The particular samples used in TPv2 were from the same 1,045 patients used to develop the SPCv1 CRC test, and are described in detail in Croner et al., unpublished. Briefly, the 1,045 samples were assigned to a 672-sample discovery set and a 373-sample validation set. The discovery set contained 373 samples in which the proportions of diagnostic groups were representative of the intent-to-test (ITT) population, and 299 additional CRC (176) and advanced adenoma (123) samples. The validation set contained 373 samples with ITT proportions of diagnostic groups. There was no overlap between the samples in the discovery and validation sets.
The sample concentrations of targeted peptide ions were obtained using a dynamic MRM method on MS instruments. Target selection, assay development, and initial (pre-classifier) data processing are described in detail in You et al., 2018.
Supervised classifiers were built using API's “simple grid” approach applied to data from the 672-sample discovery set. For each simple grid process, all possible classifiers defined by a set of parameters were built using ten iterations of 10-fold cross validation applied to the discovery set; the classifier with the highest median merged AUC across the ten iterations was then selected as the top build for that grid. In total, 58 simple grids were run. All the grids used glmnet feature selection within each fold. However, the grids varied in the range of feature counts considered, whether age and/or gender were included as predictor candidates, the subset of transitions included as predictor candidates, whether transition concentration data were log 2-transformed, whether ratios based on transitions and other features were included as predictor candidates, whether data scaling was tested, the classifier algorithms used, the supervised discrimination performed (CRC vs non-CRC, or CRC vs “No comorbidity-no finding” diagnostic group [NCNF, cleanest controls]), and/or the portion of the discovery set used (full discovery set or ITT subset). Further details about the simple grid approach can be found in Croner et al., 2017 and Croner et al., unpublished.
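The selection rule of a simple grid (repeat k-fold cross-validation, pool the out-of-fold scores into one "merged" AUC per iteration, and keep the configuration with the highest median) can be sketched as below. Everything beyond that rule is an assumption made to keep the example self-contained: a nearest-centroid scorer stands in for the classifier algorithms, the grid varies only the feature subset, folds are not stratified, and the glmnet feature selection performed within each fold is omitted.

```python
import random
import statistics

def auc(pos, neg):
    """Pairwise AUC; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def score(row, c_pos, c_neg):
    """Higher when the row is closer to the positive-class centroid."""
    d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
    return d_neg - d_pos

def merged_auc(X, y, features, k, rng):
    """One iteration of k-fold CV; AUC over all pooled out-of-fold scores."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    sub = lambda i: [X[i][f] for f in features]
    oof = {}
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        c_pos = centroid([sub(i) for i in train if y[i] == 1])
        c_neg = centroid([sub(i) for i in train if y[i] == 0])
        for i in fold:
            oof[i] = score(sub(i), c_pos, c_neg)
    return auc([oof[i] for i in idx if y[i] == 1],
               [oof[i] for i in idx if y[i] == 0])

def simple_grid(X, y, grid, k=10, iterations=10, seed=0):
    """Return the grid entry with the highest median merged AUC, plus all medians."""
    rng = random.Random(seed)
    medians = {features: statistics.median(
                   merged_auc(X, y, features, k, rng) for _ in range(iterations))
               for features in grid}
    return max(medians, key=medians.get), medians
```

Using the median across iterations, as the text describes, makes the selection less sensitive to a single favourable fold assignment than using the maximum would be.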
Final models from the most promising grid builds were used in Indeterminate or “NoCall” (NoC) analyses. NoC analyses were applied to the CRC vs non-CRC discrimination within the ITT subset of the discovery set. NoC analyses aimed to determine a contiguous range of model scores such that samples receiving scores in that range would not receive a final model-based CRC call, thus enhancing the overall performance of the model. Further details about NoC analyses can be found in Croner et al., 2017 and Croner et al., unpublished.
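The effect of a NoCall region can be illustrated by scoring a model with and without a withheld score range. The sketch below only evaluates a given range; how the study searched for the best contiguous range is not described here, and the scores, labels, threshold, and range are hypothetical.

```python
def evaluate_with_nocall(scores, labels, threshold, nocall=None):
    """Sensitivity and specificity on called samples.

    nocall: optional (low, high) score range, inclusive, in which samples
    receive no call. Assumes each class keeps at least one called sample.
    """
    called = [(s, y) for s, y in zip(scores, labels)
              if nocall is None or not (nocall[0] <= s <= nocall[1])]
    tp = sum(1 for s, y in called if y == 1 and s >= threshold)
    fn = sum(1 for s, y in called if y == 1 and s < threshold)
    tn = sum(1 for s, y in called if y == 0 and s < threshold)
    fp = sum(1 for s, y in called if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "nocall_rate": 1 - len(called) / len(scores),
    }

scores = [0.1, 0.2, 0.45, 0.55, 0.6, 0.9]
labels = [0, 0, 1, 0, 1, 1]
baseline = evaluate_with_nocall(scores, labels, threshold=0.5)
with_noc = evaluate_with_nocall(scores, labels, threshold=0.5, nocall=(0.4, 0.58))
```

In this toy case, withholding calls for the two samples nearest the threshold raises sensitivity and specificity from 2/3 to 1.0, at the cost of leaving one third of the samples without a call.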
Six of the best-performing classifiers and their associated NoC regions were then tested in the separate validation set. Validation was considered a success if 1) the validation AUC was either not statistically distinguishable from the discovery AUC or was statistically distinguishable from and higher than the discovery AUC, and 2) the validation AUC was statistically distinguishable from and greater than the univariate age AUC in the validation set. For successful validations, the validation AUC was also compared with the SPCv1 validation AUC; in this comparison, the study goal of at least equivalent performance to SPCv1 would be met by finding that either the two AUCs were not statistically distinguishable, or that they were statistically distinguishable with the TPv2 AUC having the higher value.
Despite the wide variation across simple grid configurations, the 58 grid builds can be grouped into five general approaches, described below. The five approaches differ in the pool of features from which the simple grid's glmnet feature selection pulled candidate predictors for each fold of each build.
These builds used simplistic and pre-planned feature sets as pools of candidate predictors. These pools included the sets of transitions and demographics in each of the two main data matrices provided by Atet Kao (AK) (see below). They also included the set of 252 transitions with significant CRC vs non-CRC signal, as described in You et al., 2018.
These builds included ratios—ratios of transition concentrations, and ratios involving both patient age and transition concentrations—in the pool of candidate predictors. For these builds, all possible ratios were calculated for limited feature sets. Specifically, they were calculated for the 252 transitions with CRC vs non-CRC signal, and for the transitions involved in the best AK 2016 classifier (see below).
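Ratio features of this kind can be generated as sketched below. The text does not say whether both orderings of each pair were included; this sketch assumes one ratio per unordered pair of transitions plus one age-to-transition ratio per transition. The function name, key names, and values are hypothetical.

```python
from itertools import combinations

def add_ratio_features(sample, transitions, age_key="age"):
    """Build pairwise transition ratios and age/transition ratios for one sample.

    sample maps feature names to values; concentrations are assumed nonzero.
    """
    feats = {}
    for a, b in combinations(transitions, 2):
        feats[f"{a}/{b}"] = sample[a] / sample[b]
    if age_key in sample:
        for t in transitions:
            feats[f"{age_key}/{t}"] = sample[age_key] / sample[t]
    return feats

sample = {"t1": 2.0, "t2": 4.0, "t3": 8.0, "age": 64}
ratios = add_ratio_features(sample, ["t1", "t2", "t3"])
```

Under this convention, n transitions produce n(n-1)/2 pairwise ratios, so the 252-transition set would yield 31,626 transition ratios plus 252 age ratios, which is why the ratio builds were restricted to limited feature sets.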
These builds aimed to use a small number of predictors, and pulled predictor candidates only from a list of 23 single features and feature ratios shown to have CRC vs NCNF univariate AUCs >=0.85 in the discovery set. These 23 features and ratios were as follows:
These builds pulled predictor candidates from one of three specialized feature subsets determined by ten feature selection algorithms that differed from the glmnet approach used in simple grids.
Both TPv1 (Jones et al., 2016), and AK 2016 builds (see below) used a variety of feature selection methods encompassed in the R package known as FSelector. To increase the power of the simple grids, ten FSelector feature selection algorithms were applied to three promising subsets of features; then simple grid builds pulled candidate predictors only from features selected by these additional algorithms.
The ten FSelector algorithms applied were correlation, consistency, linear correlation, rank correlation, information gain, gain ratio, symmetrical uncertainty, oneR, random forest, and relief. The three promising transition subsets to which these algorithms were applied were the 252 transitions with univariate CRC signal (see You et al., 2018), the 23 transitions and ratios with univariate CRC AUCs (CRC vs NCNF) >=0.85, and the 974 transitions with complete measures and passing peak quality metrics (from the second data matrix described below). For each feature subset, the features selected by the ten algorithms were pooled and then used as a single list of features from which the simple grid builds would pull candidate predictors in a separate set of builds.
These builds pulled predictors from a specialized subset of 23 transitions based on AK 2016 classifier builds.
AK built TPv2 classifiers using the “expanded grid” process in late 2016. The expanded grid differed from the simple grid primarily in using a wider range of feature selection methods. In the past, some of API's best-performing classifiers resulted from AK's expanded grid. Thus, one strategy for the new TPv2 classifiers described here was to limit features in some of the new builds to those used in the best AK build. To that end, AK's 2016 classifier files were compiled and explored to identify these features.
The best 2016 TPv2 build was an 11-feature glmboost, with median merged test AUC of 0.92 from discovery cross-validation. This build was for a CRC vs NCNF discrimination. For this particular model, 32 features (31 transitions and age) were selected as predictors in various versions of the 11-feature glmboost model. Ideally, all of these features would be explored with new classifiers using the final classifier matrices provided by AK to the team. However, only 23 of the 31 transitions appeared in the preferred data matrix (the matrix with complete measures from transitions that passed peak quality checks, see below). In addition, for those transitions that were represented in both AK builds' and the 2018 builds' data matrices, the concentration values differed numerically between the two files; this was likely due to the use of different algorithms for calculating raw peak area—probably pipeline-based raw peaks for the best AK build, and AKRawV1 raw peaks for the files distributed to the classifier team. Despite these issues, a reasonable approach was to use the 23 features appearing in both the AK and classifier team matrices, when performing the subset of the new builds aimed at exploring the best AK build. These 23 features were as follows:
To enable manual review of peak quality, peak images were built for transitions that appeared in top classifiers. The process for building these images was based on that employed by AK in 2016, when an effort was made to produce image files for all of the TPv2 transitions. This 2016 effort was halted before completion, in part because of the long time required to build the images. Here, the same process was used to build image files for just the subset of transitions playing important roles in the 2018 classifiers.
A peak identification algorithm was used for calculating raw peak areas. An alternative would have been to use the API pipeline algorithm. (Note: The pipeline algorithm was likely used to calculate peak areas for data used in AK's original classifier builds.)
Some data files contain only those transitions that had valid measures in all 1,045 samples. Valid measures were those with non-NA raw peak areas for SIS peaks.
Some data files were built using only transitions whose endogenous and SIS peaks were assigned to peak quality group 1 or 2. Thus these data files contain only those transitions that were assessed as good quality and that had valid measures in all 1,045 samples. The peak quality tool used was a random forest classifier that assigns peaks to one of three quality groups, with group 3 being the lowest quality group.
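The two filters described above (complete SIS measures across all samples, and peak quality group 1 or 2 for both endogenous and SIS peaks) can be sketched as follows. This is an illustrative simplification: it assumes one quality group per transition for each peak type, represents NA as `None` or NaN, and uses hypothetical transition names and a three-sample toy data set rather than the 1,045 samples.

```python
import math
from typing import Dict, List, Optional, Tuple


def is_valid_area(area: Optional[float]) -> bool:
    """Treat a raw peak area as valid when it is neither None nor NaN."""
    return area is not None and not math.isnan(area)


def select_transitions(
    sis_areas: Dict[str, List[Optional[float]]],
    quality_groups: Dict[str, Tuple[int, int]],
    allowed_groups: Tuple[int, ...] = (1, 2),
) -> List[str]:
    """Keep transitions with valid SIS areas in every sample and with both
    endogenous and SIS peaks in an allowed quality group.

    Assumption: quality_groups maps each transition to a single
    (endogenous_group, sis_group) pair; the actual tool may score peaks
    at a finer granularity.
    """
    kept = []
    for transition, areas in sis_areas.items():
        complete = all(is_valid_area(a) for a in areas)
        endo_group, sis_group = quality_groups[transition]
        good = endo_group in allowed_groups and sis_group in allowed_groups
        if complete and good:
            kept.append(transition)
    return kept


# Hypothetical toy data: four transitions measured in three samples.
sis_areas = {
    "tr_A": [1200.0, 980.5, 1103.2],
    "tr_B": [850.0, None, 910.4],            # NA in one sample
    "tr_C": [400.0, 415.0, float("nan")],    # NaN in one sample
    "tr_D": [700.0, 690.0, 720.0],
}
quality_groups = {
    "tr_A": (1, 2),
    "tr_B": (1, 1),
    "tr_C": (2, 2),
    "tr_D": (1, 3),                          # SIS peak in lowest-quality group
}
kept = select_transitions(sis_areas, quality_groups)
```

In this toy example only `tr_A` passes both filters; `tr_B` and `tr_C` fail the completeness check and `tr_D` fails the quality check.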
Comparison of Measures from Three Endoscopy II Studies
Additional work was performed comparing the various measures API generated for the Endoscopy II samples. These included CRC05 ELISA, CRC06 MSD, and CRC05 MRM (TPv2) measures.
Of the 58 simple grids performed, 17 gave rise to classifiers that were subjected to NoC analyses. Validation was attempted for six of these 17 classifiers, and succeeded for three. These three successful validations came from grid build numbers 28, 40, and 52. Further details about the 58 grids performed are presented in the Discussion. Here we offer
This application claims the benefit of U.S. Prov. App. Ser. No. 62/594,941, filed Dec. 5, 2017, which is hereby explicitly incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/064107 | 12/5/2018 | WO | 00 |
Number | Date | Country |
---|---|---|
62594941 | Dec 2017 | US |