The content of the following submission on ASCII text file is incorporated herein by reference in its entirety: a computer readable form (CRF) of the Sequence Listing (file name: 166532000900SEQLIST.txt, date recorded: Mar. 6, 2022, size: 674,724 bytes).
Protein glycosylation and other post-translational modifications play vital roles in virtually all aspects of human physiology. Unsurprisingly, faulty or altered protein glycosylation often accompanies various disease states. The identification of aberrant glycosylation provides opportunities for early detection, intervention, and treatment of affected subjects. Current biomarker identification methods, such as those developed in the fields of proteomics and genomics, can be used to detect indicators of certain diseases, such as cancer, and to differentiate certain types of cancer from other, non-cancerous diseases. However, glycoproteomic analyses have not previously been used to successfully manage treatment of a subject.
Glycoprotein analysis is fraught with challenges on several levels. For example, a single glycan composition in a glycopeptide can contain a large number of isomeric structures due to different glycosidic linkages, branching patterns, and/or multiple monosaccharides having the same mass. In addition, the presence of multiple glycans that share the same peptide backbone can lead to assay signals from various glycoforms, lowering their individual abundances compared to aglycosylated peptides. Accordingly, the development of algorithms that can identify glycan structures on peptide fragments remains elusive.
In light of the above, there is a desire for improved analytical methods that involve site-specific analysis of glycoproteins to obtain information about protein glycosylation patterns, which can in turn provide quantitative information that can be used to manage the treatment of a subject diagnosed with a particular disease or condition. Thus, it may be desirable to have methods and systems capable of addressing one or more of the above-identified issues.
In one or more embodiments, a method is provided for managing a treatment for a subject diagnosed with a melanoma condition. The method includes receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject. A treatment score is computed using quantification data identified from the peptide structure data for a set of peptide structures. The set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. A treatment output that indicates a predicted response to the treatment for the subject is generated using the treatment score.
In one or more embodiments, a method is provided for treatment management of a subject diagnosed with a melanoma condition. The method includes receiving peptide structure data corresponding to a set of peptide structures associated with a set of glycoproteins in a biological sample obtained from the subject. A plurality of treatment scores is computed using quantification data identified from the peptide structure data for a plurality of subsets of the set of peptide structures. Each treatment score of the plurality of treatment scores corresponds to a different treatment of a plurality of treatments. Each subset of the plurality of subsets includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. A comparison analysis of the plurality of treatment scores is performed. A treatment output is generated based on the comparison analysis. The treatment output includes a recommended treatment plan for treating the subject.
In one or more embodiments, a method is provided for treatment management of a subject diagnosed with a melanoma condition. The method includes receiving peptide structure data corresponding to a set of peptide structures associated with a set of glycoproteins in a biological sample obtained from the subject. A first treatment score is computed for a first treatment of pembrolizumab using first quantification data identified from the peptide structure data for a first subset of the set of peptide structures. The first subset includes at least one peptide structure identified from a plurality of peptide structures listed in Table 2. A second treatment score is computed for a second treatment comprised of nivolumab and ipilimumab using second quantification data identified from the peptide structure data for a second subset of the set of peptide structures. The second subset includes at least one peptide structure identified from a plurality of peptide structures listed in Table 3. A comparison analysis of the first treatment score and the second treatment score is performed. A treatment output is generated based on the comparison analysis. The treatment output identifies one of the first treatment and the second treatment as a recommended treatment for the subject.
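By way of illustration only, the comparison analysis of treatment scores described above could be sketched as follows. The function name `recommend_treatment`, the convention that a higher score indicates a more favorable predicted response, the tie-handling `margin`, and the example score values are all assumptions made for this sketch and are not specified by the disclosure.

```python
from typing import Dict, Optional


def recommend_treatment(scores: Dict[str, float], margin: float = 0.0) -> Optional[str]:
    """Return the treatment with the highest score.

    Returns None when the two best scores differ by no more than `margin`
    (a hypothetical rule for declining to recommend on a near tie).
    Assumes a higher score means a more favorable predicted response.
    """
    if not scores:
        raise ValueError("at least one treatment score is required")
    # Rank treatments from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= margin:
        return None
    return ranked[0][0]


# Hypothetical scores for the two treatments named in the text.
example_scores = {"pembrolizumab": 0.72, "nivolumab + ipilimumab": 0.55}
recommended = recommend_treatment(example_scores, margin=0.05)
```

In this sketch the treatment output is simply the name of the higher-scoring treatment; an actual embodiment may use a different comparison rule.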
In one or more embodiments, a method is provided for treating a subject diagnosed with a melanoma condition. The method includes receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject. A treatment score is computed using quantification data identified from the peptide structure data for a set of peptide structures. The set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. A treatment output that indicates a predicted response to a treatment for the subject is generated using the treatment score. The treatment is administered to the patient in response to the predicted response including a positive response classification. The step of administering comprises at least one of intravenous or oral administration of the recommended treatment or a derivative thereof at a therapeutic dosage. The treatment is selected as one from a group consisting of: a first treatment of pembrolizumab for which the therapeutic dosage of at least one of 200 mg every three weeks, 2 mg/kg every three weeks, or 400 mg every 6 weeks is administered; and a second treatment comprised of nivolumab and ipilimumab for which the therapeutic dosage of either 1 mg/kg nivolumab with 3 mg/kg ipilimumab or 3 mg/kg nivolumab with 1 mg/kg ipilimumab is administered.
In one or more embodiments, a method is provided for managing a treatment for a subject diagnosed with a melanoma condition. The method includes receiving sample data for a sample population. The sample data characterizes responses of a plurality of sample subjects diagnosed with the melanoma condition to the treatment and includes sample peptide structure data for a collection of peptide structures for each subject of the plurality of sample subjects. The sample data is grouped based on the responses of the plurality of sample subjects into a first group corresponding to a first response classification and a second group corresponding to a second response classification. A differential abundance analysis is performed using the sample data to compare the first group of the sample data corresponding to the first response classification and the second group of the sample data corresponding to the second response classification to identify a set of peptide structures from the collection of peptide structures. The set of peptide structures comprises a selected N most differentiating peptide structures between the first response classification and the second response classification. Peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject is received. A treatment score is computed for the treatment using quantification data identified from the peptide structure data for the set of peptide structures. A treatment output that indicates a predicted response to the treatment for the subject is generated using the treatment score.
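As a minimal sketch of the selection step described above, the N most differentiating peptide structures could be chosen by ranking each structure on the absolute value of a two-group test statistic. The use of Welch's t-statistic, the function names, the placeholder structure names (`PS-A`, `PS-B`, `PS-C`), and the abundance values below are assumptions for illustration; the disclosure does not state which statistic is used.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t-statistic for two independent groups (at least 2 values each)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_n_differentiating(
    group1: Dict[str, List[float]], group2: Dict[str, List[float]], n: int
) -> List[str]:
    """Rank peptide structures present in both groups by |t| and keep the top n."""
    shared = group1.keys() & group2.keys()
    ranked = sorted(
        shared, key=lambda ps: abs(welch_t(group1[ps], group2[ps])), reverse=True
    )
    return ranked[:n]


# Hypothetical normalized abundances per response classification.
first_group = {
    "PS-A": [1.0, 1.2, 1.1, 0.9],
    "PS-B": [2.0, 2.1, 1.9, 2.0],
    "PS-C": [0.5, 0.7, 0.6, 0.4],
}
second_group = {
    "PS-A": [1.1, 1.0, 0.9, 1.2],
    "PS-B": [3.0, 3.2, 2.9, 3.1],
    "PS-C": [0.6, 0.5, 0.8, 0.5],
}
selected = top_n_differentiating(first_group, second_group, n=1)
```

Here `PS-B` is selected because its abundance differs most between the two groups relative to its within-group spread.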
In one or more embodiments, a method of treating melanoma in a subject is provided. The method includes receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject. A treatment score is computed using quantification data identified from the peptide structure data for a set of peptide structures. The set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. A treatment output is computed using the treatment score. A pembrolizumab treatment is administered to the subject if the treatment output includes at least one of a positive response classification for the pembrolizumab treatment or an identification of the pembrolizumab treatment as a recommended treatment.
In one or more embodiments, a method of treating melanoma in a subject is provided. The method includes receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject. A treatment score is computed using quantification data identified from the peptide structure data for a set of peptide structures. The set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. A treatment output is computed using the treatment score. A combination treatment comprising a combination of nivolumab and ipilimumab is administered to the subject if the treatment output includes at least one of a positive response classification for the combination treatment or an identification of the combination treatment as a recommended treatment.
In one or more embodiments, a method of identifying patients with melanoma for treatment with a pembrolizumab treatment is provided. The method includes receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject. A treatment score is computed using quantification data identified from the peptide structure data for a set of peptide structures. The set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. A treatment output is generated using the treatment score. The patient is treated with the pembrolizumab treatment if the treatment output includes at least one of a positive response classification for the pembrolizumab treatment or an identification of the pembrolizumab treatment as a recommended treatment.
In one or more embodiments, a method of identifying patients with melanoma for treatment with a combination treatment comprising nivolumab and ipilimumab is provided. The method includes receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject. A treatment score is computed using quantification data identified from the peptide structure data for a set of peptide structures. The set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. A treatment output is generated using the treatment score. The patient is treated with the combination treatment if the treatment output includes at least one of a positive response classification for the combination treatment or an identification of the combination treatment as a recommended treatment.
In one or more embodiments, a method is provided for analyzing a set of peptide structures in a sample from a patient. The method includes (a) obtaining the sample from the patient; (b) preparing the sample to form a prepared sample comprising a set of peptide structures; (c) inputting the prepared sample into a reaction monitoring mass spectrometry system to detect a set of product ions associated with each peptide structure of the set of peptide structures; and (d) generating quantification data for the set of product ions using the reaction monitoring mass spectrometry system. The set of peptide structures includes at least one peptide structure selected from peptide structures PS-1 to PS-38 identified in Table 6. The set of peptide structures includes a peptide structure that is characterized as having: (i) a precursor ion with a mass-charge (m/z) ratio within ±1.5 of the m/z ratio listed for the precursor ion in Table 6 as corresponding to the peptide structure; and (ii) a product ion having an m/z ratio within ±1.0 of the m/z ratio listed for the first product ion in Table 6 as corresponding to the peptide structure.
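The m/z tolerance criteria in (i) and (ii) above can be expressed as a simple check. The sketch below assumes a hypothetical helper `matches_transition`; the listed m/z values in the example are placeholders and are not taken from Table 6.

```python
def matches_transition(
    observed_precursor_mz: float,
    observed_product_mz: float,
    listed_precursor_mz: float,
    listed_product_mz: float,
    precursor_tol: float = 1.5,
    product_tol: float = 1.0,
) -> bool:
    """Return True when both ions fall within the stated tolerances.

    The precursor ion must be within +/- precursor_tol and the product ion
    within +/- product_tol of the corresponding listed m/z values.
    """
    precursor_ok = abs(observed_precursor_mz - listed_precursor_mz) <= precursor_tol
    product_ok = abs(observed_product_mz - listed_product_mz) <= product_tol
    return precursor_ok and product_ok


# Placeholder listed values (not from Table 6).
LISTED_PRECURSOR = 1000.0
LISTED_PRODUCT = 500.0

within = matches_transition(1001.2, 500.6, LISTED_PRECURSOR, LISTED_PRODUCT)
outside = matches_transition(1002.0, 500.6, LISTED_PRECURSOR, LISTED_PRODUCT)
```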
In one or more embodiments, a composition is provided, the composition comprising a peptide structure or a product ion, wherein: the peptide structure or product ion comprises the amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 21-46, corresponding to peptide structures PS-1 to PS-38 in Table 1; and the product ion is selected as one from a group consisting of product ions identified in Table 6 including product ions falling within an identified m/z range.
In one or more embodiments, a composition is provided, the composition comprising a glycopeptide structure selected as one from a group consisting of peptide structures PS-1 to PS-38 identified in Table 6. The glycopeptide structure comprises: an amino acid peptide sequence identified in Table 5 as corresponding to the glycopeptide structure; and a glycan structure identified in Table 1 as corresponding to the glycopeptide structure in which the glycan structure is linked to a residue of the amino acid peptide sequence at a corresponding position identified in Table 1. The glycan structure has a glycan composition.
In one or more embodiments, a composition is provided, the composition comprising a peptide structure selected as one from a plurality of peptide structures identified in Table 1. The peptide structure has a monoisotopic mass identified as corresponding to the peptide structure in Table 1. The peptide structure comprises the amino acid sequence of SEQ ID NOs: 21-46 identified in Table 1 as corresponding to the peptide structure.
In one or more embodiments, a kit is provided, the kit comprising at least one agent for quantifying at least one peptide structure identified in Table 1 to carry out at least a portion of any one of the methods disclosed herein.
In one or more embodiments, a kit is provided, the kit comprising at least one of a glycopeptide standard, a buffer, or a set of peptide sequences to carry out at least a portion of any one of the methods disclosed herein, a peptide sequence of the set of peptide sequences identified by a corresponding one of SEQ ID NOS: 21-46, defined in Table 1.
Provided herein are methods, devices, and kits for identifying glycoproteomic biomarkers and signatures for diagnosis of a disease or a condition, such as cancer, progression of the disease or condition, and response of the disease or condition to a treatment, such as treatment with immune checkpoint blockade for cancer.
Provided herein are methods for identifying one or more glycopeptide biomarkers predictive of a disease or a condition in a subject, the method comprising: (a) obtaining from a subject a first sample at a first timepoint and a second sample at a second timepoint, wherein the first sample and the second sample comprise a glycoprotein; (b) fragmenting the glycoprotein in the first sample or the second sample into one or more glycopeptides, wherein the one or more glycopeptides comprise one or more amino acid sequences selected from a group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof; (c) determining an amount of the one or more glycopeptides using multiple reaction monitoring mass spectrometry (MRM-MS); (d) associating the amount of the one or more glycopeptides with the first timepoint or the second timepoint, wherein the subject has a change in a disease or a condition from the first timepoint to the second timepoint; and (e) identifying as glycopeptide biomarkers the glycopeptide where the amount of the one or more glycopeptides changed from the first timepoint to the second timepoint.
Provided herein are methods for identifying one or more glycopeptide biomarkers predictive of a disease or a condition in a subject, the method comprising: (a) obtaining, by a computer, data of an amount of one or more glycopeptides for a set (n) of subjects, wherein the one or more glycopeptides are generated by fragmenting a glycoprotein in a sample from a subject, the amount of one or more glycopeptides are determined using multiple reaction monitoring mass spectrometry (MRM-MS), and the data for each subject comprises data from samples taken at a plurality of timepoints; (b) selecting, by the computer, a subset of the one or more glycopeptides to include in a predictive model; (c) assessing, by the computer, the predictive model using a cross-validation with n−1 subjects to generate an outcome score for a holdout subject; (d) iterating, by the computer, step (c) for each of n subjects as the holdout subject to generate an outcome score for each subject; (e) dichotomizing, by the computer, the outcome scores for each subject at a cutoff outcome score as below or above the cutoff outcome score; (f) analyzing, by the computer, the amount of one or more glycopeptides for subjects having outcome scores above the cutoff outcome score to the amount of one or more glycopeptides for subjects having outcome scores below the cutoff outcome score for each glycopeptide in the subset of the one or more glycopeptides to determine a hazard ratio and an interaction p-value for each glycopeptide; (g) identifying, by the computer, the glycopeptide having the interaction p-value ≤0.05 as a glycopeptide biomarker for predicting the disease or the condition. In some embodiments, the cross-validation is leave-one-out cross-validation (LOOCV). In some embodiments, the cutoff outcome score was determined to optimize Harrell's C-index. In some embodiments, the interaction p-value is less than or equal to 0.01, 0.005, or 0.001 in step (g).
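Steps (c) through (e) above can be illustrated with a minimal leave-one-out sketch. The predictive model here is a single-feature least-squares line, chosen only to keep the example self-contained; the model form, function names, data values, and cutoff are assumptions. The hazard ratio and interaction p-value of steps (f) and (g) require a survival analysis and are not shown.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def loocv_scores(x: List[float], y: List[float]) -> List[float]:
    """For each subject, fit on the other n-1 subjects and score the holdout."""
    scores = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        scores.append(slope * x[i] + intercept)
    return scores


def dichotomize(scores: List[float], cutoff: float) -> List[bool]:
    """True where the outcome score is above the cutoff, False otherwise."""
    return [s > cutoff for s in scores]


# Hypothetical data: one glycopeptide abundance and one outcome per subject.
abundance = [0.0, 1.0, 2.0, 3.0, 4.0]
outcome = [1.0, 3.0, 5.0, 7.0, 9.0]
holdout_scores = loocv_scores(abundance, outcome)
above_cutoff = dichotomize(holdout_scores, cutoff=4.0)
```

Because each holdout subject is scored by a model that never saw that subject, the resulting scores give a less optimistic view of model performance than scores computed on the training data.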
Provided herein are methods for assessing a status of a condition and a treatment in a subject, the method comprising: (a) fragmenting a glycoprotein in a sample from a subject into one or more glycopeptides, wherein the sample comprises one or more of glycoproteins, glycans, or glycopeptides; (b) performing mass spectrometry (MS) on the one or more glycopeptides using multiple reaction monitoring mass spectrometry (MRM-MS) to quantify an amount of the one or more glycopeptides in the sample, wherein the one or more glycopeptides comprise one or more amino acid sequences selected from a group consisting of SEQ ID NOs: 7, 9, 12, 15, 16, 18, 20, 30, 34, 37, 44, 59, 60, 61, 62, 66, 69, 70, 75, 77, 80, and 83, and combinations thereof; (c) inputting data of the amount of the one or more glycopeptides into a trained model to generate an output probability, wherein the output probability is indicative of whether a treatment positively influences an outcome of the subject having a condition; and (d) generating a treatment recommendation based on the output probability, wherein the condition is melanoma and the treatment comprises checkpoint inhibitors. In some embodiments, the outcome comprises overall survival time. In some embodiments, the outcome comprises progression-free survival time. In some embodiments, the treatment comprises one or more of ipilimumab, nivolumab, and pembrolizumab. In some embodiments, the treatment comprises one or more of PD-1-, PD-L1-, and CTLA-4-inhibitors. In some embodiments, the recommendation comprises continuing the treatment if the output probability indicates the treatment positively influences the outcome.
Furthermore, provided herein are methods for assessing a status of a condition and a treatment in a subject, the method comprising: (a) fragmenting a glycoprotein in a sample from a subject into one or more glycopeptides, wherein the sample comprises one or more of glycoproteins, glycans, or glycopeptides; (b) performing mass spectrometry (MS) on the one or more glycopeptides using multiple reaction monitoring mass spectrometry (MRM-MS) to quantify an amount of the one or more glycopeptides in the sample, wherein the one or more glycopeptides comprise one or more amino acid sequences selected from a group consisting of SEQ ID NOs: 300-429, and combinations thereof; (c) inputting data of the amount of the one or more glycopeptides into a trained model to generate an output probability, wherein the output probability is indicative of whether a treatment positively influences an outcome of the subject having a condition; and (d) generating a treatment recommendation based on the output probability, wherein the condition is non-small cell lung cancer (NSCLC) and the treatment comprises checkpoint inhibitors. In some embodiments, the outcome comprises overall survival time. In some embodiments, the outcome comprises progression-free survival time. In some embodiments, the treatment comprises one or more of ipilimumab, nivolumab, and pembrolizumab. In some embodiments, the treatment comprises one or more of PD-1-, PD-L1-, and CTLA-4-inhibitors. In some embodiments, the treatment comprises chemotherapy. In some embodiments, the chemotherapy comprises one or more of carboplatin and pemetrexed. In some embodiments, the recommendation comprises continuing the treatment if the output probability indicates the treatment positively influences the outcome.
Provided herein are glycopeptides comprising an amino acid sequence selected from a group consisting of SEQ ID NOs: 300-429, and combinations thereof.
Described herein are kits comprising a glycopeptide standard comprising a glycopeptide comprising one or more amino acid sequences selected from a group consisting of SEQ ID NOs: 300-429, and an instruction for using the glycopeptide standard for treating cancer.
In some embodiments, fragmenting comprises protease digestion. In some embodiments, fragmenting comprises applying a mechanical force. In some embodiments, the amount of one or more glycopeptides is determined by measuring multiple reaction monitoring (MRM) transitions. In some embodiments, the method further comprises generating a panel of glycopeptide biomarkers comprising one or more of the glycopeptide biomarkers identified in step (e). In some embodiments, the cross-validation is leave-one-out cross-validation (LOOCV). In some embodiments, the cutoff outcome score is determined to optimize Harrell's C-index. In some embodiments, the interaction p-value is less than or equal to 0.01, 0.005, or 0.001 in step (g). In some embodiments, the outcome comprises overall survival time. In some embodiments, the outcome comprises progression-free survival time. In some embodiments, the treatment comprises one or more of ipilimumab, nivolumab, and pembrolizumab. In some embodiments, the treatment comprises one or more of PD-1-, PD-L1-, and CTLA-4-inhibitors. In some embodiments, the treatment comprises chemotherapy. In some embodiments, the chemotherapy comprises one or more of carboplatin and pemetrexed. In some embodiments, the recommendation comprises continuing the treatment if the output probability indicates the treatment positively influences the outcome.
In one or more embodiments, a system is provided that includes one or more data processors and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods disclosed herein.
In one or more embodiments, a computer-program product is provided that is tangibly embodied in a non-transitory machine-readable storage medium and that includes instructions configured to cause one or more data processors to perform part or all of one or more methods disclosed herein.
The present disclosure is described in conjunction with the appended figures:
Objective response rates for immune-oncology therapy are low in malignant melanoma and non-small cell lung cancer patients. Subjects should avoid unnecessary exposure and toxicities if they will not respond to immune-oncology therapy. Thus, in some aspects, the present invention is directed to identifying subjects who are not likely to respond to immune-oncology therapy (such as treatment with pembrolizumab and/or treatment with nivolumab and ipilimumab). In some embodiments, the methods provided herein increase the rate of responders to immune-oncology treatments by identifying non-responders. Another advantage of the present method is that it can be used to reduce the cost associated with immune-oncology therapy per indication by avoiding treatment of subjects that are not likely to respond to treatment.
In some aspects, the present methods employ models and other predictive methods to assess the likelihood of response of a subject to immunotherapy. In some aspects, the methods provided herein have a high sensitivity for non-responders (those that are not likely to respond to immune-oncology therapy). In some aspects, the methods provided herein have a >95%, >97%, >98%, or >99% sensitivity for detection of non-responders.
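For clarity, sensitivity for non-responders is the fraction of true non-responders that the method correctly identifies as non-responders. A minimal sketch of that calculation follows; the function name, the label strings, and the example data are assumptions for illustration and do not represent results from the disclosure.

```python
from typing import List


def non_responder_sensitivity(
    actual: List[str], predicted: List[str], label: str = "non-responder"
) -> float:
    """Sensitivity = true positives / (true positives + false negatives),
    treating `label` as the positive class."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == label and p == label)
    false_neg = sum(1 for a, p in pairs if a == label and p != label)
    if true_pos + false_neg == 0:
        raise ValueError("no subjects with the positive label in `actual`")
    return true_pos / (true_pos + false_neg)


# Hypothetical labels for six subjects.
actual_labels = ["non-responder"] * 4 + ["responder"] * 2
predicted_labels = [
    "non-responder", "non-responder", "non-responder", "responder",
    "responder", "non-responder",
]
sensitivity = non_responder_sensitivity(actual_labels, predicted_labels)
```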
Provided herein are methods for management of treatment for subjects diagnosed with melanomas. In some embodiments, the subject is diagnosed with advanced melanoma. In some embodiments, the subject is diagnosed with malignant melanoma. In some embodiments, the subject is diagnosed with metastatic melanoma. In some embodiments, the method comprises determining whether the subject is likely to respond to an immunotherapy. In some embodiments, the method comprises determining whether the subject is likely to respond to treatment with pembrolizumab. In some embodiments, the method comprises determining whether the subject is likely to respond to treatment with nivolumab and ipilimumab.
Provided herein are methods of treating melanoma in a subject comprising administering a treatment to the subject. In some embodiments, the melanoma is advanced melanoma. In some embodiments, the melanoma is malignant melanoma. In some embodiments, the melanoma is metastatic melanoma. In some embodiments, the treatment comprises administering pembrolizumab to the subject. In some embodiments, the treatment comprises administering nivolumab and ipilimumab to the subject.
In some embodiments, the method comprises determining the likelihood of response of a subject having melanoma to nivolumab plus ipilimumab as a first line therapy. In some embodiments, the method comprises determining the likelihood of response to nivolumab plus ipilimumab as a second line therapy.
In some embodiments, the method comprises determining the likelihood of response of a subject having non-small cell lung cancer to pembrolizumab as a first line therapy. In some embodiments, the method comprises determining the likelihood of response to pembrolizumab as a second line therapy.
In some embodiments, the methods provided herein comprise generating a treatment output that predicts a response to an immune-oncology therapy (such as pembrolizumab or nivolumab plus ipilimumab). In some embodiments, the predicted response is likely responsive, likely nonresponsive, or indeterminate. In some embodiments, the treatment output is determined based upon the presence, absence, or amount of one or more glycopeptides set forth in Table 7, Table 12, Table 14, or Table 16. In some embodiments, the methods provided herein predict overall survival in subjects with melanoma. In some embodiments, the methods provided herein predict progression-free survival in subjects with NSCLC.
The embodiments described herein recognize that glycoproteomics is an emerging field that can be used in the overall treatment of subjects (e.g., patients) with various types of diseases. Glycoproteomics aims to determine the positions, identities, and quantities of glycans and glycosylated proteins in a given sample (e.g., blood sample, cell, tissue, etc.). Protein glycosylation is one of the most common and most complex forms of post-translational protein modification, and can affect protein structure, conformation, and function. For example, glycoproteins may play crucial roles in important biological processes such as cell signaling, host-pathogen interactions, and immune response and disease. Glycoproteins may therefore be important to treating different types of diseases.
Although protein glycosylation provides useful information about cancer and other diseases, analysis of protein glycosylation may be difficult as the glycan typically cannot be traced back to the protein site of origin with currently available methodologies. Glycoprotein analysis can be challenging in general due to several reasons. For example, a single glycan composition in a peptide may contain a large number of isomeric structures because of different glycosidic linkages, branching, and many monosaccharides having the same mass. Further, the presence of multiple glycans that share the same peptide sequence may cause the mass spectrometry (MS) signal to split into various glycoforms, lowering their individual abundances compared to the peptides that are not glycosylated (aglycosylated peptides).
But to understand various disease conditions and more accurately manage the treatment of such disease conditions, such as melanoma, it may be important to perform analysis of glycoproteins and to identify not only the glycan but also the linking site (e.g., the amino acid residue of attachment) within the protein. Thus, there is a need to provide a method for site-specific glycoprotein analysis to obtain detailed information about protein glycosylation patterns which may be able to provide information that can be used to treat diseases, such as melanoma.
Melanoma is a type of cancer that develops from melanocytes, cells that produce pigment. Melanoma may be treated using different types of treatment including, for example, immunotherapies. Such immunotherapies include various types of immune checkpoint inhibitor treatments (e.g., pembrolizumab, nivolumab, ipilimumab) and cytokine therapies (e.g., interferon alpha (IFN-α) and interleukin 2 (IL-2)). Immune checkpoint inhibitors include, for example, anti-cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) monoclonal antibodies (e.g., ipilimumab, tremelimumab), toll-like receptor (TLR) agonists, cluster of differentiation 40 (CD40) agonists, anti-programmed cell death protein 1 (PD-1) antibodies (e.g., pembrolizumab, pidilizumab, and nivolumab), and programmed death-ligand 1 (PD-L1) antibodies.
Different patients may respond differently to different treatments. For example, some patients may have great success with one type of treatment while other patients may have limited or no success with that same treatment. Because melanoma is an aggressive cancer and one of the most serious cancers, subjects may not have the luxury of trying different types of treatments over time. It may be important to identify those subjects who are likely to respond to a given treatment to help avoid the burden associated with adverse events (e.g., events that disrupt a subject's progression-free survival) and to avoid the cost associated with treating subjects who are not likely to respond to certain treatments. Previous methodologies generally focused on specific mechanisms of drug efficacy of a particular treatment. For example, such methodologies focused on tumor response rather than subject survival. But the embodiments described herein provide ways in which to predict treatment response with respect to survivability for different drugs so that a better treatment may be selected for a subject at the outset.
Analyzing peptide structure expression in subjects and, in particular, glycopeptide structure abundance may help predict subject response to treatment for melanoma. A peptide structure may be defined by an aglycosylated peptide sequence (e.g., a peptide or peptide fragment of a larger parent protein) or a glycosylated peptide sequence. A glycosylated peptide sequence (also referred to as a glycopeptide structure) may be a peptide sequence having a glycan structure that is attached to a linking site (e.g., an amino acid residue) of the peptide sequence, which may occur via, for example, a particular atom of the amino acid residue. Non-limiting examples of glycosylated peptides include N-linked glycopeptides and O-linked glycopeptides.
Further, with glycoproteins, there may be too many potential proteoforms to consider. Still further, analysis of peptide structure data in the manner described by the various embodiments herein may be more conducive to accurately predicting treatment response as compared to glycomic analysis that provides little to no information about what proteins and to which amino acid residue sites various glycan structures attach.
By analyzing which peptide structures are most differentiating between different treatment response classifications of interest (e.g., sustained control and early disruption) for a given treatment and then analyzing a subject's peptide structure profile of those particular peptide structures, a clearer understanding of how that subject will respond to that treatment may be achieved.
Accordingly, the embodiments described herein provide various methods and systems for analyzing proteins in subjects and, in particular, glycoproteins. In one or more embodiments, methods and systems are provided for treatment management of a subject diagnosed with a melanoma condition. For example, the embodiments described herein provide methods and systems for receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject; computing a treatment score using quantification data identified from the peptide structure data for a set of peptide structures, wherein the set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1; and generating a treatment output that indicates a predicted response to the treatment for the subject using the treatment score. The predicted response may indicate whether the subject is likely to have sustained control (e.g., no disruption events that might disrupt the subject's progression-free survival within 12 months of treatment) with the treatment or to have early disruption (e.g., one or more disruption events within the first 6 months of treatment).
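By way of a non-limiting illustration, the step of computing a treatment score and generating a treatment output may be sketched as follows. The sketch assumes a logistic function over weighted abundances, which is only one possible form of model. The peptide structure names (PS-A, PS-B), weights, intercept, and threshold are hypothetical placeholders and are not taken from Table 1, and the convention that a higher score indicates sustained control is likewise an assumption.

```python
import math

# Hypothetical coefficients: the peptide-structure names, weights, intercept,
# and threshold are placeholders for illustration, not values from Table 1.
WEIGHTS = {"PS-A": 1.2, "PS-B": -0.8}
INTERCEPT = -0.1
THRESHOLD = 0.5


def treatment_score(abundances, weights=WEIGHTS, intercept=INTERCEPT):
    """Map quantification data to a score in (0, 1) with a logistic function."""
    linear = intercept + sum(w * abundances[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))


def treatment_output(score, threshold=THRESHOLD):
    """Turn a score into one of the two response classifications.

    Assumption: scores at or above the threshold indicate sustained control.
    """
    return "sustained control" if score >= threshold else "early disruption"


if __name__ == "__main__":
    score = treatment_score({"PS-A": 1.0, "PS-B": 1.0})
    print(round(score, 3), treatment_output(score))
```

In practice, the weights and threshold would be determined from training data for subjects with known treatment responses rather than set by hand.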
The description below provides exemplary implementations of the methods and systems described herein for the research and/or treatment (e.g., designing, planning, administration, etc. of a treatment) of melanoma. Descriptions and examples of various terms, as used herein, are provided in Section II below.
The term “ones” means more than one.
As used herein, the term “plurality” may be 2, 3, 4, 5, 6, 7, 8, 9, 10, or more.
As used herein, the term “set of” means one or more. For example, a set of items includes one or more items.
As used herein, the phrase “at least one of,” when used with a list of items, means different combinations of one or more of the listed items may be used and only one of the items in the list may be needed. The item may be a particular object, thing, step, operation, process, or category. In other words, “at least one of” means any combination of items or number of items may be used from the list, but not all of the items in the list may be required. For example, without limitation, “at least one of item A, item B, or item C” means item A; item A and item B; item B; item A, item B, and item C; item B and item C; or item A and item C. In some cases, “at least one of item A, item B, or item C” means, but is not limited to, two of item A, one of item B, and ten of item C; four of item B and seven of item C; or some other suitable combination.
As used herein, “substantially” means sufficient to work for the intended purpose. The term “substantially” thus allows for minor, insignificant variations from an absolute or perfect state, dimension, measurement, result, or the like such as would be expected by a person of ordinary skill in the field but that do not appreciably affect overall performance. When used with respect to numerical values or parameters or characteristics that can be expressed as numerical values, “substantially” means within ten percent.
The term “amino acid,” as used herein, generally refers to any organic compound that includes an amino group (e.g. —NH2), a carboxyl group (—COOH), and a side chain group (R) which varies based on a specific amino acid. Amino acids can be linked using peptide bonds.
The term “alkylation,” as used herein, generally refers to the transfer of an alkyl group from one molecule to another. In various embodiments, alkylation is used to react with reduced cysteines to prevent the re-formation of disulfide bonds after reduction has been performed.
The term “linking site” or “glycosylation site” as used herein generally refers to the location where a sugar molecule of a glycan or glycan structure is directly bound (e.g. covalently bound) to an amino acid of a peptide, a polypeptide, or a protein. For example, the linking site may be an amino acid residue and a glycan structure may be linked via an atom of the amino acid residue. Non-limiting examples of types of glycosylation can include N-linked glycosylation, O-linked glycosylation, C-linked glycosylation, S-linked glycosylation, and glycation.
The terms “biological sample,” “biological specimen,” or “biospecimen” as used herein, generally refer to a specimen taken by sampling so as to be representative of the source of the specimen, typically, from a subject. A biological sample can be representative of an organism as a whole, specific tissue, cell type, or category or sub-category of interest. The biological sample can include a macromolecule. The biological sample can include a small molecule. The biological sample can include a virus. The biological sample can include a cell or derivative of a cell. The biological sample can include an organelle. The biological sample can include a cell nucleus. The biological sample can include a rare cell from a population of cells. The biological sample can include any type of cell, including without limitation prokaryotic cells, eukaryotic cells, bacterial, fungal, plant, mammalian, or other animal cell type, mycoplasmas, normal tissue cells, tumor cells, or any other cell type, whether derived from single cell or multicellular organisms. The biological sample can include a constituent of a cell. The biological sample can include nucleotides (e.g. ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof. The biological sample can include a matrix (e.g., a gel or polymer matrix) comprising a cell or one or more constituents from a cell (e.g., cell bead), such as DNA, RNA, organelles, proteins, or any combination thereof, from the cell. The biological sample may be obtained from a tissue of a subject. The biological sample can include a hardened cell. Such hardened cells may or may not include a cell wall or cell membrane. The biological sample can include one or more constituents of a cell but may not include other constituents of the cell. An example of such constituents may include a nucleus or an organelle. The biological sample may include a live cell. The live cell can be capable of being cultured.
The term “denaturation,” as used herein, generally refers to the process by which a molecule loses the quaternary structure, tertiary structure, and secondary structure that is present in its native state. Non-limiting examples include proteins or nucleic acids being exposed to an external compound or environmental condition such as acid, base, temperature, pressure, radiation, etc.
The term “denatured protein,” as used herein, generally refers to a protein that has lost the quaternary structure, tertiary structure, and secondary structure that is present in its native state.
The terms “digestion” or “enzymatic digestion,” as used herein, generally refer to breaking apart a polymer (e.g. cutting a polypeptide at a cut site). Proteins may be digested in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access to cleavage sites is limited.
The term “treatment” may generally refer to any number of drugs, therapeutics, lifestyle modifications, behavioral modifications, dietary modifications, or combination thereof that can be used to treat a subject suffering from a disease condition.
The term “therapeutic” may refer generally to any drug that can be administered to a subject physically (e.g., via oral, intravenous injection, topical treatment, exposure, etc.).
The terms “immune checkpoint inhibitor,” “immune checkpoint inhibitor therapeutic,” and “immune checkpoint inhibitor drug,” as used herein, generally refer to drugs or therapeutics that can target immune checkpoint molecules (e.g. molecules on immune cells that need to be activated (or inactivated) to start an immune response). Non-limiting examples of immune checkpoint inhibitor therapeutics can include pembrolizumab, nivolumab, and ipilimumab.
The terms “glycan” or “polysaccharide” as used herein, both generally refer to a carbohydrate residue of a glycoconjugate, such as the carbohydrate portion of a glycopeptide, glycoprotein, glycolipid, or proteoglycan. Glycans can include monosaccharides.
The term “glycopeptide” or “glycopolypeptide” as used herein, generally refer to a peptide or polypeptide comprising at least one glycan residue. In various embodiments, glycopeptides comprise carbohydrate moieties (e.g. one or more glycans) covalently attached to a side chain (i.e. R group) of an amino acid residue.
The term “glycoprotein,” as used herein, generally refers to a protein having at least one glycan residue bonded thereto. In some examples, a glycoprotein is a protein with at least one oligosaccharide chain covalently bonded thereto. Examples of glycoproteins include, but are not limited to, apolipoprotein C-III (APOC3), alpha-1-antichymotrypsin (AACT), afamin (AFAM), alpha-1-acid glycoprotein 1 & 2 (AGP12), apolipoprotein B-100 (APOB), apolipoprotein D (APOD), complement C1s subcomponent (C1S), calpain-3 (CAN3), clusterin (CLUS), complement component C8 alpha chain (CO8A), alpha-2-HS-glycoprotein (FETUA), haptoglobin (HPT), immunoglobulin heavy constant gamma 1 (IgG1), immunoglobulin J chain (IgJ), plasma kallikrein (KLKB1), serum paraoxonase/arylesterase 1 (PON1), prothrombin (THRB), serotransferrin (TRFE), protein unc-13 homolog A (UN13A), and zinc-alpha-2-glycoprotein (ZA2G). A glycopeptide, as used herein, refers to a fragment of a glycoprotein, unless specified otherwise to the contrary.
The term “liquid chromatography,” as used herein, generally refers to a technique used to separate a sample into parts. Liquid chromatography can be used to separate, identify, and quantify components.
The term “mass spectrometry,” as used herein, generally refers to an analytical technique used to identify molecules. In various embodiments described herein, mass spectrometry can be involved in characterization and sequencing of proteins.
The term “peptide,” as used herein, generally refers to amino acids linked by peptide bonds. Peptides can include amino acid chains between 10 and 50 residues. Peptides can include amino acid chains shorter than 10 residues, including oligopeptides, dipeptides, tripeptides, and tetrapeptides. Peptides can include chains longer than 50 residues and may be referred to as “polypeptides” or “proteins.”
The terms “protein” or “polypeptide” or “peptide” may be used interchangeably herein and generally refer to a molecule including at least three amino acid residues. Proteins can include polymer chains made of amino acid sequences linked together by peptide bonds. Proteins may be digested in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access to cleavage sites is limited.
The term “peptide structure,” as used herein, generally refers to peptides or a portion thereof or glycopeptides or a portion thereof. In various embodiments described herein, a peptide structure can include any molecule comprising at least two amino acids in sequence.
The term “reduction,” as used herein, generally refers to the gain of an electron by a substance. In various embodiments described herein, a sugar can directly bind to a protein, thereby, reducing the amino acid to which it binds. Such reducing reactions can occur in glycosylation. In various embodiments, reduction may be used to break disulfide bonds between two cysteines.
The term “sample,” as used herein, generally refers to a sample from a subject of interest and may include a biological sample of a subject. The sample may include a cell sample. The sample may include a cell line or cell culture sample. The sample can include one or more cells. The sample can include one or more microbes. The sample may include a nucleic acid sample or protein sample. The sample may also include a carbohydrate sample or a lipid sample. The sample may be derived from another sample. The sample may include a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate. The sample may include a fluid sample, such as a blood sample, urine sample, or saliva sample. The sample may include a skin sample. The sample may include a cheek swab. The sample may include a plasma or serum sample. The sample may include a cell-free or cell free sample. A cell-free sample may include extracellular polynucleotides. The sample may originate from blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, or tears. The sample may originate from red blood cells or white blood cells. The sample may originate from feces, spinal fluid, CNS fluid, gastric fluid, amniotic fluid, cyst fluid, peritoneal fluid, marrow, bile, other body fluids, tissue obtained from a biopsy, skin, or hair.
The term “sequence,” as used herein, generally refers to a biological sequence including one-dimensional monomers that can be assembled to generate a polymer. Non-limiting examples of sequences include nucleotide sequences (e.g. ssDNA, dsDNA, and RNA), amino acid sequences (e.g. proteins, peptides, and polypeptides), and carbohydrates (e.g. compounds including Cm(H2O)n).
The term “subject,” as used herein, generally refers to an animal, such as a mammal (e.g., human) or avian (e.g., bird), or other organism, such as a plant. For example, the subject can include a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian or a human. Animals may include, but are not limited to, farm animals, sport animals, and pets. A subject can include a healthy or asymptomatic individual, an individual that has or is suspected of having a disease (e.g., cancer) or a pre-disposition to the disease, and/or an individual that is in need of therapy or suspected of needing therapy. A subject can be a patient. A subject can include a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses).
As used herein, a “model” may include one or more algorithms, one or more functions, one or more equations, one or more statistical tests, one or more mathematical techniques, one or more machine-learning algorithms, or a combination thereof.
As used herein, “abundance,” may refer to a quantitative value generated using mass spectrometry. The quantitative value may relate to the amount of a particular peptide structure. In one or more embodiments, the quantitative value may include an amount of an ion produced using mass spectrometry. The quantitative value may be expressed as an m/z value, in atomic mass units, or in some other manner.
As used herein, “relative abundance,” may refer to a comparison of two or more abundances. In one or more embodiments, the comparison may include comparing one peptide structure to a total number of a set of peptide structures (e.g., the total number of all peptide structures). In some embodiments, the comparison may include comparing one peptide glycoform (e.g., two identical peptides differing by one or more glycans) to a set of peptide glycoforms. In one or more embodiments, the comparison may include comparing a number of ions having a particular m/z ratio versus a total number of ions detected. In one or more embodiments, a relative abundance can be expressed as a ratio, as a percentage, or in some other manner.
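As a non-limiting illustration of the ratio form of relative abundance described above, the following sketch divides each abundance by the total over a set of peptide structures. The function name and the glycoform labels in the usage note are hypothetical and chosen only for this example.

```python
def relative_abundance(abundances):
    """Return each peptide structure's share of the summed abundance."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {name: value / total for name, value in abundances.items()}


if __name__ == "__main__":
    # Two glycoforms of the same peptide with made-up abundance values.
    shares = relative_abundance({"glycoform_1": 30.0, "glycoform_2": 10.0})
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
```

With the made-up values shown, the first glycoform accounts for three quarters of the total and the second for one quarter. Multiplying each ratio by 100 gives the percentage form.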
The terms “determining”, “measuring”, “evaluating”, “assessing,” “assaying,” and “analyzing” are often used interchangeably herein to refer to forms of measurement, and include determining if an element is present or not (for example, detection). These terms can include quantitative, qualitative or quantitative and qualitative determinations. Assessing is alternatively relative or absolute. “Detecting the presence of” includes determining the amount of something present, as well as determining whether it is present or absent.
The terms “subject,” “individual,” or “patient” are often used interchangeably herein. A “subject” can be a biological entity containing expressed genetic materials. The biological entity can be a plant, animal, or microorganism, including, for example, bacteria, viruses, fungi, and protozoa. The subject can be tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro. The subject can be a mammal. The mammal can be a human. In some embodiments, the mammal is a mouse, rat, simian, canine, feline, bovine, equine, or ovine. The subject may be diagnosed or suspected of being at high risk for a disease. The disease can be cancer. In some cases, the subject is not necessarily diagnosed or suspected of being at high risk for the disease or the condition.
As used herein, the terms “cancer” and “cancerous” refer to or describe the physiological condition in a subject that is typically characterized by unregulated cell growth. Examples of cancer include, but are not limited to, melanoma, carcinoma, lymphoma, blastoma, sarcoma, and leukemia and metastases thereof. The term “metastasis” refers to the transference of disease-producing organisms or of malignant or cancerous cells to other parts of the body by way of the blood or lymphatic vessels or membranous surfaces. Non-limiting examples of such cancers include small-cell lung cancer, non-small cell lung cancer, adenocarcinoma of the lung, squamous carcinoma of the lung, melanoma, squamous cell cancer, cancer of the peritoneum, hepatocellular cancer, gastrointestinal cancer, pancreatic cancer, glioblastoma, cervical cancer, ovarian cancer, liver cancer, bladder cancer, hepatoma, breast cancer, colon cancer, colorectal cancer, endometrial or uterine carcinoma, salivary gland carcinoma, kidney cancer, liver cancer, prostate cancer, thyroid cancer, hepatic carcinoma and various types of head and neck cancer.
As used herein, the phrase “stage of disease” refers to the stages of cancer progression referred to as Stage I, II, III, or IV. Stage of disease indicates if metastasis has occurred in the subject.
As used herein, the terms “treatment” or “treating” are used in reference to a pharmaceutical or other intervention regimen for obtaining beneficial or desired results in the recipient. Beneficial or desired results include but are not limited to a therapeutic benefit and/or a prophylactic benefit. A therapeutic benefit may refer to eradication or amelioration of symptoms or of an underlying disorder being treated. Also, a therapeutic benefit can be achieved with the eradication or amelioration of one or more of the physiological symptoms associated with the underlying disorder such that an improvement is observed in the subject, notwithstanding that the subject may still be afflicted with the underlying disorder. A prophylactic effect includes delaying, preventing, or eliminating the appearance of a disease or condition, delaying or eliminating the onset of symptoms of a disease or condition, slowing, halting, or reversing the progression of a disease or condition, or any combination thereof. For prophylactic benefit, a subject at risk of developing a particular disease, or a subject reporting one or more of the physiological symptoms of a disease, may undergo treatment, even though a diagnosis of this disease may not have been made.
The terms “protein” or “polypeptide” or “peptide” may be used interchangeably herein and refer to a molecule comprising at least three amino acid residues. As used herein, the term “protein” or “polypeptide” or “peptide” includes glycopeptides unless stated otherwise.
The term “polysaccharide” is used to describe any polymer made up of subunit monosaccharides, oligomers, or modified monosaccharides. In some embodiments, the polymer may be a homopolymer or a heteropolymer. The linkages between the subunits may include but are not limited to acetal linkages, such as glycosidic bonds; ester linkages such as phosphodiester linkages; amide linkages; and ether linkages.
The term “glycan” is used to describe a carbohydrate residue of a glycoconjugate, such as the carbohydrate portion of a glycopeptide, glycoprotein, glycolipid or proteoglycan. Glycan structures may be described by a glycan reference code number.
As used herein, the term “glycoform” refers to a unique primary, secondary, tertiary and quaternary structure of a protein with an attached glycan of a specific structure.
As used herein, the term “glycopeptide” or “glycopolypeptide” refers to a polypeptide having at least one glycan residue bonded thereto.
As used herein, the phrase “glycosylated peptides” or “glycosylated polypeptides” refers to a polypeptide bonded to a glycan residue.
As used herein, the term “glycoprotein,” refers to a protein having at least one glycan residue bonded thereto. In some examples, a glycoprotein is a protein with at least one oligosaccharide chain covalently bonded thereto. Examples of glycoproteins include, but are not limited to, apolipoprotein C-III (APOC3), alpha-1-antichymotrypsin (AACT), afamin (AFAM), alpha-1-acid glycoprotein 1 & 2 (AGP12), apolipoprotein B-100 (APOB), apolipoprotein D (APOD), complement C1s subcomponent (C1S), calpain-3 (CAN3), clusterin (CLUS), complement component C8 alpha chain (CO8A), alpha-2-HS-glycoprotein (FETUA), haptoglobin (HPT), immunoglobulin heavy constant gamma 1 (IgG1), immunoglobulin J chain (IgJ), plasma kallikrein (KLKB1), serum paraoxonase/arylesterase 1 (PON1), prothrombin (THRB), serotransferrin (TRFE), protein unc-13 homolog A (UN13A), and zinc-alpha-2-glycoprotein (ZA2G). A glycopeptide, as used herein, refers to a fragment of a glycoprotein, unless specified otherwise to the contrary.
As used herein, the phrases “glycopeptide fragment,” “glycosylated peptide fragment,” “glycopolypeptide fragment”, and “glycosylated polypeptide fragment” refer to a glycosylated polypeptide or glycopeptide having an amino acid sequence that is the same as part (but not all) of the amino acid sequence of the glycosylated protein from which the glycosylated peptide is obtained by digestion, e.g., with one or more protease(s) or by fragmentation, e.g., ion fragmentation within an MRM-MS instrument. MRM refers to multiple-reaction-monitoring. Unless specified otherwise, “glycopeptide fragments” or “fragments of a glycopeptide” refer to the fragments produced directly by using a mass spectrometer optionally after the glycoprotein has been digested enzymatically to produce the glycopeptides.
As used herein, the phrase “multiple reaction monitoring mass spectrometry (MRM-MS),” refers to a highly sensitive and selective method for the targeted quantification of glycans and peptides in biological samples. Unlike traditional mass spectrometry, MRM-MS is highly selective (targeted), allowing researchers to fine tune an instrument to specifically look for certain peptide fragments of interest. MRM allows for greater sensitivity, specificity, speed and quantitation of peptide fragments of interest, such as a potential biomarker. MRM-MS involves using one or more of a triple quadrupole (QQQ) mass spectrometer and a quadrupole time-of-flight (qTOF) mass spectrometer.
As used herein, the phrase “digesting a glycopeptide,” refers to a biological process that employs enzymes to break specific amino acid peptide bonds. For example, digesting a glycopeptide includes contacting a glycopeptide with a digesting enzyme, e.g., trypsin, to produce fragments of the glycopeptide. In some examples, a protease enzyme is used to digest a glycopeptide. The term “protease” refers to an enzyme that performs proteolysis or breakdown of large peptides into smaller polypeptides or individual amino acids. Examples of a protease include, but are not limited to, one or more of a serine protease, threonine protease, cysteine protease, aspartate protease, glutamic acid protease, metalloprotease, asparagine peptide lyase, and any combinations of the foregoing.
As used herein, the phrase “fragmenting a glycopeptide,” refers to the ion fragmentation process which occurs in an MRM-MS instrument. Fragmenting may produce various fragments having the same mass but varying with respect to their charge.
As used herein, the phrase “multiple-reaction-monitoring (MRM) transition,” refers to the mass to charge (m/z) peaks or signals observed when a glycopeptide, or a fragment thereof, is detected by MRM-MS. The MRM transition is detected as the transition of the precursor and product ion.
As used herein, the phrase “detecting a multiple-reaction-monitoring (MRM) transition,” refers to the process in which a mass spectrometer analyzes a sample using tandem mass spectrometer ion fragmentation methods and identifies the mass to charge ratio for ion fragments in a sample. The absolute values of these identified mass to charge ratios are referred to as transitions. In the context of the methods set forth herein, the mass to charge ratio transitions are the values indicative of glycan, peptide or glycopeptide ion fragments. For some glycopeptides set forth herein, there is a single transition peak or signal. For some other glycopeptides set forth herein, there is more than one transition peak or signal. Background information on MRM mass spectrometry can be found in Introduction to Mass Spectrometry: Instrumentation, Applications, and Strategies for Data Interpretation, 4th Edition, J. Throck Watson, O. David Sparkman, ISBN: 978-0-470-51634-8, November 2007, the entire contents of which are herein incorporated by reference in their entirety for all purposes.
As used herein, the phrase “detecting a multiple-reaction-monitoring (MRM) transition indicative of a glycopeptide,” refers to an MS process in which an MRM-MS transition is detected and then compared to a calculated mass to charge ratio (m/z) of a glycopeptide, or fragment thereof, in order to identify the glycopeptide. In some examples herein, a single transition may be indicative of two or more glycopeptides, if those glycopeptides have identical MRM-MS fragmentation patterns. A transition peak or signal includes, but is not limited to, those transitions set forth herein which are associated with a glycopeptide consisting essentially of an amino acid sequence selected from SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof, according to Tables 1-5. A transition peak or signal includes, but is not limited to, those transitions set forth herein which are associated with a glycopeptide consisting of an amino acid sequence selected from SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof, according to Tables 1-5.
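As a non-limiting illustration, comparing a detected transition to calculated m/z values may be sketched as follows. The sketch assumes that ions are formed by adding protons to a neutral molecule and that a match means agreement within a fixed m/z tolerance. The tolerance, the Transition type, the function names, and the library entries (GP-1 through GP-3) are hypothetical and do not correspond to any sequence or table herein.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da


@dataclass(frozen=True)
class Transition:
    """A precursor ion / product ion pair, each given as an m/z value."""
    precursor_mz: float
    product_mz: float


def mz_from_mass(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def matches(observed, calculated, tol=0.5):
    """True when precursor and product m/z both agree within `tol`.

    Assumption: `tol` is an absolute tolerance in m/z units.
    """
    return (abs(observed.precursor_mz - calculated.precursor_mz) <= tol
            and abs(observed.product_mz - calculated.product_mz) <= tol)


def identify(observed, library, tol=0.5):
    """Return the names of all library entries whose transition matches."""
    return [name for name, t in library.items() if matches(observed, t, tol)]


if __name__ == "__main__":
    # Hypothetical library of calculated transitions (placeholder values).
    library = {
        "GP-1": Transition(1000.5, 366.1),
        "GP-2": Transition(1000.6, 366.1),
        "GP-3": Transition(1200.0, 204.1),
    }
    print(identify(Transition(1000.55, 366.1), library, tol=0.1))
```

The identify function returns a list rather than a single name because, as noted above, a single transition may be indicative of two or more glycopeptides.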
As used herein, the term “reference value” refers to a value obtained from a population of individual(s) whose disease state is known. The reference value may be in n-dimensional feature space and may be defined by a maximum-margin hyperplane. A reference value can be determined for any particular population, subpopulation, or group of individuals according to standard methods well known to those of skill in the art.
As used herein, the term “population of individuals” means one or more individuals. In one embodiment, the population of individuals consists of one individual. In one embodiment, the population of individuals comprises multiple individuals. As used herein, the term “multiple” means at least 2 (such as at least 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, or 30) individuals. In one embodiment, the population of individuals comprises at least 10 individuals.
Glycans are referenced herein using the Symbol Nomenclature for Glycans (SNFG) for illustrating glycans. An explanation of this illustration system is available on the internet at www.ncbi.nlm.nih.gov/glycans/snfg.html, the entire contents of which are herein incorporated by reference in their entirety for all purposes. Symbol Nomenclature for Graphical Representation of Glycans as published in Glycobiology 25: 1323-1324, 2015. Additional information showing illustrations of the SNFG system are. Within this system, the term Hex_i is interpreted as follows: i indicates the number of green circles (mannose) and the number of yellow circles (galactose). The term HexNAc_j uses j to indicate the number of blue squares (GlcNAc's). The term Fuc_d uses d to indicate the number of red triangles (fucose). The term Neu5Ac_l uses l to indicate the number of purple diamonds (sialic acid). The glycan reference codes used herein combine these i, j, d, and l terms to make a composite 4-5 number glycan reference code, e.g., 5300 or 5320. See, for example, FIGS. 1 through 14 of PCT Patent Application No. PCT/US2020/0162861, filed Jan. 31, 2020, which are herein incorporated by reference in their entirety for all purposes.
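As a non-limiting illustration of the composite glycan reference code, the following sketch builds a code from the four counts and splits a four-digit code back into its counts. It assumes the digits appear in the order i, j, d, l. Five-digit codes are not decoded here because, without further information, it is ambiguous which count has two digits. The function names are hypothetical.

```python
def encode_glycan(hex_i, hexnac_j, fuc_d, neu5ac_l):
    """Concatenate the four counts into a composite reference code string."""
    return f"{hex_i}{hexnac_j}{fuc_d}{neu5ac_l}"


def decode_glycan(code):
    """Split a 4-digit code into Hex, HexNAc, Fuc, and Neu5Ac counts.

    Assumption: digits are ordered i, j, d, l. Codes of other lengths are
    rejected because their digit boundaries cannot be inferred reliably.
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError("only 4-digit codes are handled in this sketch")
    names = ("Hex", "HexNAc", "Fuc", "Neu5Ac")
    return dict(zip(names, map(int, code)))


if __name__ == "__main__":
    print(decode_glycan("5300"))
    print(encode_glycan(5, 3, 2, 0))
```

Under the stated ordering assumption, the example code 5300 corresponds to five Hex and three HexNAc residues with no Fuc or Neu5Ac.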
The term “in vivo” is used to describe an event that takes place in a subject's body.
The term “ex vivo” is used to describe an event that takes place outside of a subject's body. An “ex vivo” assay is not performed on a subject. Rather, it is performed upon a sample separate from a subject. An example of an “ex vivo” assay performed on a sample is an “in vitro” assay.
The term “in vitro” is used to describe an event that takes place in a container for holding laboratory reagent such that it is separated from the living biological source organism from which the material is obtained. In vitro assays can encompass cell-based assays in which cells alive or dead are employed. In vitro assays can also encompass a cell-free assay in which no intact cells are employed.
As used herein, the term ‘about’ a number refers to that number plus or minus 10% of that number. The term ‘about’ a range refers to that range minus 10% of its lowest value and plus 10% of its greatest value.
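As a non-limiting illustration, the bounds implied by this definition of ‘about’ may be computed as sketched below. The function names are hypothetical, and the use of the absolute value for negative numbers is an assumption not addressed by the definition.

```python
def about(value, fraction=0.10):
    """Bounds for 'about' a number: the number plus or minus 10% of itself."""
    delta = abs(value) * fraction  # abs() for negative inputs is an assumption
    return value - delta, value + delta


def about_range(low, high, fraction=0.10):
    """Bounds for 'about' a range: low minus 10% of low, high plus 10% of high."""
    return low - abs(low) * fraction, high + abs(high) * fraction


if __name__ == "__main__":
    print(about(50))
    print(about_range(10, 20))
```

For example, about 50 spans approximately 45 to 55, and about a range of 10 to 20 spans approximately 9 to 22.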
Sample collection 102 may include, for example, obtaining a biological sample 112 of one or more subjects, such as subject 114. Biological sample 112 may take the form of a specimen obtained via one or more sampling methods. Biological sample 112 may be representative of subject 114 as a whole or of a specific tissue, cell type, or other category or sub-category of interest. Biological sample 112 may be obtained in any of a number of different ways. In various embodiments, biological sample 112 includes whole blood sample 116 obtained via a blood draw. In other embodiments, biological sample 112 includes set of aliquoted samples 118 that includes, for example, a serum sample, a plasma sample, a blood cell (e.g., white blood cell (WBC), red blood cell (RBC)) sample, another type of sample, or a combination thereof. Biological sample 112 may include nucleotides (e.g., ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof.
Sample intake 104 may include one or more various operations such as, for example, aliquoting, registering, processing, storing, thawing, and/or other types of operations. In one or more embodiments, when biological sample 112 includes whole blood sample 116, sample intake 104 includes aliquoting whole blood sample 116 to form a set of aliquoted samples that can then be sub-aliquoted to form set of samples 120.
Sample preparation and processing 106 may include, for example, one or more operations to form set of peptide structures 122. In various embodiments, set of peptide structures 122 may include various fragments of unfolded proteins that have undergone digestion and may be ready for analysis.
Further, sample preparation and processing 106 may include, for example, data acquisition 124 based on set of peptide structures 122. For example, data acquisition 124 may include use of, without limitation, a liquid chromatography/mass spectrometry (LC/MS) system.
Data analysis 108 may include, for example, peptide structure analysis 126. In some embodiments, data analysis 108 also includes output generation 110. In other embodiments, output generation 110 may be considered a separate operation from data analysis 108. Output generation 110 may include, for example, generating final output 128 based on the results of peptide structure analysis 126. Final output 128 may be used for the research and/or treatment of a disease, such as, for example, melanoma.
In various embodiments, final output 128 is comprised of one or more outputs. Final output 128 may take various forms. For example, final output 128 may be a report that includes, for example, a treatment output (e.g., a treatment design output, a treatment plan output, or combination thereof). In some embodiments, final output 128 may be an alert (e.g., a visual alert, an audible alert, etc.), a notification (e.g., a visual notification, an audible notification, an email notification, etc.), an email output, or a combination thereof. In some embodiments, final output 128 may be sent to remote system 130 for processing. Remote system 130 may include, for example, a computer system, a server, a processor, a cloud computing platform, cloud storage, a laptop, a tablet, a smartphone, some other type of mobile computing device, or a combination thereof.
In other embodiments, workflow 100 may optionally exclude one or more of the operations described herein and/or may optionally include one or more other steps or operations other than those described herein (e.g., in addition to and/or instead of those described herein). For example, in one or more embodiments, final output 128 may not be sent to remote system 130 for processing. Instead, a notification or a communication (e.g., email) may be sent to remote system 130 to notify a user(s) or entity that final output 128 is available for retrieval (e.g., download). Accordingly, workflow 100 may be implemented in any of a number of different ways for use in the research and/or treatment of melanoma.
I.A. Sample Preparation and Processing
In general, polymers, such as proteins, in their native form, can fold to include secondary, tertiary, and/or other higher order structures. Such higher order structures may functionalize proteins to complete tasks (e.g., enable enzymatic activity) in a subject. Further, such higher order structures of polymers may be maintained via various interactions between side chains of amino acids within the polymers. Such interactions can include ionic bonding, hydrophobic interactions, hydrogen bonding, and disulfide linkages between cysteine residues. However, when using analytic systems and methods, including mass spectrometry, unfolding such polymers (e.g., peptide/protein molecules) may be desired to obtain sequence information. In some embodiments, unfolding a polymer may include denaturing the polymer, which may include, for example, linearizing the polymer.
In one or more embodiments, denaturation and reduction 202 can be used to disrupt higher order structures (e.g., secondary, tertiary, quaternary, etc.) of one or more proteins (e.g., polypeptides and peptides) in a sample (e.g., one of set of samples 120 in
In one or more embodiments, the denaturation procedure may include using one or more denaturing agents in combination with heat. These one or more denaturing agents may include, for example, but are not limited to, any number of chaotropic salts (e.g., urea, guanidine), surfactants (e.g., sodium dodecyl sulfate (SDS), beta octyl glucoside, Triton X-100), or a combination thereof. In some cases, such denaturing agents may be used in combination with heat when the sample preparation workflow further includes a cleanup procedure.
The resulting one or more denatured (e.g., unfolded, linearized) proteins may then undergo further processing in preparation for analysis. For example, a reduction procedure may be performed in which one or more reducing agents are applied. A reducing agent may take the form of, for example, without limitation, dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or some other reducing agent. The reducing agent may reduce (e.g., cleave) the disulfide linkages between cysteine residues of the one or more denatured proteins to form one or more reduced proteins.
In various embodiments, the one or more reduced proteins resulting from denaturation and reduction 202 may undergo a process to prevent the reformation of disulfide linkages between, for example, the cysteine residues of the one or more reduced proteins. This process may be implemented using alkylation 204 to form one or more alkylated proteins. For example, alkylation 204 may be used to add an acetamide group to a sulfur on each cysteine residue to prevent disulfide linkages from reforming. In various embodiments, an acetamide group can be added by reacting one or more alkylating agents with a reduced protein. The one or more alkylating agents may include, for example, one or more acetamide salts. An alkylating agent may take the form of, for example, iodoacetamide (IAA), 2-chloroacetamide, some other type of acetamide salt, or some other type of alkylating agent.
In some embodiments, alkylation 204 may include a quenching procedure. The quenching procedure may be performed using one or more reducing agents (e.g., one or more of the reducing agents described above).
In various embodiments, the one or more alkylated proteins formed via alkylation 204 can then undergo digestion 206 in preparation for analysis (e.g., mass spectrometry analysis). Digestion 206 of a protein may include cleaving the protein at or around one or more cleavage sites (e.g., site 205, which may be one or more amino acid residues). For example, without limitation, an alkylated protein may be cleaved at the carboxyl side of the lysine or arginine residues. This type of cleavage may break the protein into various segments, which include one or more peptide structures (e.g., glycosylated or aglycosylated).
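By way of a non-limiting illustration, cleavage at the carboxyl side of lysine (K) and arginine (R) residues can be simulated in software. The following Python sketch is a simplification: the function name is hypothetical, the input is a one-letter amino acid string, and missed cleavages and any residue-specific exceptions to cleavage are not modeled.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every lysine (K) or arginine (R)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            # Cleave on the carboxyl side: the K or R stays at the
            # end of the peptide that precedes the cut.
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Remaining C-terminal segment with no trailing K or R.
        peptides.append(sequence[start:])
    return peptides
```

For example, the hypothetical sequence `"AAKGGRCC"` would yield the segments `AAK`, `GGR`, and `CC`.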
In various embodiments, digestion 206 is performed using one or more proteolysis catalysts. For example, an enzyme can be used in digestion 206. In some embodiments, the enzyme takes the form of trypsin. In other embodiments, one or more other types of enzymes (e.g., proteases) may be used in addition to or in place of trypsin. These one or more other enzymes include, but are not limited to, LysC, LysN, AspN, GluC, and ArgC. In some embodiments, digestion 206 may be performed using tosyl phenylalanyl chloromethyl ketone (TPCK)-treated trypsin, one or more engineered forms of trypsin, one or more other formulations of trypsin, or a combination thereof. In some embodiments, digestion 206 may be performed in multiple steps, with each involving the use of one or more digestion agents. For example, a secondary digestion, tertiary digestion, etc. may be performed. In one or more embodiments, trypsin is used to digest serum samples. In one or more embodiments, trypsin/LysC cocktails are used to digest plasma samples.
In one or more embodiments, digestion 206 further includes a quenching procedure. The quenching procedure may be performed by acidifying the sample (e.g., to a pH<3). In one or more embodiments, formic acid may be used to perform this acidification.
In various embodiments, preparation workflow 200 further includes post-digestion procedure 207. Post-digestion procedure 207 may include, for example, a cleanup procedure. The cleanup procedure may include, for example, the removal of unwanted components in the sample that results from digestion 206. For example, unwanted components may include, but are not limited to, inorganic ions, surfactants, etc. In some embodiments, post-digestion procedure 207 further includes a procedure for the addition of heavy-labeled peptide internal standards.
Although preparation workflow 200 has been described with respect to a sample created or taken from biological sample 112 that is blood-based (e.g., a whole blood sample, a plasma sample, a serum sample, etc.), preparation workflow 200 may be similarly implemented for other types of samples (e.g., tears, urine, tissue, interstitial fluids, sputum, etc.) to produce set of peptide structures 122.
I.B. Peptide Structure Identification and Quantitation
In various embodiments, targeted quantification 208 of peptides and glycopeptides can incorporate use of liquid chromatography-mass spectrometry (LC/MS) instrumentation. For example, LC-MS/MS, or tandem MS, may be used. In general, LC/MS (e.g., LC-MS/MS) can combine the physical separation capabilities of liquid chromatography (LC) with the mass analysis capabilities of mass spectrometry (MS). According to some embodiments described herein, this technique allows the digested peptides to be separated and then fed from the LC column into the MS ion source through an interface.
In various embodiments, any LC/MS device can be incorporated into the workflow described herein. In various embodiments, a Triple Quadrupole LC/MS™ is an example of an instrument suited for identification and targeted quantification 208. In various embodiments, targeted quantification 208 is performed using multiple reaction monitoring mass spectrometry (MRM-MS).
In various embodiments described herein, identification of a particular protein or peptide and an associated quantity can be assessed. In various embodiments described herein, identification of a particular glycan and an associated quantity can be assessed. In various embodiments described herein, particular glycans can be matched to a glycosylation site on a protein or peptide and their absolute or relative quantities assessed.
In some cases, targeted quantification 208 includes using a specific collision energy associated with the appropriate fragmentation to consistently see an abundant product ion. Glycopeptide structures may have a lower collision energy than aglycosylated peptide structures. When analyzing a sample that includes glycopeptide structures, the source voltage and gas temperature may be lowered as compared to generic proteomic analysis.
In various embodiments, quality control 210 procedures can be put in place to optimize data quality. In various embodiments, measures can be put in place that allow only deviations from an expected value that fall within acceptable ranges. In various embodiments, employing statistical models (e.g., using Westgard rules) can assist in quality control 210. For example, quality control 210 may include assessing the retention time and abundance of representative peptide structures (e.g., glycosylated and/or aglycosylated) and spiked-in internal standards, in either every sample, or in each quality control sample (e.g., pooled serum digest).
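As one hedged illustration of how Westgard rules might be applied in quality control 210, the Python sketch below checks two commonly used rules against a series of quality control measurements (e.g., the retention time of an internal standard across runs). The function name, the choice of the 1_3s and 2_2s rules, and the assumption that an expected mean and standard deviation are available from prior runs are all assumptions of this sketch rather than requirements of the workflow.

```python
def westgard_flags(values, mean, sd):
    """Return the set of Westgard rule violations found in a QC series."""
    z = [(v - mean) / sd for v in values]  # deviation in SD units
    flags = set()
    # 1_3s: any single measurement more than 3 SD from the mean.
    if any(abs(s) > 3 for s in z):
        flags.add("1_3s")
    # 2_2s: two consecutive measurements more than 2 SD from the
    # mean on the same side.
    for prev, curr in zip(z, z[1:]):
        if (prev > 2 and curr > 2) or (prev < -2 and curr < -2):
            flags.add("2_2s")
    return flags
```

An empty result indicates that neither rule was violated; a non-empty result could be used to hold a batch for review.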
Peak integration and normalization 212 may be performed to process the data that has been generated and transform the data into a format for analysis. For example, peak integration and normalization 212 may include converting abundance data for various product ions that were detected for a selected peptide structure into a single quantification metric (e.g., a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, a normalized concentration, etc.) for that peptide structure. In some embodiments, peak integration and normalization 212 may be performed using one or more of the techniques described in U.S. Patent Publication No. 2020/0372973A1 and/or US Patent Publication No. 2020/0240996, the disclosures of which are incorporated by reference herein in their entireties.
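For illustration, one simple way to collapse product ion abundances into a single quantification metric is sketched below in Python. This sketch assumes, hypothetically, that the integrated peak areas of the product ions are summed and divided by the summed areas of a spiked-in heavy-labeled internal standard; it does not reproduce the techniques of the publications incorporated by reference above.

```python
def relative_quantity(analyte_areas, standard_areas):
    """Normalize summed product ion areas to an internal standard."""
    total_standard = sum(standard_areas)
    if total_standard <= 0:
        raise ValueError("internal standard signal must be positive")
    # Sum the integrated peak areas of all product ions detected for
    # the peptide structure, then scale by the internal standard.
    return sum(analyte_areas) / total_standard
```

With hypothetical product ion areas of 1200 and 800 and an internal standard area of 4000, this sketch returns a relative quantity of 0.5.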
II.A. Exemplary System
II.A.1. System for Analyzing Peptide Structure Data and Managing Melanoma Treatment
Analysis system 300 may include computing platform 302 and data store 304. In some embodiments, analysis system 300 also includes display system 306. Computing platform 302 may take various forms. In one or more embodiments, computing platform 302 includes a single computer (or computer system) or multiple computers in communication with each other. In other examples, computing platform 302 takes the form of a cloud computing platform. In still other examples, computing platform 302 may include any number or combination of computers, cloud computing platforms, servers, or mobile devices.
Data store 304 and display system 306 may each be in communication with computing platform 302. In some examples, data store 304, display system 306, or both may be considered part of or otherwise integrated with computing platform 302. Thus, in some examples, computing platform 302, data store 304, and display system 306 may be separate components in communication with each other, but in other examples, some combination of these components may be integrated together. Communication between these different components may be implemented using any number of wired communications links, wireless communications links, optical communications links, or a combination thereof.
Analysis system 300 includes, for example, treatment management system 308, which may be implemented using hardware, software, firmware, or a combination thereof. In one or more embodiments, treatment management system 308 is implemented using computing platform 302.
Treatment management system 308 may be used to manage the treatment of a subject diagnosed with a melanoma condition (i.e., malignant melanoma). Treatment management system 308 may be used to predict the subject's response to one or more treatments for the melanoma condition, select a treatment to be administered to the subject to prevent the progression (or advancement) of the melanoma condition and/or otherwise improve the condition of the subject, and/or otherwise plan the treatment of the subject.
Treatment management system 308 receives peptide structure data 310 for processing. Peptide structure data 310 may have been generated using multiple reaction monitoring mass spectrometry. Peptide structure data 310 may be, for example, the peptide structure data that is output from sample preparation and processing 106 in
Peptide structure data 310 can be sent as input into treatment management system 308, retrieved from data store 304 or some other type of storage (e.g., cloud storage), accessed from cloud storage, or obtained in some other manner. In some cases, peptide structure data 310 may be retrieved from data store 304 in response to (e.g., directly or indirectly based on) receiving user input entered by a user via an input device.
Treatment management system 308 may include scoring system 312. In one or more embodiments, treatment management system 308 further includes treatment planning system 314. Scoring system 312 may be used to predict the response of a subject (e.g., subject 114) to one or more types of treatment. Treatment planning system 314 may be used to plan how to treat the subject based on the predicted response(s) for the subject.
Scoring system 312 may include, for example, model system 315 that is configured to receive peptide structure data 310 for processing. Model system 315 may be implemented in any of a number of different ways. Model system 315 may be a computational model system that may be implemented using any number of models, functions, equations, algorithms, and/or other mathematical techniques.
In one or more embodiments, scoring system 312 receives peptide structure data 310 for processing and inputs quantification data 316 identified from peptide structure data 310 for set of peptide structures 318 into model system 315. Model system 315 analyzes quantification data 316 to generate set of treatment scores 320 corresponding to a set of treatments. Peptide structure data 310 may comprise a set of quantification metrics for each peptide structure of, for example, set of peptide structures 122 in
A peptide structure of set of peptide structures 318 may be a glycosylated peptide structure, or glycopeptide structure, that is defined by a peptide sequence and a glycan structure attached to a linking site of the peptide sequence. For example, the peptide structure may be a glycopeptide or a portion of a glycopeptide. Alternatively, a peptide structure of set of peptide structures 318 may be an aglycosylated peptide structure that is defined by a peptide sequence. For example, the peptide structure may be a peptide or a portion of a peptide and may be referred to as a quantification peptide.
Set of peptide structures 318 may be identified as being those most predictive or relevant to the response of a subject to a corresponding treatment(s) based on training of model system 315. In one or more embodiments, set of peptide structures 318 includes at least one, at least three, at least five, or at least some other number of the peptide structures identified in Table 1 below in Section V.B. The number of peptide structures selected from Table 1 for inclusion in set of peptide structures 318 may be based on, for example, a desired level of accuracy, the number of treatments for which set of treatment scores 320 are being generated, one or more other factors, or a combination thereof.
In one or more embodiments, model system 315 may be used to analyze the response of a subject to a pembrolizumab treatment (“pembro”), the response of the subject to a combination treatment comprised of the combination of nivolumab and ipilimumab (“ipi/nivo”), or both. Both pembro and ipi/nivo are treatments used to treat melanoma. For example, model system 315 may use quantification data 316 for set of peptide structures 318 to generate set of treatment scores 320 that includes a first treatment score 322 for pembro and a second treatment score 324 for ipi/nivo. In one or more embodiments, set of peptide structures 318 may include first subset 321 of set of peptide structures 318 used to compute first treatment score 322 and second subset 323 of set of peptide structures 318 used to compute second treatment score 324. In one or more embodiments, first subset 321 and second subset 323 of set of peptide structures 318 may partially overlap (e.g., have one, two, three, four, five, or some other number of peptide structures in common).
First portion 326 of quantification data 316 used to compute first treatment score 322 may correspond to first subset 321. Second portion 328 of quantification data 316 used to compute second treatment score 324 may correspond to second subset 323. First portion 326 and second portion 328 may be referred to as first quantification data and second quantification data, respectively. When first subset 321 and second subset 323 partially overlap, first portion 326 and second portion 328 similarly overlap. As one example, first portion 326 of quantification data 316 corresponding to first subset 321 used to compute first treatment score 322 and second portion 328 of quantification data 316 corresponding to second subset 323 of set of peptide structures 318 used to compute second treatment score 324 may have two peptide structures in common.
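The relationship between the quantification data, the two subsets, and the two treatment scores can be sketched in Python as follows. This is a minimal illustration only: the logistic form of the model, the peptide structure identifiers (`PS-A` through `PS-D`), the quantities, and the coefficients are all hypothetical and are not taken from the tables referenced herein.

```python
import math


def treatment_score(quantities, weights, intercept=0.0):
    """Logistic score in (0, 1) computed from one subset of peptide structures."""
    # Only the peptide structures named in `weights` (the subset)
    # contribute to the score.
    linear = intercept + sum(weights[name] * quantities[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical quantification data for four peptide structures.
quantification_data = {"PS-A": 0.8, "PS-B": 1.5, "PS-C": 0.2, "PS-D": 1.1}

# Hypothetical coefficients; the subsets share PS-B and PS-C.
first_weights = {"PS-A": 1.2, "PS-B": -0.4, "PS-C": 0.9}   # first subset (pembro)
second_weights = {"PS-B": 0.7, "PS-C": -1.1, "PS-D": 0.3}  # second subset (ipi/nivo)

treatment_scores = {
    "pembro": treatment_score(quantification_data, first_weights),
    "ipi/nivo": treatment_score(quantification_data, second_weights),
}
```

In this sketch the two subsets partially overlap, so the two portions of the quantification data have two peptide structures in common, consistent with the example above.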
In one or more embodiments, first subset 321 of set of peptide structures 318 includes at least one, at least three, at least five, or at least some other number of the peptide structures identified in Table 2 below in Section V.B. In one or more embodiments, second subset 323 of set of peptide structures 318 includes at least one, at least three, at least five, or at least some other number of the peptide structures identified in Table 3 below in Section V.B.
In one or more embodiments, set of peptide structures 318 may have been identified by treatment management system 308 using relevance system 330. Relevance system 330 may include any number of computational models to analyze sample data 332 to determine which peptide structures to include in set of peptide structures 318. Sample data 332 may be retrieved from data store 304 or received in some other manner. Sample data 332 may include data capturing multiple subjects' responses to one or more treatments. For example, sample data 332 may include data capturing subjects' responses to pembro and subjects' responses to ipi/nivo.
In one or more embodiments, relevance system 330 includes a first algorithm that uses a Wilcoxon rank-sum test to determine which peptide structures to include in first subset 321 to compute first treatment score 322 (e.g., for pembro) and a second algorithm that uses the Wilcoxon rank-sum test to determine which peptide structures to include in second subset 323 to compute second treatment score 324 (e.g., for ipi/nivo).
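As a non-limiting sketch of how a Wilcoxon rank-sum test might be used for this kind of selection, the Python code below computes the test with the normal approximation (no tie or continuity correction) and retains peptide structures whose abundances differ between two response groups. The grouping into responders and non-responders, the significance cutoff `alpha`, and all function names are assumptions of this sketch; the selection criteria actually used by relevance system 330 are not specified here.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the Wilcoxon rank-sum test."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first group
    expected = n1 * (n1 + n2 + 1) / 2         # mean of W under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def select_peptide_structures(responders, non_responders, alpha=0.05):
    """Keep peptide structures whose abundances differ between groups."""
    selected = []
    for name in responders:
        _, p = rank_sum_test(responders[name], non_responders[name])
        if p < alpha:
            selected.append(name)
    return selected
```

Here `responders` and `non_responders` are hypothetical dictionaries that map a peptide structure identifier to its abundances in each group; running the selection once per treatment would produce one subset per treatment.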
Treatment planning system 314 receives set of treatment scores 320 from scoring system 312. Treatment planning system 314 uses set of treatment scores 320 to generate treatment output 334. Treatment output 334 may include, for example, an identification or categorization of the response of the subject to the one or more treatments for which the subject's response is being predicted, at least one of an identification of a therapeutic to treat the subject, a design for the therapeutic, a treatment plan for administering the therapeutic, or a combination thereof. In some embodiments, the therapeutic is an immune checkpoint inhibitor. In various embodiments, treatment output 334 includes a therapeutic dosage for each therapeutic to be used in treating the subject.
In one or more embodiments, treatment output 334 identifies a response classification that indicates a predicted response for the subject to a treatment. For example, set of treatment scores 320 may include a treatment score that can be used to classify a subject's response to a melanoma treatment as either early disruption or sustained control.
The response classification may be, for example, a positive response classification, a negative response classification, or some other type of response classification. A positive response classification may, for example, indicate that the subject is predicted to have a relatively positive or otherwise successful response to treatment. A negative response classification may, for example, indicate that the subject is predicted to have a relatively poor or otherwise unsuccessful response to treatment. In one or more embodiments, the response classification predicts response to treatment with respect to survivability (e.g., overall survival, progression-free survival, etc.).
“Early disruption” may be an example of a negative response classification. “Early disruption” may indicate that the subject is predicted to have a relatively poor response to the treatment. For example, a prediction of “early disruption” may mean that the subject is predicted to have a disruption event within an initial period of time (e.g., 6 months) after treatment. A disruption event may be any event that disrupts the subject's “progression-free survival” (PFS). A disruption event may be also referred to as a progression event or an advancement event as such an event indicates disease progression or advancement. In some cases, the progression event may be a final level of progression or disease advancement, such as death. Thus, “early disruption” may also be referred to as “progression,” “disease progression,” or “disease advancement.” A disruption event may include, for example, at least one of a new melanoma (e.g., malignant mole), an increase in the size of an existing melanoma, or some other type of event. A disruption event may be detected using any number of progression criteria. For example, a disruption event may be considered “detected” in response to a selected number or proportion of a set of progression criteria being met. The set of progression criteria may include, for example, but is not limited to, one or more immune-related response criteria (irRC), one or more response evaluation criteria in solid tumors (RECIST), one or more other types of criteria, or a combination thereof.
“Sustained control” may be one example of a positive response classification. “Sustained control” may be a response classification that indicates that the subject is predicted to have a relatively successful response to the treatment. For example, a prediction of “sustained control” may mean that the subject is predicted to have no disruption events within a sustained period of time (e.g., 12 months) after treatment. The sustained period of time may be longer than the initial period of time.
In one or more embodiments, treatment planning system 314 uses one or more selected thresholds to classify set of treatment scores 320. In one or more embodiments, a different selected threshold is used for each treatment. In other embodiments, a same threshold is used for all treatments being considered. For example, treatment planning system 314 may use selected threshold 336. In one or more embodiments, selected threshold 336 is 0.5. In other embodiments, selected threshold 336 is 0.6, 0.7, 0.75, 0.8, or some other threshold.
As one example, when selected threshold 336 is 0.5, treatment planning system 314 may generate a first predicted response based on a determination that a treatment score is above (or is at or above) the selected threshold and may generate a second predicted response based on a determination that the treatment score is not above (or is below) the selected threshold. The first predicted response may be, for example, a first predicted response classification (e.g., sustained control); the second predicted response may be a second predicted response classification (e.g., early disruption).
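The threshold comparison described above can be illustrated with a short Python sketch. The function name and the `inclusive` flag, which switches between the “above” and “at or above” variants, are hypothetical and introduced only for this illustration.

```python
def classify_response(treatment_score, threshold=0.5, inclusive=False):
    """Map a treatment score to a predicted response classification."""
    if inclusive:
        above = treatment_score >= threshold  # "at or above" variant
    else:
        above = treatment_score > threshold   # strictly "above" variant
    return "sustained control" if above else "early disruption"
```

For instance, with a threshold of 0.5, a score of 0.72 would be classified as “sustained control,” while a score of exactly 0.5 would be classified as “early disruption” unless the inclusive variant is used.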
Treatment output 334 may include the response classification that is predicted such that a user (e.g., a medical professional) can determine whether a corresponding treatment should be or should not be administered to a subject. For example, when first treatment score 322 is generated for pembro, and treatment output 334 indicates that a subject's predicted response is “early disruption,” a medical professional may determine to administer a different treatment, a higher dosage of pembro, or change the treatment plan for the subject in some other way.
When set of treatment scores 320 includes at least two treatment scores, treatment planning system 314 may analyze the at least two treatment scores and determine which treatment score indicates a best response to the corresponding treatment for the subject. As one example, treatment planning system 314 may compare the at least two treatment scores and select the treatment corresponding to the highest treatment score for the subject. This selected treatment may then be identified in treatment output 334. In some cases, treatment output 334 may further include a therapeutic dosage (e.g., an approved dosage) for the selected treatment for the subject. In some cases, treatment output 334 may further include a response classification for the selected treatment. For example, while first treatment score 322 may be higher than second treatment score 324, both first treatment score 322 and second treatment score 324 may indicate that the predicted response for the subject is “early disruption” with both treatments. In this example, treatment output 334 may identify the treatment corresponding to first treatment score 322 with an indication that the predicted response is “early disruption” and a recommendation to either select a different treatment, alter (e.g., increase/decrease) a dosage of the treatment corresponding to first treatment score 322, combine the treatment with at least one other treatment, or change the treatment plan for the subject in some other manner.
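A minimal Python sketch of this selection logic is given below. The dictionary layout of the output, the function name, and the wording of the recommendation are assumptions made for illustration; a single threshold of 0.5 is used for all treatments, which is only one of the embodiments described above.

```python
def plan_treatment(treatment_scores, threshold=0.5):
    """Select the highest-scoring treatment and classify its predicted response."""
    best = max(treatment_scores, key=treatment_scores.get)
    score = treatment_scores[best]
    classification = "sustained control" if score > threshold else "early disruption"
    output = {
        "treatment": best,
        "score": score,
        "response_classification": classification,
    }
    if classification == "early disruption":
        # Even the best-scoring treatment is predicted to respond poorly.
        output["recommendation"] = (
            "consider a different treatment, dosage, or combination"
        )
    return output
```

For example, with hypothetical scores of 0.42 for pembro and 0.31 for ipi/nivo, pembro would be selected, but the output would still carry an “early disruption” classification and a recommendation to revisit the treatment plan.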
Treatment output 334 may be sent to remote system 130 for processing in some examples. In other embodiments, treatment output 334 may be displayed on graphical user interface 338 in display system 306 for viewing by a human operator. The human operator may use treatment output 334 to manage the melanoma treatment of the subject.
II.A.2. Computer Implemented System
In one or more examples, computer system 400 can include a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information. In various embodiments, computer system 400 can also include a memory, which can be a random-access memory (RAM) 406 or other dynamic storage device, coupled to bus 402 for determining instructions to be executed by processor 404. Memory also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404. In various embodiments, computer system 400 can further include a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk or optical disk, can be provided and coupled to bus 402 for storing information and instructions.
In various embodiments, computer system 400 can be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. An input device 414, including alphanumeric and other keys, can be coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is a cursor control 416, such as a mouse, a joystick, a trackball, a gesture input device, a gaze-based input device, or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. This input device 414 typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. However, it should be understood that input devices 414 allowing for three-dimensional (e.g., x, y, and z) cursor movement are also contemplated herein.
Consistent with certain implementations of the present teachings, results can be provided by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in RAM 406. Such instructions can be read into RAM 406 from another computer-readable medium or computer-readable storage medium, such as storage device 410. Execution of the sequences of instructions contained in RAM 406 can cause processor 404 to perform the processes described herein. Alternatively, hard-wired circuitry can be used in place of or in combination with software instructions to implement the present teachings. Thus, implementations of the present teachings are not limited to any specific combination of hardware circuitry and software.
The term “computer-readable medium” (e.g., data store, data storage, storage device, data storage device, etc.) or “computer-readable storage medium” as used herein refers to any media that participates in providing instructions to processor 404 for execution. Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Examples of non-volatile media can include, but are not limited to, optical, solid state, magnetic disks, such as storage device 410. Examples of volatile media can include, but are not limited to, dynamic memory, such as RAM 406. Examples of transmission media can include, but are not limited to, coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 402.
Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.
In addition to computer-readable media, instructions or data can be provided as signals on transmission media included in a communications apparatus or system to provide sequences of one or more instructions to processor 404 of computer system 400 for execution. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the disclosure herein. Representative examples of data communications transmission connections can include, but are not limited to, telephone modem connections, wide area networks (WAN), local area networks (LAN), infrared data connections, NFC connections, optical communications connections, etc.
It should be appreciated that the methodologies described herein, flow charts, diagrams, and accompanying disclosure can be implemented using computer system 400 as a standalone device or on a distributed network of shared computer processing resources such as a cloud computing network.
The methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware, firmware, software, or any combination thereof. For a hardware implementation, the processing unit may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
In various embodiments, the methods of the present teachings may be implemented as firmware and/or a software program and applications written in conventional programming languages such as R, C, C++, Python, etc. If implemented as firmware and/or software, the embodiments described herein can be implemented on a non-transitory computer-readable medium in which a program is stored for causing a computer to perform the methods described above. It should be understood that the various engines described herein can be provided on a computer system, such as computer system 400, whereby processor 404 would execute the analyses and determinations provided by these engines, subject to instructions provided by any one of, or a combination of, the memory components RAM 406, ROM 408, or storage device 410 and user input provided via input device 414.
II.B. Exemplary Methodologies for Analyzing Peptide Structure Data and Managing Melanoma Treatment
II.B.1. Predicting Treatment Response
Step 502 includes receiving peptide structure data corresponding to a set of glycoproteins in a biological sample obtained from the subject. The peptide structure data may be, for example, one example of an implementation of peptide structure data 310 in
Step 504 includes computing a treatment score using quantification data identified from the peptide structure data for a set of peptide structures, wherein the set of peptide structures includes at least one peptide structure identified from a plurality of peptide structures listed in Table 1. In step 504, the set of peptide structures may include, for example, at least two peptide structures from a selected group of peptide structures identified in Table 1 below. The selected group of peptide structures may be, for example, a portion of the peptide structures identified in Table 1. The selected group of peptide structures may be, for example, those peptide structures identified in Table 2 below or those peptide structures identified in Table 3 below. For example, when the treatment being considered includes pembrolizumab, the selected group of peptide structures includes the peptide structures listed in Table 2. When the treatment being considered includes a combination of nivolumab and ipilimumab, the selected group of peptide structures includes the peptide structures listed in Table 3. In step 504, the set of peptide structures may include at least one glycopeptide structure defined by a peptide sequence and a glycan structure linked to a linking site of the peptide sequence, as identified in Table 1.
In one or more embodiments, the set of peptide structures may have been identified using sample data for a sample population (e.g., subjects diagnosed with melanoma in which at least a portion of the subjects have been treated using the treatment being considered in process 500) and a statistical algorithm that identifies a relative significance for each peptide structure of a collection of peptide structures corresponding to the sample data. The statistical algorithm may include, for example, a Wilcoxon rank-sum test. In one or more embodiments, the identification of the set of peptide structures is performed using process 800 described below in
Step 504 may be performed by, for example, computing a proportion of the set of peptide structures having a certain type of abundance (e.g., relative abundance for glycopeptide structures and absolute abundance for aglycosylated peptide structures) greater than a reference abundance as the treatment score. In one or more embodiments, the reference abundance for a given peptide structure may be, for example, a median abundance of a plurality of abundances for that peptide structure across a sample population (e.g., as identified during training). The relative abundance for a given peptide structure is the abundance of that peptide structure relative to the corresponding aglycosylated peptide structure (e.g., the peptide structure having the same peptide sequence but without a glycan structure being bound to the peptide sequence).
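By way of non-limiting illustration, the proportion-based computation described for step 504 may be sketched as follows. The function name, the peptide structure labels, and all abundance values are hypothetical and are not taken from the tables of this disclosure; the sketch assumes that each abundance supplied is already of the appropriate type (relative for glycopeptide structures, absolute for aglycosylated peptide structures).

```python
from typing import Mapping


def treatment_score(abundances: Mapping[str, float],
                    reference_abundances: Mapping[str, float]) -> float:
    """Return the fraction of peptide structures whose abundance
    exceeds the reference abundance for that peptide structure."""
    if not reference_abundances:
        raise ValueError("at least one reference abundance is required")
    above = 0
    for name, reference in reference_abundances.items():
        # Strictly greater than the reference, per the description above.
        if abundances[name] > reference:
            above += 1
    return above / len(reference_abundances)


# Hypothetical values for illustration only.
references = {"PS-A": 0.10, "PS-B": 0.25, "PS-C": 1.50, "PS-D": 0.40}
subject = {"PS-A": 0.12, "PS-B": 0.20, "PS-C": 1.75, "PS-D": 0.55}
score = treatment_score(subject, references)  # 3 of 4 exceed -> 0.75
```

In this sketch the score is bounded between 0 and 1, so a threshold such as 0.5 can be applied directly to it.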
Step 506 includes generating a treatment output that indicates a predicted response to the treatment for the subject using the treatment score. The treatment output may be one example of an implementation for treatment output 334 in
The treatment output may include, for example, a recommendation to modify a treatment plan for the subject. For example, in some cases, the treatment output may indicate that early disruption is predicted for the subject. Accordingly, it may be desirable to modify the treatment plan. For example, the recommendation for modifying the treatment plan may include at least one of selecting a different treatment for the subject, altering (e.g., increasing or decreasing) a dosage for the treatment, or combining the treatment with at least one other treatment.
In one or more embodiments, the treatment output includes at least one of a design for the treatment or a therapeutic dosage for the treatment. For example, in some cases when the treatment score indicates that the subject will respond well (e.g., sustained control) to the treatment, the treatment output may identify the therapeutic dosage for the treatment. In this manner, a medical professional that receives the treatment output at a remote system (e.g., phone, tablet, laptop, etc.) may be able to more quickly administer the treatment to the subject.
In one or more embodiments, process 500 may optionally include step 508. Step 508 may include administering a therapeutic dosage of the treatment based on the treatment output to the subject. For example, the treatment may be administered (e.g., via intravenous or oral administration) based on the predicted response being a predicted response classification that indicates the treatment will be successful. For example, a predicted response classification of “sustained control” may indicate that the subject is predicted to respond well to treatment.
II.B.2. Selecting Between Multiple Treatments
Step 602 may include receiving peptide structure data corresponding to a set of peptide structures associated with a set of glycoproteins in a biological sample obtained from the subject. Step 602 may be performed in a manner similar to step 502 as described above with respect to
Step 604 may include computing a plurality of treatment scores using quantification data identified from the peptide structure data for a plurality of subsets of the set of peptide structures, wherein each treatment score of the plurality of treatment scores corresponds to a different treatment of a plurality of treatments. Each subset of the plurality of subsets may include at least one peptide structure identified from a plurality of peptide structures listed in Table 1. Computing a treatment score of the plurality of treatment scores may be performed in a manner similar to step 504 as described above with respect to
In one or more embodiments, the plurality of subsets includes a first subset and a second subset. For example, step 604 may include computing a first treatment score for a first treatment using a first portion of the quantification data identified from the peptide structure data for a first subset of the plurality of subsets of the set of peptide structures. Step 604 may further include computing a second treatment score for a second treatment using a second portion of the quantification data identified from the peptide structure data for a second subset of the plurality of subsets of the set of peptide structures. The first subset may include one or more peptide structures from those listed in Table 2. The second subset may include one or more peptide structures from those listed in Table 3.
In one or more embodiments, a subset of the plurality of subsets may have been previously identified using sample data for a sample population (e.g., subjects diagnosed with melanoma, in which at least a portion of the sample population has been treated with the plurality of treatments) and a statistical algorithm that identifies a relative significance for each peptide structure of a collection of peptide structures corresponding to the sample data with respect to a response to a selected treatment of the plurality of treatments. For example, identifying the subset may include performing a differential abundance analysis using the sample data to compare a first portion of the sample data corresponding to a first response classification (e.g., a positive response classification such as, for example, sustained control) for the selected treatment and a second portion of the sample data corresponding to a second response classification (e.g., a negative response classification such as, for example, early disruption) for the selected treatment to identify a selected N most differentiating peptide structures (e.g., the 20 most differentiating peptide structures) between the first response classification and the second response classification. The statistical algorithm may include, for example, a Wilcoxon rank-sum test.
Step 606 may include performing a comparison analysis of the plurality of treatment scores. Step 606 may be performed by, for example, determining which of the plurality of treatment scores is a highest-scoring treatment score. In some embodiments, step 606 may include determining that a treatment of the plurality of treatments has a treatment score below a selected threshold and excluding that treatment from the comparison analysis. The selected threshold may be, for example, 0.5.
Step 608 may include generating a treatment output based on the comparison analysis. The treatment output includes a recommended treatment plan for treating the subject. For example, step 608 may include identifying the treatment of the plurality of treatments having a highest treatment score as a recommended treatment for treating the subject.
In one or more embodiments, step 608 may include identifying a predicted response classification for the subject for each treatment of the plurality of treatments using a corresponding treatment score of the plurality of treatment scores. The predicted response classification may be, for example, a positive response classification, a negative response classification, or another type of response classification. In one or more embodiments, the predicted response classification for a particular treatment may be, for example, sustained control when the corresponding treatment score is above a selected threshold and may be, for example, early disruption when the corresponding treatment score is not above the selected threshold. The selected threshold may be, for example, 0.5.
In one or more embodiments, step 608 includes identifying a treatment of the plurality of treatments having a highest treatment score as a highest-scored treatment; determining that the highest treatment score is not above a selected threshold (e.g., 0.5); and generating the treatment output such that the recommended treatment plan includes a recommendation to modify an existing treatment plan for the subject. The recommendation for modifying the treatment plan may include at least one of selecting a different treatment for the subject, altering a dosage for a treatment that is part of the existing treatment plan, or combining the treatment with at least one other treatment.
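By way of non-limiting illustration, the comparison analysis of step 606 and the output generation of step 608 may be sketched as follows. The function name, the returned structure, and the example scores are hypothetical; the threshold of 0.5 is the example value given above. The sketch assumes that a score equal to the threshold is treated as not above it and that ties are resolved in favor of the first treatment listed.

```python
from typing import Dict, Optional, Tuple


def recommend_treatment(
    scores: Dict[str, float], threshold: float = 0.5
) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (recommended treatment or None, predicted response per treatment)."""
    if not scores:
        raise ValueError("at least one treatment score is required")
    # Predicted response classification for each treatment.
    classifications = {
        name: "sustained control" if value > threshold else "early disruption"
        for name, value in scores.items()
    }
    # Highest-scoring treatment; max() keeps the first one on a tie.
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best, classifications
    # No treatment clears the threshold: the caller may instead
    # recommend modifying the existing treatment plan.
    return None, classifications


# Hypothetical scores for illustration only.
choice, predicted = recommend_treatment(
    {"pembrolizumab": 0.35, "nivolumab + ipilimumab": 0.70}
)
```

A return value of None for the recommended treatment corresponds to the case in which the highest treatment score is not above the selected threshold.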
In one or more embodiments, when the treatment output includes a recommended treatment, process 600 may optionally include step 610. Step 610 may include administering a therapeutic dosage of a treatment recommended by the treatment output to the subject.
Step 702 may include receiving peptide structure data corresponding to a set of peptide structures associated with a set of glycoproteins in a biological sample obtained from the subject. Step 702 may be performed in a manner similar to step 502 as described above with respect to
Step 704 may include computing a first treatment score for a first treatment of pembrolizumab using first quantification data identified from the peptide structure data for a first subset of the set of peptide structures, wherein the first subset includes at least one peptide structure identified from a plurality of peptide structures listed in Table 2. The first treatment score may be computed as, for example, a proportion of the peptide structures in the first subset having a selected abundance (e.g., relative abundance for glycopeptide structures and absolute abundance for aglycosylated peptide structures) greater than a reference abundance for that peptide structure. In one or more embodiments, the first subset includes all of or a majority of (e.g., more than 15) the peptide structures listed in Table 2.
Step 706 may include computing a second treatment score for a second treatment comprised of nivolumab and ipilimumab using second quantification data identified from the peptide structure data for a second subset of the set of peptide structures, wherein the second subset includes at least one peptide structure identified from a plurality of peptide structures listed in Table 3. In one or more embodiments, the second subset includes all of or a majority of (e.g., more than 15) the peptide structures listed in Table 3.
Step 708 may include performing a comparison analysis of the first treatment score and the second treatment score. Step 708 may include, for example, determining which of the first treatment score and the second treatment score is the higher score.
Step 710 may include generating a treatment output based on the comparison analysis, wherein the treatment output identifies one of the first treatment and the second treatment as a recommended treatment for the subject. For example, step 710 may include identifying the highest-scoring treatment as a recommended treatment for treating the subject. The recommended treatment may then be administered to the subject to treat the subject's melanoma. For example, the treatment may be administered via at least one of intravenous or oral administration at a therapeutic dosage.
In one or more embodiments, process 700 may optionally include step 712. Step 712 may include administering a therapeutic dosage of the recommended treatment to the subject.
II.C. Exemplary Methodology for Identifying a Set of Peptide Structures Corresponding to a Treatment
Step 802 includes receiving sample data for a sample population in which the sample data characterizes responses of a plurality of sample subjects diagnosed with the melanoma condition to the treatment and includes sample peptide structure data for a collection of peptide structures for each subject of the plurality of sample subjects.
Step 804 includes grouping the sample data based on the responses of the plurality of sample subjects into a first group corresponding to a first response classification and a second group corresponding to a second response classification.
Step 806 includes performing a differential abundance analysis using the sample data to compare the first group of the sample data corresponding to the first response classification and the second group of the sample data corresponding to the second response classification to identify a set of peptide structures from the collection of peptide structures. The set of peptide structures may be identified as a selected N most differentiating peptide structures (e.g., the 20 most significant peptide structures for differentiation) between the first response classification and the second response classification. The first response classification may be, for example, sustained control, which indicates an absence of disruption events during a sustained period of time (e.g., 12 months) after treatment administration. The second response classification may be, for example, early disruption, which indicates a presence of at least one disruption event during an initial period of time (e.g., 6 months) after treatment.
This set of peptide structures that is identified in step 806 may then be used in future analysis (e.g., in process 500 in
Step 806 may be performed using, for example, a Wilcoxon rank-sum test in one or more embodiments. Exemplary results of the differential abundance analysis performed using the Wilcoxon rank-sum test are presented below in Tables 5 and 6.
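By way of non-limiting illustration, a differential abundance analysis of the kind described for step 806 may be sketched as follows. This is a simplified sketch rather than the implementation used to produce the tables herein: it computes a one-sided Wilcoxon rank-sum p-value using the large-sample normal approximation, assigns midranks to tied values, and omits the tie and continuity corrections that a statistical library would typically apply. All function names, labels, and abundance values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_sum_p_value(first: Sequence[float], second: Sequence[float]) -> float:
    """One-sided p-value that `first` tends to be larger than `second`
    (normal approximation, no tie or continuity correction)."""
    n1, n2 = len(first), len(second)
    pooled = sorted([(v, 0) for v in first] + [(v, 1) for v in second])
    rank_sum_first = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        in_first = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_first += midrank * in_first
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_first - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability


def most_differentiating(
    first_group: Dict[str, Sequence[float]],
    second_group: Dict[str, Sequence[float]],
    n: int = 20,
) -> List[Tuple[str, float]]:
    """Return the n peptide structures with the smallest p-values."""
    results = [
        (name, rank_sum_p_value(first_group[name], second_group[name]))
        for name in first_group
    ]
    results.sort(key=lambda item: item[1])
    return results[:n]


# Hypothetical abundances for two response classifications.
sustained = {"PS-A": [5.0, 6.0, 7.0, 8.0], "PS-B": [1.0, 2.0, 3.0, 4.0]}
early = {"PS-A": [1.0, 2.0, 3.0, 4.0], "PS-B": [1.5, 2.5, 3.5, 4.5]}
top = most_differentiating(sustained, early, n=1)
```

For small sample sizes, an exact test from an established statistics package may be preferable to the normal approximation used in this sketch.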
Aspects of the disclosure include compositions comprising one or more of the peptide structures listed in Table 1. In some embodiments, a composition comprises a plurality of the peptide structures listed in Table 1. In some embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, or 38 of the peptide structures listed in Table 1. In some embodiments, a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 21-46, listed in Table 1 and defined in Table 7 below.
Aspects of the disclosure include compositions comprising one or more precursor ions having a defined charge and/or defined mass-to-charge (m/z) ratio, as listed in Table 6. Aspects of the disclosure include compositions comprising one or more product ions having a defined mass-to-charge (m/z) ratio, which product ions are produced by converting a peptide structure described herein (e.g., a peptide structure listed in Table 1) into a gas phase ion in a mass spectrometry system. Conversion of the peptide structure into a gas phase ion can take place using any of a variety of techniques, including, but not limited to, matrix assisted laser desorption ionization (MALDI); electron ionization (EI); electrospray ionization (ESI); atmospheric pressure chemical ionization (APCI); and/or atmospheric pressure photo ionization (APPI).
Aspects of the disclosure include compositions comprising one or more product ions produced from one or more of the peptide structures described herein (e.g., a peptide structure listed in Table 1). In some embodiments, a composition comprises a set of the product ions listed in Table 1, having an m/z ratio selected from the list provided for each peptide structure in Table 1.
In some embodiments, a composition comprises at least one of peptide structures PS-1 to PS-38 identified in Table 1.
In some embodiments, a composition comprises a peptide structure or a product ion. The peptide structure or product ion comprises an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 21-46, as identified in Table 7, corresponding to peptide structures PS-1 to PS-38 in Table 1.
In some embodiments, a composition comprises a peptide structure having a monoisotopic mass identified in Table 1 as corresponding to the peptide structure.
In some embodiments, the product ion is selected as one from a group consisting of product ions identified in Table 6, including product ions falling within an identified m/z range of the m/z ratio identified in Table 6 and characterized as having a precursor ion having an m/z ratio within an identified m/z range of the m/z ratio identified in Table 6. A first range for the product ion m/z ratio may be ±0.5. A second range for the product ion m/z ratio may be ±0.8. A third range for the product ion m/z ratio may be ±1.0. A first range for the precursor ion m/z ratio may be ±0.5; a second range for the precursor ion m/z ratio may be ±1.0; a third range for the precursor ion m/z ratio may be ±1.5. Thus, a composition may include a product ion having an m/z ratio that falls within at least one of the first range (±0.5), the second range (±0.8), or the third range (±1.0) of the product ion m/z ratio identified in Table 6, and characterized as having a precursor ion having an m/z ratio that falls within at least one of a first range (±0.5), a second range (±1.0), or a third range (±1.5) of the precursor ion m/z ratio identified in Table 6.
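By way of non-limiting illustration, the m/z range check described above may be sketched as follows. The function names and the m/z values in the example are hypothetical and are not taken from Table 6; the default tolerances correspond to the first ranges (±0.5) recited above, and the sketch assumes that the range boundaries are inclusive.

```python
def within_range(observed_mz: float, reference_mz: float, tolerance: float) -> bool:
    """True when the observed m/z lies within +/- tolerance of the reference."""
    return abs(observed_mz - reference_mz) <= tolerance


def transition_matches(
    observed_precursor_mz: float,
    observed_product_mz: float,
    reference_precursor_mz: float,
    reference_product_mz: float,
    precursor_tolerance: float = 0.5,
    product_tolerance: float = 0.5,
) -> bool:
    """True when both the precursor ion and the product ion fall within
    their respective m/z ranges of the reference transition."""
    return within_range(
        observed_precursor_mz, reference_precursor_mz, precursor_tolerance
    ) and within_range(observed_product_mz, reference_product_mz, product_tolerance)


# Hypothetical m/z values for illustration only.
narrow = transition_matches(1001.2, 366.1, 1000.0, 366.14)
wide = transition_matches(1001.2, 366.1, 1000.0, 366.14, precursor_tolerance=1.5)
```

Here the observed precursor ion lies 1.2 m/z units from the reference, so it falls outside the first range (±0.5) but within the third range (±1.5).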
Table 7 defines the peptide sequences for SEQ ID NOS: 21-46 from Table 1. Table 7 further identifies a corresponding protein SEQ ID NO for each peptide sequence. Each peptide sequence in Table 7 is defined as an amino acid sequence.
Table 8 identifies the proteins of SEQ ID NOS: 1-20 from Table 1. Table 8 identifies a corresponding protein abbreviation and protein name for each of protein SEQ ID NOS: 1-20. Further, Table 8 identifies a corresponding UniProt ID for each of protein SEQ ID NOS: 1-20.
Table 9 identifies and defines the glycan structures from Table 1. Table 9 identifies a graphical representation of the structure and a coded representation of the composition for each glycan structure included in Table 1. As used herein, the 4-digit GL NO. is a designation that represents the number of hexoses, the number of HexNAcs, the number of Fucoses, and the number of Neuraminic Acids.
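By way of non-limiting illustration, the 4-digit GL NO. designation may be decoded as sketched below. The class and function names are hypothetical, and the sketch assumes that each of the four digits encodes a single-digit count in the order recited above. The example code "5402" is used only to illustrate the decoding and is not asserted to appear in Table 9.

```python
from typing import NamedTuple


class GlycanComposition(NamedTuple):
    hexose: int
    hexnac: int
    fucose: int
    neuraminic_acid: int


def decode_gl_number(gl_no: str) -> GlycanComposition:
    """Split a 4-digit GL NO. into its four monosaccharide counts."""
    if len(gl_no) != 4 or not (gl_no.isascii() and gl_no.isdigit()):
        raise ValueError(f"expected a 4-digit GL NO., got {gl_no!r}")
    return GlycanComposition(*(int(digit) for digit in gl_no))


composition = decode_gl_number("5402")
# -> 5 hexoses, 4 HexNAcs, 0 fucoses, 2 neuraminic acids
```

The designation is passed as a string rather than an integer so that any leading zero is preserved.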
Aspects of the disclosure include kits comprising one or more compositions, each comprising one or more peptide structures of the disclosure that can be used as assay standards, and instructions for use. Kits in accordance with one or more embodiments described herein may include a label indicating the intended use of the contents of the kit. The term “label” as used herein with respect to a kit includes any writing, or recorded material supplied on or with a kit, or that otherwise accompanies a kit.
The peptide structures and the transitions produced therefrom, as described herein, may be useful for treatment management of melanoma. A transition includes a precursor ion and at least one product ion grouping. As reviewed herein, the peptide structures in Table 1, as well as their corresponding precursor ion and product ion groupings (these ions having defined m/z ratios or m/z ratios that fall within the m/z ranges identified herein), can be used in mass spectrometry-based analyses to predict treatment response, select a treatment for administration, determine whether to alter a treatment plan or dosage, or a combination thereof.
Aspects of the disclosure include methods for analyzing one or more peptide structures, as described herein. In some embodiments, the methods involve processing a sample from a patient to generate a prepared sample that can be inputted into a mass spectrometry system (e.g., a reaction monitoring mass spectrometry system). In certain embodiments, processing the sample can comprise performing one or more of: a denaturation procedure, a reduction procedure, an alkylation procedure, and a digestion procedure. The denaturation and reduction procedures may be implemented in a manner similar to, for example, denaturation and reduction 202 in
In some embodiments, the methods for analyzing one or more peptide structures involve detecting a set of product ions generated by a reaction monitoring mass spectrometry system in which one or more product ions may correspond to each of the one or more peptide structures that have been inputted into the mass spectrometry system. As described herein, each peptide structure can be converted into a set of product ions having a defined m/z ratio, as provided in Table 6, or an m/z ratio within an identified m/z range of the m/z ratio provided in Table 6. In some embodiments, the methods involve generating quantification (e.g., abundance) data for the one or more product ions detected using the reaction monitoring mass spectrometry system.
In some embodiments, the methods further comprise generating a diagnosis output using the quantification data and a model that has been trained using supervised or unsupervised machine-learning. In certain embodiments, the reaction monitoring mass spectrometry system may include multiple/selected reaction monitoring mass spectrometry (MRM/SRM-MS) to detect the one or more product ions and generate the quantification data.
Sample data obtained via glycoproteomic analysis of pretreatment blood samples were compiled for a sample population comprising advanced malignant melanoma patients treated with pembrolizumab (Pembro; n=24) or nivolumab-ipilimumab (ipi/nivo; n=11). Samples were analyzed using an advanced glycoproteomics platform that combines ultra-high-performance liquid chromatography coupled to triple quadrupole mass spectrometry and a neural-network-based data processing engine. Individual glycopeptide signatures derived from 67 abundant serum proteins were analyzed and correlated with treatment, progression-free survival (PFS), and other clinical outcome metrics.
Two response groups were defined based on PFS: early disruption (e.g., early failure) (EF; PFS event within 6 months) and sustained control (SC; no events for ≥12 months). Differential relative/absolute abundances for 498 serum glycopeptides and aglycosylated peptides were calculated between SC and EF patients for each treatment group to determine a set of peptide structures more abundant in SC versus EF within each treatment group. A score was developed for each treatment group based on the 20 markers within that treatment group identified as the most statistically significant based on a one-sided Wilcoxon test comparing EF and SC. For a given patient, the score was computed as the proportion of glycopeptides/aglycosylated peptides with relative/absolute abundance exceeding their median abundance. A low score was associated with high risk for early failure.
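By way of non-limiting illustration, the assignment of patients to the two response groups defined above may be sketched as follows. The function name and arguments are hypothetical. The sketch assumes that an event occurring at exactly 6 months counts as falling within 6 months, and that patients meeting neither definition (e.g., a first event between 6 and 12 months, or fewer than 12 months of event-free follow-up) are left unassigned.

```python
from typing import Optional


def classify_response(
    months_to_event: Optional[float], months_followed: float
) -> Optional[str]:
    """Return "EF", "SC", or None when neither definition is met.

    months_to_event is None when no PFS event has been observed.
    """
    if months_to_event is not None and months_to_event <= 6:
        return "EF"  # PFS event within 6 months
    # Event-free time is the time to the first event, or the
    # follow-up time when no event has been observed.
    event_free = months_followed if months_to_event is None else months_to_event
    if event_free >= 12:
        return "SC"  # no events for at least 12 months
    return None


groups = [
    classify_response(3.0, 24.0),   # early event
    classify_response(None, 18.0),  # event-free for 18 months
    classify_response(9.0, 24.0),   # event between 6 and 12 months
    classify_response(None, 8.0),   # follow-up too short to assign
]
```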
Table 10 and Table 11 below show the median abundances identified for the set of peptide structures. These median abundances are examples of what may be used as reference abundances for these peptide structures.
When examined in all patients in the cohort (regardless of treatment), both treatment scores isolated EF from SC. Algorithmic assignment was performed by choosing the treatment with the highest treatment-specific score (e.g., if ipi/nivo score>pembro score, then assign to ipi/nivo). PFS was superior for cases where the assigned treatment matched the treatment received. Log-rank p-values comparing PFS by assigned treatment within pembro- and ipi/nivo-treated cases were 0.009 and 0.0004, respectively. Our results show that serum glycoproteomic analysis allows targeted treatment assignment not only to immune checkpoint inhibitor treatment in general, but specifically to the most likely successful agent among different drugs for melanoma. This may fundamentally improve the clinical use of immuno-therapy in subjects with melanoma.
Provided herein are methods, devices, glycopeptides, and kits for identifying glycoproteomic biomarkers and signatures for risk of having a disease or a condition, progression of the disease or condition, and response of the disease or condition to a treatment, such as treatment with immune checkpoint blockade for cancer. In some cases, the disease or condition may be cancer. In some cases, the progression of the disease or condition includes but is not limited to stage of cancer or size of tumor or a surrogate endpoint. Such information may be used to provide actionable recommendations for treatment to a healthcare provider, including but not limited to initiation of a new treatment, continuation of ongoing treatment, adding a new therapy, or changing the dosage and/or frequency of ongoing treatment.
Protein glycosylation is one of the most abundant and most complex forms of post-translational protein modification. Glycosylation can profoundly affect the structure, conformation, and function of a polypeptide. The elucidation of the potential role of differential polypeptide glycosylation as biomarkers has so far been limited by the technical complexity of generating and interpreting this information. A novel, powerful platform has been established that combines ultra-high-performance liquid chromatography (LC) coupled to triple quadrupole mass spectrometry (MS) with a machine-learning and neural-network-based data processing engine that allows for high-throughput, highly scalable interrogation of the glycoproteome. The glycoproteomic biomarkers and signatures may be used to predict which cancer patients may respond to immune checkpoint blockade treatment, such as PD1/PDL1 checkpoint inhibitors.
Changes in glycosylation have been described in relationship to disease states such as cancer. See, e.g., Dube, D. H.; Bertozzi, C. R. Glycans in Cancer and Inflammation—Potential for Therapeutics and Diagnostics. Nature Rev. Drug Disc. 2005, 4, 477-88, the entire contents of which are herein incorporated by reference for all purposes. However, clinically relevant, non-invasive assays for diagnosing cancer in a patient based on glycosylation changes in a sample from that patient are still needed.
Mass spectrometry (MS) offers sensitive and precise measurement of cancer-specific biomarkers including glycopeptides. See, for example, Ruhaak, L. R., et al., Protein-Specific Differential Glycosylation of Immunoglobulins in Serum of Ovarian Cancer Patients DOI: 10.1021/acs.jproteome.5b01071; J. Proteome Res., 2016, 15, 1002-1010 (2016); also Miyamoto, S., et al., Multiple Reaction Monitoring for the Quantitation of Serum Protein Glycosylation Profiles: Application to Ovarian Cancer, DOI: 10.1021/acs.jproteome.7b00541, J. Proteome Res. 2018, 17, 222-233 (2017), the entire contents of each of which are herein incorporated by reference for all purposes. However, using MS to diagnose cancer has not been demonstrated to date in a clinically relevant manner. What is needed are new biomarkers and new methods of using MS to assess a diagnosis for a disease or a condition, a risk of having a disease or a condition, progression of the disease or condition, and response of the disease or condition to a treatment.
I. Overview
Provided herein are methods for identifying one or more glycopeptide biomarkers predictive of a disease or a condition in a subject, the method comprising: (a) obtaining from a subject a first sample at a first timepoint and a second sample at a second timepoint, wherein the first sample and the second sample comprise a glycoprotein; (b) fragmenting the glycoprotein in the first sample or the second sample into one or more glycopeptides, wherein the one or more glycopeptides comprise one or more amino acid sequences selected from a group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof; (c) determining an amount of the one or more glycopeptides using multiple reaction monitoring mass spectrometry (MRM-MS); (d) associating the amount of the one or more glycopeptides with the first timepoint or the second timepoint, wherein the subject has a change in a disease or a condition from the first timepoint to the second timepoint; and (e) identifying as glycopeptide biomarkers the glycopeptide where the amount of the one or more glycopeptides changed from the first timepoint to the second timepoint.
Described herein are methods for identifying one or more glycopeptide biomarkers predictive of a disease or a condition in a subject, the method comprising: (a) obtaining, by a computer, data of an amount of one or more glycopeptides for a set (n) of subjects, wherein the one or more glycopeptides are generated by fragmenting a glycoprotein in a sample from a subject, the amount of one or more glycopeptides are determined using multiple reaction monitoring mass spectrometry (MRM-MS), and the data for each subject comprises data from samples taken at a plurality of timepoints; (b) selecting, by the computer, a subset of the one or more glycopeptides to include in a predictive model; (c) assessing, by the computer, the predictive model using a cross-validation with n−1 subjects to generate an outcome score for a holdout subject; (d) iterating, by the computer, step (c) for each of n subjects as the holdout subject to generate an outcome score for each subject; (e) dichotomizing, by the computer, the outcome scores for each subject at a cutoff outcome score as below or above the cutoff outcome score; (f) analyzing, by the computer, the amount of one or more glycopeptides for subjects having outcome scores above the cutoff outcome score to the amount of one or more glycopeptides for subjects having outcome scores below the cutoff outcome score for each glycopeptide in the subset of the one or more glycopeptides to determine a hazard ratio and an interaction p-value for each glycopeptide; (g) identifying, by the computer, the glycopeptide having the interaction p-value ≤0.05 as a glycopeptide biomarker for predicting the disease or the condition. In some embodiments, the cross-validation is leave-one-out cross-validation (LOOCV). In some embodiments, the cutoff outcome score was determined to optimize Harrell's C-index. In some embodiments, the interaction p-value is less than or equal to 0.01, 0.005, or 0.001 in step (g).
Provided herein are methods for identifying one or more glycopeptide biomarkers predictive of a disease or a condition in a subject, the method comprising: (a) obtaining, by a computer, data of an amount of one or more glycopeptides for a set (n) of subjects, wherein the one or more glycopeptides are generated by fragmenting a glycoprotein in a sample from a subject, the amount of one or more glycopeptides are determined using multiple reaction monitoring mass spectrometry (MRM-MS), and the data for each subject comprises data from samples taken at a plurality of timepoints; (b) selecting, by the computer, a subset of the one or more glycopeptides to include in a predictive model; (c) assessing, by the computer, the predictive model using a cross-validation with n−1 subjects to generate an outcome score for a holdout subject; (d) iterating, by the computer, step (c) for each of n subjects as the holdout subject to generate an outcome score for each subject; (e) dichotomizing, by the computer, the outcome scores for each subject at a cutoff outcome score as below or above the cutoff outcome score; (f) comparing, by the computer, the amount of one or more glycopeptides for subjects having outcome scores above the cutoff outcome score to the amount of one or more glycopeptides for subjects having outcome scores below the cutoff outcome score for each glycopeptide in the subset of the one or more glycopeptides to determine a hazard ratio and an interaction p-value for each glycopeptide; and (g) identifying, by the computer, the glycopeptide having the interaction p-value ≤0.05 as a glycopeptide biomarker for predicting the disease or the condition.
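As a minimal sketch of steps (c) through (e), the leave-one-out loop and the dichotomization can be written as follows. A one-predictor least-squares line is used here purely as a stand-in for the predictive model, which the text does not specify; the data, the function names, and the cutoff value are likewise hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def loocv_scores(abundance, outcome):
    """Steps (c)-(d): fit on n-1 subjects, score the holdout, repeat for all n."""
    scores = []
    for i in range(len(abundance)):
        train_x = abundance[:i] + abundance[i + 1:]
        train_y = outcome[:i] + outcome[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        scores.append(slope * abundance[i] + intercept)
    return scores

def dichotomize(scores, cutoff):
    """Step (e): label each score as above or below the cutoff.

    A score exactly equal to the cutoff is labeled "below" in this sketch.
    """
    return ["above" if s > cutoff else "below" for s in scores]

# Toy data: one glycopeptide amount per subject and a perfectly linear outcome.
abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [2.0, 4.0, 6.0, 8.0, 10.0]
scores = loocv_scores(abundance, outcome)
groups = dichotomize(scores, cutoff=5.0)
```

Because each subject's score comes from a model that never saw that subject, the scores are not inflated by fitting and scoring on the same data.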
Described herein are methods for assessing a status of a condition and a treatment in a subject, the method comprising: (a) fragmenting a glycoprotein in a sample from a subject into one or more glycopeptides, wherein the sample comprises one or more of glycoproteins, glycans, or glycopeptides; (b) performing mass spectroscopy (MS) on the one or more glycopeptides using multiple reaction monitoring mass spectrometry (MRM-MS) to quantify an amount of the one or more glycopeptides in the sample, wherein the one or more glycopeptides comprise one or more amino acid sequences selected from a group consisting of SEQ ID NOs:101-131, 159-207, and 21-46, and combinations thereof; (c) inputting data of the amount of the one or more glycopeptides into a trained model to generate an output probability, wherein the output probability is indicative of whether a treatment positively influences an outcome of the subject having a condition; and (d) generating a treatment recommendation based on the output probability, wherein the condition is melanoma and the treatment comprises checkpoint inhibitors. In some embodiments, the outcome comprises overall survival time. In some embodiments, the outcome comprises progression-free survival time. In some embodiments, the treatment comprises one or more of ipilimumab, nivolumab, and pembrolizumab. In some embodiments, the treatment comprises one or more of PD-1-, PD-L1-, and CTLA-4-inhibitors. In some embodiments, the treatment comprises chemotherapy. In some embodiments, the chemotherapy comprises one or more of carboplatin and pemetrexed. In some embodiments, the recommendation comprises continuing the treatment if the output probability indicates the treatment positively influences the outcome.
Provided herein are methods for assessing a status of a condition and a treatment in a subject, the method comprising: (a) fragmenting a glycoprotein in a sample from a subject into one or more glycopeptides, wherein the sample comprises one or more of glycoproteins, glycans, or glycopeptides; (b) performing mass spectroscopy (MS) on the one or more glycopeptides using multiple reaction monitoring mass spectrometry (MRM-MS) to quantify an amount of the one or more glycopeptides in the sample, wherein the one or more glycopeptides comprise one or more amino acid sequences selected from a group consisting of SEQ ID NOs: 101-131, 159-207, and 21-46, and combinations thereof; (c) inputting data of the amount of the one or more glycopeptides into a trained model to generate an output probability, wherein the output probability is indicative of whether a treatment positively influences an outcome of the subject having a condition; and (d) generating a treatment recommendation based on the output probability, wherein the condition is non-small cell lung cancer (NSCLC) and the treatment comprises checkpoint inhibitors. In some embodiments, the outcome comprises overall survival time. In some embodiments, the outcome comprises progression-free survival time. In some embodiments, the treatment comprises one or more of ipilimumab, nivolumab, and pembrolizumab. In some embodiments, the treatment comprises one or more of PD-1-, PD-L1-, and CTLA-4-inhibitors. In some embodiments, the treatment comprises chemotherapy. In some embodiments, the chemotherapy comprises one or more of carboplatin and pemetrexed. In some embodiments, the recommendation comprises continuing the treatment if the output probability indicates the treatment positively influences the outcome.
In some embodiments, provided herein are methods for identifying a classification for a sample, the method comprising: quantifying by mass spectroscopy (MS) one or more glycopeptides in a sample wherein the glycopeptides each, individually in each instance, comprises a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof; and inputting the quantification into a trained model to generate an output probability; determining if the output probability is above or below a threshold for a classification; and identifying a classification for the sample based on whether the output probability is above or below a threshold for a classification.
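For illustration only, the thresholding step can be sketched with a logistic model standing in for the trained model. The weights, bias, threshold, and quantities below are hypothetical; the disclosure does not state the model form or any parameter values.

```python
from math import exp

def output_probability(quantities, weights, bias):
    """Logistic stand-in for a trained model: maps quantities to (0, 1)."""
    z = bias + sum(w * q for w, q in zip(weights, quantities))
    return 1.0 / (1.0 + exp(-z))

def classify(probability, threshold=0.5):
    """Assign the classification by whether the probability is above the threshold."""
    return "above" if probability > threshold else "below"

# Hypothetical weights for two glycopeptide quantifications.
weights, bias = [1.0, -1.0], 0.0
p_high = output_probability([3.0, 1.0], weights, bias)  # linear score of +2
p_low = output_probability([1.0, 3.0], weights, bias)   # linear score of -2
labels = [classify(p_high), classify(p_low)]
```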
In some embodiments, provided herein are methods for training a machine-learning algorithm, comprising: providing a first data set of MRM transition signals indicative of a sample comprising a glycopeptide consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46; providing a second data set of MRM transition signals indicative of a control sample; and comparing the first data set with the second data set using a machine-learning algorithm.
In some embodiments, provided herein are methods for diagnosing a patient having cancer, the method comprising: obtaining a biological sample from the patient; performing mass spectroscopy of the biological sample using MRM-MS with a QQQ and/or qTOF spectrometer to detect and quantify one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, or to detect and quantify one or more MRM transitions; inputting the quantification of the detected glycopeptides or the MRM transitions into a trained model to generate an output probability; determining if the output probability is above or below a threshold for a classification; identifying a diagnostic classification for the patient based on whether the output probability is above or below a threshold for a classification; and providing a recommendation for treatment. In some examples, the method includes performing mass spectroscopy of the biological sample using MRM-MS with a QQQ.
II. Biomarkers
Provided herein are glycopeptide biomarkers. These biomarkers are useful for a variety of applications, including, but not limited to, diagnosing diseases and conditions. For example, certain biomarkers set forth herein, or combinations thereof, are useful for diagnosing cancer. In some embodiments, the cancer is melanoma. In some embodiments, the cancer is non-small cell lung cancer (NSCLC). In some embodiments, the biomarkers are useful for diagnosing and screening patients having cancer, an autoimmune disease, or fibrosis. In some embodiments, the biomarkers are useful for classifying a patient so that the patient receives the appropriate medical treatment. In some embodiments, the biomarkers are useful for treating or ameliorating a disease or condition in a patient by, for example, identifying a therapeutic agent with which to treat the patient. In some embodiments, the biomarkers are useful for determining a prognosis of treatment for a patient or a likelihood of success or survivability for a treatment regimen.
In some embodiments, a sample from a patient is analyzed by MS and the results are used to determine the presence, absolute amount, and/or relative amount of a glycopeptide consisting of an amino acid sequence selected from SEQ ID NO: 101-131, 159-207, and 21-46 in the sample. In some embodiments, a sample from a patient is analyzed by MS and the results are used to determine the presence, absolute amount, and/or relative amount of a glycopeptide consisting essentially of an amino acid sequence selected from SEQ ID NO: 101-131, 159-207, and 21-46 in the sample. In some embodiments, a sample from a patient is analyzed by MS and the results are used to determine the presence, absolute amount, and/or relative amount of a glycopeptide consisting of, or consisting essentially of, an amino acid sequence selected from SEQ ID NO: 101-131, 159-207, and 21-46 in the sample. In some embodiments, the presence, absolute amount, and/or relative amount of a glycopeptide is determined by analyzing the MS results. In some embodiments, the MS results are analyzed using machine-learning.
Provided herein are biomarkers selected from glycans, peptides, glycopeptides, fragments thereof, and combinations thereof. In some embodiments, the glycopeptide comprises an amino acid sequence selected from SEQ ID NO: 101-131, 159-207, and 21-46. In some embodiments, the glycopeptide consists essentially of an amino acid sequence selected from SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, the glycopeptides set forth herein include O-glycosylated peptides. These peptides include glycopeptides in which a glycan is bonded to the peptide through an oxygen atom of an amino acid. Typically, the amino acid to which the glycan is bonded is threonine (T) or serine (S). In some examples, the amino acid to which the glycan is bonded is threonine (T). In some examples, the amino acid to which the glycan is bonded is serine (S).
In certain examples, the O-glycosylated peptides include those peptides from the group selected from Apolipoprotein C-III (APOC3), Alpha-2-HS-glycoprotein (FETUA), and combinations thereof. In certain examples, the O-glycosylated peptide, set forth herein, is an Apolipoprotein C-III (APOC3) peptide. In certain examples, the O-glycosylated peptide, set forth herein, is an Alpha-2-HS-glycoprotein (FETUA) peptide.
In some examples, the glycopeptides set forth herein include N-glycosylated peptides. These peptides include glycopeptides in which a glycan is bonded to the peptide through a nitrogen atom of an amino acid. Typically, the amino acid to which the glycan is bonded is asparagine (N) or arginine (R). In some examples, the amino acid to which the glycan is bonded is asparagine (N). In some examples, the amino acid to which the glycan is bonded is arginine (R).
In certain examples, the N-glycosylated peptides include members selected from the group consisting of Alpha-1-antitrypsin (A1AT), Alpha-1B-glycoprotein (A1BG), Leucine-rich Alpha-2-glycoprotein (A2GL), Alpha-2-macroglobulin (A2MG), Alpha-1-antichymotrypsin (AACT), Afamin (AFAM), Alpha-1-acid glycoprotein 1 & 2 (AGP12), Alpha-1-acid glycoprotein 1 (AGP1), Alpha-1-acid glycoprotein 2 (AGP2), Apolipoprotein A-I (APOA1), Apolipoprotein B-100 (APOB), Apolipoprotein D (APOD), Beta-2-glycoprotein-1 (APOH), Apolipoprotein M (APOM), Attractin (ATRN), Calpain-3 (CAN3), Ceruloplasmin (CERU), Complement Factor H (CFAH), Complement Factor I (CFAI), Clusterin (CLUS), Complement C3 (CO3), Complement C4-A&B (CO4A&CO4B), Complement component C6 (CO6), Complement component C8 A chain (CO8A), Coagulation factor XII (FA12), Haptoglobin (HPT), Histidine-rich Glycoprotein (HRG), Immunoglobulin heavy constant alpha 1&2 (IgA12), Immunoglobulin heavy constant alpha 2 (IgA2), Immunoglobulin heavy constant gamma 2 (IgG2), Immunoglobulin heavy constant mu (IgM), Inter-alpha-trypsin inhibitor heavy chain H1 (ITIH1), Plasma Kallikrein (KLKB1), Kininogen-1 (KNG1), Serum paraoxonase/arylesterase 1 (PON1), Selenoprotein P (SEPP1), Prothrombin (THRB), Serotransferrin (TRFE), Transthyretin (TTR), Protein unc-13 Homolog A (UN13A), Vitronectin (VTNC), Zinc-alpha-2-glycoprotein (ZA2G), Insulin-like growth factor-II (IGF2), Apolipoprotein C-I (APOC1), Hemopexin (HEMO), Immunoglobulin heavy constant gamma 1 (IgG1), Immunoglobulin J chain (IgJ), and combinations thereof.
In some examples, set forth herein is a glycopeptide or peptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof.
In some examples, set forth herein is a glycopeptide or peptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof.
III. Methods
Provided herein are methods of identifying glycoproteomic biomarkers and signatures that may be used to predict which cancer patients respond to immune checkpoint blockade treatment, such as PD-1/PD-L1 checkpoint inhibitors, and have an improvement or a positive change in their condition.
In some embodiments, individual glycopeptide expression levels are associated with various timepoints to determine which glycopeptides changed with events, such as death or metastasis, at the various timepoints. In some embodiments, individual glycopeptide expression levels are associated with time from treatment initiation to progression/metastasis (progression-free survival, PFS) or death (overall survival, OS) in the patient cohorts. In some embodiments, examples of individual glycopeptide expression levels are shown in
In some embodiments, multivariable models are used to predict OS and PFS in cancer patients. In some embodiments, the cancer patients have NSCLC or melanoma. In some embodiments, a small subset of glycopeptides is selected for modeling, a model is built with n−1 patients from a total of n patients, a survival score is predicted for the one holdout patient, and the steps are iterated over all patients as individual holdouts to generate unbiased prediction scores for every patient (a leave-one-out cross-validation approach, LOOCV). In some embodiments, the resulting scores are dichotomized at a cutoff which optimizes Harrell's C-index. In some embodiments, Kaplan-Meier (KM) curves are plotted for each glycopeptide.
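The cutoff selection described above can be sketched as a search over candidate cutoffs, keeping the one whose high/low split gives the largest Harrell's C-index. This is an assumption-laden sketch: the text does not say how candidate cutoffs are enumerated, pairs with tied times are simply skipped here, and the survival times, event flags, and scores are made up.

```python
def harrell_c(times, events, risk):
    """Harrell's C-index: share of comparable pairs in which the subject with
    the shorter observed time has the higher risk value (ties count 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable when subject i had the event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def best_cutoff(times, events, scores):
    """Try each observed score as a cutoff; return (cutoff, C-index) of the best."""
    best = None
    for cutoff in sorted(set(scores)):
        groups = [1.0 if s > cutoff else 0.0 for s in scores]
        c = harrell_c(times, events, groups)
        if best is None or c > best[1]:
            best = (cutoff, c)
    return best

times = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]   # hypothetical follow-up times
events = [1, 1, 1, 0, 1, 0]                # 1 = event observed, 0 = censored
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.1]    # higher = higher predicted risk
cutoff, c_index = best_cutoff(times, events, scores)
```

Dichotomizing discards ranking information, so the C-index of the split scores (11/13 here) is lower than that of the continuous scores (1.0 here).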
In some embodiments, a hazard ratio (HR), a p-value, and an interaction p-value are calculated. In some embodiments, the hazard ratio (HR) is calculated from a Cox proportional hazards model and represents the multiplicative increase in the hazard of death or progression for each increase of the biomarker by 1 unit. In some embodiments, the p-value is associated with the HR above. In some embodiments, P<0.01 is considered significant. In some embodiments, P≤0.05, P≤0.01, P≤0.005, or P≤0.001 is considered significant. In some embodiments, the interaction p-value is associated with the biomarker × treatment interaction; significance indicates potential for use in treatment selection.
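For reference, the quantities named above can be written in the standard Cox proportional hazards notation. The symbols are introduced here for illustration and do not appear in the disclosure: x is the biomarker amount, T is a treatment indicator (1 if treated, 0 otherwise), and h_0(t) is the baseline hazard.

```latex
% Single-biomarker Cox model and its hazard ratio per 1-unit increase in x
h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad
\mathrm{HR} = \frac{h(t \mid x+1)}{h(t \mid x)} = \exp(\beta)

% Model with a biomarker-by-treatment interaction term
h(t \mid x, T) = h_0(t)\,\exp(\beta_1 x + \beta_2 T + \beta_3\, x T)

% The interaction p-value tests the null hypothesis
H_0 : \beta_3 = 0
```

Under this formulation, a small interaction p-value suggests that the association between the biomarker and the outcome differs with and without treatment.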
In some embodiments, the model helps to determine whether the glycopeptide marker is individually predictive of OS. In some embodiments, the model helps to determine whether the glycopeptide marker is individually predictive of PFS. In some embodiments, the model helps to determine whether the glycopeptide marker is individually of use in treatment selection or varies with and without treatment. In some embodiments, individual Kaplan-Meier (KM) curves are plotted for the markers relevant in each disease for each outcome, such as OS or PFS. In some embodiments, hazard ratios and p-values on the plots are representative of the plotted high/low split at median biomarker expression. Examples of individual KM curves are shown in
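A Kaplan-Meier curve for a high/low split at median biomarker expression can be sketched as below. The estimator is the standard product-limit form; the expression values, times, and event flags are hypothetical, and the helper names are not from the disclosure.

```python
from statistics import median

def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival probability)] at each event time."""
    curve, survival = [], 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

def split_at_median(expression, times, events):
    """Return {'high': (times, events), 'low': (times, events)} by median expression."""
    cut = median(expression)
    groups = {"high": ([], []), "low": ([], [])}
    for x, t, e in zip(expression, times, events):
        label = "high" if x > cut else "low"
        groups[label][0].append(t)
        groups[label][1].append(e)
    return groups

expression = [9.0, 8.0, 7.0, 3.0, 2.0, 1.0]  # hypothetical biomarker amounts
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # hypothetical follow-up times
events = [1, 0, 1, 1, 0, 1]                  # 1 = event observed, 0 = censored
groups = split_at_median(expression, times, events)
high_curve = kaplan_meier(*groups["high"])
low_curve = kaplan_meier(*groups["low"])
```

Censored subjects leave the risk set without lowering the curve, which is why the survival estimate only steps down at observed event times.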
In some embodiments, patients are treated with a therapeutically effective amount of an immune-therapeutic. In some embodiments, the immune-therapeutic comprises an immune checkpoint inhibitor. In some embodiments, the checkpoint inhibitor comprises PD-1 inhibitors, PD-L1 inhibitors, or CTLA-4 inhibitors, or combinations thereof.
In some embodiments, patients are treated with a therapeutically effective amount of a targeted therapeutic agent. In some embodiments, the targeted therapeutic agent is a drug that targets blood vessel growth driven by vascular endothelial growth factor (VEGF), such as bevacizumab, ramucirumab, or ziv-aflibercept. In some embodiments, the targeted therapeutic agent comprises an epidermal growth factor receptor (EGFR) inhibitor. In some embodiments, the EGFR inhibitor comprises cetuximab or panitumumab. In some embodiments, the targeted therapeutic agent comprises a kinase inhibitor. In some embodiments, the kinase inhibitor comprises regorafenib.
In some embodiments, the patient is treated with a targeted therapy. In some embodiments, the methods herein include administering a therapeutically effective amount of one or more of 5-fluorouracil (5-FU), capecitabine, irinotecan, oxaliplatin, trifluridine, or tipiracil.
In some embodiments, provided herein are methods for detecting one or more multiple-reaction-monitoring (MRM) transitions, comprising: obtaining a biological sample from a patient, wherein the biological sample comprises one or more glycopeptides; digesting and/or fragmenting a glycopeptide in the sample; and detecting a multiple-reaction-monitoring (MRM) transition.
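As a rough sketch, an MRM transition can be represented as a precursor m/z paired with a product m/z, and detection as finding a non-zero signal for that pair within an m/z tolerance. The transition labels, precursor values, readings, and tolerance below are hypothetical; the product values are placed near the commonly reported HexNAc (about m/z 204.09) and Hex-HexNAc (about m/z 366.14) oxonium ions only to make the example concrete.

```python
from typing import NamedTuple

class Transition(NamedTuple):
    name: str            # hypothetical glycopeptide label
    precursor_mz: float  # m/z selected in the first quadrupole
    product_mz: float    # m/z of the fragment monitored after collision

def detect_transitions(targets, observed, tol=0.5):
    """Return {name: summed intensity} for target transitions that have at
    least one observed (precursor m/z, product m/z, intensity) reading within
    the tolerance. The tolerance is an illustrative assumption."""
    detected = {}
    for target in targets:
        for precursor_mz, product_mz, intensity in observed:
            if (abs(precursor_mz - target.precursor_mz) <= tol
                    and abs(product_mz - target.product_mz) <= tol
                    and intensity > 0):
                detected[target.name] = detected.get(target.name, 0.0) + intensity
    return detected

targets = [
    Transition("GP_X", 1050.4, 366.1),
    Transition("GP_Y", 987.7, 204.1),
]
observed = [(1050.5, 366.2, 1200.0), (640.3, 204.1, 300.0)]
found = detect_transitions(targets, observed)
```

Only GP_X is reported here: the second reading shares the product ion of GP_Y but not its precursor, so it does not count as that transition.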
In some embodiments, provided herein are methods of detecting one or more glycopeptides, wherein each glycopeptide is individually in each instance selected from a glycopeptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NOs: 21-46, 101-131, and 159-207, and combinations thereof. In some embodiments, provided herein are methods of detecting one or more glycopeptides, wherein each glycopeptide is individually in each instance selected from a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof.
In some embodiments, provided herein are methods of detecting one or more glycopeptides. In some examples, set forth herein is a method of detecting one or more glycopeptide fragments. In certain examples, the method includes detecting the glycopeptide group to which the glycopeptide, or fragment thereof, belongs. In some of these examples, the glycopeptide group is selected from Alpha-1-antitrypsin (A1AT), Alpha-1B-glycoprotein (A1BG), Leucine-rich Alpha-2-glycoprotein (A2GL), Alpha-2-macroglobulin (A2MG), Alpha-1-antichymotrypsin (AACT), Afamin (AFAM), Alpha-1-acid glycoprotein 1 & 2 (AGP12), Alpha-1-acid glycoprotein 1 (AGP1), Alpha-1-acid glycoprotein 2 (AGP2), Apolipoprotein A-I (APOA1), Apolipoprotein C-III (APOC3), Apolipoprotein B-100 (APOB), Apolipoprotein D (APOD), Beta-2-glycoprotein-1 (APOH), Apolipoprotein M (APOM), Attractin (ATRN), Calpain-3 (CAN3), Ceruloplasmin (CERU), Complement Factor H (CFAH), Complement Factor I (CFAI), Clusterin (CLUS), Complement C3 (CO3), Complement C4-A&B (CO4A&CO4B), Complement component C6 (CO6), Complement component C8 A chain (CO8A), Coagulation factor XII (FA12), Alpha-2-HS-glycoprotein (FETUA), Haptoglobin (HPT), Histidine-rich Glycoprotein (HRG), Immunoglobulin heavy constant alpha 1&2 (IgA12), Immunoglobulin heavy constant alpha 2 (IgA2), Immunoglobulin heavy constant gamma 2 (IgG2), Immunoglobulin heavy constant mu (IgM), Inter-alpha-trypsin inhibitor heavy chain H1 (ITIH1), Plasma Kallikrein (KLKB1), Kininogen-1 (KNG1), Serum paraoxonase/arylesterase 1 (PON1), Selenoprotein P (SEPP1), Prothrombin (THRB), Serotransferrin (TRFE), Transthyretin (TTR), Protein unc-13 Homolog A (UN13A), Vitronectin (VTNC), Zinc-alpha-2-glycoprotein (ZA2G), Insulin-like growth factor-II (IGF2), Apolipoprotein C-I (APOC1), and combinations thereof.
In some embodiments, provided herein are methods comprising detecting a glycopeptide, a glycan on the glycopeptide and the glycosylation site residue where the glycan bonds to the glycopeptide. In some embodiments, the method includes detecting a glycan residue. In some embodiments, the method includes detecting a glycosylation site on a glycopeptide. In some embodiments, this process is accomplished with mass spectroscopy used in tandem with liquid chromatography.
In some embodiments, provided herein are methods comprising obtaining a biological sample from a patient. In some examples, the biological sample is synovial fluid, whole blood, blood serum, blood plasma, urine, sputum, tissue, saliva, tears, spinal fluid, tissue section(s) obtained by biopsy; cell(s) that are placed in or adapted to tissue culture; sweat, mucous, fecal material, gastric fluid, abdominal fluid, amniotic fluid, cyst fluid, peritoneal fluid, pancreatic juice, breast milk, lung lavage, bone marrow, gastric acid, bile, semen, pus, aqueous humor, transudate, or combinations of the foregoing. In some examples, the biological sample is selected from the group consisting of blood, plasma, saliva, mucus, urine, stool, tissue, sweat, tears, hair, or a combination thereof. In some examples, the biological sample is a blood sample. In some examples, the biological sample is a plasma sample. In some examples, the biological sample is a saliva sample. In some examples, the biological sample is a mucus sample. In some examples, the biological sample is a urine sample. In some examples, the biological sample is a stool sample. In some examples, the biological sample is a sweat sample. In some examples, the biological sample is a tear sample. In some examples, the biological sample is a hair sample.
In some examples, the method comprises digesting and/or fragmenting a glycopeptide in the sample. In some examples, the method includes digesting a glycopeptide in the sample. In some examples, the method includes fragmenting a glycopeptide in the sample. In some examples, the digested or fragmented glycopeptide is analyzed using mass spectroscopy. In some examples, the glycopeptide is digested or fragmented in the solution phase using digestive enzymes. In some examples, the glycopeptide is digested or fragmented in the gaseous phase inside a mass spectrometer, or the instrumentation associated with a mass spectrometer. In some examples, the mass spectroscopy results are analyzed using machine-learning algorithms. In some examples, the mass spectroscopy results are the quantification of the glycopeptides, glycans, peptides, and fragments thereof. In some examples, this quantification is used as an input in a trained model to generate an output probability. In some examples, the output probability is a probability of being within a given category or classification, e.g., the classification of having cancer or the classification of not having cancer. In some examples, the output probability is a probability of being within a given category or classification, e.g., the classification of having an autoimmune disease or the classification of not having an autoimmune disease. In some examples, the output probability is a probability of being within a given category or classification, e.g., the classification of having fibrosis or the classification of not having fibrosis.
In some examples, the mass spectroscopy is performed using multiple reaction monitoring (MRM) mode. In some examples, the mass spectroscopy is performed using qTOF MS in data-dependent acquisition. In some examples, the mass spectroscopy is performed using MS-only mode.
In some examples, the method comprises introducing the sample, or a portion thereof, into a mass spectrometer. In some examples, the method comprises fragmenting a glycopeptide in the sample after introducing the sample, or a portion thereof, into the mass spectrometer. In some examples, digesting a glycopeptide in the sample occurs before introducing the sample, or a portion thereof, into the mass spectrometer. In some examples, the method comprises fragmenting a glycopeptide in the sample to provide a glycopeptide ion, a peptide ion, a glycan ion, a glycan adduct ion, or a glycan fragment ion. In some examples, the method comprises digesting and/or fragmenting a glycopeptide in the sample to provide one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof. In some examples, the method comprises digesting and/or fragmenting a glycopeptide in the sample to provide one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof.
In some examples, the method includes detecting an MRM transition indicative of a glycopeptide or glycan residue, wherein the glycopeptide consists essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 and combinations thereof. In some examples, the method includes detecting an MRM transition indicative of a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 and combinations thereof. In some examples, the method includes detecting more than one MRM transition indicative of a combination of glycopeptides having amino acid sequences selected from a combination of SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, the method includes detecting a MRM transition indicative of a glycopeptide or glycan residue, wherein the glycopeptide consists essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof. In some examples, the method includes detecting a MRM transition indicative of a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, and combinations thereof. In some examples, the method includes detecting a MRM transition indicative of a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 159-207, and combinations thereof. In some examples, the method includes detecting a MRM transition indicative of a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 21-46, and combinations thereof.
In some examples, the method includes detecting a MRM transition indicative of a glycopeptide or glycan residue, wherein the glycopeptide consists essentially of an amino acid sequence selected from the group consisting of SEQ ID NOs: 21-46, and combinations thereof. In some examples, the method includes detecting a MRM transition indicative of a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NOs: 101-131, and combinations thereof. In some examples, the method includes detecting a MRM transition indicative of a glycopeptide or glycan residue, wherein the glycopeptide consists essentially of an amino acid sequence selected from the group consisting of SEQ ID NOs: 159-207.
In some examples, the method comprises performing mass spectroscopy on the biological sample using multiple-reaction-monitoring mass spectroscopy (MRM-MS).
In some examples, the method includes digesting a glycoprotein in the sample to provide one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof. In some examples, the biological sample is combined with chemical reagents. In some examples, the biological sample is combined with enzymes. In some examples, the enzymes are lipases. In some examples, the enzymes are proteases. In some examples, the enzymes are serine proteases. In some examples, the enzyme is selected from the group consisting of trypsin, chymotrypsin, thrombin, elastase, and subtilisin. In some examples, the enzyme is trypsin. In some examples, the method comprises contacting at least two proteases with a glycopeptide in a sample. In some examples, the at least two proteases are selected from the group consisting of serine protease, threonine protease, cysteine protease, and aspartate protease. In some examples, the at least two proteases are selected from the group consisting of trypsin, chymotrypsin, endoproteinase Asp-N, Arg-C, Glu-C, Lys-C, pepsin, thermolysin, elastase, papain, proteinase K, subtilisin, clostripain, carboxypeptidase, glutamic acid protease, metalloprotease, and asparagine peptide lyase.
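To illustrate the trypsin case, an in-silico digestion can be sketched with the commonly cited rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The sequon scan for N-X-S/T (X not P) is the usual consensus motif for N-glycosylation and is added here as an assumption; neither rule is stated in the disclosure, and the input sequence is made up rather than taken from the Sequence Listing.

```python
import re

def tryptic_digest(sequence):
    """Split after K or R, except when the following residue is P.

    Relies on re.split accepting zero-width matches (Python 3.7 or later).
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def sequon_positions(peptide):
    """0-based positions of N in an N-X-S/T motif, where X is not P."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", peptide)]

sequence = "MKNGTLRPNASKAAR"  # hypothetical sequence for illustration
peptides = tryptic_digest(sequence)
candidates = {p: sequon_positions(p) for p in peptides if sequon_positions(p)}
```

The R-P bond is left intact, so both candidate N-glycosylation sites end up on the same peptide in this toy example.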
In some examples, the method includes detecting an MRM transition indicative of a glycopeptide or glycan residue, wherein the glycopeptide consists of, or consists essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 and combinations thereof. In some examples, the method includes detecting an MRM transition indicative of a glycopeptide or glycan residue, wherein the glycopeptide consists essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 and combinations thereof. In some examples, the method includes detecting an MRM transition indicative of a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 and combinations thereof. In some examples, the method includes detecting more than one MRM transition indicative of a combination of glycopeptides having amino acid sequences selected from a combination of SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, the method includes detecting an MRM transition indicative of a glycopeptide or glycan residue, wherein the glycopeptide consists essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof.
In some examples, the method comprises performing mass spectroscopy on the biological sample using multiple-reaction-monitoring mass spectroscopy (MRM-MS).
In some examples, the method comprises digesting a glycopeptide in the sample to provide a glycopeptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof. In some examples, the biological sample is contacted with one or more chemical reagents. In some examples, the biological sample is contacted with one or more enzymes. In some examples, the enzymes are lipases. In some examples, the enzymes are proteases. In some examples, the enzymes are serine proteases. In some examples, the enzyme is selected from the group consisting of trypsin, chymotrypsin, thrombin, elastase, and subtilisin. In some of these examples, the enzyme is trypsin. In some examples, the methods include contacting at least two proteases with a glycopeptide in a sample. In some examples, the at least two proteases are selected from the group consisting of serine protease, threonine protease, cysteine protease, and aspartate protease. In some examples, the at least two proteases are selected from the group consisting of trypsin, chymotrypsin, endoproteinase, Asp-N, Arg-C, Glu-C, Lys-C, pepsin, thermolysin, elastase, papain, proteinase K, subtilisin, clostripain, carboxypeptidase protease, glutamic acid protease, metalloprotease, and asparagine peptide lyase.
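By way of non-limiting illustration only, the protease digestion described above may be modeled in silico. The following sketch assumes the commonly cited trypsin rule (cleavage C-terminal to lysine or arginine unless the next residue is proline) and flags peptides that carry the N-glycosylation sequon N-X-S/T, where X is not proline. The function names and the example sequence are hypothetical and are not taken from the Sequence Listing.

```python
import re

def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep any C-terminal remainder that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
SEQUON = re.compile(r"N[^P][ST]")

def has_sequon(peptide):
    """Return True if the peptide contains a candidate N-glycosylation site."""
    return SEQUON.search(peptide) is not None

# Hypothetical sequence used only to exercise the functions.
example = "MKNGTLRPAAKWNPSR"
peptides = tryptic_digest(example)
candidates = [p for p in peptides if has_sequon(p)]
```

In this sketch, the arginine followed by proline is not cleaved, and only the peptide containing the N-G-T motif is retained as a candidate glycopeptide.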
In some examples, the method includes conducting tandem liquid chromatography-mass spectroscopy on the biological sample. In some examples, the method includes performing multiple-reaction-monitoring mass spectroscopy (MRM-MS) on the biological sample. In some examples, the method includes detecting an MRM transition using a triple quadrupole (QQQ) and/or a quadrupole time-of-flight (qTOF) mass spectrometer. In some examples, the method includes detecting an MRM transition using a QQQ mass spectrometer. In some examples, a suitable instrument for use with the instant methods is an Agilent 6495B Triple Quadrupole LC/MS. In some examples, the method includes detecting using a qTOF mass spectrometer. In some examples, a suitable instrument for use with the instant methods is an Agilent 6545 LC/Q-TOF.
In some examples, the method comprises detecting more than one MRM transition using a QQQ and/or qTOF mass spectrometer. In some examples, the method includes detecting more than one MRM transition using a QQQ mass spectrometer. In some examples, the method includes detecting more than one MRM transition using a qTOF mass spectrometer.
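By way of non-limiting illustration only, an MRM transition may be represented as a precursor m/z paired with a product m/z. The following sketch computes m/z from a neutral monoisotopic mass and a charge state, and matches an observed precursor/product pair against a transition list within a tolerance. The names `Transition`, `mz`, and `match_transition`, the 0.5 m/z tolerance, and the 2999.0 Da neutral mass are assumptions made for the example; the product ion at m/z 204.087 corresponds to the HexNAc oxonium ion commonly observed for glycopeptides.

```python
from collections import namedtuple

PROTON_MASS = 1.007276  # Da

# A transition pairs a precursor m/z with a product m/z.
Transition = namedtuple("Transition", ["name", "precursor_mz", "product_mz"])

def mz(neutral_mass, charge):
    """m/z of a positively charged ion formed by adding `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge

def match_transition(precursor, product, transitions, tol=0.5):
    """Return the first transition whose precursor and product both fall within tol."""
    for t in transitions:
        if abs(t.precursor_mz - precursor) <= tol and abs(t.product_mz - product) <= tol:
            return t
    return None

# Hypothetical panel with a single glycopeptide at charge state 3.
panel = [Transition("glycopeptide_A", mz(2999.0, 3), 204.087)]
hit = match_transition(1000.67, 204.09, panel)
miss = match_transition(900.0, 204.09, panel)
```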
In some examples, the methods herein include quantifying one or more glycomic parameters of the one or more biological samples, wherein the quantifying comprises employing a coupled chromatography procedure. In some examples, these glycomic parameters include the identification of a glycopeptide group, identification of glycans on the glycopeptide, identification of a glycosylation site, and identification of part of an amino acid sequence which the glycopeptide includes. In some examples, the coupled chromatography procedure comprises: performing or effectuating a liquid chromatography-mass spectrometry (LC-MS) operation. In some examples, the coupled chromatography procedure comprises: performing or effectuating a multiple reaction monitoring mass spectrometry (MRM-MS) operation. In some examples, the methods herein include a coupled chromatography procedure which comprises: performing or effectuating a liquid chromatography-mass spectrometry (LC-MS) operation; and effectuating a multiple reaction monitoring mass spectrometry (MRM-MS) operation. In some examples, the methods include training a machine-learning algorithm using one or more glycomic parameters of the one or more biological samples obtained by one or more of a triple quadrupole (QQQ) mass spectrometry operation and/or a quadrupole time-of-flight (qTOF) mass spectrometry operation. In some examples, the methods include training a machine-learning algorithm using one or more glycomic parameters of the one or more biological samples obtained by a triple quadrupole (QQQ) mass spectrometry operation. In some examples, the methods include training a machine-learning algorithm using one or more glycomic parameters of the one or more biological samples obtained by a quadrupole time-of-flight (qTOF) mass spectrometry operation.
In some examples, the methods include quantifying one or more glycomic parameters of the one or more biological samples, wherein the quantifying comprises employing one or more of a triple quadrupole (QQQ) mass spectrometry operation and a quadrupole time-of-flight (qTOF) mass spectrometry operation. In some examples, machine-learning algorithms are used to quantify these glycomic parameters. In some examples, including any of the foregoing, the mass spectroscopy is performed using multiple reaction monitoring (MRM) mode. In some examples, the mass spectroscopy is performed using qTOF MS in data-dependent acquisition. In some examples, the mass spectroscopy is performed using MS-only mode.
In some examples, the method includes detecting one or more MRM transitions indicative of glycans. In some examples, the method comprises quantifying a glycan. In some examples, the method comprises quantifying a first glycan and quantifying a second glycan; and further comprising comparing the quantification of the first glycan with the quantification of the second glycan. In some examples, the method comprises associating the detected glycan with the peptide residue site to which the glycan was bonded. In some examples, the method comprises generating a glycosylation profile of the sample. In some examples, the method comprises associating the detected glycan with a timepoint.
In some examples, the method includes spatially profiling glycans on a tissue section associated with the sample. In some examples, including any of the foregoing, the method includes spatially profiling glycopeptides on a tissue section associated with the sample. In some examples, the method includes matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectroscopy in combination with the methods herein.
In some examples, the method includes quantifying relative abundance of a glycan and/or a peptide.
In some examples, the method includes normalizing the amount of a glycopeptide by quantifying a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof and comparing that quantification to the amount of another chemical species. In some examples, the method includes normalizing the amount of a peptide by quantifying a glycopeptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof, and comparing that quantification to the amount of another glycopeptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. In some examples, the method includes normalizing the amount of a peptide by quantifying a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof, and comparing that quantification to the amount of another glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
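By way of non-limiting illustration only, the normalization described above may be sketched as a ratio of each glycopeptide abundance to the abundance of a chosen reference species. The function name `normalize`, the dictionary keys, and the abundance values are hypothetical.

```python
def normalize(abundances, reference_key):
    """Express each abundance relative to the abundance of the reference species."""
    reference = abundances[reference_key]
    if reference <= 0:
        raise ValueError("reference abundance must be positive")
    return {
        name: value / reference
        for name, value in abundances.items()
        if name != reference_key
    }

# Hypothetical integrated peak areas for two glycopeptides and one reference peptide.
raw = {
    "glycopeptide_1": 1200.0,
    "glycopeptide_2": 300.0,
    "reference_peptide": 6000.0,
}
relative = normalize(raw, "reference_peptide")
```

The reference species may equally be another glycopeptide; only the key passed as `reference_key` changes.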
In some embodiments, provided herein are methods for identifying a classification for a sample, the method comprising: quantifying by mass spectroscopy (MS) one or more glycopeptides in a sample, wherein the glycopeptides each, individually in each instance, comprise a glycopeptide consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof; inputting the quantification into a trained model to generate an output probability; determining if the output probability is above or below a threshold for a classification; and identifying a classification for the sample based on whether the output probability is above or below a threshold for a classification.
In some examples, provided herein are methods for identifying glycopeptide biomarkers, comprising: obtaining a biological sample from a patient; digesting and/or fragmenting a glycopeptide in the sample; detecting a multiple-reaction-monitoring (MRM) transition; and classifying the glycopeptides based on the MRM transitions detected. In some examples, a machine-learning algorithm is used to train a model using the analyzed MRM transitions as inputs. In some examples, a machine-learning algorithm is trained using the MRM transitions as a training data set. In some examples, the methods herein include identifying glycopeptides, peptides, and glycans based on their mass spectroscopy relative abundance. In some examples, a machine-learning algorithm or algorithms select and/or identify peaks in a mass spectroscopy spectrum. In some examples, the MS is MRM-MS with a QQQ and/or qTOF mass spectrometer.
In some examples, including any of the foregoing, the mass spectroscopy is performed using multiple reaction monitoring (MRM) mode. In some examples, the mass spectroscopy is performed using qTOF MS in data-dependent acquisition. In some examples, the mass spectroscopy is performed using MS-only mode.
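By way of non-limiting illustration only, the steps of generating an output probability and comparing it to a threshold may be sketched as follows. The sketch assumes a logistic model as the trained model; the weights, intercept, threshold of 0.5, class labels, and glycopeptide names are hypothetical, and treating a probability equal to the threshold as positive is likewise an assumption.

```python
import math

def output_probability(quantities, weights, intercept):
    """Logistic model: map a weighted sum of quantified glycopeptides to (0, 1)."""
    score = intercept + sum(weights[name] * quantities[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-score))

def classify(probability, threshold=0.5, positive="disease", negative="control"):
    """Assign a classification depending on which side of the threshold the probability falls."""
    return positive if probability >= threshold else negative

# Hypothetical trained parameters and one hypothetical sample.
weights = {"gp_a": 2.0, "gp_b": -1.0}
intercept = -0.5
sample = {"gp_a": 1.0, "gp_b": 0.5}

probability = output_probability(sample, weights, intercept)
label = classify(probability)
```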
In some examples, the machine-learning algorithm is selected from the group consisting of a deep learning algorithm, a neural network algorithm, an artificial neural network algorithm, a supervised machine-learning algorithm, a linear discriminant analysis algorithm, a quadratic discriminant analysis algorithm, a support vector machine algorithm, a linear basis function kernel support vector algorithm, a radial basis function kernel support vector algorithm, a random forest algorithm, a genetic algorithm, a nearest neighbor algorithm, k-nearest neighbors, a naive Bayes classifier algorithm, a logistic regression algorithm, or a combination thereof. In certain examples, the machine-learning algorithm is lasso regression.
In some examples, the method includes classifying a sample as within, or embraced by, a disease classification or a disease severity classification.
In some examples, the classification is identified with 80% confidence, 85% confidence, 90% confidence, 95% confidence, 99% confidence, or 99.9999% confidence.
In some examples, the method includes quantifying by MS the glycopeptide in a sample at a first time point; quantifying by MS the glycopeptide in a sample at a second time point; and comparing the quantification at the first time point with the quantification at the second time point.
In some examples, the method includes quantifying by MS a different glycopeptide in a sample at a third time point; quantifying by MS the different glycopeptide in a sample at a fourth time point; and comparing the quantification at the fourth time point with the quantification at the third time point.
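By way of non-limiting illustration only, the comparison of quantifications at two time points may be expressed as a log2 fold change. The function names and the one-log2-unit cutoff for calling a change are assumptions chosen for the example, not values taken from the disclosure.

```python
import math

def log2_fold_change(first, second):
    """log2 of the ratio of the later quantification to the earlier one."""
    if first <= 0 or second <= 0:
        raise ValueError("quantifications must be positive")
    return math.log2(second / first)

def trend(first, second, min_abs_log2=1.0):
    """Label the change between two time points using a symmetric log2 cutoff."""
    change = log2_fold_change(first, second)
    if change >= min_abs_log2:
        return "increased"
    if change <= -min_abs_log2:
        return "decreased"
    return "stable"

# Hypothetical abundances of one glycopeptide at a first and a second time point.
status = trend(100.0, 400.0)
```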
In some examples, the method includes monitoring the health status of a patient.
In some examples, monitoring the health status of a patient includes monitoring the onset and progression of disease in a patient with risk factors such as genetic mutations, as well as detecting cancer recurrence.
In some examples, the method includes diagnosing a patient with a disease or condition based on the quantification. In some examples, the method includes treating the patient with a therapeutically effective amount of a therapeutic agent comprising one or more of a chemotherapeutic, an immunotherapy, a hormone therapy, a targeted therapy, a neoadjuvant therapy, and surgery. In some embodiments, the treatment comprises checkpoint inhibitors. In some examples, the method includes diagnosing an individual with a disease or condition based on the quantification. In some examples, the method includes treating the individual with a therapeutically effective amount of a treatment.
In some examples, provided herein are methods for assessing a patient having a disease or condition, comprising measuring by mass spectroscopy a glycopeptide in a sample from the patient.
In another embodiment, provided herein are methods for assessing a patient having cancer; the method comprising: obtaining a biological sample from the patient; performing mass spectroscopy of the biological sample using MRM-MS with a QQQ and/or qTOF spectrometer to detect and quantify one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46; inputting the quantification of the detected glycopeptides or the MRM transitions into a trained model to generate an output probability, determining if the output probability is above or below a threshold for a classification; and identifying a diagnostic classification for the patient based on whether the output probability is above or below a threshold for a classification; and assessing the patient as having cancer based on the classification.
In another embodiment, set forth herein is a method for diagnosing a patient having cancer; the method comprising: inputting the quantification of detected glycopeptides or MRM transitions into a trained model to generate an output probability, determining if the output probability is above or below a threshold for a classification; and identifying a diagnostic classification for the patient based on whether the output probability is above or below a threshold for a classification; and assessing the patient based on the classification. In some examples, the method includes obtaining a biological sample from the patient; performing mass spectroscopy of the biological sample using MRM-MS with a QQQ and/or qTOF spectrometer to detect and quantify one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 21-46, 101-131, and 159-207.
In some examples, set forth herein is a method for assessing a patient having cancer; the method comprising: obtaining a biological sample from the patient; performing mass spectroscopy of the biological sample using MRM-MS with a QQQ and/or qTOF spectrometer to detect one or more glycopeptides consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46; analyzing the detected glycopeptides or the MRM transitions to identify a classification; and assessing the patient based on the diagnostic classification.
In some examples, set forth herein is a method for assessing a patient having cancer; the method comprising: analyzing detected or quantified glycopeptides or MRM transitions to identify a classification; and assessing the patient based on the classification. In some examples, the method includes obtaining a biological sample from the patient; and performing mass spectroscopy of the biological sample using MRM-MS with a QQQ and/or qTOF spectrometer to detect one or more glycopeptides consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, set forth herein is a method for diagnosing, monitoring, or classifying aging in an individual; the method comprising: obtaining a biological sample from the individual; performing mass spectroscopy of the biological sample using MRM-MS with a QQQ and/or qTOF spectrometer to detect one or more glycopeptides consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46; analyzing the detected glycopeptides or the MRM transitions to identify a diagnostic classification; and diagnosing, monitoring, or classifying the individual as having an aging classification based on the diagnostic classification.
Provided herein are biomarkers for diagnosing a variety of diseases and conditions. In some examples, the diseases and conditions include cancer. In some examples, the diseases and conditions are not limited to cancer.
In some embodiments, cancer refers to a physiological condition in a subject that is typically characterized by unregulated cell growth. Examples of cancer include, but are not limited to, melanoma, carcinoma, lymphoma, blastoma, sarcoma, and leukemia and metastases thereof. The term “metastasis” refers to the transference of disease-producing organisms or of malignant or cancerous cells to other parts of the body by way of the blood or lymphatic vessels or membranous surfaces. Non-limiting examples of such cancers include small-cell lung cancer, non-small cell lung cancer, adenocarcinoma of the lung, squamous carcinoma of the lung, melanoma, squamous cell cancer, cancer of the peritoneum, hepatocellular cancer, gastrointestinal cancer, pancreatic cancer, glioblastoma, cervical cancer, ovarian cancer, liver cancer, bladder cancer, hepatoma, breast cancer, colon cancer, colorectal cancer, endometrial or uterine carcinoma, salivary gland carcinoma, kidney cancer, prostate cancer, thyroid cancer, hepatic carcinoma and various types of head and neck cancer. The phrase “stage of disease” refers to the stages of cancer progression referred to as Stage I, II, III, or IV. Stage of disease indicates if metastasis has occurred in the subject.
In some examples, the “patient” described herein is equivalently described as an “individual.” For example, in some methods herein, set forth are biomarkers for monitoring or diagnosing a disease or a condition in an individual. In some of these examples, the individual is not necessarily a patient who has a medical condition in need of therapy.
In some examples, the methods herein comprise quantifying one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 using mass spectroscopy and/or liquid chromatography. In some examples, the quantification results are used as inputs in a trained model. In some examples, the quantification results are classified or categorized with a predictive algorithm based on the absolute amount, relative amount, and/or type of each glycan or glycopeptide quantified in the test sample, wherein the predictive algorithm is trained on corresponding values for each marker obtained from a population of individuals having known diseases or conditions. In some examples, the disease or condition is cancer. In some cases, the disease or condition is melanoma. In some cases, the disease or condition is NSCLC.
In some examples, including any of the foregoing, set forth herein is a method for training a machine-learning algorithm, comprising: providing a first data set of MRM transition signals indicative of a sample comprising a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46; providing a second data set of MRM transition signals indicative of a control sample; and comparing the first data set with the second data set using a machine-learning algorithm.
In some examples, the methods herein include using a sample comprising a glycopeptide consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, wherein the sample is a sample from a patient having the disease or condition. In some examples, the methods herein include using a sample comprising a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, wherein the sample is a sample from a patient having cancer. In some examples, the methods herein include using a control sample, wherein the control sample is a sample from a patient not having the disease or condition.
In some examples, the methods herein include using a sample comprising a glycopeptide consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, which is a pooled sample from one or more patients having the disease or condition. In some examples, the methods herein include using a control sample, which is a pooled sample from one or more patients not having the disease or condition.
In some examples, the methods include generating machine-learning models trained using mass spectrometry data (e.g., MRM-MS transition signals) from patients having a disease or condition and patients not having a disease or condition. In some examples, the disease or condition is cancer. In some examples, the methods include optimizing the machine-learning models by cross-validation with known standards or other samples. In some examples, the methods include qualifying the performance using the mass spectrometry data to form panels of glycans and glycopeptides with individual sensitivities and specificities. In certain examples, the methods include determining a confidence percent in relation to a diagnosis. In some examples, one to ten glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 may be useful for diagnosing a patient with the disease or condition with a certain confidence percent. In some examples, ten to fifty glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 may be useful for diagnosing a patient with the disease or condition with a higher confidence percent.
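By way of non-limiting illustration only, the performance of a model or marker panel may be summarized by sensitivity and specificity at a chosen threshold, and by the area under the receiver operating characteristic curve (AUC). The following sketch computes the AUC as the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The labels, scores, and threshold are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores at or above the threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for label, score in pairs if label == 1 and score >= threshold)
    fn = sum(1 for label, score in pairs if label == 1 and score < threshold)
    tn = sum(1 for label, score in pairs if label == 0 and score < threshold)
    fp = sum(1 for label, score in pairs if label == 0 and score >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control."""
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical labels (1 = disease, 0 = control) and model output probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

sensitivity, specificity = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
```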
In some examples, including any of the foregoing, the methods include performing MRM-MS and/or LC-MS on a biological sample. In some examples, the methods include constructing, by a computing device, theoretical mass spectra data representing a plurality of mass spectra, wherein each of the plurality of mass spectra corresponds to one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. In some examples, the methods include comparing, by the computing device, the mass spectra data with the theoretical mass spectra data to generate comparison data indicative of a similarity of each of the plurality of mass spectra to each of the plurality of theoretical target mass spectra associated with a corresponding glycopeptide of the plurality of glycopeptides.
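By way of non-limiting illustration only, the similarity between an observed mass spectrum and a theoretical mass spectrum may be scored with a cosine similarity over binned peaks. Cosine similarity is one possible comparison measure and is an assumption of this sketch; the bin width, peak lists, and candidate names are hypothetical.

```python
import math

def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks that fall into the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(a, b):
    """Cosine of the angle between two binned spectra; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical observed spectrum and two hypothetical theoretical spectra.
observed = bin_spectrum([(204.09, 50.0), (366.14, 100.0)])
theoretical = {
    "candidate_a": bin_spectrum([(204.09, 1.0), (366.14, 2.0)]),
    "candidate_b": bin_spectrum([(657.23, 1.0)]),
}

similarities = {name: cosine_similarity(observed, spec) for name, spec in theoretical.items()}
best_match = max(similarities, key=similarities.get)
```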
In some examples, machine-learning algorithms are used to determine, by the computing device and based on the MRM-MS data, a distribution of a plurality of characteristic ions in the plurality of mass spectra; and determining, by the computing device and based on the distribution, whether one or more of the plurality of characteristic ions is a glycopeptide ion.
In some examples, the methods herein include training a predictive algorithm. Herein, training the predictive algorithm may refer to supervised learning of a predictive algorithm on the basis of values for one or more glycopeptides consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. Training the predictive algorithm may refer to variable selection in a statistical model on the basis of values for one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. Training a predictive algorithm may for example include determining a weighting vector in feature space for each category, or determining a function or function parameters.
In some examples, the machine-learning algorithm is LASSO, Ridge Regression, Random Forests, K-nearest Neighbors (KNN), Deep Neural Networks (DNN), or Principal Components Analysis (PCA). In certain examples, DNNs are used to process mass spec data into analysis-ready forms. In some examples, DNNs are used for peak picking from mass spectra. In some examples, PCA is useful in feature detection.
In some examples, LASSO is used to provide feature selection.
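By way of non-limiting illustration only, feature selection by LASSO may be sketched with a small coordinate-descent solver that minimizes (1/(2n))·||y − Xw||² + alpha·||w||₁ without an intercept. Features whose coefficients are driven exactly to zero are treated as not selected. The solver, the penalty value alpha = 0.5, the feature names, and the data are hypothetical; in practice an established library implementation would typically be used.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso(X, y, alpha, n_iter=200):
    """Coordinate descent for the LASSO objective (no intercept; columns assumed centered)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from every feature except feature j.
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w

# Hypothetical centered data: the response depends strongly on the first
# feature and only weakly on the second.
feature_names = ["gp_a", "gp_b"]
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.2, -1.8, 1.8, -2.2]

coefficients = lasso(X, y, alpha=0.5)
selected = [name for name, c in zip(feature_names, coefficients) if c != 0.0]
```

In this sketch, the weakly associated feature is shrunk to exactly zero and only the first feature is selected.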
In some examples, machine-learning algorithms are used to quantify peptides from each protein that are representative of the protein abundance. In some examples, this quantification includes quantifying proteins for which glycosylation is not measured.
In some examples, glycopeptide sequences are identified by fragmentation in the mass spectrometer and database search using Byonic software (Protein Metrics Inc).
In some examples, the methods herein include unsupervised learning to detect features of MRM-MS data that represent known biological quantities, such as protein function or glycan motifs. In certain examples, these features are used as input for classifying by machine-learning. In some examples, the classification is performed using LASSO, Ridge Regression, or Random Forest.
In some examples, the methods herein include mapping input data (e.g., MRM transition peaks) to a value (e.g., a scale based on 0-100) before processing the value in an algorithm. For example, after an MRM transition is identified and the peak characterized, the methods herein include assessing the MS scans in an m/z and retention time window around the peak for a given patient. In some examples, the resulting chromatogram is integrated by a machine-learning algorithm that determines the peak start and stop points, and calculates the area bounded by those points and the intensity (height). The resulting integrated value is the abundance, which then feeds into the training and data sets used for machine-learning and statistical analyses.
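By way of non-limiting illustration only, the integration of a chromatographic peak may be sketched as follows. In place of a machine-learning algorithm for determining the peak start and stop points, this sketch substitutes a simple rule: starting at the apex, it walks outward until the intensity falls to or below a fraction of the apex height, and then integrates by the trapezoidal rule. The baseline fraction of 0.05, the function names, and the data are assumptions made for the example. A second helper maps a set of abundances onto a 0-100 scale by min-max rescaling, which is one possible reading of the scale mentioned above.

```python
def integrate_peak(times, intensities, baseline_fraction=0.05):
    """Locate the apex, find start/stop by a height cutoff, and integrate the area."""
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    cutoff = intensities[apex] * baseline_fraction
    start = apex
    while start > 0 and intensities[start - 1] > cutoff:
        start -= 1
    stop = apex
    while stop < len(intensities) - 1 and intensities[stop + 1] > cutoff:
        stop += 1
    area = 0.0
    for i in range(start, stop):
        # Trapezoidal rule between consecutive scans.
        area += 0.5 * (intensities[i] + intensities[i + 1]) * (times[i + 1] - times[i])
    return {
        "start": times[start],
        "stop": times[stop],
        "height": intensities[apex],
        "area": area,
    }

def rescale_0_100(values):
    """Min-max rescale a list of abundances onto a 0-100 scale."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [100.0 * (v - low) / (high - low) for v in values]

# Hypothetical extracted chromatogram (retention time in minutes, intensity in counts).
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
intensities = [0.0, 0.0, 10.0, 40.0, 10.0, 0.0, 0.0]

peak = integrate_peak(times, intensities)
scaled = rescale_0_100([50.0, 100.0, 150.0])
```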
In some examples, machine-learning output, in one instance, is used as machine-learning input in another instance. For example, in addition to the PCA being used for a classification process, the DNN data processing feeds into PCA and other analyses. This results in at least three levels of algorithmic processing. Other hierarchical structures are contemplated within the scope of the instant disclosure.
In some examples, the methods include comparing the amount of each glycan or glycopeptide quantified in the sample to corresponding reference values for each glycan or glycopeptide in a predictive algorithm. In some examples, the methods include a comparative process by which the amount of a glycan or glycopeptide quantified in the sample is compared to a reference value for the same glycan or glycopeptide using a predictive algorithm. The comparative process may be part of a classification by a predictive algorithm. The comparative process may occur at an abstract level, e.g., in n-dimensional feature space or in a higher dimensional space.
In some examples, the methods herein include classifying a patient's sample based on the amount of each glycan or glycopeptide quantified in the sample with a predictive algorithm. In some examples, the methods include using statistical or machine-learning classification processes by which the amount of a glycan or glycopeptide quantified in the test sample is used to determine a category of health with a predictive algorithm. In some examples, the predictive algorithm is a statistical or machine-learning classification algorithm.
In some examples, classification by a predictive algorithm may include scoring likelihood of a panel of glycan or glycopeptide values belonging to each possible category, and determining the highest-scoring category. Classification by a predictive algorithm may include comparing a panel of marker values to previous observations by means of a distance function. Examples of predictive algorithms suitable for classification include random forests, support vector machines, logistic regression (e.g. multiclass or multinomial logistic regression, and/or algorithms adapted for sparse logistic regression). A wide variety of other predictive algorithms that are suitable for classification may be used, as known to a person skilled in the art.
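By way of non-limiting illustration only, classification by comparing a panel of marker values to previous observations by means of a distance function may be sketched with a k-nearest-neighbors vote. The Euclidean distance, the value k = 3, the category labels, and the observations are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length panels of marker values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(sample, observations, k=3):
    """Return the majority category among the k previous observations nearest to the sample."""
    nearest = sorted(observations, key=lambda obs: euclidean(sample, obs[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical previous observations: (panel of two marker values, known category).
observations = [
    ([1.0, 1.0], "control"),
    ([1.2, 0.9], "control"),
    ([0.9, 1.1], "control"),
    ([3.0, 3.2], "disease"),
    ([3.1, 2.9], "disease"),
    ([2.8, 3.0], "disease"),
]

category = knn_classify([2.9, 3.0], observations)
```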
In some examples, the methods herein include supervised learning of a predictive algorithm on the basis of values for each glycan or glycopeptide obtained from a population of individuals having a disease or condition (e.g., melanoma or NSCLC). In some examples, the methods include variable selection in a statistical model on the basis of values for each glycan or glycopeptide obtained from a population of individuals having the disease or condition. Training a predictive algorithm may for example include determining a weighting vector in feature space for each category, or determining a function or function parameters.
In one embodiment, the reference value is the amount of a glycan or glycopeptide in a sample or samples derived from one individual. Alternatively, the reference value may be derived by pooling data obtained from multiple individuals, and calculating an average (for example, mean or median) amount for a glycan or glycopeptide. Thus, the reference value may reflect the average amount of a glycan or glycopeptide in multiple individuals. Said amounts may be expressed in absolute or relative terms, in the same manner as described herein.
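By way of non-limiting illustration only, a reference value derived by pooling data from multiple individuals may be computed as a mean or a median. The function name and the amounts are hypothetical; the example includes one high value to show how the two averages can differ.

```python
from statistics import mean, median

def reference_value(amounts, method="median"):
    """Pool amounts from multiple individuals into a single reference value."""
    if not amounts:
        raise ValueError("at least one amount is required")
    if method == "mean":
        return mean(amounts)
    if method == "median":
        return median(amounts)
    raise ValueError("method must be 'mean' or 'median'")

# Hypothetical amounts of one glycopeptide measured in four individuals.
pooled = [10.0, 12.0, 11.0, 50.0]
reference_mean = reference_value(pooled, method="mean")
reference_median = reference_value(pooled, method="median")
```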
In some examples, the reference value may be derived from the same sample as the sample that is being tested, thus allowing for an appropriate comparison between the two. For example, if the sample is derived from urine, the reference value is also derived from urine. In some examples, if the sample is a blood sample (e.g. a plasma or a serum sample), then the reference value will also be a blood sample (e.g. a plasma sample or a serum sample, as appropriate). When comparing between the sample and the reference value, the way in which the amounts are expressed is matched between the sample and the reference value. Thus, an absolute amount can be compared with an absolute amount, and a relative amount can be compared with a relative amount. Similarly, the way in which the amounts are expressed for classification with the predictive algorithm is matched to the way in which the amounts are expressed for training the predictive algorithm.
When the amounts of the glycan or glycopeptide are determined, the method may comprise comparing the amount of each glycan or glycopeptide to its corresponding reference value. When the cumulative amount of one, some or all the glycan or glycopeptides are determined, the method may comprise comparing the cumulative amount to a corresponding reference value. When the amounts of the glycan or glycopeptides are combined with each other in a formula to form an index value, the index value can be compared to a corresponding reference index value derived in the same manner.
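By way of non-limiting illustration only, combining amounts in a formula to form an index value, and comparing that index to a reference index derived in the same manner, may be sketched as follows. The linear formula, the coefficients, and the amounts are hypothetical; the disclosure does not specify a particular formula.

```python
def index_value(amounts, coefficients):
    """Combine glycopeptide amounts into one index using a linear formula."""
    return sum(coefficients[name] * amounts[name] for name in coefficients)

# Hypothetical coefficients, test-sample amounts, and reference amounts.
coefficients = {"gp_a": 1.0, "gp_b": -0.5}
sample_amounts = {"gp_a": 4.0, "gp_b": 2.0}
reference_amounts = {"gp_a": 2.0, "gp_b": 2.0}

# The reference index is derived with the same formula as the sample index.
sample_index = index_value(sample_amounts, coefficients)
reference_index = index_value(reference_amounts, coefficients)
difference = sample_index - reference_index
```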
The reference values may be obtained either within (i.e., constituting a step of) or external to the (i.e., not constituting a step of) methods described herein. In some examples, the methods include a step of establishing a reference value for the quantity of the markers. In other examples, the reference values are obtained externally to the method described herein and accessed during the comparison step of the invention.
In certain embodiments, the lasso regression machine-learning model may be a regression model or other classification model that may be evaluated utilizing receiver operating characteristic (ROC) evaluation and/or area under the curve (AUC) evaluation. For example, in certain embodiments, as will be further illustrated with respect to
In some examples, including any of the foregoing, training of a predictive algorithm may be obtained either within (i.e., constituting a step of) or external to (i.e., not constituting a step of) the methods set forth herein. In some examples, the methods include a step of training of a predictive algorithm. In some examples, the predictive algorithm is trained externally to the method herein and accessed during the classification step of the invention. The reference value may be determined by quantifying the amount of a glycan or glycopeptide in a sample obtained from a population of healthy individual(s). The predictive algorithm may be trained by quantifying the amount of a glycan or glycopeptide in a sample obtained from a population of healthy individual(s). As used herein, the term “healthy individual” refers to an individual or group of individuals who are in a healthy state, e.g., patients who have not shown any symptoms of the disease, have not been diagnosed with the disease and/or are not likely to develop the disease. Preferably said healthy individual(s) is not on medication affecting the disease and has not been diagnosed with any other disease. The one or more healthy individuals may have a similar sex, age and body mass index (BMI) as compared with the test individual. The reference value may be determined by quantifying the amount of a glycan or glycopeptide in a sample obtained from a population of individual(s) suffering from the disease. The predictive algorithm may be trained by quantifying the amount of a marker in a sample obtained from a population of individual(s) suffering from the disease. More preferably such individual(s) may have similar sex, age and body mass index (BMI) as compared with the test individual. The reference value may be obtained from a population of individuals suffering from cancer. 
The predictive algorithm may be trained by quantifying the amount of a glycan or glycopeptide in a sample obtained from a population of individuals suffering from cancer. Once the characteristic glycan or glycopeptide profile of cancer is determined, the profile of markers from a biological sample obtained from an individual may be compared to this reference profile to determine whether the test subject also has cancer. Once the predictive algorithm is trained to classify cancer, the profile of markers from a biological sample obtained from an individual may be classified by the predictive algorithm to determine whether the test subject is also at that particular stage of cancer.
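For illustration, one way in which a trained predictive algorithm of the kind described above may be evaluated is by the area under the ROC curve. The sketch below computes that area from the rank-based (Mann-Whitney) identity, i.e., the fraction of diseased/healthy pairs in which the diseased sample receives the higher score. The labels and scores are hypothetical, and no particular model of the present disclosure is implied.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores: 1 = disease, 0 = healthy.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

A value of 1.0 would indicate perfect separation of the two populations, and a value near 0.5 would indicate no discrimination.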
In some examples, including any of the foregoing, set forth herein is a kit comprising a glycopeptide standard, a buffer, and one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, including any of the foregoing, set forth herein is a kit comprising a glycopeptide standard, a buffer, and one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, set forth herein is a kit comprising a glycopeptide standard, a buffer, and one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46, and combinations thereof. In some examples, set forth herein is a kit comprising a glycopeptide standard, a buffer, and one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131. In some examples, set forth herein is a kit comprising a glycopeptide standard, a buffer, and one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 159-207. In some examples, set forth herein is a kit comprising a glycopeptide standard, a buffer, and one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 21-46.
In some examples, set forth herein is a kit for diagnosing or monitoring cancer in an individual wherein the glycan or glycopeptide profile of a sample from said individual is determined and the measured profile is compared with a profile of a normal patient or a profile of a patient with a family history of cancer. In some examples, the kit comprises one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. In some examples, the kit comprises one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, set forth herein is a kit comprising the reagents for quantification of the oxidized, nitrated, and/or glycated free adducts derived from glycopeptides.
In some examples, the biomarkers, methods, and/or kits may be used in a clinical setting for diagnosing patients. In some of these examples, the analysis of samples includes the use of internal standards. These standards may include one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. These standards may include one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
In a clinical setting, samples may be prepared (e.g., by digestion) to include one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. In a clinical setting, samples may be prepared (e.g., by digestion) to include one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. In some examples, the amount of a glycan or glycopeptide may be assessed by comparing the amount of one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 to the concentration of another biomarker. In some examples, the amount of a glycan or glycopeptide may be assessed by comparing the amount of one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 to the concentration of another biomarker.
In some examples, the amount of a glycan or glycopeptide may be assessed by comparing the amount of one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NOs: 300-429 to the amount of one or more glycopeptides consisting of an amino acid sequence selected from the group consisting of SEQ ID NOs: 300-429.
In some examples, the amount of a glycan or glycopeptide may be assessed by comparing the amount of one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 to the amount of one or more glycopeptides consisting essentially of an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
In some examples, including any of the foregoing, the kit may include software for computing the normalization of a glycopeptide MRM transition signal.
In some examples, including any of the foregoing, the kit may include software for quantifying the amount of a glycopeptide consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46. In some examples, including any of the foregoing, the kit may include software for quantifying the relative amount of a glycopeptide consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46.
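As a minimal sketch of the kind of computation such software may perform, the example below normalizes a glycopeptide MRM transition signal to an internal standard signal and expresses glycoform signals as relative amounts. Both normalization schemes, the function names, and the peak areas are assumptions made for illustration; the disclosure does not specify the normalization at this point.

```python
def normalize_to_standard(signal, standard_signal):
    """Divide a transition peak area by the peak area of an internal standard."""
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return signal / standard_signal

def relative_amounts(signals):
    """Express each signal as a fraction of the summed signal of all listed glycoforms."""
    total = sum(signals.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {name: value / total for name, value in signals.items()}

# Hypothetical peak areas for two glycoforms sharing one peptide backbone.
peak_areas = {"glycoform_1": 300.0, "glycoform_2": 100.0}
normalized = normalize_to_standard(peak_areas["glycoform_1"], 600.0)
relative = relative_amounts(peak_areas)
```

Whichever scheme is used, the same scheme would be applied to test samples and to the data used for reference values or training, consistent with the matching of absolute and relative amounts described herein.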
In some examples, including any of the foregoing, a trained model is stored on a server which is accessed by a clinician performing a method set forth herein. In some examples, the clinician inputs the quantification of the MRM transition signals from a patient's sample into a trained model which is stored on a server. In some examples, the server is accessed by the internet, wireless communication, or other digital or telecommunication methods.
In some examples, including any of the foregoing, a trained model is stored on a server which is accessed by a clinician performing a method set forth herein. In some examples, the clinician inputs the quantification of the glycopeptide or glycopeptides consisting of, or consisting essentially of, an amino acid sequence selected from the group consisting of SEQ ID NO: 101-131, 159-207, and 21-46 from a patient's sample into a trained model which is stored on a server.
In some examples, the server is accessed by the internet, wireless communication, or other digital or telecommunication methods.
Individual Kaplan-Meier (KM) curves may be plotted for the markers relevant to the disease of interest in four files. Hazard ratios and p-values on the plots are representative of the plotted high/low split at median biomarker expression.
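By way of example only, a high/low split at median biomarker expression and the resulting Kaplan-Meier survival estimates may be sketched as follows. The follow-up times, event indicators, and expression values are hypothetical, and the sketch does not compute the hazard ratios or p-values referred to above.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    events uses 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve

def split_at_median(values):
    """Return (low, high) index lists; values above the median are 'high'."""
    cut = median(values)
    low = [i for i, v in enumerate(values) if v <= cut]
    high = [i for i, v in enumerate(values) if v > cut]
    return low, high

# Hypothetical follow-up times, event indicators, and biomarker expression.
times = [2, 4, 4, 6]
events = [1, 1, 0, 1]
expression = [1.0, 2.0, 3.0, 4.0]

low, high = split_at_median(expression)
low_curve = kaplan_meier([times[i] for i in low], [events[i] for i in low])
high_curve = kaplan_meier([times[i] for i in high], [events[i] for i in high])
```

Each returned curve can be drawn as a step function, one per group, to produce a plot of the kind described.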
IV. Additional Proteins and Glycopeptides
In some embodiments, provided herein are methods for diagnosing a melanoma condition (metastatic melanoma) comprising detecting one or more biomarkers. In some embodiments, the one or more biomarkers comprise one or more glycopeptides. In some embodiments, the one or more biomarkers comprise one or more peptide structures set forth in Table 7. In some embodiments, the method comprises detecting one or more glycopeptides comprising a sequence set forth in SEQ ID NOs: 21-46. In some embodiments, the glycopeptide comprises a glycan with the structures in Table 7. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 8. In some embodiments, the glycopeptide is a glycopeptide provided in Table 16. In some embodiments, the glycopeptide comprises a sequence set forth in SEQ ID NO:300-429. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising SEQ ID NO:1-20.
In some embodiments, the diagnosis is based upon the presence and/or amount of at least one, at least two, at least three, at least four, at least five, at least six, at least seven or eight peptide structures from Table 7. In some embodiments, the diagnosis is based upon the presence and/or amount of one or more peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of two or more peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of three or more peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of four or more peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of five or more peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of six or more peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of seven or more peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of each of the peptides comprising the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 8. In some embodiments, the glycopeptide is a glycopeptide provided in Table 16. In some embodiments, the glycopeptide comprises a sequence set forth in SEQ ID NO:300-429. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising SEQ ID NO:1-20.
In some embodiments, the diagnosis is based upon the presence and/or amount of one or more peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of two or more peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of three or more peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of four or more peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of five or more peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of six or more peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of seven or more peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the diagnosis is based upon the presence and/or amount of each of the peptides consisting of the amino acid sequence of SEQ ID NOs: 21-46. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 8. In some embodiments, the glycopeptide is a glycopeptide provided in Table 16. In some embodiments, the glycopeptide comprises a sequence set forth in SEQ ID NO:300-429. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising SEQ ID NO:1-20.
In some embodiments, provided herein is a method of treating a melanoma condition (metastatic melanoma) in an individual based upon the presence, absence, or amount of one or more peptide structures set forth in Table 7. In some embodiments, one or more peptide structures set forth in SEQ ID NOs: 21-46 is detected. In some embodiments, the method further comprises delivering a therapeutic agent based upon the presence, absence, or amount of one or more peptide structures set forth in Table 7. In some embodiments, the method comprises selecting a therapeutic agent based upon the presence, absence, or amount of one or more peptide structures set forth in Table 7. In some embodiments, the therapeutic agent is a chemotherapeutic agent and/or a hormone therapy.
In some embodiments, provided herein are methods for diagnosing a melanoma condition (metastatic melanoma) comprising detecting one or more biomarkers. In some embodiments, the one or more biomarkers comprise one or more glycopeptides. In some embodiments, the one or more biomarkers comprise one or more peptide structures set forth in Table 12. In some embodiments, the method comprises detecting one or more glycopeptides comprising a sequence set forth in SEQ ID NOs: 101-131. In some embodiments, the glycopeptide comprises a glycan with the structures in Table 12. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 13. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in SEQ ID NO: 132-158.
In some embodiments, the diagnosis is based upon presence and/or amount of at least one, at least two, at least three, at least four, at least five, at least six, at least seven or eight peptide structures from Table 12. In some embodiments, the diagnosis is based upon the presence and/or amount of one or more peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of two or more peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of three or more peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of four or more peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of five or more peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of six or more peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of seven or more peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of each of the peptides comprising the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments the glycopeptide is a glycopeptide of a glycoprotein provided in Table 13. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in SEQ ID NO: 132-158.
In some embodiments, the diagnosis is based upon the presence and/or amount of one or more peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of two or more peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of three or more peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of four or more peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of five or more peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of six or more peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of seven or more peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments, the diagnosis is based upon the presence and/or amount of each of the peptides consisting of the amino acid sequence of SEQ ID NOs: 101-131. In some embodiments the glycopeptide is a glycopeptide of a glycoprotein provided in Table 13. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in SEQ ID NO: 132-158.
In some embodiments, provided herein is a method of treating a melanoma condition (metastatic melanoma) in an individual based upon the presence, absence, or amount of one or more peptide structures set forth in Table 12. In some embodiments, one or more peptide structures set forth in SEQ ID NOs: 101-131 is detected. In some embodiments, the method further comprises delivering a therapeutic agent based upon the presence, absence, or amount of one or more peptide structures set forth in Table 12. In some embodiments, the method comprises selecting a therapeutic agent based upon the presence, absence, or amount of one or more peptide structures set forth in Table 12. In some embodiments, the therapeutic agent is a chemotherapeutic agent and/or a hormone therapy. In some embodiments the glycopeptide is a glycopeptide of a glycoprotein provided in Table 13. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in SEQ ID NO: 132-158.
In some embodiments, provided herein are methods for diagnosing non-small-cell lung cancer (NSCLC) comprising detecting one or more biomarkers. In some embodiments, the one or more biomarkers comprise one or more glycopeptides. In some embodiments, the one or more biomarkers comprise one or more peptide structures set forth in Table 14. In some embodiments, the method comprises detecting one or more glycopeptides comprising a sequence set forth in SEQ ID NOs: 159-207. In some embodiments, the glycopeptide comprises a glycan with the structures in Table 14. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 15. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising a sequence set forth in SEQ ID NO: 208-253.
In some embodiments, the diagnosis is based upon presence and/or amount of at least one, at least two, at least three, at least four, at least five, at least six, at least seven or eight peptide structures from Table 14. In some embodiments, the diagnosis is based upon the presence and/or amount of one or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of two or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of three or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of four or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of five or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of six or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of seven or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of each of the peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 15. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising a sequence set forth in SEQ ID NO: 208-253.
In some embodiments, the diagnosis is based upon the presence and/or amount of one or more peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of two or more peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of three or more peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of four or more peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of five or more peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of six or more peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of seven or more peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of each of the peptides consisting of the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 15. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising a sequence set forth in SEQ ID NO: 208-253.
In some embodiments, the diagnosis is based upon the presence and/or amount of one or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of two or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of three or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of four or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of five or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of six or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of seven or more peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the diagnosis is based upon the presence and/or amount of each of the peptides comprising the amino acid sequence of SEQ ID NOs: 159-207. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 15. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising a sequence set forth in SEQ ID NO: 208-253.
In some embodiments, provided herein is a method of treating non-small-cell lung cancer (NSCLC) in an individual based upon the presence, absence, or amount of one or more peptide structures set forth in Table 14. In some embodiments, one or more peptide structures set forth in SEQ ID NOs: 159-207 are detected. In some embodiments, the method further comprises delivering a therapeutic agent based upon the presence, absence, or amount of one or more peptide structures set forth in Table 14. In some embodiments, the method comprises selecting a therapeutic agent based upon the presence, absence, or amount of one or more peptide structures set forth in Table 14. In some embodiments, the therapeutic agent is a chemotherapeutic agent and/or a hormone therapy. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein provided in Table 15. In some embodiments, the glycopeptide is a glycopeptide of a glycoprotein comprising a sequence set forth in SEQ ID NO: 208-253.
In the descriptions herein, it is understood that every description, variation, embodiment, or aspect of a biomarker, peptide, glycopeptide, or glycoprotein may be combined with every description, variation, embodiment, or aspect of other biomarkers, peptides, glycopeptides, or glycoproteins the same as if each and every combination of descriptions is specifically and individually listed.
The following examples are included for illustrative purposes only and are not intended to limit the scope of the invention.
Protein glycosylation is one of the most abundant and most complex forms of post-translational protein modification. Glycosylation affects protein structure, conformation, and function. The elucidation of the potential role of differential protein glycosylation as biomarkers has so far been limited by the technical complexity of generating and interpreting this information. A novel, powerful platform has been recently established that combines ultra-high-performance liquid chromatography coupled to triple quadrupole mass spectrometry with a proprietary machine-learning and neural-network-based data processing engine that allows for high-throughput, highly scalable interrogation of the glycoproteome. This study assessed whether glycoproteomic biomarkers and signatures can predict which patients with metastatic malignant melanoma would respond to PD1/PDL1 checkpoint inhibitors.
Methods: Using this platform, we interrogated 413 individual glycopeptide (GP) signatures derived from 69 abundant serum proteins in pretreatment blood samples from a cohort of 36 individuals (11 females, 25 males, age range 28 to 90 years) with metastatic malignant melanoma treated either with nivolumab plus ipilimumab (12 patients) or pembrolizumab (24 patients). Plasma samples were taken prior to beginning treatment, stored at −80 °C, and run through InterVenn's targeted MRM panel.
The individual glycopeptide expression levels were associated with time from treatment initiation to progression/metastasis (progression-free survival, PFS) or death (overall survival, OS) in the patient cohorts.
In addition to assessing individual biomarker associations, multivariable models were built to predict PFS (Melanoma). The multivariable models were built by selecting a small subset of glycopeptides for modeling, proceeding to build a model with n−1 patients, predicting a survival score on the one holdout patient, and iterating over all patients as individual holdouts, to generate unbiased prediction scores for everyone (a leave-one-out cross-validation approach, LOOCV). The resulting scores were dichotomized at a cutoff which optimizes Harrell's C-index, and Kaplan-Meier (KM) curves were plotted.
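As a non-limiting illustration, the leave-one-out scoring and cutoff selection described above may be sketched in Python as follows. The function names, the placeholder model (`fit_mean`, `predict_centered`), and the convention that a higher score indicates higher risk are assumptions made for this sketch only; the placeholder model is not a Cox model and merely stands in for whatever survival model is fit in a given embodiment.

```python
from typing import Callable, List, Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    Assumes a higher score means higher risk (shorter time to event).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's recorded time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def loocv_scores(features: Sequence[Sequence[float]], times: Sequence[float],
                 events: Sequence[int], fit: Callable, predict: Callable) -> List[float]:
    """Fit on n-1 subjects, score the single held-out subject, repeat for all."""
    scores = []
    for held_out in range(len(features)):
        train = [k for k in range(len(features)) if k != held_out]
        model = fit([features[k] for k in train],
                    [times[k] for k in train],
                    [events[k] for k in train])
        scores.append(predict(model, features[held_out]))
    return scores


def best_cutoff(times: Sequence[float], events: Sequence[int],
                scores: Sequence[float]) -> float:
    """Pick the cutoff whose above/below grouping maximizes the C-index."""
    best, best_c = None, -1.0
    # Dropping the largest score keeps both groups non-empty.
    for cutoff in sorted(set(scores))[:-1]:
        groups = [1.0 if s > cutoff else 0.0 for s in scores]
        c = harrell_c_index(times, events, groups)
        if c > best_c:
            best, best_c = cutoff, c
    if best is None:
        raise ValueError("need at least two distinct scores")
    return best


# Hypothetical placeholder model: NOT a Cox model, illustration only.
def fit_mean(train_x, train_t, train_e):
    n = len(train_x)
    return [sum(row[k] for row in train_x) / n for k in range(len(train_x[0]))]


def predict_centered(model, x):
    return sum(v - m for v, m in zip(x, model))
```

Because each subject is scored by a model that never saw that subject, the resulting scores can be pooled and dichotomized without the held-out subject having influenced its own prediction.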
Specifically, progression-free survival (PFS) data with follow-up of up to 3.7 years (median: 0.8 years) were used as the clinical endpoint phenotype against which the predictive power of differential abundance of GPs was assessed. PFS data were analyzed using Cox Proportional Hazards models. Kaplan-Meier curves were generated for GP markers that showed statistically significant differential abundances using a false discovery rate (FDR)-adjusted p-value of ≤0.1 as a cutoff. The hazard ratio (HR) for PFS was calculated from a Cox Proportional Hazards model, representing the multiplicative increase in the hazard of progression for each increase of the biomarker by 1 unit. The p-value associated with the HR was analyzed, where p<0.01 was considered significant. The interaction p-value, the p-value associated with the biomarker × treatment interaction, was also analyzed, where significance indicates potential for use in treatment selection.
Further, as part of this example, 526 glycopeptide (GP) signatures derived from 75 serum proteins were interrogated in pretreatment blood samples from a cohort of 205 individuals (66 females, 139 males, age range 24 to 97 years) with metastatic malignant melanoma treated with either nivolumab (N) with or without ipilimumab (I, 95 patients) or pembrolizumab (P, 110 patients) immune-checkpoint inhibitor (ICI) therapy.
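As a non-limiting illustration, the per-unit interpretation of the hazard ratio and the FDR adjustment mentioned above may be sketched in Python as follows. The description does not name the FDR procedure used; the Benjamini-Hochberg procedure is assumed here as one common choice. The function names and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def hazard_ratio(beta: float, delta: float = 1.0) -> float:
    """Hazard ratio implied by a Cox coefficient `beta` for an increase
    of `delta` units in the biomarker: HR = exp(beta * delta)."""
    return math.exp(beta * delta)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four markers (not study data).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
selected = [k for k, q in enumerate(adj) if q <= 0.1]
```

Under this sketch, a coefficient of ln 2 corresponds to a doubling of the hazard per unit increase in the biomarker, and markers whose adjusted value is at or below 0.1 would be carried forward.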
In certain embodiments,
Results: 27 GPs with abundance differences at FDR p≤0.1 were identified, and among them 8 markers at p≤0.001. Using the latter 8 markers, a multivariable model for PFS was created by generating leave-one-out cross-validation (LOOCV) scores and determining an optimized cutoff value for these scores using Harrell's concordance index. Dichotomizing the LOOCV scores using this cutoff value demonstrated the model to yield a hazard ratio of 9.2 at a p-value of 10⁻⁵ for separating treatment responders and non-responders (70% vs. 0% PFS, respectively, at 18 months based on LOOCV score above/below cutoff), as compared to a hazard ratio of 1.5, p=0.5 for PDL1 expression.
In an optimized assay containing 27 glycopeptides and 20 non-glycosylated peptides, we identified 14 GPs with abundance differences at FDR q≤0.05 with regard to PFS. Using 40% of the cohort as a training set and selecting 12 glycopeptide and non-glycosylated peptide biomarker features of the 47 total by LASSO shrinkage, we created a multivariable-model-based classifier for PFS that yielded a hazard ratio (HR) for prediction of likely ICI benefit of 7.5 at p<0.0001. This classifier was validated in the test set composed of the held-out 60% of patients, yielding an HR of 4.7 at a similar p-value for separating patients likely to benefit from either single or combination ICI therapy from those unlikely to benefit (50% PFS of 18 months vs. 3 months based on classifier score above/below cutoff). The classifier has a sensitivity of >99% to predict likely ICI benefit while still performing at a specificity of 26%, and may thus help to safely spare about one in four patients who would not benefit from unnecessary exposure to these agents.
Conclusions: Our results indicate that glycoproteomics holds strong promise as a response predictor to checkpoint inhibitor treatment that appears to significantly outperform other currently pursued biomarker approaches in this context.
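As a non-limiting illustration of how sensitivity and specificity figures of the kind reported above are derived from classifier calls, a minimal Python sketch follows. The function name and the example labels are hypothetical and are not data from the study; "positive" is taken here to mean predicted or observed ICI benefit.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted_benefit: Sequence[bool],
                            actual_benefit: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary benefit calls."""
    pairs = list(zip(predicted_benefit, actual_benefit))
    tp = sum(1 for p, a in pairs if p and a)          # benefit correctly predicted
    fn = sum(1 for p, a in pairs if not p and a)      # benefit missed
    tn = sum(1 for p, a in pairs if not p and not a)  # non-benefit correctly predicted
    fp = sum(1 for p, a in pairs if p and not a)      # non-benefit called as benefit
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one subject in each actual class")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels for four subjects (illustration only).
sens, spec = sensitivity_specificity(
    predicted_benefit=[True, True, True, False],
    actual_benefit=[True, True, False, False],
)
```

A classifier tuned for very high sensitivity rarely withholds therapy from a subject who would benefit; its specificity is then the fraction of non-benefiting subjects it correctly identifies.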
Background: Immune checkpoint blockade is an integral component of first-line therapy for most patients with advanced non-small cell lung cancer (NSCLC); however, individual patient outcomes are highly variable and improved biomarkers are needed. Protein glycosylation is an emerging mechanism of immune evasion in cancer. Blood-based glycopeptide signatures were examined in a cohort of advanced NSCLC patients treated with first-line immune checkpoint blockade. This study assessed whether glycoproteomic biomarkers and signatures can predict which patients with NSCLC would respond to PD1/PDL1 checkpoint inhibitors.
Methods: Two independent studies examined whether glycoproteomic biomarkers and signatures may predict which patients would respond to checkpoint inhibitor therapies. For example, Study 1 included n=205 patients with metastatic melanoma seen at Massachusetts General Hospital (MGH), treated either with Ipilimumab+Nivolumab (n=95) or Pembrolizumab (n=110). Plasma samples were taken prior to beginning treatment, stored at −80 °C, and run on a targeted multiple reaction monitoring (MRM) panel. Study 2 included n=125 patients with metastatic non-small-cell lung cancer sourced from Tempus and treated with Pembrolizumab. Serum samples were taken prior to beginning treatment, stored at −80 °C, and run on the targeted MRM panel. In both Study 1 and Study 2, individual glycopeptide expression levels were associated with time from treatment initiation to progression/metastasis (progression-free survival, PFS) or death (overall survival, OS) in the patient cohorts.
In addition to assessing individual biomarker associations, multivariable models were built to predict OS (NSCLC) and PFS (Melanoma). The models were built by selecting a small subset of glycopeptides through 5-fold repeated cross-validated LASSO regularization, building a model with 40% of patients (allocated via balanced stratification on sex, age quartile, and PFS/OS event), tuning hyperparameters of the LASSO model in another 30% of patients, and predicting a survival score on the remaining 30% of holdout patients (to generate unbiased prediction scores). The resulting prediction scores were dichotomized at a cutoff which optimizes Harrell's C-index, and Kaplan-Meier (KM) curves were plotted. Final models for products were optimized for sensitivity for non-response. For example, in certain embodiments,
Results: 30 GPs with abundance differences using a False Discovery Rate (FDR) threshold of 0.05 were identified. Using the 5 most predictive GP markers, a multivariable model for OS was created by generating leave-one-out cross-validation (LOOCV) scores and determining an optimized cutoff value of −0.83 (range: −2.2 to 3.4) for these scores using Harrell's concordance index. The median overall survival was 2.8 years for patients (n=14) whose GP classifier value was above the cutoff and 0.8 years for patients (n=32) whose GP classifier value was below the cutoff (HR 7.4, 95% CI 1.7-32.1, p=0.007). The model's performance was not affected by sex, age, or treatment regimen.
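As a non-limiting illustration of how a median survival time of the kind reported above can be read from a Kaplan-Meier estimate, a minimal Python sketch follows. The function names and the example follow-up times are hypothetical and are not data from the study; an event flag of 1 denotes an observed death and 0 denotes censoring.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored subjects leave the risk set as well
    return curve


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up times in years (illustration only).
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
median = median_survival(curve)
```

Computing such a curve separately for subjects above and below the classifier cutoff yields the two medians that are compared between groups.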
Conclusions: Blood-based glycopeptide signatures may represent novel, non-invasive biomarkers of clinical outcome to first-line immune checkpoint blockade in advanced NSCLC. These findings may be validated in larger cohorts and applied in clinical decision-making.
Any headers and/or subheaders between sections and subsections of this document are included solely for the purpose of improving readability and do not imply that features cannot be combined across sections and subsections. Accordingly, sections and subsections do not describe separate embodiments.
While the present teachings are described in conjunction with various embodiments, it is not intended that the present teachings be limited to such embodiments. On the contrary, the present teachings encompass various alternatives, modifications, and equivalents, as will be appreciated by those of skill in the art. The present description provides preferred exemplary embodiments, and is not intended to limit the scope, applicability or configuration of the disclosure. Rather, the present description of the preferred exemplary embodiments will provide those skilled in the art with an enabling description for implementing various embodiments.
It is understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope as set forth in the appended claims. Thus, such modifications and variations are considered to be within the scope set forth in the appended claims. Further, the terms and expressions which have been employed are used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed.
In describing the various embodiments, the specification may have presented a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the various embodiments.
Some embodiments of the present disclosure include a system including one or more data processors. In some embodiments, the system includes a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein. Some embodiments of the present disclosure include a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.
Specific details are given in the present description to provide an understanding of the embodiments. However, it is understood that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
Among the provided embodiments are:
1. A method for managing a treatment for a subject diagnosed with a melanoma condition, the method comprising:
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 63/158,283, filed 8 Mar. 2021, U.S. Provisional Patent Application No. 63/246,293, filed 20 Sep. 2021, and U.S. Provisional Patent Application No. 63/251,023, filed 30 Sep. 2021, each of which is incorporated herein by reference.
Number | Date | Country
---|---|---
63158283 | Mar 2021 | US
63246293 | Sep 2021 | US
63251023 | Sep 2021 | US