Ductal carcinoma in situ (DCIS) is a preinvasive lesion in which tumor cells within the breast duct are isolated from the surrounding stroma by a near-continuous layer of myoepithelium and basement membrane proteins. This histologic feature is the central property that distinguishes DCIS from invasive breast cancer (IBC), in which this barrier has broken down and tumor cells have invaded the stroma. DCIS accounts for 20% of new breast cancer diagnoses but, unlike IBC, is not in itself a life-threatening disease. However, if DCIS is left untreated, approximately half of these patients will develop IBC within 10 years.
Sequencing-based approaches have been used extensively over the last decade to identify molecular features that could elucidate the connection between DCIS and IBC. Genomic profiling has identified recurrent copy number variants (CNVs) that are more prevalent in high-grade DCIS lesions. Meanwhile, comparison of paired DCIS and IBC lesions from the same patient has provided clues to the clonal evolution from in situ to invasive disease by revealing genomic alterations that are acquired during this transition. To date, however, none of these findings consistently explains this transition. Tumor phenotyping by single-plex immunohistochemical tissue staining has likewise been of limited utility.
In light of this uncertainty, clinical management has trended towards presumptively treating all patients as progressors with surgery, radiation therapy, and pharmacological interventions that carry a risk of therapy-related adverse events. This approach is therefore likely to be overly aggressive for non-progressors. Understanding the central biological features in DCIS that drive the transition to IBC is thus a critical unmet need.
Surprisingly, despite extensive knowledge of the genetic and functional state of tumor cells in DCIS, histopathology remains the only reliable way to diagnose it. DCIS is therefore an intrinsically structured entity, in which the spatial arrangement of tumor, myoepithelial, and stromal cells is the primary defining feature that distinguishes it from other forms of breast cancer.
Compositions and methods are provided for classification of a ductal carcinoma in situ (DCIS) lesion with respect to its probability of recurrence and progression to invasive disease. Classification with respect to the probability of cancer recurrence allows treatment to be matched to the condition. Although most DCIS is indolent, many subjects with DCIS are treated aggressively because some DCIS lesions have a propensity to become invasive. The methods disclosed herein provide a reliable test to determine the propensity of a DCIS lesion to progress to invasive cancer, allowing therapy to be directed to those individuals who can benefit from it. Subjects whose lesions are determined to be indolent can be treated by monitoring the lesion over time or with low-intensity therapy. Subjects whose lesions have a high probability of becoming invasive can receive aggressive therapy, including without limitation surgery, radiation, chemotherapy, immunotherapy, or a combination thereof.
The methods disclosed herein utilize a spatial atlas of breast cancer progression that identifies features in primary ductal carcinoma in situ (DCIS) associated with risk of invasive relapse. Specifically, features related to the coordinated transformation of ductal myoepithelium and surrounding stroma are predictive of clinical outcome. For example, whether the myoepithelial layer in a DCIS sample is thinned relative to normal tissue is indicative of whether the sample is from a DCIS progressor or non-progressor. Analysis of ductal myoepithelium shows that DCIS samples with more continuous myoepithelium and high E-cadherin (ECAD) expression are at higher risk of ipsilateral invasive recurrence following primary DCIS surgical excision. Retention of these normal-like myoepithelial traits correlates with fewer stromal immune cells and cancer-associated fibroblasts (CAFs). Conversely, the thin, discontinuous, low-ECAD myoepithelium present in non-progressor tumors is associated with a more reactive desmoplastic stroma containing more immune cells and CAFs and greater collagen remodeling.
In some embodiments, a predictive method is provided for classifying a DCIS tissue from an individual as either indolent or invasive recurrent. The individual may be treated in accordance with the classification. In some embodiments, the method comprises analysis of features of the ductal myoepithelium, wherein a lesion whose myoepithelium is thin, discontinuous, and low in ECAD relative to a normal control is classified as indolent. In some embodiments, the structure of collagen fibers in the extracellular matrix and the spatial distribution of multiple immune cell subsets are also analyzed. Imaging of myoepithelium and other features may be performed with multiplexed ion beam imaging by time of flight (MIBI-TOF). The classification can be made by targeted inspection of the imaging data. In some embodiments, the method comprises analysis of features extracted from MIBI-TOF data, including, for example, phenotypic, functional, spatial, and morphologic features.
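Targeted inspection of this kind could, for instance, reduce each duct to a few myoepithelial summary metrics and compare them with a normal control. The Python sketch below is a minimal illustration under stated assumptions, not the disclosed method: it assumes segmentation has already produced a per-bin thickness profile around the duct perimeter plus boolean myoepithelium and ECAD pixel masks, and the names `myoep_features`, `inspect_duct`, and all cutoff values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence

# Hypothetical cutoffs for illustration only; real values would have to be
# derived from annotated MIBI-TOF data and a normal-tissue control.
CONTINUITY_CUTOFF = 0.8          # fraction of duct perimeter covered
RELATIVE_THICKNESS_CUTOFF = 1.0  # thickness relative to normal control
ECAD_CUTOFF = 0.5                # fraction of myoepithelial pixels that are ECAD+


@dataclass
class MyoepFeatures:
    continuity: float          # fraction of perimeter bins with any myoepithelium
    mean_thickness: float      # mean thickness (pixels) over covered bins
    relative_thickness: float  # mean_thickness / normal-control thickness
    ecad_fraction: float       # ECAD+ myoepithelial pixels / all myoepithelial pixels


def myoep_features(thickness_by_bin: Sequence[float],
                   myoep_mask: List[List[bool]],
                   ecad_mask: List[List[bool]],
                   normal_thickness: float) -> MyoepFeatures:
    """Summarize one duct from a perimeter thickness profile and pixel masks."""
    covered = [t for t in thickness_by_bin if t > 0]
    continuity = len(covered) / len(thickness_by_bin)
    mean_thickness = mean(covered) if covered else 0.0
    myoep_px = ecad_px = 0
    for myoep_row, ecad_row in zip(myoep_mask, ecad_mask):
        for is_myoep, is_ecad in zip(myoep_row, ecad_row):
            if is_myoep:
                myoep_px += 1
                ecad_px += bool(is_ecad)  # count ECAD+ pixels within myoepithelium
    ecad_fraction = ecad_px / myoep_px if myoep_px else 0.0
    return MyoepFeatures(continuity, mean_thickness,
                         mean_thickness / normal_thickness, ecad_fraction)


def inspect_duct(f: MyoepFeatures) -> str:
    """Thin, discontinuous, low-ECAD myoepithelium -> 'indolent'."""
    if (f.continuity < CONTINUITY_CUTOFF
            and f.relative_thickness < RELATIVE_THICKNESS_CUTOFF
            and f.ecad_fraction < ECAD_CUTOFF):
        return "indolent"
    # Anything else is left for further evaluation (e.g., a trained classifier).
    return "not indolent"


# Toy example with made-up values for one duct.
feats = myoep_features(
    thickness_by_bin=[2, 2, 0, 0, 1, 2, 0, 1],
    myoep_mask=[[True, True, False], [True, False, True]],
    ecad_mask=[[True, False, False], [False, False, False]],
    normal_thickness=3.0,
)
print(feats, inspect_duct(feats))
```

A three-criterion rule mirrors the thin, discontinuous, low-ECAD description literally; a real implementation would more likely feed these metrics into a trained model than rely on fixed cutoffs.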
In some embodiments, a predictive classifier model is provided for use in a method of classifying a DCIS tissue from an individual as either indolent or invasive recurrent. In some embodiments, the classifier model is a random forest classifier model. In some embodiments, a random forest classifier using MIBI-identified tumor features is trained on samples from patients with known clinical outcomes, and the classifier is used to identify the features most useful for separating these outcome groups. The model can be trained to predict any recurrence, whether DCIS or invasive breast cancer (IBC), or to predict IBC recurrence only. In some embodiments, the features comprise metrics related to the phenotype of the myoepithelium, the structure of collagen fibers in the extracellular matrix, and the spatial distribution of multiple immune cell subsets. The model identified pixel-level ECAD+ myoepithelial expression as the most predictive metric.
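To illustrate how a random forest can both classify lesions and rank features by usefulness, the following is a minimal pure-Python sketch: bootstrap-sampled Gini decision trees, a random feature subset at each split, majority voting, and normalized impurity-decrease feature importances. It is not the model described here; the feature names, hyperparameters, and the tiny training set are hypothetical and synthetic. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would typically be used.

```python
import random
from collections import Counter

# Hypothetical feature names; the source names feature families, not exact metrics.
FEATURES = ["ecad_myoep_frac", "myoep_continuity", "caf_density", "immune_density"]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (feature, threshold, impurity decrease) of the best split, or None."""
    parent, best = gini(y), None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or parent - child > best[2]:
                best = (f, t, parent - child)
    return best


class RandomForest:
    """Minimal bagged ensemble of Gini trees with random feature subsets."""

    def __init__(self, n_trees=25, max_depth=3, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)

    def _grow(self, X, y, depth):
        majority = Counter(y).most_common(1)[0][0]
        if depth >= self.max_depth or len(set(y)) == 1:
            return majority  # leaf: a class label
        feats = self.rng.sample(range(len(X[0])), self.n_feats)
        split = best_split(X, y, feats)
        if split is None or split[2] <= 0:
            return majority
        f, t, gain = split
        self.importances[f] += gain * len(y) / self.n_root  # weighted impurity decrease
        left = [i for i, row in enumerate(X) if row[f] <= t]
        right = [i for i, row in enumerate(X) if row[f] > t]
        return (f, t,
                self._grow([X[i] for i in left], [y[i] for i in left], depth + 1),
                self._grow([X[i] for i in right], [y[i] for i in right], depth + 1))

    def fit(self, X, y):
        self.n_feats = max(1, int(len(X[0]) ** 0.5))  # features tried per split
        self.n_root = len(X)
        self.importances = [0.0] * len(X[0])
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap
            self.trees.append(self._grow([X[i] for i in idx], [y[i] for i in idx], 0))
        total = sum(self.importances) or 1.0
        self.importances = [v / total for v in self.importances]
        return self

    @staticmethod
    def _predict_tree(node, row):
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = node
            node = left if row[f] <= t else right
        return node

    def predict(self, row):
        votes = Counter(self._predict_tree(tree, row) for tree in self.trees)
        return votes.most_common(1)[0][0]


# Synthetic, clearly artificial training data (1 = invasive recurrent, 0 = indolent).
X_train = [
    [0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1], [0.7, 0.85, 0.15, 0.2],
    [0.85, 0.7, 0.2, 0.15], [0.75, 0.75, 0.1, 0.1],
    [0.2, 0.3, 0.8, 0.9], [0.1, 0.2, 0.9, 0.8], [0.3, 0.25, 0.7, 0.85],
    [0.15, 0.3, 0.85, 0.7], [0.25, 0.1, 0.75, 0.75],
]
y_train = [1] * 5 + [0] * 5

model = RandomForest(n_trees=25, max_depth=3, seed=0).fit(X_train, y_train)
ranking = sorted(zip(FEATURES, model.importances), key=lambda p: -p[1])
print(ranking)
print(model.predict([1.0, 1.0, 0.0, 0.0]), model.predict([0.0, 0.0, 1.0, 1.0]))
```

Because every synthetic feature separates the two groups perfectly, the importance ranking in this toy run carries no meaning; with real MIBI-TOF features, normalized importances of this kind are one way a metric such as myoepithelial ECAD expression could surface as most predictive.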
A DCIS sample can be obtained by any means available to those skilled in the art including, but not limited to, a biopsy of the DCIS lesion, such as a needle biopsy, or surgical removal of tissue containing the lesion. The DCIS lesion can be classified as, or predicted to be, invasive recurrent or indolent based on analysis of the features identified herein. The aggressiveness phenotype determined for the DCIS lesion can be used to develop a treatment plan for the subject and to treat the subject accordingly.
In one embodiment, there is provided herein a computer system for determining whether a subject has, is predisposed to having, or has a poor prognosis for, DCIS, comprising: a database of MIBI-derived lesion feature datasets; and a server comprising computer-executable code for causing the computer to receive one or more of the datasets, to classify the lesion dataset according to a random forest model trained on a dataset of lesion features from tissue with a known outcome, and to generate a classification of whether the lesion is predisposed to invasive, recurrent DCIS. In another aspect, there is provided herein a computer-assisted method for evaluating the prognosis of breast cancer-related disease in a subject, comprising: (1) providing a computer comprising a model or algorithm for classifying data from a DCIS lesion sample obtained from the subject, wherein the classification includes analyzing the data for the presence, absence, or amount of MIBI-TOF imaging features; (2) inputting data from a biological sample obtained from the subject; and (3) classifying the biological sample to indicate the DCIS prognosis.
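The arrangement of such a computer system (a database of lesion feature datasets, server code that receives them, and a trained classifier that labels each lesion) can be sketched with Python's standard-library `sqlite3` module. This is an assumption-laden illustration only: the table and column names are hypothetical, and `EcadCutoffModel` is a placeholder standing in for a trained random forest. Any object with a `predict(row)` method returning 1 (invasive recurrent) or 0 (indolent) could be substituted.

```python
import sqlite3
from typing import Dict, Protocol, Sequence

# Hypothetical schema: one row of MIBI-derived features per lesion.
FEATURES = ("ecad_myoep_frac", "myoep_continuity", "caf_density", "immune_density")


class LesionModel(Protocol):
    def predict(self, row: Sequence[float]) -> int: ...


class EcadCutoffModel:
    """Hypothetical placeholder for a trained random forest (1 = invasive recurrent)."""

    def __init__(self, cutoff: float = 0.5) -> None:
        self.cutoff = cutoff

    def predict(self, row: Sequence[float]) -> int:
        return int(row[0] >= self.cutoff)


def classify_lesions(conn: sqlite3.Connection, model: LesionModel) -> Dict[str, str]:
    """Receive each lesion's feature dataset, classify it, and store the result."""
    query = f"SELECT lesion_id, {', '.join(FEATURES)} FROM lesion_features ORDER BY lesion_id"
    results = {}
    for lesion_id, *feats in conn.execute(query).fetchall():
        label = "invasive recurrent" if model.predict(feats) == 1 else "indolent"
        results[lesion_id] = label
        conn.execute("INSERT INTO classifications VALUES (?, ?)", (lesion_id, label))
    conn.commit()
    return results


# In-memory database populated with two made-up lesions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lesion_features (lesion_id TEXT PRIMARY KEY, "
             + ", ".join(f"{name} REAL" for name in FEATURES) + ")")
conn.execute("CREATE TABLE classifications (lesion_id TEXT, label TEXT)")
conn.executemany("INSERT INTO lesion_features VALUES (?, ?, ?, ?, ?)",
                 [("L001", 0.82, 0.90, 0.10, 0.20), ("L002", 0.20, 0.40, 0.70, 0.80)])
results = classify_lesions(conn, EcadCutoffModel())
print(results)
```

Typing the model as a `Protocol` keeps the storage and classification steps decoupled, so a retrained model (for example, one predicting IBC recurrence only) could be swapped in without changing the database code.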
The invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures.
Before the present methods and compositions are described, it is to be understood that this invention is not limited to the particular methods or compositions described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, some potential and preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. It is understood that the present disclosure supersedes any disclosure of an incorporated publication to the extent there is a contradiction.
It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a cell” includes a plurality of such cells and reference to “the peptide” includes reference to one or more peptides and equivalents thereof, e.g. polypeptides, known to those skilled in the art, and so forth.
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
The types of cancer that can be treated using the subject methods of the present invention include, but are not limited to, forms of breast cancer, particularly ductal carcinoma in situ. Most breast cancers are epithelial tumors that develop from cells lining ducts or lobules; less common are nonepithelial cancers of the supporting stroma (eg, angiosarcoma, primary stromal sarcomas, phyllodes tumor). Cancers are divided into carcinoma in situ and invasive cancer.
Carcinoma in situ is proliferation of cancer cells within ducts or lobules without invasion of stromal tissue. There are 2 types:
Ductal carcinoma in situ (DCIS): About 85% of carcinoma in situ cases are this type. DCIS is usually detected only by mammography. It may involve a small or wide area of the breast; if a wide area is involved, microscopic invasive foci may develop over time.
Lobular carcinoma in situ (LCIS): LCIS is often multifocal and bilateral and has 2 subtypes, classic and pleomorphic. Classic LCIS is not malignant but increases the risk of developing invasive carcinoma in either breast. This nonpalpable lesion is usually detected via biopsy; it is rarely visualized with mammography. Pleomorphic LCIS behaves more like DCIS and should be excised to negative margins.
Invasive carcinoma is primarily adenocarcinoma. About 80% of cases are the infiltrating ductal type; most of the remaining cases are infiltrating lobular. Rare types include medullary, mucinous, metaplastic, and tubular carcinomas. Mucinous carcinoma tends to develop in older women and to be slow growing. Women with most of these rare types of breast cancer have a much better prognosis than women with other types of invasive breast cancer; metaplastic carcinoma, however, carries a poorer prognosis.
Breast cancer invades locally and spreads through the regional lymph nodes, the bloodstream, or both. Metastatic breast cancer may affect almost any organ in the body, most commonly the lungs, liver, bone, brain, and skin. Most skin metastases occur near the site of breast surgery; scalp metastases are uncommon. Some breast cancers recur sooner than others, and the timing of recurrence can often be predicted from tumor markers. For example, metastatic breast cancer may occur within 3 years in patients whose tumors are negative for tumor markers, or >10 years after initial diagnosis and treatment in patients who have an estrogen receptor-positive tumor.
When an abnormality is detected during a physical examination or by a screening procedure, testing is required to differentiate benign lesions from cancer. Because early detection and treatment of breast cancer improve prognosis, this differentiation must be conclusive before evaluation is terminated. If advanced cancer is suspected based on physical examination, biopsy should be done first; otherwise, the approach is the same as evaluation for a breast mass, which typically includes ultrasonography. All lesions that could be cancer should be biopsied. A prebiopsy bilateral mammogram may help delineate other areas that should be biopsied and provides a baseline for future reference. However, mammogram results should not alter the decision to do a biopsy if that decision is based on physical findings. Percutaneous core needle biopsy is preferred to surgical biopsy. Core biopsy can be guided by imaging or by palpation (freehand). Stereotactic biopsy (needle biopsy guided by mammography done in 2 planes and analyzed by computer to produce a 3-dimensional image) or ultrasound-guided biopsy is routinely used to improve accuracy. Clips are placed at the biopsy site to identify it. If core biopsy is not possible (eg, the lesion is too posterior), surgical biopsy can be done; a guidewire is inserted under imaging guidance to help identify the biopsy site. Any skin taken with the biopsy specimen should be examined because it may show cancer cells in dermal lymphatic vessels. The excised specimen should be x-rayed, and the x-ray should be compared with the prebiopsy mammogram to determine whether all of the lesion has been removed. If the original lesion contained microcalcifications, mammography is repeated when the breast is no longer tender, usually 6 to 12 weeks after biopsy, to check for residual microcalcifications. If radiation therapy is planned, mammography should be done before radiation therapy begins.
Staging follows the TNM (tumor, node, metastasis) classification. Because clinical examination and imaging have poor sensitivity for nodal involvement, staging is refined during surgery, when regional lymph nodes can be evaluated. However, if patients have palpably abnormal axillary nodes, preoperative ultrasonography-guided fine needle aspiration or core biopsy may be done. If biopsy results are positive, axillary lymph node dissection is typically done during the definitive surgical procedure; however, neoadjuvant chemotherapy may make sentinel lymph node biopsy possible if it changes node status from N1 to N0. If biopsy results are negative, sentinel lymph node biopsy, a less aggressive procedure, may be done instead; results of intraoperative frozen section analysis then determine whether axillary lymph node dissection will be needed.
For most types of breast cancer, treatment involves surgery, radiation therapy, and systemic therapy. Choice of treatment depends on tumor and patient characteristics. Surgery involves mastectomy or breast-conserving surgery plus radiation therapy. Some physicians use preoperative chemotherapy to shrink the tumor before removing it and applying radiation therapy; thus, some patients who might otherwise have required mastectomy can have breast-conserving surgery.
Radiation therapy is indicated after mastectomy if either of the following is present: the primary tumor is ≥5 cm, or axillary nodes are involved. In such cases, radiation therapy after mastectomy significantly reduces incidence of local recurrence on the chest wall and in regional lymph nodes and improves overall survival.
Patients with LCIS are often treated with daily oral tamoxifen; for postmenopausal women, raloxifene or an aromatase inhibitor is an alternative. For patients with invasive cancer, chemotherapy is usually begun soon after surgery. If systemic chemotherapy is not required, hormone therapy is usually begun soon after surgery plus radiation therapy and is continued for years. These therapies delay or prevent recurrence in almost all patients and prolong survival in some. However, some experts believe that these therapies are not necessary for many small (<0.5 to 1 cm) tumors with no lymph node involvement (particularly in postmenopausal patients) because the prognosis is already excellent. If tumors are >5 cm, systemic therapy may be started before surgery (neoadjuvant therapy).
Combination chemotherapy regimens are more effective than a single drug. Dose-dense regimens given for 4 to 6 months are preferred; in dose-dense regimens, the time between doses is shorter than that in standard-dose regimens. There are many regimens; a commonly used one is ACT (doxorubicin plus cyclophosphamide followed by paclitaxel). Acute adverse effects depend on the regimen but usually include nausea, vomiting, mucositis, fatigue, alopecia, myelosuppression, cardiotoxicity, and thrombocytopenia. Growth factors that stimulate bone marrow (eg, filgrastim, pegfilgrastim) are commonly used to reduce risk of fever and infection due to chemotherapy. Long-term adverse effects are infrequent with most regimens; death due to infection or bleeding is rare (<0.2%). High-dose chemotherapy plus bone marrow or stem cell transplantation offers no therapeutic advantage over standard therapy and should not be used.
If tumors overexpress HER2 (HER2+), anti-HER2 drugs (trastuzumab, pertuzumab) may be used. Adding the humanized monoclonal antibody trastuzumab to chemotherapy provides substantial benefit. Trastuzumab is usually continued for a year, although the optimal duration of therapy is unknown. If lymph nodes are involved, adding pertuzumab to trastuzumab improves disease-free survival. A serious potential adverse effect of both of these anti-HER2 drugs is a decreased cardiac ejection fraction. With hormone therapy (eg, tamoxifen, raloxifene, aromatase inhibitors), benefit depends on estrogen and progesterone receptor expression; benefit is greatest when tumors express both estrogen and progesterone receptors.
Adjunctive therapy: A treatment used in combination with a primary treatment to improve the effects of the primary treatment.
Clinical outcome: Refers to the health status of a patient following treatment for a disease or disorder or in the absence of treatment. Clinical outcomes include, but are not limited to, an increase in the length of time until death, a decrease in the length of time until death, an increase in the chance of survival, an increase in the risk of death, survival, disease-free survival, chronic disease, metastasis, advanced or aggressive disease, disease recurrence, death, and favorable or poor response to therapy.
Decrease in survival: As used herein, “decrease in survival” refers to a decrease in the length of time before death of a patient, or an increase in the risk of death for the patient.
Poor prognosis: Generally refers to a decrease in survival, or in other words, an increase in risk of death or a decrease in the time until death. Poor prognosis can also refer to an increase in severity of the disease, such as an increase in spread or invasiveness (metastasis) of the cancer to other tissues and/or organs.
The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a mammal being assessed for treatment and/or being treated. In some embodiments, the mammal is a human. The terms “subject,” “individual,” and “patient” encompass, without limitation, individuals having a disease. Subjects may be human, but also include other mammals, particularly those mammals useful as laboratory models for human disease, e.g., mice, rats, etc.
The term “sample” with reference to a patient encompasses blood and other liquid samples of biological origin, solid tissue samples such as a biopsy specimen or tissue cultures or cells derived therefrom and the progeny thereof. The term also encompasses samples that have been manipulated in any way after their procurement, such as by treatment with reagents, washing, or enrichment for certain cell populations, such as diseased cells. The definition also includes samples that have been enriched for particular types of molecules, e.g., nucleic acids, polypeptides, etc. The term “biological sample” encompasses a clinical sample, and also includes tissue obtained by surgical resection, tissue obtained by biopsy, cells in culture, cell supernatants, cell lysates, tissue samples, organs, bone marrow, blood, plasma, serum, and the like. A “biological sample” includes a sample obtained from a patient's diseased cell, e.g., a sample comprising polynucleotides and/or polypeptides that is obtained from a patient's diseased cell (e.g., a cell lysate or other cell extract comprising polynucleotides and/or polypeptides); and a sample comprising diseased cells from a patient. A biological sample comprising a diseased cell from a patient can also include non-diseased cells.
In some embodiments of the present methods, use of a control is desirable. In that regard, the control may be a non-cancerous tissue sample obtained from the same patient, or a tissue sample obtained from a healthy subject, such as a healthy tissue donor. In another example, the control is a standard calculated from historical values. In one embodiment the control is a breast cancer tissue sample. The control may be derived from tissue of known dysplasia, known cancer type, known mutation status, and/or known tumor stage. In one embodiment the control is a historical average derived from DCIS.
The term “diagnosis” is used herein to refer to the identification of a molecular or pathological state, disease or condition in a subject, individual, or patient.
The term “prognosis” is used herein to refer to the prediction of the likelihood of death or disease progression, including recurrence, spread, and drug resistance, in a subject, individual, or patient. The term “prediction” is used herein to refer to the act of foretelling or estimating, based on observation, experience, or scientific reasoning, the likelihood of a subject, individual, or patient experiencing a particular event or clinical outcome. In one example, a physician may attempt to predict the likelihood that a patient will survive.
As used herein, the terms “treatment,” “treating,” and the like, refer to administering an agent, or carrying out a procedure, for the purposes of obtaining an effect on or in a subject, individual, or patient. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of effecting a partial or complete cure for a disease and/or symptoms of the disease. “Treatment,” as used herein, may include treatment of cancer in a mammal, particularly in a human, and includes: (a) inhibiting the disease, i.e., arresting its development; and (b) relieving the disease or its symptoms, i.e., causing regression of the disease or its symptoms.
Treating may refer to any indicia of success in the treatment or amelioration or prevention of a disease, including any objective or subjective parameter such as abatement; remission; diminishing of symptoms or making the disease condition more tolerable to the patient; slowing in the rate of degeneration or decline; or making the final point of degeneration less debilitating. The treatment or amelioration of symptoms can be based on objective or subjective parameters, including the results of an examination by a physician. Accordingly, the term “treating” includes the administration of engineered cells to prevent or delay, to alleviate, or to arrest or inhibit development of the symptoms or conditions associated with disease or other diseases. The term “therapeutic effect” refers to the reduction, elimination, or prevention of the disease, symptoms of the disease, or side effects of the disease in the subject.
As used herein, a “therapeutically effective amount” refers to that amount of the therapeutic agent sufficient to treat or manage a disease or disorder. A therapeutically effective amount may refer to the amount of therapeutic agent sufficient to delay or minimize the onset of disease, e.g., to delay or minimize the growth and spread of cancer. A therapeutically effective amount may also refer to the amount of the therapeutic agent that provides a therapeutic benefit in the treatment or management of a disease. Further, a therapeutically effective amount with respect to a therapeutic agent of the invention means the amount of therapeutic agent alone, or in combination with other therapies, that provides a therapeutic benefit in the treatment or management of a disease.
As used herein, the term “dosing regimen” refers to a set of unit doses (typically more than one) that are administered individually to a subject, typically separated by periods of time. In some embodiments, a given therapeutic agent has a recommended dosing regimen, which may involve one or more doses. In some embodiments, a dosing regimen comprises a plurality of doses each of which is separated from one another by a time period of the same length; in some embodiments, a dosing regimen comprises a plurality of doses and at least two different time periods separating individual doses. In some embodiments, all doses within a dosing regimen are of the same unit dose amount. In some embodiments, different doses within a dosing regimen are of different amounts. In some embodiments, a dosing regimen comprises a first dose in a first dose amount, followed by one or more additional doses in a second dose amount different from the first dose amount. In some embodiments, a dosing regimen comprises a first dose in a first dose amount, followed by one or more additional doses in a second dose amount same as the first dose amount. In some embodiments, a dosing regimen is correlated with a desired or beneficial outcome when administered across a relevant population (i.e., is a therapeutic dosing regimen).
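The dosing-regimen variants defined above (uniform or varying intervals, uniform or varying amounts, a distinct first dose) map naturally onto a small data structure. The Python sketch below is illustrative only: the `Dose` and `DosingRegimen` names, the hour-based timeline, and the example amounts (arbitrary units) are hypothetical and do not correspond to any specific drug.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dose:
    time_h: float  # hours after the first dose
    amount: float  # unit dose amount (arbitrary units in this sketch)


@dataclass
class DosingRegimen:
    doses: List[Dose]  # assumed to contain at least one dose

    def ordered(self) -> List[Dose]:
        return sorted(self.doses, key=lambda d: d.time_h)

    def intervals(self) -> List[float]:
        """Time periods separating consecutive doses."""
        times = [d.time_h for d in self.ordered()]
        return [later - earlier for earlier, later in zip(times, times[1:])]

    def uniform_interval(self) -> bool:
        """True if every pair of consecutive doses is separated by the same period."""
        return len(set(self.intervals())) <= 1

    def uniform_amount(self) -> bool:
        """True if all doses share the same unit dose amount."""
        return len({d.amount for d in self.doses}) == 1

    def first_dose_differs(self) -> bool:
        """True if the first dose differs from identical subsequent doses."""
        first, *rest = self.ordered()
        return (bool(rest) and len({d.amount for d in rest}) == 1
                and rest[0].amount != first.amount)


# Hypothetical regimen: a larger first dose, then two equal doses every 21 days.
regimen = DosingRegimen([Dose(0, 8.0), Dose(504, 6.0), Dose(1008, 6.0)])
print(regimen.intervals(), regimen.uniform_interval(),
      regimen.uniform_amount(), regimen.first_dose_differs())
```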
“In combination with”, “combination therapy” and “combination products” refer, in certain embodiments, to the concurrent administration to a patient of the engineered proteins and cells described herein in combination with additional therapies, e.g. surgery, radiation, chemotherapy, and the like. When administered in combination, each component can be administered at the same time or sequentially in any order at different points in time. Thus, each component can be administered separately but sufficiently closely in time so as to provide the desired therapeutic effect.
“Concomitant administration” means administration of one or more components, such as engineered proteins and cells, known therapeutic agents, etc. at such time that the combination will have a therapeutic effect. Such concomitant administration may involve concurrent (i.e. at the same time), prior, or subsequent administration of components. A person of ordinary skill in the art would have no difficulty determining the appropriate timing, sequence and dosages of administration.
The use of the term “in combination” does not restrict the order in which prophylactic and/or therapeutic agents are administered to a subject with a disorder. A first prophylactic or therapeutic agent can be administered prior to (e.g., 5 minutes, 15 minutes, 30 minutes, 45 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 12 hours, 24 hours, 48 hours, 72 hours, 96 hours, 1 week, 2 weeks, 3 weeks, 4 weeks, 5 weeks, 6 weeks, 8 weeks, or 12 weeks before), concomitantly with, or subsequent to (e.g., 5 minutes, 15 minutes, 30 minutes, 45 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 12 hours, 24 hours, 48 hours, 72 hours, 96 hours, 1 week, 2 weeks, 3 weeks, 4 weeks, 5 weeks, 6 weeks, 8 weeks, or 12 weeks after) the administration of a second prophylactic or therapeutic agent to a subject with a disorder.
Chemotherapy may include Abitrexate (Methotrexate Injection), Abraxane (Paclitaxel Injection), Adcetris (Brentuximab Vedotin Injection), Adriamycin (Doxorubicin), Adrucil Injection (5-FU (fluorouracil)), Afinitor (Everolimus), Afinitor Disperz (Everolimus), Alimta (Pemetrexed), Alkeran Injection (Melphalan Injection), Alkeran Tablets (Melphalan), Aredia (Pamidronate), Arimidex (Anastrozole), Aromasin (Exemestane), Arranon (Nelarabine), Arzerra (Ofatumumab Injection), Avastin (Bevacizumab), Bexxar (Tositumomab), BiCNU (Carmustine), Blenoxane (Bleomycin), Bosulif (Bosutinib), Busulfex Injection (Busulfan Injection), Campath (Alemtuzumab), Camptosar (Irinotecan), Caprelsa (Vandetanib), Casodex (Bicalutamide), CeeNU (Lomustine), CeeNU Dose Pack (Lomustine), Cerubidine (Daunorubicin), Clolar (Clofarabine Injection), Cometriq (Cabozantinib), Cosmegen (Dactinomycin), Cytosar-U (Cytarabine), Cytoxan (Cyclophosphamide), Cytoxan Injection (Cyclophosphamide Injection), Dacogen (Decitabine), DaunoXome (Daunorubicin Lipid Complex Injection), Decadron (Dexamethasone), DepoCyt (Cytarabine Lipid Complex Injection), Dexamethasone Intensol (Dexamethasone), Dexpak Taperpak (Dexamethasone), Docefrez (Docetaxel), Doxil (Doxorubicin Lipid Complex Injection), Droxia (Hydroxyurea), DTIC (Dacarbazine), Eligard (Leuprolide), Ellence (Epirubicin), Eloxatin (Oxaliplatin), Elspar (Asparaginase), Emcyt (Estramustine), Erbitux (Cetuximab), Erivedge (Vismodegib), Erwinaze (Asparaginase Erwinia chrysanthemi), Ethyol (Amifostine), Etopophos (Etoposide Injection), Eulexin (Flutamide), Fareston (Toremifene), Faslodex (Fulvestrant), Femara (Letrozole), Firmagon (Degarelix Injection), Fludara (Fludarabine), Folex (Methotrexate Injection), Folotyn (Pralatrexate Injection), FUDR (Floxuridine), Gemzar (Gemcitabine), Gilotrif (Afatinib), Gleevec (Imatinib Mesylate), Gliadel Wafer (Carmustine wafer), Halaven (Eribulin Injection), Herceptin (Trastuzumab), Hexalen (Altretamine), Hycamtin (Topotecan), Hydrea (Hydroxyurea), Iclusig (Ponatinib), Idamycin PFS (Idarubicin), Ifex (Ifosfamide), Inlyta (Axitinib), Intron A (Interferon alfa-2b), Iressa (Gefitinib), Istodax (Romidepsin Injection), Ixempra (Ixabepilone Injection), Jakafi (Ruxolitinib), Jevtana (Cabazitaxel Injection), Kadcyla (Ado-trastuzumab Emtansine), Kyprolis (Carfilzomib), Leukeran (Chlorambucil), Leukine (Sargramostim), Leustatin (Cladribine), Lupron (Leuprolide), Lupron Depot (Leuprolide), Lupron Depot-Ped (Leuprolide), Lysodren (Mitotane), Marqibo Kit (Vincristine Lipid Complex Injection), Matulane (Procarbazine), Megace (Megestrol), Mekinist (Trametinib), Mesnex (Mesna), Mesnex (Mesna Injection), Metastron (Strontium-89 Chloride), Mexate (Methotrexate Injection), Mustargen (Mechlorethamine), Mutamycin (Mitomycin), Myleran (Busulfan), Mylotarg (Gemtuzumab Ozogamicin), Navelbine (Vinorelbine), Neosar Injection (Cyclophosphamide Injection), Neulasta (Pegfilgrastim), Neupogen (Filgrastim), Nexavar (Sorafenib), Nilandron (Nilutamide), Nipent (Pentostatin), Nolvadex (Tamoxifen), Novantrone (Mitoxantrone), Oncaspar (Pegaspargase), Oncovin (Vincristine), Ontak (Denileukin Diftitox), Onxol (Paclitaxel Injection), Panretin (Alitretinoin), Paraplatin (Carboplatin), Perjeta (Pertuzumab Injection), Platinol (Cisplatin), Platinol (Cisplatin Injection), Platinol-AQ (Cisplatin), Platinol-AQ (Cisplatin Injection), Pomalyst (Pomalidomide), Prednisone Intensol (Prednisone), Proleukin (Aldesleukin), Purinethol (Mercaptopurine), Reclast (Zoledronic acid), Revlimid (Lenalidomide), Rheumatrex (Methotrexate), Rituxan (Rituximab), Roferon-A (Interferon alfa-2a), Rubex (Doxorubicin), Sandostatin (Octreotide), Sandostatin LAR Depot (Octreotide), Soltamox (Tamoxifen), Sprycel (Dasatinib), Sterapred (Prednisone), Sterapred DS (Prednisone), Stivarga (Regorafenib), Supprelin LA (Histrelin Implant), Sutent (Sunitinib), Sylatron (Peginterferon Alfa-2b Injection), Synribo (Omacetaxine Injection), Tabloid (Thioguanine), Tafinlar (Dabrafenib), Tarceva (Erlotinib), Targretin Capsules (Bexarotene), Tasigna (Nilotinib), Taxol (Paclitaxel Injection), Taxotere (Docetaxel), Temodar (Temozolomide), Temodar (Temozolomide Injection), Tepadina (Thiotepa), Thalomid (Thalidomide), TheraCys BCG (BCG), Thioplex (Thiotepa), TICE BCG (BCG), Toposar (Etoposide Injection), Torisel (Temsirolimus), Treanda (Bendamustine hydrochloride), Trelstar (Triptorelin Injection), Trexall (Methotrexate), Trisenox (Arsenic trioxide), Tykerb (Lapatinib), Valstar (Valrubicin Intravesical), Vantas (Histrelin Implant), Vectibix (Panitumumab), Velban (Vinblastine), Velcade (Bortezomib), Vepesid (Etoposide), Vepesid (Etoposide Injection), Vesanoid (Tretinoin), Vidaza (Azacitidine), Vincasar PFS (Vincristine), Vincrex (Vincristine), Votrient (Pazopanib), Vumon (Teniposide), Wellcovorin IV (Leucovorin Injection), Xalkori (Crizotinib), Xeloda (Capecitabine), Xtandi (Enzalutamide), Yervoy (Ipilimumab Injection), Zaltrap (Ziv-aflibercept Injection), Zanosar (Streptozocin), Zelboraf (Vemurafenib), Zevalin (Ibritumomab Tiuxetan), Zoladex (Goserelin), Zolinza (Vorinostat), Zometa (Zoledronic acid), Zortress (Everolimus), Zytiga (Abiraterone), Nimotuzumab, and immune checkpoint inhibitors such as nivolumab, pembrolizumab/MK-3475, pidilizumab, and AMP-224 targeting PD-1; BMS-935559, MEDI4736, MPDL3280A, and MSB0010718C targeting PD-L1; and those targeting CTLA-4, such as ipilimumab.
Radiotherapy means the use of radiation, usually X-rays, to treat illness. X-rays were discovered in 1895, and since then radiation has been used in medicine for diagnosis and investigation (X-rays) and for treatment (radiotherapy). Radiotherapy may be delivered from outside the body as external radiotherapy, using X-rays, cobalt irradiation, electrons, and, more rarely, other particles such as protons. It may also be delivered from within the body as internal radiotherapy, which uses radioactive metals or liquids (isotopes) to treat cancer.
Methods are provided for prognostic determination of the risk of recurrence of DCIS breast cancer, including recurrence as DCIS or recurrence as IBC, allowing classification of patients based on the determination. Patients can be treated in accordance with the determination, where the predicted aggressiveness of a DCIS lesion can be used to develop a treatment plan for the subject with the lesion. It is shown herein that such breast cancer progression is associated with a reduction in myoepithelial integrity, a shift in fibroblast function towards proliferative cancer-associated fibroblast (CAF) states, and remodeling of collagen in the extracellular matrix (ECM).
In some embodiments, a predictive method is provided for classifying a DCIS tissue from an individual as indolent or as invasive recurrent. In some embodiments, the method comprises analysis of ductal myoepithelium features, where myoepithelium characterized as thin, discontinuous, and low in E-cadherin (ECAD), relative to a normal control, is classified as indolent. In some embodiments, the structure of collagen fibers in the extracellular matrix and the spatial distribution of multiple immune cell subsets are also analyzed. In some embodiments, a plurality of features obtained by MIBI-TOF analysis of a DCIS lesion are used for classification.
A DCIS sample can be obtained by any means available to those skilled in the art including, but not limited to, a biopsy of the DCIS lesion, such as a needle biopsy, or surgical removal of tissue containing the lesion. For example, a tissue slide or block is obtained. The tissue is optionally frozen or fixed. A plurality of tissue samples can be aggregated in a tissue microarray for convenience of analysis, optionally combined with positive and/or negative control samples. Serial sections can be cut from the block for H&E staining to guide imaging and for MIBI-TOF imaging.
In some embodiments, the DCIS sample is stained with a panel of antibodies to define the cellular composition and structural characteristics of the tissue. In some embodiments, the antibodies are conjugated directly or indirectly to a detectable marker, e.g., isotopic metal reporters, fluorescent dyes, and the like, as known in the art. The slides are contacted with the antibodies, usually a panel of antibodies, and then washed free of unbound antibodies.
In some embodiments, the panel of antibodies comprises antibodies specific for one or more of the following markers: Tryptase, CK7, VIM, CD44, CK5, PanCK, HIF1A, CD45, AR, HLA-DR/DP/DQ, GLUT1, ECAD, CD20, MMP9, FAP, CD11c, HER2, CD3, CD8, CD36, MPO, CD68, pS6, Granzyme B, P63, Ki67, IDO1, CD31, PD1, CD14, CD4, Collagen 1, SMA, COX2, Histone H3, ER, PDL1-biotin. In some embodiments, the panel comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, or all of these markers. In some embodiments, a panel of antibodies as defined above comprises at least an antibody specific for E-cadherin.
In some embodiments, MIBI-TOF data from the antibody-stained sample are used to generate parameters, or features, for classification, where multiplexed image sets are extracted and filtered. DeepCell segmentation parameters are optionally generated. Single-cell expression of markers may be measured and normalized.
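The per-cell measurement step can be sketched as follows. This is a minimal illustration, not the pipeline used here: it assumes a whole-cell label mask (for example, one produced by DeepCell) in which 0 marks background and 1..N mark cells, a channels-first array of per-pixel ion counts, size normalization by cell area, and an arcsinh transform with a hypothetical cofactor of 5. The function names are illustrative.

```python
import numpy as np


def single_cell_expression(label_mask, image_stack):
    """Mean ion counts per cell for every channel.

    label_mask  -- 2D int array; 0 is background, 1..N are cell labels.
    image_stack -- 3D array (channels, height, width) of per-pixel counts.
    Returns an (N, channels) array of area-normalized counts.
    """
    labels = np.asarray(label_mask).ravel()
    image_stack = np.asarray(image_stack, dtype=float)
    n_cells = int(labels.max())
    # Pixel count (area) of each label; drop index 0 (background).
    areas = np.bincount(labels, minlength=n_cells + 1)[1:]
    n_channels = image_stack.shape[0]
    expr = np.zeros((n_cells, n_channels))
    for c in range(n_channels):
        # Total counts of channel c inside each cell.
        totals = np.bincount(labels, weights=image_stack[c].ravel(),
                             minlength=n_cells + 1)[1:]
        # Divide by area; guard against labels that have no pixels.
        expr[:, c] = totals / np.maximum(areas, 1)
    return expr


def arcsinh_normalize(expr, cofactor=5.0):
    """Variance-stabilizing arcsinh transform (the cofactor is an assumption)."""
    return np.arcsinh(np.asarray(expr, dtype=float) / cofactor)
```

Dividing by area keeps large and small cells comparable, and the arcsinh transform compresses the long tail of high-count cells while staying roughly linear near zero.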
In some embodiments, the features for classification comprise one or more of: myoepithelial E-cadherin expression, antigen-presenting cells (APC) near endothelium, periductal immune cells, ER+ luminal tumor cells, ER+ tumor cells, myoepithelial CK5 expression, tumor-myoepithelium neighborhood, APC near fibroblast, CD8+ T cells near double-negative T cells (dnT), myoepithelial continuity, CD4+ T cells near dnT, stromal mast cells, PDL1+ CK5/7-low tumor cells, tumor-dominated neighborhood, B cell near dnT, macrophage near mast cells, CD8+ T cells near mast cells, variation in collagen fiber orientation, periductal APCs, and PD1+ immune cells.
In some embodiments, features for classification comprise at least myoepithelial E-cadherin expression. In some embodiments, features for classification comprise each of myoepithelial E-cadherin expression, antigen-presenting cells (APC) near endothelium, periductal immune cells, ER+ luminal tumor cells, ER+ tumor cells, myoepithelial CK5 expression, tumor-myoepithelium neighborhood, APC near fibroblast, CD8+ T cells near double-negative T cells (dnT), myoepithelial continuity, CD4+ T cells near dnT, stromal mast cells, PDL1+ CK5/7-low tumor cells, tumor-dominated neighborhood, B cell near dnT, macrophage near mast cells, CD8+ T cells near mast cells, variation in collagen fiber orientation, periductal APCs, and PD1+ immune cells. In some embodiments, features for classification include additional features set forth in Table 1, e.g., at least 10, at least 20, at least 30, at least 40, at least 50, or more of the features, and may comprise all of the features set forth in Table 1.
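Several of the listed features have the form "cell type A near cell type B". One plausible way to quantify such a feature, offered as a hypothetical sketch rather than the definition used to build Table 1, is the fraction of A cells that have at least one B cell within a fixed radius of their centroid. The radius, coordinate units, and function name below are assumptions.

```python
import numpy as np


def fraction_near(centroids, cell_types, source, target, radius):
    """Fraction of `source` cells with at least one `target` cell within `radius`.

    centroids  -- (n, 2) array of cell centroid coordinates (e.g., pixels).
    cell_types -- length-n sequence of cell-type labels.
    Returns NaN when either cell type is absent from the image.
    """
    if source == target:
        raise ValueError("source and target must be different cell types")
    xy = np.asarray(centroids, dtype=float)
    types = np.asarray(cell_types)
    src = xy[types == source]
    tgt = xy[types == target]
    if len(src) == 0 or len(tgt) == 0:
        return float("nan")
    # Pairwise Euclidean distances between every source and every target cell.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    return float(np.mean((dist <= radius).any(axis=1)))
```

For example, `fraction_near(xy, types, "CD8", "dnT", 50)` would give one value per image for a feature like "CD8+ T cells near dnT". The all-pairs distance matrix is simple but grows quadratically; a spatial index would be preferable for very large images.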
An image of the tissue can be captured, transformed into data, and transmitted to a biological image analyzer for analysis. The biological image analyzer comprises a processor and a memory coupled to the processor, the memory storing computer-executable instructions that, when executed by the processor, cause the processor to perform operations comprising the classification processes disclosed herein. For example, the stained tissue may be imaged, digitized, and either stored on a non-transitory computer-readable storage medium or transmitted as data directly to the biological image analyzer, or to another computer system, for analysis. In one embodiment, features are automatically identified.
In some embodiments, machine learning tools for multiplexed cell segmentation and spatial analytics are used to enumerate cell populations and to quantify how these populations are spatially distributed relative to one another. Object morphometrics and high-dimensional pixel clustering are used to annotate the structure of stromal collagen and the myoepithelial phenotypes that track with disease progression.
The features quantified in these analyses can be used to build a random forest classifier for predicting which patients will progress to invasive disease based exclusively on the original DCIS biopsy.
In some embodiments, a predictive classifier model is provided for use in a method of classifying a DCIS tissue from an individual as indolent or as invasive recurrent. In some embodiments, the classifier model is a random forest classifier model. In some embodiments, a random forest classifier using MIBI-identified tumor features is trained on patients with known clinical outcomes, and the classifier is used to identify the features most useful for separating these outcome groups. The model can be trained to predict recurrence as either DCIS or invasive breast cancer (IBC), or it can be trained to predict only IBC recurrence. In some embodiments, the features comprise metrics related to the phenotype of the myoepithelium, the structure of collagen fibers in the extracellular matrix, and the spatial distribution of multiple immune cell subsets. For example, the model identified pixel-level ECAD expression in myoepithelium as the most predictive metric.
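A random forest trained on per-patient features, with the features then ranked by Gini importance, can be sketched with scikit-learn as follows. This is a hedged illustration under stated assumptions: the hyperparameters (500 trees, balanced class weights) and the function name are hypothetical, and the model described in this disclosure may have been configured differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features_by_gini(X, y, feature_names, n_top=20, seed=0):
    """Fit a random forest and return the features ranked by Gini importance.

    X -- (n_patients, n_features) matrix of per-sample features.
    y -- outcome labels (e.g., 1 = invasive recurrence, 0 = no event).
    """
    model = RandomForestClassifier(
        n_estimators=500,          # assumed; not stated in the source
        criterion="gini",
        class_weight="balanced",   # assumed, to offset unequal outcome groups
        random_state=seed,
    )
    model.fit(X, y)
    # feature_importances_ is the impurity-based (mean decrease in Gini) importance.
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1][:n_top]
    ranking = [(feature_names[i], float(importances[i])) for i in order]
    return model, ranking
```

Impurity-based importances are normalized to sum to 1 and are cheap to obtain, but they can favor high-cardinality features; permutation importance on held-out data is a common cross-check.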
A computational system (e.g., a computer) may be used in the methods of the present disclosure to control and/or coordinate stimulus through the one or more controllers, and to analyze data from imaging DCIS samples. A computational unit may include any suitable components to analyze the measured images. Thus, the computational unit may include one or more of the following: a processor; a non-transient, computer-readable memory, such as a computer-readable medium; an input device, such as a keyboard, mouse, touchscreen, etc.; an output device, such as a monitor, screen, speaker, etc.; a network interface, such as a wired or wireless network interface; and the like.
The raw data from measurements can be analyzed and stored on a computer-based system. As used herein, “a computer-based system” refers to the hardware means, software means, and data storage means used to analyze the information of the present invention. The minimum hardware of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention. The data storage means may comprise any manufacture comprising a recording of the present information as described above, or a memory access means that can access such a manufacture.
A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems. Such presentation provides a skilled artisan with a ranking of similarities and identifies the degree of similarity contained in the test data.
The analysis may be implemented in hardware or software, or a combination of both. In one embodiment of the invention, a machine-readable storage medium is provided, the medium comprising a data storage material encoded with machine-readable data which, when using a machine programmed with instructions for using said data, is capable of displaying any of the datasets and data comparisons of this invention. Such data may be used for a variety of purposes, such as drug discovery, analysis of interactions between cellular components, and the like. In some embodiments, the invention is implemented in computer programs executing on programmable computers, comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code is applied to input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion. The computer may be, for example, a personal computer, microcomputer, or workstation of conventional design.
Each program can be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Each such computer program can be stored on a storage medium or device (e.g., ROM or magnetic diskette) readable by a general- or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein. The system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
Further provided herein is a method of storing and/or transmitting, via computer, sequence data and other data collected by the methods disclosed herein. Any computer or computer accessory including, but not limited to, software and storage devices can be utilized to practice the present invention. Sequence or other data (e.g., immune repertoire analysis results) can be input into a computer by a user either directly or indirectly. Additionally, any of the devices which can be used to analyze features can be linked to a computer, such that the data is transferred to a computer and/or computer-compatible storage device. Data can be stored on a computer or suitable storage device (e.g., CD). Data can also be sent from a computer to another computer or data collection point via methods well known in the art (e.g., the internet, ground mail, air mail). Thus, data collected by the methods described herein can be collected at any point or geographical location and sent to any other geographical location.
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.
Transition to Invasive Breast Cancer is Associated with Progressive Changes in the Structure and Composition of Tumor Stroma
Ductal carcinoma in situ (DCIS) is a pre-invasive lesion that is thought to be a precursor to invasive breast cancer (IBC). To understand the changes in the tumor microenvironment (TME) accompanying the transition to IBC, we used multiplexed ion beam imaging by time of flight (MIBI-TOF) and a 37-plex antibody staining panel to interrogate 79 clinically annotated surgical resections using machine learning tools for cell segmentation, pixel-based clustering, and object morphometrics. Comparison of normal breast with patient-matched DCIS and IBC revealed coordinated transitions between four TME states that were delineated based on the location and function of myoepithelium, fibroblasts, and immune cells. Surprisingly, myoepithelial disruption was more advanced in DCIS patients that did not develop IBC, suggesting that this process could be protective against recurrence. Taken together, this HTAN Breast PreCancer Atlas study offers new insight into drivers of IBC relapse and emphasizes the importance of the TME in regulating these processes.
Ductal carcinoma in situ (DCIS) is a pre-invasive lesion of tumor cells within the breast duct that are isolated from the surrounding stroma by a near-continuous layer of myoepithelium and basement membrane proteins. This histologic property is the primary feature that distinguishes DCIS from invasive breast cancer (IBC), where this barrier is absent and tumor cells are in direct contact with the stroma.
Sequencing-based approaches have been used extensively over the last decade to identify molecular mechanisms that could explain the connection between DCIS and IBC. Genomic profiling has identified recurrent copy number variants that are more prevalent in high-grade DCIS lesions. Comparison of DCIS and IBC lesions from the same patient has provided clues into the clonal evolution from in situ to invasive disease by revealing genomic alterations that are acquired during this transition. To date, however, these findings have not consistently explained this transition. Similarly, the utility of tumor phenotyping by single-plex immunohistochemical tissue staining has been limited.
In light of this uncertainty, clinical management has favored treating all patients presumptively as progressors to IBC with surgery, radiation therapy, and pharmacological interventions, all of which carry risks of adverse events. Consequently, this approach is likely to be overly aggressive for patients who do not progress (non-progressors). Thus, understanding what drives DCIS to transition to IBC is a critical unmet need and an opportunity for prevention. Surprisingly, despite all the information now known about the genetic and functional state of tumor cells in DCIS, histopathology remains the only reliable way to diagnose it. DCIS is therefore an intrinsically structured entity for which the spatial orientation of tumor, myoepithelial, and stromal cells is a defining characteristic.
To understand how DCIS structure and single-cell function are interrelated, we used new tools previously developed by our lab for highly multiplexed subcellular imaging to analyze, in a spatially resolved manner, a large cohort of human archival tissue samples covering the spectrum of breast cancer progression from in situ to invasive disease. In previous work, we used MIBI-TOF to identify rule sets governing tumor microenvironment (TME) structure in triple-negative breast cancer that were highly predictive of the composition of immune infiltrates, the expression of immune checkpoint drug targets, and 10-year overall survival. This effort provided a framework for how TME structure and composition could be used more generally as a surrogate readout to understand the functional response to neoplasia. With this in mind, we sought to determine the extent to which similar themes involving myoepithelial, stromal, and immune cells in the DCIS TME might play pivotal roles in breast cancer progression. These cell types have previously been implicated in promoting local invasion and metastasis and have been correlated with clinical progression.
Here, we report the first systematic, high-dimensional analysis of breast cancer progression using the Washington University Resource of Archival Human Breast Tissue (RAHBT) cohort, a clinically annotated set of archived tissue from patients diagnosed with DCIS and IBC. Because the DCIS patient population is complicated by differences in age, parity status, tumor subtype, and treatment course, a well-conceived cohort design is crucial for identifying meaningful features amidst these confounding variables. The RAHBT cohort was therefore composed of primary DCIS tumors from women who later progressed to IBC, matched by age and year of diagnosis with DCIS from women who did not have a subsequent ipsilateral breast event. We used MIBI-TOF and a 37-plex antibody staining panel to comprehensively define the cellular composition and structural characteristics of normal breast tissue, DCIS, and IBC relapses. These findings were corroborated by transcriptomic data acquired from adjacent co-registered tissue regions isolated by laser capture microdissection. We used the 433 parameters quantified in these analyses to build a random forest classifier for predicting which DCIS patients would later progress to IBC based on the original resection specimen. This classifier was heavily weighted for spatially informed parameters quantifying breast cancer TME structure, particularly those relating to ductal myoepithelium. Surprisingly, myoepithelial loss was more pronounced in samples from DCIS patients that did not recur and was typically associated with a more reactive stroma. Taken together, the studies reported here provide new insight into potential etiologies of DCIS progression that will guide development of future diagnostics and serve as a template for how to conduct similar analyses of pre-invasive cancers.
Results
A longitudinal cohort of DCIS patients with or without subsequent invasive relapse. The goal of this study was to explore two central questions of breast cancer progression. First, how does the structure, composition, and function of breast tissue change with progression from DCIS to IBC? Second, what distinguishes DCIS lesions in patients that later develop IBC (progressors) from those that do not (non-progressors)? To examine these questions, we mapped the phenotype, structure, and spatial distribution of tumor, myoepithelium, stroma, and immune cells of 79 archival formalin-fixed paraffin-embedded patient tissues from the RAHBT cohort.
Patient samples included normal breast tissue (N=9, reduction mammoplasty), primary DCIS (N=58), and IBC (N=12). Of the 58 primary DCIS samples, 44 were from non-progressors (median follow-up=11.4 years), while the remaining 14 were from progressors (median time to subsequent breast event=9.1 years).
A single-cell phenotypic atlas of DCIS epithelium and its microenvironment. As part of the HTAN PreCancer Atlas, we created a multiomic atlas of breast cancer progression using co-registered adjacent serial sections cut from each RAHBT tissue microarray (TMA) block. For this study, these tissues were used for hematoxylin and eosin (H&E) histochemical staining, transcriptome profiling of laser-capture microdissected regions (LCM-Smart-3SEQ), and highly multiplexed imaging (MIBI-TOF).
MIBI-TOF imaging was performed on each RAHBT TMA using a 37-plex metal-conjugated antibody staining panel.
DCIS epithelial and stromal tissue compartments were predominantly composed of epithelial cells and fibroblasts, respectively, each of which comprised four major phenotypic subsets. Epithelial cells consisted of luminal (56.9%±33.7), basal (4.4%±6.6), epithelial-to-mesenchymal transition (EMT, 2.3%±2.8), and CK5/7-low (36.2%±33.5) subsets defined by variable expression of vimentin, CK7, and CK5.
Transition to DCIS and IBC is marked by coordinated changes in the TME. In the previous section, we defined normal, DCIS, and IBC samples in terms of bulk cellular composition in a manner that was agnostic to the spatial location of each cell population. Next, to interrogate potential spatial differentiators of disease state, and to understand how tissue composition, cellular organization, and structure are interrelated, we augmented these compositional data with a description of the spatial distribution of each cell subset within the TME. First, to determine the proportion of each cell population residing within ductal or stromal regions, we used regional masks demarcating the epithelium and stroma to quantify the frequency of each cell type in these regions (Tissue Compartment Enrichment).
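The compartment calculation can be sketched as follows, under the assumptions that each cell is represented by its centroid in pixel coordinates and that a binary epithelial mask is available, with stroma as its complement. The function name and the centroid-based assignment rule are illustrative, not the published implementation.

```python
import numpy as np


def compartment_fractions(centroids, cell_types, epithelial_mask):
    """Fraction of each cell type whose centroid falls in the epithelial mask.

    centroids -- (n, 2) array of (row, col) pixel coordinates.
    The stromal fraction of each cell type is 1 minus the returned value.
    """
    mask = np.asarray(epithelial_mask, dtype=bool)
    rc = np.rint(np.asarray(centroids, dtype=float)).astype(int)
    rc = np.clip(rc, 0, np.array(mask.shape) - 1)  # keep indices inside the image
    in_epi = mask[rc[:, 0], rc[:, 1]]
    types = np.asarray(cell_types)
    return {str(t): float(in_epi[types == t].mean()) for t in np.unique(types)}
```

Assigning each cell by its centroid is the simplest rule; cells straddling the duct boundary, such as myoepithelium, may warrant an area-overlap rule instead.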
In addition to this more general cell-centric approach, we developed custom tools for capturing specific morphologic and phenotypic attributes of the thin myoepithelial monolayer encapsulating ductal epithelial cells and of the structure of stromal collagen (TME morphometrics).
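As an illustration only, myoepithelial continuity could be scored as the fraction of a duct's outer boundary that is covered by myoepithelium-positive pixels. The sketch below assumes binary duct and myoepithelium masks are already available (for example, from pixel clustering); the one-pixel boundary definition and the function names are hypothetical and are not the TME morphometrics implementation.

```python
import numpy as np


def erode4(mask):
    """One-pixel binary erosion with a 4-connected (cross-shaped) neighborhood."""
    m = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    return (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
            & m[1:-1, :-2] & m[1:-1, 2:])


def myoepithelial_continuity(duct_mask, myo_mask):
    """Fraction of duct-boundary pixels that are myoepithelium-positive."""
    duct = np.asarray(duct_mask, dtype=bool)
    # Boundary = duct pixels removed by a single erosion step.
    boundary = duct & ~erode4(duct)
    if not boundary.any():
        return float("nan")
    return float(np.mean(np.asarray(myo_mask, dtype=bool)[boundary]))
```

A value of 1 means every boundary pixel is covered; lower values indicate gaps. In practice a wider boundary band would tolerate small registration offsets between the duct and myoepithelium masks.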
We then compared these profiles for normal, DCIS, and IBC tissues to address our first question: how do the composition and structure of the TME change with progression to IBC? We applied the Kruskal-Wallis H test to discern which aspects of tissue composition and structure were significantly distinctive of each clinical group (p<0.05, STAR Methods Distinguishing Feature Analysis). This analysis identified 137 parameters that were preferentially enriched or depleted in normal, DCIS, or IBC tissue, with spatially agnostic (cell type, cell state) and spatially informed metrics accounting for 39% and 61% of differentially expressed parameters, respectively.
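The per-feature screen described above can be illustrated with SciPy's `kruskal`. The sketch assumes a samples-by-features matrix and one clinical label per sample and applies the p<0.05 threshold stated in the text; whether a multiple-testing correction is appropriate is left open here, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import kruskal


def distinguishing_features(features, groups, alpha=0.05):
    """Kruskal-Wallis H test per feature across clinical groups.

    features -- (n_samples, n_features) array.
    groups   -- length-n_samples labels, e.g. "normal", "DCIS", "IBC".
    Returns (feature_index, p_value) pairs with p < alpha.
    """
    X = np.asarray(features, dtype=float)
    g = np.asarray(groups)
    labels = np.unique(g)
    hits = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.ptp(col) == 0:
            continue  # a constant feature cannot be tested by kruskal()
        _, p = kruskal(*[col[g == lab] for lab in labels])
        if p < alpha:
            hits.append((j, float(p)))
    return hits
```

Because the test is rank-based, it needs no normality assumption, which suits skewed quantities such as cell frequencies and distances.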
To organize distinguishing features into interpretable TME signatures, we performed k-means clustering to yield four clusters defining the breast tissue states: TME1, TME2, and TME3 uniquely distinguished normal, DCIS, and IBC samples, respectively, and TME4 consisted of features that were specifically depleted in DCIS samples.
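One way to realize this grouping, shown as a hedged sketch with assumed details, is to summarize each distinguishing feature by its mean z-score in normal, DCIS, and IBC samples and then run k-means with k=4 on those per-feature profiles. The profile construction and the function name are assumptions, not the documented procedure.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_feature_profiles(features, groups, group_order, k=4, seed=0):
    """Group features by their mean z-score profile across clinical groups.

    Returns one cluster label per feature (column of `features`).
    """
    X = np.asarray(features, dtype=float)
    g = np.asarray(groups)
    sd = X.std(axis=0)
    z = (X - X.mean(axis=0)) / np.where(sd == 0, 1.0, sd)
    # Row j = mean z-score of feature j in each group (e.g., normal, DCIS, IBC).
    profiles = np.column_stack([z[g == lab].mean(axis=0) for lab in group_order])
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(profiles)
```

Z-scoring first puts features with very different units (counts, distances, fractions) on a common scale, so the clusters reflect the shape of each feature's profile rather than its magnitude.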
Along these lines, when comparing TME2 and TME3, we noted that, aside from the pathognomonic loss of ductal myoepithelium, the most distinctive property delineating DCIS from IBC samples was an increase in stromal desmoplasia (collagen deposition, CAF frequency, and proliferation). To further evaluate whether these trends reflected changes specific to the interval between a new DCIS diagnosis and ipsilateral invasive relapse, we compared these parameters in a subset of sample pairs in which both DCIS and IBC tissue had been procured longitudinally from the same patient (N=9). We found that the degrees of statistical significance in this lower-powered paired analysis and in the larger unpaired analysis were linearly correlated (R²=0.58, p=3E-15) and that the salient trends reflected in TME2 and TME3 occurred at the patient level.
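The agreement between the paired and unpaired analyses can be expressed as the R² of a linear fit between per-parameter significance values. The sketch below assumes that significance is compared on a -log10(p) scale, which is an assumption; the text reports only the resulting R²=0.58 and p=3E-15. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import linregress


def significance_agreement(p_unpaired, p_paired):
    """R^2 and p-value of a linear fit between -log10 p-values of two analyses."""
    x = -np.log10(np.asarray(p_unpaired, dtype=float))
    y = -np.log10(np.asarray(p_paired, dtype=float))
    fit = linregress(x, y)
    return fit.rvalue ** 2, fit.pvalue
```

Working on the -log10 scale keeps very small p-values from collapsing near zero, so parameters across the full significance range contribute to the fit.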
To quantify how this shift in fibroblast phenotype relates to the extent of stromal desmoplasia, we compared the shape, length, and density of individual collagen fibers with CAF location, frequency, and phenotype.
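Because collagen fibers are undirected, an orientation of 10° is equivalent to 190°, so ordinary variance is a poor summary of fiber alignment. A standard approach from axial circular statistics, shown here as a sketch rather than the morphometric used in this study, doubles each angle before computing the mean resultant length R and reports 1 - R: values near 0 indicate aligned fibers and values near 1 indicate isotropic fibers. Per-fiber angles in degrees from a prior fiber segmentation step are assumed, and the function name is hypothetical.

```python
import numpy as np


def fiber_orientation_variance(angles_deg):
    """Circular variance of undirected fiber orientations (0 = aligned, 1 = isotropic)."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Doubling maps theta and theta + 180 degrees to the same point on the circle.
    mean_resultant = np.abs(np.mean(np.exp(2j * theta)))
    return float(1.0 - mean_resultant)
```

For example, fibers at 10°, 10°, and 190° are perfectly aligned (variance 0), while fibers at 0°, 45°, 90°, and 135° are evenly spread (variance 1).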
Identifying DCIS features correlated with risk of invasive progression. We next leveraged both spatially informed and agnostic parameters to examine our second central question: what distinguishes DCIS lesions that later progress to IBC from those that do not? We compared tissue procured at the time of diagnosis in two sets of patients with primary DCIS. The first set, referred to as “progressor”, consisted of 14 patients who had a subsequent ipsilateral invasive recurrence following a diagnosis of pure DCIS (median time to recurrence=9.1 years). The second set, referred to as “non-progressor”, consisted of 44 patients with pure DCIS who did not have a breast event following tumor resection (median follow-up=11.4 years).
To identify predictive features of the TME, we trained a random forest classifier to predict which patients would relapse with invasive disease based on cell-type prevalence, tissue compartment enrichment, cell-cell proximity, and morphometrics for each sample.
After removing sparse and overly correlated parameters, we randomly split the patient population 80/20 into training and test sets, respectively.
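The filtering and splitting steps can be sketched as follows. The thresholds (nonzero in at least half of samples, |Pearson r| below 0.9), the greedy left-to-right filter, and stratifying the split by outcome are all assumptions made for illustration; the text states only that sparse and overly correlated parameters were removed and that the split was random and 80/20. Function names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def filter_features(X, min_nonzero_frac=0.5, max_abs_corr=0.9):
    """Return indices of features that are neither sparse nor redundant.

    A feature is dropped if it is nonzero in fewer than `min_nonzero_frac`
    of samples, if it is constant, or if its |Pearson r| with an already
    kept feature reaches `max_abs_corr` (greedy, left to right).
    """
    X = np.asarray(X, dtype=float)
    kept = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.mean(col != 0) < min_nonzero_frac or np.std(col) == 0:
            continue
        if all(abs(np.corrcoef(col, X[:, k])[0, 1]) < max_abs_corr for k in kept):
            kept.append(j)
    return kept


def split_80_20(X, y, seed=0):
    """80/20 train/test split, stratified by outcome (stratification is assumed)."""
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)
```

Stratifying keeps the progressor fraction similar in both sets, which matters when one outcome group is much smaller than the other.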
To understand the biology leveraged by the model to accurately discriminate pre-invasive from indolent DCIS tumors, we ranked the top 20 features by Gini importance. These features primarily consisted of metrics related to the phenotype of the myoepithelium and the spatial distribution of multiple immune cell subsets.
Myoepithelial breakdown and phenotypic change between progressors and non-progressors. In the above analysis, myoepithelial structure and phenotype were overrepresented among the top Gini-ranked classifier features.
In our analyses comparing normal tissue, DCIS, and IBC, we observed the highest myoepithelial ECAD expression in normal breast tissue.
To understand how this loss might influence recurrence outcomes, we used a method derived from gene set enrichment analysis to identify ontologies that were correlated with high or low myoepithelial character (STAR Methods Feature Ontology Enrichment Analysis). Low scores typical of non-progressors were enriched for parameter ontologies relating to hypoxia, glycolysis, stromal immune density, and desmoplasia/remodeling of the extracellular matrix (ECM).
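The core of a gene set enrichment-style test is a running-sum statistic over a ranked list. The sketch below adapts the unweighted (Kolmogorov-Smirnov-like) form to parameters rather than genes: parameters are assumed to be ranked from most positively to most negatively correlated with the myoepithelial score, and an ontology is a set of parameter names. Positive scores indicate enrichment toward high myoepithelial character, negative scores toward low. This is an assumed simplification; the weighting and significance testing of the STAR Methods procedure are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def enrichment_score(ranked_params, ontology):
    """Unweighted running-sum enrichment score of `ontology` in a ranked list."""
    hits = np.array([p in ontology for p in ranked_params], dtype=bool)
    n_hit = int(hits.sum())
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("ontology must cover some, but not all, parameters")
    # Step up at ontology members and down elsewhere; up-steps and down-steps
    # each total 1 in magnitude, so the walk ends at 0.
    running = np.cumsum(np.where(hits, 1.0 / n_hit, -1.0 / n_miss))
    # The score is the maximum deviation of the running sum from zero.
    return float(running[np.argmax(np.abs(running))])
```

Significance is usually assessed by recomputing the score for many randomly permuted rankings and comparing the observed value with that null distribution.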
Here, we report the first spatial atlas of breast cancer progression. The central focus of this study was to characterize features in primary DCIS that are associated with risk of invasive relapse, in which tumor cells breach the duct and invade the surrounding stroma. Previous work examining breast cancer progression has attributed this transition either to tumor-intrinsic factors or to specific features of stromal cells in the surrounding TME. By simultaneously mapping both of these entities in intact human tissue, we sought to treat the DCIS TME as a single ecosystem in which progression to invasive disease depends on an evolving spatial distribution and function of multiple cell types, rather than on any single cell subset.
Meeting this goal required first assembling a large, well-annotated, and diversified pool of human breast cancer tissue: the RAHBT cohort. This effort was motivated in part by the success of similar efforts investigating invasive disease (METABRIC, TCGA) that have provided deep insights into breast tumor composition and have served as authoritative resources in breast cancer research (Cancer Genome Atlas Network, 2012). The Breast PreCancer Atlas constructed a unique set of archival human surgical resections that captured the full spectrum of breast cancer progression, from normal tissue to primary DCIS and on to patient-paired ipsilateral IBC recurrences. Assembling all these cases into TMAs has enabled a one-of-a-kind workflow for multiomics analyses in which genomic, transcriptomic, and proteomic techniques are performed not only on the same samples but also on co-registered serial sections of the same local region of tissue.
Here, we analyzed these TMAs using MIBI-TOF and a 37-marker staining panel to map breast cancer progression and to understand why some patients with DCIS relapse with invasive disease while others do not. Our results show that coordinated transformation of ductal myoepithelium and surrounding stroma plays a central role in determining clinical outcome by establishing a tumor-permissive niche that favors local invasion. Relative to normal tissue, the thin myoepithelial layer in DCIS samples was less phenotypically diverse and more proliferative.
Typified changes in TME structure and function were not only discriminative of DCIS and IBC, but also separated DCIS progressors from non-progressors. Using 433 spatial and compositional parameters drawn exclusively from original primary DCIS samples, we built a random forest classifier model to predict which patients would relapse with an ipsilateral invasive tumor following initial DCIS diagnosis (AUC=0.74, p=0.02). On examining the relative weighting given to each parameter in the model, two compelling and overarching insights emerged. First, spatially informed metrics relating cell function to structure and morphology were significantly over-represented relative to non-spatial metrics. Second, the most influential features were primarily related to myoepithelium and stroma rather than to the tumor cells themselves.
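The text reports an AUC with an associated p-value but does not say here how that p-value was obtained. One common option, sketched below as an assumption rather than the method actually used, is a label-permutation test: the AUC is computed from held-out scores with the Mann-Whitney rank formulation, and the observed value is compared with AUCs obtained after shuffling the outcome labels. Function names and the number of permutations are hypothetical.

```python
import numpy as np


def auc_score(y_true, scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as one half."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def permutation_p_value(y_true, scores, n_perm=2000, seed=0):
    """One-sided permutation p-value for the observed AUC."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y_true)
    observed = auc_score(y, scores)
    null = np.array([auc_score(rng.permutation(y), scores) for _ in range(n_perm)])
    # Add-one correction so the p-value is never exactly zero.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p)
```

With a small test set, as in an 80/20 split of 58 patients, the permutation null is wide, so a moderate AUC can carry a p-value that is only modestly below 0.05.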
Given its loss in IBC, ductal myoepithelium has long been thought to act as a barrier that deters local invasion by partitioning in situ carcinoma cells away from the surrounding stroma. Initially, we hypothesized that a more intact and robust myoepithelial barrier resembling normal breast tissue would be protective against invasive progression. Surprisingly, however, our data seem to suggest the opposite: DCIS samples with more continuous myoepithelium and high ECAD expression were at higher risk of ipsilateral invasive recurrence following primary DCIS surgical excision. Retention of these normal-like myoepithelial traits correlated with fewer stromal immune cells and CAFs.
Taken together, the analyses reported here deliver a comprehensive, multi-compartmental atlas of preinvasive breast cancer that illustrates the full continuum of tissue structure and function, from a homeostatic state in normal breast through in situ and invasive disease, including matched longitudinal samples. Combining this data set with extensive patient follow-up enabled identification of tumor features that are associated with risk of invasive relapse in DCIS patients and offers a framework for follow-on analysis.
Methods
Patient Cohort. We utilized a retrospective study cohort of patients from the Washington University Resource of Archival Human Breast Tissue (RAHBT) that contained two outcome groups: non-progressors, composed of patients with DCIS who had no new breast event following resection (median follow-up = 11.4 years), and progressors, composed of patients with DCIS who had a new ipsilateral invasive breast cancer event following primary DCIS resection (median time to new event = 9.1 years). For each progressor, we matched two non-progressors who remained free from recurrent lesions, based on age at diagnosis (±5 years) and type of definitive surgery (mastectomy or lumpectomy). For each DCIS diagnosis, we retrieved primary and recurrent tumor slides and blocks for pathology review, acquired a whole-slide image of each sample, marked regions for tissue microarray (TMA) cores, and generated TMA blocks containing 84 cores, each 1.5 mm in diameter, including additional tonsil and normal breast tissue sourced from reduction mammoplasty.
Median age at diagnosis was 54 years, year of diagnosis ranged from 1986 to 2017, and median time to recurrence was 9.1 years for invasive lesions and 5.3 years for pre-malignant lesions. For women in the cohort with no recurrence, follow-up extended to 132 months on average. Treatment of initial DCIS included lumpectomy with radiation (approximately half of cases), lumpectomy without radiation (20%), and mastectomy without radiation (30%). The RAHBT cohort is composed of African American women (26%) and white women (74%).
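The 1:2 matching of non-progressors to each progressor can be sketched as a greedy search. This is an illustration under stated assumptions, not the study's procedure: the `Patient` record is hypothetical, candidates are ranked by closest age, and each non-progressor is used at most once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical patient record used only for this illustration."""
    pid: str
    age: int
    surgery: str  # "mastectomy" or "lumpectomy"


def match_controls(progressors, non_progressors, n_controls=2, max_age_gap=5):
    """Greedily pick up to n_controls non-progressors per progressor with the
    same definitive surgery and an age at diagnosis within +/- max_age_gap."""
    available = list(non_progressors)
    matches = {}
    for case in progressors:
        candidates = [
            c for c in available
            if c.surgery == case.surgery and abs(c.age - case.age) <= max_age_gap
        ]
        # Closest age first; patient ID breaks ties so results are deterministic.
        candidates.sort(key=lambda c: (abs(c.age - case.age), c.pid))
        chosen = candidates[:n_controls]
        for c in chosen:
            available.remove(c)  # each control is used at most once
        matches[case.pid] = [c.pid for c in chosen]
    return matches


cases = [Patient("P1", 54, "lumpectomy"), Patient("P2", 60, "mastectomy")]
pool = [
    Patient("N1", 52, "lumpectomy"),
    Patient("N2", 58, "lumpectomy"),
    Patient("N3", 61, "mastectomy"),
    Patient("N4", 66, "mastectomy"),
    Patient("N5", 70, "mastectomy"),
]
print(match_controls(cases, pool))  # {'P1': ['N1', 'N2'], 'P2': ['N3']}
```

Here P2 receives only one control because no second mastectomy patient falls within five years; the text does not say how such shortfalls were handled.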
Serial sections (5 μm) of each TMA were cut onto glass slides for hematoxylin and eosin (H&E) staining, onto laser-capture slides for LCM-RNAseq (SMART-3SEQ), and onto gold- and tantalum-sputtered slides for MIBI-TOF imaging. H&E slides were inspected by a breast cancer pathologist to assess DCIS purity and to demarcate regions of DCIS to guide MIBI imaging and laser dissection of epithelial and stromal areas. The Stanford Hospital cohort lacked paired LCM-RNAseq analysis.
Antibody Preparation. Antibodies were conjugated to isotopic metal reporters as described previously. Following conjugation, antibodies were diluted in Candor PBS Antibody Stabilization solution (Candor Bioscience). Antibodies were either stored at 4 °C or lyophilized in 100 mM D-(+)-trehalose dihydrate (Sigma Aldrich) with ultrapure distilled H2O for storage at −20 °C. Prior to staining, lyophilized antibodies were reconstituted to a concentration of 0.05 mg/mL in a buffer of Tris (Thermo Fisher Scientific), sodium azide (Sigma Aldrich), ultrapure water (Thermo Fisher Scientific), and antibody stabilizer (Candor Bioscience). Some metal-conjugated antibodies in this study were used as secondary antibodies targeting hapten groups on hapten-conjugated primary antibodies, including the pairs PDL1-Biotin and Anti-Biotin-149Sm, and ER-Alexa488 and Anti-Alexa488-142Nd.
Tissue Staining. Tissues were sectioned (5 μm thick) from tissue blocks onto gold- and tantalum-sputtered microscope slides. Slides were baked at 70 °C overnight, followed by deparaffinization and rehydration with sequential washes in xylene (3×), 100% ethanol (2×), 95% ethanol (2×), 80% ethanol (1×), 70% ethanol (1×), and ddH2O with a Leica ST4020 Linear Stainer (Leica Biosystems). Tissues next underwent antigen retrieval by submerging slides in 3-in-1 Target Retrieval Solution (pH 9, DAKO Agilent) and incubating them at 97 °C for 40 min in a Lab Vision PT Module (Thermo Fisher Scientific). After cooling to room temperature, slides were washed in 1× phosphate-buffered saline (PBS) IHC Wash Buffer with Tween 20 (Cell Marque) with 0.1% (w/v) bovine serum albumin (Thermo Fisher).
Next, all tissues underwent two rounds of blocking, the first to block endogenous biotin and avidin with an Avidin/Biotin Blocking Kit (Biolegend). Tissues were then washed with wash buffer and blocked for 1 h at room temperature with 1× TBS IHC Wash Buffer with Tween 20 containing 3% (v/v) normal donkey serum (Sigma-Aldrich), 0.1% (v/v) cold fish skin gelatin (Sigma Aldrich), (v/v) Triton X-100, and 0.05% (v/v) sodium azide. The first antibody cocktail was prepared in 1× TBS IHC Wash Buffer with Tween 20 with 3% (v/v) normal donkey serum (Sigma-Aldrich) and filtered through a 0.1-μm centrifugal filter (Millipore) prior to incubation with tissue overnight at 4 °C in a humidity chamber. Following the overnight incubation, slides were washed twice for 5 min in wash buffer. On the second day, a second antibody cocktail was prepared as described above and incubated with the tissues for 1 h at 4 °C in a humidity chamber. Following staining, slides were washed twice for 5 min in wash buffer and fixed in a solution of 2% glutaraldehyde (Electron Microscopy Sciences) in low-barium PBS for 5 min. Slides were sequentially washed in PBS (1×), 0.1 M Tris at pH 8.5 (3×), and ddH2O (2×), and then dehydrated by serial washes in 70% ethanol (1×), 80% ethanol (1×), 95% ethanol (2×), and 100% ethanol (2×). Slides were dried under vacuum prior to imaging.
MIBI-TOF Imaging. Imaging was performed using a MIBI-TOF instrument (IonPath) with a Hyperion ion source. Xe+ primary ions were used to sequentially sputter pixels for a given field of view (FOV). The following imaging parameters were used: acquisition setting: 80 kHz; field size: 500 × 500 μm, 1024 × 1024 pixels; dwell time: 5 ms; median gun current on tissue: 1.45 nA Xe+; ion dose: 4.23 nA·h/mm² for 500 × 500 μm FOVs.
Low-level Image Processing and Single-cell Segmentation. Multiplexed image sets were extracted, slide background-subtracted, denoised, and aggregate-filtered as previously described. Nuclear segmentation was performed using an adapted version of the DeepCell (Mesmer) CNN architecture. A cell nuclei (“Nuc”) channel combining HH3 and endogenous phosphorus (P) signal was generated as the nuclear channel input for segmentation, and a combined channel of E-cadherin, PanCK, CD45, CD44, and GLUT1 was used as the membrane channel input. To better capture the range of cell shapes and morphologies present in DCIS, we generated two distinct DeepCell segmentation parameter sets for each image and combined them to improve cell detection accuracy. The first used a radial expansion of two pixels from the nuclear border to generate a cell object and a stringent threshold for splitting cells.
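A minimal sketch of how the two-channel segmentation input and the two-pixel radial expansion could be assembled. It assumes each marker is scaled to its 99th percentile before channels are summed (the source does not state how the combined channels were normalized). The channel order (nuclear, then membrane) follows the layout DeepCell's Mesmer model takes as input; the model call itself is omitted, and scikit-image's `expand_labels` is used here only to illustrate radial expansion from nuclear borders.

```python
import numpy as np
from skimage.segmentation import expand_labels

NUCLEAR = ["HH3", "P"]
MEMBRANE = ["ECAD", "PanCK", "CD45", "CD44", "GLUT1"]


def _scaled(img, pct=99.0):
    """Scale a channel to its percentile so markers contribute comparably (assumed)."""
    img = np.asarray(img, dtype=float)
    hi = np.percentile(img, pct)
    return img / hi if hi > 0 else img


def build_segmentation_input(channels):
    """channels: dict of marker name -> 2-D array.
    Returns an array of shape (1, H, W, 2): [nuclear, membrane]."""
    nuclear = sum(_scaled(channels[m]) for m in NUCLEAR)
    membrane = sum(_scaled(channels[m]) for m in MEMBRANE)
    return np.stack([nuclear, membrane], axis=-1)[np.newaxis, ...]


def cells_from_nuclei(nuclear_labels, expansion_px=2):
    """Grow each labeled nucleus outward by expansion_px pixels; expand_labels
    stops where neighboring labels meet, so cells do not overlap."""
    return expand_labels(nuclear_labels, distance=expansion_px)


rng = np.random.default_rng(0)
channels = {m: rng.random((64, 64)) for m in NUCLEAR + MEMBRANE}
x = build_segmentation_input(channels)
print(x.shape)  # (1, 64, 64, 2)

nuclei = np.zeros((9, 9), dtype=int)
nuclei[4, 4] = 1
print((cells_from_nuclei(nuclei) == 1).sum())  # 13 pixels lie within 2 px of the seed
```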
Single-cell Phenotyping and Composition. Single-cell expression of each marker was measured as the total signal counts in each cell object, normalized by object area. Single-cell data were then linearly rescaled by the average cell area across the cohort and subsequently arcsinh-transformed with a cofactor of 5. All mass channels were then scaled to their 99.9th percentile. To assign each cell to a lineage and subsequent cell type, the FlowSOM clustering algorithm was used in iterative rounds with the Bioconductor “FlowSOM” package in R (v.1.16.0). The first clustering round separated cells into 100 clusters (xdim = 10, ydim = 10), which were assigned to one of five major cell lineages based on well-established combinations of lineage marker expression: epithelial cells (PanCK+, ECAD+, CD45−, CK7+/−, VIM+/−), myoepithelial cells (SMA+, CD45−, PanCK+/−, ECAD+/−, CK5+/−, VIM+/−), fibroblasts (VIM+, PanCK−, ECAD−, CK7−, CD45−, SMA+/−, FAP+/−, CD36+/−), endothelial cells (CD31+, VIM+, PanCK−, ECAD−, CK7−, CD45−, SMA+/−), and immune cells (CD45+, PanCK−, ECAD−). Lineage assignments were verified by reviewing cells from each FlowSOM cluster in image overlays of lineage-defining markers. For clusters with rare, non-canonical combinations of marker expression, assignments were extensively reviewed across images of various tissue types with pathologist assistance, using morphometric and histological organization features in addition to lineage marker expression to accurately phenotype the cells.
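The normalization chain described above can be sketched in NumPy. This assumes the 99.9th-percentile scaling divides each channel by its percentile without clipping larger values, and that the cohort-wide average cell area is the mean over all cells passed in; FlowSOM clustering itself (run in R) is not reproduced.

```python
import numpy as np


def normalize_single_cell(counts, areas, cofactor=5.0, pct=99.9):
    """counts: (n_cells, n_markers) total signal per cell object.
    areas: (n_cells,) cell object areas in pixels."""
    counts = np.asarray(counts, dtype=float)
    areas = np.asarray(areas, dtype=float)
    per_area = counts / areas[:, None]         # normalize by object area
    rescaled = per_area * areas.mean()         # rescale by the average cell area
    transformed = np.arcsinh(rescaled / cofactor)
    scale = np.percentile(transformed, pct, axis=0)
    scale[scale == 0] = 1.0                    # leave all-zero channels unchanged
    return transformed / scale                 # each channel's 99.9th percentile -> 1


rng = np.random.default_rng(1)
counts = rng.poisson(50, size=(1000, 3))
areas = rng.integers(80, 200, size=1000)
expr = normalize_single_cell(counts, areas)
print(expr.shape)  # (1000, 3)
```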
Following lineage assignment, each lineage was subclustered. Immune cell types identified included B cells (CD20+, CD4+/−), CD4 T cells (CD4T; CD3+, CD4+, CD8−/low), CD8 T cells (CD8T; CD3+, CD8+, CD4−/low), monocytes (Mono; CD14+, CD11c−, CD68−, CD3−), monocyte-derived dendritic cells (MonoDCs; CD14+, CD11c+, HLADR+, CD68−, CD3−), dendritic cells (DCs; CD11c+, HLADR+, CD3−), macrophages (Macs; CD68+, HLADR+, CD14+/−), mast cells (Mast; Tryptase+), double-negative T cells (dnT; CD3+, CD4−, CD8−), and HLADR+ antigen-presenting cells (APC; HLADR+, CD45+/low). Immune cells positive only for CD45 were annotated as “immune other”. Neutrophils were rare in the dataset and were assigned last, using a positivity threshold of MPO expression (>0.25) in immune cells. Tumor and fibroblast cells were similarly subclustered to reveal phenotypic subsets, including luminal (ECAD+, PanCK+, CK7+), basal (ECAD+, PanCK+, CK5+), epithelial-to-mesenchymal (EMT; ECAD+/−, PanCK+, VIM+), and CK5/7-low (ECAD+, PanCK+) tumor cells, and normal (VIM+, CD36+), myo- (VIM+, SMA+), resting (VIM+ only), and CAF (VIM+, FAP+) fibroblasts.
Throughout this work, cellular data are presented as (1) the frequency of a cell type within its parental lineage across the entire image (e.g., luminal tumor cells as % of total tumor cells in the image); (2) a cell type's density within a particular compartment of the image (e.g., 50 fibroblasts per mm² of stroma; see Region Masking for compartment definitions); or (3) for immune cells, the frequency of each immune cell type (of total immune cells) calculated for both epithelial and stromal regions (e.g., % macrophages of total epithelial immune cells). To calculate myoepithelial cell density, the number of cells phenotyped as myoepithelium in each image was normalized by the area of the myoepithelial mask in that image.
Region Masking. Region masks were generated to define histologic regions of each FOV, including the epithelium, stroma, myoepithelial (periductal) zone, and duct. We removed gold-positive areas, which mark regions of bare slide from holes in the tissue, to obtain an accurate measurement of tissue area. This area measurement was used to calculate cellular density in specific histologic regions (e.g., fibroblast density in the stroma), normalizing observed cell abundances by the amount of tissue sampled. The epithelial mask was first generated by merging the ECAD and PanCK signals and applying smoothing (Gaussian blur, radius 2 px) and radial expansion (20 px) to incorporate the myoepithelial zone; the insides of ducts were filled. The stromal mask included all image area outside the epithelial mask. Duct masks were generated by eroding the epithelial masks by 25 px. The myoepithelial mask was generated by subtracting the duct mask from the epithelial mask, leaving a ~15 μm-wide periductal ribbon along the duct edge. To calculate the area in each mask, a bare-slide mask was generated from the gold (Au) channel and removed from the measurement, and pixel area was converted to mm² of tissue.
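The masking steps can be sketched with SciPy and scikit-image. Several details are assumptions rather than reported settings: the binarization threshold for the merged ECAD + PanCK image (`threshold`), treating the 2-px blur radius as a Gaussian sigma, disk-shaped structuring elements, and a bare-slide mask already thresholded from the Au channel. The pixel size follows from the imaging parameters (500 μm over 1024 pixels, about 0.49 μm per pixel); `density_per_mm2` illustrates region-normalized densities such as myoepithelial cells per mm² of myoepithelial mask.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gaussian
from skimage.morphology import disk

PIXEL_UM = 500 / 1024  # 500 um FOV over 1024 pixels, ~0.488 um per pixel


def region_masks(ecad, panck, bare_slide, threshold=0.5, blur_sigma=2.0,
                 expand_px=20, erode_px=25):
    """Return boolean epithelium, stroma, duct, and myoepithelium masks with
    bare-slide pixels (from the Au channel) removed."""
    merged = gaussian(np.asarray(ecad, float) + np.asarray(panck, float), sigma=blur_sigma)
    epithelium = merged > threshold                      # hypothetical threshold
    epithelium = ndi.binary_dilation(epithelium, structure=disk(expand_px))
    epithelium = ndi.binary_fill_holes(epithelium)       # fill duct interiors
    duct = ndi.binary_erosion(epithelium, structure=disk(erode_px))
    tissue = ~np.asarray(bare_slide, bool)
    masks = {
        "epithelium": epithelium,
        "stroma": ~epithelium,
        "duct": duct,
        "myoepithelium": epithelium & ~duct,             # periductal ribbon
    }
    return {name: m & tissue for name, m in masks.items()}


def area_mm2(mask):
    """Tissue area of a boolean mask in mm^2."""
    return mask.sum() * (PIXEL_UM * 1e-3) ** 2


def density_per_mm2(n_cells, mask):
    """Cells per mm^2 of a region, e.g. myoepithelial cells per mm^2 of the
    myoepithelial mask."""
    area = area_mm2(mask)
    return n_cells / area if area > 0 else float("nan")


# Synthetic example: one round "duct" of radius 40 px in a 200 x 200 FOV.
yy, xx = np.mgrid[:200, :200]
ecad = (((yy - 100) ** 2 + (xx - 100) ** 2) <= 40 ** 2).astype(float)
masks = region_masks(ecad, np.zeros_like(ecad), np.zeros_like(ecad, dtype=bool))
print({k: int(v.sum()) for k, v in masks.items()})
```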
Cellular Spatial Enrichment Analyses. A spatial enrichment approach was used as previously described to assess enrichment or exclusion across all cell-type pairs. HH3 was excluded from the analysis. For each pair of cell types X and Y, the number of times the centroid of a cell X was within a ~50 μm radius of a cell Y was counted. A null distribution was produced by performing 100 bootstrap permutations in which the locations of cell Y were randomized. A z-score was calculated comparing the number of true co-occurrences of cell X and cell Y with the null distribution. Importantly, symmetry was assumed: the spatial enrichment of cell X near cell Y was taken to equal that of cell Y near cell X. For each pair of cell types, the average z-score was calculated across all DCIS FOVs. To analyze cellular associations with the edge of the epithelium, the distance from each cell centroid to the nearest point on the perimeter of the epithelial mask (described above) was calculated. Cell neighborhoods were produced by first generating a cell neighbor matrix in which each row represents an index cell and the columns give the relative frequency of each cell phenotype within a 36-μm radius of the index cell. Next, the neighbor matrix was partitioned into 10 clusters using k-means clustering; this number of clusters was chosen because it best separated distinct immune cell mixtures and tumor/myoepithelial spatial relationships. The cellular profile of each neighborhood was determined by assessing the mean prevalence of each cell phenotype within a 36-μm radius of the index cell.
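A sketch of the pairwise co-occurrence z-score using a k-d tree. Two choices are assumptions: randomized cell-Y positions are drawn from all cell centroids in the FOV (one common way to randomize locations), and the radius is given in pixels (about 102 px for 50 μm at roughly 0.49 μm per pixel). Because only the Y positions are randomized, the z-score computed this way is not guaranteed to be symmetric in X and Y; the text treats it as symmetric.

```python
import numpy as np
from scipy.spatial import cKDTree


def count_close_pairs(xy_a, xy_b, radius):
    """Count (a, b) pairs whose centroids are within `radius` of each other."""
    if len(xy_a) == 0 or len(xy_b) == 0:
        return 0
    neighbors = cKDTree(xy_b).query_ball_point(xy_a, r=radius)
    return int(sum(len(hits) for hits in neighbors))


def enrichment_z(xy, labels, type_x, type_y, radius_px, n_perm=100, seed=0):
    """Z-score of observed X-Y co-occurrences against a null in which the
    type-Y positions are redrawn from all cell centroids in the FOV."""
    rng = np.random.default_rng(seed)
    xy = np.asarray(xy, dtype=float)
    labels = np.asarray(labels)
    xy_x = xy[labels == type_x]
    n_y = int((labels == type_y).sum())
    observed = count_close_pairs(xy_x, xy[labels == type_y], radius_px)
    null = np.array([
        count_close_pairs(xy_x, xy[rng.choice(len(xy), n_y, replace=False)], radius_px)
        for _ in range(n_perm)
    ])
    sd = null.std()
    return (observed - null.mean()) / sd if sd > 0 else 0.0


rng = np.random.default_rng(3)
xy = np.vstack([
    rng.uniform(0, 1000, size=(400, 2)),   # scattered "other" cells
    rng.normal(200, 10, size=(20, 2)),     # X cells in a tight cluster
    rng.normal(200, 10, size=(20, 2)),     # Y cells in the same cluster
])
labels = np.array(["other"] * 400 + ["X"] * 20 + ["Y"] * 20)
radius_px = 50 / (500 / 1024)              # ~50 um at ~0.488 um per pixel
print(enrichment_z(xy, labels, "X", "Y", radius_px))
```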
Distinguishing Feature Analysis. To determine features that distinguish among normal breast tissue, DCIS, and IBC, means of all 433 features were compared between groups using the Kruskal-Wallis H test. Features with p < 0.05 were subsequently grouped into the four TME clusters using k-means clustering. For paired analyses, feature means were compared between DCIS and IBC samples from the same patient.
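The screening step can be sketched with `scipy.stats.kruskal`. Skipping features with an empty group or identical values everywhere is an assumption (SciPy raises an error when all values are identical), and the subsequent k-means grouping into TME clusters is not shown.

```python
import numpy as np
from scipy.stats import kruskal


def distinguishing_features(features, groups, alpha=0.05):
    """features: dict name -> per-sample values; groups: per-sample labels.
    Returns {name: p-value} for features with Kruskal-Wallis p < alpha."""
    groups = np.asarray(groups)
    levels = np.unique(groups)
    hits = {}
    for name, values in features.items():
        values = np.asarray(values, dtype=float)
        samples = [values[(groups == g) & ~np.isnan(values)] for g in levels]
        if any(len(s) == 0 for s in samples):
            continue                      # a group has no data for this feature
        if np.ptp(np.concatenate(samples)) == 0:
            continue                      # kruskal() rejects all-identical values
        p = kruskal(*samples).pvalue
        if p < alpha:
            hits[name] = p
    return hits


groups = np.array(["normal"] * 10 + ["DCIS"] * 10 + ["IBC"] * 10)
features = {"shifted": np.arange(30.0), "flat": np.ones(30)}
print(sorted(distinguishing_features(features, groups)))  # ['shifted']
```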
ECM Gene Analysis. To analyze ECM components by gene expression, an ECM gene signature (GO ECM structural constituent, GO:0030021) was downloaded from the GSEA website and used to compare samples in the top and bottom quartiles of MIBI-measured cancer-associated fibroblast density in the stroma. Stromal LCM-RNAseq samples were used for this analysis. Raw reads were normalized with the DESeq2 R package (version 1.30.0) (Anders and Huber, 2010), and p-values from a paired t-test were plotted against the log2 ratio of group means to generate the volcano plot.
Myoepithelial Continuity and Thickness Analysis. To define a window for quantifying myoepithelial signal, we used a topology-preserving operation to define a curve 5 pixels outward from the epithelial mask edge (see Region Masking) and a curve 30 pixels inward from the epithelial mask edge; the pixels between these two curves were defined as the myoepithelium mask. We subdivided the outer curve into 5-px arc segments, and for each point on the outer edge between two segments, we found the nearest point on the inner edge, dividing the myoepithelium into a string of quadrilaterals or “wedges”. Wedges were then subdivided along the in-out (of the epithelium) axis into 10 segments. Wedges were merged when both their combined inner and outer edges had an arc length <15 px. We took pre-processed (background-subtracted, denoised) SMA pixels within the mesh and smoothed them with a Gaussian blur of radius 1. We then calculated the density of SMA signal within each mesh segment as the mean pixel value of smoothed SMA within that segment. This density was then binarized to create an SMA-positivity mesh using a threshold of 0.5 (density > 0.5 as positive). The percentage of duct perimeter covered by myoepithelium was calculated by assigning an “SMA-present” variable to each wedge: “0” if no mesh segments in the wedge were positive for SMA, and “1” otherwise. Each wedge was weighted by its area relative to the myoepithelium area. The sum over all wedges of the product of the “SMA-present” variable and the weight was defined as the percent perimeter SMA positivity.
The average (non-zero) thickness of the myoepithelium for each duct was calculated as the weighted average “wedge thickness” of SMA-positive wedges (those with “SMA-present” = 1). Wedge thickness was calculated as the distance between the innermost and outermost positive mesh segments. Positive wedges were weighted by their area relative to the total area of positive wedges. The percent myoepithelium-covered perimeter and average myoepithelial thickness metrics were then averaged over meshes (ducts) in a given image, weighting each duct by the total area of its myoepithelium divided by the summed myoepithelial area of all ducts in the image that met a minimum size filter of 7500 px. To assess the accuracy of the automated thickness and continuity measurements, myoepithelial SMA continuity and thickness were quantified manually in 5 progressor and 5 non-progressor SMA images by a board-certified pathologist using ImageJ, blinded to tumor outcome. For continuity, the total periductal perimeter in each image was first quantified by manually outlining each epithelial region. Gaps in the myoepithelial layer with no discernible SMA signal were then identified along this manual outline, and the length of each gap along the periductal perimeter was measured. Lastly, gap measurements were summed and divided by the total duct perimeter. Smooth muscle thickness was calculated as the average of 10 representative linear measurements.
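The continuity and thickness metrics can be sketched from a precomputed SMA-positivity mesh; constructing the wedges and segments from the mask geometry is not shown. Assumptions: segments are ordered from innermost to outermost, thickness is the inclusive span between the innermost and outermost positive segments multiplied by a nominal segment depth (about 3.5 px if the 35-px window is split into 10 segments), and the percent perimeter is reported on a 0 to 100 scale.

```python
import numpy as np


def duct_myo_metrics(positive, wedge_area, segment_depth_px=3.5):
    """positive: (n_wedges, n_segments) SMA-positivity mesh for one duct, with
    segments ordered innermost -> outermost. wedge_area: (n_wedges,) in pixels.
    Returns (% perimeter SMA positivity, mean non-zero thickness in pixels)."""
    positive = np.asarray(positive, dtype=bool)
    wedge_area = np.asarray(wedge_area, dtype=float)
    sma_present = positive.any(axis=1)                     # 1 if any segment is positive
    weights = wedge_area / wedge_area.sum()                # area relative to myoepithelium
    percent_covered = 100.0 * float(np.sum(weights * sma_present))
    if not sma_present.any():
        return percent_covered, 0.0
    pos = positive[sma_present]
    n_seg = pos.shape[1]
    innermost = pos.argmax(axis=1)
    outermost = n_seg - 1 - pos[:, ::-1].argmax(axis=1)
    thickness = (outermost - innermost + 1) * segment_depth_px   # inclusive span (assumed)
    w = wedge_area[sma_present] / wedge_area[sma_present].sum()  # positive-wedge weights
    return percent_covered, float(np.sum(w * thickness))


def image_myo_metrics(ducts, min_area_px=7500):
    """ducts: list of (percent_covered, thickness, myo_area_px) tuples.
    Averages over ducts weighted by myoepithelial area, skipping small ducts."""
    kept = [d for d in ducts if d[2] >= min_area_px]
    if not kept:
        return float("nan"), float("nan")
    cover, thick, area = (np.array(v, dtype=float) for v in zip(*kept))
    w = area / area.sum()
    return float(np.sum(w * cover)), float(np.sum(w * thick))


mesh = [[0, 1, 1, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
print(duct_myo_metrics(mesh, wedge_area=[1, 1, 2]))
print(image_myo_metrics([(75, 10, 10000), (50, 20, 10000), (0, 0, 100)]))
```

In the example, the third duct is dropped by the 7500-px size filter, so the image-level values are simple averages of the first two.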
Myoepithelial Pixel Clustering Analysis. Pre-processed (background-subtracted, denoised) images were first subset to pixels within the myoepithelium mask (see Region Masking) and then further subset to pixels with SMA expression > 0. For all SMA+ pixels within the myoepithelium mask, a Gaussian blur was applied using a standard deviation of 1.5 for the Gaussian kernel. Pixels were normalized by their total expression such that the total expression of each pixel was equal to 1. Each marker was then normalized to its 99.9th percentile. Pixels were clustered into 100 clusters using FlowSOM (Van Gassen et al., 2015) based on the expression of six markers: PanCK, CK5, vimentin, ECAD, CD44, and CK7. The average expression of each of the 100 pixel clusters was computed, and the z-score of each marker across the 100 pixel clusters was calculated, with a maximum z-score of 3. Using these z-scored expression values, the 100 pixel clusters were hierarchically clustered using Euclidean distance into six metaclusters. SMA+ pixels that were negative for all six markers used for FlowSOM were annotated as the SMA-only metacluster, resulting in a total of seven metaclusters. These metaclusters were mapped back to the original images to generate overlay images colored by pixel metacluster.
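A sketch of the pixel preprocessing and the metaclustering of FlowSOM cluster means with SciPy. Assumptions: each channel is blurred over the whole image before SMA+ myoepithelial pixels are selected, the per-pixel normalization sums over all channels in the stack, and average linkage is used for the hierarchical tree (the linkage method is not stated). The FlowSOM step and the separate SMA-only metacluster are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.ndimage import gaussian_filter


def preprocess_pixels(stack, myo_mask, sma_index, sigma=1.5, pct=99.9):
    """stack: (H, W, n_channels). Returns an (n_pixels, n_channels) matrix of
    SMA+ pixels inside the myoepithelium mask."""
    blurred = np.stack(
        [gaussian_filter(stack[..., c].astype(float), sigma) for c in range(stack.shape[-1])],
        axis=-1,
    )
    keep = np.asarray(myo_mask, dtype=bool) & (stack[..., sma_index] > 0)
    px = blurred[keep]
    totals = px.sum(axis=1, keepdims=True)
    px = np.divide(px, totals, out=np.zeros_like(px), where=totals > 0)  # rows sum to 1
    scale = np.percentile(px, pct, axis=0)
    scale[scale == 0] = 1.0
    return px / scale                                   # per-channel percentile scaling


def metacluster(cluster_means, n_meta=6, z_cap=3.0, method="average"):
    """Hierarchically cluster FlowSOM cluster means (rows) into n_meta groups."""
    m = np.asarray(cluster_means, dtype=float)
    sd = m.std(axis=0)
    sd[sd == 0] = 1.0
    z = np.minimum((m - m.mean(axis=0)) / sd, z_cap)    # z-score per marker, capped at 3
    tree = linkage(z, method=method, metric="euclidean")
    return fcluster(tree, t=n_meta, criterion="maxclust")   # labels 1..n_meta


rng = np.random.default_rng(4)
stack = rng.random((20, 20, 7)) + 0.01                 # channel 0 plays the role of SMA
px = preprocess_pixels(stack, np.ones((20, 20), dtype=bool), sma_index=0)
cluster_means = rng.random((100, 6))                  # stand-in for 100 FlowSOM cluster means
print(px.shape, np.unique(metacluster(cluster_means)))
```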
Collagen Morphometrics. To identify collagen fibers, background-removed Col1 images were first preprocessed: Col1 pixel intensities were capped at 5, gamma transformed (1 of 2), and contrast enhanced. Images were then blurred with a Gaussian filter (sigma = 2). While this process enhances fidelity, it yields less distinct “0-borders”. This effect was mitigated by generating a “0-region” mask and setting all values in that region to 0. Highly localized contrast enhancement was then applied; because raw fiber signal intensity can vary greatly within an FOV, this step helps enhance fiber candidates that are locally recognizable but globally dim. Contrast was then globally enhanced via a reverse gamma transformation (2 of 2). Collagen fiber objects were generated by watershed segmentation of the preprocessed images. An adaptive thresholding method was developed to account for variability in total image intensity across the large dataset. A dilated and an eroded version of each preprocessed image were produced and subjected to multi-Otsu thresholding. Elevation maps for the watershed were generated from the Sobel gradient of a blurred version of the preprocessed images. Once objects were extracted and segmented, length, global orientation, perimeter, and width were computed for each object. Objects covering low-intensity regions of the image were treated as preprocessing artifacts and excluded from averaging. Average collagen fiber length and average collagen branch number were calculated over the entire stromal region. Collagen fiber density (fibers per unit area) and total collagen signal were also calculated in specific histological zones defined by distance from the epithelial mask: the periepithelial stroma (0-20 px from the epithelial edge), mid-stroma (20-60 px), and distal stroma (>60 px).
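A sketch of the distance-based stromal zones and per-zone fiber density, assuming zone boundaries are half-open intervals (lower, upper] in pixel distance from the nearest epithelial pixel and that each fiber is assigned to a zone by its centroid. Densities come out per stromal pixel; dividing by the pixel area in mm² converts them to fibers per mm².

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Zone limits in pixels from the epithelial edge: (lower, upper].
ZONES = {"periepithelial": (0, 20), "mid": (20, 60), "distal": (60, np.inf)}


def stromal_zones(epithelium_mask):
    """Split stroma into zones by distance to the nearest epithelial pixel."""
    epithelium_mask = np.asarray(epithelium_mask, dtype=bool)
    dist = distance_transform_edt(~epithelium_mask)   # 0 inside the epithelium
    stroma = ~epithelium_mask
    return {name: stroma & (dist > lo) & (dist <= hi) for name, (lo, hi) in ZONES.items()}


def fiber_density_by_zone(fiber_centroids, zones):
    """Fibers per stromal pixel in each zone, assigning each fiber by its centroid."""
    rows, cols = np.round(np.asarray(fiber_centroids, dtype=float)).astype(int).T
    density = {}
    for name, mask in zones.items():
        area = int(mask.sum())
        density[name] = int(mask[rows, cols].sum()) / area if area else float("nan")
    return density


epi = np.zeros((5, 100), dtype=bool)
epi[:, :10] = True                                    # epithelium in the first 10 columns
zones = stromal_zones(epi)
print({k: int(v.sum()) for k, v in zones.items()})   # {'periepithelial': 100, 'mid': 200, 'distal': 150}
print(fiber_density_by_zone([(2, 15), (2, 50)], zones))
```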
Collagen fiber-fiber alignment and fiber-epithelial edge alignment were also measured. For fiber-fiber alignment, fibers were filtered for elongated shape (length > 2 × width), and alignment was scored as the normalized total paired squared difference in orientation between each fiber and its k nearest neighbors (k = 4). To account for the elongated shape of these objects, k-nearest neighbors were computed with the ellipsoidal membrane distance, defined as the Euclidean centroid distance minus the portion of that distance lying within the ellipse representation of the object. To compute the myoepithelial-to-fiber (myo-fib) alignment score, the myoepithelial region was identified as the boundary of a manually annotated epithelial mask. This region was then subdivided, and the pieces were labeled as separate objects. The global angle of each object was then compared to the global angles of its k nearest fiber objects using the same metric described for fiber-fiber alignment.
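A sketch of the fiber-fiber alignment score. The normalization is an assumption: squared axial angle differences are averaged over the k = 4 neighbors and divided by (π/2)², so 0 means parallel and 1 means orthogonal. The membrane distance here subtracts only the index fiber's ellipse radius along the direction to each neighbor, with semi-axes taken as half the fiber length and width; angles and centroids must share one coordinate frame. The same `axial_diff` metric would apply to the myo-fib score.

```python
import numpy as np


def axial_diff(a, b):
    """Smallest difference between undirected orientations, in [0, pi/2]."""
    d = np.abs(np.asarray(a) - np.asarray(b)) % np.pi
    return np.minimum(d, np.pi - d)


def ellipse_radius(a, b, theta, phi):
    """Center-to-boundary distance of an ellipse (semi-axes a >= b, major axis
    at angle theta) along direction phi."""
    t = phi - theta
    return a * b / np.sqrt((b * np.cos(t)) ** 2 + (a * np.sin(t)) ** 2)


def alignment_scores(centroids, angles, lengths, widths, k=4):
    """Mean squared orientation difference to the k nearest fibers, divided by
    (pi/2)^2: 0 = all neighbors parallel, 1 = all orthogonal."""
    c = np.asarray(centroids, dtype=float)
    ang = np.asarray(angles, dtype=float)
    half_len = np.asarray(lengths, dtype=float) / 2
    half_wid = np.asarray(widths, dtype=float) / 2
    keep = half_len > 2 * half_wid                # elongated: length > 2 * width
    c, ang, half_len, half_wid = c[keep], ang[keep], half_len[keep], half_wid[keep]
    n = len(c)
    scores = np.full(n, np.nan)
    for i in range(n):
        delta = c - c[i]
        dist = np.hypot(delta[:, 0], delta[:, 1])
        phi = np.arctan2(delta[:, 1], delta[:, 0])
        # Membrane distance: centroid distance minus the part inside fiber i's ellipse.
        mem = np.maximum(dist - ellipse_radius(half_len[i], half_wid[i], ang[i], phi), 0.0)
        mem[i] = np.inf                           # exclude the fiber itself
        nbrs = np.argsort(mem)[:min(k, n - 1)]
        if len(nbrs):
            scores[i] = np.mean(axial_diff(ang[i], ang[nbrs]) ** 2) / (np.pi / 2) ** 2
    return scores


centroids = [(0, 0), (0, 10), (0, 20), (0, 30), (0, 40)]
angles = [0, 0, 0, 0, np.pi / 2]                  # the last fiber is perpendicular
print(alignment_scores(centroids, angles, lengths=[8] * 5, widths=[1] * 5))
```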
Prediction of Recurrence. To predict recurrence, we compared tissue procured at the time of diagnosis from two sets of patients with primary DCIS. The first set, referred to as “progressors”, consisted of 14 patients who had a new ipsilateral invasive breast event following a diagnosis of pure DCIS (median time to recurrence = 9.1 years). The second set, referred to as “non-progressors”, consisted of 44 patients with pure DCIS who did not have a new breast event following primary tumor resection (median follow-up = 11.4 years). For each patient, a vector of summary statistics was generated from MIBI data using only images derived from the original lesion. The cohort was split into training (80%) and test (20%) sets; all model optimization and predictor selection steps used only the training set. Missing values were replaced with the set's predictor mean. Predictors with <12 unique values in the training set were dropped from the analysis. Because correlated parameters could confound predictor importance, we removed them as follows. All predictors were first ranked in importance using a Kolmogorov-Smirnov test between progressors and non-progressors within the training set. Predictors with lower p-values were ranked higher, with ties broken in favor of predictors with greater effect sizes between patient groups. We then quantified pairwise correlation for all predictors (Spearman method), and for each group of highly correlated predictors (R > 0.85), only the highest-ranked predictor was used in the model. We varied this cutoff and found no difference in model accuracy (FIG. S7E). Two-class random forest probability models (ranger package) (Wright and Ziegler, 2017) were trained to discriminate progressors from non-progressors. Hyperparameters were tuned on the training set to minimize out-of-bag error.
The optimized random forest model was evaluated on the test set, and a receiver operating characteristic curve was generated to calculate the area under the curve (pROC package) (Robin et al., 2011) from the model's assigned probability scores. Each predictor's importance in the model was evaluated by its Gini index. All analyses were repeated with 10 distinct random seeds for partitioning patients into training and test sets. For each seed, we additionally trained models using randomly permuted patient group labels.
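The training-set predictor selection can be sketched with SciPy; the random forest (ranger in R), the 80/20 split, and the ROC analysis are not reproduced. Assumptions: the tie-breaking effect size is the absolute difference in group medians, the correlation cutoff is applied to |ρ|, and Spearman's ρ is computed as the Pearson correlation of ranks.

```python
import numpy as np
from scipy.stats import ks_2samp, rankdata


def _abs_spearman(u, v):
    """|Spearman rho| computed as the Pearson correlation of ranks."""
    return abs(np.corrcoef(rankdata(u), rankdata(v))[0, 1])


def select_predictors(X_train, y_train, min_unique=12, r_cut=0.85):
    """X_train: (n_patients, n_features), NaN for missing values.
    y_train: True for progressors. Returns kept column indices, most important first."""
    X = np.array(X_train, dtype=float)
    y = np.asarray(y_train, dtype=bool)
    # Mean imputation using training-set column means.
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.nanmean(X, axis=0)[cols]
    # Drop predictors with too few distinct values.
    candidates = [j for j in range(X.shape[1]) if len(np.unique(X[:, j])) >= min_unique]

    def rank_key(j):
        p = ks_2samp(X[y, j], X[~y, j]).pvalue
        effect = abs(np.median(X[y, j]) - np.median(X[~y, j]))  # tie-breaker (assumed)
        return (p, -effect)

    ranked = sorted(candidates, key=rank_key)
    # Keep only the highest-ranked member of each highly correlated group.
    kept = []
    for j in ranked:
        if all(_abs_spearman(X[:, j], X[:, i]) <= r_cut for i in kept):
            kept.append(j)
    return kept


rng = np.random.default_rng(6)
y = np.zeros(60, dtype=bool)
y[:20] = True                                        # 20 "progressors"
X = rng.normal(size=(60, 5))
X[:, 1] = 2 * X[:, 0] + 1e-6 * rng.normal(size=60)   # near-duplicate of column 0
X[:, 2] += 3 * y                                     # informative predictor
X[:, 3] = rng.integers(0, 3, size=60)                # only 3 distinct values
X[5, 4] = np.nan                                     # a missing value
print(select_predictors(X, y))
```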
Myoepithelial Immunofluorescence ECAD Quantification. To identify myoepithelial regions of interest, the SMA channel was first passed through a Gaussian filter, and its maximum intensity was capped to mitigate intense autofluorescent signals. Next, after a locally scaled gamma transform to enhance ridge-like features, the channel was passed through a Meijering ridge filter. To identify candidate myoepithelial “ridges”, the channel was thresholded and all objects were labeled. To filter out distant candidates, the distance from each candidate to a manually annotated mask of the epithelium was measured and gated, so that only ridges within 80 px were classified as myoepithelial region. The co-expression of SMA and ECAD was measured in these regions.
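A sketch of the ridge-based region extraction with scikit-image and SciPy. `meijering` is scikit-image's Meijering neuriteness filter; the remaining choices are assumptions: the cap percentile, a global gamma in place of the locally scaled transform, the ridge threshold, and the filter scales (`sigmas`). The SMA/ECAD co-expression measurement on the resulting region is not shown because its exact definition is not given.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.exposure import adjust_gamma
from skimage.filters import gaussian, meijering
from skimage.measure import label


def myo_ridge_region(sma, epithelium_mask, cap_pct=99.5, gamma=0.5,
                     ridge_thresh=0.3, sigmas=(1, 2, 3), max_dist_px=80):
    """Boolean mask of SMA ridge objects lying within max_dist_px of the
    annotated epithelium mask."""
    img = gaussian(np.asarray(sma, dtype=float), sigma=1)
    img = np.minimum(img, np.percentile(img, cap_pct))    # cap bright autofluorescence
    if img.max() > 0:
        img = img / img.max()
    img = adjust_gamma(img, gamma)                         # global gamma (assumed)
    ridges = meijering(img, sigmas=sigmas, black_ridges=False)  # bright ridges
    candidates = label(ridges > ridge_thresh)
    n = candidates.max()
    if n == 0:
        return np.zeros(candidates.shape, dtype=bool)
    dist = ndi.distance_transform_edt(~np.asarray(epithelium_mask, dtype=bool))
    # Closest approach of each candidate ridge to the epithelium.
    closest = np.asarray(ndi.minimum(dist, labels=candidates, index=np.arange(1, n + 1)))
    keep = np.concatenate([[False], closest <= max_dist_px])   # index 0 = background
    return keep[candidates]


sma = np.zeros((300, 300))
sma[130, 130:170] = 1.0        # ridge hugging the epithelium
sma[10, 10:290] = 1.0          # ridge far from the epithelium
epi = np.zeros((300, 300), dtype=bool)
epi[140:160, 140:160] = True
region = myo_ridge_region(sma, epi, cap_pct=100)
print(region[130, 150], region[10, 150])
```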
Myoepithelial Feature Linear Discriminant Analysis (LDA). All myoepithelial features were selected and standardized (mean subtracted and divided by the standard deviation). DCIS (primary and recurrent) samples were defined as training data, while normal samples were defined as the test set. We then applied a dimensionality reduction technique based on LDA to the DCIS-only training set to capture the main differences in myoepithelial character between progressors and non-progressors. This supervised method finds the optimal linear combination of a subset of features that maximizes the separation between pre-labeled classes. By combining the myoepithelial features with a progressor/non-progressor label, we separated the DCIS patients in a one-dimensional LDA-generated space (LD1 coordinate) according to their progression status. LD1 is therefore the optimized linear combination of myoepithelial- and SMA-related features for separating progressors from non-progressors. We then calculated LD1 values for our test data (the normal samples) using the trained model. The code for this LDA-based method was provided by Tsai et al. (2020) and was made available on GitHub. p-values for comparing LD1 distributions between sample types were calculated with the Kruskal-Wallis H test using the Matlab function kruskalwallis.
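A two-class Fisher discriminant gives the flavor of the LD1 construction. This is a plain NumPy sketch, not the Tsai et al. (2020) code: it standardizes with training-set statistics (an assumption), uses all supplied features rather than selecting a subset, and uses a pseudo-inverse of the pooled within-class scatter to tolerate near-singular matrices.

```python
import numpy as np
from scipy.stats import kruskal


def fit_lda_axis(X_train, y_train):
    """Two-class Fisher discriminant on standardized features.
    Returns (mean, sd, w) such that LD1 = ((X - mean) / sd) @ w."""
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=bool)
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    Z = (X - mean) / sd
    m1, m0 = Z[y].mean(axis=0), Z[~y].mean(axis=0)
    # Pooled within-class scatter matrix.
    s1 = np.atleast_2d(np.cov(Z[y], rowvar=False)) * (y.sum() - 1)
    s0 = np.atleast_2d(np.cov(Z[~y], rowvar=False)) * ((~y).sum() - 1)
    w = np.linalg.pinv(s1 + s0) @ (m1 - m0)     # Fisher direction
    return mean, sd, w / np.linalg.norm(w)


def ld1(X, mean, sd, w):
    """Project samples onto the discriminant axis."""
    return ((np.asarray(X, dtype=float) - mean) / sd) @ w


rng = np.random.default_rng(5)
X_prog = rng.normal(size=(15, 4))
X_prog[:, 0] += 2                               # progressors differ in feature 0
X_non = rng.normal(size=(30, 4))
X_dcis = np.vstack([X_prog, X_non])
y = np.array([True] * 15 + [False] * 30)
mean, sd, w = fit_lda_axis(X_dcis, y)
X_normal = rng.normal(size=(10, 4))             # "test" samples projected afterwards
scores = ld1(X_dcis, mean, sd, w)
print(kruskal(scores[y], scores[~y], ld1(X_normal, mean, sd, w)).pvalue)
```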
Feature Ontology Enrichment Analysis. Using DCIS samples only, we calculated the correlation of each feature with LD1, excluding the 21 features used to define LD1 in the LDA analysis described above. We then sorted the features by their correlation with LD1 to create a ranked list. Features were also annotated as belonging to one (or none) of the following functional modules or pathways: Desmoplasia and ECM remodeling (terms: CAFs, MMP9 expression, collagen deposition and fibers), Immune: immunoregulation (immune cells+PD1/PDL1/IDO1/COX2), Lipid metabolism (CD36), Lymphoid: growth/proliferation (CD4T, CD8T, B cell, dnT cell+Ki67/pS6), Myeloid: growth/proliferation (Macs, Mono, MonoDC, DC, APC+Ki67/pS6), Immune density in stroma (immune cell+stroma density), Stroma: growth/proliferation (Fibroblast or endothelium+Ki67/pS6), Tumor: ER/AR/HER2 expression (tumor+ER/AR/HER2), Tumor: immunoregulation (tumor+PDL1/IDO1/COX2), Tumor: growth/proliferation (tumor+Ki67/pS6), and Hypoxia and Glycolysis (HIF1a+GLUT1). This ranked list, combined with the pathway annotations, was used to perform gene set enrichment analysis (GSEA) with the R package FGSEA. This procedure identified functionally related groups of features that were enriched either among features highly correlated with LD1 or among features significantly anti-correlated with LD1.
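The core of the enrichment step is the GSEA running-sum statistic. A sketch of the enrichment score for one module follows; FGSEA additionally normalizes scores and estimates p-values by permutation, which is omitted here. The weighting exponent `p = 1` follows the standard GSEA definition, and features are ranked by descending correlation with LD1.

```python
import numpy as np


def enrichment_score(scores, in_module, p=1.0):
    """GSEA running-sum enrichment score for one feature module."""
    scores = np.asarray(scores, dtype=float)
    in_module = np.asarray(in_module, dtype=bool)
    order = np.argsort(-scores, kind="stable")      # most positively correlated first
    s, hit = scores[order], in_module[order]
    n_miss = int((~hit).sum())
    if hit.sum() == 0 or n_miss == 0:
        raise ValueError("module must contain some, but not all, features")
    hit_w = np.abs(s) ** p * hit
    if hit_w.sum() == 0:
        hit_w = hit.astype(float)                   # all hit scores are zero: equal weights
    # Hits step up in proportion to |score|**p; misses step down equally.
    steps = np.where(hit, hit_w / hit_w.sum(), -1.0 / n_miss)
    running = np.cumsum(steps)
    return float(running[np.argmax(np.abs(running))])


corr_with_ld1 = [0.9, 0.8, 0.1, -0.5, -0.7]
print(enrichment_score(corr_with_ld1, [True, True, False, False, False]))
print(enrichment_score(corr_with_ld1, [False, False, False, True, True]))
```

A module concentrated at the top of the ranking gives a score near +1, and one concentrated at the bottom gives a score near -1.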
Statistical Analysis. All statistical analyses were performed using GraphPad Prism (9.1.0), Matlab (2016b), or R (1.2.5033). Grouped data are presented with individual sample points throughout; where this is not applicable, data are presented as mean and standard deviation. To determine significance, grouped data were first tested for normality with the D'Agostino & Pearson omnibus normality test. Normally distributed data were compared between two groups with the two-tailed Student's t-test. Non-normal data were compared between two groups using the Mann-Whitney test. Multiple groups were compared using the Kruskal-Wallis H test, with Q-values used for feature selection.
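The two-group decision rule can be sketched with SciPy, where `normaltest` implements D'Agostino and Pearson's omnibus test. Treating the reported Q-values as Benjamini-Hochberg adjusted p-values is an assumption; Prism offers several FDR procedures.

```python
import numpy as np
from scipy.stats import mannwhitneyu, normaltest, ttest_ind


def compare_two_groups(a, b, alpha=0.05):
    """Two-tailed Student's t-test if both groups pass the D'Agostino-Pearson
    omnibus test at `alpha`, otherwise a two-sided Mann-Whitney test."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if normaltest(a).pvalue > alpha and normaltest(b).pvalue > alpha:
        return "t-test", float(ttest_ind(a, b).pvalue)   # equal_var=True: Student's
    return "Mann-Whitney", float(mannwhitneyu(a, b, alternative="two-sided").pvalue)


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed meaning of 'Q-values')."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(q, 1.0)
    return out


rng = np.random.default_rng(7)
skewed_a = rng.exponential(size=200)
skewed_b = rng.exponential(size=200) + 1.0
print(compare_two_groups(skewed_a, skewed_b))
print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```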
Software. Image processing was conducted with Matlab 2016a and Matlab 2019b. Data visualization and plots were generated in R with the ggplot and pheatmap packages, in GraphPad Prism, and in Python using the scikit-image, matplotlib, and seaborn packages. Representative images were processed in Adobe Photoshop CS6. Schematic visualizations were produced with BioRender. R packages used for GSEA were AnnotationDbi (1.52.0), org.Hs.eg.db (3.12.0), clusterProfiler (3.19.0), and msigdbr (7.2.1), for C2 curated gene sets. Python packages used for spatial enrichment analysis and collagen morphometrics were scikit-image, pandas, numpy, xarray, scipy, and statsmodels.
Data and Code Availability. All custom code used to analyze data is available through our GitHub repository. All processed images and annotated single-cell data will be made available on a Human Tumor Atlas Network public repository and are available as single-marker TIFFs in a public Zenodo repository.
This application claims the benefit of PCT Application PCT/US2021/062909, filed Dec. 10, 2021, which claims the benefit of U.S. Provisional Patent Application No. 63/123,905, filed Dec. 10, 2020, which applications are incorporated herein by reference in their entirety.
This invention was made with Government support under contract CA233254 awarded by the National Institutes of Health. The Government has certain rights in the invention.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2021/062909 | 12/10/2021 | WO | |

| Number | Date | Country |
|---|---|---|
| 63123905 | Dec 2020 | US |