This invention relates generally to image analysis methods for the assessment of stained tissue sections. More specifically, the present invention relates to methods of preparing data from tissue sections for machine learning using both brightfield and fluorescent imaging.
Tissue sections are commonly made visible in a brightfield microscope using chromogenic dyes. One technique uses immunohistochemistry to localize the dye where a specific biomarker (e.g., a protein or RNA molecule) is present. Tissue sections can also be examined under a fluorescence microscope after staining them with fluorescent dyes. Such dyes can likewise be localized to specific biomarkers using a similar technique referred to as immunofluorescence.
Methods are known to extract the contribution of individual chromogenic dyes from the color image obtained by a brightfield microscope. These methods work only when the number of distinct dyes on the slide does not exceed the number of color channels acquired by the microscope's camera. Since brightfield microscopes typically have RGB cameras, chromogenic dye quantification is typically limited to three dyes (two biomarkers and one counterstain for the nuclei). It is possible to use multispectral imaging to circumvent this limit, but it is a slow technique requiring specialized equipment.
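The dye separation described above is commonly performed by color deconvolution under the Beer-Lambert model: each stain contributes an optical-density vector across the RGB channels, and inverting the resulting 3×3 stain matrix recovers per-stain densities. The sketch below illustrates the idea in NumPy; the hematoxylin and DAB vectors are illustrative textbook-style values, not calibrated measurements for any particular scanner.

```python
import numpy as np

def unmix_brightfield(rgb, stain_matrix):
    """Recover per-stain densities from an RGB brightfield image.

    rgb: float array (..., 3) of transmitted intensities in (0, 1].
    stain_matrix: (3, 3), one unit optical-density vector per stain (rows).
    Returns an array of the same shape holding per-stain concentrations.
    """
    od = -np.log10(np.clip(rgb, 1e-6, 1.0))  # Beer-Lambert optical density
    return od @ np.linalg.inv(stain_matrix)  # invert od = conc @ stain_matrix

# Illustrative (uncalibrated) unit OD vectors: hematoxylin, DAB, and a
# residual third channel chosen orthogonal to the first two.
h = np.array([0.650, 0.704, 0.286]); h /= np.linalg.norm(h)
d = np.array([0.268, 0.570, 0.776]); d /= np.linalg.norm(d)
r = np.cross(h, d); r /= np.linalg.norm(r)
M = np.stack([h, d, r])
```

Because an RGB camera supplies only three channels, the stain matrix cannot exceed 3×3, which is precisely the three-dye limit noted above.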
In contrast, fluorescence microscopy is limited by the availability of fluorescent dyes with emission spectra that do not overlap. Each dye is typically imaged consecutively and independently. For the special case of one red, one green, and one blue dye, one can use a special filter cube and an RGB camera to image all three dyes simultaneously. It is also possible to measure the emission spectra of the various dyes and use that information to remove the channel cross-talk (caused by overlapping emission spectra), thereby increasing the number of dyes that can be used together on the same tissue section.
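The cross-talk removal just described is linear spectral unmixing: each pixel's channel vector is modeled as a linear combination of the measured dye signatures and solved by least squares. A minimal NumPy sketch, in which the signatures are assumed to be already measured (the numbers used below are invented for illustration):

```python
import numpy as np

def unmix_fluorescence(channels, spectra):
    """Estimate per-dye abundances from multi-channel fluorescence data.

    channels: (H, W, C) measured intensities in C detection channels.
    spectra:  (D, C) measured emission signature of each of D dyes;
              rows may overlap (cross-talk) but must be linearly independent.
    Returns (H, W, D) least-squares dye abundances.
    """
    flat = channels.reshape(-1, channels.shape[-1])
    # Solve flat.T ≈ spectra.T @ abundance.T for all pixels at once.
    abundance, *_ = np.linalg.lstsq(spectra.T, flat.T, rcond=None)
    return abundance.T.reshape(channels.shape[:-1] + (spectra.shape[0],))
```

With D dye signatures and C ≥ D channels, the least-squares solve removes the overlap between emission spectra, which is what lets more dyes share one tissue section.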
Brightfield and fluorescence microscopes have much in common, and fluorescence microscopes usually include a brightfield mode. The two imaging modalities require different forms of illumination, and the fluorescence modality adds filters to the light path. Sometimes a different camera will also be selected, though this is not necessary. Although whole slide scanners exist that can scan a slide in both brightfield and fluorescence modalities, technical differences between the modalities may not allow optimal approaches for interpreting staining from both modalities simultaneously.
Machine learning comprises a group of methods and algorithms for teaching a computer to distinguish classes of objects. In the case of tissue sections, one could use these methods to teach a computer to distinguish different cell types, or to determine whether a tissue sample is healthy or cancerous. Machine learning methods range from very simple ones, such as a linear classifier or a decision tree, to more complex ones, such as random forests, support vector machines, or neural networks, including convolutional neural networks. Deep learning, a term in common use today, refers to deep neural networks (networks with many hidden layers) and is consequently a form of machine learning.
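As a concrete instance of the simplest method mentioned above, the sketch below fits a linear classifier by least squares on ±1 labels; in this context the feature vectors would be per-cell measurements extracted from an image. This is an illustrative toy, not the method of any particular embodiment.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear classifier: find w so that sign(Xb @ w) ≈ y (±1)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y.astype(float), rcond=None)
    return w

def predict_linear(X, w):
    """Predict ±1 labels from the fitted weight vector."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

A decision tree, random forest, or neural network could be substituted for the fit/predict pair without changing the surrounding workflow.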
In accordance with the embodiments herein, methods are described for preparing data from tissue sections for use in machine learning using both brightfield and fluorescent imaging. Generally, the method entails the following eight steps: (i) staining a tissue section with a brightfield stain such that a particular tissue object is stained; (ii) staining the same tissue section with a fluorescent stain such that the same tissue object is stained and target cells are identified by the fluorescent stain; (iii) scanning the tissue section in both brightfield and fluorescence to create two images; (iv) quantifying and identifying cells within the brightfield image; (v) creating a data set from a subset of the identified cells; (vi) aligning the fluorescent image with the brightfield image using the tissue object that is stained in both modalities; (vii) labeling the cells in the data set based on the staining of the target cells in the fluorescent image; (viii) using the labeled cells within the data set for machine learning, for example to train a model to identify the target cells without specific staining. The same process can be followed without identifying individual cells, in which case the method identifies target regions of tissue.
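Step (vii) of the method above amounts to transferring labels from the aligned fluorescent image onto cells already segmented in the brightfield image. A minimal sketch, assuming a per-cell integer mask and a single fluorescence channel (the function name, mask encoding, and threshold rule are hypothetical illustrations, not a prescribed implementation):

```python
import numpy as np

def label_cells_by_fluorescence(fluor, cell_mask, threshold):
    """Mark each segmented cell positive if its mean fluorescence
    in the aligned image meets the threshold.

    fluor:     (H, W) aligned fluorescence channel.
    cell_mask: (H, W) integer label image; 0 = background, 1..N = cells.
    Returns {cell_id: bool}.
    """
    labels = {}
    for cid in np.unique(cell_mask):
        if cid == 0:
            continue  # skip background
        labels[int(cid)] = float(fluor[cell_mask == cid].mean()) >= threshold
    return labels
```

The resulting per-cell booleans are exactly the training labels consumed in step (viii).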
In the following description, for purposes of explanation and not limitation, details and descriptions are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments that depart from these details and descriptions without departing from the spirit and scope of the invention.
For purposes of definition, a tissue object is one or more of a cell (e.g., immune cell), a cell sub-compartment (e.g., nucleus, cytoplasm, membrane, organelle), a cell neighborhood, a tissue compartment (e.g., tumor, tumor microenvironment (TME), stroma, lymphoid follicle, healthy tissue), a blood vessel, a lymphatic vessel, a vacuole, collagen, a region of necrosis, extra-cellular matrix, a medical device (e.g., stent, implant), a gel, a parasitic body (e.g., virus, bacterium), a nanoparticle, a polymer, and/or a non-dyed object (e.g., metal particle, carbon particle). Tissue objects are visualized by histologic stains, which highlight the presence and localization of a tissue object. Tissue objects can be identified directly by stains specifically applied to highlight the presence of said tissue object (e.g., hematoxylin to visualize nuclei, an immunohistochemistry stain for a protein specifically found in a muscle fiber membrane), indirectly by stains that non-specifically highlight the tissue compartment (e.g., DAB background staining), indirectly by biomarkers known to be localized to a specific tissue compartment (e.g., a nuclear-expressed protein, carbohydrates found only in the cell membrane), or without staining at all (e.g., carbon residue in lung tissue).
For the purpose of this disclosure, patient status includes diagnosis of inflammatory status, disease state, disease severity, disease progression, therapy efficacy, and changes in patient status over time. Other patient statuses are contemplated.
In an illustrative embodiment of the invention, the methods can be summarized in the following eight steps: (i) staining a tissue section with a brightfield stain such that a particular tissue object is stained; (ii) staining the same tissue section with a fluorescent stain such that the same tissue object is stained and target cells are identified by the fluorescent stain; (iii) scanning the tissue section in both brightfield and fluorescence to create two images; (iv) quantifying and identifying cells within the brightfield image; (v) creating a data set from a subset of the identified cells; (vi) aligning the fluorescent image with the brightfield image using the tissue object that is stained in both modalities; (vii) labeling the cells in the data set based on the staining of the target cells in the fluorescent image; (viii) using the labeled cells within the data set for machine learning. This illustrative embodiment of the invention is summarized in
In some embodiments the subset of identified cells is all of the identified cells. In other embodiments, the machine learning is training a machine learning model to identify the target cells, testing the machine learning model, or validating the trained machine learning model.
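When the residual misalignment between the two scans is a pure translation, the alignment of step (vi) can be estimated by phase correlation on a channel containing the commonly stained tissue object. A NumPy sketch for integer pixel shifts follows; real scanner pairs may additionally require rotation or scale handling, which this sketch does not cover.

```python
import numpy as np

def phase_correlation_shift(ref, mov):
    """Estimate the integer (dy, dx) by which `mov` is shifted relative to `ref`.

    The peak of the normalized cross-power spectrum gives the translation.
    """
    R = np.conj(np.fft.fft2(ref)) * np.fft.fft2(mov)
    R /= np.abs(R) + 1e-12                    # keep only phase information
    corr = np.fft.ifft2(R).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    dims = np.array(ref.shape)
    peak[peak > dims // 2] -= dims[peak > dims // 2]  # wrap to signed shifts
    return tuple(int(p) for p in peak)
```

The recovered shift would then be applied to the fluorescent image before transferring labels onto the brightfield-identified cells.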
In further embodiments, the machine learning model is used to identify a patient status for a patient from whom the tissue section was taken or for a separate patient not associated with the tissue section used to train the machine learning model. This embodiment can be used to create a “synthetic stain”, a markup of a digital image of a tissue section that has not been stained to cause the cells within that digital image to appear as if they had been stained.
Another embodiment of the invention is illustrated in
In further embodiments, the machine learning is training a machine learning model to identify the target tissue region, testing the machine learning model, or validating the trained machine learning model.
In further embodiments, the machine learning model is used to identify a patient status for a patient from whom the tissue section was taken or for a separate patient not associated with the tissue section used to train the machine learning model. This embodiment can be used to create a “synthetic stain”, a markup of a digital image of a tissue section that has not been stained to cause the cells within that digital image to appear as if they had been stained.
Any of these embodiments can be used with multiple tissue sections to feed into the data set. This improves the accuracy and precision of the machine learning.
This application is a continuation-in-part (CIP) of U.S. Ser. No. 16/271,525, filed Sep. 30, 2020, and titled “Methods for Identification of Tissue Objects in IHC Without Specific Staining”, which is a CIP of U.S. Ser. No. 15/396,552, filed Dec. 31, 2016, and titled “METHODS FOR DETECTING AND QUANTIFYING MULTIPLE STAINS ON TISSUE SECTIONS”; the contents of each of which are hereby incorporated by reference.
|        | Number   | Date     | Country |
| ------ | -------- | -------- | ------- |
| Parent | 16271525 | Feb 2019 | US      |
| Child  | 17576349 |          | US      |
| Parent | 15396552 | Dec 2016 | US      |
| Child  | 16271525 |          | US      |