This invention relates to methods for processing in-situ (spatial) molecular data from one or more of genomics, transcriptomics, proteomics, and other related omics, as well as patient metadata including but not limited to one or more of demographics, medical records, and other information, for use in analyzing biological tissues. Analysis includes statistical methods, machine learning and artificial intelligence, and the use of neural networks to create classifiers that stratify patients. Applications may include biomarker discovery and diagnostics that identify states of disease, predict response to therapies, predict disease relapse and recurrence, predict acquired drug resistance, and identify treatment strategies.
Cancer continues to burden the global healthcare system, and personalized medicine has had limited success. Targeted therapies benefit only the select population of patients amenable to those targets. This has led to high demand for biomarkers used to stratify patients into groups such as responders and non-responders. For this reason, Companion Diagnostics (CDx) are the delivery vehicle of personalized medicine; however, today's CDx tests suffer from low accuracy, and patients who receive targeted therapies regularly develop acquired drug resistance to those treatments. Today's biomarkers and CDx tests are inadequate to deliver the promise of personalized medicine, and new technologies and methods are therefore required.
Initially driven by the field of immuno-oncology, demand has been growing for technologies that integrate molecular analysis (omics) with pathology (imaging). This in-situ molecular information may provide insight into the immune response in tumors, uncover interactions within the tissue microenvironment, and resolve heterogeneity across the tissue. This has led to the emerging field of Spatial Genomics and Transcriptomics, involving the in-situ mapping of omics data across tissues. Biomarkers have historically been based on analysis of entire tissue sections, where the molecular signal is averaged when cells are lumped together and analyzed as a group. However, new biomarkers are required that leverage advances in in-situ, spatial analysis and take into account a molecular signal, where it is located in a tissue, and its proximity to and interaction with the signals around it. This new genre of biomarker, “spatial biomarkers”, may provide insights into immuno-oncology, tumorigenesis, and many fields beyond oncology, such as developmental biology, Alzheimer's disease, and more. In oncology, spatial biomarkers have the potential to more accurately stratify patients into groups of responders and non-responders as well as to predict recurrence, drug resistance, and more.
In one aspect, the invention provides for the identification of biomarkers based on complex networks of signatures from tissues that may include molecular data, as well as patient metadata, such as health records and demographics. These complex biomarkers encompass details of the tissue microenvironment as well as cell-cell interactions which are crucial for more accurately describing biological systems. System-level signatures are necessary to increase the accuracy of companion diagnostic tests in immuno-oncology, as well as for other diseases and in basic research applications.
In another aspect, the invention enables analysis of spatial molecular information in a way that reduces the required computational resources compared to other methods, and enables neural networks to process spatial data while preserving macro and micro-level trends across the geography of a tissue. Classifiers built using spatial data have the potential to be used to classify disease states to identify and predict disease progression, as well as identify patient populations who may respond to treatment.
In another aspect, the invention provides strategies for augmenting spatial data to a higher or lower resolution, as needed, in order to better characterize tissues and increase the accuracy or probability of identifying biomarkers.
Additional objects, advantages and novel features of the present invention will be set forth in part in the description which follows, and will in part become apparent to those in the practice of the invention, when considered with the attached figures.
Some embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like references may indicate similar elements and in which:
It is to be noted that the term “a” or “an” entity refers to one or more of that entity; for example, “an antibody” is understood to represent one or more antibodies. As such, the terms “a” (or “an”), “one or more,” and “at least one” can be used interchangeably herein.
Tissue analysis system 10 may generally comprise a computing device 12, which is in communication with a data storage device 14, and which optionally may be in communication with one or more clinical devices, such as a physician's or technician's computer 16, and/or data storage device 18. The communication between the aforementioned devices may be achieved using a wireless, wired, or other type of physical connection, such as a Universal Serial Bus (USB) connector or cable, an IEEE 802.3 (Ethernet) network interface, or other suitable interface or adapter. Clinical devices may also be any type of data storage media, such as magnetic and solid state disk drives, optical media, or network file shares.
Computing device 12 is configured to run one or more software applications for performing the methods and processes described herein. As used herein, the term “software application” or “application” refers to computer-executable instructions or an algorithm stored in a non-transitory medium, such as a non-volatile memory, and executed by a computer processor. The computer processor, when executing the instructions, may receive inputs and transmit outputs to any of a variety of input or output devices to which it is coupled to perform the method described herein.
Data storage device 14 may be a non-volatile data store coupled to computing device 12. For example, data storage device 14 may be an external storage device locally coupled to computing device 12, or an internal data storage device such as a hard drive. In some cases, computing device 12 may be coupled to a networked remote data storage device or server 20 via a data communication network 22. Data communication network 22 may be a private data communication network, such as a local area network or wide area network, or a public data communication network, such as the Internet. An exemplary computing device will be described below with reference to
As discussed in greater detail below, computing device 12 is provided with a software application for performing tissue analysis. In operation, the tissue analysis application can be used to retrieve the data collection, e.g., from data storage device 14, and to generate a user interface to facilitate tissue analysis of the data collection as described further herein.
Attention is now turned to
An exemplary embodiment of in-situ molecular data structured for analysis by neural networks is shown in
In accordance with an aspect of the present invention, each tissue sample 101 is tracked from removal from the patient to analysis. Collection software correlates the spatial location of the tissue sample 101 to a position on a collection plate or tube. This plate or tube is then tracked through the analysis process. An exemplary dataset may include data correlated to each unit 105 such as transcriptomic data 107, genomic data 108, proteomic data 109, and/or other modalities such as methylation, epigenetic, and glycosylation data, as well as patient metadata such as medical records and demographic data. Markers or signatures 111 in the data 107, 108, 109 may be identified, such as via a neural network, based on correlations between locations of units 105, relative measurements, or other factors.
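The tracking of spatial units to collection-plate positions can be pictured with a small record structure. This is only a sketch under stated assumptions: the `UnitRecord` class, the `assign_wells` helper, and the use of a single 96-well plate filled in row-major order are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
import string


@dataclass(frozen=True)
class UnitRecord:
    """Hypothetical link between one spatial unit and its collection-plate well."""
    sample_id: str
    row: int   # spatial row of the unit within the tissue sample
    col: int   # spatial column of the unit within the tissue sample
    well: str  # plate position, e.g. "A1"


def assign_wells(sample_id, units, plate_rows=8, plate_cols=12):
    """Assign units to wells of one plate in row-major order (A1 ... A12, B1 ...)."""
    if len(units) > plate_rows * plate_cols:
        raise ValueError("more units than wells on a single plate")
    records = []
    for i, (r, c) in enumerate(units):
        well = f"{string.ascii_uppercase[i // plate_cols]}{i % plate_cols + 1}"
        records.append(UnitRecord(sample_id, r, c, well))
    return records


# Usage: a 2 x 7 grid of units from a hypothetical sample "S1"
records = assign_wells("S1", [(r, c) for r in range(2) for c in range(7)])
by_well = {rec.well: (rec.row, rec.col) for rec in records}  # reverse lookup
print(records[12].well, by_well["B1"])  # B1 (1, 5)
```

The reverse lookup is what lets a measurement read from a well be mapped back to its location in the tissue.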
As will be explained in greater detail below, in one example, patient metadata is incorporated into the dataset such that, if a tissue sample has, for example, multiplexed gene data 108, the patient metadata is used to augment that gene data (for example adding the metadata to gene data 108) so that the neural network uses the metadata as another variable to train against. In an alternative example, patient metadata is incorporated with the neural network conclusions so that if a classifier identifies a genetic signature 111 within the gene data 108, the genetic signature 111 is then correlated to the patient metadata.
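One way to read the first option (metadata as an additional variable to train against) is to append a numerically encoded patient-metadata vector to every spatial unit's feature row. The sketch below assumes NumPy, a per-unit expression matrix, and metadata that has already been encoded as numbers; `append_metadata_features` is a hypothetical name.

```python
import numpy as np


def append_metadata_features(expression, metadata):
    """Append the same patient-level metadata vector to every spatial unit.

    expression: (n_units, n_genes) per-unit measurements.
    metadata:   (n_meta,) numerically encoded patient metadata (encoding is assumed).
    Returns an (n_units, n_genes + n_meta) feature matrix.
    """
    expression = np.asarray(expression, dtype=float)
    metadata = np.asarray(metadata, dtype=float).ravel()
    # Repeat the metadata once per unit so each row carries the patient context.
    tiled = np.tile(metadata, (expression.shape[0], 1))
    return np.hstack([expression, tiled])


# 3 spatial units x 2 genes, plus hypothetical metadata [age, is_female]
features = append_metadata_features([[1.0, 0.5], [0.0, 2.0], [3.0, 1.0]], [62, 1])
print(features.shape)  # (3, 4)
```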
The neural network may then output a score based upon the identified tissue signatures. The score may be, by example and without limitation thereto, a level of heterogeneity in the tissue, entropy in the tissue or an estimate of phenotypic features based on the molecular data, including one or more of cell density, cell counts, tumor purity and cell types. It should be understood that these datasets 106, 107, 108, 109 may be hundreds of gigabytes in size with millions of variables. As such, it would be impossible for a human to inspect these datasets and identify any relational patterns. In other words, the data cannot be interpreted by the human mind.
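As one illustrative choice of "entropy in the tissue", the Shannon entropy of cell-type proportions can be computed from per-type counts in a region. This formula is an assumption for illustration; the disclosure does not fix one, and `tissue_entropy` is a hypothetical helper.

```python
import numpy as np


def tissue_entropy(cell_type_counts) -> float:
    """Shannon entropy (bits) of cell-type proportions; higher means more mixed tissue."""
    counts = np.asarray(cell_type_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total  # empty categories contribute 0 by convention
    return float(np.sum(p * np.log2(1.0 / p)))


print(tissue_entropy([50, 50]))     # 1.0  (two equally abundant cell types)
print(tissue_entropy([100, 0, 0]))  # 0.0  (a single cell type, fully homogeneous)
```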
As generally shown in
Once nodes 201 have been identified, the nodes may be parameterized to generate unique signatures of tissues. Non-exhaustive examples of parameters may include distance between nodes, distance to other known targets, levels of genetic expression/chemical measurement, and image density/stain density from image analysis, such as but not limited to immunohistochemistry (IHC) imaging, fluorescent staining and/or hematoxylin and eosin (H&E) imaging. One non-limiting example in the field of immuno-oncology may be the parameterization of the distance between a node of TIL cells (node 1, above) and a node of cancer cells (node 2, above). This parameterized distance may indicate a potential pathological response (e.g., whether TIL cells invade or stay away from cancer cells may be indicative of the body's ability to fight the cancer or of its response to certain drugs that stimulate the immune system).
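A distance parameter between two nodes could be computed as sketched below, assuming each node is represented by the grid coordinates of its member units. The choice of centroid, minimum, and mean distances is an illustrative assumption, and `node_distances` is a hypothetical helper.

```python
import numpy as np


def node_distances(node_a, node_b) -> dict:
    """Distance parameters between two nodes given as (n_units, 2) unit coordinates."""
    a = np.asarray(node_a, dtype=float)
    b = np.asarray(node_b, dtype=float)
    # Every pairwise Euclidean distance between a unit in node A and a unit in node B.
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return {
        "centroid": float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))),
        "minimum": float(pairwise.min()),  # closest approach between the two nodes
        "mean": float(pairwise.mean()),
    }


til_node = [[0, 0], [0, 1]]                     # hypothetical TIL-enriched units
cancer_node = [[3, 0], [3, 1], [5, 0], [5, 1]]  # hypothetical cancer-cell units
params = node_distances(til_node, cancer_node)
print(params["centroid"], params["minimum"])  # 4.0 3.0
```

A small minimum distance with a large centroid distance, for instance, would suggest TILs touching only the edge of a tumor region.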
In the case where comparison of molecular data and pathology imaging is required, molecular nodes 201 can be aligned with pathology images 114 by transforming one data set based on reference nodes 201 present in both the molecular data set and the imaging data set. That is, the pathology image represents the intensity of light captured by the camera across the tissue, while the molecular data captured by spatial instruments represents the intensity of the analyte target (e.g., gene expression) across the tissue. Identical reference nodes within the pathology image and the molecular data are identified so that a mathematical transformation matrix can be determined. The transformation matrix may then be applied to either the pathology image or the molecular data so as to align the two with one another.
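A transformation matrix of this kind can be estimated from matched reference nodes by least squares. The sketch assumes a 2D affine transform (the disclosure says only "transformation matrix") and at least three non-collinear reference nodes; `estimate_affine` and `apply_affine` are hypothetical names.

```python
import numpy as np


def estimate_affine(src, dst) -> np.ndarray:
    """Least-squares 2D affine transform mapping src points onto dst (3x3 homogeneous)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 3:
        raise ValueError("need at least 3 matching reference nodes")
    A = np.hstack([src, np.ones((src.shape[0], 1))])  # rows of [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)        # solve A @ M ~= dst, M is 3x2
    T = np.eye(3)
    T[:2, :] = M.T
    return T


def apply_affine(T, points) -> np.ndarray:
    """Map (n, 2) points through the homogeneous transform T."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (homog @ T.T)[:, :2]


# Reference nodes in molecular-grid coordinates and the same nodes in image pixels
src = [[0, 0], [1, 0], [0, 1], [1, 1]]
dst = [[10, 5], [12, 5], [10, 7], [12, 7]]  # scaled by 2, shifted by (10, 5)
T = estimate_affine(src, dst)
print(apply_affine(T, [[0.5, 0.5]]))  # approximately [[11. 6.]]
```

With more than three reference nodes the least-squares fit averages out small localization errors in individual nodes.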
Turning now to
Attention is now turned to
An additional exemplary embodiment of processing spatial data is shown in
Attention is now turned to
In one exemplary embodiment, for measurements taken where discrete gaps between data units 105 exist, the data may be consolidated 135 such that spatially resolved units are adjacent to each other creating a virtual representation of the original region or tissue 106. Corresponding readouts of the data 140 show analytes 141 corresponding to each spatially resolved unit 105. In this case, the data is being structured during pre-processing such that the algorithm does not need to know that “gaps” exist.
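Consolidation can be sketched for the simple case where gaps form whole rows or columns of a regular grid and unmeasured positions are stored as NaN. Both conditions are assumptions, and `consolidate` is a hypothetical helper.

```python
import numpy as np


def consolidate(grid) -> np.ndarray:
    """Drop rows and columns with no measured units so measured units become adjacent."""
    grid = np.asarray(grid, dtype=float)
    measured = ~np.isnan(grid)
    keep_rows = measured.any(axis=1)
    keep_cols = measured.any(axis=0)
    return grid[np.ix_(keep_rows, keep_cols)]


nan = np.nan
gapped = [[1.0, nan, 2.0],
          [nan, nan, nan],
          [3.0, nan, 4.0]]
print(consolidate(gapped))  # [[1. 2.]
                            #  [3. 4.]]
```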
Another exemplary embodiment of processing data when gaps are present between molecular measurements is to preserve those gaps in the dataset as shown in 136. Here, spatially resolved units 105 are recorded while the gaps 137 are also recorded, such that corresponding analysis and readouts of the data 142 show results of measured analytes 144 while illustrating the gaps 143. The gaps can be preserved as “zeroes” or “Not a number” to preserve distance metrics between data or nodes during analysis by the neural network, or the gaps can be synthetically eliminated and the neural network can process the data with a modified dataset.
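Preserving gaps amounts to placing each measured unit at its true grid position and filling the rest with a sentinel. The sketch below, with a hypothetical `to_grid` helper and made-up values, shows both sentinels and why the choice matters for downstream statistics.

```python
import numpy as np


def to_grid(coords, values, shape, fill=np.nan) -> np.ndarray:
    """Place measured units at their true positions; unmeasured positions get `fill`."""
    grid = np.full(shape, fill, dtype=float)
    rows, cols = np.asarray(coords, dtype=int).T
    grid[rows, cols] = values
    return grid


coords = [(0, 0), (0, 2), (2, 0), (2, 2)]  # units measured at every other position
values = [1.0, 2.0, 3.0, 4.0]

nan_grid = to_grid(coords, values, (3, 3))             # gaps as "Not a number"
zero_grid = to_grid(coords, values, (3, 3), fill=0.0)  # gaps as zeroes

# Positions are unchanged, so unit (0, 0) is still two grid steps from unit (0, 2).
print(np.nanmean(nan_grid))  # 2.5 (NaN-aware statistics skip the gaps)
print(zero_grid.mean())      # zeroes are counted as measurements and lower the mean
```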
Another exemplary embodiment of processing data when gaps are present between molecular measurements is to interpolate the data between measured units as shown in 138. Any suitable interpolation method may be used during pre-processing of the data, such as but not limited to linear, nearest neighbor or splines. Additionally or alternatively, the neural network may also perform the interpolation. By way of example, spatially resolved units 105 are recorded while gaps 137 are also recorded, and the measurement of analytes is estimated or interpolated between units 105. In this way and as shown in
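A minimal sketch of linear interpolation during pre-processing is shown below. It fills NaN gaps row by row with NumPy's `np.interp`, which is a simplification of full 2D interpolation; note that `np.interp` holds the nearest measured value constant beyond the outermost measured units. The `interpolate_rows` name is hypothetical.

```python
import numpy as np


def interpolate_rows(grid) -> np.ndarray:
    """Fill NaN gaps in each row by linear interpolation between measured units."""
    out = np.array(grid, dtype=float)  # work on a copy
    x = np.arange(out.shape[1])
    for row in out:  # each row is a view, so assignments update `out`
        known = ~np.isnan(row)
        if known.sum() >= 2:  # need at least two measured units to interpolate
            row[~known] = np.interp(x[~known], x[known], row[known])
    return out


nan = np.nan
filled = interpolate_rows([[1.0, nan, 3.0, nan, 7.0],
                           [nan, 4.0, nan, 8.0, nan]])
print(filled)  # [[1. 2. 3. 5. 7.]
               #  [4. 4. 6. 8. 8.]]
```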
Turning attention to
As shown in
Networks of correlated genes were then plotted, specifying inverse and positive correlations and indicating where each gene was found in the tumor microenvironment. Stratifying patients by complex co-expression analysis revealed a “hot” cohort 159 and a “cold” cohort 160. “Hot” and “cold” in the context of immuno-oncology refer to a tumor with an active or inhibited immune system response, respectively. Identifying which patients will respond positively to specific drugs is highly desired in cancer diagnostics. Thus, in accordance with an aspect of the present invention, an exemplary method described herein may be used to help distinguish between patients who may or may not respond to specific drugs, thereby leading to improved patient care and patient outcomes.
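The edges of such a co-expression network can be sketched as gene pairs whose Pearson correlation across spatial units exceeds a threshold in magnitude, with the sign marking positive or inverse correlation. The threshold, the toy data, and the `coexpression_edges` name are assumptions; the stratification into "hot" and "cold" cohorts is not shown. Genes with constant expression would yield NaN correlations and are not handled here.

```python
import numpy as np


def coexpression_edges(expression, gene_names, threshold=0.8):
    """Return (gene_a, gene_b, r) for gene pairs with |Pearson r| >= threshold.

    expression: (n_units, n_genes) matrix, one row per spatial unit.
    """
    corr = np.corrcoef(np.asarray(expression, dtype=float), rowvar=False)
    edges = []
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            r = float(corr[i, j])
            if abs(r) >= threshold:
                edges.append((gene_names[i], gene_names[j], r))  # r < 0: inverse
    return edges


# 4 spatial units x 4 hypothetical genes (columns A, B, C, D)
expression = [[1, 2, 4, 1],
              [2, 4, 3, -1],
              [3, 6, 2, -1],
              [4, 8, 1, 1]]
edges = coexpression_edges(expression, ["A", "B", "C", "D"])
for a, b, r in edges:
    print(a, b, "positive" if r > 0 else "inverse")
```

Gene D is uncorrelated with the others in this toy data, so it contributes no edges.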
Referring now to
As further shown in
In a further aspect of the present invention, method 200 may also include step 218 wherein patient metadata including one or more of medical records, medical imaging and demographic data is correlated with its respective spatial molecular data. At step 220, method 200 may also provide an output based upon the generated unique tissue signatures, where the output may be a score indicating a level of heterogeneity in the tissue, entropy in the tissue, or an estimate of phenotypic features based on the molecular data, including one or more of cell density, cell counts, tumor purity and cell types.
Turning now to
In accordance with a further aspect of the present invention, method 300 may optionally include step 310 wherein one or more medical images of the tissue are also received by computing device 12, wherein the one or more medical images comprise tissue image data including one or more of immunohistochemistry (IHC) imaging, fluorescent staining (including fluorescent in situ hybridization, or FISH), hematoxylin and eosin (H&E) imaging, and brightfield imaging. At step 312, the molecular data is aligned with the one or more medical images.
In still another aspect of the present invention, method 300 may also include performing a preliminary analysis of the plurality of raw molecular data sets to define selected areas of spatial molecular data for pre-processing at step 314. The spatial molecular data may also be down-sampled to compress one or more of the plurality of raw molecular data sets at step 316, augmented by mathematical interpolation at step 318, or upscaled to a higher resolution by generative upscaling using the neural network at step 320.
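Down-sampling at step 316 could be as simple as averaging non-overlapping blocks of units. Block averaging is an assumed choice here (the disclosure does not name a method), and `downsample_mean` is a hypothetical helper that drops trailing rows or columns that do not fill a whole block.

```python
import numpy as np


def downsample_mean(grid, factor: int) -> np.ndarray:
    """Compress a (rows, cols) grid by averaging non-overlapping factor x factor blocks."""
    g = np.asarray(grid, dtype=float)
    r = (g.shape[0] // factor) * factor
    c = (g.shape[1] // factor) * factor
    g = g[:r, :c]  # discard partial blocks at the edges
    return g.reshape(r // factor, factor, c // factor, factor).mean(axis=(1, 3))


small = downsample_mean(np.arange(16).reshape(4, 4), 2)
print(small)  # [[ 2.5  4.5]
              #  [10.5 12.5]]
```

Each output unit summarizes four input units, a 4x reduction in size for this factor.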
Having described the system, processes and methods of the present invention and embodiments thereof, an exemplary computer environment for implementing the described processes and methods is provided below.
The system memory 436 is also connected to bus 424 and may include read only memory (ROM), random access memory (RAM), an operating system 444, a basic input/output system (BIOS) 446, application programs 448 and program data 450. The computer 412 may further include a hard disk drive 452 for reading from and writing to a hard disk, a magnetic disk drive 454 for reading from and writing to a removable magnetic disk (e.g., floppy disk), and an optical disk drive 456 for reading from and writing to a removable optical disk (e.g., CD ROM or other optical media). The computer 412 may also include USB drives 445 and other types of drives for reading from and writing to flash memory devices (e.g., compact flash, memory stick/PRO and DUO, SD card, multimedia card, smart media xD card), and a scanner 458. A hard disk drive interface 452a, magnetic disk drive interface 454a, an optical drive interface 456a, a USB drive interface 445a, and a scanner interface 458a operate to connect bus 424 to hard disk drive 452, magnetic disk drive 454, optical disk drive 456, USB drive 445 and scanner 458, respectively. Each of these drive components and their associated computer-readable media may provide computer 412 with non-volatile storage of computer-readable instructions, program modules, data structures, application programs, an operating system, and other data for computer 412. In addition, it will be understood that computer 412 may also utilize other types of computer-readable media in addition to those types set forth herein, such as digital video disks, random access memory, read only memory, other types of flash memory cards, magnetic cassettes, and the like.
Computer 412 may operate in a networked environment using logical connections with each of the system components described above. Network interface 428 provides a communication path 460 between bus 424 and network 22, which allows data to be communicated through network 22 between server 20 and computer 412. This type of logical network connection is commonly used in conjunction with a local area network (LAN). The data related to the methods and processes described herein may also be communicated from bus 424 through a communication path 462 to network 22 using serial port 432 and a modem 464. A modem connection between the computer 412 and the other components of system 10 is commonly used in conjunction with a wide area network (WAN). It will be appreciated that the network connections shown herein are merely exemplary, and it is within the scope of the present invention to use other types of network connections between computer 412 and the other components of system 10, including both wired and wireless connections.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects hereinabove set forth together with other advantages which are obvious and which are inherent to the device described herein. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims. Since many possible embodiments of the invention may be made without departing from the scope thereof, it is also to be understood that all matters herein set forth or shown in the accompanying drawings are to be interpreted as illustrative and not limiting.
The constructions described above and illustrated in the drawings are presented by way of example only and are not intended to limit the concepts and principles of the present invention. As used herein, the terms “having” and/or “including” and other terms of inclusion are terms indicative of inclusion rather than requirement. Further, it should be understood that the terms “module” and “component” are used interchangeably herein and shall have the same meaning.
While the invention has been described with reference to preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof to adapt to particular situations without departing from the scope of the invention. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope and spirit of the appended claims.
This application claims the benefit of U.S. Provisional Patent Application No. 62/988,341 filed on Mar. 11, 2020, which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
62988341 | Mar 2020 | US