In at least one aspect, a system and method for determining cancer stage in a subject is provided.
In at least one aspect, a biological structure identification system is provided. The biological structure identification system includes an optical imaging system configured to illuminate a liquid biopsy sample for a subject. The liquid biopsy sample has one or more biological structures that are labeled with one or more fluorophores associated with a fluorescence assay for a cancer allowing detection of emitted electromagnetic radiation from the liquid biopsy sample as image data. The system also includes a processing system configured to:
In another aspect, a method of diagnosing a disease with the biological structure identification system set forth herein is provided. The method includes steps of:
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The drawings are of illustrative examples. They do not illustrate all examples. Other examples may be used in addition or instead. Details that may be apparent or unnecessary may be omitted to save space or for more effective illustration. Some examples may be practiced with additional components or steps and/or without all of the components or steps that are illustrated. When the same numeral appears in different drawings, it refers to the same or like components or steps.
For a further understanding of the nature, objects, and advantages of the present disclosure, reference should be had to the following detailed description, read in conjunction with the following drawings, wherein like reference numerals denote like elements and wherein:
Reference will now be made in detail to presently preferred compositions, embodiments and methods of the present invention, which constitute the best modes of practicing the invention presently known to the inventors. The Figures are not necessarily to scale. However, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. Therefore, specific details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for any aspect of the invention and/or as a representative basis for teaching one skilled in the art to variously employ the present invention.
Except in the examples, or where otherwise expressly indicated, all numerical quantities in this description indicating amounts of material or conditions of reaction and/or use are to be understood as modified by the word “about” in describing the broadest scope of the invention. Practice within the numerical limits stated is generally preferred. Also, unless expressly stated to the contrary: percent, “parts of,” and ratio values are by weight; the description of a group or class of materials as suitable or preferred for a given purpose in connection with the invention implies that mixtures of any two or more of the members of the group or class are equally suitable or preferred; description of constituents in chemical terms refers to the constituents at the time of addition to any combination specified in the description, and does not necessarily preclude chemical interactions among the constituents of a mixture once mixed; the first definition of an acronym or other abbreviation applies to all subsequent uses herein of the same abbreviation and applies mutatis mutandis to normal grammatical variations of the initially defined abbreviation; and, unless expressly stated to the contrary, measurement of a property is determined by the same technique as previously or later referenced for the same property.
It must also be noted that, as used in the specification and the appended claims, the singular form “a,” “an,” and “the” comprise plural referents unless the context clearly indicates otherwise. For example, reference to a component in the singular is intended to comprise a plurality of components.
As used herein, the term “about” means that the amount or value in question may be the specific value designated or some other value in its neighborhood. Generally, the term “about” denoting a certain value is intended to denote a range within +/−5% of the value. As one example, the phrase “about 100” denotes a range of 100+/−5, i.e. the range from 95 to 105. Generally, when the term “about” is used, it can be expected that similar results or effects according to the invention can be obtained within a range of +/−5% of the indicated value.
As used herein, the term “and/or” means that either all or only one of the elements of said group may be present. For example, “A and/or B” shall mean “only A, or only B, or both A and B”. In the case of “only A”, the term also covers the possibility that B is absent, i.e. “only A, but not B”.
It is also to be understood that this invention is not limited to the specific embodiments and methods described below, as specific components and/or conditions may, of course, vary. Furthermore, the terminology used herein is used only for the purpose of describing particular embodiments of the present invention and is not intended to be limiting in any way.
The term “comprising” is synonymous with “including,” “having,” “containing,” or “characterized by.” These terms are inclusive and open-ended and do not exclude additional, unrecited elements or method steps.
The phrase “consisting of” excludes any element, step, or ingredient not specified in the claim. When this phrase appears in a clause of the body of a claim, rather than immediately following the preamble, it limits only the element set forth in that clause; other elements are not excluded from the claim as a whole.
The phrase “consisting essentially of” limits the scope of a claim to the specified materials or steps, plus those that do not materially affect the basic and novel characteristic(s) of the claimed subject matter.
The phrase “composed of” means “including” or “consisting of.” Typically, this phrase is used to denote that an object is formed from a material.
With respect to the terms “comprising,” “consisting of,” and “consisting essentially of,” where one of these three terms is used herein, the presently disclosed and claimed subject matter can include the use of either of the other two terms.
The term “one or more” means “at least one” and the term “at least one” means “one or more.” The terms “one or more” and “at least one” include “plurality” as a subset.
The term “substantially,” “generally,” or “about” may be used herein to describe disclosed or claimed embodiments. The term “substantially” may modify a value or relative characteristic disclosed or claimed in the present disclosure. In such instances, “substantially” may signify that the value or relative characteristic it modifies is within ±0%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5% or 10% of the value or relative characteristic.
It should also be appreciated that integer ranges explicitly include all intervening integers. For example, the integer range 1-10 explicitly includes 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10. Similarly, the range 1 to 100 includes 1, 2, 3, 4 . . . 97, 98, 99, 100. Similarly, when any range is called for, intervening numbers that are increments of the difference between the upper limit and the lower limit divided by 10 can be taken as alternative upper or lower limits. For example, if the range is 1.1 to 2.1, the following numbers 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, and 2.0 can be selected as lower or upper limits.
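As a non-limiting illustration, the increment rule above can be sketched in Python. The function name `alternative_limits` and the rounding to ten decimal places (used only to suppress floating-point noise) are assumptions made for this sketch and are not part of the disclosure.

```python
def alternative_limits(lower, upper, steps=10):
    """Return the intervening values obtained by stepping from lower to upper
    in increments of (upper - lower) / steps, excluding both end points."""
    step = (upper - lower) / steps
    # Rounding only removes binary floating-point noise (an assumption of this sketch).
    return [round(lower + i * step, 10) for i in range(1, steps)]


limits = alternative_limits(1.1, 2.1)
```

For the range 1.1 to 2.1, the sketch yields the nine intermediate values listed in the example above.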
In the examples set forth herein, concentrations, temperature, measurement conditions, and reaction conditions (e.g., pressure, pH, temperature, etc.) can be practiced with plus or minus 50 percent of the values indicated rounded to or truncated to two significant figures of the value provided in the examples. In a refinement, concentrations, temperature, and reaction conditions (e.g., pressure, pH, temperature, etc.) can be practiced with plus or minus 30 percent of the values indicated rounded to or truncated to two significant figures of the value provided in the examples. In another refinement, concentrations, temperature, and reaction conditions (e.g., pressure, pH, temperature, etc.) can be practiced with plus or minus 10 percent of the values indicated rounded to or truncated to two significant figures of the value provided in the examples.
In this disclosure, the indefinite article “a” and phrases “one or more” and “at least one” are synonymous and mean “at least one”.
Relational terms such as “first” and “second” and the like may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between them. The terms “comprises,” “comprising,” and any other variation thereof when used in connection with a list of elements in the specification or claims are intended to indicate that the list is not exclusive and that other elements may be included. Similarly, an element preceded by an “a” or an “an” does not, without further constraints, preclude the existence of additional elements of the identical type.
The term “event” refers to the detection of an observable imaging signal and in particular to the detection of a fluorescence signal.
The term “feature” refers to any measurable parameter that characterizes an event, image, or image data. For example, features can include shape parameters, location parameters, texture parameters, and parameters quantifying the fluorescent image.
The term “cluster” refers to a group of similar data points. In a refinement, data points can be grouped together based on the proximity of the data points to a measure of central tendency of the cluster. For example, the measure of central tendency may be the arithmetic mean of the cluster. In such an example, the data points are joined together based on their proximity to the average value in the cluster (e.g., hierarchical clustering).
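As a non-limiting illustration, grouping by proximity to the arithmetic mean can be sketched as a small centroid-linkage agglomerative clustering in Python. The function names, the Euclidean distance, and the stopping rule (a target number of clusters) are assumptions made for this sketch; the disclosure does not prescribe them.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two clusters whose centroids are closest
    until only n_clusters clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


groups = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)], n_clusters=2)
```

In this toy input, the two points near the origin end up in one cluster and the two points near (10, 10) in the other.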
The term “similar” when referring to data points means that the data points can be placed in the same cluster. That is, similar data points can be placed or included within the same cluster after a clustering analysis. In a refinement, a cell (or other biological structure) is similar to another cell (or other biological structure) if the cell (or other biological structure) belongs in the same cluster after cluster analysis (hierarchical clustering), which is an algorithm that groups similar objects into groups. OCULAR applies a Principal Component Analysis onto the high dimensional dataset and then undergoes hierarchical clustering on the distance matrix of the PCA dataset. The output of the hierarchical algorithm determines which cells (or other biological structures) are similar to another by determining which cluster each cell belongs in. In another refinement, a set of cellular features (e.g., biological structures) is similar to another set of cellular features if the distance of the principal components between those sets is within the 1 percentile of all distances found in the distance matrix of a large dataset, which includes those sets, that underwent PCA.
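As a non-limiting illustration, the percentile-based similarity test described above can be sketched in Python. The sketch assumes that each data point has already been projected onto its principal components (the PCA step itself is omitted), and it uses a nearest-rank percentile and Euclidean distance, both of which are assumptions; the names `percentile_threshold` and `similar_pairs` are hypothetical.

```python
import math
from itertools import combinations


def percentile_threshold(distances, percentile=1.0):
    """Nearest-rank percentile of a non-empty list of distances."""
    ordered = sorted(distances)
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return ordered[rank - 1]


def similar_pairs(pc_scores, percentile=1.0):
    """Return index pairs whose distance in principal-component space falls
    within the given percentile of all pairwise distances."""
    if len(pc_scores) < 2:
        return []
    pairs = list(combinations(range(len(pc_scores)), 2))
    dists = [math.dist(pc_scores[i], pc_scores[j]) for i, j in pairs]
    cutoff = percentile_threshold(dists, percentile)
    return [pair for pair, d in zip(pairs, dists) if d <= cutoff]


scores = [(0.0, 0.0), (0.01, 0.0), (5.0, 5.0), (10.0, 0.0), (0.0, 10.0)]
pairs = similar_pairs(scores)
```

With these five toy points, only the first two fall within the first percentile of all pairwise distances.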
The term “imaging event” means imaging structures that are defined by imaging parameters collected by the imaging system without applying biological context/relevance.
The term “profile of biological structure identification buckets” means a predetermined collection of biological structure identification buckets. Therefore, the user or an algorithm can select a plurality of biological structure identification buckets from which profiles are formed. Profiles for characterizing a cancer stage are a specific collection of biological structure identification buckets that are common to a cohort of human samples in a specific cancer stage. A profile from a given human sample can be computationally/mathematically compared to reference cohort-determined cancer stage profiles to determine the similarity of the given sample to the reference profiles.
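As a non-limiting illustration, one way to compare a sample profile with reference profiles can be sketched in Python. The disclosure does not specify the comparison metric; cosine similarity over per-bucket counts is an assumption made here, and the bucket labels, stage labels, and counts are hypothetical.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two profiles given as {bucket_label: count}."""
    keys = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(k, 0) * profile_b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_stage(sample_profile, reference_profiles):
    """Return the label of the reference profile most similar to the sample."""
    return max(
        reference_profiles,
        key=lambda stage: cosine_similarity(sample_profile, reference_profiles[stage]),
    )


# Hypothetical reference profiles and sample, for illustration only.
references = {
    "early": {"CTC": 1, "endothelial": 5},
    "late": {"CTC": 20, "endothelial": 2},
}
sample = {"CTC": 18, "endothelial": 3}
stage = closest_stage(sample, references)
```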
The term “computing device” refers generally to any device that can perform at least one function, including communicating with another computing device.
When a computer or other computing device is described as performing an action or method step, it is understood that the computer or other computing device is operable to and/or configured to perform the action or method step, typically by executing one or more lines of source code. The actions or method steps can be encoded onto non-transitory memory (e.g., hard drives, optical drives, flash drives, and the like).
The term “configured to” or “operable to” means that the processing circuitry (e.g., a computer or computing device) is configured or adapted to perform one or more of the actions set forth herein, by software configuration and/or hardware configuration. The terms “configured to” and “operable to” can be used interchangeably.
The processes, methods, or algorithms disclosed herein can be deliverable to/implemented by a processing device, controller, or computer, which can include any existing programmable electronic control unit or dedicated electronic control unit. Similarly, the processes, methods, or algorithms can be stored as data and instructions executable by a controller or computer in many forms including, but not limited to, information permanently stored on non-writable storage media such as ROM devices and information alterably stored on writeable storage media such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and optical media. The processes, methods, or algorithms can also be implemented in an executable software object. Alternatively, the processes, methods, or algorithms can be embodied in whole or in part using suitable hardware components, such as Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), state machines, controllers or other hardware components or devices, or a combination of hardware, software and firmware components.
Throughout this application, where publications, patents, or published patent applications are referenced, the disclosures of these publications, patents, or published patent applications in their entireties are hereby incorporated by reference into this application to more fully describe the state of the art to which this invention pertains.
+: positive, when associated with a marker (e.g., CD31+, CD45+, CK+, vimentin+) or a chemical molecule (e.g., DAPI+), the cell or biological formation expresses this marker or a chemical molecule.
−: negative, when associated with a marker (e.g., CD31−, CD45−, CK−, vimentin−) or a chemical molecule (e.g., DAPI−), the cell or biological formation does not express this marker or a chemical molecule.
CD31: platelet endothelial cell adhesion molecule-1.
CD45: leukocyte-common antigen.
CPU: central processing unit.
CTC: circulating tumor cell
CRE: circulating rare event
CK: cytokeratin.
DAPI: 4′,6-diamidino-2-phenylindole.
HDSCA: High Definition Single Cell Assay.
OCULAR: Outlier Clustering Unsupervised Learning Automated Report.
PCA: principal component analysis.
Referring to
In a variation, the rare imaging events are observed as rare biological structures.
In another variation, one or more biological structures include simultaneously identified multiple biological structures.
In another variation, the rare biological structures are observed as rare imaging events in the imaging data.
Still referring to
Still referring to
In a variation, the biological structure identification system 10 is configured to receive a liquid biopsy sample by using the liquid biopsy sample carrier 16 and illuminate the liquid biopsy sample with electromagnetic radiation from illumination system 18 that has a specific wavelength or wavelengths that can be absorbed by the fluorophore. Light detection system 20 is configured to detect and determine an intensity and a wavelength of fluorescence emitted by the fluorophore, or to produce input data for these characteristics so that they can be determined by processing system 12.
Processing system 14 is configured to generate an image of the biological structure(s) from image data received from light detection system 20; detect and determine a morphology of each biological structure from the image and/or the image data using the plurality of features; identify the type of each biological structure based on the features defined herein (which can determine a specific morphology) of each biological structure; form biological structure identification buckets (“identification buckets”) based on the identified biological structure type such that each biological structure identification bucket contains the biological structure(s) that are similar in type and in particular cells containing such biological structure(s); and optionally, form a set of identification buckets (“identification bucket set”) based on the identification buckets. In a particularly useful variation, the biological structures are cells that are placed in the identification buckets and in the identification bucket set. In this context, “placed in” means that an association between the cells (or other biological structures) and the identification buckets and the identification bucket set is saved in computer readable form as set forth below. In the context of the present embodiment, each biological structure identification bucket identifies imaging structures that are similar in type.
In a variation, processing system 14 is further configured to form a disease map based on information related to the biological structure identification bucket set(s), relate the disease map to a specific disease and disease stage, and label the disease map according to an identified related specific disease and disease stage.
In some aspects, the morphology of the biological structure may be determined by using at least one feature extracted from the image or image data. Typically, the image data will include features (e.g., parameters) of the fluorescent light emitted from the sample. These features can be extracted from the generated image or the image data by using known software packages such as EBImage, which is an open source R package distributed as part of the Bioconductor project. The morphology of the biological structure may be determined by using at least 10 features, at least 100 features, at least 500 features, or at least 1,000 features extracted from the image or image data. Features can include shape parameters, location parameters, texture parameters, and parameters quantifying the fluorescent image (e.g., specific fluorescence wavelength(s), fluorescence signal intensity, etc.). The feature may be related to size, shape, texture and structure of the biological structure's morphology. In some variations, an image mask is deployed limiting the observable image area to regions encompassed by the mask. Table 1 provides non-limiting examples of features that can be used in the analysis. Any combination of the features in Table 1 can be used.
In some refinements, the identification bucket may be a specific repository (e.g., classification) where information related to a specific biological structure(s) identified in a liquid biopsy sample is stored, wherein the specific biological structures may have substantially similar properties, including substantially similar morphologies and substantially similar marker profiles. The information related to the specific biological structure may be any information related to the biological structure, including the identification bucket's label, number of the specific biological structures identified in a given portion of the liquid biopsy sample analyzed, properties associated with the specific biological structure, information related to the liquid biopsy sample, the like, or a combination thereof. This information related to the specific biological structure(s) may be stored in any convenient manner. For example, the information related to the specific identification bucket may be stored in the memory system. In some refinements, the bucket is a cluster as described below.
In some aspects, at least a subset of the biological structure can be a structure with a membrane, a protein, DNA, RNA, or a combination thereof. The structure with a membrane may be a cell, a vesicle, or a combination thereof. The vesicle may be an oncosome. The oncosome may have a characteristic size (e.g., characteristic length or characteristic diameter) equal to or larger than one micrometer. The oncosome may have a characteristic size (e.g., characteristic length or characteristic diameter) larger than an exosome.
In other aspects, the liquid biopsy sample may be a non-solid biological sample. The liquid biopsy sample may be a body fluid sample. The liquid biopsy sample may include a blood sample, a bone marrow sample, a peritoneal fluid sample, a urine sample, a saliva sample, a vaginal fluid sample, a semen sample, a tear sample, a mucus sample, an aqueous humor sample, a cerebrospinal fluid (CSF) sample, or a combination thereof. The liquid biopsy sample may include a blood sample. The liquid biopsy sample may include common immune cells and rare biological structures.
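As a non-limiting illustration, an identification bucket that stores a label, the associated biological structures, and their count can be sketched in Python. The class name `IdentificationBucket`, its fields, and the function `form_buckets` are hypothetical names chosen for this sketch; the disclosure leaves the storage format open.

```python
from dataclasses import dataclass, field


@dataclass
class IdentificationBucket:
    """Hypothetical in-memory record for one identification bucket."""
    label: str
    structure_ids: list = field(default_factory=list)

    @property
    def count(self):
        """Number of biological structures associated with this bucket."""
        return len(self.structure_ids)


def form_buckets(identified):
    """Group (structure_id, type_label) pairs into one bucket per type."""
    buckets = {}
    for structure_id, type_label in identified:
        if type_label not in buckets:
            buckets[type_label] = IdentificationBucket(label=type_label)
        buckets[type_label].structure_ids.append(structure_id)
    return buckets


# Hypothetical identifiers and type labels, for illustration only.
buckets = form_buckets([(1, "CK+ cell"), (2, "CD45+ cell"), (3, "CK+ cell")])
```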
In still other aspects, the rare biological structures may include cancer cells that have cancer genomic profiles and/or cancer protein markers; tumor microenvironment cells that leak into circulation, wherein these cells comprise epithelial cells, endothelial cells, mesenchymal cells, other stromal cells, cells that are in various transitional states, or a mixture thereof; immune cells that are responding to the tumor itself or cancer treatment; vesicles; or a mixture thereof. The rare biological structures may include conventional circulating tumor cells, which are CK+, vimentin−, CD31−, and CD45−; circulating tumor cells, which are CK+, CD31−, CD45−, and vimentin+, wherein these tumor cells may putatively be in epithelial to mesenchymal transition; tumor cells, which are CK+ and coated with platelets, which are CD31+; endothelial cells, which are CD31+, vimentin+, and CK−; endothelial cells, which are CD31+, vimentin+, and CK+; megakaryocytes, which are CD31+ and vimentin−, wherein megakaryocytes may comprise large cells containing a single, large, multi-lobulated, polyploid nucleus and responsible for the production of blood thrombocytes (platelets); large cells, which are CD31+ and CK+, wherein these large cells may be present in liquid biopsy samples obtained from bone marrow; cells, which are DAPI+ and vimentin+; round cells, which are CD45+ and CK+; round cells, which are CD45+, vimentin+, and CK+; clusters of cells (“cell cluster”) comprising at least two cells, wherein the cells are the same type of cells and/or different types of cells; cells, which are DAPI+, CD45−, CD31−, and CK−; immune cells, which are CD45+ and vimentin−; immune cells, which are CD45+ and vimentin+ (vimentin being a type III intermediate filament protein); extra-cellular vesicles; or a mixture thereof.
In some aspects, the liquid biopsy sample may include common biological structures and rare biological structures. A total number of biological structures is a sum of the number of common biological structures and the number of rare biological structures. Characteristically, the fraction of the rare biological structures is equal to or less than 10%, 5%, 1%, 0.1%, or 0.01% of the total number of biological structures.
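As a non-limiting illustration, a few of the marker combinations listed above can be expressed as a rule table in Python. Only a subset of the listed phenotypes is encoded; the rule order, the first-match behavior, and the names `PHENOTYPE_RULES` and `classify_phenotype` are assumptions made for this sketch.

```python
# Each rule: (label, required marker states). True means "+", False means "−".
PHENOTYPE_RULES = [
    ("conventional CTC", {"CK": True, "vimentin": False, "CD31": False, "CD45": False}),
    ("CTC, putative EMT", {"CK": True, "vimentin": True, "CD31": False, "CD45": False}),
    ("endothelial cell, CK-", {"CD31": True, "vimentin": True, "CK": False}),
    ("endothelial cell, CK+", {"CD31": True, "vimentin": True, "CK": True}),
    ("megakaryocyte", {"CD31": True, "vimentin": False}),
]


def classify_phenotype(markers):
    """Return the first rule label whose required marker states all match;
    markers is a dict such as {"CK": True, "vimentin": False, ...}."""
    for label, required in PHENOTYPE_RULES:
        if all(markers.get(name) == state for name, state in required.items()):
            return label
    return "unclassified"


label = classify_phenotype({"CK": True, "vimentin": False, "CD31": False, "CD45": False})
```

A rule table of this kind keeps the marker definitions in data rather than in branching logic, so additional phenotypes can be appended without changing the function.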
In a refinement, the optical imaging system includes a fluorescence imaging system, a brightfield imaging system, or a combination thereof. The optical imaging system may include a fluorescence microscope, a brightfield microscope, or a combination thereof.
In some aspects, the emitted electromagnetic radiation may be a fluorescent radiation.
In some aspects, the biological structure identification system includes at least one fluorescence channel. The number of fluorescence channels may be in the range of 1 to 10 fluorescence channels, or in the range of 4 to 7 fluorescence channels. In a refinement, the number of fluorescence channels may be only four. These four fluorescence channels may be a first fluorescence channel configured for detection useful for nuclear segmentation and characterization; a second fluorescence channel configured to detect a cytokeratin (CK) for its epithelial-like phenotype; a third fluorescence channel configured to detect a vimentin for its endothelial/mesenchymal-like phenotype; and a fourth fluorescence channel configured to detect both a CD31 for its endothelial-like phenotype, and a CD45 for its immune cell phenotype. These four fluorescence channels may be a first fluorescence channel configured for detection of fluorescence emission at a blue color wavelength region; a second fluorescence channel configured for detection of fluorescence emission at a red color wavelength region; a third fluorescence channel configured for detection of fluorescence emission at an orange color wavelength region; and a fourth fluorescence channel configured for detection of fluorescence emission at a green color wavelength region. For example, these four regions can be defined by an emission filter centered at 455 nm with a bandwidth of 50 nm for blue color wavelengths, an emission filter centered at 525 nm with a bandwidth of 36 nm for green color wavelengths, an emission filter centered at 605 nm with a bandwidth of 52 nm for orange color wavelengths, and an emission filter centered at 705 nm with a bandwidth of 72 nm for red color wavelengths. The first immunofluorescence channel may be configured to detect 4′,6-diamidino-2-phenylindole (DAPI) for nuclear segmentation and characterization.
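As a non-limiting illustration, the example emission filters above can be written as a small configuration in Python. The sketch assumes that each stated bandwidth is the full width of a passband centered on the stated wavelength; the names `EMISSION_FILTERS_NM`, `passband`, and `channel_for_wavelength` are hypothetical.

```python
# (center wavelength in nm, bandwidth in nm), taken from the example above.
EMISSION_FILTERS_NM = {
    "blue": (455, 50),
    "green": (525, 36),
    "orange": (605, 52),
    "red": (705, 72),
}


def passband(center_nm, bandwidth_nm):
    """Lower and upper edge of a passband, assuming the bandwidth is the
    full width centered on center_nm."""
    half = bandwidth_nm / 2
    return (center_nm - half, center_nm + half)


def channel_for_wavelength(wavelength_nm, filters=EMISSION_FILTERS_NM):
    """Name of the first filter whose passband contains the wavelength, or None."""
    for name, (center, bandwidth) in filters.items():
        low, high = passband(center, bandwidth)
        if low <= wavelength_nm <= high:
            return name
    return None


blue_edges = passband(*EMISSION_FILTERS_NM["blue"])
```

Under that assumption the four passbands are 430 to 480 nm, 507 to 543 nm, 579 to 631 nm, and 669 to 741 nm, which do not overlap.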
In some aspects, the systems of this disclosure may be configured to identify endothelial cells and immune cells from the features and/or the morphology of the endothelial cells and the immune cells determined from the features. In particular, the system can be configured to identify the endothelial cells and the immune cells from the features (and/or morphology of the endothelial cells and the immune cells determined from the features), and to differentiate the endothelial cells from the immune cells. The endothelial cells may have more elongated morphologies as compared to the immune cells, and the immune cells may have more round morphologies as compared to the endothelial cells. In a refinement, such morphologies are determined from the features as described herein.
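As a non-limiting illustration, one way to quantify how elongated or round a segmented structure is can be sketched in Python. The elongation measure used here (square root of the ratio of the principal second moments of the mask pixel coordinates) is an assumption made for this sketch and is not stated to be one of the features of Table 1; any threshold separating elongated from round morphologies would likewise be an assumption.

```python
import math


def elongation(pixels):
    """Ratio of major to minor axis spread of a set of (x, y) mask pixels.
    A value near 1 indicates a round shape; larger values indicate elongation."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    syy = sum((y - mean_y) ** 2 for _, y in pixels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = half_trace + root, half_trace - root
    if minor <= 0:
        return math.inf  # degenerate mask lying on a line or a single point
    return math.sqrt(major / minor)


square_mask = [(x, y) for x in range(5) for y in range(5)]
strip_mask = [(x, y) for x in range(20) for y in range(4)]
```

In this sketch, `elongation(square_mask)` is close to 1, while `elongation(strip_mask)` is several times larger.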
In some aspects, the liquid biopsy sample is obtained from a diseased human. For example, the liquid biopsy sample may be obtained from a human afflicted with a cancer.
In some refinements, the biological structure identification system is further configured to form a disease map based on information related to the identification bucket set(s), relate this disease map to a specific disease and disease stage, and label this disease map according to the related specific disease and its stage. The biological structure identification system may further be configured to store a disease map based on information related to the identification bucket set(s) and labeled by a disease type and the disease stage, and wherein the disease may cause formation of the biological structures forming said identification bucket set(s). The biological structure identification system is configured to form disease maps of at least two different types of diseases and stages of each disease.
The biological structure identification system may further be configured to form a disease atlas (“ATLAS”) of disease maps based on the disease maps of different disease types and their stages. In this regard, the atlas is built by using the trillions of cellular data points, performing a PCA on the dataset, and then selecting the cells that would create a dataset having non-overlapping regions in that PCA dataspace. Each cell would represent a certain region of that space such that any subsequently scanned cell would necessarily belong to a cell in the atlas. A cell would be assigned an ATLAS cell ID by applying the ATLAS PCA transform and finding the closest ATLAS cell. For example, identifying clusters into which a cell from a patient belongs can be used to assist in cancer identification and prognosis. In this context, “belong” means that the cell (or other biological structure) has feature values representative of the cluster (e.g., within the parameter or feature boundaries of the cluster). In some refinements, the atlas and/or the disease maps in the atlas include metadata such as patients' identification, clinical parameters, image parameters and the like. The atlas and/or the disease maps can include this data for each cell (or other biological structure) contained therein. In a refinement, the disease atlas is stored in a computer readable medium and in particular a non-transitory computer readable medium (e.g., random access memory, CDROM, DVD, hard drive, etc.). In a refinement, the disease atlas is stored in a computer readable medium as a data structure with relationships between the stored values. Examples of data structures that can be used include, but are not limited to, arrays, linked lists, records, a graph, a tree data structure (e.g., a binary tree), a database (e.g., a relational database), and combinations thereof. In a refinement, the disease atlas is stored as a database and in particular, a relational database that can be queried.
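As a non-limiting illustration, assigning an ATLAS cell ID by applying a stored PCA transform and finding the closest ATLAS cell can be sketched in Python. The representation of the transform (a feature mean plus a list of component vectors), the Euclidean distance, and all names and values shown are assumptions made for this sketch.

```python
import math


def pca_transform(features, mean, components):
    """Project a feature vector onto stored principal components."""
    centered = [f - m for f, m in zip(features, mean)]
    return tuple(
        sum(c * x for c, x in zip(component, centered)) for component in components
    )


def assign_atlas_id(features, mean, components, atlas):
    """Return the ID of the atlas cell closest to the projected feature vector.
    atlas maps a cell ID to its coordinates in the same PCA space."""
    scores = pca_transform(features, mean, components)
    return min(atlas, key=lambda cell_id: math.dist(scores, atlas[cell_id]))


# Hypothetical three-feature transform and two-cell atlas, for illustration only.
mean = (1.0, 1.0, 1.0)
components = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
atlas = {"A": (0.0, 0.0), "B": (5.0, 5.0)}
cell_id = assign_atlas_id((5.5, 6.0, 9.0), mean, components, atlas)
```

A linear scan over the atlas is used here for clarity; for an atlas of the size described above, a spatial index would likely be needed in practice.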
In some refinements, the biological structure identification system is further configured to diagnose the disease type and its stage based on the received liquid biopsy sample from a human afflicted with a disease. The biological structure identification system may further be configured to diagnose the disease type and its stage based on a liquid biopsy sample received from a human afflicted with a disease by comparing the disease map formed for the received liquid biopsy sample with the disease maps of the disease atlas stored in the biological structure identification system prior to receiving the liquid biopsy sample.
In a variation, an immunofluorescence assay for analyzing a liquid biopsy sample is provided. This assay may include antibodies against cytokeratin (CK), vimentin, CD31 and CD45. In a refinement, at least a subset of the antibodies against cytokeratin (CK), vimentin, CD31 and CD45 are labeled with a fluorophore. In the Baseline assay, each of cytokeratin (CK) and vimentin is independently labeled with a fluorophore while one or both of CD31 and CD45 are labeled with a fluorophore. Examples of fluorophores include, but are not limited to, DAPI and Hoechst 33342 and 33258 (as nuclear dyes), Alexa Fluor 488 (for vimentin), Alexa Fluor 555 (for cytokeratin), Alexa Fluor 647 (for CD31/CD45), and the like.
In a variation, a method of analyzing a liquid biopsy sample is provided. This method may include having a liquid biopsy sample comprising biological structures; preparing a sample comprising a single layer of biological structures (“single layer biological structure sample”) by using the liquid biopsy sample; staining the biological structures of the single layer biological structure sample with the fluorescent assay(s) set forth herein (having four fluorescent dyes) or any fluorescent assay; using the biological structure identification system(s) of this disclosure; identifying the rare biological structures through their fluorescence and morphology; and forming a biological structure identification bucket based on the identified biological structure type, wherein each biological structure identification bucket may contain a similar type of biological structures.
Referring to
Referring to
As depicted by box 230, common event clustering is analyzed as follows. Since the common clusters from the previous step are each determined by a single region on the slide, the common clusters are then clustered together by their similarity. The sum of those events is preserved as the clusters converge with one another. As depicted by box 240, a common cell classifier is applied as follows. A dataset of all known events of the assay being used (referred to as “ATLAS”) is applied to each common event cluster. In particular, each common event cluster is compared to all “ATLAS” data points and classified as one of our determined cell types. These events can then be enumerated. As depicted by box 250, the rare events undergo a filtering process, where each rare event candidate is compared to each common event cluster. This cleans the rare event candidate list for slide-wide rarity, instead of regional rarity. As depicted by box 260, a rare cell classifier is applied as follows. The “ATLAS” dataset of all known events of the assay is applied. Each rare event candidate is compared to all “ATLAS” data points and classified as one of our determined cell types. This step further filters out events that are “common.” The classified events can then be enumerated as certain cell types. Any event that is not classified within the “ATLAS” undergoes clustering, and the aggregate information is collected and sent to the final report. As depicted by box 270, DAPI− event clustering proceeds as follows. All DAPI− events from the slide are collected, undergo dimensional reduction, and are then hierarchically clustered into multiple groups. Each DAPI− group is represented by the mean of the features of the events within the cluster. The data for each DAPI− event are preserved, as is its position on the slide. The aggregated cluster information is sent to the report.
As depicted by box 280, each common event cluster is represented in the report as 10 montages of sample events within the cluster as well as the count of all events within that cluster and their aggregate information. Each non-classified rare event cluster is represented similarly with 10 sample montages, the count of all events within the cluster, and their respective aggregate information. If the user wants to retrieve the individual event data for the events within a certain cluster, the user will send a command to the server to individually montage each event within the respective cluster. Similar to the non-classified rare events, the DAPI− event clusters are represented with 10 sample montages, the count of events within the cluster, and their respective aggregate information. If the user wants to retrieve the individual event data for the events within a certain cluster, the user will send a command to the server to individually montage each event within the respective cluster. The classified rare events, as well as any event within a cluster that the user sent to the server for individual event data collation, are individually montaged, easily sortable and queryable, and are shown in a user interface that can give the user a holistic view of all rare events within the slide.
In some aspects, a method for evaluating a subject for cancer stage is provided. This method may include having a liquid biopsy sample from the patient comprising biological structures; preparing a sample comprising a single layer of biological structures (“single biological structure layer sample”) from the liquid biopsy sample; staining the biological structures of the single biological structure layer sample with a fluorescence assay (e.g., immunofluorescence assay such as the Baseline assay set forth herein or any fluorescent assay); applying (e.g., determining) the biological structure identification system(s) set forth above; identifying the rare biological structures through their fluorescence and morphology; forming a biological structure identification bucket (“identification bucket”) based on the identified biological structure type, wherein each biological structure identification bucket contains the biological structure(s) that are similar in type; forming a set of identification buckets (“identification bucket set”) based on the identification buckets; comparing information related to the identification bucket set to that of the atlas; determining the disease afflicting the patient; and treating the patient. A processing system performs the following steps:
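By way of non-limiting illustration, the rare event filtering of box 250 and the “ATLAS” comparison of box 260 may be sketched as a nearest-reference lookup. The sketch below is not the disclosed implementation: the three-value feature vectors, the Euclidean distance measure, the `min_distance` and `max_distance` cutoffs, and the function names are all assumptions introduced only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_rare_candidates(candidates, common_centroids, min_distance):
    """Keep only candidates that are far from every common event cluster.

    This mirrors the slide-wide rarity check: a candidate that resembles
    any common cluster is dropped from the rare event candidate list.
    """
    return [
        c for c in candidates
        if all(euclidean(c, m) > min_distance for m in common_centroids)
    ]


def classify_against_atlas(event, atlas, max_distance):
    """Return the label of the closest atlas point, or None if unclassified."""
    best_label, best_dist = None, float("inf")
    for label, reference in atlas:
        d = euclidean(event, reference)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None


# Hypothetical toy data: (PanCK, VIM, CD45/CD31) intensities scaled to 0-1.
atlas = [("epi.CTC", (0.9, 0.1, 0.1)), ("leukocyte", (0.1, 0.2, 0.9))]
common_centroids = [(0.1, 0.2, 0.9)]
candidates = [(0.85, 0.15, 0.05), (0.12, 0.22, 0.88), (0.5, 0.9, 0.5)]

rare = filter_rare_candidates(candidates, common_centroids, min_distance=0.2)
labels = [classify_against_atlas(e, atlas, max_distance=0.3) for e in rare]
```

In this toy run the second candidate is removed because it resembles the common cluster, the first is labeled from the atlas, and the third remains unclassified (`None`), which corresponds to the events that would proceed to clustering and aggregate reporting.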
In the disease setting, in addition to these common immune cells, the liquid biopsy sample may further comprise rare cells that may actively escape or passively leak into the circulation and travel through the circulation, and may represent the disease.
Rare cells are defined as cells that are statistically distinct by their image analysis features. These rare cells are extracted by the following criteria: (a) after performing a bucketing analysis, the cells within the smallest population buckets are classified as rare; and (b) the cells within the cluster that is statistically deviant from the median value of all features from all cells are also classified as rare. The population of the rare cells may be lower than 5% of the total number of cells identified in the liquid biopsy sample. The population of the rare cells may be lower than 1% of the total number of cells identified in the liquid biopsy sample. The population of the rare cells may be lower than 0.1% of the total number of cells identified in the liquid biopsy sample.
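As a minimal illustration of criterion (a), the following Python sketch flags buckets whose share of all identified cells falls below a chosen fraction. It is an assumption-laden example rather than the disclosed method: the function name `rare_buckets`, the per-bucket application of the threshold, and the toy bucket labels are hypothetical, and criterion (b) is not shown.

```python
from collections import Counter


def rare_buckets(bucket_labels, max_fraction=0.01):
    """Return the set of buckets holding less than max_fraction of all cells.

    bucket_labels holds one bucket identifier per cell. The per-bucket
    threshold is an illustrative assumption.
    """
    if not bucket_labels:
        return set()
    counts = Counter(bucket_labels)
    total = len(bucket_labels)
    return {b for b, n in counts.items() if n / total < max_fraction}


# Hypothetical slide: 10,000 cells spread over three buckets.
cells = ["A"] * 9950 + ["B"] * 45 + ["C"] * 5

rare_at_1_percent = rare_buckets(cells, max_fraction=0.01)
rare_at_0_1_percent = rare_buckets(cells, max_fraction=0.001)
```

Tightening the fraction from 1% to 0.1% narrows the rare set from buckets B and C to bucket C alone, which parallels the successively stricter population limits recited above.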
The travel of the rare cells through the circulation may be with short half-lives or long half-lives. The rare cell travel may also include stopovers in various tissues along the way.
Representing the disease may mean that these rare cells may be (a) cancer cells as may be evidenced by their cancer genomic profiles and/or cancer protein markers; (b) tumor microenvironment cells that leak into circulation, wherein these cells may comprise epithelial cells, endothelial cells, mesenchymal cells, other stromal cells, cells that are in various transitional states, or a mixture thereof; (c) immune cells that may be responding to the tumor itself or cancer treatment; or (d) a mixture thereof.
The appearances of categories and classification of rare cells may be different across different cancers and stages of each cancer. Systems, methods and assays of this disclosure may identify various cellular subtypes both reproducibly for clinical practice while also enabling discovery of the unknown with an ability to detect a vast majority that have been implicated simultaneously in a unified experiment.
The subclasses of cells may be separated by protein and nuclear patterns as well as by cell morphology. The subclasses may be validated by downstream genomic or proteomic analyses, which might or might not be necessary for future clinical applications.
In another variation, one example relates to an approach to distinguish a substantially larger number of cellular groups using five markers. These markers are protein antibodies or molecules that are fluorescently labeled with four distinct fluorophores or fluorescent antibodies. The computational method combines morphological differences as revealed by distinct fluorescence signatures to distinguish between at least twelve different rare cell subtypes, which may be present in the liquid biopsy sample. These rare cells are listed below.
This approach leverages both a new sample processing protocol reducing the five markers into four fluorescence channels and a novel computational method for classifying the different rare cell types via analysis of fluorescent microscopy images. Important for the success of this approach is the choice of marker combinations within and across fluorescent channels.
The computational approach is distinct from other approaches in that it puts ‘every event’ into a bucket of similar biological structures. Others look for specifics, for the known, which is a fundamental limitation of standard image analysis and of machine learning approaches, as these would only ever find the known. If, on the other hand, we force the computational method to accommodate every event on the slide defined by the existence of an imaging signal (in our current case it is fluorescent, but it could also be brightfield), we can now cluster all events. As a next step, we allow for both common event clusters and rare event clusters. We in fact do not necessarily argue that all ‘cancer events’ are in rare clusters; instead, we are effectively reducing the dimensionality of the total slide of millions of events, each with hundreds of potential parameters, to a clustered framework that accommodates common (high frequency) and rare (low frequency) events. We know from the traditional CTC world that CTCs and, by extension, other disease-associated events are typically rare.
Additional details of the invention are found in attached Exhibits A and B, the entire disclosures of which are hereby incorporated by reference.
The following examples illustrate the various embodiments of the present invention. Those skilled in the art will recognize many variations that are within the spirit of the present invention and scope of the claims.
Accurate prognosis at the time of a diagnosis with early-stage breast cancer is a critical aspect of the diagnostic workup. Analytes in the blood-based liquid biopsy carry the opportunity for better characterization of the systemic burden of the disease during this clinical process. Breast cancer (BC) is the most common cancer in women globally and with 7.8 million cases diagnosed in the past 5 years, it is the world's most prevalent cancer overall (1-3). Approximately 94% of patients are initially diagnosed with early-stage BC, without evidence of macroscopic metastasis, however, despite the initial lack of detectable metastases and administration of subsequent treatments, 40% of the early-stage BC patients will go on to develop recurrence over their lifetime (4-9). Relapse, progression, and onset of distant metastasis (late-stage BC) have a significant negative impact on clinical outcomes, dropping the 5-year survival rate from 91% to less than 30% (1,3). Considering the impact on survival rates, it is vital that robust stratification of early-stage BC be made possible at the time of the initial diagnostic workup and throughout the course of the disease.
Currently, the standard screening method for BC is mammography, with a tissue biopsy to confirm diagnosis (3,4). In patients with biopsy confirmed cases of BC, tumor burden and treatment response are typically assessed by clinical evaluation of symptoms alongside imaging (4). While cross sectional advanced imaging is sometimes used to identify disease spread, it is expensive, often inconclusive, and fails to provide insight into the status and changes of the molecular profile of the tumor. Solid tissue biopsies have great utility in clinical care and can provide information on tumor biomarker and histological subtyping, molecular profiles, and advise treatment planning. Nevertheless, they have several caveats. First, primary tumors or metastatic lesions are not always easily accessible. Second, although solid biopsies provide valuable insights into the molecular signatures of the tumor, they are limited to the precise sampling area and could fail to capture the tumor heterogeneity (10-14). Third, and most crucial, solid biopsies are inherently incompatible with characterization of the subclinical systemic spread of the disease in addition to being challenging for longitudinal monitoring since they are painful, invasive, and always carry a potential risk to the patient (15-19).
Liquid biopsy (LBx), with a focus on peripheral blood, is a minimally-invasive method that can provide key information about the tumor and the systemic burden of the disease in the circulatory system (20,21). The utility of LBx for BC detection in the metastatic setting has been well-established with numerous clinical trials focusing on their utility to inform clinical decision-making and improve patient outcomes (22-28). Most of the LBx studies on BC focus on the presence of circulating tumor cells (CTCs), however, in the case of early-stage BC where CTC positive patients are scarce (29-33), more comprehensive analysis of tumor-related analytes in the LBx could be beneficial to assess the disease status. The third generation high-definition single cell assay (HDSCA3.0) workflow provides the opportunity to identify and characterize epithelial, mesenchymal, endothelial, and hematopoietic cells, as well as large extracellular vesicles (LEVs), building a platform capable of providing a more comprehensive overview of the circulating rare events and capturing the heterogeneity of the LBx (34).
In this study, we demonstrate the feasibility of using the HDSCA3.0 to stratify late-stage BC, early-stage BC, and normal blood donor status, using peripheral blood samples. We observe a distinctly higher presence of CTCs in the late-stage BC, compared to the early-stage and normal groups. Additionally, we determine that tumor-associated LEVs are found more frequently and in greater abundance in the early-stage BC group compared to late-stage and normal blood donor groups. In combination, this allows for both the stratification of cancer vs. normal and early- vs. late-stage BC with statistical confidence. Our results open the opportunity for a complementary LBx at the time of diagnostic workup for cancer detection, stage stratification, and disease monitoring.
A total of 100 BC patients and 30 normal donors are included in this study. Cancer patients were recruited to the prospective Physical Sciences in Oncology study (PSOC-0068) entitled OPTImization of blood COLLection (OPTICOLL) (35). Here, we present a subset consisting of 74 patients clinically classified as early-stage and 26 patients clinically classified as late-stage BC at time of enrollment (Table 1). All cancer patients were enrolled between April 2013 and Jan. 17, 2017, at multiple clinical sites in the United States: Billings Clinic (Billings, MT), Duke University Cancer Institute (Durham, NC), City of Hope Comprehensive Cancer Center (Duarte, CA), and University of Southern California Norris Comprehensive Cancer Center (Los Angeles, CA). Patient recruitment took place according to an institutional review board approved protocol at each site and all study participants provided written informed consent (35,36).
The study schedules were coordinated and unified across the clinical sites. For patients included in this study with non-metastatic treatment naïve disease (early-stage BC), the blood draws were acquired prior to any treatment. Patients with metastatic disease (late-stage BC) had multiple blood specimens collected at the beginning of a new line of therapy, either as a first line of therapy or post-progression while on therapy for treatment of metastatic malignancy. A total of 10 normal blood donor samples were procured from the Scripps Clinic Normal Blood Donor Service and defined as individuals with no known pathology. Additionally, 20 age and gender matched normal donor samples were provided from Epic Sciences and defined as women between 45-82 yrs (median=57) with no known pathology. Normal donors will refer to the accumulation of both Scripps Clinic and Epic Sciences samples.
Approximately 8 mL peripheral blood was collected in 10-mL blood collection tubes (Cell-free DNA BCT, Streck) at the respective clinical site. Blood specimens were shipped to and processed at the Convergent Science Institute in Cancer (CSI-Cancer) at the University of Southern California within 24-48 hours of collection, as previously described (20). Upon receipt, all samples underwent red blood cell lysis and the remaining nucleated cell population was plated in a monolayer on custom-made cell adhesive glass slides (Marienfeld, Lauda, Germany), at approximately 3 million cells per slide. The prepped slides were subsequently incubated in 7% BSA, dried and stored at −80° C. (20, 35, 36).
Two slides from each patient, corresponding to approximately 6 million nucleated cells, were thawed and subsequently stained using IntelliPATH FLX™ autostainer (Biocare Medical LLC, Irvine, CA, USA) in batches of 50 slides (46 patient slides [2 slides per patient] and 4 control slides) as previously described (20, 34, 36). All steps were performed at room temperature. Cells were fixed with 2% neutral buffered formalin solution (VWR, San Dimas, CA) for 20 min, nonspecific binding sites were blocked with 10% goat serum (Millipore, Billerica, MA) for 20 min. Slides were subsequently incubated with 2.5 ug/mL of mouse anti-human CD31 monoclonal antibody (Ab) (clone: WM59, MCA1738A647, BioRad, Hercules, CA) preincubated with 100 ug/mL of goat anti-mouse IgG monoclonal Fab fragments (115-007-003, Jackson ImmunoResearch, West Grove, PA) for 4 hr. After incubation with CD31-Fabs, cells were permeabilized using 100% cold methanol for 5 min. Cells were then incubated with an Ab cocktail consisting of mouse anti-human pan-cytokeratin (PanCK) mAbs (clones: C-11, PCK-26, CY-90, KS-1A3, M20, A53-B/A2, C2562, Sigma, St. Louis, MO), mouse anti-human CK19 mAb (clone: RCK108, GA61561-2, Dako, Carpinteria, CA), mouse anti-human CD45 Alexa Fluor®647 mAb (clone: F10-89-4, MCA87A647, AbD serotec, Raleigh, NC), and rabbit anti-human vimentin (VIM) mAb (clone: D21H3, 9854BC, Cell Signalling, Danvers, MA) for 2 hr. Slides were then incubated with Alexa Fluor®555 goat anti-mouse IgG1 antibody (A21127, Invitrogen, Carlsbad, CA) and counterstained with 4′,6-diamidino-2-phenylindole (D1306, ThermoFisher, Waltham, MA) for 40 min. Slides were then mounted with an aqueous mounting media to preserve cellular integrity for further downstream analysis.
After staining, the slides were imaged using automated high-throughput fluorescence scanning microscopy at 100× magnification, resulting in 2304 image frames per slide, as previously reported (20). Exposure times and gain for PanCK, VIM, CD45/CD31, and DAPI (DNA) channels were determined computationally by the scanner control software to normalize the background intensity levels across all slides. Using customized EBImage (4.12.2) software and the R scripting language for image analysis, cells were segmented, and their cellular and nuclear descriptors were extracted as previously described (34).
Rare events were detected by the third-generation of our computational algorithm for unsupervised clustering, as previously described (34). In brief, this approach allows for the classification of cells into common and rare groups based on principal component analysis of cells' morphometric features and subsequent hierarchical clustering (
Rare cells were then further classified into 8 classes based on the combinations of immunofluorescent marker expression in 3 categories: PanCK, VIM, CD45/CD31. Four categories showed no expression of cytokeratins but were determined positive for either VIM or CD45/CD31, or determined positive or negative for both. Enumerations of the cellular categories were done by trained analysts who determined the final enumeration per cell type.
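The eight classes follow from the positive or negative status of the three marker categories (2 × 2 × 2 = 8). The short Python sketch below simply enumerates those combinations; the label strings, including “DAPI only” for the marker-negative class, are naming choices made here for illustration.

```python
from itertools import product

MARKERS = ("PanCK", "VIM", "CD45/CD31")


def phenotype(flags):
    """Build a label such as 'PanCK+|VIM+' from per-marker booleans."""
    positive = [f"{name}+" for name, is_pos in zip(MARKERS, flags) if is_pos]
    return "|".join(positive) if positive else "DAPI only"


# Every positive/negative combination of the three marker categories.
classes = [phenotype(flags) for flags in product((True, False), repeat=3)]

# Classes with no cytokeratin expression.
ck_negative = [c for c in classes if not c.startswith("PanCK+")]
```

Four of the eight labels carry PanCK+ (the candidate CTC categories), and the remaining four are the cytokeratin-negative categories described above.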
Finally, the frequency of rare events (CTCs and LEVs) for each category was reported as concentration of rare cells per ml (mean, median, range), calculated by measuring the total number of nucleated cells per two slides, estimated using DAPI-stained nuclei count, against the total complete blood count of the received sample.
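One plausible reading of this calculation is sketched below in Python: the blood volume represented by the two slides is estimated as the DAPI-based nucleated cell count divided by the white blood cell concentration of the sample, and the event count is divided by that volume. The function name, parameter names, and example numbers are hypothetical and are not taken from the study data.

```python
import statistics


def events_per_ml(event_count, nucleated_cells_on_slides, wbc_per_ml):
    """Estimate rare events per ml of blood.

    nucleated_cells_on_slides: DAPI-stained nuclei counted on the slides.
    wbc_per_ml: nucleated (white blood) cell concentration of the sample.
    """
    if nucleated_cells_on_slides <= 0 or wbc_per_ml <= 0:
        raise ValueError("cell counts must be positive")
    blood_volume_ml = nucleated_cells_on_slides / wbc_per_ml
    return event_count / blood_volume_ml


# Hypothetical samples: (events, nuclei on two slides, WBC per ml).
samples = [(0, 6_000_000, 6_000_000), (3, 6_000_000, 3_000_000),
           (6, 6_000_000, 6_000_000)]
concentrations = [events_per_ml(*s) for s in samples]

summary = {
    "mean": statistics.mean(concentrations),
    "median": statistics.median(concentrations),
    "range": (min(concentrations), max(concentrations)),
}
```

The summary dictionary mirrors the mean, median, and range reporting format used for each rare event category.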
The computational approach uses EBImage to segment cells and extract quantitative cellular and nuclear features (34). For our morphometric analysis, we utilized the extracted features to further analyze the identified rare cells. Features correspond to cell size and eccentricity, nucleus size and eccentricity, immunofluorescent intensity of the DAPI, PanCK, VIM, CD45/CD31 channels, and the ratios of all combinations of these features to one another. Values for the immunofluorescent channels are reported as the mean signal over cell area, normalized per slide to interval 0-1.
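For illustration only, the per-slide scaling and the pairwise feature ratios may be sketched as follows. The study extracted features with EBImage in R; this Python fragment is a stand-in, and the use of min-max scaling, the small `eps` guard against division by zero, and the feature names are assumptions.

```python
from itertools import combinations


def min_max_normalize(values):
    """Scale one slide's channel values to the interval 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant channel carries no contrast; map it to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pairwise_ratios(features, eps=1e-9):
    """Ratio of every unordered pair of features for a single cell."""
    return {
        f"{a}/{b}": features[a] / (features[b] + eps)
        for a, b in combinations(features, 2)
    }


# Hypothetical mean channel signals for three cells on one slide.
panck_signal = [10.0, 20.0, 30.0]
panck_scaled = min_max_normalize(panck_signal)

cell = {"cell_size": 120.0, "nucleus_size": 60.0, "PanCK": 0.5}
ratios = pairwise_ratios(cell)
```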
Two-sided statistical analyses were performed using R (Version 4.1.1, Boston, MA). For all analyses, multiple groups were compared using the Kruskal-Wallis test (one-way ANOVA on ranks), a non-parametric rank-based test of whether the group distributions differ in location, and pairs of groups were compared using Student's t-test to determine whether there is a significant difference between the means of the two groups. P values below 0.05 were considered statistically significant. No multiple-comparison correction was applied because the comparisons were planned. Pearson correlation was used to evaluate the relationship between study groups.
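The Kruskal-Wallis statistic can be illustrated with a small, self-contained Python sketch. The study itself used R; the code below is only a didactic reimplementation with average ranks and the standard tie correction, and the p-value helper is limited to the three-group case, where the chi-square approximation with 2 degrees of freedom reduces to exp(−H/2). The example groups are hypothetical values, not study data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share the average rank) and tie sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom."""
    return math.exp(-h / 2)


# Hypothetical events/ml for three groups of three samples each.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat = kruskal_wallis_h(groups)
p_val = p_value_three_groups(h_stat)
```

For these fully separated toy groups the statistic is H = 7.2, giving p ≈ 0.027 under the chi-square approximation, which would fall below the 0.05 threshold stated above.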
The primary goal of this study was to determine the ability of HDSCA3.0 rare cell detection to stratify normal donor, early-stage BC, and late-stage BC into distinct groups based on the rare cellular events detected using the LBx approach. While this stratification was initially performed using statistical analysis on the cell counts, we explored the ability of using machine learning models with the target variable of disease state. We used the manual enumeration recorded as event counts per ml per fluorescent channel type. To overcome discrepancies in the sample size, we randomly oversampled the late-stage BC group to match the size of the early-stage BC cohort. Similarly, we oversampled the normal group to match the size of the combined BC groups. To ensure we were not biasing the dataset by oversampling two groups, we also performed combinations of random undersampling of early-stage and oversampling normal, as well as undersampling both early- and late-stage groups.
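A minimal sketch of random oversampling, using the cohort sizes stated in this study (26 late-stage, 74 early-stage, 30 normal donors), is given below. The helper name `random_oversample`, the fixed seed, and the placeholder sample identifiers are assumptions for illustration; the actual resampling tool used is not specified here.

```python
import random


def random_oversample(samples, target_size, seed=0):
    """Pad a group to target_size by redrawing its members with replacement."""
    if target_size < len(samples):
        raise ValueError("target_size must be at least the group size")
    rng = random.Random(seed)
    extra = [rng.choice(samples) for _ in range(target_size - len(samples))]
    return list(samples) + extra


# Placeholder identifiers standing in for per-sample event counts.
early = [f"early_{i}" for i in range(74)]
late = [f"late_{i}" for i in range(26)]
normal = [f"normal_{i}" for i in range(30)]

# Late-stage is matched to early-stage; normal is matched to both BC groups.
late_balanced = random_oversample(late, len(early))
normal_balanced = random_oversample(normal, len(early) + len(late_balanced))

total_samples = len(early) + len(late_balanced) + len(normal_balanced)
```

Every original sample is retained and only duplicates are added, so the balanced set contains 74 + 74 + 148 = 296 entries.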
For the model, we tested random forest, logistic regression, and naïve Bayes algorithms using Python 3 (Python Software Foundation, https://www.python.org/) and the Orange 3.0 data-mining toolbox in Python (38). Models were compared by measuring accuracy, sensitivity, specificity, and AUC (area under the ROC curve). In all comparisons, random forest was the top performing algorithm.
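Three of the comparison metrics can be computed directly from a binary confusion matrix, as the following Python sketch shows. It is illustrative only: the function name, the label strings, and the toy predictions are hypothetical, AUC is omitted because it requires predicted scores rather than labels, and both classes are assumed to be present in `y_true`.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical predictions for four cancer and four normal samples.
y_true = ["cancer"] * 4 + ["normal"] * 4
y_pred = ["cancer", "cancer", "cancer", "normal",
          "normal", "normal", "normal", "cancer"]

metrics = binary_metrics(y_true, y_pred, positive="cancer")
```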
To determine the stratification efficiency of the LBx using HDSCA3.0, a random forest algorithm was used to develop models to predict disease state classification. We built a random forest model with 10 trees. Our random forest model was trained, validated, and tested using data from 296 samples (74 early-stage, 74 late-stage, and 148 normal donors). Training and validation of the model was performed on ˜75% of the dataset through random selection (111 BC and 111 normal donors for cancer vs. normal/56 early-stage BC and 55 late-stage BC for early vs. late), using 10-fold cross validation. Testing of the model was performed on the remaining ˜25% of the dataset (37 BC and 37 normal donors for cancer vs. normal/18 early-stage BC and 19 late-stage BC for early vs. late), thereby maintaining the class distribution across training/validation/test sets.
A total of 155 blood draws from 130 participants, with 74 (56.9%) treatment-naive, nonmetastatic early-stage patients, 26 (20%) metastatic late-stage, and 30 (23.1%) normal donors, were included in this study. All participants were female. Patients' demographics are provided in Supplemental Table 1. The total sample set included 310 slides each containing approximately 3 million nucleated cells that were processed and analyzed for rare event detection (Methods).
We identified and categorized candidate rare cells using an automated rare cell detection workflow followed by manual enumeration based on the four-channel immunofluorescence staining corresponding to DAPI, PanCK, VIM, CD45/CD31, and cellular morphology (
CTCs that were identified as DAPI+|PanCK+ were defined as epi.CTCs and enumerated for normal donor, early-stage BC, and late-stage BC samples. The epi.CTC enumeration of all samples revealed a median of 0 cells/ml (mean=2.66, range=0-50.10 cells/ml). For the late-stage group, 75% of patients had at least one epi.CTC (mean=6.75, median=2.02, range=0-50.10 cells/ml), compared to only 27% of early-stage patients (mean=0.77, median=0, range=0-12.13 cells/ml; p=0.0011×10−04). Late-stage patients had a significantly higher level of epi.CTCs than the normal donor group (mean=0.39, median=0, range=0-2 cells/ml; p=0.0038×10−03). No significant difference in the epi.CTCs was observed between the early-stage BC and the normal donor groups. (
VIM+ CTCs (mes.CTCs) were identified as DAPI+|PanCK+|VIM+. For all samples we observed a median of 0 cells/ml (mean=1.27, range=0-16.42). The late-stage BC group revealed a significantly higher overall count of mes.CTCs (mean=2.52, median=1.02, range=0-16.42 cells/ml), in comparison with the early-stage BC (mean=0.91, median=0, range=0-7.06 cells/ml; p=0.0019) and the normal donor (mean=0.55, median=0, range=0-5 cells/ml; p=0.0024) groups. No significant difference was observed between the normal donor and early-stage BC groups (
Additional candidate CTCs include PanCK+|CD45/CD31+ (double positive CTC) and PanCK+|VIM+|CD45/CD31+ (triple positive CTC) cells. No significant difference was observed between the levels of double positive CTCs between the groups. The triple positive CTCs were found at significantly higher frequencies in both the early-stage BC (mean=12.80, median=1.80, range=0-240.04 cells/ml; p=0.008) and the late-stage BC (mean=4.34, median=2.07, range=0-40.56 cells/ml; p=0.014) compared to the normal donor (mean=1.56, median=0, range=0-17.062 cells/ml) group. No significant difference was observed in the comparison between the early- and late-stage groups (
Other detectable rare cells include morphologically distinct VIM+|CD45/CD31+|DAPI+, CD45/CD31+|DAPI+, DAPI+, and VIM+|DAPI+ cells. The VIM+|DAPI+ only cells showed a significant increase in the late-stage group (mean=14.43, median=4.74, range=0-266.82 cells/ml), compared to the early-stage (mean=3.84, median=1.44, range=0-27.81 cells/ml; p=0.00056) and the normal donor (mean=1.72, median=0.93, range=0-12.10 cells/ml; p=0.0031×10−02) groups (
Morphological analysis was conducted on the identified rare cells based on extracted image features from EBImage. A visual representation of the identified rare cells based on their morphometric features has been provided as a uniform manifold approximation and projection (UMAP) figure (
A correlation analysis between the frequency of classified rare cell categories was conducted for all samples and no strong correlation was found (
LEVs, classified as DAPI-|PanCK+ events were most prevalent in the early-stage BC group, with 94% of patients having at least one LEV per ml, compared to 60% in the late-stage group (
Correlations with Clinical Outcome
In the patient population with identified hormone receptor (HR) and end-of-therapy status (44 early-stage/57% and 12 late-stage/46%) (Supplemental Table 1,
Our results indicate a significantly higher frequency of LEVs in the early-stage BC group with the last follow-up status of “alive, free of disease” (mean=46.10, median=20.25, range=0 to 400.52 LEVs/ml, n=39) in comparison to those with “alive, active cancer” (mean=18.03, median=11.89, range=7.41 to 32.88 LEVs/ml; p=0.047, n=5) (
In the late-stage BC group, the overall median time from diagnosis to follow-up was 19.5 months (range=1 to 41, n=14), with no cases reported to be cancer-free. We found significantly higher epi.CTC levels in the group with the follow-up status of “deceased, active cancer on day of death” (mean=21.96, median=17.68, range=0 to 50.10 cells/ml, n=6), compared with “alive, active cancer” (mean=1.37, median=1.48, range=0 to 3.40 cells/ml; p=0.045, n=8).
Epi.CTC counts were also found to be elevated in BC patients with estrogen receptor (ER) positive (mean=14.78, median=2.18, range=0 to 50.10, n=9) compared to ER negative (mean=1.93, median=2.44, range=0 to 3.83, p=0.072, n=5) tumor status. The same relationship was also detected between the progesterone receptor (PR) positive (mean=20.33, median=13.70, range=0 to 50.10, n=6) and PR negative (mean=2.59, median=1.69, range=0 to 10.15, p=0.086, n=8) patients, although neither difference reached statistical significance. No significant relationship was observed between ER/PR tumor status and follow-up patient status in the late-stage BC patients. No significant difference was observed between HER2 tumor status and epi.CTC levels.
The random forest model exhibited acceptable performance, as measured by the ROC/confusion matrix, between normal vs. cancer and early-stage vs. late-stage comparisons (
In this study, we set out to stratify late-stage BC, early-stage BC, and normal donor peripheral blood samples based on rare circulating events identified using the HDSCA3.0 LBx platform. We utilized 5 biomarkers to identify and distinguish rare circulating events as epithelial, mesenchymal, endothelial, or hematological origin. Using this comprehensive profiling without prior enrichment, we were able to observe events in all samples, allowing for robust stratification with both manual classification and mathematical model building approaches. We were able to detect reproducible patterns in the enumeration of rare cells and LEVs. These reproducible patterns separate the relevant groups of cancer vs. normal control and early-stage cancer vs. late-stage cancer with high accuracy. Our findings demonstrate the feasibility to provide robust and reproducible detection of rare circulating events in peripheral blood draws and to stratify late-stage BC, early-stage BC, and normal donor samples.
Since metastasis is the most common cause of cancer mortality (1), earlier detection and precise diagnosis of existent and early tumor dissemination is imperative to improving patient outcomes. In our study, we found a statistically significant increase of CTCs in patients of the late-stage compared to early-stage BC groups. Previous studies have attributed the higher frequency of CTCs in late-stage BC patients to the dissemination of tumor (39); therefore, the lower incidence rate observed in the early-stage cancer setting could be explained by the organ-confined nature of the disease and lack of widespread metastasis. Previous work has demonstrated a link between CTC burden in late-stage BC and progression free survival (40); however, administration of treatment has been shown to affect the abundance of CTCs (41). In this study of late-stage BC patients, with draws taken either on or off therapy, we were able to detect epi.CTCs in 75% of the samples and observe a negative association of epi.CTC count with overall survival. Therefore, our results using a high-sensitivity non-enrichment technology demonstrate that epi.CTCs may still be detected, and provide prognostic value prior to the initiation of therapy, as well as during treatment.
Despite advances in the LBx field, the low abundance of CTCs, especially in early-stage cancer, remains a challenge for establishing precise diagnosis and prognosis in this setting. Furthermore, tumors are complex and are comprised of heterogeneous cell types, with CTCs that are defined by dual positivity for EpCAM and Cytokeratin only representing a fraction of the total tumor cells responsible for dissemination and relapse (42). Motivated by these prior observations, this next generation LBx was designed to identify and characterize the tumor heterogeneity in the circulatory system. By including eight rare cell categories, we were able to observe the heterogeneous phenotypes in circulation and to use these multiple LBx analytes to stratify the samples according to disease status with high statistical significance.
Detection of LEVs represent a promising new LBx analyte (37). Our results demonstrate a statistically higher overall presence of tumor-associated LEVs in the early-stage BC group, compared to the late-stage BC group and the normal donors. The high level of LEVs in the early-stage BC patients could be explained by the presence of the primary tumor, since these early-stage BC patient samples were collected prior to any treatment, at which time the patient still had their primary tumor intact. This contrasts with the late-stage patients, who are more likely to have had their primary tumor removed prior to the time of blood draw. Tumor associated LEVs have been described as a component of the tumor microenvironment (45), and primary tumors have been shown to harbor more cellular heterogeneity in comparison to metastatic lesions which are mostly composed of tumor cells (46). Additionally, previous findings have implicated extracellular vesicles for their role in facilitating pre-metastatic niche preparation (47,48). Tumor progression and metastasis requires the acquisition of invasive traits within the primary tumor alongside the generation of a permissive microenvironment at distant metastatic sites. Previous studies have found that in the case of BC, extracellular vesicles can initiate organ-specific pre-metastatic niche preparation (49). These results suggest that there is an additional possibility that LEVs are secreted into circulation in pre-metastatic early-stage disease from the primary tumor to facilitate the preparation of metastatic niches and are less inclined to be present in late-stage disease where the metastatic sites are well-established. Our study demonstrates that detection of LEVs, when applied alongside rare cell enumeration, provides a more sensitive and specific LBx analysis.
The OPTICOLL study was originally designed to provide a comprehensive analysis of pre-analytical variables of LBx (35,36) and is providing a platform for discovery using sample preparation methods that have been previously validated. A limitation of this study is the number of patients with sufficient follow-up that we were able to include. The results of this study should, however, provide sufficient feasibility to conduct larger trials and higher patient recruitment as the next step towards clinical utility. Both the use of additional lineage markers and the inclusion of LEVs in addition to CTCs have significantly advanced our ability to separate the patient groups. The patients with sufficient follow-up did not yet include plasma preparation for cell-free analysis, which one would expect to also add value. However, despite the current limitations, we were able to observe a highly significant difference in the LBx analytes between breast cancer patients and normal controls, and between the late-stage and early-stage BC samples collected. While the current observations are consistent with prior hypotheses of various liquid biopsy analytes, we expect these results will trigger further model system experiments to continue exploration of the early and late-stage implications of LEVs in particular as well as the design of additional trials to define the clinical utility as a potential adjunct to the diagnostic workup.
A more comprehensive profiling of the LBx as demonstrated here has the potential to complement the current diagnostic workup following a positive screening test. The current NCCN guidelines do not recommend systemic imaging such as FDG-PET scanning for the majority of early-stage patients as most patients will receive some form of adjuvant treatment (53). However, LBx findings, such as the frequencies of LEVs and CTCs, may provide diagnostic and prognostic information that would impact the utility of adjuvant systemic therapy in subsets of patients. LBx may also identify those patients who have occult secondary tumors as evidenced by persistence of LEVs following primary surgery or predict whether post-operative patients are more or less likely to benefit from adjuvant radiotherapy. For patients at risk of breast cancer, LBx may also have a role as an adjunct to radiologic screening for breast cancer by stratifying the Breast Imaging-Reporting and Data System (BI-RADS) category 3 patients into categories 2 or 4 based on LBx results. Such a combined approach may reduce the patient anxiety associated with indeterminate mammography results and reduce the need for 6-month call-back imaging. Each of these hypotheses requires testing in large scale prospective trials.
II. Characterization of Cellular and Acellular Analytes from Pre-Cystectomy Liquid Biopsies in Patients Newly Diagnosed with Primary Bladder Cancer
Bladder cancer (BCa) is the tenth most common cancer in the world, representing 3% of all new cancer cases [54]. Urothelial carcinoma (~90%) is the most frequent BCa histology diagnosed in the U.S., and can be subdivided by stage, grade, and subtype (conventional or variant morphology) [55]. Less common types include squamous (2-5%), adenocarcinoma (2%), and neuroendocrine (1%), as well as other rare tumors (<1%). Tumors that are confined to the lamina propria of the bladder are termed non-muscle invasive BCa (NMIBC; Ta, Tis (carcinoma in situ), T1), while those that invade the muscularis propria are called muscle invasive BCa (MIBC, T2-T4), an advanced stage with life threatening consequences requiring surgical management. BCa is highly lethal once cells have spread from the primary tumor to surrounding tissues and distant organs [56]. Cystectomy, the surgical removal of the bladder, is used to treat most BCa patients, as it offers the best chance of cure. The procedure can be performed alone or in combination with other treatments and can be considered a first-line intervention in cases of superficial tumors with severe anaplasia.
We have previously reported on the clinically observed patterns of relapse following cystectomy. Metastases developed in 29% of patients (n=812), resulting in a 5-year overall survival rate of 20.4%, compared to 78.6% in those without relapse (n=1,983) [57,58]. Most metastatic progression occurs within the first 24 months. In another study, information theory and machine learning algorithms were employed to create predictive models around this BCa database, in which the primary predictors of recurrence and survival after radical cystectomy were determined to be pathologic T stage and subgrouping into localized or metastatic conditions [56]. Clinical T stage had a lower predictive signal than the true pathologic T stage. This loss of valuable information may especially affect those cases in which there is an underestimation of disease severity prior to surgery [59]. This recognizes the limitation of current clinical staging at the time of diagnosis and highlights the importance of precision cell and tissue analysis in differentiating patients by outcome prior to and following surgical intervention.
The early relapse in primary BCa patients undergoing cystectomy may be attributed to the presence of pre-existing subclinical metastatic disease in these patients [57]. Current prominent methods for detection, diagnosis, and surveillance of the disease are based on urine cytology and cystoscopy. Urine cytology, while non-invasive, yields a low sensitivity of approximately 38% and a specificity of 98% [60]. On the other hand, cystoscopy has a higher sensitivity of 65-90% depending on the subtype but is a highly invasive procedure with significant inter- and intra-observer variation in tumor stage and grade [61]. Thus, there is great need for improving the current clinical paradigm of diagnostic workup and treatment planning. We hypothesized that the liquid biopsy as a biomarker of systemic disease may be diagnostic of subclinical metastatic disease and prognostic of early relapse. If proven correct, it could serve as a surrogate marker to guide the addition or use of alternative therapy as opposed to surgical intervention alone in patients diagnosed with BCa. A comprehensive analysis of the blood-based liquid biopsy may assist in solving complex clinical problems by tracking cellular evolution and phenotypic populations, revealing treatments that are not efficacious for specific patients, thus developing a stratification system in order to avoid unnecessary surgical intervention.
Circulating tumor cells (CTCs) shed by the tumor are often detectable in the peripheral blood (PB) of cancer patients and have been associated with poor prognosis and early relapse [61-64]. Busetto et al. observed a strong correlation between the detection of CTCs by CellSearch® and the time to first recurrence [62]. Furthermore, in a meta-analysis of 2161 BCa patients from 30 published articles, Zhang et al. showed that the number of CTCs detected in the PB correlated with tumor stage, histological grade, metastasis, and regional lymph node metastasis [63]. These studies indicate that the presence of CTCs in the PB is an independent predictive indicator of poor outcomes for BCa patients. The work presented here is based on a third-generation comprehensive liquid biopsy [65]. This non-enrichment based, high-content direct imaging methodology is capable of providing both visualization and characterization of a broad range of CTCs that are present in circulation, along with molecular parameters (DNA and protein) at both the cellular and acellular (large extracellular vesicles [LEVs] and cell-free DNA [cfDNA]) levels. We have previously reported the value of single-cell genomic analysis conducted on this platform showing compatibility with clinical practice [65-67].
The third generation high-definition single cell assay (HDSCA3.0) liquid biopsy workflow [68-70] was designed for rare cell identification with immunocytochemistry [18] along with downstream molecular characterization in order to deliver diagnostic pathology-quality data for clinical decision making [66,67,72-74]. The primary objective of the present study was to investigate the prognostic significance of CTCs in BCa patients from PB samples taken prior to cystectomy. The secondary objective was to assess the association between CTC presence and known clinical data metrics such as clinical or pathological staging and histological subtype. This study aims at establishing evidence for the clinical utility of the liquid biopsy in BCa with the future goal of predicting metastatic relapse post-cystectomy and enabling clinical intervention that can lead to improved outcomes.
This was a multi-institutional prospective study of patients diagnosed with BCa in which PB samples were collected before cystectomy and prior to any procedures. Eligible patients underwent cystectomy for surgical removal of the primary tumor from the bladder. University of Southern California's Keck School of Medicine (Keck; n=25) samples were collected between January and November 2020. Samples from the University of California San Diego (UCSD; n=9), Johns Hopkins Hospital (JHH; n=13), and LAC/USC Medical Center (LAC; n=3) were collected between January 2016 and November 2017. The Keck patient subset has prospectively collected clinical, radiologic, and pathologic data elements as well as a limited amount of follow-up data. For this cohort, recurrence is defined as any clinical recurrence, in the majority of cases shown radiologically, whether symptomatic or not. Patient recruitment took place according to an institutional review board approved protocol at each site, and all study participants provided written informed consent. Here we present the liquid biopsy analysis from a total of 50 BCa patients. Additionally, 50 normal donor (ND) samples from individuals with no known pathology were provided by Epic Sciences (San Diego, CA).
PB samples were collected in 10 ml blood collection tubes (Cell-free DNA, Streck) and processed by the Convergent Science Institute in Cancer (CSI-Cancer) at the University of Southern California within 24-48 hours as previously described [71]. Briefly, samples underwent red blood cell lysis, followed by plating the entire nucleated cell fraction on custom glass slides (Marienfeld, Lauda, Germany) at approximately 3 million cells per slide prior to long-term cryostorage at −80° C. and rare cell analysis.
For HDSCA analysis, each test consisted of two slides generated from the PB sample for an average of 0.74 ml blood analyzed. Slides were processed at room temperature using the IntelliPATH FLX™ autostainer (Biocare Medical LLC, Irvine, CA, USA) as previously described [65]. Briefly, samples were stained with 2.5 μg/ml of a mouse IgG1 anti-human CD31:Alexa Fluor®647 mAb (clone: WM59, MCA1738A647, BioRad, Hercules, CA) and 100 μg/ml of goat anti-mouse IgG monoclonal Fab fragments (115-007-003, Jackson ImmunoResearch, West Grove, PA), permeabilized using 100% cold methanol, followed by an antibody cocktail consisting of mouse IgG1/IgG2a anti-human cytokeratins (CKs) 1, 4, 5, 6, 8, 10, 13, 18, and 19 (clones: C-11, PCK-26, CY-90, KS-1A3, M20, A53-B/A2, C2562, Sigma, St. Louis, MO), mouse IgG1 anti-human CK 19 (clone: RCK108, GA61561-2, Dako, Carpinteria, CA), mouse anti-human CD45:Alexa Fluor®647 (clone: F10-89-4, MCA87A647, AbD Serotec, Raleigh, NC), and rabbit IgG anti-human vimentin (Vim) (clone: D21H3, 9854BC, Cell Signaling, Danvers, MA). Lastly, slides were incubated with Alexa Fluor® 555 goat anti-mouse IgG1 antibody (A21127, Invitrogen, Carlsbad, CA) and 4′,6-diamidino-2-phenylindole (DAPI; D1306, ThermoFisher) prior to mounting with a glycerol-based aqueous mounting media. Samples were imaged using automated high-throughput fluorescence scanning microscopy at 10× objective magnification, generating 2,304 frame images per fluorescence channel per slide.
As previously reported [65], rare cell candidates were detected using a custom computational methodology termed OCULAR (Outlier Clustering Unsupervised Learning Automated Report). Fluorescent images were used to segment each cell using the “EBImage” R package (EBImage_4.12.2) and extract 761 quantitative morphometric parameters based on the nuclear and cytoplasmic morphology and biomarker expression (CK, Vim, CD45/CD31) in a 4-channel immunofluorescence assay (DAPI, AlexaFluor® 488, AlexaFluor® 555, AlexaFluor® 647). Additionally, the algorithm identified DAPI-negative CK-positive events into a separate report to be classified as large extracellular vesicle (LEV) candidates [73].
Manual reporting was conducted on the identified events to check for signal intensity and distribution, as well as morphology. Images of candidate rare events were presented to a hematopathologist-trained technical analyst for analysis and interpretation. Rare events were classified into 12 categories (8 cellular, 4 LEV) based on the combination of immunofluorescent marker expression in the previously reported 4 channels. Epithelial-like CTCs (epi.CTCs) were classified as cells that were CK-positive, Vim-negative, and CD45/CD31-negative, with distinct appearing nucleus by DAPI morphology as previously described [65,71]. Epi.CTCs expressing Vim were classified as mesenchymal-like CTCs (mes.CTCs). White blood cell (WBC) counts of whole blood were determined automatically (Medonic M-series Hematology Analyzer, Clinical Diagnostic Solutions Inc., Fort Lauderdale, FL) and the number of WBCs detected by the assay per slide was used to calculate the actual amount of blood analyzed per test so that results are presented as fractional values of events/ml.
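The normalization to events/ml described above can be illustrated with a short sketch. It assumes that the volume of blood analyzed is estimated as the number of WBCs detected on the slides divided by the WBC concentration of the whole blood sample; the function names and the example numbers are hypothetical and are not taken from the study data.

```python
def blood_volume_ml(wbc_on_slides, wbc_per_ml):
    """Estimate the blood volume (ml) represented by the cells on the analyzed slides.

    Assumption: volume = WBCs detected by the assay / WBC concentration of whole blood.
    """
    if wbc_per_ml <= 0:
        raise ValueError("wbc_per_ml must be positive")
    return wbc_on_slides / wbc_per_ml


def events_per_ml(event_count, wbc_on_slides, wbc_per_ml):
    """Convert a raw rare-event count into a fractional events/ml value."""
    return event_count / blood_volume_ml(wbc_on_slides, wbc_per_ml)


# Hypothetical test: two slides with 3 million cells each, 7.5 million WBC/ml.
volume = blood_volume_ml(6_000_000, 7_500_000)   # about 0.8 ml of blood
rate = events_per_ml(100, 6_000_000, 7_500_000)  # about 125 events/ml
print(f"{volume:.2f} ml analyzed, {rate:.1f} events/ml")
```

Because the denominator is derived from the cells actually detected, two tests with the same raw count can yield different events/ml values when the patients' WBC counts differ.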
LEV candidates were positive for CK with variable Vim and CD45/CD31 expression. LEVs were identified through the OCULAR methodology outlined above with careful identification for those that were either free-floating or in close proximity to cells. Due to the close proximity of the cell-attached LEVs, OCULAR interpreted both as a single cellular event. Manual classification to separate these two entities as individual rare events was employed to correct for the computational oversimplification of OCULAR. Further, corrections included excluding any halos, bubbles, or light refractions resembling the morphology of LEVs (round and membranous) when examining frames of patient samples through the CK channel. A maximum threshold of three LEVs per frame was used to rule out CK-positive junk particles that may have landed on the slide during processing.
Statistical significance was determined at a p-value ≤0.05. To perform statistical analysis of the clinical, radiologic, and pathologic data, we used two statistical tests: Spearman's rank correlation coefficient [75] and the Mann-Whitney U test, also known as the Wilcoxon rank sum test [76,77]. The Spearman rank test was used to calculate the correlation between continuous variables as we are not strictly evaluating the degree of linear relationship, but rather the degree of monotonic relationship between the two target variables. In addition, it was also non-exclusively applied to evaluate the correlation between continuous variables and categorical variables that have a well-defined ordinal encoding and multiple outcomes. For example, the clinical T stage was encoded such that the available classifications (T0, Tis, Ta, T1, T2a, T3b, T4a) were assigned to ordinal values from 0 to 6. To evaluate the correlation between continuous and categorical data without a well-defined ordinal encoding, we also performed the Wilcoxon rank sum test.
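The ordinal encoding of clinical T stage and its use with the Spearman rank test can be sketched as follows. This is a minimal illustration that assumes `scipy.stats.spearmanr` is the function applied; the helper name and the per-patient values are hypothetical and chosen only to show the mechanics.

```python
from scipy.stats import spearmanr

# Ordinal encoding from the text: T0 -> 0, Tis -> 1, ..., T4a -> 6.
T_STAGE_ORDER = ["T0", "Tis", "Ta", "T1", "T2a", "T3b", "T4a"]
T_STAGE_CODE = {stage: code for code, stage in enumerate(T_STAGE_ORDER)}


def spearman_vs_stage(stages, values):
    """Spearman correlation between ordinally encoded T stage and a continuous analyte."""
    codes = [T_STAGE_CODE[stage] for stage in stages]
    rho, p_value = spearmanr(codes, values)
    return rho, p_value


# Hypothetical per-patient data (not from the study): stage and rare events/ml.
stages = ["Tis", "Ta", "T1", "T2a", "T3b", "T4a", "T0"]
events_ml = [90.0, 70.0, 65.0, 40.0, 20.0, 10.0, 120.0]

rho, p_value = spearman_vs_stage(stages, events_ml)
print(f"Spearman coefficient={rho:.2f}, p-value={p_value:.4f}")
```

In this toy example the analyte decreases strictly with stage, so the coefficient is -1 even though the relationship is not linear, which is the property that motivates a rank-based test here.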
The Wilcoxon rank sum test determines whether two samples are likely to derive from the same population, is appropriate for small datasets, and does not require that the data be paired or normally distributed [78]. This nonparametric test is calculated based on the ranks (or order) of the numerical variables, making it robust with respect to outliers. For categorical variables that can have more than two classifications, the Wilcoxon rank sum test is calculated between all possible classification pairs. For example, the correlation between total rare cell count vs clinical predominant cancer cell type (Urothelial, Other, No Tumor) is calculated for all combinations: Urothelial vs Other, Urothelial vs No Tumor and Other vs No Tumor. All statistical tests were performed in Python (version 3.8.5) with the Scipy library (version 1.5.0).
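The pairwise comparison across a categorical variable with more than two classifications can be sketched as below. The text does not state which SciPy function was called; this sketch assumes `scipy.stats.ranksums` (`scipy.stats.mannwhitneyu` would be an alternative), and the counts per group are hypothetical.

```python
from itertools import combinations

from scipy.stats import ranksums


def pairwise_rank_sum(groups):
    """Run the Wilcoxon rank sum test for every pair of categories.

    `groups` maps a category name to a list of numeric values;
    the result maps each (category_a, category_b) pair to (statistic, p_value).
    """
    results = {}
    for name_a, name_b in combinations(groups, 2):
        statistic, p_value = ranksums(groups[name_a], groups[name_b])
        results[(name_a, name_b)] = (statistic, p_value)
    return results


# Hypothetical total rare cell counts/ml by predominant cancer cell type.
groups = {
    "Urothelial": [74.6, 120.3, 55.1, 210.8, 98.4, 61.0],
    "Other": [40.2, 88.9, 35.7, 66.3],
    "No Tumor": [150.5, 95.2, 130.1],
}

results = pairwise_rank_sum(groups)
for (name_a, name_b), (statistic, p_value) in results.items():
    print(f"{name_a} vs {name_b}: statistic={statistic:.2f}, p-value={p_value:.3f}")
```

With three classifications this yields the three comparisons listed in the text; no multiple-comparison correction is applied in the sketch, since none is described.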
To visualize the morphometrics of detected cellular events, a two-dimensional tSNE (t-distributed stochastic neighbor embedding) was used [79]. To aid the identification of clusters in the tSNE, a clustering algorithm was used. Specifically, we applied agglomerative clustering imported from the sklearn library version 0.23.2 [80]. For the clustering parameters, we used Ward linkage and a Euclidean distance metric [81].
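A minimal sketch of this visualization step is given below, using synthetic data in place of the morphometric measurements. Several choices are assumptions that the text does not specify: the features are standardized before embedding, the clustering is run on the two-dimensional embedding rather than on the original features, and the perplexity and number of clusters are arbitrary. Ward linkage in sklearn operates on Euclidean distances, so no separate metric argument is passed.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for per-cell morphometrics: 60 cells x 8 measures,
# drawn from two well-separated groups (hypothetical, not study data).
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(30, 8)),
    rng.normal(loc=4.0, scale=1.0, size=(30, 8)),
])

# Assumption: standardize each measure so no single scale dominates.
scaled = StandardScaler().fit_transform(features)

# Two-dimensional tSNE embedding; perplexity must be below the sample count.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(scaled)

# Agglomerative clustering with Ward linkage (Euclidean distances).
clusterer = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = clusterer.fit_predict(embedding)

print(embedding.shape, np.bincount(labels))
```

The resulting `labels` array can be used to color the points of the tSNE scatter plot so that clusters are easier to identify by eye.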
Classification models were used to test whether BCa patients can be discerned from NDs utilizing liquid biopsy data alone (i.e., whether one has distinct rare event populations when compared to the other). The python library sklearn version 0.23.2 was used to develop the machine learning models [80]. Two slides each from 50 ND samples were collected to mirror the 50 BCa patient samples. For each individual, the data utilized in the classification models was the counts for each cell and event classification per ml of blood averaged across both slides. Three different classification models (random forest [RF], support vector machine [SVM] and naive Bayes [NB]) were tested to produce a binary outcome indicating whether an individual is within the BCa or ND category. We employed a 5-fold cross validation method to test each model architecture in which the dataset was divided into five equal folds of 20 individuals. Each fold is then used as a test set for a model built with the remaining four, yielding five models for each of RF, SVM, and NB (i.e., 15 total models). We employed a grid search algorithm to find optimal hyperparameters for the RF and SVM. Final model metrics are averages across all models of the same type.
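The cross-validation scheme can be sketched with sklearn as follows. The data are synthetic stand-ins (100 individuals, 12 event categories), and several details are assumptions not given in the text: stratified folds, a Gaussian naive Bayes variant, the hyperparameter grids, and an inner 3-fold split for the grid search.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic events/ml for 50 ND (label 0) and 50 BCa (label 1) individuals.
n_per_class, n_features = 50, 12
X = np.vstack([
    rng.gamma(shape=2.0, scale=10.0, size=(n_per_class, n_features)),
    rng.gamma(shape=2.0, scale=30.0, size=(n_per_class, n_features)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Five folds of 20 individuals; stratification is an assumption.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Grid search wraps RF and SVM only, as in the text; grids are illustrative.
models = {
    "RF": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=3,
    ),
    "SVM": GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=3),
    "NB": GaussianNB(),
}

summary = {}
for name, model in models.items():
    # One model is fit per outer fold, giving five accuracy values per architecture.
    scores = cross_val_score(model, X, y, cv=outer_cv, scoring="accuracy")
    summary[name] = (scores.mean(), scores.std())
    print(f"{name}: accuracy {scores.mean():.2f} +/- {scores.std():.2f}")
```

Running the grid search inside each training split keeps the held-out fold untouched during hyperparameter selection, which is one reasonable way to obtain the per-fold metrics that are then averaged.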
A total of 50 patients with primary BCa were accrued for this study, each providing a single PB sample obtained prior to cystectomy. Site specific liquid biopsy data is provided in supplemental
A complete blood cell count was taken at CSI-Cancer prior to blood processing. For the 50 BCa samples included here, there was a median WBC count of 6.75 (range 3.3-25; mean 7.5) million cells/ml PB. For all BCa samples, total rare event (total cells and LEVs) detection had a median of 132.67 events/ml (range 38.11-1,220.51; mean 230.33). For ND samples, total rare event detection had a median of 38.50 events/ml (range 4.39-141.55; mean 47.86). A significant difference was observed between the BCa patients and ND (p-value <0.0001).
We have identified 8 cellular categories defined by nuclear DAPI signal and rely on the expression of the different biomarkers in each channel. A gallery of CTCs and graphical representation of the frequency of each rare event identified per test for each patient sample is shown in
Total CK-positive cells were detected with a median of 27.59 cells/ml (range 0-895.72; mean 79.36) from all BCa samples. The ND samples had a median of 12.90 cells/ml (range 0-83.24; mean 18.96). There was a statistically significant difference in total CK-positive cell detection between BCa patient and ND samples (p-value=0.0093). Only 1 BCa patient (2%) did not present with CK-positive cells at the time of sample collection. Using a threshold of positivity of >5 cells/ml, a total of 44 samples (88%) were positive for CK expressing cells. The frequency of CK-positive cells detected within the total rare cell population varied. Overall, there was a median frequency of 30.2% (range 0-97%; mean 36%) in the BCa samples.
Epi.CTCs were detected with a median of 0 cells/ml (range 0-27; mean 1.2) from BCa patient samples. Mes.CTCs were detected with a median of 0 cells/ml (range 0-25.12; mean 2.33) from BCa patient samples. There was no statistically significant difference in epi.CTCs/ml or mes.CTCs/ml observed between BCa patient and ND samples.
Additional candidate CTCs detected include CK|CD45/CD31 (median 1.44 cells/ml; range 0-267.84; mean 13.76) and CK|Vim|CD45/CD31 (median 23.19 cells/ml; range 0-729.44; mean 60.09). Other detectable rare cells include morphologically distinct Vim|CD45/CD31 (median 10.51 cells/ml; range 0-919.24; mean 68.36), CD45/CD31 only (median 0 cells/ml; range 0-14.49; mean 1.89), DAPI only (median 5.00 cells/ml; range 0-46.86; mean 6.76), and Vim only (median 11.18 cells/ml; range 0-149.57; mean 22.04). There was a statistically significant difference between BCa patient and ND samples in cellular enumeration of Vim|CD45/CD31 (p-value=0.0018), CK|Vim|CD45/CD31 (p-value=0.0003), Vim only (p-value=0.0406), and DAPI only (p-value=0.0430). The biological significance of these cellular populations has not been determined.
The most prevalent cell types observed in the PB of BCa patients prior to cystectomy were Vim|CD45/CD31 (median 15.19%; range 0-80.53%; mean 28.64%) and CK|Vim|CD45/CD31 cells (median 22.99%; range 0-79.49%; mean 26.10%), followed by Vim only cells (median 14.02%; range 0-81.13%; mean 21.94%). Out of all the rare cells detected across patient samples, Vim|CD45/CD31 cells constituted 45.24%, CK|Vim|CD45/CD31 cells constituted 31.05% and Vim only cells constituted 10.74%. We identified a positive correlation between mes.CTC and CK|Vim|CD45/CD31 (spearman coefficient=0.58, p-value <0.001), as well as two other cellular categories (Vim only [spearman coefficient=0.358, p-value=0.01], CK|CD45/CD31 [spearman coefficient=0.292, p-value=0.040]). This suggests that the cellular populations are associated with each other and represent the heterogeneity of the disease.
To visualize the cellular subgroups and their similarities with respect to morphometrics we used 8 key measures. The first four are obtained from the median immunofluorescence intensity of DAPI, CK and CD45/CD31 channels. The second set of four are the area and eccentricity for the cell and the nucleus. The morphometrics were visualized by a two-dimensional tSNE plot shown in
The channel-type classified cellular populations had observable morphological heterogeneity which is displayed in
LEVs were classified by DAPI negativity, CK signal positivity and distribution, as well as morphology. Total LEV detection for the BCa patient samples had a median of 30.91 LEVs/ml (range 2.22-319.08; mean 51.92). The ND samples presented with a median of 3.34 LEVs/ml (range 0-27.91; mean 4.65), which was significantly lower than that detected in the BCa samples (p-value <0.0001). In BCa patient samples, LEVs were detected either alone (n=740; 44.6%) or in close proximity to cells (n=918; 55.4%). In ND samples, these LEV populations totaled 85 (45.9%) and 100 (54.1%), respectively.
CK only LEVs were detected in all BCa patients with a median of 27.06 LEVs/ml (range 1.08-235.92; mean 37.80). CK|Vim|CD45/CD31 LEVs were also detected in 27 patients (54%) with a cohort median of 1.05 LEVs/ml (range 0-163.95; mean 11.60). A positive correlation was observed between CK|Vim LEVs and CK|Vim|CD45/CD31 LEVs (spearman coefficient=0.47, p-value=0.001). Both of these LEV populations were detected at a significantly higher level in BCa patient samples than ND samples (p-value <0.0001 for both). The observed LEVs represent additional tumor heterogeneity and a new potential analyte to monitor disease status.
The detection of LEVs was not associated with the detection of epi.CTCs or mes.CTCs. We observed a negative correlation between Vim|CD45/CD31 cells and CK only LEVs (spearman coefficient=−0.39, p-value=0.005). Additionally, a negative correlation was found between CK|Vim LEVs and DAPI only rare cells (spearman coefficient=−0.28, p-value=0.05).
3.4. Keck Cohort with Clinical Data
Correlation analysis was used to determine the relationship between the various liquid biopsy analytes and the clinical/demographics metrics collected for the Keck subset of patients (n=22). Here, we report only the significant correlations, whereas a complete table of all comparisons can be found in the supplemental. A negative correlation was detected between BMI and the Vim only cells/ml (spearman coefficient=−0.41, p-value=0.05), as well as age and the DAPI only cells/ml (spearman coefficient=−0.59, p-value <0.001). WBC count correlated with CK|CD45/CD31 cells/ml (spearman coefficient=0.47, p-value=0.02) and CK|Vim|CD45/CD31 cells/ml (spearman coefficient=0.46, p-value=0.03). Platelet count at the time of sample collection correlated with total rare events/ml (spearman coefficient=0.57, p-value <0.001), total CK expressing cells/ml (spearman coefficient=0.47, p-value=0.02), mes.CTCs/ml (spearman coefficient=0.48, p-value=0.02), total LEVs/ml (spearman coefficient=0.61, p-value <0.001), and CK only LEVs/ml (spearman coefficient=0.63, p-value <0.001). Creatinine blood measurements correlated with CK only LEVs/ml (spearman coefficient=0.43, p-value=0.04).
Clinical T stage was negatively correlated with CK|CD45/CD31 LEVs/ml (spearman coefficient=−0.62, p-value <0.001). Pathological T stage was negatively correlated with total rare events/ml (spearman coefficient=−0.50, p-value=0.01) and total rare cells/ml (spearman coefficient=−0.53, p-value=0.01). Those patients with Tis had significantly more rare cells/ml than those patients with T3a pathological staging (Wilcoxon=−2.12, p-value=0.03). The significance of the other channel-type rare cells has yet to be determined. Additionally, patients with Tis had significantly more CK only LEVs/ml than patients with T3a pathological staging (Wilcoxon=−2.12, p-value=0.03). This suggests that LEVs could be an analyte for early disease.
Total cells/ml, total LEVs/ml, and CK+LEVs/ml negatively correlated with recurrence (spearman coefficients=−0.44, −0.42, −0.42, respectively; p-value <0.05). The potential for recurrence is low as this prospective study had a median follow-up time since surgery of 9 months (range: 6-17) and additional time is warranted for progression/survival data to mature.
Statistical tests and predictive modeling were used to discern the BCa population from NDs. According to the Wilcoxon rank sum test, the counts/ml detected in NDs belong to different populations than the corresponding samples of BCa for multiple rare event classifications and groups. According to the classification models, the BCa patients and NDs contained distinct cell populations that allowed for stratification, as evidenced by their overall accuracies. The RF, SVM, and NB architectures had average accuracies across their five respective models of 89%+/−9.7%, 87%+/−9.8%, and 83%+/−11.2%, respectively. This corresponds to incorrectly predicting 11 (BCa=5, ND=6), 13 (BCa=8, ND=5), and 17 (BCa=12, ND=5) individuals across each of the models. When looking at the receiver operating characteristic (ROC) curves, the RF yielded an average AUC of 0.94+/−0.09, as compared to 0.91+/−0.07 for SVM and 0.90+/−0.13 for NB. Among the three architectures tested, the RF achieved the highest sensitivity of all models (84%+/−18%), but the lowest specificity (90%+/−9%). Comparatively, the SVM and NB had sensitivities of 79%+/−17% and 70%+/−25% and specificities of 93%+/−10% and 92%+/−10%, respectively.
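The misclassification counts quoted above follow from the mean accuracies and the cohort size of 100 individuals (50 BCa and 50 ND). The short sketch below reproduces that arithmetic; the variable names are illustrative.

```python
# Cohort size: 50 BCa patients plus 50 normal donors.
n_individuals = 100

# Mean accuracies reported for each architecture.
mean_accuracy = {"RF": 0.89, "SVM": 0.87, "NB": 0.83}

# Misclassified individuals = cohort size x (1 - accuracy), rounded to whole people.
misclassified = {
    name: round(n_individuals * (1 - accuracy))
    for name, accuracy in mean_accuracy.items()
}

for name, count in misclassified.items():
    print(f"{name}: {count} of {n_individuals} individuals misclassified")
```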
For the RF, the top three most important events for discerning BCa from ND were CK only LEVs, CK|Vim|CD45/CD31 LEVs, and Vim|CD45/CD31 cells (See
We have detected liquid biopsy analytes unique to patients diagnosed with BCa prior to cystectomy. More precise clinical diagnostic tools are warranted in the context of BCa to predict response to therapy and monitor minimum residual disease to minimize metastatic progression. This study documents several important findings for liquid biopsy analysis for patients with BCa undergoing cystectomy: (i) CTCs and LEVs are detected in the PB, (ii) there is a high heterogeneity of CTCs, and (iii) liquid biopsy analytes correlate with clinical data elements. The liquid biopsy is a useful non-invasive tool for the discovery of cancer related biomarkers to represent the complex process of tumorigenesis. Our findings suggest that CTC and LEV analysis from the liquid biopsy should be further investigated as an inclusion in BCa patient management.
In summary, our study found that rare cells can be detected in BCa PB samples (median 74.61 cells/ml) as well as ND samples (median 34.46 cells/ml). When specifically considering CK-positive cells, BCa samples presented a median of 27.59 cells/ml while ND samples presented a median of 12.90 cells/ml. This study also found that LEVs can be detected in BCa samples, at a significantly higher count than in ND samples (median 30.91 vs 3.34 LEVs/ml). Across all BCa samples, both epi.CTCs and mes.CTCs were observed in only 34% and 40% patients, respectively. However, other candidate CTCs were detected at higher frequencies which include CK|CD45/CD31 (median 1.44 cells/ml) and CK|Vim|CD45/CD31 (median 23.19 cells/ml). Additionally, our study found that multiple liquid biopsy analytes both positively and negatively correlated with clinical data metrics, including clinical and pathological T stage, as well as recurrence. For example, patients with Tis disease had significantly more rare cells and CK only LEVs than those with T3a disease.
There are several methods to detect bladder cancer, some of which are technically challenging and invasive, and their accuracy varies with each method's sensitivity and specificity. By having a foundational understanding of the interpretation of sensitivity and specificity, healthcare providers will understand outputs from current and new diagnostic assessments, aiding in decision-making and ultimately improving healthcare for patients. Cystoscopy is invasive and uncomfortable for patients due to the technical requirements of the procedure, but it is still the most accurate diagnosis method for BCa (sensitivity 68-100%, specificity 57-97%) [82]. Urine cytology is a non-invasive liquid biopsy approach, and when high-grade tumors are considered, the sensitivity is high (84%), but the sensitivity decreases to 16% in NMIBC, precluding its use in the detection of low-grade lesions [83]. Here we show that in a mixed cohort (NMIBC and MIBC), applying classification models using liquid biopsy data, we achieved an average sensitivity of 78% and specificity of 92% for the identification of BCa patient samples. We set out to use the liquid biopsy for detection of subclinical metastasis prior to surgical resection. While this remains our primary goal, the data also support the general consideration of the liquid biopsy for screening and diagnostic work-up of BCa.
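For reference, sensitivity is the fraction of true disease cases that a test flags, TP/(TP+FN), and specificity is the fraction of disease-free cases that it clears, TN/(TN+FP). The sketch below states these definitions and shows that the averages of 78% and 92% are consistent with the mean of the per-model values reported for the RF, SVM, and NB classifiers; the confusion-matrix counts in the usage lines are hypothetical.

```python
from statistics import mean


def sensitivity(true_pos, false_neg):
    """Fraction of diseased individuals correctly identified: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of disease-free individuals correctly identified: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Reported per-model (sensitivity %, specificity %) values.
reported = {"RF": (84, 90), "SVM": (79, 93), "NB": (70, 92)}

avg_sensitivity = round(mean(sens for sens, _ in reported.values()))
avg_specificity = round(mean(spec for _, spec in reported.values()))
print(f"average sensitivity {avg_sensitivity}%, average specificity {avg_specificity}%")

# Hypothetical confusion-matrix counts for a cohort of 50 cases and 50 controls.
print(sensitivity(true_pos=42, false_neg=8), specificity(true_neg=45, false_pos=5))
```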
The liquid biopsy might be an indicator of early disease dissemination with micrometastases, and assessment prior to cystectomy is therefore crucial. CellSearch® CTCs were detectable in 8/44 NMIBC patients at diagnosis (18%) in which the presence of CTCs was associated with a shorter time to first recurrence [84]. Using the HDSCA3.0 workflow we detected epi.CTCs in 38% and mes.CTCs in 46% of BCa patients presented here; however, there was no statistical difference between the same type of cells detected in the ND samples. Detection of CTCs prior to cystectomy in BCa patients has been shown to serve as evidence of progressing disease which may predict the appearance of a macroscopic lesion in a longer-term period. Patients with low CTC counts before cystectomy are therefore hypothesized to have a low risk of recurrence and are thus good candidates for cystectomy [64]. Additional time is needed to monitor the progress of the BCa patients in this study and determine if our hypothesis is correct.
A heterogeneous population of rare cells was observed in the PB of BCa patients prior to cystectomy. Here we identified 8 categories of rare cells based on the expression of 4 biomarkers (CK, Vim, CD45/CD31), but further cellular stratification could be conducted using morphometric parameters as these categories include a mixture of cell types as seen by morphological analysis. Since the total rare cell count/ml was correlated with pathological T staging (spearman coefficient=−0.53, p-value=0.01), we conclude that the rare cells detected are indeed related to disease status. This is evidence for the circulation of multiple CTC populations and other rare cells, possibly from the tumor microenvironment (TME), as measures of tumor burden and disease state. Furthermore, since the HDSCA3.0 workflow detects rare cells beyond the epi.CTCs, the negative association between total rare cells/ml and tumor stage may be driven by the high frequency of cells other than CTCs that may represent the TME. We hypothesize that rare cell populations expressing Vim|CD45/CD31 include circulating endothelial cells (CECs). In a prior publication, we showed that CECs (CD138|von Willebrand Factor positive, CD45 negative) in PB samples were morphologically distinct from the surrounding WBCs, and CEC count was significantly higher in myocardial infarction patients than in healthy controls [85]. The presence of CECs in the PB may be a novel way to assess vascular function in BCa patients, potentially as markers of altered vascular integrity or even direct contributors to tumor formation (i.e., angiogenesis). Further characterization is warranted to understand the biological significance of each channel-type cellular population, but this study highlights the promise of the liquid biopsy for early risk stratification of BCa patients, prediction of treatment response, and early detection of metastatic relapse.
Here we show that circulating LEVs can be detected with an enrichment-free liquid biopsy approach, representing a promising new analyte for BCa care. Tumor heterogeneity is further seen in the 4 different LEV categories detected. The results presented here demonstrate a statistically significantly higher overall presence of tumor-associated LEVs in BCa patients prior to cystectomy compared to the NDs (median 30.91 LEVs/mL vs. 3.34 LEVs/mL, respectively), most likely due to the presence of the primary tumor. Exosomes contain a number of analytes (nucleic acids, proteins, and metabolites) that strongly reflect the parental cell properties, making them a promising alternative to CTCs or circulating tumor DNA (ctDNA) as biomarkers of disease. In a study of extracellular vesicles (EVs; size 30-200 nm) detected in urine, BCa patients had a higher concentration of EVs than healthy controls, with a sensitivity of 81.3% and a specificity of 90.0% for discriminating BCa patients from healthy controls [86]. This supports the utility of LEVs in the diagnostic work-up for BCa clinical care. In prostate cancer, LEVs detected in the PB using the same workflow were 1.9 times as frequent as CTCs and shared a similar protein signature [73]. Here we show that LEVs are associated with BCa tumorigenesis and may be useful diagnostic and prognostic biomarkers. Further characterization of the LEVs detected here will validate their neoplastic origin and association with the BCa disease state.
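For reference, sensitivity and specificity figures such as those quoted above are derived from the counts of correctly and incorrectly classified patients and controls. The sketch below shows that arithmetic under assumed counts; the function name `sensitivity_specificity` and the counts are hypothetical and do not come from this study or from the cited urine EV study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    tp: patients correctly classified as positive
    fn: patients missed by the classifier
    tn: controls correctly classified as negative
    fp: controls incorrectly classified as positive
    """
    sensitivity = tp / (tp + fn)  # fraction of patients detected
    specificity = tn / (tn + fp)  # fraction of controls correctly excluded
    return sensitivity, specificity


# Hypothetical counts for illustration only: 10 patients and 10 controls.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

With these assumed counts, 8 of 10 patients are detected and 9 of 10 controls are correctly excluded.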
Molecular characterization of the rare events detected in this study will elucidate their potential role in BCa tumorigenesis. Molecular profiling through genomic and proteomic analysis of a patient's liquid biopsy will have value in enabling the discovery of novel drivers of growth and metastasis that help direct individual treatment or identify potential new treatment targets. Using the HDSCA workflow, we have the unique opportunity for a comprehensive analysis of the liquid biopsy [66,67,74,87-90]. Previous studies have used single-cell sequencing and targeted multiplexed proteomic analysis to characterize both circulating rare and common cells detected by the HDSCA workflow in a variety of clinical scenarios [66,67,74,87,88,91]. Additionally, cfDNA genomic analysis is possible for a more comprehensive view of the liquid biopsy. Multiple prior studies indicate that ctDNA is detectable in plasma of BCa patients, and high levels of ctDNA are associated with progression and metastatic disease [92-95]. Chalfin et al. show that CTC and ctDNA provide complementary information in urothelial carcinomas [96]. The ability to characterize tumor heterogeneity using a single platform with comprehensive single-cell DNA, single-cell multiplexed targeted proteomics, and cfDNA analysis could provide precision diagnostics from the time of initial diagnosis for patients with BCa. Future research aims to establish evidence towards the clinical utility of the liquid biopsy in BCa to predict metastatic relapse post-cystectomy and enable clinical intervention to lead to improved outcomes.
This study establishes evidence for the clinical utility of the liquid biopsy in BCa with the future goal of predicting metastatic relapse post-cystectomy and enabling clinical intervention that can lead to improved outcomes. Here we show the identification of rare cells and LEV frequencies unique to BCa patients, with distinct populations within and across patients underscoring the heterogeneity of liquid biopsy profiles. Further, the high specificity and sensitivity metrics of the prediction models demonstrate the stratification of BCa patients from NDs using this methodology. While further investigation is needed to elucidate the predictive power of these analytes with respect to recurrence, the findings from this study show the liquid biopsy to be a promising clinical tool for early-stage BCa patients.
While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.
This application claims the benefit of U.S. provisional application Ser. No. 63/304,404 filed Jan. 28, 2022, the disclosure of which is hereby incorporated in its entirety by reference herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2023/011889 | 1/30/2023 | WO | |

Number | Date | Country |
---|---|---|
63304404 | Jan 2022 | US |