The present invention relates to a data-driven integrative visualization system and method for summarizing and presenting genomic aberrations, their drug responses and multi-omic data of a patient. Specifically, a method for displaying genomic aberrations and multi-omic data of a patient in an interactive tool which allows the medical practitioner to access underlying supporting biologic and scientific evidence from relevant knowledge bases through a set of graphical interactions, is described. The method comprises the steps of obtaining and inputting multi-omic data of a patient or cohorts, identifying genomic aberrations and their drug responses, and displaying this information in a first level interactive classical/circular ideogram located by genome coordinates in one or multiple layers on a GUI, from which the user can access and view further information on the gene and molecular levels. The system provides an improved process of integrative analysis of a patient's multi-omic data for effective treatment planning.
An ideogram is a standard visual tool for locating the positions of individual genes or aberrations on chromosomes. Traditionally, the prominent Giemsa-staining bands are marked on each chromosome and named following the International System for Human Cytogenetic Nomenclature (ISCN). In the ISCN scheme, each chromosome has a short arm and a long arm, designated p and q respectively. The numbering for a chromosome begins at its centromere, and the numbers assigned to each region increase towards the telomere.
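As an illustration of the band naming just described, the following minimal sketch splits a band label such as 17q21.31 into chromosome, arm and region numbers. The function name, the Band type and the accepted label pattern are assumptions made for this example and are not part of the described system.

```python
import re
from typing import NamedTuple


class Band(NamedTuple):
    chromosome: str  # "1".."22", "X" or "Y"
    arm: str         # "p" (short arm) or "q" (long arm)
    region: str      # digits counted outward from the centromere
    sub_band: str    # digits after the decimal point, "" if absent


# Hypothetical pattern for simple labels like "7q34" or "17q21.31".
_BAND_RE = re.compile(r"^([1-9]|1[0-9]|2[0-2]|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def parse_band(label: str) -> Band:
    """Split a band label into its parts, or raise ValueError."""
    match = _BAND_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised band label: {label!r}")
    chromosome, arm, region, sub_band = match.groups()
    return Band(chromosome, arm, region, sub_band or "")
```

Only simple single-band labels are handled here; ranges and other ISCN notation are out of scope for the sketch.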
Krzywinski, M. et al., Circos: an information aesthetic for comparative genomics, Genome Research 19, 1639-1645 (2009), describe a software-driven tool for visualizing data and information in a circular format, which makes it ideal for exploring relationships and information. This format was originally designed for visualizing genomic data and for creating publication-quality infographics and illustrations, but is also applied in data fields to describe the relationships between objects or positions in a circular layout, and to summarize multilayered annotations of one or more scales. When used in genomics as an alternative to classical ideograms, the circular genome coordinates make it effective in displaying variations in genomic structure, and data as scatter, line and histogram plots, heat maps, tiles, connectors and texts in multiple tracks. Currently, its use in genomics is mainly for the static presentation of cohort data, most often in scientific publications. It neither supports user interaction or data exploration, nor facilitates sample/cohort comparison, and is not intended for presenting precision medicine or clinical trial information for an individual patient.
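The circular genome coordinates mentioned above can be illustrated with a short sketch that maps a chromosome position to an angle on a circle. This is not the Circos implementation. The function names are hypothetical, and the chromosome lengths used in the usage note are toy values chosen only to make the arithmetic easy to follow.

```python
import math
from typing import Dict, Tuple


def build_offsets(lengths: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Return the cumulative start offset of each chromosome and the total length."""
    offsets: Dict[str, int] = {}
    total = 0
    for chromosome, length in lengths.items():  # dicts keep insertion order
        offsets[chromosome] = total
        total += length
    return offsets, total


def to_angle(chromosome: str, position: int,
             offsets: Dict[str, int], total: int) -> float:
    """Map a position to an angle in radians in the range [0, 2*pi)."""
    return 2.0 * math.pi * (offsets[chromosome] + position) / total


def to_xy(angle: float, radius: float) -> Tuple[float, float]:
    """Place the angle on a circle: 0 is at 12 o'clock, increasing clockwise."""
    return radius * math.sin(angle), radius * math.cos(angle)
```

With toy lengths of 100, 50 and 50 for three chromosomes, the start of the second chromosome falls halfway around the circle, at an angle of pi. Drawing a track at a different radius places another data layer on the same angular coordinates.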
The goal of this invention is to create a new tool for precision medicine software applications, in which both genomic aberrations and their corresponding treatment options and drug responses are summarized for one or more patients. The existing classical ideogram and Circos plot are fairly simple and non-interactive. By creating a new representation that is interactive, we enable users to navigate and view the details of the genomic data at different levels, explore the underlying scientific evidence and have quick access to relevant information in knowledge bases. The new interactive Precision Medicine Explorer of this invention significantly improves the process of integrative analysis of a patient's multi-omic data for effective treatment planning.
In further contrast to prior art, this invention is an effective precision medicine tool for summarizing and presenting the genomic aberrations, their drug responses and multi-omic data of a patient. It facilitates the understanding of the underlying biology and the supporting scientific evidence by allowing a user to dig deep into the details and access relevant information from knowledge bases, such as ClinVar (www.ncbi.nlm.nih.gov/clinvar/), LOVD (Leiden Open (source) Variation Database, www.lovd.nl/3.0/home), HGMD (Human Gene Mutation Database, www.hgmd.cf.ac.uk/ac/index.php), COSMIC (cancer.sanger.ac.uk/cosmic), 1000 Genomes (www.internationalgenome.org), OMIM (omim.org) and other databases, through an extensive set of graphical interactions.
Our Precision Medicine Explorer can be implemented as a standalone application or as a GUI component that takes processed omic data as inputs. The software can run as software as a service (SaaS) on a cloud-based infrastructure, or as a standalone application on a mobile device, laptop or local server. Each layer is associated with an independent data environment, which may include multiple tables for mutations (SNVs, indels, CNVs, fusions, etc.) with annotation information, drug options, clinical trials, gene/exon expression, and methylation. Besides visualizing and presenting the data, the tool also handles user inputs and interactions, and queries different knowledge bases to incorporate further information if necessary.
It is an object of the present invention to provide an improved presentation for exploration of patient-oriented omic data (genomic, transcriptomic, proteomic, epigenomic, etc.), treatment options and underlying scientific evidence for use by clinicians, oncologists, geneticists, medical professionals and scientists. In particular, it is an object of the present invention to provide a system and method that solves the above-mentioned problems of the prior art by providing an interactive visualization tool for summarizing and presenting patient multi-omic data in a circular or linear multilayered format. It is also an object of the present invention to provide a system and method for providing patient genomic aberration, detailed annotations and related drug response data to improve the view of combined effects of multiple genomic aberrations on the functional effect as well as link to potential therapy. It is a further object of the present invention to provide interactive access, through the visual multi-omics format, to underlying intergenic genomic information, methylation and gene/exon expression data, on a genic scale, and nucleotide sequence, amino acid sequence and methylation data, on a molecular scale. It is also an object of the present invention to provide an alternative to the prior art.
Thus, the above-described object and several other objects are intended to be obtained in a first aspect of the invention by providing a system and method for providing relevant patient-specific genomic information, such system and method comprising:
obtaining genomic aberration and other omics data from a patient and storing said data on a non-transitory computer readable storage medium, where one common process for data generation involves collecting tissue and blood samples from the patient, performing next-generation sample preparation and DNA/RNA sequencing, read alignment, and calling of variants and gene expression;
optionally selecting a cohort of samples based on user-defined demographic and phenotypic criteria from a repository of patient or healthy samples, and extracting their genomic aberration and omics data for comparison with the patient of interest;
annotating the genomic aberration and omics data using internal/external knowledge bases, which include information such as mutation impact, population allele frequency, disease association with mode of inheritance, drug response, etc.;
filtering the genomic aberrations and omics data based on user-defined criteria, such as chromosome regions, genes, variant type/function/impact/population allele frequency, etc.;
using a computing device with a graphical user interface, displaying the genomic aberration and omics data in an interactive multi-level format, which comprises:
a first level (Level 1), comprising an interactive chromosomal view that summarizes all the clinically relevant or actionable genomic aberrations of a patient by marking them on the genome coordinates, including known drug responses associated with a particular mutation/gene marked next to the mutation/gene accordingly, the first level further comprising two additional levels which can be accessed by the user which include Level 1A, a circular ideogram view where chromosomes are arranged in a circular layout, and Level 1B, an ideogram view, where each chromosome is separately displayed in a schematic;
a second level (Level 2), comprising an interactive intergenic genomic scale where multiple genes are displayed with their expression levels indicated by color. Additional data tracks, such as methylation, chromatin immunoprecipitation sequencing (ChIP-Seq), native elongating transcript sequencing (NET-Seq) and assay for transposase-accessible chromatin using sequencing (ATAC-Seq) data, can be included at any view level to improve the functional view of genomic aberrations. ChIP-Seq data show whether transcription factors functionally bind to their targets, NET-Seq data allow analysis of genome-wide transcriptional activity, and ATAC-Seq data allow the study of chromatin accessibility. These aspects may lead to conclusions about the activation of downstream gene targets;
a third level (Level 3), comprising an interactive genic scale, depicting the structure and functional blocks within a gene, omics data such as methylation levels and gene/exon expression, the 3D protein structure (ribbon plot) with mutations marked and including general information about the gene; and
a fourth level (Level 4), comprising a molecular scale displaying the molecular sequence and its detailed annotations, such as the nucleotide sequence of the reference genome, the corresponding amino acid sequence in the protein-coding regions, nucleotide/amino acid changes caused by the mutations, exon/gene expression, methylation levels of CpG sites, ChIP-Seq data for histone modification, and any additional data tracks that incorporate more details. The complete human reference sequence (GRCh37) can be downloaded in FASTA format from the UCSC Genome Browser server (hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/), and the exon locations of the known canonical genes and other gene annotations can also be downloaded from the UCSC Genome Browser; and
displaying said first through fourth levels individually on a graphical user interface.
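The filtering step listed above, which applies user-defined criteria to annotated aberrations, could be sketched as follows. This is a minimal illustration rather than the described implementation. The Aberration and FilterCriteria types, their field names and the impact labels are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Aberration:
    chromosome: str
    start: int
    end: int
    gene: str
    variant_type: str     # e.g. "SNV", "indel", "CNV", "fusion"
    impact: str           # hypothetical labels such as "HIGH" or "MODERATE"
    population_af: float  # population allele frequency from the annotation step


@dataclass
class FilterCriteria:
    # A criterion left as None is not applied.
    chromosomes: Optional[Set[str]] = None
    genes: Optional[Set[str]] = None
    variant_types: Optional[Set[str]] = None
    impacts: Optional[Set[str]] = None
    max_population_af: Optional[float] = None


def passes(ab: Aberration, c: FilterCriteria) -> bool:
    """Return True when the aberration satisfies every active criterion."""
    if c.chromosomes is not None and ab.chromosome not in c.chromosomes:
        return False
    if c.genes is not None and ab.gene not in c.genes:
        return False
    if c.variant_types is not None and ab.variant_type not in c.variant_types:
        return False
    if c.impacts is not None and ab.impact not in c.impacts:
        return False
    if c.max_population_af is not None and ab.population_af > c.max_population_af:
        return False
    return True


def filter_aberrations(aberrations: Iterable[Aberration],
                       criteria: FilterCriteria) -> List[Aberration]:
    """Keep the aberrations that pass all active criteria, preserving order."""
    return [ab for ab in aberrations if passes(ab, criteria)]
```

Criteria combine with a logical AND, so restricting to chromosome 7 and to a maximum population allele frequency of 0.01 keeps only rare variants on that chromosome.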
By clicking or selecting a region on the chromosome, or by specifying a range of chromosome positions, users can view, access and explore data at these different view levels. The data come from different sources: (i) patient-specific data, such as mutations, gene expression and additional data tracks, can be stored as flat files or database tables; (ii) variant annotations can be retrieved from local or online knowledge bases; and (iii) reference genomes, gene locations and annotations are data files that can be downloaded from public repositories and stored locally.
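Selecting data by a range of chromosome positions amounts to an interval overlap query. The sketch below assumes a hypothetical text format of the form chromosome:start-end for the range and a simple in-memory list of features. A real deployment would more likely query the flat files or database tables mentioned above.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Feature(NamedTuple):
    chromosome: str
    start: int  # inclusive
    end: int    # inclusive
    label: str


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse a range such as "7:120-410" (format assumed for this sketch)."""
    chromosome, _, span = text.strip().partition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if not chromosome or start > end:
        raise ValueError(f"invalid region: {text!r}")
    return chromosome, start, end


def features_in_region(features: Iterable[Feature], region: str) -> List[Feature]:
    """Return the features that overlap the requested range."""
    chromosome, start, end = parse_region(region)
    return [
        f for f in features
        if f.chromosome == chromosome and f.start <= end and f.end >= start
    ]
```

Two closed intervals overlap exactly when each one starts no later than the other ends, which is the condition tested in the list comprehension.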
In addition, a second aspect of the present invention is directed to a display of the omics data of a patient or a cohort of patients in multiple layers for side-by-side comparison. The genome coordinates are locked and in line across layers. Users are able to add/remove/combine/change the order of multiple layers and explore any one of them in detail through all interactions that are applicable to a single layer, which when executed by a computing device with a graphical user interface, cause the device to carry out the steps of the method as described above.
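One way to keep genome coordinates locked across layers is to let every layer read the same viewport object. The sketch below illustrates that idea together with the add, remove and reorder operations. The class and method names are assumptions for this example and do not describe the actual GUI component.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Viewport:
    chromosome: str
    start: int
    end: int


@dataclass
class LayerStack:
    viewport: Viewport                      # shared by all layers
    layers: List[str] = field(default_factory=list)

    def add(self, name: str) -> None:
        if name in self.layers:
            raise ValueError(f"layer already present: {name!r}")
        self.layers.append(name)

    def remove(self, name: str) -> None:
        self.layers.remove(name)            # raises ValueError if absent

    def move(self, name: str, new_index: int) -> None:
        """Change the display order of one layer."""
        self.layers.remove(name)
        self.layers.insert(new_index, name)

    def zoom_to(self, chromosome: str, start: int, end: int) -> None:
        """Update the single shared viewport, so every layer follows."""
        self.viewport = Viewport(chromosome, start, end)

    def views(self) -> List[Tuple[str, Viewport]]:
        """Pair each layer, in display order, with the shared viewport."""
        return [(name, self.viewport) for name in self.layers]
```

Because there is only one viewport, zooming or panning in any layer changes the coordinates shown in all of them, which keeps the layers aligned for side-by-side comparison.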
The methods according to the invention will now be described in more detail with regard to the accompanying figures. The figures show ways of implementing the present invention and are not to be construed as being limiting to other possible embodiments falling within the scope of the attached claims.
The present invention provides a system and method for summarizing and presenting genomic aberrations, their drug responses and multi-omic data of a patient, by displaying genomic aberrations and multi-omic data of the patient in an interactive classical/circular ideogram format which allows the medical practitioner to access underlying supporting biologic and scientific evidence from relevant knowledge bases through a set of graphical interactions. The present invention is described in further detail below with reference made to
With a computing device having a graphical user interface, the genomic aberration and omics data are then displayed in an interactive multi-level format. At Level 1 of the method and system for displaying patient-specific genomic data and genomic aberrations, all the clinically relevant or actionable aberrations of a patient are summarized by marking them on the genome coordinates (see
By selecting a specific gene at Level 2, the user is directed to Level 3 of this embodiment, as shown in
Similarly, the user accesses Level 4, as seen in
To enhance data presentation, the invention employs different symbols to represent different types of aberrations and drug/clinical trial associations with their levels of significance indicated by properties such as color and size, as can be seen in
To enable seamless navigation of the patient's multi-omic data at different levels of detail and quick access to relevant information from different knowledge bases, the Precision Medicine tool of this invention is highly interactive and user friendly. The set of supported user interactions includes, but is not limited to, the following:
In a further embodiment, users can choose to display the omic data of a patient or a cohort of patients in multiple layers of the visual representation in the Precision Medicine Explorer for side-by-side comparison. See
In genomics, it is customary to offer multiple filtering options to the user for each of the types of genomic aberrations. Within this embodiment, the goal is to associate the genomic aberrations to key evidence for treatment planning. In any embodiment of this invention, users can determine what data is to be presented in one or multiple layers of ideogram by applying a combination of filters that include but are not limited to the following:
Users can show the genes or other information associated with a keyword on the ideogram by typing the keyword in a search box with autocomplete functionality. The search term can be a gene symbol, signaling pathway, disease, drug, or biological concept such as oncogene/suppressor, etc. Users can also search for a combination of these terms concatenated by logical operators, such as “,/OR”, “&/AND”, etc. Once the data related to the search term(s) are retrieved from the databases, they are displayed on the same or a separate ideogram (see
Referring to
To make the zoom-in or zoom-out transition look continuous and smooth, and enhance the navigation and user experience, our Precision Medicine Explorer includes a 3D option that enables users to view the chromosome layouts from different visual perspectives (see
Association with Evidence for Key Findings
One essential functionality of our Precision Medicine Explorer is to display the drugs/treatments with their known predicted/experimental/clinical responses (increased/decreased) or clinical trial options associated with patient-specific data, such as genomic aberrations, up/down-regulated gene expression, abnormal methylation levels or other omics anomalies, together with supporting evidence that can be further explored through user interactions. For example, the gene mutation BRAF V600E is known for increased sensitivity to vemurafenib in melanoma, and the gene mutation EGFR T790M for resistance to tyrosine kinase inhibitors. Such associations can be looked up in local/external knowledge bases such as the Catalogue Of Somatic Mutations In Cancer (COSMIC) database, the Mutations and Drugs Portal (MDP), the Cancer Drug Resistance Database (CancerDR), the Drug Gene Interaction Database (DGIdb) and ClinicalTrials.gov. Additional information on the drugs, such as side effects, toxicity, mechanism of action, interactions with other drugs and the supporting scientific evidence, can be accessed for display. Gathering, summarizing and presenting such information in a single tool can facilitate the design of combinatorial therapy and warn against drug combinations that should be avoided.
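The association lookup described above can be pictured with a small in-memory table standing in for a knowledge base. The sketch uses only the two examples given in the text. The table, the type and the function name are hypothetical, and no real knowledge-base API is called.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class DrugAssociation(NamedTuple):
    drug: str
    response: str  # "increased" or "decreased" sensitivity
    context: str   # disease context, "" when not stated


Mutation = Tuple[str, str]  # (gene symbol, protein change)

# Stand-in for a local knowledge base; entries are the examples from the text.
KNOWLEDGE_BASE: Dict[Mutation, List[DrugAssociation]] = {
    ("BRAF", "V600E"): [DrugAssociation("Vemurafenib", "increased", "melanoma")],
    ("EGFR", "T790M"): [DrugAssociation("tyrosine kinase inhibitors", "decreased", "")],
}


def drug_associations(
    mutations: Iterable[Mutation],
) -> Dict[Mutation, List[DrugAssociation]]:
    """Return the known drug associations for each mutation (empty if none)."""
    return {m: list(KNOWLEDGE_BASE.get(m, [])) for m in mutations}
```

A mutation with no entry maps to an empty list, so the display layer can distinguish "no known association" from a lookup that was never performed.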
As a use case example, our Precision Medicine Explorer is used for examining the omic data of an ER+ breast cancer patient. From the top-level view, the oncologist gets a genomic overview of the clinically relevant mutations carried by the patient and the available drug options. As expected, an overexpression of the ESR1 gene is reported with a list of drug options consisting of ER inhibitors. To further examine the expression levels of the genes in the ER pathway, the oncologist adds a track for gene expression and filters for a pre-defined panel of ER pathway genes. After inspecting the expression values, she confirms that the patient has a hyperactive ER pathway, which could be effectively suppressed by ER inhibitors. She also notices that the patient carries a known pathogenic mutation in the PIK3CA gene. She clicks on the mutation, checks the allele frequency, function, pathogenicity, call quality and related publications, among other details, and confirms that the mutation serves as a good prognostic biomarker for a favorable therapeutic response to PIK3CA inhibitors. After comparing the clinical evidence and possible side effects of the drug options, she decides to administer, in combination, the two inhibitors with the strongest clinical evidence for suppressing the activities of ER and PIK3CA respectively. In this use case, our Precision Medicine Explorer significantly improves the workflow of an oncologist performing integrative analysis on a patient's omic data for treatment planning.
This application claims priority to U.S. provisional patent application No. 62/490,921, filed on Apr. 27, 2017, the entire disclosure of which is hereby incorporated by reference herein for all purposes.
Number | Date | Country
---|---|---
62490921 | Apr 2017 | US