The present disclosure generally relates to genotype to phenotype association methods and devices, and more particularly to genotype to phenotype association methods and devices for use in indexing whole exomes or genomes relative to phenotypic expression.
Clinical genetics is a relatively new and evolving practice. Whole exome sequencing (WES) is increasingly being utilized to establish the genetic basis of disease in patients. Advances in genome sequencing technologies have allowed for rapid development of pipelines for sequence reading, alignment, and variant calling, but the downstream tasks of variant interpretation and assessing the clinical relevance of variants are still being refined.
In clinical settings, WES is increasingly used to establish the genetic basis of rare and single-gene disorders in patients. Sequencing laboratories return a static report of genetic variants that are potentially associated with a patient's clinical features (phenotype). The static report does not allow a clinician to easily assess the raw data or to update results when new symptoms appear. Further, there is an inherent lag between the static report and newly discovered genotype/phenotype associations.
Sometimes referred to as tertiary analysis, variant interpretation includes annotating, filtering, and associating sequence variants with disease, for example, translating the genomic data into a clinical diagnosis. Typically, WES generates 30-60 million base pairs, or 4-6 GB of raw sequencing data, for each patient. After alignment to the human reference genome, about 250,000-400,000 variants are identified. Most of the variants are likely to be benign, and only a small number, often as few as one or two, contribute to a specific genetic disease in a patient. Identifying which sequence variants are disease-causing can be overwhelming and difficult for researchers. Typically, extensive bioinformatics experience is required to use many of the analysis tools currently available to the research community.
Currently, there is a bottleneck in human genomics and exome sequencing studies when narrowing down a list of sequencing variants from 100,000+ to just a few (usually 1 or 2) that are disease-causing in an individual.
One aspect of the present disclosure comprises a genome system for displaying an interactive genome dashboard. The genome system includes a processing device having a processor configured to perform machine learning and to perform a matching function between phenotype keywords and gene variants identified in a genome sequence to create gene matches based upon multiple text inputs and the genome sequence introduced through the interactive genome dashboard. The processing device includes memory in which previously generated matches are tagged and stored based upon the multiple text inputs, the genome sequence, and subsequent receipt of user interaction with the generated matches. The processing device receives one or more phenotype keywords and the genome sequence from the genome dashboard, identifies genetic variants associated with the phenotype keywords, matches the genetic variants to known genetic variants to generate a first diagnosis, and sends a signal to present the first diagnosis and the phenotype keywords associated with the genetic variants on the genome dashboard. Responsive to receiving a signal from a user of the genome dashboard adding filters, the processing device applies the added filters to the phenotype keywords associated with the genetic variants and to the first diagnosis, generates filtered phenotype keywords associated with the genetic variants and a second diagnosis, and sends a signal to present the second diagnosis and the filtered phenotype keywords associated with the genetic variants on the genome dashboard.
Another aspect of the present disclosure comprises a non-transitory computer readable medium storing instructions executable by an associated processor to perform a method for implementing a genome system for displaying an interactive genome dashboard. The method includes storing a first diagnosis generated by the genome system based upon a genome sequence and initial data, the initial data comprising identified genetic variants of the genome sequence, phenotype keywords, multiple text inputs, and phenotype-genetic variant associations. The method further includes, responsive to receiving additional multiple text inputs, extracting one or more additional phenotypic terms from the additional multiple text inputs, identifying one or more genetic variants present in the genome sequence associated with the one or more additional phenotypic terms, and generating a second diagnosis based upon the one or more additional phenotypic terms and the initial data. The method additionally includes, responsive to the first diagnosis being the same as the second diagnosis, storing the second diagnosis; and, responsive to the first diagnosis being different from the second diagnosis, presenting the second diagnosis on the genome dashboard.
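The store-or-present decision at the end of the method above can be pictured with a minimal Python sketch. It is not the disclosed implementation: diagnoses are reduced to plain strings, and the names `CaseRecord` and `handle_reanalysis` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CaseRecord:
    """Hypothetical per-patient record of diagnoses kept by the genome system."""
    stored: list = field(default_factory=list)     # diagnoses saved without notifying the user
    presented: list = field(default_factory=list)  # diagnoses surfaced on the dashboard


def handle_reanalysis(case: CaseRecord, first_diagnosis: str, second_diagnosis: str) -> str:
    """Store an unchanged diagnosis; surface a changed one on the dashboard."""
    if second_diagnosis == first_diagnosis:
        case.stored.append(second_diagnosis)
        return "stored"
    case.presented.append(second_diagnosis)
    return "presented"
```

For example, calling `handle_reanalysis(case, "DX-A", "DX-A")` only stores the result, while `handle_reanalysis(case, "DX-A", "DX-B")` queues the new diagnosis for display, so the user is interrupted only when reanalysis actually changes the outcome.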
Yet another aspect of the present disclosure comprises a genome system for displaying an interactive genome dashboard. The genome system includes a processing device having a processor configured to perform a matching function between phenotypes and gene variants to create gene matches based upon multiple text inputs and genome sequences introduced through the interactive genome dashboard. The processing device receives one or more phenotype keywords and a genome sequence of a patient exhibiting the one or more phenotype keywords, and matches and presents on the interactive genome dashboard one or more gene variants present in the genome sequence associated with the one or more phenotype keywords. Further, the processing device identifies and presents on the interactive genome dashboard disease candidates based upon the association of the one or more gene variants with the one or more phenotype keywords; identifies and presents on the interactive genome dashboard non-represented gene variants that are associated with each of the disease candidates but are not present in the one or more gene variants; and generates a sortable list on the interactive genome dashboard identifying each of the one or more phenotype keywords and each of the one or more gene variants that comprises clinical evidence supporting each of the disease candidates.
The foregoing and other features and advantages of the present disclosure will become apparent to one skilled in the art to which the present disclosure relates upon consideration of the following description of the disclosure with reference to the accompanying drawings, wherein like reference numerals, unless otherwise described, refer to like parts throughout the drawings and in which:
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present disclosure.
The apparatus and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
Referring now to the figures generally, wherein like numbered features shown therein refer to like elements throughout unless otherwise noted. The present disclosure generally relates to genotype to phenotype association methods and devices, and more particularly to genotype to phenotype association methods and devices for use in indexing whole exomes or genomes relative to phenotypic expression.
The processing device 12 would generate outputs based upon inputs received from a secondary device 16, cloud storage, a local input from a user, etc. It would be appreciated by one having ordinary skill in the art that the processing device 12 would include a data storage device 17 in various forms of non-transitory, volatile, and non-volatile memories, which would store buffered or permanent data as well as compiled program code used to execute functions of the processing device 12. In another example embodiment, the data storage device 17 can be external to and accessible by the processing device 12; the data storage device 17 may comprise an external hard drive, cloud storage, and/or other external recording devices 19.
In one example embodiment, the processing device 12 comprises one of a remote or local computer system 21. The computer system 21 includes a desktop, laptop, tablet, hand-held personal computing device, LAN, WAN, WWW, and the like, running on any number of known operating systems, and is accessible for communication with remote data storage, such as a cloud or host operating computer, via the World Wide Web or Internet.
In another example embodiment, the processing device 12 comprises a processor, a data storage, computer system memory that includes random-access memory (“RAM”) and read-only memory (“ROM”), and/or an input/output interface. The processing device 12 executes, through the processor, instructions stored on an internal or external non-transitory computer readable medium, the processor receiving communications via the input interface and/or electrical communications, such as from the secondary device 16 (e.g., smart phone, tablet, personal computer, or other device). In yet another example embodiment, the processing device 12 communicates with the Internet, a network such as a LAN, WAN, and/or a cloud, input/output devices such as flash drives, remote devices such as a smart phone or tablet, and displays. The secondary device 16 includes a display 18 having visual, audio, and/or other output. In one example embodiment, the genome system 100 is a web-based tool (e.g., no download or installation is needed to utilize the genome system 100). In another example embodiment, the genome system 100 is partially and/or completely downloadable. The genome system 100 is interactive, meaning a user may change and alter their search preferences and view results in real time.
Illustrated in
In the illustrated example embodiment of
The genome dashboard 200 provides input options to add priority measures (e.g., 1-10) to increase or reduce the contributions of different phenotype terms that an Artificial Intelligence engine 602 will process (see
The genome dashboard 200 illustrates variants and/or genes 205 matching the search 211, wherein each variant or gene is illustrated within a row. In the illustrated example embodiment of the second view 200b, one of the plurality of columns 205 includes one or more links to external public databases. The external public databases are identified and presented to a user, wherein the links are matched based upon the phenotype or disease description 112. In one example embodiment, the links 213 include information about other individuals with similar disease and/or gene variations.
In the illustrated example embodiment of the second view 200b, one of the plurality of columns 206 includes identified variants (e.g., in a protein-coding sequence region of a gene). In the illustrated example embodiment of the second view 200b, additional columns of the plurality of columns 206 include chromosome numbers, start, type of variation, zygosity, gene, Loc in gene, global frequency 210, and/or database matches 213. In one example embodiment, the additional columns of the plurality of columns 206 are filterable.
In one example embodiment, in the second view 200b, responsive to the user selecting a confirmation mode to confirm or reject a clinical diagnosis, the genome dashboard 200 will output a Yes/No/Maybe/Partial confirmation and/or a confidence score. In another example embodiment, in the second view 200b, responsive to the user selecting a primary diagnosis mode, the genome dashboard 200 will output the top clinical recommendations that support the phenotype and genomic data, as identified and ranked by the knowledge base 108.
In yet another example embodiment, in the second view 200b the genome dashboard 200 will, responsive to the user selecting a secondary analysis mode, output additional variants/diagnosis recommendations and hide the top clinical recommendations that support the phenotype and genomic data, as identified and ranked by the knowledge base 108. In yet another example embodiment, in the second view 200b the genome dashboard 200 will, responsive to the user selecting a genomic reinterpretation or phenotypic updates mode, identify recent changes in the reference databases (e.g., new knowledge) and output the recent changes as patient conditions to highlight any changes in interpretation based upon the recent changes.
Additionally, in an example embodiment, the genome system 100 will integrate additional clinical information (e.g., lab test results, blood work, physical presentations of illness, etc.) and additional genomic data (e.g., proteomics, epigenomics, histology, etc.) in order to better filter the data for the second view 200b (e.g., a diagnosis confirm/reject or diagnosis recommendation). The genome system 100 will generate pop-ups and reports that illustrate how selected gene variants connect to the phenotype or disease description 112 and/or to a proposed diagnosis. The genome system 100 will illustrate high-priority mismatches between genomic interpretations and the phenotype or disease description 112. Another report will show, in a filterable list, all gene variants connected by the genome system 100 with a particular phenotype or disease description 112. If there are differences from the canonical disease/genomics, the genome system 100 will highlight the differences with visual indicators.
Illustrated in
At 510, in an optional step, the genome system 100 assesses one or more model organisms for impacts or potential impacts of the identified gene variants present in the sequencing data. In one example embodiment, the one or more model organisms are identified human orthologs that are maintained within the knowledgebase 108. In another example embodiment, the model organisms are identified using an external knowledgebase.
At 512, the genome system 100 filters for clinical priority, incidental findings, pharmacogenomic variants, mode of inheritance, and/or population frequency. For example, illustrated in the example embodiment of
At 518, the genome system 100 filters the additional phenotype terms based upon additional filters, including received coding sequence variants and/or frequency of variant. In one example embodiment, the additional filters, such as population frequency, clinically relevant variants, etc. are available from the knowledgebase 108 and may be applied into scoring for matching variants to phenotypes. In one example embodiment, the genome system 100 will rank a variant identified as clinically relevant (e.g., having a higher association with a disease phenotype) higher than a variant that is not associated with clinical outcomes (e.g., the variant has a low, or no association with a disease phenotype), where higher ranking indicates greater likelihood of the phenotype being associated with the variant. In one example embodiment, the variant is identified as relevant if it has an association with a phenotype or disease description 112 received from the clinical notes and/or the user over a variant association threshold. At 520, the genome system 100 presents additional ranked findings 500b to the user based upon the additional phenotype or disease descriptions 112 and/or the additional filters (see, for example,
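As a hedged illustration of the threshold-based relevance ranking described at 518, the following Python sketch scores each variant by a single association value. The 0-1 scale, the default threshold of 0.5, and the names `Variant` and `rank_variants` are assumptions for illustration only, not the disclosed scoring.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    variant_id: str
    association: float  # hypothetical 0-1 strength of association with the input phenotypes


def rank_variants(variants: List[Variant], threshold: float = 0.5) -> List[Tuple[str, bool]]:
    """Rank variants by phenotype association (highest first) and flag each one
    as clinically relevant when its association exceeds the threshold."""
    ranked = sorted(variants, key=lambda v: v.association, reverse=True)
    return [(v.variant_id, v.association > threshold) for v in ranked]
```

Because relevance is defined here by the same score used for sorting, every variant flagged as relevant is guaranteed to rank ahead of every variant that is not.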
Illustrated in
At 610, the artificial intelligence engine 602 filters the ranked matches based upon a strength of mutation/phenotype correlation. The artificial intelligence engine 602 performs the mutation/phenotype correlation, and subsequently performs additional mutation/phenotype correlations based upon one or more filters a user may emphasize or deemphasize. In one example, the user emphasizes a filter by providing a weighting/priority score that will be utilized to rank matches. In this example embodiment, the artificial intelligence engine 602 calculates a composite score based upon the strength of the mutation/phenotype correlation, additional mutation/phenotype correlations, and/or user provided weighting/priority scores.
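One way the composite score at 610 could combine correlation strengths with user priority scores is a weighted mean. The sketch below assumes that formula, a 0-1 correlation scale, and a default weight of 3; the function name `composite_score` is hypothetical and the disclosure does not specify the actual combination.

```python
from typing import Dict, Optional


def composite_score(
    correlations: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    default_weight: float = 3.0,
) -> float:
    """Weighted mean of per-phenotype mutation/phenotype correlation strengths.

    correlations maps a phenotype term to a correlation strength (assumed 0-1);
    weights holds optional user priority scores, and unweighted terms fall back
    to default_weight.
    """
    weights = weights or {}
    weighted_sum = 0.0
    total_weight = 0.0
    for phenotype, strength in correlations.items():
        w = weights.get(phenotype, default_weight)
        weighted_sum += w * strength
        total_weight += w
    return weighted_sum / total_weight if total_weight else 0.0
```

With correlations of 0.8 and 0.2 and equal default weights the score is 0.5; emphasizing the first phenotype (weight 5) and de-emphasizing the second (weight 1) moves it to 0.7.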
In one example embodiment, the user-provided weighting/priority scores are generated where a user determines that some of the phenotypes are more or less important than others for the patient or a specific disease. The artificial intelligence engine 602 provides the user with an option to weight the contributions of identified phenotypes along a value scale (e.g., 1-5). Further, where the user is not confident that the patient was diagnosed correctly among similar phenotypic elements, the user is provided with the option to alter the weighting to reduce the contribution of those phenotypes in which the user has less confidence. Likewise, the artificial intelligence engine 602 presents the user with the option to change the weighting of certain genome variants, either because the user believes the variant is very important to the diagnosis (e.g., raise the value from the default of 3 to 5), or because there is a lack of scientific evidence that the variant is important to a disease (e.g., lower the value from 3 to 1) to reduce the impact of a mismatch on the diagnosis. In this example embodiment, the artificial intelligence engine 602 assigns a default weight (e.g., 3) to all variants, and the user has the option of altering such default weights.
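The default-then-override weighting described above can be sketched as follows. The 1-5 scale and default of 3 come from the example values in the text; rejecting out-of-range or unknown entries is an assumption, and `apply_user_weights` is a hypothetical name.

```python
from typing import Dict, Iterable

DEFAULT_WEIGHT = 3             # default weight given to every phenotype or variant
MIN_WEIGHT, MAX_WEIGHT = 1, 5  # example value scale


def apply_user_weights(items: Iterable[str], overrides: Dict[str, int]) -> Dict[str, int]:
    """Give every item the default weight, then apply user overrides on the 1-5 scale."""
    weights = {item: DEFAULT_WEIGHT for item in items}
    for item, value in overrides.items():
        if item not in weights:
            raise KeyError(f"unknown item: {item}")
        if not MIN_WEIGHT <= value <= MAX_WEIGHT:
            raise ValueError(f"weight {value} is outside {MIN_WEIGHT}-{MAX_WEIGHT}")
        weights[item] = value
    return weights
```

For instance, raising one variant to 5 and lowering another to 1 leaves every untouched variant at the default of 3.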
At 612, the artificial intelligence engine 602 utilizes named entity recognition (NER) to extract additional phenotype or disease descriptions 112 from the clinical notes 208. In one example embodiment, the extracted additional phenotype or disease descriptions are translated from natural language and/or colloquial terms to phenotype or disease descriptions 112, as described at step 606. The extracted additional phenotype or disease descriptions 112 then undergo steps 608-610. At 614, the artificial intelligence engine 602 generates highlighted (e.g., visually differentiated) phenotype or disease descriptions 112 extracted by NER on the generated ranked and filtered findings. At 616, the artificial intelligence engine 602 presents the ranked and highlighted findings to the user on the user interface 206 of the genome dashboard 200.
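To show the translation from colloquial wording to standardized phenotype terms, here is a deliberately simple Python sketch. It swaps in a dictionary (gazetteer) lookup in place of a trained NER model, and the synonym table and the name `extract_phenotypes` are hypothetical.

```python
import re
from typing import List

# Hypothetical synonym table mapping colloquial wording to standardized phenotype terms.
SYNONYMS = {
    "fits": "Seizure",
    "seizures": "Seizure",
    "floppy": "Hypotonia",
    "low muscle tone": "Hypotonia",
    "small head": "Microcephaly",
}


def extract_phenotypes(note: str) -> List[str]:
    """Return standardized phenotype terms whose colloquial phrases appear in a note.

    A dictionary lookup like this is only a stand-in for a trained NER model.
    """
    text = note.lower()
    found: List[str] = []
    for phrase, term in SYNONYMS.items():
        # \b prevents partial-word hits, e.g. "fits" inside "benefits".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text) and term not in found:
            found.append(term)
    return found
```

A note reading "Child has fits and low muscle tone." yields `["Seizure", "Hypotonia"]`; a real deployment would replace the table with an NER model and an ontology such as the one the knowledgebase maintains.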
Illustrated in
At 708, the artificial intelligence engine 602 extracts entities, including disease names, symptoms, and/or diagnoses, using NER. At 710, the artificial intelligence engine 602 generates highlighted (e.g., visually differentiated) phenotype terms extracted by NER on a generated ranked match (e.g., such as the ranked match generated at 614 of method 600). At 712, in an optional step, the user applies additional filters including frequency <1%, protein-coding regions, molecular consequence, and/or damaging score >1. At 716, the artificial intelligence engine 602 presents the ranked and highlighted findings to the user on the user interface 206 of the genome dashboard 200. At 718, the artificial intelligence engine 602 records the user interaction with the ranked findings based upon the entities and the user input responsive to results generated using the currently assigned values. In one example embodiment, the user interaction is utilized to alter the assigned value in step 706.
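The optional filters at 712 amount to a sequence of pass/fail checks on each variant call. The sketch below uses the thresholds named in the text (frequency <1%, protein-coding, damaging score >1); the record layout and the names `VariantCall` and `apply_filters` are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class VariantCall:
    gene: str
    population_freq: float  # allele frequency as a fraction (0.01 == 1%)
    protein_coding: bool    # True if the variant falls in a protein-coding region
    consequence: str        # molecular consequence label, e.g. "missense"
    damaging_score: float   # hypothetical deleteriousness score


def apply_filters(
    calls: Iterable[VariantCall],
    max_freq: float = 0.01,
    min_damaging: float = 1.0,
    consequences: Optional[Set[str]] = None,
) -> List[VariantCall]:
    """Keep rare (<1%), protein-coding variants with damaging score > 1,
    optionally restricted to selected molecular consequences."""
    kept = []
    for call in calls:
        if call.population_freq >= max_freq:
            continue
        if not call.protein_coding:
            continue
        if consequences is not None and call.consequence not in consequences:
            continue
        if call.damaging_score <= min_damaging:
            continue
        kept.append(call)
    return kept
```

Keeping each criterion as a separate keyword argument mirrors the dashboard, where the user can toggle filters independently.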
Illustrated in
Illustrated in
The genome system 100 provides notice to the user that a new analysis and new interpretation/diagnosis is available. At 908, the genome system 100 identifies which features altered the diagnosis (e.g., phenotype gene matching, gene association with diagnosis, etc.), for example based upon improvements to current diagnosis compared to closed case diagnosis. At 910 the genome system 100 provides the user with an updated diagnosis, including identifying the features that altered the updated diagnosis.
Illustrated in
As continued in example method 1000b in
Alternately, as continued from section line C-C extending from 1016 in
At 1050, responsive to the user selecting an evidence list view, the genome system 100 generates a clinical evidence list including phenotypes 204, clinical notes 208, and/or phenotype or disease description 112. At 1052, the genome system 100 integrates filters 207 selected by the user. In one example,
At 1062, responsive to the user selecting column view, the genome system 100 generates a column view including a first column having typical genes and variants associated with the diagnosis and a second column having the actual occurrence of the typical genes in the gene sequence 202 of the patient. At 1064, the genome system 100 visually identifies matches between gene variants and observed phenotypes with a first visual marker, mismatches between gene variants and observed phenotypes with a second visual marker, and gene variant and observed phenotype pairs that neither confirm nor reject the diagnosis with a third visual marker. At 1066, the genome system 100 presents the visually marked first and second columns to the user. At 1068, the genome system 100 presents a genetic variants filter to the user. At 1070, the genome system 100 receives a user selection for sorting variants. At 1072, the genome system 100, responsive to receiving a selection for sorting variants, adds or removes variants based upon the user selection. At 1074, the genome system 100 presents the filtered first and second columns to the user.
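The three-marker scheme at 1064 can be approximated as below. The rule used to choose a marker (all expected phenotypes observed means a match, none observed means a mismatch, anything else is neutral) is an assumption made for illustration, as is the name `column_view`.

```python
from typing import Dict, List, Set, Tuple

MATCH, MISMATCH, NEUTRAL = "match", "mismatch", "neutral"


def column_view(
    typical: Dict[str, Set[str]],
    patient_genes: Set[str],
    observed: Set[str],
) -> List[Tuple[str, bool, str]]:
    """Pair each gene typical of a diagnosis with its occurrence in the patient
    and a marker: MATCH when all expected phenotypes are observed, MISMATCH when
    none are, NEUTRAL otherwise (partial overlap or gene not altered)."""
    rows = []
    for gene, expected in typical.items():
        present = gene in patient_genes
        overlap = expected & observed
        if present and overlap == expected:
            marker = MATCH
        elif present and not overlap:
            marker = MISMATCH
        else:
            marker = NEUTRAL
        rows.append((gene, present, marker))
    return rows
```

Each returned tuple corresponds to one row of the two-column view: the typical gene, whether it occurs in the patient's sequence, and which visual marker to draw.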
At 1076, responsive to the user selecting discovery view, the genome system 100 generates a discovery view that predicts functional changes/consequences of the genetic variants of a patient. The functional consequences of the genetic variants are predicted by annotating the patient's genome sequence with functional annotators that generate annotations associated with various genetic variants. In an example embodiment, the functional annotators include JANNOVAR and Exomiser. The annotations are extracted from the patient's genome sequence and assigned values for use in the ranking of genetic variants.
Optionally, the genome system 100 adds visual indicators as in step 1064 to the discovery view. At 1078, the genome system 100 identifies if a gene has a known or unknown significance. At 1080, the genome system 100 assigns a significance to a gene if known. At 1082, the genome system 100 generates a list of genes having an assigned significance over a significance threshold. Genes having an assigned significance under the significance threshold are not presented to the user. At 1083, the genome system 100 presents the list of significant genes to the user.
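Steps 1078-1082 can be sketched as a lookup followed by a threshold filter. Treating genes of unknown significance as omitted, and sorting the survivors by score, are assumptions; `significant_genes` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def significant_genes(
    genes: Iterable[str],
    known_significance: Dict[str, float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return genes whose known significance exceeds the threshold, highest first.

    Genes absent from known_significance are treated as unknown and omitted.
    """
    hits = [
        (gene, known_significance[gene])
        for gene in genes
        if gene in known_significance and known_significance[gene] > threshold
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

A gene with a known score under the threshold is dropped exactly as the text describes; a gene with no known score is also dropped in this sketch, although an implementation could instead route it to a separate "unknown significance" list.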
At 1084, responsive to the user selecting evidence gap view, the genome system 100 generates an evidence gap view including copy number variation, gaps in hard-to-sequence regions, and/or clinical pathology. At 1086, the genome system 100 presents the evidence gap view to the user. The evidence gap view will also inform the user when there is additional genomic data not present that could help confirm or reject a specific diagnosis. The additional genomic data comprises copy number variation (CNV), genotyping, sequencing a genomic region beyond Whole Exome Sequencing (e.g., if that was a filter), and/or other chromosomal aberrations (larger insertions, deletions, recombinations). The user may select the evidence gap view at any point of interaction with a case in the genome dashboard 200, so that the user can add additional genomic data for the patient at any time during the diagnosis.
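Informing the user about missing genomic data reduces, in its simplest form, to a set difference between the data types useful for a diagnosis and those already on file. The mapping below, the diagnosis labels, and the name `evidence_gaps` are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from a candidate diagnosis to the data types that
# could help confirm or reject it.
EVIDENCE_NEEDS: Dict[str, Set[str]] = {
    "DX-A": {"WES", "CNV"},
    "DX-B": {"WES", "WGS_region", "genotyping"},
}


def evidence_gaps(diagnosis: str, available: Iterable[str]) -> List[str]:
    """List the data types useful for the diagnosis that the case does not yet have."""
    return sorted(EVIDENCE_NEEDS.get(diagnosis, set()) - set(available))
```

For a case that so far has only WES data, `evidence_gaps("DX-A", {"WES"})` points the user to CNV data as the remaining gap.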
As continued from 1060, 1074, 1083, and/or 1086 and section line D-D of example method 1000c and continued in example method 1000b in
Illustrated in
The version two case (e.g., the current diagnosis) is compared as a reinterpretation of the original sequence data 202, as a full new diagnosis of the original sequence data 202, or as a full new diagnosis using new clinical and new genomic data. In one example embodiment, visual indicators (e.g., highlights) are applied to what has changed in the patient phenotype input (if any), what has changed in the genomic data input (if any), what has changed in the key variant analysis, and/or what has changed in expert-guided diagnosis recommendations. At 1116, the genome system 100 presents the compare view to the user on the genome dashboard 200. The compare view, the version one case, and/or the version two case are archived as soon as a case is closed. The genome system 100 enables the user to time/date stamp specific versions of a case, as well as to add and save specific annotations and bookmarks for genes, variants, phenotypes, included diagnoses, and/or excluded diagnoses. In one example embodiment, the version two case is created by selecting a new view option, which saves all the work done on the version one case; the user then either begins again with a reset case, or retains the version one case work and makes changes, such as selecting or deselecting genes/variants or adding stars or ‘X’s to create annotations showing interest or disinterest (e.g., a star indicates interest and an ‘X’ indicates disinterest). The genome system 100 allows the user to compare one or more versions of a case for a particular patient side-by-side.
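The change highlighting in the compare view can be modeled as a per-category set difference between two case versions. This is a minimal sketch assuming each version is stored as a mapping from category (phenotypes, variants, diagnoses, and so on) to a set of items; `compare_versions` is a hypothetical name.

```python
from typing import Dict, List, Set


def compare_versions(
    v1: Dict[str, Set[str]], v2: Dict[str, Set[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Report items added or removed per category (e.g. phenotypes, variants,
    diagnoses) between two versions of a case; unchanged categories are omitted."""
    changes = {}
    for category in sorted(set(v1) | set(v2)):
        before = v1.get(category, set())
        after = v2.get(category, set())
        added, removed = after - before, before - after
        if added or removed:
            changes[category] = {"added": sorted(added), "removed": sorted(removed)}
    return changes
```

Omitting unchanged categories keeps the output aligned with the "what has changed (if any)" highlighting: only categories that actually differ receive a visual indicator.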
Illustrated in
Illustrated in
At 1308, the genome system 100 presents the partitioned diagnosis, including one or more sub-cases, to the user on the genome dashboard 200. At 1310, the genome system 100 presents the option on the genome dashboard 200 to the user to move variant and/or phenotype data between a main case and one or more sub-cases. At 1312, the genome system 100 receives a request to move variant and/or phenotype data between a main case and one or more sub-cases. At 1314, responsive to receiving the request to move variant and/or phenotype data between a main case and one or more sub-cases, the genome system 100 presents an option to move the variant and/or phenotype data to the sub-case while maintaining the data in the main case, or to move the variant and/or phenotype data into the sub-case and out of the main case. At 1316, the genome system 100 receives a request to move variant and/or phenotype data from a main case to one or more sub-cases and maintain the variant and/or phenotype data in the main case. At 1318, responsive to receiving the request to move variant and/or phenotype data from the main case to one or more sub-cases and maintain the variant and/or phenotype data in the main case, the genome system 100 presents the sub-case to the user including the selected variant and/or phenotype data, while maintaining the selected variant and/or phenotype data in the main case. At 1320, the genome system 100 receives a request to move variant and/or phenotype data from a main case to one or more sub-cases and remove the variant and/or phenotype data from the main case. At 1318, responsive to receiving the request to move variant and/or phenotype data from the main case to one or more sub-cases and remove the variant and/or phenotype data from the main case, the genome system 100 presents the sub-case to the user including the selected variant and/or phenotype data, while removing the selected variant and/or phenotype data from the main case.
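The two move options (copy into the sub-case while keeping the data in the main case, or move it out of the main case) can be expressed with a single flag. The sketch assumes cases hold plain sets of variant and phenotype identifiers; `Case` and `move_to_subcase` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Case:
    """Hypothetical container for a main case or a sub-case."""
    variants: Set[str] = field(default_factory=set)
    phenotypes: Set[str] = field(default_factory=set)


def move_to_subcase(
    main: Case,
    sub: Case,
    variants: Iterable[str] = (),
    phenotypes: Iterable[str] = (),
    keep_in_main: bool = True,
) -> None:
    """Copy selected data into a sub-case; remove it from the main case unless keep_in_main."""
    variants, phenotypes = set(variants), set(phenotypes)
    missing = (variants - main.variants) | (phenotypes - main.phenotypes)
    if missing:
        raise ValueError(f"not in main case: {sorted(missing)}")
    sub.variants |= variants
    sub.phenotypes |= phenotypes
    if not keep_in_main:
        main.variants -= variants
        main.phenotypes -= phenotypes
```

Validating the selection against the main case first keeps a failed request from leaving the two cases partially updated.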
The genome dashboard 200 and genome system 100 offer an effective solution by allowing users to upload sequencing data and explore and compare against known gene-disease associations in other humans and closely related animal models. Comparing sequencing data to that of other humans is one of the best and most efficient methods to help identify gene variants responsible for human disease.
The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all the claims. The disclosure is defined solely by the appended claims including any amendments made during the pendency of this application and all equivalents of those claims as issued.
Moreover, in this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “has”, “having,” “includes”, “including,” “contains”, “containing” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises, has, includes, contains a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a”, “has . . . a”, “includes . . . a”, “contains . . . a” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises, has, includes, contains the element. The terms “a” and “an” are defined as one or more unless explicitly stated otherwise herein. The terms “substantially”, “essentially”, “approximately”, “about” or any other version thereof, are defined as being close to as understood by one of ordinary skill in the art. In one non-limiting embodiment the terms are defined to be within for example 10%, in another possible embodiment within 5%, in another possible embodiment within 1%, and in another possible embodiment within 0.5%. The term “coupled” as used herein is defined as connected or in contact either temporarily or permanently, although not necessarily directly and not necessarily mechanically. A device or structure that is “configured” in a certain way is configured in at least that way, but may also be configured in ways that are not listed.
To the extent that the materials for any of the foregoing embodiments or components thereof are not specified, it is to be appreciated that suitable materials would be known by one of ordinary skill in the art for the intended purposes.
The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.
The following application claims priority under 35 U.S.C. § 119 (e) to U.S. Provisional Patent Application Ser. No. 62/986,164 filed Mar. 6, 2020 entitled GENOME DASHBOARD. The above-identified application is incorporated herein by reference in its entirety for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US21/21341 | 3/8/2021 | WO |
Number | Date | Country | |
---|---|---|---|
62986164 | Mar 2020 | US |