The current medical care model is largely focused on “sick care”: waiting until symptoms of disease develop, followed by diagnosis and treatment. This approach was developed over the last century, when infectious diseases and other acute conditions were more common than they are today. The “sick care” model is badly out of step with the early detection and prevention needed to address today's epidemic of age-related chronic diseases, including cardiovascular disease, cancer, diabetes, and dementia, in the U.S. and other wealthy countries. As a result, many people suffer needlessly and die prematurely. The reasons for this mismatch are numerous but largely revolve around the size and inertia of, and the misaligned payment incentives within, the medical industry, including health care providers, health insurers, and pharmaceutical companies.
Recent progress in genomics and other technologies, along with the rising importance of age-related diseases, has opened an opportunity to revolutionize health and the practice of medicine. Most dramatically, the cost of genomic sequencing has decreased by more than four orders of magnitude over the last fifteen years, from $100,000,000 for the first human whole-genome sequence to less than $10,000. The same shotgun sequencing techniques that Venter et al. developed to revolutionize human whole-genome sequencing are now also being used to define and explore the microbiome. Sometimes called our “second genome,” the microbiome comprises the trillions of bacteria and other microorganisms that live in and on our bodies, each with its own genetic material, interacting with our human cells to support health and to cause, or be associated with, disease. Combining human whole-genome sequencing and microbiome characterization with recent progress in metabolomics, the measurement of the small molecules and chemicals that result from protein synthesis and other basic physiologic functions, will provide new opportunities in medical diagnosis, early detection, and prevention.
Making use of all these data requires an affordable place to securely store, access, and analyze them. Fortunately, the availability and decreasing cost of cloud computing have now made it possible to securely store and analyze genomic data and phenotype metadata as integrated health records at a previously unattainable scale.
As a result of these new capabilities in data generation and storage, medical science is poised for a potentially disruptive transition in discovery. Machine learning is a field of computer science focused on “extracting rules and patterns from sets of data,” “without having to be explicitly instructed every step of the way by human programmers” (Economist. How Machine Learning Works. May 13, 2015). Machine learning has been particularly impactful when applied to very large data sets, yet most large-scale applications of machine learning have occurred outside of medical science and health care. As machine learning is applied to medical science, it will likely challenge traditional, more linear, hypothesis-driven biomedical research as the gold standard for new discoveries. Described herein is the use of machine learning with a database of integrated health records to translate the “language” of biology, in the form of DNA sequence data (the “software of life”), into the language of health and disease as phenotypes. The expectation is that this will dramatically accelerate the development of novel therapeutics and diagnostics and of new models for medical care.
Described herein is a fundamentally new information environment, a knowledgebase, based on our work in genomics, microbiomics, and metabolomics as they relate to information technologies. The medical model focuses on integrating genomic and phenotype data to identify actionable individual health risks as a basis for early detection and prevention of age-related disease in adults. Based on these efforts, we will design individualized care plans and evaluate their feasibility using study designs that focus on single individuals, known as N-of-1 trials. Health outcomes will be evaluated to determine how effective the identified individual health risks and care plans are in preventing, detecting early, and responding to age-related disease.
In one aspect, disclosed herein are computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to perform: receiving a query, wherein the query defines a cohort of one or more individuals; querying a database, wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, a microbiome summary, and a metabolome summary for the cohort, the genome summary comprising genes and gene variants of the cohort, the microbiome summary comprising one or more of: a relative abundance of types of microflora found in the cohort and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites found in the cohort; determining a graphical representation of the genome summary, the microbiome summary, and the metabolome summary for the cohort; and sending the graphical representation to a display device. In some cases, the database further comprises a plurality of microbiomic data and a plurality of metabolomic data. The plurality of genomic data is optionally obtained by analysis of one or more biologic samples from one or more individuals. The query may further comprise one or more of: demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, and cognitive assessments. In various cases, the query optionally includes a phenotype, a sample ID, an individual ID, and/or a gene name or gene variant name. In some cases, the genome summary further comprises functional effect, clinical significance, variant type, and allele frequency for the gene variants.
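The query-to-summary flow described above can be sketched in Python as follows. This is a minimal illustration only, not the disclosed implementation: the `Individual` record, the `query_cohort` and `summarize` functions, the use of a Python predicate as the query, and the choice of carrier counts and arithmetic means as summary statistics are all assumptions made for the example. The graphical-representation and display steps are omitted.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Individual:
    """Hypothetical integrated health record for one person."""
    individual_id: str
    phenotypes: dict = field(default_factory=dict)   # e.g. {"type 2 diabetes": True}
    variants: set = field(default_factory=set)       # {(gene, variant), ...}
    microbiome: dict = field(default_factory=dict)   # taxon -> relative abundance
    metabolome: dict = field(default_factory=dict)   # metabolite -> measurement


def query_cohort(database, predicate):
    """Return the individuals that satisfy the query predicate (the cohort)."""
    return [ind for ind in database if predicate(ind)]


def summarize(cohort):
    """Build genome, microbiome, and metabolome summaries for a cohort."""
    # Genome summary: number of cohort members carrying each (gene, variant) pair.
    genome = Counter(v for ind in cohort for v in ind.variants)

    # Microbiome summary: mean relative abundance of each taxon across the cohort;
    # an individual in whom a taxon was not detected contributes 0.
    taxa = {t for ind in cohort for t in ind.microbiome}
    microbiome = {t: mean(ind.microbiome.get(t, 0.0) for ind in cohort) for t in taxa}

    # Metabolome summary: mean measurement over the individuals in whom it was measured.
    names = {m for ind in cohort for m in ind.metabolome}
    metabolome = {
        m: mean(ind.metabolome[m] for ind in cohort if m in ind.metabolome)
        for m in names
    }
    return {"genome": genome, "microbiome": microbiome, "metabolome": metabolome}
```

For instance, a predicate selecting individuals with a recorded type 2 diabetes phenotype yields a cohort whose three summaries could then be handed to a plotting layer.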
The at least one processor may be further configured to access the at least one memory and execute the computer-executable instructions to perform presenting a graphical user interface (GUI) for receiving the query. The GUI may allow construction of the query by adding or removing one or more of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. The GUI optionally shows the number of individuals remaining in the cohort after each addition or removal of a filter. The GUI may allow the user to configure display of the microbiome summary to show abundance by species or genus of microflora found in the individuals. The GUI may allow the user to configure display of the metabolome summary to show measurements of metabolites found in the individuals by metabolic superpathway or sub-pathway. In some cases, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. The GUI may display the genome information, the microbiome information, or the metabolome information one at a time and allow the user to switch among these displays. In some cases, the genomic data comprises data in variant call format (VCF). In some cases, the genomic data is annotated with one or more types of non-genomic data upon or before import into the database. In further cases, the one or more types of non-genomic data comprise one or more of: demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, microbiome information, metabolome information, dietary information, lifestyle information, cognitive assessments, sample ID, and patient ID. In some cases, at least one processor is allocated to the query independently of other queries.
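One way the add/remove-filter behavior and the running cohort count might work is sketched below. The `CohortBuilder` class, its method names, and the representation of each filter as a named predicate over a dictionary record are hypothetical; only the filter categories and the count shown after each change come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CohortBuilder:
    """Hypothetical cohort builder holding named filters that can be added or removed."""
    population: list
    filters: dict = field(default_factory=dict)  # filter name -> predicate

    def cohort(self):
        # An individual stays in the cohort only if every active filter accepts it.
        return [p for p in self.population if all(f(p) for f in self.filters.values())]

    def add_filter(self, name: str, predicate: Callable[[dict], bool]) -> int:
        self.filters[name] = predicate
        return len(self.cohort())  # count the GUI would display after the change

    def remove_filter(self, name: str) -> int:
        self.filters.pop(name, None)  # removing an unknown filter is a no-op
        return len(self.cohort())
```

Keying filters by name makes removal straightforward and lets the GUI list the active filters; recomputing the cohort from the full population on every change is simple but would likely need indexing at production scale.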
In some cases, at least one dedicated processor is allocated to the query and the database is shared for all queries.
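The arrangement of a dedicated compute resource per query over a single shared database might be sketched as follows. This sketch makes several assumptions: each query receives its own single-worker `ThreadPoolExecutor` from Python's standard `concurrent.futures` module, the shared database is an in-memory read-only tuple, and the names `DATABASE`, `run_query`, and `run_queries_with_dedicated_workers` are hypothetical. In CPython, threads share one interpreter lock, so this illustrates the allocation pattern rather than true CPU parallelism; a deployment would more plausibly allocate separate processes or cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared, read-only database visible to every query worker (illustrative rows).
DATABASE = (
    {"id": "P1", "gene": "APOE", "age": 67},
    {"id": "P2", "gene": "TCF7L2", "age": 72},
    {"id": "P3", "gene": "APOE", "age": 75},
)


def run_query(predicate):
    # Every worker reads the same shared DATABASE; nothing is copied per query.
    return [row["id"] for row in DATABASE if predicate(row)]


def run_queries_with_dedicated_workers(queries):
    """Give each query its own single-worker executor so queries do not queue behind one another."""
    executors = [ThreadPoolExecutor(max_workers=1) for _ in queries]
    try:
        futures = [ex.submit(run_query, q) for ex, q in zip(executors, queries)]
        return [f.result() for f in futures]
    finally:
        for ex in executors:
            ex.shutdown()
```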
In another aspect, disclosed herein are computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to generate a graphical user interface (GUI) that accepts a query from a user defining a cohort of one or more individuals and presents to the user genome information, microbiome information, and metabolome information related to the cohort.
In another aspect, disclosed herein are computer-implemented methods comprising: receiving a query, wherein the query defines a cohort of one or more individuals; querying a database, wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, a microbiome summary, and a metabolome summary for the cohort, the genome summary comprising genes and gene variants of the cohort, the microbiome summary comprising one or more of: a relative abundance of types of microflora found in the cohort and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites found in the cohort; determining a graphical representation of the genome summary, the microbiome summary, and the metabolome summary for the cohort; and sending the graphical representation to a display device. In some cases, the database further comprises a plurality of microbiomic data and a plurality of metabolomic data. The plurality of genomic data is optionally obtained by analysis of one or more biologic samples from one or more individuals. The query may further comprise one or more of: demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, and cognitive assessments. In various cases, the query optionally includes a phenotype, a sample ID, an individual ID, and/or a gene name or gene variant name. In some cases, the genome summary further comprises functional effect, clinical significance, variant type, and allele frequency for the gene variants. The method may further comprise presenting a graphical user interface (GUI) for receiving the query.
The GUI may allow construction of the query by adding or removing one or more of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. The GUI optionally shows the number of individuals remaining in the cohort after each addition or removal of a filter. The GUI may allow the user to configure display of the microbiome summary to show abundance by species or genus of microflora found in the individuals. The GUI may allow the user to configure display of the metabolome summary to show measurements of metabolites found in the individuals by metabolic superpathway or sub-pathway. In some cases, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. The GUI may display the genome information, the microbiome information, or the metabolome information one at a time and allow the user to switch among these displays. In some cases, the genomic data comprises data in variant call format (VCF). In some cases, the genomic data is annotated with one or more types of non-genomic data upon or before import into the database. In further cases, the one or more types of non-genomic data comprise one or more of: demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, microbiome information, metabolome information, dietary information, lifestyle information, cognitive assessments, sample ID, and patient ID. The method may further comprise allocating at least one computing resource to the query independently of other queries. The method may further comprise allocating at least one dedicated computing resource to the query, wherein the database is shared for all queries.
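Annotating variant records with non-genomic data at import time might look like the following sketch. It handles only a minimal, single-sample VCF text (meta-information lines, a `#CHROM` header, and tab-separated data lines with an INFO column); the `parse_vcf` and `import_with_annotations` names and the layout of the annotation dictionary keyed by sample ID are assumptions. A real importer would more likely use an established VCF library and handle multi-sample files, genotypes, and validation.

```python
def parse_vcf(text):
    """Yield (sample_id, record) pairs from a minimal single-sample VCF string."""
    sample_id = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_id = fields[9]  # first sample column follows FORMAT
            continue
        chrom, pos, var_id, ref, alt = fields[:5]
        # INFO is a semicolon-separated list of KEY=VALUE entries or bare flags.
        info = {}
        for entry in fields[7].split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True
        yield sample_id, {"chrom": chrom, "pos": int(pos), "id": var_id,
                          "ref": ref, "alt": alt.split(","), "info": info}


def import_with_annotations(vcf_text, non_genomic):
    """Attach each sample's non-genomic data to its variant records before loading."""
    rows = []
    for sample_id, record in parse_vcf(vcf_text):
        record["sample_id"] = sample_id
        record["annotations"] = non_genomic.get(sample_id, {})
        rows.append(record)
    return rows
```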
In another aspect, disclosed herein are computer-implemented methods comprising generating a graphical user interface (GUI) that accepts a query from a user defining a cohort of one or more individuals and presents to the user genome information, microbiome information, and metabolome information related to the cohort.
In another aspect, disclosed herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by at least one processor to perform: receiving a query, wherein the query defines a cohort of one or more individuals; querying a database, wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, a microbiome summary, and a metabolome summary for the cohort, the genome summary comprising genes and gene variants of the cohort, the microbiome summary comprising one or more of: a relative abundance of types of microflora found in the cohort and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites found in the cohort; determining a graphical representation of the genome summary, the microbiome summary, and the metabolome summary for the cohort; and sending the graphical representation to a display device. In some cases, the database further comprises a plurality of microbiomic data and a plurality of metabolomic data. The plurality of genomic data is optionally obtained by analysis of one or more biologic samples from one or more individuals. The query may further comprise one or more of: demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, and cognitive assessments. In various cases, the query optionally includes a phenotype, a sample ID, an individual ID, and/or a gene name or gene variant name. In some cases, the genome summary further comprises functional effect, clinical significance, variant type, and allele frequency for the gene variants. The instructions may be executable by the at least one processor to further perform presenting a graphical user interface (GUI) for receiving the query.
The GUI may allow construction of the query by adding or removing one or more of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. The GUI optionally shows the number of individuals remaining in the cohort after each addition or removal of a filter. The GUI may allow the user to configure display of the microbiome summary to show abundance by species or genus of microflora found in the individuals. The GUI may allow the user to configure display of the metabolome summary to show measurements of metabolites found in the individuals by metabolic superpathway or sub-pathway. In some cases, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. The GUI may display the genome information, the microbiome information, or the metabolome information one at a time and allow the user to switch among these displays. In some cases, the genomic data comprises data in variant call format (VCF). In some cases, the genomic data is annotated with one or more types of non-genomic data upon or before import into the database. In further cases, the one or more types of non-genomic data comprise one or more of: demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, microbiome information, metabolome information, dietary information, lifestyle information, cognitive assessments, sample ID, and patient ID. In some cases, at least one processor is allocated to the query independently of other queries. In some cases, at least one dedicated processor is allocated to the query and the database is shared for all queries.
In another aspect, disclosed herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by at least one processor to generate a graphical user interface (GUI) that accepts a query from a user defining a cohort of one or more individuals and presents to the user genome information, microbiome information, and metabolome information related to the cohort.
In another aspect, disclosed herein are platforms comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising one or more of: genome data, microbiome data, and metabolome data, the biologic information obtained by analysis of one or more biologic samples from each individual; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by inputting a search term; and a software module generating a visual synthesis display comprising one or more of: a genome summary, a microbiome summary, and a metabolome summary for a cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising one or more of: relative abundance of types of microflora and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites. In some embodiments, the database further comprises phenotype data for members of the population of individuals. In further embodiments, the phenotype data comprises one or more of demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, cognitive assessments, and the like. In some embodiments, the microbiome data comprises metagenomic sequences of the microbiomes. In some embodiments, the software module presenting an interface allowing a user to query the database allows the user to build a cohort of individuals from the population by applying filters to the population. 
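A comparison of a cohort with a demographically matched subpopulation and with the whole population could, for a single variant, look like this sketch. The matching rule (same sex and same ten-year age band, excluding cohort members), the storage of genotypes as diploid alternate-allele counts, and all function names are assumptions for illustration; the text does not specify how matching is performed.

```python
def age_band(age, width=10):
    # Assumed stratification: ten-year age bands (60-69 -> 60).
    return age // width * width


def matched_subpopulation(population, cohort):
    """Individuals outside the cohort who share a (sex, age band) stratum with a cohort member."""
    strata = {(p["sex"], age_band(p["age"])) for p in cohort}
    cohort_ids = {p["id"] for p in cohort}
    return [
        p for p in population
        if p["id"] not in cohort_ids and (p["sex"], age_band(p["age"])) in strata
    ]


def allele_frequency(group, variant):
    """Alternate-allele frequency, assuming diploid genotypes stored as alt-allele counts 0/1/2."""
    if not group:
        return float("nan")
    return sum(p["genotypes"].get(variant, 0) for p in group) / (2 * len(group))


def compare(population, cohort, variant):
    """Allele frequency of one variant in the cohort, matched subpopulation, and population."""
    return {
        "cohort": allele_frequency(cohort, variant),
        "matched": allele_frequency(matched_subpopulation(population, cohort), variant),
        "population": allele_frequency(population, variant),
    }
```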
In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a sample ID, a phenotypic trait, or a metabolite. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting an individual ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a gene name, a gene variant, or a nucleic acid sequence. In some embodiments, the filters are one or more selected from the group consisting of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. In further embodiments, the software module presenting an interface allowing the user to build a cohort of individuals displays and dynamically updates the number of individuals in the cohort as filters are applied or removed. In some embodiments, the genome summary comprises a visual display of one or more of the functional effect, the clinical significance, the variant type, and the allele frequency for the gene variants. In some embodiments, the microbiome summary comprises an interface element allowing the user to configure the microbiome summary to display abundance by species or genus of microflora. In some embodiments, the microbiome summary comprises an interface element allowing the user to configure the microbiome summary to display information obtained from a metagenomic sequence of the microbiome. In further embodiments, the information obtained from a metagenomic sequence of the microbiome comprises gene names or gene variants.
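Switching the microbiome display between species and genus could be as simple as summing species-level relative abundances by genus. The sketch below assumes species are stored under binomial names (so the genus is the first word), and `genus_of` and `abundance` are hypothetical helpers; a real taxonomic roll-up would usually rely on a reference taxonomy rather than name parsing.

```python
from collections import defaultdict


def genus_of(species):
    # Assumes binomial names such as "Bacteroides fragilis"; the genus is the first word.
    return species.split()[0]


def abundance(profile, level="species"):
    """Return relative abundance at species or genus level from a species-level profile."""
    if level == "species":
        return dict(profile)
    if level != "genus":
        raise ValueError("level must be 'species' or 'genus'")
    totals = defaultdict(float)
    for species, value in profile.items():
        totals[genus_of(species)] += value  # species of the same genus are summed
    return dict(totals)
```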
In some embodiments, the metabolome summary comprises an interface element allowing the user to configure the metabolome summary to display measurements of metabolites by metabolic superpathway or sub-pathway. In some embodiments, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. In some embodiments, the user optionally switches between the genome summary, the microbiome summary, and the metabolome summary to view the microbiome and the metabolome data in the context of the genome data.
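Grouping metabolite measurements by superpathway or sub-pathway might be sketched as below. The `CATALOG` mapping and its pathway labels are illustrative placeholders rather than a specific reference annotation, the `group_metabolites` function is hypothetical, and metabolites missing from the catalog are placed in an assumed "Unannotated" group.

```python
from collections import defaultdict

# Illustrative catalog: metabolite -> (superpathway, sub-pathway).
CATALOG = {
    "glucose": ("Carbohydrate", "Glycolysis, Gluconeogenesis, and Pyruvate Metabolism"),
    "lactate": ("Carbohydrate", "Glycolysis, Gluconeogenesis, and Pyruvate Metabolism"),
    "leucine": ("Amino Acid", "Leucine, Isoleucine and Valine Metabolism"),
    "cholesterol": ("Lipid", "Sterol"),
}

LEVELS = {"superpathway": 0, "sub-pathway": 1}


def group_metabolites(measurements, level="superpathway"):
    """Group {metabolite: value} measurements under their superpathway or sub-pathway."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    idx = LEVELS[level]
    groups = defaultdict(dict)
    for name, value in measurements.items():
        pathway = CATALOG.get(name, ("Unannotated", "Unannotated"))[idx]
        groups[pathway][name] = value
    return dict(groups)
```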
In another aspect, disclosed herein are computer-implemented systems comprising: a digital processing device comprising at least one processor, an operating system configured to perform executable instructions, and a memory; and a computer program including instructions executable by the digital processing device to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising one or more of: genome data, microbiome data, and metabolome data, the biologic information obtained by analysis of one or more biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype or a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising one or more of: a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising one or more of: relative abundance of types of microflora and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites. In some embodiments, the database further comprises phenotype data for members of the population of individuals. In further embodiments, the phenotype data comprises one or more of demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, cognitive assessments, and the like. 
In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a sample ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting an individual ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a gene name. In some embodiments, the filters are one or more selected from the group consisting of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. In further embodiments, the software module presenting an interface allowing the user to build a cohort of individuals displays and dynamically updates the number of individuals in the cohort as filters are applied or removed. In some embodiments, the genome summary comprises a visual display of the functional effect, the clinical significance, the variant type, and the allele frequency for the gene variants. In some embodiments, the microbiome summary comprises an interface element allowing the user to configure the microbiome summary to display abundance by species or genus of microflora. In some embodiments, the metabolome summary comprises an interface element allowing the user to configure the metabolome summary to display measurements of metabolites by metabolic superpathway or sub-pathway. In some embodiments, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. In some embodiments, the user optionally switches between the genome summary, the microbiome summary, and the metabolome summary to view the microbiome and the metabolome data in the context of the genome data.
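The genome-summary display of functional effect, clinical significance, variant type, and allele frequency could be driven by simple tallies such as the following. The record field names, the `genome_summary` and `af_bin` functions, and in particular the allele-frequency bin cut-offs (1% and 5%) are assumptions chosen for illustration and are not taken from the text.

```python
from collections import Counter


def af_bin(af):
    # Illustrative cut-offs only; not specified by the source.
    if af < 0.01:
        return "rare (<1%)"
    if af < 0.05:
        return "low-frequency (1-5%)"
    return "common (>=5%)"


def genome_summary(variants):
    """Tally variants for display by functional effect, clinical significance, type, and AF bin."""
    return {
        "functional_effect": Counter(v["effect"] for v in variants),
        "clinical_significance": Counter(v["clin_sig"] for v in variants),
        "variant_type": Counter(v["type"] for v in variants),
        "allele_frequency": Counter(af_bin(v["af"]) for v in variants),
    }
```

Each `Counter` maps directly onto one bar or pie chart in the display, which keeps the rendering layer independent of how variants are stored.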
In another aspect, disclosed herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processor to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising one or more of: genome data, microbiome data, and metabolome data, the biologic information obtained by analysis of one or more biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype or a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising one or more of: a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising one or more of: relative abundance of types of microflora and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites. In some embodiments, the database further comprises phenotype data for members of the population of individuals. In further embodiments, the phenotype data comprises one or more of demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, cognitive assessments, and the like. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a sample ID. 
In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting an individual ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a gene name. In some embodiments, the filters are one or more selected from the group consisting of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. In further embodiments, the software module presenting an interface allowing the user to build a cohort of individuals displays and dynamically updates the number of individuals in the cohort as filters are applied or removed. In some embodiments, the genome summary comprises a visual display of the functional effect, the clinical significance, the variant type, and the allele frequency for the gene variants. In some embodiments, the microbiome summary comprises an interface element allowing the user to configure the microbiome summary to display abundance by species or genus of microflora. In some embodiments, the metabolome summary comprises an interface element allowing the user to configure the metabolome summary to display measurements of metabolites by metabolic superpathway or sub-pathway. In some embodiments, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. In some embodiments, the user optionally switches between the genome summary, the microbiome summary, and the metabolome summary to view the microbiome and the metabolome data in the context of the genome data.
In another aspect, disclosed herein are platforms comprising: a database, in a computer memory, comprising biologic information for a population of individuals, the biologic information comprising genome data, microbiome data, and metabolome data for each individual, the biologic information obtained by analysis of a plurality of biologic samples from each individual; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by inputting a phenotype; a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising relative abundance of types of microflora, and the metabolome summary comprising measurements of metabolites. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a sample ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting an individual ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a gene name. In some embodiments, the filters are one or more selected from the group consisting of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. 
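Routing a search term to the appropriate field (phenotype, sample ID, individual ID, or gene name) might be sketched as a small dispatch table. The record layout, the `search` function, and the query-type keys are hypothetical; sample IDs are stored as a set here because each individual may contribute a plurality of biologic samples.

```python
def search(database, kind, term):
    """Return IDs of individuals whose records match `term` in the field selected by `kind`."""
    matchers = {
        "phenotype": lambda r: term in r["phenotypes"],
        "sample_id": lambda r: term in r["sample_ids"],
        "individual_id": lambda r: r["individual_id"] == term,
        "gene": lambda r: any(gene == term for gene, _variant in r["variants"]),
    }
    if kind not in matchers:
        raise ValueError(f"unsupported query type: {kind}")
    match = matchers[kind]
    return [r["individual_id"] for r in database if match(r)]
```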
In further embodiments, the software module presenting an interface allowing the user to build a cohort of individuals displays and dynamically updates the number of individuals in the cohort as filters are applied or removed. In some embodiments, the genome summary comprises a visual display of the functional effect, the clinical significance, the variant type, and the allele frequency for the gene variants. In some embodiments, the microbiome summary comprises an interface element allowing the user to configure the microbiome summary to display abundance by species or genus of microflora. In some embodiments, the metabolome summary comprises an interface element allowing the user to configure the metabolome summary to display measurements of metabolites by metabolic superpathway or sub-pathway. In some embodiments, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. In some embodiments, the user optionally switches between the genome summary, the microbiome summary, and the metabolome summary to view the microbiome and the metabolome data in the context of the genome data.
In another aspect, disclosed herein are computer-implemented systems comprising: a digital processing device comprising at least one processor, an operating system configured to perform executable instructions, and a memory; and a computer program including instructions executable by the digital processing device to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for a population of individuals, the biologic information comprising genome data, microbiome data, and metabolome data for each individual, the biologic information obtained by analysis of a plurality of biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype; a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising relative abundance of types of microflora, and the metabolome summary comprising measurements of metabolites. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a sample ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting an individual ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a gene name. 
In some embodiments, the filters are one or more selected from the group consisting of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. In further embodiments, the software module presenting an interface allowing the user to build a cohort of individuals displays and dynamically updates the number of individuals in the cohort as filters are applied or removed. In some embodiments, the genome summary comprises a visual display of the functional effect, the clinical significance, the variant type, and the allele frequency for the gene variants. In some embodiments, the microbiome summary comprises an interface element allowing the user to configure the microbiome summary to display abundance by species or genus of microflora. In some embodiments, the metabolome summary comprises an interface element allowing the user to configure the metabolome summary to display measurements of metabolites by metabolic superpathway or sub-pathway. In some embodiments, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. In some embodiments, the user optionally switches between the genome summary, the microbiome summary, and the metabolome summary to view the microbiome and the metabolome data in the context of the genome data.
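One way to picture a cohort builder that updates its count as filters are applied or removed is to keep the active filters as named predicates and recount after each change. The sketch below is written under that assumption. `CohortBuilder`, its methods, and the record fields are hypothetical. A production system would more likely push the filters into a database query than rescan an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Callable

Predicate = Callable[[dict], bool]


@dataclass
class CohortBuilder:
    population: list
    filters: dict = field(default_factory=dict)  # filter name -> Predicate

    def cohort(self):
        # An individual is in the cohort only if every active filter accepts it.
        return [p for p in self.population if all(f(p) for f in self.filters.values())]

    def count(self):
        return len(self.cohort())

    def apply(self, name, predicate: Predicate):
        self.filters[name] = predicate
        return self.count()  # updated count to display after the change

    def remove(self, name):
        self.filters.pop(name, None)
        return self.count()
```

Prefixing filter names by category (for example `"demographic:sex"`) is one way to keep the demographic, diagnosis, medical history, lifestyle, clinical, and laboratory filters apart in the interface.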
In another aspect, disclosed herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processor to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for a population of individuals, the biologic information comprising genome data, microbiome data, and metabolome data for each individual, the biologic information obtained by analysis of a plurality of biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype; a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising relative abundance of types of microflora, and the metabolome summary comprising measurements of metabolites. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a sample ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting an individual ID. In some embodiments, the software module presenting an interface allowing a user to query the database further allows the user to query the database by inputting a gene name. 
In some embodiments, the filters are one or more selected from the group consisting of: a demographic filter, a primary diagnosis filter, a medical history filter, a lifestyle filter, a clinical measurement filter, and a laboratory measurement filter. In further embodiments, the software module presenting an interface allowing the user to build a cohort of individuals displays and dynamically updates the number of individuals in the cohort as filters are applied or removed. In some embodiments, the genome summary comprises a visual display of the functional effect, the clinical significance, the variant type, and the allele frequency for the gene variants. In some embodiments, the microbiome summary comprises an interface element allowing the user to configure the microbiome summary to display abundance by species or genus of microflora. In some embodiments, the metabolome summary comprises an interface element allowing the user to configure the metabolome summary to display measurements of metabolites by metabolic superpathway or sub-pathway. In some embodiments, the genome summary, microbiome summary, and metabolome summary comprise data for individuals. In some embodiments, the user optionally switches between the genome summary, the microbiome summary, and the metabolome summary to view the microbiome and the metabolome data in the context of the genome data.
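Displaying metabolites by superpathway or sub-pathway requires a lookup from each metabolite to its pathway pair, followed by grouping. The sketch below assumes such a lookup table and summarizes each metabolite by its cohort mean. The function name, the table layout, and the choice of the mean are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def summarize_metabolites(measurements, pathway_map, level="superpathway"):
    """Group cohort metabolite measurements by superpathway or sub-pathway.

    measurements: {metabolite: [values across the cohort]}
    pathway_map:  {metabolite: (superpathway, sub_pathway)}  (hypothetical lookup)
    Returns {pathway: {metabolite: mean value}}.
    """
    index = {"superpathway": 0, "sub-pathway": 1}[level]
    grouped = defaultdict(dict)
    for metabolite, values in measurements.items():
        pathway = pathway_map[metabolite][index]
        grouped[pathway][metabolite] = mean(values)
    return dict(grouped)
```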
In another aspect, disclosed herein are platforms comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising genome data, the biologic information obtained by analysis of one or more biologic samples from each individual, each individual and sample having an ID; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of the chromosome demonstrating the spatial relationship on the chromosome of the genes and the variants; and a software module generating a lineage viewer, the lineage viewer comprising: a geographic display of autosomal ancestry, a geographic display of maternal line ancestry, and a geographic display of paternal line ancestry. In some embodiments, the database further comprises phenotype data for members of the population of individuals. In further embodiments, the phenotype data comprises one or more of demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, cognitive assessments, and the like. In some embodiments, the geographic displays are in map form. In some embodiments, the whole genome display is circular. 
In some embodiments, the variants are visually distinguished by type. In some embodiments, the genome browser further comprises an interface for allowing the user to apply filters to the variants, the filters comprising one or more selected from the group consisting of: clinical significance, functional effects, variant type, and zygosity.
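Both views of the genome browser can be driven by the same counts. The whole genome display needs a variant total for each chromosome icon, and the chromosome display needs counts per region along the chromosome. The sketch below assumes variants arrive as `(chromosome, position)` pairs with 1-based positions and bins them into fixed 1 Mb windows. The bin size and the function name are assumptions.

```python
from collections import Counter


def variant_density(variants, bin_size=1_000_000):
    """Count variants per chromosome and per fixed-size bin along each chromosome.

    variants: iterable of (chromosome, position) tuples with 1-based positions.
    Returns (per_chromosome, per_bin); per_bin is keyed by (chromosome, bin_index).
    """
    per_chrom = Counter()
    per_bin = Counter()
    for chrom, pos in variants:
        per_chrom[chrom] += 1
        per_bin[(chrom, (pos - 1) // bin_size)] += 1
    return per_chrom, per_bin
```

The per-chromosome counts could set the shading of each chromosome icon, and the per-bin counts the shading along the chromosome's length.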
In another aspect, disclosed herein are computer-implemented systems comprising: a digital processing device comprising at least one processor, an operating system configured to perform executable instructions, and a memory; and a computer program including instructions executable by the digital processing device to create a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of the chromosome demonstrating the spatial relationship on the chromosome of the genes and the variants; and a software module generating a lineage viewer, the lineage viewer comprising: a geographic display of autosomal ancestry, a geographic display of maternal line ancestry, and a geographic display of paternal line ancestry. In some embodiments, the database further comprises phenotype data for members of the population of individuals. In further embodiments, the phenotype data comprises one or more of demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, cognitive assessments, and the like. In some embodiments, the geographic displays are in map form. In some embodiments, the whole genome display is circular. In some embodiments, the variants are visually distinguished by type.
In some embodiments, the genome browser further comprises an interface for allowing the user to apply filters to the variants, the filters comprising one or more selected from the group consisting of: clinical significance, functional effects, variant type, and zygosity.
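The variant filters can be read as a conjunction of "allowed values" checks, one per attribute. The sketch below is written under that assumption. The `Variant` record, its example values, and `filter_variants` are hypothetical stand-ins for whatever annotation schema the platform uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    clinical_significance: str  # e.g. "pathogenic", "benign"
    functional_effect: str      # e.g. "missense", "synonymous"
    variant_type: str           # e.g. "SNV", "insertion", "deletion"
    zygosity: str               # "heterozygous" or "homozygous"


def filter_variants(variants, **criteria):
    """Keep variants whose attributes fall in the allowed set for every criterion given.

    Each keyword names a Variant field and maps to a collection of allowed values;
    fields that are not mentioned are left unfiltered.
    """
    return [
        v for v in variants
        if all(getattr(v, name) in allowed for name, allowed in criteria.items())
    ]
```

For example, `filter_variants(variants, variant_type={"SNV"}, zygosity={"homozygous"})` would keep only homozygous SNVs.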
In another aspect, disclosed herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processor to create a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of the chromosome demonstrating the spatial relationship on the chromosome of the genes and the variants; and a software module generating a lineage viewer, the lineage viewer comprising: a geographic display of autosomal ancestry, a geographic display of maternal line ancestry, and a geographic display of paternal line ancestry. In some embodiments, the database further comprises phenotype data for members of the population of individuals. In further embodiments, the phenotype data comprises one or more of demographic information, clinical information, disease history, family history, disease treatments, physical traits, test results, diagnostic images, dietary information, lifestyle information, cognitive assessments, and the like. In some embodiments, the geographic displays are in map form. In some embodiments, the whole genome display is circular. In some embodiments, the variants are visually distinguished by type.
In some embodiments, the genome browser further comprises an interface for allowing the user to apply filters to the variants, the filters comprising one or more selected from the group consisting of: clinical significance, functional effects, variant type, and zygosity.
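The linear chromosome display has to place each variant relative to the genes around it. A simple way to compute that placement is to sort the variant positions once and binary-search each gene's interval. The sketch below assumes 1-based inclusive gene coordinates on a single chromosome. The gene names and the function name are hypothetical.

```python
import bisect


def variants_by_gene(genes, variant_positions):
    """Assign variant positions on one chromosome to the genes whose span contains them.

    genes: list of (name, start, end) with 1-based inclusive coordinates.
    variant_positions: list of integer positions.
    Overlapping genes may share a variant.
    """
    positions = sorted(variant_positions)
    result = {}
    for name, start, end in genes:
        lo = bisect.bisect_left(positions, start)   # first position >= start
        hi = bisect.bisect_right(positions, end)    # first position > end
        result[name] = positions[lo:hi]
    return result
```

Variants that fall in no gene interval are simply absent from the result. A display could still draw them as intergenic marks.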
Described herein are computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to perform: receiving a query, wherein the query defines a cohort of one or more individuals; querying a database, wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, a microbiome summary, and a metabolome summary for the cohort, the genome summary comprising genes and gene variants of the cohort, the microbiome summary comprising one or more of: a relative abundance of types of microflora found in the cohort and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites found in the cohort; determining a graphical representation of the genome summary, the microbiome summary, and the metabolome summary for the cohort; and sending the graphical representation to a display device.
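The receive-query, select-cohort, and summarize steps can be sketched against an in-memory list of records. In this sketch the query is a predicate over each record's phenotype. The record layout and `summarize_cohort` are assumptions, and the cohort-level statistics (variant occurrence counts and means) are one plausible choice among many. The sketch stops at the summary data; producing the graphical representation and sending it to a display device are not shown.

```python
from collections import Counter
from statistics import mean


def summarize_cohort(database, query):
    """Select the cohort matching `query` and build genome, microbiome, and metabolome summaries.

    database: list of records (hypothetical layout) with keys "id", "phenotype" (dict),
      "variants" (list of (gene, variant) pairs), "microbiome" ({taxon: relative abundance}),
      and "metabolome" ({metabolite: measured value}).
    query: predicate over a record's phenotype dict.
    """
    cohort = [r for r in database if query(r["phenotype"])]
    if not cohort:
        raise ValueError("query matched no individuals")
    # Occurrences of each (gene, variant) pair across the cohort.
    genome = Counter(v for r in cohort for v in r["variants"])
    # Mean relative abundance per taxon; a taxon absent from a sample counts as 0.
    taxa = {t for r in cohort for t in r["microbiome"]}
    microbiome = {t: mean(r["microbiome"].get(t, 0.0) for r in cohort) for t in taxa}
    # Mean of each metabolite over the individuals in whom it was measured.
    metabolites = {m for r in cohort for m in r["metabolome"]}
    metabolome = {
        m: mean(r["metabolome"][m] for r in cohort if m in r["metabolome"])
        for m in metabolites
    }
    return {"n": len(cohort), "genome": genome, "microbiome": microbiome, "metabolome": metabolome}
```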
Also described herein are computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to generate a graphical user interface (GUI) that accepts a query from a user defining a cohort of one or more individuals and presents to the user genome information, microbiome information, and metabolome information related to the cohort.
Also described herein are computer-implemented methods comprising: receiving a query, wherein the query defines a cohort of one or more individuals; querying a database, wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, a microbiome summary, and a metabolome summary for the cohort, the genome summary comprising genes and gene variants of the cohort, the microbiome summary comprising one or more of: a relative abundance of types of microflora found in the cohort and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites found in the cohort; determining a graphical representation of the genome summary, the microbiome summary, and the metabolome summary for the cohort; and sending the graphical representation to a display device.
Also described herein are computer-implemented methods comprising generating a graphical user interface (GUI) that accepts a query from a user defining a cohort of one or more individuals and presents to the user genome information, microbiome information, and metabolome information related to the cohort.
Also described herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by at least one processor to perform: receiving a query, wherein the query defines a cohort of one or more individuals; querying a database, wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, a microbiome summary, and a metabolome summary for the cohort, the genome summary comprising genes and gene variants of the cohort, the microbiome summary comprising one or more of: a relative abundance of types of microflora found in the cohort and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites found in the cohort; determining a graphical representation of the genome summary, the microbiome summary, and the metabolome summary for the cohort; and sending the graphical representation to a display device.
Also described herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by at least one processor to generate a graphical user interface (GUI) that accepts a query from a user defining a cohort of one or more individuals and presents to the user genome information, microbiome information, and metabolome information related to the cohort.
Also described herein are platforms comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising one or more of: genome data, microbiome data, and metabolome data, the biologic information obtained by analysis of one or more biologic samples from each individual; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by inputting a search term; and a software module generating a visual synthesis display comprising one or more of: a genome summary, a microbiome summary, and a metabolome summary for a cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising one or more of: relative abundance of types of microflora and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites.
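The text does not specify how the demographically matched subpopulation is chosen. One common approach is frequency matching on demographic strata. The sketch below assumes matching on sex and 10-year age band, drawing up to `ratio` non-cohort individuals per cohort member in each stratum, and taking candidates in input order so the result is deterministic. The stratum definition, `ratio`, and the function names are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def age_band(age, width=10):
    return age // width


def matched_subpopulation(cohort, population, ratio=1):
    """Frequency-match non-cohort individuals to the cohort on sex and age band.

    For each (sex, age band) stratum, select up to `ratio` times as many
    non-cohort individuals as the cohort has in that stratum.
    """
    cohort_ids = {p["id"] for p in cohort}
    needed = Counter((p["sex"], age_band(p["age"])) for p in cohort)
    pools = defaultdict(list)
    for p in population:
        if p["id"] not in cohort_ids:
            pools[(p["sex"], age_band(p["age"]))].append(p)
    matched = []
    for stratum, n in needed.items():
        matched.extend(pools[stratum][: n * ratio])  # first candidates in input order
    return matched
```

Random sampling within each stratum, rather than taking the first candidates, would avoid any bias from the database's row order.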
Also described herein are computer-implemented systems comprising: a digital processing device comprising at least one processor, an operating system configured to perform executable instructions, and a memory; and a computer program including instructions executable by the digital processing device to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising one or more of: genome data, microbiome data, and metabolome data, the biologic information obtained by analysis of one or more biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype or a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising one or more of: a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising one or more of: relative abundance of types of microflora and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites.
Also described herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processor to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising one or more of: genome data, microbiome data, and metabolome data, the biologic information obtained by analysis of one or more biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype or a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising one or more of: a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising one or more of: relative abundance of types of microflora and genes and gene variants of the microflora, and the metabolome summary comprising measurements of metabolites.
Also described herein are platforms comprising: a database, in a computer memory, comprising biologic information for a population of individuals, the biologic information comprising genome data, microbiome data, and metabolome data for each individual, the biologic information obtained by analysis of a plurality of biologic samples from each individual; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by inputting a phenotype; a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising relative abundance of types of microflora, and the metabolome summary comprising measurements of metabolites.
Also described herein are computer-implemented systems comprising: a digital processing device comprising at least one processor, an operating system configured to perform executable instructions, and a memory; and a computer program including instructions executable by the digital processing device to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for a population of individuals, the biologic information comprising genome data, microbiome data, and metabolome data for each individual, the biologic information obtained by analysis of a plurality of biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype; a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising relative abundance of types of microflora, and the metabolome summary comprising measurements of metabolites.
Also described herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processor to create a biologic information visual synthesis application comprising: a database, in a computer memory, comprising biologic information for a population of individuals, the biologic information comprising genome data, microbiome data, and metabolome data for each individual, the biologic information obtained by analysis of a plurality of biologic samples from each individual; a software module presenting an interface allowing a user to query the database by inputting a phenotype; a software module presenting an interface allowing the user to build a cohort of individuals from the population by applying filters to the population; and a software module generating a visual synthesis display comprising a genome summary, a microbiome summary, and a metabolome summary for the cohort, a demographically matched subpopulation, and the population, the genome summary comprising genes and gene variants, the microbiome summary comprising relative abundance of types of microflora, and the metabolome summary comprising measurements of metabolites.
Also described herein are platforms comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals, the biologic information comprising genome data, the biologic information obtained by analysis of one or more biologic samples from each individual, each individual and sample having an ID; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of the chromosome demonstrating the spatial relationship on the chromosome of the genes and the variants; and a software module generating a lineage viewer, the lineage viewer comprising: a geographic display of autosomal ancestry, a geographic display of maternal line ancestry, and a geographic display of paternal line ancestry.
Also described herein are computer-implemented systems comprising: a digital processing device comprising at least one processor, an operating system configured to perform executable instructions, and a memory; and a computer program including instructions executable by the digital processing device to create a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of the chromosome demonstrating the spatial relationship on the chromosome of the genes and the variants; and a software module generating a lineage viewer, the lineage viewer comprising: a geographic display of autosomal ancestry, a geographic display of maternal line ancestry, and a geographic display of paternal line ancestry.
Also described herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processor to create a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of the chromosome demonstrating the spatial relationship on the chromosome of the genes and the variants; and a software module generating a lineage viewer, the lineage viewer comprising: a geographic display of autosomal ancestry, a geographic display of maternal line ancestry, and a geographic display of paternal line ancestry.
Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.
As used herein, “cohort” means a group of one or more individuals banded together or treated as a group.
Described herein is a cloud-based solution for the storage, query, and analysis of longitudinal data comprising a multiplicity of whole genomes, a large number of public and proprietary annotation sources, and associated high-quality phenotypic data, including microbiome metagenomes and metabolomics profiles. In various embodiments, the data analyzed by the platforms, systems, media, and methods described herein comprises more than 1,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000, more than 500,000, or more than 1,000,000 whole genomes.
The data analyzed by the platforms, systems, media, and methods described herein comprises genomic data. The genomic data is produced, by way of example, at a next-generation sequencing (NGS) lab. In some cases, an AWS analysis pipeline based on Illumina's HiSeq X and the ISIS Analysis Software are utilized to produce the genomic data. Sequencing reads are mapped to the hg38 human reference sequence, and the Isaac Variant Caller is used to call single nucleotide variants (SNVs) and insertions and deletions (indels). The genomic data comprises a multiplicity of unique SNVs. By way of example, the genomic data comprises over 1 million, over 10 million, over 50 million, over 100 million, over 500 million, or over 1 billion unique SNVs.
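Counting "unique SNVs" across many genomes means deduplicating calls on chromosome, position, reference allele, and alternate allele. The sketch below assumes the calls are available as standard VCF text lines. It is not part of the Isaac-based pipeline described above, and `unique_snvs` is a hypothetical helper. It reads only the first five VCF columns and treats an SNV as a single-base REF with a single-base ALT.

```python
def unique_snvs(vcf_lines):
    """Collect unique SNVs from VCF data lines pooled across genomes.

    Uses the first five tab-separated VCF columns (CHROM, POS, ID, REF, ALT);
    header lines beginning with '#' and blank lines are skipped.
    """
    seen = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list several ALT alleles
            # "." (no call) and "*" (spanning deletion) are single characters but not bases.
            if len(ref) == 1 and len(alt) == 1 and alt not in (".", "*"):
                seen.add((chrom, int(pos), ref, alt))
    return seen
```

At the scale quoted above, an in-memory set would give way to a distributed or database-backed deduplication, but the key would be the same.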
The data analyzed by the platforms, systems, media, and methods described herein comprises metadata. The whole genomes are associated with high-quality phenotypic information. A proprietary phenotype ingestion process enables the cleaning and standardization of phenotype data across disparate data sources. In some embodiments, the ingestion process includes: data integrity checks; standardization of units; standardization of terms; ontology/vocabulary mapping; and maintenance of the proprietary data dictionary.
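The ingestion process itself is proprietary and not described in detail. As a generic illustration only, the sketch below shows three of the listed steps (integrity checks, unit standardization, and term standardization) applied to a single raw record. The field names, the synonym map, and the conversion table are hypothetical. The only factual constant is the definition 1 lb = 0.45359237 kg.

```python
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound

# Hypothetical entries of a data dictionary: (field, source unit) -> (target unit, factor).
UNIT_CONVERSIONS = {("weight", "lb"): ("kg", LB_TO_KG), ("weight", "kg"): ("kg", 1.0)}
# Hypothetical synonym map used for term standardization.
TERM_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}


def standardize(record):
    """Return a cleaned copy of one raw phenotype record, or raise ValueError."""
    # Data integrity check: required fields must be present.
    for key in ("sex", "weight", "weight_unit"):
        if key not in record:
            raise ValueError(f"missing required field {key!r}")
    # Unit standardization.
    conversion = UNIT_CONVERSIONS.get(("weight", record["weight_unit"].strip().lower()))
    if conversion is None:
        raise ValueError(f"unknown unit {record['weight_unit']!r}")
    unit, factor = conversion
    # Term standardization.
    sex = TERM_MAP.get(record["sex"].strip().lower())
    if sex is None:
        raise ValueError(f"unmapped term {record['sex']!r}")
    return {"sex": sex, "weight": round(float(record["weight"]) * factor, 2), "weight_unit": unit}
```

For example, `{"sex": " M ", "weight": "200", "weight_unit": "LB"}` becomes `{"sex": "male", "weight": 90.72, "weight_unit": "kg"}`.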
In various embodiments, the phenotype data comprises more than 1,000, more than 5,000, more than 10,000, more than 100,000, more than 1,000,000, or more than 10,000,000 phenotype data fields with more than 1 million, more than 5 million, more than 10 million, more than 50 million, more than 100 million, more than 500 million, or more than 1 billion data points.
The data analyzed by the platforms, systems, media, and methods described herein comprises annotation data. Annotation data is also cleaned and standardized through an automated end-to-end solution, which provides: idempotence, immutability, and persistence; high-quality data; consistency between data sources; and scalability and flexibility.
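Idempotence and immutability can be obtained together by keying each annotation record on a hash of its canonical form. Re-ingesting the same record then maps to the same key and changes nothing, and stored records are exposed read-only. The sketch below shows this content-addressing idea under that assumption. It is not the document's actual solution. `AnnotationStore` is hypothetical, only top-level keys are protected from mutation, and persistence (writing to durable storage) is not shown.

```python
import hashlib
import json
from types import MappingProxyType


class AnnotationStore:
    """In-memory, append-only store keyed by a SHA-256 hash of each record's canonical JSON."""

    def __init__(self):
        self._records = {}

    def ingest(self, record):
        # sort_keys makes the serialization independent of key order.
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        # Re-ingesting an identical record is a no-op, so repeated runs are idempotent.
        if key not in self._records:
            self._records[key] = MappingProxyType(json.loads(canonical))  # read-only view
        return key

    def get(self, key):
        return self._records[key]

    def __len__(self):
        return len(self._records)
```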
The platforms, systems, media, and methods described herein include biologic data pertaining to a population of individuals, or use of the same. In various embodiments, the population of individuals comprises more than 1,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000, more than 500,000, more than 1,000,000, more than 10,000,000, more than 50,000,000, or more than 100,000,000 individuals. In some cases, the individuals in the population participated in academic medical research studies using consents allowing for genetic testing of specimens. In such cases, biologic specimens and phenotype data are collected for individuals from pharmaceutical clinical trials, academic research, and health care settings. In some cases, biologic data pertaining to a population of individuals is collected from integrated health records for individuals representing a spectrum of diseases with unmet medical needs.
The platforms, systems, media, and methods described herein include biologic information, or use of the same. In some embodiments, the biologic information comprises whole human genome sequencing information.
The biologic information comprises microbiome information. As used herein, “microbiome” refers to the bacteria and other microorganisms that live in and on the human body. In some embodiments, the microbiome information comprises metagenomic microbiome characterization. In various embodiments, the microbiome information comprises one or more of: microflora genus and/or species information, microflora relative abundance information, and microflora gene and/or gene variant information.
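Relative abundance, as the term is used here, expresses each taxon's share of a sample rather than its raw count. The sketch below shows one common way to compute it from metagenomic read counts assigned to taxa, by dividing each count by the sample total. The input layout and the function name are assumptions.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts for one sample into fractions that sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no classified reads in sample")
    return {taxon: count / total for taxon, count in read_counts.items()}
```

Normalizing this way makes samples sequenced to different depths comparable, which is what allows abundances to be averaged across a cohort.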
The biologic information comprises metabolome information. As used herein, “metabolome” refers to the small-molecule chemicals found within a biological sample. In some embodiments, metabolome information comprises the presence of one or more small-molecule chemicals. In further embodiments, the metabolome information comprises a qualitative measurement of one or more small-molecule chemicals. In still further embodiments, the metabolome information comprises a quantitative measurement of one or more small-molecule chemicals. In various embodiments, the metabolome information comprises measurements of at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1100, at least 1200, at least 1300, at least 1400, or at least 1500 substances (e.g., molecules).
In some embodiments, the system presents a user interface.
Importantly, in some embodiments, some or all of the biologic information is linked to phenotype metadata as integrated health records on a cloud computer platform.
The platforms, systems, media, and methods described herein include tools to query the biologic information, or use of the same. In some embodiments, the platforms, systems, media, and methods described herein allow a user to name, save, access, and edit queries performed.
A user optionally queries the biologic information based on one or more of a wide variety of parameters. In some cases, to initiate a query, a user indicates a query scope by selecting whether to analyze an individual, a cohort, or the entire population represented in the biologic information.
In addition to selecting a scope for a new query, a user optionally indicates a type of query, i.e., the basis on which the biologic information will be analyzed. By way of example, a user optionally queries the biologic information based on one or more phenotypes.
By way of further example, a user optionally queries the biologic information based on one or more individuals. In such cases, the user specifies one or more individual IDs.
By way of still further example, a user optionally queries the biologic information based on one or more gene names. In such cases, the user specifies one or more gene names.
By way of yet further example, a user optionally queries the biologic information based on one or more biological samples. In such cases, the user specifies one or more sample IDs.
As described herein, a user optionally initiates a query by selecting a scope of biologic information to query and a type of query to conduct on the selected data.
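To make the scope/type pairing concrete, here is a minimal sketch of a query object and a store that lets a user name, save, access, and edit queries. The scope and type values mirror the options described above; the class names, the in-memory store, and the example query are hypothetical, and a real platform would persist saved queries rather than keep them in a dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum


class Scope(Enum):
    INDIVIDUAL = "individual"
    COHORT = "cohort"
    POPULATION = "population"


class QueryType(Enum):
    PHENOTYPE = "phenotype"
    INDIVIDUAL_ID = "individual_id"
    GENE_NAME = "gene_name"
    SAMPLE_ID = "sample_id"


@dataclass(frozen=True)
class Query:
    scope: Scope
    query_type: QueryType
    terms: tuple[str, ...]  # phenotypes, individual IDs, gene names, or sample IDs

    def __post_init__(self) -> None:
        if not self.terms:
            raise ValueError("a query needs at least one term")


class SavedQueries:
    """Lets a user name, save, access, and edit queries (in memory only)."""

    def __init__(self) -> None:
        self._by_name: dict[str, Query] = {}

    def save(self, name: str, query: Query) -> None:
        self._by_name[name] = query

    def get(self, name: str) -> Query:
        return self._by_name[name]

    def edit(self, name: str, **changes) -> Query:
        # dataclasses.replace builds a new Query with the given fields changed
        # and re-runs __post_init__ validation.
        updated = replace(self._by_name[name], **changes)
        self._by_name[name] = updated
        return updated


store = SavedQueries()
store.save("brca genes", Query(Scope.POPULATION, QueryType.GENE_NAME, ("BRCA1", "BRCA2")))
store.edit("brca genes", scope=Scope.COHORT)
```

Keeping `Query` immutable means an edit always produces a fresh, re-validated query, so a saved query can never be left half-modified.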
The platforms, systems, media, and methods described herein include cohorts, or use of the same. As used herein, a “cohort” refers to a group of subjects with a common characteristic. Further, the platforms, systems, media, and methods described herein include tools for building and/or customizing one or more cohorts. Users optionally build, edit, and save cohorts via an interactive cohort builder tool. In various embodiments, a user selects diseases, traits, demographics, and/or observational data to construct one or more cohorts of individuals. By way of specific example, a user optionally queries for male individuals who were diagnosed with coronary arteriosclerosis but had no history of myocardial infarction, were not taking beta-blockers, were not overweight, and had low levels of low-density lipoprotein (LDL) cholesterol. The cohort builder described herein allows users to save configured cohorts so that they can revisit the results.
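The example cohort above is a conjunction of inclusion and exclusion criteria. A minimal sketch, assuming a flat per-individual record: the `Individual` fields, the BMI cutoff for "not overweight" (25), and the LDL cutoff for "low" (100 mg/dL) are all assumptions made for illustration, not values taken from this description.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Individual:
    individual_id: str
    sex: str
    diagnoses: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)
    bmi: float = 0.0
    ldl_mg_dl: float = 0.0


Criterion = Callable[[Individual], bool]


def build_cohort(population: list[Individual], criteria: list[Criterion]) -> list[str]:
    """Return IDs of individuals who satisfy every criterion (logical AND)."""
    return [p.individual_id for p in population if all(c(p) for c in criteria)]


# The example cohort from the text. The BMI and LDL cutoffs are assumed values.
example_criteria: list[Criterion] = [
    lambda p: p.sex == "male",
    lambda p: "coronary arteriosclerosis" in p.diagnoses,
    lambda p: "myocardial infarction" not in p.diagnoses,
    lambda p: "beta-blocker" not in p.medications,
    lambda p: p.bmi < 25.0,          # "not overweight" (assumed BMI cutoff)
    lambda p: p.ldl_mg_dl < 100.0,   # "low LDL" (assumed cutoff in mg/dL)
]

# A saved cohort could simply be the criteria list stored under a name.
saved_cohorts: dict[str, list[Criterion]] = {"CAD without MI": example_criteria}
```

Storing the criteria rather than the resulting IDs means a saved cohort can be re-run as new individuals are added to the population.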
The platforms, systems, media, and methods described herein include tools for reviewing and analyzing the genomic variants, and other biologic parameters, in the cohort. By way of example, annotated variants are interactively filtered according to the associated, integrated annotation data, such as variant type, variant effects, and calculations of allele frequency. By way of specific example, a user filters to show only pathogenic, missense variants with an allele frequency of less than 0.01. Microbiome abundances, metabolome levels, and phenotypic information for the individuals in the cohort are also optionally analyzed.
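The variant filter example above (pathogenic, missense, allele frequency below 0.01) can be sketched as a set of optional filters that are all applied together. The `Variant` fields are a simplified, hypothetical annotation record, and the variants below are placeholders rather than real data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Variant:
    gene: str
    hgvs: str
    clinical_significance: str  # e.g. "pathogenic", "benign"
    effect: str                 # e.g. "missense", "synonymous"
    allele_frequency: float


def filter_variants(
    variants: Iterable[Variant],
    *,
    significance: Optional[str] = None,
    effects: Optional[set[str]] = None,
    max_af: Optional[float] = None,
) -> list[Variant]:
    """Keep variants passing every filter that is set; unset filters are skipped."""
    kept = []
    for v in variants:
        if significance is not None and v.clinical_significance != significance:
            continue
        if effects is not None and v.effect not in effects:
            continue
        if max_af is not None and v.allele_frequency >= max_af:  # strictly "less than"
            continue
        kept.append(v)
    return kept


# Placeholder variants (not real data).
variants = [
    Variant("GENE_A", "c.1A>G", "pathogenic", "missense", 0.001),
    Variant("GENE_B", "c.2C>T", "pathogenic", "missense", 0.05),
    Variant("GENE_C", "c.3G>A", "benign", "missense", 0.001),
    Variant("GENE_D", "c.4T>C", "pathogenic", "synonymous", 0.001),
]
hits = filter_variants(variants, significance="pathogenic", effects={"missense"}, max_af=0.01)
print([v.gene for v in hits])  # ['GENE_A']
```

Making every filter optional mirrors interactive refinement: each filter the user switches on narrows the result, and leaving all of them off returns the full variant list.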
The platforms, systems, media, and methods described herein include a visual synthesis application, or use of the same. In further embodiments, the visual synthesis comprises analysis in the form of data tables of queried information, summaries of queried information, reviews of queried information, and the like.
The summaries comprise, for example, individual summaries (summarizing age, gender, ethnicity, primary diagnosis, etc.). The summaries also comprise, for example, gene variant summaries (summarizing clinical significance, gene name, HGVS nomenclature, rs ID, zygosity, functional effect, protein change, OMIM ID, variant type, allele frequency, etc.). The summaries also comprise, for example, microbiome summaries (summarizing microbial abundance for specific phyla, genera, or species of microbes). The summaries also comprise, for example, metabolome summaries (summarizing specific metabolites or categories of metabolites and associated ranges or levels).
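As one illustration of a metabolome summary reporting ranges or levels, the sketch below computes a per-metabolite minimum, maximum, and mean across the samples returned by a query. The input shape (one dictionary of metabolite levels per sample), the choice of statistics, and the example values are assumptions.

```python
from __future__ import annotations

from statistics import mean


def summarize_metabolites(samples: list[dict[str, float]]) -> dict[str, dict[str, float]]:
    """Per-metabolite min, max, and mean across samples.

    A metabolite missing from a sample is skipped for that sample rather than
    treated as zero.
    """
    levels: dict[str, list[float]] = {}
    for sample in samples:
        for metabolite, level in sample.items():
            levels.setdefault(metabolite, []).append(level)
    return {
        name: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
        for name, vals in levels.items()
    }


# Hypothetical levels in arbitrary units.
cohort = [{"glucose": 90.0, "lactate": 1.0}, {"glucose": 110.0}]
print(summarize_metabolites(cohort))
```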
The visual synthesis comprises an interface for browsing genomic data, microbiomic data, metabolomic data, and metadata returned by a query. In further embodiments, each of the genomic data, microbiomic data, metabolomic data, and metadata is accessible via a tab in the visualization interface. In still further embodiments, the visual synthesis also comprises a summary and an analysis tool, which are also accessible via tabs in the visualization interface. In some embodiments, the interface for browsing genomic data further includes a set of user-configurable filters allowing “on-the-fly” refinement of the queried data set. For example, in some embodiments, the filters include a set of variant filters.
In some embodiments, the visual synthesis comprises a genome browser tool.
In some embodiments, the visual synthesis comprises a lineage viewer tool.
In some embodiments, the platforms, systems, media, and methods described herein include infrastructure and processes for ingress of data of various types from various sources, or use of the same.
The software architecture is designed to accommodate the massive quantity of genomic, microbiomic, and metabolomic data contemplated for the platforms, systems, media, and methods described herein. In addition to accommodating the sheer quantity of data, the software architecture is designed to accommodate the interrelations of the data, including relation of genomic, microbiomic, and metabolomic data to phenotype information and annotations. Significant challenges with regard to computing efficiency and conservation of computing resources are overcome by the architecture described herein, which allows for multiple concurrent users each performing complex queries and visualizations.
In some embodiments, a query request made by the user 9300 requires further computation by the back-end system 9320. The request is handled by a query service engine 9321 and is then sent to a resource management engine 9322, followed by a job scheduling server 9323. In some embodiments, the job scheduling server 9323 is implemented by Spark. In some embodiments, the job scheduling server 9323 controls non-persistent computing instances 9324; for example, the job scheduling server 9323 adds new computing instances on demand for new jobs and kills the instances once those jobs are done. The computed results are then passed to persistent computing instances 9325 for further processing, such as visualization. In some embodiments, controlling non-persistent computing instances is implemented by Mesos frameworks 9324.
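The lifecycle described above (a query service hands a job to a scheduler, which provisions a non-persistent instance for that job, kills it when the job finishes, and forwards the result to persistent instances) can be modeled in a few lines. This is a toy, single-process sketch under those assumptions; it is not the Spark or Mesos API, and every class and function name is hypothetical. `handle_query` stands in for the query service engine, `EphemeralScheduler` for the resource management and job scheduling path, and `PersistentInstances` for the long-lived post-processing instances.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class EphemeralScheduler:
    """Starts a non-persistent instance per job and kills it when the job ends."""

    live_instances: set[int] = field(default_factory=set)
    _next_id: Any = field(default_factory=itertools.count)

    def run(self, job: Callable[[], Any]) -> Any:
        instance_id = next(self._next_id)
        self.live_instances.add(instance_id)          # provision for this job
        try:
            return job()
        finally:
            self.live_instances.discard(instance_id)  # reclaim even on failure


@dataclass
class PersistentInstances:
    """Long-lived workers that post-process results, e.g. for visualization."""

    received: list[Any] = field(default_factory=list)

    def accept(self, result: Any) -> None:
        self.received.append(result)


def handle_query(job: Callable[[], Any], scheduler: EphemeralScheduler,
                 persistent: PersistentInstances) -> Any:
    """Query service: run the job on an ephemeral instance, then hand off the result."""
    result = scheduler.run(job)
    persistent.accept(result)
    return result


sched, pers = EphemeralScheduler(), PersistentInstances()
result = handle_query(lambda: sum(range(10)), sched, pers)
```

The `try`/`finally` captures the key property of the design: compute capacity is held only while a job runs, so idle capacity is released even when a job fails.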
In some embodiments, the system takes several approaches to receive a large number of queries and to handle the large amount of data processing those queries require.
In some embodiments, the system divides user queries into two categories: batch queries (filter, sort, transform, and aggregate operations) and lookup queries. In-memory cluster computing is used to process batch queries. In this manner, the computing cluster scales out as more user requests are received, reclaims computing nodes when users' jobs are completed, and scales down when no new user requests come in.
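A minimal sketch of the two-category routing and the elastic scaling behavior described above. The routing rule (any operation outside the batch set is treated as a lookup), the jobs-per-node ratio, and the node bounds are assumptions chosen for illustration; the in-memory processing itself is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass

BATCH_OPS = {"filter", "sort", "transform", "aggregate"}


def classify(op: str) -> str:
    """Route batch operations to the cluster; everything else is a lookup."""
    return "batch" if op in BATCH_OPS else "lookup"


@dataclass
class ElasticCluster:
    min_nodes: int = 1
    max_nodes: int = 16
    jobs_per_node: int = 2  # assumed capacity per node
    nodes: int = 1

    def rescale(self, pending_batch_jobs: int) -> int:
        """Scale out with demand and back down when idle, within [min, max]."""
        needed = -(-pending_batch_jobs // self.jobs_per_node)  # ceiling division
        self.nodes = max(self.min_nodes, min(self.max_nodes, needed))
        return self.nodes


cluster = ElasticCluster()
print(cluster.rescale(7), cluster.rescale(0))  # 4 1
```

Keeping a floor of `min_nodes` lets lookups and the first batch job start without waiting for a node to be provisioned, while the ceiling caps cost under a burst of requests.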
The system can be applied in a variety of fields. The system provides useful data and analysis to pharmaceutical companies, including informaticians, bench scientists, medical directors, senior executive teams, and commercial organizations. Such data and analysis can include analysis of clinical trial data for patient stratification and biomarker discovery, identification and in silico validation of novel genetic targets, discovery of novel disease and dose-response biomarkers/signatures, compound repurposing and expansion of indications for marketed drugs, rescue of failed phase 3 assets, real-time genetic analysis of adverse events, or targeted, accelerated recruitment for clinical trials. For academic research groups, including physicians/principal investigators, informaticians, research scientists, and geneticists, the system can offer analysis of specific cohorts, analysis of individual patients, or large-scale analysis of variation in populations. Clinics, hospitals, and cancer centers, including physicians and genetic counsellors, may also find the system useful for the analysis of individuals or cohorts, with either a wellness or an oncology focus. The data and analysis can also prove valuable to insurance companies, actuarial teams, or health economists.
Specifically, for pharma and researchers, the system can serve as or enable a reference set of knowledge/evidence, a hypothesis generation engine, a platform for analysis of pharma's own data, a platform for combining pharma data with the data and analysis provided by the system, a platform for combining data from multiple collaborators, a platform for sharing data within a company, etc. For physicians or genetic counsellors, the system can similarly be used as part of a care tool to identify the most relevant results for treatment and prevention, as a reference set of knowledge/evidence, or as a tool to identify other physicians with similar patients and share knowledge. In addition, for insurance companies, the system can be useful as part of a tool to detect individual care pathways and incentivize healthy living, or as a tool to help quantify the risk in their insured population.
In some embodiments, the platforms, systems, media, and methods described herein include a digital processing device, or use of the same. In further embodiments, the digital processing device includes one or more hardware central processing units (CPUs) or general purpose graphics processing units (GPGPUs) that carry out the device's functions. In still further embodiments, the digital processing device further comprises an operating system configured to perform executable instructions. In some embodiments, the digital processing device is optionally connected to a computer network. In further embodiments, the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web. In still further embodiments, the digital processing device is optionally connected to a cloud computing infrastructure. In other embodiments, the digital processing device is optionally connected to an intranet. In other embodiments, the digital processing device is optionally connected to a data storage device.
In accordance with the description herein, suitable digital processing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, and notebook computers.
In some embodiments, the digital processing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device's hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing.
In some embodiments, the device includes a storage and/or memory device. The storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis. In some embodiments, the device is volatile memory and requires power to maintain stored information. In some embodiments, the device is non-volatile memory and retains stored information when the digital processing device is not powered. In further embodiments, the non-volatile memory comprises flash memory. In some embodiments, the volatile memory comprises dynamic random-access memory (DRAM). In some embodiments, the non-volatile memory comprises ferroelectric random access memory (FRAM). In some embodiments, the non-volatile memory comprises phase-change random access memory (PRAM). In other embodiments, the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing based storage. In further embodiments, the storage and/or memory device is a combination of devices such as those disclosed herein.
In some embodiments, the digital processing device includes a display to send visual information to a user. In some embodiments, the display is a cathode ray tube (CRT). In some embodiments, the display is a liquid crystal display (LCD). In further embodiments, the display is a thin film transistor liquid crystal display (TFT-LCD). In some embodiments, the display is an organic light emitting diode (OLED) display. In various further embodiments, an OLED display is a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display. In some embodiments, the display is a plasma display. In other embodiments, the display is a video projector. In some embodiments, the display is a wearable display. In still further embodiments, the display is a combination of devices such as those disclosed herein.
In some embodiments, the digital processing device includes an input device to receive information from a user. In some embodiments, the input device is a keyboard. In some embodiments, the input device is a pointing device including, by way of non-limiting examples, a mouse, trackball, track pad, joystick, game controller, or stylus. In some embodiments, the input device is a touch screen or a multi-touch screen. In other embodiments, the input device is a microphone to capture voice or other sound input. In other embodiments, the input device is a video camera or other sensor to capture motion or visual input. In further embodiments, the input device is a Kinect, Leap Motion, or the like. In still further embodiments, the input device is a combination of devices such as those disclosed herein.
In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device. In further embodiments, a computer readable storage medium is a tangible component of a digital processing device. In still further embodiments, a computer readable storage medium is optionally removable from a digital processing device. In some embodiments, a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
In some embodiments, the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable in the digital processing device's one or more CPUs, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.
The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
In some embodiments, a computer program includes a web application. In light of the disclosure provided herein, those of skill in the art will recognize that a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems. In some embodiments, a web application is created upon a software framework such as Apache Hadoop, Microsoft® .NET, or Ruby on Rails (RoR). In some embodiments, a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, and XML database systems. In further embodiments, suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, MySQL™, and Oracle®. Those of skill in the art will also recognize that a web application, in various embodiments, is written in one or more versions of one or more languages. A web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof. In some embodiments, a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or eXtensible Markup Language (XML). In some embodiments, a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS). In some embodiments, a web application is written to some extent in a client-side scripting language such as Asynchronous Javascript and XML (AJAX), Flash® ActionScript, Javascript, or Silverlight®. In some embodiments, a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy.
In some embodiments, a web application is written to some extent in a database query language such as Structured Query Language (SQL). In some embodiments, a web application integrates enterprise server products such as IBM® Lotus Domino®. In some embodiments, a web application includes a media player element. In various further embodiments, a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.
In some embodiments, a computer program includes a mobile application provided to a mobile digital processing device. In some embodiments, the mobile application is provided to a mobile digital processing device at the time it is manufactured. In other embodiments, the mobile application is provided to a mobile digital processing device via the computer network described herein.
In view of the disclosure provided herein, a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples, C, C++, C#, Objective-C, Java™, Javascript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and Phonegap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
Those of skill in the art will recognize that several commercial forums are available for distribution of mobile applications including, by way of non-limiting examples, Apple® App Store, Google® Play, Chrome WebStore, BlackBerry® App World, App Store for Palm devices, App Catalog for webOS, Windows® Marketplace for Mobile, Ovi Store for Nokia® devices, Samsung® Apps, and Nintendo® DSi Shop.
In some embodiments, the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various embodiments, a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof. In further various embodiments, a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof. In various embodiments, the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, and a standalone application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on cloud computing platforms. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for storage and retrieval of user, patient, phenotypic, genomic, microbiome, and metabolome information. In various embodiments, suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object oriented databases, object databases, entity-relationship model databases, associative databases, and XML databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, and Sybase. In some embodiments, a database is internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing-based. In other embodiments, a database is based on one or more local computer storage devices.
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention.
This application claims the benefit of U.S. provisional application Ser. No. 62/253,629 filed Nov. 10, 2015, U.S. provisional application Ser. No. 62/296,986 filed Feb. 18, 2016, and U.S. provisional application Ser. No. 62/362,892 filed Jul. 15, 2016, the entire contents of each of which is hereby incorporated by reference.
Number | Date | Country
---|---|---
62253629 | Nov 2015 | US
62296986 | Feb 2016 | US
62362892 | Jul 2016 | US