The following relates generally to the clinical testing arts, genomic testing arts, genomic data processing architecture arts, and related arts.
Genomics is a powerful tool for medical diagnosis, treatment selection, and other clinical tasks. In the last 15 years, since the first published map of the human genome, the introduction of next generation sequencing has enabled interrogation of structural and functional variations across the entire human genome. The rate at which the cost of sequencing has fallen over time has far surpassed the rate of integrated circuit miniaturization predicted by Moore's law. Recent large efforts such as the 1000 Genomes Project, which mapped human genome variation across different populations, and The Cancer Genome Atlas, which mapped tumor biology across 40 tissue types, have stimulated biomedical research with great potential impact on the diagnosis and treatment of cancer and other ailments. Yet challenges remain in bringing genomic sequencing into common usage in clinical practice, and in effectively leveraging genomic sequencing data to yield actionable clinical information.
The following discloses new and improved systems and methods.
In one disclosed aspect, a clinical genomic data processing device comprises at least one microprocessor and a non-transitory storage medium storing instructions. These include: instructions readable and executable by the at least one microprocessor to implement a user interface configured to receive requests for execution of genomic workflows and to display output generated by the execution of the genomic workflows; instructions readable and executable by the at least one microprocessor to implement a genomic workflow manager configured to manage an asynchronous messaging queue and to manage the execution of the genomic workflows; and instructions readable and executable by the at least one microprocessor to implement service providers configured to perform jobs associated with the genomic workflows. The genomic workflow manager is configured to communicate with the service providers by messages exchanged via the asynchronous messaging queue to manage the execution of the genomic workflows via jobs performed by the service providers.
In another disclosed aspect, a non-transitory storage medium stores instructions readable and executable by at least one microprocessor to perform clinical genomic data processing. The instructions include: instructions readable and executable by the at least one microprocessor to implement a user interface configured to receive requests for execution of genomic workflows and to display output generated by the execution of the genomic workflows; instructions readable and executable by the at least one microprocessor to implement a genomic workflow manager configured to manage an asynchronous messaging queue and to manage the execution of the genomic workflows; and instructions readable and executable by the at least one microprocessor to implement service providers configured to perform jobs associated with the genomic workflows. The service providers include at least one genomic processing service provider configured to perform a job comprising processing genomic data to generate a list of aberrations, at least one annotation service provider configured to perform a job comprising processing a list of aberrations to generate annotated aberrations, at least one aberration prioritization service provider configured to perform a job comprising processing a list of annotated aberrations to generate a prioritized list of annotated aberrations, and at least one reporting service provider configured to perform a reporting job comprising at least display of a list of annotated aberrations via the user interface and receipt of a clinical report via the user interface. The genomic workflow manager is configured to communicate with the service providers by messages exchanged via the asynchronous messaging queue to manage the execution of the genomic workflows via jobs performed by the service providers.
In another disclosed aspect, a clinical genomic data processing method is disclosed. Via a web-based user interface, requests are received for execution of genomic workflows and output generated by the execution of the genomic workflows is displayed. Via service providers implemented on a cloud-based platform comprising microprocessors, jobs associated with the genomic workflows are asynchronously performed. Via a genomic workflow manager implemented on the cloud-based platform, state machines representing the genomic workflows are maintained, and communication with the service providers is performed by messages exchanged via an asynchronous messaging queue to manage the execution of the genomic workflows via the jobs asynchronously performed by the service providers. The genomic workflow manager further updates states of the state machines in accord with messages received from the service providers via the asynchronous messaging queue indicating successful completion of the jobs performed by the service providers.
One advantage resides in providing clinical genomic data processing devices and methods that are more effectively integrated with clinical workflows.
Another advantage resides in providing clinical genomic data processing devices and methods with a service-oriented architecture (SOA), preferably cloud-based, which employs service providers that can be frequently updated to implement the latest clinical knowledge (e.g. most up-to-date aberration definitions, most up-to-date annotation databases, current information on upcoming and in-progress clinical trials, latest therapy information, and so forth) without taking the clinical genomic data processing offline.
Another advantage resides in providing clinical genomic data processing devices and methods with an SOA, preferably cloud-based, which employs service providers to perform jobs associated with genomic workflows and further provides a genomic workflow manager that manages an asynchronous messaging queue for communicating with the service providers to enable asynchronous parallel processing of various workflow tasks.
Another advantage resides in providing clinical genomic data processing devices and methods with an improved user interface for presenting the most clinically relevant genomic aberrations to clinicians.
Another advantage resides in providing clinical genomic data processing devices and methods with improved patient data security.
Another advantage resides in providing clinical genomic data processing devices and methods with an improved user interface that reduces the need to cut-and-paste information between processing components.
Another advantage resides in providing clinical genomic data processing devices and methods providing processing of genomic data to generate clinically actionable information with improved computational efficiency.
A given embodiment may provide none, one, two, more, or all of the foregoing advantages, and/or may provide other advantages as will become apparent to one of ordinary skill in the art upon reading and understanding the present disclosure.
The invention may take form in various components and arrangements of components, and in various steps and arrangements of steps. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. In drawings presenting log or service call data, certain identifying information has been redacted by use of superimposed redaction boxes.
A difficulty with leveraging genomics in clinical practice is a dearth of informatics to store, manage, analyze and contextualize this data in a streamlined way that supports the clinical workflow of clinical experts such as oncologists and pathologists. The challenge is that there are many therapeutic options and many clinical trials, and it is hard to test for one gene at a time. Next Generation Sequencing (NGS) platforms provide an opportunity to sequence genomes in a high throughput manner at reasonable cost. Algorithms exist that generally convert genomic data into meaningful biological information. Such algorithms are typically geared towards the bioinformatics expert user. Clinical specialists spend decades obtaining specific expertise and forming their approach to problem solving and helping patients. In their way of thinking, the informatics tools they use should have a natural flow, keeping in mind the problems to solve and the pertinent information needed to accomplish their task. Certain tasks may involve logging into half a dozen different IT systems and manually cutting and pasting, which may reduce the visibility of the right information and increase the chances of making errors.
Accordingly, it would be desirable to provide an informatics platform that includes a user experience that presents information in a lucid, workflow supporting fashion while leveraging clinical knowledge from various resources for annotation and interpretation to address the needs of clinical experts. Various embodiments follow a philosophy that the technology should work to reduce time and to increase productivity and the chances of a good outcome for patients. Information is deeply embedded in data which is not easily accessible in modern-day EMRs, LIS, and other clinical applications. In some embodiments, seeking expert opinion from other more experienced clinicians is an available option within the context of the decision making for a single patient.
According to various embodiments, the goal is to process genomics and clinical data including imaging and pathology data as well as any other real time diagnostic inputs to provide precision diagnostics. Some clinical questions to be answered include the following: How to match a tumor's genotype with a potential therapy for best outcome? How to elucidate the cancer subtypes in a set of tumor samples characterized at the genomic, transcriptomic, proteomic, epigenomic and metabolomic levels? How to provide a new hypothesis and diagnosis for a patient who has been through an extensive battery of tests and is still a medical mystery? How to associate the patient microbiome data with the health condition of the person?
However, converting high-throughput genomic data into clinically actionable information is not a straightforward task. A first challenge is to be able to ingest and store extremely large amounts of genomic data (up to 1 TB for a single patient whole genome) in a reliable and secure manner while satisfying legal requirements for long-term storage. A second challenge is to be able to run heterogeneous pipelines and associated jobs (e.g. sequence alignment, variant and mutation calling, copy number variation detection), written in various programming languages, asynchronously and in parallel, in a highly quality controlled, reliable, reproducible and scalable manner. A third challenge is to dynamically integrate domain-specific knowledge from various databases that may require frequent updates and to generate clinically actionable results that are reproducible during subsequent runs. A fourth challenge is to enable continuous communication across clinical specialties because oncology is usually a collaborative effort. There are many different insights to be conveyed and put together, both from each clinician and from the outputs of the many smart algorithms. Various embodiments disclosed herein facilitate sharing pertinent information, communicating discordance between various types of clinical evidence, and promoting problem solving both for the diagnostic process and for the therapeutic planning and monitoring phase of patient care.
Various embodiments described herein utilize a software product deployed within a cloud based platform running on various hardware including processors (e.g., microprocessors, FPGAs, ASICs, etc.), memories (e.g. L1/L2/L3 cache, system memory, and storage devices), network interfaces (e.g., Ethernet, WiFi, etc.), and so forth. An aim of the software is to provide readable and interpretable genomic information that presents suggestions for therapy planning in oncology, but which could also be used for constitutive genomics and other fields.
Various embodiments disclosed herein take data output from next generation sequencing machines and other genomics instruments, along with data from various clinical information technology (IT) systems, and perform functions such as the following: (1) performing many different processes at the same time for a multitude of institutions with many users with different types of roles (e.g. oncologist, geneticist, pathologist, bioinformatician, molecular specialist); (2) automated execution of specific analytic pipelines for gene panels, whole exomes and whole genomes in order to detect DNA/RNA aberrations, by utilizing bioinformatics algorithms, and integrating such information to provide the clinical experts (such as oncologists, pathologists, and medical geneticists) with user portals with guided workflows to enable the review of candidate aberrations, along with annotated information, to facilitate aberration selection (i.e., aberrations the clinician believes are associated with the disease) and clinical report generation, clinical information and/or therapeutic treatment options, based on public and/or curated private databases, and descriptions and links to eligible clinical trials; (3) associations of new genomic/transcriptomic/epi-genomic/proteomic biomarkers; (4) storage of both raw and analyzed data combining results from NGS with patient demographic, diagnostic, lifestyle and outcome data; (5) storage of analytical metadata and intermediate files produced in analysis; (6) storage of end-user actions which were applied to produce a clinical report; (7) social-network-like communication features between clinicians to share more case-related information and second opinions; and (8) assembling the relevant information and generating a clinical report.
With reference now to
An application layer sits on top of the HSDP Cloud Foundry Network 16 (or similar) and performs various functions such as: providing connections 18 to PaaS microservices 20; implementing new microservices 20 specific for oncology, for example, clinical reporting microservice, annotation microservice, therapy matching microservice, clinical trial microservice, variant prioritization microservice, variant filtering microservice, auditing and logging, identity and access management, pipeline management microservice and many others; implementing a workflow manager 22 which receives requests for execution of genomic workflows, queues up jobs associated with the genomic workflows (in the illustrative embodiment, using a RabbitMQ messaging bus 24, or more generally an asynchronous messaging queue managed by the workflow manager 22) and orchestrates the execution of these jobs by the service providers 20; and providing a back end webserver 26 which executes the complicated computations in order to manage the user events and visualize complex results. In the illustrative embodiment, the webserver 26 presents a user interface 26, 28 in the form of the webserver 26 with an HSDP cloud foundry proxy 28 via which a web client 30 (such as a web browser, e.g. Google Chrome, Mozilla Firefox, Microsoft Internet Explorer or so forth, or a custom web client communicating via a secure HTTPS protocol) communicates with the illustrative clinical genomic data processing device. The web client 30 only renders output and receives requests from the user.
The instructions stored on the non-transitory storage medium 12 include: instructions readable and executable by the at least one microprocessor 10 to implement the user interface 26, 28 configured to receive requests for execution of genomic workflows and to display output generated by the execution of the genomic workflows; instructions readable and executable by the at least one microprocessor 10 to implement the genomic workflow manager 22 configured to manage the asynchronous messaging queue 24 and to manage the execution of the genomic workflows; and instructions readable and executable by the at least one microprocessor 10 to implement the service providers 20 configured to perform jobs associated with the genomic workflows. The genomic workflow manager 22 is configured to communicate with the service providers 20 by messages exchanged via the asynchronous messaging queue 24 to manage the execution of the genomic workflows via jobs performed by the service providers.
As is known in the art, the non-transitory storage medium 12 which stores instructions that are readable and executable by at least one microprocessor 10 may, by way of non-limiting illustration, comprise memories such as L1/L2/L3 cache, system memory, and storage devices such as a hard disk drive, RAID disk array or other magnetic storage medium; a solid state drive (SSD) or other electronic storage medium, an optical disk or other optical storage medium, various combinations thereof, or so forth. The cloud-based system comprises the at least one microprocessor (e.g. server computers) 10 interconnected via network interfaces (e.g., Ethernet, WiFi, etc.), and the non-transitory storage medium 12. The web client 30 is typically implemented on a desktop computer, notebook computer, mobile device such as a cellphone, tablet computer or the like, which provides a display for presenting output generated by the execution of the genomic workflows, and one or more user input devices such as a keyboard, mouse, touch-sensitive display, dictation microphone, or so forth via which a user may initiate requests for execution of genomic workflows, enter or edit clinical reports, and otherwise interact with the clinical genomic data processing device.
The illustrative service providers 20 are microservices. Microservices are considered an extension of service-oriented architectures (SOA) used to build distributed software systems. Microservices are processes that communicate with each other over a network using lightweight protocols. A benefit of using microservices is to enhance the cohesion and decrease the coupling of software. This facilitates the ability to continuously add or drop services and refactor the system. In some embodiments, all microservices are stateless and share nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database such as a cloud-based storage 32, e.g. Amazon Simple Storage Service (S3, available from Amazon Web Services, Inc.) in the illustrative embodiment. Microservices may declare all dependencies, completely and exactly, via a dependency declaration manifest. Furthermore, a dependency isolation tool may be used during execution to ensure that no implicit dependencies “leak in” from the surrounding system. The full and explicit dependency specification is applied uniformly to both production and development. The clinical genomic data processing device can have a configuration server (for example Spring Batch) and a Git repository (or similar type of software repository) that will hold the configuration for all microservices. The configuration server may be provided by a cloud foundry (e.g. the illustrative HSDP cloud foundry 14) or another, proprietary instance.
With reference to
In the following, examples of various illustrative service providers 20 are described. Some of the illustrative microservices include: at least one genomic processing service provider 201 configured to perform a job comprising processing genomic data to generate a list of aberrations (see
With returning reference to
The workflow manager 22 enables the workflows to be interpreted as state machines. Each step in the state machine is a job work item (e.g., a piece of software code) to be processed. The workflow manager 22 manages workflows; it does not perform any task by itself but rather relies on different job providers 20 for performing the specific jobs. When a workflow request arrives it is stored in a persistence layer and processed. The first job item is sent via the queue 24 to the specific provider 20 which supplies it. Once an item has been successfully processed by a provider 20, the provider notifies the workflow manager 22 via the queue mechanism 24. At this point the workflow manager 22 updates the state of the state machine and sends the second job in the request to the second job provider 20, and so on until all the jobs are done or a failure occurs. At that point the workflow manager 22 updates the status of the executing workflow with success or failure for the step performed by the completed job(s). The illustrative clinical genomic data processing device takes into consideration that both the workflow manager 22 and its providers 20 are microservices and that, at any point in time, a job may be handled by a different workflow manager or by different provider instances. The workflow manager 22 will thus use the microservices cloud infrastructure services.
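By way of non-limiting illustration, the state machine behavior described above may be sketched as follows. This is a minimal single-process sketch, not the disclosed implementation: Python's in-memory `queue.Queue` stands in for the asynchronous messaging queue 24, a dictionary stands in for the persistence layer, and the names `Workflow`, `WorkflowManager`, `run_provider` and `run_to_completion` are hypothetical. Each workflow is assumed to contain at least one job.

```python
import queue
from dataclasses import dataclass


@dataclass
class Workflow:
    """State machine for one workflow: an ordered, non-empty list of job names."""
    workflow_id: str
    jobs: list
    step: int = 0             # index of the job currently in flight
    status: str = "running"   # "running", "success" or "failure"


class WorkflowManager:
    """Dispatches jobs and advances workflow state; performs no job itself."""

    def __init__(self, job_queue, reply_queue):
        self.job_queue = job_queue      # manager -> providers
        self.reply_queue = reply_queue  # providers -> manager
        self.workflows = {}             # stand-in for the persistence layer

    def submit(self, workflow):
        # Persist the request, then send its first job to a provider.
        self.workflows[workflow.workflow_id] = workflow
        self._dispatch(workflow)

    def _dispatch(self, wf):
        self.job_queue.put({"workflow_id": wf.workflow_id, "job": wf.jobs[wf.step]})

    def handle_reply(self, msg):
        # Update the state machine according to the provider's message.
        wf = self.workflows[msg["workflow_id"]]
        if not msg["ok"]:
            wf.status = "failure"
            return
        wf.step += 1
        if wf.step == len(wf.jobs):
            wf.status = "success"
        else:
            self._dispatch(wf)


def run_provider(job_queue, reply_queue, failing_jobs=()):
    """Toy provider: takes one job off the queue and reports the outcome."""
    msg = job_queue.get()
    ok = msg["job"] not in failing_jobs
    reply_queue.put({"workflow_id": msg["workflow_id"], "job": msg["job"], "ok": ok})


def run_to_completion(manager, failing_jobs=()):
    """Drive the toy system until no job remains queued."""
    while not manager.job_queue.empty():
        run_provider(manager.job_queue, manager.reply_queue, failing_jobs)
        manager.handle_reply(manager.reply_queue.get())
```

In such a sketch the manager holds no job logic of its own, so a provider could be replaced or scaled out without changing the manager; a deployed system would use a real message broker and durable storage in place of the in-memory stand-ins.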
With continuing reference to
With reference now to
Once the annotation manager 202 receives an annotation match request it may perform one or more of the following steps. (1) Receive all genomic aberrations (SNV, CNV, fusions) for the requested workflow process. (2) Retrieve a list of all available annotation sources and their respective latest active versions (unless specified otherwise). (3) Create a progression entry for each annotation source in order to mark the progress of annotation with that particular source. (4) Send an annotation match request to a specific service called vcfEtl, which is responsible for fetching the entries of the vcf file and transforming them into annotated entries, one per annotation source, with each row representing another genomic aberration. (5) Send an acknowledgement to the messaging broker 54 (messaging is asynchronous, decoupling applications by separating the sending and receiving of data). (6) After this point, the annotation match requests are processed by vcfEtl instances, and upon completion they send annotation match responses with a body containing the annotation results. (7) On receiving an annotation match response, the annotation manager 202 updates the progression entry for the source that responded. At this stage it checks whether this response was already received and failed due to an error. If there was such an error in the past, the annotation manager 202 performs a database clean-up of the annotation results and makes another attempt to process the response. (8) The annotation results for this source are stored in the database as annotated results. (9) The entry noting the progress for this source is updated to “done”. (10) The annotation manager 202 checks whether all match sources returned successfully using the progression entries. If some match sources have not yet returned, it waits; if some failed, it returns a “fail” to the workflow manager 22. If all are successful, it returns a job done with success status to the workflow manager 22.
(11) After this, the annotation results become available for the next steps of the genomic workflow, for example displaying results via the user interface 26, 28 or for submitting these results for therapy and clinical trial matching.
Once all annotation engines 50 have notified the annotation manager 202 that they are done, the annotation manager 202 creates the annotation entries and sends a notification to the workflow manager 22 that the annotation job has been done and all results are available to be retrieved.
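By way of non-limiting illustration, the progression-entry bookkeeping of steps (3) and (7) through (10) may be sketched as follows. The class name `AnnotationJob`, its fields, and the status strings are hypothetical, and in-memory dictionaries stand in for the database; the sketch also assumes that any failed source causes a “fail” to be reported until that source is successfully reprocessed.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationJob:
    """Tracks one annotation job across several annotation sources."""
    sources: dict                                    # source name -> version
    progress: dict = field(init=False, default_factory=dict)
    results: dict = field(init=False, default_factory=dict)

    def __post_init__(self):
        # Step (3): one progression entry per annotation source.
        self.progress = {name: "pending" for name in self.sources}

    def on_response(self, source, annotated_rows, error=False):
        """Step (7): handle an annotation match response for one source."""
        if error:
            self.progress[source] = "failed"
            return
        if self.progress[source] == "failed":
            # An earlier attempt failed: clean up any partial results
            # before processing the new response.
            self.results.pop(source, None)
        self.results[source] = annotated_rows   # step (8): store results
        self.progress[source] = "done"          # step (9): mark as done

    def status(self):
        """Step (10): decide what to report to the workflow manager."""
        states = set(self.progress.values())
        if "failed" in states:
            return "fail"
        if states == {"done"}:
            return "success"
        return "waiting"
```

A call sequence such as a successful response from one source, a failed response from a second, and then a successful retry from the second would move `status()` from "waiting" to "fail" to "success".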
Because biological and clinical knowledge is an ever growing area, new annotation databases 52 may be brought into the engines 50 to update the annotation capabilities of the clinical genomic data processing device on a continuous basis. There are at least two ways: 1) a database for an annotation engine has a new version, or 2) a completely new database may be included with a novel data schema.
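By way of non-limiting illustration, selecting the latest active version of each annotation source, unless a version is specified otherwise, may be sketched as follows. The registry layout, the function name `resolve_versions`, and the source names used in the usage note are hypothetical and are not part of the disclosure.

```python
def resolve_versions(registry, pinned=None):
    """Pick one version per annotation source.

    registry: {source name: [(version, is_active), ...]}, ordered oldest to newest.
    pinned:   optional {source name: version} overriding the default choice,
              e.g. to reproduce the results of an earlier run.
    """
    pinned = pinned or {}
    resolved = {}
    for source, versions in registry.items():
        if source in pinned:
            resolved[source] = pinned[source]
            continue
        active = [version for version, is_active in versions if is_active]
        if active:
            # Latest active version is the last active entry in the list.
            resolved[source] = active[-1]
    return resolved
```

For example, with a registry in which `db_a` has versions 1.0 and 2.0 active and 3.0 inactive, the sketch selects 2.0 by default and 1.0 when `pinned={"db_a": "1.0"}` is passed. Adding a new version or a completely new database then amounts to adding an entry to the registry.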
With reference now to
In the illustrative embodiment of
Quality information comes as part of the genomics processing pipelines 201a (see
Actionability is based on availability of U.S. Food and Drug Administration (FDA) approved therapies or trial matches for a specific gene or specific variant.
Disease context is suitably defined as follows. For each type of cancer (in an illustrative oncology workflow), there is a priority list of genes which are very relevant for that type of cancer. For example: JAK2 for myelodysplastic syndromes, BRAF for melanomas, EGFR for lung and colon cancer. Additionally, this step could also rely on a curated internal database; where there is high interest in the in-house curated genes, these should be prioritized higher for the hospital where the test is being performed.
Location of the variant can be variously defined: genic (exonic, intronic, or variants that are located in the 5′ untranslated region (5′ UTR) or the 3′ untranslated region (3′ UTR) of the gene) and intergenic. If a variant is exonic, then it should be prioritized in the order given above. Impact on the protein function can be considered for exonic variants: the impact classification includes non-synonymous (missense, nonsense), frameshift, insertion, deletion, duplication, indel, and synonymous. Another factor may be hub-in-a-pathway based prioritization: if a gene has many connections within a pathway, this gene is prioritized higher than other genes.
For non-synonymous aberrations, the following may be considered. Functional prediction refers to prediction scores for the deleteriousness of the variant: benign, deleterious, tolerated (or high, medium, low impact on the gene function), as given by SIFT, PolyPhen, FATHMM, MutationTaster, and others. “D” may denote a score, based on the values in these databases, signifying that a variant has a deleterious effect on the function of that gene. Another factor may be protein effect: gain of function or loss of function (predicted or proven), or no effect. In various embodiments, when there is an effect, the annotation is 1; otherwise, the annotation is 0. Another factor may be impact on regulatory elements, such as transcription factor binding sites, methylation sites, long non-coding RNA regions, and microRNA regions.
Frequency information may be based on the frequency of the variant in specific databases (for example, external knowledge bases like TCGA or internal knowledge bases). The frequency information can also be obtained from other external knowledge bases, or from so-called beacons (https://beacon-network.org), a federated ecosystem for sharing genomic and clinical data as part of the Global Alliance for Genomics and Health consortium.
In the illustrative variant prioritization of
Various embodiments of the aberration prioritization service provider 203 may utilize additional or alternative information for filtering and/or ranking variants for display to the clinician. According to some embodiments, superset categories are defined and a score based on these supersets is assigned to each variant. These scores are used to filter and rank each variant. The categories may in one illustrative embodiment include the following, in order of importance: dataset detection, functional, disease, other evidence, which are described in turn in the following.
External/internal dataset detection is one of the more important aspects of variant prioritization with regard to treatment and clinical trial matching, the reason being that if a variant does not exist in other patients it may be unlikely that a clinical trial will be designed specifically targeting that variant. Dataset detection is an annotation that results from querying external (such as the Beacon network) and internal (such as hospital IT systems) variant datasets and returns a value of ‘true’ if the variant supplied in the query exists elsewhere, and ‘false’ otherwise. In some embodiments, these datasets are chosen based on being sufficiently large (e.g., on the order of hundreds of thousands or millions) to be confident in the result. This category may return a value of 100 or 0 for ‘detected’ or ‘not detected’, respectively. This category is heavily weighted for clinical trial matching specifically.
The functional category may include annotations (which can originally range in the hundreds) indicating the functional significance of a variant. In various embodiments, only variants which are identified as non-synonymous are considered, and only annotations indicating deleteriousness/pathogenicity are weighed (such as SIFT, PolyPhen-2, Mutation Assessor, Condel, FATHMM, CHASM, and transFIC cancer-impact tools). The value of each weighed annotation may be a value of 1 or 0 (or a scaled value between 1 and 0 for annotations with numeric values), depending on whether the conclusion is deleterious/pathogenic or not. This category returns the average of these values. These values may only be considered for annotations that exist for each variant.
The disease category recognizes that the presentation of a variant in human disease (such as cancer) is important for identifying clinical trials or therapies targeting that specific disease. Supplied with the disease indication of the patient, and the disease associated with the variant (an annotation sourced from databases such as ClinVar, or the Jackson Laboratory's Clinical Knowledgebase), variant priority can be decided in the following order: those involved in the disease of the patient, those involved in other diseases, and those not known to have any involvement in human disease (e.g., values of 1, 0.5, and 0, respectively).
Other Evidence is a “catch-all” category. In cases where there is additional data for the sample from other genomic modalities (e.g. transcriptomics), it is possible to gain additional insight about a variant. Some functional prediction tools (e.g. Ensembl Variant Effect Predictor) supply all transcripts associated with a particular variant. However, not all of these transcripts are actively expressed. Cross-referencing transcriptomic data enables the system to assign higher priority to a variant if the transcript annotations matching the variant are being actively expressed.
For a functional annotation paradigm of ‘deleterious vs. non-deleterious’, a conservative expression threshold of 0 is set in some embodiments. According to various embodiments, if the potentially deleterious transcripts are not greater than this threshold, this category is assigned a value of 0. Otherwise, a value of 1 is assigned.
After quality filtering of low confidence variants, the sum is computed for all categories. Variants are sorted and ranked in descending order.
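By way of non-limiting illustration, the category scoring, summation, and ranking described above may be sketched as follows. The category values (100 or 0 for dataset detection; the average of the deleteriousness values; 1, 0.5 or 0 for disease; 1 or 0 against an expression threshold of 0) follow the text, while the `Variant` structure, its field names, and the choice to score the other-evidence category as 0 when no transcriptomic data is available are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

DISEASE_SCORE = {"patient_disease": 1.0, "other_disease": 0.5, "none": 0.0}
EXPRESSION_THRESHOLD = 0.0  # conservative threshold described in the text


@dataclass
class Variant:
    name: str
    passes_quality: bool
    detected_in_datasets: bool
    deleterious_calls: list             # values in [0, 1], one per available annotation
    disease_relation: str               # key of DISEASE_SCORE
    expression: Optional[float] = None  # None when no transcriptomic data exists


def score(v):
    """Sum of the four category values for one variant."""
    dataset = 100.0 if v.detected_in_datasets else 0.0
    if v.deleterious_calls:
        functional = sum(v.deleterious_calls) / len(v.deleterious_calls)
    else:
        functional = 0.0
    disease = DISEASE_SCORE[v.disease_relation]
    expressed = v.expression is not None and v.expression > EXPRESSION_THRESHOLD
    other = 1.0 if expressed else 0.0
    return dataset + functional + disease + other


def prioritize(variants):
    """Drop low-confidence variants, then rank by descending score."""
    kept = [v for v in variants if v.passes_quality]
    return sorted(kept, key=score, reverse=True)
```

Because the dataset-detection value is two orders of magnitude larger than the other category values, a detected variant always ranks above an undetected one in this sketch, which reflects the heavy weighting of that category for clinical trial matching.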
Various embodiments of the aberration prioritization service provider 203 may be implemented as a stand-alone piece of software which processes one or many variant call files (and can be modified to process any data structure containing variant data and aforementioned variant-specific and database-dependent annotations) in a single-processor or parallel schema. The aberration prioritization service provider 203 is situated on-site or in the cloud and the results represent a penultimate step in retrieving the enriched approved dataset of variants (where the final step is clinician approval). For the purposes of identifying potentially disease-causing or actionable variants, there are multiple disparate annotations by which one can prioritize.
One such situation is as follows: a biopsy is sequenced using the genomic sequencer 8 according to the approved laboratory protocol (for example, whole exome sequencing); the sequencing data is processed by the variant calling pipeline 201a (see
With reference now to
Illustrative embodiments of the trial matching service provider 20s are described with reference to
In the following, some suitable embodiments of reporting service providers 204 are described.
With reference to
With reference to
With continuing reference to
The pathologist receiving a second opinion request (i.e. the second registered user) has a similar application screen as the requesting pathologist, as shown in
After selection of the second opinion aberrations is confirmed by the clinician, the one or more reporting service providers 204 automatically adjust both worklists to the combined list of selected aberrations 100 shown in
After receiving the second opinion from a second registered user, the reporting pathologist (i.e. primary pathologist, i.e. first registered user) can again access the case from his/her worklist 80 (see
With reference back to
With reference now to
With reference to
The invention has been described with reference to the preferred embodiments. Modifications and alterations may occur to others upon reading and understanding the preceding detailed description. It is intended that the invention be construed as including all such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
This application claims the benefit of U.S. Provisional Application No. 62/401,319 filed Sep. 29, 2016. U.S. Provisional Application No. 62/401,319 filed Sep. 29, 2016 is incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2017/074886 | 9/29/2017 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62401319 | Sep 2016 | US |