The approximately 4,000 Mendelian diseases of known molecular basis are major causes of morbidity and mortality. Effective medical treatment of individual patients with suspected Mendelian diseases requires molecular diagnosis of the particular disease type. Effective treatment of Mendelian diseases includes provision of therapies that target causal disease mechanisms or disease symptoms, genetic counseling of families about risk of recurrence, prognostic determination, and anticipation and amelioration of disease complications and progression. Molecular diagnosis of Mendelian diseases has traditionally been performed by Sanger sequencing individual candidate genes, one at a time, based on their likelihood of causing the symptoms observed in individual patients. This process is obfuscated, however, by the broad range of symptoms that can be manifested in each Mendelian disease and the large number of Mendelian diseases. Next generation sequencing of the whole genome (WGS) or the parts of the genome that contain sets of disease genes (whole or targeted exome sequencing) (WES) is increasingly being used for diagnosis of Mendelian diseases. Genome sequencing, whether whole genome sequencing or sets of disease genes, allows all or most of the Mendelian diseases that cause symptoms in an individual patient to be examined diagnostically at once. This may decrease the time to diagnosis or the cost of diagnostic testing. Earlier diagnosis of Mendelian diseases can enable earlier institution of specific treatments, which may engender improved patient outcomes. It has been shown that it is possible to have molecular diagnosis in 50 hours by rapid whole genome sequencing (STATseq). However, in general, the methods that identify variants in genome sequencing were optimized for common variants and population research, and select against rare or novel deleterious variants that may cause disease, and, therefore, lack sensitivity for diagnosis of genetic diseases.
Neurodevelopmental disorders (NDD), including intellectual disability, global developmental delay and autism, affect more than 3% of children. Etiologic identification of NDD often engenders a lengthy and costly differential diagnostic odyssey without return of a definitive diagnosis. The current etiologic evaluation of NDD is complex: Primary tests include neuroimaging, karyotype, array comparative genome hybridization (array CGH) and/or single nucleotide polymorphism arrays, and phenotype-driven metabolic, molecular and serial gene sequencing studies. Secondary, invasive tests, such as biopsies, cerebrospinal fluid examination, and electromyography, enable diagnosis in a small percentage of additional cases. About 30% of NDD are attributable to structural genetic variation, but more than half of patients do not receive an etiologic diagnosis. Single gene testing for diagnosis of NDD is especially challenging due to profound locus heterogeneity and overlapping symptoms.
As predicted, the introduction of WGS and WES (whole exome sequence) into medical practice has begun to transform the diagnosis and management of patients with genetic disease. Acceleration and simplification of genetic diagnosis is a result of: 1) multiplexed testing to interrogate nearly all genes on a physician's differential at a cost and turnaround time approaching that of a single gene test; 2) the ability to analyze genes for which no other test exists; and 3) the capacity to cast a wide net that can detect pathogenic variants in genes not yet on the clinician's differential. The latter proves particularly powerful for diagnosing patients with very rare or newly discovered genetic diseases, and for patients with atypical or incomplete clinical presentations. Furthermore, new gene and phenotype discovery has increasingly become part of the diagnostic process. The importance of molecular diagnosis is that care of such patients can then shift from interim, phenotypic-driven management to definitive treatment that is refined by genotype. Although early reports indicate that WES enables diagnosis of neurologic disorders, the clinical and cost effectiveness are not known. Data are needed to guide best practice recommendations regarding testing of probands (affected patients) alone versus trios (proband plus parents), use of WES versus WGS, and the appropriate prioritization of genomic testing in an etiologic evaluation for various clinical presentations.
The effectiveness of a WGS and WES sequencing program for children with NDD, featuring an accelerated sequencing modality (rapid WGS, STATseq) for patients with high acuity illness, was reported. Diagnostic yield and an initial analysis of the impact on time to diagnosis, cost of diagnostic testing and subsequent clinical care are outlined herein.
Herein are described methods for genome sequencing for diagnosis of genetic diseases with enhanced sensitivity. In one embodiment, whole genome sequencing is described herein with genome-wide genotyping and provisional diagnosis in 24 hours. By combining results from two parallel bioinformatic methods, 2.8 billion nucleotides were genotyped and 4.9 million variants were detected. This technique increased the identification of rare, potentially disease causing variants 2.5-fold without significant loss of specificity. In 17 families (21 acutely ill neonates and infants) enrolled prospectively, clinical whole genome sequencing gave 10 definitive molecular diagnoses, and clinical management was modified in four. Therefore, rapid whole genome sequencing with twin bioinformatic analyses is effective for diagnosis of genetic disorders. In addition, rapid whole genome sequencing with multiple independent analysis methods (STATseq) produces a superset of highly sensitive variant calls, which increases the sensitivity, rate, or likelihood of diagnosis of genetic disorders.
The system of the present invention is used to perform nucleotide sequence variant detection using two or more independent analysis methods to produce a superset of highly sensitive variant calls (STATseq). Each independent analysis method includes at least one sequence alignment algorithm and at least one variant detection mechanism. Since variant detection methods have individual strengths and weaknesses, the combining of results from at least two methods produces a set of variant calls that could not have been produced by using a single analysis method. These results provided for a significant increase in the number of variants detected. The results include at least a 2.7 fold increase in the number of variants of types that can cause genetic disease.
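The combining step described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each analysis method's calls have already been parsed into sets keyed by chromosome, position, reference allele and alternate allele; the function name `superset_calls` and the toy call sets are hypothetical and are not part of the described system.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def superset_calls(calls_by_pipeline: Dict[str, Set[Variant]]) -> Dict[Variant, Set[str]]:
    """Return every variant seen by any pipeline, mapped to the pipelines that called it."""
    merged: Dict[Variant, Set[str]] = {}
    for pipeline, calls in calls_by_pipeline.items():
        for variant in calls:
            merged.setdefault(variant, set()).add(pipeline)
    return merged


# Hypothetical toy call sets; real input would be parsed from each pipeline's output files.
diagnostic = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
rapid = {("chr1", 1000, "A", "G"), ("chrX", 42, "G", "GA")}

merged = superset_calls({"diagnostic": diagnostic, "rapid": rapid})

# Variants reported by only one method are the ones a single pipeline would have missed.
unique_to_one = {v for v, sources in merged.items() if len(sources) == 1}
print(len(merged), len(unique_to_one))
```

Recording which method produced each call, rather than taking a bare set union, keeps the provenance available for later review of calls supported by only one method.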
In addition, the system of the present invention can provide rapid testing and interpretation of genetic diseases that involve large nucleotide inversions, large deletions, insertions, large triplet repeat expansions, gene conversions and complex rearrangements.
Other and further objects of the invention, together with the features of novelty appurtenant thereto, will appear in the course of the following description.
In the accompanying drawings, which form a part of the specification and are to be read in conjunction therewith:
FIG. MD 1 is a flow diagram of the study of the diagnostic sensitivity and accuracy of STATseq.
FIG. MD 2 is an illustration of the Kaplan-Meier survival curves of NICU and PICU infants with and without a genetic disease diagnosis, shown in (a), and the clinical time courses of patient CMH487, shown in (b), and patient CMH569, shown in (c).
FIG. ND s1 is an illustration of paired read alignments to a 5,294 nt interval encompassing the intronless MAGEL2 gene on Chr 15q11.2, shown in the Integrative Genomics Viewer.
FIG. ND illustrates diagnoses and inheritance patterns in 100 NDD families tested by genome or exome sequencing, where (a) shows diagnostic outcomes in 100 families and (b) shows inheritance pattern in 45 families. AR, autosomal recessive.
FIG. ND 2 shows clinical features of patients CMH301, CMH663, CMH334 and CMH335. Patient CMH301, with multiple congenital anomalies-hypotonia-seizures syndrome 2 (PIGA, c.68dupG, p.Ser24LysfsX6) at age 2 years (A), 6 years (B), and 10 years (C). (D) Infant CMH663, with compound heterozygous mutations in the mitochondrial malate/citrate transporter (SLC25A1). (E) Male patients CMH334, (left), and CMH335 (right) with X-linked Rett syndrome (MECP2 c.419C>T, p.A140V), and their mother.
FIG. ND 3 provides for the expression of GPI-anchored proteins on peripheral blood cells of patient CMH301. CMH301 was diagnosed with multiple congenital anomalies-hypotonia-seizures syndrome 2. Flow cytometric signals corresponding to CMH301 are shown by the green lines, his mother CMH303 is shown in blue, and a normal control in red. Erythrocytes were stained with anti-CD59 antibodies. Granulocytes, B cells, and T cells were stained with fluorescent aerolysin (FLAER). The orange line represents an unstained normal control. The X-axis is the number of cells. The Y-axis is fluorescence intensity, representing the abundance of protein expression on the cell surface. CMH301 has normal expression of CD59 and decreased expression of glycosylphosphatidylinositol-anchored proteins on granulocytes, B lymphocytes and T lymphocytes.
FIG. ND 4 illustrates the effect of citrate supplementation on urinary citrate and 2-hydroxyglutarate in patient CMH663. CMH663 had combined D-2- and L-2-hydroxyglutaric aciduria. CMH urinary citrate reference value for normal urine is >994 mmol/mol creatinine. CMH urinary 2-OH-glutarate reference value for normal urine is <89 mmol/mol creatinine.
The requirements of genome sequencing for population research and individual diagnosis contrast sharply (
Variants were identified and genotyped with the sensitive Genomic Short-read Nucleotide Alignment Program (GSNAP) and the Genome Analysis Tool Kit (GATK) best practices (Published pipeline). In contrast to 91 genomes analyzed with pipelines developed for population research, the Published pipeline accessed 28% more of the genome and yielded 91% more indels (See Table s1 below).
Table s1 is a comparison of metrics of the Published, Rapid, Diagnostic and Dual pipelines in three genome sequencing samples with each other and those of 91 other published genome sequencing samples. Comparison of sensitivity and specificity of nucleotide variant genotypes of 18- and 26-hour 2×100 cycle HiSeq 2500 genome sequencing of samples UDT_173 and NA12878 with four alignment methods and two variant calling methods. In the Published pipeline, variants were identified and genotyped with the sensitive Genomic Short-read Nucleotide Alignment Program (GSNAP) and the Genome Analysis Tool Kit (GATK) best practices. The Diagnostic pipeline is the novel combination of methods that were developed to cure rare variant loss (GSNAP version 2012.07.12, with default parameters, and GATK version 1.6.13, without Variant Quality Score Recalibration). The Rapid pipeline uses the iSAAC alignment algorithm, version 01.13.01.31, and the starling variant caller, version 2.0.2. The Dual pipeline is the superset of the Diagnostic and Rapid pipelines. The set of consensus correct genotypes (Truth Set) for sample UDT_173 were from hybridization to the Omni4 SNV array. Correct genotypes for NA12878 were from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/variant_calls/NIST
However, these methods still favored specificity over sensitivity, leading to the removal of rare, novel variants that were present in aligned sequences (bam+, the binary version of the Sequence Alignment/Map format) and supported by multiple, non-clonal reads and high-quality alignments, but absent from Variant Call Format files (vcf−). Removal of rare and novel variants is problematic for clinical testing, as these are enriched for disease causing mutations, and it significantly decreases the diagnostic yield of clinical genome sequencing.
To rectify this phenomenon, a set of well supported, rare, potentially pathogenic bam+, vcf− variants in disease genes were used to optimize genome sequencing pipeline components, versions and parameters for diagnostic sensitivity (See
Table s2 is a comparison of sensitivity and specificity of nucleotide variant genotypes of 18- and 26-hour 2×100 cycle HiSeq 2500 genome sequencing of samples UDT_173 and NA12878 with four alignment methods and three variant calling methods. In the Published pipeline, variants were identified and genotyped with the sensitive Genomic Short-read Nucleotide Alignment Program (GSNAP) and the Genome Analysis Tool Kit (GATK) best practices. The Diagnostic pipeline is the novel combination of methods that were developed to cure rare variant loss (GSNAP version 2012.07.12, with default parameters, and GATK version 1.6.13, without Variant Quality Score Recalibration). BWA is the Burrows-Wheeler Aligner, version 0.6.2. Correct genotypes (Truth Set) for sample UDT_173 were from hybridization to the Omni4 SNV array. Correct genotypes for NA12878 were from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/variant_calls/NIST. Portion a of Table s2 shows four comparisons of the sensitivity and specificity of variant genotypes of GATK with and without VQSR in sample UDT_173. The comparisons feature two alternative HiSeq 2500 genome sequencing run times and two short-read alignment algorithms (GSNAP and BWA). Portion b of Table s2 compares the sensitivity and specificity of four genome sequencing alignment and variant calling pipelines in three samples. The four were the Published pipeline, the Diagnostic pipeline, a Rapid pipeline (iSAAC 01.13.01.31 and starling 2.0.2, respectively), and the superset of those methods (Dual pipeline). Portion c of Table s2 is a pairwise comparison of three alignment algorithms (GSNAP, BWA and CASAVA), showing the overlap of variant calls following application of the GATK.
Table s3 is a comparison of sensitivity and specificity of nucleotide variant genotypes of exomes, analyzed in batches of 12 (Illumina TruSeq panel enrichment, 8 GB, 2×100 cycles HiSeq 2500), with OMNI SNP array results.
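The comparisons of pipeline genotypes against a Truth Set summarized in Tables s1 to s3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring procedure actually used: genotypes are represented as strings such as "0/1", a site missing from the pipeline output is treated as homozygous reference, and a non-reference truth genotype counts as detected only when the called genotype matches exactly. The function name and site labels are hypothetical.

```python
from typing import Dict, Tuple

HOM_REF = "0/0"


def sensitivity_specificity(truth: Dict[str, str], calls: Dict[str, str]) -> Tuple[float, float]:
    """Score pipeline genotypes against truth genotypes at the truth-set sites."""
    tp = fn = tn = fp = 0
    for site, true_gt in truth.items():
        # Assumption: a site absent from the call set is homozygous reference.
        called_gt = calls.get(site, HOM_REF)
        if true_gt != HOM_REF:
            if called_gt == true_gt:
                tp += 1
            else:
                fn += 1
        elif called_gt == HOM_REF:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: five array sites and the genotypes a pipeline reported.
truth = {"s1": "0/1", "s2": "1/1", "s3": "0/0", "s4": "0/0", "s5": "0/1"}
calls = {"s1": "0/1", "s2": "0/1", "s4": "0/1", "s5": "0/1"}

sens, spec = sensitivity_specificity(truth, calls)
print(round(sens, 3), round(spec, 3))
```

Because the score is restricted to sites present in the truth set, it shares the limitation noted for common SNP arrays: rare or novel variants outside the truth set do not contribute to either measure.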
The specificity of the pipelines in the same samples was compared. Genome-wide array genotypes of common single nucleotide polymorphisms (SNPs) are frequently used for calibration of genome sequencing variant calls. The Diagnostic pipeline had 4.9% greater sensitivity for highly polymorphic SNP genotypes than the Published pipeline, while increasing false positives by only 0.17% (
When used to benchmark genome sequencing, common SNP arrays can overestimate true genotype sensitivity and underestimate accuracy. Therefore, the sensitivity and accuracy of the pipelines were compared in 47-fold whole genome sequencing of a European female (NA12878), for whom there is an accurate consensus set of 2.3 billion genotypes. The Diagnostic pipeline yielded 17% more genotypes than the Published method. 28% of the added genotypes were in the consensus set and correct, while 8.2% were present and incorrect (See
Segregation analysis of parent-child genotypes often aids in identification of rare genetic diseases in a proband. Therefore, the pipelines in genome sequencing of four trios were compared. Remarkably, 95% of an average 6.5 million variants added by the Diagnostic pipeline had concordant genotypes in trios (See Table s4 below). In agreement with singleton genome sequencing comparisons, the new calls were enriched for rare variants, especially those that were known or likely to cause genetic diseases (90% increase in rare ACMG category 1-3 variants). Notably, 69% of these had concordant genotypes in trios. These were especially likely to be true positives, since the prior probability of their being false calls was <0.0001. In contrast, there was only a 21% increase in rare, likely pathogenic false positive variants. However, the latter was likely an overestimate, since it was unadjusted for true positive de novo variants. In summary, two lines of evidence suggested that the Diagnostic pipeline reported twice as many variants in singleton, deep genome sequencing that could potentially cause rare genetic diseases, without an obfuscating increase in false positives.
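The notion of concordant genotypes in trios can be made concrete with a short sketch. It assumes biallelic autosomal sites with unphased genotypes written as pairs of alleles (0 for reference, 1 for alternate); the function names and example sites are hypothetical, and sex chromosomes and mitochondrial sites would need separate rules.

```python
from typing import Iterable, Tuple

Genotype = Tuple[int, int]  # two alleles at one autosomal site: 0 = reference, 1 = alternate


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def concordance_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of sites whose (child, mother, father) genotypes are Mendelian consistent."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trio genotypes supplied")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return consistent / len(trios)


# Hypothetical sites, each given as (child, mother, father).
sites = [
    ((0, 1), (0, 1), (0, 0)),  # consistent: alternate allele inherited from the mother
    ((1, 1), (0, 1), (0, 1)),  # consistent: one alternate allele from each parent
    ((0, 1), (0, 0), (0, 0)),  # discordant: a de novo variant or a false call
    ((1, 1), (1, 1), (0, 0)),  # discordant: the father cannot transmit an alternate allele
]
rate = concordance_rate(sites)
print(rate)
```

A discordant site is not necessarily an error, which is why the text above notes that the false positive estimate was unadjusted for true de novo variants.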
Table s4 is a comparison of concordant and discordant variant genotypes in whole genome sequencing of four sets of trios with the Published and Diagnostic pipelines, showing results for rare, pathogenic variants and all variants.
Recent studies have shown that variants identified by alignment algorithms and variant callers have less overlap than anticipated, challenging the notion of a single, gold standard pipeline. In light of this, a dual pipeline that reported the superset of two alignment algorithms and variant callers was evaluated. The iSAAC aligner and associated starling variant caller (Rapid pipeline) were 8-fold faster than other methods, conforming to another major attribute of genome sequencing for neonatal diagnosis. The Rapid pipeline did identify variants other than those reported by the Published pipeline (
With the caveat of a modest increase in false positives, these results have implications for human genome evolution. Two common measures of this are variant density and heterozygosity. The Dual pipeline accessed 28% more of the reference genome than that reported in 91 prior whole genome sequences, and the variant density and heterozygosity were 1.71/kb and 1.09/kb, respectively, which were increases of 15% and 25% (See Table s2).
The increase in rare, potentially pathogenic variants was even greater (2.7-fold,
24-Hour Whole Genome Sequencing for Genetic Disease Diagnosis
For practical use in guidance of management of acute illness in hospitalized children with suspected genetic diseases, genomic diagnosis must be extremely rapid. While the feasibility of genomic diagnosis of rare genetic diseases in 50 hours was recently demonstrated, the practical time-to-result for a trio was typically five to seven days. This reflected the time for necessary discussion and decision making by physicians and parents, the consent process, and the practicalities of trio phlebotomy and trio sequencing. Therefore, a two track, expedited diagnostic genome sequencing workflow was developed, whereby a first result was obtained in the proband with the Rapid pipeline after 24 hours, with subsequent results from the Diagnostic pipeline (See
Table s5 is a comparison of the metrics of sequence yield and quality of 18- and 26-hour genome sequencing (HiSeq 2500 2×100 nt rapid-run mode). In portion a of Table s5, R2 refers to read 2. 18 hour runs had marginally better quality than 26 hour runs, given slight differences in average cluster density. This might have been due to the shorter time of slide exposure to laser light and lesser loss in reagent stability. Portion b of Table s5 is a comparison of 18- and 26-hour genome sequencing metrics (HiSeq 2500 2×100 nt rapid-run mode), showing correlations between cluster density and metrics of sequence yield and quality. Cluster density explained much of the variability in yield, quality score, error rate and % reads passing filter.
An extreme bottleneck in diagnostic genome sequencing has been variant interpretation. To focus first on relevant variant interpretation, a healthcare provider entered the clinical features present in the neonate into clinicopathological correlation tools that mapped them to the corresponding diseases and genes. Interpretation of genome sequencing-derived variants and provisional molecular diagnosis were performed in less than one hour with VIKING interpretation software, which integrated the superset of relevant disease mappings and annotated variant genotypes, and allowed dynamic filtering of variants based on variables such as ACMG category, MAF, genotype, gene or inheritance pattern (See
Diagnostic Yield in a Prospective Case Series
Feasibility studies do not necessarily convey clinical utility. To assess the diagnostic utility of rapid genome sequencing, 56 individuals from 17 families were prospectively enrolled, with 21 undiagnosed newborns, stillborns or infants with symptoms and signs that suggested a genetic disorder (See Tables 1 and s6 below). Probands were selected for an assumed high pretest probability of genetic diagnosis and disease acuity, and were from three tertiary-care children's hospitals. Definitive molecular diagnoses in 48% (10) of affected individuals were identified. All potentially disease causing variants were confirmed by Sanger sequencing. Remarkably, five different patterns of inheritance were observed, and causative mutations occurred de novo in three probands. Consistent with this, recent data has suggested a surfeit of de novo mutations causing genetic diseases (Soden et al., in preparation). The spectrum of presentations was very broad and the clinical features prompting nomination for genome sequencing were frequently atypical for the condition that was diagnosed (See Table s6 below). A novel, plausible candidate disease gene was identified in two of eighteen probands.
Molecular diagnoses do not necessarily alter clinical care or improve outcomes. It was found that rapid diagnoses of genetic diseases in acutely ill neonates aided in selection for palliative care and genetic counseling for avoidance of unplanned recurrence. In addition, timely genomic diagnosis favorably altered the clinical management of three probands (See Table 1 below).
Table 1 is a prospective assessment of the utility of rapid genome sequencing for molecular diagnosis and treatment of 21 acutely ill neonates and infants in 17 families. Rapid genome sequencing or exome sequencing was performed on 56 individuals.
CMH586, a two month old infant with normal results on expanded newborn screening, presented with failure to thrive, lactic acidemia and hypoglycemia. An interim clinical diagnosis of pyruvate dehydrogenase complex (PDHC) deficiency was made based on worsening lactic acidemia with intravenous dextrose, and a ketogenic diet was initiated. Genome sequencing did not detect mutations in PDHC, but identified a homoplasmic mutation in both the proband and maternal mitochondrial DNA indicative of a diagnosis of transient cytochrome c oxidase deficiency (MIM #500009). Upon diagnosis, the ketogenic diet was discontinued and other interventions were considered. CMH569, a neonate with persistent hypoglycemia and congenital hyperinsulinism, was found to have uniparental, paternal isodisomy for a mutation in sulfonylurea receptor 1 (ABCC8), which causes focal insulin overproduction in pancreatic β cells (MIM #256450). This diagnosis led to a curative, subtotal pancreatectomy. Had this diagnosis not been made, the neonate would likely have undergone total pancreatectomy, leading to lifelong insulin dependent diabetes mellitus. CMH487 was a two month old who developed laboratory signs consistent with hemophagocytic lymphohistiocytosis (HLH) but with a confusing clinical picture. He was found to have compound heterozygous mutations in perforin 1 (PRF1), confirming HLH, type 2 (MIM #603553), was treated with immunosuppressants, and his liver function improved.
In summary, 24-hour genomic diagnosis is possible for neonatal genetic diseases. In a small case series, timely genomic diagnoses were made in one half of affected individuals, and these diagnoses influenced clinical management in ˜30% of patients. This preliminary evidence suggests that the burden of undiagnosed genetic diseases in intensive care nurseries is greater than anticipated, although these cases were carefully selected for inclusion. Larger, prospective studies have recently begun to evaluate the potential benefits and harms of medical genome sequencing in apparently healthy, as well as acutely ill, newborns. Despite the improvements in diagnostic sensitivity for nucleotide variants described herein, there remain substantial needs for diagnosis of genomic structural and copy number variants, particularly in the one hundred to one million nucleotide range. Concomitant mRNA sequencing may provide functional evidence for pathogenicity of variants of uncertain significance, hypothesis generation in patients whose genome sequences are uninformative, and identification of molecular pathway targets for possible, novel interventions. Further development of web-based tools for candidate disease nomination and genome interpretation may enable democratization of the neonatal genome. Local hospital-based genome sequencing could be married with centralized, expert diagnostic interpretation and orphan treatment guidance. Finally, there is an immediate, profound need for the development of skills and best practices for conveying actionable genomic information both to healthcare providers and parents. Without genomic counselors and genomic neonatologists, the diagnostic genome cannot become the new standard-of-level IV NICU care for orphan genetic diseases.
Methods Summary: Informed written consent was obtained from adult subjects and parents of living children. The 56 prospective samples were from 17 families with 21 affected probands and siblings that presented in infancy, were without molecular diagnoses, and were enrolled for rapid genome sequencing (See Tables s6-s8 listed out below). 26-hour genome sequencing was performed as described. For 18-hour genome sequencing, isolated genomic DNA was sheared using a Covaris S2 Biodisruptor, end repaired, A-tailed and adaptor ligated. PCR was omitted. Libraries were purified using SPRI beads (Beckman Coulter). Samples for genome sequencing were each loaded onto two flowcells, and sequenced with 2×101 cycles on Illumina HiSeq2500 instruments in rapid run mode (26 hours) or with customized faster flowcell scanning times (18 hours). Isolated genomic DNA was prepared for Illumina TruSeq/Nextera exome sequencing using standard protocols and sequenced on HiSeq 2000 or 2500 instruments with TruSeq v3 or TruSeq Rapid reagents to a depth of >8 GB. Sequences were analyzed as described or as noted in the text and detailed in the supplementary methods.
Case Selection
The study was conducted at a children's hospital with 314 beds, including 70 level IV NICU beds. In 2011, the NICU had 86% bed occupancy. Retrospective samples, UDT103 and UDT173, were blinded validation samples with known molecular diagnoses for a genetic disease. Sample NA12878 was obtained from the Coriell Institute repository. The 56 prospective samples were from 17 families with 21 affected probands and siblings that presented in infancy, were without molecular diagnoses, and were enrolled for rapid genome sequencing (See Table s6 below).
Table s6 is a prospective assessment of the utility of rapid genome sequencing for molecular diagnosis and treatment of 21 acutely ill neonates and infants in 17 families. Rapid whole genome sequencing or exome sequencing was performed on 56 individuals. The electronic medical record was examined for each affected individual and the clinical features of the patient's illness were recorded using Human Phenotype Ontology (HPO) terms. Gene symbols, cDNA coordinates and polypeptide coordinates are recorded for mutation alleles.
Tables s7 and s8 below list all of the experimental data generated herein.
Table s7 shows a summary of experimental data related to comparisons of 18-hour and 26-hour HiSeq 2500 2×100 cycle runs. I refers to iSAAC with starling, GG refers to GSNAP and GATK with best practices, NGG refers to GSNAP and GATK without VQSR. GSNAP is the Genomic Short-read Nucleotide Alignment Program. The Genome Analysis Tool Kit (GATK) is a software library for variant identification and genotyping. The final stage in the GATK best practices with ˜40× human genome sequencing is to use known variants as training data to establish the probability of each variant's accuracy (Variant Quality Score Recalibration, VQSR), and removal of low-probability variants. iSAAC and starling are an extremely rapid read alignment and variant calling method pair. High sensitivity for rare variant identification was obtained herein by use of the superset of variants generated by two alignment and variant identification pipelines (GSNAP version 2012.07.12 with GATK version 1.6.13 without VQSR, and iSAAC version 01.13.01.31 with starling version 2.0.2). Rare or novel variants do not overlap sufficiently with extant training data to provide a statistically significant prior, so VQSR was not included.
Table s8 shows a summary of genome sequencing data generated for the current study. All samples were sequenced in two flowcells in single runs on HiSeq 2500 instruments with 2×100 cycles. Unless otherwise noted, genome sequencing was performed in rapid run mode (26 hours). PF: reads passing filter. %>Q30: percent nucleotides with Phred-like quality score greater than or equal to 30.
For 26-hour genome sequencing, isolated genomic DNA was prepared for rapid genome sequencing using the TruSeq PCR-Free sample preparation (Illumina Inc.). Briefly, 1000-1500 ng of DNA was sheared using a Covaris LE220 focused-ultrasonicator, end repaired, A-tailed and adaptor ligated. No PCR amplification was performed. Libraries were purified using Ampure beads. Libraries were assessed for appropriate size with a 2100 Bioanalyzer (Agilent). Quantitation was carried out by real-time PCR or a Qubit 2.0 Fluorometer (Life Technologies). Libraries were denatured using 2N NaOH and diluted to between 5 and 20 pM (average 12.5 pM) in hybridization buffer. Approximately 1% PhiX library (Illumina) was spiked in as a real-time control.
For 18-hour genome sequencing, isolated genomic DNA was prepared using a modification of the standard Illumina TruSeq sample preparation. Briefly, DNA was sheared using a Covaris S2 Biodisruptor, end repaired, A-tailed and adaptor ligated. PCR was omitted. Libraries were purified using SPRI beads (Beckman Coulter). For 18-hour genome sequencing, the amount of DNA used was optimized, based on experience of varying the input from representative DNA samples, and allowed a concentration to be selected that produced a known cluster density after the library was denatured using 0.1M NaOH and presented to the flowcell.
Samples for rapid genome sequencing were each loaded onto two flowcells, followed by sequencing on Illumina HiSeq2500 instruments that were set to rapid run mode (26 hour run) or with customized faster flowcell scanning times (18 hour run). Cluster generation, followed by 2×101 cycle sequencing reads separated by paired-end turnaround, was performed automatically on the instrument.
Isolated genomic DNA was prepared for Illumina TruSeq/Nextera exome sequencing using standard Illumina TruSeq/Nextera protocols. Samples were enriched twice and sequenced on HiSeq 2000 or 2500 instruments with TruSeq v3 or TruSeq Rapid reagents to a depth of >8 GB of 2×100 nt reads.
Genome and exome sequencing were performed as research, not in a manner that complies with routine diagnostic tests as defined by the CLIA guidelines.
Sequence Analysis
The basal (Published pipeline) method of sequence analysis for 50-hour diagnostic genome sequencing was alignment to the reference nuclear and mitochondrial genome sequences (Hg19 and GRCh37 [NC_012920.1], respectively) using GSNAP version 2012.1.27 or BWA version 0.6.2, followed by variant identification and genotyping with GATK version 1.4.5 with best practices. GSNAP is the Genomic Short-read Nucleotide Alignment Program. The Genome Analysis Tool Kit (GATK) is software for variant identification and genotyping. A set of well supported bam+, vcf− variants was identified in disease genes to guide parameter tuning and optimization of genome sequencing pipeline components, versions and parameters for sensitivity (
The following Table 3 is a table of selected short-read DNA sequence alignment methods.
The following table is a table of selected DNA sequence variant identification methods.
Genome sequencing refers to methods that decode the sequence of those regions of the genome that are relevant for disease diagnosis. The following table is a table of selected genome sequencing methods that are relevant for disease diagnosis.
Clinicopathologic Correlation
The features of the patients' diseases were mapped to likely candidate genes. This was performed manually by a board-certified pediatrician and medical geneticist, or automatically by entry of terms describing the patients' presentations into the clinicopathologic correlation tools SSAGA or Phenomizer. These tools were designed to enable physicians to delimit whole genome sequencing analyses to genes of causal relevance to individual clinical presentations, in accord with published guidelines for genetic testing in children. Upon entry of the clinical features of an individual patient, SSAGA or Phenomizer identified the corresponding superset of relevant diseases and genes, rank ordered by number of matching terms or by probability.
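The rank ordering by number of matching terms can be illustrated with a minimal sketch. This is not the SSAGA or Phenomizer implementation: the disease-to-feature map, the disease and feature names, and the function `rank_diseases` are hypothetical, and the score is a plain count of shared terms rather than a probability.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical disease -> clinical-feature map (illustrative, not curated data).
DISEASE_FEATURES: Dict[str, Set[str]] = {
    "disease_A": {"seizures", "hypotonia", "microcephaly"},
    "disease_B": {"seizures", "ataxia"},
    "disease_C": {"failure to thrive"},
}

def rank_diseases(patient_terms: Set[str],
                  disease_features: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Return (disease, matching-term count) pairs, best match first.

    Diseases sharing no term with the patient are dropped; ties are
    broken alphabetically so the ordering is deterministic.
    """
    scored = [(d, len(f & patient_terms)) for d, f in disease_features.items()]
    scored = [(d, n) for d, n in scored if n > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# A patient described by two terms matches disease_A twice and disease_B once.
ranking = rank_diseases({"seizures", "hypotonia"}, DISEASE_FEATURES)
```

In this toy example the disease sharing the most terms with the patient is listed first, and a disease with no shared term does not appear in the candidate list.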
VIKING (Variant Integration and Knowledge Interpretation in Genomes)
VIKING is a software tool for interpreting a patient's genome sequencing results that integrates raw sequencing results, variant characterization results and patient symptoms. Sequencing results are presented as a list of nucleotide variants, or places where the patient's genome sequence differs from that of the human reference genome. These variants are characterized by the RUNES pipeline, which seeks to determine the significance of each variant through comparison to known databases and other in silico predictions. Patient symptoms are loaded from SSAGA along with the SSAGA-predicted diseases and genes that are associated with the symptoms.
VIKING uses the information from SSAGA and RUNES to sort and filter the list of variants detected in genome sequencing so that only variants in genes indicated by the patient's symptoms are displayed and, further, so that genes are ordered by the number of SSAGA terms associated with them. This allows a researcher to quickly obtain a list of the nucleotide variants most relevant to the patient's symptoms.
VIKING offers several additional features to assist in the interpretation of sequencing results, including dynamic filtering of results by gene, disease or term; filtering by minor allele frequency so that only rare variants are displayed; filtering for genes that have compound heterozygous variants or a homozygous variant; and the ability to display all RUNES annotations for each variant. Aligned sequences containing variants of interest were inspected for veracity in pedigrees using the Integrative Genomics Viewer.
VIKING is implemented as a Java (jdk 1.6) Swing application that connects to the RUNES and SSAGA databases using the Java Database Connectivity (JDBC) API. The VIKING client application is cross-platform and can run on Windows, Mac OSX and Linux environments.
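The sorting and filtering behavior described above can be sketched as follows. This is an illustrative Python sketch, not VIKING's code (which is a Java application). The `Variant` class, the `prioritize` function, the gene names and the default 1% minor allele frequency cutoff are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Variant:
    gene: str      # gene symbol (hypothetical placeholders below)
    change: str    # nucleotide change
    maf: float     # minor allele frequency; 0.0 if never observed

def prioritize(variants: List[Variant],
               gene_term_counts: Dict[str, int],
               max_maf: float = 0.01) -> List[Variant]:
    """Keep rare variants in symptom-indicated genes, most-matched genes first.

    gene_term_counts maps each gene indicated by the patient's symptoms to
    the number of clinical terms associated with it. The max_maf default is
    an assumed cutoff, not a documented VIKING setting.
    """
    kept = [v for v in variants
            if v.gene in gene_term_counts and v.maf < max_maf]
    return sorted(kept,
                  key=lambda v: (-gene_term_counts[v.gene], v.gene, v.change))

variants = [
    Variant("GENE1", "c.100A>G", 0.0002),
    Variant("GENE2", "c.5del", 0.0),
    Variant("GENE2", "c.77C>T", 0.15),  # common variant: removed by the MAF filter
    Variant("GENE3", "c.9G>A", 0.0),    # gene not indicated by symptoms: removed
]
gene_term_counts = {"GENE1": 1, "GENE2": 3}
shortlist = prioritize(variants, gene_term_counts)
```

Only two of the four example variants survive, and the variant in the gene matched by three terms is listed ahead of the variant in the gene matched by one.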
Clinical Study 1
Characteristics of Enrolled Patients—A biorepository was established at a children's hospital in the central United States for families with one or more children suspected of having a monogenetic disease, but without a definitive diagnosis. Over a 33 month period, 155 families with heterogeneous clinical conditions were enrolled into the repository and analyzed by WGS or WES for diagnostic evaluation. Of these, 100 families had 119 children with NDD and were the subjects of the analysis reported herein (ND Table 1). Standard WES or rapid WGS were performed based on acuity of illness: 85 families with affected children followed in ambulatory clinics received non-expedited WES, followed by non-expedited WGS if WES was unrevealing; 15 families with infants who were symptomatic at or shortly after birth and in neonatal intensive care units (NICU) or pediatric intensive care units (PICU) received immediate, rapid WGS (ND Table 1). The mean age of the affected children in the ambulatory clinic group was approximately 7 years at enrollment (ND Table 2). Symptoms were apparent at an average of less than one year of age in most children (ND Table 2). The clinical features of each affected child were ascertained by examination of electronic health records and communication with treating clinicians, and translated into Human Phenotype Ontology terms. The most common features of the 119 affected children from these families were global developmental delay/intellectual disability, encephalopathy, muscular weakness, failure to thrive, microcephaly, and developmental regression (ND Table 1). The most common phenotype among children in the non-acute group was global developmental delay/intellectual disability (61%). Among infants enrolled from intensive care units, seizures, hypotonia, and morphological abnormalities of the central nervous system were most common. Consanguinity was noted in only 4 families. 
Our intention was to enroll and test parent-child trios; in practice an average of 2.55 individuals were tested per family.
WES and WGS Data—
WES was performed in 16 days, to a depth of >8 gigabases (GB) (mean coverage >80-fold; Table S1). Six ambulatory patients received rapid WGS by HiSeq X Ten after negative analysis of WES. Rapid WGS (STATseq) was performed in acutely ill patients using a 50-hour protocol, to an average depth of at least 30-fold (ND Table s1). Nucleotide (nt) variants were identified with a pipeline optimized for sensitivity to detect rare new variants, yielding 4,855,911 variants per genome and 196,280 per exome (ND Table s1). Variants with allele frequencies <1% in a database of ~3,500 individuals previously sequenced at our center, and of types that are potentially pathogenic, as defined by the American College of Medical Genetics, averaged 560 variants per exome and 835 per genome (ND Table s1).
Genomic Diagnostic Results—
A definitive molecular diagnosis of an established genetic disorder was identified in 45 of the 100 NDD families (53 of 119 affected children) and confirmed by Sanger sequencing (Table s3). In contrast, one diagnosis was made by clinical Sanger sequencing during the three year study period concurrent with genomic sequencing. That patient, CMH725, had CHD7 (Chromodomain Helicase DNA-binding protein 7)—associated CHARGE (Coloboma, Heart Anomaly, Choanal Atresia, Retardation, Genital and Ear anomalies) syndrome (Mendelian Inheritance in Man [MIM] #214800). The characteristics of families receiving diagnoses by WGS and WES were explored (ND Tables s2 and s3). Diagnoses occurred more commonly when the clinical history included failure to thrive or intrauterine growth retardation (p=0.04) (ND Table s3). No other clinical characteristic examined was associated with a change in rate of molecular diagnosis (ND Table s3). The diagnostic rate differed between the acutely ill infants and non-acutely ill older patients. 73% (11 of 15) of families with critically ill infants were diagnosed by rapid WGS. 40% (34 of 85) of families with children followed in ambulatory care clinics, who had been refractory to traditional diagnosis, received diagnoses: 33 by WES and one by WGS after negative WES. Rapid WGS in infants was performed at or near symptom onset. The non-acute, ambulatory clinic patients were older children (average age 83.6 months) and had received a much longer period of subspecialty care and considerable prior diagnostic testing (ND Table s4). These patients had received an average of 13.3 prior tests/panels (range 4-36) with a mean cost of $19,100, whereas the acute care group had received, on average, 7 prior diagnostic tests (range 1-15) with a mean cost of $9,550. 
In patients who received diagnoses, the inheritance of causative variants was autosomal dominant in 51% (44% de novo, 7% inherited), autosomal recessive in 33% (22% compound heterozygous, 11% homozygous), X-linked in 9% (2% de novo, 7% inherited), and mitochondrial in 6.6% (4.4% de novo, 2.2% inherited) (Table 3). De novo mutations accounted for 51% (23 of 45) of diagnoses overall and 62% (23 of 37) of diagnoses in families without a prior history of NDD. Paternity was confirmed by segregation analysis of private variants in all diagnoses associated with de novo mutations in trios.
For patients receiving diagnoses, the degree of overlap between the canonical clinical features expected for that disease and the observed clinical features in the patient was sought. Human Phenotype Ontology terms for the clinical features in each of the 51 affected children were mapped to ~5,300 MIM diseases and ~2,900 genes (ND Table s2). The Phenomizer rank of the correct diagnosis among the prioritized list of diseases matching the observed clinical features was a measure of the goodness of fit between the observed and expected presentations. Among the 41 affected children for whom the rank of the molecular diagnosis on the Phenomizer-derived candidate gene list was available, the median rank was 136th (range 1st to 3103rd, ND Table s2).
As anticipated, the time to diagnosis with 50-hour WGS was much shorter than routine WES or WGS (ND Table 2). Among the 11 families receiving 50-hour WGS, the fastest times to final report of a confirmed diagnosis were 6 days (n=1), 8 days (n=1) and 10 days (n=2) (Table 2). Time to diagnosis was longer for recently described or previously undescribed genetic diseases and in patients whose phenotypes were atypical for the causal gene, as measured by high Phenomizer ranking or divergence from the expected disease course, such as in case CMH301 presented below.
In addition to the 45 families receiving definitive molecular diagnoses, potentially pathogenic nucleotide variants were identified in candidate disease genes in 9 families. In the future, validation studies will determine whether these are indeed new disease genes. Three candidate disease genes identified during the study were subsequently validated and were included in the 45 definite diagnoses (ND Table 3).
Financial Impact of Genomic Diagnoses—
As a surrogate for cost-effectiveness, the total cost of prior negative diagnostic testing was determined for children who received a diagnosis. Laboratory tests, radiologic procedures, electromyograms and nerve conduction velocity studies performed for diagnostic purposes were included (ND Table s4, s5). The mean total charge for prior testing was $19,100 per family enrolled from the ambulatory care clinics (range $3,248-$55,321; ND Table s4). Diagnostic testing at outside institutions, tests necessary for patient management (such as electroencephalograms), physician visits, phlebotomy, and other healthcare charges and costs were omitted. The cost at which WGS or WES would be cost-effective was then sought, assuming a rate of diagnosis of 40% and an average charge for prior testing of $19,100 per family. Excluding all costs other than that of prior tests, genomic sequencing of ambulatory care patients was cost-effective at a cost of no more than $7,640 per family (Table S4, S5). Assuming WES of an average of 2.55 individuals per family, as occurred when it was sought to enroll trios, it would be cost-effective as long as the cost was no more than $2,996 per individual.
For 11 families enrolled from the NICU and PICU, the mean total charge of conventional diagnostic tests was $9,550 (range $3,873-$14,605; Table S4). All other costs of intensive care potentially saved by earlier diagnosis, either through withdrawal of care where the prognosis rendered medical care futile, or as a result of institution of an effective treatment upon diagnosis, were omitted.
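The threshold figures can be reproduced with simple arithmetic. The sketch below assumes one reading of the calculation: the break-even cost per family equals the diagnostic rate multiplied by the mean charge for prior testing, and the per-individual figure divides that by the mean number of individuals sequenced per family. The function name `break_even_cost` is hypothetical; the inputs are the values reported in the text.

```python
def break_even_cost(diagnostic_rate: float, mean_prior_charge: float) -> float:
    """Expected prior-testing charge avoided per family sequenced.

    Assumed model: sequencing is cost-effective while its cost per family
    does not exceed diagnostic_rate * mean_prior_charge.
    """
    return diagnostic_rate * mean_prior_charge

DIAGNOSTIC_RATE = 0.40            # ambulatory group, 34 of 85 families
MEAN_PRIOR_CHARGE = 19_100        # US dollars per family
INDIVIDUALS_PER_FAMILY = 2.55     # mean number of individuals tested

per_family = break_even_cost(DIAGNOSTIC_RATE, MEAN_PRIOR_CHARGE)   # about $7,640
per_individual = per_family / INDIVIDUALS_PER_FAMILY               # about $2,996

print(f"per family: ${per_family:,.0f}; per individual: ${per_individual:,.0f}")
```

Under this reading, both thresholds scale linearly with the diagnostic rate, so a lower yield in another population would lower the break-even cost proportionally.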
Clinical Impact of Genomic Diagnoses—
Among ambulatory care clinic patients, the mean age at symptom onset was 6.6 months (range 0-90 months), enrollment was at 83.7 months (range 1-252 months), and confirmed and reported diagnosis at 95.3 months (range 16-262 months) (Table 2). Among infants who received a diagnosis via rapid WGS sequencing, the median age of symptom onset was 0 days (mean 8.2 days, range 0-90), median age at enrollment was 38 days (range 2-154 days), and median age at confirmed and reported diagnosis was 50 days (range 8-521 days).
As a surrogate measure of clinical effectiveness, the short-term clinical impact of diagnoses was assessed by chart review and interviews with referring physicians. Diagnoses changed patient management and/or clinical impression of the pathophysiology in 49% of the 45 families (n=22, ND Table 3 and ND Table s6). Drug or dietary treatments were started or planned in ten children. In two, both of whom were diagnosed in infancy, there was a favorable response to the treatment. One of these, CMH663, is presented in detail below. The other, CMH680, was diagnosed with early infantile epileptic encephalopathy, type 11 (MIM #613721), and was started on a ketogenic diet with resultant decrease in seizures. Siblings CMH001 and CMH002, with advanced ataxia with oculomotor apraxia type 1 (MIM #208920), were treated with oral CoQ10 supplements; however, no reversal of existing morbidity was reported. Three diagnoses enabled discontinuation of unnecessary treatments, and nine prompted evaluation for possible disease complications.
CMH301 illustrated the utility of WES for diagnosis in a patient with an atypical, non-acute presentation of a recently-described cause of NDD. This patient was asymptomatic until six months of age when he developed tonic-clonic seizures. At 1½ years of age, he became withdrawn and developed motor stereotypies. He was diagnosed with autism spectrum disorder. Seizures occurred up to 30 times daily, despite antiepileptic treatment and a vagal nerve stimulator. At 3 years of age, he developed a tremor and unsteady gait. By age 10, he had frequent falls, loss of protective reflexes, and required a wheelchair for distances. Physical examination was notable for a long thin face, thin vermilion of the upper lip, and repetitive hand movements, including midline wringing. Gait was slow and unsteady. Electroencephalogram demonstrated a left hemisphere epileptogenic focus and atypical background activity with slowing. Extensive neurologic, laboratory and imaging evaluations were not diagnostic. WES revealed a new hemizygous variant in the class-A phosphatidylinositol glycan anchor biosynthesis protein (PIGA, c.68dupG, p.Ser24LysfsX6). His unaffected mother (CMH303) was heterozygous with a random pattern (54:46) of X-chromosome inactivation. PIGA has recently been associated with X-linked Multiple Congenital Anomalies-Hypotonia-Seizures syndrome 2, causing death in infancy (MIM #300868). However, Belet et al. demonstrated that an early stop mutation in PIGA results in a hypomorphic protein with initiation at p.Met37. This truncated PIGA partially restores surface expression of glycosylphosphatidylinositol (GPI)-anchored proteins, consistent with the less severe phenotype in CMH301, whose variant preserves the alternative start codon. A GPI-anchored protein assay confirmed decreased expression on granulocytes, T-cells, and B-cells, and normal erythrocyte expression consistent with the absence of hemolysis.
Pyridoxine, an effective antiepileptic for at least one other GPI-anchor biosynthesis disorder, was trialed but was not efficacious.
CMH230 underscored the power of WES to provide a molecular diagnosis in a clinically heterogeneous, non-acute disorder. This patient was born at 37 weeks after detection of a complex congenital heart defect, growth restriction, and liver calcifications in utero. A complete atrioventricular canal defect was identified on postnatal echocardiography. Dysmorphic features included two posterior hair whorls, tall skull, short forehead, low anterior hairline, flat midface, prominent eyes, periorbital fullness, down-slanting palpebral fissures, sparse curly lashes, brows with medial flare, bluish sclerae, large protruding ears, a high nasal root, bulbous nasal tip, inverted nipples, taut skin on the lower extremities and hypotonia. Notably absent were widely spaced eyes and macrodontia. Complete repair of the atrioventricular canal was performed at 7 months of age, after which her growth improved. She was diagnosed with partial complex seizures at 15 months. By 2 years she was able to walk independently and began to develop expressive language. Karyotype and aCGH testing were not diagnostic. The clinical findings suggested a peroxisomal disorder or congenital glycosylation defect. Very long chain fatty acids, urine oligosaccharides and transferrin studies were not diagnostic. Two N-glycan profiles demonstrated a mild increase in monogalactosylated glycan, but were not consistent with a primary congenital glycosylation defect. O-glycan profile was initially suggestive of a multiple glycosylation defect, but repeat testing was normal.
WES revealed a de novo frameshift variant in the ankyrin repeat domain 11 (ANKRD11) gene (c.1385_1388delCAAA, p.Thr462LysfsX47) in the proband, consistent with a diagnosis of KBG Syndrome (MIM #148050). CMH230 did not present with the typical features of KBG, which is classically characterized by hypertelorism, macrodontia, short stature, skeletal findings and developmental delay.
CMH663 illustrated the diagnostic utility of rapid WGS (STATseq) in a rare cause of NDD that resulted in a change in patient management. This patient underwent evaluation at 6 months of age for delayed attainment of developmental milestones, hypotonia, mildly dysmorphic facies, and frequent episodes of respiratory distress. Extensive neurologic, laboratory and imaging evaluations were not diagnostic. An episode of acute respiratory decompensation necessitated intubation and transfer to an intensive care unit. EEG revealed generalized slowing. Rapid WGS identified compound heterozygous missense variants in the mitochondrial malate/citrate transporter (SLC25A1 c.578C>G, p.Ser193Trp and c.82G>A, p.Ala28Thr). D-2- and L-2-hydroxyglutaric acid were elevated in plasma and urine, confirming the diagnosis of combined D-2- and L-2-hydroxyglutaric aciduria (MIM #615182). This disorder is associated with a poor prognosis: 8 of 13 reported patients died by 8 months of age. Although no standardized treatment existed, Mühlhausen et al. successfully treated an affected patient with daily Na-K-citrate supplements, with subsequent decrease in biomarker concentrations and stabilization of apneic seizure-like activity that required respiratory support. CMH663 was started on oral Na-K-citrate (1500 mg/kg/day of citrate). After 6 weeks, 2-OH-glutaric acid excretion decreased and citric acid excretion increased. Muscle tone, head control, ptosis, and alertness improved, but she subsequently developed episodes of eye twitching and upper extremity extension, correlated with left temporal and occasional right temporal spike, sharp and slow waves suggestive of epilepsy. However, at 15 months of age, she has had no further episodes of respiratory decompensation.
CMH382 and CMH383 illustrated the utility of routine WGS for molecular diagnosis in patients with NDD in whom WES failed to yield a diagnosis. CMH382 was the first child born to healthy Caucasian, non-consanguineous parents. Pregnancy was complicated by hyperemesis and preterm labor resulting in birth at 32 weeks; size was appropriate for gestational age (AGA). She was hypotonic and lethargic after delivery. Hyperinsulinemic hypoglycemia was detected, and she spent 5 months in the NICU for respiratory and feeding support and blood sugar control. Physical examination was notable for ptosis, exotropia, high palate, smooth philtrum, inverted nipples, short upper arms with decreased elbow extension and wrist mobility, hypotonia, low muscle mass and increased central distribution of body fat. She was diagnosed with autism spectrum disorder at age 3. Developmental Quotients at ages 3 and 5 were less than 50. She required diazoxide treatment for hyperinsulinism until age 6. At age 7 she developed premature adrenarche, and an advanced bone age of 10 years was identified.
CMH383, the sibling of CMH382, was born at 34 weeks; size was AGA. Neonatal course was complicated by apnea, bradycardia, poor feeding, hyperinsulinemic hypoglycemia and seizures. Physical exam was notable for marked hypotonia, finger contractures and dysmorphic features similar to her sister's. She had gross developmental delays and autistic features. Extensive neurologic, laboratory and imaging evaluations were nondiagnostic. WES of both affected siblings and their unaffected parents did not reveal any shared pathogenic variants in NDD candidate genes. Subsequently, WGS was performed on CMH382 (HiSeq X Ten) and identified 156 rare, potentially pathogenic variants not disclosed by WES. Variant reanalysis revealed a new heterozygous, truncating variant in MAGE-like-2 (MAGEL2, c.1996dupC, p.Gln666Profs*47). Further investigation revealed incomplete coverage of the MAGEL2 coding domain with WES but not WGS. The variant was predicted to cause a premature stop codon at amino acid 713. Although this variant has not been reported in the literature, it is of a type expected to be pathogenic, leading to loss of protein function through either nonsense-mediated mRNA decay or production of a truncated protein.
Sanger sequencing confirmed the presence of the p.Gln666Profs*47 variant in CMH382 and her affected sibling, CMH383. The variant was undetectable in DNA from the blood of either parent, suggesting gonadal mosaicism of this paternally expressed gene. MAGEL2 is a GC-rich (61%), intronless gene which maps within the Prader-Willi Syndrome critical region on chromosome 15q11-q13. Truncating, de novo, paternally-derived variants in MAGEL2 have recently been linked to Prader-Willi-like syndrome (PWLS; OMIM#615547) (29). Because MAGEL2 is imprinted and exhibits paternal monoallelic expression in the brain, the findings are consistent with a loss of MAGEL2 function. Although parental gonadal mosaicism is rare, this case highlighted the need to include analysis of de novo disease-causing variants in families with multiple affected siblings.
Siblings CMH334 and CMH335 demonstrated that clinical heterogeneity in NDD can hinder molecular diagnosis by conventional methods and be circumvented by WES. CMH334 had a history of intellectual disability, a mixed seizure disorder with possible myoclonic epilepsy, and thrombocytopenia of unknown etiology. Scores on the Wechsler Intelligence Scale for Children (3rd Edition) revealed a Verbal IQ of 63, a Performance IQ of 65, and a Full Scale IQ of 61 (1st percentile). At age 17, after a sedated dental procedure, he developed a lower extremity tremor which progressed to tremulous movements and facial twitching. A decline in school performance and development of severe anxiety led to further evaluation. Physical features included synophrys and prominent eyebrow ridges. Neurologic findings included saccadic eye movements, a resting upper extremity tremor, a perioral tremor, and tongue fasciculations. Deep tendon reflexes were brisk, but muscle tone, bulk and strength were maintained. Speech was slow. Heel to toe gait was unsteady, but Romberg sign was negative. Laboratory studies suggested a possible creatine biosynthesis disorder; however, GATM (arginine:glycine amidinotransferase) and SLC6A8 (creatine transporter) sequencing was negative, and magnetic resonance spectroscopy revealed CNS creatine levels to be normal.
CMH335, a full-brother, was also diagnosed with Attention Deficit Hyperactivity Disorder, intellectual disability, and epilepsy. Notable features included macrocephaly, bitemporal narrowing, obesity, hypotonia, intention tremor and tongue fasciculations. At age 9 he had an episode of acute psychosis and transient loss of some cognitive skills, including inability to recognize family members. He had complete resolution of these symptoms after approximately 3 weeks. At age 16, he was again hospitalized for neuropsychiatric decompensation and a subacute decline in reading skills. He was found to have euthyroid thyroiditis with thyroglobulin antibodies at 2565 IU/mL (normal<116 IU/mL), resulting in a diagnosis of Hashimoto's Encephalopathy. He also underwent a lengthy diagnostic evaluation which included negative methylation studies for Prader-Willi/Angelman syndrome and an X-Linked-Intellectual Disability panel.
WES revealed a known pathogenic hemizygous variant in the methyl CpG binding protein 2 gene (MECP2 c.419C>T, p.A140V) in both boys; their asymptomatic mother was heterozygous. This variant has been previously reported as a hypomorphic allele that, unlike many MECP2 variants, is compatible with life in affected males. Such males exhibit Rett-like symptoms (MIM #312750); carrier females may have mild cognitive impairment or no symptoms.
Here high rates of monogenetic disease diagnosis in children with neurodevelopmental disorders by acuity-guided WGS or WES of trios were reported. Retrospective estimates of clinical and cost effectiveness of WGS- and WES-based diagnosis of NDD were also reported. Because NDD affects more than 3% of children, these results have broad implications for pediatric medicine.
The 45% rate of molecular diagnosis of NDD, reported herein, was modestly higher than previous reports, in which 8-42% of individuals or families received diagnoses by WGS or WES. The high diagnostic rate reported here reflected, in part, the use of rapid WGS in critically ill infants, who had very little prior testing, with a resultant diagnosis rate of 73% (11 of 15 families). Nevertheless, the diagnostic yield in ambulatory patients who had received extensive prior testing (34 of 85 families; 40%) was also high in view of exclusion of readily diagnosed causes, low rate of consanguinity (4%), and inclusion criteria similar to prior studies. Cases CMH382 and CMH383 highlighted the potential for WGS to detect variants missed by WES, particularly variants in GC-rich exons. However, a broader comparison of the diagnostic sensitivity of WGS and WES was precluded by the two distinct populations tested in this study. At present, there is no generalizable evidence for the superiority of 40-fold WGS or deep WES for diagnosis of monogenetic disorders. This may change with maturation of tools for identification of pathogenic non-exonic variants and understanding of the burden of causal chimerism and somatic mutations in genetic diseases.
Two other methodological characteristics may have contributed to the high overall diagnostic sensitivity. Firstly, de novo mutations were the most common genetic cause of childhood NDD, accounting for 23 (51%) diagnoses (37). With the exception of curated known variants, such cases benefit from trio enrollment. Secondly, clinicopathologic software was used to translate individual symptoms into a comprehensive set of disease genes that was initially examined for causality. Such software helped to solve the immense interpretive problem of broad genetic and clinical heterogeneity of NDD. This was exemplified in many of the cases reported (for example CMH001, CMH002, CMH079, CMH096, CMH301, CMH334, and CMH335), where the clinical overlap with classic disease descriptions was modest, as objectively measured by the rank of the molecular diagnosis on the list of differential diagnosis derived from the clinical features with the Phenomizer tool. A consequence is that it will be challenging to recapitulate dynamic, clinical-feature-driven interpretive workflows in remote reference laboratories, where most molecular diagnostic testing is currently performed.
Broad adoption of acuity-guided allocation of WGS or WES for NDD will require prospective analyses of the incremental cost-effectiveness versus traditional testing. Decision-analytic models should include the total cost of implementation by healthcare systems and long-term comparisons of overall cost of care, given the chronicity of NDD. Here, as a retrospective proxy, the total charge for prior, negative diagnostic tests in families who received WES- or WES and WGS-based diagnoses was identified. The average cost of prior testing, $19,100, appeared representative of tertiary pediatric practice in the United States. Assuming the observed rate of diagnosis (40%) in the ambulatory group, sequencing was found to be a cost-effective replacement diagnostic test up to $7,640 per family or $2,996 per individual. Although $2,996 is at the lower end of the cost of clinical WES today, next-generation sequencing continues to decline in cost. Furthermore, the cost-effectiveness estimates reported herein excluded potential changes in healthcare cost associated with earlier diagnosis.
Two families powerfully illustrated the impact of WES on the cost and length of the NDD diagnostic odyssey. The first enrollees, CMH001 and CMH002, were sisters with progressive cerebellar atrophy. Prior to enrollment they had 45 subspecialist visits during seven years of progressive ataxia, and their cost of negative diagnostic studies exceeded $35,000. WES yielded a diagnosis of ataxia with oculomotor apraxia type 1. In contrast, one year later, siblings CMH102 and CMH103 were enrolled for WES at the first subspecialist visit. The cost of their diagnostic studies was $3,248. WES yielded a diagnosis of nemaline myopathy. A third affected sibling was diagnosed by Sanger sequencing of the causative variants.
Another prerequisite for broad acceptance and adoption of WGS and WES for diagnosis of childhood NDD is demonstration of clinical effectiveness. The premise of genomic medicine is that early molecular diagnosis enables institution of mechanism-targeting, useful treatments before the occurrence of fixed functional deficits. Prospective clinical effectiveness studies with randomization and comparison of morbidity, quality of life and life expectancy related to NDD have not yet been undertaken. Here, as preliminary surrogates, the time to diagnosis and changes in care upon return of new molecular diagnoses were retrospectively examined. In the ambulatory patient group, patients had been symptomatic for 77 months, on average, prior to enrollment. WES, if performed at symptom onset, would have had the potential to truncate the diagnostic odyssey in such cases. Time-to-diagnosis rates reported herein (WES 11.5 months, rapid WGS 43 days, Table 2) predict that use of rapid WGS could accelerate diagnosis by an additional 10 months. For children with progressive NDD for which treatments exist, outcomes are likely to be markedly improved by treatment institution months to years earlier than would have otherwise occurred.
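The interval estimates in this paragraph follow from the reported mean ages and times to diagnosis. The sketch below reproduces them as a worked check. The variable names are hypothetical, and the conversion of 43 days to months assumes an average month of 30.44 days.

```python
# Reported mean values for the ambulatory group, in months.
AGE_AT_SYMPTOM_ONSET = 6.6
AGE_AT_ENROLLMENT = 83.7

# Reported times to diagnosis.
WES_TIME_TO_DIAGNOSIS_MONTHS = 11.5
RAPID_WGS_TIME_TO_DIAGNOSIS_DAYS = 43

DAYS_PER_MONTH = 30.44  # assumed average month length

# Months symptomatic before enrollment (about 77).
symptomatic_before_enrollment = AGE_AT_ENROLLMENT - AGE_AT_SYMPTOM_ONSET

# Additional months saved by rapid WGS relative to WES (about 10).
rapid_wgs_months = RAPID_WGS_TIME_TO_DIAGNOSIS_DAYS / DAYS_PER_MONTH
additional_acceleration = WES_TIME_TO_DIAGNOSIS_MONTHS - rapid_wgs_months

print(f"symptomatic before enrollment: {symptomatic_before_enrollment:.1f} months")
print(f"additional acceleration with rapid WGS: {additional_acceleration:.1f} months")
```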
Another well-established benefit of a molecular diagnosis is genetic counseling of families for recurrence risk. In the current study, there were five genetic disorder recurrences in four of the families who received diagnoses. Of equal importance, the 23 families with causative de novo variants could have been counseled earlier that, barring gonadal mosaicism, recurrence was not expected. Affected children in 49% of families receiving diagnoses by WGS or WES were reported by their physicians to have had a change in clinical management and/or clinical impression (ND Tables 3 and 6). A change in drug or dietary treatment either occurred or was planned in ten families (23%), in agreement with one previous report. In two patients, both of whom received diagnoses in infancy, there was a favorable response to that treatment. One of these, CMH663, was presented in detail here. Given that all diagnoses were of ultra-rare diseases, a recurrent finding was that the new treatment considered was supported only by case reports or studies in model systems. For example, several patients with ataxia with oculomotor apraxia type 1, which was the diagnosis for CMH001 and CMH002, had responded to oral Coenzyme Q10 supplements. In addition to only anecdotal evidence of efficacy, the treatment of CMH001 and CMH002 with Coenzyme Q10 was complicated by advanced cerebellar atrophy at time of diagnosis and the absence of pharmaceutical formulation, pharmacokinetic, pharmacodynamic, or dosing information in children. Thus, demonstration of the clinical effectiveness of genomic medicine will require not only improved rates and timeliness of molecular diagnosis, but also multidisciplinary care to identify, design and implement candidate interventions on an N-of-1-family or N-of-1-genome basis.
Neurodevelopmental disorders exhibited a broad spectrum of monogenetic inheritance patterns and frequently, divergence of clinical features from classical descriptions. Over 2,400 genetically distinct neurologic disorders exist, underscoring the relative ineffectiveness of serial, single gene testing. Furthermore, the clinical features of patients and families receiving diagnoses did not delineate a subset of NDD patients unlikely to benefit from WGS or WES. Mechanistically, the low incidence of recurrent alleles was consistent with their recent origin, as was the high rate of causative de novo mutations. Given the broad enrollment criteria used herein, it is possible that this level of genetic and clinical heterogeneity may be typical of NDD in subspecialty practice.
The evaluation of NDD patients has, historically, been constrained by the availability and cost of testing. Limited availability of tests reflects both the delay between disease gene discovery and the development of clinical diagnostic gene panels, and the adverse economics of targeted test development for ultra-rare diseases. Acuity-guided WGS and WES largely circumvented these constraints. Indeed, eight of the diagnoses reported herein were in genes for which no individual clinical sequencing was available at the time of patient enrollment (ASXL3, BRAT1, CLPB, KCNB1, MTOR, PIGA, PNPLA8 and MAGEL2).
A new candidate NDD gene or a previously undescribed presentation of a known NDD-associated gene that required additional experimental support was identified in twelve families. Three new disease-gene associations, and one new phenotype, were validated or reported during the study. Functional studies will need to be performed in the future for the remaining nine candidate genes, which were not included among the positive diagnoses reported here. These patients lacked causative genotypes in known disease genes, and had rare, likely pathogenic changes in biologically plausible genes that exhibited appropriate familial segregation. The possibility of a substantial number of new NDD genes fits with findings in other recent case series. From a clinical standpoint, the common identification of variants of uncertain significance in candidate disease genes creates practical dilemmas that are not experienced with traditional diagnostic testing. Given the exacting principles of validation of a new disease gene, there exists an urgent need for pre-competitive sharing of the relevant pedigrees.
This study had several limitations. It was retrospective and lacked a control group. Clinical data were collected principally through chart review, which may have led to under- or over-estimates of acute changes in management. Information about long-term consequences of diagnosis, such as the impact of genetic counseling, was not ascertained. Comparisons of costs of genomic and conventional diagnostic testing excluded associated costs of testing, such as outpatient visits, and may have included tests that would nevertheless have been performed, irrespective of diagnosis. The acuity-based approach to expedited WGS and non-expedited WES was a patient-care-driven approach and was not designed to facilitate direct comparisons between the two methods.
In summary, WGS and WES provided prompt diagnoses in a substantial minority of children with NDD who were undiagnosed despite extensive diagnostic evaluations. Preliminary analyses suggested that WES was less costly than continued conventional diagnostic testing of children with NDD in whom initial testing failed to yield a diagnosis. WES-based diagnoses were found to refine treatment plans in many patients with NDD. It is suggested that sequencing of genomes or exomes of trios should become an early part of the diagnostic work-up of NDD and that accelerated sequencing modalities be extended to patients with high-acuity illness.
Study Design—
This is a retrospective analysis of patients enrolled in a biorepository at a children's hospital in the central United States. The repository comprised all families enrolled in a research WGS and WES program established to diagnose pediatric monogenic disorders. Of 155 families analyzed by WGS or WES during the first 33 months of the diagnostic program, 100 were families affected by NDD. This is a descriptive study of the 119 affected children from these families.
Study Participants—
Referring physicians were encouraged to nominate families for enrollment in cases with multiple affected children, consanguineous unions where both biologic parents were available for enrollment, infants receiving intensive care, or children with progressive NDD. WES was deferred when the phenotype was suggestive of genetic diseases not detectable by next-generation sequencing, such as triplet repeat disorders, or when standard cytogenetic testing or array-based comparative genomic hybridization had not been obtained. Post-mortem enrollment was considered for deceased probands of families receiving ongoing healthcare services at the clinic.
NDD was characterized as central or peripheral nervous system symptoms and developmental delays or disabilities. With one exception, enrollment was from subspecialty clinics at a single, urban children's hospital. This study was approved by the Institutional Review Board at Children's Mercy—Kansas City. Informed written consent was obtained from adult subjects, parents of children, and children capable of assenting.
Ascertainment of Clinical Features in Affected Children—
The clinical features of each affected child were ascertained by examination of electronic health records and communication with treating clinicians, translated into Human Phenotype Ontology (HPO) terms, and mapped to ~4,000 monogenic diseases and ~2,800 genes with the clinicopathologic correlation tools SSAGA (Symptom and Sign Associated Genome Analysis) and/or Phenomizer (Supplementary Table S2).
Exome Sequencing—
WES was performed in a CLIA/CAP approved laboratory under a research protocol. Exome samples were prepared with either Illumina TruSeq Exome or Nextera Rapid Capture Exome kits according to manufacturer's protocols. Exon enrichment was verified by quantitative PCR of 4 targeted loci and 2 non-targeted loci, both before and after enrichment. Samples were sequenced on Illumina HiSeq 2000 and 2500 instruments with 2×100 nt sequences.
Genome Sequencing—
Genomic DNA was prepared for WGS using either Illumina TruSeq PCR Free (rapid WGS) or TruSeq Nano (HiSeq X Ten) sample preparation according to manufacturer's protocols. Briefly, 500 ng of DNA was sheared with a Covaris S2 Biodisruptor, end-repaired, A-tailed and adaptor-ligated. Quantitation was carried out by real-time PCR. Libraries were sequenced by Illumina HiSeq 2500 instruments (2×100 nt) in rapid run mode or by HiSeq X Ten (2×150 nt).
Next Generation Sequencing Analysis—
Sequence data were generated with Illumina RTA 1.12.4.2 & CASAVA-1.8.2, aligned to the human reference NCBI 37 using GSNAP, and variants were detected and genotyped with the Genome Analysis Tool Kit, versions 1.4 and 1.6, and Alpheus v3.0. Sequence analysis used FASTQ, BAM, and VCF files. Variants were called and genotyped in WES in batches, corresponding to exome pools, using GATK 1.6 with best practice recommendations. Variants were identified in WGS using GATK 1.6 without Variant Quality Score Recalibration. The largest deletion variant detected was 9,992 nt, and the largest insertion was 236 nt.
Variants were annotated with the RUNES Software (v1.0). RUNES incorporates data from ENSEMBL's Variant Effect Predictor (VEP) software, produces comparisons to NCBI dbSNP, known disease variants from the Human Gene Mutation Database, and performs additional in silico prediction of variant consequences using RefSeq and ENSEMBL gene annotations. RUNES categorized each variant according to ACMG recommendations for reporting sequence variation and with an allele frequency (MAF) derived from CPGM's Variant Warehouse database. Category 1 variants had previously been reported to be disease-causing. Category 2 variants had not previously been reported to be disease-causing, but were of types that were expected to be pathogenic (loss of initiation, premature stop codon, disruption of stop codon, whole gene deletion, frameshifting indel, disruption of splicing). Category 3 were variants of unknown significance that were potentially disease-causing (nonsynonymous substitution, in-frame indel, disruption of polypyrimidine tract, overlap with 5′ exonic, 5′ flank or 3′ exonic splice contexts). Category 4 were variants that were probably not causative of disease (synonymous variants that were unlikely to produce a cryptic splice site, intronic variants >20 nt from the intron/exon boundary, and variants commonly observed in unaffected individuals). Causative variants were identified primarily with VIKING software. Variants were filtered by limitation to ACMG Categories 1-3 and MAF<1%. All potential monogenetic inheritance patterns were examined, including de novo, recessive, dominant, X-linked, mitochondrial, and, where possible, somatic variation. Where a single likely causative variant for a recessive disorder was identified, the entire coding domain was manually inspected using the Integrative Genomics Viewer for coverage and additional variants, as were variants for that locus called in the appropriate parent that may have had low coverage in the proband.
Expert interpretation and literature curation were performed for all likely causative variants with regard to evidence for pathogenicity. Sanger sequencing was used for clinical confirmation and reporting of all diagnostic genotypes. Additional expert consultation and functional confirmation were performed when the subject's phenotype differed from previous mutation reports for that disease gene.
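The filtering rule described above (retain ACMG Categories 1-3 with an allele frequency below 1%) can be illustrated with a short sketch. This is not the VIKING or RUNES implementation; the `Variant` record, its field names, and the placeholder gene labels are hypothetical and serve only to show the logic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record; real annotation output carries far more fields."""
    gene: str
    acmg_category: int       # 1-4, as defined in the text
    allele_frequency: float  # fraction in the internal database, e.g. 0.002 = 0.2%


def is_candidate(v: Variant, max_af: float = 0.01) -> bool:
    """Keep ACMG Category 1-3 variants whose allele frequency is below 1%."""
    return 1 <= v.acmg_category <= 3 and v.allele_frequency < max_af


def filter_candidates(variants: Iterable[Variant]) -> List[Variant]:
    """Return only the variants that pass the category and frequency filter."""
    return [v for v in variants if is_candidate(v)]


# Illustrative input with placeholder gene labels (not study data).
demo = [
    Variant("GENE_A", 1, 0.0001),  # previously reported, rare: kept
    Variant("GENE_B", 3, 0.0200),  # uncertain significance but common: dropped
    Variant("GENE_C", 4, 0.0001),  # probably benign category: dropped
    Variant("GENE_D", 2, 0.0050),  # expected pathogenic type, rare: kept
]
kept = filter_candidates(demo)
print([v.gene for v in kept])
```

In practice the surviving variants would then be reviewed against each inheritance pattern and the patient's phenotype, as the text describes.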
Flow Cytometry—
Allophycocyanin-conjugated antibodies to CD59 were obtained from Becton Dickinson. Detection of glycosylphosphatidylinositol (GPI)-anchored protein expression on granulocytes, B cells, and T cells was performed with a fluorescent aerolysin-based assay (Protox Biotech). Before staining white blood cells, whole blood was incubated in 1× red blood cell lysis buffer (GIBCO). The remaining nucleated cells were identified on the basis of forward and side scatter and by staining with phycoerythrin (PE)-conjugated anti-CD3 (T cells), anti-CD15 (granulocytes), and anti-CD20 (B cells) antibodies (Becton Dickinson). Acquisition and analysis were performed by flow cytometry (FACSCalibur, Becton Dickinson) and FlowJo (Tree Star, Inc.). For all cell types, the isotypic control was set at 1%.
Clinical Study 2
The following are the diagnostic and clinical findings among critically ill infants receiving rapid whole genome sequencing for identification of Mendelian disorders. Genetic disorders and congenital anomalies are the leading cause of infant mortality. Diagnosis of most genetic diseases in neonatal and pediatric intensive care units (NICU, PICU) has not occurred in time to guide clinical management. Rapid whole-genome sequencing (STATseq) was performed in a level IV NICU and PICU to examine (1) the rate and types of molecular diagnoses, and (2) the prevalence, types and impact of medically actionable diagnoses.
This was a retrospective comparison of STATseq and standard etiologic testing in a case series collected from the NICU and PICU of a large children's hospital between November 2011 and October 2014. The participants were 35 families with an infant aged <4 months with an acute illness of suspected genetic etiology. The intervention was STATseq of trios (parents and their affected infant). The main measures were the diagnostic rate, time to diagnosis, and rate of change in management of reference standard testing and STATseq.
The rate of diagnosis of a genetic disease was 57% by STATseq, and 9% by the reference standard (p<0.001). Median time to genome analysis was 5 days, but to confirmed clinical report was 23 days. 65% of STATseq diagnoses were associated with de novo mutations. In infants receiving a genetic diagnosis, acute clinical utility was observed in 62%, a strongly favorable impact on management occurred in 19%, palliative care was instituted in 33%, and 120-day mortality was 57%.
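As a rough check of the comparison of diagnostic rates, the counts behind the two percentages (20 of 35 infants diagnosed by STATseq, 3 of 35 by the reference standard) can be placed in a 2×2 table. The sketch below applies an unpaired Pearson chi-square test without continuity correction; that choice is an assumption, since the test variant is not specified here, and because the same infants received both tests a paired method could also be argued for.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]]; returns (statistic, p_value)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: STATseq, reference standard. Columns: diagnosed, not diagnosed.
chi2, p_value = chi_square_2x2(20, 15, 3, 32)
print(f"STATseq rate: {round(100 * 20 / 35)}%")
print(f"Reference rate: {round(100 * 3 / 35)}%")
print(f"chi-square = {chi2:.2f}, p < 0.001: {p_value < 0.001}")
```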
In selected acutely ill infants, STATseq had a high rate of diagnosis of genetic disorders. A majority of diagnoses influenced acute management. Mortality is very high among NICU and PICU infants diagnosed with a genetic disease. Since disease progression can be extremely rapid in infants, diagnoses must be very fast to allow consideration of interventions that lessen morbidity and mortality. There are over 5,300 genetic diseases of known cause. Collectively, they are the leading cause of infant mortality, particularly in neonatal intensive care units (NICUs) and pediatric intensive care units (PICUs). The premise of genomic medicine is that molecular diagnosis may allow supplementation of empiric, phenotype-driven management with genotype-differentiated treatment and genetic counseling. Timely molecular diagnoses of suspected genetic disorders were previously largely precluded in acutely ill infants by profound clinical and genetic heterogeneity, and tardiness of results of reference standard tests, such as gene sequencing. While appropriate NICU treatment is among the most cost-effective methods of high-cost health care, long-term outcomes in NICU subpopulations are diverse. In genetic diseases with poor prognosis, rapid diagnosis can empower early parental discussions regarding palliative care calibrated on minimization of suffering. Methods for 50-hour diagnosis of genetic disorders by rapid whole-genome sequencing (STATseq) were previously reported. STATseq simultaneously tested almost all Mendelian illnesses, and was hypothesized to give a diagnosis in time to guide clinical management acutely in infants and children in a NICU or PICU setting. This study reports the rate and types of molecular diagnosis from STATseq and reference standard tests among phenotypic groups in the first 35 infants in a level IV NICU and PICU at a quaternary children's hospital, and the prevalence, types and results of medically actionable findings.
Methods—Study Design, Setting and Participants
This study was approved by the Institutional Review Board at Children's Mercy—Kansas City. This was a retrospective comparison of the diagnostic rate, time to diagnosis, and types of molecular diagnosis of reference standard etiologic testing, as clinically indicated, with STATseq (index test) in a case series. Participants were principally parent-child trios, enrolled in a research biorepository, who received genomic sequencing to diagnose monogenic disorders of unknown etiology in affected children. Affected infants and children with suspected genetic disorders were nominated for STATseq by a treating physician, typically a neonatologist. A standard form requesting the primary signs and symptoms, past diagnostic testing results, differential diagnosis or candidate genes, pertinent family history, availability of biologic parents for enrollment, and whether STATseq would potentially affect treatment was submitted for immediate evaluation. Infants received STATseq if the likely diagnosis was of a type that was detectable by next-generation sequencing and had any potential to alter management or genetic counseling. Patients were not required to undergo standardized clinical examinations or diagnostic testing prior to referral; standard etiologic testing was performed as clinically indicated. Infants likely to have disorders associated with cytogenetic abnormalities were not accepted unless standard testing for those disorders was negative. Approximately two thirds of nominees were accepted for STATseq. Informed written consent was obtained from parents. About one half of accepted families were enrolled. Major reasons for failure to enroll were unavailability of one or more biological parents, parents who were minors and unable to consent, or parental refusal to participate. 49 families with acutely ill or deceased infants and children were enrolled and received STATseq of parent-child trios.
35 of these families met inclusion criteria for this report: age of the affected infant <4 months, enrollment from a level IV NICU or PICU at the clinic between November 2011 and October 2014, acute illness of suspected monogenetic etiology in the infant, and absence of an etiologic diagnosis. Approximately 2,400 infants <4 months of age were admitted to the NICU or PICU during the study period.
Ascertainment of Clinical Features
The clinical features of affected infants were ascertained comprehensively by physician interviews and review of the medical record. Clinical features were translated into Human Phenotype Ontology (HPO) terms, and mapped to ~5,300 monogenic diseases with the clinicopathologic correlation tool Phenomizer (MD Table s1).
Genome Sequencing and Quality Control
STATseq was performed at CPGM under a research protocol, and employed either a 50-hour or seven-day protocol that was guided by acuity of illness. The laboratory was licensed by the Clinical Laboratory Improvement Amendments (CLIA) and accredited by the College of American Pathologists (CAP). STATseq was performed on both parents and affected infants simultaneously. Genomic DNA extraction from whole blood, library preparation, sequencing, and data analysis were performed using validated protocols. Genomic DNA was prepared using Illumina TruSeq PCR Free sample preparation. Quantitation was by real-time PCR. Libraries were sequenced by Illumina HiSeq 2500 instruments (2×100 nt) in rapid run mode (50-hour protocol) or standard run mode (seven-day protocol). STATseq was performed to a depth of at least 90 Gb per sample (MD Table s2), to provide a mean 40-fold genome coverage. Each sample met established quality metrics.
Genome Sequence Analysis
Sequences were aligned to the human reference NCBI 37 using Genomic Short Read Nucleotide Alignment Program (GSNAP). Nucleotide variants were detected and genotyped with the Genome Analysis Toolkit (GATK) v. 1.4 and 1.6, and yielded an average of 4.9 million nucleotide variants per sample (Table S2). Variants were annotated with RUNES software. STATseq interpretations considered multiple sources of evidence, including variant attributes, the gene involved, inheritance pattern, and clinical case history. Causative variants were identified primarily with VIKING software by limitation to American College of Medical Genetics (ACMG) Categories 1-3 and allele frequency <1% from an internal database. On average, genomes contained 825 potentially pathogenic variants (allele frequency <1%, ACMG categories 1-3). All inheritance patterns were examined. Where a single likely causative variant for a recessive disorder was identified, the locus was manually inspected using the Integrative Genomics Viewer in the trio for uncalled variants. Expert interpretation and literature curation were performed for likely causative variants with regard to evidence for pathogenicity. While STATseq can give a provisional diagnosis of genetic disorders in 50 hours, it is a research test, and Sanger sequencing was used for confirmation of all likely causative genotypes. During the study, the FDA granted non-significant risk status to verbal return of a provisional STATseq diagnosis to the treating physician in exceptional cases, where the results were actionable and the infant was imminently likely to die (FDA/CDRH/OIR submission Q140271, May 8, 2014). Familial relationships were confirmed by segregation analysis of private variants in STATseq diagnoses associated with de novo mutations. An infant was classified as having a definitive diagnosis if a pathogenic or likely pathogenic genotype in a disease gene that overlapped with a reported phenotype was reported in the medical record.
Expert consultation and functional confirmation were performed when the subject's phenotype differed from the expected phenotype for that disease gene. Incidental findings were not reported.
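Sequencing parent-infant trios makes it possible to sort each candidate variant by inheritance pattern, as described above. The following is a simplified sketch of that idea for a single autosomal site, using alternate-allele counts (0, 1 or 2) for the child, mother and father. The function names and category labels are hypothetical; the sketch ignores X-linked, mitochondrial and somatic cases as well as genotype quality, all of which a real pipeline must handle.

```python
from typing import Iterable


def classify_trio(child: int, mother: int, father: int) -> str:
    """Classify one autosomal site from alternate-allele counts (0, 1 or 2)."""
    if child == 0:
        return "absent"
    if child == 2:
        # A homozygous child needs one alternate allele from each parent.
        return "homozygous" if mother >= 1 and father >= 1 else "inconsistent"
    # From here on the child is heterozygous.
    if mother == 0 and father == 0:
        return "de_novo"
    if mother == 2 and father == 2:
        # Two homozygous-alternate parents cannot have a heterozygous child.
        return "inconsistent"
    if mother >= 1 and father == 0:
        return "maternal"
    if father >= 1 and mother == 0:
        return "paternal"
    return "inherited_unphased"


def is_compound_het(origins: Iterable[str]) -> bool:
    """True if one gene carries at least one maternal and one paternal variant."""
    seen = set(origins)
    return "maternal" in seen and "paternal" in seen


# Two heterozygous variants in the same (hypothetical) gene, one from each parent.
gene_calls = [classify_trio(1, 1, 0), classify_trio(1, 0, 1)]
print(gene_calls, is_compound_het(gene_calls))
```

A "de_novo" call from such a rule is only provisional; as the text notes, familial relationships were confirmed by segregation analysis and diagnostic genotypes by Sanger sequencing.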
Reference Standard Testing
Affected infants received diagnostic testing based on physician clinical judgment (reference standard), in addition to STATseq (index test). Standard etiologic testing for genetic diseases included biochemical and immunologic testing of body fluids, array comparative genomic hybridization, fluorescence in situ hybridization, high resolution chromosomes, sequencing of genes and gene panels, methylation studies, and gene deletion/duplication assays.
Outcomes
The primary outcomes evaluated were the diagnostic rate and time to diagnosis of the reference standard and STATseq. Measurements included the types of molecular diagnosis obtained, medically actionable diagnoses, and impact of diagnoses on medical care and outcomes.
Results—Demographics of Infants
49 families with acutely ill or deceased infants and children were enrolled and received STATseq of parent-child trios. 35 of these families met inclusion criteria for this report: age of the affected infant <4 months, enrollment from a level IV NICU or PICU at the clinic between November 2011 and October 2014, acute illness of suspected monogenetic etiology in the infant, absence of an etiologic diagnosis, and where that diagnosis had any potential to alter management or genetic counseling (FIG. MD 1). The phenotype(s) for which infants had been nominated were diverse, and were typically present at birth (MD Table 1). The most common phenotypes were congenital anomalies (26%) and neurologic findings (20%). However, frequently, infants had complex clinical features, and the proximate reason for nomination for STATseq was one of several co-occurring phenotypes (Table S1). For example, CMH487 was admitted to the NICU at birth with bronchopulmonary dysplasia and a ruptured omphalocele, but was nominated for STATseq for acute liver failure on day of life (DOL) 71.
Diagnostic Results
The reference standard comprised 94 clinical genetic tests that were performed in 33 of the 35 infants, and gave three genetic diagnoses (9%; by microarray comparative genomic hybridization in CMH773, and single gene sequencing in CMH725 and CMH890) (FIG. MD 1, MD Table 1). The average age at reference standard test order was DOL 20, and the median time to diagnostic report was 16 days (MD Table 1).
STATseq gave 20 diagnoses (57%), which was significantly more than the reference standard (χ2, p<10−10; FIG. MD 1, Tables 1 and 2). The average age at enrollment for STATseq was DOL 26, and the median time to confirmed, reported diagnosis was 23 days (MD Table 1). Of this, the median interval from enrollment to STATseq completion and start of variant analysis was 5 days (range 3-153 days; MD Table 1). The outlier, CMH064, was the first enrollee and STATseq methods were still in development. 65% of STATseq diagnoses were reported prior to discharge or death. In four infants, death occurred within four days of enrollment, and STATseq was incomplete at time of death (FIG. MD S2 and S3). Reasons for longer STATseq times-to-diagnosis were development of informatics tools for structural variant detection during the study, publication of novel disease-gene associations during the study, or infants whose phenotype differed sufficiently from prior reports to require extensive analysis and external expert consultation.
45% (9 of 20) of STATseq diagnoses were diseases that were not considered in the differential diagnosis at time of enrollment. In one acutely ill infant, an actionable, provisional molecular diagnosis was reported verbally on day 3, before confirmatory testing (see CMH487, below). STATseq replicated the three reference standard diagnoses, albeit one was not reported clinically as a result of STATseq, and was thus excluded from the STATseq diagnostic rate (FIG. MD 1). Inclusive of that case, the STATseq diagnostic rate was 60% (21 of 35; MD Table 1).
In almost all cases STATseq and clinical genetic testing also identified findings that were not reported since either they did not adequately explain the etiology of illness in those infants, or lacked sufficient evidence of pathogenicity.
No phenotypic feature was associated with a higher diagnostic yield with STATseq. Recurrent genes with causative variants were PTPN11 (3), CHD7 (2), and SCN2A (2), all of which occurred de novo (MD Tables 2 and s3). Dominant de novo mutations were the most common mechanism of genetic disease (65%). One patient had a dominantly inherited disease, with a paternally inherited variant and somatic loss of the maternal allele. Genome sequencing provided good coverage of the mitochondrial genome, yielding one maternally-inherited diagnosis. Of five patients with autosomal recessive inheritance, four were compound heterozygous, and one, from a genetically isolated population, was homozygous (MD Table 2).
In infants receiving STATseq diagnoses, the degree of overlap between the classical clinical features of that disease and those which were observed was examined. HPO terms for these were mapped to genetic diseases with Phenomizer (MD Table s1). The rank of the diagnosis in the genetic disease compendium reflected concordance of observed and expected presentations (MD Table s1). Among 19 infants whose genetic diagnosis was in the Phenomizer database, the average rank was 806th (median 181st, MD Table s1). In contrast, the average rank among 32 older children with neurodevelopmental disorders diagnosed by genomic sequencing was 279th (median 128th, MD table s4).
Clinical Outcomes and Impact of Genomic Diagnoses
The median NICU or PICU stay was 42 days (range 3-387 days). 120-day mortality was 34% (12 of 35). It was significantly higher in infants receiving diagnoses than those who did not (11 of 21, 52%, versus 1 of 14, 7%, respectively; χ2, p<10−22; Table 3, MD
The short-term clinical impact of STATseq diagnoses was assessed by chart reviews and interviews with referring physicians (MD Table 3). 62% of STATseq diagnoses were considered to have acute clinical utility (MD Table 3). Reasons for utility were diverse, and included institution of palliative care, medication changes, and change in genetic counseling. Of 13 diagnoses made prior to discharge or death, 11 (85%) were considered to have acute clinical utility. In four of these (31% of timely diagnoses, 19% of all diagnoses, 11% of the total cohort) the change in acute management or outcome was both considerable and favorable, detailed as follows.
Illustrative Cases
CMH487, a full-term male admitted to the NICU at birth with multiple congenital anomalies, required tracheostomy and was ventilator dependent (FIG. MD 2b). On day of life (DOL) 56 he developed acute hepatic failure. Extensive testing failed to yield an etiologic diagnosis. Steroids were initiated empirically on DOL 67 with some improvement in hepatic failure. Intravenous immunoglobulin was given on DOL 69. The infant-parent trio was enrolled on DOL 71. STATseq yielded a genotype suggestive of type 2 hemophagocytic lymphohistiocytosis on DOL 74, which was confirmed and reported on DOL 77 with recommendations for functional studies. Despite marginal overlap with the classic presentation, the diagnosis was confirmed functionally by absent NK cell function. Disease-specific treatment (intravenous immunoglobulin and corticosteroids) was continued, and empiric therapies discontinued on DOL 81. Coagulopathy resolved on DOL 88. The patient is now 23 months old, at home, has normal liver function, and has undergone several surgical procedures for correction of congenital anomalies.
CMH569 was admitted to the PICU on DOL 34 with a blood glucose of 18 mg/dL (FIG. MD 2c). Hypoglycemia persisted despite glucose infusion of >13 mg/kg/min and maximum dose of diazoxide. Testing revealed hyperinsulinemia (6.4 µU/mL with a serum glucose of 37 mg/dL). The infant-parent trio was enrolled on infant DOL 41. STATseq yielded a genotype suggestive of ABCC8-associated familial hyperinsulinism, type 1, which was reported provisionally on DOL 45. The presence of a single, paternally derived mutation and clinical presentation suggested the focal form of familial hyperinsulinism (FHI; pancreatic adenomatous hyperplasia that involved a portion of the pancreas), caused by biallelic mutations in ABCC8. Focal FHI is inherited autosomal dominantly, but only manifests when the mutation is on the paternally derived allele and there is somatic loss of the maternal allele in a β cell precursor. The confirmed diagnosis was reported on DOL 50. Fluorodopa positron emission tomography was used to confirm and localize the focal pancreatic lesions, which changed the surgical approach and clinical outcome: Targeted resection of focal pancreatic lesions was performed, avoiding insulin-requiring diabetes mellitus. STATseq shortened the PICU stay, as well as the morbidity (and potential mortality) associated with breakthrough hypoglycemia, by approximately three weeks. The patient is now 19 months old and euglycemic. The patient maintained normal blood glucose during a fasting challenge, indicating no persistent hyperinsulinism.
CMH586 was admitted on DOL 63 for failure to thrive (weight 5th percentile for a 2-week-old, length 6th percentile, head circumference 15th percentile), with lactic acidosis, hypoglycemia and abnormal liver function. Intravenous dextrose increased the lactic acid. Ketosis was minimal and lactate:pyruvate ratio was normal. The empiric diagnosis was pyruvate dehydrogenase complex deficiency, and a modified ketogenic diet was started. STATseq identified reversible cytochrome C oxidase deficiency with a maternally inherited homoplasmic mitochondrial mutation. This diagnosis conferred a highly favorable long-term prognosis, and, thus, changed the clinical impression such that intensive interventions were indicated had the acute clinical course deteriorated. The ketogenic diet was unnecessary, and was discontinued. She is now 17 months old and has normal growth, weight and age-appropriate development.
CMH680 was diagnosed with early infantile epileptic encephalopathy, type 11, resulting in institution of a ketogenic diet and a change in anti-epileptic drug. She is now 16 months old, at home, and continues to have seizures, but has had improvement in electroencephalograms.
In several cases, literature review identified potential treatments that were novel or supported only by anecdotal evidence of efficacy. For example, in CMH809, with PTPN11-associated hypertrophic cardiomyopathy (LEOPARD syndrome), an N-of-1 trial of everolimus, an inhibitor of mTOR-dependent MEK/ERK activation, was internally discussed as a potential therapy, but not implemented. The infant died on DOL 17.
STATseq was feasible in a sustained manner in a NICU/PICU setting, and conferred etiologic diagnoses to a majority of enrolled infants with a wide range of clinical presentations. Since genetic diseases are the leading cause of death in the NICU and PICU, as well as overall infant mortality, these results have broad implications for the practice of neonatology.
The rate of definitive diagnosis by STATseq was 57%, which was significantly higher than that of reference methods (9%). Nine molecular diagnoses were unsuspected prior to STATseq, and thus patients did not receive reference standard testing for these specific genes. In addition, the rapidity of STATseq diagnosis abbreviated the extent of reference standard testing in some cases. The rate of diagnosis by STATseq was higher than that reported for exome sequencing, especially given the absence of consanguinity herein. Several factors may have contributed to this difference. A priori, genome sequencing is more complete than exome sequencing. Parent-infant trios were utilized, which allowed identification of de novo mutations that were the most common mechanism of disease. Clinicopathologic correlation software helped to overcome the interpretive difficulty of broad genetic and clinical heterogeneity in infants, particularly where the clinical overlap of presentations with classic genetic disease descriptions was modest. In fact, the phenotypes of infants were frequently formes frustes of classical genetic disease descriptions, as evidenced by the average STATseq-based diagnosis ranking 806th most likely on a software-derived list of differential diagnoses. In contrast, the average rank among 32 older children diagnosed in a similar manner was 279th. Additionally, the cases reported herein were a select subset of the total NICU and PICU admissions during the study period, with a strong pretest probability of genetic disease. Finally, the higher rate of diagnosis by STATseq may be the result of higher prevalence of genetic disease in a level IV NICU and PICU population, as opposed to the older children reported in prior exome studies. Irrespective, STATseq was effective for genetic disease diagnosis in infants in a level IV NICU or PICU setting.
While STATseq can give a provisional diagnosis of genetic disorders in 50 hours, the fastest time to reported diagnosis herein was 5 days, and the median was 22.5 days. There were several reasons for this. Firstly, some diagnoses were made following improvements in methods or publication of novel disease-gene associations during the study. Secondly, extensive analysis and expert consultation were required in cases where diagnoses differed widely from expected presentations. Thirdly, STATseq is a research test, and confirmation with a clinical test is mandatory before reporting results. Confirmatory Sanger sequencing typically took one week. During the study, however, the FDA granted non-significant risk status to our return of a provisional STATseq-based diagnosis to the treating physician in exceptional cases, where the results were actionable and death was imminently likely. The fastest provisional diagnosis was made in 3 days.
A prerequisite for broad adoption of STATseq in infants is demonstration of improved outcomes. The mortality rate among infants receiving a diagnosis was very high (52% at 120 days). Among infants who died, the average age was 0.5 days at symptom onset, 26 days at enrollment, and 45 days at death. 65% of STATseq diagnoses were reported prior to discharge or death. Thus, the average interval for diagnosis and institution of genotype-directed interventions that could lessen morbidity and mortality was extremely brief. Nevertheless, treating physicians adjudged STATseq diagnoses to have been helpful in acute clinical care in 62% of infants. The principal types of change in care that were associated with diagnoses were in medications, genetic counseling and medical procedures. In four cases, which were described in detail, acute management and/or outcome was substantively and favorably changed, or had the potential to have been changed. Genetic diagnosis also enabled prognostic determination and discussion of institution of palliative care where the prognosis was poor. Palliative care was implemented in 33% of infants receiving genetic diagnoses.
In toto, this experience suggested a novel framework for implementation of genomic medicine in a level IV NICU or PICU. In families desiring the full complement of intensive care, optimal management of each infant could be considered an N-of-1-genome case study, as exemplified by CMH809. This could be accomplished, for example, by the institution of a specific genomic neonatology care team in large level IV NICUs and PICUs, for early ascertainment of candidate patients, facilitation of etiologic diagnosis by STATseq, immediate provision of prognostic and therapeutic guidance and counseling in ultra-rare disorders, and rapid implementation of specialized treatments, services and studies in infants receiving diagnoses.
An unexpected finding was that mortality was significantly higher in infants receiving a diagnosis by STATseq (52% at 120 days) than in those who did not (7%). In addition, palliative care was instituted in a significantly higher number of infants receiving STATseq diagnoses (33%) than those who did not (0%). These findings reflect the poor prognosis for many genetic diseases of infancy, and current absence of ameliorative or curative treatments.
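Comparisons of this kind, in which two small groups are contrasted on a binary outcome such as death by 120 days or institution of palliative care, are commonly assessed with Fisher's exact test on a 2x2 table. The source does not state which statistical test was applied, so the following Python sketch is offered only as an illustration of one reasonable approach. The function name and the counts in the usage line are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. diagnosed / not diagnosed) and columns
    are the outcome (e.g. died / survived). Returns the p-value.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x events in the first row
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table at least as extreme as the
    # observed one (small tolerance guards against float rounding).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical toy counts for illustration only (NOT the study's data):
# group 1: 3 events, 0 non-events; group 2: 0 events, 3 non-events.
print(fisher_exact_two_sided(3, 0, 0, 3))
```

An exact test is chosen here over a chi-square approximation because the group sizes in a single-center cohort of this kind are small.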
This study had several limitations. It was small, retrospective and lacked a randomized, blinded control group. It was limited to infants of <4 months in a single level IV NICU or PICU whose presentation was of a type for which a diagnosis had potential to alter management or genetic counseling. Sufficient time has not elapsed since study inception to ascertain long-term outcomes. The psychosocial impact of diagnoses for parents or healthcare providers was not measured. Fuller assessment of the utility of STATseq to impact infant morbidity and mortality will necessitate additional study, with enrollment at or close to birth, more timely STATseq than achieved herein, and rapid institution of individualized treatment. Some of these limitations will be addressed, and the generalizability of the results reported herein to broader newborn populations will be examined, in a prospective, randomized, blinded study that has recently commenced (clinicaltrials.gov NCT02225522).
In conclusion, STATseq provided genetic diagnoses in a majority of infants of age less than 4 months in a level IV NICU and PICU in whom such diseases were suspected and had a potential to influence clinical management or genetic counseling. STATseq-based diagnoses refined treatment plans in a majority of such infants.
Supplementary Box 1: Retrospective Case Example of 24-Hour Diagnostic Whole Genome Sequencing
Supplementary Box 2: Retrospective Case Example of 24-Hour Diagnostic Whole Genome Sequencing
From the foregoing it will be seen that this invention is one well adapted to attain all ends and objects hereinabove set forth together with the other advantages which are obvious and which are inherent to the structure.
It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.
Since many possible embodiments may be made of the invention without departing from the scope thereof, it is to be understood that all matter herein set forth or shown in the accompanying drawings is to be interpreted as illustrative, and not in a limiting sense.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US15/15956 | 2/13/2015 | WO | 00 |

Number | Date | Country |
---|---|---|
61939654 | Feb 2014 | US |