1. Technical Field
This application relates generally to the determination of the complete genomic sequence of 80 novel human rhinoviruses and to the analysis of the sequence information from these novel sequences in conjunction with sequence information from other, known human rhinoviruses.
2. Background on Human Rhinoviruses
Human rhinovirus (HRV), the disease agent for the common cold, is responsible for ~50% of asthma exacerbations and is one of the factors that can direct the infant immune system towards an asthmatic phenotype (J. M. Gwaltney Jr. et al., N. Engl. J. Med. 275, 1261 (1966); S. L. Johnston et al., Am. J. Respir. Crit. Care Med. 154, 654 (1996); K. G. Nicholson et al., BMJ 307, 982 (1993); D. J. Jackson et al., Am. J. Respir. Crit. Care Med. 178(7):667-72 (2008)). Direct and indirect costs from the common cold and related complications in asthmatics amount to an estimated $60 billion per year in the U.S.A. (A. M. Fendrick et al., Arch. Intern. Med. 163, 487 (2003); K. B. Weiss et al., J. Allergy Clin. Immunol. 107, 3 (2001)). HRVs are single-stranded, positive-sense RNA enteroviruses in the Picornaviridae family and have been catalogued primarily by capsid serotyping relative to a historical repository of 99 strains obtained from clinical specimens. HRVs are classified by their use of either ICAM-1 (89 major viruses) or VLDL (10 minor viruses) as their receptor for cell entry (M. K. Cooney et al., Infect. Immun. 37, 642 (1982)). They have also been characterized by composite sensitivities across a panel of potential therapeutics (K. Andries et al., J. Virol. 64, 1117 (1990)), which have been used to parse the strains into two related drug-reactivity groups. The partial sequences of viral capsid-coding regions, non-coding regions, and a limited number of complete genomes have resulted in a division of the original 99 strains into two species: HRV-A (containing 74 serotypes) and HRV-B (containing 25 serotypes).
The naked RNA genome (no more than ~8 kb in length) is surrounded by a capsid composed of 60 copies each of four structural proteins, denoted VP1-VP4, in an icosahedral configuration (20 triangular faces with 12 vertices). High resolution structural data reveal a star-shaped “plateau” at the 5-fold axis of symmetry, a deep depression (often termed the “canyon”) and a second protrusion at the 3-fold axis. The serotype used to classify HRVs is determined by the C-termini and connecting loops of the capsid proteins.
The HRV genome is translated as a single polyprotein, which is cleaved during translation by virus-encoded proteinases to produce 11 proteins (
Recently, a number of new HRV-like sequences were detected in patients with influenza-like illnesses associated with severe respiratory compromise (S. R. Dominguez et al., J. Clin. Virol., 43, 219 (2008); P. McErlean et al., PLoS. ONE 3, e1847 (2008); S. K. Lau et al., J. Clin. Microbiol. 45, 3655 (2007)). The new viruses have not been cultured, but their sequences indicate they likely represent a third HRV-C species. The lack of whole-genome sequence data for the full cohort of HRVs has made it difficult to understand basic molecular and evolutionary characteristics of the viruses and has hampered investigations of the epidemiology of upper respiratory tract infections and asthma epidemics. The present invention addresses this lack of data.
The present invention is based on the determination of the complete, novel genomic sequences of 80 human rhinoviruses (HRVs). These genomic sequences (SEQ ID NOs: 1-80) were obtained through the diligent efforts of the inventors as described herein, and details concerning each of these novel sequences are provided in Table 1 (the novel sequences determined herein are marked with a double asterisk in column (1) of Table 1). The particular viruses from which the novel sequences were obtained include some of the 99 HRV serotypes maintained in the American Type Culture Collection (http://www.atcc.org) as well as 10 additional viruses obtained from field samples. The complete genomic sequences of the novel HRVs have been submitted to GenBank and the accession number of each of these sequences is provided in Table 1.
The present invention is further based on the analysis of the genomic sequence information obtained from the 80 novel HRVs in conjunction with the published sequences of the additional 58 HRVs provided in Table 1. In a first aspect, the present invention is directed to use of this genomic sequence information in providing important and useful information concerning the structural and functional relationship between different HRV serotypes. In particular, a phylogenetic relationship based on the sequence information of the 138 HRVs listed in Table 1 was determined and is shown in
The structural and functional relationships between different serotypes can be used in a variety of manners including the following:
1. Development of a diagnostic test. The individual serotype sequences may form the basis for diagnostic tests for screening individuals for HRV infections, such as through the use of a gene chip containing all of the sequences in a known pattern. A sample from a subject can be applied to the chip to determine whether the subject is infected with a particular HRV serotype. Treatment can then be tailored to the specific serotype. The skilled artisan will understand that there are a variety of ways in which the sequence information, and the structural and functional information based thereon, can be used in the development of diagnostic tests. As such, diagnostic testing is not limited to the use of gene chips.
In a similar fashion, consensus sequences determined through the genetic analysis discussed herein can also be used in the diagnosis and prognosis of HRV infection. For example, a consensus sequence can be included on a gene chip with diagnostic polynucleotides for a variety of different viral and/or bacterial infections. In this manner one chip can serve as the basis for diagnosing a whole range of infections. The skilled artisan will again understand that there are a variety of ways in which the consensus sequence information, and the structural and functional information based thereon, can be used in diagnostic assays. As such, diagnostic testing is not limited to the use of gene chips.
2. Development of therapeutic agents to treat HRV infection. Information concerning structural and functional relationships between different HRV serotypes can be used in the development of therapeutics. Detailed knowledge concerning consensus sequences among serotypes, from both a functional and structural standpoint, can be used to facilitate the development of therapeutics that are effective against a large number of different serotypes. Similarly, treatments can be tailored to specific HRV serotypes based on unique properties discovered from the genomic sequence information, both polynucleotide and polypeptide sequences. Both the nucleic acid and amino acid sequences of specific serotypes, as well as consensus sequences, can be used, for example, in molecular modeling of small chemical agents that specifically target and bind the sequences, thereby disrupting the activity of the virus.
3. Development of vaccines to prevent HRV. The amino acid sequences of specific HRV serotypes provided herein can be used to prepare vaccines against specific serotypes. Similarly, information concerning structural and functional relationships between different HRV serotypes can be used in the development of vaccines. Consensus regions of the genomic sequence and proteins encoded thereby can be targeted in the development of vaccines against a large number of different serotypes. Thus, while vaccines may be developed against specific serotypes, vaccines can also be developed against different groups of serotypes. Potentially, only a small number of vaccines would need to be included in one formulation to provide protection against most or all serotypes.
4. Following the progression of disease. There is growing evidence that asthma may be persistent if the HRV infection is slow to be, or never, cleared. Thus a test can be developed to monitor the titers of HRV from biological samples, such as nasal washes or bronchoscopy-derived fluids in patients with asthma or COPD. The results of such tests can be used to guide therapeutic decisions (e.g., anti-HRV or antibiotics for secondary infections). The knowledge gained from the genomic sequence information provided herein can be used in such tests.
Moreover, the genomic sequence information can be used to survey for mutations that occur among the general population of individuals that are being infected with HRV, such as through the use of the gene chips described above. Advance warning of mutations in particular regions can serve as the basis for developing vaccines and treatments to minimize the risk of an epidemic for serotypes causing more severe disease.
In a second aspect, the present invention is directed to the complete genomic sequence of the 80 newly sequenced HRVs (SEQ ID NOs: 1-80). These sequences are the complete genomic sequences of both previously described and newly discovered HRVs. The serotypes from which these 80 sequences were obtained and sequenced are marked with a double asterisk in the first column of Table 1. Thus, in one embodiment the present invention is directed to the genomic sequence of each of the 80 different HRVs marked with a double asterisk in the first column of Table 1. The GenBank accession number for the sequence information is provided in Table 1 as well (column (7)).
In a third aspect, the present invention is directed to the polypeptide sequences encoded by the genomic sequence of the 80 newly sequenced HRVs. As discussed herein, the genomic sequence of each HRV serotype encodes 11 different proteins. Thus, in one embodiment the present invention is directed to each of the polypeptide sequences of each of the 80 newly sequenced HRVs. In a second embodiment the present invention is directed to polypeptide homologues having at least 95% amino acid sequence identity with each of the polypeptide sequences of each of the 80 newly sequenced HRVs. In a related embodiment, each of the polypeptide homologues has substantially the same activity as the polypeptide with which it is a sequence homologue. In a third embodiment, the present invention is directed to antibodies that specifically bind to the polypeptides and polypeptide homologues of the present invention. Preferably the antibody is a monoclonal antibody, a polyclonal antibody, or an active fragment of such an antibody.
In a fourth aspect, the present invention is directed to methods for assaying for the presence of an HRV serotype in a subject. The assay may be a diagnostic assay. The HRV serotype may be any HRV serotype, including those serotypes populated by the 138 HRVs set forth in Table 1. The diagnostic assay may be an assay for the presence of one or more HRV polynucleotide sequences, and/or the presence of one or more HRV polypeptide sequences. The assay may use a biological sample obtained from a subject. The method may be performed by obtaining a biological sample from a subject suspected of having an HRV infection and assaying the sample for the presence of one or more HRV polynucleotide sequences, and/or the presence of one or more HRV polypeptide sequences.
The present invention provides the complete, novel genomic sequences of 80 HRVs (SEQ ID NOs: 1-80). Descriptive information concerning these sequences is provided in the five tables included herein, with Table 1 providing the GenBank accession number (column (7)) for each of the different genomic sequences. Of the 138 different HRVs, 80 of the genomic sequences are novel (those HRVs marked with a double asterisk in column (1) of Table 1).
As discussed above, the genomic sequence information of each of the 138 different HRVs forms the basis for each aspect of the invention. The present invention is directed to use of the genomic sequence information in providing important and useful information concerning the structural and functional relationship between different HRV serotypes. In particular, a phylogenetic relationship between all known HRV serotypes based on genomic sequence data was determined and is shown in
In a first aspect, the structural and functional relationships between different serotypes can be used in a variety of manners including: (i) development of diagnostic and prognostic tests for HRV infection, (ii) development of therapeutic agents for use in the treatment of HRV infections, (iii) development of vaccines to prevent HRV infection, and (iv) serving as a basis from which to follow the progression of disease in a subject or a population of subjects.
The skilled artisan will readily understand the wide variety of uses to which the genomic sequence information, and the structural/functional information produced from the sequence information, can be put. Without being bound by a specific group of assays, procedures or techniques, polynucleotide and polypeptide sequences, both those of specific HRV serotypes and HRV consensus sequences, can be used in gene chips for diagnostic tests, for assaying potential HRV therapeutics and vaccines, and to monitor HRV infections in individuals. Similarly, the polynucleotide and polypeptide sequences can be used in affinity columns, Southern blots, western blots and northern blots, and to design primers for whole genome sequencing of HRVs.
In a second aspect, the present invention is directed to the complete genomic sequence of the 80 newly sequenced HRVs. The serotypes from which these 80 sequences were obtained and sequenced are marked with a double asterisk in the first column of Table 1. Thus, in one embodiment the present invention is directed to the complete genomic sequence of each of the 80 different HRVs marked with a double asterisk in the first column of Table 1. The GenBank accession number for the sequence information is provided in Table 1 as well (column (7)).
In a related aspect, the present invention is directed to the individual open reading frames (ORFs) contained within the 80 different HRV genomic sequences. As discussed above, each of the single-stranded genomic sequences encodes 11 different proteins. The specific boundaries of the ORFs are provided in Tables 1 and 3. The skilled artisan will readily understand how to determine the specific ORF of each of the genomic sequences.
The present invention further comprises polynucleotide homologues of each of the HRV ORFs. As used herein, the polynucleotide homologues of the present invention include those polynucleotides having at least about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to any of the HRV ORFs encompassed within the genomic sequences of the HRV serotypes included herein. In one embodiment the polynucleotide homologues encode polypeptides that have substantially the same activity as the polypeptides encoded by the polynucleotides with which they are homologous (i.e., the wild-type polynucleotide). As used herein, substantially the same activity means about 100%, 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, or 10% of the activity of the wild-type polypeptide.
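By way of illustration only, the percent sequence identity thresholds recited above can be evaluated on a pair of pre-aligned sequences as sketched below. The specification does not state which identity convention or alignment tool is used, so the convention here (matches divided by the number of alignment columns in which neither sequence has a gap) is an assumption, and the function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped.

    Assumption: both inputs are rows taken from the same alignment, so they
    have equal length. Other denominators (e.g. the shorter sequence length)
    are also in common use and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """True if the pair reaches the given percent identity (e.g. 70, 95, 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The same function applies to aligned amino acid sequences, since it only compares characters column by column.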
In a third aspect, the present invention is directed to the polypeptide sequences encoded by the genomic sequence of the 80 newly sequenced HRVs. As discussed herein, the genomic sequence of each HRV serotype encodes 11 different proteins. Thus, in one embodiment the present invention is directed to each of the polypeptide sequences encoded by each of the ORFs of the 80 newly sequenced HRVs. An alignment of the polypeptide sequences of the 80 newly sequenced HRVs, along with the previously disclosed polypeptide sequences of the other HRVs shown in Table 1, is provided in Table 5.
In a related aspect the present invention is directed to polypeptide homologues of the polypeptide sequences of each of the 80 newly sequenced HRVs. As used herein, the polypeptide homologues of the present invention include those peptides and polypeptides having at least about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to any of the HRV peptides or polypeptides described herein. In a preferred embodiment the polypeptide homologues have substantially the same activity as the polypeptide to which they are homologous (i.e., the wild-type polypeptide). As used herein, substantially the same activity means about 100%, 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, or 10% of the activity of the wild-type polypeptide.
In a further related aspect, the present invention is directed to antibodies that specifically bind to the polypeptides and polypeptide homologues of the present invention. Preferably the antibody is a monoclonal antibody or a polyclonal antibody, including chimeric and humanized monoclonal and polyclonal antibodies, or an active fragment of such an antibody. Active fragments include single chain antibodies, Fab fragments, and F(ab)2 fragments. The skilled artisan will readily understand how to produce the antibodies of the present invention.
In a fourth aspect, the present invention is directed to assays for the presence of an HRV serotype in a subject. The assay may be a diagnostic or prognostic assay. The HRV serotype may be any HRV serotype, including those populated by the 138 HRVs set forth in Table 1. The diagnostic or prognostic assay may be an assay for the presence of one or more complete or partial HRV genomic sequences, including specific ORFs or portions thereof, and untranslated regions or portions thereof. The diagnostic assay may further be an assay for the presence of one or more HRV polypeptide sequences, or one or more portions of an HRV polypeptide sequence. The assay may use a biological sample obtained from a subject. As used herein, the biological sample obtained from a subject may be any biological sample that could contain some amount of an HRV polynucleotide or HRV polypeptide. The biological sample can be, for example (but is not limited to), nasal wash, nasal discharge, nasal scraping, nasal epithelial sampling (q-tip, spatula, etc.), tracheal wash, bronchial wash/lavage, bronchial epithelial biopsy, bronchial epithelial brushing, bronchial smooth muscle biopsy, bronchoalveolar lavage or lung biopsy. As used herein, “subject” is intended to mean an animal, such as a mammal, including humans and animals of veterinary or agricultural importance, such as dogs, cats, horses, sheep, goats, and cattle. The biological sample can also be from tissue culture cells that have been infected with a biological sample from a subject, which allows for replication of the HRV to produce larger amounts of virus for subsequent studies.
In other aspects of the present invention the HRV genomic sequence information, and the structural and functional information derived therefrom, both that described herein and that later developed from the sequence information, can be used in the diagnosis, prognosis, treatment, and surveying of a number of different diseases including: rhinitis, sinusitis, common cold, upper respiratory tract infection, otitis media, pneumonia, asthma (where HRV is a causative agent), asthma exacerbation (where HRV infection is not the causative agent but instead exacerbates the condition), persistent asthma (where uncleared HRV infection causes persistent asthma), and chronic obstructive pulmonary disease (COPD) exacerbation (where HRV infection is not the causative agent but instead exacerbates the condition). That is, the HRV genomic sequence information, and the structural and functional information, can be used in developing screening assays, means for treatment and therapy, means for prevention and prophylaxis, and means for monitoring disease progression in a number of diseases caused by or related to different HRV serotypes.
Thus, the skilled artisan will readily understand and appreciate that the genomic sequence information, and the structural and functional information derived therefrom, can be used in a large number of different manners, both those contemplated and discussed herein, and those that may be later developed.
I. HRV Genome Sequencing
To define the extent and nature of human rhinovirus (HRV) diversity and evolution, the genome of every HRV in the reference repository that did not already have a full genome sequence was sequenced, completing the full set of 99 serotypes, as were 10 new field samples (Table 1). A full (or complete) genome sequence is defined as the contiguous sequence, without gaps, from the first (or nearly first) nucleotide to the last (or nearly last) nucleotide of the HRV genome. Modifications (A. Djikeng et al., BMC Genomics 9, 5 (2008)) were made to the sequence-independent, single-primer amplification (SISPA) method (see below) in order to determine the complete genomes of 70 HRVs from the reference repository, as well as 10 nasal wash samples from patients with HRV upper respiratory tract infections. The viruses were sequenced at an average of 6× complete coverage for each of the ~7 kb genomes.
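As a rough illustration of what an average of 6× coverage of a genome of about 7 kb implies, mean coverage can be approximated as the total number of sequenced bases divided by the genome length. The sketch below uses that idealized relationship; the read length of 700 nt in the usage note is an assumed value chosen only for the arithmetic, and the function names are hypothetical. Real libraries cover a genome unevenly, so this estimate says nothing about gaps.

```python
import math


def mean_coverage(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Idealized mean coverage: total sequenced bases / genome length."""
    return n_reads * mean_read_len / genome_len


def reads_needed(target_coverage: float, mean_read_len: float, genome_len: int) -> int:
    """Smallest whole number of reads whose total length reaches the target."""
    return math.ceil(target_coverage * genome_len / mean_read_len)
```

Under these assumptions, `reads_needed(6, 700, 7000)` returns 60, and `mean_coverage(60, 700, 7000)` returns 6.0.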
II. HRV Genome Analysis
To provide accuracy for determining the differences and commonalities between HRVs, and the phylogenetic relationship between these organisms with relatively small genomes and (often) high degrees of sequence similarity, a stringent approach was taken for aligning the sequences. The initial sequence fits for the polyproteins were performed on the basis of superimposition of the amino acid sequences within virion crystal structure maps (A. C. Palmenberg, J.-Y. Sgro, in Molecular Biology of Picornaviruses, B. L. Semler, E. Wimmer, Eds. (ASM Press, New York, 2001)) and supplemented with additional structure data from other viral proteins. In a step-wise manner, profile hidden Markov models (HMM) augmented the founder set with the remaining sequences (see below). The published sequences (including redundant determinations) for the remaining serotypes were added so that the final collection consisted of 138 full-length HRV genomes, including at least 1 representative for each of the 99 original strains, 10 field samples, and 7 HRV-C strains (Table 1).
Regardless of species, all HRVs were found to have similar average base compositions. They are rich in A (31-34%) and U (25-30%), but low in G (19-22%) and C (18-22%). The third codon positions have the highest composition skew. An identity matrix prepared for the polyproteins (
HRVs encode a single open reading frame (ORF) representing about 90% of the RNA length. Translation produces a polyprotein that is subsequently cleaved in a viral protease-dependent cascade to form the 11 mature viral proteins required to initiate and sustain an infection (
HRV RNA Structures. All enteroviruses encode 5′-terminal cloverleaf-like motifs (CL) which bind viral and cellular proteins for the initiation of RNA synthesis and also help convert infecting genomes from translation to replication templates. The HRV CLs (80-84 b) were predicted in every sequence with minimal structural variation among the species (representative structures are shown
Picornaviruses use internal ribosome entry sites (IRES) to mediate translation initiation of their polyprotein ORFs. The IRESes of all enteroviruses (termed Type-1 IRESes) are thought to bind 40S subunits internally within their 5′-UTR, and to then scan additional nucleotides to find the proper initiator AUG (R. J. Jackson, in Translational Control, W. B. Hershey, M. B. Mathews, N. Sonenberg, Eds. (Cold Spring Harbor Laboratory Press, 1996)). Modeling of the known serotypes confirmed that all HRV IRESes start just 3′ to the pyrimidine-rich spacer tract. The internal IRES sequences were found to be highly conserved with an average nucleotide identity of 82%. Indeed, this region of the genome has the greatest degree of identity among all HRV (
Near the bottom of a long (15-20 bp) minimum energy unbranched stem, the ORF AUG was invariably paired with a conserved upstream non-coding AUG (light gray boxes,
The HRV 3′-UTRs (40-60 b) begin with the ORF termination codon and extend to the genetically-encoded poly(A) tail. The ORF terminators themselves (dark gray boxes in
Phylogenetic Relationships of the HRV. Multiple methods were used to compute and compare phylogenetic trees for the aligned RNA and protein data.
According to the tree(s), HRV-A and HRV-C share a common ancestor, which is a sister group to the HRV-B. Although the HRV-C clade currently has only seven full sequences, its genetic origin is clearly different from the reference set, and the phylogeny indicates that these represent a third HRV species, as has recently been recommended to the International Committee on Taxonomy of Viruses (on the basis of these data, the International Committee on Taxonomy of Viruses Picornavirus Study Group recently proposed to recognize them as a new species (Nick Knowles, personal communication)). The HRV-C have yet to be cultured or assessed for immunological cross-reactivity, but the sequence space occupied by the available samples suggests that there may be many additional HRV-C strains awaiting discovery. Distance extrapolations relative to the new full reference cohort predict the HRV-C may have an even broader range of serotypes than the original 99, of which each confers only limited immunologic cross-protection to another.
A separate phylogenetic finding was the unexpected basal divergence within HRV-A of a small (n=3) group of distinct strains, denoted clade-D (hrv-80 and hrv-95 are designated as two different serotypes by ATCC, but differ by ~150 nucleotides, perhaps indicating a misidentification in the repository). Although the major basis for discriminating clade-D from other HRV-A lies in their general, genome-length sequence divergence, these particular isolates have RNA elements, such as the cis-acting replication element, the 3′-UTR terminal loop feature (see above and
Recombination in HRV. Earlier sequencing reports with a subset of HRV reference genomes concluded that RNA recombination was not a major mechanism for HRV diversity (A. L. Kistler et al., Virol. J. 4, 40 (2007); P. Simmonds, J. Virol. 80, 11124 (2006)), and asserted that known isolates were independently segregating entities. The potential for recombination has been re-evaluated herein by scanning the full reference set and the new field strains with a suite of recombination detection programs (D. Martin, E. Rybicki, Bioinformatics 16, 562 (2000)) relying on phylogenetic distance and sequence similarity. Stringent criteria (probability P value of <0.00001 from 2 or more analysis modes) identified 23 genomes with probable origins resulting from at least twelve independent recombination events.
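The stringency rule stated above (a P value below 0.00001 from two or more analysis modes) can be sketched as a simple filter over candidate events. The input format, the event identifiers, and the method labels below are hypothetical placeholders for illustration; they are not output of the recombination detection programs cited above, and the specification does not describe how the programs' results were tabulated.

```python
from collections import defaultdict

P_CUTOFF = 1e-5   # P < 0.00001, as stated in the text
MIN_METHODS = 2   # supported by 2 or more analysis modes


def supported_events(calls, p_cutoff=P_CUTOFF, min_methods=MIN_METHODS):
    """Return sorted event ids passing the cutoff in enough distinct methods.

    calls: iterable of (event_id, method_name, p_value) tuples
    (hypothetical format). Repeated calls from the same method count once.
    """
    passing_methods = defaultdict(set)
    for event_id, method, p_value in calls:
        if p_value < p_cutoff:
            passing_methods[event_id].add(method)
    return sorted(
        event_id
        for event_id, methods in passing_methods.items()
        if len(methods) >= min_methods
    )
```

For example, an event reported at P = 1e-7 by one method and P = 1e-6 by a second would be retained, whereas an event reported twice by the same method, or at P = 1e-3 by its second method, would not.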
Supportive information and additional statistics are provided in Table 3. Of the recombination locales suggested by these events, the majority (10 of 12) involve the 5′-UTR or the adjacent capsid genes, which seemingly have been collectively rearranged to produce at least 20 separate progeny strains. Among the 138 full-length sequences, hrv-54 (or its ancestor) was apparently the most active in recombination. Field strain hrv-54-f05 links closely to three separate events (see
While the sequence fingerprints clearly trace these ancestral patterns, nevertheless all extant progeny also showed evidence of subsequent sequence divergence within the exchanged regions. Recombination was not identified between isolates from different species (i.e., HRV-A and HRV-B), but receptor binding preferences between ICAM and LDLR apparently presented no barriers to exchange. Major group hrv-54 and minor group hrv-29, for example, both contributed to the common ancestor of the minor group viruses, hrv-31 and hrv-47. These results, particularly for HRVs with different receptor preferences, and those from distant clades, such as the parents hrv-21 and hrv-65, suggest that co-infection of the host with multiple parental strains is not uncommon, and may lead to variant progeny with different biological properties. The field isolates also deviated from the reference sequences in a manner that was not confined to any specific portion of the genome. In fact, field strain deviation relative to the reference isolates (for example, hrv-52-f01 vs hrv-52 differ by 838 nucleotides) was frequently greater than that observed between pairs of characterized serotypes (hrv-44 and hrv-29 differ by 385 nucleotides). Indeed, field samples of the same serotype collected from the same geographical region within one year showed marked variability (Table 4 and
In summary, the data complete and define the full set of genome-length sequences in the canonical reference repository of 99 HRV-A and HRV-B serotypes. Alignment and examination of these genomes confirmed species-specific sequence and RNA structural elements which differentiate the HRV-A and HRV-B from newly described HRV-C, and further suggest the HRV-A harbor a distinct, uncharacteristic clade, which may represent a fourth species (HRV-D). Local sequence variation, particularly in the 5′-UTR, characterized each isolate within regions associated with the pathogenic potential for other picornaviruses. Parallel RNA structure comparisons defined several 5′ and 3′ elements as common to all isolates, and unique to the HRV. Motifs like the AUG-presenting 5′ ORF initiation stem, or the UAG-presenting 3′ stem, may contribute to HRV-specific IRES translation mechanisms, ORF termination, or polymerase recognition. Clear evidence for repeated, historical genome recombination was also found embedded within multiple sequences, including recent field isolates. Co-infection with multiple HRVs is known to occur (C. Savolainen, M. N. Mulders, T. Hovi, Virus Res. 85, 41 (2002)), and it is now known that this can lead to unique strains that may have distinct biologic properties and clinical characteristics. The required host environment for HRV recombination is not known, but with complete genome sequences from additional patient isolates such factors may become apparent. The repository dataset provides a baseline framework for the analysis of additional HRVs that may be in communities, including the HRV-Cs, and will enable larger scale studies of basic molecular and evolutionary characteristics and assignment of disease phenotypes to specific genome regions. Indeed, the recombinations and mutations found in all regions of these genomes suggest that future HRV epidemiologic studies might benefit from full genome sequencing rather than the more limited serotyping. With such an approach, correlations may be more informative in inferring pathogenic potential, and in designing antiviral agents and vaccines.
III. Experiments
A. HRV Samples and Preparation of Viral Nucleic Acids
The HRV reference repository was considered to include all unique HRV serotypes available from American Type Culture Collection (http://www.atcc.org). These amounted to a total of 99 HRVs, as the previously designated “HRV-Hanks” is now considered hrv-21, hrv-87 is now classified as a strain of EV-86, a serotype within the human enterovirus D species, and hrv-1b and hrv-1a are now considered hrv-1. One described HRV serotype (hrv-57) was not available from ATCC; a field sample whose sequence was consistent with previously reported regions of the hrv-57 genome was used instead, and this is referred to as a reference sample. Additional field samples were obtained from the Wisconsin State Laboratory of Public Hygiene, collected from 2005-2006. These isolates were amplified in HeLa cells, the virus was concentrated, then snap frozen. Full genome sequencing was performed as described below with 10^5-10^6 virions from the contents of one vial of frozen virus provided by the aforementioned sources. Briefly, viral RNA and DNA were prepared in a manner previously described in detail (A. Djikeng et al., BMC Genomics 9, 5 (2008); T. Allander, S. U. Emerson, R. E. Engle, R. H. Purcell, J. Bukh, Proc. Natl. Acad. Sci. U.S.A. 98, 11609 (2001); T. Allander et al., Proc. Natl. Acad. Sci. U.S.A. 102, 12891 (2005)) with minor modifications. Each biological sample was first spun to remove cellular debris and processed through a 0.22 μm filter to enrich viral particles in the flow-through while retaining bacteria and other cells in the filter. To eliminate residual nucleic acid contaminants in the filtrate, 100 units of DNAse I and 3 μg RNAse A were added to the viral resuspension, followed by incubation at 37° C. for 1 hour. RNA was extracted with Trizol-LS reagent (Invitrogen, Carlsbad, Calif.) according to the manufacturer's instructions. The RNA pellet was resuspended in 20 μl of nuclease-free water.
B. Construction of a Library of Random PCR Fragments and Sequencing
The extracted RNA was processed as previously described (1-3). Briefly, 800 ng of purified RNA was reverse-transcribed with SSII Reverse Transcriptase (Invitrogen) using the FR26RV-N primer (5′ GCC GGA GCT CTG CAG ATA TCN NNN NN 3′; SEQ ID NO: 81) at a concentration of 1 μM. In addition, primer FR40RV-T (5′ GCC GGA GCT CTG CAG ATA TC (T)20 3′; SEQ ID NO: 82) was added at a concentration of 5 nM to specifically amplify the polyadenylated 3′ end. The second strand was synthesized by addition of Klenow exo-polymerase (New England Biolabs, Ipswich, Mass.) in the presence of the FR26RV-N random primer. To capture the 5′ end, the Klenow reaction was supplemented with 10-30 nM of primers FR30RVA (5′ GCC GGA GCT CTG CAG ATA TC TTA AAA CTG G 3′; SEQ ID NO: 83) and FR30RVB (5′ GCC GGA GCT CTG CAG ATA TC TTA AAA CAG C 3′; SEQ ID NO: 84) where the final 10 bp match the 5′ ends of A-type and B-type rhinoviruses, respectively.
PCR amplification used high fidelity Taq Gold DNA polymerase (Applied Biosystems, Foster City, Calif.) with the FR20RV primer (5′ GCC GGA GCT CTG CAG ATA TC 3′; SEQ ID NO: 85). PCR amplicons were A-tailed with dATP and 5 units of low fidelity DNA polymerase (Invitrogen) at 72° C. for 30 minutes. A-tailed PCR amplicons were fractionated on a 1% agarose gel and fragments between 500 and 1000 nt were extracted. Amplicons were ligated en masse into the Topo TA cloning vector (Invitrogen) and transformed into competent one-shot Topo top 10 bacteria (Invitrogen). Cells were plated on LB/Amp/XGal agar, and individual colonies were picked for sequencing. The inserted fragments were sequenced bidirectionally with the M13 primers from the Topo TA vector. A total of 192 fragments or more were routinely sequenced per library. Sequencing reactions were performed at the Joint Technology Center, Rockville, Md., on an ABI 3730xl sequencing system with Big Dye Terminator chemistry (Applied Biosystems).
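The primers listed above share a common 20-nt 5′ tag, which is what allows a single primer (FR20RV) to amplify every tagged fragment. The following is a minimal sketch, in Python, that checks this relationship using only the sequences given in the text; the function name `split_primer` and the dictionary names are hypothetical and are introduced for illustration only.

```python
# Common 5' tag (FR20RV, SEQ ID NO: 85), spaces removed.
TAG = "GCCGGAGCTCTGCAGATATC"

# Tagged primers as listed in the text; N denotes any base.
PRIMERS = {
    "FR26RV-N": "GCCGGAGCTCTGCAGATATC" + "N" * 6,    # random hexamer tail
    "FR40RV-T": "GCCGGAGCTCTGCAGATATC" + "T" * 20,   # oligo(dT) tail
    "FR30RVA": "GCCGGAGCTCTGCAGATATCTTAAAACTGG",     # A-type 5' end
    "FR30RVB": "GCCGGAGCTCTGCAGATATCTTAAAACAGC",     # B-type 5' end
}


def split_primer(seq, tag=TAG):
    """Return (tag, tail) if seq starts with the common tag, else raise."""
    if not seq.startswith(tag):
        raise ValueError("primer does not carry the common tag")
    return tag, seq[len(tag):]


# Target-specific tail of each primer (hypothetical helper mapping).
TAILS = {name: split_primer(seq)[1] for name, seq in PRIMERS.items()}
```

Under this reading, the number in each primer name equals its total length (20-nt tag plus tail), and the tails of FR30RVA and FR30RVB are the 10-nt sequences said to match the 5′ ends of A-type and B-type rhinoviruses.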
C. Assembly of Viral Genomes
Sequence reads were downloaded, trimmed to remove primer sequence as well as low quality sequence, and assembled with the program ELVIRA, the Executive for Large-scale Viral Assembly (http://sourceforge.net/projects/elvira). Additional manual inspections identified ambiguities or potential single nucleotide variants, which were interrogated by further RT-PCRs, cloning, and sequencing. To close gaps between assembled contigs, strain-specific primers were utilized. Additional primer design, cDNA synthesis and sequencing were performed to ensure at least 4× sequence coverage along all genomes.
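The 4× coverage criterion can be illustrated with a short Python sketch that computes per-base read depth from read placements and reports the regions that would need further primer design and sequencing. This is not part of ELVIRA; the function names, the half-open 0-based coordinate convention, and the assumption that every read lies within the genome are illustrative choices only.

```python
def coverage(genome_length, reads):
    """Per-base depth; reads are (start, end) intervals, 0-based, half-open."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1   # read begins covering at start
        diff[end] -= 1     # read stops covering at end
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def low_coverage_regions(depth, minimum=4):
    """Return half-open (start, end) intervals where depth < minimum."""
    regions, start = [], None
    for i, d in enumerate(depth):
        if d < minimum and start is None:
            start = i
        elif d >= minimum and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions
```

Each interval returned by `low_coverage_regions` marks a stretch where additional strain-specific primers and sequencing would be required before the genome is considered complete.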
D. Polyprotein and RNA Genome Alignments
Founder data for the HRV polyprotein alignment are from superimposition of virion crystal structure hydrogen-bonding maps as described (A. C. Palmenberg, J.-Y. Sgro, in Molecular Biology of Picornaviruses, B. L. Semler, E. Wimmer, Eds. (ASM Press, New York, 2001)). Profile hidden Markov models, derived from the founder data, were progressively augmented with published sequences and those derived from this study. The HMMER program suite (Accelrys, San Diego, Calif.) reported both high and low road fits, building the alignment possibilities with additions from highest to lowest similarity. Each insertion and deletion (indel) in the output iterations was examined and eased within the confines of its high road and low road fits, to maximize conservation of viral cleavage sites, catalytic sites, determined structure landmarks (P1, 2A, 3B-3D) and sequence similarity. The HMM profile of composite HRVs was used to fit three additional HEV-C out-group sequences into the final alignment. Reverse translation of the polyprotein alignment relative to the original RNA sequences formed a core ORF alignment with analogous indels. Preliminary tree-building exercises identified 20 sequences representative of the dominant HRV clades, including three examples from the putative HRV-C species. The complete genome of each selected sequence was analyzed separately by mFOLD (A. C. Palmenberg, J.-Y. Sgro, Seminars in Virology 8, 231 (1997)). The consensus topography (optimal plus 100 suboptimal structures) for each of the 5′ and 3′ regions of each fold was superimposed into common alignments which maximized analogous base-pair superimposition (5′ cloverleaf, IRES, 3′ stems, etc.) and minimized indels. These foundations were converted into HMM profiles to which the remaining UTR sequences of all HRV and out-groups were fit.
In the interest of alignment length, the HEV-C out-group sequences were truncated to remove a 5′ fragment without analog in the HRV (100 b ribosome read-through, 5′ to ORF AUG). The 5′, ORF and 3′ aligned segments for all included sequences were joined contiguously into full-length genome alignments. Again, each indel in the file was re-examined for plausibility, consistency and biological conservation, before the composite alignment was finalized.
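The reverse-translation step described above, in which the polyprotein alignment is mapped back onto the original RNA so that the ORF alignment carries analogous indels, can be sketched in Python as follows. This is an illustrative reconstruction only; the function name `thread_codons` is hypothetical, and the sketch assumes the RNA ORF contains exactly one codon per aligned residue (any stop codon is excluded).

```python
def thread_codons(aligned_protein, rna_orf, gap="-"):
    """Thread an ungapped RNA ORF through a gapped protein alignment row.

    Every residue is replaced by its codon and every protein gap by three
    gap characters, so indels in the RNA row mirror those in the protein row.
    """
    residues = sum(1 for c in aligned_protein if c != gap)
    if len(rna_orf) != 3 * residues:
        raise ValueError("ORF length must be three times the residue count")
    out, pos = [], 0
    for c in aligned_protein:
        if c == gap:
            out.append(gap * 3)
        else:
            out.append(rna_orf[pos:pos + 3])
            pos += 3
    return "".join(out)
```

Because gaps are inserted only in multiples of three, the resulting ORF row stays in frame, which is what allows the 5′, ORF and 3′ segments to be joined contiguously into a full-length genome alignment.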
E. Phylogenetic Analysis
Phylogenetic analyses on the polyprotein and genome alignments (msf file format) were conducted with MEGA version 4 (K. Tamura, J. Dudley, M. Nei, S. Kumar, Mol. Biol. Evol. 24, 1596 (2007)) and PHYML version 2.4.5 (S. Guindon, O. Gascuel, Syst. Biol. 52, 696 (2003)). HRV sequences from this study (including field strains) were augmented as needed with published data to include at least one representative of the 99 described HRV-A and HRV-B serotypes, and all available (full-length) data from the HRV-C. Multiple tree iterations were evaluated for both the protein (single gene and polyprotein) and RNA genome data, with UPGMA (MEGA), maximum parsimony (MEGA), and neighbor joining (MEGA) methods with bootstrap tests (2000×), and maximum likelihood (PHYML) with approximate likelihood ratio tests (minimum of SH-like and Chi2-based aLRT). None of these methods showed significant topological differences for major branch points with p-values >2 (i.e., 2% change), especially if the 3rd positions of the ORF codons were omitted from consideration in the RNA trees.
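Omitting the 3rd codon positions from an aligned ORF, as mentioned above for the RNA trees, amounts to a simple column filter. The following Python sketch assumes the ORF row is in frame from its first column and that gaps occur in whole codons; the function name `drop_third_positions` is hypothetical.

```python
def drop_third_positions(orf_row):
    """Keep codon positions 1 and 2 of an in-frame aligned ORF row."""
    if len(orf_row) % 3 != 0:
        raise ValueError("aligned ORF length must be a multiple of three")
    # Columns are 0-based, so the third position of each codon has i % 3 == 2.
    return "".join(ch for i, ch in enumerate(orf_row) if i % 3 != 2)
```

Applying the same filter to every row of the alignment yields a reduced alignment in which the most rapidly saturating positions no longer contribute to the tree.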
In the absence of full-genome data for all reference strains, HRV relationships have been approximated according to more limited sequence sets derived from the VP1, VP0, IRES and 3D regions. To determine whether “serotype” (i.e., VP0 or VP1 sequences) and/or other more conserved regions of the genomes (i.e., IRES or 3D gene) were useful indicators of full strain relationships, defined maximum likelihood (ML) topologies, optimal for these regions, were compared statistically with the ML full-genome tree. Optimal ML topologies for the VP1-only (966 b), 3D-only (1389 b), IRES-only (547 b) and VP0-only (86 b VP4+352 b VP2) fragments within the RNA alignment were computed within PhyML, then compared individually against the optimal topology of the full-genome ML tree using PAML and the CONSEL (V0.1i) suite of programs (Shimodaira et al., Bioinformatics 17, 1246 (2001)). The tested (null) hypothesis was the expectation that HRV strains would cluster in reasonable approximations of the full genome data, using only these limited (albeit commonly used) regions of sequence. See also the legends to
F. 5′- and 3′-Structural Predictions
The accepted notion that thermodynamically derived models for phylogenetically related viruses should exhibit common RNA structural motifs, if such motifs are required for biological activity, was considered. The approach herein has been previously described in detail (A. C. Palmenberg, J.-Y. Sgro, Seminars in Virology 8, 231 (1997)). Briefly, rather than form structure predictions on the basis of sequence similarities, folding was undertaken first, and then a search was conducted among the most probable configurations (energy minimization) for regions with consistent structures. As described above for the RNA alignments, full genome sequences for 20-40 hrv, representing different phylogenetic clades, were evaluated in their entirety by mFOLD (A. C. Palmenberg, J.-Y. Sgro, Seminars in Virology 8, 231 (1997)) asking for the optimal, and up to 100 closely related (+12 Kcal) suboptimal configurations. Without exception, the consensus fold for each sequence (required >80% of queried connect files) agreed that the 5′ and 3′ ends of each RNA generally configured independently. That is, few if any segments within these regions made preferred (low energy) long-range contacts with interior portions of the genome. The UTR topologies folded regionally as a series of local, connected motifs. This tendency was confirmed by assessing the P-num values, computed for the whole-genome folds. The pairing number (P-num) is a quantitative measure of the propensity of any given base to become involved with the same or alternative pairing partners in a collection of suboptimal folds. It has been shown for other viral sequences that low P-num bases and their correlate partners usually dominate the most important helices and stems supporting biologically significant motifs, especially within the lowest energy configurations.
These bases and their partners were therefore used to identify and align true homologues (functional analogues) at the primary sequence level, even in regions with less-than obvious conservation. Superimposition of the 5′ and 3′ low P-num motifs from the genome folds formed the core consensus profiles for the RNA alignments in these regions (as described above) and also identified multiple specific conserved motifs throughout the genomes. Once these commonalities were identified for the 5′ cloverleaf, IRES, 5′ ORF initiation stem, cre element, and 3′-UTR, all other sequences in the alignment were re-folded in these regions (mFold, with 50 suboptimals), to confirm that they too had similar, conserved, low energy, low P-num motifs.
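The idea behind the P-num measure can be illustrated with a simplified Python sketch that counts, for each base, the number of distinct pairing partners it adopts across a collection of suboptimal folds. This is an assumption-laden illustration and not the mFOLD implementation; the function name `pnum` and the representation of each fold as a set of 0-based (i, j) base pairs are hypothetical.

```python
def pnum(length, folds):
    """Count distinct pairing partners per base over a set of folds.

    folds is an iterable of folds; each fold is a collection of (i, j)
    base pairs with 0-based indices into a sequence of the given length.
    """
    partners = [set() for _ in range(length)]
    for fold in folds:
        for i, j in fold:
            partners[i].add(j)
            partners[j].add(i)
    return [len(p) for p in partners]
```

In this simplified form, a paired base with a count of 1 keeps the same partner in every fold in which it pairs, which corresponds to the well-determined, low P-num helices described above, while a base that is never paired scores 0 and must be distinguished separately.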
G. Recombination Analysis
The recombination predictions of the genomic sequences, aligned as described above, were conducted with a suite of programs within the RDP3 package (D. P. Martin, C. Williamson, D. Posada, Bioinformatics 21, 260 (2005)). The individual programs RDP (D. Martin, E. Rybicki, Bioinformatics 16, 562 (2000)), Bootscan (M. O. Salminen, J. K. Carr, D. S. Burke, F. E. McCutchan, AIDS Res. Hum. Retroviruses 11, 1423 (1995)), Maximum χ2 (J. M. Smith, J. Mol. Evol. 34, 126 (1992)), Chimaera (D. P. Martin, C. Williamson, D. Posada, Bioinformatics 21, 260 (2005)), SiScan (M. J. Gibbs, J. S. Armstrong, A. J. Gibbs, Bioinformatics 16, 573 (2000)) and 3Seq (M. F. Boni, D. Posada, M. W. Feldman, Genetics 176, 1035 (2007)), were implemented for the analysis. Since no single program provides optimal performance under all conditions, any event supported by evidence from two or more analyses with P-values <0.00001 was considered a result consistent with recombination. Potential recombination events were also assessed by phylogenetic analysis, breakpoint polishing and alignment consistency checks. For each individual program, default settings were used except as specified: RDP, internal reference only, window size for recombination, 100 bp with step size 10 bp; GENECONV, G-scale was set as 3; BootScan, number of bootstrap replicates, 200, window size, 100 bp, step size, 10 bp, model options: Jukes and Cantor, 1969; Maxichi, variable window size was used, strip gap was selected; SiScan, window size, 100 bp with step size 10 bp, P-value permutation number was set as 1000. In additional analyses, to confirm the recombinations that were found, genomic sequences were aligned using progressive alignment methods (C. Notredame, D. G. Higgins, J. Heringa, J. Mol. Biol. 302, 205 (2000)). Briefly, the progressive alignments were performed with non-coding (R. C. Edgar, Nucleic Acids Res. 32, 1792 (2004)) and coding regions (C. Notredame, D. G. Higgins, J. Heringa, J. Mol. Biol. 302, 205 (2000); M. Suyama, D. Torrents, P. Bork, Nucleic Acids Res. 34, W609-W612 (2006)) using the referenced programs and the alignments were concatenated by a custom script. The subsequent final alignment was utilized for recombination predictions using the RDP3 package as described above.
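The acceptance rule stated above (an event is retained when two or more analyses each report a P-value below 0.00001) can be expressed as a short Python sketch. This does not call or reproduce RDP3; the function name `supported_events` and the input layout, a mapping from a hypothetical event identifier to per-method P-values, are assumptions made for illustration.

```python
P_CUTOFF = 1e-5     # P-value threshold stated in the text
MIN_METHODS = 2     # minimum number of supporting analyses stated in the text


def supported_events(results, p_cutoff=P_CUTOFF, min_methods=MIN_METHODS):
    """Return sorted ids of events supported by enough methods.

    results maps an event id to a dict of {method name: P-value}.
    """
    kept = []
    for event, pvals in results.items():
        n_support = sum(1 for p in pvals.values() if p < p_cutoff)
        if n_support >= min_methods:
            kept.append(event)
    return sorted(kept)
```

Events passing this filter would still be subject to the phylogenetic analysis, breakpoint polishing and alignment consistency checks described above.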
IV. Determining HRV Serotype in a Patient Sample
A sample from a patient, e.g., a sample from the respiratory tract (e.g., nose, mouth, larynx, trachea, bronchi, or alveoli), eyes, ears, blood, urine or stool, or from a culture derived from these samples, is sequenced, e.g., using a sequence-independent random priming method well-known in the art (A. Djikeng et al., BMC Genomics 9, 5 (2008); A. C. Palmenberg et al., Science 324:55 (2009)).
Briefly, the sample is passed through a 0.22 micron filter and the filtrate is then treated with RNase A and DNase I at 37° C. for one hour (if the preparation consists of purified HRV this step can be eliminated). The RNA is then extracted by phenol chloroform extraction using methods well-established in the field. Samples are treated with Trizol-LS (Invitrogen, Carlsbad, Calif.) for 5 minutes at room temperature and then chloroform, and the aqueous phase is transferred to a new tube to which isopropanol is added and incubated for 10 minutes at 25° C. and centrifuged for 5 minutes at 4° C. The supernatant is discarded and the pellet is treated with 70% ethanol at 4° C. and centrifuged for 5 minutes at 4° C. The RNA-rich supernatant is removed and the pellet is air dried for 3-5 minutes and resuspended in RNase free water. This can be stored at −80° C. or the next step undertaken.
From the RNA sample ˜800 ng is subjected to reverse transcription to produce DNA by the use of SSII Reverse Transcriptase (Invitrogen) and multiple primers that incorporate random nucleotide sequences. One example is the primer 5′ GCC GGA GCT CAG ATA TCN NNN NN 3′ (SEQ ID NO: 86) where N is A, G, C or T. Ideally, then, the reaction contains 4096 primers (4^6), since this represents all possible combinations of four bases at six positions. Other primers could consist of ˜20-mers of all possible combinations. Also, the number of primers can be less than the theoretical maximal number of combinations. The second strand of DNA can be synthesized using Klenow exo-polymerase (New England Biolabs, Ipswich, Mass.) and the same set of primers to produce double-stranded DNA. However, the RNA:DNA complex formed from the reverse transcriptase can be used directly for the next step without the Klenow fill-in. To capture the 5′ end, the Klenow reaction can be supplemented with 10-30 nM of the primers: 5′ GCC GGA GCT CTG CAG ATA TC TTA AAA CTG G 3′ (SEQ ID NO: 83) or 5′ GCC GGA GCT CTG CAG ATA TC TTA AAA CAG C 3′ (SEQ ID NO: 84) where the final 10 bp universally match the 5′ ends of HRVs. The double-stranded DNA or the RNA:DNA complex is then subjected to a polymerase chain reaction using high-fidelity Taq Gold DNA polymerase (Applied Biosystems, Foster City, Calif.) using a primer such as 5′ GCC GGA GCT CTG CAG ATA TC 3′ (SEQ ID NO: 85). PCR amplicons are A-tailed with dATP and 5 units of low fidelity DNA polymerase (Invitrogen) at 72° C. for 30 minutes. A-tailed PCR amplicons are fractionated on a 1% agarose gel and fragments between 500 and 1000 nt are extracted. Amplicons are ligated en masse into the Topo TA cloning vector (Invitrogen) and transformed into competent one-shot Topo top 10 bacteria (Invitrogen). Cells are plated on LB/Amp/XGal agar, and individual colonies are picked, grown, and DNA prepared.
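The size of a complete random-primer pool follows from counting: with four possible bases at each of six random positions, there are 4^6 = 4096 distinct tails. The Python sketch below enumerates such a pool for the example primer of SEQ ID NO: 86, using the fixed portion exactly as printed in the text; the function name `expand_random_tail` is hypothetical.

```python
from itertools import product

# Fixed 5' portion of the SEQ ID NO: 86 example as printed, spaces removed.
FIXED = "GCCGGAGCTCAGATATC"


def expand_random_tail(fixed, n_random, alphabet="ACGT"):
    """List every primer formed by the fixed part plus a random tail."""
    return [fixed + "".join(tail) for tail in product(alphabet, repeat=n_random)]


PRIMER_POOL = expand_random_tail(FIXED, 6)
```

In practice a synthesized degenerate oligonucleotide is a mixture rather than an enumerated list, so a sketch of this kind is useful mainly for checking the expected pool size or for in silico matching of primer tails against a target sequence.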
The inserted fragments of HRV DNA are sequenced bidirectionally using the M13 primers which are incorporated in the Topo TA vector. One hundred and ninety-two clones or more, if needed, are then sequenced on an Applied Biosystems 3730xl sequencing system with BigDye Terminator chemistry (Applied Biosystems). Sequence reads are downloaded, trimmed to remove primer sequence as well as low quality sequence, and can be assembled with the program ELVIRA (http://sourceforge.net/projects/elvira; see, e.g., Palmenberg et al., Science 324:55 (2009)), or other widely available similar programs. To close gaps between assembled contigs, specific primers can be designed from the existing sequence and the polymerase chain reaction performed. The described procedure results in about four-fold redundancy in the sequence coverage of the entire HRV genome.
To ascertain the specific serotype (or strain) of the HRV isolated from a patient sample, the completed full genome sequence is compared to the 80 full-length novel genome sequences described herein, and optionally the additional 26 that are known from the literature. Although less than full-length sequences can be used for serotype identification, use of shorter sequences (i.e., less than the full genome) will likely result in a higher number of incomplete serotype identifications, because we have ascertained, from patient samples, that HRV recombines in vivo. The software used for comparing HRV sequences from a sample must be able to accommodate all the reference HRVs and the patient samples, and to identify the match using stringent criteria. There are multiple programs that accomplish this, including MUSCLE (http://www.ebi.ac.uk/Tools/muscle/index.html; Edgar, BMC Bioinformatics 5:113 (2004)), MacVector (Accelrys, San Diego, Calif.), and ClustalW (http://www.ebi.ac.uk/Tools/clustalw2/; Thompson et al., Nucleic Acids Res. 22:4673 (1994)).
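The comparison step can be illustrated with a minimal Python sketch that scores a patient-derived genome against reference genomes by percent identity and reports the closest reference. The sketch assumes the sequences have already been aligned to a common length (for example, with one of the alignment programs named above); the function names and the optional `min_identity` threshold are hypothetical, since the text does not specify numerical matching criteria.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where neither aligned row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0


def best_match(query, references, min_identity=0.0):
    """Return (name, identity) of the closest reference, or None.

    references maps a reference name to its aligned sequence; None is
    returned when no reference reaches min_identity (hypothetical cutoff).
    """
    scored = [(percent_identity(query, seq), name)
              for name, seq in references.items()]
    if not scored:
        return None
    identity, name = max(scored)
    return (name, identity) if identity >= min_identity else None
```

Because HRV recombines, a single whole-genome identity score can mask a mosaic genome; scoring individual regions separately with the same function is one way to detect a query whose best match differs along its length.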
While the foregoing specification teaches the principles of the present invention, with examples provided for the purpose of illustration, it will be appreciated by one skilled in the art from reading this disclosure that various changes in form and detail can be made without departing from the true scope of the invention.
All materials referred to herein, including books, manuals, journal publications, posters, abstracts, talks, patents, GenBank records and submissions, and published patent applications, are incorporated herein in their entireties.
This application claims priority to U.S. Provisional Application No. 61/151,302, filed Feb. 10, 2009, which is herein incorporated by reference in its entirety.
This invention was made with U.S. Government support under National Institutes of Health Grant Number U19-A1070503. The U.S. Government has certain rights in this invention.
Number | Date | Country
---|---|---
61151302 | Feb 2009 | US