Methods of making modified viral genomes

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF THE INVENTION

The present invention relates to the creation of an attenuated virus comprising a modified viral genome containing a plurality of nucleotide substitutions. The nucleotide substitutions result in the exchange of codons for other synonymous codons and/or codon rearrangement and variation of codon pair bias.

BACKGROUND OF THE INVENTION

Rapid improvements in DNA synthesis technology promise to revolutionize traditional methods employed in virology. One approach traditionally used to elucidate the functions of different regions of the viral genome relies on extensive but laborious site-directed mutagenesis to explore the impact of small sequence variations in the genomes of virus strains. Viral genomes, however, especially those of RNA viruses, are relatively short, often less than 10,000 bases long, which makes them amenable to whole-genome synthesis using currently available technology. Recently developed microfluidic chip-based technologies can perform de novo synthesis of new genomes designed to specification for only a few hundred dollars each. This permits the generation of entirely novel coding sequences or the modulation of existing sequences to a degree practically impossible with traditional cloning methods.

Such freedom of design provides tremendous power to perform large-scale redesign of DNA/RNA coding sequences to: (1) study the impact of changes in parameters such as codon bias, codon-pair bias, and RNA secondary structure on viral translation and replication efficiency; (2) perform efficient full genome scans for unknown regulatory elements and other signals necessary for successful viral reproduction; and (3) develop new biotechnologies for genetic engineering of viral strains and design of anti-viral vaccines.

As a result of the degeneracy of the genetic code, all but two amino acids in the protein coding sequence can be encoded by more than one codon. The frequencies with which such synonymous codons are used are unequal and have coevolved with the cell's translation machinery to avoid excessive use of suboptimal codons that often correspond to rare or otherwise disadvantaged tRNAs (Gustafsson et al., 2004). This results in a phenomenon termed “synonymous codon bias,” which varies greatly between evolutionarily distant species and possibly even between different tissues in the same species (Plotkin et al., 2004).

Codon optimization by recombinant methods (that is, to bring a gene's synonymous codon use into correspondence with the host cell's codon bias) has been widely used to improve cross-species expression (see, e.g., Gustafsson et al., 2004). Though the opposite objective of reducing expression by intentional introduction of suboptimal synonymous codons has not been extensively investigated, isolated reports indicate that replacement of natural codons by rare codons can reduce the level of gene expression in different organisms. See, e.g., Robinson et al., 1984; Hoekema et al., 1987; Carlini and Stephan, 2003; Zhou et al., 1999. Accordingly, the introduction of deoptimized synonymous codons into a viral genome may adversely affect protein translation and thereby provide a method for producing attenuated viruses that would be useful for making vaccines against viral diseases.

Viral Disease and Vaccines

Viruses have always been one of the main causes of death and disease in man. Unlike bacterial diseases, viral diseases are not susceptible to antibiotics and are thus difficult to treat. Accordingly, vaccination has been humankind's main and most robust defense against viruses. Today, some of the oldest and most serious viral diseases such as smallpox and poliomyelitis (polio) have been eradicated (or nearly so) by world-wide programs of immunization. However, many other old viruses such as rhinovirus and influenza virus are poorly controlled, and still create substantial problems, though these problems vary from year to year and country to country. In addition, new viruses, such as Human Immunodeficiency Virus (HIV) and Severe Acute Respiratory Syndrome (SARS) virus, regularly appear in human populations and often cause deadly pandemics. There is also potential for lethal man-made or man-altered viruses for intentional introduction as a means of warfare or terrorism.

Effective manufacture of vaccines remains an unpredictable undertaking. There are three major kinds of vaccines: subunit vaccines, inactivated (killed) vaccines, and attenuated live vaccines. For a subunit vaccine, one or several proteins from the virus (e.g., a capsid protein made using recombinant DNA technology) are used as the vaccine. Subunit vaccines produced in Escherichia coli or yeast are very safe and pose no threat of viral disease. Their efficacy, however, can be low because not all of the immunogenic viral proteins are present, and those that are present may not exist in their native conformations.

Inactivated (killed) vaccines are made by growing more-or-less wild type (wt) virus and then inactivating it, for instance, with formaldehyde (as in the Salk polio vaccine). A great deal of experimentation is required to find an inactivation treatment that kills all of the virus and yet does not damage the immunogenicity of the particle. In addition, residual safety issues remain in that the facility for growing the virus may allow virulent virus to escape or the inactivation may fail.

An attenuated live vaccine comprises a virus that has been subjected to mutations rendering it less virulent and usable for immunization. Live, attenuated viruses have many advantages as vaccines: they are often easy, fast, and cheap to manufacture; they are often easy to administer (the Sabin polio vaccine, for instance, was administered orally on sugar cubes); and sometimes the residual growth of the attenuated virus allows “herd” immunization (immunization of people in close contact with the primary patient). These advantages are particularly important in an emergency, when a vaccine is rapidly needed. The major drawback of an attenuated vaccine is that it has some significant frequency of reversion to wt virulence. For this reason, the Sabin vaccine is no longer used in the United States.

Accordingly, there remains a need for a systematic approach to generating attenuated live viruses that have practically no possibility of reversion and thus provide a fast, efficient, and safe method of manufacturing a vaccine. The present invention fulfills this need by providing a systematic approach, Synthetic Attenuated Virus Engineering (SAVE), for generating attenuated live viruses that have essentially no possibility of reversion because they contain hundreds or thousands of small defects. This method is broadly applicable to a wide range of viruses and provides an effective approach for producing a wide variety of anti-viral vaccines.

SUMMARY OF THE INVENTION

The present invention provides an attenuated virus which comprises a modified viral genome containing nucleotide substitutions engineered in multiple locations in the genome, wherein the substitutions introduce a plurality of synonymous codons into the genome. This substitution of synonymous codons alters various parameters, including codon bias, codon pair bias, density of deoptimized codons and deoptimized codon pairs, RNA secondary structure, CpG dinucleotide content, C+G content, translation frameshift sites, translation pause sites, the presence or absence of tissue specific microRNA recognition sequences, or any combination thereof, in the genome. Because of the large number of defects involved, the attenuated virus of the invention provides a means of producing stably attenuated, live vaccines against a wide variety of viral diseases.

In one embodiment, an attenuated virus is provided which comprises a nucleic acid sequence encoding a viral protein, or a portion thereof, whose amino acid sequence is identical to the corresponding sequence of a parent virus, wherein the nucleotide sequence of the attenuated virus contains the codons of a parent sequence from which it is derived, and wherein that nucleotide sequence is less than 90% identical to the nucleotide sequence of the parent virus. In another embodiment, the nucleotide sequence is less than 80% identical to the sequence of the parent virus. The substituted nucleotide sequence that provides for attenuation is at least 100, at least 250, at least 500, or at least 1000 nucleotides in length. The codon pair bias of the attenuated sequence is lower than that of the parent virus, being reduced by at least about 0.05, at least about 0.1, or at least about 0.2.
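As an illustration of the two sequence-level criteria in this embodiment (same encoded protein, reduced nucleotide identity to the parent), a minimal Python sketch follows. It assumes in-frame RNA coding sequences of equal length, which holds for synonymous recoding, so identity is computed position by position without an alignment step. The function names are hypothetical; the 90% default mirrors the threshold stated above.

```python
from itertools import product

# Standard genetic code; codons enumerated with each position running U, C, A, G (Table 1 order).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate an in-frame RNA coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def meets_identity_criterion(parent: str, recoded: str, max_identity: float = 90.0) -> bool:
    """True if both sequences encode the same protein and identity is below max_identity."""
    return translate(parent) == translate(recoded) and percent_identity(parent, recoded) < max_identity


# Leu-Leu encoded two ways: same protein, 2 of 6 nucleotides identical (about 33%).
print(meets_identity_criterion("CUGCUG", "UUAUUA"))  # True
```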

The virus to be attenuated can be an animal or plant virus. In certain embodiments, the virus is a human virus. In another embodiment, the virus infects multiple species. Particular embodiments include, but are not limited to, poliovirus, influenza virus, Dengue virus, HIV, rotavirus, and SARS.

This invention also provides a vaccine composition for inducing a protective immune response in a subject comprising the instant attenuated virus and a pharmaceutically acceptable carrier. The invention further provides a modified host cell line specially engineered to be permissive for an attenuated virus that is inviable in a wild type host cell.

In addition, the subject invention provides a method of synthesizing the instant attenuated virus comprising (a) identifying codons in multiple locations within at least one non-regulatory portion of the viral genome, which codons can be replaced by synonymous codons; (b) selecting a synonymous codon to be substituted for each of the identified codons; and (c) substituting a synonymous codon for each of the identified codons.
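Steps (a) through (c) can be illustrated with a minimal Python sketch. It assumes an in-frame RNA open reading frame that already excludes regulatory elements (choosing that region is left to the caller) and a host codon-usage table supplied as a dictionary; the function `deoptimize` and the toy frequencies are hypothetical, not a measured table. Picking the least-used synonym at every position is only one possible selection rule, corresponding to maximal use of nonpreferred codons.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def deoptimize(orf: str, usage: dict) -> str:
    """Replace each replaceable codon with its least-used synonym according to `usage`.

    (a) A codon is replaceable if its amino acid has more than one codon;
        Met, Trp and stop codons are left untouched.
    (b) The synonym with the lowest usage value is selected.
    (c) That synonym is substituted. Codons missing from `usage` count as 0.
    """
    out = []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        aa = CODON_TABLE[codon]
        synonyms = SYNONYMS[aa]
        if aa == "*" or len(synonyms) == 1:
            out.append(codon)
        else:
            out.append(min(synonyms, key=lambda c: usage.get(c, 0.0)))
    return "".join(out)


# Illustrative toy frequencies only, not a measured host codon-usage table.
toy_usage = {"CUG": 40.0, "CUC": 20.0, "CUU": 13.0, "UUG": 13.0, "UUA": 8.0, "CUA": 7.0,
             "GCC": 40.0, "GCU": 27.0, "GCA": 23.0, "GCG": 11.0}
print(deoptimize("AUGCUGGCC", toy_usage))  # AUGCUAGCG
```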

Moreover, the subject invention provides a method of synthesizing the instant attenuated virus comprising changing the order, within the coding region, of existing codons encoding the same amino acid in order to modulate codon pair bias.
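A minimal Python sketch of codon rearrangement follows, assuming an in-frame RNA ORF. Codons are permuted only among positions that encode the same amino acid, so the protein and the overall codon usage stay fixed while the adjacent codon pairs change. A single random permutation is the simplest version and is used here for illustration; the function names are hypothetical, and a directed design would instead accept or reject such swaps against a codon-pair score.

```python
import random
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf: str) -> str:
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def codon_counts(orf: str) -> Counter:
    return Counter(orf[i:i + 3] for i in range(0, len(orf), 3))


def shuffle_synonymous_positions(orf: str, rng: random.Random) -> str:
    """Permute existing codons among positions encoding the same amino acid."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    for aa, idxs in positions.items():
        if aa == "*":
            continue  # leave stop codons where they are
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            codons[i] = codon
    return "".join(codons)


wt = "CUGGCCUUAGCGCUCGCU"  # toy Leu/Ala sequence
new = shuffle_synonymous_positions(wt, random.Random(1))
print(translate(new) == translate(wt), codon_counts(new) == codon_counts(wt))  # True True
```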

Even further, the subject invention provides a method of synthesizing the instant attenuated virus that combines the previous two methods.

According to the invention, attenuated virus particles are made by transfecting viral genomes into host cells, whereby attenuated virus particles are produced. The invention further provides pharmaceutical compositions comprising attenuated virus which are suitable for immunization.

This invention further provides methods for eliciting a protective immune response in a subject, for preventing a subject from becoming afflicted with a virus-associated disease, and for delaying the onset, or slowing the rate of progression, of a virus-associated disease in a virus-infected subject, comprising administering to the subject a prophylactically or therapeutically effective dose of the instant vaccine composition.

The present invention further provides an attenuated virus which comprises a modified viral genome containing nucleotide substitutions engineered in multiple locations in the genome, wherein the substitutions introduce a plurality of synonymous codons into the genome, and wherein the nucleotide substitutions are selected by a process comprising the steps of: (i) initially creating a coding sequence by randomly assigning synonymous codons at each amino acid position; (ii) calculating a codon pair score of the coding sequence; (iii) randomly selecting and either (a) exchanging pairs of codons encoding the same amino acid or (b) substituting synonymous codons, in accordance with a simulated annealing optimization function; and (iv) repeating step (iii) until no further improvement (no change in pair score or bias) is observed for a specified or sufficient number of iterations, i.e., until the solution converges on an optimal or near-optimal value.
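A minimal Python sketch of this selection loop follows. It assumes that a table of codon-pair scores (`cps`, keyed by codon pairs) is already available, that a lower codon pair bias is the design goal (as for PV-Min), and that the protein is given without its stop codon. The cooling schedule (`t0`, `cooling`), the 50/50 choice between move types, and the `patience` stopping rule are arbitrary illustrative choices, not parameters from the disclosure.

```python
import math
import random
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def cpb(codons, cps):
    """Mean codon-pair score over adjacent pairs; pairs absent from cps score 0."""
    pairs = list(zip(codons, codons[1:]))
    return sum(cps.get(p, 0.0) for p in pairs) / len(pairs)


def anneal_min_cpb(protein, cps, rng, steps=20000, t0=1.0, cooling=0.9995, patience=2000):
    """Search synonymous encodings of `protein` (>= 2 residues, no stop) for low CPB."""
    # Step (i): random synonymous codon at every amino acid position.
    codons = [rng.choice(SYNONYMS[aa]) for aa in protein]
    score = cpb(codons, cps)  # step (ii)
    best, best_score = codons[:], score
    temp, stale = t0, 0
    for _ in range(steps):
        cand = codons[:]
        i = rng.randrange(len(cand))
        same = [j for j, aa in enumerate(protein) if aa == protein[i] and j != i]
        if same and rng.random() < 0.5:
            # Move (a): exchange codons between two positions encoding the same amino acid.
            j = rng.choice(same)
            cand[i], cand[j] = cand[j], cand[i]
        else:
            # Move (b): substitute a synonymous codon at position i.
            cand[i] = rng.choice(SYNONYMS[protein[i]])
        new = cpb(cand, cps)
        # Metropolis rule: accept improvements; accept worse moves with probability exp(-delta/T).
        if new <= score or rng.random() < math.exp((score - new) / temp):
            codons, score = cand, new
        # Step (iv): stop once the best score has not improved for `patience` steps.
        if score < best_score:
            best, best_score, stale = codons[:], score, 0
        else:
            stale += 1
            if stale >= patience:
                break
        temp *= cooling
    return "".join(best), best_score


# Toy score table: only the CUG-CUG pair is (strongly) under-represented.
seq, score = anneal_min_cpb("LL", {("CUG", "CUG"): -1.0}, random.Random(0))
print(seq, score)
```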

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1. Codon use statistics in synthetic P1 capsid designs. PV-SD maintains nearly identical codon frequencies compared to wt, while maximizing codon positional changes within the sequence. In PV-AB capsids, the use of nonpreferred codons was maximized. The lengths of the bars and the numbers behind each bar indicate the occurrence of each codon in the sequence. As a reference, the normal human synonymous codon frequencies (“Freq.” expressed as a percentage) for each amino acid are given in the third column.

FIGS. 2A-B. Sequence alignment of PV(M), PV-AB and PV-SD capsid coding regions. The nucleotide sequences of PV(M) (SEQ ID NO:1), PV-AB (SEQ ID NO:2) and PV-SD (SEQ ID NO:3) were aligned using the MultAlin online software tool (Corpet, 1988). Numbers above the sequence refer to the position within the capsid sequence. (FIG. 2A) Nucleotide 1 to nucleotide 1300; (FIG. 2B) nucleotide 1301 to nucleotide 2643. Nucleotide 1 corresponds to nucleotide 743 in the PV(M) virus genome. In the consensus sequence, the occurrence of the same nucleotide in all three sequences is indicated by an upper case letter; the occurrence of the same nucleotide in two of the three sequences is indicated by a lower case letter; and the occurrence of three different nucleotides in the three sequences is indicated by a period.
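The consensus convention described for FIGS. 2A-B can be written as a small Python function. This is a minimal sketch assuming three ungapped, equal-length, upper-case sequences; the function name is hypothetical, and it is not the MultAlin implementation, which also handles gaps and alignment scoring.

```python
from collections import Counter


def consensus3(a: str, b: str, c: str) -> str:
    """Upper case: same base in all three; lower case: same in two; '.': all differ."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("sequences must have equal length")
    out = []
    for column in zip(a, b, c):
        base, n = Counter(column).most_common(1)[0]
        if n == 3:
            out.append(base.upper())
        elif n == 2:
            out.append(base.lower())
        else:
            out.append(".")
    return "".join(out)


print(consensus3("ACGU", "ACGA", "AGCC"))  # Acg.
```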

FIGS. 3A-J. Codon-deoptimized virus phenotypes. (FIG. 3A) Overview of virus constructs used in this study. (FIG. 3B) One-step growth kinetics in HeLa cell monolayers. (FIGS. 3C to H) Plaque phenotypes of codon-deoptimized viruses after 48 h (FIGS. 3C to F) or 72 h (FIGS. 3G and H) of incubation; stained with anti-3D^pol antibody to visualize infected cells. (FIG. 3C) PV(M), (FIG. 3D) PV-SD, (FIG. 3E) PV-AB, (FIG. 3F) PV-AB^755-1513, (FIGS. 3G and H) PV-AB^2470-2954. Cleared plaque areas are outlined by a rim of infected cells (FIGS. 3C and D). (FIG. 3H) No plaques are apparent with PV-AB^2470-2954 after subsequent crystal violet staining of the well shown in FIG. 3G. (FIGS. 3I and J) Microphotographs of the edge of an immunostained plaque produced by PV(M) (FIG. 3I) or an infected focus produced by PV-AB^2470-2954 (FIG. 3J) after 48 h of infection.

FIGS. 4A-E. Codon deoptimization leads to a reduction of specific infectivity. (FIG. 4A) Agarose gel electrophoresis of virion genomic RNA isolated from purified virus particles of PV(M) (lane 1), PV-AB^755-1513 (lane 2), and PV-AB^2470-2954 (lane 3). (FIG. 4B) Silver-stained SDS-PAGE protein gel of purified PV(M) (lane 1), PV-AB^755-1513 (lane 2), and PV-AB^2470-2954 (lane 3) virus particles. The three larger of the four capsid proteins (VP1, VP2, and VP3) are shown, demonstrating the purity and relative amounts of virus preparations. (FIG. 4C) Development of a virus capture ELISA using a poliovirus receptor-alkaline phosphatase (CD155-AP) fusion protein probe. Virus-specific antibodies were used to coat ELISA plates, and samples containing an unknown virus concentration were applied, followed by detection with CD155-AP. Virus concentrations were calculated using a standard curve prepared in parallel with known amounts of purified wt virus (FIG. 4E). (FIG. 4D) The amounts of purified virus and extracted virion RNA were spectrophotometrically quantified, and the number of particles or genome equivalents (1 genome=1 virion) was calculated. In addition, virion concentrations were determined by ELISA. The infectious titer of each virus was determined by plaque/infected-focus assay, and the specific infectivity was calculated as PFU/particle or FFU/particle.

FIGS. 5A-B. In vitro translation of codon-deoptimized and wild type viruses. The PV-AB phenotype is determined at the level of genome translation. (FIG. 5A) A standard in vitro translation in HeLa S10 extract, in the presence of exogenously added amino acids and tRNAs reveals no differences in translation capacities of codon-deoptimized genomes compared to the PV(M) wt. Shown is an autoradiograph of [³⁵S]methionine-labeled translation products resolved on a 12.5% SDS-PAGE gel. The identity of an aberrant band (*) is not known. (FIG. 5B) In vitro translation in nondialyzed HeLa S10 extract without the addition of exogenous amino acids and tRNA and in the presence of competing cellular mRNAs uncovers a defect in translation capacities of codon-deoptimized PV genomes. Shown is a Western blot of poliovirus 2C reactive translation products (2C^ATPase, 2BC, and P2) resolved on a 10% SDS-PAGE gel. The relative amounts of the 2BC translation products are expressed below each lane as percentages of the wt band.

FIGS. 6A-B. Analysis of in vivo translation using dicistronic reporter replicons confirms the detrimental effect of codon deoptimization on PV translation. (FIG. 6A) Schematic of dicistronic replicons. Various P1 capsid coding sequences were inserted upstream of the firefly luciferase gene (F-Luc). Determining changes in F-Luc expression relative to an internal control (R-Luc) allows quantification of ribosome transit through the P1 capsid region. (FIG. 6B) Replicon RNAs were transfected into HeLa cells and incubated for 7 h in the presence of 2 mM guanidine hydrochloride to block RNA replication. The relative rate of translation through the P1 region was inversely proportional to the extent of codon deoptimization. While the capsid coding sequences of two viable virus constructs, PV-AB^2470-2954 and PV-AB^2954-3386, allow between 60 and 80% of wt translation, translation efficiency below 20% is associated with the lethal phenotypes observed with the PV-AB, PV-AB^2470-3386, and PV-AB^1513-2470 genomes. Values represent the average of 6 assays from 3 independent experiments.

FIG. 7. Determining codon pair bias of human and viral ORFs. Dots represent the average codon-pair score per codon pair for one ORF plotted against its length. Codon pair bias (CPB) was calculated for 14,795 annotated human genes. Under-represented codon pairs yield negative scores. CPB is plotted for various poliovirus P1 constructs, represented by symbols with arrows. The bulk of human genes clusters around 0.1. CPB is shown for PV(M)-wt (labeled “WT”) (−0.02), the customized synthetic poliovirus capsids PV-Max (+0.25) and PV-Min (−0.48), and the PV(M)-wt:PV-Min chimera capsids PV-Min^755-2470 (=“PV-MinXY”) (−0.31) and PV-Min^2470-3386 (=“PV-MinZ”) (−0.20). Viruses PV-SD and PV-AB result from altered codon bias, but not altered codon pair bias.

FIGS. 8A-B. Characteristics of codon-pair deoptimized polio. (FIG. 8A) One-step growth kinetics reveal that PFU production by PV-Min^755-2470 and PV-Min^2470-3385 is reduced by about 2.5 orders of magnitude compared with PV(M)-wt. However, all viruses produce a similar number of viral particles (not shown in this Figure). (FIG. 8B) As a result, the PFU/particle ratio is reduced, similar to the codon-deoptimized viruses PV-AB^755-1513 and PV-AB^2470-2954 (see FIG. 3B) (PFU is “Plaque Forming Unit”).

FIG. 9. Assembly of chimeric viral genomes. To “scan” through a target genome (red) small segments are amplified or synthesized and introduced into the wt genome (black) by overlapping PCR.

FIG. 10. The eight-plasmid pol I-pol II system for the generation of influenza A virus. Eight expression plasmids containing the eight viral cDNAs inserted between the human pol I promoter and the pol II promoter are transfected into eukaryotic cells. Because each plasmid contains two different promoters, both cellular pol I and pol II will transcribe the plasmid template, presumably in different nuclear compartments, which will result in the synthesis of viral mRNAs and vRNAs. After synthesis of the viral polymerase complex proteins (PB1, PB2, PA, nucleoproteins), the viral replication cycle is initiated. Ultimately, the assembly of all viral molecules directly (pol II transcription) or indirectly (pol I transcription and viral replication) derived from the cellular transcription and translation machinery results in the interaction of all synthesized molecules (vRNPs and the structural proteins HA, NA, M1, M2, NS2/NEP) to generate infectious influenza A virus. (Reproduced from Neumann et al., 2000.) (Note: there are other ways of synthesizing influenza de novo).

FIGS. 11A-B. Poliovirus Genome and Synthetic Viral Constructs. The poliovirus genome and open reading frames of chimeric virus constructs. (FIG. 11A) Top, a schematic of the full-length PV(M)-wt genomic RNA. (FIG. 11B) Below, the open reading frames of PV(M)-wt, the CPB-customized synthetic viruses PV-Max and PV-Min, and the PV(M)-wt:PV-Min chimera viruses. Black corresponds to PV(M)-wt sequence, gray to PV-Min synthetic sequence, and hatched to PV-Max. The highlighted viral constructs, PV-Min^755-2470 (PV-MinXY) and PV-Min^2470-3385 (PV-MinZ), were further characterized because of their markedly attenuated phenotype.

FIGS. 12A-B. One-step growth curves display similar kinetics, yielding similar quantities of particles with decreased infectivity. (FIG. 12A) Monolayers of HeLa R19 cells were infected at an MOI of 2, and the PFU at the given time points (0, 2, 4, 7, 10, 24, and 48 h) was measured by plaque assay. Corresponding symbols: (□) PV(M)-wt, (●) PV-Max, (⋄) PV-Min^755-1513, (x) PV-Min^1513-2470, (♦) PV-MinXY, (Δ) PV-MinZ. (FIG. 12B) Conversion of the calculated PFU/ml at each time point to particles/ml. This was achieved by dividing the PFU/ml by the respective virus's specific infectivity (PFU/particle). Corresponding symbols as in FIG. 12A.
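Because specific infectivity is expressed as PFU/particle (FIG. 4D), converting an infectious titer into a particle concentration divides by it. A minimal Python sketch of that arithmetic follows; the function names and all numbers are hypothetical and are not data from the figures.

```python
def specific_infectivity(pfu_per_ml: float, particles_per_ml: float) -> float:
    """Infectious units per physical particle (PFU/particle)."""
    return pfu_per_ml / particles_per_ml


def pfu_to_particles(pfu_per_ml: float, pfu_per_particle: float) -> float:
    """Particles/ml = (PFU/ml) / (PFU/particle)."""
    return pfu_per_ml / pfu_per_particle


# Hypothetical values for illustration only.
si = specific_infectivity(1e8, 1e11)   # 0.001 PFU/particle, i.e. 1,000 particles per PFU
growth_curve = {0: 1e4, 24: 1e7}       # hours post infection -> PFU/ml
particles = {t: pfu_to_particles(pfu, si) for t, pfu in growth_curve.items()}
print(particles)
```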

FIGS. 13A-B. In vivo modulation of translation by alteration of CPB. (FIG. 13A) The dicistronic RNA construct used to quantify the in vivo effect of CPB on translation. The first cistron uses a hepatitis C virus (HCV) Internal Ribosome Entry Site (IRES) to drive translation of Renilla Luciferase (R-Luc). This first cistron is the internal control used to normalize the amount of input RNA. The second cistron, controlled by the PV(M)-wt IRES, drives translation of Firefly Luciferase (F-Luc). The region labeled “P1” in the construct was replaced by the P1 cDNA of each respective virus. (FIG. 13B) Each RNA construct was transfected into HeLa R19 cells in the presence of 2 mM guanidine hydrochloride, and R-Luc and F-Luc were measured after 6 hours. The F-Luc/R-Luc values were normalized relative to PV(M)-wt translation (100%).
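The normalization used for these dicistronic assays (F-Luc divided by the R-Luc internal control, expressed relative to PV(M)-wt as 100%) can be sketched in Python as follows. The function name and readings are hypothetical; averaging per-replicate ratios before normalizing to the mean wild-type ratio is one reasonable choice, not necessarily the procedure used for FIGS. 6B and 13B.

```python
from statistics import mean


def percent_of_wt(construct_reps, wt_reps):
    """Mean F-Luc/R-Luc ratio of a construct as a percentage of the mean wild-type ratio.

    Each replicate is an (F-Luc, R-Luc) pair; R-Luc from the HCV IRES cistron
    serves as the internal control for the amount of input RNA.
    """
    wt_ratio = mean(f / r for f, r in wt_reps)
    return mean(100.0 * (f / r) / wt_ratio for f, r in construct_reps)


# Hypothetical luminometer readings (arbitrary units).
wild_type = [(20000.0, 10000.0), (21000.0, 10500.0)]
construct = [(5000.0, 10000.0), (5200.0, 10400.0)]
print(percent_of_wt(construct, wild_type))  # 25.0
```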

FIG. 14. The heat inactivation profile of the synthetic viruses is unchanged. To rule out that large-scale codon-pair bias modification alters the gross morphology of virions, as one might expect if capsid proteins were misfolded, the thermal stability of PV-MinXY and PV-MinZ was tested. Equal numbers of particles were incubated at 50° C., and the remaining infectivity was quantified by plaque assay after the given periods of time. If the capsids of the synthetic viruses were destabilized, an increased loss of viability at 50° C. compared with wt PV(M) would be expected. This was not the case: the thermal inactivation kinetics of both synthetic viruses were identical to those of the wt. In contrast, the Sabin-1 virus carries numerous mutations in the genome region encoding the capsid, which, fittingly, render this virus less heat stable than wt PV1(M).

FIG. 15. Neutralizing antibody titer following vaccination. A group of eight CD155 tg mice, seven of which completed the regimen, were each inoculated by intraperitoneal injection three times at weekly intervals with 10⁸ particles of PV-MinZ (●) or PV-MinXY (♦), and seroconversion was measured 10 days after the final vaccination. A horizontal line across each data set marks the average neutralizing antibody titer for each virus construct. The anti-poliovirus antibody titer was measured by micro-neutralization assay. (*) No virus neutralization was detected for mock-vaccinated animals at the lowest tested dilution of 1:8.

FIGS. 16A-B. Influenza virus carrying a codon pair-deoptimized NP segment. (FIG. 16A) A/PR8-NP^Min viruses are viable and produce smaller plaques on MDCK cells than the A/PR8 wt. (FIG. 16B) A/PR8-NP^Min viruses display delayed growth kinetics and final titers 3- to 5-fold below those of wild type A/PR8.

FIGS. 17A-B. Influenza virus carrying codon pair-deoptimized PB1 or HA and NP segments. (FIG. 17A) A/PR8-PB1^Min-RR and A/PR8-HA^Min/NP^Min viruses are viable and produce smaller plaques on MDCK cells than the A/PR8 wild type. (FIG. 17B) A/PR8-PB1^Min-RR and A/PR8-HA^Min/NP^Min viruses display delayed growth kinetics and final titers about 10-fold below those of wild type A/PR8.

FIGS. 18A-C. Attenuation of A/PR8-NP^Min in the BALB/c mouse model. (FIG. 18A) A/PR8-NP^Min virus has reduced pathogenicity compared with wild type A/PR8 virus, as determined by weight loss upon vaccination. (FIG. 18B) All mice (eight of eight) vaccinated with A/PR8-NP^Min virus survived, whereas only 25% (two of eight) of mice infected with A/PR8 were alive 13 days post vaccination. (FIG. 18C) Mice vaccinated with A/PR8-NP^Min virus are protected from challenge with 100×LD₅₀ of A/PR8 wild type virus.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to the production of attenuated viruses that may be used as vaccines to protect against viral infection and disease. Accordingly, the invention provides an attenuated virus, which comprises a modified viral genome containing nucleotide substitutions engineered in multiple locations in the genome, wherein the substitutions introduce a plurality of synonymous codons into the genome and/or a change of the order of existing codons for the same amino acid (change of codon pair utilization). In both cases, the original, wild-type amino acid sequences of the viral gene products are retained.

Most amino acids are encoded by more than one codon. See the genetic code in Table 1. For instance, alanine is encoded by GCU, GCC, GCA, and GCG. Three amino acids (Leu, Ser, and Arg) are encoded by six different codons, while only Trp and Met have unique codons. “Synonymous” codons are codons that encode the same amino acid. Thus, for example, CUU, CUC, CUA, CUG, UUA, and UUG are synonymous codons that code for Leu. Synonymous codons are not used with equal frequency. In general, the most frequently used codons in a particular organism are those for which the cognate tRNA is abundant, and the use of these codons enhances the rate and/or accuracy of protein translation. Conversely, tRNAs for the rarely used codons are found at relatively low levels, and the use of rare codons is thought to reduce translation rate and/or accuracy. Thus, to replace a given codon in a nucleic acid by a synonymous but less frequently used codon is to substitute a “deoptimized” codon into the nucleic acid.

TABLE 1

Genetic Code^a

First      Second position                Third
position   U       C       A       G      position
U          Phe     Ser     Tyr     Cys    U
           Phe     Ser     Tyr     Cys    C
           Leu     Ser     STOP    STOP   A
           Leu     Ser     STOP    Trp    G
C          Leu     Pro     His     Arg    U
           Leu     Pro     His     Arg    C
           Leu     Pro     Gln     Arg    A
           Leu     Pro     Gln     Arg    G
A          Ile     Thr     Asn     Ser    U
           Ile     Thr     Asn     Ser    C
           Ile     Thr     Lys     Arg    A
           Met     Thr     Lys     Arg    G
G          Val     Ala     Asp     Gly    U
           Val     Ala     Asp     Gly    C
           Val     Ala     Glu     Gly    A
           Val     Ala     Glu     Gly    G

^a The first nucleotide in each codon encoding a particular amino acid is shown in the left-most column; the second nucleotide is shown in the top row; and the third nucleotide is shown in the right-most column.
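The degeneracy summarized in Table 1 can be checked programmatically. This minimal Python sketch encodes the standard genetic code as a 64-letter string (a compact representation chosen for this example, not one used in the disclosure) and groups codons into synonymous sets; `synonymous_groups` is a hypothetical helper. It reproduces the counts stated above: six codons each for Leu, Ser and Arg, and a single codon each for Met and Trp.

```python
from collections import defaultdict
from itertools import product

# One-letter amino acid codes for the 64 codons, enumerated with the first,
# second and third positions each running U, C, A, G (the order of Table 1).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_groups(table):
    """Map each amino acid (and '*' for stop) to the codons that encode it."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        groups[aa].append(codon)
    return dict(groups)


groups = synonymous_groups(CODON_TABLE)
for n in (1, 2, 3, 4, 6):
    print(n, sorted(aa for aa, codons in groups.items() if len(codons) == n))
```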

In addition, a given organism has a preference for the nearest codon neighbor of a given codon A, referred to as a bias in codon pair utilization. A change of codon pair bias, without changing the existing codons, can influence the rate of protein synthesis and production of a protein.
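This passage does not give a formula for codon pair bias. A minimal Python sketch follows, assuming the widely used log-odds codon-pair score CPS(AB) = ln(N_AB / ((N_A·N_B)/(N_X·N_Y)·N_XY)), where N counts codons A and B, their amino acids X and Y, and the corresponding pairs in a reference set of host ORFs, and taking an ORF's codon pair bias as the mean score over its adjacent codon pairs (consistent with the average codon-pair score per codon pair plotted in FIG. 7). Under-represented pairs score below zero. The reference ORFs and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(orf):
    """Split an in-frame RNA ORF into codons."""
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def codon_pair_scores(reference_orfs):
    """Log-odds score of each observed codon pair AB (assumed CPS definition)."""
    codon_n, aa_n, pair_n, aa_pair_n = Counter(), Counter(), Counter(), Counter()
    for orf in reference_orfs:
        codons = split_codons(orf)
        aas = [CODON_TABLE[c] for c in codons]
        codon_n.update(codons)
        aa_n.update(aas)
        pair_n.update(zip(codons, codons[1:]))
        aa_pair_n.update(zip(aas, aas[1:]))
    scores = {}
    for (a, b), n_ab in pair_n.items():
        x, y = CODON_TABLE[a], CODON_TABLE[b]
        # Expected count if codon choice were independent of the neighboring codon.
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[(x, y)]
        scores[(a, b)] = math.log(n_ab / expected)
    return scores


def codon_pair_bias(orf, scores):
    """Mean codon-pair score over adjacent pairs; pairs absent from `scores` count as 0."""
    codons = split_codons(orf)
    pairs = list(zip(codons, codons[1:]))
    return sum(scores.get(p, 0.0) for p in pairs) / len(pairs)


# Toy reference set (hypothetical); a real analysis would use annotated host ORFs.
scores = codon_pair_scores(["CUGCUU", "CUUCUG", "CUGCUG", "CUUCUU"])
print(codon_pair_bias("CUGCUUCUG", scores))  # 0.0 (every pair occurs at its expected frequency)
```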

In various embodiments of the present invention, the virus is a DNA, RNA, double-stranded, or single-stranded virus. In further embodiments, the virus infects an animal or a plant. In preferred embodiments, the animal is a human. A large number of animal viruses are well known to cause diseases (see below). Certain medically important viruses, such as those causing rabies, severe acute respiratory syndrome (SARS), and avian flu, can also spread to humans from their normal non-human hosts.

Viruses also constitute a major group of plant pathogens, and research is ongoing to develop viral vectors for producing transgenic plants. The advantages of such vectors include the ease of transforming plants, the ability to transform mature plants which obviates the need for regeneration of a transgenic plant from a single transformed cell, and high levels of expression of foreign genes from the multiple copies of virus per cell. However, one of the main disadvantages of these vectors is that it has not been possible to separate essential viral replicative functions from pathogenic determinants of the virus. The SAVE strategy disclosed herein may afford a means of engineering non-pathogenic viral vectors for plant transformation.

Major Viral Pathogens in Humans

Viral pathogens are the causative agents of many diseases in humans and other animals. Well known examples of viral diseases in humans include the common cold (caused by human rhinoviruses, HRV), influenza (influenza virus), chickenpox (varicella-zoster virus), measles (a paramyxovirus), mumps (a paramyxovirus), poliomyelitis (poliovirus, PV), rabies (Lyssavirus), cold sores (Herpes Simplex Virus [HSV] Type 1), and genital herpes (HSV Type 2). Prior to the introduction of vaccination programs for children, many of these were common childhood diseases worldwide, and are still a significant threat to health in some developing countries. Viral diseases also include more serious diseases such as acquired immunodeficiency syndrome (AIDS) caused by Human Immunodeficiency Virus (HIV), severe acute respiratory syndrome (SARS) caused by SARS coronavirus, avian flu (H5N1 subtype of influenza A virus), Ebola (ebolavirus), Marburg haemorrhagic fever (Marburg virus), dengue fever (Flavivirus serotypes), West Nile encephalitis (a flavivirus), infectious mononucleosis (Epstein-Barr virus, EBV), hepatitis (Hepatitis C Virus, HCV; hepatitis B virus, HBV), and yellow fever (flavivirus). Certain types of cancer can also be caused by viruses. For example, although most infections by human papillomavirus (HPV) are benign, HPV has been found to be associated with cervical cancer, and Kaposi's sarcoma (KS), a tumor prevalent in AIDS patients, is caused by Kaposi's sarcoma-associated herpesvirus (KSHV).

Because viruses reside within cells and use the machinery of the host cell to reproduce, they are difficult to eliminate without killing the host cell. The most effective approach to counter viral diseases has been the vaccination of subjects at risk of infection in order to provide resistance to infection. For some diseases (e.g., chickenpox, measles, mumps, yellow fever), effective vaccines are available. However, there is a pressing need to develop vaccines for many other viral diseases. The SAVE (Synthetic Attenuated Virus Engineering) approach to making vaccines described herein is in principle applicable to all viruses for which a reverse genetics system (see below) is available. This approach is exemplified herein by focusing on the application of SAVE to develop attenuated virus vaccines for poliomyelitis, the common cold, and influenza.

Any virus can be attenuated by the methods disclosed herein. The virus can be a dsDNA virus (e.g., Adenoviruses, Herpesviruses, Poxviruses), a single stranded “plus” sense DNA virus (e.g., Parvoviruses), a double stranded RNA virus (e.g., Reoviruses), a single stranded “plus” sense RNA virus (e.g., Picornaviruses, Togaviruses), a single stranded “minus” sense RNA virus (e.g., Orthomyxoviruses, Rhabdoviruses), a single stranded “plus” sense RNA virus with a DNA intermediate (e.g., Retroviruses), or a double stranded reverse transcribing virus (e.g., Hepadnaviruses). In certain non-limiting embodiments of the present invention, the virus is poliovirus (PV), rhinovirus, influenza virus including avian flu (e.g., H5N1 subtype of influenza A virus), severe acute respiratory syndrome (SARS) coronavirus, Human Immunodeficiency Virus (HIV), Hepatitis B Virus (HBV), Hepatitis C Virus (HCV), infectious bronchitis virus, ebolavirus, Marburg virus, dengue fever virus (Flavivirus serotypes), West Nile virus, Epstein-Barr virus (EBV), yellow fever virus, varicella-zoster virus (chickenpox), measles virus (a paramyxovirus), mumps virus (a paramyxovirus), rabies virus (Lyssavirus), human papillomavirus, Kaposi's sarcoma-associated herpesvirus, Herpes Simplex Virus Type 1 (HSV-1), or Herpes Simplex Virus Type 2 (HSV-2, genital herpes).

The term “parent” virus or “parent” protein encoding sequence is used herein to refer to viral genomes and protein encoding sequences from which new sequences, which may be more or less attenuated, are derived. Parent viruses and sequences are usually “wild type” or “naturally occurring” prototypes or isolates of variants for which a more highly attenuated virus is desired. However, parent viruses also include mutants specifically created or selected in the laboratory on the basis of real or perceived desirable properties. Accordingly, parent viruses that are candidates for attenuation include mutants of wild type or naturally occurring viruses that have deletions, insertions, amino acid substitutions and the like, and also include mutants that have codon substitutions. In one embodiment, such a parent sequence differs from a natural isolate by about 30 amino acids or fewer. In another embodiment, the parent sequence differs from a natural isolate by about 20 amino acids or fewer. In yet another embodiment, the parent sequence differs from a natural isolate by about 10 amino acids or fewer.
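The thresholds above (about 30, 20, or 10 amino acids or fewer) can be checked mechanically once the parent and the natural isolate have been aligned. The following is a minimal sketch, assuming both protein sequences are already aligned to the same length with `-` marking gaps; the function names `count_aa_differences` and `within_threshold` and the toy sequences are hypothetical illustrations, not taken from the source. Each mismatched column, including a gap opposite a residue, counts as one difference, so a multi-residue deletion counts once per deleted position.

```python
def count_aa_differences(parent: str, isolate: str) -> int:
    """Count aligned columns at which two protein sequences differ.

    Hypothetical helper: assumes both sequences are pre-aligned to equal
    length ('-' for gaps). A gap opposite a residue counts as a difference.
    """
    if len(parent) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(parent.upper(), isolate.upper()) if a != b)


def within_threshold(parent: str, isolate: str, max_diffs: int = 30) -> bool:
    """Return True if the parent differs from the isolate at max_diffs columns or fewer."""
    return count_aa_differences(parent, isolate) <= max_diffs


if __name__ == "__main__":
    # Toy sequences (not real viral proteins): one substitution (Y -> H)
    # and one single-residue deletion at the C-terminus.
    isolate = "MKTAYIAKQR"
    parent = "MKTAHIAKQ-"
    print(count_aa_differences(parent, isolate))  # 2
    for limit in (30, 20, 10, 1):
        print(limit, within_threshold(parent, isolate, limit))
```

Real comparisons would first need a proper sequence alignment; this sketch only does the final counting step and deliberately treats every differing column alike.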

The attenuated PV may be derived from poliovirus type 1 (Mahoney; “PV(M)”), poliovirus type 2 (Lansing), poliovirus type 3 (Leon), monovalent oral poliovirus vaccine (OPV) virus, or trivalent OPV virus. In certain embodiments, the poliovirus is PV-AB having the genomic sequence set forth in SEQ ID NO:2, or PV-AB^755-1513, PV-AB^755-2470, PV-AB^1513-3386, PV-AB^2470-3386, PV-AB^1513-2470, PV-AB^2470-2954, or PV-AB^2954-3386. The nomenclature denotes a PV(M) genome in which a portion of the genome is replaced by nucleotides of PV-AB. The superscript gives the range of PV-AB nucleotide positions that are substituted.
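Because each name encodes a PV-AB nucleotide interval, the constructs can be compared programmatically, for example to see which substituted segments nest inside others. The sketch below is a hypothetical parser, assuming the `PV-AB^start-end` spelling used here (with `^` standing in for the superscript) and assuming the range is 1-based and inclusive of both endpoints. The source does not state the inclusivity convention, so the computed lengths are illustrative only. `Substitution` and `parse_chimera_name` are names introduced for this example.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """PV-AB nucleotide interval substituted into the PV(M) backbone."""
    start: int  # first substituted PV-AB position (assumed 1-based, inclusive)
    end: int    # last substituted PV-AB position (assumed inclusive)

    @property
    def length(self) -> int:
        # Inclusive interval length under the assumed convention.
        return self.end - self.start + 1

    def contains(self, other: "Substitution") -> bool:
        # True if `other` lies entirely within this interval.
        return self.start <= other.start and other.end <= self.end


# Matches names such as "PV-AB^755-1513".
_NAME_RE = re.compile(r"^PV-AB\^(\d+)-(\d+)$")


def parse_chimera_name(name: str) -> Substitution:
    """Parse a 'PV-AB^start-end' construct name into a Substitution."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a PV-AB^start-end name: {name!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"start exceeds end in {name!r}")
    return Substitution(start, end)


if __name__ == "__main__":
    names = [
        "PV-AB^755-1513", "PV-AB^755-2470", "PV-AB^1513-3386",
        "PV-AB^2470-3386", "PV-AB^1513-2470", "PV-AB^2470-2954",
        "PV-AB^2954-3386",
    ]
    widest = parse_chimera_name("PV-AB^1513-3386")
    for name in names:
        sub = parse_chimera_name(name)
        print(f"{name}: {sub.length} nt, within 1513-3386: {widest.contains(sub)}")
```

Under the inclusive assumption, PV-AB^2470-2954 and PV-AB^2954-3386 share position 2954; if the boundaries are in fact half-open, `length` should drop the `+ 1`.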

In various embodiments, the attenuated rhinovirus is a human rhinovirus (HRV) derived from HRV2; HRV14; Human rhinovirus 10; Human rhinovirus 100; Human rhinovirus 11; Human rhinovirus 12; Human rhinovirus 13; Human rhinovirus 15; Human rhinovirus 16; Human rhinovirus 18; Human rhinovirus 19; Human rhinovirus 1A; Human rhinovirus 1B; Human rhinovirus 2; Human rhinovirus 20; Human rhinovirus 21; Human rhinovirus 22; Human rhinovirus 23; Human rhinovirus 24; Human rhinovirus 25; Human rhinovirus 28; Human rhinovirus 29; Human rhinovirus 30; Human rhinovirus 31; Human rhinovirus 32; Human rhinovirus 33; Human rhinovirus 34; Human rhinovirus 36; Human rhinovirus 38; Human rhinovirus 39; Human rhinovirus 40; Human rhinovirus 41; Human rhinovirus 43; Human rhinovirus 44; Human rhinovirus 45; Human rhinovirus 46; Human rhinovirus 47; Human rhinovirus 49; Human rhinovirus 50; Human rhinovirus 51; Human rhinovirus 53; Human rhinovirus 54; Human rhinovirus 55; Human rhinovirus 56; Human rhinovirus 57; Human rhinovirus 58; Human rhinovirus 59; Human rhinovirus 60; Human rhinovirus 61; Human rhinovirus 62; Human rhinovirus 63; Human rhinovirus 64; Human rhinovirus 65; Human rhinovirus 66; Human rhinovirus 67; Human rhinovirus 68; Human rhinovirus 7; Human rhinovirus 71; Human rhinovirus 73; Human rhinovirus 74; Human rhinovirus 75; Human rhinovirus 76; Human rhinovirus 77; Human rhinovirus 78; Human rhinovirus 8; Human rhinovirus 80; Human rhinovirus 81; Human rhinovirus 82; Human rhinovirus 85; Human rhinovirus 88; Human rhinovirus 89; Human rhinovirus 9; Human rhinovirus 90; Human rhinovirus 94; Human rhinovirus 95; Human rhinovirus 96; Human rhinovirus 98; Human rhinovirus 14; Human rhinovirus 17; Human rhinovirus 26; Human rhinovirus 27; Human rhinovirus 3; Human rhinovirus 8001 Finland November 1995; Human rhinovirus 35; Human rhinovirus 37; Human rhinovirus 6253 Finland September 1994; Human rhinovirus 9166 Finland September 1995; Human rhinovirus 4; Human rhinovirus 42; Human rhinovirus 48; Human rhinovirus 9864 Finland September 1996; Human rhinovirus 5; Human rhinovirus 52; Human rhinovirus 6; Human rhinovirus 7425 Finland December 1995; Human rhinovirus 69; Human rhinovirus 5928 Finland May 1995; Human rhinovirus 70; Human rhinovirus 72; Human rhinovirus 79; Human rhinovirus 83; Human rhinovirus 84; Human rhinovirus 8317 Finland August 1996; Human rhinovirus 86; Human rhinovirus 91; Human rhinovirus 7851 Finland September 1996; Human rhinovirus 92; Human rhinovirus 93; Human rhinovirus 97; Human rhinovirus 99; Antwerp rhinovirus 98/99; Human rhinovirus 263 Berlin 2004; Human rhinovirus 3083/rhino/Hyogo/2005; Human rhinovirus NY-003; Human rhinovirus NY-028; Human rhinovirus NY-041; Human rhinovirus NY-042; Human rhinovirus NY-060; Human rhinovirus NY-063; Human rhinovirus NY-074; Human rhinovirus NY-1085; Human rhinovirus strain Hanks; Untyped human rhinovirus OK88-8162; Human enterovirus sp. ex Amblyomma americanum; Human rhinovirus sp. or Human rhinovirus UC.

In other embodiments, the attenuated influenza virus is derived from influenza virus A, influenza virus B, or influenza virus C. In further embodiments, the influenza virus A belongs to, but is not limited to, one of subtypes H10N1, H10N2, H10N3, H10N4, H10N5, H10N6, H10N7, H10N8, H10N9, H11N1, H11N2, H11N3, H11N4, H11N6, H11N8, H11N9, H12N1, H12N2, H12N4, H12N5, H12N6, H12N8, H12N9, H13N2, H13N3, H13N6, H13N9, H14N5, H14N6, H15N2, H15N8, H15N9, H16N3, H1N1, H1N2, H1N3, H1N5, H1N6, H1N8, H1N9, H2N1, H2N2, H2N3, H2N4, H2N5, H2N6, H2N7, H2N8, H2N9, H3N1, H3N2, H3N3, H3N4, H3N5, H3N6, H3N8, H3N9, H4N1, H4N2, H4N3, H4N4, H4N5, H4N6, H4N7, H4N8, H4N9, H5N1, H5N2, H5N3, H5N4, H5N6, H5N7, H5N8, H5N9, H6N1, H6N2, H6N3, H6N4, H6N5, H6N6, H6N7, H6N8, H6N9, H7N1, H7N2, H7N3, H7N4, H7N5, H7N7, H7N8, H7N9, H8N2, H8N4, H8N5, H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, and H9N9, or to an unidentified subtype.

In further embodiments, the influenza virus B is, but is not limited to, one of the following strains: Influenza B virus (B/Aichi/186/2005), Influenza B virus (B/Aichi/5/88), Influenza B virus (B/Akita/27/2001), Influenza B virus (B/Akita/5/2001), Influenza B virus (B/Alabama/1/2006), Influenza B virus (B/Alabama/2/2005), Influenza B virus (B/Alaska/03/1992), Influenza B virus (B/Alaska/12/1996), Influenza B virus (B/Alaska/16/2000), Influenza B virus (B/Alaska/16/2003), Influenza B virus (B/Alaska/1777/2005), Influenza B virus (B/Alaska/2/2004), Influenza B virus (B/Alaska/6/2005), Influenza B virus (B/Ann Arbor/1/1986), Influenza B virus (B/Ann Arbor/1994), Influenza B virus (B/Argentina/132/2001), Influenza B virus (B/Argentina/3640/1999), Influenza B virus (B/Argentina/69/2001), Influenza B virus (B/Arizona/1/2005), Influenza B virus (B/Arizona/12/2003), Influenza B virus (B/Arizona/13/2003), Influenza B virus (B/Arizona/135/2005), Influenza B virus (B/Arizona/14/2001), Influenza B virus (B/Arizona/14/2005), Influenza B virus (B/Arizona/140/2005), Influenza B virus (B/Arizona/146/2005), Influenza B virus (B/Arizona/148/2005), Influenza B virus (B/Arizona/15/2005), Influenza B virus (B/Arizona/16/2005), Influenza B virus (B/Arizona/162/2005), Influenza B virus (B/Arizona/163/2005), Influenza B virus (B/Arizona/164/2005), Influenza B virus (B/Arizona/2/2000), Influenza B virus (B/Arizona/2/2005), Influenza B virus (B/Arizona/2e/2006), Influenza B virus (B/Arizona/3/2006), Influenza B virus (B/Arizona/4/2002), Influenza B virus (B/Arizona/4/2006), Influenza B virus (B/Arizona/48/2005), Influenza B virus (B/Arizona/5/2000), Influenza B virus (B/Arizona/59/2005), Influenza B virus (B/Arizona/7/2000), Influenza B virus (B/Auckland/01/2000), Influenza B virus (B/Bangkok/141/1994), Influenza B virus (B/Bangkok/143/1994), Influenza B virus (B/Bangkok/153/1990), Influenza B virus (B/Bangkok/163/1990), Influenza B virus (B/Bangkok/163/90), Influenza B virus (B/Bangkok/34/99), Influenza B 
virus (B/Bangkok/460/03), Influenza B virus (B/Bangkok/54/99), Influenza B virus (B/Barcelona/215/03), Influenza B virus (B/Beijing/15/84), Influenza B virus (B/Beijing/184/93), Influenza B virus (B/Beijing/243/97), Influenza B virus (B/Beijing/43/75), Influenza B virus (B/Beijing/5/76), Influenza B virus (B/Beijing/76/98), Influenza B virus (B/Belgium/WV106/2002), Influenza B virus (B/Belgium/WV107/2002), Influenza B virus (B/Belgium/WV109/2002), Influenza B virus (B/Belgium/WV114/2002), Influenza B virus (B/Belgium/WV122/2002), Influenza B virus (B/Bonn/43), Influenza B virus (B/Brazil/017/00), Influenza B virus (B/Brazil/053/00), Influenza B virus (B/Brazil/055/00), Influenza B virus (B/Brazil/064/00), Influenza B virus (B/Brazil/074/00), Influenza B virus (B/Brazil/079/00), Influenza B virus (B/Brazil/110/01), Influenza B virus (B/Brazil/952/2001), Influenza B virus (B/Brazil/975/2000), Influenza B virus (B/Brisbane/32/2002), Influenza B virus (B/Bucharest/311/1998), Influenza B virus (B/Bucharest/795/03), Influenza B virus (B/Buenos Aires/161/00), Influenza B virus (B/Buenos Aires/9/95), Influenza B virus (B/Buenos Aires/SW16/97), Influenza B virus (B/Buenos Aires/VL518/99), Influenza B virus (B/California/01/1995), Influenza B virus (B/California/02/1994), Influenza B virus (B/California/02/1995), Influenza B virus (B/California/1/2000), Influenza B virus (B/California/10/2000), Influenza B virus (B/California/11/2001), Influenza B virus (B/California/14/2005), Influenza B virus (B/California/2/2002), Influenza B virus (B/California/2/2003), Influenza B virus (B/California/3/2000), Influenza B virus (B/California/3/2004), Influenza B virus (B/California/6/2000), Influenza B virus (B/California/7/2005), Influenza B virus (B/Canada/16188/2000), Influenza B virus (B/Canada/464/2001), Influenza B virus (B/Canada/464/2002), Influenza B virus (B/Chaco/366/00), Influenza B virus (B/Chaco/R113/00), Influenza B virus (B/Chantaburi/218/2003), Influenza B virus 
(B/Cheju/303/03), Influenza B virus (B/Chiba/447/98), Influenza B virus (B/Chile/3162/2002), Influenza B virus (B/Chongqing/3/2000), Influenza B virus (B/clinical isolate SA1 Thailand/2002), Influenza B virus (B/clinical isolate SA10 Thailand/2002), Influenza B virus (B/clinical isolate SA100 Philippines/2002), Influenza B virus (B/clinical isolate SA101 Philippines/2002), Influenza B virus (B/clinical isolate SA102 Philippines/2002), Influenza B virus (B/clinical isolate SA103 Philippines/2002), Influenza B virus (B/clinical isolate SA104 Philippines/2002), Influenza B virus (B/clinical isolate SA105 Philippines/2002), Influenza B virus (B/clinical isolate SA106 Philippines/2002), Influenza B virus (B/clinical isolate SA107 Philippines/2002), Influenza B virus (B/clinical isolate SA108 Philippines/2002), Influenza B virus (B/clinical isolate SA109 Philippines/2002), Influenza B virus (B/clinical isolate SA11 Thailand/2002), Influenza B virus (B/clinical isolate SA110 Philippines/2002), Influenza B virus (B/clinical isolate SA112 Philippines/2002), Influenza B virus (B/clinical isolate SA113 Philippines/2002), Influenza B virus (B/clinical isolate SA114 Philippines/2002), Influenza B virus (B/clinical isolate SA115 Philippines/2002), Influenza B virus (B/clinical isolate SA116 Philippines/2002), Influenza B virus (B/clinical isolate SA12 Thailand/2002), Influenza B virus (B/clinical isolate SA13 Thailand/2002), Influenza B virus (B/clinical isolate SA14 Thailand/2002), Influenza B virus (B/clinical isolate SA15 Thailand/2002), Influenza B virus (B/clinical isolate SA16 Thailand/2002), Influenza B virus (B/clinical isolate SA17 Thailand/2002), Influenza B virus (B/clinical isolate SA18 Thailand/2002), Influenza B virus (B/clinical isolate SA19 Thailand/2002), Influenza B virus (B/clinical isolate SA2 Thailand/2002), Influenza B virus (B/clinical isolate SA20 Thailand/2002), Influenza B virus (B/clinical isolate SA21 Thailand/2002), Influenza B virus (B/clinical 
isolate SA22 Thailand/2002), Influenza B virus (B/clinical isolate SA23 Thailand/2002), Influenza B virus (B/clinical isolate SA24 Thailand/2002), Influenza B virus (B/clinical isolate SA25 Thailand/2002), Influenza B virus (B/clinical isolate SA26 Thailand/2002), Influenza B virus (B/clinical isolate SA27 Thailand/2002), Influenza B virus (B/clinical isolate SA28 Thailand/2002), Influenza B virus (B/clinical isolate SA29 Thailand/2002), Influenza B virus (B/clinical isolate SA3 Thailand/2002), Influenza B virus (B/clinical isolate SA30 Thailand/2002), Influenza B virus (B/clinical isolate SA31 Thailand/2002), Influenza B virus (B/clinical isolate SA32 Thailand/2002), Influenza B virus (B/clinical isolate SA33 Thailand/2002), Influenza B virus (B/clinical isolate SA34 Thailand/2002), Influenza B virus (B/clinical isolate SA37 Thailand/2002), Influenza B virus (B/clinical isolate SA38 Philippines/2002), Influenza B virus (B/clinical isolate SA39 Thailand/2002), Influenza B virus (B/clinical isolate SA40 Thailand/2002), Influenza B virus (B/clinical isolate SA41 Philippines/2002), Influenza B virus (B/clinical isolate SA42 Philippines/2002), Influenza B virus (B/clinical isolate SA43 Thailand/2002), Influenza B virus (B/clinical isolate SA44 Thailand/2002), Influenza B virus (B/clinical isolate SA45 Philippines/2002), Influenza B virus (B/clinical isolate SA46 Philippines/2002), Influenza B virus (B/clinical isolate SA47 Philippines/2002), Influenza B virus (B/clinical isolate SA5 Thailand/2002), Influenza B virus (B/clinical isolate SA50 Philippines/2002), Influenza B virus (B/clinical isolate SA51 Philippines/2002), Influenza B virus (B/clinical isolate SA52 Philippines/2002), Influenza B virus (B/clinical isolate SA53 Philippines/2002), Influenza B virus (B/clinical isolate SA57 Philippines/2002), Influenza B virus (B/clinical isolate SA58 Philippines/2002), Influenza B virus (B/clinical isolate SA59 Philippines/2002), Influenza B virus (B/clinical isolate SA6 
Thailand/2002), Influenza B virus (B/clinical isolate SA60 Philippines/2002), Influenza B virus (B/clinical isolate SA61 Philippines/2002), Influenza B virus (B/clinical isolate SA62 Philippines/2002), Influenza B virus (B/clinical isolate SA63 Philippines/2002), Influenza B virus (B/clinical isolate SA64 Philippines/2002), Influenza B virus (B/clinical isolate SA65 Philippines/2002), Influenza B virus (B/clinical isolate SA66 Philippines/2002), Influenza B virus (B/clinical isolate SA67 Philippines/2002), Influenza B virus (B/clinical isolate SA68 Philippines/2002), Influenza B virus (B/clinical isolate SA69 Philippines/2002), Influenza B virus (B/clinical isolate SA7 Thailand/2002), Influenza B virus (B/clinical isolate SA70 Philippines/2002), Influenza B virus (B/clinical isolate SA71 Philippines/2002), Influenza B virus (B/clinical isolate SA73 Philippines/2002), Influenza B virus (B/clinical isolate SA74 Philippines/2002), Influenza B virus (B/clinical isolate SA76 Philippines/2002), Influenza B virus (B/clinical isolate SA77 Philippines/2002), Influenza B virus (B/clinical isolate SA78 Philippines/2002), Influenza B virus (B/clinical isolate SA79 Philippines/2002), Influenza B virus (B/clinical isolate SA8 Thailand/2002), Influenza B virus (B/clinical isolate SA80 Philippines/2002), Influenza B virus (B/clinical isolate SA81 Philippines/2002), Influenza B virus (B/clinical isolate SA82 Philippines/2002), Influenza B virus (B/clinical isolate SA83 Philippines/2002), Influenza B virus (B/clinical isolate SA84 Philippines/2002), Influenza B virus (B/clinical isolate SA85 Thailand/2002), Influenza B virus (B/clinical isolate SA86 Thailand/2002), Influenza B virus (B/clinical isolate SA87 Thailand/2002), Influenza B virus (B/clinical isolate SA88 Thailand/2002), Influenza B virus (B/clinical isolate SA89 Thailand/2002), Influenza B virus (B/clinical isolate SA9 Thailand/2002), Influenza B virus (B/clinical isolate SA90 Thailand/2002), Influenza B virus (B/clinical 
isolate SA91 Thailand/2002), Influenza B virus (B/clinical isolate SA92 Thailand/2002), Influenza B virus (B/clinical isolate SA93 Thailand/2002), Influenza B virus (B/clinical isolate SA94 Thailand/2002), Influenza B virus (B/clinical isolate SA95 Philippines/2002), Influenza B virus (B/clinical isolate SA96 Thailand/2002), Influenza B virus (B/clinical isolate SA97 Philippines/2002), Influenza B virus (B/clinical isolate SA98 Philippines/2002), Influenza B virus (B/clinical isolate SA99 Philippines/2002), Influenza B virus (B/CNIC/27/2001), Influenza B virus (B/Colorado/04/2004), Influenza B virus (B/Colorado/11e/2004), Influenza B virus (B/Colorado/12e/2005), Influenza B virus (B/Colorado/13/2004), Influenza B virus (B/Colorado/13e/2004), Influenza B virus (B/Colorado/15/2004), Influenza B virus (B/Colorado/16e/2004), Influenza B virus (B/Colorado/17e/2004), Influenza B virus (B/Colorado/2/2004), Influenza B virus (B/Colorado/2597/2004), Influenza B virus (B/Colorado/4e/2004), Influenza B virus (B/Colorado/5/2004), Influenza B virus (B/Connecticut/02/1995), Influenza B virus (B/Connecticut/07/1993), Influenza B virus (B/Cordoba/2979/1991), Influenza B virus (B/Cordoba/VA418/99), Influenza B virus (B/Czechoslovakia/16/89), Influenza B virus (B/Czechoslovakia/69/1990), Influenza B virus (B/Czechoslovakia/69/90), Influenza B virus (B/Daeku/10/97), Influenza B virus (B/Daeku/45/97), Influenza B virus (B/Daeku/47/97), Influenza B virus (B/Daeku/9/97), Influenza B virus (B/Delaware/1/2006), Influenza B virus (B/Du/4/78), Influenza B virus (B/Durban/39/98), Influenza B virus (B/Durban/43/98), Influenza B virus (B/Durban/44/98), Influenza B virus (B/Durban/52/98), Influenza B virus (B/Durban/55/98), Influenza B virus (B/Durban/56/98), Influenza B virus (B/Egypt/2040/2004), Influenza B virus (B/England/1716/2005), Influenza B virus (B/England/2054/2005), Influenza B virus (B/England/23/04), Influenza B virus (B/EspiritoSanto/55/01), Influenza B virus 
(B/EspiritoSanto/79/99), Influenza B virus (B/Finland/154/2002), Influenza B virus (B/Finland/159/2002), Influenza B virus (B/Finland/160/2002), Influenza B virus (B/Finland/161/2002), Influenza B virus (B/Finland/162/03), Influenza B virus (B/Finland/162/2002), Influenza B virus (B/Finland/162/91), Influenza B virus (B/Finland/164/2003), Influenza B virus (B/Finland/172/91), Influenza B virus (B/Finland/173/2003), Influenza B virus (B/Finland/176/2003), Influenza B virus (B/Finland/184/91), Influenza B virus (B/Finland/188/2003), Influenza B virus (B/Finland/190/2003), Influenza B virus (B/Finland/191/2003), Influenza B virus (B/Finland/192/2003), Influenza B virus (B/Finland/193/2003), Influenza B virus (B/Finland/199/2003), Influenza B virus (B/Finland/202/2003), Influenza B virus (B/Finland/203/2003), Influenza B virus (B/Finland/204/2003), Influenza B virus (B/Finland/205/2003), Influenza B virus (B/Finland/206/2003), Influenza B virus (B/Finland/220/2003), Influenza B virus (B/Finland/223/2003), Influenza B virus (B/Finland/225/2003), Influenza B virus (B/Finland/227/2003), Influenza B virus (B/Finland/231/2003), Influenza B virus (B/Finland/235/2003), Influenza B virus (B/Finland/239/2003), Influenza B virus (B/Finland/244/2003), Influenza B virus (B/Finland/245/2003), Influenza B virus (B/Finland/254/2003), Influenza B virus (B/Finland/254/93), Influenza B virus (B/Finland/255/2003), Influenza B virus (B/Finland/260/93), Influenza B virus (B/Finland/268/93), Influenza B virus (B/Finland/270/2003), Influenza B virus (B/Finland/275/2003), Influenza B virus (B/Finland/767/2000), Influenza B virus (B/Finland/84/2002), Influenza B virus (B/Finland/886/2001), Influenza B virus (B/Finland/WV4/2002), Influenza B virus (B/Finland/WV5/2002), Influenza B virus (B/Florida/02/1998), Influenza B virus (B/Florida/02/2006), Influenza B virus (B/Florida/1/2000), Influenza B virus (B/Florida/1/2004), Influenza B virus (B/Florida/2/2004), Influenza B virus (B/Florida/2/2005), 
Influenza B virus (B/Florida/2/2006), Influenza B virus (B/Florida/7e/2004), Influenza B virus (B/Fujian/36/82), Influenza B virus (B/Geneva/5079/03), Influenza B virus (B/Genoa/11/02), Influenza B virus (B/Genoa/2/02), Influenza B virus (B/Genoa/21/02), Influenza B virus (B/Genoa/33/02), Influenza B virus (B/Genoa/41/02), Influenza B virus (B/Genoa/52/02), Influenza B virus (B/Genoa/55/02), Influenza B virus (B/Genoa/56/02), Influenza B virus (B/Genoa/7/02), Influenza B virus (B/Genoa/8/02), Influenza B virus (B/Genoa12/02), Influenza B virus (B/Genoa3/02), Influenza B virus (B/Genoa48/02), Influenza B virus (B/Genoa49/02), Influenza B virus (B/Genoa5/02), Influenza B virus (B/Genoa53/02), Influenza B virus (B/Genoa6/02), Influenza B virus (B/Genoa65/02), Influenza B virus (B/Genova/1294/03), Influenza B virus (B/Genova/1603/03), Influenza B virus (B/Genova/2/02), Influenza B virus (B/Genova/20/02), Influenza B virus (B/Genova/2059/03), Influenza B virus (B/Genova/26/02), Influenza B virus (B/Genova/30/02), Influenza B virus (B/Genova/54/02), Influenza B virus (B/Genova/55/02), Influenza B virus (B/Georgia/02/1998), Influenza B virus (B/Georgia/04/1998), Influenza B virus (B/Georgia/09/2005), Influenza B virus (B/Georgia/1/2000), Influenza B virus (B/Georgia/1/2005), Influenza B virus (B/Georgia/2/2005), Influenza B virus (B/Georgia/9/2005), Influenza B virus (B/Guangdong/05/94), Influenza B virus (B/Guangdong/08/93), Influenza B virus (B/Guangdong/5/94), Influenza B virus (B/Guangdong/55/89), Influenza B virus (B/Guangdong/8/93), Influenza B virus (B/Guangzhou/7/97), Influenza B virus (B/Guangzhou/86/92), Influenza B virus (B/Guangzhou/87/92), Influenza B virus (B/Gyeonggi/592/2005), Influenza B virus (B/Hannover/2/90), Influenza B virus (B/Harbin/07/94), Influenza B virus (B/Hawaii/1/2003), Influenza B virus (B/Hawaii/10/2001), Influenza B virus (B/Hawaii/10/2004), Influenza B virus (B/Hawaii/11/2004), Influenza B virus (B/Hawaii/11e/2004), Influenza B virus 
(B/Hawaii/11e/2005), Influenza B virus (B/Hawaii/12e/2005), Influenza B virus (B/Hawaii/13/2004), Influenza B virus (B/Hawaii/13e/2004), Influenza B virus (B/Hawaii/17/2001), Influenza B virus (B/Hawaii/18e/2004), Influenza B virus (B/Hawaii/1990/2004), Influenza B virus (B/Hawaii/1993/2004), Influenza B virus (B/Hawaii/19e/2004), Influenza B virus (B/Hawaii/2/2000), Influenza B virus (B/Hawaii/2/2003), Influenza B virus (B/Hawaii/20e/2004), Influenza B virus (B/Hawaii/21/2004), Influenza B virus (B/Hawaii/26/2001), Influenza B virus (B/Hawaii/31e/2004), Influenza B virus (B/Hawaii/32e/2004), Influenza B virus (B/Hawaii/33e/2004), Influenza B virus (B/Hawaii/35/2001), Influenza B virus (B/Hawaii/36/2001), Influenza B virus (B/Hawaii/37/2001), Influenza B virus (B/Hawaii/38/2001), Influenza B virus (B/Hawaii/4/2006), Influenza B virus (B/Hawaii/43/2001), Influenza B virus (B/Hawaii/44/2001), Influenza B virus (B/Hawaii/9/2001), Influenza B virus (B/Hebei/19/94), Influenza B virus (B/Hebei/3/94), Influenza B virus (B/Hebei/4/95), Influenza B virus (B/Henan/22/97), Influenza B virus (B/Hiroshima/23/2001), Influenza B virus (B/Hong Kong/02/1993), Influenza B virus (B/Hong Kong/03/1992), Influenza B virus (B/Hong Kong/05/1972), Influenza B virus (B/Hong Kong/06/2001), Influenza B virus (B/Hong Kong/110/99), Influenza B virus (B/Hong Kong/1115/2002), Influenza B virus (B/Hong Kong/112/2001), Influenza B virus (B/Hong Kong/123/2001), Influenza B virus (B/Hong Kong/1351/02), Influenza B virus (B/Hong Kong/1351/2002), Influenza B virus (B/Hong Kong/1434/2002), Influenza B virus (B/Hong Kong/147/99), Influenza B virus (B/Hong Kong/156/99), Influenza B virus (B/Hong Kong/157/99), Influenza B virus (B/Hong Kong/167/2002), Influenza B virus (B/Hong Kong/22/1989), Influenza B virus (B/Hong Kong/22/2001), Influenza B virus (B/Hong Kong/22/89), Influenza B virus (B/Hong Kong/28/2001), Influenza B virus (B/Hong Kong/293/02), Influenza B virus (B/Hong Kong/310/2004), Influenza B 
virus (B/Hong Kong/329/2001), Influenza B virus (B/Hong Kong/330/2001 egg adapted), Influenza B virus (B/Hong Kong/330/2001), Influenza B virus (B/Hong Kong/330/2002), Influenza B virus (B/Hong Kong/335/2001), Influenza B virus (B/Hong Kong/336/2001), Influenza B virus (B/Hong Kong/497/2001), Influenza B virus (B/Hong Kong/542/2000), Influenza B virus (B/Hong Kong/548/2000), Influenza B virus (B/Hong Kong/553a/2003), Influenza B virus (B/Hong Kong/557/2000), Influenza B virus (B/Hong Kong/6/2001), Influenza B virus (B/Hong Kong/666/2001), Influenza B virus (B/Hong Kong/692/01), Influenza B virus (B/Hong Kong/70/1996), Influenza B virus (B/Hong Kong/8/1973), Influenza B virus (B/Hong Kong/9/89), Influenza B virus (B/Houston/1/91), Influenza B virus (B/Houston/1/92), Influenza B virus (B/Houston/1/96), Influenza B virus (B/Houston/2/93), Influenza B virus (B/Houston/2/96), Influenza B virus (B/Houston/B15/1999), Influenza B virus (B/Houston/B56/1997), Influenza B virus (B/Houston/B57/1997), Influenza B virus (B/Houston/B58/1997), Influenza B virus (B/Houston/B59/1997), Influenza B virus (B/Houston/B60/1997), Influenza B virus (B/Houston/B61/1997), Influenza B virus (B/Houston/B63/1997), Influenza B virus (B/Houston/B65/1998), Influenza B virus (B/Houston/B66/2000), Influenza B virus (B/Houston/B67/2000), Influenza B virus (B/Houston/B68/2000), Influenza B virus (B/Houston/B69/2002), Influenza B virus (B/Houston/B70/2002), Influenza B virus (B/Houston/B71/2002), Influenza B virus (B/Houston/B720/2004), Influenza B virus (B/Houston/B74/2002), Influenza B virus (B/Houston/B745/2005), Influenza B virus (B/Houston/B75/2002), Influenza B virus (B/Houston/B756/2005), Influenza B virus (B/Houston/B77/2002), Influenza B virus (B/Houston/B787/2005), Influenza B virus (B/Houston/B79/2003), Influenza B virus (B/Houston/B81/2003), Influenza B virus (B/Houston/B84/2003), Influenza B virus (B/Houston/B846/2005), Influenza B virus (B/Houston/B850/2005), Influenza B virus 
(B/Houston/B86/2003), Influenza B virus (B/Houston/B87/2003), Influenza B virus (B/Houston/B88/2003), Influenza B virus (B/Hunan/4/72), Influenza B virus (B/Ibaraki/2/85), Influenza B virus (B/Idaho/1/2005), Influenza B virus (B/Illinois/1/2004), Influenza B virus (B/Illinois/13/2004), Influenza B virus (B/Illinois/13/2005), Influenza B virus (B/Illinois/13e/2005), Influenza B virus (B/Illinois/3/2001), Influenza B virus (B/Illinois/3/2005), Influenza B virus (B/Illinois/33/2005), Influenza B virus (B/Illinois/36/2005), Influenza B virus (B/Illinois/4/2005), Influenza B virus (B/Illinois/47/2005), Influenza B virus (B/Incheon/297/2005), Influenza B virus (B/India/3/89), Influenza B virus (B/India/7526/2001), Influenza B virus (B/India/7569/2001), Influenza B virus (B/India/7600/2001), Influenza B virus (B/India/7605/2001), Influenza B virus (B/India/77276/2001), Influenza B virus (B/Indiana/01/1995), Influenza B virus (B/Indiana/3/2006), Influenza B virus (B/Indiana/5/2006), Influenza B virus (B/Iowa/03/2002), Influenza B virus (B/Iowa/1/2001), Influenza B virus (B/Iowa/1/2005), Influenza B virus (B/Israel/95/03), Influenza B virus (B/Israel/WV124/2002), Influenza B virus (B/Israel/WV126/2002), Influenza B virus (B/Israel/WV133/2002), Influenza B virus (B/Israel/WV135/2002), Influenza B virus (B/Israel/WV137/2002), Influenza B virus (B/Israel/WV142/2002), Influenza B virus (B/Israel/WV143/2002), Influenza B virus (B/Israel/WV145/2002), Influenza B virus (B/Israel/WV146/2002), Influenza B virus (B/Israel/WV150/2002), Influenza B virus (B/Israel/WV153/2002), Influenza B virus (B/Israel/WV158/2002), Influenza B virus (B/Israel/WV161/2002), Influenza B virus (B/Israel/WV166/2002), Influenza B virus (B/Israel/WV169/2002), Influenza B virus (B/Israel/WV170/2002), Influenza B virus (B/Israel/WV174/2002), Influenza B virus (B/Israel/WV183/2002), Influenza B virus (B/Israel/WV187/2002), Influenza B virus (B/Istanbul/CTF-132/05), Influenza B virus (B/Japan/1224/2005), 
Influenza B virus (B/Japan/1905/2005), Influenza B virus (B/Jiangsu/10/03), Influenza B virus (B/Jiangsu/10/2003 (recomb)), Influenza B virus (B/Jiangsu/10/2003), Influenza B virus (B/Jilin/20/2003), Influenza B virus (B/Johannesburg/05/1999), Influenza B virus (B/Johannesburg/06/1994), Influenza B virus (B/Johannesburg/1/99), Influenza B virus (B/Johannesburg/113/010), Influenza B virus (B/Johannesburg/116/01), Influenza B virus (B/Johannesburg/119/01), Influenza B virus (B/Johannesburg/123/01), Influenza B virus (B/Johannesburg/163/99), Influenza B virus (B/Johannesburg/187/99), Influenza B virus (B/Johannesburg/189/99), Influenza B virus (B/Johannesburg/2/99), Influenza B virus (B/Johannesburg/27/2005), Influenza B virus (B/Johannesburg/33/01), Influenza B virus (B/Johannesburg/34/01), Influenza B virus (B/Johannesburg/35/01), Influenza B virus (B/Johannesburg/36/01), Influenza B virus (B/Johannesburg/41/99), Influenza B virus (B/Johannesburg/5/99), Influenza B virus (B/Johannesburg/69/2001), Influenza B virus (B/Johannesburg/77/01), Influenza B virus (B/Johannesburg/94/99), Influenza B virus (B/Johannesburg/96/01), Influenza B virus (B/Kadoma/1076/99), Influenza B virus (B/Kadoma/122/99), Influenza B virus (B/Kadoma/122/99-V1), Influenza B virus (B/Kadoma/122/99-V10), Influenza B virus (B/Kadoma/122/99-V11), Influenza B virus (B/Kadoma/122/99-V2), Influenza B virus (B/Kadoma/122/99-V3), Influenza B virus (B/Kadoma/122/99-V4), Influenza B virus (B/Kadoma/122/99-V5), Influenza B virus (B/Kadoma/122/99-V6), Influenza B virus (B/Kadoma/122/99-V7), Influenza B virus (B/Kadoma/122/99-V8), Influenza B virus (B/Kadoma/122/99-V9), Influenza B virus (B/Kadoma/136/99), Influenza B virus (B/Kadoma/409/2000), Influenza B virus (B/Kadoma/506/99), Influenza B virus (B/kadoma/642/99), Influenza B virus (B/Kadoma/647/99), Influenza B virus (B/Kagoshima/15/94), Influenza B virus (B/Kanagawa/73), Influenza B virus (B/Kansas/1/2005), Influenza B virus (B/Kansas/22992/99), 
Influenza B virus (B/Kentucky/4/2005), Influenza B virus (B/Khazkov/224/91), Influenza B virus (B/Kisumu/2036/2006), Influenza B virus (B/Kisumu/2037/2006), Influenza B virus (B/Kisumu/2038/2006), Influenza B virus (B/Kisumu/2039/2006), Influenza B virus (B/Kisumu/2040/2006), Influenza B virus (B/Kisumu/7/2005), Influenza B virus (B/Kobe/1/2002), Influenza B virus (B/Kobe/1/2002-V1), Influenza B virus (B/Kobe/1/2002-V2), Influenza B virus (B/Kobe/1/2003), Influenza B virus (B/Kobe/1/94), Influenza B virus (B/Kobe/2/2002), Influenza B virus (B/Kobe/2/2003), Influenza B virus (B/Kobe/25/2003), Influenza B virus (B/Kobe/26/2003), Influenza B virus (B/Kobe/28/2003), Influenza B virus (B/Kobe/3/2002), Influenza B virus (B/Kobe/3/2003), Influenza B virus (B/Kobe/4/2002), Influenza B virus (B/Kobe/4/2003), Influenza B virus (B/Kobe/5/2002), Influenza B virus (B/Kobe/6/2002), Influenza B virus (B/Kobe/64/2001), Influenza B virus (B/Kobe/65/2001), Influenza B virus (B/Kobe/69/2001), Influenza B virus (B/Kobe/7/2002), Influenza B virus (B/Kobe/79/2001), Influenza B virus (B/Kobe/83/2001), Influenza B virus (B/Kobe/87/2001), Influenza B virus (B/Kouchi/193/1999), Influenza B virus (B/Kouchi/193/99), Influenza B virus (B/Lazio/1/02), Influenza B virus (B/Lee/40), Influenza B virus (B/Leningrad/129/91), Influenza B virus (B/Leningrad/148/91), Influenza B virus (B/Lisbon/02/1994), Influenza B virus (B/Lissabon/2/90), Influenza B virus (B/Los Angeles/1/02), Influenza B virus (B/Lusaka/270/99), Influenza B virus (B/Lusaka/432/99), Influenza B virus (B/Lyon/1271/96), Influenza B virus (B/Malaysia/83077/2001), Influenza B virus (B/Maputo/1/99), Influenza B virus (B/Maputo/2/99), Influenza B virus (B/Mar del Plata/595/99), Influenza B virus (B/Mar del Plata/VL373/99), Influenza B virus (B/Mar del Plata/VL385/99), Influenza B virus (B/Maryland/1/01), Influenza B virus (B/Maryland/1/2002), Influenza B virus (B/Maryland/2/2001), Influenza B virus (B/Maryland/7/2003), Influenza B virus 
(B/Massachusetts/1/2004), Influenza B virus (B/Massachusetts/2/2004), Influenza B virus (B/Massachusetts/3/2004), Influenza B virus (B/Massachusetts/4/2001), Influenza B virus (B/Massachusetts/5/2003), Influenza B virus (B/Memphis/1/01), Influenza B virus (B/Memphis/10/97), Influenza B virus (B/Memphis/11/2006), Influenza B virus (B/Memphis/12/2006), Influenza B virus (B/Memphis/12/97), Influenza B virus (B/Memphis/12/97-MA), Influenza B virus (B/Memphis/13/03), Influenza B virus (B/Memphis/18/95), Influenza B virus (B/Memphis/19/96), Influenza B virus (B/Memphis/20/96), Influenza B virus (B/Memphis/21/96), Influenza B virus (B/Memphis/28/96), Influenza B virus (B/Memphis/3/01), Influenza B virus (B/Memphis/3/89), Influenza B virus (B/Memphis/3/93), Influenza B virus (B/Memphis/4/93), Influenza B virus (B/Memphis/5/93), Influenza B virus (B/Memphis/7/03), Influenza B virus (B/Memphis/8/99), Influenza B virus (B/Mexico/84/2000), Influenza B virus (B/Michigan/04/2006), Influenza B virus (B/Michigan/1/2005), Influenza B virus (B/Michigan/1/2006), Influenza B virus (B/Michigan/2/2004), Influenza B virus (B/Michigan/20/2005), Influenza B virus (B/Michigan/22572/99), Influenza B virus (B/Michigan/22587/99), Influenza B virus (B/Michigan/22596/99), Influenza B virus (B/Michigan/22631/99), Influenza B virus (B/Michigan/22659/99), Influenza B virus (B/Michigan/22687/99), Influenza B virus (B/Michigan/22691/99), Influenza B virus (B/Michigan/22721/99), Influenza B virus (B/Michigan/22723/99), Influenza B virus (B/Michigan/2e/2006), Influenza B virus (B/Michigan/3/2004), Influenza B virus (B/Michigan/4/2006), Influenza B virus (B/Michigan/e3/2006), Influenza B virus (B/micona/1/1989), Influenza B virus (B/Mie/01/1993), Influenza B virus (B/Mie/1/93), Influenza B virus (B/Milano/1/01), Influenza B virus (B/Milano/1/02), Influenza B virus (B/Milano/5/02), Influenza B virus (B/Milano/6/02), Influenza B virus (B/Milano/66/04), Influenza B virus (B/Milano/7/02), Influenza B virus 
(B/Minnesota/1/1985), Influenza B virus (B/Minnesota/14/2001), Influenza B virus (B/Minnesota/2/2001), Influenza B virus (B/Minsk/318/90), Influenza B virus (B/Mississippi/1/2001), Influenza B virus (B/Mississippi/2/2005), Influenza B virus (B/Mississippi/3/2001), Influenza B virus (B/Mississippi/3/2005), Influenza B virus (B/Mississippi/4/2003), Influenza B virus (B/Mississippi/4e/2005), Influenza B virus (B/Missouri/1/2006), Influenza B virus (B/Missouri/11/2003), Influenza B virus (B/Missouri/2/2005), Influenza B virus (B/Missouri/20/2003), Influenza B virus (B/Missouri/6/2005), Influenza B virus (B/Montana/1/2003), Influenza B virus (B/Montana/1/2006), Influenza B virus (B/Montana/1e/2004), Influenza B virus (B/Moscow/16/2002), Influenza B virus (B/Moscow/3/03), Influenza B virus (B/Nagoya/20/99), Influenza B virus (B/Nairobi/2032/2006), Influenza B virus (B/Nairobi/2033/2006), Influenza B virus (B/Nairobi/2034/2006), Influenza B virus (B/Nairobi/2035/2006), Influenza B virus (B/Nairobi/351/2005), Influenza B virus (B/Nairobi/670/2005), Influenza B virus (B/Nanchang/1/00), Influenza B virus (B/Nanchang/1/2000), Influenza B virus (B/Nanchang/12/98), Influenza B virus (B/Nanchang/15/95), Influenza B virus (B/Nanchang/15/97), Influenza B virus (B/Nanchang/195/94), Influenza B virus (B/Nanchang/2/97), Influenza B virus (B/Nanchang/20/96), Influenza B virus (B/Nanchang/26/93), Influenza B virus (B/Nanchang/3/95), Influenza B virus (B/Nanchang/4/97), Influenza B virus (B/Nanchang/480/94), Influenza B virus (B/Nanchang/5/97), Influenza B virus (B/Nanchang/560/94), Influenza B virus (B/Nanchang/560a/94), Influenza B virus (B/Nanchang/560b/94), Influenza B virus (B/Nanchang/6/96), Influenza B virus (B/Nanchang/6/98), Influenza B virus (B/Nanchang/630/94), Influenza B virus (B/Nanchang/7/98), Influenza B virus (B/Nanchang/8/95), Influenza B virus (B/Nashville/107/93), Influenza B virus (B/Nashville/3/96), Influenza B virus (B/Nashville/34/96), Influenza B virus 
(B/Nashville/45/91), Influenza B virus (B/Nashville/48/91), Influenza B virus (B/Nashville/6/89), Influenza B virus (B/Nebraska/1/01), Influenza B virus (B/Nebraska/1/2005), Influenza B virus (B/Nebraska/2/01), Influenza B virus (B/Nebraska/4/2001), Influenza B virus (B/Nebraska/5/2003), Influenza B virus (B/Nepal/1078/2005), Influenza B virus (B/Nepal/1079/2005), Influenza B virus (B/Nepal/1080/2005), Influenza B virus (B/Nepal/1087/2005), Influenza B virus (B/Nepal/1088/2005), Influenza B virus (B/Nepal/1089/2005), Influenza B virus (B/Nepal/1090/2005), Influenza B virus (B/Nepal/1092/2005), Influenza B virus (B/Nepal/1098/2005), Influenza B virus (B/Nepal/1101/2005), Influenza B virus (B/Nepal/1103/2005), Influenza B virus (B/Nepal/1104/2005), Influenza B virus (B/Nepal/1105/2005), Influenza B virus (B/Nepal/1106/2005), Influenza B virus (B/Nepal/1108/2005), Influenza B virus (B/Nepal/1114/2005), Influenza B virus (B/Nepal/1117/2005), Influenza B virus (B/Nepal/1118/2005), Influenza B virus (B/Nepal/1120/2005), Influenza B virus (B/Nepal/1122/2005), Influenza B virus (B/Nepal/1131/2005), Influenza B virus (B/Nepal/1132/2005), Influenza B virus (B/Nepal/1136/2005), Influenza B virus (B/Nepal/1137/2005), Influenza B virus (B/Nepal/1138/2005), Influenza B virus (B/Nepal/1139/2005), Influenza B virus (B/Nepal/1331/2005), Influenza B virus (B/Netherland/2781/90), Influenza B virus (B/Netherland/6357/90), Influenza B virus (B/Netherland/800/90), Influenza B virus (B/Netherland/801/90), Influenza B virus (B/Netherlands/1/97), Influenza B virus (B/Netherlands/13/94), Influenza B virus (B/Netherlands/2/95), Influenza B virus (B/Netherlands/31/95), Influenza B virus (B/Netherlands/32/94), Influenza B virus (B/Netherlands/384/95), Influenza B virus (B/Netherlands/429/98), Influenza B virus (B/Netherlands/580/89), Influenza B virus (B/Netherlands/6/96), Influenza B virus (B/Nevada/1/2001), Influenza B virus (B/Nevada/1/2002), Influenza B virus (B/Nevada/1/2005), Influenza B 
virus (B/Nevada/1/2006), Influenza B virus (B/Nevada/2/2003), Influenza B virus (B/Nevada/2/2006), Influenza B virus (B/Nevada/3/2006), Influenza B virus (B/Nevada/5/2005), Influenza B virus (B/New Jersey/1/2002), Influenza B virus (B/New Jersey/1/2004), Influenza B virus (B/New Jersey/1/2005), Influenza B virus (B/New Jersey/1/2006), Influenza B virus (B/New Jersey/3/2001), Influenza B virus (B/New Jersey/3/2005), Influenza B virus (B/New Jersey/4/2001), Influenza B virus (B/New Jersey/5/2005), Influenza B virus (B/New Jersey/6/2005), Influenza B virus (B/New Mexico/1/2001), Influenza B virus (B/New Mexico/1/2006), Influenza B virus (B/New Mexico/2/2005), Influenza B virus (B/New Mexico/9/2003), Influenza B virus (B/New York/1/2001), Influenza B virus (B/New York/1/2002), Influenza B virus (B/New York/1/2004), Influenza B virus (B/New York/1/2006), Influenza B virus (B/New York/10/2002), Influenza B virus (B/New York/11/2005), Influenza B virus (B/New York/12/2001), Influenza B virus (B/New York/12/2005), Influenza B virus (B/New York/12e/2005), Influenza B virus (B/New York/14e/2005), Influenza B virus (B/New York/17/2004), Influenza B virus (B/New York/18/2003), Influenza B virus (B/New York/19/2004), Influenza B virus (B/New York/2/2000), Influenza B virus (B/New York/2/2002), Influenza B virus (B/New York/2/2006), Influenza B virus (B/New York/20139/99), Influenza B virus (B/New York/24/1993), Influenza B virus (B/New York/2e/2005), Influenza B virus (B/New York/3/90), Influenza B virus (B/New York/39/1991), Influenza B virus (B/New York/40/2002), Influenza B virus (B/New York/47/2001), Influenza B virus (B/New York/6/2004), Influenza B virus (B/New York/7/2002), Influenza B virus (B/New York/8/2000), Influenza B virus (B/New York/9/2002), Influenza B virus (B/New York/9/2004), Influenza B virus (B/New York/C10/2004), Influenza B virus (B/NIB/48/90), Influenza B virus (B/Ningxia/45/83), Influenza B virus (B/North Carolina/1/2005), Influenza B virus (B/North 
Carolina/3/2005), Influenza B virus (B/North Carolina/4/2004), Influenza B virus (B/North Carolina/5/2004), Influenza B virus (B/Norway/1/84), Influenza B virus (B/Ohio/1/2005), Influenza B virus (B/Ohio/1/X-19/2005), Influenza B virus (B/Ohio/1e/2005), Influenza B virus (B/Ohio/1e4/2005), Influenza B virus (B/Ohio/2/2002), Influenza B virus (B/Ohio/2e/2005), Influenza B virus (B/Oita/15/1992), Influenza B virus (B/Oklahoma/1/2006), Influenza B virus (B/Oklahoma/2/2005), Influenza B virus (B/Oman/16291/2001), Influenza B virus (B/Oman/16296/2001), Influenza B virus (B/Oman/16299/2001), Influenza B virus (B/Oman/16305/2001), Influenza B virus (B/Oregon/1/2005), Influenza B virus (B/Oregon/1/2006), Influenza B virus (B/Oregon/5/80), Influenza B virus (B/Osaka/1036/97), Influenza B virus (B/Osaka/1058/97), Influenza B virus (B/Osaka/1059/97), Influenza B virus (B/Osaka/1146/1997), Influenza B virus (B/Osaka/1169/97), Influenza B virus (B/Osaka/1201/2000), Influenza B virus (B/Osaka/547/1997), Influenza B virus (B/Osaka/547/97), Influenza B virus (B/Osaka/710/1997), Influenza B virus (B/Osaka/711/97), Influenza B virus (B/Osaka/728/1997), Influenza B virus (B/Osaka/755/1997), Influenza B virus (B/Osaka/820/1997), Influenza B virus (B/Osaka/837/1997), Influenza B virus (B/Osaka/854/1997), Influenza B virus (B/Osaka/983/1997), Influenza B virus (B/Osaka/983/1997-M1), Influenza B virus (B/Osaka/983/1997-M2), Influenza B virus (B/Osaka/983/97-V1), Influenza B virus (B/Osaka/983/97-V2), Influenza B virus (B/Osaka/983/97-V3), Influenza B virus (B/Osaka/983/97-V4), Influenza B virus (B/Osaka/983/97-V5), Influenza B virus (B/Osaka/983/97-V6), Influenza B virus (B/Osaka/983/97-V7), Influenza B virus (B/Osaka/983/97-V8), Influenza B virus (B/Osaka/c19/93), Influenza B virus (B/Oslo/1072/2001), Influenza B virus (B/Oslo/1329/2002), Influenza B virus (B/Oslo/1510/2002), Influenza B virus (B/Oslo/1846/2002), Influenza B virus (B/Oslo/1847/2002), Influenza B virus 
(B/Oslo/1862/2001), Influenza B virus (B/Oslo/1864/2001), Influenza B virus (B/Oslo/1870/2002), Influenza B virus (B/Oslo/1871/2002), Influenza B virus (B/Oslo/2293/2001), Influenza B virus (B/Oslo/2295/2001), Influenza B virus (B/Oslo/2297/2001), Influenza B virus (B/Oslo/238/2001), Influenza B virus (B/Oslo/3761/2000), Influenza B virus (B/Oslo/47/2001), Influenza B virus (B/Oslo/668/2002), Influenza B virus (B/Oslo/71/04), Influenza B virus (B/Oslo/801/99), Influenza B virus (B/Oslo/805/99), Influenza B virus (B/Oslo/837/99), Influenza B virus (B/Panama/45/1990), Influenza B virus (B/Panama/45/90), Influenza B virus (B/Paraguay/636/2003), Influenza B virus (B/Paris/329/90), Influenza B virus (B/Paris/549/1999), Influenza B virus (B/Parma/1/03), Influenza B virus (B/Parma/1/04), Influenza B virus (B/Parma/13/02), Influenza B virus (B/Parma/16/02), Influenza B virus (B/Parma/2/03), Influenza B virus (B/Parma/2/04), Influenza B virus (B/Parma/23/02), Influenza B virus (B/Parma/24/02), Influenza B virus (B/Parma/25/02), Influenza B virus (B/Parma/28/02), Influenza B virus (B/Parma/3/04), Influenza B virus (B/Parma/4/04), Influenza B virus (B/Parma/5/02), Influenza B virus (B/Pennsylvania/1/2006), Influenza B virus (B/Pennsylvania/2/2001), Influenza B virus (B/Pennsylvania/2/2006), Influenza B virus (B/Pennsylvania/3/2003), Influenza B virus (B/Pennsylvania/3/2006), Influenza B virus (B/Pennsylvania/4/2004), Influenza B virus (B/Perth/211/2001), Influenza B virus (B/Perth/25/2002), Influenza B virus (B/Peru/1324/2004), Influenza B virus (B/Peru/1364/2004), Influenza B virus (B/Perugia/4/03), Influenza B virus (B/Philippines/5072/2001), Influenza B virus (B/Philippines/93079/2001), Influenza B virus (B/Pusan/250/99), Influenza B virus (B/Pusan/255/99), Influenza B virus (B/Pusan/270/99), Influenza B virus (B/Pusan/285/99), Influenza B virus (B/Quebec/1/01), Influenza B virus (B/Quebec/162/98), Influenza B virus (B/Quebec/173/98), Influenza B virus (B/Quebec/2/01), 
Influenza B virus (B/Quebec/3/01), Influenza B virus (B/Quebec/4/01), Influenza B virus (B/Quebec/452/98), Influenza B virus (B/Quebec/453/98), Influenza B virus (B/Quebec/465/98), Influenza B virus (B/Quebec/51/98), Influenza B virus (B/Quebec/511/98), Influenza B virus (B/Quebec/514/98), Influenza B virus (B/Quebec/517/98), Influenza B virus (B/Quebec/6/01), Influenza B virus (B/Quebec/7/01), Influenza B virus (B/Quebec/74199/99), Influenza B virus (B/Quebec/74204/99), Influenza B virus (B/Quebec/74206/99), Influenza B virus (B/Quebec/8/01), Influenza B virus (B/Quebec/9/01), Influenza B virus (B/Rabat/41/97), Influenza B virus (B/Rabat/45/97), Influenza B virus (B/Rabat/61/97), Influenza B virus (B/RiodeJaneiro/200/02), Influenza B virus (B/RiodeJaneiro/209/02), Influenza B virus (B/RiodeJaneiro/315/01), Influenza B virus (B/RiodeJaneiro/353/02), Influenza B virus (B/RiodeJaneiro/354/02), Influenza B virus (B/RioGdoSul/337/01), Influenza B virus (B/RioGdoSul/357/02), Influenza B virus (B/RioGdoSul/374/01), Influenza B virus (B/Roma/1/03), Influenza B virus (B/Roma/2/03), Influenza B virus (B/Roma/3/03), Influenza B virus (B/Roma/4/02), Influenza B virus (B/Roma/7/02), Influenza B virus (B/Romania/217/1999), Influenza B virus (B/Romania/318/1998), Influenza B virus (B/Russia/22/1995), Influenza B virus (B/Saga/S172/99), Influenza B virus (B/Seal/Netherlands/1/99), Influenza B virus (B/Seoul/1/89), Influenza B virus (B/Seoul/1163/2004), Influenza B virus (B/Seoul/12/88), Influenza B virus (B/seoul/12/95), Influenza B virus (B/Seoul/13/95), Influenza B virus (B/Seoul/16/97), Influenza B virus (B/Seoul/17/95), Influenza B virus (B/Seoul/19/97), Influenza B virus (B/Seoul/21/95), Influenza B virus (B/Seoul/232/2004), Influenza B virus (B/Seoul/28/97), Influenza B virus (B/Seoul/31/97), Influenza B virus (B/Seoul/37/91), Influenza B virus (B/Seoul/38/91), Influenza B virus (B/Seoul/40/91), Influenza B virus (B/Seoul/41/91), Influenza B virus (B/Seoul/6/88), Influenza 
B virus (B/Shandong/7/97), Influenza B virus (B/Shangdong/7/97), Influenza B virus (B/Shanghai/1/77), Influenza B virus (B/Shanghai/10/80), Influenza B virus (B/Shanghai/24/76), Influenza B virus (B/Shanghai/35/84), Influenza B virus (B/Shanghai/361/03), Influenza B virus (B/Shanghai/361/2002), Influenza B virus (B/Shenzhen/423/99), Influenza B virus (B/Shiga/51/98), Influenza B virus (B/Shiga/N18/98), Influenza B virus (B/Shiga/T30/98), Influenza B virus (B/Shiga/T37/98), Influenza B virus (B/Shizuoka/15/2001), Influenza B virus (B/Shizuoka/480/2000), Influenza B virus (B/Sichuan/281/96), Influenza B virus (B/Sichuan/317/2001), Influenza B virus (B/Sichuan/379/99), Influenza B virus (B/Sichuan/38/2000), Influenza B virus (B/Sichuan/8/92), Influenza B virus (B/Siena/1/02), Influenza B virus (B/Singapore/04/1991), Influenza B virus (B/Singapore/11/1994), Influenza B virus (B/Singapore/22/1998), Influenza B virus (B/Singapore/222/79), Influenza B virus (B/Singapore/31/1998), Influenza B virus (B/Singapore/35/1998), Influenza B virus (B/South Australia/5/1999), Influenza B virus (B/South Carolina/04/2003), Influenza B virus (B/South Carolina/25723/99), Influenza B virus (B/South Carolina/3/2003), Influenza B virus (B/South Carolina/4/2003), Influenza B virus (B/South Dakota/1/2000), Influenza B virus (B/South Dakota/3/2003), Influenza B virus (B/South Dakota/5/89), Influenza B virus (B/Spain/WV22/2002), Influenza B virus (B/Spain/WV26/2002), Influenza B virus (B/Spain/WV27/2002), Influenza B virus (B/Spain/WV29/2002), Influenza B virus (B/Spain/WV33/2002), Influenza B virus (B/Spain/WV34/2002), Influenza B virus (B/Spain/WV36/2002), Influenza B virus (B/Spain/WV41/2002), Influenza B virus (B/Spain/WV42/2002), Influenza B virus (B/Spain/WV43/2002), Influenza B virus (B/Spain/WV45/2002), Influenza B virus (B/Spain/WV50/2002), Influenza B virus (B/Spain/WV51/2002), Influenza B virus (B/Spain/WV56/2002), Influenza B virus (B/Spain/WV57/2002), Influenza B virus 
(B/Spain/WV65/2002), Influenza B virus (B/Spain/WV66/2002), Influenza B virus (B/Spain/WV67/2002), Influenza B virus (B/Spain/WV69/2002), Influenza B virus (B/Spain/WV70/2002), Influenza B virus (B/Spain/WV73/2002), Influenza B virus (B/Spain/WV78/2002), Influenza B virus (B/St. Petersburg/14/2006), Influenza B virus (B/StaCatarina/308/02), Influenza B virus (B/StaCatarina/315/02), Influenza B virus (B/StaCatarina/318/02), Influenza B virus (B/StaCatarina/345/02), Influenza B virus (B/Stockholm/10/90), Influenza B virus (B/Suzuka/18/2005), Influenza B virus (B/Suzuka/28/2005), Influenza B virus (B/Suzuka/32/2005), Influenza B virus (B/Suzuka/58/2005), Influenza B virus (B/Switzerland/4291/97), Influenza B virus (B/Switzerland/5219/90), Influenza B virus (B/Switzerland/5241/90), Influenza B virus (B/Switzerland/5441/90), Influenza B virus (B/Switzerland/5444/90), Influenza B virus (B/Switzerland/5812/90), Influenza B virus (B/Switzerland/6121/90), Influenza B virus (B/Taiwan/0002/03), Influenza B virus (B/Taiwan/0114/01), Influenza B virus (B/Taiwan/0202/01), Influenza B virus (B/Taiwan/0409/00), Influenza B virus (B/Taiwan/0409/02), Influenza B virus (B/Taiwan/0562/03), Influenza B virus (B/Taiwan/0569/03), Influenza B virus (B/Taiwan/0576/03), Influenza B virus (B/Taiwan/0600/02), Influenza B virus (B/Taiwan/0610/03), Influenza B virus (B/Taiwan/0615/03), Influenza B virus (B/Taiwan/0616/03), Influenza B virus (B/Taiwan/0654/02), Influenza B virus (B/Taiwan/0684/03), Influenza B virus (B/Taiwan/0699/03), Influenza B virus (B/Taiwan/0702/02), Influenza B virus (B/Taiwan/0722/02), Influenza B virus (B/Taiwan/0730/02), Influenza B virus (B/Taiwan/0735/03), Influenza B virus (B/Taiwan/0833/03), Influenza B virus (B/Taiwan/0874/02), Influenza B virus (B/Taiwan/0879/02), Influenza B virus (B/Taiwan/0880/02), Influenza B virus (B/Taiwan/0927/02), Influenza B virus (B/Taiwan/0932/02), Influenza B virus (B/Taiwan/0993/02), Influenza B virus (B/Taiwan/1013/02), Influenza B 
virus (B/Taiwan/1013/03), Influenza B virus (B/Taiwan/102/2005), Influenza B virus (B/Taiwan/103/2005), Influenza B virus (B/Taiwan/110/2005), Influenza B virus (B/Taiwan/1103/2001), Influenza B virus (B/Taiwan/114/2001), Influenza B virus (B/Taiwan/11515/2001), Influenza B virus (B/Taiwan/117/2005), Influenza B virus (B/Taiwan/1197/1994), Influenza B virus (B/Taiwan/121/2005), Influenza B virus (B/Taiwan/12192/2000), Influenza B virus (B/Taiwan/1243/99), Influenza B virus (B/Taiwan/1265/2000), Influenza B virus (B/Taiwan/1293/2000), Influenza B virus (B/Taiwan/13/2004), Influenza B virus (B/Taiwan/14/2004), Influenza B virus (B/Taiwan/1484/2001), Influenza B virus (B/Taiwan/1502/02), Influenza B virus (B/Taiwan/1503/02), Influenza B virus (B/Taiwan/1534/02), Influenza B virus (B/Taiwan/1536/02), Influenza B virus (B/Taiwan/1561/02), Influenza B virus (B/Taiwan/1574/03), Influenza B virus (B/Taiwan/1584/02), Influenza B virus (B/Taiwan/16/2004), Influenza B virus (B/Taiwan/1618/03), Influenza B virus (B/Taiwan/165/2005), Influenza B virus (B/Taiwan/166/2005), Influenza B virus (B/Taiwan/188/2005), Influenza B virus (B/Taiwan/1949/02), Influenza B virus (B/Taiwan/1950/02), Influenza B virus (B/Taiwan/202/2001), Influenza B virus (B/Taiwan/2026/99), Influenza B virus (B/Taiwan/2027/99), Influenza B virus (B/Taiwan/217/97), Influenza B virus (B/Taiwan/21706/97), Influenza B virus (B/Taiwan/2195/99), Influenza B virus (B/Taiwan/2551/03), Influenza B virus (B/Taiwan/2805/01), Influenza B virus (B/Taiwan/2805/2001), Influenza B virus (B/Taiwan/3143/97), Influenza B virus (B/Taiwan/31511/00), Influenza B virus (B/Taiwan/31511/2000), Influenza B virus (B/Taiwan/34/2004), Influenza B virus (B/Taiwan/3532/03), Influenza B virus (B/Taiwan/39/2004), Influenza B virus (B/Taiwan/41010/00), Influenza B virus (B/Taiwan/41010/2000), Influenza B virus (B/Taiwan/4119/02), Influenza B virus (B/Taiwan/4184/00), Influenza B virus (B/Taiwan/4184/2000), Influenza B virus 
(B/Taiwan/43/2005), Influenza B virus (B/Taiwan/4602/02), Influenza B virus (B/Taiwan/473/2005), Influenza B virus (B/Taiwan/52/2004), Influenza B virus (B/Taiwan/52/2005), Influenza B virus (B/Taiwan/54/2004), Influenza B virus (B/Taiwan/61/2004), Influenza B virus (B/Taiwan/635/2005), Influenza B virus (B/Taiwan/637/2005), Influenza B virus (B/Taiwan/68/2004), Influenza B virus (B/Taiwan/68/2005), Influenza B virus (B/Taiwan/69/2004), Influenza B virus (B/Taiwan/70/2005), Influenza B virus (B/Taiwan/74/2004), Influenza B virus (B/Taiwan/75/2004), Influenza B virus (B/Taiwan/77/2005), Influenza B virus (B/Taiwan/81/2005), Influenza B virus (B/Taiwan/872/2005), Influenza B virus (B/Taiwan/97271/2001), Influenza B virus (B/Taiwan/98/2005), Influenza B virus (B/Taiwan/H96/02), Influenza B virus (B/Taiwan/M214/05), Influenza B virus (B/Taiwan/M227/05), Influenza B virus (B/Taiwan/M24/04), Influenza B virus (B/Taiwan/M244/05), Influenza B virus (B/Taiwan/M251/05), Influenza B virus (B/Taiwan/M53/05), Influenza B virus (B/Taiwan/M71/01), Influenza B virus (B/Taiwan/N1013/99), Influenza B virus (B/Taiwan/N1115/02), Influenza B virus (B/Taiwan/N1207/99), Influenza B virus (B/Taiwan/N1316/01), Influenza B virus (B/Taiwan/N1549/01), Influenza B virus (B/Taiwan/N1582/02), Influenza B virus (B/Taiwan/N16/03), Influenza B virus (B/Taiwan/N1619/04), Influenza B virus (B/Taiwan/N1848/02), Influenza B virus (B/Taiwan/N1902/04), Influenza B virus (B/Taiwan/N200/05), Influenza B virus (B/Taiwan/N2050/02), Influenza B virus (B/Taiwan/N230/01), Influenza B virus (B/Taiwan/N232/00), Influenza B virus (B/Taiwan/N2333/02), Influenza B virus (B/Taiwan/N2335/01), Influenza B virus (B/Taiwan/N253/03), Influenza B virus (B/Taiwan/N2620/04), Influenza B virus (B/Taiwan/N2986/02), Influenza B virus (B/Taiwan/N3688/04), Influenza B virus (B/Taiwan/N371/05), Influenza B virus (B/Taiwan/N376/05), Influenza B virus (B/Taiwan/N384/03), Influenza B virus (B/Taiwan/N3849/02), Influenza B virus 
(B/Taiwan/N404/02), Influenza B virus (B/Taiwan/N473/00), Influenza B virus (B/Taiwan/N511/01), Influenza B virus (B/Taiwan/N559/05), Influenza B virus (B/Taiwan/N612/01), Influenza B virus (B/Taiwan/N701/01), Influenza B virus (B/Taiwan/N767/01), Influenza B virus (B/Taiwan/N798/05), Influenza B virus (B/Taiwan/N860/05), Influenza B virus (B/Taiwan/N872/04), Influenza B virus (B/Taiwan/N913/04), Influenza B virus (B/Taiwan/S117/05), Influenza B virus (B/Taiwan/S141/02), Influenza B virus (B/Taiwan/S76/02), Influenza B virus (B/Taiwan/S82/02), Influenza B virus (B/Taiwn/103/2005), Influenza B virus (B/Tehran/80/02), Influenza B virus (B/Temple/B10/1999), Influenza B virus (B/Temple/B1166/2001), Influenza B virus (B/Temple/B1181/2001), Influenza B virus (B/Temple/B1182/2001), Influenza B virus (B/Temple/B1188/2001), Influenza B virus (B/Temple/B1190/2001), Influenza B virus (B/Temple/B1193/2001), Influenza B virus (B/Temple/B17/2003), Influenza B virus (B/Temple/B18/2003), Influenza B virus (B/Temple/B19/2003), Influenza B virus (B/Temple/B20/2003), Influenza B virus (B/Temple/B21/2003), Influenza B virus (B/Temple/B24/2003), Influenza B virus (B/Temple/B3/1999), Influenza B virus (B/Temple/B30/2003), Influenza B virus (B/Temple/B7/1999), Influenza B virus (B/Temple/B8/1999), Influenza B virus (B/Temple/B9/1999), Influenza B virus (B/Texas/06/2000), Influenza B virus (B/Texas/1/2000), Influenza B virus (B/Texas/1/2004), Influenza B virus (B/Texas/1/2006), Influenza B virus (B/Texas/1/91), Influenza B virus (B/Texas/10/2005), Influenza B virus (B/Texas/11/2001), Influenza B virus (B/Texas/12/2001), Influenza B virus (B/Texas/14/1991), Influenza B virus (B/Texas/14/2001), Influenza B virus (B/Texas/16/2001), Influenza B virus (B/Texas/18/2001), Influenza B virus (B/Texas/2/2006), Influenza B virus (B/Texas/22/2001), Influenza B virus (B/Texas/23/2000), Influenza B virus (B/Texas/3/2001), Influenza B virus (B/Texas/3/2002), Influenza B virus (B/Texas/3/2006), Influenza 
B virus (B/Texas/37/1988), Influenza B virus (B/Texas/37/88), Influenza B virus (B/Texas/4/2006), Influenza B virus (B/Texas/4/90), Influenza B virus (B/Texas/5/2002), Influenza B virus (B/Texas/57/2002), Influenza B virus (B/Texas/6/2000), Influenza B virus (B/Tokushima/101/93), Influenza B virus (B/Tokyo/6/98), Influenza B virus (B/Trento/3/02), Influenza B virus (B/Trieste/1/02), Influenza B virus (B/Trieste/1/03), Influenza B virus (B/Trieste/15/02), Influenza B virus (B/Trieste/17/02), Influenza B virus (B/Trieste/19/02), Influenza B virus (B/Trieste/2/03), Influenza B virus (B/Trieste/25/02), Influenza B virus (B/Trieste/27/02), Influenza B virus (B/Trieste/28/02), Influenza B virus (B/Trieste/34/02), Influenza B virus (B/Trieste/37/02), Influenza B virus (B/Trieste/4/02), Influenza B virus (B/Trieste/8/02), Influenza B virus (B/Trieste14/02), Influenza B virus (B/Trieste18/02), Influenza B virus (B/Trieste23/02), Influenza B virus (B/Trieste24/02), Influenza B virus (B/Trieste7/02), Influenza B virus (B/Ulan Ude/4/02), Influenza B virus (B/Ulan-Ude/6/2003), Influenza B virus (B/UlanUde/4/02), Influenza B virus (B/United Kingdom/34304/99), Influenza B virus (B/United Kingdom/34520/99), Influenza B virus (B/Uruguay/19/02), Influenza B virus (B/Uruguay/19/05), Influenza B virus (B/Uruguay/2/02), Influenza B virus (B/Uruguay/28/05), Influenza B virus (B/Uruguay/33/05), Influenza B virus (B/Uruguay/4/02), Influenza B virus (B/Uruguay/5/02), Influenza B virus (B/Uruguay/65/05), Influenza B virus (B/Uruguay/7/02), Influenza B virus (B/Uruguay/74/04), Influenza B virus (B/Uruguay/75/04), Influenza B virus (B/Uruguay/NG/02), Influenza B virus (B/Ushuaia/15732/99), Influenza B virus (B/USSR/100/83), Influenza B virus (B/Utah/1/2005), Influenza B virus (B/Utah/20139/99), Influenza B virus (B/Utah/20975/99), Influenza B virus (B/Vermont/1/2006), Influenza B virus (B/Victoria/02/1987), Influenza B virus (B/Victoria/103/89), Influenza B virus (B/Victoria/19/89), Influenza 
B virus (B/Victoria/2/87), Influenza B virus (B/Victoria/504/2000), Influenza B virus (B/Vienna/1/99), Influenza B virus (B/Virginia/1/2005), Influenza B virus (B/Virginia/1/2006), Influenza B virus (B/Virginia/11/2003), Influenza B virus (B/Virginia/2/2006), Influenza B virus (B/Virginia/3/2003), Influenza B virus (B/Virginia/3/2006), Influenza B virus (B/Virginia/9/2005), Influenza B virus (B/Washington/1/2004), Influenza B virus (B/Washington/2/2000), Influenza B virus (B/Washington/2/2004), Influenza B virus (B/Washington/3/2000), Influenza B virus (B/Washington/3/2003), Influenza B virus (B/Washington/5/2005), Influenza B virus (B/Wellington/01/1994), Influenza B virus (B/Wisconsin/1/2004), Influenza B virus (B/Wisconsin/1/2006), Influenza B virus (B/Wisconsin/10/2006), Influenza B virus (B/Wisconsin/15e/2005), Influenza B virus (B/Wisconsin/17/2006), Influenza B virus (B/Wisconsin/2/2004), Influenza B virus (B/Wisconsin/2/2006), Influenza B virus (B/Wisconsin/22/2006), Influenza B virus (B/Wisconsin/26/2006), Influenza B virus (B/Wisconsin/29/2006), Influenza B virus (B/Wisconsin/3/2000), Influenza B virus (B/Wisconsin/3/2004), Influenza B virus (B/Wisconsin/3/2005), Influenza B virus (B/Wisconsin/3/2006), Influenza B virus (B/Wisconsin/31/2006), Influenza B virus (B/Wisconsin/4/2006), Influenza B virus (B/Wisconsin/5/2006), Influenza B virus (B/Wisconsin/6/2006), Influenza B virus (B/Wisconsin/7/2002), Influenza B virus (B/Wuhan/2/2001), Influenza B virus (B/Wuhan/356/2000), Influenza B virus (B/WV194/2002), Influenza B virus (B/Wyoming/15/2001), Influenza B virus (B/Wyoming/16/2001), Influenza B virus (B/Wyoming/2/2003), Influenza B virus (B/Xuanwu/1/82), Influenza B virus (B/Xuanwu/23/82), Influenza B virus (B/Yamagata/1/73), Influenza B virus (B/Yamagata/115/2003), Influenza B virus (B/Yamagata/1246/2003), Influenza B virus (B/Yamagata/1311/2003), Influenza B virus (B/Yamagata/16/1988), Influenza B virus (B/Yamagata/16/88), Influenza B virus 
(B/Yamagata/222/2002), Influenza B virus (B/Yamagata/K198/2001), Influenza B virus (B/Yamagata/K246/2001), Influenza B virus (B/Yamagata/K270/2001), Influenza B virus (B/Yamagata/K298/2001), Influenza B virus (B/Yamagata/K320/2001), Influenza B virus (B/Yamagata/K354/2001), Influenza B virus (B/Yamagata/K386/2001), Influenza B virus (B/Yamagata/K411/2001), Influenza B virus (B/Yamagata/K461/2001), Influenza B virus (B/Yamagata/K490/2001), Influenza B virus (B/Yamagata/K500/2001), Influenza B virus (B/Yamagata/K501/2001), Influenza B virus (B/Yamagata/K508/2001), Influenza B virus (B/Yamagata/K513/2001), Influenza B virus (B/Yamagata/K515/2001), Influenza B virus (B/Yamagata/K519/2001), Influenza B virus (B/Yamagata/K520/2001), Influenza B virus (B/Yamagata/K521/2001), Influenza B virus (B/Yamagata/K535/2001), Influenza B virus (B/Yamagata/K542/2001), Influenza B virus (B/Yamanashi/166/1998), Influenza B virus (B/Yamanashi/166/98), Influenza B virus (B/Yunnan/123/2001), Influenza B virus (strain B/Alaska/12/96), Influenza B virus (STRAIN B/ANN ARBOR/1/66 [COLD-ADAPTED]), Influenza B virus (STRAIN B/ANN ARBOR/1/66 [WILD-TYPE]), Influenza B virus (STRAIN B/BA/78), Influenza B virus (STRAIN B/BEIJING/1/87), Influenza B virus (STRAIN B/ENGLAND/222/82), Influenza B virus (strain B/finland/145/90), Influenza B virus (strain B/finland/146/90), Influenza B virus (strain B/finland/147/90), Influenza B virus (strain B/finland/148/90), Influenza B virus (strain B/finland/149/90), Influenza B virus (strain B/finland/150/90), Influenza B virus (strain B/finland/151/90), Influenza B virus (strain B/finland/24/85), Influenza B virus (strain B/finland/56/88), Influenza B virus (STRAIN B/FUKUOKA/80/81), Influenza B virus (STRAIN B/GA/86), Influenza B virus (STRAIN B/GL/54), Influenza B virus (STRAIN B/HONG KONG/8/73), Influenza B virus (STRAIN B/HT/84), Influenza B virus (STRAIN B/ID/86), Influenza B virus (STRAIN B/LENINGRAD/179/86), Influenza B virus (STRAIN B/MARYLAND/59), 
Influenza B virus (STRAIN B/MEMPHIS/6/86), Influenza B virus (STRAIN B/NAGASAKI/1/87), Influenza B virus (strain B/Osaka/491/97), Influenza B virus (STRAIN B/PA/79), Influenza B virus (STRAIN B/RU/69), Influenza B virus (STRAIN B/SINGAPORE/64), Influenza B virus (strain B/Tokyo/942/96), Influenza B virus (STRAIN B/VICTORIA/3/85), Influenza B virus (STRAIN B/VICTORIA/87), Influenza B virus (B/Rochester/02/2001), and other subtypes. In further embodiments, the influenza virus C belongs to but is not limited to subtype Influenza C virus (C/Aichi/1/81), Influenza C virus (C/Aichi/1/99), Influenza C virus (C/Ann Arbor/1/50), Influenza C virus (C/Aomori/74), Influenza C virus (C/California/78), Influenza C virus (C/England/83), Influenza C virus (C/Fukuoka/2/2004), Influenza C virus (C/Fukuoka/3/2004), Influenza C virus (C/Fukushima/1/2004), Influenza C virus (C/Greece/79), Influenza C virus (C/Hiroshima/246/2000), Influenza C virus (C/Hiroshima/247/2000), Influenza C virus (C/Hiroshima/248/2000), Influenza C virus (C/Hiroshima/249/2000), Influenza C virus (C/Hiroshima/250/2000), Influenza C virus (C/Hiroshima/251/2000), Influenza C virus (C/Hiroshima/252/2000), Influenza C virus (C/Hiroshima/252/99), Influenza C virus (C/Hiroshima/290/99), Influenza C virus (C/Hiroshima/4/2004), Influenza C virus (C/Hyogo/1/83), Influenza C virus (C/Johannesburg/1/66), Influenza C virus (C/Johannesburg/66), Influenza C virus (C/Kanagawa/1/76), Influenza C virus (C/Kanagawa/2/2004), Influenza C virus (C/Kansas/1/79), Influenza C virus (C/Kyoto/1/79), Influenza C virus (C/Kyoto/41/82), Influenza C virus (C/Mississippi/80), Influenza C virus (C/Miyagi/1/90), Influenza C virus (C/Miyagi/1/93), Influenza C virus (C/Miyagi/1/94), Influenza C virus (C/Miyagi/1/97), Influenza C virus (C/Miyagi/1/99), Influenza C virus (C/Miyagi/12/2004), Influenza C virus (C/Miyagi/2/2000), Influenza C virus (C/Miyagi/2/92), Influenza C virus (C/Miyagi/2/93), Influenza C virus (C/Miyagi/2/94), Influenza C virus 
(C/Miyagi/2/96), Influenza C virus (C/Miyagi/2/98), Influenza C virus (C/Miyagi/3/2000), Influenza C virus (C/Miyagi/3/91), Influenza C virus (C/Miyagi/3/92), Influenza C virus (C/Miyagi/3/93), Influenza C virus (C/Miyagi/3/94), Influenza C virus (C/Miyagi/3/97), Influenza C virus (C/Miyagi/3/99), Influenza C virus (C/Miyagi/4/2000), Influenza C virus (C/Miyagi/4/93), Influenza C virus (C/Miyagi/4/96), Influenza C virus (C/Miyagi/4/97), Influenza C virus (C/Miyagi/4/98), Influenza C virus (C/Miyagi/42/2004), Influenza C virus (C/Miyagi/5/2000), Influenza C virus (C/Miyagi/5/91), Influenza C virus (C/Miyagi/5/93), Influenza C virus (C/Miyagi/6/93), Influenza C virus (C/Miyagi/6/96), Influenza C virus (C/Miyagi/7/91), Influenza C virus (C/Miyagi/7/93), Influenza C virus (C/Miyagi/7/96), Influenza C virus (C/Miyagi/77), Influenza C virus (C/Miyagi/8/96), Influenza C virus (C/Miyagi/9/91), Influenza C virus (C/Miyagi/9/96), Influenza C virus (C/Nara/1/85), Influenza C virus (C/Nara/2/85), Influenza C virus (C/Nara/82), Influenza C virus (C/NewJersey/76), Influenza C virus (C/Niigata/1/2004), Influenza C virus (C/Osaka/2/2004), Influenza C virus (C/pig/Beijing/115/81), Influenza C virus (C/Saitama/1/2000), Influenza C virus (C/Saitama/1/2004), Influenza C virus (C/Saitama/2/2000), Influenza C virus (C/Saitama/3/2000), Influenza C virus (C/Sapporo/71), Influenza C virus (C/Shizuoka/79), Influenza C virus (C/Yamagata/1/86), Influenza C virus (C/Yamagata/1/88), Influenza C virus (C/Yamagata/10/89), Influenza C virus (C/Yamagata/13/98), Influenza C virus (C/Yamagata/15/2004), Influenza C virus (C/Yamagata/2/2000), Influenza C virus (C/Yamagata/2/98), Influenza C virus (C/Yamagata/2/99), Influenza C virus (C/Yamagata/20/2004), Influenza C virus (C/Yamagata/20/96), Influenza C virus (C/Yamagata/21/2004), Influenza C virus (C/Yamagata/26/81), Influenza C virus (C/Yamagata/27/2004), Influenza C virus (C/Yamagata/3/2000), Influenza C virus (C/Yamagata/3/2004), Influenza C virus 
(C/Yamagata/3/88), Influenza C virus (C/Yamagata/3/96), Influenza C virus (C/Yamagata/4/88), Influenza C virus (C/Yamagata/4/89), Influenza C virus (C/Yamagata/5/92), Influenza C virus (C/Yamagata/6/2000), Influenza C virus (C/Yamagata/6/98), Influenza C virus (C/Yamagata/64), Influenza C virus (C/Yamagata/7/88), Influenza C virus (C/Yamagata/8/2000), Influenza C virus (C/Yamagata/8/88), Influenza C virus (C/Yamagata/8/96), Influenza C virus (C/Yamagata/9/2000), Influenza C virus (C/Yamagata/9/88), Influenza C virus (C/Yamagata/9/96), Influenza C virus (STRAIN C/BERLIN/1/85), Influenza C virus (STRAIN C/ENGLAND/892/83), Influenza C virus (STRAIN C/GREAT LAKES/1167/54), Influenza C virus (STRAIN C/JJ/50), Influenza C virus (STRAIN C/PIG/BEIJING/10/81), Influenza C virus (STRAIN C/PIG/BEIJING/439/82), Influenza C virus (STRAIN C/TAYLOR/1233/47), Influenza C virus (STRAIN C/YAMAGATA/10/81), Isavirus (Infectious salmon anemia virus), Thogotovirus (Dhori virus, including Batken virus and Dhori virus (STRAIN INDIAN/1313/61); Thogoto virus, including Thogoto virus (isolate SiAr 126); and unclassified Thogotovirus, including Araguari virus), unclassified Orthomyxoviridae (Fowl plague virus, Swine influenza virus and unidentified influenza virus), and other subtypes.

In various embodiments, the attenuated virus belongs to the delta virus family and all related genera.

In various embodiments, the attenuated virus belongs to the Adenoviridae virus family and all related genera, strains, types and isolates for example but not limited to human adenovirus A, B, and C.

In various embodiments, the attenuated virus belongs to the Herpesviridae virus family and all related genera, strains, types and isolates for example but not limited to herpes simplex virus.

In various embodiments, the attenuated virus belongs to the Reoviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Papillomaviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Poxviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Retroviridae virus family and all related genera, strains, types and isolates, for example but not limited to Human Immunodeficiency Virus.

In various embodiments, the attenuated virus belongs to the Filoviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Paramyxoviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Orthomyxoviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Picornaviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Bunyaviridae virus family and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Nidovirales virus order and all related genera, strains, types and isolates.

In various embodiments, the attenuated virus belongs to the Caliciviridae virus family and all related genera, strains, types and isolates.

In certain embodiments, the synonymous codon substitutions alter codon bias, codon pair bias, density of deoptimized codons and deoptimized codon pairs, RNA secondary structure, CpG dinucleotide content, C+G content, translation frameshift sites, translation pause sites, the presence or absence of microRNA recognition sequences, or any combination thereof, in the genome. The codon substitutions may be engineered in multiple locations distributed throughout the genome, or in multiple locations restricted to a portion of the genome. In further embodiments, the portion of the genome is the capsid coding region.

In preferred embodiments of this invention, the virus retains the ability to induce a protective immune response in an animal host. In other preferred embodiments, the virulence of the virus does not revert to wild type.

Poliovirus, Rhinovirus, and Influenza Virus

Poliovirus, a member of the Picornavirus family, is a small non-enveloped virus with a single-stranded (+) sense RNA genome of 7.5 kb in length (Kitamura et al., 1981). Upon cell entry, the genomic RNA serves as an mRNA encoding a single polyprotein that, after a cascade of autocatalytic cleavage events, gives rise to the full complement of functional poliovirus proteins. The same genomic RNA serves as a template for the synthesis of (−) sense RNA, an intermediate for the synthesis of new (+) strands that serve as mRNA, as replication template, or as genomic RNA destined for encapsidation into progeny virions (Mueller et al., 2005). As described herein, the well-established PV system was used to address general questions of optimizing design strategies for the production of attenuated synthetic viruses. PV provides one of the most important and best understood molecular models for developing anti-viral strategies. In particular, a reverse genetics system exists whereby viral nucleic acid can be synthesized in vitro by completely synthetic methods and then converted into infectious virions (see below). Furthermore, a convenient mouse model is available (CD155tg mice, which express the human receptor for polio) for testing attenuation of synthetic PV designs as previously described (Cello et al., 2002).

Rhinoviruses are also members of the Picornavirus family, and are related to PV. Human Rhinoviruses (HRV) are the usual causative agent of the common cold, and as such they are responsible for more episodes of illness than any other infectious agent (Hendley, 1999). In addition to the common cold, HRV is also involved in ear and sinus infections, asthmatic attacks, and other diseases. Similar to PV, HRV is a single-stranded, positive-sense RNA virus whose genome encodes a self-processing polyprotein. The RNA is translated through an internal initiation mechanism using an Internal Ribosome Entry Site (IRES) to produce structural proteins that form the capsid, as well as non-structural proteins such as the two viral proteases, 2A and 3C, and the RNA-dependent RNA polymerase (Jang et al., 1989; Pelletier et al., 1988). Also like PV, HRV has a non-enveloped icosahedral capsid, formed by 60 copies of each of the four capsid proteins VP1-4 (Savolainen et al., 2003). The replication cycle of HRV is also essentially identical to that of poliovirus. The close similarity to PV, combined with the significant, almost ubiquitous impact on human health, makes HRV an extremely attractive candidate for generating a novel attenuated virus useful for immunization.

Despite decades of research by pharmaceutical companies, no successful drug against HRV has been developed. This is partly due to the relatively low risk tolerance of federal regulators and the public for drugs that treat a mostly non-serious infection. That is, even minor side effects are unacceptable. Thus, in the absence of a drug, there is a clear desire for a safe and effective anti-rhinovirus vaccine. However, developing an anti-rhinovirus vaccine is extremely challenging, because there are over 100 serotypes of HRV, of which approximately 30 circulate widely and infect humans regularly. An effective vaccine must enable the immune system to recognize every single serotype in order to confer true immunity. The SAVE approach described herein offers a practical solution to the development of an effective rhinovirus vaccine. Based on the predictability of the SAVE design process, it would be inexpensive to design and synthesize 100 or more SAVE-attenuated rhinoviruses, which in combination would constitute a vaccine.

Influenza virus—Between 1990 and 1999, influenza viruses caused approximately 35,000 deaths each year in the U.S.A. (Thompson et al., 2003). Together with approximately 200,000 hospitalizations, the impact on the U.S. economy has been estimated to exceed $23 billion annually (Cram et al., 2001). Globally, between 300,000 and 500,000 people die each year due to influenza virus infections (Kamps et al., 2006). Although the virus causes disease amongst all age groups, the rates of serious complications are highest in children and persons over 65 years of age. Influenza has the potential to mutate or recombine into extremely deadly forms, as happened during the great influenza epidemic of 1918, in which about 30 million people died. This was possibly the single most deadly one-year epidemic in human history.

Influenza viruses are divided into three types: A, B, and C. Antigenicity is determined by two glycoproteins at the surface of the enveloped virion: hemagglutinin (HA) and neuraminidase (NA). Both glycoproteins continuously change their antigenicity to escape humoral immunity. Altering the glycoproteins allows virus strains to continue infecting vaccinated individuals, which is the reason for yearly vaccination of high-risk groups. In addition, human influenza viruses can acquire the HA or NA glycoproteins of avian and swine influenza viruses through reassortment of gene segments, known as antigenic shift, leading to new viruses (H1N1 to H2N2 or H3N2, etc.) (Steinhauer and Skehel, 2002). These novel viruses, to which the global population is immunologically naive, are the cause of pandemics that kill millions of people (Kilbourne, 2006; Russell and Webster, 2005). The history of influenza virus, together with the current threat of the highly pathogenic avian influenza virus H5N1 (Stephenson and Democratis, 2006), underscores the need for preventing influenza virus disease.

Currently, two influenza vaccines are in use: a live, attenuated vaccine (cold adapted; “FluMist”) and an inactivated virus vaccine. The use of the attenuated vaccine is restricted to healthy children, adolescents and adults (excluding pregnant females), ages 5-49. This age restriction leaves out precisely those who are at highest risk of influenza. Furthermore, the attenuated FluMist virus carries the possibility of reversion, as is inherent to any live virus. Production of the second, more commonly administered, inactivated influenza virus vaccine is complex. Further, this vaccine appears to be less effective than hoped for in preventing death in the elderly (>65-year-old) population (Simonson et al., 2005). These facts underscore the need for novel strategies to generate influenza virus vaccines.

Reverse Genetics of Picornaviruses

Reverse genetics generally refers to experimental approaches to discovering the function of a gene that proceed in the opposite direction to the so-called forward genetic approaches of classical genetics. That is, whereas forward genetics approaches seek to determine the function of a gene by elucidating the genetic basis of a phenotypic trait, strategies based on reverse genetics begin with an isolated gene and seek to discover its function by investigating the possible phenotypes generated by expression of the wt or mutated gene. As used herein in the context of viral systems, “reverse genetics” systems refer to the availability of techniques that permit genetic manipulation of viral genomes made of RNA. Briefly, the viral genomes are isolated from virions or from infected cells, converted to DNA (“cDNA”) by the enzyme reverse transcriptase, possibly modified as desired, and reverted, usually via the RNA intermediate, back into infectious viral particles. This process in picornaviruses is extremely simple; in fact, the first reverse genetics system developed for any animal RNA virus was for PV (Racaniello and Baltimore, 1981). Viral reverse genetics systems are based on the historical finding that naked viral genomic RNA is infectious when transfected into a suitable mammalian cell (Alexander et al., 1958). The discovery of reverse transcriptase and the development of molecular cloning techniques in the 1970s enabled scientists to generate and manipulate cDNA copies of RNA viral genomes. Most commonly, the entire cDNA copy of the genome is cloned immediately downstream of a phage T7 RNA polymerase promoter that allows the in vitro synthesis of genome RNA, which is then transfected into cells for generation of virus (van der Wert, et al., 1986). Alternatively, the same DNA plasmid may be transfected into cells expressing the T7 RNA polymerase in the cytoplasm. This system can be used for various viral pathogens including both PV and HRV.

Molecular Virology and Reverse Genetics of Influenza Virus

Influenza virus, like the picornaviruses PV and HRV, is an RNA virus, but is otherwise unrelated to and quite different from PV. In contrast to the picornaviruses, influenza is a minus-strand virus. Furthermore, the influenza genome consists of eight separate gene segments ranging from 890 to 2341 nucleotides (Lamb and Krug, 2001). Partly because of the minus-strand organization, and partly because of the eight separate gene segments, the reverse genetics system is more complex than for PV. Nevertheless, a reverse genetics system has been developed for influenza virus (Enami et al., 1990; Fodor et al., 1999; Garcia-Sastre and Palese, 1993; Hoffman et al., 2000; Luytjes et al., 1989; Neumann et al., 1999). Each of the eight gene segments is expressed from a separate plasmid. This reverse genetics system is extremely convenient for use in the SAVE strategy described herein, because the longest individual gene segment is less than 3 kb, and thus easy to synthesize and manipulate. Further, the different gene segments can be combined and recombined simply by mixing different plasmids. Thus, application of SAVE methods is possibly even more feasible for influenza virus than for PV.

A recent paradigm shift in viral reverse genetics occurred with the present inventors' first chemical synthesis of an infectious virus genome by assembly from synthetic DNA oligonucleotides (Cello et al., 2002). This achievement made it clear that most or all viruses for which a reverse genetics system is available can be synthesized solely from their genomic sequence information, and promises unprecedented flexibility in re-synthesizing and modifying these viruses to meet desired criteria.

De Novo Synthesis of Viral Genomes

Computer-based algorithms are used to design and synthesize viral genomes de novo. These synthesized genomes, exemplified by the synthesis of attenuated PV described herein, encode exactly the same proteins as wild type (wt) viruses, but by using alternative synonymous codons, various parameters, including codon bias, codon pair bias, RNA secondary structure, and/or dinucleotide content, are altered. The presented data show that these coding-independent changes produce highly attenuated viruses, often due to poor translation of proteins. By targeting an elementary function of all viruses, namely protein translation, a very general method has been developed for predictably, safely, quickly and cheaply producing attenuated viruses, which are useful for making vaccines. This method, dubbed “SAVE” (Synthetic Attenuated Virus Engineering), is applicable to a wide variety of viruses other than PV for which there is a medical need for new vaccines. These viruses include, but are not limited to, rhinovirus, influenza virus, SARS and other coronaviruses, HIV, HCV, infectious bronchitis virus, ebolavirus, Marburg virus, dengue fever virus, West Nile disease virus, EBV, yellow fever virus, enteroviruses other than poliovirus, such as echoviruses, coxsackie viruses, and enterovirus 71; hepatitis A virus, aphthoviruses, such as foot-and-mouth-disease virus, myxoviruses, such as influenza viruses, paramyxoviruses, such as measles virus, mumps virus, respiratory syncytial virus, flaviviruses such as dengue virus, yellow fever virus, St. Louis encephalitis virus and tick-borne virus, alphaviruses, such as Western- and Eastern encephalitis virus, hepatitis B virus, and bovine diarrhea virus, and ebolavirus.

Both codon and codon-pair deoptimization in the PV capsid coding region are shown herein to dramatically reduce PV fitness. The present invention is not limited to any particular molecular mechanism underlying virus attenuation via substitution of synonymous codons. Nevertheless, experiments are ongoing to better understand the underlying molecular mechanisms of codon and codon pair deoptimization in producing attenuated viruses. In particular, evidence is provided in this application that indicates that codon deoptimization and codon pair deoptimization can result in inefficient translation. High throughput methods for the quick generation and screening of large numbers of viral constructs are also being developed.

Large-Scale DNA Assembly

In recent years, the plunging costs and increasing quality of oligonucleotide synthesis have made it practical to assemble large segments of DNA (at least up to about 10 kb) from synthetic oligonucleotides. Commercial vendors such as Blue Heron Biotechnology, Inc. (Bothell, Wash.), among many others, currently synthesize, assemble, clone, sequence-verify, and deliver a large segment of synthetic DNA of known sequence for the relatively low price of about $1.50 per base. Thus, purchase of synthesized viral genomes from commercial suppliers is a convenient and cost-effective option, and prices continue to decrease rapidly. Furthermore, new methods of synthesizing and assembling very large DNA molecules at extremely low costs are emerging (Tian et al., 2004). The Church lab has pioneered a method that uses parallel synthesis of thousands of oligonucleotides (for instance, on photo-programmable microfluidics chips, or on microarrays available from Nimblegen Systems, Inc., Madison, Wis., or Agilent Technologies, Inc., Santa Clara, Calif.), followed by error reduction and assembly by overlap PCR. These methods have the potential to reduce the cost of synthetic large DNAs to less than 1 cent per base. The improved efficiency and accuracy, and rapidly declining cost, of large-scale DNA synthesis provide an impetus for the development and broad application of the SAVE strategy.

Alternative Encoding, Codon Bias, and Codon Pair Bias

Alternative Encoding

A given peptide can be encoded by a large number of nucleic acid sequences. For example, even a typical short 10-mer oligopeptide can be encoded by approximately 4¹⁰ (about 10⁶) different nucleic acids, and the proteins of PV can be encoded by about 10⁴⁴² different nucleic acids. Natural selection has ultimately chosen one of these possible 10⁴⁴² nucleic acids as the PV genome. Whereas the primary amino acid sequence is the most important level of information encoded by a given mRNA, there are additional kinds of information within different kinds of RNA sequences. These include RNA structural elements of distinct function (e.g., for PV, the cis-acting replication element, or CRE; Goodfellow et al., 2000; McKnight, 2003), translational kinetic signals (pause sites, frame shift sites, etc.), polyadenylation signals, splice signals, enzymatic functions (ribozymes) and, quite likely, other as yet unidentified information and signals.
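
The size of this sequence space follows from multiplying, residue by residue, the number of synonymous sense codons available for each amino acid under the standard genetic code. The Python sketch below illustrates only that arithmetic; the 10-mer peptide is a hypothetical example and the helper names are illustrative. Actual counts depend on amino acid composition, since residues such as Leu, Ser, and Arg have six codons while Met and Trp have one.

```python
from math import prod

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_counts():
    """Number of sense codons encoding each amino acid (stops excluded)."""
    counts = {}
    for aa in CODON_TABLE.values():
        if aa != "*":
            counts[aa] = counts.get(aa, 0) + 1
    return counts


def n_encodings(peptide):
    """Number of distinct sense-codon sequences that encode `peptide`."""
    counts = synonymous_counts()
    return prod(counts[aa] for aa in peptide)


print(n_encodings("AE"))          # 8  (Ala: 4 codons x Glu: 2 codons)
print(n_encodings("LSRAGVPTLS"))  # 7962624  (6**5 * 4**5)
```

The Ala-Glu value of 8 is the same set of eight codon pairs used as the worked example in the codon pair bias discussion.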

Even with the caveat that signals such as the CRE must be preserved, 10⁴⁴² possible encoding sequences provide tremendous flexibility to make drastic changes in the RNA sequence of polio while preserving the capacity to encode the same protein. Changes can be made in codon bias or codon pair bias, and nucleic acid signals and secondary structures in the RNA can be added or removed. Additional or novel proteins can even be simultaneously encoded in alternative frames (see, e.g., Wang et al., 2006).

Codon Bias

Whereas most amino acids can be encoded by several different codons, not all codons are used equally frequently: some codons are “rare” codons, whereas others are “frequent” codons. As used herein, a “rare” codon is one of at least two synonymous codons encoding a particular amino acid that is present in an mRNA at a significantly lower frequency than the most frequently used codon for that amino acid. Thus, the rare codon may be present at about a 2-fold lower frequency than the most frequently used codon. Preferably, the rare codon is present at least a 3-fold, more preferably at least a 5-fold, lower frequency than the most frequently used codon for the amino acid. Conversely, a “frequent” codon is one of at least two synonymous codons encoding a particular amino acid that is present in an mRNA at a significantly higher frequency than the least frequently used codon for that amino acid. The frequent codon may be present at about a 2-fold, preferably at least a 3-fold, more preferably at least a 5-fold, higher frequency than the least frequently used codon for the amino acid. For example, human genes use the leucine codon CTG 40% of the time, but use the synonymous CTA only 7% of the time (see Table 2). Thus, CTG is a frequent codon, whereas CTA is a rare codon. Roughly consistent with these frequencies of usage, there are 6 copies in the genome for the gene for the tRNA recognizing CTG, whereas there are only 2 copies of the gene for the tRNA recognizing CTA. Similarly, human genes use the frequent codons TCT and TCC for serine 18% and 22% of the time, respectively, but the rare codon TCG only 5% of the time. TCT and TCC are read, via wobble, by the same tRNA, which has 10 copies of its gene in the genome, while TCG is read by a tRNA with only 4 copies. It is well known that those mRNAs that are very actively translated are strongly biased to use only the most frequent codons. This includes genes for ribosomal proteins and glycolytic enzymes. 
On the other hand, mRNAs for relatively non-abundant proteins may use the rare codons.
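
The fold-difference definitions of "rare" and "frequent" codons can be applied mechanically to the fractions in Table 2. A minimal Python sketch, assuming the 3-fold threshold named above as preferred and comparing each codon against the most- and least-used synonymous codon for its amino acid; only Leu and Ser are included, and the function name is illustrative.

```python
# Fractions copied from Table 2 (Homo sapiens) for two amino acids.
TABLE2_FRACTIONS = {
    "Leu": {"TTG": 0.13, "TTA": 0.08, "CTG": 0.40,
            "CTA": 0.07, "CTT": 0.13, "CTC": 0.20},
    "Ser": {"AGT": 0.15, "AGC": 0.24, "TCG": 0.05,
            "TCA": 0.15, "TCT": 0.19, "TCC": 0.22},
}


def classify(fractions, fold=3.0):
    """Split synonymous codons into rare and frequent sets.

    rare:     used at least `fold`-fold less than the most-used synonym
    frequent: used at least `fold`-fold more than the least-used synonym
    """
    top = max(fractions.values())
    bottom = min(fractions.values())
    rare = {c for c, f in fractions.items() if f * fold <= top}
    frequent = {c for c, f in fractions.items() if f >= fold * bottom}
    return rare, frequent


for aa, fractions in TABLE2_FRACTIONS.items():
    rare, frequent = classify(fractions)
    print(aa, "rare:", sorted(rare), "frequent:", sorted(frequent))
```

With these inputs CTA and TCG fall in the rare sets and CTG, TCT, and TCC in the frequent sets, matching the examples in the text. Codons sitting right at a threshold (for example Ser AGT at 0.15 against 3 × 0.05) can flip depending on rounding of the tabulated fractions, so working from raw counts would be more robust.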

TABLE 2

Codon usage in Homo sapiens

(source: http://www.kazusa.or.jp/codon/)

Amino acid   Codon   Number        Per 1000   Fraction
Gly          GGG     636457.00     16.45      0.25
Gly          GGA     637120.00     16.47      0.25
Gly          GGT     416131.00     10.76      0.16
Gly          GGC     862557.00     22.29      0.34
Glu          GAG     1532589.00    39.61      0.58
Glu          GAA     1116000.00    28.84      0.42
Asp          GAT     842504.00     21.78      0.46
Asp          GAC     973377.00     25.16      0.54
Val          GTG     1091853.00    28.22      0.46
Val          GTA     273515.00     7.07       0.12
Val          GTT     426252.00     11.02      0.18
Val          GTC     562086.00     14.53      0.24
Ala          GCG     286975.00     7.42       0.11
Ala          GCA     614754.00     15.89      0.23
Ala          GCT     715079.00     18.48      0.27
Ala          GCC     1079491.00    27.90      0.40
Arg          AGG     461676.00     11.93      0.21
Arg          AGA     466435.00     12.06      0.21
Ser          AGT     469641.00     12.14      0.15
Ser          AGC     753597.00     19.48      0.24
Lys          AAG     1236148.00    31.95      0.57
Lys          AAA     940312.00     24.30      0.43
Asn          AAT     653566.00     16.89      0.47
Asn          AAC     739007.00     19.10      0.53
Met          ATG     853648.00     22.06      1.00
Ile          ATA     288118.00     7.45       0.17
Ile          ATT     615699.00     15.91      0.36
Ile          ATC     808306.00     20.89      0.47
Thr          ACG     234532.00     6.06       0.11
Thr          ACA     580580.00     15.01      0.28
Thr          ACT     506277.00     13.09      0.25
Thr          ACC     732313.00     18.93      0.36
Trp          TGG     510256.00     13.19      1.00
End          TGA     59528.00      1.54       0.47
Cys          TGT     407020.00     10.52      0.45
Cys          TGC     487907.00     12.61      0.55
End          TAG     30104.00      0.78       0.24
End          TAA     38222.00      0.99       0.30
Tyr          TAT     470083.00     12.15      0.44
Tyr          TAC     592163.00     15.30      0.56
Leu          TTG     498920.00     12.89      0.13
Leu          TTA     294684.00     7.62       0.08
Phe          TTT     676381.00     17.48      0.46
Phe          TTC     789374.00     20.40      0.54
Ser          TCG     171428.00     4.43       0.05
Ser          TCA     471469.00     12.19      0.15
Ser          TCT     585967.00     15.14      0.19
Ser          TCC     684663.00     17.70      0.22
Arg          CGG     443753.00     11.47      0.20
Arg          CGA     239573.00     6.19       0.11
Arg          CGT     176691.00     4.57       0.08
Arg          CGC     405748.00     10.49      0.18
Gln          CAG     1323614.00    34.21      0.74
Gln          CAA     473648.00     12.24      0.26
His          CAT     419726.00     10.85      0.42
His          CAC     583620.00     15.08      0.58
Leu          CTG     1539118.00    39.78      0.40
Leu          CTA     276799.00     7.15       0.07
Leu          CTT     508151.00     13.13      0.13
Leu          CTC     759527.00     19.63      0.20
Pro          CCG     268884.00     6.95       0.11
Pro          CCA     653281.00     16.88      0.28
Pro          CCT     676401.00     17.48      0.29
Pro          CCC     767793.00     19.84      0.32

The propensity for highly expressed genes to use frequent codons is called “codon bias.” A gene for a ribosomal protein might use only the 20 to 25 most frequent of the 61 codons, and have a high codon bias (a codon bias close to 1), while a poorly expressed gene might use all 61 codons, and have little or no codon bias (a codon bias close to 0). It is thought that the frequently used codons are codons where larger amounts of the cognate tRNA are expressed, and that use of these codons allows translation to proceed more rapidly, or more accurately, or both. The PV capsid protein is very actively translated, and has a high codon bias.
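
The text describes codon bias qualitatively. One widely used way to put a number on it, swapped in here purely for illustration and not necessarily the measure used elsewhere in this disclosure, is the codon adaptation index (CAI) of Sharp and Li (1987): the geometric mean, over a gene's codons, of each codon's frequency divided by the frequency of the most-used synonymous codon. A CAI of 1 means only the most frequent codons are used; unlike the idealized scale above, CAI does not fall to 0 for a gene that uses all codons evenly. The sketch below (Python, illustrative names) uses a three-amino-acid subset of Table 2.

```python
import math

# Subset of Table 2 fractions, grouped by amino acid.
FRACTIONS = {
    "Leu": {"TTG": 0.13, "TTA": 0.08, "CTG": 0.40,
            "CTA": 0.07, "CTT": 0.13, "CTC": 0.20},
    "Glu": {"GAG": 0.58, "GAA": 0.42},
    "Lys": {"AAG": 0.57, "AAA": 0.43},
}

# Relative adaptiveness w: codon fraction / fraction of the most-used synonym.
WEIGHTS = {
    codon: f / max(group.values())
    for group in FRACTIONS.values()
    for codon, f in group.items()
}


def cai(seq):
    """Codon adaptation index: geometric mean of the per-codon weights."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    log_w = [math.log(WEIGHTS[c]) for c in codons]
    return math.exp(sum(log_w) / len(log_w))


print(round(cai("CTGGAGAAG"), 3))  # only the most frequent codons
print(round(cai("CTAGAAAAA"), 3))  # only the least frequent codons
```

The first sequence scores 1.0; the second, built from the least-used Leu, Glu, and Lys codons, scores well below it (roughly 0.46).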

Codon Pair Bias

A distinct feature of coding sequences is their codon pair bias. This may be illustrated by considering the amino acid pair Ala-Glu, which can be encoded by 8 different codon pairs. If no factors other than the frequency of each individual codon (as shown in Table 2) are responsible for the frequency of the codon pair, the expected frequency of each of the 8 encodings can be calculated by multiplying the frequencies of the two relevant codons. For example, by this calculation the codon pair GCA-GAA would be expected to occur at a frequency of 0.097 out of all Ala-Glu coding pairs (0.23×0.42, based on the frequencies in Table 2). To relate the expected (hypothetical) frequency of each codon pair to the frequency actually observed in the human genome, the Consensus CDS (CCDS) database of consistently annotated human coding regions, containing a total of 14,795 human genes, was used. This set of genes is the most comprehensive representation of human coding sequences. Using this set of genes, the frequencies of codon usage were re-calculated by dividing the number of occurrences of a codon by the total number of occurrences of all synonymous codons coding for the same amino acid. As expected, the frequencies correlated closely with previously published ones, such as those given in Table 2. Slight frequency variations are possibly due to an oversampling effect in the data provided by the codon usage database at Kazusa DNA Research Institute (http://www.kazusa.or.jp/codon/codon.html), where 84,949 human coding sequences were included in the calculation (far more than the actual number of human genes). The codon frequencies thus calculated were then used to calculate the expected codon-pair frequencies by first multiplying the frequencies of the two relevant codons with each other (see Table 3, expected frequency), and then multiplying this result by the observed frequency (in the entire CCDS data set) with which the amino acid pair encoded by the codon pair in question occurs.
In the example of codon pair GCA-GAA, this second calculation gives an expected frequency of 0.098 (compared to 0.097 in the first calculation using the Kazusa dataset). Finally, the actual codon pair frequencies as observed in the set of 14,795 human genes were determined by counting the total number of occurrences of each codon pair in the set and dividing it by the number of all synonymous codon pairs in the set coding for the same amino acid pair (Table 3, observed frequency). Frequency and observed/expected values for the complete set of 3721 (61²) codon pairs, based on the set of 14,795 human genes, are provided herewith as Supplemental Table 1.
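
The counting procedure just described can be sketched in Python. This is an illustrative sketch, not the code used to build Table 3 or Supplemental Table 1, and the function names are hypothetical. Codon frequencies are computed within each amino acid over the input genes; the expected frequency of a codon pair is the product of its two codon frequencies (so that, as in Table 3, expected frequencies within one amino acid pair sum to 1); and the observed frequency is the pair's count divided by the count of all codon pairs encoding the same amino acid pair. Pairs are counted only between adjacent in-frame codons of the same gene. The two "genes" are toy sequences.

```python
from collections import Counter

# Standard genetic code (bases ordered T, C, A, G at each codon position).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(gene):
    """In-frame codons; a trailing partial codon is ignored."""
    return [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]


def codon_pair_table(genes):
    """Return {codon pair: (expected, observed, observed/expected)}."""
    codon_n, pair_n = Counter(), Counter()
    for gene in genes:
        codons = split_codons(gene)
        codon_n.update(codons)
        pair_n.update(a + b for a, b in zip(codons, codons[1:]))

    aa_n = Counter()
    for codon, n in codon_n.items():
        aa_n[CODE[codon]] += n
    aa_pair_n = Counter()
    for pair, n in pair_n.items():
        aa_pair_n[CODE[pair[:3]] + CODE[pair[3:]]] += n

    table = {}
    for pair, n in pair_n.items():
        c1, c2 = pair[:3], pair[3:]
        f1 = codon_n[c1] / aa_n[CODE[c1]]
        f2 = codon_n[c2] / aa_n[CODE[c2]]
        expected = f1 * f2
        observed = n / aa_pair_n[CODE[c1] + CODE[c2]]
        table[pair] = (expected, observed, observed / expected)
    return table


toy_genes = ["GCAGAAGCCGAG", "GCAGAGGCAGAA"]  # hypothetical inputs
for pair, (exp_f, obs_f, ratio) in sorted(codon_pair_table(toy_genes).items()):
    print(pair, round(exp_f, 3), round(obs_f, 3), round(ratio, 2))
```

Applied to the Table 2 fractions instead of toy counts, the same product gives 0.23 × 0.42 ≈ 0.097 for GCA-GAA, the first-pass value quoted above.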

TABLE 3

Codon Pair Scores Exemplified by the Amino Acid Pair Ala-Glu

Amino acid pair   Codon pair   Expected frequency   Observed frequency   Obs/exp ratio
AE                GCAGAA       0.098                0.163                1.65
AE                GCAGAG       0.132                0.198                1.51
AE                GCCGAA       0.171                0.031                0.18
AE                GCCGAG       0.229                0.142                0.62
AE                GCGGAA       0.046                0.027                0.57
AE                GCGGAG       0.062                0.089                1.44
AE                GCTGAA       0.112                0.145                1.29
AE                GCTGAG       0.150                0.206                1.37
Total                          1.000                1.000

If the ratio of observed frequency to expected frequency of a codon pair is greater than one, the codon pair is said to be overrepresented. If the ratio is smaller than one, it is said to be underrepresented. In the example, the codon pair GCA-GAA is overrepresented 1.65-fold, while the codon pair GCC-GAA is more than 5-fold underrepresented.
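
The over/under-representation call is a simple ratio test. A Python sketch using the rounded Table 3 frequencies (function name illustrative). Ratios recomputed from these three-decimal values differ slightly from the tabulated ratios, for example 0.163/0.098 ≈ 1.66 against the listed 1.65, presumably because the table ratios were computed from unrounded frequencies.

```python
# (expected, observed) frequencies for the Ala-Glu codon pairs in Table 3.
TABLE3 = {
    "GCAGAA": (0.098, 0.163), "GCAGAG": (0.132, 0.198),
    "GCCGAA": (0.171, 0.031), "GCCGAG": (0.229, 0.142),
    "GCGGAA": (0.046, 0.027), "GCGGAG": (0.062, 0.089),
    "GCTGAA": (0.112, 0.145), "GCTGAG": (0.150, 0.206),
}


def representation(expected, observed):
    """Return (observed/expected ratio, label)."""
    ratio = observed / expected
    if ratio > 1:
        return ratio, "over-represented"
    if ratio < 1:
        return ratio, "under-represented"
    return ratio, "as expected"


for pair, (exp_f, obs_f) in TABLE3.items():
    ratio, label = representation(exp_f, obs_f)
    fold = ratio if ratio >= 1 else 1 / ratio  # express as an n-fold change
    print(f"{pair}: {label}, {fold:.2f}-fold")
```

For GCC-GAA this gives about 5.5-fold under-representation, consistent with the "more than 5-fold" stated in the text.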

Many other codon pairs show very strong bias; some pairs are under-represented, while other pairs are over-represented. For instance, the codon pairs GCCGAA (AlaGlu) and GATCTG (AspLeu) are three- to six-fold under-represented (the preferred pairs being GCAGAG and GACCTG, respectively), while the codon pairs GCCAAG (AlaLys) and AATGAA (AsnGlu) are about two-fold over-represented. It is noteworthy that codon pair bias has nothing to do with the frequency of pairs of amino acids, nor with the frequency of individual codons. For instance, the under-represented pair GATCTG (AspLeu) happens to use the most frequent Leu codon (CTG).

Codon pair bias was discovered in prokaryotic cells (see Greve et al., 1989), but has since been seen in all other examined species, including humans. The effect has a very high statistical significance, and is certainly not just noise. However, its functional significance, if any, is a mystery. One proposal is that some pairs of tRNAs interact well when they are brought together on the ribosome, while other pairs interact poorly. Since different codons are usually read by different tRNAs, codon pairs might be biased to avoid putting together pairs of incompatible tRNAs (Greve et al., 1989). Another idea is that many (but not all) under-represented pairs have a central CG dinucleotide (e.g., GCCGAA, encoding AlaGlu), and the CG dinucleotide is systematically under-represented in mammals (Buchan et al., 2006; Curran et al., 1995; Fedorov et al., 2002). Thus, the effects of codon pair bias could be of two kinds—one an indirect effect of the under-representation of CG in the mammalian genome, and the other having to do with the efficiency, speed and/or accuracy of translation. It is emphasized that the present invention is not limited to any particular molecular mechanism underlying codon pair bias.
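
The CpG observation can be checked directly against Table 3. The Python sketch below (illustrative names) flags codon pairs whose junction forms a CpG, that is, the first codon ends in C and the second begins with G, and reports each pair's total CpG count and tabulated obs/exp ratio. It illustrates the correlation described in the text and says nothing about mechanism.

```python
# obs/exp ratios for the Ala-Glu codon pairs, copied from Table 3.
TABLE3_RATIO = {
    "GCAGAA": 1.65, "GCAGAG": 1.51, "GCCGAA": 0.18, "GCCGAG": 0.62,
    "GCGGAA": 0.57, "GCGGAG": 1.44, "GCTGAA": 1.29, "GCTGAG": 1.37,
}


def has_central_cpg(pair):
    """True if codon 1 ends in C and codon 2 starts with G (junction CpG)."""
    return pair[2] == "C" and pair[3] == "G"


def cpg_count(seq):
    """Number of CpG dinucleotides anywhere in `seq`."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


for pair, ratio in TABLE3_RATIO.items():
    tag = "junction CpG" if has_central_cpg(pair) else "no junction CpG"
    print(pair, tag, "CpGs:", cpg_count(pair), "obs/exp:", ratio)
```

In this small set both junction-CpG pairs (GCC-GAA and GCC-GAG) are under-represented, while GCG-GAG, whose CpG lies inside the Ala codon, is over-represented (1.44), in line with CpG accounting for many but not all under-represented pairs.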

As discussed more fully below, codon pair bias takes into account the score for each codon pair in a coding sequence averaged over the entire length of the coding sequence. According to the invention, codon pair bias is determined by

$CPB = \sum_{i=1}^{k} \frac{CPS_i}{k-1}.$

Accordingly, a similar codon pair bias for a coding sequence can be obtained, for example, by minimizing codon pair scores over a subsequence or by moderately diminishing codon pair scores over the full length of the coding sequence.
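
A minimal Python sketch of the CPB average, under two stated assumptions: (1) the codon pair score CPS is taken to be the natural logarithm of a pair's observed/expected ratio, a common convention that this passage does not itself define, and (2) the average runs over the k − 1 adjacent codon pairs of a k-codon sequence. Scores for the Ala-Glu pairs come from the Table 3 ratios; the score for the Glu-Ala pair GAA-GCA is a hypothetical placeholder, and the function name is illustrative.

```python
import math

# Assumed CPS = ln(obs/exp), using the Table 3 ratios for Ala-Glu pairs.
CPS = {pair: math.log(r) for pair, r in {
    "GCAGAA": 1.65, "GCAGAG": 1.51, "GCCGAA": 0.18, "GCCGAG": 0.62,
    "GCGGAA": 0.57, "GCGGAG": 1.44, "GCTGAA": 1.29, "GCTGAG": 1.37,
}.items()}
CPS["GAAGCA"] = 0.0  # hypothetical placeholder score for a Glu-Ala pair


def codon_pair_bias(seq, scores):
    """Mean codon pair score over the k - 1 adjacent pairs of k codons."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    k = len(codons)
    if k < 2:
        raise ValueError("need at least two codons")
    total = sum(scores[a + b] for a, b in zip(codons, codons[1:]))
    return total / (k - 1)


print(codon_pair_bias("GCAGAA", CPS))        # one over-represented pair: > 0
print(codon_pair_bias("GCCGAA", CPS))        # one under-represented pair: < 0
print(codon_pair_bias("GCAGAAGCAGAA", CPS))  # three pairs averaged
```

Because only the average is scored, the same negative CPB can come from a few strongly under-represented pairs or from many mildly under-represented ones, which is the point made in the preceding paragraph.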

Since all 61 sense codons and all sense codon pairs can certainly be used, it would not be expected that substituting a single rare codon for a frequent codon, or a rare codon pair for a frequent codon pair, would have much effect. Therefore, many previous investigations of codon and codon pair bias have been done via informatics, not experimentation. One investigation of codon pair bias that was based on experimental work was the study of Irwin et al. (1995), who found, counterintuitively, that certain over-represented codon pairs caused slower translation. However, this result could not be reproduced by a second group (Cheng and Goldman, 2001), and is also in conflict with results reported below. Thus, the present results (see below) may be the first experimental evidence for a functional role of codon pair bias.

Certain experiments disclosed herein relate to re-coding viral genome sequences, such as the entire capsid region of PV, involving around 1000 codons, to separately incorporate both poor codon bias and poor codon pair bias into the genome. The rationale underlying these experiments is that if each substitution creates a small effect, then all substitutions together should create a large effect. Indeed, it turns out that both deoptimized codon bias, and deoptimized codon pair bias, separately create non-viable viruses. As discussed in more detail in the Examples, preliminary data suggest that inefficient translation is the major mechanism for reducing the viability of a virus with poor codon bias or codon pair bias. Irrespective of the precise mechanism, the data indicate that the large-scale substitution of synonymous deoptimized codons into a viral genome results in severely attenuated viruses. This procedure for producing attenuated viruses has been dubbed SAVE (Synthetic Attenuated Virus Engineering).
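
A minimal sketch of codon deoptimization in Python: each codon is replaced by the least-used synonymous codon according to Table 2, which leaves the encoded protein unchanged. This only illustrates the principle. It uses a four-amino-acid subset of Table 2, the helper names are illustrative, and it ignores constraints a real design must respect, such as preserving cis-acting signals like the CRE and keeping the virus growable in at least some cell line.

```python
# Subset of Table 2 fractions, grouped by amino acid.
FRACTIONS = {
    "Ala": {"GCG": 0.11, "GCA": 0.23, "GCT": 0.27, "GCC": 0.40},
    "Glu": {"GAG": 0.58, "GAA": 0.42},
    "Leu": {"TTG": 0.13, "TTA": 0.08, "CTG": 0.40,
            "CTA": 0.07, "CTT": 0.13, "CTC": 0.20},
    "Lys": {"AAG": 0.57, "AAA": 0.43},
}
AA_OF = {codon: aa for aa, group in FRACTIONS.items() for codon in group}
RAREST = {aa: min(group, key=group.get) for aa, group in FRACTIONS.items()}


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def protein(seq):
    """Amino acid sequence (three-letter names) encoded by `seq`."""
    return [AA_OF[c] for c in split_codons(seq)]


def deoptimize_codons(seq):
    """Swap every codon for its least-used synonym; protein is unchanged."""
    return "".join(RAREST[AA_OF[c]] for c in split_codons(seq))


original = "GCCGAGCTGAAG"  # hypothetical Ala-Glu-Leu-Lys fragment
recoded = deoptimize_codons(original)
print(recoded)  # GCGGAACTAAAA
assert protein(recoded) == protein(original)
```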

According to the invention, viral attenuation can be accomplished by changes in codon pair bias as well as codon bias. However, it is expected that adjusting codon pair bias is particularly advantageous. For example, attenuating a virus through codon bias generally requires elimination of common codons, and so the complexity of the nucleotide sequence is reduced. In contrast, codon pair bias reduction or minimization can be accomplished while maintaining far greater sequence diversity, and consequently greater control over nucleic acid secondary structure, annealing temperature, and other physical and biochemical properties. The work disclosed herein includes attenuated codon pair bias-reduced or -minimized sequences in which codons are shuffled, but the codon usage profile is unchanged.
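
Shuffling codons without changing the codon usage profile can be framed as a search over swaps of synonymous codons between positions. Each such swap preserves both the codon counts and the encoded protein while changing which codons sit next to each other. The Python below uses a simple greedy hill climb as a stand-in for whatever optimizer is actually used; it assumes CPS = ln(obs/exp) from Table 3 for Ala-Glu pairs and treats every other pair as having a hypothetical neutral score of 0. All names are illustrative.

```python
import math
import random
from collections import Counter

AA_OF = {"GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A", "GAA": "E", "GAG": "E"}
# Assumed CPS = ln(obs/exp) for Ala-Glu pairs (Table 3); others default to 0.
CPS = {pair: math.log(r) for pair, r in {
    "GCAGAA": 1.65, "GCAGAG": 1.51, "GCCGAA": 0.18, "GCCGAG": 0.62,
    "GCGGAA": 0.57, "GCGGAG": 1.44, "GCTGAA": 1.29, "GCTGAG": 1.37,
}.items()}


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def cpb(codons):
    """Mean codon pair score over adjacent pairs."""
    scores = [CPS.get(a + b, 0.0) for a, b in zip(codons, codons[1:])]
    return sum(scores) / len(scores)


def shuffle_to_lower_cpb(seq, iterations=2000, seed=0):
    """Greedy search: keep a synonymous swap only if it lowers CPB."""
    rng = random.Random(seed)
    codons = split_codons(seq)
    best = cpb(codons)
    for _ in range(iterations):
        i, j = rng.sample(range(len(codons)), 2)
        if codons[i] == codons[j] or AA_OF[codons[i]] != AA_OF[codons[j]]:
            continue  # only swap different codons for the same amino acid
        codons[i], codons[j] = codons[j], codons[i]
        score = cpb(codons)
        if score < best:
            best = score
        else:
            codons[i], codons[j] = codons[j], codons[i]  # undo the swap
    return "".join(codons), best


start = "GCAGAAGCCGAGGCTGAAGCGGAG"  # hypothetical (Ala-Glu) x 4 fragment
recoded, score = shuffle_to_lower_cpb(start)
print(round(cpb(split_codons(start)), 3), "->", round(score, 3), recoded)
assert Counter(split_codons(recoded)) == Counter(split_codons(start))
```

Greedy acceptance can stall in local optima; a practical design tool would more plausibly use something like simulated annealing and would also screen candidates for the other properties discussed above, such as RNA secondary structure and CpG content.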

Viral attenuation can be confirmed in ways that are well known to one of ordinary skill in the art. Non-limiting examples include plaque assays, growth measurements, and reduced lethality in test animals. The instant application demonstrates that the attenuated viruses are capable of inducing protective immune responses in a host.

Synthetic Attenuated Virus Engineering (SAVE)

SAVE employs specifically designed computer software and modern methods of nucleic acid synthesis and assembly to re-code and re-synthesize the genomes of viruses. This strategy provides an efficient method of producing vaccines against various medically important viruses for which efficacious vaccines are sought.

Two effective polio vaccines, the inactivated polio vaccine (IPV) developed by Jonas Salk and the oral polio vaccine (OPV), comprising live attenuated virus, developed by Albert Sabin, have been available since the 1950s. Indeed, a global effort to eradicate poliomyelitis, begun in 1988 and led by the World Health Organization (WHO), has succeeded in eradicating polio from most of the countries in the world. The number of annually diagnosed cases has been reduced from hundreds of thousands to fewer than two thousand in 2005, occurring mainly in India and Nigeria. However, a concern regarding the wide use of OPV is that it can revert to a virulent form, and although reversion is believed to be rare, outbreaks of vaccine-derived polio have been reported (Georgescu et al., 1997; Kew et al., 2002; Shimizu et al., 2004). In fact, as long as the live poliovirus vaccine strains, each carrying fewer than 7 attenuating mutations, remain in use, there is a possibility that they will revert to wt, and such reversion poses a serious threat to the complete eradication of polio. Thus, the WHO may well need a new polio vaccine to counter the potential for reversion in the closing stages of its eradication effort, and this provides one rationale for the studies disclosed herein on the application of SAVE to PV. However, PV was selected primarily because it is an excellent model system for developing SAVE.

During re-coding, essential nucleic acid signals in the viral genome are preserved, but the efficiency of protein translation is systematically reduced by deoptimizing codon bias, codon pair bias, and other parameters such as RNA secondary structure, CpG dinucleotide content, C+G content, translation frameshift sites, translation pause sites, or any combination thereof. This deoptimization may involve hundreds or thousands of changes, each with a small effect. Generally, deoptimization is performed to a point at which the virus can still be grown in some cell lines (including lines specifically engineered to be permissive for a particular virus), but where the virus is avirulent in a normal animal or human. Such avirulent viruses are excellent candidates for either a killed or live vaccine since they encode exactly the same proteins as the fully virulent virus and accordingly provoke exactly the same immune response as the fully virulent virus. In addition, the SAVE process offers the prospect for fine tuning the level of attenuation; that is, it provides the capacity to design synthetic viruses that are deoptimized to a roughly predictable extent. Design, synthesis, and production of viral particles are achievable in a timeframe of weeks once the genome sequence is known, which has important advantages for the production of vaccines in potential emergencies. Furthermore, the attenuated viruses are expected to have virtually no potential to revert to virulence because of the extremely large numbers of deleterious nucleotide changes involved. This method may be generally applicable to a wide range of viruses, requiring only knowledge of the viral genome sequence and a reverse genetics system for any particular virus.

Viral Attenuation by Deoptimizing Codon Bias

As a means of engineering attenuated viruses, the capsid coding region of poliovirus type 1 Mahoney [PV(M)] was re-engineered by making changes in synonymous codon usage. The capsid region comprises about a third of the viral genome and is very actively translated. One mutant virus (PV-AB) was created with a very low codon bias by replacing the largest possible number of frequently used codons with rare synonymous codons. As a control, another virus (PV-SD) was created having the largest possible number of synonymous codon changes while maintaining the original codon bias. See FIGS. 1 and 2. Thus, PV-SD carries essentially the same codons as the wt, but in shuffled positions, and encodes exactly the same proteins. In PV-SD, no attempt was made to increase or reduce codon pair bias by the shuffling procedure. See Example 1. Despite 934 nucleotide changes in the capsid-coding region, PV-SD RNA produced virus with characteristics indistinguishable from wt. In contrast, no viable virus was recovered from PV-AB carrying 680 silent mutations. See Example 2.
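The kind of codon-bias deoptimization used for PV-AB can be illustrated with a minimal Python sketch. It assumes a host codon-usage table; the `usage` values below are illustrative placeholders, not measured host frequencies, and the helper names `translate` and `deoptimize_codons` are hypothetical. The sketch simply swaps each codon for its least-used listed synonym, so the encoded protein is unchanged; the actual PV-AB design may have involved constraints not modeled here.

```python
from collections import defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks stop codons).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(seq):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODE[seq[p:p + 3]] for p in range(0, len(seq), 3))


def deoptimize_codons(seq, usage):
    """Replace each codon with its least-used synonym listed in `usage`.

    `usage` maps codon -> host frequency. Only codons present in `usage` are
    candidates; amino acids with no listed codons, and stop codons, are unchanged.
    """
    synonyms = defaultdict(list)
    for codon, aa in CODE.items():
        if codon in usage:
            synonyms[aa].append(codon)
    out = []
    for p in range(0, len(seq), 3):
        codon = seq[p:p + 3]
        aa = CODE[codon]
        if aa != "*" and synonyms[aa]:
            codon = min(synonyms[aa], key=usage.get)
        out.append(codon)
    return "".join(out)


# Illustrative (not measured) relative frequencies for Leu and Gly codons only.
usage = {"CTG": 0.40, "CTC": 0.20, "CTT": 0.13, "TTG": 0.13, "TTA": 0.08, "CTA": 0.06,
         "GGC": 0.34, "GGA": 0.25, "GGG": 0.25, "GGT": 0.16}
wt = "ATGCTGGGCCTCTAA"
mutant = deoptimize_codons(wt, usage)
```

Because only synonymous codons are substituted, `translate(mutant)` equals `translate(wt)`; in practice the table would be built from the host's protein coding sequences.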

A trivial explanation of the inviability of PV-AB is that just one of the nucleotide changes is somehow lethal, while the other 679 are harmless. For instance, a nucleotide change could be lethal for some catastrophic but unappreciated reason, such as preventing replication. This explanation is unlikely, however. Although PV does contain important regulatory elements in its RNA, such as the CRE, it is known that no such elements exist inside the capsid coding region. This is supported by the demonstration that the entire capsid coding region can be deleted without affecting normal replication of the residual genome within the cell, though of course viral particles cannot be formed (Kaplan and Racaniello, 1988).

To address questions concerning the inviability of certain re-engineered viruses, sub-segments of the capsid region of virus PV-AB were subcloned into the wild type virus. See Example 1 and FIG. 3. Incorporating large subcloned segments (including non-overlapping segments) proved lethal, while small subcloned segments produced viable (with one exception) but sick viruses. “Sickness” is revealed by many assays: for example, segments of poor codon bias cause poor titers (FIG. 3B) and small plaques (FIGS. 3C-H). It is particularly instructive that in general, large, lethal segments can be divided into two sub-segments, both of which are alive but sick (FIG. 3). These results rule out the hypothesis that inviability is due to just one change; instead, at minimum, many changes must be contributing to the phenotype.

There is an exceptional segment from position 1513 to 2470. This segment is fairly small, but its inclusion in the PV genome causes inviability. It is not known at present whether or not this fragment can be subdivided into subfragments that merely cause sickness and do not inactivate the virus. It is conceivable that this segment does contain a highly deleterious change, possibly a translation frameshift site.

Since poor codon bias naturally suggests an effect on translation, translation of the proteins encoded by virus PV-AB was tested. See Example 5 and FIG. 5. Indeed, all the sick viruses translated capsid protein poorly (FIG. 5B). Translation was less efficient in the sicker viruses, consistent with poor translation being the cause of the sickness. Translation was improved essentially to wt levels in reactions that were supplemented with excess tRNAs and amino acids (FIG. 5A), consistent with the rate of recognition of rare codons being limiting.

As a second test of whether deoptimized codon bias was causing inefficient translation, portions of wt and deoptimized capsid were fused to the N-terminus of firefly luciferase in a dicistronic reporter construct. See Example 5 and FIG. 6. In these fusion constructs, translation of luciferase depends on translation of the N-terminally fused capsid protein. Again, it was found that translation of the capsid proteins with deoptimized codons was poor, and was worse in the sicker viruses, suggesting a cause-and-effect relationship. Thus, the data suggest that the hundreds of rare codons in the PV-AB virus cause inviability largely because of poor translation. Further, the poor translation seen in vitro and the viral sickness seen in cultured cells are also reflected in infections of animals. Even for one of the least debilitated deoptimized viruses, PV-AB^2470-2954, the number of viral particles needed to cause disease in mice was increased by about 100-fold. See Example 4, Table 4.

Burns et al. (2006) have recently described some similar experiments with the Sabin type 2 vaccine strain of PV and reached similar conclusions. Burns et al. synthesized a completely different codon-deoptimized virus (i.e., the nucleotide sequences of the PV-AB virus described herein and their “abcd” virus are very different), and yet got a similar degree of debilitation using similar assays. Burns et al. did not test their viral constructs in host organisms for attenuation. However, their result substantiates the view that SAVE is predictable, and that the results are not greatly dependent on the exact nucleotide sequence.

Viral Attenuation by Deoptimizing Codon Pair Bias

According to the invention, codon pair bias can be altered independently of codon usage. For example, in a protein encoding sequence of interest, codon pair bias can be altered simply by directed rearrangement of its codons. In particular, the same codons that appear in the parent sequence, which can be of varying frequency in the host organism, are used in the altered sequence, but in different positions. In the simplest form, because the same codons are used as in the parent sequence, codon usage over the protein coding region being considered remains unchanged (as does the encoded amino acid sequence). Nevertheless, certain codons appear in new contexts, that is, preceded by and/or followed by codons that encode the same amino acid as in the parent sequence, but employing a different nucleotide triplet. Ideally, the rearrangement of codons results in codon pairs that are less frequent than in the parent sequence. In practice, rearranging codons often results in a less frequent codon pair at one location and a more frequent pair at a second location. By judicious rearrangement of codons, the codon pair usage bias over a given length of coding sequence can be reduced relative to the parent sequence. Alternatively, the codons could be rearranged so as to produce a sequence that makes use of codon pairs which are more frequent in the host than in the parent sequence.
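The codon rearrangement described above can be sketched in Python as a random synonymous shuffle: codons are permuted only among positions that encode the same amino acid, so the protein and the codon usage profile are preserved by construction while codon neighbours change. This is an undirected shuffle of the kind used for PV-SD; a directed rearrangement would additionally score candidate arrangements and keep those with the desired codon pair bias. The function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks stop codons).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codons_of(seq):
    """Split an in-frame coding sequence into a list of codons."""
    return [seq[p:p + 3] for p in range(0, len(seq), 3)]


def shuffle_synonymous(seq, rng):
    """Permute codons among positions that encode the same amino acid.

    The protein and the overall codon usage are preserved by construction;
    only the positions (and hence the neighbours) of codons change.
    """
    codons = codons_of(seq)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODE[codon]].append(idx)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            codons[i] = codon
    return "".join(codons)


parent = "ATGCTGCTAGGCTTAGGTCTCGGAAAGTAA"
shuffled = shuffle_synonymous(parent, random.Random(7))
# Same multiset of codons, same encoded protein.
same_usage = Counter(codons_of(shuffled)) == Counter(codons_of(parent))
```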

Codon pair bias is evaluated by considering each codon pair in turn, scoring each pair according to the frequency with which that codon pair is observed in protein coding sequences of the host, and then determining the codon pair bias for the sequence, as disclosed herein. It will be appreciated that one can create many different sequences that have the same codon pair bias. Also, codon pair bias can be altered to a greater or lesser extent, depending on the way in which codons are rearranged. The codon pair bias of a coding sequence can be altered by recoding the entire coding sequence, or by recoding one or more subsequences. As used herein, “codon pair bias” is evaluated over the length of a coding sequence, even though only a portion of the sequence may be mutated. Because codon pairs are scored in the context of codon usage of the host organism, a codon pair bias value can be assigned to both wild type and mutant viral sequences. According to the invention, a virus can be attenuated by recoding all or portions of its protein encoding sequences so as to reduce their codon pair bias.
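For concreteness, one formulation of the codon pair score (CPS), widely used in the codon pair literature and assumed here, compares the observed count of a codon pair in the host's protein coding sequences with the count expected from codon usage and amino acid pair usage alone. For codons A and B encoding amino acids X and Y, with N denoting counts over the host coding sequences, and for a coding sequence of k codons c_1 ... c_k:

```latex
\mathrm{CPS}(AB) = \ln\!\left(\frac{N_{AB}}{\dfrac{N_A\,N_B}{N_X\,N_Y}\,N_{XY}}\right),
\qquad
\mathrm{CPB} = \frac{1}{k-1}\sum_{i=1}^{k-1}\mathrm{CPS}(c_i\,c_{i+1})
```

Here N_AB is the observed count of the codon pair, N_A and N_B are codon counts, N_X and N_Y are amino acid counts, and N_XY is the count of the amino acid pair. A negative CPB indicates a sequence enriched in codon pairs that are under-represented in the host.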

According to the invention, codon pair bias is a quantitative property determined from codon pair usage of a host. Accordingly, absolute codon pair bias values may be determined for any given viral protein coding sequence. Alternatively, relative changes in codon pair bias values can be determined that relate a deoptimized viral protein coding sequence to a “parent” sequence from which it is derived. As viruses come in a variety of types (i.e., types I to VII by the Baltimore classification), and natural (i.e., virulent) isolates of different viruses yield different values of absolute codon pair bias, it is relative changes in codon pair bias that are usually more relevant to determining desired levels of attenuation. Accordingly, the invention provides attenuated viruses and methods of making such, wherein the attenuated viruses comprise viral genomes in which one or more protein encoding nucleotide sequences have codon pair bias reduced by mutation. In viruses that encode only a single protein (i.e., a polyprotein), all or part of the polyprotein can be mutated to a desired degree to reduce codon pair bias, and all or a portion of the mutated sequence can be provided in a recombinant viral construct. For a virus that separately encodes multiple proteins, one can reduce the codon pair bias of all of the protein encoding sequences simultaneously, or select only one or a few of the protein encoding sequences for modification. The reduction in codon pair bias is determined over the length of a protein encoding sequence, and is at least about 0.05, or at least about 0.1, or at least about 0.15, or at least about 0.2, or at least about 0.3, or at least about 0.4. Depending on the virus, the absolute codon pair bias, based on codon pair usage of the host, can be about −0.05 or less, or about −0.1 or less, or about −0.15 or less, or about −0.2 or less, or about −0.3 or less, or about −0.4 or less.
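A minimal Python sketch of computing absolute and relative codon pair bias follows. It assumes the common log observed/expected codon pair score (observed pair count divided by the count expected from codon and amino acid pair usage); the toy `host` list stands in for a real host coding-sequence collection, and giving unseen pairs a score of `unseen=0.0` is an arbitrary assumption. Function names are hypothetical.

```python
import math
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks stop codons).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codons_of(seq):
    """Split an in-frame coding sequence into a list of codons."""
    return [seq[p:p + 3] for p in range(0, len(seq), 3)]


def cps_table(host_orfs):
    """Codon pair scores ln(observed / expected) from in-frame host coding sequences."""
    codon_n, aa_n, pair_n, aa_pair_n = Counter(), Counter(), Counter(), Counter()
    for orf in host_orfs:
        cs = codons_of(orf)
        pairs = list(zip(cs, cs[1:]))
        codon_n.update(cs)
        aa_n.update(CODE[c] for c in cs)
        pair_n.update(pairs)
        aa_pair_n.update((CODE[a], CODE[b]) for a, b in pairs)
    table = {}
    for (a, b), observed in pair_n.items():
        x, y = CODE[a], CODE[b]
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[(x, y)]
        table[(a, b)] = math.log(observed / expected)
    return table


def cpb(seq, table, unseen=0.0):
    """Mean codon pair score over adjacent pairs; pairs absent from `table` score `unseen`."""
    cs = codons_of(seq)
    scores = [table.get(pair, unseen) for pair in zip(cs, cs[1:])]
    return sum(scores) / len(scores)


host = ["CTGCTA", "CTACTG", "CTGCTG"]  # toy "host" coding sequences (Leu only)
table = cps_table(host)
parent, derived = "CTGCTACTG", "CTGCTGCTA"  # same codons, different order
delta = cpb(derived, table) - cpb(parent, table)  # negative: codon pair bias reduced
```

In this toy case the derived sequence uses exactly the same codons as its parent, yet its codon pair bias is lower by about 0.35, illustrating a relative reduction obtained purely by rearrangement.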

It will be apparent that codon pair bias changes can also be superimposed on other sequence variation. For example, a coding sequence can be altered both to encode a protein or polypeptide which contains one or more amino acid changes and also to have an altered codon pair bias. Also, in some cases, one may shuffle codons to maintain exactly the same codon usage profile in a codon pair bias-reduced protein encoding sequence as in a parent protein encoding sequence. This procedure highlights the power of codon pair bias changes, but need not be adhered to. Alternatively, codon selection can result in an overall change in codon usage in a coding sequence. In this regard, it is noted that in certain examples provided herein (e.g., the design of PV-Min), even if the codon usage profile is not changed in the process of generating a codon pair bias-minimized sequence, when a portion of that sequence is subcloned into an unmutated sequence (e.g., PV-MinXY or PV-MinZ), the codon usage profile over the subcloned portion, and in the hybrid produced, will not match the profile of the original unmutated protein coding sequence. However, these changes in codon usage profile have minimal effect on codon pair bias.

Similarly, it is noted that, by itself, changing a nucleotide sequence to encode a protein or polypeptide with one or many amino acid substitutions is also highly unlikely to produce a sequence with a significant change in codon pair bias. Consequently, codon pair bias alterations can be recognized even in nucleotide sequences that have been further modified to encode a mutated amino acid sequence. It is also noteworthy that mutations intended solely to alter codon bias are not likely to have more than a small effect on codon pair bias. For example, as disclosed herein, the codon pair bias of a poliovirus mutant recoded to maximize the use of nonpreferred codons (PV-AB) is decreased from that of wild type (PV-1(M)) by only about 0.05. Also noteworthy is that such a protein encoding sequence has greatly diminished sequence diversity. In contrast, substantial sequence diversity is maintained in the codon pair bias-modified sequences of the invention. Moreover, the significant reduction in codon pair bias obtainable without increased use of rare codons suggests that, instead of maximizing the use of nonpreferred codons as in PV-AB, it would be beneficial to combine nonpreferred codons with a sufficient number of preferred codons in order to reduce codon pair bias more effectively.

The extent and intensity of mutation can be varied depending on the length of the protein encoding nucleic acid, whether all or only a portion of it can be mutated, and the desired reduction of codon pair bias. In an embodiment of the invention, a protein encoding sequence is modified over a length of at least about 100 nucleotides, or at least about 200 nucleotides, or at least about 300 nucleotides, or at least about 500 nucleotides, or at least about 1000 nucleotides.

As discussed above, the term “parent” virus or “parent” protein encoding sequence is used herein to refer to viral genomes and protein encoding sequences from which new sequences, which may be more or less attenuated, are derived. Accordingly, a parent virus can be a “wild type” or “naturally occurring” prototype, isolate, or variant, or a mutant specifically created or selected on the basis of real or perceived desirable properties.

Using de novo DNA synthesis, the capsid coding region (the P1 region from nucleotide 755 to nucleotide 3385) of PV(M) was redesigned to introduce the largest possible number of rarely used codon pairs (virus PV-Min) (SEQ ID NO:4) or the largest possible number of frequently used codon pairs (virus PV-Max) (SEQ ID NO:5), while preserving the codon bias of the wild type virus. See Example 7. That is, the designed sequences use the same codons as the parent sequence, but they appear in a different order. The PV-Max virus exhibited one-step growth kinetics and killing of infected cells essentially identical to wild type virus. (That growth kinetics are not increased for a codon pair maximized virus relative to wild type appears to hold true for other viruses as well.) Conversely, cells transfected with PV-Min mutant RNA were not killed, and no viable virus could be recovered. Subcloning of fragments (PV-Min^755-2470, PV-Min^2470-3386) of the capsid region of PV-Min into the wt background produced very debilitated, but not dead, virus. See Example 7 and FIG. 8. This result substantiates the hypothesis that deleterious codon changes are preferably widely distributed and demonstrates the simplicity and effectiveness of varying the extent of the codon pair deoptimized sequence that is substituted into a wild type parent virus genome in order to vary the codon pair bias for the overall sequence and the attenuation of the viral product. As seen with PV-AB viruses, the phenotype of PV-Min viruses is a result of reduced specific infectivity of the viral particles rather than of lower production of progeny virus.
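Varying the extent of recoded sequence placed into a wild type background, as in PV-Min^755-2470 and PV-Min^2470-3386, amounts to splicing a segment of the mutant genome into the parent. A minimal Python sketch is shown below, assuming 1-based inclusive nucleotide coordinates and collinear genomes of equal length (which holds when the mutant differs from the parent only by synonymous recoding). In the laboratory such junctions are typically set by convenient restriction sites, which this sketch ignores; the function name is hypothetical.

```python
def chimera(parent, mutant, start, end):
    """Return `parent` with nucleotides start..end (1-based, inclusive) taken from `mutant`.

    Both genomes must be collinear and of equal length, as they are when the
    mutant differs from the parent only by synonymous recoding, so the reading
    frame is preserved across both junctions.
    """
    if len(parent) != len(mutant):
        raise ValueError("parent and mutant must be collinear (equal length)")
    if not 1 <= start <= end <= len(parent):
        raise ValueError("segment out of range")
    return parent[:start - 1] + mutant[start - 1:end] + parent[end:]


# Toy example: nucleotides 4..6 come from the mutant.
hybrid = chimera("AAAAAAAAA", "CCCCCCCCC", 4, 6)
```

With real genome strings, a call such as `chimera(wt_genome, pv_min_genome, 755, 2470)` (variable names hypothetical) would model a PV-Min^755-2470-type construct.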

Viruses with deoptimized codon pair bias are attenuated. As exemplified below (see Example 8 and Table 5), CD155tg mice survived challenge by intracerebral injection of attenuated virus in amounts 1000-fold higher than would be lethal for wild type virus. These findings demonstrate the power of codon pair bias deoptimization to minimize the lethality of a virus. Further, the viability of the virus can be balanced against a reduction of infectivity by choosing the degree of codon pair bias deoptimization. Moreover, once a degree, or range of degrees, of codon pair bias deoptimization that provides the desired attenuation properties has been determined, additional sequences can be designed to attain that degree of codon pair bias. For example, SEQ ID NO:6 provides a poliovirus sequence with a codon pair bias of about −0.2 and mutations distributed over the region encompassing the mutated portions of PV-MinXY and PV-MinZ (i.e., PV^755-3385).

Algorithms for Sequence Design

The inventors have developed several novel algorithms for gene design that optimize the DNA sequence for particular desired properties while simultaneously coding for the given amino acid sequence. In particular, algorithms for maximizing or minimizing the desired RNA secondary structure in the sequence (Cohen and Skiena, 2003) as well as maximally adding and/or removing specified sets of patterns (Skiena, 2001), have been developed. The former issue arises in designing viable viruses, while the latter is useful to optimally insert restriction sites for technological reasons. The extent to which overlapping genes can be designed that simultaneously encode two or more genes in alternate reading frames has also been studied (Wang et al., 2006). This property of different functional polypeptides being encoded in different reading frames of a single nucleic acid is common in viruses and can be exploited for technological purposes such as weaving in antibiotic resistance genes.

The first generation of design tools for synthetic biology has been built, as described by Jayaraj et al. (2005) and Richardson et al. (2006). These focus primarily on optimizing designs for manufacturability (i.e., oligonucleotides without local secondary structures and end repeats) instead of optimizing sequences for biological activity. These first-generation tools may be viewed as analogous to the early VLSI CAD tools built around design rule-checking, instead of supporting higher-order design principles.

As exemplified herein, a computer-based algorithm can be used to manipulate the codon pair bias of any coding region. The algorithm can shuffle existing codons, evaluate the resulting CPB, and then reshuffle the sequence, optionally locking in particularly “valuable” codon pairs. The algorithm also employs a form of “simulated annealing” so as not to get stuck in local minima. Other parameters, such as the free energy of folding of RNA, may optionally be placed under the control of the algorithm as well, in order to avoid creating undesired secondary structures. The algorithm can be used to find a sequence with a minimum codon pair bias, and in the event that such a sequence does not provide a viable virus, the algorithm can be adjusted to find sequences with reduced, but not minimized, biases. Of course, a viable viral sequence could also be produced using only a subsequence of the computer-minimized sequence.
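The shuffle-and-anneal idea can be illustrated with a minimal Python sketch. It is an independent illustration, not a transcription of the Perl routine referred to in this document: it swaps pairs of distinct synonymous codons, always accepts moves that lower the mean codon pair score, occasionally accepts uphill moves with probability exp(−Δ/T) under a linear cooling schedule, and never moves codon indices listed in `locked`. The toy `cps` scores, the unlisted-pair score of 0, and all parameter defaults are assumptions. For clarity the full score is recomputed after each swap; a production version would update it incrementally.

```python
import math
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks stop codons).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def mean_cps(codons, cps):
    """Codon pair bias: mean score over adjacent pairs (unlisted pairs score 0)."""
    return sum(cps.get(p, 0.0) for p in zip(codons, codons[1:])) / (len(codons) - 1)


def anneal_cpb(seq, cps, locked=frozenset(), steps=5000, t0=0.05, seed=0):
    """Lower codon pair bias by swapping synonymous codons under simulated annealing.

    Codon indices in `locked` are never moved. Returns (best_sequence, best_cpb).
    """
    rng = random.Random(seed)
    codons = [seq[p:p + 3] for p in range(0, len(seq), 3)]
    free = [i for i in range(len(codons)) if i not in locked]
    score = mean_cps(codons, cps)
    best, best_score = codons[:], score
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9          # linear cooling schedule
        i, j = rng.sample(free, 2)
        if codons[i] == codons[j] or CODE[codons[i]] != CODE[codons[j]]:
            continue                                    # only swap distinct synonyms
        codons[i], codons[j] = codons[j], codons[i]
        new = mean_cps(codons, cps)
        if new <= score or rng.random() < math.exp((score - new) / temp):
            score = new                                 # accept (sometimes uphill)
            if score < best_score:
                best, best_score = codons[:], score
        else:
            codons[i], codons[j] = codons[j], codons[i] # reject: undo the swap
    return "".join(best), best_score


# Toy scores (hypothetical): alternating CTG/CTA pairs are "rare" in this host.
cps = {("CTG", "CTG"): 0.3, ("CTA", "CTA"): 0.3, ("CTG", "CTA"): -0.4, ("CTA", "CTG"): -0.4}
design, score = anneal_cpb("CTGCTGCTACTA", cps)
```

Because only synonymous codons are exchanged, the encoded protein and the codon usage profile are unchanged; to seek a reduced rather than minimized bias, the loop could instead stop once a target score is reached.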

The procedure may be performed with or without the aid of a computer using, for example, gradient descent, simulated annealing, or another minimization routine. An example of a procedure that rearranges the codons present in a starting sequence can be represented by the following steps:

1) Obtain wildtype viral genome sequence.

2) Select protein coding sequences to target for attenuated design.

3) Lock down known or conjectured DNA segments with non-coding functions.

4) Select desired codon distribution for remaining amino acids in redesigned proteins.

5) Perform random shuffle of unlocked codon positions and calculate codon-pair score.

6) Further reduce (or increase) codon-pair score optionally employing a simulated annealing procedure.

7) Inspect resulting design for excessive secondary structure and unwanted restriction sites:

- if present→go to step (5), or correct the design by replacing problematic regions with wildtype sequences and go to step (8).

8) Synthesize DNA sequence corresponding to virus design.

9) Create viral construct and assess expression:

- if too attenuated, prepare subclone construct and go to step (9);
- if insufficiently attenuated, go to step (2).
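The restriction-site part of step (7), correcting the design by restoring wild type sequence, can be sketched in Python. The sketch assumes the design and the wild type are collinear coding sequences of equal length; it reverts every codon overlapping an unwanted site to the wild type codon and re-checks, raising an error (a signal to return to the shuffling step) if sites persist. Which sites count as "unwanted" is design-specific; the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences are used only as examples. The secondary-structure check would require an RNA folding tool and is not modeled. Function names are hypothetical.

```python
def find_sites(seq, motifs):
    """All (motif, 0-based start) occurrences of the given motifs, sorted by position."""
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((motif, start))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


def revert_sites(design, wildtype, motifs, max_rounds=10):
    """Replace codons overlapping each unwanted site with the wild type codons.

    Raises ValueError if sites persist (e.g. the wild type itself carries one),
    signalling a return to the shuffling step.
    """
    for _ in range(max_rounds + 1):
        hits = find_sites(design, motifs)
        if not hits:
            return design
        for motif, start in hits:
            first = start - start % 3                    # codon-aligned start
            last = -(-(start + len(motif)) // 3) * 3     # codon-aligned end (exclusive)
            design = design[:first] + wildtype[first:last] + design[last:]
    raise ValueError("unwanted sites persist; return to the shuffling step")


UNWANTED = ["GAATTC", "GGATCC"]  # EcoRI and BamHI sites, as examples only
fixed = revert_sites("ATGGAATTCTAA", "ATGGAGTTTTAA", UNWANTED)
```

Reverting whole codons keeps the reading frame and the encoded protein intact, at the cost of locally undoing part of the codon pair redesign.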

Source code (PERL script) of a computer based simulated annealing routine is provided.

Alternatively, one can devise a procedure which allows each pair of amino acids to be deoptimized by choosing a codon pair without a requirement that the codons be swapped out from elsewhere in the protein encoding sequence.
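One way to realize such a procedure, offered here as an assumption rather than as the method used in this work, is dynamic programming over codon choices: for each residue, keep the best cumulative codon pair score ending in each synonymous codon, then trace back the optimal path. Codon usage is not constrained, so unlike the shuffling approach the result may shift the codon usage profile. The `cps` scores and the `unseen=0.0` default are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" marks stop codons).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def min_cps_codons(protein, cps, unseen=0.0):
    """Choose one codon per residue so that the summed pair score is minimal.

    Viterbi-style dynamic programming over synonymous codon choices; codon usage
    is not constrained. Pairs absent from `cps` score `unseen`.
    Returns (dna, total_cps).
    """
    synonyms = {aa: [c for c, a in CODE.items() if a == aa] for aa in set(protein)}
    cost = {c: 0.0 for c in synonyms[protein[0]]}   # best prefix score ending in c
    back = []                                        # back-pointers per position
    for aa in protein[1:]:
        new_cost, pointer = {}, {}
        for c in synonyms[aa]:
            prev = min(cost, key=lambda p: cost[p] + cps.get((p, c), unseen))
            new_cost[c] = cost[prev] + cps.get((prev, c), unseen)
            pointer[c] = prev
        back.append(pointer)
        cost = new_cost
    last = min(cost, key=cost.get)
    path = [last]
    for pointer in reversed(back):
        path.append(pointer[path[-1]])
    return "".join(reversed(path)), cost[last]


# Hypothetical scores: CTG-CTA junctions are under-represented in the host.
dna, total = min_cps_codons("LLL", {("CTG", "CTA"): -0.5, ("CTA", "CTG"): -0.5})
```

Dividing `total` by the number of pairs gives the corresponding codon pair bias. Run over a whole protein with a realistic table, an unconstrained minimum is likely to be far more extreme than a viable design tolerates, so in practice it would be applied to selected segments or combined with the attenuation checks in step (9).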

Molecular Mechanisms of Viral Attenuation: Characterization of Attenuated PV Using High-Throughput Methods

As described above and in greater detail in the Examples, two synthetic, attenuated polioviruses encoding exactly the same proteins as the wildtype virus, but having altered codon bias or altered codon pair bias, were constructed. One virus uses deoptimized codons; the other virus uses deoptimized codon pairs. Each virus has many hundreds of nucleotide changes with respect to the wt virus.

The data presented herein suggest that these viruses are attenuated because of poor translation. This finding, if correct, has important consequences. First, the reduced fitness/virulence of each virus is due to small defects at hundreds of positions spread over the genome. Thus, there is essentially no chance of the virus reverting to wildtype, and so the virus is a good starting point for either a live or killed vaccine. Second, if the reduced fitness/virulence is due to additive effects of hundreds of small defects in translation, this method of reducing fitness with minimal risk of reversion should be applicable to many other viruses.

Though it is emphasized that the present invention is not limited to any particular mode of operation or underlying molecular mechanism, ongoing studies are aimed at distinguishing between two alternative hypotheses: that attenuation results from many small defects distributed throughout the recoded region, or from a few positions of large effect. The ongoing investigations involve the use of high-throughput methods to scan through the genomes of various attenuated virus designs, such as codon- and codon pair-deoptimized poliovirus and influenza virus, and to construct chimeras by placing overlapping 300-bp portions of each mutant virus into a wt context. See Example 11. The function of these chimeric viruses is then assayed. A finding that most chimeras are slightly, but not drastically, less fit than wild type, as suggested by the preliminary data disclosed herein, corroborates the “incremental loss of function” hypothesis, wherein many deleterious mutations are distributed throughout the regions covered by the chimeras. Conversely, a finding that most of the chimeras are similar or identical to wt, whereas one or only a few chimeras are attenuated like the parental mutant, suggests that there are relatively few positions in the sequence where mutation results in attenuation and that attenuation at those positions is significant.
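The tiling of a recoded region into overlapping 300-bp chimera segments can be sketched in Python. The document does not state the overlap, so the 150-bp step used here is a hypothetical choice; coordinates are 1-based and inclusive, and the final window is clipped at the end of the region. The P1 coordinates 755 to 3385 are those given for the PV(M) capsid region; the function name is hypothetical.

```python
def tiling_windows(start, end, size=300, step=150):
    """Overlapping windows (1-based, inclusive) covering start..end.

    Each window spans `size` nucleotides and successive windows begin `step`
    nucleotides apart; the last window is clipped at `end`.
    """
    windows = []
    s = start
    while True:
        e = min(s + size - 1, end)
        windows.append((s, e))
        if e == end:
            break
        s += step
    return windows


# Capsid (P1) region of PV(M), nucleotides 755-3385.
p1_windows = tiling_windows(755, 3385)
```

Each window would then define one chimera (mutant sequence within the window, wt elsewhere), so every recoded position is represented in at least one, and with this step usually two, chimeras.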

As described in Example 12, experiments are performed to determine how codon and codon-pair deoptimization affect RNA stability and abundance, and to pinpoint the parameters that impair translation of the re-engineered viral genome. An understanding of the molecular basis of this impairment will further enhance the applicability of the SAVE approach to a broad range of viruses. Another conceivable mechanism underlying translation impairment is translational frameshifting, wherein the ribosome begins to translate a different reading frame, generating a spurious, typically truncated polypeptide up to the point where it encounters an in-frame stop codon. The PV genomes carrying the AB mutant segment from residue 1513 to 2470 are not only non-viable, but also produce a novel protein band of approximately 42-44 kDa during in vitro translation (see FIG. 5A). The ability of this AB^1513-2470 fragment to inactivate PV, as well as its ability to induce production of the novel protein, may reflect the occurrence of a frameshift event, and this possibility is also being investigated. A filter for avoiding the introduction of frameshifting sites is built into the SAVE design software.

More detailed investigations of translational defects are conducted using various techniques including, but not limited to, polysome profiling, toeprinting, and luciferase assays of fusion proteins, as described in Example 12.

Molecular Biology of Poliovirus

While studies are ongoing to unravel the mechanisms underlying viral attenuation by SAVE, large-scale codon deoptimization of the PV capsid coding region revealed interesting insights into the biology of PV itself. What determines the PFU/particle ratio (specific infectivity) of a virus has been a longstanding question. In general, failure at any step during the infectious life cycle before the establishment of a productive infection will lead to an abortive infection and, therefore, to the demise of the infecting particle. In the case of PV, it has been shown that approximately 100 virions are required to result in one infectious event in cultured cells (Joklik and Darnell, 1961; Schwerdt and Fogh, 1957). That is, of 100 particles inoculated, only approximately one is likely to successfully complete all steps at the level of receptor binding (step 1), followed by internalization and uncoating (step 2), initiation of genome translation (step 3), polyprotein translation (step 4), RNA replication (step 5), and encapsidation of progeny (step 6).

In the infectious cycle of AB-type viruses described here, steps 1 and 2 should be identical to a PV(M) infection as their capsids are identical. Likewise, identical 5′ nontranslated regions should perform equally well in assembly of a translation complex (step 3). Viral polyprotein translation, on the other hand (step 4), is severely debilitated due to the introduction of a great number of suboptimal synonymous codons in the capsid region (FIGS. 5 and 6). It is thought that the repeated encounter of rare codons by the translational machinery causes stalling of the ribosome as, by the laws of mass action, rare aminoacyl-tRNA will take longer to diffuse into the A site on the ribosome. As peptide elongation to a large extent is driven by the concentration of available aminoacyl-tRNA, dependence of an mRNA on many rare tRNAs consequently lengthens the time of translation (Gustafsson et al., 2004). Alternatively, excessive stalling of the ribosome may cause premature dissociation of the translation complex from the RNA and result in a truncated protein destined for degradation. Both processes lead to a lower protein synthesis rate per mRNA molecule per unit of time. While the data presented herein suggest that the phenotypes of codon-deoptimized viruses are determined by the rate of genome translation, other mechanistic explanations may be possible. For example, it has been suggested that the conserved positions of rare synonymous codons throughout the viral capsid sequence in Hepatitis A virus are of functional importance for the proper folding of the nascent polypeptide by introducing necessary translation pauses (Sanchez et al., 2003). Accordingly, large-scale alteration of the codon composition may conceivably change some of these pause sites to result in an increase of misfolded capsid proteins.

Whether these considerations also apply to the PV capsid is not clear. If so, an altered phenotype would have been expected with the PV-SD design, in which the wt codons were preserved, but their positions throughout the capsid were completely changed. That is, none of the purported pause sites would be at the appropriate position with respect to the protein sequence. No change in phenotype, however, was observed and PV-SD translated and replicated at wild type levels (FIG. 3B).

Another possibility is that the large-scale codon alterations in the tested designs may create fortuitous dominant-negative RNA elements, such as stable secondary structures, or sequences that may undergo disruptive long-range interactions with other regions of the genome.

It is assumed that all steps prior to, and including, virus uncoating should be unchanged when wt and the mutant viruses described herein are compared. This is supported by the observation that the eclipse period for all these isolates is similar (FIG. 3B). The dramatic reduction in PFU/particle ratio is, therefore, likely to be a result of the reduced translation capacity of the deoptimized genomes, i.e., the handicap of the mutant viruses is determined intracellularly.

It is generally assumed that the relatively low PFU/particle ratio of picornaviruses of 1/100 to 1/1,000 (Rueckert, 1985) is mainly determined by structural alterations at the receptor binding step, either prior to or at the level of cell entry. The formation of 135S particles that are hardly infectious may be the major culprit behind the inefficiency of poliovirus infectivity (Hogle, 2002). However, certain virus mutants seem to sidestep A particle conversion without resulting in a higher specific infectivity, an observation suggesting that other post-entry mechanisms may be responsible for the low PFU/particle ratio (Dove and Racaniello, 1997).

The present data provide clear evidence for such post-entry interactions between virus and cell, and suggest that these, and not pre-entry events, contribute to the distinct PFU/particle ratio of poliovirus. As all replication proteins in poliovirus are located downstream of P1 on the polyprotein, they critically depend upon successful completion of P1 translation. Lowering the rate of P1 translation therefore lowers translation of all replication proteins to the same extent. This, in turn, likely leads to a reduced capacity of the virus to make the necessary modifications to the host cell required for establishment of a productive infection, such as shutdown of host cell translation or prevention of host cell innate responses. While codon deoptimization, as described herein, is likely to affect translation at the peptide elongation step, reduced initiation of translation can also be a powerful attenuating determinant, as has been shown for mutations in the internal ribosomal entry site in the Sabin vaccine strains of poliovirus (Svitkin et al., 1993; 1985).

On the basis of these considerations, it is predicted that many mutant phenotypes attributable to defects in genome translation or early genome replication actually manifest themselves by lowering PFU/particle ratios. This would be the case as long as the defect results in an increased chance of abortive infection. Since in almost all studies the omnipresent plaque assay is the virus detection method of choice, a reduction in the apparent virus titer is often equated with a reduction in virus production per se. This may be an inherent pitfall, and an understandable one given the difficulty of characterizing virus properties at the single-cell level. Instead, most assays are done on a large population of cells. A lower readout of the chosen test (protein synthesis, RNA replication, virus production as measured in PFU) is taken at face value as an indicator of lower production on a per-cell basis, without considering that virus production in a cell may be normal while the number of cells producing virus is reduced.
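The pitfall can be illustrated with a toy calculation. The minimal Python sketch below assumes that a population-level readout is simply the number of cells times the fraction of cells that become productively infected times the yield per productive cell; the function name and all numbers are hypothetical and are not data from this study. Two very different biological situations then give the same readout.

```python
def population_readout(n_cells: int, fraction_productive: float, yield_per_productive_cell: float) -> float:
    """Toy model: only productively infected cells contribute; abortive infections yield nothing."""
    return n_cells * fraction_productive * yield_per_productive_cell


# Scenario A (hypothetical): every cell is productive, but each makes 10-fold less virus.
scenario_a = population_readout(1_000_000, 1.0, 1_000)
# Scenario B (hypothetical): productive cells make normal amounts, but only 1 cell in 10 is productive.
scenario_b = population_readout(1_000_000, 0.1, 10_000)

# A titer measured on the pooled lysate cannot tell these two situations apart.
print(scenario_a, scenario_b)
```

Distinguishing the two scenarios requires a per-cell measurement, such as counting productively infected cells directly.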

The near-identical production of particles per cell by codon-deoptimized viruses indicates that the total amount of protein produced over an extended period of time is not severely affected, whereas the rate of protein production has been drastically reduced. This is reflected in the delayed appearance of CPE, which may be a sign that the virus has to go through more RNA replication cycles to build up similar intracellular virus protein concentrations. It appears that codon-deoptimized viruses are severely handicapped in establishing a productive infection because the early translation rate of the incoming infecting genome is reduced. As a result of this lower translation rate, PV proteins essential for disabling the cell's antiviral responses (most likely proteinases 2A^pro and 3C^pro) are not synthesized in sufficient amounts to pass this crucial hurdle in the life cycle quickly enough. Consequently, there is a better chance for the cell to eliminate the infection before viral replication can unfold and take over the cell. Thus, the likelihood of productive infection events is reduced and the rate of abortive infection is increased. However, when a codon-deoptimized virus does succeed in disabling the cell, it produces nearly identical amounts of progeny to the wild type. The present data suggest that a fundamental difference may exist between early translation (from the incoming RNA genome) and late translation during the replicative phase, when the cell's own translation is largely shut down. Although this may be a general phenomenon, it might be especially important in the case of codon-deoptimized genomes. Host cell shutoff very likely results in an over-abundance of free aminoacyl-tRNAs, which may overcome the imposed effect of the suboptimal codon usage, as the PV genomes no longer have to compete with cellular RNAs for translation resources.
This, in fact, may be analogous to observations with the modified in vitro translation system described herein (FIG. 5B). Using a translation extract that was not nuclease-treated (and thus contained cellular mRNAs) and not supplemented with exogenous amino acids or tRNAs, clear differences were observed in the translation capacity of different capsid design mutants. Under these conditions, viral genomes have to compete with cellular mRNAs in an environment where supplies are limited. In contrast, in the traditional translation extract, in which endogenous mRNAs were removed and excess tRNAs and amino acids were added, all PV RNAs translated equally well regardless of codon bias (FIG. 5A). These two different in vitro conditions may be analogous to in vivo translation during the early and late phases in the PV-infected cell.

One key finding of the present study is the realization that, besides the steps during the physical interaction and uptake of virus, the PFU/particle ratio also largely reflects the virus' capacity to overcome host cell antiviral responses. This suggests that picornaviruses are actually quite inefficient in winning this struggle, and appear to have taken the path of evolving small genomes that can replicate quickly before the cell can effectively respond. As the data show, slowing down translation rates by only 30% in PV-AB^2470-2954 (see FIG. 6) leads to a 1,000-fold higher rate of abortive infection, as reflected in the lower specific infectivity (FIG. 4D). Picornaviruses apparently replicate not only at the threshold of error catastrophe (Crotty et al., 2001; Holland et al., 1990) but also at the threshold of elimination by the host cell's antiviral defenses. This effect may have profound consequences for the pathogenic phenotype of a picornavirus. The cellular antiviral processes responsible for the increased rate of aborted infections by codon-deoptimized viruses are not completely understood at present. PV has been shown to both induce and inhibit apoptosis (Belov et al., 2003; Girard et al., 1999; Tolskaya et al., 1995). Similarly, PV interferes with the interferon pathway by cleaving NF-κB (Neznanov et al., 2005). It is plausible that a PV with a reduced rate of early genome translation still induces antiviral responses in the same way as a wt virus (induction of apoptosis and interferon by default) but then, due to low protein synthesis, has a reduced potential to inhibit these processes. This scenario would increase the likelihood of the cell aborting a nascent infection and could explain the observed phenomena. At the individual cell level, PV infection is likely to be an all-or-nothing phenomenon: viral protein and RNA synthesis likely need to be close to maximal to ensure productive infection.

Attenuated Virus Vaccine Compositions

The present invention provides a vaccine composition for inducing a protective immune response in a subject comprising any of the attenuated viruses described herein and a pharmaceutically acceptable carrier.

It should be understood that an attenuated virus of the invention, where used to elicit a protective immune response in a subject or to prevent a subject from becoming afflicted with a virus-associated disease, is administered to the subject in the form of a composition additionally comprising a pharmaceutically acceptable carrier. Pharmaceutically acceptable carriers are well known to those skilled in the art and include, but are not limited to, one or more of 0.01-0.1 M, and preferably 0.05 M, phosphate buffer, phosphate-buffered saline (PBS), or 0.9% saline. Such carriers also include aqueous or non-aqueous solutions, suspensions, and emulsions. Aqueous carriers include water, alcoholic/aqueous solutions, emulsions or suspensions, saline and buffered media. Examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oils such as olive oil, and injectable organic esters such as ethyl oleate. Parenteral vehicles include sodium chloride solution, Ringer's dextrose, dextrose and sodium chloride, lactated Ringer's and fixed oils. Intravenous vehicles include fluid and nutrient replenishers, electrolyte replenishers such as those based on Ringer's dextrose, and the like. Solid compositions may comprise nontoxic solid carriers such as, for example, glucose, sucrose, mannitol, sorbitol, lactose, starch, magnesium stearate, cellulose or cellulose derivatives, sodium carbonate and magnesium carbonate. For administration in an aerosol, such as for pulmonary and/or intranasal delivery, an agent or composition is preferably formulated with a nontoxic surfactant, for example, esters or partial esters of C6 to C22 fatty acids or natural glycerides, and a propellant. Additional carriers such as lecithin may be included to facilitate intranasal delivery.
Pharmaceutically acceptable carriers can further comprise minor amounts of auxiliary substances such as wetting or emulsifying agents, preservatives and other additives, such as, for example, antimicrobials, antioxidants and chelating agents, which enhance the shelf life and/or effectiveness of the active ingredients. The instant compositions can, as is well known in the art, be formulated so as to provide quick, sustained or delayed release of the active ingredient after administration to a subject.

In various embodiments of the instant vaccine composition, the attenuated virus (i) does not substantially alter the synthesis and processing of viral proteins in an infected cell; (ii) produces similar amounts of virions per infected cell as wt virus; and/or (iii) exhibits substantially lower virion-specific infectivity than wt virus. In further embodiments, the attenuated virus induces a substantially similar immune response in a host animal as the corresponding wt virus.

This invention also provides a modified host cell line specially isolated or engineered to be permissive for an attenuated virus that is inviable in a wild type host cell. Since the attenuated virus cannot grow in normal (wild type) host cells, it is absolutely dependent on the specific helper cell line for growth. This provides a very high level of safety for the generation of virus for vaccine production. Various embodiments of the instant modified cell line permit the growth of an attenuated virus, wherein the genome of said cell line has been altered to increase the number of genes encoding rare tRNAs.

In preferred embodiments, the rare codons are CTA (coding for Leu), TCG (Ser), and CCG (Pro). In different embodiments, one, two, or all three of these rare codons are substituted for synonymous frequent codons in the viral genome. For example, all Leu codons in the virus may be changed to CTA; all Ser codons may be changed to TCG; all Pro codons may be changed to CCG; the Leu and Ser, or Leu and Pro, or Ser and Pro codons may be replaced by the identified rare codons; or all Leu, Ser, and Pro codons may be changed to CTA, TCG, and CCG, respectively, in a single virus. Further, a fraction of the relevant codons, i.e., less than 100%, may be changed to the rare codons. Thus, the proportion of codons substituted may be about 20%, 40%, 60%, 80% or 100% of the total number of codons.
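As a minimal Python sketch of these substitutions, the function below replaces all, or a chosen fraction, of the Leu, Ser and/or Pro codons in an in-frame coding sequence with CTA, TCG and CCG. The synonymous codon sets are those of the standard genetic code; the function name, the random choice of which positions to change when the fraction is below 100%, and the fixed seed are illustrative assumptions, not part of the described method.

```python
import random

# Synonymous codons (standard genetic code) and the rare replacement codon for each.
SYNONYMS = {
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Ser": {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"},
    "Pro": {"CCT", "CCC", "CCA", "CCG"},
}
RARE = {"Leu": "CTA", "Ser": "TCG", "Pro": "CCG"}


def substitute_rare_codons(cds: str, amino_acids: list[str], fraction: float = 1.0, seed: int = 0) -> str:
    """Replace `fraction` of the codons for each listed amino acid with its rare codon."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    rng = random.Random(seed)  # fixed seed so the chosen positions are reproducible
    for aa in amino_acids:
        positions = [i for i, c in enumerate(codons) if c in SYNONYMS[aa]]
        k = round(fraction * len(positions))  # note: Python rounds exact halves to even
        for i in rng.sample(positions, k):
            codons[i] = RARE[aa]
    return "".join(codons)


# Leu Ser Pro Leu Ser Pro Met
print(substitute_rare_codons("CTGTCTCCTCTTAGCCCCATG", ["Leu", "Ser", "Pro"]))  # CTATCGCCGCTATCGCCGATG
```

Passing a single amino acid, e.g. `["Leu"]`, or a `fraction` such as 0.4 covers the partial embodiments listed above.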

In certain embodiments, these substitutions are made only in the capsid region of the virus, where a high rate of translation is most important. In other embodiments, the substitutions are made throughout the virus. In further embodiments, the cell line over-expresses tRNAs that bind to the rare codons.

This invention further provides a method of synthesizing any of the attenuated viruses described herein, the method comprising (a) identifying codons in multiple locations within at least one non-regulatory portion of the viral genome, which codons can be replaced by synonymous codons; (b) selecting a synonymous codon to be substituted for each of the identified codons; and (c) substituting a synonymous codon for each of the identified codons.
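Steps (a) to (c) can be sketched in Python as follows, assuming a codon usage table that maps each codon to its relative frequency (the values in `USAGE` are hypothetical placeholders) and choosing the least-used synonym in step (b), as in the PV-AB design described in Example 1. The `locked` parameter (positions to leave untouched, e.g., regulatory elements or restriction sites) and all function names are illustrative assumptions. The final check confirms that the encoded protein is unchanged.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}

# Hypothetical relative usage values for two amino acids; supply a real table in practice.
USAGE = {"GCT": 0.3, "GCC": 0.4, "GCA": 0.2, "GCG": 0.1, "AAA": 0.4, "AAG": 0.6}


def translate(cds: str) -> str:
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def deoptimize(cds: str, usage: dict[str, float], locked: frozenset[int] = frozenset()) -> str:
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODE.items():
        synonyms.setdefault(aa, []).append(codon)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for idx, codon in enumerate(codons):
        family = synonyms[CODE[codon]]
        # (a) identify replaceable codons: not locked, covered by the table, and with synonyms
        if idx in locked or codon not in usage or len(family) == 1:
            continue
        # (b) select the least-used synonym; codons missing from the table are never chosen
        best = min(family, key=lambda c: usage.get(c, float("inf")))
        # (c) substitute
        codons[idx] = best
    new_cds = "".join(codons)
    if translate(new_cds) != translate(cds):
        raise RuntimeError("substitution changed the encoded protein")
    return new_cds


print(deoptimize("ATGGCCAAGGCTTGG", USAGE))  # ATGGCGAAAGCGTGG
```

Met (ATG) and Trp (TGG) have no synonyms and are left unchanged, as are any codons absent from the supplied table.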

In certain embodiments of the instant methods, steps (a) and (b) are guided by a computer-based algorithm for Synthetic Attenuated Virus Engineering (SAVE) that permits design of a viral genome by varying specified pattern sets of deoptimized codon distribution and/or deoptimized codon-pair distribution within preferred limits. The invention also provides a method wherein the pattern sets alternatively or additionally comprise density of deoptimized codons and deoptimized codon pairs, RNA secondary structure, CpG dinucleotide content, C+G content, overlapping coding frames, restriction site distribution, frameshift sites, or any combination thereof.
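A few of the pattern-set parameters listed above are straightforward to compute. The Python sketch below reports C+G content, CpG dinucleotide count and the occurrences of chosen restriction sites, and checks a candidate design against user-supplied limits. The function names and the limit values in the example are hypothetical; the SAVE algorithm itself is not reproduced here.

```python
def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    seq, motif = seq.upper(), motif.upper()
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_metrics(seq: str, sites: tuple[str, ...] = ("GAATTC",)) -> dict:
    """C+G content, CpG count, and restriction-site counts (GAATTC is the EcoRI site)."""
    return {
        "gc": gc_content(seq),
        "cpg": count_overlapping(seq, "CG"),
        "sites": {site: count_overlapping(seq, site) for site in sites},
    }


def within_limits(metrics: dict, gc_min: float, gc_max: float, max_cpg: int) -> bool:
    """Hypothetical acceptance test: C+G in range, CpG count capped, each site at most once."""
    return (
        gc_min <= metrics["gc"] <= gc_max
        and metrics["cpg"] <= max_cpg
        and all(n <= 1 for n in metrics["sites"].values())
    )


m = design_metrics("ATGCGAATTCGCTAACG")
print(m)
print(within_limits(m, gc_min=0.40, gc_max=0.55, max_cpg=3))
```

A design loop would recompute these metrics after each candidate substitution and reject candidates that fall outside the chosen limits.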

In other embodiments, step (c) is achieved by de novo synthesis of DNA containing the synonymous codons and/or codon pairs and substitution of the corresponding region of the genome with the synthesized DNA. In further embodiments, the entire genome is substituted with the synthesized DNA. In still further embodiments, a portion of the genome is substituted with the synthesized DNA. In yet other embodiments, said portion of the genome is the capsid coding region.

In addition, the present invention provides a method for eliciting a protective immune response in a subject comprising administering to the subject a prophylactically or therapeutically effective dose of any of the vaccine compositions described herein. This invention also provides a method for preventing a subject from becoming afflicted with a virus-associated disease comprising administering to the subject a prophylactically effective dose of any of the instant vaccine compositions. In embodiments of the above methods, the subject has been exposed to a pathogenic virus. “Exposed” to a pathogenic virus means contact with the virus such that infection could result.

The invention further provides a method for delaying the onset, or slowing the rate of progression, of a virus-associated disease in a virus-infected subject comprising administering to the subject a therapeutically effective dose of any of the instant vaccine compositions.

As used herein, “administering” means delivering using any of the various methods and delivery systems known to those skilled in the art. Administering can be performed, for example, intraperitoneally, intracerebrally, intravenously, orally, transmucosally, subcutaneously, transdermally, intradermally, intramuscularly, topically, parenterally, via implant, intrathecally, intralymphatically, intralesionally, pericardially, or epidurally. An agent or composition may also be administered in an aerosol, such as for pulmonary and/or intranasal delivery. Administering may be performed, for example, once, a plurality of times, and/or over one or more extended periods.

Eliciting a protective immune response in a subject can be accomplished, for example, by administering a primary dose of a vaccine to a subject, followed after a suitable period of time by one or more subsequent administrations of the vaccine. A suitable period of time between administrations of the vaccine may readily be determined by one skilled in the art, and is usually on the order of several weeks to months. The present invention is not limited, however, to any particular method, route or frequency of administration.

A “subject” means any animal or artificially modified animal. Animals include, but are not limited to, humans, non-human primates, cows, horses, sheep, pigs, dogs, cats, rabbits, ferrets, rodents such as mice, rats and guinea pigs, and birds. Artificially modified animals include, but are not limited to, SCID mice with human immune systems, and CD155tg transgenic mice expressing the human poliovirus receptor CD155. In a preferred embodiment, the subject is a human. Preferred embodiments of birds are domesticated poultry species, including, but not limited to, chickens, turkeys, ducks, and geese.

A “prophylactically effective dose” is any amount of a vaccine that, when administered to a subject prone to viral infection or prone to affliction with a virus-associated disorder, induces in the subject an immune response that protects the subject from becoming infected by the virus or afflicted with the disorder. “Protecting” the subject means either reducing the likelihood of the subject's becoming infected with the virus, or lessening the likelihood of the disorder's onset in the subject, by at least two-fold, preferably at least ten-fold. For example, if a subject has a 1% chance of becoming infected with a virus, a two-fold reduction in the likelihood of the subject becoming infected with the virus would result in the subject having a 0.5% chance of becoming infected with the virus. Most preferably, a “prophylactically effective dose” induces in the subject an immune response that completely prevents the subject from becoming infected by the virus or prevents the onset of the disorder in the subject entirely.

As used herein, a “therapeutically effective dose” is any amount of a vaccine that, when administered to a subject afflicted with a disorder against which the vaccine is effective, induces in the subject an immune response that causes the subject to experience a reduction, remission or regression of the disorder and/or its symptoms. In preferred embodiments, recurrence of the disorder and/or its symptoms is prevented. In other preferred embodiments, the subject is cured of the disorder and/or its symptoms.

Certain embodiments of any of the instant immunization and therapeutic methods further comprise administering to the subject at least one adjuvant. An “adjuvant” shall mean any agent suitable for enhancing the immunogenicity of an antigen and boosting an immune response in a subject. Numerous adjuvants, including particulate adjuvants, suitable for use with both protein- and nucleic acid-based vaccines, and methods of combining adjuvants with antigens, are well known to those skilled in the art. Suitable adjuvants for nucleic acid-based vaccines include, but are not limited to, Quil A, imiquimod, resiquimod, and interleukin-12 delivered in purified protein or nucleic acid form. Adjuvants suitable for use with protein immunization include, but are not limited to, alum, Freund's incomplete adjuvant (FIA), saponin, Quil A, and QS-21.

The invention also provides a kit for immunization of a subject with an attenuated virus of the invention. The kit comprises the attenuated virus, a pharmaceutically acceptable carrier, an applicator, and an instructional material for the use thereof. In further embodiments, the attenuated virus may be one or more polioviruses, one or more rhinoviruses, one or more influenza viruses, etc. More than one virus may be preferred where it is desirable to immunize a host against a number of different isolates of a particular virus. The invention includes other embodiments of kits that are known to those skilled in the art. The instructions can provide any information that is useful for directing the administration of the attenuated viruses.

Of course, it is to be understood and expected that variations in the principles of invention herein disclosed can be made by one skilled in the art and it is intended that such modifications are to be included within the scope of the present invention. The following Examples further illustrate the invention, but should not be construed to limit the scope of the invention in any way. Detailed descriptions of conventional methods, such as those employed in the construction of recombinant plasmids, transfection of host cells with viral constructs, polymerase chain reaction (PCR), and immunological techniques can be obtained from numerous publications, including Sambrook et al. (1989) and Coligan et al. (1994). All references mentioned herein are incorporated in their entirety by reference into this application.

Full details for the various publications cited throughout this application are provided at the end of the specification immediately preceding the claims. The disclosures of these publications are hereby incorporated in their entireties by reference into this application. However, the citation of a reference herein should not be construed as an acknowledgement that such reference is prior art to the present invention.

Example 1

Re-Engineering of Capsid Region of Polioviruses by Altering Codon Bias

Cells, Viruses, Plasmids, and Bacteria

HeLa R19 cell monolayers were maintained in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% bovine calf serum (BCS) at 37° C. All PV infectious cDNA constructs are based on PV1(M) cDNA clone pT7PVM (Cao et al., 1993; van der Werf et al., 1986). Dicistronic reporter plasmids were constructed using pHRPF-Luc (Zhao and Wimmer, 2001). Escherichia coli DH5α was used for plasmid transformation and propagation. Viruses were amplified by infection of HeLa R19 cell monolayers with 5 PFU per cell. Infected cells were incubated in DMEM (2% BCS) at 37° C. until complete cytopathic effect (CPE) was apparent or for at least 4 days post-infection. After three rounds of freezing and thawing, the lysate was clarified of cell debris by low-speed centrifugation and the supernatant, containing the virus, was used for further passaging or analysis.

Cloning of Synthetic Capsid Replacements and Dicistronic Reporter Replicons

Two PV genome cDNA fragments spanning the genome between nucleotides 495 and 3636, named SD and AB, were synthesized using GeneMaker® technology (Blue Heron Biotechnology). pPV-SD and pPV-AB were generated by releasing the replacement cassettes from the vendor's cloning vector by PflMI digestion and inserting them into the pT7PVM vector from which the corresponding PflMI fragment had been removed. pPV-AB^755-1513 and pPV-AB^2470-3386 were obtained by inserting a BsmI fragment or an NheI-EcoRI fragment, respectively, from pPV-AB into equally digested pT7PVM vector. In pPV-AB^1513-3386 and pPV-AB^755-2470, the BsmI fragment or NheI-EcoRI fragment of pT7PVM, respectively, replaces the respective fragment of the pPV-AB vector. Replacement of the NheI-EcoRI fragment of pPV-AB^1513-3386 with that of pT7PVM resulted in pPV-AB^1513-2470. Finally, replacement of the SnaBI-EcoRI fragments of pPV-AB^2470-3386 and pT7PVM with one another produced pPV-AB^2954-3386 and pPV-AB^2470-2954, respectively.

Cloning of dicistronic reporter constructs was accomplished by first introducing a silent mutation in pHRPF-Luc by site-directed mutagenesis using oligonucleotides Fluc-mutRI(+)/Fluc-mutRI(−) to mutate an EcoRI site in the firefly luciferase open reading frame, generating pdiLuc-mRI. The capsid regions of pT7PVM, pPV-AB^1513-2470 and pPV-AB^2470-2954 were PCR amplified using oligonucleotides RI-2A-P1wt(+)/P1wt-2A-RI(−). Capsid sequences of pPV-AB^2470-3386 and pPV-AB^2954-3386, or of pPV-AB, were amplified with RI-2A-P1wt(+)/P1AB-2A-RI(−) or RI-2A-P1AB(+)/P1AB-2A-RI(−), respectively. PCR products were digested with EcoRI and inserted into the now-unique EcoRI site in pdiLuc-mRI, resulting in pdiLuc-PV, pdiLuc-AB^1513-2470, pdiLuc-AB^2470-2954, pdiLuc-AB^2470-3386, pdiLuc-AB^2954-3386, and pdiLuc-AB, respectively.

Oligonucleotides

The following oligonucleotides were used:

Fluc-mutRI(+),

(SEQ ID NO: 6)

5′-GCACTGATAATGAACTCCTCTGGATCTACTGG-3′;

Fluc-mutRI(−),

(SEQ ID NO: 7)

5′-CCAGTAGATCCAGAGGAGTTCATTATCAGTGC-3′;

RI-2A-P1wt(+),

(SEQ ID NO: 8)

5′-CAAGAATTCCTGACCACATACGGTGCTCAGGTTTCATCACAGAAAGTGGG-3′;

RI-2A-P1AB(+),

(SEQ ID NO: 9)

5′-CAAGAATTCCTGACCACATACGGTGCGCAAGTATCGTCGCAAAAAGTAGG-3′;

P1wt-2A-RI(−),

(SEQ ID NO: 10)

5′-TTCGAATTCTCCATATGTGGTCAGATCCTTGGTGGAGAGG-3′;

and

P1AB-2A-RI(−),

(SEQ ID NO: 11)

5′-TTCGAATTCTCCATACGTCGTTAAATCTTTCGTCGATAACG-3′.

In Vitro Transcription and RNA Transfection

Driven by the T7 promoter, 2 μg of EcoRI-linearized plasmid DNA were transcribed by T7 RNA polymerase (Stratagene) for 1 h at 37° C. One microgram of virus or dicistronic transcript RNA was used to transfect 10⁶ HeLa R19 cells on a 35-mm-diameter plate according to a modification of the DEAE-dextran method (van der Werf et al., 1986). Following a 30-min incubation at room temperature, the supernatant was removed and cells were incubated at 37° C. in 2 ml of DMEM containing 2% BCS until CPE appeared, or the cells were frozen 4 days post-transfection for further passaging. Virus titers were determined by standard plaque assay on HeLa R19 cells using a semisolid overlay of 0.6% tragacanth gum (Sigma-Aldrich) in minimal Eagle medium.

Design and Synthesis of Codon-Deoptimized Polioviruses

Two different synonymous encodings of the poliovirus P1 capsid region were produced, each governed by different design criteria. The designs were limited to the capsid, as it has been conclusively shown that the entire capsid coding sequence can be deleted from the PV genome or replaced with exogenous sequences without affecting replication of the resulting sub-genomic replicon (Johansen and Morrow, 2000; Kaplan and Racaniello, 1988). It is therefore quite certain that no unidentified crucial regulatory RNA elements are located in the capsid region, which might be affected inadvertently by modulation of the RNA sequence.

The first design (PV-SD) sought to maximize the number of RNA base changes while preserving the exact codon usage distribution of the wild type P1 region (FIG. 1). To achieve this, synonymous codon positions were exchanged for each amino acid by finding a maximum weight bipartite matching (Gabow, 1973) between the positions and the codons, where the weight of each position-codon pair is the number of base changes between the original codon and the synonymous candidate codon to replace it. To avoid any positional bias from the matching algorithm, the synonymous codon locations were randomly permuted before creating the input graph and the locations were subsequently restored. Rothberg's maximum bipartite matching program (Rothberg, 1985) was used to compute the matching. A total of 11 useful restriction enzyme sites, each 6 nucleotides long, were locked in the viral genome sequence so as not to participate in the codon location exchange. The codon shuffling technique potentially creates additional restriction sites that should preferably remain unique in the resulting reconstituted full-length genome. For this reason, the sequence was further processed by substituting codons to eliminate the undesired sites. This resulted in an additional nine synonymous codon changes that slightly altered the codon frequency distribution. However, no codon had its frequency changed by more than 1 relative to the wild type sequence. In total, 934 out of 2,643 nucleotides were changed in the PV-SD capsid design compared to the wt P1 sequence, while the protein sequence of the capsid coding domain remained identical (see FIGS. 1 and 2). As the codon usage was not changed, the GC content in the PVM-SD capsid coding sequence remained identical to that in the wt at 49%.
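The codon-position exchange for one amino acid can be sketched as an assignment problem: rows are the positions encoding that amino acid, columns are the codons found at those positions (as a multiset), and the weight is the number of base changes. The Python sketch below is not Rothberg's program; it swaps in a textbook O(n³) Hungarian algorithm on negated weights to obtain a maximum weight perfect matching, and it mirrors the random permutation step described above. Locked restriction sites and the subsequent site-removal pass are not modeled, and all names are illustrative.

```python
import random

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def hamming(a: str, b: str) -> int:
    """Number of base changes between two codons."""
    return sum(x != y for x, y in zip(a, b))


def min_cost_assignment(cost: list[list[float]]) -> list[int]:
    """Hungarian algorithm for a square cost matrix; returns the column assigned to each row."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (n + 1)  # column potentials
    p = [0] * (n + 1)    # p[j]: row (1-based) matched to column j, 0 if free
    way = [0] * (n + 1)
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def shuffle_synonymous_positions(codons: list[str], seed: int = 0) -> list[str]:
    """Redistribute each amino acid's codons over its positions, maximizing total base changes."""
    rng = random.Random(seed)
    result = list(codons)
    positions_by_aa: dict[str, list[int]] = {}
    for pos, codon in enumerate(codons):
        positions_by_aa.setdefault(CODE[codon], []).append(pos)
    for positions in positions_by_aa.values():
        positions = positions[:]
        rng.shuffle(positions)  # random permutation to avoid positional bias
        pool = [codons[q] for q in positions]  # codon multiset, so codon usage is preserved
        # Maximizing base changes is the same as minimizing negated base changes.
        cost = [[-hamming(codons[q], c) for c in pool] for q in positions]
        for row, col in enumerate(min_cost_assignment(cost)):
            result[positions[row]] = pool[col]
    return result


wt = ["ATG", "CTG", "AAA", "CTG", "AAG", "TTA", "TTA", "TGG"]
sd = shuffle_synonymous_positions(wt)
print(sd, sum(hamming(a, b) for a, b in zip(wt, sd)))
```

Because only the positions of existing codons change, the codon usage distribution and the encoded protein stay the same, which is the property the PV-SD design relies on.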

The second design, PV-AB, sought to drastically change the codon usage distribution over the wt P1 region. This design was influenced by recent work suggesting that codon bias may impact tissue-specific expression (Plotkin et al., 2004). The desired codon usage distribution was derived from the most unfavorable codons observed in a previously described set of brain-specific genes (Hsiao et al., 2001; Plotkin et al., 2004). A capsid coding region was synthesized maximizing the usage of the rarest synonymous codon for each particular amino acid as observed in this set of genes (FIG. 1). Since for all amino acids but one (Leu) the rarest codon in brain corresponds to the rarest codon among human genes at large, this design would in effect be expected to discriminate against expression in other human tissues as well. Altogether, the PV-AB capsid differs from the wt capsid in 680 nucleotide positions (see FIG. 2). The GC content in the PVM-AB capsid region was reduced to 43% compared to 49% in the wt.

Example 2

Effects of Codon-Deoptimization on Growth and Infectivity of Polioviruses

Determination of Virus Titer by Infected Focus Assay

Infections were done as for a standard plaque assay. After 48 or 72 h of incubation, the tragacanth gum overlay was removed and the wells were washed twice with phosphate-buffered saline (PBS) and fixed with cold methanol/acetone for 30 min. Wells were blocked in PBS containing 10% BCS followed by incubation with a 1:20 dilution of anti-3D mouse monoclonal antibody 125.2.3 (Paul et al., 1998) for 1 h at 37° C. After washing, cells were incubated with horseradish peroxidase-labeled goat anti-mouse antibody (Jackson ImmunoResearch, West Grove, Pa.) and infected cells were visualized using Vector VIP substrate kit (Vector Laboratories, Burlingame, Calif.). Stained foci, which are equivalent to plaques obtained with wt virus, were counted, and titers were calculated as in the plaque assay procedure.
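Calculating titers "as in the plaque assay procedure" is the usual count divided by dilution and plated volume. The Python sketch below assumes counts from replicate wells at a single dilution; the function name, the dilution and the inoculum volume in the example are hypothetical, as the protocol above does not state them.

```python
from statistics import mean


def titer_per_ml(counts: list[int], dilution: float, volume_ml: float) -> float:
    """Infectious units (PFU or focus-forming units) per ml of the undiluted stock.

    counts: plaques or stained foci counted in replicate wells at one dilution
    dilution: dilution of the plated inoculum, e.g. 1e-6 for a 10^-6 dilution
    volume_ml: volume of diluted inoculum plated per well, in ml
    """
    if not counts or dilution <= 0 or volume_ml <= 0:
        raise ValueError("need at least one count and a positive dilution and volume")
    return mean(counts) / (dilution * volume_ml)


# Hypothetical example: 48 and 52 foci per well at a 10^-6 dilution, 0.1 ml plated.
print(f"{titer_per_ml([48, 52], 1e-6, 0.1):.2e} FFU/ml")
```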

Codon-Deoptimized Polioviruses Display Severe Growth Phenotypes

Of the two initial capsid ORF replacement designs (FIG. 3A), only PV-SD produced viable virus. In contrast, no viable virus was recovered from four independent transfections with PV-AB RNA, even after three rounds of passaging (FIG. 3E). It appeared that the codon bias introduced into the PV-AB genome was too severe. Thus, smaller portions of the PV-AB capsid coding sequence were subcloned into the PV(M) background to reduce the detrimental effects of the nonpreferred codons. Of these subclones, PV-AB^2954-3386 produced CPE 40 h after RNA transfection (compared to 24 h for the wild type virus), while PV-AB^755-1513 and PV-AB^2470-2954 required one or two additional passages following transfection, respectively. Interestingly, these chimeric viruses represent the three subclones with the smallest portions of the original AB sequence, an observation suggesting a direct correlation between the number of nonpreferred codons and the fitness of the virus.

One-step growth kinetics of all viable virus variants were determined by infecting HeLa monolayers at a multiplicity of infection (MOI) of 2 with viral cell lysates obtained after a maximum of two passages following RNA transfection (FIG. 3B). The MOI was chosen because of the low titer of PV-AB^2470-2954 and to avoid the further passaging that concentrating and purifying the inoculum would have required. Under the conditions used, all viruses had produced complete or near complete CPE by 24 h post-infection.

Despite 934 single-point mutations in its capsid region, PV-SD replicated at wt capacity (FIG. 3B) and produced plaques similar in size to those of the wt (FIG. 3D). While PV-AB^2954-3386 grew with near-wild type kinetics (FIG. 3B), PV-AB^755-1513 produced minute plaques and approximately 22-fold less infectious virus (FIGS. 3B and 3F). Although able to cause CPE in high-MOI infections, albeit with a considerable delay (80 to 90% CPE after 20 to 24 h), PV-AB^2470-2954 produced no plaques at all under the conditions of the standard plaque assay (FIG. 3H). This virus was therefore quantified using a focus-forming assay, in which foci of infected cells after 72 h of incubation under plaque assay conditions were counted after they were stained immunohistochemically with antibodies to the viral polymerase 3D (FIG. 3G). After 48 h of infection, PV-AB^2470-2954-infected foci usually involved only tens to hundreds of cells (FIG. 3J), with a focus diameter of 0.2 to 0.5 mm, compared to 3-mm plaques for the wt (FIGS. 3C and D). However, after an additional 24 h, the diameter of the foci increased significantly (2 to 3 mm; FIG. 3G). When HeLa cells were infected with PV-AB^755-1513 and PV-AB^2470-2954 at an MOI of 1, CPE appeared after 12 to 18 h and after 3 to 4 days, respectively, compared to 8 h with the wt (data not shown).

In order to quantify the cumulative effect of a particular codon bias in a protein coding sequence, a relative codon deoptimization index (RCDI) was calculated, which is a comparative measure against the general codon distribution in the human genome. An RCDI of 1/codon indicates that a gene follows the normal human codon frequencies, while any deviation from the normal human codon bias results in an RCDI higher than 1. The RCDI was derived using the formula:

$\mathrm{RCDI} = \frac{\sum_{i=1}^{64} \left( \frac{C_{i}F_{a}}{C_{i}F_{h}} \right) N_{c_{i}}}{N}$

C_iF_a is the observed relative frequency in the test sequence of each codon i out of all synonymous codons for the same amino acid (0 to 1); C_iF_h is the normal relative frequency observed in the human genome of each codon i out of all synonymous codons for that amino acid (0.06 to 1); N_ci is the number of occurrences of that codon i in the sequence; and N is the total number of codons (amino acids) in the sequence.

Thus, a high number of rare codons in a sequence results in a higher index. Using this formula, the RCDI values of the various capsid coding sequences were calculated to be 1.14 for PV(M) and PV-SD, which is very close to a normal human distribution. The RCDI values for the AB constructs are 1.73 for PV-AB^755-1513, 1.45 for PV-AB^2470-2954, and 6.51 for the parental PV-AB. For comparison, the RCDI of probably the best-known codon-optimized protein, "humanized" green fluorescent protein (GFP), was 1.31, compared to an RCDI of 1.68 for the original Aequorea victoria gfp gene (Zolotukhin et al., 1996). According to these calculations, a capsid coding sequence with an RCDI of <2 is associated with a viable virus phenotype, while an RCDI of >2 (PV-AB=6.51, PV-AB^1513-3386=4.04, PV-AB^755-2470=3.61) results in a lethal phenotype.
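The RCDI formula can be illustrated with a short Python sketch. It assumes the standard genetic code and takes the reference relative frequencies (C_iF_h) as a dictionary supplied by the caller; the human-genome values used in the study are not reproduced here, so the toy reference below is purely hypothetical, as are the function names. Codons absent from the test sequence contribute nothing because their N_ci is 0, so summing over the codons present is equivalent to summing over all 64.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
GENETIC_CODE = {"".join(c): aa
                for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split an in-frame coding sequence (DNA or RNA) into codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def relative_codon_frequencies(seq):
    """Frequency of each codon among all synonymous codons (0 to 1)."""
    codon_counts = Counter(split_codons(seq))
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa_counts[GENETIC_CODE[codon]] += n
    return {codon: n / aa_counts[GENETIC_CODE[codon]]
            for codon, n in codon_counts.items()}


def rcdi(seq, reference_freq):
    """RCDI = sum_i (CiFa / CiFh) * Nci / N; reference_freq holds CiFh."""
    codons = split_codons(seq)
    counts = Counter(codons)
    test_freq = relative_codon_frequencies(seq)
    weighted = sum(test_freq[c] / reference_freq[c] * n
                   for c, n in counts.items())
    return weighted / len(codons)


# Hypothetical reference in which GCC is a rare alanine codon (CiFh = 0.25).
reference = relative_codon_frequencies("GCTGCTGCTGCC")
print(rcdi("GCTGCTGCTGCC", reference))  # prints 1.0 (matches the reference)
print(rcdi("GCCGCC", reference))        # prints 4.0 (only the rare codon)
```

A sequence whose synonymous-codon frequencies match the reference scores exactly 1, and relying on a rare codon raises the index, which mirrors the behavior described above.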

Example 3

Effects of Codon-Deoptimization on Specific Infectivity of Polioviruses

Molecular Quantification of Viral Particles: Direct OD₂₆₀ Absorbance Method

Fifteen-centimeter dishes of HeLa cells (4×10⁷ cells) were infected with PV(M), PV-AB^755-1513, or PV-AB^2470-2954 at an MOI of 0.5 until complete CPE occurred (overnight versus 4 days). Cell-associated virus was released by three successive freeze/thaw cycles. Cell lysates were cleared by a 10-min centrifugation at 2,000×g followed by a second 10-min centrifugation at 14,000×g. Supernatants were incubated for 1 h at room temperature in the presence of 10 μg/ml RNase A (Roche) to digest any extraviral or cellular RNA. After addition of 0.5% sodium dodecyl sulfate (SDS) and 2 mM EDTA, virus-containing supernatants were overlaid on a 6-ml sucrose cushion (30% sucrose in Hanks balanced salt solution [HBSS]; Invitrogen, Carlsbad, Calif.). Virus particles were sedimented by ultracentrifugation for 4 h at 28,000 rpm using an SW28 swinging bucket rotor. Supernatants were discarded and centrifuge tubes were rinsed twice with HBSS while leaving the sucrose cushion intact. After removal of the last wash and the sucrose cushion, virus pellets were resuspended in PBS containing 0.2% SDS and 5 mM EDTA. Virus infectious titers were determined by plaque assay/infected-focus assay (see above). Virus particle concentrations were determined with a NanoDrop spectrophotometer (NanoDrop Technologies, Inc., Wilmington, Del.) by measuring the optical density at 260 nm (OD₂₆₀) and calculated using the formula 1 OD₂₆₀ unit=9.4×10¹² particles/ml (Rueckert, 1985). In addition, virion RNA was extracted by three rounds of phenol extraction and one round of chloroform extraction. RNA was ethanol precipitated and resuspended in ultrapure water. RNA purity was confirmed by TAE-buffered agarose gel analysis, and the concentration was determined spectrophotometrically. The total number of genome equivalents in the corresponding virus preparation was calculated from the measured RNA concentration and the molecular weight of the RNA.
Thus, the relative amount of virions per infectious units could be calculated, assuming that one RNase-protected viral genome equivalent corresponds to one virus particle.
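The conversions used in this method can be sketched in Python. The constants come from the text (1 OD₂₆₀ unit = 9.4×10¹² particles/ml and a genomic RNA molecular weight of 2.53×10⁶ g/mol) together with Avogadro's number; the function names and the example readings are hypothetical.

```python
AVOGADRO = 6.02214076e23          # molecules per mol
PARTICLES_PER_OD260_ML = 9.4e12   # 1 OD260 unit = 9.4e12 particles/ml (Rueckert, 1985)
PV_RNA_G_PER_MOL = 2.53e6         # genomic RNA molecular weight used in the text


def particles_per_ml_from_od260(a260, dilution=1.0):
    """Virus particle concentration from the OD260 of a purified preparation."""
    return a260 * dilution * PARTICLES_PER_OD260_ML


def genome_equivalents(rna_mass_g):
    """Genome copies in a mass (grams) of purified virion RNA.

    Assumes one RNase-protected genome corresponds to one virus particle.
    """
    return rna_mass_g / PV_RNA_G_PER_MOL * AVOGADRO


def particles_per_infectious_unit(particles, infectious_units):
    """Particle-to-PFU (or FFU) ratio; the PFU/particle ratio is its inverse."""
    return particles / infectious_units


# Hypothetical readings: A260 = 0.25 and 1 ug of extracted virion RNA.
print(f"{particles_per_ml_from_od260(0.25):.2e} particles/ml")
print(f"{genome_equivalents(1e-6):.2e} genome equivalents")
```

Dividing either particle estimate by the infectious titer of the same preparation gives the particle/PFU ratio that is compared between viruses below.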

Molecular Quantification of Viral Particles: ELISA Method

Nunc MaxiSorp 96-well plates were coated with 10 μg of rabbit anti-PV(M) antibody (Murdin and Wimmer, 1989) in 100 μl PBS for 2 h at 37° C. and an additional 14 h at 4° C.; the plates were then washed three times briefly with 350 μl of PBS and blocked with 350 μl of 10% bovine calf serum (BCS) in PBS for 1 h at 37° C. Following three brief washes with PBS, wells were incubated with 100 μl of virus-containing cell lysates or controls in DMEM plus 2% BCS for 4 h at room temperature. Wells were washed with 350 μl of PBS three times for 5 min each. Wells were then incubated for 4 h at room temperature with 2 μg of CD155-alkaline phosphatase (AP) fusion protein (He et al., 2000) in 100 μl of DMEM-10% BCS. After the last of five washes with PBS, 100 μl of 10 mM Tris, pH 7.5, was added and plates were incubated for 1 h at 65° C. Colorimetric alkaline phosphatase determination was accomplished by addition of 100 μl of 9 mg/ml para-nitrophenylphosphate (in 2 M diethanolamine, 1 mM MgCl₂, pH 9.8). Alkaline phosphatase activity was measured at 405 nm in an enzyme-linked immunosorbent assay (ELISA) plate reader (Molecular Devices, Sunnyvale, Calif.), and virus particle concentrations were calculated from a standard curve prepared in parallel using two-fold serial dilutions of a known concentration of purified PV(M) virus stock.

The PFU/Particle Ratio is Reduced in Codon-Deoptimized Viruses

The extremely poor growth phenotype of PV-AB^2470-2954 in cell culture and its inability to form plaques suggested a defect in cell-to-cell spreading, which may be consistent with a lower specific infectivity of the individual virus particles.

To test this hypothesis, PV(M), PV-AB^755-1513, and PV-AB^2470-2954 virus were purified and the number of virus particles was determined spectrophotometrically. Purified virus preparations were quantified directly by measuring the OD₂₆₀, and particle concentrations were calculated according to the formula 1 OD₂₆₀ unit=9.4×10¹² particles/ml (FIG. 4D) (Rueckert, 1985). Additionally, genomic RNA was extracted from those virions (FIG. 4A) and quantified at OD₂₆₀ (data not shown). The number of virions (1 virion=1 genome) was then determined using a molecular weight of 2.53×10⁶ g/mol for genomic RNA. Specifically, virus was prepared from 4×10⁷ HeLa cells that were infected at an MOI of 0.5 until the appearance of complete CPE, as described above. Both methods of particle determination produced similar results (FIG. 4D). Indeed, it was found that PV(M) and PV-AB^755-1513 produced roughly equal amounts of virions, while PV-AB^2470-2954 produced between ⅓ (by the direct UV method; FIG. 4D) and ⅛ (by the genomic RNA method; data not shown) of the number of virions produced by PV(M). In contrast, the wt virus sample contained approximately 30 times and 3,000 times more infectious units than PV-AB^755-1513 and PV-AB^2470-2954, respectively (FIG. 4D). In addition, capsid proteins of purified virions were resolved by SDS-polyacrylamide gel electrophoresis (PAGE) and visualized by silver staining (FIG. 4B). These data also support the conclusion that PV-AB^2470-2954 and PV-AB^755-1513 produce similar or only slightly reduced amounts of progeny per cell (FIG. 4B, lane 3), while their PFU/particle ratio is reduced. The PFU/particle ratio for a virus can vary significantly depending on the methods used to determine either plaques (cell type for plaque assay and the particular plaque assay technique) or particle count (spectrophotometry or electron microscopy).
A PFU/particle ratio of 1/115 for PV1(M) was determined using the method described herein, which compares well to previous determinations of 1/272 (Joklik and Darnell, 1961) (done on HeLa cells) and 1/87 (Schwerdt and Fogh, 1957) (in primary monkey kidney cells).

Development of a Virion-Specific ELISA

To confirm the reduced PFU/particle ratio observed with codon-deoptimized polioviruses, a novel virion-specific ELISA was developed (FIGS. 4C and E) as a way to determine the physical amount of intact viral particles in a sample rather than the infectious titer, which is a biological variable. The assay is based on a previous observation that the ectodomain of the PV receptor CD155 fused to heat-stable placental alkaline phosphatase (CD155-AP) binds very tightly and specifically to the intact 160S particle (He et al., 2000). Considering that PV 130S particles (A particles) lose their ability to bind CD155 efficiently (Hogle, 2002), it is expected that no other capsid intermediate or capsid subunits would interact with CD155-AP, thus ensuring specificity for intact particles. In support of this notion, lysates from cells that were infected with a vaccinia virus strain expressing the P1 capsid precursor (Ansardi et al., 1993) resulted in no quantifiable signal (data not shown).

The ELISA method allows virus particles to be quantified in a crude sample, such as the cell lysate after infection, which should minimize possible alteration of the PFU/particle ratio by other mechanisms during sample handling and purification (thermal/chemical inactivation, oxidation, degradation, etc.). Under the current conditions, the sensitivity of this assay is approximately 10⁷ viral particles, as there is no signal amplification step involved. The absence of an amplification step also results in an exceptionally low background. With this ELISA, PV particle concentrations could be determined in samples by back-calculation on a standard curve prepared with purified PV(M) of known concentration (FIG. 4E). The particle determinations by ELISA agreed well with results obtained by the direct UV method (FIG. 4D).
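The back-calculation on a standard curve can be sketched in Python. The text does not state which curve model was fitted, so this sketch assumes, as a labelled simplification, that log A405 is linear in log particle number over the range of the two-fold dilutions; the standards, readings, and function names are hypothetical.

```python
import math


def fit_log_log(concentrations, signals):
    """Least-squares line through (log10 particles, log10 A405) points.

    Assumption: the standard curve is a straight line on log-log axes.
    """
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def back_calculate(signal, slope, intercept):
    """Invert the fitted standard curve to estimate particles in a well."""
    return 10 ** ((math.log10(signal) - intercept) / slope)


# Hypothetical standards: two-fold dilutions of purified PV(M) starting at
# 1.6e9 particles/well, with an idealized A405 proportional to particle number.
standards = [1.6e9 / 2 ** i for i in range(7)]
a405 = [s * 1e-9 for s in standards]
slope, intercept = fit_log_log(standards, a405)
print(f"{back_calculate(0.2, slope, intercept):.3g} particles")
```

Unknown samples should be read within the span of the standards (here 2.5×10⁷ to 1.6×10⁹ particles), which also keeps them above the stated sensitivity of about 10⁷ particles.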

Implications of Results

The present study has demonstrated the utility of large-scale codon deoptimization of PV capsid coding sequences by de novo gene synthesis for the generation of attenuated viruses. The initial goal was to explore the potential of this technology as a tool for generating live attenuated virus vaccines. Codon-deoptimized viruses were found to have very low specific infectivity (FIG. 4). Low specific infectivity (that is, a low chance that a single virus particle successfully initiates an infectious cycle in a cell) results in a more slowly spreading virus infection within the host. This in turn allows the host organism more time to mount an immune response and clear the infection, which is a most desirable feature in an attenuated virus vaccine. On the other hand, codon-deoptimized viruses generated similar amounts of progeny per cell as compared with the wild-type virus, while being 2 to 3 orders of magnitude less infectious (FIG. 4). This allows virus particles antigenically indistinguishable from the wt to be produced as effectively and cost-efficiently as the wt virus itself. However, due to the low specific infectivity, the actual handling and processing of such a virus preparation is much safer. There are increasing concerns about producing virulent virus in sufficient quantities under high-containment conditions and about the associated risk of virus escape from the production facility, either by accident or by malicious intent. Viruses as described herein may therefore prove very useful as safer alternatives in the production of inactivated virus vaccines. Since they are 100% identical to the wt virus at the protein level, an identical immune response in hosts who received inactivated virus is guaranteed.

Example 4

Effects of Codon-Deoptimization on Neuropathogenicity of Polioviruses

Mouse Neuropathogenicity Tests

Groups of four to five CD155tg mice (strain Tg21) (Koike et al., 1991) between 6 and 8 weeks of age were injected intracerebrally with virus dilutions between 10² and 10⁶ PFU or focus-forming units (FFU) in 30 μl PBS. Fifty percent lethal dose (LD₅₀) values were calculated by the method of Reed and Muench (1938). Virus titers in spinal cord tissues at the time of death or paralysis were determined by plaque or infected-focus assay.
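Because the Reed and Muench (1938) method is only cited here, a minimal Python sketch of the usual procedure may help: deaths are accumulated from the lowest dose upward and survivors from the highest dose downward, and the 50% end point is interpolated on a log-dose scale between the two doses that bracket 50% mortality. The function name and the dose-response counts below are hypothetical and are not data from this study.

```python
import math


def reed_muench_ld50(doses, dead, alive):
    """Estimate the LD50 by the Reed-Muench method.

    doses must be sorted from highest to lowest; dead and alive are the
    numbers of animals that died or survived in each dose group.
    """
    n = len(doses)
    # An animal that died at a low dose is assumed to die at every higher dose,
    # and a survivor at a high dose is assumed to survive every lower dose.
    cum_dead = [sum(dead[i:]) for i in range(n)]
    cum_alive = [sum(alive[:i + 1]) for i in range(n)]
    mortality = [d / (d + a) for d, a in zip(cum_dead, cum_alive)]

    for i in range(n - 1):
        above, below = mortality[i], mortality[i + 1]
        if above >= 0.5 > below:
            proportionate_distance = (above - 0.5) / (above - below)
            log_hi, log_lo = math.log10(doses[i]), math.log10(doses[i + 1])
            return 10 ** (log_hi - proportionate_distance * (log_hi - log_lo))
    raise ValueError("50% mortality is not bracketed by the doses tested")


# Hypothetical groups of five mice at ten-fold dilutions (PFU per mouse).
ld50 = reed_muench_ld50([1e5, 1e4, 1e3, 1e2], dead=[5, 4, 2, 0], alive=[0, 1, 3, 5])
print(f"LD50 = {ld50:.2g} PFU")
```

The same function applies whether doses are expressed in PFU, FFU, or particles; only the units of the result change.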

Codon-Deoptimized Polioviruses are Neuroattenuated on a Particle Basis in CD155tg Mice

To test the pathogenic potential of the viruses constructed in this study, CD155 transgenic mice (Koike et al., 1991) were injected intracerebrally with PV(M), PV-SD, PV-AB^755-1513, and PV-AB^2470-2954 at doses between 10² and 10⁵ PFU/FFU. Initial results were perplexing: quite counterintuitively, PV-AB^755-1513 and especially PV-AB^2470-2954 were initially found to be as neuropathogenic as the wt virus, or even slightly more so. See Table 4.

TABLE 4

Neuropathogenicity in CD155tg mice.

| Construct | LD₅₀, PFU or FFU^a | LD₅₀, no. of virions^b | Spinal cord titer, PFU or FFU/g^c | Spinal cord titer, no. of virions/g^d |
|---|---|---|---|---|
| PV(M) wt | 3.2 × 10² PFU | 3.7 × 10⁴ | 1.0 × 10⁹ PFU | 1.15 × 10¹¹ |
| PV-AB^755-1513 | 2.6 × 10² PFU | 7.3 × 10⁵ | 3.5 × 10⁷ PFU | 9.8 × 10¹⁰ |
| PV-AB^2470-2954 | 4.6 × 10² PFU | 4.8 × 10⁶ | 3.4 × 10⁶ FFU | 3.57 × 10¹¹ |

^a LD₅₀ expressed as the number of infectious units, as determined by plaque or infectious focus assay, that results in 50% lethality after intracerebral inoculation.

^b LD₅₀ expressed as the number of virus particles, as determined by OD₂₆₀ measurement, that results in 50% lethality after intracerebral inoculation.

^c Virus recovered from the spinal cord of infected mice at the time of death or paralysis; expressed in PFU or FFU/g of tissue, as determined by plaque or infectious focus assay.

^d Virus recovered from the spinal cord of infected mice at the time of death or paralysis, expressed in particles/g of tissue; derived by multiplying the PFU or FFU/g values (footnote c) by the particle/PFU ratio characteristic of each virus (FIG. 4D).

In addition, times of onset of paralysis following infection with PV-AB^755-1513 and PV-AB^2470-2954 were comparable to that of wt virus (data not shown). Similarly confounding was the observation that at the time of death or paralysis, the viral loads, as determined by plaque assay, in the spinal cords of mice infected with PV-AB^755-1513 and PV-AB^2470-2954 were 30- and 300-fold lower, respectively, than those in the mice infected with the wt virus (Table 4). Thus, it seemed unlikely that PV-AB^2470-2954, apparently replicating at only 0.3% of the wt virus, would have the same neuropathogenic potential as the wt. However, once the altered PFU/particle relationship of PV-AB^755-1513 and PV-AB^2470-2954 had been established (see Example 3), the amount of inoculum could be correlated with the actual number of particles inoculated. After performing this correction, it was established that on a particle basis, PV-AB^755-1513 and PV-AB^2470-2954 are 20-fold and 100-fold neuroattenuated, respectively, compared to the wt. See Table 4. Furthermore, on a particle basis the viral loads in the spinal cords of paralyzed mice were very similar with all three viruses (Table 4).
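The particle-based correction is simple arithmetic on the entries of Table 4. The Python sketch below re-derives the fold differences from those values; the dictionary layout and function names are illustrative only.

```python
# LD50 values from Table 4: (infectious units, virus particles).
LD50 = {
    "PV(M) wt": (3.2e2, 3.7e4),
    "PV-AB^755-1513": (2.6e2, 7.3e5),
    "PV-AB^2470-2954": (4.6e2, 4.8e6),
}
# Spinal cord titers at death or paralysis (PFU or FFU per g), Table 4.
SPINAL_TITER = {"PV(M) wt": 1.0e9, "PV-AB^755-1513": 3.5e7, "PV-AB^2470-2954": 3.4e6}


def fold_attenuation(construct, basis):
    """LD50 of a construct divided by the wt LD50 (>1 means attenuated)."""
    column = {"infectious": 0, "particles": 1}[basis]
    return LD50[construct][column] / LD50["PV(M) wt"][column]


def fold_lower_spinal_titer(construct):
    """How many times less infectious virus was recovered than with the wt."""
    return SPINAL_TITER["PV(M) wt"] / SPINAL_TITER[construct]


wt_particles_per_pfu = LD50["PV(M) wt"][1] / LD50["PV(M) wt"][0]
print(f"PV(M): about {wt_particles_per_pfu:.1f} particles per PFU")
for name in ("PV-AB^755-1513", "PV-AB^2470-2954"):
    print(name,
          f"infectious basis: {fold_attenuation(name, 'infectious'):.1f}x,",
          f"particle basis: {fold_attenuation(name, 'particles'):.0f}x,",
          f"spinal cord titer {fold_lower_spinal_titer(name):.0f}x lower")
```

On these numbers the LD₅₀ ratios stay close to 1 on an infectious-unit basis but become roughly 20-fold and 130-fold on a particle basis, which the text rounds to 20-fold and 100-fold, and the wt ratio of about 116 particles per PFU corresponds to the PFU/particle ratio of about 1/115 reported in Example 3.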

It was also concluded that it was not possible to redesign the PV capsid gene with synonymous codons that would specifically discriminate against expression in the central nervous system. This may be because the tissue-specific differences in codon bias described by others (Plotkin et al., 2004) are too small to bring about a tissue-restrictive virus phenotype. In a set of brain-specific genes larger than the one used by Plotkin et al., no appreciable tissue-specific codon bias was detected (data not shown). However, this conclusion should not detract from the fact that polioviruses produced by the method of this invention are indeed neuroattenuated in mice by a factor of up to 100. That is, 100-fold more codon- or codon-pair-deoptimized viral particles are needed to cause the same damage in the central nervous system as the wt virus.

Example 5

Effects of Codon Deoptimization on Genomic Translation of Polioviruses

In Vitro and In Vivo Translation

Two different HeLa cell S10 cytoplasmic extracts were used in this study. A standard extract was prepared by the method of Molla et al. (1991). [³⁵S]methionine-labeled translation products were analyzed by gel autoradiography. The second extract was prepared as described previously (Kaplan and Racaniello, 1988), except that it was not dialyzed and endogenous cellular mRNAs were not removed with micrococcal nuclease. Reactions with the modified extract were not supplemented with exogenous amino acids or tRNAs. Translation products were analyzed by western blotting with anti-2C monoclonal antibody 91.23 (Pfister and Wimmer, 1999). Relative intensities of 2BC bands were determined by a pixel count of the scanned gel image using the NIH-Image 1.62 software. In all cases, translation reactions were programmed with 200 ng of the various in vitro-transcribed viral genomic RNAs.

For analysis of in vivo translation, HeLa cells were transfected with in vitro-transcribed dicistronic replicon RNA as described above. To assess translation independently of RNA replication, transfections were carried out in the presence of 2 mM guanidine hydrochloride. Cells were lysed after 7 h in passive lysis buffer (Promega, Madison, Wis.), followed by a dual firefly (F-Luc) and Renilla (R-Luc) luciferase assay (Promega). Translation efficiency of the second cistron (P1-Fluc-P2-P3 polyprotein) was normalized by dividing it by the Renilla luciferase activity of the first cistron, which is expressed under the control of the Hepatitis C Virus (HCV) internal ribosome entry site (IRES).
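The normalization described above reduces to a ratio of ratios. The following Python sketch assumes one F-Luc and one R-Luc reading per transfection and expresses each P1 cassette relative to the wt cassette; the function names and the light-unit values are hypothetical.

```python
def normalized_translation(fluc, rluc):
    """Second-cistron (F-Luc) signal normalized to the HCV IRES-driven R-Luc."""
    return fluc / rluc


def percent_of_wt(fluc, rluc, fluc_wt, rluc_wt):
    """Translation through a P1 cassette relative to the wt P1 cassette (%)."""
    return (100 * normalized_translation(fluc, rluc)
            / normalized_translation(fluc_wt, rluc_wt))


# Hypothetical relative light units from guanidine-treated transfections.
wt = (5.0e6, 1.0e6)       # (F-Luc, R-Luc) for the wt P1 cassette
mutant = (3.0e6, 0.9e6)   # (F-Luc, R-Luc) for a deoptimized P1 cassette
print(f"{percent_of_wt(*mutant, *wt):.0f}% of wt")
```

Dividing by R-Luc corrects for differences in transfection efficiency and cell number between wells, so that only the effect of the P1 insert on ribosome transit remains in the comparison.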

Codon-Deoptimized Viruses are Deficient at the Level of Genome Translation

Since the synthetic viruses and the wt PV(M) are indistinguishable in their protein makeup and no known RNA-based regulatory elements were altered in the modified RNA genomes, these designs enabled study of the effect of reduced genome translation/replication on attenuation without affecting cell and tissue tropism or immunological properties of the virus. The PV-AB genome was designed under the hypothesis that introduction of many suboptimal codons into the capsid coding sequence should lead to a reduction of genome translation. Since the P1 region is at the N-terminus of the polyprotein, synthesis of all downstream nonstructural proteins is determined by the rate of translation through the P1 region. To test whether in fact translation is affected, in vitro translations were performed (FIG. 5).

Unexpectedly, the initial translations in a standard HeLa cell-based cytoplasmic S10 extract (Molla et al., 1991) showed no difference in translation capacity for any of the genomes tested (FIG. 5A). However, as this translation system is optimized for maximal translation, it includes the exogenous addition of excess amino acids and tRNAs, which could conceivably compensate for the genetically engineered codon bias. Therefore, in vitro translations were repeated with a modified HeLa cell extract, which was not dialyzed and in which cellular mRNAs were not removed by micrococcal nuclease treatment (FIG. 5B). Translations in this extract were performed without the addition of exogenous tRNAs or amino acids. Thus, an environment was created that more closely resembles that of the infected cell, where translation of the PV genomes relies only on cellular supplies while competing for resources with cellular mRNAs. Due to the high background translation from cellular mRNA and the low [³⁵S]Met incorporation rate in the nondialyzed extract, virus-specific translation products were detected by western blotting with anti-2C antibodies (Pfister and Wimmer, 1999). These modified conditions resulted in a dramatic reduction in the translation efficiency of the modified genomes, which correlated with the extent of the deoptimized sequence. Whereas translation of PV-SD was comparable to that of the wt, translation of three noninfectious genomes, PV-AB, PV-AB^1513-3386, and PV-AB^755-2470, was reduced by approximately 90% (FIG. 5B).

Burns et al. (2006) recently reported experiments related to those described herein. These authors altered codon usage to a much more limited extent than in the present study, and none of their mutant viruses expressed a lethal phenotype. Interestingly, Burns et al. determined that translation did not play a major role in the altered phenotypes of their mutant viruses, a conclusion at variance with the data presented herein. It is likely that the in vitro translation assay used by Burns et al. (2006), which employed a nuclease-treated rabbit reticulocyte lysate supplemented with uninfected HeLa cell extract and excess amino acids, explains their failure to detect any significant reduction in translation. Cf. FIG. 5A.

Considering the ultimately artificial nature of the in vitro translation system, the effect of the various capsid designs on translation in cells was also investigated. For this purpose, dicistronic poliovirus reporter replicons were constructed (FIG. 6A) based on a previously reported dicistronic replicon (Zhao and Wimmer, 2001). Various P1 cassettes were inserted immediately upstream of, and in frame with, the firefly luciferase (F-Luc) gene. Thus, the poliovirus IRES drives expression of a single viral polyprotein similar to the one in the viral genome, with the exception of the firefly luciferase protein between the capsid and the 2A^pro proteinase. Expression of the Renilla luciferase (R-Luc) gene under the control of the HCV IRES provides an internal control. All experiments were carried out in the presence of 2 mM guanidine hydrochloride, which completely blocks genome replication (Wimmer et al., 1993). Using this type of construct allowed an accurate determination of the relative expression of the second cistron by calculating the F-Luc/R-Luc ratio. As F-Luc expression depends on successful transit of the ribosome through the upstream P1 region, it provides a measure of the effect of the inserted P1 sequence on the rate of polyprotein translation. Using this method, it was indeed found that the modified capsid coding regions that were associated with a lethal phenotype in the virus background (e.g., PV-AB, PV-AB^1513-2470, and PV-AB^2470-3386) reduced the rate of translation by approximately 80 to 90% (FIG. 6B). Capsids from two viable virus constructs, PV-AB^2470-2954 and PV-AB^2954-3386, allowed translation at 68% and 83% of wt levels, respectively. In vivo translation rates of the first cistron remained constant in all constructs over a time period between 3 and 12 h, suggesting that RNA stability is not affected by the codon alterations (data not shown).
In conclusion, the results of these experiments suggest that poliovirus is extremely dependent on very efficient translation as a relatively small drop in translation efficiency through the P1 region of 30%, as seen in PV-AB^2470-2954, resulted in a severe virus replication phenotype.

Example 6

Genetic Stability of Codon-Deoptimized Polioviruses

Due to the distributed effect of many mutations over large genome segments that contribute to the phenotype, codon-deoptimized viruses should have genetically stable phenotypes. To study the genetic stability of codon-deoptimized viruses, and to test the premise that these viruses are genetically stable, viruses are passaged in suitable host cells. A benefit of the present "death by 1000 cuts" theory of vaccine design is the reduced risk of reversion to wild type. Typical vaccine strains differ by only a few point mutations from the wt viruses, and only a small subset of these may actually contribute to attenuation. Viral evolution quickly works to revert such a small number of active mutations. Indeed, such reversion poses a serious threat to the World Health Organization (WHO) project to eradicate poliovirus from the globe. So long as a live vaccine strain is used, there is a very real chance that this strain will revert to wt. Such reversion has already been observed as the source of new polio outbreaks (Georgescu et al., 1997; Kew et al., 2002; Shimizu et al., 2004).

With hundreds to thousands of point mutations in the present synthetic designs, there is little risk of reversion to wt strains. However, natural selection is powerful, and upon passaging, the synthetic viruses inevitably evolve. Studies are ongoing to determine the end-point of this evolution, but a likely outcome is that they get trapped in a local optimum, not far from the original design.

To validate this theory, representative re-engineered viruses are passaged in a host cell up to 50 times, and the genomes of evolved viruses are sequenced after 10, 20 and 50 passages. More specifically, at least one example chimera from each type of deoptimized virus is chosen. The starting chimera is very debilitated, but not dead. For example, for PV the chimeras could be PV-AB^2470-2954 and PV-Min^755-2470. From each starting virus, ten plaques are chosen. Each of the ten plaque-derived virus populations is bulk passaged a total of 50 times. After the 10th, 20th and 50th passages, ten plaque-purified viruses are again chosen and their genomes are sequenced together with the genomes of the ten parent viruses. After passaging, the fitness of the 40 chosen viruses per starting virus (30 passaged isolates plus the 10 parents) is compared by examining plaque size and by determining PFU/ml in one-step growth kinetics. Select passage isolates are tested for their pathogenicity in appropriate host organisms. For example, the pathogenicity of polioviruses is tested in CD155tg mice.

Upon sequencing of the genomes, a finding that all 10 viral lines have certain mutations in common would suggest that these changes are particularly important for viral fitness. These changes may be compared to the sites identified by toeprinting as the major pause sites (see Example 9); the combination of both kinds of assay may identify the mutant codons that are most detrimental to viral fitness. Conversely, a finding that the different lines all have different mutations would support the view that many of the mutant codon changes are very similar in their effect on fitness. Thus far, after 10 passages in HeLa cells, PV-AB^755-1513 and PV-AB^2470-2954 have not undergone any perceivable gain of fitness. Viral infectious titers remained as low (10⁷ PFU/ml and 10⁶ FFU/ml) as at the beginning of the passage experiment, and the plaque phenotype did not change (data not shown). Sequence analysis of these passaged viruses is now in progress to determine whether, and what kind of, genetic changes occur during passaging.

Burns et al. (2006) reported that their altered codon compositions were largely conserved during 25 serial passages in HeLa cells. They found that whereas the fitness for replication in HeLa cells of both the unmodified Sabin 2 virus and the codon replacement viruses increased with higher passage numbers, the relative fitness of the modified viruses remained lower than that of the unmodified virus. Thus, all indications are that viruses redesigned by SAVE are genetically very stable. Preliminary data for codon- and codon-pair-deoptimized viruses of the invention suggest that less severe codon changes distributed over a larger number of codons improve the genetic stability of the individual virus phenotypes and thus improve their potential for use in vaccines.

Example 7

Re-Engineering of Capsid Region of Polioviruses by Deoptimizing Codon Pairs

Calculation of Codon Pair Bias.

Every individual codon pair of the 3,721 (61 × 61) possible codon pairs that do not contain a STOP codon (e.g., GTT-GCT) carries an assigned "codon pair score," or "CPS," that is specific for a given "training set" of genes. The CPS of a given codon pair is defined as the log ratio of the observed number of occurrences over the number that would have been expected in this set of genes (in this example, the human genome). Determining the actual number of occurrences of a particular codon pair (in other words, the likelihood of a particular amino acid pair being encoded by a particular codon pair) is simply a matter of counting the occurrences of that codon pair in a particular set of coding sequences. Determining the expected number, however, requires additional calculations. The expected number is calculated so as to be independent of both amino acid frequency and codon bias, similarly to Gutman and Hatfield (1989). That is, the expected frequency is calculated based on the relative proportion of the number of times an amino acid is encoded by a specific codon. A positive CPS value signifies that the given codon pair is statistically over-represented, and a negative CPS indicates that the pair is statistically under-represented in the human genome.

To perform these calculations within the human context, the most recent Consensus CDS (CCDS) database of consistently annotated human coding regions, containing a total of 14,795 genes, was used. This data set provided codon and codon-pair frequencies, and thus amino acid and amino-acid-pair frequencies, on a genomic scale.

The paradigm of Fedorov et al. (2002) was used to further enhance the approach of Gutman and Hatfield (1989). This allowed calculation of the expected frequency of a given codon pair independent of codon frequency and of non-random associations of neighboring codons encoding a particular amino acid pair.

$S (P_{ij}) = \ln (\frac{N_{O} (P_{ij})}{N_{E} (P_{ij})}) = \ln (\frac{N_{O} (P_{ij})}{F (C_{i}) F (C_{j}) N_{O} (X_{ij})})$

In the calculation, P_ij is a codon pair occurring with a frequency of N_O(P_ij) in its synonymous group. C_i and C_j are the two codons comprising P_ij, occurring with frequencies F(C_i) and F(C_j) in their synonymous groups, respectively. More explicitly, F(C_i) is the frequency with which the corresponding amino acid X_i is coded by codon C_i throughout all coding regions, and F(C_i)=N_O(C_i)/N_O(X_i), where N_O(C_i) and N_O(X_i) are the observed numbers of occurrences of codon C_i and amino acid X_i, respectively. F(C_j) is calculated accordingly. Further, N_O(X_ij) is the number of occurrences of amino acid pair X_ij throughout all coding regions. The codon pair bias score S(P_ij) of P_ij was calculated as the log-odds ratio of the observed frequency N_O(P_ij) over the expected number of occurrences N_E(P_ij).
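The CPS definition translates directly into counting code. The Python sketch below assumes in-frame, uppercase DNA coding sequences and the standard genetic code, skips any pair that involves a stop codon, and scores only pairs observed in the training set (an unobserved pair would have a score of negative infinity). A one-gene toy training set stands in for the CCDS data, and the function name is illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
GENETIC_CODE = {"".join(c): aa
                for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_pair_scores(training_cds):
    """S(P_ij) = ln(N_O(P_ij) / (F(C_i) F(C_j) N_O(X_ij))) for each observed pair."""
    codon_n, aa_n = Counter(), Counter()
    pair_n, aa_pair_n = Counter(), Counter()
    for seq in training_cds:
        codons = [seq[k:k + 3] for k in range(0, len(seq) - 2, 3)]
        for c in codons:
            if GENETIC_CODE[c] != "*":
                codon_n[c] += 1                   # N_O(C_i)
                aa_n[GENETIC_CODE[c]] += 1        # N_O(X_i)
        for a, b in zip(codons, codons[1:]):
            aa_pair = GENETIC_CODE[a] + GENETIC_CODE[b]
            if "*" not in aa_pair:                # only the 3721 sense pairs
                pair_n[a + b] += 1                # N_O(P_ij)
                aa_pair_n[aa_pair] += 1           # N_O(X_ij)

    scores = {}
    for pair, observed in pair_n.items():
        ci, cj = pair[:3], pair[3:]
        f_i = codon_n[ci] / aa_n[GENETIC_CODE[ci]]   # F(C_i)
        f_j = codon_n[cj] / aa_n[GENETIC_CODE[cj]]   # F(C_j)
        expected = f_i * f_j * aa_pair_n[GENETIC_CODE[ci] + GENETIC_CODE[cj]]
        scores[pair] = math.log(observed / expected)  # N_O / N_E
    return scores


# Hypothetical one-gene "training set"; a real one would be the CCDS genes.
scores = codon_pair_scores(["GCTAAAGCCAAG"])
print({pair: round(s, 3) for pair, s in scores.items()})
```

In the toy set, GCT-AAA occurs once where 0.5 × 0.5 × 2 = 0.5 occurrences are expected, giving a positive (over-represented) score of ln 2.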

Using the formula above, it was then determined whether individual codon pairs in individual coding sequences are over- or under-represented when compared to the corresponding genomic N_e(P_ij) values that were calculated by using the entire human CCDS data set. This calculation resulted in positive S(P_ij) score values for over-represented and negative values for under-represented codon pairs in the human coding regions (FIG. 7).

The “combined” codon pair bias of an individual coding sequence was calculated by averaging all codon pair scores according to the following formula:

$\mathrm{CPB} = \sum_{l = 1}^{k - 1} \frac{S (P_{ij})_{l}}{k - 1} .$

The codon pair bias of an entire coding region is thus calculated by adding all of the individual codon pair scores comprising the region and dividing this sum by the number of codon pairs, which is k − 1 for a coding sequence of k codons.
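Averaging codon pair scores over a coding sequence can be sketched in a few lines of Python. The score table here is a hypothetical two-entry dictionary; in practice it would hold the 3,721 human scores, and the function name is illustrative.

```python
def codon_pair_bias(seq, cps):
    """Mean codon pair score over the k - 1 adjacent pairs of a k-codon ORF."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if len(codons) < 2:
        raise ValueError("need at least two codons")
    pair_scores = [cps[a + b] for a, b in zip(codons, codons[1:])]
    return sum(pair_scores) / (len(codons) - 1)


# Hypothetical scores for the two pairs in a three-codon toy sequence.
toy_cps = {"GCTAAA": 0.2, "AAAGCC": -0.4}
print(codon_pair_bias("GCTAAAGCC", toy_cps))
```

Because the sum is divided by the number of pairs, CPB values of sequences with different lengths (for example, a full P1 region and a sub-fragment) remain directly comparable.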

Changing of Codon Pair Bias.

The capsid-coding region of PV(M) was re-engineered to change codon pair bias. The largest possible number of rarely used codon pairs (creating virus PV-Min) or the largest possible number of widely used codon pairs (creating virus PV-Max) was introduced, while preserving the codon bias and all other features of the wt virus genome. The following explains our method in detail.

Two sequences were designed to vary the poliovirus P1 region codon pair score in the positive (PV-Max; SEQ ID NO:4) and negative (PV-Min; SEQ ID NO:5) directions. By leaving the amino acid sequence unaltered and the codon bias minimally modified, a simulated annealing algorithm was used for shuffling codons, with the optimization goal of a minimum or maximum codon pair score for the P1 capsid region. The resulting sequences were processed for elimination of splice sites and reduction of localized secondary structures. These sequences were then synthesized by a commercial vendor, Blue Heron Biotechnology, and sequence-verified. The new capsid genes were used to replace the equivalent wt sequence in an infectious cDNA clone of wt PV via two PflMI restriction sites. Virus was derived as described in Example 1.
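The text does not specify the move set, temperature schedule, or score table used for the simulated annealing, so the following Python code is only a minimal sketch of one way such a search over synonymous codon arrangements can work. Its assumptions, also noted in the comments, are: each move swaps two codons that encode the same amino acid at different positions, which keeps the protein sequence and the overall codon usage exactly fixed (stricter than the "minimally modified" codon bias described above); the temperature decreases linearly; worse moves are accepted with probability exp(-Δ/T); and `cps` is a caller-supplied codon pair score table in which missing pairs score 0 purely for convenience. The later steps of eliminating splice sites and reducing secondary structure are not modeled, and all names are hypothetical.

```python
import math
import random


def cpb(codons, cps):
    """Mean codon pair score; pairs absent from `cps` score 0 (assumption)."""
    pairs = list(zip(codons, codons[1:]))
    return sum(cps.get(a + b, 0.0) for a, b in pairs) / len(pairs)


def anneal_codon_pairs(codons, aa_of, cps, maximize=False,
                       steps=20000, t0=0.5, seed=0):
    """Shuffle synonymous codons to lower (PV-Min-like) or raise (PV-Max-like) CPB."""
    rng = random.Random(seed)
    sign = -1.0 if maximize else 1.0          # always minimize sign * CPB
    current = list(codons)
    energy = sign * cpb(current, cps)
    best, best_energy = list(current), energy

    # Positions grouped by encoded amino acid; swaps stay within a group.
    groups = {}
    for pos, codon in enumerate(current):
        groups.setdefault(aa_of[codon], []).append(pos)
    swappable = [g for g in groups.values() if len(g) > 1]
    if not swappable:
        return best

    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        i, j = rng.sample(rng.choice(swappable), 2)
        if current[i] == current[j]:
            continue
        current[i], current[j] = current[j], current[i]
        new_energy = sign * cpb(current, cps)  # full recompute, O(n) per step
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy               # accept the move
            if energy < best_energy:
                best, best_energy = list(current), energy
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo swap
    return best


# Hypothetical two-amino-acid toy problem.
aa_of = {"GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K"}
toy_cps = {"GCTAAA": 1.0, "GCCAAG": 1.0, "GCTAAG": -1.0, "GCCAAA": -1.0}
start = ["GCT", "AAA", "GCC", "AAG", "GCT", "AAG", "GCC", "AAA"]
low = anneal_codon_pairs(start, aa_of, toy_cps, steps=2000)
high = anneal_codon_pairs(start, aa_of, toy_cps, maximize=True, steps=2000)
print(cpb(start, toy_cps), cpb(low, toy_cps), cpb(high, toy_cps))
```

Recomputing the whole score after each swap keeps the sketch short; for a capsid-length sequence, a practical version would update only the (at most four) codon pairs adjacent to the two swapped positions.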

For the PV-Max virus, death of infected cells was seen after 24 h, a result similar to that obtained with wt virus. Maximal viral titer and one-step growth kinetics of PV-Max were also identical to the wt. In contrast, no cell death resulted in cells transfected with PV-Min mutant RNA and no viable virus could be recovered. The transfections were repeated multiple times with the same result. Lysates of PV-Min transfected cells were subjected to four successive blind passages, and still no virus was obtained.

The capsid region of PV-Min was divided into two smaller sub-fragments (PV-Min^755-2470 and PV-Min^2470-3386), as had been done for PV-AB (poor codon bias), and the sub-fragments were cloned into the wt background. As with the PV-AB subclones, subclones of PV-Min were very sick, but not dead (FIG. 8). As observed with PV-AB viruses, the phenotype of PV-Min viruses results from reduced specific infectivity of the viral particles rather than from lower production of progeny virus. Ongoing studies involve testing the codon pair-attenuated chimeras in CD155tg mice to determine their pathogenicity. Additional chimeric viruses comprising subclones of PV-Min cDNAs are also being made, and their ability to replicate is being determined (see Examples 8 and 9 below). The effect of distributing intermediate amounts of codon pair bias over a longer sequence is also being confirmed. For example, a poliovirus derivative is designed to have a codon pair bias of about −0.2 (PV-0.2; SEQ ID NO:6), with the mutations from wild type distributed over the full length of the P1 capsid region. By contrast, PV-MinZ (PV-Min^2470-3386) has a similar codon pair bias, but its codon changes are distributed over a shorter sequence.

It is worth pointing out that PV-Min and PV-0.2 are sequences in which there is little change in codon usage relative to wild type. For the most part, the sequences employ the same codons that appear in the wild type PV(M) virus. PV-MinZ is somewhat different in that it contains a portion of PV-Min subcloned into PV(M). As with PV-Min and PV-0.2, the encoded protein sequence is unchanged, but codon usage as determined in either the subcloned region, or over the entire P1 capsid region, is not identical to PV-Min (or PV-0.2), because only a portion of the codon rearranged sequence (which has identical codons over its full length, but not within smaller segments) has been substituted into the PV(M) wild type sequence. Of course, a mutated capsid sequence could be designed to have a codon pair bias over the entire P1 gene while shuffling codons only in the region from nucleotides 2470-3386.
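The distinction drawn here can be checked directly by counting codons. In the minimal Python sketch below, a toy sequence and a codon-shuffled version of it share identical codon usage over their full length but not within a sub-segment. A chimera built from part of each matches neither. The sequences and the junction position are hypothetical stand-ins for wild-type P1, PV-Min, and PV-MinZ.

```python
from collections import Counter


def codon_counts(seq: str) -> Counter:
    """Count the codons of an in-frame sequence (a trailing partial codon is ignored)."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))


def same_codon_usage(a: str, b: str) -> bool:
    """True if two sequences use exactly the same multiset of codons."""
    return codon_counts(a) == codon_counts(b)


# Hypothetical toy sequences; both encode Met-Ala-Leu-Ala-Leu.
wt = "ATGGCTCTGGCCTTA"        # ATG GCT CTG GCC TTA
shuffled = "ATGGCCCTGGCTTTA"  # same codons, Ala codons swapped between positions 2 and 4
cut = 9                       # codon-aligned junction for a hypothetical subclone
chimera = wt[:cut] + shuffled[cut:]

print(same_codon_usage(wt, shuffled))              # True: full-length usage is identical
print(same_codon_usage(wt[:cut], shuffled[:cut]))  # False: usage differs in a segment
print(same_codon_usage(wt, chimera))               # False: the chimera matches neither
```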

Example 8

Viruses Constructed by a Change of Codon-Pair Bias are Attenuated in CD155 tg Mice

Mice Intracerebral Injections, Survival

To test the attenuation of PV-Min^755-2470 and PV-Min^2470-3385 in an animal model, these viruses were purified and injected intracerebrally into CD155 (PVR/poliovirus receptor) transgenic mice (see Table 5). These viruses showed a significantly attenuated phenotype. PVM-wt was not injected at higher doses because all mice challenged with 10⁵ virions of PVM-wt died. The attenuated phenotype is due to the customization of codon pair bias using our algorithm. This reaffirms that customization of codon pair bias is a viable means of creating live vaccines.

TABLE 5

Mice Intracerebral Injections, Survival (mice surviving/mice injected).

Virus               10⁴ virions   10⁵ virions   10⁶ virions   10⁷ virions
PV-Min^755-2470     4/4           3/4           3/5           3/4
PV-Min^2470-3385    4/4           4/4           5/5           3/4
PVM-wt              3/4           0/4           —             —

These findings are significant in two respects. First, they are the first clear experimental evidence that codon pair bias is functionally important, i.e., that a deleterious phenotype can be generated by disturbing codon pair bias. Second, they provide an additional dimension of synonymous codon changes that can be used to attenuate a virus. The in vivo pathogenicity of these codon pair-attenuated chimeras has been tested in CD155tg mice, and they show an attenuated phenotype (see Table 5). Additional chimeric viruses comprising subclones of PV-Min capsid cDNAs have been assayed for replication in infected cells and have also shown an attenuated phenotype.

Example 9

Construction of Synthetic Poliovirus with Altered Codon-Pair Bias: Implications for Vaccine Development

Calculation of Codon Pair Bias, Implementation of Algorithm to Produce Codon Pair Deoptimized Sequences.

We developed an algorithm to quantify codon pair bias. Every possible individual codon pair was given a “codon pair score”, or “CPS”. We define the CPS as the natural log of the ratio of the observed over the expected number of occurrences of each codon pair over all human coding regions.

$\mathrm{CPS} = \ln\left(\frac{F(AB)_{O}}{\frac{F(A) \times F(B)}{F(X) \times F(Y)} \times F(XY)}\right)$

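Calculating the observed number of occurrences of a particular codon pair is straightforward: it is the actual count within the gene set. The expected number of occurrences, however, requires additional calculation. This expected number is calculated to be independent both of amino acid frequency and of codon bias, similar to Gutman and Hatfield. That is, the expected frequency is based on the relative proportion of the number of times an amino acid is encoded by a specific codon.

The terms of the formula are as follows. Codon pair AB encodes amino acid pair XY (codon A encodes amino acid X and codon B encodes amino acid Y). F(AB)_O is the observed number of occurrences of the codon pair, and F denotes the number of occurrences of each codon, amino acid, or amino acid pair. The expected count is therefore the number of XY amino acid pairs, multiplied by the fraction of X encoded by A and by the fraction of Y encoded by B.

A positive CPS value signifies that the given codon pair is statistically over-represented in the human genome, and a negative CPS indicates that the pair is statistically under-represented.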
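A minimal Python sketch of this calculation from raw counts follows. It assumes in-frame coding sequences, the standard genetic code, and counting of adjacent codon pairs within each gene. The toy two-gene input is hypothetical; the document's scores are computed over all annotated human coding regions.

```python
import math
from collections import Counter
from typing import Dict, Iterable

# Standard genetic code, built from the usual TCAG-ordered compact table.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_pair_scores(genes: Iterable[str]) -> Dict[str, float]:
    """CPS = ln(observed / expected) for every codon pair seen in `genes`."""
    codon_n, aa_n = Counter(), Counter()        # F(A), F(X)
    pair_n, aa_pair_n = Counter(), Counter()    # F(AB)_O, F(XY)
    for gene in genes:
        cs = [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]
        codon_n.update(cs)
        aa_n.update(CODE[c] for c in cs)
        for a, b in zip(cs, cs[1:]):
            pair_n[a + b] += 1
            aa_pair_n[CODE[a] + CODE[b]] += 1
    scores = {}
    for ab, observed in pair_n.items():
        a, b = ab[:3], ab[3:]
        x, y = CODE[a], CODE[b]
        # Expected count: F(XY) scaled by the share of X encoded by A and of Y by B.
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[x + y]
        scores[ab] = math.log(observed / expected)
    return scores


# Hypothetical toy reference set of two Ala-Ala "genes".
print(codon_pair_scores(["GCTGCT", "GCCGCC"]))
```

Pairs never observed in the reference set get no entry here, because their logarithm would be −∞. A real score table needs an explicit policy for such pairs.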

Using these calculated CPSs, any coding region can then be rated as using over- or under-represented codon pairs by taking the average of the codon pair scores, thus giving a Codon Pair Bias (CPB) for the entire gene.

$\mathrm{CPB} = \sum_{i=1}^{k} \frac{\mathrm{CPS}_{i}}{k-1}$

The CPB has been calculated for all annotated human genes using the equations shown and plotted (FIG. 7). Each point in the graph corresponds to the CPB of a single human gene. The peak of the distribution has a positive codon pair bias of 0.07, which is the mean score for all annotated human genes. Also there are very few genes with a negative codon pair bias. Equations established to define and calculate CPB were then used to manipulate this bias.

Development and Implementation of Computer-Based Algorithm to Produce Codon Pair Deoptimized Sequences.

Using these formulas, we next developed a computer-based algorithm to manipulate the CPB of any coding region while maintaining the original amino acid sequence. The algorithm has the critical ability to maintain the codon usage of a gene (i.e., preserve the frequency of use of each existing codon) but "shuffle" the existing codons so that the CPB can be increased or decreased. The algorithm uses simulated annealing, a mathematical process suitable for full-length optimization (Park, S. et al., 2004).

Other parameters are also under the control of this algorithm, for instance the free energy of folding of the RNA. This free energy is maintained within a narrow range to prevent large changes in secondary structure as a consequence of codon rearrangement. The optimization process specifically excludes the creation of any regions with large secondary structures, such as hairpins or stem loops, which could otherwise arise in the customized RNA. With this software, the user simply inputs the cDNA sequence of a given gene, and the CPB of the gene can be customized as the experimenter sees fit.
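The shuffling step can be illustrated with a minimal simulated-annealing sketch in Python. It is not the authors' program. It optimizes CPB alone and omits the folding free energy and secondary-structure constraints described above. Its linear cooling schedule, pair-score table, temperature, and step count are all assumptions. Its only move swaps two different codons that encode the same amino acid, which preserves both the protein sequence and the overall codon usage while the CPB changes.

```python
import math
import random
from typing import Dict, List

# Standard genetic code, built from the usual TCAG-ordered compact table.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def cpb(cs: List[str], cps: Dict[str, float]) -> float:
    """Mean pair score over adjacent codons; unscored pairs count as 0.0."""
    return sum(cps.get(a + b, 0.0) for a, b in zip(cs, cs[1:])) / (len(cs) - 1)


def anneal_cpb(seq: str, cps: Dict[str, float], maximize: bool = False,
               steps: int = 20000, t0: float = 0.05, seed: int = 0) -> str:
    """Shuffle synonymous codons to raise (maximize=True) or lower the CPB.

    Every move swaps two different codons that encode the same amino acid,
    so the protein and the overall codon usage are both preserved exactly.
    The best arrangement visited is returned.
    """
    rng = random.Random(seed)
    cs = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    sign = 1.0 if maximize else -1.0            # always climb sign * CPB
    current = sign * cpb(cs, cps)
    best, best_cs = current, cs[:]
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9    # linear cooling schedule
        i, j = rng.sample(range(len(cs)), 2)
        if cs[i] == cs[j] or CODE[cs[i]] != CODE[cs[j]]:
            continue                             # not a synonymous swap
        cs[i], cs[j] = cs[j], cs[i]
        new = sign * cpb(cs, cps)
        if new >= current or rng.random() < math.exp((new - current) / t):
            current = new                        # accept the move
            if current > best:
                best, best_cs = current, cs[:]
        else:
            cs[i], cs[j] = cs[j], cs[i]          # reject: undo the swap
    return "".join(best_cs)


# Hypothetical toy Ala-Leu-Ala-Leu sequence and a single favored pair.
print(anneal_cpb("GCCCTGGCTTTA", {"GCCTTA": 1.0}, maximize=True, steps=2000))
```

Recomputing the full CPB after each swap keeps the sketch short. Because a swap changes at most four adjacent pairs, a practical implementation would update the score incrementally.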

De Novo Synthesis of P1 Encoded by Either Over-Represented or Under-Represented Codon-Pairs.

To obtain novel, synthetic poliovirus with its P1 region encoded by either over-represented or under-represented codon pairs, we entered the DNA sequence corresponding to the P1 structural region of poliovirus type I Mahoney (PV(M)-wt) into our program. This yielded two sequences:

PV-Max-P1, which uses over-represented codon pairs (566 mutations).

PV-Min-P1, which uses under-represented codon pairs (631 mutations).

The CPB scores of these customized, novel synthetic P1 regions are PV-Max = +0.25 and PV-Min = −0.48, whereas the CPB of PV(M)-wt is −0.02 (FIG. 7).

Additional customization included restriction sites designed into both synthetic sequences at given intervals to allow sub-cloning of the P1 region. These synthetic P1 fragments were synthesized de novo by Blue Heron Corp. and incorporated into a full-length cDNA construct of poliovirus (FIG. 11) (Karlin et al., 1994). A small fragment (3 codons, 9 nucleotides) of PV(M)-wt sequence was left after the AUG start codon in both constructs so that translation would initiate equally for all synthetic viruses. This provides a more accurate measurement of the effect of CPB on the elongation phase of translation.

DNA Synthesis, Plasmids, Sub Cloning of Synthetic Capsids and Bacteria.

Large codon pair-altered PV cDNA fragments, corresponding to nucleotides 495 to 3636 of the PV genome, were synthesized by Blue Heron Corp. using their proprietary GeneMaker® system (http://www.blueheronbio.com/). All subsequent poliovirus cDNA clones and subclones were constructed from the PV1(M) cDNA clone pT7PVM using unique restriction sites (van der Werf et al., 1986).

The full-length PV-Min and PV-Max cassettes were released from Blue Heron's carrier vector by PflMI digestion and inserted into the pT7PVM vector from which its PflMI fragment had been removed. The PV-MinXY and PV-MinZ constructs were obtained by simultaneous digestion with NheI and BglII, followed by swapping this fragment into a similarly digested pT7PVM vector. PV-MinXY and PV-MinZ were constructed via BsmI digestion and exchange of the fragment/vector with similarly digested pT7PVM. PV-MinY was constructed by digesting the PV-MinXY construct with BsmI and swapping this fragment with the BsmI fragment of a similarly digested pT7PVM. Plasmid transformation and amplification were all performed in Escherichia coli DH5α.

Creation of Chimeric Viruses Containing CPB-Altered Capsid Regions: Under-Represented Codon Pair Bias Throughout the P1 Results in a Null Phenotype.

Positive-sense RNA was transcribed using the T7 RNA polymerase promoter upstream of the poliovirus genomic sequence. For each construct, 1.5 μg of plasmid cDNA was linearized by EcoRI digestion. It was then transcribed into RNA by T7 RNA polymerase (Stratagene), driven by its promoter upstream of the cDNA, for 2 hours at 37° C. (van der Werf et al., 1986). This RNA was transfected into 1×10⁶ HeLa R19 cells using a modified DEAE-dextran method (van der Werf et al., 1986). The cells were then incubated at room temperature (RT) for 30 minutes. The transfection supernatant was removed, and Dulbecco's modified Eagle medium (DMEM) containing 2% bovine calf serum (BCS) was added. The cells were then incubated at 37° C. and observed for up to 4 days for the onset of cytopathic effect (CPE).

The PV-Max RNA transfection produced 90% cytopathic effect (CPE) in 24 hours, which is comparable to the transfection of PV(M)-wt RNA. The PV-Max virus generated plaques identical in size to the wild type. In contrast, the PV-Min RNA produced no visible cytopathic effect after 96 hours, and no viable virus could be isolated even after four blind passages of the supernatant from transfected cells.

Supernatant from cells transfected with PV-Max RNA also produced 95% CPE in 12 hours when passaged onto fresh cells. This indicates that the transfected genomic material successfully produced PV-Max poliovirus virions. In contrast, PV-Min viral RNA yielded no visible CPE after 96 hours. Four blind passages of the supernatant, which might have contained extremely low levels of virus, also did not produce CPE. Therefore, the full-length PV-Min synthetic sequence, which uses under-represented codon pairs throughout the P1 region, cannot generate viable virus, and portions of it would need to be sub-cloned.

HeLa R19 cells were maintained as monolayers in DMEM containing 10% BCS. Virus was amplified on HeLa R19 monolayers (1.0×10⁸ cells) at an MOI of 1. Infected cells were incubated at 37° C. in DMEM with 2% BCS for three days or until CPE was observed. After three freeze/thaw cycles, cell debris was removed from the lysates by low-speed centrifugation, and the supernatant containing virus was used for further experiments.

One-step growth curves were obtained by infecting a monolayer of HeLa R19 cells with a given virus at an MOI of 5. The inoculum was removed, and the cells were washed twice with PBS and then incubated at 37° C. for 0, 2, 4, 7, 10, 24, and 48 hours. These time points were then analyzed by plaque assay.

All plaque assays were performed on monolayers of HeLa R19 cells. The cells were infected with serial dilutions of a given growth-curve time point or of purified virus. They were then overlaid with 0.6% tragacanth gum in Modified Eagle Medium containing 2% BCS and incubated at 37° C. Incubation lasted 2 days for PV(M)-wt and PV-Max, or 3 days for the PV-Min (X, Y, XY, or Z) viruses. Plaques were then developed by crystal violet staining, and the PFU/ml titer was calculated by counting visible plaques.
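Converting a plaque count into a titer is standard arithmetic: titer (PFU/ml) equals the plaques counted divided by the product of the dilution and the volume plated. The source does not state the inoculum volume, so the 0.1 ml below is an assumption, as is the example plaque count.

```python
def pfu_per_ml(plaques: int, dilution: float, inoculum_ml: float) -> float:
    """Titer from one countable well: plaques / (dilution x volume plated in ml)."""
    if dilution <= 0 or inoculum_ml <= 0:
        raise ValueError("dilution and inoculum volume must be positive")
    return plaques / (dilution * inoculum_ml)


# Hypothetical well: 60 plaques from 0.1 ml (assumed volume) of a 10^-7 dilution.
titer = pfu_per_ml(60, 1e-7, 0.1)
print(f"{titer:.2e} PFU/ml")
```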

Small Regions of Under-Represented Codon Pair Bias Rescue Viability but Attenuate the Virus.

Using the restriction sites designed within the PV-Min sequence, we subcloned portions of the PV-Min P1 region into an otherwise wild-type virus. This produced chimeric viruses in which only sub-regions of P1 had poor codon pair bias (FIG. 11) (van der Werf et al., 1986). RNA was produced from each of these subclones by in vitro transcription and transfected into HeLa R19 cells, yielding viruses with varying degrees of attenuation (viability scores, FIG. 11). P1 fragments X and Y each cause only slight attenuation. Combined, however, they yield a virus (PV-Min^755-2470, PV-MinXY) that is substantially attenuated (FIGS. 3, 4). Virus PV-Min^2470-3385 (PV-MinZ) is about as attenuated as PV-MinXY. Construct PV-Min^1513-3385 (YZ) did not yield plaques and so is apparently too attenuated to yield viable virus. The viable constructs, which displayed varying degrees of attenuation, were further investigated to determine their actual growth kinetics.

One-Step Growth Kinetics and the Mechanism of Attenuation: Specific Infectivity is Reduced.

For each viable construct, one-step growth kinetics were examined. These kinetics are generally similar to those of wild type in that they proceed in the same basic manner (i.e., an eclipse phase followed by rapid, logarithmic growth). However, for all PV-Min constructs, the final titer in plaque-forming units (PFU) was typically one to three orders of magnitude lower than that of wild-type virus (FIG. 12A).

When virus is measured in viral particles per ml (FIG. 12B) instead of PFU, a different result is obtained. It suggests that these viruses produce nearly as many particles per cell per cycle of infection as the wild-type virus. In terms of viral particles per ml, the most attenuated viruses are only 78% (PV-MinXY) or 82% (PV-MinZ) attenuated, which on a log scale is less than one order of magnitude. These viruses therefore appear to be attenuated by about two orders of magnitude in their specific infectivity (the number of virions required to generate a plaque).
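The claim that a 78% or 82% reduction is "less than one order of magnitude" follows from converting the remaining fraction to a base-10 logarithm. A minimal Python check, using only the percentages stated above:

```python
import math


def log10_reduction(percent_attenuated: float) -> float:
    """Log10 drop implied by a percentage reduction (78% -> 22% of wt remains)."""
    return -math.log10(1 - percent_attenuated / 100)


for name, pct in [("PV-MinXY", 78), ("PV-MinZ", 82)]:
    print(name, round(log10_reduction(pct), 2))
```

Both values fall below 1 log. Subtracting this sub-log particle deficit from the one-to-three-log PFU deficit leaves most of the attenuation in specific infectivity.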

To confirm that specific infectivity was reduced, we re-measured the ratio of viral particles per PFU using highly purified virus particles. Selected viruses were amplified on 10⁸ HeLa R19 cells. Viral lysates were treated with RNase A to destroy exposed viral genomes and any cellular RNAs, which would otherwise obscure OD values. The viral lysates were then incubated for 1 hour with 0.2% SDS and 2 mM EDTA to denature cellular and non-virion viral proteins. A properly folded and formed poliovirus capsid survives this harsh SDS treatment, whereas A particles do not (Mueller et al., 2005). Virions from these treated lysates were then purified by ultracentrifugation over a sucrose gradient. The virus particle concentration was measured by optical density at 260 nm using the formula 9.4×10¹² particles/ml = 1 OD₂₆₀ unit (Rueckert, 1985). A similar number of particles was produced for each of the four viruses (Table 6). A plaque assay was then performed using these purified virions. Again, PV-MinXY and PV-MinZ required many more viral particles than wild type to generate a plaque (Table 6).
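The particle and specific-infectivity arithmetic can be reproduced in a few lines of Python. This minimal sketch uses the conversion factor stated above and the A₂₆₀ and PFU/ml values reported in Table 6 for three of the purified preparations. The function names are illustrative only.

```python
PARTICLES_PER_OD260 = 9.4e12  # particles/ml per OD260 unit (Rueckert, 1985)


def particles_per_ml(a260: float) -> float:
    """Particle concentration from absorbance at 260 nm."""
    return a260 * PARTICLES_PER_OD260


def particles_per_pfu(a260: float, pfu_per_ml: float) -> float:
    """Inverse of specific infectivity: particles needed per plaque-forming unit."""
    return particles_per_ml(a260) / pfu_per_ml


# A260 and PFU/ml values as reported in Table 6.
for name, a260, pfu in [("PV-Max", 0.842, 6.0e10),
                        ("PV-MinXY", 0.944, 9.6e8),
                        ("PV-MinZ", 0.731, 5.1e8)]:
    print(f"{name}: {particles_per_ml(a260):.3g} particles/ml, "
          f"1 PFU per {particles_per_pfu(a260, pfu):,.0f} particles")
```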

For wild-type virus, the specific infectivity was calculated to be 1 PFU per 137 particles (Table 6), consistent with the literature (Mueller et al., 2006; Schwerdt and Fogh, 1957; Joklik and Darnell, 1961). The specific infectivities of viruses PV-MinXY and PV-MinZ are in the vicinity of 1 PFU per 10,000 particles (Table 6).

The heat stability of the synthetic viruses was also compared to that of PV(M)-wt. This tested the SDS-treatment finding that particles containing portions of novel RNA are as stable as wild-type particles. Indeed, these synthetic viruses had the same temperature profile as PV(M)-wt when incubated at 50° C. and quantified over a time course (data not shown).

Under-Represented Codon Pairs Reduce Translation Efficiency, Whereas Over-Represented Pairs Enhance Translation.

One hypothesis for the existence of codon pair bias is that the utilization of under-represented pairs causes poor or slow translation rates. Our synthetic viruses are, to our knowledge, the first molecules containing a high concentration of under-represented codon pairs, and as such are the first molecules suitable for a test of the translation hypothesis.

To measure the effect of codon pair bias on translation, we used a dicistronic reporter (Mueller et al., 2006) (FIG. 13). The first cistron expresses Renilla luciferase (R-Luc) under the control of the hepatitis C virus internal ribosome entry site (IRES) and is used as a normalization control. The second cistron expresses firefly luciferase (F-Luc) under the control of the poliovirus IRES. However, in this second cistron, the F-Luc is preceded by the P1 region of poliovirus, and this P1 region could be encoded by any of the synthetic sequence variants described here. Because F-Luc is translated as a fusion protein with the proteins of the P1 region, the translatability of the P1 region directly affects the amount of F-Luc protein produced. Thus the ratio of F-Luc luminescence to R-Luc luminescence is a measure of the translatability of the various P1 encodings.

The P1 regions of wild-type, PV-Max, PV-Min, PV-MinXY and PV-MinZ were inserted into the region labeled “P1” (FIG. 13A). PV-MinXY, PV-MinZ, and PV-Min produce much less F-Luc per unit of R-Luc than does the wild-type P1 region, strongly suggesting that the under-represented codon pairs are causing poor or slow translation rates (FIG. 13). In contrast, PV-Max P1 (which uses over-represented codon pairs) produced more F-Luc per unit of R-Luc, suggesting translation is actually better for PV-Max P1 compared to PV(M)-wt P1.

Dicistronic Reporter Construction, and In Vivo Translation.

The dicistronic reporter constructs were all based on pdiLuc-PV (Mueller et al., 2006). PV-Max and PV-Min capsid regions were amplified by PCR using the oligonucleotides P1max-2A-RI (+)/P1max-2A-RI (−) or P1min-2A-RI (+)/P1min-2A-RI (−), respectively. The PCR fragment was gel purified and inserted into the intermediate vector pCR®-XL-TOPO® (Invitrogen). This intermediate vector was then amplified in One Shot® TOP10 chemically competent cells.

After plasmid preparation by Qiagen miniprep, the intermediate vectors containing PV-Min were digested with EcoRI. These fragments were ligated into the pdiLuc-PV vector, which had been similarly digested with EcoRI (Mueller et al., 2006). These plasmids were also amplified in One Shot® TOP10 chemically competent cells (Invitrogen).

To construct pdiLuc-PV-MinXY and pdiLuc-PV-MinZ, pdiLuc-PV and pdiLuc-PV-Min were similarly digested with NheI, and the resulting restriction fragments were exchanged between the respective vectors. These were then transformed into One Shot® TOP10 chemically competent cells and amplified. RNA was transcribed from all four of these clones by the T7 polymerase method (van der Werf et al., 1986).

To analyze the in vivo translation efficiency of the synthetic capsids, RNA of the dicistronic reporter constructs was transfected into 2×10⁵ HeLa R19 cells in 12-well dishes using Lipofectamine 2000 (Invitrogen). To quantify translation of input RNA only, transfection was carried out in the presence of 2 mM guanidine hydrochloride (GuHCl). Six hours after transfection, cells were lysed with passive lysis buffer (Promega). The lysates were then analyzed by a dual firefly (F-Luc)/Renilla (R-Luc) luciferase assay (Promega).

Genetic Stability of PV-MinXY and PV-MinZ.

PV-MinXY and PV-MinZ each contain hundreds of mutations (407 and 224, respectively), and each mutation causes a minuscule decrease in overall codon pair bias. We therefore believe it should be very difficult for these viruses to revert to wild-type virulence. As a direct test of this idea, viruses PV-MinXY and PV-MinZ were each serially passaged 15 times at an MOI of 0.5. The titer was monitored for phenotypic reversion, and the sequence of the passaged virus was monitored for reversions or other mutations. After 15 passages, there was no phenotypic change in the viruses (i.e., the same titer and induction of CPE), and there were no fixed mutations in the synthetic region.

Heat Stability and Passaging.

The stability of the synthetic viruses PV-MinXY and PV-MinZ was tested and compared to that of PV(M)-wt. Samples of 1×10⁸ particles suspended in PBS were heated to 50° C. for 60 minutes, and the decrease in intact viral particles was measured by plaque assay at 5, 15, 30, and 60 minutes (FIG. 14).

To test the genetic stability of the synthetic portions of the P1 region, viruses PV-MinXY and PV-MinZ were serially passaged. A monolayer of 1×10⁶ HeLa R19 cells was infected with PV-MinXY or PV-MinZ at an MOI of 0.5 and observed until CPE was induced. Once CPE began, the lysates were used to infect new monolayers of HeLa R19 cells; the time to onset of CPE remained constant throughout the passages. The titer and sequence were monitored at passages 5, 9, and 15 (data not shown).

Virus Purification and Determination of Viral Particles Via OD₂₆₀ Absorbance.

A monolayer of HeLa R19 cells on a 15 cm dish (1×10⁸ cells) was infected with PV(M)-wt, PV-Max, PV-MinXY, or PV-MinZ until CPE was observed. After three freeze/thaw cycles, the cell lysates were subjected to two initial centrifugations: 3,000×g for 15 minutes, then 10,000×g for 15 minutes. RNase A (Roche) was then added to the supernatant at 10 μg/ml and incubated at RT for 1 hour. Subsequently, 0.5% sodium dodecyl sulfate (SDS) and 2 mM EDTA were added to the supernatant, gently mixed, and incubated at RT for 30 minutes. These supernatants containing virus particles were layered above a 6 ml sucrose cushion [30% sucrose in Hank's Buffered Salt Solution (HBSS)]. Virus particles were sedimented by ultracentrifugation through the sucrose cushion for 3.5 hours at 28,000 rpm using an SW28 swing-bucket rotor.

After centrifugation, the supernatant was removed with the sucrose cushion left intact, and the tube was washed twice with HBSS. After washing, the sucrose was removed and the virus "pearl" was resuspended in PBS containing 0.1% SDS. Viral titers were determined by plaque assay (above). Virus particle concentration was determined as the average of three measurements of optical density at 260 nm on a NanoDrop spectrophotometer (NanoDrop Technologies), using the formula 9.4×10¹² particles/ml = 1 OD₂₆₀ unit (Mueller et al., 2006; Rueckert, 1985).

Neuroattenuation of PV-MinXY and PV-MinZ in CD155tg Mice.

The primary sites of infection of wild-type poliovirus are the oropharynx and gut, but this infection is relatively asymptomatic. However, in about 1% of PV(M)-wt infections the virus spreads to motor neurons in the CNS. There it destroys these neurons, causing death or the acute flaccid paralysis known as poliomyelitis (Landsteiner and Popper, 1909; Mueller et al., 2005). Because motor neurons and the CNS are the critical targets of poliovirus, we wished to know whether the synthetic viruses were attenuated in these tissues. These viruses were therefore administered by intracerebral injection to CD155tg mice, which are transgenic mice expressing the poliovirus receptor (Koike et al., 1991). The PLD₅₀ value was calculated for each virus (Reed and Muench, 1938). The PV-MinXY and PV-MinZ viruses were attenuated approximately 1,000-fold based on particles, or 10-fold based on PFU (Table 6). Because these viruses display neuroattenuation, they are candidates for use as vaccines.

TABLE 6

Reduced Specific Infectivity and Neuroattenuation in CD155tg mice.

Virus      Purified A₂₆₀   Purified Particles/ml^a   PFU/ml        Specific Infectivity^b   PLD₅₀ (Particles)^c   PLD₅₀ (PFU)^d
PV-M(wt)   0.956           8.97 × 10¹²               6.0 × 10¹⁰    1/137                    10^4.0                10^1.9
PV-Max     0.842           7.92 × 10¹²               6.0 × 10¹⁰    1/132                    10^4.1                10^1.9
PV-MinXY   0.944           8.87 × 10¹²               9.6 × 10⁸     1/9,200                  10^7.1                10^3.2
PV-MinZ    0.731           6.87 × 10¹²               5.1 × 10⁸     1/13,500                 10^7.3                10^3.2

^a The A₂₆₀ was used to determine particles/ml via the formula 9.4 × 10¹² particles/ml = 1 OD₂₆₀ unit.

^b Calculated by dividing the PFU/ml of purified virus by the particles/ml.

^c,d Calculated after administration of virus via intracerebral injection to CD155tg mice at varying doses.

Vaccination of CD155tg Mice Provides Immunity and Protection Against Lethal Challenge.

Groups of 4 to 6 CD155tg mice (Tg21 strain), 6 to 8 weeks old, were injected intracerebrally with purified virus dilutions from 10² to 10⁹ particles in 30 μl PBS to determine neuropathogenicity (Koike et al., 1991).

The 50% lethal dose (LD₅₀) was calculated by the Reed and Muench method (Reed and Muench, 1938). Viral titers in the spinal cord and brain were quantified by plaque assay (data not shown).
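The Reed and Muench calculation can be sketched in Python as follows. This is a minimal version of the textbook method and is not taken from the source. It assumes that an animal dying at a given dose would also die at any higher dose, and that a survivor would also survive any lower dose. The 50% endpoint is then interpolated between the two doses that bracket it. The dose groups in the example are hypothetical and are not data from Table 6.

```python
from typing import Sequence


def reed_muench_log_ld50(log_doses: Sequence[float], dead: Sequence[int],
                         total: Sequence[int]) -> float:
    """Log10 of the 50% endpoint dose by the Reed and Muench (1938) method."""
    # Order the dose groups from highest to lowest dose.
    order = sorted(range(len(log_doses)), key=lambda i: log_doses[i], reverse=True)
    x = [log_doses[i] for i in order]
    d = [dead[i] for i in order]
    s = [total[i] - dead[i] for i in order]
    n = len(x)
    cum_dead = [sum(d[i:]) for i in range(n)]       # this dose and all lower doses
    cum_alive = [sum(s[:i + 1]) for i in range(n)]  # this dose and all higher doses
    mortality = [cd / (cd + ca) for cd, ca in zip(cum_dead, cum_alive)]
    for i in range(n - 1):
        if mortality[i] >= 0.5 > mortality[i + 1]:
            # Proportionate distance between the two doses that bracket 50%.
            pd = (mortality[i] - 0.5) / (mortality[i] - mortality[i + 1])
            return x[i] - pd * (x[i] - x[i + 1])
    raise ValueError("the dose series does not bracket the 50% endpoint")


# Hypothetical groups of four mice at 10^3 to 10^6 particles.
log_ld50 = reed_muench_log_ld50([3, 4, 5, 6], dead=[0, 1, 3, 4], total=[4, 4, 4, 4])
print(f"LD50 = 10^{log_ld50:.1f} particles")  # prints: LD50 = 10^4.5 particles
```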

PV-MinZ and PV-MinXY encode exactly the same proteins as wild-type virus but are attenuated in two respects: reduced specific infectivity and neuroattenuation.

To test PV-MinZ and PV-MinXY as vaccines, three sub-lethal doses (10⁸ particles) of each virus were administered in 100 μl of PBS to groups of eight 6- to 8-week-old CD155tg mice. The doses were given by intraperitoneal injection once a week for three weeks. One mouse from each vaccine cohort did not complete the vaccine regimen due to illness. A set of control mice received three mock vaccinations with 100 μl PBS.

Approximately one week after the final vaccination, 30 μl of blood was drawn from the tail vein. This blood was subjected to low-speed centrifugation, and the serum was harvested. Seroconversion against PV(M)-wt was analyzed by micro-neutralization assay with 100 plaque-forming units (PFU) of challenge virus, performed according to WHO recommendations (Toyoda et al., 2007; Wahby, A. F., 2000).

Two weeks after the final vaccination, the vaccinated and control mice were challenged with a lethal dose of PV(M)-wt: an intramuscular injection of 10⁶ PFU in 100 μl of PBS (Toyoda et al., 2007). All experiments using CD155tg mice were undertaken in compliance with Stony Brook University's IACUC regulations as well as federal guidelines. All 14 vaccinated mice survived and showed no signs of paralysis or paresis; in contrast, all mock-vaccinated mice died (Table 7). These data suggest that the CPB virus using deoptimized codon pairs is indeed able to immunize against the wild-type virus, providing both a robust humoral response and complete survival following challenge.

TABLE 7

Protection Against Lethal Challenge

Virus^a            Mice Protected (out of 7)^b
PV-MinZ            7
PV-MinXY           7
Mock vaccinated    0

^a CD155tg mice received three vaccination doses (10⁸ particles) of the respective virus.

^b Challenged with 10⁶ PFU of PV(M)-wt via intramuscular injection.

Example 10

Application of SAVE to Influenza Virus

Influenza virus has 8 separate genomic segments. GenBank deposits disclosing the segment sequences for Influenza A virus (A/Puerto Rico/8/34/Mount Sinai (H1N1)) include AF389115 (segment 1, Polymerase PB2), AF389116 (segment 2, Polymerase PB1), AF389117 (segment 3, Polymerase PA), AF389118 (segment 4, hemagglutinin HA), AF389119 (segment 5, nucleoprotein NP), AF389120 (segment 6, neuraminidase NA), AF389121 (segment 7, matrix proteins M1 and M2), and AF389122 (segment 8, nonstructural protein NS1).

In initial studies, the genomic segment of strain A/PR/8/34 (also referred to herein as A/PR8) encoding the nucleoprotein NP was chosen for deoptimization (see Table 8, below, for parent and deoptimized sequences). NP is a major structural protein and the second most abundant protein of the virion (1,000 copies per particle). It binds as a monomer to full-length viral RNAs to form coiled ribonucleoprotein. Moreover, NP is involved in the crucial switch from mRNA to template and virion RNA synthesis (Palese and Shaw, 2007).

Two synonymous encodings were synthesized. The first replaces frequently used codons with rare synonymous codons (NP^CD; i.e., de-optimized codon bias), and the second de-optimizes codon pairs (NP^CPmin). The terminal 120 nucleotides at either end of the segment were not altered, so as not to interfere with replication and encapsidation. NP^CD contains 338 silent mutations, and NP^CPmin (SEQ ID NO:23) contains 314 silent mutations.

The mutant NP segments were introduced into ambisense vectors as described (below). Each was co-transfected, together with the other seven wt influenza plasmids, into 293T/MDCK co-cultured cells. As a control, cells were transfected with all 8 wt A/PR8 plasmids. Cells transfected with the NP^CD segment or the NP^CPmin segment produced viable influenza virus, as did cells transfected with wild-type NP. The new de-optimized viruses, referred to as A/PR8-NP^CD and A/PR8-NP^CPmin, respectively, appear to be attenuated. Their titer (in terms of PFU) is 3- to 10-fold lower than that of the wild-type virus, and both mutant viruses make small plaques.

The de-optimized influenza viruses are not as severely attenuated as a poliovirus containing a similar number of de-optimized codons. This may reflect a difference in the translational strategies of the two viruses. Poliovirus has a single long mRNA, translated into a single polyprotein. Slow translation through the beginning of this long mRNA (as in our capsid de-optimized viruses) will reduce translation of the entire message and thus affect all proteins. In contrast, influenza has eight separate segments, and de-optimization of one will have little if any effect on translation of the others. Moreover, expression of the NP protein is particularly favored early in influenza virus infection (Palese and Shaw, 2007).

Characterization of Influenza Virus Carrying a Codon Pair Deoptimized NP Segment

The growth characteristics of A/PR8-NP^CPmin were analyzed by infecting confluent monolayers of Madin-Darby canine kidney (MDCK) cells in 100 mm dishes at a multiplicity of infection (MOI) of 0.001. Virus inocula were allowed to adsorb at room temperature for 30 minutes on a rocking platform. The cells were then supplemented with 10 ml of Dulbecco's Modified Eagle Medium (DMEM) containing 0.2% bovine serum albumin (BSA) and 2 μg/ml TPCK-treated trypsin, and incubated at 37° C. After 0, 3, 6, 9, 12, 24, and 48 hours, 100 μl of virus-containing medium was removed, and virus titers were determined by plaque assay.

Viral titers and plaque phenotypes were determined by plaque assay on confluent monolayers of MDCK cells in 35 mm six-well plates. Ten-fold serial dilutions of virus were prepared in DMEM containing 0.2% BSA and 2 μg/ml TPCK-treated trypsin. Virus dilutions were plated on MDCK cells and allowed to adsorb at room temperature for 30 minutes on a rocking platform, followed by a one-hour incubation at 37° C. in a cell culture incubator. The inoculum was then removed, and 3 ml of Minimal Eagle Medium containing 0.6% tragacanth gum (Sigma-Aldrich), 0.2% BSA, and 2 μg/ml TPCK-treated trypsin was added. After 72 hours of incubation at 37° C., plaques were visualized by staining the wells with crystal violet.

A/PR8-NP^Min produced viable virus that formed smaller plaques on MDCK cells than A/PR8 wt (FIG. 16A). Furthermore, upon low-MOI infection, A/PR8-NP^Min shows delayed growth kinetics between 3 and 12 hours post infection, when its titers lag 1.5 logs behind those of A/PR8 (FIG. 16B). Final titers were 3- to 5-fold lower than that of A/PR8 (average of three different experiments).

Characterization of Influenza Viruses A/PR8-PB1^Min-RR, A/PR8-HA^Min and A/PR8-HA^Min/NP^Min Carrying Codon Pair Deoptimized PB1, HA, or HA and NP Segments

Codon pair de-optimized genomic segments of strain A/PR/8/34 encoding the hemagglutinin protein HA and the polymerase subunit PB1 were produced. HA is a viral structural protein that protrudes from the viral surface and mediates receptor attachment and virus entry. PB1 is a crucial component of the viral RNA replication machinery. Specifically, a synonymous encoding of PB1 (SEQ ID NO:15) was synthesized by de-optimizing codon pairs between codons 190-488 (nucleotides 531-1488 of the PB1 segment) while retaining the wildtype codon usage (PB1^Min-RR). Segment PB1^Min-RR contains 236 silent mutations compared to the wt PB1 segment.

A second synonymous encoding, of HA (SEQ ID NO:21), was synthesized by de-optimizing codon pairs between codons 50-541 (nucleotides 180-1655 of the HA segment) while retaining the wildtype codon usage (HA^Min). HA^Min contains 355 silent mutations compared to the wt HA segment.

The mutant PB1^Min-RR and HA^Min segments were introduced into an ambisense vector as described above and, together with the other seven wt influenza plasmids, co-transfected into 293T/MDCK co-cultured cells. In addition, the HA^Min segment was co-transfected together with the NP^Min segment and the remaining six wt plasmids. As a control, cells were transfected with all 8 wt A/PR8 plasmids. Cells transfected with either the PB1^Min-RR or the HA^Min segment produced viable virus, as did cells transfected with the combination of the codon pair deoptimized segments HA^Min and NP^Min. The new de-optimized viruses are referred to as A/PR8-PB1^Min-RR, A/PR8-HA^Min, and A/PR8-HA^Min/NP^Min, respectively.

Growth characteristics and plaque phenotypes were assessed as described above.

A/PR8-PB1^Min-RR, A/PR8-HA^Min, and A/PR8-HA^Min/NP^Min all produced viable virus. A/PR8-PB1^Min-RR and A/PR8-HA^Min/NP^Min produced smaller plaques on MDCK cells than A/PR8 wt (FIG. 17A). Furthermore, upon low-MOI infection of MDCK cells, A/PR8-HA^Min and A/PR8-HA^Min/NP^Min display markedly reduced growth kinetics, especially from 3-12 h post infection, when A/PR8-HA^Min/NP^Min titers lag 1 to 2 orders of magnitude behind those of A/PR8 (FIG. 17B). Final titers for both A/PR8-HA^Min and A/PR8-HA^Min/NP^Min were 10-fold lower than that of A/PR8. As A/PR8-HA^Min/NP^Min is more severely growth retarded than A/PR8-HA^Min, it can be concluded that the effect of deoptimizing two segments is additive.

Attenuation of A/PR8-NP^Min in a BALB/c Mouse Model

Groups of 6-8 anesthetized 6-week-old BALB/c mice were given 12.5 μl of A/PR8 or A/PR8-NP^Min virus solution in each nostril, using 10-fold serial dilutions containing between 10² and 10⁶ PFU of virus. Mortality and morbidity (weight loss, reduced activity, death) were monitored. The 50% lethal dose (LD₅₀) was calculated by the method of Reed and Muench (Reed, L. J., and M. Muench. 1938. Am. J. Hyg. 27:493-497).
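The Reed and Muench calculation accumulates deaths from the lowest dose upward and survivors from the highest dose downward. It then interpolates the 50% point on a log-dose scale between the two doses that bracket it. The following is a minimal Python sketch of that calculation, assuming groups are listed in ascending dose order. The function name and the example counts are hypothetical and do not reproduce the data behind FIG. 18.

```python
import math


def reed_muench_ld50(doses, dead, alive):
    """LD50 by the Reed and Muench method.

    doses: dose per group in ascending order (e.g., PFU).
    dead, alive: number of animals that died or survived in each group.
    """
    n = len(doses)
    # An animal dying at a low dose is assumed to die at every higher dose,
    # and a survivor at a high dose is assumed to survive every lower dose.
    cum_dead = [sum(dead[:i + 1]) for i in range(n)]
    cum_alive = [sum(alive[i:]) for i in range(n)]
    pct = [100.0 * d / (d + a) for d, a in zip(cum_dead, cum_alive)]
    for i in range(n - 1):
        if pct[i] < 50.0 <= pct[i + 1]:
            # Proportionate distance between the two bracketing doses
            fraction = (50.0 - pct[i]) / (pct[i + 1] - pct[i])
            lo, hi = math.log10(doses[i]), math.log10(doses[i + 1])
            return 10 ** (lo + fraction * (hi - lo))
    raise ValueError("50% mortality is not bracketed by the tested doses")
```

Consider a hypothetical example with 8 mice per group at 10², 10³, 10⁴, and 10⁵ PFU, and 0, 2, 6, and 8 deaths. The cumulative mortality is then 0%, 20%, 80%, and 100%, and the sketch returns 10^3.5, or about 3.2×10³ PFU.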

Eight mice were vaccinated once by intranasal inoculation with 10² PFU of A/PR8-NP^Min virus. A control group of 6 mice was not vaccinated with any virus (mock). Twenty-eight days after this initial vaccination, the mice were challenged with a lethal dose of the wt virus A/PR8 corresponding to 100 times the LD₅₀.

The LD₅₀ for A/PR8 was 4.6×10¹ PFU, while the LD₅₀ for A/PR8-NP^Min was 1×10³ PFU. At a dose of 10² PFU, all A/PR8-NP^Min-infected mice survived. It can be concluded that A/PR8-NP^Min is attenuated in mice by more than 10-fold compared to the wt A/PR8 virus (the ratio of the two LD₅₀ values is about 22). This dose was thus chosen for vaccination experiments. Vaccination of mice with 10² PFU of A/PR8-NP^Min resulted in a mild and brief illness, as indicated by a relative weight loss of less than 10% (FIG. 18A). All 8 vaccinated mice survived. Mice infected with A/PR8 at the same dose experienced rapid weight loss with severe disease; 6 of 8 mice infected with A/PR8 died between 10 and 13 days post infection (FIG. 18B). Two mice survived and recovered from the wildtype infection.

Upon challenge with 100 times the LD₅₀ of wt virus, all A/PR8-NP^Min-vaccinated mice were protected and survived the challenge without disease symptoms or weight loss (FIG. 18C). Mock-vaccinated mice, on the other hand, showed severe symptoms and succumbed to the infection between 9 and 11 days after challenge. It can be concluded that A/PR8-NP^Min induced protective immunity in mice and thus has potential as a live attenuated influenza vaccine. Other viruses, such as A/PR8-PB1^Min-RR and A/PR8-HA^Min/NP^Min, have yet to be tested in mice and may further improve the beneficial properties of codon-pair deoptimized influenza viruses as vaccines.

Example 11

Development of Higher-Throughput Methods for Making and Characterizing Viral Chimeras

Constructing Chimeric Viruses by Overlapping PCR

The “scan” through each attenuated mutant virus is performed by placing approximately 300-bp fragments from each mutant virus into a wt context using overlap PCR. Any given 300-bp segment overlaps the preceding segment by ~200 bp. In other words, the scanning window is ~300 bp long but moves forward by ~100 bp for each new chimeric virus. Thus, scanning through one mutant virus (where only the ~3000 bp of the capsid region has been altered) requires about 30 chimeric viruses. The scan is performed in 96-well format, which has ample capacity to analyze two viruses simultaneously.
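The tiling arithmetic can be made explicit with a short Python sketch. It assumes the altered region is numbered 1 to 3000 and treats windows as 1-based, inclusive intervals. The function name is hypothetical.

```python
def scan_windows(region_start, region_end, window=300, step=100):
    """Tile [region_start, region_end] (1-based, inclusive) with sliding windows.

    Consecutive windows overlap by window - step bases; the final window is
    clipped at region_end if the tiling does not land on it exactly.
    """
    windows = []
    start = region_start
    while True:
        end = min(start + window - 1, region_end)
        windows.append((start, end))
        if end >= region_end:
            return windows
        start += step
```

For a 3000-bp region, this yields 28 windows, consistent with the estimate of about 30 chimeras. Calling it with a 30-bp window and a 10-bp step over a single 300-bp segment gives the higher-resolution tiling discussed later in this example.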

The starting material is picogram amounts of two plasmids, one containing the sequence of the wt virus and the other the sequence of the mutant virus. The plasmids include all the necessary elements for the PV reverse genetics system (van der Werf et al., 1986), including the T7 RNA polymerase promoter, the hammerhead ribozyme (Herold and Aldino, 2000), and the DNA-encoded poly(A) tail. Three pairs of PCR primers are used: the A, M (for Mutant), and B pairs. See FIG. 9. The M pair amplifies the desired 300 bp segment of the mutant virus. It does not amplify wt, because the M primer pairs are designed based on sequences that have been significantly altered in the mutant. The A and B pairs amplify the desired flanks of the wt viral genome. Importantly, about 20-25 bp of overlap sequence is built into the 5′ ends of each M primer, as well as of primers A2 and B1. These 20-25 bp overlaps are 100% complementary to the 3′ end of the A segment and the 5′ end of the B segment, respectively.
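The overlap tails on the M primers can be illustrated with a minimal Python sketch. It assumes the wt and mutant sequences are aligned and of equal length, as expected for synonymous recoding. It uses 0-based, half-open window coordinates and returns both primers written 5′ to 3′. The function names and the default annealing and tail lengths are illustrative choices; only the 20-25 bp overlap range comes from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def m_primer_pair(wt, mutant, start, end, anneal=20, overlap=22):
    """Design M primers for the mutant window [start, end) (0-based, half-open).

    Each primer anneals to `anneal` nt of mutant sequence and carries an
    `overlap`-nt 5' tail copied from the flanking wt sequence.
    """
    if len(wt) != len(mutant):
        raise ValueError("wt and mutant sequences must be aligned and equal in length")
    if start - overlap < 0 or end + overlap > len(wt) or end - start < anneal:
        raise ValueError("window too close to the sequence ends or too short")
    # Forward: wt sequence just upstream of the window, then the mutant 5' end
    forward = wt[start - overlap:start] + mutant[start:start + anneal]
    # Reverse: reverse complement of (mutant 3' end + wt sequence just downstream)
    reverse = reverse_complement(mutant[end - anneal:end] + wt[end:end + overlap])
    return forward, reverse
```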

To carry out the overlapping PCR, one 96-well dish contains wt plasmid DNA and the 30 different A and B primer pairs in 30 different wells. A separate but matching 96-well plate contains mutant plasmid DNA and the 30 different M primer pairs. PCR is carried out with a highly processive, low-error-rate, heat-stable polymerase. After the first round of PCR, each reaction is treated with DpnI, which destroys the template plasmid by cutting at methylated GmATC sites. An aliquot from each wt and matching mutant reaction is then mixed in PCR reaction buffer in a third 96-well dish. This time, primers flanking the entire construct are used (i.e., the A1 and B2 primers). Each segment (A, M, and B) is designed to overlap each adjacent segment by at least 20 bp, and the reaction is driven by primers that can only amplify a full-length product. As a result, the segments anneal and mutually extend, yielding full-length product after two or three cycles. This is a “3-tube” (three 96-well dish) design that may be compacted to a “1-tube” (one 96-well dish) design.
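In silico, the expected product of the second-round reaction can be previewed by merging the A, M, and B fragments at their shared terminal sequence. The Python sketch below does this by exact string matching of the longest suffix/prefix overlap of at least a minimum length. It is a hypothetical planning aid that models only the expected product sequence, not annealing or polymerase behavior. The function names are illustrative.

```python
def join_by_overlap(left, right, min_overlap=20):
    """Merge two fragments at their longest exact suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments do not share a sufficient terminal overlap")


def assemble(fragments, min_overlap=20):
    """Join fragments left to right (e.g., A, M, B) into one expected product."""
    product = fragments[0]
    for fragment in fragments[1:]:
        product = join_by_overlap(product, fragment, min_overlap)
    return product
```

With real 300-bp segments, the default 20-nt minimum matches the stated design. The small test uses a shorter minimum only so the example stays readable.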

Characterization of Chimeric Viruses

Upon incubation with T7 RNA polymerase, the full-length linear chimeric DNA genomes produced above, which carry all needed upstream and downstream regulatory elements, yield active viral RNA. This RNA produces viral particles upon incubation in HeLa S10 cell extract (Molla et al., 1991) or upon transfection into HeLa cells. Alternatively, the DNA constructs can be transfected directly into HeLa cells expressing T7 RNA polymerase in the cytoplasm.

The functionality of each chimeric virus is then assayed using a variety of relatively high-throughput assays, including visual inspection of the cells to assess virus-induced CPE in 96-well format; estimation of virus production using an ELISA; quantitative measurement of growth kinetics of equal amounts of viral particles inoculated into cells in a series of 96-well plates; and measurement of specific infectivity (infectious units/particle [IU/P] ratio).

First, the cells may be visually inspected using a microscope for virus-induced CPE (cell death) in 96-well format. This can also be run as an automated 96-well assay using a vital dye. However, visual inspection of a 96-well plate for CPE requires less than an hour of hands-on time, which is fast enough for most purposes.

Second, 3 to 4 days after transfection, virus production may be assayed using the ELISA method described in Example 3. Alternatively, the particle titer is determined using sandwich ELISA with capsid-specific antibodies. These assays allow the identification of non-viable constructs (no viral particles), poorly replicating constructs (few particles), and efficiently replicating constructs (many particles), and quantification of these effects.

Third, for a more quantitative determination, equal amounts of viral particles as determined above are inoculated into a series of fresh 96-well plates for measuring growth kinetics. At various times (0, 2, 4, 6, 8, 12, 24, 48, 72 h after infection), one 96-well plate is removed and subjected to cycles of freeze-thawing to liberate cell-associated virus. The number of viral particles produced from each construct at each time is determined by ELISA as above.

Fourth, the IU/P ratio can be measured (see Example 3).

Higher Resolution Scans

If the lethality of the viruses is due to many small defects spread through the capsid region, as the preliminary data indicate, then many or most of the chimeras are expected to be sick and only a few non-viable. If this is the case, higher-resolution scans are probably not necessary. Conversely, one or more of the 300 bp segments may cause lethality. This is possible for the codon-deoptimized virus in the segment between 1513 and 2470, which, as described below, may carry a translational frameshift signal that contributes to the strong phenotype of this segment. In that case, the genome scan is repeated at higher resolution, for instance with a 30 bp window moving 10 bp between constructs over the 300-bp segment, followed by phenotypic analysis. A 30-bp scan does not involve PCR of the mutant virus; instead, the altered 30-bp segment is designed directly into PCR primers for the wt virus. This allows the changes responsible for lethality to be pinpointed.

Example 12

Ongoing Investigations into the Molecular Mechanisms Underlying SAVE

Choice of Chimeras

Two to four example chimeras from each of the two parental inviable viruses (i.e., 4 to 8 total viruses) are used in the following experiments. Viable chimeras having relatively small segments of mutant DNA but strong phenotypes are selected. For instance, viruses PV-AB^755-1513, PV-AB^2470-2954, and PV-AB^2954-3386 from the deoptimized codon virus (see Example 1), and PV-Min^755-2470 and PV-Min^2470-3386 (see Example 7), are suitable. Even better starting chimeras, with smaller inserts that will make analysis easier, may also be obtained from the experiments described above (Example 8).

RNA Abundance/Stability

Conceivably, the altered genome sequence destabilizes the viral RNA. Such destabilization could be a direct effect of the novel sequence, or an indirect effect of a pause in translation or another defect in translation (see, e.g., Doma and Parker, 2006). The abundance of the mutant viral RNA is therefore examined. Equal amounts of RNA from chimeric mutant virus and wt virus are mixed and transfected into HeLa cells. Samples are taken after 2, 4, 8, and 12 h and analyzed by Northern blotting or quantitative PCR for the two different viral RNAs. These are easily distinguishable, since they differ at hundreds of nucleotides. A control comparing wt viral RNA with PV-SD (the codon-shuffled virus with a wt phenotype) is also done. A reduced ratio of mutant to wt viral RNA indicates that the chimera has a destabilized RNA.
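For the quantitative PCR readout, one common way to convert threshold cycles (Ct) into a mutant-to-wt ratio is the comparative Ct approach. The Python sketch below assumes both amplicons amplify with the same efficiency (a doubling per cycle by default). It normalizes each time point to the earliest sample. The function names and the Ct values in the example are hypothetical.

```python
def mutant_to_wt_ratio(ct_mutant, ct_wt, efficiency=2.0):
    """Relative abundance of mutant vs wt RNA from qPCR threshold cycles.

    A lower Ct means more template, so each cycle of difference is one
    factor of `efficiency` (2.0 assumes perfect doubling per cycle).
    """
    return efficiency ** (ct_wt - ct_mutant)


def ratio_time_course(ct_by_time, efficiency=2.0):
    """Map {time_h: (ct_mutant, ct_wt)} to ratios relative to the earliest time."""
    times = sorted(ct_by_time)
    raw = {t: mutant_to_wt_ratio(*ct_by_time[t], efficiency=efficiency) for t in times}
    baseline = raw[times[0]]
    return {t: raw[t] / baseline for t in times}
```

A ratio that falls steadily over the time course, as in the example values in the check below, would be read as reduced stability of the mutant RNA relative to wt.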

In Vitro Translation

Translation was shown to be reduced for the codon-deoptimized virus and some of its derivatives. See Example 5. In vitro translation experiments are repeated with the codon pair-deoptimized virus (PV-Min) and its chosen chimeras. There is currently no good theory, much less any evidence, as to why deoptimized codon pairs should lead to viral inviability, and hence, investigating the effect on translation may help illuminate the underlying mechanism.

In vitro translations were performed in two kinds of extracts in Example 5. One was a “souped up” extract (Molla et al., 1991), in which even the codon-deoptimized viruses gave apparently good translation. The other was an extract more closely approximating normal in vivo conditions, in which the deoptimized-codon viruses were inefficiently translated. There were four differences between the extracts. The more “native” extract was not dialyzed; endogenous cellular mRNAs were not destroyed with micrococcal nuclease; the extract was not supplemented with exogenous amino acids; and the extract was not supplemented with exogenous tRNA. In the present study, these four parameters are altered one at a time (or in pairs, as necessary) to see which have the most significant effect on translation. For instance, suppose that adding amino acids and tRNA is what allows translation of the codon-deoptimized virus. Such a finding would strongly support the hypothesis that translation is inefficient simply because rare aminoacyl-tRNAs are limiting. It would also be important from the point of view of extending the SAVE approach to other kinds of viruses.

Translational Frameshifting

Another possible defect is that codon changes could promote translational frameshifting. That is, at some codon pairs, the ribosome could shift into a different reading frame and then arrive at an in-frame stop codon after translating a spurious peptide sequence. This type of frameshifting is an important regulatory event in some viruses. The present data reveal that all PV genomes carrying the AB mutant segment from residue 1513 to 2470 are non-viable. Furthermore, all genomes carrying this mutant region produce a novel protein band of approximately 42-44 kDa during in vitro translation (see FIG. 5A, marked by asterisk). This novel protein could be the result of a frameshift.

Examination of the sequence in the 1513-2470 interval reveals three potential candidate sites that conform to the slippery heptameric consensus sequence for −1 frameshifting in eukaryotes (X-XXY-YYZ) (Farabaugh, 1996). These sites are A-AAA-AAT at positions 1885 and 1948, and T-TTA-TTT at position 2119. They are followed by stop codons in the −1 frame at 1929, 1986 or 2149, respectively. The former two seem the more likely candidates to produce a band of the observed size.
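A candidate-site search of this kind is easy to automate. The Python sketch below scans a coding sequence for X-XXY-YYZ heptamers whose XXY triplet lies in the zero frame. It then reports the first stop codon in the −1 frame that follows each heptamer. It assumes a commonly used reading of the motif in which YYY is AAA or UUU (TTT) and Z is not G. A looser or stricter reading would flag a different set of sites. Positions are 1-based and refer to the first X, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def slippery_sites(seq, frame_start=0):
    """Find in-frame X-XXY-YYZ heptamers and the first -1-frame stop after each.

    frame_start is the 0-based index of the first nucleotide of a zero-frame
    codon. Returns (heptamer_pos, heptamer, stop_pos) tuples with 1-based
    positions; stop_pos is None if no -1-frame stop occurs before the end.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 6):
        h = seq[i:i + 7]
        if (i + 1 - frame_start) % 3 != 0:            # XXY must be a zero-frame codon
            continue
        if not (h[0] == h[1] == h[2]):                # X-XX: three identical nucleotides
            continue
        if not (h[3] == h[4] == h[5] and h[3] in "AT"):  # Y-YY: AAA or TTT
            continue
        if h[6] == "G":                               # Z assumed to be A, C, or T
            continue
        stop_pos = None
        # After a -1 slip, decoding resumes with the codon that begins at Z.
        for j in range(i + 6, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                stop_pos = j + 1
                break
        hits.append((i + 1, h, stop_pos))
    return hits
```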

To determine whether frameshifting is occurring, each of the three candidate regions is separately mutated so that it becomes unfavorable for frameshifting. Further, each of the candidate stop codons is separately mutated to a sense codon. These six new point mutants are tested by in vitro translation. Loss of the novel 42-44 kDa protein upon mutation of the frameshifting site to an unfavorable sequence, and an increase in molecular weight of that protein band upon elimination of the stop codon, indicate that frameshifting is occurring. If frameshifting is the cause of the aberrant translation product, the viability of the new mutant that lacks the frameshift site is tested in the context of the 1513-2470 mutant segment. Clearly such a finding would be of significance for future genome designs, and if necessary, a frameshift filter may be incorporated in the software algorithm to avoid potential frameshift sites.

More detailed investigations of translational defects are conducted using various techniques including, but not limited to, polysome profiling, toeprinting, and luciferase assays of fusion proteins.

Polysome Profiling

Polysome profiling is a traditional method of examining translation. It is not high-throughput, but it is very well developed and understood. For polysome profiling, cell extracts are made in a way that arrests translation (with cycloheximide) and yet preserves the set of ribosomes that are in the act of translating their respective mRNAs (the “polysomes”). These polysomes are fractionated on a sucrose gradient, whereby messages associated with a larger number of ribosomes sediment towards the bottom. After fractionation of the gradient and analysis of RNA content using UV absorption, a polysome profile is seen where succeeding peaks of absorption correspond to mRNAs with N+1 ribosomes; typically 10 to 15 distinct peaks (representing the 40S ribosomal subunit, the 60S subunit, and 1, 2, 3, . . . 12, 13 ribosomes on a single mRNA) can be discerned before the peaks smudge together. The various fractions from the sucrose gradient are then run on a gel, blotted to a membrane, and analyzed by Northern analysis for particular mRNAs. This then shows whether that particular mRNA is primarily engaged with, say, 10 to 15 ribosomes (well translated), or 1 to 4 ribosomes (poorly translated).
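Northern signals across gradient fractions are often summarized as a single number. The Python sketch below computes a signal-weighted mean number of ribosomes per mRNA and the share of signal on heavy polysomes. It assumes each fraction has already been assigned a ribosome number from the UV profile. The names and example values are illustrative. The default 10-ribosome cut-off simply reflects the well-translated range mentioned above.

```python
def mean_ribosome_load(signal):
    """Signal-weighted mean number of ribosomes per mRNA.

    `signal` maps the number of ribosomes in a gradient fraction (1, 2, 3, ...)
    to the Northern signal for the mRNA of interest in that fraction.
    """
    total = sum(signal.values())
    return sum(n * s for n, s in signal.items()) / total


def heavy_fraction(signal, min_ribosomes=10):
    """Share of the mRNA signal found on polysomes with >= min_ribosomes."""
    total = sum(signal.values())
    return sum(s for n, s in signal.items() if n >= min_ribosomes) / total
```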

In this study, for example, the wt virus, the PV-AB (codon deoptimized) virus, and its derivatives PV-AB^755-1513 and PV-AB^2954-3386 are compared. These derivatives have primarily N-terminal or C-terminal deoptimized segments, respectively. The comparison between the N-terminal and C-terminal mutant segments is particularly revealing. If codon deoptimization causes translation to be slow, or paused, then the N-terminal mutant RNA should be associated with relatively few ribosomes. This is because ribosomes move very slowly through the N-terminal region, preventing other ribosomes from loading, and then move rapidly through the rest of the message after traversing the deoptimized region. In contrast, the C-terminal mutant RNA should be associated with a relatively large number of ribosomes. Many ribosomes are able to load, but because they are hindered near the C-terminus, they cannot get off the transcript, and the number of associated ribosomes is high.
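The qualitative prediction for N-terminal versus C-terminal slow regions can be explored with a toy model. The Python sketch below substitutes a standard totally asymmetric simple exclusion process (TASEP), in which each ribosome occupies a single codon and advances only when the next codon is free. All rates, lengths, and names are hypothetical. The model ignores ribosome footprint size, drop-off, and recycling.

```python
import random


def mean_ribosomes(rates, init_rate=0.3, sweeps=2000, burn_in=500, seed=0):
    """Average ribosome count on an mRNA in a toy TASEP model.

    rates[i] is the probability that a ribosome on codon i advances when
    selected (at the last codon, that it terminates and leaves). Each
    ribosome occupies one codon and cannot enter an occupied codon.
    """
    rng = random.Random(seed)
    n_sites = len(rates)
    occupied = [False] * n_sites
    n_ribosomes = 0
    total, samples = 0, 0
    for sweep in range(sweeps):
        # Random-sequential update: n_sites + 1 attempts per sweep.
        for _ in range(n_sites + 1):
            i = rng.randrange(-1, n_sites)  # -1 represents an initiation attempt
            if i == -1:
                if not occupied[0] and rng.random() < init_rate:
                    occupied[0] = True
                    n_ribosomes += 1
            elif occupied[i] and rng.random() < rates[i]:
                if i == n_sites - 1:        # termination at the last codon
                    occupied[i] = False
                    n_ribosomes -= 1
                elif not occupied[i + 1]:   # elongation only into a free codon
                    occupied[i] = False
                    occupied[i + 1] = True
        if sweep >= burn_in:                # average only after the burn-in period
            total += n_ribosomes
            samples += 1
    return total / samples
```

With a 100-codon message and a 20-codon slow stretch, placing the slow stretch at the 3′ end leaves many more ribosomes queued on the message than placing it at the 5′ end. This is in line with the reasoning above.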

Polysome analysis indicates how many ribosomes are actively associated with different kinds of mutant RNAs, and can, for instance, distinguish models where translation is slow from models where the ribosome actually falls off the RNA prematurely. Other kinds of models can also be tested.

Toeprinting

Toeprinting is a technique for identifying positions on an mRNA where ribosomes are slow or paused. As in polysome profiling, actively translating mRNAs are obtained, with their ribosomes frozen with cycloheximide but still associated; the mRNAs are often obtained from an in vitro translation reaction. A DNA oligonucleotide primer complementary to a region toward the 3′ end of the mRNA is used and then extended by reverse transcriptase. The reverse transcriptase extends until it collides with a ribosome. Thus, a population of translating mRNA molecules generates a population of DNA fragments extending from the site of the primer to the nearest ribosome. Suppose there is a site or region where ribosomes tend to pause, say 200 bases from the primer. That site or region will then give a disproportionate number of DNA fragments, in this case fragments 200 bases long. This shows up as a “toeprint” (a band, or dark area) on a high-resolution gel. This is a standard method for mapping ribosome pause sites (to within a few nucleotides) on mRNAs.
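The arithmetic that turns a toeprint band into an mRNA position can be sketched as follows. The Python code assumes the primer's 5′-terminal nucleotide pairs with mRNA position primer_5p_pos (1-based). A cDNA of length n therefore ends at mRNA position primer_5p_pos − n + 1, the last nucleotide copied before a ribosome blocked reverse transcriptase. The function names and the 20% threshold for calling a band are hypothetical. Full-length run-off products and ribosome-independent stops would still need to be excluded by comparison with appropriate controls.

```python
from collections import Counter


def stop_positions(fragment_lengths, primer_5p_pos):
    """Count cDNA stops at each mRNA position (1-based, last nucleotide copied)."""
    return Counter(primer_5p_pos - n + 1 for n in fragment_lengths)


def pause_sites(fragment_lengths, primer_5p_pos, min_fraction=0.2):
    """mRNA positions accounting for at least min_fraction of all cDNA stops."""
    counts = stop_positions(fragment_lengths, primer_5p_pos)
    total = sum(counts.values())
    return sorted(pos for pos, c in counts.items() if c / total >= min_fraction)
```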

Chimeras with segments of deoptimized codons or codon pairs, wherein in different chimeras the segments are shifted slightly 5′ or 3′, are analyzed. If the deoptimized segments cause ribosomes to slow or pause, the toeprint shifts 5′ or 3′ to match the position of the deoptimized segment. Controls include wt viral RNA and several (harmlessly) shuffled viral RNAs. Controls also include pure mutant viral RNA (i.e., not engaged in translation) to rule out ribosome-independent effects of the novel sequence on reverse transcription.

The toeprint assay has at least two advantages. First, it can provide direct evidence for a paused ribosome. Second, it has a resolution of a few nucleotides, so it can identify exactly which deoptimized codons or deoptimized codon pairs are causing the pause. It may be that only a few of the deoptimized codons or codon pairs are responsible for most of the effect, and toeprinting can reveal that.

Dual Luciferase Reporter Assays of Fusion Proteins

The above experiments may suggest that certain codons or codon pairs are particularly detrimental for translation. As a high-throughput way to analyze effects of particular codons and codon pairs on translation, small test peptides are designed and fused to the N-terminus of firefly luciferase. Luciferase activity is then measured as an assay of the translatability of the peptide. That is, if the N-terminal peptide is translated poorly, little luciferase will be produced.

A series of eight 25-mer peptides is designed based on the experiments above. Each of the eight 25-mers is encoded 12 different ways, using various permutations of rare codons and/or rare codon pairs of interest. Using assembly PCR, these 96 constructs (8 25-mers × 12 encodings) are fused to the N-terminus of firefly luciferase (F-Luc) in a dicistronic, dual luciferase vector described above (see Example 5 and FIG. 6). A dual luciferase system uses both the firefly luciferase (F-Luc) and the sea pansy (Renilla) luciferase (R-Luc). These emit light under different biochemical conditions and so can be separately assayed from a single tube or well. A dicistronic reporter is expressed as a single mRNA. However, the control luciferase (R-Luc) is translated from one internal ribosome entry site (IRES), while the experimental luciferase (F-Luc), which has the test peptides fused to its N-terminus, is independently translated from its own IRES. Thus, the ratio of F-Luc activity to R-Luc activity is an indication of the translatability of the test peptide. See FIG. 6.

The resulting 96 dicistronic reporter constructs are transfected directly from the PCR reactions into 96 wells of HEK293 or HeLa cells. The sea pansy luciferase (R-Luc) serves as an internal transfection control. Codon- or codon-pair-dependent expression of the firefly luciferase (F-Luc) carrying the test peptide can be accurately determined as the ratio between F-Luc and R-Luc. This assay is high-throughput in nature, and hundreds or even thousands of test sequences can be assayed, as necessary.
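Data reduction for the 8 × 12 layout can be sketched in a few lines of Python. The sketch follows the arrangement in which the test peptides are fused to F-Luc and R-Luc serves as the internal control. It expresses each encoding's F-Luc/R-Luc ratio relative to a reference encoding of the same peptide. The reference is assumed here to be encoding 0, for example a wt-like encoding. The data structure and names are hypothetical.

```python
def relative_translatability(readings, reference_encoding=0):
    """Normalize F-Luc/R-Luc ratios within each peptide.

    readings maps (peptide, encoding) -> (f_luc, r_luc) raw luminescence.
    Returns (peptide, encoding) -> ratio divided by the ratio of the
    reference encoding for the same peptide.
    """
    ratios = {key: f_luc / r_luc for key, (f_luc, r_luc) in readings.items()}
    return {(pep, enc): ratio / ratios[(pep, reference_encoding)]
            for (pep, enc), ratio in ratios.items()}
```

Values well below 1 would flag encodings whose rare codons or codon pairs impair translation of that peptide.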

Example 13

Design and Synthesis of Attenuated Viruses Using Novel Alternative-Codon Strategy

The SAVE approach to re-engineering viruses for vaccine production depends on large-scale synonymous codon substitution to reduce translation of viral proteins. This can be achieved by appropriately modulating the codon and codon pair bias, as well as other parameters such as RNA secondary structure and CpG content. Of the four de novo PV designs, two (the shuffled codon virus, PV-SD, and the favored codon pair virus, PV-Max) resulted in little phenotypic change compared with the wt virus. The other two de novo designs (PV-AB and PV-Min) rendered the virus non-viable using only synonymous substitutions. They did so through two different mechanisms: drastic changes in codon bias and codon pair bias, respectively. The live-but-attenuated strains were constructed by subcloning regions from the inactivated virus strains into the wt.

A better understanding of the mechanisms by which large-scale synonymous substitution attenuates viruses facilitates predictions of the phenotype and expression level of a synthetic virus. Ongoing studies address several questions:

- the effect of the total number or the density of alterations on translation efficiency;
- the effect of the position of dense regions on translation;
- the interaction of codon and codon pair bias;
- the effect of engineering large numbers of short-range RNA secondary structures into the genome.

It is likely that there is a continuum between the wt and inactivated strains, and that any desired attenuation level can be engineered into a weakened strain. However, there may be hard limits on the attenuation level compatible with an infection that remains self-sustaining and hence detectable. The 10⁴⁴² encodings of PV proteins constitute a huge sequence space, and various approaches are being utilized to explore it more systematically. First, a software platform is being developed to help design novel attenuated viruses. Second, this software is used to design, synthesize, and characterize numerous new viruses that explore more of the sequence space and answer specific questions about how alternative encodings cause attenuation. Additionally, an important issue to consider is whether dangerous viruses might accidentally be created by apparently harmless shuffling of synonymous codons.

Development of Software for Computer-Based Design of Viral Genomes and Data Analysis

Designing synthetic viruses requires substantial software support for (1) optimizing codon and codon-pair usage and monitoring RNA secondary structure while preserving, embedding, or removing sequence specific signals, and (2) partitioning the sequence into oligonucleotides that ensure accurate sequence-assembly. The prototype synthetic genome design software tools are being expanded into a full environment for synthetic genome design. In this expanded software, the gene editor is conceptually built around constraints instead of sequences. The gene designer works on the level of specifying characteristics of the desired gene (e.g., amino acid sequence, codon/codon-pair distribution, distribution of restriction sites, and RNA secondary structure constraints), and the gene editor algorithmically designs a DNA sequence realizing these constraints. There are many constraints, often interacting with each other, including, but not limited to, amino acid sequence, codon bias, codon pair bias, CG dinucleotide content, RNA secondary structure, cis-acting nucleic acid signals such as the CRE, splice sites, polyadenylation sites, and restriction enzyme recognition sites. The gene designer recognizes the existence of these constraints, and designs genes with the desired features while automatically satisfying all constraints to a pre-specified level.

The synthesis algorithms previously developed for embedding/removing patterns, secondary structures, overlapping coding frames, and adhering to codon/codon-pair distributions are implemented as part of the editor, but more important are algorithms for realizing heterogeneous combinations of such preferences. Because such combinations lead to computationally intractable (NP-complete) problems, heuristic optimization necessarily plays an important role in the editor. Simulated annealing techniques are employed to realize such designs; this is particularly appropriate as simulated annealing achieved its first practical use in the early VLSI design tools.
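A minimal sketch of how simulated annealing can realize such constraints is shown below in Python. Every move swaps one codon for a synonym, so the encoded protein never changes. A Metropolis rule with a geometric cooling schedule decides whether to keep each move. The objective is a deliberately simple, hypothetical stand-in for a real multi-constraint score. It combines the distance from a target CpG count with a heavy penalty for a forbidden motif, here the EcoRI recognition site GAATTC. All names and parameter values are illustrative and are not those of the software described above.

```python
import math
import random

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def design_cost(cds, target_cpg, forbidden):
    """Hypothetical objective: |CpG count - target| + 10 per forbidden motif."""
    cpg = sum(1 for i in range(len(cds) - 1) if cds[i:i + 2] == "CG")
    violations = sum(cds.count(motif) for motif in forbidden)
    return abs(cpg - target_cpg) + 10 * violations


def anneal(cds, target_cpg, forbidden=("GAATTC",), steps=20000,
           t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over synonymous codon choices.

    Returns the lowest-cost sequence seen and its cost.
    """
    rng = random.Random(seed)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    current = best = design_cost(cds, target_cpg, forbidden)
    best_codons = codons[:]
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        k = rng.randrange(len(codons))
        old = codons[k]
        codons[k] = rng.choice(SYNONYMS[CODE[old]])  # synonymous move only
        new = design_cost("".join(codons), target_cpg, forbidden)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best:
                best, best_codons = new, codons[:]
        else:
            codons[k] = old  # reject: undo the move
    return "".join(best_codons), best
```

Additional constraints, such as codon pair scores or secondary-structure estimates, would enter as further weighted terms in the cost function.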

The full-featured gene design programming environment is platform independent, running in Linux, Windows and MacOS. The system is designed to work with genomes on a bacterial or fungal (yeast) scale, and is validated through the synthesis and evaluation of the novel attenuated viral designs described below.

Virus Designs with Extreme Codon Bias in One or a Few Amino Acids

For a live vaccine, a virus should be as debilitated as possible, short of being inactivated, in which case there is no way to grow and manufacture the virus. One way of obtaining an optimally debilitated virus is to engineer the substitution of rare codons for just one or a few amino acids. A corresponding cell line is then created that overexpresses the rare tRNAs that bind to those rare codons. The virus is able to grow efficiently in the special, permissive cell line, but is inviable in normal host cell lines. Virus is grown and manufactured using the permissive cell line. This is not only very convenient but also safer than methods currently used for producing live attenuated vaccines.

With the sequencing of the human genome, information regarding copy number of the various tRNA genes that read rare codons is available. Based on the literature (e.g., Lavner and Kotlar, 2005), the best rare codons for present purposes are CTA (Leu), a very rare codon which has just two copies of the cognate tRNA gene; TCG (Ser), a rare codon with four copies of the cognate tRNA gene; and CCG (Pro), a rare codon with four copies of the cognate tRNA gene (Lavner and Kotlar, 2005). The median number of copies for a tRNA gene of a particular type is 9, while the range is 2 to 33 copies (Lavner and Kotlar, 2005). Thus, the CTA codon is not just a rare codon, but is also the one codon with the fewest cognate tRNA genes. These codons are not read by any other tRNA; for instance, they are not read via wobble base pairing.

Changing all the codons throughout the virus genome coding for Leu (180 codons), Ser (153), and Pro (119) to the rare synonymous codons CTA, TCG, or CCG, respectively, is expected to create severely debilitated or even non-viable viruses. Helper cells that overexpress the corresponding rare tRNAs can then be created. The corresponding virus is absolutely dependent on growing only in this artificial culture system, providing the ultimate in safety for the generation of virus for vaccine production.

Four high-priority viruses are designed and synthesized: all Leu codons switched to CTA; all Ser codons switched to TCG; all Pro codons switched to CCG; and all Leu, Ser, and Pro codons switched to CTA, TCG, and CCG, respectively, in a single virus. In one embodiment, these substitutions are made only in the capsid region of the virus, where a high rate of translation is most important. In another embodiment, the substitutions are made throughout the virus.
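The recoding itself is mechanical, as the following minimal Python sketch shows. It swaps every codon for the chosen amino acids to the rare codon named in the text. Optionally, it restricts the change to a codon range, for example a capsid-only variant. It assumes an in-frame coding sequence, and the function names and codon-range convention (0-based, half-open) are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Rare codons named in the text: CTA (Leu), TCG (Ser), CCG (Pro).
RARE_CODONS = {"L": "CTA", "S": "TCG", "P": "CCG"}
assert all(CODE[c] == aa for aa, c in RARE_CODONS.items())  # replacements are synonymous


def translate(cds):
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds, replacements, first_codon=0, last_codon=None):
    """Swap every codon for the listed amino acids to the given codon.

    Only codons with index first_codon <= k < last_codon are changed
    (0-based codon indices), so a region such as the capsid can be targeted.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if last_codon is None:
        last_codon = len(codons)
    for k in range(first_codon, last_codon):
        aa = CODE[codons[k]]
        if aa in replacements:
            codons[k] = replacements[aa]
    return "".join(codons)
```

The four high-priority designs correspond to calling recode with {"L": "CTA"}, {"S": "TCG"}, {"P": "CCG"}, or all three together, over either the capsid codon range or the whole open reading frame.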

CG Dinucleotide Bias Viruses

With few exceptions, virus genomes under-represent the dinucleotide CpG, but not GpC (Karlin et al., 1994). This phenomenon is independent of the overall G+C content of the genome. CpG is usually methylated in the human genome. Single-stranded DNA containing non-methylated CpG dinucleotides, as often present in bacteria and DNA viruses, is therefore recognized as a pathogen signature by Toll-like receptor 9. This leads to activation of the innate immune system. A similar system has not been shown to operate for RNA viruses. Nevertheless, inspection of the PV genome suggests that PV has selected against synonymous codons containing CpG to an even greater extent than the significant under-representation of CpG dinucleotides in humans would predict. This is particularly striking for arginine codons. Of the six synonymous Arg codons, the four CG-containing codons (CGA, CGC, CGG, CGU) together account for only 24 of all 96 Arg codons, while the remaining two (AGA, AGG) account for 72. This is in contrast to average human codon usage, which would predict 65 CG-containing codons and 31 AGA/AGG codons. In fact, two of the codons under-represented in PV are frequently used in human cells (CGC, CGG). There are two other hints that CG may be a disadvantageous dinucleotide in PV. First, in the codon pair-deoptimized virus, many of the introduced rare codon pairs contain CG as the central dinucleotide of the codon pair hexamer. Second, Burns et al. (2006) passaged their codon bias-deoptimized virus and sequenced the genomes, and they observed that these viruses evolved to remove some CG dinucleotides.
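The CpG versus GpC contrast is usually quantified as an observed/expected ratio. The Python sketch below computes a relative-abundance measure of the kind used in dinucleotide-bias studies: the dinucleotide frequency divided by the product of the two mononucleotide frequencies. Values well below 1 for CpG alongside values near 1 for GpC would reflect the pattern described above. The function name is hypothetical, and the sketch treats RNA U as T.

```python
def dinucleotide_obs_exp(seq, first, second):
    """Relative abundance f(XY) / (f(X) * f(Y)); 1.0 means no bias.

    f(XY) is the frequency among the len(seq) - 1 overlapping dinucleotides,
    and f(X), f(Y) are mononucleotide frequencies.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    n_x, n_y = seq.count(first), seq.count(second)
    if n < 2 or n_x == 0 or n_y == 0:
        raise ValueError("sequence too short or lacks one of the nucleotides")
    n_xy = sum(1 for i in range(n - 1) if seq[i] == first and seq[i + 1] == second)
    return (n_xy / (n - 1)) / ((n_x / n) * (n_y / n))
```

Calling it with ("C", "G") and with ("G", "C") on the same genome gives the CpG and GpC values to compare.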

Thus, in one high-priority redesigned virus, most or all Arg codons are changed to CGC or CGG (two frequent human codons). This does not negatively affect translation and allows an assessment of the effect of the CpG dinucleotide bias on virus growth. The increased C+G content of the resulting virus requires careful monitoring of secondary structure; that is, changes in Arg codons are not allowed to create pronounced secondary structures.

Modulating Codon-Bias and Codon-Pair Bias Simultaneously

Codon bias and codon-pair bias could conceivably interact with each other at the translational level. Understanding this interaction may enable predictable regulation of the translatability of any given protein, possibly over an extreme range.

If we represent wild type polio codon bias and codon pair bias as 0, and the worst possible codon bias and codon pair bias as −1, then four high-priority viruses are the (−0.3, −0.3), (−0.3, −0.6), (−0.6, −0.3), and (−0.6, −0.6) viruses. These viruses reveal how moderately poor or very poor codon bias interacts with moderately poor or very poor codon pair bias. These viruses are compared to the wild type and also to the extreme PV-AB (−1, 0) and PV-Min (0, −1) designs.
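One way to read the normalized scale above is as a linear interpolation between the wild-type score (0) and the worst achievable score (−1). The sketch below assumes that reading; the function names and the endpoint scores in the example are hypothetical, and the text does not specify which codon bias metric is being scaled.

```python
def target_score(wt: float, worst: float, level: float) -> float:
    """Convert a normalized level (0 = wild type, -1 = worst possible) into an
    absolute score, assuming the scale is linear between the two endpoints."""
    if not -1.0 <= level <= 0.0:
        raise ValueError("level must lie in [-1, 0]")
    return wt + (-level) * (worst - wt)


def design_grid(cb_wt, cb_worst, cpb_wt, cpb_worst, levels=(-0.3, -0.6)):
    """Target (codon bias, codon pair bias) scores for every combination of levels."""
    return {
        (cb_level, cpb_level): (
            target_score(cb_wt, cb_worst, cb_level),
            target_score(cpb_wt, cpb_worst, cpb_level),
        )
        for cb_level in levels
        for cpb_level in levels
    }


# Hypothetical endpoint scores, for illustration only.
grid = design_grid(cb_wt=0.0, cb_worst=-1.0, cpb_wt=0.0, cpb_worst=-0.5)
for levels, targets in sorted(grid.items()):
    print(levels, targets)
```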

Modulating RNA Secondary Structure

The above synthetic designs guard against excessive secondary structures. Two additional designs systematically manipulate secondary structure. These viruses are engineered to have wt codon and codon-pair bias with (1) provably minimal secondary structure, and (2) many small secondary structures sufficient to slow translation.

Additional Viral Designs

Additional viral designs include full-genome codon bias and codon-pair bias designs; non-CG codon pair bias designs; reduced density rare codon designs; and viruses with about 150 rare codons, either spread through the capsid region, or grouped at the N-terminal end of the capsid, or grouped at the C-terminal end of the capsid.

Example 14

Testing the Potential for Accidentally Creating Viruses of Increased Virulence

It is theoretically possible that redesigning viral genomes with the aim of attenuating these viruses could accidentally make a virus more virulent than the wt virus. Because protein sequences are not altered in the SAVE procedure, this outcome is unlikely. Nevertheless, it is desirable to experimentally demonstrate that the SAVE approach is benign.

Of the 10⁴⁴² possible sequences that could encode PV proteins, some reasonably fit version of PV likely arose at some point in the past and evolved to a local optimum (as opposed to a global optimum). The creation of a new version of PV with the same protein coding capacity but a very different set of codons places this new virus in a different location on the global fitness landscape, which could conceivably be close to a different local optimum than wt PV. Conceivably, this new local optimum could be better than the wild type local optimum. Thus, it is just barely possible that shuffling synonymous codons might create a fitter virus.

To investigate this possibility, 13 PV genomes are redesigned and synthesized: one virus with the best possible codon bias; one virus with the best possible codon pair bias (i.e., PV-Max); one virus with the best possible codon and codon pair bias; and 10 additional viruses with wt codon and codon pair bias, but shuffled synonymous codons. Other parameters, such as secondary structure, C+G content, and CG dinucleotide content are held as closely as possible to wt levels.

These 13 viruses may each lie in a very different location on the global fitness landscape, both from each other and from the wild type. But none of them is at a local optimum, because they have not been subject to selection. The 13 viruses and the wt are passaged, and virus samples are taken at the 1st, 10th, 20th, and 50th passages. Their fitness is compared to each other and to wt by assessing plaque size, plaque-forming units/ml in one-step growth curves, and numbers of particles formed per cell. See Example 1. Five isolates of each of the 13 viruses are sequenced after the 10th, 20th, and 50th passages. Selected passage isolates are tested for pathogenicity in CD155tg mice, and LD₅₀ values are determined. These assays reveal whether any of the viruses are fitter than wt, and provide a quantitative measure of the risk of accidentally producing especially virulent viruses. The 10 viruses with wt levels of codon and codon pair bias also provide information on the variability of the fitness landscape at the encoding level.

In view of the possibility that a fitter virus could emerge, and that a fitter virus may be a more dangerous virus, these experiments are conducted in a BSL3 laboratory. After the 10th passage, phenotypes and sequences are evaluated and the susceptibility of emerging viruses to neutralization by PV-specific antibodies is verified. The experiment is stopped and reconsidered if any evidence of evolution towards a significantly fitter virus, or of systematic evolution towards new protein sequences that evade antibody neutralization, is obtained. Phenotypes and sequences are similarly evaluated after passage 20 before proceeding to passage 50. Because the synthetic viruses are created to encode exactly the same proteins as wt virus, the scope for increased virulence seems very limited, even if evolution towards (slightly) increased fitness is observed.
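The size of the synonymous sequence space discussed in this example grows as the product of the codon degeneracies of the residues. A minimal sketch of that count in the standard genetic code is below; it counts every synonymous encoding with no further constraints (no fixed codon usage, no excluded RNA elements), so it is an illustration of the combinatorics and is not intended to reproduce the exact figure quoted above.

```python
import math

# Number of synonymous codons for each amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def log10_synonymous_encodings(protein: str) -> float:
    """log10 of the number of nucleotide sequences encoding `protein`."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein.upper())


# A 3-residue example: M (1 codon) x R (6) x L (6) = 36 encodings.
print(round(10 ** log10_synonymous_encodings("MRL")))
```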

Example 15

Extension of SAVE Approach to Virus Systems Other than Poliovirus

Notwithstanding the potential need for a new polio vaccine to combat the risk of reversion in the closing stages of the global effort at polio eradication, PV has been selected in the present studies primarily as a model system for developing SAVE. SAVE has, however, been developed with the expectation that this approach can be extended to other viruses where vaccines are needed. This extension of the SAVE strategy is herein exemplified by application to Rhinovirus, the causative agent of the common cold, and to influenza virus.

Adaptation of SAVE to Human Rhinovirus—a Virus Closely Related to Polio Virus

Two model rhinoviruses, HRV2 and HRV14, were selected to test the SAVE approach for several reasons: (1) HRV2 and HRV14 represent the two different genetic subgroups, A and B (Ledford et al., 2004); (2) these two model viruses use different receptors, LDL-receptor and ICAM-1, respectively (Greve et al., 1989; Hofer et al., 1994); (3) both viruses as well as their infectious cDNA clones have been used extensively, and most applicable materials and methods have been established (Altmeyer et al., 1991; Gerber et al., 2001); and (4) much of the available molecular knowledge of rhinoviruses stems from studies of these two serotypes.

The most promising SAVE strategies developed through the PV experiments are applied to the genomes of HRV2 and HRV14. For example, codons, codon pairs, secondary structures, or combinations thereof, are deoptimized. Two to three genomes with varying degrees of attenuation are synthesized for each genotype. Care is taken not to alter the CRE, a critical RNA secondary structure of about 60 nucleotides (Gerber et al., 2001; Goodfellow et al., 2000; McKnight, 2003). This element is vital to the replication of picornaviruses and thus the structure itself must be maintained when redesigning genomes. The location of the CRE within the genome varies for different picornaviruses, but is known for most families (Gerber et al., 2001; Goodfellow et al., 2000; McKnight, 2003), and can be deduced by homology modeling for others where experimental evidence is lacking. In the case of HRV2 the CRE is located in the RNA sequence corresponding to the nonstructural protein 2A^pro; and the CRE of HRV14 is located in the VP1 capsid protein region (Gerber et al., 2001; McKnight, 2003).

The reverse genetics system to derive rhinoviruses from DNA genome equivalents is essentially the same as described above for PV, with the exception that transfections are done in HeLa-H1 cells at 34° C. in Hepes-buffered culture medium containing 3 mM Mg++ to stabilize the viral capsid. The resulting synthetic viruses are assayed in tissue culture to determine the PFU/IU ratio. See Example 3. Plaque size and kinetics in one-step growth curves are also assayed as described. See Example 2. Because the SAVE process can be applied relatively cheaply to all 100 or so relevant rhinoviruses, it is feasible to produce a safe and effective vaccine for the common cold using this approach.

Adaptation of SAVE to Influenza A Virus—a Virus Unrelated to Poliovirus

The most promising SAVE design criteria identified from PV experimentation are used to synthesize codon-deoptimized versions of influenza virus. Influenza virus is a “segmented” virus consisting of eight separate RNA segments, each of which can be codon-modified. The well-established ambisense plasmid reverse genetics system is used to generate variants of influenza virus strain A/PR/8/34. This eight-plasmid system is a variation of one described previously (Hoffmann et al., 2000) and was kindly provided by Drs. P. Palese and A. Garcia-Sastre. Briefly, each of the eight influenza genome segments is contained in a separate plasmid, where it is flanked by a Pol I promoter at the 3′ end and a Pol I terminator at the 5′ end on the antisense strand. This cassette is in turn flanked by a cytomegalovirus promoter (a Pol II promoter) at the 5′ end and a polyadenylation signal at the 3′ end on the forward strand (Hoffmann et al., 2000). Upon co-transfection into co-cultured 293T and MDCK cells, each ambisense expression cassette produces two kinds of RNA molecules. The Pol II transcription units on the forward strand produce all influenza mRNAs necessary for synthesis of viral proteins. The Pol I transcription units on the reverse strand produce the (−) sense genomic RNA segments necessary for assembly of ribonucleoprotein complexes and encapsidation. Thus, infectious influenza A/PR/8/34 particles are formed (FIG. 10). This particular strain of the H1N1 serotype is relatively benign to humans. It has been adapted for growth in tissue culture cells and is particularly useful for studying pathogenesis, as it is pathogenic in BALB/c mice.

When synthesizing segments that are alternatively spliced (NS and M), care is taken not to destroy splice sites and the alternative reading frames. In all cases the terminal 120 nt at either end of each segment are excluded, as these sequences are known to contain signals for RNA replication and virus assembly. At least two versions of each fragment are synthesized (moderate and maximal deoptimization). Viruses in which only one segment is modified are generated, the effect is assessed, and more modified segments are introduced as needed. This is easy in this system, since each segment is on a separate plasmid.
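The rule of leaving the terminal 120 nt of each segment untouched can be expressed as a codon-aligned window inside the coding sequence. The sketch below is hypothetical: it assumes 1-based inclusive coordinates and a single reading frame, and it does not protect the splice sites or overlapping reading frames of the NS and M segments, which would have to be excluded separately.

```python
def deoptimizable_window(segment_len, cds_start, cds_end, protected=120):
    """Codon-aligned part of a CDS that may be recoded (hypothetical helper).

    Coordinates are 1-based and inclusive. The terminal `protected` nucleotides
    at each end of the segment are excluded. Returns (start, end) or None.
    """
    lo = max(cds_start, protected + 1)
    hi = min(cds_end, segment_len - protected)
    # Move the start forward to the first nucleotide of a codon in the CDS frame.
    lo += (-(lo - cds_start)) % 3
    # Move the end back to the last nucleotide of a complete codon.
    hi -= (hi - cds_start + 1) % 3
    if hi - lo + 1 < 3:
        return None
    return lo, hi


# Hypothetical 1,000-nt segment with a CDS at positions 20-900.
print(deoptimizable_window(1000, 20, 900))  # (122, 880)
```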

Virus infectivity is titered by plaque assay on MDCK cells in the presence of 1 μg/ml TPCK-trypsin. Alternatively, depending on the number of different virus constructs, a 96-well ELISA is used to determine the titer of various viruses as cell-infectious units on MDCK cells essentially as described above for PV. See Example 3. The only difference is that an HA-specific antibody is used to stain infected cells. In addition, the relative concentration of virions is determined by hemagglutination (HA) assay using chicken red blood cells (RBC) (Charles River Laboratories) following standard protocols (Kendal et al., 1982). Briefly, virus suspensions are 2-fold serially diluted in PBS in V-bottom 96-well plates. PBS alone is used as an assay control. A standardized amount of RBCs is added to each well, and the plates are briefly agitated and incubated at room temperature for 30 minutes. HA titers are read as the reciprocal dilution of virus in the last well with complete hemagglutination. Whereas HA titer is a direct correlate of the number of particles present, PFU titer is a functional measure of infectivity. By determining both measures, a relative PFU/HA-unit ratio is calculated, analogous to the PFU/particle ratio described in the PV experiments. See Example 3. This addresses whether codon- and codon pair-deoptimized influenza viruses also display a lower PFU/particle ratio, as observed for PV.
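Reading the HA endpoint and forming the PFU/HA-unit ratio described above can be sketched as follows. The helper names are hypothetical; the sketch assumes the first well holds a 1:2 dilution, stops reading at the first well without complete hemagglutination, and assumes both titers are expressed per the same volume. The example numbers are illustrative, not measured data.

```python
def ha_titer(complete_ha, first_dilution: int = 2, factor: int = 2) -> int:
    """Reciprocal of the highest dilution showing complete hemagglutination.

    complete_ha[i] is True if well i (dilution 1:first_dilution * factor**i)
    shows complete hemagglutination. Returns 0 if the first well is negative.
    """
    titer = 0
    for i, positive in enumerate(complete_ha):
        if not positive:
            break  # endpoint reached: first well without complete HA
        titer = first_dilution * factor ** i
    return titer


def pfu_per_ha_unit(pfu_titer: float, ha_units: float) -> float:
    """Relative infectivity per hemagglutinating unit (same volume basis assumed)."""
    return pfu_titer / ha_units


wells = [True] * 5 + [False] * 7          # complete HA through the 1:32 well
titer = ha_titer(wells)
print(titer, pfu_per_ha_unit(3.2e6, titer))  # 32 100000.0
```

A low PFU/HA-unit ratio, as sought for the candidate vaccines, means many particles per infectious unit.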

Virulence Test

The lethal dose 50 (LD₅₀) of the parental A/PR/8/34 virus is first determined in mice, and synthetic influenza viruses are then selected for intranasal infection of BALB/c mice. Methods for determining LD₅₀ values are well known to persons of ordinary skill in the art (see Reed and Muench, 1938, and Example 4). The ideal candidate viruses display low infectivity (low PFU titer) together with a high virion concentration (high HA titer). Anesthetized mice receive 25 μl of virus solution in PBS in each nostril, at 10-fold serial dilutions from 10² to 10⁷ PFU of virus. Mortality and morbidity (weight loss, reduced activity) are monitored twice daily for up to three weeks. LD₅₀ is calculated by the method of Reed and Muench (1938). For the A/PR/8/34 wt virus, the expected LD₅₀ is around 10³ PFU (Talon et al., 2000), but it may vary depending on the particular laboratory conditions under which the virus is titered.
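The Reed and Muench (1938) calculation referred to above can be sketched as follows. The function name is hypothetical; it assumes dose groups ordered from lowest to highest log10 dose, with the number of dead and inoculated animals per group. Deaths are accumulated upward from the lowest dose and survivors downward from the highest dose, and the 50% endpoint is interpolated between the two doses that bracket 50% cumulative mortality. The example counts are invented for illustration.

```python
def reed_muench_log_ld50(log_doses, dead, total):
    """Estimate log10(LD50) by the Reed-Muench method.

    log_doses: log10 dose per group, ascending (e.g. 2, 3, ..., 7 for 10^2..10^7 PFU).
    dead, total: animals that died / were inoculated in each group.
    """
    n = len(log_doses)
    alive = [t - d for t, d in zip(total, dead)]
    # An animal dying at a lower dose is assumed to die at any higher dose.
    cum_dead, running = [], 0
    for d in dead:
        running += d
        cum_dead.append(running)
    # An animal surviving a higher dose is assumed to survive any lower dose.
    cum_alive, running = [0] * n, 0
    for i in range(n - 1, -1, -1):
        running += alive[i]
        cum_alive[i] = running
    pct = [100.0 * cd / (cd + ca) for cd, ca in zip(cum_dead, cum_alive)]
    for i in range(1, n):
        if pct[i - 1] < 50.0 <= pct[i]:
            # Proportionate distance between the doses bracketing 50% mortality.
            pd = (pct[i] - 50.0) / (pct[i] - pct[i - 1])
            return log_doses[i] - pd * (log_doses[i] - log_doses[i - 1])
    raise ValueError("cumulative mortality does not cross 50% between tested doses")


# Hypothetical outcome: 5 mice per group at 10^2 ... 10^7 PFU.
log_ld50 = reed_muench_log_ld50([2, 3, 4, 5, 6, 7], [0, 0, 1, 3, 5, 5], [5] * 6)
print(f"LD50 ~ 10^{log_ld50:.2f} PFU")
```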

Adaptation of SAVE to Dengue, HIV, Rotavirus, and SARS

Several viruses were selected to further test the SAVE approach. Table 8 identifies the coding regions of each of Dengue, HIV, Rotavirus (two segments), and SARS, and provides nucleotide sequences for parent viruses and exemplary viral genome sequences having deoptimized codon pair bias. As described above, codon pair bias is determined for a coding sequence, even though only a portion (subsequence) may contain the deoptimizing mutations.

TABLE 8

Nucleotide sequence and codon pair bias of parent and codon pair bias-reduced coding regions

| Virus | Parent SEQ ID NO: | Parent CDS | Parent CPB | Codon pair bias-reduced SEQ ID NO: | Deoptimized segment* | Codon pair bias-reduced CPB* |
|---|---|---|---|---|---|---|
| Flu PB1 | 13 | 25-2298 | 0.0415 | 14 | 531-2143 | −0.2582 |
| Flu PB1-RR | 13 | 25-2298 | 0.0415 | 15 | 531-1488 | −0.1266 |
| Flu PB2 | 16 | 28-2307 | 0.0054 | 17 | 33-2301 | −0.3718 |
| Flu PA | 18 | 25-2175 | 0.0247 | 19 | 30-2171 | −0.3814 |
| Flu HA | 20 | 33-1730 | 0.0184 | 21 | 180-1655 | −0.3627 |
| Flu NP | 22 | 46-1542 | 0.0069 | 23 | 126-1425 | −0.3737 |
| Flu NA | 24 | 21-1385 | 0.0037 | 25 | 123-1292 | −0.3686 |
| Flu M | 26 | | 0.0024 | | | |
| Flu NS | 27 | 27-719 | −0.0036 | 28 | 128-479 | −0.1864 |
| Rhinovirus 89 | 29 | 619-7113 | 0.051 | 30 | | −0.367 |
| Rhinovirus 14 | 31 | 629-7168 | 0.046 | 32 | | −0.418 |
| Dengue | 33 | 95-10273 | 0.0314 | 34 | | −0.4835 |
| HIV | 35 | 336-1634; 1841-4585; 4644-5102; 5858-7924; 8343-8963 | 0.0656 | 36 | | −0.3544 |
| Rotavirus Seg. 1 | 37 | 12-3284 | 0.0430 | 38 | | −0.2064 |
| Rotavirus Seg. 2 | 39 | 37-2691 | 0.0375 | 40 | | −0.2208 |
| SARS | 41 | 265-13398; 13416-21485; 21492-25259; 26398-27063 | 0.0286 | 42 | | −0.4393 |

*CPB can be reduced by deoptimizing an internal segment smaller than the complete coding sequence. Nevertheless, CPB is calculated for the complete CDS.

Example 16

Assessment of Poliovirus and Influenza Virus Vaccine Candidates in Mice

The ability of deoptimized viruses to vaccinate mice against polio or influenza is tested.

Poliovirus Immunizations, Antibody Titers, and Wt Challenge Experiments

The working hypothesis is that a good vaccine candidate combines a low infectivity titer with a high virion titer. This allows a large amount of virus particles (i.e., antigen) to be injected while keeping the risk profile low. Thus, groups of five CD155tg mice are injected intraperitoneally with 10³, 10⁴, 10⁵, and 10⁶ PFU of PV(Mahoney) (i.e., wild-type), PV1 Sabin vaccine strain, PV^AB2470-2954, PV-Min^755-2470, or other promising attenuated polioviruses developed during this study. For the wild-type, 1 PFU is about 100 viral particles, while for the attenuated viruses, 1 PFU is roughly 5,000 to 100,000 particles. Thus, injecting an equal number of PFUs means that 50- to 1000-fold more particles of attenuated virus are injected. For wt virus injected intraperitoneally, the LD₅₀ is about 10⁶ PFU, or about 10⁸ particles. Accordingly, some killing is expected with the highest doses but not with the lower doses.
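The particle arithmetic in the preceding paragraph can be checked directly. The sketch below only multiplies PFU by the particle-to-PFU ratios quoted in the text; the helper name is illustrative.

```python
def particles(pfu: float, particles_per_pfu: float) -> float:
    """Number of virus particles delivered in a dose of `pfu` plaque-forming units."""
    return pfu * particles_per_pfu


WT_PARTICLES_PER_PFU = 100                       # wild-type PV, from the text
ATTENUATED_PARTICLES_PER_PFU = (5_000, 100_000)  # range quoted for attenuated viruses

for ratio in ATTENUATED_PARTICLES_PER_PFU:
    fold = particles(1e6, ratio) / particles(1e6, WT_PARTICLES_PER_PFU)
    print(f"{ratio} particles/PFU: {fold:.0f}-fold more particles than wt at equal PFU")

# The wt intraperitoneal LD50 of about 10^6 PFU corresponds to about 10^8 particles.
print(f"{particles(1e6, WT_PARTICLES_PER_PFU):.0e} particles")
```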

Booster shots of the same dose are given one week and four weeks after the initial inoculation. One week after the second booster, PV-neutralizing antibody titers are determined by plaque reduction assay. For this purpose, 100 PFU of wt PV(M) virus are incubated with 2-fold serial dilutions of sera from immunized mice. The residual number of PFU is determined by plaque assays. The neutralizing antibody titer is expressed as the reciprocal of the highest serum dilution at which no plaques are observed.

Four weeks after the last booster, immunized mice and non-immunized controls are challenged with a lethal dose of PV(M) wt virus (10⁶ PFU intraperitoneally; this equals 100 times LD₅₀), and survival is monitored.

Influenza Immunizations, Antibody Titers, and Wt Challenge Experiments

For vaccination experiments, groups of 5 BALB/c mice are injected intraperitoneally with wt and attenuated influenza viruses at doses of 0.001, 0.01, 0.1, and 1.0 LD₅₀. Booster vaccinations are given at the same intervals described above for PV. Influenza antibody titers one week after the second booster are determined by hemagglutination inhibition (HI) assay following standard protocols (Kendal et al., 1982). Briefly, sera from immunized and control mice are treated with receptor-destroying enzyme (RDE; Sigma, St Louis, Mo.), 2-fold serially diluted, and mixed with 5 HA units of A/PR/8/34 virus in V-bottom 96-well plates. RBCs are then added, and plates are processed as above for the standard HA assay. Antibody titers are expressed as the reciprocal of the highest dilution that completely inhibits hemagglutination.

Three weeks after the last booster vaccination, mice are challenged intranasally with 100 or 1000 LD₅₀ of A/PR/8/34 parental virus (approximately 10⁵ and 10⁶ PFU, respectively), and survival is monitored.

Animal Handling

Transgenic mice expressing the human poliovirus receptor CD155 (CD155tg) were obtained from Dr. Nomoto, University of Tokyo. The CD155tg mouse colony is maintained by the State University of New York (SUNY) animal facility. BALB/c mice are obtained from Taconic (Germantown, N.Y.). Anesthetized mice are inoculated using 25-gauge hypodermic needles with 30 μl of viral suspension by the intravenous, intraperitoneal, or intracerebral route, or with 50 μl by the intranasal route. Mice of both sexes between 6 and 24 weeks of age are used. Mice are the most economical model system for poliovirus and influenza virus research. In addition, for PV, the CD155tg mouse line is the only animal model other than non-human primates. Mice also provide the safest animal model, since neither poliovirus nor influenza virus spreads between the animals.

All mice are housed in SUNY's state-of-the-art animal facility under the auspices of the Department of Laboratory Animal Research (DLAR) and its veterinary staff. All animals are checked twice weekly by the veterinary staff. Virus-infected animals are checked twice daily by the investigators and daily by the veterinary staff. All infection experiments are carried out in specially designated maximum isolation rooms within the animal facility. After conclusion of an experiment, surviving mice are euthanized and cadavers are sterilized by autoclaving. No mouse leaves the virus room alive.

In the present study, mice are not subjected to any surgical procedure besides intravenous, intracerebral, intraperitoneal, intramuscular or intranasal inoculation, the injection of anesthetics, and the collection of blood samples. For vaccination experiments, blood samples are taken before and after vaccination for detection of virus-specific antibodies. To this end, 50-100 μl of blood are collected from mice the day before injection and one week after the second booster vaccination. A maximum of two blood samples are collected from individual animals, at least four weeks apart. Animals are anesthetized and a sharp scalpel is used to cut off 2 mm of tail. Blood is collected with a capillary tube. Subsequent samples are obtained by removing the scab on the tail; if the tail has healed, a new 2-mm tail snip is taken.

All animal experiments are carried out following protocols approved by the SUNY Institutional Animal Care and Use Committee (IACUC). Euthanasia is performed by trained personnel in a CO₂ gas chamber according to the recommendation of the American Veterinary Medical Association. Infection experiments are conducted under the latest ABSL2/polio recommendations issued by the Centers for Disease Control and Prevention (CDC).

Example 17

Codon Pair Bias Algorithm—Codon Pair Bias and Score Matrix

In most organisms, there exists a distinct codon bias, which describes the preference for encoding amino acids with particular codons more often than with their synonyms. It is widely believed that codon bias is connected to protein translation rates. In addition, each species has specific preferences as to whether a given pair of codons appears as neighbors in gene sequences, a phenomenon called codon-pair bias.

To quantify codon pair bias, we define a codon pair distance as the log ratio of the observed over the expected number of occurrences (frequency) of codon pairs in the genes of an organism. Calculating the observed frequency of codon pairs in a set of genes is straightforward. The expected frequency of a codon pair is calculated as in Gutman and Hatfield, Proc. Natl. Acad. Sci. USA, 86:3699-3703, 1989, so that the resulting score is independent of amino acid and codon bias. To achieve this, the expected frequency is calculated from the relative proportion of the number of times an amino acid is encoded by a specific codon. In short:

$\text{codon pair score} = \log\left(\frac{F(AB)}{\frac{F(A) \times F(B)}{F(X) \times F(Y)} \times F(XY)}\right),$

where the codon pair AB encodes for amino acid pair XY and F denotes frequency (number of occurrences).

In this scheme we can define a 64×64 codon-pair distance matrix with all the pairwise costs as defined above. Any m-residue protein can be rated as using over- or under-represented codon pairs by the average of the codon pair scores that comprise its encoding.
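The codon pair score and the per-protein average can be sketched as below. This is a minimal illustration, assuming the natural logarithm for "log", counting codons and codon pairs directly from a list of in-frame coding sequences, and scoring only pairs that actually occur (an unobserved pair has F(AB) = 0, so its log ratio is undefined). The function names are hypothetical, and `codon_pair_bias` treats pairs missing from the table as 0, which is an assumption of this sketch.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(("".join(p) for p in product("TCAG", repeat=3)), AA))


def split_codons(seq):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_pair_scores(genes):
    """ln(F(AB) / (F(A)F(B) / (F(X)F(Y)) * F(XY))) for every observed codon pair AB."""
    codon_n, aa_n, pair_n, aa_pair_n = Counter(), Counter(), Counter(), Counter()
    for gene in genes:
        codons = split_codons(gene)
        for c in codons:
            codon_n[c] += 1
            aa_n[CODE[c]] += 1
        for a, b in zip(codons, codons[1:]):
            pair_n[a, b] += 1
            aa_pair_n[CODE[a], CODE[b]] += 1
    scores = {}
    for (a, b), f_ab in pair_n.items():
        x, y = CODE[a], CODE[b]
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[x, y]
        scores[a, b] = math.log(f_ab / expected)
    return scores


def codon_pair_bias(seq, scores):
    """Average codon pair score over the consecutive codon pairs of `seq`."""
    codons = split_codons(seq)
    pairs = list(zip(codons, codons[1:]))
    return sum(scores.get(p, 0.0) for p in pairs) / len(pairs)


# Toy reference set: two Lys-Lys "genes".
scores = codon_pair_scores(["AAAAAA", "AAGAAG"])
print(scores[("AAA", "AAA")])  # ln 2: observed once, expected 0.5 times
```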

Optimization of a Gene Encoding Based on Codon Pair Bias

To examine the effects of codon pair bias on the translation of specific proteins, we decided to change the codon pairs while keeping the same codon distribution. So we define the following problem: Given an amino acid sequence and a set of codon frequencies (codon distribution), change the DNA encoding of the sequence such that the codon pair score is optimized (usually minimized or maximized).

Our problem, as defined above, can be associated with the Traveling Salesman Problem (TSP). The traveling salesman problem is the most notorious NP-complete problem. This is a function of its general usefulness, and because it is easy to explain to the public at large. Imagine a traveling salesman who has to visit each of a given set of cities by car. What is the shortest route that will enable him to do so and return home, thus minimizing his total driving?

TSP Heuristics

Almost any flavor of TSP is going to be NP-complete, so the right way to proceed is with heuristics. These are often quite successful, typically coming within a few percent of the optimal solution, which is close enough for most applications and in particular for our optimized encoding.

Minimum spanning trees—A simple and popular heuristic, especially when the sites represent points in the plane, is based on the minimum spanning tree of the points. By doing a depth-first search of this tree, we walk over each edge of the tree exactly twice, once going down when we discover the new vertex and once going up when we backtrack. We can then define a tour of the vertices according to the order in which they were discovered and use the shortest path between each neighboring pair of vertices in this order to connect them. This path must be a single edge if the graph is complete and obeys the triangle inequality, as with points in the plane. The resulting tour is always at most twice the length of the minimum TSP tour. In practice, it is usually better, typically 15% to 20% over optimal. Further, the time of the algorithm is bounded by that of computing the minimum spanning tree, only O(n lg n) in the case of points in the plane.
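A compact sketch of the minimum spanning tree heuristic for points in the plane follows. For brevity it uses Prim's algorithm on the complete Euclidean graph, which takes O(n²) time rather than the O(n lg n) achievable for planar points, and then visits vertices in depth-first (preorder) discovery order, which shortcuts every repeated vertex of the doubled-tree walk. Function names are illustrative.

```python
import math


def mst_tour(points):
    """Tour from a preorder walk of a minimum spanning tree (at most 2x optimal)."""
    n = len(points)
    if n == 0:
        return []
    dist = lambda i, j: math.dist(points[i], points[j])
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):  # Prim's algorithm
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u
    # Depth-first search; each vertex is listed when first discovered.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


def tour_length(points, tour):
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
tour = mst_tour(square)
print(tour, tour_length(square, tour))
```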

Incremental insertion methods—A different class of heuristics inserts new points into a partial tour one at a time (starting from a single vertex) until the tour is complete. The version of this heuristic that seems to work best is furthest point insertion: of all remaining points, insert the point v into partial tour T such that

$\max_{v \in V} \min_{i=1}^{|T|} \left( d(v, v_i) + d(v, v_{i+1}) \right).$

The minimum ensures that we insert the vertex in the position that adds the smallest amount of distance to the tour, while the maximum ensures that we pick the worst such vertex first. This seems to work well because it first “roughs out” a partial tour before filling in details. Typically, such tours are only 5% to 10% longer than optimal.
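Furthest point insertion, using exactly the selection quantity in the formula above (the minimum over tour edges of d(v, v_i) + d(v, v_{i+1}), with the tour closed back to its start), can be sketched as follows. Common textbook variants instead subtract d(v_i, v_{i+1}) to measure the true added length; this sketch follows the formula as written and starts from a single vertex. Names are illustrative.

```python
import math


def furthest_insertion(points):
    """Build a tour by repeatedly inserting the point whose cheapest insertion is costliest."""
    n = len(points)
    if n == 0:
        return []
    d = lambda i, j: math.dist(points[i], points[j])
    tour, remaining = [0], set(range(1, n))
    while remaining:
        best_v, best_pos, best_score = None, None, -math.inf
        for v in sorted(remaining):
            # Inner min: cheapest edge (tour[i], tour[i+1]) to insert v into.
            pos, score = min(
                ((i, d(v, tour[i]) + d(v, tour[(i + 1) % len(tour)])) for i in range(len(tour))),
                key=lambda item: item[1],
            )
            # Outer max: pick the vertex whose cheapest insertion is largest.
            if score > best_score:
                best_v, best_pos, best_score = v, pos, score
        tour.insert(best_pos + 1, best_v)
        remaining.remove(best_v)
    return tour


def tour_length(points, tour):
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(furthest_insertion(square))
```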

k-optimal tours—Substantially more powerful are the Lin-Kernighan, or k-opt, class of heuristics. Starting from an arbitrary tour, the method applies local refinements to the tour in the hope of improving it. In particular, subsets of k≥2 edges are deleted from the tour and the k remaining subchains rewired in a different way to see if the resulting tour is an improvement. A tour is k-optimal when no subset of k edges can be deleted and rewired so as to reduce the cost of the tour. Extensive experiments suggest that 3-optimal tours are usually within a few percent of the cost of optimal tours. For k>3, the computation time increases considerably faster than solution quality. Two-opting a tour is a fast and effective way to improve any other heuristic. Simulated annealing provides an alternate mechanism to employ edge flips to improve heuristic tours.
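A minimal 2-opt refinement, the k = 2 case described above, can be sketched as follows: two non-adjacent edges (a, b) and (c, e) are replaced by (a, c) and (b, e) whenever that shortens the tour, which amounts to reversing the segment between them. The loop repeats until the tour is 2-optimal. Names are illustrative.

```python
import math


def two_opt(points, tour):
    """Improve a closed tour with 2-opt moves until no move shortens it."""
    tour = list(tour)
    n = len(tour)
    d = lambda i, j: math.dist(points[i], points[j])
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share vertex tour[0]
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                if d(a, c) + d(b, e) < d(a, b) + d(c, e) - 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


def tour_length(points, tour):
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
crossed = [0, 2, 1, 3]                      # self-intersecting tour, length 2 + 2*sqrt(2)
better = two_opt(square, crossed)
print(better, tour_length(square, better))  # uncrossed tour of length 4
```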

Algorithm for Solving the Optimum Encoding Problem

Our problem as defined is associated with the problem of finding a traveling salesman path (not tour) under a 64-country metric. In this formulation, each of the 64 possible codons is analogous to a country, and the codon multiplicity is modeled as the number of cities in the country. The codon-pair bias measure is reflected in the country distance matrix.

The real biological problem, designing genes that encode specific proteins using a given set of codon multiplicities so as to optimize the gene/DNA sequence under a codon-pair bias measure, is slightly different. What the country TSP model lacks is the need to encode specific protein sequences. The DNA triplet code partitions the 64 codons into 21 equivalence classes (one for each of the 20 amino acids and one for the stop signal). Any given protein/amino acid sequence can be specified by picking an arbitrary representative of the associated codon equivalence class to encode it.

Because of the special restrictions and nature of our problem, as well as the ease with which additional criteria can be added to the optimization, we selected the simulated annealing heuristic to optimize sequences. The technique is summarized below.

Simulated Annealing Heuristic

Simulated annealing is a heuristic search procedure that allows occasional transitions leading to more expensive (and hence inferior) solutions. This may not sound like a win, but it serves to help keep our search from getting stuck in local optima.

The inspiration for simulated annealing comes from the physical process of cooling molten materials down to the solid state. In thermodynamic theory, the energy state of a system is described by the energy state of each of the particles constituting it. The energy state of each particle jumps about randomly, with such transitions governed by the temperature of the system. In particular, the probability P(e_i, e_j, T) of transition from energy e_i to e_j at temperature T is given by:

$P(e_i, e_j, T) = e^{(e_i - e_j)/(k_B T)}$

where k_B is a constant, called Boltzmann's constant. What does this formula mean? Consider the value of the exponent under different conditions. When moving from a high-energy state to a lower-energy state (e_i > e_j), the exponent is positive, so the probability of the transition is very high. However, there is also a nonzero probability of accepting a transition into a higher-energy state (negative exponent), with small energy jumps much more likely than big ones. The higher the temperature, the more likely such energy jumps will occur.

What relevance does this have for combinatorial optimization? A physical system, as it cools, seeks to go to a minimum-energy state. For any discrete set of particles, minimizing the total energy is a combinatorial optimization problem. Through random transitions generated according to the above probability distribution, we can simulate the physics to solve arbitrary combinatorial optimization problems.

As with local search, the problem representation includes both a representation of the solution space and an appropriate and easily computable cost function C(s) measuring the quality of a given solution. The new component is the cooling schedule, whose parameters govern how likely we are to accept a bad transition as a function of time.

At the beginning of the search, we are eager to use randomness to explore the search space widely, so the probability of accepting a negative transition should be high. As the search progresses, we seek to limit transitions to local improvements and optimizations. The cooling schedule can be regulated by the following parameters:

Initial system temperature—Typically t₁=1.

Temperature decrement function—Typically t_k = α·t_{k−1}, where 0.8 ≤ α ≤ 0.99. This implies an exponential decay in the temperature, as opposed to a linear decay.

Number of iterations between temperature change—Typically, 100 to 1,000 iterations might be permitted before lowering the temperature.

Acceptance criteria—A typical criterion is to accept any transition from s_i to s_{i+1} when C(s_{i+1}) < C(s_i), and to accept a negative transition (C(s_{i+1}) > C(s_i)) at the current temperature t whenever

$e^{\frac{C(s_i) - C(s_{i+1})}{c \cdot t}} \geq r,$

where r is a random number 0≤r<1. The constant c normalizes this cost function, so that almost all transitions are accepted at the starting temperature.

Stop criteria—Typically, when the value of the current solution has not changed or improved within the last iteration or so, the search is terminated and the current solution reported.

In expert hands, the best problem-specific heuristics for TSP can slightly outperform simulated annealing, but the simulated annealing solution works easily and admirably.
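The pieces above can be combined into a sketch of simulated annealing for the optimum encoding problem defined earlier. The amino acid sequence and the codon distribution are both held fixed by only ever swapping two codons that encode the same amino acid; the cost is the average codon pair score, minimized or maximized. The codon pair score table is assumed to be supplied (for instance, computed with the codon pair score formula given earlier). The function names, default parameters, scoring unlisted pairs as 0, and stopping at a minimum temperature rather than on a no-improvement test are assumptions of this sketch, not the procedure used for the designs described here. Recomputing the full average after each swap costs O(n) per move; a practical version would update only the affected codon pairs.

```python
import math
import random
from collections import defaultdict
from itertools import product

AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(("".join(p) for p in product("TCAG", repeat=3)), AA))


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq):
    return "".join(CODE[c] for c in split_codons(seq.upper()))


def cpb(codons, scores):
    """Average codon pair score; pairs missing from `scores` count as 0 (assumption)."""
    pairs = list(zip(codons, codons[1:]))
    return sum(scores.get(p, 0.0) for p in pairs) / len(pairs) if pairs else 0.0


def anneal_codon_pairs(seq, scores, minimize=True, t0=1.0, alpha=0.9,
                       iters_per_temp=200, t_min=1e-3, c=1.0, seed=0):
    rng = random.Random(seed)
    codons = split_codons(seq.upper())
    sign = 1.0 if minimize else -1.0
    cost = lambda cs: sign * cpb(cs, scores)
    # Positions grouped by encoded amino acid; swapping within a group keeps
    # both the protein and the overall codon usage unchanged.
    by_aa = defaultdict(list)
    for i, codon in enumerate(codons):
        by_aa[CODE[codon]].append(i)
    groups = [g for g in by_aa.values() if len(g) >= 2]
    current = cost(codons)
    best, best_cost = list(codons), current
    t = t0
    while groups and t > t_min:
        for _ in range(iters_per_temp):
            i, j = rng.sample(rng.choice(groups), 2)
            if codons[i] == codons[j]:
                continue
            codons[i], codons[j] = codons[j], codons[i]
            new = cost(codons)
            # Accept improvements; accept worse moves if exp((C_old - C_new)/(c*t)) >= r.
            if new < current or math.exp((current - new) / (c * t)) >= rng.random():
                current = new
                if current < best_cost:
                    best, best_cost = list(codons), current
            else:
                codons[i], codons[j] = codons[j], codons[i]  # undo rejected swap
        t *= alpha  # exponential cooling, t_k = alpha * t_(k-1)
    return "".join(best)


# Toy score table: alternating Lys codons are "rare" pairs (negative scores).
scores = {("AAA", "AAG"): -1.0, ("AAG", "AAA"): -1.0,
          ("AAA", "AAA"): 1.0, ("AAG", "AAG"): 1.0}
seq = "AAAAAAAAGAAG"
out = anneal_codon_pairs(seq, scores)
print(out, cpb(split_codons(out), scores))
```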

REFERENCES

Alexander, H. E., G. Koch, I. M. Mountain, K. Sprunt, and O. Van Damme. 1958. Infectivity of ribonucleic acid of poliovirus on HeLa cell mono-layers. Virology. 5:172-3.

Altmeyer, R., A. D. Murdin, J. J. Harber, and E. Wimmer. 1991. Construction and characterization of poliovirus/rhinovirus antigenic hybrid. Virology. 184:636-44.

Ansardi, D. C., D. C. Porter, and C. D. Morrow. 1993. Complementation of a poliovirus defective genome by a recombinant vaccinia virus which provides poliovirus P1 capsid precursor in trans. J. Virol. 67:3684-3690.

Belov, G. A., L. I. Romanova, E. A. Tolskaya, M. S. Kolesnikova, Y. A. Lazebnik, and V. I. Agol. 2003. The major apoptotic pathway activated and suppressed by poliovirus. J. Virol. 77:45-56.

Buchan, J. R., L. S. Aucott, and I. Stansfield. 2006. tRNA properties help shape codon pair preferences in open reading frames. Nucl. Acids Res. 34:1015-27.

Burns, C. C., J. Shaw, R. Campagnoli, J. Jorba, A. Vincent, J. Quay, and O. Kew. 2006. Modulation of poliovirus replicative fitness in HeLa cells by deoptimization of synonymous codon usage in the capsid region. J. Virol. 80:3259-72.

Cao, X., R. J. Kuhn, and E. Wimmer. 1993. Replication of poliovirus RNA containing two VPg coding sequences leads to a specific deletion event. J. Virol. 67:5572-5578.

Carlini, D. B., and W. Stephan. 2003. In vivo introduction of unpreferred synonymous codons into the Drosophila Adh gene results in reduced levels of ADH protein. Genetics 163:239-243.

Cello, J., A. V. Paul, and E. Wimmer. 2002. Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science. 297:1016-1018.

Cheng, L., and E. Goldman. 2001. Absence of effect of varying Thr-Leu codon pairs on protein synthesis in a T7 system. Biochemistry. 40:6102-6.

Cohen, B., and S. Skiena. 2003. Natural selection and algorithmic design of mRNA. J. Comput Biol. 10:419-432.

Coligan, J., A. Kruisbeek, D. Margulies, E. Shevach, and W. Strober, eds. (1994) Current Protocols in Immunology, Wiley & Sons, Inc., New York.

Corpet, F. 1988. Multiple sequence alignment with hierarchical clustering. Nucl. Acids Res. 16:10881-90.

Cram, P., S. G. Blitz, A. Monte, and A. M. Fendrick. 2001. Influenza. Cost of illness and consideration in the economic evaluation of new and emerging therapies. Pharmacoeconomics. 19:223-30.

Crotty, S., C. E. Cameron, and R. Andino. 2001. RNA virus error catastrophe: direct molecular test by using ribavirin. Proc. Natl. Acad. Sci. U.S.A. 98:6895-6900.

Curran, J. F., E. S. Poole, W. P. Tate, and B. L. Gross. 1995. Selection of aminoacyl-tRNAs at sense codons: the size of the tRNA variable loop determines whether the immediate 3′ nucleotide to the codon has a context effect. Nucl. Acids Res. 23:4104-4108.

Doma, M. K., and R. Parker. 2006. Endonucleolytic cleavage of eukaryotic mRNAs with stalls in translation elongation. Nature 440:561-564.

Dove, A. W., and V. R. Racaniello. 1997. Cold-adapted poliovirus mutants bypass a postentry replication block. J. Virol. 71:4728-4735.

Enami, M., W. Luytjes, M. Krystal, and P. Palese. 1990. Introduction of site-specific mutations into the genome of influenza virus. Proc. Natl. Acad. Sci. U.S.A. 87:3802-3805.

Farabaugh, P. J. 1996. Programmed translational frameshifting. Microbiol. Rev. 60:103-134.

Fedorov, A., S. Saxonov, and W. Gilbert. 2002. Regularities of context-dependent codon bias in eukaryotic genes. Nucl. Acids Res. 30:1192-1197.

Fodor, E., L. Devenish, O. G. Engelhardt, P. Palese, G. G. Brownlee, and A. Garcia-Sastre. 1999. Rescue of influenza A virus from recombinant DNA. J. Virol. 73:9679-9682.

Gabow, H. 1973. Ph.D. thesis. Stanford University, Stanford, Calif.

Garcia-Sastre, A., and P. Palese. 1993. Genetic manipulation of negative-strand RNA virus genomes. Annu. Rev. Microbiol. 47:765-790.

Georgescu, M. M., J. Balanant, A. Macadam, D. Otelea, M. Combiescu, A. A. Combiescu, R. Crainic, and F. Delpeyroux. 1997. Evolution of the Sabin type 1 poliovirus in humans: characterization of strains isolated from patients with vaccine-associated paralytic poliomyelitis. J. Virol. 71:7758-7768.

Gerber, K., E. Wimmer, and A. V. Paul. 2001. Biochemical and genetic studies of the initiation of human rhinovirus 2 RNA replication: identification of a cis-replicating element in the coding sequence of 2A(pro). J. Virol. 75:10979-10990.

Girard, S., T. Couderc, J. Destombes, D. Thiesson, F. Delpeyroux, and B. Blondel. 1999. Poliovirus induces apoptosis in the mouse central nervous system. J. Virol. 73:6066-6072.

Goodfellow, I., Y. Chaudhry, A. Richardson, J. Meredith, J. W. Almond, W. Barclay, and D. J. Evans. 2000. Identification of a cis-acting replication element within the poliovirus coding region. J. Virol. 74:4590-4600.

Greve, J. M., G. Davis, A. M. Meyer, C. P. Forte, S. C. Yost, C. W. Marlor, M. E. Kamarck, and A. McClelland. 1989. The major human rhinovirus receptor is ICAM-1. Cell 56:839-847.

Gustafsson, C., S. Govindarajan, and J. Minshull. 2004. Codon bias and heterologous protein expression. Trends Biotechnol. 22:346-353.

Gutman, G. A., and G. W. Hatfield. 1989. Nonrandom utilization of codon pairs in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 86:3699-3703.

He, Y., V. D. Bowman, S. Mueller, C. M. Bator, J. Bella, X. Peng, T. S. Baker, E. Wimmer, R. J. Kuhn, and M. G. Rossmann. 2000. Interaction of the poliovirus receptor with poliovirus. Proc. Natl. Acad. Sci. U.S.A. 97:79-84.

Hendley, J. O. 1999. Clinical virology of rhinoviruses. Adv. Virus Res. 54:453-466.

Herold, J., and R. Andino. 2000. Poliovirus requires a precise 5′ end for efficient positive-strand RNA synthesis. J. Virol. 74:6394-6400.

Hoekema, A., R. A. Kastelein, M. Vasser, and H. A. de Boer. 1987. Codon replacement in the PGK1 gene of Saccharomyces cerevisiae: experimental approach to study the role of biased codon usage in gene expression. Mol. Cell. Biol. 7:2914-2924.

Hofer, F., M. Gruenberger, H. Kowalski, H. Machat, M. Huettinger, E. Kuechler, and D. Blaas. 1994. Members of the low density lipoprotein receptor family mediate cell entry of a minor-group common cold virus. Proc. Natl. Acad. Sci. U.S.A. 91:1839-1842.

Hoffmann, E., G. Neumann, Y. Kawaoka, G. Hobom, and R. G. Webster. 2000. A DNA transfection system for generation of influenza A virus from eight plasmids. Proc. Natl. Acad. Sci. U.S.A. 97:6108-6113.

Hogle, J. M. 2002. Poliovirus cell entry: common structural themes in viral cell entry pathways. Annu. Rev. Microbiol. 56:677-702.

Holland, J. J., E. Domingo, J. C. de la Torre, and D. A. Steinhauer. 1990. Mutation frequencies at defined single codon sites in vesicular stomatitis virus and poliovirus can be increased only slightly by chemical mutagenesis. J. Virol. 64:3960-3962.

Hsiao, L. L., F. Dangond, T. Yoshida, R. Hong, R. V. Jensen et al. 2001. A compendium of gene expression in normal human tissues. Physiol. Genomics 7:97-104.

Irwin, B., J. D. Heck, and G. W. Hatfield. 1995. Codon pair utilization biases influence translational elongation step times. J. Biol. Chem. 270:22801-22806.

Jang, S. K., M. V. Davies, R. J. Kaufman, and E. Wimmer. 1989. Initiation of protein synthesis by internal entry of ribosomes into the 5′ nontranslated region of encephalomyocarditis virus RNA in vitro. J. Virol. 63:1651-1660.

Jayaraj, S., R. Reid, and D. V. Santi. 2005. GeMS: an advanced software package for designing synthetic genes. Nucl. Acids Res. 33:3011-3016.

Johansen, L. K., and C. D. Morrow. 2000. The RNA encompassing the internal ribosome entry site in the poliovirus 5′ nontranslated region enhances the encapsidation of genomic RNA. Virology 273:391-399.

Joklik, W., and J. Darnell. 1961. The adsorption and early fate of purified poliovirus in HeLa cells. Virology 13:439-447.

Kamps, B. S., C. Hoffmann, and W. Preiser (eds.). 2006. Influenza Report 2006. Flying Publisher.

Kaplan, G., and V. R. Racaniello. 1988. Construction and characterization of poliovirus subgenomic replicons. J. Virol. 62:1687-1696.

Karlin, S., W. Doerfler, and L. R. Cardon. 1994. Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? J. Virol. 68:2889-2897.

Kendal, A. P., J. J. Skehel, and M. S. Pereira (eds.). 1982. Concepts and procedures for laboratory-based influenza surveillance. World Health Organization Collaborating Centers for Reference and Research on Influenza, Geneva.

Kew, O., V. Morris-Glasgow, M. Landaverde, C. Burns, J. Shaw, Z. Garib, J. Andre, E. Blackman, C. J. Freeman, J. Jorba, R. Sutter, G. Tambini, L. Venczel, C. Pedreira, F. Laender, H. Shimizu, T. Yoneyama, T. Miyamura, H. van Der Avoort, M. S. Oberste, D. Kilpatrick, S. Cochi, M. Pallansch, and C. de Quadros. 2002. Outbreak of poliomyelitis in Hispaniola associated with circulating type 1 vaccine-derived poliovirus. Science 296:356-359.

Kilbourne, E. D. 2006. Influenza pandemics of the 20th century. Emerg. Infect. Dis. 12:9-14.

Kitamura, N., B. L. Semler, P. G. Rothberg, G. R. Larsen, C. J. Adler, A. J. Dorner, E. A. Emini, R. Hanecak, J. Lee, S. van der Werf, C. W. Anderson, and E. Wimmer. 1981. Primary structure, gene organization and polypeptide expression of poliovirus RNA. Nature 291:547-553.

Koike, S., C. Taya, T. Kurata, S. Abe, I. Ise, H. Yonekawa, and A. Nomoto. 1991. Transgenic mice susceptible to poliovirus. Proc. Natl. Acad. Sci. U.S.A. 88:951-955.

Landsteiner, K., and E. Popper. 1909. Übertragung der Poliomyelitis acuta auf Affen. Z. Immunitätsforsch. Orig. 2:377-390.

Lavner, Y., and D. Kotlar. 2005. Codon bias as a factor in regulating expression via translation rate in the human genome. Gene 345:127-138.

Ledford, R. M., N. R. Patel, T. M. Demenczuk, A. Watanyar, T. Herbertz, M. S. Collett, and D. C. Pevear. 2004. VP1 sequencing of all human rhinovirus serotypes: insights into genus phylogeny and susceptibility to antiviral capsid-binding compounds. J. Virol. 78:3663-3674.

Luytjes, W., M. Krystal, M. Enami, J. D. Parvin, and P. Palese. 1989. Amplification, expression, and packaging of foreign gene by influenza virus. Cell 59:1107-1113.

McKnight, K. L. 2003. The human rhinovirus internal cis-acting replication element (cre) exhibits disparate properties among serotypes. Arch. Virol. 148:2397-2418.

Molla, A., A. V. Paul, and E. Wimmer. 1991. Cell-free, de novo synthesis of poliovirus. Science 254:1647-1651.

Mueller, S., D. Papamichail, J. R. Coleman, S. Skiena, and E. Wimmer. 2006. Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. J. Virol. 80:9687-9696.

Mueller, S., E. Wimmer, and J. Cello. 2005. Poliovirus and poliomyelitis: a tale of guts, brains, and an accidental event. Virus Res. 111:175-193.

Murdin, A. D., and E. Wimmer. 1989. Construction of a poliovirus type 1/type 2 antigenic hybrid by manipulation of neutralization antigenic site II. J. Virol. 63:5251-5257.

Neumann, G., T. Watanabe, H. Ito, S. Watanabe, H. Goto, P. Gao, M. Hughes, D. R. Perez, R. Donis, E. Hoffmann, G. Hobom, and Y. Kawaoka. 1999. Generation of influenza A viruses entirely from cloned cDNAs. Proc. Natl. Acad. Sci. U.S.A. 96:9345-9350.

Neznanov, N., K. M. Chumakov, L. Neznanova, A. Almasan, A. K. Banerjee, and A. V. Gudkov. 2005. Proteolytic cleavage of the p65-RelA subunit of NF-kappaB during poliovirus infection. J. Biol. Chem. 280:24153-24158.

Palese, P., and M. L. Shaw. 2007. Orthomyxoviridae: the viruses and their replication, p. 1647-1689. In D. M. Knipe and P. M. Howley (ed.), Fields virology. Lippincott Williams & Wilkins, Philadelphia, Pa.

Park, S., X. Yang, and J. G. Saven. 2004. Advances in computational protein design. Curr. Opin. Struct. Biol. 14:487-494.

Paul, A. V., J. A. Mugavero, A. Molla, and E. Wimmer. 1998. Internal ribosomal entry site scanning of the poliovirus polyprotein: implications for proteolytic processing. Virology 250:241-253.

Pelletier, J., and N. Sonenberg. 1988. Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus RNA. Nature 334:320-325.

Pfister, T., and E. Wimmer. 1999. Characterization of the nucleoside triphosphatase activity of poliovirus protein 2C reveals a mechanism by which guanidine inhibits poliovirus replication. J. Biol. Chem. 274:6992-7001.

Plotkin, J. B., H. Robins, and A. J. Levine. 2004. Tissue-specific codon usage and the expression of human genes. Proc. Natl. Acad. Sci. U.S.A. 101:12588-12591.

Racaniello, V. R., and D. Baltimore. 1981. Cloned poliovirus complementary DNA is infectious in mammalian cells. Science 214:916-919.

Reed, L. J., and H. Muench. 1938. A simple method for estimating fifty percent endpoints. Am. J. Hyg. 27:493-497.

Richardson, S. M., S. J. Wheelan, R. M. Yarrington, and J. D. Boeke. 2006. GeneDesign: rapid, automated design of multikilobase synthetic genes. Genome Res. 16:550-556.

Robinson, M., R. Lilley, S. Little, J. S. Emtage, G. Yarranton, P. Stephens, A. Millican, M. Eaton, and G. Humphreys. 1984. Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucl. Acids Res. 12:6663-6671.

Rothberg, E. 1985. wmatch: a C program to solve maximum-weight matching. [Online.]

Rueckert, R. R. 1985. Picornaviruses and their replication, p. 705-738. In B. N. Fields, D. M. Knipe, R. M. Chanock, J. L. Melnick, B. Roizman, and R. E. Shope (ed.), Fields virology, vol. 1. Raven Press, New York, N.Y.

Russell, C. J., and R. G. Webster. 2005. The genesis of a pandemic influenza virus. Cell. 123:368-371.

Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

Sánchez, G., A. Bosch, and R. M. Pinto. 2003. Genome variability and capsid structural constraints of hepatitis A virus. J. Virol. 77:452-459.

Savolainen, C., S. Blomqvist, and T. Hovi. 2003. Human rhinoviruses. Paediatr. Respir. Rev. 4:91-98.

Schwerdt, C., and J. Fogh. 1957. The ratio of physical particles per infectious unit observed for poliomyelitis viruses. Virology 4:41-52.

Shimizu, H., B. Thorley, F. J. Paladin, K. A. Brussen, and V. Stambos et al. 2004. Circulation of type 1 vaccine-derived poliovirus in the Philippines in 2001. J. Virol. 78:13512-13521.

Simonsen, L., T. A. Reichert, C. Viboud, W. C. Blackwelder, R. J. Taylor, and M. A. Miller. 2005. Impact of influenza vaccination on seasonal mortality in the US elderly population. Arch. Intern. Med. 165:265-272.

Skiena, S. S. 2001. Designing better phages. Bioinformatics 17(Suppl. 1):S253-S261.

Steinhauer, D. A., and J. J. Skehel. 2002. Genetics of influenza viruses. Annu. Rev. Genet. 36:305-332.

Stephenson, I., and J. Democratis. 2006. Influenza: current threat from avian influenza. Br. Med. Bull. 75-76:63-80.

Svitkin, Y. V., G. A. Alpatova, G. A. Lipskaya, S. V. Maslova, V. I. Agol, O. Kew, K. Meerovitch, and N. Sonenberg. 1993. Towards development of an in vitro translation test for poliovirus neurovirulence. Dev. Biol. Stand. 78:27-32.

Svitkin, Y. V., S. V. Maslova, and V. I. Agol. 1985. The genomes of attenuated and virulent poliovirus strains differ in their in vitro translation efficiencies. Virology 147:243-252.

Talon, J., M. Salvatore, R. E. O'Neill, Y. Nakaya, H. Zheng, T. Muster, A. Garcia-Sastre, and P. Palese. 2000. Influenza A and B viruses expressing altered NS1 proteins: A vaccine approach. Proc. Natl. Acad. Sci. U.S.A. 97:4309-4314.

Thompson, W. W., D. K. Shay, E. Weintraub, L. Brammer, N. Cox, L. J. Anderson, and K. Fukuda. 2003. Mortality associated with influenza and respiratory syncytial virus in the United States. JAMA 289:179-186.

Tian, J., H. Gong, N. Shang, X. Zhou, E. Gulari, X. Gao, and G. Church. 2004. Accurate multiplex gene synthesis from programmable DNA microchips. Nature 432:1050-1054.

Tolskaya, E. A., L. I. Romanova, M. S. Kolesnikova, T. A. Ivannikova, E. A. Smirnova, N. T. Raikhlin, and V. I. Agol. 1995. Apoptosis-inducing and apoptosis-preventing functions of poliovirus. J. Virol. 69:1181-1189.

Toyoda, H., J. Yin, S. Mueller, E. Wimmer, and J. Cello. 2007. Oncolytic treatment and cure of neuroblastoma by a novel attenuated poliovirus in a novel poliovirus-susceptible animal model. Cancer Res. 67:2857-2864.

van der Werf, S., J. Bradley, E. Wimmer, F. W. Studier, and J. J. Dunn. 1986. Synthesis of infectious poliovirus RNA by purified T7 RNA polymerase. Proc. Natl. Acad. Sci. U.S.A. 83:2330-2334.

Wahby, A. F. 2000. Combined cell culture enzyme-linked immunosorbent assay for quantification of poliovirus neutralization-relevant antibodies. Clin. Diagn. Lab. Immunol. 7:915-919.

Wang, B., D. Papamichail, S. Mueller, and S. Skiena. 2006. Two proteins for the price of one: the design of maximally compressed coding sequences. Natural Computing, Eleventh International Meeting on DNA Based Computers (DNA11), 2005. Lecture Notes in Computer Science (LNCS) 3892:387-398.

Zhao, W. D., and E. Wimmer. 2001. Genetic analysis of a poliovirus/hepatitis C virus chimera: new structure for domain II of the internal ribosomal entry site of hepatitis C virus. J. Virol. 75:3719-3730.

Zhou, J., W. J. Liu, S. W. Peng, X. Y. Sun, and I. Frazer. 1999. Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability. J. Virol. 73:4972-4982.

Zolotukhin, S., M. Potter, W. W. Hauswirth, J. Guy, and N. Muzyczka. 1996. A “humanized” green fluorescent protein cDNA adapted for high-level expression in mammalian cells. J. Virol. 70:4646-4654.
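In the table that follows, each CPS value appears to equal the natural logarithm of the Observed/Expected ratio in the same row (for instance, ln(2870 / 630.04) ≈ 1.516 for the alanine-alanine pair GCG-GCG). The Python sketch below computes CPS that way and, as an illustrative assumption, scores a whole in-frame coding sequence by averaging CPS over its consecutive codon pairs (a codon pair bias, or CPB). The dictionary `PAIR_COUNTS` is hypothetical and holds only six rows copied from the table; skipping pairs absent from it is a simplification for the example, not part of the source.

```python
import math

# Hypothetical subset: six (expected, observed) rows copied from the table.
# A real use would load every codon pair in the table.
PAIR_COUNTS = {
    "GCGGCG": (630.04, 2870),    # AA
    "GCCGCA": (4961.56, 1122),   # AA
    "GATGAA": (5114.33, 10045),  # DE
    "GACGAA": (5777.22, 1341),   # DE
    "GAAGCC": (6740.57, 5649),   # EA
    "GAAGCG": (1822.52, 982),    # EA
}


def cps(expected, observed):
    """Codon pair score: natural logarithm of observed / expected."""
    return math.log(observed / expected)


def codon_pairs(seq):
    """Consecutive, overlapping codon pairs of an in-frame coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [a + b for a, b in zip(codons, codons[1:])]


def cpb(seq, counts=PAIR_COUNTS):
    """Mean CPS over the sequence's codon pairs; pairs missing from counts are skipped."""
    scores = [cps(*counts[p]) for p in codon_pairs(seq) if p in counts]
    return sum(scores) / len(scores) if scores else 0.0
```

With these rows, `cpb("GATGAAGCGGCG")` is positive because its pairs are mostly over-represented (GAT-GAA, GCG-GCG), while `cpb("GACGAAGCCGCA")`, built only from under-represented pairs, is negative.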

AA pair  Codon pair  Expected  Observed  Observed/Expected  CPS
AA       GCGGCG      630.04    2870      4.555              1.516
AA       GCGGCC      2330.20   4032      1.730              0.548
AA       GCTGCT      3727.41   5562      1.492              0.400
AA       GCAGCA      2856.40   4196      1.469              0.385
AA       GCAGCT      3262.97   4711      1.444              0.367
AA       GCTGCA      3262.97   4357      1.335              0.289
AA       GCTGCC      5667.77   7014      1.238              0.213
AA       GCAGCC      4961.56   6033      1.216              0.196
AA       GCAGCG      1341.51   1420      1.059              0.057
AA       GCTGCG      1532.46   1533      1.000              0.000
AA       GCGGCT      1532.46   1472      0.961              −0.040
AA       GCCGCG      2330.20   2042      0.876              −0.132
AA       GCGGCA      1341.51   1142      0.851              −0.161
AA       GCCGCC      8618.21   5141      0.597              −0.517
AA       GCCGCT      5667.77   1378      0.243              −1.414
AA       GCCGCA      4961.56   1122      0.226              −1.487
AC       GCCTGC      2333.61   3975      1.703              0.533
AC       GCCTGT      1965.56   2436      1.239              0.215
AC       GCGTGC      630.96    560       0.888              −0.119
AC       GCTTGT      1292.65   1142      0.883              −0.124
AC       GCATGT      1131.59   881       0.779              −0.250
AC       GCGTGT      531.45    322       0.606              −0.501
AC       GCTTGC      1534.70   894       0.583              −0.540
AC       GCATGC      1343.47   554       0.412              −0.886
AD       GCAGAT      2373.33   4215      1.776              0.574
AD       GCTGAT      2711.15   3887      1.434              0.360
AD       GCTGAC      3062.55   4374      1.428              0.356
AD       GCGGAC      1259.11   1625      1.291              0.255
AD       GCAGAC      2680.95   3395      1.266              0.236
AD       GCGGAT      1114.64   839       0.753              −0.284
AD       GCCGAC      4656.80   2726      0.585              −0.535
AD       GCCGAT      4122.47   920       0.223              −1.500
AE       GCAGAA      3517.48   5814      1.653              0.503
AE       GCAGAG      4703.98   7094      1.508              0.411
AE       GCGGAG      2209.23   3171      1.435              0.361
AE       GCTGAG      5373.53   7362      1.370              0.315
AE       GCTGAA      4018.14   5186      1.291              0.255
AE       GCCGAG      8170.80   5082      0.622              −0.475
AE       GCGGAA      1651.99   949       0.574              −0.554
AE       GCCGAA      6109.85   1097      0.180              −1.717
AF       GCCTTC      4447.90   7382      1.660              0.507
AF       GCATTT      2237.22   2332      1.042              0.041
AF       GCTTTT      2555.66   2580      1.010              0.009
AF       GCCTTT      3886.04   3842      0.989              −0.011
AF       GCTTTC      2925.16   2315      0.791              −0.234
AF       GCGTTC      1202.63   636       0.529              −0.637
AF       GCGTTT      1050.71   518       0.493              −0.707
AF       GCATTC      2560.68   1261      0.492              −0.708
AG       GCGGGC      1369.64   2638      1.926              0.655
AG       GCGGGG      986.17    1738      1.762              0.567
AG       GCTGGG      2398.67   3855      1.607              0.474
AG       GCTGGT      1590.73   2524      1.587              0.462
AG       GCTGGA      2457.02   3783      1.540              0.432
AG       GCAGGA      2150.87   3074      1.429              0.357
AG       GCAGGG      2099.79   2782      1.325              0.281
AG       GCAGGT      1392.52   1748      1.255              0.227
AG       GCTGGC      3331.38   3961      1.189              0.173
AG       GCAGGC      2916.28   3119      1.070              0.067
AG       GCGGGT      654.00    617       0.943              −0.058
AG       GCGGGA      1010.16   793       0.785              −0.242
AG       GCCGGG      3647.33   2240      0.614              −0.488
AG       GCCGGC      5065.58   2977      0.588              −0.532
AG       GCCGGT      2418.80   581       0.240              −1.426
AG       GCCGGA      3736.06   795       0.213              −1.547
AH       GCGCAC      748.29    983       1.314              0.273
AH       GCCCAC      2767.53   3465      1.252              0.225
AH       GCTCAT      1319.86   1471      1.115              0.108
AH       GCACAT      1155.40   1122      0.971              −0.029
AH       GCCCAT      2006.93   1827      0.910              −0.094
AH       GCTCAC      1820.07   1526      0.838              −0.176
AH       GCACAC      1593.29   1312      0.823              −0.194
AH       GCGCAT      542.64    248       0.457              −0.783
AI       GCCATC      3894.51   7798      2.002              0.694
AI       GCCATT      3079.73   3761      1.221              0.200
AI       GCAATA      815.43    924       1.133              0.125
AI       GCAATT      1773.02   1684      0.950              −0.052
AI       GCCATA      1416.41   1257      0.887              −0.119
AI       GCTATT      2025.39   1709      0.844              −0.170
AI       GCTATA      931.50    771       0.828              −0.189
AI       GCTATC      2561.23   1194      0.466              −0.763
AI       GCGATT      832.70    373       0.448              −0.803
AI       GCAATC      2242.09   984       0.439              −0.824
AI       GCGATA      382.97    149       0.389              −0.944
AI       GCGATC      1053.00   404       0.384              −0.958
AK       GCCAAG      5767.01   9818      1.702              0.532
AK       GCAAAA      2563.57   3011      1.175              0.161
AK       GCCAAA      4452.91   4794      1.077              0.074
AK       GCAAAG      3320.10   3044      0.917              −0.087
AK       GCTAAA      2928.46   2022      0.690              −0.370
AK       GCGAAG      1559.29   765       0.491              −0.712
AK       GCTAAG      3792.68   1725      0.455              −0.788
AK       GCGAAA      1203.98   409       0.340              −1.080
AL       GCGCTG      2369.16   4619      1.950              0.668
AL       GCGCTC      1140.05   1765      1.548              0.437
AL       GCTTTG      1873.51   2601      1.388              0.328
AL       GCCCTG      8762.30   11409     1.302              0.264
AL       GCCTTG      2848.79   3695      1.297              0.260
AL       GCTTTA      1115.24   1385      1.242              0.217
AL       GCCCTC      4216.45   4499      1.067              0.065
AL       GCTCTT      1912.07   2038      1.066              0.064
AL       GCATTA      976.28    986       1.010              0.010
AL       GCTCTA      1031.16   940       0.912              −0.093
AL       GCACTT      1673.82   1444      0.863              −0.148
AL       GCATTG      1640.07   1364      0.832              −0.184
AL       GCACTA      902.68    747       0.828              −0.189
AL       GCGCTA      423.94    342       0.807              −0.215
AL       GCCCTA      1567.95   1228      0.783              −0.244
AL       GCTCTG      5762.53   4505      0.782              −0.246
AL       GCCCTT      2907.42   2230      0.767              −0.265
AL       GCTCTC      2772.95   2036      0.734              −0.309
AL       GCCTTA      1695.80   1205      0.711              −0.342
AL       GCACTG      5044.51   3522      0.698              −0.359
AL       GCGTTG      770.26    476       0.618              −0.481
AL       GCGCTT      786.11    459       0.584              −0.538
AL       GCACTC      2427.43   1415      0.583              −0.540
AL       GCGTTA      458.51    169       0.369              −0.998
AM       GCCATG      4236.47   6521      1.539              0.431
AM       GCAATG      2438.96   1900      0.779              −0.250
AM       GCTATG      2786.11   1561      0.560              −0.579
AM       GCGATG      1145.46   625       0.546              −0.606
AN       GCCAAC      3190.28   5452      1.709              0.536
AN       GCAAAT      1667.60   2282      1.368              0.314
AN       GCCAAT      2896.62   3122      1.078              0.075
AN       GCAAAC      1836.66   1512      0.823              −0.195
AN       GCTAAT      1904.97   1356      0.712              −0.340
AN       GCTAAC      2098.09   925       0.441              −0.819
AN       GCGAAC      862.59    331       0.384              −0.958
AN       GCGAAT      783.19    260       0.332              −1.103
AP       GCGCCG      406.74    1172      2.881              1.058
AP       GCGCCC      1122.56   2271      2.023              0.705
AP       GCCCCG      1504.34   2335      1.552              0.440
AP       GCTCCA      2360.19   2463      1.044              0.043
AP       GCTCCT      2445.47   2548      1.042              0.041
AP       GCCCCC      4151.78   3957      0.953              −0.048
AP       GCACCT      2140.76   2028      0.947              −0.054
AP       GCCCCA      3588.82   3371      0.939              −0.063
AP       GCACCA      2066.10   1831      0.886              −0.121
AP       GCACCC      2390.20   2111      0.883              −0.124
AP       GCCCCT      3718.49   3269      0.879              −0.129
AP       GCTCCC      2730.42   2384      0.873              −0.136
AP       GCTCCG      989.33    773       0.781              −0.247
AP       GCGCCT      1005.41   778       0.774              −0.256
AP       GCACCG      866.06    571       0.659              −0.417
AP       GCGCCA      970.35    595       0.613              −0.489
AQ       GCCCAG      7143.67   9550      1.337              0.290
AQ       GCGCAG      1931.51   2101      1.088              0.084
AQ       GCACAA      1472.79   1416      0.961              −0.039
AQ       GCTCAA      1682.42   1522      0.905              −0.100
AQ       GCTCAG      4698.04   4141      0.881              −0.126
AQ       GCACAG      4112.65   3374      0.820              −0.198
AQ       GCCCAA      2558.23   1943      0.760              −0.275
AQ       GCGCAA      691.70    244       0.353              −1.042
AR       GCGCGC      580.17    1255      2.163              0.772
AR       GCGCGG      634.54    1175      1.852              0.616
AR       GCCCGG      2346.82   3946      1.681              0.520
AR       GCCCGC      2145.76   3135      1.461              0.379
AR       GCCAGG      2323.57   3242      1.395              0.333
AR       GCAAGA      1362.59   1559      1.144              0.135
AR       GCTCGA      836.64    943       1.127              0.120
AR       GCCCGA      1272.16   1418      1.115              0.109
AR       GCCCGT      918.67    935       1.018              0.018
AR       GCTCGT      604.17    595       0.985              −0.015
AR       GCCAGA      2366.81   2219      0.938              −0.064
AR       GCTCGG      1543.39   1295      0.839              −0.175
AR       GCGCGT      248.39    205       0.825              −0.192
AR       GCAAGG      1337.69   1089      0.814              −0.206
AR       GCGAGG      628.25    486       0.774              −0.257
AR       GCACGA      732.39    533       0.728              −0.318
AR       GCTCGC      1411.16   941       0.667              −0.405
AR       GCGCGA      343.97    226       0.657              −0.420
AR       GCACGT      528.89    338       0.639              −0.448
AR       GCACGG      1351.08   859       0.636              −0.453
AR       GCACGC      1235.33   619       0.501              −0.691
AR       GCTAGA      1556.53   714       0.459              −0.779
AR       GCGAGA      639.94    263       0.411              −0.889
AR       GCTAGG      1528.10   487       0.319              −1.144
AS       GCCTCG      963.41    1977      2.052              0.719
AS       GCGTCG      260.49    465       1.785              0.579
AS       GCCAGC      4127.58   6466      1.567              0.449
AS       GCCTCC      3643.21   5443      1.494              0.401
AS       GCTTCT      2084.25   2488      1.194              0.177
AS       GCCAGT      2604.12   3085      1.185              0.169
AS       GCATCT      1824.55   2154      1.181              0.166
AS       GCTTCA      1684.99   1932      1.147              0.137
AS       GCGTCC      985.05    1079      1.095              0.091
AS       GCATCA      1475.04   1531      1.038              0.037
AS       GCCTCT      3169.23   3235      1.021              0.021
AS       GCCTCA      2562.14   2514      0.981              −0.019
AS       GCTTCC      2395.96   2295      0.958              −0.043
AS       GCAAGT      1499.21   1307      0.872              −0.137
AS       GCTTCG      633.59    516       0.814              −0.205
AS       GCATCC      2097.42   1658      0.790              −0.235
AS       GCATCG      554.64    403       0.727              −0.319
AS       GCGTCT      856.90    521       0.608              −0.498
AS       GCGAGC      1116.02   595       0.533              −0.629
AS       GCGTCA      692.75    319       0.460              −0.775
AS       GCAAGC      2376.27   1080      0.454              −0.789
AS       GCTAGT      1712.60   737       0.430              −0.843
AS       GCGAGT      704.10    265       0.376              −0.977
AS       GCTAGC      2714.51   673       0.248              −1.395
AT       GCCACG      1262.40   2478      1.963              0.674
AT       GCCACC      3842.98   6598      1.717              0.541
AT       GCCACA      3111.04   4031      1.296              0.259
AT       GCCACT      2751.18   3205      1.165              0.153
AT       GCAACA      1791.05   1761      0.983              −0.017
AT       GCGACG      341.33    329       0.964              −0.037
AT       GCAACT      1583.87   1509      0.953              −0.048
AT       GCTACT      1809.31   1395      0.771              −0.260
AT       GCTACA      2045.98   1528      0.747              −0.292
AT       GCGACC      1039.07   601       0.578              −0.547
AT       GCAACC      2212.43   1259      0.569              −0.564
AT       GCTACC      2527.34   1364      0.540              −0.617
AT       GCAACG      726.77    384       0.528              −0.638
AT       GCTACG      830.22    363       0.437              −0.827
AT       GCGACT      743.87    308       0.414              −0.882
AT       GCGACA      841.17    347       0.413              −0.885
AV       GCTGTT      1736.99   3025      1.742              0.555
AV       GCTGTG      4399.56   7279      1.654              0.503
AV       GCTGTA      1127.89   1750      1.552              0.439
AV       GCTGTC      2223.90   3351      1.507              0.410
AV       GCAGTA      987.35    1401      1.419              0.350
AV       GCGGTG      1808.80   2487      1.375              0.318
AV       GCAGTT      1520.56   2087      1.373              0.317
AV       GCAGTG      3851.36   4349      1.129              0.122
AV       GCGGTC      914.32    883       0.966              −0.035
AV       GCAGTC      1946.80   1806      0.928              −0.075
AV       GCCGTG      6689.81   4322      0.646              −0.437
AV       GCGGTT      714.13    423       0.592              −0.524
AV       GCGGTA      463.71    270       0.582              −0.541
AV       GCCGTC      3381.59   1798      0.532              −0.632
AV       GCCGTT      2641.21   563       0.213              −1.546
AV       GCCGTA      1715.03   329       0.192              −1.651
AW       GCCTGG      2528.22   3848      1.522              0.420
AW       GCGTGG      683.58    558       0.816              −0.203
AW       GCTTGG      1662.69   1066      0.641              −0.445
AW       GCATGG      1455.51   858       0.589              −0.529
AY       GCCTAC      2643.77   4073      1.541              0.432
AY       GCCTAT      2148.26   2457      1.144              0.134
AY       GCTTAT      1412.81   1478      1.046              0.045
AY       GCATAT      1236.77   1244      1.006              0.006
AY       GCTTAC      1738.68   1139      0.655              −0.423
AY       GCGTAC      714.83    429       0.600              −0.511
AY       GCATAC      1522.04   868       0.570              −0.562
AY       GCGTAT      580.85    310       0.534              −0.628
CA       TGTGCT      1164.04   2021      1.736              0.552
CA       TGTGCC      1769.99   2992      1.690              0.525
CA       TGTGCA      1019.00   1708      1.676              0.517
CA       TGTGCG      478.57    477       0.997              −0.003
CA       TGCGCG      568.18    502       0.884              −0.124
CA       TGCGCC      2101.42   1313      0.625              −0.470
CA       TGCGCT      1382.00   368       0.266              −1.323
CA       TGCGCA      1209.80   312       0.258              −1.355
CC       TGCTGC      1534.17   2610      1.701              0.531
CC       TGCTGT      1292.21   1571      1.216              0.195
CC       TGTTGT      1088.41   529       0.486              −0.721
CC       TGTTGC      1292.21   497       0.385              −0.956
CD       TGTGAC      1920.20   3470      1.807              0.592
CD       TGTGAT      1699.87   2853      1.678              0.518
CD       TGCGAC      2279.75   1134      0.497              −0.698
CD       TGCGAT      2018.17   461       0.228              −1.477
CE       TGTGAA      1901.69   3636      1.912              0.648
CE       TGTGAG      2543.16   3935      1.547              0.437
CE       TGCGAG      3019.37   1709      0.566              −0.569
CE       TGCGAA      2257.78   442       0.196              −1.631
CF       TGCTTC      1891.74   2684      1.419              0.350
CF       TGCTTT      1652.78   1685      1.019              0.019
CF       TGTTTT      1392.11   1096      0.787              −0.239
CF       TGTTTC      1593.38   1065      0.668              −0.403
CG       TGTGGG      1594.78   3240      2.032              0.709
CG       TGTGGA      1633.57   2846      1.742              0.555
CG       TGTGGT      1057.61   1627      1.538              0.431
CG       TGTGGC      2214.90   3133      1.415              0.347
CG       TGCGGG      1893.40   1137      0.601              −0.510
CG       TGCGGC      2629.63   1461      0.556              −0.588
CG       TGCGGT      1255.64   344       0.274              −1.295
CG       TGCGGA      1939.46   431       0.222              −1.504
CH       TGCCAC      1618.50   2144      1.325              0.281
CH       TGCCAT      1173.68   1253      1.068              0.065
CH       TGTCAT      988.58    831       0.841              −0.174
CH       TGTCAC      1363.24   916       0.672              −0.398
CI       TGCATC      1821.04   2813      1.545              0.435
CI       TGCATT      1440.05   1579      1.096              0.092
CI       TGCATA      662.30    576       0.870              −0.140
CI       TGTATA      557.84    474       0.850              −0.163
CI       TGTATT      1212.94   927       0.764              −0.269
CI       TGTATC      1533.83   859       0.560              −0.580
CK       TGCAAG      2777.53   3348      1.205              0.187
CK       TGCAAA      2144.62   2441      1.138              0.129
CK       TGTAAA      1806.38   1770      0.980              −0.020
CK       TGTAAG      2339.47   1509      0.645              −0.438
CL       TGCCTC      1722.14   2468      1.433              0.360
CL       TGCCTG      3578.83   4525      1.264              0.235
CL       TGTTTA      583.38    704       1.207              0.188
CL       TGCCTT      1187.49   1384      1.165              0.153
CL       TGTTTG      980.04    1079      1.101              0.096
CL       TGCTTG      1163.55   1179      1.013              0.013
CL       TGTCTT      1000.21   940       0.940              −0.062
CL       TGCCTA      640.41    585       0.913              −0.090
CL       TGTCTA      539.40    481       0.892              −0.115
CL       TGCTTA      692.62    565       0.816              −0.204
CL       TGTCTC      1450.53   1010      0.696              −0.362
CL       TGTCTG      3014.39   1633      0.542              −0.613
CM       TGCATG      1518.22   1979      1.304              0.265
CM       TGTATG      1278.78   818       0.640              −0.447
CN       TGCAAC      1825.04   2351      1.288              0.253
CN       TGCAAT      1657.05   1636      0.987              −0.013
CN       TGTAAT      1395.71   1349      0.967              −0.034
CN       TGTAAC      1537.20   1079      0.702              −0.354
CP       TGCCCG      687.28    978       1.423              0.353
CP       TGCCCC      1896.80   2279      1.201              0.184
CP       TGCCCA      1639.61   1728      1.054              0.053
CP       TGCCCT      1698.85   1690      0.995              −0.005
CP       TGTCCT      1430.91   1333      0.932              −0.071
CP       TGTCCA      1381.01   1263      0.915              −0.089
CP       TGTCCC      1597.65   1369      0.857              −0.154
CP       TGTCCG      578.88    271       0.468              −0.759
CQ       TGCCAG      3338.89   4321      1.294              0.258
CQ       TGCCAA      1195.69   1319      1.103              0.098
CQ       TGTCAA      1007.11   905       0.899              −0.107
CQ       TGTCAG      2812.30   1809      0.643              −0.441
CR       TGCCGC      1031.52   1860      1.803              0.590
CR       TGCCGG      1128.18   1543      1.368              0.313
CR       TGCAGG      1117.00   1450      1.298              0.261
CR       TGCCGT      441.63    541       1.225              0.203
CR       TGCCGA      611.56    742       1.213              0.193
CR       TGCAGA      1137.78   1252      1.100              0.096
CR       TGTCGA      515.11    458       0.889              −0.118
CR       TGTCGT      371.98    308       0.828              −0.189
CR       TGTAGA      958.34    570       0.595              −0.520
CR       TGTCGC      868.83    497       0.572              −0.559
CR       TGTCGG      950.24    463       0.487              −0.719
CR       TGTAGG      940.83    389       0.413              −0.883
CS       TGCAGC      1990.73   3150      1.582              0.459
CS       TGCTCC      1757.12   2397      1.364              0.311
CS       TGCAGT      1255.97   1701      1.354              0.303
CS       TGCTCG      464.65    571       1.229              0.206
CS       TGTTCT      1287.45   1184      0.920              −0.084
CS       TGCTCT      1528.52   1393      0.911              −0.093
CS       TGTTCA      1040.83   932       0.895              −0.110
CS       TGCTCA      1235.72   1079      0.873              −0.136
CS       TGTTCC      1479.99   1102      0.745              −0.295
CS       TGTAGT      1057.88   699       0.661              −0.414
CS       TGTTCG      391.37    192       0.491              −0.712
CS       TGTAGC      1676.76   767       0.457              −0.782
CT       TGCACG      535.88    829       1.547              0.436
CT       TGCACC      1631.31   2321      1.423              0.353
CT       TGCACA      1320.60   1508      1.142              0.133
CT       TGCACT      1167.85   1185      1.015              0.015
CT       TGTACT      983.66    802       0.815              −0.204
CT       TGTACA      1112.32   830       0.746              −0.293
CT       TGTACC      1374.02   942       0.686              −0.377
CT       TGTACG      451.36    160       0.354              −1.037
CV       TGTGTC      1064.94   1821      1.710              0.536
CV       TGTGTT      831.78    1383      1.663              0.508
CV       TGTGTA      540.10    866       1.603              0.472
CV       TGTGTG      2106.78   3241      1.538              0.431
CV       TGCGTG      2501.27   1537      0.614              −0.487
CV       TGCGTC      1264.35   734       0.581              −0.544
CV       TGCGTT      987.53    219       0.222              −1.506
CV       TGCGTA      641.24    137       0.214              −1.543
CW       TGCTGG      1275.05   1842      1.445              0.368
CW       TGTTGG      1073.95   507       0.472              −0.751
CY       TGCTAC      1379.34   1995      1.446              0.369
CY       TGCTAT      1120.82   1170      1.044              0.043
CY       TGTTAT      944.05    653       0.692              −0.369
CY       TGTTAC      1161.80   788       0.678              −0.388
DA       GATGCT      2675.13   5292      1.978              0.682
DA       GATGCA      2341.80   3898      1.665              0.510
DA       GATGCC      4067.71   5983      1.471              0.386
DA       GACGCG      1242.39   1116      0.898              −0.107
DA       GATGCG      1099.83   972       0.884              −0.124
DA       GACGCC      4594.94   2668      0.581              −0.544
DA       GACGCA      2645.34   852       0.322              −1.133
DA       GACGCT      3021.87   908       0.300              −1.202
DC       GACTGC      2386.86   3465      1.452              0.373
DC       GACTGT      2010.41   2804      1.395              0.333
DC       GATTGT      1779.74   1163      0.653              −0.425
DC       GATTGC      2112.99   858       0.406              −0.901
DD       GATGAT      4271.42   7846      1.837              0.608
DD       GATGAC      4825.06   7181      1.488              0.398
DD       GACGAC      5450.46   2965      0.544              −0.609
DD       GACGAT      4825.06   1380      0.286              −1.252
DE       GATGAA      5114.33   10045     1.964              0.675
DE       GATGAG      6839.48   9573      1.400              0.336
DE       GACGAG      7725.97   4498      0.582              −0.541
DE       GACGAA      5777.22   1341      0.232              −1.461
DF       GACTTC      4696.28   6094      1.298              0.261
DF       GACTTT      4103.05   4250      1.036              0.035
DF       GATTTT      3632.26   3485      0.959              −0.041
DF       GATTTC      4157.42   2760      0.664              −0.410
DG       GATGGT      1910.36   3443      1.802              0.589
DG       GATGGA      2950.72   5133      1.740              0.554
DG       GATGGG      2880.65   4437      1.540              0.432
DG       GATGGC      4000.77   5419      1.354              0.303
DG       GACGGC      4519.33   2987      0.661              −0.414
DG       GACGGG      3254.02   1979      0.608              −0.497
DG       GACGGT      2157.97   723       0.335              −1.094
DG       GACGGA      3333.18   886       0.266              −1.325
DH       GACCAC      2653.74   3480      1.311              0.271
DH       GACCAT      1924.41   2014      1.047              0.046
DH       GATCAT      1703.60   1623      0.953              −0.048
DH       GATCAC      2349.25   1514      0.644              −0.439
DI       GACATC      4715.94   6532      1.385              0.326
DI       GACATT      3729.31   4087      1.096              0.092
DI       GATATT      3301.40   3271      0.991              −0.009
DI       GATATA      1518.36   1495      0.985              −0.016
DI       GACATA      1715.16   1565      0.912              −0.092
DI       GATATC      4174.83   2205      0.528              −0.638
DK       GACAAG      5562.52   7324      1.317              0.275
DK       GACAAA      4295.02   4794      1.116              0.110
DK       GATAAA      3802.20   3855      1.014              0.014
DK       GATAAG      4924.27   2611      0.530              −0.634
DL       GACCTC      3785.97   5029      1.328              0.284
DL       GACTTG      2557.95   3396      1.328              0.283
DL       GATTTA      1347.95   1740      1.291              0.255
DL       GACCTG      7867.71   9796      1.245              0.219
DL       GATTTG      2264.44   2687      1.187              0.171
DL       GACCTT      2610.58   2774      1.063              0.061
DL       GATCTT      2311.04   2416      1.045              0.044
DL       GACCTA      1407.87   1416      1.006              0.006
DL       GACTTA      1522.66   1403      0.921              −0.082
DL       GATCTA      1246.33   1020      0.818              −0.200
DL       GATCTC      3351.56   2214      0.661              −0.415
DL       GATCTG      6964.95   3348      0.481              −0.733
DM       GACATG      4089.63   5411      1.323              0.280
DM       GATATG      3620.37   2299      0.635              −0.454
DN       GACAAC      3511.00   4849      1.381              0.323
DN       GACAAT      3187.82   3349      1.051              0.049
DN       GATAAT      2822.05   2549      0.903              −0.102
DN       GATAAC      3108.14   1882      0.606              −0.502
DP       GACCCC      3732.11   5119      1.372              0.316
DP       GACCCG      1352.28   1692      1.251              0.224
DP       GACCCT      3342.62   3700      1.107              0.102
DP       GATCCT      2959.08   3111      1.051              0.050
DP       GACCCA      3226.05   3205      0.993              −0.007
DP       GATCCA      2855.89   2349      0.823              −0.195
DP       GATCCC      3303.88   2338      0.708              −0.346
DP       GATCCG      1197.11   455       0.380              −0.967
DQ       GACCAG      5250.37   6524      1.243              0.217
DQ       GACCAA      1880.22   2169      1.154              0.143
DQ       GATCAA      1664.48   1808      1.086              0.083
DQ       GATCAG      4647.93   2942      0.633              −0.457
DR       GACCGC      1807.77   2634      1.457              0.376
DR       GACAGA      1994.00   2869      1.439              0.364
DR       GACAGG      1957.57   2730      1.395              0.333
DR       GACCGT      773.97    1029      1.330              0.285
DR       GACCGG      1977.16   2568      1.299              0.261
DR       GACCGA      1071.78   1292      1.205              0.187
DR       GATCGA      948.80    923       0.973              −0.028
DR       GATCGT      685.16    626       0.914              −0.090
DR       GATAGA      1765.20   1123      0.636              −0.452
DR       GATCGG      1750.30   859       0.491              −0.712
DR       GATCGC      1600.34   754       0.471              −0.753
DR       GATAGG      1732.96   658       0.380              −0.968
DS       GACTCG      918.57    1527      1.662              0.508
DS       GACAGC      3935.48   6143      1.561              0.445
DS       GACAGT      2482.92   3657      1.473              0.387
DS       GATTCT      2675.01   2968      1.110              0.104
DS       GACTCC      3473.65   3800      1.094              0.090
DS       GATTCA      2162.59   2129      0.984              −0.016
DS       GACTCA      2442.89   2382      0.975              −0.025
DS       GACTCT      3021.73   2910      0.963              −0.038
DS       GATTCC      3075.07   2186      0.711              −0.341
DS       GATAGT      2198.02   1355      0.616              −0.484
DS       GATTCG      813.17    414       0.509              −0.675
DS       GATAGC      3483.91   1212      0.348              −1.056
DT       GACACG      1110.58   1842      1.659              0.506
DT       GACACC      3380.79   4666      1.380              0.322
DT       GACACA      2736.88   3538      1.293              0.257
DT       GACACT      2420.30   2688      1.111              0.105
DT       GATACT      2142.59   1731      0.808              −0.213
DT       GATACA      2422.85   1788      0.738              −0.304
DT       GATACC      2992.87   1586      0.530              −0.635
DT       GATACG      983.15    351       0.357              −1.030
DV       GATGTT      1957.96   3699      1.889              0.636
DV       GATGTA      1271.37   2214      1.741              0.555
DV       GATGTC      2506.81   3869      1.543              0.434
DV       GATGTG      4959.23   6668      1.345              0.296
DV       GACGTG      5602.02   3616      0.645              −0.438
DV       GACGTC      2831.73   1654      0.584              −0.538
DV       GACGTT      2211.73   672       0.304              −1.191
DV       GACGTA      1436.16   385       0.268              −1.316
DW       GACTGG      2619.27   3853      1.471              0.386
DW       GATTGG      2318.73   1085      0.468              −0.759
DY       GACTAC      3307.71   3930      1.188              0.172
DY       GATTAT      2379.36   2608      1.096              0.092
DY       GACTAT      2687.76   2853      1.061              0.060
DY       GATTAC      2928.18   1912      0.653              −0.426
EA       GAGGCG      2437.29   3179      1.304              0.266
EA       GAAGCA      3880.59   4844      1.248              0.222
EA       GAAGCT      4432.94   5143      1.160              0.149
EA       GAGGCC      9014.27   9805      1.088              0.084
EA       GAGGCT      5928.25   5314      0.896              −0.109
EA       GAGGCA      5189.57   4530      0.873              −0.136
EA       GAAGCC      6740.57   5649      0.838              −0.177
EA       GAAGCG      1822.52   982       0.539              −0.618
EC       GAATGT      2182.58   3541      1.622              0.484
EC       GAGTGT      2918.80   2792      0.957              −0.044
EC       GAGTGC      3465.35   2987      0.862              −0.149
EC       GAATGC      2591.27   1838      0.709              −0.343
ED       GAAGAT      6605.82   9691      1.467              0.383
ED       GAGGAC      9979.09   9684      0.970              −0.030
ED       GAAGAC      7462.02   6820      0.914              −0.090
ED       GAGGAT      8834.07   6686      0.757              −0.279

EE
GAAGAA
10747.11
14461
1.346
0.297

EE
GAGGAG
19220.31
21731
1.131
0.123

EE
GAAGAG
14372.29
11875
0.826
−0.191

EE
GAGGAA
14372.29
10645
0.741
−0.300

EF
GAATTT
3136.91
4237
1.351
0.301

EF
GAGTTC
4801.58
4739
0.987
−0.013

EF
GAGTTT
4195.05
4095
0.976
−0.024

EF
GAATTC
3590.46
2653
0.739
−0.303

EG
GAAGGA
3358.73
5032
1.498
0.404

EG
GAAGGT
2174.51
2839
1.306
0.267

EG
GAAGGG
3278.97
3559
1.085
0.082

EG
GAGGGC
6090.10
6505
1.068
0.066

EG
GAAGGC
4553.97
4340
0.953
−0.048

EG
GAGGGG
4385.02
3795
0.865
−0.145

EG
GAGGGT
2908.01
2378
0.818
−0.201

EG
GAGGGA
4491.69
2793
0.622
−0.475

EH
GAACAT
2017.28
2539
1.259
0.230

EH
GAGCAC
3720.16
4190
1.126
0.119

EH
GAGCAT
2697.74
2448
0.907
−0.097

EH
GAACAC
2781.81
2040
0.733
−0.310

EI
GAAATA
1687.78
3007
1.782
0.578

EI
GAAATT
3669.78
4788
1.305
0.266

EI
GAGATC
6206.03
6191
0.998
−0.002

EI
GAGATT
4907.66
3978
0.811
−0.210

EI
GAGATA
2257.09
1785
0.791
−0.235

EI
GAAATC
4640.66
3620
0.780
−0.248

EK
GAGAAG
12729.57
15133
1.189
0.173

EK
GAAAAA
7349.75
7522
1.023
0.023

EK
GAGAAA
9828.94
9127
0.929
−0.074

EK
GAAAAG
9518.74
7645
0.803
−0.219

EL
GAGCTG
10945.64
15625
1.428
0.356

EL
GAATTA
1584.03
2256
1.424
0.354

EL
GAACTA
1464.61
1830
1.249
0.223

EL
GAACTT
2715.79
3371
1.241
0.216

EL
GAGCTC
5267.08
5877
1.116
0.110

EL
GAGCTA
1958.64
2049
1.046
0.045

EL
GAATTG
2661.03
2335
0.877
−0.131

EL
GAGCTT
3631.87
3084
0.849
−0.164

EL
GAGTTG
3558.64
2719
0.764
−0.269

EL
GAACTC
3938.54
2632
0.668
−0.403

EL
GAGTTA
2118.35
1357
0.641
−0.445

EL
GAACTG
8184.78
4894
0.598
−0.514

EM
GAAATG
4983.92
5010
1.005
0.005

EM
GAGATG
6665.08
6639
0.996
−0.004

EN
GAAAAT
4791.73
6977
1.456
0.376

EN
GAGAAC
7057.70
6756
0.957
−0.044

EN
GAAAAC
5277.51
4930
0.934
−0.068

EN
GAGAAT
6408.07
4872
0.760
−0.274

EP
GAGCCG
1650.94
2438
1.477
0.390

EP
GAGCCC
4556.38
6270
1.376
0.319

EP
GAGCCT
4080.86
4236
1.038
0.037

EP
GAGCCA
3938.55
4067
1.033
0.032

EP
GAACCA
2945.12
2684
0.911
−0.093

EP
GAACCT
3051.53
2547
0.835
−0.181

EP
GAACCC
3407.10
2106
0.618
−0.481

EP
GAACCG
1234.52
517
0.419
−0.870

EQ
GAACAA
2579.50
3396
1.317
0.275

EQ
GAGCAG
9632.80
11185
1.161
0.149

EQ
GAGCAA
3449.61
3185
0.923
−0.080

EQ
GAACAG
7203.08
5099
0.708
−0.345

ER
GAAAGA
2650.27
3769
1.422
0.352

ER
GAGAGG
3479.50
4315
1.240
0.215

ER
GAGCGG
3514.32
4356
1.240
0.215

ER
GAGCGC
3213.23
3682
1.146
0.136

ER
GAAAGG
2601.85
2679
1.030
0.029

ER
GAGAGA
3544.25
3633
1.025
0.025

ER
GAGCGT
1375.70
1286
0.935
−0.067

ER
GAACGT
1028.70
894
0.869
−0.140

ER
GAACGA
1424.52
1188
0.834
−0.182

ER
GAGCGA
1905.04
1562
0.820
−0.199

ER
GAACGG
2627.88
1333
0.507
−0.679

ER
GAACGC
2402.74
1071
0.446
−0.808

ES
GAAAGT
2081.93
3138
1.507
0.410

ES
GAGAGC
4413.03
5786
1.311
0.271

ES
GAGAGT
2784.21
3237
1.163
0.151

ES
GAGTCG
1030.03
1174
1.140
0.131

ES
GAATCT
2533.73
2812
1.110
0.104

ES
GAATCA
2048.37
2131
1.040
0.040

ES
GAAAGC
3299.91
2880
0.873
−0.136

ES
GAGTCC
3895.16
3392
0.871
−0.138

ES
GAGTCT
3388.40
2799
0.826
−0.191

ES
GAGTCA
2739.33
2198
0.802
−0.220

ES
GAATCC
2912.67
1943
0.667
−0.405

ES
GAATCG
770.22
407
0.528
−0.638

ET
GAGACG
1658.42
2190
1.321
0.278

ET
GAAACA
3056.09
3851
1.260
0.231

ET
GAAACT
2702.59
3224
1.193
0.176

ET
GAGACC
5048.51
5514
1.092
0.088

ET
GAGACA
4086.97
3619
0.885
−0.122

ET
GAGACT
3614.21
3028
0.838
−0.177

ET
GAAACC
3775.11
2950
0.781
−0.247

ET
GAAACG
1240.11
806
0.650
−0.431

EV
GAAGTA
1580.16
2675
1.693
0.526

EV
GAAGTT
2433.50
3724
1.530
0.425

EV
GAGGTG
8242.83
9074
1.101
0.096

EV
GAAGTC
3115.66
2860
0.918
−0.086

EV
GAGGTC
4166.62
3741
0.898
−0.108

EV
GAAGTG
6163.71
5122
0.831
−0.185

EV
GAGGTT
3254.36
2359
0.725
−0.322

EV
GAGGTA
2113.17
1515
0.717
−0.333

EW
GAGTGG
3085.08
3238
1.050
0.048

EW
GAATGG
2306.92
2154
0.934
−0.069

EY
GAATAT
2307.55
3428
1.486
0.396

EY
GAGTAC
3797.72
3796
1.000
0.000

EY
GAGTAT
3085.93
2596
0.841
−0.173

EY
GAATAC
2839.80
2211
0.779
−0.250

FA
TTTGCA
1643.98
3299
2.007
0.696

FA
TTTGCT
1877.98
3746
1.995
0.690

FA
TTTGCC
2855.59
4348
1.523
0.420

FA
TTTGCG
772.10
622
0.806
−0.216

FA
TTCGCG
883.73
598
0.677
−0.391

FA
TTCGCC
3268.46
1802
0.551
−0.595

FA
TTCGCT
2149.50
516
0.240
−1.427

FA
TTCGCA
1881.67
402
0.214
−1.543

FC
TTCTGC
2058.60
3045
1.479
0.391

FC
TTCTGT
1733.93
2055
1.185
0.170

FC
TTTTGT
1514.90
1159
0.765
−0.268

FC
TTTTGC
1798.56
847
0.471
−0.753

FD
TTTGAT
2786.65
5380
1.931
0.658

FD
TTTGAC
3147.84
4737
1.505
0.409

FD
TTCGAC
3602.96
1746
0.485
−0.724

FD
TTCGAT
3189.55
864
0.271
−1.306

FE
TTTGAA
3016.02
6247
2.071
0.728

FE
TTTGAG
4033.37
6066
1.504
0.408

FE
TTCGAG
4616.53
2165
0.469
−0.757

FE
TTCGAA
3452.08
640
0.185
−1.685

FF
TTCTTC
3429.53
5168
1.507
0.410

FF
TTCTTT
2996.32
2989
0.998
−0.002

FF
TTTTTT
2617.83
1937
0.740
−0.301

FF
TTTTTC
2996.32
1946
0.649
−0.432

FG
TTTGGA
2068.21
4271
2.065
0.725

FG
TTTGGT
1339.00
2552
1.906
0.645

FG
TTTGGG
2019.09
3449
1.708
0.535

FG
TTTGGC
2804.20
3462
1.235
0.211

FG
TTCGGG
2311.02
1292
0.559
−0.581

FG
TTCGGC
3209.64
1648
0.513
−0.667

FG
TTCGGT
1532.60
419
0.273
−1.297

FG
TTCGGA
2367.24
558
0.236
−1.445

FH
TTCCAC
2463.48
3200
1.299
0.262

FH
TTTCAT
1560.78
1697
1.087
0.084

FH
TTCCAT
1786.44
1866
1.045
0.044

FH
TTTCAC
2152.30
1200
0.558
−0.584

FI
TTCATC
3454.46
5156
1.493
0.400

FI
TTCATT
2731.75
2953
1.081
0.078

FI
TTTATT
2386.67
2296
0.962
−0.039

FI
TTTATA
1097.66
950
0.865
−0.144

FI
TTCATA
1256.36
1035
0.824
−0.194

FI
TTTATC
3018.10
1555
0.515
−0.663

FK
TTCAAG
4090.45
5137
1.256
0.228

FK
TTCAAA
3158.38
3245
1.027
0.027

FK
TTTAAA
2759.42
2762
1.001
0.001

FK
TTTAAG
3573.75
2438
0.682
−0.382

FL
TTCCTC
3228.53
4426
1.371
0.315

FL
TTCCTG
6709.28
8734
1.302
0.264

FL
TTTTTA
1134.45
1334
1.176
0.162

FL
TTTCTT
1945.00
2267
1.166
0.153

FL
TTCCTA
1200.58
1280
1.066
0.064

FL
TTTCTA
1048.92
1087
1.036
0.036

FL
TTCTTG
2181.32
2239
1.026
0.026

FL
TTCCTT
2226.21
2150
0.966
−0.035

FL
TTTTTG
1905.78
1799
0.944
−0.058

FL
TTCTTA
1298.47
1144
0.881
−0.127

FL
TTTCTC
2820.70
1904
0.675
−0.393

FL
TTTCTG
5861.77
3197
0.545
−0.606

FM
TTCATG
2804.11
3662
1.306
0.267

FM
TTTATG
2449.89
1592
0.650
−0.431

FN
TTCAAC
2855.47
3919
1.372
0.317

FN
TTTAAT
2265.13
2185
0.965
−0.036

FN
TTCAAT
2592.63
2456
0.947
−0.054

FN
TTTAAC
2494.77
1648
0.661
−0.415

FP
TTCCCG
961.40
1205
1.253
0.226

FP
TTTCCT
2076.25
2539
1.223
0.201

FP
TTCCCC
2653.35
3099
1.168
0.155

FP
TTTCCA
2003.85
2141
1.068
0.066

FP
TTCCCA
2293.57
2310
1.007
0.007

FP
TTCCCT
2376.44
2379
1.001
0.001

FP
TTTCCC
2318.18
1529
0.660
−0.416

FP
TTTCCG
839.96
321
0.382
−0.962

FQ
TTCCAG
5468.69
7069
1.293
0.257

FQ
TTTCAA
1711.02
1803
1.054
0.052

FQ
TTCCAA
1958.40
1980
1.011
0.011

FQ
TTTCAG
4777.89
3064
0.641
−0.444

FR
TTCCGC
1531.47
2588
1.690
0.525

FR
TTCCGA
907.97
1410
1.553
0.440

FR
TTCCGG
1674.97
2451
1.463
0.381

FR
TTCCGT
655.68
893
1.362
0.309

FR
TTCAGA
1689.24
1852
1.096
0.092

FR
TTCAGG
1658.38
1810
1.091
0.087

FR
TTTCGA
793.28
850
1.072
0.069

FR
TTTCGT
572.85
490
0.855
−0.156

FR
TTTAGA
1475.86
947
0.642
−0.444

FR
TTTAGG
1448.90
691
0.477
−0.740

FR
TTTCGG
1463.39
688
0.470
−0.755

FR
TTTCGC
1338.02
540
0.404
−0.907

FS
TTCTCC
2990.83
4507
1.507
0.410

FS
TTCAGC
3388.47
4577
1.351
0.301

FS
TTCAGT
2137.80
2692
1.259
0.231

FS
TTCTCG
790.89
910
1.151
0.140

FS
TTTTCT
2273.08
2536
1.116
0.109

FS
TTCTCT
2601.73
2741
1.054
0.052

FS
TTTTCA
1837.65
1903
1.036
0.035

FS
TTCTCA
2103.34
1997
0.949
−0.052

FS
TTTTCC
2613.03
1872
0.716
−0.334

FS
TTTAGT
1867.76
1201
0.643
−0.442

FS
TTTTCG
690.99
258
0.373
−0.985

FS
TTTAGC
2960.44
1062
0.359
−1.025

FT
TTCACC
2909.29
4513
1.551
0.439

FT
TTCACG
955.69
1315
1.376
0.319

FT
TTCACT
2082.75
2494
1.197
0.180

FT
TTCACA
2355.18
2372
1.007
0.007

FT
TTTACT
1819.66
1622
0.891
−0.115

FT
TTTACA
2057.68
1485
0.722
−0.326

FT
TTTACC
2541.79
1495
0.588
−0.531

FT
TTTACG
834.97
261
0.313
−1.163

FV
TTTGTA
912.19
1711
1.876
0.629

FV
TTTGTT
1404.80
2620
1.865
0.623

FV
TTTGTC
1798.60
2635
1.465
0.382

FV
TTTGTG
3558.17
5206
1.463
0.381

FV
TTCGTG
4072.62
2589
0.636
−0.453

FV
TTCGTC
2058.64
1086
0.528
−0.640

FV
TTCGTT
1607.91
386
0.240
−1.427

FV
TTCGTA
1044.07
224
0.215
−1.539

FW
TTCTGG
2126.30
2834
1.333
0.287

FW
TTTTGG
1857.70
1150
0.619
−0.480

FY
TTCTAC
2720.70
3710
1.364
0.310

FY
TTTTAT
1931.51
2003
1.037
0.036

FY
TTCTAT
2210.77
2145
0.970
−0.030

FY
TTTTAC
2377.02
1382
0.581
−0.542

GA
GGTGCT
1531.20
2505
1.636
0.492

GA
GGGGCG
949.27
1433
1.510
0.412

GA
GGGGCC
3510.85
5061
1.442
0.366

GA
GGTGCC
2328.29
3109
1.335
0.289

GA
GGAGCA
2070.38
2678
1.293
0.257

GA
GGTGCA
1340.41
1715
1.279
0.246

GA
GGCGCG
1318.38
1659
1.258
0.230

GA
GGAGCT
2365.08
2975
1.258
0.229

GA
GGGGCT
2308.91
2850
1.234
0.211

GA
GGAGCC
3596.25
3845
1.069
0.067

GA
GGGGCA
2021.22
2074
1.026
0.026

GA
GGTGCG
629.52
501
0.796
−0.228

GA
GGAGCG
972.36
712
0.732
−0.312

GA
GGCGCC
4876.02
3121
0.640
−0.446

GA
GGCGCT
3206.72
906
0.283
−1.264

GA
GGCGCA
2807.15
688
0.245
−1.406

GC
GGCTGC
1888.96
4102
2.172
0.775

GC
GGCTGT
1591.04
2360
1.483
0.394

GC
GGTTGT
759.72
658
0.866
−0.144

GC
GGATGT
1173.45
793
0.676
−0.392

GC
GGTTGC
901.97
523
0.580
−0.545

GC
GGATGC
1393.18
655
0.470
−0.755

GC
GGGTGC
1360.09
628
0.462
−0.773

GC
GGGTGT
1145.59
495
0.432
−0.839

GD
GGGGAC
3126.50
4967
1.589
0.463

GD
GGTGAT
1835.49
2621
1.428
0.356

GD
GGTGAC
2073.40
2960
1.428
0.356

GD
GGAGAT
2835.09
3829
1.351
0.301

GD
GGAGAC
3202.56
4240
1.324
0.281

GD
GGGGAT
2767.76
2575
0.930
−0.072

GD
GGCGAC
4342.22
1955
0.450
−0.798

GD
GGCGAT
3843.98
880
0.229
−1.474

GE
GGAGAA
3433.99
5903
1.719
0.542

GE
GGGGAG
4483.27
6552
1.461
0.379

GE
GGTGAA
2223.23
3248
1.461
0.379

GE
GGAGAG
4592.33
5961
1.298
0.261

GE
GGTGAG
2973.17
2988
1.005
0.005

GE
GGGGAA
3352.44
3041
0.907
−0.098

GE
GGCGAG
6226.56
3530
0.567
−0.568

GE
GGCGAA
4656.01
718
0.154
−1.869

GF
GGCTTC
3466.22
6121
1.766
0.569

GF
GGATTT
2233.54
2666
1.194
0.177

GF
GGTTTT
1446.04
1665
1.151
0.141

GF
GGCTTT
3028.37
3201
1.057
0.055

GF
GGTTTC
1655.11
1548
0.935
−0.067

GF
GGATTC
2556.47
1534
0.600
−0.511

GF
GGGTTT
2180.50
1244
0.571
−0.561

GF
GGGTTC
2495.76
1083
0.434
−0.835

GG
GGTGGT
1061.28
2286
2.154
0.767

GG
GGTGGC
2222.59
3657
1.645
0.498

GG
GGTGGA
1639.25
2618
1.597
0.468

GG
GGAGGA
2531.97
3609
1.425
0.354

GG
GGTGGG
1600.32
2267
1.417
0.348

GG
GGGGGC
3351.47
4673
1.394
0.332

GG
GGAGGT
1639.25
2152
1.313
0.272

GG
GGAGGC
3433.00
3776
1.100
0.095

GG
GGCGGC
4654.67
4787
1.028
0.028

GG
GGGGGT
1600.32
1543
0.964
−0.036

GG
GGAGGG
2471.84
2351
0.951
−0.050

GG
GGGGGA
2471.84
1517
0.614
−0.488

GG
GGCGGG
3351.47
2001
0.597
−0.516

GG
GGGGGG
2413.14
1080
0.448
−0.804

GG
GGCGGT
2222.59
936
0.421
−0.865

GG
GGCGGA
3433.00
845
0.246
−1.402

GH
GGCCAC
2540.15
3679
1.448
0.370

GH
GGTCAT
879.57
1022
1.162
0.150

GH
GGACAT
1358.57
1438
1.058
0.057

GH
GGCCAT
1842.04
1679
0.911
−0.093

GH
GGGCAC
1828.97
1629
0.891
−0.116

GH
GGTCAC
1212.92
1008
0.831
−0.185

GH
GGACAC
1873.46
1479
0.789
−0.236

GH
GGGCAT
1326.31
928
0.700
−0.357

GI
GGCATC
3372.48
5474
1.623
0.484

GI
GGAATA
904.63
1338
1.479
0.391

GI
GGAATT
1966.96
2560
1.302
0.264

GI
GGCATT
2666.92
2670
1.001
0.001

GI
GGTATT
1273.45
1052
0.826
−0.191

GI
GGGATC
2428.27
1958
0.806
−0.215

GI
GGTATA
585.67
461
0.787
−0.239

GI
GGAATC
2487.34
1910
0.768
−0.264

GI
GGGATA
883.14
666
0.754
−0.282

GI
GGGATT
1920.24
1421
0.740
−0.301

GI
GGCATA
1226.55
885
0.722
−0.326

GI
GGTATC
1610.35
931
0.578
−0.548

GK
GGAAAA
3199.11
4553
1.423
0.353

GK
GGGAAG
4044.81
5674
1.403
0.338

GK
GGGAAA
3123.14
4119
1.319
0.277

GK
GGCAAG
5617.61
5712
1.017
0.017

GK
GGAAAG
4143.21
3706
0.894
−0.112

GK
GGCAAA
4337.55
3581
0.826
−0.192

GK
GGTAAA
2071.17
1334
0.644
−0.440

GK
GGTAAG
2682.40
540
0.201
−1.603

GL
GGCCTC
3017.19
4559
1.511
0.413

GL
GGTTTA
579.43
820
1.415
0.347

GL
GGTTTG
973.39
1294
1.329
0.285

GL
GGGCTG
4514.62
5878
1.302
0.264

GL
GGTCTT
993.42
1258
1.266
0.236

GL
GGCCTG
6270.10
7822
1.248
0.221

GL
GGGCTC
2172.45
2563
1.180
0.165

GL
GGATTA
894.98
991
1.107
0.102

GL
GGACTT
1534.44
1613
1.051
0.050

GL
GGCTTG
2038.53
2109
1.035
0.034

GL
GGCCTT
2080.48
2098
1.008
0.008

GL
GGACTA
827.51
799
0.966
−0.035

GL
GGGCTT
1497.99
1445
0.965
−0.036

GL
GGTCTC
1440.70
1365
0.947
−0.054

GL
GGTCTA
535.75
487
0.909
−0.095

GL
GGGCTA
807.86
726
0.899
−0.107

GL
GGCCTA
1121.99
968
0.863
−0.148

GL
GGCTTA
1213.47
935
0.771
−0.261

GL
GGACTC
2225.29
1656
0.744
−0.295

GL
GGATTG
1503.50
1062
0.706
−0.348

GL
GGTCTG
2993.96
2034
0.679
−0.387

GL
GGGTTG
1467.79
870
0.593
−0.523

GL
GGGTTA
873.73
467
0.534
−0.626

GL
GGACTG
4624.44
2384
0.516
−0.663

GM
GGCATG
3177.11
3953
1.244
0.219

GM
GGAATG
2343.24
2482
1.059
0.058

GM
GGGATG
2287.59
2247
0.982
−0.018

GM
GGTATG
1517.06
643
0.424
−0.858

GN
GGAAAT
2150.19
3332
1.550
0.438

GN
GGGAAC
2311.93
2816
1.218
0.197

GN
GGCAAC
3210.92
3701
1.153
0.142

GN
GGAAAC
2368.18
2679
1.131
0.123

GN
GGGAAT
2099.13
1823
0.868
−0.141

GN
GGCAAT
2915.36
2061
0.707
−0.347

GN
GGTAAT
1392.08
784
0.563
−0.574

GN
GGTAAC
1533.21
785
0.512
−0.669

GP
GGGCCC
2634.22
3947
1.498
0.404

GP
GGGCCG
954.47
1417
1.485
0.395

GP
GGCCCC
3658.52
4576
1.251
0.224

GP
GGCCCG
1325.61
1623
1.224
0.202

GP
GGTCCT
1564.62
1910
1.221
0.199

GP
GGGCCT
2359.31
2542
1.077
0.075

GP
GGTCCC
1746.93
1827
1.046
0.045

GP
GGCCCT
3276.71
2994
0.914
−0.090

GP
GGGCCA
2277.03
2003
0.880
−0.128

GP
GGTCCA
1510.06
1264
0.837
−0.178

GP
GGACCC
2698.30
2240
0.830
−0.186

GP
GGACCA
2332.42
1908
0.818
−0.201

GP
GGACCT
2416.70
1957
0.810
−0.211

GP
GGCCCA
3162.44
2548
0.806
−0.216

GP
GGTCCG
632.98
351
0.555
−0.590

GP
GGACCG
977.69
421
0.431
−0.843

GQ
GGACAA
1382.58
1677
1.213
0.193

GQ
GGGCAG
3769.06
4425
1.174
0.160

GQ
GGCCAG
5234.64
6081
1.162
0.150

GQ
GGTCAA
895.11
953
1.065
0.063

GQ
GGCCAA
1874.58
1593
0.850
−0.163

GQ
GGGCAA
1349.74
1124
0.833
−0.183

GQ
GGACAG
3860.75
3134
0.812
−0.209

GQ
GGTCAG
2499.53
1879
0.752
−0.285

GR
GGCCGC
1832.29
3615
1.973
0.680

GR
GGAAGA
1490.60
2294
1.539
0.431

GR
GGCCGG
2003.98
2892
1.443
0.367

GR
GGCCGT
784.47
1022
1.303
0.265

GR
GGTCGT
374.58
450
1.201
0.183

GR
GGCCGA
1086.32
1252
1.153
0.142

GR
GGGCGC
1319.29
1471
1.115
0.109

GR
GGTCGA
518.71
546
1.053
0.051

GR
GGCAGG
1984.13
2022
1.019
0.019

GR
GGGAGG
1428.62
1435
1.004
0.004

GR
GGGCGG
1442.91
1437
0.996
−0.004

GR
GGAAGG
1463.37
1370
0.936
−0.066

GR
GGGAGA
1455.20
1344
0.924
−0.079

GR
GGACGT
578.58
514
0.888
−0.118

GR
GGACGA
801.20
671
0.837
−0.177

GR
GGGCGT
564.84
471
0.834
−0.182

GR
GGCAGA
2021.05
1684
0.833
−0.182

GR
GGGCGA
782.17
626
0.800
−0.223

GR
GGTCGC
874.92
596
0.681
−0.384

GR
GGTCGG
956.90
555
0.580
−0.545

GR
GGTAGA
965.05
529
0.548
−0.601

GR
GGACGC
1351.39
729
0.539
−0.617

GR
GGACGG
1478.01
737
0.499
−0.696

GR
GGTAGG
947.42
244
0.258
−1.357

GS
GGCAGC
3581.32
6542
1.827
0.603

GS
GGCTCC
3161.05
5376
1.701
0.531

GS
GGCTCG
835.91
1323
1.583
0.459

GS
GGCAGT
2259.47
2875
1.272
0.241

GS
GGAAGT
1666.45
2085
1.251
0.224

GS
GGTTCT
1313.02
1563
1.190
0.174

GS
GGCTCT
2749.80
3087
1.123
0.116

GS
GGGAGC
2578.63
2566
0.995
−0.005

GS
GGTTCC
1509.39
1428
0.946
−0.055

GS
GGCTCA
2223.05
2101
0.945
−0.056

GS
GGTTCA
1061.50
981
0.924
−0.079

GS
GGAAGC
2641.36
2137
0.809
−0.212

GS
GGATCA
1639.59
1281
0.781
−0.247

GS
GGGAGT
1626.88
1267
0.779
−0.250

GS
GGATCT
2028.08
1470
0.725
−0.322

GS
GGGTCC
2276.03
1646
0.723
−0.324

GS
GGGTCT
1979.92
1280
0.646
−0.436

GS
GGGTCG
601.87
379
0.630
−0.463

GS
GGTAGT
1078.89
646
0.599
−0.513

GS
GGATCC
2331.40
1342
0.576
−0.552

GS
GGGTCA
1600.65
887
0.554
−0.590

GS
GGTTCG
399.14
209
0.524
−0.647

GS
GGATCG
616.51
276
0.448
−0.804

GS
GGTAGC
1710.07
723
0.423
−0.861

GT
GGCACC
3271.07
4870
1.489
0.398

GT
GGCACG
1074.53
1368
1.273
0.241

GT
GGGACC
2355.25
2817
1.196
0.179

GT
GGAACA
1953.05
2290
1.173
0.159

GT
GGAACT
1727.13
1900
1.100
0.095

GT
GGGACG
773.69
838
1.083
0.080

GT
GGGACA
1906.66
1903
0.998
−0.002

GT
GGCACT
2341.75
2331
0.995
−0.005

GT
GGCACA
2648.06
2499
0.944
−0.058

GT
GGGACT
1686.11
1534
0.910
−0.095

GT
GGAACC
2412.54
1841
0.763
−0.270

GT
GGTACT
1118.18
840
0.751
−0.286

GT
GGTACC
1561.93
994
0.636
−0.452

GT
GGTACA
1264.44
780
0.617
−0.483

GT
GGAACG
792.51
445
0.562
−0.577

GT
GGTACG
513.09
150
0.292
−1.230

GV
GGTGTT
816.93
1802
2.206
0.791

GV
GGTGTC
1045.94
2070
1.979
0.683

GV
GGTGTA
530.46
957
1.804
0.590

GV
GGTGTG
2069.18
3207
1.550
0.438

GV
GGAGTA
819.35
1225
1.495
0.402

GV
GGAGTT
1261.83
1841
1.459
0.378

GV
GGGGTC
1577.18
2150
1.363
0.310

GV
GGAGTC
1615.55
1839
1.138
0.130

GV
GGGGTT
1231.86
1123
0.912
−0.093

GV
GGGGTG
3120.14
2770
0.888
−0.119

GV
GGAGTG
3196.04
2641
0.826
−0.191

GV
GGGGTA
799.89
631
0.789
−0.237

GV
GGCGTC
2190.46
1653
0.755
−0.282

GV
GGCGTG
4333.39
2790
0.644
−0.440

GV
GGCGTT
1710.87
499
0.292
−1.232

GV
GGCGTA
1110.93
232
0.209
−1.566

GW
GGCTGG
2102.85
3748
1.782
0.578

GW
GGTTGG
1004.11
690
0.687
−0.375

GW
GGATGG
1550.94
1012
0.653
−0.427

GW
GGGTGG
1514.10
722
0.477
−0.741

GY
GGCTAC
2577.81
4581
1.777
0.575

GY
GGTTAT
1000.20
1309
1.309
0.269

GY
GGCTAT
2094.66
2528
1.207
0.188

GY
GGATAT
1544.90
1478
0.957
−0.044

GY
GGTTAC
1230.90
1074
0.873
−0.136

GY
GGATAC
1901.24
1052
0.553
−0.592

GY
GGGTAC
1856.09
982
0.529
−0.637

GY
GGGTAT
1508.21
710
0.471
−0.753

HA
CATGCT
1101.90
1959
1.778
0.575

HA
CATGCA
964.61
1670
1.731
0.549

HA
CATGCC
1675.52
2408
1.437
0.363

HA
CACGCG
624.72
681
1.090
0.086

HA
CATGCG
453.03
447
0.987
−0.013

HA
CACGCC
2310.52
1649
0.714
−0.337

HA
CACGCA
1330.18
617
0.464
−0.768

HA
CACGCT
1519.52
549
0.361
−1.018

HC
CACTGC
1778.65
2629
1.478
0.391

HC
CACTGT
1498.13
1717
1.146
0.136

HC
CATTGT
1086.40
673
0.619
−0.479

HC
CATTGC
1289.82
634
0.492
−0.710

HD
CATGAT
1329.76
2349
1.766
0.569

HD
CATGAC
1502.11
2329
1.550
0.439

HD
CACGAC
2071.40
1343
0.648
−0.433

HD
CACGAT
1833.73
716
0.390
−0.940

HE
CATGAA
1769.46
3512
1.985
0.686

HE
CATGAG
2366.33
3307
1.398
0.335

HE
CACGAG
3263.15
2230
0.683
−0.381

HE
CACGAA
2440.07
790
0.324
−1.128

HF
CACTTC
2538.66
3116
1.227
0.205

HF
CATTTT
1608.41
1806
1.123
0.116

HF
CACTTT
2217.98
1884
0.849
−0.163

HF
CATTTC
1840.95
1400
0.760
−0.274

HG
CATGGA
1246.72
2238
1.795
0.585

HG
CATGGT
807.15
1426
1.767
0.569

HG
CATGGG
1217.11
1849
1.519
0.418

HG
CATGGC
1690.37
2320
1.372
0.317

HG
CACGGC
2331.01
1680
0.721
−0.328

HG
CACGGG
1678.38
1184
0.705
−0.349

HG
CACGGT
1113.05
468
0.420
−0.866

HG
CACGGA
1719.21
638
0.371
−0.991

HH
CACCAC
2269.33
2795
1.232
0.208

HH
CATCAT
1193.37
1250
1.047
0.046

HH
CACCAT
1645.65
1453
0.883
−0.125

HH
CATCAC
1645.65
1256
0.763
−0.270

HI
CACATC
2433.52
3538
1.454
0.374

HI
CACATT
1924.40
1924
1.000
0.000

HI
CACATA
885.05
867
0.980
−0.021

HI
CATATT
1395.51
1260
0.903
−0.102

HI
CATATA
641.81
552
0.860
−0.151

HI
CATATC
1764.71
904
0.512
−0.669

HK
CACAAG
3102.81
3928
1.266
0.236

HK
CACAAA
2395.79
2432
1.015
0.015

HK
CATAAA
1737.35
1690
0.973
−0.028

HK
CATAAG
2250.06
1436
0.638
−0.449

HL
CATTTA
707.71
1053
1.488
0.397

HL
CATTTG
1188.90
1485
1.249
0.222

HL
CACCTG
5042.69
6030
1.196
0.179

HL
CACCTC
2426.56
2850
1.175
0.161

HL
CATCTT
1213.36
1409
1.161
0.149

HL
CACTTG
1639.48
1700
1.037
0.036

HL
CATCTA
654.36
649
0.992
−0.008

HL
CACCTT
1673.21
1499
0.896
−0.110

HL
CACCTA
902.35
761
0.843
−0.170

HL
CATCTC
1759.66
1422
0.808
−0.213

HL
CACTTA
975.93
781
0.800
−0.223

HL
CATCTG
3656.80
2202
0.602
−0.507

HM
CACATG
2348.18
3023
1.287
0.253

HM
CATATG
1702.82
1028
0.604
−0.505

HN
CACAAC
2031.88
2762
1.359
0.307

HN
CACAAT
1844.85
1832
0.993
−0.007

HN
CATAAT
1337.83
1225
0.916
−0.088

HN
CATAAC
1473.45
869
0.590
−0.528

HP
CACCCG
846.94
1341
1.583
0.460

HP
CATCCT
1518.15
1770
1.166
0.153

HP
CACCCC
2337.46
2530
1.082
0.079

HP
CATCCA
1465.21
1577
1.076
0.074

HP
CACCCA
2020.51
1919
0.950
−0.052

HP
CACCCT
2093.51
1859
0.888
−0.119

HP
CATCCC
1695.05
1265
0.746
−0.293

HP
CATCCG
614.18
330
0.537
−0.621

HQ
CATCAA
1143.96
1358
1.187
0.172

HQ
CACCAG
4405.09
4761
1.081
0.078

HQ
CATCAG
3194.43
2957
0.926
−0.077

HQ
CACCAA
1577.51
1245
0.789
−0.237

HR
CACAGG
1447.19
1936
1.338
0.291

HR
CACCGC
1336.44
1772
1.326
0.282

HR
CACAGA
1474.12
1788
1.213
0.193

HR
CACCGG
1461.67
1772
1.212
0.193

HR
CACCGT
572.18
667
1.166
0.153

HR
CATCGA
574.58
627
1.091
0.087

HR
CATCGT
414.93
452
1.089
0.086

HR
CACCGA
792.34
855
1.079
0.076

HR
CATCGG
1059.96
729
0.688
−0.374

HR
CATAGA
1068.98
635
0.594
−0.521

HR
CATCGC
969.15
565
0.583
−0.540

HR
CATAGG
1049.46
423
0.403
−0.909

HS
CACTCG
551.81
880
1.595
0.467

HS
CACAGC
2364.16
3726
1.576
0.455

HS
CACAGT
1491.56
1957
1.312
0.272

HS
CATTCA
1064.20
1307
1.228
0.206

HS
CATTCT
1316.36
1517
1.152
0.142

HS
CACTCC
2086.72
1964
0.941
−0.061

HS
CACTCA
1467.52
1318
0.898
−0.107

HS
CATTCC
1513.23
1219
0.806
−0.216

HS
CACTCT
1815.24
1231
0.678
−0.388

HS
CATAGT
1081.63
710
0.656
−0.421

HS
CATTCG
400.16
256
0.640
−0.447

HS
CATAGC
1714.41
782
0.456
−0.785

HT
CACACG
778.62
1526
1.960
0.673

HT
CACACT
1696.86
2036
1.200
0.182

HT
CACACA
1918.82
2255
1.175
0.161

HT
CACACC
2370.26
2537
1.070
0.068

HT
CATACT
1230.51
1306
1.061
0.060

HT
CATACA
1391.46
979
0.704
−0.352

HT
CATACC
1718.84
806
0.469
−0.757

HT
CATACG
564.63
225
0.398
−0.920

HV
CATGTT
869.32
1563
1.798
0.587

HV
CATGTA
564.48
880
1.559
0.444

HV
CATGTC
1113.00
1607
1.444
0.367

HV
CATGTG
2201.86
2797
1.270
0.239

HV
CACGTG
3036.34
2579
0.849
−0.163

HV
CACGTC
1534.82
1158
0.754
−0.282

HV
CACGTT
1198.78
434
0.362
−1.016

HV
CACGTA
778.41
279
0.358
−1.026

HW
CACTGG
1602.74
2197
1.371
0.315

HW
CATTGG
1162.26
568
0.489
−0.716

HY
CACTAC
1943.40
2385
1.227
0.205

HY
CATTAT
1145.15
1240
1.083
0.080

HY
CACTAT
1579.16
1378
0.873
−0.136

HY
CATTAC
1409.29
1074
0.762
−0.272

IA
ATTGCT
1886.56
3678
1.950
0.668

IA
ATAGCA
759.54
1446
1.904
0.644

IA
ATTGCA
1651.49
2818
1.706
0.534

IA
ATAGCT
867.65
1289
1.486
0.396

IA
ATTGCC
2868.63
3435
1.197
0.180

IA
ATAGCC
1319.32
1191
0.903
−0.102

IA
ATCGCG
980.82
708
0.722
−0.326

IA
ATCGCC
3627.56
2570
0.708
−0.345

IA
ATTGCG
775.62
494
0.637
−0.451

IA
ATAGCG
356.72
198
0.555
−0.589

IA
ATCGCA
2088.41
831
0.398
−0.922

IA
ATCGCT
2385.67
910
0.381
−0.964

IC
ATCTGC
2115.05
3055
1.444
0.368

IC
ATCTGT
1781.48
2074
1.164
0.152

IC
ATATGT
647.91
731
1.128
0.121

IC
ATTTGT
1408.77
1197
0.850
−0.163

IC
ATATGC
769.23
470
0.611
−0.493

IC
ATTTGC
1672.56
868
0.519
−0.656

ID
ATTGAT
2604.76
4341
1.667
0.511

ID
ATAGAT
1197.96
1947
1.625
0.486

ID
ATTGAC
2942.37
3938
1.338
0.291

ID
ATAGAC
1353.23
1476
1.091
0.087

ID
ATCGAC
3720.81
2270
0.610
−0.494

ID
ATCGAT
3293.87
1141
0.346
−1.060

IE
ATAGAA
1371.51
2939
2.143
0.762

IE
ATTGAA
2982.12
5518
1.850
0.615

IE
ATTGAG
3988.04
4634
1.162
0.150

IE
ATAGAG
1834.15
1898
1.035
0.034

IE
ATCGAG
5043.12
3007
0.596
−0.517

IE
ATCGAA
3771.07
994
0.264
−1.333

IF
ATATTT
1144.73
1929
1.685
0.522

IF
ATCTTC
3602.60
4836
1.342
0.294

IF
ATTTTT
2489.02
2226
0.894
−0.112

IF
ATCTTT
3147.52
2779
0.883
−0.125

IF
ATATTC
1310.24
886
0.676
−0.391

IF
ATTTTC
2848.89
1887
0.662
−0.412

IG
ATTGGT
1013.16
2102
2.075
0.730

IG
ATTGGA
1564.91
3151
2.014
0.700

IG
ATAGGA
719.72
1054
1.464
0.381

IG
ATTGGG
1527.75
2144
1.403
0.339

IG
ATAGGT
465.96
596
1.279
0.246

IG
ATTGGC
2121.81
2706
1.275
0.243

IG
ATAGGG
702.63
549
0.781
−0.247

IG
ATAGGC
975.84
700
0.717
−0.332

IG
ATCGGG
1931.93
1244
0.644
−0.440

IG
ATCGGC
2683.15
1619
0.603
−0.505

IG
ATCGGT
1281.20
498
0.389
−0.945

IG
ATCGGA
1978.93
604
0.305
−1.187

IH
ATTCAT
1622.93
2242
1.381
0.323

IH
ATCCAC
2830.09
3367
1.190
0.174

IH
ATACAT
746.40
760
1.018
0.018

IH
ATCCAT
2052.29
1814
0.884
−0.123

IH
ATTCAC
2238.00
1778
0.794
−0.230

IH
ATACAC
1029.28
558
0.542
−0.612

II
ATCATC
3797.03
5979
1.575
0.454

II
ATAATA
502.24
700
1.394
0.332

II
ATAATT
1092.04
1309
1.199
0.181

II
ATCATT
3002.64
3321
1.106
0.101

II
ATTATT
2374.46
2157
0.908
−0.096

II
ATCATA
1380.95
1183
0.857
−0.155

II
ATTATA
1092.04
921
0.843
−0.170

II
ATAATC
1380.95
715
0.518
−0.658

II
ATTATC
3002.64
1340
0.446
−0.807

IK
ATAAAA
1419.09
2244
1.581
0.458

IK
ATCAAG
5053.39
5884
1.164
0.152

IK
ATAAAG
1837.88
1943
1.057
0.056

IK
ATTAAA
3085.58
3107
1.007
0.007

IK
ATCAAA
3901.90
3830
0.982
−0.019

IK
ATTAAG
3996.16
2286
0.572
−0.559

IL
ATTTTA
977.08
1679
1.718
0.541

IL
ATATTA
449.37
723
1.609
0.476

IL
ATTTTG
1641.41
2339
1.425
0.354

IL
ATTCTT
1675.18
2271
1.356
0.304

IL
ATCCTC
3072.14
4017
1.308
0.268

IL
ATCCTG
6384.29
7754
1.215
0.194

IL
ATTCTA
903.41
1021
1.130
0.122

IL
ATCTTG
2075.66
2250
1.084
0.081

IL
ATCCTA
1142.42
1170
1.024
0.024

IL
ATACTA
415.49
416
1.001
0.001

IL
ATCCTT
2118.37
2058
0.972
−0.029

IL
ATATTG
754.90
717
0.950
−0.052

IL
ATACTT
770.44
726
0.942
−0.059

IL
ATCTTA
1235.57
1077
0.872
−0.137

IL
ATTCTC
2429.41
1918
0.789
−0.236

IL
ATTCTG
5048.62
3005
0.595
−0.519

IL
ATACTC
1117.32
458
0.410
−0.892

IL
ATACTG
2321.92
934
0.402
−0.911

IM
ATCATG
3206.80
4314
1.345
0.297

IM
ATAATG
1166.29
1196
1.025
0.025

IM
ATTATG
2535.90
1399
0.552
−0.595

IN
ATAAAT
1088.42
1649
1.515
0.415

IN
ATCAAC
3296.07
4599
1.395
0.333

IN
ATCAAT
2992.68
2890
0.966
−0.035

IN
ATAAAC
1198.76
1113
0.928
−0.074

IN
ATTAAT
2366.58
1967
0.831
−0.185

IN
ATTAAC
2606.49
1331
0.511
−0.672

IP
ATTCCT
2051.78
2787
1.358
0.306

IP
ATTCCA
1980.23
2644
1.335
0.289

IP
ATACCA
910.73
1047
1.150
0.139

IP
ATCCCC
2896.94
3229
1.115
0.109

IP
ATACCT
943.64
995
1.054
0.053

IP
ATCCCG
1049.66
1073
1.022
0.022

IP
ATCCCA
2504.13
2366
0.945
−0.057

IP
ATCCCT
2594.61
2451
0.945
−0.057

IP
ATTCCC
2290.86
1775
0.775
−0.255

IP
ATACCC
1053.60
610
0.579
−0.547

IP
ATTCCG
830.06
386
0.465
−0.766

IP
ATACCG
381.76
125
0.327
−1.116

IQ
ATACAA
765.47
950
1.241
0.216

IQ
ATTCAA
1664.38
2045
1.229
0.206

IQ
ATCCAG
5877.26
6881
1.171
0.158

IQ
ATTCAG
4647.67
3987
0.858
−0.153

IQ
ATCCAA
2104.71
1765
0.839
−0.176

IQ
ATACAG
2137.52
1569
0.734
−0.309

IR
ATCCGC
1552.18
2623
1.690
0.525

IR
ATTCGA
727.72
1142
1.569
0.451

IR
ATCCGA
920.25
1434
1.558
0.444

IR
ATCCGT
664.55
943
1.419
0.350

IR
ATAAGA
622.67
877
1.408
0.342

IR
ATCCGG
1697.63
2265
1.334
0.288

IR
ATTCGT
525.51
677
1.288
0.253

IR
ATCAGA
1712.09
1680
0.981
−0.019

IR
ATCAGG
1680.81
1513
0.900
−0.105

IR
ATAAGG
611.30
547
0.895
−0.111

IR
ATACGT
241.69
213
0.881
−0.126

IR
ATACGA
334.69
292
0.872
−0.136

IR
ATTCGG
1342.46
907
0.676
−0.392

IR
ATTAGA
1353.90
900
0.665
−0.408

IR
ATTCGC
1227.45
780
0.635
−0.453

IR
ATACGG
617.42
260
0.421
−0.865

IR
ATTAGG
1329.16
503
0.378
−0.972

IR
ATACGC
564.52
170
0.301
−1.200

IS
ATCTCC
2689.59
3743
1.392
0.330

IS
ATATCA
687.92
954
1.387
0.327

IS
ATCAGC
3047.17
3998
1.312
0.272

IS
ATTTCT
1850.19
2423
1.310
0.270

IS
ATTTCA
1495.77
1957
1.308
0.269

IS
ATCAGT
1922.48
2287
1.190
0.174

IS
ATATCT
850.92
1012
1.189
0.173

IS
ATCTCG
711.23
773
1.087
0.083

IS
ATAAGT
699.19
695
0.994
−0.006

IS
ATCTCT
2339.68
2317
0.990
−0.010

IS
ATCTCA
1891.49
1767
0.934
−0.068

IS
ATTTCC
2126.89
1795
0.844
−0.170

IS
ATATCC
978.18
703
0.719
−0.330

IS
ATTAGT
1520.28
906
0.596
−0.518

IS
ATAAGC
1108.24
636
0.574
−0.555

IS
ATATCG
258.67
132
0.510
−0.673

IS
ATTTCG
562.43
255
0.453
−0.791

IS
ATTAGC
2409.67
797
0.331
−1.106

IT
ATCACC
3094.94
4722
1.526
0.422

IT
ATCACG
1016.68
1306
1.285
0.250

IT
ATAACT
805.82
1009
1.252
0.225

IT
ATCACT
2215.66
2751
1.242
0.216

IT
ATCACA
2505.48
2989
1.193
0.176

IT
ATAACA
911.22
1079
1.184
0.169

IT
ATTACT
1752.12
1369
0.781
−0.247

IT
ATTACA
1981.30
1531
0.773
−0.258

IT
ATAACC
1125.61
741
0.658
−0.418

IT
ATAACG
369.76
204
0.552
−0.595

IT
ATTACC
2447.44
1083
0.443
−0.815

IT
ATTACG
803.98
246
0.306
−1.184

IV
ATTGTT
1261.28
2414
1.914
0.649

IV
ATTGTA
819.00
1478
1.805
0.590

IV
ATAGTA
376.67
645
1.712
0.538

IV
ATAGTT
580.08
877
1.512
0.413

IV
ATTGTC
1614.84
2315
1.434
0.360

IV
ATTGTG
3194.65
3762
1.178
0.163

IV
ATCGTC
2042.07
1679
0.822
−0.196

IV
ATAGTG
1469.26
1196
0.814
−0.206

IV
ATAGTC
742.69
575
0.774
−0.256

IV
ATCGTG
4039.83
2922
0.723
−0.324

IV
ATCGTA
1035.67
361
0.349
−1.054

IV
ATCGTT
1594.97
547
0.343
−1.070

IW
ATCTGG
1887.23
2427
1.286
0.252

IW
ATATGG
686.37
622
0.906
−0.098

IW
ATTTGG
1492.40
1017
0.681
−0.384

IY
ATCTAC
2708.47
3486
1.287
0.252

IY
ATATAT
800.43
953
1.191
0.174

IY
ATTTAT
1740.39
1984
1.140
0.131

IY
ATCTAT
2200.83
2196
0.998
−0.002

IY
ATTTAC
2141.83
1403
0.655
−0.423

IY
ATATAC
985.05
555
0.563
−0.574

KA AAAGCA 3029.93 4322 1.426 0.355
KA AAAGCT 3461.21 4262 1.231 0.208
KA AAGGCC 6816.15 6676 0.979 −0.021
KA AAGGCG 1842.96 1790 0.971 −0.029
KA AAGGCA 3924.10 3654 0.931 −0.071
KA AAAGCC 5262.99 4742 0.901 −0.104
KA AAGGCT 4482.65 4032 0.899 −0.106
KA AAAGCG 1423.01 765 0.538 −0.621
KC AAATGT 1815.55 2671 1.471 0.386
KC AAGTGT 2351.33 2267 0.964 −0.037
KC AAGTGC 2791.62 2498 0.895 −0.111
KC AAATGC 2155.50 1678 0.778 −0.250
KD AAAGAT 4684.00 6115 1.306 0.267
KD AAGGAC 6852.58 6836 0.998 −0.002
KD AAGGAT 6066.30 5379 0.887 −0.120
KD AAAGAC 5291.12 4564 0.863 −0.148
KE AAAGAA 6989.41 9895 1.416 0.348
KE AAGGAG 12105.47 12287 1.015 0.015
KE AAGGAA 9052.06 8366 0.924 −0.079
KE AAAGAG 9347.06 6946 0.743 −0.297
KF AAATTT 2631.62 3140 1.193 0.177
KF AAGTTT 3408.25 3638 1.067 0.065
KF AAGTTC 3901.02 3950 1.013 0.012
KF AAATTC 3012.11 2225 0.739 −0.303
KG AAAGGA 2672.15 4509 1.687 0.523
KG AAAGGT 1730.00 2402 1.388 0.328
KG AAAGGC 3623.06 3435 0.948 −0.053
KG AAAGGG 2608.69 2465 0.945 −0.057
KG AAGGGC 4692.27 4309 0.918 −0.085
KG AAGGGT 2240.55 1978 0.883 −0.125
KG AAGGGG 3378.54 2740 0.811 −0.209
KG AAGGGA 3460.73 2568 0.742 −0.298
KH AAACAT 1929.29 2356 1.221 0.200
KH AAGCAC 3445.60 3583 1.040 0.039
KH AAGCAT 2498.64 2430 0.973 −0.028
KH AAACAC 2660.47 2165 0.814 −0.206
KI AAAATA 1547.96 2667 1.723 0.544
KI AAAATT 3365.76 3894 1.157 0.146
KI AAGATC 5512.26 5523 1.002 0.002
KI AAGATA 2004.77 1943 0.969 −0.031
KI AAGATT 4359.03 3732 0.856 −0.155
KI AAAATC 4256.21 3287 0.772 −0.258
KK AAGAAG 11070.03 13815 1.248 0.222
KK AAGAAA 8547.55 10129 1.185 0.170
KK AAAAAG 8547.55 6145 0.719 −0.330
KK AAAAAA 6599.86 4676 0.708 −0.345
KL AAATTA 1273.72 2084 1.636 0.492
KL AAACTA 1177.70 1750 1.486 0.396
KL AAACTT 2183.78 3014 1.380 0.322
KL AAGCTG 8523.68 9600 1.126 0.119
KL AAGCTA 1525.25 1660 1.088 0.085
KL AAGCTC 4101.62 4076 0.994 −0.006
KL AAATTG 2139.75 2113 0.987 −0.013
KL AAGCTT 2828.24 2772 0.980 −0.020
KL AAGTTA 1649.61 1459 0.884 −0.123
KL AAACTC 3167.00 2653 0.838 −0.177
KL AAGTTG 2771.21 2280 0.823 −0.195
KL AAACTG 6581.43 4462 0.678 −0.389
KM AAGATG 5479.27 5650 1.031 0.031
KM AAAATG 4230.73 4060 0.960 −0.041
KN AAAAAT 3683.47 4378 1.189 0.173
KN AAGAAC 5254.13 5515 1.050 0.048
KN AAGAAT 4770.51 4618 0.968 −0.032
KN AAAAAC 4056.89 3254 0.802 −0.221
KP AAACCA 2803.51 3370 1.202 0.184
KP AAGCCC 4200.41 4673 1.113 0.107
KP AAGCCA 3630.85 4035 1.111 0.106
KP AAACCT 2904.80 3118 1.073 0.071
KP AAGCCG 1521.96 1544 1.014 0.014
KP AAGCCT 3762.04 3396 0.903 −0.102
KP AAACCC 3243.28 2624 0.809 −0.212
KP AAACCG 1175.16 482 0.410 −0.891
KQ AAACAA 2178.87 3274 1.503 0.407
KQ AAGCAA 2821.88 3177 1.126 0.119
KQ AAGCAG 7879.90 8081 1.026 0.025
KQ AAACAG 6084.35 4433 0.729 −0.317
KR AAAAGA 2247.57 3147 1.400 0.337
KR AAGAGG 2857.67 3975 1.391 0.330
KR AAGAGA 2910.85 3511 1.206 0.187
KR AAAAGG 2206.51 2325 1.054 0.052
KR AAACGT 872.39 862 0.988 −0.012
KR AAGCGG 2886.27 2828 0.980 −0.020
KR AAGCGC 2638.99 2532 0.959 −0.041
KR AAACGA 1208.07 1087 0.900 −0.106
KR AAGCGT 1129.84 978 0.866 −0.144
KR AAGCGA 1564.59 1325 0.847 −0.166
KR AAACGG 2228.59 1178 0.529 −0.638
KR AAACGC 2037.65 1041 0.511 −0.672
KS AAATCA 1871.14 2533 1.354 0.303
KS AAAAGT 1901.80 2389 1.256 0.228
KS AAATCT 2314.50 2793 1.207 0.188
KS AAGTCA 2423.33 2566 1.059 0.057
KS AAGAGC 3903.97 4045 1.036 0.035
KS AAGAGT 2463.04 2459 0.998 −0.002
KS AAGTCG 911.22 904 0.992 −0.008
KS AAGTCC 3445.84 3100 0.900 −0.106
KS AAGTCT 2997.54 2675 0.892 −0.114
KS AAATCC 2660.65 2304 0.866 −0.144
KS AAAAGC 3014.39 2381 0.790 −0.236
KS AAATCG 703.58 462 0.657 −0.421
KT AAAACA 2831.74 3611 1.275 0.243
KT AAGACG 1488.17 1790 1.203 0.185
KT AAAACT 2504.18 2969 1.186 0.170
KT AAGACC 4530.26 4475 0.988 −0.012
KT AAGACA 3667.42 3574 0.975 −0.026
KT AAGACT 3243.20 2876 0.887 −0.120
KT AAAACC 3497.97 2854 0.816 −0.203
KT AAAACG 1149.07 763 0.664 −0.409
KV AAAGTA 1317.00 2214 1.681 0.519
KV AAAGTT 2028.22 3042 1.500 0.405
KV AAAGTC 2596.78 2642 1.017 0.017
KV AAGGTG 6653.25 6512 0.979 −0.021
KV AAGGTC 3363.11 3016 0.897 −0.109
KV AAGGTT 2626.77 2294 0.873 −0.135
KV AAAGTG 5137.21 4417 0.860 −0.151
KV AAGGTA 1705.66 1291 0.757 −0.279
KW AAGTGG 2598.56 2701 1.039 0.039
KW AAATGG 2006.44 1904 0.949 −0.052
KY AAATAT 2319.32 2982 1.286 0.251
KY AAGTAC 3696.62 3603 0.975 −0.026
KY AAATAC 2854.29 2763 0.968 −0.033
KY AAGTAT 3003.78 2526 0.841 −0.173
LA CTGGCG 2275.39 3643 1.601 0.471
LA TTGGCA 1575.16 2350 1.492 0.400
LA CTGGCC 8415.49 12456 1.480 0.392
LA TTGGCT 1799.36 2643 1.469 0.384
LA TTAGCA 937.64 1314 1.401 0.337
LA CTTGCT 1836.39 2345 1.277 0.244
LA CTAGCA 866.95 1107 1.277 0.244
LA CTTGCA 1607.57 1861 1.158 0.146
LA TTAGCT 1071.10 1239 1.157 0.146
LA CTGGCT 5534.46 6333 1.144 0.135
LA CTAGCT 990.35 1099 1.110 0.104
LA CTGGCA 4844.85 5013 1.035 0.034
LA TTGGCC 2736.04 2824 1.032 0.032
LA TTGGCG 739.77 623 0.842 −0.172
LA CTTGCC 2792.34 2201 0.788 −0.238
LA CTAGCC 1505.89 1159 0.770 −0.262
LA CTAGCG 407.16 253 0.621 −0.476
LA TTAGCC 1628.68 941 0.578 −0.549
LA CTTGCG 755.00 346 0.458 −0.780
LA TTAGCG 440.36 198 0.450 −0.799
LA CTCGCC 4049.56 1527 0.377 −0.975
LA CTCGCG 1094.93 390 0.356 −1.032
LA CTCGCT 2663.20 605 0.227 −1.482
LA CTCGCA 2331.36 429 0.184 −1.693
LC CTCTGC 1769.27 3523 1.991 0.689
LC CTCTGT 1490.23 2145 1.439 0.364
LC CTTTGT 1027.58 1155 1.124 0.117
LC TTATGT 599.35 627 1.046 0.045
LC CTGTGC 3676.77 3517 0.957 −0.044
LC TTGTGT 1006.86 856 0.850 −0.162
LC CTTTGC 1219.99 974 0.798 −0.225
LC CTGTGT 3096.89 2370 0.765 −0.268
LC CTATGT 554.17 417 0.752 −0.284
LC TTGTGC 1195.39 722 0.604 −0.504
LC TTATGC 711.58 368 0.517 −0.659
LC CTATGC 657.93 332 0.505 −0.684
LD TTGGAT 2174.51 3688 1.696 0.528
LD TTAGAT 1294.41 1977 1.527 0.424
LD CTGGAC 7555.23 10531 1.394 0.332
LD CTAGAT 1196.83 1584 1.323 0.280
LD TTGGAC 2456.35 2775 1.130 0.122
LD CTTGAT 2219.25 2463 1.110 0.104
LD CTGGAT 6688.33 6912 1.033 0.033
LD CTAGAC 1351.95 1390 1.028 0.028
LD CTTGAC 2506.90 1832 0.731 −0.314
LD TTAGAC 1462.19 969 0.663 −0.411
LD CTCGAC 3635.60 981 0.270 −1.310
LD CTCGAT 3218.44 658 0.204 −1.587
LE TTAGAA 1739.66 3085 1.773 0.573
LE CTAGAA 1608.51 2701 1.679 0.518
LE TTGGAA 2922.49 4652 1.592 0.465
LE CTGGAG 12021.09 18044 1.501 0.406
LE TTGGAG 3908.29 4774 1.222 0.200
LE CTAGAG 2151.09 2515 1.169 0.156
LE CTTGAA 2982.63 3161 1.060 0.058
LE CTGGAA 8988.96 7642 0.850 −0.162
LE TTAGAG 2326.48 1873 0.805 −0.217
LE CTTGAG 3988.72 2484 0.623 −0.474
LE CTCGAG 5784.58 1305 0.226 −1.489
LE CTCGAA 4325.51 512 0.118 −2.134
LF CTCTTC 2629.18 6495 2.470 0.904
LF TTATTT 923.85 1405 1.521 0.419
LF CTCTTT 2297.07 3446 1.500 0.406
LF CTTTTT 1583.93 1937 1.223 0.201
LF CTTTTC 1812.93 1936 1.068 0.066
LF CTATTT 854.20 876 1.026 0.025
LF TTGTTT 1551.99 1544 0.995 −0.005
LF CTGTTT 4773.59 2957 0.619 −0.479
LF CTGTTC 5463.77 3119 0.571 −0.561
LF TTATTC 1057.42 583 0.551 −0.595
LF TTGTTC 1776.38 940 0.529 −0.636
LF CTATTC 977.70 464 0.475 −0.745
LG CTTGGA 1534.14 2667 1.738 0.553
LG CTTGGT 993.23 1579 1.590 0.464
LG CTGGGC 6268.87 9794 1.562 0.446
LG CTAGGA 827.35 1087 1.314 0.273
LG CTTGGG 1497.70 1881 1.256 0.228
LG TTAGGA 894.81 1114 1.245 0.219
LG CTGGGG 4513.74 5602 1.241 0.216
LG TTGGGT 973.20 1194 1.227 0.204
LG TTGGGA 1503.20 1820 1.211 0.191
LG CTAGGT 535.64 611 1.141 0.132
LG TTAGGT 579.32 611 1.055 0.053
LG TTGGGG 1467.50 1452 0.989 −0.011
LG CTGGGT 2993.37 2947 0.985 −0.016
LG CTTGGC 2080.08 2009 0.966 −0.035
LG CTAGGG 807.70 766 0.948 −0.053
LG TTGGGC 2038.13 1786 0.876 −0.132
LG CTGGGA 4623.54 4034 0.872 −0.136
LG CTAGGC 1121.77 940 0.838 −0.177
LG TTAGGG 873.56 529 0.606 −0.502
LG CTCGGG 2172.02 1076 0.495 −0.702
LG CTCGGC 3016.60 1313 0.435 −0.832
LG TTAGGC 1213.24 507 0.418 −0.873
LG CTCGGT 1440.42 365 0.253 −1.373
LG CTCGGA 2224.86 510 0.229 −1.473
LH CTTCAT 1127.31 1980 1.756 0.563
LH TTACAT 657.52 935 1.422 0.352
LH CTACAT 607.95 741 1.219 0.198
LH CTGCAC 4685.05 5459 1.165 0.153
LH CTCCAC 2254.46 2204 0.978 −0.023
LH CTTCAC 1554.55 1490 0.958 −0.042
LH CTCCAT 1634.86 1521 0.930 −0.072
LH CTACAC 838.36 777 0.927 −0.076
LH TTGCAT 1104.58 1017 0.921 −0.083
LH TTGCAC 1523.20 1140 0.748 −0.290
LH CTGCAT 3397.45 2394 0.705 −0.350
LH TTACAC 906.71 634 0.699 −0.358
LI CTCATC 2602.42 6250 2.402 0.876
LI TTAATA 380.66 798 2.096 0.740
LI TTAATT 827.68 1290 1.559 0.444
LI CTCATT 2057.96 3117 1.515 0.415
LI CTAATA 351.96 516 1.466 0.383
LI CTAATT 765.28 952 1.244 0.218
LI CTTATT 1419.05 1761 1.241 0.216
LI TTGATA 639.48 791 1.237 0.213
LI TTGATT 1390.44 1468 1.056 0.054
LI CTTATA 652.64 683 1.047 0.045
LI CTCATA 946.48 919 0.971 −0.029
LI CTTATC 1794.48 1189 0.663 −0.412
LI TTGATC 1758.29 1135 0.646 −0.438
LI CTGATC 5408.15 3356 0.621 −0.477
LI CTGATT 4276.70 2639 0.617 −0.483
LI CTGATA 1966.91 1193 0.607 −0.500
LI TTAATC 1046.66 633 0.605 −0.503
LI CTAATC 967.75 563 0.582 −0.542
LK TTAAAA 1429.91 2557 1.788 0.581
LK CTAAAA 1322.10 1842 1.393 0.332
LK TTGAAA 2402.12 3193 1.329 0.285
LK CTCAAG 4604.55 6048 1.313 0.273
LK CTAAAG 1712.27 2078 1.214 0.194
LK TTAAAG 1851.89 2128 1.149 0.139
LK CTGAAG 9568.82 10212 1.067 0.065
LK TTGAAG 3111.01 3222 1.036 0.035
LK CTCAAA 3555.33 2768 0.779 −0.250
LK CTTAAA 2451.55 1850 0.755 −0.282
LK CTGAAA 7388.42 5227 0.707 −0.346
LK CTTAAG 3175.03 1448 0.456 −0.785
LL TTATTA 500.55 802 1.602 0.471
LL CTTCTA 793.49 1132 1.427 0.355
LL CTTCTT 1471.36 2099 1.427 0.355
LL CTTTTA 858.19 1203 1.402 0.338
LL CTGCTG 13364.10 18236 1.365 0.311
LL CTTTTG 1441.69 1945 1.349 0.299
LL TTACTA 462.82 608 1.314 0.273
LL CTCCTC 3094.54 3800 1.228 0.205
LL CTCCTG 6430.85 7786 1.211 0.191
LL TTACTT 858.19 1039 1.211 0.191
LL TTGCTA 777.49 929 1.195 0.178
LL CTGCTC 6430.85 7550 1.174 0.160
LL CTACTA 427.93 474 1.108 0.102
LL CTTCTC 2133.82 2292 1.074 0.072
LL CTACTT 793.49 839 1.057 0.056
LL CTCTTG 2090.79 2131 1.019 0.019
LL TTGCTT 1441.69 1464 1.015 0.015
LL TTATTG 840.89 818 0.973 −0.028
LL CTCCTT 2133.82 2034 0.953 −0.048
LL TTGTTA 840.89 771 0.917 −0.087
LL TTGTTG 1412.62 1289 0.912 −0.092
LL CTCCTA 1150.75 1034 0.899 −0.107
LL TTGCTG 4344.93 3820 0.879 −0.129
LL CTTCTG 4434.34 3837 0.865 −0.145
LL CTGCTA 2391.41 1913 0.800 −0.223
LL CTCTTA 1244.58 959 0.771 −0.261
LL CTATTA 462.82 354 0.765 −0.268
LL CTGCTT 4434.34 3148 0.710 −0.343
LL TTGCTC 2090.79 1440 0.689 −0.373
LL CTACTC 1150.75 792 0.688 −0.374
LL CTATTG 777.49 532 0.684 −0.379
LL CTACTG 2391.41 1583 0.662 −0.413
LL CTGTTG 4344.93 2615 0.602 −0.508
LL TTACTC 1244.58 657 0.528 −0.639
LL TTACTG 2586.40 1358 0.525 −0.644
LL CTGTTA 2586.40 953 0.368 −0.998
LM CTCATG 2631.41 4030 1.531 0.426
LM TTAATG 1058.32 1228 1.160 0.149
LM CTAATG 978.53 1101 1.125 0.118
LM TTGATG 1777.88 1763 0.992 −0.008
LM CTGATG 5468.39 4470 0.817 −0.202
LM CTTATG 1814.47 1137 0.627 −0.467
LN TTAAAT 962.36 1926 2.001 0.694
LN CTCAAC 2635.40 4681 1.776 0.574
LN CTAAAT 889.81 1446 1.625 0.486
LN TTGAAT 1616.68 2048 1.267 0.236
LN CTCAAT 2392.82 2652 1.108 0.103
LN CTAAAC 980.01 922 0.941 −0.061
LN TTAAAC 1059.92 965 0.910 −0.094
LN CTTAAT 1649.95 1441 0.873 −0.135
LN TTGAAC 1780.58 1541 0.865 −0.145
LN CTGAAC 5476.68 4308 0.787 −0.240
LN CTGAAT 4972.58 3413 0.686 −0.376
LN CTTAAC 1817.22 891 0.490 −0.713
LP CTTCCT 1728.14 2795 1.617 0.481
LP CTTCCA 1667.88 2369 1.420 0.351
LP CTGCCC 5815.10 7856 1.351 0.301
LP TTACCT 1007.96 1244 1.234 0.210
LP CTGCCG 2107.02 2489 1.181 0.167
LP TTACCA 972.81 1140 1.172 0.159
LP CTCCCG 1013.90 1184 1.168 0.155
LP TTGCCA 1634.25 1897 1.161 0.149
LP CTACCT 931.97 1045 1.121 0.114
LP TTGCCT 1693.30 1800 1.063 0.061
LP CTTCCC 1929.51 1889 0.979 −0.021
LP CTACCA 899.47 850 0.945 −0.057
LP CTCCCA 2418.82 2126 0.879 −0.129
LP CTGCCT 5208.23 4563 0.876 −0.132
LP CTCCCT 2506.21 2192 0.875 −0.134
LP CTACCC 1040.57 888 0.853 −0.159
LP CTCCCC 2798.25 2369 0.847 −0.167
LP TTGCCC 1890.60 1560 0.825 −0.192
LP TTGCCG 685.03 478 0.698 −0.360
LP CTGCCA 5026.60 3348 0.666 −0.406
LP CTTCCG 699.13 451 0.645 −0.438
LP TTACCC 1125.42 666 0.592 −0.525
LP CTACCG 377.04 211 0.560 −0.580
LP TTACCG 407.78 175 0.429 −0.846
LQ TTACAA 864.28 1290 1.493 0.401
LQ CTACAA 799.12 1188 1.487 0.397
LQ CTTCAA 1481.79 2098 1.416 0.348
LQ CTACAG 2231.48 2674 1.198 0.181
LQ CTGCAG 12470.36 14508 1.163 0.151
LQ CTTCAG 4137.79 4363 1.054 0.053
LQ TTGCAA 1451.91 1467 1.010 0.010
LQ CTCCAG 6000.78 5430 0.905 −0.100
LQ TTACAG 2413.43 2107 0.873 −0.136
LQ TTGCAG 4054.36 3177 0.784 −0.244
LQ CTCCAA 2148.94 1524 0.709 −0.344
LQ CTGCAA 4465.77 2694 0.603 −0.505
LR CTTCGA 661.43 1365 2.064 0.725
LR CTTCGT 477.64 784 1.641 0.496
LR CTGCGG 3677.31 5467 1.487 0.397
LR TTAAGA 717.74 1026 1.429 0.357
LR CTGCGC 3362.26 4574 1.360 0.308
LR CTCCGA 959.23 1289 1.344 0.295
LR CTCCGG 1769.53 2229 1.260 0.231
LR CTAAGA 663.63 821 1.237 0.213
LR CTCAGG 1752.00 2047 1.168 0.156
LR CTTCGG 1220.17 1415 1.160 0.148
LR CTCCGT 692.69 771 1.113 0.107
LR TTACGA 385.79 427 1.107 0.101
LR CTAAGG 651.51 721 1.107 0.101
LR CTCCGC 1617.93 1790 1.106 0.101
LR TTGAGA 1205.75 1290 1.070 0.068
LR CTACGT 257.59 275 1.068 0.065
LR CTACGA 356.70 378 1.060 0.058
LR CTGAGG 3640.88 3637 0.999 −0.001
LR TTAAGG 704.63 678 0.962 −0.039
LR TTACGT 278.59 264 0.948 −0.054
LR CTGCGT 1439.50 1363 0.947 −0.055
LR TTGAGG 1183.72 1080 0.912 −0.092
LR CTACGG 658.03 577 0.877 −0.131
LR CTCAGA 1784.60 1469 0.823 −0.195
LR CTTCGC 1115.63 819 0.734 −0.309
LR CTACGC 601.65 438 0.728 −0.317
LR CTGCGA 1993.40 1399 0.702 −0.354
LR TTGCGT 468.01 321 0.686 −0.377
LR CTGAGA 3708.63 2486 0.670 −0.400
LR TTGCGG 1195.56 772 0.646 −0.437
LR TTGCGA 648.09 418 0.645 −0.439
LR CTTAGA 1230.56 694 0.564 −0.573
LR TTACGG 711.68 383 0.538 −0.620
LR TTGCGC 1093.14 542 0.496 −0.702
LR CTTAGG 1208.08 503 0.416 −0.876
LR TTACGC 650.71 232 0.357 −1.031
LS CTCAGC 2740.30 5167 1.886 0.634
LS CTTTCT 1450.83 2502 1.725 0.545
LS CTCTCC 2418.72 4070 1.683 0.520
LS CTCTCG 639.61 1016 1.588 0.463
LS CTCAGT 1728.87 2589 1.498 0.404
LS TTATCA 684.12 963 1.408 0.342
LS TTATCT 846.22 1175 1.389 0.328
LS CTTTCA 1172.91 1626 1.386 0.327
LS TTAAGT 695.33 886 1.274 0.242
LS CTCTCT 2104.05 2553 1.213 0.193
LS CTAAGT 642.91 770 1.198 0.180
LS CTCTCA 1701.00 2003 1.178 0.163
LS CTTTCC 1667.81 1819 1.091 0.087
LS TTGTCA 1149.26 1210 1.053 0.052
LS CTGTCG 1329.18 1392 1.047 0.046
LS TTGTCT 1421.58 1461 1.028 0.027
LS CTGAGC 5694.68 5805 1.019 0.019
LS CTGTCC 5026.41 4628 0.921 −0.083
LS TTGAGT 1168.09 1035 0.886 −0.121
LS TTGTCC 1634.18 1334 0.816 −0.203
LS CTATCA 632.54 512 0.809 −0.211
LS CTAAGC 1019.02 791 0.776 −0.253
LS TTATCC 972.78 727 0.747 −0.291
LS CTGAGT 3592.81 2665 0.742 −0.299
LS CTTAGT 1192.13 856 0.718 −0.331
LS CTATCT 782.42 557 0.712 −0.340
LS CTGTCT 4372.48 2950 0.675 −0.394
LS CTTTCG 441.04 291 0.660 −0.416
LS TTGTCG 432.14 278 0.643 −0.441
LS CTGTCA 3534.89 2228 0.630 −0.462
LS TTGAGC 1851.45 1128 0.609 −0.496
LS CTATCC 899.44 541 0.601 −0.508
LS TTATCG 257.24 152 0.591 −0.526
LS TTAAGC 1102.11 551 0.500 −0.693
LS CTATCG 237.85 102 0.429 −0.847
LS CTTAGC 1889.55 793 0.420 −0.868
LT CTCACC 2534.19 4959 1.957 0.671
LT CTCACG 832.47 1510 1.814 0.595
LT TTAACA 825.09 1163 1.410 0.343
LT CTCACT 1814.22 2521 1.390 0.329
LT TTAACT 729.65 969 1.328 0.284
LT CTAACT 674.64 817 1.211 0.191
LT CTAACA 762.89 898 1.177 0.163
LT CTCACA 2051.52 2374 1.157 0.146
LT CTGACG 1729.98 1795 1.038 0.037
LT TTGACT 1225.76 1259 1.027 0.027
LT TTGACA 1386.09 1401 1.011 0.011
LT CTTACT 1250.98 1259 1.006 0.006
LT CTGACC 5266.36 5160 0.980 −0.020
LT CTTACA 1414.61 1109 0.784 −0.243
LT CTGACT 3770.17 2808 0.745 −0.295
LT TTGACC 1712.20 1235 0.721 −0.327
LT CTAACC 942.38 678 0.719 −0.329
LT TTGACG 562.45 399 0.709 −0.343
LT CTGACA 4263.32 3003 0.704 −0.350
LT CTAACG 309.57 215 0.695 −0.365
LT TTAACC 1019.22 687 0.674 −0.394
LT CTTACC 1747.43 1104 0.632 −0.459
LT TTAACG 334.81 164 0.490 −0.714
LT CTTACG 574.02 247 0.430 −0.843
LV CTTGTT 1029.60 1741 1.691 0.525
LV TTAGTA 389.95 602 1.544 0.434
LV TTGGTA 655.07 980 1.496 0.403
LV CTTGTA 668.56 993 1.485 0.396
LV CTGGTG 7859.41 11424 1.454 0.374
LV CTAGTA 360.55 519 1.439 0.364
LV TTGGTT 1008.84 1427 1.414 0.347
LV CTTGTC 1318.22 1541 1.169 0.156
LV TTAGTT 600.53 690 1.149 0.139
LV CTGGTC 3972.81 4541 1.143 0.134
LV TTGGTG 2555.25 2882 1.128 0.120
LV CTAGTT 555.26 580 1.045 0.044
LV TTGGTC 1291.64 1345 1.041 0.040
LV CTTGTG 2607.83 2540 0.974 −0.026
LV CTAGTG 1406.38 1272 0.904 −0.100
LV CTGGTA 2014.87 1720 0.854 −0.158
LV CTGGTT 3102.98 2576 0.830 −0.186
LV CTAGTC 710.90 551 0.775 −0.255
LV TTAGTG 1521.06 947 0.623 −0.474
LV TTAGTC 768.87 416 0.541 −0.614
LV CTCGTC 1911.73 1013 0.530 −0.635
LV CTCGTG 3781.97 1691 0.447 −0.805
LV CTCGTT 1493.16 373 0.250 −1.387
LV CTCGTA 969.56 191 0.197 −1.625
LW CTCTGG 1742.64 2796 1.604 0.473
LW CTGTGG 3621.43 3365 0.929 −0.073
LW CTTTGG 1201.63 1018 0.847 −0.166
LW CTATGG 648.03 501 0.773 −0.257
LW TTATGG 700.87 535 0.763 −0.270
LW TTGTGG 1177.40 877 0.745 −0.295
LY CTCTAC 2082.09 4204 2.019 0.703
LY TTATAT 680.44 1022 1.502 0.407
LY CTCTAT 1691.85 2487 1.470 0.385
LY CTTTAT 1166.60 1591 1.364 0.310
LY CTATAT 629.14 596 0.947 −0.054
LY TTGTAT 1143.08 1063 0.930 −0.073
LY CTGTAC 4326.84 3390 0.783 −0.244
LY CTTTAC 1435.69 1069 0.745 −0.295
LY TTGTAC 1406.74 1006 0.715 −0.335
LY TTATAC 837.39 579 0.691 −0.369
LY CTGTAT 3515.88 2202 0.626 −0.468
LY CTATAC 774.26 481 0.621 −0.476
MA ATGGCG 1645.46 2370 1.440 0.365
MA ATGGCA 3503.58 3580 1.022 0.022
MA ATGGCT 4002.27 4003 1.000 0.000
MA ATGGCC 6085.70 5284 0.868 −0.141
MC ATGTGT 1386.67 1448 1.044 0.043
MC ATGTGC 1646.33 1585 0.963 −0.038
MD ATGGAT 4467.48 4634 1.037 0.037
MD ATGGAC 5046.52 4880 0.967 −0.034
ME ATGGAG 8054.28 8223 1.021 0.021
ME ATGGAA 6022.72 5854 0.972 −0.028
MF ATGTTT 2565.53 2833 1.104 0.099
MF ATGTTC 2936.47 2669 0.909 −0.096
MG ATGGGC 3467.73 3533 1.019 0.019
MG ATGGGT 1655.83 1675 1.012 0.012
MG ATGGGA 2557.59 2526 0.988 −0.012
MG ATGGGG 2496.85 2444 0.979 −0.021
MH ATGCAT 1465.33 1478 1.009 0.009
MH ATGCAC 2020.67 2008 0.994 −0.006
MI ATGATT 2305.40 2382 1.033 0.033
MI ATGATA 1060.28 1094 1.032 0.031
MI ATGATC 2915.32 2805 0.962 −0.039
MK ATGAAG 6107.32 6423 1.052 0.050
MK ATGAAA 4715.68 4400 0.933 −0.069
ML ATGCTG 5938.40 6536 1.101 0.096
ML ATGCTA 1062.63 1122 1.056 0.054
ML ATGTTG 1930.69 1922 0.995 −0.005
ML ATGTTA 1149.28 1134 0.987 −0.013
ML ATGCTT 1970.42 1887 0.958 −0.043
ML ATGCTC 2857.58 2308 0.808 −0.214
MM ATGATG 3925.00 3925 1.000 0.000
MN ATGAAT 3249.30 3301 1.016 0.016
MN ATGAAC 3578.70 3527 0.986 −0.015
MP ATGCCC 2676.16 2752 1.028 0.028
MP ATGCCA 2313.29 2313 1.000 0.000
MP ATGCCT 2396.87 2372 0.990 −0.010
MP ATGCCG 969.67 919 0.948 −0.054
MQ ATGCAG 5141.70 5165 1.005 0.005
MQ ATGCAA 1841.30 1818 0.987 −0.013
MR ATGAGG 1626.37 2127 1.308 0.268
MR ATGAGA 1656.63 1974 1.192 0.175
MR ATGCGG 1642.64 1513 0.921 −0.082
MR ATGCGT 643.02 531 0.826 −0.191
MR ATGCGA 890.44 684 0.768 −0.264
MR ATGCGC 1501.91 1132 0.754 −0.283
MS ATGTCG 666.33 809 1.214 0.194
MS ATGTCT 2191.95 2338 1.067 0.065
MS ATGTCA 1772.07 1781 1.005 0.005
MS ATGTCC 2519.77 2493 0.989 −0.011
MS ATGAGT 1801.10 1770 0.983 −0.017
MS ATGAGC 2854.78 2615 0.916 −0.088
MT ATGACT 2098.83 2195 1.046 0.045
MT ATGACC 2931.75 2927 0.998 −0.002
MT ATGACA 2373.36 2337 0.985 −0.015
MT ATGACG 963.07 908 0.943 −0.059
MV ATGGTG 4813.46 5122 1.064 0.062
MV ATGGTT 1900.41 1915 1.008 0.008
MV ATGGTA 1234.00 1191 0.965 −0.035
MV ATGGTC 2433.13 2153 0.885 −0.122
MW ATGTGG 1876.00 1876 1.000 0.000
MY ATGTAC 2354.66 2363 1.004 0.004
MY ATGTAT 1913.34 1905 0.996 −0.004
NA AATGCA 1705.68 3344 1.961 0.673
NA AATGCT 1948.47 3458 1.775 0.574
NA AATGCC 2962.77 4259 1.438 0.363
NA AATGCG 801.08 624 0.779 −0.250
NA AACGCG 882.29 661 0.749 −0.289
NA AACGCC 3263.12 1899 0.582 −0.541
NA AACGCA 1878.60 700 0.373 −0.987
NA AACGCT 2146.00 643 0.300 −1.205
NC AACTGC 1868.57 2826 1.512 0.414
NC AACTGT 1573.86 2016 1.281 0.248
NC AATTGT 1429.00 935 0.654 −0.424
NC AATTGC 1696.57 791 0.466 −0.763
ND AATGAT 2555.01 4420 1.730 0.548
ND AATGAC 2886.18 4521 1.566 0.449
ND AACGAC 3178.77 1654 0.520 −0.653
ND AACGAT 2814.03 839 0.298 −1.210
NE AATGAA 3381.19 7367 2.179 0.779
NE AATGAG 4521.72 5796 1.282 0.248
NE AACGAG 4980.12 2476 0.497 −0.699
NE AACGAA 3723.97 968 0.260 −1.347
NF AACTTC 3150.86 4259 1.352 0.301
NF AACTTT 2752.85 2846 1.034 0.033
NF AATTTT 2499.46 2350 0.940 −0.062
NF AATTTC 2860.84 1809 0.632 −0.458
NG AATGGA 2235.93 4484 2.005 0.696
NG AATGGT 1447.59 2430 1.679 0.518
NG AATGGG 2182.83 3202 1.467 0.383
NG AATGGC 3031.62 4001 1.320 0.277
NG AACGGG 2404.12 1508 0.627 −0.466
NG AACGGC 3338.95 1752 0.525 −0.645
NG AACGGA 2462.61 804 0.326 −1.119
NG AACGGT 1594.34 517 0.324 −1.126
NH AACCAC 2167.68 2776 1.281 0.247
NH AACCAT 1571.93 1639 1.043 0.042
NH AATCAT 1427.24 1456 1.020 0.020
NH AATCAC 1968.15 1264 0.642 −0.443
NI AACATC 3876.27 5487 1.416 0.348
NI AACATT 3065.31 3184 1.039 0.038
NI AATATA 1280.01 1309 1.023 0.022
NI AACATA 1409.77 1384 0.982 −0.018
NI AATATT 2783.16 2725 0.979 −0.021
NI AATATC 3519.48 1845 0.524 −0.646
NK AACAAG 4824.98 5918 1.227 0.204
NK AACAAA 3725.54 4221 1.133 0.125
NK AATAAA 3382.62 3607 1.066 0.064
NK AATAAG 4380.86 2568 0.586 −0.534
NL AATTTA 1025.31 1571 1.532 0.427
NL AACCTC 2807.78 3954 1.408 0.342
NL AACTTG 1897.05 2429 1.280 0.247
NL AACCTG 5834.92 6690 1.147 0.137
NL AATTTG 1722.43 1947 1.130 0.123
NL AATCTT 1757.88 1943 1.105 0.100
NL AACCTA 1044.12 1135 1.087 0.083
NL AACCTT 1936.08 2021 1.044 0.043
NL AACTTA 1129.25 1129 1.000 0.000
NL AATCTA 948.01 893 0.942 −0.060
NL AATCTC 2549.34 1713 0.672 −0.398
NL AATCTG 5297.84 2525 0.477 −0.741
NM AACATG 3351.76 4374 1.305 0.266
NM AATATG 3043.24 2021 0.664 −0.409
NN AACAAC 3150.02 4430 1.406 0.341
NN AACAAT 2860.08 2830 0.989 −0.011
NN AATAAT 2596.82 2424 0.933 −0.069
NN AATAAC 2860.08 1783 0.623 −0.473
NP AACCCC 2770.02 3474 1.254 0.226
NP AATCCA 2174.02 2380 1.095 0.091
NP AACCCA 2394.42 2612 1.091 0.087
NP AATCCT 2252.58 2414 1.072 0.069
NP AACCCG 1003.68 1048 1.044 0.043
NP AACCCT 2480.94 2578 1.039 0.038
NP AATCCC 2515.05 1641 0.652 −0.427
NP AATCCG 911.29 355 0.390 −0.943
NQ AATCAA 1516.57 1905 1.256 0.228
NQ AACCAA 1670.31 1955 1.170 0.157
NQ AACCAG 4664.22 5409 1.160 0.148
NQ AATCAG 4234.90 2817 0.665 −0.408
NR AACAGA 1511.98 2383 1.576 0.455
NR AACCGC 1370.77 1966 1.434 0.361
NR AACAGG 1484.36 1903 1.282 0.248
NR AACCGA 812.69 998 1.228 0.205
NR AACCGT 586.88 706 1.203 0.185
NR AACCGG 1499.21 1779 1.187 0.171
NR AATCGA 737.89 687 0.931 −0.071
NR AATCGT 532.86 486 0.912 −0.092
NR AATAGA 1372.81 1117 0.814 −0.206
NR AATCGC 1244.60 602 0.484 −0.726
NR AATAGG 1347.73 643 0.477 −0.740
NR AATCGG 1361.22 593 0.436 −0.831
NS AACAGC 2917.73 4490 1.539 0.431
NS AACAGT 1840.81 2414 1.311 0.271
NS AACTCG 681.02 821 1.206 0.187
NS AATTCA 1644.43 1970 1.198 0.181
NS AATTCT 2034.08 2383 1.172 0.158
NS AACTCC 2575.33 2818 1.094 0.090
NS AACTCA 1811.14 1783 0.984 −0.016
NS AACTCT 2240.29 1981 0.884 −0.123
NS AATAGT 1671.38 1193 0.714 −0.337
NS AATTCC 2338.29 1655 0.708 −0.346
NS AATAGC 2649.17 1273 0.481 −0.733
NS AATTCG 618.33 241 0.390 −0.942
NT AACACG 860.22 1238 1.439 0.364
NT AACACA 2119.90 2783 1.313 0.272
NT AACACC 2618.65 3278 1.252 0.225
NT AACACT 1874.68 2099 1.120 0.113
NT AATACT 1702.13 1540 0.905 −0.100
NT AATACA 1924.77 1692 0.879 −0.129
NT AATACC 2377.62 1312 0.552 −0.595
NT AATACG 781.04 317 0.406 −0.902
NV AATGTA 927.15 1710 1.844 0.612
NV AATGTT 1427.85 2573 1.802 0.589
NV AATGTC 1828.10 2877 1.574 0.453
NV AATGTG 3616.54 4314 1.193 0.176
NV AACGTG 3983.18 2772 0.696 −0.363
NV AACGTC 2013.43 1341 0.666 −0.406
NV AACGTT 1572.60 509 0.324 −1.128
NV AACGTA 1021.14 294 0.288 −1.245
NW AACTGG 1808.22 2595 1.435 0.361
NW AATTGG 1641.78 855 0.521 −0.652
NY AACTAC 2506.72 3191 1.273 0.241
NY AACTAT 2036.89 2145 1.053 0.052
NY AATTAT 1849.41 1795 0.971 −0.030
NY AATTAC 2275.98 1538 0.676 −0.392
PA CCGGCG 470.57 1166 2.478 0.907
PA CCGGCC 1740.39 2666 1.532 0.426
PA CCAGCA 2390.31 3368 1.409 0.343
PA CCAGCT 2730.54 3622 1.326 0.283
PA CCTGCT 2829.20 3750 1.325 0.282
PA CCTGCA 2476.67 3178 1.283 0.249
PA CCAGCC 4151.96 4942 1.190 0.174
PA CCCGCG 1298.71 1528 1.177 0.163
PA CCTGCC 4301.98 5000 1.162 0.150
PA CCAGCG 1122.61 1078 0.960 −0.041
PA CCTGCG 1163.17 1105 0.950 −0.051
PA CCGGCT 1144.57 1013 0.885 −0.122
PA CCGGCA 1001.95 777 0.775 −0.254
PA CCCGCC 4803.25 2690 0.560 −0.580
PA CCCGCA 2765.26 846 0.306 −1.184
PA CCCGCT 3158.86 821 0.260 −1.347
PC CCCTGC 1550.51 2870 1.851 0.616
PC CCCTGT 1305.97 1577 1.208 0.189
PC CCGTGC 561.80 630 1.121 0.115
PC CCTTGT 1169.67 1001 0.856 −0.156
PC CCATGT 1128.89 831 0.736 −0.306
PC CCGTGT 473.20 340 0.719 −0.331
PC CCTTGC 1388.69 937 0.675 −0.393
PC CCATGC 1340.27 733 0.547 −0.603
PD CCAGAT 2721.60 4165 1.530 0.425
PD CCTGAT 2819.94 3781 1.341 0.293
PD CCGGAC 1288.69 1659 1.287 0.253
PD CCAGAC 3074.36 3766 1.225 0.203
PD CCTGAC 3185.44 3646 1.145 0.135
PD CCGGAT 1140.82 895 0.785 −0.243
PD CCCGAC 3556.62 2215 0.623 −0.474
PD CCCGAT 3148.53 809 0.257 −1.359
PE CCAGAA 3999.86 5699 1.425 0.354
PE CCTGAG 5542.36 7122 1.285 0.251
PE CCGGAG 2242.20 2870 1.280 0.247
PE CCAGAG 5349.08 6777 1.267 0.237
PE CCTGAA 4144.39 5108 1.233 0.209
PE CCCGAG 6188.17 4149 0.670 −0.400
PE CCGGAA 1676.64 1032 0.616 −0.485
PE CCCGAA 4627.30 1013 0.219 −1.519
PF CCCTTC 2555.92 4301 1.683 0.520
PF CCATTT 1930.27 2057 1.066 0.064
PF CCTTTT 2000.01 1967 0.983 −0.017
PF CCCTTT 2233.06 2159 0.967 −0.034
PF CCTTTC 2289.18 2078 0.908 −0.097
PF CCGTTC 926.10 662 0.715 −0.336
PF CCATTC 2209.35 1290 0.584 −0.538
PF CCGTTT 809.12 439 0.543 −0.611
PG CCTGGG 2918.52 4310 1.477 0.390
PG CCTGGA 2989.52 4317 1.444 0.367
PG CCGGGC 1639.82 2353 1.435 0.361
PG CCGGGG 1180.71 1657 1.403 0.339
PG CCTGGT 1935.48 2673 1.381 0.323
PG CCAGGA 2885.27 3897 1.351 0.301
PG CCAGGG 2816.75 3472 1.233 0.209
PG CCAGGT 1867.98 2259 1.209 0.190
PG CCTGGC 4053.37 4622 1.140 0.131
PG CCAGGC 3912.02 4106 1.050 0.048
PG CCGGGT 783.01 661 0.844 −0.169
PG CCGGGA 1209.43 963 0.796 −0.228
PG CCCGGG 3258.60 2136 0.655 −0.422
PG CCCGGC 4525.68 2555 0.565 −0.572
PG CCCGGA 3337.86 968 0.290 −1.238
PG CCCGGT 2161.00 526 0.243 −1.413
PH CCGCAC 725.13 972 1.340 0.293
PH CCCCAC 2001.25 2505 1.252 0.225
PH CCTCAT 1299.79 1592 1.225 0.203
PH CCACAT 1254.46 1222 0.974 −0.026
PH CCCCAT 1451.24 1303 0.898 −0.108
PH CCTCAC 1792.40 1531 0.854 −0.158
PH CCACAC 1729.89 1366 0.790 −0.236
PH CCGCAT 525.84 289 0.550 −0.599
PI CCCATC 2119.04 4651 2.195 0.786
PI CCCATT 1675.71 2102 1.254 0.227
PI CCAATA 666.18 819 1.229 0.207
PI CCCATA 770.68 776 1.007 0.007
PI CCAATT 1448.49 1386 0.957 −0.044
PI CCTATA 690.25 603 0.874 −0.135
PI CCTATT 1500.83 1266 0.844 −0.170
PI CCAATC 1831.71 939 0.513 −0.668
PI CCTATC 1897.89 957 0.504 −0.685
PI CCGATT 607.17 299 0.492 −0.708
PI CCGATC 767.80 342 0.445 −0.809
PI CCGATA 279.24 115 0.412 −0.887
PK CCCAAG 3738.47 6383 1.707 0.535
PK CCCAAA 2886.60 3787 1.312 0.271
PK CCAAAA 2495.20 2489 0.998 −0.002
PK CCAAAG 3231.55 3127 0.968 −0.033
PK CCTAAA 2585.35 1840 0.712 −0.340
PK CCGAAG 1354.58 940 0.694 −0.365
PK CCTAAG 3348.32 1660 0.496 −0.702
PK CCGAAA 1045.92 460 0.440 −0.821
PL CCGCTG 1824.84 3343 1.832 0.605
PL CCGCTC 878.12 1254 1.428 0.356
PL CCTTTG 1466.52 2054 1.401 0.337
PL CCTTTA 872.97 1195 1.369 0.314
PL CCCTTG 1637.40 2122 1.296 0.259
PL CCTCTT 1496.70 1827 1.221 0.199
PL CCCCTG 5036.31 5760 1.144 0.134
PL CCCCTC 2423.49 2646 1.092 0.088
PL CCTCTA 807.16 871 1.079 0.076
PL CCATTA 842.53 826 0.980 −0.020
PL CCACTT 1444.51 1371 0.949 −0.052
PL CCACTA 779.01 729 0.936 −0.066
PL CCTCTC 2170.57 1934 0.891 −0.115
PL CCTCTG 4510.71 3745 0.830 −0.186
PL CCATTG 1415.38 1172 0.828 −0.189
PL CCCCTT 1671.10 1324 0.792 −0.233
PL CCGCTA 326.54 255 0.781 −0.247
PL CCCCTA 901.21 689 0.765 −0.268
PL CCACTG 4353.41 3218 0.739 −0.302
PL CCCTTA 974.69 709 0.727 −0.318
PL CCACTC 2094.88 1475 0.704 −0.351
PL CCGTTG 593.29 402 0.678 −0.389
PL CCGCTT 605.50 402 0.664 −0.410
PL CCGTTA 353.17 157 0.445 −0.811
PM CCCATG 2307.54 3923 1.700 0.531
PM CCAATG 1994.65 1552 0.778 −0.251
PM CCGATG 836.10 520 0.622 −0.475
PM CCTATG 2066.72 1210 0.585 −0.535
PN CCCAAC 2313.61 4255 1.839 0.609
PN CCAAAT 1815.81 2453 1.351 0.301
PN CCCAAT 2100.65 2296 1.093 0.089
PN CCAAAC 1999.90 1735 0.868 −0.142
PN CCTAAT 1881.42 1342 0.713 −0.338
PN CCTAAC 2072.16 997 0.481 −0.732
PN CCGAAT 761.14 340 0.447 −0.806
PP CCGCCG 608.57 2335 3.837 1.345
PP CCGCCC 1679.58 2697 1.606 0.474
PP CCCCCG 1679.58 2420 1.441 0.365
PP CCTCCA 3588.72 4314 1.202 0.184
PP CCTCCT 3718.39 4305 1.158 0.146
PP CCACCA 3463.58 3850 1.112 0.106
PP CCACCT 3588.72 3798 1.058 0.057
PP CCCCCA 4006.89 4095 1.022 0.022
PP CCACCC 4006.89 3595 0.897 −0.108
PP CCGCCA 1451.84 1280 0.882 −0.126
PP CCACCG 1451.84 1252 0.862 −0.148
PP CCGCCT 1504.30 1286 0.855 −0.157
PP CCTCCC 4151.67 3338 0.804 −0.218
PP CCTCCG 1504.30 1152 0.766 −0.267
PP CCCCCT 4151.67 3160 0.761 −0.273
PP CCCCCC 4635.43 2315 0.499 −0.694
PQ CCCCAG 5063.98 6421 1.268 0.237
PQ CCGCAG 1834.86 2187 1.192 0.176
PQ CCTCAA 1624.21 1752 1.079 0.076
PQ CCTCAG 4535.49 4221 0.931 −0.072
PQ CCACAA 1567.57 1405 0.896 −0.109
PQ CCACAG 4377.33 3670 0.838 −0.176
PQ CCCCAA 1813.47 1497 0.825 −0.192
PQ CCGCAA 657.08 321 0.489 −0.716
PR CCGCGC 563.43 1094 1.942 0.664
PR CCGCGG 616.23 1113 1.806 0.591
PR CCCAGG 1683.86 2927 1.738 0.553
PR CCCCGG 1700.71 2608 1.533 0.428
PR CCCCGC 1555.00 1979 1.273 0.241
PR CCCCGA 921.92 1166 1.265 0.235
PR CCTCGA 825.71 1015 1.229 0.206
PR CCAAGA 1482.62 1608 1.085 0.081
PR CCTCGT 596.27 644 1.080 0.077
PR CCCAGA 1715.19 1801 1.050 0.049
PR CCGAGG 610.12 636 1.042 0.042
PR CCTCGG 1523.22 1511 0.992 −0.008
PR CCCCGT 665.75 655 0.984 −0.016
PR CCAAGG 1455.54 1347 0.925 −0.077
PR CCACGA 796.91 632 0.793 −0.232
PR CCGCGT 241.23 191 0.792 −0.233
PR CCACGT 575.48 418 0.726 −0.320
PR CCACGG 1470.10 1040 0.707 −0.346
PR CCGCGA 334.04 226 0.677 −0.391
PR CCTCGC 1392.72 838 0.602 −0.508
PR CCACGC 1344.15 701 0.522 −0.651
PR CCGAGA 621.48 308 0.496 −0.702
PR CCTAGA 1536.19 692 0.450 −0.797
PR CCTAGG 1508.13 586 0.389 −0.945
PS CCCAGC 3196.25 6398 2.002 0.694
PS CCCTCG 746.03 1385 1.856 0.619
PS CCGTCG 270.31 483 1.787 0.580
PS CCCAGT 2016.53 2743 1.360 0.308
PS CCTTCA 1776.97 2263 1.274 0.242
PS CCTTCT 2198.02 2711 1.233 0.210
PS CCCTCC 2821.16 3353 1.189 0.173
PS CCATCA 1715.00 1819 1.061 0.059
PS CCATCT 2121.37 2183 1.029 0.029
PS CCTTCC 2526.74 2594 1.027 0.026
PS CCGTCC 1022.21 1048 1.025 0.025
PS CCCTCA 1984.02 1945 0.980 −0.020
PS CCAAGT 1743.10 1582 0.908 −0.097
PS CCCTCT 2454.14 2113 0.861 −0.150
PS CCTTCG 668.17 552 0.826 −0.191
PS CCATCC 2438.63 1995 0.818 −0.201
PS CCGAGC 1158.11 885 0.764 −0.269
PS CCATCG 644.87 475 0.737 −0.306
PS CCAAGC 2762.85 1659 0.600 −0.510
PS CCGTCT 889.22 523 0.588 −0.531
PS CCGAGT 730.66 371 0.508 −0.678
PS CCGTCA 718.88 364 0.506 −0.681
PS CCTAGT 1806.08 860 0.476 −0.742
PS CCTAGC 2862.68 968 0.338 −1.084
PT CCCACG 829.55 1764 2.126 0.754
PT CCCACC 2525.29 4586 1.816 0.597
PT CCCACA 2044.32 2719 1.330 0.285
PT CCCACT 1807.85 2282 1.262 0.233
PT CCAACA 1767.12 1895 1.072 0.070
PT CCAACT 1562.71 1593 1.019 0.019
PT CCGACG 300.57 305 1.015 0.015
PT CCTACT 1619.18 1252 0.773 −0.257
PT CCAACC 2182.87 1514 0.694 −0.366
PT CCTACA 1830.97 1241 0.678 −0.389
PT CCGACC 915.00 592 0.647 −0.435
PT CCAACG 717.06 463 0.646 −0.437
PT CCTACC 2261.75 1251 0.553 −0.592
PT CCGACT 655.05 342 0.522 −0.650
PT CCGACA 740.73 352 0.475 −0.744
PT CCTACG 742.97 352 0.474 −0.747
PV CCTGTT 1493.79 2375 1.590 0.464
PV CCTGTA 969.97 1482 1.528 0.424
PV CCAGTA 936.15 1352 1.444 0.368
PV CCTGTG 3783.57 5362 1.417 0.349
PV CCAGTT 1441.70 2038 1.414 0.346
PV CCTGTC 1912.53 2666 1.394 0.332
PV CCGGTG 1530.67 1911 1.248 0.222
PV CCAGTG 3651.63 3787 1.037 0.036
PV CCAGTC 1845.84 1863 1.009 0.009
PV CCGGTC 773.73 778 1.006 0.006
PV CCCGTG 4224.44 2576 0.610 −0.495
PV CCGGTT 604.32 351 0.581 −0.543
PV CCGGTA 392.41 215 0.548 −0.602
PV CCCGTC 2135.39 1084 0.508 −0.678
PV CCCGTT 1667.85 391 0.234 −1.451
PV CCCGTA 1083.00 216 0.199 −1.612
PW CCCTGG 1769.80 2753 1.556 0.442
PW CCGTGG 641.26 661 1.031 0.030
PW CCATGG 1529.83 1060 0.693 −0.367
PW CCTTGG 1585.10 1052 0.664 −0.410
PY CCCTAC 2166.25 3378 1.559 0.444
PY CCCTAT 1760.24 2097 1.191 0.175
PY CCTTAT 1576.54 1702 1.080 0.077
PY CCATAT 1521.56 1513 0.994 −0.006
PY CCTTAC 1940.18 1485 0.765 −0.267
PY CCGTAC 784.91 592 0.754 −0.282
PY CCGTAT 637.80 429 0.673 −0.397
PY CCATAC 1872.52 1064 0.568 −0.565
QA CAAGCA 1597.87 2339 1.464 0.381
QA CAAGCT 1825.31 2409 1.320 0.277
QA CAGGCG 2095.55 2271 1.084 0.080
QA CAGGCC 7750.37 7695 0.993 −0.007
QA CAAGCC 2775.49 2655 0.957 −0.044
QA CAGGCT 5097.04 4584 0.899 −0.106
QA CAGGCA 4461.94 3943 0.884 −0.124
QA CAAGCG 750.44 458 0.610 −0.494
QC CAGTGT 2490.13 2791 1.121 0.114
QC CAGTGC 2956.40 3260 1.103 0.098
QC CAATGT 891.74 822 0.922 −0.081
QC CAATGC 1058.72 524 0.495 −0.703
QD CAAGAT 2128.42 3326 1.563 0.446
QD CAAGAC 2404.29 2506 1.042 0.041
QD CAGGAC 6713.82 6642 0.989 −0.011
QD CAGGAT 5943.46 4716 0.793 −0.231
QE CAAGAA 3247.03 5286 1.628 0.487
QE CAGGAG 12125.58 12556 1.035 0.035
QE CAAGAG 4342.30 4206 0.969 −0.032
QE CAGGAA 9067.09 6734 0.743 −0.297
QF CAGTTT 3509.26 4032 1.149 0.139
QF CAGTTC 4016.64 4205 1.047 0.046
QF CAATTT 1256.70 1156 0.920 −0.084
QF CAATTC 1438.40 828 0.576 −0.552
QG CAAGGA 1440.03 2837 1.970 0.678
QG CAAGGT 932.30 1506 1.615 0.480
QG CAAGGG 1405.83 1700 1.209 0.190
QG CAAGGC 1952.47 2192 1.123 0.116
QG CAGGGC 5452.14 5605 1.028 0.028
QG CAGGGT 2603.39 2292 0.880 −0.127
QG CAGGGA 4021.17 2871 0.714 −0.337
QG CAGGGG 3925.67 2730 0.695 −0.363
QH CAACAT 1067.82 1364 1.277 0.245
QH CAGCAC 4111.88 4483 1.090 0.086
QH CAGCAT 2981.80 2794 0.937 −0.065
QH CAACAC 1472.51 993 0.674 −0.394
QI CAAATA 656.37 1125 1.714 0.539
QI CAAATT 1427.17 1667 1.168 0.155
QI CAGATC 5039.60 5197 1.031 0.031
QI CAGATA 1832.87 1802 0.983 −0.017
QI CAGATT 3985.26 3693 0.927 −0.076
QI CAAATC 1804.74 1262 0.699 −0.358
QK CAGAAG 8990.94 9726 1.082 0.079
QK CAAAAA 2486.09 2610 1.050 0.049
QK CAGAAA 6942.22 6532 0.941 −0.061
QK CAAAAG 3219.76 2771 0.861 −0.150
QL CAGCTG 10304.18 12629 1.226 0.203
QL CAACTA 660.31 798 1.209 0.189
QL CAACTT 1224.39 1479 1.208 0.189
QL CAGCTC 4958.40 5986 1.207 0.188
QL CAGCTA 1843.86 2002 1.086 0.082
QL CAGCTT 3419.03 3476 1.017 0.017
QL CAATTA 714.15 642 0.899 −0.107
QL CAGTTG 3350.09 2597 0.775 −0.255
QL CAGTTA 1994.20 1518 0.761 −0.273
QL CAACTC 1775.66 1279 0.720 −0.328
QL CAACTG 3690.04 2093 0.567 −0.567
QL CAATTG 1199.70 635 0.529 −0.636
QM CAGATG 5587.91 5592 1.001 0.001
QM CAAATG 2001.09 1997 0.998 −0.002
QN CAAAAT 1720.47 2394 1.391 0.330
QN CAGAAC 5291.34 5195 0.982 −0.018
QN CAGAAT 4804.30 4430 0.922 −0.081
QN CAAAAC 1894.89 1692 0.893 −0.113
QP CAGCCG 1816.66 2237 1.231 0.208
QP CAGCCC 5013.75 6143 1.225 0.203
QP CAGCCT 4490.51 4526 1.008 0.008
QP CAGCCA 4333.91 4235 0.977 −0.023
QP CAACCA 1552.02 1441 0.928 −0.074
QP CAACCT 1608.10 1304 0.811 −0.210
QP CAACCC 1795.48 1132 0.630 −0.461
QP CAACCG 650.57 243 0.374 −0.985
QQ CAACAA 1545.49 1866 1.207 0.188
QQ CAGCAG 12051.19 13131 1.090 0.086
QQ CAGCAA 4315.66 4034 0.935 −0.067
QQ CAACAG 4315.66 3197 0.741 −0.300
QR CAAAGA 1214.45 1863 1.534 0.428
QR CAGAGG 3329.32 4331 1.301 0.263
QR CAAAGG 1192.27 1360 1.141 0.132
QR CAGAGA 3391.27 3777 1.114 0.108
QR CAGCGC 3074.54 3169 1.031 0.030
QR CAGCGG 3362.63 3352 0.997 −0.003
QR CAGCGT 1316.32 1215 0.923 −0.080

QR
CAGCGA
1822.82
1469
0.806
−0.216

QR
CAACGT
471.39
327
0.694
−0.366

QR
CAACGA
652.77
413
0.633
−0.458

QR
CAACGG
1204.20
453
0.376
−0.978

QR
CAACGC
1101.03
404
0.367
−1.003

QS
CAAAGT
904.91
1408
1.556
0.442

QS
CAGAGC
4005.17
5248
1.310
0.270

QS
CAGAGT
2526.89
2963
1.173
0.159

QS
CAAAGC
1434.30
1465
1.021
0.021

QS
CAGTCG
934.84
923
0.987
−0.013

QS
CAGTCA
2486.15
2379
0.957
−0.044

QS
CAGTCT
3075.24
2806
0.912
−0.092

QS
CAATCA
890.32
781
0.877
−0.131

QS
CAGTCC
3535.16
3051
0.863
−0.147

QS
CAATCT
1101.28
765
0.695
−0.364

QS
CAATCC
1265.98
587
0.464
−0.769

QS
CAATCG
334.78
119
0.355
−1.034

QT
CAAACT
1116.05
1463
1.311
0.271

QT
CAAACA
1262.03
1602
1.269
0.239

QT
CAGACG
1430.02
1665
1.164
0.152

QT
CAGACC
4353.25
4301
0.988
−0.012

QT
CAGACA
3524.12
3445
0.978
−0.023

QT
CAGACT
3116.48
2792
0.896
−0.110

QT
CAAACC
1558.95
1232
0.790
−0.235

QT
CAAACG
512.11
373
0.728
−0.317

QV
CAAGTA
657.01
1210
1.842
0.611

QV
CAAGTT
1011.82
1737
1.717
0.540

QV
CAAGTC
1295.45
1468
1.133
0.125

QV
CAAGTG
2562.79
2712
1.058
0.057

QV
CAGGTG
7156.41
7062
0.987
−0.013

QV
CAGGTC
3617.45
3213
0.888
−0.119

QV
CAGGTT
2825.43
2269
0.803
−0.219

QV
CAGGTA
1834.65
1290
0.703
−0.352

QW
CAGTGG
3057.92
3447
1.127
0.120

QW
CAATGG
1095.08
706
0.645
−0.439

QY
CAATAT
1029.01
1120
1.088
0.085

QY
CAGTAC
3536.21
3820
1.080
0.077

QY
CAGTAT
2873.43
2979
1.037
0.036

QY
CAATAC
1266.36
786
0.621
−0.477

RA
CGGGCG
659.18
1185
1.798
0.587

RA
CGGGCC
2437.97
3513
1.441
0.365

RA
AGAGCA
1415.51
1970
1.392
0.331

RA
CGCGCG
602.71
827
1.372
0.316

RA
CGTGCC
954.35
1266
1.327
0.283

RA
CGAGCA
760.84
970
1.275
0.243

RA
CGAGCT
869.13
1108
1.275
0.243

RA
CGAGCC
1321.57
1595
1.207
0.188

RA
AGAGCT
1616.99
1949
1.205
0.187

RA
CGTGCT
627.63
744
1.185
0.170

RA
CGGGCA
1403.55
1612
1.149
0.138

RA
CGTGCA
549.43
570
1.037
0.037

RA
CGTGCG
258.04
250
0.969
−0.032

RA
CGAGCG
357.33
341
0.954
−0.047

RA
AGGGCC
2413.81
2173
0.900
−0.105

RA
AGAGCC
2458.73
2202
0.896
−0.110

RA
CGGGCT
1603.33
1435
0.895
−0.111

RA
AGGGCA
1389.65
1242
0.894
−0.112

RA
AGGGCT
1587.45
1311
0.826
−0.191

RA
AGGGCG
652.65
524
0.803
−0.220

RA
CGCGCC
2229.09
1712
0.768
−0.264

RA
AGAGCG
664.79
384
0.578
−0.549

RA
CGCGCA
1283.30
331
0.258
−1.355

RA
CGCGCT
1465.97
369
0.252
−1.379

RC
CGCTGC
986.26
2873
2.913
1.069

RC
CGCTGT
830.71
1313
1.581
0.458

RC
CGTTGT
355.66
320
0.900
−0.106

RC
CGTTGC
422.25
372
0.881
−0.127

RC
AGATGT
916.29
806
0.880
−0.128

RC
CGATGT
492.51
421
0.855
−0.157

RC
AGGTGT
899.55
671
0.746
−0.293

RC
AGGTGC
1067.99
758
0.710
−0.343

RC
CGATGC
584.73
381
0.652
−0.428

RC
CGGTGC
1078.67
660
0.612
−0.491

RC
AGATGC
1087.86
642
0.590
−0.527

RC
CGGTGT
908.55
414
0.456
−0.786

RD
AGAGAT
2027.66
2952
1.456
0.376

RD
CGGGAC
2271.13
3231
1.423
0.353

RD
CGAGAT
1089.87
1500
1.376
0.319

RD
CGAGAC
1231.14
1693
1.375
0.319

RD
CGTGAC
889.05
1044
1.174
0.161

RD
AGAGAC
2290.48
2433
1.062
0.060

RD
CGTGAT
787.04
833
1.058
0.057

RD
AGGGAC
2248.63
2322
1.033
0.032

RD
AGGGAT
1990.62
1732
0.870
−0.139

RD
CGGGAT
2010.54
1606
0.799
−0.225

RD
CGCGAC
2076.56
1092
0.526
−0.643

RD
CGCGAT
1838.29
313
0.170
−1.770

RE
AGAGAA
2644.21
4195
1.586
0.462

RE
CGGGAG
3506.29
5344
1.524
0.421

RE
CGAGAG
1900.69
2475
1.302
0.264

RE
CGAGAA
1421.27
1844
1.297
0.260

RE
CGTGAG
1372.55
1453
1.059
0.057

RE
AGGGAG
3471.55
3469
0.999
−0.001

RE
AGAGAG
3536.15
3392
0.959
−0.042

RE
CGTGAA
1026.35
947
0.923
−0.080

RE
AGGGAA
2595.91
2343
0.903
−0.103

RE
CGGGAA
2621.88
2131
0.813
−0.207

RE
CGCGAG
3205.89
1839
0.574
−0.556

RE
CGCGAA
2397.25
268
0.112
−2.191

RF
CGCTTC
1446.49
3411
2.358
0.858

RF
CGTTTC
619.29
823
1.329
0.284

RF
CGTTTT
541.07
705
1.303
0.265

RF
AGATTT
1393.96
1531
1.098
0.094

RF
CGCTTT
1263.77
1366
1.081
0.078

RF
CGATTT
749.26
772
1.030
0.030

RF
AGGTTT
1368.50
1295
0.946
−0.055

RF
AGGTTC
1566.36
1192
0.761
−0.273

RF
CGATTC
857.59
632
0.737
−0.305

RF
CGGTTC
1582.03
951
0.601
−0.509

RF
AGATTC
1595.50
944
0.592
−0.525

RF
CGGTTT
1382.19
744
0.538
−0.619

RG
CGTGGT
370.38
685
1.849
0.615

RG
CGTGGG
558.50
980
1.755
0.562

RG
CGTGGC
775.66
1315
1.695
0.528

RG
CGAGGA
792.21
1266
1.598
0.469

RG
CGAGGG
773.39
1219
1.576
0.455

RG
AGAGGA
1473.87
2281
1.548
0.437

RG
CGAGGT
512.89
789
1.538
0.431

RG
CGGGGC
1981.48
2952
1.490
0.399

RG
CGTGGA
572.08
844
1.475
0.389

RG
CGAGGC
1074.12
1569
1.461
0.379

RG
AGAGGT
954.21
1128
1.182
0.167

RG
CGGGGT
946.15
918
0.970
−0.030

RG
CGCGGC
1811.72
1574
0.869
−0.141

RG
AGGGGC
1961.86
1660
0.846
−0.167

RG
AGAGGC
1998.36
1680
0.841
−0.174

RG
AGAGGG
1438.87
1203
0.836
−0.179

RG
AGGGGT
936.78
777
0.829
−0.187

RG
CGGGGG
1426.72
1146
0.803
−0.219

RG
CGGGGA
1461.42
1140
0.780
−0.248

RG
CGCGGG
1304.48
904
0.693
−0.367

RG
AGGGGA
1446.94
923
0.638
−0.450

RG
AGGGGG
1412.58
683
0.484
−0.727

RG
CGCGGT
865.09
248
0.287
−1.249

RG
CGCGGA
1336.22
302
0.226
−1.487

RH
CGCCAC
1288.00
1861
1.445
0.368

RH
CGGCAC
1408.69
1707
1.212
0.192

RH
AGACAT
1030.24
1201
1.166
0.153

RH
CGTCAT
399.89
447
1.118
0.111

RH
AGGCAT
1011.41
988
0.977
−0.023

RH
CGACAT
553.75
530
0.957
−0.044

RH
AGGCAC
1394.73
1292
0.926
−0.077

RH
AGACAC
1420.69
1212
0.853
−0.159

RH
CGTCAC
551.44
468
0.849
−0.164

RH
CGACAC
763.62
614
0.804
−0.218

RH
CGCCAT
934.02
728
0.779
−0.249

RH
CGGCAT
1021.53
730
0.715
−0.336

RI
CGCATC
1625.56
2948
1.814
0.595

RI
AGAATA
652.11
1175
1.802
0.589

RI
AGAATT
1417.90
2185
1.541
0.432

RI
AGGATA
640.20
804
1.256
0.228

RI
CGAATA
350.51
439
1.252
0.225

RI
CGAATT
762.13
850
1.115
0.109

RI
AGGATT
1392.00
1366
0.981
−0.019

RI
AGGATC
1760.27
1662
0.944
−0.057

RI
CGAATC
963.75
802
0.832
−0.184

RI
CGGATC
1777.88
1479
0.832
−0.184

RI
AGAATC
1793.03
1389
0.775
−0.255

RI
CGTATT
550.36
408
0.741
−0.299

RI
CGCATT
1285.48
913
0.710
−0.342

RI
CGGATA
646.60
451
0.697
−0.360

RI
CGTATC
695.96
440
0.632
−0.459

RI
CGTATA
253.12
152
0.601
−0.510

RI
CGGATT
1405.93
825
0.587
−0.533

RI
CGCATA
591.21
276
0.467
−0.762

RK
AGGAAG
3199.71
4856
1.518
0.417

RK
AGGAAA
2470.61
3737
1.513
0.414

RK
AGAAAA
2516.58
3482
1.384
0.325

RK
CGCAAG
2954.85
2981
1.009
0.009

RK
CGGAAG
3231.73
3225
0.998
−0.002

RK
AGAAAG
3259.25
2909
0.893
−0.114

RK
CGAAAA
1352.67
1189
0.879
−0.129

RK
CGGAAA
2495.33
1834
0.735
−0.308

RK
CGAAAG
1751.85
1265
0.722
−0.326

RK
CGTAAA
976.81
566
0.579
−0.546

RK
CGCAAA
2281.54
1209
0.530
−0.635

RK
CGTAAG
1265.08
503
0.398
−0.922

RL
CGCCTC
1491.12
2511
1.684
0.521

RL
CGCCTG
3098.73
4809
1.552
0.439

RL
CGGCTG
3389.08
5029
1.484
0.395

RL
CGGCTC
1630.84
2301
1.411
0.344

RL
CGTTTA
256.76
337
1.313
0.272

RL
AGATTA
661.49
862
1.303
0.265

RL
CGTCTT
440.20
562
1.277
0.244

RL
CGTCTA
237.40
296
1.247
0.221

RL
CGTTTG
431.33
526
1.219
0.198

RL
CGTCTC
638.40
723
1.133
0.124

RL
AGGCTA
600.44
669
1.114
0.108

RL
AGACTT
1134.11
1227
1.082
0.079

RL
AGGCTG
3355.51
3531
1.052
0.051

RL
AGACTA
611.62
617
1.009
0.009

RL
AGGCTT
1113.39
1104
0.992
−0.008

RL
CGACTA
328.75
324
0.986
−0.015

RL
CGGCTA
606.45
593
0.978
−0.022

RL
CGTCTG
1326.68
1281
0.966
−0.035

RL
AGGCTC
1614.68
1540
0.954
−0.047

RL
CGATTA
355.55
337
0.948
−0.054

RL
CGACTT
609.59
576
0.945
−0.057

RL
CGCCTA
554.49
501
0.904
−0.101

RL
AGGTTA
649.40
586
0.902
−0.103

RL
CGCCTT
1028.19
862
0.838
−0.176

RL
CGCTTG
1007.46
804
0.798
−0.226

RL
CGGCTT
1124.53
866
0.770
−0.261

RL
AGATTG
1111.24
839
0.755
−0.281

RL
CGACTC
884.04
663
0.750
−0.288

RL
AGGTTG
1090.94
774
0.709
−0.343

RL
AGACTC
1644.73
1142
0.694
−0.365

RL
CGATTG
597.29
408
0.683
−0.381

RL
CGACTG
1837.15
1128
0.614
−0.488

RL
CGCTTA
599.71
345
0.575
−0.553

RL
CGGTTG
1101.86
566
0.514
−0.666

RL
AGACTG
3417.95
1701
0.498
−0.698

RL
CGGTTA
655.90
297
0.453
−0.792

RM
CGCATG
1558.32
1961
1.258
0.230

RM
AGGATG
1687.45
1974
1.170
0.157

RM
CGAATG
923.88
932
1.009
0.009

RM
AGAATG
1718.85
1690
0.983
−0.017

RM
CGGATG
1704.33
1374
0.806
−0.215

RM
CGTATG
667.17
329
0.493
−0.707

RN
AGAAAT
1568.88
2627
1.674
0.515

RN
AGGAAC
1696.37
2200
1.297
0.260

RN
AGGAAT
1540.22
1796
1.166
0.154

RN
AGAAAC
1727.93
1949
1.128
0.120

RN
CGAAAT
843.28
930
1.103
0.098

RN
CGCAAC
1566.55
1575
1.005
0.005

RN
CGGAAC
1713.34
1621
0.946
−0.055

RN
CGAAAC
928.77
784
0.844
−0.169

RN
CGGAAT
1555.63
1002
0.644
−0.440

RN
CGTAAT
608.96
340
0.558
−0.583

RN
CGCAAT
1422.36
711
0.500
−0.693

RN
CGTAAC
670.70
308
0.459
−0.778

RP
CGGCCG
587.88
1226
2.085
0.735

RP
CGGCCC
1622.47
2939
1.811
0.594

RP
CGCCCG
537.51
717
1.334
0.288

RP
AGGCCC
1606.39
1982
1.234
0.210

RP
AGGCCG
582.05
666
1.144
0.135

RP
AGGCCT
1438.75
1642
1.141
0.132

RP
AGGCCA
1388.57
1511
1.088
0.084

RP
CGTCCT
568.84
589
1.035
0.035

RP
AGACCA
1414.41
1387
0.981
−0.020

RP
CGGCCT
1453.14
1390
0.957
−0.044

RP
AGACCT
1465.52
1398
0.954
−0.047

RP
CGTCCC
635.12
582
0.916
−0.087

RP
CGGCCA
1402.47
1285
0.916
−0.087

RP
CGCCCC
1483.46
1320
0.890
−0.117

RP
CGTCCA
549.00
487
0.887
−0.120

RP
AGACCC
1636.29
1283
0.784
−0.243

RP
CGACCA
760.25
591
0.777
−0.252

RP
CGACCC
879.51
671
0.763
−0.271

RP
CGACCT
787.72
580
0.736
−0.306

RP
CGCCCA
1282.31
887
0.692
−0.369

RP
CGTCCG
230.13
159
0.691
−0.370

RP
CGCCCT
1328.65
830
0.625
−0.470

RP
CGACCG
318.68
184
0.577
−0.549

RP
AGACCG
592.88
246
0.415
−0.880

RQ
AGACAA
1054.78
1456
1.380
0.322

RQ
CGGCAG
2920.52
3950
1.352
0.302

RQ
CGCCAG
2670.31
3160
1.183
0.168

RQ
AGGCAA
1035.51
1177
1.137
0.128

RQ
AGGCAG
2891.59
3013
1.042
0.041

RQ
CGACAA
566.95
522
0.921
−0.083

RQ
CGTCAG
1143.25
953
0.834
−0.182

RQ
CGTCAA
409.41
327
0.799
−0.225

RQ
CGACAG
1583.16
1249
0.789
−0.237

RQ
CGGCAA
1045.87
763
0.730
−0.315

RQ
AGACAG
2945.39
2062
0.700
−0.357

RQ
CGCCAA
956.27
591
0.618
−0.481

RR
CGCCGC
1172.08
2232
1.904
0.644

RR
CGGCGG
1402.02
2316
1.652
0.502

RR
AGAAGA
1426.00
2307
1.618
0.481

RR
CGGCGC
1281.90
2064
1.610
0.476

RR
AGGAGG
1374.38
1973
1.436
0.362

RR
CGCCGG
1281.90
1679
1.310
0.270

RR
CGAAGA
766.48
987
1.288
0.253

RR
AGGAGA
1399.95
1758
1.256
0.228

RR
CGCAGG
1269.20
1565
1.233
0.209

RR
CGGAGG
1388.13
1670
1.203
0.185

RR
CGTCGT
214.84
228
1.061
0.059

RR
CGAAGG
752.48
770
1.023
0.023

RR
CGCCGT
501.81
502
1.000
0.000

RR
AGAAGG
1399.95
1325
0.946
−0.055

RR
CGGCGT
548.83
498
0.907
−0.097

RR
CGTCGA
297.51
265
0.891
−0.116

RR
CGGCGA
760.01
675
0.888
−0.119

RR
CGTCGC
501.81
438
0.873
−0.136

RR
AGGCGG
1388.13
1177
0.848
−0.165

RR
CGTCGG
548.83
450
0.820
−0.199

RR
CGACGT
297.51
241
0.810
−0.211

RR
CGCCGA
694.89
547
0.787
−0.239

RR
AGGCGA
752.48
570
0.757
−0.278

RR
CGGAGA
1413.96
1068
0.755
−0.281

RR
AGACGA
766.48
557
0.727
−0.319

RR
AGGCGT
543.39
383
0.705
−0.350

RR
AGGCGC
1269.20
889
0.700
−0.356

RR
AGACGT
553.50
376
0.679
−0.387

RR
CGACGA
411.98
272
0.660
−0.415

RR
CGCAGA
1292.82
771
0.596
−0.517

RR
CGACGG
760.01
411
0.541
−0.615

RR
CGACGC
694.89
368
0.530
−0.636

RR
CGTAGA
553.50
271
0.490
−0.714

RR
CGTAGG
543.39
235
0.432
−0.838

RR
AGACGC
1292.82
524
0.405
−0.903

RR
AGACGG
1413.96
569
0.402
−0.910

RS
CGCTCG
332.61
817
2.456
0.899

RS
CGCAGC
1425.00
2853
2.002
0.694

RS
CGCTCC
1257.78
2184
1.736
0.552

RS
AGAAGT
991.66
1532
1.545
0.435

RS
CGTTCT
468.44
687
1.467
0.383

RS
CGAAGT
533.02
728
1.366
0.312

RS
CGTTCC
538.50
707
1.313
0.272

RS
AGGAGC
1543.09
1992
1.291
0.255

RS
CGTTCA
378.71
471
1.244
0.218

RS
CGGAGC
1558.53
1856
1.191
0.175

RS
AGGAGT
973.54
1071
1.100
0.095

RS
AGAAGC
1571.80
1628
1.036
0.035

RS
AGATCA
975.67
1000
1.025
0.025

RS
CGAAGC
844.85
859
1.017
0.017

RS
CGCTCA
884.55
860
0.972
−0.028

RS
CGCAGT
899.04
853
0.949
−0.053

RS
AGATCT
1206.86
1106
0.916
−0.087

RS
CGCTCT
1094.14
942
0.861
−0.150

RS
CGTTCG
142.40
121
0.850
−0.163

RS
AGGTCA
957.85
808
0.844
−0.170

RS
CGATCA
524.43
416
0.793
−0.232

RS
AGGTCT
1184.81
939
0.793
−0.233

RS
AGGTCG
360.17
284
0.789
−0.238

RS
CGATCT
648.69
497
0.766
−0.266

RS
AGGTCC
1362.00
1036
0.761
−0.274

RS
CGGAGT
983.28
745
0.758
−0.278

RS
CGTAGT
384.91
278
0.722
−0.325

RS
CGGTCG
363.77
235
0.646
−0.437

RS
CGATCC
745.70
455
0.610
−0.494

RS
AGATCC
1387.35
830
0.598
−0.514

RS
CGGTCC
1375.63
821
0.597
−0.516

RS
CGATCG
197.19
107
0.543
−0.611

RS
CGGTCA
967.43
507
0.524
−0.646

RS
CGTAGC
610.09
317
0.520
−0.655

RS
AGATCG
366.87
177
0.482
−0.729

RS
CGGTCT
1196.66
518
0.433
−0.837

RT
CGCACG
450.78
858
1.903
0.644

RT
AGAACT
1083.61
1467
1.354
0.303

RT
CGCACC
1372.27
1821
1.327
0.283

RT
AGGACG
488.14
646
1.323
0.280

RT
AGGACT
1063.81
1389
1.306
0.267

RT
AGAACA
1225.34
1575
1.285
0.251

RT
AGGACA
1202.96
1523
1.266
0.236

RT
AGGACC
1485.98
1773
1.193
0.177

RT
CGGACG
493.02
537
1.089
0.085

RT
CGAACA
658.62
661
1.004
0.004

RT
CGAACT
582.44
556
0.955
−0.046

RT
CGGACC
1500.85
1408
0.938
−0.064

RT
CGCACA
1110.90
984
0.886
−0.121

RT
CGGACA
1215.00
949
0.781
−0.247

RT
AGAACC
1513.63
1166
0.770
−0.261

RT
CGTACT
420.60
313
0.744
−0.295

RT
CGAACC
813.58
599
0.736
−0.306

RT
CGGACT
1074.45
712
0.663
−0.411

RT
CGCACT
982.40
638
0.649
−0.432

RT
CGTACC
587.52
361
0.614
−0.487

RT
AGAACG
497.22
302
0.607
−0.499

RT
CGTACA
475.62
288
0.606
−0.502

RT
CGAACG
267.26
154
0.576
−0.551

RT
CGTACG
193.00
79
0.409
−0.893

RV
CGTGTG
889.90
1699
1.909
0.647

RV
CGTGTC
449.83
826
1.836
0.608

RV
CGAGTA
315.92
562
1.779
0.576

RV
CGTGTA
228.14
391
1.714
0.539

RV
CGTGTT
351.34
565
1.608
0.475

RV
AGAGTT
905.17
1350
1.491
0.400

RV
AGAGTA
587.76
876
1.490
0.399

RV
CGAGTC
622.91
914
1.467
0.383

RV
CGAGTT
486.53
681
1.400
0.336

RV
CGAGTG
1232.31
1576
1.279
0.246

RV
CGGGTC
1149.12
1310
1.140
0.131

RV
AGGGTC
1137.73
1221
1.073
0.071

RV
CGGGTG
2273.30
2328
1.024
0.024

RV
AGAGTC
1158.91
1154
0.996
−0.004

RV
CGCGTG
2078.54
1725
0.830
−0.186

RV
AGGGTA
577.02
471
0.816
−0.203

RV
AGAGTG
2292.67
1750
0.763
−0.270

RV
CGGGTA
582.79
438
0.752
−0.286

RV
AGGGTG
2250.78
1658
0.737
−0.306

RV
CGCGTC
1050.67
763
0.726
−0.320

RV
AGGGTT
888.63
645
0.726
−0.320

RV
CGGGTT
897.52
548
0.611
−0.493

RV
CGCGTA
532.86
132
0.248
−1.395

RV
CGCGTT
820.63
178
0.217
−1.528

RW
CGCTGG
1038.00
2199
2.118
0.751

RW
CGTTGG
444.40
380
0.855
−0.157

RW
AGGTGG
1124.01
876
0.779
−0.249

RW
CGATGG
615.40
466
0.757
−0.278

RW
AGATGG
1144.93
804
0.702
−0.353

RW
CGGTGG
1135.26
777
0.684
−0.379

RY
CGCTAC
1173.12
2612
2.227
0.800

RY
CGCTAT
953.25
1198
1.257
0.229

RY
CGTTAC
502.25
565
1.125
0.118

RY
CGTTAT
408.12
459
1.125
0.117

RY
AGATAT
1051.45
1018
0.968
−0.032

RY
AGATAC
1293.97
1239
0.958
−0.043

RY
CGATAT
565.15
509
0.901
−0.105

RY
CGATAC
695.51
584
0.840
−0.175

RY
AGGTAC
1270.33
1007
0.793
−0.232

RY
AGGTAT
1032.24
769
0.745
−0.294

RY
CGGTAC
1283.04
856
0.667
−0.405

RY
CGGTAT
1042.57
455
0.436
−0.829

SA
TCGGCG
241.39
778
3.223
1.170

SA
TCGGCC
892.76
1976
2.213
0.795

SA
TCAGCA
1366.87
2526
1.848
0.614

SA
TCTGCA
1690.75
3035
1.795
0.585

SA
TCTGCT
1931.41
3350
1.734
0.551

SA
TCAGCT
1561.43
2630
1.684
0.521

SA
AGTGCT
1587.01
2487
1.567
0.449

SA
AGTGCA
1389.27
2040
1.468
0.384

SA
AGTGCC
2413.15
3437
1.424
0.354

SA
TCAGCC
2374.25
3294
1.387
0.327

SA
TCGGCT
587.12
808
1.376
0.319

SA
TCTGCC
2936.83
3480
1.185
0.170

SA
TCGGCA
513.97
598
1.163
0.151

SA
TCTGCG
794.06
745
0.938
−0.064

SA
TCAGCG
641.95
584
0.910
−0.095

SA
AGTGCG
652.47
532
0.815
−0.204

SA
AGCGCG
1034.18
802
0.775
−0.254

SA
AGCGCC
3824.90
2428
0.635
−0.454

SA
TCCGCG
912.82
577
0.632
−0.459

SA
TCCGCC
3376.05
1230
0.364
−1.010

SA
AGCGCT
2515.45
709
0.282
−1.266

SA
AGCGCA
2202.02
601
0.273
−1.299

SA
TCCGCA
1943.61
476
0.245
−1.407

SA
TCCGCT
2220.26
481
0.217
−1.530

SC
TCCTGC
1640.34
2828
1.724
0.545

SC
AGCTGC
1858.43
3034
1.633
0.490

SC
TCCTGT
1381.63
1779
1.288
0.253

SC
AGCTGT
1565.33
1922
1.228
0.205

SC
TCGTGC
433.77
361
0.832
−0.184

SC
TCTTGT
1201.89
941
0.783
−0.245

SC
AGTTGT
987.57
698
0.707
−0.347

SC
TCGTGT
365.36
225
0.616
−0.485

SC
TCATGT
971.65
584
0.601
−0.509

SC
TCTTGC
1426.94
758
0.531
−0.633

SC
TCATGC
1153.59
525
0.455
−0.787

SC
AGTTGC
1172.49
504
0.430
−0.844

SD
TCAGAT
1978.63
3706
1.873
0.628

SD
AGTGAT
2011.05
3683
1.831
0.605

SD
AGTGAC
2271.71
4040
1.778
0.576

SD
TCGGAC
840.43
1438
1.711
0.537

SD
TCTGAT
2447.46
3578
1.462
0.380

SD
TCAGAC
2235.09
2906
1.300
0.262

SD
TCGGAT
744.00
840
1.129
0.121

SD
TCTGAC
2764.69
2949
1.067
0.065

SD
AGCGAC
3600.71
2017
0.560
−0.580

SD
TCCGAC
3178.17
1336
0.420
−0.867

SD
AGCGAT
3187.56
920
0.289
−1.243

SD
TCCGAT
2813.50
660
0.235
−1.450

SE
TCAGAA
2420.84
4815
1.989
0.688

SE
AGTGAA
2460.50
4686
1.904
0.644

SE
TCGGAG
1217.33
2184
1.794
0.584

SE
TCTGAA
2994.45
4621
1.543
0.434

SE
TCAGAG
3237.43
4683
1.447
0.369

SE
AGTGAG
3290.47
4410
1.340
0.293

SE
TCTGAG
4004.54
4891
1.221
0.200

SE
TCGGAA
910.28
879
0.966
−0.035

SE
AGCGAG
5215.47
2961
0.568
−0.566

SE
TCCGAG
4603.44
2005
0.436
−0.831

SE
AGCGAA
3899.95
847
0.217
−1.527

SE
TCCGAA
3442.29
715
0.208
−1.572

SF
TCCTTC
2645.79
4407
1.666
0.510

SF
AGCTTC
2997.56
3942
1.315
0.274

SF
TCATTT
1625.65
1773
1.091
0.087

SF
TCCTTT
2311.58
2487
1.076
0.073

SF
AGTTTT
1652.29
1695
1.026
0.026

SF
AGCTTT
2618.91
2370
0.905
−0.100

SF
TCTTTT
2010.85
1809
0.900
−0.106

SF
TCTTTC
2301.58
1728
0.751
−0.287

SF
AGTTTC
1891.18
1353
0.715
−0.335

SF
TCGTTT
611.27
342
0.559
−0.581

SF
TCATTC
1860.69
991
0.533
−0.630

SF
TCGTTC
699.65
330
0.472
−0.751

SG
AGTGGT
1051.00
2094
1.992
0.689

SG
TCGGGG
586.31
1117
1.905
0.645

SG
TCGGGC
814.29
1487
1.826
0.602

SG
AGTGGA
1623.36
2932
1.806
0.591

SG
TCAGGA
1597.19
2760
1.728
0.547

SG
TCTGGA
1975.64
3391
1.716
0.540

SG
AGTGGG
1584.81
2584
1.630
0.489

SG
TCTGGG
1928.73
2974
1.542
0.433

SG
AGTGGC
2201.05
3314
1.506
0.409

SG
TCTGGT
1279.07
1902
1.487
0.397

SG
TCAGGG
1559.26
2161
1.386
0.326

SG
TCAGGT
1034.06
1351
1.307
0.267

SG
TCGGGA
600.57
684
1.139
0.130

SG
TCGGGT
388.82
410
1.054
0.053

SG
TCTGGC
2678.70
2734
1.021
0.020

SG
TCAGGC
2165.57
2114
0.976
−0.024

SG
AGCGGC
3488.72
2475
0.709
−0.343

SG
AGCGGG
2511.96
1464
0.583
−0.540

SG
TCCGGG
2217.18
1117
0.504
−0.686

SG
TCCGGC
3079.31
1163
0.378
−0.974

SG
AGCGGT
1665.85
536
0.322
−1.134

SG
AGCGGA
2573.06
663
0.258
−1.356

SG
TCCGGA
2271.11
560
0.247
−1.400

SG
TCCGGT
1470.37
359
0.244
−1.410

SH
AGCCAC
2202.27
3210
1.458
0.377

SH
TCTCAT
1226.22
1426
1.163
0.151

SH
TCCCAC
1943.83
2233
1.149
0.139

SH
AGTCAT
1007.57
1082
1.074
0.071

SH
AGCCAT
1597.01
1606
1.006
0.006

SH
TCGCAC
514.03
512
0.996
−0.004

SH
TCCCAT
1409.60
1349
0.957
−0.044

SH
TCACAT
991.32
929
0.937
−0.065

SH
AGTCAC
1389.42
1077
0.775
−0.255

SH
TCACAC
1367.03
956
0.699
−0.358

SH
TCTCAC
1690.94
1158
0.685
−0.379

SH
TCGCAT
372.75
174
0.467
−0.762

SI
TCCATC
2374.96
4526
1.906
0.645

SI
AGCATC
2690.72
4471
1.662
0.508

SI
TCCATT
1878.09
2383
1.269
0.238

SI
AGCATT
2127.79
2384
1.120
0.114

SI
TCCATA
863.76
963
1.115
0.109

SI
AGTATA
617.40
640
1.037
0.036

SI
TCAATA
607.45
618
1.017
0.017

SI
AGTATT
1342.43
1299
0.968
−0.033

SI
AGCATA
978.60
943
0.964
−0.037

SI
TCTATA
751.38
658
0.876
−0.133

SI
TCTATT
1633.75
1215
0.744
−0.296

SI
TCAATT
1320.79
957
0.725
−0.322

SI
AGTATC
1697.59
924
0.544
−0.608

SI
TCGATA
228.41
109
0.477
−0.740

SI
TCTATC
2065.98
958
0.464
−0.769

SI
TCGATT
496.64
185
0.373
−0.988

SI
TCAATC
1670.22
557
0.333
−1.098

SI
TCGATC
628.03
184
0.293
−1.228

SK
TCCAAG
3563.99
5021
1.409
0.343

SK
TCCAAA
2751.88
3634
1.321
0.278

SK
AGCAAG
4037.83
5128
1.270
0.239

SK
AGCAAA
3117.75
3736
1.198
0.181

SK
TCAAAA
1935.30
2282
1.179
0.165

SK
AGTAAA
1967.01
2149
1.093
0.088

SK
TCAAAG
2506.42
2082
0.831
−0.186

SK
TCTAAA
2393.86
1838
0.768
−0.264

SK
TCGAAG
942.46
522
0.554
−0.591

SK
AGTAAG
2547.49
1300
0.510
−0.673

SK
TCTAAG
3100.32
1569
0.506
−0.681

SK
TCGAAA
727.71
331
0.455
−0.788

SL
AGTTTA
709.05
1103
1.556
0.442

SL
TCGCTG
1355.42
2104
1.552
0.440

SL
TCCTTG
1666.44
2462
1.477
0.390

SL
TCTTTA
862.92
1267
1.468
0.384

SL
AGCCTC
2794.39
4013
1.436
0.362

SL
TCTTTG
1449.64
2009
1.386
0.326

SL
TCATTA
697.62
862
1.236
0.212

SL
AGCCTG
5807.08
7014
1.208
0.189

SL
AGTTTG
1191.15
1427
1.198
0.181

SL
TCGCTC
652.23
777
1.191
0.175

SL
TCTCTA
797.87
950
1.191
0.175

SL
TCTCTT
1479.47
1750
1.183
0.168

SL
TCCCTG
5125.62
6034
1.177
0.163

SL
TCCCTC
2466.46
2805
1.137
0.129

SL
TCCTTA
991.98
1076
1.085
0.081

SL
AGTCTT
1215.66
1242
1.022
0.021

SL
AGCCTT
1926.85
1959
1.017
0.017

SL
TCACTA
645.03
630
0.977
−0.024

SL
AGCTTG
1888.00
1786
0.946
−0.056

SL
TCACTT
1196.06
1111
0.929
−0.074

SL
TCCCTT
1700.73
1545
0.908
−0.096

SL
TCCCTA
917.19
810
0.883
−0.124

SL
AGTCTA
655.60
569
0.868
−0.142

SL
TCATTG
1171.95
1015
0.866
−0.144

SL
AGCCTA
1039.14
875
0.842
−0.172

SL
TCTCTC
2145.58
1760
0.820
−0.198

SL
TCTCTG
4458.78
3418
0.767
−0.266

SL
AGCTTA
1123.86
758
0.674
−0.394

SL
AGTCTC
1763.00
1158
0.657
−0.420

SL
TCGTTG
440.67
280
0.635
−0.454

SL
TCACTC
1734.58
1100
0.634
−0.455

SL
TCACTG
3604.66
2254
0.625
−0.470

SL
TCGCTT
449.74
279
0.620
−0.477

SL
TCGCTA
242.54
143
0.590
−0.528

SL
TCGTTA
262.32
140
0.534
−0.628

SL
AGTCTG
3663.72
1808
0.493
−0.706

SM
TCCATG
2282.65
3908
1.712
0.538

SM
AGCATG
2586.13
3300
1.276
0.244

SM
TCAATG
1605.31
1129
0.703
−0.352

SM
TCGATG
603.62
365
0.605
−0.503

SM
AGTATG
1631.61
966
0.592
−0.524

SM
TCTATG
1985.68
1027
0.517
−0.659

SN
AGCAAC
2539.42
3717
1.464
0.381

SN
TCCAAC
2241.42
3216
1.435
0.361

SN
TCAAAT
1431.22
1883
1.316
0.274

SN
AGCAAT
2305.68
2513
1.090
0.086

SN
TCCAAT
2035.11
2000
0.983
−0.017

SN
AGTAAT
1454.67
1425
0.980
−0.021

SN
AGTAAC
1602.14
1339
0.836
−0.179

SN
TCAAAC
1576.31
1194
0.757
−0.278

SN
TCTAAT
1770.34
1297
0.733
−0.311

SN
TCTAAC
1949.81
955
0.490
−0.714

SN
TCGAAT
538.16
258
0.479
−0.735

SN
TCGAAC
592.72
240
0.405
−0.904

SP
TCGCCG
282.21
549
1.945
0.665

SP
TCGCCC
778.87
1221
1.568
0.450

SP
TCCCCG
1067.21
1621
1.519
0.418

SP
TCTCCA
2214.76
3119
1.408
0.342

SP
AGCCCC
3336.96
4654
1.395
0.333

SP
TCTCCT
2294.78
2888
1.259
0.230

SP
AGCCCG
1209.10
1432
1.184
0.169

SP
TCCCCA
2545.99
2968
1.166
0.153

SP
TCACCA
1790.50
1869
1.044
0.043

SP
AGCCCT
2988.71
3086
1.033
0.032

SP
AGTCCT
1885.59
1904
1.010
0.010

SP
TCACCT
1855.20
1752
0.944
−0.057

SP
AGCCCA
2884.48
2607
0.904
−0.101

SP
TCCCCT
2637.98
2238
0.848
−0.164

SP
AGTCCA
1819.84
1473
0.809
−0.211

SP
TCGCCT
697.59
562
0.806
−0.216

SP
TCGCCA
673.26
541
0.804
−0.219

SP
TCTCCC
2562.18
2036
0.795
−0.230

SP
TCACCC
2071.37
1568
0.757
−0.278

SP
AGTCCC
2105.31
1534
0.729
−0.317

SP
TCTCCG
928.37
664
0.715
−0.335

SP
TCCCCC
2945.37
2058
0.699
−0.358

SP
TCACCG
750.53
426
0.568
−0.566

SP
AGTCCG
762.83
319
0.418
−0.872

SQ
TCCCAG
4427.95
5592
1.263
0.233

SQ
AGCCAG
5016.65
6041
1.204
0.186

SQ
TCTCAA
1379.40
1644
1.192
0.175

SQ
AGTCAA
1133.44
1293
1.141
0.132

SQ
TCACAA
1115.16
1196
1.072
0.070

SQ
AGCCAA
1796.52
1819
1.013
0.012

SQ
TCCCAA
1585.70
1474
0.930
−0.073

SQ
TCTCAG
3851.88
3430
0.890
−0.116

SQ
TCGCAG
1170.92
1015
0.867
−0.143

SQ
TCACAG
3114.02
2271
0.729
−0.316

SQ
AGTCAG
3165.04
2215
0.700
−0.357

SQ
TCGCAA
419.32
186
0.444
−0.813

SR
AGCCGC
1540.23
2828
1.836
0.608

SR
TCCAGG
1472.14
2309
1.568
0.450

SR
AGCCGG
1684.56
2353
1.397
0.334

SR
TCCCGG
1486.87
1976
1.329
0.284

SR
AGCAGG
1667.87
2186
1.311
0.271

SR
AGCCGT
659.43
857
1.300
0.262

SR
TCGCGC
359.50
446
1.241
0.216

SR
TCCAGA
1499.54
1850
1.234
0.210

SR
TCAAGA
1054.57
1294
1.227
0.205

SR
TCGCGG
393.19
481
1.223
0.202

SR
TCCCGC
1359.49
1605
1.181
0.166

SR
TCTCGA
701.14
826
1.178
0.164

SR
AGTCGT
416.04
484
1.163
0.151

SR
TCCCGA
806.00
937
1.163
0.151

SR
AGCAGA
1698.90
1925
1.133
0.125

SR
AGCCGA
913.16
1020
1.117
0.111

SR
TCTCGT
506.32
493
0.974
−0.027

SR
AGTCGA
576.12
553
0.960
−0.041

SR
TCCCGT
582.04
553
0.950
−0.051

SR
TCAAGG
1035.31
922
0.891
−0.116

SR
TCGAGG
389.29
324
0.832
−0.184

SR
TCTCGG
1293.43
1062
0.821
−0.197

SR
TCACGT
409.33
323
0.789
−0.237

SR
AGTAGA
1071.85
746
0.696
−0.362

SR
TCGCGT
153.92
102
0.663
−0.411

SR
AGTCGG
1062.80
675
0.635
−0.454

SR
AGTCGC
971.74
591
0.608
−0.497

SR
TCACGA
566.83
344
0.607
−0.499

SR
TCGAGA
396.54
240
0.605
−0.502

SR
TCTAGA
1304.45
750
0.575
−0.553

SR
TCGCGA
213.14
115
0.540
−0.617

SR
TCTCGC
1182.62
636
0.538
−0.620

SR
TCACGG
1045.66
534
0.511
−0.672

SR
TCTAGG
1280.62
574
0.448
−0.802

SR
TCACGC
956.08
406
0.425
−0.856

SR
AGTAGG
1052.27
443
0.421
−0.865

SS
AGCAGC
3919.72
7160
1.827
0.602

SS
TCGTCG
213.54
376
1.761
0.566

SS
TCCTCG
807.53
1302
1.612
0.478

SS
TCCAGC
3459.74
4832
1.397
0.334

SS
TCTTCA
1868.19
2596
1.390
0.329

SS
AGCAGT
2472.97
3417
1.382
0.323

SS
TCCTCC
3053.74
4162
1.363
0.310

SS
TCTTCT
2310.85
2896
1.253
0.226

SS
TCCAGT
2182.77
2691
1.233
0.209

SS
TCATCA
1510.32
1795
1.188
0.173

SS
AGCTCC
3459.74
4024
1.163
0.151

SS
TCATCT
1868.19
2118
1.134
0.126

SS
TCCTCA
2147.58
2413
1.124
0.117

SS
AGCTCG
914.89
1001
1.094
0.090

SS
TCCTCT
2656.45
2744
1.033
0.032

SS
TCGTCC
807.53
818
1.013
0.013

SS
TCTTCC
2656.45
2600
0.979
−0.021

SS
AGTTCT
1898.79
1856
0.977
−0.023

SS
AGTTCA
1535.06
1498
0.976
−0.024

SS
TCAAGT
1535.06
1404
0.915
−0.089

SS
AGCTCA
2433.11
2075
0.853
−0.159

SS
AGCTCT
3009.63
2465
0.819
−0.200

SS
TCTTCG
702.47
556
0.791
−0.234

SS
TCATCC
2147.58
1632
0.760
−0.275

SS
AGTAGT
1560.21
1030
0.660
−0.415

SS
AGTTCC
2182.77
1405
0.644
−0.441

SS
TCGTCT
702.47
434
0.618
−0.482

SS
TCATCG
567.91
343
0.604
−0.504

SS
TCGTCA
567.91
313
0.551
−0.596

SS
TCTAGT
1898.79
957
0.504
−0.685

SS
TCGAGC
914.89
440
0.481
−0.732

SS
AGTAGC
2472.97
1158
0.468
−0.759

SS
TCAAGC
2433.11
1117
0.459
−0.779

SS
TCGAGT
577.21
259
0.449
−0.801

SS
AGTTCG
577.21
251
0.435
−0.833

SS
TCTAGC
3009.63
899
0.299
−1.208

ST
TCCACG
785.52
1434
1.826
0.602

ST
AGCACC
2709.18
4149
1.531
0.426

ST
TCCACC
2391.25
3527
1.475
0.389

ST
AGCACG
889.95
1180
1.326
0.282

ST
AGCACA
2193.18
2692
1.227
0.205

ST
TCCACA
1935.81
2329
1.203
0.185

ST
TCCACT
1711.89
1937
1.131
0.124

ST
AGCACT
1939.49
2193
1.131
0.123

ST
TCAACA
1361.39
1485
1.091
0.087

ST
TCAACT
1203.91
1270
1.055
0.053

ST
TCTACT
1489.18
1390
0.933
−0.069

ST
TCTACA
1683.97
1461
0.868
−0.142

ST
AGTACT
1223.64
1036
0.847
−0.166

ST
AGTACA
1383.69
1061
0.767
−0.266

ST
TCGACG
207.72
145
0.698
−0.359

ST
TCTACC
2080.15
1218
0.586
−0.535

ST
TCGACC
632.34
365
0.577
−0.550

ST
AGTACC
1709.24
976
0.571
−0.560

ST
TCGACT
452.69
240
0.530
−0.635

ST
TCAACC
1681.68
873
0.519
−0.656

ST
TCAACG
552.43
275
0.498
−0.698

ST
TCGACA
511.90
236
0.461
−0.774

ST
TCTACG
683.32
302
0.442
−0.817

ST
AGTACG
561.48
201
0.358
−1.027

SV
TCGGTG
935.47
1822
1.948
0.667

SV
TCTGTA
788.92
1398
1.772
0.572

SV
TCTGTT
1214.96
2136
1.758
0.564

SV
TCAGTA
637.79
1121
1.758
0.564

SV
AGTGTT
998.32
1719
1.722
0.543

SV
TCAGTT
982.23
1591
1.620
0.482

SV
TCTGTC
1555.54
2367
1.522
0.420

SV
AGTGTC
1278.17
1943
1.520
0.419

SV
TCTGTG
3077.33
4672
1.518
0.418

SV
AGTGTA
648.24
976
1.506
0.409

SV
TCGGTC
472.87
683
1.444
0.368

SV
TCAGTG
2487.84
2925
1.176
0.162

SV
AGTGTG
2528.60
2901
1.147
0.137

SV
TCAGTC
1257.56
1351
1.074
0.072

SV
TCGGTA
239.82
231
0.963
−0.037

SV
TCGGTT
369.33
266
0.720
−0.328

SV
AGCGTC
2025.93
1298
0.641
−0.445

SV
TCCGTG
3537.57
2065
0.584
−0.538

SV
AGCGTG
4007.89
2221
0.554
−0.590

SV
TCCGTC
1788.18
829
0.464
−0.769

SV
AGCGTT
1582.36
446
0.282
−1.266

SV
TCCGTA
906.91
239
0.264
−1.334

SV
TCCGTT
1396.67
329
0.236
−1.446

SV
AGCGTA
1027.48
217
0.211
−1.555

SW
TCCTGG
1756.97
2825
1.608
0.475

SW
AGCTGG
1990.56
2404
1.208
0.189

SW
TCGTGG
464.61
444
0.956
−0.045

SW
TCTTGG
1528.39
1137
0.744
−0.296

SW
TCATGG
1235.61
778
0.630
−0.463

SW
AGTTGG
1255.86
644
0.513
−0.668

SY
TCCTAC
1871.53
3038
1.623
0.484

SY
AGCTAC
2120.35
2864
1.351
0.301

SY
TCCTAT
1520.75
1869
1.229
0.206

SY
AGCTAT
1722.94
1609
0.934
−0.068

SY
AGTTAT
1087.01
1010
0.929
−0.073

SY
AGTTAC
1337.74
1153
0.862
−0.149

SY
TCATAT
1069.49
897
0.839
−0.176

SY
TCTTAT
1322.91
1100
0.832
−0.185

SY
TCTTAC
1628.04
1204
0.740
−0.302

SY
TCGTAC
494.91
304
0.614
−0.487

SY
TCGTAT
402.15
204
0.507
−0.679

SY
TCATAC
1316.18
642
0.488
−0.718

TA
ACGGCG
348.71
734
2.105
0.744

TA
ACAGCA
1829.79
3283
1.794
0.585

TA
ACGGCC
1289.71
2090
1.621
0.483

TA
ACTGCA
1618.13
2557
1.580
0.458

TA
ACAGCT
2090.24
3295
1.576
0.455

TA
ACTGCT
1848.45
2764
1.495
0.402

TA
ACAGCC
3178.34
3912
1.231
0.208

TA
ACGGCA
742.49
804
1.083
0.080

TA
ACTGCC
2810.69
3015
1.073
0.070

TA
ACGGCT
848.18
804
0.948
−0.053

TA
ACAGCG
859.36
803
0.934
−0.068

TA
ACTGCG
759.96
623
0.820
−0.199

TA
ACCGCG
1061.55
584
0.550
−0.598

TA
ACCGCC
3926.11
1648
0.420
−0.868

TA
ACCGCA
2260.29
561
0.248
−1.394

TA
ACCGCT
2582.01
577
0.223
−1.498

TC
ACCTGC
1892.82
3247
1.715
0.540

TC
ACCTGT
1594.30
1994
1.251
0.224

TC
ACGTGC
621.78
691
1.111
0.106

TC
ACGTGT
523.72
484
0.924
−0.079

TC
ACTTGT
1141.35
1033
0.905
−0.100

TC
ACATGT
1290.64
938
0.727
−0.319

TC
ACTTGC
1355.07
815
0.601
−0.508

TC
ACATGC
1532.31
750
0.489
−0.714

TD
ACAGAT
2415.25
4195
1.737
0.552

TD
ACAGAC
2728.31
3765
1.380
0.322

TD
ACTGAT
2135.87
2913
1.364
0.310

TD
ACGGAC
1107.10
1446
1.306
0.267

TD
ACTGAC
2412.71
2615
1.084
0.081

TD
ACGGAT
980.07
922
0.941
−0.061

TD
ACCGAC
3370.20
1547
0.459
−0.779

TD
ACCGAT
2983.49
730
0.245
−1.408

TE
ACAGAA
3127.33
5307
1.697
0.529

TE
ACGGAG
1697.07
2517
1.483
0.394

TE
ACTGAA
2765.58
4093
1.480
0.392

TE
ACAGAG
4182.23
5419
1.296
0.259

TE
ACTGAG
3698.46
4124
1.115
0.109

TE
ACGGAA
1269.01
1080
0.851
−0.161

TE
ACCGAG
5166.20
2450
0.474
−0.746

TE
ACCGAA
3863.10
779
0.202
−1.601

TF
ACCTTC
3026.54
4955
1.637
0.493

TF
ACATTT
2140.61
2275
1.063
0.061

TF
ACTTTT
1893.00
1904
1.006
0.006

TF
ACCTTT
2644.23
2518
0.952
−0.049

TF
ACTTTC
2166.69
1822
0.841
−0.173

TF
ACGTTT
868.62
650
0.748
−0.290

TF
ACGTTC
994.21
666
0.670
−0.401

TF
ACATTC
2450.10
1394
0.569
−0.564

TG
ACTGGA
1710.74
3660
2.139
0.761

TG
ACTGGT
1107.57
1887
1.704
0.533

TG
ACAGGA
1934.51
2970
1.535
0.429

TG
ACGGGC
1064.34
1583
1.487
0.397

TG
ACTGGG
1670.12
2322
1.390
0.330

TG
ACGGGG
766.35
1049
1.369
0.314

TG
ACAGGT
1252.44
1694
1.353
0.302

TG
ACAGGG
1888.57
2148
1.137
0.129

TG
ACTGGC
2319.53
2620
1.130
0.122

TG
ACAGGC
2622.93
2664
1.016
0.016

TG
ACGGGT
508.22
484
0.952
−0.049

TG
ACGGGA
784.99
710
0.904
−0.100

TG
ACCGGG
2332.90
1093
0.469
−0.758

TG
ACCGGC
3240.03
1373
0.424
−0.859

TG
ACCGGT
1547.11
355
0.229
−1.472

TG
ACCGGA
2389.65
528
0.221
−1.510

TH
ACTCAT
1054.95
1291
1.224
0.202

TH
ACCCAC
2032.09
2408
1.185
0.170

TH
ACGCAC
667.53
764
1.145
0.135

TH
ACACAT
1192.94
1186
0.994
−0.006

TH
ACTCAC
1454.76
1384
0.951
−0.050

TH
ACCCAT
1473.60
1287
0.873
−0.135

TH
ACACAC
1645.05
1383
0.841
−0.174

TH
ACGCAT
484.07
302
0.624
−0.472

TI
ACCATC
2842.70
5915
2.081
0.733

TI
ACCATT
2247.97
2878
1.280
0.247

TI
ACAATA
836.96
980
1.171
0.158

TI
ACCATA
1033.87
1137
1.100
0.095

TI
ACAATT
1819.82
1579
0.868
−0.142

TI
ACTATA
740.14
642
0.867
−0.142

TI
ACTATT
1609.31
1337
0.831
−0.185

TI
ACGATA
339.62
190
0.559
−0.581

TI
ACGATT
738.45
389
0.527
−0.641

TI
ACGATC
933.81
463
0.496
−0.702

TI
ACTATC
2035.08
942
0.463
−0.770

TI
ACAATC
2301.27
1027
0.446
−0.807

TK
ACCAAG
3878.56
6678
1.722
0.543

TK
ACCAAA
2994.77
3789
1.265
0.235

TK
ACAAAA
2424.38
2546
1.050
0.049

TK
ACAAAG
3139.84
2507
0.798
−0.225

TK
ACTAAA
2143.95
1684
0.785
−0.241

TK
ACGAAG
1274.09
708
0.556
−0.588

TK
ACGAAA
983.77
511
0.519
−0.655

TK
ACTAAG
2776.65
1193
0.430
−0.845

TL
ACGCTG
1815.48
3357
1.849
0.615

TL
ACTTTA
765.72
1207
1.576
0.455

TL
ACTTTG
1286.34
1876
1.458
0.377

TL
ACATTA
865.87
1115
1.288
0.253

TL
ACCTTG
1796.82
2257
1.256
0.228

TL
ACTCTA
707.99
876
1.237
0.213

TL
ACGCTC
873.61
1057
1.210
0.191

TL
ACCCTC
2659.44
3133
1.178
0.164

TL
ACCCTG
5526.65
6354
1.150
0.140

TL
ACTCTT
1312.81
1469
1.119
0.112

TL
ACACTA
800.60
799
0.998
−0.002

TL
ACGCTA
324.87
307
0.945
−0.057

TL
ACCTTA
1069.59
957
0.895
−0.111

TL
ACACTT
1484.53
1316
0.886
−0.121

TL
ACGTTG
590.25
505
0.856
−0.156

TL
ACATTG
1454.60
1210
0.832
−0.184

TL
ACCCTT
1833.80
1515
0.826
−0.191

TL
ACCCTA
988.95
802
0.811
−0.210

TL
ACTCTG
3956.51
3120
0.789
−0.238

TL
ACGTTA
351.36
262
0.746
−0.293

TL
ACTCTC
1903.88
1391
0.731
−0.314

TL
ACGCTT
602.39
427
0.709
−0.344

TL
ACACTG
4474.03
3013
0.673
−0.395

TL
ACACTC
2152.92
1274
0.592
−0.525

TM
ACCATG
2733.42
4467
1.634
0.491

TM
ACAATG
2212.81
1641
0.742
−0.299

TM
ACGATG
897.92
655
0.729
−0.315

TM
ACTATG
1956.85
1038
0.530
−0.634

TN
ACCAAC
2378.62
4300
1.808
0.592

TN
ACAAAT
1748.34
2194
1.255
0.227

TN
ACCAAT
2159.68
2454
1.136
0.128

TN
ACAAAC
1925.59
1486
0.772
−0.259

TN
ACTAAT
1546.11
1077
0.697
−0.362

TN
ACGAAT
709.45
336
0.474
−0.747

TN
ACTAAC
1702.85
789
0.463
−0.769

TN
ACGAAC
781.37
316
0.404
−0.905

TP
ACGCCG
349.03
632
1.811
0.594

TP
ACGCCC
963.29
1491
1.548
0.437

TP
ACTCCA
1814.66
2359
1.300
0.262

TP
ACCCCG
1062.52
1331
1.253
0.225

TP
ACTCCT
1880.23
2186
1.163
0.151

TP
ACACCA
2052.02
2361
1.151
0.140

TP
ACCCCA
2534.80
2784
1.098
0.094

TP
ACACCT
2126.17
2104
0.990
−0.010

TP
ACCCCT
2626.39
2415
0.920
−0.084

TP
ACGCCA
832.67
748
0.898
−0.107

TP
ACCCCC
2932.43
2380
0.812
−0.209

TP
ACACCC
2373.91
1922
0.810
−0.211

TP
ACGCCT
862.76
697
0.808
−0.213

TP
ACTCCC
2099.31
1649
0.785
−0.241

TP
ACTCCG
760.66
538
0.707
−0.346

TP
ACACCG
860.15
534
0.621
−0.477

TQ
ACTCAA
1103.35
1368
1.240
0.215

TQ
ACCCAG
4303.71
5173
1.202
0.184

TQ
ACGCAG
1413.75
1518
1.074
0.071

TQ
ACACAA
1247.67
1328
1.064
0.062

TQ
ACTCAG
3081.01
2839
0.921
−0.082

TQ
ACCCAA
1541.21
1410
0.915
−0.089

TQ
ACACAG
3484.02
2765
0.794
−0.231

TQ
ACGCAA
506.28
280
0.553
−0.592

TR
ACCAGG
1331.08
2049
1.539
0.431

TR
ACGCGC
403.79
605
1.498
0.404

TR
ACGCGG
441.63
661
1.497
0.403

TR
ACTCGA
521.72
717
1.374
0.318

TR
ACAAGA
1097.61
1429
1.302
0.264

TR
ACCCGC
1229.22
1547
1.259
0.230

TR
ACCCGG
1344.40
1668
1.241
0.216

TR
ACTCGT
376.76
448
1.189
0.173

TR
ACCAGA
1355.85
1599
1.179
0.165

TR
ACCCGA
728.77
758
1.040
0.039

TR
ACCCGT
526.27
535
1.017
0.016

TR
ACAAGG
1077.56
1072
0.995
−0.005

TR
ACGAGG
437.25
433
0.990
−0.010

TR
ACTCGG
962.45
823
0.855
−0.157

TR
ACGCGT
172.88
141
0.816
−0.204

TR
ACACGT
426.04
329
0.772
−0.258

TR
ACGAGA
445.39
331
0.743
−0.297

TR
ACACGA
589.97
432
0.732
−0.312

TR
ACACGG
1088.34
756
0.695
−0.364

TR
ACTCGC
879.99
607
0.690
−0.371

TR
ACTAGA
970.65
624
0.643
−0.442

TR
ACGCGA
239.40
150
0.627
−0.468

TR
ACACGC
995.10
498
0.500
−0.692

TR
ACTAGG
952.91
383
0.402
−0.911

TS
ACCAGC
2807.29
4575
1.630
0.488

TS
ACCTCG
655.24
1060
1.618
0.481

TS
ACGTCG
215.24
348
1.617
0.480

TS
ACTTCA
1247.51
1844
1.478
0.391

TS
ACTTCT
1543.11
1974
1.279
0.246

TS
ACATCA
1410.69
1754
1.243
0.218

TS
ACCAGT
1771.14
2194
1.239
0.214

TS
ACCTCC
2477.85
3050
1.231
0.208

TS
ACCTCA
1742.59
1938
1.112
0.106

TS
ACATCT
1744.95
1911
1.095
0.091

TS
ACGTCC
813.96
840
1.032
0.031

TS
ACCTCT
2155.49
2072
0.961
−0.040

TS
ACAAGT
1433.80
1335
0.931
−0.071

TS
ACTTCC
1773.89
1524
0.859
−0.152

TS
ACGTCA
572.43
450
0.786
−0.241

TS
ACATCC
2005.92
1570
0.783
−0.245

TS
ACTTCG
469.09
353
0.753
−0.284

TS
ACGTCT
708.07
527
0.744
−0.295

TS
ACATCG
530.44
361
0.681
−0.385

TS
ACTAGT
1267.95
725
0.572
−0.559

TS
ACAAGC
2272.61
1275
0.561
−0.578

TS
ACGAGT
581.81
297
0.510
−0.672

TS
ACGAGC
922.18
469
0.509
−0.676

TS
ACTAGC
2009.73
687
0.342
−1.073

TT
ACCACG
875.88
1567
1.789
0.582

TT
ACCACC
2666.32
4767
1.788
0.581

TT
ACCACA
2158.49
2882
1.335
0.289

TT
ACCACT
1908.81
2309
1.210
0.190

TT
ACAACA
1747.38
1793
1.026
0.026

TT
ACAACT
1545.26
1567
1.014
0.014

TT
ACGACG
287.72
252
0.876
−0.133

TT
ACTACT
1366.51
1065
0.779
−0.249

TT
ACTACA
1545.26
1196
0.774
−0.256

TT
ACGACC
875.88
575
0.656
−0.421

TT
ACGACA
709.06
437
0.616
−0.484

TT
ACAACC
2158.49
1310
0.607
−0.499

TT
ACGACT
627.04
357
0.569
−0.563

TT
ACTACC
1908.81
992
0.520
−0.655

TT
ACAACG
709.06
365
0.515
−0.664

TT
ACTACG
627.04
283
0.451
−0.796

TV
ACTGTA
845.20
1425
1.686
0.522

TV
ACTGTT
1301.64
2058
1.581
0.458

TV
ACGGTG
1512.80
2306
1.524
0.422

TV
ACAGTA
955.76
1371
1.434
0.361

TV
ACTGTC
1666.51
2289
1.374
0.317

TV
ACAGTT
1471.90
2019
1.372
0.316

TV
ACTGTG
3296.87
4505
1.366
0.312

TV
ACGGTC
764.70
911
1.191
0.175

TV
ACAGTG
3728.11
4108
1.102
0.097

TV
ACAGTC
1884.50
1933
1.026
0.025

TV
ACGGTA
387.83
286
0.737
−0.305

TV
ACGGTT
597.27
415
0.695
−0.364

TV
ACCGTG
4605.23
2640
0.573
−0.556

TV
ACCGTC
2327.87
1285
0.552
−0.594

TV
ACCGTT
1818.19
496
0.273
−1.299

TV
ACCGTA
1180.62
298
0.252
−1.377

TW
ACGTGG
606.25
837
1.381
0.323

TW
ACCTGG
1845.52
2403
1.302
0.264

TW
ACATGG
1494.02
1089
0.729
−0.316

TW
ACTTGG
1321.21
938
0.710
−0.343

TY
ACCTAC
2130.11
3648
1.713
0.538

TY
ACCTAT
1730.88
1778
1.027
0.027

TY
ACTTAC
1524.94
1383
0.907
−0.098

TY
ACGTAC
699.73
621
0.887
−0.119

TY
ACATAT
1401.21
1136
0.811
−0.210

TY
ACTTAT
1239.13
907
0.732
−0.312

TY
ACGTAT
568.59
408
0.718
−0.332

TY
ACATAC
1724.41
1138
0.660
−0.416

VA
GTGGCC
6082.92
9316
1.532
0.426

VA
GTAGCA
897.78
1347
1.500
0.406

VA
GTTGCT
1579.41
2217
1.404
0.339

VA
GTAGCT
1025.57
1407
1.372
0.316

VA
GTGGCT
4000.44
5252
1.313
0.272

VA
GTGGCG
1644.71
2099
1.276
0.244

VA
GTTGCA
1382.62
1728
1.250
0.223

VA
GTGGCA
3501.98
3859
1.102
0.097

VA
GTAGCC
1559.44
1363
0.874
−0.135

VA
GTTGCC
2401.60
1808
0.753
−0.284

VA
GTAGCG
421.64
216
0.512
−0.669

VA
GTTGCG
649.35
234
0.360
−1.021

VA
GTCGCG
831.37
284
0.342
−1.074

VA
GTCGCC
3074.82
992
0.323
−1.131

VA
GTCGCT
2022.16
406
0.201
−1.606

VA
GTCGCA
1770.19
318
0.180
−1.717

VC
GTCTGC
1410.66
2160
1.531
0.426

VC
GTCTGT
1188.18
1572
1.323
0.280

VC
GTTTGT
928.03
942
1.015
0.015

VC
GTATGT
602.60
594
0.986
−0.014

VC
GTGTGC
2790.71
2583
0.926
−0.077

VC
GTGTGT
2350.57
1996
0.849
−0.164

VC
GTTTGC
1101.80
830
0.753
−0.283

VC
GTATGC
715.44
411
0.574
−0.554

VD
GTAGAT
1225.65
1924
1.570
0.451

VD
GTGGAC
5400.58
7734
1.432
0.359

VD
GTTGAT
1887.55
2389
1.266
0.236

VD
GTGGAT
4780.91
5727
1.198
0.181

VD
GTAGAC
1384.52
1346
0.972
−0.028

VD
GTTGAC
2132.21
1791
0.840
−0.174

VD
GTCGAC
2729.91
602
0.221
−1.512

VD
GTCGAT
2416.67
445
0.184
−1.692

VE
GTAGAA
1456.83
2855
1.960
0.673

VE
GTGGAG
7599.48
11579
1.524
0.421

VE
GTTGAA
2243.56
2905
1.295
0.258

VE
GTGGAA
5682.64
6229
1.096
0.092

VE
GTAGAG
1948.24
2002
1.028
0.027

VE
GTTGAG
3000.36
1987
0.662
−0.412

VE
GTCGAG
3841.42
721
0.188
−1.673

VE
GTCGAA
2872.48
367
0.128
−2.058

VF
GTCTTC
2309.08
4216
1.826
0.602

VF
GTATTT
1023.16
1512
1.478
0.391

VF
GTCTTT
2017.40
2238
1.109
0.104

VF
GTTTTT
1575.70
1706
1.083
0.079

VF
GTTTTC
1803.52
1604
0.889
−0.117

VF
GTGTTT
3991.02
3257
0.816
−0.203

VF
GTGTTC
4568.05
3205
0.702
−0.354

VF
GTATTC
1171.09
721
0.616
−0.485

VG
GTTGGT
779.74
1617
2.074
0.729

VG
GTTGGA
1204.37
2315
1.922
0.653

VG
GTGGGC
4136.07
5977
1.445
0.368

VG
GTAGGA
782.04
1089
1.393
0.331

VG
GTTGGG
1175.77
1510
1.284
0.250

VG
GTTGGC
1632.96
1794
1.099
0.094

VG
GTAGGT
506.31
554
1.094
0.090

VG
GTGGGG
2978.07
3255
1.093
0.089

VG
GTGGGT
1974.96
2009
1.017
0.017

VG
GTAGGG
763.47
683
0.895
−0.111

VG
GTGGGA
3050.51
2599
0.852
−0.160

VG
GTAGGC
1060.34
676
0.638
−0.450

VG
GTCGGG
1505.36
734
0.488
−0.718

VG
GTCGGC
2090.72
734
0.351
−1.047

VG
GTCGGT
998.31
292
0.292
−1.229

VG
GTCGGA
1541.98
343
0.222
−1.503

VH
GTTCAT
911.79
1418
1.555
0.442

VH
GTACAT
592.06
773
1.306
0.267

VH
GTCCAC
1609.82
2085
1.295
0.259

VH
GTCCAT
1167.39
1313
1.125
0.118

VH
GTTCAC
1257.35
1319
1.049
0.048

VH
GTGCAC
3184.70
2856
0.897
−0.109

VH
GTACAC
816.44
613
0.751
−0.287

VH
GTGCAT
2309.44
1472
0.637
−0.450

VI
GTCATC
2367.78
5207
2.199
0.788

VI
GTCATT
1872.41
2827
1.510
0.412

VI
GTAATA
436.74
614
1.406
0.341

VI
GTAATT
949.63
1074
1.131
0.123

VI
GTTATT
1462.46
1595
1.091
0.087

VI
GTCATA
861.15
904
1.050
0.049

VI
GTTATA
672.60
702
1.044
0.043

VI
GTGATT
3704.20
2742
0.740
−0.301

VI
GTGATC
4684.19
3353
0.716
−0.334

VI
GTGATA
1703.61
1117
0.656
−0.422

VI
GTTATC
1849.37
1053
0.569
−0.563

VI
GTAATC
1200.86
577
0.480
−0.733

VK
GTAAAA
1288.46
1945
1.510
0.412

VK
GTCAAG
3290.24
3982
1.210
0.191

VK
GTGAAG
6509.08
7513
1.154
0.143

VK
GTAAAG
1668.70
1704
1.021
0.021

VK
GTCAAA
2540.51
2376
0.935
−0.067

VK
GTTAAA
1984.27
1777
0.896
−0.110

VK
GTGAAA
5025.89
4409
0.877
−0.131

VK
GTTAAG
2569.85
1171
0.456
−0.786

VL
GTTTTA
668.83
1311
1.960
0.673

VL
GTTCTT
1146.70
1859
1.621
0.483

VL
GTTTTG
1123.58
1737
1.546
0.436

VL
GTATTA
434.30
646
1.487
0.397

VL
GTCCTC
2129.16
3019
1.418
0.349

VL
GTTCTA
618.41
832
1.345
0.297

VL
GTCCTG
4424.65
5574
1.260
0.231

VL
GTCCTT
1468.14
1722
1.173
0.159

VL
GTGCTG
8753.31
10107
1.155
0.144

VL
GTCTTG
1438.54
1628
1.132
0.124

VL
GTACTA
401.55
447
1.113
0.107

VL
GTCCTA
791.76
874
1.104
0.099

VL
GTCTTA
856.32
863
1.008
0.008

VL
GTATTG
729.58
711
0.975
−0.026

VL
GTACTT
744.59
693
0.931
−0.072

VL
GTTCTC
1662.99
1501
0.903
−0.102

VL
GTGCTC
4212.12
3765
0.894
−0.112

VL
GTGCTA
1566.34
1286
0.821
−0.197

VL
GTTCTG
3455.90
2350
0.680
−0.386

VL
GTGTTG
2845.87
1910
0.671
−0.399

VL
GTGCTT
2904.43
1933
0.666
−0.407

VL
GTGTTA
1694.06
965
0.570
−0.563

VL
GTACTC
1079.84
541
0.501
−0.691

VL
GTACTG
2244.04
1121
0.500
−0.694

VM
GTCATG
2149.52
3308
1.539
0.431

VM
GTGATG
4252.41
3872
0.911
−0.094

VM
GTAATG
1090.17
935
0.858
−0.154

VM
GTTATG
1678.90
1056
0.629
−0.464

VN
GTCAAC
2052.00
3311
1.614
0.478

VN
GTAAAT
944.92
1518
1.606
0.474

VN
GTCAAT
1863.13
2155
1.157
0.146

VN
GTTAAT
1455.20
1325
0.911
−0.094

VN
GTGAAC
4059.49
3551
0.875
−0.134

VN
GTGAAT
3685.83
3110
0.844
−0.170

VN
GTAAAC
1040.71
854
0.821
−0.198

VN
GTTAAC
1602.73
880
0.549
−0.600

VP
GTTCCT
1434.04
2257
1.574
0.454

VP
GTTCCA
1384.03
1911
1.381
0.323

VP
GTGCCC
4055.45
4998
1.232
0.209

VP
GTACCT
931.17
1048
1.125
0.118

VP
GTCCCC
2049.96
2260
1.102
0.098

VP
GTCCCT
1836.02
2014
1.097
0.093

VP
GTACCA
898.70
963
1.072
0.069

VP
GTCCCG
742.77
786
1.058
0.057

VP
GTTCCC
1601.13
1506
0.941
−0.061

VP
GTCCCA
1772.00
1596
0.901
−0.105

VP
GTGCCT
3632.21
3062
0.843
−0.171

VP
GTGCCG
1469.43
1228
0.836
−0.179

VP
GTACCC
1039.67
809
0.778
−0.251

VP
GTGCCA
3505.55
2431
0.693
−0.366

VP
GTTCCG
580.15
279
0.481
−0.732

VP
GTACCG
376.71
161
0.427
−0.850

VQ
GTACAA
633.37
1049
1.656
0.505

VQ
GTTCAA
975.42
1485
1.522
0.420

VQ
GTCCAG
3487.32
3907
1.120
0.114

VQ
GTACAG
1768.65
1752
0.991
−0.009

VQ
GTTCAG
2723.79
2689
0.987
−0.013

VQ
GTGCAG
6898.98
6734
0.976
−0.024

VQ
GTCCAA
1248.85
1067
0.854
−0.157

VQ
GTGCAA
2470.60
1524
0.617
−0.483

VR
GTTCGA
463.33
867
1.871
0.627

VR
GTTCGT
334.59
580
1.733
0.550

VR
GTCCGA
593.21
805
1.357
0.305

VR
GTCCGC
1000.57
1332
1.331
0.286

VR
GTGCGC
1979.43
2543
1.285
0.251

VR
GTCCGT
428.38
549
1.282
0.248

VR
GTCCGG
1094.32
1346
1.230
0.207

VR
GTACGA
300.86
361
1.200
0.182

VR
GTAAGA
559.73
660
1.179
0.165

VR
GTGCGG
2164.91
2552
1.179
0.164

VR
GTCAGA
1103.65
1291
1.170
0.157

VR
GTACGT
217.26
253
1.165
0.152

VR
GTCAGG
1083.48
1238
1.143
0.133

VR
GTGAGG
2143.46
1986
0.927
−0.076

VR
GTGCGT
847.46
761
0.898
−0.108

VR
GTAAGG
549.51
444
0.808
−0.213

VR
GTTCGG
854.73
650
0.760
−0.274

VR
GTGCGA
1173.55
826
0.704
−0.351

VR
GTTCGC
781.50
545
0.697
−0.360

VR
GTGAGA
2183.35
1511
0.692
−0.368

VR
GTACGG
555.00
377
0.679
−0.387

VR
GTTAGA
862.01
556
0.645
−0.438

VR
GTACGC
507.46
286
0.564
−0.573

VR
GTTAGG
846.26
309
0.365
−1.007

VS
GTTTCT
1206.81
2161
1.791
0.583

VS
GTCTCC
1776.18
2936
1.653
0.503

VS
GTCAGC
2012.32
3223
1.602
0.471

VS
GTTTCA
975.63
1465
1.502
0.407

VS
GTCAGT
1269.59
1841
1.450
0.372

VS
GTATCT
783.62
1093
1.395
0.333

VS
GTATCA
633.51
806
1.272
0.241

VS
GTCTCT
1545.10
1847
1.195
0.178

VS
GTTTCC
1387.29
1604
1.156
0.145

VS
GTCTCG
469.69
542
1.154
0.143

VS
GTCTCA
1249.12
1333
1.067
0.065

VS
GTGTCC
3513.81
3722
1.059
0.058

VS
GTGTCG
929.19
860
0.926
−0.077

VS
GTGTCT
3056.67
2784
0.911
−0.093

VS
GTATCC
900.82
763
0.847
−0.166

VS
GTAAGT
643.89
499
0.775
−0.255

VS
GTGAGC
3980.98
2901
0.729
−0.316

VS
GTGTCA
2471.14
1710
0.692
−0.368

VS
GTTAGT
991.62
640
0.645
−0.438

VS
GTATCG
238.21
138
0.579
−0.546

VS
GTTTCG
366.85
202
0.551
−0.597

VS
GTGAGT
2511.63
1371
0.546
−0.605

VS
GTAAGC
1020.58
514
0.504
−0.686

VS
GTTAGC
1571.73
551
0.351
−1.048

VT
GTCACC
2294.69
4477
1.951
0.668

VT
GTCACT
1642.76
2452
1.493
0.401

VT
GTCACG
753.80
997
1.323
0.280

VT
GTAACT
833.15
1046
1.255
0.228

VT
GTCACA
1857.64
2207
1.188
0.172

VT
GTAACA
942.13
1096
1.163
0.151

VT
GTTACT
1283.09
1208
0.941
−0.060

VT
GTGACC
4539.59
4223
0.930
−0.072

VT
GTGACG
1491.24
1318
0.884
−0.123

VT
GTGACT
3249.88
2758
0.849
−0.164

VT
GTGACA
3674.98
2947
0.802
−0.221

VT
GTTACA
1450.92
1111
0.766
−0.267

VT
GTAACC
1163.79
758
0.651
−0.429

VT
GTTACC
1792.28
969
0.541
−0.615

VT
GTAACG
382.30
191
0.500
−0.694

VT
GTTACG
588.76
183
0.311
−1.169

VV
GTTGTA
655.54
1109
1.692
0.526

VV
GTTGTT
1009.55
1701
1.685
0.522

VV
GTAGTA
425.66
698
1.640
0.495

VV
GTGGTG
6476.64
9025
1.393
0.332

VV
GTGGTC
3273.84
4256
1.300
0.262

VV
GTAGTT
655.54
800
1.220
0.199

VV
GTTGTC
1292.55
1561
1.208
0.189

VV
GTGGTA
1660.38
1777
1.070
0.068

VV
GTGGTT
2557.05
2613
1.022
0.022

VV
GTTGTG
2557.05
2261
0.884
−0.123

VV
GTAGTG
1660.38
1161
0.699
−0.358

VV
GTAGTC
839.30
553
0.659
−0.417

VV
GTCGTC
1654.87
858
0.518
−0.657

VV
GTCGTG
3273.84
1250
0.382
−0.963

VV
GTCGTA
839.30
213
0.254
−1.371

VV
GTCGTT
1292.55
288
0.223
−1.501

VW
GTCTGG
1316.29
1763
1.339
0.292

VW
GTGTGG
2604.03
2451
0.941
−0.061

VW
GTATGG
667.58
578
0.866
−0.144

VW
GTTTGG
1028.10
824
0.801
−0.221

VY
GTCTAC
1602.79
2490
1.554
0.441

VY
GTTTAT
1017.23
1438
1.414
0.346

VY
GTATAT
660.53
875
1.325
0.281

VY
GTCTAT
1302.39
1544
1.186
0.170

VY
GTGTAC
3170.80
2654
0.837
−0.178

VY
GTTTAC
1251.87
1008
0.805
−0.217

VY
GTATAC
812.88
582
0.716
−0.334

VY
GTGTAT
2576.51
1804
0.700
−0.356

WA
TGGGCA
1469.77
1535
1.044
0.043

WA
TGGGCG
690.28
695
1.007
0.007

WA
TGGGCT
1678.97
1664
0.991
−0.009

WA
TGGGCC
2552.98
2498
0.978
−0.022

WC
TGGTGC
1057.38
1066
1.008
0.008

WC
TGGTGT
890.62
882
0.990
−0.010

WD
TGGGAC
2699.37
2807
1.040
0.039

WD
TGGGAT
2389.63
2282
0.955
−0.046

WE
TGGGAG
3580.00
3650
1.020
0.019

WE
TGGGAA
2677.00
2607
0.974
−0.026

WF
TGGTTT
1639.95
1735
1.058
0.056

WF
TGGTTC
1877.05
1782
0.949
−0.052

WG
TGGGGT
955.95
1064
1.113
0.107

WG
TGGGGC
2002.00
2179
1.088
0.085

WG
TGGGGA
1476.56
1454
0.985
−0.015

WG
TGGGGG
1441.49
1179
0.818
−0.201

WH
TGGCAT
971.42
1000
1.029
0.029

WH
TGGCAC
1339.58
1311
0.979
−0.022

WI
TGGATT
1537.91
1627
1.058
0.056

WI
TGGATA
707.30
714
1.009
0.009

WI
TGGATC
1944.78
1849
0.951
−0.051

WK
TGGAAG
3491.83
3645
1.044
0.043

WK
TGGAAA
2696.17
2543
0.943
−0.058

WL
TGGCTA
683.88
798
1.167
0.154

WL
TGGCTG
3821.78
4228
1.106
0.101

WL
TGGCTT
1268.11
1334
1.052
0.051

WL
TGGCTC
1839.05
1879
1.022
0.021

WL
TGGTTG
1242.54
855
0.688
−0.374

WL
TGGTTA
739.64
501
0.677
−0.390

WM
TGGATG
2335.00
2335
1.000
0.000

WN
TGGAAT
1978.70
2005
1.013
0.013

WN
TGGAAC
2179.30
2153
0.988
−0.012

WP
TGGCCC
1302.21
1381
1.061
0.059

WP
TGGCCG
471.84
486
1.030
0.030

WP
TGGCCA
1125.64
1123
0.998
−0.002

WP
TGGCCT
1166.31
1076
0.923
−0.081

WQ
TGGCAG
2983.56
2997
1.005
0.004

WQ
TGGCAA
1068.44
1055
0.987
−0.013

WR
TGGAGG
1198.99
1665
1.389
0.328

WR
TGGAGA
1221.30
1472
1.205
0.187

WR
TGGCGG
1210.98
979
0.808
−0.213

WR
TGGCGC
1107.23
895
0.808
−0.213

WR
TGGCGT
474.05
377
0.795
−0.229

WR
TGGCGA
656.45
481
0.733
−0.311

WS
TGGAGT
1031.75
1239
1.201
0.183

WS
TGGAGC
1635.35
1956
1.196
0.179

WS
TGGTCA
1015.12
898
0.885
−0.123

WS
TGGTCC
1443.44
1271
0.881
−0.127

WS
TGGTCT
1255.65
1076
0.857
−0.154

WS
TGGTCG
381.70
323
0.846
−0.167

WT
TGGACG
598.07
674
1.127
0.120

WT
TGGACA
1473.88
1559
1.058
0.056

WT
TGGACT
1303.39
1240
0.951
−0.050

WT
TGGACC
1820.65
1723
0.946
−0.055

WV
TGGGTC
1318.64
1378
1.045
0.044

WV
TGGGTG
2608.66
2633
1.009
0.009

WV
TGGGTA
668.77
665
0.994
−0.006

WV
TGGGTT
1029.93
950
0.922
−0.081

WW
TGGTGG
1559.00
1559
1.000
0.000

WY
TGGTAC
1444.91
1520
1.052
0.051

WY
TGGTAT
1174.09
1099
0.936
−0.066

YA
TATGCA
1120.39
2249
2.007
0.697

YA
TATGCT
1279.86
2296
1.794
0.584

YA
TATGCC
1946.11
2862
1.471
0.386

YA
TACGCG
647.56
622
0.961
−0.040

YA
TATGCG
526.19
482
0.916
−0.088

YA
TACGCC
2395.00
1402
0.585
−0.535

YA
TACGCA
1378.81
512
0.371
−0.991

YA
TACGCT
1575.07
444
0.282
−1.266

YC
TACTGC
1588.07
2411
1.518
0.418

YC
TACTGT
1337.61
1587
1.186
0.171

YC
TATTGT
1086.90
659
0.606
−0.500

YC
TATTGC
1290.42
646
0.501
−0.692

YD
TATGAT
2091.17
3707
1.773
0.572

YD
TATGAC
2362.22
3731
1.579
0.457

YD
TACGAC
2907.08
1653
0.569
−0.565

YD
TACGAT
2573.52
843
0.328
−1.116

YE
TATGAA
2515.85
5225
2.077
0.731

YE
TATGAG
3364.48
4722
1.403
0.339

YE
TACGAG
4140.53
2309
0.558
−0.584

YE
TACGAA
3096.14
861
0.278
−1.280

YF
TACTTC
2766.63
3380
1.222
0.200

YF
TATTTT
1964.12
2124
1.081
0.078

YF
TACTTT
2417.16
2201
0.911
−0.094

YF
TATTTC
2248.09
1691
0.752
−0.285

YG
TATGGA
1472.35
2874
1.952
0.669

YG
TATGGT
953.23
1665
1.747
0.558

YG
TATGGG
1437.38
2129
1.481
0.393

YG
TATGGC
1996.30
2749
1.377
0.320

YG
TACGGG
1768.93
1088
0.615
−0.486

YG
TACGGC
2456.76
1484
0.604
−0.504

YG
TACGGT
1173.10
448
0.382
−0.963

YG
TACGGA
1811.96
633
0.349
−1.052

YH
TACCAC
1862.81
2378
1.277
0.244

YH
TACCAT
1350.85
1420
1.051
0.050

YH
TATCAT
1097.67
1021
0.930
−0.072

YH
TATCAC
1513.67
1006
0.665
−0.409

YI
TACATC
2684.66
3935
1.466
0.382

YI
TACATT
2122.99
2162
1.018
0.018

YI
TATATT
1725.09
1554
0.901
−0.104

YI
TACATA
976.39
846
0.866
−0.143

YI
TATATA
793.39
648
0.817
−0.202

YI
TATATC
2181.48
1339
0.614
−0.488

YK
TACAAG
3508.58
4372
1.246
0.220

YK
TACAAA
2709.10
2847
1.051
0.050

YK
TATAAA
2201.34
2262
1.028
0.027

YK
TATAAG
2850.98
1789
0.628
−0.466

YL
TACCTG
4522.42
6324
1.398
0.335

YL
TATTTA
711.20
966
1.358
0.306

YL
TACCTC
2176.20
2598
1.194
0.177

YL
TACTTG
1470.33
1701
1.157
0.146

YL
TATTTG
1194.75
1358
1.137
0.128

YL
TACCTA
809.25
876
1.082
0.079

YL
TACCTT
1500.58
1449
0.966
−0.035

YL
TATCTT
1219.33
1166
0.956
−0.045

YL
TACTTA
875.24
763
0.872
−0.137

YL
TATCTA
657.58
541
0.823
−0.195

YL
TATCTC
1768.32
1087
0.615
−0.487

YL
TATCTG
3674.80
1751
0.476
−0.741

YM
TACATG
2325.97
3055
1.313
0.273

YM
TATATG
1890.03
1161
0.614
−0.487

YN
TACAAC
2442.24
3341
1.368
0.313

YN
TACAAT
2217.44
2200
0.992
−0.008

YN
TATAAT
1801.83
1629
0.904
−0.101

YN
TATAAC
1984.50
1276
0.643
−0.442

YP
TACCCG
668.65
1004
1.502
0.406

YP
TACCCA
1595.15
1925
1.207
0.188

YP
TATCCA
1296.18
1438
1.109
0.104

YP
TACCCC
1845.38
1961
1.063
0.061

YP
TATCCT
1343.02
1379
1.027
0.026

YP
TACCCT
1652.79
1558
0.943
−0.059

YP
TATCCC
1499.51
937
0.625
−0.470

YP
TATCCG
543.32
242
0.445
−0.809

YQ
TACCAG
3987.12
5013
1.257
0.229

YQ
TATCAA
1160.22
1179
1.016
0.016

YQ
TACCAA
1427.83
1397
0.978
−0.022

YQ
TATCAG
3239.83
2226
0.687
−0.375

YR
TACCGC
1307.70
2153
1.646
0.499

YR
TACCGA
775.30
990
1.277
0.244

YR
TACAGA
1442.41
1834
1.271
0.240

YR
TACCGG
1430.23
1796
1.256
0.228

YR
TACAGG
1416.06
1671
1.180
0.166

YR
TACCGT
559.87
642
1.147
0.137

YR
TATCGA
629.99
570
0.905
−0.100

YR
TATCGT
454.94
383
0.842
−0.172

YR
TATAGA
1172.07
827
0.706
−0.349

YR
TATCGG
1162.17
629
0.541
−0.614

YR
TATAGG
1150.66
560
0.487
−0.720

YR
TATCGC
1062.60
509
0.479
−0.736

YS
TACAGC
2204.13
3590
1.629
0.488

YS
TACTCG
514.46
783
1.522
0.420

YS
TACAGT
1390.60
1887
1.357
0.305

YS
TATTCA
1111.75
1210
1.088
0.085

YS
TACTCC
1945.47
2088
1.073
0.071

YS
TATTCT
1375.18
1466
1.066
0.064

YS
TACTCA
1368.18
1188
0.868
−0.141

YS
TATTCC
1580.84
1306
0.826
−0.191

YS
TACTCT
1692.37
1173
0.693
−0.367

YS
TATAGT
1129.96
728
0.644
−0.440

YS
TATTCG
418.04
229
0.548
−0.602

YS
TATAGC
1791.02
874
0.488
−0.717

YT
TACACG
697.26
1311
1.880
0.631

YT
TACACC
2122.58
2696
1.270
0.239

YT
TACACA
1718.31
2158
1.256
0.228

YT
TACACT
1519.54
1409
0.927
−0.076

YT
TATACT
1234.74
1049
0.850
−0.163

YT
TATACA
1396.25
1049
0.751
−0.286

YT
TATACC
1724.75
1063
0.616
−0.484

YT
TATACG
566.57
245
0.432
−0.838

YV
TATGTT
986.79
1723
1.746
0.557

YV
TATGTA
640.76
1113
1.737
0.552

YV
TATGTC
1263.40
1862
1.474
0.388

YV
TATGTG
2499.39
3382
1.353
0.302

YV
TACGTG
3075.90
2279
0.741
−0.300

YV
TACGTC
1554.82
991
0.637
−0.450

YV
TACGTA
788.55
284
0.360
−1.021

YV
TACGTT
1214.40
390
0.321
−1.136

YW
TACTGG
1609.87
2212
1.374
0.318

YW
TATTGG
1308.13
706
0.540
−0.617

YY
TACTAC
2256.03
2854
1.265
0.235

YY
TATTAT
1489.60
1459
0.979
−0.021

YY
TACTAT
1833.19
1760
0.960
−0.041

YY
TATTAC
1833.19
1339
0.730
−0.314
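
Each entry in the rows above appears to give an amino-acid pair, a codon pair, an expected count, an observed count, the observed/expected ratio, and the natural logarithm of that ratio. For example, for TA ACCGCC, 1648/3926.11 ≈ 0.420 and ln 0.420 ≈ −0.868. The Python sketch below rests on two assumptions that this section does not state itself. The first is that the last column is the codon pair score (CPS). The second is that the codon pair bias (CPB) of a reading frame is the arithmetic mean of the CPS values of its consecutive codon pairs, as in Coleman et al. (2008). The dictionary `CPS_TABLE` and the helper names are hypothetical. The eight scores in it are copied from rows of the table. The two example sequences are hypothetical synonymous encodings of the peptide W-Y-T-V-E, chosen only because every codon pair they contain appears above.

```python
import math

# Hypothetical lookup: a few codon pair scores (last table column) copied from
# rows above. A real analysis would load every row of the table.
CPS_TABLE: dict[tuple[str, str], float] = {
    ("TGG", "TAC"): 0.051,   # WY TGGTAC
    ("TAC", "ACG"): 0.631,   # YT TACACG
    ("ACG", "GTG"): 0.422,   # TV ACGGTG
    ("GTG", "GAG"): 0.421,   # VE GTGGAG
    ("TGG", "TAT"): -0.066,  # WY TGGTAT
    ("TAT", "ACC"): -0.484,  # YT TATACC
    ("ACC", "GTC"): -0.594,  # TV ACCGTC
    ("GTC", "GAA"): -2.058,  # VE GTCGAA
}


def codon_pair_score(observed: float, expected: float) -> float:
    """Natural log of the observed/expected ratio (assumed CPS definition)."""
    return math.log(observed / expected)


def split_codons(orf: str) -> list[str]:
    """Split an in-frame nucleotide sequence into consecutive codons."""
    if len(orf) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def codon_pair_bias(orf: str, scores: dict[tuple[str, str], float]) -> float:
    """Mean CPS over all consecutive codon pairs (assumed CPB definition)."""
    codons = split_codons(orf.upper())
    pairs = list(zip(codons, codons[1:]))
    if not pairs:
        raise ValueError("need at least two codons to form a pair")
    # Raises KeyError if a pair is missing from the (partial) score table.
    return sum(scores[pair] for pair in pairs) / len(pairs)


if __name__ == "__main__":
    # Recompute two table rows from their observed and expected columns.
    print(round(codon_pair_score(1648, 3926.11), 3))  # -0.868 (row TA ACCGCC)
    print(round(codon_pair_score(367, 2872.48), 3))   # -2.058 (row VE GTCGAA)

    # Two synonymous encodings of the peptide W-Y-T-V-E.
    favoured = "TGGTACACGGTGGAG"     # TGG TAC ACG GTG GAG
    disfavoured = "TGGTATACCGTCGAA"  # TGG TAT ACC GTC GAA
    print(f"{codon_pair_bias(favoured, CPS_TABLE):.2f}")     # 0.38
    print(f"{codon_pair_bias(disfavoured, CPS_TABLE):.2f}")  # -0.80
```

Both sequences encode the same peptide. The drop in CPB (0.38125 versus −0.8005) therefore comes entirely from synonymous codon choice, which is the kind of change the document describes for varying codon pair bias. A real calculation would need scores for every codon pair, not the handful copied here.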

US Referenced Citations (4)

Number	Name	Date	Kind
6696289	Bae et al.	Feb 2004	B1
20040097439	Nicolas et al.	May 2004	A9
20040209241	Hermanson et al.	Oct 2004	A1
20080118530	Kew et al.	May 2008	A1

Foreign Referenced Citations (2)

Number	Date	Country
2002095363	Nov 2002	WO
2006042156	Apr 2006	WO

Provisional Applications (2)

Number	Date	Country
61068666	Mar 2008	US
60909389	Mar 2007	US

Divisions (1)

Relation	Number	Date	Country
Parent	12594173		US
Child	15258584		US

Non-Patent Literature Citations (87)

Cheng, L. et al., “Absence of Effect of Varying Thr-Leu Codon Pairs on Protein Synthesis in a T7 System”, Biochemistry (2001), vol. 40, pp. 6102-6106.
Cohen, B. et al., “Natural Selection and Algorithmic Design of mRNA”, J. Comput. Biol. (2003), vol. 10, pp. 3-4.
Doma, M. et al., “Endonucleolytic Cleavage of Eukaryotic mRNAs with Stalls in Translation Elongation”, Nature (2006), vol. 440, pp. 561-564.
Garcia-Sastre, A. et al., “Genetic Manipulation of Negative-Strand RNA Virus Genomes”, Annu. Rev. Microbiol. (1993), vol. 47, pp. 765-790.
Greve, J. et al., “The Major Human Rhinovirus Receptor is ICAM-1”, Cell (1989), vol. 56, pp. 839-847.
Gustafsson, C. et al., “Codon Bias and Heterologous Protein Expression”, Trends in Biotechnology (2004), vol. 22:7, pp. 346-353.
Johansen, L. et al., “The RNA Encompassing the Internal Ribosome Entry Site in the Poliovirus 5′ Nontranslated Region Enhances the Encapsidation of Genomic RNA”, Virology (2000), vol. 273, pp. 391-399.
Luytjes, W. et al., “Amplification, Expression, and Packaging of a Foreign Gene by Influenza Virus”, Cell (1989), vol. 59, pp. 1107-1113.
McKnight, K., “The Human Rhinovirus Internal Cis-acting Replication Element (cre) Exhibits Disparate Properties among Serotypes”, Arch. Virol. (2003), vol. 148, pp. 2397-2418.
Palese, P. et al., Orthomyxoviridae: The Viruses and Their Replication, Ch. 47, pp. 1647-1689 in Fields Virology (2007), vol. 2, 5th Edition, David M. Knipe, Ph.D., Editor-in-Chief, Wolters Kluwer, publisher, Philadelphia, USA.
Park, S. et al., “Advances in Computational Protein Design”, Curr. Opin. Struct. Biol. (2004), vol. 14, pp. 487-494.
Paul, A. et al., “Internal Ribosomal Entry Site Scanning of the Poliovirus Polyprotein: Implications for Proteolytic Processing”, Virology (1998), vol. 250, pp. 241-253.
Pelletier, J. et al., “Internal Initiation of Translation of Eukaryotic mRNA Directed by a Sequence Derived from Poliovirus RNA”, Nature (1988), vol. 334, pp. 320-325.
Rueckert, R.R., “Picornaviruses and Their Replication”, Ch. 32, pp. 705-738, in Virology (1985), Bernard N. Fields, M.D., Editor-in-Chief, Raven Press, publisher, New York, USA.
Russell, C. et al., “The Genesis of a Pandemic Influenza Virus”, Cell (2005), pp. 368-371.
Savolainen, C. et al., “Human Rhinoviruses”, Paediatr. Respir. Rev. (2003), vol. 4, pp. 91-98.
Tian, J. et al., “Accurate Multiplex Gene Synthesis from Programmable DNA Microchips”, Nature (2004), vol. 432, pp. 1050-1054.
Ansardi, D., et al., “Complementation of a Poliovirus Defective Genome by a Recombinant Vaccinia Virus Which Provides Poliovirus P1 Capsid Precursor in Trans”, J. Virol. (1993), vol. 67:6, pp. 3684-3690.
Belov, G. et al., “The Major Apoptotic Pathway Activated and Suppressed by Poliovirus”, J. Virol. (2003), vol. 77:1, pp. 45-56.
Buchan, J. et al., “tRNA Properties Help Shape Codon Pair Preferences in Open Reading Frames”, Nucl. Acids Res. (2006), vol. 34:3, pp. 1015-1027.
Cao, X. et al., “Replication of Poliovirus RNA Containing Two VPg Coding Sequences Leads to a Specific Deletion Event”, J. Virol. (1993), vol. 67:9, pp. 5572-5578.
Carlini, D. et al., “In Vivo Introduction of Unpreferred Synonymous Codons into the Drosophila Adh Gene Results in Reduced Levels of ADH Protein”, Genetics (2003), vol. 163, pp. 239-243.
Cello, J. et al., “Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template”, Science (2002), vol. 297, pp. 1016-1018.
Coleman, J.R. et al., “Synthetic Construct Capsid Protein P1-Min Gene, Partial Cds”, (2007), retrieved from EBI accession No. EM_SY: EU095953; Database accession No. EU095953.
Coleman, J.R. et al., “Virus Attenuation by Genome-Scale Changes in Codon Pair Bias”, Science (2008), vol. 320, pp. 1784-1787.
Corpet, F., “Multiple Sequence Alignment with Hierarchical Clustering”, Nucl. Acids Res. (1988), vol. 16:22, pp. 10881-10890.
Crotty, S., et al., “RNA Virus Error Catastrophe: Direct Molecular Test by Using Ribavirin”, Proc. Natl. Acad. Sci. U.S.A. (2001), vol. 98:12, pp. 6895-6900.
Curran, J., et al., “Selection of aminoacyl-tRNAs at sense codons: the size of the tRNA variable loop determines whether the immediate 3′ nucleotide to the codon has a context effect”, Nucl. Acids Res. (1995), vol. 23:20, pp. 4104-4108.
Dove, A., et al., “Cold-Adapted Poliovirus Mutants Bypass a Postentry Replication Block”, J. Virol. (1997), vol. 71:6, pp. 4728-4735.
Enami, M. et al., “Introduction of Site-Specific Mutations into the Genome of Influenza Virus”, Proc. Natl. Acad. Sci. U.S.A. (1990), vol. 87, pp. 3802-3805.
Farabaugh, P.J., “Programmed Translational Frameshifting”, Microbiol. Rev. (1996), vol. 60:1, pp. 103-134.
Fedorov, A. et al., “Regularities of Context-Dependent Codon Bias in Eukaryotic Genes”, Nucl. Acids Res. (2002), vol. 30:5, pp. 1192-1197.
Fodor, E. et al., “Rescue of Influenza A Virus From Recombinant DNA”, J. Virol. (1999), vol. 73:11, pp. 9679-9682.
Georgescu, M. et al., “Evolution of the Sabin Type 1 Poliovirus in Humans: Characterization of Strains Isolated From Patients with Vaccine-Associated Paralytic Poliomyelitis”, J. Virol. (1997), vol. 71:10, pp. 7758-7768.
Gerber, K. et al., “Biochemical and Genetic Studies of the Initiation of Human Rhinovirus 2 RNA Replication: Identification of a Cis-Replicating Element in the Coding Sequence of 2Apro”, J. Virol. (2001), vol. 75:22, pp. 10979-10990.
Girard, S. et al., “Poliovirus Induces Apoptosis in the Mouse Central Nervous System”, J. Virol. (1999), vol. 73:7, pp. 6066-6072.
Goodfellow, I. et al., “Identification of a Cis-Acting Replication Element Within the Poliovirus Coding Region”, J. Virol. (2000), vol. 74:10, pp. 4590-4600.
Gu, W. et al., “Analysis of Synonymous Codon Usage in SARS Coronavirus and other viruses in the Nidovirales”, Virus Research (2004), vol. 101, pp. 155-161.
Gutman, G.A., et al., “Nonrandom Utilization of Codon Pairs in Escherichia coli”, Proc. Natl. Acad. Sci. U.S.A. (1989), vol. 86, pp. 3699-3703.
He, Y. et al., “Interaction of the Poliovirus Receptor with Poliovirus”, Proc. Natl. Acad. Sci. USA (2000), vol. 97:1, pp. 79-84.