This invention is related to the area of vaccines and immunity. In particular, it relates to vaccines for inducing immunity to Human Immunodeficiency Virus.
The rapid evolution of HIV-1 and resulting diversity in the viral proteomes is widely acknowledged as playing a major role in the failure of most infected individuals to control either acute or chronic HIV-1 infection (Abram et al., 2010; Goulder and Watkins, 2004; McMichael et al., 2010; Pereyra et al., 2010; Troyer et al., 2009). The sequence diversity of HIV-1 proteins is a combination of the frequency of mutations, about 1.4×10⁻⁵ per base pair (Abram et al., 2010), two to three recombination events per cycle of virus replication (Jetzt et al., 2000), and a high replication rate of about 10¹⁰ to 10¹² virions per day (Perelson et al., 1996). This leads to the rapid evolution of genetically distinct mutant viruses, which accumulate within the host as a complex mixture of viral quasispecies (Eigen, 1993). Survival of the individual variant viruses is determined by the relative host fitness and a complex association of mutations and immune escape through a multiplicity of mechanisms (Brumme et al., 2009; Brumme and Walker, 2009; Liang et al., 2008; Wang et al., 2009). This process is initiated within a few days after infection by rapid selection of mutants resistant to host immune response, resulting in the development of reservoirs of progeny virus within one to two weeks after infection (Allen et al., 2005; Allen et al., 2004; Jones et al., 2009; Rychert et al., 2007; Salazar-Gonzalez et al., 2009). Changes in the proteins of the escape mutants, even of single amino acids, can result in loss of T-cell epitopes by modification of sequences required at any of several stages in the immune response mechanisms; for example, antigen protein processing of T-cell epitope sequences, epitope recognition by human leukocyte antigen (HLA), or epitope ligation and activation of T-cell receptors (Allen et al., 2004; Draenert et al., 2004; Kelleher et al., 2001; Leslie et al., 2004; Sloan-Lancaster and Allen, 1996; Yokomaku et al., 2004).
Escape from the immune response is, however, limited in some individuals (HIV-controllers) and a recent report provides extensive genetic data implicating HLA-viral peptide interaction as the major factor in the control of HIV infection by these individuals (Pereyra et al., 2010). The ability of HIV-1 to escape the host immune system via mutation may also be restricted at sites of the genome (Korber et al., 2009; Yang, 2009) important for viral functions. Vaccines that target certain conserved epitopes of virus structural and regulatory proteins have been shown to elicit cellular immune responses that provide immune protection against HIV infection in BALB/c and transgenic mice (Gotch, 1998; Korber et al., 2009; Letourneau et al., 2007; Okazaki et al., 2003; Wilson et al., 2003).
There is a continuing need in the art for effective diagnosis, vaccines and treatments for HIV.
According to one aspect of the invention a polypeptide comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database. Two of these polypeptides are specific to HIV-1, with no matching sequence of nine amino acids in the sequences of other viruses or organisms reported in nature (as of December 2010), while many are specific to primate lentivirus group, including HIV-1 with multiclade conservation of the following possible combinations: clades A, B, C and D or clades B, A, and C or clades B, A and D or clades B, C and D or clades B and A or clades B and C or clades B and D or clade B only. The multiclade sequences may be used to specifically identify HIV-1 virus of the different clades.
Another aspect of the invention is a polynucleotide encoding the polypeptide that comprises one or more discontinuous segments of HIV-1 clade B proteins. The segments comprise from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database.
Yet another aspect of the invention is a polypeptide made from an encoding polynucleotide, that further comprises: (a) a LAMP-1 luminal sequence comprising SEQ ID NO: 1278; and (b) a LAMP transmembrane and cytoplasmic tail comprising SEQ ID NO: 1279, wherein the luminal sequence is amino-terminal to the one or more discontinuous segments of the HIV-1 proteins which are amino-terminal to the LAMP transmembrane and cytoplasmic tail.
Additionally, a nucleic acid vector is provided that comprises the polynucleotide that encodes a polypeptide that comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database.
Further, a host cell is provided that comprises a nucleic acid vector that comprises the polynucleotide that encodes a polypeptide that comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database.
Another aspect of the invention is a method of producing a polypeptide. A host cell is cultured under conditions in which the host cell expresses the polypeptide. The host cell comprises a nucleic acid vector that comprises the polynucleotide that encodes a polypeptide that comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database.
A method is provided for producing a cellular vaccine. Antigen presenting cells are transfected with a nucleic acid vector, whereby the antigen presenting cells express the polypeptide. The nucleic acid vector comprises the polynucleotide that encodes a polypeptide that comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database.
A method of making a vaccine is another aspect of the invention. The method comprises mixing together a polypeptide and an immune adjuvant. The polypeptide comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database.
A method of immunizing a human or other animal subject is another aspect of the invention. The method comprises administering to the human or other animal subject a polypeptide or a nucleic acid vector or a host cell, in an amount effective to elicit HIV-specific T-cell activation. The polypeptide comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database. The nucleic acid vector comprises the polynucleotide that encodes a polypeptide that comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database. The host cell comprises a nucleic acid vector that comprises the polynucleotide that encodes a polypeptide that comprises one or more discontinuous segments of HIV-1 clade B proteins, said segments comprising from 9 to 40 contiguous amino acid residues, wherein said segments comprise at least one nonamer, wherein each nonamer is represented in the NCBI Entrez protein database of HIV-1 clade B proteins as of August 2008 at a frequency of greater than 80% and for which the maximum representation of individual variants from the amino acid sequence of said segments is less than 10% in said database.
Additional aspects of the invention permit the identification of lentivirus group, species, or clade. Oligonucleotide probes hybridize to genomic nucleic acid or its complement and identify group, species, or clade.
Another aspect of the invention involves protein-based diagnosis. A polypeptide which represents a conserved sequence according to the invention, or an antibody which specifically binds such a conserved sequence, is used to interrogate a body sample of a patient by binding. An antibody is used to identify viral protein in virus infected cells. A polypeptide is used to identify a patient's own antibodies to a lentivirus. Specific binding can be used to identify the presence in the patient of the primate lentivirus group species, including the HIV-1 species, of a specific clade, biclade, triclade or pan-clade.
These and other embodiments, which will be apparent to those of skill in the art upon reading the specification, provide the art with methods and tools for reducing risk, severity, symptoms, and/or duration of acquired immunodeficiency disease. Thus the vaccines may be either prophylactic or therapeutic.
The inventors have identified and selected polypeptides that represent epitopes in humans, which are conserved in at least 80% of all recorded HIV clade B viruses as of August 2008, and wherein individual variants have an incidence of less than 10%. Selection criteria may be increased in stringency to, for example at least 85% or 90% or 95% incidence of primary conserved sequence and decreased individual variant stringency to an incidence of less than 5% or 1%. These epitopes are useful for vaccines as well as for diagnostic assays.
Discontinuous segments of the HIV-1 proteins may be strung together to form a concatamer, if desired. They may be separated by spacer residues. Discontinuous segments are those that are not adjacent in the naturally occurring virus isolates. Segments are typically at least 9 and up to about 15, 16, 17, 18, 19, 20, 25, 30, 35, or 40 contiguous amino acid residues from the virus proteome. Single segments may also be used. Because the segments are less than the whole, naturally occurring proteins, and/or because the segments are adjacent to other segments to which they are not adjacent in the proteome, the polypeptides and nucleic acids described here are non-naturally occurring.
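By way of illustration only, the assembly of discontinuous segments into a concatamer can be sketched in Python as follows. The function name `build_concatemer`, the example segment strings, and the spacer `GGS` are hypothetical placeholders chosen for the sketch; they are not sequences or linkers taken from this disclosure.

```python
def build_concatemer(segments, spacer=""):
    """Join segments of 9 to 40 residues, optionally separated by spacer residues.

    The length bounds follow the 9 to 40 residue range stated in the text;
    the spacer is an arbitrary illustrative choice.
    """
    for segment in segments:
        if not 9 <= len(segment) <= 40:
            raise ValueError(f"segment length {len(segment)} is outside 9-40")
    return spacer.join(segments)


# Hypothetical segments, not real HIV-1 sequences.
print(build_concatemer(["ACDEFGHIK", "LMNPQRSTVW"], spacer="GGS"))
```

A single segment passed on its own is returned unchanged, consistent with the statement that single segments may also be used.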
Linkers or spacers with natural or non-naturally occurring amino acid residues may be used optionally. Particular properties may be imparted by the linkers. They may provide a particular structure or property, for example a particular kink or a particular cleavable site. Design is within the skill of the art.
Polynucleotides which encode the polypeptides may be designed and made by techniques well known in the art. The natural nucleotide sequences used by HIV-1 may be used. Alternatively non-natural nucleotide sequences may be used, including in one embodiment, human codon-optimized sequences. Design of human codon-optimized sequences is well within the skill of the ordinary artisan. Data regarding the most frequently used codons in the human genome are readily available. Optimization may be applied partially or completely.
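As a rough sketch of complete codon optimization, a peptide can be back-translated by choosing one preferred codon per amino acid. The table below is an assumption made for illustration: it lists codons commonly reported as the most frequent in human coding sequences, and it should be checked against a current codon usage resource before any real use. The names `PREFERRED_HUMAN_CODON` and `back_translate` are hypothetical.

```python
# Assumed table of frequently used human codons (illustrative; verify before use).
PREFERRED_HUMAN_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def back_translate(peptide):
    """Return a DNA coding sequence using one preferred codon per residue."""
    return "".join(PREFERRED_HUMAN_CODON[residue] for residue in peptide)


print(back_translate("MKV"))
```

Partial optimization, as mentioned above, would replace only some codons of a natural sequence; this sketch covers only the complete case.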
The polynucleotides which encode the polypeptides can be replicated and/or expressed in vectors, such as DNA virus vectors, RNA virus vectors, and plasmid vectors. Preferably these will contain promoters for expressing the polypeptides in human or other mammalian or other animal cells. An example of a suitable promoter is the cytomegalovirus (CMV) promoter. Promoters may be inducible or repressible. They may be active in a tissue specific manner. They may be constitutive. They may express at high or low levels, as desired in a particular application. The vectors may be propagated in host cells for expression and collection of chimeric protein. Suitable vectors will depend on the host cells selected. In one embodiment host cells are grown in culture and the polypeptide is harvested from the cells or from the culture medium. Suitable purification techniques can be applied to the chimeric protein as are known in the art. In another embodiment one transfects antigen-presenting cells for ultimate delivery of the transfected cells to a vaccinee of a cellular vaccine which expresses and presents antigen to the vaccinee. Suitable antigen presenting cells include dendritic cells, B cells, macrophages, and epithelial cells.
Polynucleotides of the invention include diagnostic DNA or RNA oligonucleotides, i.e., short sequences of proven specificity to viral species; these are sufficient to uniquely identify the viral species or a group or clade (SEQ ID NOs: 637-1140). Polynucleotides include oligonucleotides such as primers and probes, which may be labeled or not. These may contain all or portions of the coding sequences for an identified conserved polypeptide. Polynucleotides of the invention and/or their complements may optionally be attached to solid supports as probes to be used diagnostically, for example, through hybridization to viral genomic sequences. Similarly, epitopic polypeptides can be attached to solid supports to be used diagnostically. These can be used to screen for activated T cells or even antibodies. Suitable solid supports include without limitation microarrays, microspheres, and microtiter wells. Antibodies may be used that are directed against the peptides as disclosed. The antibodies may be used to specifically diagnose species of the primate lentivirus group, including HIV-1 virus with multiclade conservation of the following possible combinations: clades A, B, C and D or clades B, A, and C or clades B, A and D or clades B, C and D or clades B and A or clades B and C or clades B and D or clade B only. The multiclade sequences may be used to specifically identify HIV-1 virus of the different clades. Polynucleotides may also be used as primers, for example, of length 18-30, 25-50, or 15-75 nucleotides, to amplify the genetic material of viruses of the primate lentivirus group, including HIV-1 virus(es) of the possible clade combinations listed above. Polynucleotide primers and probes may be labeled with a fluorescent or radioactive label, if desired.
These polynucleotides can be used to amplify and/or hybridize to a test sample to determine the presence or species identity of a primate lentivirus, including HIV-1 virus(es) of the possible clade combinations listed above. Such polynucleotides will typically be at least 15, 18, 20, 25, or 30 bases to 50, 70, 90, 120, 150, or 500 bases in length. Any technique, including but not limited to amplification, hybridization, single nucleotide extension, and sequencing, can be used to identify the presence or species identity of the primate lentivirus, including HIV-1 virus(es) of the possible clade combinations listed.
Immune adjuvants may be administered with the vaccines of the present invention, whether the vaccines are polypeptides, polynucleotides, nucleic acid vectors, or cellular vaccines. The adjuvants may be mixed with the specific vaccine substance prior to administration or may be delivered separately to the recipient, either before, during, or after the vaccine substance is delivered. Some immune adjuvants which may be used include CpG oligodeoxynucleotides, GM-CSF, QS-21, MF-59, alum, lecithin, squalene, and Toll-like receptor (TLR) adaptor molecules. These include the Toll-interleukin-1 receptor domain-containing adaptor-inducing beta interferon (TRIF) or myeloid differentiation factor 88 (MyD88). Vaccines may be produced in any suitable manner, including in cultured cells, in eggs, and synthetically. In addition to adjuvants, booster doses may be provided. Boosters may be the same or a complementary type of vaccine. Boosters may include a conventional live or attenuated HIV-1 vaccine. Typically a high titer of antibody and/or T cell activation is desired with a minimum of adverse side effects.
Any of the conventional or esoteric modes of administration may be used, including oral, mucosal, or nasal. Additionally intramuscular, intravenous, intradermal, or subcutaneous delivery may be used. The administration efficiency may be enhanced by using electroporation. Optimization of the mode of administration for the particular vaccine composition may be desirable. The vaccines can be administered to patients who are infected already or to patients who do not yet have an infection. The vaccines can thus serve as prophylactic or therapeutic agents. One must, however, bear in mind that no specific level of efficacy is mandated by the words prophylactic or therapeutic. Thus the agents need not be 100% effective to be vaccines. Vaccines in general are used to reduce the incidence in a population, or to reduce the risk in an individual. They are also used to stimulate an immune response to lessen the symptoms and/or severity of the disease.
The above disclosure generally describes the present invention. All references disclosed herein are expressly incorporated by reference. A more complete understanding can be obtained by reference to the following specific examples, which are provided herein for purposes of illustration only, and are not intended to limit the scope of the invention.
We conducted a large-scale, systematic analysis of the recorded HIV-1 clade B protein sequences, focused on the variability and conservation of T-cell epitope relevant sequences. Detailed analyses were performed with clade B as it has the largest number of recorded sequences and can be used as a model for similar studies of the other clades. Modified Shannon's entropy and bioinformatics approaches were used to measure nonamer conservation and variability. Nonamers were chosen as they are the typical length of HLA class I epitopes, and the cores of HLA class II epitopes (Rammensee, 1995). Variants of the conserved nonamer sequences were analysed for the identification of regions of the proteome that were not only conserved, but also had a low incidence of individual variants. The immune relevance of selected sequences was assessed by their correlation with previously reported human T-cell epitopes and our recent study in the identification of human HIV-1 T-cell epitopes by use of HLA transgenic mice (Simon et al., 2010). The studies also included the identification of a) sequences specific to HIV-1 with no shared identity to other viruses and organisms, and b) specific sequences that are multiclade conserved as vaccine targets. These sequences have direct relevance to the development of new-generation vaccines and diagnostic applications.
HIV-1 protein sequence records were retrieved from the NCBI Entrez Protein Database in August 2008 by searching the NCBI taxonomy browser for HIV-1 (Taxonomy ID 11676). HIV-1 clade B specific entries were retrieved from the data collected via BLAST (version 2.2.18) searches (Altschul et al., 1990), using default parameters, with sample HIV-1 clade B protein sequences of the nine HIV-1 proteins from the HIV database (see website of Los Alamos National Laboratory (LANL) for HIV) as queries. Cutoff for the classification of each clade B protein was determined by manual inspection of the individual BLAST outputs. Duplicate sequences of each protein were removed and the remaining unique sequences, both partial and full length, were used for protein multiple sequence alignment. Alignment was difficult for some of the proteins because of the large number of diverse sequences, and thus different approaches were explored, as described below.
Sequence alignments of Vif, Vpr and Vpu were performed with PROMALS3D (Pei et al., 2008). The Gag, Pol, Tat, Rev, Env and Nef protein sequences, which formed large datasets, were first split into smaller and more manageable subsets (about 200-500 sequences per subset). These smaller subsets were aligned using PROMALS3D or CLUSTAL W (Pei et al., 2008; Thompson et al., 1994) and refined with RASCAL (Thompson et al., 2003) before merging into a full protein multiple sequence alignment, by use of conserved sites that helped anchor the alignment subsets. All multiple sequence alignments were manually inspected and corrected for misalignments. Alignment positions with a high fraction of gaps (95% or more) were removed. In total 29,211 Env protein sequences were retrieved, but only 9,661 sequences were aligned and analysed due to the complexity of aligning large, diverse protein sequences.
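The removal of gap-rich alignment positions can be illustrated with a short Python sketch. It assumes that the alignment is held as a list of equal-length strings and that gaps are written as "-"; the function name `drop_gappy_columns` is hypothetical and the sketch is not the tooling actually used for the alignments described above.

```python
def drop_gappy_columns(aligned_seqs, max_gap_fraction=0.95, gap="-"):
    """Remove alignment columns whose gap fraction is max_gap_fraction or more."""
    n_seqs = len(aligned_seqs)
    length = len(aligned_seqs[0])
    keep = [
        col for col in range(length)
        if sum(seq[col] == gap for seq in aligned_seqs) / n_seqs < max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in aligned_seqs]


# Toy alignment of 20 sequences: column 2 is 95% gaps, column 3 is 90% gaps.
toy = ["A--"] * 18 + ["A-D", "ACD"]
print(drop_gappy_columns(toy)[-2:])
```

In the toy alignment the second column is dropped because 19 of 20 sequences carry a gap there, while the third column (18 of 20) is retained.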
Protein alignment positions of clade B were cross-referenced to the HXB2 prototype protein sequences. It should be noted that the protein alignment positions differ from the HXB2 positions due to insertions and deletions in the alignment, especially in regions of high diversity.
Shannon's entropy (Miotto et al., 2008; Shannon, 1948) was used as a measure for HIV-1 diversity. The entropy of all overlapping nonamer positions across the protein alignment of HIV-1 clade B was measured and plotted by use of the ggplot2 suite (Wickham, 2009) of the R programming language and environment (R_Development_Core_Team, 2008). Entropy analysis was carried out by use of the Antigenic Variability Analyser tool (AVANA; see sourceforge website) and following the method as described in Khan et al. (2008). Briefly, the computation of entropy involves the number and incidence of unique nonamer peptides at a given position in an alignment. Nonamer entropy H(x) for a given position x in the alignment was calculated using the formula:
\[ H(x) = -\sum_{i=1}^{n(x)} p_{i,x} \log_2 p_{i,x} \]
where p_{i,x} is the probability of the occurrence (or incidence) of nonamer i with its center position at x (also referred to as the “nonamer position”), and n(x) is the total number of unique peptides observed at position x. Since the entropy values were calculated for each nonamer window based on its center position, values were not assigned to the four amino acids at the beginning and end of the alignments. A position that has a large number of unique peptides, with the majority displaying high incidence, would evaluate to a high entropy value, which would imply that this position is highly diverse; the maximum nonamer entropy value possible is 39 (log₂ 20⁹). Conversely, if the position has a single peptide that is completely conserved across all the sequences at that position in the alignment, the entropy will be zero, the lowest value possible. Entropy calculations are affected by the size of an alignment, and hence the entropies within the protein alignments of HIV-1 clade B were corrected for size bias via a statistical sub-sampling method (Khan et al., 2008).
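The nonamer entropy computation can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the AVANA implementation: the alignment is assumed to be a list of equal-length strings, the function names are hypothetical, and the sub-sampling correction for alignment size is omitted.

```python
import math
from collections import Counter


def nonamer_entropy(peptides):
    """Shannon entropy, in bits, of the peptides observed at one nonamer position."""
    counts = Counter(peptides)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(aligned_seqs, k=9):
    """Entropy of every overlapping k-mer window, keyed by 1-based center position."""
    length = len(aligned_seqs[0])
    profile = {}
    for start in range(length - k + 1):
        windows = [seq[start:start + k] for seq in aligned_seqs]
        profile[start + k // 2 + 1] = nonamer_entropy(windows)
    return profile


# Toy alignment: the last residue varies between two equally frequent forms.
toy = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKM"]
print(entropy_profile(toy))
```

For the toy alignment, the window centered at position 5 is fully conserved and has entropy zero, while the window centered at position 6 contains two equally frequent nonamers and has an entropy of one bit. No values are produced for the first four or last four residues, in line with the center-position convention described above.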
All sequences at each of the nonamer positions in the protein alignments were extracted and studied for the incidence of the primary (most common) nonamer and its variants. Variants at a given position in the alignment were defined as peptides with at least one amino acid difference from the primary nonamer. Variant nonamers that contained gaps (−) or any one of the unresolved characters, including B (asparagine or aspartic acid), J (leucine or isoleucine), X (unspecified or unknown amino acid) and Z (glutamine or glutamic acid) were excluded from the analysis. The ggplot2 suite was used to depict the incidence of total nonamer variants and the primary variant at each nonamer position across the proteome.
Highly conserved HIV-1 clade B sequences were identified as nonamer positions with (i) a primary nonamer incidence of 80% or more of the analysed viral sequences at that position and (ii) incidence of the primary variant of less than 10% of the primary nonamer sequence at the position. Identified nonamers that were contiguous (overlapped by eight amino acids) were joined. Positions with less than 100 sequences in the alignment were excluded from the selection of conserved sequences.
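The per-position tally of the primary nonamer and its variants can be sketched as follows. As a simplifying assumption, every peptide containing a gap (written here as "-") or one of the unresolved characters B, J, X or Z is dropped before counting; the function name `position_summary` and the dictionary keys are hypothetical.

```python
from collections import Counter

EXCLUDED_CHARS = set("-BJXZ")  # gap and unresolved residue codes


def position_summary(peptides):
    """Summarize the primary nonamer and its variants at one alignment position."""
    valid = [p for p in peptides if not EXCLUDED_CHARS & set(p)]
    if not valid:
        return None
    ranked = Counter(valid).most_common()
    total = len(valid)
    primary, primary_count = ranked[0]
    variants = ranked[1:]
    return {
        "n_sequences": total,
        "primary": primary,
        "primary_incidence": 100 * primary_count / total,
        "variant_incidence": 100 * sum(count for _, count in variants) / total,
        "unique_variants": len(variants),
        "primary_variant": variants[0][0] if variants else None,
        "primary_variant_incidence": 100 * variants[0][1] / total if variants else 0.0,
    }


# Toy position: 17 primary, two variants, and one peptide with an unresolved X.
toy = ["AAAAAAAAA"] * 17 + ["AAAAAAAAC"] * 2 + ["AAAAAAAAD", "AAAAAAAAX"]
print(position_summary(toy))
```

Incidences are expressed as percentages of the peptides retained at the position, so the peptide containing X does not contribute to any count.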
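The selection criteria and the joining of contiguous nonamers can be sketched in Python as below. The input format (one dictionary per nonamer position, with the keys shown) and both function names are assumptions made for the illustration. The check that the eight overlapping residues actually agree before two nonamers are joined is an added safeguard of this sketch, not a step stated in the text.

```python
def select_conserved(positions, min_primary=80.0, max_variant=10.0, min_sequences=100):
    """Keep positions that meet the incidence and sequence-count thresholds."""
    return [
        p for p in positions
        if p["n_sequences"] >= min_sequences
        and p["primary_incidence"] >= min_primary
        and p["primary_variant_incidence"] < max_variant
    ]


def join_contiguous(selected):
    """Merge primary nonamers at consecutive start positions into longer segments."""
    segments = []
    for pos in sorted(selected, key=lambda p: p["start"]):
        last = segments[-1] if segments else None
        if (last and pos["start"] == last["last_start"] + 1
                and last["seq"][-8:] == pos["primary"][:8]):
            last["seq"] += pos["primary"][-1]
            last["last_start"] = pos["start"]
        else:
            segments.append({"start": pos["start"], "last_start": pos["start"],
                             "seq": pos["primary"]})
    return [(s["start"], s["seq"]) for s in segments]
```

Each returned tuple holds the 1-based start position of a joined segment and its sequence; two selected nonamers starting at positions 1 and 2 therefore yield one segment of ten residues.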
Correspondence of Highly Conserved HIV-1 Clade B Sequences with Reported T-Cell Epitopes
All published human T-cell epitopes from the HIV Molecular Immunology Database (November 2010) (see website of Los Alamos National Laboratory (LANL) for immunology) and our transgenic mice study (Simon et al., 2010) with a match of at least 9 consecutive amino acids with the highly conserved HIV-1 clade B sequences were identified.
The 2008 Web alignments of the complete protein sequences of HIV-1 clades A, C and D were obtained from the HIV sequence database (see website of Los Alamos National Laboratory (LANL) for HIV). All protein alignments were manually inspected and corrected where necessary. The clade B highly conserved sequences were analysed for their incidence in the corresponding protein alignments of clades A, C and D to identify HIV-1 pan-clade highly conserved sequences. Highly conserved HIV-1 sequences common to clades B and C were also identified, as there was limited data for most of the proteins of clades A and D. The criteria for identification of pan-clade and biclade highly conserved sequences were similar to those used for clade B. Identified pan-clade and biclade nonamers that were contiguous were joined to form longer sequences.
Highly conserved HIV-1 clade B sequences that shared at least 9 consecutive amino acids with sequences of other viruses and organisms were identified by performing an exhaustive string search of the nonamers of the conserved sequences against all protein sequences reported at the NCBI Entrez protein database (as of November 2010), excluding HIV-1 records, synthetic constructs and artificial sequences.
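The exhaustive string search can be illustrated with the following Python sketch. It assumes that the non-HIV-1 protein sequences are already available as a list of strings; the function names are hypothetical, and a scan of this naive form would need indexing to be practical against a full protein database.

```python
def nonamers(seq, k=9):
    """Set of all overlapping k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unmatched_nonamers(conserved_seq, other_proteins, k=9):
    """Nonamers of conserved_seq with no exact match in any other protein."""
    return sorted(
        nonamer for nonamer in nonamers(conserved_seq, k)
        if not any(nonamer in protein for protein in other_proteins)
    )


# Hypothetical sequences, not real database records.
print(unmatched_nonamers("ACDEFGHIKLM", ["MMMCDEFGHIKLWWW"]))
```

A conserved sequence whose nonamers are all returned by `unmatched_nonamers` has no nine-residue match outside the HIV-1 records in the searched set.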
A total of 58,052 sequences of the HIV-1 clade B proteome, over 1000 of each protein, were extracted from the NCBI Entrez Protein Database and aligned for the analysis of the evolutionary conservation and diversity (Table 1). Approximately 90% of the sequences were of the Gag, Pol, Env, and Nef proteins. The other 5 proteins almost equally shared the remaining 6513 sequences. Sequences of other clades were obtained from the HIV Sequence Database Web alignment. The clade C alignment contained almost 4000 sequences, between 300 and 600 of each protein. Clades A and D had few sequences. Duplicate sequences, either partial or full-length, were removed to eliminate the possible bias of redundant sequences derived from identical HIV-1 isolates sequenced by surveillance programs or large sequencing projects at specific sites.
a Approximate size with respect to HXB2 sequences.
b Retrieved from HIV Sequence Database Web alignment. Sequences are used for the identification of HIV-1 pan-clade sequences. Refer to materials and methods for more information.
c Retrieved from NCBI Entrez Protein Database.
Shannon's entropy methodology, commonly applied to measure differences in single amino acid residues in the alignment of protein sequences, was modified to analyze each of the 3,133 nonamer positions, overlapping by eight amino acids, that represent all putative MHC binding cores of the HIV-1 clade B proteome. The average number of each of the nonamer sequences at a given protein position depended on the alignment of the sequences taken from the NCBI Entrez Protein Database, ranging from an average of 965 aligned sequences for Vpr and Vpu, to 5,558 for Pol (Table 2). Entropy of a nonamer sequence results from change of one or more of the 20 amino acids at a single site or at multiple sites of the 9 amino acid nonamer unit, with a maximum entropy of 39 if there were all possible changes of each amino acid (log₂ 20⁹). Because these units are overlapping, an amino acid at the 9th position will eventually move to the 1st as the nonamer units shift from the N- to the C-terminus. Thus, a single variant amino acid is commonly seen in 9 overlapping nonamer sequences and the diversity of a series of nonamer units with one or more variant amino acids is typically clustered.
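Two points in the preceding paragraph can be checked with a few lines of Python: the stated maximum nonamer entropy, and the observation that a single variant residue falls within nine overlapping nonamer windows. The function name `windows_covering` and the sequence length of 100 are arbitrary choices for the illustration.

```python
import math


def windows_covering(position, seq_length, k=9):
    """1-based start positions of the k-mer windows that contain a given residue."""
    first = max(1, position - k + 1)
    last = min(position, seq_length - k + 1)
    return list(range(first, last + 1))


# Maximum entropy if all 20**9 possible nonamers were equally frequent.
print(round(math.log2(20 ** 9), 1))

# A residue in the interior of a protein lies within nine overlapping nonamers.
print(windows_covering(20, seq_length=100))
```

Residues within eight positions of either terminus are covered by fewer windows, which is why `windows_covering(3, 100)` returns only three start positions.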
The extraordinary evolutionary diversity of HIV-1 proteins was evident from the range in the entropy of the overlapping nonamer units (
The data of each nonamer sequence of the protein alignments quantitatively document the incidence (prevalence) of the primary nonamer, total variants of the primary nonamer, primary variant and number of unique variants (Table 2).
All nonamer positions (3133) of the aligned clade B database sequences were compared with the clade B consensus HXB2 sequence. Many of the HXB2 sequences, as expected, were identical to the aligned database sequences. However, the HXB2 sequences represent a selected variant strain and differ markedly at many positions from the primary nonamers of the aligned database sequences, especially in regions of high diversity.
Examples of highly conserved and highly variable nonamer sites are found among the 25 overlapping nonamer positions of Env114-122:140-148 (Table 3). The five sites of the Env114-122:118-126 were highly conserved, with entropies of 0.8 to 1.1, containing primary nonamer sequences identical to those of HXB2 and with an incidence of 86 to 89% of the ~1000 to 1600 aligned nonamer sequences at each of these sites. The remaining ~11% to 15% of the aligned nonamers of these conserved Env sites were variants of the primary nonamer, comprising 21 to 29 unique sequences, with a 4 to 6% incidence of the primary (most common) variant of all nonamers analysed per site. Beginning at position Env119-127, the sequence diversity increased with amino acids that differed at some sites from almost every amino acid of HXB2, nonamer entropy increased to as high as 9.8, and primary nonamer sequences represented as few as 49 (~2%) of the over 3000 nonamers at each of these aligned positions. Practically all of the nonamer sequences at these highly diverse sites of Env were variants of the primary sequence, with over 1000 unique sequences and fewer than 100 of the primary variant sequence at any one position.
aNote that the total number of nonamer positions analysed is different from the number of amino acids from the HXB2 sequences due to insertions and deletions in the protein alignments.
bAverage number of sequences analysed at each nonamer position (1-9, 2-10, 3-11, etc) of the protein alignments. The number of sequences varies due to the inclusion of both partial and full-length sequences.
cAverage Shannon's nonamer entropy across all nonamer positions in the protein alignment. For example, the average Gag Shannon's entropy is the mean entropy across all 504 nonamer positions in the Gag protein alignment.
dThe primary nonamer is the peptide with the highest incidence at a given nonamer position in the protein alignment.
eAverage incidence of the primary (most frequent) nonamer across all the positions in the protein alignment.
fVariants of the primary nonamers are all sequences that differ by one or more amino acids from the primary nonamer at a given nonamer position in the protein alignment.
gAverage incidence of the variants of the primary nonamer in the protein alignments.
hAverage number of different variant sequences to the primary nonamer.
iAverage incidence of the primary variant nonamer, the most highly represented variant sequence of all the nonamers analysed per nonamer position in the protein alignments.
a The total number of HIV-1 clade B protein sequences obtained at the respective nonamer positions of the protein sequence alignment. The number of sequences for each nonamer position varies due to the inclusion of both partial and full-length sequences.
b Shannon's nonamer entropy.
c The nonamer sequence corresponding to the HXB2 reference sequence. Insertions to the alignment with respect to the HXB2 sequence are shown as gaps “-”.
d The primary nonamer is the peptide with the highest incidence at a given nonamer position in the protein alignment. Residues that are identical to the HXB2 sequence are denoted as “.”, whereas residues that differ have their amino acids displayed. For example, at position 1-9 of Gag, the HXB2 sequence is identical to the primary nonamer, and thus the primary nonamer is displayed as “.........”. However, at position 22-30 in Gag, the last residue in the nonamer differs from that of HXB2, having R instead of K, and thus the nonamer sequence is shown as “........R”.
e Variants of the primary nonamers are all sequences that differ by one or more amino acids from the primary nonamer at the corresponding position in the protein alignment.
f The number of unique variants at the indicated nonamer position.
g The primary variant is the most common (highest incidence) variant nonamer at the indicated nonamer position of the protein alignment.
#This example shows a region of low entropy, positions 114-128 with entropy below 1.1, which is connected to positions 127-148, a region of high diversity (entropy above 8.0), by a transitional region of intermediate entropy.
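The dot notation used in the footnotes above to display a primary nonamer relative to HXB2 can be sketched as follows. The function name is an assumption, and the second nonamer pair in the usage lines is a hypothetical placeholder rather than a sequence taken from the tables.

```python
def dot_mask(reference, nonamer):
    """Show `nonamer` relative to `reference`: '.' where identical, residue where different."""
    if len(reference) != len(nonamer):
        raise ValueError("sequences must have the same length")
    return "".join("." if r == n else n for r, n in zip(reference, nonamer))


# Identical nonamers collapse to nine dots.
print(dot_mask("MGARASVLS", "MGARASVLS"))
# Hypothetical placeholder pair differing only in the last residue (K in the reference, R here).
print(dot_mask("AAAAAAAAK", "AAAAAAAAR"))
```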
A possible criterion for effective HIV-1 vaccine design is the consideration of the incidence of total variants to the primary nonamer. The total variants at each nonamer position represent the population of possible altered ligands that the immune system may be exposed to upon immunization with the most common or primary nonamer at the position. We thus analysed the distribution of total variants of the primary nonamer in the context of diversity across the entire HIV-1 proteome (
Highly conserved positions with low total variants (<20%) are attractive sites for selection of vaccine targets; however, such sites with a large proportion of the total variants dominated by a single primary variant should be avoided. Analysis of the incidence of primary variants for all nonamer positions across the HIV-1 proteome (FIG. 3) revealed that as total variant incidence increases there is a wide range in the fraction of the primary variant, from <1% to a maximum incidence of up to 45%, with more than 40% incidence in Gag (3 positions, <1% of all positions), Pol (19 positions, ~2%), and Env (5 positions, <1%). The shape of the plot depicts the increasing incidence of the primary variant to a maximum limited by the incidence of the total variants (zone A in the plot), after which (>50% total variant incidence) the incidence of the primary variant is further limited by the decreasing incidence of the primary nonamer (zone B), because the primary variant, the second most common peptide at a nonamer position, cannot exceed the incidence of the most common primary nonamer. Highly conserved sites with less than 20% total variants had individual primary variants with an incidence of more than 10% in Gag (15%), Pol (14%), Env (12%) and Nef (12%). The primary nonamers of low total variant sites (<20%) with a primary variant of <10% are attractive targets for HIV-1 vaccine design, and were identified and joined where possible (termed highly conserved HIV-1 clade B sequences). These comprised, for Gag, 22% or 111 of 504 total primary nonamers; Pol, 33%, 318 of 995; Vif, 14%, 25 of 184; and Env, 9%, 80 of 887 (red enclosed region in
A total of 78 highly conserved HIV-1 clade B sequences (504 total nonamers) were identified across the whole proteome (Table 4 and Table 5). The length of these peptides ranged from 9 to 40 amino acids, covering a total length of 1101 amino acids (~35%) of the complete HIV-1 proteome (~3133 aa). The structural (Env and Gag) and enzymatic (Pol) proteins contained the greatest number of conserved sequences. Pol, the most conserved HIV-1 clade B protein with the lowest average nonamer entropy of 1.8 and lowest average total variants incidence of about 30% (Table 2), had 31 conserved sequences covering ~48% of the protein length. The relatively more conserved Gag and the highly variable Env had 18 (~51% of the protein length) and 14 (~22%) conserved sequences, respectively. The remaining regulatory and auxiliary proteins contained a total of 15 conserved sequences, spanning from 12 to 38% of the individual protein length.
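The selection and joining of highly conserved nonamers can be sketched as below, using the thresholds given in the table footnotes (primary nonamer incidence above 80%, primary variant below 10%, at least 100 sequences at the position). This is a minimal sketch under stated assumptions: the per-position input format (a dict with the keys `total`, `primary`, `primary_pct` and `primary_variant_pct`), the function names, and the requirement that adjacent primary nonamers overlap by eight residues before they are joined are all choices made for illustration.

```python
def is_highly_conserved(stat, min_primary=80.0, max_primary_variant=10.0, min_seqs=100):
    """Apply the incidence and sample-size thresholds to one nonamer position."""
    return (
        stat["total"] >= min_seqs
        and stat["primary_pct"] > min_primary
        and stat["primary_variant_pct"] < max_primary_variant
    )


def join_conserved(stats_by_position, k=9):
    """Join runs of contiguous conserved nonamers into (start, sequence) tuples.

    `stats_by_position` is ordered by alignment position; `start` is 0-based.
    """
    regions = []
    start, seq = None, ""
    for pos, stat in enumerate(stats_by_position):
        nonamer = stat["primary"]
        if not is_highly_conserved(stat):
            if seq:
                regions.append((start, seq))
            start, seq = None, ""
        elif seq and seq[-(k - 1):] == nonamer[:-1]:
            # Contiguous and consistent: extend by the one new residue.
            seq += nonamer[-1]
        else:
            # First conserved nonamer of a run (or an inconsistent overlap).
            if seq:
                regions.append((start, seq))
            start, seq = pos, nonamer
    if seq:
        regions.append((start, seq))
    return regions
```

A single isolated conserved nonamer yields a 9 amino acid sequence, and each further contiguous conserved position lengthens the joined sequence by one residue.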
aApproximate size with respect to HXB2 sequences.
bTotal number of conserved sequences of 9 or more amino acids identified for each protein.
cTotal non-overlapping conserved sequence length.
a Start and end alignment positions. Such positions corresponding to the HXB2 reference sequences are indicated in the brackets, only if they differ from the alignment positions. These differences are due to insertions and deletions in the protein alignment.
b Sequences of 9 or more amino acids formed by one nonamer or by joining two or more contiguous nonamers that have primary nonamer incidence(s) of more than 80% and less than 10% representation of the primary variant. Sequences with fewer than 100 nonamers at a given nonamer position were ignored.
a Start and end alignment positions. Such positions corresponding to the HXB2 reference sequences are indicated in the brackets, only if they differ from the alignment positions. These differences are due to insertions and deletions in the protein alignment.
b The total number of HIV-1 clade B protein sequences obtained at the respective nonamer positions of the protein sequence alignment. The number of sequences for each nonamer position varies due to the inclusion of both partial and full-length sequences.
c Shannon's nonamer entropy.
d The nonamer sequence corresponding to the HXB2 reference sequence. Insertions to the alignment with respect to the HXB2 sequence are shown as gaps “-”.
e The primary nonamer is the peptide with the highest incidence at a given nonamer position in the protein alignment. Residues that are identical to the HXB2 sequence are denoted as “.”, whereas residues that differ have their amino acids displayed. For example, at position 1-9 of Gag, the HXB2 sequence is identical to the primary nonamer, and thus the primary nonamer is displayed as “.........”. However, at position 22-30 in Gag, the last residue in the nonamer differs from that of HXB2, having R instead of K, and thus the nonamer sequence is shown as “........R”.
f Total variants of the primary nonamers are all sequences that differ by one or more amino acids from the primary nonamer at the corresponding position in the protein alignment.
g The number of unique variants at the indicated nonamer position.
h The primary variant is the most common (highest incidence) variant nonamer at the indicated nonamer position of the protein alignment.
# Highly conserved clade B nonamers that are also highly conserved in clade C with a primary nonamer incidence of 80% or more, the primary variant incidence of less than 10% and 100 or more nonamers analysed at that position.
$ Highly conserved clade B nonamers that are also highly conserved in clades A and C with a primary nonamer incidence of 80% or more, the primary variant incidence of less than 10% and 100 or more nonamers analysed at that position.
+ An example interpretation of the table: The primary nonamer MGARASVLS was present in 945 sequences (~82%) of all 1156 sequences analyzed at nonamer position 1-9 in the Gag protein alignment. The remaining 211 sequences (~18%) at that position were variants of the primary nonamer and comprised 33 unique peptides, one of which is the primary variant and is present in about 10% (110) of all the 1156 analysed sequences. The remaining 101 variants at that position were represented by 36 additional variant sequences.
BLAST search of the 504 highly conserved nonamers of clade B against all reported sequences of nature revealed that two were specific to HIV-1 with no matching 9 consecutive amino acid identity, while 374 were primate lentivirus group specific, with several showing multiclade conservation (Table 6). For example, of the 504 HIV-1 clade B conserved nonamers, 330 were biclade conserved and 84 were triclade conserved (Table 6). When contiguous nonamers were joined, there were 64 biclade and 24 triclade highly conserved sequences (Table 7).
a Start and end alignment positions. Such positions corresponding to the HXB2 reference sequences are indicated in the brackets, only if they differ from the alignment positions. These differences are due to insertions and deletions in the protein alignment.
b Sequences of 9 or more amino acids formed by one nonamer or by joining two or more contiguous nonamers that have primary clade B nonamer percentage incidence(s) of more than 80% and less than 10% representation of the primary variant in the clade B and C protein alignments, respectively. Sequences with fewer than 100 nonamers at a given nonamer position were ignored. SEQ ID NOs for each peptide are identified in Table 5 and corresponding nonamers in Table 6.
A search of the HIV Molecular Immunology Database revealed that of the 78 highly conserved HIV-1 clade B sequences, 39 matched at least nine consecutive amino acids of reported human T-cell epitopes (Table 8). These epitopes were restricted by 68 HLA class I alleles and 34 class II alleles, with several promiscuous to multiple HLA alleles (HLA-supertype restricted). Twenty-one of the 39 matched conserved sequences contained the full epitope sequences. Additionally, seven of the highly conserved clade B sequences shared at least nine amino acids with ELISpot-positive peptides in HLA-DR4 transgenic mice (Table 8) (Simon et al., 2010).
EKIRLRPGGKKKYKL (B)
WASRELERF (B)
ASRELERFAVNPGLL (B)
SQNYPIVQNIQ (B)
SPRTLNAWV (B)
EEKAFSPEV (A,B,C,D)
EKAFSPEVIPMFSALSEGAT (B)
EKAFSPEVIPMFSAL (B)
KAFSPEVIPMF (B,C)
FSPEVIPMF (B,C)
PQDLNTMLNTVGGHQ (B)
LSEGATPQDL (B)
ATPQDLNTMLNT (C)
TPQDLNTML (A,B,C,D)
PQDLNTMLN (B)
DLNTMLNIV (B)
GHQAAMQML (B,C)
INEEAAEWDRV (B)
GSDIAGTTSTQEQI (B)
GSDIAGTTSTLQEQI (B)
PRGSDIAGTTSTLQEQIGWM (B)
GQMREPRGSDI (B,C)
PPIPVGEIY (B)
GLNKIVRMYSPTSIL (B)
GLNKIVRMYSPTSIL (B)
NKIVRMYSPTSILDIRQGPK (B)
GLNKIVRMY (B)
NKIVRMYSPVSILDI (A,AG)
PKEPFRDYV (B)
EPFRDYVDRFYKTLRAEQAS (B)
EPFRDYVDRF (B,D)
FRDYVDRFYK (B,D)
FRDYVDRFYKTLRAE (A,D)
RDYVDRFYKTL (B)
DYVDRFYKTLR (B)
DYVDRFYKT (B)
YVDRFYKTLRAEQASQEV (B)
YVDRFYKTL (B)
VDRFYKTLRAEQASQ (B)
DRFYKTLRA (B)
DRFYKTLRAEQ (B)
RFYKTLRAEQAS (B)
FYKTLRAEQASQE (B)
FYKTLRAEQASQ (B)
YKTLRAEQA (B)
YKTLRAEQASQ (B)
DCKTILKAL (B)
VKNWMTETLLVQNAN (B)
VKNWMTETLL (B)
NANPDCKTILRAL (C)
KMIGGIGGFI (B)
FPISPIETVP (B)
FPISPIETV (B)
PISPIETVPVKLKPGM (C)
SPIETVPVKL (C)
FWEVQLGIPHPAGLKKKK (C)
TVLDVGDAY (B)
NNETPGIRY (C)
LPQGWKGSPAI (C)
LPQGWKGSPA (B)
EKDSWTVNDIQKLVGKL (C)
KLVGKLNWA (A,B,C,D)
KLNWASQIY (B,C)
ELAENREILKEPVHGVYY (C)
EIVASCDKCQL (C)
PAETGQETAYFILKLAGR (C)
HVASGYIEA (B)
AETGQETAYY (C)
IPYNPQSQGVV (A,B,C,D)
QVRDQAEHL (C)
QMAVFIHNFK (A,B,C,D)
AVFIHNFKRK (B,CRF01_AE)
FKRKGGIGGY (B,C)
RKGGIGGYSAGERIVDII (B)
KIQNFRVYY (B,C)
KIQNFRVYYR (A,B,C,D)
LWKGEGAVVIQDNSDIKV (B)
ERAEDSGNESEGDTEELSA (C)
LWVTVYYGV (B)
LWVTVYYGVPVWKEATTTLFCA (B)
TVYYGVPVWK (A,B,C,D)
TVYYGVPVW (A,B,C,D)
TVYYGVPVWKEAKTTLF (C)
VTVYYGVPVWK (A,B,C,D)
TTLFCASDAK (A,B,C,D)
KLTPLCVTL (A,B,C,D)
WLWYIKIFI (B)
FPDWQNYTP (B)
a Start and end positions. Cross-references to the alignment positions are made with the HXB2 reference sequences; the HXB2 positions might differ from the alignment positions due to insertions and deletions in the protein alignments. HXB2 sequence positions differing from the protein alignment positions are shown within brackets.
b Highly conserved clade B sequences. SEQ ID NOs for each peptide are identified in Table 5 and for the corresponding nonamers in Table 6.
c Epitope sequences matching nine or more amino acids of the highly conserved HIV-1 clade B sequence are underlined. The clades that the epitopes are restricted to are shown in the brackets.
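The matching rule used for the epitope comparison, at least nine consecutive amino acids shared between a conserved sequence and a reported epitope, can be sketched as follows. The function names are assumptions, and the conserved sequence in the usage lines is a hypothetical stand-in, since the actual conserved sequences are listed in Table 5.

```python
def shared_kmers(conserved, epitope, k=9):
    """Return the k-mers of `epitope` that also occur in `conserved`, in epitope order."""
    conserved_kmers = {conserved[i:i + k] for i in range(len(conserved) - k + 1)}
    return [
        epitope[i:i + k]
        for i in range(len(epitope) - k + 1)
        if epitope[i:i + k] in conserved_kmers
    ]


def contains_full_epitope(conserved, epitope):
    """True when the whole epitope lies within the conserved sequence."""
    return epitope in conserved


# Hypothetical conserved sequence, used only to exercise the functions.
conserved = "AFSPEVIPMFSALSEGATPQDLNTML"
print(shared_kmers(conserved, "KAFSPEVIPMF"))
print(contains_full_epitope(conserved, "FSPEVIPMF"))
```

An epitope counts as matched when `shared_kmers` returns at least one nonamer; `contains_full_epitope` distinguishes the stricter case in which the conserved sequence covers the entire epitope.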
The disclosure of each reference cited is expressly incorporated herein.
This invention was made with funds from the U.S. government. Therefore the U.S. government retains certain rights in the invention according to the terms of grant no. R37 AI-041908.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US11/20122 | 1/4/2011 | WO | 00 | 10/17/2012 |
Number | Date | Country | |
---|---|---|---|
61292068 | Jan 2010 | US |