Information
-
Patent Application
-
20030180324
-
Publication Number
20030180324
-
Date Filed
February 12, 2003
-
Date Published
September 25, 2003
-
CPC
-
US Classifications
-
International Classifications
- C08B011/193
- C12P019/34
- A61K039/12
- C07K002/00
- C07K004/00
- C07K005/00
- C07K007/00
- C07K014/00
- C07K016/00
- C07K017/00
- A61K038/00
Abstract
The present invention relates to an immunodeficiency virus of drill monkeys, its RNA, the corresponding cDNA, proteins derived therefrom and fragments of the nucleic acids or proteins. The invention likewise relates to the diagnostic use of the nucleic acids and proteins mentioned and their fragments and to a diagnostic.
Description
[0001] The present invention relates to the immunodeficiency virus SIM27 of drill monkeys, whose RNA or a part thereof is complementary to the sequence shown below, and variants of this virus. Moreover, the viral RNA, the corresponding cDNA, proteins derived therefrom and fragments of the nucleic acids or proteins are a subject of the present invention. The invention likewise relates to the diagnostic use of the mentioned nucleic acids and proteins and their fragments, and a diagnostic comprising these nucleic acids and/or proteins and/or fragments thereof.
[0002] Primates have been developing for approximately 30 million years, which has led to a high degree of variability among the individual primate species. The New World monkeys (Platyrrhini) are differentiated from the Old World monkeys (Catarrhini), which for their part are divided into the hominoids (Hominoidea) and the cercopithecoids (Cercopithecoidea). Together with the primates, various infective agents have also developed, which have adapted to individual primate species or, for example, to a whole family. Examples are the simian pathogenic and the human pathogenic herpesviruses, which, although they can still infect individuals of another primate species, are not naturally transmitted from one primate species to another. Other viruses still infect all primates, such as the rabies virus, the yellow fever virus and the filoviruses.
[0003] Retroviruses are subdivided into the genera of the spume viruses, the T-leukemia/lymphoma viruses and the immunodeficiency viruses. A general survey of the leukemia and immunodeficiency viruses of monkeys and their pathogenicity is found in the article of Hayami (Hayami M et al., Curr. Top. Microbiol. Immunol. 1994; 188: 1-20). Spume viruses appear to occur only in monkeys. Since no pathogenicity of the spume viruses has been detected so far, these viruses are being investigated less intensively than HIV/SIV and HTLV/STLV.
[0004] HTLVs, the human T-leukemia viruses type I and type II, are structurally very similar to STLVs, the simian (monkey) T-lymphoma viruses (Franchini et al., AIDS Res Human Retrovirus 1994; 10: 1047-1060). Thus the difference between the virus species, STLV I and II, and between the viruses of man (HTLV) and monkeys (STLV) is a sign of a long individual evolution in the individual primates, if cross-transmission between the various primate species can be excluded (Franchini et al., AIDS Res Human Retrovirus 1994; 10: 1047-1060). STLV-infected monkeys occur over the entire world (Hayami M et al., Curr. Top. Microbiol. Immunol. 1994; 188: 1-20), whereas SIV-infected monkeys are found naturally only in Africa, which is an indication that SIV very probably developed later than STLV.
[0005] Molecular biology results show clearly that HIV-1 is very closely related to the immunodeficiency viruses of the chimpanzee. The latter viruses are subsequently designated as SIV-1, whereas the virus of the mangabeys, SIVsm, is designated as SIV-2. SIV-1 and HIV-1 derive with high probability from a precursor virus, just as SIV-2 and HIV-2 probably have a common precursor. Up to 25% of the monkeys in a troop can naturally be infected with SIV-2 without signs of virus pathogenesis being detectable in the infected animals (Chen Z et al., J Virol. 1996; 70: 3617-3627). In the case of SIV-2, infections in man were detected which do not differ in their pathogenesis from an HIV-2 infection. SIV-2 is closely related to HIV-2 and particularly epidemically widespread in West Africa south of the Sahara, in the same region in which the mangabeys live (Gao F L et al., Nature 1992; 358: 495-499). The results of the investigations on SIV show that in addition to the SIV-2 (SIVsm) of mangabeys the immunodeficiency viruses of the African green monkey represent a further type, perhaps SIV-3, and that in the meantime some further simian SIVs have been isolated which cannot be assigned to the groups of viruses mentioned and which probably represent the SIV type 4. This SIV-4 type is formed by the viruses of the Sykes monkeys (Cercopithecus mitis), the L'Hoest monkeys (Cercopithecus l'hoesti) (Hirsch V M et al., J. Virol. 1999; 73: 1036-1045), the red-capped mangabeys (Cercocebus torquatus torquatus) (Georges-Courbot M C et al., J. Virol. 1998; 72: 600-608), the mandrills, SIVmnd (Mandrillus sphinx) (Tsujimoto H et al., Nature 1989; 341: 539-541), and the drill monkeys (Mandrillus leucophaeus) (Clewley J P et al., J. Virol. 1998; 72: 10305-10309). All previously isolated SIV-4s can be cultured in human peripheral blood lymphocytes and some in the human permanent cell line Molt4 clone 8 (Hirsch V M et al., J. Virol. 1999; 73: 1036-1045), which indicates that the infection of man with these viruses should also be possible. The SIV-4 type is so different from the SIV-2 type that an SIVmac(SIV2)-specific p25 antigen test cannot detect SIVhoest(SIV4) produced in the supernatant of infected cells (Hirsch V M et al., J. Virol. 1999; 73: 1036-1045), as the Gag region is too divergent for recognition by monoclonal antibodies. The phylogenetic comparison of the nucleic acid sequences of the simian viruses also shows that the SIV-4 described here differs from SIV-2 and SIV-3 (Korber et al. Human Retroviruses and AIDS 1997. A compilation and analysis of nucleic acid and amino acid sequences. Los Alamos National Laboratory, New Mexico, 1998).
[0006] As described above, a virus similar to SIVcpz is possibly the precursor of the viruses causing human HIV-1 infections, as is indicated by the high similarity of the viruses of the groups HIV-1 M, N and O to SIV-1.
[0007] To date, there are no reports that humans have been infected with SIV-4. A nosocomial infection with SIV-3 or SIV-2 occurred due to contamination of the eczematous skin of a laboratory assistant (Khabbaz R F et al., N. Engl. J. Med. 1994; 330: 172-177). The SIV replicated for a certain time which was sufficient for the induction of a strong antibody response, but was not sufficient to establish a permanent infection (Khabbaz R F et al., N. Engl. J. Med. 1994; 330: 172-177). About 3.5 years after seroconversion, the laboratory assistant appeared to be free of the infection (Khabbaz R F et al., N. Engl. J. Med. 1994; 330: 172-177). Whether this path of virus elimination is the rule or whether persistent infections with corresponding pathogenesis can also result from the infective event is unknown.
[0008] Since until now no epidemiological studies on target groups in central Africa have been carried out which can show whether variant viruses such as SIV-4 also circulate in the human population, infection of man cannot be confirmed, but can also not be excluded.
[0009] As was seen in the example of the HIV-1 subtype O, antibody detection tests based on HIV-1 subtype M were not sufficiently reactive to detect all subtype O-infected patients (Simon F et al. AIDS 1994; 8: 1628-1629). An infection with an aberrant human pathogenic SIV subtype could probably also not be diagnosed, as it must be assumed that the ELISA screening tests based on HIV-1 and/or HIV-2 antigens would be negative or only slightly reactive, and that the attempt at confirmation by means of the immunoblot would produce a negative or probably questionable result. The diagnosis could probably also not be made by means of nucleic acid tests, since with the presently available tests, for example, neither the nucleic acid of the viruses of group O nor that of HIV-2 can be reliably amplified (Gürtler L et al., 12th World AIDS Conference Geneva Basic Science 1: 121-124).
[0010] The drill monkeys described here (Mandrillus leucophaeus) are animals which originate from the western region of Cameroon bordering Nigeria and live wild there in the bushland. Drill monkeys have become widespread in the central West-African region. The animals are hunted and eaten, which is why the stock in recent years has continuously decreased. Young animals are in some cases picked up and kept in the vicinity of the houses as pets. The monkey 27 described here (3 years old) was captured from a free hunting reserve and then domesticated over the course of a year and has had no contact with other monkeys of the same or of a similar species.
[0011] As described in Example 2, the virus originating from monkey 27 was replicated in human PBLs. Genomic DNA and thus also integrated proviral DNA of the SIV was isolated from the infected cells. The deciphering of the sequence of the total genome of the SIV is described in Example 3. The PCR (polymerase chain reaction) method was employed for the multiplication of the viral DNA. The components needed for carrying out the process can be acquired commercially.
[0012] Using this process, it is possible to amplify DNA sequences if DNA regions of the sequence to be amplified are known, or known sections are sufficiently similar. Short complementary DNA fragments (oligonucleotides = primers) which anneal to a short region of the nucleic acid sequence to be amplified must then be synthesized. For carrying out the test, nucleic acids are combined with the primers in a reaction mixture which additionally contains a polymerase and nucleotide triphosphates. The polymerization (DNA synthesis) is carried out for a specific time, then the nucleic acid strands are separated by heating. After cooling, the polymerization starts again.
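The cyclic character of the process described in paragraph [0012] can be illustrated with a minimal sketch. It assumes the usual textbook idealization that each cycle multiplies the number of target copies by (1 + efficiency); the function name `pcr_copies` and the `efficiency` parameter are illustrative assumptions and are not taken from the application.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a given number of PCR cycles.

    Assumption (not from the application): every cycle multiplies the copy
    number by (1 + efficiency), where efficiency = 1.0 means perfect doubling.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One template molecule, 30 cycles, perfect doubling (upper bound only).
    print(pcr_copies(1, 30))
```

In practice the yield stays below this upper bound, because reagents are consumed and the efficiency falls in late cycles.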
[0013] The amplified genome sections were sequenced by the Sanger method. As described in Example 4, the genome of SIM27 was subjected to phylogenetic comparisons which showed that it is a strongly divergent novel simian immunodeficiency virus.
[0014] The present invention therefore relates to:
[0015] 1.) Immunodeficiency viruses which branch off as a side branch from the SIM27 side branch after the branching of SIM27 in a phylogenetic investigation of their total genome at the nucleic acid level, as is described in Example 4 (see FIG. 1).
[0016] 2.) GAG proteins and fragments thereof which branch off as a side branch from the SIM27 side branch after the branching of SIM27 in a phylogenetic investigation of their total sequence at the amino acid level, as is described in Example 4 (see FIG. 2).
[0017] 3.) Pol proteins and fragments thereof which branch off as a side branch from the SIM27 side branch after the branching of SIM27 in a phylogenetic investigation of their total sequence at the amino acid level, as is described in Example 4 (see FIG. 4), or a POL protein fragment or subfragments thereof which branch off as a side branch from the SIM27 side branch after the branching of SIM27 in the region of the sequence including this amino acid sequence, published by Clewley (Clewley J P et al., J. Virol. 1998; 72: 10305-10309), as has been investigated as described in Example 4 (see FIG. 6).
[0019] 4.) ENV proteins and fragments thereof which branch off as a side branch from the SIM27 side branch after the branching of SIM27 in a phylogenetic investigation of their total sequence at the amino acid level, as is described in Example 4 (see FIG. 7).
[0020] Of particular interest is furthermore the consideration of the strongly immunogenic cysteine loop region in the Env gene, which is therefore of particular diagnostic importance. The cysteine loop regions of various immunodeficiency viruses are shown in Table 1.
TABLE 1
SIM27.ENV               RLTALEEYVADQSRLAVWGCSFSQVCHTNVKW
SIV-Mandrill, MNDGB1    RLTSLENYIKDQALLSQWGCSWAQVCHTSVEW
HIV1-N, YBF30           KVLAIERYLRDQQILSLWGCSGKTICYTTVPW
HIV1-C, 96bw05.02       RTLAVERYLKDQQLLGIWGCSGKLICTTAVPW
HIV1-O, ANT70C          RLLALETLLQNQQLLSLWGCKGKLVCYTSVKW
SIV-CPZ, CPZGAB         RLLAVERYLQDQQILGLWGCSGKAVCYTTVPW
HIV1-O, MVP5180         RLQALETLIQNQQRLNLWGCKGKLICYTSVKW
SIV-lhoesti             RLTALEEYVKHQALLASWGCQWKQVCHTNVEW
SIV-SYKES               RLTALETYLRDQATLSNWGCAFKQICHTAVTW
SIV-CPZ, CPZANT         RMLAVEKYLRDQQLLSLWGGADKVTCKTTVPW
SIV-CPZ-US              RVLAVERYLKDQQILGLWGCSGKTTCYTTVPW
HIV1-F, 93br020.1       RVLAVERYLKDQQLLGLWGCSGKLICTTNVPW
HIV1-A, 92ug037         RVLAVERYLRDQQLLGIWGCSGKLICPTNVPW
HIV1-H, 90cr056         RVLAVERYLRDQQLLGTWGCSGKLICTTNVPW
HIV1-D, NDK             RVLAVERYLRDQQLLGIWGCSGRHICTTNVPW
HIV2-B, UC1             RVTAIEKYLKDQALLNSWGCAERQVCHTTVPW
SIV-D, MNE              RVTAIEKYLXDQAQLNAWGCAERQVCHTTVPW
SIV-D, MM239            RVTAIEKYLKDOAQLNAWGCAFRQVCHTTVPW
SIV, SME543             RVTAIEKYLKDQAQLMSWGCAFRQVCHTTVPW
SIV-D, SMM-PBJ-6P9      RVTAIEKYLKDQAQLNSWGCAERQVCHTTVPW
SIV-D, STM              RVTAIEKYLKDQAQLNSWGCAERQVCHTTVPW
HIV2-A, CAM             RVTAIEKYLKDQAQLNSWGCAERQVCHTTVPW
HIV2-A, GH1             RVTAIEKYLKDQAQLNSWGCPFRQVCHTTVPW
HIV2-B, EHO             RVTAIEKYLKDQAQLNSWGCAFRQVCHTTVPW
SIV-SMM, PGM            RVTAIEKYRKDQAQLNSWGCAFRQVCHTTVPW
SIV-VERVET, AGM155      RVTALEKYLAZQARLNAWGCSWKQVCHTTVPW
SIV-VERVET, AGM3        RVTALEKYLEDQARLNAWGCAWKQVCHTTVPW
SIV-SABAEUS, AGMSAB1    RVTALEKYLEDQARLNIWGCAFRQVCHTTVLW
SIV-VERVET, AGMTY6      RVTALEKYLEDQARLNSWGCAWKQVOHTTVEW
SIV-GRIVET, AGM677A     RVTALEKYLEDQARLNSWGCAWKQVCHTTVPW
SIV-VERVET, REV         RVTALEKYLEDQARLNVWGCAWKQVCHTTVPW
SIV-TANTALUS, TAN1      RVTALEKYLEDQTRLNLWGCAFKQVCHTTVPW
[0021] As can be clearly seen, either lysine or arginine occurs in position 3 of the cysteine loop (C12345C) in nearly all representatives of the immunodeficiency viruses. The only exception up to now was found in the immunodeficiency virus MNDGB1, which was isolated from a mandrill (Mandrillus sphinx). It is therefore to be assumed with great probability that antibodies formed against this modified epitope are not recognized, or are recognized with clearly decreased efficiency, by the diagnostic tests known up to now, which are based on the customary arginine- or lysine-containing antigens.
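The reading of position 3 of the cysteine loop can be reproduced with a short sketch. It assumes that the loop is the first stretch of the form C12345C in each sequence (two cysteines separated by exactly five residues); the helper names `loop_position_3` and `is_customary` are illustrative assumptions, and only four rows of Table 1 are copied here.

```python
import re

# Cysteine loop as written in the text: C12345C (five residues between cysteines).
CYS_LOOP = re.compile(r"C([A-Z]{5})C")


def loop_position_3(sequence: str):
    """Return the residue at position 3 of the first C12345C loop, or None."""
    match = CYS_LOOP.search(sequence)
    return match.group(1)[2] if match else None


def is_customary(sequence: str) -> bool:
    """True if position 3 is lysine (K) or arginine (R)."""
    return loop_position_3(sequence) in ("K", "R")


# Four rows copied from Table 1 (isolate name -> Env fragment).
TABLE_1_EXCERPT = {
    "SIM27.ENV": "RLTALEEYVADQSRLAVWGCSFSQVCHTNVKW",
    "SIV-Mandrill, MNDGB1": "RLTSLENYIKDQALLSQWGCSWAQVCHTSVEW",
    "HIV1-N, YBF30": "KVLAIERYLRDQQILSLWGCSGKTICYTTVPW",
    "HIV2-B, UC1": "RVTAIEKYLKDQALLNSWGCAERQVCHTTVPW",
}

if __name__ == "__main__":
    for name, seq in TABLE_1_EXCERPT.items():
        print(name, loop_position_3(seq), is_customary(seq))
```

For these four rows the sketch yields serine for SIM27, alanine for MNDGB1, and lysine or arginine for the two HIV entries.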
[0022] This invention therefore likewise relates to antigens in which arginine and/or lysine within the cysteine loop region in position 3 has been replaced by any desired amino acid, particularly preferably a polar amino acid such as serine or an amino acid having an aliphatic side chain such as alanine.
[0023] The present invention is moreover described in the examples and in the patent claims; the examples serve for illustration, and no restriction of the present invention is to be derived from them.
EXAMPLE 1
[0024] Identification of the SIM27 infection in drill monkeys
[0025] In the course of a study, EDTA blood was taken from drill monkeys in the villages of rural Cameroon, in which they were kept, and this was analyzed in various HIV tests. On testing the serum of the monkey SIM27 for antibodies, a competitive ELISA for HIV-1 was negative and an ELISA from Dade Behring (Enzygnost HIV-1/2 plus) recognizing HIV-1, -2 and -O was likewise negative, the extinction lying near the threshold value. In the analysis of the HIV-1 Western blot (virus MVP899-87) which was carried out at the same time, no virus-specific bands were to be seen, in the HIV-2 blot (virus MVP11971-87), the band gp36 was to be seen strongly, and the bands p55 and p68 were to be seen, and in the HIV-1 group O blot (virus MVP5180-91), the bands p24 and p55 were to be seen. Gp36 is the transmembrane protein of HIV-2, the bands p55 and p68 correspond to the reverse transcriptase (p55) plus the RNaseH (p68) of HIV-2, and p24 is the inner core protein of HIV-1 group O viruses and p55 the precursor protein of gag and thus also p24. 20 ml of plasma from the animals were employed in order to develop the Western blot. According to the analysis of the nucleic acid sequence, the virus MVP11971-87 is a representative of the group HIV-2A, the virus MVP899 a representative of HIV-1B.
[0026] The SIV infection of the monkeys with the drill virus is thus distinguished:
[0027] by negativity in normal screening ELISAs for HIV antibodies,
[0028] by serological cross reaction in the env and pol region with the HIV-2 transmembrane glycoprotein and the reverse transcriptase in the Western blot,
[0029] by serological cross reaction in the gag region with the inner core protein of HIV-1 group O and absent cross reaction with the core proteins of group M (HIV-1B) in the Western blot.
EXAMPLE 2
[0030] Isolation of the SIM27 virus
[0031] The lymphocyte fraction was isolated by Ficoll gradient centrifugation from 5 ml each of EDTA blood of the monkeys. The lymphocytes were stimulated with PHA (phytohemagglutinin, 5 mg/ml) and PMA (myristylphorbol ester, 10 ng/ml). After 3 days both additives were washed out and the culture was continued in RPMI-1640, as usual, with addition of interleukin-2. The PMA stimulation was described by Kubo et al. (Kubo M et al., J. Virol 1997; 71: 7560-7566).
[0032] The culture conditions were similar to those which have been described by Tamalet et al. (Tamalet, C. et al., AIDS 1994; 1083-1088). After one week in culture, human PHA-stimulated and nonstimulated blood lymphocytes (PBLs) were added to the monkey lymphocytes and the addition was repeated once weekly until it was possible after about 3 weeks to detect beginning SIV production by means of a commercially obtainable p24 antigen test (Abbott, Wiesbaden).
[0033] The virus was then subcultured on human lymphocytes from the supernatant of the cells. All attempts to transfer the SIM27 to permanent culture cells such as HUT-78 or Jurkat have failed up to now. By means of monthly subculturing, it was possible to keep SIM27 on PBL in culture for 9 months from then on.
EXAMPLE 3
[0034] DNA isolation, amplification and structural characterization of genome sections of the HIV isolate SIM27
[0035] Genomic DNA from SIM27-infected blood lymphocytes was isolated by standard methods (Current Protocols in Molecular Biology, Wiley Interscience, 1994).
[0036] The total genome was amplified exclusively by means of PCR (polymerase chain reaction). All PCRs were begun by means of “Hot Start”: all components of the PCR except the polymerase were combined, and the polymerase was added only after the sample had been heated to 94° C., which strongly reduces the extension of nonspecifically binding primers.
[0037] A general survey of the individual stages of the deciphering of the genome is shown in FIG. 8.
[0038] For the characterization of genome regions of the isolate SIM27, PCR experiments were carried out with primer pairs from the region of the integrase in the pol gene. The PCR (Saiki et al., Science 239: 487-491, 1988) was modified as follows:
[0039] For the first amplification of HIV-specific DNA regions, 5 μl (200 μg/ml) of genomic DNA from SIM27-infected blood lymphocytes were pipetted into a 50 μl reaction mixture (0.25 mM dNTP, 1 μM each primer, 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.001% gelatin, 2.5 units platinum-Taq DNA polymerase (Gibco)) and amplified according to the following temperature program:
[0040] 1) initial denaturation: 3 min. 95° C.,
[0041] 2) amplification: 30 sec. 94° C., 30 sec. 49° C., 30 sec. 68° C. (30 cycles).
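The quantities implied by this reaction set-up can be worked out with a small sketch. It uses only the figures stated above (5 μl of DNA at 200 μg/ml; 3 min initial denaturation; 30 cycles of three 30-second steps); the function names are illustrative assumptions, and the computed duration counts hold times only and ignores the ramp times of the thermocycler.

```python
def template_mass_ug(volume_ul: float, concentration_ug_per_ml: float) -> float:
    """Mass of template DNA in micrograms for a given volume and concentration."""
    return volume_ul / 1000.0 * concentration_ug_per_ml


def program_hold_seconds(initial_denaturation_s: int, step_seconds, cycles: int) -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial_denaturation_s + cycles * sum(step_seconds)


if __name__ == "__main__":
    mass = template_mass_ug(5, 200)                       # genomic DNA per reaction
    total = program_hold_seconds(3 * 60, (30, 30, 30), 30)
    print(f"template: {mass:.1f} ug, hold time: {total / 60:.0f} min")
```

Under these figures the first PCR contains about 1 μg of genomic DNA, and the programmed hold times add up to 48 minutes.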
[0042] The primers used for the PCR had the following sequence:
[0043] (Seq. ID No. 1 and 2)
[0044] 5′Spol2380agm GCC ATG TGT CCA AAA TGT CA
3pol2930agm CTT CTC TGT AGT AGA CTC TA
[0045] 5 μl of the amplificate were employed as a template for a second nested PCR with the following primers and the same temperature profile:
[0046] (Seq. ID No. 3 and 4)
[0047] Spol2460agm TAG TAG CAG TCC MYR KWG (M=A/C, Y=C/T, R=A/G, K=G/T, W=A/T)
3pol2760agm TCT CTA ATT TGT CCT ATG AT
[0048] The amplificate thus obtained was sequenced directly without cloning.
[0049] The sequence found is shown in Table 2.
TABLE 2
  1 AGTAGCAGTC CATGTAGCCA GTGGATACCT AGAGGCAGAA GTAATACCAG
 51 CAGAGACAGG AAAAGAGACA GCACATTTCC TGTTAAAGTT AGCAGGCAGG
101 TGGCCTGTAA AACATTTAGA CACTGACAAT GGCCCCAACT TTGTCAGTGA
151 AAAGGTAGCC ACAGTCTGTT GGTGGGCTCA AATAGAGCAC ACCACAGGTG
201 TACCCTATAA CCCCCAGAGT CAGGGAGTAG TGGAAGCAAA GAATCATCAT
251 CTTAAGACAA TCATAGGACA AATTAGAGA
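The relation between the nested primers and the sequence of Table 2 can be checked with a short sketch. It translates the degenerate codes listed with the primer (M, Y, R, K, W) into a regular expression and compares the reverse primer as its reverse complement. The names `primer_pattern` and `reverse_complement` are illustrative assumptions. The observation that the listed sequence begins at the second base of the forward primer is a reading of the printed data, not a statement of the application.

```python
import re

# Degenerate codes exactly as listed with the nested forward primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "Y": "[CT]", "R": "[AG]", "K": "[GT]", "W": "[AT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove the spacing used in the printed sequences."""
    return seq.replace(" ", "").upper()


def primer_pattern(primer: str):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in clean(primer)))


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


FORWARD_NESTED = "TAG TAG CAG TCC MYR KWG"      # Spol2460agm
REVERSE_NESTED = "TCT CTA ATT TGT CCT ATG AT"   # 3pol2760agm
TABLE2_FIRST_LINE = "AGTAGCAGTC CATGTAGCCA GTGGATACCT AGAGGCAGAA GTAATACCAG"
TABLE2_LAST_LINE = "CTTAAGACAA TCATAGGACA AATTAGAGA"

# Table 2 starts at the second base of the forward primer, so drop its first base.
forward_ok = primer_pattern(FORWARD_NESTED[1:]).match(clean(TABLE2_FIRST_LINE)) is not None
# The reverse primer should appear as its reverse complement at the 3' end.
reverse_ok = clean(TABLE2_LAST_LINE).endswith(reverse_complement(REVERSE_NESTED))

if __name__ == "__main__":
    print("forward primer fits 5' end:", forward_ok)
    print("reverse primer fits 3' end:", reverse_ok)
```

Both checks succeed on the printed data, which suggests that the two ends of the Table 2 sequence correspond to the nested primer sites.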
[0050] Based on the publication of Clewley (Clewley J P et al., J. Virol 1998; 72: 10305-10309), a further amplificate was obtained in the 5′ region of the pol gene. The primers DR1, DR2 and, for the nested PCR, DR4 and DR5 described by Clewley were used, as well as the temperature cycles described in this publication. Taq DNA polymerase (Perkin Elmer) was used with the buffers described above.
[0051] The sequence according to Table 3 was obtained here:
TABLE 3
  1 GGGATTCCGC ANCCGGCAGG TCTAAAACAA TGTGAACAGA TCAGAGTATT
 51 GGATATAGGA GATGCCTATT TTTCATGCCC ATTGGATGAG GACTTTAGAA
101 AGTATACTGC ATTCACCATT CCATCGGTGA ATAATCAGGG GCCCAGGAAT
151 CAGATACCAG TATAATGTCC TCCCNCAGGG NTGGAAGGGG TCCCC
[0052] In a next amplification, the region of SIM27 lying between the amplificates already obtained was amplified. The primers mentioned below were used here.
[0053] For the first PCR:
[0054] (Seq. ID No. 5 and 6)
[0055] 1216 ATG CCC ATT GGA TGA GGA C
1197 GAC TGT GGC TAC CTT TTC ACT
[0056] For the nested PCR:
[0057] (Seq. ID No. 7 and 8)
[0058] 1218 CAT CGG TGA ATA ATC AGG
1226 GGT ATT ACT TCT GCC TCT A
[0059] The platinum-Taq DNA polymerase (Gibco) was used according to the following temperature program:
[0060] 1) initial denaturation: 2 min. 95° C.,
[0061] 2) amplification: 30 sec. 95° C., 30 sec. 55° C., 150 sec. 68° C. (30 cycles).
[0062] The sequence according to Table 4 was obtained here.
TABLE 4
|
|
1CATCGGTGAA TAATCAGGGC CCAGGAATCA GATACCAGTA TAATGTCCTC
|
51CCACAGGGAT GGAAAGGCTC TCCAGCAATT TTTCAGGCAA CAGCTGATAA
|
101AATCTTGAAA ACATTCAAAG AAGAATACCA GAGGTATTAA TTTATCAGTA
|
151TATGGATGAT CTGTTCGTGG GAAGTGACTT AAATGCCACT GAACATAACA
|
201AAATGATAAA CAAGTTGAGA GAGCATCTGA GATTCTGGGG GCTCGAGACC
|
251CCAGATAAGA AGTTTCAAAA GGAACCTCCT TTTGAATGGA TGGGATATGT
|
301GCTACACCCA AAGAAATGGA CAGTGCAGAA AATACAACTA CCAGAAAAAG
|
351AGCAATGGAC AGTGAATGAT ATTCAGAAAT TGGTAGGAAA ACTTAATTGG
|
401GCAAGTCAGA TATATTCCGG AATTAAAACA AAAGAGCTCT GTAAATTGAT
|
451CAGAGGAGGA AAACCTCTAG ATGAAATAGT AGAATGGACA AGAGAAGCAG
|
501AATTAGAGTA TGAAGAGAAT AAGATAATAG TGCAGGAGGA GGTGCATGGA
|
551GTGTACTATC AGCCAGAAAA ACCACTGATG GCAAAAGTAC AAAAGTTGAC
|
601ACAAGGACAG TGGAGTTATC AAATAGAGCA AGAAGAAAAC AAACCTCTCA
|
651AGGCAGGAAA ATATGCCAGG ACAAAGAATG CCCACACAAA TGAGTTAAGG
|
701ACACTTGCAG GGTTAGTACA AAAAATAGCC AAGGAATGCA TAGTAATCTG
|
751GGGAAGATTG CCAAAATTTT ACCTCCCCTT GGAGAGAGAA GTATGGGATC
|
801AATGGTGGCA TGATTATTGG CAGGTAACAT GGATCCCAGA GTGGGAATTC
|
851ATCTCAACAC CACCATTGAT AAGGCTATGG TACAACCTCC TGAAAGAACC
|
901AATTCCAGGA GAAGATGTAT ACTATGTACA TGGGGCAGCT AACAGAAATT
|
951CTAAAGAAGG CAAGGCAGGA TACTATACAG CAAGGGGCAA AAGTAAGGTA
|
1001ATAGCTTTAG AAAATACAAC CAATCACAAG GCAGACCTGA AGGCAATAGA
|
1051ATTAGCCCTA AAAGATTCAG GACCAAGAGT AAACATAGTA ACAGATTCAC
|
1101AGTATGCATT AGGCATACTC ACAGCATCCC CAGATCAGTC AGATAACCCC
|
1151ATAGTTAGGG AAATAATTAA CCTCATGATA GCCAAGGAAG CAGTCTACCT
|
1201GTCATGGGTA CCAGCCCACA AGGGTATAGG AGGTAACGAA CAAATAGACA
|
1251AATTAGTAAG CCAAGGAATT AGGCAAGTAC TATTCCTGGA AGGAATAGAC
|
1301AGAGCTCAGG AAGAACACGA CAAATATCAT AACAACTGGA GAGCTTTAGC
|
1351TCACGAATTC AGCATACCTC CTATAGTGGC AAAAGAGATA GTTGCACAAT
|
1401GCCCAAAATG CCAGATAAAA GGGGAACCTA TTCATGGCCA GGTAGATGCA
|
1451AGTCCTGGGA CATGGCAAAT GGATTGCACC CATCTAGAAG GAAAGGTCAT
|
1501CATAGTGGCA CTCCATGTAG CCAGTGGATA CCTACAGGCA GAAGTAATAC
|
1551C
|
[0063] The region of the total sequence of the 5′-LTR region of the genome up to the pol gene was amplified with the following primer pairs:
[0064] 1. PCR:
[0065] (Seq. ID No. 9 and 10)
[0066] 1248 CTC AAT AAA GCT TGC CTT GA
1217 GTC CTC ATC CAA TGG GCA T
[0067] 2. Nested PCR:
[0068] (Seq. ID No. 11 and 12)
[0069] 1249 TRD CTA GAG ATC CCT CAG A (R=A/G, D=G/A/T)
1219 CCA ATA CTG TGA TCT GTT CAC
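A degenerate primer such as 1249 stands for a mixture of concrete oligonucleotides. The following minimal sketch enumerates them, reading R as A/G (the standard IUPAC meaning) and D as G/A/T as stated with the primer; the function name `expand_primer` is an illustrative assumption.

```python
from itertools import product

# Only the codes needed for primer 1249; R = A/G, D = G/A/T.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "D": "AGT"}


def expand_primer(primer: str):
    """List every unambiguous sequence covered by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    return ["".join(choice) for choice in product(*(CODES[b] for b in bases))]


PRIMER_1249 = "TRD CTA GAG ATC CCT CAG A"
variants = expand_primer(PRIMER_1249)

if __name__ == "__main__":
    for variant in variants:
        print(variant)
```

With two choices at the R position and three at the D position, the primer covers six distinct 19-mers that differ only in their second and third bases.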
[0070] The platinum-Taq DNA polymerase (Gibco) was in each case used according to the following temperature program:
[0071] 1) initial denaturation: 2 min. 95° C.,
[0072] 2) amplification: 30 sec. 95° C., 30 sec. 50° C., 180 sec. 68° C. (30 cycles). 1× enhancer (Gibco) was used in addition to the buffers indicated above.
[0073] The sequence according to Table 5 was obtained here:
TABLE 5
|
|
1TRDCTAGAGA TCCCTCAGAT TTGTGCCAGA CTTCTGATAT CTAGTGAGAG
|
51TAGAGAAAAA TCTCCAGGAG TGGCGCCCGA ACAGGGACTT GACGAAGAGC
|
101CAAGTCATTC CCACCTGTGA GGGACAGCGG CGGCAGCGRG CCGGACCGAC
|
151CCACCCGGTG AAGTGAGTTA ACCAAGGAGC CCCGACGCGC AGGACACAAG
|
201GTAAGCGCTG CACCGTGCTG TAGTGAGTGT GTGTCCAGGA TCCGCTTGAG
|
251CAGGCGAGAT CGCCGAGGCA ACCCCACTAG AAAAAGAAAA GAGGGGAAGT
|
301AAGGCCGAGG CAAAGTGAAA GTAAAAGAGA TCCTCTGAGA AGAGGAACAG
|
351GGGGCAATAA AATTGGCGCG AGCGCGTCAG GACTTAGGGG AAGAGAATTG
|
401GATGAGCTGG AAAAGATTAC GTTACGGCCC TCCGGAAAGA AAAAATACCA
|
451GCTAAAACAT GTGATATGGG TAAGCAAGGA ACTAGATAGA TTTGGCCTAC
|
501ATGAAAAGTT GTTAGAAACC AAGGAAGGAT GCGAAAAAAT TCTTAGCGTA
|
551CTCTTTCCTC TAGTTCCTAC AGGGTCAGAA AATTTAATTT CGCTGTACAA
|
601CACCTGCTGT TGCATTTGGT GCCTACATGC GAAAGTGAAA GTAGCAGATA
|
651CAGAAGAGGC AAAAGAGAAA GTAARACAAT GCTACCATCT AGTGGTTGAA
|
701AAACAGAATG CAGCCTCAGA AAAAGAAAAA GGAGCAACAG TGACACCTAG
|
751TGGCCACTCA ARAAATTACC CCATTCAGAT AGTAAATCAA ACCCCAGTAC
|
801ACCAGGGAAT TTCTCCCAGA ACACTGAATG CTTGGGTAAA ATGTATAGAG
|
851GAGAAGAAAT TCAGCCCAGA AATAGTGCCT ATGTTCATAG CTTTGTCAGA
|
901AGGATGCCTC CCATACGACC TCAACGGCAT GCTCAATGCC ATTGGGGACC
|
951ATCAGGGAGC TCTCCAAATA GTGAAAGATG TCATCAATGA CGAAGCTGCA
|
1001GACTGGGATC TTAGACATCC TCAGATGGGG CCTATGCCCC AAGGGGTGCT
|
1051AAGAAACCCA ACAGGGAGTG ACATAGCAGG AACCACCAGC AGCATAGAAG
|
1101AACAAATTGA ATGGACAACT AGGCAGCAAG ATCAGGTAAA TGTAGGAGGA
|
1151ATTTACAAAC AATGGATAGT TCTGGGATTG CAAAAATGTG TGAGCATGTA
|
1201CAATCCAGTG AATATTCTAG ATATAAAACA GGGACCAAAA GAACCCTTTA
|
1251AGGACTATGT GGATCGATTT TACAAAGCTC TGCGGGCGGA GCGAACAGAT
|
1301CCACAAGTGA AAAACTGGAT GACGCAGACA TTGCTCATCG AGAATGCAAA
|
1351CCCAGATTGT AAAGCCATTC TTAAGGGATT AGGCATGAAC CCCACCTTGG
|
1401AAGAAATGTT ATTGGCATGT CAAGGAGTAG GGGGACCAAA GTATAAAGCT
|
1451CAAATGATGG CAGAAGCAAT GCAGGAGGTG CAAGGAAAAA TTATGATGCA
|
1501AGCCTCGGGA GGACCACCGC GGGGTCCCCC AAGGCAGCCA CCCAGAAATC
|
1551CTAGATGCCC CAACTGTGGA AAGTTTGGAC ATGTACTGAG AGACTGTAGA
|
1601GCCCCAAGAA AGCGAGGATG CTTCAAGTGT GGAGATCCAG GACATCTGAT
|
1651GAGAAACTGC CCAAAGATGG TGAATTTTTT AGGGAATGCT CCYTGGGGCA
|
1701GTGGCAAAGC GAGGAACTTT CCTGCCGTGC CACTGACCCC AACGGCACCC
|
1751CCGATGCCAG GATTAGAGGA YCCAGCAGAG ARGATGCTRC TGGATTACAT
|
1801GAAGAAGGGG CAACAGATGA AGGCAGAGAG GGAAGCCAAA CGGGAGAAGG
|
1851ACAAAGGCCC TTACCAGGCG GCTTACAACT CCCTCAGTTC TCTCTTTGCA
|
1901ACAGACCAAC TACAGTAGTA GAGATAGAGG GGCAAAAAGT GGAGGCCCTA
|
1951CTAGATACAG GAGCAGATGA CACAGTAATC AAAGATTTAC AATTAACAGG
|
2001CAATTGGAAA CCACAAATCA TAGGAGGAAT TGGAGGAGCA ATTAGGGTAA
|
2051AGCAATATTT CAATTGTAAA ATAACAGTGG CAGGTAAAAG CACTCATGCT
|
2101TCAGTACTAG TGGGCCCCAC TCCTGTAAAT ATTATAGGTA GAAATGTACT
|
2151TAAAAAGTTA GGATGTACTT TGAACTTCCC TATTAGTAAR ATAGAAACAG
|
2201TAAAGGTAAC ACTAAAACCA GGAACTGATG GACCAAGAAT CAAACAGTGG
|
2251CCACTGTCTA AAGAAAAGAT TTTAGCCTTA CAAGAAATAT GCAATCAGAT
|
2301GGAAAAAGAA GGCAAAATCT CTAGAATAGG TCCAGAAAAT CCTTACAACA
|
2351CACCAGTGTT TTGTATAAAA AAGAAAGATG GAGCCAGCTG GAGAAAACTG
|
2401GTAGATTTTA GACAATTGAA TAAAGTGACA CAGGATTTCT TTGAGGTGCA
|
2451GCTAGGAATC CCACATCCTG GAGGTCTAAA ACAATGTGAA CAGATCACAG
|
[0074] The still missing region of the total sequence from the integrase up to the 3′-LTR was amplified with the following primer pairs, the primer 1270 being derived from the sequence of the 5′-LTR region (prior amplificate):
[0075] 1. PCR:
[0076] (Seq. ID No. 13 and 14)
[0077] 1246 CCT ATT CAT GGC CAG GTA
1270 GAT TTT TCT CTA CTC TCA CTA
[0078] 2. Nested PCR:
[0079] (Seq. ID No. 15 and 16)
1196 AGT GAA AAG GTA GCC ACA GTC
1270 GAT TTT TCT CTA CTC TCA CTA
[0080] The platinum-Taq DNA polymerase (Gibco) was in each case used according to the following temperature program:
[0081] 1) initial denaturation: 2 min. 95° C.,
[0082] 2) amplification: 30 sec. 95° C., 30 sec. (47° C. 1.PCR; 51° C. 2.PCR), 360 sec. 68° C. (30 cycles). 1× enhancer (Gibco) was used in addition to the buffers indicated above.
[0083] The sequence according to Table 6 was obtained here:
TABLE 6
|
|
1AGTGAAAAGG TAGCCACAGT CTGTTGGTGG GCTCAAATAG AGCACACCAC
|
51AGGTGTACCC TATAACCCCC AGAGTCAGGG AGTAGTGGAA GCAAAGAATC
|
101ATCATCTTAA GACAATCATA GAACAAGTTA GGGATCAAGC AGAAAAATTA
|
151GAAACAGCAG TACAAATGGC ASTATTAATA CACAATTTTA AAAGAAAAGG
|
201GGGGATAGGG GAGTATAGTC CAGGAGAAAG AATAGTAGAT ATCATAACCA
|
251CAGACATTCT AACAACTAAA TTACAACAAA ATATTTCAAA AATTCAAAAT
|
301TTTCGGGTTT ATTACAGAGA AGGAAGGGAT CAACAGTGGA AAGGACCAGC
|
351AGAACTCATT TGGAAAGGAG AAGGCGCTGT GGTGATTAAA GAAGGGACAG
|
401ACTTAAAGGT GGTACCAAGA AGAAAAGCCA AAATCATCAG AGATTATGGA
|
451AAAGCAGTGG ATAGTAATTC CCACATGGAG AGTAGAGAGG AATCAGCTTG
|
501AGAAATGGAA TTCATTAGTA AAATATCATA AATATAGGGG AGAAAAATAC
|
551CTAGAAAGAT GGGAACTATA CCACCATTTC CAATGCTCGG GGTGGTGGAC
|
601ACACTCTAGA AAAGATGTTT ACTTTAAAGA TGGCTCAGTA ATAAGCATTA
|
651CTGCCTTCTG GAATCTTACC CCAGAGAAAG GATGGTTGTC TCAATATGCA
|
701GTTACAATAG AATATGTAAA AGAAAGCTAT TATACTTACA TAGACCCAGT
|
751TACAGCAGAC AGAATGATTC ATTGGGAATA TTTCCCATGT TTTACAGCCC
|
801AGGCTGTGAG AAAAGTACTG TTTGGAGAAA GACTAATAGC TTGCTACAGC
|
851CCCTGGGGAC ACAAAGGACA GGTAGGGACT CTACAATTCC TGGCTTTGCA
|
901AGCTTACCTT CAGTATTGTA AACATGGCAG AAAGAGCACC AGAAGTGCCG
|
951GAAGGGGCAG GAGAGATACC TCTAGAACAG TGGCTAGAAA GATCATTAGA
|
1001ACAACTCAAC AGAGAGGCCC GGTTACACTT CCACCCAGAG TTCCTTTTCC
|
1051GTCTTTGGAA CACTTGTGTA GAACATTGGC ATGATAGACA CCAGAGGAGC
|
1101CTGGAGTATG CAAAATACAG ATATCTTTTG TTGGTGCATA AGGCCATGTT
|
1151TACCCATATG CAACAGGGAT GCCCATGTAG AAATGGGCAC CCAAGAGGAC
|
1201CTCCTCCTCC AGGATTGGCC TAATTTCTGT CTTGCAGATG GAACAGCCAC
|
1251CTGAGGACGA GGCTCCACAG AGAGAACCTT ATAATGAATG GCTGATAGAT
|
1301ACCTTGGCAG AAATCCAGGA AGAAGCTTTG AAGCATTTTG ATAGGCGCTT
|
1351GCTACATGCA GTAGGCTCAT GGGTGTATGA GCAACAGGGA GACACCTTAG
|
1401AAGGTGTCCA AAAGCTAATA ACTATTCTAC AAAGAGCTTT GTTTTTGCAC
|
1451TTCAGGCATG GATGCAGGGA AAGCCGCATT GGACAAGCAG GAGGGAAATA
|
1501TAATTCCCTC AGATCCTTTC CAAGGCCAGA CAACCCCTTG TAATAAATGC
|
1551TATTGTAAAA GATGTTGCTA TCACTGCCAG TTATGCTTCT TGCAGAAAGC
|
1601CTTAGGGATA CATTATCATG TCTACAGAGT CAGGAGACCT CGACAGAGAT
|
1651TTTTGGGCGA AGTACCACCA CATAGTGCAG CAACTGTGGA AAGGTAAGTA
|
1701AAAAGTAAGT AGACATGCTT AGATATATAG TTTTAGGAAT AGTCATAGGA
|
1751TTAGGGATAG GACACCAATG GGTTACAGTG TATTATGGAA CACCTAAATG
|
1801GCACCCAGCT AGGACACATC TCTTTTGTGC AACAGATAAT AATTCCTTTT
|
1851GGGTCACAAC AAGTTGTGTG CCCAGCCTAT TGCACTATGA AGAACAACAC
|
1901ATTCCCAACA TAACAGAAAA CTTCACAGGC CCCATAACAG AGAATGAAGT
|
1951AATAAGACAA GCATGGGGAG CTATCTCTTC CATGATAGAT GCAGTCTAA
|
2001AACCCTGTCT AAAGCTGACA CCATATTGTG TCAAGATGAA ATGCACAAAG
|
2051GGAGATACTG ATACTACAGA AAGGACAACA TCAACCACTC CCTCTTGGTC
|
2101CACATCCACC CCAACCTCTA CCCCTATGAC TCCCAATACC ACTGGATTAG
|
2151ATATAGACTC AAACAATACA GAACCCACAA CACAAGAGAA TCGGATATGT
|
2201AAATTTAATA CTACAGGATT ATGTAGAGAC TGCAGATTGG AAATAGAAGA
|
2251AAACTTCACA TATCAGGATA TAACATCTAG AAATAGTAGT GAAGATACTG
|
2301AAGAGTGCTA TATGACACAT TGTAACTCAT CAGTAATAAC ACAGGATTGC
|
2351AATAAGGCAT CAACAGATAA AATGACTTTT AGGTTGTCTG CACCACCAGG
|
2401ATATGTCCTG TTGAGATGTA GAGAAAAGCT AAACCAAACC AAATTGTGTG
|
2451GCAATATTAC AGCAGTGCAA TGCACTGACC CAATGCCTGC AACTATATCC
|
2501ACTATGTTTG GATTTAATGG GACCAAACAT GACTATGATG AGCTAATTTT
|
2551AACAAACCCT CAAAAGATAA ATGAGTTTCA TGATCACAAG TATGTATATA
|
2601GAGTTGATAA AAAATGGAAG CTACAGGTAG TATGTAGAAG AAAAGGGAAT
|
2651AGATCAACAA TATCAACGCC AAGTGCTACG GGCTTATTGT TCTATCATGG
|
2701GCTACAACCA GGGAAAAATT TAAAAAAGGG GATGTGCCAG CTGAAGGGAT
|
2751TATGGGGAAA GGCCATGCAC CAACTATCAG AGGAACTTAG AAAGATAAAT
|
2801GGAAGTATTT ATAGAAAATG GAATGAGACA GCAGGCTGCA GAAAGCTAAA
|
2851CAAACAGAAC GGTACAGGTT GCTCATTGAA AACAATAGAA GTTAGTGAGT
|
2901ACACCACGGA GGGCGATCCG GGGGCAGACA CAATTATGCT TCTTTGTGGA
|
2951GGTGAGTATT TCTTTTGTAA TTGGACAAAG ATTTGGAAGA CATGGAATAA
|
3001CCAGACGTCA AATGTCTGGT ATCCTTGGAT GTCATGCAAT ATTAGACAAA
|
3051TTGTAGATGA TTGGCATAAA GTAGGGAAAA AAATTTATAT GCCTCCTGCA
|
3101AGTGGATTTA ACAATGAGAT AAGGTGTACT AATGATGTCA CGGAAATGTT
|
3151CTTTGAGGTT CAGAAGAAGG AAGAGAATAA ATATTTAATA AAGTTTATAC
|
3201CTCAAGATGA GATACAAAAT CAGTATACAG CAGTAGGAGC ACATTATAAA
|
3251TTGGTGAAAG TGGATCCTAT AGGGTTCGCA CCCACAGATG TGCATAGATA
|
3301CCATCTACCA GATGTAAAGC AGAAGAGAGG AGCAGTCTTG CTTGGAATGC
|
3351TCGGCCTCTT AGGTTTGGCA GGTTCCGCGA TGGGCTCAGT GGCGATAGCA
|
3401CTGACGGTCC AGTCCCAGGC TTTATTGAAT GGGATTGTGG AGCAGCAGAA
|
3451GGTTCTGCTG AGCCTGATAG ATCAGCACTC CGAGTTATTA AAACTAACTA
|
3501TCTGGGGTGT AAAAAATCTT CAGGCCCGCC TCACAGCCTT GGAGGAATAC
|
3551GTAGCGGACC AATCAAGACT GGCAGTATGG GGATGCTCAT TCTCTCAAGT
|
3601ATGCCACACT AATGTAAAGT GGCCTAATGA TTCAATAGTT CCTAACTGGA
|
3651CCTCGGAAAC ATGGCTTGAA TGGGATAAAA GAGTGACAGC AATTACAACA
|
3701AATATGACAA TAGACTTGCA GAGGGCATAT GAATTCGAAC AAAAGAATAT
|
3751GTTTGAGCTT CAAAAATTAG GAGATCTCAC CTCCTGGGCC AGCTGGTTCG
|
3801ACCTCACGTG GTGGTTTAAA TATATTAAGA TAGGAATTCT TATAATAATA
|
3851GTGATAATAG GACTTAGAAT ATTAGCTTGC TTATGGTCAG TATTAGGCAG
|
3901GTTTAGGCAG GGTTACCGCC CTCTTCCTTA TGTCTTCAAG GCAGACTATC
|
3951ACCGACCCCA CAACCTCAAA CAGCCAGACA AAGAAAGAGG AGAAGAGCAA
|
4001GACAGAGAAA AACAGAACAT CAGCTCAGAG AATTACAGGC CAGGATCTGG
|
4051CAGAGCTTGG AGCAAAGAGC AAGTAGAGAC CTGGTGGAAG GAGTCCAGGC
|
4101TCTACATTTG GTTGAAGAGC ACACAAGCAG TAATTGAATA TGGGTGGCAA
|
4151GAGCTCAAAG CAGCAGGAGC AGAAATATAT AAAATATTAC AGAGCGCTGC
|
4201GCAGAGGCTA TGGAGCGGAG GGCACCAACT CGGACTATCA TGTATTAGAG
|
4251CAGCTACAGC CTTTGGCAGA GGAGTCAGAA ACATTCCTAG ACGCATCAGA
|
4301CAAGGAGCAG AAGTCTTACT CAACTGAGTT AGACTTAAGA CATCAACAAG
|
4351ATGTAAGCCT CCCCACAGAA GAAGAACAGC CTTGGGAAGA GGAAGAGGAG
|
4401GTAGGCTTTC CAGTCTACCC ACGACAGCCT GTGCATGAAG CCACCTATAA
|
4451AGACTTGATA GACCTGTCCC ACTTTTTAAA AGAAAAGGGG GGACTGGAAG
|
4501GGATTTGGTG GTCTAAAAGA AGAGAAGAAA TCTTGGATAT ATATGCACAA
|
4551AATGAATGGG GAATTATACC TGACTGGCAG GCTTACACTT CAGGACCGGG
|
4601GATCAGGTAT CCAAAAGCAT TTGGGTTCCT GTTTAAACTG ATCCCAGTGG
|
4651CAGTTCCACC GGAACAAGAG AACAATGAAT GCAATAGGCT GCTAAACTCT
|
4701TCTCAGACAG GAATCCAGGA AGATCCATGG GGAGAAAGGC TCATGTGGAA
|
4751GTTTGACTCT GCTCTTGCCT ATACTTTCTA TGCTCCCATA AAGAGGCCAG
|
4801GAGACTTCAA GCATGTCCAA AGTCTTAGCT ATGAAGCTTA TAAGAAGGAA
|
4851CCTGACTGCT GCAAGAGGAA GTGGTGGCGC TTCTAGCCGA CCACAGAGGG
|
4901TTGCTATGGC GATACCCTTT AAAACTGCTA ACTCTGGAGG GACTTTCCAC
|
4951TAGTGCATGC GCACTGGACT GGGGACTTTC CAGGATGACG CCGGGTGGGG
|
5001GAGTGGTCAG CCCAATCTGG CTGCATATAA GCAGCTCGCT TTGCGCTTGT
|
5051ATTGAGTCTC TCCCTGAGAG CCTACCAGAT TGAGCCTAGG TTGTTCTCTG
|
5101GTGAGTCCTT GAAGGAGTGC CTGCTTGTAG CCCTGGGCGG TTCGCAGGCC
|
5151CCTGGCTTGT AGCTCTGGGT AGCTCGTCAG GTGTTCTGGA AAGGTCTTGC
|
5201TAAGGGGACG CCTTTGCTTG GTCTTGGTAG ACCTCTAGCA GTCTCAGTGG
|
5251CCAGGAGGCT GTGGGATTCA CTACCGCTTG CTTGCCTTTG ATGCTCAATA
|
5301AAGCTTACCC GAATTAGAAA GGCATTCAAG TGTACTCGCT CATTTTGTCT
|
5351TTGGTAGAAA CTCTGGTTAC TGGAGATCCC TCAGATTTGT GCCAGAGTTC
|
5401TGATATCTAG TGAGAGTAGA GAAAAATC
|
[0084] The total sequence which results from the sum of the sequences according to Tables 2 to 6 is shown in Table 7:
TABLE 7
|
|
1TRDCTAGAGA TCCCTCAGAT TTGTGCCAGA CTTCTGATA CTAGTGAGAG
|
51TAGAGAAAAA TCTCCAGCAG TGGCGCCCGA ACAGGGACTT GACGAAGAGC
|
101CAAGTCATTC CCACCTGTGA GGGACAGCGG CGGCAGCCGG CCGGACCGAC
|
151CCACCCGGTG AAGTGAGTTA ACCAAGGAGC CCCGACGCGC AGGACACAAG
|
201GTAAGCGGTG CACCGTGCTG TAGTGAGTGT GTGTCCAGGA TCCGCTTGAG
|
251CAGGCGAGAT CGCCGAGGCA ACCCCAGTAG AAAAAGAAAA GAGGGGAAGT
|
301AAGGCCGAGG CAAAGTGAAA GTAAAAGAGA TCCTCTGAGA AGAGGAACAG
|
351GGGGCAATAA AATTGGCGCG AGCGCGTCAG GACTTAGGGG AAGAGAATTG
|
401GATGAGCTGG AAAAGATTAG GTTACGGCCC TCCGGAAAGA AAAAATACCA
|
451GCTAAAACAT GTGATATGGG TAAGCAAGGA ACTAGATAGA TTTGGCCTAC
|
501ATGAAAAGTT GTTAGAAACC AAGGAAGGAT GCGAAAAAAT TCTTAGCGTA
|
551CTCTTTCCTC TAGTTCCTAC AGGGTCAGAA AATTTAATTT CGCTGTACAA
|
601CACCTGCTGT TGCATTTGGT GCGTACATGC GAAACTGAAA GTAGCAGATA
|
651CAGAAGAGGC AAAAGAGAAA GTAAAACAAT GCTACCATCT AGTGGTTGAA
|
701AAACAGAATC CAGCCTCAGA AAAAGAAAAA GGAGCAACAG TGACACCTAG
|
751TGGCCACTCA AGAAATTACC CCATTCAGAT AGTAAATCAA ACCCCAGTAC
|
801ACCAGGGAAT TTCTCCCAGA ACACTGAATG CTTGGGTAAA ATGTATAGAG
|
851GAGAAGAAAT TCAGCCCAGA AATAGTGCCT ATGTTCATAG CTTTGTCAGA
|
901AGGATGCCTC CCATACGACC TCAACGGCAT GCTCAATGCC ATTGGGGACC
|
951ATCAGGGAGC TCTCCAAATA GTGAAAGATG TCATCAATGA CGAAGCTGCA
|
1001GACTGGGATC TTAGACATCC TCAGATGGGG CCTATGCCCC AAGGGGTGCT
|
1051AAGAAACCCA ACAGGGAGTG ACATAGCAGG AACCACCAGC AGCATAGAAG
|
1101AACAAATTGA ATGGACAACT AGGCAGCAAG ATCAGGTAAA TGTAGGAGGA
|
1151ATTTACAAAC AATGGATAGT TCTGGGATTG CAAAAATGTG TGACCATGTA
|
1201CAATCCAGTG AATATTCTAG ATATAAAACA GGGACCAAAA GAACCCTTTA
|
1251AGGACTATGT GGATCGATTT TACAAAGCTC TGCGGGCGGA GCGAACAGAT
|
1301CCACAAGTGA AAAACTGGAT GACGCAGACA TTGCTCATCC AGAATGCAAA
|
1351CCCAGATTGT AAAGCCATTC TTAAGGGATT AGGCATGAAC CCCACCTTGG
|
1401AAGAAATGTT ATTGGCATGT CAAGGAGTAG GGGGACCAAA GTATAAAGCT
|
1451CAAATGATGG CAGAAGCAAT GCAGGAGGTG CAAGGAAAAA TTATGATGCA
|
1501AGCCTCGGGA GGACCACCGC GGGGTCCCCC AAGGCAGCCA CCCAGAAATC
|
1551CTAGATGCCC CAACTCTGGA AAGTTTGGAC ATGTACTGAG AGACTGTAGA
|
1601GCCCCAAGAA AGCGAGGATG CTTCAAGTGT GGAGATCCAG GACATGTGAT
|
1651GAGAAACTGC CCAAAGATGG TGAATTTTTT AGGGAATGCT CCCTGGGGCA
|
1701GTGGCAAACC CAGGAACTTT CCTGCCGTGC CACTGACCCC AACGGCACCC
|
1751CCGATGCCAG GATTAGAGGA CCCAGCAGAG AAGATGCTAC TGGATTACAT
|
1801GAAGAAGGGG CAACAGATGA AGGCAGAGAG GGAAGCCAAA CGGGAGAAGG
|
1851ACAAAGGCCC TTACGAGGCG GCTTACAACT CCCTCAGTTC TCTCTTTGGA
|
1901ACAGACCAAC TACAGTAGTA GAGATAGAGG GGCAAAAAGT GGAGGCCCTA
|
1951CTAGATACAG GAGCAGATGA CACAGTAATC AAAGATTTAC AATTAACAGG
|
2001CAATTGGAAA CCACAAATCA TAGGAGGAAT TGGAGGAGCA ATTAGGGTAA
|
2051AGCAATATTT CAATTGTAAA ATAACAGGG CAGGTAAAAG CACTCATGCT
|
2101TCACTACTAG TGGGCCCCAC TCCTCTAAAT ATTATAGGTA GAAATGTAGT
|
2151TAAAAAGTTA GGATGTACTT TGAACTTTCC TATTAGTAAG ATAGAAACAG
|
2201TAAAGGTAAC ACTAAAACCA GGAACTGATG GACCAAGAAT CAAACAGTGG
|
2251CCACTGTCTA AAGAAAAGAT TTTAGCCTTA CAAGAAATAT GCAATCAGAT
|
2301GGAAAAAGAA GGCAAAATCT CTAGAATAGG TCCAGAAAAT CCTTACAACA
|
2351CACCAGTGTT TTGTATAAAA AAGAAAGATG GAGCCAGCTG GAGAAAACTG
|
2401GTAGATTTTA GACAATTGAA TAAAGTGACA CAGGATTTCT TTGAGGTGCA
|
2451GCTAGGAATC CCACATCCTG GAGGTCTAAA ACAATGTGAA CAGATCACAG
|
2501TATTGCATAT AGGAGATGCC TATTTTTCAT GCCCATTGGA TGAGGACTTT
|
2551AGAAAGTATA CTGCATTCAC CATTCCATCG GTGAATAATC AGGGCCCAGG
|
2601AATCAGATAC CAGTATAATG TCCTCCCACA GGGATGGAAA GGCTCTCCAG
|
2651CAATTTTTCA GGCAACAGCT GATAAAATCT TGAAAACATT CAAAGAAGAA
|
2701TACCCAGAGG TATTAATTTA TCAGTATATG GATGATCTGT TCGTCCGAAG
|
2751TGACTTAAAT GCCACTGAAC ATAACAAAAT GATAAACAAG TTGAGAGAGC
|
2801ATCTGAGATT CTGGGGGCTC GAGACCCCAG ATAAGAAGTT TCAAAAGGAA
|
2851CCTCCTTTTG AATGGATGGG ATATGTGCTA CACCCAAAGA AATGGACAGT
|
2901GCAGAAAATA CAACTACCAG AAAAAGAGCA ATGGACAGTG AATGATATTC
|
2951AGAAATTGGT AGGAAAACTT AATTGGGCAA GTCAGATATA TTCCGGAATT
|
3001AAAACAAAAG AGCTCTGTAA ATTGATCAGA GGAGCAAAAC CTCTAGATGA
|
3051AATAGTAGAA TGGACAAGAG AAGCAGAATT AGAGTATGAA GAGAATAAGA
|
3101TAATAGTGCA GGAGGAGGTG CATGGAGTGT ACTATCAGCC AGAAAAACCA
|
3151CTGATGGCAA AAGTACAAAA GTTGACACAA GGACAGTGGA GTTATCAAAT
|
3201AGAGCAAGAA GAAAACAAAC CTCTCAAGGC AGGAAAATAT GCCAGGACAA
|
3251ACAATGCCCA CACAAATGAG TTAAGGACAC TTGCAGCGTT ACTACAAAAA
|
3301ATAGCCAAGG AATGCATAGT AATCTCGGCA AGATTGCCAA AATTTTACCT
|
3351CCCCTTGGAG AGAGAAGTAT GGGATCAATG GTGGCATGAT TATTGGCAGG
|
3401TAACATGGAT CCCAGAGTGG GAATTCATCT CAACACCACC ATTGATAAGG
|
3451CTATGGTACA ACCTCCTGAA AGAACCAATT CCAGGAGAAG ATGTATACTA
|
3501TGTAGATGGG GCAGCTAACA GAAATTCTAA AGAAGGCAAG GCACGATACT
|
3551ATACAGCAAG GGGCAAAAGT AAGGTAATAC CTTTAGAAAA TACAACCAAT
|
3601CACAAGGCAC AGCTGAAGGC AATAGAATTA GCCCTAAAAG ATTCAGGACC
|
3651AAGAGTAAAC ATAGTAACAC ATTCCCAGTA TGCATTAGGC ATACTCACAG
|
3701CATCCCCACA TCAGTCAGAT AACCCCATAG TTAGGGAAAT AATTAACCTC
|
3751ATGATAGCCA AGGAAGCAGT CTACCTGTCA TGGGTACCAG CCCACAAGGG
|
3801TATAGGAGGT AACGAACAAA TAGACAAATT AGTAAGCCAA GGAATTAGGC
|
3851AAGTACTATT CCTGGAAGGA ATAGACAGAG CTCAGGAACA ACACGACAAA
|
3901TATCATAACA ACTGGAGAGC TTTAGCTCAG CAATTCAGCA TACCTCCTAT
|
3951AGTGGCAAAA GAGATAGTTG CACAATGCCC AAAATGCCAG ATAAAAGGGG
|
4001AACCTATTCA TGGCCAGGTA GATGCAAGTC CTGGGACATG GCAAATGGAT
|
4051TGCACCCATC TAGAAGGAAA GGTCATCATA GTGGCAGTCC ATGTAGCCAG
|
4101TGGATACCTA GAGGCAGAAG TAATACCAGC AGAGACAGGA AAAGAGACAG
|
4151CACATTTCCT GTTAAAGTTA GCAGGCAGGT GGCCTGTAAA ACATTTACAC
|
4201ACTGACAATG GCCCCAACTT TGTCAGTGAA AAGGTAGCCA CAGTCTGTTG
|
4251GTGGGCTCAA ATAGAGCACA CCACAGGTGT ACCCTATAAC CCCCAGAGTC
|
4301AGGGAGTAGT GGAAGCAAAG AATCATCATC TTAAGACAAT CATAGAACAA
|
4351GTTAGGGATC AAGCAGAAAA ATTAGAAACA GCAGTACAAA TGGCAGTATT
|
4401AATACACAAT TTTAAAAGAA AAGGGGGGAT AGGGGAGTAT AGTCCAGGAG
|
4451AAAGAATAGT AGATATCATA ACCACAGACA TTCTAACAAC TAAATTACAA
|
4501CAAAATATTT CAAAAATTCA AAATTTTCGG GTTTATTACA GAGAAGGAAG
|
4551GGATCAACAG TGGAAAGGAC CAGCAGAACT CATTTGGAAA GGAGAAGGCG
|
4601CTGTGGTGAT TAAAGAAGGG ACACACTTAA AGGTGGTACC AAGAAGAAAA
|
4651GCCAAAATCA TCAGACATTA TGGAAAAGCA GTGGATAGTA ATTCCCACAT
|
4701GGAGAGTAGA GAGGAATCAG CTTGAGAAAT GCAATTCATT AGTAAAATAT
|
4751CATAAATATA GGGGAGAAAA ATACCTAGAA AGATGGGAAC TATACCACCA
|
4801TTTCCAATGC TCGGGGTGGT GGACACACTC TAGAAAAGAT GTTTACTTTA
|
4851AAGATGGCTC AGTAATAAGC ATTACTCCCT TCTGGAATCT TACCCCAGAG
|
4901AAAGGATGGT TGTCTCAATA TGCAGTTACA ATAGAATATG TAAAAGAAAG
|
4951CTATTATACT TACATAGACC CAGTTACAGC AGACAGAATG ATTCATTGGG
|
5001AATATTTCCC ATGTTTTACA GCCCAGGCTG TGAGAAAAGT ACTGTTTGGA
|
5051GAAAGACTAA TAGCTTGCTA CAGCCCCTGG GGACACAAAG GACAGGTAGG
|
5101GACTCTACAA TTCCTGGCTT TGCAAGCTTA CCTTCAGTAT TGTAAACATG
|
5151GCAGAAAGAG CACCAGAAGT GCCGGAAGGG GCAGGAGAGA TACCTCTAGA
|
5201ACAGTGGCTA GAAAGATCAT TAGAACAACT CAACAGAGAG GCCCGGTTAC
|
5251ACTTCCACCC AGAGTTCCTT TTCCGTCTTT GGAACACTTG TGTAGAACAT
|
5301TGGCATGATA GACACCAGAG GAGCCTGGAG TATGCAAAAT ACACATATCT
|
5351TTTGTTCGTG CATAAGGCCA TGTTTACCCA TATCCAACAG GGATGCCCAT
|
5401GTAGAAATGG GCACCCAAGA GGACCTCCTC CTCCAGGATT GGCCTAATTT
|
5451CTGTCTTGCA GATGGAACAG CCACCTGAGG ACGAGGCTCC ACAGAGAGAA
|
5501CCTTATAATG AATGGCTGAT AGATACCTTG GCAGAAATCC AGGAAGAAGC
|
5551TTTGAAGCAT TTTGATAGGC GCTTGCTACA TGCAGTAGGC TCATGGGTGT
|
5601ATGAGCAACA GGGAGACACC TTAGAAGGTG TCCAAAAAGT AATAACTATT
|
5651CTACAAAGAG CTTTGTTTTT GCACTTCAGG CATGGATGCA GGGAAAGCCG
|
5701CATTGCACAA GCAGGAGGGA AATATAATTC CCTCAGATCC TTTCCAAGGC
|
5751CAGACAACCC CTTGTAATAA ATGCTATTGT AAAAGATGTT GCTATCACTG
|
5801CCAGTTATGC TTCTTGCAGA AAGCCTTAGG GATAGATTAT GATGTCTACA
|
5851GAGTCAGGAG ACCTCGACAG AGATTTTTGG GCGAAGTACC ACCACATAGT
|
5901GCAGCAACTG TGGAAAGGTA AGTAAAAAGT AAGTAGACAT GCTTAGATAT
|
5951ATAGTTTTAG GAATAGTCAT AGGATTAGGG ATAGGACACC AATGGGTTAC
|
6001AGTGTATTAT GGAACACCTA AATGGCACCC AGCTAGGACA CATCTCTTTT
|
6051GTGCAACAGA TAATAATTCC TTTTGGGTCA CAACAAGTTG TGTGCCCAGC
|
6101CTATTGCACT ATGAAGAACA ACACATTCCC AACATAACAG AAAACTTCAC
|
6151AGGCCCCATA ACAGAGAATG AAGTAATAAG ACAAGCATGG GGAGCTATCT
|
6201CTTCCATGAT AGATGCAGTC TTAAAACCCT GTGTAAAGCT GACACCATAT
|
6251TGTGTCAAGA TGAAATGCAC AAAGGGAGAT ACTGATACTA CAGAAAGGAC
|
6301AACATCCACC ACTTCCTCTT GGTCCACATC CACCCCAACC TCTACCCCTA
|
6351TGACTCCCAA TACCACTGGA TTAGATATAG ACTCAAACAA TACAGAACCC
|
6401ACAACACAAG AGAATCGGAT ATGTAAATTT AATACTACAG GATTATGTAG
|
6451AGACTGCAGA TTGGAAATAG AAGAAAACTT CAGATATCAG GATATAACAT
|
6501GTAGAAATAG TAGTGAAGAT ACTGAAGAGT GCTATATGAC ACATTGTAAC
|
6551TCATCAGTAA TAACACAGCA TTGCAATAAG GCATCAACAG ATAAAATGAC
|
6601TTTTAGGTTG TGTGCACCAC CAGGATATGT CCTGTTGAGA TGTAGAGAAA
|
6651AGCTAAACCA AACCAAATTG TGTCGCAATA TTACAGCAGT GCAATGCACT
|
6701GACCCAATGC CTGCAACTAT ATCCACTATG TTTGGATTTA ATGGGACCAA
|
6751ACATGACTAT GATGAGCTAA TTTTAACAAA CCCTCAAAAG ATAAATGAGT
|
6801TTCATGATCA CAAGTATGTA TATAGAGTTG ATAAAAAATG GAAGCTACAG
|
6851GTAGTATGTA GAAGAAAAGG GAATAGATCA ATAATATCAA CGCCAAGTGC
|
6901TACGGGCTTA TTGTTCTATC ATGGGCTAGA ACCAGGGAAA AATTTAAAAA
|
6951AGGGGATGTG CCAGCTGAAG GGATTATGGG GAAAGGCCAT GCACCAACTA
|
7001TCAGAGGAAC TTAGAAAGAT AAATGGAAGT ATTTATAGAA AATGGAATGA
|
7051GACAGCAGGC TGCAGAAAGC TAAACAAACA GAACGGTACA GGTTGCTCAT
|
7101TGAAAACAAT AGAAGTTAGT GACTACACCA CGGAGGGCGA TCCGGGGGCA
|
7151GAGACAATTA TGCTTCTTTG TGGAGGTGAG TATTTCTTTT GTAATTGGAC
|
7201AAAGATTTGG AAGACATGGA ATAACCAGAC GTCAAATGTC TGGTATCCTT
|
7251GGATGTCATG CAATATTAGA CAAATTGTAG ATGATTGGCA TAAAGTAGGG
|
7301AAAAAAATTT ATATGCCTCC TGCAAGTGGA TTTAACAATG AGATAAGGTG
|
7351TACTAATGAT GTCACGGAAA TGTTCTTTGA GGTTCAGAAG AAGGAAGAGA
|
7401ATAAATATTT AATAAAGTTT ATACCTCAAG ATGAGATACA AAATCAGTAT
|
7451ACAGCAGTAG GAGCACATTA TAAATTGGTG AAAGTGGATC CTATAGGGTT
|
7501CGCACCCACA GATGTGCATA GATACCATCT ACCAGATGTA AAGCAGAAGA
|
7551GAGGAGCAGT CTTGCTTGGA ATGCTCGGCC TCTTAGGTTT GGCAGGTTCC
|
7601GCGATGGGCT CAGTGGCGAT AGCACTGACG GTCCAGTCCC AGGCTTTATT
|
7651GAATGGGATT GTGGAGCAGC AGAAGGTTCT GCTGAGCCTG ATAGATCAGC
|
7701ACTCCGAGTT ATTAAAACTA ACTATCTGGG GTGTAAAAAA TCTTCAGGCC
|
7751CGCCTCACAG CCTTGGAGGA ATACGTAGCG GACCAATCAA GACTGGCAGT
|
7801ATGGGGATGC TCATTCTCTC AAGTATGCCA CACTAATGTA AAGTGGCCTA
|
7851ATGATTCAAT AGTTCCTAAC TGGACCTCGG AAACATGGCT TGAATGGGAT
|
7901AAAAGAGTGA CAGCAATTAC AACAAATATG ACAATAGACT TGCAGAGGGC
|
7951ATATGAATTG GAACAAAAGA ATATGTTTGA GCTTCAAAAA TTAGGAGATC
|
8001TCACCTCCTG CGCCAGCTGG TTCGACCTCA CGTGGTGGTT TAAATATATT
|
8051AAGATAGGAA TTCTTATAAT AATAGTGATA ATAGGACTTA GAATATTAGC
|
8101TTGCTTATGG TCAGTATTAG GCAGGTTTAG GCAGGGTTAC CGCCCTCTTC
|
8151CTTATGTCTT CAAGGGAGAC TATCACCGAC CCCACAACCT CAAACAGCCA
|
8201GACAAAGAAA GAGGAGAAGA GCAAGACACA GAAAAACAGA ACATCAGCTC
|
8251AGAGAATTAC AGGCCAGGAT CTGGCAGAGC TTGGAGCAAA GAGCAAGTAG
|
8301AGACCTGGTG GAAGGAGTCC AGGCTCTACA TTTGGTTGAA GAGCACACAA
|
8351GCAGTAATTG AATATGGGTG GCAAGAGCTC AAAGCAGCAG GAGCAGAAAT
|
8401ATATAAAATA TTACAGAGCG CTGCGCAGAG GCTATGGAGC GGAGGGCACC
|
8451AACTCGGACT ATCATGTATT AGAGCAGCTA CAGCCTTTGG CAGAGGAGTC
|
8501AGAAACATTC CTAGACGCAT CAGACAAGGA GCAGAAGTCT TACTCAACTG
|
8551AGTTAGACTT AAGACATCAA CAAGATGTAA GCCTCCCCAC AGAAGAAGAA
|
8601CAGCCTTGGG AAGAGGAAGA GGAGGTAGGC TTTCCAGTCT ACCCACGACA
|
8651GCCTGTGCAT GAAGCCACCT ATAAAGACTT GATAGACCTG TGCCACTTTT
|
8701TAAAAGAAAA GGGGGGACTG GAAGGGATTT GGTGGTCTAA AAGAAGAGAA
|
8751GAAATCTTGG ATATATATGC ACAAAATGAA TGGCGAATTA TACCTGACTG
|
8801GCAGGCTTAC ACTTCAGGAC CGGGGATCAG GTATCCAAAA GCATTTGGGT
|
8851TCCTGTTTAA ACTGATCCCA GTGGCAGTTC CACCGGAACA AGAGAACAAT
|
8901GAATGCAATA GGCTGCTAAA CTCTTCTCAG ACAGGAATCC AGGAAGATCC
|
8951ATGGGGAGAA AGGCTCATGT GGAAGTTTGA CTCTGCTCTT GCCTATACTT
|
9001TCTATGCTCC CATAAAGAGG CCAGGAGAGT TCAAGCATGT CCAAAGTCTT
|
9051AGCTATGAAG CTTATAAGAA GGAACCTGAC TGCTGCAAGA GGAAGTGGTG
|
9101GCGCTTCTAG CCGACCACAG AGGGTTGCTA TGGCCATACC CTTTAAAACT
|
9151GCTAACTCTG GAGGGACTTT CCACTAGTGC ATGCGCACTG CACTGGGGAC
|
9201TTTCCAGGAT GACGCCGGGT GGGGGAGTGG TCAGCCCAAT CTGGCTGCAT
|
9251ATAAGCACCT CGCTTTGCGC TTGTATTGAG TCTCTCCCTG AGAGGCTACC
|
9301AGATTGAGCC TAGGTTGTTC TCTGGTGAGT CCTTGAAGGA GTGCCTGCTT
|
9351GTAGCCCTGG GCGGTTCGCA GGCCCCTGGC TTGTACCTCT GGGTAGCTCG
|
9401TCAGGTGTTC TCCAAAGGTC TTGCTAAGGG GACGCCTTTG CTTGGTCTTG
|
9451GTAGACCTCT AGCAGTCTCA GTGGCCAGGA GGCTGTGGGA TTGACTACCG
|
9501CTTGCTTCCC TTTGATGCTC AATAAAGCTT ACCCGAATTA GAAAGGCATT
|
9551CAAGTGTACT CGCTCATTTT GTCTTTGGTA GAAACTCTGG TTACTGGAGA
|
9601TCCCTCAGAT TTGTGCCAGA CTTCTGATAT CTAGTGAGAG T
|
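By way of illustration only, the combination of partial sequences into one total sequence can be sketched as follows. The sketch assumes that neighbouring partial sequences share an exact terminal overlap and are supplied in 5' to 3' order; the function names `merge_overlapping` and `assemble`, the `min_overlap` parameter and the toy fragments are hypothetical and are not taken from Tables 2 to 6.

```python
def merge_overlapping(left, right, min_overlap=10):
    """Join two fragments on their longest exact suffix/prefix overlap."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        # Does the end of `left` match the first `size` characters of `right`?
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError(f"no exact overlap of at least {min_overlap} nucleotides")


def assemble(fragments, min_overlap=10):
    """Merge an ordered list of fragments (5' to 3') into one sequence."""
    contig = fragments[0]
    for fragment in fragments[1:]:
        contig = merge_overlapping(contig, fragment, min_overlap)
    return contig


# Toy fragments (hypothetical, not the sequences of Tables 2 to 6).
toy_fragments = ["AAACCCGGGTTT", "CCGGGTTTACGT", "TTTACGTGGA"]
print(assemble(toy_fragments, min_overlap=5))
```

Real partial sequences may differ at individual positions within the overlap, in which case an exact match of this kind would have to be replaced by a pairwise alignment of the fragment ends.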
[0085] The nucleotide sequence was translated into amino acid sequences in three reading frames, after which the amino acid sequences of GAG (Table 8), POL (Table 9) and ENV (Table 10) were identified by homology comparisons.
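A minimal sketch of a translation in three forward reading frames is given below. It assumes the standard genetic code; the names `CODON_TABLE`, `translate` and `three_frame_translation` are illustrative, and codons containing characters other than A, C, G and T are rendered as `X`. The example segment is taken from positions 401 to 430 of Table 7.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, frame=0):
    """Translate one forward reading frame (frame = 0, 1 or 2)."""
    seq = sequence.upper().replace(" ", "")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # "X" marks an unreadable codon
        for i in range(frame, len(seq) - 2, 3)
    )


def three_frame_translation(sequence):
    """Return the translations of the three forward reading frames."""
    return [translate(sequence, frame) for frame in range(3)]


# Positions 401-430 of Table 7; frame 0 lies within the GAG reading frame.
segment = "GATGAGCTGG AAAAGATTAG GTTACGGCCC"
for frame, protein in enumerate(three_frame_translation(segment)):
    print(frame, protein)
```

Frame 0 of this segment yields DELEKIRLRP, which corresponds to positions 13 to 22 of the GAG sequence in Table 8.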
TABLE 8
|
|
GAG:
|
|
1IGASASGLRG RELDELEKIR LRPSGKKKYQ LKHVIWVSKE LDRFGLHEKL
|
51LETKEGCEKI LSVLFPLVPT GSENLISLYN TCCCIWCVHA KVKVADTEEA
|
101KEKVKQCYHL VVEKQNAASE KEKGATVTPS GHSRNYPIQI VNQTPVHQGI
|
151SPRTLNAWVK CIEEKKESPE IVPMFIALSE GCLPYDLNGM LNAIGDHQGA
|
201LQIVKDVIND EAADWDLRHP QMGPMPQGVL RNPTGSDIAG TTSSIEEQIE
|
251WTTRQQDQVN VGGIYKQWIV LGLQKCVSMY NPVNILDIKQ GPKEPFKDYV
|
301DRFYKALRAE RTDPQVKNWM TQTLLIQNAN PDCKAILKGL GMNPTLEEML
|
351LACQGVGGPK YKAQMMAEAM QEVQGKIMMQ ASGGPPRGPP RQPPRNPRCP
|
401NCGKFGHVLR DCRAPRKRGC FKCGDPGHLM RNCPKMVNFL GNAPWGSGKP
|
451RNFPAVPLTP TAPPMPGLED PAEKMLLDYM KKGQQMKAER EAYREKDKGP
|
501YEAAYNSLSS LFGTDQLQ
|
[0086]
TABLE 9
POL:
1 FFRECSLGQW QTQELSCRAT DPNGTPDARI RGPSREDATG LHEEGATDEG
51 REGSQTGEGQ RPLRGGLQLP QFSLWNRPTT VVEIEGQKVE ALLDTGADDT
101 VIKDLQLTGN WKPQIIGGIG GAIRVKQYFN CKITVAGKST HASVLVGPTP
151 VNIIGRNVLK KLGCTLNEPI SKIETVKVTL KPGTDGPRIK QWPLSKEKIL
201 ALQEICNQME KEGKISRIGP ENPYNTPVFC IKKKDGASWR KLVDFRQLNK
251 VTQDFFEVQL GIPHPGGLKQ CEQITVLDIG DAYFSCPLDE DFRKYTAFTI
301 PSVNNQGPGI RYQYNVLPQG WKGSPAIFQA TADKILKTFK EEYPEVLIYQ
351 YMDDLFVGSD LNATEHNKMI NKLREHLRFW GLETPDKKFQ KEPPFEWMGY
401 VLHPKKWTVQ KIQLPEKEQW TVNDIQKLVG KLNWASQIYS GIKTKELCKL
451 IRGAKPLDEI VEWTREAELE YEENKIIVQE EVHGVYYQPE KPLMAKVQKL
501 TQGQWSYQIE QEENKPLKAG KYARTKNAHT NELRTLAGLV QKIAKECLVT
551 WGRLPKFYLP LEREVWDQWW HDYWQVTWIP EWEFISTPPL IRLWYNLLKE
601 PIPGEDVYYV DGAANRNSKE GKAGYYTARG KSKVIALENT TNQKAELKAI
651 ELALKDSGPR VNIVTDSQYA LGILTASPDQ SDNPIVREII NLMIAKEAVY
701 LSWVPAHKGI GGNEQIDKLV SQGIRQVLFL EGIDRAQEEH DKYHNNWRAL
751 AQEFSIPPIV AKEIVAQCPK CQIKGEPIHG QVDASPGTWQ MDCTHLEGKV
801 IIVAVHVASG YLEAEVIPAE TGKETAHFLL KLAGRWPVKH LHTDNGPNFV
851 SEKVATVCWW AQIEHTTGVP YNPQSQGVVE AKNHHLKTII EQVRDQAEKL
901 ETAVQMAVLI HNFKRKGGIG EYSPGERIVD IITTDILTTK LQQNISKIQN
951 ERVYYREGRD QQWKGPAELI WKGEGAVVIK EGTDLKVVPR RKAKIIRDYG
1001 KAVDSNSHME SREESA*
[0087]
TABLE 10
ENV:
1 QWVTVYYGTP KWHPARTHLF CATDNNSFWV TTSCVPSLLH YEEQHIPNIT
51 ENFTGPITEN EVIRQAWGAI SSMIDAVLKR CVKLTPYCVK MKCTKGDTDT
101 TERTTSTTSS WSTSTPTSTP MTPNTTGLDI DSNNTEPTTQ ENRTCKFNTT
151 GLCRDCRLEI EENFRYQDIT CRNSSEDTEE CYMTHCNSSV TTQDCNKAST
201 DKMTFRLCAP PGYVLLRCRE KLNQTKLCGN ITAVQCTDPM PATISTMFGF
251 NGTKHDYDEL ILTNPQKINE FHDHKYVYRV DKKWKLQVVC RRKGNRSIIS
301 TPSATGLLFY HGLEPGKNLK KGMCQLKGLW GKAMHQLSEE LRKINGSIYR
351 KWNETAGCRK LNKQNGTGCS LKTTEVSEYT TEGDEGAETI MLLCGGEYFF
401 CNWTKIWKTW NNQTSNVWYP WMSCNIRQIV DDWHKVGKKI YMPPASGFNN
451 EIRCTNDVTE MFFEVQKKEE NKYLIKFIPQ DEIQNQYTAV GAHYKLVKVD
501 PTGFAPTDVH RYHLPDVKQK RGAVLLGMLG LLGLAGSAMG SVAIALTVQS
551 QALLNGIVEQ QKVLLSLIDQ NDSIVPNWTS GVKNLQARLT ALEEYVADQS
601 RLAVWGCSFS QVCHTNVKWP LTSWASWFDL ETWLEWDKRV TAITTNMTID
651 LQRAYELEQK NMFELQKLGD LTSWASWEDL TWWFKYIKIG ILITIVIIGL
701 RILACLWSVL GRFRQGYRPL PYVFKGKYHR PHNLKQPDKE RGEEQDREKQ
751 NISSENYRPG SGRAWSKEQV ETWWKESRLY IWLKSTQAVI EYGWQELKAA
801 GAEIYKILQS AAQRLWSGGH QLGLSCIRAA TAFGRGVRNI PRRIRQGAEV
851 LLN*
EXAMPLE 4
[0088] Determination of the phylogenetic position of SIM27
[0089] Selection of the sequences:
[0090] From the HIV WWW server of the LANL (Los Alamos National Laboratory, hiv-web.lanl.gov), 31 HIV and SIV sequences were selected which comprised all complete SIV genomes and representatives of the various HIV-1 and HIV-2 subtypes. The sequences listed in Table 11 were taken into consideration.
TABLE 11
Genbank Accession No. | Name
AF075269 | SIV-l'hoesti
AF077017 | SIV-SMM, PGM
L06042 | SIV-SYKES
M27470 | SIV-Mandrill, MNDGB1
L40990 | SIV-VERVET, REV
M29975 | SIV-VERVET, AGM155
M30931 | SIV-VERVET, AGM3
X07805 | SIV-VERVET, AGMTY6
M66437 | SIV-GRIVET, AGM677A
U04005 | SIV-SABAEUS, AGMSAB1
AF103818 | SIV-CPZ-US
U42720 | SIV-CPZ, CPZANT
X52154 | SIV-CPZ, CPZGAB
U58991 | SIV-TANTALUS, TAN1
U72748 | SIV, SME543
Y00277 | SIV-D, MAC250
M32741 | SIV-D, MNE
M33262 | SIV-D, MM239
L09213 | SIV-D, SMM-PBJ-6P9
M80194 | SIV-D, SMM9
M83293 | SIV-D, STM
U51190 | HIV1-A, 92ug037
AF110967 | HIV1-C, 96bw05.02
M27323 | HIV1-D, NDK
AF005494 | HIV1-F, 93br020.1
AF005496 | HIV1-H, 90cr056
AJ006022 | HIV1-N, YBF30
L20587 | HIV1-O, ANT70C
L20571 | HIV1-O, MVP5180
D00835 | HIV2-A, CAM2
M30895 | HIV2-A, GH1
U27200 | HIV2-B, EHO
L07625 | HIV2-B, UC1
[0091] Using the Genbank accession numbers of these sequences, the current sequence entries were extracted from the gene database “Genbank”. On the basis of the annotation, the genes env, gag and pol were extracted from these sequences and translated into amino acid sequences. Only sequences annotated as functional were translated; pseudogenes and genome sections not annotated as one of the three genes were not taken into consideration.
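The selection rule described above can be sketched as follows. This is only an illustration: the `Feature` record, its fields and the function `select_functional_genes` are hypothetical and do not reproduce the Genbank entry format, and the coordinates are assumed to be 0-based with an exclusive end. The toy genome is invented.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    gene: str             # annotated gene name, e.g. "gag"
    start: int            # 0-based start position (inclusive)
    end: int              # 0-based end position (exclusive)
    pseudo: bool = False  # True if the entry is annotated as a pseudogene


def select_functional_genes(genome, features, wanted=("gag", "pol", "env")):
    """Return the nucleotide regions of the wanted genes, skipping
    pseudogenes and regions annotated as any other gene."""
    selected = {}
    for feature in features:
        if feature.pseudo or feature.gene not in wanted:
            continue
        selected[feature.gene] = genome[feature.start:feature.end]
    return selected


# Invented toy genome with four annotated regions.
toy_genome = "ATGGGTGCCTAAATGCCCTAAATGGAATAG"
toy_features = [
    Feature("gag", 0, 12),
    Feature("pol", 12, 21, pseudo=True),  # skipped: pseudogene
    Feature("vif", 12, 21),               # skipped: not one of the 3 genes
    Feature("env", 21, 30),
]
print(select_functional_genes(toy_genome, toy_features))
```

The regions returned in this way would then be translated into amino acid sequences before alignment.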
[0092] In addition, the genome sequence of SIM27 was compared with the current gene database “Genbank” so as not to overlook any SIV partial sequence closely related to SIM27. Two partial sequences of SIVrcm (gag and pol) and a pol partial sequence from Mandrillus leucophaeus (Clewley J P et al., J. Virol. 1998; 72: 10305-10309) were identified as additionally relevant here:
RCM-GAG | SIV, RCM gag
RCM-POL | SIV, RCM pol
CLEW-POL | SIV, Drill, Clewley
[0093] In total, 4 data sets were obtained in this way: 3 protein data sets (env, gag and pol), and one from genomic sequences (GENOME).
[0094] Alignment:
[0095] The above sequences were aligned together with the corresponding SIM27 sequences using CLUSTALW (Version 1.74) with standard settings (Thompson J. D. et al., Nucleic Acids Res. 22: 4673-4680 (1994)). The sequence alignments thus obtained were then checked manually.
[0096] The published pol partial sequence of drill monkeys (Clewley et al.) and the pol partial sequence of the RCM monkey were each added to the pol sequence alignment in separate analyses. The same was done with the gag partial sequence of the RCM monkey for the gag alignment.
[0097] For the addition of the individual sequences to the alignments, the profile alignment option of CLUSTALW 1.74 was used with standard settings.
[0098] 3 further protein data sets with small partial sequences RCM-GAG, RCM-POL and DRILL-POL thus resulted. Each of these data sets was considered only with respect to the region of the respective partial sequence.
[0099] Phylogenetic analyses:
[0100] Using the above seven alignments (GENOME (FIG. 1), GAG (FIG. 2), RCM-GAG (FIG. 3), POL (FIG. 4), RCM-POL (FIG. 5), DRILL-POL (FIG. 6), ENV (FIG. 7)), phylogenetic trees were then constructed independently. For this, the neighbor-joining method, as implemented in CLUSTALW 1.74, was used with 1000 bootstrap replicates. The trees were calculated with the standard settings, except that all alignment positions containing gaps were ignored and the correction for multiple substitutions was switched on.
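The two non-standard settings can be illustrated for the protein alignments by the following sketch. It assumes that the correction for multiple substitutions corresponds to Kimura's formula d = -ln(1 - p - 0.2 p^2), where p is the observed fraction of differing positions; the function names and the toy alignment are hypothetical, and the sketch is not the CLUSTALW implementation. The genomic (nucleotide) data set would require a nucleotide substitution model instead.

```python
import math


def strip_gap_columns(alignment):
    """Drop every alignment column in which any sequence has a gap ('-')."""
    kept = [col for col in zip(*alignment.values()) if "-" not in col]
    return {
        name: "".join(col[i] for col in kept)
        for i, name in enumerate(alignment)
    }


def p_distance(seq_a, seq_b):
    """Observed fraction of positions at which two aligned sequences differ."""
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def kimura_protein_distance(p):
    """Corrected distance d = -ln(1 - p - 0.2 * p**2).

    Only defined while 1 - p - 0.2 * p**2 > 0 (p below about 0.85).
    """
    return -math.log(1.0 - p - 0.2 * p * p)


# Invented toy alignment of three short protein sequences.
toy_alignment = {"A": "MKV-LTE", "B": "MKVALSE", "C": "MRV-LSD"}
stripped = strip_gap_columns(toy_alignment)
names = list(stripped)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        p = p_distance(stripped[first], stripped[second])
        print(first, second, round(p, 3), round(kimura_protein_distance(p), 3))
```

The corrected pairwise distances form the distance matrix from which a neighbor-joining tree is built; in a bootstrap analysis the alignment columns are resampled with replacement and the calculation is repeated for each replicate.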
EXAMPLE 5
[0101] Detection of the diagnostic relevance in the Western blot
[0102] According to known methods of molecular biology (Current Protocols in Molecular Biology, Wiley Interscience, 1994), the region of env containing the cysteine loop was stably expressed either as a fusion with the maltose-binding protein (pMAL, New England Biolabs) or as a fusion with β-Gal (Knapp et al., Biotechniques, Vol. 8, No. 3, 1990). The proteins were blotted onto nitrocellulose, incubated overnight with the sera at a dilution of 1:100 in TBS (150 mM NaCl, 50 mM Tris, pH 8.0) containing 5% skimmed milk, washed with TBS, incubated for 2 h with anti-human IgG-AP (Sigma A064) and anti-monkey IgG-AP (Sigma A1929) at a dilution of 1:1000, and stained according to the manufacturer's instructions using Nitrotetrazolium Blue (Sigma N-6878) and 5-bromo-4-chloroindolyl phosphate p-toluidine salt (Bachem M105). The results shown in Table 9 were obtained (FIG. 9).
TABLE 9
Protein/serum | Anti-HIV1 serum | Anti-HIV1-subtype O serum | Anti-HIV2 serum | Anti-SIV-drill 7 serum
pMAL-HIV1-env | +++++−−
pSEM-HIV1-subtype-O-env | ++++−−
pMAL-HIV2-env | − | − | +++ | −
pMAL-SIM27-env | − | − | − | +++
pMAL | − | − | − | −
pSEM | − | − | − | −
[0103] Surprisingly, the env region of SIM27 did not react with anti-HIV-1, anti-HIV-1 subtype O or anti-HIV-2 sera. Conversely, antibodies against SIM27, which react strongly with SIM27 env, could not be detected using HIV-1 env, HIV-1 subtype O env or HIV-2 env. It is therefore to be assumed that, should SIM27 or a variant with comparable serological properties cross into the human population, antibodies against SIM27 in human sera would not be detectable with the tests currently employed; instead, SIM27 env, or antigens derived from it having comparable immunological properties, would have to be employed.
FIGURES
[0104]
FIG. 1
[0105] Phylogenetic investigation of the sequences of Table 11 including the total genome of SIM27 as described in Example 4 by the multiple alignment and the neighbor-joining method of ClustalW Version 1.74
[0106]
FIG. 2
[0107] Phylogenetic investigation of the GAG proteins extracted from the sequences of Table 11 including the GAG protein of SIM27 (Table 8) as described in Example 4 by the multiple alignment and the neighbor-joining method of ClustalW Version 1.74
[0108]
FIG. 3
[0109] Phylogenetic investigation of the GAG proteins extracted from the sequences of Table 11 including the GAG protein of SIM27 (Table 8) and the GAG partial sequence of SIVrcm as described in Example 4 by the multiple alignment and the neighbor-joining method of ClustalW Version 1.74
[0110]
FIG. 4
[0111] Phylogenetic investigation of the POL proteins extracted from the sequences of Table 11 including the POL protein of SIM27 (Table 9) as described in Example 4 by the multiple alignment and the neighbor-joining method of ClustalW Version 1.74
[0112]
FIG. 5
[0113] Phylogenetic investigation of the POL proteins extracted from the sequences of Table 11 including the POL protein of SIM27 (Table 9) and the POL partial sequence of SIVrcm as described in Example 4 by the multiple alignment and the neighbor-joining method of ClustalW Version 1.74
[0114]
FIG. 6
[0115] Phylogenetic investigation of the POL proteins extracted from the sequences of Table 11 including the POL protein of SIM27 (Table 9) and the POL partial sequence as published by Clewley (Clewley J P et al., J. Virol. 1998; 72: 10305-10309) and as described in Example 4 by the multiple alignment and the neighbor-joining method of ClustalW Version 1.74
[0116]
FIG. 7
[0117] Phylogenetic investigation of the ENV proteins extracted from the sequences of Table 11 including the ENV protein of SIM27 (Table 10) as described in Example 4 by the multiple alignment and the neighbor-joining method of ClustalW Version 1.74
[0118]
FIG. 8
[0119] General survey of the individual PCR amplifications which lead to the complete genomic nucleic acid sequence of SIM27.
[0120]
FIG. 9
[0121] Western blot, as described in Example 5.
ABBREVIATIONS
[0122] HIV: Human immunodeficiency virus
[0123] SIV: Simian (monkey) immunodeficiency virus
[0124] HTLV: Human T-lymphoma virus
[0125] STLV: Simian T-lymphoma virus
[0126] p: Protein
[0127] gp: Glycoprotein
[0128] pol: Gene of the enzymes of HIV or SIV, named after the polymerase
[0129] gag: Gene of the core proteins of HIV or SIV
[0130] env: Gene of the surface glycoproteins of HIV or SIV
[0131] IN: Integrase
[0132] RT: Reverse transcriptase
[0133] PR: Protease
Claims
- 1. The immunodeficiency virus SIM27, whose RNA or a part thereof is complementary to the sequence according to Table 2 or Table 3 or Table 4 or Table 5 or Table 6 or Table 7.
- 2. A variant of the immunodeficiency virus as claimed in claim 1, comprising an RNA which is complementary to a DNA which is investigated by the following process:
(a) extraction of the sequences mentioned in Table 11 from the gene database “Genbank” and loading of the sequences including the sequence to be investigated into the computer program “ClustalW Version 1.74”, (b) multiple alignment of the sequences according to Table 11 and of the sequence to be investigated and phylogenetic analysis of the data obtained by the neighbor-joining method by means of the computer program “ClustalW Version 1.74”, (c) visualization of the family tree data obtained as a family tree using a suitable presentation program, wherein the variant branches off along the distance from the end up to the first branching point from the branch of the family tree on the end of which SIM27 is located.
- 3. The GAG protein of SIM27 or a variant thereof, which is investigated by the following process:
(a) extraction of the GAG portions of the sequences mentioned in Table 11 from the gene database “Genbank” and loading of the corresponding amino acid sequences into the computer program “ClustalW Version 1.74”, (b) multiple alignment of these amino acid sequences with the sequence according to Table 8 and phylogenetic analysis of the data obtained by the neighbor-joining method by means of the computer program “ClustalW Version 1.74”, (c) visualization of the family tree data obtained as a family tree using a suitable presentation program; wherein the variant branches off along the distance from the end up to the first branching point from the branch of the family tree on the end of which SIM27-gag is located.
- 4. The Env protein of SIM27 or a variant thereof, which is investigated by the following process:
(a) extraction of the ENV portions of the sequences mentioned in Table 11 from the gene database “Genbank” and loading of the corresponding amino acid sequences into the computer program “ClustalW Version 1.74”, (b) multiple alignment of these amino acid sequences with the sequences according to Table 10 and phylogenetic analysis of the data obtained by the neighbor-joining method by means of the computer program “ClustalW Version 1.74”, (c) visualization of the family tree data obtained as a family tree using a suitable presentation program; wherein the variant branches off along the distance from the end up to the first branching point from the branch of the family tree on the end of which SIM27-env is located.
- 5. The POL protein of SIM27 or a variant thereof, which is investigated by the following process:
(a) extraction of the POL portions of the sequences mentioned in Table 11 from the gene database “Genbank” and loading of the corresponding amino acid sequences into the computer program “ClustalW Version 1.74”, (b) multiple alignment of these amino acid sequences with the sequences according to Table 9 and phylogenetic analysis of the data obtained by the neighbor-joining method by means of the computer program “ClustalW Version 1.74”, (c) visualization of the family tree data obtained as a family tree using a suitable presentation program; wherein the variant branches off along the distance from the end up to the first branching point from the branch of the family tree on the end of which SIM27-pol is located.
- 6. The use of a virus as claimed in claim 1 or of a protein as claimed in one or more of claims 3 to 5 for the detection of antibodies directed against an immunodeficiency virus in a sample.
Priority Claims (1)
Number | Date | Country | Kind
199 36 003.0 | Aug 1999 | DE |
Divisions (1)
 | Number | Date | Country
Parent | 09625972 | Jul 2000 | US
Child | 10364360 | Feb 2003 | US