Nucleotide sequence of the Mycoplasma genitalium genome, fragments thereof, and uses thereof

Information

  • Patent Application
  • Publication Number
    20030170663
  • Date Filed
    July 26, 2002
  • Date Published
    September 11, 2003
Abstract
The present invention provides the nucleotide sequence of the entire genome of Mycoplasma genitalium, SEQ ID NO: 1. The present invention further provides the sequence information stored on computer readable media, and computer-based systems and methods which facilitate its use. In addition to the entire genomic sequence, the present invention identifies protein encoding fragments of the genome, and identifies, by position relative to two (2) genes known to flank the origin of replication, any regulatory elements which modulate the expression of the protein encoding fragments of the Mycoplasma genitalium genome.
Description


REFERENCE TO SEQUENCE LISTING

[0003] This application refers to a “Sequence Listing” listed below, which is provided as an electronic document on two identical compact discs (CD-R), labeled “Copy 1” and “Copy 2.” These compact discs each contain the file “PB196P1D1.ST25.txt” (735,244 bytes, created on Jun. 24, 2002), which is hereby incorporated in its entirety herein.



FIELD OF THE INVENTION

[0004] The present invention relates to the field of molecular biology. The invention discloses compositions comprising the nucleotide sequence of Mycoplasma genitalium and fragments thereof, and their use in medical diagnostics, therapies and pharmaceutical development.



BACKGROUND OF THE INVENTION

[0005] Mycoplasmas are the smallest free-living bacterial organisms known (Colman, S. D. et al., Mol. Microbiol. 4:683-687 (1990)). Mycoplasmas are thought to have evolved from higher gram-positive bacteria through the loss of genetic material (Bailey, C. C. et al., J. Bacteriol. 176:5814-5819 (1994)). Mycoplasma genitalium (M. genitalium) is widely considered to be the smallest self-replicating biological system, as the molecular size of its genome has been shown to be only 570-600 kb (Pyle, L. E. et al., Nucleic Acids Res. 16(13):6015-6025 (1988); Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). All mycoplasmas lack a cell wall and have small genomes and a characteristically low G+C content (Razin, S., Microbiol. Rev. 49(4):419-455 (1985); Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). Some mycoplasmas, including M. genitalium, have a specialized codon usage, whereby UGA encodes tryptophan rather than serving as a stop codon (Inamine, J. M. et al., J. Bacteriol. 172:504-506 (1990); Tanaka, J. G. et al., Nucleic Acids Res. 19:6787-6792 (1991); Yamao, F. A. et al., Proc. Natl. Acad. Sci. USA 82:2306-2309 (1985)).
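The effect of the specialized codon usage described above can be illustrated with a minimal sketch. It assumes the standard genetic code as the baseline and changes only the TGA codon (UGA in RNA) to tryptophan; the names STANDARD_TABLE, MYCOPLASMA_TABLE and translate are illustrative and do not come from the present disclosure.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, listed for codons enumerated with
# the first base varying slowest and the third base fastest, in TCAG order.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Assumed mycoplasma-style table: identical except that TGA is read as Trp (W).
MYCOPLASMA_TABLE = dict(STANDARD_TABLE, TGA="W")


def translate(dna, table):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    fragment = "ATGTGATAA"
    print(translate(fragment, STANDARD_TABLE))    # M**
    print(translate(fragment, MYCOPLASMA_TABLE))  # MW*
```

Under the standard table the internal TGA would truncate the predicted protein after one residue, whereas the modified table reads through it, which is why coding regions from such organisms can be mis-annotated by tools that assume the standard code.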


[0006] Mycoplasmas are widely known to be significant pathogens of humans, animals, and plants (Bailey, C. C. et al., J. Bacteriol. 176:5814-5819 (1994)). The metabolic systems of mycoplasmas indicate that they are generally biosynthetically deficient and thus depend on the microenvironment of the host, characteristically adhering to host cells in order to obtain essential precursor molecules, e.g., amino acids, fatty acids and sterols (Baseman, J. B., 1987. Mycoplasma Cell Membranes, Vol. 20. The Plenum Press, New York, N.Y.).


[0007] In particular, M. genitalium, a newly discovered species, is a pathogenic etiological agent first isolated in 1980 from the urethras of human males infected with non-gonococcal urethritis (Tully, J. G. et al., Lancet 1:1288-1291 (1981); Tully, J. G., et al., Int. J. Syst. Bacteriol. 33:387-396 (1983)). M. genitalium has also been identified in specimens of pneumonia patients as a co-isolate of Mycoplasma pneumoniae (Baseman, J. B. et al., J. Clin. Microbiol. 26:2266-2269 (1988)). M. genitalium opportunistic infection has often been observed in individuals infected with human immunodeficiency virus type 1 (HIV-1) (Lo, S. -C. et al., Amer. J. Trop. Med. Hyg. 41:601-616 (1989); Sasaki, Y. et al., AIDS Res. Hum. Retrov. 9(8):775-780 (1993)). Mycoplasmas can also induce various cytokines, including tumor necrosis factor, which may enhance HIV replication (Chowdhury, I. H. et al., Biochem. Biophys. Res. Commun. 170:1365-1370 (1990)).


[0008] A high degree of amino acid homology exists between the attachment protein of M. genitalium and several aligned human Class II major histocompatibility complex (HLA) proteins, suggesting that M. genitalium infection may play an important role in triggering autoimmune mechanisms, thereby aggravating the immunodeficiency characteristics of acquired immune deficiency syndrome (AIDS) (Montagnier, L. et al., C.R. Acad. Sci. Paris 311(3):425-430 (1990); Root-Bernstein, R. S. et al., Res. Immunol. 142:519-523 (1991); Bisset, L. R. Autoimmunity 14:167-168 (1992)). A diagnostic immunoassay for detecting M. genitalium infection using monoclonal antibodies specific for some M. genitalium antigens has been developed (Baseman, J. B. et al., U.S. Pat. No. 5,158,870).


[0009] Due to its diminutive genomic size, M. genitalium provides a useful model for determining the minimum number of genes and protein products necessary for a host-independent existence. M. genitalium has a characteristically low number of base pairs and low G+C content, which, along with its UGA tryptophan codon, have hampered sequencing efforts by conventional techniques (Razin, S., Microbiol. Rev. 49(4):419-455 (1985); Colman, S. D. et al., Gene 87:91-96 (1990); Dybvig, K. 1992. Gene Transfer In: Maniloff, J. (ed.) Mycoplasmas: Molecular Biology and Pathogenesis., Am. Soc. Microbiol. Washington, D.C., pp.355-362). M. genitalium possesses a single circular chromosome (Colman, S. D. et al., Gene 87:91-96 (1990); Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). The characterization of the genome of M. genitalium has also been hampered by the lack of auxotrophic mutants and by the lack of a system for genetic exchange, precluding reverse genetic approaches. Thus, the sequencing of the M. genitalium genome would enhance the understanding of how M. genitalium causes or promotes various invasive or immunodeficiency diseases and of how best to medically combat mycoplasma infection.


[0010] Prior attempts at characterizing the structure and gene arrangement of the chromosomes of mycoplasmas using pulsed-field gel electrophoretic methods (Pyle, L. E. et al., Nucleic Acids Res. 16(13):6015-6025 (1988); Neimark, H. C. et al., Nucleic Acids Res. 18(18):5443-5448 (1990)) indicated that mycoplasmas have genomes ranging widely in size. Southern blot hybridization of digested DNAs of M. genitalium compared to the well-known human pathogen, M. pneumoniae, indicated overall low homology values of approximately 6-8% (Yogev, D. et al., Int. J. Syst. Bacteriol. 36(3):426-430 (1986)). However, high homologies have been reported between the adhesin genes of M. genitalium and M. pneumoniae (Dallo, S. F. et al., Microbial Path. 6:69-73 (1989)). Initial studies characterizing the genome of M. genitalium by comparison to the well-known M. pneumoniae species indicated that both species have three (3) rRNA genes, clustered together in a chromosomal segment of about 5 kb, that form a single operon organized in classical procaryotic fashion, but that differences exist between their respective restriction sites (Yogev, D. et al., Int. J. Syst. Bacteriol. 36(3):426-430 (1986)).


[0011] Restriction enzyme mapping of M. genitalium indicates that the genome is approximately 600 kb. Several genes have also been mapped, including the single ribosomal operon, and the gene encoding the MgPa cytadhesion protein (Su, C. J. et al., J. Bacteriol. 172:4705-4707 (1990); Colman, S. D. et al., Mol. Microbiol. 4(4):683-687 (1990)). The entire restriction map of the genome of M. genitalium has also been cloned in an ordered library of 20 overlapping cosmids and one λ clone (Lucier, T. S. et al., Gene 150:27-34 (1994)).


[0012] An initial study using random sequencing techniques to characterize the M. genitalium genome resulted in forty-four (44) random clones being partially sequenced; several long open reading frames were also found (Peterson, S. N. et al., Nucleic Acids Res. 19:6027-6031 (1991)). Subsequent work using random sequencing of 508 random nonidentical clones has allowed sequence information to be compiled for approximately seventeen percent (17%) (100,993 nucleotides) of the M. genitalium genome (Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)). Sequence information indicates that the diminutive genome of M. genitalium contains numerous genes involved in various metabolic processes. The genome is estimated to encode approximately 390 proteins, indicating that M. genitalium makes very efficient use of its limited amount of DNA (Peterson, S. N. et al., J. Bacteriol. 175:7918-7930 (1993)).
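The figures quoted above can be cross-checked with simple arithmetic. The following sketch uses only the numbers stated in this paragraph (100,993 nucleotides, approximately 17%, approximately 390 proteins); because the percentage is rounded, the results are rough estimates, and the function names are illustrative.

```python
def implied_genome_size(sequenced_nt, fraction_sequenced):
    """Genome size implied if sequenced_nt nucleotides make up the given fraction."""
    if not 0 < fraction_sequenced <= 1:
        raise ValueError("fraction_sequenced must be in (0, 1]")
    return sequenced_nt / fraction_sequenced


def mean_span_per_gene(genome_size, n_genes):
    """Average stretch of genome available per gene, intergenic DNA included."""
    return genome_size / n_genes


if __name__ == "__main__":
    size = implied_genome_size(100_993, 0.17)
    span = mean_span_per_gene(size, 390)
    print(f"implied genome size: {size:,.0f} bp")
    print(f"mean genome span per protein: {span:,.0f} bp")
```

The implied size of roughly 594 kb falls within the 570-600 kb range reported from physical mapping, and the mean span of about 1.5 kb per protein leaves little room for non-coding DNA, consistent with the statement that the organism makes efficient use of its genome.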


[0013] Several studies have been undertaken to sequence and characterize individual genes identified in M. genitalium. In particular, the medically important aspects of M. genitalium have helped to direct interest to those genes which determine the degree of infectivity and the virulence characteristics of the organism. The nucleotide sequence and deduced amino acid sequence for the MgPa adhesin gene, i.e., the gene encoding the surface cytadhesion protein of M. genitalium, indicates that the complete gene contains 4,335 nucleotides coding for a protein of 159,668 Da. (Dallo, S. F. et al., Infect. Immun. 57(4):1059-1065 (1989)). Furthermore, subsequent nucleotide sequencing of the M. genitalium MgPa adhesin gene revealed the specific codon order for this important gene (Inamine, J. M. et al., Gene 82:259-267 (1989)). The MgPa adhesin gene also has been shown to express restriction fragment length polymorphism (Dallo, S. F. et al., Microbial Path. 10:475-480 (1991)). Nucleotide homology to the well-known highly conserved procaryotic origin-of-replication gene (gyrA) was noted for M. genitalium (Bailey, C. C. et al., J. Bacteriol. 176:5814-5819 (1994)). The highly conserved procaryotic elongation factor, Tu, encoded by the tuf gene, has been noted and sequenced for M. genitalium, and was found to contain an open reading frame encoding a protein of approximately 393 amino acids (Loechel, S. et al., Nucleic Acids Res. 17(23):10127 (1989)). The tuf gene of M. genitalium has also been determined to use a signal other than a Shine-Dalgarno (ribosomal binding site) sequence preceding the initiation codon (Loechel, S. et al., Nucleic Acids Res. 19:6905-6911 (1991)).



SUMMARY OF THE INVENTION

[0014] The present invention is based on the sequencing of the Mycoplasma genitalium genome. The primary nucleotide sequence which was generated is provided in SEQ ID NO: 1.


[0015] The present invention provides the generated nucleotide sequence of the Mycoplasma genitalium genome, or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence depicted in SEQ ID NO: 1.


[0016] The present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1.


[0017] The nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence which is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.


[0018] The present invention further provides systems, particularly computer-based systems which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Mycoplasma genitalium genome.


[0019] Another embodiment of the present invention is directed to isolated fragments of the Mycoplasma genitalium genome. The fragments of the Mycoplasma genitalium genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Mycoplasma genitalium in a sample, hereinafter, diagnostic fragments (DFs).


[0020] Each of the ORF fragments of the Mycoplasma genitalium genome disclosed in Tables 1(a), 1(c) and 2, and the EMF found 5′ to the ORF, can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes or diagnostic amplification primers for the presence of a specific microbe in a sample, for the production of commercially important pharmaceutical agents, and to selectively control gene expression.


[0021] The present invention further includes recombinant constructs comprising one or more fragments of the Mycoplasma genitalium genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Mycoplasma genitalium genome has been inserted.


[0022] The present invention further provides host cells containing any one of the isolated fragments of the Mycoplasma genitalium genome of the present invention. The host cells can be a higher eukaryotic host such as a mammalian cell, a lower eukaryotic cell such as a yeast cell, or can be a procaryotic cell such as a bacterial cell.


[0023] The present invention is further directed to isolated proteins encoded by the ORFs of the present invention. A variety of methodologies known in the art can be utilized to obtain any one of the proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. In an alternative method, the protein is purified from bacterial cells which naturally produce the protein. Lastly, the proteins of the present invention can alternatively be purified from cells which have been altered to express the desired protein.


[0024] The invention further provides methods of obtaining homologs of the fragments of the Mycoplasma genitalium genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.


[0025] The invention further provides antibodies which selectively bind one of the proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.


[0026] The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.


[0027] The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.


[0028] In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays.


[0029] Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers, which comprises: (a) a first container comprising one of the antibodies, or one of the DFs of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, and reagents capable of detecting the presence of bound antibodies or hybridized DFs.


[0030] Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a protein encoded by one of the ORFs of the present invention. Specifically, such agents include antibodies (described above), peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise the steps of:


[0031] (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and


[0032] (b) determining whether the agent binds to said protein.


[0033] The complete genomic sequence of M. genitalium will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Mycoplasma genitalium genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Mycoplasma researchers, as well as of immediate commercial value for the production of proteins or the control of gene expression. A specific example concerns PHA synthase. It has been reported that polyhydroxybutyrate is present in the membranes of M. genitalium and that the amount correlates with the level of competence for transformation. The PHA synthase that synthesizes this polymer has been identified and sequenced in a number of bacteria, none of which are evolutionarily close to M. genitalium. This gene has yet to be isolated from M. genitalium by use of hybridization probes or PCR techniques. However, the genomic sequence of the present invention allows the identification of the gene by utilizing search means described below.


[0034] Developing the methodology and technology for elucidating the entire genomic sequence of bacterial and other small genomes has enhanced, and will continue to greatly enhance, the ability to analyze and understand chromosomal organization. In particular, sequenced genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to perform comparative genomics and molecular phylogeny.







BRIEF DESCRIPTION OF THE FIGURES

[0035] FIG. 1—EcoRI restriction map of the Mycoplasma genitalium genome.


[0036] FIG. 2—Block diagram of a computer system 102 that can be used to implement the computer-based systems of the present invention.


[0037] FIG. 3—Summary of the Mycoplasma genitalium sequencing project.


[0038] FIG. 4—A circular representation of the M. genitalium chromosome. Outer concentric circle: Coding regions on the plus strand for which a gene identification was made. Second concentric circle: Coding regions on the minus strand for which a gene identification was made. Third concentric circle: The direction of transcription on each strand of the chromosome is depicted as an arrow starting at the putative origin of replication. Fourth concentric circle: Coverage by cosmid and lambda clones. Nineteen cosmid clones and one lambda clone were sequenced from each end to confirm the overall structure of the genome. Fifth concentric circle: The locations of the single ribosomal operon and the 33 tRNAs. The clusters of tRNAs (trnA, trnB, trnC, trnD and trnE) are indicated by the letters A-E with the number of tRNAs in each cluster listed in parentheses. Sixth concentric circle: Location of the MgPa operon and MgPa repeat fragments.


[0039] FIGS. 5A-5R—Gene map of the M. genitalium genome. Predicted coding regions are shown on each strand. The rRNA operon and tRNA genes are shown as described in the Figure key. Gene identification numbers correspond to those in Table 6.


[0040] FIG. 6—Location of the MgPa repeats in the M. genitalium genome. The structure of the MgPa operon (ORF1-MgPa gene-ORF3) in the M. genitalium genome is illustrated across the top. In addition to the complete operon, nine repetitive elements which are composites of particular regions of the MgPa operon were found. The coordinates of each repeat in the genome are indicated on the left and right end of each line. The repetitive elements are located directly below those regions in the operon for which there is sequence similarity. The percent of sequence identity between the repeat elements and the MgPa gene ranges from 78%-90%. In some of the repeats, the MgPa-related sequences are separated in the genome by a variable length, A-T rich spacer sequence (indicated in the figure by a line with the length of the spacer indicated in bp). In cases where no spacer sequence is shown, the composites of the operon are co-linear in the genome. In repeats 7 and 9, the order of the sequences in the repeats differs from that in the operon. In these cases, the order of the elements in each repeat in the genome is indicated numerically where element 1 is followed by element 2 which is followed by element 3, etc.







DETAILED DESCRIPTION

[0041] The present invention is based on the sequencing of the Mycoplasma genitalium genome. The primary nucleotide sequence which was generated is provided in SEQ ID NO: 1. As used herein, the “primary sequence” refers to the nucleotide sequence represented by the IUPAC nomenclature system.


[0042] The sequence provided in SEQ ID NO: 1 is oriented relative to two genes (DNAA and DNA gyrase) known to flank the origin of replication of the Mycoplasma genitalium genome. A skilled artisan will readily recognize that this start/stop point was chosen for convenience and does not reflect a structural significance.
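Because the chromosome is circular, the choice of start point only changes where the linear record begins. A minimal sketch of that idea follows; the function names and the short example string are illustrative assumptions, and positions are 0-based.

```python
def rotate_circular(sequence, new_start):
    """Re-linearize a circular sequence so that index new_start becomes position 0."""
    if not sequence:
        return sequence
    k = new_start % len(sequence)
    return sequence[k:] + sequence[:k]


def same_circle(a, b):
    """True if two linear strings are rotations of one circular sequence (same strand)."""
    return len(a) == len(b) and b in a + a


if __name__ == "__main__":
    record = "GATTACA"
    rotated = rotate_circular(record, 3)
    print(rotated)                       # TACAGAT
    print(same_circle(record, rotated))  # True
```

Two records of the same circular molecule that start at different positions therefore carry the same information, which is the sense in which the start/stop point has no structural significance.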


[0043] The present invention provides the nucleotide sequence of SEQ ID NO: 1, or a representative fragment thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the sequence is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence provided in SEQ ID NO: 1.


[0044] As used herein, a “representative fragment of the nucleotide sequence depicted in SEQ ID NO: 1” refers to any portion of SEQ ID NO: 1 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Mycoplasma genitalium open reading frames, expression modulating fragments, uptake modulating fragments, and fragments which can be used to diagnose the presence of Mycoplasma genitalium in a sample. A non-limiting identification of such preferred representative fragments is provided in Tables 1(a), 1(c) and 2.


[0045] The nucleotide sequence information provided in SEQ ID NO: 1 was obtained by sequencing the Mycoplasma genitalium genome using a megabase shotgun sequencing method. The nucleotide sequence provided in SEQ ID NO: 1 is a highly accurate, although not necessarily a 100% perfect, representation of the nucleotide sequence of the Mycoplasma genitalium genome.


[0046] As discussed in detail below, using the information provided in SEQ ID NO: 1 and in Tables 1(a), 1(c) and 2 together with routine cloning and sequencing methods, one of ordinary skill in the art would be able to clone and sequence all “representative fragments” of interest including open reading frames (ORFs) encoding a large variety of Mycoplasma genitalium proteins. In very rare instances, this may reveal a nucleotide sequence error present in the nucleotide sequence disclosed in SEQ ID NO: 1. Thus, once the present invention is made available (i.e., once the information in SEQ ID NO: 1 and Tables 1(a), 1(c) and 2 has been made available), resolving a rare sequencing error in SEQ ID NO: 1 would be well within the skill of the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler™ can be used as an aid during visual inspection of nucleotide sequences.


[0047] Even if all of the very rare sequencing errors in SEQ ID NO: 1 were corrected, the resulting nucleotide sequence would still be at least 99.9% identical to the nucleotide sequence in SEQ ID NO: 1.


[0048] The nucleotide sequences of the genomes from different strains of Mycoplasma genitalium differ slightly. However, the nucleotide sequence of the genomes of all Mycoplasma genitalium strains will be at least 99.9% identical to the nucleotide sequence provided in SEQ ID NO: 1.


[0049] Thus, the present invention further provides nucleotide sequences which are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 in a form which can be readily used, analyzed and interpreted by the skilled artisan. Methods for determining whether a nucleotide sequence is at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 are routine and readily available to the skilled artisan. For example, the well-known FASTA algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988)) can be used to generate the percent identity of nucleotide sequences.
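For illustration only, the 99.9% threshold can be checked as follows once two sequences have already been aligned. This sketch is not the FASTA algorithm: it assumes a precomputed alignment of equal-length strings in which "-" marks a gap, and it assumes one particular definition of percent identity (identical columns divided by alignment length); other definitions use a different denominator.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding identical residues; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=99.9):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "ACGT" * 500               # 2,000 aligned columns
    one_change = "T" + reference[1:]       # a single substitution
    print(percent_identity(reference, one_change))  # 99.95
    print(meets_threshold(reference, one_change))   # True
```

At a 99.9% threshold, at most one column in every thousand may differ, so a 2,000-column alignment tolerates two differences under this definition.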


[0050] Computer Related Embodiments


[0051] The nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 may be “provided” in a variety of mediums to facilitate use thereof. As used herein, “provided” refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1. Such a manufacture provides the Mycoplasma genitalium genome or a subset thereof (e.g., a Mycoplasma genitalium open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Mycoplasma genitalium genome or a subset thereof as it exists in nature or in purified form.


[0052] In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, “computer readable media” refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention.


[0053] As used herein, “recorded” refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention.


[0054] A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data processor structuring formats (e.g. text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
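As one possible instance of the ASCII text representation mentioned above, a sequence can be written as a header line followed by fixed-width sequence lines and read back without loss. The FASTA-style layout used here is an assumption chosen for illustration (the paragraph above does not name a particular text layout), and the function names are hypothetical.

```python
def format_fasta(name, sequence, width=60):
    """Render one sequence as '>' header plus sequence lines of at most `width` characters."""
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + name] + body) + "\n"


def parse_fasta(text):
    """Parse FASTA-style text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    text = format_fasta("example_fragment", "ACGT" * 40)
    print(text, end="")
    assert parse_fasta(text)["example_fragment"] == "ACGT" * 40
```

A plain-text layout of this kind is simple to inspect and exchange, whereas a database application is better suited to indexed lookups; as the paragraph notes, the choice generally follows from how the stored information will be accessed.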


[0055] By providing the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 in computer readable form, a skilled artisan can routinely access the sequence information for a variety of purposes. Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Mycoplasma genitalium genome which contain homology to ORFs or proteins from other organisms. Such ORFs are protein encoding fragments within the Mycoplasma genitalium genome and are useful in producing commercially important proteins such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
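Before any homology search, candidate protein encoding fragments can be enumerated directly from the sequence. The following is a simplified sketch of such a scan and is not the BLAST- or BLAZE-based procedure referred to above: it assumes ATG as the only start codon, scans one strand only, and by default treats only TAA and TAG as stop codons because UGA encodes tryptophan in M. genitalium. All names are illustrative.

```python
MYCOPLASMA_STOPS = ("TAA", "TAG")        # TGA excluded: read as Trp in this organism
STANDARD_STOPS = ("TAA", "TAG", "TGA")   # shown for comparison only


def find_orfs(dna, min_codons=3, stops=MYCOPLASMA_STOPS):
    """Return (start, end, frame) for ATG-initiated ORFs on the given strand.

    start is 0-based, end is exclusive and includes the stop codon, and
    min_codons counts the codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i
            elif orf_start is not None and codon in stops:
                if (i - orf_start) // 3 >= min_codons:
                    orfs.append((orf_start, i + 3, frame))
                orf_start = None
    return orfs


if __name__ == "__main__":
    fragment = "CCATGAAATGATTTTAACC"
    print(find_orfs(fragment))                        # [(2, 17, 2)]
    print(find_orfs(fragment, stops=STANDARD_STOPS))  # []
```

In the example, the in-frame TGA ends the reading frame after two codons under the standard stop set, so no ORF of the minimum length is reported, while the mycoplasma-style stop set reads through it to the TAA. The minus strand could be covered by running the same scan on the reverse complement.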


[0056] The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the Mycoplasma genitalium genome.


[0057] As used herein, “a computer-based system” refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.


[0058] As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means. As used herein, “data storage means” refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.


[0059] As used herein, “search means” refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the Mycoplasma genitalium genome which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly and a variety of commercially available software for conducting search means are available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.
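The simplest possible search means is an exact-match scan of the stored sequence on both strands. The sketch below is offered only to make the concept concrete; it is far less sensitive than the homology search programs named above, it treats the stored sequence as linear (matches spanning the start/stop point of a circular record are not found), and its function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Yield 0-based start positions of all exact matches, overlapping ones included."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def search_both_strands(genome, target):
    """Return (strand, start) hits; start is the leftmost plus-strand coordinate."""
    if not target:
        raise ValueError("target sequence must not be empty")
    genome, target = genome.upper(), target.upper()
    hits = [("+", pos) for pos in find_all(genome, target)]
    rc = reverse_complement(target)
    if rc != target:  # a palindromic target would otherwise be reported twice
        hits += [("-", pos) for pos in find_all(genome, rc)]
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    stored = "AAGATTACATTTGTAATCGG"
    print(search_both_strands(stored, "GATTACA"))  # [('+', 2), ('-', 11)]
```

A practical system would replace the exact-match step with an alignment-based program so that diverged sequences are still detected.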


[0060] As used herein, a “target sequence” can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that searches for commercially important fragments of the Mycoplasma genitalium genome, such as sequence fragments involved in gene expression and protein processing, may be of shorter length.
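By way of illustration only, the relationship between target length and chance matches can be estimated with a rough calculation. The sketch below assumes independent, equally likely residues (an idealization, since the Mycoplasma genitalium genome has a low G+C content) and a database length of 580,000 nucleotides, a round figure chosen from within the 570-600 kb range cited in the Background. The function name and parameters are hypothetical.

```python
def expected_random_hits(target_len, db_len, alphabet_size=4, strands=2):
    """Expected number of chance occurrences of one fixed target sequence.

    Assumes every residue is independent and equally likely, which is an
    idealization rather than a property of any real genome.
    """
    positions = max(db_len - target_len + 1, 0)
    return strands * positions * alphabet_size ** (-target_len)


GENOME_LEN = 580_000  # assumed round figure, not the exact genome length

for n in (6, 10, 15, 30):
    print(n, expected_random_hits(n, GENOME_LEN))
```

Under these assumptions a six-nucleotide target is expected to occur by chance hundreds of times, whereas a 30-nucleotide target is expected essentially never, which is consistent with the preference for longer targets stated above.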


[0061] As used herein, “a target structural motif,” or “target motif,” refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymatic active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).


[0062] A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the Mycoplasma genitalium genome possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
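A ranked output of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the search means actually used: it scores only ungapped windows by percent identity, whereas programs such as BLAST use heuristic, statistically scored alignments. The function name and the toy sequence are hypothetical.

```python
def rank_fragments(genome, target, top=5):
    """Score every ungapped window of the genome against the target and
    return the best windows, highest percent identity first."""
    n = len(target)
    hits = []
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        matches = sum(a == b for a, b in zip(window, target))
        # Store (percent identity, 1-based start position, matched window).
        hits.append((100.0 * matches / n, start + 1, window))
    # Sort by identity descending, then by position ascending.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top]


toy_genome = "ATGAAATTTGGGTAAATGAAATTCGGG"  # made-up sequence, not SEQ ID NO: 1
for identity, pos, window in rank_fragments(toy_genome, "AAATTTGGG", top=3):
    print(f"{identity:5.1f}%  position {pos}  {window}")
```

Each output row identifies a fragment, its position, and its degree of identity to the target, so that a reader can see at a glance which fragments contain the target in full and which contain it only in part.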


[0063] A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Mycoplasma genitalium genome. In the present examples, software implementing the BLAST and BLAZE algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) was used to identify open reading frames within the Mycoplasma genitalium genome. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.


[0064] One application of this embodiment is provided in FIG. 2. FIG. 2 provides a block diagram of a computer system 102 that can be used to implement the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable storage medium 116 once it is inserted in the removable medium storage device 114.


[0065] A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. Software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 108 during execution.


[0066] Biochemical Embodiments


[0067] Another embodiment of the present invention is directed to isolated fragments of the Mycoplasma genitalium genome. The fragments of the Mycoplasma genitalium genome of the present invention include, but are not limited to fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), fragments which mediate the uptake of a linked DNA fragment into a cell, hereinafter uptake modulating fragments (UMFs), and fragments which can be used to diagnose the presence of Mycoplasma genitalium in a sample, hereinafter diagnostic fragments (DFs).


[0068] As used herein, an “isolated nucleic acid molecule” or an “isolated fragment of the Mycoplasma genitalium genome” refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition. A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to, methods which separate constituents of a solution based on charge, solubility, or size.


[0069] In one embodiment, Mycoplasma genitalium DNA can be mechanically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Mycoplasma genitalium library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF provided in Table 1(a), 1(c) or 2 can then be generated using nucleotide sequence information provided in SEQ ID NO: 1. PCR cloning can then be used to isolate the ORF from the lambda DNA library. PCR cloning is well known in the art. Thus, given the availability of SEQ ID NO: 1, Table 1(a), 1(c) and Table 2, it would be routine to isolate any ORF or other representative fragment of the present invention.


[0070] The isolated nucleic acid molecules of the present invention include, but are not limited to single stranded and double stranded DNA, and single stranded RNA.


[0071] As used herein, an “open reading frame,” ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein. Tables 1(a), 1(b), 1(c) and 2 identify ORFs in the Mycoplasma genitalium genome. In particular, Table 1(a) indicates the location of ORFs (i.e., the addresses) within the Mycoplasma genitalium genome which encode the recited protein based on homology matching with protein sequences from the organism appearing in parentheticals (see the fifth column of Table 1(a)).
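The definition of an ORF as a series of triplets without termination codons can be illustrated with a short sketch. It assumes, consistent with the codon usage noted in the Background, that TGA encodes tryptophan, so that only TAA and TAG terminate a run. The sketch is illustrative only: it scans a single strand, does not require a start codon, and is not the procedure by which the ORFs of the Tables were identified. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG"}  # TGA is treated as tryptophan, not as a stop


def stop_free_runs(seq, min_codons=3):
    """Yield (start, end, frame) for each maximal run of complete triplets
    containing no termination codon. Coordinates are 1-based and inclusive
    on the strand supplied; frame is 1, 2 or 3."""
    seq = seq.upper()
    for frame in range(3):
        # Index just past the last complete triplet in this frame.
        last = frame + 3 * ((len(seq) - frame) // 3)
        run_start = frame
        for i in range(frame, last + 1, 3):
            if i == last or seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    yield (run_start + 1, i, frame + 1)
                run_start = i + 3


toy = "ATGAAATGATTTTAAGGG"  # made-up sequence for illustration
print(list(stop_free_runs(toy)))
```

In the toy sequence the first frame reads through the internal TGA and ends only at TAA, which would not be the case under the standard genetic code.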


[0072] The first column of Table 1(a) provides the “UID” (an arbitrary identification number) of a particular ORF. The second and third columns in Table 1(a) indicate an ORF's position in the nucleotide sequence provided in SEQ ID NO: 1. One of ordinary skill in the art will recognize that ORFs may be oriented in opposite directions in the Mycoplasma genitalium genome. This is reflected in columns 2 and 3.
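One way to recover an ORF from its two coordinates is sketched below. The sketch rests on an assumption that is not stated explicitly in this description: that a value in column 2 greater than the value in column 3 indicates an ORF on the complementary strand. The function name is hypothetical and the sequence is a toy example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_orf(genome, col2, col3):
    """Return the ORF read 5' to 3' from 1-based, inclusive coordinates.

    Assumption: col2 > col3 signals an ORF on the complementary strand,
    in which case the reverse complement of the span is returned.
    """
    lo, hi = min(col2, col3), max(col2, col3)
    fragment = genome[lo - 1:hi]
    if col2 > col3:
        fragment = fragment.translate(_COMPLEMENT)[::-1]
    return fragment


toy = "AAATGCCCTTT"  # made-up sequence, not SEQ ID NO: 1
print(extract_orf(toy, 3, 6), extract_orf(toy, 6, 3))
```

If the Tables encode orientation differently, only the comparison of the two columns would need to change.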


[0073] The fourth column of Table 1(a) provides the accession number of the database match for the ORF. As indicated above, the fifth column of Table 1(a) provides the name of the database match for the ORF.


[0074] The sixth column of Table 1(a) indicates the percent identity of the protein encoded for by an ORF to the corresponding protein from the organism appearing in parentheticals in the fifth column. The seventh column of Table 1(a) indicates the percent similarity of the protein encoded for by an ORF to the corresponding protein from the organism appearing in parentheticals in the fifth column. The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were “similar” (i.e., possessed similar biochemical characteristics). The eighth column in Table 1(a) indicates the length of the ORF in nucleotides.
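The worked example above can be reproduced with a short sketch. The grouping of “similar” residues below is a hypothetical classification chosen only for illustration; this description does not specify which residues are treated as similar, and homology search programs generally rely on scoring matrices rather than fixed groups. The two peptides are invented so that they differ at positions 1, 3 and 5, with a conservative change at position 5.

```python
# Hypothetical similarity classes, for illustration only.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def percent_identity_similarity(a, b):
    """Return (percent identity, percent similarity) for two aligned,
    equal-length polypeptide sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(
        x == y or any(x in g and y in g for g in SIMILAR_GROUPS)
        for x, y in zip(a, b)
    )
    return 100.0 * identical / len(a), 100.0 * similar / len(a)


# Differences at positions 1 (A/G), 3 (D/K) and 5 (I/L); only I/L is "similar".
print(percent_identity_similarity("ACDEIGHKLM", "GCKELGHKLM"))
```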


[0075] Table 1(b) is a list of ORFs that have database matches to previously published Mycoplasma genitalium sequences over the full length of the ORF. The table headings for Table 1(b) are identical to those for Table 1(a) with the following two exceptions: (I) The heading for the eighth column in Table 1(a) (i.e., nucleotide length of the ORF) has been replaced with the following in Table 1(b): “Match_info”. “Match_info” refers to the coordinates of the match of the ORF and the previously published Mycoplasma genitalium sequence. For example, “MG002 (1-930 of 930) GB:U09251 (298-1227 of 6140),” indicates that for ORF MG002, which is 930 nucleotides in length, there is a database match to accession number GB:U09251, which has a total length of 6140 nucleotides. The ORF matches this accession from position 298 to 1227.
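A “Match_info” entry of the form shown in the example can be parsed as sketched below. The pattern is inferred from that single example only; actual table entries may vary in spacing or punctuation, and the field names are hypothetical.

```python
import re

# Pattern inferred from the one example given in the text.
MATCH_INFO = re.compile(
    r"(?P<orf>\S+)\s+\((?P<orf_from>\d+)-(?P<orf_to>\d+) of (?P<orf_len>\d+)\)\s+"
    r"(?P<acc>\S+)\s+\((?P<acc_from>\d+)-(?P<acc_to>\d+) of (?P<acc_len>\d+)\)"
)


def parse_match_info(text):
    """Split a Match_info entry into ORF and accession coordinates."""
    m = MATCH_INFO.search(text)
    if m is None:
        raise ValueError(f"unrecognized Match_info entry: {text!r}")
    # Convert purely numeric fields to integers; keep identifiers as text.
    return {k: int(v) if v.isdigit() else v for k, v in m.groupdict().items()}


info = parse_match_info("MG002 (1-930 of 930) GB:U09251 (298-1227 of 6140)")
print(info)
# The matched span on the accession covers the same number of nucleotides
# as the full-length ORF.
print(info["acc_to"] - info["acc_from"] + 1 == info["orf_len"])
```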


[0076] (II) Where an ORF shows homology matches for both a previously published Mycoplasma genitalium sequence and a previously published sequence from a different organism, columns 3, 4, 5, and 6 of Table 1(b) respectively provide the accession number, protein name (and organism in parentheticals), percent identity and percent similarity for the “other organism,” rather than for the previously published Mycoplasma genitalium sequence. (However, in this scenario, the accession number for the Mycoplasma genitalium sequence is still provided in column 8.)


[0077] Table 1(c) provides ORFs having database matches to previously published Mycoplasma genitalium sequences but only over a portion of the ORF. The table headings are the same as above for Table 1(b).


[0078] In Tables 1(a), 1(b) and 1(c), unique identifiers are used to identify the recited ORFs, (e.g., “MG123”). In the parent U.S. application Ser. Nos. 08/488,018 and 08/473,545, the recited ORFs are identified using the “MORF” identifier. Table 1(d) lists which of the new and old identifiers refer to the same ORF. For example, the first entry in Table 1(d) indicates that the ORF identified as MG001 in the current application is the same ORF which was previously identified as MORF-20072 in parent U.S. application Ser. Nos. 08/488,018 and 08/473,545. Similarly, the third entry in Table 1(d) indicates that the ORF identified as MG003 in the current application is the same ORF which was previously identified as MORF-19818 and MORF-20073 in the parent applications.


[0079] Table 2 provides ORFs of the Mycoplasma genitalium genome which did not elicit a “homology match” with a known sequence from either M. genitalium or another organism.


[0080] Table 6 classifies each ORF according to its role category (adapted from Riley, M., Microbiol. Rev. 57:862 (1992)). The gene identification, the accession number from public archives that corresponds to the best match, the percent amino acid identity, and the length of the match in amino acids are also listed for each entry, as above in Tables 1 (a-c). Those genes in M. genitalium that also match a gene in H. influenzae are indicated by an asterisk (*). For the purposes of Tables 6 and 7 and FIG. 4, each of the MgPa repetitive elements has been assigned an MG number, even though there is evidence to suggest that these repeats may not be transcribed.


[0081] Table 7 sorts the gene content in H. influenzae and M. genitalium by functional category. The number of genes in each category is listed for each organism. The number in parentheses indicates the percent of the putatively identified genes devoted to each functional category. For the category of unassigned genes, the percent of the genome indicated in parentheses represents the percent of the total number of putative coding regions.


[0082] Further details concerning the algorithms and criteria used for homology searches are provided in the Examples below.


[0083] A skilled artisan can readily identify ORFs in the Mycoplasma genitalium genome other than those listed in Tables 1(a), 1(b), 1(c) and 2, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF in addition to those ascertainable using the computer-based systems of the present invention.


[0084] As used herein, an “expression modulating fragment,” EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.


[0085] As used herein, a sequence is said to “modulate the expression of an operably linked sequence” when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters, and promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event. Reviews of known EMFs from Mycoplasma are provided by Tomb et al., Gene 104:1-10 (1991), and Chandler, M. S., Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992).


[0086] EMF sequences can be identified within the Mycoplasma genitalium genome by their proximity to the ORFs provided in Tables 1(a), 1(b), 1(c) and 2. An intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, taken 5′ from any one of the ORFs of Tables 1(a), 1(b), 1(c) or 2 will modulate the expression of an operably linked 3′ ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an “intergenic segment” refers to the fragments of the Mycoplasma genome which are between two ORF(s) herein described. Alternatively, EMFs can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention.
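Selection of a candidate intergenic segment by proximity can be sketched as follows. The sketch handles only an ORF on the forward strand, takes the coordinates of the ORF and of the preceding ORF as given, and returns a candidate only; whether the segment actually modulates expression would still have to be confirmed experimentally. The function name and the toy sequence are hypothetical.

```python
def upstream_candidate(genome, orf_start, prev_orf_end, max_len=200, min_len=10):
    """Return the intergenic stretch immediately 5' of a forward-strand ORF,
    capped at max_len nucleotides, or None if fewer than min_len nucleotides
    separate the two ORFs. Coordinates are 1-based and inclusive."""
    seg_end = orf_start - 1
    seg_start = max(prev_orf_end + 1, seg_end - max_len + 1)
    if seg_end - seg_start + 1 < min_len:
        return None
    return genome[seg_start - 1:seg_end]


# Toy layout: one ORF ends at 50, the next starts at 81, leaving 30 nucleotides.
toy = "A" * 50 + "C" * 30 + "G" * 60
print(upstream_candidate(toy, orf_start=81, prev_orf_end=50))
```

The default bounds of 10 and 200 nucleotides mirror the approximate range given above; both are parameters so that shorter or longer segments can be examined.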


[0087] The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a cloning site 5′ to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided below.


[0088] A sequence which is suspected as being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate host using known procedures and the phenotype of the transformed host is examined under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence.


[0089] As used herein, an “uptake modulating fragment,” UMF, means a series of nucleotide molecules which mediate the uptake of a linked DNA fragment into a cell. UMFs can be readily identified using known UMFs as a target sequence or target motif with the computer-based systems described above.


[0090] The presence and activity of a UMF can be confirmed by attaching the suspected UMF to a marker sequence. The resulting nucleic acid molecule is then incubated with an appropriate host under appropriate conditions and the uptake of the marker sequence is determined. As described above, a UMF will increase the frequency of uptake of a linked marker sequence. A review of DNA uptake in Mycoplasma is provided by Goodgall, S. H., et al., J. Bact. 172:5924-5928 (1990).


[0091] As used herein, a “diagnostic fragment,” DF, means a series of nucleotide molecules which selectively hybridize to Mycoplasma genitalium sequences. DFs can be readily identified by identifying unique sequences within the Mycoplasma genitalium genome, or by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity.


[0092] The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 with a sequence from another isolate of the same species. Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated.


[0093] Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequence both strands). Alternatively, error screening can be performed by sequencing corresponding polynucleotides of Mycoplasma genitalium origin isolated by using part or all of the fragments in question as a probe or primer.


[0094] Each of the ORFs of the Mycoplasma genitalium genome disclosed in Tables 1(a), 1(b), 1(c) and 2, and the EMF found 5′ to the ORF, can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe, such as Mycoplasma genitalium, in a sample. This is especially the case with the fragments or ORFs of Table 2, which will be highly selective for Mycoplasma genitalium.


[0095] In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Fla. (1988)).


[0096] Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide.


[0097] The present invention further provides recombinant constructs comprising one or more fragments of the Mycoplasma genitalium genome of the present invention. The recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Mycoplasma genitalium genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including for example, a promoter, operably linked to the ORF. For vectors comprising the EMFs and UMFs of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF or UMF. Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).


[0098] Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.


[0099] The present invention further provides host cells containing any one of the isolated fragments of the Mycoplasma genitalium genome of the present invention, wherein the fragment has been introduced into the host cell using known transformation methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a procaryotic cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis, L. et al., Basic Methods in Molecular Biology (1986)).


[0100] The host cells containing one of the fragments of the Mycoplasma genitalium genome of the present invention, can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF.


[0101] The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. By “degenerate variant” is intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence. Preferred nucleic acid fragments of the present invention are the ORFs depicted in Tables 1(a), 1(c) and 2.
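Whether one nucleotide fragment is a degenerate variant of another can be checked by translating both and comparing the results, as sketched below. The codon table is the standard genetic code with one change, TGA read as tryptophan, reflecting the codon usage noted in the Background; the function names and example sequences are hypothetical.

```python
from itertools import product

_BASES = "TCAG"
# Standard genetic code in TCAG order, except that TGA encodes W (tryptophan).
_AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(orf):
    """Translate complete triplets of a DNA sequence; '*' marks a stop."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(a, b):
    """True if the sequences differ but encode the same polypeptide."""
    return a.upper() != b.upper() and translate(a) == translate(b)


# TGG and TGA both encode tryptophan here; AAA and AAG both encode lysine.
print(is_degenerate_variant("ATGTGGAAA", "ATGTGAAAG"))
```

A host that reads TGA as a stop codon would not translate such variants identically, so the comparison is meaningful only under the codon usage assumed in the table.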


[0102] A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Fragments are useful, for example, in generating antibodies against the native polypeptide. In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein. One skilled in the art can readily follow known methods for isolating polypeptides and proteins in order to obtain one of the isolated polypeptides or proteins of the present invention. These include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity chromatography.


[0103] The polypeptides and proteins of the present invention can alternatively be purified from cells which have been altered to express the desired polypeptide or protein. As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level. One skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides or proteins of the present invention.


[0104] Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.


[0105] “Recombinant,” as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems. “Microbial” refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, “recombinant microbial” defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.


[0106] “Nucleotide sequence” refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Mycoplasma genitalium genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.


[0107] “Recombinant expression vehicle or vector” refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, (2) a structural or coding sequence which is transcribed into mRNA and translated into protein, and (3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.


[0108] “Recombinant expression system” means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.


[0109] Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., in Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y. (1989), the disclosure of which is hereby incorporated by reference.


[0110] Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.


[0111] Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, if desirable, to provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice.


[0112] As a representative but nonlimiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (Promega Biotec, Madison, Wis., USA). These pBR322 “backbone” sections are combined with an appropriate promoter and the structural sequence to be expressed.


[0113] Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is derepressed by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period. Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.


[0114] Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5′ flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements.


[0115] Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.


[0116] The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described. As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity, and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.


[0117] The invention further provides methods of obtaining, from other strains of Mycoplasma genitalium, homologs of the fragments of the Mycoplasma genitalium genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Mycoplasma genitalium is defined as a homolog of a fragment of the Mycoplasma genitalium genome or a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Mycoplasma genitalium genome of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.


[0118] As used herein, two nucleic acid molecules or proteins are said to “share significant homology” if the two contain regions which possess greater than 85% sequence (amino acid or nucleic acid) homology.
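The 85% threshold above can be illustrated with a short calculation. The following is a minimal sketch, assuming that "homology" is measured as percent identity over a pre-aligned region of equal length; that reading, the function names, and the example sequences are assumptions made for illustration and are not defined in this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns whose residues are identical.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns in which both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    # A gap opposite a residue counts as a mismatch.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def shares_significant_homology(aligned_a: str, aligned_b: str,
                                threshold: float = 85.0) -> bool:
    """True when identity over the aligned region is greater than the threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For example, two aligned 10-residue regions that differ at one position are 90% identical and pass the test, while regions that are exactly 85% identical do not, because the definition requires greater than 85%.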


[0119] Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog using known methods (Innis et al., PCR Protocols, Academic Press, San Diego, Calif. (1990)).


[0120] When using primers derived from SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60° C.) only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., annealing at 35-37° C.), sequences which are greater than 40-50% homologous to the primer will also be amplified.
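Choosing an annealing temperature within the ranges given above usually starts from an estimate of the primer's melting temperature. The following is a minimal sketch, assuming the Wallace rule (2° C. per A or T, 4° C. per G or C), a common rule of thumb for short oligonucleotides that is not taken from this disclosure; the function name and example primers are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (degrees C) of a short DNA primer.

    Uses the Wallace rule: Tm = 2 * (A + T) + 4 * (G + C).
    This is only an approximation intended for short oligonucleotides.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


# A 20-mer with 50% G+C versus an A+T-only 20-mer (hypothetical primers).
balanced = wallace_tm("ATGCATGCATGCATGCATGC")
at_rich = wallace_tm("ATTTAAATTTAAATTTAAAT")
```

Under this rule, primers drawn from a genome with low G+C content tend to have lower estimated melting temperatures than G+C-balanced primers of the same length, which is one reason the annealing temperature is chosen per primer.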


[0121] When using DNA probes derived from SEQ ID NO: 1 or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1 for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65° C. in 5× SSC and 50% formamide, and washing at 50-65° C. in 0.5× SSC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37° C. in 5× SSC and 40-45% formamide, and washing at 42° C. in SSC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.


[0122] Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Mycoplasma genitalium.


[0123] Uses for the Compositions of the Invention


[0124] Each ORF provided in Tables 1(a), 1(b) and 1(c) was assigned to biological role categories adapted from Riley, M., Microbiology Reviews 57(4):862 (1993). This allows the skilled artisan to determine a use for each identified coding sequence. Tables 1(a), 1(b) and 1(c) further provide an identification of the type of polypeptide which is encoded by each ORF. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide.


[0125] Such identifications permit one skilled in the art to use the Mycoplasma genitalium ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. (For a review of enzymes used within the commercial industry, see Biochemical Engineering and Biotechnology Handbook 2nd, eds. Macmillan Publ. Ltd., NY (1991) and Biocatalysts in Organic Syntheses, ed. J. Tramper et al., Elsevier Science Publishers, Amsterdam, The Netherlands (1985)).


[0126] 1. Biosynthetic Enzymes


[0127] Open reading frames encoding proteins involved in mediating the catalytic reactions involved in intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used for industrial biosynthesis. These include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis. The various metabolic pathways present in Mycoplasma can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Tables 1(a), 1(b) and 1(c).


[0128] Identified within the category of intermediary metabolism, a number of the proteins encoded by the identified ORFs in Tables 1(a), 1(b) and 1(c) are particularly involved in the degradation of intermediary metabolites as well as non-macromolecular metabolism. Some of the enzymes identified include amylases, glucose oxidases, and catalase.


[0129] Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a number of industrial processes including the processing of flax and other vegetable fibers, the extraction, clarification and depectinization of fruit juices, the extraction of vegetable oil and the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided by Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in Biocatalyst in Agricultural Biotechnology, edited by J. R. Whitaker et al., American Chemical Society Symposium Series 389:93 (1989).


[0130] The metabolism of glucose, galactose, fructose and xylose is an important part of the primary metabolism of Mycoplasma. Enzymes involved in the degradation of these sugars can be used in industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase. Other metabolic enzymes have found commercial use, such as glucose oxidases, which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using the Reichstein procedure (see Krueger et al., Biotechnology 6(A), Rhine, H. J. et al., eds., Verlag Press, Weinheim, Germany (1984)).


[0131] Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer. See Hartmeir et al., Biotechnology Letters 1:21 (1979). The most important application of GOD is the industrial scale fermentation of gluconic acid. There is a market for gluconic acids, which are used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries (see Bigelis in Gene Manipulations and Fungi, Benett, J. W. et al., eds., Academic Press, New York (1985), p. 357). In addition to industrial applications, GOD has found applications in medicine for the quantitative determination of glucose in body fluids and, recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. See Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986).


[0132] The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Mass. (1990)). Today, the use of glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).


[0133] Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977) and Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)).


[0134] Another class of commercially usable proteins of the present invention is the microbial lipases identified in Tables 1(a), 1(b) and 1(c) (see Macrae et al., Philosophical Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemist Society 61:1758 (1984)). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of washing procedures.


[0135] The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates. Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Fla. (1990)). The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles, esterification reactions, transesterification reactions, synthesis of amides, reduction of alkanones and oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming reactions such as the aldol reaction.


[0136] When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis, it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an isolated partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.


[0137] Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantage of using microbial based enzyme systems is that the amino transferase enzymes catalyze the stereo-selective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods of Enzymology 136:479 (1987).


[0138] 2. Generation of Antibodies


[0139] As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.


[0140] The invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.


[0141] In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of producing the desired antibody are well known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 (1975)). Such techniques include the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et al., in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985), pp. 77-96).


[0142] Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.


[0143] The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen with a heterologous protein (such as globulin or β-galactosidase) or including an adjuvant during immunization.


[0144] For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.


[0145] Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).


[0146] Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).


[0147] Techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.


[0148] For polyclonal antibodies, antibody containing antiserum is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.


[0149] The present invention further provides the above-described antibodies in detectably labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labeling are well known in the art; for example, see Sternberger, L. A. et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W. J. Immunol. Meth. 13:215 (1976).


[0150] The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Mycoplasma genitalium genome is expressed.


[0151] The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, and acrylic resins such as polyacrylamide and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., “Handbook of Experimental Immunology” 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34 Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.


[0152] 3. Diagnostic Assays and Kits


[0153] The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or homolog thereof, in a test sample, using one of the DFs or antibodies of the present invention.


[0154] In detail, such methods comprise incubating a test sample with one or more of the antibodies or one or more of the DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.


[0155] Conditions for incubating a DF or antibody with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, Fla. Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).


[0156] The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.


[0157] In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention.


[0158] Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprise: (a) a first container comprising one of the DFs or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, and reagents capable of detecting the presence of a bound DF or antibody.


[0159] In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody or DF.


[0160] Types of detection reagents include labeled nucleic acid probes, labeled secondary antibodies, or in the alternative, if the primary antibody is labeled, the enzymatic, or antibody binding reagents which are capable of reacting with the labeled antibody. One skilled in the art will readily recognize that the disclosed DFs and antibodies of the present invention can be readily incorporated into one of the established kit formats which are well known in the art.


[0161] 4. Screening Assay for Binding Agents


[0162] Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the fragments of the Mycoplasma genome herein described.


[0163] In detail, said method comprises the steps of:


[0164] (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Mycoplasma genome; and


[0165] (b) determining whether the agent binds to said protein or said fragment.


[0166] The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed using protein modeling techniques.


[0167] For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.


[0168] Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be “rationally selected or designed” when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides, for example see Hurby et al., “Application of Synthetic Peptides: Antisense Peptides,” In Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspezak et al., Biochemistry 28:9230-8 (1989), or pharmaceutical agents, or the like.


[0169] In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or multiple ORFs which rely on the same EMF for expression control.


[0170] One class of DNA binding agents comprises agents which contain base residues which hybridize or form a triple helix by binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.


[0171] Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Fla. (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide and other DNA binding agents.


[0172] Agents which bind to a protein encoded by one of the ORFs of the present invention can be used as a diagnostic agent or in the control of bacterial infection by modulating the activity of the protein encoded by the ORF. Agents which bind to a protein encoded by one of the ORFs of the present invention can be formulated using known techniques to generate a pharmaceutical composition for use in controlling Mycoplasma growth and infection.


[0173] 5. Vaccine and Pharmaceutical Composition


[0174] The present invention further provides pharmaceutical agents which can be used to modulate the growth of Mycoplasma genitalium, or another related organism, in vivo or in vitro. As used herein, a “pharmaceutical agent” is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the “pharmaceutical agents of the present invention” refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.


[0175] As used herein, a pharmaceutical agent is said to “modulate the growth of Mycoplasma sp., or a related organism, in vivo or in vitro,” when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth by binding to an important protein, thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more prone to attack by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of a vaccine based on outer membrane components, such as the LPS, are well known in the art.


[0176] As used herein, a “related organism” is a broad term which refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As such, related organisms do not need to be bacterial but may be fungal or viral pathogens.


[0177] The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount which is effective for treatment and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 10 μg/kg body weight and in most cases they will be administered in an amount not in excess of about 8 mg/kg body weight per day. In most cases, the dosage is from about 10 μg/kg to about 1 mg/kg body weight daily, taking into account the routes of administration, symptoms, etc.
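The per-kilogram ranges above convert to absolute daily amounts by simple multiplication. The following is a minimal sketch of that unit arithmetic only, assuming the "most cases" range of about 10 μg/kg to about 1 mg/kg and the stated ceiling of about 8 mg/kg per day; the function name and the 70 kg example weight are hypothetical.

```python
def daily_dose_range_mg(body_weight_kg: float,
                        low_ug_per_kg: float = 10.0,
                        high_mg_per_kg: float = 1.0) -> tuple[float, float]:
    """Convert a per-kilogram dosage range into milligrams per day.

    The lower bound is given in micrograms per kg and the upper bound in
    milligrams per kg, matching the units used in the text.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low_mg = low_ug_per_kg * body_weight_kg / 1000.0  # micrograms -> milligrams
    high_mg = high_mg_per_kg * body_weight_kg
    return low_mg, high_mg


# Hypothetical 70 kg subject: usual range, then the stated 8 mg/kg ceiling.
usual_low, usual_high = daily_dose_range_mg(70.0)
_, ceiling = daily_dose_range_mg(70.0, high_mg_per_kg=8.0)
```

For a 70 kg subject this gives roughly 0.7 mg to 70 mg per day for the usual range, with the 8 mg/kg ceiling corresponding to 560 mg per day.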


[0178] The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a “chemical derivative” of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in Remington's Pharmaceutical Sciences (1980).


[0179] For example, a change in the immunological character of the functional derivative, such as affinity for a given antibody, is measured by a competitive type immunoassay. Changes in immunomodulation activity are measured by the appropriate assay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers are assayed by methods well known to the ordinarily skilled artisan.


[0180] The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., by inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled.


[0181] To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.


[0182] In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.


[0183] As used herein, two or more compounds or agents are said to be administered “in combination” with each other when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.


[0184] The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.


[0185] The administration of the agent(s) of the invention may be for either a “prophylactic” or “therapeutic” purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.


[0186] The agents of the present invention are administered to the mammal in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be “pharmacologically acceptable” if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a “therapeutically effective amount” if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.


[0187] The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton Pa. (1980)). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.


[0188] Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention. The controlled delivery may be exercised by selecting appropriate macromolecules (for example polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate) and the concentration of macromolecules as well as the methods of incorporation in order to control release. Another possible method to control the duration of action by controlled release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization, for example, hydroxymethylcellulose or gelatine-microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules or in macroemulsions. Such techniques are disclosed in Remington's Pharmaceutical Sciences (1980).


[0189] The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration. In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.



EXPERIMENTAL


EXAMPLE 1

[0190] Overview of Experimental Design and Methods


[0191] 1. Shotgun Sequencing Strategy


[0192] The overall strategy for a shotgun approach to whole genome sequencing is outlined in Table 3. The theory of shotgun sequencing follows from the application of the equation for the Poisson distribution $p_x = m^x e^{-m}/x!$, where x is the number of occurrences of an event and m is the mean number of occurrences. To determine the probability that any given base is not sequenced after a certain amount of random sequence has been generated, if L is the genome length, n is the number of clone insert ends sequenced, and w is the sequencing read length, then $m = nw/L$, and the probability that no clone originates at any of the w bases preceding a given base, i.e., the probability that the base is not sequenced, is $p_0 = e^{-m}$. Using the fold coverage as the unit for m, one sees that after 580 kb of sequence has been randomly generated, m=1, representing 1× coverage. In this case, $p_0 = e^{-1} = 0.37$, thus approximately 37% is unsequenced. A 5× coverage (approximately 3150 clones sequenced from both insert ends) yields $p_0 = e^{-5} = 0.0067$, or 0.67% unsequenced. The total gap length is $Le^{-m}$ and the average gap size is L/n. 5× coverage would leave about 48 gaps averaging about 80 bp in size. The treatment is essentially that of Lander and Waterman. Table 4 illustrates a computer simulation of a random sequencing experiment for coverage of a 580 kb genome with an average fragment size of 400 bp.
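The coverage arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not part of the disclosed method. The function names are hypothetical, the gap count is derived here as total gap length divided by average gap size, and the 400 bp read length is an assumption borrowed from the Table 4 simulation because it reproduces the quoted figure of about 48 gaps of about 80 bp.

```python
import math


def poisson_pmf(x, m):
    """Poisson probability p_x = m^x e^(-m) / x! of exactly x occurrences."""
    return m ** x * math.exp(-m) / math.factorial(x)


def shotgun_stats(genome_len, n_reads, read_len):
    """Expected outcome of random sequencing under the Poisson model.

    genome_len is L, n_reads is n (clone insert ends sequenced),
    read_len is w; all names are illustrative.
    """
    m = n_reads * read_len / genome_len   # fold coverage, m = nw/L
    p0 = poisson_pmf(0, m)                # fraction unsequenced, e^(-m)
    total_gap = genome_len * p0           # total gap length, L e^(-m)
    mean_gap = genome_len / n_reads       # average gap size, L/n
    return {
        "coverage": m,
        "p0": p0,
        "total_gap_bp": total_gap,
        "mean_gap_bp": mean_gap,
        "n_gaps": total_gap / mean_gap,   # derived: n e^(-m)
    }


# 5x coverage of a 580 kb genome with 400 bp reads (assumed read length)
stats = shotgun_stats(580_000, 7_250, 400)
```

With these inputs the unsequenced fraction is about 0.0067 and the average gap size is 80 bp.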


[0193] 2. Random Library Construction


[0194] In order to approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. M. genitalium genomic chromosomal DNA was mechanically sheared, digested with BAL31 nuclease to produce blunt-ends, and size-fractionated by agarose gel electrophoresis. Fragments in the 2.0 kb size range were excised and recovered. These fragments were ligated to SmaI-cut, phosphatased pUC18 vector and the ligated products were fractionated on an agarose gel. The linear vector plus insert band was excised and recovered. The ends of the linear recombinant molecules were repaired with T4 polymerase treatment and the molecules were then ligated into circles. This two-stage procedure resulted in a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras (<1%) or free vector (<1%). Deviation from randomness is most likely to occur during cloning. E. coli host cells deficient in all recombination and restriction functions were used to prevent rearrangements, deletions, and loss of clones by restriction. Transformed cells were plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells. All colonies were picked for template preparation regardless of size. Only clones lost due to “poison” DNA or deleterious gene products would be deleted from the library, resulting in a slight increase in gap number over that expected.


[0195] In order to evaluate the quality of the M. genitalium random insert library, sequence data was obtained from approximately 2000 templates using the M13F primer. The random sequence fragments were assembled using The Institute for Genomic Research (TIGR) autoassembler software after obtaining 500, 1000, 1500, and 2000 sequence fragments, and the number of unique assembled base pairs was determined. The progression of assembly was plotted using the actual data obtained from the assembly of up to 2000 sequence fragments and compared with the data provided in the ideal plot. There was essentially no deviation of the actual assembly data from the ideal plot, indicating that we had constructed close to an ideal random library with minimal contamination from double insert chimeras and free vector.


[0196] 3. Random DNA Sequencing


[0197] Five-thousand seven hundred and sixty (5,760) plasmid templates were prepared using a “boiling bead” preparation method developed in collaboration with AGTC (Gaithersburg, Md.), as suggested by the manufacturer. The AGTC method is performed in a 96-well format for all stages of DNA preparation from bacterial growth through final DNA purification. Template concentration was determined using Hoechst Dye and a Millipore Cytofluor. DNA concentrations were not adjusted and low-yielding templates were identified and not sequenced where possible. Sequencing reactions were carried out on plasmid templates using the AB Catalyst LabStation or Perkin-Elmer 9600 Thermocyclers with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (−21M13) and the M13 reverse (RP1) primers. Dye terminator sequencing reactions were carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. Nine-thousand eight hundred and forty-six (9,846) sequencing reactions were performed during the random phase of the project by 4 individuals using an average of 10 AB373 DNA Sequencers over a 2 month period. All sequencing reactions were analyzed using the Stretch modification of the AB373, primarily using a 36 cm well-to-read distance. The overall sequencing success rate was 88% for M13-21 sequences and 84% for M13RP1 sequences. The average usable read length was 485 bp for M13-21 sequences and 441 bp for M13RP1 sequences.


[0198] The art has described the value of using sequence from both ends of sequencing templates to facilitate ordering of contigs in shotgun assembly projects. A skilled artisan must balance the desirability of both-end sequencing (including the reduced cost of lower total number of templates) against shorter read-lengths and lower success rates for sequencing reactions performed with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer. For this project, essentially all of the templates were sequenced from both ends.


[0199] 4. Protocol for Automated Cycle Sequencing


[0200] The sequencing consisted of using five (5) ABI Catalyst robots and ten (10) ABI 373 Automated DNA Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and temperature control robot which has been developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the Taq thermostable DNA polymerase, fluorescently-labeled sequencing primers, and reaction buffer. Reaction mixes and templates were combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (e.g., one primer synthesis) steps were performed including denaturation, annealing of primer and template, and extension of DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevented evaporation without the need for an oil overlay.


[0201] Two sequencing protocols were used: dye-labeled primers and dye-labeled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labeled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labeled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.


[0202] Thirty-two reactions were loaded per 373 Sequencer each day, for a total of 960 samples. Electrophoresis was run overnight following the manufacturer's protocols, and the data was collected for twelve hours. Following electrophoresis and fluorescence detection, the ABI 373 performs automatic lane tracking and base-calling. The lane-tracking was confirmed visually. Each sequence electropherogram (or fluorescence lane trace) was inspected visually and assessed for quality. Trailing sequences of low quality were removed and the sequence itself was loaded via software to a Sybase database (archived daily to an 8 mm tape). Leading vector polylinker sequence was removed automatically by a software program. The average edited lengths of sequences from the ABI 373 Sequencers converted to Stretch Liners were approximately 460 bp.


[0203] Informatics


[0204] 1. Data Management


[0205] A number of laboratory information management systems (LIMS) for a large-scale sequencing lab have been developed. A system was used which allowed an automated data flow wherever possible to reduce user error. The system used to collect and assemble the sequence information obtained is centered upon a relational data management system built using the Sybase RDBMS. The database is designed to store and correlate all information collected during the entire operation from template preparation to final analysis of the genome. Because the raw output of the AB 373 Sequencers is based on a Macintosh platform and the data management system chosen is based on a Unix platform, it was necessary to design and implement a variety of multi-user, client server applications which allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.


[0206] 2. Assembly


[0207] The sequence data from 8,472 sequence fragments was used to assemble the M. genitalium genome. The assembly was performed by using a new assembly engine (TIGR Assembler, previously designated ASMG) developed at TIGR. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the necessary speed, the TIGR Assembler builds a hash table of 10 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, the TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The current contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm which provides for optimal gap alignments. The current contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the TIGR Assembler in regions of minimal coverage and raised in regions with a good chance of containing repetitive elements. Potentially chimeric fragments and fragments representing the boundaries of repetitive elements are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig. The TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. The TIGR Assembler enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).
Assembly of the 8,472 sequence fragments of M. genitalium required 10 hours of CPU time on a SPARCenter 2000. All contigs were loaded into a Sybase structure representing the location of each fragment in the contig and extensive information about the consensus sequence itself. The result of this process was approximately 40 contigs ordered into 2 groups (See below). Because of the high stringency of the TIGR Assembler process it was found to be useful to perform a FASTA (GRASTA) alignment of all contigs built by the TIGR Assembler process against each other. In this way additional overlaps were detected which enabled compression of the data set into 26 contigs in 2 groups.
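The hash-table step described for the TIGR Assembler can be illustrated with a minimal Python sketch. This is not the TIGR Assembler's code. The function name and data layout are hypothetical, reverse-complement matches are ignored for brevity, and the score is simply the number of distinct 10-mers two fragments share.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(fragments, k=10):
    """Count distinct k-mers shared by each pair of fragments.

    fragments maps a fragment id to its sequence. Pairs with a high
    count are candidates for a full alignment; fragments that pair
    with unusually many partners may lie in repetitive elements.
    """
    # Hash table: k-mer -> set of fragment ids that contain it
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)

    # Every pair of fragments sharing a k-mer gets one vote per k-mer
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


example = {
    "f1": "ACGTACGTTAGCCGATAGGCT",
    "f2": "TAGCCGATAGGCTTTACGGA",
    "f3": "GGGGGGGGGGGGCCCCCCCC",
}
pairs = candidate_overlaps(example)
```

In this toy input only f1 and f2 share any 10-mers, so only that pair would be passed on to the alignment stage.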


[0208] Achieving Closure


[0209] The complete genome sequence was obtained by sequencing across the gaps between contigs. While gap filling has occupied a major portion of the time and expense of other genome sequencing projects, it was minimal in the present invention. This was primarily due to 1) saturation of the genome as a result of the number of random clones and sequencing reactions performed, 2) the longer read lengths obtained from the Stretch Liners, 3) the anchored ends which were obtained for joining contigs, and 4) the overall capacity and efficiency of the high throughput sequencing facility.


[0210] Gaps occurred on a predicted random basis, as shown in Table 4, which illustrates simulated random sequencing. These gaps generally were less than 200 bp in size. All of the gaps were closed by sequencing further on the templates bordering the gaps. In these cases, oligo primers for extension of the sequence from both ends of the gap were generated using techniques known in the art. This gave double-stranded coverage across the gap areas.


[0211] The high redundancy of sequence information that was obtained from the shotgun approach gave a highly accurate sequence. Our sequence accuracy was confirmed by comparing the sequence information obtained against known M. genitalium genes present in the GenBank database. The accuracy of our chromosome structure was confirmed by comparison of restriction digests to the known restriction map of M. genitalium. The EcoRI restriction map of M. genitalium is shown in FIG. 1 and expressed in tabular form in Table 5.


[0212] Identifying Genes


[0213] M. genitalium ORFs were initially defined by evaluating their coding potential with the program GeneWorks using composition matrices specific to Mycoplasma genomic DNA. The ORF sequences (plus 300 bp of flanking sequence) were used in searches against a database of non-redundant bacterial proteins (NRBP). Redundancy was removed from NRBP at two stages. (1) All DNA coding sequences were extracted from GenBank (release 85), and sequences from the same species were searched against each other. Sequences having >97% similarity over regions >100 nucleotides were combined. (2) The sequences were translated and used in protein comparisons with all sequences in Swiss-Prot (release 30). Sequences belonging to the same species and having >98% similarity over 33 amino acids were combined. NRBP is composed of 21445 sequences from 23751 GenBank sequences and 11183 Swiss-Prot sequences from 1099 different species.


[0214] Searches were performed using an algorithm that (1) translates the query DNA sequence in all six reading frames for searching against a protein database, (2) identifies the protein sequences that match the query, and (3) aligns the protein-protein matches using a modified Smith-Waterman algorithm. In cases where insertions or deletions in the DNA sequence produced a frame shift error, the alignment algorithm started with protein regions of maximum similarity and extended the alignment to the same database match using the 300 bp flanking region. Regions known to contain frame shift errors were saved to the database and evaluated for possible correction. The role categories were adopted from those previously defined by Riley et al. for E. coli gene products. Role assignments were made to M. genitalium ORFs at the protein sequence level by linking the protein sequence of the ORFs with the Swiss-Prot sequences in the Riley database.
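Step (1) of the search algorithm, six-frame translation, can be sketched as follows. This is an illustrative Python sketch rather than the software actually used. It assumes the standard genetic code with the single change that TGA (UGA) is read as tryptophan, as described for M. genitalium; all function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CODON_TABLE["TGA"] = "W"  # mycoplasma usage: UGA encodes tryptophan

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate both strands in all three offsets, keyed '+1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frame_translations("ATGTGATAA")
```

Each of the six protein strings would then be compared against the protein database in step (2).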


[0215] Detailed Description of Sequencing the Mycoplasma genitalium Genome, Genome Analysis and Comparative Genomics


[0216] We have determined the complete nucleotide sequence (580,071 bp) of the Mycoplasma genitalium genome using the approach of whole chromosome shotgun sequencing and assembly, which has successfully been applied to the analysis of the Haemophilus influenzae genome (R. Fleischmann et al., Science 269:496 (1995)). These data, together with the description of the complete genome sequence (1.83 Mb) of the eubacterium Haemophilus influenzae, have provided the opportunity for comparative genomics on a whole genome level for the first time. Our initial whole genome comparisons reveal fundamental differences in genome content which are reflected in different physiological and metabolic capacities of M. genitalium and H. influenzae.


[0217] The strategy and methodology for whole genome shotgun sequencing and assembly was similar to that previously described for H. influenzae (R. Fleischmann et al., Science 269:496 (1995)). In particular, a total of 50 μg of purified M. genitalium strain G-37 DNA (ATCC No. 33530) was isolated from cells grown in Hayflick's medium. A mixture (990 μl) containing 50 μg of DNA, 300 mM sodium acetate, 10 mM tris HCl, 1 mM EDTA, and 30 percent glycerol was chilled to 0° C. in a nebulizer chamber and sheared at 4 lbs/in² for 60 seconds. The DNA was precipitated in ethanol and redissolved in 50 μl of tris-EDTA (TE) buffer. To create blunt ends, a 40 μl portion was digested for 10 minutes at 30° C. in 85 μl of BAL31 buffer with 2 units of BAL31 nuclease (New England BioLabs). The DNA was extracted with phenol, precipitated in ethanol, dissolved in 60 μl of TE buffer, and fractionated on a 1.0 percent low melting agarose gel. A fraction (2.0 kb) was excised, extracted with phenol, and redissolved in 20 μl of TE buffer. A two-step ligation procedure was used to produce a plasmid library in which 99% of the recombinants contained inserts, of which >99% were single inserts. The first ligation mixture (50 μl) contained approximately 2 μg DNA fragments, 2 μg of SmaI-cut, bacterial alkaline phosphatase-treated pUC18 DNA (Pharmacia), and 10 units of T4 DNA ligase (GIBCO/BRL), and incubation was for 5 hours at 4° C. After extraction with phenol and ethanol precipitation, the DNA was dissolved in 20 μl of TE buffer and separated by electrophoresis on a 1.0 percent low melting agarose gel. A ladder of ethidium bromide-stained, linearized DNA bands, identified by size as insert (i), vector (v), v+i, v+2i, v+3i, etc. was visualized by 360 nm ultraviolet light. The v+i DNA was excised and recovered in 20 μl of TE buffer. The v+i DNA was blunt-ended by T4 polymerase treatment for 5 minutes at 37° C. in a reaction mixture (50 μl) containing the linearized v+i fragments, four deoxynucleotide triphosphates (dNTPs) (25 μM each), and 3 units of T4 polymerase (New England BioLabs) under buffer conditions recommended by the supplier. After phenol extraction and ethanol precipitation, the repaired v+i linear pieces were dissolved in 20 μl of TE. The final ligation to produce circles was carried out in a 50 μl reaction containing 5 μl of v+i DNA and 5 units of T4 ligase at 15° C. overnight. The reaction mixture was heated at 67° C. for 10 minutes and stored at −20° C.


[0218] For transformation, a 100 μl portion of Epicurian-SURE 2 Supercompetent Cells (Stratagene 200152) was thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 μl volume of 1.42M β-mercaptoethanol was added to the cells to a final concentration of 25 mM. Cells were incubated on ice for 10 minutes. A 1 μl sample of the final ligation mix was added to the cells and incubated on ice for 30 minutes. The cells were heat-treated for 30 seconds at 42° C. and placed back on ice for 2 minutes. The outgrowth period in liquid culture was omitted to minimize the preferential growth of any transformed cell. Instead, the transformed cells were plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (1.5 percent SOB agar consisted of 20 g of tryptone, 5 g of yeast extract, 0.5 g of NaCl, and 1.5 percent Difco agar/liter). The 5 ml bottom layer was supplemented with 0.4 ml of ampicillin (50 mg/ml) per 100 ml of SOB agar. The 15 ml top layer of SOB agar was supplemented with 1 ml of MgCl2 (1M) and 1 ml of MgSO4 (1M) per 100 ml of SOB agar. The 15 ml top layer was poured just before plating. The titer of the library was approximately 100 colonies per 10 μl aliquot of transformation.


[0219] One of the lessons learned from sequencing and assembly of the complete H. influenzae genome was that contig ordering and gap closure are most efficient if the random sequencing phase of the project is continued until at least 99.8%-99.9% of the genome is sequenced with at least 6-fold coverage. To calculate the number of random sequencing reactions necessary to obtain this coverage for the M. genitalium genome, we made use of the Lander and Waterman [E. S. Lander and M. S. Waterman, Genomics 2:231 (1988)] application of the Poisson distribution, $p_0 = e^{-nw/L}$, where $p_0$ is the probability that any given base is not sequenced, n is the number of clone insert ends sequenced, w is the average read length of each template in bp, and L is the size of the genome in bp. For a genome of 580 kb with an average sequencing read length of 450 bp after editing, approximately 8650 sequencing reactions (or 4325 clones sequenced from both ends) should theoretically provide 99.85% coverage of the genome. This level of coverage should leave approximately 10 gaps with an average size of 70 bp unsequenced.
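The planning calculation in this paragraph amounts to inverting the Poisson expression for the number of reads. The short Python sketch below illustrates that inversion; the function names are hypothetical and the rounding convention is an assumption. The values it produces fall in the same range as those quoted above, and small differences are to be expected because the text does not state how its figures were rounded.

```python
import math


def reads_needed(genome_len, read_len, target_coverage):
    """Smallest n with 1 - e^(-n w / L) >= target_coverage."""
    p0 = 1.0 - target_coverage            # allowed unsequenced fraction
    return math.ceil(-math.log(p0) * genome_len / read_len)


def fraction_sequenced(genome_len, n_reads, read_len):
    """Expected fraction of bases covered by at least one read."""
    return 1.0 - math.exp(-n_reads * read_len / genome_len)


# 580 kb genome, 450 bp average edited read length
n_for_target = reads_needed(580_000, 450, 0.9985)
covered = fraction_sequenced(580_000, 8_650, 450)
```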


[0220] To evaluate the quality of the M. genitalium library, sequence data were obtained from both ends of approximately 600 templates using both the M13 forward (M13-21) and the M13 reverse (M13RP1) primers. Sequence fragments were assembled using the TIGR ASSEMBLER and found to approximate a Poisson distribution of fragments with an average read length of 450 bp for a 580 kb library, indicating that the library was essentially random.


[0221] For this project, a total of 5760 double-stranded DNA plasmid templates were prepared in a 96-well format using a boiling bead method. Ninety-four percent of the templates prepared yielded a DNA concentration ≥30 ng/μl and were used for sequencing reactions. To facilitate ordering of contigs each template was sequenced from both ends. Reactions were carried out using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers. The success rate and average read length after editing with the M13-21 primer were 88 percent and 444 bp, respectively, and 84 percent and 435 bp, respectively, with the M13RP1 primer. All data from template preparation to final analysis of the project were stored in a relational data management system developed at TIGR [A. R. Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Science (IEEE Computer Society Press, Washington, D.C., 1993), p. 585]. A total of 9846 sequencing reactions were performed by five individuals using an average of 8 AB 373 DNA Sequencers per day for a total of 8 weeks. Assembly of 8472 high quality M. genitalium sequence fragments along with 299 random genomic sequences from Peterson et al. (S. N. Peterson et al., J. Bacteriol. 175:7918 (1993)) was performed with the TIGR ASSEMBLER. The assembly process generated 39 contigs (size range: 606 to 73,351 bp) which contained a total of 3,806,280 bp of primary DNA sequence data. Contigs were ordered by ASM_ALIGN, a program which links contigs based on information derived from forward and reverse sequencing reactions from the same clone.
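The idea of linking contigs through forward and reverse reads from the same clone can be illustrated with a small Python sketch. It is not the ASM_ALIGN program, whose internals are not described here. The function name, the read and clone identifiers, and the data layout are all hypothetical.

```python
from collections import Counter


def contig_links(read_to_contig, clones):
    """Count clones whose two end reads fall in different contigs.

    read_to_contig maps a read id to the contig it was assembled into.
    clones maps a clone id to its (forward_read, reverse_read) pair.
    Each such clone is evidence that the two contigs are neighbours
    and that the clone spans the gap between them.
    """
    links = Counter()
    for fwd, rev in clones.values():
        c_fwd = read_to_contig.get(fwd)
        c_rev = read_to_contig.get(rev)
        # Skip unassembled reads and clones lying within one contig
        if c_fwd is None or c_rev is None or c_fwd == c_rev:
            continue
        links[tuple(sorted((c_fwd, c_rev)))] += 1
    return dict(links)


placements = {"c1.f": "A", "c1.r": "B", "c2.f": "A", "c2.r": "A",
              "c3.f": "B", "c3.r": "A"}
clone_pairs = {"c1": ("c1.f", "c1.r"), "c2": ("c2.f", "c2.r"),
               "c3": ("c3.f", "c3.r")}
links = contig_links(placements, clone_pairs)
```

Here two clones bridge contigs A and B, so those contigs would be placed next to each other and either clone could serve as a template for closing the gap.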


[0222] ASM_ALIGN analysis revealed that all 39 gaps were spanned by an existing template from the small insert genomic DNA library (i.e., there were no physical gaps in the sequence assembly). The order of the contigs was confirmed by comparing the order of the random genomic sequences from Peterson et al. (S. N. Peterson et al., J. Bacteriol. 175:7918 (1993)) that were incorporated into the assembly with their known position on the physical map of the M. genitalium chromosome (T. S. Lucier et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)). Because of the high stringency of the TIGR ASSEMBLER, the 39 contigs were searched against each other with GRASTA, a modified FASTA (D. Brutlag et al., Comp. Chem. 1:203 (1993)), to detect overlaps (<30 bp) that would have been missed during the initial assembly process. The BLOSUM 60 amino acid substitution matrix was used in all protein-protein comparisons [S. Henikoff and J. G. Henikoff, Proc. Natl. Acad. Sci. USA 89:1091 (1992)]. Eleven overlaps were detected with this approach, which reduced the total number of gaps from 39 to 28.


[0223] Templates spanning each of the sequence gaps were identified and oligonucleotide primers were designed from the sequences at the end of each contig. All gaps were less than 300 bp; thus a primer walk from both ends of each template was sufficient for closure. All electropherograms were visually inspected with TIGR EDITOR (R. Fleischmann et al., Science 269:496 (1995)) for initial sequence editing. Where a discrepancy could not be resolved or a clear assignment made, the automatic base calls were left unchanged.


[0224] Several criteria for determination of sequence completion were established for the H. influenzae genome sequencing project and these same criteria were applied to this study. Across the assembled M. genitalium genome there is an average sequence redundancy of 6.5-fold. The completed sequence contains less than 1% single sequence coverage. For each of the 53 ambiguities remaining after editing and the 25 potential frameshifts found after sequence-similarity searching, the appropriate template was resequenced with an alternative sequencing chemistry (dye terminator vs. dye primer) to resolve ambiguities. Although it is extremely difficult to assess sequence accuracy, we estimate our error rate to be less than 1 base in 10,000 based upon frequency of shifts in open reading frames, unresolved ambiguities, overall quality of raw data, and fold coverage.


[0225] A direct cost estimate for sequencing, assembly, and annotation of the M. genitalium genome was determined by summing reagent and labor costs for library construction, template preparation and sequencing, gap closure, sequence confirmation, annotation, and preparation for publication, and dividing by the size of the genome in base pairs. This yielded a final cost of 30 cents per finished base pair.


[0226] Genomic Analysis


[0227] The M. genitalium genome is a circular chromosome of 580,071 bp. The overall G+C content is 32% (A, 34%; C, 16%; G, 16%; and T, 34%). The G+C content across the genome varies between 27 and 37% (using a window of 5000 bp), with the regions of lowest G+C content flanking the presumed origin of replication of the organism. As in H. influenzae (Fleischmann, R. et al., Science 269:496 (1995)), the rRNA operon in M. genitalium contains a higher G+C content (44%) than the rest of the genome, as do the tRNA genes (52%). The higher G+C content in these regions may reflect the necessity of retaining essential G+C base pairing for secondary structure in rRNAs and tRNAs (Rogers, M. J. et al., Isr. J. Med. Sci. 20:768 (1984)).
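A windowed G+C profile of the kind described above can be computed as in the following Python sketch. It is illustrative only. The function names are hypothetical, the windows are taken as non-overlapping by default, and the circular wrap-around of the chromosome is ignored, none of which is specified in the text.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def windowed_gc(genome, window=5000, step=5000):
    """Return (start, G+C fraction) for each full window along the genome."""
    return [
        (start, gc_fraction(genome[start:start + window]))
        for start in range(0, len(genome) - window + 1, step)
    ]


# Toy input: a G+C-only stretch followed by an A+T-only stretch
profile = windowed_gc("GC" * 5 + "AT" * 5, window=10, step=10)
```

Applied to the full sequence with the default 5000 bp window, the minimum and maximum of the returned fractions would give the range of local G+C content.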


[0228] The genome of M. genitalium contains 74 EcoRI fragments, as predicted by cosmid mapping data (Lucier, T. S. et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)). The order and sizes of the EcoRI fragments determined from sequence analysis are in agreement with those previously reported (Lucier, T. S. et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)), with one apparent discrepancy between coordinates 62,708 and 94,573 in the sequence. However, re-evaluation of cosmid hybridization data in light of results from genome sequence analysis confirms that the sequence data are correct, and the extra 4.0 kb EcoRI fragment in this region of the cosmid map reflects a misinterpretation of the overlap between cosmids J-8 and 21 (Lucier, T. S., unpublished observation). The ends of each clone from the ordered cosmid library were sequenced and are shown on the circular chromosome in FIG. 4. The order of the cosmids based on sequence analysis is in complete agreement with that determined by physical mapping (Lucier, T. S. et al., Gene 150:27 (1994); Peterson et al., J. Bacteriol. 177:3199 (1995)).
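Fragment sizes such as those compared with the cosmid map can be predicted from a circular sequence by an in-silico digest. The Python sketch below is a hypothetical illustration, not the analysis software used. It assumes the EcoRI recognition site GAATTC with cleavage after the first base, and it treats the sequence as circular so that a site spanning the chosen origin is still found.

```python
def circular_digest(genome, site="GAATTC", cut_offset=1):
    """Return fragment lengths from cutting a circular sequence.

    cut_offset is the cleavage position within the site (1 for EcoRI,
    which cuts G^AATTC). On a circle, the number of fragments equals
    the number of sites.
    """
    genome = genome.upper()
    n = len(genome)
    # Append the first bases so sites spanning the origin are detected
    extended = genome + genome[:len(site) - 1]
    cuts = sorted(
        (i + cut_offset) % n for i in range(n) if extended.startswith(site, i)
    )
    if not cuts:
        return [n]  # uncut circle
    # Distance from each cut to the next one, wrapping around the circle
    return [
        (cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n
        for k in range(len(cuts))
    ]


toy = "GAATTC" + "A" * 4 + "GAATTC" + "C" * 4
sizes = circular_digest(toy)
```

The fragment lengths always sum to the length of the chromosome, which gives a simple consistency check on the predicted map.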


[0229] We defined the first bp of the chromosomal sequence of M. genitalium based on the putative origin of replication (Bailey & Bott, J. Bacteriol. 176:5814 (1994)). Studies of origins of replication in some prokaryotes have shown that DNA synthesis is initiated in an untranscribed AT rich region between dnaA and dnaN (Ogasawara, N. et al., in The Bacterial Chromosome, Drlica & Riley, eds., American Society for Microbiology, Washington, D.C. (1990), pp. 287-295; Ogasawara & Yoshikawa, Mol. Microbiol. 6:629 (1992)). A search of the M. genitalium sequence for “DnaA boxes” around the putative origin of replication with consensus “DnaA boxes” from Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa revealed no significant matches. Although we have not been able to precisely localize the origin, the co-localization of dnaA and dnaN to a 4000 bp region of the chromosome lends support to the hypothesis that it is the functional origin of replication in M. genitalium (Ogasawara, N. et al., in The Bacterial Chromosome, Drlica & Riley, eds., American Society for Microbiology, Washington, D.C. (1990), pp. 287-295; Ogasawara & Yoshikawa, Mol. Microbiol. 6:629 (1992); Miyata, M. et al., Nucleic Acids Res. 21:4816 (1993)). We have chosen an untranscribed region between dnaA and dnaN so that dnaN is numbered as the first open reading frame in the genome. As seen in FIG. 4, genes to the right of this region are preferentially transcribed from the plus strand, and genes to the left of this region are preferentially transcribed from the minus strand. The apparent polarity in gene transcription is maintained across each half of the genome (FIGS. 4 and 5). This stands in marked contrast to H. influenzae, which displays no apparent polarity of transcription around the origin of replication. The significance of this observation remains to be determined.


[0230] The predicted coding regions of M. genitalium were initially defined by searching the entire genome for open reading frames greater than 100 amino acids. Translations were made using the genetic code for mycoplasma species in which UGA encodes tryptophan. All open reading frames were searched with BLAZE (Brutlag, D. et al., Comp. Chem. 1:203 (1993)). The BLOSUM 60 amino acid substitution matrix was used in all protein-protein comparisons (Henikoff, S. and Henikoff, J. G., Proc. Natl. Acad. Sci. USA 89:1091 (1992)) against a non-redundant bacterial protein database (NRBP) (Fleischmann, R. et al., Science 269:496 (1995)) developed at TIGR on a MasPar MP-2 massively parallel computer with 4096 microprocessors. Protein matches were aligned with PRAZE, a modified Smith-Waterman (Waterman, M. S., Methods Enzymol. 164:765 (1988)) algorithm. Segments between predicted coding regions of the genome were used in additional searches against all protein sequences from GenPept, Swiss-Prot, and PIR. Pairwise alignments between M. genitalium predicted open reading frames and sequences from the public archives were examined. Motif matches were annotated in cases where sequence similarity was confined to short domains in the predicted coding region. The coding potential of 170 unidentified open reading frames was analyzed with GeneMark (Borodovsky & McIninch, ibid, p. 123), which had been trained with 308 M. genitalium sequences. Open reading frames that had low coding potential (based on the GeneMark analysis) and were smaller than 100 nucleotides (a total of 53) were removed from the final set of putative coding regions. In a separate analysis, open reading frames were searched against the complete set of translated sequences from H. influenzae (GSDB accession L42023; see Fleischmann, R. et al., Science 269:496 (1995)). In total, these processes resulted in the identification of 482 predicted coding regions, of which 365 were putatively identified and 117 had no matches to protein sequences from any other organism. Twenty-three of the protein matches in Table 6 were annotated as motifs; these were not full-length protein matches, but nonetheless displayed regions of significant amino acid similarity.
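The first step of this procedure, a six-frame scan for open reading frames translated with UGA read as tryptophan, can be illustrated as follows. This is a simplified sketch, not the procedure used at TIGR: it treats any stop-free stretch as an open reading frame (no start codon is required), and the names `MYCOPLASMA`, `translate` and `open_frames` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Mycoplasma code: TGA (UGA in RNA) is read as tryptophan, not stop.
MYCOPLASMA = dict(STANDARD, TGA="W")


def translate(dna, table=MYCOPLASMA):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def open_frames(dna, min_aa=100, table=MYCOPLASMA):
    """Return (strand, frame, peptide) for stop-free stretches longer than min_aa."""
    dna = dna.upper()
    found = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for peptide in translate(seq[frame:], table).split("*"):
                if len(peptide) > min_aa:
                    found.append((strand, frame, peptide))
    return found


# The same DNA gives different peptides under the two codes.
print(translate("ATGTGATAA"))            # Mycoplasma code
print(translate("ATGTGATAA", STANDARD))  # standard code
```

Passing `table=STANDARD` shows how reading UGA as a stop codon would fragment open reading frames that are intact under the mycoplasma code.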


[0231] The 365 predicted coding regions that matched protein sequences from the public sequence archives were assigned biological roles. The role classifications were developed from Riley (Riley, M., Microbiol. Rev. 57:862 (1992)) and are identical to those used in H. influenzae assignments (Fleischmann, R., et al., Science 269:496 (1995)). A separate search procedure was used in cases where we were unable to detect genes in the M. genitalium genome. Query peptide sequences that were available from eubacteria such as E. coli, B. subtilis, M. capricolum, and H. influenzae were used in searches against all six reading frame translations of the entire genome sequence, and the alignments were examined. The possibility remains that current searching methods, an incomplete set of query sequences, or the subjective analysis of the database matches are not sensitive enough to identify certain M. genitalium gene sequences.


[0232] One-half of all predicted coding regions in M. genitalium for which a putative identification could be assigned display the greatest degree of similarity to a protein from either a gram-positive organism (e.g., B. subtilis) or a Mycoplasma species. The significance of this finding is underscored by the fact that NRBP contained 3885 sequences from E. coli and only 1975 sequences from B. subtilis. In the majority of cases where M. genitalium coding regions matched sequences from both E. coli and Bacillus species, the better match was to a sequence from Bacillus (average of 62 percent similarity) rather than to a sequence from E. coli (average of 56 percent similarity). The evolutionary relationship between Mycoplasma and the Lactobacillus-Clostridium branch of the gram-positive phylum has been deduced from small subunit rRNA sequences (Maidak, B. L. et al., Nucleic Acids Research 22:3485 (1994)). Our data from whole genome analysis support this hypothesis.


[0233] Comparative Genomics: M. genitalium and H. influenzae


[0234] A survey of the genes and their organization in M. genitalium makes possible the description of a minimal set of genes required for survival. One would predict that a minimal cell must contain genes for replication and transcription, at least one rRNA operon and a set of ribosomal proteins, tRNAs and tRNA synthetases, transport proteins to derive nutrients from the environment, biochemical pathways to generate ATP and reducing power, and mechanisms for maintaining cellular homeostasis. Comparison of the genes identified in M. genitalium with those in H. influenzae allows for identification of a basic complement of genes conserved in these two species and provides insights into physiological differences between one of the simplest self-replicating prokaryotes and a more complex, gram-negative bacterium.


[0235] The M. genitalium genome contains 482 predicted coding sequences (Table 6) as compared to 1,727 identified in H. influenzae (Fleischmann, R. et al., Science 269:496 (1995)). Table 7 summarizes the gene content of both organisms sorted by functional category. The percent of the total genome in M. genitalium and H. influenzae encoding genes involved in cell envelope, cellular processes, energy metabolism, purine and pyrimidine metabolism, replication, transcription, transport, and other categories is similar, although the total number of genes in these categories is considerably smaller in M. genitalium. A smaller percentage of the M. genitalium genome encodes genes involved in amino acid biosynthesis, biosynthesis of co-factors, central intermediary metabolism, fatty acid and phospholipid metabolism, and regulatory functions as compared with H. influenzae. A greater percentage of the M. genitalium genome encodes proteins involved in translation than in H. influenzae, as shown by the similar numbers of ribosomal proteins and tRNA synthetases in both organisms.


[0236] The 482 predicted coding regions in M. genitalium (average size of 1100 bp) cover 85% of the genome (on average, one gene every 1169 bp), a value similar to that found in H. influenzae where 1727 predicted coding regions (average size of 900 bp) cover 91% of the genome (one gene every 1042 bp). These data indicate that the reduction in genome size that has occurred within Mycoplasma has not led to an increase in gene density or a decrease in gene size (Bork, P. et al., Mol. Microbiol. 16:955 (1995)). A global search of M. genitalium and H. influenzae genomes reveals short regions of conservation of gene order, particularly two clusters of ribosomal proteins.


[0237] Replication. Two major protein complexes are formed during replication: the primosome and the replisome. We have identified genes encoding many of the essential proteins in the replication process, including M. genitalium isologs of the primosome proteins DnaA, DnaB, GyrA, GyrB, a single stranded DNA binding protein, and the primase protein, DnaE. DnaJ and DnaK, heat shock proteins that may function in the release of the primosome complex, are also found in M. genitalium. A gene encoding the DnaC protein, responsible for delivery of DnaB to the primosome, has yet to be identified.


[0238] Genes encoding most of the essential subunit proteins for DNA polymerase III in M. genitalium were also identified. The polC gene encodes the α subunit, which contains the polymerase activity. We have also identified the isolog of dnaH in B. subtilis (dnaX in E. coli), which encodes the γ and τ subunits as alternative products from the same gene. These proteins are necessary for the processivity of DNA polymerase III. An isolog of dnaN, which encodes the β subunit, was previously identified in M. genitalium (Bailey & Bott, J. Bacteriol. 176:5814 (1994)) and is involved in the process of clamping the polymerase to the DNA template. While we have yet to identify a gene encoding the subunit responsible for the 3′-5′ proofreading activity, it is possible that this activity is encoded in the α subunit, as has been previously described (Sanjanwala, B. and Ganesan, A. T., Mol. Gen. Genet. 226:467 (1991); Sanjanwala, B. and Ganesan, A. T., Proc. Natl. Acad. Sci. USA 86:4421 (1989)). Finally, we have identified a gene encoding a DNA ligase, necessary for the joining of the Okazaki fragments formed during synthesis of the lagging strand.


[0239] While we have identified genes encoding many of the isologs thought to be essential for DNA replication, some genes encoding proteins with key functions have yet to be identified. Examples of these are the DnaC protein mentioned above as well as Dnaθ and Dnaδ whose functions are less well understood but are thought to be involved in the assembly and processivity of polymerase III. Also apparently absent is a specific RNaseH protein responsible for the hydrolysis of the RNA primer synthesized during lagging strand synthesis.


[0240] DNA Repair. It has been suggested that in E. coli as many as 100 genes are involved in DNA repair (Kornberg, A. and Baker, T. A., DNA Replication, 2nd Ed., W. H. Freeman and Co., New York (1992)), and in H. influenzae the number of putatively identified DNA repair enzymes is approximately 30 (Fleischmann, R. et al., Science 269:496 (1995)). Although M. genitalium appears to have the necessary genes to repair many of the more common lesions in DNA, the number of genes devoted to the task is much smaller. Excision repair of regions containing missing bases (apurinic/apyrimidinic (AP) sites) can likely occur by a pathway involving endonuclease IV (nfo), Pol I, and ligase. The ung gene, which encodes uracil-DNA glycosylase, is present. This activity removes uracil residues from DNA, which usually arise by spontaneous deamination of cytosine. Removal of uracil produces an AP site, which could then be repaired as described above.


[0241] All three genes necessary for production of the UvrABC excinuclease are present, and along with Pol I, helicase II, and ligase should provide a mechanism for repair of damage such as cross-linking, which requires replacement of both strands. Although recA is present (in E. coli its product is activated as it binds to single strand DNA, thereby initiating the SOS response), we find no evidence for a lexA gene, which encodes the repressor that regulates the SOS genes. We have not identified in M. genitalium a photolyase gene (phr), whose product repairs UV-induced pyrimidine dimers, or other genes involved in reversal of DNA damage rather than excision and replacement of the lesion.


[0242] Transcription. The critical components for transcription were identified in M. genitalium. In addition to the α, β, and β′ subunits of the core RNA polymerase, M. genitalium appears to encode a single σ factor, whereas E. coli and B. subtilis encode at least six and seven, respectively. We have not detected a homolog of the Rho termination factor gene, so it seems likely that a mechanism similar to Rho-independent termination in E. coli operates in M. genitalium. We have clear evidence for homologs of only two other genes which modulate transcription, nusA and nusG.


[0243] Translation. M. genitalium possesses a single rRNA operon which contains three rRNA subunits in the order: 16S rRNA (1518 bp)-spacer (203 bp)-23S rRNA (2905 bp)-spacer (56 bp)-5S rRNA (103 bp). The small subunit rRNA sequence was compared with the Ribosomal Database Project's (Maidak, B. L. et al., Nucleic Acids Research 22:3485 (1994)) prokaryote database with the program “similarity_yank.” Our sequence is identical to the M. genitalium (strain G37) sequence deposited there, and the 10 most similar taxa returned by this search are also in the genus Mycoplasma.
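The element lengths given for the rRNA operon fix its total span and the position of each element relative to the operon start. The short sketch below works this out; the labels "spacer 1" and "spacer 2" and the function name `layout` are introduced only for illustration, and the coordinates are relative to the first base of the 16S rRNA gene, not genome coordinates.

```python
# Element lengths in bp, in the order given in the text.
OPERON = [("16S rRNA", 1518), ("spacer 1", 203), ("23S rRNA", 2905),
          ("spacer 2", 56), ("5S rRNA", 103)]


def layout(elements):
    """Return (name, start, end) using 1-based inclusive operon-relative coordinates."""
    coords, pos = [], 1
    for name, length in elements:
        coords.append((name, pos, pos + length - 1))
        pos += length
    return coords


for name, start, end in layout(OPERON):
    print(f"{name:9s} {start:5d} {end:5d}")
print("total span:", sum(length for _, length in OPERON), "bp")
```

On these figures the operon spans 4785 bp from the first base of the 16S gene to the last base of the 5S gene.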


[0244] A total of 33 tRNA genes were identified in M. genitalium; these were organized into five clusters plus nine single genes. In all cases, the best match for each tRNA gene in M. genitalium was the corresponding gene in M. pneumoniae (Simoneau, P. et al., Nuc. Acids Res. 21:4967 (1993)). Furthermore, the grouping of tRNAs into clusters (trnA, trnB, trnC, trnD, and trnE) was identical in M. genitalium and M. pneumoniae, as was gene order within the cluster (Simoneau, P. et al., Nuc. Acids Res. 21:4967 (1993)). The only difference between M. genitalium and M. pneumoniae observed with regard to tRNA gene organization was an inversion between trnD and GTG. In contrast to H. influenzae and many other eubacteria, no tRNAs were found in the spacer region between the 16S and 23S rRNA genes in the rRNA operon of M. genitalium, similar to what has been reported for M. capricolum (Sawada, M. et al., Mol. Gen. Genet. 182:502 (1981)).


[0245] A search of the M. genitalium genome for tRNA synthetase genes identified all of the expected genes with the exception of glutaminyl tRNA synthetase. We expect that this gene is present in the M. genitalium genome, but we have not been able to identify it by similarity searches. The latest GenBank release (release 89) contains only a single entry for a glutaminyl tRNA synthetase from a bacterial species; this was from E. coli, a gram-negative organism only distantly related to Mycoplasma. In general, tRNA synthetase sequences from gram-positive organisms such as B. subtilis displayed greater similarity to those from M. genitalium than the corresponding sequences from E. coli, lending support to the notion that the similarity between the E. coli and M. genitalium glutaminyl tRNA synthetase may not have been high enough to be detected.


[0246] Metabolic pathways. The reduction in genome size among Mycoplasma species is associated with a marked reduction in the number and components of biosynthetic pathways in these organisms, requiring them to use metabolic products from their hosts. In the laboratory, M. genitalium has not been grown in a chemically defined medium. The complex growth requirements of this organism can be explained by the almost complete lack of enzymes involved in amino acid biosynthesis, de novo nucleotide biosynthesis, and fatty acid biosynthesis (Table 6 and FIGS. 5A-5R). When the number of genes in the categories of central intermediary metabolism, energy metabolism, and fatty acid and phospholipid metabolism are summed, marked differences in gene content between H. influenzae and M. genitalium are apparent. For example, whereas the H. influenzae genome contains 68 genes involved in amino acid biosynthesis, the M. genitalium genome contains only one. In total, the H. influenzae genome has 167 genes associated with metabolic pathways whereas the M. genitalium genome has just 42. A recent analysis of 214 kb of sequence from Mycoplasma capricolum (Bork, P. et al., Mol. Microbiol. 16:955 (1995)), a related organism whose genome size is twice as large as that of M. genitalium, reveals that M. capricolum contains a number of biosynthetic enzymes not present in M. genitalium. This observation suggests that M. capricolum's larger genome confers a greater anabolic capacity.


[0247] M. genitalium is a facultative anaerobe that ferments glucose and possibly other sugars via glycolysis to lactate and acetate. Genes that encode all the enzymes of the glycolytic pathway were identified, including genes for components of the pyruvate dehydrogenase complex, phosphotransacetylase, and acetate kinase. The major route for ATP synthesis may be through substrate level phosphorylation since no cytochromes are present. M. genitalium also lacks all the components of the tricarboxylic acid cycle. None of the genes coding for glycogen or poly-beta-hydroxybutyrate production were identified, indicating limited capacity for carbon and energy storage. The pentose phosphate pathway also appears limited since only genes encoding 6-phosphogluconate dehydrogenase and transketolase were identified. The limited metabolic capacity of M. genitalium sharply contrasts with the complexity of catabolic pathways in H. influenzae, reflecting the four-fold greater number of genes involved in energy metabolism found in H. influenzae.


[0248] Transport. The transporters identified in M. genitalium are specific for a range of nutritional substrates. Using protein transport as an example, both oligopeptide and amino acid transporters are represented. One interesting peptide transporter has homology to a lactococcin transporter (lcnDR3) and related bacteriocin transporters, suggesting that M. genitalium may export a small peptide with antibacterial activity. The M. genitalium isolog of the M. hyorhinis p37 high-affinity transport system also has a conserved lipid modification site, providing further evidence that the Mycoplasma binding-protein dependent transport systems are organized in a manner analogous to those of gram positive bacteria (Gilson, E. et al., EMBO J. 7:3971 (1988)).


[0249] Genes encoding proteins that function in the transport of glucose via the phosphoenolpyruvate:sugar phosphotransferase system (PTS) have been identified in M. genitalium. These include enzyme I (EI), HPr and sugar specific enzyme IIs (EII) (Postma, P. W. et al., Microbiol. Rev. 57:543 (1993)). EIIs consist of a complex of at least three domains, EIIA, EIIB and EIIC. In some bacteria (e.g., E. coli), EIIA is a soluble protein, while in others (e.g., Bacillus subtilis), a single membrane protein contains all three domains, EIIA, B and C. These variations in the proteins that make up the EII complex are due to fusion or splitting of domains during evolution and are not considered to be mechanistic differences (Postma, P. W. et al., Microbiol. Rev. 57:543 (1993)). In M. genitalium, EIIA, B, and C are located in a single protein similar to the protein found in B. subtilis. In Mycoplasma capricolum, ptsH, the gene which encodes HPr, is located on a monocistronic transcriptional unit, while genes encoding EI (ptsI) and EIIA (crr) are located on a dicistronic operon (Zhu, P. P. et al., Protein Sci. 3:2115 (1994); Zhu, P. P. et al., J. Biol. Chem. 268:26531 (1993)). In most bacterial species studied to date, ptsI, ptsH, and crr are part of a polycistronic operon (pts operon). In M. genitalium, ptsH, ptsI and the gene encoding EIIABC reside at different locations of the genome, and thus each of these genes may constitute a monocistronic transcriptional unit. We have also identified an EIIBC component for uptake of fructose; however, other components of the fructose PTS were not found. Thus, M. genitalium may be limited to the use of glucose as an energy source. In contrast, H. influenzae has the ability to use at least six different sugars as a source of carbon and energy.


[0250] Regulatory Systems. It appears that regulatory systems found in other bacteria are absent in M. genitalium. For instance, although two component systems have been described for a number of gram-positive organisms, no sensor or response regulator genes are found in the M. genitalium genome. Furthermore, the lack of a heat shock σ factor raises the question of how the heat shock response is regulated. Another stress faced by all metabolically active organisms is the generation of reactive oxygen intermediates such as superoxide anions and hydrogen peroxide. Although H. influenzae has an oxyR homologue, as well as catalase and superoxide dismutase, M. genitalium appears to lack these genes as well as an NADH peroxidase. The importance of these reactive intermediate molecules in host cell damage suggests that some as yet unidentified protective mechanism may exist within the cell.


[0251] Antigenic variation. Numerous examples exist of microbial pathogens expressing outer membrane proteins that vary due to DNA rearrangements as a mechanism for providing antigenic and functional variations that influence virulence potential (Bergstrom, S. et al., Proc. Natl. Acad. Sci. USA 83:3890 (1986); Meier, J. T. et al., Cell 47:61 (1986); Majiwa, P. A. O. et al., Nature 297:514 (1982)). Because humans are the natural host for both M. genitalium and H. influenzae, it was of interest to compare mechanisms for generating antigenic variation in these organisms. In H. influenzae, a number of virulence-related genes encoding membrane proteins contain tandem tetramer repeats that undergo frequent addition and deletion of one or more repeat units during replication, such that the reading frame of the gene is changed and its expression altered (Weiser, J. N. et al., Cell 59:657 (1989)).
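The effect of adding or deleting tetramer repeat units on the downstream reading frame follows from simple modular arithmetic: each 4-nt unit moves the frame by one position, and the original frame is restored only after three units. A minimal sketch, with the hypothetical function name `frame_after_repeats`:

```python
def frame_after_repeats(n_units, unit_len=4):
    """Reading-frame offset (0, 1 or 2) downstream of n tandem repeat units."""
    return (n_units * unit_len) % 3


# Frame offsets for 0 to 6 tandem copies of a tetramer.
print([frame_after_repeats(n) for n in range(7)])
```

Only copy numbers that leave the offset matching the downstream coding frame allow full-length translation, which is consistent with the altered expression described above.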


[0252] M. genitalium appears to use a different system for evading host immune responses. The 140 kDa adhesion protein of M. genitalium is densely clustered at a differentiated tip of this organism and elicits a strong immune response in humans and experimentally infected animals (Collier, A. M. et al., Zbl. Bkt. Suppl. 20:73 (1992)). The adhesion protein (MgPa) operon in M. genitalium contains a 29 kDa ORF, the MgPa protein (160 kDa) and a 114 kDa ORF with intervening regions of 6 and 1 nt, respectively (Inamine, J. M. et al., Gene 82:259 (1989)). Based on hybridization experiments (Dallo, S. F. and Baseman, J. B., Microb. Pathog. 8:371 (1990)), multiple copies of regions of the M. genitalium MgPa gene and the 114 kDa ORF are known to exist throughout the genome.


[0253] The availability of the complete genomic sequence from M. genitalium has allowed a comprehensive mapping of the MgPa repeats (FIGS. 4 and 6). In addition to the complete operon, nine repetitive elements which are composites of particular regions of the MgPa operon were found. The percent of sequence identity between the repeat elements and the MgPa gene ranges from 78%-90%. In some of the repeats, the MgPa-related sequences are separated in the genome by a variable length, A-T rich spacer sequence, as has previously been described (Peterson, S. N., PhD dissertation, Univ. No. Carolina 1992, Univ. Mi. Dissertation Services #6246). The sequences contained in the MgPa operon and the nine repeats scattered throughout the chromosome represent 4.5% of the total genomic sequence. At first glance this might appear to contradict the expectation for a minimal genome. However, recent evidence for recombination between the repetitive elements and the MgPa operon has been reported (Peterson, S. N. et al., Proc. Natl. Acad. Sci. USA, in press (1995)). Such recombination may allow M. genitalium to evade the host immune response through mechanisms that induce antigenic variation within the population. Since M. genitalium survives in nature by obtaining essential nutrients from its mammalian host, an efficient mechanism to evade the immune response may be a necessary part of this minimal genome.


[0254] The M. genitalium genome contains 93 putatively identified genes that are apparently not present in H. influenzae. Almost 60% of these genes have database matches to known or hypothetical proteins from gram-positive bacteria or other Mycoplasma species, suggesting that these genes may encode proteins with a restricted phylogenetic distribution. One hundred seventeen potential coding regions in M. genitalium have no database match to any sequences in public archives, including the entire H. influenzae genome; therefore, these likely represent novel genes in M. genitalium and related organisms.


[0255] The predicted coding sequences of the hypothetical ORFs, the ORFs with motif matches and the ORFs that have no similarities to known peptide sequences were analyzed. The two programs used were the Kyte-Doolittle algorithm (Kyte, J. and Doolittle, R. F., J. Mol. Biol. 157:105 (1982)) with a range of 11 residues, and PSORT which is available on the WWW site http://psort.nibb.ac.jp. PSORT predicts the presence of signal sequences by the methods of McGeoch (McGeoch, D. J., Virus Res. 3:271 (1985)) and von Heijne (von Heijne, G., Nucl. Acids Res. 14:4683 (1986)), and detects potential transmembrane domains by the method of Klein et al. (Klein, P. et al., Biochim. Biophys. Acta 815:468 (1985)). Of a total of 201 ORFs examined, 90 potential membrane proteins were found. Eleven of them are predicted to have type I signal peptides, and five type II signal peptides. Using this approach, at least fifty potential membrane proteins were identified from the list of ORFs with known functions. This brings the total number of membrane proteins in M. genitalium to approximately 140.
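The Kyte-Doolittle analysis with a range of 11 residues amounts to a sliding-window average of per-residue hydropathy values. A minimal sketch follows. The hydropathy scale is the published Kyte-Doolittle scale; the cutoff of 1.6 and the function names are assumptions made for illustration, since the text does not state the threshold applied. PSORT's signal-sequence and transmembrane predictions are not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(protein, window=11):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]  # KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def has_hydrophobic_segment(protein, window=11, threshold=1.6):
    """True if any window average reaches the (assumed) threshold."""
    return any(v >= threshold for v in hydropathy_profile(protein, window))


# Toy peptide: a leucine stretch flanked by charged residues (hypothetical).
toy = "D" * 5 + "L" * 11 + "K" * 5
print(has_hydrophobic_segment(toy))
```

A candidate flagged this way would still need the signal-peptide and transmembrane checks described above before being counted as a potential membrane protein.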


[0256] To manage these putative membrane proteins, M. genitalium has at its disposal a minimal secretory machinery composed of seven functions: three chaperonins, GroEL, DnaK and the trigger factor Tig (Pugsley, A. P., Microbiol. Rev. 57:50 (1993); Guthrie, B. and Wickner, W., J. Bacteriol. 172:5555 (1990)), an ATPase pilot protein (SecA), one integral membrane protein translocase (SecY), a signal recognition particle protein (Ffh) and a lipoprotein-specific signal peptidase (LspA) (Pugsley, A. P., Microbiol. Rev. 57:50 (1993)). Perhaps the lack of other known translocases like SecE, SecD, and SecF, which are present in E. coli and H. influenzae, is related to the fact that M. genitalium has a one-layer cell envelope. Also, the absence of a SecB homologue, the secretory chaperonin of E. coli, in M. genitalium (it is also absent in B. subtilis (Collier, D. N., J. Bacteriol. 176:4937 (1994))) might reflect a difference between gram negative bacteria and wall-less Mollicutes in handling nascent proteins destined for the general secretory pathway. Considering the presence of several putative membrane proteins that contain type I signal peptides, the absence of a signal peptidase I (lepB) is most surprising. A direct electronic search for the M. genitalium lepB gene using the E. coli lepB and the B. subtilis sipS (van Dijl, J. M. et al., EMBO J. 11:2819 (1992)) as queries did not reveal any significant similarities.


[0257] There are a number of possible explanations as to why genes encoding some of the proteins thought to be essential for a self-replicating organism appear to be absent in M. genitalium. One possibility is that a limited number of proteins may have adapted to take on other functions. A second possibility is that certain proteins thought to be essential for life based on studies in E. coli are not required in a simpler prokaryote like M. genitalium. Finally, it may be that sequences from M. genitalium have such a low similarity to known sequences from other species that matches are not detectable above a reasonable confidence threshold.


[0258] Determination of the complete genome sequence of M. genitalium provides a new starting point in understanding the biology of this and related organisms. Comparison of the genes expressed in M. genitalium, a simple prokaryote, with those in H. influenzae, a more complex organism, has revealed a myriad of differences between these species. Fifty-six percent of the genes in M. genitalium have apparent isologs in H. influenzae, suggesting that this subset of the M. genitalium genome may encode the genes that are truly essential for a self-replicating organism. Notable among the genes that are conserved between M. genitalium and H. influenzae are those involved in DNA replication and repair, transcription and translation, cell division, and basic energy metabolism via glycolysis. Isologs of these genes are found in eukaryotes as well.



EXAMPLE 2

[0259] Production of an Antibody to a Mycoplasma genitalium Protein


[0260] Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:


[0261] Monoclonal Antibody Production by Hybridoma Fusion


[0262] Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. Basic Methods in Molecular Biology Elsevier, New York. Section 21-2 (1989).


[0263] Polyclonal Antibody Production by Immunization


[0264] Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with either inadequate or excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).


[0265] Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall (see Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, D., ed., Blackwell (1973)). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman (eds.), Amer. Soc. For Microbio., Washington, D.C. (1980).


[0266] Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.



EXAMPLE 3

[0267] Preparation of PCR Primers and Amplification of DNA


[0268] Various fragments of the Mycoplasma genitalium genome, such as those disclosed in Tables 1a, 1b, 1c and 2 can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the examples that follow.
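The primer selection criteria above (minimum length, matched G/C content and therefore matched melting temperatures) can be checked with a few lines of code. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C, as a rough estimate for short oligonucleotides; that rule, the 4-degree tolerance, the function names and the example primers are illustrative assumptions and are not taken from this Example.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def compatible(fwd, rev, min_len=15, max_tm_gap=4):
    """True if both primers meet the minimum length and have similar Tm."""
    return (len(fwd) >= min_len and len(rev) >= min_len
            and abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_gap)


# Hypothetical 18-mers, not drawn from the Tables.
fwd, rev = "ATGCATGCATGCATGCAT", "AAAAAAAAAAAAAAAAAA"
print(round(gc_fraction(fwd), 2), wallace_tm(fwd), wallace_tm(rev))
print(compatible(fwd, rev))
```

Setting `min_len=18` applies the more preferred length given above.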



EXAMPLE 4

[0269] Gene Expression from DNA Sequences Corresponding to ORFs


[0270] A fragment of the Mycoplasma genitalium genome provided in Tables 1a, 1b, 1c and 2 is introduced into an expression vector using conventional technology (techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art). Vectors and expression systems are commercially available from a variety of suppliers including Stratagene (La Jolla, Calif.), Promega (Madison, Wis.), and Invitrogen (San Diego, Calif.). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Pat. No. 5,082,767, which is hereby incorporated by reference.
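One codon-level check that may be worth running before moving an ORF into a heterologous host follows from the Background, which notes that UGA encodes tryptophan in M. genitalium rather than serving as a stop codon. The sketch below is an illustration under stated assumptions, not a procedure from the source: the function name and the example sequence are hypothetical, and the final codon is skipped on the assumption that it is the intended terminator.

```python
from typing import List


def inframe_tga_positions(orf: str) -> List[int]:
    """Return 1-based nucleotide positions of in-frame TGA codons.

    The last codon is excluded because it is assumed to be the intended stop.
    """
    seq = orf.upper()
    positions = []
    # Step through the reading frame codon by codon, stopping before the last codon.
    for i in range(0, len(seq) - 3, 3):
        if seq[i:i + 3] == "TGA":
            positions.append(i + 1)
    return positions


# Hypothetical ORF: ATG TGA AAA TGA GGT TAA
example_orf = "ATGTGAAAATGAGGTTAA"

if __name__ == "__main__":
    # Each reported position marks a codon that a standard-code host
    # would read as a stop rather than as tryptophan.
    print(inframe_tga_positions(example_orf))
```

Any positions reported would be candidates for recoding (for example to TGG) when the expression organism uses the standard genetic code.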


[0271] The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Mycoplasma genome fragment. Since the ORF lacks a poly A sequence because of its bacterial origin, this sequence can be added to the construct by, for example, splicing out the poly A sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Mycoplasma DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Mycoplasma DNA and containing restriction endonuclease sequences for PstI incorporated into the 5′ primer and BglII at the 5′ end of the corresponding Mycoplasma DNA 3′ primer, taking care to ensure that the Mycoplasma DNA is positioned such that it is followed by the poly A sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A sequence and digested with BglII.
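The primer design described above, with a PstI site on the 5′ primer and a BglII site at the 5′ end of the 3′ primer, can be sketched as follows. This is an illustrative sketch only: the function names, the 18-base annealing length, and the example ORF are assumptions, and the extra flanking bases that are often added 5′ of a restriction site to aid digestion are omitted for brevity.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(orf: str, anneal_len: int = 18) -> tuple:
    """Return (5' primer, 3' primer), each carrying a restriction site at its 5' end.

    anneal_len is an assumed annealing length, chosen to match the
    "at least 18 bases" preference stated in Example 3.
    """
    seq = orf.upper()
    if len(seq) < anneal_len:
        raise ValueError("ORF is shorter than the annealing length")
    five_prime = PSTI_SITE + seq[:anneal_len]
    # The 3' primer anneals to the end of the ORF, so it is the reverse
    # complement of the last anneal_len bases, preceded by the BglII site.
    three_prime = BGLII_SITE + reverse_complement(seq[-anneal_len:])
    return five_prime, three_prime


# Hypothetical 36-base ORF for illustration; not taken from SEQ ID NO: 1.
example_orf = "ATGAAAGCTTTAGGTCAAGAAATTGCTCAAGGTTAA"
fwd, rev = tailed_primers(example_orf)

if __name__ == "__main__":
    print("5' primer:", fwd)
    print("3' primer:", rev)
```

Before ordering such primers, one would also want to confirm that neither recognition sequence occurs inside the ORF itself, since an internal site would be cut during the digestion steps.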


[0272] The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 μg/ml G418 (Sigma, St. Louis, Mo.). The protein is preferably released into the supernatant. However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or expression may be restricted to the cell surface.


[0273] Since it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Mycoplasma DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Mycoplasma DNA.


[0274] If antibody production is not possible, the Mycoplasma DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as a chimeric with, for example, β-globin. Antibody to β-globin is used to purify the chimeric. Corresponding protease cleavage sites engineered between the β-globin gene and the Mycoplasma DNA are then used to separate the two polypeptide fragments from one another after translation. One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene). This vector encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques as described are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. and many of the methods are available from the technical assistance representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be produced from either construct using in vitro translation systems such as In vitro Express™ Translation Kit (Stratagene).


[0275] While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.


[0276] All patents, patent applications and publications recited herein are hereby incorporated by reference.
TABLE 1(a)

UID | end5 | end3 | db_match | db_match name | per_id | per_sim | gene_len
MG006 | 8552 | 9181 | SP: P00572 | thymidylate kinase (CDC8) {Saccharomyces cerevisiae} | 27.5862 | 51.7241 | 630
MG009 | 11252 | 12037 | GB: D26185_102 | hypothetical protein (GB: D26185_102) {Bacillus subtilis} | 35.4331 | 55.1181 | 786
MG010 | 12069 | 12722 | SP: P33655 | DNA primase (dnaE) {Clostridium acetobutylicum} | 25.731 | 53.2164 | 654
MG012 | 14247 | 13573 | SP: P17116 | ribosomal protein S6 modification protein (rimK) {Escherichia coli} | 31.4961 | 54.3307 | 675
MG013 | 15217 | 14399 | GB: D10588_1 | 5,10-methylene-tetrahydrofolate dehydrogenase (folD) {Escherichia coli} | 33.0472 | 53.2189 | 819
MG015 | 17474 | 19240 | SP: P27299 | transport ATP-binding protein (msbA) {Escherichia coli} | 32.2382 | 57.4949 | 1767
MG023 | 26478 | 27341 | GB: M22039_4 | fructose-bisphosphate aldolase (tsr) {Bacillus subtilis} | 45.9649 | 65.9649 | 864
MG024 | 27345 | 28445 | GP: U02423_1 | GTP-binding protein (gtpl) {Escherichia coli} | 46.8401 | 67.658 | 1101
MG032 | 36978 | 38975 | GB: M63489_1 | ATP-dependent nuclease (addA) {Bacillus subtilis} | 26.8293 | 54.2683 | 1998
MG033 | 39242 | 39901 | GB: M99611_2 | glycerol uptake facilitator (glpF) {Bacillus subtilis} | 35.8974 | 55.3846 | 660
MG034 | 40514 | 39876 | GB: M97678_5 | thymidine kinase (tdk) {Bacillus subtilis} | 48.1283 | 69.5187 | 639
MG035 | 40543 | 41784 | GB: U00011_2 | histidyl-tRNA synthetase (hisS) {Mycobacterium leprae} | 30.7107 | 50.7614 | 1242
MG038 | 46277 | 44754 | GB: L19201_68 | glycerol kinase (glpK) {Escherichia coli} | 46.8254 | 70.2381 | 1524
MG039 | 47422 | 46271 | PIR: S48379 | glycerol-3-phosphate dehydrogenase (GUT2) {Saccharomyces cerevisiae} | 43.2099 | 60.4938 | 1152
MG041 | 49377 | 49640 | GB: L22432_2 | phosphohistidinoprotein-hexose phosphotransferase (ptsH) {Mycoplasma capricolum} | 48.8636 | 70.4545 | 264
MG042 | 50060 | 51517 | GB: M64519_1 | spermidine/putrescine transport ATP-binding protein (potA) {Escherichia coli} | 41.9231 | 65.3846 | 1458
MG043 | 51525 | 52379 | GB: M64519_2 | spermidine/putrescine transport system permease protein (potB) {Escherichia coli} | 26.5116 | 57.2093 | 855
MG044 | 52366 | 53217 | GB: M64519_3 | spermidine/putrescine transport system permease protein (potC) {Escherichia coli} | 29.4574 | 58.1395 | 852
MG046 | 54658 | 55602 | GB: M62364_1 | sialoglycoprotease (gcp) {Pasteurella haemolytica} | 36.6013 | 59.4771 | 945
MG048 | 58310 | 56973 | SP: P37105 | signal recognition particle protein (ffh) {Bacillus subtilis} | 43.0206 | 66.1327 | 1338
MG049 | 58117 | 59076 | GB: U14003_295 | purine-nucleoside phosphorylase (deoD) {Escherichia coli} | 44.7826 | 63.0435 | 960
MG050 | 59083 | 59751 | GB: X13544_1 | deoxyribose-phosphate aldolase (deoC) {Mycoplasma pneumoniae} | 83.0357 | 91.5179 | 669
MG056 | 65731 | 64901 | GB: D26185_99 | hypothetical protein (GB: D26185_99) {Bacillus subtilis} | 30.2583 | 54.6125 | 831
MG057 | 66249 | 65716 | GB: D26185_104 | hypothetical protein (GB: D26185_104) {Bacillus subtilis} | 28.9017 | 28.9017 | 534
MG067 | 81047 | 82594 | GB: D00730_1 | glutamic acid specific protease (SPase) {Staphylococcus aureus} | 28.8462 | 48.0769 | 1548
MG070 | 91065 | 91916 | SP: P34831 | ribosomal protein S2 (rpS2) {Spirulina platensis} | 34.8 | 55.2 | 852
MG077 | 103104 | 104324 | SP: P24138 | oligopeptide transport system permease protein (oppB) {Bacillus subtilis} | 28.0528 | 58.4158 | 1221
MG078 | 104320 | 105447 | SP: P26904 | oligopeptide transport system permease protein (dciAC) {Bacillus subtilis} | 33.4572 | 55.0186 | 1128
MG079 | 105452 | 106657 | SP: P18765 | oligopeptide transport ATP-binding protein (amiE) {Streptococcus pneumoniae} | 47.9412 | 67.9412 | 1206
MG081 | 109262 | 109672 | SP: P29395 | ribosomal protein L11 (RPL11) {Thermotoga maritima} | 51.7986 | 71.9424 | 411
MG085 | 111790 | 112722 | PIR: S24760 | hydroxymethylglutaryl-CoA reductase (NADPH) {Nicotiana sylvestris} | 23.3216 | 49.1166 | 933
MG086 | 112718 | 113863 | GB: L13259_2 | prolipoprotein diacylglyceryl transferase (lgt) {Salmonella typhimurium} | 29.1262 | 53.8835 | 1146
MG091 | 117553 | 118032 | GB: U04997_2 | single-stranded DNA binding protein (ssb) {Haemophilus influenzae} | 21.7949 | 41.6667 | 480
MG092 | 118025 | 118339 | GB: U14003_114 | ribosomal protein S18 (rpS18) {Escherichia coli} | 45.4545 | 68.1818 | 315
MG093 | 118345 | 118794 | GB: M57623_1 | ribosomal protein L9 (rpL9) {Bacillus stearothermophilus} | 32.8859 | 56.3758 | 450
MG099 | 125852 | 127282 | GB: M61151_1 | hydrolase (aux2) {Agrobacterium rhizogenes} | 32.1212 | 51.8182 | 1431
MG106 | 134826 | 134149 | SP: P27251 | formylmethionine deformylase (def) {Escherichia coli} | 36.9369 | 68.4685 | 678
MG107 | 134558 | 135334 | GB: L10328_14 | 5′guanylate kinase (gmk) {Escherichia coli} | 42.623 | 65.0273 | 777
MG114 | 141345 | 142052 | GB: M12299_2 | phosphatidylglycerophosphate synthase (pgsA) {Escherichia coli} | 29.2994 | 57.3248 | 708
MG118 | 143935 | 144954 | SP: P09147 | UDP-glucose 4-epimerase (galE) {Escherichia coli} | 34.0557 | 53.87 | 1020
MG121 | 148238 | 149155 | SP: P32720 | hypothetical protein (SP: P32720) {Escherichia coli} | 30.8824 | 50.7353 | 918
MG125 | 153081 | 153935 | GB: L10328_61 | hypothetical protein (GB: L10328_61) {Escherichia coli} | 31.9149 | 48.227 | 855
MG126 | 154962 | 153922 | GB: M24068_1 | tryptophanyl-tRNA synthetase (trpS) {Bacillus subtilis} | 41.1585 | 61.5854 | 1041
MG127 | 154998 | 155432 | SP: P19434 | hypothetical protein (SP: P19434) {Streptomyces viridochromogenes} | 25.9615 | 49.0385 | 435
MG128 | 155443 | 156219 | GB: U00021_19 | hypothetical protein (GB: U00021_19) {Mycobacterium leprae} | 27.7027 | 49.3243 | 777
MG129 | 156222 | 156572 | GB: U12340_1 | PTS glucose-specific permease {Bacillus stearothermophilus} | 25.4545 | 51.8182 | 351
MG130 | 156565 | 158016 | GB: M91593_1 | hypothetical protein (GB: M91593_1) {Mycoplasma mycoides} | 30.6773 | 55.7769 | 1452
MG131 | 158022 | 158243 | GB: M31161_3 | hypothetical protein (GB: M31161_3) {Spiroplasma citri} | 21.5909 | 56.8182 | 222
MG132 | 159005 | 158583 | SP: P32083 | hypothetical protein (SP: P32083) {Mycoplasma hyorhinis} | 30.0971 | 56.3107 | 423
MG136 | 160964 | 162431 | GB: D26185_144 | lysyl-tRNA synthetase (lysS) {Bacillus subtilis} | 45.6212 | 68.4318 | 1470
MG137 | 162376 | 163587 | GP: L41518_4 | dTDP-4-dehydrorhamnose reductase (rfbD) {Klebsiella pneumoniae} | 32.1622 | 55.9459 | 1212
MG139 | 165470 | 167176 | GB: L18927_2 | hypothetical protein (GB: L18927_2) {Buchnera aphidicola} | 28.5714 | 62.8571 | 1707
MG143 | 182853 | 183188 | SP: P09170 | hypothetical protein (SP: P09170) {Escherichia coli} | 25 | 53.7037 | 336
MG145 | 184055 | 184861 | GB: M35367_1 | protein X {Pseudomonas fluorescens} | 29.0698 | 48.4496 | 807
MG148 | 187304 | 188530 | GB: L18965_6 | hypothetical protein (GB: L18965_6) {Thermophilic bacterial sp.} | 25.2874 | 52.8736 | 1227
MG150 | 190048 | 190365 | SP: P38518 | ribosomal protein S10 (rpS10) {Thermotoga maritima} | 48.913 | 71.7391 | 318
MG152 | 191145 | 191777 | SP: P28601 | ribosomal protein L4 (rpL4) {Bacillus stearothermophilus} | 39.2345 | 63.1579 | 633
MG153 | 191784 | 192101 | SP: P04454 | ribosomal protein L23 (rpL23) {Bacillus stearothermophilus} | 38.7097 | 62.3656 | 318
MG154 | 192104 | 192958 | SP: P04257 | ribosomal protein L2 (rpL2) {Bacillus stearothermophilus} | 58.7814 | 72.4014 | 855
MG155 | 192961 | 193221 | GB: X02613_6 | ribosomal protein S19 (rpS19) {Escherichia coli} | 58.6207 | 77.0115 | 261
MG156 | 193227 | 193658 | GB: M74770_4 | ribosomal protein L22 (rpL22) {Mycoplasma-like organism} | 49.0385 | 67.3077 | 432
MG157 | 193664 | 194467 | SP: P02353 | ribosomal protein S3 (rpS3) {Mycoplasma capricolum} | 46.729 | 67.2897 | 804
MG158 | 194476 | 194889 | SP: P02415 | ribosomal protein L16 (rpL16) {Mycoplasma capricolum} | 63.5037 | 78.1022 | 414
MG159 | 194892 | 195491 | SP: P38514 | ribosomal protein L29 (rpL29) {Thermotoga maritima} | 41.6667 | 65 | 600
MG160 | 195494 | 195748 | SP: P10131 | ribosomal protein S17 (rpS17) {Mycoplasma capricolum} | 51.1905 | 67.8571 | 255
MG161 | 195755 | 196120 | SP: P04450 | ribosomal protein L14 (rpL14) {Bacillus stearothermophilus} | 63.1148 | 86.0656 | 366
MG162 | 196123 | 196446 | SP: P04455 | ribosomal protein L24 (rpL24) {Bacillus stearothermophilus} | 44.5783 | 66.2651 | 324
MG163 | 196455 | 196994 | SP: P08895 | ribosomal protein L5 (rpL5) {Bacillus stearothermophilus} | 57.5419 | 77.095 | 540
MG164 | 197000 | 197182 | GB: X06414_15 | ribosomal protein S14 (rpS14) {Mycoplasma capricolum} | 70.4918 | 83.6066 | 183
MG165 | 197179 | 197601 | SP: P04446 | ribosomal protein S8 (rpS8) {Mycoplasma capricolum} | 46.875 | 71.0938 | 423
MG166 | 197611 | 198162 | SP: P04448 | ribosomal protein L6 (rpL6) {Mycoplasma capricolum} | 46.9945 | 66.6667 | 552
MG167 | 198167 | 198511 | GB: M57624_1 | ribosomal protein L18 (rpL18) {Bacillus stearothermophilus} | 42.9825 | 57.8947 | 345
MG169 | 199160 | 199609 | SP: P10138 | ribosomal protein L15 (rpL15) {Mycoplasma capricolum} | 41.8919 | 66.2162 | 450
MG170 | 199612 | 201036 | SP: P10250 | preprotein translocase secY subunit (secY) {Mycoplasma capricolum} | 38.7892 | 68.1614 | 1425
MG171 | 201033 | 201674 | GB: M88104_2 | adenylate kinase (adk) {Bacillus stearothermophilus} | 32.2115 | 57.6923 | 642
MG172 | 201680 | 202423 | GB: D00619_5 | methionine amino peptidase (map) {Bacillus subtilis} | 36.2903 | 58.4677 | 744
MG173 | 202426 | 202635 | GB: M26414_1 | initiation factor 1 (infA) {Bacillus subtilis} | 48.5294 | 67.6471 | 210
MG174 | 202649 | 202759 | SP: P38015 | ribosomal protein L36 (rpL36) {Chlamydia trachomatis} | 78.3784 | 83.7838 | 111
MG177 | 203516 | 204499 | GB: M26414_5 | RNA polymerase alpha core subunit (rpoA) {Bacillus subtilis} | 39.3939 | 65.9933 | 984
MG178 | 204515 | 204515 | GB: M26414_6 | ribosomal protein L17 (rpL17) {Bacillus subtilis} | 34.7826 | 59.1304 | 369
MG179 | 204873 | 205694 | SP: P11599 | haemolysin secretion ATP-binding protein (hlyB) {Proteus vulgaris} | 34.5992 | 62.0253 | 822
MG187 | 216762 | 218516 | GB: M77351_7 | ATP-binding protein (msmK) {Streptococcus mutans} | 40.5325 | 65.6805 | 1755
MG188 | 218522 | 219508 | GB: M77351_4 | membrane protein (msmF) {Streptococcus mutans} | 22.4719 | 51.6854 | 987
MG189 | 219435 | 220436 | GB: M77351_5 | membrane protein (msmG) {Streptococcus mutans} | 27.1429 | 52.8571 | 1002
MG196 | 235635 | 236057 | GB: X16188_1 | translation initiation factor IF3 (infC) {Bacillus stearothermophilus} | 31.3433 | 62.6866 | 423
MG197 | 236063 | 236239 | PIR: S05347 | ribosomal protein L35 (rpL35) {Bacillus stearothermophilus} | 60 | 72.7273 | 177
MG198 | 236245 | 236616 | SP: Q05427 | ribosomal protein L20 (rpL20) {Mycoplasma fermentans} | 57.5221 | 73.4513 | 372
MG201 | 239163 | 239813 | GB: M84964_2 | heat shock protein (grpE) {Bacillus subtilis} | 31.677 | 49.6894 | 651
MG205 | 245596 | 244568 | GB: M84964_1 | hypothetical protein (GB: M84964_1) {Bacillus subtilis} | 30.9942 | 58.1871 | 1029
MG213 | 252579 | 253991 | GB: L09228_16 | hypothetical protein (GB: L09228_16) {Bacillus subtilis} | 27.1186 | 54.661 | 1413
MG214 | 253978 | 254598 | GB: L09228_17 | hypothetical protein (GB: L09228_17) {Bacillus subtilis} | 34.8571 | 59.4286 | 621
MG215 | 254620 | 255588 | SP: P20275 | 6-phosphofructokinase (pfk) {Spiroplasma citri} | 39.441 | 63.0435 | 969
MG217 | 258040 | 259155 | SP: P29126 | bifunctional endo-1,4-beta-xylanase xyla precursor (xynA) {Ruminococcus flavefaciens} | 37.5839 | 48.9933 | 1116
MG219 | 265596 | 266039 | GB: M87491_1 | IgA1 protease {Haemophilus influenzae} | 32.2314 | 51.2397 | 444
MG220 | 266382 | 266077 | GB: Z26883_1 | pre-procytotoxin (vacA) {Helicobacter pylori} | 36.1446 | 51.8072 | 306
MG222 | 267080 | 268006 | GB: D10483_63 | hypothetical protein (GB: D10483_63) {Escherichia coli} | 35.1974 | 56.5789 | 927
MG224 | 269249 | 270355 | GB: U06462_1 | cell division protein (ftsZ) {Staphylococcus aureus} | 30.8824 | 50.7353 | 1107
MG234 | 279491 | 279802 | GB: K02665_2 | ribosomal protein L27 (rpL27) {Bacillus subtilis} | 64.3678 | 80.4598 | 312
MG235 | 279798 | 280670 | SP: P12638 | endonuclease IV (nfo) {Escherichia coli} | 29.368 | 51.3011 | 873
MG245 | 293446 | 293940 | GB: M12965_1 | hypothetical protein (GB: M12965_1) {Escherichia coli} | 33.8462 | 56.9231 | 495
MG247 | 295484 | 294768 | SP: P31056 | hypothetical protein (SP: P31056) {Escherichia coli} | 32.973 | 56.2162 | 717
MG248 | 296127 | 295474 | GP: U17284_2 | major sigma factor (rpoD) {Listeria monocytogenes} | 28.4848 | 51.5152 | 654
MG251 | 300802 | 299465 | GB: L08106_1 | glycyl-tRNA synthetase {Bombyx mori} | 35.8974 | 56.1772 | 1338
MG252 | 301550 | 300825 | GP: Z33076_2 | rRNA methylase {Mycoplasma capricolum} | 38.8626 | 59.7156 | 726
MG253 | 302839 | 301556 | GB: D26185_156 | cysteinyl-tRNA synthetase (cysS) {Bacillus subtilis} | 34.3458 | 56.3084 | 1284
MG257 | 307635 | 307925 | GB: L19201_78 | ribosomal protein L31 (rpL31) {Escherichia coli} | 37.3134 | 61.194 | 291
MG258 | 307928 | 309004 | GB: M11519_1 | peptide chain release factor 1 (RF-1) {Escherichia coli} | 43.1677 | 66.4596 | 1077
MG259 | 309008 | 310375 | GB: D28567_2 | protoporphyrinogen oxidase (hemK) {Escherichia coli} | 30.5732 | 54.1401 | 1368
MG260 | 310509 | 312803 | GB: Z32651_1 | hypothetical protein (GB: Z32651_1) {Mycoplasma pneumoniae} | 57.1429 | 71.4286 | 2295
MG262 | 318330 | 319202 | GB: L11920_1 | DNA polymerase I (polI) {Mycobacterium tuberculosis} | 29.9419 | 47.9651 | 873
MG264 | 321044 | 321637 | GB: M64324_1 | 6-phosphogluconate dehydrogenase (gnd) {Escherichia coli} | 29.8507 | 47.7612 | 594
MG265 | 322412 | 321579 | GB: L10328_61 | hypothetical protein (GB: L10328_61) {Escherichia coli} | 27.193 | 48.6842 | 834
MG268 | 325877 | 325194 | GB: U01881_2 | deoxyguanosine/deoxyadenosine kinase(I) subunit 2 {Lactobacillus acidophilus} | 29.5181 | 49.3976 | 684
MG270 | 328442 | 327435 | GB: U14003_297 | hypothetical protein (GB: U14003_297) {Escherichia coli} | 38.2838 | 57.7558 | 1008
MG272 | 330984 | 329833 | GB: M81753_3 | dihydrolipoamide acetyltransferase (pdhC) {Acholeplasma laidlawii} | 45.1524 | 62.0499 | 1152
MG273 | 332214 | 331237 | GB: M81753_2 | pyruvate dehydrogenase E1-beta subunit (pdhB) {Acholeplasma laidlawii} | 55.0314 | 76.7296 | 978
MG274 | 333308 | 332235 | GB: M81753_1 | pyruvate dehydrogenase E1-alpha subunit (pdhA) {Acholeplasma laidlawii} | 42.9825 | 61.1111 | 1074
MG277 | 338323 | 335414 | GB: L16960_2 | spore germination apparatus protein (gerBB) {Bacillus subtilis} | 31.2 | 55.2 | 2910
MG280 | 341920 | 341177 | GB: Z35086_1 | sensory rhodopsin II transducer (htrII) {Natronobacterium pharaonis} | 15.7143 | 46.6667 | 744
MG288 | 353034 | 351793 | GB: L04466_1 | protein L {Peptostreptococcus magnus} | 31.1475 | 50.8197 | 1242
MG290 | 355119 | 355853 | SP: P15361 | ATP-binding protein P29 {Mycoplasma hyorhinis} | 32.3009 | 58.8496 | 735
MG292 | 360592 | 357893 | GB: J01581_1 | alanyl-tRNA synthetase (alaS) {Escherichia coli} | 33.8403 | 55.64 | 2700
MG295 | 364022 | 362922 | SP: P25745 | hypothetical protein (SP: P25745) {Escherichia coli} | 34.7107 | 57.0248 | 1101
MG299 | 369694 | 368735 | SP: P39646 | phosphotransacetylase (pta) {Clostridium acetobutylicum} | 44.6541 | 63.522 | 960
MG303 | 373998 | 372928 | GB: M61017_1 | membrane transport protein (glnQ) {Bacillus stearothermophilus} | 31.982 | 54.955 | 1071
MG304 | 374741 | 373983 | GB: U13043_1 | membrane associated ATPase (cbiO) {Propionibacterium freudenreichii} | 30.0448 | 53.8117 | 759
MG310 | 386462 | 387265 | GB: D11037_1 | proline iminopeptidase (pip) {Bacillus coagulans} | 29.2079 | 51.4851 | 804
MG311 | 387892 | 387278 | GB: M59358_1 | ribosomal protein S4 (rpS4) {Bacillus subtilis} | 43 | 65.5 | 615
MG313 | 392023 | 391397 | GP: L38997_5 | cytadherence-accessory protein (hmw1) {Mycoplasma pneumoniae} | 53.8462 | 79.8077 | 627
MG315 | 394550 | 393660 | GP: L38997_3 | cytadherence accessory protein (hmw1) {Mycoplasma pneumoniae} | 44.3878 | 69.898 | 891
MG316 | 395583 | 394477 | GB: L15202_4 | competence locus E (comE3) {Bacillus subtilis} | 30.4933 | 52.4664 | 1107
MG322 | 405398 | 403725 | GB: D17462_11 | Na+ATPase subunit J (ntpJ) {Enterococcus hirae} | 31.0811 | 56.3063 | 1674
MG323 | 405455 | 406135 | GB: D37799_6 | hypothetical protein (GB: D37799_6) {Bacillus subtilis} | 27.5701 | 54.2056 | 681
MG325 | 408953 | 408795 | SP: P23375 | ribosomal protein L33 (rpL33) {Bacillus stearothermophilus} | 58.1395 | 69.7674 | 159
MG326 | 409857 | 408973 | GB: Z18629_1 | hypothetical protein (GB: Z18629_1) {Bacillus subtilis} | 27.0758 | 52.7076 | 885
MG329 | 414318 | 412975 | GB: U00021_5 | hypothetical protein (GB: U00021_5) {Mycobacterium leprae} | 32.1839 | 54.2529 | 1344
MG332 | 416329 | 415613 | GB: D10165_3 | hypothetical protein (GB: D10165_3) {Escherichia coli} | 26.9231 | 49.1453 | 717
MG346 | 443922 | 444419 | GB: M65289_3 | hypothetical protein (GB: M65289_3) {Bacillus stearothermophilus} | 37.9747 | 60.1266 | 498
MG347 | 444413 | 445042 | SP: P32049 | hypothetical protein (SP: P32049) {Escherichia coli} | 28.4615 | 46.9231 | 630
MG351 | 449665 | 450216 | SP: P37981 | inorganic pyrophosphatase (ppa) {Thermoplasma acidophilum} | 38.8535 | 61.7834 | 552
MG355 | 453757 | 451616 | GB: M29364_2 | ATP-dependent protease binding subunit (clpB) {Escherichia coli} | 47.7337 | 70.6799 | 2142
MG356 | 454753 | 453914 | GB: M27280_1 | lic-1 operon protein (licA) {Haemophilus influenzae} | 27.7778 | 56.25 | 840
MG359 | 457347 | 458267 | GB: M21298_2 | Holliday junction DNA helicase (ruvB) {Escherichia coli} | 34.6939 | 64.966 | 921
MG360 | 459495 | 458263 | SP: P14303 | UV protection protein (mucB) {Salmonella typhimurium} | 22.0859 | 48.1595 | 1233
MG363 | 460497 | 460667 | GB: M29698_2 | ribosomal protein L32 (rpL32) {Escherichia coli} | 48.1481 | 62.963 | 171
MG364 | 461015 | 461686 | GB: M95954_1 | mobilization protein (mobl3) {Leuconostoc oenos} | 30.8725 | 53.6913 | 672
MG367 | 465434 | 464649 | GB: X02673_1 | ribonuclease III (rnc) {Escherichia coli} | 30.1724 | 65.5172 | 786
MG380 | 478999 | 479574 | GB: L10328_105 | glucose inhibited division protein (gidB) {Escherichia coli} | 24.8276 | 51.7241 | 576
MG382 | 480691 | 481329 | SP: P31218 | uridine kinase (udk) {Escherichia coli} | 34.4828 | 62.5616 | 639
MG383 | 482075 | 481332 | GB: M15811_1 | sporulation protein (outB) {Bacillus subtilis} | 36.3636 | 54.9784 | 744
MG384 | 483369 | 482071 | GB: M24537_2 | GTP-binding protein (obg) {Bacillus subtilis} | 39.627 | 62.0047 | 1299
MG387 | 490711 | 489842 | SP: P37214 | GTP-binding protein era homolog (spg) {Streptococcus mutans} | 27.3859 | 51.0373 | 870
MG396 | 500719 | 500264 | GB: M80797_2 | galactosidase acetyltransferase (lacA) {Streptococcus mutans} | 40.5797 | 57.971 | 456
MG398 | 502823 | 502425 | SP: P33255 | ATP synthase epsilon chain (atpC) {Mycoplasma gallisepticum} | 36.9231 | 55.3846 | 399
MG402 | 507201 | 506674 | SP: P33254 | ATP synthase delta chain (atpH) {Mycoplasma gallisepticum} | 33.9181 | 58.4795 | 528
MG403 | 507820 | 507197 | SP: P33256 | ATP synthase B chain (atpF) {Mycoplasma gallisepticum} | 36.5979 | 66.4948 | 624
MG404 | 508131 | 507826 | SP: P33258 | ATP synthase C chain (atpE) {Mycoplasma gallisepticum} | 50 | 74.359 | 306
MG407 | 510836 | 509463 | GB: L29475_4 | enolase (eno) {Bacillus subtilis} | 54.0793 | 74.1259 | 1374
MG408 | 510903 | 511373 | SP: P14930 | pilin repressor (pilB) {Neisseria gonorrhoeae} | 49.2188 | 68.75 | 471
MG409 | 512050 | 511376 | GB: L10328_88 | peripheral membrane protein U (phoU) {Escherichia coli} | 27.027 | 48.6486 | 675
MG420 | 524144 | 523365 | GB: D26185_83 | DNA polymerase III subunit (dnaH) {Bacillus subtilis} | 49.115 | 68.5841 | 780
MG424 | 531479 | 531222 | SP: P05766 | ribosomal protein S15 (BS18) {Bacillus stearothermophilus} | 48.1481 | 71.6049 | 258
MG426 | 533040 | 533231 | GB: L12244_2 | ribosomal protein L28 (rpL28) {Bacillus subtilis} | 36.0656 | 59.0164 | 192
MG429 | 536036 | 534321 | GB: M69050_2 | PEP-dependent HPr protein kinase phosphoryltransferase (ptsI) {Staphylococcus carnosus} | 46.4789 | 66.5493 | 1716
MG430 | 537563 | 536043 | GB: L29475_3 | phosphoglycerate mutase (pgm) {Bacillus subtilis} | 45.1866 | 62.4754 | 1521
MG432 | 539546 | 538353 | SP: P27712 | hypothetical protein (SP: P27712) {Spiroplasma citri} | 28.436 | 48.8152 | 1194
MG433 | 539632 | 540525 | GB: M31161_2 | elongation factor Ts (tsf) {Spiroplasma citri} | 39.0572 | 62.6263 | 894
MG434 | 540848 | 541237 | GB: D26562_56 | mukB suppressor protein (smbA) {Escherichia coli} | 40.8696 | 61.7391 | 390
MG435 | 541240 | 541788 | GB: D26562_57 | ribosome releasing factor (frr) {Escherichia coli} | 34.9112 | 57.3965 | 549
MG438 | 543004 | 544152 | GB: J01631_1 | restriction-modification enzyme EcoD specificity subunit (hsdS) {Escherichia coli} | 24.5734 | 45.7338 | 1149
MG442 | 547690 | 546881 | GB: U00021_5 | hypothetical protein (GB: U00021_5) {Mycobacterium leprae} | 26.8966 | 42.069 | 810
MG443 | 548849 | 547665 | GB: D16311_1 | hypothetical protein (GB: D16311_1) {Bacillus subtilis} | 26.1818 | 52 | 1185
MG444 | 549224 | 548868 | SP: P30529 | ribosomal protein L19 (rpL19) {Bacillus stearothermophilus} | 49.1071 | 69.6429 | 357
MG445 | 549903 | 549211 | SP: P36245 | tRNA (guanine-N1)-methyltransferase (trmD) {Salmonella typhimurium} | 40.8072 | 64.1256 | 693
MG446 | 550172 | 549906 | SP: P21474 | ribosomal protein S16 (BS17) {Bacillus subtilis} | 48.7805 | 64.6341 | 267
MG448 | 552897 | 552448 | GB: Z33052_1 | pilin repressor (pilB) {Mycoplasma capricolum} | 53.4884 | 72.093 | 450
MG454 | 557770 | 557306 | SP: P23929 | osmotically inducible protein (osmC) {Escherichia coli} | 28.4091 | 51.1364 | 465
MG457 | 562602 | 560497 | GB: D26185_132 | cell division protein (ftsH) {Bacillus subtilis} | 49.7445 | 68.1431 | 2106
MG461 | 566203 | 564929 | GB: X73124_94 | hypothetical protein (GB: X73124_94) {Bacillus subtilis} | 40 | 64.2857 | 1275
MG464 | 569554 | 568400 | GB: D14982_3 | hypothetical protein (GB: D14982_3) {Mycoplasma capricolum} | 32.3699 | 53.7572 | 1155
MG465 | 569912 | 569529 | GB: D14982_2 | RNaseP C5 subunit (rnpA) {Mycoplasma capricolum} | 40 | 58.75 | 384
MG466 | 570027 | 569884 | GB: L10328_67 | ribosomal protein L34 (rpL34) {Escherichia coli} | 67.3913 | 80.4348 | 144
MG470 | 580030 | 579224 | GB: D26185_55 | SpoOJ regulator {Bacillus subtilis} | 27.8884 | 53.3865 | 807
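For most entries in Table 1(a), the gene_len value appears to equal the inclusive distance between end5 and end3, and entries with end5 greater than end3 presumably lie on the complementary strand. The sketch below illustrates that reading; the function name is hypothetical, the strand interpretation is an assumption rather than something the table states, and the two sample rows are MG006 and MG012 from the table.

```python
def orf_span(end5: int, end3: int) -> tuple:
    """Return (strand, length in bases) for an ORF from its end5 and end3 coordinates.

    Assumption: end5 > end3 means the ORF is on the complementary strand.
    The length is the inclusive distance between the two coordinates.
    """
    strand = "+" if end3 >= end5 else "-"
    length = abs(end3 - end5) + 1
    return strand, length


# (UID, end5, end3, gene_len) as listed in Table 1(a).
sample_rows = [
    ("MG006", 8552, 9181, 630),
    ("MG012", 14247, 13573, 675),
]

if __name__ == "__main__":
    for uid, end5, end3, gene_len in sample_rows:
        strand, length = orf_span(end5, end3)
        status = "ok" if length == gene_len else "check"
        print(uid, strand, length, status)
```

A check of this kind can also flag entries whose listed coordinates and gene_len do not agree, which may indicate a transcription error in a row.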


[0277]

TABLE 1(b)

UID | end5 | end3 | db_match | db_match name | per_sim | per_id | match_info
MG002 | 1829 | 2758 | SP: P35514 | heat shock protein (dnaJ) {Lactococcus lactis} | 40 | 61.6667 | MG002(1-930 of 930) GB: U09251(298-1227 of 6140)
MG003 | 2846 | 4795 | GB: U09251_3 | DNA gyrase subunit B (gyrB) {Mycoplasma genitalium} | 99.3846 | 99.3846 | MG003(1-1950 of 1950) GB: U09251(1315-3264 of 6140)
MG004 | 4813 | 7320 | GB: U09251_4 | DNA gyrase subunit A (gyrA) {Mycoplasma genitalium} | 99.8804 | 99.8804 | MG004(1-2508 of 2508) GB: U09251(3282-5789 of 6140)
MG191 | 221571 | 225902 | SP: P20796 | attachment protein, MgPa operon (mgp) {Mycoplasma genitalium} | 100 | 100 | MG191(1-4332 of 4332) GB: M31431(1066-5397 of 8760)
MG192 | 225907 | 229062 | SP: P22747 | 114 kDa protein, MgPa operon (mgp) {Mycoplasma genitalium} | 100 | 100 | MG192(1-3156 of 3156) GB: M31431(5402-8557 of 8760)
MG232 | 278904 | 279203 | SP: P26908 | ribosomal protein L21 (rpL21) {Bacillus subtilis} | 37.8947 | 65.2632 | MG232(1-300 of 300) GB: U02141(138-437 of 827)
MG233 | 279199 | 279495 | GP: U02141_2 | ribosomal protein L21 homolog {Mycoplasma genitalium} | 100 | 100 | MG233(1-297 of 297) GB: U02141(433-729 of 827)
MG287 | 348882 | 349133 | SP: P04686 | nodulation protein F (nodE) {Rhizobium leguminosarum} | 34.9398 | 56.6265 | MG287(1-252 of 252) GB: U01810(152-403 of 917)
MG417 | 521868 | 521473 | SP: P07842 | ribosomal protein S9 (rpS9) {Bacillus stearothermophilus} | 51.9685 | 71.6535 | MG417(1-396 of 396) GB: U01744(127-522 of 620)
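Each match_info entry in Table 1(b) pairs a range within the M. genitalium ORF with a range within a previously deposited database sequence, in the form "name(start-end of total)". The following is a minimal sketch of how such an entry could be parsed; the regular expression and function name are illustrative assumptions based only on the entries shown in the table, and the example string is the MG002 entry.

```python
import re

# One segment looks like "MG002(1-930 of 930)" or "GB: U09251(298-1227 of 6140)".
_SEGMENT = re.compile(r"([^()]+?)\s*\(\s*(\d+)\s*-\s*(\d+)\s+of\s+(\d+)\s*\)")


def parse_match_info(text: str) -> list:
    """Split a match_info string into segments with integer coordinates.

    Each segment is returned as a dict with the sequence id, the start and
    end of the matched range, the total sequence length, and the number of
    aligned bases (inclusive; start may exceed end for reverse matches).
    """
    segments = []
    for name, start, end, total in _SEGMENT.findall(text):
        start, end, total = int(start), int(end), int(total)
        segments.append({
            "id": name.strip(),
            "start": start,
            "end": end,
            "total": total,
            "aligned": abs(end - start) + 1,
        })
    return segments


example = "MG002(1-930 of 930) GB: U09251(298-1227 of 6140)"

if __name__ == "__main__":
    for segment in parse_match_info(example):
        print(segment)
```

For the MG002 entry both segments cover 930 bases, which is consistent with the whole ORF matching a region of the GB: U09251 record.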










[0278]

3












TABLE 1(c)








UID
end5
end3
db_match
db_match name
per_sim
per_id
match_info






















MG001
1026
1826
GB: U09251_1
DNA polymerase III beta subunit (dnaN)
100
100
MG001(507-801 of






{Mycoplasma genitalium}


801) GB: U09251









(1-295 of 6140)


MG005
7295
8545
GB: D26185_77
seryl-tRNA synthetase (serS) {Bacillus subtilis}
42.615
66.3438
MG005(1-377









of 1251)









GB: U09251(5764-









6140 of 6140)


MG005
7295
8545
GB: D26185_77
seryl-tRNA synthetase (serS) {Bacillus subtilis}
42.615
66.3438
MG005(16-337









of 1251)









GB: U02210(1-322









of 322)


MG007
9157
9918
GB: D26185_83
DNA polymerase III subunit (dnaH) {Bacillus subtilis}
22.695
45.3901
MG007(762-711









of 762)









GB: U02216(270-









321 of 321)


MG008
9924
11249
GB: D26185_60
thiophene and furan oxidizer (tdhF) {Bacillus subtilis}
31.9101
59.7753
MG008(264-1









of 1326)









GB: U02216(1-264









of 321)


MG011
13565
12705




MG011(473-767









of 861)









GB: U02257(2-









296 of 296)


MG014
15556
17424
SP: P27299
transport ATP-binding protein (msbA) {Escherichia coli}
28.0702
52.6316
MG014(1005-678









of 1869)









GB: U02235(1-









326 of 326)


MG018
21063
22343
SP: P32333
helicase (motl) {Saccharomyces cerevisiae}
36.6972
60.0917
MG018(1281-1067









of 1281)









GB: U01723(89-









304 of 304)


MG018
21063
22343
SP: P32333
helicase (mot1) {Saccharomyces cerevisiae}
36.6972
60.0917
MG018(409-105









of 1281)









GB: U02179(1-









305 of 305)


MG018
21063
22343
SP: P32333
helicase (mot1) {Saccharomyces cerevisiae}
36.6972
60.0917
MG018(592-896









of 1281)









GB: U01757(1-









305 of 305)


MG019
22388
23554
SP: P35514
heat shock protein (dnaJ) {Lactococcus lactis}
33.9779
51.105
MG019(44-1









of 1167)









GB: U01723(1-









44 of 304)


MG020
23541
24464
GB: Z25461_2
proline iminopeptidase (pip) {Neisseria gonorrhoeae}
37.5439
55.7895
MG020(723-924









of 924)









GB: U02229(1-









202 of 333)


MG021
24467
26002
GB: D26185_101
methionyl-tRNA synthetase (metS) {Bacillus subtilis}
37.5494
58.8933
MG021(1-129









of 1536)









GB: U02229(205-









333 of 333)


MG021
24467
26002
GB: D26185_101
methionyl-tRNA synthetase (metS) {Bacillus subtilis}
37.5494
58.8933
MG021(1318-1527









of 1536)









GB: X61513(1-209









of 209)


MG022
26035
26469
GB: M21677_1
RNA polymerase delta subunit (rpoE) {Bacillus subtilis}
28.6765
49.2647
MG022(254-1









of 435)









GB: U01721(1-254









of 299)


MG025
28651
29544
GP: Z47767_4
TrsB {Yersinia enterocolitica}
27.551
54.0816
MG025(514-894









of 894)









GB: U02253(1-381









of 649)


MG026
29551
30120
GB: U14003_62
elongation factor P (efp) {Escherichia coli}
26.3804
47.2393
MG026(1-262









of 570)









GB: U02253(388-









649 of 649)


MG029
31702
31145
GB: L19300_1
hypothetical protein (GB: L19300_1)
27.027
45.045
MG029(1-93






{Staphylococcus aureus}


of 558) GB: U01773









(210-302 of 302)


MG030
32324
31707
GB: Z27121_3
uracil phosphoribosyltransferase (upp)
44.9275
66.6667
MG030(414-618






{Mycoplasma hominis}


of 618) GB:









U01773(1-205









of 302)


MG031
36713
32361
GB: U06833_1
DNA polymerase III (polC)
38.0303
59.3182
MG031(1473-1701






{Mycoplasma pulmonis}


of 4353) GB:









U01807(1-









229 of 229)


MG031
36713
32361
GB: U06833_1
DNA polymerase III (polC)
38.0303
59.3182
MG031(2923-3309






{Mycoplasma pulmonis}


of 4353) GB:









U01712(1-









387 of 387)


MG031
36713
32361
GB: U06833_1
DNA polymerase III (polC) {Mycoplasma pulmonis}
38.0303
59.3182
MG031(3330-









3676 of 4353)









GB: U02208(1-









347 of 347)


MG036
41777
43426
SP: P36419
aspartyl-tRNA synthetase (aspS) {Thermus aquaticus}
40.8582
62.8731
MG036(1115-









1650 of 1650)









GB: U01814(1-









532 of 1006)


MG036
41777
43426
SP: P36419
aspartyl-tRNA synthetase (aspS) {Thermus aquaticus}
40.8582
62.8731
MG036(1407-









1638 of 1650)









GB: X61511(1-









232 of 232)


MG036
41777
43426
SP: P36419
aspartyl-tRNA synthetase (aspS) {Thermus aquaticus}
40.8582
62.8731
MG036(1412-









1160 of 1650)









GB: X61523(1-









252 of 252)


MG037
43402
44751
GP: U02020_1
pre-B cell enhancing factor (PBEF) {Homo sapiens}
34.3164
52.2788
MG037(1-









500 of 1350)









GB: U01814(508-









1006 of 1006)


MG040
47581
49353
SP: P29724
membrane lipoprotein (tmpC) {Treponema pallidum}
30.8594
48.0469
MG040(1341-









1552 of 1773)









GB: U02125(1-









212 of 212)


MG045
53205
54653




MG045(381-









4 of 1449)









GB: U02166(1-









378 of 378)


MG047
55589
56737
SP: P30869
S-adenosylmethionine synthetase 2 (metX)
43.6111
60.5556
MG047(787-






{Escherichia coli}


1070 of 1149)









GB: U02123(1-









284 of 284)


MG051
59741
61003
GB: L13289_3
thymidine phosphorylase (deoA) {Mycoplasma pirum}
52.7316
73.6342
MG051(1161-









1263 of 1263)









GB: U02191(1-









103 of 183)


MG052
61015
61404
GB: L13289_4
cytidine deaminase (cdd) {Mycoplasma pirum}
38.2114
64.2276
MG052(1-









69 of 390)









GB: U02191(115-









183 of 183)


MG052
61015
61404
GB: L13289_4
cytidine deaminase (cdd) {Mycoplasma pirum}
38.2114
64.2276
MG052(320-









390 of 390)









GB: U02108(1-









71 of 212)


MG053
61407
63056
GB: L13289_5
phosphomannomutase (cpsG) {Mycoplasma pirum}
38.7868
58.0882
MG053(1-









140 of 1650)









GB: U02108(74-









212 of 212)


MG054
63986
63039
GB: D13303_4
transcription antitermination factor (nusG)
30.8571
51.4286
MG054(688-






{Bacillus subtilis}


44 of 948)









GB: U01710(1-









645 of 645)


MG054
63986
63039
GB: D13303_4
transcription antitermination factor (nusG)
30.8571
51.4286
MG054(948-






{Bacillus subtilis}


719 of 948)









GB: U02236(45-









274 of 276)


MG055
64361
63993




MG055(1-









326 of 369)









GB: U02240(23-









348 of 348)


MG058
67121
66231
GB: D26185_114
phosphoribosylpyrophosphate synthetase (prs)
44.4089
63.5783
MG058(72 -






{Bacillus subtilis}


1 of 891)









GB: U01693(1-









72 of 350)


MG059
67644
67210
GB: D12501_1
small protein (smpB) {Escherichia coli}
32.5581
62.0155
MG059(435-









247 of 435)









GB: U01693(161-









350 of 350)


MG060
67651
68541
SP: P26401
lipopolysaccharide biosynthesis protein (rfbV)
36.0656
59.8361
MG060(723-






{Salmonella typhimurium}


396 of 891)









GB: U02262(1-









328 of 328)


MG061
69908
68526
GB: M89480_4
hexosephosphate transport protein (uhpT)
30.9091
57.2727
MG061(1273-






{Salmonellatyphimurium}


613 of 1383)









GB: U01705(1-









661 of 661)


MG062
70531
72570
SP: P20966
fructose-permease IIBC component (fruA)
42.723
60.5634
MG062(439-






{Escherichia coli}


761 of 2040)









GB: U02138(1-









323 of 323)


MG063
72668
73432
SP: P23539
1-phosphofructokinase (fruK) {Escherichia coli}
26.3158
51.5038
MG063(363-









626 of 765)









GB: U01777(1-









264 of 264)


MG065
77686
79083
GB: X75422_1
heterocyst maturation protein (devA) {Anabaena sp.}
35.2941
59.7285
MG065(1398-









1176 of 1398)









GB: U02154(133-









354 of 354)


MG066
79090
81033
SP: P27302
transketolase 1 (TK 1) (tktA)
32.5617
54.9383
MG066(126-






{Escherichia coli}


1 of 1944)









GB: U02154(1-









126 of 354)


MG068
82621
84042




MG068(1244-









919 of 1422)









GB: U02162(1-









326 of 326)


MG069
88228
90951
SP: P20166
phosphotransferase enzyme II, ABC component (ptsG)
43.1596
61.0749
MG069(1127-






{Bacillus subtilis}


849 of 2724)









GB: U02207(1-









279 of 279)


MG071
91924
94545
SP: P37278
cation-transporting ATPase (pacL)
34.3897
57.277
MG071(1470-






{Synechococcus sp.}


1209 of 2622)









GB: X61532(1-









262 of 262)


MG072
94535
96952
GB: D10279_2
preprotein translocase (secA) {Bacillus subtilis}
43.6601
66.7974
MG072(2269-









2418 of 2418)









GB: U01743(1-









150 of 365)


MG073
96933
98900
SP: P07025
excinuclease ABC subunit B (uvrB)
47.9751
67.2897
MG073(1-






{Escherichia coli}


235 of 1968)









GB: U01743(131-









365 of 365)


MG073
96933
98900
SP: P07025
excinuclease ABC subunit B (uvrB)
47.9751
67.2897
MG073(1584-






{Escherichia coli}


1240 of 1968)









GB: U01698(1-









345 of 345)


MG073
96933
98900
SP: P07025
excinuclease ABC subunit B (uvrB)
47.9751
67.2897
MG073(305-






{Escherichia coli}


694 of 1968)









GB: U02119(1-









391 of 391)


MG074
98906
99316




MG074(369-









411 of 411)









GB: U01715(1-









43 of 576)


MG075
99383
102454




MG075(1-









467 of 3072)









GB: U01715(110-









576 of 576)


MG075
99383
102454




MG075(1206-









804 of 3072)









GB: U02251(1-









403 of 403)


MG075
99383
102454




MG075(1927-









2210 of 3072)









GB: U01749(1-









284 of 284)


MG075
99383
102454




MG075(2841-









2422 of 3072)









GB: U01775(1-









420 of 420)


MG080
106660
109203
SP: P18766
oligopeptide transport ATP-binding protein (amiF)
46.6403
67.1937
MG080(2268-






{Streptococcus pneumoniae}


1954 of 2544)









GB: U02129(1-









315 of 315)


MG080
106660
109203
SP: P18766
oligopeptide transport ATP-binding protein (amiF)
46.6403
67.1937
MG080(951-






{Streptococcus pneumoniae}


646 of 2544)









GB: U01758(1-









306 of 306)


MG082
109675
110352
SP: P04447
ribosomal protein L1 (rpL1)
48.1982
67.5676
MG082(446-






{Bacillus stearothermophilus}


170 of 678)









GB: U02113(1-









278 of 278)


MG083
110355
110921
GB: L32144_1
peptidyl-tRNA hydrolase homolog (pth)
38.2166
57.3248
MG083(567-






{Borrelia burgdorferi}


220 of 567)









GB: U02185(26-









373 of 373)


MG084
110917
111786
SP: P37563
hypothetical protein (SP: P37563)
28.125
46.3542
MG084(30-






{Bacillus subtilis}


1 of 870)









GB: U02185(1-









30 of 373)


MG084
110917
111786
SP: P37563
hypothetical protein (SP: P37563) {Bacillus subtilis}
28.125
46.3542
MG084(794-









870 of 870)









GB: U01783(1-









77 of 269)


MG087
113895
114311
SP: P09901
ribosomal protein S12 (rpS12)
75.3731
82.0896
MG087(417-






{Bacillus stearothermophilus}


349 of 417)









GB: U02212(326-









394 of 394)


MG088
114331
114795
SP: P22744
ribosomal protein S7 (rpS7)
64.9351
81.1688
MG088(305-






{Bacillus stearothermophilus}


1 of 465)









GB: U02212(2-









306 of 394)


MG089
114808
116871
SP: P13551
elongation factor G (fus)
59.2105
78.0702
MG089(1878-






{Thermus aquaticus}


1540 of 2064)









GB: U02180(1-









339 of 340)


MG089
114808
116871
SP: P13551
elongation factor G (fus) {Thermus aquaticus}
59.2105
78.0702
MG089(1885-









2064 of 2064)









GB: U02136(1-









180 of 410)


MG089
114808
116871
SP: P13551
elongation factor G (fus) {Thermus aquaticus}
59.2105
78.0702
MG089(687-









1374 of 2064)









GB: U01722(1-









688 of 688)


MG090
116926
117549
SP: P02358
ribosomal protein S6 (rpS6) {Escherichia coli}
23.8636
44.3182
MG090(1-









176 of 624)









GB: U02136(235-









410 of 410)


MG094
118847
120184
SP: P03005
replicative DNA helicase (dnaB)
33.105
55.0228
MG094(1068-






{Escherichia coli}


731 of 1338)









GB: U01803(1-









336 of 336)


MG094
118847
120184
SP: P03005
replicative DNA helicase (dnaB)
33.105
55.0228
MG094(228-






{Escherichia coli}


1 of 1338)









GB: U02158(1-









228 of 301)


MG095
120191
121384




MG095(355-









759 of 1194)









GB: U01787(1-









403 of 403)


MG096
121939
123519




MG096(1-









309 of 1581)









GB: U01713(58-









366 of 366)


MG096
121939
123519




MG096(361-









531 of 1581)









GB: U01762(1-









171 of 171)


MG097
123579
124313
GB: D13169_3
uracil DNA glycosylase (ung)
32.5688
51.8349
MG097(220-






{Escherichia coli}


694 of 735)









GB: U02201(1-









475 of 475)


MG098
124416
125846
GP: M74170_2
p48 eggshell protein (p48) {Schistosoma mansoni}
23.0769
47.9853
MG098(1260-









831 of 1431)









GB: U01782(1-









431 of 431)


MG098
124416
125846
GP: M74170_2
p48 eggshell protein (p48) {Schistosoma mansoni}
23.0769
47.9853
MG098(134-









467 of 1431)









GB: U01701(1-









334 of 334)


MG100
127278
128708
GP: L22072_1
PET112 protein {Saccharomyces cerevisiae}
30.8696
54.1304
MG100(533-









238 of 1431)









GB: U01799(1-









296 of 296)


MG101
128686
129351




MG101(89-









398 of 666)









GB: U02103(1-









309 of 309)


MG102
129347
130291
GB: J03762_1
thioredoxin reductase (trxB)
38.5906
59.396
MG102(45-






{Escherichia coli}


367 of 945)









GB: U02197(1-









322 of 322)


MG103
130284
131123




MG103(623-









256 of 840)









GB: U02170(1-









368 of 369)


MG104
131384
133558
GB: U14003_91
virulence associated protein homolog (vacB)
29.2335
52.2282
MG104(215-






{Escherichia coli}


491 of 2175)









GB: U01795(1-









277 of 277)


MG108
135337
136116
SP: P35182
protein phosphatase 2C homolog (ptc1)
27.5362
52.1739
MG108(780-






{Saccharomyces cerevisiae}


598 of 780)









GB: U02111(33-









215 of 215)


MG109
136179
137264
PIR: S36944
protein serine/threonine kinase {Arabidopsis thaliana}
33.7398
52.0325
MG109(425-









786 of 1086)









GB: U01720(1-









362 of 362)


MG109
136179
137264
PIR: S36944
protein serine/threonine kinase {Arabidopsis thaliana}
33.7398
52.0325
MG109(781-









1084 of 1086)









GB: U01748(1-









303 of 303)


MG110
137380
138087
GB: U14003_76
hypothetical protein (GB: U14003_76)
28.5714
54.1126
MG110(140-






{Escherichia coli}


242 of 708)









GB: X61518(1-









102 of 102)


MG110
137380
138087
GB: U14003_76
hypothetical protein (GB: U14003_76)
28.5714
54.1126
MG110(670-






{Escherichia coli}


378 of 708)









GB: U01714(1-









293 of 293)


MG111
138105
139403
SP: P13376
phosphoglucose isomerase B (pgiB)
34.8235
53.6471
MG111(1-






{Bacillus stearothermophilus}


98 of 1299)









GB: U01747(38-









135 of 135)


MG112
139396
140022
GB: M64173_3
D-ribulose-5-phosphate 3 epimerase (cfxEc)
33.1361
53.8462
MG112(207-






{Alcaligenes eutrophus}


473 of 627)









GB: U02181(1-









267 of 267)


MG113
140039
141406
GB: M33145_1
asparaginyl-tRNA synthetase (asnS) {Escherichia coli}
41.4579
64.2369
MG113(1231-









941 of 1368)









GB: U01692(1-









291 of 291)


MG115
142314
142550
SP: P31131
hypothetical protein (SP: P31131) {Escherichia coli}
32.6087
50
MG115(198-









237 of 237)









GB: U02127(1-









40 of 234)


MG116
142562
143314




MG116(1-









183 of 753)









GB: U02127(52-









234 of 234)


MG119
144972
146663
GB: M59444_2
methylgalactoside permease ATP-binding protein (mglA)
33.1984
57.6923
MG119(1660-






{Escherichia coli}


1692 of 1692)









GB: U02147(1-









33 of 301)


MG119
144972
146663
GB: M59444_2
methylgalactoside permease ATP-binding protein (mglA)
33.1984
57.6923
MG119(192-






{Escherichia coli}


1 of 1692)









GB: U02149(1-









192 of 681)


MG120
146673
148232
SP: P36948
ribose transport system permease protein (rbsC)
27.4809
51.9084
MG120(1-






{Bacillus subtilis}


259 of 1560)









GB: U02147(43-









301 of 301)


MG122
149198
151324
GB: L27797_2
DNA topoisomerase I (topA) {Bacillus subtilis}
38.9222
59.7305
MG122(1193-









1443 of 2127)









GB: U02134(1-









251 of 251)


MG122
149198
151324
GB: L27797_2
DNA topoisomerase I (topA) {Bacillus subtilis}
38.9222
59.7305
MG122(1578-









1971 of 2127)









GB: U02242(1-









394 of 394)


MG123
151305
152717
GB: M91593_1
hypothetical protein (GB: M91593_1)
23.9837
50.4065
MG123(1413-






{Mycoplasma mycoides}


1236 of 1413)









GB: U01796(114-









291 of 291)


MG124
152767
153072
GB: J03294_1
thioredoxin (trx) {Bacillus subtilis}
36.0825
65.9794
MG124(64-









1 of 306)









GB: U01796(1-









64 of 291)


MG133
159669
158986




MG133(1-









110 of 684)









GB: U02144(237-









345 of 345)


MG133
159669
158986




MG133(435-









673 of 684)









GB: X61537(1-









238 of 238)


MG134
159797
160096
GB: M38777_3
hypothetical protein (GB: M38777_3)
28.5714
57.1429
MG134(109-






{Escherichia coli}


1 of 300)









GB: U02144(1-









109 of 345)


MG135
160913
160074
PIR: E22845
hypothetical protein 4 (GP: Z33006_1)
30.7692
55.9441
MG135(485-






{Trypanosoma brucei}


782 of 840)









GB: U02114(1-









298 of 298)


MG138
163590
165383
GB: K00426_1
GTP-binding membrane protein (lepA)
47.5465
70.5584
MG138(1237-






{Escherichia coli}


938 of 1794)









GB: U02133(2-









301 of 301)


MG138
163590
165383
GB: K00426_1
GTP-binding membrane protein (lepA)
47.5465
70.5584
MG138(1318-






{Escherichia coli}


1794 of 1794)









GB: U01745(1-









477 of 524)


MG138
163590
165383
GB: K00426_1
GTP-binding membrane protein (lepA)
47.5465
70.5584
MG138(323-






{Escherichia coli}


591 of 1794)









GB: X61521(1-









269 of 269)


MG140
175807
179145




MG140(1-









41 of 3339)









GB: U02110(178-









218 of 218)


MG140
175807
179145




MG140(2727-









2429 of 3339)









GB: U01730(1-









297 of 297)


MG140
175807
179145




MG140(3302-









2994 of 3339)









GB: U02156(1-









308 of 308)


MG140
175807
179145




MG140(382-









834 of 3339)









GB: U01729(1-









454 of 454)


MG140
175807
179145




MG140(834-









616 of 3339)









GB: X61512(1-









220 of 220)


MG140
175807
179145




MG140(880-









1182 of 3339)









GB: U01742(1-









303 of 303)


MG141
179153
180745
SP: P32727
N-utilization substance protein A homolog (nusA)
30.8743
53.8251
MG141(223-






{Bacillus subtilis}


871 of 1593)









GB: U01778(1-









652 of 652)


MG142
181007
182863
GB: M34836_1
protein synthesis initiation factor 2 (infB)
46.0292
64.6677
MG142(265-






{Bacillus subtilis}


393 of 1857)









GB: U01765(1-









129 of 129)


MG144
183216
184052




MG144(190-









420 of 837)









GB: U02121(1-









231 of 231)


MG146
184877
186148
GB: X73141_2
hemolysin (tlyC) {Serpulina hyodysenteriae}
26.2712
52.1186
MG146(1272-









1174 of 1272)









GB: U02223(19-









117 of 117)


MG149
188609
189451




MG149(843-









765 of 843)









GB: U02135(182-









260 of 260)


MG151
190372
191142
SP: P10134
ribosomal protein L3 (rpL3) {Mycoplasma capricolum}
42.5926
61.5741
MG151(528-









1 of 771)









GB: U02153(1-









527 of 543)


MG168
198519
199151
GB: M57621_1
ribosomal protein S5 (rpS5)
55.9748
72.327
MG168(505-






{Bacillus stearothermophilus}


633 of 633)









GB: U01726(1-









129 of 260)


MG175
202762
203133
GB: M26414_3
ribosomal protein S13 (rpS13) {Bacillus subtilis}
63.3333
82.5
MG175(22-









372 of 372)









GB: U01733(1-









351 of 600)


MG176
203136
203528
GB: X02543_2
ribosomal protein S11 (rpS11) {Escherichia coli}
47.7876
69.9115
MG176(1-









247 of 393)









GB: U01733(354-









600 of 600)


MG180
205682
206593
GB: M61017_1
membrane transport protein (glnQ)
37.3832
63.0841
MG180(249-






{Bacillus stearothermophilus}


1 of 912)









GB: U01754(1-









248 of 265)


MG180
205682
206593
GB: M61017_1
membrane transport protein (glnQ)
37.3832
63.0841
MG180(912-






{Bacillus stearothermophilus}


784 of 912)









GB: U01750(167-









295 of 295)


MG181
206589
207848




MG181(171-









1 of 1260)









GB: U01750(1-









171 of 295)


MG182
207844
208575
SP: P07649
pseudouridylate synthase I (hisT) {Escherichia coli}
27.0042
45.1477
MG182(1-









308 of 732)









GB: U02176(70-









377 of 377)


MG182
207844
208575
SP: P07649
pseudouridylate synthase I (hisT) {Escherichia coli}
27.0042
45.1477
MG182(732-









383 of 732)









GB: U02100(31-









380 of 380)


MG183
208568
210388
GB: Z32522_1
oligoendopeptidase F (pepF) {Lactococcus lactis}
30
50.6667
MG183(27-









335 of 1821)









GB: U02198(1-









309 of 309)


MG183
208568
210388
GB: Z32522_1
oligoendopeptidase F (pepF) {Lactococcus lactis}
30
50.6667
MG183(38-









1 of 1821)









GB: U02100(1-









38 of 380)


MG184
210392
211342
GB: M97479_2
methyltransferase (ssoIM) {Shigella sonnei}
42.5249
67.4419
MG184(520-









719 of 951)









GB: U02115(1-









200 of 201)


MG190
220479
221561
PIR: JS0068
29 kDa protein, MgPa operon (mgp)
62.0833
82.0833
MG190(28-






{Mycoplasma genitalium}


1083 of 1083)









GB: M31431(1-









1056 of 8760)


MG194
232007
233029
GB: V00291_5
phenylalanyl-tRNA synthetase beta-subunit (pheS)
35.0769
56.3077
MG194(194-






{Escherichia coli}


359 of 1023)









GB: U02120(1-









166 of 166)


MG195
233036
235453
SP: P17922
phenylalanyl-tRNA synthetase beta chain (pheT)
25.4597
49.0806
MG195(2044-






{Bacillus subtilis}


2396 of 2418)









GB: U02173(1-









353 of 353)


MG200
237346
239148
GB: L36455_1
heat shock protein (dnaJ) {Coxiella burnetii}
33.5938
51.5625
MG200(842-









1227 of 1803)









GB: U02163(2-









387 of 387)


MG203
240322
242220
GB: U25549_1
topoisomerase IV subunit B (parE)
100
100
MG203(1216-






{Mycoplasma genitalium}


1899 of 1899)









GB: U25549(1-









684 of 2124)


MG204
242223
244565
GB: U25549_2
topoisomerase IV subunit A (parC)
99.7912
99.7912
MG204(1-






{Mycoplasma genitalium}


1438 of 2343)









GB: U25549(687-









2124 of 2124)


MG204
242223
244565
GB: U25549_2
topoisomerase IV subunit A (parC)
99.7912
99.7912
MG204(1950-






{Mycoplasma genitalium}


1641 of 2343)









GB: U02155(1-









308 of 308)


MG206
246127
247422
SP: P14951
excinuclease ABC subunit C (uvrC)
28.0872
51.0896
MG206(738-









399 of 1296)









GB: U02182(1-









341 of 341)


MG208
248492
247905




MG208(585-









162 of 588)









GB: U01785(1-









423 of 423)


MG209
249402
248479
SP: P23851
hypothetical protein (SP: P23851) {Escherichia coli}
30.4498
55.0173
MG209(730-









372 of 924)









GB: U02214(1-









359 of 359)


MG210
249947
249405
GB: M83994_1
prolipoprotein signal peptidase (lsp)
32.3944
52.1127
MG210(1-






{Staphylococcus aureus}


116 of 543)









GB: U01759(196-









311 of 311)


MG212
251780
252583
GB: L32861_1
1-acyl-sn-glycerol-3-phosphate acetyltransferase (plsC)
32.1429
60.7143
MG212(7-






{Borrelia burgdorferi}


315 of 804)









GB: U02160(5-









313 of 313)


MG216
255594
257117
GB: L07920_2
pyruvate kinase (pyk) {Lactococcus lactis}
35.3319
57.6017
MG216(1118-









790 of 1524)









GB: U01798(1-









329 of 329)


MG218
259176
264590
PIR: S37536
no score generated - score shown is bogus
−1
−1
MG218(1669-









1977 of 5415)









GB: U02165(1-









309 of 309)


MG221
266626
267087
SP: P22186
hypothetical protein (SP: P22186) {Escherichia coli}
28.8732
56.338
MG221(337-









49 of 462)









GB: U02195(1-









290 of 290)


MG225
270404
271870
GB: U14003_71
hypothetical protein (GB: U14003_71)
21.9565
48.0435
MG225(1467-






{Escherichia coli}


1409 of 1467)









GB: U02264(289-









347 of 347)


MG226
271938
273314
GB: D26562_11
aromatic amino acid transport protein (aroP)
24.5902
47.2131
MG226(221-






{Escherichia coli}


1 of 1377)









GB: U02264(1-









221 of 347)


MG227
273789
274649
SP: P13954
thymidylate synthase (thyA) {Staphylococcus aureus}
56.5972
75.3472
MG227(577-









861 of 861)









GB: U01718(1-









285 of 439)


MG228
274652
275131
GB: X60681_1
dihydrofolate reductase (dhfr) {Lactococcus lactis}
33.1288
59.5092
MG228(480-









385 of 480)









GB: U02137(174-









269 of 269)


MG229
275140
276159
SP: P17424
ribonucleotide reductase 2 (nrdF)
50
70.0637
MG229(1020-






{Salmonella typhimurium}


697 of 1020)









GB: U01739(22-









344 of 344)


MG231
276646
278808
GB: X73226_1
ribonucleoside-diphosphate reductase (nrdE)
54.1193
73.1534
MG231(2122-






{Salmonella typhimurium}


2163 of 2163)









GB: U02141(1-









42 of 827)


MG237
281078
281959




MG237(647-









882 of 882)









GB: U01774(1-









236 of 289)


MG238
281992
283323
GB: M34066_1
trigger factor (tig) {Escherichia coli}
24.6193
47.9695
MG238(420-









648 of 1332)









GB: U01772(1-









229 of 229)


MG239
283395
285779
SP: P37945
ATP-dependent protease (lon) {Bacillus subtilis}
43.6268
65.8344
MG239(1818-









1449 of 2385)









GB: U02148(1-









370 of 370)


MG240
286657
285782
GB: M91593_1
hypothetical protein (GB: M91593_1)
27.8195
53.3835
MG240(876-






{Mycoplasma mycoides}


598 of 876)









GB: U01734(27-









305 of 305)


MG242
288752
290641




MG242(886-









543 of 1890)









GB: U02194(1-









344 of 344)


MG244
291332
293440
GB: M99049_1
DNA helicase II (mutB1) {Haemophilus influenzae}
36.0078
55.9687
MG244(829-









1035 of 2109)









GB: X61517(1-









207 of 207)


MG249
297604
296114
SP: P33656
RNA polymerase sigma-A factor (sigA)
43.6842
66.0526
MG249(970-






{Clostridium acetobutylicum}


666 of 1491)









GB: X61535(1-









306 of 306)


MG250
299472
297652
GB: M10040_1
DNA primase (dnaE) {Bacillus subtilis}
27.2727
52.2078
MG250(1530-









1821 of 1821)









GB: U01771(1-









292 of 572)


MG250
299472
297652
GB: M10040_1
DNA primase (dnaE) {Bacillus subtilis}
27.2727
52.2078
MG250(648-









231 of 1821)









GB: U02146(1-









418 of 418)


MG254
304823
302847
GB: M24278_1
DNA ligase (lig) {Escherichia coli}
38.2263
59.3272
MG254(1429-









1722 of 1977)









GB: U02152(1-









294 of 294)


MG254
304823
302847
GB: M24278_1
DNA ligase (lig) {Escherichia coli}
38.2263
59.3272
MG254(37-









367 of 1977)









GB: U01761(1-









330 of 330)


MG255
304999
306093




MG255(726-









1095 of 1095)









GB: U02164(1-









370 of 370)


MG255
304999
306093




MG255(729-









400 of 1095)









GB: U02174(1-









333 of 333)


MG261
315699
318320
GB: M19334_4
DNA polymerase III alpha subunit (dnaE)
31.9115
55.7662
MG261(2442-






{Escherichia coli}


2159 of 2622)









GB: U01738(1-









284 of 284)


MG263
320175
321047
GB: L10328_61
hypothetical protein (GB: L10328_61)
27.8008
47.7178
MG263(828-






{Escherichia coli}


489 of 873)









GB: U01764(1-









340 of 340)


MG266
324809
322434
GB: M88581_1
leucyl-tRNA synthetase (leuS)
43.401
64.2132
MG266(78-






{Bacillus stearothermophilus}


287 of 2376)









GB: U01780(1-









210 of 210)


MG266
324809
322434
GB: M88581_1
leucyl-tRNA synthetase (leuS)
43.401
64.2132
MG266(957-






{Bacillus stearothermophilus}


622 of 2376)









GB: U02167(1-









336 of 336)


MG269
327050
326031
GB: D90354_1
surface protein antigen precursor (pag)
25.5144
47.3251
MG269(239-






{Streptococcus sobrinus}


1 of 1020)









GB: U02215(1-









239 of 366)


MG271
329826
328456
SP: P11959
Dihydrolipoamide dehydrogenase (pdhD)
38.3592
62.306
MG271(914-






{Bacillus stearothermophilus}


1214 of 1371)









GB: U01784(1-









301 of 301)


MG275
334772
333339
SP: P37061
NADH oxidase (nox) {Enterococcus faecalis}
39.229
62.1315
MG275(81-









1 of 1434)









GB: U01786(4-









84 of 280)


MG276
335397
334858
GB: M14040_1
Adenine phosphoribosyltransferase (apt)
34.3373
58.4337
MG276(540-






{Escherichia coli}


430 of 540)









GB: U01786(170-









280 of 280)


MG278
338366
340525
GB: X72832_5
stringent response-like protein (rel)
29.1339
55.1181
MG278(391-






{Streptococcus equisimilis}


697 of 2160)









GB: U01770(1-









308 of 308)


MG281
343702
342035




MG281(748-









1051 of 1668)









GB: U01706(1-









303 of 303)


MG282
344849
344367
SP: P2740
transcription elongation factor (greA)
40.146
65.6934
MG282(483-






{Rickettsia prowazekii}


356 of 483)









GB: U02104(187-









314 of 314)


MG283
345181
346629
GB: M97858_1
prolyl-tRNA synthetase (proS) {Escherichia coli}
22.6562
46.0938
MG283(839-









1183 of 1449)









GB: U02205(1-









346 of 346)


MG285
347214
348254




MG285(315-









493 of 1041)









GB: U02266(1-









180 of 180)


MG289
354023
355126
SP: P15363
high affinity transport system protein P37 (P37)
35.7798
58.4098
MG289(105-






{Mycoplasma hyorhinis}


1 of 1104)









GB: U02132(1-









105 of 571)


MG291
355846
357474
SP: P15362
transport system permease protein P69 (P69)
27.9159
54.8757
MG291(1216-






{Mycoplasma hyorhinis}


1629 of 1629)









GB: U01768(1-









415 of 705)


MG291
355846
357474
SP: P15362
transport system permease protein P69 (P69)
27.9159
54.8757
MG291(279-






{Mycoplasma hyorhinis}


1 of 1629)









GB: U02171(1-









279 of 346)


MG293
361384
360653
SP: P37965
Glycerophosphoryl diester phosphodiesterase (glpQ)
30.3965
55.9471
MG293(357-






{Bacillus subtilis}


41 of 732)









GB: U02118(1-









317 of 317)


MG294
362801
361380
GB: L19201_18
hypothetical protein (GB: L19201_18)
23.1013
46.2025
MG294(256-






{Escherichia coli}


592 of 1422)









GB: U02243(1-









337 of 337)


MG297
365574
364537
GB: U00039_18
cell division protein (ftsY) {Escherichia coli}
36.1371
57.9439
MG297(1-









57 of 1038)









GB: U02177(215-









271 of 271)


MG298
368529
365584
GB: M34956_1
115 kDa protein (p115) {Mycoplasma hyorhinis}
33.4059
57.5626
MG298(2743-









2946 of 2946)









GB: U02177(1-









205 of 271)


MG300
370962
369715
SP: P36204
phosphoglycerate kinase (pgk) {Thermotoga maritima}
51.2887
70.6186
MG300(1-









167 of 1248)









GB: U02178(167-









333 of 333)


MG300
370962
369715
SP: P36204
phosphoglycerate kinase (pgk) {Thermotoga maritima}
51.2887
70.6186
MG300(935-









609 of 1248)









GB: U02226(1-









326 of 326)


MG300
370962
369715
SP: P36204
phosphoglycerate kinase (pgk) {Thermotoga maritima}
51.2887
70.6186
MG300(939-









1243 of 1248)









GB: U02234(1-









305 of 305)


MG301
371962
370952
GB: X72219_1
glyceraldehyde-3-phosphate dehydrogenase (gap)
56.0606
73.0303
MG301(244-






{Clostridium pasteurianum}


1 of 1011)









GB: U02213(1-









244 of 364)


MG301
371962
370952
GB: X72219_1
glyceraldehyde-3-phosphate dehydrogenase (gap)
56.0606
73.0303
MG301(835-






{Clostridium pasteurianum}


1011 of 1011)









GB: U02178(1-









177 of 333)


MG302
372946
371996




MG302(951-









865 of 951)









GB: U02213(278-









364 of 364)


MG305
376705
374921
GB: D30690_3
heat shock protein 70 (hsp70) {Staphylococcus aureus}
57.4359
75.8974
MG305(1382-









1055 of 1785)









GB: U02204(1-









327 of 327)


MG307
381507
377977




MG307(3175-









2042 of 3531)









GB: U01767(1-









1134 of 1134)


MG308
382724
381495
SP: P23304
ATP-dependent RNA helicase (deaD) {Escherichia coli}
23.0986
48.169
MG308(1-









89 of 1230)









GB: U02200(276-









364 of 364)


MG309
386408
382734




MG309(3410-









3675 of 3675)









GB: U02200(1-









266 of 364)


MG312
391334
387918
GB: U11381_1
cytadherence-accessory protein (hmw1)
39.3235
60.6765
MG312(2541-






{Mycoplasma pneumoniae}


2160 of 3417)









GB: U02261(1-









382 of 382)


MG314
393633
392305
GP: L38997_4
hypothetical protein (GP: L38997_4)
51.4477
71.4922
MG314(514-






{Mycoplasma pneumoniae}


206 of 1329)









GB: U02151(1-









309 of 309)


MG317
397423
395627
GB: M82965_1
cytadherence-accessory protein (hmw3)
41.1458
59.8958
MG317(1329-






{Mycoplasma pneumoniae}


1542 of 1797)









GB: U02267(1-









214 of 214)


MG317
397423
395627
GB: M82965_1
cytadherence-accessory protein (hmw3)
41.1458
59.8958
MG317(509-






{Mycoplasma pneumoniae}


169 of 1797)









GB: U02224(1-









341 of 341)


MG317
397423
395627
GB: M82965_1
cytadherence-accessory protein (hmw3)
41.1458
59.8958
MG317(73-






{Mycoplasma pneumoniae}


1 of 1797)









GB: U01716(1-









73 of 325)


MG318
398280
397441
GB: J04151_1
fibronectin-binding protein (fnbA)
24.6154
43.0769
MG318(840-






{Staphylococcus aureus}


604 of 840)









GB: U01716(91-









325 of 325)


MG319
398833
398300




MG319(423-









1 of 534)









GB: U01769(1-









426 of 541)


MG320
399797
398940




MG320(371-









781 of 858)









GB: U01700(1-









410 of 410)


MG324
408792
407731
GB: D00398_1
aminopeptidase P (pepP) {Escherichia coli}
30.531
54.4248
MG324(883-









1062 of 1062)









GB: U01717(1-









181 of 223)


MG324
408792
407731
GB: D00398_1
aminopeptidase P (pepP) {Escherichia coli}
30.531
54.4248
MG324(889-









1062 of 1062)









GB: U01755(2-









175 of 217)


MG327
410676
409873
SP: P26174
magnesium-chelatase 30 kDa subunit (bchO)
26.7281
51.1521
MG327(782-






{Rhodobacter capsulatus}


533 of 804)









GB: U02232(1-









250 of 250)


MG328
412933
410666
GB: X62467_1
protein V (fcrV) {Streptococcus sp.}
27.5434
48.3871
MG328(339-









53 of 2268)









GB: U02188(1-









287 of 287)


MG328
412933
410666
GB: X62467_1
protein V (fcrV) {Streptococcus sp.}
27.5434
48.3871
MG328(817-









462 of 2268)









GB: U02203(1-









356 of 356)


MG330
414975
414325
SP: P38493
cytidylate kinase (cmk) {Bacillus subtilis}
40.3756
61.0329
MG330(537-









226 of 651)









GB: U02241(1-









312 of 314)


MG334
419480
416970
SP: Q05873
valyl-tRNA synthetase (valS) {Bacillus subtilis}
38.5629
60.5988
MG334(1109-









781 of 2511)









GB: U02202(1-









330 of 330)


MG334
419480
416970
SP: Q05873
valyl-tRNA synthetase (valS) {Bacillus subtilis}
38.5629
60.5988
MG334(2400-









2511 of 2511)









GB: U02249(1-









112 of 305)


MG335
420045
419473
SP: P38424
hypothetical protein (SP: P38424) {Bacillus subtilis}
34.5238
61.3095
MG335(1-









95 of 573)









GB: U02190(200-









294 of 294)


MG336
421467
422690
GB: U00013_6
nitrogen fixation protein (nifS) {Mycobacterium leprae}
26.2295
47.2678
MG336(990-









719 of 1224)









GB: U02256(1-









272 of 272)


MG337
422697
423110




MG337(414-









151 of 414)









GB: U01709(35-









297 of 297)


MG338
426915
423103




MG338(1-









251 of 3813)









GB: U02269(65-









315 of 315)


MG338
426915
423103




MG338(1304-









917 of 3813)









GB: U02221(1-









388 of 388)


MG338
426915
423103




MG338(3342-









3067 of 3813)









GB: U01809(1-









276 of 276)


MG338
426915
423103




MG338(3772-









3813 of 3813)









GB: U01709(1-









42 of 297)


MG339
428115
427096
GB: L25893_1
recombination protein (recA) {Staphylococcus aureus}
46.5986
69.3878
MG339(372-









93 of 1020)









GB: U01704(1-









279 of 279)


MG340
434458
430583
SP: P00577
DNA-directed RNA polymerase beta′ chain (rpoC)
44.4828
66.0345
MG340(1294-






{Escherichia coli}


999 of 3876)









GB: X61534(1-









295 of 295)


MG340
434458
430583
SP: P00577
DNA-directed RNA polymerase beta′ chain (rpoC)
44.4828
66.0345
MG340(1519-






{Escherichia coli}


1289 of 3876)









GB: X61528(1-









231 of 231)


MG340
434458
430583
SP: P00577
DNA-directed RNA polymerase beta′ chain (rpoC)
44.4828
66.0345
MG340(3444-






{Escherichia coli}


3083 of 3876)









GB: U02169(1-









361 of 361)


MG340
434458
430583
SP: P00577
DNA-directed RNA polymerase beta′ chain (rpoC)
44.4828
66.0345
MG340(3772-






{Escherichia coli}


3876 of 3876)









GB: U01766(1-









105 of 467)


MG340
434458
430583
SP: P00577
DNA-directed RNA polymerase beta′ chain (rpoC)
44.4828
66.0345
MG340(426-






{Escherichia coli}


66 of 3876)









GB: U01797(1-









361 of 361)


MG341
438640
434471
GB: L24376_3
RNA polymerase beta subunit (rpoB) {Bacillus subtilis}
46.5338
67.5043
MG341(1-









107 of 4170)









GB: U02230(217-









323 of 323)


MG341
438640
434471
GB: L24376_3
RNA polymerase beta subunit (rpoB) {Bacillus subtilis}
46.5338
67.5043
MG341(1932-









1595 of 4170)









GB: U01737(1-









338 of 338)


MG341
438640
434471
GB: L24376_3
RNA polymerase beta subunit (rpoB) {Bacillus subtilis}
46.5338
67.5043
MG341(2833-









3201 of 4170)









GB: U01735(1-









369 of 369)


MG342
439236
438733




MG342(381-









504 of 504)









GB: U02230(1-









124 of 323)


MG342
439236
438733




MG342(386-









65 of 504)









GB: U02231(1-









322 of 322)


MG343
440355
439318




MG343(108-









452 of 1038)









GB: U01811(1-









345 of 345)


MG344
441180
440362
GP: U17036_2
lipase-esterase (lip1) {Mycoplasma mycoides}
26.6667
47.5
MG344(575-









767 of 819)









GB: U02222(1-









193 of 193)


MG345
443878
441194
SP: P00956
isoleucyl-tRNA synthetase (ileS) {Escherichia coli}
33.2963
56.2708
MG345(1115-









782 of 2685)









GB: U02196(1-









334 of 334)


MG345
443878
441194
SP: P00956
isoleucyl-tRNA synthetase (ileS) {Escherichia coli}
33.2963
56.2708
MG345(1811-









2134 of 2685)









GB: U02254(1-









324 of 324)


MG348
446165
445200




MG348(166-









459 of 966)









GB: U01781(1-









292 of 292)


MG352
450222
450719
GB: U11883_2
hypothetical protein (GB: U11883_2) {Bacillus subtilis}
33.3333
56.7901
MG352(366-









498 of 498)









GB: U02237(1-









133 of 310)


MG353
451048
450722




MG353(327-









153 of 327)









GB: U02237(136-









309 of 310)


MG357
455947
454769
GB: L17320_2
acetate kinase (ackA) {Bacillus subtilis}
42.6735
65.5527
MG357(342-









131 of 1179)









GB: X61531(1-









211 of 211)


MG358
456590
457369
GB: M21298_1
Holliday junction DNA helicase (ruvA)
26.2411
42.5532
MG358(350-






{Escherichia coli}


87 of 780)









GB: U02233(1-









265 of 265)


MG361
459615
460100
SP: P29394
ribosomal protein L10 (rpL10) {Thermotoga maritima}
29.8137
61.4907
MG361(274-









486 of 486)









GB: U02206(1-









213 of 345)


MG362
460126
460491
SP: P02394
ribosomal protein L7/L12 (‘A’ type) (rpL7/L12)
47.5
70
MG362(1-






{Bacillus subtilis}


107 of 366)









GB: U02206(239-









345 of 345)


MG365
461682
462614
GB: X63666_2
methionyl-tRNA formyltransferase (fmt)
24.43
50.8143
MG365(292-






{Escherichia coli}


1 of 933)









GB: U02238(1-









292 of 349)


MG368
466410
465427
GB: M96793_1
fatty acid/phospholipid synthesis protein (plsX)
28.972
52.3364
MG368(227-






{Escherichia coli}


1 of 984)









GB: U01791(1-









227 of 326)


MG369
468083
466413




MG369(1146-









1446 of 1671)









GB: U01763(1-









300 of 300)


MG370
469123
468155
SP: P23851
hypothetical protein (SP: P23851) {Escherichia coli}
26.9531
48.8281
MG370(240-









599 of 969)









GB: U02220(1-









360 of 360)


MG371
470084
469113
GB: D26185_10
hypothetical protein (GB: D26185_10)
25.8065
47.0046
MG371(349-






{Bacillus subtilis}


689 of 972)









GB: U02263(1-









341 of 341)


MG374
472891
472070




MG374(1-









178 of 822)









GB: U02250(159-









337 of 337)


MG375
474578
472887
GB: M36594_1
threonyl-tRNA synthetase (thrSv) {Bacillus subtilis}
38.7097
60.7527
MG375(1048-









1389 of 1692)









GB: U02130(1-









342 of 342)


MG375
474578
472887
GB: M36594_1
threonyl-tRNA synthetase (thrSv) {Bacillus subtilis}
38.7097
60.7527
MG375(1530-









1692 of 1692)









GB: U02250(1-









163 of 337)


MG378
477139
475529
SP: P35868
arginyl-tRNA synthetase (argS)
33.6406
56.9124
MG378(1364-






{Corynebacterium glutamicum}


1047 of 1611)









GB: U01740(1-









319 of 319)


MG378
477139
475529
SP: P35868
arginyl-tRNA synthetase (argS)
33.6406
56.9124
MG378(765-






{Corynebacterium glutamicum}


456 of 1611)









GB: U02168(1-









309 of 309)


MG379
477168
479003
GB: L10328_106
glucose inhibited division protein (gidA)
40.7346
61.9366
MG379(900-






{Escherichia coli}


1184 of 1836)









GB: U01812(1-









285 of 285)


MG385
484699
483992




MG385(234-









6 of 708)









GB: U02112(1-









229 of 229)


MG385
484699
483992




MG385(523-









708 of 708)









GB: U02239(1-









186 of 320)


MG385
484699
483992




MG385(528-









259 of 708)









GB: U02246(1-









270 of 270)


MG386
489552
484705
GB: U11381_1
cytadherence-accessory protein (hmwl)
31.1755
49.4037
MG386(1294-






{Mycoplasma pneumoniae}


1628 of 4848)









GB: U02175(1-









335 of 335)


MG386
489552
484705
GB: U11381_1
cytadherence-accessory protein (hmwl)
31.1755
49.4037
MG386(2274-






{Mycoplasma pneumoniae}


1991 of 4848)









GB: X61519(1-









283 of 284)


MG386
489552
484705
GB: U11381_1
cytadherence-accessory protein (hmwl)
31.1755
49.4037
MG386(3247-






{Mycoplasma pneumoniae}


3420 of 4848)









GB: U02126(1-









174 of 174)


MG386
489552
484705
GB: U11381_1
cytadherence-accessory protein (hmwl)
31.1755
49.4037
MG386(3842-






{Mycoplasma pneumoniae}


4196 of 4848)









GB: U02192(1-









355 of 355)


MG386
489552
484705
GB: U11381_1
cytadherence-accessory protein (hmwl)
31.1755
49.4037
MG386(767-






{Mycoplasma pneumoniae}


1281 of 4848)









GB: U02245(2-









515 of 515)


MG388
491004
490702
GB: U00016_19
hypothetical protein (GB: U00016_19)
30.9278
56.701
MG388(285-






{Mycobacterium leprae}


1 of 303)









GB: U02265(1-









285 of 339)


MG389
491530
491150




MG389(320-









129 of 381)









GB: U01813(1-









192 of 192)


MG390
493516
491537
SP: P37608
lactococcin transport ATP-binding protein (lcnDR3)
22.3421
46.5331
MG390(1395-






{Lactococcus lactis}


1744 of 1980)









GB: U02218(1-









350 of 350)


MG390
493516
491537
SP: P37608
lactococcin transport ATP-binding protein (lcnDR3)
22.3421
46.5331
MG390(1400-






{Lactococcus lactis}


1174 of 1980)









GB: U02248(1-









227 of 227)


MG391
494967
493627
GB: D17450_1
aminopeptidase {Mycoplasma salivarium}
41.2921
60.3933
MG391(1-









217 of 1341)









GB: U02268(256-









472 of 472)


MG391
494967
493627
GB: D17450_1
aminopeptidase {Mycoplasma salivarium}
41.2921
60.3933
MG391(412-









735 of 1341)









GB: U01801(1-









324 of 324)


MG391
494967
493627
GB: D17450_1
aminopeptidase {Mycoplasma salivarium}
41.2921
60.3933
MG391(412-









735 of 1341)









GB: U01802(1-









324 of 324)


MG392
496615
494987
GB: L10132_2
heat shock protein (groEL)
51.5209
71.4829
MG392(1394-






{Bacillus stearothermophilus}


1629 of 1629)









GB: U02268(1-









236 of 472)


MG392
496615
494987
GB: L10132_2
heat shock protein (groEL)
51.5209
71.4829
MG392(181-






{Bacillus stearothermophilus}


1 of 1629)









GB: U02252(1-









181 of 296)


MG393
496960
496631
GB: D17398_1
heat shock protein 60-like protein (PggroES)
39.5604
54.9451
MG393(330-






{Porphyromonas gingivalis}


231 of 330)









GB: U02252(197-









296 of 296)


MG394
498306
497089
SP: P06192
serine hydroxymethyltransferase (glyA)
55.303
70.7071
MG394(328-






{Salmonella typhimurium}


683 of 1218)









GB: U02131(1-









356 of 356)


MG395
499890
498319




MG395(457-









116 of 1572)









GB: U02260(1-









342 of 342)


MG395
499890
498319




MG395(763-









979 of 1572)









GB: X61530(1-









217 of 217)


MG399
503976
502831
SP: P33253
ATP synthase beta chain (atpD)
80.9524
89.418
MG399(447-






{Mycoplasma gallisepticum}


852 of 1146)









GB: U01752(1-









406 of 406)


MG400
505099
504263
SP: P33257
ATP synthase gamma chain (atpG)
37.9433
62.0567
MG400(160-






{Mycoplasma gallisepticum}


711 of 837)









GB: U01703(1-









552 of 552)


MG401
506655
505102
SP: P33252
ATP synthase alpha chain (atpA)
63.3911
79.5761
MG401(973-






{Mycoplasma gallisepticum}


1554 of 1554)









GB: U01727(1-









583 of 598)


MG405
509012
508137
GB: X64256_2
adenosinetriphosphatase (atpB)
36.4261
63.9175
MG405(75-






{Mycoplasma gallisepticum}


1 of 876)









GB: U01728(1-









75 of 299)


MG406
509319
508981
SP: P15362
transport system permease protein P69 (P69)
40
57.1429
MG406(339-






{Mycoplasma hyorhinis}


84 of 339)









GB: U01728(44-









299 of 299)


MG410
513042
512056
GB: L10328_89
peripheral membrane protein B (pstB) {Escherichia coli}
50.813
70.3252
MG410(301-









941 of 987)









GB: U01707(1-









640 of 640)


MG411
514991
513030
GB: X75297_1
periplasmic phosphate permease homolog (AG88)
30.7692
56.2753
MG411(406-






{Mycobacterium tuberculosis}


632 of 1962)









GB: U01746(1-









227 of 229)


MG412
516124
514994




MG412(252-









1 of 1131)









GB: U01702(1-









252 of 313)


MG412
516124
514994




MG412(675-









563 of 1131)









GB: U02101(1-









113 of 113)


MG413
518389
516248
GB: L22432_4
hypothetical protein (GB: L22432_4)
25
54.1667
MG413(1179-






{Mycoplasma capricolum}


701 of 2142)









GB: U01699(1-









480 of 480)


MG413
518389
516248
GB: L22432_4
hypothetical protein (GB: L22432_4)
25
54.1667
MG413(1535-






{Mycoplasma capricolum}


1230 of 2142)









GB: U01804(1-









305 of 305)


MG414
519355
516248




MG414(438-









154 of 917)









GB: U01695(1-









285 of 285)


MG416
521414
520371




MG416(1-









39 of 1044)









GB: U01744(580-









618 of 620)


MG416
521414
520371




MG416(7-









351 of 1044)









GB: U02102(1-









345 of 345)


MG418
522314
521877
SP: P02410
ribosomal protein L13 (rpL13) {Escherichia coli}
41.3043
70.2899
MG418(321-









438 of 438)









GB: U01744(1-









118 of 620)


MG421
526696
524153
SP: P07671
excinuclease ABC subunit A (uvrA) {Escherichia coli}
47.7541
68.5579
MG421(1693-









1393 of 2544)









GB: X61514(1-









301 of 301)


MG422
529493
526989




MG422(2274-









2101 of 2505)









GB: U02117(1-









174 of 174)


MG422
529493
526989




MG422(2439-









2505 of 2505)









GB: U02172(1-









67 of 318)


MG422
529493
526989




MG422(35-









1 of 2505)









GB: U02228(1-









35 of 304)


MG423
531216
529534




MG423(1434-









1197 of 1683)









GB: X61510(1-









238 of 238)


MG423
531216
529534




MG423(161-









413 of 1683)









GB: X61524(1-









252 of 255)


MG423
531216
529534




MG423(1683-









1455 of 1683)









GB: U02228(76-









304 of 304)


MG425
531668
533014
SP: P23304
ATP-dependent RNA helicase (deaD) {Escherichia coli}
32.4121
58.0402
MG425(989-









769 of 1347)









GB: U01805(1-









220 of 220)


MG431
538290
537559
GB: L27492_1
triosephosphate isomerase (tim) {Thermotoga maritima}
39.7541
61.8852
MG431(463-









732 of 732)









GB: U02109(1-









270 of 277)


MG437
542067
542981
GB: M11330_1
CDP-diglyceride synthetase (cdsA) {Escherichia coli}
38.0165
55.3719
MG437(679-









378 of 915)









GB: U02189(2-









303 of 303)


MG441
546707
546300




MG441(20-









318 of 408)









GB: U02128(1-









299 of 299)


MG447
552444
550804
GB: L08897_1
hypothetical protein (GB: L08897_1)
34.058
55.0725
MG447(319-






{Mycoplasma gallisepticum}


645 of 1641)









GB: U01788(1-









327 of 327)


MG451
555612
554431
SP: P13927
elongation factor TU (tuf) {Mycoplasma genitalium}
100
100
MG451(927-









586 of 1182)









GB: U02255(1-









342 of 342)


MG453
556435
557310
GB: L12272_1
UDP-glucose pyrophosphorylase (gtaB)
48.0287
65.233
MG453(491-






{Bacillus subtilis}


181 of 876)









GB: U02258(1-









311 of 311)


MG455
557724
558944
GB: M77668_1
tyrosyl tRNA synthetase (tyrS)
38.539
61.7128
MG455(604-






{Bacillus stearothermophilus}


362 of 1221)









GB: U02247(5-









247 of 247)


MG456
559941
558940




MG456(256-









568 of 1002)









GB: U01790(1-









312 of 312)


MG458
563307
562783
SP: Q02522
hypoxanthine-guanine phosphoribosyltransferase (hpt)
38.3721
66.8605
MG458(295-






{Lactococcus lactis}


24 of 525)









GB: U02193(1-









272 of 272)


MG459
563818
563312
GB: M64978_2
surface exclusion protein (prgA) (Plasmid pCF10)
28.3582
49.2537
MG459(330-






{Enterococcus faecalis}


1 of 507)









GB: U01725(1-









330 of 638)


MG460
563991
564926
SP: P33572
L-lactate dehydrogenase (ldh)
50.3226
67.7419
MG460(1-






{Mycoplasma hyopneumoniae}


136 of 936)









GB: U01725(503-









638 of 638)


MG462
567638
566187
GB: M55072_1
glutamyl-tRNA synthetase (gltX)
42.887
65.272
MG462(1452-






{Bacillus stearothermophilus}


1081 of 1452)









GB: U02122(9-









379 of 379)


MG463
568404
567628
GB: D26185_105
high level kasugamycin resistance (ksgA)
35.6164
53.8813
MG463(777-






{Bacillus subtilis}


409 of 777)









GB: U01719(36-









405 of 405)


MG467
570988
570056
GB: X75422_1
heterocyst maturation protein (devA) {Anabaena sp.}
39.899
63.1313
MG467(40-









352 of 933)









GB: U01741(1-









313 of 313)


MG469
578578
577268
SP: P34028
chromosomal replication initiator protein (dnaA)
30.9469
57.2748
MG469(845-






{Spiroplasma citri}


547 of 1311)









GB: U02259(1-









299 of 299)


MG469
578578
577268
SP: P34028
chromosomal replication initiator protein (dnaA)
30.9469
57.2748
MG469(855-






{Spiroplasma citri}


1206 of 1311)









GB: U02145(1-









352 of 352)
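The match columns of Table 1(c) use a compact "name(start-end of total)" notation, for example "MG345(1115-782 of 2685)" paired with "GB: U02196(1-334 of 334)". The following is a minimal sketch of how such cells might be parsed once the two halves of a cell are rejoined. The function names are hypothetical, and reading a start coordinate larger than the end coordinate as a reverse-orientation match is an assumption, not something the table states.

```python
import re

# Matches cells such as "MG345(1115-782 of 2685)" or "GB: U02196(1-334 of 334)".
MATCH_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*\(\s*(?P<start>\d+)\s*-\s*(?P<end>\d+)\s+of\s+(?P<total>\d+)\)\s*$"
)

def parse_match(cell):
    """Split a match cell into (name, start, end, total)."""
    m = MATCH_RE.match(cell)
    if m is None:
        raise ValueError(f"unrecognized match cell: {cell!r}")
    return m["name"], int(m["start"]), int(m["end"]), int(m["total"])

def span_length(start, end):
    """Number of positions covered by an inclusive range, in either orientation."""
    return abs(end - start) + 1

orf = parse_match("MG345(1115-782 of 2685)")
hit = parse_match("GB: U02196(1-334 of 334)")
# For this row the two spans cover the same number of positions.
print(span_length(orf[1], orf[2]), span_length(hit[1], hit[2]))
```

Comparing the two span lengths is a quick consistency check on a rejoined row; a mismatch may indicate a cell that was joined to the wrong neighbor during reconstruction.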










[0279]







TABLE 1(d)








UID
Old_id(s)


















MG001
MORF-20072




MG002
MORF-19817


MG003
MORF-19818
MORF-20073


MG004
MORF-19819
MORF-20074


MG005
MORF-20075


MG006
MORF-20076


MG007
MORF-19820


MG008
MORF-20077


MG009
MORF-20078


MG010
MORF-20079


MG011
MORF-19821
MORF-19822


MG012
MORF-20080


MG013
MORF-19823
MORF-20080
MORF-20081


MG014
MORF-20082


MG015
MORF-20084


MG016
MORF-19824


MG017
MORF-19825


MG018
MORF-20085


MG019
MORF-20086


MG020
MORF-20088


MG021
MORF-20089


MG022
MORF-20091


MG023
MORF-20092


MG024
MORF-19826
MORF-20093


MG025
MORF-20094


MG026
MORF-20095


MG027
MORF-19827


MG028
MORF-19828


MG029
MORF-19829


MG030
MORF-20096


MG031
MORF-19830
MORF-20097


MG032
MORF-20099


MG033
MORF-20100


MG034
MORF-20101


MG035
MORF-20102


MG036
MORF-20103


MG037
MORF-20104


MG038
MORF-20105


MG039
MORF-19831
MORF-20106


MG040
MORF-20107


MG042
MORF-19832
MORF-20108


MG043
MORF-20110


MG044
MORF-20111


MG045
MORF-19833


MG046
MORF-20112


MG047
MORF-20113


MG048
MORF-19834
MORF-20114
MORF-20115


MG049
MORF-20114
MORF-20115


MG050
MORF-20117


MG051
MORF-19835
MORF-20118


MG052
MORF-20119


MG053
MORF-20120


MG054
MORF-20120
MORF-20121


MG055
MORF-19836


MG056
MORF-20122


MG057
MORF-20123


MG058
MORF-20124


MG059
MORF-20124
MORF-20125


MG060
MORF-20126


MG061
MORF-19838


MG062
MORF-19839
MORF-20127
MORF-20128


MG063
MORF-19840
MORF-20128


MG064
MORF-19841
MORF-19842


MG065
MORF-19843
MORF-20129


MG066
MORF-19844
MORF-20130


MG067
MORF-19845


MG068
MORF-20131


MG069
MORF-19847
MORF-20135


MG070
MORF-20136


MG071
MORF-19848
MORF-19849
MORF-19850


MORF-19851
MORF-20137


MG072
MORF-19852
MORF-19853
MORF-19854


MORF-20138


MG073
MORF-20139


MG074
MORF-19855


MG075
MORF-19856
MORF-19857


MG076
MORF-19858


MG077
MORF-20140


MG078
MORF-19859
MORF-20141


MG079
MORF-20142


MG080
MORF-20143


MG081
MORF-20144


MG082
MORF-20145


MG083
MORF-20146


MG084
MORF-20147


MG085
MORF-20147
MORF-20148


MG086
MORF-19860
MORF-19861


MG087
MORF-20149


MG088
MORF-20150


MG089
MORF-20151
MORF-20152


MG090
MORF-19862


MG091
MORF-20153


MG092
MORF-20154


MG093
MORF-20155


MG094
MORF-20156


MG095
MORF-19863


MG096
MORF-20157


MG097
MORF-20158


MG098
MORF-20159


MG099
MORF-19864
MORF-20160


MG100
MORF-19865
MORF-20161


MG101
MORF-19866


MG102
MORF-20162


MG103
MORF-19867
MORF-19868


MG104
MORF-20163


MG105
MORF-19869


MG106
MORF-20164
MORF-20165


MG107
MORF-20164
MORF-20165


MG108
MORF-20166


MG109
MORF-20167


MG110
MORF-20168


MG111
MORF-20169


MG112
MORF-20170


MG113
MORF-19870
MORF-20171
MORF-20172


MG114
MORF-20171
MORF-20172


MG116
MORF-19871


MG117
MORF-19872


MG118
MORF-20173


MG119
MORF-19873
MORF-20174


MG120
MORF-19874


MG121
MORF-19875
MORF-20175


MG122
MORF-20176


MG123
MORF-19876


MG124
MORF-20177


MG125
MORF-19877


MG126
MORF-20178


MG127
MORF-20179


MG128
MORF-20180


MG129
MORF-20181


MG130
MORF-20182


MG132
MORF-20183


MG133
MORF-19878


MG134
MORF-20184


MG135
MORF-20185


MG136
MORF-20186
MORF-20187


MG137
MORF-20186
MORF-20187


MG138
MORF-20188


MG139
MORF-19879


MG140
MORF-19884


MG141
MORF-19885
MORF-20192


MG142
MORF-19886
MORF-20193


MG143
MORF-20194


MG144
MORF-19887


MG145
MORF-20195


MG146
MORF-20196


MG147
MORF-19888
MORF-19889


MG148
MORF-19890


MG149
MORF-19891


MG150
MORF-19893
MORF-20197


MG151
MORF-19893
MORF-20198


MG152
MORF-19895
MORF-20199


MG153
MORF-19894


MG154
MORF-19896
MORF-20200


MG156
MORF-19897


MG157
MORF-20201


MG158
MORF-20202


MG159
MORF-19898


MG161
MORF-19900
MORF-20203


MG162
MORF-19899
MORF-19900


MG163
MORF-20204


MG165
MORF-20205


MG166
MORF-19901
MORF-20206


MG167
MORF-19901
MORF-20207


MG168
MORF-19902
MORF-20208


MG169
MORF-20209


MG170
MORF-20210


MG171
MORF-20211


MG172
MORF-20212


MG175
MORF-20213


MG176
MORF-20214


MG177
MORF-19903
MORF-20215


MG178
MORF-20216


MG179
MORF-19904
MORF-20217


MG180
MORF-20218


MG181
MORF-19905


MG182
MORF-20219


MG183
MORF-20219


MG184
MORF-20220


MG185
MORF-20221


MG186
MORF-19907


MG187
MORF-19908
MORF-19909
MORF-20225


MG188
MORF-20226
MORF-20227


MG189
MORF-20226
MORF-20227


MG190
MORF-20228


MG191
MORF-19910
MORF-19911
MORF-20229


MG192
MORF-19911
MORF-19912
MORF-20230


MG194
MORF-19913
MORF-20234


MG195
MORF-20235


MG196
MORF-20236


MG199
MORF-19914


MG200
MORF-19915
MORF-20237


MG201
MORF-19916
MORF-20239


MG202
MORF-19917


MG203
MORF-19918
MORF-19919
MORF-20240


MG204
MORF-20241
MORF-20242


MG205
MORF-20243


MG206
MORF-20244


MG207
MORF-19920


MG208
MORF-19921


MG209
MORF-20245


MG210
MORF-20246


MG211
MORF-19922


MG212
MORF-19924
MORF-20247
MORF-20248


MG213
MORF-20248


MG214
MORF-20249


MG215
MORF-20250


MG216
MORF-20251


MG217
MORF-20252


MG218
MORF-19926
MORF-19927
MORF-20253


MG219
MORF-19928
MORF-19930
MORF-20253


MG220
MORF-19931


MG221
MORF-20255


MG222
MORF-20256


MG223
MORF-19932


MG224
MORF-20257


MG225
MORF-20258


MG226
MORF-20259


MG227
MORF-20260


MG228
MORF-19933


MG229
MORF-19934
MORF-20261


MG230
MORF-19935


MG231
MORF-20262


MG232
MORF-20263


MG234
MORF-20264


MG235
MORF-19936
MORF-20265


MG236
MORF-19937


MG237
MORF-19938


MG238
MORF-19939
MORF-20266


MG239
MORF-20267


MG240
MORF-20268


MG241
MORF-19940
MORF-19941
MORF-19942


MG242
MORF-19943


MG243
MORF-19945


MG244
MORF-20269


MG245
MORF-19946


MG246
MORF-19947


MG247
MORF-20270


MG248
MORF-19948


MG249
MORF-19949
MORF-20271


MG250
MORF-20272


MG251
MORF-19950
MORF-20273


MG252
MORF-20274


MG253
MORF-20275


MG254
MORF-20276


MG255
MORF-19951
MORF-19952


MG256
MORF-19953


MG258
MORF-19954
MORF-20277


MG259
MORF-20278


MG260
MORF-19955
MORF-19956
MORF-20279


MG261
MORF-19958
MORF-20282


MG262
MORF-20283


MG263
MORF-20285


MG264
MORF-20286
MORF-20287


MG265
MORF-20286
MORF-20287


MG266
MORF-20288


MG267
MORF-19959
MORF-19960


MG268
MORF-20290


MG269
MORF-20291


MG270
MORF-20292


MG271
MORF-20293


MG272
MORF-19961
MORF-19962
MORF-20294


MG273
MORF-20295


MG274
MORF-20296


MG275
MORF-20297


MG276
MORF-20298


MG277
MORF-19963
MORF-20299


MG278
MORF-19964
MORF-20300


MG279
MORF-19965


MG280
MORF-19966
MORF-20301


MG281
MORF-19967
MORF-19968


MG282
MORF-20302


MG283
MORF-20303


MG284
MORF-19969
MORF-19970
MORF-19971


MG285
MORF-19969
MORF-19970
MORF-19971


MG286
MORF-19972


MG288
MORF-20306


MG289
MORF-20307


MG290
MORF-20308


MG291
MORF-20309


MG292
MORF-20310


MG293
MORF-20311


MG294
MORF-19974
MORF-20312


MG295
MORF-20313


MG296
MORF-19975


MG297
MORF-20314


MG298
MORF-19976
MORF-20315


MG299
MORF-20316


MG300
MORF-20317


MG301
MORF-19977
MORF-20318


MG302
MORF-19978


MG303
MORF-20319


MG304
MORF-20320


MG305
MORF-19979
MORF-20321


MG306
MORF-19980


MG307
MORF-19981
MORF-19982


MG308
MORF-20323


MG309
MORF-19983
MORF-19984


MG310
MORF-20324


MG311
MORF-20325


MG312
MORF-20326


MG314
MORF-19985
MORF-19986


MG315
MORF-19987
MORF-19988
MORF-20327


MG316
MORF-19988
MORF-20327


MG317
MORF-20328
MORF-20329


MG318
MORF-19989
MORF-19990


MG319
MORF-20330


MG320
MORF-19991


MG321
MORF-19992


MG322
MORF-19993
MORF-20331


MG323
MORF-19994
MORF-20332


MG324
MORF-19995
MORF-20333


MG326
MORF-20334


MG327
MORF-20335


MG328
MORF-19996
MORF-20336


MG329
MORF-19997
MORF-20337


MG330
MORF-20338
MORF-20339


MG331
MORF-20339


MG332
MORF-20340


MG333
MORF-19998


MG334
MORF-20341


MG336
MORF-20343
MORF-20344


MG337
MORF-19999


MG338
MORF-20000


MG339
MORF-20001
MORF-20345


MG340
MORF-20006
MORF-20348


MG341
MORF-20349


MG342
MORF-20350


MG343
MORF-20007


MG344
MORF-20008


MG345
MORF-20351


MG346
MORF-20352


MG348
MORF-20009


MG349
MORF-20010


MG350
MORF-20011


MG351
MORF-20353


MG352
MORF-20354


MG353
MORF-20355


MG354
MORF-20013
MORF-20014


MG355
MORF-20015
MORF-20016
MORF-20356


MG356
MORF-20357


MG357
MORF-20358


MG358
MORF-20017
MORF-20018
MORF-20019


MORF-20359


MG359
MORF-20019
MORF-20359
MORF-20360


MG360
MORF-20361


MG361
MORF-20362


MG362
MORF-20363


MG364
MORF-20364


MG365
MORF-20020
MORF-20365


MG366
MORF-20021


MG367
MORF-20366


MG368
MORF-20022
MORF-20366
MORF-20367


MG369
MORF-20022
MORF-20023


MG370
MORF-20368


MG371
MORF-20368
MORF-20369


MG372
MORF-20370


MG373
MORF-20024


MG374
MORF-20025


MG375
MORF-20371


MG376
MORF-20026


MG377
MORF-20027


MG378
MORF-20372


MG379
MORF-20373


MG380
MORF-20374


MG381
MORF-20028


MG382
MORF-20375


MG383
MORF-20376


MG384
MORF-20029
MORF-20377


MG385
MORF-20031
MORF-20378


MG386
MORF-20032
MORF-20379
MORF-20381


MG387
MORF-20382


MG388
MORF-20383


MG389
MORF-20033


MG390
MORF-20034
MORF-20384


MG391
MORF-20034
MORF-20035
MORF-20385


MG392
MORF-20036
MORF-20037
MORF-20386


MG393
MORF-20038


MG394
MORF-20387


MG395
MORF-20039


MG396
MORF-20388


MG397
MORF-20040
MORF-20041


MG398
MORF-20042


MG399
MORF-20389


MG400
MORF-20390


MG401
MORF-20043
MORF-20391


MG402
MORF-20392


MG403
MORF-20393


MG404
MORF-20394


MG405
MORF-20395
MORF-20396


MG406
MORF-20395
MORF-20396


MG407
MORF-20044
MORF-20397


MG408
MORF-20398


MG409
MORF-20045


MG410
MORF-20046
MORF-20399


MG411
MORF-20400


MG412
MORF-20047


MG413
MORF-20401


MG414
MORF-20048


MG415
MORF-20049


MG416
MORF-20050
MORF-20051


MG417
MORF-20402


MG418
MORF-20052


MG419
MORF-20053


MG420
MORF-20403


MG421
MORF-20404


MG422
MORF-20054
MORF-20055


MG423
MORF-20056


MG425
MORF-20406


MG427
MORF-20057


MG428
MORF-20058


MG429
MORF-20059
MORF-20407


MG430
MORF-20408


MG431
MORF-20409


MG432
MORF-20410


MG433
MORF-20411


MG435
MORF-20060
MORF-20412


MG436
MORF-20060
MORF-20412


MG437
MORF-20413


MG438
MORF-20414


MG439
MORF-20061


MG440
MORF-20062


MG441
MORF-20063


MG442
MORF-20415


MG443
MORF-20064


MG444
MORF-20065
MORF-20416


MG445
MORF-20417


MG447
MORF-20418


MG448
MORF-20419
MORF-20420


MG449
MORF-20419
MORF-20420


MG450
MORF-20066


MG451
MORF-20421


MG452
MORF-20067


MG453
MORF-20422


MG454
MORF-20423
MORF-20424


MG455
MORF-20423
MORF-20424


MG456
MORF-20068


MG457
MORF-20069
MORF-20425


MG458
MORF-20426


MG459
MORF-20070


MG460
MORF-20427


MG461
MORF-20428


MG462
MORF-20429


MG463
MORF-20430


MG464
MORF-20431


MG467
MORF-20432


MG468
MORF-20283


MG469
MORF-20434


MG470
MORF-20071
MORF-20435










[0280]











TABLE 2

UID      end5      end3      gene_len

MG016    19253     19756     504
MG017    19825     20352     528
MG027    30092     30544     453
MG028    30547     31149     603
MG064    74066     77683     3618
MG076    102870    102457    414
MG105    133569    134168    600
MG117    143310    143951    642
MG147    186138    187262    1125
MG185    211445    213547    2103
MG186    216017    216766    750
MG199    237094    236594    501
MG202    239826    240191    366
MG207    247523    247906    384
MG211    250997    251437    441
MG223    268011    269243    1233
MG230    276166    276624    459
MG236    280663    281082    420
MG241    286884    288743    1860
MG243    290976    291323    348
MG246    293936    294778    843
MG256    306819    307586    768
MG267    325157    324813    345
MG279    341181    340528    654
MG284    346853    347248    396
MG286    348260    348847    588
MG296    364414    364028    387
MG306    377974    376796    1179
MG321    402922    400121    2802
MG331    415622    414987    636
MG333    416716    416339    378
MG349    446576    447787    1212
MG350    447790    448722    933
MG354    451197    451607    411
MG366    462619    464619    2001
MG372    471234    470080    1155
MG373    472066    471224    843
MG376    474892    474581    312
MG377    475479    474901    579
MG381    479570    480223    654
MG397    502420    500723    1698
MG415    520238    519929    310
MG419    523215    522355    861
MG427    533270    533692    423
MG428    533806    534318    513
MG436    542092    541739    354
MG439    545378    544563    816
MG440    546154    545381    774
MG449    553295    552864    432
MG450    554269    553559    711
MG452    555665    556447    783
MG468    318330    319202    873
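The gene_len column of Table 2 follows from the two end coordinates as an inclusive distance. A minimal sketch of that arithmetic is given below; the function names are hypothetical, and treating end5 greater than end3 as a gene on the complementary strand is an assumption rather than something the table states.

```python
def gene_length(end5, end3):
    """Inclusive length in base pairs between the 5' and 3' end coordinates."""
    return abs(end3 - end5) + 1

def strand(end5, end3):
    """Assumed orientation: '+' when coordinates increase from end5 to end3, else '-'."""
    return "+" if end5 < end3 else "-"

# Two rows from Table 2: (UID, end5, end3, printed gene_len).
rows = [("MG016", 19253, 19756, 504), ("MG076", 102870, 102457, 414)]
for uid, end5, end3, printed in rows:
    # The computed length agrees with the printed gene_len for both rows.
    print(uid, strand(end5, end3), gene_length(end5, end3), printed)
```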











[0281]






TABLE 3

Whole Genome Sequencing Strategy

Stage                                                       Description

Random small insert and large insert library construction   Randomly shear genomic DNA on the order of 2 kb and 15-20 kb, respectively
Library plating                                             Maximize random selection of small insert and large insert clones for template production
High-throughput DNA sequencing                              Sequence xxx,xxxx templates from both ends (>99% genome coverage)
Assembly (TIGR Assembler, GRASTA)                           Assembly of sequence fragments into contigs
Gap closure
  a. Physical gaps                                          Order all contigs into a circular genome and provide templates for closure of all physical gaps
  b. Sequence gaps                                          Complete the genome by primer walking
Editing                                                     Visual inspection and resolution of all sequence ambiguities when possible, including frameshifts
Annotation                                                  Identification and description of all ORF's, putative identification, role assignments










[0282]






TABLE 4

Computer simulation of random sequencing experiments where L = 580,000 and w = 400.

Clones           Percent of genome    Base pairs     Number of double    Average gap
sequenced (n)    unsequenced          unsequenced    strand gaps         length (bp)

1000             50.18                291014         501                 580
2000             25.18                146016         503                 289
4000             6.34                 36759          253                 145
6000             1.60                 9254           97                  96
7250             0.67                 3886           48                  80
8000             0.40                 2330           32                  72
10000            0.10                 586            10                  59
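The figures in Table 4 are close to what a simple Poisson model of random sequencing predicts. The sketch below assumes that model (it is not stated in the table itself): with n clones of read length w drawn at random from a genome of length L, the probability that a base is unsequenced is taken as exp(-nw/L). The function name and dictionary keys are hypothetical.

```python
import math

def random_sequencing_stats(n, L=580_000, w=400):
    """Poisson estimates for n random reads of length w on a genome of length L."""
    p0 = math.exp(-n * w / L)  # assumed probability that a given base is unsequenced
    return {
        "percent_unsequenced": 100.0 * p0,
        "bp_unsequenced": L * p0,
        "gaps": n * p0,          # expected number of gaps
        "avg_gap_bp": L / n,     # expected gap length
    }

for n in (1000, 2000, 4000, 6000, 7250, 8000, 10000):
    s = random_sequencing_stats(n)
    print(n, round(s["percent_unsequenced"], 2), round(s["bp_unsequenced"]),
          int(s["gaps"]), round(s["avg_gap_bp"]))
```

The printed values may differ from the table in the last digit, since the rounding convention used for the table is not known.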










[0283]






TABLE 5

Mycoplasma genitalium - EcoRI fragments

5′ Enzyme    Start Res    3′ Enzyme    End Res    Length    MW

EcoRI        572231       EcoRI        1530       9367      5763365
EcoRI        1531         EcoRI        6723       5193      3195384
EcoRI        6724         EcoRI        15283      8560      5266795
EcoRI        15284        EcoRI        25781      10498     6459359
EcoRI        25782        EcoRI        35532      9751      5999831
EcoRI        35533        EcoRI        39821      4289      2639037
EcoRI        39822        EcoRI        43179      3358      2066196
EcoRI        43180        EcoRI        43707      528       324906
EcoRI        43708        EcoRI        49410      5703      3509174
EcoRI        49411        EcoRI        62708      13298     8182420
EcoRI        62709        EcoRI        71387      8679      5340230
EcoRI        71388        EcoRI        80769      9382      5772840
EcoRI        80770        EcoRI        84845      4076      2507946
EcoRI        84846        EcoRI        89622      4777      2939580
EcoRI        89623        EcoRI        93383      3761      2314332
EcoRI        93384        EcoRI        94573      1190      732268
EcoRI        94574        EcoRI        102229     7656      4710994
EcoRI        102230       EcoRI        107347     5118      3149292
EcoRI        107348       EcoRI        110797     3450      2122895
EcoRI        110798       EcoRI        114909     4112      2530290
EcoRI        114910       EcoRI        116440     1531      942140
EcoRI        116441       EcoRI        137514     21074     12967294
EcoRI        137515       EcoRI        144092     6578      4047534
EcoRI        144093       EcoRI        155336     11244     6918646
EcoRI        155337       EcoRI        162136     6800      4184109
EcoRI        162137       EcoRI        163907     1771      1089750
EcoRI        163908       EcoRI        169816     5909      3636217
EcoRI        169817       EcoRI        171885     2069      1273325
EcoRI        171886       EcoRI        176630     4745      2920129
EcoRI        176631       EcoRI        221880     45250     27844584
EcoRI        221881       EcoRI        225692     3812      2345923
EcoRI        225693       EcoRI        228254     2562      1576700
EcoRI        228255       EcoRI        277826     49572     30503951
EcoRI        277827       EcoRI        282740     4914      3023818
EcoRI        282741       EcoRI        285470     2730      1679928
EcoRI        285471       EcoRI        292152     6682      4111409
EcoRI        292153       EcoRI        293879     1727      1062607
EcoRI        293880       EcoRI        312725     18846     11596154
EcoRI        312726       EcoRI        347231     34506     21232617
EcoRI        347232       EcoRI        352330     5099      3137714
EcoRI        352331       EcoRI        362310     9980      6140434
EcoRI        362311       EcoRI        377990     15680     9648201
EcoRI        377991       EcoRI        390080     12090     7439090
EcoRI        390081       EcoRI        402043     11963     7361170
EcoRI        402044       EcoRI        408452     6409      3943775
EcoRI        408453       EcoRI        419230     10778     6631662
EcoRI        419231       EcoRI        422653     3423      2106066
EcoRI        422654       EcoRI        425383     2730      1679735
EcoRI        425384       EcoRI        426391     1008      620235
EcoRI        426392       EcoRI        439467     13076     8046286
EcoRI        439468       EcoRI        444297     4830      2971763
EcoRI        444298       EcoRI        444940     643       395631
EcoRI        444941       EcoRI        452525     7585      4667018
EcoRI        452526       EcoRI        455595     3070      1888976
EcoRI        455596       EcoRI        461533     5938      3653550
EcoRI        461534       EcoRI        467016     5483      3373523
EcoRI        467017       EcoRI        483871     16855     10370549
EcoRI        483872       EcoRI        487269     3398      2090889
EcoRI        487270       EcoRI        488085     816       502090
EcoRI        488086       EcoRI        488496     411       252914
EcoRI        488497       EcoRI        498574     10078     6201025
EcoRI        498575       EcoRI        499113     539       331666
EcoRI        499114       EcoRI        516146     17033     10480304
EcoRI        516147       EcoRI        524998     8852      5446303
EcoRI        524999       EcoRI        527362     2364      1454583
EcoRI        527363       EcoRI        529777     2415      1485826
EcoRI        529778       EcoRI        530256     479       294749
EcoRI        530257       EcoRI        531045     789       485489
EcoRI        531046       EcoRI        533591     2546      1566584
EcoRI        533592       EcoRI        549000     15409     9480966
EcoRI        549001       EcoRI        550638     1638      1007852
EcoRI        550639       EcoRI        563713     13075     8045103
EcoRI        563714       EcoRI        566925     3212      1976345
EcoRI        566926       EcoRI        572230     5305      3264227
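The Length column of Table 5 is the inclusive distance from Start Res to End Res, except for the first fragment, whose start coordinate exceeds its end coordinate because it spans the origin of the circular map. The sketch below shows that arithmetic; the function names are hypothetical, and the total sequence length used here is only the value implied by the first row of the table, not a figure quoted from elsewhere in the document.

```python
def implied_genome_length(start, end, length):
    """Total length implied by a fragment that wraps past the origin of a circular map."""
    return length + start - 1 - end

def fragment_length(start, end, genome_len):
    """Inclusive fragment length on a circular sequence of genome_len positions."""
    if end >= start:
        return end - start + 1
    # Fragment runs from start to the last position, then from position 1 to end.
    return (genome_len - start + 1) + end

# First row of Table 5 wraps the origin: start 572231, end 1530, length 9367.
total = implied_genome_length(572231, 1530, 9367)
print(total)
print(fragment_length(1531, 6723, total))      # second row of Table 5
print(fragment_length(572231, 1530, total))    # first row of Table 5
```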










[0284]





















MG#
Identification
MatchAcc
% ID
Length
MG#
Identification
MatchAcc
% ID
Length
























*MG394
Uridine Kinase (udk) (Escherichia coli)
SP: P31218
34.5
204
*MG390
arginyl-tRNA synthetase (argS) (Corynebacterium glutamicum)
SP: P35868
33.6
431












Purine ribonucleotide biosynthesis
*MG114
asparaginyl-tRNA synthetase (asnS) (Escherichia coli)
GP: M33145_1
41.5
449
















*MG107
5′ guanylate kinase (gmk) (Escherichia coli)
GP: L10328_14
42.6
183
*MG036
aspartyl-tRNA synthetase (aspS) (Thermus aquaticus)
SP: P36419
40.9
563


*MG175
adenylate kinase (adk) (Bacillus stearothermophilus)
GP: M88104_2
32.2
210
*MG258
cysteinyl-tRNA synthetase (cysS) (Bacillus subtilis)
GP: D26185_158
34.3
437


*MG058
phosphoribosylpyrophosphate synthetase (prs)
GP: D26185_114
44.4
310
*MG474
glutamyl-tRNA synthetase (gltX) (Bacillus stearothermophilus)
GP: M55072_1
42.9
480



(Bacillus subtilis)












Pyrimidine ribonucleotide biosynthesis
*MG256
glycyl-tRNA synthetase (Bombyx mori)
GP: L06106_1
35.9
574


Salvage of nucleosides and nucleotides
*MG035
histidyl-tRNA synthetase (hisS) (Mycobacterium leprae)
GP: U00011_2
30.7
386
















*MG284
adenine phosphoribosyltransferase (apt)
GP: M14040_1
34.1
153
*MG357
Isoleucyl-tRNA synthetase (ileS) (Escherichia coli)
SP: P00956
33.3
921



(Escherichia coli)


*MG052
cytidine deaminase (cdd) (Mycoplasma pirum)
GP: L13289_4
38.2
121
*MG274
leucyl-tRNA synthetase (leuS) (Bacillus stearothermophilus)
GP: M88581_1
43.4
799


*MG340
cytidylate kinase (cmk) (Bacillus subtilis)
SP: P38493
40.4
215
*MG137
lysyl-tRNA synthetase (lysS) (Bacillus subtilis)
GP: D26185_144
45.6
490


MG276
deoxyguanosine/deoxyadenosine kinase(I) subunit 2
GP: U01881_2
29.5
164
*MG377
methionyl-tRNA formyltransferase (fmt) (Escherichia coli)
GP: X63666_2
24.1
304



(Lactobacillus acidophilus)



*MG021
methionyl-tRNA synthetase (metS) (Bacillus subtilis)
GP: D26185_101
37.5
515


*MG470
hypoxanthine-guanine phosphoribosyltransferase
SP: Q02522
38.4
170
*MG085
peptidyl-tRNA hydrolase homolog (pth) (Borrelia burgdorferi)
GP: L32144_1
38.2
154



(hpt) (Lactococcus lactis)



*MG201
phenylalanyl-tRNA synthetase beta chain (pheT) (Bacillus subtilis)
SP: P17922
26.0
677


*MG048
purine-nucleoside phosphorylase (deoD)
GP: U14003_295
44.3
228
*MG200
phenylalanyl-tRNA synthetase beta-subunit (pheS) (Escherichia coli)
GP: V00291_5
35.1
320



(Escherichia coli)


*MG034
thymidine kinase (Bacillus subtilis)
GP: M97678_5
48.1
187
*MG292
prolyl-tRNA synthetase (proS) (Escherichia coli)
GP: M97858_1
22.7
438


MG051
thymidine phosphorylase (deoA)
GP: L13289_3
52.7
416
*MG005
Seryl-tRNA synthetase (serS) (Bacillus subtilis)
GP: D26185_77
42.6
416



(Mycoplasma pirum)


*MG030
uracil phosphoribosyltransferase (upp)
GP: Z27121_3
44.9
206
*MG387
threonyl-tRNA synthetase (thrSv) (Bacillus subtilis)
GP: M36594_1
38.7
558



(Mycoplasma hominis)















Sugar-nucleotide biosynthesis and conversions
*MG457
tRNA (guanine-N1)-methyltransferase (trmD) (Salmonella
SP: P36245
40.8
223













*MG119
UDP-glucose 4-epimerase (galE) (Escherichia coli)
SP: P09147
34.1
322

typhimurium)
















*MG465
UDP-glucose pyrophosphorylase (gtaB)
GP: L12272_1
48.0
277
*MG127
tryptophanyl-tRNA synthetase (trpS) (Bacillus subtilis)
GP: M24068_1
41.2
324



(Bacillus subtilis)



*MG466
tyrosyl tRNA synthetase (tyrS) (Bacillus stearothermophilus)
GP: M77668_1
38.5
418












Regulatory functions
*MG344
valyl-tRNA synthetase (valS) (Bacillus subtilis)
SP: O06873
38.5
857












*MG396
GTP-binding protein (obg) (Bacillus subtilis)
GP: M24537_2
39.6
426
Degradation of proteins, peptides, and glycopeptides
















*MG399
GTP-binding protein era homolog (spg)
SP: P37214
27.4
273
*MG334
aminopeptidase P (pepP) (Escherichia coli)
GP: D00398_1
30.5
254



(Streptococcus mutans)


*MG460
pilB homolog transcription repressor
GP: Z33052_1
53.5
128
*MG403
aminopeptidase (Mycoplasma salivarium)
GP: D17450_1
44.6
303



(Mycoplasma capricolum)


*MG420
PILB protein MOTIF (Neisseria gonorrhoeae)
SP: P14930
49.2
127
*MG244
ATP-dependent protease (lon) (Bacillus subtilis)
SP: P37945
43.6
753


*MG105
virulence associated protein homolog (vacB)
GP: U14003_91
29.2
560
*MG367
ATP-dependent protease binding subunit (clpB) (Escherichia coli)
GP: M29364_2
47.7
709



(Escherichia coli)













*MG067
glutamic acid specific protease prepropeptide (Staphylococcus
GP: D00730_1
28.8
250








Replication
aureus)












Degradation of DNA
MG224
IgA1 protease (Haemophilus influenzae)
GP: M87491_1
32.2
675
















MG032
ATP-dependent nuclease (addA) (Bacillus subtilis)
GP: M63489_1
26.8
706
MG186
oligoendopeptidase F (pepF) (Lactococcus lactis)
GP: Z32522_1
30.0
442


MG240
endonuclease IV (nfo) (Escherichia coli)
SP: P12638
29.4
267
MG321
proline iminopeptidase (pip) (Bacillus coagulans)
GP: D11037_1
29.2
209












DNA replication, restriction, modification, recombination, and repair
MG020
proline iminopeptidase (pip) (Neisseria gonorrhoeae)
GP: Z25461_2
37.5
281
















*MG481
chromosomal replication initiator protein (dnaA)
SP: P34028
30.9
432
*MG046
sialoglycoprotease (gcp) (Pasteurella haemolytica)
GP: M62384_1
36.4
313



(Spiroplasma citri)












*MG210
DNA gyrase subunit A (Mycoplasma genitalium)
GP: U09251_4
37.4
782
Nucleoproteins


*MG004
DNA gyrase subunit A (Mycoplasma genitalium)
GP: U09251_4
99.9
835
Protein modification and translation factors
















*MG003
DNA gyrase subunit B (gyrB)
GP: U09251_3
99.2
645
*MG090
elongation factor G (fus) (Thermus aquaticus)
SP: P13551
59.2
683



(Mycoplasma genitalium)


*MG249
DNA helicase II (mutB1) (Haemophilus influenzae)
GP: M99049_1
36.0
715
*MG026
elongation factor P (efp) (Escherichia coli)
GP: U14003_62
26.4
162


*MG259
DNA ligase (lig) (Escherichia coli)
GP: M24278_1
38.2
657
*MG445
elongation factor Ts (tsf) (Spiroplasma citri)
GP: M31161_2
39.1
294


*MG269
DNA polymerase I (polI) MOTIF
GP: L11920_1
29.9
837
*MG463
elongation factor TU (tuf) (Mycoplasma genitalium)
SP: P13927
100.0
383



(Mycobacterium tuberculosis)


*MG031
DNA polymerase III (polC) (Mycoplasma pulmonis)
GP: U06833_1
38.1
1352
*MG176
methionine amino peptidase (Bacillus subtilis)
GP: D00619_5
36.3
245


*MG001
DNA polymerase III beta subunit (dnaN)
GP: U09251_1
100.0
97
*MG263
peptide chain release factor I (RF-1) (Escherichia coli)
GP: M11519_1
43.2
320



(Mycoplasma genitalium)


MG007
DNA polymerase III subunit (dnaH) MOTIF
GP: D26185_83
22.7
142
*MG108
polypeptide deformylase (formylmethionine deformylase) (def)
SP: P27251
36.9
107



(Bacillus subtilis)


*MG432
DNA polymerase III subunit (dnaH)
GP: D26185_83
49.1
224

MOTIF (Escherichia coli)



(Bacillus subtilis)


*MG268
DNA polymerase III, alpha chain (dnaE)
GP: M19334_4
31.9
843
MG109
protein phosphatase 2C homolog (ptc1) MOTIF (Saccharomyces
SP: P35182
27.5
141


*MG010
DNA primase (dnaE) MOTIF (Clostridium
SP: P33655
25.7
174

cerevisiae)





acetobutylicum
) (Escherichia coli)



*MG255
DNA primase (dnaE) (Bacillus subtilis)
GP: M10040_1
27.3
587
MG110
protein serine/threonine kinase MOTIF (Arabidopsis thaliana)
PIR: S36944
33.7
242


*MG123
DNA topoisomerase I (topA) (Bacillus subtilis)
GP: L27797_2
38.9
658
*MG146
protein synthesis initiation factor 2 (infB) (Bacillus subtilis)
GP: M34836_1
48.0
619


*MG433
excinuclease ABC subunit A (uvrA)
SP: P07671
47.8
842
*MG447
ribosome releasing factor (frr) (Escherichia coli)
GP: D26552_57
34.9
169



(Escherichia coli)


*MG075
excinuclease ABC subunit B (uvrB)
SP: P07025
48.0
662
*MG291
transcription elongation factor (greA) (Rickettsia prowazekii)
SP: P27640
40.1
135



(Escherichia coli)


*MG270
formamidopyrimidine-DNA glycosylase (fpg)
SP: P19210
37.6
272
*MG202
translation initiation factor IF3 (infC) (Bacillus stearothermophilus)
GP: X16188_1
31.3
133



(Bacillus firmus)












*MG391
glucose inhibited division protein (gidA)
GP: L10328_106
40.3
600
Ribosomal proteins: synthesis and modification



(Escherichia coli)
















*MG392
glucose inhibited division protein (gidB)
GP: L10328_105
24.8
143
*MG084
ribosomal protein L1 (rpL1) (Bacillus stearothermophilus)
SP: P04447
48.2
221



(Escherichia coli)


*MG370
Holliday junction DNA helicase (ruvA)
GP: M21298_1
26.2
153
*MG373
ribosomal protein L10 (rpL10) (Thermotoga maritima)
SP: P29394
29.8
162



(Escherichia coli)


*MG371
Holliday junction DNA helicase (ruvB)
GP: M21298_2
34.7
297
*MG083
ribosomal protein L11 (RPL11) (Thermotoga maritima)
SP: P29395
51.8
140



(Escherichia coli)


MG187
methyltransferase (ssoIM)
GP: M97479_2
42.5
314
*MG430
ribosomal protein L13 (Escherichia coli)
SP: P02410
39.9
137



(Shigella sonnei)


*MG349
recombination protein (recA)
GP: L25893_1
46.6
292
*MG165
ribosomal protein L14 (rpL14) (Bacillus stearothermophilus)
SP: P04450
63.1
121



(Staphylococcus aureus)


*MG095
replicative DNA helicase (dnaB)
SP: P03005
33.1
439
*MG173
ribosomal protein L15 (rpL15) (Mycoplasma capricolum)
SP: P10138
41.9
144



(Escherichia coli)


MG450
restriction-modification enzyme EcoD
GP: J01631_1
24.6
390
*MG162
ribosomal protein L16 (rpL16) (Mycoplasma capricolum)
SP: P02415
63.5
136



specificity subunit (hsdS) (Escherichia coli)



*MG161
ribosomal protein L17 (rpL17) (Bacillus subtilis)
GP: M26414_6
34.8
115


*MG047
S-adenosylmethionine synthetase 2 (metX)
SP: P30869
43.8
363
*MG171
ribosomal protein L18 (rpL18) (Bacillus stearothermophilus)
GP: M57624_1
43.0
113



(Escherichia coli)


*MG092
single-stranded DNA binding protein (ssb)
GP: U04997_2
21.8
162
*MG456
ribosomal protein L19 (rpL19) (Bacillus stearothermophilus)
SP: P30529
49.1
111



(Haemophilus influenzae)


*MG209
topoisomerase II subunit B (topIIB)
GP: L35044_2
52.4
630
*MG158
ribosomal protein L2 (rpL2) (Bacillus stearothermophilus)
SP: P04257
58.4
273



(Mycoplasma gallisepticum)


*MG098
uracil DNA glycosylase (ung) (Escherichia coli)
GP: D13169_3
32.6
217
*MG238
ribosomal protein L21 (rpL21) (Bacillus subtilis)
SP: P26908
37.9
98













*MG160
ribosomal protein L22 (rpL22) (Mycoplasma-like organism)
GP: M74770_4
49.0
103












Transcription
*MG157
ribosomal protein L23 (Bacillus stearothermophilus)
SP: P04454
38.7
89


Degradation of RNA
*MG166
ribosomal protein L24 (Bacillus stearothermophilus)
SP: P04455
44.6
83
















*MG379
ribonuclease III (rnc) (Escherichia coli)
GP: X02673_1
30.2
118
*MG239
ribosomal protein L27 (rpL27) (Bacillus subtilis)
GP: K02665_2
64.4
88


*MG477
RNaseP C5 subunit (Mycoplasma capricolum)
GP: D14982_2
40.0
78
*MG163
ribosomal protein L29 (Thermotoga maritima)
SP: P38514
41.7
59












RNA synthesis, modification, and DNA transcription
*MG155
ribosomal protein L3 (rpL3) (Mycoplasma capricolum)
SP: P10134
42.6
213
















*MG319
ATP-dependent RNA helicase (deaD)
SP: P23304
23.1
369
*MG335
ribosomal protein L33 (Bacillus stearothermophilus)
SP: P23375
58.1
42



(Escherichia coli)


*MG437
ATP-dependent RNA helicase (deaD)
SP: P23304
32.4
390
*MG478
ribosomal protein L34 (rpL34) (Escherichia coli)
GP: L10328_67
67.4
45



(Escherichia coli)


*MG352
DNA-directed RNA polymerase beta' chain (rpoC)
SP: P00577
44.5
1348
*MG156
ribosomal protein L4 (rpL4) (Bacillus stearothermophilus)
SP: P28601
39.2
205



(Escherichia coli)


*MG018
helicase (mot1) MOTIF (Saccharomyces cerevisiae)
SP: P32333
36.5
502
*MG167
ribosomal protein L5 (rpL5) (Bacillus stearothermophilus)
SP: P08895
57.5
178


*MG145
N-utilization substance protein A homolog (nusA)
SP: P32727
30.9
360
*MG170
ribosomal protein L6 (rpL6) (Mycoplasma capricolum)
SP: P04446
46.4
179



(Bacillus subtilis)


*MG180
RNA polymerase alpha-core-subunit (rpoA)
GP: M26414_5
39.4
295
*MG374
ribosomal protein L7/L12 (‘A’ type) (rpL7/L12) (Bacillus subtilis)
SP: P02394
47.5
118



(Bacillus subtilis)


*MG353
RNA polymerase beta-subunit (rpoB)
GP: L24376_3
46.5
1144
*MG094
ribosomal protein L9 (rpL9) (Bacillus stearothermophilus)
GP: M57623_1
32.9
148



(Bacillus subtilis)


MG022
RNA polymerase delta-subunit (rpoE)
GP: M21677_1
28.7
152
*MG154
ribosomal protein S10 (rpS10) (Thermotoga maritima)
SP: P38518
48.9
91



(Bacillus subtilis)


*MG254
RNA polymerase sigma-A factor (sigA)
SP: P33656
43.7
370
*MG179
ribosomal protein S11 (rpS11) (Escherichia coli)
GP: X02543_2
47.8
112



(Clostridium acetobutylicum)


*MG054
transcription antitermination factor (nusG)
GP: D13303_4
30.9
171
*MG088
ribosomal protein S12 (rpS12) (Bacillus stearothermophilus)
SP: P09901
75.4
133



(Bacillus subtilis)



*MG178
ribosomal protein S13 (rpS13) (Bacillus subtilis)
GP: M26414_3
63.3
119












Translation
*MG168
ribosomal protein S14 (Mycoplasma capricolum)
GP: X06414_15
70.0
59


Amino acyl tRNA synthetases and tRNA modification
*MG438
ribosomal protein S15 (BS18) (Bacillus stearothermophilus)
SP: P05768
48.1
80
















*MG303
alanyl-tRNA-synthetase (alaS) (Escherichia coli)
GP: J01581_1
33.8
795
*MG458
ribosomal protein S16 (BS17) (Bacillus subtilis)
SP: P21474
48.8
81








Amino acid biosynthesis
Central intermediary metabolism


Aromatic amino acid family
Amino sugars


Aspartate family
Degradation of polysaccharides












Branched chain family
*MG222
bifunctional endo-1,4-beta-xylanase xyla precursor MOTIF
SP: P29126
37.6
240








Glutamate family
(Ruminococcus flavefaciens)


Pyruvate family
Other












Serine family
*MG369
acetate kinase (Bacillus subtilis)
GP: L17320_2
42.7
391
















*MG406
serine hydroxymethyltransferase (glyA)
SP: P06192
55.3
397
*MG038
glycerol kinase (glpK) (Escherichia coli)
GP: L19201_68
46.8
498



(Salmonella typhimurium)













*MG304
glycerophosphoryl diester phosphodiesterase (glpQ) (Bacillus subtilis)
SP: P37965
30.4
235


Biosynthesis of cofactors, prosthetic groups, and carriers
*MG310
phosphotransacetylase (Clostridium acetobutylicum)
SP: P39648
44.7
320








Biotin
Phosphorus compounds












Folic acid
*MG363
inorganic pyrophosphatase (ppa) (Thermoplasma acidophilum)
SP: P37981
38.9
156












*MG013
5,10-methylene-tetrahydrofolate dehydrogenase
GP: D10588_1
33.0
238
Polyamine biosynthesis



(folD) (Escherichia coli)



Polysaccharides —(cytoplasmic)


*MG234
dihydrofolate reductase
GP: X60681_1
33.1
166
Sulfur metabolism



(Lactococcus lactis)







Heme and porphyrin












*MG264
protoporphyrinogen oxidase (hemK)
GP: D28567_2
30.6
160
Energy metabolism



(Escherichia coli)








Lipoate
Aerobic












Menaquinone and ubiquinone
*MG039
glycerol-3-phosphate dehydrogenase (GUT2) (Saccharomyces
PIR: S48379
43.2
212


Molybdopterin



cerevisiae)



Pantothenate
*MG472
L-lactate dehydrogenase (ldh) (Mycoplasma hyopneumoniae)
SP: P33572
50.3
312


Pyridoxine
MG283
NADH oxidase (nox) (Enterococcus faecalis)
SP: P37061
39.2
433








Riboflavin
Amino acids and amines


Thioredoxin, glutaredoxin, and glutathione
Anaerobic



ATP-proton motive force interconversion












Cell envelope
*MG410
ATP synthase epsilon chain (atpC) (Mycoplasma gallisepticum)
SP: P33255
36.9
129


Membranes, lipoproteins, and porins
*MG411
ATP synthase beta chain (atpD) (Mycoplasma gallisepticum)
SP: P33253
81.0
377
















MG328
fibronectin-binding protein (fnbA)
GP: J04151_1
24.6
913
*MG412
ATP synthase gamma chain (atpG) (Mycoplasma gallisepticum)
SP: P33257
37.9
285



(Staphylococcus aureus)


MG040
membrane lipoprotein (tmpC) (Treponema pallidum)
SP: P29724
30.9
248
*MG413
ATP synthase alpha chain (atpA) (Mycoplasma gallisepticum)
SP: P33252
63.4
517


*MG087
prolipoprotein diacylglyceryl transferase
GP: L13259_2
29.1
261
*MG414
ATP synthase delta chain (atpH) (Mycoplasma gallisepticum)
SP: P33254
33.9
168



(Salmonella typhimurium)












Murein sacculus and peptidoglycan
*MG415
ATP synthase B chain (atpF) (Mycoplasma gallisepticum)
SP: P33258
36.6
192


Surface polysaccharides, lipopolysaccharides and antigens
*MG416
ATP synthase C chain (atpE) (Mycoplasma gallisepticum)
SP: P33258
50.0
77
















*MG368
lic-1 operon protein (licA) MOTIF
GP: M27280_1
27.8
152
*MG417
adenosinetriphosphatase (atpB) (Mycoplasma gallisepticum)
GP: X64256_2
35.7
292



(Haemophilus influenzae)












*MG060
lipopolysaccharide biosynthesis protein
SP: P26401
36.1
185
Electron transport



(rfbV) MOTIF (Salmonella typhimurium)



Entner-Doudoroff


*MG277
surface protein antigen precursor (pag)
GP: D90354_1
25.5
797
Fermentation



MOTIF (Streptococcus sobrinus)



Gluconeogenesis








Surface structures
Glycolysis
















MG196
attachment protein (mgpA)
SP: P20796
100.0
1443
*MG063
1-phosphofructokinase (fruK) (Escherichia coli)
SP: P23539
26.3
268



(Mycoplasma genitalium)


MG190
attachment protein repeat (mgpA)
SP: P20796
36.6
903
*MG220
6-phosphofructokinase (phosphofructokinase) (phosphohexokinase)
SP: P20275
39.4
321



(Mycoplasma genitalium)


MG267
attachment protein repeat (mgpA)
SP: P20796
38.0
963

(Spiroplasma citri)



(Mycoplasma genitalium)


MG188
attachment protein repeat (mgpA)
SP: P20796
61.8
943
*MG419
enolase (Bacillus subtilis)
GP: L29475_4
54.1
425



(Mycoplasma genitalium)


MG069
attachment protein repeat (mgpA)
SP: P20796
76.4
760
*MG023
fructose-bisphosphate aldolase (tsr) (Bacillus subtilis)
GP: M22039_4
46.0
282



(Mycoplasma genitalium)


MG189
attachment protein repeat (mgpA)
SP: P20796
77.9
763
*MG312
glyceraldehyde-3-phosphate dehydrogenase (gap) (Clostridium
GP: X72219_1
56.1
329



(Mycoplasma genitalium)


MG232
attachment protein repeat (mgpA)
SP: P20796
78.2
86

pasteurianum)



(Mycoplasma genitalium)


MG297
attachment protein repeat (mgpA)
SP: P20796
80.2
756
*MG112
phosphoglucose isomerase B (pgiB) (Bacillus stearothermophilus)
SP: P13376
34.8
424



(Mycoplasma genitalium)


MG141
attachment protein repeat (mgpA)
SP: P20796
80.3
753
*MG311
phosphoglycerate kinase (Thermotoga maritima)
SP: P36204
51.3
383



(Mycoplasma genitalium)


*MG198
attachment protein repeat (mgpA)
SP: P20796
81.3
753
MG442
phosphoglycerate mutase (pgm) (Bacillus subtilis)
GP: L29475_3
45.2
510



(Mycoplasma genitalium)


MG266
attachment protein repeat (mgpA)
SP: P20796
82.2
753
*MG221
pyruvate kinase (pyk) (Lactococcus lactis)
GP: L07920_2
35.3
467



(Mycoplasma genitalium)


MG351
attachment protein repeat (mgpA)
SP: P20796
84.3
734
*MG443
triosephosphate isomerase (tim) (Thermotoga maritima)
GP: L27492_1
39.8
247



(Mycoplasma genitalium)












*MG398
cytadherence-accessory protein (hmw1)
GP: U11381_1
34.1
876
Pentose phosphate pathway



(Mycoplasma pneumoniae)
















MG323
cytadherence-accessory protein (hmw1)
GP: U11381_1
39.3
1015
*MG272
6-phosphogluconate dehydrogenase (gnd) (Escherichia coli)
GP: M64324_1
29.9
440



(Mycoplasma pneumoniae)


*MG327
cytadherence-accessory protein (hmw3)
GP: M82965_1
41.1
669
*MG066
transketolase 1 (TK 1) (tk1A) (Escherichia coli)
SP: P27302
32.8
647



(Mycoplasma pneumoniae)









Pyruvate dehydrogenase












Cellular processes
*MG280
dihydrolipoamide acetyltransferase (pdhC) (Acholeplasma laidlawii)
GP: M81753_3
45.2
524


Cell division
*MG279
lipoamide dehydrogenase component (E3) of pyruvate dehydrogenase
SP: P11959
38.4
453
















*MG469
cell division protein (ftsH) (Bacillus subtilis)
GP: D26185_132
49.7
627

complex dihydrolipoamide dehydrog





*MG308
cell division protein (ftsY) (Escherichia coli)
GP: U00039_18
36.1
323
*MG282
pyruvate dehydrogenase E1-alpha subunit (pdhA) (Acholeplasma
GP: M81753_1
43.0
341


*MG229
cell division protein (ftsZ) (Staphylococcus aureus)
GP: U06462_1
30.9
274

laidlawii)












Cell killing
*MG281
pyruvate dehydrogenase E1-beta subunit (pdhB) (Acholeplasma
GP: M81753_2
55.0
317













*MG150
hemolysin (tlyC) (Serpulina hyodysenteriae)
GP: X73141_2
26.3
234

laidlawii)


MG225
pre-procytotoxin (Helicobacter pylori)
GP: Z26883_1
36.1
789
Sugars












Chaperones
*MG113
D-ribulose-5-phosphate 3 epimerase (cfxEc) (Alcaligenes eutrophus)
GP: M64173_3
33.1
175
















*MG404
groEL protein (Bacillus stearothermophilus)
GP: L10132_2
51.5
524
*MG050
deoxyribose-phosphate aldolase (deoC) (Mycoplasma pneumoniae)
GP: X13544_1
83.0
223


*MG206
heat shock protein (dnaJ) MOTIF (Coxiella burnetii)
GP: L36455_1
33.6
349
MG408
galactosidase acetyltransferase (Streptococcus mutans)
GP: M80797_2
40.3
135


MG002
heat shock protein (dnaJ) MOTIF
SP: P35514
40.0
60
*MG053
phosphomannomutase (cpsG) (Mycoplasma pirum)
GP: L13289_5
38.6
534



(Lactococcus lactis)












*MG019
heat shock protein (dnaJ) (Lactococcus lactis)
SP: P35514
34.0
357
TCA cycle


*MG207
heat shock protein (grpE) (Bacillus subtilis)
GP: M84964_2
31.7
158


*MG405
heat shock protein 60 (GroEL) like protein
GP: D17398_1
39.6
87
Fatty acid and phospholipid metabolism

















(PggroES) (Porphyromonas gingivalis)



*MG217
1-acyl-sn-glycerol-3-phosphate acetyltransferase (plsC) (Borrelia
GP: L32881_1
32.1
119


*MG316
heat shock protein 70 (HSP70)
GP: D30690_3
57.3
580

burgdorferi)



(Staphylococcus aureus)












Detoxification
*MG448
CDP-diglyceride synthetase (cdsA) (Escherichia coli)
GP: M11330_1
38.0
120
















*MG008
thiophene and furan oxidizer (thdF)
GP: D26185_60
31.9
456
MG380
fatty acid phospholipid synthesis protein (plsX) (Escherichia coli)
GP: M96793_1
29.0
327



(Bacillus subtilis)












Protein and peptide secretion
MG086
hydroxymethylglutaryl-CoA reductase (NADPH)
PIR: S24760
23.3
502




(Nicotiana sylvestris)
















*MG139
GTP-binding membrane protein (lepA)
GP: K00426_1
47.5
589
*MG115
phosphatidylglycerophosphate synthase (pgsA) (Escherichia coli)
GP: M12299_2
29.3
156



(Escherichia coli)


*MG182
haemolysin secretion ATP-binding protein
SP: P11599
34.6
236













(hlyB) MOTIF (Proteus vulgaris)



Purines, Pyrimidines, nucleosides, and nucleotides


*MG074
preprotein translocase (secA) (Bacillus subtilis)
GP: D10279_2
43.7
764
2′-Deoxyribonucleotide metabolism
















*MG174
preprotein translocase secY subunit (SecY)
SP: P10250
38.8
449
*MG237
ribonucleoside-diphosphate reductase (nrdE) (Salmonella
GP: X73226_1
54.1
703



(Mycoplasma capricolum)


MG215
prolipoprotein signal peptidase (lsp)
GP: M83994_1
32.4
145

typhimurium)



(Staphylococcus aureus)


*MG049
signal recognition particle protein (ffh)
SP: P37105
43.0
439
*MG235
ribonucleotide reductase 2 (nrdF) (Salmonella typhimurium)
SP: P17424
50.0
313



(Bacillus subtilis)


*MG243
trigger factor (tig) (Escherichia coli)
GP: M34066_1
24.6
391
*MG125
thioredoxin (trx) (Bacillus subtilis)
GP: J03294_1
36.1
98












Transformation
*MG103
thioredoxin reductase (trxB) (Escherichia coli)
GP: J03762_1
38.6
299
















MG326
competence locus E (comE3) MOTIF
GP: L15202_4
30.5
239
*MG233
thymidylate synthase (thyA) (Staphylococcus aureus)
SP: P13954
56.6
311



(Bacillus subtilis)









Nucleotide and nucleoside interconversions
















*MG164
ribosomal protein S17 (Mycoplasma capricolum)
SP: P10131
51.2
82
*MG309
115 kDa protein (p115) (Mycoplasma hyorhinis)
GP: M34958_1
33.4
975


*MG093
ribosomal protein S18 (rpS18) (Escherichia coli)
GP: U14003_114
45.5
64
*MG065
heterocyst maturation protein (devA) (Anabaena sp.)
GP: X75422_1
35.3
221


*MG159
ribosomal protein S19 (Escherichia coli)
GP: X02613_6
58.6
86
*MG479
heterocyst maturation protein (devA) (Anabaena sp.)
GP: X75422_1
39.9
198


*MG072
ribosomal protein S2 (Spirulina platensis)
SP: P34831
34.8
247
MG100
hydrolase (aux2) (Agrobacterium rhizogenes)
GP: M61151_1
32.1
458


*MG161
ribosomal protein S3 (rpS3)
SP: P02353
46.7
212
*MG223
macrogolgin (Homo sapiens)
PIR: S37538
25.3
3055



(Mycoplasma capricolum)


*MG322
ribosomal protein S4 (rpS4) (Bacillus subtilis)
GP: M59358_1
43.0
197
*MG337
magnesium-chelatase 30 kDa subunit (bchO) (Rhodobacter
SP: P26174
26.7
245


*MG172
ribosomal protein S5 (Bacillus stearothermophilus)
GP: M57621_1
56.0
157

capsulatus)


*MG012
ribosomal protein S6 modification protein (rimK)
SP: P17116
31.5
127
*MG315
membrane associated ATPase (cbiO) (Propionibacterium
GP: U13043_1
30.0
227



MOTIF (Escherichia coli)




freudenreichii)


*MG091
ribosomal protein S6 (Escherichia coli)
SP: P02358
23.9
87
MG376
mobilization protein (mob13) MOTIF (Leuconostoc oenos)
GP: M95954_1
30.9
161


*MG089
ribosomal protein S7 (rpS7)
SP: P22744
64.9
153
MG372
mucB protein (mucB) (Salmonella typhimurium)
SP: P14303
22.1
331



(Bacillus stearothermophilus)


*MG169
ribosomal protein S8 (Mycoplasma capricolum)
SP: P04446
46.9
125
*MG346
nitrogen fixation protein (nifS) (Mycobacterium leprae)
GP: U00013_6
26.2
358


*MG429
ribosomal protein S9 (rpS9)
SP: P07842
52.0
125
MG296
nodulation protein F (host-specificity of nodulation protein A)
SP: P04686
34.9
86



(Bacillus stearothermophilus)









(Rhizobium leguminosarum)












Transport and binding protein
MG299
protein L (Peptostreptococcus magnus)
GP: L04466_1
31.1
663


Amino acids, peptides and amines
*MG338
protein V (IcrV) (Streptococcus sp.)
GP: X62467_1
28.3
478
















MG231
aromatic amino acid transport protein (aroP)
GP: D26562_11
24.6
389
*MG149
protein X (Pseudomonas fluorescens)
GP: M35367_1
29.1
280



(Escherichia coli)


*MG314
membrane transport protein (glnQ)
GP: M61017_1
32.0
219
MG132
protein X (Spiroplasma citri)
GP: M31161_3
21.6
88



(Bacillus stearothermophilus)


*MG183
membrane transport protein (glnQ)
GP: M61017_1
37.4
210
*MG288
sensory rhodopsin II transducer (htrII) MOTIF (Natronobacterium
GP: Z35088_1
15.7
208



(Bacillus stearothermophilus)


*MG081
oligopeptide transport ATP-binding protein
SP: P18765
47.9
336

pharaonis)



(amiE) (Streptococcus pneumoniae)



*MG059
small protein (smpB) (Escherichia coli)
GP: D12501_1
32.6
128


*MG082
oligopeptide transport ATP-binding protein (amiF)
SP: P18766
46.6
250










(Streptococcus pneumoniae)
Hypothetical
















*MG080
oligopeptide transport system permease protein
SP: P26904
33.5
269
MG142
hypothetical 130K protein (P1 operon) MOTIF (Mycoplasma
PIR: JS0069
55.4
512



(dciAC) (Bacillus subtilis)




pneumoniae


*MG079
oligopeptide transport system permease protein
SP: P24138
28.1
308
MG199
hypothetical 130K protein (P1 operon) (Mycoplasma pneumoniae)
PIR: JS0069
45.2
570



(oppB) (Bacillus subtilis)



MG195
hypothetical 28K protein (P1 operon) (Mycoplasma pneumoniae)
PIR: JS0068
61.7
239


*MG042
spermidine/putrescine transport ATP-binding protein
GP: M64519_1
41.9
262
*MG342
hypothetical protein (GB: D10165_3) (Escherichia coli)
GP: D10165_3
28.9
233



(potA) (Escherichia coli)



*MG227
hypothetical protein (GB: D10483_63) (Escherichia coli)
GP: D10483_63
35.2
304


*MG043
spermidine/putrescine transport system permease
GP: M64519_2
26.5
221
*MG476
hypothetical protein (GB: D14982_3) (Mycoplasma capricolum)
GP: D14982_3
32.0
377



protein (potB) (Escherichia coli)



MG455
hypothetical protein (GB: D16311_1) (Bacillus subtilis)
GP: D16311_1
26.2
267


*MG044
spermidine/putrescine transport system permease
GP: M64519_3
29.5
252
MG383
hypothetical protein (GB: D26185_10) (Bacillus subtilis)
GP: D26185_10
25.8
221



protein (potC) (Escherichia coli)



*MG009
hypothetical protein (GB: D26185_102) (Bacillus subtilis)
GP: D26185_102
35.4
249












Anions
MG057
hypothetical protein (GB: D26185_104) (Bacillus subtilis)
GP: D26185_104
28.9
175
















*MG422
peripheral membrane protein B (pstB)
GP: L10328_89
50.8
244
*MG024
hypothetical protein (GB: D26185_50) (Bacillus subtilis)
GP: D26185_50
51.1
363



(Escherichia coli)


MG421
peripheral membrane protein U (Escherichia coli)
GP: L10328_88
27.0
169
*MG006
hypothetical protein (GB: D26185_92) (Bacillus subtilis)
GP: D26185_92
41.5
178


*MG423
periplasmic phosphate permease homolog (AG88)
GP: X75297_1
30.8
254
*MG056
hypothetical protein (GB: D26185_99) (Bacillus subtilis)
GP: D26185_99
29.3
275



(Mycobacterium tuberculosis)



MG333
hypothetical protein (GB: D37799_6) (Bacillus subtilis)
GP: D37799_6
27.6
211












Carbohydrates, organic alcohols, and acids
MG459
hypothetical protein (GB: L08897_1) (Mycoplasma gallisepticum)
GP: L08897_1
34.1
138
















*MG192
ATP-binding protein (msmK)
GP: M77351_7
40.5
357
MG218
hypothetical protein (GB: L09228_16) (Bacillus subtilis)
GP: L09228_16
27.1
238



(Streptococcus mutans)


*MG062
fructose-permease IIBC component (fruA)
SP: P20966
42.7
416
MG219
hypothetical protein (GB: L09228_17) (Bacillus subtilis)
GP: L09228_17
34.9
174



(Escherichia coli)


*MG033
glycerol uptake facilitator (glpF)
GP: M99611_2
35.9
189
*MG273
hypothetical protein (GB: L10328_61) (Escherichia coli)
GP: L10328_61
27.2
267



(Bacillus subtilis)


MG061
hexosephosphate transport protein (uhpT)
GP: M89480_4
30.9
158
*MG271
hypothetical protein (GB: L10328_61) (Escherichia coli)
GP: L10328_61
27.8
250



(Salmonella typhimurium)


*MG193
membrane protein (msmF) (Streptococcus mutans)
GP: M77351_4
22.5
263
MG126
hypothetical protein (GB: L10328_61) (Escherichia coli)
GP: L10328_61
31.9
252


MG194
membrane protein (msmG)
GP: M77351_5
27.1
272
MG140
hypothetical protein (GB: L18927_2) (Buchnera aphidicola)
GP: L18927_2
28.6
68


*MG120
methylgalactoside permease ATP-binding protein
GP: M59444_2
33.2
487
MG152
hypothetical protein (GB: L18965_6) (Thermophilic bacterial sp.)
GP: L18965_6
25.3
170



(mglA) (Escherichia coli)



MG305
hypothetical protein (GB: L19201_18) (Escherichia coli)
GP: L19201_18
23.1
328


*MG441
PEP-dependent HPr protein kinase
GP: M69050_2
46.5
570
MG029
hypothetical protein (GB: L19300_1) (Staphylococcus aureus)
GP: L19300_1
27.0
109



phosphoryltransferase (ptsI)



MG425
hypothetical protein (GB: L22432_4) (Mycoplasma capricolum)
GP: L22432_4
25.0
94



(Staphylococcus carnosus)


MG041
phosphohistidinoprotein-hexose phosphotransferase
GP: L22432_2
48.9
86
MG250
hypothetical protein (GB: M12965_1) (Escherichia coli)
GP: M12965_1
33.8
64



(ptsH) (Mycoplasma capricolum)



*MG135
hypothetical protein (GB: M38777_3) (Escherichia coli)
GP: M38777_3
28.6
98


*MG071
phosphotransferase enzyme II, ABC component
SP: P20166
43.2
620
*MG358
hypothetical protein (GB: M65289_3) (Bacillus stearothermophilus)
GP: M65289_3
38.0
155



(ptsG) (Bacillus subtilis)



MG211
hypothetical protein (GB: M84964_1) (Bacillus subtilis)
GP: M84964_1
30.7
341


MG130
PTS glucose-specific permease
GP: U12340_1
25.5
108
MG124
hypothetical protein (GB: M91593_1) (Mycoplasma mycoides)
GP: M91593_1
24.0
249



(Bacillus stearothermophilus)


*MG121
ribose transport system permease protein RBSC
SP: P36948
27.5
199
MG245
hypothetical protein (GB: M91593_1) (Mycoplasma mycoides)
GP: M91593_1
27.8
130



(Bacillus subtilis)












Cations
MG131
hypothetical protein (GB: M91593_1) (Mycoplasma mycoides)
GP: M91593_1
30.7
246
















MG073
cation-transporting ATPase (pacL)
SP: P37278
34.4
887
*MG400
hypothetical protein (GB: U00016_19) (Mycobacterium leprae)
GP: U00016_19
30.9
106



(Synechococcus sp)












Nucleosides, purines and pyrimidines
*MG129
hypothetical protein (GB: U00021_19) (Mycobacterium leprae)
GP: U00021_19
27.7
152


Other
*MG454
hypothetical protein (GB: U00021_5) (Mycobacterium leprae)
GP: U00021_5
26.9
150
















*MG301
ATP-binding protein P29 (Mycoplasma hyorhinis)
SP: P15361
32.3
227
*MG339
hypothetical protein (GB: U00021_5) (Mycobacterium leprae)
GP: U00021_5
32.9
430


*MG402
lactococcin transport ATP-binding protein (lcnDR3)
SP: P37608
22.3
654
MG364
hypothetical protein (GB: U11883_2) (Bacillus subtilis)
GP: U11883_2
33.3
167



(Lactococcus lactis)



MG230
hypothetical protein (GB: U14003_71) (Escherichia coli)
GP: U14003_71
22.0
481


MG332
Na+ ATPase subunit J (ntpJ) (Enterococcus hirae)
GP: D17462_11
31.1
436
*MG111
hypothetical protein (GB: U14003_76) (Escherichia coli)
GP: U14003_76
28.6
230


MG300
protein P37 precursor (Mycoplasma hyorhinis)
SP: P15363
35.8
331
MG473
hypothetical protein (GB: X73124_94) (Bacillus subtilis)
GP: X73124_94
40.0
68


*MG014
transport ATP-binding protein (msbA)
SP: P27299
28.1
518
MG265
hypothetical protein (GB: Z32651_1) (Mycoplasma pneumoniae)
GP: Z32651_1
57.1
41



(Escherichia coli)


*MG015
transport ATP-binding protein (msbA)
SP: P27299
32.2
482
*MG257
hypothetical protein (GB: Z33076_2) (Mycoplasma capricolum)
GP: Z33076_2
37.7
210



(Escherichia coli)


*MG418
transport system permease protein P69 MOTIF
SP: P15362
40.0
252
*MG147
hypothetical protein (SP: P09170) (Escherichia coli)
SP: P09170
24.1
109



(Mycoplasma hyorhinis)



*MG128
hypothetical protein (SP: P19434) (Streptomyces viridochromogenes)
SP: P19434
26.0
105


MG302
transport system permease protein P69
SP: P15362
27.9
524
*MG226
hypothetical protein (SP: P22186) (Escherichia coli)
SP: P22186
28.9
148



(Mycoplasma hyorhinis)







*MG382
hypothetical protein (SP: P23851) (Escherichia coli)
SP: P23851
27.0
253












Other categories
*MG214
hypothetical protein (SP: P23851) (Escherichia coli)
SP: P23851
30.5
295


Adaptations and atypical conditions
*MG306
hypothetical protein (SP: P25745) (Escherichia coli)
SP: P25745
34.7
123
















MG467
osmotically inducible protein (osmC)
SP: P23929
28.4
88
MG444
hypothetical protein (SP: P27712) (Spiroplasma citri)
SP: P27712
28.4
231



(Escherichia coli)


MG640
phosphate limitation protein (sphX)
GP: D26161_1
30.9
271
*MG252
hypothetical protein (SP: P31056) (Escherichia coli)
SP: P31058
33.0
180



(Synechococcus sp)


MG482
SpoOJ regulator MOTIF (Bacillus subtilis)
GP: D26185_55
27.5
245
MG116
hypothetical protein (SP: P31131) (Escherichia coli)
SP: P31131
32.6
45


*MG285
spore germination apparatus protein (gerBB)
GP: L16960_2
31.2
128
*MG359
hypothetical protein (SP: P32049) (Escherichia coli)
SP: P32049
28.5
128



MOTIF (Bacillus subtilis)



MG480
hypothetical protein (SP: P32049) (Escherichia coli)
SP: P32049
28.5
128


MG395
sporulation protein (outB) MOTIF (Bacillus subtilis)
GP: M15811_1
36.4
235
*MG133
hypothetical protein (SP: P32083) (Mycoplasma hyorhinis)
SP: P32083
30.1
102












Colicin-related functions
*MG122
hypothetical protein (SP: P32720) (Escherichia coli)
SP: P32720
30.9
132


Drug and analog sensitivity
MG138
hypothetical protein (SP: P37747) (Escherichia coli)
SP: P37747
34.1
363
















*MG475
high level kasugamycin resistance (ksgA)
GP: D26185_105
35.6
224
*MG345
hypothetical protein (SP: P38424) (Bacillus subtilis)
SP: P38424
33.9
167



(Bacillus subtilis)












Phage-related functions and prophages
*MG136
hypothetical protein 4 (Trypanosoma brucei)
PIR: E22845
30.8
302


Radiation sensitivity
*MG286
stringent response-like protein (rel) (Streptococcus equisimilis)
GP: X72832_5
29.1
713


Transposon-related functions
MG338
U3 protein (Bacillus subtilis)
GP: Z18629_1
27.1
272


Other
MG278
$$F protein (Escherichia coli)
GP: U14003_297
38.3
302










[0285]

TABLE 7
Summary of gene content in H. influenzae and M. genitalium sorted by functional category

Biological role                                                         H. influenzae  M. genitalium
Amino acid biosynthesis                                                 68 (6.8%)      1 (0.3%)
Biosynthesis of cofactors                                               54 (5.4%)      3 (0.8%)
Cell envelope                                                           84 (8.3%)      21 (5.8%)
Cellular processes                                                      53 (5.3%)      21 (5.8%)
  Cell division                                                         16             3
  Cell killing                                                          5              2
  Chaperones                                                            6              7
  Detoxification                                                        3              1
  Protein secretion                                                     15             7
  Transformation                                                        8              1
Central intermediary metabolism                                         30 (3%)        6 (1.7%)
Energy metabolism                                                       112 (10.4%)    31 (8.5%)
  Aerobic                                                               4              3
  Amino acids and amines                                                4              0
  Anaerobic                                                             24             0
  ATP-proton force interconversion                                      9              8
  Electron transport                                                    9              0
  Entner-Doudoroff                                                      9              0
  Fermentation                                                          8              0
  Gluconeogenesis                                                       2              0
  Glycolysis                                                            10             10
  Pentose phosphate pathway                                             3              2
  Pyruvate dehydrogenase                                                4              4
  Sugars                                                                15             4
  TCA cycle                                                             11             0
Fatty acid and phospholipid metabolism                                  25 (2.5%)      5 (1.4%)
Purines, pyrimidines, nucleosides and nucleotides                       53 (5.3%)      20 (5.4%)
  2′ Deoxyribonucleotide metabolism                                     8              5
  Nucleotide and nucleoside interconversions                            3              1
  Purine ribonucleotide biosynthesis                                    18             3
  Pyrimidine ribonucleotide biosynthesis                                5              0
  Salvage of nucleosides and nucleotides                                13             9
  Sugar-nucleotide biosynthesis and conversions                         6              2
Regulatory functions                                                    64 (6.3%)      5 (1.4%)
Replication                                                             87 (8.6%)      32 (8.8%)
  Degradation of DNA                                                    8              2
  DNA replication, restriction, modification, recombination and repair  76             30
Transcription                                                           27 (2.7%)      12 (3.3%)
  Degradation of RNA                                                    10             2
  RNA synthesis, modification and DNA transcription                     17             10
Translation                                                             141 (14%)      90 (24.7%)
Transport and binding proteins                                          123 (12.2%)    34 (9.3%)
  Amino acids and peptides                                              38             10
  Anions                                                                8              3
  Carbohydrates                                                         30             12
  Cations                                                               24             1
  Other transporters                                                    22             8
Other Categories                                                        93 (9.2%)      23 (6.3%)
Unassigned role                                                         736 (43%)      178 (37%)
  No database match                                                     389            117
  Match hypothetical proteins                                           347            61










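One way to read Table 7 is to ask, for each top-level biological role, how many M. genitalium genes are listed per H. influenzae gene. The sketch below transcribes the top-level counts from the table and computes that ratio. The name `retention` and the ratio itself are illustrative assumptions and are not a quantity defined in this application. The percentages printed in the table are not recomputed here.

```python
# Top-level role counts transcribed from Table 7: (H. influenzae, M. genitalium).
TABLE_7 = {
    "Amino acid biosynthesis": (68, 1),
    "Biosynthesis of cofactors": (54, 3),
    "Cell envelope": (84, 21),
    "Cellular processes": (53, 21),
    "Central intermediary metabolism": (30, 6),
    "Energy metabolism": (112, 31),
    "Fatty acid and phospholipid metabolism": (25, 5),
    "Purines, pyrimidines, nucleosides and nucleotides": (53, 20),
    "Regulatory functions": (64, 5),
    "Replication": (87, 32),
    "Transcription": (27, 12),
    "Translation": (141, 90),
    "Transport and binding proteins": (123, 34),
    "Other Categories": (93, 23),
}


def retention(table):
    """Return {role: M. genitalium count / H. influenzae count} (hypothetical metric)."""
    return {role: mg / hi for role, (hi, mg) in table.items()}


def ranked(table):
    """Return role names sorted from the highest ratio to the lowest."""
    ratios = retention(table)
    return sorted(ratios, key=ratios.get, reverse=True)


if __name__ == "__main__":
    ratios = retention(TABLE_7)
    for role in ranked(TABLE_7):
        hi, mg = TABLE_7[role]
        print(f"{role:<50} {mg:>3}/{hi:<3} = {ratios[role]:.2f}")
```

With these counts, translation has the highest ratio (90 of 141) and amino acid biosynthesis the lowest (1 of 68).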
[0286]


Claims
  • 1. An isolated polynucleotide comprising the nucleotide sequence of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.
  • 2. An isolated polynucleotide complementary to the polynucleotide of claim 1.
  • 3. The isolated polynucleotide of claim 1, wherein said polynucleotide comprises a heterologous nucleic acid sequence.
  • 4. A method for making a recombinant vector comprising inserting the isolated polynucleotide of claim 1 into a vector.
  • 5. A recombinant vector comprising the isolated polynucleotide of claim 1.
  • 6. The recombinant vector of claim 5, wherein said polynucleotide is operably associated with a heterologous regulatory sequence that controls gene expression.
  • 7. A recombinant host cell comprising the isolated polynucleotide of claim 1.
  • 8. The recombinant host cell of claim 7, wherein said polynucleotide is operably associated with a heterologous regulatory sequence that controls gene expression.
  • 9. An isolated polynucleotide comprising a nucleic acid sequence which hybridizes under hybridization conditions comprising hybridization in 5× SSC and 50% formamide at 50° C. and washing in a wash buffer consisting of 0.5× SSC at 65° C., to the complementary strand of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.
  • 10. An isolated polynucleotide complementary to the polynucleotide of claim 9.
  • 11. The isolated polynucleotide of claim 9, wherein said polynucleotide comprises a heterologous nucleic acid sequence.
  • 12. A recombinant vector comprising the isolated polynucleotide of claim 9.
  • 13. A recombinant host cell comprising the isolated polynucleotide of claim 9.
  • 14. An isolated polynucleotide comprising at least 50 contiguous nucleotides of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.
  • 15. The isolated polynucleotide of claim 14, wherein said polynucleotide comprises at least 100 contiguous nucleotides of any one of the fragments of SEQ ID NO: 1 depicted in Tables 1a, 1c and 2, or a degenerate variant thereof.
  • 16. An isolated polynucleotide complementary to the polynucleotide of claim 14.
  • 17. The isolated polynucleotide of claim 14, wherein said polynucleotide comprises a heterologous nucleic acid sequence.
  • 18. A recombinant vector comprising the isolated polynucleotide of claim 14.
  • 19. A recombinant host cell comprising the isolated polynucleotide of claim 14.
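Claims 2, 10 and 16 refer to a complementary polynucleotide, and claims 14 and 15 refer to at least 50 or 100 contiguous nucleotides of a fragment. The sketch below shows one way such relationships between two sequence strings might be checked computationally. It assumes plain A/C/G/T input and exact character matching; it does not model degenerate variants or the hybridization conditions of claim 9. The function names are hypothetical and do not come from this application.

```python
# Watson-Crick pairing for unambiguous DNA bases (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_shared_run(query: str, fragment: str) -> int:
    """Length of the longest contiguous stretch present in both sequences."""
    query, fragment = query.upper(), fragment.upper()
    best = 0
    # prev[j] holds the length of the common run ending at the previous
    # query base and fragment position j.
    prev = [0] * (len(fragment) + 1)
    for q in query:
        curr = [0] * (len(fragment) + 1)
        for j, f in enumerate(fragment, start=1):
            if q == f:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def has_contiguous_match(query: str, fragment: str, min_len: int = 50) -> bool:
    """True if query and fragment share at least min_len contiguous nucleotides."""
    return longest_shared_run(query, fragment) >= min_len
```

For example, `has_contiguous_match(query, fragment, min_len=100)` would apply the 100-nucleotide threshold instead of the default of 50. The dynamic-programming loop takes time proportional to the product of the two sequence lengths, so it suits individual fragments better than whole-genome comparisons.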
CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of and claims priority under 35 U.S.C. § 120 to U.S. application Ser. No. 08/545,528, filed Oct. 19, 1995, which is a continuation-in-part of and claims priority under 35 U.S.C. § 120 to U.S. application Ser. Nos. 08/488,018 and 08/473,545, both filed Jun. 7, 1995. U.S. application Ser. Nos. 08/488,018 and 08/473,545 are each hereby incorporated herein by reference.

STATEMENT REGARDING FEDERALLY-SPONSORED RESEARCH AND DEVELOPMENT

[0002] Part of the work performed during development of this invention utilized U.S. Government funds. The U.S. Government may have certain rights in the invention: DE-FC02-95ER61962.A000; NP-838C; NIH-AI08998, AI33161, and HL19171.

Divisions (1)
Number Date Country
Parent 08545528 Oct 1995 US
Child 10205220 Jul 2002 US
Continuation in Parts (2)
Number Date Country
Parent 08488018 Jun 1995 US
Child 08545528 Oct 1995 US
Parent 08473545 Jun 1995 US
Child 08545528 Oct 1995 US