Genomic sequence of Rhizobium sp. NGR 234 symbiotic plasmid

Information

  • Patent Grant
  • 6475793
  • Patent Number
    6,475,793
  • Date Filed
    Tuesday, June 22, 1999
  • Date Issued
    Tuesday, November 5, 2002
Abstract
Disclosed are the sequencing and analysis of the complete nucleotide sequence of the symbiotic plasmid pNGR234a isolated from Rhizobium sp. NGR234. The complete sequence of pNGR234a is presented. The analysis includes the identification of a number of novel ORFs, and of the proteins expressible therefrom, to which putative functions have been ascribed.
Description




TECHNICAL FIELD




This invention relates to a symbiotic plasmid of the broad host-range Rhizobium sp. NGR234 and its use. In particular, this invention relates to the isolation and analysis of the complete sequence of the NGR234 symbiotic plasmid pNGR234a, and the open reading frames (ORFs) identifiable therein as well as the proteins expressible from said ORFs.




BACKGROUND OF THE INVENTION




Together with carbon, hydrogen and oxygen, nitrogen is one of the essential components in organic chemistry. Although it is present in vast quantities in the atmosphere, nitrogen in its diatomic form N₂ remains unassimilable by living organisms. The nitrogen cycle begins by the fixation of nitrogen into ammonia which is chemically more reactive and can be assimilated into the food chain. A large fraction of the total nitrogen fixed every year is produced by microorganisms. Among these, the soil bacteria of the genera Azorhizobium, Bradyrhizobium, Sinorhizobium and Rhizobium, generally referred to as rhizobia, fix nitrogen in symbiotic associations with many plants from the Leguminosae family. This highly specific interaction leads to the formation of specialized root-, and in the case of Azorhizobium, stem-structures called nodules. It is within these nodules that rhizobia differentiate into bacteroids capable of fixing atmospheric nitrogen into ammonia. In turn, ammonia diffuses into the vegetal cells and sustains plant growth even under limiting nitrogen conditions.




The Rhizobium-legume interaction presents many interesting features. Obviously, the possibility of using this symbiosis as an “environmentally friendly” way to provide some of the most important world crops (such as soybean, bean and many other legumes) with fixed nitrogen without using nitrate-rich fertilizers, has important economic consequences. It is also an ideal model to study a non-pathogenic interaction between bacteria and a highly developed, multicellular organism such as the host plant. Furthermore, the various steps involved in the establishment of a functional nitrogen symbiosis, which include some dramatic morphological changes as well as processes of cellular differentiation, require a complex exchange of molecular signals. Despite many decades of studies, it is only recently that the Rhizobium-legume interaction has been partially understood at the molecular level. The establishment of a functional symbiosis can be divided into two major steps as follows.




(A) Rhizosphere Ecology and Nodulation




Rhizobia are soil bacteria that proliferate in the rhizosphere of compatible plants, taking advantage of the many compounds released by plant roots. In return it has been shown that the presence of rhizobia in the rhizosphere reduces susceptibility of plants to many root diseases. In the case of low nitrogen levels in the soil, compatible rhizobia can interact with host plants and start the nodulation process (Long, 1989; Fellay et al., 1995; van Rhijn and Vanderleyden, 1995). Molecular signalling between the two partners begins with the release by the plant of phenolic compounds (mostly flavonoids) that induce the expression of nodulation genes (referred to as nod, nol and noe genes). The NodD1 gene product appears to be the central mediator between the plant signal and nodulation gene induction (Bender et al., 1988). It is modified by the binding of flavonoids and acts as a positive regulator on the expression of the remaining nodulation genes. Among them, the nodABC loci encode products responsible for the synthesis of the core structure of lipooligosaccharides called Nod factors (Relić et al., 1994). More nodulation genes are involved in strain-specific modifications of the Nod factors as well as in their secretion. It seems established now that variability in the structure of Nod factors may play a significant role in the determination of the host-range of a given Rhizobium strain, that is in its ability to efficiently nodulate different legumes. For example, the strain Rhizobium meliloti can only nodulate Medicago, Melilotus and Trigonella spp., whereas Rhizobium sp. NGR234 can symbiotically interact with more than 105 different genera of plants, including the non-legume Parasponia andersonii.






The structure of many Nod factors, their isolation from Rhizobium strains and their commercial application in agriculture have been described (NodNGR factors: Relić et al., 1994; WO 94/00466; NodRm factors: WO 91/15496). Secreted Nod factors act in turn as signal molecules that allow rhizobia to enter young root hairs of a host plant, and induce root-cortical cell division that will produce the future nodule. Invaginated rhizobia progress towards the forming nodule within infection threads that are synthesized by the plant cells. Bacteria are then released into the cytoplasm of dividing nodule cells where they differentiate into bacteroids capable of fixing atmospheric nitrogen.




With respect to regulation of the nodulation genes, other regulatory genes with similarities to nodD1 (genes that belong to the lysR family) have been identified in various strains (Davis and Johnston, 1990). The function of these genes, called nodD2, nodD3 or syrM, is only partially understood. Some nodD genes have been described (WO 94/00466; CA 1314249; WO 87/07910; U.S. Pat. No. 5,023,180). Also, recombinant DNA molecules including the consensus sequence of the promoters of nodD1-regulated genes, called nod-boxes (Fisher and Long, 1993), have been disclosed (U.S. Pat. Nos. 5,484,718; 5,085,588). Finally, recombinant plasmids with the nodABC genes or, in one case (Bradyrhizobium japonicum), a sequence influencing host specificity have been disclosed (U.S. Pat. Nos. 5,045,461; 4,966,847).




(B) Symbiotic Nitrogen Fixation




Inside the nodules, rhizobia differentiate into bacteroids that express the enzymatic complex (nitrogenase) required for the reduction of atmospheric nitrogen into ammonia. The nitrogenase is encoded by three genes nifH, nifD and nifK which are well conserved in nitrogen fixing organisms (Badenoch-Jones et al., 1989). Many additional loci are necessary for functional nitrogenase activity. Those originally identified in Klebsiella pneumoniae are known as nif genes, whereas those found only in Rhizobium strains are described as fix genes (Fischer, 1994). Some of these gene products are required for the biosynthesis of cofactors, the assembly of the enzymatic complex or play regulatory and different accessory roles (oxygen-limited respiration, etc.). Many of these genes are less conserved among the various rhizobial strains and in some cases their function is still not fully understood. The high sensitivity of the nitrogenase complex to free oxygen requires a very strict control of most nif and fix gene expression. In this respect, the FixL, FixJ, FixK, NifA and RpoN proteins have been identified in representative Rhizobium species as the major regulatory elements that, in microanaerobic conditions, activate the synthesis of the nitrogenase complex (Fischer, 1994). Recombinant DNA molecules containing nif genes/promoters have been disclosed: nifH promoters of B. japonicum (U.S. Pat. No. 5,008,194), nifH and nifD promoter of R. japonicum (EP 164245), nifA of B. japonicum and R. meliloti (EP 339830), nifHDK and hydrogen-uptake (hup) genes of R. japonicum (EP 205071).




Many more genetic determinants play a significant role in the Rhizobium-legume symbiosis. Genes (exo, lps and ndv genes) involved in the production of extracellular polysaccharides (EPS), lipopolysaccharides (LPS) and cyclic glucans of rhizobia play an essential role in the symbiotic interaction (Long et al., 1988; Stanfield et al., 1988). Mutation in these genes negatively influences the development of functional nodules. In this respect, some exopolysaccharides of the NGR234 derivative strain ANU280 have been disclosed (WO 87/06796). Although Nod factors seem to play a key role in the nodulation process, experimental data indicate that other signal molecules produced by the bacterial symbionts are required for functional symbiosis and may play a role in coordinating various steps such as the controlled invasion process, the release of rhizobia from the infection thread into the plant cell cytoplasm, the bacteroid differentiation process, etc. Moreover, the need for rhizobia to survive in the rhizosphere and to compete adequately with other microorganisms requires many more unidentified genes that, although they may not be characterised as proper symbiotic loci, do affect the efficiency of the various strains to induce functional nitrogen fixing symbiosis in field conditions. Finally, in our view genetic engineering of improved rhizobial strains cannot be pursued without a more extended knowledge of the structure and complexity of the Rhizobium symbiotic genome.




In this respect we decided to determine the complete DNA sequence of a symbiotic plasmid of Rhizobium sp. NGR234. In contrast to Bradyrhizobium and Azorhizobium that carry symbiotic genes on large chromosomes (ca. 8 Mbp) and to R. meliloti that harbours two very large symbiotic plasmids of 1.4 and 1.6 Mbp, NGR234 carries a single plasmid of ca. 500 kbp, pNGR234a. Moreover, it has been shown by transfer of pNGR234a into heterologous rhizobia, and even into non-nodulating Agrobacterium tumefaciens, that most nodulation functions are encoded by this plasmid (Broughton et al., 1984). The fact that NGR234 is able to interact symbiotically with more plants than any other known strain, and that a complete ordered cosmid library of pNGR234a was available, reinforced NGR234 as the best choice for a large-scale sequencing effort on a symbiotic plasmid (Perret et al., 1991; Freiberg et al., 1997).




Automated fluorescent methods have been used to sequence cosmids from eukaryotic organisms, including Saccharomyces cerevisiae (Levy, 1994), Caenorhabditis elegans (Sulston et al., 1992), Drosophila melanogaster (Hartl and Palazzolo, 1993), and Homo sapiens (Bodmer, 1994), as well as chromosomes from the prokaryotes Haemophilus influenzae (Fleischmann et al., 1995) and Mycoplasma genitalium (Fraser et al., 1995). In most large-scale sequencing centres this technology is based mainly on the shotgun approach. After random fragmentation of DNA (e.g. cosmids, bacterial artificial chromosomes (BACs), entire chromosomes) using sonication or mechanical forces, size-selected fragments are subcloned into M13 phages, phagemids or plasmids and sequenced by cycle sequencing using dye primers (Craxton, 1993). A disadvantage of this method is that DNA regions with elevated GC contents produce large numbers of compressions (unresolvable foci in sequence gels) in the dye primer sequences leading to several hundred compressions per assembled cosmid sequence. It is known that the use of dye terminators (fluorescently labelled dideoxynucleoside triphosphates) instead of dye primers reduces the number of compressions (Rosenthal and Charnock-Jones, 1993). Therefore, dye terminators are frequently being used for gap closure and proofreading after assembly of the shotgun data.




To sequence GC-rich cosmids with the highest accuracy, the effectiveness of shotgun sequencing with dye terminators in comparison to dye primer sequencing was investigated. To improve the incorporation of dye terminators into DNA, a modified Taq DNA polymerase carrying a single mutation was used (Tabor and Richardson, 1995). This enzyme has properties similar to a thermostable “sequenase” and is commercially available as Thermo Sequenase (Amersham, Buckinghamshire, UK) or AmpliTaq FS (Perkin-Elmer, Foster City, Calif., USA). Concentrations of dye terminators needed in the cycle sequencing reactions can be reduced by 20-250 times. It was found that dye terminator shotgun sequencing leads to compression-free raw data that can be assembled much faster than shotgun data mainly obtained by dye primer sequencing. This strategy thus allows a several-fold increase in speed to sequence individual cosmids. This was demonstrated by comparing assembly of the sequence data of two cosmids from pNGR234a generated by different chemistries: Cosmid pXB296 was sequenced with dye terminators, whereas data for pXB110 were obtained using the common dye primer method. Also disclosed is the analysis of the entire pXB296 sequence.




Moreover, the dye terminator shotgun sequencing strategy used to generate the sequence data for pXB296 was also used to sequence all the other remaining overlapping cosmids of the plasmid pNGR234a. In summary, 20 cosmids have been sequenced together with two PCR products and a subcloned DNA fragment derived from a cosmid identified as pXB564 in order to generate the plasmid's complete nucleotide sequence.




After its assembly, the analysis of the entire nucleotide sequence of pNGR234a, especially the determination of putative coding regions and the prediction of their expressible proteins and putative functions, was performed. Initially, analysis of the region covered by cosmid pXB296 was extended to cosmids pXB368 and pXB110. Thus, in approximately 100 kb of the plasmid (position 417,796-517,279) most ORFs and their deduced proteins with different putative functions were predicted. Subsequently, the rest of pNGR234a was analyzed.




SUMMARY OF THE INVENTION




The present invention provides the complete nucleotide sequence of symbiotic plasmid pNGR234a of Rhizobium sp. NGR234, or degenerate variants thereof.




The present invention also contemplates sequence variants of the plasmid pNGR234a altered by mutation, deletion or insertion.




Also encompassed by the present invention are each of the ORFs derivable from the nucleotide sequence of pNGR234a or variants thereof.




In a preferred embodiment, the ORFs derived from the nucleotide sequence of pNGR234a encode the functions of nitrogen fixation, nodulation, transportation, permeation, synthesis and modification of surface poly- or oligosaccharides, lipo-oligosaccharides or secreted oligosaccharide derivatives, secretion (of proteins or other biomolecules), transcriptional regulation or DNA-binding, peptidolysis or proteolysis, transposition or integration, plasmid stability, plasmid replication or conjugal plasmid transfer, stress response (such as heat shock, cold shock or osmotic shock), chemotaxis, electron transfer, synthesis of isoprenoid compounds, synthesis of cell wall components, rhizopine metabolism, synthesis and utilization of amino acids, rhizopines, amino acid derivatives or other biomolecules, degradation of xenobiotic compounds, or encode proteins exhibiting similarities to proteins of amino acid metabolism or related ORFs, or enzymes (such as oxidoreductase, transferase, hydrolase, lyase, isomerase or ligase).




In another preferred embodiment, the ORFs are under the control of their natural regulatory elements or under the control of analogues to such natural regulatory elements.




The present invention also provides the sequences of the intergenic regions of pNGR234a which, in a preferred embodiment, are regulatory DNA sequences or repeated elements. In a further preferred embodiment, the intergenic sequences are ORF-fragments.




Also provided by the present invention are mobile elements (insertion elements or mosaic elements) derivable from the nucleotide sequences of the present invention.




The present invention also contemplates the use of the disclosed nucleotide sequences or ORFs in the analysis of genome structure, organisation or dynamics.




Also provided by the present invention is the use of the nucleotide sequences or ORFs in the subcloning of new nucleotide sequences. In a preferred embodiment, the new nucleotide sequences are coding sequences or non-coding sequences.




In yet a further preferred embodiment, the nucleotide sequences or ORFs are used in genome analysis and subcloning methods as oligonucleotide primers or hybridization probes.




The present invention further provides proteins expressible from the disclosed nucleotide sequences or ORFs.




Also contemplated by the present invention is the use of the disclosed nucleotide sequences, individual ORFs or groups of ORFs or the proteins expressible therefrom in the identification and classification of organisms and their genetic information, the identification and characterisation of nucleotide sequences, the identification and characterisation of amino acid sequences or proteins, the transportation of compounds to and from an organism which is host to said nucleotide sequences, ORFs or proteins, the degradation and/or metabolism of organic, inorganic, natural or xenobiotic substances in a host organism, or the modification of the host-range, nitrogen fixation abilities, fitness or competitiveness of organisms.




The present invention also provides plasmid pNGR234a of Rhizobium sp. NGR234 comprising the disclosed nucleotide sequence or any degenerate variant thereof.




The present invention also provides a plasmid harbouring at least one of the disclosed ORFs or any degenerate variant thereof.




The plasmids of the invention may be produced recombinantly and/or by mutation, deletion, insertion or inactivation of an ORF, ORFs or groups of ORFs.




The present invention also provides the use of the disclosed plasmids or variants thereof in obtaining a synthetic minimal set of ORFs required for functional Rhizobium-legume symbiosis, the modification of the host-range of rhizobia, the augmentation of the fitness or competitiveness of Rhizobium sp. NGR234 in the soil and its nodulation efficiency on host plants, the introduction of desired phenotypes into host plants using the disclosed plasmids as stable shuttle systems for foreign DNA encoding said desired phenotypes, or the direct transfer of the disclosed plasmids into rhizobia or other microorganisms without using other vectors for mobilization.




The nucleotide sequences of the present invention were advantageously obtained using known cycle sequencing methods. The preferred dye terminator/thermostable sequenase shotgun sequencing method used to generate the nucleotide sequences of the present invention, when applied to cosmids and when compared to other sequencing methods, was shown to yield sequence reads of the highest fidelity. Consequently, the speed of assembly of particular cosmids was increased, and the resultant high-quality sequences required little editing or proofreading. Thus, the preferred sequencing method described herein was successfully used to generate the complete nucleotide sequence of all the overlapping cosmids of plasmid pNGR234a, thereby resulting in the assembly of the complete sequence of the plasmid.




The complete sequence of pNGR234a is disclosed for the first time in this application, as are the majority of the ORFs predicted within the sequence. Putative functions have been ascribed to the novel and inventive ORFs disclosed herein and the proteins for which they code.











BRIEF DESCRIPTION OF DRAWINGS




The present invention is described below and illustrated thereafter in the appended examples, with reference to the following figures:





FIG. 1

A graph comparing sequences from pXB296 produced by different cycle sequencing methods.





FIG. 2

A schematic diagram showing the organization of the predicted ORFs in pXB296 from Rhizobium sp. NGR234.





FIG. 3

The complete nucleotide sequence of plasmid pNGR234a (with the pages labelled sequentially from 19961 to 1996142).





FIG. 4

A schematic diagram showing the map of the 20 sequenced cosmids covering the 536 kb symbiotic plasmid pNGR234a of Rhizobium sp. NGR234.





FIG. 5

A diagram indicating multiple alignments of the nucleotide sequence of the replication origins of various plasmids.





FIG. 6

A diagram indicating multiple DNA sequence alignments of the regions containing the origin of transfer of various plasmids.





FIG. 7

A schematic diagram showing a circular representation of the symbiotic plasmid pNGR234a of NGR234.











DETAILED DESCRIPTION OF THE INVENTION AND BEST MODE




Comparison of Different Shotgun Sequencing Strategies




The following is a more detailed description of certain key aspects of the present invention.




GC-rich cosmids were examined to investigate whether they could be sequenced much more efficiently using dye terminators throughout the shotgun phase instead of dye primers. As a test case, cosmid pXB296 with a GC content of 58 mol % from pNGR234a, the symbiotic plasmid of Rhizobium sp. NGR234, was exclusively sequenced using dye terminators in combination with a thermostable sequenase [Thermo Sequenase (Amersham)]. Another rhizobial cosmid with identical GC content, pXB110, was sequenced using traditional dye primer chemistry and Taq DNA polymerase.
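The GC content quoted for a cosmid is the fraction of G and C bases in its sequence. A minimal Python sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions and are not part of the disclosed method.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction (0..1) of a DNA sequence, ignoring case."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Toy example (hypothetical sequence, not taken from pXB296).
example = "GCGCATGCCG"
print(f"GC content: {gc_content(example) * 100:.0f} mol %")
```

Applied to a full cosmid insert, the same function would be expected to return a value near 0.58 for a template such as the one described above.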




Using the dye terminator/thermostable sequenase shotgun strategy, it was shown that most, if not all, compressions could be resolved and reads were produced with the highest fidelity among all sequencing chemistries tested. As a result, a much faster assembly of cosmid pXB296 in comparison to pXB110 was obtained. The shotgun data could be assembled into a high-quality sequence without extensive editing and proofreading. By measuring the error rate in overlapping regions between individual cosmids from pNGR234a, as well as the cosmid vector sequence itself (data not shown), it was estimated that the accuracy of the pXB296 sequence is higher than 99.98%. Using other thermostable sequenases such as AmpliTaq FS (Perkin-Elmer), similar results were expected because thermostable sequenases have similar properties.
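To put the accuracy estimate in concrete terms, it can be converted into an upper bound on the number of erroneous bases in the cosmid insert. The sketch below assumes the pXB296 insert length given in Table 1 (34,010 bp); the function name `max_expected_errors` is hypothetical.

```python
def max_expected_errors(length_bp: int, accuracy: float) -> float:
    """Upper bound on erroneous bases implied by a minimum accuracy (0..1)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    return length_bp * (1.0 - accuracy)


# pXB296 insert length (Table 1) and the accuracy bound quoted in the text.
errors = max_expected_errors(34_010, 0.9998)
print(f"fewer than {errors:.1f} erroneous bases expected")
```

An accuracy higher than 99.98% therefore corresponds to fewer than about seven erroneous bases over the whole insert.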




Dye primer chemistry in combination with Thermo Sequenase was also examined. Although the peak uniformity of signals was much improved over dye primer/Taq DNA polymerase data, the number of compressions in GC-rich shotgun reads was not reduced significantly. Compressions in shotgun raw data enormously increase the overall effort of editing, proofreading, and finishing a cosmid as shown for pXB110 (Table 1).




Because of their longer reading potential, dye primer reads are helpful for gap closure. However, using ABI 373A sequencers (Applied Biosystems, Inc. (ABI), Perkin-Elmer, Foster City, Calif., USA), dye primer reads are, on average, only ~50 bases longer than dye terminator reads.




Using the experimental conditions of the present invention, shotgun sequencing with dye terminators and a thermostable sequenase is superior because for GC-rich cosmid templates it removes most of the compressions and this leads to a several-fold improvement in assembling and finishing of cosmid-sized projects. Although dye terminators are slightly more expensive than dye primers, the overall saving in time for finishing projects has, in our experience, a much greater effect on general costs.




It has been shown that the strategy of the present invention is effective for high-throughput shotgun sequencing of GC-rich templates. This strategy was therefore used to sequence the remaining 19 overlapping cosmids of the symbiotic plasmid pNGR234a of Rhizobium sp. NGR234. In total, 20 cosmids, two PCR products (1.5 and 2.0 kb in length) and a 1.5 kb restriction fragment were sequenced in order to generate the complete pNGR234a sequence (FIG. 4).

TABLE 1

Comparison of the assembly of the sequence data from cosmids pXB296 (dye terminator shotgun reads) and pXB110 (dye primer shotgun reads)

Data assembly                                                   pXB296    pXB110
--------------------------------------------------------------------------------
Average length of the shotgun reads (bases)                        332       378
No. of shotgun reads used for assembly                             786       899
No. of shotgun reads assembled with 4% mismatch [a]                736       308
No. of shotgun reads assembled with 25% mismatch [a]               775       879
No. of contigs [b] longer than 1 kbp                                 3        25
No. of contigs left after editing [c]                                2         4
No. of additional reads (gap closure and proofreading) [d]          32       191
Total length of cosmid insert (bp)                              34,010    34,573
Sequencing redundancy (per bp)                                     8.0      10.5

[a] Assembling program: XGAP; principal autoassembling conditions: normal shotgun assembly, joins permitted, minimum initial match = 15, maximum no. of pads per reading during the alignment procedure = 8, maximum no. of pads per reading in contig to align any new reading = 8, alignment mismatches 4% and 25%, respectively.

[b] Contiguous parts of sequence created by overlapping reads.

[c] Lengths of contigs: 6-10 kbp (pXB296); 2-12 kbp (pXB110).

[d] Reads necessary for closing gaps and making single-stranded regions double-stranded by primer walking on selected templates and, in case of pXB110, for solving ambiguities (compressions) by the resequencing of clones with universal primer and dye terminators.
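The contrast between the two chemistries in Table 1 is easiest to see as the fraction of shotgun reads that assemble at each mismatch threshold. The following Python sketch computes those fractions from the read counts in Table 1; the `Assembly` class and its field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Assembly:
    name: str
    reads_total: int   # shotgun reads used for assembly
    reads_4pct: int    # reads assembled with 4% mismatch allowed
    reads_25pct: int   # reads assembled with 25% mismatch allowed

    def fraction(self, assembled: int) -> float:
        """Fraction of the shotgun reads that were assembled."""
        return assembled / self.reads_total


# Read counts copied from Table 1.
TABLE_1 = [
    Assembly("pXB296", 786, 736, 775),  # dye terminator reads
    Assembly("pXB110", 899, 308, 879),  # dye primer reads
]

for a in TABLE_1:
    print(
        f"{a.name}: {a.fraction(a.reads_4pct):.1%} at 4% mismatch, "
        f"{a.fraction(a.reads_25pct):.1%} at 25% mismatch"
    )
```

At the stringent 4% threshold roughly 94% of the pXB296 reads assemble, against roughly 34% of the pXB110 reads, whereas at 25% both cosmids assemble about 98% of their reads.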




Genetic Organization of pXB296




All 28 predicted open reading frames (ORFs) in pXB296 (FIG. 2) show significant homologies to database entries (Table 2). The first putative gene cluster (cluster I) containing ORF1 to ORF5 corresponds to various oligopeptide permease operons (Hiles et al., 1987; Perego et al., 1990). Only ORF5 shows homology to a gene from a different bacterium, Bacillus anthracis (Makino et al., 1989). Each homologue encodes membrane-bound or membrane-associated proteins suggesting that all five ORFs are involved in oligopeptide permeation.
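Because the ORF positions in Table 2 run from the first base of the start codon to the last base of the stop codon, the number of deduced amino acids can be cross-checked from the coordinates alone. The sketch below is a minimal Python illustration; the function name `deduced_protein_length` is hypothetical, and the check applies only to complete ORFs, not to those truncated at the cosmid ends (listed with ">" in Table 2).

```python
def deduced_protein_length(start: int, end: int) -> int:
    """Amino acids encoded by a complete ORF given inclusive coordinates.

    The span covers the start codon through the stop codon, so one codon
    (the stop) is subtracted. Only the span is used, so the result does
    not depend on which strand carries the ORF.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("coordinates do not describe a whole number of codons")
    return span // 3 - 1


# Coordinates and expected lengths as listed in Table 2.
for orf, (start, end), listed in [
    ("ORF2", (628, 1503), 291),
    ("ORF3", (1505, 2512), 335),
    ("ORF5", (3606, 4991), 461),
]:
    print(orf, deduced_protein_length(start, end), "aa; listed:", listed)
```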




Organization of the predicted gene cluster IV, including the nifA homologue ORF16 (fixABCX, nifA, nifB, fdxN, ORF, fixU homologues, position 16,746-24,731), the predicted locations of the σ54-dependent promoters and the nifA upstream activator sequences (FIG. 2), correspond to the organization found in Rhizobium meliloti and Rhizobium leguminosarum bv. trifolii (Iismaa et al., 1989; Fischer, 1994). NifA is a positive transcriptional activator (Buikema et al., 1985), whereas nif and fix genes are essential for symbiotic nitrogen fixation. Identification of σ54-dependent promoter sequences, together with the upstream activator motifs upstream of ORF21, ORF22, and ORF23, suggests that these ORFs may play an important, but still undefined, role in symbiosis.




Inevitably, large-scale sequencing uncovers differences with already published sequences. van Slooten et al. (1992) cloned a 5.8 kb EcoRI fragment from Rhizobium sp. NGR234 and sequenced 2067 bp by manual radioactive methods (EMBL accession no. S38912). This sequence exhibits 2.4% mismatches with the corresponding sequence in pXB296.












TABLE 2

Putative ORFs of pXB296 and homologies of the deduced amino acid sequences to known proteins

Columns: ORF [a]; st. [b], strand; position [c], position on cosmid (base no.); RBS [d], ribosomal binding site given as SD-sequence - distance from start codon (bases) - start codon, with SD-sequence 5′-TAAGGAGGTGA-3′; aa, no. of deduced amino acids; hom. aa, homologous amino acids (position); name, length [e] (aa), function [f] and acc. no. (accession no.) describe the homologous protein; ident. [g] and simil. [g], identity and similarity (%).

ORF[a]    st.[b] position[c]  RBS[d]           aa    hom. aa  name       length[e] function[f]                           acc. no.  ident.[g] simil.[g]
----------------------------------------------------------------------------------------------------------------------------------------------------
ORF1[h]   +      00001-00625                   >207  1-207    OppB       306       oligopeptide permease proteins        X05491    45        68
ORF2      +      00628-01503  GTATCCGGT-7-ATG  291   2-289    OppC       305       oligopeptide permease proteins        X56347    37        63
ORF3      +      01505-02512  AGCGGAGG-7-ATG   335   8-327    OppD       336       oligopeptide permease proteins        X56347    49        69
ORF4      +      02509-03570  TGAAGTGGT-6-ATG  353   2-323    OppF       334       oligopeptide permease proteins        X05491    51        69
ORF5      +      03606-04991  CAAGGA-6-ATG     461   1-458    CapA       411       encapsulation protein                 M24150    25        48
ORF6      +      05460-06863  CCGAGAGG-8-ATG   467   1-464    BioA       455       aminotransferase                      M29292    29        55
ORF7      +      06888-08426  GCCTTCGG-5-GTG   512   97-509   ORF[i]     417       unknown                               D37877    36        58
                                                     34-510   GapD       482       succinic semialdehyde dehydrogenase   M38417    33        57
ORF8      -      09781-10860  GAACGTGG-8-ATG   359   72-299   ORF[i]     414       transposase homologue minicircle DNA  X15942    30        48
ORF9      +      11124-12455  ?-7-ATG          443   2-443    GLUD1      558       glutamate dehydrogenase               M37154    41        60
ORF10     -      13370-14116  AAAGGA-6-ATG     248   1-245    ORF2[i]    231       transposase homologues, IS1162        X79443    45        64
ORF11     -      14128-15672  CATGGAG-7-TTG    514   1-513    ORF1[i]    558       transposase homologues, IS1162        X79443    41        62
ORF12     -      16712-16942  GAAGGA-8-ATG     76    1-70     FixU       70        unknown                               P42710    63        80
ORF13     -      16939-17265  ACAAGAGG-7-ATG   109   1-79     ORF2[i]    >78       unknown                               X07567    53        81
                                                     15-107   NifZ       159       involved in FeMo-cofactor synthesis   M20568    39        56
ORF14     -      17349-17543  CCAGGAG-9-ATG    64    1-64     FdxN       64        ferredoxin-like                       M21841    80        87
ORF15     -      17585-19066  AGTGGAG-7-ATG    493   1-493    NifB       490       involved in FeMo-cofactor synthesis   M15544    73        84
ORF16     -      19292-20962  ATTGG-12-ATG     556   9-556    NifA       541       transcriptional regulator             X02615    59        72
ORF17     -      21129-21422  AGGGGAG-7-ATG    97    1-97     FixX       98        required for nitrogen fixation        M15546    84        87
ORF18     -      21437-22744  AACTGAGGT-7-ATG  435   1-435    FixC       435       required for nitrogen fixation        M15546    83        90
ORF19     -      22755-23864  ATAGGAG-6-ATG    369   18-369   FixB       353       required for nitrogen fixation        M15546    79        89
ORF20     -      23874-24731  TAAAGAG-5-ATG    285   1-285    FixA       292       required for nitrogen fixation        M15546    74        85
ORF21     -      25148-25468  CCAGGAG-10-ATG   106   1-106    ORF118[i]  108       unknown                               X13691    55        71
ORF22     -      26145-26711  GAAGGAG-9-ATG    188   9-199               241       hypothetical protein                  U32739    47        64
                                                     1-173               166       peroxisomal protein                   U11244    32        57
ORF23     +      27169-27861  GAAGGA-7-ATG     230   1-167    NifQ       167       probably involved in Mo-processing    X13303    37        57
ORF24     +      27920-29434  CTGGGAGG-18-ATG  504   1-454    DctA1      456       C4-dicarboxylate transporter          S38912    97        98
                                                     8-454    DctA2      449       C4-dicarboxylate transporter          S38912    97        98
ORF25     +      29431-30675  TTCGGCGG-12-ATG  414   2-414    CamC       415       cytP450-like                          M12546    34        53
ORF26     +      30676-31332  TTGGG-5-TTG      218   30-190   LinA       155       γ-hexachlorocyclohexan-dechlorinase   D90355    27        51
ORF27     +      31329-33035  AGTGGAG-10-ATG   568   28-270   FabG       244       reductase                             M84991    38        57
                                                     294-534                                                                       30        57
ORF28[k]  +      33173-34010  CAAGGAG-5-ATG    >279  1-279    LuxA       355       luciferase α-subunit                  M10961    23        49













a (ORF) Open reading frame.
b (st.) Plus or minus strand.
c Position on cosmid: from the first base of the start codon to the last base of the stop codon; alternative start points are 6912/6927/7017 (ORF7), 10665/10656 (ORF8), 11220 (ORF9), 15699/15651 (ORF11), 17322/17271 (ORF13), 20995/21076 (ORF16), 26744 (ORF22), 27229/27304 (ORF23), 27941 (ORF24), and 30751/30754 (ORF26).
d (SD sequence) Shine-Dalgarno sequence (Shine and Dalgarno 1974). Bases underlined are identical with the Shine-Dalgarno sequence. The following possible start codons were considered: ATG, GTG, or TTG.
e (aa) Amino acids.
f Organisms: Salmonella typhimurium, Bacillus subtilis (OppBCDF), Bacillus anthracis (CapA), Bacillus sphaericus (BioA), Streptomyces hygroscopicus (ORF7 homolog), Escherichia coli (GapD), Streptomyces coelicolor (ORF8 homolog), Homo sapiens (GLUD1), Pseudomonas fluorescens (ORF10, ORF11 homologs), Rhizobium leguminosarum (FixU), Rhodobacter capsulatus (ORF13 homolog), Azotobacter vinelandii (NifZ), Rhizobium meliloti (FdxN, NifBA, FixXCBA), Bradyrhizobium japonicum (ORF118), Haemophilus influenzae (hypothetical protein), Lipomyces kononenkoae (peroxisomal protein), Klebsiella pneumoniae (NifQ), Rhizobium sp. NGR234 (DctA), Pseudomonas putida (CamC), Pseudomonas paucimobilis (LinA), Escherichia coli (FabG), Vibrio harveyi (LuxA).
g Identity and similarity were calculated using the program BESTFIT (Smith and Waterman 1981).
h (ORF1) 3′ end.
i Translated ORF.
k (ORF28) 5′ end.
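Footnotes c and e together imply a simple consistency check: a position range that runs from the first base of the start codon to the last base of the stop codon should span a multiple of three bases, and the number of amino acids is one less than the number of codons. The sketch below illustrates that arithmetic with a few complete rows from Table 2; the function name `orf_length_aa` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
def orf_length_aa(start: int, end: int) -> int:
    """Number of residues encoded by an ORF given 1-based inclusive
    coordinates that include the stop codon (footnote c convention)."""
    span = abs(end - start) + 1          # bases covered, both ends included
    if span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3 - 1                 # drop the stop codon


# (start, end, amino acids as listed) for complete ORFs quoted in Table 2
table2_rows = {
    "ORF14": (17349, 17543, 64),
    "ORF15": (17585, 19066, 493),
    "ORF16": (19292, 20962, 556),
    "ORF17": (21129, 21422, 97),
    "ORF18": (21437, 22744, 435),
}

for name, (start, end, listed_aa) in table2_rows.items():
    computed = orf_length_aa(start, end)
    status = "ok" if computed == listed_aa else "MISMATCH"
    print(f"{name}: {start}-{end} -> {computed} aa (listed {listed_aa}) {status}")
```

Rows marked with ">" in the amino-acid column (incomplete ORFs at the cosmid ends) are not expected to satisfy this check.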













It contains the gene dctA (encoding a C4-dicarboxylate permease), which is 144 bases shorter than in pXB296. In this respect, a single nucleotide deletion in position 29,248 of the cosmid sequence close to the 3′ end of the gene causes a frameshift leading to a DctA product extended by 48 residues. van Slooten et al. (1992) also failed to identify the nifQ homologue, ORF23 (position 27,169-27,861), presumably because they overlooked a small XhoI fragment located between positions 27,349 and 27,536 on pXB296. Expression studies allowed these investigators to define a putative σ54-dependent promoter in a 1.7 kb SmaI fragment (position 27,094-28,818 in pXB296). This fragment stretches from the upstream region of ORF23 to the 5′ part of dctA. The 58 bp intergenic region between ORF23 and dctA contains a stem-loop structure but no obvious promoter sequence. Possibly the promoter that controls dctA is located upstream of ORF23 (e.g. the minimal consensus sequence included in GGGGGCACAATTGC at position 27,098-27,111). Although clones containing dctA complemented mutants of R. meliloti and R. leguminosarum for growth on dicarboxylates, the growth of the NGR234 dctA deletion mutant was not affected (van Slooten et al., 1992). Nevertheless, this mutant was unable to fix nitrogen in nodules. Because dctA is now possibly part of a larger transcription unit, the symbiotic phenotype may also result from the inactivation of downstream genes.
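The candidate promoter quoted above can be checked mechanically. The text only says that GGGGGCACAATTGC contains "the minimal consensus sequence"; the sketch below assumes this refers to the commonly cited minimal σ54 (−24/−12) pattern GG-N10-GC, and that assumption, together with the function name `find_sigma54_sites`, is not taken from the source.

```python
import re

# Assumed minimal sigma-54 (-24/-12) pattern: GG, ten arbitrary bases, GC.
# The lookahead lets overlapping candidate sites be reported as well.
SIGMA54_MINIMAL = re.compile(r"(?=(GG[ACGT]{10}GC))")


def find_sigma54_sites(seq: str, first_base: int = 1):
    """Return (start, end, site) tuples, numbered so that the first base
    of `seq` has coordinate `first_base` (1-based, inclusive ends)."""
    seq = seq.upper()
    hits = []
    for m in SIGMA54_MINIMAL.finditer(seq):
        site = m.group(1)
        start = m.start() + first_base
        hits.append((start, start + len(site) - 1, site))
    return hits


# The 14-mer quoted in the text, numbered from cosmid position 27,098.
for start, end, site in find_sigma54_sites("GGGGGCACAATTGC", first_base=27098):
    print(start, end, site)

# The DctA extension quoted in the text: 144 bases correspond to 48 codons.
print(144 // 3)
```

Under this assumption the quoted 14-mer is exactly one GG-N10-GC site spanning positions 27,098 to 27,111, which agrees with the coordinates given in the paragraph.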




Interestingly, the GC content of the predicted pXB296 ORFs ranges from 53.3 mol % to 64.6 mol %, with an overall cosmid GC content of 58.5 mol %. Genomes of Azorhizobium, Bradyrhizobium, and Rhizobium species have GC contents of 59 mol % to 65 mol % (Padmanabhan et al., 1990), with 62 mol % reported for Rhizobium sp. NGR234 (Broughton et al., 1972). Although pXB296 covers <7% of the complete symbiotic plasmid sequence, its lower overall GC value suggests that symbiotic genes might have evolved by lateral transfer from other organisms. In this case, methods of the type applied in the present invention will become even more relevant in sequencing the whole genome.
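GC content in mol % is the share of G and C among the unambiguous bases of a sequence. A minimal sketch of that calculation follows; the function name `gc_content` and the short demonstration sequence are illustrative assumptions and are not cosmid data.

```python
def gc_content(seq: str) -> float:
    """GC content in mol %, counting only unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Toy sequence for illustration only (not taken from pXB296).
demo = "ATGGCGCTGACCGGCAAGTAA"
print(f"{gc_content(demo):.1f} mol % GC")
```

Applied per ORF and to the whole cosmid sequence, the same function would yield per-gene values and an overall value of the kind compared in the paragraph above.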




Genetic Organization of the 100 kb Region Covered by Cosmids pXB296, pXB368 and pXB110




Extending the analysis of pXB296 to a 100 kb region stretching from position 417,796 to 517,279 on the symbiotic plasmid pNGR234a led initially to the assignment of only 76 ORFs, listed in Table 3 (excluding the first incomplete ORF noted in the analysis of pXB296, “ORF1” of Table 2). The ORFs y4tQ to y4vJ (excluding ORFs y4uD and y4uG and excluding ORF-fragments fu1, fu2, fu3, fu4 and fv1; see Table 3) are identical to ORFs 2 to 28 of the analysis of pXB296 in Table 2 apart from minor revisions (N.B. the analysis recited in Table 3 should be taken as the definitive analysis; Table 2 merely represents preliminary findings). The cosmid pXB110 was sequenced with the dye primer shotgun sequencing strategy in order to compare it with the dye terminator shotgun sequencing strategy used to sequence cosmid pXB296; together with pXB296 and pXB368 it covers nearly this entire region. A PCR product and a restriction fragment of cosmid pXB564 also had to be sequenced in order to fill in the gap from position 480,607 to 483,991 between cosmids pXB368 and pXB110 (FIG. 4). Among the 76 predicted ORFs, 7 ORFs and their deduced proteins show no homologies to database entries. The other predicted ORFs and their deduced proteins do exhibit such homologies and therefore play putative roles in nitrogen fixation (ORFs y4uJ to y4vB, y4vE, y4vN to y4vR, y4wK and y4wL), nodulation (ORFs y4yC and y4yH), transportation (ORFs y4tQ to y4uA, y4vF and y4wM), secretion of proteins or other biomolecules (ORFs y4yI and y4yO), transcriptional regulation/DNA binding (ORFs y4wC and y4xI), amino acid metabolism or metabolism of amino acid derivatives (ORFs y4uB, y4uC, y4uF, y4wD, y4wE and y4xN to y4yA), degradation of xenobiotic compounds (ORFs y4vG to y4vI), peptidolysis/proteolysis (ORFs y4wA and y4wB) or transposition (ORFs y4uE, y4uH and y4uI) (see Table 3). The role of some ORFs, like the luciferase-like ORFs (y4vJ and y4wF; see Table 3), in rhizobia is still not clear. In the 100 kb region, the duplication of a 5 kb sequence (position 451,886 to 456,157 and 483,764 to 488,035) including the genes nifHDK is remarkable. These genes encode the basic subunits of the nitrogenase. Furthermore, the transcriptional regulator nodD2 is very interesting because its role seems not to be identical to that of a previously identified nodD2 in a closely related strain (Appelbaum et al., 1988; data not shown). The pmrA-homologous ORF y4xI also putatively plays an important role in regulating symbiotic processes, because a nod box (binding region for the basic regulator nodD1; Fisher and Long, 1993) is located upstream of this ORF (position 493,962 to 494,000). Finally, the presence of ORFs (y4yI and y4yK to y4yN; see Table 3) homologous to type III secretion proteins, which were previously known only from plant or animal/human pathogenic bacteria, suggests that the difference between the symbiotic and pathogenic abilities of microorganisms may be subtle.
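The coordinates quoted in this passage can be turned into lengths with inclusive interval arithmetic. The sketch below does only that; the helper name `span_bp` and the dictionary labels are illustrative assumptions. Note that the two duplicated intervals each come out at 4,272 bp, which the text rounds to "5 kb".

```python
def span_bp(start: int, end: int) -> int:
    """Length in bp of a 1-based interval with both end points included."""
    return end - start + 1


# Intervals on pNGR234a exactly as quoted in the text.
intervals = {
    "analysed region": (417_796, 517_279),
    "nifHDK copy 1": (451_886, 456_157),
    "nifHDK copy 2": (483_764, 488_035),
    "gap closed via pXB564": (480_607, 483_991),
    "nod box upstream of y4xI": (493_962, 494_000),
}

for label, (start, end) in intervals.items():
    print(f"{label}: {span_bp(start, end):,} bp")

# Distance between the first bases of the two duplicated copies.
offset = intervals["nifHDK copy 2"][0] - intervals["nifHDK copy 1"][0]
print(f"offset between copies: {offset:,} bp")
```

The two copies have identical lengths, as expected for a duplication, and the analysed region spans just under 100 kb.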












TABLE 3











List of the predicted functional ORFs and of fragments representing putative remnants of functional ORFs




















ORF [a]; functional name; st. [b]; position in plasmid (base no.) [c]; no. of deduced amino acids; hom. amino acids (position); hom. protein name; length (aa) [d]; accession no. [e]; I/% [f]; S/% [f]; note [g]











y4aA





−2/3




534696-000474




647




16-646




Shc




658




X86552




78




88




prob. squalene-hopene-cyclase; put.

















operon y4aABCD: inv. in synthesis

















of an isoprenoid compound






y4aB





−3




000523-001776




417




6-415




ORF1




414




X80766




43




63




put. flavoprotein oxidoreductase






y4aC





−2




001776-002615




279




3-247




Psy1




419




X68017




34




50




put. phytoene synthase






y4aD





−1




002612-003490




292




10-195




Crt1




342




L37405




33




51




hyp. protein hom. to squalene and phyto-






fa1





−3




003487-004011











ene synthetases fragmentous character






y4aF




nolK




−3




005173-006117




314




9-310




ORF14.8




321




U46859




51




70




put. NAD-dep. nucleotide sugar epimerase/

















dehydrogenase; NoeJKL/NodZ/NolK inv.

















in biosynthesis of fucose moiety of

















Nod factors






y4aG




noeH




−2




006126-007181




351




4-339




RfbD




348




U24571




65




80




put. GDP-D-mannose dehydratase






y4aH




nodZ




−1




007426-008394




322




3-254




NodZ




324




L22756




69




83




put. fucosyltransferase






y4aI




noeK




−3




008623-010047




474




5-471




ORF5




483




U47057




42




59




put. phosphomannomutase






y4aJ




noeJ




+3




010110-011648




512




33-498




XanB




466




M83231




50




65




put. mannose-1-phosphate guanylyl-

















transferase






y4aK





+2




012125-012277




50










hyp. 5.5 kd protein






y4aL




nodD1




+2




012380-013348




322




1-322




NodD1




322




Y00059




98




99




transcriptional regulator (LysR family);











1-310




NodD2




312




this work




68




84




high similarity to Y4xH(NodD2)






y4aM





+3




013911-014342




143




7-132




ORF3




127




L13845




50




66




put. DNA-binding protein; high











1-143




Y4wC




143




this work




69




77




similarity to Y4wC






y4aN





+1




014488-014934




148




1-129




ORF3




128




X04833




41




56




homologue located nearby the

















replicator region of pRiA4b






y4aO





+3




015065-015643




192










hyp. 21.8 kd protein; low similarity

















to Y4nF(<30% id.)






y4aP




mucR




+3




016161-016592




143




1-143




MucR




143




L37353




89




95




put. transcriptional regulator (Ros/MucR

















family); similarity to Y4pD;

















possibly inv. in regulation

















of exopolysaccharide synthesis






y4aQ





−2




017016-017582




188




15-167




No1265




266




X74068




33




50




hyp. 20.4 kd protein; similar

















to Y4hP, Y4jD, Y4qI






y4aR





+2




017798-018121




107










hyp. 12.1 kd protein






y4aS





+1




018121-018666




181










hyp. 20 kd protein






fa2





+3




018912-019664




250




126-250




Tnp




465




U04047




38




51




hyp. protein fragment











78-150




Y4iG




90




this work




93




97











3-266




Y4bF




457




this work




53




73






y4bA





−2




019674-021758




694




1-393




fo6




430




this work




89




95




hyp. 78.7 kd protein; identical to Y4pH











406-532




fo5




136




this work




83




94











532-694




fo4




143




this work




77




83






y4bB





−3




021748-022014




88




2-88




Y4oL




88




this work




63




69




hyp. 9.7 kd protein precursor; identical to Y4pI






y4bC





−1




022034-022483




149




1-149




Y4oM




149




this work




79




88




hyp. 16.8 kd protein; identical to Y4pJ






y4bD





−2




022674-022943




89




20-89




Y4oN




70




this work




73




84




hyp. 10.2 kd protein; identical to Y4pK






fb1





+2




022985-023659




224




36-224




Y4bF




457




this work




42




63




hyp. protein fragment






y4bF





+1




023953-025326




457




130-436




Tnp




465




U04047




31




46




put. transposase;











2-265




Fa2




266




this work




53




73




upstream of this ORF (23875-











77-169




Y4iG




90




this work




51




72




23987) 89% nt-id. to part











285-457




Fb1




188




this work




42




63




of origin of replication-region











410-457




Y4jM




70




this work




75




79




(R. meliloti; S66221)






y4bG





+1




025870-026685




271










hyp. 30 kd protein precursor






y4bH





+1




028513-028788




91










hyp. 9.6 kd integral membrane protein






y4bI





+3




028860-029276




138




3-108




HI1631




190




U00085




41




61




hyp. 15.3 kd protein precursor






y4bJ





+1




029392-031284




630




429-564




HtrA




503




L20127




40




53




hyp. 67.9 kd integral membrane protein, distantly

















related to peptidase family S2C






y4bK





+2




031625-032293




222




83-212




ORF1




215




D84146




25




45




hyp. 24.3 kd protein






y4bL





+1




032641-034191




516




7-515




ORF1




558




X79443




44




63




identical to Y4kJ and Y4tB; similar to

















Fo3 and Fo7; put. transposase











6-516




Y4uI




515




this work




48




66






y4bM





+3




034188-034979




263




1-203




ORF2




231




X79443




45




62




identical to Y4kI and Y4tA; put.











6-248




Y4pL




245




this work




55




73




insertion sequence ATP-binding











6-254




Y4uH




248




this work




48




68




protein; similarity to Y4pL, Y4uH,











1-263




Y4iQ




298




this work




31




56




also to Y4sD/Y4nD/Y4iQ






y4bN





+1




035278-036573




431










hyp. 47.6 kd protein






y4bO





+1




036646-038466




606










hyp. 66.8 kd protein






y4cA





−1




038576-042169




1197










hyp. 137.7 kd protein;

















largest protein in pNGR234a






y4cB





−3




042226-042522




98










hyp. 10.2 kd integral membrane protein






y4cC





−3




042556-044109




517










hyp. 57.8 kd protein






y4cD





−2




044106-046028




640










hyp. 71.6 kd protein






y4cE





−3




046486-047661




391










hyp. 43.4 kd protein






y4cF





−1




047687-048829




380










hyp. 41.8 kd protein






y4cG





+2




049361-050278




305




16-173




Pin




184




K00676




50




68




prob. DNA invertase “resolvase-type”











17-222




Y4IS




183




this work




40




60






y4cH





−2




050427-050636




69




4-65




CspS




70




L23115




56




70




prob. cold shock regulator






y4cI





−2




053202-054416




404




1-397




RepC




405




X04833




60




73




put. replication protein C






y4cJ





−3




054571-055551




326




1-317




RepB




319




X89447




39




55




put. replication protein B






y4cK





−2




055608-056831




407




10-404




RepA




398




X89447




58




73




put. replication protein A






y4cL




traI




+2




057635-058261




208




1-206




TraI




212




U43675




55




66




prob. autoinducer synthetase (inv. in

















control of conjugal transfer)






y4cM




trbB




+3




058272-059249




325




3-325




TrbB




323




U43675




80




88




prob. conjugal transfer protein











1-115




Y4oG




125




this work




25




51




(PulE family)






y4cN




trbC




+1




059239-059622




127




7-127




TrbC




134




U43675




69




78




prob. conjugal transfer protein

















(integral membrane prot.)






y4cO




trbD




+2




059615-059914




99




1-99




TrbD




99




U43675




70




89




prob. conjugal transfer protein

















(integral membrane prot.)






y4cP




trbEa




+3




059925-060374




149




1-136




TrbE




820




U43675




80




91




prob. conjugal transfer protein

















(hom. to 5′ part of trbE)






y4cQ




trbEb




+1




060394-062382




662




5-659




TrbE




820




U43675




83




90




prob. conjugal transfer protein

















(hom. to 3′ part of trbE)






y4dA




trbJ




+2




062354-063157




267




1-107




TrbJ




175




U43675




60




69




prob. conjugal transfer protein











194-267







71




79






y4dB




trbK




+1




063154-063351




65




5-65




TrbK




75




U43675




40




56




prob. conjugal transfer

















protein precursor






y4dC




trbL




+3




063345-064520




391




3-387




TrbL




395




U43675




74




85




prob. conjugal transfer protein

















(integral membrane prot.)






y4dD




trbF




+2




064544-065206




220




1-220




TrbF




220




U43675




80




90




prob. conjugal transfer protein






y4dE




trbG




+1




065224-066036




270




6-270




TrbG




284




U43675




74




84




prob. conjugal transfer protein precursor






y4dF




trbH




+1




066040-066486




148




1-147




TrbH




159




U43675




55




68




prob. conjugal transfer protein precursor

















(with lipid

















anchor)






y4dG




trbI




+3




066498-067793




431




1-430




TrbI




433




U43675




66




79




prob. conjugal transfer protein

















(integral membrane prot.)






y4dH




traR




+2




068096-068806




236




7-236




TraR




234




Z15003




28




45




prob. transcriptional activator of conjugal transfer genes (LuxR family)






y4dI




traM




−1




068810-069133




107




8-101




TraM




102




U43674




30




51




prob. modulator of TraR/autoinducer-

















mediated activation of tra genes






y4dJ





+3




069351-069584




77




1-67




ORF




84




X16458




37




59




hyp. transcriptional regulator (PbsX family); low

















similarity to N-terminus of Y4dL






y4dK





−1




069629-069949




106










hyp. 11.8 kd protein






fd1





−2




069936-070250




(105)




(2-85)




ORFA




400




X67861




39




58




put. transposase fragment






y4dL





+1




070603-071193




196


































hyp. 21.8 kd protein; low

















similarity to Y4dJ






y4dM





+2




071186-072415




409




1-357




HipA




440




M61242




31




46




hyp. 45.3 kd protein; homolog affects frequency











3-405




Y4mE




420




this work




34




56




of persistence after inhibition

















of cell wall or DNA synthesis






y4dN





+1




072787-072975




62


































hyp. 7 kd protein






y4dO





−1




073550-073951




133




12-121




ORF




38.1




D83536




43




57




hyp. 14.9 kd (fragmentous?)

















protein; homology to intron

















protein of


P. anserina



















continues in fr.-2 (73541-73467)






y4dP





−1




074423-075025




200




1-48




ORFR2




57




U43674




72




89




hyp. 21 kd protein; hom. to











56-198




ORFR3




154





47




71




conjugal transfer region 1






y4dQ




traB




−2




075042-076205




387




1-387




TraB




421




U40389




61




72




prob. conjugal transfer protein






y4dR




traF




−3




076195-076761




188




20-188




TraF




176




U40389




55




73




prob. conjugal transfer protein






y4dS




traA




−2




076758-080066




1102




1-1102




TraA




1100




U43674




67




79




prob. conjugal transfer

















protein (relaxase)






y4dT




traC




+3




080319-080627




102




1-102




TraC




98




U40389




64




80




prob. conjugal transfer protein






y4dU




traD




+1




080632-080847




71




1-71




TraD




71




U43674




77




84




prob. conjugal transfer protein






y4dV




traG




+2




080834-082756




640




1-631




TraG




658




U40389




71




83




prob. conjugal transfer protein






fd2





+




083002-083293






ORFL1




152




U43674






fragments hom. to ORFL 1

















(conjugal transfer region 1);

















frameshifts: 83072 (1 > 3), 83161 (3 > 2)






y4dW





+1




083305-083919




204










hypothetical 22.9 kd protein






y4dX





+1




083944-084522




192










hypothetical 20.6 kd protein






y4eA





−2




084570-084836




88










hypothetical 9.9 kd protein






y4eB





−2




084976-085290




104










hypothetical 11.6 kd protein






fe1










085829-088007






MerA




474




X65467






put. fragments; homology to

















mercuric reductase, put.

















frameshifts: 86592 (−1<−3),

















87288 (−3<−2)






y4eC





−2




088305-089228




307




14-306




TraC-1




1061




X59793




38




55




hyp. 34.2 kd protein; hom. to 5′ end

















of traC-1 from plasmid RP4






y4eD





+1




091051-092178




375




51-136




ORF145




145




X52594




29




55




put. phosphodiesterase; low

















homology to glycerophosphoryl-

















diester-phosphodiesterase






y4eE





+1




092212-093288




358










hyp. 38.5 kd protein






fe2










093572-093969






TrpA





U14952






(fragments of put. transposase; put.

















frameshift; 93798 (2>3)






y4eF





−1




093980-094735




251




2-236




Int




259




U14952




37




53




put. integrase/recombinase











1-251




Y4qK




308




this work




92




94




(“phage-type”);

















similar to Y4rF (35% aa-id.);

















low similarity to Y4rABCDE






fe5





−1




094988-095188




66




1-66




Fq6




66




this work




79




94




put. defective integrase/recombinase











1-66




Y4rC




332




this work




41




55






fe3










095343-096025






Int




259




U14952






fragments hom. to integrase; put. frameshift:

















95559-95671 (−2<−1)






y4eH




nolL




−2




096093-097193




366




11-359




NolL




373




U22899




63




77




nodulation protein; hyp. acetyl transferase






y4eI





−2




097914-098225




103




!









hyp. 11.1 kd protein with

















transmembrane domain






fe6





+3




098358-098657




99




3-98




AatB




410




L12149




40




55




hyp. 10.3 kd protein fragment,

















hom. to C-terminal part

















of bacterial aminotransferases






y4eK





+2




098675-099421




248




10-245




Adh




252




U00084




37




53




hyp. short chain type

















dehydrogenase/reductase






y4eL





+3




099447-100193




248




1-244




Gno




256




X80019




31




47




hyp. short chain type

















dehydrogenase/reductase






fe4





+




100270-101901






IlvG





M37337






put. fragment; put. frameshifts:

















100721 (1 < 2), 101728 (2 > 1)






fe7





−1




101585-102298




237




1-103




Tnp




398




U08627




91




95




put. truncated transposase-like

















protein; similar to Y4pO






y4eN





−3




102625-102936




103










hyp. 11.5 kd protein






y4eO





−2




102933-103598




221










hyp. 24.5 kd protein






y4fA





−1




103805-106342




845




327-837




MepA




657




X66502




41




59




prob. methyl-accepting











7-845




Y4sI




756




this work




29




49




chemotaxis protein






y4fB





+3




106620-108614




664










hyp. 73.7 kd protein






y4fC





+3




109884-110618




244




10-163




DszA




453




L37363




38




52




hyp. (fragmentous?) monooxygenase;

















extended homology to DszA

















in fr.2: 110372 to 110506.






y4fD





−1




110516-111178




220










hyp. 24.6 kd integral membrane protein






y4fE





−2




111195-111677




160










hyp. 17.2 kd protein precursor






y4fF





−1




111803-112348




181










hyp. 19.5 kd protein






y4fG





−2




112338-112727




129










hyp. 14.5 kd protein






y4fH





−1




113474-113782




102










hyp. 11.6 kd protein






ff1





−3




113779-114114




111




61-97




DppF




330




L08399




56




86




hyp. protein fragment, similar to

















central region of oligo/di-peptide

















ABC transporter ATP-binding proteins






y4fJ





−2




114348-115379




343




3-210




RopA




318




M69214




53




66




put. outer membrane protein

















(point) precursor






y4fK





−2




116112-117395




427




275-421




XylS2




157




L02642




31




53




put. transcriptional regulator

















(AraC family)






y4fL





−3




117385-118212




275




9-243




ORF




268




U39059




32




46




hyp. 29.1 kd integral membrane

















protein, belongs to the

















inositol monophosphatase family






y4fM





−2




118209-119144




311










hyp. 35.5 kd protein






y4fN





−2




119145-120854




569




11-513




CysU




550




U32807




23




45




prob. ABC transporter permease

















protein; put. part of binding-

















-protein-dependent transport system Y4fNOP






y4fO





−1




120851-121870




339




12-247




PotA




381




U32759




49




68




prob. ABC transporter ATP-binding protein






y4fP





−1




121883-122959




358




32-293




SufA




338




M33815




23




42




prob. ABC transporter periplasmic

















binding protein precursor






y4fQ





+1




123016-124194




392




9-234




NagC




406




X14135




25




46




hyp. 41.6 kd protein; belongs to

















“ROK” family (transcriptional

















regulator or transferase)






y4fR





+1




124813-126453




546




88-539




IpaH




532




M32063




38




54




hyp. 60.5 kd protein, hom. to

















invasion plasmid antigen H






y4gA





−1




126806-127369




187










hyp. 20.9 kd protein; low similarity to Y4rE






y4gB





−2




127485-127904




139










hyp. 16.1 kd protein






y4gC





−1




127901-128479




192




1-178




ORF2




415




L34580




43




58




put. integrase/recombinase

















(“phage-type”)






y4gD





−1




128579-128857




92










hyp. 10.5 kd protein






y4gE





+2




131021-131767




248










hyp. (fragmentous?) 27.7 kd protein;

















put. frameshifts: 131532 (2>1),

















131892 (1>2)






y4gF





+2




132734-133786




350




4-345




RhsB




353




U51197




65




74




prob. dTDP-D-glucose-4,6-

















dehydratase (Y4gFGH inv. in

















dTDP-L-rhamnose biosynthesis)






y4gG





+2




133790-134680




296




1-290




RhsD




288




U51197




48




66




prob. dTDP-4-dehydrorhamnose reductase






y4gH





+1




134677-135537




286




2-285




RfbA




293




U09876




65




82




prob. glucose-1-phosphate

















thymidylyltransferase






y4gI





+3




135534-138263




909




276-894




RfbC




1275




U36795




38




55




hyp. 102.8 kd protein (homolog is

















involved in O-antigen biosynthesis)






y4gJ





−1




138737-139315




192










hyp. 21.1 kd protein






y4gK




fixF




+3




142026-143234




402




114-184




KpsS




389




X74567




26




54




necessary for functional nitrogen











203-362







30




53




fixation, hom. to capsule

















polysaccharide export protein






y4gL





−3




143473-144060




195




24-192




RhsC




188




U51197




53




65




prob. dTDP-4-dehydrorhamnose-3,5-epimerase (inv. in dTDP-L-rhamnose biosynthesis)






y4gM





−2




144147-145907




586




26-581




MsbA




582




Z11796




32




56




prob. ABC transporter ATP-binding protein






y4gN





+2




146075-147226




383




52-297




VirA




304




L08012




29




46




hyp. 45 kd protein






y4hA





−1




147455-148558




367




7-362




ChaA




366




L28709




34




58




put. ionic transporter






y4hB




noeE




−3




148819-150078




419




3-138




F42G9.8




359




U00051




32




49




nodulation protein (put.











197-289







25




50




sulfate transferase)






y4hC




noeG




−3




151051-151782




243




18-229




u0002kb




243




U00024




27




42




nodulation protein

















(unknown function)






y4hD




nolO




−1




151979-154021




680




1-126




NolN




127




L22756




70




83




inv. in O-carbamoylation of











140-496




NolO




358





78




89




Nod factors (sim. to NodU)






y4hE




nodJ




−3




154120-154908




262




5-261




NodJ




262




J03685




69




84




prob. ABC transporter

















permease (see nodI)






y4hF




nodI




−3




154912-155943




343




15-343




NodI




339




X55795




69




85




prob. ABC transporter ATP-binding

















transport protein; put role;

















together with NodJ export of modified

















beta-1,4 N-glucosamine oligosaccharides






y4hG




nodC




−1




156095-157336




413




1-413




NodC




413




X73362




99




100




N-acetylglucosaminyltransferase






y4hH




nodB




−3




157351-157998




215




1-215




NodB




214




X73362




99




99




chitooligosaccharide deacetylase






y4hI




nodA




−2




157995-158585




196




1-196




NodA




196




X73362




100




100




N-acyltransferase; nodABC involved

















in synthesis of backbone of

















modified N-acylated glucosamine

















oligosaccharides






y4hJ





−1




158993-159775




260




59-240




ORF2




251




L133618




68




81




hom. to part of coproporphyrinogenIII

















oxidase (lacks C-terminus and

















conserved N-term. domain)






y4hK





+3




160722-161465




247










hyp. 25.4 kd internal membrane protein






y4hL





+1




161569-161826




85










hyp. 9.6 kd protein






y4hM





+1




163042-164253




403




53-169




Gfor




439




M97379




31




54




hyp. 43.9 kd protein (partially hom. to

















glucose-fructose oxidoreductase)






y4hN





+2




164600-165034




144




10-144




ORFA




135




X84099




38




53




hyp. 16 kd protein; partially

















hom. to Y4jB and Y4rG






y4hO





+1




165037-165384




115




1-115




ORF140




140




X74068




100




100




hyp. 12.8 kd protein











1-115!




ORFC




144




X84099




54




69











1-115




Y4jC




117




this work




36




62






y4hP





+1




165430-167088




552




1-215




no1265




266




X74068




97




97




hyp. 61.7 kd protein; similar to











80-328




ORF2




258




M10204




67




79




Y4aQ, Y4jD and Y4qI











162-492




ORF3




163




M10204




47




61






y4hQ





+3




167091-167675




194




5-185




ORF3




237




X51418




35




53




hyp. 21.7 kd protein











1-52




ORF91




>91




X74068




96




98






y4hR





−3




167710-167934




74










hyp. 8.8 kd protein






fi1





−1




168208-168300











hyp. transposase fragment similar to



















R. meliloti ISRm2011-2






fi2





+1




168430-168792




120




1-130




Y4iO




252




this work




78




87




put. defective transposase (homologous











1-108




Y4rJ




396




this work




74




87




to N-terminal

















parts of Y4iO and Y4rJ)






fi3





+2




168798-169190




130




1-109




ORF1A




317




M33159




37




55




put. defective transposase (hom. to C-











1-130




Y4iO




252




this work




78




87




terminal parts of Y4iO and Y4rJ);











1-130




Y4rJ




396




this work




76




84




additionally weak homology to

















Y4pF/Y4sB and Y4qE (<30% identity)






y4iR





−3




169231-169716




161




15-145




PsiB




134




L26581




55




74




hyp. protein (homolog located in a poly-

















saccharide biosynthesis inhibition operon






y4iC





−2




169929-170621




230




58-123




ORF




161




Z73419




41




54




hyp. 25.8 kd protein

















(ORF=MTCY373.06)






y4iD





−3




170563-172551




662




137-342




ORF




495




Z73101




40




59




prob. monooxygenase (ORF=MTCY31.20)











418-605







28




51






y4iE





+3




173295-173702




135




1-135




Y4rL




155




this work




33




52




hyp. 15.4 kd (fragmentous?) protein;

















similar to Y4zA






y4iF





−3




174211-175128




305










hyp. 34.1 kd protein






y4iG





−2




175590-175862




90




1-73




Y4aT




266




this work




93




97




hyp. 10.5 kd (fragmentous?) protein











1-73




Y4bF




457




this work




60




76






y4iH





+2




176045-176764




239




1−236




Y4jT




336




this work




32




53




hyp. 26 kd protein precursor






y4iI





−2




176937-179048




703










hyp. 76.2 kd integral membrane protein






y4iJ





−2




179097-180887




596










hyp. 65.5 kd protein;

















low similarity to Y4iM






y4iK





−3




180940-181638




232










hyp. 26.8 kd protein; y4iKL:

















two fragments of one gene?;

















put. frameshift: 181884 (−3<−2)






y4iL





−2




181692-182990




432










hyp. 47.8 kd protein; y4iKL two fragments

















of one gene?; put.

















frameshift: 181884 (−3<−2)






y4iM





−2




183036-184334




432










hyp. 47.1 kd protein; low similarity to Y4iJ; y4iMN

















two fragments of one gene?;

















put. frameshift: 184440 (−2<−3)






y4iN





−3




184309-184935




208










hyp. 22.1 kd protein precursor; y4iMN two fragments

















of one gene?; put. frameshift:

















184440 (−2<−3)






y4iO





−2




185679-186437




252




17−243




Tnp




334




Z48244




29




46




put. transposase or transposase-











1-121




Fi2




120




this work




67




79




fragment; additionally











123-252




Fi3




130




this work




78




87




weak homology to Y4pF/











1-252




Y4rJ




396




this work




71




83




/Y4sB and Y4qE (<30% identity)






y4iP





−1




186437-186832




131




4-163




Y4rJ




396




this work




58




80




hyp. 14.4 kd protein or fragment hom.

















to N-term. of Y4rJ






y4iQ





−3




187162-188058




298




13−253




IstB




265




U38187




34




56




identical to Y4nD/Y4sD; put. inser-











8−283




Y4bM




263




this work




31




56




tion sequence ATP-binding protein;











5−265




Y4uH




248




this work




31




52




similarity to Y4bM/Y4kI/Y4tA,

















Y4uH and weakly to Y4pL






y4jA





−2




188055-189569




504




147-494




IstA




507




U38187




25




42




identical to y4nE/y4sE; hyp. 57.2 kd











395-504




Fz4




110




this work




72




85




protein with low similarity to

















IS21/IS408/IS1162 transposases






y4jB





+3




190248-190706




152




24-79




ORF1




130




U19148




46




69




hyp. 16.7 kd protein; partial similarity to

















Y4hN; low similarity to Y4rG






y4jC





+2




190703-191056




117




1-115




ORFC




144




X84099




39




58




hyp. 13.1 kd protein; see y4hO











1-117




Y4hO




115




this work




36




62






y4jD





+2




191105-192640




511




89-298




ORF2




258




M10204




36




53




hyp. 56.7 kd protein; see y4hP











340-453




ORF3




163




M10204




28




49











18-183




no1265




266




X74068




32




48






y4jE





+1




192637-193458




273










hypothetical (fragmentous?) 29.4 kd integral

















membrane protein; put. frameshift: 192996

















(1>2; end of shifted ORF at 193183)






y4jF





−1




194771-196330




519










hyp. 55.4 kd integral membrane protein






y4jG





−3




196333-196821




162










hyp. 17.9 kd transmembrane protein






y4jH





−2




196818-197435




205










hyp. 23 kd protein






y4jI





−3




197428-197820




130










hyp. 13.6 kd protein






y4jJ





+1




198043-198300




85




1-85




StbC




103




L48985




67




76




put. plasmid stability protein






y4jK





+3




198297-198719




140




1-138




StbB




139




L48985




57




76




put. plasmid stability protein






y4jL





+3




199002-199664




220










hyp. 25.1 kd protein






y4jM





−2




199746-199958




70




1-58




Y4bF




457




this work




75




79




hyp. 8 kd protein or protein fragment











15-58




fb1




188




this work




50




64






y4jN





−3




199975-200415




146










hyp. 16.3 kd protein






y4jO





−3




201514-202479




321










hyp. 36.1 kd protein; y4jOP: two

















fragments of one gene?,

















put. frameshift: 202550 (−3<−1)






y4jP





−1




202406-203194




262










hyp. 29.5 kd protein; y4jOP:

















two fragments of one gene?,

















put. frameshift: 202550 (−3<−1)






y4jQ





+2




203729-206848




1039










hyp. 115.9 kd protein






y4jR





+1




206860-207315




151










hyp. 17.3 kd protein






y4jS





+1




207316−208557




413










hyp. 44.8 kd protein






y4jT





−1




208877-209887




336




17−283




Y4iH




239




this work




32




53




hyp. 36.4 kd protein precursor






y4kA





−3




209917-210885




322










hyp. 36.7 kd protein






y4kB





+1




211663-212088




141










hyp. 15.2 kd integral membrane protein






fk2





−1




212111-212479




122




58-116




ORF14




104




X00493




59




76




hyp. fragment; sim. to Y4hP, Y4jD

















and Y4qI; additional homology to

















ORF14 in fr. +3/+2: 212331-212509






y4kD





−1




212750-214399




549










hyp. 60.4 kd protein






y4kE





−1




214412-215455




347










hyp. 38 kd protein; y4kEF: two fragments

















of one gene?, put.

















frameshift: 215616 (−1<−2)






y4kF





−2




215439-216743




434










hyp. 47.4 kd protein; y4kEF: two

















fragments of one gene?,

















put. frameshift: 215616 (−1<−2)






y4kG





−2




216855-217064




69










hyp. 7.7 kd protein






y4kH





−3




217105-217488




127










hyp. 14.1 kd protein






y4kI





−1




217670-218461




263


































see y4bM






y4kJ





−3




218458-220008




516


































see y4bL






y4kK





−1




220103-221041




312










hyp. 34.9 kd protein






y4kL





−2




221049-222041




330




101-296




ORF300




300




U23723




39




56




hyp. 37.6 kd AAA-family ATPase protein






y4kM





+2




222641−222994




117










hyp. 13.1 kd protein






y4kN





+2




223115-223537




140










hyp. 15.7 kd protein






y4kO





+2




223970-224218




82










hyp. 9.2 kd protein






y4kP





+1




224215-224505




96










hyp. 11 kd protein






y4kQ





−2




224898-225326




142










hyp. (fragmentous?) 15.3 kd protein;

















homology to hipO fragments

















on the complementary strand






fk1





+3




225094-225473








Z36940






fragments hom. to hipO






y4kR





−3




225535-225666




43




1-36




ORF6




347




M87280




55




66




hyp. 4.8 kd (fragmentous?) protein

















(smallest ORF predicted to be a protein);

















hom. to N-term. of protein in

















crtE-crtX intergenic region






y4kS





−3




225751-226656




301




1-301




ORF8




300




U12678




93




94




hyp. 33.2 kd protein






y4kT





−2




226653-228203




516




1-516




ORF7




516




U12678




93




94




hyp. 55.1 kd protein






y4kU





−3




228514-229512




332




1-332




ORF6




332




U12678




90




94




prob. geranyltranstransferase






y4kV





−3




229666-231009




447




92-447




CYP117




356




U12678




89




94




cytochrome P450 BJ-4 homolog






y4lA





−2




231009-231845




278




1-274




ORF4




275




U12678




83




87




short-chain type dehydrogenase/reductase






y4lB





−3




231832-232140




102




1-58




ORF3




94




U12678




93




98




put. P450-system 3Fe-3S ferredoxin






y4lC





−2




232170-233573




467




48-428




CYP114




382




U12678




90




93




cytochrome P-450 BJ-3 homolog






y4lD





−1




233666-234868




400




3-400




CYP112




401




U12678




92




95




cytochrome P-450 BJ-1 homolog






fl3





−2




235704-235904




66




2-54




ORF8




>207




X66124




60




71




hyp. 7.6 kd protein fragment, homology to ORF8

















fragments also upstream of fl3 up to 236048






fl1










236796-237416








Z36981






homology to hupK/hupJ fragments

















(fr. −3/−2)






y4lF





+1




237508-238479




323










hyp. 36.1 kd protein






y4lG





+2




238490-238975




161










hyp. 17.4 kd protein






y4lH





−2




238959-239537




192




3-184




Fic




200




M28363




34




51




hyp. 22.4 kd protein; hom. to cell

















filamentation/division protein






y4lI





−2




239541-239750




69










hyp. 7.3 kd protein






y4lJ





−3




240358−240861




167










hyp. 18.1 kd protein






fl2










240920-241040








X65471






fragments of transposase (ISRm4)






y4lK





+1




241207-241605




132










hyp. 14.3 kd protein






y4lL





−2




241845-244328




827




118-816




SLR0359




1244




D63999




33




50




hyp. 91.8 kd protein (member of


E. coli



















YegE/YhdA/YhjK/YicC family)






fl4





+1




244540-244851




103




19-103




TnpA




990




L14931




39




51




put. truncated transposase; hom. to











28-81




Fl5




112




this work




94




98




N-term. of TnpA (transposon

















Tn163); strong similarity to

















C-terminus of Fl5






y4lN





+3




244848-245330




160










hyp. 18.1 kd protein






y4lO





−3




247156-247938




260




11-216




AvrRxv




373




L20423




36




50




hyp. 29.1 kd protein; hom. to avirulence

















protein; put. frameshift according

















to homolog: 247230-247293

















(−2<−3):

















end of shifted frame: 246960






fl5





+1




248290-248628




112




59-112




Fl4




103




this work




94




98




hyp. protein fragment; strong

















similarity to part of Fl4






fl6





+3




248814-249680




288




8-286




Tnp




988




M97297




27




49




put. fragmentous transposase; homologous to

















C-term. of transposase (Tn1546)






y4lR





+3




249696-251264




522










hyp. 56.8 kd protein






y4lS





+1




251407-251958




183




3-176




PaeR7IN




195




S78872




42




56




put. integrase/recombinase











4-181




Y4cG




305




this work




40




60




(“resolvase-type”)






y4mA





+3




251955-252380




141










hyp. 15.8 kd protein






fm1










254694-254920











fragments hom. to xylitol-dehydrogenase






y4mB





+3




255450-256139




229




59−229




ORF4




212




X13583




33




53




hyp. 24.6 kd outer membrane

















protein precursor






y4mC





+2




256811-257524




237










hyp. 26.2 kd protein precursor






y4mD





−1




258065-258334




89










hyp. 10 kd protein






y4mE





−3




259030-260292




420




6-334




HipA




440




M61242




32




46




hyp. 45.7 kd protein











2-417




Y4dM




409




this work




34




56






y4mF





−2




260289-260519




76




11-47




ORF3




90




X06090




37




70




hyp. transcriptional regulator; very

















low similarity to

















phage repressor proteins






y4mG





+3




261174−261395




73










hyp. 7.8 kd protein






y4mH





−2




261747−262640




297










hyp. 33.9 kd protein






y4mI





−2




262698-263672




324




11-252




RbsB




296




M13169




25




49




prob. ABC transporter periplasmic

















binding protein precursor (transport

















system Y4mIJK probably transports

















a sugar)






y4mJ





−3




263716-264717




333




12-323




RbsC




321




M13169




34




55




prob. ABC transporter permease






y4mK





−2




264714-266207




497




8-489




RbsA




501




M13169




34




55




prob. ABC transporter ATP-binding protein






y4mL





−3




266218-267477




419




1-418




HI1029




425




U00079




33




58




put. permease

















(E. coli YiaN/YgiK family)






y4mM





−2




267474-269099




541




38-360




HI1028




328




U32729




33




54




put. permease (SBR family 7)






y4mN





−1




269096-270133




345




37−340




Tkt




655




U09256




36




54




hyp. transketolase family protein (fragmentous?);

















hom. to C-term. of transketolases






y4mO





−3




270130-270969




279




9−270




Tkt




655




U09256




36




52




hyp. transketolase family protein (fragmentous?);

















hom. to N-term. of transketolases






y4mP





−3




271000-271761




253




4-249




F09E10.3




255




U41749




41




60




put. short-chain type dehydrogenase/reductase






y4mQ





+1




271909-272805




298




1-289




PerR




297




U57080




48




65




hyp. transcriptional regulator

















(LysR family)






y4nA





−2




273204-275384




726




45-302




ORF




690




D14005




21




36




prob. peptidase; very low similarity to











365-718







38




54




Y4qF and Y4sO (<25% identity)






y4nB




nodU




−3




276451-278127




558




1-558




NodU




558




X89965




100




100




inv. in 6-O-carbamoylation of Nod

















factors; similar to Y4hD






y4nC




nodS




−1




278144-278794




216




1-216




NodS




216




J03686




100




100




methyltransferase inv. in Nod-factor synthesis






y4nD





−3




280453-281349




298


































see Y4iQ






y4nE





−2




281346-282860




504


































see Y4jA






fn1





+




283238-283467







241




M26938






hom. to virG fragments; similar to fq3






y4nF





+3




283809-284501




230










hyp. 25.4 kd protein precursor; low

















similarity to Y4aO (<30% id.)






fn2










284752-284923








X79443






fragments hom. to ORF2 (IS-ATP-binding

















protein) from IS1162






y4nG





+2




285407-286597




396




53-365




ORF4




333




U08223




31




47




put. NAD-dep. nucleotide sugar

















epimerase/dehydrogenase






y4nH





+1




286594-286947




117




5-113




MvrC




110




M62732




30




47




hyp. 12.3 kd integral membrane

















protein (some similarity

















to ethidium bromide resistance proteins)






y4nI





+2




286964-287326




120










hyp. 13 kd transmembrane protein






y4nJ





+1




287335-288852




505




80−266




BetA




548




U39940




29




44




hyp. GMC-type oxidoreductase











343-468







32




45






y4nK





−2




288906-290894




662










hyp. integral membrane protein






y4nL





−3




290914-291984




356




14-345




ORF6




328




U47057




26




45




put. NAD dep. nucleotide sugar epimerase/dehydrogenase






y4nM





−3




292003-293553




516




226-514




NoeC




307




L18897




30




52




put. permease






y4oA





−3




294502-296283




593




328-494




MccB




350




X57583




29




41




hyp. 65.2 kd protein; homolog inv. in production of the











4-590




Y4qC




583




this work




30




50




translation inhibitor microcin C7






y4oB





+1




296572-296961




129










hyp. 14.7 kd protein






y4oC





+1




296965-297657




230










hyp. 26 kd protein






y4oD





−1




297746-298390




214










hyp. 23.5 kd protein






y4oE





−3




298939-299148




69










hyp. 7.4 kd protein






fo1





−2




299145-299588




147










fo1 and fo2: two fragments of one put. gene; put.

















frameshift: 299664 (−2<−3)






fo2





−3




299578-299955




125




25-109




ORF11




344




X53264




37




63




homology to 5′ part of ORF11;











1-123




Y4cM




325




this work




25




51




fo1 and fo2: two fragments of one putative gene; put.

















frameshift: 299664 (−2<−3)






fo3





+3




300015-300815




267




15-252




Tnp




518




L09108




40




59




fo3 and fo7: transposase-like protein interrupted by

















NGRIS-6






fo4





−2




300828-301259




143




1-143




Y4bA




694




this work




77




83




hyp. fragment; fo4/5/6: fragments of one gene similar to

















Y4bA/Y4pH






fo5





−1




301274-301684




136




1-127




Y4bA




694




this work




83




94




hyp. fragment; fo4/5/6: fragments of one gene






fo6





−2




301608-302900




430




1-393




Y4bA




694




this work




89




95




hyp. fragment; fo4/5/6: fragments of one gene






y4oL





−3




302890-303156




88




1-88




Y4bB




98




this work




63




69




hyp. 9.6 kd protein






y4oM





−1




303179-303628




149




1-149




Y4bC




149




this work




79




88




hyp. 16.8 kd protein






y4oN





−2




303810-304022




70




1-70




Y4bD




89




this work




73




84




hyp. 8.1 kd protein






fo7





+2




304118-304453




111




4-103




Tnp




518




L09108




40




59




fo3 and fo7: transposase-like protein interrupted by

















NGRIS-6






y4oP





+1




304861-306156




431




47-429




u1756v




469




U15180




27




42




prob. ABC transporter binding protein (Y4OPQRS: sugar-

















like transport system)






y4oQ





+2




306236-307165




309




31−301




MalF




310




U15180




35




56




prob. ABC transporter permease protein






y4oR





+2




307178-308011




277




12−277




MalG




296




U15180




30




52




prob. ABC transporter permease protein






y4oS





+1




308008-309123




371




7−369




UgpC




369




U00039




50




68




prob. ABC transporter ATP-binding protein






y4oT





−2




309132-309722




196




2-196




Y4pA




609




this work




28




50




hyp. 20.6 kd protein; homologous to N-terminus of

















Y4pA, and weakly to Y4oV






y4oU





+1




309853−311061




402










hyp. 43.1 kd protein precursor






y4oV





+2




311051-311908




285




3-280




Y4pA




609




this work




32




56




hyp. 30.2 kd protein; homologous to N-terminus of

















Y4pA, and weakly to Y4oT






y4oW





+1




311911-312561




216










hyp. 23.7 kd protein






y4oX





+3




312606-313688




360




36-233




MocA




317




X78503




29




44




prob. NAD-dep. oxidoreductase






y4pA





+1




313714-315543




609




310-596




HydG




441




U00006




33




50




put. transcriptional regulator (sigma54-dep.)











6-290




Y4oV




285




this work




32




56











35-237




Y4oT




196




this work




28




50






y4pB




otsB




+3




316350-317147




265




30-260




OtsB




266




X69160




41




57




prob. trehalose-phosphate phosphatase






y4pC




otsA




+1




317185-318579




464




1-456




OtsA




474




X69160




46




66




prob. trehalose-6-phosphate synthase; similar to fq1/2






fp1





+




318915-319242








U08864






fragments homologous to ORF3; put. frameshift

















acc. to homologue: 319122 (3>1)






fp2





+




319236-319670








U08864






fragment homologous to ORF1 from IS1248 (fr. 3);

















similar to fs4






y4pD





−1




319601-320116




171




13-140




Ros




142




M65201




50




71




put. transcriptional regulator (MucR family); missing Zn

















finger motif; similar to Y4aP






y4pE





−1




320606-321013




135




1-135





222




U18764




91




94




identical to y4sA; hyp. 15.5 kd protein hom. to N-term.

















of RFRS9 25 kDa protein






y4pF





−2




321297-322460




387




50−374




Tnp




334




Z48244




43




60




identical to y4sB; put. transposase; low similarity to

















Y4qE, Y4iB and Y4jO (<30% aa-id.)






y4pG





−3




322486-323064




192




1-191!




ORFA




197




U22323




47




64




identical to y4sC; hyp. 21.1 kd protein






fp3





+2




323189-323956








X79443






“ORF” homologous to ORF1 of IS1162 interrupted by

















stop codon (323444)






y4pH





−1




323969-326053




694


































see y4bA






y4pI





−2




326043-326309




88


































see y4bB






y4pJ





−3




326329-326778




149


































see y4bC






y4pK





−1




326969-327238




89


































see y4bD






fp4





+1




327277-328059








L09108




48




65




fragment homologous to put. IS-ATP-binding protein






y4pL





+3




328071-328808




245




1-204




ORF2




231




X79443




51




63




put. insertion sequence ATP-binding protein; similarity to











1-242




Y4bM




263




this work




55




73




Y4bM/Y4kI/Y4tA, Y4uH, and weakly to











1-245




Y4uH




248




this work




61




77




Y4iQ/Y4nD/Y4sD (<30% aa-id.)






y4pM





+2




329159-329977




272










hyp. 30.9 kd protein






fp5










330657-331414











put. frameshift: 331032 (2<1)






y4pN




syrM1




−3




332506-333522




338




13-324




SyrM




326




M33495




63




77




probable symbiotic regulator (LysR family)











1-338




SyrM2




339




this work




62




79






y4pO





+1




335062-336264




400




1-400




Tnp




400




M60971




96




98




prob. transposase (Mutator family); similarity to fe7






fq2





−2




333987-335003




338




1-320




OtsA




474




X69160




44




61




join fq1 + fq2: hom. to trehalose-6-phosphate synthase

















interrupted by ISRm3-like element NGRIS-8; similarity

















to Y4pC (45% aa-id.)






fq1





−1




336311-336694




128




44-174




OtsA




474




X69160




48




67




see fq2






fq3





+




337338-338056








M26938






virG homologous fragments: stop at 337380; put.

















frameshift at 337844 (3>2); similar to fn1






y4qB





−1




339053-339547




164










hyp. 18.8 kd protein






y4qC





−3




339535-341286




583




314-489




ORF




401




Z54354




28




46




hyp. 63.6 kd protein











1-583




Y4oA




593




this work




30




50






y4qD





−3




343216-343950




244




1-244




Y4rO




618




this work




55




74




hyp. 26.8 kd protein, similar to N-terminus of Y4rO






y4qE





+2




344114-345286




390




37-380




Tnp




364




X77623




38




57




prob. transposase; low similarity to Y4pF/Y4sB, Y4iB,

















Y4iO and Y4rJ (<30% aa-id.)






fq4





+3




345798−346130








M38257




34




51




fragments homologous to XerC (integrase)






y4qF





−2




346215-348479




754




41-725




PtrII




707




D10976




31




49




prob. peptidase (S9A family); high similarity to Y4sO;











32-736




Y4sO




705




this work




70




84




low similarity to Y4nA (<25% id.)






y4qG





−2




348501-349847




448




40-389




YgjG




454




U32722




42




62




prob. aminotransferase (class 3)






y4qH





−1




350294-351274




326




144-326




LasR




239




M59425




37




51




hyp. transcriptional regulator (LuxR family)






y4qI





−2




351837-353456




539




146-419




ORF1




322




M25805




44




63




hyp. 59.7 kd protein; similar to Y4aQ, Y4hP, Y4jD






fq5





−3




353533−353775











fragments fq5 and fr3 represent one put. gene similar to

















Y4hO and Y4jC interrupted by IS elements






y4qJ





−1




354140-355336




398




7-395




TnpA




388




U14952




42




60




put. transposase






y4qK





−2




355344-356270




308




51-293




Int




259




U14952




39




55




put. integrase/recombinase (“phage-type”); similar to











51-308




Y4eF




251




this work




92




94




Y4rF; low similarity to Y4rABCDE






fq6





−2




356436-356636




66




1-66




Fe5




66




this work




79




94




put. defective integrase/recombinase (“phage-type”); 75%












Y4rC




332




this work




45




62




nt-identity: 356436-356710 and 94988-95262 R[20]






y4rA





+1




356803-358032




409




17-397




ORF2




415




L34580




39




55




put. integrase/recombinase (“phage-type”)






y4rB





+3




358029-358973




314




135-267




TnpI




284




X07651




30




51




put. integrase/recombinase (“phage-type”)






y4rC





+2




358970-359968




332




22-294




XerC




295




U32696




31




50




put. integrase/recombinase (“phage-type”)











267-332




Fe5




66




this work




41




55











267-332




Fq6




66




this work




45




62






y4rD





−3




360025-360870




281




15-277




XprB




298




M54884




25




46




put. integrase/recombinase (“phage-type”)






y4rE





−2




360867-361799




310




50-288




YqkM




296




D84432




27




48




put. integrase/recombinase (“phage-type”); low similarity

















to Y4gA






y4rF





−1




361796-363073




425




126-414




ORF2




415




L34580




34




49




put. integrase/recombinase (“phage-type”)






y4rG





−1




363287-363694




135




16-109




ORF1




130




U19148




32




48




hyp. 14.8 kd protein (IS866 family); low similarity to

















Y4jB, Y4hN






y4rH





−3




363895-365331




478




62-374




Bcp




598




X63470




26




44




put. ligase; hom. to biotin carboxylases






fr1





−3




366307-366669











85% aa-identity to part of Y4rL






fr2










366594-367402











put. frameshift: 367296 (−2<−1)






fr3





−3




367705-367827











hom. to N-term. of Y4hO; see fq5






y4rI





−3




368503-369675




390










hyp. 44 kd protein






y4rJ





+1




369697-370887




396




152−379




Tnp




339




M80806




28




45




put. transposase; low similarity to Y4qE (<30% aa-id.)











135−244




Y4iA




120




this work




74




87











266−396




Y4iB




130




this work




76




84











135−396




Y4iO




252




this work




71




83











2-131




Y4iP




131




this work




58




80






y4rK





−1




370976-371350




124










hyp. 14.5 kd protein






y4rL





−2




371454-371921




155




1-99




Y4zA




295




this work




99




99




hyp. 17.7 kd protein; y4rLM: two fragments of one











17-155




Y4iE




135




this work




33




52




gene?; put. frameshift: 371972 (−2<−3); 85-99% aa-

















identity to parts of Y4zA and fr1






y4rM





−3




371938-372990




350




258-339




Y4zA




295




this work




98




98




hyp. 39.4 kd protein; see y4rL






y4rN





−2




373578-374795




405




35-368




P43




416




X57470




26




44




hyp. 41.6 kd integral membrane protein






y4rO





+1




375313-377169




618




274-596




HIN0578




366




U32742




25




45




hyp. 69.3 kd protein; N-terminus: hom. to Y4qD; C-











1-244!




Y4qD




244




this work




55




74




terminus: hom. to C-terminus of histidinol-1-phosphate

















transaminase






fr4





+




377185-377534








X66016






sim. to Y4rG; put. frameshift: 377376 (1>3); hom. to

















fragment of ORFA3 (377409-377540)






y4sA





−3




377842-378249




135


































see y4pE






y4sB





−1




378533-379696




387


































see y4pF






y4sC





−2




379722−380300




192


































see y4pG






y4sD





−1




380933-381829




298


































see y4iQ






y4sE





−3




381826-383340




504


































see y4jA






fs5





−3




383593-384054




153




8-150




Tnp




334




Z48244




48




65




put. defective transposase; sim. to fs1









384210-384493











fragments with 94-84% nt-id. to ISRm6 (R. meliloti;

















acc. no. X95567)






y4sG





+1




384808-385818




336




97-325




Ddl




306




M14029




34




57




hom. to D-alanine:D-alanine ligase; probably different

















function






y4sH





+3




386505-387890




461




267-337




CapA




411




M24150




42




63




hom. to encapsulation protein A; nearly identical to

















Y4uA






fs1










388138-388586






Tnp





Z48244






fragments of put. transposase; put. frameshift: 388452

















(−3<−2); sim. to Y4pF, Y4sB, fs5






fs2





+2




388697-388897






ORF1





U19148




43




62




put. transposase fragment; hom. to N-term. of ORF1;

















sim. to Y4jB, Y4rG, Y4hN






fs3





+




388966-390695






AtoC





U17902






put. transcriptional regulator fragment (put. frameshifts:

















389891 (1>2); 390170 (2>3)); sim. to Y4pA, Y4oV,

















Y4oT)






y4sI





+2




390971-393241




756




325-741




McpA




657




X66502




41




60




prob. methyl-accepting chemotaxis protein











1-749




Y4fA




845




this work




29




49






y4sJ




gabD




−3




393202-394677




491




29-489




GabD




482




M88334




58




75




prob. succinate-semialdehyde dehydrogenase






y4sK





−1




394790-395170




126




5-122




C23G10.2




185




U39851




55




71




bel. to the YER057C/YIL051C/YJGF family; probably

















important cellular function






y4sL





−1




395204-395815




203




2-203




DadA




432




L02948




57




74




either functional dehydrogenase or non-functional

















fragment; hom. to small subunit of D-aminoacid

















dehydrogenase






y4sM





+1




395935-396318




127




1-127




ORF1




127




X74314




99




99




put. transcriptional regulator (AsnC/Lrp family; low

















homology to y4tD); missing H-T-H region






y4sN





+1




396523-396900




125




1-123




ORF2




>123




X74314




98




98




similar to ORFs derived from insertion elements (IS6501

















family); low similarity to fu4






fs4





+




396855-397283




(143)




8-141




ORF1




186




X53945




48




63




put. IS-derived protein fragment (homology to C-term. of











1-141




Fp2




145




this work




39




62




ORF1 from IS869)






y4sO





−2




397608-399725




705




10-694




PtrII




706




D10976




32




49




prob. peptidase (S9A family); low similarity to Y4nA











1-705




Y4qF




754




this work




70




84




(<25% id.)






ft1





+3




400377-400625




(83)




20-83




Y4tE




300




this work




64




78




ft1 and ft2: one put. gene encoding an amino acid ABC

















transporter binding protein interrupted by NGRIS-3c






y4tA





−3




400732-401523




263


































see y4bM






y4tB





−2




401520-403070




516


































see y4bL






ft2





+1




403249-403899




(216)




5-195




ArgT




260




V01368




25




48




see ft1











2-215




Y4tE




300




this work




76




86






y4tD





+1




404182-404691




169




11-161




HIN1362




168




U32817




38




64




put. transcriptional regulator (AsnC/Lrp family; but low

















homology to y4sM)






y4tE





+1




405157-406059




300




31-281




FliY




257




U32734




27




48




prob. aminoacid ABC transporter binding protein











86-299




Ft2




215




this work




76




86




(periplasmic); prob. part of binding-protein-

















dep. transport system Y4tEFGH






y4tF





+1




406111-406827




238




25-233




YckJ




234




X77636




35




54




prob. aminoacid ABC transporter permease protein






y4tG





+3




406830-407525




231




1-220




GlnP




226




D30762




32




54




prob. aminoacid ABC transporter permease protein






y4tH





+2




407522-408295




257




5-256




GlnQ




242




M61017




52




71




prob. amino acid ABC transporter ATP-binding protein






y4tI





+1




408745-409953




402




22-391




Slr0072




393




D64004




35




54




put. peptidase (M40 family)






y4tJ





+1




409990-410988




332




7-328




Thd2




329




M21312




35




57




put. threonine dehydratase






y4tK





+3




410988-411983




331




69-326




ArcB




351




U39262




30




44




hyp. cyclodeaminase; (sim. to ornithine cyclodeaminase)






y4tL





+2




412118-413290




390




10-384




ORF




411




D14463




27




45




hyp. hydrolase/peptidase (M24 family)











1-389




Y4tM




392




this work




34




53






y4tM





+2




413453-414631




392




17-390




PepQ




368




Z34896




24




43




put. hydrolase/peptidase (M24 family)











1-390




Y4tL




390




this work




34




53






y4tN





+1




414655-415179




174










hyp. 19.6 kd protein






y4tO





+1




415252-416847




531




1-484




OppA




543




M60918




28




46




prob. peptide ABC transporter binding protein precursor;

















prob. part of a binding-protein-dependent transport system

















Y4tOPQRS






y4tP





+2




416852-417793




313




4-313




DPPB




339




L08399




36




58




prob. peptide ABC transporter permease protein






y4tQ





+1




417796-418671




291




9-287




AppC




303




U20909




36




56




prob. peptide ABC transporter permease protein;

















418611: C or T possible!






y4tR





+2




418673-419680




335




12-327




OppD




336




X56347




50




68




prob. peptide ABC transporter ATP-binding protein






y4tS





+1




419677-420738




353




3-320




AppF




329




U20909




49




69




prob. peptide ABC transporter ATP-binding protein






y4uA





+3




420774-422159




461




267-337




CapA




411




M24150




42




63




put. cell wall compound biosynthesis protein; almost

















identical to Y4sH






y4uB





+3




422628-424031




467




1-464




BioA




448




U51868




33




57




prob. aminotransferase (class 3)






y4uC





+3




424056-425594




512




58-509




GabD




482




M88334




33




52




prob. aldehyde dehydrogenase






fu1





+2




425699-425779






N15K




238




D45911






put. protein fragment; 67% id. to N15K in 26 aa






fu2





+3




425841-426083






PhbA




393




U17226






fragment 65% identical to C-term. of beta-keto-thiolase






y4uD





+1




426010-426507




165










hyp. 18.7 kd protein






y4uE





−3




426949-428028




359




78-290




Tnp




414




X15942




31




45




put. transposase (IS110 family); put. frameshift: between

















427040 and 427180 (−2<−3; end of shifted ORF: 426699)






y4uF





+3




428292-429623




443




13-440




GLUD1




558




X07674




42




60




prob. glutamate dehydrogenase






fu3





+




429860-430007






Tnp




398




U08627






put. transposase fragment (92% id. in 16 aa); 85% nt-

















identity to 3′term. part of ISRm5






y4uG





+1




430105-430320




71










hyp. 7.8 kd protein






y4uH





−1




430538-431284




248




1-202




ORF2




231




X79443




48




63




put. insertion sequence ATP-binding protein; similarity to











1-245




Y4pL




245




this work




61




77




Y4pL, Y4bM/Y4kI/Y4tA and Y4iQ/Y4nD/Y4sD











1-248




Y4bM




263




this work




48




68




(IS21/IS1162 family)











4-248




Y4iQ




298




this work




31




52






y4uI





−3




431296-432840




514




1-514




Tnp




518




L09108




44




63




put. transposase; similarity to Y4bL/Y4kJ/Y4tB

















(IS21/IS1162 family)






fu4










433222-433560






Tnp




201




X65471






put. transposase fragments (74-92% id. in 88 aa); 79% nt-

















identity to 5′term. of ISRm4






y4uJ




fixU




−1




433880-434110




76




1-70




FixU




70




X51963




63




80




hyp. 8.5 kd protein






y4uK




nifZ




−3




434107-434433




108




6-79




ORF2




>78




X07567




52




78




put. nitrogen fixation NifZ protein






y4uL




fdxN




−2




434517-434711




64




1-64




FdxN




64




M21841




79




84




prob. 4Fe-4S ferredoxin






y4uM




nifB




−1




434753-436234




493




1-493




NifB




490




M15544




72




81




involved in FeMo cofactor biosynthesis






y4uN




nifA




−1




436460-438244




594




37-594




NifA




584




U31630




62




74




positive regulator of nif, fix, and additional genes

















(sigma54-dep.)






y4uO




fixX




−2




438297-438590




97




2-97




FixX




98




M15546




84




89




prob. 3Fe-3S ferredoxin inv. in nitrogen fixation






y4uP




fixC




−1




438605-439912




435




1-435




FixC




435




M15546




82




89




required for nitrogenase activity






y4vA




fixB




−2




439923-441032




369




18-363




FixB




353




M15546




79




87




putatively inv. in a redox process in nitrogen fixation






y4vB




fixA




−2




441042-441899




285




1-280




FixA




292




M15546




75




90




putatively inv. in a redox process in nitrogen fixation






fv1





−1




442181-442252






NifS




384




X68444






put. NifS fragment (70% identity in 24 aa)






y4vC





−1




442316-442636




106




1-106




ORF118




118




X13691




54




72




hyp. 11 kd protein (HesB/YadR/YfhF family);

















homologues located upstream of nifS






y4vD





−2




443313-443879




188




5-173




HIN1693




241




U32848




46




60




put. redox enzyme (hom. to glutaredoxin-like membrane

















protein and peroxisomal membrane proteins)






y4vE




nifQ




+1




444337-445029




230




56-212




NifQ




180




M26323




39




56




putatively involved in Mo cofactor processing






y4vF




dctA1




+2




445088-446602




504




1-443




DctA1




456




S38912




99




99




C4-dicarboxylate transport protein; nt-deletion at 446416

















in comparison to sequence of acc. no. S38912 causing a

















frameshift (DctA1 is 48 aa longer than DctA1 in S38912)






y4vG





+1




446599-447843




414




1-413




CamC




415




M12546




34




50




prob. cytochrome P450






y4vH





+1




447844-448500




218




(32-157




LinA




155




D90355




28




46)




hyp. 24.6 kd protein (with very weak homology to

















gamma-hexachlorocyclohexane-dechlorinase)






y4vI





+3




448557-450203




548




9-250




FabG




244




U39441




38




56




short-chain type dehydrogenase/reductase











276-513







30




48






y4vJ





+2




450341-451396




351




1-188




LuxA




357




M36597




27




47




put. monooxygenase; similar to Y4wF;






y4vK




nifH1




+1




451993-452883




296




1-296




NifH




296




M26961




99




99




Fe protein of nitrogenase






y4vL




nifD1




+1




452980-454494




504




199-393




NifD




>195




M26962




98




99




alpha-subunit of MoFe protein of nitrogenase






y4vM




nifK1




+3




454590-456131




513




132-195




NifK




>64




M26963




100




100




beta-subunit of MoFe protein of nitrogenase






y4vN




nifE




+1




456187-457677




496




1-469




NifE




547




X56894




62




78




involved in FeMo cofactor biosynthesis






y4vO




nifN




+1




457687-459096




469




1-455




NifN




441




M18272




70




81




involved in FeMo cofactor biosynthesis






y4vP




nifX




+3




459093-459575




160




22-156




NifX




159




X17433




52




68




nitrogen fixation protein






y4vQ





+3




459579-460067




162




22-162




ORF4




156




X17433




49




70




hyp. 17.7 kd protein, similar to proteins of other











1-162




Y4xD




162




this work




61




75




nitrogen-fixing bacteria and to Y4xD






y4vR





+1




460501-460920




139




1-58




NifH




296




M26961




50




63




similar to N-term. of Fe protein of nitrogenase






y4vS




fdxB




+2




461228-461545




105




1-88




ORF5




102




M26323




52




65




prob. 4Fe-4S ferredoxin






y4wA





+1




463201-464739




512




86-499




PqqE




709




LA3135




50




70




hyp. zinc protease (M16 family); sim. to Y4wB






y4wB





+3




464736-466079




447




236-438




PqqF




213




LA3135




42




61




put. protease (lacks Zn-binding site; M16 family); sim. to

















Y4wA






y4wC





+3




466590-467021




143




8-132




ORF3




127




L13845




48




66




put. DNA-binding protein; high similarity to Y4aM











1-143




Y4aM




143




this work




69




77






y4wD





+1




467758-468891




377




11-370




MosC




407




U23753




29




48




permease-type protein; hom. to membrane protein from

















the rhizopine biosynthesis (mosABC) gene cluster






y4wE





+3




469311-470417




368




20-361




His1




356




D14440




32




53




prob. aminotransferase (class 2)






y4wF





+1




470824-471852




342




40-194




LuxA




354




X06758




27




54




put. monooxygenase; sim. to Y4vJ






y4wG





+2




471890-472435




181










hyp. 19.4 kd protein






y4wH





+3




473343-473780




145




1-145




ORF2




145




M19352




64




76




hyp. 15.6 kd protein






y4wI





−2




473928-475469




513










hyp. 59 kd protein






y4wJ





−2




475503-475880




125










hyp. 13.3 kd protein






y4wK




nifW




−1




476519-476971




150




12-118




NifW




108




M86823




50




63




NifW protein homolog; required for full activity of FeMo

















protein






y4wL




nifS




−2




477135-478298




387




4-387




NifS




402




M17349




58




73




prob. NifS protein (member of class-5 pyridoxal-

















phosphate-dep. aminotransferase family)






y4wM





−2




479145-481136




663




225-620




YejA




>409




U00008




38




55




put. ABC transporter binding protein (transporter or

















enzymatic function)






fw1





−1




481460-481834




124




1-116




DctA




441




M26531




55




61




hyp. truncated transporter-like protein; hom. to N-term. of

















DctA (see y4vF); two frameshifts acc. to homologue:

















481606 (−3<−1); 481530 (−2<−3; homology stops at

















481419)






y4wO





−3




481834-482154




106










hyp. 11 kd protein






y4wP





+2




482540-482947




135










hyp. 14.9 kd protein






y4xA




nifH2




+1




483871-484761




296




1-296




NifH




296




M26961




99




99




Fe protein of nitrogenase






y4xB




nifD2




+1




484858-486372




504




199-393




NifD




>195




M26962




98




99




alpha-subunit of MoFe protein of nitrogenase






y4xC




nifK2




+3




486468-488009




513




132-195




NifK




>64




M26963




100




100




beta-subunit of MoFe protein of nitrogenase






y4xD





+3




488262-488750




162




22-162




ORF4




156




X17433




47




73




hyp. 18 kd protein; similar to proteins of other nitrogen-











2-162




Y4vQ




162




this work




61




75




fixing bacteria and to Y4vQ






y4xE





+1




488773-488976




67




1-64




ORF1




69




X55450




40




67




hyp. 7.6 kd protein; similar to proteins of other nitrogen-

















fixing bacteria






y4xF





+3




488973-489149




58










hyp. 6.5 kd protein






y4xQ





+2




489281-489583




100




14-83




ExoX




98




M61751




31




52




put. exopolysaccharide production repressor (integral

















membrane protein)






y4xG





+2




490010-491527




505










hyp. 55.5 kd protein






y4xH




nodD2




−2




491655-492593




312




1-312




NodD2




312




L38460




99




99




transcriptional regulator (LysR family); high similarity to











1-310




NodD1




322




this work




68




83




Y4aL, (NodD1)






y4xI





+2




494297-494977




226




1-224




PmrA




222




L13395




39




58




signal transduction-type regulator






y4xJ





+1




495157-496428




423




76-378




GPIV




426




J02451




27




46




hyp. protein hom. to proteins of the general secretion

















pathway (pulD family), sim. to Y4yD (NolW)






y4xK





+1




496438-497004




188










hyp. 20.6 kd protein precursor






y4xL





−1




497444-498460




338










hyp. 37.1 kd protein






y4xM





−1




498719-499933




404




23-403




ORF1




408




X59939




22




49




permease-type protein












(YceE)






y4xN





−3




499930-501816




628




183-505




IucC




580




X76100




28




43




hyp. 71 kd protein hom. to aerobactin synthetase subunit






y4xO





−2




501816-502955




379










hyp. 40.9 kd protein






y4xP





−1




502952-503962




336




5-304




CysK




308




D26185




40




60




put. cysteine synthase






y4yA





−1




503963-505336




457










hyp. 49.9 kd protein; low similarity to diaminopimelate

















decarboxylase






y4yB





−3




505336-505800




154










hyp. 17.1 kd protein






y4yC




nolX




−2




505950-507740




596




1-596




NolX




596




L12251




98




99




nodulation protein as in R. fredii USDA257








y4yD




nolW




−3




508021-508725




234




1-234




NolW




234




L12251




99




100




nodulation protein (PulD family); sim. to Y4xJ






y4yE




nolB




+3




508881-509375




164




1-164




NolB




164




L12251




98




99




nodulation protein






y4yF




nolT




+3




509385-510254




289




1-289




NolT




289




L12251




96




97




nodulation protein precursor (YscJ homolog; M74011)






y4yG




nolU




+2




510251-510889




212




1-212




NolU




212




L12251




99




99




nodulation protein






y4yH




nolV




+3




510891-511517




208




1-60




ORF4




65




L12251




100




100




homologous to two (nodulation) proteins of R. fredii













73-208




NolV




135





96




97




USDA257 (YscL homolog; M74011)






y4yI




hrcN




+2




511514-512869




451




35-450




YscN




439




U00998




55




73




prob. ATPase involved in secretion











1-80




HrcN




450




L12251




97




97











105-450







97




98






y4yJ





+1




512845-513381




178




1-178




ORF7




178




L12251




97




98




hyp. 20.4 kd protein






y4yK




hrcQ




+1




513406-514482




358




171-350




YscQ




307




L25667




27




46




prob. translocation protein inv. in secretion processes











1-358




HrcQ




382




L12251




96




98




(FliN/MopA/SpaO family)






y4yL




hrcR




+2




514475-515143




222




6-216




YscR




217




L25667




46




66




prob. translocation protein inv. in secretion processes











1-222




HrcR




249




L12251




99




99




(FliP/MopC/SpaP family)






y4yM




hrcS




+1




515143-515418




91




1-66




YscS




88




L25667




34




65




prob. translocation protein inv. in secretion processes











1-91




HrcS




92




L12251




98




100




(FliQ/MopD/SpaQ family)






y4yN




hrcT




+3




515427-516245




272




28-250




YscT




261




L25667




31




52




prob. translocation protein inv. in secretion processes











1-272




HrcT




272




L12251




98




99




(FliR/MopE/SpaR family)






y4yO




hrcU




+2




516242-517279




345




5-339




YscU




354




L25667




30




50




prob. translocation protein inv. in secretion processes











1-340




HrcU




351




L12251




99




99




(FlhB/HrpN/YscU/SpaS family)






y4yP





+1




518077-518892




271




35-262




HipA




295




M19019




88




91




homolog is inducible by root-exudate and daidzein;

















frameshift acc. to homologue: 518855 (1>2)






fy1





+




519655-519995






NolJ




148




L26967






nodulation gene homologous fragments (80-100% id. in

















97 aa); frameshifts acc. to homologue: 519789 (1>3);

















519900 (3>2); 519965 (2>3)






y4yQ





+2




520280-521170




296










hyp. 31.3 kd integral membrane protein






y4yR





+2




521360-523453




697




17-677




LcrD




704




M96850




40




65




prob. translocation protein inv. in secretion processes

















[Flagella/HR/Invasion proteins export pore (FHIPEP)

















family]






y4yS





+3




523470-524018




182










hyp. 20.1 kd protein






y4zA





+2




525005-525892




295




34-115




Y4rM




350




this work




98




98




hyp. (fragmentous?) 32.9 kd protein; put. frameshift:











133-231




Y4rL




155




this work




99




99




525699 (2>3); similar to Y4iE






y4zB





+1




526051-527121




356




60-320




Tnp




377




X67862




29




47




put. (fragmentous?) transposase (IS4 family) 526103-

















526200 higher cod. prob. in fr. 2; put. frameshift: 526200

















(2>1)






fz1





+




527337-527902






Hdc




378




J02577






fragments homologous to histidine decarboxylases (30-

















45% id. in 134aa); put. frameshift (3>2) around 527478






y4zC





+3




529125-529910




261




65-248




AvrPph3




276




M86401




27




41




hyp. 28.3 kd protein; hom. to avirulence protein






y4zD





+3




530145-530294




49










hyp. 5.5 kd protein






fz4





+2




530432-530764




110




1-110




Y4jA




504




this work




72




85




hom. to C-terminus of Y4jA/Y4nE/Y4sE






fz2





+




530761-531250






ORFB




251




X67861






put. IS-ATP-binding protein fragments (32-40% id. in

















137aa); put. frameshift acc. to homolog: 531062 (1>2)






y4zF




syrM2




+2




532676-533695




339




1-320




SyrM




326




M33495




69




81




prob. symbiotic regulator (LysR family)











1-335




SyrM1




338




this work




62




79






fz3





+




534257-534422






ORF




338




M73488






fragments homologous to 1-aminocyclopropane-1-carboxylate deaminase (63-83% id. in 56aa); put. frameshift: 534291













a


open reading frame (ORF)












b


strand (−/+) or frame (−1; −2; −3; +1; +2; +3)












c


number (no.)












d


aminoacids (aa)












e


GenBank/EMBL accession numbers












f


identity (I) and similarity (S) have been calculated by the programme BESTFIT (local homology algorithm; Smith and Waterman, 1981) of the WISCONSIN SEQUENCE ANALYSIS PACKAGE (version 8.0, GCG, Madison, USA)












g


abbreviations: prob. = probable; cod. prob. = coding probability; acc. = according; inv. = involved; sim. = similar; id. = identical; fr. = frame; acc. no. = accession number; nt = nucleotide; hyp. = hypothetical; put. = putative; hom. = homologous; dep. = dependent; N/C-term = N/C-terminus
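The coordinate, frame and length columns of the table above can be cross-checked against each other. The following is a minimal sketch in Python, not part of the original analysis. The function names are hypothetical, and two conventions are assumptions inferred from a handful of rows (e.g. y4sG, y4sH, y4sE, y4sK) rather than stated in the text: that each coordinate range includes one stop codon, and that the frame number equals the start coordinate modulo 3 for the + strand or the end coordinate modulo 3 for the − strand, with a remainder of 0 written as 3.

```python
def orf_length_aa(start: int, end: int) -> int:
    """Amino-acid count implied by inclusive nucleotide coordinates.

    Assumption: the range covers whole codons and includes one stop codon.
    """
    n_nt = end - start + 1
    if n_nt % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return n_nt // 3 - 1


def reading_frame(start: int, end: int, strand: str) -> int:
    """Signed frame (+1..+3 or -1..-3) under the inferred convention.

    Assumption: '+' ORFs are indexed by their start coordinate and
    '-' ORFs by their end coordinate, each taken modulo 3 (0 -> 3).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    anchor = start if strand == "+" else end
    k = anchor % 3 or 3
    return k if strand == "+" else -k


if __name__ == "__main__":
    # Rows taken from the table: (name, start, end, strand)
    rows = [
        ("y4sG", 384808, 385818, "+"),
        ("y4sH", 386505, 387890, "+"),
        ("y4sE", 381826, 383340, "-"),
        ("y4sK", 394790, 395170, "-"),
    ]
    for name, start, end, strand in rows:
        print(name, reading_frame(start, end, strand), orf_length_aa(start, end))
```

Rows annotated with putative frameshifts or listed as fragments are not expected to satisfy these checks.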













In a second stage, the remaining 436 kb of pNGR234a were analyzed. Several ORFs and their deduced proteins were identified that belong to functional groups not previously identified in the analysis of cosmids pXB296, pXB368 and pXB110 (replication of the plasmid, conjugal transfer of the plasmid, functions in oligosaccharide biosynthesis and cleavage, functions in sugar or sugar-derivative metabolism, functions in lipid or lipid-derivative metabolism, functions in chemoperception/chemotaxis, functions in biosynthesis of cofactors, prosthetic groups and carriers, etc.).




Although further functional analyses of selected ORFs in pNGR234a still have to be performed, large-scale sequencing gives a global picture of their genomic organization and possible roles. Determination of putative functions of predicted genes by homology searches and identification of sequence motifs (promoters, nod boxes, nifA activator sequences, and other regulatory elements) will aid in finding new symbiotic genes. High-fidelity sequence data covering long stretches of the genome are a prerequisite for these studies. The use of the dye terminator/thermostable sequenase shotgun approach has allowed the completion of the entire ˜500 kb sequence of pNGR234a and has opened up new avenues for the genetic analysis of symbiotic function.




Genetic Organization of the Whole Plasmid pNGR234a




Within the complete nucleotide sequence of pNGR234a, which comprises 536,165 bp, a total of 416 ORFs were predicted to encode proteins. An additional 67 ORF-fragments were detected that seem to be remnants of functional ORFs.




Thirty-four percent (139) of the 416 potential proteins have no obvious similarity to any known protein. Of the remaining 277 proteins, 31 (8%) are similar to proteins for which no biochemical or phenotypic role has been assigned, 12 (3%) are similar to proteins for which limited biological data are available, and 234 (56%) are similar to proteins with a more precise biological function: enzymes (95), proteins involved in integration and recombination of insertion elements (44), transporters (32), transcriptional regulators (22), proteins involved in protein secretion/export (21), proteins involved in replication and control of the plasmid (12), electron transporters (6), and proteins involved in chemotaxis (2). A high proportion of enzymes was expected of a symbiotic replicon involved in nodulation (Nod-factor biosynthesis, etc.) and nitrogen fixation. As expected from the observation that NGR234 can be cured of its plasmid (Morrison et al., 1983), no ORFs essential to transcription, translation or primary metabolism were found.
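The counts in the preceding paragraph can be tallied to confirm that the classes add up to the 416 predicted proteins and that the functional categories add up to 234. The sketch below is illustrative only; the dictionary keys are shorthand labels chosen here, not terms from the original analysis, and the computed percentages may differ by about one point from the rounded figures quoted in the text.

```python
TOTAL_ORFS = 416  # predicted protein-coding ORFs on pNGR234a

# Similarity classes (counts as given in the text; labels are shorthand).
classes = {
    "no obvious similarity": 139,
    "similar, no role assigned": 31,
    "similar, limited biological data": 12,
    "similar, more precise function": 234,
}

# Breakdown of the 234 proteins with a more precise function.
functional = {
    "enzymes": 95,
    "insertion element integration/recombination": 44,
    "transporters": 32,
    "transcriptional regulators": 22,
    "protein secretion/export": 21,
    "plasmid replication and control": 12,
    "electron transporters": 6,
    "chemotaxis": 2,
}


def percent(count: int, total: int = TOTAL_ORFS) -> float:
    """Share of `count` in `total`, as a percentage."""
    return 100.0 * count / total


if __name__ == "__main__":
    print("classes sum to", sum(classes.values()))
    print("functional categories sum to", sum(functional.values()))
    for name, n in classes.items():
        print(f"{name}: {n} ({percent(n):.1f}%)")
```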




A large number of protein families are present in several copies on pNGR234a. This is true even after elimination of the many proteins that are encoded in repeated IS elements or are involved in transposition, integration or recombination. The most notable examples of highly represented protein families include: five members of the short-chain dehydrogenase/reductase family, one of which (y4vI) contains two homologous domains; five complete and one partial ABC-type transporter operons, each of which encodes at least one ABC-type permease and an ABC-type ATP-binding protein; four cytochrome P450s; and three members of peptidase family S9A. In total, 85 proteins belong to families that are represented more than once and that do not seem to be linked to insertion or recombination.




The majority (330, 79%) of the putative proteins are probably located in the cytoplasm of the bacterium, 62 (15%) possibly span membranes, 20 (5%) could be located in the periplasm, 3 are predicted to be lipoproteins that could associate with the outer membrane, and 2 are probably outer membrane proteins. These observations accord well with the dominance of biosynthetic proteins, as well as proteins involved in transcriptional regulation and insertion/recombination, most of which are thought to be cytoplasmic.




Although other start points cannot be excluded, replication of pNGR234a probably begins at oriV, which is located within the intergenic sequence (igs) between the repC- and repB-like genes y4cI and y4cJ. This locus (positions 54,417 to 54,570) encodes three proteins with 40-60% amino acid identities to RepABC of pTiB6S3 (a Ti-plasmid of Agrobacterium tumefaciens), pRiA4b (an Ri-plasmid of A. rhizogenes) and pRL8JI (a cryptic plasmid of R. leguminosarum bv. leguminosarum). Amongst replication regions, the highest identities (69 to 71% at the nucleotide level) are found in the igs's between repC and repB (FIG. 5). In Agrobacterium, these igs's are the determinants which render parental plasmids incompatible. Two ORFs (position 198,500), which are homologous to pseudomonal genes involved in plasmid stability, may also play a role in replication of pNGR234a. A 12 bp portion of the origin of transfer (oriT) is identical to that of pTiC58 of Agrobacterium tumefaciens (nt 80,162 to 80,173), and highly similar to those of RSF1010 (Escherichia coli) and pTF1 (Thiobacillus ferrooxidans). This sequence corresponds to the oriT of plasmids containing the “Q-type nick-region” (FIG. 6).




Another 24 predicted ORFs show homologies to conjugal transfer genes of Agrobacterium Ti-plasmids. All are located in two large clusters between positions 57,000 and 83,000. Since pNGR234a was believed to be non-transmissible (Broughton et al., 1987), the fact that both the nucleotide sequences of the individual ORFs and their order are similar in Agrobacterium and NGR234 came as a surprise. Conjugal transfer of Ti plasmids in A. tumefaciens is controlled by a family of N-acyl-L-homoserine lactone auto-inducers (Zhang et al., 1993). Similar molecules, which are able to interact with the traR gene product of A. tumefaciens, were detected in the supernatants of NGR234 cultures using the assay of Piper et al. (1993).




Reiterated sequences first became apparent in NGR234 during the construction of an ordered array of cosmid clones (Perret et al., 1991). It is now clear that 97 kbp (18%) of pNGR234a represents insertion (IS) and mosaic (MS) sequences (FIG. 7). Homology searches for known IS/MS revealed some of these, while comparison of repeated sequences within pNGR234a, as well as between the plasmid and 2,500 random chromosome sequences (V. Viprey, pers. communication), located the rest. Seventy-five putative ORFs (18% of the total) and 40 fragments of ORFs were identified this way, nearly half of which (44) show homologies to integrases and transposases. Many of these IS elements are similar not only to those derived from Rhizobium and Agrobacterium species, but also to those of other, diverse Gram (−) and Gram (+) bacteria (e.g. Bacillus, Escherichia, and Pseudomonas). The sheer number and diversity of these IS/MS elements suggest that NGR234 has functioned as a “transposon trap”. This is supported by the fact that their average G+C content (61.5%) is 3 percentage points higher than that of pNGR234a (58.5%). Interestingly, many IS/MS are clustered between positions 300,000 and 390,000 (FIG. 7), while some loci are almost unaffected by insertions (oriV, nod-, fix- and nif-ORFs). Small IS/MS clusters divide the replicon into large blocks of often functionally related ORFs (e.g. blocks of nod-ORFs, replication and conjugal transfer ORFs, nif-ORFs and fix-ORFs). A list of all sequences with IS-element or mosaic sequence character is given in Table 4. Although transposition of these IS/MS elements has not been demonstrated, transfer of plasmids amongst rhizobia in the legume rhizosphere (Broughton et al., 1987) and to other non-symbiotic bacteria in fields (Sullivan et al., 1995) suggests that lateral transfer of genetic information has helped shape symbiotic potential.
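The two quantities used in the paragraph above, the share of the replicon occupied by IS/MS sequences and the G+C content of a sequence, are simple to compute. The following Python sketch is illustrative only: the function name `gc_content` and the constant names are hypothetical, and the 97 kbp figure is taken as exactly 97,000 bp for the purpose of the arithmetic.

```python
PLASMID_BP = 536_165   # complete pNGR234a sequence length, from the text
IS_MS_BP = 97_000      # IS/MS sequence, "97 kbp" in the text (assumed exact)


def gc_content(seq: str) -> float:
    """G+C content in percent, counting only unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


# Share of the plasmid made up of IS/MS sequence (roughly 18%).
is_ms_share = 100.0 * IS_MS_BP / PLASMID_BP

if __name__ == "__main__":
    print(f"IS/MS share of pNGR234a: {is_ms_share:.1f}%")
    # Toy sequence, not taken from pNGR234a.
    print(f"G+C of toy sequence: {gc_content('GCGCAT'):.1f}%")
```

Applied to the concatenated IS/MS regions and to the whole plasmid sequence, `gc_content` would give the two values being compared in the text (61.5% and 58.5%).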












TABLE 4











Insertion/mosaic sequences in pNGR234a




















start of region

stop of region

name of region

put. ORFs/ORF-fragments included

similarities within pNGR234a

similarities to chromosome

homologous sequences in other organisms/comments









 17000




 17600




ISH-10b




y4aQ




33% aa-id. to y4hP





gene products from IS866 and IS66 from










(ISH-10a)







Ag. tumefaciens








 18900




 19661




ISH-11b




fa2




54% aa-id. to part of





Tnp of IS1202 from Str. pneumoniae












y4bF (ISH-11a);










19096-19362: 91% nt-id.










to ISH-11c






 19666




 22981




NGRIS-4a




y4bABCD




identical to NGRIS-4b




many copies on the











chromosome






 22985




 25400




ISH-11a




fb1, y4bF




y4bF: sim. to fb1 and




partially 91% nt-id. to




Tnp of IS1202 from Str. pneumoniae












fa2 (ISH-11b)




chromosomal sequences






 32463




 35085




NGRIS-3a




y4bLM




identical to NGRIS-3b/c




copy(ies) on the




62% nt-id. (over 2352 nt) to IS1162 of











chromosome






Ps. fluorescens


(IS21/IS1162/IS408 family)






 49300




 50300




ISH-13a




y4cG




similar to y4lS (ISH-13b)





DNA invertase






 69936




 70385




ISH-4c




fd1




70233-70385: 93% nt-id.





ORFA of IS5376 from


B. stearothermophilus












to part of NGRIS-4






 93322




 96025




ISH-12a




fe2, y4eF,




93574-94927: 90%





Tnp (fe2) and Int (Y4eF, fe3) from


Weeksella











fe5, fe3




nt-id. to ISH-12b1; 75%







zoohelcum


-IS-element; (93322-94586: 57%










nt-id. to fq6 region (ISH-





nt-id. to IS292 from


Ag. radiobacter


); “phage”










12b2); 95343-95558:





Integrase family (Y4eF, fe5, fe3)










88% nt-id. to ISH-12b3






101939




102394




ISH-8b




fe7






84% nt-id. to ISRm5 of R. meliloti; fe7: mutator family of transposases






115881




116004




MSH-14b





partially homologous to




72-73% nt-id. to sequences




mosaic element










ISH-14a




downstream from chvl/up-











stream from rpoN on the











chromosome






124396




124500




MSH-14a





partially homologous to




82% nt-id. to sequence




mosaic element










ISH-14b




RIME1 downstream from











chvl on the chromosome;











parts of MSH-14a show











73-89% nt-id. to chromo-











somal sequences






126806




127369




ISH-12f




y4gA




low. similarity to y4rE






127900




128500




ISH-12e




y4gC






recombinase from pAE1 of


Al. eutrophus














(“phage” integrase family)






131000




131800




ISH-15




y4gE*





partially 87% nt-id. to











chromosomal sequences






159781




160564




ISH-16







96% nt-id. to repetitive sequence from


R. fredii














USDA257 (acc. no. M73698)






164600




167700




ISH-10a




y4hNOPQ





99% nt-id. of parts of




different ORFs derived from IS-like sequences;











y4hPQ to chromosomal




partially known as acc. no. X74068 (“Region2”











sequences




from pNGR234a); 164853-167086: 66% nt-id.












to IS66 from


Ag. tumefaciens








168208




169190




ISH-2c




fi1, fi2, fi3




168343-168659: 72%





168208-168383: 70 nt-id. to ISRm2011-2










nt-id. to ISH-2f1/





(


R. meliloti


); fi2/3: IS1111A, IS1328, IS1533










ISH-2d1





family of transposases










165785-169091: 73%










nt-id. to ISH-2f2/










ISH-2d2






173295




173702




ISH-8g




y4iE*




y4iE: sim. to y4rL,










y4zA, and fr2






175590




175909




ISH-11c




y4iG*




175643-175909: 91%










nt-id. to ISH-11a






185672




186507




ISH-2d




y4iO*/P*




185672-186075(−): 73%





Y4iO: Tnp of IS1325 from


Y. enterocolitica











(3′-end)




nt-id. to ISH-2c2(+)





(IS1111A, IS1328, IS1533 family)










186208-186507(−): 72%










nt-id. to ISH-2c1(+)






187112




189752




NGRIS-5a




y4iQjA




identical to NGRIS-5b/c




copie(s) on the




1stA and B (Tnps) of IS1326 from


E. coli













chromosome




(IS21/IS1162/IS408 family)






190000




193500




ISH-10c




y4jBCD(E*)




38/32 aa-id. of y4jCD





different ORFs derived from IS-like sequences;










to y4hOP (ISH10a)





partially 60% nt-id. to IS866 (


Ag. tumefaciens


);












IS292 (


Ag. radiobacter


); ISR11 (


R.
















leguminosarum


)






193518




193634




MSH-17







76% nt-id. to repetitive sequence RMX6 from














Myxococcus xanthus


(acc. no. M60865)






199746




199958




ISH-11d




y4jM*




similarity to fb1 and










y4bF (ISH-11a)






211165




211265




ISH-10g







74% nt-id. to ISR11 (


R. leguminosarum


),












IS66/IS866 derivative






211350




212580




ISH-10h




fk2




similar to y4jD





74% nt-id. to IS66










(ISH-10c)






217564




220186




NGRIS-3b




y4kIJ




identical to NGRIS-3a/c




copie(s) on the




62% nt-id. (over 2352 nt) to IS1162 of











chromosome






Ps. fluorescens


(IS21/IS1162/IS408 family)






224547




224995




ISH-18a




y4kQ




83% nt-id. to ISH18b





IS110 family









(3′-end)




(427651-428102)






240800




241040




ISH-24b




fl2






60% nt-id. to ISR12 from


R. leguminosarum








244540




244851




ISH-19a




fl4




244620-244812: 97%





TnpA from Tn163 (


R. leguminosarum


)










nt-id. to ISH-19b






248290




248655




ISH-19b




fl5




248463-248655: 97%










nt-id. to ISH-19a






248814




249680




ISH-20




fl6






Tnp of Tn1546 (


Enterococcus faecium;














Tn21/501/1721 family)






251407




252400




ISH-13b




y4lSmA




y4lS: similar to y4cG





y4lS: invertase; 58% nt-id. (251409-252211) to










(ISH-13a)





Tn501 from


Ps. aeruginosa


(acc. no. Z00027)






258551




258657




MSH-21







mosaic sequence: 82% nt-id. to sequence up-












stream of ropA2 (


R. leguminosarum


; acc. no.












X80794)






280403




283043




NGRIS-5b




y4nDE




identical to NGRIS-5b/c




copie(s) on the




1stA and B (Tnps) of IS1326 from


E. coli













chromosome




(IS21/IS1162/IS408 family)






284722




284985




ISH-1b




fn2






60% nt-id. to IS1162 (


Ps. fluorescens,














IS21/IS1162/IS408 family)






300017




300819




ISH-1c




fo3






61% nt-id. IS408 (


Ps. cepacia;














IS21/IS1162/IS408 family)






300820




304117




NGRIS-6




fo4/5/6,




77% nt-id. to NGRIS-4









y4oL/M/N







304118




304434




ISH-1d




fo7






61% nt-id. IS408 (


Ps. cepacia;














IS21/IS1162/IS408 family)






318854




319686




NGRIS-7




fp1-2






66% nt-id. to IS1248 of


Pa. denitirificans








320456




328935




NGRRS-1a




fp3/4; y4pL





3 copies on the




interrupted by NGRIS2a and 4b; fp3/4, Y4pL:











chromosome




IS21/IS1162/IS408 family






320590




323147




NGRIS-2a




y4pEFG




identical to NGRIS-2b





partially 88-90% nt-id. to repetitive sequence












RDRS9 of


R. fredii


USDA257 (IS1111A/












IS1328/IS1533 family)






323961




327276




NGRIS-4b




y4pHIJK




identical to NGRIS-4a




many copies on the











chromosome (disrupts all 4











copies of NGRRS-1)






335004




336301




NGRIS-8




y4pO




similar to fe7 (ISH-8b)





88% nt-id. to ISRm3 of


R. meliloti


: mutator












family of transposases






342272




342419




ISH-12d





342272-342419: 87%










nt-id. to ISH-12b4






344100




345300




ISH-2e




y4qE






Tnp (


Leptospira borgpetersenii


):












IS1111A/IS1328/IS1533 family






345755




346133




ISH-12c




fq4




345755-346133: 82%





Int (XerC,


E. coli


): “phage” integrase family










nt-id. to ISH-12b5






351600




351735




MSH-22







80 nt-id. to sequence from pTiS4 (


Ag. vitis


;












acc. no. M91609)






351826




353794




ISH-10d




y4qI, fq5




fq5: 35% aa-id. to y4hQ




71-95% nt-id. of parts of




67% nt-id. to ISR11 (


R. leguminosarum


; acc.










(ISH-10a)




y4qI to chromosomal




no. L19650); IS866/66 homolog











sequences






354000




363073




ISH-12b




y4qJK, fq6,




354942-35612/356215-




70% nt-id. of parts of




Tnp and Int from


Weeksella zoohelcum


-IS-ele-









y4rABCDEF




356383: 90/91% nt-id. to




ISH14a1 to chromosomal




ment (y4qJK), different intergrases (y4rAB),










ISH12a1; 75% nt-id. to




sequences




integrase XerC of


H. influenzae


(y4rC);










fe5 region (ISH-12a2);





y4qK, fq6, y4rABCDEF: “phage” integrase










359753-359968: 88%





family










nt-id. to ISH-12a3;










361029-361410: 82%










nt-id. to ISH-12c










362507-362654: 87%










nt-id. to ISH-12d






363287




363694




ISH-10i




y4rG




low similarity to y4jB





unknown protein from IS1312 (


Ag.












and fr4 (ISH-10c/i)







tumefaciens


)/IS866






366252




367402




ISH-8f




fr1, fr2




366252-366524: 88%










nt-id. to ISH-8e










366773-366953: 92%










nt-id. to ISH-8g






367699




367970




ISH-10e




fr3




56% aa-id. of fr3 to





75% nt-id. to IS66 (


Ag. tumefaciens


)










y4hO (ISH-10a)






368503




369675




ISH-23




y4rI





91-93% nt-id. of parts of











y4rI to chromosomal











sequences






369697




370887




ISH-2f




y4rJ




370012-370328: 72%





y4rJ: Tnp from IS1111a of


Coxiella burnetii












nt-id. to ISH-2c1





(IS1111A/IS1328/IS1533 family)










370479-370785: 73%










nt-id. to ISH-2c2






371399




372990




ISH-8e




y4rL*M*




371399-371671: 88%










nt-id. to ISH-8f










371474-372228: 97%










nt-id. to ISH-8d






377185




377695




ISH-10j




fr4




similar to y4rG (ISH-10i)





377327-377695: 75% nt-id. to ISRm6 (


R.
















meliloti


)






377826




380383




NGRIS-2b




y4sABC




identical to NGRIS-2a





partially 88-90% nt-id. to repetitive sequence












RFRS9 of


R. fredii


USDA257 (IS1111A/












IS1328/IS1533 family)






380883




383523




NGRIS-5c




y4sDE




identical to NGRIS-5a/b




copie(s) on the chromo-




1stA and B Tnps) of IS1326 from


E. coli













some




(IS21/IS1162/IS408 family)






383593




384054




ISH-2g




fs5






Tnp of IS1328 of


Y. enterocolitica














(IS1111A/IS1328/IS1533 family)






384210




384493




ISH-10k







fragments with 94-84% nt-id. to ISRm6 (


R.
















meliloti


)






388100




388600




ISH-2h




fs1






different Tnps (IS1111A/IS1328/IS1533












family)






388601




388900




ISH-10l




fs2






ORF from IS1312 of


Ag. tumefaciens














(IS66/866 family)






396445




397301




NGRIS-9




y4sN and





91-99% nt-id. of NGRIS9-




different ORFs derived from IS elements;









fs4





parts to chromosomal




partially known from acc. no. X74314











sequences






400626




403248




NGRIS-3c




y4tAB




identical to NGRIS-3a/b




copie(s) on the chromo-




62% nt-id. (over 2352 nt) to IS1162 of











some






Ps. fluorescens


(IS21/IS1162/IS408 family)






426525




428102




ISH-18b




y4uE*




427651-428102: 83%




77-96% nt-id. of ISH-18b-




Tnp of mini-circle DNA from


Str. coelicolor












nt-id. to ISH-18a




parts to chromosomal




(IS110 family)











sequences






429860




430007




ISH-8c




fu3






85% nt-id. to ISRm5 (


R. meliloti


)






430568




432851




ISH-1e




y4uHI






60% nt-id. to IS408/IS1162












(


Ps. cepacia/Ps. fluorescens


)






433222




433560




ISH-24a




fu4




low similarity to y4sN





79% nt-id. to ISRm4 (


R. meliloti


)/ISR12-like










(NGRIS-9)






462554




463053




ISH-10f







fragments with 83-69% nt-id. to IS866












(


Ag. tumefaciens


)






524946




525892




ISH-8d




y4zA




525095-525849: 97%





524946-525580: 61% nt-id. to ISRm5 (


R.












nt-id. to ISH-8e







meliloti


)






526051




527121




ISH-25




y4zB*






Tnp of IS5376 from


B. stearothermophilus


(IS4












family of transposases)






530364




531249




ISH-1f




fz4, fz2




79% nt-id. to part of





fz4/2: IS21/IS1162/IS408 family










NGRIS-5











Abbreviations: Tnp = transposase; Int = integrase; nt-id. = nucleotide-identity; aa-id. = aminoacid identity










IS elements with precisely defined borders are designated as NGRRS/NGRIS-1 to 9. Other sequences which show homologies to known mosaic or IS-like sequences (mosaic/insertion sequence homologs) are named MSH and ISH, respectively.
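The start and stop coordinates in the table allow a simple consistency check on the elements described as identical copies: copies of the same NGRIS element should span the same number of nucleotides. The sketch below is illustrative only. It assumes the coordinates are 1-based and inclusive (an assumption, not stated in the table), and the names `COPIES`, `length` and `copy_lengths` are hypothetical helpers introduced here.

```python
# Start/stop coordinates copied from the table for the NGRIS elements
# that are listed as identical copies of one another.
COPIES = {
    "NGRIS-2": {"a": (320590, 323147), "b": (377826, 380383)},
    "NGRIS-3": {"a": (32463, 35085), "b": (217564, 220186), "c": (400626, 403248)},
    "NGRIS-4": {"a": (19666, 22981), "b": (323961, 327276)},
    "NGRIS-5": {"a": (187112, 189752), "b": (280403, 283043), "c": (380883, 383523)},
}


def length(start, stop):
    """Length in nt of a region, assuming 1-based inclusive coordinates."""
    return stop - start + 1


def copy_lengths(copies=COPIES):
    """Map each element family to the length of each of its copies."""
    return {
        family: {copy: length(*pos) for copy, pos in members.items()}
        for family, members in copies.items()
    }


if __name__ == "__main__":
    for family, lengths in copy_lengths().items():
        same = len(set(lengths.values())) == 1
        print(family, lengths, "equal lengths" if same else "lengths differ")
```

Under that coordinate assumption, every set of copies listed as identical comes out with the same length, which is what one would expect if the borders were defined precisely.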













Carbohydrates are constituents of the rhizobial cell wall as well as morphogens called Nod-factors (short tri- to penta-mers of N-acetyl-D-glucosamine, substituted at the non-reducing terminus with C16 to C18 saturated or partially unsaturated fatty acids). Elements of the biosynthetic pathways leading to cell walls or to lipo-chito-oligosaccharides (Nod-factors) are common. Most differences are found in the later stages of the pathways that lead to specific cell-wall components or to Nod-factors.




As befits a symbiotic replicon, only 13 ORFs with homology to polysaccharide synthesis genes (house-keeping genes sensu stricto) are located on the plasmid (Table 3). Sequences homologous to exoB, exoF, exoK, exoL, exoP, exoU, and exoX (X. Perret and V. Viprey, unpublished), and exoY (Gray et al., 1990) are clearly located on the chromosome. Although loci with weak homologies to nod-box::psiB of R. leguminosarum and exoX of R. meliloti exist on the plasmid (y4iR and y4xQ, respectively), these are regulatory rather than structural genes, suggesting that almost all cell wall polysaccharide synthesis ORFs are chromosomally located.




Except for nodPQ and nodE, at least one copy of all the regulatory and structural ORFs involved in Nod-factor biosynthesis seems to be located on the plasmid. The activity of most nodulation genes is modulated by four transcriptional regulators of the lysR family. These are nodD1 (y4aL), syrM1 (y4pN), nodD2 (y4xH), and syrM2 (y4zF). NodC, which is an N-acetylglucosaminyltransferase, the first committed enzyme in the Nod-factor biosynthetic pathway, is part of an operon which includes nodABCIJnolOnoeIE (y4hI to y4hB, Table 3). Together, these genes, which form the hsnIII locus, are responsible for the synthesis of the core Nod-factor molecule, and the adjunction of 3- (or 4)-O-carbamoyl, 2-O-methyl, and 4-O-sulfate groups (Hanin et al., unpublished). nodZ (y4aH), which encodes a fucosyltransferase, is part of the hsnI locus, which includes noeJ (y4aJ), noeK (y4aI), noeL (y4aG), and nolK (y4aF), all of which are involved in the fucosylation of NodNGR factors (Fellay et al., 1995a). Wild-type NodNGR factors are also N-methylated and 6-O-carbamoylated, adjuncts which are added by the transferases encoded by nodS and nodU respectively [y4nC and y4nB; hsnII (Lewin et al., 1990)]. Possibly the only other enzyme which may be directly involved in Nod-factor biosynthesis is that encoded by nolL (y4eH, Table 3). As the 2-O-methylfucose residue of NGR234 Nod-factors is either 3-O-acetylated or 4-O-sulphated, an acetyltransferase is obviously required. Since NolL shows only limited homology to acetyltransferases, however, experimental proof of the transferase activity will be required.




In contrast to R. leguminosarum and R. meliloti harbouring pNGR234a, A. tumefaciens (pNGR234a) transconjugants are incapable of nitrogen fixation (Broughton et al., 1984), suggesting that some essential fix ORFs are also carried by the chromosome. Nevertheless, more than 40 nif- and fix-ORFs are plasmid borne. Included amongst these is nifA (y4uN), which encodes a sigma-54 dependent regulator. Mutation of rpoN (which encodes sigma 54) causes a Fix− phenotype on NGR234 hosts (van Slooten et al., 1990). Similarly, mutation of fixF (y4gN) disrupts synthesis of a rhamnose-rich extra-cellular polysaccharide, and results in a Fix− phenotype on Vigna unguiculata, the reference host for NGR234 (unpublished). In fact, loci adjacent to fixF are probably responsible for the synthesis of dTDP-rhamnose from glucose-1-phosphate. Enzymes involved in this biosynthetic pathway include glucose-1-phosphate thymidylyltransferase (y4gH), dTDP-glucose-4,6-dehydratase (y4gF), dTDP-4-dehydrorhamnose-3,5-epimerase (y4gL), and dTDP-4-dehydrorhamnose reductase (y4gG). Rhamnose-rich lipopolysaccharides (LPS) seem to be necessary for complete bacteroid development and nitrogen fixation (Krishnan et al., 1995). Perhaps the enzyme encoded by y4gI is needed for the synthesis of the rhamnose-rich LPS from dTDP-rhamnose.




Although not directly involved in the fixation process, mutation of the plasmid-borne copy of dctA (=dctA1, y4vF) also impairs nitrogen fixation (van Slooten et al., 1992). Other nif- and fix-ORFs are involved in elaboration of the electron-transfer complex (fixAB), in various cofactors required for nitrogen fixation (e.g. fixC, nifB, nifE, nifN, etc.), and in the synthesis of ferredoxins (fdxB, fdxN, fixX). Finally, those ORFs involved in the synthesis of the nitrogenase complex are also present. Amongst these are two functional copies of the nifKDH ORFs (y4vM to y4vK and y4xC to y4xA) (Badenoch-Jones et al., 1989). Additionally, 17 new ORFs located within the nitrogen fixation cluster (see FIG. 7; ORFs y4vC to y4vJ with the exception of dctA1, y4wA to y4wG, y4wI, y4wJ and y4xQ) are co-transcribed together with the ORFs homologous to known nif and fix genes. It thus seems likely that most ORFs necessary for bacteroid development and synthesis of the nitrogen-fixing complex are carried by pNGR234a.




Two types of regulatory elements which frequently occur in pNGR234a are the NodD- and NifA/sigma-54-dependent promoters. NodD-dependent promoter-like sequences known as nod boxes have been identified by homology search within intergenic regions, using the following consensus sequence: 5′-YATCCAYNNYRYRGATGNNNNYNATCNAAACAATCRATTTTACCAATCY-3′ [12 mismatches allowed (van Rhijn and Vanderleyden, 1993); Y=C or T, R=A or G, N=A,C,G or T]. Putative NifA-dependent promoters (Fischer, 1994) have been predicted by screening for the NifA activator sequence (5′-TGT-N₁₀-ACA-3′) together with the sigma-54 promoter consensus sequence (5′-TGGCAC-N₅-TTGCA/T-3′ with GG and GC as the most conserved doublets; 3 mismatches allowed) separated by 60 to 150 nucleotides. The identified conserved promoter-like sequences in pNGR234a are listed in Tables 5 and 6.
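The kind of consensus screen described above can be sketched as a mismatch-tolerant scan with IUPAC ambiguity codes. This is an illustration under stated assumptions, not the procedure actually used for Tables 5 and 6: the function names (`scan`, `nifa_promoters`) are hypothetical, the scan covers the whole input rather than intergenic regions only, "TTGCA/T" is written with the IUPAC code W, and the 60 to 150 nt separation is assumed to be measured from the start of the activator sequence to the start of the sigma-54 consensus in the direction of the strand.

```python
# IUPAC codes used by the consensus sequences quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "W": "AT", "N": "ACGT"}

NOD_BOX = "YATCCAYNNYRYRGATGNNNNYNATCNAAACAATCRATTTTACCAATCY"  # 49 nt
NIFA_UAS = "TGT" + "N" * 10 + "ACA"
SIGMA54 = "TGGCAC" + "N" * 5 + "TTGCW"  # W = A or T


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatches(window, pattern):
    """Count positions where the base is not allowed by the IUPAC code."""
    return sum(base not in IUPAC[code] for base, code in zip(window, pattern))


def scan(seq, pattern, max_mismatches):
    """Return hits as (start, end, strand, mismatches).

    Coordinates are 1-based, inclusive, and always given on the forward strand.
    """
    seq = seq.upper()
    n, hits = len(pattern), []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - n + 1):
            mm = mismatches(s[i:i + n], pattern)
            if mm <= max_mismatches:
                start = i + 1 if strand == "+" else len(seq) - i - n + 1
                hits.append((start, start + n - 1, strand, mm))
    return hits


def nifa_promoters(seq, min_dist=60, max_dist=150):
    """Pair exact NifA activator sites with sigma-54 consensus hits
    (up to 3 mismatches) on the same strand, 60-150 nt apart
    (start-to-start in strand orientation; an assumption)."""
    pairs = []
    for uas in scan(seq, NIFA_UAS, 0):
        for prom in scan(seq, SIGMA54, 3):
            if uas[2] != prom[2]:
                continue
            dist = prom[0] - uas[0] if uas[2] == "+" else uas[1] - prom[1]
            if min_dist <= dist <= max_dist:
                pairs.append((uas, prom))
    return pairs
```

A nod box screen with the threshold quoted in the text would then be `scan(sequence, NOD_BOX, 12)`. The consensus is 49 nt long, which agrees with the 49 nt spans listed for most entries in Table 5.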












TABLE 5
nod box-like sequences in pNGR234a

nod box | position in pNGR234a | orientation | number of mismatches to the consensus sequence | distance to the following ORF | name of the following ORF
1 | 4514-4562 |  | 11 | 504 | (fa1)
2 | 8481-8529 |  | 8 | 87 | nodZ
3 | 12322-12370 |  | 7 |  | ?#
4 | 97470-97518 |  | 6 | 277 | nolL
5 | 129615-129663 | + | 10 | 1358 | y4gE
6 | 141088-141136 | + | 8 | 890 | fixF
7 | 150280-150327 |  | 11 | 202 | noeE
8 | 158820-158868 |  | 4 | 235 | nodA
9 | 161891-161939 | + | 11 | 1103 | y4hM
10 | 169833-169881 |  | 7 | 117 | y4iR
11 | 278947-278995 |  | 7 | 153 | nodS
12 | 279821-279869 | + | 7 |  | ?#
13 | 443101-443149 |  | 10 | 465 | y4vC
14 | 473059-473107 | + | 9 | 236 | y4wH
15° | 481253-481301 |  | 16 | 117 | y4wM
16 | 493961-494009 | + | 6 | 288 | y4xI
17 | 532039-532087 | + | 5 | 589 | syrM2
18 | 256434-256482 | + | 12 | 329 | y4mC
19 | 469151-469199 | + | 12 | 112 | y4wE

°The majority of the mismatches is located in the 3′-terminal part of the sequence.
#No predicted ORF can be found downstream of the putative nod box.





















TABLE 6
Putative NifA-dependent promoters in pNGR234a

Nr. | NifA-dep. UAS*: position | sigma-54 promoter (−12/−24 region#): position | orientation | distance to the following ORF (nt) | name of the following ORF
1 | 90812-90827 | 90910-90924 | + | 127 | y4eD
2 | 162727-162742 | 162788-162802 | + | 240 | y4hM
3 | 235036-235051 | 234934-234948 |  | 66 | y4lD
4 | 255021-255036 | 255130-255144 | + | 306 | y4mB
5 | 285265-285280 | 285343-285357 | + | 50 | y4nG
6 | 436363-436378 | 436275-436289 |  | 41 | nifB
7 | 442046-442061 | 441955-441969 |  | 56 | fixA
8 | 442735-442750 | 442676-442690 |  | 40 | y4vC
9 | 444109-444124 | 443983-443997 |  | 104 | y4vD
10 | 444137-444152 | 444241-444299° | + | 38° | nifQ
11 | 451782-451799 | 451891-451905 | + | 88 | nifH1
12 | 460319-460334 | 460424-460438 | + | 63 | y4vR
13 | 463063-463078 | 463139-463153 | + | 48 | y4wA
14 | 478839-478854 | 478761-478775 |  | 463 | nifS
15° | 483663-483678 | 483769-483783 | + | 88 | nifH2

*“Upstream Activator Sequence”: NifA-binding site located 80 to 150 nt upstream of the transcription start point (5′-TGT-N₁₀-ACA-3′).
#Sequence corresponding to the consensus sequence of conserved sigma-54 promoters 12 nt upstream of the transcription start point: 5′-TGGCAC-N₅-TTGC-3′ (2 mismatches allowed).
°3 possibilities for a promoter (in two cases only corresponding to the minimal consensus: 5′-GG-N₁₀-GC-3′)













EXAMPLES




Example 1




General Methods




Bacteria and Plasmids






Escherichia coli was grown on SOC, in TB or in two-fold YT medium (Sambrook et al., 1989). The cosmid clones pXB296 and pXB110 (Perret et al., 1991) were raised in E. coli strain 1046 (Cami and Kourilsky, 1978). Subclones in M13mp18 vectors (Yanisch-Perron et al., 1985) were grown in E. coli strain DH5αF′IQ (Hanahan, 1983).




Construction of Cosmid Libraries




Cosmid DNA was prepared by standard alkaline lysis procedures followed by purification in CsCl gradients (Radloff et al., 1967). DNA fragments sheared by sonication of 10 μg of cosmid DNA were treated for 10 min at 30° C. with 30 units of mung bean nuclease (New England Biolabs, Beverly, Mass., USA), extracted with phenol/chloroform (1:1), and precipitated with ethanol. DNA fragments, ranging in size from 1 to 1.4 kbp, were purified from agarose gels using Geneclean II (Bio101, Vista, Calif., USA) and ligated into SmaI-digested M13mp18. Electroporation of aliquots of the ligation reaction into competent E. coli DH5αF′IQ was performed according to standard protocols (Dower et al., 1988; Sambrook et al., 1989).




M13 Template Preparation




Fresh 1 ml E. coli cultures in two-fold YT held in 96-deep-well microtiter plates (Beckman Instruments, Fullerton, Calif., USA) were infected with recombinant phages from white plaques grown on plates containing X-gal (5-bromo-4-chloro-indolyl-β-D-galactoside) and IPTG (isopropyl-β-thiogalactopyranoside). Rapid preparation of ˜0.5 μg of single-stranded M13 template DNA was carried out as follows: 190 μl portions of the phage cultures grown for 6 hr at 37° C. were transferred into 96-well microtiter plates. Lysis of the phages was obtained by adding 10 μl of 15% (w/v) SDS followed by 5 min incubation at 80° C. Template DNA was trapped using 10 μl (1 mg) of paramagnetic beads (Streptavidin MagneSphere Paramagnetic Particles Plus M13 Oligo, Promega, Madison, Wis., USA) and 50 μl of hybridization solution [2.5 M NaCl, 20% (w/v) polyethylene glycol (PEG-8000)] during an annealing step of 20 min at 45° C. Beads were pelleted by placing the microtiter plates on appropriate magnets and washed three times with 100 μl of 0.1-fold SSC. The DNA was recovered in 20 μl of water by a denaturation step of 3 min at 80° C. When required, larger amounts of single-stranded recombinant DNA (>10 μg) were purified using QIAprep 8 M13 Purification Kits (Qiagen, Hilden, Germany) from 3 ml of supernatant of phage cultures grown for 6 hr at 37° C.




Sequencing




Two sequencing methods were used: dye terminator and dye primer cycle sequencing, each in combination with AmpliTaq DNA polymerase (Perkin-Elmer) and Thermo Sequenase (Amersham). All reactions, including ethanol precipitation, were performed in microtiter plates. Reagents were pipetted using 12-channel pipettes. Where necessary, sequencing reaction mixtures, including enzymes, were pipetted into the plates in advance and held at −20° C. until needed.




Dye Terminator Cycle Sequencing




For dye terminator/AmpliTaq DNA polymerase sequencing, 0.5 μg of template DNA, and the PRISM Ready Reaction DyeDeoxy Terminator Cycle Sequencing Kit (Perkin-Elmer) were used. Cycle sequencing was performed in microtiter plates using 25 PCR cycles (30 sec at 95° C., 30 sec at 50° C., and 4 min at 60° C.). Prior to loading the amplified products on electrophoresis gels, unreacted dye terminators were removed using Sephadex columns scaled down to microtiter plates (Rosenthal and Charnock-Jones, 1993).




Dye terminator/Thermo Sequenase sequencing was performed using the same experimental conditions except that the reaction mix contained 16.25 mM Tris-HCl (pH 9.5), 4.0 mM MgCl₂, 0.02% (v/v) NP-40, 0.02% (v/v) Tween 20, 42 μM 2-mercaptoethanol, 100 μM dATP/dCTP/dTTP, 300 μM dITP, 0.017 μM A/0.137 μM C/0.009 μM G/0.183 μM T from Taq Dye Terminators (Perkin-Elmer; no. A5F034), 0.67 μM primer, 0.2-0.5 μg of template DNA, and 10 units of Thermo Sequenase (Amersham) in a 30 μl reaction volume. Unincorporated dye terminators were removed from reaction mixtures by precipitation with ethanol.




Dye Primer Cycle Sequencing




Dye primer/AmpliTaq DNA polymerase sequencing reactions were performed according to the instructions accompanying the Taq Dye Primer, 21M13 Kit (Perkin-Elmer). Cycle sequencing was carried out on 0.5 μg of template DNA with 19 PCR cycles (30 sec at 95° C., 30 sec at 50° C., and 90 sec at 72° C.) followed by six cycles, each consisting of 95° C. for 30 sec and 72° C. for 2.5 min. Prior to electrophoresis, the four base-specific reactions were pooled and precipitated with ethanol.




Identical PCR conditions and the Thermo Sequenase Fluorescent Labelled Primer Cycle Sequencing Kit (Amersham) were used for dye primer/Thermo Sequenase sequencing reactions.




Sequence Acquisition and Analysis




Gel electrophoresis and automatic data collection were performed with ABI 373A DNA sequencers (Perkin-Elmer). After removing cosmid vector and M13mp18 sequences from the shotgun sequence data, the data were assembled using the program XGAP (Dear and Staden, 1991) and edited against the fluorescent traces. To close remaining gaps, to make single-stranded regions double-stranded, and to clarify ambiguities, additional cycle sequencing reactions with selected shotgun templates were carried out using either custom-made primers (primer-walks) or universal primer.




The complete double-stranded DNA sequence of cosmid pXB296 was analyzed using programs from the Wisconsin Sequence Analysis Package (version 8, Genetics Computer Group, Madison, Wis., USA). Homology searches were performed with BLAST (version 1.4; Altschul et al., 1990) and FASTA (version 2.0; Pearson and Lipman, 1988). Several nucleotide and protein databases were screened (GenBank/Genpept, SwissProt, EMBL, and PIR). Identities and similarities between homologous amino acid sequences were calculated with the alignment program BESTFIT (Smith and Waterman, 1981).
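The identity percentages quoted throughout (for example the "aa-id." values) are computed from pairwise alignments. The sketch below shows one common way to derive a percent identity from two already-aligned sequences. It is not the BESTFIT algorithm and does not reproduce that program's exact scoring or reporting; the function name `percent_identity` and the convention of ignoring gapped columns are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the aligned, non-gap columns of two sequences.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length; columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment: 4 identical residues out of 5 compared columns.
    print(percent_identity("MKT-LV", "MRTALV"))
```

Other conventions (for example dividing by the full alignment length including gaps) give lower values, so identity figures are only comparable when the same convention is used.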




Example 2




Comparison of Fluorescent Traces Created by Different Cycle Sequencing Methods




When using a thermostable sequenase [Thermo Sequenase (Amersham)], the concentrations of dye terminators (Perkin-Elmer) can be reduced by 20- to 250-fold in comparison to the concentrations needed for Taq DNA polymerase without compromising the quality of the sequencing results (Table 7).




TABLE 7
Concentrations (in μM) of dye terminators in each cycle sequencing reaction with two different thermostable DNA polymerases

Dye terminator | AmpliTaq DNA polymerase | Thermo Sequenase DNA polymerase | Dilution factor for dye terminatorsᵃ
A Taq | 0.751 | 0.017 | 40
C Taq | 22.500 | 0.137 | 160
G Taq | 0.200 | 0.009 | 20
T Taq | 45.000 | 0.183 | 250

ᵃThermo Sequenase vs. AmpliTaq.

To compare the dye terminator and dye primer cycle sequencing procedures, representative templates derived from the pXB296 library were sequenced by both methods, each performed with Thermo Sequenase and Taq DNA polymerase (FIG. 1). In general, dye terminator traces do not contain the many compressions (on average, one compression every 50 bases in single reads) that are common with dye primers if mixes do not contain nucleotide analogues like deoxyinosine or 7-deaza-deoxyguanosine triphosphates or if sequencers are used without active heating systems. In addition, dye terminator traces obtained with Thermo Sequenase show more uniform signal intensities than those obtained with Taq DNA polymerase, thus resulting in a reduced number of weak and missing peaks (e.g. a weak G-signal following an A-signal in Thermo Sequenase traces or a weak C-signal following a G-signal in Taq DNA polymerase traces). Using ABI 373A sequencers, errors in automatic base-calling of Thermo Sequenase/dye terminator scans only arise after 300-350 bases. The average number of resolved bases in dye primer gels (378 bases) is 46 bases longer than in those produced with dye terminators (332 bases). Furthermore, in Thermo Sequenase/dye primer sequences the peaks are very regular and the number of stops and missing bases decreases in comparison to Taq DNA polymerase/dye primer electropherograms. The number of compressions, however, is not significantly reduced.
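The dilution factors in Table 7 can be checked against the listed concentrations with a few lines of arithmetic. The sketch below simply divides the AmpliTaq concentration by the Thermo Sequenase concentration for each dye terminator; the names `TABLE_7` and `dilution_factor` are hypothetical. The computed ratios suggest that the factors stated in the table are rounded values, which is an observation from the arithmetic rather than something the text states.

```python
# Concentrations in µM copied from Table 7:
# (AmpliTaq DNA polymerase, Thermo Sequenase, stated dilution factor).
TABLE_7 = {
    "A": (0.751, 0.017, 40),
    "C": (22.500, 0.137, 160),
    "G": (0.200, 0.009, 20),
    "T": (45.000, 0.183, 250),
}


def dilution_factor(amplitaq_um, thermo_um):
    """Fold reduction in dye terminator concentration (AmpliTaq / Thermo Sequenase)."""
    return amplitaq_um / thermo_um


if __name__ == "__main__":
    for base, (amplitaq, thermo, stated) in TABLE_7.items():
        computed = dilution_factor(amplitaq, thermo)
        print(f"{base}: computed {computed:.1f}-fold, table states {stated}-fold")
```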




Example 3




Shotgun Sequencing of Entire Cosmids Using Dye Terminators or Dye Primers




To compare the efficiency of both methods, cosmid pXB296 of pNGR234a was shotgun sequenced using a combination of dye terminators and thermostable sequenase (Thermo Sequenase), whereas another cosmid, pXB110, was sequenced using a combination of dye primers and Taq DNA polymerase (Table 1). Over 93% (736 clones) of 786 dye terminator reads of pXB296 were accepted by XGAP with a maximal alignment mismatch of 4%. By increasing this level to 25%, so that most of the remaining data could be included in the assembly, 775 reads led to three 6 to 10 kbp stretches of contiguous sequence (contigs), two of which were joined after editing. To close the last gap and to complete single-stranded regions with data derived from the opposite strand, only 32 additional dye terminator reads using custom-made primers were required. It took <1 week to assemble and finalize the 34,010 bp DNA sequence of pXB296 (EMBL accession no. Z68203; eight-fold redundancy; GC content, 58.5 mol %).




In contrast, only 308 (34%) of 899 shotgun reads obtained by Taq DNA polymerase/dye primer cycle sequencing of pXB110 were included in the first assembly (4% alignment mismatch). At the 25% alignment mismatch level, 879 reads were assembled, leading to 25 short contigs (1-2 kbp). These contigs had to be edited extensively in order to join most of them. “Primer walks”, covering gaps and complementing single-stranded regions, were not sufficient to clarify all the remaining ambiguities in the assembled sequence. Every 100-150 bp, a compression in one strand could not be resolved by sequence data from the complementary strand. Therefore, it was necessary to resequence clones using dye terminators and universal primer. In total, 191 additional dye terminator reads had to be created. As a result, assembling and finalizing the 34,573 bp sequence of pXB110 (10.5-fold redundancy; GC content, 58.3 mol %) took much more time than pXB296 did.
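The read statistics quoted in this example can be reproduced with simple arithmetic. The sketch below recomputes the fraction of reads accepted in the first assembly for both cosmids and gives a rough estimate of the fold redundancy for pXB296. The redundancy estimate is an assumption-laden approximation: it supposes that every assembled read (775 shotgun reads plus 32 primer-walk reads) contributes the average of 332 resolved bases reported for dye terminator gels in Example 2. The function names are hypothetical.

```python
def fraction_accepted(accepted_reads, total_reads):
    """Fraction of shotgun reads accepted into the first assembly."""
    return accepted_reads / total_reads


def fold_redundancy(n_reads, mean_read_length, target_length):
    """Rough sequencing redundancy: total bases read divided by target length."""
    return n_reads * mean_read_length / target_length


if __name__ == "__main__":
    # First assembly at 4% alignment mismatch.
    print(f"pXB296: {fraction_accepted(736, 786):.1%} of reads accepted")
    print(f"pXB110: {fraction_accepted(308, 899):.1%} of reads accepted")
    # pXB296: 775 assembled shotgun reads + 32 custom-primer reads,
    # assumed average of 332 resolved bases per read, 34,010 bp target.
    print(f"pXB296 redundancy: about {fold_redundancy(775 + 32, 332, 34010):.1f}-fold")
```

Under these assumptions the estimate for pXB296 comes out close to the eight-fold redundancy stated in the text.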




Example 4




Analysis of Cosmid pXB296




Putative ORFs were located on the 34,010 bp sequence of pXB296 using the programs TESTCODE (Fickett, 1982) and CODONPREFERENCE (Gribskov et al., 1984), the latter in combination with a codon frequency table based on previously sequenced genes of Rhizobium sp. NGR234 (as well as the closely related R. fredii). All 28 ORFs and their deduced amino acid sequences exhibited significant homologies to known genes and/or proteins. The positions of the ORFs along pXB296, as well as the best homologues, are displayed in Table 2 and FIG. 2. Ribosomal binding site-like sequences (Shine and Dalgarno, 1974) precede each putative ORF except for ORF9 (position 11,214-12,455). If one disregards the homology to known glutamate dehydrogenases in the first 32 amino acids deduced from this ORF, a downstream alternative start codon (position 11,220) preceded by a Shine-Dalgarno sequence can be identified. Most of the ORFs are organised in five clusters (ORFs with only short intergenic spaces or overlaps between them). Cluster I, containing ORF1 to ORF5, encodes proteins homologous to trans-membrane and membrane-associated oligopeptide permease proteins and to a Bacillus anthracis encapsulation protein. Cluster II includes ORF6 and ORF7, which are homologous to aminotransferase and (semi)aldehyde dehydrogenase genes. Homologies to transposase genes [ORF8; cluster III (ORF10 and ORF11)] and to various nif and fix genes [cluster IV (ORF12 to ORF20); ORF23, part of cluster V] are also reported.
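
As an illustration only, the kind of check described above (an open reading frame preceded by a ribosomal binding site-like sequence) can be sketched in a few lines of Python. This is not the TESTCODE or CODONPREFERENCE procedure used for pXB296. The function names, the ATG-only start rule, the list of Shine-Dalgarno-like cores and the 4-18 base upstream window are all assumptions chosen for the sketch, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumed purine-rich cores resembling a Shine-Dalgarno sequence.
SD_CORES = ("AGGAGG", "GGAGG", "AGGAG", "GGAG", "GAGG", "AGGA")


def find_orfs(seq, min_codons=30):
    """Return 1-based inclusive (start, end) pairs of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included in the coordinates).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None
    return sorted(orfs)


def has_sd_like_site(seq, orf_start, window=(4, 18)):
    """True if an SD-like core lies 4-18 bases upstream of a 1-based start."""
    seq = seq.upper()
    lo = max(0, orf_start - 1 - window[1])
    hi = max(0, orf_start - 1 - window[0])
    return any(core in seq[lo:hi] for core in SD_CORES)


# Hypothetical toy sequence, not taken from pXB296.
toy = "CCAGGAGGTTTACC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
for start, end in find_orfs(toy, min_codons=3):
    print(start, end, has_sd_like_site(toy, start))
```

A real analysis would also consider alternative start codons, the reverse strand and codon usage statistics, which this sketch deliberately leaves out.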




Presumed promoter and stem-loop sequences that might represent ρ-independent terminator-like structures (Platt, 1986) are shown in FIG. 2. Significant σ54-dependent promoter consensus sequences (5′-TGGCACG-N4-TTGC-3′; Morett and Buck, 1989), as well as nifA upstream activator sequences (5′-TGT-N10-ACA-3′; Morett and Buck, 1988), are found upstream of the nifB homologue ORF15, the fixA homologue ORF20, ORF21, ORF22, and ORF23. ORF23 is part of cluster V in pXB296, which includes the dctA gene of Rhizobium sp. NGR234 (van Slooten et al., 1992). Surprisingly, the published dctA sequence shows important discrepancies. Therefore, a fragment encompassing this locus was amplified by PCR using NGR234 genomic DNA as template. By sequencing this fragment, the cosmid sequence of the present invention was confirmed.
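
The two consensus sequences quoted above translate directly into regular expressions, with N4 and N10 read as any four and any ten bases. The following Python sketch shows one way such sites could be located. The function name and the test fragment are hypothetical, exact matching on the forward strand is an assumption, and no claim is made that the sites shown in FIG. 2 were found this way.

```python
import re

# Consensus patterns as quoted in the text; N = any base.
SIGMA54_PROMOTER = re.compile(r"TGGCACG[ACGT]{4}TTGC")
NIFA_UAS = re.compile(r"TGT[ACGT]{10}ACA")


def find_motifs(sequence, pattern):
    """Return 1-based start positions of all matches, overlapping ones included."""
    # A zero-width lookahead reports overlapping occurrences as well.
    lookahead = re.compile(f"(?={pattern.pattern})")
    return [m.start() + 1 for m in lookahead.finditer(sequence.upper())]


# Hypothetical fragment: a UAS-like site followed by a promoter-like site.
fragment = ("CC" + "TGT" + "A" * 10 + "ACA" + "GGG"
            + "TGGCACG" + "AAAA" + "TTGC" + "AT")

print(find_motifs(fragment, NIFA_UAS))          # [3]
print(find_motifs(fragment, SIGMA54_PROMOTER))  # [22]
```

Searching the reverse complement in the same way, and allowing a small number of mismatches, would be natural extensions.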




Example 5




Analysis of the Complete Plasmid pNGR234a




Using the thermostable sequenase/dye terminator cycle sequencing method herein described, 20 overlapping cosmids (including pXB296) of the symbiotic plasmid pNGR234a of Rhizobium sp. NGR234 were sequenced, together with two PCR products and a subcloned DNA fragment derived from cosmid pXB564 that cover two remaining gaps (position 276,448-277,944 and position 480,607-483,991). The map of the sequenced cosmids is shown in FIG. 4. The entire assembled 536 kb sequence of pNGR234a is given in FIG. 3 (deposited in EMBL/GenBank under accession no. U00090).




The analysis of the complete nucleotide sequence revealed few regions of 98-100% identity to already published sequences in public databases. These sequences, listed in Table 8, had been derived from Rhizobium sp. NGR234, from its derivatives, or from closely related strains. Therefore, the ORFs and their deduced proteins, 98-100% homologous to nifH, nodA, nodB, nodC, nodD1, nodS, nodU, nolX, nolW, nolB, nolU and “ORF1”, represent already known genes/proteins (Table 8 and References). Some other ORFs and their deduced proteins, nearly identical to public database entries, were either only partially known before the disclosure of the present invention or exhibited significant differences, for instance, dctA, host-inducible gene A, nifD, nifK, nodD2, nolT, nolX, nolV, “ORF140”, “ORF91”, “RFRS9 25 kDa-protein gene” (Table 8 and References).
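
To make the 98-100% criterion concrete, the following Python sketch computes percent identity for two sequences that have already been aligned and flags pairs at or above the threshold. It is a simplified illustration, not the database comparison performed for Table 8: the function names are hypothetical, the input is assumed to be pre-aligned with "-" as the gap character, and gap columns are skipped rather than counted as mismatches, which is only one of several common conventions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of non-gap alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped in this simplified definition
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def is_previously_known(identity, threshold=98.0):
    """Apply the 98-100% cut-off used in the text to one identity value."""
    return identity >= threshold


# Hypothetical aligned pair: 8 matches in 9 compared columns.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGTT")
print(round(identity, 1), is_previously_known(identity))
```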




As a first step, approximately 100 kb of pNGR234a, between positions 417,796 and 517,279, was analyzed using the programs TESTCODE (Fickett, 1982) and CODONPREFERENCE (Gribskov et al., 1984). In this initial ˜100 kb of sequence, 76 ORFs were found and ascribed putative functions (ORFs y4tQ to y4yO, excluding ORFs y4uD, y4uG, y4wG, y4wO, y4wP, y4xF, y4xQ, y4xG and y4yB and excluding ORF-fragments fu1, fu2, fu3, fu4, fv1 and fw1; see Table 3). It should be noted that since the sequence of cosmid pXB296 forms part of this 100 kb region, all of the ORFs identified in Table 2 (except “ORF1”) are reproduced (albeit with minor, but definitive, revisions) in Table 3. Most of the 76 ORFs and their deduced proteins showed homologies to public database entries that could help identify their putative functions. Only ORFs y4vK and y4xA (duplicated nifH) as well as y4yD, y4yE and y4yG (nolW, nolB and nolU) were identical to database entries (98-100% homology). In the case of 7 ORFs and their deduced proteins, no homologous sequences in public databases have been found.












TABLE 8

All ORFs that show 98-100% identity in the nucleotide sequence to ORFs located in pNGR234a and that have already been published in databases:

ORF | Organism | EMBL/GenBank accession no. | + | Claimed in the patent application / not claimed in the patent application
dctA | Rhizobium sp. NGR234 | S38912 | + | sequencing mistakes in the database entry: the real dctA in pNGR234a is 144 bases longer (see table 4)
host inducible geneA | Rhizobium fredii USDA 201# | M19019 RFIND | + | significant difference in pNGR234a (frameshift; see table 4)
nifH | Rhizobium sp. ANU 240* | M26961 RHMNIFKDH3 | |
nifD (partially) | Rhizobium sp. ANU 240* | M26961 RHMNIFKDH2 | + | only part of nifD is in the public database
nifK (partially) | Rhizobium sp. ANU 240* | M26961 RHMNIFKDH1 | + | only part of nifK is in the public database
nodABC | Rhizobium fredii USDA 257# | M73362 RSNOD2 | |
nodD1 | Rhizobium sp. mpik 3030* | Y00059 RSNODD1 | |
nodD2 | Rhizobium japonicum USDA 191# | M18972 RHMNODD2M | + | significantly different function of NodD2 in NGR234 than in USDA 191 (despite of 98% identity°)
nodS | Rhizobium sp. NGR234 | J03686 NGRNODSU | |
nodU (partially) | Rhizobium sp. NGR234 | J03686 NGRNODSU | |
nodU (full) | Rhizobium sp.* | X89965 RSNODUGEN | |
nolXWBTUV | Rhizobium fredii USDA 257# | L12251 RHMNOLBTU | | nolXWB, nolU
 | | | + | NolT: 97% identical (amino acid sequence level)
 | | | + | NolX, NolV + ORF4 of pNGR234a show significant differences to USDA257 (see table 4)
ORF1; ORF2 (partially) | Rhizobium sp. NGR234 | X74314 RSORF | |
ORF140 nodulation gene; ORF91 (partially) | Rhizobium sp. NGR234 | X74068 RSPLAS | + | database entry includes sequencing mistakes causing frameshifts
RFRS9 25 kDa protein gene* | Rhizobium fredii USDA 257# | U18764 RFU18764 | + | repetitive element in pNGR234a showing insertions, deletions of nucleotides in comparison to the database entry

*strains representing derivatives of NGR234: Rhizobium sp. ANU 240, Rhizobium sp. mpik 3030, Rhizobium sp.
#strains closely related to NGR234: Rhizobium fredii USDA 257, Rhizobium japonicum USDA 191, Rhizobium fredii USDA 201.
°identity in nucleotide sequence as well as amino acid sequence













As a second step, the remaining 436 kb of pNGR234a were analyzed using the methods noted above. The results of this analysis are discussed in Example 6.




Example 6




Genetic Organization of the Complete Plasmid pNGR234a




In order to confirm and to improve the identification of probable coding regions in pNGR234a, the program GeneMark was used, which is based on matrices developed for organisms related to Rhizobium sp. NGR234 (R. leguminosarum and R. meliloti; Borodovsky et al., 1994). The use of this program currently represents the most frequently applied method to distinguish coding and non-coding regions in newly sequenced DNA of prokaryotes. Further analysis of the putative ORF products was carried out using methods to detect signal sequences, transmembrane segments and various other domains (PROSITE database search (Bairoch et al., 1995); PSORT program (Nakai et al., 1991)).




In total, 416 ORFs were predicted to encode putative proteins (Freiberg et al., 1997). Additionally, 67 fragments were detected that seemed to be remnants of functional ORFs. Some of these were disrupted by insertion of mobile elements. All identified functional ORFs and fragments of former functional ORFs are listed in Table 3.




Within the initial ˜100 kb region (position 417,796 to 517,279) first analyzed in this study, 9 ORFs (y4uD, y4uG, y4wG, y4wO, y4wP, y4xF, y4xQ, y4xG and y4yB) and 6 ORF-fragments (fu1, fu2, fu3, fu4, fv1 and fw1) were predicted in addition to the 76 ORFs (y4tQ to y4yO) listed within Table 3.




According to Table 8, 12 ORFs of the 416 predicted coding regions were identical to public database entries (98% to 100% homology at the amino acid level), namely: y4hI (nodA), y4hH (nodB), y4hG (nodC), y4aL (nodD1), y4nC (nodS), y4nB (nodU), y4sM (ORF1), y4vK (nifH1), y4xA (nifH2), y4yD (nolW), y4yE (nolB), y4yG (nolU). In addition, the database entry of the homologue to y4yC (nolX) has been corrected and is now 98% identical to y4yC. Furthermore, the sequence of the ORF y4hB (noeE) has been available to the public since October 1996. Apart from these 14 ORFs, the remaining 402 ORFs are new: 139 of them show no homology to any known ORF/protein, and the others exhibit less than 98% amino acid identity to public database entries over their whole length.
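
The ORF counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below does so; the variable names are illustrative, and the final figure (ORFs with partial homology) is derived here by subtraction rather than stated in the text.

```python
predicted_orfs = 416            # ORFs predicted to encode putative proteins
identical_to_database = 12      # 98-100% identical entries named in the text
corrected_entry = 1             # y4yC (nolX), database entry since corrected
published_1996 = 1              # y4hB (noeE), public since October 1996

previously_known = identical_to_database + corrected_entry + published_1996
new_orfs = predicted_orfs - previously_known

no_homologue = 139              # new ORFs without any known homologue
partial_homology = new_orfs - no_homologue  # derived, not stated in the text

print(previously_known, new_orfs, partial_homology)  # 14 402 263
```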




Industrial Applicability




The present invention provides a detailed analysis of the symbiotic plasmid pNGR234a of Rhizobium sp. NGR234. The plasmid pNGR234a (including any ORFs encoded therein, or any part of the nucleotide sequence of the plasmid, or any proteins expressible from any of said ORFs or any part of said nucleotide sequence) has industrial applicability which can include its use in, inter alia, the following areas:




(a) the analysis of the structure, organisation or dynamics of other genomes;




(b) the screening, subcloning, or amplification by PCR of nucleotide sequences;




(c) gene trapping;




(d) the identification and classification of organisms and their genetic information;




(e) the identification and characterisation of nucleotide sequences, amino acid sequences or proteins;




(f) the transportation of compounds to and from an organism which is host to at least one of said nucleotide sequences, ORFs or proteins;




(g) the degradation and/or metabolism of organic, inorganic, natural or xenobiotic substances in a host organism;




(h) the modification of the host-range, nitrogen fixation abilities, fitness or competitiveness of organisms;




(i) obtaining a synthetic minimal set of ORFs required for functional Rhizobium-legume symbiosis;




(j) the modification of the host-range of rhizobia;




(k) the augmentation of the fitness or competitiveness of Rhizobium sp. NGR234 in the soil and its nodulation efficiency on host plants;




(l) the introduction of desired phenotype(s) into host plants using said plasmid as a stable shuttle system for foreign DNA encoding said desired phenotype(s); or




(m) the direct transfer of said plasmid into rhizobia or other microorganisms without using other vectors for mobilization.




REFERENCES




Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.




Appelbaum, E. R., D. V. Thompson, K. Idler and N. Chartrain. 1988. Rhizobium japonicum USDA 191 has two nodD genes that differ in primary structure and function. J. Bacteriol. 170: 12-20.




Badenoch-Jones, J., T. A. Holton, C. M. Morrison, K. F. Scott and J. Shine. 1989. Structural and functional analysis of nitrogenase genes from the broad host-range Rhizobium strain ANU240. Gene 77: 141-153.




Bender, G. L., M. Nayudu, K. K. L. Strange and B. G. Rolfe. 1988. The nodD1 gene from Rhizobium strain NGR234 is a key determinant in the extension of host-range to the non-legume Parasponia. Mol. Plant-Microbe Interact. 1: 259.




Bodmer, W. F. 1994. The Human Genome Project. Rev. Invest. Clin. (Suppl.) 3-5.




Broughton, W. J., M. J. Dilworth, and I. K. Passmore. 1972. Base ratio determination using unpurified DNA. Anal. Biochem. 46: 164-172.




Broughton, W. J., N. Heycke, H. Meyer z. A., and C. E. Pankhurst. 1984. Plasmid-linked nif and “nod” genes in fast growing rhizobia that nodulate Glycine max, Psophocarpus tetragonolobus, and Vigna unguiculata. Proc. Natl. Acad. Sci. USA 81: 3093-3097.




Broughton, W. J., C.-H. Wong, A. Lewin, U. Samrey, H. Myint, H. Meyer z. A., D. N. Dowling, and R. Simon. 1986. Identification of Rhizobium plasmid sequences involved in recognition of Psophocarpus, Vigna, and other legumes. J. Cell Biol. 102: 1173-1182.




Buikema, W. J., W. W. Szeto, P. V. Lemley, W. H. Orme-Johnson, and F. M. Ausubel. 1985. Nitrogen fixation specific regulatory genes of Klebsiella pneumoniae and Rhizobium meliloti share homology with the general nitrogen regulatory gene ntrC of K. pneumoniae. Nucleic Acids Res. 13: 4539-4555.




Cami, B. and P. Kourilsky. 1978. Screening of cloned recombinant DNA in bacteria by in situ colony hybridization. Nucleic Acids Res. 5: 2381-2390.




Craxton, M. 1993. Cosmid sequencing. Methods Mol. Biol. 23: 149-167.




Dear, S. and R. Staden. 1991. A sequence assembly and editing program for efficient management of large projects. Nucleic Acids Res. 19: 3907-3911.




Davis, E. O. and A. W. B. Johnston. 1990. Regulatory functions of the 3 nodD genes of Rhizobium leguminosarum bv. phaseoli. Mol. Microbiol. 4: 933-941.




Dower, W. J., J. F. Miller, and C. W. Ragsdale. 1988. High efficiency transformation of E. coli by high voltage electroporation. Nucleic Acids Res. 16: 6127-6145.




Fellay, R., P. Rochepeau, B. Relić, and W. J. Broughton. 1995. Signals to and emanating from Rhizobium largely control symbiotic specificity. In Pathogenesis and host specificity in plant diseases. Histopathological, biochemical, genetic, and molecular bases (ed. U. S. Singh, R. P. Singh, and K. Kohmoto), Vol. I, pp. 199-220. Pergamon/Elsevier Science Ltd., Oxford, U.K.




Fickett, J. W. 1982. Recognition of protein coding regions in DNA sequences. Nucleic Acids Res. 10: 5303-5318.




Fischer, H.-M. 1994. Genetic regulation of nitrogen fixation in Rhizobia. Microbiol. Rev. 58: 352-386.




Fisher, R. F. and S. R. Long. 1993. Interactions of NodD at the nod box: NodD binds to two distinct sites on the same face of the helix and induces a bend in the DNA. J. Mol. Biol. 233: 336-348.




Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496-512.




Fraser, C. M., J. D. Gocayne, O. White, M. D. Adams, R. A. Clayton, R. D. Fleischmann, C. J. Bult, A. R. Kerlavage, G. Sutton, J. M. Kelley, et al. 1995. The minimal gene complement of Mycoplasma genitalium. Science 270: 397-403.




Freiberg, C., X. Perret, W. J. Broughton and A. Rosenthal. 1996. Sequencing the 500-kb GC-rich symbiotic replicon of Rhizobium sp. NGR234 using dye terminators and a thermostable sequenase: A beginning. Genome Research, in press.




Gribskov, M., J. Devereux, and R. R. Burgess. 1984. The codonpreference plot: Graphic analysis of protein coding sequences and prediction of gene expression. Nucleic Acids Res. 12: 539-549.




Hanahan, D. 1983. Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol. 166: 557-580.




Hartl, D. L. and M. J. Palazzolo. 1993. Drosophila as a model organism in genome analysis. In Genome research in molecular medicine and virology (ed. K. W. Adolf), pp. 115-129. Academic Press, Orlando, Fla., U.S.A.




Hiles, I. D., M. P. Gallagher, D. J. Jamieson, and C. F. Higgins. 1987. Molecular characterization of the oligopeptide permease of Salmonella typhimurium. J. Mol. Biol. 195: 125-142.




Iismaa, S. E., P. M. Ealing, K. F. Scott, and J. M. Watson. 1989. Molecular linkage of the nif/fix and nod gene regions in Rhizobium leguminosarum biovar trifolii. Mol. Microbiol. 3: 1753-1764.




Levy, J. 1994. Sequencing the yeast genome: An international achievement. Yeast 10: 1689-1706.




Lewin, A., E. Cervantes, C.-H. Wong and W. J. Broughton. 1990. nodSU, two new nod genes of the broad host range Rhizobium strain NGR234 encode host-specific nodulation of the tropical tree Leucaena leucocephala. Mol. Plant-Microbe Interact. 3: 317-326.




Long, S. R. 1989. Rhizobium-legume nodulation: life together in the underground. Cell 56: 203-214.




Long, S., J. W. Reed, J. Himawan and G. C. Walker. 1988. Genetic analysis of a cluster of genes required for synthesis of the calcofluor-binding exopolysaccharide of Rhizobium meliloti. J. Bacteriol. 170: 4239-4248.




Makino, S.-I., I. Uchida, N. Terakado, C. Sasakawa, and M. Yoshikawa. 1989. Molecular characterization and protein analysis of the cap region, which is essential for encapsulation in Bacillus anthracis. J. Bacteriol. 171: 722-730.




Martinez, E., D. Romero, and R. Palacios. 1990. The Rhizobium genome. Crit. Rev. Plant Sci. 9: 59-93.




Morett, E. and M. Buck. 1988. NifA-dependent in vivo protection demonstrates that the upstream activator sequence of nif promoters is a protein binding site. Proc. Natl. Acad. Sci. USA 85: 9401-9405.




Morett, E. and M. Buck. 1989. In vivo studies on the interaction of RNA polymerase-σ54 with the Klebsiella pneumoniae and Rhizobium meliloti nifH promoters: The role of nifA in the formation of an open promoter complex. J. Mol. Biol. 210: 65-77.




Padmanabhan, S., R.-D. Hirtz, and W. J. Broughton. 1990. Rhizobia in tropical legumes: Cultural characteristics of Bradyrhizobium and Rhizobium sp. Soil Biol. Biochem. 22: 23-28.




Pearson, W. R. and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. 85: 2444-2448.




Perego, M., C. F. Higgins, S. R. Pearce, M. P. Gallagher, and J. A. Hoch. 1991. The oligopeptide transport system of Bacillus subtilis plays a role in the initiation of sporulation. Mol. Microbiol. 5: 173-185.




Perret, X., W. J. Broughton, and S. Brenner. 1991. Canonical ordered cosmid library of the symbiotic plasmid of Rhizobium species NGR234. Proc. Natl. Acad. Sci. USA 88: 1923-1927.




Perret, X., R. Fellay, A. J. Bjourson, J. E. Cooper, S. Brenner, and W. J. Broughton. 1994. Subtraction hybridization and shotgun sequencing: A new approach to identify symbiotic loci. Nucleic Acids Res. 22: 1335-1341.




Platt, T. 1986. Transcription termination and regulation of gene expression. Annu. Rev. Biochem. 55: 339-372.




Radloff, R., W. Bauer, and J. Vinograd. 1967. A dye-buoyant-density method for the detection and isolation of closed circular duplex DNA: The closed circular DNA in HeLa cells. Proc. Natl. Acad. Sci. USA 57: 1514-1521.




Relić, B., X. Perret, M. T. Estrada-García, J. Kopcinska, W. Golinowski, H. B. Krishnan, S. G. Pueppke and W. J. Broughton. 1994. Nod factors of Rhizobium are a key to the legume door. Mol. Microbiol. 13: 171-178.




Rosenthal, A. and D. S. Charnock-Jones. 1993. Linear amplification sequencing with dye terminators. Methods Mol. Biol. 23: 281-296.




Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: A laboratory manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., U.S.A.




Shine, J. and L. Dalgarno. 1974. The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: Complementary to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. 71: 1342-1346.




Smith, T. F. and M. S. Waterman. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147: 195-197.




Stanfield, S., L. Ielpi, D. O'Brochta, D. R. Helinski and G. S. Ditta. 1988. The ndvA gene product of Rhizobium meliloti is required for Beta(1-2)glucan production and has homology to the ATP binding export protein HlyB. J. Bacteriol. 170: 3523-3530.




Sulston, J., Z. Du, K. Thomas, R. Wilson, L. Hillier, R. Staden, N. Halloran, P. Green, J. Thierry-Mieg, L. Qiu, et al. 1992. The C. elegans genome sequencing project: A beginning. Nature 356: 37-41.




Tabor, S. and C. C. Richardson. 1995. A single residue in DNA polymerases of the Escherichia coli DNA polymerase I family is critical for distinguishing between deoxy- and dideoxyribonucleotides. Proc. Natl. Acad. Sci. 92: 6339-6343.




van Rhijn, P. and J. Vanderleyden. 1995. The Rhizobium-plant symbiosis. Microbiol. Rev. 59: 124-142.




van Slooten, J. C., T. V. Bhuvaneswari, S. Bardin, and J. Stanley. 1992. Two C4-dicarboxylate transport systems in Rhizobium sp. NGR234: Rhizobial dicarboxylate transport is essential for nitrogen fixation in tropical legume symbioses. Mol. Plant-Microbe Interact. 5: 179-186.




Yanisch-Perron, C., J. Vieira, and J. Messing. 1985. Improved M13 phage cloning vectors and host strains: Nucleotide sequences of M13mp18 and pUC19 vectors. Gene 33: 103-119.




Bairoch, A., P. Bucher, and K. Hofmann. 1995. The PROSITE database, its status in 1995. Nucleic Acids Res. 24: 189.




Borodovsky, M. Y., K. E. Rudd and E. V. Koonin. 1994. Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Res. 22: 4756.




Broughton, W. J., U. Samrey, and J. Stanley. 1987. Ecological genetics of Rhizobium meliloti: symbiotic plasmid transfer in the Medicago sativa rhizosphere. FEMS Microbiol. Lett. 40: 251.




Fellay, R., X. Perret, V. Viprey, W. J. Broughton, and S. Brenner. 1995a. Organization of host-inducible transcripts on the symbiotic plasmid of Rhizobium sp. NGR234. Mol. Microbiol. 16: 657.




Freiberg, C., R. Fellay, A. Bairoch, W. J. Broughton, A. Rosenthal, and X. Perret. 1997. Molecular basis of symbiosis between Rhizobium and legumes. Nature 387: 394-401.




Gray, J. X., M. A. Djordjevic, and B. G. Rolfe. 1990. Two genes that regulate exopolysaccharide production in Rhizobium sp. strain NGR234: DNA sequences and resultant phenotypes. J. Bacteriol. 172: 195.




Hanin, M., S. Jabbouri, D. Quesada-Vincens, C. Freiberg, X. Perret, J.-C. Prome, W. J. Broughton, and R. Fellay. 1996. Sulphatation of Rhizobium sp. NGR234 Nod factors is dependent on noeE, a new host-specificity gene. Mol. Microbiol., in press.




Krishnan, H. B., C.-I. Kuo, and S. G. Pueppke. 1995. Elaboration of flavonoid-induced proteins by the nitrogen-fixing soybean symbiont Rhizobium fredii is regulated by both nodD1 and nodD2, and is dependent on the cultivar-specificity locus, nolXWBTUV. Microbiology 141: 2245.




Morrison, N. A., C. Y. Hau, M. J. Trinick, J. Shine and B. G. Rolfe. 1983. Heat curing of a sym plasmid in a fast-growing Rhizobium sp. that is able to nodulate legumes and the nonlegume Parasponia sp. J. Bacteriol. 153: 427.




Nakai, K. and M. Kanehisa. 1991. Expert system for predicting protein localization sites in Gram-negative bacteria. Proteins: Structure, Function, and Genetics 11: 95-110.




Piper, K. R., S. Beck von Bodman, and S. K. Farrand. 1993. Conjugation factor of Agrobacterium tumefaciens regulates Ti plasmid transfer by autoinduction. Nature 362: 448.




Sullivan, J. T., H. N. Patrick, W. L. Lowther, D. B. Scott, and C. W. Ronson. 1995. Nodulating strains of Rhizobium loti arise through chromosomal symbiotic gene transfer in the environment. Proc. Natl. Acad. Sci. 92: 8985.




van Slooten, J. C., E. Cervantes, W. J. Broughton, C.-H. Wong, and J. Stanley. 1990. Sequence and analysis of the rpoN sigma factor gene of Rhizobium sp. strain NGR234. J. Bacteriol. 172: 5563.








Zhang, L.-H., P. J. Murphy, A. Kerr, and M. E. Tate. 1993. Agrobacterium conjugation and gene regulation by N-acyl-L-homoserine lactones. Nature 362: 446.














SEQUENCE LISTING











The patent contains a lengthy “Sequence Listing” section. A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=06475793B1). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).












Claims
  • 1. An isolated polynucleotide open reading frame (ORF) y4gA to y4gN derived from the polynucleotide sequence of SEQ ID NO: 1 at nucleotide base numbers 142,026 to 143,234 and degenerate variants thereof.
  • 2. The ORF of claim 1 which is under the control of its natural regulatory elements.
  • 3. A plasmid which harbours the ORF of claim 1 or any degenerate variant thereof.
  • 4. The plasmid of claim 3 produced recombinantly.
  • 5. A method for transforming a microorganism, comprising the step of transforming the microorganism with the plasmid of claim 3.
  • 6. A method for transforming a plant, comprising the step of transforming the plant with a shuttle vector comprising the plasmid of claim 3.
  • 7. A method for transforming a plant, comprising the step of transforming the plant with the plasmid of claim 3.
  • 8. A transgenic plant, transformed with the ORF of claim 1.
  • 9. A transgenic microorganism, transformed with the ORF of claim 1.
Priority Claims (2)
Number Date Country Kind
96730001 Jul 1996 EP
9710395 May 1997 GB
PCT Information
Filing Document Filing Date Country Kind
PCT/IB97/00950 WO 00
Publishing Document Publishing Date Country Kind
WO98/02560 1/22/1998 WO A
Foreign Referenced Citations (2)
Number Date Country
0 211 662 Feb 1987 EP
WO 9400466 Jan 1994 WO
Non-Patent Literature Citations (13)
Entry
Aguilar et al. J. Bacteriol. vol. 169(12):5393-5400, 1987.*
Appelbaum, E.R. et al., “Rhizobium japonicum USDA 191 has Two nodD Genes that Differ in Primary Structure and Function,” J. Bacteriology, 1988, 170(1), 12-20.
Badenoch-Jones, J. et al., “Structural and functional analysis of nitrogenase genes from the broad-host-range Rhizobium strain ANU240,” Gene, 1989, 77, 141-153.
Fellay, R. et al., “Organization of host-inducible transcripts on the symbiotic plasmid of Rhizobium sp. NGR234,” Mol. Microbiol., 1995, 16(4), 657-667.
Freiberg, C. et al., “Sequencing the 500-kb GC-rich Symbiotic Replicon of Rhizobium sp. NGR234 Using Dye Terminators and a Thermostable “Sequenase”: A Beginning,” Genome Res., 1996, 6(7), 590-600.
Freiberg, C. et al., “Molecular basis of symbiosis between Rhizobium and legumes,” Nature, 1997, 387, 394-401.
Horvath, B. et al., “Host-specific regulation of nodulation genes in Rhizobium is mediated by a plant-signal, interacting with the nodD gene product,” EMBO J., 1987, 6(4), 841-848.
Lewin, A. et al., “nodSU, Two New nod Genes of the Broad Host Range Rhizobium Strain NGR234 Encode Host-Specific Nodulation of the Tropical Tree Leucaena leucocephala,” Mol. Plant-Microbe Interactions, 1990, 3(5), 317-326.
Meinhardt, L.W. et al., “Molecular cloning and characterization of a sym plasmid locus that regulates cultivar-specific nodulation of soybean by Rhizobium fredii USDA257,” Mol. Microbiol., 1993, 9(1), 17-29.
Nayudu, M. et al., “Analysis of R-primes demonstrates that genes for broad host range nodulation of Rhizobium strain NGR234 are dispersed on the Sym plasmid,” Mol. Gen. Genet., 1987, 206, 326-337.
Perret, X. et al., “Subtraction hybridisation and shot-gun sequencing: a new approach to identify symbiotic loci,” Nucl. Acids Res., 1994, 22(8), 1335-1341.
Perret, X. et al., “Canonical ordered cosmid library of the symbiotic plasmid of Rhizobium species NGR234,” Proc. Natl. Acad. Sci. USA, 1991, 88, 1923-1927.
van Slooten, J.C. et al., “Two C4-Dicarboxylate Transport Systems in Rhizobium sp. NGR234: Rhizobial Dicarboxylate Transport is Essential for Nitrogen Fixation in Tropical Legume Symbioses,” Mol. Plant-Microbe Interactions, 1992, 5(2), 179-186.