The official copy of the sequence listing is submitted electronically via EFS-Web as an ASCII formatted sequence listing with a file named 341683seqlist.txt, created on Apr. 30, 2010, and having a size of 563 KB and is filed concurrently with the specification. The sequence listing contained in this ASCII formatted document is part of the specification and is herein incorporated by reference in its entirety.
This invention relates to polynucleotides and polypeptides encoded by them, as well as methods for using the polypeptides and microorganisms producing them.
Lactobacillus acidophilus is a Gram-positive, rod-shaped, non-spore forming, homofermentative bacterium that is a normal inhabitant of the gastrointestinal and genitourinary tracts. Since its original isolation by Moro (1900) from infant feces, the “acid loving” organism has been found in the intestinal tract of humans, breast-fed infants, and persons consuming high milk, lactose, or dextrin diets. Historically, L. acidophilus is the Lactobacillus species most often implicated as an intestinal probiotic capable of eliciting beneficial effects on the microflora of the gastrointestinal tract (Klaenhammer and Russell (2000) “Species of the Lactobacillus acidophilus complex” Encyclopedia of Food Microbiology, Volume 2, pp. 1151-1157. Robinson et al., eds. (Academic Press, San Diego, Calif.)). L. acidophilus can ferment hexoses, including lactose and more complex oligosaccharides, to produce lactic acid and lower the pH of the environment where the organism is cultured. Acidified environments (e.g., food, vagina, and regions within the gastrointestinal tract) can interfere with the growth of undesirable bacteria, pathogens, and yeasts. The organism is well known for its acid tolerance, survival in cultured dairy products, and viability during passage through the stomach and gastrointestinal tract. Lactobacilli and other commensal bacteria, some of which are considered probiotic bacteria that “favor life,” are generally recognized for their role in flavor and aroma development and in spoilage retardation in fermented food products, and have been studied extensively for their effects on human health, particularly in the prevention or treatment of enteric infections, diarrheal disease, prevention of cancer, and stimulation of the immune system.
During fermentation, lactic acid bacteria are exposed to toxic byproducts of their growth, such as lactic acid and hydrogen peroxide, antimicrobial agents produced by neighboring microorganisms, and the harsh environmental conditions that are encountered during proper fermentation of a raw food item. They must also adapt to the extreme conditions found in the stomach during ingestion, and severe temperatures associated with storage or production conditions, as well as compete with other microorganisms for resources. These bacteria have evolved sensory and regulatory mechanisms, which enable them to monitor external conditions and respond accordingly. One such mechanism is referred to as the “two-component” system, and is structured around two proteins: a histidine protein kinase and a response regulator protein. Furthermore, one of the major responses controlled by these sensory and regulatory systems is the production of the bacteria's own antimicrobial agents, of which bacteriocins are an example. Two-component regulatory systems have been shown to control many diverse processes in bacteria, such as sporulation, chemotaxis, nitrogen assimilation, outer membrane protein expression, response to osmolarity, regulation of competence and virulence, as well as the production of antimicrobials.
Microorganisms that can respond to changes in the environment, such as those present during commercial fermentation and storage, as well as those microorganisms that can compete more effectively with other microorganisms are advantageous. Therefore, isolated nucleic acid sequences encoding these proteins are desirable for use in engineering microorganisms, including Lactobacillus acidophilus, to have an increased ability to tolerate changes in growth environment and an improved ability to inhibit food-borne pathogens.
Compositions and methods for modifying Lactobacillus organisms are provided. Compositions of the invention include isolated nucleic acid molecules encoding proteins involved in and those produced under the control of two-component sensing and regulatory systems.
Compositions comprise isolated nucleic acid molecules comprising a) a nucleic acid molecule comprising any one of odd numbered SEQ ID NOS:1-164; b) a nucleic acid molecule comprising a nucleotide sequence having at least 80% sequence identity to any one of odd numbered SEQ ID NOS:1-164; c) a nucleic acid molecule that encodes a polypeptide comprising the amino acid sequence as set forth in any one of even numbered SEQ ID NOS:1-164; d) a nucleic acid molecule comprising a nucleotide sequence encoding a polypeptide having at least 80% amino acid sequence identity to the amino acid sequence as set forth in any one of even numbered SEQ ID NOS:1-164; and e) a complement of any of a)-d).
Additional compositions include a polypeptide selected from the group consisting of a) a polypeptide comprising the amino acid sequence as set forth in any one of even numbered SEQ ID NOS:1-164; b) a polypeptide comprising an amino acid sequence having at least 80% sequence identity to the amino acid sequence as set forth in any one of even numbered SEQ ID NOS:1-164, wherein said polypeptide retains activity; c) a polypeptide encoded by the nucleotide sequence as set forth in any one of odd numbered SEQ ID NOS:1-164; and d) a polypeptide that is encoded by a nucleic acid molecule comprising a nucleotide sequence having at least 80% sequence identity to the nucleotide sequence as set forth in any one of odd numbered SEQ ID NOS:1-164.
Variant nucleic acid molecules, peptides and polypeptides sufficiently identical to and/or functionally equivalent to the nucleotide and amino acid sequences set forth in the attached Sequence Listing are encompassed by the present invention. Additionally, fragments and sufficiently identical fragments of the nucleotide and amino acid sequences are encompassed. Nucleotide sequences that are complementary to a nucleotide sequence of the invention, or that hybridize to a sequence of the invention, are also encompassed.
Compositions of this invention further include vectors and cells comprising the nucleic acid molecules described herein, as well as cells and transgenic microbial populations comprising the vectors. Also included in the invention are methods for the recombinant production of the peptides and polypeptides of the invention, and methods for their use. Further included are methods and kits for detecting the presence of a nucleic acid and/or peptide and/or polypeptide sequence of the invention in a sample. Additionally provided are antibodies that bind to a peptide and/or polypeptide of the invention, methods of making the antibodies of this invention and methods for using the antibodies of this invention to detect a peptide and/or polypeptide of this invention.
Compositions also provided herein include a polypeptide of the invention further comprising one or more heterologous amino acid sequences, and antibodies that selectively bind to a polypeptide of the invention.
The two-component sensing and regulatory response molecules and molecules under the control of two-component sensing and regulatory response molecules of the present invention are useful for the selection and production of recombinant bacteria, particularly the production of bacteria with improved ability to survive under stressful conditions.
Additionally provided herein are methods for producing a polypeptide, comprising culturing a cell of the invention under conditions in which a nucleic acid molecule encoding the polypeptide is expressed, said polypeptide being selected from the group consisting of: a) a polypeptide comprising the amino acid sequence as set forth below; b) a polypeptide encoded by the nucleic acid sequence as set forth below; c) a polypeptide comprising an amino acid sequence having at least 80% sequence identity to the amino acid sequence as set forth below, wherein said polypeptide retains activity; and d) a polypeptide encoded by a nucleotide sequence having at least 80% sequence identity to the nucleotide sequence as set forth below, wherein said polypeptide retains activity.
Additionally provided are methods for detecting the presence of a polypeptide of the invention in a sample comprising contacting the sample with a compound that selectively binds to a polypeptide and determining whether the compound binds to the polypeptide in the sample.
Further provided are methods for detecting the presence of a polypeptide in a sample wherein the compound that binds to the polypeptide is an antibody, as well as kits comprising a compound for use in methods of the invention for detecting the presence of a polypeptide in a sample and instructions for use.
The present invention also provides methods for detecting the presence of a nucleic acid molecule and/or fragments thereof, of this invention in a sample, comprising: a) contacting the sample with a nucleic acid probe or primer that selectively hybridizes to the nucleic acid molecule; and b) detecting hybridization of the nucleic acid probe or primer with the nucleic acid molecule.
Also provided are methods for detecting the presence of a nucleic acid molecule and/or fragment of the invention in a sample wherein the sample comprises mRNA molecules and is contacted with a nucleic acid probe. Additionally provided herein is a kit comprising a compound that selectively hybridizes to a nucleic acid of the invention, and instructions for use.
Further provided herein are methods for increasing the ability of a microorganism to survive stressful conditions, comprising introducing into said microorganism a nucleic acid molecule of the invention and expressing the nucleic acid molecule. In specific embodiments, the nucleotide sequence encodes a protein of a two-component regulatory system, a histidine protein kinase and/or a response regulator of a two-component regulatory system, a protein under the control of a two-component regulatory system, or a bacteriocin. In further aspects of the invention, the stressful conditions comprise osmotic stress, oxidative stress and/or starvation conditions.
Methods are also provided herein for enhancing the ability of a microorganism to survive passage through the gastrointestinal tract, comprising introducing into the microorganism a nucleic acid molecule comprising at least one nucleotide sequence selected from the group consisting of: a) the nucleotide sequence as set forth in any one of odd numbered SEQ ID NO:1-164; b) a nucleotide sequence encoding a polypeptide comprising the amino acid sequence as set forth in any one of even numbered SEQ ID NO:1-164; c) a nucleotide sequence that is at least 80% identical to the sequence as set forth in any one of odd numbered SEQ ID NO:1-164, wherein said nucleotide sequence encodes a polypeptide that retains activity; and, d) a nucleotide sequence encoding a polypeptide comprising an amino acid sequence having at least 80% sequence identity to the amino acid sequence as set forth in any one of even numbered SEQ ID NO:1-164, wherein said polypeptide retains activity.
Methods are also provided herein for enhancing the ability of a microorganism to survive passage through the gastrointestinal tract, comprising introducing into the microorganism at least one nucleic acid molecule of the invention. In specific embodiments, the nucleotide sequence encodes a protein of a two-component regulatory system, a histidine protein kinase of a two-component regulatory system, a response regulator of a two-component regulatory system, a bacteriocin, and/or encodes a protein under the control of a two-component regulatory system.
Additional aspects of the invention comprise methods for increasing the ability of a microorganism to survive in the presence of an antimicrobial, comprising introducing into said microorganism a nucleic acid molecule comprising at least one nucleotide sequence of the invention. In specific embodiments, the nucleotide sequence encodes a protein of a two-component regulatory system, the nucleotide sequence encodes a histidine protein kinase of a two-component regulatory system and/or a response regulator of a two-component regulatory system, and/or the nucleotide sequence encodes a protein or proteins that are under the control of a two-component regulatory system.
Also provided are methods for enabling an organism to respond to environmental stimuli, comprising introducing into the organism a vector comprising at least one nucleotide sequence of the invention. In specific embodiments, the nucleotide sequence encodes a protein of a two-component regulatory system, a histidine protein kinase of a two-component regulatory system, a response regulator of a two-component regulatory system, a bacteriocin, and/or encodes a protein under the control of a two-component regulatory system. The environmental stimuli can be selected from the group consisting of turgor pressure, a chemical stimulus, heavy-metal cations, oxygen, iron, an antimicrobial compound, and various carbohydrates, including glucose.
Yet another embodiment of the invention comprises a Lactobacillus acidophilus cell with an increased ability to survive stressful conditions compared to a wild-type Lactobacillus acidophilus cell, wherein said increased ability to survive stressful conditions is the result of overexpression of a nucleic acid molecule encoding an amino acid sequence as set forth herein. In specific embodiments, the stressful conditions comprise osmotic stress, oxidative stress, starvation, or the presence of antimicrobials.
The present inventions now will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the inventions are shown. Indeed, these inventions may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like numbers refer to like elements throughout.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation. The present invention relates to two-component sensing and regulatory system proteins and proteins under the control of the two-component regulatory system. These proteins include, but are not limited to, histidine protein kinases, response regulators and bacteriocins. Examples of nucleic acid sequences encoding two-component sensing and regulatory system, related antimicrobial proteins and proteins under the control of two-component sensing and regulatory molecules are provided in Table 1.
Two-component regulatory system molecules and molecules expressed under the control of two-component regulatory system molecules are provided. The full-length gene sequences, referred to as “two-component regulatory system sequences,” have similarity to two-component regulatory system genes. The invention further provides fragments and variants of these two-component regulatory system sequences, which can also be used to practice the methods of the present invention. As used herein, the terms “gene” and “recombinant gene” refer to nucleic acid molecules comprising an open reading frame, particularly those encoding a two-component regulatory system protein. Isolated nucleic acid molecules of the present invention comprise nucleic acid sequences encoding two-component regulatory system proteins and proteins under the control of two-component regulatory system proteins, nucleic acid sequences encoding the amino acid sequences set forth in SEQ ID NOS:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, the nucleic acid sequences set forth in SEQ ID NOS:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, and variants and fragments thereof. The present invention also encompasses antisense nucleic acid molecules, as described herein.
In addition, isolated peptides, polypeptides and proteins of a two-component regulatory system or that are produced under the control of a two-component regulatory system, and variants and fragments thereof are encompassed, as well as methods for producing all of these. For purposes of the present invention, the terms “protein” and “polypeptide” are used interchangeably. A representative amino acid sequence of the present invention is set forth in SEQ ID NO:2. In some embodiments, peptides and/or polypeptides of the present invention affect a stress-related protective activity. Stress-related protective activity refers to a biological or functional activity as determined in vivo or in vitro according to standard assay techniques. These techniques could involve, for example, measuring bacterial survival or growth under adverse environmental conditions. See, for example, Varcamonti et al. (2003) Appl. Environ. Microbiol. 69:1287-1289, herein incorporated by reference. By “adverse environmental conditions” or “stressful environmental conditions” is meant an environmental condition or state that is not conducive to growth of the microorganism, and includes, but is not limited to, acidic conditions, alkaline conditions, non-optimal osmotic stress conditions, non-optimal oxidative stress conditions, starvation conditions, and the presence of antimicrobials.
As used herein, the terms peptide and polypeptide are used to describe a chain of amino acids, which correspond to those encoded by a nucleic acid. A peptide usually describes a chain of amino acids of from two to about 30 amino acids and polypeptide usually describes a chain of amino acids having more than about 30 amino acids. The term polypeptide can refer to a linear chain of amino acids or it can refer to a chain of amino acids, which have been processed and folded into a functional protein. It is understood, however, that 30 is an arbitrary number with regard to distinguishing peptides and polypeptides and the terms may be used interchangeably for a chain of amino acids around 30. The peptides and polypeptides of the present invention are obtained by isolation and purification of the peptides and polypeptides from cells where they are produced naturally or by expression of a recombinant and/or synthetic nucleic acid encoding the peptide or polypeptide. The peptides and polypeptides of this invention can be obtained by chemical synthesis, by proteolytic cleavage of a polypeptide and/or by synthesis from nucleic acid encoding the peptide or polypeptide.
It is also understood that the peptides and polypeptides of this invention may also contain conservative substitutions where a naturally occurring amino acid is replaced by one having similar properties and which does not alter the function of the polypeptide. Such conservative substitutions are well known in the art. Thus, it is understood that, where desired, modifications and changes, which are distinct from the substitutions which enhance immunogenicity, may be made in the nucleic acid and/or amino acid sequence of the peptides and polypeptides of the present invention and still obtain a peptide or polypeptide having like or otherwise desirable characteristics. Such changes may occur in natural isolates or may be synthetically introduced using site-specific mutagenesis, the procedures for which, such as mis-match polymerase chain reaction (PCR), are well known in the art. One of skill in the art will also understand that polypeptides and nucleic acids that contain modified amino acids and nucleotides, respectively (e.g., to increase the half-life and/or the therapeutic efficacy of the molecule), can be used in the methods of the invention.
The nucleic acid and protein compositions encompassed by the present invention are isolated or substantially purified. By “isolated” or “substantially purified” is intended that the nucleic acid or protein molecules, or biologically active fragments or variants, are substantially or essentially free from components normally found in association with the nucleic acid or protein in its natural state. Such components include other cellular material, culture media from recombinant production, and various chemicals used in chemically synthesizing the proteins or nucleic acids. Preferably, an “isolated” nucleic acid of the present invention is free of nucleic acid sequences that flank the nucleic acid of interest in the genomic DNA of the organism from which the nucleic acid was derived (such as coding sequences present at the 5′ or 3′ ends). However, the molecule can include some additional bases or moieties, which do not deleteriously affect the basic characteristics of the composition. For example, in various embodiments, the isolated nucleic acid contains less than 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleic acid sequence normally associated with the genomic DNA in the cells from which it was derived. Similarly, an isolated or substantially purified protein has less than about 30%, 20%, 10%, 5%, or 1% (by dry weight) of contaminating protein, or non-two-component regulatory protein. When the protein is recombinantly produced, preferably culture medium represents less than 30%, 20%, 10%, or 5% of the volume of the protein preparation, and when the protein is produced chemically, preferably the preparations have less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors, or non-two-component regulatory chemicals.
The compositions and methods of the present invention can be used to modulate the function of the two-component regulatory molecules of the invention or the sequences under the control of the two component sensing or regulatory molecules. By “modulate,” “alter,” or “modify” is intended the up- or down-regulation of a target biological activity. In accordance with the present invention, the level or activity of a sequence of the invention is modulated (i.e., overexpressed or underexpressed) if the level and/or activity of the sequence is statistically lower or higher than the level and/or activity of the same sequence in an appropriate control. Concentration and/or activity can be increased or decreased by at least 0.5%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% relative to an appropriate control. Proteins of the invention are useful in modifying the biological activities of lactic acid bacteria, especially lactic acid bacteria that are used to ferment foods with nutritional or health-promoting characteristics. Nucleic acid molecules of the invention are useful in modulating production of the sequences of the invention by lactic acid bacteria. Up- or down-regulation of expression of a nucleic acid of the present invention is encompassed. Up-regulation can be accomplished by providing multiple nucleic acid copies, modulating expression by modifying regulatory elements, promoting transcriptional or translational mechanisms, or other means. Down-regulation can be accomplished by using known antisense and gene silencing techniques. By “lactic acid bacteria” is intended bacteria from a genus selected from the following: Aerococcus, Carnobacterium, Enterococcus, Lactococcus, Lactobacillus, Leuconostoc, Oenococcus, Pediococcus, Streptococcus, Melissococcus, Alloiococcus, Dolosigranulum, Lactosphaera, Tetragenococcus, Vagococcus, and Weissella (Holzapfel et al. (2001) Am. J. Clin. Nutr. 73:365S-373S; Bergey's Manual of Systematic Bacteriology, Vol. 2 (Williams and Wilkins, Baltimore (1986)) pp. 1075-1079).
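The percentage thresholds for modulation described above can be illustrated with a minimal sketch. It assumes that a single measured level and a single control level are available; it does not perform the statistical comparison against a control that the definition also calls for. The function names and the default 10% threshold are hypothetical choices made for illustration only.

```python
def percent_change(level: float, control: float) -> float:
    """Percent change of a measured level relative to a control level."""
    if control <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * (level - control) / control


def modulation(level: float, control: float, minimum_percent: float = 10.0) -> str:
    """Label a level as up-regulated, down-regulated, or unchanged.

    minimum_percent is an illustrative threshold; the text lists values
    from 0.5% to 90%. Statistical significance is NOT assessed here.
    """
    change = percent_change(level, control)
    if change >= minimum_percent:
        return "up-regulated"
    if change <= -minimum_percent:
        return "down-regulated"
    return "unchanged"


if __name__ == "__main__":
    print(modulation(150.0, 100.0))        # 50% above the control
    print(modulation(40.0, 100.0, 50.0))   # 60% below the control
```

In practice, replicate measurements and an appropriate statistical test would be needed before a sequence is called overexpressed or underexpressed.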
Microorganisms expressing the nucleic acid molecules to produce the polypeptides of the present invention are useful as additives in dairy and fermentation processing. The nucleic acid sequences, encoded polypeptides, and microorganisms expressing them are useful in the manufacture of milk-derived products, such as cheeses, yogurt, fermented milk products, sour milks, and buttermilk. Microorganisms that produce polypeptides of the invention may be probiotic organisms. By “probiotic” is intended a live microorganism that survives passage through the gastrointestinal tract and has a beneficial effect on the subject. By “subject” is intended an organism that comes into contact with a microorganism producing a protein of the present invention. Subject may refer to humans and other animals.
In addition to the sequences disclosed herein, and fragments and variants thereof, the isolated nucleic acid molecules of the current invention also encompass homologous nucleic acid sequences identified and isolated from other organisms or cells by hybridization with entire or partial sequences obtained from the two-component regulatory nucleotide sequences disclosed herein, or variants and fragments thereof.
In another embodiment of the invention, nucleotide sequences and fragments thereof that are expressed under the control of proteins and polypeptides of a two-component regulatory system and the proteins and polypeptides encoded by those nucleotide sequences are provided. In a preferred embodiment, the protein or polypeptide produced from a nucleotide sequence under control of a two-component regulatory system is a bacteriocin. By “bacteriocin” is intended a group of polypeptides produced by a bacterium as an antimicrobial substance. Included in this group are: Class I bacteriocins or lantibiotics, which contain the unusual amino acids lanthionine, β-methyl-lanthionine and the dehydrated residues dehydroalanine and dehydrobutyrine; Class II bacteriocins, i.e., small heat-stable, non-lanthionine containing, membrane-active peptides; and Class III bacteriocins, i.e., large, heat-labile proteins.
The invention provides isolated nucleic acid molecules comprising nucleotide sequences encoding two-component regulatory proteins, as well as peptides and/or proteins encoded thereby. By “two-component regulatory protein” or “two-component sensing protein” is meant proteins comprising, consisting of and/or consisting essentially of the amino acid sequences set forth in even numbered SEQ ID NOS:1-38. By “proteins under the control of two-component sensing and regulatory molecules” is meant proteins having the amino acid sequences set forth in even numbered SEQ ID NOS:40-164. Fragments and variants of these nucleotide sequences and encoded proteins are also provided. By “fragment” of a nucleotide sequence or protein is intended a portion of the nucleotide or amino acid sequence.
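The numbering convention used for the Sequence Listing can be summarized in a short sketch. It assumes that each odd numbered nucleotide sequence encodes the even numbered amino acid sequence that immediately follows it (an inference from the paired examples given elsewhere in this description, not an explicit statement), and the function name is hypothetical.

```python
def classify_seq_id(seq_id: int) -> tuple[str, str]:
    """Return (molecule type, functional group) for a SEQ ID NO from 1 to 164."""
    if not 1 <= seq_id <= 164:
        raise ValueError("SEQ ID NO must be between 1 and 164")
    # Odd numbers are nucleotide sequences, even numbers are amino acid sequences.
    molecule = "nucleotide" if seq_id % 2 == 1 else "amino acid"
    # Assumption: odd SEQ ID NO n encodes the protein of SEQ ID NO n + 1.
    protein_id = seq_id if seq_id % 2 == 0 else seq_id + 1
    if protein_id <= 38:
        group = "two-component regulatory protein"
    else:
        group = "protein under two-component control"
    return molecule, group


if __name__ == "__main__":
    for n in (1, 2, 38, 39, 164):
        print(n, classify_seq_id(n))
```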
Fragments and variants of the nucleic acid molecules disclosed herein can be used as hybridization probes to identify two-component regulatory protein-encoding nucleic acids and/or proteins under the control of two-component sensing and regulatory molecules, or they can be used as primers in amplification protocols (e.g., polymerase chain reaction) or mutation of two-component regulatory nucleic acid molecules, proteins under the control of two-component sensing and regulatory molecules and/or stress-related nucleic acid molecules. Such fragments or variants need not encode functional polypeptides. Fragments of nucleic acids can also be bound to a physical substrate to comprise a macro- or microarray (for example, U.S. Pat. No. 5,837,832; U.S. Pat. No. 5,861,242). Such arrays of nucleic acids can be used to study gene expression or to identify nucleic acid molecules with sufficient identity to the target sequences.
By “nucleic acid molecule” is meant DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA. A fragment of a nucleic acid molecule encoding a protein of the invention may encode a protein fragment that is biologically active, or it may be used as a hybridization probe or PCR primer as described below. A biologically active fragment of a polypeptide disclosed herein can be prepared by isolating a portion of one of the nucleotide sequences of the invention, expressing the encoded portion of the protein (e.g., by recombinant expression in vitro), and assessing the activity of the encoded portion.
Fragments of nucleic acid molecules of the invention comprise at least about 15, 20, 50, 75, 100, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1400, 1600, 1800, 2000, 2200, 2415 nucleotides (for example, 714 for SEQ ID NO:1, 1854 for SEQ ID NO:3, etc.), including any value between these numbers recited here, e.g., 36 nucleotides or 423 nucleotides up to the total number of nucleotides present in a full-length nucleotide sequence as disclosed herein.
Fragments of amino acid sequences of this invention can include polypeptide fragments that function as immunogens, for example, for the production of antibodies to two-component regulatory system proteins or to proteins under the control of two-component sensing and regulatory molecules. Fragments of this invention include peptides comprising amino acid sequences sufficiently identical to and/or derived from the amino acid sequence of a protein of the invention, or partial-length protein of the invention and exhibiting at least one activity of the protein, but which include fewer amino acids than the full-length proteins disclosed herein. Typically, biologically active fragments of this invention comprise a domain or motif with at least one activity of the protein. A biologically active portion or fragment of a two-component regulatory protein or a protein under the control of two-component sensing and regulatory molecules can be a peptide or polypeptide that is, for example, 10, 25, 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, or 805 contiguous amino acids in length, or up to the total number of amino acids present in a full-length protein of the current invention (for example, 238 for SEQ ID NO:2, 618 for SEQ ID NO:4, etc.), including any value in between these explicitly listed herein, e.g., 17 amino acids or 106 amino acids up to the total number of amino acids present in a full-length protein sequence of the invention. Such biologically active fragments can be prepared by recombinant techniques and evaluated for one or more of the functional activities according to standard protocols. As used herein, a fragment can comprise at least 5 contiguous amino acids of even numbered SEQ ID NOS:1-164. The invention encompasses other fragments, however, such as any fragment of a protein of this invention comprising greater than 6, 7, 8, or 9 amino acids.
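The relationship between the nucleotide and amino acid lengths recited above, and the notion of contiguous fragments, can be sketched as follows. The lengths 714/238 and 1854/618 come from the examples in the text; the short sequence `MKTAYIAK` and both function names are hypothetical and used only for illustration. The sketch assumes that the recited nucleotide lengths count only codons that are translated into residues.

```python
def coding_length_to_residues(nucleotides: int) -> int:
    """Number of amino acids encoded by a coding region of the given length.

    Assumes every codon (three nucleotides) yields one residue.
    """
    if nucleotides <= 0 or nucleotides % 3 != 0:
        raise ValueError("coding length must be a positive multiple of 3")
    return nucleotides // 3


def contiguous_fragments(sequence: str, length: int) -> list[str]:
    """All contiguous fragments of the given length, in order of position."""
    if length < 1 or length > len(sequence):
        raise ValueError("fragment length must be between 1 and the sequence length")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


if __name__ == "__main__":
    print(coding_length_to_residues(714))    # length recited for SEQ ID NO:1
    print(coding_length_to_residues(1854))   # length recited for SEQ ID NO:3
    print(contiguous_fragments("MKTAYIAK", 5))  # hypothetical 8-residue peptide
```

A sequence of N residues has N - L + 1 contiguous fragments of length L, so the hypothetical 8-residue peptide yields four fragments of 5 amino acids.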
Variants of the nucleotide and amino acid sequences are encompassed in the present invention. By “variant” is meant a sufficiently identical sequence. Accordingly, the invention encompasses isolated nucleic acid molecules that are sufficiently identical to the nucleotide sequences of the invention set forth in the odd numbered SEQ ID NOS:1-164, or nucleic acid molecules that hybridize to a nucleic acid molecule of odd numbered SEQ ID NOS:1-164, or a complement thereof, under stringent conditions. Variants also include variant polypeptides encoded by the nucleotide sequences of the present invention. In addition, polypeptides of the current invention have an amino acid sequence that is sufficiently identical to an amino acid sequence set forth in even numbered SEQ ID NOS:1-164. By “sufficiently identical” is meant that one amino acid or nucleotide sequence contains or encodes a sufficient or minimal number of equivalent or identical amino acid residues or nucleotides as compared to a second amino acid or nucleotide sequence, thus providing a common structural domain and/or a common functional activity. Conservative variants include those sequences that differ due to the degeneracy of the genetic code.
In general, amino acid or nucleotide sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to any of the amino acid sequences of even numbered SEQ ID NOS:1-164 or any of the nucleotide sequences of odd numbered SEQ ID NOS:1-164, respectively, are defined herein as sufficiently identical. Variant proteins encompassed by the present invention are biologically active, that is, they retain a desired biological activity of the native protein. Such activities are discussed in detail elsewhere herein. By “two-component regulatory system activity” is intended the ability of an organism to respond to an environmental stimulus to enable the organism to better survive. This encompasses both stressful environmental conditions, as described above, and beneficial environmental conditions, wherein a molecule desired by the organism is present, such as glucose. Assays to measure the activity of two-component regulatory system proteins or the proteins under the control of the two-component sensing and regulatory molecules are well known in the art. See, for example, Lee et al. (2004) Infect. Immun. 72:3968-3973; Walker and Miller (2004) J. Bacteriol. 186:4056-4066; Saini et al. (2004) Microbiology. 150:865-875; Abo-Amer et al. (2004) J. Bacteriol. 186:1879-1889. A biologically active variant of a protein of the invention can differ from that protein by as few as 1-15 amino acid residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, 3, 2, or even 1 amino acid residue.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the nucleotide sequences set forth in SEQ ID NOS:3, 7, 13, 15, 19, 23, 29, 33, and 35, which can encode a histidine kinase. Variants of such nucleotide sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the nucleotide sequences set forth in SEQ ID NOS:73, 75, 85, 89, 91, 95, and 113, which can encode a bacteriocin. Variants of such nucleotide sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the nucleotide sequences set forth in SEQ ID NOS:1, 9, 11, 17, 21, 25, 27, 31, and 37, which can encode a response regulator. Variants of such nucleotide sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the nucleotide sequences set forth in SEQ ID NOS:5, 49, 51, 53, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 113, 115, 117, 119, 121, 151, 153, 155, 157, 159, 161, and 163, which can encode a polypeptide produced under the control of a two-component regulatory system. Variants of such nucleotide sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the amino acid sequences set forth in SEQ ID NOS:4, 8, 14, 16, 20, 24, 30, 34, and 36, which can encode a histidine kinase. Variants of such amino acid sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the amino acid sequences set forth in SEQ ID NOS:74, 76, 90, 92, 96, and 114, which can encode a bacteriocin. Variants of such amino acid sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the amino acid sequences set forth in SEQ ID NOS:2, 10, 12, 18, 22, 26, 28, 32, and 38, which can encode a response regulator. Variants of such amino acid sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
In one embodiment, the sequence according to the present invention or for use in the methods of the invention may be one or more of the amino acid sequences set forth in SEQ ID NOS:6, 50, 52, 54, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 114, 116, 118, 120, 122, 152, 154, 156, 158, 160, 162, and 164, which can encode a polypeptide produced under the control of a two-component regulatory system. Variants of such amino acid sequences are also included, including sequences that have at least about 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.5% sequence identity to such sequences.
Full-length or partial nucleic acid sequences can be used to obtain homologues and orthologs encompassed by the present invention. By “orthologs” is intended genes derived from a common ancestral gene and which are found in different species as a result of speciation. Genes found in different species are considered orthologs when their nucleotide sequences and/or their encoded amino acid sequences share substantial identity as defined elsewhere herein. Functions of orthologs are often highly conserved among species.
Naturally occurring variants can exist within a population (e.g., the Lactobacillus acidophilus population). Such variants can be identified by using well-known molecular biology techniques, such as the polymerase chain reaction (PCR), and hybridization as described herein. Synthetically derived nucleotide sequences, for example, sequences generated by site-directed mutagenesis or PCR-mediated mutagenesis that still encode a two-component regulatory protein, are also included as variants. One or more nucleotide or amino acid substitutions, additions, and/or deletions can be introduced into a nucleotide or amino acid sequence disclosed herein, such that the substitutions, additions, or deletions are introduced into the encoded protein. The additions (insertions) and/or deletions (truncations) can be made at the N-terminal and/or C-terminal end of the native protein, and/or at one or more sites in the native protein. Similarly, a substitution of one or more nucleotides or amino acids can be made at one or more sites in the native protein.
For example, conservative amino acid substitutions can be made at one or more predicted, preferably nonessential amino acid residues. A “nonessential” amino acid residue is a residue that can be altered from the wild-type sequence of a protein without altering the biological activity, whereas an “essential” amino acid is required for biological activity. A “conservative amino acid substitution” is one in which the amino acid residue is replaced with an amino acid residue with a similar side chain. Families of amino acid residues having similar side chains are known in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Such substitutions would not be made for conserved amino acid residues, or for amino acid residues residing within a conserved motif, where such residues are essential for protein activity.
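The side-chain families listed above can be expressed as a simple lookup. The following is a minimal sketch in Python, offered only as an illustration: the names SIDE_CHAIN_FAMILIES, shared_families, and is_conservative are hypothetical, and treating any shared family as sufficient for a "conservative" substitution is an assumption of this sketch rather than a rule stated in the specification.

```python
# Side-chain families as listed in the preceding paragraph (one-letter codes).
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),                 # lysine, arginine, histidine
    "acidic": set("DE"),                 # aspartic acid, glutamic acid
    "uncharged polar": set("GNQSTYC"),   # Gly, Asn, Gln, Ser, Thr, Tyr, Cys
    "nonpolar": set("AVLIPFMW"),         # Ala, Val, Leu, Ile, Pro, Phe, Met, Trp
    "beta-branched": set("TVI"),         # threonine, valine, isoleucine
    "aromatic": set("YFWH"),             # Tyr, Phe, Trp, His
}


def shared_families(residue_a, residue_b):
    """Return the sorted names of families that contain both residues."""
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if residue_a in members and residue_b in members
    )


def is_conservative(original, replacement):
    """Assumption: a substitution is conservative if the two distinct
    residues share at least one side-chain family."""
    return original != replacement and bool(shared_families(original, replacement))
```

For instance, is_conservative("K", "R") is true because lysine and arginine both have basic side chains, whereas is_conservative("D", "W") is false. Such a check says nothing about whether the residue is essential; that still requires the activity screening described herein.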
Alternatively, mutations can be made randomly along all or part of the length of the two-component regulatory coding sequence or along all or part of the length of the sequences under the control of two-component sensing and regulatory molecules, such as by saturation mutagenesis. The mutants can be expressed recombinantly and screened for those that retain biological activity, e.g., by assaying for two-component regulatory system activity using standard assay techniques. Methods for mutagenesis and nucleotide sequence alterations are known in the art. See, for example, Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492, Kunkel et al. (1987) Methods in Enzymol. Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. The mutations made in the DNA encoding the variant must not disrupt the reading frame and preferably will not create complementary regions that could produce secondary mRNA structure. See EP Patent Application Publication No. 75,444. Guidance as to appropriate amino acid substitutions that do not affect biological activity of the protein of interest may be found in the model of Dayhoff et al. (1978) Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, D.C.), herein incorporated by reference in its entirety for these teachings.
The deletions, insertions, and substitutions of the amino acid sequences encompassed herein are not expected to produce radical changes in the characteristics of the protein. However, when it is difficult to predict the exact effect of the substitution, deletion, or insertion in advance of doing so, one skilled in the art will appreciate that the effect will be evaluated by routine screening assays. That is, the activity can be evaluated by comparing the activity of the modified sequence with the activity of the original sequence. See, for example, Baruah et al. (2004) J. Bacteriol. 186:1694-1704; Wang et al. (2001) J. Bacteriol. 183:2795-2802; and Piazza et al. (1999) J. Bacteriol. 181:4540-4548, each of which is herein incorporated by reference in its entirety for these teachings.
Variant nucleotide and amino acid sequences of the present invention also encompass sequences derived from mutagenic and recombinogenic procedures such as DNA shuffling. With such a procedure, one or more different polypeptides of the invention can be used to create a new polypeptide possessing the desired properties. In this manner, libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides comprising sequence regions that have substantial sequence identity and can be homologously recombined in vitro or in vivo. For example, using this approach, sequence motifs encoding a domain of interest can be shuffled between the two-component regulatory nucleic acid of the invention and other known two-component regulatory nucleic acid to obtain a new nucleic acid encoding for a peptide, polypeptide or protein with an improved property of interest, such as an increased Km in the case of an enzyme. Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751; Stemmer (1994) Nature 370:389-391; Crameri et al. (1997) Nature Biotech. 15:436-438; Moore et al. (1997) J. Mol. Biol. 272:336-347; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94:4504-4509; Crameri et al. (1998) Nature 391:288-291; and U.S. Pat. Nos. 5,605,793 and 5,837,458.
Variants of the two-component regulatory proteins can function as either two-component-related agonists (mimetics) or as two-component-related antagonists. An agonist of the two-component-related protein can retain substantially the same, or a subset, of the biological activities of the naturally occurring form of the two-component regulatory protein. An antagonist of the two-component regulatory protein can inhibit one or more of the activities of the naturally occurring form of the two-component regulatory protein by, for example, competitively binding to a downstream or upstream member of a cellular signaling cascade that includes the two-component regulatory protein.
Variants of a two-component regulatory protein or variants of polypeptides under the control of the two-component sensing and regulatory molecules that function as either agonists or antagonists can be identified by screening combinatorial libraries of mutants, e.g., truncation mutants, of a two-component regulatory protein for stress-related protein agonist or antagonist activity. In one embodiment, a variegated library of two-component regulatory variants is generated by combinatorial mutagenesis at the nucleic acid level and is encoded by a variegated gene library. A variegated library of two-component regulatory variants can be produced by, for example, enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a degenerate set of potential two-component regulatory sequences is expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of two-component regulatory sequences therein. There are a variety of methods that can be used to produce libraries of variants from a degenerate oligonucleotide sequence. Chemical synthesis of a degenerate gene sequence can be performed in an automatic DNA synthesizer, and the synthetic gene then ligated into an appropriate expression vector. Use of a degenerate set of genes allows for the provision, in one mixture, of all of the sequences encoding the desired set of potential two-component regulatory sequences. Methods for synthesizing degenerate oligonucleotides are known in the art (see, e.g., Narang (1983) Tetrahedron 39:3; Itakura et al. (1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acids Res. 11:477).
In addition, libraries of fragments of a two-component regulatory protein coding sequence can be used to generate a variegated population of two-component regulatory fragments for screening and subsequent selection of variants of a two-component regulatory protein. In one embodiment, a library of coding sequence fragments can be generated by treating a double-stranded PCR fragment of a two-component regulatory coding sequence with a nuclease under conditions wherein nicking occurs only about once per molecule, denaturing the double-stranded DNA, renaturing the DNA to form double-stranded DNA which can include sense/antisense pairs from different nicked products, removing single-stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the resulting fragment library into an expression vector. By this method, one can derive an expression library that encodes N-terminal and internal fragments of various sizes of the protein.
Several techniques are known in the art for screening gene products of combinatorial libraries made by point mutations or truncation and for screening cDNA libraries for gene products having a selected property. Such techniques are adaptable for rapid screening of the gene libraries generated by the combinatorial mutagenesis of proteins. The most widely used techniques, which are amenable to high-throughput analysis, for screening large gene libraries typically include cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the combinatorial genes under conditions in which detection of a desired activity facilitates isolation of the vector encoding the gene whose product was detected. Recursive ensemble mutagenesis (REM), a technique that enhances the frequency of functional mutants in the libraries, can be used in combination with the screening assays to identify two-component regulatory variants (Arkin and Youvan (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815; Delagrave et al. (1993) Protein Engineering 6(3):327-331).
The two-component regulatory sequences and the sequences under the control of two-component sensing and regulatory molecules are members of various families of molecules with conserved functional features. By “family” is intended two or more proteins or nucleic acid molecules having sufficient nucleotide or amino acid sequence identity. By “sequence identity” is intended the nucleotide or amino acid residues that are the same when aligning two sequences for maximum correspondence over a specified comparison window. By “comparison window” is intended a contiguous segment of the two nucleotide or amino acid sequences for optimal alignment, wherein the second sequence can contain additions or deletions (i.e., gaps) as compared to the first sequence. Generally, for nucleic acid alignments, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. For amino acid sequence alignments, the comparison window is at least 6 contiguous amino acids in length, and optionally can be 10, 15, 20, 30, or longer. Those of skill in the art understand that to avoid a high similarity due to inclusion of gaps, a gap penalty is typically introduced and is subtracted from the number of matches.
Family members can be from the same or different species, and can include homologues as well as distinct proteins. Often, members of a family display common functional characteristics. Homologues can be isolated based on their identity to the Lactobacillus acidophilus nucleic acid sequences disclosed herein using the cDNA, or a portion thereof, as a hybridization probe according to standard hybridization techniques under stringent hybridization conditions as disclosed herein.
To determine the percent identity of two amino acid or nucleotide sequences, an alignment is performed. Percent identity of the two sequences is a function of the number of identical residues shared by the two sequences in the comparison window (i.e., percent identity=number of identical residues/total number of residues×100). In one embodiment, the sequences are the same length. Methods similar to those mentioned below can be used to determine the percent identity between two sequences. The methods can be used with or without allowing gaps. Alignment can also be performed manually by inspection.
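The percent identity calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name percent_identity, the use of "-" as the gap character, and the count_gap_columns switch (which chooses whether gapped columns count toward the total) are assumptions made for the example and are not taken from any particular alignment program.

```python
def percent_identity(aligned_a, aligned_b, gap="-", count_gap_columns=True):
    """Percent identity = identical residues / total residues x 100.

    Both inputs are rows of an existing alignment over the comparison
    window, so they must have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    identical = 0
    total = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        has_gap = res_a == gap or res_b == gap
        if has_gap and not count_gap_columns:
            continue  # ignore gapped columns entirely
        total += 1
        if not has_gap and res_a == res_b:
            identical += 1

    if total == 0:
        raise ValueError("no comparable positions in the window")
    return 100.0 * identical / total
```

With the ungapped pair "MKTAY" and "MKSAY", four of five positions match, giving 80.0. For the gapped pair "MK-AY" and "MKTAY", the result is 80.0 when the gapped column is counted and 100.0 when it is ignored, which illustrates why the treatment of gaps should be stated alongside any reported value.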
When amino acid sequences differ in conservative substitutions, the percent identity can be adjusted upward to correct for the conservative nature of the substitution. Means for making this adjustment are known in the art. Typically the conservative substitution is scored as a partial, rather than a full mismatch, thereby increasing the percentage sequence identity.
Mathematical algorithms can be used to determine the percent identity of two sequences. Non-limiting examples of mathematical algorithms are the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877; the algorithm of Myers and Miller (1988) CABIOS 4:11-17; the local alignment algorithm of Smith et al. (1981) Adv. Appl. Math. 2:482; the global alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443-453; and the search-for-local alignment-method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448.
Various computer implementations based on these mathematical algorithms have been designed to enable the determination of sequence identity. The BLAST programs of Altschul et al. (1990) J. Mol. Biol. 215:403 are based on the algorithm of Karlin and Altschul (1990) supra. Searches to obtain nucleotide sequences that are homologous to nucleotide sequences of the present invention can be performed with the BLASTN program, score=100, wordlength=12. To identify amino acid sequences homologous to amino acid sequences of the proteins of the present invention, the BLASTX program can be used, score=50, wordlength=3. Gapped alignments can be obtained by using Gapped BLAST (in BLAST 2.0) as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. To detect distant relationships between molecules, PSI-BLAST can be used. See Altschul et al. (1997) supra. For all of the BLAST programs, the default parameters of the respective programs can be used. Alignment can also be performed manually by inspection.
Another program that can be used to determine percent sequence identity is the ALIGN program (version 2.0), which uses the mathematical algorithm of Myers and Miller (1988) supra. A PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used with this program when comparing amino acid sequences.
In addition to the ALIGN and BLAST programs, the BESTFIT, GAP, FASTA and TFASTA programs are part of the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys Inc., 9685 Scranton Rd., San Diego, Calif., USA), and can be used for performing sequence alignments. The preferred program is GAP version 10, which uses the algorithm of Needleman and Wunsch (1970) supra. Unless otherwise stated, the sequence identity and similarity values provided herein refer to the values obtained using GAP Version 10 with the following parameters: % identity and % similarity for a nucleotide sequence using GAP Weight of 50 and Length Weight of 3 and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using GAP Weight of 8 and Length Weight of 2, and the BLOSUM62 scoring matrix; or any equivalent program. By “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by GAP Version 10.
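For readers unfamiliar with global alignment, the dynamic-programming idea of Needleman and Wunsch (1970) can be sketched as follows. This Python sketch is not the GAP program and does not reproduce its results: it assumes a simple match/mismatch score and a single linear gap penalty, whereas GAP uses a scoring matrix with separate gap creation and gap extension weights. The function name and default scores are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two residues
                score[i - 1][j] + gap,                    # gap in b
                score[i][j - 1] + gap,                    # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Under these assumed scores, aligning "ACGT" with "ACT" yields the rows "ACGT" and "AC-T" with a score of 1. Different scoring parameters can produce different alignments and therefore different percent identity values, which is why the parameters are fixed in the definition above.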
Two-component regulatory nucleotide sequences or proteins under the control of two-component sensing and regulatory molecules identified based on their sequence identity to the sequences set forth herein or to fragments and variants thereof are encompassed by the present invention. Methods such as PCR or hybridization can be used to identify sequences from a cDNA or genomic library, for example, that are substantially identical to a sequence of the invention. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.) and Innis, et al. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York). Methods for construction of such cDNA and genomic libraries are generally known in the art and are also disclosed in the above references.
In hybridization techniques, the hybridization probes can be genomic DNA fragments, cDNA fragments, RNA fragments, and/or other oligonucleotides, and can consist of all or part of a known nucleotide sequence disclosed herein. In addition, they can be labeled with a detectable group such as 32P, or any other detectable marker, such as other radioisotopes, a fluorescent compound, an enzyme, or an enzyme co-factor. Probes for hybridization can be made by labeling synthetic oligonucleotides based on the known two-component regulatory nucleotide sequences disclosed herein. Degenerate primers designed on the basis of conserved nucleotides or amino acid residues in a known two-component regulatory nucleotide sequence or encoded amino acid sequence can additionally be used. The hybridization probe typically comprises a region of nucleotide sequence that hybridizes under stringent conditions to at least about 10, or about 20, or about 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, or 400 consecutive nucleotides of a two-component regulatory nucleotide sequence of the invention or a fragment or variant thereof. To achieve specific hybridization under a variety of conditions, such probes include sequences that are unique among two-component regulatory protein sequences or unique among proteins under the control of two-component sensing and regulatory molecules. Preparation of probes for hybridization is generally known in the art and is disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.), herein incorporated by reference in its entirety for these teachings.
In one embodiment, the entire nucleotide sequence of the invention is used as a probe to identify novel sequences and messenger RNAs. In another embodiment, the probe is a fragment of a nucleotide sequence disclosed herein. In some embodiments, the nucleotide sequence that hybridizes under stringent conditions to the probe can be at least about 300, 325, 350, 375, 400, 425, 450, 500, 550, 600, 650, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, or 4000 nucleotides in length (including any value not explicitly stated herein).
Substantially identical sequences will hybridize to each other under stringent conditions. By “stringent conditions” is meant conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Generally, stringent conditions encompass those conditions for hybridization and washing under which nucleotides having at least about 60%, 65%, 70%, preferably 75% sequence identity typically remain hybridized to each other. Stringent conditions (e.g., high, medium, low stringency) are known in the art and can be found in Current Protocols in Molecular Biology (John Wiley & Sons, New York (1989)), 6.3.1-6.3.6, the entire contents of which are incorporated herein by reference for these teachings. Hybridization typically occurs for less than about 24 hours, usually about 4 to about 12 hours.
Stringent conditions are sequence dependent and will differ in different circumstances. When using probes, stringent conditions can be, e.g., those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides).
The post-hybridization washes are instrumental in controlling specificity. The two factors are ionic strength and temperature of the final wash solution. For the detection of sequences that hybridize to a full-length or approximately full-length target sequence, the temperature under stringent conditions is selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. However, stringent conditions would encompass temperatures in the range of 1° C. to 20° C. lower than the Tm, depending on the desired degree of stringency as otherwise qualified herein. For DNA-DNA hybrids, the Tm can be determined using the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284: Tm=81.5° C.+16.6(logM)+0.41(% GC)−0.61(% form)−500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe.
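The Meinkoth and Wahl equation above can be evaluated directly. The short Python sketch below is illustrative only; the function and parameter names are hypothetical, and the logarithm is taken as base 10, as is conventional for this equation.

```python
import math


def melting_temperature(monovalent_molarity, percent_gc, percent_formamide, hybrid_length_bp):
    """Tm (deg C) for a DNA-DNA hybrid per Meinkoth and Wahl (1984):
    Tm = 81.5 + 16.6(log M) + 0.41(%GC) - 0.61(%form) - 500/L
    """
    if monovalent_molarity <= 0 or hybrid_length_bp <= 0:
        raise ValueError("molarity and hybrid length must be positive")
    return (
        81.5
        + 16.6 * math.log10(monovalent_molarity)
        + 0.41 * percent_gc
        - 0.61 * percent_formamide
        - 500.0 / hybrid_length_bp
    )


# Example inputs chosen for illustration: 1 M monovalent cation, 50% GC,
# 50% formamide, and a 500 bp hybrid.
tm = melting_temperature(1.0, 50.0, 50.0, 500)
stringent_temp = tm - 5.0  # about 5 deg C below Tm, as described above
```

For these example inputs the terms are 81.5 + 0 + 20.5 - 30.5 - 1, so the Tm is about 70.5° C. and the corresponding stringent temperature is about 65.5° C.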
The ability to detect sequences with varying degrees of homology can be obtained by varying the stringency of the hybridization and/or washing conditions. To target sequences that are 100% identical (homologous probing), stringency conditions must be obtained that do not allow mismatching. By allowing mismatching of nucleotide residues to occur, sequences with a lower degree of similarity can be detected (heterologous probing). For every 1% of mismatching, the Tm is reduced about 1° C.; therefore, hybridization and/or wash conditions can be manipulated to allow hybridization of sequences of a target percentage identity. For example, if sequences with ≥90% sequence identity are preferred, the Tm can be decreased by 10° C. Two nucleotide sequences that fail to hybridize to each other under stringent conditions can still be substantially identical if the polypeptides they encode are substantially identical. This situation could arise, for example, if the maximum codon degeneracy of the genetic code is used to create a copy of a nucleic acid.
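The rule of thumb above (about 1° C. of Tm per 1% of mismatching) lends itself to a one-line adjustment. The Python sketch below is a rough illustration under that stated approximation; the function name and its validation range are assumptions of the example.

```python
def mismatch_adjusted_tm(tm_perfect_match, target_percent_identity, degrees_per_percent=1.0):
    """Lower the perfect-match Tm by about 1 deg C per 1% mismatch tolerated."""
    if not 0.0 <= target_percent_identity <= 100.0:
        raise ValueError("target identity must be between 0 and 100")
    percent_mismatch = 100.0 - target_percent_identity
    return tm_perfect_match - degrees_per_percent * percent_mismatch
```

For a perfect-match Tm of 70.5° C., targeting sequences with at least 90% identity gives an adjusted value of 60.5° C., matching the 10° C. decrease described above.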
Exemplary low stringency conditions include hybridization with a buffer solution of 30-35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulfate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C. Optionally, wash buffers can comprise about 0.1% to about 1% SDS. Duration of hybridization is generally less than about 24 hours, and is usually about 4 to about 12 hours. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes, Part I, Chapter 2 (Elsevier, New York); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed.; Cold Spring Harbor Laboratory Press, Plainview, N.Y.), the entire contents of which are incorporated herein by reference for these teachings.
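Because the wash conditions above are given as multiples of SSC, it can help to convert them to molar concentrations. The Python sketch below uses only the stated definition (20×SSC = 3.0 M NaCl/0.3 M trisodium citrate) and the exemplary wash ranges from the paragraph above; the names ssc_molarity and EXEMPLARY_WASHES are hypothetical.

```python
# Definition from the text: 20x SSC = 3.0 M NaCl / 0.3 M trisodium citrate.
NACL_M_AT_20X = 3.0
CITRATE_M_AT_20X = 0.3


def ssc_molarity(fold):
    """Return (NaCl molarity, trisodium citrate molarity) for a given x-fold SSC."""
    if fold < 0:
        raise ValueError("SSC strength cannot be negative")
    return NACL_M_AT_20X * fold / 20.0, CITRATE_M_AT_20X * fold / 20.0


# Exemplary wash conditions from the text: (SSC range in x, temperature range in deg C).
EXEMPLARY_WASHES = {
    "low": ((1.0, 2.0), (50, 55)),
    "moderate": ((0.5, 1.0), (55, 60)),
    "high": ((0.1, 0.1), (60, 65)),
}

for level, ((ssc_min, ssc_max), (t_min, t_max)) in EXEMPLARY_WASHES.items():
    nacl_min, _ = ssc_molarity(ssc_min)
    nacl_max, _ = ssc_molarity(ssc_max)
    print(f"{level}: {nacl_min:.3f}-{nacl_max:.3f} M NaCl at {t_min}-{t_max} deg C")
```

By this definition, a 0.1×SSC high stringency wash contains about 0.015 M NaCl and 0.0015 M trisodium citrate, while a 2×SSC low stringency wash contains about 0.3 M NaCl, which shows how sharply the ionic strength falls as stringency increases.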
In a PCR approach, oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any organism of interest. PCR primers are preferably at least about 10 nucleotides in length, or at least about 20 nucleotides in length. Methods for designing PCR primers and PCR cloning are generally known in the art and are disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. (1995) PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. (1999) PCR Methods Manual (Academic Press, New York), the entire contents of which are incorporated herein by reference for these teachings. Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially-mismatched primers, and the like.
Diagnostic assays to detect expression of the peptides, polypeptides and/or nucleic acid molecules of this invention, as well as their disclosed activity in a sample, are disclosed. An exemplary method for detecting the presence or absence of a nucleic acid or protein of this invention in a sample comprises obtaining a sample from a food/dairy/feed product, starter culture (mother, seed, bulk/set, concentrated, dried, lyophilized, frozen), cultured food/dairy/feed product, dietary supplement, bioprocessing fermentate, a subject (e.g., a subject that has ingested a probiotic material), etc., and contacting the sample with a compound or an agent that interacts with or combines with the peptides, polypeptides or nucleic acids of this invention in a detectable manner (e.g., an mRNA or genomic DNA comprising the disclosed nucleic acid or fragment thereof) such that the presence of the peptide or nucleic acid is detected in the sample. Results obtained with a sample from the food, supplement, culture, product, or subject can be compared to results obtained with a sample from a control culture, product, or subject and a qualitative and/or quantitative determination of the presence of a polypeptide or nucleic acid of this invention in the sample can be made.
One agent for detecting the mRNA and/or genomic DNA comprising a disclosed nucleotide sequence of this invention is a labeled nucleic acid probe capable of hybridizing to the nucleotide sequence present in the mRNA and/or genomic DNA. The nucleic acid probe can be, for example, a disclosed nucleic acid molecule, such as the nucleic acid of odd numbered SEQ ID NOS:1-164, or a fragment thereof, such as a nucleic acid molecule of at least 15, 30, 50, 100, 250, or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to the mRNA or genomic DNA comprising the disclosed nucleic acid sequence. Other suitable probes for use in the diagnostic assays of the invention are described herein.
One agent for detecting a protein of this invention is an antibody or ligand that specifically binds a peptide or protein of this invention. In some embodiments, the antibody or ligand can comprise a detectable label. Antibodies of this invention can be polyclonal or monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab′)2), can be used. The term “labeled,” with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody, by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a fluorescently labeled secondary antibody and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently labeled streptavidin.
An isolated peptide, polypeptide or protein of the present invention can be used as an antigen or immunogen to generate antibodies that specifically bind two-component regulatory proteins or proteins under the control of two-component sensing and regulatory molecules, or to stimulate production of antibodies in vivo. The full-length polypeptide of the invention can be used as an immunogen or, alternatively, antigenic peptide fragments can be used. The antigenic peptide can comprise at least 8, 10, 15, 20, or 30 or more amino acid residues of the amino acid sequence shown in even numbered SEQ ID NOS:1-164 and encompasses an epitope of a two-component regulatory protein or a protein under the control of two-component sensing and regulatory molecules such that an antibody raised against the peptide forms a specific immune complex with the protein or fragment thereof. An epitope encompassed by the antigenic peptide can comprise a region of the protein that is located on the surface of the protein, e.g., a hydrophilic region.
The term “sample” is intended to include tissues, cells, and biological fluids present in or isolated from a subject, as well as cells from starter cultures or food products carrying such cultures, or derived from the use of such cultures. That is, the detection method of the invention can be used to detect mRNA, protein, or genomic DNA comprising a nucleic acid molecule or amino acid sequence of this invention in a sample both in vitro and in vivo. In vitro techniques for detection of mRNA comprising a disclosed sequence include, but are not limited to, Northern hybridizations and in situ hybridizations. In vitro techniques for detection of a protein comprising a disclosed polypeptide include, but are not limited to, enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations, and immunofluorescence. In vitro techniques for detection of genomic DNA comprising the disclosed nucleotide sequences include, but are not limited to, Southern hybridizations. Furthermore, in vivo techniques for detection of a protein of this invention include introducing into a subject a labeled antibody or ligand that specifically binds the protein. For example, the antibody or ligand can be labeled with a radioactive marker whose presence and location in a subject can be detected by standard imaging techniques.
In one embodiment, the sample of this invention comprises protein molecules from a subject that has consumed a probiotic material. Alternatively, the sample can contain mRNA or genomic DNA from a starter culture.
The invention also encompasses kits for detecting the presence of the nucleic acids or proteins of this invention in a sample. Such kits can be used to determine if a microbe producing a specific polypeptide of the invention is present in a food product or starter culture, or in a subject that has consumed a probiotic material. For example, the kit can comprise a labeled compound or agent capable of detecting a disclosed polypeptide or mRNA in a sample and means for determining the amount of the disclosed polypeptide in the sample (e.g., an antibody or ligand that specifically binds the disclosed polypeptide, or a nucleic acid probe that hybridizes with nucleic acid sequences encoding a disclosed polypeptide, e.g., odd numbered SEQ ID NOS:1-164). Kits can also include instructions detailing the use of such compounds.
For antibody-based kits, the kit can comprise, for example: (1) a first antibody (e.g., attached to a solid support) that binds to a disclosed polypeptide; and, optionally, (2) a second, different antibody that binds to the disclosed polypeptide or the first antibody and is conjugated to a detectable agent. For nucleic acid-based kits, the kit can comprise, for example: (1) a nucleic acid molecule, e.g., a detectably labeled oligonucleotide, that hybridizes to a disclosed nucleic acid sequence or (2) a pair of primers useful for amplifying a disclosed nucleic acid molecule.
The kit can also comprise, e.g., a buffering agent, a preservative, and/or a protein stabilizing agent. The kit can also comprise components necessary for detecting the detectable agent (e.g., an enzyme or a substrate). The kit can also contain a control sample or a series of control samples that can be assayed and compared to the test sample. Each component of the kit can be enclosed within an individual container, and all of the various containers can be within a single package along with instructions for use.
In one embodiment, the kit comprises multiple probes in an array format, such as those described, for example, in U.S. Pat. Nos. 5,412,087 and 5,545,531, and International Publication No. WO 95/00530, herein incorporated by reference in their entireties. Probes for use in the array can be synthesized either directly onto the surface of the array, as disclosed in International Publication No. WO 95/00530, or prior to immobilization onto the array surface (Gait, ed. (1984) Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, England)). The probes can be immobilized onto the surface using techniques well known to one of skill in the art, such as those described in U.S. Pat. No. 5,412,087. Probes can be a nucleic acid or amino acid sequence, preferably purified, or an antibody.
The arrays can be used to screen organisms, samples, or products for differences in their genomic, cDNA, polypeptide, or antibody content, including the presence or absence of specific sequences or proteins, as well as the concentration of those materials. Binding to a capture probe is detected, for example, by signal generated from a label attached to the nucleic acid molecule comprising the disclosed nucleic acid sequence, a polypeptide comprising the disclosed amino acid sequence, or an antibody. The method can include contacting the molecule comprising the disclosed nucleic acid, polypeptide, or antibody with a first array having a plurality of capture probes and a second array having a different plurality of capture probes. The results of each hybridization can be compared to analyze differences in expression between a first and second sample. The first plurality of capture probes can be from a control sample, e.g., a wild type lactic acid bacteria, or control subject, e.g., a food, dietary supplement, starter culture sample or a biological fluid. The second plurality of capture probes can be from an experimental sample, e.g., a mutant type lactic acid bacteria, or subject that has consumed a probiotic material, e.g., a starter culture sample, or a biological fluid.
These assays can be especially useful in microbial selection and quality control procedures where the detection of unwanted materials is essential. The detection of particular nucleotide sequences or polypeptides can also be useful in determining the genetic composition of food, fermentation products, or industrial microbes, or microbes present in the digestive system of animals or humans that have consumed probiotics.
The present invention further provides a nucleic acid array or chip, i.e., a multitude of nucleic acids (e.g., DNA) as molecular probes precisely organized or arrayed on a solid support, which allow for the sequencing of genes, the study of mutations contained therein and/or the analysis of the expression of genes, as such arrays and chips are currently of interest given their very small size and their high capacity in terms of number of analyses.
For an analysis, the carrier, such as in a DNA array/chip, is coated with DNA probes (e.g., oligonucleotides) that are arranged at a predetermined location or position on the carrier. A sample containing a target nucleic acid and/or fragments thereof to be analyzed, for example DNA or RNA or cDNA, that has been labeled beforehand, is contacted with the DNA array/chip leading to the formation, through hybridization, of a duplex. After a washing step, analysis of the surface of the chip allows any hybridizations to be located by means of the signals emitted by the labeled target. A hybridization fingerprint results, which, by computer processing, allows retrieval of information such as the expression of genes, the presence of specific fragments in the sample, the determination of sequences and/or the identification of mutations.
In one embodiment of this invention, hybridization between target nucleic acids and nucleic acids of the invention, used in the form of probes and deposited or synthesized in situ on a DNA chip/array, can be determined by means of fluorescence, radioactivity, electronic detection or the like, as are well known in the art.
In another embodiment, the nucleotide sequences of the invention can be used in the form of a DNA array/chip to carry out analyses of the expression of Lactobacillus acidophilus genes. This analysis is based on DNA arrays/chips on which probes, chosen for their specificity to characterize a given gene or nucleotide sequence, are present. The target sequences to be analyzed are labeled before being hybridized onto the chip. After washing, the labeled complexes are detected and quantified. Comparative analyses of the signal intensities obtained with respect to the same probe for different samples and/or for different probes with the same sample allow, for example, differential transcription of RNA derived from the sample to be detected.
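By way of non-limiting illustration, the comparison of signal intensities obtained with the same probe for two different samples can be sketched as a log2 ratio per probe. The sketch assumes background-corrected intensities supplied as dictionaries keyed by probe name, and applies a small intensity floor to avoid division by zero; the probe names, the floor value, and the function name are hypothetical, and no normalization between arrays is attempted.

```python
import math
from typing import Dict


def log2_ratios(control: Dict[str, float],
                experimental: Dict[str, float],
                floor: float = 1.0) -> Dict[str, float]:
    """Return log2(experimental / control) for probes present in both samples.

    Intensities below `floor` are raised to `floor` so that very weak or
    zero signals do not produce undefined ratios (an assumption of this
    sketch, not a prescribed procedure).
    """
    ratios = {}
    for probe in sorted(control.keys() & experimental.keys()):
        c = max(control[probe], floor)
        e = max(experimental[probe], floor)
        ratios[probe] = math.log2(e / c)
    return ratios


if __name__ == "__main__":
    # Hypothetical intensities for two hypothetical probes.
    control = {"probe_a": 100.0, "probe_b": 400.0}
    experimental = {"probe_a": 400.0, "probe_b": 100.0}
    for probe, ratio in log2_ratios(control, experimental).items():
        print(probe, round(ratio, 2))
```

A positive value indicates a stronger signal in the experimental sample than in the control sample for that probe, and a negative value indicates a weaker signal.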
In yet another embodiment, arrays/chips containing nucleotide sequences of the invention can comprise nucleotide sequences specific for other microorganisms, which allows for serial testing and rapid identification of the presence of a microorganism in a sample.
In a further embodiment, the principle of the DNA array/chip can also be used to produce protein arrays/chips on which the support has been coated with a polypeptide and/or an antibody of this invention, or arrays thereof, in place of the nucleic acid. These protein arrays/chips make it possible, for example, to analyze the biomolecular interactions induced by the affinity capture of targets onto a support coated, e.g., with proteins, by surface plasmon resonance (SPR). The polypeptides or antibodies of this invention, capable of specifically binding antibodies or polypeptides derived from the sample to be analyzed, can be used in protein arrays/chips for the detection and/or identification of proteins and/or peptides in a sample.
Thus, the present invention provides a microarray or microchip comprising various nucleic acids of this invention in any combination, including repeats, as well as a microarray comprising various polypeptides of this invention in any combination, including repeats. Also provided is a microarray comprising one or more antibodies that specifically react with various polypeptides of this invention, in any combination, including repeats.
The present invention also encompasses antisense nucleic acid molecules, i.e., molecules that are complementary to a sense nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded cDNA molecule, or complementary to an mRNA sequence. Accordingly, an antisense nucleic acid can hydrogen bond to a sense nucleic acid. The antisense nucleic acid can be complementary to an entire sequence, or to only a portion thereof, e.g., all or part of the protein coding region (or open reading frame). An antisense nucleic acid molecule can be antisense to a noncoding region of the coding strand of a nucleotide sequence of the invention. The noncoding regions are the 5′ and 3′ sequences that flank the coding region and are not translated into amino acids. Antisense nucleotide sequences are useful in disrupting the expression of the target gene. Antisense constructions having 70%, 80%, or 85% sequence identity to the corresponding sequence can be used.
Given the coding-strand sequence encoding a protein disclosed herein (e.g., odd numbered SEQ ID NOS:1-164), antisense nucleic acids of the invention can be designed according to the rules of Watson and Crick base pairing. The antisense nucleic acid molecule can be complementary to the entire coding region of the mRNA, but can also be an oligonucleotide that is antisense to only a portion of the coding or noncoding region of the mRNA. An antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides in length, or it can be 100, 200 nucleotides, or greater in length, including any value in between those listed herein. An antisense nucleic acid of the invention can be constructed using chemical synthesis and enzymatic ligation procedures known in the art.
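By way of non-limiting illustration, the design of an antisense oligonucleotide from a coding-strand sequence according to Watson and Crick base pairing can be sketched as follows. The sketch assumes a DNA coding strand written 5′ to 3′ using only A, C, G, and T, and returns a DNA oligonucleotide, also written 5′ to 3′, that is complementary to a chosen portion of that strand. The function name and the example sequence are hypothetical and are not taken from the sequence listing.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(coding_strand: str, start: int, length: int) -> str:
    """Return a DNA antisense oligo (5'->3') against coding_strand[start:start+length].

    Because the two strands of a duplex are antiparallel, the antisense
    sequence is the complement of the target read in reverse order.
    """
    if length < 1:
        raise ValueError("length must be positive")
    if start < 0 or start + length > len(coding_strand):
        raise ValueError("target region falls outside the coding strand")
    target = coding_strand[start:start + length].upper()
    return "".join(WATSON_CRICK[base] for base in reversed(target))


if __name__ == "__main__":
    # Hypothetical coding-strand fragment used only for illustration.
    coding = "ATGGCCAAATTTGGGCCC"
    print(antisense_oligo(coding, 0, 6))            # antisense to the first six bases
    print(antisense_oligo(coding, 0, len(coding)))  # antisense to the whole fragment
```

The same function covers both cases described above: passing the full length of the coding region yields an antisense molecule complementary to the entire region, and passing a shorter window yields an oligonucleotide antisense to only a portion of it.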
An antisense nucleic acid molecule of the invention can be an α-anomeric nucleic acid molecule (Gaultier et al. (1987) Nucleic Acids Res. 15:6625-6641). The antisense nucleic acid molecule can also comprise a 2′-O-methylribonucleotide (Inoue et al. (1987) Nucleic Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue (Inoue et al. (1987) FEBS Lett. 215:327-330). The invention also encompasses ribozymes, which are catalytic RNA molecules with ribonuclease activity that are capable of cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a complementary region. The invention also encompasses nucleic acid molecules that form triple helical structures. See generally Helene (1991) Anticancer Drug Des. 6(6):569; Helene (1992) Ann. N.Y. Acad. Sci. 660:27; and Maher (1992) Bioessays 14(12):807, the entire contents of each of which are incorporated herein by reference for these teachings.
In some embodiments, the nucleic acid molecules of the invention can be modified at the base moiety, sugar moiety, or phosphate backbone to improve, e.g., the stability, hybridization, or solubility of the molecule. As used herein, the terms “peptide nucleic acids” or “PNAs” refer to nucleic acid mimics, e.g., DNA mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of PNAs has been shown to allow for specific hybridization to DNA and RNA under conditions of low ionic strength. The synthesis of PNA oligomers can be performed using standard solid-phase peptide synthesis protocols as described, for example, in Hyrup et al. (1996) supra; Perry-O'Keefe et al. (1996) Proc. Natl. Acad. Sci. USA 93:14670, the entire contents of each of which are incorporated herein by reference for these teachings.
In another embodiment, PNAs of a sequence can be modified, e.g., to enhance stability, specificity, or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the formation of PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery known in the art. The synthesis of PNA-DNA chimeras can be performed as described in Hyrup (1996) supra; Finn et al. (1996) Nucleic Acids Res. 24(17):3357-3363; Mag et al. (1989) Nucleic Acids Res. 17:5973; and Peterson et al. (1975) Bioorganic Med. Chem. Lett. 5:1119, the entire contents of each of which are incorporated herein by reference for these teachings.
The invention also includes chimeric or fusion proteins. A “chimeric protein” or “fusion protein” of this invention comprises a peptide or polypeptide as described herein operably linked (e.g., in frame) to a heterologous peptide or polypeptide. “Heterologous” in reference to a sequence is a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. For example, a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original genomic locus, or the promoter is not the native promoter for the operably linked polynucleotide. A “heterologous peptide or polypeptide” refers to a peptide or polypeptide having an amino acid sequence corresponding to a protein that is not substantially identical to the amino acid sequence or protein of this invention, and which is derived from the same or a different organism. Within a fusion protein of this invention, the two-component regulatory peptide or polypeptide or the protein under the control of two-component sensing and regulatory molecules can comprise all or a portion of a polypeptide of the invention, preferably including at least one biologically active portion of the polypeptide. Within the fusion protein, the term “linked” is intended to indicate that the two-component regulatory peptide or polypeptide or the protein under the control of two-component sensing and regulatory molecules and the heterologous peptide or polypeptide are fused or joined or connected in-frame to each other. The heterologous peptide or polypeptide can be fused to the N-terminus and/or C-terminus of a peptide or polypeptide of this invention.
Expression of the linked coding sequences (e.g., a nucleotide sequence encoding the peptide or polypeptide of the invention linked in frame with a nucleotide sequence encoding the heterologous peptide or polypeptide) in some embodiments results in production of the fusion protein. The heterologous sequence can be a polypeptide that potentiates or increases production of the fusion protein in a cell. The portion of the fusion protein encoded by the heterologous sequence, i.e., the heterologous polypeptide, can be a protein fragment or peptide, an entire functional moiety, or an entire protein sequence. The heterologous peptide or polypeptide can be designed to be used in purifying the fusion protein, either with antibodies or with affinity purification specific for the heterologous polypeptide. Likewise, physical properties of the heterologous polypeptide can be exploited to allow selective purification of the fusion protein. Particular heterologous polypeptides of interest include superoxide dismutase (SOD), maltose-binding protein (MBP), glutathione-S-transferase (GST), an N-terminal histidine (His) tag, immunoglobulin, and the like. This list is not intended to be limiting, as any heterologous polypeptide (e.g., a protein that potentiates production of the two-component regulatory protein as a fusion protein) can be used in the compositions and methods of the invention. In one embodiment, the fusion protein is a GST-two-component regulatory fusion protein in which the two-component regulatory sequences are fused to the C-terminus of the GST sequences. In another embodiment, the fusion protein is a two-component regulatory-immunoglobulin fusion protein in which all or part of a two-component regulatory protein is fused to sequences derived from a member of the immunoglobulin protein family.
The immunoglobulin fusion proteins of the invention can be used as immunogens to produce antibodies in a subject, to purify ligands, and in screening assays to identify molecules that inhibit the interaction of a protein of the invention with a ligand.
One of skill in the art will recognize that the particular heterologous polypeptide is chosen with the purification scheme in mind. For example, His tags, GST, and maltose-binding protein represent heterologous polypeptides that have readily available affinity columns to which they can be bound and eluted. Thus, where the heterologous polypeptide is an N-terminal His tag such as hexahistidine (His6 tag), the two-component regulatory fusion protein can be purified using a matrix comprising a metal-chelating resin, for example, nickel nitrilotriacetic acid (Ni-NTA), nickel iminodiacetic acid (Ni-IDA), and cobalt-containing resin (Co-resin). See, for example, Steinert et al. (1997) QIAGEN News 4:11-15, herein incorporated by reference in its entirety for these teachings. Where the heterologous polypeptide is GST, the fusion protein can be purified using a matrix comprising glutathione-agarose beads (Sigma or Pharmacia Biotech); where the heterologous polypeptide is a maltose-binding protein (MBP), the fusion protein can be purified using a matrix comprising an agarose resin derivatized with amylose.
Preferably, a chimeric or fusion protein of the invention is produced by standard recombinant DNA techniques. For example, nucleic acid fragments coding for the different polypeptide sequences can be ligated together in-frame, or the fusion nucleic acid can be synthesized, such as with automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive nucleic acid fragments, which can subsequently be annealed and re-amplified to generate a chimeric nucleic acid sequence (see, e.g., Ausubel et al., eds. (1995) Current Protocols in Molecular Biology (Greene Publishing and Wiley-Interscience, New York)). Moreover, the sequences of the invention can be cloned into a commercially available expression vector such that they are linked in-frame to an existing fusion moiety. Thus, the present invention also provides a vector comprising a nucleic acid encoding a fusion protein of this invention.
A fusion protein expression vector is typically designed for ease of removing the heterologous polypeptide to allow the two-component regulatory protein or the protein under the control of two-component sensing and regulatory molecules to retain the native biological activity associated with it. Methods for cleavage of fusion proteins are known in the art. See, for example, Ausubel et al., eds. (1998) Current Protocols in Molecular Biology (John Wiley & Sons, Inc.). Chemical cleavage of the fusion protein can be accomplished with reagents such as cyanogen bromide, 2-(2-nitrophenylsulphenyl)-3-methyl-3′-bromoindolenine, hydroxylamine, or low pH. Chemical cleavage is often accomplished under denaturing conditions to cleave otherwise insoluble fusion proteins.
Where separation of the polypeptide from the heterologous polypeptide is desired and a cleavage site at the junction between these fused polypeptides is not naturally occurring, the fusion construct can be designed to contain a specific protease cleavage site to facilitate enzymatic cleavage and removal of the heterologous polypeptide. In this manner, a linker sequence comprising a coding sequence for a peptide that has a cleavage site specific for an enzyme of interest can be fused in-frame between the coding sequence for the heterologous polypeptide (for example, MBP, GST, SOD, or an N-terminal His tag) and the coding sequence for the two-component regulatory polypeptide. Suitable enzymes having specificity for cleavage sites include, but are not limited to, factor Xa, thrombin, enterokinase, rennin, collagenase, and tobacco etch virus (TEV) protease. Cleavage sites for these enzymes are well known in the art. Thus, for example, where factor Xa is to be used to cleave the heterologous polypeptide from the two-component regulatory polypeptide, the fusion construct can be designed to comprise a linker sequence encoding a factor Xa-sensitive cleavage site, for example, the sequence IEGR (see, for example, Nagai and Thøgersen (1984) Nature 309:810-812, Nagai and Thøgersen (1987) Meth. Enzymol. 153:461-481, and Pryor and Leiting (1997) Protein Expr. Purif. 10(3):309-319, herein incorporated by reference). Where thrombin is to be used to cleave the heterologous polypeptide from the two-component regulatory polypeptide, the fusion construct can be designed to comprise a linker sequence encoding a thrombin-sensitive cleavage site, for example the sequence LVPRGS or VIAGR (see, for example, Pryor and Leiting (1997) Protein Expr. Purif. 10(3):309-319, and Hong et al. (1997) Chin. Med. Sci. J. 12(3):143-147, respectively, herein incorporated by reference). Cleavage sites for TEV protease are known in the art. See, for example, the cleavage sites described in U.S. Pat. No. 5,532,142, herein incorporated by reference in its entirety. See also the discussion in Ausubel et al., eds. (1998) Current Protocols in Molecular Biology (John Wiley & Sons, Inc.), Chapter 16.
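By way of non-limiting illustration, the placement of a protease-sensitive linker between a heterologous polypeptide and a polypeptide of interest, and a check for whether the polypeptide of interest already contains the same site internally, can be sketched as follows. The sketch works at the amino acid level using the factor Xa and thrombin site sequences named above; the tag and target sequences, the dictionary, and the function names are hypothetical.

```python
from typing import List, Tuple

# Site sequences taken from the text above; other enzymes are omitted.
CLEAVAGE_SITES = {
    "factor Xa": ["IEGR"],
    "thrombin": ["LVPRGS", "VIAGR"],
}


def find_sites(protein: str, enzyme: str) -> List[Tuple[int, str]]:
    """Return (position, motif) for every occurrence of the enzyme's sites."""
    hits = []
    for motif in CLEAVAGE_SITES[enzyme]:
        pos = protein.find(motif)
        while pos != -1:
            hits.append((pos, motif))
            pos = protein.find(motif, pos + 1)
    return sorted(hits)


def build_fusion(tag: str, target: str, enzyme: str) -> str:
    """Join tag and target with the first listed site for the enzyme as linker."""
    linker = CLEAVAGE_SITES[enzyme][0]
    return tag + linker + target


if __name__ == "__main__":
    tag = "MHHHHHH"    # hypothetical N-terminal His tag
    target = "MKTLLV"  # hypothetical polypeptide of interest
    internal = find_sites(target, "factor Xa")
    if internal:
        print("target already contains a factor Xa site:", internal)
    fusion = build_fusion(tag, target, "factor Xa")
    print(fusion, find_sites(fusion, "factor Xa"))
```

Checking the polypeptide of interest for internal sites before choosing the enzyme is a design consideration of this sketch: an internal site would be cut along with the linker.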
An isolated polypeptide of the present invention can be used as an immunogen to generate antibodies that specifically bind to the sequence of the invention or stimulate production of antibodies in vivo. A full-length polypeptide of the invention can be used as an immunogen or, alternatively, antigenic peptide fragments of the polypeptides described herein can be used. The antigenic peptide of the polypeptide comprises at least 8, preferably 10, 15, 20, or 30 amino acid residues of the amino acid sequence shown in even numbered SEQ ID NOS:2-164 and encompasses an epitope of a protein of the invention such that an antibody raised against the peptide forms a specific immune complex with the related protein. Specific epitopes encompassed by the antigenic peptide are regions that can be located on the surface of the protein, e.g., hydrophilic regions.
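By way of non-limiting illustration, a candidate hydrophilic region of at least 8 residues can be located with a sliding-window average of a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, on which more negative values indicate more hydrophilic residues; the choice of this scale, the window length, the function name, and the example sequence are assumptions made for illustration, and a hydrophilic window is only a rough proxy for surface exposure or antigenicity.

```python
from typing import Tuple

# Kyte-Doolittle hydropathy values (negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(protein: str, window: int = 8) -> Tuple[int, str, float]:
    """Return (start, peptide, mean hydropathy) of the most hydrophilic window.

    Ties are resolved in favor of the first window encountered.
    """
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    best_start, best_score = 0, float("inf")
    for i in range(len(protein) - window + 1):
        score = sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
        if score < best_score:
            best_start, best_score = i, score
    return best_start, protein[best_start:best_start + window], best_score


if __name__ == "__main__":
    # Hypothetical sequence: a charged stretch flanked by leucine runs.
    example = "LLLLLLLLKDEKRDEKLLLLLLLL"
    print(most_hydrophilic_window(example, window=8))
```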
The nucleic acid molecules of the present invention can be included in vectors, which can be expression vectors. “Vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Expression vectors include one or more regulatory sequences and direct the expression of nucleic acids to which they are operably linked. By “operably linked” is intended that the nucleotide sequence of interest is linked to the regulatory sequence(s) such that expression of the nucleotide sequence is allowed (e.g., in an in vitro transcription/translation system or in a cell when the vector is introduced into the cell). The term “regulatory sequence” is intended to include controllable transcriptional promoters, operators, enhancers, transcriptional terminators, and other expression control elements such as translational control sequences (e.g., Shine-Dalgarno consensus sequence, initiation and termination codons). These regulatory sequences will differ, for example, depending on the cell being used.
The vectors can be autonomously replicated in a cell (episomal vectors), or can be integrated into the genome of a cell, and replicated along with the host genome (non-episomal mammalian vectors). Integrating vectors can contain at least one sequence homologous to the bacterial chromosome that allows for recombination to occur between homologous DNA in the vector and the bacterial chromosome. Integrating vectors can also comprise bacteriophage or transposon sequences. Episomal vectors, or plasmids, are circular double-stranded DNA loops into which additional DNA segments can be ligated. Plasmids capable of stable maintenance in a cell are generally the preferred form of expression vectors when using recombinant DNA techniques.
The expression constructs or vectors encompassed in the present invention comprise a nucleic acid construct of the invention in a form suitable for expression of the nucleic acid in a cell. Expression in prokaryotic cells is encompassed in the present invention. It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the cell to be transformed, the level of expression of protein desired, etc. The expression vectors of the invention can be introduced into cells to thereby produce proteins or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., two-component regulatory proteins, mutant forms of two-component regulatory proteins, fusion proteins, etc.).
Regulatory sequences include those that direct constitutive expression of a nucleotide sequence as well as those that direct inducible expression of the nucleotide sequence only under certain conditions. A bacterial promoter is any DNA sequence capable of binding bacterial RNA polymerase and initiating the downstream (3′) transcription of a coding sequence into mRNA. A promoter can have a transcription initiation region, which is usually placed proximal to the 5′ end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. A bacterial promoter can also have a second domain called an operator, which can overlap an adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits negatively regulated (inducible) transcription, as a gene repressor protein can bind the operator and thereby inhibit transcription of a specific gene. Constitutive expression can occur in the absence of negative regulatory elements, such as the operator. In addition, positive regulation can be achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5′) to the RNA polymerase binding sequence.
An example of a gene activator protein is the catabolite activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (Raibaud et al. (1984) Annu. Rev. Genet. 18:173). Regulated expression can therefore be either positive or negative, thereby either enhancing or reducing transcription. Other examples of positive and negative regulatory elements are well known in the art. Various promoters that can be included in the protein expression system include, but are not limited to, a T7/LacO hybrid promoter, a trp promoter, a T7 promoter, a lac promoter, and a bacteriophage lambda promoter. Any suitable promoter can be used to carry out the present invention, including the native promoter or a heterologous promoter. Heterologous promoters can be constitutively active or inducible. A non-limiting example of a heterologous promoter is given in U.S. Pat. No. 6,242,194 to Kullen and Klaenhammer.
Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose (lac) (Chang et al. (1987) Nature 198:1056), and maltose. Additional examples include promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) (Goeddel et al. (1980) Nucleic Acids Res. 8:4057; Yelverton et al. (1981) Nucleic Acids Res. 9:731; U.S. Pat. No. 4,738,921; EPO Publication Nos. 36,776 and 121,775). The beta-lactamase (bla) promoter system (Weissmann (1981) “The Cloning of Interferon and Other Mistakes,” in Interferon 3 (ed. I. Gresser)); bacteriophage lambda PL (Shimatake et al. (1981) Nature 292:128); the arabinose-inducible araB promoter (U.S. Pat. No. 5,028,530); and T5 (U.S. Pat. No. 4,689,406) promoter systems also provide useful promoter sequences. See also Balbas (2001) Mol. Biotech. 19:251-267, where E. coli expression systems are discussed.
In addition, synthetic promoters that do not occur in nature also function as bacterial promoters. For example, transcription activation sequences of one bacterial or bacteriophage promoter can be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a synthetic hybrid promoter (U.S. Pat. No. 4,551,433). For example, the tac (Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21) and trc (Brosius et al. (1985) J. Biol. Chem. 260:3539-3541) promoters are hybrid trp-lac promoters comprised of both trp promoter and lac operon sequences that are regulated by the lac repressor. The tac promoter has the additional feature of being an inducible regulatory sequence. Thus, for example, expression of a coding sequence operably linked to the tac promoter can be induced in a cell culture by adding isopropyl-1-thio-β-D-galactoside (IPTG). Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA polymerase/promoter system is an example of a coupled promoter system (Studier et al. (1986) J. Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074). In addition, a hybrid promoter can also be comprised of a bacteriophage promoter and an E. coli operator region (EPO Publication No. 267,851).
The vector can additionally contain a nucleotide sequence encoding the repressor (or inducer) for that promoter. For example, an inducible vector of the present invention can regulate transcription from the Lac operator (LacO) by expressing the nucleotide sequence encoding the LacI repressor protein. Other examples include the use of the lexA gene to regulate expression of pRecA, and the use of trpO to regulate ptrp. Alleles of such genes that increase the extent of repression (e.g., lacIq) or that modify the manner of induction (e.g., lambda CI857, rendering lambda pL thermo-inducible, or lambda CI+, rendering lambda pL chemo-inducible) can be employed. In addition to a functioning promoter sequence, an efficient ribosome-binding site is also useful for the expression of the fusion construct. In prokaryotes, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon (Shine et al. (1975) Nature 254:34). The SD sequence is thought to promote binding of mRNA to the ribosome by the pairing of bases between the SD sequence and the 3′ end of bacterial 16S rRNA (Steitz et al. (1979) “Genetic Signals and Nucleotide Sequences in Messenger RNA,” in Biological Regulation and Development: Gene Expression (ed. R. F. Goldberger, Plenum Press, NY)).
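The spacing rule described above for the Shine-Dalgarno sequence can be illustrated with a short sketch. It is a minimal example only: the candidate motifs are assumed to be substrings (four or more nucleotides) of the E. coli consensus AGGAGG, the "gap" is assumed to mean the number of nucleotides between the end of the motif and the ATG, and the function name find_sd_sites is hypothetical. Actual ribosome-binding sites are more variable than this pattern allows.

```python
import re

# Assumption: candidate motifs are substrings (>= 4 nt) of the E. coli
# consensus AGGAGG; real ribosome-binding sites are more variable.
CONSENSUS = "AGGAGG"
MOTIFS = sorted(
    [CONSENSUS[i:j]
     for i in range(len(CONSENSUS))
     for j in range(i + 4, len(CONSENSUS) + 1)],
    key=len,
    reverse=True,  # try the longest motifs first
)


def find_sd_sites(dna, min_gap=3, max_gap=11):
    """Return (sd_start, sd_motif, gap, atg_start) for each ATG that has an
    SD-like motif ending min_gap..max_gap nucleotides upstream of it."""
    dna = dna.upper()
    hits = []
    # The lookahead lets overlapping ATG triplets be reported as well.
    for match in re.finditer(r"(?=ATG)", dna):
        atg = match.start()
        hit = None
        for motif in MOTIFS:
            for gap in range(min_gap, max_gap + 1):
                start = atg - gap - len(motif)
                if start >= 0 and dna[start:start + len(motif)] == motif:
                    hit = (start, motif, gap, atg)
                    break
            if hit:
                break
        if hit:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Toy sequence (hypothetical): AGGAGG, six spacer bases, then ATG.
    print(find_sd_sites("TTTAGGAGGTTTTTTATGAAA"))
```

For each ATG the sketch reports only the longest matching motif at the smallest permitted gap; a scoring scheme based on pairing with the 3′ end of 16S rRNA would be a natural refinement.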
Two-component regulatory proteins and proteins under the control of two-component sensing and regulatory molecules can also be secreted from the cell by creating chimeric DNA molecules that encode a protein comprising a signal peptide sequence fragment that provides for secretion of the two-component regulatory polypeptides in bacteria (U.S. Pat. No. 4,336,336). The signal sequence fragment typically encodes a signal peptide comprised of hydrophobic amino acids that direct the secretion of the protein from the cell. The protein is either secreted into the growth medium (Gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (Gram-negative bacteria). Preferably there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal peptide fragment and the protein of the invention.
DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as the E. coli outer membrane protein gene (ompA) (Masui et al. (1983) FEBS Lett. 151(1):159-164; Ghrayeb et al. (1984) EMBO J. 3:2437-2442) and the E. coli alkaline phosphatase signal sequence (phoA) (Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212). Other prokaryotic signals include, for example, the signal sequence from penicillinase, lpp, or heat stable enterotoxin II leaders.
Typically, transcription termination sequences recognized by bacteria are regulatory regions located 3′ to the translation stop codon and thus, together with the promoter, flank the coding sequence. These sequences direct the transcription of an mRNA that can be translated into the polypeptide encoded by the DNA. Transcription termination sequences frequently include DNA sequences (of about 50 nucleotides) that are capable of forming stem loop structures that aid in terminating transcription. Examples include transcription termination sequences derived from genes with strong promoters, such as the trp gene in E. coli as well as other biosynthetic genes.
Bacteria such as Lactobacillus acidophilus generally utilize the translation start codon ATG, which specifies the amino acid methionine (which is modified to N-formylmethionine in prokaryotic organisms). Bacteria also recognize alternative translation start codons, such as the codons GTG and TTG, which code for valine and leucine, respectively. However, when these alternative translation start codons are used as the initiation codon, these codons direct the incorporation of methionine rather than of the amino acid that they normally encode. Lactobacillus acidophilus NCFM recognizes these alternative translation start sites and incorporates methionine as the first amino acid.
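The start-codon behavior described above can be shown in a brief sketch: when ATG, GTG, or TTG is the initiation codon, methionine is placed first, while the same triplets inside the reading frame keep their usual meaning. The sketch assumes the standard genetic code and an open reading frame supplied in frame from its start codon; the function name translate_orf is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Start codons named in the text above.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_orf(orf):
    """Translate an in-frame ORF, placing methionine at the first position
    whichever of the recognized start codons is used."""
    orf = orf.upper()
    if orf[:3] not in START_CODONS:
        raise ValueError("ORF does not begin with ATG, GTG, or TTG")
    protein = ["M"]  # initiator is methionine even for GTG or TTG
    for i in range(3, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_orf("GTGAAATAA"))  # GTG as initiator
    print(translate_orf("TTGGTGTAA"))  # internal GTG still encodes valine
```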
The expression vectors will have a plurality of restriction sites for insertion of the sequence of the invention so that it is under transcriptional regulation of the regulatory regions. Selectable marker genes that ensure maintenance of the vector in the cell can also be included in the expression vector. Preferred selectable markers include those which confer resistance to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline (Davies et al. (1978) Annu. Rev. Microbiol. 32:469). Selectable markers can also allow a cell to grow on minimal medium, or in the presence of toxic metabolite and can include biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways.
The regulatory regions can be native (homologous), or can be foreign (heterologous) to the cell and/or the nucleotide sequence of the invention. The regulatory regions can also be natural or synthetic. Where the region is “foreign” or “heterologous” to the cell, it is intended that the region is not found in the native cell into which the region is introduced. Where the region is “foreign” or “heterologous” to the sequence of the invention, it is intended that the region is not the native or naturally occurring region for the operably linked two-component regulatory nucleotide sequence of the invention. For example, the region can be derived from phage. While the sequences could be expressed using heterologous regulatory regions, native regions can be used. Such constructs would be expected in some cases to alter expression levels of two-component regulatory proteins in the cell. Thus, the phenotype of the cell could be altered.
In preparing the expression cassette, the various DNA fragments can be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers can be employed to join the DNA fragments or other manipulations can be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, can be involved.
The invention further provides a vector comprising a nucleic acid molecule of the invention cloned into the vector in an antisense orientation. That is, the nucleic acid molecule is operably linked to a regulatory sequence in a manner that allows for expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to two-component regulatory mRNA. Regulatory sequences operably linked to a nucleic acid cloned in the antisense orientation can be chosen to direct the continuous or inducible expression of the antisense RNA molecule. The antisense expression vector can be in the form of a recombinant plasmid or phagemid in which antisense nucleic acids are produced under the control of a high efficiency regulatory region, the activity of which can be determined by the cell type into which the vector is introduced. For a discussion of the regulation of gene expression using antisense genes see Weintraub et al. (1986) Reviews—Trends in Genetics, Vol. 1(1).
Alternatively, some of the above-described components can be put together in transformation vectors. Transformation vectors are typically comprised of a selectable marker that is either maintained in a replicon or developed into an integrating vector, as described above.
The production of bacteria containing heterologous genes, the preparation of starter cultures of such bacteria, and methods of fermenting substrates, particularly food substrates such as milk, can be carried out in accordance with known techniques, including but not limited to those described in Mäyrä-Mäkinen and Bigret (1993) Lactic Acid Bacteria. Salminen and vonWright eds. Marcel Dekker, Inc. New York. 65-96; Sandine (1996) Dairy Starter Cultures Cogan and Accolas eds. VCH Publishers, New York. 191-206; Gilliland (1985) Bacterial Starter Cultures for Food. CRC Press, Boca Raton, Fla.
By “fermenting” is intended the energy-yielding, metabolic breakdown of organic compounds by microorganisms that generally proceeds under anaerobic conditions and with the evolution of gas.
Nucleic acid molecules of the invention can be introduced into cells by methods known in the art. By “introducing” is intended introduction into prokaryotic cells via conventional transformation or transfection techniques, or by phage-mediated infection. As used herein, the terms “transformation,” “transduction,” “conjugation,” and “protoplast fusion” are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable methods for transforming or transfecting cells can be found in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.) and other laboratory manuals.
Bacterial cells used to express the sequences of the invention are cultured in suitable media, as described generally in Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
Bacteria respond to their environment through the interaction of two regulatory proteins in a two-component transduction system. One protein, generally located in the cytoplasmic membrane, is a sensor that monitors the environment, while the other is a response regulator that mediates an adaptive response, often through effecting a change in the expression of one or more genes. Two-component regulatory system proteins from different bacterial species share considerable amino acid sequence homology. The sensor protein, a histidine kinase, has an N-terminal domain (input domain) (PFAM Accession No. PF00512) that detects stimuli either directly, or through interaction with a receptor. This domain is a dimerization and phosphoacceptor domain. The cytoplasmic region (transmitter domain) of the sensor protein is highly conserved, and comprises two independently folding domains: the phosphotransfer domain and the ATP-binding kinase domain (PFAM Accession No. PF02518). The N-terminal domain may be linked through the phosphotransfer domain to the kinase domain by a HAMP domain (PFAM Accession No. PF00672). The phosphotransfer domain has a histidine residue, in a region called the H box, that is involved in protein autophosphorylation and phosphatase activity. The catalytic (ATP-binding) domain contains regions of amino acid similarity, including the N, G1, F, and G2 boxes, which have been classically defined in alignments of the histidine kinase superfamily. The G1 and G2 boxes are glycine-rich sequences that resemble the nucleotide-binding motifs of other proteins, and the F box is named for a conserved phenylalanine residue. Histidine kinases fall into three subfamilies, with proteins containing the transmitter domain preceded by an amino-terminal input domain, as described above, in the classical protein subfamily. More complex histidine kinases possess a receiver domain that follows the transmitter domain.
This receiver domain is similar to those from response regulators and is linked to an Hpt module (Histidine containing PhosphoTransfer). The phosphotransfer domain is remote from the kinase domain and separate from the sensor domain. These proteins are members of the unorthodox protein subfamily. The third subfamily, the hybrid proteins, are similar to the unorthodox proteins, but the Hpt module is not linked to the receiver.
Histidine kinases may also act as phosphoprotein phosphatases, increasing the dephosphorylation of their cognate response regulators in an ATP-dependent fashion. These phosphorylation/dephosphorylation reactions allow precise control over the amount of the phosphorylated form of the response regulator in the cell.
Assays to measure histidine kinase activity are well known in the art (see, for example, Stewart et al. (1998) Biochemistry 37:12269-12279; Levit et al. (1999) Biochemistry. 38:6651-6658). Methods for identifying active variants of histidine kinase proteins are well known in the art (see, for example, Tawa and Stewart (1994) J. Bacteriol. 176:4210-4218; Hirschman et al. (2001) Biochemistry. 40:13876-13887; Marina et al. (2001) J. Biol. Chem. 276:41182-41190). Methods to identify essential amino acids in the HAMP domain are well known in the art (see, for example, Appleman and Stewart (2003) J. Bacteriol. 185:89-97).
The response regulator generally has two domains, a conserved amino-terminal region termed the receiver domain (PFAM Accession No. PF00072), and a C-terminal output domain (or effector domain) (PFAM Accession No. PF00486), which is typically a transcriptional regulator (Pao and Saier (1995) J. Mol. Evol. 40:136-154). The receiver domain contains three conserved aspartyl and one conserved lysine residue that characterize the response regulator family. The conserved residues fold together to form the active site, where an aspartate residue accepts the phosphoryl group from the transmitter histidine residue, or alternatively, from a variety of small molecules (not ATP). The phosphorylation state of the receiver domain affects the activity of the output domain to elicit a response.
The transcriptional regulatory protein, C terminal domain is almost always found associated with the response regulator receiver domain. It may play a role in DNA binding (Martinez-Hackert and Stock (1997) Structure January 5:109-124). Most output domains have a helix-turn-helix DNA-binding motif. Assays to measure activity of two-component regulatory systems are well known in the art (see, for example, Lee et al. (2004) Infect. Immun. 72:3968-3973; Walker and Miller (2004) J. Bacteriol. 186:4056-4066; Saini et al. (2004) Microbiology. 150:865-875; Abo-Amer et al. (2004) J. Bacteriol. 186:1879-1889). Methods to identify variants that retain activity are well known in the art (see, for example, Baruah et al. (2004) J. Bacteriol. 186:1694-1704; Wang et al. (2001) J. Bacteriol. 183:2795-2802; Piazza et al. (1999) J. Bacteriol. 181:4540-4548).
Proteins of the present invention having a response regulator receiver domain and/or a transcriptional regulatory protein, C terminal domain include those set forth in SEQ ID NOS:2, 12, 22, 26 and 28. Proteins with a histidine kinase A (phosphoacceptor) N-terminal domain of the present invention include those set forth in SEQ ID NOS:4, 14, 20, 24, 30 and 36. Proteins with a histidine kinase-, DNA gyrase B-, and HSP90-like ATPase domain of the present invention include those set forth in SEQ ID NOS:4, 14, 20, 24, 30, 34, and 36. Proteins with a HAMP domain of the present invention include those set forth in SEQ ID NOS:4, 14 and 24. Additional proteins with a response regulator domain of PFAM 00072 include SEQ ID NO:32.
The GGDEF domain (PFAM Accession No. PF00990) is found linked to a wide range of non-homologous domains in a variety of bacteria. It has been shown to be homologous to the adenyl cyclase catalytic domain (Pei and Grishin (2001) Proteins 42:210-216) and has diguanylate cyclase activity (Paul et al. (2004) Genes Dev. 18:715-727; Galperin et al. (2001) FEMS Microbiol. Lett. 203:11-21). This observation correlates with the functional information available on two GGDEF-containing proteins, namely diguanylate cyclase and phosphodiesterase A of Acetobacter xylinum, both of which regulate the turnover of cyclic diguanosine monophosphate. Assays to measure diguanylate cyclase activity are well known in the art (see, for example, Paul et al. (2004) Genes Dev. 18:715-727). Proteins with a GGDEF domain of the present invention include those set forth in SEQ ID NO:16.
The EAL domain (PFAM Accession No. PF00563) is found in diverse bacterial signaling proteins. It is called EAL for its conserved residues. The EAL domain is a good candidate for a diguanylate phosphodiesterase function (Galperin et al. (2001) FEMS Microbiol. Lett. 203:11-21). The domain contains many conserved acidic residues that could participate in metal binding and might form the phosphodiesterase active site. It often, but not always, occurs along with PAS and DUF9 domains that are also found in many signaling proteins. Assays to measure phosphodiesterase activity are well known in the art (see, for example, Ausmees et al. (2001) FEMS Microbiol. Lett. 204:163-167). Proteins with an EAL domain of the present invention include those set forth in SEQ ID NO:18.
Many bacterial transcription regulatory proteins bind DNA via a helix-turn-helix (HTH) motif. These proteins are very diverse, but for convenience may be grouped into subfamilies on the basis of sequence similarity (Dehoux and Cossart (1995) Mol. Microbiol. 15:591). The deoR family (PFAM Accession No. PF00455) groups together a range of proteins, including lacR, deoR, fucR and gutR. Within this family, the HTH motif is situated towards the N-terminus (Mortensen et al. (1989) EMBO J. 8:325-331; Rosey and Stewart (1992) J. Bacteriol. 174:6159-6170; Lu and Lin (1989) Nucleic Acids Res. 17:4883-4884). One other such family, marR, groups together a range of proteins, including emrR, hpcR, hpR, marR, pecS, petP, papX, prsX, ywaE, yxaD and yybA. The Mar proteins are involved in multiple antibiotic resistance, a non-specific resistance system. The expression of the mar operon is controlled by a repressor, MarR. A large number of compounds induce transcription of the mar operon. This is thought to be due to the compound binding to MarR, with the resulting complex preventing MarR from binding to the DNA. With the MarR repression lost, transcription of the operon proceeds (Sulavik et al. (1997) J. Bacteriol. 179:1857-1866). Assays to measure transcription factor activity are well known in the art (see, for example, Sulavik et al. (1997) J. Bacteriol. 179:1857-1866). Proteins with a bacterial regulatory protein, deoR domain of the present invention include those set forth in SEQ ID NO:40. Proteins in the marR family of the present invention include those set forth in SEQ ID NO:58.
The Patatin-like phospholipase family (PFAM Accession No. PF01734) consists of various patatin glycoproteins from the total soluble protein in potato tubers, with some members also found in vertebrates. Patatin is a storage protein but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids (Mignery et al. (1988) Gene 62:27-44). Proteins in the patatin-like phospholipase family of the present invention include those set forth in SEQ ID NO:44.
The band 7 protein (PFAM Accession No. PF01145) is an integral membrane protein which is thought to regulate cation conductance by interacting with other proteins of the junctional complex of the membrane skeleton. A variety of proteins belong to this family. These include the prohibitins, cytoplasmic anti-proliferative proteins and stomatin, an erythrocyte membrane protein. Bacterial HflC protein also belongs to this family. Structurally, these proteins consist of a short N-terminal domain which is followed by a transmembrane region and a variable size (from 170 to 350 residues) C-terminal domain. Proteins in the band 7 protein family of the present invention include those set forth in SEQ ID NO:50.
ABC transporters form a large family of proteins responsible for translocation of a variety of compounds across biological membranes. They are minimally composed of four domains, with two transmembrane domains (TMDs) (PFAM Accession PF00664) responsible for allocrite binding and transport and two nucleotide-binding domains (NBDs) (PFAM Accession PF00005) responsible for coupling the energy of ATP hydrolysis to conformational changes in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of hydrolysis at one NBD effectively abrogates hydrolysis at the other. The proteins belonging to this family also contain one or two copies of the ‘A’ consensus sequence (Walker et al. (1982) EMBO J. 1:945-951) or the ‘P-loop’ (Saraste et al. (1990) Trends Biochem Sci. 15:430-434). Methods for measuring ATP-binding and transport are well known in the art (see, for example, Hung et al. (1998) Nature 396:703-707; Higgins et al. (1990) J. Bioenerg. Biomembr. 22:571-592). ABC transporter proteins of the present invention include those set forth in SEQ ID NOS:60 and 82.
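The ‘A’ consensus sequence (P-loop) mentioned above lends itself to a simple sequence scan. The sketch below assumes the commonly cited form of the motif, G-x(4)-G-K-[S/T]; that pattern is not stated in this specification and is used here for illustration only, and the function name find_walker_a and the toy sequence are hypothetical.

```python
import re

# Assumption: the commonly cited Walker A / P-loop pattern G-x(4)-G-K-[ST].
# The lookahead allows overlapping candidate sites to be reported.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(protein):
    """Return (zero-based start, matched residues) for each Walker A-like
    site in a one-letter amino acid sequence."""
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence (hypothetical), not taken from the sequence listing.
    print(find_walker_a("MAAGPSGSGKSTLL"))
```

A pattern match of this kind only flags candidate nucleotide-binding sites; it does not replace a profile-based search against the PFAM model.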
Characterized members of the Multi Antimicrobial Extrusion (MATE) family (PFAM Accession No. PF01554) function as drug/sodium antiporters. These proteins mediate resistance to a wide range of cationic dyes, fluoroquinolones, aminoglycosides and other structurally diverse antibiotics and drugs. MATE proteins are found in bacteria, archaea and eukaryotes. These proteins are predicted to have 12 helical transmembrane regions; some of the animal proteins may have an additional C-terminal helix. Methods for measuring antibiotic and drug resistance are well known in the art (see, for example, Mitchell et al. (1998) Antimicrob. Agents Chemother. 42:475-477; Mitchell et al. (1999) J. Biol. Chem. 274:3541-3548). Multi Antimicrobial Extrusion (MATE) family proteins of the present invention include those set forth in SEQ ID NO:72.
Lantibiotic and non-lantibiotic bacteriocins are synthesized as precursor peptides containing N-terminal extensions (leader peptides), which are cleaved off during maturation. Most non-lantibiotics and also some lantibiotics have leader peptides of the so-called double-glycine type. These leader peptides share consensus sequences and also a common processing site with two conserved glycine residues in positions −1 and −2. The double-glycine-type leader peptides are unrelated to the N-terminal signal sequences, which direct proteins across the cytoplasmic membrane via the sec pathway. Various methods can be used to assay for bacteriocin activity including, for example, the experimental section herein, Ogunbanwo et al. (2003) Afr. J. Biotechnology 2: 219-227, Allison et al. (1994) J. Bacteriol. 176:2235-2241 and Van Loveren et al. (2000) Caries Research 34:481-485. Examples of amino acid sequences of the present invention that have double-glycine-type leader peptides include those set forth in SEQ ID NOS:74, 76, 84, 86, 90, 92, 96 and 114.
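The processing site described above, with glycine residues at positions −1 and −2, can be illustrated by splitting a precursor peptide after the first Gly-Gly pair near its N-terminus. This is a minimal sketch: the 30-residue limit on leader length is an assumption chosen for illustration, the function name split_double_glycine is hypothetical, and the sketch does not check the surrounding leader consensus that a real processing site would also carry.

```python
def split_double_glycine(precursor, max_leader=30):
    """Split a precursor after the first Gly-Gly pair.

    Returns (leader, mature) so that the two glycines occupy positions
    -2 and -1 relative to the cleavage site, or None when no Gly-Gly
    pair ends within the first max_leader residues (an assumed limit).
    """
    precursor = precursor.upper()
    site = precursor.find("GG")
    if site == -1 or site + 2 > max_leader:
        return None
    cut = site + 2  # cleavage falls immediately after the second glycine
    return precursor[:cut], precursor[cut:]


if __name__ == "__main__":
    # Toy precursor (hypothetical), not taken from the sequence listing.
    print(split_double_glycine("MKTLSEKELQNIVGGAAKKCCWW"))
```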
The processing sites of these peptides are different from typical signal peptidase cleavage sites, suggesting that a different processing enzyme is involved. Peptide bacteriocins are exported across the cytoplasmic membrane by a dedicated ATP-binding cassette (ABC) transporter. The ABC transporter is the maturation protease and its proteolytic domain resides in the N-terminal part of the protein (Havarstein et al. (1995) Mol. Microbiol. 16:229-240). This peptidase domain is found in a wide range of ABC transporters; however, the presumed catalytic cysteine and histidine are not conserved in all members of this family. Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Families are grouped by their catalytic type, the first character representing the catalytic type: S, serine; T, threonine; C, cysteine; A, aspartic; M, metallo and U, unknown. A clan that contains families of more than one type is described as being of type P. The serine, threonine and cysteine peptidases utilise the catalytic part of an amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic and metallopeptidases, the nucleophile is an activated water molecule.
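The naming convention described above, in which the first character of a family or clan identifier gives the catalytic type, can be expressed as a small lookup. The sketch below is illustrative only; the function name catalytic_type is hypothetical, and the identifiers passed to it are examples rather than an exhaustive list.

```python
# Catalytic-type letters as listed in the text above.
CATALYTIC_TYPES = {
    "S": "serine",
    "T": "threonine",
    "C": "cysteine",
    "A": "aspartic",
    "M": "metallo",
    "U": "unknown",
}


def catalytic_type(identifier):
    """Return the catalytic type implied by the first character of a
    peptidase family or clan identifier such as 'C39' or 'CA'."""
    code = identifier.strip().upper()[:1]
    if code == "P":
        # Type P applies to clans containing families of more than one type.
        return "mixed"
    if code not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognized catalytic type in {identifier!r}")
    return CATALYTIC_TYPES[code]


if __name__ == "__main__":
    print(catalytic_type("C39"))  # family C39, a cysteine peptidase family
    print(catalytic_type("CA"))   # clan CA
```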
Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. The peptidase domain is responsible for peptide bond hydrolysis; in Merops this is termed the peptidase unit. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad (Barrett and Rawlings (2001) Biol. Chem. 382:727-733). The peptidase C39 family (clan CA) (PFAM Accession No. PF03412) is found in a wide range of ABC transporters, which are maturation proteases for peptide bacteriocins, the proteolytic domain residing in the N-terminal region of the protein (Rawlings and Barrett (1995) Methods Enzymol. 248:183-228). Assays for measuring peptidase activity are well known in the art (see, for example, Havarstein et al. (1995) Mol. Microbiol. 16:229-240). Proteins of the present invention in the peptidase C39 family include those set forth in SEQ ID NO:82.
RelE and RelB form a toxin-antitoxin system. RelE represses translation, probably through binding ribosomes (Pedersen et al. (2002) Mol Microbiol 45:501-510 and Terry et al. (2001) J. Bacteriol 183:2700-2703). A polypeptide having a RelE and RelB domain is set forth in SEQ ID NO:52.
Viruses, parasites and bacteria are covered in protein and sugar molecules that help them gain entry into a host by counteracting the host's defences. One such molecule is the M protein produced by certain streptococcal bacteria. M proteins embody a motif that is now known to be shared by many Gram-positive bacterial surface proteins. The motif includes a conserved hexapeptide, which precedes a hydrophobic C-terminal membrane anchor, which itself precedes a cluster of basic residues. It has been proposed that this hexapeptide sequence is responsible for a post-translational modification necessary for the proper anchoring of the proteins which bear it, to the cell wall. A polypeptide having such a domain is found in SEQ ID NO:78.
The LytTr domain is found in a variety of bacterial transcriptional regulators. The domain binds to a specific DNA sequence pattern (see Nikolskya et al. (2002) Nucleic Acid Research 30:2453-459). The LytTr domain is a DNA-binding, potential winged helix-turn-helix domain (˜100 residues) present in a variety of bacterial transcriptional regulators of the algR/agrA/lytR family. It is named after the lytR response regulators involved in the regulation of cell autolysis. The LytTr domain binds to a specific DNA sequence pattern in the upstream regions of target genes. The N-terminal of the protein contains a response regulator receiver domain. The consensus sequence for this domain is in PFAM04397. A polypeptide having this domain is set forth in SEQ ID NO:32.
Members of the CAAX amino terminal protease family are probably proteases. The family contains CAAX prenyl protease. The proteins contain a highly conserved Glu-Glu motif at the amino end of the alignment. The alignment also contains two histidine residues that may be involved in zinc binding. This family consists of various hypothetical protein sequences for which the function is unknown. One of the proteins is an abortive infection protein that confers resistance to the bacteriophage Phi 712. AbiG is an abortive infection (Abi) mechanism encoded by the conjugative plasmid pCI750 originally isolated from Lactococcus lactis subsp. cremoris UC653. The resistance mechanism acts at neither the phage adsorption nor the phage DNA restriction level. Also in this family is a series of bacteriocin-like peptides PlnP, PlnI, PlnT, PlnP and PlnU from Lactobacillus plantarum C11. Lactobacillus plantarum C11 secretes a small cationic peptide, plantaricin A, that serves as an induction signal for bacteriocin production as well as transcription of plnABCD. The plnABCD operon encodes the plantaricin A precursor (PlnA) itself and determinants (PlnBCD) for a signal transducing pathway. The consensus sequence for this domain is in PFAM12517. Polypeptides having this domain are set forth in SEQ ID NOS:98 and 102.
Glycosyl hydrolases are key enzymes of carbohydrate metabolism. Family 31 comprises enzymes that are, or are similar to, alpha-galactosidases. O-Glycosyl hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site PUBMED:PUB00007032.
Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in ‘clans’. Glycoside hydrolase family 31 comprises enzymes with several known activities: α-glucosidase (EC:3.2.1.20), α-galactosidase (EC:3.2.1.22), glucoamylase (EC:3.2.1.3), sucrase-isomaltase (EC:3.2.1.48) (EC:3.2.1.10), α-xylosidase (EC:3.2.1), and α-glucan lyase (EC:4.2.2.13). Glycoside hydrolase family 31 groups a number of glycosyl hydrolases on the basis of sequence similarities PUBMED:1747104, PUBMED:1761061, PUBMED:1743281. An aspartic acid has been implicated PUBMED:1856189 in the catalytic activity of sucrase, isomaltase, and lysosomal α-glucosidase. The consensus sequence for this domain is in PFAM01055. A polypeptide having this domain is set forth in SEQ ID NO:116.
The mur ligase family, glutamate ligase domain contains a number of related ligase enzymes which have EC numbers 6.3.2. This family includes: MurC, MurD, MurE, MurF, Mpl and FolC. MurC, MurD, MurE and MurF catalyse consecutive steps in the synthesis of peptidoglycan. Peptidoglycan consists of a sheet of two sugar derivatives, with one of these, N-acetylmuramic acid, attaching to a small pentapeptide. The pentapeptide is made of L-alanine, D-glutamic acid, meso-diaminopimelic acid and D-alanyl alanine. The peptide moiety is synthesised by successively adding these amino acids to UDP-N-acetylmuramic acid. MurC transfers the L-alanine, MurD transfers the D-glutamate, MurE transfers the diaminopimelic acid, and MurF transfers the D-alanyl alanine. This family also includes folylpolyglutamate synthase that transfers glutamate to folylpolyglutamate. Proteins containing this domain include a number of related ligase enzymes that catalyse consecutive steps in the synthesis of peptidoglycan. Proteins also include folylpolyglutamate synthase that transfers glutamate to folylpolyglutamate and cyanophycin synthetase that catalyses the biosynthesis of the cyanobacterial reserve material multi-L-arginyl-poly-L-aspartate (cyanophycin). The C-terminal domain is almost always associated with the cytoplasmic peptidoglycan synthetases, N-terminal domain. The consensus sequence for this domain is in PFAM02875. A polypeptide having this domain is set forth in SEQ ID NO:118.
ATP-binding cassette (ABC) transporters are multidomain membrane proteins, responsible for the controlled efflux and influx of substances (allocrites) across cellular membranes. They are minimally composed of four domains, with two transmembrane domains (TMDs) responsible for allocrite binding and transport and two nucleotide-binding domains (NBDs) responsible for coupling the energy of ATP hydrolysis to conformational changes in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of hydrolysis at one NBD effectively abrogates hydrolysis at the other. Hydrolysis at the two NBDs may occur in an alternating fashion, although they appear substantially functionally symmetrical in terms of their binding to diverse nucleotides. A number of bacterial transport systems have been found to contain integral membrane components that have similar sequences; these systems fit the characteristics of ATP-binding cassette transporters. The proteins form homo- or hetero-oligomeric channels, allowing ATP-mediated transport. Hydropathy analysis of the proteins has revealed the presence of 6 possible transmembrane regions. These proteins belong to family 2 of ABC transporters. The consensus sequence for this domain is in PFAM01061. Polypeptides having this domain are set forth in SEQ ID NOS:120 and 122.
ATP-binding cassette (ABC) transporters are multidomain membrane proteins, responsible for the controlled efflux and influx of substances (allocrites) across cellular membranes. They are minimally composed of four domains, with two transmembrane domains (TMDs) responsible for allocrite binding and transport and two nucleotide-binding domains (NBDs) responsible for coupling the energy of ATP hydrolysis to conformational changes in the TMDs. Both NBDs are capable of ATP hydrolysis, and inhibition of hydrolysis at one NBD effectively abrogates hydrolysis at the other. Hydrolysis at the two NBDs may occur in an alternating fashion, although they appear substantially functionally symmetrical in terms of their binding to diverse nucleotides. A variety of ATP-binding transport proteins have a six transmembrane helical region. They are all integral membrane proteins involved in a variety of transport systems. Members of this family include the cystic fibrosis transmembrane conductance regulator (CFTR), bacterial leukotoxin secretion ATP-binding protein, multidrug resistance proteins, the yeast leptomycin B resistance protein, the mammalian sulphonylurea receptor and antigen peptide transporter 2. Many of these proteins have two such regions. The consensus sequence for this domain is in PFAM00664. Polypeptides having this domain are set forth in SEQ ID NOS:120 and 122.
The GTPase of unknown function family is a member of the G-protein superfamily clan. This clan includes the following Pfam members: NOG1; MMR_HSR1; IIGP; GTP_EFTU; GTP_CDC; Dynamin_N; DUF258; Arf; and AIG1. Human HSR1 has been localized to the human MHC class I region and is highly homologous to a putative GTP-binding protein, MMR1, from mouse. These proteins represent a new subfamily of GTP-binding proteins that has both prokaryote and eukaryote members. The consensus sequence for this domain is in PFAM01926. A polypeptide having this domain is set forth in SEQ ID NO:154.
Proteins containing the ParB-like nuclease domain appear to be related to the Escherichia coli plasmid protein ParB, which preferentially cleaves single-stranded DNA. ParB also nicks supercoiled plasmid DNA, preferably at sites with potential single-stranded character, such as AT-rich regions and sequences that can form cruciform structures. ParB also exhibits 5′ to 3′ exonuclease activity. The consensus sequence for this domain is in PFAM02195. Polypeptides having this domain are set forth in SEQ ID NOS:158 and 162.
The CobQ/CobB/MinD/ParA nucleotide binding domain family consists of various cobyrinic acid a,c-diamide synthases. These include CbiA and CbiP from Salmonella typhimurium (Pollich et al. (1995) J. Bacteriol. 177:1487-4487) and CobQ from Rhodobacter capsulatus (Roth et al. (1993) J. Bacteriol. 175:3303-3316). These amidases catalyse amidations to various side chains of hydrogenobyrinic acid or cobyrinic acid a,c-diamide in the biosynthesis of cobalamin (vitamin B12) from uroporphyrinogen III. Vitamin B12 is an important cofactor and an essential nutrient for many plants and animals and is primarily produced by bacteria (Pollich et al. (1995) J. Bacteriol. 177:1487-4487). The family also contains dethiobiotin synthetases as well as the plasmid partitioning proteins of the MinD/ParA family (Raux et al. (1998) Biochem. J. 335:159-166). The consensus sequence for this domain is in PFAM01656. A polypeptide having this domain is set forth in SEQ ID NO:160.
The glucose inhibited division protein family is a family of bacterial glucose inhibited division proteins that are probably involved in the regulation of cell division. This family is a member of the Methyltransferase superfamily clan. This clan includes the following Pfam members: CheR; CMAS; Cons_hypoth95; DNA_methylase; DOT1; Eco57I; Fibrillarin; FtsJ; GidB; MethyltransfD12; Methyltransf_10; Methyltransf_2; Methyltransf_3; Methyltransf_4; Methyltransf_5; Methyltransf_8; Methyltransf_9; Met_10; Mg-por_mtran_C; MT-A70; MTS; N6_Mtase; N6_N4_Mtase; NNMT_PNMT_TEMT; NodS; Nol1_Nop2_Fmu; PARP_regulatory; PCMT; PrmA; RrnaAD; rRNA_methylase; Spermine_synth; TehB; TPMT; TRM; tRNA_U5-meth_tr; Ubie_methyltran; UPF0020. GidB (glucose-inhibited division protein B) appears to be present in a single copy in all complete eubacterial genomes so far. Its mode of action is unknown, but a methyltransferase fold is reported from the crystal structure. A polypeptide having this domain is set forth in SEQ ID NO:164.
Many two-component response systems are known in bacteria, including, but not limited to, the Arc two-component signal transduction system of E. coli, which regulates numerous operons in response to respiratory growth conditions (see, for example, Kwon et al. (2000) J. Bacteriol. 182:2960-2966); PhoQ/PhoP, which responds to changes in environmental levels of Mg2+ (see, for example, Marina et al. (2001) J. Biol. Chem. 276:41182-41190); PmrAB, which modulates resistance to cationic antimicrobial peptides (see, for example, Moskowitz et al. (2004) J. Bact. 186:575-579); EnvZ/OmpR, which responds to changes in osmotic conditions (see, for example, Cai and Inouye (2002) J. Biol. Chem. 277:24155-24161); NarX/NarL, which responds to nitrite levels (see, for example, Stewart (1994) Antonie Van Leeuwenhoek 66:37-45); PhoR/PhoB, which responds to low phosphate concentrations in the environment and periplasmic space (see, for example, Pragai et al. (2004) J. Bacteriol. 186:1182-1190); covRS, which regulates expression of fructosyltransferase (see, for example, Lee et al. (2004) Infect. Immun. 72:3968-3973); and RegB/RegA, which is a highly conserved redox-responding global two-component regulatory system from Rhodobacter capsulatus and Rhodobacter sphaeroides (see, for example, Elsen et al. (2004) Microbiol. Mol. Biol. Rev. 68:263-279).
The two-component regulatory system proteins of the present invention are useful in regulating the response of an organism to various environmental conditions. Methods are provided wherein properties of microbes used in fermentation are modified to provide bacterial strains able to survive stressful conditions, such as acid or alkaline stress, osmotic or oxidative stress, starvation, or the presence of other microorganisms (see, for example, Wick and Egli (2004) Adv. Biochem. Eng. Biotechnol. 89:1-45). This ability to survive stressful environmental conditions will increase the utility of these microorganisms in fermenting various foods, as well as allowing them to provide longer-lasting probiotic activity after ingestion. One way this may occur is by enhancing the ability of an organism to survive passage through the gastrointestinal tract. In general, the methods comprise overexpressing one or more proteins controlled by two-component sensing and regulatory systems. In one embodiment, the protein is a bacteriocin. By “overexpressing” is meant that the protein of interest is produced in an increased amount in the modified bacterium compared to its production in a wild-type bacterium.
The proteins and nucleic acid sequences encoding them may increase the ability of a microorganism to survive in the presence of an antimicrobial (see, for example, Moskowitz et al. (2004) J. Bact. 186:575-579). They may also enable a microorganism to form a biofilm (see, for example, Danhorn et al. (2004) J. Bacteriol. 186:4492-4501).
The proteins and nucleic acid sequences encoding them may enable an organism to respond to an environmental stimulus, including, but not limited to, turgor pressure, a chemical stimulus, heavy-metal cations, oxygen, iron, an antimicrobial, and glucose.
The following examples are offered by way of illustration and not by way of limitation.
The complete genome of Lactobacillus acidophilus NCFM consists of 1,993,570 nucleotides with an average GC content of 34.71%. In silico analyses revealed the presence of 1864 open reading frames (ORFs) resulting in a coding percentage of 87.9%. One or more protein families (PFam) were attributed to 75% of these ORFs and 89% showed similarities to at least one COG (cluster of orthologous groups of proteins). As a result of the manual annotation curation, only 11.7% of the ORFs remained unknown and 15.8% showed similarities to unclassified genes of other organisms. Of the predicted ORFs, 72.5% were assigned to a defined function. Sequences from the genome of Lactobacillus acidophilus NCFM have been described in U.S. Provisional Patent Application No. 60/465,621 filed on Apr. 23, 2003, U.S. Provisional Patent Application No. 60/480,764 filed on Jun. 23, 2003, U.S. Provisional Patent Application No. 60/546,745 filed on Feb. 23, 2004, U.S. Provisional Patent Application No. 60/551,121 filed on Mar. 8, 2004, U.S. Provisional Patent Application No. 60/551,161 filed on Mar. 8, 2004 and U.S. Provisional Patent Application No. 60/662,712 filed on Oct. 27, 2004, and U.S. patent application Ser. No. 10/831,070 filed on Apr. 23, 2004, U.S. patent application Ser. No. 10/873,467 filed on Jun. 22, 2004, U.S. patent application Ser. No. 11/074,176 filed on Mar. 7, 2005 and U.S. patent application Ser. No. 11/074,226 filed on Mar. 7, 2005, the disclosures of which are incorporated herein by reference in their entireties.
The Origin of Replication was predicted by GC-skew analysis and the ORF orientation shift. Directly adjacent to this locus, a gene showing significant similarities to dnaA was identified. Further analyses revealed the presence of a highly conserved gene arrangement (rnpA, ORF La1978; rpmH, ORF La1979; dnaA, ORF La1; dnaN, ORF La2; recF, ORF La4; and gyrB, ORF La5) which can be found in a wide range of other prokaryotes, including Bacillus subtilis, Escherichia coli, and Synechococcus (Liu and Tsinoremas (1996) Gene 172:105-109; Ogasawara et al. (1985) EMBO J. 4:3345-3350). To initiate chromosome replication, DnaA requires the presence of several DnaA-boxes (Fujikawa et al. (2003) Nucleic Acids Res. 31:2077-2086). Seven DnaA-boxes with a length of 8 nucleotides were determined directly upstream of dnaA, whereas only one was identified downstream of dnaA. Accordingly, this region was designated oriC and most likely represents the DNA replication initiation locus. Subsequently, the genome sequence was rotated so that it starts 30 nucleotides upstream of dnaA. The Terminus of DNA replication was identified similarly by GC-skew and ORF orientation shift analysis. The exact position could not be determined, since no replication terminator protein could be identified (Griffiths et al. (1998) J. Bacteriol. 180:3360-3367). However, a chromosome segregation helicase (ORF La1077) and DnaD (ORF La1161) were identified at the proposed Terminus locus. In addition, a genome region of ~300 kilobase pairs with the predicted Terminus in its center showed a significantly lower average GC content. This lower GC content could aid in the separation of the chromosomal strands. The Origin and the Terminus of DNA replication are placed fairly symmetrically in the genome.
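The GC-skew analysis mentioned above can be illustrated with a minimal Python sketch. The source does not state which software, window size, or skew formula was used, so the function name `gc_skew_extremes`, the window length, and the per-window formula (G - C)/(G + C) are assumptions made for illustration only. A commonly used heuristic is that the cumulative skew reaches its minimum near the origin and its maximum near the terminus of replication.

```python
def gc_skew_extremes(seq, window=1000):
    """Return (min_pos, max_pos) of the cumulative GC skew.

    Hypothetical helper: the skew of each non-overlapping window is
    (G - C) / (G + C); the running sum is tracked and the end
    coordinates of the windows where it is lowest and highest are
    returned. A position of 0 means the extreme lies at the start
    of the sequence as given.
    """
    seq = seq.upper()
    cumulative = 0.0
    min_val = max_val = 0.0
    min_pos = max_pos = 0
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without G or C contribute no skew.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        cumulative += skew
        end = start + window
        if cumulative < min_val:
            min_val, min_pos = cumulative, end
        if cumulative > max_val:
            max_val, max_pos = cumulative, end
    return min_pos, max_pos


# Toy sequence: G-rich, then C-rich, then G-rich again.
toy = "G" * 30 + "C" * 50 + "G" * 20
print(gc_skew_extremes(toy, window=10))
```

On a real genome the two coordinates would only be candidate loci; as described above, the prediction was corroborated by the ORF orientation shift and the position of dnaA.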
Sixty-one tRNAs were identified within the genome. Only 8 tRNAs were located on the lagging strand, mostly clustered around an rRNA locus. tRNAs for all 21 amino acids were found with redundant tRNAs for all amino acids except cysteine and tryptophan. Ribosomal proteins were mainly assembled around one locus at 260 kilobase pairs. Four ribosomal RNA loci were identified throughout the genome. Three of them were clustered within the first 500 kilobase pairs and oriented in the same sense-direction, whereas the fourth rRNA locus, located at ~1.6 megabases, is oriented in the opposite direction. Thus, all rRNA loci were in phase with the direction of DNA replication.
The COG database classifies paralogous proteins of at least three lineages into functionally related groups. Three major sections are currently described and a fourth section includes proteins with poorly characterized functions. The graphical representation of the COG distribution shows that the majority of predicted proteins (64.4%) could be classified into the three functional classes and only 19% were assigned to the “poorly characterized” group. However, 6.6% of COGs could not be assigned into any classification, designated here as COG category 5. Of those, five genome regions stand out, due to their visual dominance (COG-I to COG-V). Functional annotation revealed that all of the genes present in these COG category 5 regions I to V were predicted to be involved in cell-adherence and initial host-cell recognition (i.e., ORF La1016-ORF La1020, ORF La1377, ORF La1392: mucus binding proteins; ORF La1606-ORF La1612: fibronectin binding proteins; and ORF La1633-ORF La1636: surface bound proteins). Further analyses of other organisms might lead to a separate COG group within the extracellular structures (functional category W) to reflect this set of proteins and their common function.
Analysis of the GC-content distribution showed localized peak deviations from the average GC content of the genome. Without exception, GC-content spikes were found to harbor the four rRNA loci (average GC content of 50.88%), whereas the two neighboring low GC-regions at 1.75 megabases (average GC content of 28.5%) revealed the presence of a large uncharacterized region unique to Lactobacillus acidophilus NCFM and an EPS cluster. The EPS cluster consisted of fourteen genes including the highly conserved proteins EpsA-EpsF (ORF La1732-ORF La1737), EpsJ (ORF La1725 and ORF La1726), and EpsI (ORF La1724) and five variable proteins (ORF La1727-ORF La1731) representing glycosyl transferases and polysaccharide polymerases. Together, this set shows high synteny to reported exopolysaccharide (EPS) clusters in streptococci (Stingele et al. (1996) J. Bacteriol. 178:1680-1690) and recently reported in L. gasseri and L. johnsonii (Pridmore et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 101:2512-2517). Scanning electron microscopy of NCFM did not detect an external polysaccharide layer (Hood and Zottola (1987) J. Food Sci. 52:791), and it remains unclear whether the EPS cluster is functional or if any EPS produced is excreted rather than anchored. Three ORFs in the NCFM EPS cluster encode two UDP-galactopyranose mutases and a membrane protein involved with the export of O-antigen and teichoic acid. Other teichoic acid associated ORFs include a tandem set of teichoic acid biosynthesis and transport proteins (ORF La524 and ORF La525), another predicted biosynthetic protein (ORF La519), two more polysaccharide transporters specific to O-antigen and teichoic acid (ORF La1614 and ORF La1917), along with a cell wall teichoic acid glycosylation protein (ORF La621). An exaggerated inflammatory response from intestinal epithelial cells to gram-negative bacteria can be tempered by teichoic acids from lactobacilli (Vidal et al. (2002) Infect. Immun. 70:2057-2064), suggesting an intimate involvement of teichoic acids and the immune system. The uncharacterized low GC regions and the EPS cluster are centered on two divergently oriented transposases (ORF La1722, ORF La1721, and ORF La1720). The exceptionally low GC content and the presence of mobile elements could indicate the acquisition of this region via horizontal gene transfer.
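Localized deviations from the genome-wide GC content, such as the rRNA spikes and the low-GC region described above, can be located with a simple windowed scan. The following Python sketch is illustrative only: the function names, the non-overlapping window scheme, and the threshold parameter are assumptions, since the source does not describe how the GC-content distribution was computed.

```python
def gc_content(seq):
    """Percentage of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_outlier_windows(seq, window, threshold):
    """List (start, gc_percent) for windows deviating from the mean.

    Hypothetical helper: a non-overlapping window is reported when its
    GC content differs from the whole-sequence GC content by at least
    `threshold` percentage points.
    """
    mean = gc_content(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_content(seq[start:start + window])
        if abs(gc - mean) >= threshold:
            hits.append((start, gc))
    return hits


# Toy sequence: an AT-only background with one short GC-only island.
toy = "AT" * 20 + "GC" * 5 + "AT" * 20
print(gc_outlier_windows(toy, window=10, threshold=50.0))
```

A window reported by such a scan is only a starting point; in the analysis above, the flagged regions were then interpreted from their gene content (rRNA loci, the EPS cluster, and transposases).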
The NCFM genomic DNA sequence was analyzed for repetitive DNA by a “repeat and match analysis.” One intergenic region between ORF La1550 (DNA polymerase I, polA) and ORF La1551 (putative phosphoribosylamine-glycine ligase, purD) had features characteristic of a SPIDR (SPacers Interspersed Direct Repeats) locus. This region was approximately 2.4 kilobases long and contained 32 nearly perfect repeats of 29 base pairs separated by unique 32 base pair spacers. The SPIDR locus constitutes a novel family of repeat sequences that are present in Bacteria and Archaea but not in Eukarya (Jansen et al. (2002) OMICS 6:23-33). The repeat loci typically consist of repetitive stretches of nucleotides with a length of 25 to 37 base pairs alternated by nonrepetitive DNA spacers of approximately equal size as the repeats. To date, SPIDR loci have been identified in more than forty microorganisms (Jansen et al. (2002) OMICS 6:23-33), but from the lactic acid bacteria, have only been described from Streptococcus spp. Despite their discovery over 15 years ago in E. coli (Ishino et al. (1987) J. Bacteriol. 169:5429-5433), no physiological function has yet been elucidated.
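The repeat-and-spacer architecture described above (direct repeats of fixed length separated by unique spacers of fixed length) can be sketched in Python as follows. This is not the “repeat and match analysis” used in the source, whose algorithm is not described; the function name and parameters are hypothetical, and the sketch requires exact repeat copies, whereas the NCFM repeats are described as only nearly perfect.

```python
def find_repeat_arrays(seq, repeat_len, spacer_len, min_repeats=3):
    """Find arrays of exact direct repeats separated by fixed-length spacers.

    Hypothetical helper: returns a list of (start, repeat_count, repeat_unit)
    tuples. Spacer content is not compared, so spacers may be unique.
    """
    period = repeat_len + spacer_len
    arrays = []
    i = 0
    while i + repeat_len <= len(seq):
        unit = seq[i:i + repeat_len]
        count = 1
        j = i + period
        # Count further copies of the unit at every repeat + spacer period.
        while seq[j:j + repeat_len] == unit:
            count += 1
            j += period
        if count >= min_repeats:
            arrays.append((i, count, unit))
            i = j - spacer_len  # resume just after the last repeat copy
        else:
            i += 1
    return arrays


# Toy locus: four copies of a 6-base repeat with distinct 4-base spacers.
repeat = "ACGTAC"
toy = "AA" + repeat + "TTTT" + repeat + "GGGA" + repeat + "CCAT" + repeat + "TG"
print(find_repeat_arrays(toy, repeat_len=6, spacer_len=4))
```

For the locus described above, the corresponding parameters would be a repeat length of 29 and a spacer length of 32, with mismatches tolerated in the repeat comparison.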
A Gapped BlastP sequence alignment showed that SEQ ID NO:2 (238 amino acids) has about 83% identity from amino acids 1-237 with a protein from Lactobacillus johnsonii that is a two-component regulatory system response regulator (Accession No. NP_964081), about 83% identity from amino acids 1-237 with a protein from Lactobacillus gasseri that is a response regulator consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00046798), about 71% identity from amino acids 1-237 with a protein from Lactobacillus sakei that is a putative response regulator (Accession No. AAD10263), about 72% identity from amino acids 3-237 with a protein from Enterococcus faecalis that is a DNA-binding response regulator VicR (Accession No. NP_814922), and about 72% identity from amino acids 3-237 with a protein from Enterococcus faecalis that is a response regulator VicR (Accession No. CAB64972).
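The percent identity values reported in these alignments can be understood through a minimal Python sketch that counts identical residues over an aligned region. The function name and the example strings are hypothetical, and the convention assumed here (identical positions divided by the alignment length, with gap columns counted in the length) is one common way such values are reported; the source does not spell out the calculation.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent identity between two gapped, equal-length alignment rows.

    Hypothetical helper: a column counts as identical when both rows
    carry the same residue; gap characters ('-') never count as matches
    but do count toward the alignment length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignment: 6 identical columns out of 8.
print(percent_identity("MKIL-VDD", "MKILAVED"))
```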
A Gapped BlastP sequence alignment showed that SEQ ID NO:4 (618 amino acids) has about 67% identity from amino acids 7-618 with a protein from Lactobacillus johnsonii that is a two-component regulatory system histidine kinase (Accession No. NP_964082), about 66% identity from amino acids 7-618 with a protein from Lactobacillus gasseri that is a signal transduction histidine kinase (Accession No. ZP_00046799), about 54% identity from amino acids 2-618 with a protein from Lactobacillus sakei that is a putative histidine kinase (Accession No. AAD10264), about 52% identity from amino acids 4-617 with a protein from Lactobacillus plantarum that is a histidine kinase sensor protein (Accession No. NP_783897), and about 47% identity from amino acids 12-616 with a protein from Enterococcus faecalis that is a sensory box histidine kinase VicK (Accession No. NP_814923).
A Gapped BlastP sequence alignment showed that SEQ ID NO:6 (150 amino acids) has about 79% identity from amino acids 1-150 with a hypothetical protein LJ0247 from Lactobacillus johnsonii (Accession No. NP_964263), about 70% identity from amino acids 45-150 with a protein from Lactobacillus gasseri that is a response regulator of the LytR/AlgR family (Accession No. ZP_00046165), about 37% identity from amino acids 4-150 with a protein from Streptococcus mutans that is a putative transcriptional regulator (Accession No. NP_720879), about 40% identity from amino acids 18-150 with a protein from Oenococcus oeni that is a response regulator of the LytR/AlgR family (Accession No. ZP_00069670), and about 31% identity from amino acids 1-148 with a protein from Leuconostoc mesenteroides that is a response regulator of the LytR/AlgR family (Accession No. ZP_00063955).
A Gapped BlastP sequence alignment showed that SEQ ID NO:8 (426 amino acids) has about 32% identity from amino acids 20-425 with a protein from Lactobacillus johnsonii that is a lactacin F two-component system histidine kinase (Accession No. NP_964617), about 32% identity from amino acids 10-425 with a protein from Lactobacillus salivarius that is AbpK (Accession No. AAM61782), about 27% identity from amino acids 22-426 with a protein from Lactobacillus johnsonii that is a two-component system histidine kinase (Accession No. NP_964473), about 34% identity from amino acids 141-426 with a protein from Carnobacterium piscicola that is a putative histidine kinase PisK (Accession No. AAK69421), and about 31% identity from amino acids 132-426 with a protein from Lactobacillus sakei that is a histidine kinase homolog SapK (Accession No. CAA86944).
A Gapped BlastP sequence alignment showed that SEQ ID NO:10 (265 amino acids) has about 41% identity from amino acids 2-259 with a protein from Lactobacillus salivarius that is AbpR (Accession No. AAM61783), about 40% identity from amino acids 1-256 with a protein from Lactobacillus johnsonii that is a lactacin F two-component system response regulator (Accession No. NP_964619), about 32% identity from amino acids 3-242 with a protein from Lactobacillus johnsonii that is a two-component system response regulator (Accession No. NP_964474), about 29% identity from amino acids 1-250 with a protein from Lactobacillus sakei that is a sakacin A production response regulator SapR (Accession No. CAA86945), and about 29% identity from amino acids 1-246 with a protein from Carnobacterium piscicola that is a response regulator (Accession No. AAB81306).
A Gapped BlastP sequence alignment showed that SEQ ID NO:12 (240 amino acids) has about 73% identity from amino acids 3-239 with a protein from Lactobacillus gasseri that is a response regulator consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00046225), about 63% identity from amino acids 3-239 with a protein from Lactobacillus sakei that is a putative response regulator (Accession No. AAD10267), about 63% identity from amino acids 3-236 with a protein from Enterococcus faecium that is a response regulator consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00036862), about 62% identity from amino acids 3-240 with a protein from Lactobacillus plantarum that is a response regulator (Accession No. NP_785945), and about 62% identity from amino acids 3-236 with a protein from Enterococcus faecalis that is a DNA-binding response regulator (Accession No. NP_814983).
A Gapped BlastP sequence alignment showed that SEQ ID NO:14 (483 amino acids) has about 61% identity from amino acids 1-482 with a protein from Lactobacillus johnsonii that is a two-component system histidine kinase (Accession No. NP_964774), about 61% identity from amino acids 8-482 with a protein from Lactobacillus gasseri that is a signal transduction histidine kinase (Accession No. ZP_00046226), about 45% identity from amino acids 1-475 with a protein from Lactobacillus sakei that is a putative histidine kinase (Accession No. AAD10268), about 44% identity from amino acids 1-479 with a protein from Lactobacillus plantarum that is a histidine kinase sensor protein (Accession No. CAD64795), and about 41% identity from amino acids 1-474 with a protein from Enterococcus faecalis that is a sensor histidine kinase (Accession No. NP_814984).
A Gapped BlastP sequence alignment showed that SEQ ID NO:16 (367 amino acids) has about 27% identity from amino acids 10-363 with a protein from Oenococcus oeni that is a COG2199: FOG: GGDEF domain (Accession No. ZP_00069778), about 32% identity from amino acids 114-366 with a protein from Listeria monocytogenes that is similar to unknown proteins (hypothetical sensory transduction histidine kinase) (Accession No. NP_465435), about 30% identity from amino acids 114-366 with a protein from Listeria innocua that is a hypothetical sensory transduction histidine kinase (Accession No. NP_471359), about 38% identity from amino acids 200-366 with a protein from Leuconostoc mesenteroides that is a COG2199: FOG: GGDEF domain (Accession No. ZP_00062660), and about 33% identity with a protein from Vibrio vulnificus that is a GGDEF family protein (Accession No. NP_936516).
A Gapped BlastP sequence alignment showed that SEQ ID NO:18 (236 amino acids) has about 33% identity from amino acids 12-228 with a protein from Leuconostoc mesenteroides that is a COG2200: FOG: EAL domain (Accession No. ZP_00062661), about 33% identity from amino acids 12-223 with a protein from Leuconostoc mesenteroides that is a COG2200: FOG: EAL domain (Accession No. ZP_00062662), about 28% identity from amino acids 12-224 with a protein from Lactococcus lactis that is a hypothetical protein (Accession No. CAA04442), about 26% identity from amino acids 6-228 with a protein from Listeria monocytogenes that is lmo0111 (Accession No. NP_463644), and about 26% identity from amino acids 8-228 with a protein from Listeria innocua that is lin0158 (Accession No. NP_469503).
A Gapped BlastP sequence alignment showed that SEQ ID NO:20 (427 amino acids) has about 59% identity from amino acids 4-427 with a protein from Lactobacillus gasseri that is a signal transduction histidine kinase (Accession No. ZP_00046476), about 62% identity from amino acids 38-427 with a protein from Lactobacillus johnsonii that is a two-component system histidine kinase (Accession No. NP_965390), about 37% identity from amino acids 4-421 with an unknown protein from Streptococcus agalactiae (Accession No. NP_735834), about 37% identity from amino acids 4-421 with a protein from Streptococcus agalactiae that is a sensor histidine kinase (Accession No. NP_688325), and about 37% identity from amino acids 4-423 with a protein from Streptococcus mutans that is a putative histidine kinase (Accession No. NP_721328).
A Gapped BlastP sequence alignment showed that SEQ ID NO:22 (221 amino acids) has about 77% identity from amino acids 1-220 with a protein from Lactobacillus johnsonii that is a two-component response regulator (Accession No. NP_965391), about 77% identity from amino acids 1-220 with proteins from Lactobacillus gasseri that are response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00046475), about 59% identity from amino acids 1-221 with a protein from Streptococcus pyogenes that is a putative two-component response regulator (Accession No. NP_269073), about 57% identity from amino acids 1-221 with an unknown protein from Streptococcus agalactiae (Accession No. NP_735835), and about 58% identity from amino acids 1-221 with a protein from Streptococcus agalactiae that is a DNA-binding response regulator (Accession No. NP_688326).
A Gapped BlastP sequence alignment showed that SEQ ID NO:24 (525 amino acids) has about 53% identity from amino acids 1-502 with a protein from Lactobacillus johnsonii that is a two-component system histidine kinase (Accession No. NP_965436), about 55% identity from amino acids 100-509 with a protein from Lactobacillus gasseri that is a signal transduction histidine kinase (Accession No. ZP_00047348), about 39% identity from amino acids 52-518 with a protein from Lactobacillus plantarum that is a histidine protein kinase sensor protein (Accession No. NP_785147), about 39% identity from amino acids 12-500 with a protein from Enterococcus faecalis that is a sensor histidine kinase (Accession No. NP_814784), and about 47% identity from amino acids 211-507 with a protein from Leuconostoc mesenteroides that is a signal transduction histidine kinase (Accession No. ZP_00063323).
A Gapped BlastP sequence alignment showed that SEQ ID NO:26 (238 amino acids) has about 74% identity from amino acids 1-237 with a protein from Lactobacillus johnsonii that is a two-component regulatory system response regulator (Accession No. NP_965437), about 61% identity from amino acids 1-237 with a protein from Lactobacillus gasseri that is a response regulator consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00047347), about 63% identity from amino acids 1-231 with a protein from Lactobacillus plantarum that is a response regulator (Accession No. NP_785146), about 61% identity from amino acids 1-231 with a protein from Enterococcus faecalis that is a DNA-binding response regulator (Accession No. 814783), and about 59% identity from amino acids 1-231 with a protein from Listeria innocua that is a two-component response regulator (Accession No. NP_470750).
A Gapped BlastP sequence alignment showed that SEQ ID NO:28 (247 amino acids) has about 73% identity from amino acids 21-247 with a protein from Lactobacillus johnsonii that is a two-component system response regulator (Accession No. NP_964988), about 48% identity from amino acids 21-241 with a protein from Clostridium tetani that is a transcriptional regulatory protein (Accession No. NP_781768), about 47% identity from amino acids 22-245 with a protein from Lactobacillus plantarum that is a response regulator (Accession No. NP_784099), about 48% identity from amino acids 21-247 with proteins from Thermoanaerobacter tengcongensis that are response regulators consisting of a CheY-like receiver domain and a HTH DNA-binding domain (Accession No. NP_622667), and about 46% identity from amino acids 21-241 with a protein from Clostridium acetobutylicum that is a response regulator (Accession No. NP_348326).
A Gapped BlastP sequence alignment showed that SEQ ID NO:30 (441 amino acids) has about 49% identity from amino acids 2-439 with a protein from Lactobacillus johnsonii that is a two-component system histidine kinase (Accession No. NP_964989), about 32% identity from amino acids 3-434 with a protein from Lactococcus lactis that is a sensor protein kinase (Accession No. NP_267160), about 32% identity from amino acids 2-434 with a protein from Lactococcus lactis that is a histidine kinase (Accession No. AAC45387), about 31% identity from amino acids 3-438 with a protein from Oenococcus oeni that is a signal transduction histidine kinase (Accession No. ZP_00069020), and about 36% identity from amino acids 79-437 with a protein from Lactobacillus plantarum that is a histidine protein kinase sensor protein (Accession No. NP_784098).
A Gapped BlastP sequence alignment showed that SEQ ID NO:32 (274 amino acids) has about 55% identity from amino acids 11-274 with a protein from Lactobacillus johnsonii that is a lactacin F two-component system response regulator (Accession No. NP_964619), about 38% identity from amino acids 9-268 with a protein from Lactobacillus salivarius that is AbpR (Accession No. AAM61783), about 32% identity from amino acids 9-262 with a protein from Lactobacillus johnsonii that is a two-component system response regulator (Accession No. NP_964474), about 47% identity from amino acids 140-274 with a protein from Lactobacillus johnsonii that is a lactacin F two-component system response regulator (Accession No. 964627), and about 33% identity from amino acids 12-262 with a protein from Lactobacillus sakei that is a response regulator (Accession No. CAA86945).
A Gapped BlastP sequence alignment showed that SEQ ID NO:34 (440 amino acids) has about 39% identity from amino acids 2-435 with a protein from Lactobacillus johnsonii that is a lactacin F two-component system histidine kinase (Accession No. NP_964617), about 31% identity from amino acids 58-431 with a protein from Lactobacillus salivarius that is AbpK (Accession No. AAM61782), about 31% identity from amino acids 73-431 with a protein from Lactobacillus johnsonii that is a two-component histidine kinase (Accession No. NP_964473), about 24% identity from amino acids 59-418 with a protein from Carnobacterium piscicola that is a histidine protein kinase (Accession No. AAB81305), and about 25% identity from amino acids 59-412 with a protein from Carnobacterium piscicola that is a histidine kinase CbaK (Accession No. AAF18146).
A Gapped BlastP sequence alignment showed that SEQ ID NO:36 (381 amino acids) has about 63% identity from amino acids 1-381 with a protein from Lactobacillus gasseri that is a signal transduction histidine kinase (Accession No. ZP_00046636), about 63% identity from amino acids 1-381 with a protein from Lactobacillus johnsonii that is a two-component system histidine kinase (Accession No. NP_965691), about 52% identity from amino acids 4-375 with a protein from Lactobacillus sakei that is a putative histidine kinase (Accession No. AAD10266), about 53% identity from amino acids 6-375 with a protein from Lactobacillus plantarum that is a histidine kinase sensor protein (Accession No. NP_786468), and about 52% identity from amino acids 2-379 with a protein from Enterococcus faecium that is a signal transduction histidine kinase (Accession No. ZP_00036366).
A Gapped BlastP sequence alignment showed that SEQ ID NO:38 (228 amino acids) has about 89% identity from amino acids 1-228 with proteins from Lactobacillus gasseri that are response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00046635), about 85% identity from amino acids 1-227 with a protein from Lactobacillus sakei that is a putative response regulator (Accession No. AAD10265), about 81% identity from amino acids 1-228 with a protein from Lactobacillus plantarum that is a response regulator (Accession No. NP_786469), about 80% identity from amino acids 2-227 with proteins from Oenococcus oeni that are response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00069111), and about 80% identity from amino acids 1-228 with a protein from Enterococcus faecalis that is a DNA-binding response regulator (Accession No. NP_816885).
A Gapped BlastP sequence alignment showed that SEQ ID NO:40 (254 amino acids) has about 42% identity from amino acids 3-254 with a protein from Lactobacillus johnsonii that is a hypothetical protein LJ0802 (Accession No. NP_964657), about 42% identity from amino acids 3-254 with proteins from Lactobacillus gasseri that are transcriptional regulators of sugar metabolism (Accession No. ZP_00046400), about 30% identity from amino acids 1-239 with a protein from Listeria monocytogenes that is similar to a transcriptional regulator (DeoR) (Accession No. NP_465631), about 30% identity from amino acids 1-231 with a protein from Oceanobacillus iheyensis that is a transcriptional repressor of the phosphotransferase system (Accession No. NP_693730), and about 28% identity from amino acids 1-239 with a protein from Listeria innocua that is similar to a transcriptional regulator (DeoR family) (Accession No. NP_471545).
A Gapped BlastP sequence alignment showed that SEQ ID NO:42 (805 amino acids) has about 84% identity from amino acids 7-805 with a protein from Lactobacillus johnsonii that is a probable xylulose-5-phosphate/fructose-6-phosphate phosphoketolase (Accession No. NP_964658), about 65% identity from amino acids 7-805 with a protein from Lactobacillus plantarum that is a phosphoketolase (Accession No. NP_786060), about 65% identity from amino acids 7-805 with a protein from Lactobacillus pentosus that is similar to a phosphoketolase (Accession No. CAC84393), about 65% identity from amino acids 6-805 with a protein from Oenococcus oeni that is a phosphoketolase (Accession No. ZP_00069369), and about 65% identity from amino acids 7-805 with a protein from Lactobacillus paraplantarum that is a xylulose-5-phosphate phosphoketolase (Accession No. AAQ64626).
A Gapped BlastP sequence alignment showed that SEQ ID NO:44 (286 amino acids) has about 59% identity from amino acids 5-286 with a protein from Lactobacillus johnsonii that is a hypothetical protein LJ0785 (Accession No. NP_964640), about 59% identity from amino acids 5-286 with a protein from Lactobacillus gasseri that is a predicted esterase of the alpha-beta hydrolase superfamily (Accession No. ZP_00045972), about 41% identity from amino acids 5-284 with a protein from Fusobacterium nucleatum that is a serine protease (Accession No. ZP_00143830), about 41% identity from amino acids 5-284 with a protein from Fusobacterium nucleatum that is a serine protease (Accession No. NP_603405), and about 41% identity from amino acids 5-284 with a protein from Streptococcus agalactiae that is a protein of unknown function (Accession No. NP_689045).
A Gapped BlastP sequence alignment showed that SEQ ID NO:46 (402 amino acids) has about 35% identity from amino acids 74-387 with a protein from Lactobacillus gasseri that is a predicted metal-dependent membrane protease (Accession No. ZP_00046861), about 29% identity from amino acids 1-389 with a protein from Lactobacillus johnsonii that is a hypothetical protein LJ1642 (Accession No. NP_965449), about 26% identity from amino acids 113-392 with a protein from Lactobacillus plantarum that is a CAAX family membrane-bound protease (Accession No. NP_786255), about 27% identity from amino acids 90-383 with a hypothetical protein from Lactobacillus gasseri (Accession No. ZP_00047041), and about 23% identity from amino acids 104-389 with a protein from Lactobacillus johnsonii that is a hypothetical protein LJ0777 (Accession No. NP_964632).
A Gapped BlastP sequence alignment showed that SEQ ID NO:48 (224 amino acids) has about 31% identity from amino acids 64-222 with proteins from Methanosarcina barkeri that are flavodoxins (Accession No. ZP_00079137), about 27% identity from amino acids 37-224 with proteins from Leuconostoc mesenteroides that are flavodoxins (Accession No. ZP_00062708), about 29% identity from amino acids 64-222 with a protein from Porphyromonas gingivalis that is a putative flavodoxin (Accession No. NP_905330), about 29% identity from amino acids 63-224 with a protein from Azotobacter vinelandii that is a flavodoxin (Accession No. ZP_00092501), and about 33% identity from amino acids 74-224 with a protein from Methanosarcina barkeri that is a flavodoxin (Accession No. ZP_00079128).
A Gapped BlastP sequence alignment showed that SEQ ID NO:50 (293 amino acids) has about 85% identity from amino acids 20-287 with a protein from Lactobacillus johnsonii that is a hypothetical protein LJ1250 (Accession No. NP_965105), about 83% identity from amino acids 22-291 with proteins from Lactobacillus gasseri that are membrane protease subunits, stomatin/prohibitin homologs (Accession No. ZP_00045910), about 60% identity from amino acids 22-286 with proteins from Leuconostoc mesenteroides that are membrane protease subunits, stomatin/prohibitin homologs (Accession No. ZP_00063597), about 57% identity from amino acids 22-289 with proteins from Oenococcus oeni that are membrane protease subunits, stomatin/prohibitin homologs (Accession No. ZP_00069250), and about 43% identity from amino acids 23-283 with an unknown protein from Lactobacillus plantarum (Accession No. NP_784144).
A Gapped BlastP sequence alignment showed that SEQ ID NO:52 (105 amino acids) has about 30% identity from amino acids 7-100 with a hypothetical protein from Streptococcus pyogenes (Accession No. NP_664197), about 30% identity from amino acids 7-100 with a hypothetical protein from Streptococcus pyogenes (Accession No. NP_606807), about 34% identity from amino acids 3-72 with a hypothetical protein from Streptococcus pyogenes (Accession No. NP_268822), about 40% identity from amino acids 8-60 with a protein from Treponema denticola that is a putative DNA-damage-inducible protein J (Accession No. NP_971120), and about 30% identity from amino acids 3-58 with a protein from Desulfitobacterium hafniense that is a DNA-damage-inducible protein J (Accession No. ZP_00099746).
A Gapped BlastP sequence alignment showed that SEQ ID NO:54 (325 amino acids) has about 42% identity from amino acids 1-323 with a hypothetical protein from Lactobacillus gasseri (Accession No. ZP_00047284), about 41% identity from amino acids 9-323 with a hypothetical protein LJ0696 from Lactobacillus johnsonii (Accession No. NP_964548), about 35% identity from amino acids 17-322 with a protein from Lactobacillus helveticus that is a helveticin (Accession No. AAA63274), about 24% identity from amino acids 116-258 with a protein from Rattus norvegicus that is similar to Gli3 protein (Accession No. XP_225411), and about 21% identity from amino acids 153-289 with a protein from Saccharomyces cerevisiae that is Tom1p (Accession No. NP_010745).
A Gapped BlastP sequence alignment showed that SEQ ID NO:56 (272 amino acids) has about 38% identity from amino acids 3-272 with proteins from Lactobacillus gasseri that are predicted hydrolases of the HAD superfamily (Accession No. ZP_00046918), about 36% identity from amino acids 7-270 with a protein from Streptococcus mutans that is a conserved hypothetical protein (Accession No. NP_721496), about 34% identity from amino acids 7-272 with a protein from Streptococcus agalactiae that is unknown (Accession No. NP_735618), about 33% identity from amino acids 7-272 with a protein from Streptococcus agalactiae that is a haloacid dehalogenase-like family hydrolase (Accession No. AAM99986), and about 33% identity from amino acids 7-272 with a protein from Listeria innocua that is a conserved hypothetical protein lin0440 (Accession No. NP_469785).
A Gapped BlastP sequence alignment showed that SEQ ID NO:58 (146 amino acids) has about 35% identity from amino acids 26-145 with proteins from Lactobacillus gasseri that are transcriptional regulators (Accession No. ZP_00045996), about 35% identity from amino acids 28-142 with a protein from Lactococcus lactis that is a transcriptional regulator (Accession No. NP_267638), about 31% identity from amino acids 29-142 with a protein from Clostridium acetobutylicum that is a MarR/EmrR family transcriptional regulator (Accession No. NP_349100), about 34% identity from amino acids 14-104 with a protein from Methanothermobacter thermautotrophicus that is a transcription regulator (Accession No. NP_275456), and about 27% identity from amino acids 14-142 with a protein from Staphylococcus aureus that is a hypothetical protein (Accession No. NP_370857).
A Gapped BlastP sequence alignment showed that SEQ ID NO:60 (585 amino acids) has about 57% identity from amino acids 16-582 with a protein from Lactobacillus brevis that is a Hop-resistant MDR (multidrug resistance)-like gene (Accession No. BAA21552), about 57% identity from amino acids 16-584 with a protein from Lactobacillus plantarum that is a multidrug ABC transporter ATP-binding and permease protein (Accession No. NP_786297), about 51% identity from amino acids 11-582 with a protein from Lactococcus lactis that is a multidrug resistance protein LmrA (Accession No. AAB49750), about 51% identity from amino acids 11-582 with a protein from Lactococcus lactis that is a multidrug resistance ABC transporter ATP-binding and permease protein (Accession No. Q9CHL8), and about 51% identity from amino acids 11-582 with a protein from Lactococcus lactis that is a multidrug resistance ABC transporter ATP-binding and permease protein (Accession No. NP_266867).
A Gapped BlastP sequence alignment showed that SEQ ID NO:62 (118 amino acids) has about 33% identity from amino acids 4-115 with a protein from Lactobacillus gasseri that is a hypothetical protein (Accession No. ZP_00046399), about 26% identity from amino acids 29-115 with a protein from Carnobacterium divergens that is dvnl (Accession No. CAA11807), about 26% identity from amino acids 29-115 with a protein from Lactobacillus plantarum that is a bacteriocin immunity protein (Accession No. NP_786516), about 38% identity from amino acids 74-117 with a protein from Equine coronavirus NC99 that is a spike protein (Accession No. AAQ67205), and about 25% identity from amino acids 20-115 with a protein from Clostridium acetobutylicum that is an uncharacterized protein similar to the mesC/lccI/entI family bacteriocin immunity protein (Accession No. NP_149170).
A Gapped BlastP sequence alignment showed that SEQ ID NO:64 (505 amino acids) has about 27% identity from amino acids 3-481 with a protein from Thermoanaerobacter tengcongensis that is aminopeptidase N (Accession No. NP_624209), about 33% identity from amino acids 122-369 with a protein from Streptomyces avermitilis that is a putative metallopeptidase (Accession No. NP_821429), about 31% identity from amino acids 122-371 with a protein from Streptomyces coelicolor that is a putative metallopeptidase (Accession No. NP_631646), about 24% identity from amino acids 11-480 with a protein from Chloroflexus aurantiacus that is a hypothetical protein (Accession No. ZP_00017564), and about 23% identity from amino acids 282-499 with a protein from Xylella fastidiosa that is aminopeptidase N (Accession No. ZP_00042138).
A Gapped BlastP sequence alignment showed that SEQ ID NO:66 (353 amino acids) has about 22% identity from amino acids 128-344 with proteins from Haemophilus somnus that are proteins involved in heme utilization (Accession No. ZP_00133280), and about 21% identity with a protein from Homo sapiens that is unknown (Accession No. AAH62424).
A Gapped BlastP sequence alignment showed that SEQ ID NO:68 (201 amino acids) has about 27% identity from amino acids 1-134 with a protein from Xylella fastidiosa that is a transposase and inactivated derivatives (Accession No. ZP_00038374), about 58% identity from amino acids 159-201 with a protein from Lactobacillus delbrueckii that is a transposase for insertion sequence element (Accession No. AAQ06905), about 26% identity from amino acids 1-134 with a protein from Xylella fastidiosa that is a transposase and inactivated derivatives (Accession No. ZP_00038149), about 24% identity from amino acids 27-196 with a protein from Nostoc sp. that is a transposase (Accession No. NP_490351), and about 25% identity from amino acids 1-132 with a protein from Xylella fastidiosa that is a transposase and inactivated derivatives (Accession No. ZP_00038301).
A Gapped BlastP sequence alignment showed that SEQ ID NO:70 (180 amino acids) has about 67% identity from amino acids 1-138 with a protein from Lactobacillus delbrueckii that is a transposase for insertion sequence element (Accession No. AAQ06905), about 36% identity from amino acids 2-179 with a protein from Clostridium perfringens that is a probable transposase (Accession No. NP_561584), about 36% identity from amino acids 2-178 with a protein from Clostridium tetani that is a transposase (Accession No. NP_781063), about 36% identity from amino acids 2-179 with a protein from Clostridium perfringens that is a probable transposase (Accession No. NP_562803), and about 35% identity from amino acids 2-179 with a protein from Clostridium tetani that is a transposase (Accession No. AAO35235).
A Gapped BlastP sequence alignment showed that SEQ ID NO:72 (444 amino acids) has about 55% identity from amino acids 1-432 with a protein from Lactobacillus plantarum that is a cation efflux protein (Accession No. NP_783937), about 42% identity from amino acids 2-432 with a protein from Bifidobacterium longum that is a Na+-driven multidrug efflux pump (Accession No. ZP_00120269), about 33% identity from amino acids 3-421 with a protein from Clostridium tetani that is a Na+-driven multidrug efflux pump (Accession No. NP_781116), about 31% identity from amino acids 3-431 with a protein from Methanosarcina acetivorans that is an integral membrane protein (Accession No. NP_616062), and about 29% identity from amino acids 7-432 with a protein from Clostridium acetobutylicum that is a predicted membrane protein and probable cation efflux pump (MDR-type) (Accession No. NP_349099).
A Gapped BlastP sequence alignment showed that SEQ ID NO:74 (64 amino acids) has about 28% identity from amino acids 4-49 with a protein from Nostoc sp. that is a hypothetical protein (Accession No. NP_478212).
A Gapped BlastP sequence alignment showed that SEQ ID NO:76 (63 amino acids) has about 40% identity from amino acids 9-39 with a protein from Bacillus subtilis that is an assimilatory nitrate reductase (Accession No. NP_388214), and about 40% identity from amino acids 9-41 with a protein from Bacillus subtilis that is an assimilatory nitrite reductase (Accession No. NP_388212).
A Gapped BlastP sequence alignment showed that SEQ ID NO:78 (438 amino acids) has about 40% identity from amino acids 66-188 with a protein from Lactobacillus salivarius that is unknown (Accession No. AAM61773), about 28% identity from amino acids 4-297 with a protein from Streptococcus mutans that is a hypothetical protein (Accession No. NP_722210), about 26% identity from amino acids 101-220 with a protein from Streptococcus agalactiae that is a putative bacteriocin transport accessory protein (Accession No. NP_687482), about 27% identity from amino acids 86-216 with a protein from Brochothrix campestris that is a transport accessory protein (Accession No. AAC95141), and about 25% identity from amino acids 101-220 with a protein from Streptococcus agalactiae that is unknown (Accession No. NP_734963).
A Gapped BlastP sequence alignment showed that SEQ ID NO:80 (196 amino acids) has about 56% identity from amino acids 1-196 with a protein from Lactobacillus gasseri that is a putative gassericin K7 B accessory protein (Accession No. AAP73779), about 56% identity from amino acids 1-196 with a protein from Lactobacillus gasseri that is ORF2 (Accession No. BAA82351), about 55% identity from amino acids 10-196 with a protein from Lactobacillus gasseri that is unknown (Accession No. AAP56342), about 49% identity from amino acids 10-196 with a protein from Lactobacillus sp. that is a hypothetical protein in the LAF 5′ region (ORF1) (Accession No. AAA16635), and about 28% identity from amino acids 41-195 with a protein from Lactobacillus casei that is an ABC-transporter accessory factor (Accession No. NP_542220).
A Gapped BlastP sequence alignment showed that SEQ ID NO:82 (720 amino acids) has about 68% identity from amino acids 1-720 with a protein from Lactobacillus salivarius that is AbpT (Accession No. AAM61785), about 62% identity from amino acids 9-720 with a protein from Lactobacillus plantarum that is an ATP-binding and permease protein PlnG bacteriocin ABC-transporter (Accession No. NP_784218), about 62% identity from amino acids 9-720 with a protein from Lactobacillus plantarum that is the ABC-transporter PlnG (Accession No. CAA64189), about 62% identity from amino acids 6-720 with a protein from Lactobacillus sakei that is the probable ATP-dependent translocation protein sppT (Accession No. AAA16635), and about 62% identity from amino acids 2-720 with a protein from Lactobacillus sakei that is an ABC-exporter (Accession No. CAA86946).
A Gapped BlastP sequence alignment showed that SEQ ID NO:84 (83 amino acids) has about 100% identity from amino acids 20-42 with a protein from Lactobacillus acidophilus that is the acidocin J1132 alpha peptide (N-terminal) (Accession No. AAB49523), and about 100% identity from amino acids 19-42 with a protein from Lactobacillus acidophilus that is the acidocin J1132 beta peptide (Accession No. AAB49524).
A Gapped BlastP sequence alignment showed that SEQ ID NO:94 (208 amino acids) has about 25% identity from amino acids 23-125 with a hypothetical protein from Lactobacillus helveticus (Accession No. CAA57507).
A Gapped BlastP sequence alignment showed that SEQ ID NO:98 (197 amino acids) has about 35% identity from amino acids 3-196 with a protein from Lactobacillus gasseri that is a predicted metal-dependent membrane protease (Accession No. ZP_00046861), about 38% identity from amino acids 1-151 with a protein from Lactobacillus gasseri that is a hypothetical protein (Accession No. ZP_00047041), about 26% identity from amino acids 3-183 with a protein from Lactobacillus plantarum that is a CAAX family membrane-bound protease (Accession No. NP_786255), about 30% identity from amino acids 1-142 with a protein from Lactobacillus gasseri that is a predicted metal-dependent membrane protease (Accession No. ZP_00047281), and about 35% identity from amino acids 80-156 with a protein from Lactobacillus plantarum that is the CAAX family membrane-bound protease immunity protein PlnI (Accession No. NP_784215).
A Gapped BlastP sequence alignment showed that SEQ ID NO:100 (263 amino acids) has about 23% identity from amino acids 57-263 with a protein from Lactobacillus gasseri that is a hypothetical protein (Accession No. ZP_00047041), about 33% identity from amino acids 134-201 with a protein from Halobacterium sp. that is the 3-oxoacyl-[acyl-carrier protein] reductase FabG (Accession No. NP_280196), about 29% identity from amino acids 62-245 with a protein from Lactobacillus gasseri that is a predicted metal-dependent protease (Accession No. ZP_00046861), about 30% identity from amino acids 26-109 with a protein from Plasmodium falciparum that is a conserved hypothetical protein (Accession No. NP_701942), and about 26% identity from amino acids 83-229 with a protein from Avian infectious bronchitis virus that is the replicase polyprotein 1ab (Accession No. AAP92673).
A Gapped BlastP sequence alignment showed that SEQ ID NO:102 (398 amino acids) has about 30% identity from amino acids 6-396 with a protein from Lactobacillus gasseri that is a predicted metal-dependent membrane protease (Accession No. ZP_00046861), about 27% identity from amino acids 4-392 with a protein from Lactobacillus gasseri that is a hypothetical protein (Accession No. ZP_00047041), about 30% identity from amino acids 201-381 with a protein from Lactobacillus gasseri that is a predicted metal-dependent membrane protease (Accession No. ZP_00047281), about 24% identity from amino acids 103-394 with a protein from Lactobacillus plantarum that is a CAAX family membrane-bound protease (Accession No. NP_786255), and about 35% identity from amino acids 256-360 with a protein from Lactobacillus plantarum that is the CAAX family membrane-bound protease immunity protein PlnP (Accession No. NP_784209).
A Gapped BlastP sequence alignment showed that SEQ ID NO:104 (103 amino acids) has about 48% identity from amino acids 51-83 with a protein from Pyrococcus abyssi that is a hypothetical molybdenum cofactor (Accession No. NP_126386), about 28% identity from amino acids 1-92 with a protein from Dictyostelium discoideum that is a vacuolar proton ATPase 100 kDa subunit (Accession No. AAB49621), about 28% identity from amino acids 34-102 with a protein from Agrobacterium tumefaciens that is a conserved hypothetical protein (Accession No. NP_535397), about 40% identity from amino acids 53-94 with a protein from Ralstonia solanacearum that is a putative hemagglutinin-related protein (Accession No. NP_521309), and about 44% identity from amino acids 48-76 with a protein from Lactobacillus gasseri that is ORF3 (Accession No. BAA82352).
A Gapped BlastP sequence alignment showed that SEQ ID NO:106 (767 amino acids) has about 71% identity from amino acids 3-766 with proteins from Lactobacillus gasseri that are alpha-glucosidases (Accession No. ZP_00046641), about 65% identity from amino acids 5-761 with a protein from Lactobacillus plantarum that is an alpha-glucosidase (Accession No. NP_621719), about 40% identity from amino acids 15-767 with a protein from Thermoanaerobacter tengcongensis that is an alpha-glucosidase (Accession No. NP_535397), about 40% identity from amino acids 20-717 with a protein from Bacillus thermoamyloliquefaciens that is alpha-glucosidase II (Accession No. Q9F234), and about 38% identity from amino acids 10-750 with proteins from Nostoc punctiforme that are alpha-glucosidases (Accession No. ZP_00110705).
A Gapped BlastP sequence alignment showed that SEQ ID NO:116 (249 amino acids) has about 90% identity from amino acids 1-249 with a protein from Lactobacillus gasseri that is an aspartate racemase (Accession No. ZP_00046638), about 87% identity from amino acids 1-249 with a protein from Lactobacillus johnsonii that is an aspartate racemase (Accession No. NP_965689), about 52% identity from amino acids 1-234 with a protein from Pediococcus pentosaceus that is an aspartate racemase (Accession No. CAA43598), and about 48% identity from amino acids 1-235 with a protein from Streptococcus thermophilus that is an aspartate racemase (Accession No. ZP_00285115).
A Gapped BlastP sequence alignment showed that SEQ ID NO:118 (523 amino acids) has about 85% identity from amino acids 2-519 with a protein from Lactobacillus johnsonii that is a UDP-N-acetylmuramoyl-L-alanyl-D-glutamate lysine ligase (Accession No. NP_965690), about 85% identity from amino acids 2-519 with a protein from Lactobacillus johnsonii that is a UDP-N-acetylmuramyl tripeptide synthase (Accession No. ZP_00046637), about 52% identity from amino acids 1-510 with a protein from Pediococcus pentosaceus that is a UDP-N-acetylmuramyl tripeptide synthase (Accession No. ZP_00323229), and about 45% identity from amino acids 1-515 with a protein from Leuconostoc mesenteroides that is a UDP-N-acetylmuramyl tripeptide synthase (Accession No. ZP_00062837).
A Gapped BlastP sequence alignment showed that SEQ ID NO:120 (621 amino acids) has about 84% identity from amino acids 7-620 with proteins from Lactobacillus johnsonii that are ABC transporter ATPase and permease components (Accession No. NP_965693), about 82% identity from amino acids 10-620 with proteins from Lactobacillus gasseri that are ABC-type multidrug transport system, ATPase and permease components (Accession No. ZP_00046634), about 52% identity with a protein from Clostridium acetobutylicum that is an ABC-type multidrug/protein/lipid transport system, ATPase component (Accession No. NP_350005), and about 52% identity from amino acids 40-621 with proteins from Desulfitobacterium hafniense that are ABC-type multidrug transport system, ATPase and permease components (Accession No. ZP_00099385).
A Gapped BlastP sequence alignment showed that SEQ ID NO:122 (576 amino acids) has about 83% identity from amino acids 1-576 with proteins from Lactobacillus gasseri that are ABC-type multidrug transport system ATPase and permease components (Accession No. ZP_00046633), about 83% identity from amino acids 1-576 with proteins from Lactobacillus johnsonii that are ABC transporter ATPase and permease components (Accession No. NP_965694), about 51% identity from amino acids 1-574 with proteins from Desulfitobacterium hafniense that are ABC-type multidrug transport system ATPase and permease components (Accession No. ZP_00099386), and about 50% identity from amino acids 1-569 with a protein from Bifidobacterium longum that is an ATP-binding protein of an ABC transporter (Accession No. NP_696913).
A Gapped BlastP sequence alignment showed that SEQ ID NO:124 (452 amino acids) has about 40% identity from amino acids 4-431 with a protein from Lactobacillus gasseri that is an uncharacterized protein conserved in bacteria (Accession No. ZP_00341762), about 39% identity from amino acids 4-431 with a protein from Lactobacillus johnsonii that is a hypothetical protein (Accession No. NP_964083), about 26% identity from amino acids 9-427 with a protein from Lactobacillus plantarum that is a hypothetical protein (Accession No. NP_783898), and about 25% identity from amino acids 21-427 with a protein from Pediococcus pentosaceus that is an uncharacterized protein conserved in bacteria (Accession No. ZP_00323558).
A Gapped BlastP sequence alignment showed that SEQ ID NO:126 (274 amino acids) has about 42% identity from amino acids 1-268 with a protein from Lactobacillus johnsonii that is a hypothetical protein (Accession No. NP_964084), about 40% identity from amino acids 1-268 with a protein from Lactobacillus gasseri that is an uncharacterized protein conserved in bacteria (Accession No. ZP_0046801), about 33% identity from amino acids 1-269 with a protein from Lactobacillus plantarum that is a hypothetical protein (Accession No. NP_783899), and about 27% identity from amino acids 1-266 with a protein from Pediococcus pentosaceus that is an uncharacterized protein conserved in bacteria (Accession No. ZP_00323559).
A Gapped BlastP sequence alignment showed that SEQ ID NO:128 (265 amino acids) has about 74% identity from amino acids 1-265 with proteins from Lactobacillus gasseri that are metal-dependent hydrolases of the beta-lactamase superfamily (Accession No. ZP_00046802), about 73% identity from amino acids 1-265 with a protein from Lactobacillus johnsonii that is a hypothetical protein (Accession No. NP_964085), about 52% identity from amino acids 1-265 with a protein from Lactobacillus plantarum that is a hydrolase (Accession No. NP_783900), and about 52% identity from amino acids 1-255 with proteins from Pediococcus pentosaceus that are metal-dependent hydrolases of the beta-lactamase superfamily (Accession No. ZP_00323560).
A Gapped BlastP sequence alignment showed that SEQ ID NO:130 (423 amino acids) has about 86% identity from amino acids 12-423 with a protein from Lactobacillus helveticus that is HtrA (Accession No. CAA06668), about 60% identity from amino acids 18-420 with a protein from Lactobacillus johnsonii that is a serine protease do-like HtrA (Accession No. NP_964086), about 50% identity from amino acids 36-412 with proteins from Pediococcus pentosaceus that are trypsin-like serine proteases (Accession No. ZP_00323561), and about 41% identity from amino acids 22-420 with a protein from Exiguobacterium sp. that is a trypsin-like serine protease (Accession No. ZP_00184047).
A Gapped BlastP sequence alignment showed that SEQ ID NO:118 (236 amino acids) has about 35% identity from amino acids 25 to 221 with a protein from Oenococcus oeni that is an EAL domain (Accession No. ZP_00319350), about 33% identity from amino acids 7 to 223 with proteins from Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 that are EAL domains (Accession No. ZP_00062661), and about 33% identity with a protein from Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 that is an EAL domain (Accession No. ZP_00062662).
A Gapped BlastP sequence alignment showed that SEQ ID NO:132 (56 amino acids) has about 55% identity from amino acids 1 to 56 with a protein from Oenococcus oeni PSU-1 that is an NADH:flavin oxidoreductase (Accession No. ZP_00318642), about 51% identity from amino acids 1 to 56 with proteins from Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 that are NADH:flavin oxidoreductases (Accession No. ZP_00064370), and about 50% identity from amino acids 3 to 56 with proteins from Lactococcus lactis subsp. lactis that are NADH-dependent oxidoreductases (Accession Nos. NP_267851, AAK05793, and G86836).
A Gapped BlastP sequence alignment showed that SEQ ID NO:134 (184 amino acids) has about 68% identity from amino acids 3 to 184 with a protein from Oenococcus oeni PSU-1 that is an amidase related to nicotinamidase (Accession No. ZP_00318699), about 63% identity from amino acids 2 to 183 with a protein from Lactobacillus plantarum WCFS1 that is a pyrazinamidase/nicotinamidase (Accession Nos. NP_786021 and CAD64878), and about 59% identity from amino acids 2 to 182 with a protein from Pediococcus pentosaceus ATCC 25745 that is an amidase related to nicotinamidase (Accession No. ZP_00323805).
A Gapped BlastP sequence alignment showed that SEQ ID NO:138 (498 amino acids) has about 83% identity from amino acids 1 to 493 with a protein from Lactobacillus johnsonii NCC533 that is an amino acid transporter (Accession No. NP_965275), about 82% identity from amino acids 1 to 493 with a protein from Lactobacillus gasseri that is an amino acid transporter (Accession No. ZP_00046566), and about 46% identity from amino acids 8 to 492 with a protein from Pediococcus pentosaceus ATCC 25745 that is an amino acid transporter (Accession No. ZP_00323277).
A Gapped BlastP sequence alignment showed that SEQ ID NO:140 (231 amino acids) has about 46% identity from amino acids 1 to 230 with a protein from Lactobacillus plantarum WCFS1 that is a cell surface hydrolase (putative) (Accession No. NP_785474), about 46% identity from amino acids 1 to 231 with a protein from Lactobacillus johnsonii NCC533 that is a hypothetical protein LJ0748 (Accession No. NP_964600), and about 45% identity with a protein from Lactobacillus gasseri that is an uncharacterized protein with an alpha/beta hydrolase fold (Accession No. ZP_00045991).
A Gapped BlastP sequence alignment showed that SEQ ID NO:144 (230 amino acids) has about 27% identity from amino acids 1 to 226 with a protein from Oenococcus oeni PSU-1 that is an aldo/keto reductase related to diketogulonate reductases (Accession No. ZP_00319386), about 26% identity from amino acids 4 to 226 with a protein from Bifidobacterium longum NCC2705 that is a morphine 6-dehydrogenase (Accession No. NP_696457), and about 26% identity from amino acids 9 to 226 with a protein from Bifidobacterium longum DJO 10A that is an aldo/keto reductase related to diketogulonate reductase (Accession No. ZP_00120718).
A Gapped BlastP sequence alignment showed that SEQ ID NO:148 (392 amino acids) has about 68% identity from amino acids 1 to 389 with a protein from Lactobacillus gasseri that is a permease of the major facilitator super family (Accession No. ZP_00046919), about 68% identity from amino acids 1 to 391 with a protein from Lactobacillus johnsonii NCC 533 that is a major facilitator super family permease (Accession No. NP_965415), and about 46% identity from amino acids 1 to 385 with a protein from Lactobacillus plantarum WCFS1 that is a multi-drug transport protein (Accession No. NP_784617).
A Gapped BlastP sequence alignment showed that SEQ ID NO:22 (221 amino acids) has about 77% identity with a protein from Lactobacillus johnsonii NCC 533 that is a 2-component system response regulator (Accession No. NP_965391), about 77% identity from amino acids 1 to 220 with a protein from Lactobacillus gasseri that is a response regulator consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain (Accession No. ZP_00046475), and about 59% identity from amino acids 1 to 221 with a protein from Streptococcus pyogenes SSI-1 that is a putative two-component response regulator (Accession No. NP_607078).
A Gapped BlastP sequence alignment showed that SEQ ID NO:108 (63 amino acids) has about 59% identity from amino acids 1 to 63 with a protein from Bacillus thuringiensis serovar konkukian str. 97-27 that is a flagellar hook-associated protein 1 (Accession No. YP_035858), and about 40% identity from amino acids 32 to 63 with a protein from Bacillus cereus ZK that is a flagellar hook-associated protein 1 (Accession No. YP_083109).
A Gapped BlastP sequence alignment showed that SEQ ID NO:122 (576 amino acids) has about 83% identity from amino acids 1 to 576 with a protein from Lactobacillus gasseri that is an ABC-type multi-drug transport system, ATPase and permease components (Accession No. ZP_00046633), about 83% identity from amino acids 1 to 576 with a protein from Lactobacillus johnsonii NCC 533 that is an ABC transporter, ATPase and permease components (Accession Nos. NP_965694 and AAS09660), and about 51% identity from amino acids 1 to 574 with a protein from Desulfitobacterium hafniense DCB-2 that is an ABC-type multi-drug transport system, ATPase and permease components (Accession No. ZP_00099386).
A Gapped BlastP sequence alignment showed that SEQ ID NO:152 (260 amino acids) has about 45% identity from amino acids 2 to 249 with a protein from Lactobacillus gasseri that is an uncharacterized membrane-bound protein conserved in bacteria (Accession No. ZP_00046632).
A Gapped BlastP sequence alignment showed that SEQ ID NO:154 (366 amino acids) has about 93% identity from amino acids 1 to 366 with a protein from Lactobacillus johnsonii NCC533 that is a probable GTP-binding protein (Accession No. AAS09662), about 93% identity from amino acids 1 to 366 with a protein from Lactobacillus gasseri that is a predicted GTPase, probable translation factor (Accession No. ZP_00046631), about 77% identity from amino acids 1 to 366 with a protein from Pediococcus pentosaceus ATCC 25745 that is a predicted GTPase, probable translation factor (Accession No. ZP_00322452), and about 76% identity from amino acids 1 to 366 with a protein from Lactobacillus plantarum WCFS1 that is a GTP-binding protein (Accession No. NP_786473).
A Gapped BlastP sequence alignment showed that SEQ ID NO:158 (294 amino acids) has about 80% identity from amino acids 1 to 293 with a protein from Lactobacillus gasseri that is a predicted transcriptional regulator (Accession No. ZP_00046630), about 78% identity from amino acids 1 to 293 with a protein from Lactobacillus johnsonii NCC 533 that is a chromosome partitioning protein ParB (Accession No. NP_965698), about 60% identity from amino acids 5 to 293 with a protein from Lactobacillus plantarum WCFS1 that is a chromosome partitioning protein (Accession No. NP_786475), and about 59% identity from amino acids 12 to 293 with a protein from Enterococcus faecalis V583 that is a chromosome partitioning protein of the ParB family (Accession No. NP_816893).
A Gapped BlastP sequence alignment showed that SEQ ID NO:160 (259 amino acids) has about 85% identity from amino acids 1 to 257 with a protein from Lactobacillus johnsonii NCC533 that is a chromosome partitioning protein ParA (Accession No. NP_965699), about 85% identity from amino acids 1 to 257 with a protein from Lactobacillus gasseri that is an ATPase involved in chromosome partitioning (Accession No. ZP_00046629), and about 68% identity from amino acids 1 to 251 with a protein from Enterococcus faecalis V583 that is an ATPase, ParA family (Accession No. NP_816894).
A Gapped BlastP sequence alignment showed that SEQ ID NO:162 (276 amino acids) has about 57% identity from amino acids 1 to 276 with a protein from Lactobacillus johnsonii NCC533 that is a probable chromosome partitioning protein ParB (Accession No. NP_96570), about 58% identity from amino acids 1 to 276 with a protein from Lactobacillus gasseri that is a predicted transcriptional regulator (Accession No. ZP_00046628), about 50% identity from amino acids 14 to 275 with a protein from Lactobacillus plantarum WCFS1 that is a chromosome partitioning protein, DNA binding protein (Accession No. NP_786477), and about 50% identity from amino acids 19 to 276 with a protein from Geobacillus kaustophilus HTA426 that is a hypothetical protein GK3491 (Accession No. YP_149344).
A Gapped BlastP sequence alignment showed that SEQ ID NO:164 (240 amino acids) has about 67% identity from amino acids 1 to 239 with a protein from Lactobacillus johnsonii NCC533 that is a glucose inhibited division protein B (Accession No. NP_965701), about 66% identity from amino acids 1 to 239 with a protein from Lactobacillus gasseri that is a predicted S-adenosylmethionine-dependent methyltransferase involved in bacterial cell division (Accession No. ZP_00046627), and about 62% identity from amino acids 1 to 239 with a protein from Pediococcus pentosaceus ATCC 25745 that is a predicted S-adenosylmethionine-dependent methyltransferase involved in bacterial cell division (Accession No. ZP_00322449).
SEQ ID NO:2 contains a predicted Response_reg domain located from about amino acids 3 to 92 and a predicted Trans_reg_C domain located from about amino acids 84 to 225, and is a member of the Response regulator receiver domain family (Response_reg) (PFAM Accession PF00072) and a member of the Transcriptional regulatory protein C family (Trans_reg_C) (PFAM Accession PF00486).
SEQ ID NO:4 contains a predicted HAMP domain from about amino acids 184 to 253, a predicted HisKA domain located from about amino acids 376 to 443 and a predicted HATPase_c domain from about amino acids 496 to 607, and is a member of the HAMP domain family (HAMP) (PFAM Accession PF00672), a member of the His Kinase A (phosphoacceptor) domain family (HisKA) (PFAM accession PF00512), and a member of the Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase family (HATPase_c) (PFAM Accession PF02518).
SEQ ID NO:12 contains a predicted Response_reg domain from about amino acids 3 to 124 and a predicted Trans_reg_C domain from about amino acids 160 to 131, and is a member of the Response regulator receiver domain family (Response_reg) (PFAM Accession PF00072) and a member of the Transcriptional regulatory protein C family (Trans_reg_C) (PFAM Accession PF00486).
SEQ ID NO:14 contains a predicted HAMP domain from about amino acids 173 to 242, a predicted HisKA domain located from about amino acids 253 to 319 and a predicted HATPase_c domain from about amino acids 364 to 475, and is a member of the HAMP domain family (HAMP) (PFAM Accession PF00672), a member of the His Kinase A (phosphoacceptor) domain family (HisKA) (PFAM accession PF00512), and a member of the Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase family (HATPase_c) (PFAM Accession PF02518).
SEQ ID NO:16 contains a predicted GGDEF domain from about amino acids 200-363, and is a member of the GGDEF domain family (GGDEF) (PFAM Accession PF00990).
SEQ ID NO:18 contains a predicted EAL domain from about amino acids 4 to 234, and is a member of the EAL domain family (EAL) (PFAM Accession PF00563).
SEQ ID NO:20 contains a predicted HisKA domain located from about amino acids 208 to 270, a predicted HATPase_c domain from about amino acids 314 to 426, and is a member of the His Kinase A (phosphoacceptor) domain family (HisKA) (PFAM accession PF00512), and a member of the Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase family (HATPase_c) (PFAM Accession PF02518).
SEQ ID NO:22 contains a predicted Response_reg domain from about amino acids 1 to 120, and is a member of the Response regulator receiver domain family (Response_reg) (PFAM Accession PF00072).
SEQ ID NO:24 contains a predicted HAMP domain from about amino acids 203 to 274, a predicted HisKA domain from about amino acids 278 to 345, a predicted HATPase_c domain from about amino acids 391 to 502, and is a member of the HAMP domain family (HAMP) (PFAM Accession PF00672), a member of the His Kinase A (phosphoacceptor) domain family (HisKA) (PFAM accession PF00512) and a member of the Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase family (HATPase_c) (PFAM Accession PF02518).
SEQ ID NO:26 contains a predicted Response_reg domain from about amino acids 2 to 120 and a predicted Trans_reg_C domain from about amino acids 156 to 227, and is a member of the response regulator receiver domain family (Response_reg) (PFAM Accession PF00072) and a member of the Transcriptional regulatory protein C family (Trans_reg_C) (PFAM Accession PF00486).
SEQ ID NO:28 contains a predicted Response_reg domain from about amino acids 20 to 138 and a predicted Trans_reg_C domain from about amino acids 170 to 240, and is a member of the Response regulator receiver domain family (Response_reg) (PFAM Accession PF00072) and a member of the transcriptional regulatory protein C family (Trans_reg_C) (PFAM Accession PF00486).
SEQ ID NO:30 contains a predicted HisKA domain from about amino acids 223 to 290 and a predicted HATPase_c domain from about amino acids 330 to 441, and is a member of the His Kinase A (phosphoacceptor) domain family (HisKA) (PFAM accession PF00512) and a member of the Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase family (HATPase_c) (PFAM Accession PF02518).
SEQ ID NO:36 contains a predicted HisKA domain from about amino acids 153 to 219 and a predicted HATPase_c domain from about amino acids 265 to 376, and is a member of the His Kinase A (phosphoacceptor) domain family (HisKA) (PFAM accession PF00512) and a member of the Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase family (HATPase_c) (PFAM Accession PF02518).
SEQ ID NO:38 contains a Response_reg domain from about amino acids 1 to 120 and a predicted Trans_reg_C domain from about amino acids 150 to 226, and is a member of the Response regulator receiver domain family (Response_reg) (PFAM Accession PF00072) and a member of the Transcriptional regulatory protein C family (Trans_reg_C) (PFAM Accession PF00486).
SEQ ID NO:40 contains a predicted DeoR domain from about amino acids 6 to 231, and is a member of the Bacterial regulatory proteins, DeoR family (DeoR, PFAM Accession PF00455).
SEQ ID NO:44 contains a predicted Patatin domain from about amino acids 9 to 176, and is a member of the Patatin-like phospholipase family (Patatin) (PFAM Accession PF01734).
SEQ ID NO:50 contains a predicted Band_7 domain from about amino acids 21 to 194, and is a member of the SPFH domain/Band 7 family (Band_7) (PFAM Accession PF01145).
SEQ ID NO:58 contains a predicted MarR domain from about amino acids 35 to 138, and is a member of the MarR family (MarR) (PFAM Accession PF01047).
SEQ ID NO:60 contains an ABC_membrane domain from about amino acids 41 to 307 and an ABC_tran domain from about amino acids 377 to 582, and is a member of the ABC transporter transmembrane region family (ABC_membrane) (PFAM Accession PF00664) and a member of the ABC transporter family (ABC_tran) (PFAM Accession PF00005).
SEQ ID NO:72 contains a MatE domain from about amino acids 27 to 189, and is a member of the MatE domain family (MatE) (PFAM Accession PF01554).
SEQ ID NO:82 contains a Peptidase_C39 domain from about amino acids 10 to 145, an ABC_membrane domain from about amino acids 164 to 440 and an ABC_tran domain from about amino acids 512 to 696, and is a member of the Peptidase C39 family (Peptidase_C39) (PFAM Accession PF03412), a member of the ABC transporter transmembrane region family (ABC_membrane) (PFAM Accession PF00664) and a member of the ABC transporter family (ABC_tran) (PFAM Accession PF00005).
SEQ ID NO:124 contains a predicted YycH domain from about amino acids 12 to 429 and is a member of the YycH domain family (PFAM Accession No. PF07435).
SEQ ID NO:128 contains a predicted Lactamase_B domain from about amino acids 11 to 219 and is a member of the Lactamase_B domain family (PFAM Accession No. PF00753).
SEQ ID NO:130 contains a predicted PDZ domain from about amino acids 315 to 408, a predicted trypsin domain from about amino acids 132 to 312, and is a member of the PDZ domain family (PFAM Accession No. PF00595) and a member of the trypsin domain family (PFAM Accession No. PF00089).
SEQ ID NO:56 contains a predicted hydrolase domain from about amino acids 6 to 243, and is a member of the hydrolase domain family (PFAM Accession No. PF00702).
SEQ ID NO:8 contains a domain with an E-value of 0.015 to a predicted HATPase_C domain from amino acids 321 to 425, and is a member of the HATPase_C domain family (PFAM Accession No. PF02518).
SEQ ID NO:10 contains a predicted response_reg domain from about amino acids 3 to 140 and a predicted LytTR domain from about amino acids 160 to 254, and is a member of the response_reg domain family (PFAM Accession No. PF00072) and a member of the LytTR domain family (PFAM Accession No. PF04397).
SEQ ID NO:138 contains a predicted amino acid permease domain from about amino acids 13 to 498, and is a member of the AA_permease domain family (PFAM Accession No. PF00324).
SEQ ID NO:144 contains a predicted aldo/keto reductase domain from about amino acids 10 to 228, and is a member of the Aldo/keto reductase family (PFAM Accession No. PF00248).
SEQ ID NO:148 contains a predicted major facilitator super family domain from about amino acids 15 to 356, and is a member of the major facilitator super family (MFS_1) domain family (PFAM Accession No. PF07690).
SEQ ID NO:150 contains a predicted region found in RelA/SpoT proteins from about amino acids 44 to 169, and is a member of the RelA_SpoT domain family (PFAM Accession No. PF04607).
SEQ ID NO:48 contains a predicted flavodoxin domain from about amino acids 67 to 224 (E-value equals 0.021) and is a member of the flavodoxin_1 domain family (PFAM Accession No. PF00258).
SEQ ID NO:52 contains a predicted RelB antitoxin domain from about amino acids 5 to 76 with an E-value of 0.0001 and is a member of the RelB domain family (PFAM Accession No. PF04221).
SEQ ID NO:64 contains a predicted peptidase family M1 domain from about amino acids 31 to 416, and is a member of the peptidase M1 domain family (PFAM Accession No. PF01433).
SEQ ID NO:70 contains a putative transposase DNA-binding domain from about amino acids 97 to 178, and is a member of the transposase_35 domain family (PFAM Accession No. PF07282).
SEQ ID NO:78 contains a predicted gram positive anchor domain from about amino acids 393 to 433, which is a member of the gram positive anchor domain family (PFAM Accession No. PF00746).
SEQ ID NO:32 contains a predicted LytTr DNA-binding domain from about amino acids 172 to 266, a predicted response regulator receiver domain from about amino acids 12 to 152, and is a member of the LytTR DNA-binding domain family (PFAM Accession No. PF04397) and a member of the response regulator receiver domain family (PFAM Accession No. PF00072).
SEQ ID NO:34 contains a HATPase_C domain from about amino acids 320 to 434, and is a member of the histidine kinase-, DNA gyrase B-, and HSP90-like ATPase family (HATPase_C) (PFAM Accession No. PF02518).
SEQ ID NO:98 contains a predicted CAAX amino terminal protease family domain from about amino acids 38 to 148, with an E-value of 0.00025, which is a member of the ABI domain family (PFAM Accession No. PF02517).
SEQ ID NO:102 contains a predicted CAAX amino terminal protease family domain from about amino acids 243 to 353, with an E-value of 8.9e-06 and is a member of the ABI domain family (PFAM Accession No. PF02517).
SEQ ID NO:106 contains a predicted glycosyl hydrolase family domain from about amino acids 185 to 296, and is a member of the glycosyl hydrolase family (Glyco_hydro_31) (PFAM Accession No. PF01055).
SEQ ID NO:116 contains a predicted asp/glu/hydantoin racemase from about amino acids 2 to 231, and is a member of the asp/glu/hydantoin racemase domain family (PFAM Accession No. PF01055).
SEQ ID NO:118 contains a predicted mur ligase family, glutamate ligase domain from about amino acids 30 to 102, and is a member of the mur ligase family, glutamate ligase domain family (PFAM Accession No. PF02875).
SEQ ID NO:120 contains a predicted ABC transporter domain from about amino acids 408 to 592 and a predicted ABC transporter transmembrane region located about amino acids 36 to 315, and is a member of the ABC transporter domain family (PFAM Accession No. PF01061) and a member of the ABC transporter transmembrane region domain family (PFAM Accession No. PF00664).
SEQ ID NO:122 contains a predicted ABC transporter domain from about amino acids 360 to 544 and a predicted ABC transporter transmembrane region located from about amino acids 16 to 287, and is a member of the ABC transporter domain family (PFAM Accession No. PF01061) and a member of the ABC transporter transmembrane region domain family (PFAM Accession No. PF00664).
SEQ ID NO:154 contains a predicted GTPase of unknown function from about amino acids 3 to 145, which is a member of the MMR_HSR1 domain family (PFAM Accession No. PF01926).
SEQ ID NO:158 contains a predicted ParB-like nuclease domain from about amino acids 37 to 126, and is a member of the ParB-like nuclease domain family (PFAM Accession No. PF02195).
SEQ ID NO:160 contains a predicted CobQ-CobB/MinD/ParA nucleotide binding domain from about amino acids 5 to 221, and is a member of the CbiA domain family (PFAM Accession No. PF01656).
SEQ ID NO:162 contains a predicted ParB-like nuclease domain from about amino acids 20 to 109, and is a member of the ParBc domain family (PFAM Accession No. PF02195).
SEQ ID NO:164 contains a predicted glucose inhibited division protein from about amino acids 21 to 215, and is a member of the GidB domain family (PFAM Accession No. PF02527).
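The domain coordinates listed above lend themselves to a simple consistency check. The following is a minimal sketch, not part of the described method: the `DomainHit` record and `check_hit` helper are hypothetical names introduced only for illustration, and the coordinates and the 423-residue length of SEQ ID NO:130 are copied from the text as printed. It flags any range whose end precedes its start or exceeds a known protein length.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DomainHit:
    """One predicted domain, as transcribed from the text (hypothetical record type)."""
    seq_id: int   # SEQ ID NO
    domain: str   # domain short name
    start: int    # first residue of the predicted domain
    end: int      # last residue of the predicted domain


def check_hit(hit: DomainHit, protein_length: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems found in one domain range."""
    problems = []
    if hit.start < 1:
        problems.append("start before residue 1")
    if hit.end < hit.start:
        problems.append("end precedes start")
    if protein_length is not None and hit.end > protein_length:
        problems.append("end beyond protein length")
    return problems


# Coordinates as printed in the text.
hits = [
    DomainHit(2, "Response_reg", 3, 92),
    DomainHit(12, "Trans_reg_C", 160, 131),  # printed range runs backwards
    DomainHit(130, "trypsin", 132, 312),
    DomainHit(130, "PDZ", 315, 408),
]
lengths = {130: 423}  # SEQ ID NO:130 is stated to be 423 amino acids long

for hit in hits:
    issues = check_hit(hit, lengths.get(hit.seq_id))
    if issues:
        print(f"SEQ ID NO:{hit.seq_id} {hit.domain}: {', '.join(issues)}")
```

Run on these four entries, the check reports only the Trans_reg_C range of SEQ ID NO:12, whose printed coordinates (160 to 131) should be verified against the sequence listing.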
Survival of microorganisms during their transit through the gastrointestinal tract requires the capability to sense and respond to the various and changing conditions present in that environment. Two-component regulatory systems (2CRS) are one of the most important mechanisms for environmental sensing and signal transduction. They are found in the majority of gram-positive and gram-negative bacteria and control housekeeping functions, as well as regulating proteins important for pathogenesis, stress and adherence (Cotter et al. (1999) J. Bacteriol. 181:6840-6843; Sebert et al. (2002) Infect. Immun. 70:4059-4067; Teng et al. (2002) Infect. Immun. 70:1991-1996). A typical 2CRS consists of a membrane-associated histidine protein kinase (HPK), which detects specific environmental signals, and a cytoplasmic response regulator (RR), which regulates expression of one or more genes in a regulon (Parkinson (1993) Cell 73:857-871). 2CRS are located in modules with varying arrangements of conserved domains (West and Stock (2001) TRENDS Biochem. Sci. 26:369-376). HPKs generally consist of a signal input domain and an autokinase domain, which can be divided into two subdomains: a histidine phosphotransferase subdomain and an ATP-binding subdomain. The RR is typically composed of a regulatory (receiver) domain and a DNA binding (output) domain (Hoch and Varughese (2001) J. Bacteriol. 183:4941-4949). Detection of an external signal by the input domain of the kinase controls the activation of the kinase itself. The active kinase autophosphorylates on a histidine residue via ATP hydrolysis. The phosphoryl group is then transferred to an aspartate residue in the receiver domain of the RR, which activates the regulatory protein and promotes the transcriptional response (Foussard et al. (2001) Microbes Infect. 3:417-424).
Genomic sequencing of microorganisms has uncovered the presence of many 2CRS and promoted global analysis of their responses to different environments. For those studies, DNA microarray technology involving high-density arrays of open reading frame-specific fragments has been instrumental. Fabret et al. ((1999) J. Bacteriol. 181:1975-1983) identified the 2CRS in Bacillus subtilis and classified them into five groups, and the functions of these 2CRS have been investigated by microarray analysis (Kobayashi et al. (2001) J. Bacteriol. 183:7365-7370; Ogura et al. (2001) Nucleic Acids Res. 29:3804-3813).
In lactic acid bacteria (LAB), production of some class II bacteriocins (plantaricin, sakacin P, sakacin A, carnobacteriocin B2) is transcriptionally regulated through a signal transduction pathway that consists of three components: an inducer bacteriocin-like peptide, an HPK, and an RR (for a review see 25). In fact, the production of many small antimicrobial peptides appears to be modulated by a cell-density response mechanism. Additionally, multiple 2CRS have been identified in a number of LAB (Miller and Bassler (2001) Annu. Rev. Microbiol. 55:165-199; Morel-Deville et al. (1997) Microbiology 143:1513-1520). For example, six 2CRS were detected in Lactococcus lactis, with four of them implicated in cellular responses to stress (O'Connell-Motherway et al. (2000) Microbiology 146:935-947).
Lactobacillus acidophilus NCFM is a probiotic organism that has been used extensively in yogurt, fermented foods, and dietary supplements (Sanders and Klaenhammer (2001) J. Dairy Sci. 84:319-331). The annotated genome sequence of L. acidophilus NCFM encodes nine putative 2CRS (Altermann et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 102:3906-3912). In this study, we identified a 2CRS similar to the lisRK system described in Listeria monocytogenes (Cotter et al. (1999) J. Bacteriol. 181:6840-6843), which participates in both stress response and virulence in L. monocytogenes. The HPK gene from the LBA1524HPK-LBA1525RR system was disrupted to investigate its putative role in acid tolerance. A whole genome array containing 97.4% of the annotated L. acidophilus genes was constructed and used to compare the genome-wide transcriptional patterns of the control and the HPK mutant after exposure to three different pH values.
The bacterial strains used in this study were Escherichia coli EC 1000 (RepA+ MC1000, KmR; host for pORI28-based plasmids, [Law et al. (1995) J. Bacteriol. 177:7011-7018]), and L. acidophilus strains: NCFM (human intestinal isolate; [Barefoot and Klaenhammer (1983) Appl. Environ. Microbiol. 45:1808-1815]), NCK1398 (NCFM lacL::pTRK685, [Russell and Klaenhammer (2001) Appl. Environ. Microbiol. 67:4361-4364]) and NCK1686 (NCFM LBA1524::pTRK807, [this example]).
E. coli strains were propagated at 37° C. in Luria-Bertani (LB, Difco Laboratories Inc., Detroit, Mich.) broth with shaking. Erythromycin (Em)-resistant clones of E. coli were selected on brain heart infusion (BHI) agar (Difco) supplemented with Em (150 μg/ml). Lactobacilli were propagated statically at 37° C. in MRS (Difco) or on MRS supplemented with 1.5% agar. When appropriate, Em (5.0 μg/ml) and/or chloramphenicol (Cm, 7.0 μg/ml) was added. Reconstituted skim milk (10% SM) and 10% SM supplemented with 1% yeast extract (Difco) or 0.25% casamino acids (Difco) were used for determination of acidification rates.
Restriction enzymes (Roche Molecular Biochemicals, Indianapolis, Ind.) and T4 DNA ligase (New England Biolabs, Beverly, Mass.) were used according to the suppliers' recommendations. Plasmid preparations from E. coli were performed using the QIAprep Spin Plasmid Minipreps kit (QIAGEN Inc., Valencia, Calif.). Chromosomal DNA from L. acidophilus was extracted according to Walker and Klaenhammer ((1994) J. Bacteriol. 176:5330-5340). Electrotransformation of L. acidophilus was carried out as described by Walker et al. ((1996) FEMS Microbiol. Lett. 138:233-237). PCR was performed by standard protocols using Taq DNA polymerase (Roche Molecular Biochemicals).
Potential coding sequences were derived from the genomic sequence of L. acidophilus NCFM (Genbank accession number CP000033, [Altermann et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 102:3906-3912]). Protein sequence similarity analysis was conducted using the BlastP module (Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402) at NCBI (ncbi.nlm.nih.gov/). TMHMM (cbs.dtu.dk/services/TMHMM) was used to predict transmembrane helices in proteins. CD-Search (Marchler-Bauer et al. (2003) Nucleic Acids Res. 31:383-387) was employed to identify conserved domains in protein sequences.
Microarray platform and data are available at the Gene Expression Omnibus (GEO [http://www.ncbi.nlm.nih.gov/geo]) under accession numbers GPL1401 (platform) and GSE1976 (series).
Aliquots (10 ml) of L. acidophilus cultures grown on MRS to A600=0.3 were transferred to MRS (adjusted to the desired pH with lactic acid). After 30 minutes, cells were harvested by centrifugation and frozen immediately in a dry ice/ethanol bath. One ml of Trizol (Life Technologies, Rockville, Md.) was added to each cell pellet, and the cells were homogenized in a Mini-Beadbeater-8 cell disruptor (Biospec Products, Bartlesville, Okla.) for five 1-min cycles, with chilling on ice for 1 min between cycles. The phases were separated by centrifugation (14,000 rpm, 15 min, 4° C.). The aqueous phase was removed to a fresh tube and 0.4 ml of Trizol and 0.2 ml of chloroform were added. The mixture was vortexed for 15 s and centrifuged to separate the phases. The Trizol step was repeated twice and RNA was precipitated from the final aqueous phase by adding 1 volume of isopropanol, followed by incubation at room temperature for 10 min and centrifugation (12,000 rpm, 10 min, 4° C.). Concentration and purity of RNA samples were determined by electrophoresis on agarose gels and standard spectrophotometer measurements.
Total RNA hybridizations using a slot-blot apparatus (Bio-Dot SF, Bio-Rad) and Zeta-Probe membrane (Bio-Rad Laboratories, Inc.) were carried out as previously described (Durmaz et al. (2002) J. Bacteriol. 184:6532-6543). [α-32P]dCTP-labeled probes were generated from PCR fragments using the Multiprime DNA labeling system (Amersham Pharmacia Biotech Inc., Piscataway, N.J.) and purified using NucTrap Probe purification columns (Stratagene, La Jolla, Calif.). The primers utilized are listed in Table 2. Radioactive signals were detected using Kodak Biomax film, and autoradiographs were analyzed by densitometry using the SpotDenso function with auto-linked background on an AlphaImager 2000 (Innotech Scientific). The primers set forth in Table 2 are denoted in the sequence listing as follows: for LBA 0197 (SEQ ID NO:167), for LBA 1300 (SEQ ID NO:168), for LBA 1524 (SEQ ID NO:169), for LBA 1525 (SEQ ID NO:170), for LBA 0698 (SEQ ID NO:171), for LBA 1075 (SEQ ID NO:175), and for LBA 1196 (SEQ ID NO:176).
Generation of Lactobacillus acidophilus DNA Microarray
A whole genome DNA microarray based on the PCR products of predicted ORFs from the L. acidophilus genome was used for global gene expression analysis. PCR primers for 1,966 genes were designed using GAMOLA software (Altermann and Klaenhammer (2003) OMICS 7:161-169) and purchased from Qiagen Operon (Alameda, Calif.). Total genomic DNA from L. acidophilus NCFM was used as a template for 96-well PCR amplifications. To amplify gene-specific PCR products, a 100 μl reaction mix contained: 1 μl L. acidophilus DNA (100 ng/μl), 10 μl specific primer pairs (10 μM), 0.5 μl of dNTP mix (10 mM), 10 μl PCR buffer (10×), and 1 μl Taq DNA polymerase (5 U/μl [Roche Molecular Biochemicals]). The following PCR protocol was used: an initial denaturation step for 5 min at 94° C. followed by 40 cycles of denaturation at 94° C. for 15 sec, annealing at 50° C. for 30 sec and polymerization at 72° C. for 45 sec. Approximately 95% of ORFs produced a unique PCR product of between 100 and 800 bp. The size of the fragments was confirmed by electrophoresis in 1% agarose gels. DNA from the 96-well plates was purified using the Qiagen Purification Kit. In general, the total quantity of each PCR product was greater than 1 μg. The purified PCR fragments were spotted three times in a random pattern on glass slides (Corning, Acton, Mass.) using the Affymetrix® 417™ Arrayer at the NCSU Genome Research Laboratory (cals.ncsu.edu:8050/grl/). To prevent carry-over contamination, pins were washed between uses in different wells. Humidity was controlled at 50-55% during printing. DNA was cross-linked to the surface of the slide by UV irradiation (300 mJ) followed by incubation of the slides for 2 h at 80° C. The reliability of the microarray data was assessed by hybridization of two cDNA samples prepared from the same total RNA and labeled with Cy3 and Cy5, respectively. Hybridization data revealed a linear correlation in the relative expression level of 98.6% of 5685 spots (each gene in triplicate) with no more than a two-fold change.
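The reaction set-up and cycling program described above imply a few derived quantities that are easy to check by arithmetic. The sketch below is illustrative only. It reads the template volume as 1 μl, assumes the balance of the 100 μl reaction is water (the text does not say what makes up the remaining volume), and ignores thermocycler ramp times; all variable names are hypothetical.

```python
REACTION_VOLUME_UL = 100.0

# Component volumes in microlitres, as listed in the text.
components_ul = {
    "template DNA": 1.0,
    "primer pair (10 uM stock)": 10.0,
    "dNTP mix (10 mM stock)": 0.5,
    "PCR buffer (10x stock)": 10.0,
    "Taq polymerase (5 U/ul)": 1.0,
}

# Assumption: the remaining volume is water.
water_ul = REACTION_VOLUME_UL - sum(components_ul.values())


def final_concentration(stock: float, volume_ul: float) -> float:
    """Dilute a stock of the given concentration into the full reaction volume."""
    return stock * volume_ul / REACTION_VOLUME_UL


primer_final_um = final_concentration(10.0, 10.0)  # micromolar
dntp_final_mm = final_concentration(10.0, 0.5)     # millimolar
buffer_final_x = final_concentration(10.0, 10.0)   # fold concentration
taq_units = 5.0 * 1.0                              # units per reaction

# Cycling: 5 min initial denaturation, then 40 cycles of 15 s + 30 s + 45 s.
cycle_seconds = 15 + 30 + 45
program_minutes = (5 * 60 + 40 * cycle_seconds) / 60

# 5685 spots printed in triplicate correspond to this many distinct products.
products_on_array = 5685 // 3

print(f"water to add: {water_ul} ul")
print(f"primers: {primer_final_um} uM, dNTPs: {dntp_final_mm} mM, buffer: {buffer_final_x}x")
print(f"Taq: {taq_units} U, hold time: {program_minutes} min, products: {products_on_array}")
```

Under these assumptions the mix needs 77.5 μl of water, the primers end up at 1 μM and the dNTP mix at 0.05 mM, and the programmed hold times add up to 65 minutes.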
cDNA Probe Preparation and Microarray Hybridization
Identical amounts (25 μg) of DNase-treated (Invitrogen) RNA were aminoallyl-labeled by reverse transcription with random hexamers in the presence of aminoallyl dUTP (Sigma Chemical Co.), using Superscript II reverse transcriptase (Life Technologies) at 42° C. overnight, followed by fluorescence labeling of the aminoallylated cDNA with N-hydroxysuccinimide-activated Cy3 or Cy5 esters (Amersham Pharmacia Biotech). Labeled cDNA probes were purified using the PCR Purification Kit (Qiagen). Coupling of the Cy3 and Cy5 dyes to the AA-dUTP labeled cDNA and hybridization of samples to microarrays were performed according to the protocols outlined in the TIGR protocols website (tigr.org/tdb/microarray/protocosTGR.shtml). Briefly, combined Cy5- and Cy3-labeled cDNA probes were hybridized to the arrays for 16 h at 42° C. After hybridization, the slides were washed twice in low stringency buffer (1×SSC containing 0.2% SDS) for 5 min each. The first wash was performed at 42° C. and the second one at room temperature. Subsequently, the slides were washed in a high stringency buffer (0.1×SSC containing 0.2% SDS, for 5 min at room temperature) and finally in 0.1×SSC (2 washes of 2.5 min each at room temperature).
Immediately after washing of the arrays, fluorescence intensities were acquired at 10 μm resolution using a ScanArray 4000 Microarray Scanner (Packard Biochip BioScience, Biochip Technologies LLC, Mass.) and stored as TIFF images. Signal intensities were quantified, the background was subtracted, and data was normalized using the QuantArray 3.0 software package (Perkin Elmer). Two slides (each containing triplicate arrays) were hybridized reciprocally to Cy3- and Cy5-labeled probes per experiment (dye swap). Spots were analyzed by adaptive quantitation. Data was median normalized. When the local background intensity was higher than the spot signal (negative values), no data was considered for those spots. The median of the six ratios per gene was recorded. The ratio between the average absolute pixel values for the replicated spots of each gene with and without treatment represented the fold change in gene expression. All genes belonging to a potential operon were considered for analysis if at least one gene of the operon showed significant expression changes, and the remaining genes showed trends toward that expression. Confidence intervals and P values on the fold change were also calculated with the use of a two-sample t test. P values of 0.05 or less were considered significant (Knudsen (2002) “A Biologist's Guide to Analysis of DNA Microarray Data,” (John Wiley & Sons, Inc., New York)).
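By way of illustration only, the per-gene ratio calculation described above (median normalization, exclusion of spots whose background exceeds the signal, and the median of the replicate ratios) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the QuantArray implementation; the function names and the sample intensities are hypothetical.

```python
from statistics import median


def usable(signal):
    """A spot is usable only if its background-subtracted signal is positive."""
    return signal is not None and signal > 0


def median_normalize(channel):
    """Divide every usable spot in one channel by that channel's median usable signal."""
    center = median(s for s in channel if usable(s))
    return [s / center if usable(s) else None for s in channel]


def gene_fold_change(treated_spots, control_spots):
    """Median treated/control ratio over the replicate spots of one gene."""
    ratios = [t / c for t, c in zip(treated_spots, control_spots)
              if usable(t) and usable(c)]
    return median(ratios) if ratios else None


# Hypothetical background-subtracted signals for the six replicate spots of one gene
# (two slides, three spots each); the negative value is dropped.
treated = median_normalize([2.0, 4.0, 6.0, -1.0, 8.0, 10.0])
control = median_normalize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(gene_fold_change(treated, control))
```

In practice the normalization would be applied across all spots of a channel on a slide rather than across the spots of a single gene, as in this shortened example.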
A 766-bp internal fragment of ORF LBA1524 was amplified using L. acidophilus NCFM chromosomal DNA as template and the primers I1524F (5′-gatctagacagcgctctagca-3′) and I1524R (5′-gatcgatcttcggccaatgtg-3′). The internal fragment was cloned in the integrative vector pORI28 (Law et al. (1995) J. Bacteriol. 177:7011-7018), generating pTRK807, and introduced by electroporation into L. acidophilus NCFM containing pTRK669 (Russell and Klaenhammer (2001) Appl. Environ. Microbiol. 67:4361-4364).
Subsequent steps to facilitate the integration event were carried out according to Russell and Klaenhammer (Russell and Klaenhammer (2001) Appl. Environ. Microbiol. 67:4361-4364). The suspected integrants were confirmed by PCR and Southern hybridization analysis, using standard procedures.
For acid challenge analysis, cells were grown to an absorbance at 600 nm (A600) of 0.25-0.3 (pH>5.8) from a 2% inoculum in MRS broth. Cultures were centrifuged and resuspended in the same volume of MRS adjusted to pH 3.5 with lactic acid at 37° C. Survival was determined at 30-minute intervals by plating serial dilutions in a 10% MRS broth diluent onto MRS agar using a Whitley Automatic Spiral Plater (Don Whitley Scientific Limited, West Yorkshire, England).
For acid adaptation assays, cells were grown to an A600 of 0.25-0.3 (pH>5.8). Cells were centrifuged and resuspended in the same volume of MRS pH 5.5 (adjusted with lactate) and incubation continued for 1 hour at 37° C. as described previously (Azcarate-Peril et al. (2004) Appl. Environ. Microbiol. 70:5315-5322). Controls were resuspended in MRS broth at pH 6.8. The cells from the adapted (pH 5.5) and control (pH 6.8) cultures were then centrifuged and resuspended in MRS broth at pH 3.5 (adjusted with lactic acid). Viable-cell counts were performed at 30-minute intervals for 2.5 h by plating on MRS agar.
Log phase cells at an A600 of 0.25-0.3 (pH>5.8) from a 2% inoculum in MRS broth were centrifuged and resuspended in the same volume of MRS, or MRS containing 15 or 20% (v/v) ethanol. CFU/ml were determined at 30-minute intervals by serial dilutions in 10% MRS and enumeration on MRS agar as described above.
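By way of illustration only, survival in the challenge assays above is conveniently expressed as a log reduction in CFU/ml relative to time zero. The sketch below uses ordinary plate-count arithmetic and does not reproduce the counting method of the spiral plater; the function names, colony counts, dilution factors, and plated volume are hypothetical.

```python
import math


def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Viable count from a colony count, the fold dilution plated, and the volume plated."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def log_reduction(cfu_start, cfu_end):
    """log10 of the ratio of starting to surviving CFU/ml."""
    if cfu_start <= 0 or cfu_end <= 0:
        raise ValueError("counts must be positive")
    return math.log10(cfu_start / cfu_end)


# Hypothetical counts before and after an acid or ethanol challenge.
before = cfu_per_ml(colonies=150, dilution_factor=1e5, plated_volume_ml=0.05)
after = cfu_per_ml(colonies=30, dilution_factor=1e4, plated_volume_ml=0.05)
print(f"log reduction: {log_reduction(before, after):.2f}")
```

On this scale a "2-log reduction" corresponds to 1% survival and a "half-log reduction" to roughly 32% survival.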
Using CD-search (Marchler-Bauer et al. (2003) Nucleic Acids Res. 31:383-387) and BlastP (Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402) programs, we identified nine signal transduction systems consisting of a histidine protein kinase (HPK) and a response regulator (RR; [Altermann et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 102:3906-3912]). These 2CRS represented almost 1% of L. acidophilus NCFM ORFs. Additionally, four RRs were identified containing a LytTR DNA binding motif that were not associated with a histidine kinase. HPKs share a characteristic kinase core composed of a dimerization domain and a catalytic domain for ATP binding and phosphate transfer. The C-terminal half of the HPK proteins showed five conserved amino acid motifs: the H box, containing the His residue that will be phosphorylated, and the N, G1, F and G2 boxes (Stock et al. (2000) Annu. Rev. Biochem. 69:183-215). ORFs LBA0079HPK, LBA0747HPK, LBA1524HPK, LBA1430HPK, LBA1660HPK and LBA1819HPK were assigned to the group IIIA/OmpR of HPKs in accordance with the region surrounding the histidine that becomes phosphorylated, whereas the HPKs LBA0602HPK and LBA1799HPK were categorized in Class IV (Fabret et al. (1999) J. Bacteriol. 181:1975-1983). The remaining 2CRS (LBA1413-LBA1414) could not be classified into any known category. LBA1413 showed a Domain of Unknown Function with GGDEF motif (smart00267, DUF 1), which apparently occurs exclusively in eubacteria and might participate in prokaryotic signaling processes. LBA1414 also showed a domain of unknown function (cd01948, EAL), which is found in diverse bacterial signaling proteins. Together with the GGDEF domain, EAL might be involved in regulating cell surface adhesiveness in bacteria (Galperin et al. (2001) FEMS Microbiol Lett. 203:11-21).
Response regulators contain two conserved domains. The first is a regulatory domain, which receives the signal from the sensor partner in bacterial 2CRS and contains a phosphoacceptor site that is phosphorylated by the histidine kinase. The second is a DNA binding effector domain in the C terminus of the protein. RRs present in L. acidophilus contained these two conserved domains. The RRs ranged from 221 to 274 amino acids in size. ORFs LBA0078RR, LBA0746RR, LBA1525RR, LBA1431RR, LBA1659RR and LBA1820RR can be included in the OmpR family of response regulators according to the amino acid sequence of their output domains, where the residues involved in the hydrophobic core of the domain are conserved (Martinez-Hackert and Stock (1997) Structure 5:109-124). The response regulators encoded by LBA0603RR and LBA1798RR can be defined as members of the AlgR/AgrA/LytR family of RRs (Nikolskaya and Galperin (2002) Nucleic Acids Res. 30:2453-2459).
The 2CRS composed of LBA1524HPK and LBA1525RR formed an operon flanked by two terminators with a free energy of −11.0 and −13.8 Kcal/mol, respectively. Also, a typical RBS sequence and a putative promoter were positioned upstream of LBA1525RR (
To investigate the physiological function of the LBA1524HPK-LBA1525RR 2CRS and to examine its putative association with acid tolerance in L. acidophilus, a chromosomally interrupted LBA1524HPK mutant was constructed. For insertional inactivation of the HPK, a 766-bp internal region was amplified by PCR using the primers I1524F-I1524R described in Materials and Methods. This fragment was cloned into pORI28 and the resulting plasmid, pTRK807, was then transferred by electroporation into L. acidophilus NCFM, already harboring the helper plasmid pTRK669. Integrants were isolated as described by Russell and Klaenhammer (Russell and Klaenhammer (2001) Appl. Environ. Microbiol. 67:4361-4364) to generate strain NCK1686. PCR experiments and Southern hybridizations were performed to confirm the integration event via junction amplicons and fragments (data not shown). Because this operon was flanked by two putative terminators, polar effects from the inactivation of LBA1524HPK were not expected. Phase-contrast microscopy analyses of the HPK mutant revealed a decrease in cell size and chain length compared to the wild type NCFM cells (data not shown).
Two strong transmembrane regions can be predicted, by in silico analysis, in the histidine protein kinase of the LBA1524HPK-LBA1525RR 2CRS (from 24 to 42 aa, and 202 to 226 aa). The ATP-binding phosphotransfer (catalytic) domain and the dimerization domain can be located in the carboxy terminus of the protein, from 396 to 499 aa and from 276 to 341 aa, respectively. The 766-bp internal region of LBA1524HPK used to inactivate the HPK, amplified by PCR using primers I1524F-I1524R, spanned from 51 to 347 aa. As a consequence, insertion of the vector would have affected the second transmembrane and/or the dimerization domain of the HPK.
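By way of illustration only, the conclusion above can be checked by intersecting the amino acid span of the cloned internal fragment with the predicted domain coordinates. The coordinates are those stated in the text; the function name and dictionary labels are hypothetical.

```python
def overlap(a, b):
    """Return the inclusive overlap of two (start, end) residue ranges, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Predicted domains of the histidine protein kinase, in amino acid coordinates.
domains = {
    "transmembrane 1": (24, 42),
    "transmembrane 2": (202, 226),
    "dimerization": (276, 341),
    "catalytic (ATP-binding)": (396, 499),
}
# Span covered by the internal fragment used for insertional inactivation.
insert_region = (51, 347)

hits = {name: overlap(insert_region, span) for name, span in domains.items()}
for name, hit in hits.items():
    print(f"{name}: {'overlaps at ' + str(hit) if hit else 'not covered'}")
```

Only the second transmembrane region and the dimerization domain fall inside the fragment, consistent with the statement that these are the regions affected by the insertion.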
The response of log phase cells to pH 3.5 was compared between the HPK mutant strain NCK1686 and control, L. acidophilus NCK1398 (NCFM::lacL). Strain NCK1398 was used as a control throughout the study so that the effects of antibiotic pressure could be accounted for. When log phase cells of NCK1686 were exposed to pH 3.5, more than a 2-log reduction in cfu was observed after 2.5 hours, compared to a half-log reduction in the control (
Acid Adaptation of L. acidophilus
Log phase cells of L. acidophilus NCK1398 and NCK1686 were exposed to pH 5.5 for 1 h, prior to challenge by pH 3.5. Remarkably, both the control and HPK mutant exhibited a high tolerance to acid challenge (
In an attempt to identify genes regulated by the 2CRS, and potentially affected by inactivation of the LBA1524HPK ORF, parallel cultures of the control strain NCK1398 (NCFM::lacL) and the HPK mutant (NCK1686) were grown in MRS broth to an optical density of 0.3 and transferred to MRS adjusted to pH 6.8, 5.5, or 4.5. After 30 minutes, RNA was isolated and used for hybridization to microarray slides printed with representative sequences of the majority of the identified ORFs on the L. acidophilus genome. Statistically significant (P≦0.05) gene expression changes were considered for ORFs exhibiting at least a two-fold change.
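By way of illustration only, the selection rule above (P≦0.05 together with at least a two-fold change in either direction) can be sketched as a simple filter. The function name, gene labels, ratios, and P values below are hypothetical and are not data from the study.

```python
def is_differentially_expressed(ratio, p_value, fold=2.0, alpha=0.05):
    """True when the change is significant and at least `fold`-fold up or down."""
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    return p_value <= alpha and (ratio >= fold or ratio <= 1.0 / fold)


# Hypothetical (mutant/control ratio, P value) pairs.
examples = {
    "geneA": (7.4, 0.01),   # strongly up, significant
    "geneB": (0.36, 0.03),  # more than two-fold down, significant
    "geneC": (1.6, 0.01),   # significant but below two-fold
    "geneD": (3.0, 0.20),   # above two-fold but not significant
}
selected = [g for g, (r, p) in examples.items() if is_differentially_expressed(r, p)]
print(selected)
```

Treating ratios at or below 0.5 as equivalent to ratios at or above 2 keeps the criterion symmetric for up- and down-regulated genes.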
Comparison of the expression profiles identified 80 differentially-expressed genes showing at least two-fold changes in expression patterns (Table 3). As expected, the components of the LBA1524HPK-LBA1525RR 2CRS, as well as the large and small subunits of the β-galactosidase and UDP-glucose 4-epimerase, were differentially expressed, owing to the inactivation of these genes in the compared strains. Surprisingly, the inactivated HPK gene and the RR were overexpressed in the NCK1686 mutant. This might be attributable to amplification of the vector in the chromosome and/or a readthrough event where a longer transcript is generated, but not translated into a functional protein. The same effect was observed for NCFM::lacL where the disrupted operon appeared to be highly expressed. Alternatively, a non-functional HPK could result in elevated transcriptional expression of the 2CRS, if the phosphorylated form of the RR was involved in the autoregulation of the 2CRS.
Table 3 expression ratios (row and column labels not recoverable), in original order:
0.36, 0.47, 0.57, 0.53, 0.71, 6.22, 4.43, 7.42, 1.96, 6.70,
6.31, 7.27, 8.09, 2.55, 7.44, 7.09, 3.89, 8.01, 7.10, 5.57,
2.15, 1.89, 1.79, 2.90, 0.44, 0.43, 6.96, 5.51, 4.32, 3.44,
1.64, 2.00, 2.25, 0.35, 0.47, 0.36, 4.92, 1.98, 2.00, 2.22,
2.13, 2.05, 0.15, 3.10, 2.05, 1.91, 3.22, 0.07, 0.17, 0.27,
0.17, 0.43, 0.41, 0.29, 0.35, 0.67, 0.21, 0.46, 1.96, 2.21,
2.11, 2.07, 2.88, 7.95, 0.56, 2.27, 2.82, 2.09, 1.54, 2.27,
3.01, 3.90, 2.94, 4.15, 7.37, 5.16, 8.72, 1.98, 2.91, 3.28,
1.97, 7.53, 7.02, 1.47, 2.08, 0.55, 0.18, 3.52, 12.60, 1.74,
2.31, 0.43, 3.56, 2.35, 0.43, 5.27, 2.97, 4.22, 3.46, 6.24,
0.24, 2.26, 0.47, 0.60, 0.56, 2.34
1 Array ratios from two biological replicates and two technical replicates for each condition were averaged.
2 Clusters of Orthologous Groups (37). Genes were classified according to the COG domain present in the potentially encoded protein sequence.
3 Values in boldface indicate ratios that meet the P criteria (P < 0.05).
The most dramatic changes in expression in the HPK mutant were observed in genes predicted to encode components of the proteolytic enzyme system. Proteolytic systems of lactic acid bacteria are divided into three functional categories: 1) proteinases (that degrade casein into small peptides); 2) transport systems (that import those peptides); and 3) peptidases (Kunji et al. (1996) Antonie van Leeuwenhoek 70:187-221). The expression of ORF LBA1512 encoding the proteinase precursor in L. acidophilus, PAP (39% identical and 53% similar to the cell envelope proteinase PrtR from L. rhamnosus GI27527536), increased in the HPK mutant more than 7-fold at pH 6.8 and 5.5 (Table 3). However, PrtM (LBA1588), the protein putatively involved in the maturation of the proteinase, showed expression levels comparable to the control strain (ratios between 0.8 and 1.1).
Two operons potentially encoding oligopeptide ABC transporters are present in the L. acidophilus genome (
Four peptidases were also differentially expressed in the HPK mutant strain. A neutral endopeptidase PepO (LBA0165) was up regulated at all pHs evaluated. The aminopeptidase encoded by LBA0911, and peptidase T (LBA1515) were up regulated at pH 5.5. Finally, a cytosol non-specific dipeptidase encoded by ORF LBA1837 was significantly up regulated at pH 4.5.
To investigate potential alterations in the proteolytic system of the HPK mutant, we compared the acidification rates of L. acidophilus NCFM (wt; since NCK1398 does not grow in milk) versus the HPK mutant in 10% skim milk (SM) and in 10% SM plus yeast extract (
Expression of LBA1080 (a putative methionine synthase) and LBA1081 (luxS) was increased up to 6.9-fold under all conditions in the HPK mutant. At the amino acid level, the LuxS (LBA1081) homolog in the genome sequence of L. acidophilus was 77% identical and 84% similar to the S-ribosylhomocysteinase (autoinducer-2 production protein LuxS) from L. plantarum (Kleerebezem et al. (2003) Proc. Natl. Acad. Sci. U.S.A. 100:1990-1995), and 73% identical and 89% similar to LuxS from S. pyogenes (Lyon et al. (2001) Mol. Microbiol. 42: 145-157). Examination of the surrounding chromosomal region suggested that luxS is the second member of an operon consisting of five genes whose function is poorly characterized. A putative rho-independent terminator with a low free energy of −8.5 Kcal/mol was present downstream of luxS.
Among the global transcriptional changes observed in the HPK mutant, two key enzymes involved in lysine biosynthesis, aspartate kinase (EC 2.7.2.4, LBA0850) and diaminopimelate epimerase (EC 5.1.1.7, LBA0849), were up-regulated at pH 5.5. Additionally, a putative operon composed of a cytosol non-specific dipeptidase, an ABC transporter and a transcriptional regulator from the TetR/AcrR family (ORFs LBA1837 to LBA1840) was highly up regulated at pH 4.5.
Given the similarity of LBA1524HPK with lisK, the HPK from L. monocytogenes (Barefoot and Klaenhammer (1983) Appl. Environ. Microbiol. 45:1808-1815), and the fact that a lisK-deficient mutant was able to grow at a higher concentration of ethanol than its parent strain, survival of the L. acidophilus HPK mutant was investigated in the presence of ethanol. No differences were observed when log-phase cells were exposed to 15% (v/v) ethanol indicating that L. acidophilus is naturally highly resistant. However, at 20% ethanol the HPK mutant showed a 4-log reduction in survival after 90 min compared to only a 1-log reduction in the control (data not shown).
Cells of the control and the HPK mutant strains were harvested at an A600 of 0.3 and exposed to pH 6.8, 5.5, and 4.5 in MRS broth for 30 minutes. Total RNA was prepared and hybridized with several labeled probes. For analysis of gene expression, DNAs of the ORFs indicated in Table 2 were amplified by PCR and labeled with α-32P. The genes selected for analysis by Northern blot were oppA1 (LBA0197, up regulated in the HPK mutant), oppA2 (LBA1300, down regulated in the HPK mutant cells), and LBA1524HPK and LBA1525RR (components of the inactivated 2CRS). Genes encoding a glyceraldehyde-3-P dehydrogenase (LBA0698), malolactic enzyme (LBA1075), and RNA polymerase sigma factor rpoD (LBA1196) were also evaluated as controls because these were not differentially expressed at the different pH conditions when evaluated in the microarrays (data not shown).
The hybridized membranes and comparison between relative expression ratios obtained by Microarray and Northern analysis are shown in
Analysis of the genome sequence of L. acidophilus revealed the presence of nine 2CRS (Altermann et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 102:3906-3912). All the identified histidine protein kinases showed between two and six transmembrane domains, suggesting their location in the cell membrane. One of the identified 2CRS, LBA1524HPK-LBA1525RR, showed homology to lisRK, a signal transduction system previously shown to participate in stress response and virulence in L. monocytogenes (Cotter et al. (1999) J. Bacteriol. 181:6840-6843). When we insertionally interrupted LBA1524HPK, log-phase cells became more sensitive to acid pH. We previously reported that L. acidophilus induces an adaptive response at pH 5.5 that provides elevated acid tolerance to the cells (Azcarate-Peril et al. (2004) Appl. Environ. Microbiol. 70:5315-5322). Both the HPK mutant and the control NCFM::lacL exhibited an acid induced tolerance response (ATR), although this response was slightly impaired by the LBA1524HPK mutation. This indicates that while LBA1524HPK-LBA1525RR plays some role, additional mechanisms that are not regulated by this 2CRS contribute to acid adaptation in L. acidophilus.
A whole genome array comparing the expression profile between the control and the HPK mutant revealed an altered expression pattern of numerous ORFs encoding major components of the proteolytic enzyme system. Based on its genome sequence, L. acidophilus has a limited capacity to synthesize amino acids, with the potential to synthesize only three amino acids (cysteine, serine, and aspartate) de novo. Additionally, cysteine and serine could be synthesized from pyruvate, and aspartate from fumarate. Based on these three amino acids, a series of other derivatives might be generated (asparagine, threonine, glycine, lysine, methionine, glutamine and glutamate). However, neither de novo nor conversion pathways were predicted for the remaining 13 amino acids (Altermann et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 102:3906-3912). Therefore, amino acid requirements must be satisfied by the uptake of amino acids and oligopeptides. L. acidophilus encodes two putative oligopeptide transporting systems (Altermann et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 102:3906-3912), opp1 (ORFs LBA0197 to LBA0203) and opp2 (ORFs LBA1300 to LBA1306). In addition, six genes coding for periplasmic substrate-binding proteins (OppA) were identified (LBA1216, LBA1347, LBA1400, LBA1665, LBA1958, and LBA1961). One major function of oligopeptide transport (Opp) systems for bacterial cells is to internalize peptides to be used as carbon and nitrogen sources. They are also involved in the recycling of the cell wall peptides, which are likely one of the first targets of physiochemical stress. Opp systems are members of the ABC transporters family and usually consist of two ATP-binding proteins, two transmembrane proteins, and an extracellular ligand-specific binding protein. In gram-positive bacteria, the substrate-binding protein aligns with the external face of the cytoplasmic membrane (Sutcliffe and Russell (1995) J. Bacteriol. 177:1123-1128) and biochemical evidence suggests that they have a chaperone-like function in protein folding, protection against thermal denaturation, and interaction with unfolded proteins (Richarme and Caldas (1997) J. Biological Chem. 272:15607-15612).
Since several components of the proteolytic system were overexpressed, we expected that the HPK mutant would be able to grow better in milk than the control. On the contrary, the mutant was not able to acidify 10% skim milk (SM) below pH 5.0. However, when SM was supplemented with yeast extract, both the parent and the mutant were stimulated to the same degree. Yeast extract is the water-soluble portion of autolyzed yeast, containing vitamin B complex. It provides vitamins, nitrogen, amino acids, and carbon in growth media or supplemented milk. Furthermore, supplementation of SM with casamino acids essentially abolished differences in acidification rate between the wild type and the mutant strains. These observations provide evidence that the proteolytic system in the HPK mutant was debilitated. An alternative possibility is that inactivation of the 2CRS resulted in the reduced expression of a specific amino acid transporter. The decreased intracellular concentration of that amino acid might trigger the cell to overexpress other options to obtain that amino acid, i.e., through peptide transport and peptidases, or through other pathways such as enzymes involved in the biosynthesis of lysine (LBA0849 and LBA0850). Two genes encoding putative opp binding proteins (LBA1300 and LBA1665) were consistently underexpressed in the mutant, suggesting that these transport systems are important for the organism's ability to grow in milk. It is not clear, however, why other opp transporters present in the genome would not replace any loss of capacity from the limited expression of LBA1300 and LBA1665, especially when a number of these were overexpressed.
Opp systems are also related to mechanisms of signaling since they transport signal peptides that, once inside the cell, will interact with intracellular receptors to regulate cellular functions (Lazazzera (2001) Peptides 22:1519-1527). In gram-positive bacteria, cell-density response mechanisms are well studied. A peptide signal precursor locus is translated into a precursor protein that is cleaved to produce an autoinducer signal that is transported out of the cell. When the extracellular concentration of the peptide signal accumulates to the minimal stimulatory level, a HPK of a 2CRS detects it and the phosphorylated RR activates the transcription of target genes (Miller and Bassler (2001) Annu. Rev. Microbiol. 55:165-199).
Interestingly, the autoinducer-2 production gene, luxS, was significantly overexpressed in the HPK mutant. The gene luxS is responsible for the production of an autoinducer molecule AI-2 in Vibrio harveyi and other gram-positive and gram-negative bacteria (Schauder et al. (2001) Mol. Microbiol. 41:463-476). LuxS is the autoinducer synthase, responsible for catalysis of the final step in AI-2 biosynthesis. The disruption of luxS in S. pyogenes had several effects suggesting that it is an important component of the response machinery that allows this strain to adapt to changing conditions during an infection. These effects include regulation of the SpeB protease and stress response (Lyon et al. (2001) Mol. Microbiol. 42: 145-157). The gene located upstream of luxS (LBA1080) was also up-regulated in the mutant at both pH 5.5 and 4.5.
Intriguingly, the expression of the aspartate kinase (EC 2.7.2.4, LBA0850) and diaminopimelate epimerase (EC 5.1.1.7, LBA0849) was increased at pH 5.5 in the HPK mutant. These are key enzymes in the biosynthesis of lysine and are organized in an operon in L. acidophilus. However, the diaminopimelate decarboxylase (EC 4.1.1.20, LBA0851), the enzyme responsible for the last step in the synthesis of lysine, was not overexpressed in the HPK mutant under these conditions. We therefore suggest that D,L-diaminopimelate, instead of being converted to L-lysine, enters the peptidoglycan biosynthesis pathway. It is unclear if the HPK mutant produces more peptidoglycan. If so, that may contribute to the changes observed in cell morphology and chain length.
In summary, environmental conditions that included changes in acid concentration and fluctuations of pH were sensed by the HPK of the 2CRS, LBA1524HPK. It would be expected that this protein then initiates a phosphorylation cascade that regulates expression of a number of genes in the L. acidophilus genome. Most of the differentially expressed genes were up regulated in the HPK mutant, suggesting that LBA1525RR may act as a repressor. The inactivation of this 2CRS resulted in alterations in cell morphology, acid sensitivity, ethanol sensitivity, and poor acidification rates in skim milk indicating a loss of proteolytic activity. Microarray data showed that more than 50% of the genes differentially expressed in the HPK mutant encode putative membrane proteins. Additionally, expression of multiple components of the proteolytic enzyme system, i.e., opp transporters, permeases, and peptidases, was dramatically affected by the inactivation of the HPK, but no simple correlation of higher or lower gene expression to proteolytic activity, or the loss thereof, was apparent.
Bacteriocins are a diverse group of antimicrobial peptides produced by microorganisms. Their range of inhibition is narrow, typically limited to species that inhabit the same environmental niches such as the gastrointestinal tract. Many bacteriocins are able to elicit their lethal effects by creating pores in the cellular membrane of target organisms. This results in a dissipation of the proton motive force and leakage of ATP and other essential cellular ions, leading to cell death. Currently, bacteriocins produced by lactic acid bacteria (LAB), in particular, are widely used within the food industry due to their efficacy against foodborne pathogens such as Listeria monocytogenes and Clostridium botulinum. Lactacin B is a chromosomally encoded bacteriocin produced by Lactobacillus acidophilus NCFM. Recent sequencing of the NCFM genome revealed a primary region of interest possibly responsible for lactacin B production, processing, and export. The overall objective of our study was to investigate the role of this region in lactacin B production and processing.
The activity of lactacin B, a bacteriocin produced by L. acidophilus NCFM, was assayed using the direct method for bacteriocin detection (Barefoot et al. (1983) Appl. Environ. Microbiol. 45:1808-1815). Zones of inhibition indicate death of indicator strain. Bacteriocin production by L. acidophilus NCFM and its derivatives was carried out under both aerobic and anaerobic conditions.
Stationary phase cultures of NCFM were assayed as follows. A 5 μl aliquot of culture was spotted onto an MRS agar plate (1.5% w/v). MRS soft agar (0.75% w/v) containing the indicator strain was poured onto the surface of the plate. After incubation for 19-24 hours, zones of inhibition were analyzed.
The consensus genetic elements necessary for production of many LAB bacteriocins have been elucidated. These elements can exist on a plasmid and/or chromosomally and include genes encoding a two-component regulatory system, one or more structural genes encoding the pre-bacteriocin peptide, a gene encoding an immunity protein and finally one or more genes encoding a dedicated export system responsible for export of the bacteriocin molecule from the cell. These coordinated processes yield a mature biologically active antimicrobial peptide as illustrated by Ennahar et al. (2000) FEMS Microbiol. Lett. 24: 85-106.
Previous analysis revealed that lactacin B is a 6.5 kDa bacteriocin with antagonistic activity against closely-related species; the genetic determinants were unknown (Barefoot et al. (1983) Appl. Environ. Microbiol. 45:1808-1815.). Recent mining of the NCFM genome revealed a region possibly responsible for lactacin B production (Altermann et al. (2004) Proc. Natl. Acad. Sci. USA 102: 3906-12). This region is flanked by two strong terminators and includes 11 putative open reading frames (ORFs) with similarities to conventional bacteriocin machinery including a regulation system, an immunity protein, and a dedicated ABC transporter protein involved in bacteriocin export. Seven additional putative open reading frames with unknown functions were also identified in the putative operon (data not shown). Table 4 provides a summary of the various open reading frames and their function.
In order to examine the role that this region plays in lactacin B production, the gene encoding the putative ABC transporter protein (LBA1796) was functionally disrupted by homologous recombination using the targeted integration vector pORI28 as described by Russell et al. (2001) Appl. Environ. Microbiol. 67: 4361-4364. An 800 bp internal fragment of LBA1796 was PCR amplified and cloned into pORI28 using XbaI/BglII sites. Subsequent transformation into NCFM containing a temperature sensitive helper plasmid (pTRK669) selects for chromosomal integrants following a temperature increase.
Inactivation of the putative ABC transporter protein was confirmed via Southern hybridization analysis (data not shown) and by PCR to confirm junction fragments using chromosomal DNA as a template (data not shown). The integrant was assayed for lactacin B activity (
The ABC transporter protein (LabT) appears crucial for lactacin B export and activity. It is likely that this region also encodes the genetic determinants for lactacin B regulation, production, and immunity.
The effectiveness of any bacterium used as a probiotic or biotherapeutic vector intended to act in the intestinal tract depends on its ability to survive in this region, where it must be able to withstand stresses imposed by the body's physicochemical defense system. These stresses include low pH, high osmolality, and the presence of bile (Chowdhury et al. (1996) Stress Response in Pathogenic Bacteria. Indian Journal of Biosciences 21:149-160). Bile's amphipathic nature allows it to act as a detergent, dissolving the phospholipid membranes that surround bacteria, leading to a loss of membrane integrity and cell death. In addition to its action as a detergent, bile has been shown to cause DNA damage and induce genes involved with DNA repair (Dashkevicz et al. (1989) Appl Environ Microbiol 55:11-6, McAuliffe et al. In Press. Appl Environ Microbiol.). Bacteria employ a plethora of mechanisms to respond to and defend against bile in their environment, including mechanisms that remove bile from the cell, modify it, and repair damage through general stress responses. The pathway to the induction of genes that mediate these responses is largely unknown, but may be mediated by a histidine protein kinase-response regulator phosphorelay pathway (Begley et al. In Press. The interaction between bacteria and bile. FEMS Microbiol Rev.). A whole-genome microarray study has shown that several genes in L. acidophilus NCFM are upregulated upon exposure to 5% Oxgall (see Example 2). Included among these was a group of six tandem genes containing both histidine kinase and response regulator genes. This study examined this putative operon and the influence of its histidine kinase on cell growth in the presence of bile.
A microarray study of the expression of genes in L. acidophilus NCFM cells exposed to 5% Oxgall was performed that indicated the upregulation of six tandem genes (LBA1432-LBA1427) with this treatment (see Example 2). The sequenced L. acidophilus NCFM genome (Accession NC_006814) indicates that LBA1430 encodes a histidine protein kinase and LBA1431 encodes a response regulator. The function of the other genes in this group remains largely unknown, although LBA1429 consists of 12 transmembrane domains and is believed to encode a transporter (Altermann et al. (2005) Proc Natl Acad Sci USA 102:3906-12). Clone Manager software indicated the presence of dyad symmetry that could lead to a stem loop structure, typical of a transcriptional terminator, upstream of the putative operon. Reverse transcriptase PCR using primers designed to amplify the intergenic regions in the proposed operon was performed on RNA extracted from cells grown in MRS with no Oxgall or MRS+0.3% Oxgall for 1.5 hours. PCR amplification of cDNA from the intergenic regions indicated that these genes are cotranscribed into RNA.
The putative histidine kinase gene in the six-gene operon was selected for inactivation in order to investigate its role in bile tolerance in L. acidophilus NCFM. This inactivation was carried out by insertion of an erythromycin cassette by the method of Russell and Klaenhammer, utilizing the Ori+ and RepA− integration plasmid, pORI28 (Flahaut et al. (1996) Appl. Environ. Microbiol. 62:2416-2420). The integration vector was created through ligation of a BglII and XbaI digested pORI28 with a BglII and XbaI digested PCR fragment of LBA1430. The resulting plasmid, pTRK843, was transformed into L. acidophilus NCFM containing pTRK669, a plasmid containing a functional repA gene and a chloramphenicol resistance cassette. A temperature shift from 37° C. to 42° C. resulted in the loss of pTRK669 and selection for clones where pTRK843 had integrated into the genome. Integration of pTRK843 was confirmed by Southern blot (data not shown).
Cells were grown anaerobically in MRS with 0.0%, 0.3%, or 0.5% Oxgall for 15 hours with OD600 measurements taken every 15 minutes. The resulting growth curves showed that the ability of the histidine kinase mutant to grow, as compared to the wild type strain, decreased as the concentration of Oxgall in the medium increased. An ABC transporter (LBA1796) knockout mutant (Bac) was used in this study as a control so that erythromycin pressure could be maintained, indicating the continued presence of the insertional knockout (A. Dobson, personal communication). The maximum specific growth rate (μmax, h⁻¹) for the HPK mutant was significantly different from that of the controls when grown in MRS with 0.3% and 0.5% Oxgall. See
Growth curve experiments were also performed using 0.3% of individual bile salts: taurocholic acid, taurodeoxycholic acid, taurochenodeoxycholic acid, glycocholic acid, and glycodeoxycholic acid. Sodium taurodeoxycholate was the only salt that affected the growth of the HPK mutant as compared to wild type; however, the maximum specific growth rates of the strains were not significantly different with any of the salts. See
It is known that some lactobacilli and bifidobacteria strains, including L. acidophilus NCFM, possess the ability to deconjugate bile salts, or separate their amino acid moiety from the cholesterol backbone (Gilliland et al. (1977) Appl Environ Microbiol 33:15-8). Although the role of this process in bacteria is not clear, it is believed to confer some positive effect on the cell including protection against the toxic effects of bile (Flahaut et al. (1996) Appl. Environ. Microbiol. 62:2416-2420). Functional analysis of the bile salt hydrolase genes has shown that the bshB gene (LBA1078) deconjugates sodium taurodeoxycholate (McAuliffe et al. In Press. Appl Environ Microbiol.). Since the growth of the HPK mutant was decreased in this particular salt, it is possible that genes controlled by this particular HPK include the bshB gene. In order to determine if the HPK mutant retained the ability to deconjugate this bile salt, cells were plated onto MRS agar with 0.3% of each of the salts used in the growth experiments. Zones of clearing surrounding the colonies indicated the activity of the bile salt hydrolases (Dashkevicz et al. (1989) Appl Environ Microbiol 55:11-6). No difference in deconjugation was seen between the wild type L. acidophilus NCFM and the HPK mutant strain.
It has been proposed that, because sodium taurodeoxycholate is more hydrophobic than other bile salts, it has a more disruptive effect on bacterial cell membranes (Sung et al. (1993) Dig Dis Sci 38:2104-12). Since this particular salt lowers growth of the HPK mutant as compared to the wild type and control strains, it is possible that genes regulated by this particular histidine kinase encode proteins that may counteract this disruptive effect.
LBA1427-1432 in L. acidophilus NCFM constitute an operon involved in bile tolerance. LBA1430 in L. acidophilus NCFM encodes a histidine kinase involved in bile tolerance. Loss of histidine kinase activity from LBA1430 leads to a decreased ability of cells to grow in the presence of bile. Sodium taurodeoxycholate has a more inhibitory effect on the growth of the HPK mutant than other salts tested.
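The maximum specific growth rates compared in the growth curve experiments above can be estimated from periodic OD600 readings. The following is a minimal sketch, not the procedure used in this study: it assumes that the rate is taken as the steepest least-squares slope of ln(OD600) against time over a sliding window of consecutive readings. The function name, the window size, and the input format are hypothetical.

```python
import math


def max_specific_growth_rate(times_h, od600, window=5):
    """Estimate the maximum specific growth rate (per hour).

    times_h: reading times in hours, in increasing order.
    od600:   positive OD600 readings, one per time point.
    window:  number of consecutive readings fitted at once (an assumption).
    """
    if len(times_h) != len(od600):
        raise ValueError("times_h and od600 must have the same length")
    if window < 2 or window > len(times_h):
        raise ValueError("window must be between 2 and the number of readings")

    log_od = [math.log(value) for value in od600]  # fails on non-positive OD
    best = float("-inf")
    for start in range(len(times_h) - window + 1):
        t = times_h[start:start + window]
        y = log_od[start:start + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        # Ordinary least-squares slope of ln(OD600) versus time in this window.
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)
    return best
```

With readings taken every 15 minutes, the time values would be supplied in steps of 0.25 hours. Rates obtained this way for replicate cultures of each strain could then be compared with a suitable statistical test.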
All publications, patents and patent applications mentioned in the specification are indicative of the level of those skilled in the art to which this invention pertains. All publications, patents and patent applications are herein incorporated by reference in their entireties to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference for the teachings disclosed in the sentence and/or paragraph in which the publication, patent or patent application is cited.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.
This application is a divisional of U.S. patent application Ser. No. 12/046,080, filed Mar. 11, 2008, which is a divisional of U.S. patent application Ser. No. 11/199,489, filed Aug. 8, 2005 and claims the benefit of U.S. Provisional Application Ser. No. 60/599,972, filed Aug. 9, 2004, the contents of which are herein incorporated by reference in their entirety.
Number | Date | Country
---|---|---
60599972 | Aug 2004 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 12046080 | Mar 2008 | US
Child | 12771489 | | US
Parent | 11199489 | Aug 2005 | US
Child | 12046080 | | US