Methods and Compositions Using Highly Conserved Pneumococcal Surface Proteins

FIELD OF THE INVENTION

This invention is generally in the field of infections by the bacterium Streptococcus pneumoniae. In particular, the invention provides methods and compositions that use one or more highly conserved genes encoding a surface protein of Streptococcus pneumoniae for diagnosing, treating, or preventing infection by Streptococcus pneumoniae. In particular embodiments, immunogenic compositions according to the invention include at least one in-common, highly conserved Streptococcus pneumoniae surface antigen or immunogenic fragment thereof, which may be used to elicit an immune response against a majority (i.e., >50%), advantageously a supermajority (i.e., at least 80%), and ideally all or nearly all (e.g., 90 or more) of the known capsular serotypes of Streptococcus pneumoniae, thus advancing modes of addressing Streptococcus pneumoniae infection toward a universal vaccine.

BACKGROUND OF THE INVENTION

Pneumonia is an ailment that can be lethal, particularly to the elderly and young. Every year, approximately 1.3 million children under the age of five die from pneumonia, mostly in the developing world. Infection by the bacterium Streptococcus pneumoniae (also referred to herein as “pneumococcus”), the most common cause of severe pneumonia, kills more than half a million children annually. While even healthy adults may be infected by S. pneumoniae, the elderly, immunocompromised individuals, and children are at greatest risk from the disease. Pneumococcus can also cause bacterial sepsis, bacterial sinusitis, and bacterial meningitis, and is the leading cause of middle ear infection (otitis media).

S. pneumoniae strains vary by region, and there are more than 94 capsular serotypes. While there has been some success in making pneumococcal vaccines, current pneumococcal vaccines do not protect against most pneumococcal capsular serotypes. For example, one currently marketed vaccine, PREVNAR-13® (Wyeth LLC, marketed by Pfizer, New York, N.Y.), is only partially efficacious, protecting against only 13 of the 94 or so pneumococcal capsular serotypes. Another marketed vaccine, PNEUMOVAX®-23 (Merck, Kenilworth, N.J.), offers no protection for children less than two years old, and no protection for older children and adults against 71 of the 94 or so pneumococcal capsular serotypes.

Accordingly, there has been an acute desire to find a “common protein” or “universal” vaccine that would be useful for immunizing against all or nearly all pneumococcal strains (currently ˜94 known), a supermajority, e.g., at least 80%, or at least a majority, e.g., greater than 50%, of the known pneumococcal strains. A critical barrier to the development of such a “common protein” vaccine has been pneumococcal antigenic diversity driven by host immunological selective pressure. The paradigm for development of a common protein vaccine has primarily been empirical, top-down, trial and error testing of known antigenic outer surface protein candidates. Such an approach has yielded poor results primarily because promiscuously recombinant bacteria like S. pneumoniae are adept at “immune escape”, based on the fact that the immunologically active regions of nearly all major surface proteins that have been examined mutate constantly. Copies of a mutated gene that provides an S. pneumoniae cell with a selective advantage for evading attack by an immune response will be rapidly transferred from the mutant cell to recipient cells owing to a promiscuous horizontal (lateral) gene transfer system with subsequent recombination of the mutated gene (recombinational mutation) into the genomes of the recipient cells.

Clearly, needs remain for compositions and methods for identifying, treating, or preventing infections by S. pneumoniae that are more resistant to the adaptive ability of S. pneumoniae cells to evade an immune response.

SUMMARY OF THE INVENTION

The present invention provides a particularized family of streptococcal surface antigens that are useful as immunogens to elicit an immune response recognizing all or nearly all S. pneumoniae strains. Polypeptide immunogens of the present invention therefore represent a very important advance in the search for a universal pneumonia vaccine. The immunogenic polypeptides of the present invention are serotype-independent and are characterized by their presence across virtually all genomic variants of S. pneumoniae and by their very low incidence of sequence variability from one allele to another across S. pneumoniae populations.

The candidate immunogens of the invention are referred to herein as cold spot polypeptides, which makes reference to the fact that the encoding genes for the polypeptides are located in recombinationally quiescent regions of the S. pneumoniae genome flanking the rRNA operons and show very low rates of mutation. Genes from such recombinationally quiescent loci are termed “cold spot genes”, and their expression products are referred to herein as “cold spot polypeptides”.

In a particular embodiment, the present invention provides an immunogenic composition comprising:

- (a) at least one cold spot polypeptide or an immunogenic fragment thereof, which polypeptide comprises at least a portion of an extracellular domain of a S. pneumoniae surface protein, wherein said protein:
  - (i) is in-common to all or nearly all known strains of S. pneumoniae, and
  - (ii) has at least a 98% average amino acid sequence pairwise homology among such known strains of S. pneumoniae;
- (b) a pharmaceutically acceptable vehicle or excipient; and
- (c) optionally, an adjuvant.
  
Preferred polypeptides disclosed herein have an average amino acid sequence pairwise homology of at least 99%. Most preferred polypeptides of the present invention have an average amino acid sequence pairwise homology of at least 99.5%.

In particular embodiments, immunogenic cold spot polypeptides may be selected from the surface antigens set forth in Tables 1A and 1B, infra, having SEQ ID NOs:1-273. Of particular interest are extracellular regions of such cold spot polypeptides, as those regions correspond to naturally occurring surface targets on pneumococcal cells. Particular extracellular domains derived from the top six of 21 selected cold spot polypeptides from Table 1A are set forth in Table 3, infra, having SEQ ID NOs:336-341. Natural allelic variants of such polypeptides and extracellular portions thereof are equally useful immunogens for the compositions and uses disclosed herein.

The invention also provides methods of making immunogenic compositions comprising one or more cold spot polypeptides described herein for use in raising an immune response against a majority (i.e., >50%), more preferably a supermajority (i.e., at least 80%), and even more preferably all or nearly all (e.g., 90 or more) of the known serotypes of Streptococcus pneumoniae.

In a particular embodiment, the invention provides a method of making an immunogenic composition for raising an immune response producing antibodies reactive against at least 80 different serotypes of S. pneumoniae, said method comprising:

(1) selecting one or more cold spot surface antigens of S. pneumoniae,

(2) isolating one or more polypeptide segments from an extracellular domain of the one or more cold spot surface antigens selected in (1), and,

(3) formulating said one or more isolated polypeptide segments obtained in (2) by admixing said isolated polypeptide segments with a pharmaceutically acceptable carrier, to produce an immunogenic composition for inoculating human subjects against S. pneumoniae infection.

In the above method of making an immunogenic composition according to the invention, the immunogenic composition may also include one or more adjuvants.

In another embodiment, in the method of making an immunogenic composition as described above, the immunogenic composition is effective for raising an immune response producing antibodies reactive with at least 90 S. pneumoniae serotypes. More preferably, the immunogenic composition in the above method of making an immunogenic composition of the invention is effective for raising an immune response producing antibodies reactive with at least 91 S. pneumoniae serotypes. Even more preferably, the immunogenic composition in the above method of making an immunogenic composition of the invention is effective for raising an immune response producing antibodies reactive with at least 93 S. pneumoniae serotypes.

In another embodiment, step (1) of the above method of making an immunogenic composition according to the invention selects 1, 2, 3, 4, or 5 cold spot surface antigens. In another embodiment, step (1) of the above method of making an immunogenic composition according to the invention selects 1 or 2 cold spot surface antigens. In a further embodiment, step (1) of the above method of making an immunogenic composition according to the invention selects 1 cold spot surface antigen from a strain of S. pneumoniae.

The present invention also relates to a method of eliciting an immune response in a mammal, such as a human, said method comprising administering to said mammal an immunogenic composition comprising:

- (a) at least one cold spot polypeptide or an immunogenic fragment thereof, which polypeptide comprises at least a portion of an extracellular domain of a S. pneumoniae surface protein, wherein said protein:
  - (i) is in-common to all or nearly all known strains of S. pneumoniae, and
  - (ii) has at least a 98% average amino acid sequence pairwise homology among such known strains of S. pneumoniae;
- (b) a pharmaceutically acceptable vehicle or excipient; and
- (c) optionally, an adjuvant.

In other embodiments, such a method of eliciting an immune response will utilize two or more such immunogenic compositions.

Also contemplated herein is a method of vaccinating a subject against S. pneumoniae infection, said method comprising administering at least one immunogenic composition according to the invention to a subject in an amount or for a number of administrations sufficient to elicit an immune response characterized by the presence of antibodies recognizing a majority of S. pneumoniae serotypes. Currently, there are 94 known serotypes of S. pneumoniae, with additional serotypes expected to evolve or to become recognized in the future. Accordingly, for the purposes of the present invention, “a majority of S. pneumoniae serotypes” will signify 50 or more serotypes. Recognizing that commercial pneumonia vaccines (e.g., PREVNAR®, PNEUMOVAX®) available today address only up to 23 of the known serotypes, a vaccine composition capable of addressing a majority of S. pneumoniae serotypes represents a significant advance in the field of vaccine development.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the estimation of the number of S. pneumoniae isolates necessary to analyze in order to be representative of species diversity. Two hundred thirty-one (231) fully sequenced S. pneumoniae genomes were analyzed and allelic variants compared for four cold spot proteins, MreD (SP_2217), MreC (SP_2218), HtrA (SP_2239) and an Fe ABC transporter (SP_1872), and for a more variable surface antigen, PspA (SP_0117). For each isolate (x axis), the y value increases by 1 if the isolate represents an allelic variant not previously identified in the genome set. The resulting plots are depicted in dotted lines, with fitted non-linear regression curves. The corresponding goodness of fit is shown in parentheses. The model used for the non-linear regression is the one-phase exponential association function y = y_max(1 − e^(−Kx)). As the data set increases and becomes more representative of species diversity, fewer new alleles appear, and the tangent to the regression curve tends toward a slope of 0, at which point one can estimate both the number of allelic variants likely to appear and the approximate number of isolates required in order to achieve this maximal representation of the diversity of the species. It is seen that for proteins as allelically invariant as cold spot proteins, the number of isolates necessary to analyze to be representative of S. pneumoniae diversity is between 100 and 200 isolates.
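The accumulation curve and one-phase exponential association fit described for FIG. 1 can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the allele labels are placeholders, and the grid search over K is a simple stand-in for whatever non-linear regression software was actually used to produce the figure.

```python
import math

def accumulation_curve(alleles):
    """Cumulative count of distinct alleles as isolates are added in order."""
    seen = set()
    curve = []
    for allele in alleles:
        seen.add(allele)          # y rises by 1 only for a new allelic variant
        curve.append(len(seen))
    return curve

def fit_one_phase(ys, k_grid=None):
    """Least-squares fit of y = y_max * (1 - exp(-K * x)) for x = 1..n.

    For a fixed K the best y_max has a closed form, so only K is searched
    over a grid (an illustrative assumption, not the source's solver).
    Returns (y_max, K, sum_of_squared_errors).
    """
    xs = range(1, len(ys) + 1)
    if k_grid is None:
        k_grid = [i / 1000 for i in range(1, 2001)]  # K from 0.001 to 2.0
    best = None
    for k in k_grid:
        f = [1 - math.exp(-k * x) for x in xs]
        y_max = sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f)
        sse = sum((y - y_max * fi) ** 2 for y, fi in zip(ys, f))
        if best is None or sse < best[2]:
            best = (y_max, k, sse)
    return best

def isolates_for_fraction(k, fraction=0.95):
    """Number of isolates at which the fitted curve reaches `fraction` of y_max."""
    return math.log(1 / (1 - fraction)) / k
```

Under this model, the plateau y_max estimates the total number of allelic variants likely to appear, and `isolates_for_fraction` gives one way to express how many isolates are needed before the curve is close to that plateau.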

DETAILED DESCRIPTION OF THE INVENTION

This invention is based on the discovery of a family of S. pneumoniae genes that are highly conserved in identical or highly homologous forms throughout all or nearly all known S. pneumoniae strains. The genes of this family are examples of recombinational “cold spots” in the genome where very low levels of sequence variability are observed across the known S. pneumoniae strains. Owing to an exceptionally promiscuous system of horizontal DNA exchange across S. pneumoniae cells, recombinational mutation frequently alters or deletes genes along approximately 74 percent of the S. pneumoniae (pneumococcal) genome. See, Croucher et al., Science, 331: 430-434 (2011). The approach taken for this invention was to identify pneumococcal genes that are specifically localized in recombinational “cold spot” regions of the pneumococcal genome, where mutation frequency is very low. Genes isolated from within such recombinationally quiescent regions are referred to herein as “cold spot” genes, and the polypeptides encoded by such “cold spot” genes are referred to as “cold spot polypeptides”. A “cold spot” gene is characterized by a lack of high sequence variability among its known alleles across all or nearly all strains of S. pneumoniae. Such a lack of variability in nucleotide sequence of a given cold spot gene from strain to strain is in turn reflected in a low variability in the amino acid sequence of the encoded cold spot polypeptide as expressed in all or nearly all strains of S. pneumoniae. Typically, and preferably, a cold spot polypeptide described herein has a greater than 98% average amino acid pairwise homology from strain to strain of S. pneumoniae (i.e., there is greater than 98% average amino acid pairwise homology between allelic forms of the cold spot polypeptide from strain to strain); in most cases, and more preferably, a cold spot polypeptide described herein has greater than 99% average amino acid pairwise homology from strain to strain of S. pneumoniae, and, in many cases, and even more preferably, a cold spot polypeptide described herein has a greater than 99.5% average amino acid pairwise homology from strain to strain of S. pneumoniae. In the most preferred embodiments, sixteen cold spot polypeptides are disclosed having a percent average amino acid pairwise homology from strain to strain of S. pneumoniae of 99.50% or higher. Such significantly low levels of variability in cold spot gene (nucleotide) sequences and in cold spot polypeptide (amino acid) sequences from strain to strain reflect the presence of an innate and extremely high selective pressure to maintain the existing “cold spot” gene sequences and their encoded polypeptide sequences across all or nearly all known pneumococcal strains and serotypes.

While not intending to be bound by any particular theory or mechanism, it is likely that the resistance to sequence variability in the cold spot genes arises because most recombinational events leading to mutations have lethal consequences, either directly or by impacting adjacent genetic loci (e.g., rrn operons) that are essential for growth.

All of the cold spot genes disclosed herein encode polypeptides that are expressed on the surface of a majority or, indeed, of all or nearly all, e.g., >90%, of S. pneumoniae capsular serotypes. Cold spot polypeptides of the present invention will be expressed on the surface of at least 50, preferably at least 80, more preferably at least 90, and most preferably all of the 94 capsular serotypes currently known. Reflecting the high nucleotide sequence conservation of the cold spot genes, the corresponding amino acid sequences of the encoded S. pneumoniae cold spot polypeptides are also highly conserved across the known S. pneumoniae strains. Accordingly, cold spot polypeptides, extracellular domains of cold spot polypeptides, or immunogenic fragments of the cold spot extracellular domain, are particularly useful in immunogenic and vaccine compositions to elicit production of antibodies that will bind the polypeptides or their antigenic fragments that are expressed on the surface of all or nearly all S. pneumoniae strains. The isolated extracellular domain of a cold spot polypeptide is also referred to as the “cold spot extracellular domain”. Accordingly, a cold spot polypeptide, an isolated cold spot extracellular domain, or an immunogenic fragment comprising all or a portion of the extracellular domain thereof, may be used as the basis of a “universal” or “capsular serotype-independent” vaccine that can elicit antibody that binds all or nearly all S. pneumoniae cells, independent of any S. pneumoniae cell's known or unknown capsular serotype classification. For the purposes of the present invention, a S. pneumoniae surface antigen polypeptide will be considered “capsular serotype-independent” if it is a portion of a surface protein common to all or nearly all serotypes of S. pneumoniae.
Preferred such polypeptides will have an amino acid sequence variability from strain to strain that is so low that it exhibits an average amino acid pairwise homology among known serotypes of S. pneumoniae of greater than 98%, preferably greater than 99%, and most preferably greater than 99.5%. Immunization with such an in-common, highly conserved cold spot polypeptide according to the invention elicits production of antibodies capable of recognizing a supermajority, e.g., greater than 90% of known S. pneumoniae strains.

In order that the invention may be more fully understood, the following terms are defined.

Unless indicated otherwise, when the terms “about” and “approximately” are used in combination with an amount, number, or value, then that combination describes the recited amount, number, or value alone as well as the amount, number, or value plus or minus 10% of that amount, number, or value. By way of non-limiting example, the phrases “about 40%” and “approximately 40%” disclose both “40%” and “from 36% to 44%, inclusive”.
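The plus-or-minus 10% convention for “about” and “approximately” can be expressed as a short calculation. The sketch below is illustrative only; the function name `about_range` is hypothetical and is not part of the disclosure.

```python
def about_range(value, tolerance=0.10):
    """Return the inclusive (low, high) range disclosed by "about <value>".

    The default tolerance of 0.10 reflects the plus-or-minus 10% convention
    stated in the definition; the recited value itself is always included.
    """
    delta = abs(value) * tolerance
    return (value - delta, value + delta)

low, high = about_range(40)  # "about 40%" spans roughly 36% to 44%
```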

The terms “protein” and “polypeptide” are used interchangeably to refer to a polymer of amino acid residues connected by peptide bonding as used and understood in the fields of protein biochemistry and molecular biology. Typically, but not exclusively, the terms “protein” and “polypeptide” are used to refer to polymers of 20 or more amino acid residues linked by peptide bonds. The term “peptide”, like “protein” and “polypeptide”, also refers to a polymer of two or more amino acid residues linked by peptide bonding, but usually, although not exclusively, is used to describe a polymer of less than 20 amino acid residues.

The terms “antigenic determinant” and “epitope” are used synonymously and refer to the specific site on an antigen at which an antibody molecule binds. The antigenic determinant or epitope of an antigen is complementary to the antigen binding site of an antibody. An antigen may have only one or, as is usually the case, several or even many epitopes. Epitopes of a given antigen may be present as multiple copies of structurally identical moieties, as in the case of repetitive amino acid sequences in some proteins, or distinctly different, in which case each epitope could be bound by a different antibody. An epitope may be composed of a sequence of contiguous amino acid residues (linear epitope) along a polypeptide chain or non-sequential (nonlinear) amino acid residues from segments of a polypeptide chain (or even more than one polypeptide chain) that are brought together in a folded, three-dimensional conformation. A minimal sequential (linear) epitope of a polypeptide is typically five to eight amino acid residues in length (see, e.g., Kuby, Immunology, 2nd ed. (W.H. Freeman and Company, New York, 1994), sentence bridging pp. 94-95; Watson et al., In Molecular Biology of the Gene, Fourth Edition (The Benjamin/Cummings Publishing Co., Inc., 1987), p. 836; Davis et al., In Microbiology, Second Edition (Harper & Row, Hagerstown, 1973)). Epitopes of a polypeptide may be at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 contiguous amino acid residues. Epitopes of particular interest for the purposes of this invention are those located in the extracellular domain of a cold spot polypeptide, as such epitopes are expected to be exposed on the surface of S. pneumoniae cells and thus available for binding by an antibody.
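As a purely illustrative aid, the contiguous stretches of a given length within an amino acid sequence can be enumerated with a sliding window. The function name and the example sequence below are hypothetical, and listing a window does not imply that it is in fact an epitope; it only shows which linear segments of a chosen length exist in a sequence.

```python
def linear_windows(sequence, length):
    """List every contiguous stretch of `length` residues in `sequence`."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # A sequence shorter than `length` yields an empty list.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]

# Hypothetical 8-residue sequence; windows of the minimal length of 5 residues.
windows = linear_windows("MKTAYIAK", 5)
```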

The term “antigen”, as used herein, refers to any compound that is bound by an antibody, or antigen-binding fragment of the antibody, wherein the antibody was produced by an immune response elicited by an immunogenic composition comprising an epitope (antigenic determinant), which is contacted by the antigen binding site of the elicited antibody and which is also present on the antigen. The degree of specificity and affinity of an antibody to bind a cognate epitope of an intended (target) antigen is determined by the structural configuration and complementarity determining regions (CDRs) of the antigen binding site of the elicited antibody, which provide both the spatial and the non-covalent bonding features that are necessary to bind and retain the cognate epitope in the antigen-binding site, and thereby form an antibody-target antigen complex to the exclusion of other molecules that lack the structural features of the cognate epitope under the same environmental conditions. It is understood that some molecules may contain cross-reacting epitopes that approximate the features of the cognate epitope of the target antigen so that such molecules may also be bound by the antibody that was raised against the cognate epitope of the target antigen. Typically, however, antibody binding to a cross-reacting epitope is with a lower affinity as compared to the target antigen (or fragment thereof) containing the cognate epitope. Typically such cross-reactivity will be seen between only highly homologous antigens; of significance here is that BLAST comparison of the cold spot gene products described herein against all protein sequences in the NCBI GenBank databases found no homology other than to conserved proteins of other streptococcal strains.

The binding affinity of an antibody to a target antigen, or antigenic fragment thereof, comprising the cognate epitope can be readily determined using any of a number of methods available in the art including, but not limited to, enzyme linked immunosorbent assay (ELISA), surface plasmon resonance-based measurements (for example, using a Biacore® surface plasmon resonance instrument, Biacore® AB, Uppsala, Sweden), immuno-dot blot assay, Western (immuno) blotting, immuno-affinity chromatography, immunoprecipitation, flow cytometry, and fluorescence-activated cell sorting (FACS). More preferably, the affinity of an antibody for a target antigen, or antigenic fragment thereof, is determined using enzyme linked immunosorbent assay (ELISA), surface plasmon resonance-based measurements, flow cytometry, or FACS.

Preferred antigens of the invention include cold spot polypeptides encoded by cold spot genes, isolated cold spot extracellular domains of a cold spot polypeptide (cold spot extracellular domains), and immunogenic fragments of a cold spot extracellular domain. Preferred antibodies, or antigen-binding fragments thereof, bind to an epitope of the cold spot extracellular domain that is expressed on the surface of S. pneumoniae cells.

Unless indicated otherwise, the term “immunogen”, as used herein, refers to any compound or composition that elicits an immune response in a human or non-human subject that has been inoculated with the compound or composition, wherein the immune response includes production of antibody that binds the compound or a component of the composition. Such compounds and compositions are also described as “immunogenic”.

The term “immunogenic fragment” as used herein, with respect to a cold spot polypeptide, refers to a fragment of a cold spot polypeptide that is capable of eliciting an immune response when a mammalian subject is inoculated with the polypeptide, wherein the elicited immune response includes production of antibodies recognizing the cold spot polypeptide or an antigenic fragment of the cold spot polypeptide.

As used herein, the terms “recombinational cold spot” or simply “cold spot” refer to a gene in the genome of S. pneumoniae that is characterized by the absence of alleles or by a low incidence of sequence variability among known alleles of the gene across the 94 known serotypes of S. pneumoniae, or the terms refer to the encoded polypeptide of such a gene. The low incidence of sequence variability means that the known alleles of a cold spot gene encode cold spot polypeptides that exhibit greater than 98% average pairwise amino acid sequence homology. In most cases, the allelic sequences of a cold spot gene useful in the invention encode polypeptides that show greater than 99% average pairwise amino acid sequence homology, and in many cases more than 99.5% average pairwise amino acid sequence homology across any particular population of pneumococcal isolate strains considered. Thus, the amino acid sequence of a polypeptide (or protein or peptide) encoded by a cold spot gene may be referred to as a cold spot amino acid sequence, and the encoded polypeptide may be referred to as a “cold spot polypeptide”. A preferred “cold spot polypeptide” useful in the invention (1) is encoded by a cold spot gene of S. pneumoniae, (2) is expressed on the surface of S. pneumoniae cells, (3) is in-common across all or nearly all strains of S. pneumoniae, meaning that the surface polypeptide or an allelic variant thereof is encoded in all or nearly all of the known S. pneumoniae genomes, (4) has an amino acid sequence showing greater than 98% average amino acid sequence pairwise homology across such known S. pneumoniae strains, and (5) is immunogenic, that is, capable of eliciting an immune response when administered to a mammal such as a human. Preferably, cold spot polypeptides according to the invention are suitably immunogenic so as to raise an antibody response that is reactive with whole S. pneumoniae cells (i.e., the cold spot protein is an immunogenic S. pneumoniae cell surface antigen). It will be understood from the context of a description herein whether the term “cold spot polypeptide” refers generally to the full-length surface antigen polypeptide encoded by a cold spot gene of S. pneumoniae or to a portion thereof comprising the cold spot extracellular domain that is expressed on the surface of S. pneumoniae cells. The isolated extracellular domain(s) of a cold spot polypeptide may also be referred to as the “cold spot extracellular domain(s)”. Smaller portions of the extracellular domain are also contemplated, such as portions corresponding to the fragment encoded on a convenient restriction fragment found in the coding sequence, so long as the portions are still immunogenic.
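The average pairwise homology thresholds used in this definition (greater than 98%, 99%, and 99.5%) can be illustrated with a short calculation. The sketch below makes several assumptions that are not taken from the disclosure: it treats homology as simple percent identity, it requires the allelic sequences to be pre-aligned to equal length, and all function names are hypothetical. It is not presented as the alignment or scoring method actually used to compile the tables.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent of aligned positions at which two sequences are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def average_pairwise_identity(alleles):
    """Mean percent identity over every unordered pair of allelic sequences."""
    pairs = list(combinations(alleles, 2))
    if not pairs:
        return 100.0  # a single allele is trivially invariant
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)

def conservation_tier(avg):
    """Map an average pairwise value onto the tiers recited in the definition."""
    if avg > 99.5:
        return "greater than 99.5%"
    if avg > 99.0:
        return "greater than 99%"
    if avg > 98.0:
        return "greater than 98%"
    return "98% or less"
```

For example, three 100-residue alleles of which one differs from the other two at a single position give pairwise values of 99%, 100%, and 99%, for an average of about 99.33%, which falls in the “greater than 99%” tier.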

As used herein, the term “all or nearly all S. pneumoniae strains” or “all or nearly all strains” means, with respect to a cold spot polypeptide as described herein, that the gene encoding the polypeptide is present in at least 95% of genomes in any collection of at least 123 Streptococcus pneumoniae strains. The cold spot gene products described in Table 1A (or alleles) were found to be in-common or universally present in all of the 231 Streptococcus pneumoniae full genomes available for analysis at the time the present discoveries were made. Most of the additional genes located in the recombinationally quiescent flanking regions adjacent to the rrn operons of S. pneumoniae cited in Table 2 are present in at least 95% of the 123 or more genomic sequences studied to compile that listing. Additional genomes continue to be made available, but the inventor's non-linear regression analysis of cold spot proteins in relation to number of different strains revealed that analysis of at least 150 or so different strains would be accurately representative of the diversity of a particular genetic sequence over the population of genomes compared. Accordingly, for the purposes of the present invention, a gene or gene expression product found to be present in at least 95%, especially 98%, 99%, 99.5%, or up to 100% of strains in any population of at least 123 different Streptococcus pneumoniae strains will be considered universally present, or “in-common”, across the population, or present in “all or nearly all known strains” of Streptococcus pneumoniae. The term “all or nearly all known serotypes”, as used with regard to different serotypes of Streptococcus pneumoniae, means at least 80 of the currently known Streptococcus pneumoniae serotypes. At the time of the present invention, 94 or so Streptococcus pneumoniae serotypes are known, but additional serotypes may be characterized and published in the future.
To provide protection against as many pneumococcal serotypes as possible is a common desire of practitioners in the field of vaccines, and therefore, for the purposes of the present invention, in the context of eliciting immune responses, an immune response reactive with 80, 85, 90, 91, 92, 93, or 94 serotypes will be considered reactive with “all or nearly all known serotypes” of Streptococcus pneumoniae. In the same context, an immune response reactive with 50 or more serotypes will be considered reactive with “a majority of the known serotypes” of Streptococcus pneumoniae.
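The numerical criteria in the foregoing definitions can be summarized in a short sketch: a gene is “in-common” if it is present in at least 95% of a collection of at least 123 strains, and an immune response is classified by the number of serotypes with which it is reactive (at least 80, or at least 50). The function names, constant names, and the label returned for fewer than 50 serotypes are hypothetical and are introduced only for illustration.

```python
MIN_STRAINS = 123          # smallest collection size recited in the definition
MIN_PRESENCE_PERCENT = 95  # minimum percent of genomes carrying the gene

def is_in_common(present_flags):
    """True if the gene is present in at least 95% of the strains examined.

    `present_flags` holds one boolean per strain genome in the collection.
    """
    n = len(present_flags)
    if n < MIN_STRAINS:
        raise ValueError("collection must contain at least 123 strains")
    # Integer comparison avoids floating-point rounding at the 95% boundary.
    return sum(present_flags) * 100 >= MIN_PRESENCE_PERCENT * n

def serotype_coverage_label(n_reactive):
    """Classify an immune response by the number of serotypes it reacts with."""
    if n_reactive >= 80:
        return "all or nearly all known serotypes"
    if n_reactive >= 50:
        return "a majority of the known serotypes"
    return "fewer than a majority"  # illustrative label, not a defined term
```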

The terms “disorder” and “disease” are synonymous and refer to any pathological condition. Diseases of particular interest with respect to this invention are those caused by S. pneumoniae infection of particular tissues, organs or systems, such as pneumonia (lung infection), otitis media (middle ear infection), sinusitis (sinus infection), sepsis (bacteremia, bloodstream infection), and meningitis (infection of the meninges, which comprise three membranes covering the brain and spinal cord).

As used herein, the terms “treatment” and “treating” refer to any regimen that alleviates one or more symptoms or manifestations of a disease or disorder caused by S. pneumoniae, that inhibits progression of a disease or disorder caused by S. pneumoniae, that arrests progression or reverses progression (causes regression) of a disease or disorder caused by S. pneumoniae, or that prevents onset of a disease or disorder caused by S. pneumoniae. Treatment may include prophylaxis (preventative treatment) of one or more diseases caused by S. pneumoniae and includes but does not require cure of such a disease. Treatment may include active immunization resulting in an endogenous immune response that alleviates or combats S. pneumoniae infection or may include the use of exogenously prepared anti-cold spot polypeptide antibodies (for example, polyclonal antiserum or monoclonal antibodies) in passive immunization (immune therapy).

The terms “immunogen” and “immunogenic” refer to any compound or substance that is capable of eliciting an immune response in a human or non-human individual to the compound or substance.

A “vaccine composition,” or simply “vaccine,” has the definition commonly understood in the art and refers to an immunogenic composition, which upon administration to a human or non-human individual, elicits an initial (primary) immune response that produces antibodies that bind to an antigenic compound or substance in the vaccine composition, and wherein a secondary immune response is elicited later when the individual is again exposed to a source of the compound or substance. For example, a vaccine designed to provide an individual with a protective immune response (“protective immunity”, see below) against an infectious agent, such as a pathogenic virus or bacterium, may comprise inactivated or avirulent virus particles, killed bacterial cells, or one or more antigenic polypeptides, polysaccharides, or other components of the pathogenic virus or bacterium, such that a primary immune response elicited to the viral or bacterial components in the vaccine not only produces a first population of antibodies that bind to the viral or bacterial components of the vaccine but also produces memory immune cells that will trigger a secondary immune response against the corresponding virus or bacterium that infects the individual at a later time. Preferably, a secondary immune response is more pronounced than the primary response to the vaccine and is elicited every time the individual is subsequently exposed to an infecting virus or bacterium that expresses or presents the viral or bacterial component(s) present in the vaccine composition. To maintain the ability to elicit a robust secondary immune response, an individual may be administered the vaccine again at a later time as a “boost” (or “booster shot”) that stimulates production of additional memory immune cells in the individual. 
For example, a commonly known vaccine that is periodically followed by one or more boosts (booster shots) to maintain the ability of an individual to elicit a robust secondary immune response is any of the tetanus vaccines currently in use to provide protection from infection by the bacterium Clostridium tetani.

The term “isolated” when used to describe the various nucleic acid and polypeptide molecules disclosed herein, means a nucleic acid or polypeptide that has been identified and separated and/or recovered from the nucleic acid or polypeptide's natural environment. Contaminant components of the natural environment are materials that would typically interfere with a diagnostic, therapeutic, or prophylactic use of a nucleic acid or polypeptide, and may include enzymes, hormones, carbohydrates, lipids, and other proteinaceous or non-proteinaceous molecules. An isolated polypeptide may include a polypeptide in situ within recombinant cells engineered to express it, since at least one component of the polypeptide's natural environment will not be present. Ordinarily, however, an isolated polypeptide will be prepared by at least one purification step. An “isolated nucleic acid”, “isolated polynucleotide”, or an “isolated polypeptide-encoding nucleic acid” is a nucleic acid molecule that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in a natural source of such nucleic acid, e.g., the Streptococcus pneumoniae genome. An isolated nucleic acid (or isolated polynucleotide or isolated polypeptide-encoding nucleic acid) is other than in the form or setting in which it is found in nature. An isolated nucleic acid (or isolated polynucleotide or isolated polypeptide-encoding nucleic acid) therefore is distinguished from the specific polypeptide-encoding nucleic acid molecule as it exists in natural cells. An isolated nucleic acid (or isolated polynucleotide) includes polypeptide-encoding nucleic acid molecules contained in cells that ordinarily express the polypeptide but where, for example, the nucleic acid molecule is in a different location (e.g., a plasmid) from that of natural cells (e.g., the bacterial chromosome).

The term “isolated” or “isolate” when used to describe a cell means that the cell and its genetically uniform sister cells produced by cell division have been taken out of or separated from an environment comprising at least one other, genetically diverse cell, which may be a prokaryotic or eukaryotic cell. For example, cells of a Streptococcus pneumoniae capsular serotype are typically maintained in stock cultures that are generated by single colony purification on an agar plate, wherein a single colony is assumed to contain genetically uniform sister cells of the particular serotype. An “isolate” of S. pneumoniae may also refer to a cell and its genetically uniform sister cells that were purified as a single colony from a human, clinical sample (such as from sputum, bronchial lavage, blood, tissue, organic material) and that have been maintained and passed on as a stock culture.

The term “polynucleotide” as referred to herein means a polymeric form of two or more nucleotides, either ribonucleotides (RNA) or deoxyribonucleotides (DNA), or a modified form of either type of nucleotide. The term includes single and double stranded forms of RNA and DNA, but preferably is double-stranded DNA.

The term “isolated polynucleotide” as used herein shall mean a polynucleotide (e.g., of genomic, cDNA, or synthetic origin, or some combination thereof) that is not associated with all or a portion of a polynucleotide with which the “isolated polynucleotide” is found in nature or is operably linked to a polynucleotide that it is not linked to in nature.

The term “vector”, as used herein, is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a plasmid, which refers to a circular double stranded DNA molecule into which additional DNA segments may be inserted. Another type of vector is a viral vector wherein additional DNA segments may be inserted into the viral genome. Of particular interest are bacteriophage vectors, used to transduce bacterial host cells. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “recombinant expression vectors” or simply, “expression vectors”. In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. In the present specification, “plasmid” and “vector” may be used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors, such as viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions.

The term “operably linked” refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. A nucleic acid control sequence “operably linked” to a nucleic acid coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequence. “Operably linked” sequences include both expression control sequences that are contiguous with a gene of interest and expression control sequences that act in trans or at a distance to control the gene of interest. The term “expression control sequence” as used herein refers to polynucleotide sequences that are necessary to effect the expression of coding sequences to which they are ligated. For prokaryotic or eukaryotic host cells, expression control sequences will include appropriate transcription initiation, termination, and promoter sequences. If eukaryotic hosts are employed, other control elements that may be used include enhancer sequences; efficient RNA processing signals, such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (such as, a Kozak sequence); sequences that enhance protein stability; and, when desired, sequences that enhance protein insertion into or translocation across one or more cell membranes. The term “control sequence” is intended to include components whose presence is essential for gene transcription and translation and processing of expressed gene product, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.

The term “recombinant host cell” or simply “host cell”, as used herein, is intended to refer to a cell into which exogenous nucleic acid has been or can be introduced. It will be understood that such terms are intended to refer not only to the particular subject cell, but also to the progeny of such a cell. Host cells include prokaryotic and eukaryotic cells. Suitable prokaryotic host cells useful in the invention include, but are not limited to, Escherichia coli, Bacillus (for example, B. subtilis), Streptomyces (for example, S. lividans), Salmonella, and Pseudomonas. Eukaryotic cells that may be used as host cells in the present invention include protist cells, fungal cells, insect cells, plant cells, and animal cells. Preferably, an animal host cell is a mammalian cell of an established cell line, such as, but not limited to, CHO, HEK 293, COS, NS0, and SP2 cell lines. A preferred CHO cell carries a dhfr− mutation (as described in Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)) and is used with a DHFR selectable marker (such as described in Kaufman and Sharp, J. Mol. Biol., 159: 601-621 (1982)). A preferred insect host cell is a cell of the Sf9 cell line. Preferred fungal host cells are yeast cells, such as Saccharomyces (for example, S. cerevisiae), Kluyveromyces (for example, K. lactis), Schizosaccharomyces (for example, S. pombe), and Pichia (for example, P. pastoris). A particularly preferred yeast host cell is Saccharomyces cerevisiae. The preferred host for prokaryotic expression of polypeptides according to this invention is E. coli. The use of a Streptococcus (for example, S. pneumoniae) strain as a host is possible but not preferred, as it would require separation of the recombinantly expressed cold spot polypeptide from other streptococcal proteins.

The term “physiologically acceptable” as used in reference to a compound or composition means that the compound or composition is compatible with the physiology of a living subject, e.g., a human or a non-human animal, to which the compound or composition is to be administered and also is not deleterious to the cold spot polypeptide, or antigenic fragment thereof, or to a desired property or activity of any other component that may be present in a composition. Any compound or composition used in standard methods to raise polyclonal or monoclonal antibodies is also understood to be sterile and used under sterile conditions to avoid introduction of infectious, undesirable, and/or interfering substances into the subject and/or cells thereof. A physiologically acceptable compound or composition used in producing polyclonal or monoclonal antibodies must of course be compatible with the established protocols. Nevertheless, vehicles (carriers), compounds, and compositions that are acceptable in protocols for raising polyclonal or monoclonal antibodies in non-human animals are not necessarily approved and “pharmaceutically acceptable” for use in humans. “Pharmaceutically acceptable” refers to that which is not only physiologically acceptable but also of the quality and type that is permissible for medical use in treating humans.

A composition or method described herein as “comprising” one or more named elements or steps is open-ended, meaning that the named elements or steps are essential, but other elements or steps may be added within the scope of the composition or method. To avoid prolixity, it is also understood that any composition or method described herein as “comprising” (or “which comprises”) one or more named elements or steps also describes the corresponding, more limited, composition or method “consisting essentially of” (or “which consists essentially of”) the same named elements or steps, meaning that the composition or method includes the named essential elements or steps and may also include additional elements or steps that do not materially affect the basic and novel characteristic(s) of the composition or method. It is also understood that any composition or method described herein as “comprising” or “consisting essentially of” one or more named elements or steps also describes the corresponding, more limited, and close-ended composition or method “consisting of” (or “which consists of”) the named elements or steps to the exclusion of any other unnamed element or step. In any composition or method disclosed herein, known or disclosed equivalents of any named essential element or step may be substituted for that element or step.

Other terms are defined in the text below or, unless indicated otherwise, the meaning of other terms used herein is the same as understood and used by persons in the corresponding field, including, but not limited to, the fields of microbiology, population and evolutionary biology, immunology, genetics, biochemistry, molecular biology, and medicine.

A Family of Cold Spot Genes of S. pneumoniae

The approach taken for this invention was to identify pneumococcal genes (encoding polypeptides) that reside in a portion of the S. pneumoniae genome having very low mutational frequency. In the present case, the regions flanking (e.g., roughly 50 kb upstream and 50 kb downstream of) ribosomal RNA (rRNA) operons of the Streptococcus pneumoniae genome were analyzed. The pneumococcal genome contains four copies of the essential rRNA operons (rrn operons). See, Tettelin et al., Science, 293: 498-506 (2001). It was proposed that genes located in these flanking regions would not exhibit high recombinational mutation frequency, i.e., the “cold spot” or recombinationally quiescent regions of the genome would prove to be highly resistant to incorporation and perpetuation of horizontally exchanged DNA (see, for example, the rationale of Bouchet et al., Clin. Microbiol. Rev., 21(2): 262-273 (2008) in studies of multiple microbial species). Recombination of any of the rrn operons is nearly always a fatal event, and therefore strains evolving mutations or deletions in proximity to the rrn operons do not survive to propagate the genomic changes. Cold spot regions perpetuate genetic conservation because recombination-based mutation nearly always proves lethal, thus purging the vast majority of such recombinational mutants from the bacterial population.

According to the invention, cell surface-exposed cold spot polypeptides encoded by cold spot genes from such recombinationally quiescent regions are highly if not universally conserved across pneumococcal strains. According to the invention, surface antigens encoded by cold spot genes are likely to be particularly well-suited as universal or at least capsular serotype-independent protective antigens common to a majority up to all capsular serotypes of S. pneumoniae. Immunogenic compositions comprising such cold spot polypeptides or fragments thereof comprising all or a portion of the cold spot extracellular domain expressed on the surface of S. pneumoniae cells are useful for eliciting production of antibodies that bind to all or nearly all S. pneumoniae strains. Preferably, such elicited antibodies bind to 90 percent or more of all capsular serotypes. There are about 94 known S. pneumoniae capsular serotypes at the present time, and available capsular polysaccharide-oriented vaccines aim at eliciting an antibody response to a maximum of 23 different serotypes thus far. Because the immunogenic S. pneumoniae surface antigens of the present invention elicit antibodies recognizing less variant bacterial surface structures, it is possible to address many more of the capsular serotypes of S. pneumoniae in a single cold spot polypeptide-containing vaccine. Preferred immunogenic compositions according to the invention will raise antibodies recognizing 50 or more of the capsular serotypes, which represents a significant advance over available vaccines; moreover, since the present compositions comprise capsule-independent immunogens, it is contemplated that the elicited immune response to compositions of the invention will preferably recognize 80 or more, 85 or more, 90 or more, or 94 (e.g., all) capsular serotypes of S. pneumoniae. 
If additional serotypes are isolated, it is contemplated that immunogenic compositions of the present invention will elicit antibodies that recognize those new serotypes as well.
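The serotype counts discussed above can be put on a common percentage scale. The following is a minimal sketch, assuming the approximate total of 94 known capsular serotypes stated in the text; the function name `coverage_percent` and the labels are illustrative only and are not part of the specification.

```python
# Approximate number of known capsular serotypes, as stated in the text.
TOTAL_SEROTYPES = 94

def coverage_percent(covered, total=TOTAL_SEROTYPES):
    """Return the percentage of known capsular serotypes that are covered."""
    if not 0 <= covered <= total:
        raise ValueError("covered must lie between 0 and total")
    return 100.0 * covered / total

# Serotype counts mentioned in the text; the labels are illustrative.
targets = {
    "13 serotypes": 13,
    "23 serotypes (current maximum)": 23,
    "50 or more": 50,
    "80 or more": 80,
    "85 or more": 85,
    "90 or more": 90,
    "all 94": 94,
}

for label, count in targets.items():
    print(f"{label}: {coverage_percent(count):.1f}% of known serotypes")
```

On this scale, 50 of 94 serotypes is just over one half, which is consistent with the "majority" threshold used elsewhere in the specification.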

Three intensive, genome-scale, subcellular localization protein predictor programs were employed to analyze the approximately 50 kilobase pair region of each of the eight flanking regions of the four rrn operons in the S. pneumoniae genome: Gpos-mPloc, described by Shen et al., Protein Pept. Lett., 16(12): 1478-1484 (2009); PSORTdb described by Yu et al., Nucleic Acids Res., 39 (database issue): D241-D244 (2011); and LocateP described by Zhou et al., BMC Bioinformatics, 9:173 (2008). The analysis using these programs identified 104 genes located within the approximately 400 kilobase pairs that flank the four rrn operons present in the S. pneumoniae genome that putatively encode surface expressed polypeptides. These 104 genes represented ~5% of the roughly 400 kb of genomic sequences analyzed.
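The bookkeeping behind the "eight flanking regions" and "approximately 400 kilobase pairs" can be sketched as follows. This is only an illustration: the genome length and the rrn operon coordinates below are hypothetical placeholders, not values taken from the specification, and the helper names are assumptions. The sketch treats the chromosome as circular, so a window may wrap around the origin.

```python
# Hypothetical placeholders: the genome length and operon coordinates are
# NOT taken from the specification; they only illustrate the bookkeeping.
GENOME_LEN = 2_000_000
RRN_OPERONS = [(20_000, 25_000), (600_000, 605_000),
               (1_100_000, 1_105_000), (1_700_000, 1_705_000)]
FLANK = 50_000  # roughly 50 kb on each side, as stated in the text

def flanking_windows(op_start, op_end, genome_len, flank=FLANK):
    """Return (upstream, downstream) windows as (start, end) pairs.

    Coordinates are 0-based and half-open. A window whose start exceeds
    its end wraps around the origin of the circular chromosome.
    """
    upstream = ((op_start - flank) % genome_len, op_start % genome_len)
    downstream = (op_end % genome_len, (op_end + flank) % genome_len)
    return upstream, downstream

def window_length(window, genome_len):
    """Length in bp of a (possibly wrapping) window shorter than the genome."""
    start, end = window
    return (end - start) % genome_len

# Two windows per operon, four operons: eight flanking regions in total.
windows = [w for start, end in RRN_OPERONS
           for w in flanking_windows(start, end, GENOME_LEN)]
total_bp = sum(window_length(w, GENOME_LEN) for w in windows)
print(len(windows), "flanking regions,", total_bp, "bp in total")
```

With four operons and non-overlapping 50 kb flanks, the total comes to 8 x 50 kb = 400 kb, matching the figure given above. Real coordinates would come from an annotated genome record.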

The 104 genes from the above analysis were further assessed (by study of the scientific literature pertaining to all 104 proteins and their cognates in other organisms) with respect to biological function of the encoded polypeptide to generate a family of 50 genes that are most likely to encode surface-expressed polypeptides that are vital (essential) to the survival of the cell. The 50 proteins were ranked by likelihood of being essential to cell survival, and the proteins deemed most vital for cell survival were further analyzed by amino acid sequence to identify numbers of alleles and the degree of amino acid homology across the library of 231 S. pneumoniae genomes that represented all known and fully sequenced strains contained in the NCBI GenBank database at the time. The predominant amino acid sequences for the 21 most vital cold spot polypeptides are given in Table 1A, below. In Table 1B, which follows Table 1A, information on the allelic variants of the predominant sequence for each of the 21 cold spot polypeptides is given, and the percent average amino acid sequence pairwise homology across the allelic variants is also given. The GBSP# is an internal laboratory designation for each of the proteins.
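The allelic variants in Table 1B are written in substitution notation (for example, R9K means that the arginine at position 9 of the predominant sequence is replaced by lysine). The following is a minimal sketch of how such a variant sequence can be derived from a predominant sequence and compared with it, using the GBSP1 (SecE) sequence of SEQ ID NO: 7 and its single listed variant. The function names are illustrative assumptions, and the simple position-by-position identity computed here is not asserted to be the method used to obtain the averaged homology values in Table 1B, which appear to summarize the whole genome library rather than one pair of sequences.

```python
import re

# Predominant GBSP1 (SecE) sequence from Table 1A (SEQ ID NO: 7), 58 aa.
GBSP1 = "MRFIGDIFRLLKDTTWPTRKESWRDFRSIMEYTAFFVVIIYIFDQLIVSGLIRFINIF"

# Matches simple substitutions such as "R9K": reference residue,
# 1-based position, replacement residue.
SUB_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def apply_substitutions(seq, subs):
    """Return a copy of seq with each substitution (e.g. 'R9K') applied."""
    residues = list(seq)
    for sub in subs:
        match = SUB_RE.match(sub)
        if match is None:
            # Insertions written like '-272G' are outside this sketch.
            raise ValueError(f"unsupported notation: {sub!r}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range: {sub!r}")
        if residues[pos - 1] != ref:
            raise ValueError(f"reference residue mismatch: {sub!r}")
        residues[pos - 1] = alt
    return "".join(residues)

def percent_identity(a, b):
    """Percentage of identical residues between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

variant = apply_substitutions(GBSP1, ["R9K"])
print(variant)
print(f"{percent_identity(GBSP1, variant):.2f}% identity to predominant")
```

For the R9K variant, 57 of 58 residues match the predominant sequence. Checking the reference residue before substituting also serves as a quick consistency check of a table entry against the listed sequence.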

TABLE 1A

Polypeptides Encoded by Cold Spot Genes

Column 1: Polypeptide Designation / Literature Acronym / (length)
Column 2: Protein Function
Column 3: Amino Acid Sequence (SEQ ID NO:)

GBSP3
shape-
MNRFKKSKYV IIVFVTVLLV SALLATTYSS TIVTKLGDGI

MreC
determining
SLVDRVVQKP FQWFDSVKSD LAHLTRTYNE NESLKKQLYQ

(272 aa)
protein
LEVKSNEVES LKTENEQLRQ LLDMKSKLQA TKTLAADVIM

RSPVSWKQEL TLDAGRSKGA SENMLAIANG GLIGSVSKVE

ENSTIVNLLT NTENADKISV KIQHGSTTIY GIIIGYDKEN

DVLKISQLNS NSDISAGDKV TTGGLGNFNV ADIPVGEVVA

TTHSTDYLTR EVTVKLSADT HNVDVIELVG NS (SEQ

ID NO: 1)

GBSP14
sensor histidine
MKRYLQFWLV NLSVSLILIA GMALTWISKG IGLFLLALSL

PnpS
kinase
GLGGYWLFCL WKWEVAFETL HQPLLTSSEY FLEKGQEDLK

(443 aa)

SLAQYVSGLK TKVSQQDQQY KDLAETMEVL LSHLTMGTFL

VSAQGQMLLS SRSLPHYFPD VDGDISSLDD LKRMDIRNLV

HQAFDQKTRL KQEVSGFHEG DLILEVTAVP VFSPTQSVEA

VLVLLYDLTT IRTYEKLNLA FVSNASHELR TPVTSIKGFA

ETIKGMSAEE EALKDDFLDI IYKESLRLEH IVEHLLTLSK

AQQMPIQWTT LSLAEFVQDL TQSLQPQLKK KDLQLKVQVP

DDVTLVSDSQ LLSQILLNLL SNAIRYTEQG GKIEVKTQKV

NEGIKISVSD TGIGISQLEQ DRIFERFYRV NKGRSRQTGG

TGLGLAIVKE LSQLLGGQVT VTSQLGRGSC FTIFLPNQSF

AQD (SEQ ID NO: 2)

GBSP15
type II secretory
MILLEAVVAL AIFASIATLL LGQIQKNRQE EAKILQKEEV

unannotated
pathway
LRVAKMALQT GQNQVSINGV EIQVFSSEKG LEVYHGSEQL

protein of PulG
pseudopilin
LAIKEP (SEQ ID NO: 3)

superfamily

(86 aa)

GBSP22
ABC-type
MDSMILGRYI PGDSIVHRLD PRSKLLAMML LILIVFWANN

CbiQ
cobalt transport
PLTNLILFIA TGIFIALSGV SLSFFIQGLK SMFFLIAFTT

(264 aa)
system
IFQLFFISNG NVLFEFSFVR ITDYALQQAG IIFCRFVLII

permease
FFSTLLTLTT MPLSLASAVE ALLAPLKRVK VPVHEIGLML

component
SMSLRFVPTL MDDTTRIMNA QKARGVDFGE GSIVQKVKAM

IPILIPLFAT SLKRADSLAI AMEARGYQGG KGRSQYRQLK

WTLKDTLTIL VILVLGCCLF FLKS (SEQ ID NO: 4)

GBSP23
serine protease
MKHLKTFYKK WFQLLVVIVI SFFSGALGSF SITQLTQKSS

HtrA

VNNSNNNSTI TQTAYKNENS TTQAVNKVKD AVVSVITYSA

(393 aa)

NRQNSVFGND DTDTDSQRIS SEGSGVIYKK NDKEAYIVTN

NHVINGASKV DIRLSDGTKV PGEIVGADTF SDIAVVKISS

EKVTTVAEFG DSSKLTVGET AIAIGSPLGS EYANTVTQGI

VSSLNRNVSL KSEDGQAIST KAIQTDTAIN PGNSGGPLIN

IQGQVIGITS SKIATNGGTS VEGLGFAIPA NDAINIIEQL

EKNGKVTRPA LGIQMVNLSN VSTSDIRRLN IPSNVTSGVV

VRSVQSNMPA NGHLEKYDVI TKVDDKEIAS STDLQSALYN

HSIGDTIKIT YYRNGKEETT SIKLNKSSGD LES (SEQ

ID NO: 5)

GBSP24
putative sensor
MDLFGFGTVI VHFLIISHSY HFICKGQINR KELFVFGAYT

ComD
histidine kinase
LLTEIVFDFP LYILYLDGLG IERFLFPLGL YSYFRWMKQY

(441 aa)

ERDRGLFLSL LLSLLYESTH NFLSVTFSSI TGDNFVLQYH

FPFFFVVTVL TYFVTLKIIY YFHLELAYFD EDYLYPFLKK

VFFALLLLHI VSFVSDMVST IKHLNSFGSI LSSIVFISLL

LTFFAMNSHK VQMEKEIALK QKKFEQKHLQ NYTDEIVGLY

NEIRGFRHDY AGMLVSMQMA IDSGNLQEID RIYNEVLVKA

NHKLRSDKYT YFDLNNIEDS ALRSLVAQSI VYARNNGVEF

TLEVKDTITK LPIELLDLVR IMSVLLNNAV EGSADSYKKQ

MEVAVIKMET ETVIVIQNSC KMTMTPSGDL FALGFSTKGR

NRGVGLNNVK ELLDKYNNII LETEMEGSTF RQIIRFKREF

E (SEQ ID NO: 6)

GBSP1
preprotein
MRFIGDIFRL LKDTTWPTRK ESWRDFRSIM EYTAFFVVII

SecE
translocase
YIFDQLIVSG LIRFINIF (SEQ ID NO: 7)

(58 aa)
subunit E

GBSP2
magnesium ion
MKGVTNMTPE EMYLTERLDV QIAHFLKKSV QHRRRYKVLK

unannotated
transporter
ITEIVAGFLI AVFCAIPMPG DRYRLISVAL SSLGLLCEGI

protein of

INLYNAKENW ISYQKTAQLL EKEKFLYQCQ TEKYAGKTKA

CorA family

FALFVKTCEG LISEEINQWE SIQSKEVAAS ADAPVKKE

(158 aa)

(SEQ ID NO: 8)

GBSP4
permease
MKGVNMEKQQ PSKAALLSII PGLGQIYNKQ KAKGFIFLGV

MalC
protein of
TIVFVLYFLA LATPELSNLI TLGDKPGRDN SLFMLIRGAF

(435 aa)
maltodextrin
HLIFVIVYVL FYFSNIKDAH TIAKRINNGI PVPRTLKDMI

ABC transporter
KGIYENGFPY LLIIPSYVAM TFAIIFPVIV TLMIAFTNYD

FQHLPPNKLL DWVGLTNFTN IWSLSTFRSA FGSVLSWTII

WALAASTLQI VIGIFTAIIA NQPFIKGKRI FGVIFLLPWA

VPAFITILTF SNMFNDSVGA INTQVLPILA KFLPFLDGAL

IPWKTDPTWT KIALIMMQGW LGFPYIYVLT LGILQSIPND

LYEAAYIDGA NAWQKFRNIT FPMILAVAAP TLISQYTFNF

NNFSIMYLFN GGGPGSVGGG AGSTDILISW IYRLTTGTSP

QYSMAAAVTL IISIIVISIS MIAFKKLHAF DMEDV

(SEQ ID NO: 9)

GBSP6
rod shape-
MRQLKRVGVF LLLPFFVLID AHISQLLGSF FPHVHLASHF

MreD
determining
LFLFLLFETI EVSEYLYLVY CFVIGLVYDV YFFHLIGITT

(164 aa)
protein
LLFILLGAFL HKLNSVILLN RWTRMLAMIV LTFLFEMGSY

LLAFMVGLTV DSMSIFIVYS LVPTMILNFL WITVFQFIFE

KYYL (SEQ ID NO: 10)

GBSP7
phospho-
MTNSNYKLTK EDFNQINKRS LFTFQLGWNY ERMQASGYLY

unannotated
transferase
MILPQLRKMY GDGTPELKEM MKVHTQFFNT SPFFHTIIAG

analogue of
system (PTS)
FDLAMEEKDG VGSKDAVNGI KTGLMGPFAP LGDTIFGSLV

PTS subunit
transporter
PAIMGSVAAT MAIAGQPWGI FLWIAVAVAY DIFRWKQLEF

IID

AYKEGVNLIN NMQSTLTALI DAASVLGVFM MGALVATVIN

(271 aa)

FEISYKLPIG EKMIDFQDIL NQIFPRLLPA IFTAFIFWLL

GKKGMNSTKA IGIIIVLALA LSALGHFALG M (SEQ ID

NO: 11)

GBSP8
permease
MSTIDKEKFQ FVKRDDFASE TIDAPAYSYW KSVFKQFMKK

AmiD
protein of
KSTVVMLGIL VAIILISFIY PMFSKFDFND VSKVNDFSVR

(308 aa)
oligopeptide
YIKPNAEHWF GTDSNGKSLF DGVWFGARNS ILISVIATVI

ABC transporter
NLVIGVFVGG IWGISKSVDR VMMEVYNVIS NIPPLLIVIV

LTYSIGAGFW NLIFAMSVTT WIGIAFMIRV QILRYRDLEY

NLASRTLGTP TLKIVAKNIM PQLVSVIVTT MTQMLPSFIS

YEAFLSFFGL GLPITVPSLG RLISDYSQNV TTNAYLFWIP

LTTLVLVSLS LFVVGQNLAD ASDPRTHR (SEQ ID

NO: 12)

GBSP9
phospho-
MIQWWQILLL TLYSAYQICD ELTIVSSAGS PVFAGFITGL

unannotated
transferase
IMGDVTTGLL IGGNLQLFVL GVGTFGGASR IDATSGAVLA

analogue of
system (PTS)
TAFSVSQGID APLAITTIAV PVAALLTYFD VLGRMTTTFF

PTS subunit
transporter
AHRVDAAIER FDYKGIERNY LLGAIPWALS RALPVFFALA

IID

FGGAFVQSVV DFVEAYKWVA DGLTLAGRML PGLGFAILLR

(301 aa)

TLPVKRNLHY LAMGFGLTAM LTVLYSYVTG LGGAVAGIVG

TLPAEVAEKI GFVNNFKGLS MIGISIVGIF LAVLHFKNSQ

KVAVAAPSTP SESGEIEDDE F (SEQ ID NO: 13)

GBSP10
nicotinamide
MMHTYLQKKI ENIKTTLGEM SGGYRRMVAA MADLGFSGTM

unannotated
mononucleotide
KAIWDDLFAH RSFAQWIYLL VLGSFPLWLE LVYEHRIVDW

analogue of
transporter
IGMICSLTGI ICVIFVSEGR ASNYLFGLIN SVIYLILALQ

PnuC
analogue
KGFYGEVLTT LYFTVMQPIG LLVWIYQAQF KKEKQEFVAR

nicotinamide

KLDGKGWTKY LSISVLWWLA FGFIYQSIGA NRPYRDSITD

mononucleo-

ATNGVGQILM TAVYREQWIF WAATNVFSIY LWWGESLQIQ

tide transporter

GKYLIYLINS LVGWYQWSKA AKQNTDLLN (SEQ ID

(269 aa)

NO: 14)

GBSP11
protein involved
MKKKLKLTSL LGLSLLIMTA CATNGVTSDI TAESADFWSK

unannotated
in membrane
LVYFFAEIIR FLSFDISIGV GIILFTVLIR TVLLPVFQVQ

analogue of
insertion
MVASRKMQEA QPRIKALREQ YPGRDMESRT KLEQEMRKVF

SpoIIIJ family

KEMGVRQSDS LWPILIQMPV ILALFQALSR VDFLKTGHFL

(274 aa)

WINLGSVDTT LVLPILAAVF TFLSTWLSNK ALSERNGATT

AMMYGIPVLI FIFAVYAPGG VALYWTVSNA YQVLQTYFLN

NPFKIIAERE AVVQAQKDLE NRKRKAKKKA QKTK (SEQ

ID NO: 15)

GBSP13
phosphatidyl
MKKEQIPNLL TIGRILFIPI FIFILTIGNS IESHIVAAII

PgsA
transferase
FAVASITDYL DGYLARKWNV VSNFGKFADP MADKLLVMSA

(181 aa)

FIMLIELGMA PAWIVAVIIC RELAVTGLRL LLVETGGTIL

AAAMPGKIKT FSQMFAIIFL LLHWTLLGQV LLYVALFFTI

YSGYDYFKGS AYVFKGTFGS K (SEQ ID NO: 16)

GBSP17
permease
MIHLIMISAI ALAIGIGYRT KINIGLLAIA FSYLIATTLM

ArsB

GLSPKELLHF WPTSLFFTIF SVSLFYNVAT TNGTLDVLAQ

(436 aa)

HILYRTRTHP NALYMILYLM ATLLSALGAG FFTTMAVCCP

LAITLCQKAD KHPLIGAQAV NWGASGGANL ITSSSGIVFQ

GLFKQMGWEE QAFSLGNHIF IVSIIYPLIV LLLLSCYSHY

SKGRTNSSLT IDQPPLLSKV QRQTTLLMIS SMVLVWLFPL

LHLIFPNIAW IATYQKTFDI GFVSILMVCL ALRLKLGKQE

AILAKVPWAT IIMLCGMSLL MSLAVKSGLV TLIGHLMTTT

IPHFWLPLFF CVIAGVMSLF SSTLSVVAPA LFPIIAIISA

QNPQIDIHLL TTATVIGALS TNISPFSSAG SLIQLSLPNI

EERGLAFKKQ IILGVPISLS LGLLTTWILI LLASLS

(SEQ ID NO: 17)

GBSP18
multi-
MGENFLQMLM GMVDSYLVAH LGLIAISGVS VAGNIITIYQ

MatE
antimicrobial
AIFIALGAAI SSVISKSIGQ KDQSKLAYHV TEALKITLLL

(411 aa)
extrusion
SFLLGFLSIF AGKEMIGLLG TERDVAESGG LYLSLVGGSI

protein (MATE)
VLLGLMTSLG ALIRATHNPR LPLYVSFLSN ALNILFSSLA

IFVLDMGIAG VAWGTIVSRL VGLVILWSQL KLPYGKPTFG

LDKELLTLAL PAAGERLMMR AGDVVIIALV VSFGTEAVAG

NAIGEVLTQF NYMPAFGVAT ATVMLLARAV GEDDWKRVAS

LSKQTFWLSL FLMLPLSFSI YVLGVPLTHL YTTDSLAVEA

SVLVTLFSLL GTPMTTGTVI YTAVWQGLGN ARLPFYATSI

GMWCIRIGTG YLMGIVLGWG LPGIWAGSLL DNGFRWLFLR

YRYQRYMSLK G (SEQ ID NO: 18)

GBSP19
(function
MNYPKIDLKT IRQESKHFQA DTPRLFLLYI LPSMLVILSG

unannotated
unknown)
FLNPLSRIHG TVLEQPFFSI LGQILQTYLF PLLVSFIGTI

open reading

LLTSSVYATL TLMKDSKTEP SVKNSLALFD EERFSQTFLT

frame

LLLKRFYLFL WSIPNLLGIY LLFYSSFLAK KFVTLHPEFP

(285 aa)

NLDLSSVETE RFLMVFGLYF LASLILIIVG NILYIPQYYA

YSQVEFLLCY SLDLGQVPPR RILKTSRSFM KGYKFQHFVL

DLQLLPWYFL NWITFGIASF SLLPYIQCTK IMFYRAVLAR

KRPKA (SEQ ID NO: 19)

GBSP20
cation efflux
MRNMKAKYAV WVAFFLNLTY AIVEFIAGGV FGSSAVLADS

CzcD
system protein
VHDLGDAIAI GISAFLETIS NREEDNQYTL GYKRFSLLGA

(299 aa)

LVTAVILVTG SVLVILENVT KILHPQPVND EGILWLGIIA

ITINLLASLV VGKGKTKNES ILSLHFLEDT LGWVAVILMA

IVLRFTDWYI LDPLLSLVIS FFILSKALPR FWSTLKIFLD

AVPEGLDIKQ VKSGLERLDN VASLNQLNLW TMDALEKNAI

VHVCLKEMEH METCKESIRI FLKDCGFQNI TIEIDADLET

HQTHKRKVCD LERSYEHQH (SEQ ID NO: 20)

GBSP21
competence
MKPEFLESAE FYNRRYHNFS SSVIVPMALL LVFLLGFATV

ComB
factor transport
AEKEMSLSTR ATVEPSRILA NIQSTSNNRI LVNHLEENKL

(449 aa)
protein
VKKGDLLVQY QEGAEGVQAE SYASQLDMLK DQKKQLEYLQ

KSLQEGENHF PEEDKFGYQA TFRDYISQAG SLRASTSQQN

ETIASQNAAA SQTQAEIGNL ISQTEAKIRD YQTAKSAIET

GASLAGQNLA YSLYQSYKSQ GEENPQTKVQ AVAQVEAQIS

QLESSLATYR VQYAGSGTQQ AYASGLSSQL ESLKSQHLAK

VGQELTLLAQ KILEAESGKK VQGNLLDKGK VTASEDGVLH

LNPETSDSSM VAEGALLAQL YPSLEREGKA KLTAYLSSKY

VARIKVGDSV RYTTTHDAGN QLFLDSTITS IDATATKTEK

GNFFKIEAET NLTSEQAEKL RYGVEGRLQM ITGKKSYLRY

YLDQFLNKE (SEQ ID NO: 21)

TABLE 1B

Allelic Variants of the Cold Spot Polypeptides (Predominant Sequences) in Table 1A

Column 1: Cold Spot Polypeptide presented in Table 1A (SEQ ID NO.)
Column 2: number of alleles found / % average aa pairwise homology (to predominant sequence)
Column 3: Amino Acid Variations for Each Allele by Position in the Predominant Polypeptide Sequence in Table 1A

GBSP3 allelic
6 alleles found
1. I205V (SEQ ID NO: 22)

variants from
99.89% average
2. I32V (SEQ ID NO: 23)

predominant
aa pairwise
3. A231V (SEQ ID NO: 24)

sequence (SEQ
homology
4. T67A, L78I, V88A, D103G, A134V, R136K, S186T,

ID NO: 1)

I194V, A216T (SEQ ID NO: 25)

5. S141F (SEQ ID NO: 26)

6. T187I (SEQ ID NO: 27)

GBSP14 allelic
14 alleles found
1. L42Q (SEQ ID NO: 28)

variants from
99.76% average
2. L42Q, G88A (SEQ ID NO: 29)

predominant
aa pairwise
3. Q84K (SEQ ID NO: 30)

sequence (SEQ
homology
4. L42Q, V109I (SEQ ID NO: 31)

ID NO: 2)

5. L38F (SEQ ID NO: 32)

6. V360G (SEQ ID NO: 33)

7. L42Q, G88A, I271S (SEQ ID NO: 34)

8. E58K (SEQ ID NO: 35)

9. L275H (SEQ ID NO: 36)

10. L42Q, V326I (SEQ ID NO: 37)

11. T213I (SEQ ID NO: 38)

12. A23V, L42Q, G88A (SEQ ID NO: 39)

13. Q84K, S235L (SEQ ID NO: 40)

14. L42Q, G88A, A343T (SEQ ID NO: 41)

GBSP15 allelic
10 alleles found
1. S56N (SEQ ID NO: 42)

variants from
99.33% average
2. N53S (SEQ ID NO: 43)

predominant
aa pairwise
3. M1V (SEQ ID NO: 44)

sequence (SEQ
homology
4. T18P (SEQ ID NO: 45)

ID NO: 3)

5. M1V, N53S (SEQ ID NO: 46)

6. M1V, S56N (SEQ ID NO: 47)

7. A44T, S56N (SEQ ID NO: 48)

8. S56N, E72A (SEQ ID NO: 49)

9. S56N, P86L (SEQ ID NO: 50)

10. V55I, S56N (SEQ ID NO: 51)

GBSP22 allelic
5 alleles found
1. H154Y (SEQ ID NO: 52)

variants from
99.94% average
2. S234I (SEQ ID NO: 53)

predominant
aa pairwise
3. A199S (SEQ ID NO: 54)

sequence (SEQ
homology
4. M162I (SEQ ID NO: 55)

ID NO: 4)

5. L62F (SEQ ID NO: 56)

GBSP23 allelic
12 alleles found
1. V320I (SEQ ID NO: 57)

variants from
99.62% average
2. K211R (SEQ ID NO: 58)

predominant
aa pairwise
3. W11S, V320I (SEQ ID NO: 59)

sequence (SEQ
homology
4. E143K, V320I (SEQ ID NO: 60)

ID NO: 5)

5. W11G, T33A, N42S, D91E, R98Q, V301I,

N385D (SEQ ID NO: 61)

6. Y8N, S302I, V320I, Q355P (SEQ ID NO: 62)

7. K211R, N283K, G284R (SEQ ID NO: 63)

8. K211R, T227A (SEQ ID NO: 64)

9. K211R, A290V (SEQ ID NO: 65)

10. K211R, V320I (SEQ ID NO: 66)

11. S186N (SEQ ID NO: 67)

12. W11G, T33A, N42S, D91E, R98Q, A147T,

V301I, N385D (SEQ ID NO: 68)

GBSP24 allelic
23 alleles found
1. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E,

variants from
98.37% average
P50S, L51F, I53L, G58K, L59I, T106I, E151K

predominant
aa pairwise
(SEQ ID NO: 69)

sequence (SEQ
homology
2. F13L, S19N, F22L, E62A, R63I, E151K,

ID NO: 6)

A392S (SEQ ID NO: 70)

3. F4L, H21R, F22L, Q27R, F34Y, V35I, F47L,

D48E, P50S, I53L, G58K, L59I, E151K, I374M

(SEQ ID NO: 71)

4. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E,

P50S, L51F, I53L, G58K, L59I, T106I, E151K,

M384I (SEQ ID NO: 72)

5. F22L, L51F, E62A, R63T, M77I, S104P,

E151K (SEQ ID NO: 73)

6. G264S (SEQ ID NO: 74)

7. F22L, E62A, R63T, M77I, E151K (SEQ ID

NO: 75)

8. F4L, H21R, F22L, Q27R, F34Y, F47L, D48E,

P50S, L51F, I53L, G58K, L59I, E151K, I374M

(SEQ ID NO: 76)

9. Y73N (SEQ ID NO: 77)

10. F13L, S19N, F22L, E62A, R63I, E151K

(SEQ ID NO: 78)

11. S19N, F22L, E62A, R63T, M77I, E151K,

A392S (SEQ ID NO: 79)

12. F4L, H21R, F22L, Q27R, F34Y, F47L,

D48E, P50S, L51F, I53L, G58K, L59I (SEQ ID

NO: 80)

13. L3F, G5L, F6L, G7V, T8D, V9L, V11L,

H12Y, H21R, F22L, G26D, V35F, F47L, L51F,

I53L, E62A, R63T, H120Y, F121G, P122L,

E151K, I272V, E439K (SEQ ID NO: 81)

14. A164V (SEQ ID NO: 82)

15. M253T (SEQ ID NO: 83)

16. A280X, I374V (SEQ ID NO: 84)

17. S104Y (SEQ ID NO: 85)

18. G7E, S104Y (SEQ ID NO: 86)

19. S19N, F22L, E62A, R63T, M77I, E151K,

E371G, A392S (SEQ ID NO: 87)

20. F22L, E62A, R63T, M77I, E151K, I272V

(SEQ ID NO: 88)

21. F22L, M77I, T99I, E151K, V171I, L254F

(SEQ ID NO: 89)

22. F4L, H21R, F22L, Q27R, F34Y, F47L,

D48E, P50S, L51F, I53L, Y55S, G58K, L59I,

E151K (SEQ ID NO: 90)

23. L3F, G5L, F6L, G7V, T8D, V9L, V11L,

H12Y, H21R, F22L, G26D, V35F, F47L, L51F,

I53L, E62A, R63T, H120Y, F121G, P122L,

I272V, E439K (SEQ ID NO: 91)

GBSP1 allelic
1 allele found
1. R9K (SEQ ID NO: 92)

variant from
99.91% average

predominant
aa pairwise

sequence (SEQ
homology

ID NO: 7)

GBSP2 allelic
3 alleles found
1. N89H, K102R, D152E, V155G (SEQ ID

variants from
99.90% average
NO: 93)

predominant
aa pairwise
2. T127A (SEQ ID NO: 94)

sequence (SEQ
homology
3. V20I (SEQ ID NO: 95)

ID NO: 8)

GBSP4 allelic
8 alleles found
1. I422V (SEQ ID NO: 96)

variants from
99.92% average
2. G39S (SEQ ID NO: 97)

predominant
aa pairwise
3. D64N (SEQ ID NO: 98)

sequence (SEQ
homology
4. G174S (SEQ ID NO: 99)

ID NO: 9)

5. A332T (SEQ ID NO: 100)

6. G39S, W201L (SEQ ID NO: 101)

7. N5I, T53A, I86V, S94A, T179S, L269F

(SEQ ID NO: 102)

8. N5I, T53A, I86V, R114L, S193A, G371A

(SEQ ID NO: 103)

GBSP6 allelic
5 alleles found
1. V70I (SEQ ID NO: 104)

variants from
99.69% average
2. Y68H (SEQ ID NO: 105)

predominant
aa pairwise
3. V67I (SEQ ID NO: 106)

sequence (SEQ
homology
4. I64T (SEQ ID NO: 107)

ID NO: 10)

5. T79A, M105I, L111M, T112S, E116D, S119T,

L122F, F124L, M125V, V130L, M133L, S134P,

T144S, F149L, L150V, I152M, T153L, V154I,

K161R (SEQ ID NO: 108)

GBSP7 allelic variants from predominant sequence (SEQ ID NO: 11); 7 alleles found; 99.88% average aa pairwise homology

1. G117A (SEQ ID NO: 109)

2. A260T (SEQ ID NO: 110)

3. S20R (SEQ ID NO: 111)

4. V198M, L269F (SEQ ID NO: 112)

5. A263T (SEQ ID NO: 113)

6. M49I, L207W, P208A, L264F, H266K, A268V (SEQ ID NO: 114)

7. S93A, N167S, V198L, I203V, Q222S, L264F, H266K, A268V, −272G, −273A (SEQ ID NO: 115)

GBSP8 allelic variants from predominant sequence (SEQ ID NO: 12); 5 alleles found; 99.78% average aa pairwise homology

1. V79I (SEQ ID NO: 116)

2. L122V (SEQ ID NO: 117)

3. A216V (SEQ ID NO: 118)

4. V79I, G106V (SEQ ID NO: 119)

5. V79I, G168R (SEQ ID NO: 120)

GBSP9 allelic variants from predominant sequence (SEQ ID NO: 13); 10 alleles found; 99.86% average aa pairwise homology

1. E248Q (SEQ ID NO: 121)

2. D181Y (SEQ ID NO: 122)

3. A283V (SEQ ID NO: 123)

4. F131Y, S168H, F172L, E174T, A175E, K177Q, V179I (SEQ ID NO: 124)

5. I41V (SEQ ID NO: 125)

6. T471 (SEQ ID NO: 126)

7. F131Y (SEQ ID NO: 127)

8. A164E, S168G, E174T, A175E, K177Q (SEQ ID NO: 128)

9. A91T, P92A, F172L, I238V, V239I, A244K, E245D (SEQ ID NO: 129)

10. A91T, P92A, A164E, S168H, E174T, A175K, K177Q, A244K, E245D (SEQ ID NO: 130)

GBSP10 allelic variants from predominant sequence (SEQ ID NO: 14); 9 alleles found; 99.84% average aa pairwise homology

1. G164D (SEQ ID NO: 131)

2. L268P (SEQ ID NO: 132)

3. Q186R (SEQ ID NO: 133)

4. G38E (SEQ ID NO: 134)

5. A212T (SEQ ID NO: 135)

6. T168A (SEQ ID NO: 136)

7. W80C (SEQ ID NO: 137)

8. Q149R (SEQ ID NO: 138)

9. M2I (SEQ ID NO: 139)

GBSP11 allelic variants from predominant sequence (SEQ ID NO: 15); 12 alleles found; 99.81% average aa pairwise homology

1. G25E, H158Q (SEQ ID NO: 140)

2. E47K (SEQ ID NO: 141)

3. V67I (SEQ ID NO: 142)

4. S166G (SEQ ID NO: 143)

5. L41S (SEQ ID NO: 144)

6. E47K, R70C (SEQ ID NO: 145)

7. E47K, A255T (SEQ ID NO: 146)

8. A222T, V253A (SEQ ID NO: 147)

9. F211L (SEQ ID NO: 148)

10. P102S (SEQ ID NO: 149)

11. T19A (SEQ ID NO: 150)

12. G25E, H158Q, F243L (SEQ ID NO: 151)

GBSP13 allelic variants from predominant sequence (SEQ ID NO: 16); 8 alleles found; 99.61% average aa pairwise homology

1. M83V (SEQ ID NO: 152)

2. M83I (SEQ ID NO: 153)

3. I22L (SEQ ID NO: 154)

4. A122S (SEQ ID NO: 155)

5. M124I (SEQ ID NO: 156)

6. L84F (SEQ ID NO: 157)

7. A90D (SEQ ID NO: 158)

8. A122S, G169S (SEQ ID NO: 159)

GBSP17 allelic variants from predominant sequence (SEQ ID NO: 17); 22 alleles found; 99.51% average aa pairwise homology

1. Q255R (SEQ ID NO: 160)

2. V68F, T70I, H89Y, Q255R, T320S (SEQ ID NO: 161)

3. V68F, T70I, H89Y, E170D, T320S (SEQ ID NO: 162)

4. A92T (SEQ ID NO: 163)

5. T70I, Q255R (SEQ ID NO: 164)

6. V68F, T70I, H89Y, R204K, T320S (SEQ ID NO: 165)

7. V68F, T70I, H89Y, T320S (SEQ ID NO: 166)

8. G108S (SEQ ID NO: 167)

9. H89Y (SEQ ID NO: 168)

10. Q255R, H315Q (SEQ ID NO: 169)

11. T37I, T70I, H89Y (SEQ ID NO: 170)

12. T70I, H89Y (SEQ ID NO: 171)

13. V68F, T70I, H89Y (SEQ ID NO: 172)

14. T70I, H89Y, Q255R, T319I, T320S (SEQ ID NO: 173)

15. T70I, H89Y, Q255R, T320S (SEQ ID NO: 174)

16. R273K (SEQ ID NO: 175)

17. A116V (SEQ ID NO: 176)

18. A249V (SEQ ID NO: 177)

19. Y18F, T113A (SEQ ID NO: 178)

20. T70I (SEQ ID NO: 179)

21. T70I, H89Y, Q255R, T320S, G422E (SEQ ID NO: 180)

22. V68F, T70I, T86I, H89Y, Q255R, T320S, G404A (SEQ ID NO: 181)

GBSP18 allelic variants from predominant sequence (SEQ ID NO: 18); 21 alleles found; 99.50% average aa pairwise homology

1. V177L, M373L (SEQ ID NO: 182)

2. V177L, S408N (SEQ ID NO: 183)

3. A169T, V177L, P197L, V344I, M373L (SEQ ID NO: 184)

4. V177L (SEQ ID NO: 185)

5. V177L, S408I (SEQ ID NO: 186)

6. A169T, V177L, P197L, S408I (SEQ ID NO: 187)

7. S408N (SEQ ID NO: 188)

8. V177L, P197L (SEQ ID NO: 189)

9. S408I (SEQ ID NO: 190)

10. P142L (SEQ ID NO: 191)

11. A237T, S408N (SEQ ID NO: 192)

12. V177L, G195E (SEQ ID NO: 193)

13. G195E (SEQ ID NO: 194)

14. V177L, P197L, T260M (SEQ ID NO: 195)

15. L159X (SEQ ID NO: 196)

16. A317V (SEQ ID NO: 197)

17. K203R, S408I (SEQ ID NO: 198)

18. V177L, S297Y, M373L (SEQ ID NO: 199)

19. A169T, V177L, P197L, P295L, S408I (SEQ ID NO: 200)

20. Q7H, A169T, V177L, T285I, S408I (SEQ ID NO: 201)

21. Q7H, A169T, G170S, T285I, S408I (SEQ ID NO: 202)

GBSP19 allelic variants from predominant sequence (SEQ ID NO: 19); 17 alleles found; 99.37% average aa pairwise homology

1. G40S, T67A (SEQ ID NO: 203)

2. G40S, T67A, H237R (SEQ ID NO: 204)

3. T67A, Y210H (SEQ ID NO: 205)

4. G40S, H237R (SEQ ID NO: 206)

5. G40S, T67A, V239I (SEQ ID NO: 207)

6. G40S (SEQ ID NO: 208)

7. G40S, T67A, Y140H, N161S (SEQ ID NO: 209)

8. G40S, T67A, R275Q (SEQ ID NO: 210)

9. G40S, T67A, S183G, H237R (SEQ ID NO: 211)

10. G40S, T67A, S75F, H237R (SEQ ID NO: 212)

11. G40S, T67A, S75F, K150Q, E170D, H237R (SEQ ID NO: 213)

12. L245I (SEQ ID NO: 214)

13. G78R (SEQ ID NO: 215)

14. T67A, V153A, Y210H (SEQ ID NO: 216)

15. R12H, L27F, G40S (SEQ ID NO: 217)

16. T67A, H237R (SEQ ID NO: 218)

17. T67A, H237R, I271L (SEQ ID NO: 219)

GBSP20 allelic variants from predominant sequence (SEQ ID NO: 20); 20 alleles found; 99.31% average aa pairwise homology

1. A47V, P189S (SEQ ID NO: 220)

2. A47V, L81V, G214D (SEQ ID NO: 221)

3. T58L, A222V (SEQ ID NO: 222)

4. V30I, G45V, S143N, P189S (SEQ ID NO: 223)

5. P189S (SEQ ID NO: 224)

6. S143N, P189S (SEQ ID NO: 225)

7. I52M, H250Y (SEQ ID NO: 226)

8. T83I, P189S (SEQ ID NO: 227)

9. V30I, G45V, A222V (SEQ ID NO: 228)

10. A222V (SEQ ID NO: 229)

11. A47V, A222V (SEQ ID NO: 230)

12. V10A, G45V (SEQ ID NO: 231)

13. A47V (SEQ ID NO: 232)

14. A187V (SEQ ID NO: 233)

15. V30I, G45V, L81V, V131I, A222V (SEQ ID NO: 234)

16. V30I, G45V, L81V, A222V (SEQ ID NO: 235)

17. V30I, G45V, L125M (SEQ ID NO: 236)

18. V30I, A222V, I260N (SEQ ID NO: 237)

19. V10A, A21T, A47V, A222V (SEQ ID NO: 238)

20. A47V, L81V, G214D, G266V (SEQ ID NO: 239)

GBSP21 allelic variants from predominant sequence (SEQ ID NO: 21); 34 alleles found; 99.13% average aa pairwise homology

1. A202T, V311I, Y360D (SEQ ID NO: 240)

2. I198M, G206S, V311I, Y360D (SEQ ID NO: 241)

3. S22R, A28S, V40F, M45I, T227I, V311I, Y360D (SEQ ID NO: 242)

4. S182N, I198M, G206S, V311I, Y360D (SEQ ID NO: 243)

5. V53I, I198M, G206S, G257A, V311I, Y360D (SEQ ID NO: 244)

6. I198M, G206S, V311I, Y360D (SEQ ID NO: 245)

7. E125G, V311I, A335T, Y360D (SEQ ID NO: 246)

8. V311I, Y360D (SEQ ID NO: 247)

9. D85E, A94V, A149E, A194V, V311I, Y360D (SEQ ID NO: 248)

10. T227I, V311I, Y360D (SEQ ID NO: 249)

11. A94T, I198M, G206S, V311I, Y360D (SEQ ID NO: 250)

12. A175S, V311I, Y360D, L382F (SEQ ID NO: 251)

13. A194V, V311I, Y360D, T410I (SEQ ID NO: 252)

14. A202T, T286S, V311I, A335T, Y360D (SEQ ID NO: 253)

15. A175S, V311I, Y360D (SEQ ID NO: 254)

16. H129R, S182N, K195Q, I198M, G206S, V311I, Y360D (SEQ ID NO: 255)

17. G84E, S182N, I198M, G206S, V311I, Y360D (SEQ ID NO: 256)

18. I198M, G206S, K300N, V311I, Y360D (SEQ ID NO: 257)

19. A202T, G282C, V311I, Y360D (SEQ ID NO: 258)

20. L213F, V311I, Y360D (SEQ ID NO: 259)

21. A202T (SEQ ID NO: 260)

22. S182N, K195Q, I198M, G206S, V311I, Y360D (SEQ ID NO: 261)

23. I198M, G206S, V311I, A350T, Y360D (SEQ ID NO: 262)

24. A175S, I198M, G206S, V311I, Y360D (SEQ ID NO: 263)

25. V40I, A202T, G206S, V311I, Y360D (SEQ ID NO: 264)

26. A202T, V311I, Y360D, L382F (SEQ ID NO: 265)

27. E125G, I198M, V311I, Y360D (SEQ ID NO: 266)

28. A175S, V311I, Y360D (SEQ ID NO: 267)

29. Y138H, L152I, A175S, V311I, Y360D (SEQ ID NO: 268)

30. A175S, V311I, Y360D, T410I, L428I (SEQ ID NO: 269)

31. A175S, L213F, A231V, V311I, Y360D (SEQ ID NO: 270)

32. L75S (SEQ ID NO: 271)

33. A149V, A194V, V311I, Y360D, T410I (SEQ ID NO: 272)

34. A194V, V311I, Y360D (SEQ ID NO: 273)
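
The allele entries of Table 1B use a compact substitution notation in which, for example, "R9K" denotes replacement of the arginine at position 9 of the predominant sequence with lysine. By way of non-limiting illustration, the following Python sketch shows how such an entry could be applied to a predominant sequence to derive the corresponding allelic sequence. The function name `apply_substitutions` and the ten-residue reference sequence are hypothetical and are not taken from the Sequence Listing; entries denoting inserted residues (for example, "−272G") are outside the scope of this sketch.

```python
import re

# One substitution such as "R9K": original residue, 1-based position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(reference: str, notation: str) -> str:
    """Apply comma-separated substitutions (e.g. "G7E, S104Y") to a reference sequence."""
    residues = list(reference)
    for token in (part.strip() for part in notation.split(",")):
        match = _SUBSTITUTION.match(token)
        if match is None:
            # Insertions and other non-substitution entries are not handled here.
            raise ValueError(f"unsupported entry: {token!r}")
        original, replacement = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"{token}: position outside the reference sequence")
        if residues[position - 1] != original:
            raise ValueError(f"{token}: reference residue does not match")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy reference, invented for illustration only (not a disclosed sequence).
toy_reference = "MKTAYIAKRQ"
single_variant = apply_substitutions(toy_reference, "R9K")
double_variant = apply_substitutions(toy_reference, "K2R, A4S")
```

The check that the reference residue matches the first letter of each entry guards against applying an allele entry to the wrong predominant sequence.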

Alignment and analysis of genetic sequence data from isolates of S. pneumoniae present in the publicly accessible National Center for Biotechnology Information (NCBI) GenBank database (NCBI, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, Md., 20894, USA) was performed using the MEGABLAST® program for aligning nucleotide sequences (Altschul et al., J. Mol. Biol., 215(3): 403-410 (1990); Zhang et al., J. Comput. Biol., 7 (1/2): 203-214 (2000)) and BLASTP for aligning amino acid sequences (Altschul et al. (1990)) to determine the degree of conservation across the genomic sequences. Ultimately, 231 complete S. pneumoniae genomes were analyzed. This analysis involved aligning over 10 million nucleotide sequences and over 3 million corresponding amino acid sequences. Where possible, corresponding cold spot proteins from non-fully sequenced pneumococcal genomes in the Microbial Genomes database of NCBI were also included in these alignment studies. Thus, for many of the disclosed cold spot polypeptides, sequence homology data included more than the 231 related sequences from the fully sequenced pneumococcal genomes.

By this alignment analysis, 16 putative surface-expressed antigens encoded by cold spot genes were identified that had an average amino acid pairwise sequence homology of 99.50% or greater. A group of the top 21 cold spot genes most likely to be essential by locus criticality (e.g., proximity to essential rrn operons; see, for example, Bouchet et al., Clin. Microbiol. Rev., 21(2): 262-273 (2008)) and/or function, encoding polypeptides corresponding to SEQ ID NOs:1-21 in Table 1A (and their alleles, SEQ ID NOs:22-273 in Table 1B), was selected for further characterization. Using dot blot DNA hybridization, the top 21 cold spot genes were also surveyed for species-wide commonality (presence) across the evolutionary diversity of a worldwide, phylogenetically organized collection of more than 2500 pneumococcal isolates from North America, South America, Europe, Africa, South Asia, and China. The results confirmed that the 21 genes were common to (“present in”, “in-common” among) the more than 2500 S. pneumoniae isolates in the inventor's laboratory collection, indicating that all 21 of the S. pneumoniae proteins were encoded in the genome of every known strain of S. pneumoniae; in other words, such cold spot genes were universally present in the species.
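
By way of non-limiting illustration, an average pairwise figure of the kind reported above can be sketched in Python as follows. The sketch assumes that the sequences have already been aligned to a common length and uses simple percent identity as a stand-in for the "pairwise sequence homology" statistic; it does not reproduce the MEGABLAST or BLASTP calculations, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def average_pairwise_identity(sequences: list[str]) -> float:
    """Mean percent identity over every unordered pair of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alleles, invented for illustration: three identical copies and one
# copy carrying a single substitution at position 9.
toy_alleles = ["MKTAYIAKRQ", "MKTAYIAKKQ", "MKTAYIAKRQ", "MKTAYIAKRQ"]
average = average_pairwise_identity(toy_alleles)
```

With these toy sequences, three of the six pairs are identical and three differ at one of ten positions, so the average is 95%.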

Production of Cold Spot Polypeptides and Fragments Thereof

Cold spot polypeptides, and fragments thereof, of the present invention may be produced by any of a number of techniques known in the art including recombinant genetic engineering, chemical synthesis, and cell-free translation.

For example, a cold spot polypeptide, or immunogenic fragment thereof, can be produced using methods of conventional recombinant nucleic acid technology to express a cold spot gene or portion thereof in cultured cells. See, for example, Sambrook et al., eds., Molecular Cloning: A Laboratory Manual (3rd ed.) (Cold Spring Harbor Press, 2001); Ausubel et al. (eds.), Current Protocols in Molecular Biology (John Wiley & Sons, Inc., New York, 1994); Innis et al. (eds.), PCR Protocols (Academic Press, New York, 1990). For example, a nucleic acid molecule encoding a cold spot polypeptide, or fragment thereof, may be inserted into an expression vector, such that the open reading frame is properly oriented for the expression of the encoded protein under the control of a promoter of choice that is compatible with a particular eukaryotic or prokaryotic host cell. In addition to a promoter, an expression vector should also contain the necessary elements for the transcription of the inserted nucleic acid encoding the desired polypeptide.
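
By way of non-limiting illustration, the requirement that the open reading frame be properly oriented can be checked computationally before cloning. The following Python sketch tests whether an insert reads as an intact open reading frame in the forward direction or only as its reverse complement. It is a simplification that assumes an ATG start codon, the stop codons TAA, TAG, and TGA, and an insert consisting solely of the coding sequence; the function names and the toy insert are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def is_intact_orf(dna: str) -> bool:
    """True if the sequence starts with ATG, ends with a stop, and has no internal stop."""
    dna = dna.upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in _STOP_CODONS
        and not any(codon in _STOP_CODONS for codon in codons[1:-1])
    )


def insert_orientation(dna: str) -> str:
    """Report whether the insert encodes an intact ORF as given or when reversed."""
    if is_intact_orf(dna):
        return "forward"
    if is_intact_orf(reverse_complement(dna)):
        return "reverse"
    return "no intact ORF"


# Toy insert, invented for illustration: ATG AAA ACC GCG TAA.
toy_insert = "ATGAAAACCGCGTAA"
orientation = insert_orientation(toy_insert)
flipped = insert_orientation(reverse_complement(toy_insert))
```

An insert reported as "reverse" would need to be recloned in the opposite orientation relative to the promoter for the encoded polypeptide to be expressed.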

By way of non-limiting example, for expression of a cold spot polypeptide or fragment thereof in host cells, an expression vector encoding a cold spot polypeptide, or desired fragment thereof, can be introduced into a host cell by standard techniques. A wide variety of techniques may be used for the introduction of exogenous DNA into a prokaryotic or eukaryotic host cell, including, but not limited to, electroporation (for example, Neumann et al., EMBO J., 1(7):841-845 (1982); Wong et al., Biochem. Biophys. Res. Commun., 107(2):584-587 (1982); Potter et al., Proc. Natl. Acad. Sci. USA, 81(22):7161-7165 (1984), incorporated herein by reference), calcium-phosphate precipitation, DEAE-dextran transfection, lipofection, protoplast fusion, particle bombardment, polyethylene glycol-mediated DNA uptake (see, for example, Sambrook et al. (eds.), Molecular Cloning: A Laboratory Manual, second edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), which is hereby incorporated by reference in its entirety), and fusion of protoplasts with other entities (for example, minicells, other cells, liposomes, or other fusible lipid-surfaced bodies that contain the desired cold spot gene or coding sequence). See, for example, Fraley et al., Proc. Natl. Acad. Sci. USA, 79(6): 1859-1863 (1982), which is hereby incorporated by reference in its entirety. Such methods of introduction of DNA into a target cell are variously referred to in the art as, e.g., “transformation”, “conjugation”, “transduction”, “transfection”, etc., and all such methods and techniques are contemplated and may be appropriately used according to the desires of the practitioner.

Using standard methods, such as described above, a nucleic acid molecule encoding a cold spot polypeptide, or fragment thereof (for example, an immunogenic fragment), can be cloned into an expression vector, which in turn is introduced into a host cell for expression of the encoded cold spot polypeptide or fragment thereof. A variety of expression systems comprising suitable expression vectors and host cells are available in the art for expressing a recombinant nucleic acid encoding a cold spot polypeptide or fragment thereof. Examples of suitable host cells include, but are not limited to, bacterial cells, fungal cells, mammalian cells, insect cells, plant cells, and protist cells. For expression of large quantities of cold spot polypeptides, prokaryotic hosts will typically be employed. Suitable prokaryotic host cells useful in the invention include, but are not limited to, Escherichia coli, Bacillus (for example, B. subtilis), Streptomyces (for example, S. lividans), Salmonella, and Pseudomonas. Escherichia coli is preferred. The use of a Streptococcus (for example, S. pneumoniae) strain as a host is possible but not preferred, as it would require separation of the recombinantly expressed cold spot polypeptide from other streptococcal proteins.

Under certain circumstances, expression in a eukaryotic host cell may be favored. Preferred mammalian host cells for expressing a recombinant cold spot polypeptide or fragment thereof of the invention include, but are not limited to, Chinese Hamster Ovary (CHO) cells (including dhfr− CHO cells, described in Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216-4220 (1980), preferably used with a DHFR selectable marker as described in Kaufman and Sharp, J. Mol. Biol., 159: 601-621 (1982)), NS0 myeloma cells, COS cells, and SP2 cells.

By way of non-limiting example, when a recombinant expression vector carrying an inserted S. pneumoniae cold spot gene is introduced into a host cell, the cold spot polypeptide is produced by culturing the host cell for a period of time sufficient to allow for expression of the cold spot polypeptide in the host cells. Cold spot polypeptides can be recovered from the cultured host cells using standard protein purification methods and then formulated into immunogenic compositions or vaccines following standard methods and protocols in the art.

As should be clear from the above comments, a variety of methods are available in the art for the skilled practitioner to express cold spot polypeptides or fragments thereof. Alternatively, in view of the full disclosure herein of the amino acid sequence encoded by each cold spot gene (Table 1A and Table 1B), a cold spot polypeptide or fragment thereof may also be produced to order from a contract research organization. Similarly, polyclonal or monoclonal antibodies that bind to a particular cold spot polypeptide or antigenic fragment thereof may be produced using well known immunization techniques, which may also be performed as a service by contract research organizations, such as ProMab (Richmond, Calif.) or GenScript (Piscataway, N.J.).

Compositions Comprising Cold Spot Polypeptides and Antigenic Fragments Thereof

The polypeptides encoded by the recombinational cold spot genes described above (“cold spot polypeptides”) are useful in immunogenic compositions that are capable of eliciting an immune response to the cold spot polypeptide or an antigenic fragment thereof (such as the isolated cold spot extracellular domain). An immunogenic composition of the invention comprises at least one cold spot polypeptide or an immunogenic fragment thereof. Such compositions may also advantageously include a pharmaceutically acceptable carrier (vehicle) or diluent and/or an adjuvant to enhance elicitation of an immune response. Immunogenic compositions of the invention may be used to raise polyclonal or monoclonal antibodies capable of specific binding to the cold spot polypeptide, or an immunogenic fragment thereof, using standard methods and compositions known in the art.

For purposes of vaccine development, cold spot polypeptides comprising the native amino acid sequence of an in-common S. pneumoniae surface antigen, in particular the extracellular domain of the surface antigen, or an immunogenic fragment thereof, are preferred. Preferably, the predominant cold spot amino acid sequence among known alleles will be used, but other allelic sequences may be used and are expected to be effective, due to the extremely high sequence similarity among alleles, exceeding 95%, and usually exceeding 99%, average amino acid sequence pairwise homology, among the 231 known S. pneumoniae genomes. Although less preferred, a minor degree of artificial mutation from the predominant native amino acid sequence can be tolerated without destroying the desired immunogenicity that produces antibodies recognizing the native antigenic polypeptide and natural pneumococcal cells. Such artificial mutations may be introduced for other reasons, such as increasing recombinant expression, avoiding intracellular protease digestion, improving solubility, aiding purification, and the like; and, accordingly, amino acid variants of the aforementioned cold spot polypeptides that maintain at least 90%, preferably 95%, and most preferably 98% or more sequence identity with the predominant native amino acid sequence for any given cold spot polypeptide may be used, where the variant polypeptide maintains immunogenicity, i.e., the ability to elicit production of antibodies reactive with the naturally occurring cold spot polypeptide.
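
By way of non-limiting illustration, the sequence identity thresholds recited above (at least 90%, preferably 95%, and most preferably 98% or more) can be expressed as a simple classification. The following Python sketch assumes that the variant and the predominant native sequence are already aligned to the same length; the function name, tier labels, and toy sequences are hypothetical. Sequence identity alone does not establish that a variant retains immunogenicity, which must be determined experimentally.

```python
def sequence_identity_tier(reference: str, variant: str) -> str:
    """Classify a variant by its percent identity to the predominant native sequence."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for r, v in zip(reference, variant) if r == v)
    length = len(reference)
    # Integer comparisons avoid floating-point rounding at the thresholds.
    if matches * 100 >= 98 * length:
        return "most preferred (98% or more)"
    if matches * 100 >= 95 * length:
        return "preferred (at least 95%)"
    if matches * 100 >= 90 * length:
        return "usable (at least 90%)"
    return "below 90%"


# Toy 100-residue reference, invented for illustration only.
toy_reference = "MKTAYIAKRQ" * 10


def with_substitutions(count: int) -> str:
    """Toy variant in which the first `count` residues are replaced with 'X'."""
    return "X" * count + toy_reference[count:]


tiers = [sequence_identity_tier(toy_reference, with_substitutions(n)) for n in (1, 3, 8, 15)]
```

For the 100-residue toy reference, variants with 1, 3, 8, and 15 substitutions have 99%, 97%, 92%, and 85% identity, respectively, and fall into the four tiers in that order.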

An immunogenic composition comprising a cold spot polypeptide, or immunogenic fragment thereof, may be used in any of a variety of methods known in the art for raising polyclonal or monoclonal antibodies to the polypeptide or antigenic fragment thereof. Such methods will typically require administration to a non-human animal (for example, mouse, rat, rabbit, guinea pig, hamster, chicken) of a composition comprising a cold spot polypeptide, or antigenic fragment thereof, that is dissolved, suspended, or otherwise contained within a physiologically acceptable vehicle and may further comprise one or more additional physiologically acceptable components including, but not limited to, an adjuvant and/or an excipient.

Preferred immunogenic compositions and vaccines according to the invention comprise a cold spot polypeptide, or an immunogenic fragment thereof, as described herein, and may further comprise one or more pharmaceutically acceptable components, such as a pharmaceutically acceptable vehicle, adjuvant, excipient, or other ingredient. Immunogenic compositions and vaccines described herein are understood to be dispersed in sterile vehicles and used under sterile conditions to avoid introduction of infectious agents and/or undesirable or interfering substances into the human or non-human subject.

Pharmaceutical vehicles or pharmaceutical carriers useful in compositions described herein for eliciting an immune response or producing antibodies to a cold spot polypeptide, or immunogenic fragment thereof, are well known to practitioners in the art and include, but are not limited to, sterile water; pyrogen-free preparations of sterile water; saline; phosphate buffered saline (PBS); dextrose; glycerol; ethanol; and combinations thereof. In some compositions, it may be preferable to include physiological or pharmaceutically acceptable isotonic agents, for example, but not limited to, sugars; polyalcohols, such as mannitol or sorbitol; sodium chloride; and combinations thereof. Pharmaceutically acceptable carriers may further comprise minor amounts of auxiliary substances such as wetting or emulsifying agents, preservatives, or buffers to enhance the shelf life or effectiveness of an immunogenic composition or vaccine.

Excipients may also be used in pharmaceutical compositions according to the invention. An excipient is generally any compound or combination of compounds that provides a desired feature to a composition. The pH may be adjusted in a composition as necessary, for example, to promote or maintain solubility of component ingredients, to maintain stability of one or more ingredients in a composition, and/or to deter undesired growth of microorganisms that inadvertently may be introduced at some point during the preparation or use of a composition or vaccine.

Particularly with respect to raising polyclonal antibodies in non-human animals, a polypeptide may be conjugated to or otherwise associated with a carrier protein or matrix that enhances the immune response to the polypeptide. The term “carrier protein” in the context of an immunogenic composition refers to a protein that elicits an immune response to itself and/or to an antigen conjugated or otherwise associated with or complexed with such carrier protein. In a conjugate immunogenic composition, an antigen is reacted with a carrier protein, so that the antigen and carrier protein are covalently linked to each other by design. Preferably, a carrier protein contains epitopes recognized by a helper T-cell to stimulate B cells that produce antibodies to the antigen that is conjugated to the carrier protein. Carrier proteins that are useful in raising antibodies to a polypeptide that is conjugated or otherwise associated with such carrier proteins include, but are not limited to, keyhole limpet hemocyanin (KLH), blue carrier protein, ovalbumin, bovine serum albumin, and derivatives thereof (for example, pegylated forms). Also encompassed by the definition of a “carrier protein” are multi-antigenic peptides (MAPs), which are branched peptides having a plurality of reactive sites to which antigenic molecules may be linked. Preferably, a MAP includes lysine (Lys) residues that can link to antigenic molecules. Exemplary carrier proteins may also include, but are not limited to, bacterial toxins and toxoids, which may be mutated or chemically treated, for example, to reduce undesirable reactogenicity with tissues of an individual.
Such carrier proteins include, for example, diphtheria toxin or a non-toxic mutant thereof, for example, diphtheria toxoid; tetanus toxin, or a non-toxic mutant thereof, for example, tetanus toxoid; Pseudomonas aeruginosa exotoxin A, or a non-toxic mutant thereof; cholera toxin B subunit; tetanus toxin fragment C; bacterial flagellin; pneumolysin; listeriolysin O (LLO, and related molecules); an outer membrane protein of Neisseria meningitidis; Pseudomonas aeruginosa Hcp1 protein; Escherichia coli heat labile enterotoxin; Shiga-like toxin; human LTB protein; the dominant negative inhibitor mutant (DNI) of the Protective Antigen of Bacillus anthracis; and Escherichia coli beta-galactosidase.

A polypeptide can be conjugated to a carrier protein using standard protocols and cross-linking (or “coupling”) agents known in the art. Some cross-linking agents are “homobifunctional” coupling agents that have two identical reactive sites to link a polypeptide to a carrier protein. Some cross-linking agents are “heterobifunctional” coupling agents that have two different reactive sites to link a polypeptide (at one site) with a carrier protein (at the other different site). Examples of cross-linking agents that may be used for conjugating a polypeptide to a carrier protein for raising antibody to the polypeptide include, but are not limited to, any one of glutaraldehyde, maleimide agents (for example, m-maleimidobenzoyl-N-hydroxysuccinimide ester), carbodiimide agents, and bis-diazotized benzidine. Those skilled in the art will be able to select appropriate cross-linking agents and cross-linking chemistries to effect conjugation of desired antigens with a chosen carrier protein. For example, glutaraldehyde is a suitable homobifunctional cross-linking agent for creating cross-links between the NH₂-functional side-chains of lysine residues in the polypeptide antigen and a carrier protein.

Notwithstanding the foregoing discussion, the immunogenic cold spot polypeptides of the present invention will typically not be conjugated or cross-linked to a carrier protein, as this would not be necessary to render them immunogenic. In fact, it is contemplated that the cold spot polypeptides disclosed herein may themselves be useful as a carrier protein for conjugation with weakly immunogenic antigens, such as S. pneumoniae capsular polysaccharides. Additionally, the cold spot polypeptides of the present invention may be admixed with conventional conjugate vaccines (e.g., PREVNAR®-7, PREVNAR®-13) or capsular polysaccharide antigen compositions (e.g., PNEUMOVAX®-23) to produce a compound vaccine composition. Such compound compositions would be expected to have extended range—vaccination coverage extending beyond the conjugate vaccine or capsular polysaccharide vaccine components used in the admixture.

Methods for the preparation and formulation of vaccine compositions are well known to those skilled in the art. The choice of ingredients will for instance vary depending on the intended administration route of the composition. For example, vaccine compositions for parenteral administration may include pharmaceutically acceptable sterile aqueous or non-aqueous solvents, suspensions, and emulsions. Examples of non-aqueous solvents include, but are not limited to, propylene glycol, polyethylene glycol, vegetable oils (such as olive oil), and injectable organic esters such as ethyl oleate. Topical carriers or occlusive dressings can be used to increase skin permeability and enhance antigen absorption for topically applied vaccine compositions. Liquid dosage forms for oral administration may generally comprise, for example, a liposome composition containing the liquid dosage form. Suitable forms for suspending liposomes include emulsions, suspensions, solutions, syrups, and elixirs containing inert diluents commonly used in the art, such as purified sterile water.

An immunogenic composition or vaccine comprising an S. pneumoniae cold spot polypeptide, or immunogenic fragment thereof, of the invention may also comprise any of a variety of known adjuvants that enhance an immune response, especially production of antibodies, to an antigen. Examples of adjuvants that may be used in an immunogenic or vaccine composition comprising an S. pneumoniae cold spot polypeptide, or an immunogenic fragment thereof, described herein include, but are not limited to:

(1) Freund's complete adjuvant (FCA) or Freund's incomplete adjuvant (FIA) may be employed to boost an immune response to a polypeptide in a non-human animal in order to enhance the level (titer) of antibody produced to the polypeptide. FCA comprises a water in mineral oil emulsion and inactivated mycobacteria. FIA comprises a water in mineral oil emulsion without the mycobacteria component of FCA. FCA and FIA are particularly useful when antibody is being produced in a non-human animal for research purposes. Owing to various side effects, including severe local reactions at sites of inoculation, FCA is not approved for use in humans and, therefore, will not comprise a component of an immunogenic composition or vaccine intended for administration to human subjects. FIA has been used in some vaccines administered to humans, but is less preferred in view of undesirable side effects compared with other approved alternatives, such as alum salts and monophosphoryl lipid A discussed below.

(2) Aluminum salts, such as alum salts, are particularly useful as adjuvants to stimulate an immune response to a polypeptide in vaccine compositions administered to humans. The term “alum” is commonly used to refer to a compound that is a combination of aluminum and potassium sulfate salt (such as KAl(SO₄)₂·12H₂O), or to an aluminum hydroxide. Although relatively weak when compared to other adjuvants, alum salts are the most commonly employed adjuvants used in vaccine compositions that are manufactured for administration to human subjects owing to the low incidence, if any, of adverse side effects. The term “alum” may sometimes also be used to refer to a broader group of aluminum and ammonium sulfate salts. Other aluminum salts that may be used to stimulate an immune response to a cold spot polypeptide, or immunogenic fragment thereof, present in an immunogenic composition or vaccine include, without limitation, aluminum hydroxide, aluminum phosphate, aluminum sulfate, hydrated alumina, alumina hydrate, alumina trihydrate (ATH), aluminum hydrate, aluminum trihydrate, alhydrogel, Superfos, Amphogel, aluminum (III) hydroxide, aluminum hydroxyphosphate sulfate, amorphous alumina, trihydrated alumina, and trihydroxyaluminum.

(3) Monophosphoryl lipid A, which, like alum, is an adjuvant that has been approved and used in vaccines manufactured for administration to human subjects.

(4) Oil-in-water emulsion formulations (with or without other specific immunostimulating agents such as muramyl peptides or bacterial cell wall components), such as, for example, (a) MF59 (International Publication No. WO 90/14837), containing 5% squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing various amounts of MTP-PE (see below, although not required)) formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer (Microfluidics, Newton, Mass.); (b) SAF, containing 10% squalene, 0.4% Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion or vortexed to generate a larger particle size emulsion; and (c) RIBI™ adjuvant system (RAS), (Corixa, Hamilton, Mont.) containing 2% squalene, 0.2% Tween 80, and one or more bacterial cell wall components from the group consisting of 3-O-deacylated monophosphoryl lipid A (MPL™) described in U.S. Pat. No. 4,912,094 (Corixa), trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL+CWS (Detox™).

(5) Muramyl peptides used as adjuvants may include, but are not limited to, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP) and N-acetyl-normuramyl-L-alanine-2-(1′-2′dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (MTP-PE).

(6) Saponin adjuvants, such as Quil A or STIMULON™ QS-21 (Antigenics, Framingham, Mass., USA) (U.S. Pat. No. 5,057,540) may be used, or particles generated therefrom such as ISCOMs (immunostimulating complexes).

(7) Bacterial lipopolysaccharide (LPS) or synthetic lipid A analogs, such as aminoalkyl glucosamine phosphate compounds (AGP), or derivatives or analogs thereof (available from Corixa), as described in U.S. Pat. No. 6,113,918. One such AGP is 2-[(R)-3-tetradecanoyloxytetradecanoylamino]ethyl 2-deoxy-4-O-phosphono-3-O-[(R)-3-tetradecanoyloxytetradecanoyl]-2-[(R)-3-tetradecanoyloxytetradecanoylamino]-β-D-glucopyranoside, which is also known as 529 (formerly known as RC529), which is formulated as an aqueous form or as a stable emulsion.

(8) Synthetic polynucleotides, such as oligonucleotides containing CpG motif(s) as described, for example, in U.S. Pat. No. 6,207,646 may also be used in vaccine compositions of the invention.

(9) Cytokines, including, but not limited to, interleukins (for example, any one of IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, IL-15, IL-18, etc.), interferons (for example, gamma interferon), granulocyte macrophage colony stimulating factor (GM-CSF), macrophage colony stimulating factor (M-CSF), tumor necrosis factor (TNF), and costimulatory molecules B7-1 and B7-2, may be used as adjuvants in immunogenic compositions and vaccines of the invention.

(10) Detoxified mutants of a bacterial ADP-ribosylating toxin, such as a cholera toxin (CT), either in a wild-type or mutant form, for example, where the glutamic acid at amino acid position 29 is replaced by another amino acid, preferably a histidine, in accordance with International Publication No. WO 00/18434 (see also WO 02/098368 and WO 02/098369), a pertussis toxin (PT), or an E. coli heat-labile toxin (LT), particularly LT-K63, LT-R72, CT-S109, PT-K9/G129 (see, e.g., WO 93/13202 and WO 92/19265).

Additional descriptions and details of the above adjuvants and others may be found in various pharmaceutical science and compositions handbooks and reviews available in the art. See, for example, Petrovsky et al., Immunol. Cell Biol., 82: 488-496 (2004), incorporated herein by reference.

It is understood that any immunogenic composition or vaccine comprising a cold spot polypeptide, or immunogenic fragment thereof, of the invention will require regulatory approval for non-experimental use of the composition in human subjects. Accordingly, an adjuvant that is not commonly employed in immunogenic compositions and vaccines may nevertheless be approved when present as a component of a particular formulation that is granted regulatory approval for use in humans. A regulatory approval process typically weighs the benefit of a vaccine with respect to pneumococcal infection and disease against any risk or severity of untoward side effects.

Thus, any of a variety of pharmaceutically acceptable ingredients may be used in preparing an immunogenic composition comprising one or more S. pneumoniae cold spot surface antigens, or immunogenic polypeptide fragments thereof, according to the invention. Preferably, such immunogenic compositions are prepared for use in raising an immune response in a human that produces antibodies reactive with a majority of S. pneumoniae capsular serotypes, more preferably with at least 80 different S. pneumoniae serotypes, even more preferably with at least 90 different S. pneumoniae serotypes, and still more preferably, with all or nearly all pneumococcal strains (currently, ˜94 known serotypes).

The invention also provides methods of making immunogenic compositions described above. For example, a method of making an immunogenic composition for raising an immune response producing antibodies reactive against at least 80 different serotypes of S. pneumoniae comprises:

(1) selecting one or more cold spot surface antigens of S. pneumoniae,

(2) isolating one or more polypeptide segments from an extracellular domain of the one or more cold spot surface antigens selected in step (1), and

(3) formulating said one or more isolated polypeptide segments obtained in step (2) by admixing said isolated polypeptide segments with a pharmaceutically acceptable carrier, to produce an immunogenic composition for inoculating human subjects against S. pneumoniae infection.

In an embodiment of the above method of making an immunogenic composition according to the invention, 1, 2, 3, 4, or 5 cold spot surface antigens are selected in step (1). In another embodiment of the above method, 1 or 2 cold spot surface antigens are selected. In a further embodiment of the method described above, 1 cold spot surface antigen is selected in step (1).

Preferably, in the above method of making an immunogenic composition according to the invention, the composition, when administered to a mammalian subject, is effective for raising an immune response producing antibodies reactive with at least 90 S. pneumoniae serotypes, more preferably with at least 91 S. pneumoniae serotypes, and even more preferably with at least 93 S. pneumoniae serotypes.

In addition, in the above method of making an immunogenic composition according to the invention, the immunogenic composition may also include one or more adjuvants.

Uses of Cold Spot Polypeptide Compositions

As noted above, an immunogenic composition comprising an S. pneumoniae cold spot polypeptide, or immunogenic fragment thereof, described herein may be used to raise polyclonal or monoclonal antibodies for research or for passive immunity (immunotherapy).

For example, a polyclonal or monoclonal antibody raised against a cold spot polypeptide can be used to identify fragments of the cold spot polypeptide that are bound by the antibody (“antigenic” fragments). In a general, but non-limiting, example, standard deletion mutation protocols are available in the art and/or from commercial sources (for example, polymerase chain reaction (PCR) or nuclease deletion protocol) that can be used to generate a nested set of deletion mutated (truncated) sequences of a cold spot gene. Each of the corresponding encoded deletion mutant (truncated) polypeptides may then be expressed using a standard recombinant expression system (for example, but not limited to, a bacterial or yeast expression system) or by a cell-free in vitro transcription and translation system, which are readily available in the art and/or from commercial sources. Some deletion protocols also provide internal deletions of sequences for additional or closer analyses of encoded mutated polypeptides. Each of the expressed mutant polypeptides can then be assayed to determine whether or not it is bound by a polyclonal or monoclonal antibody raised to the cold spot protein using any of a variety of immuno-detection assays (for example, enzyme-linked immunosorbent assay (ELISA), immuno-dot blot, Western blot), which provide relatively rapid, multi-sample determination of antibody binding. Such mutant analysis to identify antigenic fragments of a cold spot polypeptide can also be carried out under contract with a commercial laboratory.
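By way of non-limiting illustration, the design of a nested set of truncated coding sequences can be sketched computationally before any laboratory work is undertaken. The following Python sketch is an assumption-laden illustration only: the function name `nested_c_terminal_truncations`, its parameters, and the short example sequence are hypothetical and are not taken from any protocol described herein. It simply lists progressively shorter, in-frame 5′ portions of a coding sequence, each of which would encode a C-terminally truncated polypeptide.

```python
def nested_c_terminal_truncations(coding_seq, step_codons, min_codons=1):
    """List progressively shorter in-frame 5' portions of a coding sequence.

    coding_seq  -- nucleotide string whose length is a multiple of three
    step_codons -- number of codons removed from the 3' end at each step
    min_codons  -- shortest truncation (in codons) that is still reported
    (All names are hypothetical and chosen for this illustration only.)
    """
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    if step_codons < 1:
        raise ValueError("step_codons must be at least 1")

    total_codons = len(coding_seq) // 3
    truncations = []
    kept = total_codons - step_codons
    while kept >= min_codons:
        # Slicing on codon boundaries keeps every truncation in frame.
        truncations.append(coding_seq[: kept * 3])
        kept -= step_codons
    return truncations


# Hypothetical 6-codon sequence, truncated two codons at a time.
toy_truncations = nested_c_terminal_truncations("ATGGCTAAAGGTTTTCCC", 2)
```

Each sequence returned would still need to be cloned, expressed, and assayed for antibody binding as described above; the sketch addresses only the enumeration of candidate truncation end points.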

Other immunogenic compositions may be designed and used to determine efficacy as a potential vaccine composition to produce antibody that provides protective immunity against S. pneumoniae infection in a human or non-human animal subject or an animal model of pneumococcal infection. A “protective” immunity or “protective” immune response, when used in the context of a S. pneumoniae cold spot polypeptide or immunogenic fragment thereof, a composition comprising a S. pneumoniae cold spot polypeptide or immunogenic fragment thereof, or a method comprising administration of a S. pneumoniae cold spot polypeptide or immunogenic fragment thereof indicates that inoculation with such polypeptide, such composition, or according to such method results in a detectable immune response of sufficient magnitude (antibody titer) to provide a level of protection against S. pneumoniae infection in the inoculated human or non-human animal subject. Such protective immunity includes therapeutic and/or prophylactic effects that reduce the likelihood of S. pneumoniae infection or the likelihood of contracting one or more disorders resulting from such infection, as well as reducing the severity of the infection and/or a disorder resulting from such infection. As such, a protective immune response includes, for example, the ability to reduce bacterial load, ameliorate one or more disorders or symptoms associated with S. pneumoniae infection, and/or to delay the onset of disease progression resulting from S. pneumoniae infection. Diseases resulting from S. pneumoniae infection of particular interest with respect to this invention are pneumonia (lung infection), otitis media (middle ear infection), sinusitis (sinus infection), sepsis (bacteremia, bloodstream infection), and meningitis (infection of the meninges, which comprise three membranes covering the brain and spinal cord).

The level of protection can be assessed initially using animal models and subsequently phased clinical trials. The nature of an immune response may be measured, for example, by flow cytometry, development of antibodies, ELISA, opsonophagocytosis (OPA), or by measuring resistance to pathogen challenge in vivo. In a non-limiting example, to determine whether a protective immune response is induced by compositions of the present invention, immunized test animals can be challenged with Streptococcus pneumoniae and the growth and spread of the bacteria in the animals, survival of challenged animals, or change in health of the challenged animals may be measured over time. Such a study may also monitor progression of symptoms of any disease caused by pneumococcal infection in one or more tissues or organs.

As explained above, cold spot polypeptides encoded by S. pneumoniae cold spot genes are highly conserved across most if not all known capsular serotypes of S. pneumoniae. Therefore, a pharmaceutically acceptable immunogenic composition comprising a cold spot polypeptide, or an immunogenic fragment thereof, as described herein may be used as a vaccine that is administered to a human subject to elicit an immune response to the cold spot polypeptide or immunogenic fragment thereof, and thereby to any or all S. pneumoniae capsular serotypes that express the same highly conserved cold spot polypeptide or immunogenic fragment. Such a vaccine is particularly desirable because currently marketed vaccines are designed to elicit an immune response to only a limited number of the 94 or so known pneumococcal polysaccharide capsular serotypes. As just one example, the vaccine currently marketed as PREVNAR-13® (Wyeth LLC, marketed by Pfizer, New York, N.Y.) comprises 13 polysaccharide capsular antigens from 13 selected pneumococcal capsular serotypes and thus is formulated to elicit an immune response to infection by any of the 13 selected S. pneumoniae serotypes but none of the remaining 81 known S. pneumoniae capsular serotypes. In contrast, each cold spot polypeptide described herein is highly conserved and expressed across all or nearly all known S. pneumoniae capsular serotypes, and therefore an immune response elicited to a cold spot polypeptide, or immunogenic fragment thereof, provides an immune response (such as antibodies) directed to most if not all known serotypes of S. pneumoniae. Moreover, the likelihood of emergence of a breakthrough escape mutant is significantly lower owing to the recombinationally quiescent (cold spot) origin of the gene encoding the cold spot polypeptide. It is expected that a vaccine composition comprising as few as two cold spot polypeptides or immunogenic fragments thereof would virtually eliminate the possibility of a breakthrough strain of S. pneumoniae emerging.

Accordingly, in a preferred embodiment, an immunogenic composition of the invention useful as a vaccine against pneumococcal infection comprises at least one S. pneumoniae cold spot polypeptide or immunogenic fragment thereof, as described herein; a pharmaceutically acceptable vehicle (such as PBS); and, as optionally desired, a pharmaceutically acceptable adjuvant (for example, an alum salt or another adjuvant described above). Based on the property of a vaccine composition of the invention comprising at least one S. pneumoniae cold spot polypeptide or immunogenic fragment thereof to elicit an immune response that produces antibodies to most if not all pneumococcal serotypes, a vaccine composition of the invention may also be referred to as a “capsular serotype-independent vaccine” or, alternatively, as a “universal S. pneumoniae vaccine”.

Detection and Characterization of Antibodies Raised Against Cold Spot Polypeptides

Antibodies raised to a S. pneumoniae cold spot polypeptide, or an immunogenic fragment thereof, may be detected using any of a variety of assays that are known in the art and/or that are commercially available. Such assays present a cold spot polypeptide or an antigenic fragment thereof that can be brought into contact with a sample that contains or is suspected of containing antibody raised to the cold spot polypeptide or an immunogenic fragment thereof. Such assay formats include, but are not limited to, immunoprecipitation, enzyme linked immunosorbent assay (ELISA), opsonophagocytosis (OPA), immuno-dot blot assay, Western (immuno) blotting, immuno-affinity chromatography, flow cytometry (FCM), and fluorescence-activated cell sorting (FACS). In such assays, antibody bound to the cold spot polypeptide or antigenic fragment thereof, forming an antibody-antigen complex, can be detected by any of a variety of detection systems available in the art. Such detection systems may include the use of a secondary antibody that will bind to the antibody raised against the cold spot polypeptide or fragment thereof. Such a secondary antibody may be conjugated or otherwise linked to a detectable label, such as, but not limited to, a radiolabel, a fluorescent label, biotin, or an enzyme that can generate a detectable signal when provided with its corresponding substrate. Examples of enzymes that may be conjugated to a secondary antibody in such detection systems include, but are not limited to, horseradish peroxidase, alkaline phosphatase, and luciferase. Other standard assays for detecting antibody or determining the titer of antibody in a sample are known to the skilled practitioner in the art.

Antibodies can be assessed for binding affinity and potency using various analytical ELISA formats or by detecting binding in real time using surface plasmon resonance (SPR) techniques using, for example, a BIACORE™ SPR detection instrument (GE Healthcare Bio-Sciences, Pittsburgh, Pa., USA). Such affinity assays may be especially useful in characterizing antibody used for research or passive immunity studies.

Additional embodiments and features of the invention will be apparent from the following non-limiting examples.

EXAMPLES

Streptococcus Pneumoniae Highly Conserved “Cold Spot” Proteins Useful in Pneumococcal Vaccines

By analysis of 50-kb regions upstream and downstream of the four rrn operons of all available S. pneumoniae genomes on the NCBI website as outlined supra, recombinationally quiescent or “cold spot” genes were identified that encode polypeptides (cold spot polypeptides) of extremely high retention from strain to strain and extremely low variability in sequence. Among such cold spots in the S. pneumoniae genome, it was hypothesized that proteins would be encoded that would be valuable as vaccine antigens. Using sub-cellular protein localization algorithms (Gpos-mPloc, described by Shen et al., Protein Pept. Lett., 16(12): 1478-1484 (2009); PSORTdb described by Yu et al., Nucleic Acids Res., 39 (database issue): D241-D244 (2011); and LocateP described by Zhou et al., BMC Bioinformatics, 9(173): 17 pages (2008)), 104 cold spot genes, encoding putative expressed proteins, were identified from the rrn flanking regions. At the time of the analysis, 231 fully sequenced S. pneumoniae genomes were available for comparison. Cold spot genes that were in-common (present) in all strains and showed very low sequence variability among all 231 sequenced genomes reported in the NCBI pneumococcal database were of particular interest, as these two characteristics gave an initial indication that such phenotypes might be essential for survival of the cell, since S. pneumoniae genomes deleting or significantly mutating the protein evidently did not survive to propagate.

A search of the scientific literature was conducted on each of the 104 genes and their predicted expression products, and on cognate proteins of other organisms, to determine likely biological properties and functions of the encoded surface proteins. From the literature survey, ˜79 genes were selected as likely encoding surface proteins and ranked in order of most likely to be essential to cell survival. The top 21 surface proteins from the 79 identified surface protein genes located in the flanking regions of the S. pneumoniae rrn operons were further analyzed for universal presence in the 231 known, fully sequenced S. pneumoniae genomes and to determine the exact degree of sequence invariability across the population of different S. pneumoniae strains. Sequence information and amino acid sequence homology for the 21 cold spot surface proteins and their allelic variants is shown in Tables 1A and 1B, above. Identification of 58 additional surface proteins from the cold spot flanking regions adjacent the rrn operons of S. pneumoniae is made in Table 2, below.

The 58 genes in the table below are listed by consecutive map positions, located immediately up-stream and down-stream of the 4 rrn operons (rrnA, rrnB, rrnC, rrnD) of the genomically sequenced S. pneumoniae isolate TIGR4. The data in Table 2 were acquired by examination of at least 123 pneumococcal isolates. The data presented in Tables 1A and 1B (supra) are thus more complete, representing comparison of all 231 fully sequenced pneumococcal genomes (as well as additional sequences from incomplete genomes) available at the time. In Table 2, below, in addition to the mapped position for the 58 additional surface proteins identified, the mapped position in the TIGR4 S. pneumoniae genome of the four rrn operons is also given, to show the location of the cold spot genes in relation to the positions of the rrn operons.

TABLE 2

Additional Surface Protein-Encoding Genes from rrn Operon-Flanking Regions of S. pneumoniae

| GenBank Designation in TIGR4 Genomic Sequence | Encoded Protein and Function (if known) | Presence across ≥123 genomes | % Sequence Homology | Gene mapped position in TIGR4 genome, including stop codon (SEQ ID NO:) | Protein length (aa) |
|---|---|---|---|---|---|
| SP_2207 | ComF, putative competence protein; KEGG pathway Type II secretion system | 100% | >99% | 2127745-2128407 (SEQ ID NO: 274) | 220 |
| SP_2216 | PcsB, putative murein hydrolase or protein required for cell wall separation | 98% | >99% | 2135267-2136445 (SEQ ID NO: 275) | 392 |
| SP_2223 | Hypothetical protein; Function unknown | 100% | >99% | 2140901-2141731 (SEQ ID NO: 276) | 276 |
| SP_2226 | Hypothetical protein; Function unknown | 100% | >99% | 2144453-2144821 (SEQ ID NO: 277) | 122 |
| SP_0005 | PTH, peptidyl-tRNA hydrolase | 100% | >99% | 4382-4951 (SEQ ID NO: 278) | 189 |
| SP_0007 | SS4Dom | 100% | >99% | 8519-8785 (SEQ ID NO: 279) | 88 |
| SP_0008 | Hypothetical protein, PY-triad | 100% | >99% | 8778-9146 (SEQ ID NO: 280) | 122 |
| SP_0010 | Hypothetical protein, PY-triad | 100% | >99% | 9266-10534 (SEQ ID NO: 281) | 422 |
| rrnA operon 5S-23S-16S genes | 5S, 23S and 16S rrn structural RNA molecules | 100% | | 15344-20236 (SEQ ID NO: 282) | |
| SP_0016 | IS630-Spn1, transposase Orf2; Transposase and inactivated derivatives | 100% | >99% | 20929-21267 (SEQ ID NO: 283) | 112 |
| SP_0020 | cytidine/deoxycytidylate deaminase family protein | 100% | >99% | 24319-24786 (SEQ ID NO: 284) | 155 |
| SP_0021 | Dut, deoxyuridine 5′-triphosphate nucleotidohydrolase | 100% | >99% | 24973-25416 (SEQ ID NO: 285) | 147 |
| SP_0024 | conserved hypothetical protein, Carbonic anhydrase | 100% | >99% | 27381-27878 (SEQ ID NO: 286) | 165 |
| SP_0030 | ccs16, competence-induced protein Ccs16 | 100% | >99% | 30563-30889 (SEQ ID NO: 287) | 108 |
| SP_0037 | plsX, fatty acid/phospholipid synthesis protein; Lipid transport and metabolism | 98% | >99% | 38101-39093 (SEQ ID NO: 288) | 330 |
| SP_0041 | blpU, bacteriocin, PY-triad | 100% | >97% | 39871-40101 (SEQ ID NO: 289) | 76 |
| SP_0046 | purF, amidophosphoribosyl transferase; Nucleotide transport and metabolism | 100% | >97% | 49228-50670 (SEQ ID NO: 290) | 480 |
| SP_0048 | purN, phosphoribosyl glycinamide formyltransferase; Nucleotide transport and metabolism | 98% | >99% | 51908-52453 (SEQ ID NO: 291) | 181 |
| SP_0050 | purH, bifunctional phosphoribosyl aminoimidazolecarboxamide formyl transferase/IMP cyclohydrolase; Purine metabolism | 100% | >99% | 53071-54618 (SEQ ID NO: 292) | 515 |
| SP_0053 | purE, phosphoribosyl aminoimidazole carboxylase catalytic subunit; Catalyzes a step in the de novo purine nucleotide biosynthetic pathway | 100% | >99% | 56405-56893 (SEQ ID NO: 293) | 162 |
| SP_0057 | PY-triad, Beta-N-acetylhexosaminidase; identified by match to PFAM protein family HMM PF00746 | 98% | >98% | 59624-63562 (SEQ ID NO: 294) | 1,312 |
| SP_1732 | StkP, serine/threonine protein kinase | 100% | >98% | 1634699-1636678 (SEQ ID NO: 295) | 659 |
| SP_1793 | hypothetical protein; spoU rRNA methylase family | 63% | >95% | 1708735-1709310 (SEQ ID NO: 296) | 191 |
| SP_1847 | xpt, xanthine phosphoribosyl transferase; Adenine/guanine phosphoribosyl transferases and related PRPP-binding proteins; Nucleotide transport and metabolism | 100% | >98% | 1755277-1755858 (SEQ ID NO: 297) | 193 |
| SP_1850 | dpnC, type II restriction endonuclease DpnI; Recognizes the double-stranded and methylated sequence G(Me)ATC and cleaves after A-2; Endonucleolytic cleavage of DNA | 100% | >99% | 1757628-1758392 (SEQ ID NO: 298) | 254 |
| SP_1855 | alcohol dehydrogenase, zinc-containing; threonine dehydrogenase and related Zn-dependent dehydrogenases; Amino acid transport and metabolism | 100% | >99% | 1762944-1763981 (SEQ ID NO: 299) | 345 |
| SP_1865 | pepA, glutamyl-aminopeptidase; cellulase M and related proteins; Acting on peptide bonds; Carbohydrate transport and metabolism | 100% | >99% | 1771923-1772987 (SEQ ID NO: 300) | 354 |
| SP_1897 | sugar ABC transporter substrate-binding protein, PY-triad | 98% | >99% | 1803124-1804383 (SEQ ID NO: 301) | 419 |
| rrnD operon 5S-23S-16S genes | 5S, 23S and 16S rrn structural RNA molecules | 100% | | 1810171-1815064 (SEQ ID NO: 302) | |
| SP_1911 | Thioredoxin, putative; similar to SP: P29449 PID: 20047; identified by sequence similarity | 100% | >96% | 1824358-1824675 (SEQ ID NO: 303) | 105 |
| SP_1912 | Hypothetical protein | 100% | >96% | 1824672-1824971 (SEQ ID NO: 304) | 99 |
| SP_1923 | ply, Pneumolysin; identified by match to PFAM protein family HMM PF01289 | 100% | >99% | 1831896-1833311 (SEQ ID NO: 305) | 471 |
| SP_1937 | lytA, Autolysin; identified by match to PFAM protein family HMM PF015100 acetylmuramoyl-L-alanine amidase, phage origin | 100% | >98% | 1840405-1841361 (SEQ ID NO: 306) | 318 |
| SP_1947 | Hypothetical protein | 91% | >92% | 1850094-1850264 (SEQ ID NO: 307) | 56 |
| SP_1948 | Hypothetical protein; similar to PID: 559859; identified by sequence similarity | 100% | >99% | 1850367-1850591 (SEQ ID NO: 308) | 74 |
| SP_1949 | Hypothetical protein | 100% | >99% | 1850598-1850786 (SEQ ID NO: 309) | 62 |
| SP_1954 | serine protease subtilase family protein | 99% | >99% | 1857994-1859397 (SEQ ID NO: 310) | 467 |
| SP_1955 | Hypothetical protein | 95% | >99% | 1859412-1859723 (SEQ ID NO: 311) | 103 |
| SP_1980 | cbf1, cmp-binding factor1; Predicted HD-superfamily hydrolase | 100% | >99% | 1884257-1885183 (SEQ ID NO: 312) | 308 |
| SP_1990 | Primase-related protein; topoisomerase primase (TOPRIM) nucleotidyl transferase/hydrolase domain found in Ribonuclease M5 | 100% | >99% | 1893835-1894395 (SEQ ID NO: 313) | 186 |
| SP_1991 | Hydrolase; DNase TatD; TatD like proteins | 100% | >99% | 1894395-1895168 (SEQ ID NO: 314) | 257 |
| SP_1994 | alaT, Aminotransferase AlaT (alanine aminotransferase); Aspartate/tyrosine/aromatic aminotransferase; Transferase activity, transferring nitrogenous groups | 100% | >99% | 1896929-1898143 (SEQ ID NO: 315) | 404 |
| SP_1997 | Cof family protein; Hydrolase activity (catalyzing hydrolysis of various bonds, e.g., C-O, C-N, C-C, phosphoric anhydride bonds, etc.) | 98% | >98% | 1899242-1900630 (SEQ ID NO: 316) | 462 |
| rrnC operon 5S-23S-16S genes | 5S, 23S and 16S rrn structural RNA molecules | | | 1908828-1913721 (SEQ ID NO: 317) | |
| SP_2010 | pbp2A, Penicillin-binding protein 2A (penicillin-binding protein, 1A family); Membrane carboxypeptidase; Cell wall/membrane/envelope biogenesis; Biosynthesis and degradation of murein sacculus and peptidoglycan | 100% | >99% | 1915717-1917912 (SEQ ID NO: 318) | 731 |
| SP_2014 | IS630-Spn1, transposase Orf2; DNA replication, recombination, and repair | Not Determined | Not Determined | 1921149-1921487 (SEQ ID NO: 319) | 112 |
| SP_2021 | Glycosyl hydrolase, family 1; Beta-glucosidase/6-phospho-beta-glucosidase/beta-galactosidase; Carbohydrate transport and metabolism | Not Determined | Not Determined | 1926449-1927858 (SEQ ID NO: 320) | 469 |
| SP_2027 | Hypothetical protein; COG4642, Uncharacterized protein conserved in bacteria (Function unknown) | Not Determined | Not Determined | 1933726-1934136 (SEQ ID NO: 321) | 136 |
| SP_2030 | Tkt, Transketolase; carbohydrate transport and metabolism; catalyzes the formation of ribose 5-phosphate and xylulose 5-phosphate from sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate | Not Determined | Not Determined | 1935028-1937004 (SEQ ID NO: 322) | 658 |
| SP_2042 | rnpA, Ribonuclease P; protein component of RNaseP which catalyzes the removal of the 5′-leader sequence from pre-tRNA to produce the mature 5′ terminus | Not Determined | Not Determined | 1947360-1947731 (SEQ ID NO: 323) | 123 |
| SP_2051 | cglC, competence protein CglC; similar to GP: 3211750; identified by sequence similarity | Not Determined | Not Determined | 1952343-1952669 (SEQ ID NO: 324) | 108 |
| SP_2055 | Alcohol dehydrogenase, zinc-containing; amino acid transport and metabolism | Not Determined | Not Determined | 1955168-1956226 (SEQ ID NO: 325) | 352 |
| SP_2056 | nagA, N-acetylglucosamine-6-phosphate deacetylase; Hydrolase activity acting on carbon-nitrogen bonds other than peptide bonds | Not Determined | Not Determined | 1956389-1957540 (SEQ ID NO: 326) | 383 |
| SP_2057 | Hypothetical protein; Predicted acyltransferase | Not Determined | Not Determined | 1957693-1959510 (SEQ ID NO: 327) | 605 |
| SP_2063 | LysM domain-containing protein; Lysin domain, found in a variety of enzymes involved in bacterial cell wall degradation | Not Determined | Not Determined | 1963634-1964746 (SEQ ID NO: 328) | 370 |
| SP_2064 | HAD superfamily hydrolase, haloacid dehalogenase-like family; identified by match to PFAM protein family HMM PF00702 | Not Determined | Not Determined | 1964912-1965532 (SEQ ID NO: 329) | 206 |
| SP_2066 | thrC, Threonine synthase; catalyzes formation of L-threonine from O-phospho-L-homoserine; Amino acid transport and metabolism | Not Determined | Not Determined | 1966877-1968361 (SEQ ID NO: 330) | 494 |
| rrnB operon 5S-23S-16S genes | 5S, 23S and 16S rrn structural RNA molecules | | | 1970844-1975734 (SEQ ID NO: 331) | |
| SP_2093 | hypothetical protein | Not Determined | Not Determined | 2001118-2002086 (SEQ ID NO: 332) | 322 |
| SP_2097 | 2,3,4,5-tetrahydropyridine-2-carboxylate N-succinyltransferase, putative; Amino acid transport and metabolism | Not Determined | Not Determined | 2004797-2005495 (SEQ ID NO: 333) | 232 |
| SP_2105 | Hypothetical protein | Not Determined | Not Determined | 2014784-2015356 (SEQ ID NO: 334) | 190 |
| SP_2106 | malP, glycogen phosphorylase family protein; Glucan phosphorylase; Carbohydrate transport and metabolism | Not Determined | Not Determined | 2016399-2018657 (SEQ ID NO: 335) | 752 |

To determine that the cold spot genes selected were in-common to all S. pneumoniae strains, genomic preparations immobilized on Zeta-Probe nylon membranes (Bio-Rad Laboratories, Inc.) were probed with a radiolabeled probe derived from the coding sequence for each of the 21 selected cold spot genes. In this way, it was determined that each of the selected cold spot genes was universally present across the inventor's laboratory collection of 2,500 phylogenetically organized S. pneumoniae isolates. This is significant, because all protein-based vaccine candidates offered thus far have proved to be non-universal across all strains or have proved to be highly variable in sequence when multiple strains are considered. Such non-universality or variability causes a vaccine either to fail to address a significant number of S. pneumoniae strains or to permit “breakthrough” of strains in which the antigen targeted by the candidate vaccine evades the immune response induced by the original vaccine. See, e.g., Navarro-Torné et al., Emerging Infectious Diseases, 21(3):417-425 (2015).

Next, amino acid sequences of the predicted expression products of the surface protein genes were analyzed for sequence homology. The reference population of genomic sequences for this work was the 231 fully sequenced S. pneumoniae genomes available in the NCBI GenBank database at the time. For some proteins, alleles from additional strains for which the entire genome had not been fully sequenced but which were nonetheless available from the GenBank database were also analyzed. Amino acid sequences for each of the 21 proteins in each of the 231+ full or partial genomes were aligned and the degree of variability in sequence was determined. For each protein, a predominant sequence appeared, in which the amino acid sequence was the same across a majority of the strains considered, followed by allelic variants of varying numbers. For the 21 proteins studied, the allelic variants differed from the predominant sequence by as few as one amino acid substitution and by as many as 23 amino acid differences along the full protein sequence. The number of alleles varied across the 21 cold spot genes studied from 1 to 34.
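By way of non-limiting illustration, the tallying of a predominant sequence and its allelic variants from a set of aligned sequences can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used: the function name `summarize_alleles` is hypothetical, the sequences are assumed to be already aligned and of equal length, and the five-residue toy sequences are invented solely for the example.

```python
from collections import Counter


def summarize_alleles(aligned_seqs):
    """Find the predominant sequence and describe each allelic variant.

    aligned_seqs -- list of equal-length amino acid strings, one per genome
                    (assumed pre-aligned; hypothetical input format).
    Returns (predominant, predominant_count, variants), where variants is a
    list of (sequence, strain_count, differences_from_predominant) tuples,
    most prevalent variant first.
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    counts = Counter(aligned_seqs)
    predominant, predominant_count = counts.most_common(1)[0]

    variants = []
    for seq, count in counts.items():
        if seq == predominant:
            continue
        # Count positions at which the variant departs from the predominant sequence.
        differences = sum(a != b for a, b in zip(seq, predominant))
        variants.append((seq, count, differences))
    variants.sort(key=lambda v: (-v[1], v[0]))
    return predominant, predominant_count, variants


# Invented toy data: eight "genomes" carrying three distinct alleles.
toy_sequences = ["MKTLV"] * 5 + ["MKTLI"] * 2 + ["MRTLI"]
toy_summary = summarize_alleles(toy_sequences)
```

The per-variant strain counts and difference counts returned by such a tally are the quantities from which a degree of sequence variability can subsequently be computed.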

To quantitate the degree of sequence variability, the percent average amino acid sequence pairwise homology was determined for each cold spot polypeptide, according to the formula:

$\frac{(100\% \times n) + (x\% \times n_1) + (x\% \times n_2) + \dots + (x\% \times n_I)}{\text{total number of sequences compared}}$

where: x% is the percent sequence identity of the respective allelic variant in comparison to the predominant amino acid sequence;

$n$ is the number of strains having the most prevalent (predominant) amino acid sequence;

$n_1$ is the number of allelic variants of the next most prevalent amino acid sequence, after $n$;

$n_2$ is the number of allelic variants of the next most prevalent amino acid sequence, after $n_1$;

$n_I$ is the number of allelic variants of the least prevalent allelic variant sequence, where $I$ is the total number of allelic variants in the population of amino acid sequences considered.

Accordingly, by way of example, a S. pneumoniae surface protein of 450 amino acids, having 65 allelic variants, where each of the variants has two amino acid differences from the predominant sequence, and where 231 genomic sequences were compared, would have an average amino acid sequence pairwise homology of over 99.87%, calculated as: [(100%×166)+(448/450×100%×65)]÷231=[16600%+6471.11%]÷231=23071.11%÷231=99.875% average amino acid sequence pairwise homology.
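The foregoing calculation can be reproduced with a few lines of Python. The sketch below is illustrative only; the function name `average_pairwise_homology` and its parameter names are hypothetical, and each variant's percent identity is assumed to be computed as the fraction of unchanged positions relative to the predominant sequence, as in the worked example above.

```python
def average_pairwise_homology(protein_length, predominant_count, variants):
    """Average amino acid sequence pairwise homology, in percent.

    protein_length    -- length of the protein in amino acids
    predominant_count -- number of sequences identical to the predominant one
    variants          -- iterable of (strain_count, amino_acid_differences)
    (Names are hypothetical; identity is taken as unchanged positions / length.)
    """
    total_sequences = predominant_count
    weighted_sum = 100.0 * predominant_count
    for strain_count, differences in variants:
        identity = (protein_length - differences) / protein_length * 100.0
        weighted_sum += identity * strain_count
        total_sequences += strain_count
    return weighted_sum / total_sequences


# Worked example from the text: 450 aa protein, 231 sequences compared,
# 166 predominant and 65 variants each differing at two positions.
example_homology = average_pairwise_homology(450, 166, [(65, 2)])
print(f"{example_homology:.3f}%")  # prints 99.875%
```

Passing several `(strain_count, amino_acid_differences)` pairs accommodates the general case in which different allelic variants differ from the predominant sequence at different numbers of positions.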

It will be appreciated that the average amino acid sequence pairwise homology value will be affected by the total number of sequences compared, and therefore the question arises of how many sequences must be compared before the pairwise homology value accurately represents the variability of the amino acid sequence throughout the universe of S. pneumoniae strains. Because the amino acid sequences encoded by cold spot genes are extremely invariant, the number of sequences that must be compared before a sequence homology value is obtained that is representative of the diversity of the entire population of S. pneumoniae strains is much lower than for a protein that shows more sequence variability across the population. To illustrate this, non-linear regression analysis was performed, plotting the cumulative number of allelic variants against the cumulative number of isolates considered for three cold spot polypeptides, i.e., GBSP23 (SP_2239), GBSP3 (SP_2218), and GBSP6 (SP_2217), in comparison with another well-known S. pneumoniae surface protein, PspA (SP_0117), which is much more variable across the population of S. pneumoniae strains. The comparative plot is shown in FIG. 1. The additional plot shown in FIG. 1, for SP_1872, is also for a cold spot polypeptide, but one having a lesser amino acid sequence pairwise homology (99.19%); that protein was not extensively studied at this time. From this analysis, it can be seen that for the cold spot genes, the rate of diversification levels off well under 200 isolates, meaning that the results of the sequence comparisons reported herein of 231 fully sequenced genomes (plus additional instances of individual genes from incomplete genomes) are representative of the diversity of the universe of S. pneumoniae strains. In contrast, referring in FIG. 1 to the curve for PspA, the rate of diversification has not leveled off on this plot at the limit of 300 compared isolates, meaning that many more than 300 isolates would need to be compared before an accurate analysis of sequence diversity was obtained. This is important because the number of known and fully sequenced genomes of S. pneumoniae has continued to grow, but calculated values such as percentage amino acid sequence homology are not expected to vary significantly for relatively invariant cold spot genes if at least about 150 separate strains are analyzed. Accordingly, in order to determine a percentage average amino acid pairwise sequence homology for a given coding sequence, the practitioner may compare 150 allelic sequences, or, e.g., 200, 400, 1000, 2000 or more allelic sequences, and the calculated sequence homology value will be expected to be essentially the same.
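The leveling off described above can be illustrated by tabulating the cumulative number of distinct allelic variants as isolates are added. The following sketch is not the non-linear regression that was performed; it is a simpler accumulation count, offered under stated assumptions. The function names, the plateau window of 100 isolates, and the allele labels are all hypothetical, and the two input lists are invented purely to contrast a conserved protein with a variable one.

```python
def variant_accumulation(alleles):
    """Cumulative number of distinct alleles after each successive isolate."""
    seen = set()
    curve = []
    for allele in alleles:
        seen.add(allele)
        curve.append(len(seen))
    return curve


def plateau_point(curve, window):
    """Smallest isolate count (1-based) after which no new variant appears
    among the next `window` isolates, or None if the curve never levels off
    within the data."""
    for i in range(len(curve) - window):
        if curve[i + window] == curve[i]:
            return i + 1
    return None


# Hypothetical conserved protein: one rare variant among 200 isolates.
conserved = ["A"] * 50 + ["B"] + ["A"] * 149
# Hypothetical variable protein: a new variant every third isolate.
variable = [f"v{i // 3}" for i in range(300)]

conserved_curve = variant_accumulation(conserved)
variable_curve = variant_accumulation(variable)

print(plateau_point(conserved_curve, 100))  # levels off within the data
print(plateau_point(variable_curve, 100))   # still rising at 300 isolates
```

In practice the order in which isolates are added affects the shape of such a curve, so an analysis of real data would typically average over many random orderings before fitting a regression model.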

From the 21 cold spot surface polypeptides analyzed for essential function, sequence homology, and universal presence in the species, six polypeptides were further analyzed for suitability as vaccine candidates, as detailed below. Similar procedures can be applied to the other cold spot polypeptides determined to be surface-expressed in S. pneumoniae, to be in-common across all strains, and to have very high (99% or more) average amino acid sequence pairwise homology among isolates to confirm their utility as immunogens.

GenScript (Piscataway, N.J. (US)) was contracted to express and raise polyclonal antiserum against six selected cold spot surface polypeptides. Predominant coding sequence information for the proteins GBSP3 (SEQ ID NO:1), GBSP14 (SEQ ID NO:2), GBSP15 (SEQ ID NO:3), GBSP22 (SEQ ID NO:4), GBSP23 (SEQ ID NO:5), and GBSP24 (SEQ ID NO:6) set forth in Table 1A, above, was analyzed to predict such features as expressed amino acid sequence, antigenic index, hydrophilicity, solvent accessibility, flexible regions, coil/sheet/helical regions, and a breakdown of extracellular, transmembrane, and cytoplasmic domains along the length of the protein. For each of the six proposed cold spot polypeptides selected, a structural gene for expression of the largest extracellular segment was constructed, inserted into an expression vector, and expressed in E. coli. Purification of hexaHis- or octaHis-tagged polypeptides via Ni⁺⁺ affinity chromatography was performed prior to enzymatic elimination of the His tag. The portion of each polypeptide expressed and purified is presented in Table 3:

TABLE 3

Extracellular Portions of Cold Spot Surface Proteins for Immunogenicity Studies

Column 1: S. pneumoniae surface protein source of extracellular fragment expressed (SEQ ID for protein fragment)
Column 2: Amino acid sequence of the portion of the S. pneumoniae surface protein expressed

GBSP3               RVVQKPFQWF DSVKSDLAHL TRTYNENESL KKQLYQLEVK
(SEQ ID NO: 336)    SNEVESLKTE NEQLRQLLDM KSKLQATKTL AADVIMRSPV
                    SWKQELTLDA GRSKGASENM LAIANGGLIG SVSKVEENST
                    IVNLLTNTEN ADKISVKIQH GSTTIYGIII GYDKENDVLK
                    ISQLNSNSDI SAGDKVTTGG LGNFNVADIP VGEVVATTHS
                    TDYLTREVTV KLSADTHNVD VIELVGNS

GBSP14              LILEVTAVPV FSPTQSVEAV LVLLYDLTTI RTYEKLNLAF
(SEQ ID NO: 337)    VSNASHELRT PVTSIKGFAE TIKGMSAEEE ALKDDFLDII
                    YKESLRLEHI VEHLLTLSKA QQMPIQWTTL SLAEFVQDLT
                    QSLQPQLKKK DLQLKVQVPD DVTLVSDSQL LSQILLNLLS
                    NAIRYTEQGG KIEVKTQKVN EGIKISVSDT GIGISQLEQD
                    RIFERFYRVN KGRSRQTGGT GLGLAIVKEL SQLLGGQVTV
                    TSQLGRGSCF TIFLPNQSFA QD

GBSP15              QKNRQEEAKI LQKEEVLRVA KMALQTGQNQ VSINGVEIQV
(SEQ ID NO: 338)    FSSEKGLEVY HGSEQLLAIK EP

GBSP22              SLASAVEALL APLKRVKVPV HEIGLMLSMS LRFVPTLMDD
(SEQ ID NO: 339)    TTRIMNAQKA RGVDFGEGSI VQKVKAMIPI LIPLFATSLK
                    RADSLAIAME ARGYQGGKGR SQYRQLKWTL KD

GBSP23              TQKSSVNNSN NNSTITQTAY KNENSTTQAV NKVKDAVVSV
(SEQ ID NO: 340)    ITYSANRQNS VFGNDDTDTD SQRISSEGSG VIYKKNDKEA
                    YIVTNNHVIN GASKVDIRLS DGTKVPGEIV GADTFSDIAV
                    VKISSEKVTT VAEFGDSSKL TVGETAIAIG SPLGSEYANT
                    VTQGIVSSLN RNVSLKSEDG QAISTKAIQT DTAINPGNSG
                    GPLINIQGQV IGITSSKIAT NGGTSVEGLG FAIPANDAIN
                    IIEQLEKNGK VTRPALGIQM VNLSNVSTSD IRRLNIPSNV
                    TSGVVVRSVQ SNMPANGHLE KYDVITKVDD KEIASSTDLQ
                    SALYNHSIGD TIKITYYRNG KEETTSIKLN KSSGDLES

GBSP24              NSHKVQMEKE IALKQKKFEQ KHLQNYTDEI VGLYNEIRGF
(SEQ ID NO: 341)    RHDYAGMLVS MQMAIDSGNL QEIDRIYNEV LVKANHKLRS
                    DKYTYFDLNN IEDSALRSLV AQSIVYARNN GVEFTLEVKD
                    TITKLPIELL DLVRIMSVLL NNAVEGSADS YKKQMEVAVI
                    KMETETVIVI QNSCKMTMTP SGDLFALGFS TKGRNRGVGL
                    NNVKELLDKY NNIILETEME GSTFRQIIRF KREFE

The six purified cold spot polypeptide candidate antigens were each used to inoculate rabbits 2-3 times over 6-8 weeks for production of polyclonal antiserum. Rabbits were selected for immunization to obtain a greater volume of antiserum than would have been produced in, e.g., laboratory mice. About 3 mg of each polypeptide at >85% purity, and 10-50 mg of affinity-purified rabbit polyclonal antibodies (ELISA titer of 1:32,000), were provided for further experiments. For control purposes, pre-inoculation rabbit serum was also supplied. The rabbit antisera were first tested using a pneumococcal whole cell ELISA (WCE) assay. Briefly, S. pneumoniae cells were immobilized in the wells of microtiter plates, and antisera from the cold spot polypeptide immunizations, at 1:10 dilution, were separately added to wells and incubated with the cells. A radiolabeled reporter anti-rabbit IgG antibody was used for detection of bound antiserum. Control wells received pre-immune serum for contrast with the test wells.

All of the cold spot polypeptides showed elicitation of antiserum that recognized the native surface antigens of pneumococcal cells from ten different S. pneumoniae serotypes. The selected serotypes corresponded to ten of the thirteen serotypes addressed by the commercial PREVNAR-13® vaccine. Table 4 shows the fold increase range in colorimetric response of the polyclonal antiserum compared with the signal produced by pre-immune serum.

TABLE 4

Vaccine Candidate Antigenicity Results Based on Whole Cell ELISA (WCE)

Cold Spot Surface    SEQ ID    Range of Increase in Colorimetric Intensity Across 10
Antigen Tested       NO:       Pneumococcal Strain Serotypes, from Comparison of
                               Pre-immune vs. Post-immune Antiserum

GBSP3                1         15-20
GBSP14               2         20-30
GBSP15               3         10-25
GBSP22               4         20-30
GBSP23               5         40-220
GBSP24               6         20-70
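The fold-increase ranges reported in Table 4 compare post-immune signal with pre-immune signal for each serotype. As a minimal sketch of that comparison, and assuming that fold increase is simply the ratio of the two signals for the same serotype, the range can be computed as follows. The function name `fold_increase_range` and the readings are hypothetical and are not data from the experiments described herein.

```python
def fold_increase_range(pre_immune, post_immune):
    """Return (lowest, highest) fold increase of post-immune over pre-immune
    signal, pairing readings by serotype in list order."""
    if len(pre_immune) != len(post_immune):
        raise ValueError("one pre-immune and one post-immune reading per serotype")
    if any(pre <= 0 for pre in pre_immune):
        raise ValueError("pre-immune readings must be positive")
    folds = [post / pre for pre, post in zip(pre_immune, post_immune)]
    return min(folds), max(folds)


# Hypothetical signal readings (arbitrary units) for three serotypes.
pre = [0.5, 0.25, 0.5]
post = [7.5, 5.0, 10.0]

low, high = fold_increase_range(pre, post)
print(f"{low:g}-{high:g} fold")
```

Reporting the lowest and highest ratios across the tested serotypes gives a range in the same form as the entries of Table 4.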

Whole cell ELISA results confirmed that the cold spot polypeptides are antigenic, presenting antibody-recognizable targets on the surface of pneumococcal cells. The fact that the tested polyclonal antiserum included anti-cold spot polypeptide IgG antibodies (bound by the reporter antibody) indicates that the surface antigens are immunogenic, eliciting a T cell-dependent antibody response.

Flow cytometry was also performed to evaluate surface antigenicity, using 1:100 dilution of antisera. The relevance of the six cold spot polypeptides was also confirmed by testing with pooled healthy human antiserum and rabbit antiserum from whole cell vaccinations.

Evaluation of Cold Spot Polypeptide Antisera for the Ability to Elicit Protection In Vivo

The antisera raised against the six selected cold spot polypeptides were next tested in a passive immunity model described in Briles et al., J. Infect. Dis., 182(6):1694-1701 (2000). Pre-immune and post-immune antiserum from the cold spot polypeptide immunizations was administered intraperitoneally to groups of CBA/N mice at 1:25, 1:100, 1:400, or 1:1600 dilutions one hour before intravenous challenge with a mouse-virulent strain of S. pneumoniae. Ringer's injection solution was used as a negative control; a known protective anti-PspA monoclonal antibody for the challenge strain was used as a positive control. Challenged mice were observed for changes in health and time to moribund. Time to moribund was determined by monitoring the mice every 6 hours post-challenge for signs of disease: hunched back, ruffled fur, irritability, lack of mobility, or failure to respond to touch. As soon as any of these signs was observed, the surface temperature, determined with a scanning thermometer, was checked every 6 hours. When the surface temperature fell to 25° C., mice were considered moribund, scored as such, and euthanized by CO₂ narcosis and cervical dislocation. In some cases, heart blood, collected after death, was plated to verify bacteremia with pneumococci. The results indicated that three of the cold spot immunogens (GBSP14, GBSP15, and GBSP23) were at least partially protective: time to death was extended with GBSP15, and GBSP23 was “statistically protective.” Mice receiving GBSP14 antiserum exhibited passive immunity protection equivalent to the PspA monoclonal antibody positive control.

Active immunization experiments are under way using 10 μg of antigen in a 10 μL volume of Pierce Alum Imject adjuvant, administered subcutaneously three times at 2-week intervals. It is expected that this immunization will result in protective immunity, demonstrated, e.g., by intravenous challenge with active S. pneumoniae bacteria.

Results from the battery of tests performed using purified antigen and rabbit antisera are shown in Table 5 below.

TABLE 5

Test Results for Different Vaccine Candidates

Candidate    Whole Cell  Flow       Passive protection  Human Pooled        Whole Cell
Designation  ELISA       cytometry  assay               Serum         IgG   Vaccine

GBSP3        ++          −          −                   +++           +++   ++
GBSP14       ++          −          +++                 +++           +++   ++
GBSP15       ++          +          +                   +++           +++   ++
GBSP22       −           ++         −                   +++           +++   ++
GBSP23       −           −          +++                 +++           +++   ++
GBSP24       −           −          −                   +++           +++   ++

“−”, negative; “+”, slightly positive; “++”, positive; “+++”, strongly positive.

The results from the assay with healthy human serum show that the general population has been exposed to S. pneumoniae and has generated a background anti-pneumococcal response even in the absence of pathogenic infection. Significantly, the assay reported here indicates that the six selected cold spot surface antigens were among the host of pneumococcal antigens recognized by the pooled serum.

All of the studied S. pneumoniae cold spot surface antigens were immunogenic in at least some of the tests. Differing results from test to test may be attributable to such factors as amount of immunogen utilized, purity of the inoculum, number of boosts, or factors pertaining to the target antigen itself such as differences in biological function, differential gene expression, or the effect of antibody complexation at the cell surface as part of the assay. For example, a very abundant protein on the cell surface might outperform a less abundant protein in the whole cell ELISA, e.g., by presenting many more antibody targets on the cell surface. In contrast, a cell surface antigen that, even in low abundance, leads to cell death when complexed with antibody would be a high performing antigen in the passive immunity assay.

From the foregoing results, it is seen that this invention provides a family of immunogenic S. pneumoniae surface antigens that will be useful for immunization of subjects, including humans, for eliciting an anti-S. pneumoniae antibody response. Optimization of dosing and boosting schedules according to well known practices will result in immunogenic compositions effective as vaccines for inducing protective immunity against S. pneumoniae. Moreover, because the surface antigens disclosed herein are in-common to all or nearly all strains of S. pneumoniae, and because they all exhibit a very high degree of sequence invariability, as expressed by percent amino acid sequence pairwise homology across at least 123 different S. pneumoniae strains, the antigens of the present invention are expected to provide universal immunogens, from which breakthrough will be extremely rare if not unknown over time.

The present invention provides a means for determining and selecting S. pneumoniae surface antigens that are universal vaccine candidates by virtue of their essentially universal presence across all S. pneumoniae genomes and their high degree of amino acid sequence conservation among S. pneumoniae strains. While the particular surface antigens identified herein fit the rigorous selection criteria for a universal S. pneumoniae vaccine, a few antigens have been identified previously using alternative techniques such as antigenomic screening which qualify them as potential vaccine candidates without determination of their universal presence across pneumococcal genomes or their level of sequence homology. See, e.g., Giefing et al., J. Exp. Med., 205(1):117-131 (2008). In such instances where the present disclosure represents a rediscovery of highly immunogenic proteins that are disclosed herein to be, additionally, universally present and invariant, the proteins have been eliminated from the claims.

Additional embodiments of the invention and alternative methods adapted to a particular composition will be evident from studying the foregoing description. All such embodiments and obvious alternatives are intended to be within the scope of this invention, as defined by the claims that follow. The various publications, patents, and references cited in the foregoing description are incorporated herein by reference in their entirety.
