The present invention is directed to the field of translation optimization.
The region approximately 8-10 nucleotides upstream of the translational start site in prokaryotic mRNA tends to include a purine-rich sequence. This sequence is named the Shine-Dalgarno (SD) sequence or ribosome binding site (RBS), and is believed to be involved in prokaryotic translation initiation via base-pairing to a complementary sequence in the 16S rRNA component of the small ribosomal subunit, namely the anti-Shine-Dalgarno sequence (aSD).
Recent studies have also suggested that sequences (motifs) within the coding regions that interact with the aSD, similarly to the SD, can slow down or pause translation elongation in E. coli. Thus, such sequences in the coding regions decrease the overall translation elongation rate and can generally be considered deleterious. Other studies have suggested that selection against internal SD-like sequences which promote rRNA-mRNA interactions can act against codons that tend to compose such motifs. A comprehensive understanding of rRNA-mRNA interactions is however lacking, and methods of optimizing mRNA sequences for enhanced or decreased translation are greatly needed.
The present invention provides, in some embodiments, nucleic acid molecules comprising a mutation that modulates the interaction strength of the nucleic acid molecule to a 16S ribosomal RNA. Methods of improving the translation process of a nucleic acid molecule and producing a nucleic acid molecule optimized for translation, as well as cells comprising the nucleic acid molecules and computer program products are also provided.
According to a first aspect, there is provided a nucleic acid molecule comprising a coding sequence, wherein the nucleic acid molecule comprises at least one mutation within a region of the molecule, wherein the mutation modulates the interaction strength of the nucleic acid molecule to a 16S ribosomal RNA (rRNA); and wherein the region is selected from the group consisting of:
According to another aspect, there is provided a cell comprising a nucleic acid molecule of the invention.
According to another aspect, there is provided a method for improving the translation potential of a coding sequence, the method comprising introducing at least one mutation into a nucleic acid molecule comprising the coding sequence, wherein the mutation modulates the interaction strength of the nucleic acid molecule to a 16S rRNA, thereby improving the translation potential of a coding sequence.
According to another aspect, there is provided a method of modifying a cell, the method comprising expressing a nucleic acid molecule of the invention or an improved nucleic acid molecule produced by a method of the invention, within the cell, thereby modifying a cell.
According to another aspect, there is provided a computer program product for modulating translation potential of a coding sequence in a nucleic acid molecule, comprising a non-transitory computer-readable storage medium having program code embodied thereon, the program code executable by at least one hardware processor to:
According to some embodiments, the mutation modulates the interaction strength of a six-nucleotide sequence containing the mutation to the 16S rRNA.
According to some embodiments, the interaction strength to a 16S rRNA is to an anti-Shine Dalgarno (aSD) sequence of the 16S rRNA.
According to some embodiments, the interaction strength of a sequence of the nucleic acid molecule to the aSD sequence is determined from Table 3.
According to some embodiments, increasing raises the interaction strength to a strong interaction strength and decreasing lowers the interaction strength to a weak interaction strength, wherein strong, weak and intermediate interaction strengths are determined from Table 1.
According to some embodiments, the region from position 26 downstream of the TSS through position −13 upstream of the TTS comprises the first 400 base pairs of the region.
According to some embodiments, the nucleic acid molecule of the invention comprises at least a second mutation, wherein the second mutation is in a different region than the at least one mutation.
According to some embodiments, the at least one mutation is within the coding sequence and mutates a codon of the coding sequence to a synonymous codon.
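By way of non-limiting illustration, the candidate synonymous codons for a given codon can be enumerated as sketched below. The sketch assumes the standard genetic code, and the function name `synonymous_codons` is hypothetical and introduced here only for illustration.

```python
from itertools import product

# Assumption: standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon: str) -> list[str]:
    """Return the other codons that encode the same amino acid (hypothetical helper)."""
    # U and T are treated interchangeably, as elsewhere in this description.
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    return sorted(
        other
        for other, encoded in CODON_TABLE.items()
        if encoded == amino_acid and other != codon
    )


print(synonymous_codons("GAA"))  # glutamate has exactly one other codon
print(synonymous_codons("AUG"))  # methionine has no synonymous codon
```

Each codon returned by such a helper leaves the encoded protein unchanged, so only the local interaction strength to the 16S rRNA would differ between the candidates.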
According to some embodiments, the mutation improves the translation potential of the coding sequence.
According to some embodiments, the improving comprises at least one of: increasing translation initiation efficiency, increasing translation initiation rate, increasing diffusion of the small subunit to the initiation site, increasing elongation rate, optimization of ribosomal allocation, increasing chaperone recruitment, increasing termination accuracy, decreasing translational read-through and increasing protein yield.
According to some embodiments, the nucleic acid molecule is a messenger RNA (mRNA).
According to some embodiments, the cell is a bacterial cell.
According to some embodiments, the bacterium is selected from a bacterium recited in Table 1.
According to some embodiments, the bacterium is selected from Escherichia coli, Alphaproteobacteria, Spirochaete, Purple bacteria, Gammaproteobacteria, Deltaproteobacteria and Betaproteobacteria.
According to some embodiments, the bacterium is not a Cyanobacterium or a Gram-positive bacterium.
According to some embodiments, the nucleic acid molecule is endogenous to the cell.
According to some embodiments, the nucleic acid molecule is exogenous to the cell.
According to some embodiments, the mutation is located at a region selected from the group consisting of:
b. positions −1 upstream of a TSS through position 5 downstream of the TSS of the coding sequence and the mutation increases interaction strength;
c. positions 6 through 25 downstream of a TSS of the coding sequence and the mutation decreases interaction strength;
d. positions 26 downstream of a TSS of the coding sequence through position −13 upstream of a translational termination site (TTS) of the coding sequence and the mutation modulates interaction strength to an intermediate interaction strength;
e. positions −8 through −17 upstream of a TTS of the coding sequence and the mutation increases interaction strength; and
f. a position downstream of a TTS of the coding sequence and the mutation increases interaction strength.
According to some embodiments, the nucleic acid molecule is a nucleic acid molecule of the invention.
According to some embodiments,
According to some embodiments, the method of the invention further comprises introducing at least a second mutation in a different region from the at least one mutation.
According to some embodiments, introducing a mutation comprises:
According to some embodiments, the calculating comprises calculating interaction strength of a plurality of 6-nucleotide long subregions with a region of the nucleic acid molecule, wherein the region is selected from:
According to some embodiments, the calculating comprises calculating the interaction strength of each 6-nucleotide long subregion within the region.
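By way of non-limiting illustration, every 6-nucleotide long subregion within a region can be enumerated with a sliding window, as sketched below. The function name `six_mers` and the use of 0-based, half-open coordinates are assumptions made only for this example.

```python
from typing import Iterator, Tuple


def six_mers(sequence: str, start: int, end: int, k: int = 6) -> Iterator[Tuple[int, str]]:
    """Yield (offset, subregion) for every k-nt window lying fully inside sequence[start:end].

    Assumption: start and end are 0-based, half-open indices into the sequence.
    """
    region = sequence[start:end]
    for i in range(len(region) - k + 1):
        yield start + i, region[i:i + k]


# Example: a 10-nt stretch contains five overlapping 6-nt subregions.
for offset, subregion in six_mers("AUGGAGGUAA", 0, 10):
    print(offset, subregion)
```

The interaction strength of each yielded subregion could then be looked up or computed separately for each window.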
According to some embodiments, the output modified sequence of the nucleic acid molecule comprises at least the top 5 mutations within the nucleic acid molecule that increase or decrease translation potential.
According to some embodiments, the output modified sequence of the nucleic acid molecule comprises at least the top 5 mutations within the region that increase or decrease translation potential.
Further embodiments and the full scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The invention is based on the surprising findings that strong, weak and intermediate interactions between mRNAs and the 16S rRNA are selected for in particular regions of an mRNA. Further, these selected-for interactions enhance translation, and the introduction of mutations that alter interaction strengths in these regions in turn alters the translation efficiency of the mutated mRNA. It was found that, in addition to the canonical rRNA-mRNA interaction that triggers initiation, the following rules appear in many bacteria across the tree of life in different stages and sub-stages of the translation process:
Early elongation: at the beginning of the coding region there is evidence of selection for strong rRNA-mRNA interactions that slow down early translation elongation.
Elongation 1: inside the coding region there is evidence of selection against strong rRNA-mRNA interactions. This signal is also related to improving translation elongation (and not only to preventing incorrect initiation).
Elongation 2: there is evidence of selection inside the transcript for intermediate rRNA-mRNA interactions to improve pre-initiation.
Termination: there is evidence of selection for strong rRNA-mRNA interactions upstream of the STOP codon to prevent ribosomal read-through.
The findings disclosed herein are based on the comprehensive analysis of 551 prokaryotic genomes. We show that the current knowledge regarding the functional rRNA-mRNA interactions during translation is only the ‘tip of the iceberg’: in most of the analyzed prokaryotes, rRNA-mRNA interactions seem to be involved in all sub-stages of translation, via corresponding sequence signatures encoded across the entire transcript. Thus, rRNA-mRNA interactions affect the way evolution shapes the nucleotide composition along the entire transcript to optimize translation.
By a first aspect, there is provided a nucleic acid molecule comprising a coding sequence, the nucleic acid molecule comprising at least one mutation that modulates the interaction strength of the nucleic acid molecule to a ribosomal RNA.
The term “nucleic acid” is well known in the art. A “nucleic acid” as used herein will generally refer to a molecule (i.e., a strand) of DNA, RNA or a derivative or analog thereof, comprising a nucleobase. A nucleobase includes, for example, a naturally occurring purine or pyrimidine base found in DNA (e.g., an adenine “A,” a guanine “G,” a thymine “T” or a cytosine “C”) or RNA (e.g., an A, a G, an uracil “U” or a C).
The term “nucleic acid molecule” includes, but is not limited to, modified and unmodified single-stranded RNA (ssRNA) or single-stranded DNA (ssDNA) having both a coding region and a noncoding region. In some embodiments, the nucleic acid molecule is DNA. In some embodiments, the nucleic acid molecule is RNA. In some embodiments, the DNA is single stranded DNA. In some embodiments, the DNA is double stranded DNA. In some embodiments, the DNA is plasmid DNA. In some embodiments, the RNA is single stranded RNA. In some embodiments, the RNA is plasmid RNA. In some embodiments, the RNA is messenger RNA (mRNA). In some embodiments, the RNA is pre-mRNA. mRNA is well known in the art. In some embodiments, mRNA comprises a 5′ cap. In some embodiments, the mRNA is devoid of a 5′ cap. In some embodiments, the cap is a 7-methylguanosine cap. In some embodiments, mRNA comprises a 3′ polyA tail. In some embodiments, mRNA is polyadenylated. In some embodiments, mRNA comprises a 3′ oligouridine tail. In some embodiments, mRNA is oligouridylated. In some embodiments, the mRNA is monocistronic. In some embodiments, the mRNA is polycistronic. In some embodiments, the nucleic acid molecule comprises a plurality of coding sequences.
The phrases “coding sequence” and “coding region” are used interchangeably herein to refer to a nucleic acid sequence that when translated results in an expression product, such as a polypeptide, protein, or enzyme. In some embodiments, the coding sequence is to be used as a basis for making codon alterations. In some embodiments, the coding sequence is a bacterial gene. In some embodiments, the coding sequence is a viral gene. In some embodiments, the coding sequence is a mammalian gene. In some embodiments, the coding sequence is a human gene. In some embodiments, the coding sequence is a portion of one of the above listed genes. In some embodiments, the coding sequence is a heterologous transgene. In some embodiments, the above listed genes are wild type, endogenously expressed genes. In some embodiments, the above listed genes have been genetically modified or in some way altered from their endogenous formulation.
The term “heterologous transgene” as used herein refers to a gene that originated in one species and is being expressed in another. In some embodiments, the transgene is a part of a gene originating in another organism. In some embodiments, the heterologous transgene is a gene to be overexpressed. In some embodiments, expression of the heterologous transgene in a wild-type cell reduces global translation in the wild-type cell.
In some embodiments, the nucleic acid molecule further comprises a non-coding region. In some embodiments, the non-coding region is an untranslated region (UTR). In some embodiments, the UTR is 5′ to the coding sequence. In some embodiments, the UTR is 3′ to the coding sequence. In some embodiments, the nucleic acid molecule comprises a 5′ UTR and a 3′ UTR. In some embodiments, the UTR is the endogenous UTR associated with the coding sequence. In some embodiments, the UTR comprises at least one regulatory element that regulates translation of the coding sequence. In some embodiments, the UTR is transcribed with the coding sequence. In some embodiments, an mRNA transcribed from the nucleic acid molecule is a functional mRNA. In some embodiments, a functional mRNA is an mRNA that is capable of being translated. In some embodiments, the nucleic acid molecule is an mRNA. In some embodiments, the nucleic acid molecule is a functional mRNA.
The phrases “noncoding sequence” and “noncoding region” are used interchangeably herein to refer to sequences upstream of the translational start site (TSS) or downstream of the translational termination site (TTS). The noncoding region can be at least 1, 5, 10, 25, 50, 100, 200, 500, 1000, 2000, 5000 or 10000 base pairs upstream of the TSS or downstream of the TTS.
In some embodiments of the invention, the noncoding sequence upstream of the TSS refers to a 5′ untranslated region also referred to as 5′ UTR. According to some embodiments, the 5′UTR includes a ribosome binding site (RBS). In some embodiments, the RBS comprises a Shine-Dalgarno (SD) sequence. In some embodiments, the SD sequence is a canonical SD sequence. In some embodiments, the SD sequence is a non-canonical SD sequence. In some embodiments, the RBS does not comprise a SD sequence. In some embodiments, the canonical SD sequence comprises the sequence AGGAGG. In some embodiments, the SD sequence comprises the sequence AGGAGGU. The SD sequence is involved in prokaryotic translation initiation via base-pairing to a complementary sequence named the anti-SD (aSD) sequence on the 3′ tail of the 16S rRNA component of the small ribosomal subunit. In some embodiments, the aSD sequence comprises and/or consists of the sequence ACCUCCUUA. In some embodiments, the E. coli aSD sequence comprises and/or consists of the sequence ACCUCCUUA. In some embodiments, the aSD comprises a 6-nucleotide long subregion. In some embodiments, interaction strength is the binding strength to the subregion. In some embodiments the canonical subregion comprises and/or consists of CCUCCU. In some embodiments the canonical subregion comprises and/or consists of CCTCCT. In some embodiments, the aSD subregion comprises and/or consists of a sequence selected from: GCCGCG, CGGCTG, CTCCTT, GCCGTA, GCGGCT, GTGGCT, and GGCTGG. U and T are used interchangeably herein.
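By way of non-limiting illustration, the relationship between the aSD sequences and the SD sequences recited above can be checked by taking a reverse complement, as sketched below. The helper names `to_rna` and `reverse_complement` are hypothetical, and only Watson-Crick complements are considered in this sketch.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(sequence: str) -> str:
    """Normalize a sequence to RNA letters; U and T are used interchangeably herein."""
    return sequence.upper().replace("T", "U")


def reverse_complement(sequence: str) -> str:
    """Return the RNA reverse complement, read 5' to 3' (hypothetical helper)."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(sequence)))


# The canonical aSD subregion CCUCCU (or CCTCCT) pairs with the canonical SD core AGGAGG.
print(reverse_complement("CCUCCU"))
# The reverse complement of the full aSD sequence contains the SD sequence AGGAGGU.
print(reverse_complement("ACCUCCUUA"))
```

The same helper can be applied to any of the non-canonical aSD subregions listed above to obtain the mRNA 6-mer with perfect Watson-Crick complementarity to that subregion.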
In some embodiments of the invention, the noncoding sequence downstream of the TTS refers to a 3′ untranslated region also referred to as 3′ UTR.
In some embodiments, the ribosomal RNA is a small ribosome subunit. According to some embodiments, the ribosomal RNA may be a 30S small subunit of a ribosome. According to other embodiments, the ribosomal RNA is a 16S ribosomal RNA. According to some embodiments of the invention, the 16S ribosomal RNA has an aSD sequence. In some embodiments, interaction strength is calculated to the aSD. In some embodiments, interaction strength is calculated to a subregion of the aSD.
The term “interaction strength” as used herein refers to the hybridization free energy between a nucleic acid molecule and a ribosomal RNA. A lower, more negative free energy corresponds to stronger hybridization and therefore to a stronger interaction strength. Hybridization free energy can be computed with the RNAcofold program of the Vienna RNA package, which computes a common secondary structure of two RNA molecules. According to some embodiments, the interaction strength can be defined by a scale of strong, intermediate and weak.
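By way of non-limiting illustration, a hybridization free energy can be placed on the strong, intermediate and weak scale as sketched below. The function name `classify_interaction` is hypothetical. The cutoffs are deliberately left as parameters, and the numeric values in the usage lines are placeholders only and are not values taken from Table 1.

```python
def classify_interaction(delta_g: float, strong_cutoff: float, weak_cutoff: float) -> str:
    """Classify a hybridization free energy (kcal/mol) as strong, intermediate or weak.

    Lower (more negative) free energy means stronger interaction, so the strong
    cutoff must be more negative than the weak cutoff. Both cutoffs are assumed
    to be supplied by the caller, e.g. from organism-specific data.
    """
    if strong_cutoff >= weak_cutoff:
        raise ValueError("strong_cutoff must be more negative than weak_cutoff")
    if delta_g <= strong_cutoff:
        return "strong"
    if delta_g >= weak_cutoff:
        return "weak"
    return "intermediate"


# Placeholder cutoffs chosen only to exercise the function.
print(classify_interaction(-7.0, strong_cutoff=-6.0, weak_cutoff=-2.0))
print(classify_interaction(-4.0, strong_cutoff=-6.0, weak_cutoff=-2.0))
print(classify_interaction(-1.0, strong_cutoff=-6.0, weak_cutoff=-2.0))
```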
The term “hybridization” or “hybridizes” as used herein refers to the formation of a duplex between nucleotide sequences which are sufficiently complementary to form duplexes via Watson-Crick base pairing. Two nucleotide sequences are “complementary” to one another when those molecules share base pair organization homology. “Complementary” nucleotide sequences will combine with specificity to form a stable duplex under appropriate hybridization conditions. For instance, two sequences are complementary when a section of a first sequence can bind to a section of a second sequence in an anti-parallel sense wherein the 3′-end of each sequence binds to the 5′-end of the other sequence and each A, T (U), G and C of one sequence is then aligned with a T (U), A, C and G, respectively, of the other sequence. RNA sequences can also include complementary G=U or U=G base pairs. Thus, two sequences need not have perfect homology to be “complementary” under the invention.
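By way of non-limiting illustration, the antiparallel pairing described above, including G=U wobble pairs, can be counted for two equal-length sequences as sketched below. The function name `count_pairs` is hypothetical. The sketch assumes an ungapped, full-length alignment; it is a simple pairing count and not a free energy calculation.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(first_5to3: str, second_5to3: str) -> tuple[int, int]:
    """Count (Watson-Crick, wobble) pairs between two sequences aligned antiparallel.

    Assumption: both sequences are written 5' to 3', have equal length, and are
    aligned without gaps, so base i of the first faces base -(i + 1) of the second.
    """
    first = first_5to3.upper().replace("T", "U")
    second = second_5to3.upper().replace("T", "U")
    if len(first) != len(second):
        raise ValueError("sequences must have equal length")
    facing = list(zip(first, reversed(second)))
    watson_crick = sum(pair in WATSON_CRICK for pair in facing)
    wobble = sum(pair in WOBBLE for pair in facing)
    return watson_crick, wobble


print(count_pairs("AGGAGG", "CCUCCU"))  # fully Watson-Crick complementary
print(count_pairs("GGGAGG", "CCUCCU"))  # first position becomes a G=U wobble pair
```

A sequence with one or more wobble pairs is still complementary in the sense used herein, although its duplex would generally be expected to be less stable than a fully Watson-Crick duplex.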
As used herein, the term “free energy” refers to the Gibbs free energy (ΔG), the thermodynamic potential that characterizes the hybridization reaction between a given oligonucleotide and its DNA or RNA complement.
In some embodiments, the nucleic acid molecule comprises a mutation. In some embodiments, a mutation is introduced into the nucleic acid molecule. In some embodiments, the mutation is in the coding sequence. In some embodiments, the mutation is in the noncoding sequence of the nucleic acid molecule. In some embodiments, the mutation results in modulated interaction strength between a nucleic acid molecule region and a ribosomal RNA compared to the interaction strength between an unmodified nucleic acid molecule and a ribosomal RNA. In some embodiments, the mutation modulates local interaction strength. In some embodiments, the mutation modulates interaction strength at the mutated nucleotide. In some embodiments, the mutation is a mutation to a nucleotide with stronger interaction. In some embodiments, the mutation is a mutation to a nucleotide with a weaker interaction. In some embodiments, the mutation modulates interaction strength in a particular region. In some embodiments, the mutation modulates interaction strength in a particular subregion. In some embodiments, the mutation modulates interaction strength of a subregion of the mRNA that is bound by the aSD sequence of a small ribosomal subunit.
In some embodiments, at least one mutation is introduced to at least one region of the nucleic acid molecule. In some embodiments, the mutation is in a region. In some embodiments, the region is selected from the group consisting of:
In some embodiments, the mutation is in a region comprising positions −8 through −17 upstream of a TSS. In some embodiments, the mutation is in a region comprising positions −1 upstream of a translational start site through position 5 downstream of the translational start site.
In some embodiments, the mutation is in a region comprising positions 6 through 25 downstream of a TSS. In some embodiments, the mutation is in a region comprising positions 26 downstream of a TSS through position −13 upstream of a translational termination site.
In some embodiments, the mutation is in a region comprising positions −8 through −17 upstream of a TTS. In some embodiments, the mutation is in a region comprising positions −9 through −12 upstream of a TTS. In some embodiments, the region comprising positions −8 through −17 upstream of the TTS is a region comprising position −9 through −12 upstream of the TTS. In some embodiments, the mutation is in a region comprising positions downstream of a TTS. In some embodiments, the region from position 26 downstream of the TSS through position −13 upstream of the TTS comprises at most 400 nucleotides. In some embodiments, the region from position 26 downstream of the TSS through position −13 upstream of the TTS comprises or consists of position 26 through position 400 downstream of the TSS.
In some embodiments, the mutation is in a region comprising positions −8 through −17 upstream of a TSS, increases interaction strength and enhances translation potential. In some embodiments, the mutation is in a region comprising positions −8 through −17 upstream of a TSS, decreases interaction strength and decreases translation potential. In some embodiments, the mutation is in a region comprising positions −1 upstream of a TSS through position 5 downstream of the TSS, increases interaction strength and increases translation potential. In some embodiments, the mutation is in a region comprising positions −1 upstream of a TSS through position 5 downstream of the TSS, decreases interaction strength and decreases translation potential. In some embodiments, the mutation is in a region comprising positions 6 through 25 downstream of a TSS, increases interaction strength and decreases translation potential. In some embodiments, the mutation is in a region comprising positions 6 through 25 downstream of a TSS, decreases interaction strength and increases translation potential. In some embodiments, the mutation is in a region comprising positions 26 downstream of a TSS through position −13 upstream of a translational termination site, increases interaction strength and decreases translation potential. In some embodiments, the mutation is in a region comprising positions 26 downstream of a TSS through position −13 upstream of a translational termination site, decreases interaction strength and increases translation potential. In some embodiments, the mutation is in a region comprising positions −8 through −17 upstream of a TTS, increases interaction strength and increases translation potential. In some embodiments, the mutation is in a region comprising positions −8 through −17 upstream of a TTS, decreases interaction strength and decreases translation potential. In some embodiments, the mutation is in a region comprising positions downstream of a TTS, increases interaction strength and decreases translation potential. In some embodiments, the mutation is in a region comprising positions downstream of a TTS, decreases interaction strength and increases translation potential. Thus, it can be understood that interaction strength and translation potential are correlated in regions between −8 and −17 in the 5′ UTR, between −1 of the 5′ UTR and +5 of the coding region, and between −8 to −17 relative to the TTS; whereas interaction strength and translation potential are inversely related in the middle regions of the coding region (from +6 relative to the TSS to −12 relative to the TTS) and in the 3′ UTR. This is particularly true from +6 to +25 relative to the TSS.
“Interaction strength modulation” refers to increasing or decreasing the interaction strength between a nucleic acid molecule and a ribosomal RNA sequence. In some embodiments, the interaction strength is modulated at the site of the mutation. In some embodiments, the interaction strength is modulated in the region comprising the mutation. In some embodiments, the interaction strength is modulated in a subregion comprising the mutation.
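By way of non-limiting illustration, the region-dependent relationship between interaction strength and translation potential summarized above can be captured in a small lookup table, as sketched below. The region labels and the function name `predicted_potential_change` are hypothetical names introduced only for this example, and a positive `strength_change` is assumed to mean a stronger interaction.

```python
# +1: interaction strength and translation potential move in the same direction.
# -1: they move in opposite directions.
REGION_SIGN = {
    "upstream_of_tss": +1,    # positions -8 through -17 upstream of the TSS
    "start_site": +1,         # position -1 upstream through position 5 downstream of the TSS
    "early_coding": -1,       # positions 6 through 25 downstream of the TSS
    "mid_coding": -1,         # position 26 downstream of the TSS through -13 upstream of the TTS
    "upstream_of_tts": +1,    # positions -8 through -17 upstream of the TTS
    "downstream_of_tts": -1,  # positions downstream of the TTS
}


def predicted_potential_change(region: str, strength_change: float) -> str:
    """Return the expected direction of change in translation potential.

    Assumption: strength_change > 0 means the mutation strengthens the
    interaction and strength_change < 0 means it weakens the interaction.
    """
    if strength_change == 0:
        return "unchanged"
    direction = REGION_SIGN[region] * (1 if strength_change > 0 else -1)
    return "increase" if direction > 0 else "decrease"


print(predicted_potential_change("start_site", +1.0))
print(predicted_potential_change("early_coding", +1.0))
```

Such a table gives only the expected direction of the effect in each region; it does not estimate its magnitude.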
According to some embodiments, interaction strength modulation may result in modifying at least one step of the translation process including, but not limited to, increased translation initiation efficiency, decreased translation initiation efficiency, increased translation initiation rate, decreased translation initiation rate, increased diffusion of the small ribosomal subunit to the initiation site, decreased diffusion of the small subunit to the initiation site, increased elongation rate, decreased elongation rate, optimization of ribosomal allocation, deoptimization of ribosomal allocation, increased chaperone recruitment, decreased chaperone recruitment, increased termination accuracy, decreased termination accuracy, increased translational read-through, decreased translational read-through, increased protein level and decreased protein level. Each possibility represents a separate embodiment of the invention. In some embodiments, modulating interaction strength alters translation potential.
As used herein, the term “translation potential” refers to the potential translation that would occur if the nucleic acid were introduced into a system competent to translate the nucleic acid. In some embodiments, translation potential comprises translation rate. In some embodiments, translation potential comprises translation efficiency. In some embodiments, translation potential comprises translation initiation rate or efficiency. In some embodiments, translation potential comprises ribosome diffusion. In some embodiments, translation potential comprises ribosomal allocation. In some embodiments, translation potential comprises termination accuracy. In some embodiments, translation potential comprises termination efficiency. In some embodiments, translation potential comprises termination rate. In some embodiments, translation potential comprises total protein yield.
In some embodiments, translation is in vivo translation. In some embodiments, translation is in vitro translation. In vitro translation systems are well known in the art, and include for example, rabbit reticulocyte lysates. In some embodiments, translation comprises translation pre-initiation. In some embodiments, translation comprises translation initiation. In some embodiments, translation comprises early elongation. In some embodiments, translation comprises elongation. In some embodiments, translation comprises translation termination.
In some embodiments, the interaction strength is increased by at least 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 250%, 300%, 350%, 400%, 450%, 500%, 1000%, or 10000% relative to the interaction strength between an unmodified region of a nucleic acid molecule and a ribosomal RNA. Each possibility represents a separate embodiment of the invention.
In some embodiments, a strong interaction is an interaction of at least 1.3, 1.5, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7.0, 7.1, 7.2 or 7.3 kcal/mol. Each possibility represents a separate embodiment of the invention. According to some embodiments, the interaction strength is increased to a strong interaction strength. Organism-specific interaction strengths are provided in Table 1. In some embodiments, the interaction strengths (hybridization energy values or “H.E.V”) of specific 6-nucleotide long subregions of an mRNA to canonical and non-canonical aSD sequences are as provided in Table 3. Organism-specific aSD sequences are known in the art and can be determined for each organism selected.
Achromobacter denitrificans
Acidovorax avenae subsp
Advenella kashmirensis WT001
Alcaligenaceae bacterium LMG
Alcaligenes faecalis
Alicycliphilus denitrificans BC
Aquabacterium sp NJ1
Aquaspirillum sp LM1
Azoarcus aromaticum EbN1
Betaproteobacteria bacterium GR1643
Bordetella avium 197N
Burkholderia ambifaria
Burkholderiales bacterium 23
Candidatus Accumulibacter phosphatis
Castellaniella defragrans 65Phen
Chromobacterium sphagni
Collimonas arenae
Comamonas aquatica
Cupriavidus basilensis
Curvibacter sp AEP13
Dechloromonas agitata is5
Dechlorosoma suillum PS
Delftia acidovorans
Diaphorobacter polyhydroxybutyrativorans
Gallionella capsiferriformans ES2
Herbaspirillum frisingense
Herminiimonas arsenicoxydans
Hydrogenophaga crassostreae
Janthinobacterium agaricidamnosum
Jeongeupia sp USM3
Laribacter hongkongensis
Leptothrix cholodnii SP6
Limnohabitans sp 63ED372
Massilia putida
Methylibium petroleiphilum PM1
Methylophilus sp 5
Methylotenera versatilis 301
Methyloversatilis discipulorum
Mitsuaria sp 7
Nitrosomonas communis
Nitrosospira briensis C128
Noviherbaspirillum autotrophicum
Paraburkholderia caballeronis
Paucibacter sp KCTC
Polaromonas glacialis
Pseudogulbenkiania sp MAI1
Pusillimonas sp T77
Ralstonia eutropha H16
Ramlibacter tataouinensis
Rhizobacter gummiphilus
Rhodoferax antarcticus
Roseateles depolymerans
Rubrivivax gelatinosus IL144
Sideroxydans lithotrophicus ES1
Sulfuricella denitrificans skB26
Sulfuritalea hydrogenivorans sk43H
Thauera chlorobenzoica
Thiomonas sp str
Verminephrobacter eiseniae EF012
Vitreoscilla filiformis
Vogesella sp LIG4
Polyangium brachysporum
Pseudomonas mesoacidophila
Nostoc azollae 0708
Acaryochloris marina MBIC11017
Anabaena cylindrica PCC
Anabaenopsis circularis NIES21
Arthrospira platensis C1
Aulosira laxa NIES50
Calothrix brevissima NIES22
Chamaesiphon minutus PCC
Chondrocystis sp NIES4102
Chroococcidiopsis thermalis PCC
Crinalium epipsammum PCC
Cyanobacterium aponinum PCC
Cyanobium gracile PCC
Cyanothece sp ATCC
Cylindrospermopsis raciborskii CS505
Cylindrospermum stagnale PCC
Dactylococcopsis salina PCC
Dolichospermum compactum
Filamentous cyanobacterium ESFC1
Fischerella sp NIES3754
Fortiea contorta PCC
Fremyella diplosiphon NIES3275
Geitlerinema sp PCC
Geminocystis herdmanii PCC
Gloeobacter kilaueensis JS1
Gloeocapsa sp PCC
Gloeomargarita lithophora
Halomicronema hongdechloris C2206
Halothece sp PCC
Leptolyngbya boryana dg5
Lyngbya confervoides BDU141951
Mastigocladopsis repens PCC
Microcoleus sp PCC
Microcystis aeruginosa NIES2481
Moorea bouillonii PNG
Nodosilinea nodulosa PCC
Nodularia sp NIES3585
Nostoc carneum NIES2107
Nostocales cyanobacterium HT582
Oscillatoria acuminata PCC
Oscillatoriales cyanobacterium JSC12
Planktothrix agardhii NIVACYA
Pleurocapsa sp PCC
Pseudanabaena sp PCC
Raphidiopsis curvata NIES932
Rivularia sp PCC
Scytonema hofmannii PCC
Sphaerospermopsis kisseleviana
Spirulina major PCC
Stanieria cyanosphaera PCC
Synechococcus sp 60AY4M2
Synechocystis sp PCC
Tolypothrix tenuis PCC
Trichodesmium erythraeum IMS101
Scytonema hofmannii UTEX
Anaeromyxobacter dehalogenans
Bilophila wadsworthia 316
Deferrisoma camini S3R1
Desulfarculus baarsii DSM
Desulfatibacillum alkenivorans AK01
Desulfobacca acetoxidans DSM
Desulfobacter postgatei 2ac9
Desulfobacterium autotrophicum
Desulfobacula toluolica Tol2
Desulfocapsa sulfexigens DSM
Desulfococcus multivorans
Desulfomicrobium baculatum DSM
Desulfomonile tiedjei DSM
Desulfonatronum lacustre DSM
Desulfotalea psychrophila LSv54
Desulfotignum balticum DSM
Desulfovibrio africanus str
Desulfurivibrio alkaliphilus AHT2
Desulfuromonas soudanensis
Geoalkalibacter subterraneus
Geobacter anodireducens
Geopsychrobacter electrodiphilus
Haliangium ochraceum DSM
Melittangium boletus DSM
Nannocystis exedens
Pelobacter acetylenicus
Pseudodesulfovibrio indicus
Sandaracinus amylolyticus
Sorangium cellulosum So
Syntrophobacter fumaroxidans MPOB
Syntrophorhabdus aromaticivorans UI
Syntrophus aciditrophicus SB
Vulgatibacter incomptus
Acidihalobacter ferrooxidans
Acinetobacter baumannii
Aeromonas aquatica
Agarilytica rhodophyticola
Agarivorans gilvus
Alcanivorax borkumensis SK2
Algiphilus aromaticivorans DG1253
Aliivibrio salmonicida LFI1238
Alkalilimnicola ehrlichii MLHE1
Allochromatium vinosum DSM
Alteromonadaceae bacterium Bs12
Alteromonas addita
Azotobacter chroococcum
Bacterioplanes sanyensis
Beggiatoa alba B18LD
Brenneria goodwinii
Budvicia aquatica
Candidatus Sodalis pierantonius
Cedecea davisae DSM
Cellvibrio japonicus Ueda107
Chania multitudinisentens RB25
Chromatiaceae bacterium
Chromohalobacter salexigens DSM
Citrobacter amalonaticus
Cobetia marina
Colwellia beringensis
Congregibacter litoralis KT71
Cronobacter condimenti 1330
Dokdonella koreensis DS123
Dyella japonica A8
Ectothiorhodospira sp BSL9
Edwardsiella anguillarum ET080813
Enterobacter asburiae
Enterobacteriaceae bacterium
Erwinia amylovora
Escherichia albertii
Ferrimonas balearica DSM
Flavobacterium sp 29
Fluoribacter dumoffii NY
Frateuria aurantia DSM
Gibbsiella quercinecans
Gilliamella apicola
Gilvimarinus agarilyticus
Glaciecola nitratireducens FR1064
Granulosicoccus antarcticus
Grimontia hollisae
Gynuella sunshinyii YC6258
Hafnia alvei
Hahella chejuensis KCTC
Halioglobus japonicus
Halomonas aestuarii
Halotalea alkalilenta
Idiomarina sp 513
Immundisolibacter cernigliae
Klebsiella aerogenes
Kluyvera intermedia
Kosakonia cowanii
Kushneria sp X49
Lacimicrobium alkaliphilum
Leclercia adecarboxylata
Legionella anisa
Lelliottia amnigena
Photobacterium damselae subsp
Acetobacterium woodii DSM
Acutalibacter muris
Aeribacillus pallidus
Alicyclobacillus acidocaldarius subsp
Alkaliphilus metalliredigens QYMF
Anaeromassilibacillus sp
Anaerostipes hadrus
Aneurinibacillus migulanus
Anoxybacillus sp B2M1
Blautia coccoides
Brevibacillus brevis
Butyrivibrio hungatei
Carnobacterium gallinarum DSM
Clostridioides difficile
Cohnella panacarvi Gsoil
Dehalobacter sp CF
Dehalobacterium formicoaceticum
Desulfitobacterium dehalogenans
Desulfosporosinus acidiphilus SJ4
Eisenbergiella tayi
Erysipelotrichaceae bacterium 146
Ethanoligenens harbinense YUAN3
Exiguobacterium acetylicum DSM
Faecalibacterium prausnitzii
Fictibacillus arsenicus
Flavonifractor plautii
Geobacillus genomosp 3
Geosporobacter ferrireducens
Gottschalkia acidurici 9a
Halobacillus halophilus
Herbivorax saccincola
Hungatella hathewayi WAL18680
Intestinimonas butyriciproducens
Jeotgalibacillus malaysiensis
Kyrpidia sp EA1
Lachnoclostridium phytofermentans
Lactobacillus casei
Lentibacillus amyloliquefaciens
Limnochorda pilosa
Listeria innocua Clip11262
Lysinibacillus fusiformis
Mahella australiensis 501
Niameybacter massiliensis
Novibacillus thermophilus
Numidum massiliense
Oceanobacillus iheyensis HTE831
Oscillibacter valericigenes Sjm1820
Paenibacillaceae bacterium GAS479
Paeniclostridium sordellii
Parageobacillus genomosp 1
Pelosinus fermentans
Peptoclostridium difficile
Peptostreptococcaceae bacterium VA2
Planococcus antarcticus DSM
Planomicrobium sp ES2
Pseudobacteroides cellulosolvens
Robinsoniella sp KNHs210
Roseburia hominis A2183
Ruminiclostridium sp KB18
Ruminococcaceae bacterium AE2021
Ruminococcus albus 7
Rummeliibacillus stabekisii
Saccharibacillus sacchari DSM
Salipaludibacillus agaradhaerens
Sediminibacillus massiliensis isolate
Selenomonas ruminantium subsp
Solibacillus silvestris
Sporolactobacillus pectinivorans
Sporosarcina globispora
Staphylococcus aureus
Sulfobacillus thermosulfidooxidans
Symbiobacterium thermophilum IAM
Syntrophobotulus glycolicus DSM
Terribacillus aidingensis
Thalassobacillus sp TM1
Thermanaeromonas toyohensis ToBE
Thermicanus aegyptius DSM
Thermincola potens JR
Thermoanaerobacterium sp RBIITD
Thermobacillus composti KWC4
Tumebacillus algifaecis
Ureibacillus thermosphaericus
Virgibacillus dokdonensis
Viridibacillus sp OK051
Desulfotomaculum guttoideum
Eubacterium cellulosolvens 6
Bacillus abyssalis
Clostridium difficile CD196
Desulfotomaculum acetoxidans DSM
Eubacterium limosum
Bacillus thuringiensis serovar
Bacillus clarkii
Brevibacterium frigoritolerans
Arcobacter nitrofigilis DSM
Bacteriovorax marinus SJ
Bdellovibrio bacteriovorus
Halobacteriovorax marinus
Leucothrix mucor DSM
Luminiphilus syltensis NOR51B
Luteibacter sp 9133
Luteimonas abyssi
Lysobacter antibioticus
Marichromatium purpuratum 984
Marinobacter adhaerens HP15
Marinobacterium sp ST5810
Marinomonas mediterranea MMB1
Methylobacter luteus IMVB3098
Methylococcus capsulatus str
Methylomagnum ishizawai
Methylomarinum vadi
Methylomicrobium agile
Methylomonas denitrificans
Methylophaga nitratireducenticrescens
Methylosarcina fibrata AMLC10
Methylovulum miyakonense HT12
Microbulbifer agarilyticus
Morganella morganii
Moritella viscosa
Neptunomonas phycophila
Nitrococcus mobilis Nb231
Nitrosococcus halophilus Nc4
Obesumbacterium proteus
Oceanicoccus sagamiensis
Oceanimonas sp GK1
Oceanisphaera profunda
Oleiphilus messinensis
Oleispira antarctica
Pantoea agglomerans
Paraglaciecola psychrophila 170
Pectobacterium atrosepticum
Photorhabdus asymbiotica
Plautia stali symbiont
Plesiomonas shigelloides
Pluralibacter gergoviae
Polycyclovorans algicola TG408
Pragia fontium
Proteus mirabilis
Providencia alcalifaciens
Pseudoalteromonas agarivorans DSM
Pseudohongiella spirulinae
Pseudoxanthomonas spadix BDa59
Psychrobacter alimentarius
Psychromonas ingrahamii 37
Rahnella aquatilis CIP
Raoultella ornithinolytica
Reinekea forsetii
Rhodanobacter denitrificans
Rhodobaca barguzinensis
Rhodobacter capsulatus SB
Rhodobacteraceae bacterium
Rhodobacterales bacterium Y4I
Rhodomicrobium vannielii ATCC
Rhodoplanes sp Z2YC6860
Rhodopseudomonas palustris BisA53
Rhodovibrio salinarum DSM
Rhodovulum sp ES010
Roseobacter denitrificans OCh
Roseomonas gilardii
Roseovarius mucosus
Ruegeria mobilis F1926
Saccharophagus degradans 240
Sagittula sp P11
Salmonella bongori N26808
Sedimenticola thiotaurini
Sedimentitalea nanhaiensis DSM
Serratia ficaria
Shewanella algae
Shigella dysenteriae Sd197
Shimwellia blattae DSM
Shinella sp HZN7
Silicibacter lacuscaerulensis ITI1157
Simiduia agarivorans SA1
Sinorhizobium americanum
Sodalis glossinidius str
Sphingobium baderi
Sphingopyxis alaskensis RB2256
Sphingorhabdus flavimaris
Spongiibacter sp IMCC21906
Stappia sp ES058
Starkeya novella DSM
Stenotrophomonas acidaminiphila
Steroidobacter denitrificans
Sulfitobacter donghicola DSW25
Sulfurifustis variabilis
Sulfurospirillum halorespirans DSM
Tateyamaria omphalii
Tatlockia micdadei
Tatumella citrea
Teredinibacter sp 1162TS0a05
Thalassobium sp R2A62
Thalassolituus oleivorans
Thalassospira sp CSC3H3
Thalassotalea sp LPB0090
Thioalkalivibrio nitratireducens DSM
Thiobacimonas profunda
Thioclava nitratireducens
Thiocystis violascens DSM
Thioflavicoccus mobilis 8321
Thiohalobacter thiocyanaticus
Thiolapillus brandeum
Thioploca ingrica
Thiothrix nivea DSM
Tistrella mobilis KA081020065
Tolumonas auensis DSM
Variibacter gotjawalensis
Vibrio alginolyticus
Vibrio shilonii
Wenzhouxiangella marina
Woeseia oceani
Xanthobacter autotrophicus Py2
Xanthobacteraceae bacterium 501b
Xanthomonas albilineans
Xenorhabdus bovienii str
Xuhuaishuia manganoxidans
Yersinia aldovae 67083
Zhongshania aliphaticivorans
Zobellella denitrificans
Zooshikella ganghwensis
Pseudomonas syringae pv
Salinispira pacifica
Sphaerochaeta globosa str
Spirochaeta africana DSM
Treponema azotonutricium ZAS9
Acetobacter aceti
Acidiphilium cryptum JF5
Afipia broomeae
Agrobacterium genomosp 3
Altererythrobacter atlanticus
Aminobacter aminovorans
Ancylobacter sp FA202
Antarctobacter heliothermus
Asaia bogorensis NBRC
Aurantimonas manganoxydans
Azorhizobium caulinodans ORS
Azospirillum brasilense
Beijerinckia indica subsp
Belnapia sp F41
Blastochloris viridis
Blastomonas sp RAC04
Bosea sp AS1
Bradyrhizobiaceae bacterium SG6C
Bradyrhizobium diazoefficiens
Brevundimonas diminuta
Brucella abortus 2308
Candidatus Filomicrobium marinum
Caulobacter crescentus CB15
Caulobacteraceae bacterium
Celeribacter ethanolicus
Chelativorans sp BNC1
Chelatococcus daeguensis
Citromicrobium sp JL477
Cohaesibacter sp ES047
Confluentimicrobium sp EMB200NS6
Croceicoccus marinus
Defluviimonas alba
Devosia sp A16
Dinoroseobacter shibae DFL
Ensifer adhaerens
Erythrobacter atlanticus
Fulvimarina pelagi HTCC2506
Geminicoccus roseus DSM
Gluconacetobacter diazotrophicus PA1
Gluconobacter albidus
Halocynthiibacter arcticus
Hartmannibacter diazotrophicus
Henriciella litoralis
Hirschia baltica ATCC
Hoeflea phototrophica DFL43
Hyphomicrobium denitrificans 1NES1
Hyphomonas neptunium ATCC
Jannaschia sp CCS1
Ketogulonicigenium vulgare
Komagataeibacter europaeus
Labrenzia aggregata
Leisingera aquimarina DSM
Litoreibacter janthinus
Loktanella vestfoldensis
Magnetococcus marinus MC1
Magnetospira sp QH2
Magnetospirillum gryphiswaldense
Maricaulis maris MCS10
Marinovum algicola DG
Martelella endophytica
Mesorhizobium amorphae
Methylobacterium aquaticum
Methylocapsa acidiphila B2
Methyloceanibacter caenitepidi
Methylocella silvestris BL2
Methylocystis bryophila
Methyloferula stellata AR4
Methylopila sp 73B
Methylosinus sp LW3
Microvirga ossetica
Neoasaia chiangmaiensis
Neorhizobium galegae complete
Nitratireductor basaltis
Nitrobacter hamburgensis X14
Novosphingobium aromaticivorans
Oceanicaulis sp HTCC2633
Oceanicola litoreus
Ochrobactrum pseudogrignonense
Octadecabacter antarcticus 307
Oligotropha carboxidovorans OM4
Pacificimonas flava
Pannonibacter phragmitetus
Paracoccus aminophilus JCM
Parvibaculum lavamentivorans DS1
Pelagibaca abyssi
Pelagibacterium halotolerans B2
Phaeobacter gallaeciensis
Phenylobacterium zucineum HLK1
Phyllobacterium sp Tri48
Planktomarina temperata RCA23
Polymorphum gilvum SL003B26A1
Porphyrobacter neustonensis
Pseudolabrys sp Root1462
Pseudooceanicola batsensis
Pseudophaeobacter arcticus DSM
Pseudorhodoplanes sinuspersici
Pseudovibrio sp FOBEG1
Puniceibacterium sp IMCC21224
Reyranella massiliensis 521
Rhizobium etli
Rhizorhabdus dicambivorans
Rhodospirillum photometricum DSM
E. coli MG1655
According to some embodiments, the interaction strengths of various aSD sequences with different 6 nt sequences are given in Table 3. Any 6 nt sequence not provided in Table 3 for a specific aSD sequence has an interaction strength of zero.
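The lookup rule stated above, including the zero default for unlisted 6 nt sequences, can be sketched as follows. This is a minimal illustration only: the dictionary `TABLE_3_EXCERPT`, the function name `interaction_strength`, and the numeric values are hypothetical placeholders and are not taken from Table 3.

```python
from typing import Dict

# Hypothetical stand-in for Table 3: aSD sequence -> {6-nt sequence: strength}.
# The numbers below are placeholders for illustration, not values from Table 3.
TABLE_3_EXCERPT: Dict[str, Dict[str, float]] = {
    "CTCCTT": {"AAGGAG": 5.0, "AGGAGA": 3.5},
}


def interaction_strength(asd: str, hexamer: str,
                         table: Dict[str, Dict[str, float]] = TABLE_3_EXCERPT) -> float:
    """Return the tabulated strength for (aSD, 6-nt sequence).

    A 6-nt sequence that is not listed for the given aSD gets strength 0.0.
    An aSD that is absent from the table raises KeyError (an assumption of
    this sketch), so a missing aSD is not silently treated as zero.
    """
    hexamer = hexamer.upper()
    if len(hexamer) != 6:
        raise ValueError("expected a 6-nt sequence")
    per_asd = table[asd.upper()]
    return per_asd.get(hexamer, 0.0)
```

Raising an error for an unknown aSD, rather than returning zero, is a design choice of this sketch; the text only specifies the zero default for 6 nt sequences missing under a listed aSD.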
According to some embodiments, Table 3 includes the interaction strength of the canonical aSD sequence and non-canonical aSD sequences GCCGCG, CGGCTG, CTCCTT, GCCGTA, GCGGCT, GTGGCT and GGCTGG. The interaction strengths that appear in Table 3 are sorted by increasing interaction strength. The interactions gradually increase from weak, to intermediate, to strong interaction strengths. According to some embodiments, interaction strength classification as weak, intermediate or strong is organism-specific. In some embodiments, organism-specific interaction strength classifications as weak, intermediate and strong are provided in Table 1. According to some embodiments, the interaction strength classifications for a bacterium that is not listed in Table 1 can be deduced based on the interaction strength classification of a bacterium that is disclosed in Table 1 and has the closest evolutionary distance to it. In some embodiments, the interaction strength classification for a bacterium that is not listed in Table 1 can be deduced by using the strengths for a bacterium with the same aSD or aSD subregion sequence.
In some embodiments, the interaction strength is decreased by at least 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 99% or 100%, relative to the interaction strength between an unmodified region of a nucleic acid molecule and a ribosomal RNA. Each possibility represents a separate embodiment of the invention.
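As a worked illustration of the percentage decrease described above, the following sketch computes the decrease of a mutated region relative to the unmodified region. The function name `percent_decrease` is hypothetical, and interaction strengths are assumed to be expressed as non-negative magnitudes in kcal/mol.

```python
def percent_decrease(unmodified: float, modified: float) -> float:
    """Percentage by which the interaction strength dropped after mutation.

    Both strengths are assumed to be non-negative magnitudes (kcal/mol).
    A negative result means the strength increased.
    """
    if unmodified <= 0:
        raise ValueError("the unmodified strength must be positive")
    return 100.0 * (unmodified - modified) / unmodified
```

For example, a region whose strength drops from 4.0 to 3.0 kcal/mol has been decreased by 25%, and a region whose strength drops to zero has been decreased by 100%.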
In some embodiments, a weak interaction is an interaction of at most 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7 or 2.8 kcal/mol. Each possibility represents a separate embodiment of the invention. According to some embodiments, the interaction strength is decreased to a weak interaction strength. Organism-specific interaction strengths are provided in Table 1. In some embodiments, the interaction strengths of the canonical aSD sequence and non-canonical aSD sequences are as provided in Table 3. Organism-specific aSD sequences are known in the art, and can be found, for example, in Ruhul Amin, et al., “Re-annotation of 12,495 prokaryotic 16S rRNA 3′ ends and analysis of Shine-Dalgarno and anti-Shine-Dalgarno sequences”, PLoS One, 2018; 13(8).
In some embodiments, an intermediate interaction is an interaction between a weak and a strong interaction. According to some embodiments, the interaction strength is modulated to an intermediate interaction strength. In some embodiments, the interaction strength is decreased to an intermediate interaction strength. In some embodiments, the interaction strength is increased to an intermediate interaction strength. It will be appreciated by a skilled artisan that weak, strong and intermediate interactions are distinct to each prokaryote and what may numerically be a strong interaction for one organism may be weak for another. Organism-specific interaction strengths are provided in Table 1. In some embodiments, the interaction strengths of the canonical aSD sequence and non-canonical aSD sequences are as provided in Table 3.
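The weak, intermediate and strong classification can be sketched as a comparison against two organism-specific cut-offs. The function name `classify_strength` and the parameters `weak_max` and `strong_min` are hypothetical; the actual cut-offs for a given organism would come from Table 1 and are not reproduced here.

```python
def classify_strength(strength: float, weak_max: float, strong_min: float) -> str:
    """Classify an interaction strength for one organism.

    weak_max   : largest strength still considered weak (organism-specific)
    strong_min : smallest strength considered strong (organism-specific)
    Anything strictly between the two cut-offs is intermediate.
    """
    if weak_max >= strong_min:
        raise ValueError("weak_max must be below strong_min")
    if strength <= weak_max:
        return "weak"
    if strength >= strong_min:
        return "strong"
    return "intermediate"
```

Because the cut-offs are passed in per organism, the same numeric strength can be classified as strong for one organism and weak for another.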
In some embodiments, the interaction strength is the interaction strength of a subregion of the nucleic acid molecule. In some embodiments, the subregion is at least 1, 2, 3, 4, 5, 6, 7, or 8 nucleotides long. Each possibility represents a separate embodiment of the invention. In some embodiments, the subregion is at most 5, 6, 7, 8, 9, 10, 11 or 12 nucleotides long. Each possibility represents a separate embodiment of the invention. In some embodiments, the subregion is between 4-12, 5-12, 6-12, 7-12, 8-12, 4-11, 5-11, 6-11, 7-11, 8-11, 4-10, 5-10, 6-10, 7-10, 8-10, 4-9, 5-9, 6-9, 7-9, 4-8, 5-8, 6-8 or 7-8 nucleotides long. Each possibility represents a separate embodiment of the invention. In some embodiments, the subregion is the size of an SD sequence. In some embodiments, the subregion is the size of an aSD sequence. In some embodiments, the subregion is 6 nucleotides in length. According to some embodiments, organism-specific 6-nucleotide subregions are provided in Table 3.
In some embodiments, the mutation is within more than one subregion. In some embodiments, the mutation modulates the interaction strength of each subregion differently. In some embodiments, increasing interaction is increasing the cumulative interaction of all the subregions comprising the mutation. In some embodiments, decreasing interaction is decreasing the cumulative interaction of all the subregions comprising the mutation.
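Because a single mutated position falls within several overlapping subregions, the cumulative interaction can be sketched as a sum over every window that contains the mutated position. The helper names, the 0-based indexing, and the default window size of 6 are assumptions of this sketch; `strength_of` stands in for any function that scores a subregion, such as a Table 3 lookup.

```python
from typing import Callable, List


def windows_containing(seq_len: int, pos: int, k: int = 6) -> List[int]:
    """Start indices (0-based) of every k-nt window that includes position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, seq_len - k)
    return list(range(first, last + 1))


def cumulative_strength(seq: str, pos: int,
                        strength_of: Callable[[str], float], k: int = 6) -> float:
    """Sum the strength of all k-nt subregions that contain position pos."""
    return sum(strength_of(seq[i:i + k])
               for i in windows_containing(len(seq), pos, k))
```

Comparing `cumulative_strength` before and after a substitution at `pos` gives the net change across all affected subregions, even when the mutation raises the strength of some windows and lowers that of others.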
In some embodiments, the mutation is a silent mutation. In some embodiments, the mutation results in the alteration of an amino acid of the sequence encoded by the nucleic acid of the invention to an amino acid with a similar functional characteristic. In some embodiments, a characteristic is selected from size, charge, isoelectric point, shape, hydrophobicity and structure. In some embodiments of the methods of the invention, the mutation results in a synonymous codon (synonymous codons are provided in Table 4). In some embodiments, the mutation does not alter protein function. In some embodiments, the mutation alters protein function. As used herein, the term “silent mutation” refers to a mutation that does not affect or has little effect on protein functionality. A silent mutation can be a synonymous mutation and therefore not change the amino acids at all, or a silent mutation can change an amino acid to another amino acid with the same functionality or structure, thereby having no or a limited effect on protein functionality.
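Enumerating synonymous alternatives for a codon can be sketched as follows. The sketch assumes the standard genetic code as a stand-in for Table 4, which is not reproduced here; the names `CODON_TABLE` and `synonymous_codons` are hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order (assumed here as a stand-in for Table 4).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codons(codon: str) -> list:
    """Return the other codons that encode the same amino acid, sorted."""
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items()
                  if a == amino_acid and c != codon)
```

Each returned codon is a candidate synonymous substitution; a codon with no alternatives, such as ATG, returns an empty list, so the corresponding position cannot be changed without altering the amino acid.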
In some embodiments, the nucleic acid molecule comprises at least 1, 2, 3, 4, 5, 7, 10, 20, 30, 40, 50, 60, 70, 80, 100, 200, 300, 400, 500, 1000 or 10000 mutations. Each possibility represents a separate embodiment of the invention. According to some embodiments, the nucleic acid molecule comprises mutations in at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 75% or 100% of positions of the nucleic acid molecule. Each possibility represents a separate embodiment of the invention. In some embodiments, more than one mutation is in the same region. In some embodiments, more than one mutation is in the same subregion. In some embodiments, the nucleic acid molecule comprises at least two mutations, wherein the two mutations are in different regions. In some embodiments, the nucleic acid molecule comprises at least two mutations, wherein the two mutations are in different subregions.
In some embodiments, the nucleic acid molecule comprises a second mutation in a different region than the at least one mutation. In some embodiments, the second mutation modulates the interaction strength of the nucleic acid molecule to a 16S ribosomal RNA (rRNA). In some embodiments, the second mutation and the at least one mutation modulate synergistically. It will be understood by a skilled artisan that mutations that modulate synergistically both affect translation in the same direction. Thus, if the at least one mutation improves translation potential, then the second mutation also improves translation potential. Similarly, if the at least one mutation decreases translation potential, then the second mutation also decreases translation potential. The two mutations need not create this effect in the same way. As a non-limiting example, the at least one mutation could increase translation initiation efficiency, while the second mutation optimizes ribosomal allocation. Similarly, for example, the at least one mutation may affect early elongation and the second mutation may affect translation termination. In some embodiments, the at least one mutation and the second mutation both improve translation efficiency. In some embodiments, the at least one mutation and the second mutation both decrease translation efficiency. In some embodiments, improving translation efficiency is increasing translation efficiency.
Introduction of a mutation into the genome of a cell is well known in the art. Any known genome editing method may be employed, so long as the mutation is specific to the location and change that is desired. Non-limiting examples of mutation methods include site-directed mutagenesis, CRISPR/Cas9 and TALEN.
In some embodiments, the nucleic acid molecule of the invention is part of a vector. In some embodiments, the vector is an expression vector. In some embodiments, the expression vector is a prokaryotic expression vector. In some embodiments, the prokaryotic expression vector comprises any sequences necessary for expression of the protein encoded by the nucleic acid molecule of the invention in a prokaryotic cell. In some embodiments, the expression vector is a eukaryotic expression vector.
According to another aspect, there is provided a biological compartment, comprising a nucleic acid molecule of the invention.
According to another aspect, there is provided a cell comprising a nucleic acid molecule of the invention.
In some embodiments, the biological compartment is a cell. In some embodiments, the biological compartment is a virion. In some embodiments, the biological compartment is a virus. In some embodiments, the biological compartment is a bacteriophage. In some embodiments, the biological compartment is an organelle. Organelles are well known in the art and include, but are not limited to, mitochondria, chloroplasts, rough endoplasmic reticulum, and nuclei.
In some embodiments, the cell is a genetically modified cell. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a bacterial cell. In some embodiments, the cell is in culture. In some embodiments, the cell is in vivo. In some embodiments, the cell is a pathogen. In some embodiments, the nucleic acid molecule of the invention is an endogenous molecule of the cell that has been mutated. In some embodiments, the nucleic acid molecule of the invention is a heterologous transgene or a heterologous gene that has been added to the cell. In some embodiments, the cell is a virally infected cell.
The bacteria may be selected from phyla or classes including but not limited to Alphaproteobacteria, Betaproteobacteria, Cyanobacteria, Deltaproteobacteria, Gammaproteobacteria, Gram-positive bacteria, Purple bacteria and Spirochaetes bacteria. According to some embodiments, the bacterium is selected from a phylum or class selected from Alphaproteobacteria, Betaproteobacteria, Cyanobacteria, Deltaproteobacteria, Gammaproteobacteria, Gram-positive bacteria, Purple bacteria and Spirochaetes bacteria. According to some embodiments, the bacterium is selected from the list provided in Table 1. According to some embodiments, the bacterial cell is not Cyanobacteria or Gram-positive bacteria.
In some embodiments, the cell comprises increased fitness. In some embodiments, the cell comprises decreased fitness. In some embodiments, the cell produces increased amounts of the protein encoded by the nucleic acid of the invention as compared to the amount of protein produced by an unmutated nucleic acid.
In some embodiments, a cell comprises a nucleic acid molecule comprising at least one mutation in at least one region of the nucleic acid molecule, wherein the region is selected from the group consisting of:
According to some embodiments, a nucleic acid molecule comprising a mutation at positions −8 through −17 upstream of a translational start site is introduced into a cell. According to some embodiments, the mutation increases the interaction strength between a nucleic acid molecule region and the 16S ribosomal RNA, thereby improving the translation initiation stage.
According to some embodiments, a nucleic acid molecule comprising a mutation at positions −1 upstream of a translational start site through position 5 downstream of the translational start site is introduced into a cell. According to some embodiments, the mutation increases the interaction strength between a nucleic acid molecule region and the 16S ribosomal RNA, thereby optimizing ribosomal allocation and chaperone recruitment in the cell.
According to some embodiments, a nucleic acid molecule comprising a mutation at positions 6 through 25 downstream of a translational start site is introduced into a cell. According to some embodiments, the mutation decreases the interaction strength between a nucleic acid molecule region and the 16S ribosomal RNA, thereby increasing translation elongation efficiency and avoiding errant translation initiation.
According to some embodiments, a nucleic acid molecule comprising a mutation at positions 25 downstream of a translational start site through position −13 upstream of a translational termination site is introduced into a cell. According to some embodiments, the mutation modulates the interaction strength between a nucleic acid molecule region and the 16S ribosomal RNA, thereby increasing the ribosome diffusion efficiency towards the regions surrounding the start codon and/or improving translation initiation efficiency. In some embodiments, the modulation is to an intermediate interaction strength.
According to some embodiments, a nucleic acid molecule comprising a mutation at positions −8 through −17 upstream of a translational termination site is introduced into a cell. According to some embodiments, the mutation increases the interaction strength between a nucleic acid molecule region and the 16S ribosomal RNA, thereby improving translation termination fidelity and/or efficiency.
According to some embodiments, a nucleic acid molecule comprising a mutation at a position downstream of a translational termination site is introduced into a cell. According to some embodiments, the mutation decreases the interaction strength between a nucleic acid molecule region and the 16S ribosomal RNA, thereby keeping the small sub-unit of the ribosome attached to the transcript after finishing the translation cycle, improving the recycling of ribosomes and thus the translation process. According to some embodiments, the mutation increases the interaction strength between a nucleic acid molecule region and the 16S ribosomal RNA, thereby keeping the small sub-unit of the ribosome attached to the transcript after finishing the translation cycle, improving the recycling of ribosomes and thus the translation process.
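The three regions defined relative to the translational start site, together with the direction of modulation described for each, can be sketched as a simple lookup. The coordinate convention (negative values upstream of the start site, positive values downstream), the assignment of the shared boundary position 25 to the 6 through 25 region, and all names below are assumptions of this sketch. Regions defined relative to the termination site are omitted because they also require the position of the stop codon.

```python
from typing import Optional, Tuple

# (first position, last position, label, direction of modulation described in the text)
START_ANCHORED_REGIONS = [
    (-17, -8, "upstream of start site (-17 to -8)", "increase"),
    (-1, 5, "around start site (-1 to 5)", "increase"),
    (6, 25, "early coding region (6 to 25)", "decrease"),
]


def start_region(rel_pos: int) -> Optional[Tuple[str, str]]:
    """Map a position relative to the start site to (region label, direction).

    Returns None for positions outside the three start-anchored regions,
    for example positions -7 through -2, which the text does not assign.
    """
    for first, last, label, direction in START_ANCHORED_REGIONS:
        if first <= rel_pos <= last:
            return label, direction
    return None
```

A candidate mutation at relative position 10, for instance, falls in the early coding region, where the text describes decreasing the interaction strength.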
By another aspect, there is provided a method for improving or impairing the translation process of a nucleic acid molecule, the method comprising introducing a mutation into the nucleic acid molecule, wherein the mutation modulates the interaction strength of the nucleic acid molecule to a 16S ribosomal RNA, thereby improving or impairing the translation process of the nucleic acid molecule.
In some embodiments, the mutation is a mutation described hereinabove. In some embodiments, the method improves the translation process. In some embodiments, the method impairs the translation process. In some embodiments, the translation process comprises translation potential. In some embodiments, the translation process in a cell is improved or impaired. In some embodiments, the translation process comprises translation pre-initiation. In some embodiments, the translation process comprises translation initiation. In some embodiments, the translation process comprises early elongation. In some embodiments, the translation process comprises elongation. In some embodiments, the translation process comprises translation termination.
The term “expression” as used herein refers to the biosynthesis of a gene product, including the transcription and/or translation of the gene product. Thus, expression of a nucleic acid molecule may refer to transcription of the nucleic acid fragment (e.g., transcription resulting in mRNA or other functional RNA) and/or translation of RNA into a precursor or mature protein (polypeptide).
Expression of a gene within a cell is well known to one skilled in the art. It can be carried out by, among many methods, transfection, transformation, viral infection, or direct alteration of the cell's genome. In some embodiments, the gene is in an expression vector such as a plasmid or viral vector.
Recombinant expression vectors generally contain at least an origin of replication for propagation in a cell and optionally additional elements, such as a heterologous polynucleotide sequence, expression control element (e.g., a promoter, enhancer), selectable marker (e.g., antibiotic resistance), poly-Adenine sequence that allows for expression of the nucleotide sequence (e.g. in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
As used herein the term “in vitro” refers to any process that occurs outside a living organism. As used herein the term “in-vivo” refers to any process that occurs inside a living organism. In one embodiment, “in-vivo” as used herein is a cell within an intact tissue or an intact organ.
In some embodiments, the gene is operably linked to a promoter. The term “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element or elements in a manner that allows for expression of the nucleotide sequence.
Various methods can be used to introduce the expression vector of the present invention into cells. Such methods are generally described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York (1989, 1992), in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, Baltimore, Md. (1989), Chang et al., Somatic Gene Therapy, CRC Press, Ann Arbor, Mich. (1995), Vega et al., Gene Targeting, CRC Press, Ann Arbor Mich. (1995), Vectors: A Survey of Molecular Cloning Vectors and Their Uses, Butterworths, Boston Mass. (1988) and Gilboa et al. [Biotechniques 4 (6): 504-512, 1986] and include, for example, stable or transient transfection, lipofection, electroporation and infection with recombinant viral vectors. In addition, see U.S. Pat. Nos. 5,464,764 and 5,487,992 for positive-negative selection methods.
General methods in molecular and cellular biochemistry, such as methods useful for carrying out DNA and protein recombination, as well as other techniques described herein, can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998).
As used herein, the term “recombinant protein” refers to protein which is coded for by a recombinant DNA and is thus not naturally occurring. The term “recombinant DNA” refers to DNA molecules formed by laboratory methods of genetic recombination. Generally, this recombinant DNA is in the form of a vector, plasmid or virus used to express the recombinant protein in a cell.
Purification of a recombinant protein involves standard laboratory techniques for extracting a recombinant protein that is essentially free from contaminating cellular components, such as carbohydrate, lipid, or other proteinaceous impurities associated with the peptide in nature. Purification can be carried out using a tag that is part of the recombinant protein or through immuno-purification with antibodies directed to the recombinant protein. Kits are commercially available for such purifications and will be familiar to one skilled in the art. Typically, a preparation of purified peptide contains the peptide in a highly-purified form, i.e., at least about 80% pure, at least about 90% pure, at least about 95% pure, greater than 95% pure, or greater than 99% pure. Each possibility represents a separate embodiment of the invention.
According to some embodiments, the invention concerns an isolated genetically modified organism, wherein at least one position of a nucleic acid molecule comprising a coding sequence comprises a sequence mutation wherein the genetically modified organism has a modified translation process as compared to an unmodified form of the same organism.
In some embodiments, improving comprises at least one of: increasing translation initiation efficiency, increasing translation initiation rate, increasing diffusion of the small subunit to the initiation site, increasing elongation rate, optimization of ribosomal allocation, increasing chaperone recruitment, increasing termination accuracy, decreasing translational read-through and increasing protein yield. In some embodiments, impairing comprises at least one of: decreasing translation initiation efficiency, decreasing translation initiation rate, decreasing diffusion of the small subunit to the initiation site, decreasing elongation rate, deoptimization of ribosomal allocation, decreasing chaperone recruitment, decreasing termination accuracy, increasing translational read-through and decreasing protein level.
By another aspect, there is provided a method of improving the translation process, the method comprising introducing a sequence mutation to a nucleic acid molecule comprising a coding sequence, thereby modulating the interaction strength of the nucleic acid molecule to a 16S ribosomal RNA and modifying the translation process of a nucleic acid molecule.
By another aspect, there is provided a method of modifying a biological compartment, the method comprising performing a method of the invention on a nucleic acid molecule, thereby modifying the translation potential of the nucleic acid molecule, and expressing the modulated nucleic acid molecule within the cell, thereby modifying a cell.
By another aspect, there is provided a method of modifying a biological compartment, the method comprising performing a method of the invention on a nucleic acid molecule within the cell, thereby modifying a cell.
According to another aspect, there is provided a method for producing a nucleic acid molecule having an optimized or deoptimized translation process, the method comprising:
By another aspect, there is provided a method for producing a nucleic acid molecule having decreased or increased translation potential, comprising:
In some embodiments, the biological compartment is a cell. In some embodiments, the biological compartment is an organelle. In some embodiments, the biological compartment is a virion. In some embodiments, the biological compartment is a bacteriophage.
In some embodiments, at least the top 1, 2, 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 mutations are introduced. Each possibility represents a separate embodiment of the invention. In some embodiments, all introduced mutations increase the translation potential. In some embodiments, all introduced mutations decrease the translation potential. In some embodiments, the mutations are selected from the mutations described hereinabove. It will be understood that the mutations are region specific: increasing interaction strength in a particular region will either increase or decrease translation potential, whereas increasing interaction strength in a different region might have a different effect on translation potential. In some embodiments, the method produces nucleic acid molecules optimized or deoptimized for translation in a target bacterium. In some embodiments, the target bacterium is a bacterium described hereinabove.
According to some embodiments, profiling the effect of a sequence mutation on the interaction strength between a nucleic acid molecule and a ribosomal RNA comprises comparing the interaction strength of a mutated sequence to a ribosomal RNA with the interaction strength of an unmodified sequence to a ribosomal RNA.
By another aspect, there is provided a computer program product for improving the translation process of a nucleic acid molecule, comprising a non-transitory computer-readable storage medium having program code embodied thereon, the program code executable by at least one hardware processor to:
By another aspect, there is provided a system for improving the translation process of a nucleic acid molecule, comprising:
By another aspect, there is provided a computer program product for profiling the interaction strength between a nucleic acid molecule and a 16S ribosomal RNA, comprising a non-transitory computer-readable storage medium having program code embodied thereon, the program code executable by at least one hardware processor to:
By another aspect, there is provided a computer program product for modulating translation potential of a nucleic acid molecule comprising a coding sequence, comprising a non-transitory computer-readable storage medium having program code embodied thereon, the program code executable by at least one hardware processor to:
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
These computer readable program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
Embodiments may comprise a computer program that embodies the functions described and illustrated herein, wherein the computer program is implemented in a computer system that comprises instructions stored in a machine-readable medium and a processor that executes the instructions. However, it should be apparent that there could be many different ways of implementing embodiments in computer programming, and the embodiments should not be construed as limited to any one set of computer program instructions. Further, a skilled programmer would be able to write such a computer program to implement one or more of the disclosed embodiments described herein. Therefore, disclosure of a particular set of program code instructions is not considered necessary for an adequate understanding of how to make and use embodiments. Further, those skilled in the art will appreciate that one or more aspects of embodiments described herein may be performed by hardware, software, or a combination thereof, as may be embodied in one or more computing systems. Moreover, any reference to an act being performed by a computer should not be construed as being performed by a single computer as more than one computer may perform the act.
By device for sequencing it is meant a combination of components that allows the sequence of a piece of DNA to be determined. In some embodiments, the testing device allows for the high-throughput sequencing of DNA. In some embodiments, the testing device allows for massively parallel sequencing of DNA. The components may include any of those described above with respect to the methods for sequencing.
In certain embodiments the system further comprises a display for the output from the processor.
Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Certain ranges are presented herein with numerical values being preceded by the term “about”. The term “about” is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes a plurality of such polynucleotides and reference to “the polypeptide” includes reference to one or more polypeptides and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements or use of a “negative” limitation.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998).
The analyzed organisms. We analyzed 551 bacteria from the following phyla or classes: Alphaproteobacteria, Betaproteobacteria, Cyanobacteria, Deltaproteobacteria, Gammaproteobacteria, Gram-positive bacteria, Purple bacteria, Spirochaetes bacteria. We analyzed an additional 76 bacteria across the tree of life that do not have a canonical aSD sequence in their 16S rRNA. Additionally, we analyzed 207 bacteria with known growth rates. The full lists can be found in Table 1. All of the bacterial genomes were downloaded from the NCBI database (ncbi.nlm.nih.gov/) in October 2017. For each gene, aside from the annotated coding regions, we also analyzed the 50 nt upstream of the translational start site and the 50 nt downstream of the translational termination site (approximating the end of the 5′UTR and the beginning of the 3′UTR, respectively).
The rRNA-mRNA interaction strength prediction and profile. The prediction of rRNA-mRNA interaction strength is based on the hybridization free energy between two sub-sequences: the first sequence is a 6 nt sequence from the mRNA and the second sequence is the aSD from the rRNA. This energy was computed based on the Vienna package RNAcoFold (ref. 35), which computes a common secondary structure of two RNA molecules. Lower, more negative free energy is related to stronger hybridization (see below).
The rRNA-mRNA interaction strength profiles include the predicted rRNA-mRNA hybridization strength for each position in each transcript (UTRs and coding regions), and in each bacterium. We calculated the interaction strength between all 6 nucleotide sequences along each transcript (UTRs and coding sequences) and the 16S rRNA aSD. For each possible genomic position along the transcripts we performed a statistical test to decide whether the potential rRNA-mRNA interaction at this position is significantly strong, intermediate, or weak. For more details, see below. We also created Z-score maps of the strength of interactions, see below.
The null model. We designed for each bacterial genome 100 randomizations according to the following null model: UTR randomized versions were generated based on nucleotide permutation which preserves the nucleotide distribution, and specifically the GC content. The coding region randomized versions were generated by permuting synonymous codons, thus preserving the codon frequencies, the amino acid order and content, and the GC content of the original protein.
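The two randomization schemes of the null model can be illustrated with a minimal Python sketch. The function names (`shuffle_utr`, `shuffle_synonymous`) are hypothetical and are not taken from the original analysis; the sketch assumes DNA-alphabet sequences, the standard amino acid assignments, and a coding sequence whose length is a multiple of three.

```python
import random
from collections import defaultdict

# Standard genetic code in TCAG order (DNA alphabet); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def shuffle_utr(utr, rng):
    """Permute the nucleotides of a UTR; composition (and GC content) is preserved."""
    nts = list(utr)
    rng.shuffle(nts)
    return "".join(nts)


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    The protein sequence, the codon frequencies and the GC content are unchanged.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    positions_by_aa = defaultdict(list)
    for pos, codon in enumerate(codons):
        positions_by_aa[CODON_TABLE[codon]].append(pos)
    shuffled = list(codons)
    for positions in positions_by_aa.values():
        pool = [codons[p] for p in positions]
        rng.shuffle(pool)
        for p, codon in zip(positions, pool):
            shuffled[p] = codon
    return "".join(shuffled)
```

Calling each function 100 times per transcript with a seeded `random.Random` instance would give one reproducible set of randomized versions per genome.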
Similar rRNA-mRNA interaction strength profiles as the ones described above were computed for the randomized versions of the transcripts, to compute p-values related to possible selection for strong/intermediate/weak rRNA-mRNA interactions.
We computed an empirical p-value for every position in the transcriptome of a given organism. To this end, the average rRNA-mRNA interaction strength at the position was compared to the average obtained in each of the randomized genomes. The p-value was computed based on the number of times the real genome average was higher or lower (depending on the hypothesis tested) than the null model average. A significant position is a position with a p-value smaller than 0.05.
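The empirical p-value can be sketched as follows. This is an illustration only: the function name and the `alternative` argument are hypothetical, and the counting convention (the fraction of randomized genomes whose average is at least as extreme as the real one, with no pseudocount) is an assumption, since the text does not specify it. Lower (more negative) energy is treated as a stronger interaction.

```python
def empirical_p_value(real_mean, null_means, alternative="stronger"):
    """Fraction of randomized genomes at least as extreme as the real genome.

    alternative="stronger": tests whether the real average energy is lower
    (stronger hybridization) than expected under the null model.
    alternative="weaker": tests the opposite direction.
    """
    if alternative == "stronger":
        hits = sum(1 for m in null_means if m <= real_mean)
    elif alternative == "weaker":
        hits = sum(1 for m in null_means if m >= real_mean)
    else:
        raise ValueError("alternative must be 'stronger' or 'weaker'")
    return hits / len(null_means)
```

With 100 randomizations, a position would pass the 0.05 cutoff under this convention only if fewer than 5 randomized genomes are at least as extreme as the real genome.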
Protein levels. E. coli endogenous protein abundance data were downloaded from PaxDB (pax-db.org/download); we used the “E. coli—whole organism, EmPAI” dataset published in 2012.
The rRNA-mRNA strength prediction. The definition of rRNA-mRNA interaction strength is based on the hybridization free energy between two sub-sequences. The first sequence is a 6 nt sequence from the mRNA and the second sequence is the aSD from the rRNA. The energy value was computed based on the Vienna package RNAcoFold, which computes a common secondary structure of two RNA molecules. The default RNAcoFold parameters were used, so that the same settings applied to all of the analyzed bacteria.
Lower and more negative free energy is related to stronger hybridization. We assumed that the interacting sub-sequence at the 16S rRNA 3′ end is TCCTCC (3′ to 5′). However, when we remove this assumption and infer it in an unsupervised manner, the results remain similar.
The rRNA-mRNA interaction strength profiles and selection strength. rRNA-mRNA interaction strength profiles are based on the predicted rRNA-mRNA hybridization strength for each position, in each transcript (UTRs and coding regions), and in each bacterium. We report the average profile of each bacterium.
The Vienna program RNAcoFold (see definition in the section above) was employed to calculate the free energy related to rRNA-mRNA hybridization strength (i.e. the energy which is released when two sequences “bind”). We calculated the interaction strength between all 6 nucleotide sub-sequences that begin in a specific position in the transcript (UTRs and coding sequence) and the 16S ribosomal RNA aSD. By calculating the interaction between the aSD and all possible 6 nt sub-sequences along the mRNA, we obtained the hybridization strength (interaction strength) profile at a resolution of single nucleotides. In order to decide whether a position (across the entire transcriptome) tends to include sub-sequences with a certain rRNA-mRNA interaction strength (strong, intermediate or weak), we compared it to the properties of sub-sequences observed in a null model in the same position (see further details regarding the null model below).
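The sliding-window construction of the profile can be sketched in Python as below. The energy function here is a deliberately crude placeholder (it only counts Watson-Crick pairs against the assumed aSD and returns the negated count); it is not RNAcoFold and does not produce free energies. In practice a co-folding energy routine would be passed in through the `energy` argument. All names are hypothetical.

```python
# aSD written 3'->5' in the DNA alphabet, as assumed in the text.
ASD_3_TO_5 = "TCCTCC"
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def toy_energy(window, asd_3_to_5=ASD_3_TO_5):
    """Placeholder score: minus the number of Watson-Crick pairs (lower = stronger)."""
    return -float(sum((m, r) in PAIRS for m, r in zip(window, asd_3_to_5)))


def interaction_profile(transcript, energy=toy_energy, width=6):
    """Score every 6 nt window; entry i belongs to the window starting at position i."""
    seq = transcript.upper().replace("U", "T")
    return [energy(seq[i:i + width]) for i in range(len(seq) - width + 1)]
```

For example, `interaction_profile("TTAGGAGGTT")` has its minimum at index 2, where the window AGGAGG is fully complementary to the assumed aSD.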
The intermediate rRNA-mRNA interaction definition. In order to define intermediate interaction strength, we devised an unsupervised adaptive optimization model that defines intermediate interaction strength thresholds. Our goal function in the algorithm was the number of significant positions for intermediate interactions. The algorithm selects thresholds (interaction strength values) and calculates significant positions for intermediate interactions compared to the null model. At each iteration, the thresholds are chosen greedily to improve the number of significant intermediate positions (as compared to the null model). This procedure was also computed for the null model sequences to demonstrate selection.
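A greedy threshold search of this general kind can be sketched as follows. The move set (shifting one threshold at a time by a fixed step), the step size and the stopping rule are assumptions made for illustration; the text does not specify them. The `objective` callable is hypothetical and stands for a routine that returns the number of significant intermediate positions for a given pair of thresholds.

```python
def greedy_thresholds(objective, low, high, step=0.1, max_iter=1000):
    """Greedily adjust (low, high) to increase objective(low, high).

    At each iteration one threshold is moved by +/- step; the best move is
    accepted only if it strictly improves the objective.
    """
    best = objective(low, high)
    for _ in range(max_iter):
        candidates = [(low - step, high), (low + step, high),
                      (low, high - step), (low, high + step)]
        scored = [(objective(l, h), l, h) for l, h in candidates if l < h]
        if not scored:
            break
        top_score, top_low, top_high = max(scored)
        if top_score <= best:
            break  # no single move improves the objective
        best, low, high = top_score, top_low, top_high
    return low, high, best
```

Running the same search on the randomized sequences, as the text describes, would give the reference against which the thresholds found for the real genome can be compared.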
The first iteration thresholds were selected as follows; we created a distribution histogram of interaction strength in the region with the strong canonical SD interaction in the 5′UTR of each bacterium (positions −8 through −17,
To study the properties of the selected thresholds, we created the interaction strength histograms for two regions in the 5′UTR (
Next, we looked at the positions of the two inferred thresholds in comparison to these two histograms; as can be seen in
To further quantitatively validate the inferred thresholds, we calculated the area under the two histograms mentioned above induced by the two inferred thresholds. The ratio between these two areas (the first one divided by the second one) was computed: A ratio larger than one suggests that it is more probable that the inferred thresholds are related to (intermediate) interactions between the rRNA and mRNA than to lack of interactions; indeed, in most bacteria (503/551) the ratio was larger than one (
Relation between the number of intermediate rRNA-mRNA interactions in the coding regions and heterologous protein levels. We aimed at showing that intermediate sequences in the coding region of a gene directly improve its translation initiation efficiency, and thus its protein levels. Hence, we calculated the partial Spearman correlations between the number of intermediate interaction sequences in the GFP variant and the heterologous protein levels (PA), based on 146 synonymous GFP variants that were expressed from the same promoter and the same UTR.
The control variables were the CAI and folding energy (FE) near the start codon. We defined an area of intermediate interactions according to the thresholds obtained by our model in E. coli and expanded it by 20% to allow maximum intermediate interactions in this synthetic system (which is expected to differ from endogenous genes). The correlation was indeed positive and significant (r=0.35; P=2·10⁻⁵), suggesting that variants with more sub-sequences in the coding region that bind to the rRNA with an intermediate interaction strength tend to have higher PA.
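A partial Spearman correlation of this kind can be sketched with the standard library alone: the variables are rank-transformed (average ranks for ties) and the control variables are then removed with the recursive partial-correlation formula. The function names are hypothetical and the sketch does not compute a p-value.

```python
import math


def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, controls):
    """Correlation of x and y after removing the controls one at a time."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    rxy = partial_corr(x, y, rest)
    rxz = partial_corr(x, z, rest)
    ryz = partial_corr(y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_spearman(x, y, controls):
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in controls])
```

In the setting above, `x` would hold the number of intermediate-interaction sub-sequences per GFP variant, `y` the measured protein levels, and `controls` the CAI and folding-energy values.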
Ribosome Profiling. E. coli ribosome footprint reads were obtained from (SRR2340141,3-4). E. coli transcript sequences were obtained from NCBI (NC_000913.3). Sequenced reads were mapped as described in Diament, A. & Tuller, T. Estimation of ribosome profiling performance and reproducibility at various levels of resolution. Biol. Direct 11, 24 (2016), herein incorporated by reference in its entirety, with the following minor modifications. We trimmed 3′ adaptors from the reads using Cutadapt (version 1.17, described in Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. journal 17, 10-12 (2011), herein incorporated by reference in its entirety), and utilized Bowtie (version 1.2.1, described in Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009), herein incorporated by reference in its entirety) to map them to the E. coli transcriptome. In the first phase, we discarded reads that mapped to rRNA and tRNA sequences with Bowtie parameters ‘-n 2 --seedlen 21 -k 1 --norc’. In the second phase, we mapped the remaining reads to the transcriptome with Bowtie parameters ‘-v 2 -a --strata --best --norc -m 200’. We filtered out reads longer than 30 nt and shorter than 23 nt. Unique alignments were first assigned to the ribosome occupancy profiles. For multiple alignments, the best alignments in terms of number of mismatches were kept. Then, multiple aligned reads were distributed between locations according to the distribution of unique ribosomal reads in the respective surrounding regions. To this end, a 100 nt window was used to compute the read count density $\mathrm{RCD}_i$ (total read counts in the window divided by length, based on unique reads) in the vicinity of each of the M multiple aligned positions in the transcriptome, and the fraction of a read assigned to each position was $\mathrm{RCD}_i / \sum_{j=1}^{M} \mathrm{RCD}_j$.
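The read-length filter and the allocation of multiply aligned reads can be sketched in Python as follows. The function names are hypothetical. Centering the 100 nt window on the candidate position, and splitting a read evenly when no unique reads are nearby, are assumptions not stated in the text.

```python
def keep_read_length(length, shortest=23, longest=30):
    """Reads shorter than 23 nt or longer than 30 nt are filtered out."""
    return shortest <= length <= longest


def read_count_density(unique_counts, position, window=100):
    """Unique read counts in a window around `position`, divided by window length.

    Assumption: the window is centered on the position and clipped at the ends.
    """
    half = window // 2
    start = max(0, position - half)
    end = min(len(unique_counts), position + half)
    span = end - start
    return sum(unique_counts[start:end]) / span if span else 0.0


def allocate_multimapped_read(densities):
    """Fraction of one read assigned to each of its M aligned positions."""
    total = sum(densities)
    if total == 0:
        # Assumption: with no unique reads nearby, split the read evenly.
        return [1.0 / len(densities)] * len(densities)
    return [d / total for d in densities]
```

For instance, a read aligned to two positions with densities 3.0 and 1.0 would contribute 0.75 and 0.25 of a read to them, respectively.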
The location of the A-site was set for each read length by the peak of read distribution upstream of the translational termination site for that length.
After creating the ribosome profiling distributions, for each gene, we calculated the number of positions with strong rRNA-mRNA interaction in the last 20 nucleotides of the coding region (the location of the reported signal,
Z-score calculation in highly and lowly expressed genes. To validate the reported signals, we performed all of our analyses on highly and lowly expressed genes of E. coli. We chose the highly and lowly expressed genes according to their PA (20% highest and lowest PA values), and computed Z-scores as explained in the next sub-sections.
Highly Vs. Lowly: Selection for Strong rRNA-mRNA Interactions at the 5′UTR End and at the Beginning of the Coding Region
We calculated the Z score based on the rRNA-mRNA interaction strength in all possible positions in the 5′UTR and coding region in the highly and lowly expressed genes.
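One common way to compute such a Z-score is to standardize the observed value against the randomized genomes. The sketch below assumes that definition (observed value minus the null mean, divided by the sample standard deviation of the null values); the exact formula is not given in this passage, and the function name is hypothetical.

```python
import statistics


def z_score(real_value, null_values):
    """Standardize the real interaction strength against the null model values."""
    mu = statistics.mean(null_values)
    sigma = statistics.stdev(null_values)
    if sigma == 0:
        return 0.0  # assumption: no spread in the null model means no signal
    return (real_value - mu) / sigma
```

Because lower energy corresponds to stronger hybridization, a strongly negative Z-score under this convention indicates an interaction that is stronger than expected under the null model.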
The results of the Z-score analysis can be seen in
From a statistical point of view, we defined each gene by two values according to the reported signal: 1) Minimum Z-score value in position −8 through −17 in the 5′UTR. 2) Minimum Z-score value in position 1 through 5 at the beginning of the coding region. The regions were selected according to the reported signal in
We performed two Wilcoxon rank sum tests to estimate the p-values for the two reported signals in highly vs. lowly expressed genes.
Highly Vs. Lowly: Selection Against Strong rRNA-mRNA Interactions at the Beginning of the Coding Sequence
We calculated the Z-score (as described above) based on the rRNA-mRNA interaction strength of each position in the first 400 nt of the coding region in the highly and lowly expressed genes.
The results of the Z-score analysis can be seen in
Highly Vs. Lowly: Z-Score Calculation of Selection for Strong mRNA-rRNA Interactions at the End of the Coding Sequence
In this case, we calculated the Z score (as described above) based on the rRNA-mRNA interaction strength of each position in the last 20 nt of the coding region in each bacterium.
For each bacterium, we found the position with a minimum Z-score value (strongest interaction compared to the null model). We created a histogram of the positions of strongest z-scores in the last 20 nt of the coding region distribution (
Selection against strong interaction in the coding region in positions that are not upstream to a close AUG codon. To detect a signal of selection for/against strong interactions in the coding region after excluding positions that are upstream of a nearby start codon, we performed the following analysis. We considered the E. coli genome (both the real and the randomized versions) and in each gene we “marked” positions that are up to 14 positions upstream of an AUG (in all frames). We then computed the p-value related to selection for strong rRNA-mRNA interactions (as described above), considering only the non-marked positions (in both the real and the randomized genomes). The result can be seen in
Read-through experiment to evaluate the effect of rRNA-mRNA interaction at the end of the coding region. To investigate the selection for strong rRNA-mRNA interaction at the end of the coding region (alignment to the STOP codon) we used a construct of RFP linked to a GFP (
To investigate the selection for strong rRNA-mRNA interaction at the end of the coding region (alignment to the stop codon) we used a construct of RFP linked to a GFP (
Unified biophysical translation model of the reported signals. We developed a computational simulative model of translation that includes the pre-initiation, initiation and elongation phases. Our model is based on a mean field approximation of the TASEP model. All of the model parameters are based on rRNA-mRNA interaction strength.
The model consists of two types of ‘particles’: 1. Small sub-units of the ribosome (pre-initiation): in this case, detachment/attachment and bidirectional movement of the particles are possible along the entire transcript. 2. Ribosome (elongation): the movement is unidirectional (from the 5′ end to the 3′ end of the mRNA) and possible only in the coding region; the initiation rate is affected by the density of the small sub-units of the ribosome at the ribosomal binding site (RBS).
Unified Biophysical Translation Model of the Reported Signals.
To validate that intermediate sequences in the coding region can improve the translation process by improving the pre-initiation diffusion of the small subunit to the initiation site and thus enhance the initiation phase of translation, we constructed a computational model of translation that includes the pre-initiation/initiation and elongation phases. Our model is based on a mean field approximation of the TASEP model.
All of the model parameters are based on rRNA-mRNA interaction strength. The model consists of two types of ‘particles’: 1. Small sub-units of the ribosome (pre-initiation): their movement is possible through all of the transcript. 2. Ribosome (elongation): the movement is possible only in the coding region.
The model equations: Small sub-unit basic model. In this model there are several parameters that describe the movement of the small sub-unit in each site of the transcript. The small sub-unit can attach to the relevant site in the mRNA at a certain rate (depends on the rRNA-mRNA interaction value at that site). The small sub-unit can detach from a site at a certain rate (depends on the complementary interaction to the rRNA-mRNA interaction).
$$\mathrm{Attachment}(i) = c_1 \cdot \mathrm{Attachment}_n(i) \tag{3}$$
$$\mathrm{Detachment}(i) = c_1 \cdot \mathrm{Detachment}_n(i) \tag{4}$$
The movement forward of the small sub-unit to the next site depends on the detachment rate from the current site and the attachment rate of the next site.
Flow from cell i to cell i+1
$$\mathrm{Forward}(i) = c_2 + \mathrm{Detachment}(i) \cdot \mathrm{Attachment}(i+1) \tag{5}$$
The movement backwards of the small sub-unit to the previous site depends on the detachment rate from the current site and the attachment rate of the previous site.
Flow from cell i+1 to cell i
$$\mathrm{Backward}(i) = c_2 + \mathrm{Detachment}(i+1) \cdot \mathrm{Attachment}(i) \tag{6}$$
The start and end terms of the equations depend on the attachment or detachment of the first/last site.
“Initiation” of the small sub-unit into the first site:
$$\mathrm{Forward}(0) = c_2 + \mathrm{Attachment}(1)$$
$$\mathrm{Backward}(0) = c_2 + \mathrm{Detachment}(1)$$
“Termination” of the small sub-unit from the last site:
$$\mathrm{Forward}(\mathrm{end}) = c_2 + \mathrm{Detachment}(\mathrm{end})$$
$$\mathrm{Backward}(\mathrm{end}) = c_2 + \mathrm{Attachment}(\mathrm{end})$$
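The rate definitions of the basic model can be collected in a short Python sketch. The function name and the returned dictionary keys are hypothetical; `attach_n` and `detach_n` stand for the per-site values that are scaled by c1, and lists are 0-based, so list index 0 corresponds to site 1 in the equations.

```python
def basic_model_rates(attach_n, detach_n, c1, c2):
    """Compute the attachment, detachment, forward and backward rates.

    forward[i]  : movement from list index i to i+1 (Forward of that site).
    backward[i] : movement from list index i+1 to i (Backward of that site).
    """
    attach = [c1 * a for a in attach_n]
    detach = [c1 * d for d in detach_n]
    n = len(attach)
    forward = [c2 + detach[i] * attach[i + 1] for i in range(n - 1)]
    backward = [c2 + detach[i + 1] * attach[i] for i in range(n - 1)]
    boundary = {
        "forward_0": c2 + attach[0],      # "initiation" into the first site
        "backward_0": c2 + detach[0],
        "forward_end": c2 + detach[-1],   # "termination" from the last site
        "backward_end": c2 + attach[-1],
    }
    return attach, detach, forward, backward, boundary
```

For example, with `attach_n=[1, 2, 3]`, `detach_n=[3, 2, 1]`, `c1=0.5` and `c2=0.1`, both forward rates evaluate to 1.6 and both backward rates to 0.6.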
The following is an example of the simple model equations, which are based on the RFM. The density of ribosomes at site i depends on the flow into the site (from the previous site and from the next site), on the flow out of site i (to the previous site and to the next site), and on the attachment and detachment rates of site i.
For example, for i=2:
$$\dot{x}_2 = \mathrm{Flow}(1,2)\,x_1(1-x_2) - \mathrm{Flow}(2,1)\,x_2(1-x_1) + \mathrm{Flow}(3,2)\,x_3(1-x_2) - \mathrm{Flow}(2,3)\,x_2(1-x_3) + \mathrm{Attachment}(2)(1-x_2) - \mathrm{Detachment}(2)\,x_2$$
Small sub-unit k-sites model. To fully grasp the intermediate interaction effect we extended the small sub-unit model in a way that the i'th site is affected by k sites before it and k sites after it.
Attachment, Detachment equations are the same as in the basic model.
The movement between sites of the small sub-unit depends on the detachment rate from the i'th site and the attachment rate of the k'th site.
Flow from Cell i to Cell k:
$$\mathrm{Flow}(i,k) = c_2 + \mathrm{Detachment}(i) \cdot \mathrm{Attachment}(k)$$
FlowF—Flow forward to the first site (initiation)
FlowB—Flow backward from the first site (initiation)
The Model Equations from an mRNA in the Length of n Sites:
a. Initiation:
$$\dot{x}_1 = \mathrm{FlowF}\,(1-x_1) + \mathrm{Attachment}(1)(1-x_1) - \mathrm{Flow}(1,2)\,x_1(1-x_2) - \mathrm{FlowB}\,x_1 - \mathrm{Detachment}(1)\,x_1 + \sum_{j=2}^{k+1}\left[\mathrm{Flow}(j,1)\,x_j(1-x_1) - \mathrm{Flow}(1,j)\,x_1(1-x_j)\right]$$
b. Elongation (k<i<n−k):
In this case there are k sites before the i'th site and k sites after the i'th site.
Therefore, we sum the contributions of all k sites on both sides of site i to calculate the density of site i.
$$\dot{x}_i = \left[\sum_{j=i-k}^{i-1}\left(\mathrm{Flow}(j,i)\,x_j(1-x_i) - \mathrm{Flow}(i,j)\,x_i(1-x_j)\right) + \sum_{m=i+1}^{i+k}\left(\mathrm{Flow}(m,i)\,x_m(1-x_i) - \mathrm{Flow}(i,m)\,x_i(1-x_m)\right)\right] + \mathrm{Attachment}(i)(1-x_i) - \mathrm{Detachment}(i)\,x_i$$
c. Elongation (i<=k):
In this case there are fewer than k sites before the i'th site and k sites after the i'th site.
Therefore, we sum the contributions of all k sites after the i'th site and of the k′ sites before the i'th site (k′<k, the maximum number of possible sites before the i'th site) to calculate the density of site i.
$$\dot{x}_i = \left[\sum_{j=1}^{i-1}\left(\mathrm{Flow}(j,i)\,x_j(1-x_i) - \mathrm{Flow}(i,j)\,x_i(1-x_j)\right) + \sum_{m=i+1}^{i+k}\left(\mathrm{Flow}(m,i)\,x_m(1-x_i) - \mathrm{Flow}(i,m)\,x_i(1-x_m)\right)\right] + \mathrm{Attachment}(i)(1-x_i) - \mathrm{Detachment}(i)\,x_i$$
d. Elongation (i>=n−k):
In this case there are k sites before the i'th site and fewer than k sites after the i'th site.
Therefore, we sum the contributions of all k sites before the i'th site and of the k′ sites after the i'th site (k′<k, the maximum number of possible sites after the i'th site) to calculate the density of site i.
$$\dot{x}_i = \left[\sum_{j=i-k}^{i-1}\left(\mathrm{Flow}(j,i)\,x_j(1-x_i) - \mathrm{Flow}(i,j)\,x_i(1-x_j)\right) + \sum_{m=i+1}^{n}\left(\mathrm{Flow}(m,i)\,x_m(1-x_i) - \mathrm{Flow}(i,m)\,x_i(1-x_m)\right)\right] + \mathrm{Attachment}(i)(1-x_i) - \mathrm{Detachment}(i)\,x_i$$
e. Termination:
$$\dot{x}_n = \mathrm{Flow}(n+1,n)(1-x_n) + \mathrm{Attachment}(n)(1-x_n) - \mathrm{Flow}(n,n+1)\,x_n - \mathrm{Detachment}(n)\,x_n + \sum_{j=n-k}^{n-1}\left[\mathrm{Flow}(j,n)\,x_j(1-x_n) - \mathrm{Flow}(n,j)\,x_n(1-x_j)\right]$$
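The five cases of the k-sites model share one structure: every site exchanges density with all sites within k positions of it, clipped at the ends of the transcript. The sketch below exploits that. It is a simplified illustration under stated assumptions: the names `k_sites_rhs` and `euler_integrate` and the boundary parameters (`flow_f`, `flow_b`, `exit_in`, `exit_out`) are hypothetical; the window sum is applied uniformly, so the separate Flow(1,2) term of the initiation equation is not added a second time; and forward Euler with clamping to [0, 1] is used only as a simple stand-in integrator, since the solver is not specified here.

```python
def k_sites_rhs(x, attachment, detachment, k, c2, flow_f, flow_b, exit_out, exit_in):
    """Right-hand side of the k-sites small sub-unit model (0-based sites)."""
    n = len(x)

    def flow(a, b):
        # Flow(a, b) = c2 + Detachment(a) * Attachment(b)
        return c2 + detachment[a] * attachment[b]

    dx = []
    for i in range(n):
        # Direct attachment to and detachment from site i.
        total = attachment[i] * (1 - x[i]) - detachment[i] * x[i]
        # Exchange with up to k sites on each side, clipped at the ends.
        for j in range(max(0, i - k), min(n, i + k + 1)):
            if j != i:
                total += flow(j, i) * x[j] * (1 - x[i]) - flow(i, j) * x[i] * (1 - x[j])
        if i == 0:        # "initiation" boundary (FlowF, FlowB)
            total += flow_f * (1 - x[i]) - flow_b * x[i]
        if i == n - 1:    # "termination" boundary (Flow(n+1,n), Flow(n,n+1))
            total += exit_in * (1 - x[i]) - exit_out * x[i]
        dx.append(total)
    return dx


def euler_integrate(x0, rhs, dt, steps):
    """Forward Euler integration; densities are clamped to [0, 1] as a safeguard."""
    x = list(x0)
    for _ in range(steps):
        dx = rhs(x)
        x = [min(1.0, max(0.0, xi + dt * di)) for xi, di in zip(x, dx)]
    return x


# Hypothetical five-site transcript with k = 2.
att = [0.2, 0.1, 0.4, 0.3, 0.5]
det = [0.3, 0.2, 0.1, 0.4, 0.2]
densities = euler_integrate(
    [0.0] * 5,
    lambda s: k_sites_rhs(s, att, det, 2, 0.05, 0.25, 0.35, 0.25, 0.55),
    dt=0.01,
    steps=500,
)
```

Because the window |i−j|<=k is symmetric, every pairwise flow term appears once with each sign, so site-to-site movement conserves total density; only attachment, detachment and the two boundary terms change it.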
The model of ribosomal movement during elongation. To initiate the movement of the ribosome we calculate the initiation rate considering the density from the small sub-unit model in the SD location in the 5′ UTR.
The movement of the ribosome depends on the rRNA-mRNA interaction of the relevant site and the effect of other features such as adaptation to the tRNA pool (denoted as typical decoding rate, TDR) on the elongation at the site codon.
$$\text{initiation rate} = \operatorname{mean}\big(\text{density}(34{:}43)\big) \qquad (1)$$
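A small sketch of this averaging step follows. It assumes that the range 34:43 is 1-based and inclusive at both ends (the usual reading of colon range notation), which corresponds to the Python slice `[33:43]`; the function name `initiation_rate` and its arguments are hypothetical.

```python
def initiation_rate(density, start=34, end=43):
    """Mean small sub-unit density over positions start..end (1-based, inclusive)."""
    window = density[start - 1:end]   # shift to 0-based, end stays inclusive
    return sum(window) / len(window)


# Hypothetical density profile in which position p holds the value p.
rate = initiation_rate(list(range(1, 101)))
```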
Flow Model Results.
Parameters and model validation. To demonstrate our model, we created an artificial gene of 100 codons in which all sites are weak sites (rRNA-mRNA interaction=0). From this basic variant we generated 5 additional variants by introducing at nucleotide 33 a gradient of different rRNA-mRNA interaction strengths.
We simulated our complete model (the pre-initiation stage with k=20 and the elongation model) for all the variants. As can be seen, the signal is convex: initially, stronger interactions improve the translation rate, but when the interaction strength exceeds a certain threshold (−2.7<=intermediate<=−1.8) the translation rate decreases.
As can be seen (
Adding intermediate interactions along the transcript improves the translation process. To show that adding many intermediate interactions along the transcript (as we see in endogenous genes) improves the translation rate, we performed the following simulation: we started with a variant with one intermediate interaction close to the beginning of the coding sequence (3 nt after the start codon); we then gradually added intermediate interactions downstream of the start codon to improve the translation rate. Specifically, to make sure that the intermediate effect exists even for long genes, we simulated a longer sequence of 500 nucleotides, and each added intermediate sequence was placed downstream of the previous one in a position that improves translation.
The simulation results appear in
Selection Against Strong Interaction at the End of the Coding Region—Read-Through Experiment.
Plasmid construction. We used plasmid pRX80 and modified it by deleting the lac I repressor gene and the CAT selectable marker. The resulting plasmid contained the RFP and GFP genes in tandem, both expressed from a promoter with two consecutive lac operator domains. The plasmid also contains the pBR322 origin of replication and the Kanamycin resistance gene as a selectable marker. Because the two operator sequences caused instability at the promoter region, we replaced the promoter region with a lacUV promoter with only one operator sequence. The resulting plasmid, pRCK28, was then used for the generation of variants which differ in the last 40 nucleotides of the RFP ORF. The variants carry synonymous changes that both place ribosome binding site interactions in 3 energy ranges and alter the local folding energy (LFE) of the last 40 nucleotides of the RFP ORF. The variable sequences were synthesized as G-blocks and Gibson assembly was used to replace the relevant region of the pRCK28 plasmid, generating 9 variants as described in
Fluorescent Tests. Single colonies of each variant as well as of the original pRCK28 clone and of a negative control (an E. coli clone harboring a Kanamycin resistant plasmid of the same size as pRCK28 but without any fluorescent gene) were grown overnight in LB-Kanamycin. Cells were then diluted and 10,000 cells were inoculated into 110 µl defined medium (1×M9 salts, 1 mM thiamine hydrochloride, 2% glucose, 0.2% casamino acids, 2 mM MgSO4, 0.1 mM CaCl2) in 96 well plates. For each variant, 2 biological repeats with 4 technical repeats each were used. A fluorimeter (Spark-Tecan) was used to run growth and fluorescence kinetics. For growth, OD at 600 nm data were collected. For red fluorescence, excitation at 555 nm and emission at 584 nm were used. For green fluorescence, excitation at 485 nm and emission at 535 nm were used. Data were analyzed and normalized by subtracting the autofluorescence values of the negative control, and by calculating the fluorescence to growth intensity ratios.
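The normalization described above can be illustrated as follows. This is a sketch under an assumption about the order of operations: the negative-control fluorescence is subtracted first and the result is then divided by the OD600 reading at the same time point. The function name `normalized_fluorescence` and the sample values are hypothetical.

```python
def normalized_fluorescence(fluorescence, od600, control_fluorescence):
    """Background-subtracted fluorescence per unit of growth, per time point."""
    return [
        (f - c) / od
        for f, od, c in zip(fluorescence, od600, control_fluorescence)
    ]


# Hypothetical readings at two time points.
ratios = normalized_fluorescence([110.0, 220.0], [0.5, 1.0], [10.0, 20.0])
```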
Western blot analyses. Cells were grown overnight, 1 ml cultures were concentrated by centrifugation and lysed using the BioGold lysis buffer supplemented with lysozyme. Total protein lysates were resolved on Tris-glycine 4-15% acrylamide mini protein TGX stain free gels (BioRad). Proteins were transferred to nitrocellulose membranes using the trans-blot Turbo apparatus and transfer pack. Membranes were incubated in blocking buffer (TBS+1% casein) for 1 hr at room temperature. Anti GFP and/or anti RFP antibodies (Biolegend) were used at 1:5K, for 1 hr in blocking buffer, at room temperature to probe the GFP and RFP expression. Goat anti-mouse secondary antibody was then applied at 1:10K dilution. ECL was used to generate a binding signal.
To understand the interactions between the 16S rRNA and mRNAs across the bacterial kingdom, a high-resolution computational model to predict the strength of rRNA-mRNA interactions was developed, where low hybridization free energy indicates a stronger interaction (See Methods). This model was used to analyze the entire transcriptome of 823 bacterial species, investigating all possible positions across all transcripts (i.e. 2,896,245 transcripts). To detect patterns of evolutionary selection, the distribution of rRNA-mRNA interaction strength was compared in each position along the transcriptome of each genome to the one expected by a null model. The null model preserves the codon frequencies, amino acid content, and GC content in each transcript (see Methods).
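One way to build a randomized sequence that keeps the three stated properties fixed is to permute codons only among positions that encode the same amino acid. The sketch below illustrates that idea; it is an assumption about how such a null model could be realized, not the procedure of the Methods. The names `split_codons`, `translate` and `shuffle_synonymous` are hypothetical, the standard genetic code is assumed, and the input must be an uppercase DNA coding sequence without ambiguous bases.

```python
import random
from collections import defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    The protein, the codon counts and therefore the GC content of the
    transcript are unchanged; only the order of synonymous codons varies.
    """
    codons = split_codons(cds)
    positions = defaultdict(list)
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)


# Hypothetical short coding sequence.
randomized = shuffle_synonymous("ATGGCTGCAGCCGCGAAAAAGTAA", random.Random(0))
```

Repeating the shuffle many times and recomputing the interaction profile for each randomized sequence gives a per-position distribution against which the observed value can be compared.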
For each position along the transcriptome three statistical tests are performed to answer the following questions:
Reported herein are the observed tendencies of sub-sequences within different transcript regions to produce strong, intermediate, and weak interactions with the 16S rRNA.
First, we analyzed the 5′UTRs of 551 bacteria with an aSD (anti-Shine-Dalgarno) sequence in the rRNA. It was suggested that translation initiation in prokaryotes is initiated by hybridization of the 16S rRNA to the mRNA. The 16S rRNA binds to the 5′UTR near and upstream of the START codon4 as depicted in
A second signal of selection for strong rRNA-mRNA interactions appears in the last nucleotide of the 5′UTR and the first five nucleotides of the coding sequence (
It has been suggested that at the beginning of the coding region there are various features that slow down the early stages of translation elongation to improve organism fitness, e.g. via optimizing ribosomal allocation and chaperon recruitment (
A comparison of highly and lowly expressed genes in E. coli (
Ribo-seq analyses in E. coli have indicated that strong interactions between the 16S rRNA and the mRNA can lead to pauses during translation elongation, hindering translation (
Our analysis reveals evidence of significant selection against strong rRNA-mRNA interactions in the coding region (
We found evidence for selection against strong rRNA-mRNA interactions in the coding region throughout the bacteria phyla analyzed, except for in cyanobacteria and gram-positive bacteria which seem to exhibit selection for strong rRNA-mRNA interactions (
Again, a comparison between highly and lowly expressed genes in E. coli reveals that selection against nucleotide sequences leading to strong interactions in the coding region is stronger for highly expressed genes, which are under stronger selective pressure for more accurate and efficient translation (Wilcoxon rank-sum test $p=1.5\cdot10^{-30}$;
In addition, as can be seen in
In 82% of the analyzed bacterial species, in 50% of the positions at the last 20 nucleotides of the coding region, there is selection for strong rRNA-mRNA interactions (
Many genes in bacteria are transcribed as operons. Specifically, in E. coli, 55% of the genes are grouped in operons. In operons, the downstream gene has a start codon near the stop codon of the upstream gene, which can affect the selection for strong interaction at the end of the coding region. Therefore, we further validated this signal by looking at operons, and especially at genes at the beginning/middle/end of an operon. As can be seen in
It has previously been found that when the rRNA binds to the mRNA the ribosome is generally decoding a codon located approximately 11 nt downstream of the binding site. To validate this, we inferred the positions with selection for the strongest interactions and identified those with minimum rRNA-mRNA interaction Z-scores within the last 20 nt of the coding region, in most of the analyzed bacteria (See Methods). We discovered that the strongest and most significant positions across all bacteria are indeed −9 through −12 relative to the STOP codon (
We examined the relationship between the strength of selection for strong interaction in the last 20 nt of coding regions with different levels of gene expression and found it to be convex: such selection is stronger for genes with intermediate expression and weaker for both lowly- and highly-expressed genes (
To test if strong rRNA-mRNA interactions just prior to the stop codon improve termination fidelity, we analyzed Ribo-seq data of E. coli (
To further test experimentally our hypothesis that strong rRNA-mRNA interactions just prior to the stop codon prevent stop-codon read-through, we used a construct whose mRNA carries a gene coding for red fluorescent protein (RFP) linked to a gene coding for green fluorescent protein (GFP;
The previous sections presented evidence for selection against strong interactions between the rRNA and mRNA throughout most of the coding region, but this doesn't mean that all interactions throughout this region are deleterious: other forces may act in differing directions. Prior to binding with mRNA, free ribosomal units travel by diffusion. Some interaction with the mRNA may assist to ‘guide’ the diffusing small subunit of the ribosome to remain near the transcript and ‘help’ them find the start codon, increasing their diffusion efficiency and consequently overall translation initiation efficiency (
Initiation is often the rate limiting stage of translation, and its most limiting aspect is probably the 3-dimensional diffusion of the small sub-unit to the SD region. One-dimensional diffusion (i.e. along the mRNA) may be faster: if mRNAs can ‘catch’ small ribosomal sub-units and then direct them to their start codons, they may be favored by evolution. The large amount of redundancy in the genetic code allows for mutations that may improve interactions between the rRNA and mRNA even in the coding region, without negatively affecting protein products; however, as we have seen, strong interactions in the coding region are problematic. Based on these considerations, we hypothesized that evolution shapes coding regions to include intermediate rRNA-mRNA interactions, which are not strong enough to halt elongation, but can optimize pre-initiation diffusion.
To test this hypothesis, we created an unsupervised optimization model to identify sequences with intermediate rRNA-mRNA interactions by adaptively calculating rRNA-mRNA interaction-strength thresholds for each bacterium. The algorithm selects rRNA-mRNA interaction strength thresholds such that they delineate the maximum number of significant positions with rRNA-mRNA interactions between these thresholds (see Methods).
To verify that the thresholds are reasonable, we looked at the highest (per gene) rRNA-mRNA interaction strength distribution in the 5′UTR in two regions: 1) The canonical rRNA-mRNA interaction region during initiation (i.e. nucleotides −8 through −17 upstream to the start codon). 2) The region in the 5′UTR which is upstream to 1). We then defined each gene by two values: a. Minimum interaction strength (i.e. strongest interaction) from region 1) distribution. b. Minimum interaction strength from region 2) distribution. For each bacterium, we created distribution plots based on values a. and b. over its genes.
Our analyses revealed that in 52% of the analyzed bacteria at least 50% of the positions are under significant selection for intermediate rRNA-mRNA interactions: according to the null model this would be expected to be the case for only 0.18% (
When looking at the intermediate selection signal, we see that it can be observed in 52% of the analyzed bacteria. The groups of bacteria that exhibit this signal are: 47% of the Betaproteobacteria, 49% of the Cyanobacteria, 94% of the Delta bacteria, 43% of the Gamma bacteria, 83% of the Gram positive bacteria, 28% of the Purple bacteria, 100% of the Spirochete bacteria, and 26% of the Alpha bacteria and E. coli.
Selection for intermediate interactions in the coding region and 3′UTR can be seen in
Our null model preserves the protein itself, the codon bias and the GC content. Therefore, the observed selection cannot be favoring specific codons or amino acids. In addition, our rRNA-mRNA interaction profiles consider all three reading frames; hence, the amino acids are not the key factor that influences this signal. Furthermore, the fact that we see a similar pattern of selection in the UTRs (
We hypothesize that selection for intermediate rRNA-mRNA interactions in the coding region of a gene should improve its translation initiation efficiency and thus its protein levels. To demonstrate this, we calculated the partial Spearman correlations between the number of intermediate interaction sequences in the GFP variant (see previous Example) and the heterologous protein abundance (PA), based on 146 synonymous GFP variants that were expressed from the same promoter. The control variables were the codon adaptation index (CAI); a measure of codon usage bias, and mRNA folding energy (FE) near the start codon, known to affect translation initiation efficiency (the weaker the folding in the vicinity of the start codon the higher the fidelity and efficiency of translation initiation).
We defined an area of intermediate interactions according to the thresholds determined by our model in E. coli and calculated the correlation explained above. As expected, the correlation was positive and significant (r=0.35; $P=0.2\cdot10^{-4}$), indicating that variants with more sub-sequences in the coding region that bind to the rRNA with an intermediate interaction strength tend to have higher PA.
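A partial Spearman correlation of this kind can be computed by rank-transforming every variable and then applying the standard recursive partial-correlation formula to the ranks. The sketch below shows this with the Python standard library only; the function names are hypothetical, the sample values are invented for illustration, and no significance test is included.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average
        i = j + 1
    return result


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def partial_corr(x, y, controls):
    """Correlation of x and y after removing the controls, one at a time."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def partial_spearman(x, y, controls):
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in controls])


# Invented values: counts of intermediate sub-sequences, PA, and one control.
counts = [1, 2, 3, 4, 5, 6]
abundance = [2, 1, 4, 3, 6, 5]
cai = [1, 3, 2, 5, 4, 6]
r = partial_spearman(counts, abundance, [cai])
```

In the setting described above, the control list would hold both CAI and the folding energy near the start codon.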
We found that this correlation is specifically very high (r=0.61; p=0.003) when the FE near the start codon is the strongest (
When calculating the partial Spearman correlation between the number of sub-sequences that interact in a weak manner with the rRNA and the PA of the GFP variants, the correlation is negative and significant (r=−0.32; $p=8.5\cdot10^{-5}$). This further validates our conjecture that translation efficiency in this case is indeed related to interactions that are neither very strong, nor very weak or absent. It also suggests that this effect on translation efficiency is related to the pre-initiation step and not the elongation step; otherwise we would expect a positive correlation with weak interactions.
To validate the GFP correlation of intermediate interactions in an ‘unsupervised’ manner, we calculated the hybridization energy of all 6 nt sequences in the GFP variant and divided the sequences' hybridization energies into five groups. Afterwards, we calculated the Spearman correlation between the number of sequences in a specific group of hybridization energy values and the PA of the GFP variants. As can be seen in
We also analyzed E. coli genes by their mRNA half-life to assess how selection for intermediate interactions varies among them. We found that genes with shorter half-lives tend to have more intermediate interactions. It is possible that these genes undergo stronger selection to include intermediate interactions since their corresponding mRNAs ‘have less time’ to initiate translation. Thus, the reported results discussed here suggest that the diffusion of the small ribosomal sub-unit is relatively fast.
To enhance our knowledge of the effect of intermediate interactions, we divided E. coli genes according to their mRNA half-life. For the top and bottom 20% we calculated the percentage of genes that have an intermediate interaction at each position in the coding region. From this analysis we discovered that genes with shorter mRNA half-life tend to have more intermediate interactions (Wilcoxon test $P=2.060\cdot10^{-6}$). This result may be related to the fact that those mRNAs have ‘less time’ to ‘catch’ ribosomes before they are degraded. Moreover, mRNA molecules of various genes tend to localize in certain regions in the cell; this may suggest that ‘catching’ ribosomes by one of the mRNAs may improve their diffusion time to other nearby mRNAs once this specific mRNA has undergone degradation.
It is known that mRNAs tend to localize in certain regions in the cell, meaning that if we can keep the ribosome close to a certain mRNA we also keep it close to other mRNAs. If a certain mRNA ‘captures’ a ribosome and then undergoes degradation, this ribosome will likely remain close to other nearby mRNAs. It is also possible that due to compartmentalization and aggregation of many mRNA molecules the interaction with the small sub-unit of one mRNA can be ‘helpful’ for a nearby mRNA.
We further investigated the relation between the signals of selection for intermediate rRNA-mRNA interactions and doubling time. We divided the bacteria according to their doubling time and calculated the average number of intermediate significant positions in the coding region (
Finally, we created a computational biophysical model that describes the movement of the small ribosomal sub-unit along the transcript. In this model the movement is influenced by the intermediate interactions (
To verify and further investigate the reported signals, we analyzed bacteria that do not have the canonical aSD in their 16S rRNA. As expected, while analyzing such bacteria, most of our reported signals could not be found. The results of this sub-section reinforce our model, and conjecture of the importance of rRNA-mRNA interactions in all stages and sub-stages of translation.
We looked at selection for strong interactions at the 5′UTR. Due to the fact that the bacteria do not have the canonical aSD sequence in their 16S rRNA, there was no clear evidence of selection for strong rRNA-mRNA interactions in positions −8 through −17 in the 5′UTR (
As can be seen in
In bacteria with canonical aSD, at the end of the coding region, we detected a signal of selection for strong rRNA-mRNA interactions that enables stop codon recognition and prevents read-through. When we look at the bacteria with no canonical aSD (
The common assumption is that the SD and aSD sequences are usually the canonical ones. However, we believe that there may be organisms with different rRNA-mRNA interaction motifs. Thus, we developed an optimization model that finds the optimized SD and aSD sequences for a given bacterium in an unsupervised manner.
To find the optimal SD we devised the following algorithm (
For each such potential alternative “aSD”, and for each gene in the organism, we considered all the sub-sequences in position −8 through −17 in the 5′UTR, to find the sub-sequence with the strongest rRNA-mRNA interaction, with the potential to be an alternative “aSD”. These values were averaged across the genes, and the potential alternative “aSD” that yields the lowest average (related to strongest predicted averaged rRNA-mRNA interaction strength) is predicted to be an alternative “aSD” sequence.
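The selection step described above (per gene, take the strongest interaction within positions −17 through −8; average over genes; keep the candidate with the lowest average) can be sketched as follows. Several parts are assumptions made for illustration: the candidate list is supplied by the caller, windows have the same length as the candidate, RNA sequences use the letters A, C, G, U, and `toy_energy` is only a stand-in that counts Watson-Crick pairs, not the hybridization free-energy model of the Methods. All function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def toy_energy(asd, site):
    """Stand-in score: minus the number of Watson-Crick pairs (lower = stronger).

    Both sequences are written 5'->3' and paired antiparallel.
    """
    return -sum(1 for a, s in zip(asd, reversed(site)) if COMPLEMENT[a] == s)


def strongest_in_region(asd, utr5, energy, start=-17, end=-8):
    """Lowest energy over all windows inside positions start..end of the 5'UTR.

    utr5 ends at position -1, the nucleotide just before the start codon.
    """
    if len(utr5) < -start:
        raise ValueError("5'UTR is shorter than the analyzed region")
    region = utr5[len(utr5) + start:len(utr5) + end + 1]
    width = len(asd)
    return min(
        energy(asd, region[i:i + width])
        for i in range(len(region) - width + 1)
    )


def predict_asd(candidates, utrs, energy=toy_energy):
    """Candidate with the lowest gene-averaged strongest interaction."""
    def average(asd):
        return sum(strongest_in_region(asd, u, energy) for u in utrs) / len(utrs)
    return min(candidates, key=average)


# Hypothetical 5'UTRs carrying AGGAGG at positions -13 through -8.
utrs = ["AAAAAAAAAAAAGGAGGAAAAAAA", "CUCUCUCUCUCAGGAGGCUCUCUC"]
best = predict_asd(["CCUCCU", "GGGGGG"], utrs)
```

Passing the energy function as a parameter keeps the search logic separate from the interaction model, so the stand-in can be replaced by a thermodynamic calculation without changing the selection step.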
We executed the optimization model on 551 bacteria. As can be seen in
To validate the GFP correlation of intermediate interactions in an ‘unsupervised’ manner, we calculated the hybridization energy of all 6 nt sequences in the GFP variant and divided the sequences hybridization energy into five groups. Afterwards, we calculated the Spearman correlation between the number of sequences in a specific group of hybridization energy value and PA of the GFP variants. As can be seen in
This application is a continuation of PCT Patent Application No. PCT/IL2020/050367 having International filing date of Mar. 26, 2020, which claims the benefit of priority of U.S. Provisional Patent Application No. 62/825,143 filed Mar. 28, 2019, both titled “METHODS FOR MODIFYING TRANSLATION”, the contents of which are all incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
62825143 | Mar 2019 | US

 | Number | Date | Country
---|---|---|---
Parent | PCT/IL2020/050367 | Mar 2020 | US
Child | 17486936 | | US