1. Field of the Invention
The present invention relates generally to biochemical models of living organisms and more specifically to modeling of metabolism and macromolecular expression, and microbial systems biology.
2. Background Information
The genotype-phenotype relationship is fundamental to biology. Historically, and still for most phenotypic traits, this relationship is described through qualitative arguments based on observations or through statistical correlations. Studying the genotype-phenotype relationship demands an appreciation that the relationship is multi-scale, ranging from the molecular to the whole cell. Reductionist approaches to biology have produced ‘parts lists’, and successfully identified key concepts (e.g., central dogma) and specific chemical interactions and transformations (e.g., metabolic reactions) fundamental to life. However, reductionist viewpoints, by definition, do not provide a coherent understanding of whole cell functions. Cellular phenotypes have been programmed into the genome over millions of years based on governing selection pressures. Accordingly, organisms have evolved highly intricate coordinated responses to external signals; these responses include regulated changes in gene expression and enzymatic activity needed to execute the growth process.
An understanding of the biophysical (i.e. physical, spatial, chemical, genetic, thermodynamic, etc.) constraints, both natural and artificial, placed on cellular functions at the genome-scale combined with in silico optimization of cellular fitness allows for approximating phenotypes even in the absence of complete regulatory knowledge. Constraints bridge the gap between system architecture (the cellular reaction network) and system behavior (biological phenotypes), but their definition requires a deep theoretical understanding of interactions among cellular components (including emergent phenotypes). Constraint-based modeling allows one to make testable predictions about biological phenotypes from limited knowledge.
The purpose of modeling a cell is to provide predictions about what will happen when it gets perturbed, either through changes in the environment or genetically through evolution or targeted mutation (i.e. predict response to both natural and artificial perturbations). Escherichia coli (E. coli) is a workhorse for fundamental microbiological studies and various biotechnological applications. Predictive models for E. coli are therefore of great commercial and scientific value. Our earlier experience demonstrated that coupling multiple cellular processes into a single constraint-based model leads to an ability to predict emergent and multi-scale phenotypes.
A goal of systems biology is to provide comprehensive biochemical descriptions of organisms that are amenable to mathematical inquiry. The biochemical descriptions are knowledgebases that are assembled from various biological data sources, including but not limited to biochemical, genetic, genomic, and metabolic; these knowledgebases may then be converted to mathematical models. These models may then be used to investigate fundamental biological questions, guide industrial strain design and provide a systems perspective for analysis of the expanding ocean of “omics” data. Omics data are high-throughput surveys of the molecular components of an organism, including but not limited to mRNA, proteins, and metabolites. Over the past decade, there has been steady progress in developing and applying biochemically-accurate genome-scale models of metabolism (M-Models) for basic research and industrial applications.
M-Models have proved foundational to the development of the field of microbial metabolic systems biology. M-Models have enabled a variety of basic and applied studies. M-Models provide a solution space that contains all possible molecular phenotypes underlying a global phenotype. Because M-Models do not explicitly account for all cellular processes, such as the production of macromolecular machinery of the target cell the M-Model solution space contains a substantial number of biologically-implausible predictions in additional to biologically-plausible predictions. If the production and degradation of the macromolecular machinery is taken into account in chemically accurate terms then we can effectively provide a full genetic basis for every computed molecular phenotype and compare outcomes of computation directly to omics data. The cellular processes of transcription and translation are comprised of a series of elementary chemical transformations that can be reconstructed from available data for target organisms and making them amenable to constraint-based modeling.
The cellular processes of transcription and translation are a series of elementary chemical transformations that depend on metabolism for raw materials and energy, but they create the macromolecular machinery responsible for all cellular functions, including metabolism. A modeling approach that accounts for the production and degradation of a cell's macromolecular machinery in chemically accurate terms will effectively provide a full genetic basis for every computed molecular phenotype (
The present invention provides an integrated model of metabolic and macromolecular expression (ME-Model), and a method for reconstructing an ME-Model from biological data. Specifically, the present invention provides a ME-Model which uses a biochemical knowledgebase of an organism to accurately determine the metabolic and macromolecular phenotype of the organism under different conditions. Further, the present invention provides a method to determine the most efficient conditions for producing a product from an organism.
The present invention uses two model laboratory microbial organisms, Thermotoga maritima (T. maritima) and E. coli K-12 MG1655, as illustrative examples. T. maritima was chosen due to its small genome size, wide-availability of structural data, and the presence of an M-Model. E. coli was chosen due to the large amount of experimental data available, including, but not limited to, transcription unit architecture, omics data, an M-Model, and a model of gene expression (E-Model). The ME-Model for T. maritima was reconstructed by correcting and updating the available M-Model, reconstructing the processes underlying macromolecular expression, and then coupling the metabolic and macromolecular expression processes. The ME-Model for E. coli K-12 MG1655 was reconstructed by correcting and updating the extant M-Model and E-Model and then coupling the models. Next, constraints were imposed as balances and bounds on the activity and flow of biomolecules through this integrated network. To compute cellular phenotypes with the constrained model, a scalable optimization procedure was developed, which allowed for the prediction of multi-scale phenotypes underlying cellular phenotypes, such as growth control and product formation. This model computes the functional proteome that is required to execute the cellular phenotypes. It computes a variety of data types that are available and provides unity in the field microbial systems biology by reconciling a variety of theories and principles related to cellular growth at various scales of complexity.
In one embodiment, the present invention provides a method for generating a model for determining the metabolic and macromolecular phenotype of an organism. The method includes generating a biochemical knowledgebase of an organism including metabolic and macromolecular synthetic pathways; generating a computational model from the biochemical knowledgebase by applying at least one coupling constraint; using the model to determine the metabolic and macromolecular phenotype of the organism or organisms as a function of genetic and environmental parameters; and computing metabolic and macromolecular changes associated with a perturbation of the organism or the organism's environment, thereby generating a model. The computational model assimilates the metabolic and macromolecular changes caused by the perturbation and then determines the metabolic and macromolecular phenotype of the organism.
In one aspect of the invention, the biochemical knowledgebase includes information regarding the organisms genome, proteome, RNA, metabolic pathways and reactions, biochemical pathways and reactions, energy sources and uses, reaction by-products, protein complexes, reactions to post-translationally modify/functionalize protein complexes, macromolecular synthesis machinery, transcription units, lipid content, metalio-ions, amino acid content, covalent modifications, and non-covalent modifications, or any combination thereof. In another aspect, the knowledgebase includes calculation of a structural reaction using lipid content, metal ion content, energy requirements of the organism, dNTP requirements for production of the organism's genome, ribosome production and doubling time, or any combination thereof. The relative composition of the structural reaction is derived from empirical measurements.
In an additional aspect, the perturbation of the organism or its environment is a change in genetic or environmental parameters. In one aspect, the change in genetic or environmental parameters includes change in the composition of growth media, sugar source, carbon source, growth rate, ribosome production, antibiotic presence, oxygen level, efficiency of macromolecular machinery, subjection to a chemical compound, genetic alteration, forced overproduction of a network component, and inhibition or hyperactivity of at least one enzyme, or any combination thereof. In one aspect, the efficiency of macromolecular machinery includes, but is not limited to, transcription and translation rates, enzyme catalytic rates and transport rates, or any combination thereof. In an aspect, the inhibition or hyperactivity of an enzyme may be caused by an environmental change or genetic perturbation. Further, the environmental change may be the presence or absence of antibiotics and the genetic perturbation may be directed protein engineering of specific chemical residues leading to modulated catalytic efficiency. In another aspect, the inhibition or hyperactivity of an enzyme may be a decrease or increase to an efficiency parameter. In a further aspect, the change in genetic parameters is the addition of heterologous and/or synthetic genetic material.
In certain aspects, the perturbations are subsequently related to the endogenous regulatory network of an organism to determine regulators that may facilitate or interfere with the process of achieving a desired phenotype. In other aspects, the perturbations are related to the endogenous regulatory network to discover new regulatory capacities in the organism.
In a further aspect, the perturbation is at least one change in basic model parameters to characterize the robustness of predictions to changes in the model parameters and determine the most relevant parameters.
In an aspect, the metabolic and macromolecular changes include alterations in gene expression, protein expression, RNA expression, translation, transcription, pathway activation or inactivation, production of metabolic by-products, energy use, growth rate, proteome changes and transcriptome changes or any combination thereof. In specific aspects, metabolic by-products include acetate secretion and hydrogen production; the proteome changes include amino acid incorporation rate, protein production, macromolecular synthesis, ribosomal protein expression, expression of peptide chains, enzyme expression, enzyme activity, RNA to protein mass ratio, protein degradation, post translational protein modification, proteome fluxes, translation and protein expression profile or any combination thereof and the transcriptome changes include gene expression, transcription, functional RNA expression, transcriptome fluxes, transcription rate, gene expression profile, or any combination thereof.
In one aspect of the invention, the coupling constraints may be applied to system boundaries, maximal transcriptional rate for stable RNA and mRNA; relaxing of the requirement that all synthesized components need to be used within the network; mRNA dilution; mRNA degradation or complex dilution; hyperbolic ribosomal catalytic rate; ribosomal dilution rate; RNA polymerase dilution rate; hyperbolic mRNA rate; coupling of mRNA dilution, degradation and translation reactions; coupling of tRNA dilution and charging reactions; macromolecular synthesis machinery dilution rate; and metabolic enzyme dilution rate, or any combination thereof. System boundaries include, but are not limited, to the external environment, interfaces between cellular compartments, interfaces between multi-scale processes, and biophysical limits on the lifetime and efficiency for cellular machinery.
In specific non-limiting examples, the coupling constraint for mRNA dilution is VmRNA Dilution≧amax*VmRNA Degradation; wherein amax is TmRNA/Td; the coupling constraint for mRNA degradation is VmRNA Degradation≧bmax*VTranslation; wherein bmax=1/ktranslation*TmRNA; the coupling constraint for complex dilution is VComplex Dilution≧cmax*VComplex Usage; wherein cmax=1/kcat*Td; the coupling constraint for the hyperbolic ribosomal catalytic rate is
the coupling constraint of the ribosomal dilution rate is
the coupling constraint of the RNA polymerase dilution rate is
the coupling constraint of coupling of mRNA dilution, degradation and translation reactions is dilmRNA≧α1degmRNA and degmRNA≧α2trslmRNA, wherein
the coupling constraint of the hyperbolic mRNA rate is
the coupling constraint of the hyperbolic tRNA efficiency rate is
the coupling constraint of the coupling of tRNA dilution and charging reactions is diltRNA≧αchgtRNA, wherein
the coupling constraint of the macromolecular synthesis machinery dilution rate is
and the coupling constraint of the metabolic enzyme dilution rate is
(where, TmRNA is the measured, or assumed, half-life for the mRNA molecule; Td is the organism's doubling time; ktranslation is the rate of translation; kcat is the enzyme's turnover constant; and, VmRNA Dilution, VmRNA Degradation, VTranslation, VComplex Dilution, and VComplex usage are reaction fluxes whose values are determined during the simulation procedure; kribo is the effective ribosomal rate; cribosome is
ro is the value of the vertical intercept if growth rate and the RNA/protein ratio are plotted (growth on the x-axis and RNA/protein ratio on the y-axis); kτ is the inverse of the slope of the relationship when growth and the RNA/protein ratio are plotted as for determination of ro; μ is growth rate; kRNAP is RNA polymerase (RNAP) transcription rate; VRibosome Dilution is dilution of ribosome; VRNAP dilution is the dilution of RNAP; Vtranslation of peptide is the translation of peptide; Vtranscription of TUi is the transcription of TUi; length (peptide)i is the length of peptidei; length is the number of nucleotides in TUi; diltRNA is μ[tRNA] chgtRNA is
[tRNA] is
diltRNA is the dilution of mRNA; degmRNA is the degradation of mRNA; trslmRNA is translation of protein from mRNA; [mRNA] is mRNA concentration; kmRNA is the mRNA catalytic rate; cmRNA is
chgmRNA is the charging of tRNA; diltRNA is the dilution of tRNA; [tRNA] is the tRNA concentration; ktRNA is the tRNA catalytic rate; ctRNA is
Vmachineryi dilution is the flux of the reaction leading to dilution of machine i; Vmetabolic enzymei dilution is the flux of the reaction leading to dilution of metabolic enzyme i, Vuse of machineryi is the sum of all fluxes using machine i; Vuse of metabolic enzymei is the sum of all fluxes using metabolic enzyme i). The coupling constraint is applied to one or more system boundary conditions resulting in a change in environmental conditions for the organism. The change in environmental conditions includes carbon source, sugar source, nitrogen source, metal source, phosphate source, oxygen level, carbon dioxide level, change in growth media, and the presence of another organism (of the same or different species) or any combination thereof.
In a further aspect, the coupling constraint is a component's efficiency of use. The efficiency of use may be determined by relating the rate of use of a component by the integrated network to its rate of dilution or degradation. The component maybe the ribosome, RNA Polymerase, mRNA, tRNA, or metabolic enzymes. Additionally, the efficiency of use is may be determined using properties of the component including molecular weight, solvent-accessible surface area, number of catalytic sites, kinetic parameters of its catalytic and allosteric sites, and elemental composition or any combination thereof. Additionally, the efficiency of use maybe determined by using the macromolecular composition of the cell. In a further aspect, the mRNA constraint includes the ratio of mRNA dilution/mRNA degradation, the ratio of mRNA degradation/translation rate, and the ratio of mRNA dilution/translation rate, or any combination thereof. Further, the efficiency of use for the mRNA maybe determined using mRNA half-life data, proteomics and transcriptomics data, a ribosome flow model, and ribosome profiling, or any combination thereof.
In one aspect, the coupling constraints provide lower and/or upper bounds on flux ratios.
In one aspect, the organism is a microbial organism. In one aspect, the organism is genetically modified. In non-limiting examples, the organism includes Thermotoga maritima (T. maritima) and Escherichia coli (E. coli).
In an additional aspect, the generation of the model comprises high-precision arithmetic by an optimization solver. Further, the model predicts the organism's maximum growth rate (μ*) in the specified environment, substrate uptake/by-product secretion rates at μ*, biomass yield at μ*, central carbon metabolic fluxes at μ*, and gene product expression levels (both in terms of mRNA and protein) at μ* or any combination thereof.
In another embodiment, the invention provides a model for determining the metabolic and macromolecular phenotype of an organism. The model includes a data storage device which contains a biochemical knowledgebase of the organism; a user input device wherein the user inputs perturbation of the organism or the organism's environment information; a processor having the functionality to compare the biochemical knowledgebase and the perturbation information, then apply at least one coupling constraint thereto to determine the metabolic and macromolecular phenotype of the organism; a visualization display which displays the results of the determination; and an output which provides the metabolic and macromolecular phenotype of the organism. The perturbation information includes metabolic and macromolecular changes.
In one aspect, the biochemical knowledgebase includes information regarding the organism's genome, proteome, DNA, RNA, metabolic pathways and reactions, biochemical pathways and reactions, energy sources and uses, reaction by-products, protein complexes, macromolecular synthesis machinery, transcription units, lipid content, metalio-ions, amino acid content, covalent modifications, and non-covalent modifications, or any combination thereof. In another aspect, the biochemical knowledgebase includes calculation of a structural reaction using lipid content, metal ion content, energy requirements of the organism, ribosome production and doubling time, or any combination thereof.
In an aspect, the perturbation of the organism or its environment is a change in genetic or environmental parameters. In one aspect, the change in genetic or environmental parameters includes change in the composition of growth media, sugar source, carbon source, growth rate, ribosome production, antibiotic presence, oxygen level, efficiency of macromolecular machinery, subjection to a chemical compound, genetic alteration, forced overproduction of a network component, and inhibition or hyperactivity of at least one enzyme or any combination thereof. In one aspect, the efficiency of macromolecular machinery includes, but is not limited to transcription and translation rates, enzyme catalytic rates and transport rates, or any combination thereof. In an aspect, the inhibition or hyperactivity of an enzyme may be caused by an environmental change or genetic perturbation. Further, the environmental change may be the presence or absence of antibiotics and the genetic perturbation is directed protein engineering of specific chemical residues leading to modulated catalytic efficiency. In another aspect, the inhibition or hyperactivity of an enzyme is a decrease or increase to the efficiency parameter. In a further aspect, the change in genetic parameters is the addition of heterologous and/or synthetic genetic material.
In certain aspects, the perturbations are subsequently related to the endogenous regulatory network of the organism to determine regulators that may facilitate or interfere with the process of achieving a desired phenotype. In other aspects, the perturbations are related to the endogenous regulatory network to discover new regulatory capacities in the organism.
In an additional aspect, the metabolic and macromolecular changes include alterations in gene expression, protein expression, RNA expression, translation, transcription, pathway activation or inactivation, production of metabolic by-products, energy use, growth rate, proteome changes and transcriptome changes or any combination thereof. In specific aspects, the metabolic by-products include acetate secretion and hydrogen production; the proteome changes include amino acid incorporation rate, protein production, macromolecular synthesis, ribosomal protein expression, expression of peptide chains, enzyme expression, enzyme activity, RNA to protein mass ratio, protein degradation, post translational protein modification, proteome fluxes, translation and protein expression profile or any combination thereof; and the transcriptome changes include gene expression, transcription, functional RNA expression, transcriptome fluxes, transcription rate, gene expression profile or any combination thereof.
In a further aspect, the coupling constraints may be applied to system boundaries; maximal transcriptional rate for stable RNA and mRNA; relaxing of the requirement that all synthesized components need to be used within the network; mRNA dilution; mRNA degradation or complex dilution; hyperbolic ribosomal catalytic rate; ribosomal dilution rate; RNA polymerase dilution rate; hyperbolic mRNA rate; coupling of mRNA dilution, degradation and translation reactions; coupling of tRNA dilution and charging reactions; macromolecular synthesis machinery dilution rate; and metabolic enzyme dilution rate, or any combination thereof. System boundaries include, but are not limited to the external environment, interfaces between cellular compartments, interfaces between multi-scale processes, and biophysical limits on the lifetime and efficiency for cellular machinery.
The coupling constraint is applied to one or more system boundary conditions resulting in a change in environmental conditions for the organism. Additionally, the change in environmental includes carbon source, sugar source, nitrogen source, metal source, phosphate source, oxygen level, carbon dioxide level, change in growth media, and the presence of another organism (of the same or different species) or any combination thereof.
In a further aspect, the coupling constraint is a component's efficiency of use. The efficiency of use may be determined by relating the rate of use of a component by the integrated network to its rate of dilution or degradation. The component maybe the ribosome, RNA Polymerase, mRNA, tRNA, or metabolic enzymes. Additionally, the efficiency of use is may be determined using properties of the component including molecular weight, solvent-accessible surface area, number of catalytic sites, kinetic parameters of its catalytic and allosteric sites, and elemental composition or any combination thereof. The efficiency of use maybe determined by using the macromolecular composition of the cell. In a further aspect, the mRNA constraint includes the ratio of mRNA dilution/mRNA degradation, the ratio of mRNA degradation/translation rate, and the ratio of mRNA dilution/translation rate, or any combination thereof. Additionally, the efficiency of use for the mRNA maybe determined using mRNA half-life data, proteomics and transcriptomics data, a ribosome flow model, and ribosome profiling, or any combination thereof.
In one aspect, the coupling constraints provide lower and/or upper bounds on flux ratios.
In a further embodiment, the present invention provides a method to determine the metabolic and macromolecular phenotype of an organism. The subject method includes generating a biochemical knowledgebase of the organism; introducing a perturbation to the organism or the organism's environment; using the biochemical knowledgebase to determine the metabolic and macromolecular changes associated with the perturbation and applying at least one coupling constraint; and determining of the metabolic and macromolecular phenotype of the target organism.
In one embodiment, the present invention provides a model for performing a cost estimate analysis of producing a product in an organism. The model includes a data storage device which contains a biochemical knowledgebase of the organism, costs associated with producing the product and price of the product; a user input device wherein the user inputs parameters for producing the product; a processor having the functionality to compare the biochemical knowledgebase and the parameters to determine metabolic and macromolecular changes; apply at least one coupling constraint and perform cost benefit analysis thereto; a visualization display which displays the results of the analysis; and an output which provides the cost estimate analysis.
In a one aspect, the output is a graph or a chart depicting profitability estimate, estimates of key bioprocessing parameters such as feedstock consumption, reactor volume and production formation. In one aspect, the product is a naturally occurring or a recombinant protein. In another aspect, the product is a molecule, such as hydrogen or acetate.
a-b) show characteristics of M- and ME-Models objective functions and assumptions.
a-b) show how perturbing ME-Model parameters can aid the development of hypotheses to explain discrepancies between the ME-Model and experimental data.
The present invention provides an integrated model of metabolic and macromolecular expression (ME-Model), and a method for reconstructing an ME-Model from biological data. Specifically, the present invention provides a ME-Model which uses a biochemical knowledgebase of an organism to accurately determine the metabolic and macromolecular phenotype of the organism under different conditions. Further, the present invention provides a method to determine the most efficient conditions for producing a product from an organism.
Before the present compositions and methods are described, it is to be understood that this invention is not limited to particular compositions, methods, and experimental conditions described, as such compositions, methods, and conditions may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only in the appended claims.
As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, references to “the method” includes one or more methods, and/or steps of the type described herein which will become apparent to those persons skilled in the art upon reading this disclosure and so forth.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods and materials are now described.
Here, it is shown that the integration of the metabolic and macromolecular expression networks leads to ME-Models that effectively describe the molecular biology of the target cell at a genome-scale along with its metabolic requirements, thus enabling the direct and mechanistic interpretation of omics data. ME-Models are biochemical knowledgebases of the genomic, genetic, biochemical, metabolic, transcriptional, translational, and ancillary biological and chemical processes that necessary to represent metabolism and macromolecular expression for a self-propagating organism. ME-Models allow the full reconciliation of the simultaneous cellular processes that underlie to the function of a cell. The subject ME-Models may be used for (1) modeling recombinant protein production, (2) modeling processes underlying antibiotic-mediated cell death, since the integrated model accounts for the majority of antibiotic targets, and (3) interpreting regulatory circuits in terms of economic efficiency. The ME-Model approximates the content of the transcriptome and proteome in the absence of regulatory constraints with failures being indicative of regulatory constraints.
Thermotoga maritima (T. maritima) is a hyperthermophillic bacterium that is found in one of the deepest branches of Eubacteria. There is substantial interest in developing T. maritima as a model organism for industrial engineering processes due to its ability to metabolize a wide variety of feedstocks into valuable products, including hydrogen gas, H2. T. maritima is able to produce H2 near the Thauer limit of 4 moles per mole of glucose, however, H2 inhibits growths. T. maritima has a small 1.8 Mb genome and supports relatively few transcriptional regulatory states, with only 53 predicted transcription factors. The existence of a few regulatory states may simplify the addition of synthetic capabilities by reducing unexpected and irremediable side-effects and facilitate metabolic engineering efforts. In other words, starting with a minimal genome as a chassis for cellular design will reduce the potential that the features added to the organism will trip an unexpected signal, thus simplifying the addition of synthetic circuits to convert waste streams into valuable products. Efforts are underway to establish genetic tools to facilitate the manipulation of T. maritima and potentially increase growth while sustaining high hydrogen yields, however, no efficient tools exist to date. Quantitative computer models are the basis for large-scale biological design.
A first step in the establishment of computational tools for modeling T. maritima metabolism was accomplished with the integration of structural genomics data with a metabolic network knowledgebase. The knowledgebase of Biochemically, Genetically, and Genomically (BiGG) consistent knowledgebase of metabolism is an established four step procedure that has been extensively automated. Here, the network knowledgebase procedure was extended to include macromolecular synthesis and post-transcriptional modifications (
A similar ME Model was developed using E. coli. The most recent metabolic knowledgebase (M-Model) of E. coli accounts for function of 1366 metabolic genes, which represents approximately 30% of the open reading frames (ORF) in E. coli's genome. Recently, the first genome-scale, stoichiometric network of the transcriptional and translational (tr/tr) machinery of E. coli was constructed (E-Model). The knowledgebase accounts for 303 gene products, including ribosomal proteins, RNA polymerase, tRNA and rRNA. The method prototyped on T. maritima was employed to integrate updated versions of the E. coli M-Model and E-Model into an ME-Model.
With the formulation of an ME-Model, it is no longer necessary to include gross amino-acid and ribonucleotide compositions in the biomass reaction. In the ME-Model, the biomass requirements are simplified and only contains lipids, metal ions, and energy requirements, that together can be thought of as a structural maintenance requirement. Instead of employing the gross biomass requirement as the optimization target when computationally simulating log-phase growth, ribosome production was employed as the optimization target for the ME-Model (
As used herein, the terms “omics”, “omics data” and “multi-omics data” includes information from genomics, transcriptomics, proteomics, metabolomics, snpomics, and fluxomics, and other high-throughput measurements of biological components or chemical or physical modifications to the components.
Metabolic models (M-models) represent metabolism in biochemical detail and at a genome-scale, but they do not quantitatively describe gene expression thus do not afford quantitative interpretation of omics data. In M-models an enzyme may carry infinite fluxes, unless vmax constraints are imposed, and a simple monomeric enzyme is equivalent to a complex multimeric isozymes. Successful applications of M-models have often focused on numerically simulating the overall production of cellular components required for cell growth's. The organism's gross lipid, nucleotide, amino acid, and cofactors, as well as growth-associated and maintenance ATP usage, are experimentally measured. Then, these measurements are integrated with the organism's doubling time (Td) to define a biomass reaction that approximates the dilution of cellular materials during formation of daughter cells. By employing the biomass formation as an optimality target, it has been possible to simulate quantitatively accurate global phenotypes (e.g., log-phase growth rates, substrate consumption, product formation) for microbes on a variety of carbon sources. As the biomass reaction only provides a gross approximation of cellular components, M-model simulations do not provide explicit predictions for which RNAs and proteins are active and thus causal for the global phenotype.
Metabolic and macromolecular expression models (ME-Models) allow for the explicit analysis and simulation of transcriptomes and proteomes in the context of the underlying reaction network. The incorporation of metabolic and macromolecular analysis reduces the dependence on artificial objective functions, such as the biomass objective function, which do not have a strict biological basis. ME-Models that effectively describe the molecular biology of the target cell at a genome-scale along with its metabolic requirements, thus enabling the direct and mechanistic interpretation of omics data. ME-Models allow the full reconciliation of the simultaneous cellular processes that underlie to the function of a cell. The incorporation of biochemical reactions underlying the expression of gene products within a metabolic network knowledgebase allowed the removal of artificial Boolean gene-protein-reaction and facilitated the simulation of variable enzyme concentrations. This type of model allows the explicit representation of transcription and translation provided an opportunity to directly employ quantitative transcriptomic and proteomic measurements as model constraints.
As used herein the term “metabolic and macromolecular phenotype” refers to metabolic, genetic, biochemical or macromolecular status. This includes, but is not limited to, gene expression, protein expression, enzyme activity, pathway activity, metabolic by-product formation, energy usage or any combination thereof.
As used herein, a structural reaction is used to account for the dilution of structural materials (e.g., DNA, cell wall, lipids, etc.) during cell division and the energy cost associated with the cellular maintenance of the structure. Conceptually, this structural reaction approximates the production of a cell whose composition varies as a function or environment and growth rate. M-models often focus on numerically simulating the overall production of cellular components required for cell growth. The organisms gross lipid, nucleotide, amino acid and cofactors as well as growth-maintenance ATP usage are experimentally measured and then integrated with the organisms doubling time (Td) to define a biomass reaction. In contrast, the subject ME-Model does not require gross amino acid and ribonucleotide compositions in the biomass reaction. The ME-Model relies on a structural reaction using only DNA, lipid, metal ions and energy requirements. As the scope of the knowledgebase increases the number of components in the structural reaction decreases. For example, the structural reaction for T. maritima ME-Model included metal ions, whereas, the structural reaction for the recent E. coli ME-Model did not.
In one embodiment, the present invention provides a method for generating a model for determining the metabolic and macromolecular phenotype of an organism. The method includes generating a biochemical knowledgebase of an organism including metabolic and macromolecular synthetic pathways; generating a computational model from the biochemical knowledgebase by applying at least one coupling constraint; using the model to determine the metabolic and macromolecular phenotype of the organism or organisms as a function of genetic and environmental parameters; and computing metabolic and macromolecular changes associated with a perturbation of the organism or the organism's environment, thereby generating a model. The computational model assimilates the metabolic and macromolecular changes caused by the perturbation and then determines the metabolic and macromolecular phenotype of the organism.
In one aspect of the invention, the biochemical knowledgebase includes information regarding the organism's genome, proteome, RNA, metabolic pathways and reactions, biochemical pathways and reactions, energy sources and uses, reaction by-products, protein complexes, reactions to post-translationally modify/functionalize protein complexes, macromolecular synthesis machinery, transcription units, lipid content, metalio-ions, amino acid content, prosthetic cofactors, covalent modifications, and non-covalent modifications, or any combination thereof. In another aspect, the biochemical knowledgebase includes calculation of a structural reaction using lipid content, metal ion content, energy requirements of the organism, dNTP requirements for production of the organism's genome, ribosome production and doubling time, or any combination thereof. The relative composition of the structural reaction is derived from empirical measurements.
The biochemical knowledgebase contains all known genes, gene products and proteins of an organism. In addition, metabolic reactions are associated with protein complexes. Additionally, the biochemical knowledgebase contains reactions including, but not limited to, transcription, mRNA degradation, translation, protein maturation, RNA processing, protein complex formation, ribosomal assembly, rRNA modification, tRNA modification, tRNA charging, aminoacyl-tRNA synthetase charging, charging EF-Tu (elongation factor), cleavage of polycistronic mRNA to release stable RNA products, demands, tRNA activation and metabolism. The model also includes transcription units (TU), stable RNAs (tRNA, rRNA, etc.) peptide chains, prosthetic groups, covalent modifications, non-covalent modifications, and assembly of multimeric proteins and dilution of macromolecules during cell growth and division. Further, the model accounts for reaction by products and energy usage.
In an additional aspect, the perturbation of the organism or its environment is a change in genetic or environmental parameters. In one aspect, the change in genetic or environmental parameters includes changes in the composition of growth media, sugar source, carbon source, growth rate, ribosome production, antibiotic presence, oxygen level, efficiency of macromolecular machinery, subjection to a chemical compound, genetic alteration, forced overproduction of a network component, and inhibition or hyperactivity of at least one enzyme, or any combination thereof. In one aspect, the efficiency of macromolecular machinery includes, but is not limited to, transcription and translation rates, enzyme catalytic rates and transport rates, or any combination thereof. In an aspect, the inhibition or hyperactivity of an enzyme may be caused by an environmental change or genetic perturbation. Further, the environmental change may be the presence or absence of antibiotics and the genetic perturbation may be directed protein engineering of specific chemical residues leading to modulated catalytic efficiency. In another aspect, the inhibition or hyperactivity of an enzyme may be a decrease or increase to an efficiency parameter. In a further aspect, the change in genetic parameters is the addition of heterologous and/or synthetic genetic material.
In certain aspects, the perturbations are subsequently related to the endogenous regulatory network to determine regulators that may facilitate or interfere with the process of achieving a desired phenotype, such as production of a small metabolite. In other aspects, the perturbations are related to the endogenous regulatory network to discover new regulatory capacities in the target organism.
In a further aspect, the perturbation is at least one change in basic model parameters to characterize the robustness of predictions to changes in the model parameters and determine the most relevant parameters.
In an additional aspect, the metabolic and macromolecular changes include alterations in gene expression, protein expression, RNA expression, translation, transcription, pathway activation or inactivation, production of metabolic by-products, energy use, growth rate, proteome changes and transcriptome changes or any combination thereof. In specific aspects, metabolic by-products include acetate secretion and hydrogen production; the proteome changes include amino acid incorporation rate, protein production, macromolecular synthesis, ribosomal protein expression, expression of peptide chains, enzyme expression, enzyme activity, RNA to protein mass ratio, protein degradation, post translational protein modification, proteome fluxes, translation and protein expression profile or any combination thereof and the transcriptome changes include gene expression, transcription, functional RNA expression, transcriptome fluxes, transcription rate, gene expression profile or any combination thereof.
These changes include increased or decreased expression of enzymes, proteins, genes, RNA or peptide chains; increase or decrease in by-product formation; increase or decrease in enzyme activity; increase or decrease in protein degradation or post translational modification; increase or decrease on transcription or translation; increase or decrease in proteome or transcriptome fluxes and changes in overall transcriptome and proteome profiles and activities.
In a further aspect, the coupling constraints may be applied to system boundaries, maximal transcriptional rate for stable RNA and mRNA; relaxing of the requirement that all synthesized components need to be used within the network; mRNA dilution; mRNA degradation or complex dilution; hyperbolic ribosomal catalytic rate; ribosomal dilution rate; RNA polymerase dilution rate; hyperbolic mRNA rate; coupling of mRNA dilution, degradation and translation reactions; coupling of tRNA dilution and charging reactions; macromolecular synthesis machinery dilution rate; and metabolic enzyme dilution rate, or any combination thereof. System boundaries include, but are not limited to the external environment, interfaces between cellular compartments, interfaces between multi-scale processes, and biophysical limits on the lifetime and efficiency for cellular machinery.
In specific non-limiting examples, the coupling constraint for mRNA dilution is VmRNA Dilution≧amax*VmRNA Degradation; wherein amax is TmRNA/Td; the coupling constraint for mRNA degradation is VmRNA Degradation≧bmax*VTranslation; wherein bmax=1/ktranslation*TmRNA; the coupling constraint for complex dilution is VComplex Dilution≧cmax*VComplex Usage; wherein cmax=1/kcat*Td; the coupling constraint for the hyperbolic ribosomal catalytic rate is
the coupling constraint of the ribosomal dilution rate is
the coupling constraint of the RNA polymerase dilution rate is
the coupling constraint of coupling of mRNA dilution, degradation and translation reactions is dilmRNA≧α1degmRNA and degmRNA≧α2trslmRNA, wherein
the coupling constraint of the hyperbolic mRNA rate is
the coupling constraint of the hyperbolic tRNA efficiency rate is
the coupling constraint of the coupling of tRNA dilution and charging reactions is diltRNA≧αchgtRNA, wherein
the coupling constraint of the macromolecular synthesis machinery dilution rate is
and the coupling constraint of the metabolic enzyme dilution rate is
(where, TmRNA is the measured, or assumed, half-life for the mRNA molecule; Td is the organism's doubling time; ktranslation is the rate of translation; kcat is the enzyme's turnover constant; and, VmRNA Dilution, VmRNA Degradation, VTranslation, VComplex Dilution, and VComplex Usage are reaction fluxes whose values are determined during the simulation procedure; kribo is the effective ribosomal rate; cribosome is
ro is the value of the vertical intercept if growth rate and the RNA/protein ratio are plotted (growth on the x-axis and RNA/protein ratio on the y-axis); kτ is the inverse of the slope of the relationship when growth and the RNA/protein ratio are plotted as for determination of ro; μ is growth rate; kRNAP is RNA polymerase (RNAP) transcription rate; VRibosome Dilution is dilution of ribosome; VRNAP dilution is the dilution of RNAP; Vtranslation of peptide is the translation of peptide; Vtranscription of TUi is the transcription of TUi; length (peptide)i is the length of peptidei; length is the number of nucleotides in TUi; diltRNA is μ[tRNA]; chgtRNA is
[tRNA] is
dilmRNA is the dilution of mRNA; degmRNA is the degradation of mRNA; trslmRNA is translation of protein from mRNA; [mRNA] is mRNA concentration; kmRNA is the mRNA catalytic rate; cmRNA is
chgmRNA is the charging of tRNA; diltRNA is the dilution of tRNA; [tRNA] is the tRNA concentration; ktRNA is the tRNA catalytic rate; ctRNA is
Vmachineryi dilution is the flux of the reaction leading to dilution of machine i; Vmetabolic enzymei dilution is the flux of the reaction leading to dilution of metabolic enzyme i, Vuse of machineryi is the sum of all fluxes using machine i; Vuse of metabolic machineryi is the sum of all fluxes using metabolic enzyme i). The coupling constraint is applied to one or more system boundary conditions resulting in a change in environmental conditions for the organism. The change in environmental conditions includes carbon source, sugar source, nitrogen source, metal source, phosphate source, oxygen level, carbon dioxide level, change in growth media, and the presence of another organism (of the same or different species) or any combination thereof.
In a further aspect, the coupling constraint is a component's efficiency of use. The efficiency of use may be determined by relating the rate of use of a component by the integrated network to its rate of dilution or degradation. The component maybe the ribosome, RNA Polymerase, mRNA, tRNA, or metabolic enzymes. Additionally, the efficiency of use is may be determined using properties of the component including molecular weight, solvent-accessible surface area, number of catalytic sites, kinetic parameters of its catalytic and allosteric sites, and elemental composition or any combination thereof. The efficiency of use maybe determined by using the macromolecular composition of the cell. In a further aspect, the mRNA constraint includes the ratio of mRNA dilution/mRNA degradation, the ratio of mRNA degradation/translation rate, and the ratio of mRNA dilution/translation rate, or any combination thereof. Additionally, the efficiency of use for the mRNA maybe determined using mRNA half-life data, proteomics and transcriptomics data, a ribosome flow model, and ribosome profiling, or any combination thereof.
In one aspect, the coupling constraints provide lower and/or upper bounds on flux ratios.
Coupling constraints are added to more accurately reflect the metabolic state of the organism. The subject ME-Model uses a mRNA dilution constraint which requires that one mRNA must be removed from the cell for every Td/TmRNA times it is degraded; a mRNA degradation constraint which requires that one mRNA must be degraded every ktranslation*TmRNA times it is translated; and a complex dilution constraint which requires that one complex must be removed from the cell for every kcat*Td times it is used in the network. Other coupling constraints include, but are limited to, constrains on the exchange reactions to simulate different environmental conditions, constraints on the maximal transcription rate for stable and mRNA (vi: vimin≦vi≦vimax) and coupling constrains on reactions in the form of v4−cmin*vs≧−s,s≧0 and v4−cmax*vs≦0. Details regarding these constraints and their derivations are provided in the examples.
The term “organism” refers both to naturally occurring organisms and to non-naturally occurring organisms, such as genetically modified organisms. An organism can be a virus, a unicellular organism, or a multicellular organism, and can be either a eukaryote or a prokaryote. Further, an organism can be an animal, plant, protist, fungus or bacteria. Exemplary organisms include, but are not limited to bacterial organisms, which include a large group of single-celled, prokaryote microorganisms, and archeal organisms, which include a group of single-celled microorganisms. Bacterial organisms also include gram negative bacteria, gram positive bacteria, pathogenic bacteria, electrosynthetic bacteria and photosynthetic bacteria. Additional examples of bacterial organisms include, but are not limited to, Acinetobacter baumannii, Acinetobacter baylyi, Bacillus subtilis, Buchnera aphidicola, Chromohalobacter salexigens, Clostridium acetobutylicum, Clostridium beijerinckii, Clostridium thermocellum, Corynebacterium glutamicum, Dehalococcoides ethenogenes, Escherichia coli, Francisella tularensis, Geobacter metallireducens, Geobacter sulfurreducens, Haemophilus influenza, Helicobacter pylori, Klebsiella pneumonia, Lactobacillus plantarum, Lactococcus lactis, Mannheimia succiniciproducens, Mycobacterium tuberculosis, Mycoplasma genitalium. Neisseria meningitides, Porphyromonas gingivalis, Pseudomonas aeruginosa, Pseudomonas putida, Rhizobium etli, Rhodoferax ferrireducens, Salmonella typhimurium, Shewanella oneidensis, Staphylococcus aureus, Streptococcus thermophiles, Streptomyces coelicolor, Synechocystis sp. PCC6803, Thermotoga maritima, Vibrio vulnificus, Yersinia pestis, Zymomonas mobilis, Halobacterium salinarum, Methanosarcina barkeri, Methanosarcina acetivorans, Methanosarcina acetivorans, Natronomonas pharaonis, Arabidopsis thaliana, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Cryptosporidium hominis, Chlamydomonas reinhardtii.
Organisms are ordinarily grown in media containing nutrients. Growth media is the media which provides the nutrients that an organism requires for growth. Generally, undefined growth media contains a source of amino acids and nitrogen (e.g., beef, yeast extract). This is an undefined medium because the amino acid source contains a variety of compounds with the exact composition being unknown. Nutrient media contain all the elements that most bacteria need for growth and are non-selective, so are used for the general cultivation and maintenance of bacteria kept in laboratory culture collections. An undefined medium (also known as a basal or complex medium) is a medium that contains a carbon source such as glucose for bacterial growth, water and various salts needed for bacterial growth. Minimal media are those that contain the minimum nutrients possible for colony growth, generally without the presence of amino acids. Minimal medium typically contains a carbon source for bacterial growth, which may be a sugar such as glucose, or a less energy-rich source like succinate; various salts, which may vary among bacteria species and growing conditions; these generally provide essential elements such as magnesium, nitrogen, phosphorus, and sulfur to allow the bacteria to synthesize protein and nucleic acid and water. The growth media may be supplemented with other factors such as amino acids, sugars and antibiotics for example.
In one aspect, the organism is a microbial organism. In one aspect, the organism is genetically modified. In non-limiting examples, the organism includes Thermotoga maritima (T. maritima) and Escherichia coli (E. coli).
In an additional aspect, the generation of the model comprises high-precision arithmetic by an optimization solver. Further, the model predicts the organism's maximum growth rate (μ*) in the specified environment, substrate uptake/by-product secretion rates at μ*, biomass yield at μ*, central carbon metabolic fluxes at μ*, and gene product expression levels (both in terms of mRNA and protein) at μ* or any combination thereof. High precision arithmetic is >64-bit computing or relying on an iterative refinement procedure.
As described in the examples, ME-Model for T. maritima simulates changes in cellular composition with growth rate, in agreement with previously reported experimental findings. Positive correlations were observed between in silico and in vivo transcriptomes and proteomes for the 651 genes in our ME-Model with statistically significant (p<1×10−5 t-test) Pearson Correlation Coefficients (PCC) of 0.54 and 0.57, respectively. And, when the subject ME-Model was used as an exploratory platform for an in silico comparative transcriptomics study, it was discovered putative transcription factor (TF) binding motifs and regulons associated with L-arabinose (L-Arab) and cellobiose metabolism, and improved functional and transcription unit (TU) architecture annotation. Further, a ME-Model for E. coli was used to simulate growth rates, substrate reuptake rates, oxygen uptake rates, central carbon fluxes, by-product secretion, phenotypic changes arising from adaptive evolution, macromolecular expression under nutrient limitation and nutrient excess, and demonstrated a correlation between effective in silico and in vivo codon usage. Overall, ME-Models provide a chemically and genetically consistent description of an organism, thus they begin to bridge the gap currently separating molecular biology and cellular physiology.
In another embodiment, the invention provides a model for determining the metabolic and macromolecular phenotype of an organism. The model includes a data storage device which contains a biochemical knowledgebase of the organism; a user input device wherein the user inputs perturbation of the organism or the organism's environment information; a processor having the functionality to compare the biochemical knowledgebase and the perturbation information, then apply at least one coupling constraint thereto to determine the metabolic and macromolecular phenotype of the organism; a visualization display which displays the results of the determination; and an output which provides the metabolic and macromolecular phenotype of the organism. The perturbation information includes metabolic and macromolecular changes.
A storage device is a device for recording (storing) information (data). Storing can be done using virtually any form of energy, spanning from manual muscle power in handwriting, to acoustic vibrations in phonographic recording, to electromagnetic energy modulating magnetic tape and optical discs. A storage device may hold information, process information, or both. A device that only holds information is a storing medium. Devices that process information (data storage equipment) may either access a separate portable (removable) recording medium or a permanent component to store and retrieve information. Electronic data storage requires electrical power to store and retrieve that data. Most storage devices that do not require vision and a brain to read data fall into this category. Electromagnetic data may be stored in either an analog or digital format on a variety of media. This type of data is considered to be electronically encoded data, whether or not it is electronically stored in a semiconductor device, for it is certain that a semiconductor device was used to record it on its medium. Most electronically processed data storage media (including some forms of computer data storage) are considered permanent (non-volatile) storage, that is, the data will remain stored when power is removed from the device. In contrast, most electronically stored information within most types of semiconductor (computer chips) microcircuits are volatile memory, for it vanishes if power is removed.
A user input device is device is any peripheral (piece of computer hardware equipment) used to provide data and control signals to an information processing system such as a computer or other information appliance. Examples of input devices include keyboards, mice, scanners, digital cameras and joysticks.
A processor is a device that performs calculations or other manipulations of data. Data processing is any process that uses a computer program to enter data and summarize, analyze or otherwise convert data into usable information. It involves recording, analyzing, sorting, summarizing, calculating, disseminating and storing data. Because data are most useful when well-presented and actually informative, data-processing systems are often referred to as information systems. Scientific data processing usually involves a great deal of computation (arithmetic and comparison operations) upon a relatively small amount of input data, resulting in a small volume of output. This refers to a class of programs that organize and manipulate data, usually large amounts of numeric data.
“Visualization device” is any device on which the results of the data analysis are displayed.
The output can be a graph, chart, list or any other output which describes the metabolic and molecular phenotype of the organism.
In one aspect of the invention, the biochemical knowledgebase includes information regarding the organism's genome, proteome, RNA, metabolic pathways and reactions, biochemical pathways and reactions, energy sources and uses, reaction by-products, protein complexes, macromolecular synthesis machinery, transcription units, lipid content, metalio-ions, amino acid content, prosthetic cofactors, covalent modifications, and non-covalent modifications, or any combination thereof. In another aspect, the biochemical knowledgebase includes calculation of a structural reaction using lipid content, metal ion content, energy requirements of the organism, ribosome production and doubling time, or any combination thereof. The relative composition of the structural reaction is derived from empirical measurements.
In an aspect, the perturbation of the organism or its environment is a change in genetic or environmental parameters. In one aspect, the change in genetic or environmental parameters includes changes in the composition of growth media, sugar source, carbon source, growth rate, ribosome production, antibiotic presence, forced overproduction of a network component, oxygen level, efficiency of macromolecular machinery, subjection to a chemical compound, genetic alteration and inhibition or hyperactivity of at least one enzyme, or any combination thereof. In one aspect, the efficiency of macromolecular machinery includes, but is not limited to transcription and translation rates, enzyme catalytic rates and transport rates, or any combination thereof. In an aspect, the inhibition or hyperactivity of an enzyme may be caused by an environmental change or genetic perturbation. Further, the environmental change may be the presence or absence of antibiotics and the genetic perturbation is directed protein engineering of specific chemical residues leading to modulated catalytic efficiency. In another aspect, the inhibition or hyperactivity of an enzyme is a decrease or increase to the efficiency parameter. In a further aspect, the change in genetic parameters is the addition of heterologous and/or synthetic genetic material.
In certain aspects, the perturbations are subsequently related to the endogenous regulatory network to determine regulators that may facilitate or interfere with the process of achieving a desired phenotype. In other aspects, the perturbations are related to the endogenous regulatory network to discover new regulatory capacities in the target organism.
Input device is any device in which information is inputted in to a system.
In an additional aspect, the metabolic and macromolecular changes include alterations in gene expression, protein expression, RNA expression, translation, transcription, pathway activation or inactivation, production of metabolic by-products, energy use, growth rate, proteome changes and transcriptome changes or any combination thereof. In specific aspects, the metabolic by-products include acetate secretion and hydrogen production; the proteome changes include amino acid incorporation rate, protein production, macromolecular synthesis, ribosomal protein expression, expression of peptide chains, enzyme expression, enzyme activity, RNA to protein mass ratio, protein degradation, post translational protein modification, proteome fluxes, translation and protein expression profile or any combination thereof; and the transcriptome changes include gene expression, transcription, functional RNA expression, transcriptome fluxes, transcription rate, gene expression profile or any combination thereof.
In a further aspect, the coupling constraints may be applied to system boundaries; maximal transcriptional rate for stable RNA and mRNA; relaxing of the requirement that all synthesized components need to be used within the network; mRNA dilution; mRNA degradation or complex dilution; hyperbolic ribosomal catalytic rate; ribosomal dilution rate; RNA polymerase dilution rate; hyperbolic mRNA rate; coupling of mRNA dilution, degradation and translation reactions; coupling of tRNA dilution and charging reactions; macromolecular synthesis machinery dilution rate; metabolic enzyme dilution rate, or any combination thereof. System boundaries include, but are not limited to the external environment, interfaces between cellular compartments, interfaces between multi-scale processes, and biophysical limits on the lifetime and efficiency for cellular machinery.
In specific non-limiting examples, the coupling constraint for mRNA dilution is VmRNA Dilution≧amax*VmRNA Degradation; wherein amax is TmRNA/Td; the coupling constraint for mRNA degradation is VmRNA Degradation≧bmax*VTranslation; wherein bmax=1/ktranslation*TmRNA; the coupling constraint for complex dilution is VComplex Dilution≧cmax*VComplex Usage; wherein cmax=1/kcat*Td; the coupling constraint for the hyperbolic ribosomal catalytic rate is
the coupling constraint of the ribosomal dilution rate is
the coupling constraint of the RNA polymerase dilution rate is
the coupling constraint of coupling of mRNA dilution, degradation and translation reactions is dilmRNA≧α1degmRNA and degmRNA≧α2trslmRNA, wherein
the coupling constraint of the hyperbolic mRNA rate is
the coupling constraint of the hyperbolic tRNA efficiency rate is
the coupling constraint of the coupling of tRNA dilution and charging reactions is diltRNA≧αchgtRNA, wherein
the coupling constraint of the macromolecular synthesis machinery dilution rate is
and the coupling constraint of the metabolic enzyme dilution rate is
(where, TmRNA is the measured, or assumed, half-life for the mRNA molecule; Td is the organism's doubling time; ktranslation is the rate of translation; kcat is the enzyme's turnover constant; and, VmRNA Dilution, VmRNA Degradation, VTranslation, VComplex Dilution, and VComplex Usage are reaction fluxes whose values are determined during the simulation procedure; kribo is the effective ribosomal rate; cribosome is
ro is the value of the vertical intercept if growth rate and the RNA/protein ratio are plotted (growth on the x-axis and RNA/protein ratio on the y-axis); kcat is the inverse of the slope of the relationship when growth and the RNA/protein ratio are plotted as for determination of ro; μ is growth rate; kRNAP is RNA polymerase (RNAP) transcription rate; VRibosome Dilution is dilution of ribosome; VRNAP dilution is the dilution of RNAP; Vtranslation of peptide is the translation of peptide; Vtranscription of TUi is the transcription of TUi; length (peptide)i is the length of peptidei; length TUi is the number of nucleotides in TUi; diltRNA is μ[tRNA]; chgtRNA is
[tRNA] is
dilmRNA is the dilution of mRNA; degmRNA is the degradation of mRNA; trslmRNA is translation of protein from mRNA; [mRNA] is mRNA concentration; is the mRNA catalytic rate; cmRNA is
chgmRNA is the charging of tRNA; diltRNA is the dilution of tRNA; [tRNA] is the tRNA concentration; ktRNA is the tRNA catalytic rate; ctRNA is
Vmachineryi dilution is the flux of the reaction leading to dilution of machine i; Vmetabolic enzymei dilution is the flux of the reaction leading to dilution of metabolic enzyme i, Vuse of machineryi is the sum of all fluxes using machine i; Vuse of metabolic enzymei is the sum of all fluxes using metabolic enzyme i). The coupling constraint is applied to one or more system boundary conditions resulting in a change in environmental conditions for the organism. The change in environmental conditions includes carbon source, sugar source, nitrogen source, metal source, phosphate source, oxygen level, carbon dioxide level, change in growth media, and the presence of another organism (of the same or different species) or any combination thereof.
In one aspect, the coupling constraints provide lower and/or upper bounds on flux ratios.
In a further embodiment, the present invention provides a method to determine the metabolic and macromolecular phenotype of an organism. The subject method includes generating a biochemical knowledgebase of the organism; introducing a perturbation to the organism or the organism's environment; using the biochemical knowledgebase to determine the metabolic and macromolecular changes associated with the perturbation and applying at least one coupling constraint; and determining of the metabolic and macromolecular phenotype of the target organism.
In one embodiment, the present invention provides a model for performing a cost estimate analysis of producing a value added product in an organism. The subject model includes a data storage device which contains a biochemical knowledgebase of the organism, costs associated producing the product and price of the product; a user input device wherein the user inputs parameters for producing the product; a processor having the functionality to compare the biochemical knowledgebase and the parameters to determine metabolic and macromolecular changes; apply at least one coupling constraint and perform cost benefit analysis thereto; a visualization display which displays the results of the analysis; and an output which provides the cost estimate analysis.
In a one aspect, the output is a graph or a chart depicting profitability estimate, estimates of key bioprocessing parameters such as feedstock consumption, reactor volume, production formation, copy number, catalytic efficiency, and cellular growth rate.
In a one aspect, the output is a graph or a chart depicting profitability estimate, estimates of key bioprocessing parameters such as feedstock consumption, reactor volume and production formation. In one aspect, the product is a naturally occurring or a recombinant protein. In another aspect, the product is a molecule, such as hydrogen or acetate.
As described in the examples, the subject ME-Model was used to determine the conditions for the best profitability for the production of spider silk. The model indicated that in the short term (less than 50 hr) maximum production and profitability occur when the organism is designed to dedicate most of its resources to spider silk production and specific growth rate is less than 0.01 hr−1. There was also a substantial decrease in net profits at the higher specific growth rates over an extended period of time. It was determined that the reduction in profits is due to an exponential increase in the amount of feedstock required to support the microbial population at these later time points.
The following examples are intended to illustrate, but not limit the invention.
The metabolic content for the biochemical knowledgebase was based on the previously published model (Zhang et al. (2009), Science 325:1544; Thiele et al. (2010), Nature Protocols 5:93) with updates to keep the network current with available literature. In associating metabolic reactions with protein complexes, cases were encountered where the metabolic model from Zhang et al. indicated a protein complex that hasn't been observed for T. maritima; these cases may have arisen from the Zhang et al. model using E. coli's metabolic model as the template. In these cases, a protein complex was assigned but denoted it low confidence. In addition to metabolism, the model contained reactions representing: transcription of TUs, TU degradation, translation, protein maturation, transcription, mRNA degradation, transcription, translation, protein maturation, RNA processing, protein complex formation, ribosomal assembly, rRNA modification, tRNA modification, tRNA charging, aminoacyl-tRNA synthetase charging, charging EF-TU, cleavage of polycistronic mRNA to release stable RNA products, demands, tRNA activation (EF-TU), and metabolism. Reversible reactions were split into two separate reactions representing each direction.
Macromolecular Synthesis Machinery
The molecular machines (e.g., proteins, genes, RNAs) involved in macromolecular synthesis were identified from the genome annotation, SEED subsystem analysis, comparative genomics analysis of the E. coli models, KEGG, and PubMed and Google Scholar searches for “T. maritima, or Thermotogales” and “transcription or translation.” The E. coli knowledgebase had 194 protein ORFs and SEED found 144 (74%) homologous proteins in T. maritima. Proteins used by T. maritima, but not E. coli, in transcription or translation were also identified (SI Table S5). Bi-directional best BLAST hits in T. maritima 's proteome to transcription/translation proteins from Bacillus subtilis were also used to prime specific literature searches to reduce bias introduced by using the E. coli model as a search parameter. Additionally, the annotation strings were manually checked for the remaining proteins to ensure no key transcription/translation machinery were omitted.
The functions of each of the 159 proteins associated with macromolecular synthesis in T. maritima were determined by primary literature when available. When no primary literature was available, the Uniprot and SEED databases (http://www.uniprot.org/ and http://www.theseed.org/) were used to infer function by homology. In a few instances, structural alignments were performed using the tool FATCAT to support the assessment of homologous function. The functions of 148 genes (˜93% of genes known to be involved in macromolecular synthesis in this organism) are linked in our final integrated model.
Protein Complexes
For each protein machine, primary literature and the RCSB Protein Data Bank (PDB) were used to determine whether the machine was a monomer or oligomer. The PDB entries also provided an opportunity to integrate 3-D structural data into the knowledgebase (this model includes structures for 32 additional ORFs compared to Zhang et al). When structures and states were unavailable for the protein of interest, orthologs in closely related organisms were considered when possible. Otherwise, the Uniprot database was consulted. When no information was available, the protein was assumed to act as a monomer and this assumption was noted in the model.
Transcription Unit Architecture
T. maritima has a genome organized by transcription units (TUs). Unfortunately, T. maritima 's TU architecture is far from being enumerated thus bioinformatics methods were required in addition to primary literature. The draft knowledgebase of the transcription unit architecture of T. maritima was achieved using ‘OR’ logic applied over a set of conditions. A TU would start with a gene and then proceed until one of the following conditions was met:
Rule 1: Two genes are found in convergent orientation on different strands.
Rule 2: Two genes are found in divergent orientation on different strands.
The convergent and divergent criteria were chosen because it is rare to see experimentally annotated TUs with these features. This procedure did not contradict any experimentally annotated TUs in T. maritima.
Rule 3: A high-confidence Rho-independent transcription terminator is found separating two genes oriented in series on the same strand.
Intrinsic terminators were predicted using the TransTermHP database (http://transterm.cbcb.umd.edu/). T. maritima uses the intrinsic RNA mechanism for transcriptional termination at many TU boundaries. Only terminator structures called with a “100%” confidence score were included.
Rule 4: More than 55 base pairs (bps) separate two genes in series on the same strand.
Among the many features used to predict operons, intergenic distance was found to be the best single predictor of operons in bacteria. Genes belonging to the same operon tend to exhibit small intergenic distance. In contrast, genes not in the same operon have a more uniform distribution of intergenic distance. In E. coli, the log-likelihood of finding two adjacent genes in a single TU plummets at an intergenic distance of 55 bp, thus 55 bp was chosen as the cutoff. For stable RNA operons this rule was not followed because stable RNAs frequently rely on the Rho protein for termination, and that could not be assessed for the current study. Additionally, in examining the distribution of intergenic distances around RNA genes, the distance metric does not appear to be of much use in these cases.
Rule 5: A high-confidence promoter region is found separating two genes oriented in series on the same strand.
It was assumed that there is no reason to keep two genes structurally linked if a promoter region is present. For prediction of promoters, we scanned 400 bp upstream of each ORF (or to the start of the previous gene) for the regular expression “TTGACA 16-18 bp TATAAT”. The spacer between these two boxes can be any sequence of the four nucleotides 16-18 bps in length. This regular expression corresponds a well-conserved bacterial promoter region.
Rule 6: An experimentally annotated stop is found after a gene.
TU prediction has only moderate statistical power. A few TUs determined experimentally were included.
All TUs are taken to be leaderless (no 51 extension) unless primary literature indicated the exact transcription start site and a TU would start with a gene and then proceed until one of the conditions was met.
Computational Methods
A custom Python (www.python.org) modules was built to construct an integrated model of Metabolism and Expression (ME-Model) from the previously published metabolic models, the T. maritima genome, and the rules described above. Because of numerical difficulties associated & with the range of parameters in our model precluded the use of inexact numerical solvers, we used an exact solver, QSopt_ex, with its default parameter settings. The LP problem file used for maltose minimal medium simulations, is provided as Supplement TMA_ME_v1.0_maltose_minimalJp.bz. Simulations involving only the metabolic portion from Zhang et al.'s were performed with the ILOG/CPLEX solver.
Derivation of the Coupling Constraints
a: mRNA Dilution
V
mRNA Dilution
≧a
max
*V
mRNA Degradation
Coupling constraint #1 approximates the passage of intact transcription units to daughter cells during cell division. This constraint ensures that the in silica cell incurs a material cost for mRNAs; otherwise, the cell only pays the energetic cost of converting NMPs to NTPs. Here, are all of the assumptions required to arrive at the coupling constraint given above and derive a biological Interpretation of the coupling parameter amax:
Coupling constraint a is interpreted to mean: “one mRNA must be removed from the cell for every Td/TmRNA times it is degraded”
b: mRNA Degradation
V
mRNA Degradation
≧b
max
*V
Translation; wherein bmax=1/ktranslation*TmRNA.
Coupling constraint b is to place an upper limit on the number of peptides produced per mRNA. In order to implement this constraint, we require an mRNA to pass through its degradation reaction once it has reached the limit. Here are all of the assumptions required to arrive at the coupling constraint given above and derive a biological interpretation of the coupling parameter bmax.
The mean lifetime of an mRNA molecule is denoted TmRNA,
b
max=1/ktranslation*TmRNA.
Coupling constraint b is interpreted to mean: “one mRNA must be degraded every 1/(ktranslation*TmRNA) times it is translated”.
Bulk order of magnitude approximations for TmRNA and ktranslation (derived from omics sources) was employed to arrive at the coupling parameter bmax used in this study.
Bulk Approximations for the Coupling Constraint Parameters.
Coupling Parameter Assumptions for the First Coupling Constraint:
Td, the doubling time of the cell, was calculated as ln(2)/λ. Here, λ is the experimentally measured growth rate (in minutes) for the particular condition modeled.
τmRNA, the mean lifetime of all mRNAs in the cell, was assumed to be 5 minutes. We based this on a wide range of stabilities observed for individual mRNAs of E. coli. In that bacterium, ≈80% of all mRNAs had half-lives between 3 and 8 min (Bernstein et al., 2002, Proc Natl Acad Sci USA, 99, 9697-702).
Coupling parameter assumptions for the second coupling constraint:
ktranslation is globally set to 4 proteins per minute. This value was tuned so that each mRNA will ultimately produce approximately 20 proteins during its effective lifetime. This mean yield (proteins/mRNA) was taken from a recent experiment which achieved simultaneous quantification of the E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells (Taniguchi et al., 2010, Science, 329, 533-8). It is important to note that literature sources disagree on the order of magnitude this parameter should take. The yield was reported as high as 300-600 in a separate quantitative study (Lu et al., 2007, Nat Biotechnol, 25, 117-24).
τmRNA, the mean lifetime of all mRNAs in the cell, was assumed to be 5 minutes. We based this on a wide range of stabilities observed for individual mRNAs of E. coli. In that bacterium, ≈80% of all mRNAs had half-lives between 3 and 8 min (Bernstein et al., 2002, Proc Natl Acad Sci USA, 99, 9697-702).
Coupling parameter assumptions for the third coupling constraint:
Td, the doubling time of the cell, was calculated as ln(2)/λ. Here, is the experimentally measured growth rate (in seconds) for the particular condition modeled.
kcat is globally set to 15 reactions per second per protein complex. Fluxes in metabolic models are on the order of 1 mmol/gDW h and less. Protein synthesis fluxes occur on the order of nmol/gDW h. This kcat parameter setting allows for feasible solutions by spanning the gap. Later, it can potentially be bounded using omics sources.
Special precautions are taken for the ribosome, RNA polymerases, and tRNAs as described below. Their rates can be confidently bounded using order of magnitude approximations:
RNA Polymerase (RNAP):
Ribosome:
tRNAs:
c: Complex Dilution
V
Complex Dilution
≧c
max
*V
Complex Usage; wherein
Coupling constraint c is used to approximate dilution of a complex to a daughter cell. Here are all of the assumptions required to arrive at the coupling constraint given above and derive a biological interpretation the coupling parameter cmax
V
Complex Usage=(Vmax[S])/(Km+[S])
V
Complex Usage
=Kcat[E] [Equation 1].
It is assumed that on the order of the cell's doubling time VComplex Degradation<<VComplex Dilution and therefore: VComplex Synthesis=VComplex Loss=VComplex Dilution.
V
Complex Synthesis
=V
Complex Loss
=V
Complex Dilution=(d[E]/dt)=(1/Td)[E] [equation 2]
V
Complex Dilution
≧c
max
*V
Complex Usage)=(1/Td)[E]≧cmax*kcat[E]
cmax is the inverse of the maximum number of complex uses in a doubling time.
Coupling constraint c is interpreted to mean: “one complex must be removed from the cell for every kcat*Td times it is used in the network”.
N-Terminal Methionine Cleavage Prediction
Predictions were made using the TermiNator program with protein sequences for T. maritima obtained from KEGG.
Genetic Code Determination.
From inspection of tRNA sequences and structures downloaded from the transfer RNA database (http://trna.bioinfuni-leipzig.de/DataOutput/), it was determined that T. maritima uses uniform-GUC decoding spread over 46 tRNA genes. In both Archaea and Bacteria, but not in Eukarya, the conversion of C34 of a CAU-anticodon to lysidine (k2C) or analogue generates an anticodon for isoleucine (ile). TMtRNA-Met-2 was assigned this role based on a strong sequence alignment to E. coli tRNAs containing k2C. The T. maritima genome encodes two additional tRNA genes with CAU anticodons. TMtRNA-Met-1 appears to be used for translation initiation while MARNA-Met-3 appears to be used during translation elongation. Evidence for distinguishing these two tRNA genes was based on the fact that TMtRNA-Met-1 has features that resemble those found in a crystal structure of formyl-methionyltRNAIMet from E. coli. Specifically, the presence of three consecutive G:C base pairs conserved in the anticodon stem of initiator tRNAs in initiation of protein synthesis in other organisms was relied on to make the final determination.
rRNA Modifications
For T. maritima, there was no organism-specific literature supporting modifications to the 5S and the 23S rRNA. No modifications of the 5S rRNA was assumed as modifications to 5S rRNA are infrequent in bacteria. Attempting to extrapolate 23S rRNA modifications from E. coli was relatively unsuccessful as alignment via ClustalW2 showed significant differences near many of the putative modification sites. The alignment also reveals that the 23S rRNA of T. maritima is significantly longer (>100 bp) than that of E. coli. Only three proteins with annotated roles in modifying the 23S rRNA were added to the model, TM0940, TM0462, and TM1715.
For 16S rRNA, there are experimental evidence for 10 modifications 15 in this organism. The locations of pseudouridines, which are mass silent, were not available, but an 1 lth modification, U to Y at position 516, was included in the knowledgebase based on the fact that it is well-conserved in bacteria and the alignment supports its inclusion. Finally, an unusual derivative of cytidine designated N-330 has been sequenced to position 1404 in the decoding region of the 16S rRNA. It was found to be identical to an earlier reported nucleoside of unknown structure at the same location in the 16S rRNA of the archaeal mesophile Haloferax volcanil. This modified nucleoside was excluded from the knolwedgebase since the exact chemical composition of the modification is unknown.
tRNA Modifications
Post-transcriptional modification of tRNA requires a significant investment in genes, enzymes, substrates, and energy. A variety of modifications were included in the model based on bioinformatics predictions and literature evidences.
RNaseP—
The Ribonuclease P Database2 (http://www.mbio.ncsu.edu/RNaseP/home.html) was used to locate the RNaseP gene at the genomic coordinates 752885-753222 on the + strand. This gene was absent from the T. maritima annotation in KEGG.
T. maritima MSB8 (ATCC: 43589) was grown in an 500 mL serum bottles containing 200 mL of anoxic minimal media with 10 mM maltose, xylose, cellobiose, arabinose or glucose as the sole carbon source at 80° C. All samples were collected during log-phase growth. Substrate uptake and by-product secretion rates, and compositional analyses were performed as described below.
Samples were collected for gene and protein expression measurements after the growth was stopped with 20 mL of stopping solution comprised of 5 parts Trizol and 95 parts 200-proof ethanol (Sigma-Aldrich, St. Louis, Mo., USA). Uptake and secretion measurements were performed by the continuous sampling of the growth medium and assessing the depletion or accumulation of extracellular metabolites using the HPLC (Waters Corp., Milford, Mass., USA) as previously described (Johnson et al. (2006), Appl Environ Microbiol 72:811).
Transcriptome Analysis
RNA isolation and transcriptome measurements were performed as previously described. Briefly, RNA was extracted using RNAeasy mini kit protocol with DNaseI treatment (Qiagen, Valencia, Calif., USA). Total RNA yields were measured by using a NanoDrop (Thermo Fisher Scientific, Waltham, Mass., USA) at wavelength of 260 nm and quality was checked by measuring the sample A260/A280 ratio (>1.8). Amino-allyl cDNAs were reverse transcribed from 10 μg of purified total RNA and then labeled with Cy3 Monoreactive dyes (Amersham, GE HealthCare, UK). Labeled cDNA samples were fragmented to 50-300 by range with DNaseI (Epicentre Biotechnologies, Madison, Wis., USA) and interrogated with high-density four-plex oligonucleotide tiling arrays consisting of 4×71548 probes of variable length spaced across the whole T. maritima genome were used (Roche-NimbleGen, Madison, Wis., USA). Hybridization, wash and scan were performed according to the manufacturer's instructions. Probe level data were normalized using Robust Multiarray Analysis without background correction as implemented in NimbleScan™ 2.4 software (Roche-NimbleGen). The mean value across all replicates was used in the comparison to model predicted expression levels.
Proteomic Analysis
Cell pellets were stored at −80° C. prior to proteomic sample preparation. Individual frozen pellets ˜0.75 g each from midlog phase cultures were thawed and resuspended in 2 mL of 100 mM NH4HCO9 (pH 8.0) and lysis was achieved by passing the samples through a pre-chilled French pressure cell press (SLM Aminco) at 8000 lb/in2 for four cycles. Lysed samples were centrifuged at 500×g (10 min, 4° C.) to remove cell debris, and the supernatants were divided into two aliquots per sample: one for global (whole cell lysate) sample preparation, and the other for soluble/insoluble fractionation. Ultracentrifugation (100,000 RPM, 10 min, 4° C.) was used to prepare insoluble protein/pellets and soluble protein/supernatant fractions. Cell pellets were washed once and the supernatants were combined with the soluble protein samples. Insoluble pellets were solubilized in 1% CHAPS in 50 mM NH4HCO3 (pH 7.8). Protein concentrations for global, soluble, and insoluble protein fractions were determined by the BCA protein assay (Sigma-Aldrich).
Following protein quantitation, lysate was denatured and reduced by incubation with 8 M urea and 0.1 M Bond Breaker TCEP (Pierce, Thermo Fisher Scientific) for 30 min at 6° C. Samples were diluted 10-fold with 50 mM ammonium bicarbonate (pH 7.8), and CaCl2 was added to achieve a 1 mM final concentration. Proteins were digested with trypsin (1:50, trypsin to protein wt/wt) (Sequencing grade modified trypsin, Promega, Madison, Wis., USA) for 4 h at 37° C. Digested peptide samples were cleaned-up with Discovery C18 SPE (global and soluble samples) or Discovery SCX (insoluble samples) columns (Supelco, St. Louis, Mo., USA) according to manufacturer recommendations and concentrated using a Speed-Vac (ThermoSavant, San Jose, Calif., USA).
Peptides (0.5 μg/μL) from the global, soluble, and insoluble preparations were separated by a custom-built automated reverse-phase capillary HPLC system. Briefly, peptides were separated on a slurry-packed Jupiter 3 μm C18 resin (Phenomenex, Torrance, Calif., USA) fused silica capillary column (60 cm length 175 μm ID) at constant 10K psi pressure, exponential gradient (100% A to 60% A over 100 min), flow rate 500 nL/min. Mobile phase consisted of A) 0.1% formic acid in water and B) 0.1% formic acid in acetonitrile. The eluate was directly analyzed by electrospray ionization using an LTQ Orbitrap Velos mass spectrometer (Thermo Fisher Scientific) operated in data-dependent mode with m/z range of 400-2000, collision energy of 35 eV, and the 10 most intense peaks were selected for fragmentation.
Data were processed by DeconMSn and the SEQUEST peptide identification software was used to match MS/MS fragmentation spectra with potential protein sequences derived from a six frame translation of the Thermotoga maritima genome (minimum length 30 amino acids). The parent mass tolerance used for matching was set to ±3 Da and fragment ion tolerance was set to ±1 Da. Peptides were searched with a dynamic oxidized methionine modification and no enzyme was specified. Peptide identifications were retained based upon the following criteria: 1) SEQUEST DelCn2 value ≧0.10 and 2) SEQUEST correlation score (Xcorr) ≧1.77 for charge state 1+ for fully tryptic peptides and Xcorr ≧3.04 for 1+ for partially tryptic peptides; Xcorr ≧1.98 for charge state 2+ and fully tryptic peptides and Xcorr ≧3.35 for charge state 2+ and partially tryptic peptides; Xcorr ≧2.84 for charge state 3+ and fully tryptic peptides and Xcorr ≧4.34 for charge state 3+ and partially tryptic peptides. Proteins used in the semi-quantitative analysis were required to have ≧2 unique peptides for identification or 1 peptide with a minimum of two observations. Redundant peptides (i.e., peptides mapping to multiple protein entries), comprising <0.30% of all peptide identifications, were excluded from the analysis to minimize potential ambiguity. Using the reverse database approach, the false discovery rate was calculated to be 0.08% at the spectrum level. Spectral counts were calculated as the sum of all peptide observations corresponding to a given protein. A normalized abundance score was calculated for each protein by dividing the total spectral count by the number of possible tryptic peptides (400-6000 m/z). For each protein, missing values were zero-filled and the mean of the normalized spectral count across all fractions was used for downstream analyses.
In Vitro Vs. In Silico Omics.
The predicted transcription level of a gene was determined by summing across the demand fluxes of the TUs containing that gene. Translation levels were reported as the sum across the relevant translation initiation fluxes as many TUs can contribute to the production of a given protein. These values were compared to the values reported experimentally.
The RNA-to-protein mass ratio (r) has been observed to increase as a function of specific growth rate (μ) (Schaechter et al., 1958, J Gen Microbiol, 19, 592-606; Scott et al., 2010, Science, 330, 1099-102) and decreases as a function of translation efficiency Scott et al., 2010, Science, 330, 1099-102). Schaechter et al. also observed an increase in the number of ribonucleoprotein particles with increasing μ, whereas the translation rate per ribonucleoprotein particle was relatively constant (Schaechter et al., 1958, J Gen Microbiol, 19, 592-606).
To ascertain whether the subject ME-Model recapitulated the observed increases in r, ribosomal RNA and proteins with increasing μ, a range of growth rates were simulated in a defined minimal medium (Rinker and Kelly, 1996, Appl Environ Microbiol, 62, 4478-85). To simulate the molecular physiology of T. maritima for a particular μ, FBA (Orth et al., 2010, Nat Biotechnol, 28, 245-8) was used subject to linear programming optimization (Applegate et al., 2007, Operations Research Letters, 35, 693-699) to identify the minimum ribosome production rate required to support a given μ (
a-b) show characteristics of M- and ME-Models objective functions and assumptions.
Consistent with experimental observations (Schaechter et al., 1958, J Gen Microbiol, 19, 592-606; Scott et al., 2010, Science, 330, 1099-102), the ME-Model simulated an increase in r with increasing μ and with decreasing translation efficiency (
With M-Models, the cellular macromolecular composition is constant, ergo they cannot reproduce the observed increases in r or ribosomes with increasing μ (
By explicitly accounting for enzyme expression and activity, ME-Model simulations should identify the set of proteins that will result in optimally efficient conversion of growth substrates into cells. To determine if the ME-Model was more economic in terms of enzyme usage than the M-Model, the ME-Model simulation was compared to a random sampling of the M-Model solution space (Reed and Palsson, 2004, Genome Res, 14, 1797-805). After normal distribution was fit to the sampled M-Model space it was found that there is a small (2.1×10−5) probability of finding an M-Model solution as efficient as the ME-Model solution (
To compare the range of permissible, i.e., computationally feasible, activity for each metabolic enzyme in the ME-Model versus the M-Model we performed flux variability analysis (FVA). FVA identifies the flux range that each reaction may carry given that the model must also simulate the specified objective value, such as μ, with a set tolerance. The permissible enzyme activities for simulating efficient growth with a 1% tolerance tended to have smaller ranges in the ME-Model compared to the M-Model (
In addition to simulating variable cellular composition and effectively eliminating the infinite catalysis problem, there are a number of metabolic activities that are required for optimally efficient growth with the ME-Model but not with the M-Model (
These differences highlight the interplay between macromolecular synthesis and degradation, metabolism and salvage, and optimal use of the proteome. The ME-models allow a fine resolution view of these processes and their simultaneous reconciliation.
To assess the subject ME-Model's ability to simulate systems-level molecular phenotypes, model were compared to predictions to substrate consumption, product secretion, AA composition, transcriptome, and proteome measurements. With the only external constraints for the ME-Model being the experimentally-determined μduring log-phase growth in maltose minimal medium at 80° C., the model accurately predicted maltose consumption and acetate and H2 secretion (
Interestingly, when we compared the simulated transcriptome and proteome fluxes to transcriptome and proteome measurements, respectively, there were statistically significant (p<2.2×10−16 t-test) positive correlations for both the transcriptome (0.54 PCC;
Within the transcriptome and proteome scatterplots (
Although there is a positive correlation (PCC of 0.54) between the simulated transcriptome fluxes and semiquantitative transcriptome data there was still a substantial amount of dispersion (
While it is a non-trivial endeavor to identify the source of all variation between the simulated and measured transcriptomes, it is possible to use the ME-Model for comparative transcriptomics approaches similar to two-channel DNA microarray studies. Despite the early technological limitations of DNA microarrays, biological discovery was enabled by performing comparative transcriptomics. Large-scale gene expression profiling has been used extensively to identify genes that are differentially regulated as a function of genetics and environment. Analysis of differentially expressed genes has contributed to the identification of gene product responsible for unannotated enzymatic activities. In combination with sequence analysis, differential gene expression data can be used to investigate transcriptional regulation.
A workflow was devised and implemented for in silico comparative transcriptomics which resulted in the discovery of new regulons and improved both genome and TU annotation (
A procedure was developed for cost estimate analysis for production of a value-added product in a genetically manipulated organism.
First all the necessary mutations were introduced (additions, subtractions, and/or modifications to the genome, transcriptome, proteome and/or reactome) in the computer representation of the target organism to provide it with a functioning pathway for converting feedstock into the desired valued added product.
The above described method was used to calculate the minimum ribosome production rate that is capable of supporting the maximum experimentally measured growth rate for the wild type organism in the defined growth medium (i.e., feedstocks). Term this ribosome production rate as the economically efficient ribosome production rate (R). In subsequent simulations, R is used as the upper bound constraint for ribosome production rate.
A growth rate was specified in the model and the above method was used to identify the maximum production rate for the value added product that can be supported while maintaining the specified growth rate. If data for substrate uptake as a function of growth rate are available then they can be used as additional constraints and the upper bound constraint for ribosome production can be relaxed.
For each simulation, information on sugar consumption, product formation, ribosome formation, and other parameters relevant to the growth medium and economic analysis was collected.
The collected consumption and production rates with current market estimates for feedstock and product prices was used to construct a profitability estimate graph and graphs for estimates of key bioprocessing parameters, such as feedstock consumption, reactor volume, and product information. These graphs will guide the selection of the most economically attractive operating conditions for a given bioprocessing plant design.
This method was applied to the production of spider silk protein by. T. maritima growing in maltose minimal medium (
Experimental Procedures
Network Knowledgebase
The two primary reaction networks used to create the ME-Model were the most recent metabolic knowledgebase (Orth et al., 2011), and a network detailing the reactions of gene expression and functional enzyme synthesis (Thiele et al., 2009). The gene expression knowledgebase is formalized as a set of ‘template reactions’ that can be applied to different components (e.g. gene, peptide, set of peptides) to generate balanced reactions. Merging the E. coli metabolic network knowledgebase with the gene expression knowledgebase required a conversion of the Boolean Gene-Protein-Reaction associations (GPRs) to protein complexes. EcoCyc's annotation was used to map gene sets to functional enzyme complexes. The network knowledgebase procedure is similar to that described in Example 1. Non-limiting modifications to the network knowledgebase procedure include mechanistic accounting for protein prosthetic group synthesis, integration with enzymes, and degradation, and implementation of variable coupling constraints based on empirical observations.
The scope and coverage of cellular processes in the integrated network is extensive. The integrated network mechanistically links the functions of 1541 unique protein-coding open reading frames (ORFs) and 109 RNA genes; it thus accounts for ˜35% (of the 4420) protein-coding ORFs, ˜65% of the functionally well-annotated ORFs (Riley et al., 2006), and 53.7% of the non-coding RNA genes identified in E. coli K-12 (Keseler et al., 2013). In total, 1295 unique functional protein complexes are produced. Taken together, these complexes account for 80-90% of E. coli's proteome by mass.
The integrated reaction network covers and accurately predicts a large proportion of essential cellular functions. It includes 223 of the 302 (73.8%) genes classified as essential for cell growth under any condition (Kato and Hashimoto, 2007), and 166 of the 206 functions (80.6%) estimated as essential for a minimal organism (Gil et al., 2004).
coli genomic DNA
Growth Demands and Constraints on Molecular Catalytic Rates
The reconstructed network can be converted into a genome-scale computational model to compute phenotypic states in a defined environment. Genome-scale models formally relate reaction network structure and governing constraints, which limit the range of functional states the network can achieve (Doyle and Csete, 2011; Milo and Last, 2012). Here, constraints on growth and gene expression were developed that allow for meaningful computation with the ME-Model.
To compute functional states of the integrated network, growth demands are first imposed. Growth requires the replication of the organism's genome and synthesis of a new cell wall to contain the replicated DNA. In the ME-Model, growth rate-dependent DNA and cell wall demand functions formalize these requirements (
RNA and protein are not included as demand functions as they are in M-Models (Feist and Palsson, 2010); instead, expression of specific RNA and protein molecules are free variables determined during ME-Model simulations. ‘Coupling constraints’ (Lerman et al., 2012; Thiele et al., 2010) relate the synthesis of RNA- and protein-based molecules to their catalytic functions in the cell (
A nutritional environment is then defined by setting constraints on the availability and uptake of nutrients. For a particular nutritional environment, there is a maximum growth rate at which the cell can no longer produce enough RNA and protein machinery to meet the demands of growth. The computed cellular state (biomass composition, substrate uptake and by-product secretion, metabolic flux, and gene expression) at this maximum growth rate is the predicted response of the cell to the specified nutritional environment.
A sigmoid function was then fit to the ‘% cell DNA’ column of Table 4 above. The values from this function represent the final growth rate-dependent DNA demand requirements. The constraint was imposed as in genome-scale models of metabolism (Orth et al., 2011).
Cell Wall
Biomass demand-like constraints were added to account for lipid/murein/LPS. These demands were formulated to be growth-rate-dependent, but the composition itself was assumed constant. The ‘base shell composition’ was constrained to be as shown in Table 5:
To arrive at growth-rate-dependent cell wall dilution constraints, the cell surface area (SA) is calculated assuming that the cell is a cylinder with hemispherical caps:
Volume of the cell as a function of μ in μm3,
An empirical relation for
Given these 2 functions for volume, and also an empirical function for cell length as a function of μ in μm,
through a least-squares optimization problem. A similar approach was taken in (Pramanik and Keasling, 1997), with the form of equations and numerical parameters taken from (Donachie and Robinson, 1987).
SA (in in μm2) can then be calculated as function of μ using the equation:
Next it was assumed as in (Pramanik and Keasling, 1997) that phosphatidylethanolamine makes up ˜77% of the lipids, phosphatidylglycerol 18%, and cardiolipin 5%. It was also assumed that an individual lipid has an area ˜0.5 nm2 and that 50% of the surface area is created by lipids (vs. proteins or other macromolecules). We also take into account that there are 4 individual lipid layers (2 lipid bilayers).
To calculate the grams of lipid per volume of cell as a function of growth rate, the following formula is used:
where wmwlipid=: is the weighted molecular weight (in g/mol) using the assumed composition and individual molecular weights of the lipids as follows: 734.03 g/mol for phosphatidylethanolamine, 827.11 g/mol for phosphatidylglycerol, and 1546 g/mol for cardiolipin. The 106 term is to correct the units, as SA(μ) is given in μm2 (1 μm2=106 nm2).
Next, we convert this to lipid grams per gDW using an assumed cell density of 1.105 g/mL cell and an assumption that the dry weight of the cell is roughly 30% of its total weight.
Finally, we scale the demand reactions from the ‘base shell composition’ by a scalar that causes the bottom components listed in the table above to match this calculated growth-dependent demand for lipids.
Glycogen
The glycogen content of the cell was assumed constant in all simulations (independent of growth rate) performed in this study. It was set to 0.023 grams Glycogen per gDW of biomass based on the biomass objective function in (Feist et al., 2007).
The molecular weight for glycogen was taken to be 162.141 mg mmol−1.
All of these nutrients have the potential to be limiting for growth. An upper bound of 1000 mmol gDW−1 h−1 is used to simulate growth in batch culture whereas lower values are used in nutrient-limited simulations. The upper bound for D-Glucose uptake is set to 1000 for all nutrient-limited simulations except when simulating D-Glucose limitation.
Coupling constraints may be represented with different mathematical formulae that are constructed from available data
Variables and Parameters Used in Derivations
To estimate the growth rate-dependent catalytic rates of enzymes we use the following variables and parameters.
P=total cellular protein mass (g gDW−1)
R=total cellular RNA mass (g gDW−1)
μ=specific growth rate (s−1)
frRNA=fraction of RNA that is rRNA
fmRNA=fraction of RNA that is mRNA
ftRNA=fraction of RNA that is tRNA
mαα=molecular weight of average amino acid (g mmol−1)
mnt=molecular weight of average mRNA nucleotide (g mmol−1)
mtRNA=molecular weight of average tRNA (g mmol−1)
mrr=mass of rRNA per ribosome (g)
kdegmRNA=first-order mRNA degradation constant (s−1)
Other than μ and P and R (which are functions of μ (equation 1)), the others parameters are constants in derivations and their numerical values are listed in Example 6. To derive the catalytic rates of molecular machines, we rely on average values (e.g. average molecular weight of mRNA, protein). However, when transforming these into coupling constraints in the ME-Model, actual molecular weights of specific molecular species are used. For computations, all coupling parameters are computed to 4 significant digits for numerical purposes. In derivations, seconds were used as the time unit, though these were converted into hours for ME-Model computations.
Empirical RNA-to-Protein ratio
In (Scott et al., 2010) the RNA-to-Protein ratio was shown to increase linearly with growth rate, regardless of the specific environmental condition:
For E. coli grown at 37° C., (Scott et al., 2010) empirically found ro=0.087 and κt=4.5 h−1. We use these values in our derivations throughout.
70S Ribosomes
Ribosomal Translation Rate and Dilution
Assume all rRNA is incorporated into ribosomes. Then:
Assume proteins are stable and not degraded. Then:
Hyperbolic Ribosomal Catalytic Rate
k′ribo=average translation rate of active ribosome (aa s−1)
fr=fraction of ribosomes that are active
kribo=effective ribosomal translation rate (aa s−1)
k
ribo
=k′
ribo
f
r
Thus, translation rate is hyperbolic with respect to growth rate:
Vmax=κτcribosome and Km=roκτ.
Using, parameters from Example 6, we get:
Vmax=22.7 aa ribosome−1 s−1
Km=0.391 h−1.
Ribosomal Coupling
An inequality constraint was derived setting a lower bound on ribosomal dilution (to daughter cells)
The inequality is imposed in a manner that takes into account the length of each particular peptide that needs to be translated. Said another way, ribosomal machinery demands depend on the precise number of amino acids incorporated for each peptide in the model.
VRibosome Dilution=dilution of ribosome (mmol ribosome gDW−1 s−1)
VTranslation of peptide
length(peptidei)=number of amino acids in peptidei
RNA Polymerase
krnap=RNAP transcription rate (nucleotide RNAP−1 s−1)
The transcription rate, krnap, is taken to be exactly 3 times the translation rate at all growth rates based on data from Table 1 from (Proshkin et al., 2010).
k
rnap=3kribo
Using equation, an inequality constraint was derived setting a lower bound on ribosomal dilution (to daughter cells)
The inequality was imposed in a manner that takes into account the length of each particular transcription unit (TU) that needs to be transcribed. Said another way, RNA polymerase machinery demands depend on the precise number of nucleotides transcribed for each RNA in the model.
VRNAP Dilution=dilution of RNAP (mmol RNAP gDW−1 s−1)
VTranscription of TU
length(TUi)=number of nucleotides in
mRNA Coupling
Dilution, Degradation, Translation Reaction Rates
For the derivation, assume that mass of mRNA transcribed, translated, degraded, and diluted is only in coding regions. In actuality, the molecular weight of mRNA will be higher due to untranslated regions, which is reflected in the values used in the ME-Model. Let:
dilmRNA=dilution of mRNA (mmol nucleotides gDW−1 s−1)
degmRNA=degradation of mRNA (mmol nucleotides gDW−1 s−1)
trslmRNA=translation of protein from mRNA (mmol amino acids gDW−1 s−1)
[mRNA]=mRNA concentration (mmol nucleotides gDW−1)
Coupling
The mRNA dilution, degradation, and translation reactions are coupled in the ME-Model with linear inequalities as followed:
dilmRNA≧α1degmRNA
degmRNA≧α2trslmRNA
The inequality formulation allows for some mRNA transcribed to not be translated, but it still must be diluted and degraded.
When the inequality constraints are operating at their bounds, α1 and alpha2 will then be:
Note: The factor of 3 above is to account for 3 nucleotides per amino acid.
Hyperbolic mRNA Catalytic Rate
The above formulation also results in a hyperbolic mRNA catalytic rate.
kmRNA=mRNA catalytic rate (mmol protein (mmol mRNA)−1 hr−1)
Using parameters in Example 6, we get:
Vmax=0.5 protein mRNA−1 s−1
Km=0.391 h−1.
Rates of Charging and Dilution of tRNA
chgmRNA=charging of tRNA (mmol tRNA gDW−1 s−1)
dilrRNA=dilution of tRNA (mmol tRNA gDW−1 s−1)
[tRNA]=tRNA concentration (mmol tRNA gDW−1)
Coupling
The tRNA dilution and charging reactions are coupled in the ME-Model with linear inequalities as followed:
diltRNA≧αchgtRNA
At the bound of equality,
Hyperbolic tRNA Efficiency
The above formulation also results in a hyperbolic tRNA catalytic rate.
ktRNA=tRNA catalytic rate (mmol protein (mmol tRNA)−1 hr−1)
Using parameters in Example 6, we get:
Vmax=CtRNAκτ=2.39 aa tRNA−1 s−1.
Km=0.391 h−1.
Remaining Macromolecular Synthesis Machinery
For the remaining macromolecular synthesis machinery, we set kcat=65 (s−1) across all growth rates:
Metabolic Enzymes
For metabolic enzymes, the catalytic rate is set to be proportional to the enzyme solvent accessible surface area (SASA).
based on the empirical fit from (Miller et al., 1987).
The specific enzyme efficiency value received for a given enzyme/complex was assumed to be linearly dependent on its SASA value. The mean of all the kinetic constants was centered at keff=65 (s−1). Let sasa denote a particular value after centering.
This coupling is a gross approximation for an enzyme's kinetic information. Its purpose is to reward expression of large complexes (such as pyruvate dehydrogenase which is composed of 12 AceE dimers, a 24-subunit AceF core, and 6 LpdA dimers), given these complexes have many more active sites (on average) than smaller enzymes. In the future, these values can be parameterized further using condition-specific multi-omics data.
By definition, the total biomass produced must be equal to the growth rate. In metabolic models, this constraint is imposed by the definition of the biomass objective function: the total mass in the biomass objective function sums to 1 g/gDW and the flux through the biomass reaction is equal to the growth rate (h−1). As biomass is now split up into many dilution reactions for individual peptides, RNAs, and enzymes (to allow for variable biomass composition through gene expression) in addition to the DNA, Cell Wall, and Glycogen demand functions, this constraint is no longer explicitly enforced. The difference between Strictly Nutrient-Limited and Janusian and Batch (
For simulations in the Batch and Janusian regions (when proteome limitation is active and enzymes are saturated), an additional ‘biomass capacity constraint’ is added. This additional row appended to the stoichiometric matrix enforces that the sum of the masses of all biomass production (component dilution plus demand function fluxes) equals the growth rate. With this additional constraint, a simple binary search for the maximum feasible growth rate determines the final solution where growth rate is maximized. While the objective of the overall optimization is growth rate, the production of a random peptide is chosen to be the objective of each LP problem in the process. As expression of this random peptide is unnecessary as far as the model is concerned, the production of this peptide is 0 at the maximum growth rate.
For simulations in the Strictly Nutrient-Limited (SNL) region, a simple ‘biomass capacity constraint’ is insufficient. This is because enzymes are not saturated in nutrient-limited conditions; however, these trends are not fully understood so cannot be modeled a priori. If a direct 1:1 relationship between activity and abundance is assumed for enzymes, at low growth rates the in silico cell will produce hardly any protein or RNA. On the other hand, if the biomass capacity constraint is imposed, unnecessary RNA is produced simply to satisfy the biomass capacity requirement (as it is cheaper metabolically than protein) and enzymes are fully saturated, which is not accurate. Thus, it was assumed that the cell makes as much protein as possible (as it is generally the functional machinery of a cell); then it was assumed that this protein is all metabolic protein and the proteins are not saturated (so do not operate at keat). This is accomplished through two binary search procedures. In the first, the production of a ‘dummy protein’ is maximized, and a growth rate, μ*, is searched for where growth rate is equal to biomass dilution. The solution after this initial binary search will generally have a non-zero dummy protein production. Then, the growth rate, μ*, is fixed and a binary search for the minimal fractional enzyme saturation (keff/kcat) is found. At minimal fractional enzyme saturation and μ*, the dummy protein production will be 0. The qualitative shape of keff/kcat vs. μ obtained matches empirical trends for individual enzymes and small-scale kinetic models (
For most simulations (unless all uptakes are unbounded), it is not known if the specific uptake bounds will result in a solution that lies in the Strictly Nutrient-Limited, Janusian, or Batch growth region. For these cases, they are first solved as SNL. If no feasible solution is found where growth rate is equal to biomass dilution, the biomass capacity constraint is added and the problem is solved using the proteome-limited procedure.
With ME-Models, linear optimization begins to encounter scaling and/or infeasibility issues. To mitigate this problem, we used the SoPlex LP solver (freely available at http://soplex.zib.de/ http://soplex.zibde) (Roland, 1996), which provides for solving the individual LPs using extended precision floating point numbers (80 bits) on x86 processors.
As an initial validation of the E. coli ME-Model, we compare the computationally predicted and experimentally measured total RNA and protein content of the cell. The ratio of RNA to protein biomass in E. coli (and other microbes) has been shown to follow consistent ‘growth laws’ in which the RNA-to-protein ratio increases linearly with growth rate, independent of the specific medium (Scott et al., 2010) (
Metabolic enzymes also display lower effective catalytic rates at lower growth rates. Experiments suggest that the effective catalytic rates of metabolic enzymes are specific to a given nutritional environment (Boer et al., 2010) (i.e., the identity of the limiting nutrient matters). This phenomenon is well-recognized for transporters under nutrient limitation—enzyme kinetics dictate that at a lower external nutrient concentration, transporters will have a lower effective catalytic rate (O'Brien et al., 1980) (
What is less well appreciated, though, is that many internal enzymes also display lower effective catalytic rates under nutrient limitation. Quantitative metabolomics data shows that internal enzymes become less saturated when external nutrients are limited. In nutrient-excess conditions (i.e., batch culture), [S]˜Km (Bennett et al., 2009); however, in nutrient-limited conditions (i.e., chemostat culture), internal metabolites ‘related’ to the limiting nutrient have a lower concentration ([S]<Km) (Boer et al., 2010). These trends also occur in a small-scale kinetic model (Molenaar et al., 2009).
These changes were accounted for in metabolic enzyme catalysis in the ME-Model with two minimal assumptions: (1) that when the cell is nutrient limited, protein content is maximized (at a given growth rate) and, (2) that this protein mass is metabolic enzymes not operating at their maximal catalytic rate (i.e., keff/kcat<1). This procedure results in a calculated nutrient limitation-dependent effective catalytic rate with the same qualitative shape as experimental data (
Prediction of Growth Rate, Nutrient Uptake, and Yield
Growth, nutrient uptake, and by-product secretion rates are some of the most informative and concise descriptions of the physiological state of a microbial cell (Monod, 1949; Neidhardt, 1999). However, the underlying determinants of growth, uptake, and secretion are not generally understood. The ME-Model was used to predict the relationship between growth rate, nutrient uptake, and secretion under varying external nutrient availability. Importantly, the interplay between external (nutrient) and internal (proteome) growth limitations can be simultaneously reconciled using the ME-Model.
In nutrient-excess conditions, growth in the ME-Model is limited by internal constraints on protein production and catalysis—the cell is ‘proteome-limited’—resulting in a corresponding maximal growth rate (
The ME-Model-predicted response to glucose limitation was detailed. When the uptake of glucose is restricted below the amount required for optimal growth in batch culture, the cell's growth is carbon-limited. Growth rate linearly increases with glucose uptake when glucose availability is low (
Simulating small molecular by-product yield (
As nutrient levels are varied, the balancing of proteomic resources to maximize growth results in intricate behavior and trade-offs. With integrated treatment of metabolism and protein synthesis, a ME-Model can compute this interplay and the optimal allocation of cellular processes.
At a more detailed level, the ME-Model predicts genome-scale changes in metabolic fluxes. Previous studies have evaluated the ability of M-Models (which do not include protein synthesis) together with assumed optimality principles to predict metabolic fluxes as inferred from 13C fluxomic datasets (Nanchen et al., 2006; Schuetz et al., 2007; Schuetz et al., 2012). These studies concluded that no single ‘objective function’ applied to M-Models can accurately represent fluxomic data from all environmental conditions studied (Schuetz et al., 2007). Instead, metabolic fluxes can be understood as being ‘Pareto optimal’: multiple objectives are simultaneously optimized and their relative importance varies depending on the environmental condition (Schuetz et al., 2012). The three objectives needed to explain most of the variation in the data from Schuetz et al. were: (1) maximum ATP yield, (2) maximum biomass yield, and (3) minimum sum of absolute fluxes (which is a proxy for minimum enzyme investment). These three objectives formed a Pareto optimal surface that was valuable for interpreting fluxomic data; however, the surface was large and it was not possible to predict the importance of each of the objectives a priori.
By explicitly accounting for variable growth demands, enzyme expression, and constraints on enzymatic activity, the ME-Model eliminates the need for multiple objectives. Using the E. coli ME-Model we show that growth rate optimization alone is sufficient to predict the fluxes through central carbon metabolism (
Gene expression changes were analyzed in the context of the ME-Model to provide a wider view of the molecular response to glucose limitation. The ME-Model was used to simulate the transcriptome and proteome as a function of growth rate and then examine the relative differences in transcriptome and proteome for different growth rates. We identify groups of proteins that change their expression concertedly from low to high growth rates under glucose limitation, and provide new insight into why certain proteins have characteristic profiles. We also identify how these concerted changes might be regulated.
In the Strictly Nutrient-Limited region, the expression of most proteins decreases as growth rate increases (
A smaller number of proteins show increases in their relative expression levels at higher growth rates (
The simulated expression profiles can be related to molecular mechanisms known to control growth rate-dependent gene expression in vivo. In addition to direct transcription factor (TF) interactions, in vivo gene expression levels are influenced by the physiological state of the cell (Berthoumieux et al., 2013). Growth rate-dependent regulation of translation machinery has been extensively characterized (Dennis et al., 2004; Condon et al., 1995); however, there have been few studies describing such control mechanisms for other genes. It was previously shown that the steady-state expression of a constitutively expressed gene decreases as growth rate increases (Klumpp et al., 2009) due to a decrease in the availability of free RNAP as cells grow faster (Klumpp and Hwa, 2008). We predict that most metabolic proteins decrease in expression at higher growth rates (
In the Janusian region of growth (
The gene modules that change during the transition to nutrient-excess growth can be related known transcriptional regulatory interactions. Several TFs regulate genes predicted to significantly change in the shift (
The ability of the ME-Model to compute high-resolution molecular phenotypes reveals network-wide patterns in gene expression under glucose limitation. Even though regulatory constraints and interactions are beyond the scope of the ME-Model, the patterns it predicts are highly consistent with our knowledge of broad-acting TFs.
Here we show that the ME-Model can be used to identify changes in biological parameters that occur during adaptive evolution. In the ME-Model, evolution to higher growth rates under nutrient-excess conditions can be simulated by relaxing at least one model constraint. Parameters related to various growth demands and the efficiency of the proteome were investigated. The ME-Model can simulate changes in gene expression (and other phenotypic properties) after the parameter change leading to a higher growth rate.
When E. coli is grown in glycerol in batch culture (Conrad et al., 2010), mutations in rpoC leading to gene expression changes consistently occur. In silico changes in substrate uptake rate, biomass yield, and expression of cellular subsystems to measurements from evolved strains were compared. It was found that increasing the effective catalytic rate of enzymes in the ME-Model results in phenotypic changes that are consistent with experiments (
a-b) show how perturbing ME-Model parameters can aid the development of hypotheses to explain discrepancies between the ME-Model and experimental data.
Identify the environmental and/or organismal constraints corresponding to two conditions of interest C1 and C2. The environmental constraints are defined by media composition and the organismal constraints are defined by the production/activity of specific model components (e.g. genes, reactions, metabolites).
Use our optimization method to find the maximum feasible value for the selected trait, T, (e.g growth rate) subject to the environmental and organismal constraints C1 and C2.
Determine the changes in the phenotype(s) of interest (e.g. gene expression levels) between the results of C1 and C2.
If desired, identify regulators that promote and/or interfere with the computed shift between C1 and C2 based on known or computationally predicted regulatory interactions.
As a demonstration, we use the above method both to both look at environmental perturbations (
The method can also be extended to include the parameter sensitivity analysis or the inclusion of a organismal state determined with omics data The method can also be extended to simulate the whole transition between C1 and C2 (instead of just the end points).
Procedure for using omics data to constrain the functional state of an organism.
Constrain the growth rate of the organism as predicted or measured experimentally.
Optionally, constrain substrate uptake, secretion, and/or metabolic fluxes as measured experimentally.
Determine a suitable set of kinetic parameters for enzymes, or sample a range of parameters to account for their uncertainty.
Subject to the imposed constraints in 1, 2, and 3, minimize the (relative) error between measured and modeled gene expression. For example, this can be achieved with the objective, minimize: |vmodel/vdata−1|.
As a demonstration, we have applied the procedure to determine the state of wild-type E. coli grown in glucose minimal medium in aerobic batch culture (
The method is not limited to the particular measure of gene expression and multiple measures (e.g. RNA abundance and protein abundance) of gene expression can be simultaneously accounted for.
Although the invention has been described with reference to the above example, it will be understood that modifications and variations are encompassed within the spirit and scope of the invention. Accordingly, the invention is limited only by the following claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US13/40351 | 5/9/2013 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61644924 | May 2012 | US |