The present invention relates to a method for simulating a production process of a substance, typically an amino acid or a nucleic acid, using cells of a microorganism or the like, and a program therefor.
The technique of constructing a mathematical model of the biochemical reactions catalyzed by intracellular enzymes in order to estimate the intracellular dynamic behavior of metabolites is called metabolic simulation. Many examples of it have been reported (Ishii, N. et al., J. Biotechnol., 113:281-294, 2004), and many methods for it have been proposed (U.S. Patent Application No. 2002/0022947, International Publication Nos. WO2004/081862, WO03/07217, WO02/55995, WO02/05205, Japanese Patent Application Laid-Open (KOKAI) No. 2003-180400). As an example of comparatively large scale metabolic simulation, Teusink et al. performed simulation of anaerobic ethanol fermentation of Saccharomyces cerevisiae considering metabolic routes branching from the glycolytic system for producing glycogen, trehalose, glycerol and succinic acid (Teusink, B. et al., Eur. J. Biochem., 267:5313-5329, 2000). For Escherichia coli, Chassagnole et al. constructed a central metabolic model under a condition of a constant growth rate to perform simulation (Chassagnole, C. et al., Biotechnol. Bioeng., 26:203-216, 2002). According to other reports, a part of the gene expression of E. coli metabolic enzymes was modeled and combined with an enzymatic reaction model to perform simulation (Wang, J. et al., J. Biotechnol., 92:133-158, 2001; Schmid J. W. et al., Metab. Eng., 6:364-377, 2004). Further, in the report of Varner, construction of a large scale model in which gene expression is incorporated into a kinetic model of enzymes is conceptually disclosed, and a specific growth rate is expressed by an equation using saturation coefficients of precursors for the maximum specific growth rate (Varner, J. D., Biotechnol. Bioeng., 69:664-678, 2000). For other organisms, further detailed models have been reported. Jeong et al. constructed a model of the sporulation process of Bacillus subtilis in batch culture using mathematical equations (Jeong et al., Biotechnol. Bioeng., 35:160-184, 1990).
Tomita et al. have also reported on the simulation of 127 genes involved in transcription and translation of the Mycoplasma genitalium genome, as well as energy production and phospholipid synthesis of the microorganism (Tomita, M. et al., Bioinformatics, 15:72-84, 1999).
In simulation of a production process of a substance, typically a nucleic acid or an amino acid, using cells of microorganisms or the like, enzymatic reactions from a substrate to an objective product are represented by mathematical equations using kinetic parameters in many cases. However, batch culture or semi-batch culture (fed batch culture) is often used for production of a useful substance, and therefore the growth rate of cells changes. In connection with it, various kinds of parameters in the cells also change. In addition, besides the production rates of a substance serving as a substrate, components present in the medium and an objective product, production rates of by-products such as amino acids, organic acids and carbon dioxide (CO2) also change during the process of substance production. Therefore, a technique for performing metabolic simulation with sufficient precision in such a manner that the simulation should well fit to experimental data of such growth rates or by-products is desired. If an accurate metabolic simulation reflecting experimental data is enabled, it becomes possible to conduct experiments of amplification or deletion of a gene by a computer in a short time (in silico experiments). It is expected that it should greatly shorten the development period for improving substance production ability of cells.
The inventors of the present invention conducted various experiments in view of the aforementioned problems. Consequently, they found that they could achieve more accurate metabolic simulation of the production of a substance, typically an amino acid or a nucleic acid, by cells of microorganisms or the like. The enhanced accuracy pertained when (A) a specific growth rate, serving as an index of cell growth, was represented by a mathematical equation that used a time function based on measured values, and (B) time functions or functions using the specific growth rate as a variable were employed with various parameters, including outflow rates of intracellular metabolites into cell components and, further, uptake rates of intracellular metabolites from the outside of the cells or excretion rates of the same to the outside of the cells.
The present invention was accomplished based on the aforementioned findings and provides the following.
[1] A method for effecting a simulation of a substance-production process that uses cells, wherein said simulation is based on a set of differential equations that represent intracellular metabolites and gene expression, said method comprising the steps of:
[2] The method according to [1], wherein the differential equations including the specific growth rate of the cells include the differential equations represented as the following equations (1) to (3):
d[Metabolite]/dt=Vinput−Voutput−μ[Metabolite] (Equation 1)
d[mRNA]/dt=ktranscription[P]−(kdRNA+μ)[mRNA] (Equation 2)
d[Protein]/dt=ktranslation[mRNA]−(kdProtein+μ)[Protein] (Equation 3)
wherein, in the equation 1, [Metabolite] represents an intracellular concentration of a metabolite, Vinput represents the sum of rates of reactions producing the metabolite, Voutput represents the sum of rates of reactions consuming the metabolite, and μ represents the specific growth rate;
in the equation 2, [mRNA] represents a concentration of mRNA, ktranscription represents a rate constant of transcription, [P] represents a promoter concentration, kdRNA represents a rate constant of decomposition of mRNA, and μ represents the specific growth rate, and
in the equation 3, [Protein] represents a concentration of a protein, ktranslation represents a rate constant of translation, kdProtein represents a rate constant of decomposition of the protein, and μ represents the specific growth rate.
[3] The method according to [1], wherein the growth rate factor is a function of the specific growth rate or a function of time.
[4] The method according to [1], wherein the specific growth rate is represented as a function of time, and the function is obtained by generating a mathematical equation from measurement data of the specific growth rate in the production process.
[5] The method according to [1], wherein the growth rate factor representing the formation rate is obtained by preparing a mathematical equation expressing measurement data of the formation rate in the production process.
[6] The method according to [1], wherein the growth rate factor representing the inflow rate and/or the outflow rate is obtained by preparing a mathematical equation expressing measurement data of the inflow rate and/or the outflow rate in the production process.
[7] The method according to [1], wherein the metabolite taken up into the cells is a substrate and/or an organic substance in a medium.
[8] The method according to [1], wherein the metabolite excreted out of the cells is an objective substance and/or a by-product.
[9] The method according to [8], wherein the metabolite excreted out of the cells is an amino acid, an organic acid and/or carbon dioxide.
[10] The method according to [9], wherein the metabolite excreted out of the cells is an amino acid or an organic acid.
[11] The method according to [1], wherein the parameters required for the simulation are a rate constant of transcription and/or a rate constant of translation.
[12] The method according to [1], wherein the cells are those of a microorganism having an amino acid producing ability and/or an organic acid producing ability.
[13] The method according to [12], wherein the microorganism is Escherichia coli.
[14] The method according to [1], wherein a composition of cell components itself is represented by a mathematical equation using the specific growth rate of the cells or the cells' equivalent index concerning the growth.
[15] A computer program product for effecting a simulation of a substance-production process that uses cells, wherein said simulation is based on a set of differential equations that represent intracellular metabolites and gene expression, comprising:
[16] A system for effecting a simulation of a substance-production process that uses cells, wherein said simulation is based on a set of differential equations that represent intracellular metabolites and gene expression, comprising:
a processor for processing information; and
a storing means, including:
According to the method of the present invention, it becomes possible to perform simulation of a production process of a substance, typically an amino acid or a nucleic acid, for a production process using cells of a microorganism or the like, in which a growth rate markedly changes.
Hereafter, the present invention will be explained in detail.
In this description, the phrase “substance production process using cells” and similar terminology refers to a process of biochemical reactions from a substrate to a product that is caused as sequential enzymatic reactions, using cells to produce an objective product.
The simulation method of the present invention is based on differential equations, which may be the same as a conventional simulation methodology except that specific conditions according to the present invention are employed. Conventional simulation methods comprise preparing differential equations for intracellular metabolites and gene expression, assigning values for parameters in the differential equations required for the simulation and solving the differential equations with the assigned values.
Differential equations can usually be prepared by incorporating mathematical equations for expression control into metabolic simulation.
Metabolic simulation is a technique for describing time-dependent dynamic changes by expressing intracellular biochemical reactions with mathematical equations, describing changes of substances caused by the reactions with differential equations, and solving them by numerical computation. The mathematical equations used for this purpose are often nonlinear ordinary differential equations (ODE), and this process of preparing mathematical equations is generally called “modeling.” Typically, many nonlinear differential equations are solved by using a computer.
In this regard, the phrase “biochemical reactions” refers to processes for converting an intracellular metabolite by an enzymatic reaction. Data on such reactions are stored in databases for many organisms. For example, KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.ad.jp/kegg/; Kanehisa, M. et al., Nucleic Acids Res., 32:277-280, 2004) can be referred to. As for E. coli, EcoCyc (Encyclopedia of Escherichia coli K12 Genes and Metabolism, http://ecocyc.org/; Keseler et al., Nucleic Acids Res., 33:D334-D337, 2005) is known. For describing these biochemical enzymatic reactions with mathematical equations, dynamic equations based on Michaelis-Menten type reaction formulas are often used (Segel I. H., Enzyme Kinetics: Behavior and Analysis of Rapid Equilibrium and Steady-State Enzyme Systems, John Wiley & Sons, 1975). Parameters of the respective enzymes can be collected from the literature. For example, for E. coli, Chassagnole et al. described enzymatic reactions of the glycolytic pathway and pentose phosphate pathway from glucose to acetyl-CoA with mathematical equations using kinetic parameters (Chassagnole, C. et al., Biotechnol. Bioeng., 26:203-216, 2002).
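As a non-limiting illustration of a Michaelis-Menten type rate expression, the following minimal Python sketch computes a reaction rate from a substrate concentration. The function name, argument names and numerical values are hypothetical and are not taken from any of the cited models.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Return v = Vmax * [S] / (Km + [S]) for a single-substrate reaction."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical values: when [S] equals Km the rate is half of Vmax.
rate_at_km = michaelis_menten_rate(substrate=0.5, vmax=2.0, km=0.5)
```

In a full model, one such expression per enzymatic reaction would contribute to the production and consumption rates of the metabolites concerned.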
It is gene expression that determines the amounts of intracellular enzymes, and this process is realized through the processes of transcription of a gene into mRNA and translation of mRNA into a protein. By incorporating this gene expression into metabolic simulation, it becomes possible to describe more detailed intracellular behaviors. Examples include description of expression control with mathematical equations and incorporation of them into metabolic simulation. For example, Wang et al. described expression control of the sucrose and glycerol uptake system of E. coli with mathematical equations and combined them with an enzymatic reaction model of the glycolytic pathway to perform simulation (Wang, J. et al., J. Biotechnol., 92:133-158, 2001), and Schmid et al. modeled expression control of a tryptophan biosynthesis pathway gene (trp operon) and combined it with a central metabolic model of E. coli to perform analysis (Schmid J. W. et al., Metab. Eng., 6:364-377, 2004). Differential equations concerning intracellular metabolites and gene expression are prepared as described above.
Concentration changes of metabolites and gene expression can generally be represented by one or more differential equations. Thus, each intracellular metabolite is commonly represented by equation 1, which considers the enzymatic reaction rates for the metabolite and the dilution effect due to growth:
d[Metabolite]/dt=Vinput−Voutput−μ[Metabolite] (equation 1)
In the equation 1, [Metabolite] represents the concentration of an intracellular metabolite, Vinput represents the sum of rates of enzymatic reactions for producing the metabolite, Voutput represents the sum of rates of enzymatic reactions consuming the metabolite, and μ represents the specific growth rate.
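As a non-limiting illustration of equation 1, the following minimal Python sketch integrates the metabolite balance with a forward Euler scheme. It assumes, purely for simplicity, that Vinput, Voutput and μ are constants; the function name, step size and numerical values are hypothetical.

```python
def simulate_metabolite(m0, v_input, v_output, mu, t_end, dt=0.001):
    """Integrate d[M]/dt = Vinput - Voutput - mu*[M] by forward Euler.

    Vinput, Voutput and mu are held constant here (an assumption made
    only for this sketch).
    """
    m = m0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dm_dt = v_input - v_output - mu * m  # right-hand side of equation 1
        m += dt * dm_dt
    return m


# With constant rates the concentration approaches (Vinput - Voutput) / mu.
final_conc = simulate_metabolite(m0=0.0, v_input=1.0, v_output=0.6,
                                 mu=0.5, t_end=40.0)
```

Under these assumptions the dilution term alone fixes the steady-state level, which shows why a changing specific growth rate shifts metabolite concentrations even when the enzymatic rates stay the same.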
Gene expression is generally described in terms of the two stages of transcription and translation, i.e., as changes in concentrations of mRNA produced by transcription of a gene and protein produced by translation of mRNA.
For mRNA produced by transcription of various genes, one may describe the concentration by the following equation, considering the transcription rate with which mRNA is synthesized from a gene and the decomposition rate of mRNA as well as the dilution effect:
d[mRNA]/dt=ktranscription[RNAP][Promoter]−(kdRNA+μ)[mRNA] (equation 2)
In equation 2, [mRNA] represents the concentration of mRNA, ktranscription represents a rate constant of transcription, [RNAP] represents the concentration of RNA polymerase which performs the transcription, [Promoter] represents the concentration of a promoter of the corresponding gene, kdRNA represents a rate constant of decomposition of mRNA, and μ represents the specific growth rate.
For a protein produced by translation of mRNA, the concentration can be described by the following equation, which takes into account the translation rate with which mRNA is translated into a protein and decomposition rate of the protein as well as the dilution effect:
d[Protein]/dt=ktranslation[Ribosome][mRNA]−(kdProtein+μ)[Protein] (equation 3)
In equation 3, [Protein] represents the concentration of a protein, ktranslation represents a rate constant of translation, [Ribosome] represents the concentration of ribosome which performs translation, kdProtein represents a rate constant of protein decomposition, and μ represents the specific growth rate.
Equations 2 and 3 can also be described in various other ways.
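As a non-limiting illustration of equations 2 and 3, the following minimal Python sketch evaluates their right-hand sides, computes the corresponding steady state in closed form, and integrates them with a forward Euler scheme. All parameters are assumed constant, and the dictionary keys and numerical values are hypothetical.

```python
def gene_expression_rhs(mrna, protein, p):
    """Right-hand sides of equation 2 (mRNA) and equation 3 (protein)."""
    d_mrna = (p["k_transcription"] * p["rnap"] * p["promoter"]
              - (p["kd_rna"] + p["mu"]) * mrna)
    d_protein = (p["k_translation"] * p["ribosome"] * mrna
                 - (p["kd_protein"] + p["mu"]) * protein)
    return d_mrna, d_protein


def steady_state(p):
    """Concentrations at which both right-hand sides are zero."""
    mrna = p["k_transcription"] * p["rnap"] * p["promoter"] / (p["kd_rna"] + p["mu"])
    protein = p["k_translation"] * p["ribosome"] * mrna / (p["kd_protein"] + p["mu"])
    return mrna, protein


def simulate_expression(p, t_end, dt=0.001, mrna0=0.0, protein0=0.0):
    """Forward Euler integration with constant parameters (sketch only)."""
    mrna, protein = mrna0, protein0
    for _ in range(int(round(t_end / dt))):
        d_mrna, d_protein = gene_expression_rhs(mrna, protein, p)
        mrna += dt * d_mrna
        protein += dt * d_protein
    return mrna, protein


# Hypothetical parameter values in arbitrary units.
PARAMS = {"k_transcription": 2.0, "rnap": 1.0, "promoter": 0.5, "kd_rna": 1.5,
          "k_translation": 4.0, "ribosome": 1.0, "kd_protein": 0.5, "mu": 0.5}
```

With these values, steady_state(PARAMS) gives an mRNA level of 0.5 and a protein level of 2.0, and the integrated trajectory converges to the same values.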
When differential equations are prepared, values are assigned for the parameters required for simulation in the prepared differential equations. The parameters required for the simulation include rate constants, initial concentrations, and so forth. The parameters are preferably a rate constant of transcription and/or a rate constant of translation.
Parameters for the respective enzymatic reactions and gene expression can be collected from the literature. However, modeling of enzymatic reactions and gene expression requires many parameters, and therefore it is often impossible to collect all of them from the literature. In such a case, it is possible to estimate appropriate values from various information in the literature, or to estimate them by optimization against measurement results obtained in experiments.
In order to solve such nonlinear ordinary differential equations obtained as described above, it is possible to use a mathematical calculation program such as MATLAB® (MathWorks) and MATHEMATICA® (Wolfram Research). To execute metabolic simulation, for example, ODE solvers of MATLAB® (MathWorks) can be used. As the ODE solver for solving a metabolic reaction equation or gene expression equation, ode45, ode21s and so forth are preferably used. Moreover, many kinds of metabolic simulation software have been developed as software for performing metabolic simulation, and they can be utilized (Ishii, N. et al., J. Biotechnol., 113:281-294, 2004). Examples include GEPASI (Mendes, P. Comput. Applic. Biosci., 9:563-571, 1993), SCAMP (Sauro, H. M., Comput. Appl. Biosci., 9:441-450, 1993), E-CELL (Tomita, M. et al., Bioinformatics, 15:72-84, 1999), and so forth.
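As a non-limiting illustration of the numerical solution step, the following minimal Python sketch implements a classical fixed-step fourth-order Runge-Kutta integrator. This is a swapped-in, simplified technique and is not the algorithm of any of the solvers or software packages named above, which typically use adaptive step sizes. The function and variable names are hypothetical.

```python
import math


def rk4_solve(rhs, y0, t0, t_end, n_steps):
    """Integrate dy/dt = rhs(t, y) from t0 to t_end with fixed-step RK4.

    rhs takes a time and a list of state values and returns a list of
    derivatives of the same length.
    """
    h = (t_end - t0) / n_steps
    t = t0
    y = list(y0)
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * k for yi, k in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


# Check on first-order decay dy/dt = -y, whose exact solution is exp(-t).
decay_result = rk4_solve(lambda t, y: [-y[0]], [1.0], 0.0, 1.0, 100)
decay_error = abs(decay_result[0] - math.exp(-1.0))
```

Metabolic models are often stiff, so a fixed-step explicit scheme such as this one may need very small steps; it is shown only to make the structure of the computation concrete.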
A simulation of the present invention is characterized by additionally using certain conditions, as described in greater detail below. This approach enables more accurate simulation of a substance production process accompanied by growth of cells.
(a) Description of Specific Growth Rate with Time Function
A differential equation that incorporates a specific growth rate of cells is included in the differential equations, and the specific growth rate is represented with a function of time.
The phrase “growth of cells” refers to a phenomenon whereby the number of cells increases in a substance production process. The number of cells usually increases with conversion of a substrate added for the substance production into cell components. When growth is represented by the number of cells, the growth rate is the rate at which the number of cells increases, and a specific growth rate is obtained by dividing the rate of increase of the number of cells by the number of cells. The number of cells used in this context is one of the values serving as indexes of growth of cells, and any value having an equivalent function (for example, turbidity of culture broth) may be used instead.
The function of time of the specific growth rate is preferably obtained by preparing mathematical equations that express measurement data of the specific growth rate in a production process. Growth of cells can be measured by measuring turbidity of culture broth or counting the number of cells in a diluted culture broth. An experimentally measured curve of cell growth can be approximated as a function of time. A large number of products are marketed as software for obtaining such an approximate mathematical equation, and preferred examples include TableCurve® 2D (Systat Software). The obtained time function of the growth curve can be differentiated and divided by the equation of the growth curve to obtain a time equation of the specific growth rate. Examples of the growth curve obtained by measurement of turbidity (OD) and the time equation of the specific growth rate μ obtained therefrom are shown in
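As a non-limiting illustration of obtaining a time equation of the specific growth rate from a growth curve, the following minimal Python sketch assumes that the measured turbidity has already been approximated by a logistic curve. The choice of a logistic form, the function names and the parameter values are hypothetical; any fitted time function could be treated in the same way.

```python
import math


def logistic_od(t, od_max, a, r):
    """Assumed fitted growth curve: OD(t) = od_max / (1 + a*exp(-r*t))."""
    return od_max / (1.0 + a * math.exp(-r * t))


def specific_growth_rate(t, a, r):
    """mu(t) = (dOD/dt) / OD(t) for the logistic curve above.

    Differentiating OD(t) and dividing by OD(t) gives
    mu(t) = r*a*exp(-r*t) / (1 + a*exp(-r*t)); od_max cancels out.
    """
    e = a * math.exp(-r * t)
    return r * e / (1.0 + e)


# Hypothetical fitted parameters.
OD_MAX, A, R = 10.0, 9.0, 0.8
mu_at_start = specific_growth_rate(0.0, A, R)
```

With these hypothetical parameters the specific growth rate starts at 0.72 and declines toward zero as the culture approaches its maximum turbidity.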
(b) Representation of Parameters with a Growth Rate Factor
One or more of the parameters of the differential equations are represented as a growth rate factor.
The parameters change with the growth of cells. Thus, it is preferable to represent as many parameters as possible with a growth rate factor. The growth rate factor can be expressed as a function of time. A function of time of the specific growth rate is generated with measurement data relating to the specific growth rate obtained in a production process. In the alternative, the growth rate factor can be expressed as a function of the specific growth rate μ. The specific growth rate μ may be obtained by dividing the rate of increase of the number of cells by the number of cells.
For example, it is known that rate constants of transcription and translation, which are parameters of gene expression, markedly change with the growth rate of cells, and that the molecular numbers of RNA polymerase and ribosome, which catalyze the respective processes, also markedly change with the growth rate (Bremer and Dennis, In Escherichia coli and Salmonella: Cellular and Molecular Biology/Second Edition (Neidhardt, F. C. Ed.), pp. 1553-1569, American Society for Microbiology Press, Washington, D.C., 1996). If the rate constants of transcription and translation are expressed as specific growth rate-dependent mathematical equations and incorporated into mathematical equations for gene expression, it becomes possible to prepare mathematical equations that accurately describe the expression. More specifically, in the aforementioned differential equation of mRNA (equation 2), [RNAP] can be represented with an equation of the specific growth rate μ. Similarly, in the aforementioned differential equation of protein (equation 3), [Ribosome] can be represented with an equation of the specific growth rate μ. The specific growth rate μ used herein may be the same as that used in (a) mentioned above, or a specific growth rate based on another index relating to growth of cells.
As mentioned above, it is also possible to directly express values measured in a cell growth process as a function of time. For example, if a function representing a parameter is represented with a specific growth rate-dependent equation, the parameter can be converted into a time function by substituting a time function of the specific growth rate into the specific growth rate-dependent equation.
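As a non-limiting illustration of converting a specific growth rate-dependent parameter into a time function by substitution, the following minimal Python sketch composes two functions. The exponential form of μ(t), the linear dependence of [RNAP] on μ, and all coefficients are hypothetical and are not taken from measured data.

```python
import math


def mu_of_t(t, mu0=0.8, decay=0.2):
    """Hypothetical time function of the specific growth rate."""
    return mu0 * math.exp(-decay * t)


def rnap_of_mu(mu, c0=0.1, c1=2.0):
    """Hypothetical growth rate factor: [RNAP] as a linear function of mu."""
    return c0 + c1 * mu


def rnap_of_t(t):
    """[RNAP] as a time function, obtained by substituting mu(t) into rnap_of_mu."""
    return rnap_of_mu(mu_of_t(t))


rnap_initial = rnap_of_t(0.0)
```

The same composition can be applied to [Ribosome] in equation 3 or to any other parameter for which a growth rate-dependent equation is available.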
(c) Description of Cell Component Formation Rate with a Growth Rate Factor
A formation rate with which a cell component is formed from an intracellular metabolite is incorporated into the differential equations, and the formation rate is represented as a function of a growth rate factor. As set forth above, the growth rate factor can be expressed as a function of time or as a function of the specific growth rate μ.
The phrase “cell component” refers to a major polymer compound constituting cells such as protein, RNA, DNA, lipid and lipopolysaccharide. By enzymatic conversion of a substrate into a cell component such as protein, nucleic acid and lipid, cells can obtain a required component. By enumerating biochemical reactions resulting in such a cell component, a stoichiometric matrix can be created (Savinell J. M., Palsson B. O., J. Theor. Biol., 154:421-454, 1992; Vallino, J. J. and Stephanopoulos, G., Biotechnol. Bioeng., 41:633-646, 1993). Creation of a stoichiometric matrix of metabolic reactions from glucose to all the amino acids in E. coli is described in detail in WO2005/001736. If a composition of the cell components is given, the formation rates of the cell components from intracellular metabolites, Vbiomass, can be calculated by using the stoichiometric matrix (Pramanik, J. and Keasling J. D., Biotechnol. Bioeng., 56:398-421, 1997).
It is also known that the cell component formation rate is growth rate-dependent (Pramanik, J. and Keasling J. D., Biotechnol. Bioeng., 60:230-238, 1998). Further, the composition of cell components also depends on the growth rate, and by measuring it, it can be described with mathematical equations by using the specific growth rate (Bremer and Dennis, In Escherichia coli and Salmonella: Cellular and Molecular Biology/Second Edition (Neidhardt, F. C. Ed.), pp. 1553-1569, American Society for Microbiology Press, Washington, D.C., 1996). With the cell component formation rate Vbiomass obtained as described above, the influence on intracellular metabolites can be taken into consideration for metabolites as precursors of the cell components. Specifically, by incorporating Vbiomass into Voutput in the differential equation of a metabolite as a precursor of the cell component (equation 1) to perform calculation, it becomes possible to describe the material balance of the metabolite as a precursor of the cell component with a growth rate factor equation.
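As a non-limiting illustration of incorporating Vbiomass into Voutput of equation 1, the following minimal Python sketch computes a drain rate for each precursor metabolite. It assumes, for the purpose of this sketch only, that the drain on a precursor equals the specific growth rate multiplied by the amount of that precursor required per unit cell mass; the requirement matrix, composition vector and all numerical values are hypothetical.

```python
def biomass_drain_rates(mu, requirement, composition):
    """Return one Vbiomass value per precursor metabolite.

    requirement[i][j]: amount of precursor i needed per unit of cell
                       component j (hypothetical stoichiometric data).
    composition[j]:    amount of component j per unit cell mass.
    Assumption for this sketch: Vbiomass_i = mu * sum_j requirement[i][j] * composition[j].
    """
    return [mu * sum(r_ij * c_j for r_ij, c_j in zip(row, composition))
            for row in requirement]


def precursor_rhs(conc, v_input, v_enzymatic_output, v_biomass, mu):
    """Equation 1 with Vbiomass added to the enzymatic part of Voutput."""
    v_output = v_enzymatic_output + v_biomass
    return v_input - v_output - mu * conc


# Two precursors and two cell components, in arbitrary units.
REQUIREMENT = [[1.0, 2.0],
               [0.5, 0.0]]
COMPOSITION = [0.5, 0.25]
drains = biomass_drain_rates(0.4, REQUIREMENT, COMPOSITION)
```

Because both μ and the composition may themselves be growth rate-dependent, the drain terms change over the course of a batch or fed-batch culture even when the requirement matrix is fixed.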
If the formation rate of the cell component can be measured, it is also possible to represent values measured in a production process as a time function and use it directly. Moreover, if the function representing the formation rate is represented with a specific growth rate-dependent equation, the formation rate can be converted into a time function by substituting a time function of the specific growth rate into the specific growth rate-dependent equation. The specific growth rate may be the same as that used in (a) mentioned above, or a specific growth rate based on another index relating to growth of cells.
Moreover, it is preferable to represent the composition of cell components itself with a mathematical equation using the specific growth rate of cells or its equivalent index concerning the growth.
(d) Preparation of Mathematical Equations Representing Inflow Rate of Metabolites From Outside of Cells and Outflow Rate of Metabolites Out of Cells
An inflow rate of a metabolite taken up from the outside of cells and/or an outflow rate of a metabolite excreted out of cells are incorporated into the differential equations, and the inflow rate and/or the outflow rate is represented as a growth rate factor. As set forth above, the growth rate factor can be expressed as a function of time or as a function of the specific growth rate μ.
In a production process of a substance, it is common to add, as a medium component, an organic substance other than the substrate such as glucose. Examples include tryptone, soybean hydrolysate, yeast extract and so forth. Substances such as amino acids derived from such medium components are also taken up into cells and affect the metabolic simulation. It is possible to describe an uptake rate Vuptake of a metabolite taken up from the outside of cells with a function of the specific growth rate or a time function based on measured values of concentrations of medium components remaining in the medium. As for the metabolites excreted out of the cells during a production process, substances called by-products may also be excreted in addition to the objective product. By also incorporating these substances into the metabolic simulation, a more accurate material balance can be described. The outflow rate to the outside of the cells Vexcretion can be represented with a mathematical equation using a growth rate factor. The growth rate factor can be a function of the specific growth rate or a time function obtained from measured values of the metabolite concentration detected in the medium. By performing the calculation with Vuptake incorporated into Vinput and Vexcretion incorporated into Voutput in the differential equation of a metabolite (equation 1), the material balance depending on the growth rate of the cells or on time can be described.
If the inflow rate and/or the outflow rate can be measured, it is also possible to represent values measured in a production process as a time function and use it directly. For instance,
The metabolites taken up into the cells preferably are a substrate and/or an organic substance in the medium.
The substances excreted out of the cells preferably are an objective product and/or a by-product. The substances excreted out of the cells more preferably are amino acids, organic acids and/or carbon dioxide, further preferably amino acids or organic acids.
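As a non-limiting illustration of using measured inflow and outflow rates directly as time functions, the following minimal Python sketch interpolates linearly between sampling times and adds the results to Vinput and Voutput of equation 1. The sampling times, the rate values and the function names are hypothetical, and a single metabolite carries both an uptake and an excretion term purely for brevity.

```python
import bisect


def interpolate_rate(times, rates, t):
    """Piecewise-linear time function through measured rates, clamped at both ends."""
    if t <= times[0]:
        return rates[0]
    if t >= times[-1]:
        return rates[-1]
    i = bisect.bisect_right(times, t) - 1
    frac = (t - times[i]) / (times[i + 1] - times[i])
    return rates[i] + frac * (rates[i + 1] - rates[i])


# Hypothetical measurements at three sampling times.
SAMPLE_TIMES = [0.0, 2.0, 4.0]
UPTAKE_RATES = [0.4, 0.2, 0.0]      # Vuptake at each sampling time
EXCRETION_RATES = [0.0, 0.1, 0.3]   # Vexcretion at each sampling time


def metabolite_rhs(t, conc, v_enzymatic_input, v_enzymatic_output, mu):
    """Equation 1 with Vuptake added to Vinput and Vexcretion added to Voutput."""
    v_input = v_enzymatic_input + interpolate_rate(SAMPLE_TIMES, UPTAKE_RATES, t)
    v_output = v_enzymatic_output + interpolate_rate(SAMPLE_TIMES, EXCRETION_RATES, t)
    return v_input - v_output - mu * conc
```

A smooth fitted equation could be substituted for the piecewise-linear interpolation without changing how the terms enter equation 1.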
The cells used for the production process may be of any type, so long as they can be used for substance production. Examples include various cultured cells and cells of mold, yeast, various bacteria, and so forth. Preferred are those of a microorganism having an ability to produce a useful compound, for example, an amino acid, nucleic acid or organic acid. As a microorganism having an ability to produce an amino acid, nucleic acid or organic acid, E. coli, Bacillus bacteria, coryneform bacteria and so forth are preferably used. More preferred are those of a microorganism having an ability to produce an amino acid and/or an ability to produce an organic acid. The microorganism is preferably Escherichia coli.
By the simulation according to the method of the present invention, it becomes possible to predict dynamic behaviors of mRNA or protein concentrations for various enzymes in addition to intracellular metabolites. Therefore, the method of the present invention can serve as a useful tool in improvement of a production process of a useful substance, typically an amino acid or nucleic acid. For example, it becomes possible to verify effect of amplification or deletion of various enzymes in a computer (in silico experiments). Moreover, easy estimation of influence of change of parameters of various enzymes such as affinity to a substrate and affinity to an inhibitor on the whole metabolism and effect of amplification or deletion of a factor controlling expression of various enzyme genes also becomes possible. These results provide an important direction for improvement of a production process, and thus also have superior industrial usefulness.
The present invention further provides a program for executing the simulation method of the present invention and a storage means storing the program.
The program of the present invention is a program for making a computer execute the simulation method of the present invention and causes a computer to function as the following means (1) to (3):
(1) a means for storing a set of differential equations concerning intracellular metabolites and gene expression and satisfying the following (a) to (d);
(a) the differential equations include a specific growth rate of the cells, expressed as a differential equation wherein the specific growth rate is represented as a function of time;
(b) all or a part of the parameters of the differential equations are represented as a function of a growth rate factor;
(c) the differential equations include a formation rate for formation of a cell component from an intracellular metabolite, and the formation rate is represented as a function of a growth rate factor;
(d) the differential equations include an inflow rate of a metabolite taken up from the outside of the cells and/or an outflow rate of a metabolite excreted out of the cells from the inside of the cells, and the inflow rate and/or the outflow rate is represented as a growth rate factor;
(2) a means for storing values of parameters in the set of differential equations required for the simulation, and
(3) a means for calculating solutions of the set of differential equations based on the stored differential equations and values of the parameters.
A flowchart of a process executed by the program of the present invention is shown in
The means for storing a set of differential equations is constituted by a central processing portion 1, a storing portion 2 and an input portion 3. In a routine (S1) of storing a set of differential equations, the central processing portion 1 stores data of the set of differential equations inputted from the input portion 3 in the storing portion 2. The format of the data of the set of differential equations is not particularly limited, and it may be a usual format.
A means for storing values of parameters is constituted by the central processing portion 1, the storing portion 2 and the input portion 3. In a routine (S2) of storing values of parameters, the central processing portion 1 stores values of parameters inputted from the input portion 3 in the storing portion 2.
A means for computing solutions of the set of differential equations is constituted by the central processing portion 1, the storing portion 2 and an output portion 4. In a routine (S3) of computing the solutions of the set of differential equations, the central processing portion 1 reads out the data of the differential equations and the values of parameters from the storing portion 2, computes the solutions from them and outputs the solutions to the output portion 4.
The central processing portion 1 is, for example, a processor. The storing portion 2 is, for example, a storage device using a recording medium. The input portion 3 is, for example, an input device such as keyboard and other devices or a data receiver for data from another device. The output portion 4 is, for example, an output device such as display, or a data transmission device for transmission to other devices.
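As a non-limiting illustration of the routines S1 to S3, the following minimal Python sketch stores a set of differential equations, stores parameter values, and then computes solutions from the stored data. The class name, method names and the forward Euler scheme are hypothetical choices made for this sketch and do not limit how the program may be implemented.

```python
class SimulationProgram:
    """Sketch of the three means: store equations, store parameters, solve."""

    def __init__(self):
        self._rhs = None      # stored set of differential equations
        self._params = None   # stored parameter values

    def store_equations(self, rhs):
        """Routine S1: rhs(t, y, params) returns the list of derivatives."""
        self._rhs = rhs

    def store_parameters(self, params):
        """Routine S2: keep a copy of the parameter values."""
        self._params = dict(params)

    def compute_solutions(self, y0, t_end, dt=0.001):
        """Routine S3: read stored data and integrate by forward Euler."""
        if self._rhs is None or self._params is None:
            raise RuntimeError("equations and parameters must be stored first")
        t, y = 0.0, list(y0)
        for _ in range(int(round(t_end / dt))):
            dy = self._rhs(t, y, self._params)
            y = [yi + dt * di for yi, di in zip(y, dy)]
            t += dt
        return y


def metabolite_equation(t, y, p):
    """Equation 1 with constant rates (hypothetical example input)."""
    return [p["v_input"] - p["v_output"] - p["mu"] * y[0]]


program = SimulationProgram()
program.store_equations(metabolite_equation)
program.store_parameters({"v_input": 1.0, "v_output": 0.6, "mu": 0.5})
solution = program.compute_solutions([0.0], t_end=40.0)
```

Keeping the equations and the parameter values in separate stores mirrors the separation between routines S1 and S2, so that the same equations can be solved again with different parameter values.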
The program for causing a computer to function as the aforementioned means can be created according to a usual programming method.
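As a non-limiting illustration, the three routines (S1 to S3) might be sketched in Python as follows. The class name `Simulator`, its method names, the `rhs(t, y, params)` calling convention and the fixed-step fourth-order Runge-Kutta integrator are all assumptions made for this sketch and are not taken from the specification; the only equation used is the cell-volume growth equation d[Cellvoltot]/dt=μ*[Cellvoltot] with a hypothetical constant μ.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Calling convention assumed for illustration: rhs(t, y, params) -> dy/dt values.
Rhs = Callable[[float, Sequence[float], Dict[str, float]], List[float]]


class Simulator:
    """Hypothetical container mirroring routines S1 to S3."""

    def __init__(self) -> None:
        self.rhs: Optional[Rhs] = None          # stored set of differential equations
        self.params: Dict[str, float] = {}      # stored parameter values

    def store_equations(self, rhs: Rhs) -> None:
        """Routine S1: store the set of differential equations."""
        self.rhs = rhs

    def store_parameters(self, params: Dict[str, float]) -> None:
        """Routine S2: store the parameter values."""
        self.params = dict(params)

    def solve(self, y0: Sequence[float], t_end: float,
              steps: int) -> List[Tuple[float, List[float]]]:
        """Routine S3: integrate from t=0 to t_end with fixed-step RK4."""
        if self.rhs is None:
            raise RuntimeError("no differential equations stored")
        h = t_end / steps
        t, y = 0.0, list(y0)
        out = [(t, list(y))]

        def shifted(base: Sequence[float], k: Sequence[float], f: float) -> List[float]:
            return [b + f * ki for b, ki in zip(base, k)]

        for _ in range(steps):
            k1 = self.rhs(t, y, self.params)
            k2 = self.rhs(t + h / 2, shifted(y, k1, h / 2), self.params)
            k3 = self.rhs(t + h / 2, shifted(y, k2, h / 2), self.params)
            k4 = self.rhs(t + h, shifted(y, k3, h), self.params)
            y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
                 for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
            t += h
            out.append((t, list(y)))
        return out


def growth_rhs(t: float, y: Sequence[float], p: Dict[str, float]) -> List[float]:
    # d[Cellvoltot]/dt = mu * [Cellvoltot], with mu held constant (assumption).
    return [p["mu"] * y[0]]


sim = Simulator()
sim.store_equations(growth_rhs)            # S1
sim.store_parameters({"mu": 0.01})         # S2, hypothetical value in 1/min
trajectory = sim.solve([1.0], t_end=100.0, steps=100)  # S3
final_volume = trajectory[-1][1][0]
```

Keeping the equations and the parameter values as separately stored items, as the means (1) to (3) do, allows a parameter set to be exchanged without touching the equations.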
Further, the program of the present invention can also be stored in a computer-readable recording medium. The “recording medium” referred to herein includes any “transportable physical media” such as a floppy disk (registered trademark), magneto-optical disk, ROM, EPROM, EEPROM, CD-ROM, MO and DVD, any “physical media for fixation” such as ROM, RAM and HD built into various computer systems, and “communication media” that hold the program for a short period of time, such as communication cables and carrier waves used when the program is transmitted through a network, typical examples of which are a LAN, a WAN and the Internet.
Further, the “program” is a data processing method described in any language or description mode, and its format, such as source code or binary code, is not limited. In addition, a “program” is not necessarily restricted to one configured as a single program, and includes those configured in a distributed manner as two or more modules and libraries and those achieving the function through cooperation with another program, a typical example of which is an operating system (OS). As the specific configuration for reading from the recording medium in the devices shown in the embodiment, and as the routines for reading, the routines for installation after reading and so forth, known configurations and routines can be used.
The present invention provides accurate metabolic simulation results for a substance-production process that is accompanied by growth of cells. A simulation of the invention thus enables in silico experiments, which lend practical direction to improving such a production process, typically for obtaining an amino acid or nucleic acid. Illustrative of the improvement realizable in this regard is production optimization in batch culture or fed-batch culture, often used as an actual industrial process.
Hereafter, the present invention will be further explained with reference to examples.
<1> System Parameters
Simulation of expression of the proteins of the enzymes and transcription factors from the genes and conversion of substances by the enzymatic reactions mentioned in the central metabolic map of E. coli shown in
<2> Modeling of Enzymatic Reaction
The abbreviations and initial values of the metabolites used in this example are shown in Table 2. As for those obtained from literature, titles of literature are shown. In the simulation, supposing that the volumes of all cells increased in the process of growth, the total cell volume (cellvoltot) was used as a variable. The initial value of the total cell volume (cellvoltot) was calculated from the initially measured value of OD (initOD) in accordance with the following equation.
cellvoltot=cellvol×celldens×reacvol×initOD
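The calculation of the initial total cell volume can be written as a one-line function. The following is a minimal sketch; the numerical values and the units chosen for cellvol (litres per cell) and celldens (cells per litre per OD unit) are hypothetical placeholders, since the actual values are given in the tables and not in this passage.

```python
import math


def initial_total_cell_volume(cellvol: float, celldens: float,
                              reacvol: float, init_od: float) -> float:
    """cellvoltot = cellvol x celldens x reacvol x initOD."""
    return cellvol * celldens * reacvol * init_od


# Hypothetical inputs, for illustration only.
cellvoltot0 = initial_total_cell_volume(
    cellvol=1.0e-15,   # assumed volume of one cell, L
    celldens=1.0e12,   # assumed cells per L of culture per OD unit
    reacvol=0.3,       # culture volume, L
    init_od=2.05,      # assumed initially measured OD
)
```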
Modeling of the enzymatic reactions was performed based on the Michaelis-Menten type equation described by Segel (Segel, Enzyme Kinetics: Behavior and Analysis of Rapid Equilibrium and Steady-State Enzyme Systems, John Wiley & Sons, New York, 1975). Names and initial values of quantities of enzymes, transcription factors and mRNA are shown in Table 3. Types, values of parameters, substrates, products and effectors of the enzymatic reactions are shown in Table 4. The enzymatic reaction formulas are shown in Table 4.
<3> Equilibrium Reactions
The algebraic equations were reduced to their solutions by assuming that equilibrium is established for the binding of the transcription factors to their effectors and for the activation of CYA. CRP is a transcription factor involved in catabolite repression, and [CRP-cAMP]tot, produced by equilibration of CRP with its effector cAMP, was represented as a solution of the quadratic equation shown in the row of CRP1 in Table 6. [CRP]tot and [cAMP]tot represent the total concentrations of intracellular CRP and cAMP, respectively. About 200 binding sites on the genome are known for CRP, and it is necessary to consider that equilibrium is also established between CRP-cAMP and these binding sites. Assuming that the dissociation constant for this binding, KdCRP, is 4×10^-8 (M), the concentration of CRP-cAMP bound to the promoter on the genome can be regarded as a solution of the quadratic equation shown in the row of CRP2 in Table 6. It is known that Mlc represses a target gene in the absence of glucose, whereas it binds to non-phosphorylated IICBGlc in the presence of glucose. [Mlc-IICBGlc], which is bound to IICBGlc and thus inactivated, was represented as a solution of the quadratic equation shown in the row of Mlc in Table 6. Cra is a transcription factor known as an activator/repressor of many sugar metabolism-related genes. It was assumed that the Cra concentration was constant. [Cra-F1P], in which Cra is bound to its effector F1P, was represented as a solution of the quadratic equation shown in the row of Cra in Table 6. The effector of PdhR, which is a repressor of the aceEF gene coding for PDH, is PYR (Quail and Guest, Mol. Microbiol., 15, 519-529, 1995), and [PdhR-PYR], which is bound to PYR and thus inactivated, was represented as a solution of the quadratic equation shown in the row of PdhR in Table 6. It has been suggested by Cortay et al. that IclR is a transcription factor that represses expression of the glyoxylic acid pathway, and that the effector thereof is PEP (Cortay et al., EMBO J., 10, 675-679, 1991). [IclR-PEP], which is bound to PEP and thus inactivated, was represented as a solution of the quadratic equation shown in the row of IclR in Table 6. Activation of CYA by phosphorylated IIAGlc (IIAGlc-P) is known. Although the detailed mechanism is unknown, modeling was performed by assuming that CYA and IIAGlc-P bind to each other to form an activated CYA (CYAA). Based on the difference in CYA activity between a wild type strain and a strain deficient in crr, which codes for IIAGlc (Reddy and Kamireddi, J. Bacteriol., 180, 732-736, 1998), the dissociation constant of CYAA, KdCYAA, was predicted to be 1.34×10^-4 (M). Based on this, the concentration of the activated CYA, [CYAA], produced by the equilibrium of CYA and IIAGlc-P was represented as a solution of the quadratic equation shown in the row of CYA in Table 6.
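For a one-to-one binding equilibrium A + B ⇌ AB with dissociation constant Kd = [A][B]/[AB], the complex concentration is the smaller root of the quadratic [AB]^2 − ([A]tot + [B]tot + Kd)[AB] + [A]tot[B]tot = 0. The sketch below solves this generic quadratic; it is offered only as an assumed form, because the exact equations of Table 6 are not reproduced in this passage, and the total concentrations used in the example are hypothetical.

```python
import math


def complex_concentration(a_tot: float, b_tot: float, kd: float) -> float:
    """Equilibrium [AB] for A + B <-> AB, with Kd = [A][B]/[AB].

    Returns the physically meaningful (smaller) root of
    x^2 - (a_tot + b_tot + kd) x + a_tot * b_tot = 0,
    written in a form that avoids subtracting two nearly equal numbers.
    """
    s = a_tot + b_tot + kd
    disc = s * s - 4.0 * a_tot * b_tot
    return 2.0 * a_tot * b_tot / (s + math.sqrt(disc))


# Hypothetical total concentrations (M); kd borrows the magnitude of KdCRP
# quoted in the text purely as an example value.
a_total, b_total, kd_example = 1.0e-6, 5.0e-6, 4.0e-8
bound = complex_concentration(a_total, b_total, kd_example)
kd_check = (a_total - bound) * (b_total - bound) / bound
```

Substituting the result back into the definition of Kd, as done for `kd_check`, is a simple way to confirm that the correct root was selected.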
<4> Modeling of Gene Expression
As for the gene expression of E. coli, modeling was performed for the genes to which the transcription factors CRP, Cra, Mlc, PdhR and IclR relate by referring to the EcoCyc database (Keseler et al., Nucleic Acids Res., 33, D334-D337, 2005). As for the transcription factors (TF) per se, modeling of expression was performed for CRP, Mlc, PdhR and IclR. A list of the equations of transcription and translation and the parameters of the genes is shown in Table 7, and the equations used for the gene expression are shown in Table 8. [mRNAgene] represents the concentration of mRNA to be transcribed, [Pgene] represents the concentration of a promoter for a gene, and [RNAPσD] represents the concentration of RNA polymerase bound with σD. The parameters kgenebase, kgeneTF and kgenedRNA are a baseline transcription rate constant, a transcription rate constant for a TF-binding gene, and a degradation rate constant of mRNA, respectively. [Protein] represents the concentration of translated protein, and [Ribosome] represents the ribosome concentration. The parameters ktrans and kdeg represent a translation rate constant and a proteolysis rate constant, respectively. The NoTF equation was used for a gene whose control is not known and for a gene whose control is not considered, and TF1 was used for a gene to which one transcription factor relates. For the genes to which two transcription factors relate (crp, mlc, ptsG, ptsHI, aceBAK), a separate equation is given for each gene. If the concentrations of the translated proteins of the same operon are different, it is considered that such difference is due to reduction of transcription product or difference in transcription efficiency. Therefore, TProtein (translation efficiency coefficient of protein) was defined, and the TL1 equation was used. The sdhCDAB-sucABCD operon of E. coli contains sdhCDAB coding for SDH, and sucABCD composed of sucAB coding for the E1 and E2 subunits of the KGDH complex and sucCD coding for SCS (Cunningham and Guest, Microbiology, 144, 2113-2123, 1998). In consideration of the fact that the sucABCD operon is also transcribed from Psuc besides Psdh, and of the coefficients concerning the translation efficiency, TKGDH and TSCS, for the KGDH complex and SCS, respectively, it was described by using TL (KGDH) and TL (SCS). Because it was reported that the ratio of the products of the aceBAK operon, ICL, MSA and ICDKP, is 1:0.3:0.003 (Chung et al., J. Bacteriol., 175, 4572-4575, 1993), translation constants TMSA and TICDKP were defined for the translation of MSA and ICDKP, respectively, and TL1 was used.
As the concentration of a promoter, a value at μ of 0.01 min^-1 was estimated based on the μ-dependent intracellular gene number data described by Bremer and Dennis (Escherichia coli and Salmonella: Cellular and Molecular Biology, Second Edition (Neidhardt F. C., Ed.), pp. 1553-1569, American Society for Microbiology Press, Washington, D.C., 1996), and used as a constant. As for the number of binding sites of CRP on the genome, a value at μ of 0.01 min^-1 was calculated on the presumption that about 200 binding sites of CRP are uniformly distributed over the genome. As the rate constant of transcription, a value calculated so as to reproduce, as a constant value, the literature value of the protein concentration or the specific activity of the enzyme was used. When two or more transcription rate constants were required for control by a transcription factor, they were calculated by using data of transcription activity or protein concentration in a transcription factor-deficient strain. As the mRNA degradation rate, an experimental value from the literature was used when one was available for the gene concerned; otherwise, measurement data based on the DNA microarray experiment of Selinger et al. (Genome Res., 13, 216-223, 2003) were used.
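To make the roles of the parameters named above concrete, a gene without transcription-factor control might be sketched as below. This is an assumed mass-action form only: the actual NoTF, TF1 and TL1 equations are those of Table 8, which is not reproduced here, and the inclusion of a growth-dilution term μ for mRNA and protein, the dictionary keys and all numerical values are hypothetical.

```python
from typing import Dict, Tuple


def gene_expression_rates(mrna: float, protein: float,
                          p: Dict[str, float]) -> Tuple[float, float]:
    """Assumed form (not the Table 8 equations):
    d[mRNA]/dt    = k_base [P][RNAP]       - (k_drna + mu) [mRNA]
    d[Protein]/dt = k_trans [Ribosome][mRNA] - (k_deg + mu) [Protein]
    """
    transcription = p["k_base"] * p["promoter"] * p["rnap"]
    d_mrna = transcription - (p["k_drna"] + p["mu"]) * mrna
    translation = p["k_trans"] * p["ribosome"] * mrna
    d_protein = translation - (p["k_deg"] + p["mu"]) * protein
    return d_mrna, d_protein


def steady_state(p: Dict[str, float]) -> Tuple[float, float]:
    """Concentrations at which both assumed rates vanish."""
    mrna = p["k_base"] * p["promoter"] * p["rnap"] / (p["k_drna"] + p["mu"])
    protein = p["k_trans"] * p["ribosome"] * mrna / (p["k_deg"] + p["mu"])
    return mrna, protein


# Hypothetical parameter values, for illustration only.
example_params = {
    "k_base": 1.0e6, "promoter": 5.0e-9, "rnap": 6.3e-6,
    "k_drna": 0.2, "k_trans": 1.0e5, "ribosome": 3.0e-4,
    "k_deg": 0.001, "mu": 0.01,
}
mrna_ss, protein_ss = steady_state(example_params)
d_mrna_ss, d_protein_ss = gene_expression_rates(mrna_ss, protein_ss, example_params)
```

The steady-state expression illustrates how a transcription rate constant can be back-calculated so that the model reproduces a literature value of the protein concentration.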
<5> Preparation of Mathematical Equation for Specific Growth Rate μ and Cell Formation Rate
The specific growth rate μ is an index often used for representing growth. In order to represent growth with μ as accurately as possible, the following approximate equation of OD was obtained from OD data over time from culture of a wild type strain in an S-type jar by using a curve fitting program, TableCurve 2D (Systat Software), and a time function of μ was obtained from an equation obtained by differentiating the approximate equation. The result of plotting for OD and μ based on the approximate equation is shown in
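$$\mathrm{OD} = \frac{2.05 + 1.53\times10^{-5}\,t^{2} - 5.35\times10^{10}\,t^{4} + 3.07\times10^{-25}\,t^{6}}{1 - 1.76\times10^{-5}\,t^{2} + 1.17\times10^{10}\,t^{4} - 3.19\times10^{-16}\,t^{6} + 4.19\times10^{-22}\,t^{8}}$$
$$\mu = \frac{1}{\mathrm{OD}}\,\frac{d\,\mathrm{OD}}{dt}$$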
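The step from a fitted OD curve to a time function of μ can be illustrated numerically. In the sketch below a logistic curve with hypothetical parameters stands in for the fitted rational function, and the derivative is taken by a central difference; neither choice is taken from the specification, which differentiates the fitted equation itself.

```python
import math
from typing import Callable


def od_logistic(t: float, od0: float = 2.05, od_max: float = 40.0,
                r: float = 0.012) -> float:
    """Hypothetical stand-in for the fitted OD(t); t in minutes."""
    return od_max / (1.0 + (od_max - od0) / od0 * math.exp(-r * t))


def specific_growth_rate(od: Callable[[float], float], t: float,
                         h: float = 1.0e-3) -> float:
    """mu(t) = (dOD/dt) / OD, with dOD/dt from a central difference."""
    d_od = (od(t + h) - od(t - h)) / (2.0 * h)
    return d_od / od(t)


mu_200 = specific_growth_rate(od_logistic, 200.0)
# For a logistic curve the exact result is r * (1 - OD/od_max).
mu_200_exact = 0.012 * (1.0 - od_logistic(200.0) / 40.0)
```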
In order to compute the cell formation rate during the growth, metabolic reactions of E. coli were defined based on the report of Chassagnole et al. (Biotechnol. Bioeng., 79, 53-73, 2002). Synthetic reactions of the cell components were defined for each of protein synthesis, RNA synthesis, DNA replication, lipid synthesis, glycogen synthesis and peptidoglycan (murein) synthesis, and stoichiometric equations were defined from the ratios of components (Neidhardt and Umbarger, Escherichia coli and Salmonella: Cellular and Molecular Biology (Neidhardt, F. C. Ed.), pp. 13-16, American Society for Microbiology, Washington, D.C., 1996; Pramanik and Keasling, Biotechnol. Bioeng., 56, 398-421, 1997) and the energy required for synthesis (Stephanopoulos et al., Metabolic Engineering: Principles and Methodologies, Academic Press, San Diego, 1998). Furthermore, a stoichiometric equation was prepared for each component required for synthesis of 1 g of cells from the composition of cells (Chassagnole et al., Biotechnol. Bioeng., 79, 53-73, 2002). This equation concerning the cell formation was converted into a stoichiometric equation using the intermediate metabolites used in the simulation, giving the following stoichiometric equation for the intermediate metabolites required for producing 1 g of cells.
g_biomass = 3.962 Pyr + 1.229 aKG − 2.232 CO2 + 10.91 NH4 + 44.69 ATP − 44.6 ADP − 15.18 P + 16.09 H + 18.17 NADPH − 18.17 NADP + 2.409 AcCoA − 2.949 CoA − 0.487 Fum + 2.393 OAA + 1.957 3PG + 0.252 SO4 − 2.329 NADH + 2.329 NAD + 0.5402 SucCoA − 0.4727 Suc + 0.6887 PEP + 0.3312 E4P + 0.4133 DHAP + 0.1023 O2 − 0.0432 GAP + 0.5312 R5P + 0.1025 F6P
The amount of the intermediate metabolite required for formation of 1 g of cells was converted into a value per cell volume and minute and incorporated into the differential equations as an equation of the specific growth rate μ.
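The conversion described above corresponds to the term y_biomass(metabolite)*μ/cellvol*cellweight that appears in the material balances. A minimal sketch follows; the coefficients are a subset of those in the stoichiometric equation, while the values of cellvol and cellweight and the units attached to them are hypothetical, because they are not stated in this passage.

```python
import math
from typing import Dict

# Subset of the coefficients of the biomass equation (amount per 1 g of cells).
# A negative coefficient means the metabolite is released during cell formation.
Y_BIOMASS: Dict[str, float] = {
    "Pyr": 3.962, "aKG": 1.229, "CO2": -2.232, "AcCoA": 2.409,
    "OAA": 2.393, "3PG": 1.957, "PEP": 0.6887, "E4P": 0.3312,
    "DHAP": 0.4133, "R5P": 0.5312, "F6P": 0.1025,
}


def biomass_drain(metabolite: str, mu: float,
                  cellvol: float, cellweight: float) -> float:
    """y_biomass(metabolite) * mu / cellvol * cellweight.

    With cellweight in g per cell and cellvol in L per cell (assumed units),
    the result is the amount withdrawn per litre of cell volume per minute.
    """
    return Y_BIOMASS[metabolite] * mu / cellvol * cellweight


# Hypothetical cell properties, for illustration only.
drain_pyr = biomass_drain("Pyr", mu=0.01, cellvol=1.0e-15, cellweight=2.8e-13)
drain_co2 = biomass_drain("CO2", mu=0.01, cellvol=1.0e-15, cellweight=2.8e-13)
```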
<6> Preparation of Mathematical Equations of RNA Polymerase and Ribosome
Transcription and translation in gene expression are catalyzed by RNA polymerase and ribosome, respectively. It is known that the molecular numbers of these enzymes change during the process of growth. From the data of μ-dependent intracellular molecular number described by Bremer and Dennis (Escherichia coli and Salmonella: Cellular and Molecular Biology, Second Edition (Neidhardt F. C. Ed.), pp. 1553-1569, American Society for Microbiology Press, Washington, D.C., 1996), mathematical equations were prepared by using approximate equations. RNA polymerase binds with the σ factor to become a holoenzyme and then functions. Because σD, which is responsible for gene expression during the growth phase, is substantially constant during the growing process, the σD-bound RNA polymerase concentration [RNAPσD] was considered to be ⅓ of the total RNA polymerase concentration, and represented by the following equation.
$$[\mathrm{RNAP}\sigma^{D}] = 6.67\times10^{-7} + 3.0\times10^{-4}\,\mu + 2.64\times10^{-2}\,\mu^{2}$$
As for ribosome, by fitting the data of Bremer and Dennis (Escherichia coli and Salmonella: Cellular and Molecular Biology, Second Edition (Neidhardt F. C. Ed.), pp. 1553-1569, American Society for Microbiology Press, Washington, D.C., 1996) using TableCurve 2D, the following equation was obtained.
$$[\mathrm{Ribosome}] = 1.90\times10^{-5} + 1.38\,\mu^{2} + 10.2\,\mu^{2.5} + 36.6\,\mu^{3}$$
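The two approximate equations can be evaluated directly as functions of μ. The sketch below uses the coefficients given above; the function names and the guard against negative μ (needed because of the fractional exponent) are additions made for illustration.

```python
import math


def rnap_sigma_d(mu: float) -> float:
    """[RNAP sigma-D] = 6.67e-7 + 3.0e-4 mu + 2.64e-2 mu^2."""
    return 6.67e-7 + 3.0e-4 * mu + 2.64e-2 * mu ** 2


def ribosome(mu: float) -> float:
    """[Ribosome] = 1.90e-5 + 1.38 mu^2 + 10.2 mu^2.5 + 36.6 mu^3."""
    if mu < 0.0:
        raise ValueError("mu must be non-negative for the mu**2.5 term")
    return 1.90e-5 + 1.38 * mu ** 2 + 10.2 * mu ** 2.5 + 36.6 * mu ** 3


rnap_at_001 = rnap_sigma_d(0.01)   # mu = 0.01 per minute
ribo_at_001 = ribosome(0.01)
```

At μ of 0.01 min^-1 these expressions evaluate to about 6.31×10^-6 for [RNAPσD] and about 2.96×10^-4 for [Ribosome].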
<7> Excretion to Outside of Cells and Uptake from Outside of Cells
Among the substances excreted to the outside of cells, excretion of acetic acid (AcOH) and formic acid (Formate), of which amounts detected in culture of a wild type strain were large, was incorporated. Based on the measured data for extracellular concentrations of acetic acid and formic acid over time, the profiles over time were approximated to time functions, and the functions were converted into rates by differentiation and incorporated into the differential equations of ACCoA and PYR. The plot of the amount of acetic acid based on the approximated function and the rate obtained by differentiating the approximated function is shown in
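$$\mathrm{AcOH}_{ex} = \frac{2.49\times10^{-3} - 7.61\times10^{-3}\,t - 3.38\times10^{-?}\,t^{2} + 9.33\times10^{-10}\,t^{3}}{1 - 7.61\times10^{-3}\,t + 2.44\times10^{-5}\,t^{2} - 1.05\times10^{-8}\,t^{3}}$$
$$\mathrm{Form}_{ex} = 4.41\times10^{-4} - 1.17\times10^{-9}\,t^{2} + 1.97\times10^{-13}\,t^{4} + 3.93\times10^{-19}\,t^{6}$$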
Among the organic substances existing in medium, amino acids and so forth are taken up into the cells. Uptake of glutamic acid (Glu) and alanine (Ala), of which existing amounts in culture of a wild type strain were large, was represented with mathematical equations and incorporated. Uptake of glutamic acid and alanine contained in the initial medium was approximated with the following time functions, and the functions were converted into rates by differentiation of the functions and incorporated into the differential equations of AKG and PYR.
$$\mathrm{Glu}_{in} = 9.2\times10^{-4} - 7.26\times10^{-6}\,t$$
$$\mathrm{Ala}_{in} = 8.36\times10^{-4} - 6.69\times10^{-6}\,t$$
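Because both time functions are linear, their differentiation gives constant uptake rates equal to the magnitudes of the slopes. The following sketch evaluates these rates; setting the rate to zero once the fitted amount reaches zero is an assumption added here for illustration and is not stated in the specification.

```python
import math


def linear_uptake_rate(initial: float, slope: float, t: float) -> float:
    """Uptake rate -d/dt (initial - slope * t) = slope.

    The rate is cut off (assumption) when the fitted amount is exhausted.
    """
    remaining = initial - slope * t
    return slope if remaining > 0.0 else 0.0


GLU0, GLU_SLOPE = 9.2e-4, 7.26e-6     # coefficients of the Glu time function
ALA0, ALA_SLOPE = 8.36e-4, 6.69e-6    # coefficients of the Ala time function

glu_rate_60 = linear_uptake_rate(GLU0, GLU_SLOPE, 60.0)
ala_rate_60 = linear_uptake_rate(ALA0, ALA_SLOPE, 60.0)
glu_depletion_time = GLU0 / GLU_SLOPE   # time at which the fitted amount is zero
```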
<8> Culture of Wild Type Strain MG1655 and Metabolic Flux Analysis
The wild type strain MG1655 was cultured overnight in 30 ml of LB medium, and the cells were collected from the culture broth. The cells were cultured in MS medium containing 13C glucose in an S-type jar under the conditions of batch culture. The MS medium contained 40 g of glucose, 1 g of MgSO4·7H2O, 16 g of (NH4)2SO4, 1 g of KH2PO4, 2 g of Bacto yeast extract, 0.01 g of MnSO4·4H2O, 0.01 g of FeSO4·7H2O, and 0.5 ml of GD113 (antifoaming agent) in 1 L. The culture was carried out in a culture volume of 0.3 L, at a temperature of 37° C. and pH 7.0, with aeration by stirring. Metabolic flux analysis was performed during the growth phase (315 minutes) and the stationary phase (495 minutes). The method of metabolic flux analysis is described in detail in International Publication No. WO2005/001736. These culture results were used for verification of the simulation. Plots of the results of the measurements of extracellular glucose concentration and extracellular CO2 concentration are shown in
<9> Simulation and Verification of E. coli Central Metabolic Model
The simulation was performed by describing differential equations using a mathematical calculation program MATLAB (MathWorks) and using ode15s as an ODE solver. The differential equations of material balance used for the simulation are shown below. Material balance of each substance is described as the sum of enzymatic reaction rate, dilution effect due to growth, synthesis rate of cell component (y_biomass(metabolite)), uptake rate of extracellular substance and rate of excretion to the outside of cells mentioned in Table 4.
d[Cellvoltot]/dt=μ*[Cellvoltot]
d[Glucose]/dt=−rx1e
d[G6P]/dt=rx1e−rx2−rx12−(μ*[G6P])
d[F6P]/dt=rx2+rx29+rx16+rx17b−rx3−(μ*[F6P])−y_biomass(F6P)*μ/cellvol*cellweight
d[FDP]/dt=rx3−rx4−rx29−(μ*[FDP])−y_biomass(FDP)*μ/cellvol*cellweight
d[DHAP]/dt=rx4−rx5−(μ*[DHAP])−y_biomass(DHAP)*μ/cellvol*cellweight
d[GA3P]/dt=rx4+rx5+rx17b−rx6−rx16−rx17−(μ*[GA3P])−y_biomass(GA3P)*μ/cellvol*cellweight
d[13DPG]/dt=rx6−rx7−(μ*[13DPG])
d[3PG]/dt=rx7−rx8−rx9−(μ*[3PG])−y_biomass(3PG)*μ/cellvol*cellweight
d[2PG]/dt=rx8+rx9−rx10−(μ*[2PG])
d[PEP]/dt=rx10+rx30+rx34−rx11−rx1a−rx31−(μ*[PEP])−y_biomass(11)*μ/cellvol*cellweight
d[PYR]/dt=rx11+rx1a+rx32+rx33−rx18−rx30−(μ*[PYR])−y_biomass(PYR)*μ/cellvol*cellweight+Alauptake−Formin
d[6PGC]/dt=rx12−rx13−(μ*[6PGC])
d[RL5P]/dt=rx13−rx14−rx15−(μ*[RL5P])
d[R5P]/dt=rx14+rx17−(μ*[R5P])−y_biomass(R5P)*μ/cellvol*cellweight
d[X5P]/dt=rx15+rx17−rx17b−(μ*[X5P])
d[E4P]/dt=rx16−rx17b−(μ*[E4P])−y_biomass(E4P)*μ/cellvol*cellweight
d[S7P]/dt=−rx16−rx17−(μ*[S7P])
d[ACCoA]/dt=rx18−rx19−rx36−(μ*[ACCoA])−AcOHin−y_biomass(19)*μ/cellvol*cellweight+Formin
d[OAA]/dt=rx28+rx31−rx19−rx34−(μ*[OAA])−y_biomass(OAA)*μ/cellvol*cellweight
d[CIT]/dt=rx19−rx20−rx21−(μ*[CIT])−y_biomass(CIT)*μ/cellvol*cellweight
d[ICIT]/dt=rx20+rx21−rx22−rx35−(μ*[ICIT])−y_biomass(ICIT)*μ/cellvol*cellweight
d[AKG]/dt=rx22−rx23−(μ*[AKG])−y_biomass(AKG)*μ/cellvol*cellweight+Gluuptake
d[SUCCoA]/dt=rx23−rx24−(μ*[SUCCoA])−y_biomass(SUCCoA)*μ/cellvol*cellweight
d[SUCC]/dt=rx24+rx35−rx25−rx26−(μ*[SUCC])−y_biomass(SUCC)*μ/cellvol*cellweight
d[FUM]/dt=rx25+rx26−rx27−(μ*[FUM])−y_biomass(FUM)*μ/cellvol*cellweight
d[MAL]/dt=rx27+rx36−rx28−rx32−rx33−(μ*[MAL])−y_biomass(27)*μ/cellvol*cellweight
d[GLX]/dt=rx35−rx36−(μ*[GLX])
d[cAMP]/dt=rx39+rx39a−rx40−rx41−(μ*[cAMP])
d[cAMPex]/dt=rx41
d[F1P]/dt=−rx42−(μ*[F1P])
d[CO2]/dt=rx13+rx18+rx22+rx23+rx32+rx33+rx34−rx31−(μ*[CO2])−y_biomass(32)*μ/cellvol*cellweight
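Each of the above material balances has the same structure: rates of producing reactions minus rates of consuming reactions, minus the dilution term μ*[X], minus the cell-component synthesis term, plus uptake and minus excretion. A minimal sketch of this common structure is given below, using the G6P balance as the example; the helper name `material_balance` and the numerical rate values are hypothetical.

```python
import math
from typing import Sequence


def material_balance(produced: Sequence[float], consumed: Sequence[float],
                     mu: float, conc: float, biomass_drain: float = 0.0,
                     uptake: float = 0.0, excretion: float = 0.0) -> float:
    """d[X]/dt = sum(produced) - sum(consumed) - mu*[X]
                 - biomass_drain + uptake - excretion."""
    return (sum(produced) - sum(consumed) - mu * conc
            - biomass_drain + uptake - excretion)


# d[G6P]/dt = rx1e - rx2 - rx12 - (mu * [G6P]), with hypothetical values.
rates = {"rx1e": 1.0e-3, "rx2": 6.0e-4, "rx12": 2.0e-4}
d_g6p = material_balance(
    produced=[rates["rx1e"]],
    consumed=[rates["rx2"], rates["rx12"]],
    mu=0.01,
    conc=1.0e-3,
)
```

Systems of this kind are often stiff, which is consistent with the use of ode15s mentioned above; in Python, a comparable choice would be a stiff method such as `method="BDF"` in SciPy's `solve_ivp`.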
In performing the simulation, some of the parameters were adjusted manually. The changed parameters are shown in Tables 3, 4 and 7. Among the results of the simulation of the E. coli central metabolic model, temporal changes of major metabolites are shown in
The results of simulation of gene expression where RNA polymerase and ribosome concentrations were independent from μ are shown in
Number | Date | Country | Kind |
---|---|---|---|
2005-231005 | Aug 2005 | JP | national |