It is estimated that the human genome comprises 20,000 to 80,000 genes, which contain the genetic code for around one million proteins. In the specialized body cells only subsets of all genes are actually read off (expressed) in each case. The totality of the proteins created in this way is referred to as the proteome of this cell. The interplay of the proteins with each other as well as with the DNA represents the most important part of the machinery which underlies the development of the human body from the fertilized egg cell as well as all body functions. From the information technology standpoint the genome thereby represents a procedural code for the structure and function of the human body.
Many illnesses and malfunctions of the body are attributable to faults in the functional network of genome and proteome. Thus a plurality of medicaments operate as agonists or antagonists of specific target proteins, i.e. they strengthen or weaken the function of a protein with the aim of returning the function of the regulatory network formed from proteome and genome back into a normal functional mode. These targets have previously been derived using heuristic principles from biochemical considerations. It is often unclear in such cases whether the malfunction of a protein represents the actual cause of the illness or only one of the symptoms of a hidden mis-regulation at another point of the network.
The simulation of nerve cells (neurons) and their biological functionality by artificial neurons is known from Zell, A., “Simulation Neuronaler networks” (Simulation of Neuronal Networks), P. 35 to 51, P. 55 to 86, Addison-Wesley Longman Verlag GmbH, 1994, 3rd unamended reprint, R. Oldenbourg Verlag, ISBN 3-486-24350-0, 2000 (“Zell pages 35 to 51 and 55-86”) and Patent Application PCT/DE02/03381 (“PCT03381”).
Making a distinction between two types of synapses or nerve cells, an excitatory synapse 251 or nerve cell and an inhibiting synapse 252 or nerve cell is further known from Zell pages 35 to 51 and 55-86.
Inhibitory synapses 252 reduce electrical potentials to be transmitted or forwarded, excitatory synapses 251 increase electrical potentials to be transmitted.
Further information about the structure and functionality of a nerve cell as well as for nerve conductance is given in Zell pages 35 to 51 and 55-86.
Furthermore an artificial nerve cell (artificial neuron) which emulates a (biological) nerve cell is known from Zell pages 35 to 51 and 55-86 and PCT03381.
At first glance such an artificial neuron is a mathematical mapping, which, in accordance with the transmission behavior of the biological nerve cell, maps an input variable of the artificial neuron onto an output variable of the artificial neuron.
In compliance with the biological template an artificial neuron has three components: the cell body, the dendrites, which sum input signals into the artificial neuron, the axon, which forwards the output signal of the artificial neuron to the outside, branches and comes into contact with the dendrites of subsequent artificial neurons via synapses.
A strength of a synapse or the type of the synapse is mostly represented by a numeric value or by its leading sign. This value is referred to as the connection weight.
In accordance with the biological template a transmission behavior or the mapping behavior of an artificial neuron can be mapped as described in Zell pages 35 to 51 and 55-86 and PCT03381.
Further information about artificial neurons and their functionality is given in Zell pages 35 to 51 and 55-86 and PCT03381.
The linkage of individual neurons with each other is further known from Zell, A., “Simulation Neuronaler networks” (Simulation of Neuronal Networks), P. 87 to 114, Addison-Wesley Longman Verlag GmbH, 1994, 3rd unamended reprint, R. Oldenbourg Verlag, ISBN 3-48624350-0, 2000 (“Zell pages 87-114”) and PCT03381. Such an arrangement with linked neurons is referred to as a neuronal network. The basics of neuronal networks, for example different types of neuronal networks, training methods for neuronal networks, references to biological nerve cell arrangements, are described in Zell pages 87-114.
An application of a mean-field model to the description of a complex system is known from J. J. Binney, N. J. Dowrick, A. J. Fisher, M. E. J. Newman, “The Theory of Critical Phenomena”, Chap. 6: Mean-Field Theory, Clarendon Press Oxford 1992 and PCT03381. With a mean-field model stochastic interaction influences between components of a system are approximated by a mean interaction influence. This allows stochastic systems which cannot be described analytically to be reduced to describable, deterministic systems.
The application of the mean-field model to the description of a neuron structure is known from C. Koch, I. Segev (Hrsg), “Methods of Neural Modeling: From Ions To Networks”, Chap 13: D. Hansel and H. Sompolinsky: “Modeling Feature Selectivity in Local Circuits”, MIT Press, Cambridge, 1998 and PCT03381.
One possible object of the invention is to improve the identification of proteins which are suitable as targets for treatment without medicaments of genetic diseases or problems.
The inventors propose to determine a plurality of gene expression patterns of similar types of cells or of a tissue, with an expression rate of the genes of the cell being determined in each case. The plurality of gene expression patterns is determined such that the chronological sequence of the gene expression pattern of the cell can be at least partly reconstructed. A dynamic model of the regulatory network made up of genome and proteome of the cell is formed by forming an equivalent neuronal network in the following way:
The equivalent neuronal network is compared with the gene expression patterns determined and is adapted to these. From the adapted neuronal network the regulatory network of the cell investigated is deduced.
There is thus an equivalence relationship between the functional network of the genome and proteome on the one side and the neuronal network of the human brain on the other side, which both represent strongly networked closed-loop systems. This mapping brings about successful modeling of the functional network of proteins and genes.
The method allows the identification of target proteins on a systematic basis. The equivalence relationship described can be established between genetic and neuronal networks. The dynamic interaction of genes and regulatory proteins is thus modeled by a dynamic neuronal network. The method uses the information contained in the chronological sequence of the gene expression pattern for the identification of causal regulatory interrelationships.
As a rule genetic illnesses lead to complex malfunctions which however often only lead back to a few malfunctioning genes or proteins. Until now these key genes have not been known except in individual cases. Instead heuristic processes have been used in conventional target finding to search for targets for which regulation without medicament would restore the healthy gene expression pattern in the best possible way.
Recent estimates talk of 10,000 different proteins as possible targets in the human genome which it would not be practicable to thin out using a heuristic approach alone.
The model approach described here represents a powerful method for systematic target finding, that its for identification of one or more key genes or proteins which are located at the start of the regulation cascade and which for example introduce an organic development, regulate the regeneration capability of tissue but which are also responsible for mostly complex changes of gene expression patterns in the event of illness.
The method described allows a computer-based target finding which is able to analyze the large amount of data and the numerous and complex interrelationships.
It allows the following application areas to be specified:
These and other objects and advantages of the present invention will become more apparent and more readily appreciated from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
The top part of
The option thus basically exists for influencing the expression rate of individual genes of a cell over the path mentioned from outside the cells.
A not necessarily contiguous section of the DNA is referred to as a gene which contains the genetic code for a protein or also for a group of proteins. In general the DNA features what are known as exons and introns. Exons represent parts of the DNA which actually encode a protein. Introns represent parts of the DNA which do not directly encode a protein. In a first approximation they have no function. Exons and introns alternate with each other in the DNA. If a gene is referred to as the quantity of the exons which together encode a specific protein, such a gene—as mentioned above—is as a rule not contiguous.
The production process of a protein from a gene, for example protein A, starting from gene A in
Not all genes are expressed in a cell. Instead different cell types are differentiated by their gene expression pattern. This then often also applies to the difference between diseased and healthy cells.
The expression pattern of a cell is determined by the regulatory processes shown schematically in
Thus the expression rate of a gene A can be regulated by the presence of another protein B, i.e. increased, reduced or brought to a standstill. In this example the protein B acts in a regulatory way on the gene A or the protein A. The protein components of activator complexes can for example be reckoned to be regulatory proteins. Regulatory proteins can operate on many target genes simultaneously.
A second type of interaction is a post-translational modification of proteins, i.e. the modification of proteins after their translation. As a rule the post-translational modification of a protein occurs directly after the translation, i.e. before the protein acts in the cell. Thus for example many proteins are phosphorylized or glycolyzed by specific enzymes, i.e. the target protein is put into its functional state by appending or splitting off chemical groups or is put into a state of in which it no longer has any effect. Post-translational modification can thus temporarily switch on or switch off the functions of a protein where necessary.
In
Protein B is a regulatory protein since it determines the expression rate of the protein A by interacting with that DNA section which contains the gene A. The protein D thus modifies the function of a regulatory protein (protein B) in the course of the post-translational modification.
Database
The nucleic acid sequence of the human DNA is very largely known. The genes encoded by the DNA have also been identified to an increasing extent. Not quite so complete is the knowledge about the proteins including the post translational modified proteins possibly produced by interactions between the proteins. In any event new sequencing and high-throughput screening processes allow further genes and proteins to be quickly identified.
A further important step for clarifying the expression pattern of a cell has been completed with the development of high-throughput hybridization techniques. With this method the expression rate of many 100 different genes is tested simultaneously on what is known as a microarray. With the aid of this method it is possible to determine the gene expression pattern of a cell.
To this end the mRNA (messenger RNAs) synthesized in the cell are determined as a rule. The mRNA is an intermediate product in the translation of the gene into a protein. The mRNA is thus a preliminary stage in the formation of the protein and points to the formation of the associated protein. The cell to be investigated is first isolated. Subsequently it is deduced. Suitable rationalization steps are used to isolate the mRNAs from the cell. Then the mRNA is translated using reverse transcriptasis into cDNA (complementary DNA). This is generally amplified by linear PCR (polymerase chain reaction). The cDNA thus obtained is analyzed with the aid of suitable microarrays, e.g. DNA chips, qualitatively or quantitatively. With modern microarrays the expression rates of 5,000 and more genes can be calibrated simultaneously.
Because of these improved techniques there is now comprehensive knowledge about the human genome and proteome as well as about the interactions between proteins and genes or between the proteins themselves.
Of particular importance are data records in which the chronological sequence of gene expression patterns in a tissue is stored. What are known as longitudinal hybridization studies, i.e. chronological sequences of gene expression patterns during the organ differentiation as part of the embryonal development is one example that might be mentioned. Time-resolved gene expression data also exists for the cell division cycle of single cell creatures and is also possible for more complex tissue.
Neuronal modeling of genome and proteome (cf. PCT03381)
A general outline of the modeling principle is given below. The basics are known from PCT03381.
The basic principle relates to establishing an equivalence relationship between the functional network of the genome and proteome on the one hand and the neuronal network of the human brain on the other hand, which both represent heavily networked closed-loop systems.
The neuronal network of the human brain is illustrated below in a plurality of fundamentals with reference to
In the human brain there are around 100 billion nerve cells 20 (neurons), which each exchange information with tens of thousands of other nerve cells 20. The information passes from a neuron 20 via the axon 22 belonging to each neuron to another neuron 20. Each neuron has precisely one axon to send information to other neurons. In its further progress and axon typically branches around one thousand times, so that a neuron 20 can send information of via its axon 22 to around one thousand other neurons 20.
To receive information neurons 20 have dendrites 24. The axon 22 carrying information is connected via a synapse 26 with the dendrites 24. The information passes via this synapse 26 from the axon 22 into the dendrite 24 and thereby from the emitting neuron 22 to the downstream neuron. Between thousands and hundreds of thousands of axons 22 or synapses 26 can access a single dendrite 24 so that a downstream neuron 20 can receive signals from many 1000 upstream neurons.
Reference is made to
The text below refers to
PSP=W·ε(t),
with w, as defined above, representing the synaptic weight of the first synapse 26 and 8(t) the timing of the post-synaptic potential 32 in a suitable normalization.
If in addition the modulatory synapse 36 accesses the first synapse 26, this produces a modified post-synaptic potential PSP′ in the dendrite 24 which can be expressed by a multiplicative term act:
PSP′=act·w·ε(t)=act·PSP.
In this case act identifies the activity of the modulatory synapse 36.
For example dopaminergic synapses have a modulatory character in the central nervous system, that is in the brain and the spinal cord.
The neuronal activity of each neuron, i.e. the number of the spikes emitted for each unit of time, is produced—in simple terms—by a non-linear and chronologically non-local function of all incoming post-synaptic potentials. If this function exceeds a specific threshold value a spike 30 is initiated and transmitted via the axon 22.
Thus the biological neuronal network of the brain represents a complex non-linear system which also features a high networking density. To describe this system in a formal model neuro-information technology has developed powerful theories and algorithms in recent years (e.g. compartment model, spike response model, mean-field model, multi-modular neuro cognitive model, Bayes belief networks).
These theories or equations correspond in their structure to the equations derived above for reaction kinetics. Thus the regulatory network of genome and proteome of a cell can be mapped to an equivalent neuronal network as follows:
The equivalence relationship described can be established between genetic and neuronal networks. The dynamic interaction of genes and regulatory proteins is thus modeled by a dynamic neuronal network.
Networks of spiking neurons count as suitable neuronal algorithms but also mean-field models which take into account the explicit passage of time of the signal transmission between the neurons by the explicit description of the post-synaptic potentials. They allow the modeling of the development over time of the neuronal activities in the network as a result of external stimulation or intrinsic activity.
The development over time of the concentrations which is produced by the reaction kinetics between the molecules involved (e.g. between regulatory protein and DNA promoter) will thus be replaced by the time sequence of the activities of the neurons so that the resulting network model for simulating the timing development of gene expression patterns can be included.
The neuronal activities over time can be included for this type of neuronal network. Since the neuronal activity corresponds to the gene expression patterns, the two can be compared to each other. The neuronal network corresponds to a simulated gene expression pattern.
The object of the modeling is to determine the regulatory network underlying the expression sequence, i.e. to answer the following question: “Which neuronal networking structure with which weights and reaction constants is consistent with the observed gene expression sequence?”
To answer this question the network is trained with a method oriented to structured learning: An attempt is made to explain the observed behavior with as few regulatory connections as possible but also as well as possible, that is to find the simplest model consistent with the data.
A preferred optimization method minimizes the total deviation between measured and simulated gene expression patterns by using a “sparse prior”, that is an additional condition which penalizes the co-existence of many connections with small weights in favor of fewer regulatory connections. An option for implementing such a sparse prior is known to those skilled in the art.
Cross-validation and statistical optimization allow the uniqueness of the solution to be estimated as well as its ability to predict (generalization capability).
Causal relationships between genes but also the role of different genes can be taken from the trained network on the basis of the connection structure of the neurons. Thus an asymmetrical weight only from gene B to gene A indicates that gene B regulates gene A. At the same time in the model different genes or regulatory connections can be artificially switched off or switched on and the effects of the gene expression pattern with the target quantified, to identify the cause(s) of the illness-related changes of the gene expression (known as inverse modeling).
The invention has been described in detail with particular reference to preferred embodiments thereof and examples, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention covered by the claims which may include the phrase “at least one of A, B and C” as an alternative expression that means one or more of A, B and C may be used, contrary to the holding in Superguide v. DIRECTV, 69 USPQ2d 1865 (Fed. Cir. 2004).
Number | Date | Country | Kind |
---|---|---|---|
103 42 274.9 | Sep 2003 | DE | national |
This application is based on and hereby claims priority to PCT Application No. PCT/EP2004/051835 filed on Aug. 18, 2004 and German Application 10342274.9 filed Sep. 12, 2003, the contents of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP04/51835 | 8/18/2004 | WO | 3/13/2006 |