The current invention is in the field of mammalian cell cultivation. More specifically, the object of the present invention is a method for the on-line determination of process target parameters based on historical on-line and off-line values of a set of process variables.
For the production of therapeutic agents in the pharmaceutical industry, the highest demands are placed on quality and reproducibility. For this reason, empirical standards (GMP guidelines, good manufacturing practice) are in place that define target values, process limits and deviations in order to meet these requirements. The US Food and Drug Administration (FDA) recently asked the pharmaceutical industry through a PAT (Process Analytical Technology) initiative to develop a better understanding of the processes carried out in order to increase the quality of their products [1]. In recent years, new technologies such as computer-based models have been used to advance the understanding of cell culture processes of, for example, CHO cells, which are used to produce therapeutic proteins.
Bioreactors are most often used to cultivate cells. Various process variables are recorded during cultivation. These enable process monitoring as well as control and serve to maintain controlled environmental conditions. A distinction is made between on-line and off-line values. Both provide important information about the process. On-line values are collected by appropriate sensors that are used for direct on-line control. Off-line values, in contrast, have to be determined by manual sampling with subsequent external analytical methods. Such off-line parameters are, for example, the viable cell density and the glucose and lactate concentrations. These can be used to assess the current cultivation conditions and, if necessary, to intervene in the regulation of the process.
The analysis of the samples requires increased manual effort, especially in the case of high-throughput cultivation systems. These external methods can also lead to errors and device failures under some circumstances. In order to make the processes more efficient and robust, it is possible to use on-line values already recorded during the cultivation to obtain information on-line. In this way, the existing measured parameters and their relationships can be analyzed and described using suitable mathematical machine-learning models.
Artificial neural networks (ANN) for monitoring biomass in fed-batch processes have been described [8]. Kroll et al. described a model-based soft sensor for determining the subpopulations of the CHO cell biomass [9].
Hutter, S., et al. disclose a glycosylation flux analysis of immunoglobulin G in Chinese hamster ovary perfusion cell culture (Processes 6 (2018) 176). They describe a metabolic flux analysis-based approach to generate insights into glycosylation pathways. Hutter et al. focus on metabolic flux analysis in perfusion cell culture experiments. Only off-line determined parameters were used to fit a mechanistic (linear) model, with a random forest model used to rank the influence of the input parameters on the glycosylation outcome. As such, Hutter et al. disclose a statistical analysis based on off-line data and performed after the cultivation, i.e. a modeling tool to make (biological) sense of historical data. No predictive or on-line algorithm is disclosed.
The White Paper “Biopharma PAT—Quality Attributes, Critical Process Parameters & Key Performance Indicators at the Bioreactor” (available at https://www.researchgate.net/publication/326804832_Biopharma_PAT_-_Quality_Attributes_Criticcal_Process_Parameters_Key_Performance_Indicators_at_the_Bioreactor) provides a high-level overview of process analytical technologies. The paper describes cultivation principles (e.g. batch, fed-batch and perfusion) and monitoring methods. Therein, measurements such as dissolved oxygen are used to gain process understanding. No prediction of any output parameters and no machine learning approaches are disclosed.
Rubin, J., et al., report that pH excursions impact CHO cell culture performance and antibody N-linked glycosylation (Bioprocess. Biosys. Eng., 41 (2018) 1731-1741). The authors disclose a study on the impact of cell culture pH on antibody glycosylation using typical off-line measurements of process parameters as conducted in any cultivation, and the influences of pH variation thereon.
Downey, B. J., et al. report a novel approach for using dielectric spectroscopy to predict viable cell volume (VCV) in early process development (Biotechnol. Prog. 30 (2014) 479-487).
Xiao, P., et al. reported the metabolic characterization of a CHO cell size increase phase in fed-batch cultures (Appl. Microbiol. Biotechnol. 101 (2017) 8101-8113).
Kroll, P., et al. reported a soft sensor for monitoring biomass subpopulations in mammalian cell culture processes (Biotechnol. Lett. 39 (2017) 1667-1673). The authors used a physical turbidity sensor to measure viable cell counts (VCC, equivalent to VCD) based on a linear model.
The present invention is based, at least in part, on the finding that by selecting certain, specific process variables from historical datasets, useful data-driven models can be obtained, which contain important parameters for the cultivation of CHO cells, such as, VCD (Viable Cell Density), VCV (Viable Cell Volume), glucose and lactate in real time.
With the method according to the invention, it becomes possible to provide precise online-like values for the target variables over the entire course of the cultivation without sampling.
The object of the invention is a method for determining the viable cell density, and/or the viable cell volume and/or the glucose concentration in the cultivation medium and/or the lactate concentration in the cultivation medium for and during the cultivation of a CHO cell expressing an antibody, using exclusively on-line measured values from said cultivation by means of a model for the cultivation of this CHO cell, characterized in that the model is generated based on a feature matrix comprising the features ‘Time’, ‘CHT.PV’, ‘ACOT.PV’, ‘FED2T.PV’, ‘GEW.PV’, ‘CO2T.PV’, ‘ACO.PV’, ‘AO.PV’, ‘N2.PV’, ‘CO2.PV’, ‘FED3T.PV’, ‘OUR’ and ‘PH.PV’.
The features denote the parameters as follows:
In one embodiment, the model is generated using the random forest method.
In one embodiment, the training dataset comprises at least 10 cultivation runs, preferably at least 60 cultivation runs.
In one embodiment, the model is obtained with a training dataset that includes cultivation runs of mammalian cells that express a complex IgG, i.e. an antibody that comprises a different form than a wild-type Y-shaped full length antibody, e.g. by comprising additional domains, such as one or more Fabs. In one embodiment, the training dataset also contains cultivation runs of mammalian cells that express standard IgG, i.e. a Y-shaped wild-type-like antibody without additional or deleted domains.
In one embodiment, approximately 80% of the datasets available for the model formation are used as training datasets and the remaining datasets are used as test datasets.
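Such a split can be sketched with scikit-learn; the data, feature count and model settings below are hypothetical placeholders, not the actual cultivation datasets:

```python
# Sketch: 80/20 split of the available datasets and training of a data-driven
# model (random forest) on the training portion. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 13))  # 13 on-line features (e.g. 'Time', 'OUR', 'PH.PV', ...)
y = 2.0 * X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.1, size=500)  # surrogate target, e.g. VCD

# Approximately 80 % of the data are used for training, the rest for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # coefficient of determination on the test data
```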
In one embodiment
In one embodiment, missing data points in the datasets are supplemented by interpolation.
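Linear interpolation of missing data points can be sketched, for example, with pandas (hypothetical series):

```python
# Sketch: supplement missing data points in a recorded series by linear interpolation.
import numpy as np
import pandas as pd

series = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
filled = series.interpolate(method="linear")
# filled -> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```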
In one embodiment, the datasets contain a data point at least every 60 minutes, preferably a data point approximately every 5 to 10 minutes.
In order to achieve a high throughput of test cultivations, especially for complex molecules and molecular formats, the cultivation containers must be reduced in size and the cultivations must be automated. The success of a cultivation depends on the controlled process variables, and the desired molecule can only be produced in high yields when optimal cultivation conditions are provided. Thus, fast and efficient control of the relevant process variables is required in order to set the respective process variables and maintain optimal cultivation conditions. Such control is particularly necessary for small-scale parallel cultivations, as each cultivation has to be monitored separately. The so-called off-line process variables are a particular problem here: on the one hand, the required sampling and separate analysis result in a time offset, i.e. the cultivation continues and the process variables once determined off-line differ from the real process variables; on the other hand, the number of sampling points is significantly smaller compared to on-line available process variables, which leads to temporally worse control of these process variables.
It is therefore an object of the present invention to make process variables, which cannot be determined on-line but only off-line, especially due to the size of the employed cultivation vessel, available like process variables, which are available on-line, in the cultivation scale used, based on a data-driven model and in real time.
To produce recombinant proteins, bioreactors are mostly operated using a fed-batch process [4]. In addition to the fed-batch process, there are other operating modes such as the batch process and the continuous cultivation mode.
The fed-batch or feed process is one of the partially open systems. Advantages of this process are that nutrients such as glucose, glutamine and other amino acids can be added to the cultivation during the process. Resulting substrate limitations can be avoided and longer process times can be ensured. The substrates can be added continuously or in the form of (one or more) concentrated boluses. A suitable feeding strategy can be used to better control inhibitory effects and the accumulation of toxic by-products. However, this requires sufficient knowledge as well as control of the process.
To provide and maintain optimum conditions during the cultivation of mammalian cells, such as CHO cells, bioreactors are used almost exclusively [2]. The bioreactors used are mostly stirred-tank reactors. The cultivation takes place in suspension, i.e. of suspension growing cells.
Aerobic mammalian cells, such as CHO cells, require oxygen to maintain their cell metabolism. The cells are usually supplied with oxygen by submerged gassing of the culture broth. The concentration of dissolved oxygen in the reactor is one of the most important parameters for the cultivation of aerobic cells. The concentration of oxygen dissolved in the medium is determined by a number of transport resistances. Diffusion causes the oxygen to be transported from the gas bubble to the cell so that it can ultimately be metabolized by the cells. It is described that the transport mechanism can be carried out using the oxygen transport rate (Oxygen Transfer Rate, abbreviated OTR), whereas the consumption of oxygen by the cells themselves can be determined using the oxygen consumption rate (Oxygen Uptake Rate, abbreviated OUR) [2]. Appropriate exhaust gas analysis can provide the data required to calculate the OUR and OTR. Process variables such as temperature, pH value and the concentration of dissolved oxygen are monitored with suitable sensors and are comprised in the parameters to be controlled within a cultivation. These process variables have a significant influence on the effective productivity of mammalian cell lines [3].
In order to shorten development and set-up times for bioreactors, research and development are increasingly focused on single-use technology (single-use bioreactor; abbreviated: SUB). The great advantage of these systems is that complex and costly cleaning and sterilization procedures such as CIP (cleaning in place) and SIP (sterilization in place) are not required.
Automated high-throughput cultivation systems such as the ambr250 system (automated microscale bioreactor) help accelerate drug development. Twelve single-use bioreactors with a volume of 250 mL each are available within this system. An automated liquid handler is used for pipetting and sampling. Operations are controlled by central process software. The entire ambr250 system is located under a laminar flow box to ensure a sterile environment during operation.
Soft sensors have been used more and more industrially in the past two decades for the monitoring of process variables [6]. Said process variables can usually only be determined with high analytical effort or externally, i.e. off-line. Particularly, when employing single-use systems on a small scale, the required additional sensors often cannot be installed (space and availability or connectivity to the disposable bioreactor, possibly not gamma-irradiatable, etc.). Therefore, there is a lack of continuous data of important process variables, especially at small cultivation scale, which could be used for process monitoring and which would allow regulation of said process variables, i.e. process target parameters. The name “soft sensor” combines the two terms “software” and “sensor”. The term “software” signifies computer-aided programming of a model. The output of these models provides information about the cultivation, in particular real-time values of process variables that would otherwise not be available due to the lack of the respective physical sensors [5].
Basically, soft sensors can be divided into two classes, model-driven and data-driven soft sensors.
Model-driven soft sensors are based on theoretical process models. These require detailed knowledge of the ongoing process and describe said process using differential equations of state. This means that the dynamic behavior of a process has to be described using mechanistic models. Such models are primarily developed for the planning and design of process plants and focus on the description of ideal equilibrium states.
In data-driven soft sensors (what are termed black box models), models based on machine learning are used. These include empirical models, which use historical data to describe process variable correlations. Biological processes are complex and not yet fully understood with regard to each and any aspect of the metabolism of the cultivated mammalian cells.
The field of application of data-driven soft sensors within the pharmaceutical industry is large. As a rule, cultivations are monitored and recorded.
It has now been found by the current inventors that such historical data can be used to generate data-driven models for on-line estimation of off-line process variables.
Process variables that can normally only be determined with difficulty, with increased analytical effort and with the associated time offset are hereby primarily determined, i.e. made available, in real time. In addition, a robust and long-term stable on-line sensor system is not always available for the on-line monitoring of some process variables, such as biomass or certain substrate and product concentrations [7]. Although these parameters comprise important information about the cultivation process, they are only available at a limited number of time-points during the cultivation, i.e. those at which a sample is taken and analyzed off-line.
With small systems, such as the ambr250 systems, it is not possible to measure certain process variables, such as turbidity and/or conductivity, due to the lack of probe ports. Furthermore, due to their design, some common probes require a relatively large amount of space, which is not available in these low volume systems.
Machine learning is the application of algorithms to describe the basic structures of datasets. Machine learning can be divided into two parts: supervised and unsupervised learning.
Supervised learning is used when a model is prepared to make predictions for future or unknown data based on training data. It is supervised because the training dataset already contains information about the desired output value. One example is the filtering of spam emails [10]: Accordingly, the algorithm receives a dataset which consists of spam and non-spam messages, and which already contains the information about spam/non-spam with which it goes through the learning phase. With a new, unmarked email, the algorithm now tries to predict what type of message it is. Since this is a categorical target variable (spam/non-spam), the term “classification” is used.
In the case of unsupervised learning, an attempt is made to acquire relationships within the dataset but without presenting a target variable to the algorithm. The focus is on exploring the underlying structure of the data in order to extract meaningful information therefrom. The simplest example of this group is clustering. In this exploratory data analysis, an attempt is made to divide the dataset into meaningful subgroups without prior knowledge of the actual group membership.
If the target variable is a continuous variable, one speaks of regression or regression analysis. Variables that are used to describe a regression model are called independent or explanatory variables. Based on this, an attempt is made to find a mathematical relationship between input variables and target parameter in order to be able to predict results.
The method according to the present invention uses supervised learning, in which the target variable is described by means of regression.
The modeling can be divided schematically into the steps of preprocessing, learning, evaluating and estimating the target variables.
The preprocessing of the data is necessary to ensure that the model is able to correctly interpret the information it is based on. The dataset is prepared in the form of a feature matrix x, which contains m features (columns) and n rows; the features represent the explanatory variables. Each row contains the values of the features for a specific data point.
The target variables are arranged in a vector y. Each row of the feature matrix x(n) therefore contains the information for the associated value of the target variable y(n).
Statistical analyses are used to identify suitable features. Once suitable features have been identified and a corresponding feature matrix has been created, a subset (70-80% of the entire dataset) is made available to the model for learning. This subset is called the training dataset.
Typical data preprocessing might include providing the models with datasets in a standardized form. The data of each feature is thus given the property of a standard normal distribution with a mean of 0 and a standard deviation of 1. This increases the comparability of the features with each other and enables the learning algorithms to achieve their optimal performance [10].
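Such standardization can be sketched with scikit-learn's StandardScaler (hypothetical feature matrix):

```python
# Sketch: give each feature column a mean of 0 and a standard deviation of 1.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
X_std = StandardScaler().fit_transform(X)
# Each column of X_std now has mean 0 and standard deviation 1.
```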
Learning is the central part of model building. During learning, the model tries to understand and recognize the relationships between the data. Each model is subject to a mathematical formula with specific parameters. These adapt within the training process in order to describe the relationships between the data as well as possible.
Some models, such as neural networks, have other parameters that are not changed during the learning process. These are called hyperparameters. They influence the complexity of the models or the speed of the learning process and are determined before the training process. There is no fixed formula for choosing the right hyperparameters. Different models are therefore trained with different hyperparameters and then tested. Only then can it be judged which model is most suitable.
Randomized and grid-based algorithms are used to search for the optimal combination of hyperparameters. Each hyperparameter is represented by a list of different values. Models are trained in a grid search (GridSearch) with every possible combination from the respective lists. The required computing effort can be reduced by randomized searches: various random parameter combinations are used, and the computing effort can be predetermined. In one embodiment, the model is first executed with a randomized search for a rough estimate of the hyperparameters and then a grid search is carried out for the fine adjustment of the hyperparameters. The aim of learning is to train models so that bias and variance are kept as low as possible.
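The two-stage search (randomized first, then grid) can be sketched as follows; the hyperparameter lists and data are hypothetical:

```python
# Sketch: coarse randomized search followed by a fine grid search around the
# best coarse hyperparameters. All data and parameter lists are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Step 1: rough estimate with a randomized search (fixed computing budget n_iter).
coarse = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200],
                         "max_depth": [2, 4, 8, None]},
    n_iter=5, cv=3, random_state=0)
coarse.fit(X, y)

# Step 2: fine adjustment with a grid search near the best coarse combination.
fine = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [coarse.best_params_["n_estimators"]],
                "max_depth": [coarse.best_params_["max_depth"]]},
    cv=3)
fine.fit(X, y)
```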
Models often learn the relationships within the training data better than they subsequently predict on an unknown dataset. This behavior is called over-fitting. Accordingly, the model has memorized the training dataset and describes the relationships in new data with insufficient accuracy. Similar behavior can also be attributed to an excessive variance. Here the model uses too many input parameters for the dataset to be trained, leading to a complex model that only fits this dataset with a high data variance. Accordingly, the model has learned the noise of the data without being able to map the actual relationship.
If, on the other hand, a model is not complex enough to be able to react to changes in the test dataset, this is called under-fitting. Then the bias is too great and the model can only imprecisely map the relationships of the training data to the test data.
Already during learning, k-fold cross-validations of the training dataset offer the possibility to avoid over-fitting of the models [11]. The training dataset is divided into k subsets. Then, k−1 subsets are used to train the model and the remaining subset is used as the test dataset. This procedure is repeated k times. In this way, k models are trained and k estimates of the target variable are obtained.
A performance estimate E of the model is generated for each run. As a performance estimate for regression, for example, the mean square deviation is used as a measure of the error. In practice, 10-fold cross-validations have proven to be a good compromise between bias and variance in most cases [12]:

E = (E1 + E2 + . . . + Ek)/k
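A 10-fold cross-validation with the mean squared error as performance estimate E can be sketched as follows (synthetic data):

```python
# Sketch: k = 10 models are trained; each of the k subsets serves once as the
# test dataset, and the k error estimates are averaged.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 2] + rng.normal(scale=0.1, size=300)

scores = cross_val_score(RandomForestRegressor(n_estimators=50, random_state=0),
                         X, y, cv=10, scoring="neg_mean_squared_error")
E = -scores.mean()  # mean squared error, averaged over the 10 folds
```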
Artificial neural networks (ANN) were described in 1943 by Warren McCulloch and Walter Pitts with a mathematical model of the neuron. Information transfer in biological systems could be made understandable in this way [13]. Frank Rosenblatt was then able to link the McCulloch-Pitts model of an artificial neuron with a learning rule and, thus, described the perceptron [14]. The perceptron still forms the basis of an ANN.
A simple perceptron has n inputs x1, . . . , xn∈IR, each with a weighting w1, . . . , wn∈IR. The output is represented by o∈IR. The processing of the input signals with appropriate weighting is the propagation function (input function) σ,

σ = w1x1 + . . . + wnxn,

which describes the network input of a neuron. Via the activation function ϕ,
o=ϕ(σ),
the output o of a perceptron is then determined. Various functions can be used for ϕ, which can lead to activation of the perceptron.
The activation function, thus, calculates how strongly a neuron is activated depending on the threshold value and the network input [15]. If several of these neurons are interconnected in a suitable structure, the complex relationships between the input layer and the output layer can be mapped. The simplest form of such structural interconnections of simple neurons are feed-forward networks. These are arranged in layers and consist of an input layer, an output layer and, depending on the structure, several hidden layers.
In feed-forward networks (so called multilayer perceptrons), every neuron in one layer is connected to all other neurons in the next layer. These networks, thus, propagate the information content created through the network in a forward direction. Each neuron weights the incoming signal with an initially randomly selected weight and adds a bias term. The output of this neuron corresponds to the sum of all weighted input data. Depending on the number of neurons in a layer and the number of hidden layers, the complexity of the neural network can be determined.
Multi-layered feed-forward networks, which contain error feedback (backpropagation), are mostly used in supervised learning by an ANN [16].
The training of such a neural network can be divided into three steps:
In the first step, an input is made to the input layer of the network, which is propagated layer by layer through the network until there is an output from the network. The output of the network is compared with the expected value in the second step and the error of the network is calculated using an error function. Depending on the current weighting, each neuron within the hidden layers contributes to the calculated error to different extents. In the third step, the error is propagated backwards through the network, wherein the weighting is adjusted depending on the contribution of the weighting of an individual neuron to the error. The aim of the backpropagation algorithm is to minimize the error; it usually uses a gradient descent method [17]. With this method, the quadratic distance between the output of the network and the expected output is calculated as an error function:

Err = ½·Σi (ti − oi)²,

wherein ti is the expected output and oi is the actual output of the network.
In order to calculate the contribution of the weighting of each neuron to the error, the error function Err must be differentiated with respect to the considered weight wij. Accordingly, only activation functions that are continuous and differentiable can be used here [17]. This determines the weight adjustment delta that will be used in the next iteration step. The relationship can be described mathematically as follows:

Δwij = −η·∂Err/∂wij,

wherein η is the learning rate.
The learning rate, together with the number of iterations, is a hyperparameter that is established before training the model. These steps are repeated until the maximum number of iterations or a defined error value has been reached, so that a good result can be achieved for an unknown input.
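The iteration of forward pass, error calculation and gradient-based weight adjustment can be sketched for a single linear neuron (no hidden layers, identity activation; all data synthetic):

```python
# Sketch: gradient descent on the quadratic error of one linear neuron.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                      # expected outputs

w = np.zeros(3)                     # initial weights
eta = 0.1                           # learning rate (hyperparameter)
for _ in range(200):                # number of iterations (hyperparameter)
    o = X @ w                       # forward pass: network output
    grad = -(y - o) @ X             # derivative of Err = 1/2 * sum((y - o)^2)
    w -= eta * grad / len(X)        # weight adjustment along the descending gradient
```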
In addition, the random forest (RF) algorithm can be used in machine learning for regression problems [18]. RF learns through a multitude of decision trees and thus belongs to the category of ensemble learners. A decision tree spreads out from the root (top node, no predecessor). Each node divides the dataset into two groups based on a feature. The successors of the root can be leaves (no successors) or nodes (at least one successor). Nodes and leaves are connected by an edge. In the case of regression problems, the prediction of a leaf corresponds to the mean value of the target variable of the training data assigned to that leaf [19].
In a preferred embodiment, the RF uses the bagging principle (bootstrap aggregation principle) according to Breiman [18] to create a suitable training set, wherein the training set is created by a sampling from the entire training dataset with replacement. Some data may be selected multiple times, while other data are not selected as training data. The quantity of the training set always corresponds to the quantity of the entire training dataset. Each selected training set is used to generate a decision using a decision tree (classifier). The decisions of all training sets are then averaged, whereby a majority decision determines the final classification. The generation of the bootstrap samples thus creates a low correlation between the individual classifiers. Furthermore, the variance of the individual classifiers can be reduced and the overall classification performance increased [18].
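The bootstrap sampling underlying the bagging principle can be sketched as follows:

```python
# Sketch: draw, with replacement, a training set of the same size as the full
# training dataset; some points appear multiple times, others not at all.
import numpy as np

rng = np.random.default_rng(4)
indices = np.arange(10)                                    # full training dataset
sample = rng.choice(indices, size=len(indices), replace=True)
out_of_bag = np.setdiff1d(indices, sample)                 # data never selected
```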
In a preferred embodiment, during the creation of the tree the feature used to decide a split (division of a node) is the one that makes the clearest decision within a random selection of the features of the dataset. The split is thus no longer selected as the best split in terms of all features, but only as the best within a random selection of the features. As a result of this randomization, the bias (distortion, systematic error) of the individual trees increases in the course of the creation. Because of the formation of the mean value of all trees contained in the RF, the variance decreases. The decreasing variance has a greater added value than the increase in the bias, which leads to an increased accuracy of the model [20].
Furthermore, overfitting of the models is largely prevented in RF predictions, since the average of all individual decisions is always considered [18].
The XGBoost (eXtreme Gradient BOOSTing) uses an ensemble of regression trees as a model-forming basis. Both the bagging principle already described and a special boosting technique are used to train the ensemble for the most accurate prediction possible. In simple terms, the boosting technique can be seen as a gradient descent method that combines many weak learners [21]. These weak learners are usually hardly more precise than random guessing and are grouped together into a strong learner in the course of creating the ensemble. A typical example of such a weak learner is a simple regression tree with only one node. The principle of the boosting algorithm is to select training data that are difficult to classify in order to learn from these poorly classified objects with the weak learners and, thus, improve the performance of the ensemble. Due to the complexity of XGBoost, the algorithm is considered a black box. However, due to its scalability and speed in solving problems, the algorithm performs very successfully in direct comparisons of different machine learning models [22].
The method implemented by XGBoost combines a gradient descent method with a boosting technique, which is described below following the original literature by Tianqi Chen and Carlos Guestrin, “XGBoost: A Scalable Tree Boosting System” [22].
With an ensemble that consists of K decision trees, the model can be described according to:

ŷi = f1(xi) + . . . + fK(xi),

wherein fk is the prediction of a single decision tree and xi is the feature vector for the i-th data point; seen across all decision trees, the prediction is the sum of the individual tree outputs. In order to train the model, a loss function L is optimized. In the case of regression problems, the RMSE (root mean squared error) is used:

L = √((1/n)·Σi (yi − ŷi)²)
Regularization is an important part that prevents overfitting of the models:

Ω = γ·T + ½·λ·Σj wj²,

wherein T is the number of leaves, wj is the score achieved by the j-th leaf, and γ and λ are regularization parameters. If regularization and loss function are brought together, the basic objective function of the model can be formulated as
Obj=L+Ω,
wherein the loss function determines the predictive power and the regularization controls the complexity of the model. The objective function is optimized using the gradient descent method. Given an objective function Obj(y,ŷ) to be optimized, the gradient

∂ŷ Obj(y,ŷ)

is calculated in each iteration, and ŷ is changed along the descending gradient so that the objective function Obj is minimized.
To create the regression trees, internal nodes are divided based on a feature of the dataset. The resulting edges define the value range that allows the datasets to be divided. Leaves within the regression trees are weighted, wherein the weight corresponds to the predicted value. The number of iterations indicates how often the process of bagging and boosting is repeated. The XGBoost algorithm provides a very extensive list of hyperparameters, which contribute significantly to the formation of a good model.
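Gradient-boosted regression trees can be illustrated with scikit-learn's GradientBoostingRegressor, used here as a simplified stand-in for the XGBoost library (synthetic data; XGBoost itself exposes a much larger list of hyperparameters):

```python
# Sketch: an ensemble of shallow regression trees (weak learners) trained by
# boosting; n_estimators corresponds to the number of boosting iterations K.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)

model = GradientBoostingRegressor(n_estimators=100, max_depth=2,
                                  learning_rate=0.1, random_state=0)
model.fit(X, y)
r2 = model.score(X, y)  # fit quality on the training data
```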
Irrespective of the model used, correlations can be used to evaluate and represent linear relationships between two variables. The Pearson correlation coefficient r (or r²) provides a common measure for evaluating this relationship. It is dimensionless and is calculated according to:

r = sxy/(sx·sy) = Σi (xi − x̄)(yi − ȳ) / √(Σi (xi − x̄)²·Σi (yi − ȳ)²)

and varies within the range −1 ≤ r ≤ +1. The numerator describes the sum of the products of the deviations of the two variables x and y from their means, which corresponds to the empirical covariance sxy. The denominator is the root of the product of the individual empirical standard deviations sx and sy. The mean values of the quantities to be correlated are calculated as

x̄ = (1/n)·Σi xi and ȳ = (1/n)·Σi yi.
In the correlation analysis, it should be noted that only linear relationships can be captured. The Bravais-Pearson correlation coefficient is therefore not suitable for describing non-linear relationships. This can mean that despite a correlation coefficient of 0.0 ≤ |r| ≤ 0.2, there is a strong non-linear dependency between the variables.
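This limitation to linear relationships can be demonstrated numerically (hypothetical data):

```python
# Sketch: the Pearson coefficient detects a linear relationship (r = 1), but a
# purely quadratic dependency on symmetric data yields r close to 0.
import numpy as np

x = np.linspace(-1.0, 1.0, 201)
r_lin = np.corrcoef(x, 2.0 * x)[0, 1]    # perfectly linear relationship
r_par = np.corrcoef(x, x ** 2)[0, 1]     # strong but non-linear dependency
```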
Through mutual information, non-linear dependencies of two random variables can be determined. It is used in information theory [24]. With the help of the probabilities, the information content of a random variable compared to a second random variable is described. The basic formal relationship is:

I(X;Y) = Σx Σy p(x,y)·log(p(x,y)/(p(x)·p(y)))
This approach was accordingly expanded by Kraskov et al. and Ross et al. so that it could be used for the selection of suitable, continuous variables [25] [26].
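The basic relationship can be illustrated numerically for two discrete random variables; the joint probability table below is purely hypothetical:

```python
import numpy as np

# Hypothetical 2x2 joint probability table p(x, y) for two binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# I(X;Y) = sum_x sum_y p(x,y) * log( p(x,y) / (p(x) p(y)) )
mi = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))
```

For this table the mutual information is about 0.19 nats; for a table of independent variables, p(x,y) = p(x)·p(y) everywhere and the sum is exactly zero.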
Appropriate metrics must be used to compare different models. With the help of these, it is possible to make a statement about the accuracy with which a model can describe a target variable.
The coefficient of determination R2 indicates which proportion of the variance of the target variable y can be described by the model. The coefficient of determination can be calculated according to:

R2 = 1 − Σi(yi−ŷi)2 / Σi(yi−ȳ)2

wherein ŷi is the estimated target variable of the i-th example and yi is the associated true value.
The root mean squared error (RMSE) is another statistical measure that can be used to determine the model quality. Here the root of the mean squared distance between the actual and the estimated value is calculated:

RMSE = √((1/n)·Σi(yi−ŷi)2)
By squaring the error and then taking the root, the RMSE can be interpreted as the standard deviation of the variable to be estimated, where n is the number of observations and ŷi is the estimated value of a target variable y. The RMSE is an absolute error value that delivers values of different sizes depending on the target parameter examined. Therefore, it makes sense to relate the RMSE to the mean:

rRMSE = RMSE/ȳ

Thus, the RMSE can be calculated relative to the mean true value ȳ.
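The three quality measures can be sketched in a few lines; the sample values are purely illustrative:

```python
import numpy as np

def r2_score(y, y_hat):
    # R^2 = 1 - SS_res / SS_tot: share of the target variance explained by the model
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def rmse(y, y_hat):
    # root of the mean squared distance between true and estimated values
    return np.sqrt(((y - y_hat) ** 2).mean())

def relative_rmse(y, y_hat):
    # RMSE related to the mean true value, giving a scale-free error measure
    return rmse(y, y_hat) / y.mean()

# Illustrative true and estimated values
y = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([11.0, 19.0, 31.0, 39.0])
```

For this toy example the RMSE is 1.0 (same unit as y), the relative RMSE is 0.04, and R2 is 0.992, showing how the relative error makes targets of different magnitudes comparable.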
With the method according to the present invention, it is possible to determine the cell growth, i.e. the timeline of the cell density, and the timeline of certain metabolites, in particular glucose and lactate, in real time during a cultivation, especially on a small cultivation scale, from on-line process variables. With the method according to the current invention it is, thus, possible to provide real-time values for process variables that were previously not available in real time but only off-line. This represents an improvement over the conventional determination methods for cell growth and the timeline of certain metabolites, in particular glucose and lactate, insofar as the method according to the current invention does not require sampling from the cultivation medium.
In a preferred embodiment, the method according to the present invention is used to determine the cell density, the glucose concentration and the lactate concentration in a fed-batch cultivation of mammalian cells with a cultivation volume of 300 mL or less from on-line process variables, wherein the method is carried out without sampling, i.e. feedback control sampling.
The method according to the present invention allows cultivations to be carried out fully automatically, i.e. without sampling, on a small scale, i.e. with a cultivation volume of 300 mL or less, in which relevant process variables, such as cell density, cannot be determined on-line but only off-line.
The method of the present invention is particularly suitable for monitoring and controlling cultivations of mammalian cells on a small scale.
In the method according to the current invention, a method for determining the live cell density, glucose and lactate concentration as the target parameter in a CHO cell cultivation is provided, wherein the method employs a data-based soft sensor. Machine learning models are used to describe the different target variables.
The present invention is based, at least in part, on the finding that the selection of the process variables used for the model generation has a significant impact on the quality of the determined target process variables.
Furthermore, the present invention is based, at least in part, on the finding that the type of division, i.e. allocation, of the existing datasets into a training dataset and a test dataset influences the model quality.
Furthermore, the present invention is based, at least in part, on the finding that the type of antibody produced influences the choice of the optimal target parameter.
The method according to the current invention is described below using 155 exemplary datasets, which have been obtained from cultivations in the ambr250 system. This should not be understood as limiting the teaching according to the current invention or the method according to the current invention, but rather as an exemplary application of the teaching according to the current invention. Other datasets, which have been generated with the same or a different cultivation system, can equally well be used for and in the method according to the invention.
The 155 datasets were analyzed and examined for suitable features. Corresponding interpolation strategies were used to map the target parameters in such a way that the selected models could provide values for all target parameters at discrete points in time. The models were assessed with regard to errors and model quality. The methods based thereon allowed for the provision of a robust and precise model of the respective target variable/process variable.
The molecular formats of the antibodies produced in the cultivations in the datasets differed. An overview of the various projects and molecular formats as well as the respective number of cultivations is shown in Table 1 below.
The data, i.e. the on-line parameter set, associated with the entire cultivation process, and the associated date and time stamp were used for each cultivation. The data density of the different process values varied with regard to the timeline. These deviations in the data density can be attributed to the fact that, due to the system, a new data point was only recorded for an on-line parameter if the measured value was changed by a delta that was specifically defined for each measured value. In order to make continuous process data available and to ensure that the runs can be compared with one another, the corresponding on-line parameters were interpolated for all missing time stamps.
It should be noted for the on-line process variables that too much smoothing of the data would lead to a loss of fluctuations in the measured values. However, this noise also represents any process-relevant changes that are taking place and are contained in the process values as information. It is therefore important not to oversmooth the process values and to make changes in the process course accessible even after interpolation.
The off-line data contains different numbers of analysis values depending on the number of samples (between 8 and 13) during a cultivation. Each dataset contains a date and time stamp for each data point and the associated analysis values of the off-line parameters.
The preprocessing by interpolation of the on-line and off-line data results in a dataset, which contains the same number of data points for all process variables, regardless of whether they were on-line or off-line process variables, at the same times. The analysis was based on the interpolated dataset. Such an interpolation is not necessary if data points are available at the same frequency and at the same times for all on-line and off-line process variables.
Due to the preprocessing of the available on-line and off-line data, the different time profiles of the individual process variables resulting from different measurement frequencies are standardized to a uniform time profile, i.e. a single timeline. Bad values caused by technical issues or process management are identified and deselected or corrected, and existing time gaps are closed, so that all process variables in one dataset for a cultivation, and all datasets for all cultivations, are uniform with regard to the time stamps and the number of process variables.
So that the fluctuations of the measurement signals caused by switching on the control at the beginning or switching it off at the end of the cultivation do not falsify the model formation, the data that was collected in the first and the last 12 hours of the cultivation was not used. In the specific example, this means that a time range from day 0.5 to day 13.5 was used. This guarantees that a change in the process variables can only be attributed to processes in the cell culture. The interpolation of the on-line data was carried out for the entire dataset.
As can be seen in
For the off-line data, the obtained analysis values (VCD, VCV, glucose, lactate) were fitted with three different interpolations.
The respective coefficient of determination, R2, was calculated to evaluate the individual interpolations for the VCD. The univariate spline achieved the highest R2 value here, but tended towards significant overfitting. Accordingly, the univariate spline describes almost every measured value exactly, but does not depict a typical growth curve of a biological system. On the other hand, there are smaller differences between the Peleg fit and the polynomial fit. However, the Peleg fit can describe the different growth phases of biological systems much better and is therefore used for interpolation of the target variable for VCD [27].
The interpolation of the lactate and glucose profiles showed that the univariate spline maps the off-line data with better R2 and describes the profile in the case of lactate much better. Since the polynomial fit interpolates negative values for lactate from day 10, the interpolations of the univariate spline were defined as target vectors y for lactate. For glucose, however, the polynomial fit (third degree) was used to describe the target variable (glucose: univariate spline (R2=0.999) and polynomial fit (R2=0.958); lactate: univariate spline (R2=0.999) and polynomial fit (R2=0.959)).
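The third-degree polynomial fit used for glucose, followed by evaluation on a uniform timeline, can be sketched as follows; the sample days and concentrations are hypothetical:

```python
import numpy as np

# Hypothetical off-line glucose samples (day, g/L) from one cultivation run.
days = np.array([0.5, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 13.5])
glc  = np.array([6.0, 5.1, 4.0, 3.2, 2.8, 2.5, 2.3, 2.2])

coeffs = np.polyfit(days, glc, deg=3)   # third-degree polynomial fit
fit = np.poly1d(coeffs)

# Evaluate the fit on the uniform timeline used for the analysis
# (1/10-day resolution, day 0.5 to day 13.5).
timeline = np.arange(0.5, 13.5 + 1e-9, 0.1)
glc_interp = fit(timeline)
```

The fitted polynomial then supplies a glucose value at every discrete time stamp, so the off-line target variable can be paired with the interpolated on-line features.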
Furthermore, datasets, which contained too few off-line data points for preprocessing (three or less), were no longer used for the analysis. This was the case for two datasets. The entire interpolated and adjusted dataset therefore contained 153 cultivations.
Since the interpolated datasets with the maximum resolution of five minutes contained a large number of data points, the analysis was carried out with a resolution of only 1/10 day in order to reduce the computational effort. The JMP® program can be used for this.
In the scatter plots in
Looking at the value of ‘O2.PV’ as an example, the calculated coefficients for the interpolations are very close to each other (0.9547; 0.9490; 0.9490).
The correlation analysis was accordingly carried out on the entire dataset. Table 3 below shows the Bravais-Pearson correlation coefficients determined in this way.
Compared with the correlation analysis on a single ambr250 run (see the previous Table 3 and
When independent variables correlate with one another, one refers to multicollinearity. As shown here using the example of ‘O2.PV’, there is a clear linear relationship between the two best correlation coefficients for ‘N2.PV’ and ‘O2.PV’ from
Based on the calculation of the information content and the results of the correlation analysis, the best ten process variables (CHT.PV, ACOT.PV, FED2T.PV, GEW.PV, CO2T.PV, ACO.PV, AO.PV, LGE.PV, O2.PV and N2.PV) are selected and a corresponding feature matrix X is created. The matrix contains the interpolated data of the available dataset. A resolution of five minutes for the feature (f1 . . . f10) and the duration of the cultivation in hours were selected as an additional column in the matrix:
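The assembly of such a feature matrix can be sketched as follows; the signals here are random stand-ins for the interpolated on-line parameters, and the five-minute spacing matches the resolution described above:

```python
import numpy as np

n_points = 20                       # time stamps at five-minute resolution (toy size)
rng = np.random.default_rng(1)

# Hypothetical interpolated on-line signals f1..f10 (stand-ins for CHT.PV, ACOT.PV, ...).
features = {f"f{i}": rng.normal(size=n_points) for i in range(1, 11)}

# Cultivation duration in hours as an additional column.
time_h = np.arange(n_points) * 5 / 60.0

# Feature matrix X: one row per time stamp, ten features plus the time column.
X = np.column_stack([features[f"f{i}"] for i in range(1, 11)] + [time_h])
```

Each row of X corresponds to one time stamp and each column to one feature, which is the layout the regression models are trained on.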
The division into training and test datasets was done in such a way that these were exclusively datasets from cultivations from Project 2. The target variable ‘VCD’ was divided according to the distribution of the feature matrix.
To check the quality of the models obtained, the relative frequency density of the errors on the entire test dataset was calculated. The histograms of the prediction on the entire test dataset of the model determined using the MLPRegressor (a), random forest (b) and XGBoost (c) for the target variable VCD showed on the X-axis the error of the estimated VCD values compared to the true values, and on the Y-axis the relative frequency of the errors. All three distributions show a left-skew tendency, which indicates an underestimation of the VCD. Furthermore, examination of all the histograms shows that the estimates of all three models yield comparable results. The XGBoost shows the most homogeneous distribution of the calculated errors, although here also an overestimation of the target variable can be seen.
For each model, the RMSE and R2 were calculated based on the entire test dataset. Both values relate to the Peleg fit of the target variable VCD. The results of the three models are summarized in Table 5 below.
All models achieved comparable results with regard to the RMSE and the coefficient of determination.
If one examines some specific datasets that were determined with random forest (best model), it can be seen that it is not possible to accurately map the Peleg fit of the VCD over the entire cultivation period (see
It has now surprisingly been found that the exchange of features in the feature matrix with a high information content for those with a significantly lower information content but with a still determinable information content can significantly increase the quality of the prediction.
It has been found that the extension of the matrix by the features ‘CO2.PV’, ‘FED3T.PV’, ‘OUR’ and ‘PH.PV’ and the deletion of the redundant feature ‘O2.PV’ (gassing with N2 and O2) leads to an improved quality of the prediction.
The improved feature matrix contains the following 14 features: ‘Time’, ‘ACO.PV’, ‘ACOT.PV’, ‘AO.PV’, ‘CHT.PV’, ‘CO2.PV’, ‘CO2T.PV’, ‘FED2T.PV’, ‘FED3T.PV’, ‘GEW.PV’, ‘PH.PV’, ‘N2.PV’, ‘IGE.PV’ and ‘OUR.PV’.
Furthermore, it has been found that the selection or the division into training and test datasets has an influence on the quality of the prediction.
When comparing the training and test datasets already selected with regard to the target variable, it was found that the distribution of the VCD for the training dataset consisting of cultivations from Project 2 had a mean value μTrain=84.60, with a standard deviation of σTrain=48.62, while the test dataset had a mean value of μTest=64.22, with a standard deviation of σTest=38.02.
It has been found that it is disadvantageous to take training datasets from only one project if the prediction is to be made for cells that express structurally different proteins. It has been found that it is advantageous to randomly distribute the training datasets over the entirety of the existing datasets.
In the present example, in order to distribute the datasets more homogeneously, 30 random numbers between 0 and 152 (since there were 153 datasets) were generated. The numbers stood for one cultivation run each. The random numbers were repeatedly generated until comparable mean values and standard deviations regarding the division between test and training dataset could be achieved with the trained model. The final division resulted in μTrain=80.72 with σTrain=47.11 and μTest=80.11 with σTest=48.70 and was used as the split ratio of the two datasets in the further course.
Thus, in one embodiment of the method according to the invention, the existing, preferably preprocessed, datasets are divided into a training dataset and a test dataset, wherein the training dataset is 70-80% of the total dataset (in this example 80% and, thus, 123 cultivation runs) and the test dataset contains 20-30% of the data of the entire dataset (in this example, 30 randomly selected cultivations of the entire dataset that were validated as described above were available for the validation of the models).
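The repeated random drawing of test runs until the train/test distributions match, as described above, can be sketched as follows; the per-run VCD means and the tolerance are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical per-run mean VCD values for 153 cultivation runs.
vcd_mean_per_run = rng.uniform(20, 150, size=153)

def split_until_matched(values, n_test=30, tol=5.0, max_tries=10_000):
    """Redraw random test indices until train/test mean and std are comparable."""
    for _ in range(max_tries):
        test_idx = rng.choice(len(values), size=n_test, replace=False)
        mask = np.zeros(len(values), dtype=bool)
        mask[test_idx] = True
        train, test = values[~mask], values[mask]
        if abs(train.mean() - test.mean()) < tol and abs(train.std() - test.std()) < tol:
            return np.flatnonzero(~mask), test_idx
    raise RuntimeError("no matched split found")

train_idx, test_idx = split_until_matched(vcd_mean_per_run)
```

The result is a 123/30 split whose mean and standard deviation of the target variable agree within the chosen tolerance, mirroring the μ and σ matching described in the example.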
The models were then trained and tested with the extended feature matrix as well as with the new distribution of the datasets. The strategy for optimizing the hyperparameters as outlined above has been retained for this. From the corresponding histograms of the estimates for the VCD with the newly divided training and test datasets, it can be seen that the distribution of the errors for all three models have become significantly narrower, which can be attributed to a more precise estimate of the target parameter (
All three models achieve an error distribution that fluctuates more closely around the true value (X-axis at 0) of the target variable. Here too, the histogram of the XGBoost shows the most homogeneous error distribution. The histogram of the random forest shows minor errors over the entire range. If the two histograms (a) and (c) are compared with each other, the XGBoost estimates the exact target value more often than the MLPRegressor. However, given the narrow width of the MLPRegressor's distribution at low error levels, approximately the same accuracy can be inferred for both models.
All three models are able to achieve closely comparable results.
Nearly ideal estimates of the target variable based on the Peleg fit of the raw data are thus achieved. Looking at the entire test dataset, all models are able to achieve good results with regard to the R2 and RMSE with the split ratio of the datasets as described above.
The glucose values fitted by a third degree polynomial fit were used as the target parameter for the estimation of the glucose concentration. The feature matrix used for the training contained the same features as for the VCD. The same division into training and test datasets was also used.
Just as for the VCD, the histograms show comparable results in terms of errors. Here, too, the XGBoost can most often achieve a small error between the actual and estimated values. The random forest histogram also shows minor errors between the interpolated value of the target variable and the estimate, which are distributed homogeneously around the actual value of the glucose. The MLPRegressor shows the largest errors compared to the other two histograms.
For the estimation of the lactate concentration, the values of the lactate fitted by the univariate spline method were used as the target parameter. The feature matrix used for the training contained the same features as for the VCD and glucose. The same division into training and test datasets was also used. The histograms show different results regarding the errors (
If the histogram of the MLPRegressor is considered, an estimate with small errors is not possible as often as for the other two models. The random forest and XGBoost, on the other hand, have very narrow distributions. It seems that for some estimates of the target variable, very good predictions can be made with few errors, but these quickly lead to more significant errors within the entire test dataset. The neural network has the most homogeneous error distribution here.
Table 8 below shows the results of the lactate evaluation for RMSE and R2 for all models.
For the validation, a study was first carried out to determine which of the models is most efficiently able to describe the interrelationships of the features on the test dataset. For this purpose, the models were initially provided with only ten datasets for learning. As the process progressed, the number was increased by ten datasets at each step. This resulted in twelve training processes in which the models received between 10 and 120 datasets. After each training session, the target variable was estimated based on the test dataset and the respective RMSE was calculated. The test dataset again consisted of the 30 randomly selected and validated datasets, as described above. VCD was chosen as the target variable. This resulted in the learning behavior described in
As
A detailed evaluation of the estimate of the models regarding the prediction of the VCD for the 30 cultivations of the test dataset was carried out. Despite the good results across the entire dataset (histograms, coefficient of determination, RMSE), it was found that some predictions still showed a significantly larger deviation.
The insufficient accuracy of the estimate was increasingly observed in cultivations from Projects 1 and 3. In cultivations from both projects, the cultivated cells produced a complex molecular format.
It was found that the VCD in IgG-based formats, which have the characteristic Y-shape of a natural IgG antibody or largely retain it (Projects 2 and 4), is higher on average than in cells that have a complex molecular format as the target product (Projects 1 and 3) and that the calculated cell diameter had a higher value for projects with complex molecular formats.
It was found that the relationship between a higher VCD and a smaller cell diameter for IgG formats, as well as the smaller VCD and larger cells in complex protein formats, causes the inaccurate prediction of VCD.
It has also been found that the viable cell volume (VCV) represents a more suitable target variable than the VCD, not only for those cultivations in which a complex antibody format is produced, but also for cultivations in which a Y-shaped IgG antibody is produced.
The VCV is calculated from the live cell density and the measured mean cell diameter d, assuming spherical cells, using the following formula:

VCV = VCD·(π/6)·d3

The VCV is therefore a better approximation for describing the living biomass in a cultivation than the VCD.
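A minimal sketch of this calculation, assuming spherical cells so that the single-cell volume is (π/6)·d3; the numeric values are hypothetical:

```python
import numpy as np

def viable_cell_volume(vcd_per_ml, diameter_um):
    # Assumed spherical cells: single-cell volume = (pi/6) * d^3 in µm^3.
    single_cell_vol_um3 = (np.pi / 6.0) * diameter_um ** 3
    # 1 µm^3 = 1e-12 mL, so the result is the viable cell volume fraction (mL/mL).
    return vcd_per_ml * single_cell_vol_um3 * 1e-12

# Hypothetical values: 1e7 cells/mL at a mean diameter of 15 µm.
vcv = viable_cell_volume(1e7, 15.0)
```

Because the diameter enters cubed, a modest increase in cell size (as observed for complex molecular formats) raises the VCV considerably even at a lower VCD, which is why the VCV tracks the biomass better than the cell count alone.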
Since the calculated values of the VCV, like all other off-line parameters, were only available at the times of the sampling, the new target parameter was fitted with a third-degree polynomial fit. The models were then trained and evaluated for the new target variable, as already described for the other target parameters above.
The RMSE and the coefficient of determination were used to evaluate the individual models. In summary, the best models with 14 features achieved the following results:
For the target variable VCV, the calculated errors and coefficient of determination of the individual models are summarized in the following Table 10:
By using the target variable VCV instead of VCD, all models were able to achieve coefficients of determination above 0.9. The improvement in the models can be seen in both a lower RMSE and higher R2 values.
In order to demonstrate the improved results in the comparison between the live cell density and the cell volume, scatter plots were obtained which represent both the estimate for the entire training set and the test dataset. Random forest estimates the best results for VCD and VCV. The two scatter plots are shown in
If the two scatter plots are compared with each other, it can be seen that the prediction of the VCV is closer to the ideal estimate and has a significantly smaller spread for the test and training datasets than the prediction of the VCD. If only the training data (blue dots) are considered, the models learn the relationships of the features to the cell volume better than to the live cell density. These features therefore allow a more precise estimate of the cell volume for the entire test dataset for all trained models.
The extent to which the division of the antibodies into different groups and the use of only limited datasets with regard to the training of the method influence the quality was investigated as follows.
If all four projects are considered separately with regard to the course of the target parameter VCV, the box plot shown in
The different combinations have shown that the prediction using the random forest method has achieved the best results, i.e. the lowest RMSE.
The RMSE showed a significant improvement (reduction) in all combinations of the training or test datasets when the VCV was used as the target parameter compared to the VCD.
The different combinations of training and test datasets showed that the selection of the datasets depending on the molecular format influences the RMSE of the target parameter. In the case of model training with datasets of a standard format and estimation of the VCD or VCV of complex formats, this combination achieves the highest RMSE. Training with datasets of the complex molecular formats and prediction of the VCD or VCV led to a smaller RMSE. The smallest RMSE could be achieved if a mixed dataset of standard Y-IgG and complex molecular formats was used.
Furthermore, the models were evaluated with regard to the estimate of the training and test dataset in order to check whether the models already trained had been overfitted. The trained models for the target variable VCV were estimated on the test and training dataset. The estimated values were evaluated according to the RMSE and the difference between the test and training dataset was then shown in the form of a bar chart (
The prior art uses parameters such as glucose, lactate, ammonia, VCD etc. (all of which are off-line parameters) as input variables for a random forest regression analysis to explain the dynamic behavior of intracellular activities, but not for the prediction or modelling of off-line parameters.
In contrast to the prior art, in the current invention the parameters used for the machine learning models are exclusively on-line parameters (which are used to control the fermentation conditions).
The current invention, thus, makes use of typical on-line measurement parameters, which are generated throughout the cultivation, and a statistical model to estimate parameters like VCV, glucose, etc., without the need for an additional sensor or sampling.
By interpolating existing on-line and off-line cultivation datasets, a standardized and uniform dataset could be obtained, which was used for a model generation to predict target parameters that are only available off-line.
For the off-line data, which was considered a target variable in the further course, it was essential to find an interpolation that can representatively describe the course for the respective target parameter. Since the live cell density relates to the growth course of a biological system, conventional interpolations such as the polynomial fit or the univariate spline fit can often only describe this target parameter with insufficient accuracy. An incorrect extrapolation would lead to a falsified description of the target variables. Although the selected interpolations yielded comparable results with regard to R2, the selected interpolation according to M. Peleg [27] is best able to describe growth processes of cell culture processes. The background of the interpolation strategy lies in the combination of a continuous logistic equation for the description of the growth of the cells and the mirrored logistic equation for the description of the death behavior (Fermi's equation).
The result of the correlation analysis is only marginally affected by the choice of the interpolation strategy.
The accuracy of the estimate for the VCD target variable could be increased by an adapted split ratio of the datasets into training datasets and test datasets. For this purpose, the validation dataset was selected with regard to the distribution of the target variables so that the mean values and standard deviations of the two datasets differed as little as possible from one another. The goal was not to artificially generate a better dataset for prediction. Rather, it was assumed that the test dataset previously generated could not describe the overall dataset with sufficient accuracy. Reference is hereby made to cross-validation as a corresponding method.
The calculation of the cell volume and the related relation to the size of the cells could describe a better approximation of the biomass than the VCD, which, thus, resulted in VCV as a new target parameter.
The calculated cell volume as an approximation of the biomass provided a higher information content about the process characteristics than the previously employed live cell density of the culture determined by analysis of the samples. The average cell volume of the culture can be concluded from the measured average diameter of the cells. It could be shown that the size of the cells, especially those with complex target molecules as a product, increases continuously with increasing cultivation time. The live cell density, however, cannot map this relationship. In the end, the metabolic activity of the cultured cells can be better described by the viable cell volume than by the live cell density.
For real-time determination of the target parameters, the estimate should be made at a defined interval, for example ten minutes. For CHO cells, this interval is an acceptable resolution because they have a doubling time of approximately 24 hours.
The following Examples and Figures serve only to illustrate the invention. The scope of protection is defined by the pending claims. However, modifications to the disclosed embodiments can be made without departing from the principle according to the invention.
The programming language Python was used in the Spyder development environment for the entire work. The implementation was carried out in object-oriented programming. Several classes were written, which implemented individual tasks within the project.
The entire dataset included 155 cultivation runs. These were broken down into on-line and off-line data. Data processing was implemented with Spyder in the Python programming language. The data were available as csv files. The data were read in with the “csv” program library. This allows data to be read in quickly and easily and converted into new data structures within the development environment. A “PlFileParser” class for the on-line data and an “Off-lineDataParser” class for the off-line data have been implemented.
Since the data were available with different data densities, they had to be interpolated accordingly. A linear interpolation and an interpolation using the moving average method were used for this purpose. Both functions were implemented with the "scipy" library: linear interpolation via "interp1d" (scipy.interpolate) and the moving average via a "convolve" function. This ensured that the interpolated values were always between two raw measured values. The interpolation is therefore always within the natural fluctuation range of the measurement signal of the process variables. Since each process variable had different time stamps within a file, it was necessary to create another CSV file. The "TimelineMapping" contained all start and end times of the respective cultivation and was created by another database query. Three different intervals were selected for the resolution of the data:
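The two preprocessing steps can be sketched with numpy equivalents of the named functions (np.interp for the linear interpolation, np.convolve for the moving average); time stamps, values and window size are illustrative:

```python
import numpy as np

# Hypothetical raw on-line signal: values recorded only when the delta threshold fired.
t_raw = np.array([0.0, 0.4, 1.1, 2.0, 3.5, 4.0])   # hours
v_raw = np.array([7.0, 7.1, 7.0, 6.9, 7.2, 7.1])   # e.g. a pH trace

# Linear interpolation onto a uniform five-minute grid within the raw range,
# so every interpolated value lies between two raw measurements.
t_uniform = np.arange(0.0, 4.0 + 1e-9, 5 / 60.0)
v_uniform = np.interp(t_uniform, t_raw, v_raw)

# Moving average over 5 grid points to damp sensor noise without oversmoothing
# (note: mode="same" zero-pads, so the first and last values are damped).
window = np.ones(5) / 5
v_smooth = np.convolve(v_uniform, window, mode="same")
```

Keeping the interpolation linear preserves the measured fluctuation range, while the window length controls how much of the signal noise survives, which matters because that noise can carry process-relevant information.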
Due to the considerably lower data density and the non-linear data course, no linear interpolation was applied to the off-line data. Here three different interpolation strategies were used for fitting:
The interpolation according to M. Peleg is able to map biological growth through additional functional terms and thus to describe the course of growth well [27]. Therefore, the raw data of the live cell density were fitted with all three interpolations. For glucose and lactate, an interpolation was carried out using the polynomial and spline method, since no biological behavior was to be assumed here. The on-line and off-line datasets were merged for the different intervals and saved as a CSV file for each cultivation. The correlation analysis was then carried out based on these datasets.
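One common way to combine a logistic growth term with a mirrored logistic (Fermi) death term is sketched below; the exact parameterization used by Peleg [27] may differ, and all parameter values are hypothetical:

```python
import numpy as np

def growth_death(t, n_max, k_g, t_g, k_d, t_d):
    # Logistic growth term (rises around t_g) times mirrored logistic
    # (Fermi) decay term (falls around t_d).
    growth = 1.0 / (1.0 + np.exp(-k_g * (t - t_g)))
    decay = 1.0 / (1.0 + np.exp(k_d * (t - t_d)))
    return n_max * growth * decay

# Evaluate on the day 0.5 to 13.5 timeline at 1/10-day resolution.
t = np.linspace(0.5, 13.5, 131)
vcd = growth_death(t, n_max=160.0, k_g=0.9, t_g=4.0, k_d=1.2, t_d=11.0)
```

The product of the two sigmoids rises through an exponential-like growth phase, plateaus, and then declines, which is the qualitative VCD course that plain polynomials and splines struggle to reproduce.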
The correlation analysis was carried out with JMP®. With JMP® it is possible to apply statistical analysis to datasets. Multivariate statistics of the on-line data (feature) related to the respective target variable (lactate, glucose, VCD, VCV) was applied. The data are analyzed both for statistical significance in describing the target variables and for linear relationships. The correlation analysis shows linear relationships between an independent and a dependent variable, in the form of the correlation coefficient according to Bravais-Pearson.
Another method of identifying suitable features has been used in the form of mutual information. In determination by means of mutual information, the information content is determined, which is contained in an independent variable X in order to describe the target variable Y. The dependencies were calculated and implemented with "sklearn" by means of "mutual_info_regression". Based on the size of the datasets with a resolution of five minutes, the information content was calculated separately for each cultivation and then the mean of the values obtained across all cultivations was formed.
The creation of the feature matrix came from the result of the correlation analysis and the statistical evaluation based on the information content. This can be represented as a matrix and contains one feature per column and one point in time with the respective version of the feature. The feature matrix was saved as a Panda DataFrame. Thus, a suitable file format was available for training and testing the models.
With the help of the results of the correlation analysis, a separate dataset was created for each target variable. To train the models, the feature matrix had to be divided into a training and a test dataset. For later use in on-line prediction, a complete validation project was withheld. The training dataset contained 80% of the entire dataset, i.e. 123 cultivation runs.
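A run-level split of this kind can be sketched with Scikit-Learn's GroupShuffleSplit, assuming the feature matrix carries a cultivation identifier (the column name run_id and the data sizes are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(2)

# Illustrative feature matrix: 10 cultivation runs, 50 time points each
runs = np.repeat(np.arange(10), 50)
df = pd.DataFrame({
    "run_id": runs,                       # hypothetical cultivation identifier
    "feature": rng.normal(size=runs.size),
    "target": rng.normal(size=runs.size),
})

# Split whole runs (not individual time points), so that no cultivation
# contributes rows to both the training and the test dataset
splitter = GroupShuffleSplit(n_splits=1, train_size=0.8, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["run_id"]))
train, test = df.iloc[train_idx], df.iloc[test_idx]
```

Splitting by run rather than by row prevents time points of one cultivation from leaking between training and test data.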
Since all target variables are continuous quantities, only regressors were used as models. Each model type has its own set of hyperparameters. Training therefore served to adapt these hyperparameters so that the target variable was mapped as precisely as possible.
For the training itself, the entire feature matrix was standardized with the StandardScaler of the Scikit-Learn library.
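A minimal sketch of the standardization; here, as is idiomatic, the scaler statistics are estimated on the training portion and reused for the test portion:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(25, 3))

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # zero mean, unit variance per column
X_test_s = scaler.transform(X_test)        # reuse the training statistics
```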
The hyperparameters were optimized with a randomized search (RandomizedSearchCV) followed by a grid search (GridSearchCV), both from the Scikit-Learn library. All models were first trained using the randomized search in combination with tenfold cross-validation on the training dataset, and the hyperparameter ranges were searched for the smallest RMSE. The randomized search comprised 30 iterations, each with a different, randomly selected set of hyperparameters. The hyperparameters of the ten models with the smallest RMSE were output. Based on these, a more finely graded hyperparameter grid was defined for the grid search, which was again carried out with tenfold cross-validation. The model with the smallest RMSE was saved and then used to estimate the target variables of the test dataset.
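The two-stage search can be sketched as follows; for brevity the sketch uses a random forest regressor on synthetic data, a smaller iteration count and threefold instead of tenfold cross-validation, and the hyperparameter ranges are illustrative rather than those actually used:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Stage 1: random draws from a coarse hyperparameter space, scored by RMSE
# (negated, since Scikit-Learn maximizes scores)
coarse = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 10, None]}
rnd = RandomizedSearchCV(
    RandomForestRegressor(random_state=0), coarse, n_iter=10,
    cv=3, scoring="neg_root_mean_squared_error", random_state=0,
)
rnd.fit(X, y)

# Stage 2: finer grid around the best candidate from the random search
best_n = rnd.best_params_["n_estimators"]
fine = {"n_estimators": [max(10, best_n - 50), best_n, best_n + 50]}
grid = GridSearchCV(
    RandomForestRegressor(random_state=0), fine,
    cv=3, scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
best_model = grid.best_estimator_  # smallest cross-validated RMSE
```

The saved best_model is then evaluated on the withheld test dataset.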
The Scikit-Learn library was used to implement the multilayer perceptron (MLP). The following list contains the hyperparameters that were used to train the models:
The random forest was also implemented by the Scikit-Learn library. The following candidates were available as hyperparameters within this optimization:
The XGBoost algorithm was integrated into the project structure via the XGBoost library. The following hyperparameter space was used:
The model evaluation was primarily implemented by means of an error histogram. This shows the errors (residuals) of the model's predictions on the test dataset relative to the actual values of the target parameter.
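A sketch of the residual computation behind such a histogram; synthetic predictions stand in for the trained model's output, and the binned counts would normally be plotted:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic test-set values and predictions standing in for a trained model
y_true = rng.normal(5.0, 1.0, 1000)
y_pred = y_true + rng.normal(0.0, 0.3, 1000)
residuals = y_pred - y_true

# Binned residual distribution; in practice plotted as a histogram
counts, edges = np.histogram(residuals, bins=20)
```

A residual distribution centered on zero indicates an unbiased estimator of the target parameter.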
To assess the accuracy of the estimation, the RMSE of the target parameter was calculated and compared with the mean value of the target parameter.
In order to examine the models for overfitting, the RMSE was calculated for the entire training and test dataset. The difference between the two errors was used as an indication of overfitting of the models:
Overfit = RMSE_test − RMSE_train
To further describe the model quality, the coefficient of determination (R²) was used, both for the entire test dataset and for each cultivation considered individually.
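The three evaluation quantities (RMSE relative to the target mean, the overfit indicator, and R²) can be sketched together; the predictions below are synthetic stand-ins for model output:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(5)

# Synthetic predictions: slightly noisier on the test than the training set
y_train = rng.normal(5.0, 1.0, 400)
y_train_pred = y_train + rng.normal(0.0, 0.2, 400)
y_test = rng.normal(5.0, 1.0, 100)
y_test_pred = y_test + rng.normal(0.0, 0.3, 100)

rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))

# Overfit indicator as defined above: difference of test and training error
overfit = rmse_test - rmse_train

# RMSE relative to the mean value of the target parameter
relative_rmse = rmse_test / y_test.mean()

# Coefficient of determination on the test dataset
r2 = r2_score(y_test, y_test_pred)
```

A large positive overfit value indicates that the model fits the training data markedly better than unseen data.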
A total of 155 datasets based on cultivations in the ambr250 system were collected. The eukaryotic cells used were CHO cells that expressed a target molecule extracellularly. The cultivations were carried out using the fed-batch process. The ambr system used enables twelve cultivations to be carried out simultaneously. The cultivation time of the main culture was 13 to 14 days. The single-use bioreactors (250 mL) provided the reaction space for this. The pre-culturing was carried out in shake flasks and lasted three weeks. The starting conditions in terms of volume and number of cells at the time of inoculation were comparable in each reactor. The media used were exclusively chemically defined media. Only one medium batch was used per cultivation.
In order to provide optimal cultivation conditions within this system, a number of process variables were available. The parameters to be controlled were pH, temperature and the dissolved oxygen concentration in the medium. The following table contains a complete list of all process variables used for this work.
All measured variables were recorded over the entire cultivation period by what is termed a PI system. The PI system only contains on-line measured variables.
The parameters listed here were available for monitoring optimal cultivation conditions. An exhaust gas analyzer from BlueSens was also available for each reactor. It detects the O2 and CO2 content in the exhaust gas flow from the bioreactors and thus provided a further important component of the process control. These two measured variables in the exhaust gas flow can be used to determine the oxygen uptake rate (OUR) and the oxygen transfer rate (OTR).
Samples were taken daily during cultivation. These were then analyzed for various concentrations of the metabolites and product titers using Cedex Bio HAT® (Roche Diagnostics GmbH, Mannheim, Germany).
Furthermore, cell counting was carried out. The measurement provides information about live cell density, total cell density, viability, aggregation rate and cell diameter. These parameters can be used to infer the growth behavior of the culture. The off-line quantities were measured with the Cedex HiRes® (Roche Diagnostics GmbH, Mannheim, Germany) cell counter. The error of such cell counting and cell analysis systems is in the range of 10%. All off-line measured quantities used are shown in the following table.
Number | Date | Country | Kind |
---|---|---|---|
19191807.7 | Aug 2019 | EP | regional |
This application is a continuation of International Application No. PCT/EP2020/072560 having an International filing date of Aug. 12, 2020, which claims benefit of and priority to European Patent Application No. 19191807.7, filed Aug. 14, 2019, the contents of which are incorporated herein by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2020/072560 | Aug 2020 | US |
Child | 17670299 | US |