This disclosure is directed to methods and systems for performing cell-based, functional virus counting assays, and more particularly to methods and systems that allow such assays to be completed in substantially less time than conventional assays.
Functionality of viruses, such as active viruses, is currently measured in a variety of ways. The most widely used method is the standard plaque assay, which was first described in 1953. The assay measures virus function via the infection and lysis of target cells. The assay yields a plaque titer (concentration) indicative of the number of functional viruses, or plaque forming units, within a sample. The basic method is shown in
Other functional assays, such as the fifty-percent tissue culture infective dose (TCID50) assay, are derivatives of the plaque assay. All involve incubating the virus with cells for 2 to 12 or more days, depending upon the virus and cells used to measure functional infectivity.
There is a wide disparity in the ratios of total virus particles, functional and non-functional, to plaque-forming units (PFUs) for different viruses, as indicated by Table 1. Both infectivity titers (infectious particle counts) and total particle counts are essential for full virus characterization, because the particle-to-PFU ratio can vary across orders of magnitude.
Table 1. Virus families compared by particle-to-PFU ratio
Adenoviridae
Alphaviridae
Herpesviridae
Orthomyxoviridae
Papillomaviridae
Picornaviridae
Polyomaviridae
Poxviridae
Reoviridae
Therefore, it has become critically important to know both the functional virus titer, measured via a plaque assay, and the total number of virus particles in a sample, for commercial applications such as vaccine development and manufacture as well as for safety in gene therapy.
To address the need for total particle counts, more modern technologies have been developed, such as the technology embodied in Sartorius' Virus Counter® product, electron microscopy, and a number of indirect methods. Some of these technologies can measure total particle counts in as little as 30 minutes. Unfortunately, heretofore there have been no rapid surrogates for the legacy virus plaque assay of
In one aspect, described herein is a method for training a machine learning model to predict virus titer from an image, or sequence of images, of a cell culture containing a virus population. In this document, the term “machine learning model” refers to a computational system that has used optimization algorithms to learn to perform a task from previous examples of desired input-output pairs. The trained machine learning model allows a prediction of virus titer to be made much earlier than in the standard virus plaque assay, for example 6 or 8 hours (or possibly less) after initial inoculation of the cell culture with the virus sample, as compared to many days in the prior art. The method of training the machine learning model can include the steps of: (1) obtaining a training set in the form of a plurality of images of virus-treated cell cultures from a plurality of experiments at one or more time points from a start time t0 to a final time tfinal, (2) for each experiment, recording at least one numeric virus titer readout of the virus-treated cell culture at time tfinal (hereafter “ground truth”), (3) processing all the images in the training set to acquire a numeric representation of each image, and (4) training one or more machine learning models to predict a final virus titer from the numeric representations of the training set.
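As a minimal sketch only, the training set of step (1) and the ground truth of step (2) could be organized per experiment as shown below. The names `imaging_schedule` and `Experiment`, the 30-minute imaging interval, the 48-hour tfinal, the random images and the placeholder ground-truth value are all illustrative assumptions, not part of the disclosed method.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


def imaging_schedule(t0_h: float, tfinal_h: float, interval_h: float) -> List[float]:
    """Periodic imaging time points t0, t1, ..., tfinal in hours (hypothetical helper)."""
    n_steps = int(round((tfinal_h - t0_h) / interval_h))
    return [t0_h + i * interval_h for i in range(n_steps + 1)]


@dataclass
class Experiment:
    """One virus-treated cell culture in the training set (hypothetical layout)."""
    image_times_h: List[float]
    # Step (1): one 2-D image per time point.
    images: List[np.ndarray] = field(default_factory=list)
    # Step (2): numeric titer readout recorded at tfinal ("ground truth").
    ground_truth: Optional[float] = None

    def __post_init__(self):
        if self.images and len(self.images) != len(self.image_times_h):
            raise ValueError("need exactly one image per time point")


# Usage with placeholder data: images every 30 min from t0 = 0 h to tfinal = 48 h.
times = imaging_schedule(0.0, 48.0, 0.5)
rng = np.random.default_rng(0)
exp = Experiment(times, [rng.random((64, 64)) for _ in times], ground_truth=120.0)
```

Keeping the time stamps next to the images makes it straightforward to later select only the images acquired before a chosen prediction time t < tfinal.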
Also described herein is an application of the trained one or more machine learning models as a method of predicting a virus titer of a cell culture to which a virus sample of unknown titer has been added. In this “application” phase, the method can include the steps of: a) obtaining a time sequence of images of the cell culture, b) supplying a numeric representation of the time sequence of images obtained in step a) to one or more machine learning models trained in accordance with the previous paragraph, and c) making a prediction with the one or more trained machine learning models of the virus titer.
In another aspect, an analytical instrument is provided that is configured to hold one or more plates containing a cell culture and a virus sample. The instrument includes an integrated imaging system. The instrument is configured with a machine learning model trained to make a prediction of a virus titer in the cell culture from one or more images in a time sequence of images of the cell culture obtained by the imaging system, wherein the prediction is made before the viral infection of the cell culture has proceeded to term. For example, the prediction can be made 4, 6, 10 or 15 hours after initiation of the viral infection, instead of several days.
In a further aspect, this analytical instrument can be configured with a processing unit executing a training module which enables the user of the instrument to conduct a training procedure to create a new trained machine learning model to make a virus titer prediction. This training module provides set-up instructions that help a user of the instrument conduct a training method with the instrument. This training method can include the steps of: (1) obtaining a training set in the form of a plurality of images of virus-treated cell cultures from a plurality of experiments at a set of time points from a start time t0 to a final time tfinal, (2) for each experiment, recording at least one numeric virus titer readout of the virus-treated cell culture at time tfinal, (3) processing all of the images in the training set to acquire a numeric representation of each image, and (4) training one or more machine learning models to predict a final virus titer from the numeric representations in the training set, wherein the training comprises minimizing an error between the model prediction of a final virus titer and a ground truth.
Several different methods can be used for the processing step (3) in the model training. In one embodiment, the processing step (3) involves passing the images through a convolutional neural network (CNN) to thereby acquire an intermediate data representation of the images. In another embodiment, the processing step (3) takes the form of sub-steps a)-c): a) segmenting individual cells from the image, b) calculating a cell-by-cell numeric description of each cell, and c) aggregating the numeric descriptions over all cells.
This disclosure provides a method of predicting a virus titer readout at experiment end time, or tfinal, at a time point t which is less than (or earlier than) tfinal. The phrase “experiment end time,” or tfinal, refers to the time when a virus titer assay has been allowed to run to completion, or equivalently, proceeded to term, i.e., when visible plaques have formed, and is typically 2 days or more depending on the virus or family of viruses in question, the cell type the virus is allowed to grow in, and other factors which are known in the art. The disclosed methods allow this predicted virus titer readout to be made long before the usual experiment end time, for example in 6-8 hours, or possibly even earlier, for some virus titer assays, instead of a matter of days.
The method involves the training and use of one or more machine learning models to make this prediction. This disclosure therefore involves two different aspects, shown in
The model training phase 100 can include several steps. First, in step (1) a training set 102 is obtained in the form of a plurality of sets of images, typically microscopic images 104 of virus-treated cell cultures from a plurality of experiments at one or more time points between a start time t0 and a final time tfinal. The time points t1, t2, . . . could be periodic, for example every 30 or 60 minutes. In step (2), for each experiment, at least one numeric virus titer readout of the virus-treated cell culture is recorded at time tfinal, hereafter the “ground truth” 106. While
The model application phase 200 of
An example of the model output 204 is shown in
Referring back to
The method of this disclosure can be performed in any suitable machine or instrument which includes a mechanism for obtaining images of the cell culture with added viral sample at different points in time. Preferably, such images are microscopic images. One example of such an instrument is shown in
The instrument 400 includes a tray 408 which slides out of the system and allows the culture plate 404 to be placed onto the tray 408 and then retracts and closes so as to place the culture plate 404 within the interior of the housing 410. The culture plate 404 remains stationary within the housing while a fluorescence optics module 402 (see
The module 402 includes LED excitation light sources 450A and 450B which emit light at different wavelengths, such as 453-486 nm and 546-568 nm, respectively. The optics module 402 could be configured with a third LED excitation light source (not shown) which emits light at a third wavelength, such as 648-674 nm, or even a fourth LED excitation source at a fourth wavelength. The light from the LEDs 450A and 450B passes through narrow bandpass filters 452A and 452B, respectively, which pass light at particular wavelengths designed to excite fluorophores in the cell culture and virus medium. The light passing through the filter 452A reflects off a dichroic mirror 454A and then off the dichroic mirror 454B and is directed to an objective lens 460, e.g., a 20× magnifying lens. Light from LED 450B passes through the filter 452B and through the dichroic mirror 454B and is likewise directed to the objective lens 460. The excitation light passing through the lens 460 then impinges on the bottom of the plate 10 and passes into the medium 404. In turn, emissions from the fluorophores in the sample pass through the lens 460, reflect off the mirror 454B, pass through the dichroic mirror 454A, pass through a narrow band emission filter 462 (filtering out non-fluorescence light), and impinge on a digital camera 464, which may take the form of a charge coupled device (CCD) or other type of camera currently known in the art and used in fluorescence microscopy. A motor system 418 then operates to move the entire optics module 402 in the X, Y and optionally Z directions while the light source 450A or 450B is in an ON state. It will be appreciated that normally only one optical channel is activated at a time; for example, the LED 450A is turned on and an image is captured, then LED 450A is turned off, LED 450B is activated, and a second image is captured.
It will be appreciated that the objective lens 460 can be mounted to a turret which can be rotated about a vertical axis such that a second objective lens of different magnification is placed into the optical path to obtain a second image at a different magnification. Furthermore, the motor system 418 can be configured to move in the X and Y directions below the plate 404 such that the optical path of the fluorescence optics module 402 and the objective lens 460 is placed directly below each of the cell cultures in the various wells 10 of the plate 404.
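The acquisition sequence described above (position the optics below a well, then image one optical channel at a time) can be sketched as follows. `MockInstrument` and its methods are hypothetical stand-ins that only log commands; a real instrument exposes its own control interface. For simplicity the sketch uses a stop-and-capture sequence rather than imaging while the optics module is moving.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MockInstrument:
    """Hypothetical stand-in for motor system 418, the LEDs and camera 464."""
    log: List[tuple] = field(default_factory=list)

    def move_to_well(self, well: str) -> None:
        self.log.append(("move", well))

    def set_led(self, channel: str, on: bool) -> None:
        self.log.append(("led_on" if on else "led_off", channel))

    def capture(self, well: str, channel: str) -> str:
        self.log.append(("capture", well, channel))
        return f"image:{well}:{channel}"  # placeholder for pixel data


def scan_plate(inst: MockInstrument, wells: List[str],
               channels: List[str]) -> Dict[Tuple[str, str], str]:
    """Image every well, with only one optical channel switched on at a time."""
    images = {}
    for well in wells:
        inst.move_to_well(well)          # place the optical path below this well
        for ch in channels:
            inst.set_led(ch, True)
            images[(well, ch)] = inst.capture(well, ch)
            inst.set_led(ch, False)      # switch off before the next channel
    return images
```

Calling `scan_plate(MockInstrument(), ["A1", "A2"], ["450A", "450B"])` yields four images, and the command log shows that at most one LED is on at any moment.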
The details of the motor system 418 for the fluorescence optics module 402 can vary widely and are known to persons skilled in the art.
The use of fluorescence and filters in
In one embodiment, the virus sample is supplied to a separate instrument, such as the Sartorius Virus Counter®, in order to acquire a total particle count, where that separate instrument can operate in parallel with the virus plaque assay in the instrument of
This section will describe numerous possible embodiments for the model training and model application phases, respectively, in conjunction with
As explained previously, there is a model training (or setup) phase 100 of
As noted earlier, it is possible and preferable in some embodiments to go through or repeat the training phase to train models for commonly used cell types. Such models can then be shipped or provided as integrated parts of software modules to users of analytical instruments such as the one described above in
In the following discussion, certain meanings are ascribed to the terms used herein:
“Artificial neural network” (ANN) refers to a type of machine learning model consisting of multiple layers of nonlinear mathematical transformations with model parameters that are learned using optimization algorithms.
“Convolutional Neural Network” (CNN) refers to a type of ANN that is commonly used for processing data with spatial correlations, such as pixels forming shapes and objects in images.
“Activations” refers to an intermediate representation of input data as passed through layers of an ANN.
An example of the 4-step model training phase or process 100 and the 3-step application phase 200 shown in
Step 1. Acquire a training set in the form of one or more images, preferably microscopic images (602,
The microscopic images 602 may be label-free light microscopic images, such as brightfield images or phase contrast images.
Alternatively, the microscopic images 602 may also be fluorescence images of the cell culture labelled with a fluorescent marker of interest. In this embodiment, the fluorescent marker may be a fluorescent antibody binding to virus-specific protein epitopes, expressed on the surface or interior of virus-infected cells. Alternatively, the fluorescent marker may be a cell membrane marker, or a cell death marker. It is possible to label the cell culture with a combination of the above markers.
The microscopic images 602 may also be immunohistochemistry images, brightfield and phase, of the cell culture labeled as the result of the enzymatic action of a chromogenic detection system. This chromogenic marker may be an enzyme-linked direct primary antibody binding to virus-specific protein epitopes, expressed on the surface or interior of virus-infected cells. Alternatively, the chromogenic marker may be a secondary antibody with affinity for a first antibody, the latter specific for virus-specific protein epitopes, expressed on the surface or interior of virus-infected cells. As another possibility, the chromogenic detection system may be a combination of the horseradish peroxidase (HRP) enzyme conjugated to the primary or secondary antibody and the insoluble product made as a result of the action of HRP on 3,3′-diaminobenzidine tetrahydrochloride (DAB). As a still further possibility, the chromogenic detection system may be one of a plurality of other pairs of immunohistochemistry detection systems. See, for example, https://www.abcam.com/kits/substrates-and-chromogens-for-ihc.
The microscopic images 602 may be pairs of label-free light microscopic images and fluorescence images labelled with a fluorescent marker as explained above.
Step 2. For each experiment imaged in the training set, record at least one numeric virus titer readout 604 of the virus-treated cell cultures at tfinal. The virus titer readout may be recorded manually by visual inspection, or automatically using a computational algorithm to process the image(s) at tfinal. These virus titer readouts will hereafter be referred to as the ground truth target, or simply “ground truth.”
The virus titer readout may be the readout from a plaque assay. In particular, the plaque assay readout may be the number of individual plaques (i.e., a standard readout).
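For reference, a standard plaque count is conventionally converted to a titer by dividing the count by the product of the dilution and the volume plated. The function name and the example numbers below are illustrative only and are not taken from this disclosure.

```python
def plaque_titer_pfu_per_ml(plaque_count: int, dilution: float, inoculum_ml: float) -> float:
    """Conventional plaque-assay titer: plaques / (dilution x volume plated).

    dilution is the fraction of stock in the plated sample, e.g. 1e-6 for a
    10^-6 dilution; inoculum_ml is the volume added to the well or plate.
    """
    if plaque_count < 0:
        raise ValueError("plaque count cannot be negative")
    if not (0 < dilution <= 1) or inoculum_ml <= 0:
        raise ValueError("dilution must be in (0, 1] and volume positive")
    return plaque_count / (dilution * inoculum_ml)


# Example with made-up numbers: 50 plaques at a 10^-6 dilution, 0.1 mL plated.
titer = plaque_titer_pfu_per_ml(50, 1e-6, 0.1)  # about 5 x 10^8 PFU/mL
```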
Alternatively, the plaque assay readout may also be the area covered by plaque. (option a)
As a variation, the virus titer readout may be the readout from a plaque assay wherein the readout is acquired automatically using an image segmentation algorithm that segments the cell mass from the background and identifies plaques as holes of absent cell mass that form over the course of the experiment. (option b)
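A minimal sketch of option b (and, through the plaque area fraction, option a) is shown below, assuming a single image in which cell mass is brighter than a fixed threshold. The function name, the threshold and the minimum plaque size are hypothetical; the sketch uses `scipy.ndimage` and is one simple segmentation choice among many.

```python
import numpy as np
from scipy import ndimage


def segment_plaques(image, cell_threshold, min_plaque_px=20):
    """Count plaques as holes of absent cell mass enclosed by the monolayer.

    Assumes pixels brighter than cell_threshold are cell mass; both parameters
    are hypothetical tuning values. Plaques touching the image border are not
    enclosed and are therefore missed by this simple approach.
    """
    cell_mass = image > cell_threshold
    # Background pixels fully surrounded by cell mass = candidate plaques.
    holes = ndimage.binary_fill_holes(cell_mass) & ~cell_mass
    labels, n = ndimage.label(holes)
    sizes = np.bincount(labels.ravel(), minlength=n + 1)[1:]  # pixels per hole
    keep = sizes >= min_plaque_px                               # drop tiny gaps
    plaque_count = int(keep.sum())
    plaque_area_fraction = float(sizes[keep].sum()) / image.size  # option a
    return plaque_count, plaque_area_fraction
```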
As another variation, the virus titer readout may be the readout from a Tissue Culture Infective Dose 50% Assay (TCID50). (option c)
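For option c, one established way to turn a TCID50 plate readout into a number is the Spearman-Kärber estimator (the Reed-Muench method is a common alternative). The sketch below assumes equally spaced log10 dilutions that start where every well is infected and end where none is; the function names and the example plate are hypothetical.

```python
from typing import Sequence


def spearman_karber_log10_endpoint(log10_dilutions: Sequence[float],
                                   positive: Sequence[int],
                                   total: Sequence[int]) -> float:
    """log10 of the 50% endpoint dilution by the Spearman-Kärber method.

    log10_dilutions must be equally spaced and ordered from least to most
    dilute (e.g. -1, -2, -3, ...), starting at a dilution where every well is
    infected and ending where none is, as the classic method assumes.
    """
    d = abs(log10_dilutions[1] - log10_dilutions[0])     # log10 step size
    p_sum = sum(p / n for p, n in zip(positive, total))  # sum of infected fractions
    return log10_dilutions[0] + d / 2 - d * p_sum


def tcid50_per_ml(log10_endpoint: float, inoculum_ml: float) -> float:
    """TCID50/mL = reciprocal of the endpoint dilution divided by the inoculum volume."""
    return 10 ** (-log10_endpoint) / inoculum_ml


# Made-up example: 8 wells per dilution, 10^-1 ... 10^-6, 0.1 mL per well.
endpoint = spearman_karber_log10_endpoint([-1, -2, -3, -4, -5, -6],
                                          [8, 8, 6, 2, 0, 0], [8] * 6)
titer = tcid50_per_ml(endpoint, 0.1)
```

Here the endpoint falls at 10^-3.5, midway between the dilutions with 6/8 and 2/8 infected wells, as expected for this symmetric pattern.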
As still another variation, the virus titer readout may be the readout from a focus forming assay (FFA). (option d)
As still another possibility, the virus titer readout may be a combination of the above options, for example options a or b and option c, or options a or b and option d.
Step 3. Process all microscopic images in the training set to acquire a numeric representation 608 of each image (or of each image pair, where fluorescent images are used), step 606 in
The processing 606 may consist of passing the whole images through a CNN to acquire a set of CNN activations per image.
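A deliberately tiny NumPy stand-in for this CNN step is sketched below: one convolutional layer with ReLU, followed by global average pooling, turns each whole image into a fixed-length activation vector. The random kernels are an assumption for illustration only; in practice the activations would come from a trained (for example, pretrained) network built with a deep learning framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_relu(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation with a bank of k x k kernels, then ReLU.

    image: (H, W); kernels: (C, k, k) -> feature maps (C, H-k+1, W-k+1).
    """
    k = kernels.shape[-1]
    windows = sliding_window_view(image, (k, k))            # (H-k+1, W-k+1, k, k)
    maps = np.einsum("ijkl,ckl->cij", windows, kernels)
    return np.maximum(maps, 0.0)


def cnn_activations(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Global average pooling: one activation per kernel, independent of image size."""
    return conv_relu(image, kernels).mean(axis=(1, 2))


rng = np.random.default_rng(0)
kernels = rng.standard_normal((8, 3, 3))    # hypothetical, untrained filters
activations = cnn_activations(rng.random((64, 64)), kernels)  # shape (8,)
```

Global pooling is what makes the representation the same length for every image, which the downstream regression model in step 4 requires.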
As an alternative, the processing 606 may also or alternatively consist of the process steps shown in
The segmentation (step 1100) may be performed by various possible techniques, such as:
Label-free cell segmentation using traditional computer vision algorithms, label-free cell segmentation using a CNN for cell instance segmentation, or thresholding the fluorescent image of the cell membrane marker described in step 1.
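The thresholding option can be sketched as below. The disclosure does not specify how the threshold is chosen; Otsu's method is swapped in here as a common, parameter-free choice, and the function names and minimum cell size are hypothetical. Touching cells merge into a single object in this simple version.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(image: np.ndarray, nbins: int = 256) -> float:
    """Otsu's threshold: maximize between-class variance of the intensity histogram."""
    hist, edges = np.histogram(image, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                       # pixel count of the lower class
    w1 = w0[-1] - w0                           # pixel count of the upper class
    cum = np.cumsum(hist * centers)
    m0 = cum / np.maximum(w0, 1)               # lower-class mean
    m1 = (cum[-1] - cum) / np.maximum(w1, 1)   # upper-class mean
    between = w0 * w1 * (m0 - m1) ** 2
    return float(edges[np.argmax(between) + 1])  # upper edge of the best split bin


def segment_cells(membrane_image: np.ndarray, min_cell_px: int = 30):
    """Label cells by thresholding a membrane-marker image.

    Returns (label_image, n_cells). Touching cells merge into one object here;
    dense cultures would need e.g. a watershed split or CNN instance segmentation.
    """
    mask = membrane_image >= otsu_threshold(membrane_image)
    mask = ndimage.binary_fill_holes(mask)     # membrane outlines -> solid cells
    labels, n = ndimage.label(mask)
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    too_small = sizes < min_cell_px
    too_small[0] = False                       # label 0 is background
    labels[too_small[labels]] = 0              # remove specks below min size
    return ndimage.label(labels > 0)
```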
The cell-by-cell numeric description step 1104 may be calculated in a number of different ways. For example, one can use any of the following methods:
1. Extracting morphological features by calculating a human-defined set of feature descriptors based on cell area, eccentricity, pointiness, minor/major axis, granularity, etc.
2. Extracting morphological features by feeding the segmented sub-images of cells to a CNN to extract a machine learning-defined set of feature descriptors.
3. Extracting fluorescence levels as defined by the intra-cell sum of fluorescent pixels based on fluorescent image (where the cell culture is labeled with a fluorescence marker of interest, as explained in step 1).
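A minimal sketch of methods 1 and 3 is shown below for two of the listed descriptors (area and eccentricity) plus the intra-cell fluorescence sum. It assumes a label image in which 0 is background and 1..n are cells; the function name is hypothetical, and pointiness and granularity are omitted for brevity.

```python
from typing import Optional

import numpy as np


def cell_features(labels: np.ndarray, fluorescence: Optional[np.ndarray] = None) -> np.ndarray:
    """Cell-by-cell descriptors from a label image (0 = background, 1..n = cells).

    Columns: area (pixels), eccentricity of the ellipse with the same second
    moments (0 = circle, approaching 1 = elongated) and, if a fluorescence image
    is given, the intra-cell sum of fluorescence intensity.
    """
    rows_out = []
    for cell_id in range(1, int(labels.max()) + 1):
        r, c = np.nonzero(labels == cell_id)
        if r.size == 0:
            continue                                    # label not present
        ecc = 0.0
        if r.size > 1:
            cov = np.cov(np.vstack([r, c]).astype(float))
            lam_min, lam_max = np.linalg.eigvalsh(cov)  # ascending order
            if lam_max > 0:
                ecc = float(np.sqrt(1.0 - np.clip(lam_min / lam_max, 0.0, 1.0)))
        feats = [float(r.size), ecc]
        if fluorescence is not None:
            feats.append(float(fluorescence[r, c].sum()))
        rows_out.append(feats)
    return np.array(rows_out)
```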
The aggregation step 1106 may be performed in several ways. For example, it can be performed by calculating the feature-wise average over all cells in an image; by calculating the ratio between different types of cells as defined based on the cell-by-cell numeric descriptions; or by performing dimension reduction, calculating the probability distribution over the reduced-dimension space across all images, and then representing each image as a distribution over the reduced dimensions according to the probability distribution defined based on all images. This latter method refers to the single cell-shape distribution analysis described in EP Patent Application 20290050.2 filed Jun. 12, 2020, the content of which is incorporated by reference. Alternatively, the aggregation step can be performed by feeding the cell-wise feature descriptions to a set-invariant neural network (such as Deep Sets; see Zaheer, Manzil, et al., “Deep sets,” Advances in Neural Information Processing Systems 30 (NIPS 2017)).
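Three of these aggregation options are sketched below on a (cells x features) array: the feature-wise mean, a cell-type ratio, and a minimal Deep Sets-style read-out. The threshold-defined "cell type", the layer sizes and the random weights are hypothetical; in practice the Deep Sets weights would be trained together with the titer model.

```python
import numpy as np


def aggregate_mean(cell_feats: np.ndarray) -> np.ndarray:
    """Feature-wise average over all cells of one image (cells x features -> features)."""
    return cell_feats.mean(axis=0)


def aggregate_ratio(cell_feats: np.ndarray, feature_idx: int, threshold: float) -> float:
    """Fraction of cells whose chosen feature exceeds a threshold (a hypothetical cell 'type')."""
    return float((cell_feats[:, feature_idx] > threshold).mean())


class DeepSetsAggregator:
    """Minimal Deep Sets read-out rho(sum_i phi(x_i)); invariant to cell order.

    Weights are random placeholders; in practice phi and rho would be trained
    jointly with the titer model.
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_phi = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
        self.w_rho = rng.standard_normal((n_hidden, n_out)) / np.sqrt(n_hidden)

    def __call__(self, cell_feats: np.ndarray) -> np.ndarray:
        h = np.maximum(cell_feats @ self.w_phi, 0.0)   # phi: per-cell embedding
        pooled = h.sum(axis=0)                         # sum pooling over cells
        return pooled @ self.w_rho                     # rho: linear read-out here
```

All three produce a fixed-length vector per image regardless of how many cells were segmented, and none depends on the order in which cells are listed.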
As noted in
As another example of the filtering step 1102 of
As another example of the filtering step 1102 of
Step 4. Train one or more machine learning models (step 108) on the training set numeric representations (608) to minimize the difference (or equivalently, error or loss) between the model prediction of the virus titer (predicted plaque assay) at tfinal and the ground truth, resulting in a trained machine learning model 150. See
The model 150 may take a variety of forms. For example, it could be a linear model, such as a partial least squares regression model. Alternatively, it could be a non-linear model, such as an ANN. As another example, the model 150 could be a probability distribution over the plaque assay readout, such as Gaussian process regression. See Rasmussen, Carl Edward, “Gaussian processes in machine learning.” Summer School on Machine Learning. Springer, Berlin, Heidelberg (2003). As another example, the model 150 could be a dynamic model, such as a neural ordinary differential equation model. See e.g., Chen, Ricky TQ, et al., “Neural ordinary differential equations,” Advances in neural information processing systems 31 (NeurIPS 2018).
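To illustrate the probabilistic option, the sketch below implements exact Gaussian process regression with a squared-exponential kernel using the standard Cholesky-based predictive equations, returning a predictive mean and standard deviation. The fixed hyperparameters are an assumption (they would normally be fitted, e.g. by maximizing the marginal likelihood), and a target such as log10 titer is a modeling assumption, not something the disclosure prescribes.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_predict(x_train, y_train, x_test, noise_var=1e-2, length_scale=1.0):
    """Exact GP regression; returns predictive mean and std of the latent function."""
    y_mean = y_train.mean()                          # centre targets (zero-mean prior)
    k = rbf_kernel(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)                     # k = L L^T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    k_star = rbf_kernel(x_test, x_train, length_scale)
    mean = y_mean + k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = 1.0 - (v ** 2).sum(axis=0)                 # prior variance of this kernel is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))
```

The predicted standard deviation grows for inputs far from the training data, which is the property a confidence-based reporting rule can exploit.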
The model 150 may be trained by iteratively adjusting the model parameters to minimize the error of the predicted plaque assay readout compared to the ground truth. This error could be given as a mean-squared error, a mean-absolute error, or a piecewise error that is squared for small residuals and absolute for large residuals, also known as the “Huber loss.” See Huber, Peter J., “Robust estimation of a location parameter,” Breakthroughs in Statistics. Springer, New York, NY, 1992, pp. 492-518.
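A minimal sketch of this training loop is given below for the simplest case: a linear model fitted by full-batch gradient descent on the mean Huber loss. The linear model, learning rate, epoch count and delta are assumptions chosen for illustration; the same loop applies to other differentiable models.

```python
import numpy as np


def huber(residual: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Squared for |r| <= delta, linear (absolute) beyond: the Huber loss."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * residual ** 2, delta * (a - 0.5 * delta))


def huber_grad(residual: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Derivative of the Huber loss with respect to the residual."""
    return np.clip(residual, -delta, delta)


def train_linear_huber(x, y, delta=1.0, lr=0.1, epochs=2000):
    """Fit y ~ x @ w + b by full-batch gradient descent on the mean Huber loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        r = x @ w + b - y                 # prediction minus ground truth
        g = huber_grad(r, delta) / len(y)
        w -= lr * (x.T @ g)
        b -= lr * g.sum()
    return w, b
```

Because the gradient is clipped at plus or minus delta, a few grossly wrong ground-truth readouts pull the fit far less than they would under a mean-squared error.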
Note: When ANN models are used in two or more consecutive steps, they may optionally be connected so that the virus titer prediction loss is back-propagated through the multiple sub-ANNs to optimize them jointly.
During the application phase, one or more experiments are run with virus-treated cell culture(s) for which the virus titer readout will be predicted earlier in time based on the model trained during the model training phase. For a given experiment, the final virus titer readout count is predicted at a time point t<tfinal by:
1. Acquiring one or more microscopic images of the experiment cell culture (
2. Processing the acquired image(s) into numerical representation(s) (
3. Predicting the final virus titer 204 by applying the trained machine learning model 150 to the numerical representation(s) 1000, as shown in
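The three application steps can be sketched as follows. The key constraint is that images acquired up to time t must be processed exactly as during training. The simple intensity-statistics featurizer, the fixed 0, 2, 4 and 6 h time points and the linear model are hypothetical placeholders for whichever processing and trained model 150 are actually used.

```python
from typing import Dict, Sequence

import numpy as np


def featurize(image: np.ndarray) -> np.ndarray:
    """Hypothetical step-3 representation: simple intensity statistics of one image."""
    return np.array([image.mean(), image.std(), (image < image.mean()).mean()])


def sequence_representation(images_by_time: Dict[float, np.ndarray],
                            time_points_h: Sequence[float]) -> np.ndarray:
    """Concatenate per-image features at the time points the model was trained on."""
    missing = [t for t in time_points_h if t not in images_by_time]
    if missing:
        raise ValueError(f"no image acquired at time point(s) {missing}")
    return np.concatenate([featurize(images_by_time[t]) for t in time_points_h])


def predict_final_titer(images_by_time, time_points_h, weights, bias):
    """Apply a trained linear model at time t < tfinal to predict the tfinal titer."""
    return float(sequence_representation(images_by_time, time_points_h) @ weights + bias)
```

With images acquired at 0, 2, 4 and 6 h and weights obtained during training, `predict_final_titer(images, (0.0, 2.0, 4.0, 6.0), weights, bias)` returns a titer estimate at 6 h; a missing time point raises an error instead of silently producing an inconsistent input.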
Referring now to
the type of cell line in their experiment,
the type of virus family being inoculated into the cell line,
the assay type (e.g., plaque-forming unit count per unit volume, TCID50, both, other, etc.),
the dilution level in the cell plate, and
the time or time periods after the start of processing at which the prediction is desired (e.g., 4 hours, 6 hours, 15 hours, every 30 minutes or every hour, etc.).
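The menu selections listed above could be captured and validated in a small configuration object such as the hypothetical `PredictionRunConfig` below. The field names, the allowed assay labels and the example values (including the choice of cell line) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

ASSAY_TYPES = {"PFU/mL", "TCID50", "both", "other"}


@dataclass(frozen=True)
class PredictionRunConfig:
    """Hypothetical container for the menu selections listed above."""
    cell_line: str
    virus_family: str
    assay_type: str
    dilution: float                        # fraction of stock, e.g. 1e-6
    prediction_times_h: Tuple[float, ...]  # hours after inoculation

    def __post_init__(self):
        if self.assay_type not in ASSAY_TYPES:
            raise ValueError(f"assay_type must be one of {sorted(ASSAY_TYPES)}")
        if not 0 < self.dilution <= 1:
            raise ValueError("dilution must be in (0, 1]")
        if not self.prediction_times_h or any(t <= 0 for t in self.prediction_times_h):
            raise ValueError("give at least one prediction time after inoculation")


# Example selection (values are illustrative only).
cfg = PredictionRunConfig("Vero", "Poxviridae", "PFU/mL", 1e-6, (4.0, 6.0, 15.0))
```

Validating the selections up front lets the instrument reject an impossible request (for example, a dilution greater than 1) before any plate is imaged.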
Optionally, the menu can include a confidence level feature in which the user can program the application such that only predictions within a certain confidence interval or error limit are reported, and predictions with larger uncertainty are not reported. The interface shown in
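One way such a confidence filter could work is sketched below, assuming the model supplies a predicted titer and a standard deviation for each prediction time (as a probabilistic model would). The interval rule, the default z = 1.96 and the relative limit are hypothetical choices for illustration.

```python
from typing import Iterable, List, Tuple


def reportable_predictions(predictions: Iterable[Tuple[float, float, float]],
                           max_rel_half_width: float,
                           z: float = 1.96) -> List[Tuple[float, float, float]]:
    """Keep only predictions whose confidence interval is narrow enough to report.

    predictions: (time_h, predicted_titer, std) tuples, e.g. from a probabilistic
    model. A prediction is kept if z * std <= max_rel_half_width * predicted_titer,
    i.e. the (approximately 95% for z = 1.96, Gaussian) interval half-width is
    within the user's relative error limit.
    """
    return [(t, mu, sd) for t, mu, sd in predictions
            if mu > 0 and z * sd <= max_rel_half_width * mu]
```

For example, with a 25% limit, a prediction of 1e8 with a standard deviation of 5e7 is suppressed, while the same prediction with a standard deviation of 1e7 is reported.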
Additionally, the menu can include an option to enter into a training mode whereby the user sets up the experimental design to train new, additional machine learning models to predict virus titer. For example, the display of the instrument could include a “TRAIN” icon (see
The trained machine learning model could be implemented in a processing unit of the instrument 400 of
As explained above, an embodiment is described in which an image or a sequence images is taken and segmentation algorithms are used to identify individual cells, see
Furthermore, because the methods are based on image-based trained models, the methods permit determination of the degree of cytopathic effects by predicting information such as the size, number, and specific location of the plaques. Moreover, the presence of cytopathic effects in a sample does not necessarily equate to plaque formation. The plaque assay of this disclosure enables a prediction of whether the cytopathic effects will develop into plaques. It is also not obvious or expected that all cells exhibiting cytopathic effects will lead to plaque formation; certain environmental conditions have to be present for plaque formation to occur. The live-cell, unperturbed imaging capabilities of an incubator-based microscope, such as shown in
The various embodiments of this disclosure may be complemented by in-silico labelling. In-silico labelling means that a machine learning or deep learning model has been trained to predict the corresponding fluorescent image of a fluorescent label of interest. Given a dataset of virus-treated cell cultures that is labelled with a label indicating viral activity, degree of infection, etc., a machine learning or deep learning model may be trained to predict the label from a corresponding light microscopy image. This trained model may then be applied to further image sets to predict a corresponding label in a label-free fashion. The predicted label can then be used as auxiliary input in the plaque prediction model or used as additional description of cells/grid elements in which predictions are assigned to particular cells in a grid.
Multivariate data analysis may also detect other phenotypic effects, either over time or at a single time point, such as cell detachment and rounding during normal replication, that are not directly related to viral activity; such effects can either be used as part of the prediction model or filtered out. The emerging technology may also be able to discriminate plaque size and growth rate in real time or over time, potentially providing information for quality control monitoring, outlier detection, and root cause analysis investigations for a number of quality parameters relating to the virus sample tested, including the aggregation status of the original sample, viral potency, and variation within the population due to mutations. The multivariate data analysis based on the extracted or generated image feature sets or other related metadata may also function as a discovery tool for identifying other, as yet undetermined, virus/cell interaction features through the detection of other variations in plaque morphology. Such variations in plaque morphology may not be distinguishable by current methods but are revealed by the multi-parametric analysis and throughput enabled by the combination of automation, machine learning, multivariate data analysis and live cell imaging, which are aspects of this disclosure.
As another benefit of the methods of this disclosure, the plaque size required for detection is reduced, effectively allowing more plaques per unit area or field of view without risk of overlap. This in turn reduces the number of dilutions required in the dilution series, lessening a significant labor burden. Alternatively, one can decrease the area needed to examine plaques from an appropriate test dilution and potentially move to smaller formats with higher throughput.
Specialist media may be added to the culture plate, e.g., media containing detection aids specific to the virus, or media containing agents that aid imaging, both generally and in a machine-learning-specific manner. The latter might have more general application. Such media could, for example, contain reagents that react to the release of cell contents or that aggregate on viral antigens (e.g., antibodies) and either carry detectable markers or, through their aggregation, form detectable structures. Such reagents might be fluorogenic dyes, dyes with higher quantum yields at lower concentrations in the medium, or other reagents such as molecular tags designed for detection through specific imaging methods, e.g., Raman spectroscopy. Reagents could be used for early detection of specific cytopathic effects such as, but not limited to, remodeling of the cytoskeleton, live/dead detection via membrane integrity changes, activation of apoptosis and autophagy pathways, cell cycle changes, and oxidative stress.
Antibody and other labels can facilitate the generation of mathematical algorithms up front by adding classical identification for training the machine learning or AI models that would be derived from these images; i.e., they can tell the person doing the modeling where the action is and where to look. These reagents can also be used for model confirmation. This is particularly true in cases where the gross effects of virus infection are less obvious, such as in a heterogeneous or raw preparation of virus where cytopathic effects might be more subtle. This may occur with viruses that are not lytic, or that are not lytic within the same time period as the more virulent viruses in a preparation.
Such binding molecules can provide additional information by adding information on the chemical and molecular composition of an area to the detectable physical structural information.
As stated previously, the present methods now allow for the essentially contemporaneous determination of (1) total particle count (by means of an assay of the sample in a viral particle counting instrument such as Sartorius' Virus Counter®) and (2) infectious particle count, via the viral plaque assay of this disclosure. Both assays can be conducted in parallel, at the same time, in separate instruments or platforms.
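Once both numbers are available, the particle-to-PFU ratio discussed in connection with Table 1 follows directly, as in the small sketch below. The function name and the example concentrations are made up for illustration.

```python
import math


def particle_to_infectivity_ratio(total_particles_per_ml: float,
                                  infectious_per_ml: float) -> float:
    """Total (functional + non-functional) particles per infectious unit."""
    if total_particles_per_ml <= 0 or infectious_per_ml <= 0:
        raise ValueError("both concentrations must be positive")
    return total_particles_per_ml / infectious_per_ml


# Made-up example: 2e10 particles/mL from a particle counter, 1e8 PFU/mL predicted.
ratio = particle_to_infectivity_ratio(2e10, 1e8)
orders_of_magnitude = math.log10(ratio)
```

Reporting the ratio on a log10 scale makes it easy to compare samples whose ratios differ by orders of magnitude.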
Finally, these techniques may also be applied to new chemical and biological entity potency assays developed and used with adhered cells outside of virology. These would include, but not be limited to, neutralization, cell proliferation, cell death (apoptosis), cytokine release, modulation of cell signaling, modulation of inflammatory response, receptor binding/activation, ligand binding, and calcium flux.
The applications for this invention are quite broad. The majority of the virus quantification market across basic research, development and manufacturing considers the plaque assay to be the standard assay. The applications include, but are not limited to:
1. basic research (academic or industrial),
2. assay development,
3. process development and production, including gene therapy, protein manufacturing via expression with baculovirus, and viral vaccines,
4. antiviral screening and development,
5. manufacturing Quality Control (QC),
6. contract research organization (CRO) testing,
7. viral stock establishment,
8. virus removal and/or inactivation,
9. Good Manufacturing Practice (GMP) validation and non-GMP studies, and
10. potency assays for new chemical or biological entities.
The appended claims are offered as further descriptions of the disclosed inventions. The above detailed description describes various features and functions of the disclosed systems, devices, and methods with reference to the accompanying figures. In the figures, similar symbols typically identify similar components, unless context indicates otherwise. The illustrative embodiments described in the detailed description, figures, and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
With respect to any or all of the message flow diagrams, scenarios, and flowcharts in the figures and as discussed herein, each step, block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, functions described as steps, blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including in substantially concurrent or in reverse order, depending on the functionality involved. Further, more or fewer steps, blocks and/or functions may be used with any of the message flow diagrams, scenarios, and flow charts discussed herein, and these message flow diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
A step or block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively, or additionally, a step or block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data). The program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data may be stored on any type of computer-readable medium, such as a storage device, including a disk drive, a hard drive, or other storage media.
The computer-readable medium may also include non-transitory computer-readable media such as computer-readable media that stores data for short periods of time like register memory, processor cache, and/or random access memory (RAM). The computer-readable media may also include non-transitory computer-readable media that stores program code and/or data for longer periods of time, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, and/or compact-disc read only memory (CD-ROM), for example. The computer-readable media may also be any other volatile or non-volatile storage systems. A computer-readable medium may be considered a computer-readable storage medium, for example, or a tangible storage device.
Moreover, a step or block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
While various aspects and embodiments have been disclosed for purposes of illustration and not limitation, it will be apparent to those skilled in the art that variations from the specifics of this disclosure are possible without departing from the scope of the invention. All questions concerning scope are to be answered by reference to the appended claims.