The invention relates to cell biology imaging and/or High-Content Screening and/or high throughput screening (HTS).
High-Content Screening (HCS) technologies generate a massive amount of data. Today's major challenge in HCS is the analysis of those data and thus the development of effective data mining and exploration tools. Indeed, image acquisition, segmentation and feature extraction are now well handled by the latest technologies such as the BD Pathway™ and BD Attovision™ systems. However, today's data analyses of biological responses only provide results from a black-box global analysis, that is, overall results for each well, and do not extract the full potential of the High-Content single-cell response data.
The known solutions are very complicated and require trained personnel to perform an accurate analysis. Several computer science concepts are required (token) to achieve a complete analysis.
The present invention proposes a method, a system and a computer program allowing the visualization of cell images, and the analysis and representation of cellular data produced by microscopes and by third-party automated image processing systems. It allows a user to define gates to segregate cell populations and then to quantify the assay.
In particular the invention proposes an innovative software-implemented method and system allowing the discrimination of the different cellular populations present in the well and thus individual analysis on each of them. Moreover, it also provides statistical analysis methods and eases the data management and visualization.
Thus, the invention proposes a data processing system for generating representations and analyses of cytometric information, said system being configured for:
loading at least one set of data including:
processing the loaded data for obtaining representations and analyses of the data, said representations and analyses being chosen from the following;
The invention can comprise at least one of the following features:
The invention also proposes a method for generating representations and analyses of cytometric information performed by the system according to the invention.
The invention also proposes a user interface for use with the system of the invention, for visualizing and analyzing data from cellular image processing, comprising a graphical interface operative to designate image and statistics files source and destination, display parameters, scatter plots, heat-maps, dose responses, biological conditions input, gates in scatter-plots and histograms.
The invention also proposes a computer program executed by the system according to the invention.
The present invention aims to analyze cell biology data obtained from experiments run in a variety of formats, including 96- to 384-well microplate formats, and provides a method and a system having the following advantages:
The present invention consists of an innovative software solution that can discriminate the various cell populations present in a well and thus provides individual analysis of each of them. Moreover, this solution provides statistical analysis methods and facilitates data management and display.
The invention provides a generic tool for loading and visualizing statistical data generated by cell segmentation and feature extraction. A user-friendly graphic interface displays selected features of wells of interest with various graphic options. Gates are used to identify subpopulations and to filter out unwanted cells. Cell statistics (including gate information) can be exported into table files.
An advantage of the invention is that individual cells can be selected to check their positions in the different windows in order to place the gate correctly. Cells can be checked in galleries, in the image fields or in the scattergrams.
An advantage of the invention is the use of gates to quantify high content screening data or image-based cellular assays, for which gating is well suited. It helps to remove all unwanted cells and to segregate subpopulations that behave differently. It makes it possible to monitor cell subpopulations using the percentage of cells in each subpopulation as a read-out. This read-out is more stable than the average of a given feature.
An advantage of the invention is that replicates are easily regrouped, making navigation through the results very simple. In particular, it allows the direct derivation of dose response functions with error bars, and directly computes the experiment robustness measure (z factor).
Visualization of HCS data requires the display of much cellular information in separate graphs such as the cell view, field view, scatter plot and histogram. In the invention, all the different windows are synchronized in real time. The multiple windows allow a visual check of the position of the gates and make trial-and-error gate positioning possible.
The invention makes it possible to differentiate between the multiple cell populations present in a given well and thus to analyze each of them individually.
The use of the invention does not require any particular skills in computer science, image processing or statistics.
The invention also presents multiple other advantages since:
Other features and advantages of the invention will appear in the following description. Embodiments of the invention will be described with reference to the drawings, in which:
We introduce various definitions for the understanding of the following description.
Feature: numerical value measuring a parameter in a cell. Features can reveal the cell morphology (shape quantification) or the cell intensity.
Gate: closed region in a 1D or 2D feature space. Gates are used to select cells that are present in the defined region.
Child and parent gates: a gate “C” can be linked to an existing gate “P”, such that the cells selected by gate “C” are those inside both gate “C” and gate “P”. Gate “C” is named the child gate and gate “P” is named the parent gate.
Scatterplot or scattergram: the data are displayed as a collection of points, each having the value of one feature determining the position on the horizontal axis and the value of the other feature determining the position on the vertical axis.
Density plot: it is calculated as a two-dimensional histogram view, where the 2D space is subdivided into subunits of area. The color of each subunit indicates the cell count in that area. White indicates that no cells are detected. Other colors indicate that at least one cell is detected (black means few cells, yellow means many cells).
Well: the well is a part of a multi-well plate. Each well is treated independently from the others.
Sample: If the experiment is not done on a multi-well plate but in any other format, the user has to name the different samples using the well name convention (A01, A02, . . . P24). Samples are then considered as wells.
Cells: referring to biological cells. By extension, the term can also refer to parts of cells such as the cell membrane, nuclei or cytoplasmic compartment. Sometimes the word “cell/object” will be used.
Data to be opened in the data processing system are generated by third-party cell/object segmentation software (like Metamorph™ (Molecular Device Corporation) or Attovision™ (BD Becton, Dickinson)). Such software loads the initial images acquired with a microscope, corresponding to wells, and analyzes them. It identifies the different cells/objects and segments them (meaning it defines a unique area around each cell to separate the cell from the others and from the background area). The result of the segmentation is a mask for all images giving the position and the boundary of each cell/object. Then, all cells/objects are quantified by calculating morphometric parameters and/or intensity-based measurements. Each cell/object is thus defined by a set of parameters that are called numerical features (or, for the sake of simplicity, features). For each well or sample corresponding to one or more channel images, an array of features is extracted, giving a feature set for each cell/object. A dataset is generally composed of a set of feature arrays, one per well. The dataset is saved into a text file or an Excel™ file.
The full dataset is composed of the original image files (one or more channels per well), the feature dataset and masks of the cell/object segmentation. The correspondence between those different kinds of information is achieved by the well names and cell indexes.
The application is, for instance, compatible with the following file formats:
The system loads data from the different wells composing the plate. If the features associated with the well data are not consistent across the plate, then the application can reorder them and fill the missing features with dummy values (NaN: Not a Number).
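The reordering and NaN filling described above can be sketched as follows. This is a minimal illustration only: the function name `align_features` and the nested-dictionary layout of the data are assumptions made for the example and are not taken from the application.

```python
import math

def align_features(wells):
    """wells: {well_name: {feature_name: [value per cell]}} (hypothetical layout).

    Returns (feature_names, aligned) where every well carries every feature,
    in the same order; a missing feature is filled with NaN for each cell.
    """
    feature_names = []
    for features in wells.values():
        for name in features:
            if name not in feature_names:
                feature_names.append(name)  # keep first-seen order
    aligned = {}
    for well, features in wells.items():
        n_cells = max((len(v) for v in features.values()), default=0)
        aligned[well] = {
            name: list(features.get(name, [math.nan] * n_cells))
            for name in feature_names
        }
    return feature_names, aligned

names, aligned = align_features({
    "A01": {"area": [10.0, 12.0], "intensity": [0.5, 0.7]},
    "A02": {"intensity": [0.9], "area": [11.0]},  # same features, other order
    "A03": {"area": [9.0]},                       # "intensity" is missing
})
```

After the call, well A02 lists its features in the common order and well A03 holds one NaN value for the missing feature.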
Once a first dataset is loaded, it is possible to load a second one and bind them. There are two different ways to bind datasets. In the first case, the wells present in the first dataset are different from the wells present in the second dataset. The binding then consists in pooling all wells from the first dataset and from the second dataset. If some feature names from the first dataset differ from the feature names of the second dataset, then a new feature name set is defined as the union of the two feature name sets. Feature data that are not reported in a dataset are assigned NaN values.
The second way to combine two datasets is to bind two datasets referring to the same well list and the same cell list but containing different features. In that case the two datasets are bound by reporting, for each cell, the information from both datasets.
Users can import the experiment's plate layout to make the analysis easier. The layout defines a biological condition for each well. The plate format is specified by either the 96 or the 384 option. The user can directly change the table values or copy the data from Excel™ and press Paste Clipboard. Once the table is correctly filled, loading is completed by pressing the “Import Layout” button. The different wells having the same biological condition are considered replicates and are regrouped. The replicate grouping allows all cells belonging to a biological condition to be represented together in a scattergram or a histogram. It also allows the standard deviation among replicates to be computed in order to plot error bars in dose responses or to compute the z factor.
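The two binding modes can be illustrated by the following sketch. The names `bind_wells` and `bind_features`, and the representation of a dataset as a dictionary mapping each well to a list of per-cell feature dictionaries, are hypothetical choices made for the example.

```python
import math

def bind_wells(first, second):
    """First way: pool the wells of two datasets that contain different wells.

    The feature name set becomes the union of both sets; values that a
    dataset does not report are set to NaN.
    """
    names = []
    for dataset in (first, second):
        for cells in dataset.values():
            for cell in cells:
                for name in cell:
                    if name not in names:
                        names.append(name)
    pooled = {}
    for dataset in (first, second):
        for well, cells in dataset.items():
            pooled[well] = [{n: cell.get(n, math.nan) for n in names} for cell in cells]
    return pooled

def bind_features(first, second):
    """Second way: same wells and same cells, but different features."""
    bound = {}
    for well, cells in first.items():
        other = second[well]
        if len(other) != len(cells):
            raise ValueError(f"cell lists differ in well {well}")
        # Cells are matched by their index within the well.
        bound[well] = [{**a, **b} for a, b in zip(cells, other)]
    return bound

pooled = bind_wells({"A01": [{"area": 10.0}]}, {"A02": [{"intensity": 0.5}]})
bound = bind_features({"A01": [{"area": 10.0}]}, {"A01": [{"intensity": 0.5}]})
```

In this sketch cells are matched by their position in the well, which stands in for the cell indexes mentioned above.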
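The replicate grouping and the derived robustness measure can be sketched as follows. The function names are hypothetical, and the z factor is assumed here to be the commonly used screening coefficient 1 − 3(σp + σn)/|μp − μn| computed between two conditions; the text above does not spell out the formula.

```python
from statistics import mean, stdev

def group_replicates(layout):
    """layout: {well_name: biological_condition} -> {condition: [well names]}."""
    groups = {}
    for well, condition in layout.items():
        groups.setdefault(condition, []).append(well)
    return groups

def replicate_stats(groups, readout):
    """readout: {well_name: value}. Returns {condition: (mean, std)} over replicates."""
    stats = {}
    for condition, wells in groups.items():
        values = [readout[w] for w in wells]
        sd = stdev(values) if len(values) > 1 else 0.0  # error bar of the condition
        stats[condition] = (mean(values), sd)
    return stats

def z_factor(positive, negative):
    """Assumed definition: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    (mu_p, sd_p), (mu_n, sd_n) = positive, negative
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)

layout = {"A01": "control", "A02": "control", "B01": "drug", "B02": "drug"}
readout = {"A01": 10.0, "A02": 12.0, "B01": 50.0, "B02": 54.0}
groups = group_replicates(layout)
stats = replicate_stats(groups, readout)
z = z_factor(stats["drug"], stats["control"])
```

The per-condition standard deviation is the quantity that would be drawn as an error bar in a dose response plot.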
With reference to
With reference to
The plate view 1 allows selecting the wells to display. The user performs a multiple selection by holding the Control key on the keyboard and clicking on the wells. When a plate layout is loaded, the user has the choice between the well view and the replicate view. In the well view, all wells appear in the list, whereas in the replicate view, biological conditions are listed and each biological condition contains several wells. In the replicate view, the different replicate names are displayed with, in brackets, the number of wells corresponding to the given replicate name.
The two list boxes 2 and 3 are used to select the features to display. If only one feature is selected, a histogram of the chosen feature is displayed. If two features are selected, a scatter plot of the two features is displayed.
In the histogram view and in the scatter plot view, several display options 4 are available. First, colors encode the cell gate and different symbols are used for the multi-selection. Second, colors encode the multiple selection and symbols encode the gates. Third, colors encode the cell gate and all cells are represented by points. Fourth, colors encode the multiple selection and all cells are represented by points. In all cases, the figure legend 8 explicitly gives the meaning of the different colors and symbols.
The right panel gives some real time statistics 5 on the selected wells or replicates and the gates, as shown
The plot panel 6 gives a view of the selected features and the gates. The X and Y axis names are specified on the graph. Several graphical options are available, such as the log view or the density map. The plot limits can be set in several ways: first, the minimum and maximum values are specified by the user; second, the plot limits are set to the minimum and maximum values of the current cell selection; third, the plot limits are set to the minimum and maximum values of the full dataset.
With gate management 7, different buttons allow creating, deleting and selecting gates.
The histogram view is obtained by selecting only one feature. Each line represents one histogram, and one histogram represents the cells belonging to a well (or a replicate) and to a given gate. One histogram is drawn for each well or replicate. The number of bins can be changed by the user. The histograms can be renormalized and expressed as a ratio of the full cell population.
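The binning and the renormalization can be sketched as follows. The function names, the equal-width bins and the choice of dividing by the total number of cells of the well or replicate are assumptions made for the illustration.

```python
def histogram(values, n_bins, limits):
    """Count the values falling into n_bins equal-width bins between the limits."""
    lo, hi = limits
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The maximum value is assigned to the last bin.
            k = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
            counts[k] += 1
    return counts

def normalized_histogram(gated_values, n_bins, limits, n_total_cells):
    """Express the bin counts of a gated population as ratios of the full population."""
    return [c / n_total_cells for c in histogram(gated_values, n_bins, limits)]

# Three cells of a gate, taken from a well that contains four cells in total.
ratios = normalized_histogram([1.0, 2.0, 7.0], 2, (0.0, 10.0), 4)
```

Here two of the four cells fall into the first bin and one into the second, giving the ratios 0.5 and 0.25.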
With reference to
To view a scatter plot, two features have to be selected. Each symbol corresponds to an individual cell. As for the histogram view, the axes can be either logarithmic or linear.
With reference to
Density Map View (See
Pixel intensity corresponds to the cell density. Bright pixels correspond to high-density regions. The number of bins can be set by the user.
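The density map amounts to a two-dimensional histogram of the two selected features. A minimal sketch, assuming equal-width bins and using the hypothetical name `density_grid`:

```python
def density_grid(xs, ys, n_bins, x_range, y_range):
    """Count the cells per subunit of area (2D histogram).

    xs, ys: feature values of each cell; x_range, y_range: (min, max) plot limits.
    Returns a list of rows; grid[j][i] is the cell count in column i of row j.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * n_bins for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # the cell falls outside the plot limits
        # Map each value to a bin index; the maximum goes to the last bin.
        i = min(int((x - x_min) / (x_max - x_min) * n_bins), n_bins - 1)
        j = min(int((y - y_min) / (y_max - y_min) * n_bins), n_bins - 1)
        grid[j][i] += 1
    return grid

grid = density_grid([0.1, 0.2, 0.9], [0.1, 0.15, 0.95], 2, (0.0, 1.0), (0.0, 1.0))
```

A display layer would then map each count to a pixel intensity, with a count of zero shown as an empty subunit.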
Plate Visualization Using Heatmap (See
A global view of the plate is available using a heatmap representation of the 96 or 384 multi-well plate. This visualization is only accessible if the well names are formatted using standard well names such as A5, A05 or A005. This representation is a color-coding of the average intensity per well. A cross in a well indicates that no cells are present in the well. The represented feature and the current gates can be changed using the pop-up menus. Several kinds of information can be displayed:
If the user moves or removes a gate, the heatmap representation is updated in real time.
If the user places the mouse cursor over a well in the heatmap representation, additional information is written in the window, specifying the well label (like A05), the biological condition if it is referenced and the current value according to the heatmap parameters.
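Placing wells on the heatmap requires converting a standard well name into a row and a column. The following sketch shows one way to accept the A5, A05 and A005 spellings; the regular expression and the function names are assumptions made for the example, and the default grid of 8 rows by 12 columns corresponds to a 96-well plate (16 by 24 for a 384-well plate).

```python
import re

# One row letter (A to P) followed by a column number with optional leading zeros.
WELL_NAME = re.compile(r"^([A-P])0*([1-9][0-9]?)$")

def parse_well(name):
    """Return the zero-based (row, column) for names such as A5, A05 or A005, else None."""
    match = WELL_NAME.match(name.strip().upper())
    if match is None:
        return None
    return ord(match.group(1)) - ord("A"), int(match.group(2)) - 1

def heatmap_grid(well_values, n_rows=8, n_cols=12):
    """Place per-well values on a plate grid; None marks a well without a value."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for name, value in well_values.items():
        position = parse_well(name)
        if position is None:
            continue  # non-standard well name: cannot be placed on the plate
        row, col = position
        if row < n_rows and col < n_cols:
            grid[row][col] = value
    return grid

grid = heatmap_grid({"B03": 1.5, "not a well": 2.0})
```

Wells left at `None` in this sketch would correspond to wells drawn without a value in the heatmap.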
With reference to
A dedicated window allows visualizing dose response. The user can select biological conditions (
With reference to
Gates are used to select cells according to their location in the feature space. Each gate is associated with one or two features. The gate is drawn in the space of the associated features. If the gate is drawn in the histogram view, then the gate is associated with one feature. If the gate is drawn in a scatter plot, then the gate is associated with two features. In the scatter plot view, the user draws a closed polygon on the graph. In the histogram view, the user draws a line to define the selection interval of the gate. A new gate can be linked to another by selecting an already defined gate in the proposed list (“Base the new gate on”). If the choice is “new gate”, then the new gate will be independent of all other gates. If the choice is an existing gate, then the resulting gate will be the intersection between the new gate and the selected gate.
The process can be extended to any number of gates. When a gate is linked to another, the two gates may or may not be defined in the same feature set. If the “delete gate” button is pressed, the gate selected in the list box is deleted. All gates can be moved either by moving one point of the polygon or by dragging the whole gate. The symbol and the color of the cells are not automatically refreshed when a gate is changed.
The gate list can be exported to text files and imported again later. It is also possible to remove points inside a gate or outside a gate. This is generally used to remove unwanted cell populations such as segmentation errors or cell clusters. Cell removal is applied to the full dataset (to all wells).
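The interval gates, polygon gates and parent/child linking described above can be sketched as follows. The class name `Gate`, its attributes and the use of a ray-casting point-in-polygon test are assumptions made for the illustration; the text does not state how membership is computed.

```python
def in_polygon(x, y, vertices):
    """Ray-casting test: is the point (x, y) inside the closed polygon?"""
    inside = False
    n = len(vertices)
    for k in range(n):
        (x1, y1), (x2, y2) = vertices[k], vertices[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

class Gate:
    """A 1D interval gate (one feature) or a 2D polygon gate (two features)."""

    def __init__(self, features, shape, parent=None):
        self.features = features  # tuple of one or two feature names
        self.shape = shape        # (low, high) or list of (x, y) vertices
        self.parent = parent      # optional parent gate

    def contains(self, cell):
        if self.parent is not None and not self.parent.contains(cell):
            return False  # a child gate is the intersection with its parent
        if len(self.features) == 1:
            low, high = self.shape
            return low <= cell[self.features[0]] <= high
        fx, fy = self.features
        return in_polygon(cell[fx], cell[fy], self.shape)

parent = Gate(("area",), (50.0, 200.0))
child = Gate(("dna", "edu"), [(0, 0), (10, 0), (10, 10), (0, 10)], parent=parent)
cell = {"area": 100.0, "dna": 5.0, "edu": 5.0}
selected = child.contains(cell)
```

Because each gate only stores a reference to its parent, the chain can be extended to any number of gates, and parent and child may use different features.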
With reference to
With reference to
A “View cells” button allows viewing some cells in a given gate. This feature is only available if the dataset contains links to valid image files. The user has to choose a gate from which the cells will be picked and a particular well. To ease the process, only wells contained in the current selection are offered to the user. A figure is then created showing a montage of cropped cells. On each cell a feature value is added to give a direct understanding of the feature values. In the new figure many display options are available. First, the selected well is indicated and can be changed. Second, the cell montage can be obtained using an overlay of different channels. Each channel can be assigned one of the following colors: green, red, blue and grey. The cyan line corresponds to the contour of the cell of interest whereas the blue ones correspond to the surrounding cells. Third, the chosen gate is indicated and can be changed. Fourth, the montage parameter indicates the number of cells in the montage: 3×3, 5×5 or 10×10. If all the cells cannot fit into the montage, several pages are accessible. Fifth, the user can change the page if several pages are accessible. Sixth, the feature displayed on each cell can be changed.
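The paging of the montage can be illustrated by the short sketch below, in which the hypothetical function `montage_pages` splits the cells of a gate into pages of side × side cropped cells.

```python
import math

def montage_pages(cell_indexes, side):
    """Split the cells of a gate into pages of side x side cropped cells."""
    per_page = side * side
    n_pages = math.ceil(len(cell_indexes) / per_page)
    return [cell_indexes[p * per_page:(p + 1) * per_page] for p in range(n_pages)]

# 20 cells in the gate, shown in a 3x3 montage.
pages = montage_pages(list(range(20)), 3)
```

With 20 cells and a 3×3 montage, three pages are accessible and the last one holds the two remaining cells.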
With reference to
The field view is obtained by clicking on a well in the heatmap representation. It opens a window containing an overlay of the different channels of the selected well. The overlay is controlled by the same options as in the cell montage. A color is given for all cells outside of any gate. The correspondence between colors and gates is given directly in the field image. The user can click on a cell and view its position in the scatter plots and in the cell montage if this cell is present in the montage. The color of the cell contour is given by the gate containing the cell.
With reference to
In many situations, creating new features from the existing ones is valuable. Two ways of creating a feature are possible: either combining two features (like multiplying two features) or combining one feature and a fixed value (like dividing one feature by 1000). A dedicated interface allows performing those operations. The number of features has to be set and the subsequent features have to be selected as in the following examples. For each feature creation, an operator is selected among the usual operators *, /, +, −.
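Both ways of creating a feature can be sketched with a single helper. The function name `create_feature`, the per-cell dictionary layout and the choice of returning NaN on a division by zero are assumptions made for the example.

```python
import operator

OPERATORS = {"*": operator.mul, "/": operator.truediv,
             "+": operator.add, "-": operator.sub}

def create_feature(cells, new_name, left, op, right):
    """Add the feature `new_name` to every cell.

    left: name of an existing feature; right: the name of another feature
    or a fixed numerical value.
    """
    apply = OPERATORS[op]
    for cell in cells:
        b = cell[right] if isinstance(right, str) else right
        try:
            cell[new_name] = apply(cell[left], b)
        except ZeroDivisionError:
            cell[new_name] = float("nan")  # assumed handling of a zero divisor
    return cells

cells = [{"area": 2000.0, "intensity": 3.0}]
create_feature(cells, "total", "area", "*", "intensity")  # two features
create_feature(cells, "area_k", "area", "/", 1000)        # feature and fixed value
```

The first call multiplies two features of each cell, and the second divides one feature by the fixed value 1000.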
With reference to
A dedicated interface allows computing the first principal components of a dataset. The user is required to select the features of the current dataset to which a PCA (principal component analysis) is applied. The user has to select the number of components to be added to the current dataset, as well as a preliminary normalization. The available normalizations are no normalization, “z score” normalization and “min max” normalization. The “z score” normalization can be achieved using the zscore Matlab™ command. The “min max” normalization stretches the dataset (affine transformation) such that the new minimum is 0 and the new maximum is 1.
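The two preliminary normalizations can be sketched as follows for one feature; the PCA step itself is not shown. The function names are hypothetical, and the z score uses the sample standard deviation (n − 1 denominator), which is assumed here to match the default behavior of the command mentioned above.

```python
from statistics import mean, stdev

def z_score(values):
    """Center to zero mean and scale to unit sample standard deviation.

    Requires at least two values that are not all equal.
    """
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def min_max(values):
    """Affine stretch so that the minimum maps to 0 and the maximum to 1.

    Requires at least two distinct values.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

z = z_score([1.0, 2.0, 3.0])
m = min_max([2.0, 4.0, 10.0])
```

Each selected feature would be normalized independently in this way before the principal components are computed.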
With reference to
In order to lighten the data and to increase the readability, some features can be deleted for a better focus on the important ones. This can be done by selecting one or several features to delete. If a selected feature is used by a gate then the given feature is not deleted.
Once the gates are correctly placed, an export function generates the statistics of the dataset. Those statistics give a quantitative summary of each well and each replicate. The user can choose the type of statistics to be performed on the dataset. The available choices are:
The user can also choose the gates in which the statistics will be calculated. Once the options are set by the user, the statistics are exported into an Excel™ file.
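A per-well, per-gate export can be sketched as follows. The statistics shown (cell count, percentage of cells in the gate and mean of one feature) are illustrative choices only, since the list of available statistics is not reproduced here, and CSV text is used as a stand-in for the spreadsheet file. The function name and the representation of gates as predicates are likewise assumptions.

```python
import csv
import io

def export_statistics(wells, gates, feature):
    """wells: {well: [cell dicts]}; gates: {gate_name: predicate(cell) -> bool}.

    Returns CSV text with, per well and gate, the cell count, the percentage
    of the well's cells inside the gate and the mean of `feature` in the gate.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["well", "gate", "count", "percent", f"mean_{feature}"])
    for well, cells in wells.items():
        for gate_name, inside in gates.items():
            selected = [c for c in cells if inside(c)]
            percent = 100.0 * len(selected) / len(cells) if cells else 0.0
            if selected:
                avg = sum(c[feature] for c in selected) / len(selected)
            else:
                avg = float("nan")  # no cell in the gate
            writer.writerow([well, gate_name, len(selected), percent, avg])
    return buffer.getvalue()

wells = {"A01": [{"edu": 1.0}, {"edu": 3.0}, {"edu": 9.0}, {"edu": 11.0}]}
gates = {"S phase": lambda c: c["edu"] > 5.0}
text = export_statistics(wells, gates, "edu")
```

The percentage column corresponds to the read-out based on the fraction of cells in each subpopulation.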
With reference to
In this document layout, the user places and arranges any graphs of the application. These can be scattergrams, histograms and cell montages, as well as textual information such as the current well or biological condition and statistics. All information present in this graph is related to the current well or biological condition selection.
With reference to
All windows are connected through event processing. Whenever a gate is moved or removed, all the windows are automatically updated so that the user can directly monitor the impact of the gate change.
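Such event-based synchronization can be sketched with a simple observer pattern. The class `GateModel` and its methods are hypothetical; each subscribed callback stands for one window that redraws itself when notified.

```python
class GateModel:
    """Holds the gates and notifies every registered view when one changes."""

    def __init__(self):
        self.gates = {}
        self._listeners = []

    def subscribe(self, callback):
        """Register a view callback taking (event, gate_name)."""
        self._listeners.append(callback)

    def _notify(self, event, name):
        for callback in self._listeners:
            callback(event, name)

    def move_gate(self, name, shape):
        self.gates[name] = shape
        self._notify("moved", name)

    def remove_gate(self, name):
        del self.gates[name]
        self._notify("removed", name)

log = []
model = GateModel()
model.subscribe(lambda event, name: log.append(("scatter plot", event, name)))
model.subscribe(lambda event, name: log.append(("heatmap", event, name)))
model.move_gate("G1", (0.0, 1.0))
model.remove_gate("G1")
```

Every gate change reaches both subscribed views, which is what keeps the windows consistent with each other.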
With reference to
The user can click on a given cell in scattergrams, density maps, cell montages or field views. The selected cell is represented by a red cross in the scattergrams and density maps, or surrounded by a red rectangle in field views and cell montages. This enables the same cell to be monitored in the different views. It eases the correct positioning of the gate using a trial-and-error strategy.
A cell cycle study related example is now described.
The dataset used for the case study is a 96 well plate with Hoechst (nuclei staining) and EdU-GFP (DNA synthesis). The dataset is acquired using the BD Pathway™ 855.
1. Open Attovision™ statistical file (click on open Attovision™ statistical file).
2. Load a plate layout (click on Load plate layout) and paste biological condition from excel (press copy from clipboard button, see
3. Remove unwanted cells (like cell fragments and cell doublets):
Comparison Between the Invention and Other Products Known in the Art
The table below summarizes the differences between the invention and the known solutions of the prior art.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP2012/056928 | 4/16/2012 | WO | 00 | 5/27/2014 |
| Number | Date | Country |
|---|---|---|
| 61475828 | Apr 2011 | US |