FLUID SAMPLE CLASSIFICATION

Description

TECHNICAL FIELD OF THE INVENTION

The present invention relates to methods and computer programs for the classification of samples which have been subject to a separation of their constituents, for example either by chromatography or electrophoresis, and more particularly to a method for classifying that sample based on the relative amount and constituent profile similarity to a reference sample.

BACKGROUND OF THE INVENTION

In the manufacturing of biopharmaceuticals such as vaccines, antibodies, recombinant proteins, gene therapy vectors etc., several chromatographic separation steps are usually needed to remove various contaminants and impurities from the product. During each step of the manufacturing process, there is a need to check both amount and purity compared to reference samples.

However, the separation profiles often display multiple molecule-peak bands which may overlap. The analysis of such complex separation profiles adds significant cost and process time. Furthermore, separation profiles of complex samples can be difficult to analyse accurately, which may introduce individual operator bias. Hence, there is a significant interest in fast automated analysis, both to remove personal bias and to reduce the time of manufacturing biopharmaceuticals. Accordingly, there is a need for methods and computer programs for sample comparisons which can be both fast and automated if needed.

SUMMARY OF THE INVENTION

One aspect of the invention is to provide a method, which may be implemented by a computer program, for comparison of different samples in a biopharmaceutical process. This is achieved with the introduction of a similarity score based on a two-dimensional analysis of the relative amount and constituent profile similarity to a reference sample. The relative amount is a measurement of the magnitude of different chemical constituents of a sample compared to the magnitude of constituents of a reference sample. The constituent profile similarity calculation is a measurement of similarity of a spatial, or temporal, profile of separated constituents generated by a separation process, in comparison to the profile(s) of one or multiple reference sample(s) which have been subject to the same separation process. The resulting two-dimensional data set forms the basis of classification of samples by providing a score of similarity to each sample which allows an estimate of the similarity of the sample of interest with the reference sample(s). The analysis method can be automated and implemented using computer analysis software together with suitable hardware, after the classification criteria have been set.

One advantage is that such an analysis method allows for a fast, non-operator dependent, classification scheme, in which limits for grouping samples can be easily set for automated analysis. This method allows for decisions to be made, for example whether the manufacturing process is working satisfactorily, or whether separation parameters need to be changed. Additionally, it is common for separations of sample constituents to be incomplete, in other words measured bands of constituents overlap, leading to data which is difficult to analyse. The proposed method allows for such incomplete separation by comparing a measured profile with a reference in the manner described immediately above, rather than looking at certain peaks of measurements only.

Further suitable embodiments of the invention are described in the dependent claims.

DRAWINGS

FIG. 1 shows an image of an analysed Coomassie stained gel. Different samples (1-10) were separated on a polyacrylamide gel and subsequently stained with the Coomassie stain. The gel was analysed using the analysis software Image Quant TL (available from Cytiva Life Sciences).

FIG. 2 shows the electrophoretic lane profile of sample 5 in FIG. 1.

FIG. 3 shows the two-dimensional similarity score plot of the samples in FIG. 1 and FIG. 2. The relative amount of sample (y-axis) and the lane profile similarity score (x-axis) are displayed in one plot. In this example, sample number 8 was the reference sample.

FIG. 4 shows one way of grouping the different samples of FIG. 3 into three groups (A-C).

FIG. 5 shows chromatograms for two protein samples.

FIG. 6 shows a 2D similarity scatter plot of 185 chromatograms from different cycles in a protein purification process produced using an embodiment of the present invention.

FIG. 7 shows a Graphical User Interface (GUI) for presenting results produced using an embodiment of the present invention.

FIG. 8 shows a method according to various embodiments of the present invention.

DEFINITIONS

As used herein, the terms “comprises,” “comprising,” “containing,” “having” and the like can have the meaning ascribed to them and can mean “includes,” “including,” and the like; “consisting essentially of” or “consists essentially” likewise has the meaning ascribed in patent law and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited are not changed by the presence of more than that which is recited, but excludes prior art embodiments.

DETAILED DESCRIPTION OF EMBODIMENTS

In one aspect, the present invention discloses a computer assisted method for automated analysis of samples which have been subject to a separation, which term includes forming into aliquots, fractions or streams of higher concentration of at least some of the samples' constituents. The term separation includes partial separation.

In certain embodiments, the samples may be intermediates or a final product in a biopharmaceutical manufacturing process. In addition, reference samples may either be subjected to a chemical separation and subsequent analysis when the process was created, or saved for a chemical separation analysis at a later stage. It is well known in the art of biopharmaceutical manufacturing how to store reference samples for future analysis. Saved reference sample data may also be used for comparisons.

In some embodiments, the separation is performed by electrophoresis and the separated molecules are detected using colour stains or fluorescent dyes. Lane profiles of the electrophoretic separations are then compared, both in terms of how much sample constituent there is in the lane, and how similar the lane spatial profiles are, in comparison with a reference sample.

In some embodiments, the separation is performed by chromatography and separated molecules are detected using light absorbance measurements. Separation profiles are then compared, both in terms of how much sample was eluted from the chromatography column and how similar the sample profiles (also called chromatograms) are. In this way the reproducibility of different batches in a bioprocess manufacturing process can be quickly assessed and decisions for the continuation of the process can be easily made.

EXAMPLES
Example 1

Different protein samples were analysed using SDS-PAGE electrophoresis. The resulting Coomassie stained gel is shown in FIG. 1. Some samples contain multiple proteins and the resulting lane profile after electrophoretic separation is complex, i.e. it exhibits many overlapping peaks. Such lane profiles are difficult to compare manually by eye. The opacity, also termed optical density, of each lane or lanes of interest can be measured in relation to the spatial position along the lane. Such a measurement is given in the graph shown in FIG. 2 for lane 5 of FIG. 1 only, but the same measurement can be made for the other lanes shown in FIG. 1. Using the data generated from the graph of FIG. 2 (which need not be graphically represented but could exist purely as data), the lane profiles were analysed both in terms of amount of protein (opacity = optical density in this case) and how the different lane profiles correlate spatially, using the Pearson correlation coefficient. This calculation compares two arrays of data, in this case sample lane profile data of the type graphically illustrated in FIG. 2, with a reference sample lane profile, in this case lane 8. The calculated value can range from −1 (negative correlation) to 0 (no correlation) to 1 (positive correlation). The Pearson pair-wise correlation calculation is one way to correlate different lane profiles using a reference lane profile or a reference average of several lane profiles. However, there are many other ways to correlate lane profiles which may be used.
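By way of illustration only, the two-dimensional score described above might be computed along the following lines. This is a minimal sketch, not the implementation used to produce the figures: the function names `pearson` and `similarity_score`, the use of the ratio of summed optical density as the relative amount, and the numerical lane readings are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def similarity_score(sample, reference):
    """Return (profile similarity, relative amount) for one lane profile.

    Assumption: the relative amount is taken as the ratio of the summed
    signal of the sample to that of the reference.
    """
    profile_similarity = pearson(sample, reference)
    relative_amount = sum(sample) / sum(reference)
    return profile_similarity, relative_amount


# Hypothetical optical-density readings along two lanes (illustrative only).
reference_lane = [0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0]
sample_lane = [0.0, 0.5, 2.0, 0.5, 0.0, 1.0, 0.0]

profile_similarity, relative_amount = similarity_score(sample_lane, reference_lane)
```

In this invented example the sample lane has the same shape as the reference lane but half the signal, so the profile similarity is approximately 1 while the relative amount is 0.5. The two values are independent of one another, which is why both are needed to characterise a sample.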

The results of the correlation calculation for each of the ten lanes shown in FIG. 1 are displayed in a two-dimensional plot in FIG. 3. From this plot it becomes apparent that certain lanes correlate well with the reference sample in terms of the amount of sample constituent (the separated proteins in a lane give rise to the opacity bands shown in FIG. 1) and are given a score close to 1 (y axis), and also that the entire lane profile of certain lanes correlates well with the reference lane, so that those lanes are given a score close to 1 (x axis). These data provide two similarity comparisons: the amount of chemical constituent (e.g. protein) similarity (y axis); and the actual chemical constituent profile similarity (x axis). It is also apparent that the samples in this example fall into different groups based on the separation profile similarity score.

As shown in FIG. 4, the data points allow for easy classification by grouping of samples. Such classification allows for fast analysis and is not dependent on the user's estimate of how similar different profiles are. Limits for grouping samples can be set based on:

- A) The amount of sample similarity only;
- B) The chemical separation profile similarity only;
- C) The amount of sample and chemical separation profile similarity.

Thus, depending on which samples have been analysed, different rules for grouping samples may be applied. For example, only group C samples in FIG. 4 are considered to be classified as falling within an acceptable similarity to the reference sample.
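The three bases for setting limits listed above could be expressed as a simple rule, sketched below. The function name `classify`, the `basis` parameter and the default limits (a minimum profile similarity of 0.9 and a relative amount between 0.8 and 1.2) are hypothetical placeholders; the text does not state the limits used for FIG. 4.

```python
def classify(profile_similarity, relative_amount, basis="both",
             min_profile=0.9, amount_range=(0.8, 1.2)):
    """Return "pass" or "fail" for one sample's two-dimensional score.

    basis selects which limits apply: "amount", "profile" or "both".
    The default limits are placeholders, not values taken from the text.
    """
    amount_ok = amount_range[0] <= relative_amount <= amount_range[1]
    profile_ok = profile_similarity >= min_profile
    if basis == "amount":
        accepted = amount_ok
    elif basis == "profile":
        accepted = profile_ok
    elif basis == "both":
        accepted = amount_ok and profile_ok
    else:
        raise ValueError(f"unknown basis: {basis!r}")
    return "pass" if accepted else "fail"


# Hypothetical (profile similarity, relative amount) scores for three samples.
scores = {"s1": (0.98, 1.05), "s2": (0.97, 0.40), "s3": (0.35, 0.95)}
results = {name: classify(*score) for name, score in scores.items()}
```

With these invented scores only `s1` passes when both limits are applied, whereas `s2` would pass on profile similarity alone and `s3` on amount alone, which illustrates why the choice of basis depends on the samples being analysed.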

Example 2

A technique similar to that described above can be applied to chromatographic separations as shown in FIG. 5, where a photo-absorbance meter has been used to measure bands of chemical constituents, such as proteins, emerging from a column containing a chromatographic medium (typically, UV absorbance is used to measure protein concentration). The resultant data (called a chromatogram) is shown in FIG. 5 with two chromatograms superimposed. The data set comprises a photodetector output reading, equivalent to the opacity/constituent amount measurements of FIG. 2, and temporal (time related) data, equivalent to the spatial/distance data also shown in FIG. 2. That data set is processed in the same way as described above, again to provide a comparison score of the constituent amount and chemical similarity with a reference sample or average sample data. That processed data then classifies the sample, for example as a pass or fail result in a biopharmaceutical manufacturing process. In FIG. 5, the two chromatograms of two protein samples, with the same starting concentration, were compared using an Akta pure 25 chromatography system (available from Cytiva Life Sciences). The samples differed in the type of Strep-Tactin tag used for separation. As a result, the chromatogram (i.e. the time separation profile) showed different peak shapes and subtle changes in separation profile. For this part of the chromatogram, the similarity score analysis resulted in a similarity score of (0.90; 0.87) for sample 1, where 0.90 is the separation profile similarity score and 0.87 is the relative amount, which shows that sample 1 differed both in eluted amount, and also to some degree in the chemical binding to the separation matrix, compared to the reference sample.

Those skilled in the art would be aware that various analyses could be performed in accordance with the present invention. For example, when considering chromatography examples, additional dimensions in the chromatography analysis may be, for example, one or more of: pH, conductivity and/or pressure. Hence, chromatography run data such as time-related data corresponding to pressure, pH and/or conductivity data can be used for the classification of samples. Such parameters may be used, instead of or in addition to a relative amount, as y-axis data (ordinate data) for a similarity plot.

FIG. 6 shows a 2D similarity scatter plot of 185 chromatograms from different cycles in a protein purification process produced using an embodiment of the present invention. The plot shows how chromatogram relative peak area correlates to separation profile similarity score. Such plot data demonstrate that, using embodiments of the present invention, many runs can be analyzed at high resolution, and that data points can be identified which cannot readily be determined, for example, by a skilled operator. Various embodiments of the invention may thus be used to alert an operator to outlier values and/or could be used in automated processing systems to optimise control parameters for a chromatography based bio-processing system.

In FIG. 6, data was derived for each of the 185 chromatograms from purification cycles using Fibro HiTrap PrismA™ chromatography units in an Akta Pure 25™ system with pH step elution. The reference sample was chosen as the first chromatogram in the series. All chromatograms were apparently very similar but a Pearson correlation coefficient calculation was used to detect two runs which differed in terms of peak shape. These two values appear as outliers to the left of FIG. 6. After inspection, all cycle chromatograms were judged to be acceptable. (Note: Some values in these chromatograms were saturated and peak area is therefore not proportional to relative amount for this particular data-set.)
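One conceivable way of alerting an operator to outliers of the kind seen to the left of FIG. 6 is sketched below. The rule used here, flagging runs whose profile similarity lies more than `k` median absolute deviations below the median of the series, is an assumption introduced for illustration and is not stated in the text; the function name `flag_outliers` and the similarity values are likewise hypothetical.

```python
from statistics import median


def flag_outliers(similarities, k=5.0):
    """Return indices whose similarity lies more than k MADs below the median.

    This robust rule is an illustrative assumption, not a rule from the text.
    """
    centre = median(similarities)
    mad = median(abs(s - centre) for s in similarities)
    if mad == 0:
        # No spread at all: anything below the common value is flagged.
        return [i for i, s in enumerate(similarities) if s < centre]
    return [i for i, s in enumerate(similarities) if (centre - s) / mad > k]


# Hypothetical profile similarity scores for a series of purification cycles,
# each compared with the first chromatogram in the series.
similarities = [1.0, 0.999, 0.998, 0.999, 0.90, 0.997, 0.998, 0.88, 0.999]
outlier_runs = flag_outliers(similarities)
```

In this invented series the runs at positions 4 and 7 are flagged. As in the example of FIG. 6, a flagged run is only a prompt for inspection and may still be judged acceptable.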

FIG. 7 shows a Graphical User Interface for presenting results produced using an embodiment of the present invention. The GUI includes a window that provides 2D visualization of data. In this example, a similarity score vs. relative volume plot is shown. Such a 2D scatter plot provides for easy user visualization. The GUI may also, or alternatively, enable user selection of reference profiles so as to display profiles, for example electrophoresis lanes or chromatograms, and show the two-dimensional sample data in a scatter plot.

Furthermore, in various embodiments, the 2D scatter plot(s) can enable a user to select a region of interest for analysis, remove all data points from an analysis which have reached the maximum limit of the detector and/or use the scatter plot to set limits for grouping samples into different groups. Moreover, various embodiments may also allow a scatter plot to track a protein purification process, for example, by using trend lines or colour gradients.

FIG. 8 shows a method according to various embodiments of the present invention. The method 100 comprises the steps of:
a. at least partially separating 110 one or more of the chemical constituents of the fluid sample;
b. measuring and recording 120 the amount of separated chemical constituents of the sample during, or after, the chemical separation;
c. measuring and recording 130 the spatial or time separation profile of sample constituents, during or after separation, and providing a data set of the same;
d. comparing 140 the amount of said separated constituents to one or more reference samples;
e. comparing 150 the spatial or time separation profile to the corresponding profile of the or each reference sample;
f. assigning 160 a similarity score to the sample based on the similarity of the amount or the profile comparisons of the separated constituents, as performed under steps d. 140 and e. 150 above, or both, with the equivalent amount and/or profile of the or each reference sample respectively;
g. providing 170 a classification of the sample based on the similarity score.

In various embodiments, relating to electrophoresis, the method 100 may include one or more of the following steps:

1a. Selecting a region of interest in an image. For electrophoresis a user may create a lane box, for example, using a GUI of the type referred to above.

2a. Optionally, saturated regions of the lane profiles may then be excluded from analysis, i.e. regions in which the detector has reached its maximum value. This may be automated or could be user driven.

3a. Correcting for uneven migration. A user may adjust a lane box and lanes to correct for uneven migration across an electrophoresis gel. Optionally, the lane profile scales may be corrected by comparing to marker samples, i.e. samples with known molecular weight, in other lanes, either by the user or automatically.

In various embodiments, relating to chromatography, the method 100 may include one or more of the following steps:

1b. Selecting a region of interest in a chromatogram, either manually or automatically by way of analysis software. This step is optional; alternatively, a full chromatogram can be analyzed.

2b. Optionally, saturated regions of the chromatograms are excluded from an analysis, i.e. regions in which the detector has reached its maximum value.

3b. Aligning of chromatograms for comparison is undertaken. Chromatogram alignment can be performed either automatically by a software algorithm or by the user. Alignment is typically based on the performed chromatography operations, for example the start of a phase, or the time of elution of a known reference sample. This step is optional, and in some cases no alignment is needed.
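The exclusion of saturated regions and the alignment of chromatograms described in the steps above might, for example, be sketched as follows. The function names `drop_saturated` and `align_to_event`, the assumed detector ceiling of 3.0 and the readings are hypothetical. The sketch further assumes that a position is excluded whenever either the sample or the reference is saturated there, so that the two profiles keep the same length for a subsequent correlation.

```python
def drop_saturated(sample, reference, detector_max):
    """Keep only positions where neither profile has reached detector_max."""
    kept = [(s, r) for s, r in zip(sample, reference)
            if s < detector_max and r < detector_max]
    return [s for s, _ in kept], [r for _, r in kept]


def align_to_event(times, event_time):
    """Shift a time axis so that a chosen event (e.g. start of a phase) is at zero."""
    return [t - event_time for t in times]


# Hypothetical absorbance readings; 3.0 is an assumed detector ceiling.
sample = [0.1, 1.2, 3.0, 3.0, 0.9, 0.2]
reference = [0.1, 1.0, 2.8, 3.0, 1.1, 0.2]
clean_sample, clean_reference = drop_saturated(sample, reference, detector_max=3.0)

# Hypothetical run in which the elution phase starts at t = 12.5 minutes.
aligned_times = align_to_event([10.0, 12.5, 15.0], event_time=12.5)
```

Here the third and fourth positions are removed from both profiles because the sample reading has reached the ceiling, and the time axis is re-expressed relative to the start of the elution phase so that runs with different start times can be compared.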

The following steps may then be applied to both electrophoresis analysis and chromatogram analysis:

4. If data-sets have a different number of data points, individual electrophoresis lane profiles or chromatograms may then either be sampled or interpolated to obtain the same number of data points per sample.

5. Analysis of all data points or of a user-defined analysis range may then be performed. For example, in some cases there is only one peak of interest, in which case the analysis range may be adjusted accordingly, either automatically or by the user.

6. The integrated signal of the data points of a lane or chromatogram is calculated. Alternatively, the volumes of all detected bands, or peaks, are summed for each lane.

7. In a preferred embodiment, all possible pair-wise comparisons of the N samples are made. This results in N × N arrays, for example one array for relative amounts and one for profile similarity scores. Such arrays may also be used to compare pressure, pH and/or conductivity data-sets for chromatography.

8. A GUI allows a user to select one reference lane or chromatogram, and a 2D scatter plot may then be generated showing relative amounts on the y-axis and profile similarity scores on the x-axis.
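Steps 4 to 8 above may be sketched as follows, purely as an illustration. The names `resample`, `pairwise_arrays` and `scatter_points` are hypothetical; linear interpolation is only one possible way of obtaining equal numbers of data points; and the integrated signal is approximated here by the plain sum of evenly spaced points, on the assumption that the spacing cancels when a ratio of two profiles is taken.

```python
from math import sqrt


def resample(profile, n_points):
    """Linearly interpolate a profile onto n_points evenly spaced positions."""
    if n_points < 2 or len(profile) < 2:
        raise ValueError("need at least two points")
    last = len(profile) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        out.append(profile[lo] * (1 - frac) + profile[lo + 1] * frac)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


def pairwise_arrays(profiles):
    """Return two N x N nested lists; entry [i][j] compares sample i with reference j."""
    n = len(profiles)
    totals = [sum(p) for p in profiles]  # integrated signal (assumed: plain sum)
    similarity = [[pearson(profiles[i], profiles[j]) for j in range(n)]
                  for i in range(n)]
    relative = [[totals[i] / totals[j] for j in range(n)] for i in range(n)]
    return similarity, relative


def scatter_points(similarity, relative, reference_index):
    """(x, y) = (profile similarity, relative amount) of each sample vs one reference."""
    return [(row_s[reference_index], row_r[reference_index])
            for row_s, row_r in zip(similarity, relative)]


# Hypothetical profiles with different numbers of data points.
raw_profiles = [
    [0, 2, 4, 2, 0],
    [0, 1, 2, 3, 4, 3, 2, 1, 0],
    [4, 2, 0, 2, 4],
]
profiles = [resample(p, 5) for p in raw_profiles]
similarity, relative = pairwise_arrays(profiles)
points = scatter_points(similarity, relative, reference_index=0)
```

Because all pair-wise comparisons are calculated once, selecting a different reference amounts to reading a different column of the two arrays, which is what would allow a GUI to redraw the scatter plot without recalculating the comparisons.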

Various embodiments and features of the present invention have thus been described. This written description further uses examples to disclose the invention, including the preferred mode, and is provided to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims. Any patents or patent applications or commercially available products, such as systems or software, mentioned in the text herein are hereby incorporated by reference in their entireties, as if they were individually incorporated, where such is permitted.

Claims

1. A method for classifying a fluid sample, which method comprises the steps of:
a. at least partially separating one or more of the chemical constituents of the fluid sample;
b. measuring and recording data relating to the separated chemical constituents of the sample during, or after, the chemical separation, such as an amount thereof;
c. measuring and recording the spatial or time separation profile of sample constituents, during or after separation, and providing a data set of the same;
d. comparing the amount of said separated constituents to one or more reference samples;
e. comparing the spatial or time separation profile to the corresponding profile of the or each reference sample;
f. assigning a similarity score to the sample based on the similarity of the amount or the profile comparisons of the separated constituents, as performed under steps d and e above, or both, with the equivalent amount and/or profile of the or each reference sample respectively; and
g. providing a classification of the sample based on the similarity score.
2. A method according to claim 1, wherein plural samples are classified, wherein step b. includes providing a data set of the amount of the or each separated constituent for each of the samples, wherein step c. includes providing a data set of the spatial separation profile or time separation profile for each sample, and wherein those two data sets are processed by an algorithm to provide a two-dimensional sample data set for each sample which is used in steps d. to g.
3. A method according to claim 2, wherein different groups of samples represent pass or fail results in a quality control of a biopharmaceutical manufacturing process.
4. A method according to claim 1, wherein the presence, or absence, of samples in different groups leads to changes in a biopharmaceutical manufacturing process.
5. A method according to claim 1, wherein the chemical separation is electrophoresis.
6. A method according to claim 1, wherein the chemical separation is chromatography.
7. A method according to claim 6, wherein chromatography run data, for example time-series of pressure, pH and/or conductivity data, or a combination of these, is used for the classification of samples.
8. A method according to claim 1, wherein the spatial or time separation profile similarity score is calculated using the Pearson correlation function.
9. A computer program, comprising program code for performing the method of claim 1 when the program is run on a computer.
10. A computer program according to claim 9, further operable to provide a graphical user interface (GUI) for user selection and/or presentation of reference profiles and/or regions of interest for analysis and/or data, and/or electrophoresis lanes and/or chromatograms and/or a two-dimensional scatter plot.
11. A computer program according to claim 10, wherein the GUI is further configured to enable a user to remove data points from an analysis which have reached a maximum limit of a detector.
12. A computer program according to claim 10, wherein the GUI is configured to present a two-dimensional scatter plot which can be used to set limits for grouping samples into different groups and/or track a protein purification process.
13. A computer program according to claim 12, wherein the GUI is configured to present trend lines and/or colour gradients to aid in tracking a protein purification process.
14. A biopharmaceutical manufacturing plant configured to implement the method as defined in claim 1 to check and/or control a biopharmaceutical manufacturing process.

Priority Claims (1)

Number	Date	Country	Kind
1914575.4	Oct 2019	GB	national

PCT Information

Filing Document	Filing Date	Country	Kind
PCT/EP2020/078119	10/7/2020	WO
