The present invention relates to computer-implemented methods of differentiating myeloid and lymphoid blast cells, and related methods for diagnosing acute myeloid leukaemia or acute lymphoid leukaemia based on the output of the differentiation. Computer-implemented methods for training a deep neural network are also provided, as well as a clinical decision support system.
Blast cells are precursors to mature blood cells which are found circulating in a person's bloodstream. Normally, blast cells are confined to a person's bone marrow. However, when a patient suffers from leukaemia, abnormal blast cells proliferate uncontrollably in the bone marrow to such an extent that production of other cells, important for survival, is prevented. Furthermore, the uncontrollable proliferation also causes the abnormal blast cells to leak into a person's bloodstream. Accordingly, leukaemia may be diagnosed by detection of these abnormal blast cells within a patient's bloodstream.
Acute leukaemia presents itself in forms including acute myeloid leukaemia (AML) and acute lymphoid leukaemia (ALL), each of which has several subtypes. In order to determine which type of leukaemia is present, it is necessary to classify the abnormal blast cells as either myeloid blast cells or lymphoid blast cells. Up to this point, this classification has been very challenging, because the cells are immature, and lack cell lineage differentiation.
Current methods for classifying whether the abnormal blast cells are myeloid blast cells or lymphoid blast cells rely on several stages of cluster of differentiation (CD) marker assessments. An example of a workflow which might be used is as follows. In a first step, a complete blood count (CBC) would be obtained, either as part of routine screening, or due to the expression of possible symptoms of leukaemia. If the results of the CBC show abnormal results (i.e. abnormal numbers of blast cells in the blood), then a blood smear may be taken, and examined by a haematologist. If the presence of abnormal blast cells in the blood is confirmed in the analysis of the blood smear, repeat samples may be taken for confirmation. Then, further analysis including CD marker assessment may be performed. The CD marker assessment is used to determine cell lineage (i.e. whether the blast cells are myeloid blast cells or lymphoid blast cells). After that determination has taken place, a CD marker panel for specific myeloid or lymphoid cell lines may be carried out, eventually leading to a diagnosis. It will be appreciated that this is a lengthy process which requires several stages of CD marker assessment in order to reach a diagnosis. In combination with the CD marker assessment, other techniques may be used including analysis of cerebrospinal fluid (CSF) or bone marrow samples using cytogenetics, fluorescence in situ hybridization (FISH) or polymerase chain reaction (PCR) techniques. These processes are equally time consuming, and in many cases, expensive.
The present invention aims to address this by providing a computer-implemented method for differentiating between myeloid blast cells and lymphoid blast cells.
At a high level, the present invention provides a computer-implemented method which uses a parametric model to differentiate between lymphoid blast cells and myeloid blast cells. Alternatively put, the computer-implemented method is a method of determining whether a blast cell in a digital image is from a myeloid lineage or a lymphoid lineage.
More specifically, a first aspect of the present invention provides a computer-implemented method of differentiating between lymphoid blast cells and myeloid blast cells, the computer-implemented method comprising: receiving a digital image containing one or more blast cells; applying a parametric model classifier to one or more portions of the digital image each containing a respective blast cell, the parametric model classifier configured to generate an output indicative of whether each blast cell is a lymphoid blast cell or a myeloid blast cell.
By using a classifier at an initial stage, as required by computer-implemented methods according to the first aspect of the invention, it is possible to reduce or avoid the need for various stages of CD marker analysis (or other lengthy techniques) in order to arrive at a definitive diagnosis of AML or ALL.
Optional features of the first aspect of the invention are now set out. It should be noted that any or all of these features may be combined with each other, except when context dictates otherwise, or when it is explicitly stated that a certain feature is incompatible or otherwise cannot be combined with another feature.
The digital image may be received from an imaging apparatus. And, in addition to the computer-implemented steps, a method according to the first aspect of the invention may comprise capturing the digital image of the one or more blast cells using the imaging apparatus. The imaging apparatus may comprise a camera. The imaging apparatus may further comprise a microscope. Accordingly, the digital image may be a microscopy image such as a bright-field microscopy image. Alternatively, however, the digital image could be based on phase contrast imaging, differential interference contrast microscopy, or dark field microscopy.
The digital image of the one or more blast cells may be a digital image of a sample on a slide.
In addition to the computer-implemented steps, a method according to the first aspect of the invention may further comprise preparing a slide containing one or more blast cells.
Preferably, the slide is prepared using a method in which a drop of blood is allowed to dry in the presence of air, or in a moderate air flow, and using a technique which creates a monolayer of all of the cells in a volume transferred to a slide. Again preferably, the slide is prepared such that every single cell can be counted and differentiated by type. To ensure this, a proper diluent, a dilution factor, and a means of allowing the drop to spread over a relatively large area (by improving the hydrophilic nature of the glass slide and/or by mechanically spreading the liquid drop) must be selected. Appropriate selection of parameters and methods gives rise to a very thin layer of liquid that rapidly dries in such a way that the presentation and preservation of the cells mimics that found in a conventional blood “smear”. Details of methods by which advantageous slides may be prepared may be found in WO 2012/030313 A1, the entirety of which is incorporated by reference. Additional disclosure regarding the preparation of the slides may be found in US 2009/0269799 A1, US 2010/284602 A1, US 2011/014645 A1, and US 2016/209320 A1, the entirety of each of which is incorporated by reference.
As will be noted from the references above, it is preferred that the sample is stained. Accordingly, the digital image may be an image of a stained sample on a slide. Stains which may be used include: Romanowsky staining, Giemsa staining, Jenner staining, Wright staining, Field staining, May-Grünwald staining, and Leishman staining. It will be appreciated that other kinds of stains may also be used.
The computer-implemented method may further comprise, after receiving the digital image containing one or more blast cells: identifying the one or more blast cells in the digital image. In other words, before applying the classifier to each of the blast cells, the computer-implemented method may further comprise locating where the blast cells are in the image. Even in patients suffering from AML or ALL, the proportion of blast cells in the blood is still very low, relative to e.g. red blood cells. It is therefore beneficial to identify the blast cells within the digital image before applying the classifier, to ensure that the classifier acts only on the identified blast cells. Identifying the blast cells may comprise applying an image analysis algorithm to the digital image. The image analysis algorithm is preferably configured to identify bounding boxes around the one or more blast cells in the digital image. Specifically, the image analysis algorithm is preferably configured to identify a respective bounding box around each of the one or more blast cells in the digital image. Herein, identifying a bounding box should be understood to mean identifying an area, preferably a square or rectangular area which contains preferably a single blast cell. In some cases, the image analysis algorithm may be configured to generate a plurality of image files, each image file containing a digital image of a blast cell of the one or more blast cells. The boundary of the image in each image file of the plurality of image files may be the bounding box described above. In some cases, the plurality of files may be generated after the image analysis algorithm has been applied to the digital image. 
In the cases in which a plurality of image files are generated, the computer-implemented method may comprise applying the parametric model classifier to each of the generated image files, and configured to generate a respective output indicative of whether the respective blast cell in each image is a lymphoid blast cell or a myeloid blast cell. The output of the computer-implemented method may therefore comprise a plurality of outputs, each indicative of whether a respective blast cell of the one or more blast cells is a lymphoid blast cell or a myeloid blast cell.
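By way of illustration only, the generation of one image per bounding box may be sketched as follows. The sketch assumes that the bounding boxes have already been identified by an image analysis algorithm; the representation of the image as a list of pixel rows, the (top, left, bottom, right) box format, and the function name are hypothetical and are not prescribed by the present application.

```python
from typing import List, Tuple

# Hypothetical types: an image is a list of pixel rows; a box is
# (top, left, bottom, right), with bottom and right exclusive.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def crop_blast_cells(image: Image, boxes: List[Box]) -> List[Image]:
    """Return one sub-image per bounding box, in the order the boxes are given."""
    crops = []
    for top, left, bottom, right in boxes:
        # Keep rows top..bottom-1 and, within each row, columns left..right-1.
        crops.append([row[left:right] for row in image[top:bottom]])
    return crops
```

Each returned sub-image could then be passed to the parametric model classifier, or written to a separate image file.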
We now consider in more detail the form of the output of the parametric model classifier. For each blast cell in the digital image, or for each image file, the output of the parametric model classifier may comprise a numerical value x indicative of whether the blast cell is a lymphoid blast cell or a myeloid blast cell. In some cases, the output (again, for each blast cell in the digital image, or for each image file) may comprise a first value and second value, the first value indicative of the likelihood that the blast cell is a lymphoid blast cell (or that the image file contains an image of a lymphoid blast cell), the second value indicative of the likelihood that the blast cell is a myeloid blast cell (or that the image file contains an image of a myeloid blast cell). Preferably, the first and the second value are probabilities, and preferably they sum to 1. The more extreme the values, the higher the confidence in the result. Put alternatively, the numerical value x may be in the range [0,1]; if the value x is equal to 1, the blast cell may be identified as a lymphoid blast cell with 100% confidence; and if the value x is equal to 0, the blast cell may be identified as a type of cell other than a lymphoid blast cell with 100% confidence. Or, the numerical value x may be in the range [0,1]; if the value x is equal to 1, the blast cell may be identified as a myeloid blast cell with 100% confidence; and if the value x is equal to 0, the blast cell may be identified as a type of cell other than a myeloid blast cell with 100% confidence. It is known that it can be challenging to differentiate between myeloid blast cells and lymphoid blast cells, and accordingly, it is conceivable that in some cases (either due to the nature of the blast cell itself or, for example, the angle at which it is shown in the digital image), it is not possible accurately to classify the cell. 
Accordingly, in some cases, if the numerical value falls within a predetermined range, the output of the parametric model classifier is configured to be inconclusive or uncertain. Such results may be discarded. The predetermined range may be between 0.1 and 0.9, 0.2 and 0.8, 0.3 and 0.7, 0.4 and 0.6, or 0.45 and 0.55.
Each output may take the form [first value, second value]. Consider the case where the predetermined threshold range is 0.4 to 0.6:
In these cases, it is assumed that the blast cell is either a myeloid blast cell or a lymphoid blast cell.
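As a minimal sketch of how an output of the form [first value, second value] might be interpreted against a predetermined range of 0.4 to 0.6, consider the following. The function name and the returned strings are hypothetical, and treating the end points of the range as inconclusive is an assumption made for the purposes of the example only.

```python
def interpret_output(p_lymphoid: float, p_myeloid: float,
                     low: float = 0.4, high: float = 0.6) -> str:
    """Map [first value, second value] to a label, or to 'inconclusive'.

    Assumes the two values are probabilities which sum to 1, so that testing
    the first value against a range symmetric about 0.5 is sufficient.
    """
    if abs(p_lymphoid + p_myeloid - 1.0) > 1e-6:
        raise ValueError("the two probabilities are expected to sum to 1")
    # Assumption: the end points of the range count as inconclusive.
    if low <= p_lymphoid <= high:
        return "inconclusive"
    return "lymphoid" if p_lymphoid > high else "myeloid"
```

For instance, under these assumptions an output of [0.9, 0.1] would be reported as lymphoid, [0.2, 0.8] as myeloid, and [0.45, 0.55] as inconclusive.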
Throughout the application so far, we refer to a “parametric model classifier”. In the context of the present application, a parametric model is one which assumes a parametric form for a function for generating the output from the input data (i.e. the data representing the digital image), the function comprising a fixed number of parameters. The parametric form relies, as the name suggests, on a plurality of parameters, and the goal of e.g. a training process is to identify those parameters. In contrast, non-parametric models are unbounded, and there is no limit to their complexity. Advantages of parametric model classifiers (relative to non-parametric classifiers) include simplicity, speed, and the fact that they can produce reliable results on lower volumes of data. The parametric model classifier is preferably a machine learning-based classifier. In preferred cases, the classifier is based on a convolutional neural network, such as a deep neural network. One example of a suitable neural network classifier is a residual neural network classifier [1, 2, 3] (herein, “ResNet”). A ResNet is an artificial neural network that builds on constructs known from pyramidal cells in the cerebral cortex. ResNets work on the principle of skip connections or shortcuts to jump over some of the layers in the neural network. Typical ResNet models are implemented with double- or triple-layer skips that contain nonlinearities and batch normalization in between. There are two reasons why skip connections may be added: to avoid the problem of vanishing gradients, or to mitigate the degradation (accuracy saturation) problem, where adding more layers to a suitably deep model leads to higher training error. During training, the weights adapt to mute the upstream layer and amplify the previously-skipped layer. In the simplest case, only the weights for the adjacent layer's connection are adapted, with no explicit weights for the upstream layer.
This works best when a single nonlinear layer is stepped over, or when the intermediate layers are all linear. If not, then an explicit weight matrix may be learned for the skipped connection. ResNets are advantageous because skipping effectively simplifies the network, using fewer layers in the initial training stages, thereby speeding learning by reducing the impact of vanishing gradients, as there are fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space. Towards the end of training, when all layers are expanded, it stays closer to the manifold and thus learns faster. Examples of suitable ResNets include: ResNet34, ResNet50 and ResNet100, in which the number represents the number of layers present in the residual neural network. Other types of convolutional neural networks such as Inception [4], VGG [5], and EfficientNet [6] may also be used in implementations of the invention.
[1] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015 Dec. 10). “Deep Residual Learning for Image Recognition”. arXiv:1512.03385
[2] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). “Deep Residual Learning for Image Recognition”. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE.
[3] Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2012). “Imagenet classification with deep convolutional neural networks”. Advances in Neural Information Processing Systems 25: 1097-1105.
[4] Szegedy et al. (2014). “Going Deeper with Convolutions”. arXiv:1409.4842
[5] Simonyan & Zisserman (2015). “Very Deep Convolutional Networks for Large-Scale Image Recognition”. arXiv:1409.1556
[6] Tan & Le (2019). “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”. arXiv:1905.11946
[7] Ilse, Maximilian; Tomczak, Jakub M.; Welling, Max (2018). “Attention-based Deep Multiple Instance Learning”. arXiv:1802.04712
Alternatively, or additionally, the model may be created using multiple instance learning, using e.g. Deep Attention MIL, as described in Ilse et al. (2018) [7].
Rather than neural network methods involving training a model using per image labels, a Siamese model could be used. A Siamese model takes two images and has to determine whether they are of the same class. The labels, which apply either to a complete slide or to each cell individually, could be used to identify each pair as true if it consists of the same cell type (e.g. two lymphoid blast cells or two myeloid blast cells) or false if the images are from cells of different types.
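The labelling of image pairs for such a Siamese model may, purely as an illustrative sketch, be expressed as follows. The function name and the use of string labels are hypothetical; the sketch enumerates every unordered pair of images, which is one possible choice and not necessarily the one used in practice.

```python
from itertools import combinations
from typing import List, Tuple


def make_pairs(labels: List[str]) -> List[Tuple[int, int, bool]]:
    """Return (index_a, index_b, same_class) for every unordered pair of images.

    labels[i] is the cell type associated with image i, taken either from the
    slide-level label or from a per-cell label.
    """
    return [(i, j, labels[i] == labels[j])
            for i, j in combinations(range(len(labels)), 2)]
```

A pair is marked true when both images carry the same label (e.g. two lymphoid blast cells) and false otherwise.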
An aim of the present invention is to improve the efficiency with which a patient may be diagnosed with either AML or ALL (or neither). We therefore now consider how the output of the parametric model classifier is used. Accordingly, the computer-implemented method may further comprise calculating a patient level score based on the respective output value x for all of the blast cells in the digital image. In this way, a diagnosis, or clinical decision may be made based on a plurality of blast cells, rather than a single one. The patient level score may comprise one or more of: a mean value, a median value, a maximum value, or a minimum value. Specifically, the patient level score may be calculated on all of the x values representing the likelihood that a blast cell is a lymphoid blast cell, and/or the likelihood that the blast cell is a myeloid blast cell. If the numerical value of the patient level score falls within a predetermined range, the output of the parametric model classifier is configured to be inconclusive or uncertain. Such results may be discarded.
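A minimal sketch of the calculation of a patient level score from the per-cell output values x is given below. The function name and the dictionary keys are hypothetical, and the sketch simply computes all four of the statistics mentioned above using the Python standard library.

```python
from statistics import mean, median
from typing import Dict, List


def patient_level_score(x_values: List[float]) -> Dict[str, float]:
    """Aggregate per-cell outputs x (each in [0, 1]) into patient level scores."""
    if not x_values:
        raise ValueError("at least one blast cell output is required")
    return {
        "mean": mean(x_values),
        "median": median(x_values),
        "max": max(x_values),
        "min": min(x_values),
    }
```

Any one of the returned values could then be compared with a predetermined range in the same way as a per-cell output.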
In some cases, the computer-implemented method of the first aspect of the invention may be used as part of a decision support system to enable clinicians to select an appropriate course of action. In such cases, it may be desirable for the clinician to review the results generated by the parametric model classifier. Specifically, the computer-implemented method may further comprise: generating, based on the output of the parametric model classifier, instructions configured to cause a display device of a computing system to display a gallery comprising: a first plurality of images showing the blast cells identified as lymphoid blast cells with the highest confidence; and a second plurality of images showing the blast cells identified as myeloid blast cells with the highest confidence. In this context “with the highest confidence” may be understood to mean the blast cells for which the probability of being a particular type of blast cell (i.e. a myeloid blast cell or a lymphoid blast cell) is the highest. This will enable a clinician to review the images, in order to determine a patient's prognosis, and to decide on an appropriate course of action. In some cases the first plurality of images includes the same number of images as the second plurality of images. The number may be between 1 and 100, more preferably between 5 and 50, more preferably between 10 and 25, and most preferably about 20.
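The selection of images for such a gallery may be sketched as follows. The sketch assumes that each blast cell has a single output representing the probability that it is a lymphoid blast cell, so that the probability of it being a myeloid blast cell is one minus that value; the function name, the (image identifier, probability) tuple format and the 0.5 cut-off are hypothetical.

```python
from typing import List, Tuple


def select_gallery(outputs: List[Tuple[str, float]],
                   n: int = 20) -> Tuple[List[str], List[str]]:
    """Pick up to n image ids per class, most confident first.

    outputs holds (image_id, p_lymphoid) for each blast cell.
    """
    # Cells above 0.5 are treated as lymphoid, cells below 0.5 as myeloid.
    lymphoid = sorted((o for o in outputs if o[1] > 0.5),
                      key=lambda o: o[1], reverse=True)
    myeloid = sorted((o for o in outputs if o[1] < 0.5),
                     key=lambda o: o[1])
    return ([image_id for image_id, _ in lymphoid[:n]],
            [image_id for image_id, _ in myeloid[:n]])
```

The two returned lists correspond to the first and second pluralities of images; where fewer than n cells of a given type are available, the corresponding list is simply shorter.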
As discussed earlier in this application, in order to reach a more definitive diagnosis, it is often necessary to perform a CD marker screening. Various types of CD marker screenings are available, each targeting a particular biomarker. In some cases, a clinician may identify a CD marker to use for screening based on the output of the parametric model classifier. However, in other cases, the computer-implemented method may further comprise selecting, from a plurality of available CD markers, one or more CD markers, based on the output of the parametric model classifier. In this way, a CD marker to be used for subsequent testing can be identified automatically based on the output of the parametric model.
In order for a parametric model to provide accurate results, it is often necessary to train it using known data. A second aspect of the present invention therefore provides a computer-implemented method of generating a parametric model classifier configured to differentiate between lymphoid blast cells and myeloid blast cells in a digital image, the computer-implemented method comprising: receiving a plurality of pairs of labelled training data, each pair of labelled training data including: input data comprising a digital image of a blast cell from a patient who has been diagnosed with either acute myeloid leukaemia or acute lymphoid leukaemia; and output data comprising an indication of whether the patient has acute myeloid leukaemia or acute lymphoid leukaemia; and training a parametric model classifier using the training data. The optional features set out above in respect of the first aspect of the invention apply equally well to the second aspect of the invention where compatible. For the avoidance of doubt, we stress that, in the second aspect of the invention, the parametric model classifier may be a machine learning-based classifier. In preferred cases, the classifier is based on a convolutional neural network, such as a deep neural network. One example of a suitable neural network classifier is a residual neural network classifier (herein, “ResNet”). Examples of suitable ResNets include: ResNet34, ResNet50 and ResNet100, in which the number represents the number of layers present in the residual neural network. The convolutional neural network may be trained using an Adam optimizer [8]. The convolutional neural network may be trained using the one-cycle policy for learning rate scheduling, as described in Smith [9]. A framework which may be used for training of the convolutional neural network is fastai [10], which may use the library of Paszke et al [11]. A different learning strategy relying instead on flat learning rates and cosine annealing could also be used.
Rather than convolutional neural networks, vision transformers or experimental models such as capsules could also be used.
[8] “Adam: A Method for Stochastic Optimization” by Kingma et al, https://arxiv.org/abs/1412.6980
[9] “A disciplined approach to neural network hyper-parameters: Part I -- learning rate, batch size, momentum, and weight decay” by Smith, https://arxiv.org/abs/1803.09820
[10] “fastai: A Layered API for Deep Learning” by Jeremy Howard, Sylvain Gugger, https://arxiv.org/abs/2002.04688
[11] “PyTorch: An Imperative Style, High-Performance Deep Learning Library” by Paszke et al, https://arxiv.org/abs/1912.01703
The indication of whether the patient has acute myeloid leukaemia or acute lymphoid leukaemia may comprise a numerical value. Specifically, the numerical value may be 1 if the patient has acute myeloid leukaemia, and 0 otherwise. Conversely, the numerical value may be 1 if the patient has acute lymphoid leukaemia, and 0 otherwise. More detailed information about the training process is set out later in this application, in the “Experimental Results” section.
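The construction of the pairs of labelled training data may be illustrated by the following sketch, in which every blast cell image inherits a numerical label from the diagnosis of the patient from whom it was obtained. The function name, the dictionary layout and the strings "AML" and "ALL" are hypothetical, and the convention of 1 for acute myeloid leukaemia is only one of the two conventions described above.

```python
from typing import Dict, List, Tuple


def build_training_pairs(cell_images_by_patient: Dict[str, List[str]],
                         diagnosis_by_patient: Dict[str, str]
                         ) -> List[Tuple[str, int]]:
    """Return (image, label) pairs; label is 1 for AML and 0 for ALL."""
    pairs = []
    for patient_id, images in cell_images_by_patient.items():
        diagnosis = diagnosis_by_patient[patient_id]
        if diagnosis not in ("AML", "ALL"):
            raise ValueError(f"unexpected diagnosis for {patient_id}: {diagnosis}")
        label = 1 if diagnosis == "AML" else 0
        # Every cell image from this patient receives the patient's label.
        pairs.extend((image, label) for image in images)
    return pairs
```

The resulting list could then be supplied to whichever training framework is used.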
In the computer-implemented method of the first aspect of the invention, the parametric model classifier may be trained using the computer-implemented method of the second aspect of the invention.
A third aspect of the invention provides a computer-implemented method of generating a provisional diagnosis of acute myeloid leukaemia or acute lymphoid leukaemia, the computer-implemented method comprising: performing the computer-implemented method of the first aspect of the invention; and based on the output of the parametric model classifier or a patient level score calculated based on the output of the parametric model classifier, determining whether a patient whose blast cells are shown in the digital image is suffering from acute myeloid leukaemia, acute lymphoid leukaemia, or neither. As before, all optional features set out in respect of the first aspect of the invention or the second aspect of the invention apply equally well to computer-implemented methods of the third aspect of the invention. Furthermore, based on the result of the determination, computer-implemented methods of the third aspect of the invention may further comprise generating instructions configured to cause a display device of a computing system to display the result of the determination.
The first to third aspects of the invention relate to computer-implemented methods. Corresponding fourth to sixth aspects of the invention provide computer program products which, when the program is executed by a computer or other computing device, cause the computer to carry out the computer-implemented methods, respectively, of the first to third aspects of the invention. Seventh to ninth aspects of the invention provide, respectively, a computer-readable data carrier having stored thereon the computer program product of the fourth to sixth aspects of the invention.
As explained above, a purpose of the invention is to provide a clinical decision support system which assists clinicians in their diagnoses of acute myeloid leukaemia and/or acute lymphoid leukaemia. A tenth aspect of the invention, accordingly, provides a clinical decision support system comprising a computing device having a processor, the processor configured to perform the computer-implemented method of any one of the first to third aspects of the invention.
A particularly preferred eleventh aspect of the invention provides a computer-implemented method of differentiating between lymphoid blast cells and myeloid blast cells, the computer-implemented method comprising: receiving a digital image containing one or more blast cells; applying an image analysis algorithm to the digital image, the image analysis algorithm configured to: detect one or more blast cells in the digital image; generate a bounding box around each of the one or more blast cells; and generate a plurality of image files, each image file containing a digital image of a blast cell of the one or more blast cells, the boundary of the image in each file corresponding to a respective bounding box; applying a deep neural network classifier to each of the generated image files, the deep neural network classifier configured to generate an output indicative of whether each blast cell is a lymphoid blast cell or a myeloid blast cell, the output comprising a numerical value x in the range [0,1], wherein: either if the value x is equal to 1, the blast cell is identified as a lymphoid blast cell with 100% confidence, and if the value x is equal to 0, the blast cell is identified as a myeloid blast cell with 100% confidence; or if the value x is equal to 1, the blast cell is identified as a myeloid blast cell with 100% confidence, and if the value x is equal to 0, the blast cell is identified as a lymphoid blast cell with 100% confidence; and calculating a patient level score based on the respective output value x for all of the blast cells in the digital image, wherein the patient level score comprises a mean value; a median value; a maximum value of x; and/or a minimum value of x. Optional features set out above in respect of the first to tenth aspects of the invention apply equally well to the eleventh aspect of the invention, except where clearly technically incompatible or where context clearly dictates otherwise.
In some cases, the deep neural network classifier of the eleventh aspect of the invention is generated using a computer-implemented method comprising: receiving a plurality of pairs of labelled training data, each pair of labelled training data including: input data comprising a digital image of a blast cell from a patient who has been diagnosed with either acute myeloid leukaemia or acute lymphoid leukaemia; output data comprising an indication of whether the patient has acute myeloid leukaemia or acute lymphoid leukaemia, the output data comprising a numerical value x taking the value 0 or 1, wherein: either if the value x is equal to 1, the blast cell is a lymphoid blast cell, and if the value x is equal to 0, the blast cell is a myeloid blast cell; or if the value x is equal to 0, the blast cell is a lymphoid blast cell, and if the value x is equal to 1, the blast cell is a myeloid blast cell; and training the deep neural network classifier using the training data.
A twelfth aspect of the invention provides a clinical decision support system, comprising: a computing device having a processor, the processor configured to generate a provisional diagnosis of acute myeloid leukaemia or acute lymphoid leukaemia by performing the computer-implemented method of the eleventh aspect of the invention and wherein the processor is further configured to: based on the patient level score calculated based on the output of the deep neural network classifier, determine whether a patient whose blast cells are shown in the digital image is suffering from acute myeloid leukaemia, acute lymphoid leukaemia, or neither; and generate, based on a result of the determination, instructions configured to cause a display device of a computing system to display the result of the determination.
Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:
Aspects and embodiments of the present invention will now be discussed with reference to the accompanying figures. Further aspects and embodiments will be apparent to those skilled in the art. All documents mentioned in this text are incorporated herein by reference.
We now explain the operation of the clinical decision support apparatus 400 to identify and classify blast cells, and to generate instructions to cause various images to be displayed on the display 408.
In a first step S30, the image containing the blast cells is received from the imaging apparatus 300 at the clinical decision support system 400 via the imaging apparatus interface module 402 thereof. Then, in step S32, the blast cells are identified within the image by the blast cell identification module 410. The output of this process may be, for example, several image files (which may be stored temporarily in the buffer 418 or more permanently in the memory 406), each containing an image of a single blast cell from the image. Alternatively, rather than generating a plurality of image files, the blast cell identification module 410 may identify blast cells within the image, and define boundaries of regions, each containing a single blast cell. The output of the blast cell identification module 410 may be in the form of a list of pixel arrays, each pixel array corresponding to a region of the image containing a single blast cell. This list may also be stored in the buffer 418 or more permanently in the memory 406. In step S34, the parametric model 416 is applied to each image file (or region of image containing a blast cell) by the blast cell classification module 412 in order to determine whether the blast cell is a lymphoid blast cell or a myeloid blast cell. As discussed earlier, the output of the blast cell classification module 412, for each blast cell, is preferably a number from 0 to 1, the number representing a probability or a confidence level that the blast cell in question is either a myeloid blast cell or a lymphoid blast cell. Alternatively, the model may return two probabilities, one that the blast cell is a lymphoid blast cell, and one that the blast cell is a myeloid blast cell. These probabilities should add to 1 (or 100%, or equivalent). Then, once a plurality of probabilities have been calculated using the parametric model 416, by the blast cell classification module 412, various different steps may be taken. In
In some examples, not shown, the method may further include a step of determining an appropriate CD marker for a subsequent screening step based on e.g. the patient level score.
Having explained the computer-implemented method of the invention we now present some experimental data which demonstrates the efficacy of the invention. Two experiments were performed on different sets of data, referred to herein as the Toulouse Dataset and the Boston Dataset.
It will be appreciated from the below that computer-implemented methods according to various aspects of the invention are very reliable at differentiating between myeloid blast cells and lymphoid blast cells.
In this experiment, images from blood smears of 119 patients were used, of whom 52 had acute myeloid leukaemia, and 67 had acute lymphoid leukaemia (51 ALL-B, and 16 ALL-T). The images of the blood smears were acquired on a CellaVision instrument with high resolution.
16 slides were sampled randomly as a validation set, and the rest were used to train the neural network, in this case a ResNet50 network.
On the validation set, an AUC (area under the receiver operating characteristic (ROC) curve) of 83.91% was achieved per cell.
After aggregating the results, i.e. by generating a patient level score, the AUC increased to 95.31%.
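By way of illustration only, the difference between a per-cell AUC and a patient-level AUC may be sketched as follows. The sketch assumes that the patient level score is the mean of the per-cell probabilities, which is one possible aggregation and is not necessarily the one used in the experiment. The data are invented toy values, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def patient_level_score(cell_probabilities: Sequence[float]) -> float:
    """ASSUMPTION: aggregate by taking the mean of the per-cell probabilities."""
    return sum(cell_probabilities) / len(cell_probabilities)


# Toy data: patient id -> (label, per-cell probabilities); label 1 = AML, 0 = ALL.
cells_by_patient: Dict[str, Tuple[int, List[float]]] = {
    "p1": (1, [0.9, 0.4, 0.7]),
    "p2": (1, [0.6, 0.8, 0.3]),
    "p3": (0, [0.2, 0.5, 0.1]),
    "p4": (0, [0.45, 0.3, 0.35]),
}

# Per-cell evaluation: every cell inherits the label of its patient.
cell_labels = [label for label, probs in cells_by_patient.values() for _ in probs]
cell_scores = [p for _, probs in cells_by_patient.values() for p in probs]
per_cell_auc = roc_auc(cell_labels, cell_scores)

# Patient-level evaluation: one aggregated score per patient.
patient_labels = [label for label, _ in cells_by_patient.values()]
patient_scores = [patient_level_score(probs) for _, probs in cells_by_patient.values()]
per_patient_auc = roc_auc(patient_labels, patient_scores)
```

In this toy example individual cells are occasionally misranked, whereas the averaged patient scores separate the two groups, which mirrors the direction of the improvement reported above.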
The same model was tested on an additional dataset from the same hospital, which contained 125 slides from AML/ALL cases with lower blast counts. On this dataset the same model achieved an AUC of 0.89963 (i.e. 89.96%).
In this experiment, 39 slides were obtained from AML or ALL patients. The slides were printed and imaged using the methods explained earlier in this application. The images were obtained at 20× magnification using a high-resolution camera. Of the 39 slides, 21 were from AML patients, and 18 were from ALL patients. The neural network (a ResNet50 network) was trained on 29 slides, and 10 slides (5 AML and 5 ALL) were kept as a validation set. Depending on which slides were chosen for the validation set, an AUC of 79.51% to 91.55% was achieved.
The dataset is split into a training and validation set by excluding complete slides, rather than randomly choosing cells from the complete dataset.
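By way of illustration only, a split which excludes complete slides may be sketched as follows. Cells are represented here as hypothetical `(slide_id, cell_index)` pairs, and the function names and the fixed random seed are assumptions made for the example.

```python
import random
from typing import List, Sequence, Set, Tuple

Cell = Tuple[str, int]  # (slide_id, cell_index); a stand-in for a cell image


def choose_validation_slides(slide_ids: Sequence[str],
                             n_validation: int,
                             seed: int = 0) -> Set[str]:
    """Randomly pick whole slides (not individual cells) for validation."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(set(slide_ids)), n_validation))


def split_by_slide(cells: Sequence[Cell],
                   validation_slides: Set[str]) -> Tuple[List[Cell], List[Cell]]:
    """Send every cell of a validation slide to the validation set."""
    training = [cell for cell in cells if cell[0] not in validation_slides]
    validation = [cell for cell in cells if cell[0] in validation_slides]
    return training, validation


# Toy dataset: 5 slides with 3 cells each.
cells = [(f"slide{s}", c) for s in range(5) for c in range(3)]
validation_slides = choose_validation_slides([cell[0] for cell in cells], 2)
training_cells, validation_cells = split_by_slide(cells, validation_slides)
```

Splitting in this way ensures that no cell from a validation slide is seen during training, so the validation result is not inflated by slide-specific characteristics such as staining.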
To thoroughly test the per-slide performance for the whole dataset, a per-slide cross-validation was performed in which one slide was excluded, and the model was trained on all the others. The performance was then evaluated on the excluded slide. This was repeated for all the slides.
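By way of illustration only, the per-slide cross-validation described above may be sketched as follows. The callables `train_fn` and `eval_fn` are hypothetical placeholders for the actual training and evaluation routines, which are not reproduced here.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def leave_one_slide_out(slide_ids: Sequence[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (held_out_slide, training_slides) once for every slide."""
    unique = sorted(set(slide_ids))
    for held_out in unique:
        yield held_out, [s for s in unique if s != held_out]


def cross_validate(slide_ids: Sequence[str],
                   train_fn: Callable[[List[str]], object],
                   eval_fn: Callable[[object, str], float]) -> Dict[str, float]:
    """Train on all slides but one, evaluate on the excluded slide, and repeat."""
    results: Dict[str, float] = {}
    for held_out, training_slides in leave_one_slide_out(slide_ids):
        model = train_fn(training_slides)            # fresh model for each fold
        results[held_out] = eval_fn(model, held_out)
    return results


# Placeholder routines: the "model" simply remembers which slides it was trained on.
def demo_train(training_slides: List[str]) -> object:
    return frozenset(training_slides)


def demo_eval(model: object, held_out: str) -> float:
    # Returns 1.0 when the held-out slide was correctly excluded from training.
    return 1.0 if held_out not in model else 0.0


fold_results = cross_validate(["a", "b", "c", "d"], demo_train, demo_eval)
```

One model is trained per slide, so the cost grows linearly with the number of slides.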
An ensemble of 5 models (again using ResNet34) was used. Ensembles tend to deliver better performance because errors made at random by individual models tend to be outweighed by the correct outputs of the other models. A total AUC of 86.09% was achieved for the whole dataset.
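By way of illustration only, one way of combining the outputs of an ensemble may be sketched as follows. The sketch assumes that the ensemble output is the mean of the probabilities returned by the individual models; the manner of combination actually used in the experiment is not stated above, and the numerical values are invented.

```python
from typing import Sequence


def ensemble_probability(member_probabilities: Sequence[float]) -> float:
    """ASSUMPTION: combine the ensemble members by averaging their probabilities."""
    if not member_probabilities:
        raise ValueError("at least one model output is required")
    return sum(member_probabilities) / len(member_probabilities)


# Toy example: five models score the same myeloid blast cell.
# The third model makes an isolated mistake (probability below 0.5).
myeloid_probabilities = [0.8, 0.7, 0.3, 0.9, 0.75]
combined = ensemble_probability(myeloid_probabilities)
predicted_myeloid = combined > 0.5
```

In this toy example the isolated error of one model is outweighed by the other four, so the combined output still indicates a myeloid blast cell.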
General Statements about the Application
The features disclosed in the foregoing description, or in the following claims, or in the accompanying drawings, expressed in their specific forms or in terms of a means for performing the disclosed function, or a method or process for obtaining the disclosed results, as appropriate, may, separately, or in any combination of such features, be utilised for realising the invention in diverse forms thereof.
While the invention has been described in conjunction with the exemplary embodiments described above, many equivalent modifications and variations will be apparent to those skilled in the art when given this disclosure. Accordingly, the exemplary embodiments of the invention set forth above are considered to be illustrative and not limiting. Various changes to the described embodiments may be made without departing from the spirit and scope of the invention.
For the avoidance of any doubt, any theoretical explanations provided herein are provided for the purposes of improving the understanding of a reader. The inventors do not wish to be bound by any of these theoretical explanations.
Any section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
Throughout this specification, including the claims which follow, unless the context requires otherwise, the words “comprise” and “include”, and variations such as “comprises”, “comprising”, and “including” will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by the use of the antecedent “about,” it will be understood that the particular value forms another embodiment. The term “about” in relation to a numerical value is optional and means for example +/−10%.
| Number | Date | Country | Kind |
|---|---|---|---|
| 21217712.5 | Dec 2021 | EP | regional |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2022/087779 | 12/23/2022 | WO | |