The disclosure relates to computer-aided diagnosis (CAD). The disclosure also relates to a method and a platform or system for using machine learning algorithms for processing medical data. In particular, the disclosure relates to a method and apparatus for encoding medical image data into a latent space representation and analysing said representation.
Advances in computed tomography (CT) allow early detection of cancer, in particular lung cancer, which is one of the most common cancers. As a result, there is increased focus on using regular low-dose CT screenings to ensure early detection of the disease, with improved chances of success of the subsequent treatment. This increased focus leads to an increased workload for professionals such as radiologists, who have to analyze the CT screenings.
To cope with the increased workload, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems are being developed. Hereafter both types of systems will be referred to as CAD systems. CAD systems can detect lesions (e.g. nodules) and subsequently classify them as malignant or benign. A classification need not be binary; it can also include a stage of the cancer. Usually, a classification is accompanied by a confidence value as calculated by the CAD system.
CAD systems typically follow a number of general steps. In an optional first step, the input imaging data is segmented, for example to distinguish lung tissue from the background signal. Then, regions of interest are identified, for example all lung tissue with nodule-like forms in it. It is also possible to simply examine every data point, without a pre-selection of regions of interest. For a selected data point, a number of input values is calculated: the so-called feature vector. This feature vector is used as input in a decision function, which projects the feature vector onto a classification.
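The general steps above can be sketched in simplified form. The following Python fragment is illustrative only: the threshold, feature choices, and function names (segment, regions_of_interest, feature_vector, decision_function) are hypothetical placeholders, not part of the disclosed system.

```python
# Illustrative CAD pipeline sketch: segmentation, region-of-interest
# selection, feature extraction, and a decision function. All functions
# and thresholds are hypothetical stand-ins.

def segment(image):
    # Hypothetical segmentation: keep data points above a fixed threshold.
    return [v for v in image if v > 0.5]

def regions_of_interest(segmented):
    # Hypothetical ROI selection: here, simply every segmented data point.
    return segmented

def feature_vector(point):
    # Hypothetical features computed for a single data point.
    return [point, point ** 2]

def decision_function(features):
    # Hypothetical decision function projecting features onto a label.
    return "suspicious" if sum(features) > 1.0 else "benign"

def cad_pipeline(image):
    labels = []
    for point in regions_of_interest(segment(image)):
        labels.append(decision_function(feature_vector(point)))
    return labels

print(cad_pipeline([0.2, 0.6, 0.9]))
```

In a real system each of these steps would typically be a trained model rather than a fixed rule.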
Hereafter the term “model” will be used to indicate a computational framework for performing one or more of a segmentation and a classification of imaging data. The segmentation, identification of regions of interest, and/or the classification may involve the use of a machine learning (ML) algorithm. The model comprises at least one decision function, which may be based on a machine learning algorithm, which projects the input to an output. Where the term machine learning is used, this also includes further developments such as deep (machine) learning and hierarchical learning.
Whichever type of model is used, suitable training data needs to be available to train the model. In addition, there is a need to obtain a confidence value to be able to tell how reliable a model outcome is. Most models will always give a classification, but depending on the quality of the model and the training set, the confidence of the classification may vary. It is of importance to be able to tell whether or not a classification is reliable.
While CT was used as an example in this introduction, the disclosure can also be applied to other modalities, such as ultrasound, Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT), X-Ray, and the like.
It is an object of this disclosure to provide a method and apparatus for classifying imaging data which addresses at least one of the above drawbacks.
Accordingly, the disclosed subject matter provides a computer-implemented method for processing medical image data, the method comprising:
inputting, with one or more processors, medical image data into an encoder stage of an encoder-decoder pair (EDP) as a first input among one or more inputs;
calculating, with the one or more processors, a latent space representation of the one or more inputs using the encoder stage of the EDP;
providing, from a latent space database stored within one or more storage devices, latent space representations of other inputs; and
determining, with the one or more processors, a classification based on the latent space representation of the one or more inputs and at least one latent space representation of the other inputs.
In an embodiment, the method comprises inputting, with the one or more processors, patient metadata into the encoder stage of the EDP as second input.
In an embodiment, the patient metadata and the calculated latent space representation are added to the latent space database stored within the one or more storage devices.
In an embodiment, the method further comprises projecting, with the one or more processors, the latent space representation using a projection function to obtain a classification value.
In an embodiment, the projection function uses a convolutional neural network (CNN).
In an embodiment, the method further comprises determining, with the one or more processors, if the latent space representation is part of a cluster of other latent space representations using a cluster detection algorithm.
In an embodiment, the EDP is a variational autoencoder. The EDP may be trained using a loss function based on a Generative Adversarial Network (GAN).
In an embodiment, a plurality of medical image data over time are used as input, and the corresponding latent space representations are tracked over time.
The disclosure further provides a computer-implemented method of training a model using medical image data, the method comprising:
In an embodiment, patient metadata is input into the EDP as second input. The patient metadata may be included as embedding layer.
In an embodiment, the projection function uses a convolutional neural network (CNN).
The disclosure further provides a computing system for processing medical image data, comprising:
one or more computation devices in a computing environment and one or more storage devices accessible by the one or more computation devices, wherein the one or more computation devices comprise one or more processors, and wherein the one or more processors are programmed to:
input medical image data into an encoder stage of an encoder-decoder pair (EDP) as a first input among one or more inputs;
calculate a latent space representation based on the one or more inputs using the encoder stage of the EDP;
provide, from a latent space database stored within the one or more storage devices, latent space representations of other inputs; and
determine a classification based on the latent space representation of the one or more inputs and at least one latent space representation of the other inputs.
In an embodiment, the computation devices are cloud computing devices.
In an embodiment, the system comprises a second input module for inputting patient metadata into the encoder stage of the EDP as second input among the one or more inputs.
In an embodiment, the system is configured to use a cluster detection algorithm to determine if the latent space representation is part of a cluster of other latent space representations.
In an embodiment, the EDP is a variational autoencoder. The EDP may be trained using a loss function based on a Generative Adversarial Network (GAN). The EDP may comprise a probabilistic U-Net.
In an embodiment, the system comprises a temporal analyzer for tracking a latent space representation of the one or more inputs over time.
The invention provides a computer program product comprising instructions which, when executed on a processor, cause said processor to implement one of the methods or systems as described above.
The invention further provides a non-transitory computer-readable medium with instructions stored thereon, that when executed by one or more processors, perform the steps comprising:
inputting medical image data into an encoder stage of an encoder-decoder pair (EDP) as a first input among one or more inputs;
calculating a latent space representation of the one or more inputs using the encoder stage of the EDP;
providing from a latent space database latent space representations of other inputs; and
determining a classification based on the latent space representation of the one or more inputs and at least one latent space representation of the other inputs.
Embodiments of the present disclosure will be described hereinafter, by way of example only, with reference to the accompanying drawings which are schematic in nature and therefore not necessarily drawn to scale. Furthermore, like reference signs in the drawings relate to like elements.
In the following, the example of a CT device, in particular a CT device for low-dose screenings, will be used. However, this is only exemplary. Aspects of the disclosure can be applied to any imaging modality, provided that it is capable of providing imaging data. A distinct type of scan (X-Ray CT, low-dose X-Ray CT, CT with contrast agent X) can be defined as a modality.
The images generated by the CT device 10 (hereafter: imaging data) are sent to a storage 11 (step S1). The storage 11 can be a local storage, for example close to or part of the CT device 10. It can also be part of the IT infrastructure of the institute that hosts the CT device 10. The storage 11 is convenient but not essential. The data could also be sent directly from the CT device 10 to computation platform 12.
All or part of the imaging data is then sent to the computation platform 12 in step S2. In general it is most useful to send all acquired data, so that the computer models of platform 12 can use all available information. However, partial data may be sent to save bandwidth, to remove redundant data, or because of limitations on what is allowed to be sent (e.g. because of patient privacy considerations). The data sent to the computation platform 12 may be provided with metadata from scanner 10, storage 11, or further database 11a. Metadata can include additional data related to the imaging data, for example statistical data of the patient (gender, age, medical history) or data concerning the equipment used (type and brand of equipment, scanning settings, etc.).
Computation platform 12 comprises one or more storage devices 13 and one or more computation devices 14, along with the necessary network infrastructure to interconnect the devices 13, 14 and to connect them with the outside world, preferably via the Internet. It should be noted that the term “computation platform” is used to indicate a convenient implementation means (e.g. via available cloud computing resources). However, embodiments of the disclosure may use a “private platform”, i.e. storage and computing devices on a restricted network, for example the local network of an institution or hospital. The term “computation platform” as used in this application does not preclude embodiments of such private implementations, nor does it exclude embodiments of centralized or distributed (cloud) computing platforms.
The imaging data is stored in the storage 13. The computation devices 14 can process the imaging data to generate feature data as input for the models. The computation devices 14 can segment imaging data. The computation devices 14 can also use the models to classify the (segmented) imaging data. More functionality of the computation devices 14 will be described in reference to the other figures.
A work station 15 for use by a professional, for example a radiologist, is connected to the computation platform 12. Hereafter, the terms “professional” and “user” will be used interchangeably. The work station 15 is configured to receive data and model calculations from the computation platform, and to send instructions and feedback to the computation platform 12. The work station 15 can visualize received raw data and model results.
In step S3, the professional selects the model (or in general: specifies model parameters) for use in a calculation. Based on the entered model parameters, in step S4 the platform 12 generates the model (if needed—the model may be already cached), performs the needed calculations for training the model (if needed—training data for the model may already be available in the computation platform 12), and applies the model to the imaging data that was received in step S2. In general, the computation platform will use stored results for calculations that have been performed earlier (i.e. calculated image features, model training data) and only perform the calculations it has not done before. This way, the professional accessing the computation platform 12 using the work station 15 can have a fast response to his or her instructions.
The result of the model calculations, for example classification of the most recent imaging data and associated patient metadata, is sent to the professional in step S5. The received data is visualized on the work station 15. The professional will examine the results and may prepare feedback in step S6. Feedback may for example be that, in the professional's opinion, the presented classification is correct or incorrect. In this manner, the feedback information can be used to enrich the model so that at a later stage more sophisticated models can be trained.
Along with the feedback, the source of the feedback may also be stored. That makes it possible to train future models using only feedback from selected sources. For example, the professional can request models that are only trained using his own data or data from close colleagues (e.g. “trusted data”). Instead of, or in addition to, this, the feedback can be used to incrementally adjust the decision functions of the model. The feedback can be used in only one or more selected decision functions, again to ensure that models are trained using data from known and trusted sources.
The model will now be further discussed in reference to
The model can make use of an encoder-decoder pair (EDP). The encoder 22 is a neural network which takes data input x (e.g. training data 21 or patient data 31) and outputs a latent space or representation space value z (latent space representation 23, 33). The decoder 24 is also a neural network. It takes as input the latent space value z, and calculates an approximation of the input data x′ (generated data 25). The loss function 26 is designed to make the encoder and decoder work to minimize the difference between the actual and approximated inputs x and x′. A key aspect of the EDP is that the latent space z has a lower dimensionality than the input data. The latent space z is thus a bottleneck in the conversion of data x into x′, making it generally impossible to reproduce every detail of x exactly in x′. This bottleneck effectively forces the encoder/decoder pair to learn an ad-hoc compression algorithm that is suitable for the type of data x in the training set. Another way of looking at it is that the encoder learns a mapping from the full space of x to a lower-dimensional manifold z that excludes the regions of the full space of x that contain (virtually) no data points.
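The bottleneck property of the EDP can be illustrated with a toy numerical sketch. The dimensions and the fixed random weights below are assumptions for illustration; in practice the encoder and decoder are neural networks whose weights are learned by minimizing the loss function.

```python
import numpy as np

# Toy illustration of the encoder-decoder bottleneck: a linear encoder maps
# 4-dimensional inputs x to a 2-dimensional latent z, and a linear decoder
# maps z back to an approximation x'. Weights are fixed here; in the
# disclosure they would be learned.

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((2, 4))   # encoder: R^4 -> R^2 (the bottleneck)
W_dec = rng.standard_normal((4, 2))   # decoder: R^2 -> R^4

def encode(x):
    return W_enc @ x                  # latent space representation z

def decode(z):
    return W_dec @ z                  # approximation x' of the input

x = np.array([1.0, -0.5, 0.3, 2.0])
z = encode(x)
x_prime = decode(z)

# Because the latent space has lower dimensionality than the input,
# x' is in general only an approximation of x.
print(z.shape, np.linalg.norm(x - x_prime) > 0)
```

Training would adjust W_enc and W_dec so that the residual error is as small as the bottleneck allows for the data distribution at hand.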
An example EDP is an autoencoder. The most basic autoencoder has a loss function 26 which calculates an L1 or L2 norm of the difference between the generated data and the training data. However, if the latent space is to have certain characteristics (such as smoothness), it is useful to also use aspects of the latent space as input in the loss function 26. For example, a variational autoencoder (Diederik P. Kingma and Max Welling, “Auto-Encoding Variational Bayes”, Proceedings of the 2nd International Conference on Learning Representations, 2014) has a loss function that includes, next to the standard reconstruction error, an additional regularization term (the KL divergence) in order to encourage the encoder to provide a better organization of the latent space.
A feature of variational autoencoders is that, contrary to the most basic autoencoder, the latent space is stochastic. The latent variables are drawn from a prior p(z). The data x have a likelihood p(x | z) that is conditioned on the latent variables z. The encoder will learn a p(z | x) distribution.
In a further development of VAEs, a β parameter was introduced to add more weight to the KL divergence, in order to promote an even better organization of the latent space, at the cost of some increase in the reconstruction error.
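The structure of such a loss can be sketched as follows. This assumes the standard VAE formulation with a diagonal-Gaussian encoder distribution and a standard-normal prior; the numeric values and the name vae_loss are illustrative only.

```python
import numpy as np

# Sketch of a (beta-)VAE loss: reconstruction error plus a KL-divergence
# regularizer between the encoder's Gaussian q(z|x) and a standard-normal
# prior p(z), weighted by beta.

def kl_standard_normal(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) for a diagonal Gaussian,
    # summed over the latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def vae_loss(x, x_prime, mu, log_var, beta=1.0):
    reconstruction = np.sum((x - x_prime) ** 2)   # L2 reconstruction error
    return reconstruction + beta * kl_standard_normal(mu, log_var)

x = np.array([1.0, 0.0])
x_prime = np.array([0.9, 0.1])
mu = np.array([0.0, 0.0])
log_var = np.array([0.0, 0.0])

# With mu = 0 and unit variance, the KL term vanishes and only the
# reconstruction error remains; beta > 1 weights the KL term more heavily.
print(vae_loss(x, x_prime, mu, log_var))
```

Increasing β trades reconstruction fidelity for a better-organized latent space, as described above.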
Autoencoders and VAEs are not the only possible EDPs that can be used. It is also possible to use a U-Net as encoder-decoder. A U-Net EDP is similar to an EDP using a conventional Convolutional Neural Network encoder and decoder, with the difference that there are additional connections between encoder layers and the mirrored decoder layers, which bypass the latent space 23 between the encoder 22 and decoder 24. While it may seem counter-intuitive to have these latent space bypasses in order to promote a better latent space, they may actually help the encoder to reduce the reconstruction error without overburdening the latent space: the bypasses carry the high-frequency image details which are important for the decoder to accurately recreate the input image (and thus to reduce the reconstruction error), but which are not important for the purposes of the latent space representation (more details on the purpose of the latent space representation are discussed in connection with the
As a further refinement, the encoder may be built using a probabilistic U-Net. A probabilistic U-Net is able to learn a distribution over possible outcomes (such as segmentations) rather than a single most likely outcome or segmentation. Like VAEs, probabilistic U-Nets use a stochastic variable distribution to draw latent space samples from. The probabilistic U-Net allows for high-resolution encoding/decoding without much loss in the decoded images. It also allows the variability in the labelled image or other data (due to radiologist marking variability, measurement variability, etc.) to be explicitly modelled.
Another way to improve the latent space representation 23 is by including a Discriminator of a Generative Adversarial Network (GAN) in the loss function 26. The discriminator is separately trained to learn to distinguish the generated data 25 from the original training data 21. The training process then involves training both the EDP and the loss function's discriminator. Usually, this is done by alternately training one and the other. Use of a GAN discriminator typically yields sharper and more realistic looking generated data than traditional reconstruction errors (e.g. L1 or L2 norm).
In
The EDP encoder stage 46 has already been discussed in reference to encoder 22 of
In
For patient metadata which is (close to) numeric in nature, such as age or gender, the word embedding module 85 in
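By way of illustration, metadata handling along these lines might look as follows; the embedding vectors, field names, and normalization are hypothetical and not taken from the disclosure.

```python
import numpy as np

# Illustrative sketch of preparing patient metadata as a second input to
# the encoder: a near-numeric field (age) is normalized directly, while a
# hypothetical embedding table maps a categorical field to a dense vector.

EMBEDDING = {                  # hypothetical learned embedding vectors
    "male":   np.array([0.1, -0.2]),
    "female": np.array([-0.1, 0.3]),
}

def metadata_features(age, gender):
    numeric = np.array([age / 100.0])       # simple normalization of age
    return np.concatenate([numeric, EMBEDDING[gender]])

# The resulting vector would be supplied alongside the image input.
features = metadata_features(age=62, gender="female")
print(features)
```

In a trained model the embedding table would be learned jointly with the encoder rather than fixed.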
The functioning of the various latent space explorer components is also discussed in reference to
Each latent space representation from the latent space database 49 is represented by a letter A, B, or C. The position of the letter represents the latent space representation (in this exemplary 2D example) and the letter itself represents a classification. For example, if the data relates to lesions, A, B and C might relate to different types of lesions.
The cluster analyzer 63 is configured to detect groups or clusters of points with like classification in the latent space 67. The encoder is trained in such a manner that images with related classifications are mapped to similar regions in latent space, so that these similarly classified images can be found in groups or clusters according to a relevant distance metric of the latent space.
The cluster analyzer 63 may employ an algorithm such as k-means clustering. In the example of
The 2D space is thus a projection of the latent space, containing the classifications (or other labels of interest) for which the uncertainty may be calculated. That is, if part of the classification is a specific type of nodule, the variance and overlap show whether there is a need to sample more from this type of image data. The 2D (or N-dimensional) space can be any projection for any pathology classification that is of interest. These classifications can all be calculated via the same latent space manifold. Instead of a cluster analyzer 63, a different type of module can be used. What is key is that the module identifies, for any given new data point, similar earlier data points with a known classification, so that a classification of the new data point may be arrived at.
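A minimal sketch of k-means clustering, one possible algorithm for the cluster analyzer 63, is given below; the 2D points and the initialization scheme are illustrative assumptions.

```python
import numpy as np

# Minimal k-means sketch: points in a 2D latent space are grouped around
# k centroids by alternating assignment and centroid-update steps.

def kmeans(points, k, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated groups of latent space representations.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels, centroids = kmeans(points, k=2)
print(labels)   # the two nearby pairs end up in the same cluster
```

Any clustering scheme that groups nearby latent points under the chosen distance metric would serve the same role.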
Returning to
The confidence analyzer 65 may work as follows. First it determines if a current latent space representation 47 is located in a cluster of points with a specific classification. In case a classifier function F(z) 55 is available (see
The case where the latent space representation 47 is not in any cluster, and thus not near earlier data points, is indicative of a situation where the latent space is not well-suited to distinguish the different classifications. It may be that the dimensionality of the latent space is too high or too low, or that important discriminating data is missing from the input. While that may be a result that is suboptimal from the point of view of promptly determining a classification, the fact that it is known that the classification is unreliable is worthwhile in itself. It could, for example, prompt the practitioner to schedule further tests in order to obtain a more reliable outcome. On the other hand, if the confidence analyzer declares a high confidence (because, for example, the point is in the middle of a homogeneous cluster of points), it may tell the practitioner that further tests are not needed.
There are various ways to calculate a confidence value. Given the projection (interpreted as a probability distribution) and point x, if P(A | x) is close to 1, and P(B | x) and P(C | x) are both close to 0, that means that the assignment to A is quite confident. If P(A | x), P(B | x) and P(C | x) are all far from 0, then the assignment is less confident. Alternatively, one could use a so-called silhouette score, which is effectively a distance to the cluster center divided by the maximum spread, and which will be much higher for A than for C.
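The two measures just described can be sketched as follows; the probability values and the simplified spread score (distance to the cluster centre divided by the maximal spread) are illustrative stand-ins.

```python
import numpy as np

# Two illustrative confidence measures for a latent space point.

def probability_confidence(p):
    # Confidence of an assignment: high when one class probability dominates.
    return float(np.max(p))

def spread_score(point, cluster_points):
    # Simplified spread score: distance of the point to the cluster centre
    # divided by the maximal spread of the cluster around that centre.
    centre = cluster_points.mean(axis=0)
    max_spread = np.max(np.linalg.norm(cluster_points - centre, axis=1))
    return float(np.linalg.norm(point - centre) / max_spread)

p_confident = probability_confidence(np.array([0.95, 0.03, 0.02]))
p_uncertain = probability_confidence(np.array([0.40, 0.35, 0.25]))

cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
score = spread_score(np.array([0.2, 0.0]), cluster)

print(p_confident > p_uncertain, score)
```

A point deep inside a tight cluster yields a dominant probability and a low spread score; a point between clusters yields neither.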
The temporal analyzer 66 follows points in latent space over time. In
The temporal analyzer 66 is able to follow development over time in latent space. This is useful, for example in order to determine the effect of a treatment. In the example of
If enough of this statistical information is collected, it becomes possible to determine a vector FX, including an uncertainty factor, which indicates the likely effect over time (in terms of movement in latent space) of treatment X. This can be extended to also include other treatments, say Y and Z.
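A minimal sketch of estimating such a vector FX from observed latent space displacements follows; the before/after coordinates are invented for illustration, and taking the standard deviation as the uncertainty factor is an assumption.

```python
import numpy as np

# Sketch of estimating a treatment-effect vector F_X: given observed latent
# space displacements of several patients under treatment X, the likely
# effect is the mean displacement and the uncertainty factor is taken here
# as the per-dimension standard deviation.

before = np.array([[2.0, 1.0], [2.2, 0.8], [1.8, 1.1]])   # z at time t
after  = np.array([[1.0, 0.5], [1.3, 0.2], [0.9, 0.7]])   # z at time t+1

displacements = after - before
F_X = displacements.mean(axis=0)          # likely effect of treatment X
uncertainty = displacements.std(axis=0)   # spread around that effect

print(F_X, uncertainty)
```

The same computation repeated for treatments Y and Z would yield the additional vectors mentioned above.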
This is illustrated in
It should be noted that the vector fields of
It is also worth noting that the vector fields, and indeed the entire latent space and prognosis (or generally, classification) values, can be determined based on a subset of patient data, for example according to patient metadata. One can determine separate vector fields depending on factors such as age, sex, smoking habits, etc. These vector fields can be seen as a summary of all or a subset of the data that is available in the latent space database 49.
The influence of the randomness of dropout on the resulting classification, combined with the robustness of the features, will lead to an observed confidence. This confidence boundary may turn out to be very sharp, which also indicates that the predictions are stable.
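One way such a dropout-based confidence could be probed is sketched below (a Monte-Carlo-dropout style estimate; the toy classifier and all numeric choices are assumptions, not the disclosed implementation).

```python
import numpy as np

# Sketch of observing confidence via dropout randomness: the same input is
# passed through a toy classifier many times with different dropout masks,
# and the spread of the outputs is inspected.

rng = np.random.default_rng(1)
w = rng.standard_normal(8)            # toy linear classifier weights
x = rng.standard_normal(8)            # fixed input feature vector

def forward_with_dropout(x, p=0.5):
    # Randomly zero out features, rescaling the survivors by 1/(1-p).
    mask = rng.random(x.shape) > p
    return float(w @ (x * mask) / (1 - p))

# Repeated stochastic forward passes on the same input.
outputs = np.array([forward_with_dropout(x) for _ in range(200)])
confidence_spread = outputs.std()     # small spread -> stable prediction
print(confidence_spread > 0)
```

A small spread across passes corresponds to the sharp, stable confidence boundary described above; a large spread flags an unstable prediction.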
The loss function calculator 59, which steers the training of the encoder 46 and decoder 56, will be provided with the original data 21, or 44 and 45 or 52, 53, and 45 and the generated data 57, 58. In addition, it may optionally be provided with the output of classifier function F(z) 55 and the known classification 60. By including the classification prediction in the loss function, the encoder may be encouraged to better distinguish classifications in the latent space.
Optionally, the loss function can also be provided with the output of segmentation function 54 and known segmentation data 61. In this way, a classifier function 55 and segmenter function 54 may also be trained during the training phase. The loss function may also receive the latent space representation as input, for example in order to calculate a latent space regularization term.
The loss function calculator can use a predetermined loss function, such as an L2 or L1 norm, for optimizing the EDP. Alternatively, the loss function could be based on the underlying distribution and use a GAN to learn a loss function.
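A combined loss of the kind described above might be sketched as follows; the specific terms, weights, and the name combined_loss are assumptions for illustration, not the disclosed formulation.

```python
import numpy as np

# Sketch of a combined training loss: an L2 reconstruction term on the
# generated data, a classification term comparing F(z) with the known
# classification, and a simple latent space regularization term, with
# hypothetical weights.

def cross_entropy(predicted, target_index):
    # Classification term: negative log-probability of the known class.
    return -np.log(predicted[target_index])

def combined_loss(x, x_prime, z, class_probs, true_class,
                  w_class=1.0, w_reg=0.1):
    reconstruction = np.sum((x - x_prime) ** 2)   # L2 reconstruction term
    classification = cross_entropy(class_probs, true_class)
    regularization = np.sum(z ** 2)               # simple latent penalty
    return reconstruction + w_class * classification + w_reg * regularization

loss = combined_loss(
    x=np.array([1.0, 0.0]),
    x_prime=np.array([0.8, 0.1]),
    z=np.array([0.5, -0.5]),
    class_probs=np.array([0.7, 0.3]),
    true_class=0,
)
print(loss)
```

Including the classification term in this way is what encourages the encoder to separate classifications in the latent space, as noted above.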
The classifier function F(z) 55 may be a deep neural network, such as a CNN. The segmenter function 54 may also be a deep neural network, such as a CNN.
Combinations of specific features of various aspects of the disclosure may be made. An aspect of the disclosure may be further advantageously enhanced by adding a feature that was described in relation to another aspect of the disclosure.
It is to be understood that the disclosure is limited by the annexed claims and its technical equivalents only. In this document and in its claims, the verb “to comprise” and its conjugations are used in their non-limiting sense to mean that items following the word are included, without excluding items not specifically mentioned. In addition, reference to an element by the indefinite article “a” or “an” does not exclude the possibility that more than one of the elements is present, unless the context clearly requires that there be one and only one of the elements. The indefinite article “a” or “an” thus usually means “at least one”.