The present disclosure relates to a device and method for locating determined brain target points on a magnetic resonance image of the brain of a subject. This disclosure finds particular application in locating target points for Transcranial Magnetic Stimulation (TMS) or Deep Brain Stimulation.
Transcranial Magnetic Stimulation is a recent and growing therapy for a variety of neurological disorders thought to arise from, or be modulated by, cortical regions of the brain represented by singular 3D target points.
These target points are usually determined manually from pre-operative magnetic resonance images (MR images). Accordingly, an expert visualizes a standard anatomical MR image of the brain of the patient and locates the target points on the patient's cortical surface. This manual annotation is, however, time-consuming and requires rare skills.
There is also a growing interest in automatically identifying the target points using a pre-annotated atlas. An atlas is an MRI brain image of a person in which target points have been identified by one or more experts. A brain MRI image of a subject is then deformably registered to the image of the atlas, and the target points identified on the atlas are then projected onto the cortical surface of the subject. Such an approach has been discussed for instance in Ahdab R, Ayache S S, Brugières P, Goujon C, Lefaucheur J P, "Comparison of 'standard' and 'navigated' procedures of TMS coil positioning over motor, premotor and prefrontal targets in patients with chronic pain and depression", Neurophysiol Clin. 2010 March; 40(1):27-36. doi: 10.1016/j.neucli.2010.01.001. Epub 2010 Jan. 22. PMID: 20230933.
However, this method is still quite time-consuming, and it is also susceptible to error resulting from topological differences between the atlas image and the particular brain morphology of a patient, due to the high patient variability in the number and position of cortical gyri.
There is therefore a need for an improved solution for locating said brain target points, which is both faster and more reliable than the prior art solutions.
In view of the above, one aim of the invention is to alleviate at least part of the inconveniences of the prior art.
In particular, one aim of the invention is to provide a device and a method for locating target points on a magnetic resonance image of the brain of a subject.
Another aim of the invention is to enable performing said localization in less time than the prior art methods, but with acceptable precision.
To this end, there is proposed a device for locating target points on a magnetic resonance image of the brain of a subject, the device comprising a memory storing a trained neural network configured to receive as input a 3D magnetic resonance image of the brain of a subject, and to output the location, on said image, of at least one determined brain target point, and a computer configured for executing said neural network on an input 3D magnetic resonance image of the brain of a subject,
wherein the neural network is a deep convolutional neural network comprising a plurality of processing stages arranged such that:
wherein the estimate determined by the processing stage of highest resolution forms the output of the neural network.
In embodiments, brain target points are cortical target points for Transcranial Magnetic Stimulation, stimulation points for deep brain stimulation, or brain anatomical targets forming reference points for later applications.
In embodiments, the processing stage of highest resolution receives as input the input image, and each other processing stage receives as input an image of lower resolution from the input image obtained by downsampling the input image.
In embodiments, the neural network further comprises, for each processing stage except the processing stage of highest resolution, a pooling layer configured to perform the downsampling of the image that is received as input by the corresponding processing stage.
In embodiments, the neural network is configured for outputting the location of a plurality of target points, and each processing stage comprises a number of convolution layers common to all the target points, and, for each target point, a respective branch configured for computing the location of said target point.
In embodiments, each processing stage further comprises a plurality of residual layers configured to correct the location of each target point computed by each respective branch, wherein the parameters of the residual layers of all processing stages are identical.
In embodiments, the number of processing stages is determined by 1+log2(N/NR), where N is the size of the input image along one dimension and NR is the size, along one dimension, of a cropped region of the image.
According to another aspect, the present disclosure relates to a method implemented by the above device, for locating at least one determined brain target point on a magnetic resonance image of the brain of a subject, the method comprising:
In embodiments, the 3D magnetic resonance image is a T1-weighted magnetic resonance image.
In embodiments, the method further comprises transmitting the obtained target point locations to a guidance system of a transcranial magnetic stimulation device or a deep brain stimulation device.
According to another aspect, the present disclosure relates to a method for training a neural network configured to output, from a 3D magnetic resonance image of a brain, the location of at least one determined brain target point,
wherein the neural network is a deep convolutional neural network comprising a plurality of processing stages arranged such that:
wherein the estimate determined by the processing stage of highest resolution forms the output of the neural network,
and the method comprises providing the neural network with a set of training magnetic resonance images of brains in which the determined target points have been annotated, and, for each training image, computing a loss function quantifying an error between the locations computed by the neural network and the actual location of each target point,
wherein the loss function is a weighted sum of the square error of each processing stage of the neural network divided by the resolution of said processing stage.
In embodiments, each weight of the weighted sum is expressed as:
where wl is a weight associated with a processing stage l, fl is the field of view of said processing stage measured in pixels, el is the error of said processing stage, rl is the resolution of said processing stage, and α is a fixed parameter.
According to another aspect, the present disclosure relates to a computer-program product comprising code instructions for implementing the methods disclosed above, when they are executed by a computer.
According to another aspect, the present disclosure relates to a non-transitory computer-readable storage medium having stored thereon code instructions for implementing the methods disclosed above, when they are executed by a computer.
According to another aspect, the present disclosure relates to a computer-implemented neural network, characterized in that it is obtained by implementing the training method disclosed above.
According to the proposed device and method, a specific neural network is used to locate determined target points on brain magnetic resonance images. The structure and the training method of the neural network are adapted to the task, providing average errors that are less than the errors of the registration approach and comparable to the errors that exist between individual human experts. However, the time required for locating the positions of the target points is greatly reduced as compared to the prior art solutions.
Further details, aspects and embodiments of the proposed solution will be described, by way of example, with reference to the drawings.
With reference to
The device 1 comprises a memory 10, storing a trained neural network that is described in greater detail below. The memory 10 may store the structure and parameters of the trained neural network as well as code instructions for executing the neural network. The device 1 also comprises a computer 11, configured for executing said neural network on a 3D magnetic resonance image of the brain. The computer may include one or more processor(s), central processing unit(s) and/or graphical processing unit(s).
The computer 11 may receive one or several input images, each input image being a 3D magnetic resonance image of the brain of a subject. To this end, the device 1 may comprise an interface 12 for receiving this image. The interface 12 may include a connection interface with a magnetic resonance imaging device allowing any image acquired by the device to be transmitted to the device 1, either automatically or after performing a selection of one or more images to be transferred. Alternatively, the interface 12 may include a connection interface with a memory having stored thereon the image to be processed by the computer 11. According to still other embodiments, the interface 12 may include a connection interface with a communication network, allowing the device to download the input image from the network.
The neural network stored in the memory 10 and executed by the computer 11 is configured for locating, on a 3D magnetic resonance input image of the brain of a subject, a set of determined target points. In embodiments, the input image is a T1-weighted MR image.
The set of determined target points comprises at least one target point, but in embodiments it may comprise a plurality of target points, for instance at least 5 target points, for instance between 5 and 15 target points. Each target point is defined in three dimensions, the target points and their three-dimensional coordinates forming a cloud of points of interest. In still other embodiments, the target points may be specific brain anatomical targets that can form reference points for later use, for instance, targets allowing the definition of a Talairach space.
Those target points correspond to very specific, point-like locations of the brain, which need to be precisely located in preparation for a later neurosurgical intervention. The brain target points can in particular be cortical target points for Transcranial Magnetic Stimulation. They can alternatively be potential stimulation sites for deep brain stimulation.
The neural network is a deep convolutional network configured to process the input image at different resolutions in order to locate the positions of the target points on the input image.
To this end, with reference to
In the example shown in
The processing stage of highest resolution receives as input the input image I in its original resolution (Ir1=I).
For each other processing stage Si (i>1), the neural network comprises a downsampling block Di that receives the input image (Iri-1) of the processing stage of next higher resolution, and downsamples this image to output an image of lower resolution Iri. Said image of lower resolution forms the input image of the considered processing stage Si.
In embodiments, the downsampling block Di is implemented by an average pooling layer. The pooling size may depend on the size of the original input image. In other embodiments, the downsampling block is implemented by a convolutional layer followed by a ReLU layer (ReLU standing for Rectified Linear Unit).
Therefore, the processing stage of second-highest resolution receives as input an image that has been downsampled once, the next stage receives as input an image that has been downsampled twice, etc.
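By way of a non-limiting illustration, the cascade of downsampling blocks Di described above can be sketched as follows. This is a minimal pure-Python sketch assuming average pooling with a 2×2×2 kernel, as in the embodiment using pooling layers; the function and variable names are illustrative only and do not appear in the disclosure.

```python
def avg_pool_3d(vol, k=2):
    # Average-pool a 3D volume (nested lists indexed [z][y][x])
    # with kernel size and stride k, as a downsampling block Di might do.
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    out = []
    for z in range(0, nz - k + 1, k):
        plane = []
        for y in range(0, ny - k + 1, k):
            row = []
            for x in range(0, nx - k + 1, k):
                block = [vol[z + dz][y + dy][x + dx]
                         for dz in range(k) for dy in range(k) for dx in range(k)]
                row.append(sum(block) / len(block))
            plane.append(row)
        out.append(plane)
    return out


def build_pyramid(vol, n_stages):
    # Input images I_r1 ... I_rn for the processing stages: each stage of
    # lower resolution receives the previous stage's input, downsampled once.
    pyramid = [vol]
    for _ in range(n_stages - 1):
        pyramid.append(avg_pool_3d(pyramid[-1]))
    return pyramid
```

Each element of the returned pyramid would feed one processing stage, the last (coarsest) element going to the stage of lowest resolution.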
Each processing stage is configured for computing, on its corresponding input image Iri, an estimate of the location of each target point of the set of determined target points. Furthermore, each processing stage except the processing stage of lowest resolution is configured to compute said estimate based on an estimate of the location of each target point received from a processing stage of lower resolution, and more specifically from the processing stage of next lower resolution.
Due to the specific function of point localization of the neural network, a large part of the input image does not contribute to solving the problem. Thus, each processing stage Si, except the one of lowest resolution, is further configured to crop its input image, based on the estimate of the location of the target points, to a smaller region of the input image surrounding each estimated target point. This is denoted by a cropping block "crop" at the beginning of each processing stage except the coarsest one, corresponding to the lowest resolution, in
This cropping operation significantly reduces the computational memory needed by each processing stage for implementing the subsequent task of locating the target points on its input image. This allows the processing stages that update the points' estimated locations to be relatively large.
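The cropping operation can be sketched as follows: a small cube is extracted around the current point estimate, clamped so it stays inside the volume. This is an illustrative sketch, not the disclosed implementation; names are hypothetical, and coordinates are taken in (z, y, x) order.

```python
def crop_around(vol, center, size):
    # Extract a size**3 sub-volume of vol (nested lists [z][y][x]),
    # centred on an estimated target point. The window is clamped so it
    # stays inside the volume; the returned offset maps crop coordinates
    # back to full-image coordinates.
    dims = (len(vol), len(vol[0]), len(vol[0][0]))
    starts = []
    for c, n in zip(center, dims):
        starts.append(min(max(int(c) - size // 2, 0), n - size))
    sz, sy, sx = starts
    crop = [[row[sx:sx + size] for row in plane[sy:sy + size]]
            for plane in vol[sz:sz + size]]
    return crop, tuple(starts)
```

The offset is what allows a location refined inside the cropped region to be expressed again in the coordinates of the full input image.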
In embodiments, each processing stage may thus comprise a plurality of convolutional layers, having a number of kernels that is greater than the number of kernels of the downsampling block Di, if the latter is implemented by a convolutional layer.
With reference to
In embodiments, the estimation block of each processing stage comprises one or several layers that are common to all the target points to locate, and then a respective branch configured to compute the location of each target point of the set of predetermined target points.
The layers that are common to all the target points to locate may comprise at least one convolutional layer followed by a ReLU layer, followed by a Batch Normalization layer. In the example shown in
Furthermore, each branch corresponding to a respective target point (shown in broken line in
where j is a channel of the convolutional layer of the respective branch at resolution ri, x a location in the image at resolution ri, Mij(x) is the output of the convolution stack for that channel, and cij is the centroid of the resulting probability map.
These centroids are treated as potential estimates, and the final updated estimate is taken as a weighted mean of these centroids, where the weight Wij of each channel j is determined by another parallel computation occurring in the same respective branch. This computation comprises an average pooling of the output of the series of layers common to all target points, and a series of two linear layers. An additional softmax layer is applied to normalize these weights.
The final updated estimate is thus given by the weighted mean of the centroids, that is, the sum over the channels j of Wij·cij.
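The centroid extraction and the weighted fusion described above can be sketched as follows. This is a 1D sketch for brevity (the 3D case applies the same computation per axis), and it assumes the channel maps are normalized with a spatial softmax; the function names are illustrative only.

```python
import math


def softmax(vals):
    # Numerically stable softmax.
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    s = sum(exps)
    return [e / s for e in exps]


def channel_centroid(activation_map):
    # Centroid c_ij of one channel's map M_ij, after normalising it to a
    # probability map; positions are voxel indices (1D for brevity).
    probs = softmax(activation_map)
    return sum(x * p for x, p in enumerate(probs))


def fused_estimate(maps, channel_scores):
    # Weighted mean of the per-channel centroids. The channel weights W_ij
    # come from the parallel branch (pooling + two linear layers) and are
    # normalised with a softmax; here the raw scores are simply passed in.
    centroids = [channel_centroid(m) for m in maps]
    weights = softmax(channel_scores)
    return sum(w * c for w, c in zip(weights, centroids))
```

With two channels peaked at positions 2 and 4 and equal channel scores, the fused estimate falls at position 3, i.e. the equally weighted mean of the two centroids.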
In embodiments, due to missing or unclear information in the image, it may also be beneficial for these points to be updated, given the knowledge of neighboring similar points. Accordingly, each processing stage of the neural network may further comprise a set of residual layers, comprising a plurality of residual layers RL in series appended to the estimation block Ei. With reference to
In order to prevent the residual layers from causing a large increase in the number of parameters, the number of residual layers is the same for all processing stages, and furthermore the parameters of the residual layers are also fixed to be the same for all the processing stages.
Regarding the number of processing stages, this number may vary according to the size of the initial input image and the size of each cropped region. The number NS of processing stages may be computed as NS = 1 + log2(N/NR),
where N is the size of the image along one direction, generally equal to 256, and NR is the size of the cropped image along one direction. The cropped size may be configured; for instance it may be equal to 8. In that case, and as shown in the example of
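The stage-count rule above can be checked with a one-line computation; for the stated example (N = 256, NR = 8) it yields 6 processing stages.

```python
import math


def num_stages(image_size, crop_size):
    # NS = 1 + log2(N / NR): each downsampling halves the resolution until
    # the cropped region covers the whole downsampled image.
    return 1 + int(math.log2(image_size // crop_size))
```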
When the trained neural network is applied to an input image, the output of the neural network is formed by the output of the processing stage of highest resolution, which is formed by the input image on which the locations of each determined target point are identified.
Back to
The training is performed on a training dataset comprising a plurality of brain MR images which have all been annotated by one or several experts in order to locate, on each image, the position of each point of the set of determined target points. In
During the training, an error ei is computed between an estimate vector comprising the estimates output by all the processing stages of the neural network, and the ground truth, corresponding to the positions of the target points determined by the experts.
However, given the structure of the network detailed above, it should be underlined that the cropping operations are not differentiable with respect to the location they are cropping around, meaning that the gradients cannot travel from finer resolution layers to coarser ones. Thus, a conventional gradient-based training function cannot be used.
Furthermore, due to the increasing pixel width, changes in the coarser resolution layers are magnified. For instance, an error of 1 pixel at the coarsest layer can be equivalent to an error of 32 pixels at the finest layer.
Last, each layer has a limited space of possible estimates, its so-called “field-of-view”, and thus cannot learn from gradients when the ground truth is outside of that space.
In order to take into account these considerations, a specific training function is associated to this neural network structure.
Regarding the first problem, an L2 loss can be applied to the estimate from each layer. In other words, the loss function can be expressed as a sum of squared values of the error of each resolution layer.
To address the second problem, the error for each processing stage can be divided by its resolution.
To address the third problem, a weighting scheme is used, configured to take into account, for each processing stage, the error of the processing stage as well as its resolution and field of view, in order to give more importance to processing stages whose error has the same order of magnitude as their resolution, and to remove importance from those in which the error approaches the edges of the field of view or exceeds it.
Thus, the loss function L computed during the training is given by the sum, over the processing stages i, of wi·ei²/ri,
where ri is the resolution of a processing stage, ei is the error of this processing stage, and wi is a weighting coefficient, that can be computed as:
where fi is the field of view of a processing stage, and α is a constant that controls how close this weighting is to binary, i.e. how much weight it gives to a processing stage that is coarser than its optimal resolution but still within its field of view. This value may for instance be set to 0.5.
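The loss described above, a weighted sum of per-stage squared errors each divided by the stage resolution, can be sketched as follows. The exact weighting formula of the disclosure (involving the field of view fi, the error ei and the parameter α) is not reproduced here; the sketch hedges this by taking the weights as given.

```python
def multiscale_loss(errors, resolutions, weights):
    # errors: per-stage localization error e_i (e.g. a distance in voxels)
    # resolutions: voxel size r_i of each stage (coarser stages -> larger r_i,
    #   which tempers the magnification of coarse-stage errors)
    # weights: per-stage weights w_i; in the disclosure these depend on the
    #   field of view f_i, the error e_i and the parameter alpha, but the
    #   exact formula is not given here, so they are simply passed in.
    return sum(w * e ** 2 / r
               for w, e, r in zip(weights, errors, resolutions))
```

Dividing by the resolution addresses the second problem noted above: an error of one pixel at a coarse stage, which corresponds to many pixels at full resolution, is not over-penalized relative to the fine stages.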
An illustration of the localization of brain points on an MR image is shown in
According to an exemplary embodiment, a neural network having the structure detailed above has been trained and then applied on T1-weighted brain MR images having a size of 256×256×256 pixels.
In this example, the neural network comprises 6 processing stages of respective resolutions. Each downsampling block performs an average pooling of size 2×2×2 pixels. Thus the voxel size processed by the finest resolution layer is 1 mm, and the voxel size processed by the coarsest resolution layer is 32 mm.
Each processing stage except the coarsest one crops its respective input image to a region having a size of 8×8×8 pixels.
The training dataset comprises 16 full volumetric images annotated by experts, on which data augmentation has been implemented in the form of random rotations (std. 10°) and translations (std. 10 pixels), which can be easily applied to the point locations as well.
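Applying the same rigid transform to the annotated points as to the image can be sketched as follows. This is an illustrative sketch assuming an in-plane rotation about the z axis plus a translation; the disclosure does not specify the rotation axis, and the function name is hypothetical.

```python
import math


def transform_points(points, angle_deg, shift):
    # Apply the same rigid transform (rotation about z, then translation)
    # to the annotated target points as to the augmented image, so that
    # the labels stay consistent with the augmented volume.
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for x, y, z in points:
        out.append((x * cos_a - y * sin_a + shift[0],
                    x * sin_a + y * cos_a + shift[1],
                    z + shift[2]))
    return out
```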
Each image is annotated with 12 determined target points. Preferably, for at least some points, a plurality of experts provide the localization of each point, and the location considered as the ground truth is a consensus of the experts on the location. However, for some points, a plurality of expert annotations may not be available and the location of a point may be provided by only one expert.
The efficacy of the network is compared both to the locations determined by expert neurologists and to deformable registration performed by deforming an atlas pre-annotated by an expert. The deformable registration procedure takes about 4-5 minutes to compute per patient.
In order to evaluate the method, the mean distance error was calculated in a leave-one-out cross-validation scheme, which was repeated 4 times in order to calculate a dispersion metric, that is, the average distance between the point estimate of each network and the mean point across the three other networks.
The results are reproduced in
Furthermore, the target points are identified by series of letters, whose meanings are provided as:
One can thus notice that the proposed method borders on human expert performance: it sometimes outperforms and sometimes underperforms the human expert error, but in neither case with statistical significance. In any case, the performance of the method was not matched by the registration-based approach, as the latter consistently underperformed the human experts with statistical significance for all but the RLLIMBMC and LULIMBMC target points.
The various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
Number | Date | Country | Kind |
---|---|---|---|
21305776.3 | Jun 2021 | EP | regional |