This application claims priority to Italian Patent Application No. 102020000025006, filed on Oct. 22, 2020, the entire contents of which are hereby incorporated by reference.
The present invention relates to a method for the automatic identification and quantification of radioisotopes, e.g. in low-resolution gamma spectra, based on convolutional neural networks densely connected in a directed acyclic graph.
With reference to
Starting from these physical features, attempts have been made to automatically recognize the presence of specific isotopes in gamma spectra.
In this respect, for the prior art, reference is made to a recent review containing all of the pertinent references related to the previously consolidated methods of isotope identification in gamma spectra [1].
The consolidated approaches can be grouped into two macro-categories: “peak search and match” and “template matching”.
In the first method, the first step consists in identifying the peaks present in the spectrum, which correspond to the characteristic emissions of the isotope. Such a process is not trivial on spectra with a low number of events and with modest energy resolution, as statistical fluctuations and broad peaks can prevent a small signal from being distinguished from noise. In the second step, certain numeric attributes are calculated from the initial spectrum (e.g. the area of each peak). The quality and number of the selected attributes are fundamental for performing the subsequent classification task (probability of the presence of the recognized element) accurately and in reasonable times. These results are used to select the correct solution in an existing library by means of a comparison. The dimension and quality of the library are crucial, as it is always necessary to reach a compromise between speed and accuracy. There are various classification algorithms (decision trees, neural networks, Naïve Bayes, Nearest Neighbor, support vector machines; the neural networks being used here merely for classification, downstream of a feature extraction performed with conventional, non-learned methods and algorithms), and the choice of which to use depends on the previous steps.
The second method consists in constructing a library of isotopes in various configurations. An algorithm searches for the combination of solutions present in the library which best reproduces the spectrum. In order to overcome the combinatorial problem, the algorithms vary and can be divided into heuristic and systematic ones. The drawback of this approach is that the library must be representative of the detection system used. Even slight distortions (e.g. statistical noise or the presence of absorber materials) mislead the matching algorithm.
Recently, algorithms based on artificial neural networks (ANN) combined with other methods have appeared in this scenario, both in scientific publications (e.g. [2,3,5]) and as patents ([6-9]). This category differs from the previous ones in that the comparison with the library is not made for each new measurement: once trained, the network is capable of providing the response immediately. The patents of this type suggested so far are limited to classification or identification, i.e. they determine the probability of a radioisotope being present or absent. Furthermore, the analysis with the ANNs is always preceded by a data pre-processing step to remove noise and reduce the dimensionality of the problem.
A method is also known from publication [10], which uses pattern-recognition algorithms, such as artificial neural networks (ANNs) and convolutional neural networks (CNNs), to carry out automated gamma-ray spectroscopy. The way these algorithms are trained and operate imitates how trained spectroscopists identify spectra. These algorithms have shown promising results in identifying gamma-ray spectra with wide calibration drift and unknown background radiation fields.
In this scenario, a need remains for a method capable of quantifying the fraction of each isotope detected. Furthermore, a need is felt for a method capable of eliminating the preliminary step of reducing the dimensionality of the problem, as well as the step of smoothing the incoming data, all at a speed obtainable on portable personal devices, such as smartphones or personal computers. Another need is for a method for recognizing and quantifying isotopes which can be trained using both experimental measurements and simulations.
It is the object of the invention to provide a method for the automatic identification and quantification of radioisotopes in low resolution gamma spectra, which at least partially solves the problems and overcomes the drawbacks of the prior art.
A method according to the appended claims is the subject of the present invention.
The present invention will now be described by way of non-limiting example, with particular reference to the figures in the accompanying drawings, in which:
It is specified here that elements of different embodiments may be combined together to provide further embodiments without restrictions while respecting the technical concept of the invention, as those skilled in the art will effortlessly understand from the description.
The present description further refers to the prior art for the implementation thereof, with regard to non-described detail features, such as elements of minor importance usually used in the prior art in solutions of the same type.
When an element is introduced, it is always understood that there may be “at least one” or “one or more”.
When elements or features are listed in this description, it is understood that the finding according to the invention “comprises” or alternately “consists of” such elements.
The identification method of the invention is based, inter alia, on convolutional neural networks (CNNs), an algorithm known per se and highly powerful in analyzing and recognizing images, as it is capable of capturing, in an image, attributes of a local character (shapes, outlines, colors, etc.) irrespective of where they appear therein, and the identification thereof is invariant to small transformations, distortions, and translations.
The physical and mathematical problem faced by the Inventors in view of isotopic recognition in a gamma spectrum was set as follows.
A measured gamma spectrum, generated by various radioactive sources, can be considered as a linear combination of the spectra generated by each single source. If Nc is the number of channels forming the spectrum and Ni the number of possibly identifiable isotopes, the measured spectrum can be expressed according to the relation:

ci = Σj âij wj, i = 1, …, Nc   (1)

where ci is the number of counts in the i-th channel, wj is the weight or coefficient of the j-th isotope, and â is the matrix which describes how the detector responds in the presence of a given radioisotope. In essence, the j-th column of â represents the ideal spectrum that the detector would measure in the presence of the j-th isotope. The problem of identifying the isotopes present in the measured spectrum and quantifying the fraction thereof thus consists in inverting Equation (1) and obtaining the weights from the measured spectrum.
However, since the matrix â is in general not directly invertible, the problem is unstable to slight fluctuations, which, owing to the statistical noise present in the measurement, lead to results devoid of physical sense, such as, for example, negative or huge weights. Instead of inverting â, it is possible to fit the inverse thereof, using experimental measurements in which the actual weight of each isotope present in each one is known.
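By way of illustration, the following NumPy sketch (with a random stand-in for the response matrix â, which is an assumption for illustration only) reproduces Equation (1) and shows how a naive least-squares inversion can return unphysical (e.g. negative) weights once Poisson noise is added:

```python
# Minimal sketch of Equation (1), c = a_hat @ w, with a hypothetical response
# matrix, and of why naive inversion is unstable under statistical noise.
import numpy as np

rng = np.random.default_rng(0)
Nc, Ni = 2048, 8                      # channels, identifiable isotopes

a_hat = rng.random((Nc, Ni))          # stand-in for the detector response matrix
a_hat /= a_hat.sum(axis=0)            # each column: unit-area ideal spectrum

w_true = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0])   # 1:1 mixture of two isotopes
counts = rng.poisson(1000 * a_hat @ w_true)       # measured spectrum, ~1000 events

# Least-squares "inversion" of Equation (1): the noise easily produces
# small negative (i.e. unphysical) weights for the absent isotopes.
w_fit, *_ = np.linalg.lstsq(a_hat, counts / counts.sum(), rcond=None)
print(w_fit)
```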
One way of doing this is to use a neural network with the following architecture (see
The problem is that this architecture has great limitations. The absence of non-linearity prevents the insertion of hidden layers, as they would be redundant (linear combinations of linear combinations); this caps the number of trainable parameters (given by the product Nc·Ni) and limits the predictive capacity (networks with linear activation functions cannot reproduce arbitrary functions, unlike a multi-layer network with non-linearities).
For this reason, according to the present invention, it is advantageous to regularize the problem or reduce the dimensionality thereof, for example by identifying which isotopes are actually present and calculating the weights only for those. In fact, the problem of identifying isotopes in a gamma spectrum is simpler, although not trivial: the presence of determined features or attributes in the measured spectrum (for example, the position of the peaks) automatically identifies which isotope generated it, and therefore the problem is transferred to the capacity to identify and recognize such attributes (“peak searching”, “template matching”), without any quantitative analysis for each one of them.
Therefore, the problem was split by the Inventors into two problems: identifying the isotopes present (classification in terms of probability); quantifying the fraction of each one (regression).
Neural networks generally perform only one of such tasks, while the invention achieves both, with techniques adopted for 1) efficiently extracting the relevant information from the measured spectrum and 2) efficiently combining the information related to the identification in order to obtain the quantification.
The first obstacle of the above problem is the specific nature of the raw data. A measured gamma spectrum is affected by statistical noise. Therefore, the first step is generally a smoothing, which limits the statistical fluctuations but, in the case of overly noisy measurements, can introduce artifacts. Furthermore, the spectrum generally consists of a few thousand channels. Such an amount of starting data is high for a standard multi-layer network, which would require several layers with a comparable number of neurons for the analysis, thus reaching a number of trainable parameters on the order of ~10⁶.
Therefore, a first appropriate action according to the present invention is to reduce the complexity of the problem by reducing its dimensionality, using various possible methods. Such a reduction in dimensionality, as will be seen, differs from that of the prior art, because it acts through the trainable hyper-parameters of the convolutional networks and not by reducing the dimensionality of the incoming datum.
Finally, the last limit of such a network, as usually applied to the general problem of the invention, consists in not considering the spatial relationships between the input data: if the channels of the dataset spectra were all remixed in the same manner, the training would suffer no positive or negative consequences. This is a waste of resources and a misuse of information, because the network must learn anew which relations exist between the various input data, wherever placed, when such information is already available: in fact, in the case of gamma spectra, if a sequence of channels forms a peak, it is important, according to the Inventors, to assess the whole sequence, that is, also to consider the local neighborhood of each channel.
After posing the problem so, the Inventors agreed that the best candidate for solving both problems would be the convolutional neural networks. Since it is the parameters of the convolutional filters that are trained, it does not matter how long the input sequence is: the number of parameters remains unchanged. With equal parameters, this allows creating deeper networks with more layers, thus increasing the abstraction power of the network, without needing any pre-processing of the raw spectrum. Furthermore, by assessing portions or segments of data at a time, it is possible to extract the relevant attributes present in the various zones of the spectrum in a manner invariant to translation and scale (a typical feature of CNNs).
Finally, the number of parameters for training a convolutional network, which takes as input images of a few thousand pixels, is highly limited (~10⁴), thus facilitating the learning thereof, even on datasets of modest dimensions.
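As a quick illustration of this point, the parameter counts of a fully connected layer and of a single convolutional filter can be compared (a PyTorch sketch; the layer sizes are taken from the example architecture described further below, not a claimed implementation):

```python
# Parameter count of a fully connected mapping scales with the 2048-channel
# input, whereas a convolutional filter bank is independent of input length.
import torch.nn as nn

fc = nn.Linear(2048, 8, bias=False)              # linear "inversion" layer
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=24)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc))    # 2048*8 = 16384 parameters, grows with input size
print(count(conv))  # 24 + 1 = 25 parameters, independent of input size
```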
In short, according to the assessments of the Inventors, the choice of the CNNs in the application for recognizing radioisotopes in gamma spectra would have allowed (as later demonstrated, see below) an effective extraction of the relevant attributes directly from the raw measurement (therefore without the loss of information caused by compression and without introducing possible artifacts), using few parameters and in a manner robust to the distortions caused by statistical noise. This was considered to be the first block of the network of the invention, technically referred to as “feature extraction”.
The main limit in the construction of deep networks lies in the propagation of the information through the various layers. In the case of CNNs, each new convolutional block must re-learn what is relevant from what is not, as it only has access to the output data of the previous block. Recently, the technique of connecting the output of each convolutional block to the input of every subsequent one (densely connected CNN) was suggested, as shown in
Even if the number of connections and relationships between the layers increases, this type of network requires fewer parameters and favors the re-use of data extracted at each block, ensuring more compact and accurate learning with fewer overfitting problems and without degrading performance for deeper networks. For an in-depth analysis on this matter, see the article at the following link: https://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convulutional_CVPR_2017_paper.pdf.
In the context of the identification of isotopes of the present invention, the DC-CNNs are a contrivance for strengthening the learning of the network, reducing the number of parameters thereof, and avoiding the “overfitting”.
Following the feature extraction, according to the present invention, the network must perform the tasks of regression and classification (probability of presence, normally varying from 0 to 1) in order to solve the two problems described above in the application of recognizing and quantifying isotopes.
According to the present invention, it is possible to bifurcate the network and assign an objective to each branch. The regression part can be structured according to the previous description (see above in relation to Equation (1) in vector form, c = â·w): the data exiting the convolutional part are linearly combined and output, in turn, a coefficient for each identifiable isotope (quantification). This is the only structure which allows the network to conceive the input data as an overlap or linear combination; thus the quantification branch according to the invention is without activation functions. With respect to applying this approach directly to the raw spectrum, the information has now been processed and the effect of the distortions is attenuated, although not eliminated.
The second branch is structured in the same manner (input and output with the same number of neurons as the other branch), with the difference that a step-like activation function, which grows quickly from the minimum value to the maximum value (a sigmoidal function), is applied to each neuron of the output layer (with a number of neurons equal to the number of isotopes); its output represents the probability that the corresponding isotope is present.
This is defined as “multi-label classification” as, for each isotope, a value independent of the others is obtained: they can all be present or they can all be absent. Unlike activation functions such as SoftMax, in the case of the present invention it is not necessary to identify at least one class. This is important if an isotope is present in the measured spectrum for which the network has not been trained: the network will thus return null values, avoiding identification errors.
In essence, the same information is processed by two different networks (multi-objective architecture), obtaining two values for each isotope: the weight and the probability that it is present. In order to process both pieces of information, the bifurcation converges into a single node: the negative weights, and the weights of the isotopes whose probability is less than a certain threshold, are disregarded, and the remaining ones are conveniently normalized so as to finally obtain a vector of numbers, with a length equal to the number of isotopes, the sum of which is unitary.
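A minimal sketch of this merging node, assuming a 0.5 threshold and the eight-isotope output discussed later (both assumptions taken from the example below, not mandated by the method):

```python
# Weights whose classification probability falls below the threshold, or which
# are negative, are discarded; the survivors are normalized to unit sum.
import numpy as np

def merge(probabilities: np.ndarray, weights: np.ndarray, thr: float = 0.5) -> np.ndarray:
    keep = (probabilities >= thr) & (weights > 0)
    out = np.where(keep, weights, 0.0)
    s = out.sum()
    return out / s if s > 0 else out   # unit-sum vector, one entry per isotope

probs = np.array([0.99, 0.97, 0.01, 0.00, 0.02, 0.00, 0.01, 0.00])
w     = np.array([0.52, 0.47, -0.03, 0.01, 0.02, 0.00, 0.01, 0.00])
print(merge(probs, w))   # ~[0.525, 0.475, 0, 0, 0, 0, 0, 0]
```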
In practice, according to the present invention, in order to achieve the structure of the DC-CNN and the multi-objective networks, as outlined above, the topology of the directed acyclic graphs is used (see
The basic ingredients for performing the identification and quantification of isotopes have been described in the previous sections. The final architecture, obtained after a process of trial and error, is shown and described in detail. However, it is worth pointing out that a different number of convolutional blocks, or different numbers of filters having different dimensions, can nevertheless perform the task.
In a specific embodiment, the input layer corresponds to a vector with a predetermined number of channels for acquiring the spectroscopic image, e.g. equal to 2048 (a number set based on the typical data of the analyzed gamma spectra). The counts are normalized so that the area of the spectrum is unitary.
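As a minimal sketch, assuming the 2048-channel convention just stated:

```python
# Unit-area normalization of the input spectrum before it enters the network.
import numpy as np

def normalize(spectrum: np.ndarray) -> np.ndarray:
    assert spectrum.size == 2048        # channel count assumed from the text
    return spectrum / spectrum.sum()    # counts rescaled so the area is 1
```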
With reference to
Optionally, a batch normalization (Batch Normalization 1 layer) is then carried out, a well-known technique for reducing sensitivity at initialization of the parameters of the network and commonly used between the convolutional layer and the activation functions. It consists in re-scaling and re-centering each input of a mini-batch.
A non-linear activation function is then applied to each element, advantageously the ELU function (exponential linear unit; Activation eLu 1); the non-linearity has a function similar to that in standard networks and facilitates the extraction of the attributes.
On the other hand, the absence of the typical pooling layer is a merely simplifying choice: it has been shown in the literature that it is possible to attain equally optimum results without it (“all convolutional net”, https://arxiv.org/abs/1412.6806), and therefore without having to calibrate the hyper-parameters linked to the pooling. In this case, pooling would not even be appropriate, as it is not necessary to compress the data: the dimensionality must remain unaltered in order to concatenate the layers.
In all, several convolutional blocks equal to the first one already described can be present (1×24 filter, zero padding of 23 zeroes, batch normalization and ELU function; e.g. 4 convolutional blocks), but each block is connected to all of the subsequent ones and the outputs are conveniently concatenated.
This gives, progressively in the specific case shown, vectors of 1×2048×2, 1×2048×3, 1×2048×4 and 1×2048×5 as input to each convolutional layer (Convolutional 1, Convolutional 2, Convolutional 3, and Convolutional 4 in
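The following PyTorch sketch illustrates one possible reading of this densely connected feature extractor; the number of blocks (4), kernel size (24) and padding (23 zeroes) are taken from the text, while the padding side and block naming are assumptions for illustration, not the claimed implementation:

```python
# Densely connected 1D convolutional feature extractor: each block receives
# the concatenation of the raw spectrum and all previous feature maps.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        self.pad = nn.ConstantPad1d((23, 0), 0.0)   # 23 zeroes keeps length 2048
        self.conv = nn.Conv1d(in_channels, 1, kernel_size=24)
        self.bn = nn.BatchNorm1d(1)                 # batch normalization
        self.act = nn.ELU()                         # non-linear activation

    def forward(self, x):
        return self.act(self.bn(self.conv(self.pad(x))))

class DenseExtractor(nn.Module):
    def __init__(self, n_blocks: int = 4):
        super().__init__()
        # block k sees k+1 input channels: the spectrum plus k feature maps
        self.blocks = nn.ModuleList(ConvBlock(k + 1) for k in range(n_blocks))

    def forward(self, x):                            # x: (batch, 1, 2048)
        for block in self.blocks:
            x = torch.cat([x, block(x)], dim=1)      # 1x2048x2, x3, x4, x5 ...
        return x                                     # (batch, n_blocks + 1, 2048)

print(DenseExtractor()(torch.rand(1, 1, 2048)).shape)   # torch.Size([1, 5, 2048])
```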
The subsequent dropout layer (Dropout Layer in
The bifurcation comprises a first branch dedicated to the regression (quantification) and a second branch dedicated to the classification (probability of presence), as described above.
The outputs of the first and second branches are concatenated so as to provide a vector with a number of components equal to the identifiable isotopes and with component values equal to the corresponding normalized quantification coefficients, the concatenation being performed after applying a first cost function to the first branch and a second cost function to the second branch.
The values of the two cost functions are combined (with sum or another appropriate operation) to provide a single cost value to be minimized in the training.
In the specific example, the output of both branches is concatenated (Concatenation Output, 1×16 output values) and processed by a specific, customized cost function: a cost function is applied to the classification part, e.g. the cross-entropy loss function, since it is a multi-class and multi-label problem (i.e. several isotopes can be present at the same time). Isotopes with an output greater than 0.5, or another threshold value (a hyper-parameter calibrated during the training), are considered present.
The corresponding values of the regression part are compared with the real values by means of the sum of the square differences (second cost function) or another regression cost function.
The cost functions are calculated at the output from the bifurcation, in the “output layer” block in
The total error is given by the sum of both cost functions. The overall number of parameters is 66084 in the specific illustrated case. For a quick comparison, consider that a purely linear network without hidden layers, with 8 possible isotopes, would consist of 16384 parameters. With only a factor of 4 difference, the architecture of this network allows managing problems of a completely different complexity.
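A sketch of how the two cost functions can be combined into the single value to be minimized; binary cross-entropy stands in for the multi-label classification loss and the sum of square differences for the regression loss, both as named in the text (the function and argument names are illustrative assumptions):

```python
# Combined cost: classification loss plus regression loss, summed.
import torch
import torch.nn as nn

bce = nn.BCELoss(reduction="sum")        # applied to the sigmoid outputs

def total_loss(probs, weights, present, true_weights):
    classification = bce(probs, present)                 # multi-label targets in {0, 1}
    regression = ((weights - true_weights) ** 2).sum()   # sum of square differences
    return classification + regression                   # single training objective
```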
The dataset can consist of spectra with various statistics and numbers of isotopes actually present.
In the case of spectra with two isotopes, each possible combination of the eight possible isotopes of this example (57Co, 60Co, 133Ba, 137Cs, 192Ir, 204Tl, 226Ra, 241Am) is considered. The whole dataset available (19320 spectra) has been divided as follows: 80% for training, 10% for validation, and 10% for verification.
In a specific case, mini-batches of 128 spectra have been created for the training, with a learning rate of 0.001, and the parameters are updated using the Adam optimization algorithm. If the cost function on the validation dataset does not improve for 6 consecutive iterations, the training is stopped to prevent overfitting. On a standard single-core laptop, such a training took about 20 minutes.
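An illustrative training loop under the stated settings; model, train_loader, val_loader and total_loss are assumed to be defined (e.g. as sketched above), and the model is assumed to return the (probabilities, weights) pair:

```python
# Adam with lr 0.001, mini-batches of 128 (prepared in train_loader),
# early stopping after 6 validation checks without improvement.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

best, patience = float("inf"), 0
while patience < 6:
    for spectra, present, true_w in train_loader:      # batches of 128 spectra
        probs, weights = model(spectra)
        loss = total_loss(probs, weights, present, true_w)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.no_grad():                              # validation pass
        val = sum(total_loss(*model(s), p, w).item() for s, p, w in val_loader)
    if val < best:
        best, patience = val, 0
    else:
        patience += 1                                  # stop after 6 stalls
```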
Shown below are the results on a verification dataset, which is not used for the training: each spectrum is “new” for the network. In relation to spectra with a single isotope, the network does not make mistakes and is always able to recognize the isotope regardless of the statistics, as per Table 1 shown below:
[Table 1: identification results on single-isotope spectra; rows and columns both run over the eight isotopes 57Co, 60Co, 133Ba, 137Cs, 192Ir, 204Tl, 226Ra, 241Am; numeric entries omitted.]
An example of a raw output of the network for a spectrum of 57Co with 1000 counts (lowest statistic used) is shown below:
[Raw output: one classification probability and one weight per isotope (57Co, 60Co, 133Ba, 137Cs, 192Ir, 204Tl, 226Ra, 241Am); numeric entries omitted.]
The first line corresponds to the outputs of the classification branch, indicating that the probability is virtually null for each isotope except for 57Co. This allows only the first weight of the second line to be considered, disregarding the others.
As for spectra with two isotopes, instead of showing the values of each prediction, the average and standard deviation (in brackets) have been calculated for the weights of each combination of isotopes, even between different statistics (%), as in Tables 2-4 below.
[Tables 2-4: average weights and standard deviations (in brackets) for each two-isotope combination of 57Co, 60Co, 133Ba, 137Cs, 192Ir, 204Tl, 226Ra, 241Am, at the various statistics; numeric entries omitted.]
As can be seen immediately, the network always and only recognizes the isotopes actually present with considerable precision and reproducibility of the coefficients. An example of raw output for a spectrum with 57Co and 60Co at 1:1 ratio with 2000 counts is:
[Raw output for the 57Co + 60Co spectrum: one classification probability and one weight per isotope; numeric entries omitted.]
The first line corresponds to the outputs of the classification branch indicating that the probability is virtually null for each isotope except for 57Co and 60Co. This allows only the first two weights of the second line to be considered, disregarding the others.
In order to demonstrate the potential of this approach, spectra belonging to different categories were submitted to the network: spectra with only one isotope with 100 counts (an order of magnitude below the minimum value used for the training) and spectra with 3 isotopes at 1:1:1 ratio. In the first case, a dataset with 100 spectra per isotope was constructed. The number of times each isotope was identified is shown in the following Table 5: the elements on the diagonal are the correct predictions (the columns do not add up to 100 because of the possibility of identifying more than one isotope).
[Table 5: number of times each isotope was identified on the 100-count single-isotope spectra; rows and columns run over the eight isotopes; numeric entries omitted.]
From this test, it follows that in 99.88% of cases the algorithm is nevertheless capable of identifying the correct isotope, even though, because of the low statistics, it is identified as the only isotope in just 86.38% of cases. In the remaining 13.5%, the network also identifies other isotopes. In only one case does the network identify no isotope at all, because the probability for each isotope does not exceed the threshold. An example of raw output in the case of a spectrum of 226Ra in which an error is made is shown below:
[Raw output for the misclassified 226Ra spectrum: one classification probability and one weight per isotope; numeric entries omitted.]
In fact, since the probability of 241Am is greater than the threshold, it is considered present.
[Table 6: results for spectra with three isotopes at 1:1:1 ratio, for every three-isotope combination of 57Co, 60Co, 133Ba, 137Cs, 192Ir, 204Tl, 226Ra, 241Am; numeric entries omitted.]
With reference to Table 6 above, surprisingly, the network also behaves well in the case of spectra with 3 isotopes, identifying the correct ones and correctly estimating the weights, regardless of the statistics. However, in many cases it only recognizes 2 out of the 3 isotopes present, the relative fractions of which are nevertheless comparable. Using these spectra in training, clearly better results are obtained.
As already said, mutually connecting the convolutional blocks is a contrivance to improve the training and performance of the network, but it is not strictly necessary. Excellent results can also be obtained without it, although worse than with the version adopting this architecture, as shown in the following Table 7, in which errors are also present on the spectra with a single isotope.
[Table 7: results of the variant without dense connections; rows and columns run over the eight isotopes; numeric entries omitted.]
The need for the classification branch is apparent from the examples previously shown: without filtering the weights with the probability that an isotope is present, the results are unpredictable, with negative weights or weights comparable to those of the isotopes actually present.
The network structure has a few types of hyper-parameters, discussed below.
The dimension of the filters is linked to the spatial extension of the features present in the image (for example, photo-peaks, Compton shoulders) and to the levels of noise present. On the one hand, the perceptive field of the convolutional block must not be too wide, so as to identify details which may prove to be relevant in the subsequent analysis (regression and classification). On the other hand, if the statistical fluctuations are high, the network must not mistake such oscillations for features, and therefore the use of a wide filter attenuates this effect, since a sufficiently extensive portion is examined to observe the overall trend of that region. Since this network has been conceived for use even on low-statistics spectra, the dimension of the filters is relatively large (1×24) as compared to other CNN applications. As for the number of convolutional blocks in the first densely connected convolutional part, it was taken into account that by increasing the number of convolutional blocks, the abstraction capacities of the network increase, and therefore improved performance is obtained. However, networks with too many layers can suffer from the “vanishing gradient problem”, whereby the updating of the weights is slower in the first layers of the network, resulting in increased training times, and the convergence itself of the cost function can be problematic. At the same time, the increase in the number of trainable parameters implies the use of a wider dataset. As for all artificial neural networks, a compromise was made between all these factors, and the optimum number of blocks identified is 4 (as shown below; such a dimension is however to be understood as optional).
The further final block responds to a precise need for optimization. The DC-CNNs allow an improved propagation of the feature maps through the various layers of the network, but this means that the raw input spectrum is also analyzed by the subsequent blocks. Since the spectrum at hand can be very noisy, in order to avoid processing such a noisy spectrum further, an additional convolutional block was advantageously inserted, which processes the output of each previous block. By doing so, the overall amount of data is reduced, while facilitating the analysis by the two subsequent completely connected layers.
In order to compare the performance of the network while varying the number of convolutional blocks, the dataset containing spectra with 100 events (not used for the training) was selected. The reason for such a choice is to highlight not what the network learns, but what it is capable of generalizing. In fact, the trend of the cost function during the training does not exhibit substantial differences when varying the number of convolutional blocks, and therefore practically comparable performance is obtained on the test dataset.
Using two convolutional blocks, the following results are obtained (see Table 8):
[Table 8: identification results with two convolutional blocks; rows and columns run over the eight isotopes; numeric entries omitted.]
Using three convolutional blocks, the following results are obtained (see Table 9):
[Table 9: identification results with three convolutional blocks; rows and columns run over the eight isotopes; numeric entries omitted.]
Using four convolutional blocks, the following results are obtained (see Table 10):
[Table 10: identification results with four convolutional blocks; rows and columns run over the eight isotopes; numeric entries omitted.]
If the number of isotopes to be identified were to be expanded, the only modification to the architecture would consist in increasing the number of neurons in the completely connected layers of classification and regression. Additionally, the corresponding spectra should be added to the dataset, both individually and combined with others. The complexity would increase, as some isotopes might have spectral lines similar to others, and so on. Even though all this is possible, it should be pointed out that it is not strictly necessary to train a network on every possible isotope since, depending on the application of the gamma sensor used, some isotopes would never actually be encountered. It is instead more convenient to train networks aimed at the final application. In the most complex applications, the number of isotopes rarely exceeds 20, a scale well within the reach of the method of the present invention.
A first version of the invention was tested on spectra of four isotopes (57Co, 109Cd, 133Ba, 241Am) measured by a CdZnTe detector with an energy resolution of 3% at 662 keV. The network was trained with spectra with 10² and 10³ events. In the case of only one isotope, the network achieves an accuracy of 100% on spectra not used for training. Such spectra have a statistically insufficient number of events for applying standard algorithms.
A second version was tested on simulated spectra of eight isotopes used in the industrial field (57Co, 60Co, 133Ba, 137Cs, 192Ir, 204Tl, 226Ra, 241Am). The network was trained with spectra of 10³, 10⁴, and 10⁵ events. Also in this case, on spectra with a single isotope, there is 100% accuracy, irrespective of the statistics. Furthermore, high performance (98.5%) is also obtained on spectra with 10² events (not used for training): the network proved able to generalize what it learned (see the sections above). The network was also tested on spectra with several isotopes at 1:1, 3:1, 1:3 and 1:1:1 ratios with different statistics (2·10³, 2·10⁴, 2·10⁵; 4·10³, 4·10⁴, 4·10⁵; and 3·10³, 3·10⁴, 3·10⁵ events, respectively) and for each possible combination. The network detects only the isotopes present and estimates the fraction thereof. The TRL (Technology Readiness Level) is 4 (technology validated in lab).
In the case of the presence of a material between the radioactive source and the detector, the material attenuates the gamma rays, to a greater extent at low energies and to a lesser extent at high energies, distorting the spectrum. This is not a problem for identification, as it is the presence of a determined attribute which identifies the isotope, not the intensity thereof. However, quantification would be more complicated. Even though the accuracy would certainly worsen, by introducing into the dataset the spectra related to the same isotopes under various conditions of attenuation, it would still be possible to estimate the relative fractions.
Thus, the architecture of the present invention would not experience variations.
Furthermore, during a gamma radiation measurement, natural background radiation is always present, to a greater or lesser extent depending on the place (open or closed environment, etc.). Such radiation is weak but, in the case of long measurements, it can give a non-negligible contribution to the measured spectrum. The nature of such radiation is generally known (it is a mixture of naturally occurring radioactive isotopes). Therefore, it is possible to add a further class to the method of the invention (i.e. a further neuron to the completely connected layers), the task of which is to estimate the fraction of the background radiation, which is thus effectively treated as a radioisotope and accounted for in the classification of the isotopes of interest.
There are substantially four fields in which the recognition of isotopes finds application (medical, industrial, environmental, and nuclear) and the list of most commonly used isotopes is defined for each one of these.
As discussed above, it is possible to identify any radioisotope instead of creating an ad hoc network for each category. Furthermore, it is possible to manage different conditions (presence of shielding materials, scattering sources).
Although the present invention was initially developed for low-resolution solid-state gamma-ray detectors (CdTe, CdZnTe), it remains valid for detectors based on different technologies, such as scintillators, the market for which is much broader than that of the former. Having low costs and well-established stability and efficiency, scintillators are the perfect instrument for manufacturing portable devices for the automatic identification of radioisotopes. However, given their limited performance in terms of energy resolution, the main obstacle to their use in this field is the performance of the analysis algorithms. The applicability of the present invention to this type of already-marketed instruments increases its potential interest.
By virtue of the method of the present invention, superior performance is obtained as compared to the current methods applied to noisy spectra (early detection), as well as the unprecedented ability to quantify the relative fraction of each isotope without intermediate steps.
The method of training the expert algorithm applied according to the invention is very quick, even on a normal laptop with a single CPU (~20 minutes), without using cloud computing or GPUs. The method is ideal for portable or hand-held devices, in which energy consumption and computational load must be taken into consideration. The dataset to be used for training can be obtained both from experimental measurements and from simulations (the preferable and most commonly used method, since access to radioactive sources is limited). In the second case, the modeling of the response function of the detection system is a mandatory step and can be considered a disadvantage (one which is also common to other methods). However, the insensitivity of CNNs to slight distortions allows a certain tolerance in the accuracy of such simulations.
According to the present invention, it is not necessary to perform: 1) smoothing of the spectrum, 2) wavelet decomposition, 3) analysis by the neural network of previously extracted features, as in some methods of the prior art. Only one step of parallel recognition and quantification is carried out, starting directly from the measured spectrum as input to the network.
Parallelism of the two analyses is ensured by a directed acyclic graph (DAG).
Preferred embodiments were described and variants of the present invention were suggested; however, it is to be understood that those skilled in the art may make modifications and changes without thereby departing from the scope of protection, as described and claimed herein.
1. Monterial, M., Nelson, K. E., Labov, S. E. & Sangiorgio, S. Benchmarking Algorithm for Radio Nuclide Identification (BARNI) Literature Review (2019). doi:10.2172/1544518.
2. Liang, D. et al. Rapid nuclide identification algorithm based on convolutional neural network. Ann. Nucl. Energy 133, 483-490 (2019).
3. Kamuda, M. & Sullivan, C. J. An automated isotope identification and quantification algorithm for isotope mixtures in low-resolution gamma-ray spectra. Radiat. Phys. Chem. 155, 281-286 (2019).
4. Kamuda, M., Stinnett, J. & Sullivan, C. J. Automated Isotope Identification Algorithm Using Artificial Neural Networks. IEEE Trans. Nucl. Sci. 64, 1858-1864 (2017).
5. Chen, L. & Wei, Y. X. Nuclide identification algorithm based on K-L transform and neural networks. Nucl. Instruments Methods Phys. Res. Sect. A 598, 450-453 (2009).
6. “System and method for resolving gamma-ray spectra”, U.S. Pat. No. 7,711,661 B2, 2010.
7. “System and Method for Making Nuclear Radiation Detection Decisions and/or Radionuclide Identification Classifications”, US20190034786A1, 2017.
8. “Apparatus and method for identifying multi-radioisotope based on plastic scintillator using Artificial Neural Network”, KR102051576B1, 2018.
9. “A kind of gamma-ray spectrum analysis method based on approximation coefficient and deep learning”, CN107229787A, 2017.
10. Kamuda, M. et al. A comparison of machine learning methods for automated gamma-ray spectroscopy. Nucl. Instruments Methods Phys. Res. Sect. A 954, 19 Oct. 2018.