An image processing system for supporting tomographic imaging, a training data generation system, a computer-implemented method for supporting tomographic imaging, a computer-implemented method of generating training data, a computer program element, and a computer readable medium.
Dark-field (“DF”) computed tomography (CT), in short “DF-CT”, is an imaging modality in which, in addition to a conventional attenuation image, two other images are obtained, namely a phase contrast (“PC”) image, which is related to the real part of the refractive index of the imaged object, and the dark-field image, which is related to the strength of ultra-small-angle scattering within the object.
It has been demonstrated in several pre-clinical studies that the dark-field signal contains valuable diagnostic information, in particular about the lung.
Some types of DF-CT scanners include a grating interferometer that is placed into the path of the X-ray beam to achieve the DF imaging capability.
However, artifacts have been observed to occur in reconstructed DF imagery. Whilst existing techniques such as described in Applicant’s WO 2016/207423A1 have succeeded in reducing artifacts, some artifacts still remain.
There may therefore be a need for improving in particular CT-DF imaging.
The object of the present invention is solved by the subject matter of the independent claims where further embodiments are incorporated in the dependent claims. It should be noted that the following described aspect of the invention equally applies to the training data generation system, to the computer-implemented method for supporting tomographic imaging, to the computer-implemented method of generating training data, to the computer program element, and to the computer readable medium.
According to a first aspect of the invention there is provided an image processing system for supporting tomographic imaging, in particular DF or PC tomographic imaging, comprising:
The proposed system allows further reducing image artifacts in reconstructed DF- and/or PC-imagery. The artifacts appear to be caused at least in parts by fluctuating reference data. The referenced data include in particular reference visibility and reference phase. Reference visibility and reference phase are properties of the interferometer. The reference data is needed in reconstruction algorithms to produce sectional images of the imaged object. The manner in which the reference data changes appears to be difficult to model analytically. The rotation of the scanner’s X-ray source during image acquisition, but also ambient factors such as temperature, humidity etc. appear to cause some of the observed changes of the reference data. These changes lead to artifacts in the reconstructed imagery. With the proposed machine learning approach those changes can be better accounted for, thus leading to a reduction or elimination of those artifacts.
More specifically, the proposed image processor acts as a pre-processor to pre-processes the acquired (projection) raw data to obtain new projection imagery which may then be used in the reconstruction instead of the acquired projection raw data. The new projection images obtained in the pre-processing appear to better capture and disentangle the three contrast mechanisms from each other, namely attenuation, phase contrast, and ultra-small-angle scatter and the said disentanglement helps to eliminate reconstruction artifacts caused by the changes in the reference data. Preferably, for any given projection direction, there are three output projection images produced, one for each of the three contrasts mechanisms.
In embodiments, the input projection images at the different phase steps are acquired by the tomographic X-ray imaging apparatus at respective different projection directions associated with the given projection direction.
Whilst the output projection imagery computed by the machine learning component may be useful in their own right for teaching purposes or analysis, in embodiments, the system comprises a reconstructor configured to implement a reconstruction algorithm to reconstruct the output projection imagery in projection domain for a plurality of such given reference projection directions into reconstructed dark-field and/or phase contrast imagery in image domain.
In embodiments, the machine learning component has a neural network structure.
In embodiments, the neural network structure includes at least in parts a convolutional neural network structure.
In embodiments, the network includes at least one layer, the said at least one layer operable based on at least one 2D convolution filter.
In embodiments, the network includes a sequence of hidden layers each operable based on respective one or more convolution filters.
In embodiments, output(s) of the said sequence is/are combined by a combiner layer into the said output projection imagery.
In another aspect there is provided an image arrangement, comprising a system or any one of the above mentioned embodiments, and a tomographic imaging apparatus for acquiring the input projection data.
In another aspect there is provided a training system for training, based on training data, the machine learning component as used in any one of the above described embodiments.
In another aspect there is provided a training data generation system, configured to:
In another aspect there is provided a computer-implemented image processing method for supporting tomographic imaging, in particular DF or PC tomographic imaging, comprising:
In embodiments, the input projection images at the different phase steps are acquired by the tomographic X-ray imaging apparatus at respective different projection directions associated with the given projection direction.
In embodiments, the method comprises reconstructing the output projection imagery in projection domain for plural such given projection directions into reconstructed dark-field and/or phase contrast imagery in image domain.
In another aspect there is provided a computer-implemented training method for training the said machine learning component based on training data for phase-contrast and/or dark-field tomographic imaging.
In another aspect there is provided a computer-implemented method of generating training data, comprising:
In another aspect there is provided a computer program element, which, when being executed by at least one processing unit, is adapted to cause the processing unit to perform the method as per any one of the above mentioned embodiments.
In another aspect still, there is provided a computer readable medium having stored thereon the program element.
“user” relates to a person, such as medical personnel or other, operating the imaging apparatus or overseeing the imaging procedure. In other words, the user is in general not the patient.
“object” is used herein in the general sense to include animate “objects” such as a human patient or animal, or anatomic parts thereof, but also includes inanimate objects such as an item of baggage in security checks or a product in non-destructive testing. However, the proposed system will be discussed herein with main reference to the medical field, so we will be referring to the “object” as “the patient” and the region of interest ROI, being a particular anatomy or group of anatomies of the patient.
By “phase retrieval (algorithm)” is meant any algorithm based on signal models or otherwise that compute a phase signal in combination with a dark-field signal from measured raw data (i.e. intensities). Because of the mutual interplay between phase shift and the dark-field signal which results from small angle scattering, in phase retrieval algorithms both signals, dark-field and phase, are usually computed jointly. Although “phase retrieval” is an established name, it may also be referred to herein as a “dark-field signal retrieval”. The phase retrieval operation may be facilitated by an imaging facilitator structure such as gratings, structured masks, coded aperture plates, crystals etc., or other at least partially radiation blocking structures with periodic or non-periodic sub-structures, that interact with the imaging X-ray beam to realize different measurements to so impose more constraints. This helps resolve ambiguities, or ill-posedness, otherwise inherent in phase retrieval.
In general, the “machine learning component” is a computerized arrangement that implements a machine learning (“ML”) algorithm that is configured to perform a task. In an ML algorithm, task performance improves measurably with training experience. Training experience may include exposing the arrangement to more (new and suitably varied) training data. The task’s performance may be measured by objective tests when feeding the system with test data. The task’s performance may be defined in terms of a certain error rate to be achieved for the given test data. See for example, T. M. Mitchell, “Machine Learning”, page 2, section 1.1, McGraw-Hill, 1997.
“2D”, “3D”, etc is shorthand for two-dimensional and three-dimensional, respectively.
Exemplary embodiments of the invention will now be described with reference to the following drawings, which, unless stated otherwise, are not to scale, wherein:
With reference to
The imaging arrangement IA includes an X-ray imaging apparatus XI that is configured to acquire projection raw data of an object PAT such as a human patient or animal.
The projection raw data acquired by the imaging apparatus XI may be processed by a computerized image processing system IPS to produce sectional imagery of object PAT. In more detail, the image processing system includes a reconstructor RECON and a projection image pre-processor PP (referred to herein as “the pre-processor”). The acquired projection raw data are pre-processed by the pre-processor PP to produce pre-processed projection imagery. The reconstructor RECON processes the pre-processed projection imagery into the section imagery, as will be explored more fully below.
The sectional imagery may be passed through a communication interface CI to be stored in memory DB, such as in a database system, and/or may be visualized by a visualizer (not shown) on a display device DD, or may be otherwise processed.
The imaging apparatus XI (“imager”) envisaged herein is in particular of the X-ray based tomographic type. In this type of imaging, also referred to as rotational imaging, the projection raw data λ are acquired of a ROI of patient PAT or of the patient as a whole, by exposing the patient to an X-ray beam. The projection raw data may then be pre-processed as will be explained more fully below, and then reconstructed by a reconstructor RECON into axial cross-sections - the said sectional images or “slices”. The axial cross-sectional images may reveal information about internal structures of the ROI to allow examination and diagnosis by clinicians in line with clinical goals or objectives to be achieved. Particularly envisaged herein are X-ray based imagers, such as computed tomography (CT) scanners, or C-arm/U-arm imagers, mobile, or fixedly mounted in an operating theatre, etc. The imaging apparatus is configured for tomographic dark-field (“DF”) CT imaging (DF-CT) and/or tomographic phase-contrast (“PC”)-CT imaging (PCI-CT). The DF and/or PC- imagining capacity of the imager XI is facilitated by an imaging facilitator structure IFS that is arranged in the X-ray beam. The pre-processor PP helps reduce artifacts in reconstructed sectional DF and/or PC-imagery. Operation of the pre-processor PP will be explored in more detail further below.
The imager XI includes an X-ray source XS and an X-ray sensitive detector D. The detector includes a plurality of X-ray sensitive (detector) pixels. The imager XI may be configured for energy integrating imaging or for spectral imaging (also referred to as energy discriminating imaging). Accordingly, the detector D may be of the energy integrating-type, or of the energy discriminating type, such as a photon-counting detector.
During image acquisition, patient PAT resides in an examination region ER between the source XS and detector D. In embodiments, the source X-ray moves in an imaging orbit or scan path in a rotation plane around an imaging axis. Helical scan paths are also envisaged. The rotation may be achieved by having imager XI include a stationary gantry FG and a rotating gantry MG. The rotating gantry MG is rotatably supported by the stationary gantry FG. The rotating gantry MG rotates around the examination region ER and at least a portion of subject PAT therein, and about an imaging axis. The radiation source XS, such as an X-ray tube, is supported by and rotates with the rotating gantry MG around the examination region ER. The imaging axis passes through the ROI. Preferably, the patient’s longitudinal axis is aligned with the imaging axis, but other arrangements and geometries are also envisaged. For example, in some embodiments, the patient is standing, with the X-ray source rotating around the patient’s (now upright) longitudinal axis.
During the rotation, the source XS emanates the X-ray beam, and irradiates the ROI. During the rotation, the projection raw data are acquired at the detector D from different projection direction pi. The X-ray beam passes along the different directions through the patient PAT, particularly through the ROI. The X-ray beam interacts with matter in the region of interest. The interaction causes the beam to be modified. Modified radiation emerges at the far end of the patient and then impinges on the X-ray sensitive detector D. Circuitry in the detector converts the impinging modified radiation into electrical signals. The electrical signals may then be amplified or otherwise conditioned and are then digitized to obtain (digital) projection raw data λ. The projection raw data λ may then be pre-processed by the pre-processor PP, and is then reconstructed into the axial sectional DF or PC-imagery by a reconstructor RECON.
The re-constructor RECON is a computer implemented module that runs a reconstruction algorithm, such as FBP (filtered back-projection), Fourier-domain based reconstruction algorithms, algebraic (ART) reconstruction algorithm, or iterative reconstruction algorithms. The reconstruction algorithm is adapted to the imaging geometry used. In embodiments, a cone-beam reconstruction algorithm is used. In embodiments, the reconstruction algorithm is adapted for helical scan paths. The reconstruction algorithm is adapted for DF-CT and/or PCI-CT imaging, as will be explored in more detail below.
The reconstructor RECON module may be arranged in hardware or software or both. The reconstructor RECON transforms the projection raw data λ acquired in the projection domain of the detector D into the axial sectional imagery in image domain. Image domain includes the portion of space in the examination region where the patient resides during imaging. In contrast, the projection domain is located in an X-ray radiation sensitive surface or layer of the X-ray detector D. In the image domain, the reconstructed imagery may be defined in cross sectional planes parallel to the rotation plane(s) of the orbit and perpendicular to the imaging axis. Different axial images in different cross sectional planes can be acquired, that together form a 3D image volume, i.e., a 3D image representation of the ROI. The 3D volume may be acquired by advancing the support table TB on which patient PAT resides during imaging, such as in a helical scan path. Alternatively, or in addition, it is the stationary gantry FG that is translated. The relative translational motion of patient PAT versus source XS along the imaging axis, and the rotation of source XS around said imaging axis give rise to the helical scan path at a certain pitch. The pitch may be fixed or is user adjustable. In non-helical scans, the scan path in general subtends an arc of at least, or substantially equal to, 180° (plus fan angle).
The projection raw data λ as acquired in the scan comprises a number of different projection raw images, or “frames”, as shown in
Generally, when X-radiation interacts with material, it experiences both attenuation and refraction, The refraction results in a phase change. The attenuation on the other hand can be broken down into attenuation that stems from photo-electric absorption and attenuation that comes from scatter. The scatter contribution in turn can be decomposed into Compton scattering and Rayleigh scattering. For present purposes of dark-field imaging it is the small angle scattering that is of interest, where “small angle” means that the scatter angle is so small that the scattered photon still reaches the same detector pixel as it would have reached without being scattered.
As such, the original projection raw data λ records a combination of contributions of the above-mentioned contrast mechanism of attenuation, refraction, and small-angle scattering. The pre-processing is configured to isolate the two contrast mechanisms of main interest herein, the phase-contrast and the dark-field contribution, from the combined contribution as recorded in the projection raw data λ.
In the proposed system IPS, it is not the measured raw data λ themselves that are used in the reconstruction, but instead suitably prep-processed projection imagery λ′, produced by the projection image pre-processor PP from the measured raw data λ. After providing more details on the imaging procedure, operation of the pre-processor PP will be explained in more detail further below.
Turning now first in more detail to the above mentioned imaging facilitator structure IFS, this is arranged in the X-ray beam to facilitate phase/DF retrieval by the reconstructor RECON. More specifically, the imaging facilitator structure IFS ensures that refraction and small-angle scattering have an influence on the detected raw data so that the subsequent reconstructor RECON can create a PC and/or DF image. In general, the image facilitator structure IFS allows acquiring multiple measurements for a given projection direction pi so as to be able to resolve the intensity measurements into phase and/or dark-field information. The process of acquiring those multiple measurements is sometimes referred to as “phase stepping”. Again, we shall retain this terminology herein for historical reasons, with the understanding that herein the phase stepping is an operation to further in addition or instead, DF imaging and not (only) phase contrast imaging.
In some embodiments, but not all embodiments, the imaging facilitator structure IFS is an arrangement of gratings. Specifically, and in some such embodiments, the imaging facilitator structure IFS is arranged as an interferometer, in the form of one, two, or three grating structures as shown in
The phase stepping operation may be active or passive, or both. Active embodiments include mechanisms, such as control circuitry and hardware of the imager XI, operable to induce phase stepping for each projection direction pi, for instance by scanning one of the gratings past the X-ray beam, or by changing source XS’s focal spot position, etc. Passive phase stepping embodiments use the CT scanning rotation of the source XS itself in combination with grating movement induced by vibrations caused by the rotation. Passive phase stepping does not require additional control circuity and/or hardware.
In the example of 2B, it is earlier and later projection images that together with the current projection image form the phase stepping group. The phase stepping group in the embodiment of
Λi ={..., λj, λi, λl, ...},j<i<l. Each projection direction covered by the rotation gives rise to a different phase stepping group. A plurality of such phase stepping groups can thus be defined, and any given projection direction covered in the scan is associable with “its own”, respective phase stepping group.
In the reconstruction algorithm, for each given projection direction, the projection images from the respective phase stepping group are pre-processed together to form the pre-processed projection data which are then used to reconstruct the phase-contrast and/or the dark-field sectional image. The reference line RL in
One type of reconstruction algorithm as used herein is iterative, although non-iterative, analytic, reconstruction algorithms are also envisaged herein, such as FBP and others. However, turning now first to the iterative reconstruction algorithms, these include in particular a forward projector. The forward projector describes how the three contrast mechanisms of interest herein interact together and map from image domain into projection domain.
In iterative CT image reconstruction, the image domain is populated, in each iteration cycle, with tentative image values placed in a grid of voxel positions. The forward projection produces synthesized projections based on the said values currently placed at the voxel grids. Unlike in the proposed system, in previous reconstruction approaches the so synthesized projections are compared with the actually acquired projection raw data λ. There is usually a mismatch which is quantified by a cost function. The values in image domain are updated iteratively by an up-dater, so as to improve the cost function and hence reduce the residue between the measured projections, and the synthesized projections as per the forward model:
wherein:
As indicated above at (1b), the reconstruction algorithm uses reference data I0, ϕ0, V0. Reference data is also used in non-iterative reconstruction schemes such as FBP. The reference data relates to measurements obtained in a calibration operation, an “air scan”, and includes in particular reference visibility and reference phase ϕ0 of the fringe pattern recordable without there being an object OB in the examination region (1° refers to the reference attenuation and is not considered herein further). The reference data describe a reference fringe pattern recordable at the detector in the air scan. When an object is introduced into a beam during imaging, a new fringe pattern is recordable which may be understood as a disturbed, deviated, version of the reference pattern. It is this deviation that is harvested by the reconstruction algorithms to reconstruct the sought after imagery PC and/or DF imagery δ, ε.
Whilst the system vibrations are usefully leveraged herein in some embodiments for phase stepping as described above, the said vibrations have also undesirable effects. More specifically, in traditional dark-field and phase-contrast reconstruction, it is usually assumed that the reference phase and reference visibility as included in the reference data is constant from “view to view”, that is, from projection direction pi to another projection direction pj. This assumption has proved to be wrong. The reference data may indeed change with projection direction. If this change is unaccounted for in the modelling (1a), this may lead to image artifacts. In one approach as described in applicant’s WO 2016/207423 an attempt was made to include the reference phase as an additional fitting variable. Other approaches have attempted to model the phase reference fluctuation by more complex functions, such as polynomials etc.
It appears that the fluctuation of the phase reference, and also of the reference visibility, is the result of the rotation of the X-ray source which induces vibrations. Ambient factors, such as temperature changes, changes in humidity etc. may also affect the gratings of the interferometer IFS which are delicate structures.
The upshot of all this is that it appears difficult to model the changes of reference phase and/or visibility analytically.
It is therefore proposed herein to eliminate image artifacts caused by an unknown variation of the reference phase and reference visibility. This is achieved by the proposed projection image pre-processor PP. The projection image pre-processor PP receives as input the acquired projection raw data λ for a given phase stepping group, and processes these into preferably three pre-processed projection images λµ, λδ, λε, one for each contrast mechanism. Each of the pre-processed projection images now records information of the respective contrast mechanism, in isolation from the other two, or at least with negligible contribution from the other two contrast mechanisms. The pre-processed projection images λµ, λδ, λε, represent approximations where the disturbing effects of phase reference and visibility fluctuations have been accounted for. The information on the fluctuations that had earlier evaded analytic representation is now encoded in the image information as per the pre-processed projection imagery λµ, λδ, λε. Because of the disentanglement of the three contrast mechanisms into the three respective pre-processed projection images λµ, λδ, λε, the artifacts caused by the changes in the reference data can be reduced or even entirely eliminated.
More particularly, rather than attempting to map out the reference data fluctuation by classical analytical methods, a machine learning approach is proposed herein. More particularly still, the projection pre-processor PP includes a machine learning component MLC, pre-trained based on training data. The projection image pre-processor PP uses its machine learning component MLC to transform the acquired projection raw data λ per phase stepping group into the pre-processed projection imagery λµ, λδ, λε. The reconstructor RECON may then use the so transformed pre-processed projection imagery (λµ, λδ, λε) =λ′, instead of the originally measured projection raw data λ, to reconstruct artifact free, or at least artifact reduced, sectional images. The phase retrieval in the proposed image processing IPS, can thus be seen herein to arise in the pre-processing of the raw data λ into the new projection images λµ, λδ, λε.
The computed improved projections (λµ, λδ, λε) are now better approximations for the line integrals ∫Lij µ dl, ∂x ∫Lij δ dl, and ∫Lij ε dl, respectively, with Lij indicating the respective line between source focal spot for projection position pi and detector pixel j. The processed projections (λµ, λδ, λε) allow definition of simplified and dis-entangled forward projections:
The pre-processed projections (λµ, λδ, λε) may be used in separate iterative 1-channel reconstruction algorithms, for one of λδ, λε. Alternatively, even classical-analytical reconstructions such as FBP can benefit where λδ or λε is used. “Channel” as used herein refers to either one of the contrast mechanism attenuation, DF and PC. If an algorithm or model accounts for all three of attenuation, DF and PC, it is said to have three channels, or, accordingly, two channels or one channel, if only both of DF and PC is computed, or only DF or PC is computed, respectively.
The pre-processed attenuation projections λµ, whilst of less interest herein, may still be used with benefit herein as a third (or second) channel when reconstructing for DF and/or PC sectional imagery. This is because jointly running the reconstruction for the attenuation channel based on λµ alongside the other one, or preferably two, projections λδ, λε may make the reconstruction more stable. The reconstruction may converge quicker to realistic solutions. The attenuation projections λµ, or more specifically, the attenuation image µ act as a source of regularization.
The machine learning model component MLC is based on a machine learning model M. In embodiments, a convolutional neural network (“CNN”) is used for this. The model M may be trained by a computerized training system TS. In the training, the training system TS adapts an initial set of (model) parameters θ of the model M. The training data may be generated by a training data generator system TDG. Two processing phases may thus be defined in relation to the machine learning model M: a training phase and a deployment phase. In training phase, prior to deployment phase, the model is trained by adapting its parameters. Once trained, the model may be used in deployment phase to transform projection raw data (that are not from the training data) into the disentangled pre-processed projection imagery for any given patient PAT during clinical use. The training may be a one off operation, or may repeated with new training data. The above mentioned components will now be described in more detail.
Turning now first to
Referring now in more detail to
In deployment, input raw data such as a phase stepping group comprising five projection images λ1-λ5 is received at the input layer IL. The input raw data λ is fed into model M at input layer IL, then propagates through a sequence of hidden layers L1-L3 (only three are shown, but there may be one or two, or more than three), to then emerge at an output layer OL. The output includes the three pre-processed projection images, each capturing only the respective contributions for attenuation, differential phase-contrast, and dark-field signals, indicated in
The network M is preferably arranged as a multi-channel system where the input raw data λ is arranged in multiple channels one for each projection direction pi, in this case five, of the phase stepping group. The layers of the network, and indeed the input and output imagery, and the input and output between hidden layers (referred to herein as feature maps), can be represented as two or higher dimensional matrices (“tensors”) for computational and memory allocation efficiency. For the input raw data, dimensions X, Y correspond to pixel information, whilst the channel depth C corresponds to the size of the phase stepping groups (in this example five, but this is exemplary, and not limiting herein in any way). The number of projection directions per phase stepping group could be less than 5 such as three although having at least 3 or more projection images per group is preferred. The input imagery λ1-5 forms a 3D-dimensional matrix, with depth N equal to the size C of the phase stepping group. The matrix size X × Y × C of the input layer IL equals that of the input phase stepping group of raw data λ1-5.
Preferably, the hidden layers include a sequence of convolutional layers, represented herein as layers L1 - LN-k, k>1. The number of convolutional layers is at least one, such as 3 or 5 or any other number. The number may run into double-digit figures.
In embodiments, downstream of the sequence of convolutional layers there may be one or more fully connected layers but this is not necessarily the case in all embodiments and indeed preferably no fully connected layer is used in the architecture envisaged herein in embodiments.
Each hidden Lm layer and the input layer IL implements one or more convolutional operators CV. Each layer Lm may implement the same number of convolution operators CV or the number may differ for some or all layers.
A convolutional operator CV implements a convolutional operation to be performed on its respective input. The convolutional operator may be conceptualized as a convolutional kernel. It may be implemented as a matrix including entries that form filter elements referred to herein as weights θ. It is in particular these weights that are adjusted in the learning phase. The first layer IL processes, by way of its one or more convolutional operators, the input raw data λ1-λ5 in each channel separately to produce respective feature maps for each convolutional operator per channel. Feature maps are the outputs of convolutional layers, one feature map for each convolutional operator in a layer and for a given channel. The feature map of an earlier layer is then input into the next layer to produce feature maps of a higher generation, and so forth until the last layer OL combines all feature maps into the three pre-processed images λµ λδ and λε in the correct dimension. Alternatively, it may be possible to configure the network M to produce only a 2-channel output λδ, λε or indeed a 1-channel output λδ or λε, depending on the way the pre-processed data are to be used in the subsequent reconstruction via reconstructor RECON.
The convolutional operators in the hidden layers are preferably two dimensional so no cross-channel convolution is used in between the hidden layers. This helps better isolating and separating the two (DF and PC) or all three contrast mechanisms. However, as said, the last layer OL operates as a dimension reducer to aggregate all feature maps produced in the hidden layers into the correct dimensions. This is achieved in an embodiment by linearly combining across the channels of the respective feature maps. The 2D convolutions in the hidden layers with linear combination across feature maps at output layer IL allows more efficient processing and still rich enough modelling, as opposed to 3D convolutions in hidden layers. Whilst 3D convolutions are not excluded herein, they would incur higher computational overhead.
A convolutional operator in a convolutional layer is distinguished from a fully connected layer in that an entry in the output feature map of a convolutional layer is not a combination of all nodes received as input of that layer. In other words, the convolutional kernel is only applied to sub-sets of the input raw data λ, or to the feature map as received from an earlier convolutional layer. The sub-sets are different for each entry in the output feature map. The operation of the convolution operator can thus be conceptualized as a “sliding” over the input, akin to a discrete filter kernel in a classical convolution operation known from classical signal processing. Hence the naming “convolutional layer”. In a fully connected layer an output node is general obtained by processing all nodes of the input layer.
The stride of the convolutional operators can be chosen as one or greater than one. The stride defines how the sub-sets are chosen. A stride greater than one reduces the dimension of the feature map relative to the dimension of the input in that layer. A stride of one is preferred herein. In order to maintain the dimensioning of the feature maps to correspond to the dimension of the input imagery, a zero padding layer P is applied. This allows convolving even feature map entries situated at the edge of the processed feature map.
Some or each convolutional filter in some or each layer Lm may be combined with a nonlinearity inducing layer, such as a RELU (rectified linear unit) operator as shown in
Other layers than the ones shown can be combined with the convolutional layers in any combination, including max-pooling layers, drop-out layers and others, or such other layers may be used instead.
The totality of the weights for all convolutional filter kernels of the model M define a configuration of the machine learning model. It is these weights that are learned in a training phase. Once the training phase has concluded, the fully learned weights, together with the architecture in which the nodes are arranged, can be stored in memory MEM and can be used for deployment. In deployment phase, newly acquired projection raw data, not forming part of the training set, can then be fed into the input layer so as to obtain an estimate for the three pre-processed projections (λµ, λδ, λε) at output lay OL.
Functionally, operation of the machine learning model M as used herein is a regression, and models other than CNN, indeed other than NN altogether, may also be used instead herein, such statistical regression models that can account for training data.
As mentioned earlier, depending on the forward model used in the reconstruction, in embodiments the ML model M may be configured to output two projections images for the DF and PC contribution, without the attenuation projection image. Single channel outputs with a projection output for either the DF or the PC are also envisaged in alternate embodiments.
The training aspects of the machine learning model M will be explained further below at
Turning now first to
At step S410, projection images are obtained by a tomographic X-ray system configured for phase-contrast and/or dark-field imaging. Preferably, but not necessarily, in the proposed system, the rotation of the X-ray source is used as a passive phase stepping mechanism.
For some or each given projection direction, a group of projection raw data is associated to form the above described phase stepping group. For each given projection direction, its phase stepping group of projection raw data is processed at step S420 by a pre-trained machine learning algorithm to produce in embodiments three estimated pre-processed projections image: one for phase-contrast, one for the dark-field, and one for attenuation. In the three output pre-processed projection images, the effects of fluctuations in reference visibility and/or reference phase have been eliminated. In addition, information encoded into the three projection images so produced is resolved and separated into phase-contrast, attenuation and/or dark-field contrast, respectively. In embodiments not all three pre-processed projection images are produced per any given direction, but only one or two, namely for the phase-contrast and/or the dark-field signal, as required.
At step S430 the pre-processed projection imagery (λµ, λδ, λε) is then reconstructed by a single-channel, a multi-channel or an FBP reconstruction algorithm into phase-contrast δ and/or dark-field ε sectional images in image domain, which may then be made available for display, storage or other processing.
Providing now further details on the training aspects of the machine learning model, reference is first made to
Due to imperfections of, for example IBSIR and its inability to account for changes in the reference data, the reconstructed DF and PC images will suffer from artifacts. However, since the internal structure of the phantom SB is known, these artifacts can be eliminated using image processing (such as structure filtering) to obtain artifact reduced, “clean”, reconstructed DF and/or PC imagery. For instance, a CAD (computer aided design) model of the phantom SB can be rigidly registered to the reconstructed images, respectively, and the reconstructed image values can then be replaced by ground truth values. Alternatively, within each of the homogeneous objects in the phantom SB, a mean value is computed and the image values in each homogeneous object is set to the so calculated mean value. The cleaned reconstructed DF and/or PC images (and the attenuation imagery, if any) can then be respectively projected forward to generate ground-truth target pre-processed projections which, together with the actually measured raw data, form pairs of training data for training a machine learning model, such as the CNN in
The pairs for training image data may then be formed as
with Λi the phase stepping group for the phantom scan raw data and for a given direction pi as defined above in
The sample body SB or phantom may be formed in any shape such as a cylinder. Other shapes are also envisaged. In embodiments a diameter of the cylindrical phantom corresponds to a size of the examination region. In embodiments, the diameter of cylindrical phantom SB is substantially flush with the inner lumen of the examination region. Preferably, the phantom SB comprises inclusions in any shape or size (such as ellipsoids or others) of several distinct materials such as material equivalent to water, fat, muscle, bone and a material of spongy structure to mimic diffusive properties of lung tissue. In embodiments, lumps of real animal lung tissue may be used, such as from pigs, suitably fixed. Other combinations of materials may be chosen instead. Preferably, however, the materials are so chosen that each creates a distinct, i.e. different from the other materials, set of signals for the three contrast mechanisms.
Reference is now made to
The training input data xk may be obtained from historical image data acquired in the lab or previously in the clinic and held in image repositories. The targets yk or “ground truth” may represent examples of the above mentioned results of pre-processed projections, including the one, two or preferably three pre-processed projections from which artifact free images are constructible, substantially free of artifact caused by reference data variation during rotation. In embodiments, the training data is obtained as described at
with i the respective projection direction, the phase stepping group Λ defined for the projection raw data collected in the phantom scan. The index k for the pairs in general corresponds to projection direction i.
In the training phase, an architecture of a machine learning model M, such as the shown CNN network in
Assuming for now the paradigm of a cost function F, this measures the aggregated residue(s), that is, the error incurred between data estimated by the model M and the targets as per some or all of the training data pairs k:
In training, the training input data xk of a training pair is propagated through the initialized network M. Specifically, the training input xkfor a k-th pair is received at an input IL, passed through the model and is then output at output OL as output training data Mθ(x). A suitable measure || . || is used such as a p-norm, squared differences, or other, to measure the difference between the actual training output Mθ(xk) produced by the model M, and the desired target yk.
The output training data M(xk) is an estimate for target yk associated with the applied input training image data xk. In general, there is an error between this output M(xk) and the associated target yk of the presently considered k-th pair. An optimization scheme such as backward/forward propagation or other gradient based methods may then be used to adapt the parameters θ of the model Mθ so as to decrease the residue for the considered pair (xk, yk) or a subset of training pairs from the full training data set.
After one or more iterations in a first, inner, loop in which the parameters θ of the model are updated by updater UP for the current pair (xk,yk), the training system TS enters a second, an outer, loop where a next training data pair xk+1, yk+1 is processed accordingly. The structure of updater UP depends on the optimization scheme used. For example, the inner loop as administered by updater UP may be implemented by one or more forward and backward passes in a forward/backpropagation algorithm. While adapting the parameters, the aggregated, summed, residues of all the training pairs are considered up to the current pair, to improve the objective function. The aggregated residue can be formed by configuring the objective function F as a squared sum (or other algebraic combination) such as in eq. (3) of some or all considered residues for each pair.
Optionally, one or more batch normalization operators (“BN”, not shown) may be used. The batch normalization operators may be integrated into the model M, for example coupled to one or more of the convolutional operator CV in a layer. BN operators allow mitigating vanishing gradient effects, the gradual reduction of gradient magnitude in the repeated forward and backward passes experienced during gradient-based learning algorithms in the learning phase of the model M.
The generalized training system as shown in
Referring now to
At step S610 projection images are acquired by a tomographic X-ray apparatus XI configured for phase-contrast and/or dark-field imaging of a sample body resident in the examination region. In embodiments, the phantom SB is preferably as described above. The so acquired projection images form a first set of projection images.
At step S620, sample DF and/or PC imagery reconstructable from this first set of projection images can be image-processed based on prior knowledge of material type and geometry of the internal structures of the known phantom SB. Based on the image-processed DG and/or PC reconstructions, a second set of projection images is derived. The first set of projection images and the second set of projection images in association are then provided as training data for training a machine learning model.
In embodiments the step S620 for image processing the first set of projection images may be implemented as follows. In a sub-step of step S620, a phase-contrast and/or a dark-field reconstruction algorithm, such as an iterative reconstruction algorithm IBSIR, is used to reconstruct one or more dark-field and/or phase-contrast section images in image domain. These images will include artifacts due to the described fluctuations of reference phase and reference visibility. However, because the true structure of the phantom is known, image processing can be used to eliminate the artifacts the sectional images in image domain. For instance, a CAD model of the phantom SB can be rigidly registered to the reconstructed image and the reconstructed image values can then be replaced by ground truth values. Alternatively, within each of the homogeneous objects in the phantom SB, a mean value is computed and the image values in each homogeneous object is set to the calculated mean value. The so image-processed sectional DF and/or PC image, with artifacts eliminated, is then forward projected in step S630 to obtain the second set of, now disentangled, projection images which are now free of information that may otherwise cause artifacts in image domain due to the fluctuations of reference visibility and reference phase. The disentangled projection images for the DF and PC channel may then be paired up based on projection direction with an associated phase stepping group defined in the first set of projections to obtain pairs of training data.
Referring now to
At step S710 training data is received in the form of pairs (xk,yk). The pair may be generated as described above at
At step S720, the projection imagery of a given phase stepping group is applied to an initialized machine learning model M to produce a training output.
A deviation of the training output M(xk) from the associated target yk is quantified by a cost function F. One or more parameters of the model are adapted at step S730 in one or more iterations in an inner loop to improve the cost function. For instance, the model parameters are adapted to decrease residues as measured by the cost function.
The training method then returns in an outer loop to step S710 where the next pair of training data is fed. In step S720, the parameters of the model are adapted so that he aggregated residues of all pairs considered are decreased, in particular minimized. Forward- backward propagation or similar gradient-based techniques may be used in the inner loop.
More generally, the parameters of the model M are adjusted to improve an objective function F which is either a cost function or a utility function. In embodiments, the cost function is configured to the measure the aggregated residues. In embodiments the aggregation of residues is implemented by summation over all or some residues for all pairs considered. The method may be implemented on one or more general-purpose processing units TS, preferably having processors capable for parallel processing to speed up the training.
The components of the training system TS, the training data generator TDG or the image processing system IPS including the projection image processor PP may be implemented as one or more software modules, run on one or more general-purpose processing units PU such as a workstation associated with the imager XI, or on a server computer associated with a group of imagers.
Alternatively, some or all components of the training system TS, the training data generator TDG or the image processing system IPS including the projection image processor PP may be arranged in hardware such as a suitably programmed microcontroller or microprocessor, such an FPGA (field-programmable-gate-array) or as a hardwired IC chip, an application specific integrated circuitry (ASIC), integrated into the imaging system XI. In a further embodiment still, the training system TS, the training data generator TDG or the image processing system IPS including the projection image processor PP may be implemented in both, partly in software and partly in hardware.
The different components of the training system TS, the training data generator TDG or the image processing system IPS including the projection image processor PP may be implemented on a single data processing unit PU. Alternatively, some or more components are implemented on different processing units PU, possibly remotely arranged in a distributed architecture and connectable in a suitable communication network such as in a cloud setting or client-server setup, etc.
One or more features described herein can be configured or implemented as or with circuitry encoded within a computer-readable medium, and/or combinations thereof. Circuitry may include discrete and/or integrated circuitry, a system-on-a-chip (SOC), and combinations thereof, a machine, a computer system, a processor and memory, a computer program.
In another exemplary embodiment of the present invention, a computer program or a computer program element is provided that is characterized by being adapted to execute the method steps of the method according to one of the preceding embodiments, on an appropriate system.
The computer program element might therefore be stored on a computer unit, which might also be part of an embodiment of the present invention. This computing unit may be adapted to perform or induce a performing of the steps of the method described above. Moreover, it may be adapted to operate the components of the above-described apparatus. The computing unit can be adapted to operate automatically and/or to execute the orders of a user. A computer program may be loaded into a working memory of a data processor. The data processor may thus be equipped to carry out the method of the invention.
This exemplary embodiment of the invention covers both, a computer program that right from the beginning uses the invention and a computer program that by means of an up-date turns an existing program into a program that uses the invention.
Further on, the computer program element might be able to provide all necessary steps to fulfill the procedure of an exemplary embodiment of the method as described above.
According to a further exemplary embodiment of the present invention, a computer readable medium, such as a CD-ROM, is presented wherein the computer readable medium has a computer program element stored on it which computer program element is described by the preceding section.
A computer program may be stored and/or distributed on a suitable medium (in particular, but not necessarily, a non-transitory medium), such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the internet or other wired or wireless telecommunication systems.
However, the computer program may also be presented over a network like the World Wide Web and can be downloaded into the working memory of a data processor from such a network. According to a further exemplary embodiment of the present invention, a medium for making a computer program element available for downloading is provided, which computer program element is arranged to perform a method according to one of the previously described embodiments of the invention.
It has to be noted that embodiments of the invention are described with reference to different subject matters. In particular, some embodiments are described with reference to method type claims whereas other embodiments are described with reference to the device type claims. However, a person skilled in the art will gather from the above and the following description that, unless otherwise notified, in addition to any combination of features belonging to one type of subject matter also any combination between features relating to different subject matters is considered to be disclosed with this application. However, all features can be combined providing synergetic effects that are more than the simple summation of the features.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. The invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing a claimed invention, from a study of the drawings, the disclosure, and the dependent claims.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfill the functions of several items re-cited in the claims. The mere fact that certain measures are re-cited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Any reference signs in the claims, be they numerals, alphanumerical, or a combination of one or more letters, or a combination of any of the foregoing, should not be construed as limiting the scope.
Number | Date | Country | Kind |
---|---|---|---|
20185827.1 | Jul 2020 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/068450 | 7/5/2021 | WO |