The present invention relates to two dimensional (2D) to three dimensional (3D) medical image registration, and more particularly, to deep learning based 2D/3D medical image registration.
Two dimensional (2D) to three dimensional (3D) image registration is an important technology in medical imaging and image-guided interventions. 2D/3D image registration can be used to bring pre-operative 3D medical image data and intra-operative 2D medical image data into the same coordinate system to facilitate accurate diagnosis and/or provide advanced image guidance. For example, the pre-operative 3D medical image data generally includes computed tomography (CT), cone-beam CT (CBCT), magnetic resonance imaging (MRI), and/or computer aided design (CAD) models of medical devices, while the intra-operative 2D medical image data typically includes X-ray images.
2D/3D image registration is typically achieved using intensity-based methods. In such intensity-based 2D/3D image registration methods, in order to register a 3D X-ray attenuation map provided by CT or CBCT (or converted from another imaging modality), a simulated X-ray image, referred to as a digitally reconstructed radiograph (DRR), is derived from the 3D attenuation map by simulating the attenuation of virtual X-rays. An optimizer is then employed to maximize an intensity-based similarity measure between the DRR and X-ray images. Intensity-based methods are able to achieve high registration accuracy, but suffer drawbacks including long computation time and small capture range. Because intensity-based methods involve a large number of evaluations of the similarity measure, each requiring heavy computation in rendering the DRR, such methods typically result in running times greater than one second, and therefore are not suitable for real-time applications. In addition, because the similarity measures to be optimized in intensity-based methods are often highly non-convex, the optimizer has a high chance of getting trapped in a local maximum, which leads to such methods having a small capture range in which high registration accuracy can be achieved.
The present invention provides a method and system for 2D/3D medical image registration using deep learning-based regression. Embodiments of the present invention achieve real-time 2D/3D medical image registration with a large capture range and high accuracy. Embodiments of the present invention train convolutional neural network (CNN) regressors to determine a mapping from a 2D medical image and a digitally reconstructed radiograph (DRR) generated from a 3D medical image to the difference of their underlying transformation parameters. Embodiments of the present invention utilize a local image residual (LIR) feature to simplify the underlying mapping to be captured by the CNN regressors. Embodiments of the present invention utilize parameter space partitioning (PSP) to partition the transformation parameter space into zones and train CNN regressors in each zone separately. Embodiments of the present invention utilize hierarchical parameter regression (HPR) to decompose the transformation parameters and regress them in a hierarchical manner.
In one embodiment of the present invention, a parameter space zone is determined based on transformation parameters corresponding to a digitally reconstructed radiograph (DRR) generated from the 3D medical image. Local image residual (LIR) features are calculated from local patches of the DRR and the X-ray image based on a set of 3D points in the 3D medical image extracted for the determined parameter space zone. Updated transformation parameters are calculated based on the LIR features using a hierarchical series of regressors trained for the determined parameter space zone. The hierarchical series of regressors includes a plurality of regressors each of which calculates updates for a respective subset of the transformation parameters.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The present invention relates to a method and system for 2D/3D medical image registration using deep learning-based regression. Embodiments of the present invention are described herein to give a visual understanding of the 2D/3D medical image registration method. A digital image is often composed of digital representations of one or more objects (or shapes). The digital representation of an object is often described herein in terms of identifying and manipulating the objects. Such manipulations are virtual manipulations accomplished in the memory or other circuitry/hardware of a computer system. Accordingly, it is to be understood that embodiments of the present invention may be performed within a computer system using data stored within the computer system.
Embodiments of the present invention are described herein as registering a 3D X-ray attenuation map provided by a CT or CBCT with a 2D X-ray image in real-time. Depending on the application, other 3D modalities, such as MRI or a CAD model, can be converted to a 3D X-ray attenuation map before performing the 2D/3D registration.
Assuming that the X-ray imaging system corrects the beam divergence and the X-ray sensor has a logarithmic static response, X-ray image generation can be described by the following model:
I(p)=∫μ(L(p,r))dr, (1)
where I(p) is the intensity of the X-ray image at point p, L(p, r) is the ray from the X-ray source to point p, parameterized by r, and μ(·) is the X-ray attenuation coefficient. Denoting the X-ray attenuation map of an object to be imaged as J: ℝ3→ℝ, and the 3D transformation from the object coordinate system to the X-ray imaging coordinate system as T: ℝ3→ℝ3, the attenuation coefficient at point x in the X-ray imaging coordinate system is:
μ(x)=J(T−1∘x). (2)
Combining Equation (1) and Equation (2), we have:
I(p)=∫J(T−1∘L(p,r))dr. (3)
In 2D/3D registration problems, L is determined by the X-ray imaging system, J is provided by the 3D data (e.g., CT intensity), and the transformation T is to be estimated based on the input X-ray image I. Note that given J, L, and T, a synthetic X-ray image I(·) can be computed following Equation (3) using a well-known Ray-Casting algorithm, and the generated synthetic image is referred to as a digitally reconstructed radiograph (DRR).
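The following sketch (not part of the original disclosure) illustrates how a DRR can be rendered from a 3D attenuation map by numerically approximating the line integral in Equation (3). The volume layout, the source/detector geometry variables, and the nearest-neighbor sampling are illustrative assumptions rather than the specific Ray-Casting implementation contemplated by the embodiments.

```python
import numpy as np

def render_drr(volume, spacing, T, source, detector_origin, du, dv,
               det_shape=(64, 64), n_samples=256):
    """Approximate Equation (3): I(p) = integral of J(T^-1 o L(p, r)) dr.

    volume          -- 3D attenuation map J as a numpy array in (z, y, x) order
    spacing         -- voxel size in mm, given in (z, y, x) order
    T               -- 4x4 rigid transform from object to imaging coordinates
    source          -- X-ray source position in imaging coordinates (numpy, mm)
    detector_origin -- position of detector pixel (0, 0) in imaging coordinates
    du, dv          -- detector row/column direction vectors scaled by pixel size
    """
    T_inv = np.linalg.inv(T)
    drr = np.zeros(det_shape, dtype=np.float32)
    for u in range(det_shape[0]):
        for v in range(det_shape[1]):
            p = detector_origin + u * du + v * dv          # detector pixel position
            ray = p - source
            step = ray / n_samples                          # integration step along L(p, r)
            r = source + np.arange(1, n_samples + 1)[:, None] * step
            # Map sample points back to the object coordinate system: T^-1 o x
            r_h = np.concatenate([r, np.ones((n_samples, 1))], axis=1)
            obj = (T_inv @ r_h.T).T[:, :3]
            # Nearest-neighbor lookup of the attenuation coefficient (outside = 0)
            idx = np.round(obj[:, ::-1] / np.asarray(spacing)).astype(int)
            valid = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
            mu = np.zeros(n_samples)
            mu[valid] = volume[idx[valid, 0], idx[valid, 1], idx[valid, 2]]
            drr[u, v] = mu.sum() * np.linalg.norm(step)     # Riemann sum of the line integral
    return drr
```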
A rigid-body 3D transformation T can be parameterized by a vector t with six components. According to an advantageous embodiment, the transformation is parameterized by three in-plane transformation parameters and three out-of-plane transformation parameters.
Mathematical definitions of the in-plane and out-of-plane parameters are given as follows.
tx=(co−cp)·ex,
ty=(co−cp)·ey,
tz=(co−cp)·ez,
The in-plane rotation parameter tθ, also referred to as “yaw”, is defined as the angle between ey and the projection of ea on the imaging plane:
tθ=∠(ey,ea−⟨ea,ey⟩ey).
The out-of-plane rotation parameter tα, also referred to as “pitch”, is defined as the angle between ea and its projection on the imaging plane:
tα=∠(ea,ea−⟨ea,ey⟩ey).
The out-of-plane rotation parameter tβ, also referred to as “roll”, is defined as the angle between eb and the projection of ez on the plane perpendicular to ea:
tβ=∠(eb,ez−⟨ez,ea⟩ea).
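The building blocks of the parameterization above (signed components along an axis, projection of a vector onto a plane, and the angle between two vectors) can be sketched as follows. This is an illustrative helper only; the vector and function names are not part of the original disclosure, and the example axes at the end are arbitrary.

```python
import numpy as np

def component(v, axis):
    """Signed component of v along a unit axis, e.g. tx = component(c_o - c_p, e_x)."""
    return float(np.dot(v, axis))

def project_onto_plane(v, normal):
    """Projection of v onto the plane with unit normal n, i.e. v - <v, n> n."""
    return v - np.dot(v, normal) * normal

def angle(u, v):
    """Angle in degrees between two vectors, as used for t_theta, t_alpha and t_beta."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

# Example (illustrative axes): roll t_beta is the angle between e_b and the
# projection of e_z onto the plane perpendicular to e_a, per the equation above.
e_a = np.array([1.0, 0.0, 0.0])
e_b = np.array([0.0, 1.0, 0.0])
e_z = np.array([0.0, 0.3, 1.0]); e_z /= np.linalg.norm(e_z)
t_beta = angle(e_b, project_onto_plane(e_z, e_a))
```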
Based on Equation (3), we denote the X-ray image with transformation parameters t as It, where the variables L and J are omitted for simplicity because they are non-varying for a given 2D/3D registration task. The inputs for the 2D/3D registration are: (1) a 3D object described by its X-ray attenuation map J; (2) an X-ray image It
tgt−tini≈f(X(tini,It
An estimation of tgt is then obtained by applying the regressors and incorporating the estimated parameter residuals into tini:
t̂gt=tini+f(X(tini,Itgt)). (5)
It is worth noting that the range ϵ in Equation (4) is equivalent to the capture range of optimization-based registration methods. Based on Equation (4), the problem formulation can be expressed as designing a feature extractor X(·) and training regressors f(·), such that:
δt≈f(X(t,It+δt)),∀δt∈ϵ. (6)
Embodiments of the present invention described below discuss in detail how the feature X(t, It+δt) is calculated and how the regressors f(·) are designed, trained, and applied, to achieve accurate, real-time 2D/3D medical image registration.
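As a minimal sketch of how the formulation in Equations (4)-(6) can be applied at run time, the loop below repeatedly renders a DRR with the current parameter estimate, extracts the feature, applies the trained regressors, and adds the predicted residual to the estimate per Equation (5). The callables render_drr, extract_feature, and regressor are placeholders for the components described in this document, not actual implementations.

```python
import numpy as np

def register(x_ray_image, t_init, render_drr, extract_feature, regressor, n_iters=3):
    """Iteratively refine the six transformation parameters t.

    Each iteration applies Equation (5): t <- t + f(X(t, I_xray)),
    where the feature X(t, .) is computed from the DRR rendered with the current t.
    """
    t = np.asarray(t_init, dtype=float).copy()
    for _ in range(n_iters):
        drr = render_drr(t)                          # DRR I_t for the current estimate
        feature = extract_feature(drr, x_ray_image)  # LIR feature X(t, I_{t+dt})
        delta = regressor(feature)                   # predicted parameter residual
        t = t + delta                                # Equation (5)
    return t
```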
At step 304, a 2D X-ray image is received. The 2D X-ray image may be a frame in a sequence of X-ray images. In an advantageous embodiment, the 2D X-ray image may be an intra-operative X-ray image received in a sequence of X-ray images acquired during a surgical procedure to guide the surgical procedure. The X-ray image may be received directly from an X-ray image acquisition device or may be received by loading a previously acquired X-ray image from a memory or storage of a computer system. In an advantageous embodiment, the X-ray image is received from the X-ray image acquisition device in real-time as it is acquired.
At step 306, initial transformation parameters are acquired. The initial transformation parameters are estimated by generating an initial DRR from the 3D medical image based on the X-ray image. The initial DRR having the initial transformation parameters can be estimated using well-known techniques for generating a DRR to approximate a 2D X-ray image. For example, a well-understood template matching method can be used to estimate the initial transformation parameters and generate the initial DRR.
At step 308, a parameter space zone is determined based on the current transformation parameters. In an advantageous implementation, the parameter space zone is determined based on the current out-of-plane rotation transformation parameters. In the first iteration of step 308, the parameter space zone is determined based on the initial out-of-plane rotation transformation parameters. In each subsequent iteration of step 308, the parameter space zone is determined by the most recently updated out-of-plane rotation transformation parameters.
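A sketch of the zone lookup in step 308 is given below. The 20 degree by 20 degree zone size and the parameter ranges are assumptions for illustration; the specific partition used in an embodiment may differ.

```python
import numpy as np

def zone_index(t_alpha, t_beta, zone_size_deg=20.0,
               alpha_range=(-90.0, 90.0), beta_range=(-90.0, 90.0)):
    """Map the current out-of-plane rotations (in degrees) to a parameter space zone.

    The (t_alpha, t_beta) plane is partitioned into square zones of zone_size_deg;
    regressors and LIR extraction points are stored separately per zone.
    """
    ia = int((np.clip(t_alpha, *alpha_range) - alpha_range[0]) // zone_size_deg)
    ib = int((np.clip(t_beta, *beta_range) - beta_range[0]) // zone_size_deg)
    n_alpha = int(np.ceil((alpha_range[1] - alpha_range[0]) / zone_size_deg))
    n_beta = int(np.ceil((beta_range[1] - beta_range[0]) / zone_size_deg))
    ia = min(ia, n_alpha - 1)        # keep the upper boundary inside the last zone
    ib = min(ib, n_beta - 1)
    return ia * n_beta + ib          # single integer identifier of the zone
```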
The method of FIG. 3 employs a local image residual (LIR) feature extractor X(·,·) designed such that the extracted features are insensitive to the underlying transformation parameters t, i.e.:
X(t1,It1+δt)≈X(t2,It2+δt), ∀t1,t2, ∀δt∈ϵ. (7)
As discussed in greater detail below in connection with step 310, the method of FIG. 3 partitions the transformation parameter space into zones, so that the invariance in Equation (7) only needs to hold within each zone:
Xk(t1,It1+δt)≈Xk(t2,It2+δt), ∀t1,t2∈Ωk, ∀δt∈ϵ, (8)
where Xk(·,·) denotes the LIR feature extractor for the k-th zone, and Ωk denotes the area covered by the k-th zone. The regressors are trained separately for each zone to recover the simplified mapping that is insensitive to t.
Returning to FIG. 3, at step 310, local image residual (LIR) features are calculated from local patches of the DRR and the X-ray image based on a set of 3D points extracted for the determined parameter space zone. Given a set of 3D points P={p1, . . . , pN} extracted from the 3D medical image, the LIR feature is calculated as:
X(t,δt,P)={Hpit(It+δt)−Hpit(It)}i=1, . . . , N, (9)
where Hpt(I) denotes a local patch extracted from the image I at a region of interest (ROI) determined by the 2D projection of the 3D point p under the transformation parameters t.
In the local area of It, the effect of varying tα and tβ within a zone is approximately a 2D translation. Therefore, by extracting the local patches from ROIs selected based on t, the effects of all six transformation parameters in t are compensated, making Hpt(It) approximately invariant to t. Since the difference between Hpt(It) and Hpt(It+δt) is merely an additional 2D transformation caused by δt, Hpt(It+δt) is also approximately invariant to t.
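The following sketch illustrates the LIR computation of Equation (9): the zone's 3D points are projected into the image with the current parameters, square ROIs are cropped from the DRR and the X-ray image at the projected locations, and the patch differences are stacked. The project_point callable is a placeholder for the projection model of the imaging system, and the patch size is an arbitrary example.

```python
import numpy as np

def extract_lir_features(drr, x_ray, points_3d, t, project_point, patch_size=32):
    """Local image residuals {H_p(I_xray) - H_p(I_drr)} for the zone's 3D points.

    drr, x_ray    -- 2D numpy arrays of the same size
    points_3d     -- (N, 3) array of 3D points selected for the current zone
    t             -- current transformation parameters (6-vector)
    project_point -- callable mapping (3D point, t) -> (row, col) in the image
    """
    half = patch_size // 2
    residuals = []
    for p in points_3d:
        r, c = (int(round(v)) for v in project_point(p, t))
        # Skip points whose ROI would fall outside the image
        if r - half < 0 or c - half < 0 or r + half > drr.shape[0] or c + half > drr.shape[1]:
            continue
        patch_drr = drr[r - half:r + half, c - half:c + half]
        patch_xray = x_ray[r - half:r + half, c - half:c + half]
        residuals.append(patch_xray - patch_drr)      # one LIR per 3D point
    return np.stack(residuals)                        # (N_kept, patch_size, patch_size)
```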
The 3D points used for calculating the LIR features are extracted separately for each zone. In an exemplary implementation, the 3D points of the 3D medical image can be calculated once for each zone prior to the surgical procedure and stored in a memory or storage of a computer system (e.g., in a database). When performing registration of the 3D medical image with X-ray images acquired in real-time, the locations of the 3D points for the current zone can then be retrieved and used to calculate the LIR features. The 3D points are extracted separately for each zone as follows. First, 3D points that correspond to edges are extracted as candidates. Specifically, the candidates are extracted by thresholding pixels with high gradient magnitudes in a synthetic X-ray image (i.e., a DRR) generated with tα and tβ at the center of the zone, and then back-projecting them to the corresponding 3D structures. The formation model of gradients in X-ray images can be expressed as:
g(p)=∫η(L(p,r))dr, (10)
where g(p) is the magnitude of the X-ray image gradient at the point p, and η(·) can be computed from μ(·) and the X-ray perspective geometry. p is back-projected to L(p, r0), where
r0=argmaxr η(L(p,r)), (11)
if
∫[r0−σ, r0+σ] η(L(p,r))dr≥0.9·g(p), (12)
The condition in Equation (12) ensures that the 3D structure around L(p, r0) “essentially generates” the 2D gradient g(p), because the contribution of η(·) within a small neighborhood (e.g., σ=2 mm) of L(p, r0) leads to the majority (i.e., ≥90%) of the magnitude of g(p). In other words, we find the dominant 3D structure corresponding to the gradient in the X-ray image.
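The candidate check in Equations (10)-(12) can be sketched as follows, operating on samples of η(·) along a single ray. The sampling scheme and the form of the input are assumptions for illustration.

```python
import numpy as np

def backproject_candidate(eta_along_ray, dr, sigma_mm=2.0, fraction=0.9):
    """Check whether a 2D gradient pixel is dominated by one 3D structure.

    eta_along_ray -- samples of eta(L(p, r)) at equally spaced depths (Equation (10))
    dr            -- spacing between samples in mm
    Returns the sample index of r0 (Equation (11)) if the neighborhood of r0
    contributes at least 'fraction' of g(p) (Equation (12)), otherwise None.
    """
    eta = np.asarray(eta_along_ray, dtype=float)
    g = eta.sum() * dr                        # g(p), the line integral of eta
    i0 = int(np.argmax(eta))                  # r0 = argmax_r eta(L(p, r))
    half = max(1, int(round(sigma_mm / dr)))  # number of samples within +/- sigma
    local = eta[max(0, i0 - half):i0 + half + 1].sum() * dr
    return i0 if g > 0 and local >= fraction * g else None
```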
Once the candidates are extracted, the candidates are filtered so that only ones leading to LIR features satisfying Equation (7) and also not significantly overlapped are kept. This is achieved by randomly generating {tj}j=1M with tα and tβ within the zone and {δtk}k=1M within the capture range ϵ (e.g., M=1000 in an exemplary implementation). The intensity of the n-th pixel of Hpitj(Itj+δtk) is denoted as hn,i,j,k, and the following two measures are calculated for each candidate point pi:
Ei=⟨(hn,i,j,k−⟨hn,i,j,k⟩j)2⟩n,j,k, (13)
Fi=⟨(hn,i,j,k−⟨hn,i,j,k⟩k)2⟩n,j,k, (14)
where ⟨·⟩ is an average operator with respect to all indexes in the subscript. Since Ei and Fi measure the sensitivity of Hpi(·) to t and δt, respectively, candidate points with small Ei and large Fi are selected, while candidate points whose ROIs significantly overlap with those of already selected points are discarded.
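Below is a sketch of the filtering measures in Equations (13) and (14), computed from an array h[n, i, j, k] of patch intensities collected over the randomly generated (tj, δtk) pairs. The array layout and the selection rule combining Ei and Fi (here a simple ratio, without the overlap test) are illustrative assumptions.

```python
import numpy as np

def point_sensitivities(h):
    """Ei and Fi per candidate point from h with shape (n_pixels, N_points, M_t, M_dt).

    Ei: intensity variation caused by varying t  (mean over j removed)
    Fi: intensity variation caused by varying dt (mean over k removed)
    """
    mean_j = h.mean(axis=2, keepdims=True)                # <h>_j
    mean_k = h.mean(axis=3, keepdims=True)                # <h>_k
    E = ((h - mean_j) ** 2).mean(axis=(0, 2, 3))          # Equation (13), one value per point i
    F = ((h - mean_k) ** 2).mean(axis=(0, 2, 3))          # Equation (14), one value per point i
    return E, F

def select_points(h, n_keep):
    """Keep points insensitive to t (small Ei) and sensitive to dt (large Fi)."""
    E, F = point_sensitivities(h)
    score = E / (F + 1e-12)                               # smaller is better
    return np.argsort(score)[:n_keep]
```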
Returning to FIG. 3, at step 312, updated transformation parameters are calculated based on the LIR features using a hierarchical series of regressors trained for the determined parameter space zone.
Steps 508-516 provide details regarding the implementation of step 312 of FIG. 3.
Steps 502-516 are repeated for k iterations, and the transformation parameters resulting from the current iteration are used as the starting position for the next iteration. The number of iterations k can be determined empirically. In an exemplary implementation, k=3 iterations can be used. At step 518, the algorithm outputs the final transformation parameters t, which provide the transformation that registers the 3D medical image to the coordinate system of the 2D X-ray image.
In the hierarchical regression approach used in step 312 of FIG. 3, the transformation parameters are decomposed into groups and regressed in a hierarchical manner: the in-plane parameters (tx, ty, tθ) are regressed by a first group of regressors, the out-of-plane rotation parameters (tα, tβ) are regressed by a second group of regressors, and the out-of-plane translation parameter tz is regressed by a third group of regressors. Regressing the groups sequentially allows each group of regressors to operate on LIR features computed with the most recently updated transformation parameters.
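The hierarchical parameter regression of step 312 can be sketched as follows. The parameter grouping follows the description above, while the parameter vector layout, the regressor objects, and the feature extractor are placeholders for the trained CNN models and the LIR computation, not actual implementations.

```python
import numpy as np

# Assumed parameter vector layout: t = [tx, ty, tz, t_theta, t_alpha, t_beta]
GROUPS = [
    [0, 1, 3],   # Group 1: in-plane parameters tx, ty, t_theta
    [4, 5],      # Group 2: out-of-plane rotations t_alpha, t_beta
    [2],         # Group 3: out-of-plane translation tz
]

def hierarchical_update(t, x_ray, extract_feature, group_regressors):
    """Regress the parameter groups one after another (HPR).

    group_regressors[g] maps an LIR feature to residuals of the g-th group only.
    The feature is recomputed after each group so later groups see the
    partially corrected parameters.
    """
    t = np.asarray(t, dtype=float).copy()
    for group, regressor in zip(GROUPS, group_regressors):
        feature = extract_feature(t, x_ray)       # X(t, I_xray) with the current t
        delta = regressor(feature)                # residuals for this group only
        t[group] += delta                         # update only this group's parameters
    return t
```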
Network Structure:
In an exemplary implementation, the present inventors empirically selected the size of the ROI, which led to N≈18. In this implementation, the CNN model 600 shown in FIG. 6 is used.
Training: In an advantageous implementation, the CNN regression models can be trained exclusively on synthetic X-ray images, because synthetic images provide reliable ground truth labels with little need for laborious manual annotation, and the number of available real X-ray images may be limited. In an exemplary implementation, for each group in each zone, 25,000 pairs of t and δt are generated. The parameters t follow a uniform distribution with tα and tβ constrained in the zone. The parameter errors δt also follow a uniform distribution, while three different ranges are used for the three groups, as shown in Table 1, below. The distribution ranges of δt for Group 1 are the target capture range that the regressors are designed for. The distribution ranges of δtx, δty, and δtθ are reduced for Group 2, because they are close to zero after the regressors in the first group are applied. For the same reason, the distribution ranges of δtα and δtβ are reduced for Group 3. For each pair of t and δt, a synthetic X-ray image It+δt is generated and the LIR features X(t, It+δt) are calculated using Equation (9).
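A sketch of the synthetic training data generation described above is given below. The zone bounds, the per-parameter residual ranges standing in for Table 1, and the DRR renderer and feature extractor are illustrative placeholders.

```python
import numpy as np

def generate_training_pairs(n_pairs, zone_bounds, residual_ranges, render_drr,
                            extract_feature, group, rng=np.random.default_rng(0)):
    """Create (feature, label) pairs for one regressor group in one zone.

    zone_bounds     -- six (low, high) sampling ranges for t, with t_alpha and
                       t_beta constrained to the zone
    residual_ranges -- six (low, high) ranges for delta_t (per-group, cf. Table 1)
    group           -- indices of the parameters this group of regressors predicts
    """
    features, labels = [], []
    for _ in range(n_pairs):
        t = np.array([rng.uniform(*zone_bounds[k]) for k in range(6)])
        dt = np.array([rng.uniform(*residual_ranges[k]) for k in range(6)])
        drr = render_drr(t)                               # I_t for the current sample
        x_ray = render_drr(t + dt)                        # synthetic "X-ray" I_{t+dt}
        features.append(extract_feature(drr, x_ray, t))   # LIR feature, Equation (9)
        labels.append(dt[group])                          # regression target for this group
    return np.stack(features), np.stack(labels)
```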
The objective function to be minimized during the training is the Euclidean loss, defined as:
φ(W)=(1/K)Σi=1K∥yi−f(Xi;W)∥², (15)
where K is the number of training samples, yi is the label for the i-th training sample, W is a vector of weights to be learned, and f(Xi; W) is the output of the regression model parameterized by W on the i-th training sample. The weights W are learned using Stochastic Gradient Descent (SGD), with a batch size of 64, momentum of m=0.9, and weight decay of d=0.0001. The update rule for W is:
Vi+1=m·Vi−d·κi·Wi−κi·⟨∂φ/∂W|Wi⟩Di, (16)
Wi+1=Wi+Vi+1, (17)
where i is the iteration index, V is the momentum variable, κi is the learning rate at the i-th iteration, and ⟨∂φ/∂W|Wi⟩Di is the derivative of the objective function computed on the i-th batch Di with respect to W, evaluated at Wi. The learning rate κi is decayed in each iteration following:
κi=0.0025·(1+0.0001·i)^(−0.75). (18)
The derivative ∂φ/∂W is calculated using back-propagation. For weights shared in multiple paths, their derivatives in all paths are back-propagated separately and summed up for the weight update. The weights can be initialized using the Xavier method and mini-batch SGD can be performed for a number of iterations (e.g., 12,500 iterations (32 epochs)).
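The update rule and learning rate schedule of Equations (16)-(18) can be sketched as a plain numpy update. The gradient is a placeholder for back-propagation through the CNN, the weight vector is illustrative, and the hyperparameter values follow the text above.

```python
import numpy as np

def learning_rate(i, base_lr=0.0025, gamma=0.0001, power=0.75):
    """Equation (18): kappa_i = 0.0025 * (1 + 0.0001 * i) ** (-0.75)."""
    return base_lr * (1.0 + gamma * i) ** (-power)

def sgd_step(W, V, grad_W, i, momentum=0.9, weight_decay=0.0001):
    """One momentum SGD step with weight decay, Equations (16) and (17).

    W      -- current weights, V -- momentum variable,
    grad_W -- gradient of the Euclidean loss on the current mini-batch, at W
    """
    kappa = learning_rate(i)
    V = momentum * V - weight_decay * kappa * W - kappa * grad_W   # Equation (16)
    W = W + V                                                      # Equation (17)
    return W, V

# Example: a few update steps on an illustrative weight vector
W = np.zeros(10)
V = np.zeros_like(W)
for i in range(3):
    grad_W = np.ones_like(W)       # placeholder gradient from back-propagation
    W, V = sgd_step(W, V, grad_W, i)
```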
Returning to FIG. 3, at step 314, it is determined whether a stop condition is met. If the stop condition is not met, the method returns to step 308 and steps 308-312 are repeated using the updated transformation parameters. If the stop condition is met, the method proceeds to step 316.
At step 316, the registration results are output. In particular, the final transformation parameters are output, and the transformation parameters provide a transformation that registers the 3D medical image to the coordinate system of the X-ray image. Information from the 3D medical image can then be projected or overlaid onto the X-ray image. For example, a DRR can be generated using the final transformation parameters and the DRR can be overlaid on the X-ray image to provide locations of organs or other anatomical structures that are more visible or defined in the 3D medical image than in the X-ray image. The resulting fused image can be displayed on a display device in real-time during the surgical procedure.
The method of FIG. 3 can be repeated for each X-ray image in a sequence of X-ray images acquired during the surgical procedure to provide real-time 2D/3D registration over the course of the procedure.
The above-described methods for 2D/3D medical image registration may be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high-level block diagram of such a computer is illustrated in the accompanying drawings.
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
This application claims the benefit of U.S. Provisional Application No. 62/195,430, filed Jul. 22, 2015, and U.S. Provisional Application No. 62/215,326, filed Sep. 8, 2015, the disclosures of which are herein incorporated by reference.