Cardiac magnetic resonance (CMR) is an important medical imaging tool for heart disease detection and treatment. Conventional CMR technologies often require patients to hold their breath during an imaging procedure so as to diminish the impact of respiratory motions, and an electrocardiogram (ECG) may also be needed in order to determine the cardiac phase of each CMR image and/or to combine data from multiple heart beats to form a single synthesized cardiac contraction cycle. In recent years, an alternative magnetic resonance imaging (MRI) technology called real-time CMR has been increasingly adopted for its faster and more flexible mode of operation. With real-time CMR, however, MRI signals (e.g., k-space data) may be acquired continuously (e.g., instead of always at the start of a specific cardiac phase or slice by slice) and, as such, determining the spatial and/or temporal alignment of the acquired images has posed a challenge.
Described herein are systems, methods, and instrumentalities associated with real-time cardiac MRI image processing. According to embodiments of the present disclosure, an apparatus capable of performing the real-time MRI image processing task may comprise at least one processor configured to obtain a plurality of medical images of a heart and determine, automatically, a slice and a cardiac phase associated with each of the plurality of medical images based on one or more machine-learned (ML) image recognition models. The plurality of medical images may be captured based on a real-time MRI technique and may span multiple cardiac phases and multiple slices of the heart. For example, the plurality of medical images may include a first medical image of the heart captured consecutively with a second medical image of the heart, where the first and second medical images may be associated with respective cardiac phases and slices, and where the first and second medical images may differ from each other with respect to at least one of the cardiac phases or the slices associated with the first and second medical images. Based at least on the automatically determined slice and cardiac phase information of each of the plurality of medical images, and a requirement of a cardiac analysis task, the at least one processor may be further configured to select a first group of medical images from the plurality of medical images, and provide the first group of medical images for the cardiac analysis task.
In examples, the at least one processor of the apparatus may be further configured to determine, automatically, a view associated with each of the plurality of medical images based on the one or more ML image recognition models, and select the first group of medical images further based on the view associated with each of the plurality of medical images. Such a view may include, for example, a short-axis view, a 2-chamber long-axis view, a 3-chamber long-axis view, or a 4-chamber long-axis view of the heart.
In examples, the at least one processor of the apparatus may be further configured to select a second group of medical images from the plurality of medical images based at least on the requirement of the cardiac analysis task and the slice and cardiac phase associated with each of the plurality of medical images. The second group of medical images may be associated with a different cardiac cycle than the first group of medical images described above, and the second group of medical images may be misaligned with the first group of medical images with respect to one or more time spots. In such cases, the at least one processor of the apparatus may be configured to generate one or more additional medical images of the heart for the second group of medical images and add the one or more additional medical images to the second group of medical images such that the second group of medical images may be aligned with the first group of medical images with respect to the one or more time spots. The one or more additional medical images may be determined, for example, based on respective timestamps of the medical images comprised in the first and second groups. The one or more additional medical images may be generated, for example, based on an interpolation technique or a machine-learned image synthesis model.
In examples, the at least one processor of the apparatus may be further configured to register a first medical image of the first group of medical images with a second medical image of the first group of medical images, where the registration may compensate for a respiratory motion associated with the first medical image or the second medical image. In examples, the at least one processor may be further configured to perform the cardiac analysis task described above based on the first group of medical images.
A more detailed understanding of the examples disclosed herein may be had from the following description, given by way of example in conjunction with the accompanying drawings.
The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. A detailed description of illustrative embodiments will now be described with reference to the various figures. Although this description provides a detailed example of possible implementations, it should be noted that the details are intended to be exemplary and in no way limit the scope of the application.
Machine learning techniques may be employed to automatically determine the temporal and spatial properties of the CMR images 102, and group the CMR images 102 based at least on these properties and the requirements of a specific clinical task (e.g., T1/T2 mapping, tissue characterization, medical abnormality detection, etc.) to be performed. The temporal properties may include, for example, the cardiac phase of each of the CMR images 102, the sequential order of the CMR images within a certain slice, etc., while the spatial properties may include, for example, the slice to which each of the CMR images 102 belongs, the view represented by each of the CMR images 102, etc. For example, as shown in
In examples, the one or more ML image recognition models 104 may include a first image classification model trained for separating the CMR images 102 into classes or categories corresponding to different slices or views of the heart. For instance, the first image classification model may be learned and/or implemented using a deep neural network (DNN) to classify each CMR image 102 into a category of views such as a short-axis view, a 2-chamber long-axis view, a 3-chamber long-axis view, a 4-chamber long-axis view, etc. The CMR images 102 may be further classified to determine whether they belong to the same slice or different slices (e.g., a view may correspond to an angle at which an image is acquired, while a slice may correspond to a cut along that angle). For instance, the CMR images 102 may include a real-time CMR series comprising images {img_0_0_sax, img_1_0_sax, . . . , img_0_1_sax, img_1_1_sax, . . . , img_0_0_2ch, img_1_0_2ch, . . . , img_0_0_4ch, . . . }, where the first and second numerical values in the denotations may represent the time and slice locations at which the images are captured, respectively, and the last part of the denotations may represent the view (e.g., short-axis (sax), 2-chamber-long-axis (2ch), 3-chamber-long-axis (3ch), 4-chamber-long-axis (4ch), etc.) captured in each image. The DNN may be trained to learn features associated with the various slices and views through a training procedure, and subsequently determine, automatically, the slice and/or view associated with a given image based on the learned features. The determined slice and/or view information may then be used to arrange the CMR images 102 into different groups including, for example, a first group {img_0_0_sax, img_1_0_sax . . . } that may correspond to slice 0 of the short-axis view, a second group {img_0_1_sax, img_1_1_sax . . . } that may correspond to slice 1 of the short-axis view, a third group {img_0_0_2ch, img_1_0_2ch . . . } that may correspond to slice 0 of the 2-chamber long-axis view, a fourth group {img_0_0_4ch . . . } that may correspond to slice 0 of the 4-chamber long-axis view, etc. The DNN may take (e.g., only take) the CMR images 102 as inputs for determining the slice/view information of each image, or the DNN may take the CMR images 102 and other acquisition information (e.g., such as absolute acquisition times and/or locations included in a DICOM header) as inputs for the determination. In the latter case, the first image classification model may exploit the CMR images and acquisition information together or in a sequential order.
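By way of illustration only, the following is a minimal sketch (in Python, using PyTorch) of how such a view/slice classification DNN could be structured; the layer sizes, the 128x128 single-channel input resolution, the class count of four views, and the name ViewClassifier are assumptions made for the example rather than details of the disclosed model.

```python
# Minimal sketch (not the disclosed model) of a CNN-based view classifier for CMR
# frames; assumes single-channel 128 x 128 inputs and four view classes
# (sax, 2ch, 3ch, 4ch). All names and sizes are illustrative.
import torch
import torch.nn as nn

class ViewClassifier(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                       # x: (N, 1, 128, 128)
        return self.head(self.features(x).flatten(1))

model = ViewClassifier()
frame = torch.randn(1, 1, 128, 128)             # stand-in for one CMR image 102
view_idx = model(frame).argmax(dim=1)           # predicted view class index
```

A slice classifier could follow the same pattern with the class labels corresponding to slice locations, or the view and slice heads could share the same feature extractor.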
In examples, the one or more ML image recognition models 104 may include a second image classification model trained for determining the cardiac phase (e.g., within a cardiac cycle) associated with each CMR image 102. Such an ML model may also be learned and implemented using a DNN, which may be trained to learn features associated with various cardiac phases (e.g., end-of-diastole (ED), end-of-systole (ES), etc.) through a training procedure, and subsequently predict, automatically, the cardiac phase depicted in a given image based on the learned features. For instance, the CMR images 102 may include images {img_0_0_sax, img_0.3_0_sax, img_0.6_0_sax, img_0.9_0_sax, img_1.2_0_sax, img_1.5_0_sax, img_1.8_0_sax, img_2.1_0_sax . . . } spanning one or more cardiac phases or cycles, where the first number in the denotations may represent an absolute acquisition time (e.g., 0 may not necessarily correspond to the beginning of a cardiac cycle) of the image and the second number in the denotations may represent the slice to which the image may belong. Based on the features learned through training, the DNN may classify img_0.3_0_sax and img_1.5_0_sax as belonging to ED, and img_0.9_0_sax and img_2.1_0_sax as belonging to ES. The CMR images 102 may then be grouped into {img_0_0_sax}, {img_0.3_0_sax, img_0.6_0_sax, img_0.9_0_sax, img_1.2_0_sax}, and {img_1.5_0_sax, img_1.8_0_sax, img_2.1_0_sax} based on the determination of these key cardiac phases, where each group may correspond to a cardiac cycle and may include images starting from one ED and ending before the next ED. The rest of the images may be distributed into these groups based on the detected key cardiac phases, since those images may have been captured sequentially in time to reflect the continuous motion of the heart. In examples, a timestamp or time position relative to the first image in a group (e.g., which may be ED) may be assigned to each image in the group such that the images may be aligned with the images of another group, as will be described in greater detail below. The DNN may take (e.g., only take) the CMR images 102 as inputs for determining the cardiac phases of the images, or the DNN may take the CMR images 102 and other acquisition information (e.g., such as absolute acquisition times included in a DICOM header) as inputs for the determination. In examples, additional ML-based processing may be conducted to facilitate the detection of the cardiac phases. For instance, the heart depicted in a CMR image 102 may be segmented using a segmentation network so as to obtain volumetric information of the heart (e.g., such as a left ventricle (LV) volume) as depicted in the image. The volumetric information may then be used to facilitate the determination of the cardiac phases, for example, since the ED and/or ES phases may have a strong association with the LV volume. The result of the segmentation operation may be re-used in a subsequent image analysis task (e.g., a post-analysis task) without incurring additional costs.
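As a hedged illustration of how segmentation-derived volumetric information could support the grouping described above, the sketch below treats ED frames as local maxima of a hypothetical LV volume curve and splits a single-slice series into cardiac cycles accordingly; the volume values and the simple peak heuristic are illustrative assumptions, not the disclosed method.

```python
# Sketch: use per-frame LV volumes (e.g., counted from segmentation masks) to find
# ED frames (largest volume) and split the series into cycles running from one ED
# up to, but not including, the next ED. Volumes below are hypothetical.
import numpy as np

def split_into_cycles(frame_ids, lv_volumes):
    v = np.asarray(lv_volumes, dtype=float)
    # ED candidates: local maxima of the LV volume curve.
    ed = [i for i in range(1, len(v) - 1) if v[i] >= v[i - 1] and v[i] > v[i + 1]]
    # Keep any frames before the first detected ED as their own (partial) group.
    bounds = ([0] if not ed or ed[0] != 0 else []) + ed + [len(v)]
    return [frame_ids[bounds[k]:bounds[k + 1]] for k in range(len(bounds) - 1)]

frames = ["img_0_0_sax", "img_0.3_0_sax", "img_0.6_0_sax", "img_0.9_0_sax",
          "img_1.2_0_sax", "img_1.5_0_sax", "img_1.8_0_sax", "img_2.1_0_sax"]
volumes = [120, 140, 90, 60, 100, 138, 85, 58]   # hypothetical LV volumes (mL)
print(split_into_cycles(frames, volumes))
# -> [['img_0_0_sax'], ['img_0.3_0_sax', ..., 'img_1.2_0_sax'], ['img_1.5_0_sax', ..., 'img_2.1_0_sax']]
```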
With the automatically determined time (e.g., cardiac phase) and space (e.g., slice/view) information of the CMR images 102, and the requirements 108 associated with the cardiac analysis tasks 110, the CMR images 102 may be grouped at 112 according to the requirements 108 and the automatically determined information. For example, with some cardiac analysis tasks, it may be desirable to examine the CMR images 102 grouped into different slices, where each slice may include images depicting a cardiac motion, while for other analysis tasks, it may be desirable to group the CMR images 102 into different cardiac phases, where each phase may include images spanning multiple slices and encompassing whole heart information.
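A minimal sketch of such requirement-driven grouping is shown below, assuming each image has already been tagged with view, slice, and cardiac-phase labels at 106; the "group_by" field is a hypothetical stand-in for the requirements 108.

```python
# Sketch of grouping at 112: group tagged images either per slice (for per-slice
# cine analysis) or per cardiac phase (for whole-heart, per-phase analysis).
from collections import defaultdict

def group_images(tagged_images, requirement):
    groups = defaultdict(list)
    for name, tags in tagged_images.items():
        if requirement["group_by"] == "slice":        # each group depicts a cardiac motion
            key = (tags["view"], tags["slice"])
        else:                                         # each group spans multiple slices
            key = tags["phase"]
        groups[key].append(name)
    return dict(groups)

tagged = {
    "img_0_0_sax": {"view": "sax", "slice": 0, "phase": "ED"},
    "img_0_1_sax": {"view": "sax", "slice": 1, "phase": "ED"},
    "img_1_0_sax": {"view": "sax", "slice": 0, "phase": "ES"},
}
print(group_images(tagged, {"group_by": "slice"}))
print(group_images(tagged, {"group_by": "phase"}))
```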
In examples, all or a subset of the CMR images 102 may be arranged into groups that correspond to respective cardiac cycles, for example, upon tagging the images with automatically determined cardiac phase information and/or timestamps at 106. Two or more of these groups of images, however, may not be aligned with respect to time (e.g., for patients with heart diseases like premature ventricular contraction that may cause the timing of cardiac cycles to vary from one cycle to the next). For example, a first group corresponding to a first cardiac cycle may include 3 images with respective timestamps or time positions 1, 3, and 5 (e.g., relative to the first image in the first group), while a second group corresponding to a second cardiac cycle may include 5 images with respective timestamps or time positions of 1, 2, 3, 4, and 5 (e.g., relative to the first image in the second group). For clinical applications that may require CMR images to be time-aligned across different cardiac cycles, multiple CMR images in a group may be merged (e.g., into one image) or additional CMR images may be generated for a group (e.g., at the 2 and 4 time positions in the first group) such that the images in the group may be aligned with the images of another group (e.g., the second group mentioned above). The additional images may be generated using various interpolation techniques (e.g., linear interpolation techniques) based on existing images within a cardiac cycle and/or across different cardiac cycles (e.g., utilizing corresponding timestamps determined for the images associated with the different cardiac cycles). The additional images may also be generated using a neural network trained for image synthesis, e.g., by exploiting neighboring images that may be temporally related.
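The linear-interpolation option may be illustrated with the following sketch, which fills the missing time positions (e.g., 2 and 4 in the first group) from temporally adjacent frames; the toy arrays and timestamps are assumptions made for the example.

```python
# Sketch: linearly interpolate a frame at time t from the two temporally adjacent
# frames of the same cycle, so a 3-frame cycle (times 1, 3, 5) can be aligned with
# a 5-frame cycle (times 1..5). Frames are represented as NumPy arrays.
import numpy as np

def interpolate_frame(t, times, frames):
    times = np.asarray(times, dtype=float)
    i = np.searchsorted(times, t)                 # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)                      # linear blending weight
    return (1.0 - w) * frames[i - 1] + w * frames[i]

times = [1.0, 3.0, 5.0]                           # first cycle: 3 existing frames
frames = [np.full((4, 4), v, dtype=float) for v in (10.0, 30.0, 50.0)]
aligned = {t: interpolate_frame(t, times, frames) for t in (2.0, 4.0)}
print(aligned[2.0][0, 0], aligned[4.0][0, 0])     # 20.0 40.0
```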
In examples, one or more of the CMR images 102 (e.g., belonging to the same slice or cardiac phase) may be captured while the patient was engaged in a motion (e.g., a respiratory motion). The effects of such a motion on the CMR images (e.g., reflected through an in-plane translation) may be compensated for using various image registration techniques including, for example, an image registration neural network. The motion compensation operation may be combined with motion-related operations (e.g., such as motion estimation) in a post-analysis procedure such that the results of the motion compensation operation may be re-used during the post-analysis procedure without incurring additional computation or resource usage. In examples, a respiratory motion may not be removed via the motion compensation operation, and through-plane translation (e.g., cross-slice motion correction) may be accomplished by separating the CMR images into slices and grouping multiple slices together, e.g., using the image alignment techniques described herein.
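As one hedged, non-learning illustration of in-plane translation compensation, the sketch below estimates a translational offset with phase cross-correlation and resamples the moving frame accordingly; the disclosed approach may instead use an image registration neural network, and the scikit-image/SciPy calls here are simply stand-ins for the registration step.

```python
# Sketch: estimate the breathing-induced in-plane translation between two frames
# and shift the moving frame back onto the fixed frame's grid.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def compensate_translation(fixed, moving):
    offset, _, _ = phase_cross_correlation(fixed, moving)   # (dy, dx) shift in pixels
    return nd_shift(moving, shift=offset, order=1)          # resample moving onto fixed

fixed = np.zeros((64, 64))
fixed[20:30, 20:30] = 1.0                                    # toy "heart" region
moving = np.roll(fixed, (3, -2), axis=(0, 1))                # breathing-like in-plane shift
registered = compensate_translation(fixed, moving)
```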
The decoder network of ANN 302 may be configured to receive the representation produced by the encoder network, decode the features of the input image 304 based on the representation, and generate a mask 306 (e.g., a pixel- or voxel-wise segmentation mask) for segmenting one or more objects (e.g., the LV and/or RV of a heart, the AHA heart segments, etc.) from the input image 304. The decoder network may also include a plurality of layers configured to perform up-sampling and/or transpose convolution (e.g., deconvolution) operations on the feature representation produced by the encoder network, and to recover spatial details of the input image 304. For instance, the decoder network may include one or more un-pooling layers and one or more convolutional layers. Through the un-pooling layers, the decoder network may up-sample the feature representation produced by the encoder network (e.g., based on pooled indices stored by the encoder network). The up-sampled representation may then be processed through the convolutional layers to produce one or more dense feature maps, before batch normalization is applied to the one or more dense feature maps to obtain a high dimensional representation of the input image 304. As described above, the output of the decoder network may include a segmentation mask for delineating one or more anatomical structures or regions from the input image 304. In examples, such a segmentation mask may correspond to a multi-class, pixel/voxel-wise probabilistic map in which pixels or voxels belonging to each of the multiple classes are assigned a high probability value indicating the classification of the pixels/voxels.
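The encoder/decoder pairing described above may be illustrated with the following minimal sketch, in which pooled indices from the encoder are reused for un-pooling in the decoder and a softmax produces a multi-class, pixel-wise probability map; the layer sizes and the three output classes (e.g., background, LV, RV) are illustrative assumptions rather than the disclosed architecture.

```python
# Sketch: tiny encoder/decoder segmentation network that stores pooled indices on
# the encoder side and reuses them for un-pooling on the decoder side.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)      # indices reused by the decoder
        self.unpool = nn.MaxUnpool2d(2)
        self.dec = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
                                 nn.Conv2d(16, num_classes, 1))

    def forward(self, x):
        f = self.enc(x)
        p, idx = self.pool(f)                                  # down-sample, keep pooled indices
        u = self.unpool(p, idx, output_size=f.shape)           # up-sample back to input resolution
        return torch.softmax(self.dec(u), dim=1)               # per-pixel class probabilities

mask = TinySegNet()(torch.randn(1, 1, 64, 64))                 # (1, 3, 64, 64) probability map
```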
Ireg=Imov(θ(x))  (1)
where x may represent coordinates in the moving image domain, θ(x) may represent the mapping of x to the fixed image domain, and Imov (θ(x)) may represent one or more grid sampling operations (e.g., using a sampler 406). θ may include parameters associated with an affine transformation model, which may allow for translation, rotation, scaling, and/or skew of the input image. θ may also include parameters associated with a deformable field (e.g., a dense deformation field), which may allow for deformation of the input image. For example, θ may include rigid parameters, B-spline control points, deformable parameters, and/or the like.
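Equation (1) may be illustrated, for the affine case, with the following sketch in which affine_grid plays the role of θ(x) (mapping moving-image coordinates into the fixed-image domain) and grid_sample plays the role of the sampler 406; the θ values shown are arbitrary example parameters, not values produced by the disclosed network.

```python
# Sketch of Equation (1) with an affine transformation model: build the sampling
# grid theta(x) and resample the moving image I_mov to obtain I_reg.
import torch
import torch.nn.functional as F

moving = torch.randn(1, 1, 64, 64)                     # I_mov
theta = torch.tensor([[[1.0, 0.0, 0.10],               # 2 x 3 affine parameters:
                       [0.0, 1.0, -0.05]]])            # identity plus a small translation
grid = F.affine_grid(theta, size=moving.shape, align_corners=False)     # theta(x)
registered = F.grid_sample(moving, grid, align_corners=False)           # I_reg = I_mov(theta(x))
```

A deformable field could replace the affine parameters by adding a dense displacement to an identity grid before the same sampling step.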
At 510, the loss calculated using one or more of the techniques described above may be used to determine whether one or more training termination criteria are satisfied. For example, the training termination criteria may be determined to be satisfied if the loss is below a threshold value or if the change in the loss between two training iterations falls below a threshold value. If the determination at 510 is that the termination criteria are satisfied, the training may end; otherwise, the presently assigned network parameters may be adjusted at 512, for example, by backpropagating a gradient descent of the loss function through the network before the training returns to 506.
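A simplified sketch of the loss-based termination check and the parameter adjustment is given below; the loss function, thresholds, and optimizer are illustrative assumptions rather than details of the disclosed training method.

```python
# Sketch: iterate until the loss (or its change between iterations) falls below a
# threshold (510); otherwise backpropagate and adjust the parameters (512) before
# returning to the top of the loop (506).
import torch

def train(model, loss_fn, data, lr=1e-3, loss_thresh=1e-3, delta_thresh=1e-5, max_iters=1000):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    prev_loss = float("inf")
    for _ in range(max_iters):                    # each pass corresponds to returning to 506
        loss = loss_fn(model, data)               # compute the training loss
        if loss.item() < loss_thresh or abs(prev_loss - loss.item()) < delta_thresh:
            break                                 # 510: a termination criterion is satisfied
        opt.zero_grad()
        loss.backward()                           # 512: backpropagate the loss
        opt.step()                                # adjust the presently assigned parameters
        prev_loss = loss.item()
    return model
```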
For simplicity of explanation, the training steps are depicted and described herein with a specific order. It should be appreciated, however, that the training operations may occur in various orders, concurrently, and/or with other operations not presented or described herein. Furthermore, it should be noted that not all operations that may be included in the training method are depicted and described herein, and not all illustrated operations are required to be performed.
The systems, methods, and/or instrumentalities described herein may be implemented using one or more processors, one or more storage devices, and/or other suitable accessory devices such as display devices, communication devices, input/output devices, etc.
Communication circuit 604 may be configured to transmit and receive information utilizing one or more communication protocols (e.g., TCP/IP) and one or more communication networks including a local area network (LAN), a wide area network (WAN), the Internet, and/or a wireless data network (e.g., a Wi-Fi, 3G, 4G/LTE, or 5G network). Memory 606 may include a storage medium (e.g., a non-transitory storage medium) configured to store machine-readable instructions that, when executed, cause processor 602 to perform one or more of the functions described herein. Examples of the machine-readable medium may include volatile or non-volatile memory including but not limited to semiconductor memory (e.g., electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM)), flash memory, and/or the like. Mass storage device 608 may include one or more magnetic disks such as one or more internal hard disks, one or more removable disks, one or more magneto-optical disks, one or more CD-ROM or DVD-ROM disks, etc., on which instructions and/or data may be stored to facilitate the operation of processor 602. Input device 610 may include a keyboard, a mouse, a voice-controlled input device, a touch sensitive input device (e.g., a touch screen), and/or the like for receiving user inputs to apparatus 600.
It should be noted that apparatus 600 may operate as a standalone device or may be connected (e.g., networked or clustered) with other computation devices to perform the tasks described herein. And even though only one instance of each component is shown in
While this disclosure has been described in terms of certain embodiments and generally associated methods, alterations and permutations of the embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure. In addition, unless specifically stated otherwise, discussions utilizing terms such as “analyzing,” “determining,” “enabling,” “identifying,” “modifying” or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data represented as physical quantities within the computer system memories or other such information storage, transmission or display devices.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other implementations will be apparent to those of skill in the art upon reading and understanding the above description.