The current invention relates to microbubble detection. More specifically, the invention relates to a method of detecting targeted microbubbles nondestructively using a deep neural network beamformer that processes channel data from dual-frequency transmissions.
Ultrasound imaging is attractive as a medical imaging modality because it is low cost, portable, non-invasive, and does not utilize ionizing radiation. However, conventional ultrasound imaging lacks the molecular specificity of alternative modalities such as magnetic resonance imaging and positron emission tomography. Recently, ultrasound molecular imaging (USMI) has been enabled by the introduction of targeted microbubbles (MBs). MBs are micron-sized gas bubbles encapsulated in a lipid shell, and are commonly used as an ultrasound contrast agent because of their strong scattering properties. The shells of MBs can be conjugated to bind to desired biomarkers with high specificity, and the bound MBs are subsequently detected using ultrasound. Thus, USMI can be used to detect molecular biomarkers with high specificity and high sensitivity.
USMI enables a wide range of applications, including the early detection of cancer. For instance, a biomarker associated with the development of tumor neovasculature called VEGFR-2 has been successfully targeted using MB contrast agents in preclinical studies for the detection of breast, prostate, and ovarian cancers in animal models.
However, clinical translation of USMI to human imaging faces several unique challenges that are often circumvented in preclinical imaging. For instance, preclinical tumors are often more accessible than human tumors (e.g., subcutaneous vs. deep). Most significantly, preclinical imaging studies commonly employ destructive-subtraction imaging (see
Moreover, destructive pulses intrinsically cannot be used for real-time imaging. Each time the MBs are destroyed, they must be replenished and given time to bind to the biomarkers (often upwards of 10 min.), leading to long examination times and potentially requiring higher dosages.
What is needed is a method of using USMI to detect bound MBs nondestructively, allowing the clinician to interrogate the tissue for MBs freely and in real time until a diagnosis is reached.
To address the needs in the art, a method of nondestructively detecting targeted contrast agents in real-time is provided that includes using a neural network (NN) beamformer, where an input of the NN includes ultrasound transducer channel data from a dual-frequency pulse-echo acquisition from a medium that may contain targeted contrast agents, where an output of the NN is an image of pixel-wise probability of the targeted contrast agent presence, where the NN nondestructively distinguishes the targeted contrast agent from tissue and noise by exploiting characteristic differences in responses of the targeted contrast agent versus responses from the tissue and noise present in the channel data of the dual-frequencies, where the NN is trained to operate according to destructive-subtraction ultrasound molecular imaging datasets that are used as a ground truth.
According to one aspect of the invention, the NN is configured to accept interleaved fundamental and harmonic frequency channel data, where the fundamental frequency acquisition includes one set of pulses at the imaging frequency, where the harmonic frequency acquisition includes two sets of pulses at half of the imaging frequency with opposite polarities that are summed.
In another aspect of the invention, the NN is configured to accept fundamental and harmonic frequency channel data, where the fundamental frequency acquisition includes one set of pulses at half of the imaging frequency, where the harmonic frequency acquisition includes a sum of said set of pulses at half of the imaging frequency with a second set of pulses at half of the imaging frequency with opposite polarities.
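The reason for summing two opposite-polarity pulses can be illustrated with a toy model. The sketch below assumes a hypothetical scatterer whose echo is `a*p + b*p**2` (a weakly nonlinear response, with `b = 0` standing in for linear tissue). This is a simplification, since real tissue also produces some harmonics through nonlinear propagation. The sampling rate and pulse shape are also assumptions. Summing the echoes from a pulse and its inverted copy cancels the linear term and leaves energy near twice the transmit frequency. This is why the summed 5 MHz pair carries a 10 MHz harmonic signal, while a single 5 MHz set still carries the fundamental response.

```python
import numpy as np

fs = 100e6                          # sampling rate [Hz], assumed
f0 = 5e6                            # transmit frequency: half of a 10 MHz imaging frequency
n = 200                             # 2 microseconds of samples
t = np.arange(n) / fs
pulse = np.sin(2 * np.pi * f0 * t) * np.hanning(n)


def echo(p, a=1.0, b=0.0):
    """Hypothetical scatterer: linear term a*p plus quadratic term b*p**2."""
    return a * p + b * p**2


def pulse_inversion(a, b):
    """Sum the echoes from a pulse and its polarity-inverted copy."""
    return echo(pulse, a, b) + echo(-pulse, a, b)


def band_energy(x, f_center, half_width=1.5e6):
    """Spectral energy of x within +/- half_width of f_center."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    return spec[np.abs(freqs - f_center) < half_width].sum()


tissue_sum = pulse_inversion(a=1.0, b=0.0)   # linear echoes cancel exactly
bubble_sum = pulse_inversion(a=1.0, b=0.2)   # leaves 2*b*p**2, i.e. DC and 2*f0 content

print(np.max(np.abs(tissue_sum)))            # 0.0
print(band_energy(bubble_sum, 2 * f0) > band_energy(bubble_sum, f0))
```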
In a further aspect of the invention, the dual-frequency pulse-echo acquisitions are performed using a plane wave or diverging wave synthetic transmit aperture technique.
In one aspect of the invention, the channel data acquisition includes the radiofrequency data acquired on all transducer elements.
According to another aspect of the invention, the channel data acquisition includes a downsampled form of the radiofrequency data acquired on all transducer elements.
In yet another aspect of the invention, the NN is trained to identify the contrast agents according to destructive-subtraction images that are used as the ground truth, where each destructive-subtraction image is formed by acquiring a pre-destruction image, eliminating the contrast agents from an imaging field of view using destruction, and subtracting a post-destruction image from the pre-destruction image, where the pre-destruction and post-destruction images are each formed using the best available temporal filtering techniques and beamforming methods.
In a further aspect of the invention, the pre-destruction and post-destruction images are reconstructed by using temporal filtering techniques that can include averaging a group of the channel data acquisitions comprising up to 30 frames and subsequently beamforming.
In a further aspect of the invention, the pre-destruction and post-destruction images are reconstructed using a beamforming method that can include delay-and-sum beamforming or short-lag spatial coherence (SLSC) beamforming, where the destructive-subtraction images are further enhanced using manual segmentation and image post-processing to eliminate artifacts.
In yet another aspect of the invention, training of the NN includes obtaining a pre-destruction dual-frequency channel data acquisition, passing the dual-frequency channel data acquisition into the NN to estimate a map of pixel-wise probability of the presence of the contrast agent (ŷ), applying a strong destructive pulse to eliminate contrast agents from an imaging field of view and forming a ground truth destructive-subtraction image (y), comparing ŷ with y using a loss function, and updating the parameters of the NN to minimize the loss function during the training.
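These training steps follow the usual supervised pattern: estimate ŷ, compare it with y through a loss, and update the parameters. The sketch below is a deliberately small stand-in, not the claimed network. A per-pixel logistic model on two synthetic, hypothetical features (loosely "fundamental" and "harmonic" strength) replaces the NN, binary cross-entropy is the only loss term, and all data are randomly generated rather than measured.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: y marks pixels a destruction-subtraction ground
# truth would label as MB; MBs are given extra "harmonic" energy.
M, N = 32, 32
y = (rng.random((M, N)) < 0.1).astype(float)
fund = rng.normal(1.0, 0.3, (M, N))              # tissue and MBs both scatter
harm = rng.normal(0.2, 0.1, (M, N)) + 0.8 * y    # hypothetical MB harmonic excess
X = np.stack([fund, harm], axis=-1)              # shape (M, N, 2)


def predict(theta, X):
    """Per-pixel logistic model, a minimal stand-in for f_theta(X)."""
    w, b = theta[:-1], theta[-1]
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def bce(y, y_hat, eps=1e-10):
    """Mean binary cross-entropy over all pixels."""
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))


theta = np.zeros(3)
lr = 0.5
losses = []
for epoch in range(200):
    y_hat = predict(theta, X)                 # pixel-wise probability map
    losses.append(bce(y, y_hat))              # compare y_hat with ground truth y
    err = (y_hat - y)[..., None]              # gradient of BCE w.r.t. each logit
    grad_w = np.mean(err * X, axis=(0, 1))
    grad_b = np.mean(err)
    theta -= lr * np.append(grad_w, grad_b)   # gradient-descent update

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```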
Targeted microbubbles (MBs) enable ultrasound molecular imaging (USMI) by binding to specific biomarkers and producing strong reflections to ultrasound. However, current USMI techniques are not easily translatable for clinical use. In particular, preclinical studies often utilize destruction-subtraction imaging, wherein a strong destructive pulse is used to destroy MBs to confirm their locations. This approach is potentially unsafe, and is intrinsically not real-time. The current invention provides a method of nondestructively detecting targeted contrast agents in real-time that includes using a neural network (NN) beamformer. Here, an input of the NN includes ultrasound transducer channel data from a dual-frequency pulse-echo acquisition from a medium that may contain targeted contrast agents, where an output of the NN is an image of pixel-wise probability of the targeted contrast agent presence. The NN nondestructively distinguishes the targeted contrast agent from tissue and noise by exploiting characteristic differences in responses of the targeted contrast agent versus responses from the tissue and noise present in the channel data of the dual-frequencies. Finally, the NN is trained to operate according to destructive-subtraction ultrasound molecular imaging datasets that are used as a ground truth.
In one exemplary embodiment, the network was trained using a total of 20 USMI datasets acquired in a mouse model of hepatocellular carcinoma and in microvessel flow phantoms. The network was then evaluated on 5 distinct datasets: a positive control, a negative control, and three previously unseen mouse tumors. Across the 5 datasets, the neural network achieved a mean area under the ROC curve (AUC) of 0.91 and Dice coefficient (DC) of 0.56 relative to the destruction-subtraction images. These results demonstrate that a neural network can nondestructively distinguish MBs from background tissue and noise by exploiting characteristic differences in their fundamental and harmonic responses. The nondestructive dual-frequency DNN beamformer enables safe and real-time USMI and can aid in the translation to clinical applications.
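The two figures of merit can be computed per test dataset and then averaged. This sketch assumes that the NN output is thresholded at 0.5 for the Dice coefficient, a threshold the text does not state. It computes the AUC with the rank-sum (Mann-Whitney U) identity, treating every pixel as one sample and averaging the ranks of tied scores.

```python
import numpy as np


def dice_coefficient(y_true, y_prob, threshold=0.5):
    """Dice coefficient between a binary mask and a thresholded probability map."""
    pred = y_prob >= threshold
    truth = y_true.astype(bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0


def roc_auc(y_true, y_prob):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    truth = np.ravel(y_true).astype(bool)
    scores = np.ravel(y_prob)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    # Ranks 1..n in sorted order; tied scores share the mean of their ranks.
    ends = np.cumsum(counts)
    ranks = (ends - (counts - 1) / 2.0)[inverse]
    n_pos, n_neg = truth.sum(), (~truth).sum()
    return (ranks[truth].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Tiny example: 4 pixels, 2 of them labeled MB.
y = np.array([0, 0, 1, 1])
p = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc(y, p), dice_coefficient(y, p))   # 0.75 and 2/3
```

Per-dataset values would then be averaged across the test datasets to obtain mean figures of the kind reported above.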
In another exemplary embodiment, networks were trained over a range of training hyperparameters using different combinations of input data configurations to identify the components essential to consistent and reproducible training. The networks did not train successfully when using fundamental frequency data alone and trained most successfully and consistently when using dual-frequency data as input.
The current invention builds on a previously developed coherence-based beamforming technique for USMI, which used correlations among the transducer element signals to enhance MBs and suppress background tissue, thereby improving destruction-subtraction imaging. That technique showed that the channel data contain valuable information that is inaccessible via traditional delay-and-sum beamforming. The current invention provides a clinically translatable method for forming high-quality USMI images nondestructively using a novel neural network beamformer.
In one aspect of the invention, the pre-destruction and post-destruction images are reconstructed using a beamforming method that can include delay-and-sum beamforming, or SLSC beamforming, or any other useful beamforming method, where the destructive-subtraction images are further enhanced using manual segmentation and image post-processing to eliminate artifacts.
According to one aspect of the invention, the NN is configured to accept interleaved fundamental and harmonic frequency channel data, where the fundamental frequency acquisition includes one set of 10 MHz pulses, and the harmonic frequency acquisition includes two sets of 5 MHz pulses with opposite polarities that are summed. Alternatively, the NN is configured to accept fundamental and harmonic frequency channel data, where the fundamental frequency acquisition includes one of two sets of 5 MHz pulses with opposite polarities, and the harmonic frequency acquisition includes the sum of the two sets.
In a further exemplary embodiment of the invention, USMI was performed in xenografted subcutaneous tumors in a mouse model of hepatocellular carcinoma. VEGFR-2-targeted BR55 MBs (Bracco, Milan, Italy) were injected via the tail vein. The MBs were allowed to circulate for 7 min. prior to imaging to provide sufficient time for targeted MBs to bind and for free MBs to be cleared. Low-mechanical-index nonlinear pulse sequences were used to perform USMI. The dual-frequency pulse-echo acquisitions were performed using a plane wave synthetic transmit aperture technique. Focal hotspots and inertial cavitation of the MBs were avoided by performing retrospective transmit beamforming of 7 plane waves transmitted at angles ranging from −9° to +9°. An L12-3v transducer was used to transmit pairs of 5 MHz pulses with inverted polarity and to receive signals bandpass filtered at 10 MHz. A Verasonics Vantage 256 research scanner and a custom GPU-based software beamformer were used to obtain radiofrequency (RF) signals from 128 transducer elements. The signals were demodulated and focused (i.e., delayed but not summed) onto an M×N pixel grid, yielding a complex-valued IQ dataset X ∈ ℂ^(M×N×128). The grid was sampled at 3 pixels per wavelength. In one aspect of the invention, the channel data acquisition includes a downsampled form of the radiofrequency data acquired on all transducer elements.
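The focusing step (delayed but not summed) can be sketched as follows. Several things here are assumptions not in the text: a speed of sound of 1540 m/s, an element pitch of 0.2 mm, an illustrative 128×64 pixel grid, linear interpolation of the IQ samples, and a time origin at the instant each plane wave crosses the array center. The 7 steered plane waves from −9° to +9° are summed coherently per pixel and per element. This coherent compounding is one common way to realize retrospective transmit focusing, and the result keeps all 128 channels.

```python
import numpy as np

# Only the 10 MHz demodulation frequency, 128 elements, 7 angles from -9 to +9
# degrees and 3 pixels per wavelength come from the text; the rest is assumed.
c = 1540.0                              # speed of sound [m/s], assumed
f_demod = 10e6                          # demodulation frequency [Hz]
wavelength = c / f_demod                # 154 micrometres
n_elem = 128
pitch = 0.2e-3                          # element pitch [m], assumed
x_elem = (np.arange(n_elem) - (n_elem - 1) / 2) * pitch

dx = wavelength / 3                     # 3 pixels per wavelength
x_pix = np.arange(-32, 32) * dx         # N = 64 lateral pixels (illustrative)
z_pix = 5e-3 + np.arange(128) * dx      # M = 128 axial pixels (illustrative)
Z, X = np.meshgrid(z_pix, x_pix, indexing="ij")   # each of shape (M, N)

angles = np.deg2rad(np.linspace(-9, 9, 7))        # 7 steered plane waves


def focusing_delays(theta):
    """Two-way delay [s]: plane-wave transmit at angle theta to each pixel,
    then back to each element. Shape (M, N, n_elem)."""
    tx = (Z * np.cos(theta) + X * np.sin(theta)) / c
    rx = np.sqrt((X[..., None] - x_elem) ** 2 + Z[..., None] ** 2) / c
    return tx[..., None] + rx


def focus_iq(iq_per_angle, fs):
    """Delay (but do not sum across elements) baseband IQ channel data.

    iq_per_angle: complex array (n_angles, n_samples, n_elem) sampled at fs,
    with t = 0 when each plane wave crosses the array center.
    Returns complex focused data (M, N, n_elem), compounded over angles.
    """
    t = np.arange(iq_per_angle.shape[1]) / fs
    out = np.zeros(Z.shape + (n_elem,), dtype=complex)
    for a, theta in enumerate(angles):
        tau = focusing_delays(theta)
        for e in range(n_elem):
            sig = iq_per_angle[a, :, e]
            re = np.interp(tau[..., e], t, sig.real, left=0.0, right=0.0)
            im = np.interp(tau[..., e], t, sig.imag, left=0.0, right=0.0)
            # Restore the carrier phase removed by demodulation, then add
            # this angle's contribution (coherent compounding).
            out[..., e] += (re + 1j * im) * np.exp(2j * np.pi * f_demod * tau[..., e])
    return out
```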
In one embodiment of the invention, the NN is trained to identify the contrast agents according to destructive-subtraction images that are used as the ground truth, where each destructive-subtraction image is formed by acquiring a pre-destruction image, eliminating the contrast agents from an imaging field of view using destruction, and subtracting a post-destruction image from the pre-destruction image, where the pre-destruction and post-destruction images are each formed by averaging a group of the channel data acquisitions comprising up to 30 frames and subsequently beamforming.
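A minimal sketch of this destruction-subtraction recipe follows. It assumes delay-and-sum beamforming with envelope detection. `das_envelope`, `averaged_image` and `destruction_subtraction` are hypothetical helper names. The data are synthetic, with one pixel containing an MB only before destruction.

```python
import numpy as np


def das_envelope(focused_iq):
    """Delay-and-sum over the channel axis of focused IQ data, then envelope."""
    return np.abs(focused_iq.sum(axis=-1))


def averaged_image(frames, beamform=das_envelope, max_frames=30):
    """Average up to max_frames focused channel-data frames, then beamform.

    frames: complex array (n_frames, M, N, n_chan)."""
    return beamform(frames[:max_frames].mean(axis=0))


def destruction_subtraction(pre_frames, post_frames, beamform=das_envelope):
    """Pre-destruction image minus post-destruction image."""
    return averaged_image(pre_frames, beamform) - averaged_image(post_frames, beamform)


# Synthetic example: static tissue everywhere, one bound MB before destruction.
rng = np.random.default_rng(1)
M, N, C, F = 16, 16, 8, 30


def complex_noise(shape, sigma=0.5):
    return sigma * (rng.normal(size=shape) + 1j * rng.normal(size=shape))


tissue = np.ones((M, N, C), dtype=complex)
mb = np.zeros((M, N, C), dtype=complex)
mb[8, 8, :] = 2.0
pre = tissue + mb + complex_noise((F, M, N, C))
post = tissue + complex_noise((F, M, N, C))
diff = destruction_subtraction(pre, post)
```

In `diff`, the MB pixel stands out while the static tissue largely cancels. Frame averaging reduces the noise before subtraction. A coherence-based beamformer could be passed as `beamform` instead of `das_envelope`.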
In a further example, receive USMI beamforming was performed using the coherence-based short-lag spatial coherence (SLSC) technique, which measured the average correlation coefficient across channel pairs with a spacing of at most 4 elements. Destruction-subtraction images were formed by acquiring images seven minutes after MB injection (pre-burst) and again after a strong destructive pulse (post-burst) and subtracting the post-burst SLSC image from the pre-burst SLSC image. These images were further manually segmented into a binary mask to eliminate obvious artifacts, resulting in a “ground truth” image denoted as y ∈ {0, 1}^(M×N).
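The SLSC ground-truth pipeline can be sketched as below. Only the maximum lag of 4 elements comes from the text. The axial kernel of 5 pixels, the averaging over lags rather than summing, and the fixed threshold standing in for manual segmentation are all assumptions.

```python
import numpy as np


def slsc(focused_iq, max_lag=4, kernel=5):
    """Short-lag spatial coherence image from focused (delayed, unsummed) IQ data.

    focused_iq: complex array (M, N, n_chan). For each lag m = 1..max_lag, the
    normalized correlation between channels i and i+m is computed over an axial
    kernel and averaged over all channel pairs; the lags are then averaged.
    """
    M = focused_iq.shape[0]
    pad = kernel // 2
    x = np.pad(focused_iq, ((pad, pad), (0, 0), (0, 0)))

    def axial_sum(a):
        # Moving sum over `kernel` axial samples; output has M rows.
        cs = np.cumsum(a, axis=0)
        cs = np.concatenate([np.zeros_like(cs[:1]), cs], axis=0)
        return cs[kernel:] - cs[:-kernel]

    energy = axial_sum(np.abs(x) ** 2)                          # (M, N, C)
    out = np.zeros((M, focused_iq.shape[1]))
    for m in range(1, max_lag + 1):
        cross = axial_sum(x[..., :-m] * np.conj(x[..., m:]))    # (M, N, C - m)
        denom = np.sqrt(energy[..., :-m] * energy[..., m:]) + 1e-12
        out += np.mean(np.real(cross) / denom, axis=-1)
    return out / max_lag


def ground_truth_mask(pre_iq, post_iq, threshold=0.2):
    """Destruction-subtraction SLSC image thresholded into a binary mask.
    The fixed threshold is a hypothetical stand-in for manual segmentation."""
    return (slsc(pre_iq) - slsc(post_iq) > threshold).astype(np.uint8)
```

Fully coherent channel data give an SLSC value near 1, while uncorrelated noise gives values near 0. This contrast is what the subtraction exploits once the MBs are destroyed.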
In the method of the current invention, a fully convolutional neural network is used to perform USMI. The network replaced the SLSC and destructive-subtraction components of beamforming. In one exemplary embodiment, a network was designed to accept the focused data demodulated at 10 MHz from two nondestructive pulse sequences: two 5 MHz inverted pulses (for second harmonic imaging) as well as a 10 MHz transmission (for fundamental imaging). Due to computational constraints, the focused channel data for each acquisition was downsampled to 16 channels via non-overlapping subaperture beamforming with subapertures of 8 elements each. Here, the acquired channel data from the nondestructive fundamental and harmonic acquisitions are denoted as Xf and Xh, respectively, and their concatenation is denoted Xfh. The output of the neural network is the pixel-wise probability of MB presence, ŷ ∈ [0, 1]^(M×N). The neural network includes 4 repeated blocks of the Conv2D, BatchNorm, and ReLU layers, followed by a softmax operation to obtain the pixel-wise probability distribution. The network was implemented using TensorFlow.
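The channel downsampling and input assembly can be sketched as follows. Summing 8 adjacent elements turns 128 focused channels into 16, and the fundamental and harmonic data are then concatenated into Xfh. The sketch assumes that complex values are split into real and imaginary channels before reaching the network, since the text does not specify this encoding. The helper names are hypothetical. The Conv2D/BatchNorm/ReLU layers themselves are not shown.

```python
import numpy as np


def subaperture_downsample(focused_iq, sub=8):
    """Sum non-overlapping groups of `sub` adjacent elements: (M, N, 128) -> (M, N, 16)."""
    M, N, C = focused_iq.shape
    if C % sub:
        raise ValueError("channel count must be divisible by the subaperture size")
    return focused_iq.reshape(M, N, C // sub, sub).sum(axis=-1)


def to_real_channels(x):
    """Stack real and imaginary parts along the channel axis (assumed encoding)."""
    return np.concatenate([x.real, x.imag], axis=-1).astype(np.float32)


def make_xfh(fund_iq, harm_iq):
    """Concatenate downsampled fundamental and harmonic channel data (Xfh)."""
    xf = subaperture_downsample(fund_iq)       # Xf: (M, N, 16) complex
    xh = subaperture_downsample(harm_iq)       # Xh: (M, N, 16) complex
    return to_real_channels(np.concatenate([xf, xh], axis=-1))   # (M, N, 64) real


ones = np.ones((4, 5, 128), dtype=complex)
print(make_xfh(ones, 2 * ones).shape)          # (4, 5, 64)
```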
In yet another aspect of the invention, training of the NN includes obtaining a pre-destruction dual-frequency channel data acquisition, passing the dual-frequency channel data acquisition into the NN to estimate a map of pixel-wise probability of the presence of the contrast agent (ŷ), applying a strong destructive pulse to eliminate contrast agents from an imaging field of view and forming a ground truth destructive-subtraction image (y), comparing ŷ with y using a loss function, and updating the parameters of the NN to minimize the loss function during the training.
More specifically, the network can be denoted as f_θ(Xfh) = ŷ, where θ contains the learnable parameters. The parameters were updated via gradient descent by iterating over a training set (described below) so as to minimize a loss function L(y, ŷ) summed over the training samples, i.e., θ̂ = argmin_θ Σ L(y, f_θ(Xfh)). A mixture of the cross-entropy loss function and the soft Dice similarity coefficient was used for L, with the pixel index p iterating over all M×N pixels, a mixing weight α=0.3 selected heuristically, and ε=10^−10 used for numerical stability. The network was trained to minimize L for 125 epochs, i.e., iterations over the training dataset.
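One way to write such a mixed loss is shown below. The text specifies only the two ingredients, α=0.3 and ε=10^−10. This sketch assumes that α weights the cross-entropy term and 1−α weights one minus the soft Dice coefficient, with both terms computed over all pixels of one sample. It is not necessarily the exact weighting used in the embodiment.

```python
import numpy as np


def mixed_loss(y, y_hat, alpha=0.3, eps=1e-10):
    """alpha * binary cross-entropy + (1 - alpha) * (1 - soft Dice).

    y: binary ground truth (M, N); y_hat: predicted probabilities (M, N).
    The weighting convention is an assumption."""
    ce = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    soft_dice = (2 * np.sum(y * y_hat) + eps) / (np.sum(y) + np.sum(y_hat) + eps)
    return alpha * ce + (1 - alpha) * (1 - soft_dice)


y = np.array([[0.0, 1.0], [1.0, 0.0]])
print(mixed_loss(y, y))          # close to 0 for a perfect prediction
print(mixed_loss(y, 1 - y))      # large for an inverted prediction
```

The Dice term counters the class imbalance between the few MB pixels and the many background pixels, which cross-entropy alone would let the background dominate.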
Regarding datasets and metrics, in one exemplary embodiment, a total of 25 distinct dual-frequency and destruction-subtraction datasets were obtained, with 5 acquisitions in a tissue-mimicking microvessel phantom (positive controls), one acquisition in a mouse abdomen prior to MB injection (negative control), and 19 acquisitions in mouse tumors 7 min. post-injection of targeted MBs. The 25 acquisitions were split into a training set of 20 and testing set of 5 acquisitions. Care was taken to ensure that the 25 acquisitions were acquired in different locations and tumors to avoid the inadvertent re-use of highly correlated data in the training and testing sets. For each acquisition, two frames of data were selected randomly to get two realizations of thermal noise. The datasets were then augmented two-fold by a left-to-right flip in both the azimuth and channel dimensions, and another two-fold by applying a constant π/3 radian complex phase rotation over the entire dataset, yielding a total of 160 training samples and 40 validation samples per input configuration. The network performance was then measured in the test dataset using the Dice coefficient and area under the ROC curve (AUC) metric.
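The augmentation and the resulting sample counts can be sketched as follows. The function names are hypothetical. The flip is applied to one acquisition's complex channel data before any fundamental/harmonic concatenation, so that reversing the channel axis does not swap the two blocks. The π/3 rotation multiplies every complex sample by e^(iπ/3) and leaves the label unchanged.

```python
import numpy as np


def augment(sample):
    """Return the 4 variants (2 flips x 2 phases) of one (X, y) sample.

    X: complex channel data (M, N, C) with N = azimuth and C = channels,
       taken from a single acquisition (before concatenation).
    y: binary mask (M, N)."""
    X, y = sample
    flipped = (X[:, ::-1, ::-1], y[:, ::-1])     # flip azimuth and channel order
    rot = np.exp(1j * np.pi / 3)                 # constant pi/3 radian phase rotation
    variants = []
    for Xv, yv in (sample, flipped):
        variants.append((Xv, yv))
        variants.append((Xv * rot, yv))          # the label is unchanged
    return variants


def count_samples(n_acquisitions, frames_per_acq=2, factor=4):
    """Acquisitions x randomly chosen frames x augmentation factor."""
    return n_acquisitions * frames_per_acq * factor


print(count_samples(20), count_samples(5))       # 160 40
```

With 2 frames per acquisition and 4 variants per frame, the 20 training and 5 held-out acquisitions give the 160 and 40 samples stated above.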
These results indicate that the neural network was able to distinguish MB signal from background tissue and noise using only the nondestructive dual-frequency channel data. Moreover, the quality of the results was comparable to that acquired using destruction-subtraction SLSC imaging, with accurate MB detection in the positive and negative controls as well as in the tumors. This shows that, through repetitive training, the network learned to detect characteristic frequency-dependent channel signal response of the MBs present in the nondestructive signals.
In another exemplary embodiment, the same NN was modified to accept different combinations of input data and trained with the same protocol. Nine separate configurations were compared: 1) Fundamental frequency 10 MHz only, denoted Xf; 2) Fundamental frequency 5 MHz (positive polarity) only, denoted Xp; 3) Sum of positive and negative polarity 5 MHz, denoted Xh; 4, 5, 6) Concatenation of Xp and Xh in channel data form, channel sum form, and detected envelope form, denoted Xph, Xphsum, and Xphenv, respectively; 7, 8, 9) Concatenation of Xf and Xh in channel data form, channel sum form, and detected envelope form, denoted Xfh, Xfhsum, and Xfhenv, respectively. For each of the nine configurations, the networks were trained across a range of learning rates from 10^−5 to 10^−1 by employing Bayesian hyperparameter optimization over 100 iterations.
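The nine input configurations can be assembled from three complex channel-data arrays. In this sketch, "channel sum form" is read as summing over the channel axis (a delay-and-sum image), and "detected envelope form" as its magnitude. These readings, the helper names and the dictionary keys are assumptions.

```python
import numpy as np


def channel_form(a, b):
    """Concatenate two (M, N, C) complex acquisitions along the channel axis."""
    return np.concatenate([a, b], axis=-1)


def channel_sum_form(a, b):
    """Sum each acquisition over channels, then stack: (M, N, 2) complex."""
    return np.stack([a.sum(axis=-1), b.sum(axis=-1)], axis=-1)


def envelope_form(a, b):
    """Detected envelope (magnitude) of each channel-summed acquisition."""
    return np.abs(channel_sum_form(a, b))


def build_inputs(Xf, Xp, Xh):
    """The nine input configurations, keyed by the names used in the text."""
    return {
        "Xf": Xf,
        "Xp": Xp,
        "Xh": Xh,
        "Xph": channel_form(Xp, Xh),
        "Xphsum": channel_sum_form(Xp, Xh),
        "Xphenv": envelope_form(Xp, Xh),
        "Xfh": channel_form(Xf, Xh),
        "Xfhsum": channel_sum_form(Xf, Xh),
        "Xfhenv": envelope_form(Xf, Xh),
    }


a = np.ones((3, 4, 16), dtype=complex)
print({k: v.shape for k, v in build_inputs(a, a, a).items()})
```

The channel-sum and envelope forms discard the per-element information. Comparing them against the channel-data forms therefore tests whether the network relies on channel-level structure.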
The manually segmented destruction-subtraction SLSC images were treated as ground truth in this example. Although destruction-subtraction is currently considered the gold standard for MB confirmation, even these images contained significant amounts of noise, leading to a potential mislabeling of pixels. For instance, it was unclear in
An important consequence of these exemplary results is that MBs were detected nondestructively using the neural network beamformer, a critical step towards enabling safe and real-time USMI for translation to clinical applications.
To summarize these examples, a novel neural-network-based beamformer is provided for the purpose of achieving safe and real-time USMI. The network was designed to utilize nondestructive channel data acquired at two distinct frequencies, and to produce a pixel-wise estimate of MB probability. The network was trained using a total of 20 USMI datasets acquired in a mouse model of hepatocellular carcinoma and in microvessel flow phantoms. The network was then evaluated on 5 distinct datasets: a positive control, negative control, and three previously unseen mouse tumors. Across the 5 datasets, the neural network achieved a mean AUC of 0.91 and DC of 0.56 compared to the destruction-subtraction images. These results demonstrate that a neural network can nondestructively distinguish MBs from background tissue and noise by exploiting characteristic differences in their fundamental and harmonic responses. The network was also found unable to learn when using only fundamental frequency data as input, was able to learn suboptimally when using only harmonic frequency data as input, and learned optimally when using both fundamental and harmonic data together. The nondestructive dual-frequency DNN beamformer enables safe and real-time USMI and can aid in the translation to clinical applications.
The present invention has now been described in accordance with several exemplary embodiments, which are intended to be illustrative in all aspects, rather than restrictive. Thus, the present invention is capable of many variations in detailed implementation, which may be derived from the description contained herein by a person of ordinary skill in the art. For example, the invention can be used with any transmit pulse sequence, including diverging wave transmissions, focused transmissions, and coded excitations. The invention can be used with different combinations of ultrasonic frequencies and harmonics beyond the fundamental and second harmonics. Alternative preprocessing and post-processing can be performed besides channel downsampling and manual segmentation. The same methodology applies to alternative contrast agents with similar frequency characteristics to microbubbles, such as “nanodroplets” or “nanobubbles”, or microbubbles that have been loaded with a therapeutic agent. The ground truth images for training the neural network can be obtained using any of a variety of contrast agent imaging methods, including but not limited to difference imaging, spatial coherence imaging, acoustic angiography, and acoustic radiation force-induced motion imaging techniques. More sophisticated neural network architectures than the one employed here could yield improved results. The invention can be used for volumetric imaging in conjunction with a translating arm, such as an automated breast volume scanner system, or using matrix array transducers.
All such variations are considered to be within the scope and spirit of the present invention as defined by the following claims and their legal equivalents.
This application claims priority from U.S. Provisional Patent Application 62/721,950 filed Aug. 23, 2018, which is incorporated herein by reference.
This invention was made with Government support under contract EB022770 awarded by the National Institutes of Health. The Government has certain rights in the invention.
Number | Date | Country
---|---|---
62721950 | Aug 2018 | US