Ultrasound imaging is a useful medical imaging modality. For example, internal structures of a patient's body may be imaged before, during, or after a therapeutic intervention. In addition, qualitative and quantitative observations in an ultrasound image can be a basis for diagnosis; for example, ventricular volume determined via ultrasound is a basis for diagnosing conditions such as ventricular systolic dysfunction and diastolic heart failure.
A healthcare professional typically holds a portable ultrasound probe, sometimes called a “transducer,” in proximity to the patient and moves the transducer as appropriate to visualize one or more target structures in a region of interest in the patient. A transducer may be placed on the surface of the body or, in some procedures, a transducer is inserted inside the patient's body. The healthcare professional coordinates the movement of the transducer so as to obtain a desired representation on a screen, such as a two-dimensional cross-section of a three-dimensional volume.
Particular views of an organ or other tissue or body feature (such as fluids, bones, joints or the like) can be clinically significant. Such views may be prescribed by clinical standards as views that should be captured by the ultrasound operator, depending on the target organ, diagnostic purpose or the like.
In some ultrasound images, it is useful to identify anatomical structures visualized in the image. For example, in an ultrasound image view showing a particular organ, it can be useful to identify constituent structures within the organ. As one example, in some views of the heart, constituent structures are visible, such as the left and right atria; left and right ventricles; and aortic, mitral, pulmonary, and tricuspid valves.
Existing software solutions have sought to identify such structures automatically. These existing solutions seek to “detect” a structure by specifying a horizontal bounding box in which the structure is visible, or “segment” the structure by identifying the individual pixels in the image that show the structure.
The inventors have recognized that conventional approaches to anatomical structure identification in ultrasound images have significant disadvantages.
In particular, the inventors have recognized that conventional detection techniques provide inadequate detail for many typical diagnostic uses of ultrasound images. As examples: (1) a horizontal bounding box surrounding an aorta cross-section is not adequate to determine a long-axis diameter of the aorta cross-section in every orientation of the aorta cross-section to the ultrasound image; (2) a horizontal bounding box surrounding a left ventricular outflow tract (“LVOT”) is not adequate to place a Doppler sample gate to capture a pulse wave Doppler signal specific to the LVOT in every orientation of the LVOT to the ultrasound image; and (3) a horizontal bounding box surrounding a B-line reverberation artifact in lung ultrasound is not adequate to determine the angular width of the B-line.
The inventors have further recognized that conventional segmentation techniques have a high computational resource cost, which necessitates the use of expensive, powerful computing hardware and/or precludes real-time or near-real-time operation. Additionally, for many purposes conventional segmentation techniques are overkill, in the sense that the high level of detail of the individual-pixel surfaces they delineate is unnecessary (and often even disadvantageous) for typical diagnostic uses of ultrasound images such as those listed above.
In response to recognizing these disadvantages, the inventors have conceived and reduced to practice a software and/or hardware facility that automatically detects and quantifies anatomical structures in an ultrasound image using a customized shape prior (“the facility”). For a particular ultrasound application, the facility defines an anatomical structure whose instances are to be identified in ultrasound images, as well as a shape to fit to identified structure instances (the “shape prior”) and the attributes of that shape. The facility uses this information to define and train a structure identification model for the structure. The facility applies this model to ultrasound images; for each instance of the structure visualized in an image, the model returns the attributes of the shape fitted to that structure instance. In some embodiments, the facility uses the shape's attributes to superimpose the shape on a display of the ultrasound image.
In some embodiments, the facility uses one or more of the shape's attributes as a diagnostic value for the patient. In various embodiments, the facility uses the shape's attributes for a variety of other purposes.
In some embodiments, the structure identification model used by the facility is derived from You Only Look Once (“YOLO”) models, described in (1) Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016, available at arxiv.org/pdf/1506.02640v5.pdf; (2) Redmon, Joseph and Ali Farhadi, “YOLO9000: Better, Faster, Stronger,” University of Washington, Allen Institute for AI, 2016, available at arxiv.org/pdf/1612.08242v1.pdf; (3) Redmon, Joseph and Ali Farhadi, “YOLOv3: An Incremental Improvement,” University of Washington, available at pjreddie.com/media/files/papers/YOLOv3.pdf; and (4) Hurtik, Petr, et al., “Poly-YOLO: Higher Speed, More Precise Detection and Instance Segmentation for YOLOv3,” arXiv preprint arXiv:2005.13243, 2020, available at arxiv.org/pdf/2005.13243, each of which is hereby incorporated by reference in its entirety. In cases where a document incorporated herein by reference conflicts with the present disclosure, the present disclosure controls. Briefly, YOLO models divide each input image into rectangular regions rotationally aligned with the borders of the image. For each region, the model outputs a probability that the region contains at least part of a target structure instance, as well as attributes defining a horizontal bounding box around the structure instance that occurs in the region.
In defining its per-application structure identification models, the facility defines (a) the shape prior and its attributes, and (b) a shape, size, and arrangement of regions well-suited to the shape prior and common distributions of the shape prior in typical ultrasound images. The facility chooses an architecture for the model suited to an output in which one or more dimensions enumerate the regions, and a final dimension contains, for each region: (1) the probability that the region contains at least part of a structure instance, and (2) the attributes of the shape prior. The facility uses ultrasound images showing the structure to train the model defined as described above. Once trained, the facility applies the model to detect instances of the structure and produce the attributes of the shape prior fitted to each structure instance. In some embodiments, the facility uses the results of applying the model as a basis for instructing the operator to reposition and/or reorient the transducer, and/or adjusting Doppler analysis results based on a rotational angle of the fitted shape prior.
As one example, in some embodiments, to detect aorta cross-sections in ultrasound images of the heart, the facility defines an elliptical shape prior, and the following attributes for it: probability of presence, center X and Y coordinates, long axis diameter and rotational angle, and short axis diameter. Based upon the elliptical shape prior and common distributions of aortic cross-sections in typical ultrasound images, the facility selects a grid of rectangular regions.
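Although the disclosure does not prescribe any particular data structure, the following Python sketch (with hypothetical names) illustrates the convention just described, in which each reference region carries a presence probability followed by the shape prior's attributes:

```python
from dataclasses import dataclass

@dataclass
class ShapePriorSpec:
    """Illustrative description of a per-application shape prior."""
    name: str                    # e.g., "ellipse", "rotatable_rectangle", "circle_sector"
    attribute_names: list[str]   # attributes fitted per detected structure instance

def output_tensor_shape(grid_rows: int, grid_cols: int, spec: ShapePriorSpec) -> tuple:
    # One presence probability plus one value per shape-prior attribute,
    # for every reference region in the grid.
    return (grid_rows, grid_cols, 1 + len(spec.attribute_names))

ellipse = ShapePriorSpec("ellipse",
                         ["center_x", "center_y",
                          "long_axis_diameter", "rotation_angle",
                          "short_axis_diameter"])
assert output_tensor_shape(4, 4, ellipse) == (4, 4, 6)
```

Under this convention, the rotatable-rectangle and circle-sector examples below differ only in their attribute lists.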
As another example, in some embodiments, to detect LVOTs in ultrasound images of the heart, the facility defines a rotatable rectangle shape prior, and the following attributes for it: probability of presence, center X and Y coordinates, long axis diameter and rotational angle, and short axis diameter. Based upon the rotatable rectangle shape prior and common distributions of LVOTs in typical ultrasound images, the facility selects a grid of rectangular regions.
As an additional example, in some embodiments, to detect the interior of blood vessels in ultrasound images of them, the facility defines a rotatable rectangle shape prior, and the following attributes for it: probability of presence, center X and Y coordinates, long axis diameter and rotational angle, and short axis diameter. Based upon the rotatable rectangle shape prior and common distributions of blood vessels in typical ultrasound images, the facility selects a grid of rectangular regions.
As a further example, in some embodiments, to detect B-lines in ultrasound images of the lung, the facility defines a circle-sector shape prior defined with respect to the top and bottom of the ultrasound image cone, and the following attributes for it: probability of presence, directed angle between a line perpendicular to the center of the active surface of the probe (sometimes called the “scanning axis”) and the center of the sector, and angular width of the sector. Based upon the sector shape prior and common distributions of B-lines in typical ultrasound images, the facility selects a series of circle-sectors spanning the bottom of the ultrasound image cone. These four examples are discussed in further detail below.
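As an illustration of how the sector attributes can be turned into an on-screen overlay, the following hypothetical helper traces a circle-sector from the cone origin; the coordinate conventions (image y growing downward, angle 0 along the scanning axis) are assumptions, not part of the disclosure:

```python
import math

def sector_outline(apex_xy, radius, center_angle_deg, width_deg, n=16):
    """Polyline tracing a circle-sector overlay for a detected B-line.

    apex_xy: (x, y) of the ultrasound cone origin (top of the cone).
    center_angle_deg: directed angle from the scanning axis to the
        sector center; 0 means straight along the scanning axis.
    width_deg: angular width of the sector.
    """
    ax, ay = apex_xy
    half = width_deg / 2.0
    pts = [apex_xy]
    for i in range(n + 1):
        a = math.radians(center_angle_deg - half + i * width_deg / n)
        # Image y grows downward, so the scanning axis points "down".
        pts.append((ax + radius * math.sin(a), ay + radius * math.cos(a)))
    pts.append(apex_xy)  # close the sector back at the cone origin
    return pts
```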
By performing in some or all of these ways, the facility rapidly and efficiently identifies instances of a target anatomical structure visualized in an ultrasound image, and directly provides diagnostically useful values for those structure instances.
Additionally, the facility improves the functioning of computer or other hardware, such as by reducing the dynamic display area, processing, storage, and/or data transmission resources needed to perform a certain task, thereby enabling the task to be performed by less capable, capacious, and/or expensive hardware devices, performed with less latency, and/or performed while preserving more of the conserved resources for use in performing other tasks. For example, by not seeking to identify every individual pixel showing a structure instance, the facility can reduce the processing load on the computing device in which the facility is implemented, permitting it to be outfitted with a less powerful and less expensive processor, or permitting it to undertake more or larger simultaneous processing tasks.
The probe 12 is configured to transmit an ultrasound signal toward a target structure and to receive echo signals returning from the target structure in response to transmission of the ultrasound signal. The probe 12 includes an ultrasound sensor 20 that, in various embodiments, may include an array of transducer elements (e.g., a transducer array) capable of transmitting an ultrasound signal and receiving subsequent echo signals.
The device 10 further includes processing circuitry and driving circuitry. In part, the processing circuitry controls the transmission of the ultrasound signal from the ultrasound sensor 20. The driving circuitry is operatively coupled to the ultrasound sensor 20 for driving the transmission of the ultrasound signal, e.g., in response to a control signal received from the processing circuitry. The driving circuitry and processing circuitry may be included in one or both of the probe 12 and the handheld computing device 14. The device 10 also includes a power supply that provides power to the driving circuitry for transmission of the ultrasound signal, for example, in a pulsed wave or a continuous wave mode of operation.
The ultrasound sensor 20 of the probe 12 may include one or more transmit transducer elements that transmit the ultrasound signal and one or more receive transducer elements that receive echo signals returning from a target structure in response to transmission of the ultrasound signal. In some embodiments, some or all of the transducer elements of the ultrasound sensor 20 may act as transmit transducer elements during a first period of time and as receive transducer elements during a second period of time that is different than the first period of time (i.e., the same transducer elements may be usable to transmit the ultrasound signal and to receive echo signals at different times).
The computing device 14 shown in
In some embodiments, the display screen 22 may be a touch screen capable of receiving input from an operator that touches the screen. In such embodiments, the user interface 24 may include a portion or the entire display screen 22, which is capable of receiving operator input via touch. In some embodiments, the user interface 24 may include one or more buttons, knobs, switches, and the like, capable of receiving input from an operator of the ultrasound device 10. In some embodiments, the user interface 24 may include a microphone 30 capable of receiving audible input, such as voice commands.
The computing device 14 may further include one or more audio speakers 28 that may be used to output acquired or conditioned auscultation signals, or audible representations of echo signals, blood flow during Doppler ultrasound imaging, or other features derived from operation of the device 10.
The probe 12 includes a housing, which forms an external portion of the probe 12. The housing includes a sensor portion located near a distal end of the housing, and a handle portion located between a proximal end and the distal end of the housing. The handle portion is proximally located with respect to the sensor portion.
The handle portion is a portion of the housing that is gripped by an operator to hold, control, and manipulate the probe 12 during use. The handle portion may include gripping features, such as one or more detents, and in some embodiments, the handle portion may have a same general shape as portions of the housing that are distal to, or proximal to, the handle portion.
The housing surrounds internal electronic components and/or circuitry of the probe 12, including, for example, electronics such as driving circuitry, processing circuitry, oscillators, beamforming circuitry, filtering circuitry, and the like. The housing may be formed to surround or at least partially surround externally located portions of the probe 12, such as a sensing surface. The housing may be a sealed housing, such that moisture, liquid or other fluids are prevented from entering the housing. The housing may be formed of any suitable materials, and in some embodiments, the housing is formed of a plastic material. The housing may be formed of a single piece (e.g., a single material that is molded surrounding the internal components) or may be formed of two or more pieces (e.g., upper and lower halves) which are bonded or otherwise attached to one another.
In some embodiments, the probe 12 includes a motion sensor. The motion sensor is operable to sense a motion of the probe 12. The motion sensor is included in or on the probe 12 and may include, for example, one or more accelerometers, magnetometers, or gyroscopes for sensing motion of the probe 12. For example, the motion sensor may be or include any of a piezoelectric, piezoresistive, or capacitive accelerometer capable of sensing motion of the probe 12. In some embodiments, the motion sensor is a tri-axial motion sensor capable of sensing motion about any of three axes. In some embodiments, more than one motion sensor is included in or on the probe 12. In some embodiments, the motion sensor includes at least one accelerometer and at least one gyroscope.
The motion sensor may be housed at least partially within the housing of the probe 12. In some embodiments, the motion sensor is positioned at or near the sensing surface of the probe 12. In some embodiments, the sensing surface is a surface which is operably brought into contact with a patient during an examination, such as for ultrasound imaging or auscultation sensing. The ultrasound sensor 20 and one or more auscultation sensors are positioned on, at, or near the sensing surface.
In some embodiments, the transducer array of the ultrasound sensor 20 is a one-dimensional (1D) array or a two-dimensional (2D) array of transducer elements. The transducer array may include piezoelectric ceramics, such as lead zirconate titanate (PZT), or may be based on microelectromechanical systems (MEMS). For example, in various embodiments, the ultrasound sensor 20 may include piezoelectric micromachined ultrasonic transducers (PMUT), which are MEMS-based piezoelectric ultrasonic transducers, or the ultrasound sensor 20 may include capacitive micromachined ultrasound transducers (CMUT) in which the energy transduction is provided due to a change in capacitance.
The ultrasound sensor 20 may further include an ultrasound focusing lens, which may be positioned over the transducer array, and which may form a part of the sensing surface. The focusing lens may be any lens operable to focus a transmitted ultrasound beam from the transducer array toward a patient and/or to focus a reflected ultrasound beam from the patient to the transducer array. The ultrasound focusing lens may have a curved surface shape in some embodiments. The ultrasound focusing lens may have different shapes, depending on a desired application, e.g., a desired operating frequency, or the like. The ultrasound focusing lens may be formed of any suitable material, and in some embodiments, the ultrasound focusing lens is formed of a room-temperature-vulcanizing (RTV) rubber material.
In some embodiments, first and second membranes are positioned adjacent to opposite sides of the ultrasound sensor 20 and form a part of the sensing surface. The membranes may be formed of any suitable material, and in some embodiments, the membranes are formed of a room-temperature-vulcanizing (RTV) rubber material. In some embodiments, the membranes are formed of a same material as the ultrasound focusing lens.
In act 301, the facility chooses a shape prior for the target structure. As discussed below in more detail, the facility typically chooses a simple shape that best matches the shape of common examples of the target structure in typical ultrasound images. For example, for generally round aortic cross-sections, the facility chooses an elliptical shape prior; for generally triangular or wedge-shaped pulmonary B-lines, it chooses a circle-sector shape prior.
In act 302, the facility determines attributes that can be used to fit the shape prior chosen in act 301 to each detected instance of the target structure. In the case of an elliptical shape prior chosen for aortic cross-sections, in some embodiments, the facility determines the following attributes: X and Y coordinates of the center of the ellipse; long-axis and short-axis diameters of the ellipse; and angle of rotation between the long axis of the ellipse and the centerline of the ultrasound cone. In some embodiments, these attributes also include a presence probability attribute to represent the likelihood that each reference region contains an instance of the target structure.
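The disclosure does not specify how labeled attribute values are produced for training; as one hedged possibility, they could be derived from expert-annotated boundary contours using OpenCV's ellipse fitting, as in this sketch (the function name and attribute keys are illustrative):

```python
import numpy as np
import cv2  # OpenCV: one possible way to derive ellipse labels

def ellipse_attributes(contour_xy: np.ndarray) -> dict:
    """Fit an ellipse to an annotated cross-section boundary.

    contour_xy: (N, 2) float32 array of annotated boundary points (N >= 5).
    Presence probability is 1.0 for an annotated instance.
    """
    (cx, cy), (d1, d2), angle_deg = cv2.fitEllipse(contour_xy.astype(np.float32))
    long_d, short_d = max(d1, d2), min(d1, d2)
    # OpenCV reports the rotation of the first axis; if that axis is the
    # short one, rotate by 90 degrees so the angle refers to the long axis.
    # (Axis/angle conventions vary across OpenCV versions.)
    if d1 < d2:
        angle_deg = (angle_deg + 90.0) % 180.0
    return {"presence": 1.0, "center_x": cx, "center_y": cy,
            "long_axis_diameter": long_d, "rotation_angle": angle_deg,
            "short_axis_diameter": short_d}
```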
In act 303, the facility determines the shape and arrangement of reference regions. In the case of aortic cross-sections, in some embodiments, the facility determines that a grid of rectangular reference regions will be used. In the case of pulmonary B-lines, the facility determines that a series of circle-sectors will be used.
In act 304, the facility defines a model whose input is an ultrasound image, and whose output is a multidimensional tensor. The output tensor includes one or more dimensions for traversing the reference regions, and an additional dimension that is a vector containing (a) a target structure presence probability for the reference region, and (b) attribute values fitting the shape prior to a target structure instance occurring in the reference region. Example model definitions are discussed below in connection with
In act 305, the facility trains the model defined in act 304 using ultrasound images showing the target structure. After act 305, this process concludes.
The model takes a 128×128×1 ultrasound image 620 as its input, and produces a 4×4×N tensor 690 as its output. The model first subjects the input ultrasound image to a convolutional block made up of 2D convolutional layer 631, 2D batch normalization layer 632, and leaky ReLU activation function layer 633, followed by a convolutional block made up of 2D convolutional layer 634, 2D batch normalization layer 635, and leaky ReLU activation function layer 636. The model then proceeds to downsample layer 640, followed by a convolutional block made up of 2D convolutional layer 641, 2D batch normalization layer 642, and leaky ReLU activation function layer 643, and a convolutional block made up of 2D convolutional layer 644, 2D batch normalization layer 645, and leaky ReLU activation function layer 646. The model then proceeds to downsample layer 650, followed by a convolutional block made up of 2D convolutional layer 651, 2D batch normalization layer 652, and leaky ReLU activation function layer 653, and a convolutional block made up of 2D convolutional layer 654, 2D batch normalization layer 655, and leaky ReLU activation function layer 656. The model then proceeds to downsample layer 660, followed by a convolutional block made up of 2D convolutional layer 661, 2D batch normalization layer 662, and leaky ReLU activation function layer 663. The model then proceeds to downsample layer 670, followed by a convolutional block made up of 2D convolutional layer 671, 2D batch normalization layer 672, and leaky ReLU activation function layer 673. The model then proceeds to downsample layer 680, followed by a convolutional block made up of 2D convolutional layer 681, 2D batch normalization layer 682, and leaky ReLU activation function layer 683. Leaky ReLU activation function layer 683 produces the output tensor. In various embodiments, the facility uses a variety of neural network architectures and other machine learning model architectures to produce similar results.
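The layer sequence above maps directly onto a small convolutional network. The following PyTorch sketch mirrors that sequence (two convolutional blocks before each of the first three downsample layers, single blocks thereafter); the channel widths, kernel sizes, and stride-2 downsampling are assumptions, since the text specifies only the block types and the 128×128×1 input and 4×4×N output shapes:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Convolutional block: 2D convolution, 2D batch normalization, leaky ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class ShapePriorDetector(nn.Module):
    """Sketch of the 128x128x1 -> 4x4xN network described above.

    Five stride-2 downsample stages reduce 128x128 to 4x4; channel
    widths are illustrative assumptions.
    """
    def __init__(self, n_outputs: int = 6):
        super().__init__()
        down = lambda c: nn.Conv2d(c, c, kernel_size=3, stride=2, padding=1)
        self.net = nn.Sequential(
            conv_block(1, 16), conv_block(16, 16),
            down(16),                      # 128 -> 64
            conv_block(16, 32), conv_block(32, 32),
            down(32),                      # 64 -> 32
            conv_block(32, 64), conv_block(64, 64),
            down(64),                      # 32 -> 16
            conv_block(64, 64),
            down(64),                      # 16 -> 8
            conv_block(64, 64),
            down(64),                      # 8 -> 4
            conv_block(64, n_outputs),     # final block emits the N-vector
        )

    def forward(self, x):                  # x: (batch, 1, 128, 128)
        return self.net(x).permute(0, 2, 3, 1)  # (batch, 4, 4, N)
```

With n_outputs set to 6, the network reproduces the 4×4×6 output described below for the aortic cross-section application.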
In some embodiments, the facility allocates the first two dimensions of the output tensor to identify each of the reference regions, such as in a grid of 4×4 reference regions. In some embodiments, the facility selects the following ellipse shape prior attributes for use in the model it defines for detecting aortic cross-sections: probability of presence, center X and Y coordinates, long-axis diameter and rotational angle, and short-axis diameter. Thus, the N value sizing the vector that makes up the final dimension of the output tensor for the aortic cross-section application is 6.
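A hypothetical decoding step might then read the fitted ellipses out of that tensor as follows; the channel ordering and the sigmoid applied to the presence logit are assumptions:

```python
import torch

def decode_ellipses(output: torch.Tensor, threshold: float = 0.5):
    """Decode a (4, 4, 6) output tensor into fitted ellipses.

    Assumed per-region layout: [presence, center_x, center_y,
    long_axis_diameter, rotation_angle, short_axis_diameter].
    """
    detections = []
    for row in range(output.shape[0]):
        for col in range(output.shape[1]):
            v = output[row, col]
            p = torch.sigmoid(v[0]).item()  # presence probability
            if p >= threshold:
                detections.append({
                    "region": (row, col), "presence": p,
                    "center_x": v[1].item(), "center_y": v[2].item(),
                    "long_axis_diameter": v[3].item(),
                    "rotation_angle": v[4].item(),
                    "short_axis_diameter": v[5].item(),
                })
    return detections
```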
In some embodiments, the facility assists the operator in aligning the scanning axis to be parallel to the fitted rectangle. In some such embodiments, the facility determines (1) the orientation of the LVOT rectangle, and (2) the directed angle from the scanning axis to a line between the origin of the ultrasound cone and the center of the LVOT rectangle. If the LVOT orientation is aligned with the scanning axis, the facility indicates to the user that the orientation is optimal for Doppler data acquisition. If the LVOT orientation is not parallel to the scanning axis, the facility indicates to the user that the probe needs to be angled left or right (depending on how the LVOT orientation and the scanning axis align). In the example shown in
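A minimal sketch of this guidance logic follows; the sign convention for the directed angle and the tolerance value are assumptions, as the disclosure ties the left/right instruction to the alignment without fixing a convention:

```python
def lvot_alignment_guidance(lvot_angle_deg: float,
                            tolerance_deg: float = 5.0) -> str:
    """Operator guidance from the directed angle between the scanning
    axis and the LVOT rectangle's long axis.

    Hypothetical convention: positive angle means the LVOT is rotated
    clockwise relative to the scanning axis.
    """
    if abs(lvot_angle_deg) <= tolerance_deg:
        return "Orientation is optimal for Doppler data acquisition."
    side = "right" if lvot_angle_deg > 0 else "left"
    return f"Angle the probe to the {side}."
```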
In some vascular Doppler cases, the operator will not be able to orient the probe parallel to the blood flow due to anatomical constraints. However, because of the tubular shape of the vasculature, in some embodiments the facility assumes a homogeneous direction of blood flow and applies Doppler angle correction to adjust the calculation of blood velocity. In particular, based on the angle θ between the scanning axis and the vasculature, in some embodiments the facility adjusts the blood velocity using the following equation:
v_correct = v_Doppler / cos(θ)

where v_Doppler is the measured Doppler velocity. To improve system reliability, in some embodiments the facility applies such correction only when θ is smaller than a certain threshold, such as 60 degrees.
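A minimal sketch of this correction and the threshold check, assuming θ is supplied in degrees:

```python
import math

def corrected_velocity(v_doppler: float, theta_deg: float,
                       max_angle_deg: float = 60.0):
    """Doppler angle correction: v_correct = v_Doppler / cos(theta).

    Returns None when theta meets or exceeds the reliability threshold,
    since cos(theta) approaching zero amplifies measurement noise.
    """
    if theta_deg >= max_angle_deg:
        return None  # correction considered unreliable at steep angles
    return v_doppler / math.cos(math.radians(theta_deg))
```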
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, applications and publications to provide yet further embodiments.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.