Unintentional injury and trauma continue to be a widespread issue throughout the United States. For example, estimates for the year 2016 indicate that unintentional injury or trauma was the third leading cause of death in the United States. This estimate is not limited to a particular subset of the population, but rather spans all age groups (1-44 years old) and populations. Among unintentional injuries and traumas, unintentional motor vehicle and traffic accidents, unintentional falls and injuries, and firearm injuries were the categories with the highest likelihood of morbidity and mortality.
Typically, the probability of survival diminishes rapidly with both the severity of the trauma incident and the time elapsed since the trauma incident. For example, highly severe trauma incidents (e.g., hemorrhage) require more prompt attention than less severe incidents, as the patient's condition can rapidly deteriorate. As another example, according to the Golden Hour concept (e.g., as described by R Adams Cowley of the University of Maryland), the longer the elapsed time since a trauma incident, the lower the probability of survival. Thus, it is imperative to minimize the time between the start of the traumatic incident and the initiation of appropriate medical care.
Accordingly, improved systems, methods, and media for remote trauma assessment are desirable.
In accordance with some embodiments of the disclosed subject matter, systems, methods, and media for remote trauma assessment are provided.
In accordance with some embodiments of the disclosed subject matter, a system for remote trauma assessment is provided, the system comprising: a robot arm; an ultrasound probe coupled to the robot arm; a depth sensor; a wireless communication system; and a processor that is programmed to: cause the depth sensor to acquire depth data indicative of a three dimensional shape of at least a portion of a patient; generate a 3D model of the patient based on the depth data; automatically identify, without user input, a plurality of scan positions using the 3D model; cause the robot arm to move the ultrasound probe to a first scan position of the plurality of scan positions; receive, from a remote computing device via the wireless communication system, movement information indicative of input to the remote computing device provided via a remotely operated haptic device; cause the robot arm to move the ultrasound probe from the first scan position to a second position based on the movement information; cause the ultrasound probe to acquire ultrasound signals at the second position; and transmit ultrasound data based on the acquired ultrasound signals to the remote computing device.
In some embodiments, the system further comprises a force sensor coupled to the robot arm, the force sensor configured to sense a force applied to the ultrasound probe and in communication with the processor.
In some embodiments, the processor is further programmed to: inhibit remote control of the robot arm while the force applied to the ultrasound probe is below a threshold; determine that the force applied to the ultrasound probe at the first position exceeds the threshold based on a force value received from the force sensor; and in response to determining that the force applied to the ultrasound probe at the first position exceeds the threshold, accept movement information from the remote computing device.
In some embodiments, the system further comprises a depth camera comprising the depth sensor, and the 3D model is a 3D point cloud, and wherein the processor is further programmed to: cause the robot arm to move the depth camera to a plurality of positions around a patient; cause the depth camera to acquire the depth data and corresponding image data at each of the plurality of positions; generate the 3D point cloud based on the depth data and the image data; determine a location of the patient's umbilicus using image data depicting the patient; determine at least one dimension of the patient using the 3D point cloud; and identify the plurality of scan positions based on the location of the patient's umbilicus, the at least one dimension, and a labeled atlas.
In some embodiments, the processor is further programmed to: provide an image of the patient to a trained machine learning model, wherein the trained machine learning model is a detection network (e.g., a Faster R-CNN) that was trained to identify a region of an image corresponding to an umbilicus using labeled training images depicting umbilici; receive, from the trained machine learning model, an output indicating a location of the patient's umbilicus within the image; and map the location of the patient's umbilicus within the image to a location on the 3D model.
In some embodiments, the processor is further programmed to: provide an image of the patient to a trained machine learning model, wherein the trained machine learning model is a Faster R-CNN that was trained to identify a region of an image corresponding to a wound using labeled training images depicting wounds; receive, from the trained machine learning model, an output indicating a location of a wound within the image; map the location of the wound within the image to a location on the 3D model; and cause the robot arm to avoid moving the ultrasound probe within a threshold distance of the wound.
In some embodiments, the processor is further programmed to: generate an artificial potential field emerging from the location of the wound and having a field strength that decreases with distance from the wound; determine, based on a position of the ultrasound probe, a force exerted on the ultrasound probe by the artificial potential field; and transmit force information indicative of the force exerted on the ultrasound probe by the artificial potential field to the remote computing device, thereby causing the force exerted on the ultrasound probe by the artificial potential field to be provided as haptic feedback by the haptic device.
In some embodiments, the processor is further programmed to: provide an image of the patient to a trained machine learning model, wherein the trained machine learning model was trained to identify regions of an image corresponding to objects to be avoided during an ultrasound procedure using labeled training images depicting objects to be avoided during an ultrasound procedure; receive, from the trained machine learning model, an output indicating one or more locations corresponding to objects to avoid within the image; map the one or more locations within the image to a location on the 3D model; and cause the robot arm to avoid moving the ultrasound probe within a threshold distance of the one or more locations.
In some embodiments, the processor is further programmed to: receive a force value from the force sensor indicative of the force applied to the ultrasound probe; and transmit force information indicative of the force value to the remote computing device such that the remote computing device displays information indicative of force being applied by the ultrasound probe.
In some embodiments, the system further comprises a camera, and the processor is further programmed to: receive an image of the patient from the camera; format the image of the patient for input to an automated skin segmentation model to generate a formatted image; provide the formatted image to the automated skin segmentation model, wherein the automated skin segmentation model is a segmentation network (e.g., a U-Net-based model) trained using a manually segmented dataset of images that includes a plurality of images that each depict an exposed human abdominal region; receive, from the automated skin segmentation model, a mask indicating which portions of the image correspond to skin; and label at least a portion of the 3D model as corresponding to skin based on the mask.
In accordance with some embodiments of the disclosed subject matter, a method for remote trauma assessment is provided, the method comprising: causing a depth sensor to acquire depth data indicative of a three dimensional shape of at least a portion of a patient; generating a 3D model of the patient based on the depth data; automatically identifying, without user input, a plurality of scan positions using the 3D model; causing a robot arm to move an ultrasound probe mechanically coupled to a distal end of the robot arm to a first scan position of the plurality of scan positions; receiving, from a remote computing device via a wireless communication system, movement information indicative of input to the remote computing device provided via a remotely operated haptic device; causing the robot arm to move the ultrasound probe from the first scan position to a second position based on the movement information; causing the ultrasound probe to acquire ultrasound signals at the second position; and transmitting ultrasound data based on the acquired ultrasound signals to the remote computing device.
In accordance with some embodiments of the disclosed subject matter, a system for remote trauma assessment is provided, the system comprising: a haptic device having at least five degrees of freedom; a user interface; a display; and a processor that is programmed to: cause a graphical user interface comprising a plurality of user interface elements to be presented by the display, the plurality of user interface elements including a switch; receive, from a remote mobile platform over a communication network, an instruction to enable actuation of the switch; receive, via the user interface, input indicative of actuation of the switch; receive, from the haptic device, input indicative of at least one of a position and orientation of the haptic device; and in response to receiving the input indicative of actuation of the switch, transmit movement information based on the input indicative of at least one of the position and orientation of the haptic device to the mobile platform.
In some embodiments, the haptic device includes an actuatable switch, the actuatable switch having a first position, and a second position, and the processor is further programmed to: in response to the actuatable switch being in the first position, cause a robot arm associated with the mobile platform to inhibit translational movements of an ultrasound probe coupled to a distal end of the robot arm along a first axis, a second axis, and a third axis; and in response to the actuatable switch being in the second position, cause the robot arm to accept translational movement commands that cause the ultrasound probe to translate along at least one of the first axis, the second axis, and the third axis.
In some embodiments, the plurality of user interface elements includes a selectable first user interface element corresponding to a first location, and wherein the processor is further programmed to: receive, via the user interface, input indicative of selection of the first user interface element; and in response to receiving the input indicative of selection of the first user interface element, cause a robot arm associated with the mobile platform to autonomously move an ultrasound probe coupled to a distal end of the robot arm to a first position associated with the first user interface element.
In some embodiments, the processor is further programmed to: receive force information indicative of a force value generated by a force sensor associated with the mobile platform and the robot arm, wherein the force value is indicative of a normal force acting on the ultrasound probe; and cause the display to present information indicative of the force value based on the force information.
In some embodiments, the processor is further programmed to: determine that the force value does not exceed a threshold; and in response to determining that the force value does not exceed the threshold, cause the display to present information indicating that the ultrasound probe is not in contact with a patient to be scanned.
In some embodiments, the processor is further programmed to: receive, from the mobile platform, image data acquired by a camera associated with the mobile platform, wherein the image data depicts at least a portion of the ultrasound probe; receive, from the mobile platform, ultrasound data acquired by the ultrasound probe; and cause the display to simultaneously present an image based on the image data and an ultrasound image based on the ultrasound data.
Various objects, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the disclosed subject matter when considered in connection with the following drawings, in which like reference numerals identify like elements.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
As described above, the likelihood of surviving a severe trauma incident typically decreases as the elapsed time increases (e.g., from the start of the incident to when medical care is administered). Thus, timing, although not entirely controllable in many situations, is critical. The implementation of statewide emergency medical services has vastly improved patient outcomes by attempting to optimize initial care and providing fast transport to dedicated trauma centers. Ultimately, emergency medical services significantly reduce prehospital time for trauma victims, which thereby decreases the total elapsed time from the start of the incident to when medical care is administered. However, although emergency medical services have improved patient outcomes, emergency medical services cannot mitigate all problems. For example, even the fastest and best organized emergency medical services cannot transport a patient to a trauma center fast enough to prevent all patient deaths. Of all pre-hospital deaths, 29% are estimated to be classified as potentially survivable, being attributed to uncontrolled hemorrhages that are not readily identified and/or treated by emergency medical technicians.
Additionally, in some cases, upon arriving at the hospital, the trauma patient must undergo diagnostic imaging procedures to understand the extent of the trauma patient's injuries, and to determine a diagnosis for an appropriate treatment. At times, the trauma patient must wait during various steps of the diagnostic process. For example, a trauma patient may have to wait for an imaging system to become available. As another example, a trauma patient may have to wait during the imaging procedure. As yet another example, a trauma patient may have to wait following the imaging procedure while the results are being analyzed (e.g., to determine a diagnosis).
Various diagnostic imaging procedures have been proposed to effectively diagnose trauma patients. For example, focused assessment with sonography for trauma (sometimes referred to as "FAST" or, in its extended form, "eFAST") is a relatively well established and accepted diagnostic procedure. The FAST procedure is desirable for both its simplicity and speed. Additionally, the FAST procedure can be used to identify collapsed lungs (pneumothorax), and can identify significant hemorrhage in the chest, abdomen, pelvis, and pericardium.
In general, the FAST technique focuses on imaging four major scan locations on a patient, which can include (1) an area of the abdomen below the umbilicus, (2) an area of the chest above the umbilicus, (3) an area on a right side of the back ribcage, and (4) an area on a left side of the back ribcage. Although compact and mobile ultrasound systems could be used at the initial point of care, first responders tasked with utilizing such ultrasound systems would require extensive training to use them. Accordingly, even though first responders could theoretically implement the FAST technique, they generally lack the experience required to produce ultrasound images that can be used for diagnostic purposes and, regardless of the image quality, to accurately interpret the resulting images. Aside from imaging and diagnostic difficulties, requiring first responders to implement a FAST technique redirects attention away from tasks typically performed by first responders that are known to improve outcomes for patients who would not benefit from a FAST scan (e.g., patients without hidden hemorrhages). For example, first responders typically stabilize the patient, provide compression for hemorrhage control, and initiate cardiopulmonary resuscitation. Considering that the typical tasks completed by first responders can be far more important than a theoretical FAST scan, in many cases forcing first responders to implement FAST scans would not be advantageous, and could potentially cause harm.
Delays in generating appropriate diagnostic information for first responders and other medical practitioners can often be problematic when the diagnosis is not visually apparent. For example, without sufficient diagnostic information, first responders cannot initiate the necessary initial treatment for the underlying diagnosis, and thus cannot effectively address the injuries causing a potential hemorrhage, or loss of blood volume. Additionally, as described above, the trauma patient must undergo diagnostic imaging, potentially requiring wait times before and/or after arriving at the hospital, before obtaining a diagnosis. This can be especially problematic for specific trauma injuries, such as occult thoracic or abdominal injuries (e.g., caused by high energy mechanisms, such as automotive crashes), penetrating injuries (e.g., gunshot and stab wounds), and blunt trauma injuries, all of which can result in significant hemorrhage. Without a proper diagnosis of these injuries, potentially life-saving invasive techniques, such as placement of a resuscitative endovascular balloon occlusion of the aorta ("REBOA") or self-expanding foam for the treatment of severe intra-abdominal hemorrhage, may be delayed or cannot be administered in time (e.g., before the trauma patient's condition has deteriorated beyond a particular level).
In some embodiments, systems, methods, and media for remote trauma assessment can be provided. For example, in some embodiments, a robotic imaging system can perform FAST scanning at the initial point of care (e.g., within an emergency medical service vehicle), with remote control of the robotic imaging system by a trained user (e.g., an ultrasound technician, a radiologist, etc.). In a more particular example, the robotic imaging system can be controlled remotely by an experienced practitioner (e.g., an ultrasound technician, a radiologist, etc.). In such an example, the experienced practitioner can manipulate a haptic device (e.g., another robot) to a desired orientation. Movements and/or positions of the haptic device can be relayed to the robotic imaging system, and the robotic imaging system can initiate the relayed movements. In some embodiments, the tactile proficiency of an experienced practitioner can facilitate utilization of mechanisms described herein for remote trauma assessment at the initial point of care, and thus the robotic imaging system can be used to produce diagnostically sufficient images. Additionally, a diagnosis can be made (e.g., by a radiologist) well before the trauma patient arrives at the hospital. By providing diagnostic information while the patient is still in transit, this can allow first responders to initiate appropriate care while transporting the patient to the hospital, and can give the hospital sufficient time to prepare for potentially life-saving procedures (e.g., REBOA, self-expanding foam, etc.) before the trauma patient arrives.
In some embodiments, a robotic imaging system can have a force sensor, which can be used to provide force information to a remote user. For example, generally, if too much or too little force is applied while depressing an ultrasound probe on a patient, the quality of the ultrasound images can be negatively affected. Thus, in some embodiments, a computing device can present the force data to the user as feedback and/or limit the amount of force that can be applied to address negative effects that excessive force can have on the quality of the acquired ultrasound images. In some embodiments, the robotic imaging system can have a camera, which can image the trauma patient. Images from such a camera can be analyzed to determine locations to avoid while acquiring ultrasound images of the patient. For example, the robotic imaging system, or another suitable computing device, can extract and label regions within the images acquired by the camera that correspond to regions that are not suitable for ultrasound imaging. In some embodiments, the robotic imaging system can use the labeling information to avoid contacting these regions (and/or contacting the regions with a force value above a threshold) during ultrasound imaging. For example, such regions can correspond to wounds, bandages, etc., and contacting such regions may exacerbate an underlying problem (e.g., may cause further bleeding).
In some embodiments, the fixed camera 112 can be mounted to a structure above the imaging scene (e.g., where a trauma patient is positioned) of the robotic imaging system 102 and away from the robotic system 114 (e.g., above the robotic system 114). For example, the fixed camera 112 can be mounted to an interior surface of an emergency vehicle. As another example, the camera 112 can be mounted to a fixed structure, such that the fixed structure can allow the camera 112 to be positioned above the robotic imaging system 102. In such examples, the robotic system 114 does not interfere with the acquisition of an image with the fixed camera 112 (e.g., by entirely blocking a field of view of fixed camera 112). In some embodiments, the fixed camera 112 can be implemented using any suitable camera technology to acquire two-dimensional image data. For example, the fixed camera 112 can be a two-dimensional ("2D") color camera. As another example, the fixed camera 112 can acquire images using various wavelengths of light (e.g., infrared light, visible light, etc.). In a more specific example, the fixed camera 112 can be implemented using a Chameleon CMLN-13S2C color camera (available from Point Grey Research, now FLIR Integrated Imaging Solutions, Richmond, British Columbia, Canada). Additionally or alternatively, in some embodiments, the fixed camera 112 can be implemented using any suitable camera technology to acquire three-dimensional image data using a stereoscopic camera, a monocular camera, etc., and can detect one or more wavelengths of light (e.g., infrared light, visible light, etc.). For example, the fixed camera 112 can be implemented using a stereoscopic camera that includes stereoscopically positioned image sensors, to acquire 3D imaging data (e.g., by using triangulation on corresponding images acquired from the stereoscopically positioned image sensors). As another example, the fixed camera 112 can be implemented using a depth camera that can acquire 3D imaging data (e.g., using continuous time-of-flight depth sensing techniques, using structured light depth sensing techniques, using discrete time-of-flight depth sensing techniques, etc.). In a more particular example, the fixed camera 112 can be implemented using a RealSense D415 RGB-D camera (available from Intel Corporation, Santa Clara, Calif.).
As shown in
In some embodiments, the mobile camera 116 can be coupled (and/or mounted) to the robot arm 122. For example, the mobile camera 116 can be mounted to a specific segment of the robot arm 122 that also can include and/or implement the end effector ("EE") of the robotic system 114 (e.g., the end effector can be mounted to the same segment). In some embodiments, mobile camera 116 can be any suitable camera that can be used to acquire three-dimensional ("3D") imaging data of the trauma patient and corresponding visual (e.g., color) image data of the trauma patient, using any suitable technique or combinations of techniques. For example, the mobile camera 116 can be implemented using a stereoscopic camera, a monocular camera, etc., and can detect one or more wavelengths of light (e.g., infrared light, visible light, etc.). In a more particular example, the mobile camera 116 can be implemented using a stereoscopic camera that includes stereoscopically positioned image sensors, to acquire 3D imaging data (e.g., by using triangulation on corresponding images acquired from the stereoscopically positioned image sensors). As another example, the mobile camera 116 can be implemented using a depth camera that can acquire 3D imaging data (e.g., using continuous time-of-flight depth sensing techniques, using structured light depth sensing techniques, using discrete time-of-flight depth sensing techniques, etc.). In a more particular example, the mobile camera 116 can be implemented using a RealSense D415 RGB-D camera (available from Intel Corporation, Santa Clara, Calif.). In some embodiments, in lieu of or in addition to depth information from the fixed camera 112 and/or the mobile camera 116, the mobile platform 110 can be associated with one or more depth sensors (not shown) that can be used to generate depth information indicative of a shape of a patient (e.g., a patient located in a particular location with respect to the robot arm 122). For example, such depth sensors can include one or more sonar sensors, one or more ultrasonic detectors, one or more LiDAR-based detectors, etc. In such embodiments, depth information from the depth sensors can be used in connection with images from the fixed camera 112 and/or the mobile camera 116 to generate a 3D model of a patient.
In some embodiments, the ultrasound probe 118 can be coupled (and/or mounted) to a particular segment of the robot arm 122. In some embodiments, the ultrasound probe 118 can be implemented as the end effector of the robotic system 114 (e.g., the ultrasound probe 118 can be mounted to the robotic segment most distal from the origin or base of the robot arm 122). In some embodiments, the ultrasound probe 118 can include a processor, piezoelectric transducers, etc., that cause the ultrasound probe to emit an ultrasound signal and/or receive an ultrasound signal (e.g., after interacting with the patient's anatomy), to generate ultrasound imaging data, etc. In a particular example, the ultrasound probe 118 can be implemented using a CMS600P2 Portable Ultrasound Scanner (available from Contec Medical Systems Co., Ltd., Hebei, China), having a 3.5 MHz convex probe for ultrasound imaging. In some embodiments, the ultrasound probe 118 can be mounted to the last joint of the robot (e.g., coaxial with the last joint), and the mobile camera 116 can be mounted to the last joint (or segment) of the robot arm 122. For example, the mobile camera 116 can be mounted proximal to (with respect to the base of the robot arm 122) the ultrasound probe 118 on the same segment of the robot arm 122, such that the mobile camera 116 can acquire images of the ultrasound probe 118, and a scene surrounding at least a portion of the ultrasound probe 118 (e.g., images of the ultrasound probe in context).
In some embodiments, the force sensor 120 can be coupled (and/or mounted) to a particular segment of the robot arm 122, within the robotic system 114. For example, the force sensor 120 can be positioned and mounted to the last joint (or robotic segment) of the robot arm 122 of the robotic system 114. In a more particular example, the force sensor 120 can be mounted to a proximal end of (and coaxially with) the ultrasound probe 118, such that contact between the ultrasound probe 118 and another object (e.g., the trauma patient) transmits force to the force sensor 120. In some embodiments, the force sensor 120 can be implemented using any suitable technique or combination of techniques. Additionally or alternatively, in some embodiments, the force sensor 120 can be implemented as a pressure sensor. For example, the force sensor 120 (or pressure sensor) can be resistive, capacitive, piezoelectric, etc., to sense a compressive (or tensile) force applied to the force sensor 120. In a more particular example, the force sensor 120 can be implemented using an SI-65-5 six-axis F/T Gamma transducer (available from ATI Industrial Automation, Apex, N.C.). In other embodiments, the end effector forces can be calculated using joint torques of the robot arm (e.g., deriving joint torques by measuring a current provided to at least one joint of the robot arm).
As shown in
As shown in
In some embodiments, the haptic device 108 can behave (or move) similarly to the robotic system 114. For example, the haptic device 108 can include a stylus (e.g., a tip-like structure that acts as the end effector of the haptic device 108), and can include multiple segments that together provide a number of degrees of freedom. In some embodiments, the stylus can have a first end and a second end, where the first end defines the tip of the stylus, which can be defined as the origin of the haptic device 108 (e.g., for the purposes of defining movements of the haptic device from an origin). Additionally, the stylus can be configured to be easily manipulatable by a user (e.g., similar to a writing utensil). In some embodiments, a size of the haptic device 108 can be a fraction (e.g., 0.5, 0.25, 0.1, etc.) of the size of the robot arm 122. In such embodiments, the haptic device 108 can implement the same or a similar number of degrees of freedom as the robot arm 122, but with sizing of the segments (and joints) of the haptic device 108 reduced by half (or a different scaling factor) compared to the size of the robot arm 122. Note that the size, shape, and number of segments of the haptic device 108 can be, and often is, different than the size, shape, and number of segments of the robot arm 122. However, in some embodiments, haptic device 108 can be implemented as a scaled version of robot arm 122, with the same number of segments and same relative dimensions, but with a smaller relative size. In some embodiments, configuring the haptic device 108 to use the same coordinate system as the robotic system 114 and/or the robot arm 122 can reduce the amount of data that needs to be acquired and/or sent, and may facilitate more accurate movements of the robot arm 122. In a particular example, the haptic device 108 can be implemented using a Geomagic Touch haptic device (available from 3D Systems, Rock Hill, S.C.). Regardless of the structure of the haptic device 108, when the haptic device 108 is manipulated, movements and/or positions of the haptic device 108 can be received by the computing device 106. In some embodiments, computing device 106 can transmit movement information and/or position information received from haptic device 108 and/or commands based on such movement information and/or position information to the mobile platform 110 to cause the robot arm 122 of the robotic system 114 to move to a specific location within the coordinate system of the robot arm 122.
In some embodiments, the display 126 can present a graphical user interface. In some embodiments, the display 126 can be implemented using any suitable display devices, such as a computer monitor, a touchscreen, a television, etc. In some embodiments, the inputs 128 of the computing device 106 can include indicators, sensors, actuatable buttons, a keyboard, a mouse, a graphical user interface, a touch-screen display, etc. In some embodiments, the inputs 128 can allow a user (e.g., a medical practitioner, such as a radiologist) to interact with the computing device 106, and thereby to interact with the mobile platform 110 (e.g., via the communication network 104).
In some embodiments, the communication system 130 can include any suitable hardware, firmware, and/or software for communicating with the other systems, over any suitable communication networks. For example, the communication system 130 can include one or more transceivers, one or more communication chips and/or chip sets, etc. In a more particular example, communication system 130 can include hardware, firmware, and/or software that can be used to establish a coaxial connection, a fiber optic connection, an Ethernet connection, a USB connection, a Wi-Fi connection, a Bluetooth connection, a cellular connection, etc. In some embodiments, the communication system 130 allows the computing device 106 to communicate with the mobile platform 110 (e.g., directly, or indirectly such as via the communication network 104).
In some embodiments, the memory 132 can include any suitable storage device or devices that can be used to store instructions, values, etc., that can be used, for example, by processor 124 to present content using display 126, to communicate with the mobile platform 110 via communications system(s) 130, etc. Memory 132 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 132 can include RAM, ROM, EEPROM, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, etc. In some embodiments, memory 132 can have encoded thereon a computer program for controlling operation of computing device 106 (or mobile platform 110). In such embodiments, processor 124 can execute at least a portion of the computer program to present content (e.g., user interfaces, images, graphics, tables, reports, etc.), receive content from the mobile platform 110, transmit information to the mobile platform 110, etc.
As shown in
In some embodiments, the display 146 can present a graphical user interface. In some embodiments, the display 146 can include any suitable display devices, such as a computer monitor, a touchscreen, a television, etc. In some embodiments, the inputs 148 of the mobile platform 110 can include indicators, sensors, actuatable buttons, a keyboard, a mouse, a graphical user interface, a touch-screen display, and the like. In some embodiments, the inputs 148 allow a user (e.g., a first responder) to interact with the mobile platform 110, and thereby to interact with the computing device 106 (e.g., via the communication network 104).
As shown in
In some embodiments, the memory 152 can include any suitable storage device or devices that can be used to store instructions, values, etc., that can be used, for example, by processor 144 to present content using display 146, to communicate with the computing device 106 via communications system(s) 150, etc. Memory 152 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 152 can include RAM, ROM, EEPROM, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, etc. In some embodiments, memory 152 can have encoded thereon a computer program for controlling operation of the mobile platform 110 (or computing device 106). In such embodiments, processor 144 can execute at least a portion of the computer program to present content (e.g., user interfaces, graphics, tables, reports, etc.), receive content from the computing device 106, transmit information to the computing device 106, etc.
In some embodiments, the connectors 154 can be wired connections, such that the fixed camera 112 and the robotic system 114 (e.g., including the mobile camera 116, the ultrasound probe 118, and the force sensor 120) can communicate with the mobile platform 110, and thus can communicate with the computing device 106 (e.g., via the communication system 150, either directly or indirectly, such as via the communication network 104). Additionally or alternatively, the fixed camera 112 and/or the robotic system 114 can send information to and/or receive information from the mobile platform 110 (e.g., using the connectors 154 and/or the communication systems 150).
After acquiring camera images 240, the flow 234 can include generating a 3D point cloud 242. The previously acquired images from the mobile camera 116 (e.g., camera images 240) can be used to generate the 3D point cloud 242. For example, the mobile platform 110 (or the computing device 106) can perform a point cloud registration using the 3D images (depicting multiple scenes) to construct a 3D model using any suitable technique or combination of techniques. For example, the mobile platform 110 can use the Iterative Closest Point ("ICP") algorithm. In such an example, the ICP algorithm can be implemented using the Point Cloud Library ("PCL") and libpointmatcher, and can include prior noise removal with a PCL passthrough filter. In some embodiments, a point cloud color segmentation can be applied to extract a point cloud corresponding to the patient from the reconstructed scene, for example by removing background items such as a table and a supporting frame.
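For illustration, the registration step can be sketched as follows. This is a minimal Python example only, using NumPy and SciPy in place of the PCL/libpointmatcher pipeline named above: it applies a passthrough-style crop and a basic ICP loop to merge two synthetic views into one cloud. The filter limits, iteration counts, and synthetic data are assumptions, not the disclosed implementation.

```python
# Minimal sketch of passthrough filtering + ICP registration (assumed parameters).
import numpy as np
from scipy.spatial import cKDTree


def passthrough(points, axis=2, lo=0.2, hi=1.5):
    """Keep only points whose coordinate along `axis` lies in [lo, hi]."""
    mask = (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[mask]


def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t


def icp(source, target, iters=30, tol=1e-6):
    """Iterative Closest Point: align `source` to `target`."""
    src = source.copy()
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)   # nearest-neighbour correspondences
        R, t = best_fit_transform(src, target[idx])
        src = src @ R.T + t
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return src


if __name__ == "__main__":
    # Two noisy views of the same synthetic surface, offset from each other.
    rng = np.random.default_rng(0)
    view_a = rng.uniform(size=(500, 3))
    rot = np.array([[0.995, -0.0998, 0], [0.0998, 0.995, 0], [0, 0, 1]])
    view_b = view_a @ rot.T + [0.05, 0.02, 0.0]
    merged = np.vstack([view_a, icp(passthrough(view_b, lo=0.0, hi=1.0), view_a)])
    print("merged cloud:", merged.shape)
```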
After acquiring camera images 240 and/or generating 3D point cloud 242, the flow 234 can include inputting images into the machine learning model 244. The machine learning model 244 can classify one or more portions of the inputted image(s) into various categories, such as umbilici, bandages, wounds, mammary papillae (sometimes referred to as nipples), skin, etc. Additionally or alternatively, in some embodiments, the machine learning model 244 can identify, and/or output a segmented image of, a correctly identified location or region of interest within the image (e.g., an umbilicus). In some embodiments, the identified location(s) 248 (or region of interest) for each image can be mapped to the 3D point cloud 242 (or model) (e.g., which can later be used to avoid the locations during ultrasound imaging). In some embodiments, any suitable type of location can be identified, such as an umbilicus, a bandage, a wound, or a mammary papilla. In some embodiments, such features can be identified as reference points and/or as points that should not be contacted during ultrasound imaging (e.g., wounds and bandages). In some embodiments, identification and segmentation of the umbilicus can be used to provide an anatomical landmark for finding the FAST scan locations, as described below (e.g., in connection with
In some embodiments, the machine learning model 244 can be trained using training images 246. The training images 246 can include examples of correctly identified images that depict the classes described above (e.g., umbilicus, a wound, a bandage, a mammary papilla, etc.). In some embodiments, the machine learning model 244 can be trained to classify images as including an example of a particular class and/or can output a mask or other information segmenting the image to identify where in the image the example of the class was identified. In some embodiments, locations of objects identified by the machine learning model 244 can be mapped to the 3D point cloud 242 at determine locations 248 of flow 234. In some embodiments, prior to inputting a particular image into the machine learning model 244, the location of the image can be registered to the 3D point cloud 242 such that the location within the image at which an object (e.g., a wound, an umbilicus, a bandage, etc.) is identified can be mapped to the 3D point cloud. In some embodiments, flow 234 can use a region identified within an image (e.g., the umbilicus region) to map the region of the image to the 3D point cloud 242 at determine locations 248.
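As one hedged illustration of mapping a detection onto the 3D model, the sketch below takes a detector output in an assumed {'box', 'label', 'score'} format (e.g., as might be produced by the detection network mentioned above) and back-projects the box center through assumed pinhole camera intrinsics into a 3D point, rather than looking the point up in a registered point cloud. The intrinsics, transform, and synthetic depth image are illustrative assumptions only.

```python
# Minimal sketch: detector box centre -> 3D point via assumed pinhole intrinsics.
import numpy as np

FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0   # assumed RGB-D camera intrinsics (pixels)

def box_center(box):
    """(x1, y1, x2, y2) pixel box -> integer center pixel (u, v)."""
    x1, y1, x2, y2 = box
    return int((x1 + x2) / 2), int((y1 + y2) / 2)

def backproject(u, v, depth_m):
    """Pinhole back-projection of pixel (u, v) at depth depth_m meters (camera frame)."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return np.array([x, y, depth_m])

def map_detection_to_cloud(detection, depth_image, cam_to_world):
    """Map one detection (assumed dict with 'box' and 'label') into the world frame."""
    u, v = box_center(detection["box"])
    point_cam = backproject(u, v, float(depth_image[v, u]))
    return cam_to_world[:3, :3] @ point_cam + cam_to_world[:3, 3]

if __name__ == "__main__":
    depth = np.full((480, 640), 0.8)                         # flat synthetic 0.8 m depth map
    det = {"box": (300, 220, 340, 260), "label": "umbilicus", "score": 0.95}
    T = np.eye(4)                                            # camera frame == world frame here
    print("umbilicus in world frame:", map_detection_to_cloud(det, depth, T))
```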
In some embodiments, at determine FAST scan locations 250 of flow 234, the FAST scan locations can be determined relative to the 3D point cloud of the patient. In some embodiments, the mobile platform 110 (and/or the computing device 106) can use an anatomical landmark to determine the dimensions of the 3D point cloud 242. For example, the location of the umbilicus on the subject can be determined and used as the anatomical landmark (e.g., having been previously calculated at determine locations 248) to calculate the FAST scan positions. In some embodiments, the FAST scan positions of the subject can be derived by scaling an atlas patient for which FAST scan locations have already been determined, using the ratio of the dimensions of the atlas and the patient. For example, the atlas can be generated using a 3D model of a CT scan of an example patient having similar proportions to the patient, with hand segmented (or identified) FAST locations provided by an expert radiologist (e.g., the atlas being annotated with FAST locations). In some embodiments, scaling of the atlas can be carried out by first determining the width and height of the patient based on the 3D point cloud 242. For example, the width of the patient can be derived by projecting the 3D point cloud 242 into the x-y plane and finding the difference between the maximum y and the minimum y (e.g., the distance between them). As another example, the height of the patient can be derived by projecting the 3D point cloud 242 into the x-z plane and finding the difference between the maximum z and the minimum z (e.g., the distance between them). As yet another example, a length variable associated with the patient can be derived by measuring the distance from the umbilicus to the sternum of the atlas. In some embodiments, the mobile platform 110 (or computing device 106) can use the known length, width, and/or height, and FAST scan positions of the atlas to calculate the FAST scan position on the subject using the following relationship:
where Xp, Yp, and Zp are the coordinates of a FAST scan position (e.g., FAST scan position 1) on the subject, Xa, Ya, and Za are the coordinates of a FAST scan position on the atlas (which are known), lp, wp, and hp are the length, width, and height associated with the patient's torso, and la, wa, and ha are the length, width, and height associated with the atlas. In some embodiments, the length between the umbilicus and sternum can be estimated using EQ. (1). Note that, in some embodiments, an approximation of the ratio of lp to la can be derived from the other variables. In some embodiments, each of the FAST scan locations can be determined (e.g., at 250 using EQ. (1) for each remaining location), and each FAST scan location can be mapped onto the 3D point cloud 242.
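Since EQ. (1) itself is not reproduced above, the following sketch only illustrates the general idea under an assumed form: each atlas FAST coordinate is scaled by the patient-to-atlas ratio of the corresponding torso dimension, with the patient width and height measured from the 3D point cloud as described. The atlas values, coordinate conventions, and per-axis scaling are assumptions, not the disclosed equation.

```python
# Assumed-form sketch of atlas-to-patient FAST position scaling.
import numpy as np

def torso_dimensions(points):
    """Width from the x-y projection (y extent) and height from the x-z projection (z extent)."""
    width = points[:, 1].max() - points[:, 1].min()
    height = points[:, 2].max() - points[:, 2].min()
    return width, height

def scale_fast_positions(atlas_positions, atlas_dims, patient_dims):
    """Scale atlas FAST coordinates (N x 3; x=length, y=width, z=height axes assumed)."""
    la, wa, ha = atlas_dims
    lp, wp, hp = patient_dims
    ratios = np.array([lp / la, wp / wa, hp / ha])
    return atlas_positions * ratios          # element-wise per-axis scaling (assumed form)

if __name__ == "__main__":
    cloud = np.random.default_rng(1).uniform([0.0, -0.2, 0.0], [0.6, 0.2, 0.25], size=(2000, 3))
    wp, hp = torso_dimensions(cloud)
    lp = 0.18                                 # e.g., umbilicus-to-sternum distance, metres
    atlas_fast = np.array([[0.30, 0.15, 0.10],
                           [0.30, -0.15, 0.10],
                           [0.10, 0.00, 0.12],
                           [0.45, 0.00, 0.15]])
    print(scale_fast_positions(atlas_fast, (0.20, 0.42, 0.26), (lp, wp, hp)))
```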
In general, the location of the xiphoid process can be difficult to determine visually from image data acquired by cameras 112 and/or 116. In some embodiments, the umbilicus can be used as a visual landmark in combination with the measured dimensions of the patient's torso in the 3D point cloud to estimate the FAST locations. In some embodiments, CT scan images of one or more anonymized atlas patients can be used to identify a ratio (e.g., R1 in EQS. (2) and (3) below) between the length of a person's torso and the distance between the umbilicus and the 4th FAST position (e.g., the person's xiphoid). A ratio (R2 in EQS. (2) and (3) below) between C (e.g., the entire length of the torso) and the distance of the 3rd FAST scan region (e.g., the abdomen) from the umbilicus can be calculated. The distance D can represent the distance from the 4th FAST position to the 1st and 2nd FAST positions along the y-axis. Another relation, R3 (in EQS. (2) and (4)), between C and D can be calculated. Distance A can be the distance from the umbilicus to the 4th FAST scan location along the y-axis (see
In EQS. (3), (4), and (5), Xi, Yi, and Zi for i ∈ {1, 2, 3, 4} can be the coordinates of the respective FAST scan positions, while Xu, Yu, and Zu can be the coordinates of the detected umbilicus in the world frame, E can be the width of the subject (e.g., wp in EQ. (1)), and H can be the height of the patient (e.g., hp in EQ. (1)). In some embodiments, mean values for the atlases for R1, R2, and R3 can be 0.29, 0.22, and 0.20, respectively.
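EQS. (2)-(5) are likewise not reproduced above, so the following sketch only illustrates how the quoted mean ratios (R1 = 0.29, R2 = 0.22, R3 = 0.20) might be applied: each ratio is treated as a fraction of the torso length C measured from the umbilicus, the lateral positions are offset symmetrically by a fraction of the width E, and the z coordinates are simply held at the umbilicus height. All of those choices are assumptions for illustration.

```python
# Assumption-laden sketch of umbilicus-based FAST position estimation.
import numpy as np

R1, R2, R3 = 0.29, 0.22, 0.20   # mean atlas ratios quoted above

def estimate_fast_positions(umbilicus, C, E, lateral_fraction=0.25):
    """Assumed mapping from umbilicus, torso length C, and width E to FAST positions 1-4."""
    xu, yu, zu = umbilicus
    A = R1 * C                    # umbilicus -> 4th position (xiphoid region), along +y
    B = R2 * C                    # umbilicus -> 3rd position (abdomen), along -y
    D = R3 * C                    # 4th position -> 1st/2nd positions, along the y-axis
    p4 = np.array([xu, yu + A, zu])
    p3 = np.array([xu, yu - B, zu])
    p1 = np.array([xu + lateral_fraction * E, yu + A - D, zu])   # right flank (assumed offsets)
    p2 = np.array([xu - lateral_fraction * E, yu + A - D, zu])   # left flank (assumed offsets)
    return p1, p2, p3, p4

if __name__ == "__main__":
    for i, p in enumerate(estimate_fast_positions((0.0, 0.0, 0.20), C=0.55, E=0.40), start=1):
        print(f"FAST position {i}: {np.round(p, 3)}")
```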
In some embodiments, the robotic system 114 can move the ultrasound probe 118 to a specific location corresponding to a FAST location, using the FAST scan locations determined at 250. In some embodiments, the mobile platform 110 can receive a user input (e.g., from the computing device 106) that instructs the robotic system 114 to travel to a particular FAST scan location (e.g., the first FAST scan location). In some embodiments, the coordinates of the FAST scan location (e.g., Xp1, Yp1, Zp1) can be utilized by the robotic system 114 as an instruction to travel to the first coordinate location. Additionally or alternatively, the FAST scan coordinate location can define a region surrounding the determined FAST scan location (e.g., a threshold defined by each of the coordinates, such as a percentage), so as to allow for performing a FAST scan near a location (e.g., where the specific coordinate is impeded by a bandage or wound). For example, if the first coordinate location is impeded (e.g., with a bandage), the robotic system 114 can utilize a location within a threshold distance of that coordinate, to perform a FAST scan near that region on the patient.
In some embodiments, after moving the ultrasound probe 118 to the FAST scan location (specified by the instructed coordinate or region), the robotic system 114 can determine whether or not contact between the ultrasound probe 118 and the subject can be sufficiently established for generating an ultrasound image (e.g., of a certain quality). For example, the mobile platform 110 can receive a force (or pressure) value from the force sensor 120 (e.g., at determine force while imaging 254). In such an example, the mobile platform 110 can determine whether or not the ultrasound probe 118 is in contact with the patient based on the force value. In a more particular example, if the force reading is less than (and/or based on the force reading being less than) a threshold value (e.g., 1 Newton (“N”)), the robotic system 114 can move the ultrasound probe axially (e.g., relative to the ultrasound probe surface) toward the patient's body until the force reading reaches (and/or exceeds) a threshold value (e.g., 3 N). In some embodiments, after the robotic system 114 places the ultrasound probe 118 in contact with the patient with sufficient force, the mobile platform 110 can begin generating and/or transmitting ultrasound data/images. In some embodiments, the force sensor value from the force sensor 120 can be calibrated prior to movement to a FAST location.
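A minimal sketch of this contact-establishment logic is shown below, using hypothetical read_force() and move_axially() interfaces in place of the force sensor 120 and robot arm 122 drivers; the thresholds follow the 1 N / 3 N example values above, while the step size and the toy simulation are assumptions.

```python
# Minimal sketch: advance the probe axially until the contact force threshold is reached.
import time

LOW_THRESHOLD_N = 1.0       # below this the probe is treated as not in contact
CONTACT_THRESHOLD_N = 3.0   # reading at which scanning is allowed to begin
STEP_M = 0.002              # assumed axial step size per iteration (2 mm)

def establish_contact(read_force, move_axially, max_steps=50, settle_s=0.05):
    """Advance the probe along its axis until the force threshold is reached."""
    for _ in range(max_steps):
        force = read_force()
        if force >= CONTACT_THRESHOLD_N:
            return True                     # sufficient contact for imaging
        if force < LOW_THRESHOLD_N:
            move_axially(-STEP_M)           # move toward the patient (assumed sign convention)
        else:
            move_axially(-STEP_M / 2)       # close to contact: smaller steps
        time.sleep(settle_s)                # let the force reading settle
    return False

if __name__ == "__main__":
    # Toy simulation: force ramps up once the probe has advanced ~1 cm.
    state = {"travel": 0.0}
    def fake_force():
        return max(0.0, (state["travel"] - 0.010) * 800.0)   # N per metre past contact
    def fake_move(dz):
        state["travel"] += abs(dz)
    print("contact established:", establish_contact(fake_force, fake_move))
```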
In some embodiments, after the robotic system 114 travels to the particular FAST scan location, the robotic system 114 can cease movement operations (e.g., bypassing the axial movement and corresponding force reading) until user input is received at the mobile platform 110 (e.g., sent by the computing device 106) to initiate the haptic feedback controller 256. Additionally, in some embodiments, in response to the user input being received by the mobile platform 110, the mobile platform 110 can provide (e.g., by transmitting) ultrasound images for the specific location 252 to the computing device 106, and/or can provide the force value from the force sensor 120 to the computing device 106. In such embodiments, the computing device 106 can present images and force to user 258 (e.g., digital images, ultrasound images, and feedback indicative of a force with which the ultrasound probe 118 is contacting the patient). In some embodiments, the ultrasound images and the force values (e.g., time based force values), can be assessed by a user (e.g., a radiologist), and the user can adjust the orientation of the haptic device 108, thereby manipulating the orientation and position of the ultrasound probe 118 (e.g., via the robotic system 114). For example, when the user initiates the haptic feedback control 256, movements of the stylus (e.g., the end effector) of the haptic device 108 can be translated into commands for the robotic system 114.
In some embodiments, the mobile platform 110 can receive force information from the force sensor 120, and can transmit the force information to the computing device 106 and/or the haptic device 108. In some embodiments, the force information can be a decomposition of the normal force into magnitude and directional components along each dimensional axis (e.g., the x, y, and z directions), based on the position and orientation of the robot arm 122. Alternatively, the normal force value can be transmitted and decomposed by the computing device 106 into magnitude and directional components along each of the three dimensional axes. In some embodiments, the haptic device 108 can use the force values (e.g., along each axis) to resist movement (e.g., of one or more particular joints and/or segments of the haptic device 108), based on the magnitude and direction of the force (e.g., from the force sensor 120). In some embodiments, the magnitude of the forces can be proportional to the force value provided by the force sensor 120 (e.g., scaled down, or up, appropriately).
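The force decomposition and haptic replay described above can be sketched as follows; the frames, rotation, scaling factor, and clamp value in this example are assumptions. The code simply expresses the scalar normal force along the EE z-axis, rotates it into the world frame, and scales it for the haptic device.

```python
# Hedged sketch of normal-force decomposition for haptic feedback.
import numpy as np

def rotation_z(yaw):
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def decompose_normal_force(normal_force_n, ee_rotation_world):
    """Return the world-frame (fx, fy, fz) components of the probe's normal force."""
    force_ee = np.array([0.0, 0.0, normal_force_n])   # normal force acts along the EE z-axis
    return ee_rotation_world @ force_ee

def haptic_feedback_command(force_world, scale=0.1, max_n=3.0):
    """Scale the force for the haptic device and clamp to an assumed output limit."""
    cmd = scale * force_world
    norm = np.linalg.norm(cmd)
    return cmd if norm <= max_n else cmd * (max_n / norm)

if __name__ == "__main__":
    # Arbitrary EE orientation for the example (rotation about z composed with one about x).
    R = rotation_z(np.deg2rad(30)) @ np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], float)
    f_world = decompose_normal_force(6.0, R)          # 6 N normal force at the probe
    print("world-frame components:", np.round(f_world, 2))
    print("haptic command:", np.round(haptic_feedback_command(f_world), 2))
```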
In some embodiments, the forces acting along the normal axis of the EE of the robot arm 122 (e.g., the zR axis described below in connection with
In some embodiments, position guiding, rate guiding, or combinations thereof, can be used to control the EE of the robot arm 122 (e.g., the ultrasound probe 118). For example, in some embodiments, computing device 106 and/or mobile platform 110 can receive movement information from the haptic device 108 and, using that movement information, can generate and provide incremental movement commands to the robot arm 122. The movement commands can be defined from the initial Cartesian pose of the robot arm 122 (e.g., the pose prior to the incremental movement, prior to a first incremental movement, etc.), such as in the coordinate system shown in
In some embodiments, forward kinematics for translations of the haptic device 108 can be calculated with respect to the origin shown in
In some embodiments, the mobile camera 116 can be rigidly mounted on the robot arm 122, such that there is a fixed transform between the camera and the EE of the robot arm 122. For example, as described below in connection with
In some embodiments, computing device 106 and/or mobile platform 110 can transform the initial position of the robot arm 122 for each FAST scan location into the instantaneous EE frame. Additionally, in some embodiments, computing device 106 and/or mobile platform 110 can transform the FAST scan location into the world frame of the robotic system 114 (e.g., based on the origin of the robot arm 122, which can be the base of the most proximal segment of the robot arm 122). In some embodiments, robot arm 122 can inhibit the position and orientation of the EE from being changed simultaneously. Thus, in some embodiments, commands for changing the position and orientation of the EE of the robot arm 122 can be decoupled such that the position and orientation are limited to being changed independently (e.g., one at a time), which may eliminate any unintentional coupling effects. In some embodiments, orientation commands from the haptic device 108 can be determined using only the displacement of joints 4, 5, and 6 for roll, pitch, and yaw respectively (e.g., from their center positions) of the haptic device 108. For example, joints 4, 5, and 6 can be the most distal joints of the haptic device 108 as described below in connection with
In some embodiments, an actuatable button on the stylus of the haptic device 108 can be used to control whether position commands or orientation commands are received by the computing device 106, and subsequently relayed to the mobile device 110 to instruct the robot arm 122 to move. For example, the actuatable button can be normally in a first position (e.g., open, un-depressed, etc.), which causes the computing device 106 to only receive translation commands from the haptic device 108 (e.g., while keeping the orientation of the EE of the robot arm 122 constant). In such an example, when the actuatable button is in a second position (e.g., closed, depressed, etc.), the computing device 106 can receive only orientation commands from the haptic device 108 (e.g., while keeping the position of the EE of the robot arm 122 constant).
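A small sketch of this button-based mode switch is shown below, using a hypothetical HapticSample message type; it simply forwards either a translation-only or an orientation-only command depending on the button state, consistent with the decoupling described above.

```python
# Hypothetical sketch of translation/orientation command selection by button state.
from dataclasses import dataclass

@dataclass
class HapticSample:
    position: tuple        # stylus tip position (x, y, z)
    orientation: tuple     # stylus roll, pitch, yaw
    button_pressed: bool   # actuatable button state

def build_command(sample: HapticSample):
    """Return the command dict to relay to the mobile platform for this sample."""
    if sample.button_pressed:
        # Orientation-only command; the EE position is held constant.
        return {"type": "orientation", "rpy": sample.orientation}
    # Translation-only command; the EE orientation is held constant.
    return {"type": "translation", "xyz": sample.position}

if __name__ == "__main__":
    samples = [
        HapticSample((0.01, 0.00, 0.00), (0.0, 0.0, 0.0), button_pressed=False),
        HapticSample((0.01, 0.00, 0.00), (0.1, 0.0, -0.2), button_pressed=True),
    ]
    for s in samples:
        print(build_command(s))
```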
In some embodiments, the mobile platform 110 can automatically perform an indexing operation, allowing the user to re-center the haptic device 108 for translations or orientations by keeping one of them constant and changing the other, while refraining from sending any commands to the robot arm 122. The initial Cartesian pose of the robot arm 122 (e.g., using the EE coordinate system of the robot arm 122) can be updated with its current Cartesian pose every time the state of the actuatable button changes. In some embodiments, the operator can have control over the yaw (e.g., joint 7) of the robot arm 122 using joint 6 of the haptic device 108, such that the most distal joint of the haptic device 108 controls the most distal joint of the slave (e.g., the robot arm 122). In some embodiments, the position control scheme can be mathematically described using the following relationship:
where X_R ∈ ℝ^(4×4) can be the homogeneous transformation matrix from the robot world frame to the current EE frame, the initial-position term can include the initial positions of the EE in the world frame, and K_P can be a controller gain.
In some embodiments, a rate control scheme can be mathematically described using the following relationship:
where θ_R ∈ ℝ^3 can be the new roll, pitch, and yaw angles of the EE of the robot arm 122 in the current EE frame, R_w^T ∈ ℝ^(3×3) can be the rotation matrix from the robot world frame to the EE frame, the initial-orientation term (∈ ℝ^3) can be the initial roll, pitch, and yaw angles of the EE in the world frame, K_P can be a controller gain, and θ_H ∈ ℝ^3 can be the displacements of joints J4, J5, and J6 of the haptic device 108 from their respective mean positions.
In some embodiments, computing device 106 and/or mobile platform 110 can transform the Cartesian pose formed from xR
In some embodiments, the EE position feedback from the robot (e.g., the ultrasound probe 118) can be used, in addition to the position commands from the haptic device 108 multiplied by a scaling factor (e.g., 1.5), to determine the EE reference positions of the robot in the task space. This can be done by adding new reference positions read from the displacement of the haptic device 108 to command the final EE position of the robot in a Cartesian task-space as shown in EQ. (8) below.
The forward kinematics for positions of the haptic device 108 can be calculated with respect to the origin (e.g., the starting position of the tip of the stylus) through the Phantom Omni driver. The X, Y, and Z coordinates can be the new coordinates of the EE. Initx, Inity, and Initz can be the initial positions of the EE (e.g., the ultrasound probe 118) for that scan position. Hx, Hy, and Hz can be the positions of the tip of the stylus of the haptic device 108 relative to an origin of the haptic device 108 (e.g., the tip of the stylus at a point in time, such as after the ultrasound probe 118 moves to the particular FAST scan location).
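Because EQ. (8) itself is not reproduced above, the following sketch assumes the additive form implied by the description (initial EE position plus the stylus displacement multiplied by the example scaling factor of 1.5); the numeric values are illustrative only.

```python
# Assumed-form sketch of the position-guiding reference computation around EQ. (8).
import numpy as np

SCALE = 1.5   # example scaling factor from the description

def ee_reference_position(init_xyz, haptic_xyz, scale=SCALE):
    """Initial EE position plus scale * haptic stylus displacement."""
    return np.asarray(init_xyz) + scale * np.asarray(haptic_xyz)

if __name__ == "__main__":
    init = (0.42, -0.10, 0.25)          # EE position at the current FAST scan location (m)
    stylus = (0.012, -0.004, 0.000)     # stylus tip displacement from the haptic origin (m)
    print("new EE reference:", ee_reference_position(init, stylus))
```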
In some embodiments, the stylus of the haptic device 108 can be oriented using the roll, pitch, and yaw (e.g., see
In EQ. (9), θ̇_r, θ̇_p, and θ̇_y are the roll, pitch, and yaw rates of the EE (of the robot arm) in the world frame. K1, K2, and K3 can be controller gains. Ho4, Ho5, and Ho6 can be wrist joint angles of the haptic device 108 (see, e.g.,
In some embodiments, a hybrid control scheme can be used to control the robot arm 122. For example, the hybrid control scheme can use a combination of position and rate guiding modes. In some embodiments, the translations can be controlled using the components and techniques described above (e.g., using EQS. (6) or (8)). In some embodiments, the rate control can be used as a guiding scheme during the manipulation of the EE of the robot arm 122 while the position is maintained (e.g., during orientation commands). The roll, pitch, and yaw rate of the EE of the robot arm 122 (e.g., the three most distal joints of the robot arm 122) can be controlled using individual wrist joints (e.g., J4, J5, and J6) of the haptic device 108. The rate of change of each orientation of the haptic device 108 can be directly proportional to the deviation of each axis of the respective joint (e.g., the angles of J4, J5, and J6) relative to their corresponding mean position. In some embodiments, a dead zone of −35 degrees to +35 degrees can be used for the roll and yaw of the haptic device 108, and a dead zone of −20 degrees to +20 degrees can be used for the pitch of the haptic device 108. The respective dead zones can help avoid accidental changes in the EE orientation of the robot arm 122.
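A hedged sketch of this dead-zone rate guiding is shown below: each wrist-joint deviation of the haptic device (J4, J5, J6) produces a roll, pitch, or yaw rate for the EE only once it leaves the quoted dead zone (±35° for roll/yaw, ±20° for pitch). The gains, and the choice to use only the portion of the deviation beyond the dead zone, are assumptions.

```python
# Assumed-gain sketch of dead-zone rate guiding for EE orientation.
import numpy as np

DEAD_ZONE_DEG = np.array([35.0, 20.0, 35.0])   # roll, pitch, yaw dead zones (deg)
GAINS = np.array([0.5, 0.5, 0.5])              # assumed controller gains (1/s)

def orientation_rates(joint_deviation_deg):
    """Map J4/J5/J6 deviations (deg from mean) to EE roll/pitch/yaw rates (deg/s)."""
    dev = np.asarray(joint_deviation_deg, dtype=float)
    outside = np.abs(dev) > DEAD_ZONE_DEG
    # Only the portion of the deviation beyond the dead zone contributes.
    effective = np.where(outside, dev - np.sign(dev) * DEAD_ZONE_DEG, 0.0)
    return GAINS * effective

if __name__ == "__main__":
    print(orientation_rates([40.0, 10.0, -50.0]))   # -> roll and yaw move, pitch stays zero
```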
In some embodiments, a rate control scheme without the dead zones can be mathematically represented using the following relationship:

θ̇_R^EE = K_θ θ_H   (10)

where θ̇_R ∈ ℝ^3 can be the roll, pitch, and yaw rates of the EE in the world frame, K_θ = diag[k_θ1, k_θ2, k_θ3] > 0 can be the controller gains, and θ_H ∈ ℝ^3 can be the displacements of joints J4, J5, and J6 of the haptic device 108 from their respective mean positions.
In some embodiments, the rate control scheme described above in connection with EQS. (7), (9), and (10) can decouple the orientation axes from each other, which may implicitly cause the user to move one orientation axis at a time, and can ensure that the orientation of the haptic device 108 remains calibrated with what the user sees in the tool camera (e.g., the mobile camera 116) as the roll, pitch, and yaw joints J4, J5, and J6 of the haptic device 108 are brought back inside their respective dead zones.
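As an illustrative sketch only, the following Python function maps haptic wrist-joint deviations to EE roll, pitch, and yaw rate commands in the spirit of EQ. (10), extended with the dead zones described above; the gain values, the per-axis dead-zone assignment, and the function name are assumptions for the example rather than parameters of the disclosed system.

import numpy as np

def wrist_rate_command(wrist_angles_deg, mean_angles_deg,
                       gains=(0.5, 0.5, 0.5),
                       dead_zones_deg=(35.0, 20.0, 35.0)):
    # wrist_angles_deg: current angles of haptic joints J4, J5, J6 (degrees)
    # mean_angles_deg: mean (rest) angles of the same joints
    # gains: per-axis controller gains (assumed values)
    # dead_zones_deg: dead zones, assumed to map to (roll, pitch, yaw) as (35, 20, 35)
    deviations = np.asarray(wrist_angles_deg, float) - np.asarray(mean_angles_deg, float)
    rates = np.zeros(3)
    for i, (dev, k, dz) in enumerate(zip(deviations, gains, dead_zones_deg)):
        if abs(dev) > dz:
            # Command a rate proportional to the deviation beyond the dead zone;
            # deviations inside the dead zone produce no motion.
            rates[i] = k * (dev - np.sign(dev) * dz)
    return rates  # [roll_rate, pitch_rate, yaw_rate]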
In some embodiments, as the user (e.g., a radiologist) manipulates the haptic device 108, the user can be presented with ultrasound images and force values at 258. The ultrasound images and the force values can be transmitted to the computing device 106 (e.g., via the communication network 104) in real-time. In some embodiments, real-time feedback provided at 258 can assist a user in determining how to adjust the position (and/or orientation) of the stylus of the haptic device 108 to thereby move the ultrasound probe 118 of the robotic system 114. Generally, the force values from the force sensor 120 can be forces acting normal to the scanning surface of the ultrasound probe 118 (e.g., because the quality of the ultrasound image is only affected by normal forces, and not lateral forces applied by the ultrasound probe 118). In some embodiments, the force value from the force sensor 120 can be calibrated to adjust for gravity.
In some embodiments, a soft virtual fixture (“VF”) can be implemented (e.g., as a feature on a graphical user interface presented to the user) to lock the position of the EE of the robotic system 114, while allowing only orientation changes, under certain conditions (e.g., as soon as a normal force greater than a threshold, such as 7 N, is received). For example, the user can make sweeping scanning motions while the robotic system 114 maintains the ultrasound probe 118 in stable and sufficient contact with the patient. For example, when the virtual fixture is initiated, the virtual fixture can prevent any translation except in the +z_R axis (e.g., the axis normal to the ultrasound probe 118 and away from the patient). In some embodiments, to continue scanning and deactivate the virtual fixture, the user can be required to move the ultrasound probe 118 away (e.g., via the haptic device 108) until the ultrasound probe 118 is no longer in contact with the patient. Additionally, in some embodiments, a hard virtual fixture can be used to cut the system off (e.g., cease operation of the robotic system 114) if (and/or based on) the magnitude of forces acting on the patient exceeds 12 N (or any other suitable force threshold).
In some embodiments, when the soft VF is initiated, the soft VF can “lock” the EE of the robot arm 122 to inhibit translational motion, while still allowing orientation control, when a normal force value received by a suitable computing device is greater than a threshold value. In some embodiments, the soft VF can help limit the forces applied to the patient while keeping the probe stable during sweeping scanning motions. In some embodiments, the force threshold for the soft VF can be set to about 3 N (e.g., which can be determined experimentally based on ultrasound image quality).
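A simplified Python sketch of the soft and hard virtual fixture logic described above is shown below; the command representation, the helper name, and the default thresholds (the 7 N and 12 N values discussed above are used, although other values, such as about 3 N for the soft VF, can be substituted) are assumptions for the example.

def apply_virtual_fixtures(command, normal_force_n,
                           soft_threshold_n=7.0, hard_threshold_n=12.0):
    # command: dict with 'translation' (dx, dy, dz in the EE frame) and
    # 'rotation' (droll, dpitch, dyaw) entries from the tele-manipulation input
    if normal_force_n >= hard_threshold_n:
        # Hard VF: cease motion entirely.
        return {'translation': (0.0, 0.0, 0.0),
                'rotation': (0.0, 0.0, 0.0),
                'halt': True}
    if normal_force_n >= soft_threshold_n:
        # Soft VF: lock translation except retraction along +z (away from the patient),
        # while still allowing orientation control for sweeping motions.
        dx, dy, dz = command['translation']
        return {'translation': (0.0, 0.0, max(dz, 0.0)),
                'rotation': command['rotation'],
                'halt': False}
    return {'translation': command['translation'],
            'rotation': command['rotation'],
            'halt': False}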
In some embodiments, force feedback models can be used prior to contact with the patient. For example, computing device 106 and/or mobile platform 110 can generate an artificial potential field(s) (“APF”), which can be defined as emerging from the center of a wound and applying a force in a direction away from the wound with a strength that diminishes with distance from the wound (e.g., the field can decrease inversely with the square of the distance). For example, the APF can be used to calculate a force to apply along a direction from the wound to the EE based on the current distance between the EE and the wound. In some embodiments, APFs can be generated in the shape of a hemisphere with a radius 2 cm greater than the radius of the identified wound. For example, the radius of the APF shown in
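The following Python sketch illustrates one way the artificial potential field described above could be evaluated; the field constant, the handling of the 2 cm radius offset, and the function name are assumptions for the example.

import numpy as np

def apf_repulsive_force(ee_position, wound_center, wound_radius_m,
                        field_constant=0.05, radius_offset_m=0.02):
    # Returns a force vector pushing the EE away from the wound center.
    # The field is active inside a hemisphere of radius (wound_radius_m + radius_offset_m),
    # and its magnitude decreases with the inverse square of the distance to the wound.
    ee_position = np.asarray(ee_position, dtype=float)
    wound_center = np.asarray(wound_center, dtype=float)
    offset = ee_position - wound_center
    distance = np.linalg.norm(offset)
    if distance == 0.0 or distance > wound_radius_m + radius_offset_m:
        return np.zeros(3)
    direction = offset / distance
    magnitude = field_constant / (distance ** 2)
    return magnitude * direction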
In some embodiments, a user can provide input to instruct the robotic system 114, via a user input on the computing device 106, to travel to a next FAST location (e.g., when the user deems that the ultrasound images gathered at a particular FAST location are sufficient for diagnostic purposes (e.g., above an image quality level)). At the next FAST location, flow 234 can repeat the actions of presenting images and force to the user at 258, and using haptic feedback control at 256. After all of the desired FAST locations have been imaged with the ultrasound probe 118, the radiologist (or other user, or computing device 106) can analyze the ultrasound images/data for a diagnostic determination. In some embodiments, the radiologist can communicate with the first responders (e.g., via the computing device 106 communicating with the mobile platform 110) to convey information about the diagnosis to the first responders. Examples of instructions can include a graphical image on the display (e.g., the display 146), audible instructions conveyed via the inputs 148 (e.g., a microphone), etc. This way, the first responders can implement life-saving techniques while in transit to the trauma center (e.g., hospital).
At 406, process 400 can include positioning the ultrasound probe 118 at a specific FAST scan location. For example, the user can select (using the inputs 128 and the computing device 106) one of the FAST scan locations that have been previously identified (e.g., at 404). As detailed above, the FAST scan location can correspond to a coordinate (or coordinate range) that, when instructed to the robotic system 114, causes the ultrasound probe 118 to move to that particular location (or location range). In some embodiments, once the ultrasound probe 118 is placed at the particular location, the user can select (using the inputs 128 and the computing device 106) or enable usage of the haptic device 108 to transmit movement instructions to the robotic system 114. For example, the user (radiologist) can manipulate the stylus of the haptic device 108 until the ultrasound probe 118 is positioned at a desirable location (e.g., at 408). In some embodiments, while moving the haptic device 108, the user can receive ultrasound images (captured by the ultrasound probe 118 and transmitted to the computing device 106 via the mobile platform 110). Similarly, while moving the haptic device 108 (or while use of the haptic device 108 is enabled), the user can view presented images (e.g., from the ultrasound probe 118 and cameras 112, 116) and the (normal) force values (from the force sensor 120), where the images and force sensor values can be transmitted to the computing device 106, via the mobile platform 110, and displayed accordingly. In some embodiments, the information received by the computing device 106 can be displayed (e.g., via the display 126) to the user (radiologist). In some embodiments, the displayed information may allow the user to effectively adjust the orientation (or position) of the haptic device 108 to move the ultrasound probe 118. This way, the tactile proficiency of the user can be telecommunicated to the mobile platform 110, while the user is remote from the subject (e.g., the trauma patient).
At 410, process 400 can determine whether the user has finished acquiring ultrasound images at the specific FAST scan location. For example, if process 400 determines that no user input has been received (within a time period) to move to a new FAST scan location, the process can return to 408 and continue receiving input from a remote user to control the ultrasound probe. Alternatively, if the user has finished imaging at 410, the process 400 can proceed to determining whether additional FAST locations are desired to be imaged at 412. For example, the user, via the inputs 128, can select a user input, and the computing device 106 can proceed to 412 after receiving the user input. Alternatively, if at 410 the process 400 determines that imaging has not been completed (e.g., such as by a lack of a received user input), the process 400 can proceed back to receiving input from a remote user at 408. In some embodiments, if at 412 the process 400 determines that additional FAST scan locations are desired to be imaged, such as by receiving a user input for an additional FAST scan location, the process 400 can proceed back to positioning the probe at a FAST scan location at 406. Alternatively, if at 412 additional FAST scan locations are not desired (e.g., as indicated by a user input or the lack thereof during a time period, such as after 410), the process 400 can proceed to 414 to annotate, store, and/or transmit acquired images. Upon receiving sufficient ultrasound images at the specific FAST locations, the user (radiologist) can store the images (e.g., within the computing device 106), can annotate images, such as by highlighting portions of an image indicative of a disease state (e.g., hemorrhage), and can transmit ultrasound images (including highlighted images) using the mobile platform 110.
In some embodiments, the machine learning model trained at 602 can be a Faster R-CNN. More information regarding the architecture and training of a Faster R-CNN model can be found in Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” which is hereby incorporated by reference herein in its entirety.
At 604, process 600 can train a machine learning model to identify one or more candidate regions in an image that are likely to correspond to a class for which the machine learning model was trained at 602. For example, the machine learning model can use a trained image classification CNN (e.g., the image classification CNN trained at 602) as a feature extractor to identify features corresponding to wound, umbilicus, and/or bandage image data (e.g., among other features), which can be used in training of a region proposal network and a detection network for the identification of candidate regions, and/or regions of interest within a corresponding image.
For example, process 600 can use images depicting the classes on which the classification network was trained at 602 to train a machine learning model to identify candidate regions in the images. In such an example, the training images can be cropped such that the wound, umbilicus, or bandage (or other feature) occupies over 70 percent of the image, which can facilitate learning of the differences between the three classes.
In some embodiments, additional training images can be generated by augmenting the cropped images to make the classification and detection more robust. Examples of augmentation operations include random pixel translation, random pixel rotation, random hue changes, random saturation changes, image mirroring, etc.
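A minimal Python sketch of the augmentation operations listed above, using Pillow and NumPy, is shown below; the specific parameter ranges are assumptions chosen for the example and are not the values used to produce the results reported herein.

import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(image):
    # Random pixel translation
    dx, dy = random.randint(-20, 20), random.randint(-20, 20)
    image = image.transform(image.size, Image.AFFINE, (1, 0, dx, 0, 1, dy))
    # Random rotation
    image = image.rotate(random.uniform(-15, 15))
    # Random saturation change
    image = ImageEnhance.Color(image).enhance(random.uniform(0.7, 1.3))
    # Random hue change (shift the hue channel in HSV space)
    hsv = np.array(image.convert("HSV"), dtype=np.uint8)
    hsv[..., 0] = (hsv[..., 0].astype(int) + random.randint(-10, 10)) % 256
    image = Image.fromarray(hsv, "HSV").convert("RGB")
    # Random mirroring
    if random.random() < 0.5:
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
    return image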
At 606, process 600 can use the machine learning model by providing an image to the machine learning model (e.g., from fixed camera 112 and/or mobile camera 116). In some embodiments, images from fixed camera 112 and/or mobile camera 116 can be formatted to have similar characteristics to the images used for training at 602 and/or 604, for example, a similar aspect ratio, similar size (in pixels), similar color scheme (e.g., RGB), etc.
In some embodiments, at 608, process 600 can receive an output from the machine learning model indicating which class(es) is present in the input image provided at 606. In some embodiments, the output can be in any suitable format. For example, the output can be received as a set of likelihood values associated with each class indicating a likelihood that each class (e.g., wound, bandage, umbilicus) is present within the image. As another example, the output can be received as a label indicating that a particular class is present with at least a threshold confidence (e.g., at least 50% confidence, at least 70% confidence, at least 95% confidence, etc.).
At 610, process 600 can receive an output from the machine learning model that is indicative of a region(s) within the input image that correspond to a particular class(es). For example, a region can be defined by a bounding box labeled as corresponding to a particular class.
In some embodiments, process 600 can carry out 608 and 610 in parallel (e.g., substantially simultaneously) or serially in any suitable order. In some embodiments, multiple trained machine learning models can be used in which the output of one machine learning model (e.g., an image classification model) can be used as input to another machine learning model (e.g., as a feature input to a region identification model).
At 612, process 600 can map one or more regions of interest received at 610 to the 3D point cloud. For example, the output image can be registered to the 3D point cloud and, based on the output of the classification model received at 608, can be used to map a location(s) to avoid (e.g., bandages, wounds) and/or landmarks to use for other calculations (or determinations), such as the umbilicus. Note that although bandages and wounds are described as examples of objects for which the location can be mapped such that the object can be avoided, this is merely an example, and process 600 can be used in connection with any suitable object to be avoided during an ultrasound procedure, such as an object protruding from the skin of a patient (e.g., a piercing, an object which has become embedded during a traumatic injury, etc.), clothing, medical equipment (e.g., an ostomy pouch, a chronic dialysis catheter, etc.), etc. In some embodiments, at 602, the machine learning model can be trained to identify such objects to be avoided. Additionally or alternatively, in some embodiments, process 600 can map regions to be avoided by positively identifying regions that are permissible to scan, such as regions corresponding to skin. For example, process 600 can be modified (e.g., based on skin segmentation techniques described below in connection with
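As a simplified illustration of mapping a detected region into 3D, the following Python sketch back-projects the center of a bounding box using a registered depth image and pinhole camera intrinsics; the parameter names and the use of the box center (rather than, e.g., the full region) are assumptions for the example.

import numpy as np

def bbox_center_to_camera_frame(bbox, depth_image_m, fx, fy, cx, cy):
    # bbox: (x, y, w, h) in pixels; depth_image_m: registered depth image in meters
    # fx, fy, cx, cy: pinhole camera intrinsics
    x, y, w, h = bbox
    u, v = int(x + w / 2), int(y + h / 2)
    z = float(depth_image_m[v, u])
    if z <= 0.0:
        raise ValueError("no valid depth at the bounding-box center")
    # Back-project the pixel into the camera frame; the result can then be
    # transformed into the point-cloud/robot frame using known extrinsics.
    X = (u - cx) * z / fx
    Y = (v - cy) * z / fy
    return np.array([X, Y, z])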
At 702, process 700 can begin remote feedback control when a robot arm (e.g., robot arm 122) moves an ultrasound probe (e.g., ultrasound probe 118) to a first predetermined location (e.g., first FAST scan location). In some embodiments, in response to the ultrasound probe (e.g., the ultrasound probe 118) moving to a particular location, process 700 can transmit an instruction (e.g., to the computing device 106) to enable toggling of a graphical user interface switch (e.g., on a display 126). In some embodiments, prior to enabling toggling of a GUI switch, the GUI switch can be disabled (e.g., greyed out). In some embodiments, the computing device 106 can receive a toggling of the user interface switch from the user, which can cause the computing device 106 to request at least partial control of the robotic system 114.
At 704, process 700 can receive user input (e.g., via the computing device 106 and/or haptic device 108), which can include receiving movement information corresponding to movements of the haptic device 108. In some embodiments, process 700 can transmit the movement information directly (e.g., directly to the mobile platform 110, robot system 114, and/or robot arm 122). Additionally or alternatively, in some embodiments, process 700 can transform the movement information (e.g., by computing corresponding movements of the robotic system 114 and/or robot arm 122, and/or by appropriately scaling) prior to transmission (e.g., to the mobile platform 110, robot system 114, and/or robot arm 122).
At 706, process 700 can determine and/or transmit movement information and/or commands for the robotic system 114 and/or robot arm 122, which are based on the movements of the haptic device 108. For example, as described above in connection with
At 708, process 700 can present an ultrasound image, one or more images of the patient (e.g., from the fixed camera 112 and/or the mobile camera 116), and force information (e.g., based on a force value from force sensor 120). For example, process 700 can receive ultrasound images, RGB images, and/or force information from mobile platform 110, and can use the information to populate a graphical user interface displayed by display 126. In some embodiments, the displayed information may allow the user (e.g., a radiologist) to remotely adjust the position and/or orientation of the ultrasound probe 118 via the haptic device 108 based on information gathered from the patient by the mobile platform 110.
At 710, the process 700 can receive a user input for a next FAST scan location. For example, a user input may be received via an actuatable user interface element provided via a graphical user interface. In some embodiments, if an input has not been received at 710 to move to a new location, the process 700 can proceed back to 704 to continue to receive haptic device movement information as feedback information is presented at 708.
Otherwise, if an input has been received to move to a new location, the process 700 can proceed to 712, at which process 700 can inhibit the remote feedback control and transmit control to the robot system 114 to move the robot arm to the next selected location. For example, if the system 100 is implemented using a toggle for activation of feedback control, the computing device 106 can deactivate the toggle to prevent the user from controlling the robot system 114 while the robot arm 122 is being positioned at the next FAST scan location.
The 3D scan initiation button 722, when actuated by a user, can cause the computing device 106 to transmit an instruction to the mobile platform 110, which can cause the robotic system 114 to acquire 3D imaging data (e.g., to initiate 502 of process 500). The tele-manipulation toggle switch 724, when actuated, can cause the mobile platform 110 to translate movements received from the haptic device 108 into movements of the ultrasound probe 118 (e.g., initiating remote control of the robotic system 114). When the toggle switch 724 is deactivated, the robotic system 114 does not act on movements transmitted via the mobile platform 110. The ultrasound image 726 can be a real-time ultrasound image, acquired by the ultrasound probe 118, and transmitted to the computing device 106 from the mobile platform 110. In some embodiments, when the ultrasound imaging data (or image) is received by the computing device 106, the computing device 106 displays the ultrasound image 726 on the display 126.
The robotic software architecture (and graphical user interface) was developed to control the remote trauma assessment system, and included a control system having planning algorithms, robot controllers, computer vision, and control allocation strategies integrated via the Robot Operating System (“ROS”). More information regarding ROS can be found in Quigley, et al., “ROS: an open-source robot operating system,” which is hereby incorporated by reference herein in its entirety. Smooth time-based trajectories were produced between the waypoints using the Reflexxes Motion Libraries. More information regarding the Reflexxes Motion Libraries can be found at “Reflexxes motion libraries for online trajectory generation” (available at reflexxes(dot)ws), which is hereby incorporated by reference herein in its entirety. The Kinematics and Dynamics Library (“KDL”) in Open Robot Control Systems (“OROCOS”) was used to transform the task-space trajectories of the robot to joint-space trajectories, which is the final output of the high-level autonomous control. More information regarding the library can be found at Smits, “KDL: Kinematics and Dynamics Library” (e.g., available at orocos(dot)org/kdl), which is hereby incorporated by reference herein in its entirety. Finally, the IIWA stack applies the low-level controllers of the robot to follow the desired joint-space trajectories. More information regarding the IIWA stack can be found in Hennersperger, et al., “Towards MRI-based autonomous robotic US acquisitions: A first feasibility study,” which is hereby incorporated by reference herein in its entirety. A graphical user interface, such as graphical user interface 714, was developed in MATLAB and linked to ROS for the user to command the robot, display instantaneous forces on the probe, initiate a 3D scan, move the robot to each initial scan location, and toggle tele-manipulation. More information regarding the ROS MATLAB interface can be found at “Robot operating system (ros) support from robotics system toolbox—matlab” (available at www(dot)mathworks(dot)com/hardware-support/robot-operating-system), which is hereby incorporated by reference herein in its entirety. In this example, two other screens showed live feeds from the world camera, the tool camera, and the US system.
In some embodiments, the reference frames between the tool (ultrasound probe), camera, and haptic device can be determined. The robot was tele-operated in the tool frame, which is defined at the tip of the ultrasound probe. This allowed the operator to reorient the probe without any change in the position of the probe tip, enabling the sweeping scanning motions necessary for FAST while the probe tip maintained its position in contact with the phantom. Even though the operator perceived the scene in the camera frame through the GUI, it was observed that the fixed relative transformation between the camera and tool frames did not prevent the user from operating intuitively. The operator used visual feedback from the RGB cameras to command the robot using a Geomagic Touch haptic device. In this example, the haptic device has 6 DOFs, one less than the manipulator, and a smaller workspace. For tele-manipulation, the robot arm was autonomously driven to each initial FAST location.
The machine learning model (e.g., the machine learning model 244) was implemented as a Faster R-CNN. The Faster R-CNN model was trained from a pre-trained network using transfer learning techniques. AlexNet trained on ImageNet was imported using MATLAB's Machine Learning Toolbox, and the fully connected layer of AlexNet was modified to account for 3 classes (wounds, umbilicus, and bandage). After training the classification network, the Faster R-CNN model was trained on the cropped, augmented images. Using stochastic gradient descent with momentum (“SGDM”) and an initial learning rate of 1e-4, the network converged in 10 epochs and 2450 iterations. The Faster R-CNN model was trained on 858 wound images, 2982 umbilicus images, and 840 bandage images. Some of the wound images were taken from the Medetec Wound Database (e.g., available at www(dot)medetec(dot)co(dot)uk/files/medetec-image-databases) and the rest of the images for wound, umbilicus, and bandage were obtained from Google Images.
The CNN trained previously was used as a feature extractor to simultaneously train the region proposal network and the detection network. The positive overlap range was set to [0.6 1], such that if the intersection over union (“IOU”) of a bounding box is greater than 0.6, the bounding box is considered a positive training sample. Similarly, the negative overlap range was [0 0.3]. Using SGDM, the network finished training the region proposal network and the detection network in 1,232,000 iterations. Before calculating the final detection result, some optimization techniques were also applied to the bounding boxes (“BBs”). These included removing any BB with an aspect ratio less than 0.7, and constraining the width and height of each BB to 300 pixels. These threshold values were experimentally determined. Additionally, any BB outside of the skin was removed. The skin detection was carried out by color thresholding in the RGB, YCbCr, and HSV color spaces.
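The bounding-box post-processing described above (aspect-ratio filtering, size constraints, and rejection of boxes falling outside detected skin) can be sketched in Python as follows; the skin-overlap fraction used to decide whether a box is “out of the skin” is an assumption for the example, as is the function name.

import numpy as np

def filter_bounding_boxes(boxes, skin_mask, min_aspect=0.7, max_side=300,
                          min_skin_fraction=0.5):
    # boxes: iterable of (x, y, w, h) in pixels
    # skin_mask: binary H x W array, 1 where color thresholding detected skin
    kept = []
    for (x, y, w, h) in boxes:
        aspect = min(w, h) / max(w, h)
        if aspect < min_aspect:
            continue  # remove boxes with an aspect ratio less than 0.7
        if w > max_side or h > max_side:
            continue  # constrain width and height to 300 pixels
        patch = skin_mask[y:y + h, x:x + w]
        if patch.size == 0 or patch.mean() < min_skin_fraction:
            continue  # remove boxes that fall outside of the detected skin
        kept.append((x, y, w, h))
    return kept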
Generally, to train the feature extractor, cropped image data was augmented to increase the amount of training data and make classification and detection more robust. A total of 604 original wound images, 350 umbilicus images, and 200 bandage images were used, which after data augmentation amounted to 3539, 2100, and 1380 images, respectively. The data was split randomly into 70 percent training data and 30 percent validation data. A separate data set for testing was also created, which consisted of 143 wound, 382 umbilicus, and 140 bandage images.
For some uses, Faster R-CNN has advantages over other machine learning models. For example, Faster R-CNN for object detection can simultaneously train the region proposal network and the classification network, and the detection time can also be significantly faster than other region-based convolutional neural networks. The simultaneous nature of Faster R-CNN can improve region proposal quality and estimation as opposed to using fixed region proposals, such as in selective search.
The phantom (e.g., the phantom 810) was scanned with the robotic system and four FAST scan positions were estimated using atlas-based scaling as described previously. Table 1 (below) shows the estimated positions and actual positions of the four FAST scan locations. The actual positions are the positions of the FAST scan locations on the phantom, manually selected by an expert radiologist. The average position accuracy of the system was 10.63 cm±3.2 cm.
The feature extractor CNN was evaluated using sensitivity (recall, or true positive rate) and specificity (true negative rate) as metrics, as shown in Table 2 (below). The sensitivity and specificity for all three classes were above 94 percent, and the CNN can therefore be used as a base network for the Faster R-CNN model. In one example, the overall accuracy for the classifier is 97.9 percent. The increased sensitivity and specificity for the umbilicus class can be explained by the significantly higher amount of training data as compared to the other classes. Once the Faster R-CNN model was trained, the Mean Average Precision (“mAP”) was calculated on the test data (e.g., Table 2) at an Intersection Over Union (“IOU”) of 0.5. This metric was followed as per the guidelines in PASCAL VOC 2007. The mAP for the three classes at an IOU of 0.5 was 0.51, 0.55, and 0.66 for umbilicus, bandage, and wounds, respectively, as shown in
A 34-minute training session about the robotic system, which also included practice with tele-manipulation, was given to the expert radiologist. Following the training, a complete FAST scan of the phantom was conducted. For the scanning procedure using the robotic system, the position of the umbilicus along with the 3D image was used to autonomously estimate the FAST scan locations and position the robot autonomously just above consecutive FAST scan locations, as described above. Following this autonomous initialization, the remotely-located radiologist commanded the robot to move to each scan location one by one and perform a tele-manipulated FAST exam. While the radiologist manipulated the haptic device, tele-manipulation commands were transmitted to the robot. Additionally, the force value from the force sensor was transmitted to the haptic device to provide feedback for the radiologist. In a more particular example, the force value provided to the haptic device resisted movement by providing a force normal to the stylus of the haptic device that was proportional to the forces acting on the ultrasound probe.
The scanning procedure using the robotic system was completed in 16 minutes and 3 seconds, as compared to freehand manual scanning, which was completed in 4 minutes and 13 seconds. However, the real-time images obtained by the robot were found to be more stable and to have better contrast. This can be attributed to the robot's ability to hold the probe stationary in position.
The positioning test included a custom test rig with 3 negative 3D printed molds of the probe enlarged by 10% (see
The sweeping test involved rotating the probe about the EE's roll, pitch, and yaw axes, individually from −30 to +30 degrees while being in contact with the foam test rig at a marked location (see
Performance of a remote trauma assessment system implemented in accordance with some embodiments of the disclosed subject matter was analyzed based on two main criteria: the ability of the probe to reach the desired positions in the desired orientations (positioning test) and the ability of the probe to sweep at the desired position (sweeping test). The metric for assessing the advantages of the VF was based on the consistency of forces and the maximum force exerted during the scan. The positioning test was analyzed based on the completion time and the number of collisions. The former was the time taken for the operator to correctly orient and place the probe into each mold, while the latter was the number of times the probe collided with surfaces on the test-bed before fitting inside the mold. In general, the less time taken and the fewer the collisions, the better.
Two metrics were used to assess the quality of the ultrasound scan for the sweeping test: the velocity of the sweep in each axis, and the smoothness of the sweep. The velocity was the average velocity during the sweep in each axis. The faster the sweep, the better, as this allows for faster maneuverability of the probe. Smoothness, on the other hand, was measured by the standard deviation in angular velocities of each axis during the sweep. The lower the standard deviation, the smoother the sweep.
Two metrics were used to study the benefits of the VF: the consistency of forces and the maximum force exerted during a scan. Consistency was determined using the standard deviation of the forces along with the percentage of time during which the probe was in contact with the foam during the sweep. The higher the percentage and the lower the standard deviation, the better the system's performance. The VF is responsible for maintaining a limit on the maximum forces exerted on the patient. Hence, the closer the maximum force is to the VF threshold, the better.
The results for the positioning test can be seen in Table 3 below. Columns 2 and 3 of Table 3 show the total number of collisions for the positions described in
As shown in Table 3, there were a total of 18 collisions using the Position control strategy as compared to 9 collisions using the Hybrid control strategy amongst the 4 participants. The average time of completion for all three positions was 125.25 seconds for Position control and 100.33 seconds for Hybrid control. Even though users performed approximately 25% faster with the Hybrid control strategy, participants performed significantly better in whichever test they performed second.
For the sweeping test, the Position and Hybrid control strategies were each tested with and without an active VF, totaling 4 test combinations. The results of the sweeping tests are shown in Table 4 and Table 5. Columns 2 and 4 in Tables 4 and 5 show the average angular velocities of all test subjects in each axis (in degrees/sec), while Columns 3 and 5 show the standard deviation of the group.
Comparing the Position and Hybrid control strategies without VF in Table 4, the standard deviation of angular velocities is 58% lower for the Hybrid strategy. For the same test with VF, the standard deviation of angular velocities is 70% lower for the Hybrid strategy. Therefore, the Hybrid strategy may allow the operator to sweep the probe with much more consistent angular velocities during the ultrasound scan. The Position control strategy, however, provides 80% faster angular velocities without VF and 114% faster angular velocities with VF. Hence, the Position control strategy may allow for faster reorientation of the probe. In some cases, it may be preferable to prioritize the quality of the ultrasound image over speed, as the speed of the scan is insignificant compared to the total time the patient generally spends en route to the hospital.
Comparing Tables 7 and 6, it can be seen that the mean percentage duration of contact during the sweep is almost 100% higher with VF. While the average forces exerted in both cases are similar, the standard deviation of forces without the VF is approximately 150% higher than when the VF is active. The maximum force exerted by the system is also significantly closer to the desired VF threshold of 7 N when the VF is active. Despite the presence of a VF, the forces can exceed the threshold because the VF only locks the probe in its current position and should not be confused with impedance control. The increase in forces is mainly due to changes in the interaction forces between the probe and the test-bed.
In this example experiment, the ability of the remote trauma assessment system to accurately classify umbilicus and wounds on the classification test data was evaluated, and then the detection accuracy on the detection test data was calculated. The sensitivity (recall, or true positive rate) and specificity (true negative rate) of the feature extractor CNN for both classes was above 94%. The CNN can therefore be used as a base network to train the Faster R-CNN model. The overall accuracy for the classifier was 97.9%. Once the Faster R-CNN model was trained, the Mean Average Precision (“mAP”) was calculated on the detection test data at an IOU of 0.5. This metric was followed as per the guidelines established in PASCAL VOC 2007. The mAP for the two classes at an IOU of 0.5 was 0.51 and 0.66 for umbilicus and wounds, respectively.
In some embodiments, a remote trauma assessment system implemented in accordance with some embodiments of the disclosed subject matter can generate a warning for a radiologist if a wound is detected near a FAST exam location. Moreover, the FAST exam points are estimated with respect to the umbilicus in some embodiments. Accordingly, it can be important to accurately determine the position of these objects in the robot's frame. In an experiment described in connection with Table 8, readings were taken for objects of each of the three classes placed at 5 random positions and angles on a wound phantom, for a total of 15 tests. The ground truth values for the umbilicus and wounds were estimated by touching the actual center of the object using a pointed tool attached to the robot and performing forward kinematics to determine the object locations in the robot world frame.
The results for average error (Euclidean distance) and standard deviation for each class are shown in Table 8. The average localization error for both classes combined was found to be 0.947 cm±0.179 cm. To evaluate the accuracy of the estimated FAST exam points, five localization phantoms were scanned with the robotic system and the four FAST scan positions were estimated using techniques described above. The actual positions are the centroids of the FAST regions on the localization phantoms (e.g., the FAST scan positions 1 and 2 are placed symmetrically on the body), and hence their accuracies are reported together. Since the final step of the semi-autonomous FAST exam is tele-manipulated, the estimated locations need not be highly accurate and only need to be within the workspace of the haptic device and the Field Of View (“FOV”) of the camera. Table 9 shows the mean error (Euclidean distance) and standard deviation between the estimated and actual positions of the four FAST exam points in the robot's base frame. The third column of Table 9 shows whether the estimated point was within the scanning region marked by the expert radiologist for each FAST scan location. The average position accuracy of the system was 2.2 cm±1.88 cm. All of the estimated points were found to be within the marked scanning regions. The largest error for any location over the five test phantoms was 7.1 cm, which was well within the workspace of the slave robot and the FOV of the RGB-D camera. Thus, all of the FAST points were within tele-manipulatable distance from the estimated initialization positions.
In this experiment, subjects were asked to tele-manipulate the robot from the start to the end point while trying to avoid a wound of 4 cm radius in the path. The resulting trajectories for 4 different subjects from a pilot study involving 8 human subjects are shown in
In some embodiments, at 802, the process 800 can train an automated skin segmentation model to automatically segment portions of an input image that correspond to skin. For example, after training at 802, the automated skin segmentation model can receive an input image, and output a mask corresponding to the input image in which each pixel of the mask corresponds to a portion of the input image (e.g., a pixel of the input image), and each pixel of the mask has a value indicative of whether the corresponding portion of the image is more like skin or not skin.
In some embodiments, one or more sets of labeled training data can be used to train the automated skin segmentation model at 802. The labeled training data can include images in which portions of the images corresponding to skin have been labeled as skin, and portions of the images that correspond to something other than skin (e.g., background, clothing, tattoos, wounds, etc.) have been labeled as non-skin (or not labeled). In some embodiments, the label associated with each training image can be provided in any suitable format. For example, each training image can be associated with a mask in which each pixel of the mask corresponds to a portion of the input image (e.g., a pixel of the input image), and each pixel of the mask has a value indicative of whether the corresponding portion of the image is more like skin or not skin.
Several datasets for evaluating skin segmentation algorithms exist, including HGR (e.g., described in Kawulok, et al., “Self-adaptive algorithm for segmenting skin regions,” EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 1, p. 170, 2014, which is hereby incorporated by reference herein in its entirety), Pratheepan (e.g., as described in Tan, et al., “A fusion approach for efficient human skin detection,” IEEE Transactions on Industrial Informatics, vol. 8, no. 1, pp. 138-147, 2012, which is hereby incorporated by reference herein in its entirety), and ECU (e.g., Casati, et al., “SFA: A human skin image database based on FERET and AR facial images,” in IX Workshop de Visao Computacional, Rio de Janeiro, 2013, which is hereby incorporated by reference herein in its entirety). HGR includes 1,559 hand images, Pratheepan includes 78 images with the majority of skin pixels from faces and hands, and ECU includes 1,118 images with the majority of skin pixels from faces and hands. However, abdominal skin pixels were manually segmented from 30 abdominal images, and abdominal skin was shown to have different RGB, HSV, and YCbCr color pixel distributions compared to skin from the HGR and ECU datasets, suggesting that an abdominal dataset encompasses supplementary information on skin features. Due to this gap in existing datasets, adding abdominal skin samples can potentially improve the accuracy of wound, lesion, and/or cancerous region detection, especially if located on an abdomen.
In some embodiments, the process 800 can use images from a dataset of 1,400 abdomen images retrieved from a Google images search, which were subsequently manually segmented. This set of 1,400 images is sometimes referred to herein as the Abdominal skin dataset. The selection and cropping of the images was performed to match the camera's field of observation depicted in
In the images of the data set of 1,400 abdominal images, skin pixels correspond to 66% of the entire pixel data, with a mean of 54.42% per individual image, and a corresponding standard deviation of 15%. As shown
In order to form a holistic skin segmentation model, training data from the abdomen as well as other face and hand datasets can be used, in some embodiments. In some embodiments, at 802, the process 800 can use training images from the Abdominal skin dataset and images from other skin datasets, such as TDSD, ECU, Schmugge, SFA, HGR, etc. In some embodiments, images included in the training data can be restricted to datasets that are relatively diverse (e.g., in terms of uncontrolled background and lighting conditions). For example, of the five datasets described above, HGR includes the most diverse set of images in terms of uncontrolled background and lighting conditions. Accordingly, in some embodiments, at 802, the process 800 can use training images from the Abdominal skin dataset and images from the HGR dataset. In some embodiments, the process 800 can divide the training data using an 80-20% training-validation split to generate validation data.
In some embodiments, the process 800 can use any suitable testing data to evaluate the performance of an automated skin segmentation model during training. For example, a test dataset can include images from various datasets, such as the Abdominal skin dataset, Pratheepan, and ECU. Pratheepan and ECU are established and widely used testing datasets, which were selected to provide results that can be used to compare the performance of an automated skin segmentation model trained in accordance with some embodiments of the disclosed subject matter to existing skin segmentation techniques. In a more particular example, the test dataset can include 200 images from the Abdominal skin dataset that were not included in the training or validation dataset (e.g., 70 from the light lean category, 30 from light obese, 70 from dark lean, and 30 from dark obese). The testing data was selected to attempt to obtain an even evaluation of the various segmentation models across different ethnic groups.
In some embodiments, the process 800 can train any suitable type of machine learning model as a skin segmentation model. For example, in some embodiments, the process 800 can train a convolutional network as a skin segmentation model. In general, convolutional networks can be trained to extract relevant features on a highly detailed level, and to compute optimum filters tailored specifically for a particular task. For example, U-Net-based convolutional neural networks are often used for segmenting grayscale images, such as CT and MRI images. An example of a U-Net-based convolutional neural network is described in Ronneberger, et al., “U-net: Convolutional networks for biomedical image segmentation,” International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234-241, which is hereby incorporated by reference herein in its entirety.
In some embodiments, the process 800 can train a U-Net-based architecture using 128×128 pixel images and three color channels (e.g., R, G, and B channels). For example, such a U-Net-based architecture can include a ReLU activation function in all layers except the output layer, which can use a sigmoid activation function.
In some embodiments, the process 800 can initialize the weights of the U-Net-based model using samples drawn from a normal distribution centered at zero with a standard deviation of √(2/s), where s is the size of the input tensor. Additionally, the process 800 can fine-tune any suitable parameters for abdominal skin segmentation. For example, the process 800 can use an Adam-based optimizer, which showed superior loss-convergence characteristics compared to stochastic gradient descent, which showed signs of early loss stagnation. As another example, the process 800 can use a learning rate of about 1e-3. As yet another example, the process 800 can use a batch size of 64. In testing, the model loss did not converge with a higher batch size, and the model overfitted to the training data with a smaller batch size. The model converged within 82 epochs using an Adam optimizer, a learning rate of 1e-3, and a batch size of 64.
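The training configuration described above can be sketched with Keras as follows; the builder function build_unet is assumed to exist and return a U-Net-style model over 128×128×3 inputs with ReLU hidden layers and a sigmoid output, and the binary cross-entropy loss and early-stopping callback are assumptions for the example.

import tensorflow as tf

def train_skin_segmentation_model(build_unet, train_ds, val_ds):
    # build_unet is assumed to use He-normal initialization (standard deviation sqrt(2/s)),
    # e.g., kernel_initializer="he_normal" in its convolutional layers.
    model = build_unet(input_shape=(128, 128, 3))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    # The batch size of 64 is assumed to be applied when constructing train_ds / val_ds.
    history = model.fit(train_ds,
                        validation_data=val_ds,
                        epochs=100,  # convergence was observed within 82 epochs above
                        callbacks=[tf.keras.callbacks.EarlyStopping(
                            patience=10, restore_best_weights=True)])
    return model, history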
Additionally or alternatively, in some embodiments, the process 800 can train a Mask-RCNN-based architecture. For example, Mask-RCNN can be described as an extension of Faster-RCNN, whereby a fully convolutional network branch is added to each region of interest to predict segmentation masks. Mask-RCNN can be well suited to abdominal skin segmentation because of its use of region proposals, which represents a different learning approach in terms of feature recognition. Additionally, Mask-RCNN makes use of instance segmentation, which can potentially classify different skin regions as being part of the abdomen, hand, face, etc. In some embodiments, the process 800 can train an architecture similar to the architecture implemented in W. Abdulla, “Mask r-cnn for object detection and instance segmentation on keras and tensorflow,” 2017 (e.g., available via github(dot)com/matterport/mask_RCNN), which is hereby incorporated by reference herein in its entirety, and can use COCO weight initialization. In such embodiments, the process 800 can convert the training image datasets into COCO format. A smaller resnet50 backbone can be used because it can achieve faster loss convergence. The anchor ratios can be set to [0.5, 1, 2], and the anchor scales to [8, 16, 32, 64, 128]. Using the COCO pre-initialized weights, the network can first be trained for 128 epochs with a learning rate of 1e-4 in order to adapt the weights to the skin dataset(s). The resultant weights can then be used to build a final model, trained using a stochastic gradient descent technique with a learning rate of 1e-3 over 128 epochs. In an example implementation of Mask-RCNN, the selected learning rate converged to an overall loss of 0.52, a 3.6-fold improvement over a learning rate on the order of 1e-4.
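A configuration sketch in the style of the Matterport Mask R-CNN package referenced above is shown below; the attribute names follow that package's Config class, while the dataset objects (dataset_train, dataset_val), the class count, and the layer-selection strategy are assumptions for the example.

from mrcnn.config import Config
from mrcnn import model as modellib

class SkinConfig(Config):
    NAME = "skin"
    NUM_CLASSES = 1 + 1                    # background + skin (assumed)
    BACKBONE = "resnet50"                  # smaller backbone for faster loss convergence
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)
    LEARNING_RATE = 1e-4                   # initial rate used to adapt the COCO weights

def train_skin_mask_rcnn(dataset_train, dataset_val, coco_weights_path="mask_rcnn_coco.h5"):
    config = SkinConfig()
    model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")
    model.load_weights(coco_weights_path, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
    # Adapt the COCO weights to the skin dataset(s) at 1e-4, then fine-tune at 1e-3;
    # in this package the epochs argument is cumulative across calls to train().
    model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
                epochs=128, layers="heads")
    model.train(dataset_train, dataset_val, learning_rate=1e-3,
                epochs=256, layers="all")
    return model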
At 804, the process 800 can provide an image to be segmented to the trained automated skin segmentation model. In some embodiments, the process 800 can format the input image to match the format of the images used to train the automated skin segmentation model. For example, input images can be formatted as 128×128 pixel images to be input to a U-Net-based automated skin segmentation model.
At 806, the process 800 can receive an output from the trained automated skin segmentation model indicating which portion or portions of the input image have been identified as corresponding to skin (e.g., abdominal skin), and which portion or portions of the input image have been identified as corresponding to non-skin. In some embodiments, the output can be provided in any suitable format. For example, the output can be provided as a mask that includes an array of pixels in which each pixel of the mask corresponds to a portion of the input image (e.g., a pixel of the input image), and each pixel of the mask has a value indicative of whether the corresponding portion of the image is more like skin or not skin.
At 808, the process 800 can label regions of an image or images as skin and/or non-skin based on the output of the automated skin segmentation model received at 806. In some embodiments, the labeled image can be an original version (or other higher resolution) of the image input to the automated skin segmentation model at 804, prior to formatting for input into the model.
In some embodiments, one or more images that include regions labeled as skin and/or non-skin can be used in any suitable application. For example, in some embodiments, the process 800 can provide labeled images to the computing device 106, the mobile platform 110, and/or the robotic system 114. In such an example, the labeled images can be used to determine portions of a patient that can be scanned using an ultrasound probe (e.g., ultrasound probe 118) and/or to determine portions of the patient that are to be avoided because the regions do not correspond to skin.
The performance of each segmentation model was evaluated based on four image segmentation metrics: accuracy, precision, recall, and F-measure. The formulas for each metric can be represented using the following relationships, in which TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative pixels, respectively:

Accuracy = (TP+TN)/(TP+TN+FP+FN)

Precision = TP/(TP+FP)

Recall = TP/(TP+FN)

F-measure = 2·(Precision·Recall)/(Precision+Recall)
These metrics can provide context for the performance of the networks that is not readily established from accuracy measurements alone, and they allow the performance of the skin segmentation models described herein to be compared to other existing techniques.
Mask-RCNN yielded a comparatively lower accuracy of 87.01%, which can be attributed to the limited ability of the region proposal algorithm to adapt to the high variability of skin areas in the images. These areas can range from small patches, such as fingers, to almost the entire image, as is the case with some abdomen images. The variation in the 2D shape of skin regions can also negatively impact the network's performance; the presence of clothing, or any other occlusion caused by non-skin items, results in a different skin shape as perceived by the algorithm.
The fully connected network, which was implicitly designed for finding the optimum decision boundaries for thresholding skin color in RGB, HSV, and YCbCr colorspaces, surpassed the fixed thresholding technique by 6.45%. However, unlike the CNN-based segmentation models, the fully connected network does not account for any spatial relation between the pixels or even textural information, and hence cannot be deemed reliable enough as a stand-alone skin segmentation technique. Due to limitations imposed by the fixed threshold values, neither thresholding nor the fully connected network would be able to produce acceptable segmentation masks with differently colored skin pixels. The maximum accuracy obtained on the Abdomen test set was 86.71% for the Fully Connected Network.
Additionally, to assess the improvement resulting from the addition of the Abdominal dataset into the training set, all three networks were trained both with and without the abdomen images in two separate instances. As shown in
The thresholding results were generated using a thresholding technique that explicitly delimits boundaries on skin pixel values in pre-determined colorspaces. In general, RGB is not recommended as a stand-alone colorspace, given the high correlation between the R, G, and B values and their dependency on environmental settings such as lighting. Accordingly, HSV, which has the advantage of being invariant to white light sources, and YCbCr, which separates the luminance (Y) from the chrominance (Cb and Cr), were additionally considered. The decision boundaries were manually optimized for the Abdominal skin dataset, and the final masks used to generate the results shown in
RGB=(R>95)∧(G>40)∧(B>20)∧(R>G)∧(R>B)∧(|R−G|>15)
YCBCR=(Cr>135)∧(Cb>85)∧(Y>80)∧[Cr≤(1.5862Cb+20)]
HSV=(H>0.8)∥(H<0.2)
Final Mask=RGB∥(HSV∧YCbCr)
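The decision rules above can be combined directly with OpenCV and NumPy, as in the following sketch; the input is assumed to be an 8-bit RGB image, and the hue channel is rescaled from OpenCV's [0, 179] range to [0, 1] so that the HSV rule can be applied as written.

import cv2
import numpy as np

def threshold_skin_mask(rgb):
    rgb = rgb.astype(np.uint8)
    R = rgb[..., 0].astype(int)
    G = rgb[..., 1].astype(int)
    B = rgb[..., 2].astype(int)
    rgb_rule = (R > 95) & (G > 40) & (B > 20) & (R > G) & (R > B) & (np.abs(R - G) > 15)

    ycrcb = cv2.cvtColor(rgb, cv2.COLOR_RGB2YCrCb).astype(int)
    Y, Cr, Cb = ycrcb[..., 0], ycrcb[..., 1], ycrcb[..., 2]
    ycbcr_rule = (Cr > 135) & (Cb > 85) & (Y > 80) & (Cr <= 1.5862 * Cb + 20)

    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV).astype(float)
    H = hsv[..., 0] / 179.0
    hsv_rule = (H > 0.8) | (H < 0.2)

    # Final Mask = RGB OR (HSV AND YCbCr)
    return rgb_rule | (hsv_rule & ycbcr_rule)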
Note that due to the high dimensional combinatorial aspect of fine-tuning 11 parameters for thresholding (e.g., see the relationships described immediately above in connection with thresholding), the chances of manually determining the optimal values of each parameter are small. The fully connected network (sometimes referred to herein as a fully connected feature network, or features network) results were generated using a fully connected network designed to determine the most suitable decision boundary. The fully connected network included 7 hidden layers with 32, 64, 128, 256, 128, 64, and 32 neurons, respectively. The input layer included 9 neurons, corresponding to the pixel values extracted from the colorspaces as [R, G, B, H, S, V, Y, Cb, Cr], which can be referred to as “features.” The output layer included one neuron for binary pixel classification. The input and hidden layers were followed by a dropout layer each, with a dropout percentage increasing from 10% to 30% towards the end of the network to avoid overfitting. ReLU activation functions were used throughout the network, and the output neuron was activated by a sigmoid. The optimizer used was a momentum stochastic gradient descent (SGD), with a learning rate of 3e-4, a decay of 1e-6, and a momentum of 0.9. The optimizer and corresponding hyperparameters resulted in the best performing features network. The network was trained for 50 epochs.
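A Keras sketch of the fully connected features network described above is shown below; the exact dropout schedule between 10% and 30%, the loss function, and the upstream extraction of the 9-element [R, G, B, H, S, V, Y, Cb, Cr] feature vectors are assumptions for the example (the 1e-6 learning-rate decay mentioned above can be added via a learning-rate schedule).

import tensorflow as tf
from tensorflow.keras import layers, models

def build_features_network():
    model = models.Sequential()
    model.add(layers.Input(shape=(9,)))   # [R, G, B, H, S, V, Y, Cb, Cr] per pixel
    model.add(layers.Dropout(0.10))       # dropout after the input layer
    hidden_units = [32, 64, 128, 256, 128, 64, 32]
    dropout_rates = [0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.30]  # assumed schedule
    for units, rate in zip(hidden_units, dropout_rates):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(rate))
    model.add(layers.Dense(1, activation="sigmoid"))  # binary skin / non-skin output
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=3e-4, momentum=0.9),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage with hypothetical per-pixel features and labels:
# model = build_features_network()
# model.fit(pixel_features, pixel_labels, epochs=50)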
The precision and recall for both U-Net and Mask-RCNN over Pratheepan and ECU were higher than the results reported by state-of-the-art networks, such as the image-based Network-in-Network (NIN) configurations described in Kim, et al., “Convolutional neural networks and training strategies for skin detection,” in 2017 IEEE International Conference on Image Processing (ICIP), IEEE, 2017, pp. 3919-3923, and the FCN described in Ma, et al., “Human Skin Segmentation Using Fully Convolutional Neural Networks,” in 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE), IEEE, 2018, pp. 168-170, both of which are hereby incorporated by reference herein in their entireties. This improvement over existing state-of-the-art networks shows that U-Net and Mask-RCNN networks implemented in accordance with some embodiments of the disclosed subject matter are able to correctly classify almost all of the skin pixels, with only a small percentage of non-skin regions incorrectly labeled as skin.
It should be understood that the above described steps of the processes described herein can be executed or performed in any order or sequence not limited to the order and sequence shown in and described in connection with the figures. Also, some of the above steps of the processes can be executed or performed substantially simultaneously where appropriate, or in parallel, to reduce latency and processing times.
In some embodiments, aspects of the present disclosure, including computerized implementations of methods, can be implemented as a system, method, apparatus, or article of manufacture using standard programming or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a processor device, a computer (e.g., a processor device operatively coupled to a memory), or another electronically operated controller to implement aspects detailed herein. Accordingly, for example, embodiments of the invention can be implemented as a set of instructions, tangibly embodied on a non-transitory computer-readable medium, such that a processor device can implement the instructions based upon reading the instructions from the computer-readable medium. Some embodiments of the invention can include (or utilize) a device such as an automation device, a special purpose or general purpose computer including various computer hardware, software, firmware, and so on, consistent with the discussion below.
The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier (e.g., non-transitory signals), or media (e.g., non-transitory media). For example, computer-readable media can include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips, and so on), optical disks (e.g., compact disk (CD), digital versatile disk (DVD), and so on), smart cards, and flash memory devices (e.g., card, stick, and so on). Additionally, it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Those skilled in the art will recognize that many modifications may be made to these configurations without departing from the scope or spirit of the claimed subject matter.
Certain operations of methods according to the invention, or of systems executing those methods, may be represented schematically in the FIGS. or otherwise discussed herein. Unless otherwise specified or limited, representation in the FIGS. of particular operations in a particular spatial order may not necessarily require those operations to be executed in a particular sequence corresponding to the particular spatial order. Correspondingly, certain operations represented in the FIGS., or otherwise disclosed herein, can be executed in different orders than are expressly illustrated or described, as appropriate for particular embodiments of the invention. Further, in some embodiments, certain operations can be executed in parallel, including by dedicated parallel processing devices, or by separate computing devices configured to interoperate as part of a larger system.
As used herein in the context of computer implementation, unless otherwise specified or limited, the terms “component,” “system,” “module,” etc. are intended to encompass part or all of computer-related systems that include hardware, software, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a processor device, a process being executed (or executable) by a processor device, an object, an executable, a thread of execution, a computer program, or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components (or systems, modules, and so on) may reside within a process or thread of execution, may be localized on one computer, may be distributed between two or more computers or other processor devices, or may be included within another component (or system, module, and so on).
As used herein, the terms “controller” and “processor” include any device capable of executing a computer program, or any device that can include logic gates configured to execute the described functionality. For example, this may include a processor, a microcontroller, a field-programmable gate array, a programmable logic controller, etc.
The discussion herein is presented for a person skilled in the art to make and use embodiments of the invention. Various modifications to the illustrated embodiments will be readily apparent to those skilled in the art, and the generic principles herein can be applied to other embodiments and applications without departing from embodiments of the invention. Thus, embodiments of the invention are not intended to be limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein. The detailed description is to be read with reference to the figures, in which like elements in different figures have like reference numerals. The figures, which are not necessarily to scale, depict selected embodiments and are not intended to limit the scope of embodiments of the invention. Skilled artisans will recognize that the examples provided herein have many useful alternatives that fall within the scope of embodiments of the invention.
Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention, which is limited only by the claims that follow. Features of the disclosed embodiments can be combined and rearranged in various ways.
This application is based on, claims the benefit of, and claims priority to U.S. Provisional Application No. 62/779,306, filed Dec. 13, 2018, which is hereby incorporated by reference herein in its entirety for all purposes.