The present invention relates generally to systems and methods for assessing surgical skill. More particularly, the present invention relates to systems and methods for using videos of the surgical field and context-specific quantitative metrics to automate the assessment of surgical skill in an operating room.
Surgery continuously evolves through new techniques, procedures, and technologies. Throughout their careers, surgeons acquire skill by learning new techniques and mastering known techniques. Prior to board certification, their learning is supported by an experienced supervisor during residency and fellowship. However, surgeons perform a small fraction of the total procedures in their career during residency training. Furthermore, learning in the operating room, despite being essential to acquire surgical skill, is limited by ad hoc teaching opportunities that compete with patient care. Once surgeons start practice, they lose routine access to specific feedback that helps them improve how they operate.
In one example, cataract surgery is the definitive intervention for vision loss due to cataract. Cataract surgery may result in distinct patient benefits including a reduced risk of death, falls, and motor vehicle accidents. An estimated 6353 cataract surgery procedures per million individuals are performed in the United States each year. Nearly 2.3 million procedures were performed in 2014 among Medicare beneficiaries alone. About 50 million Americans are expected to require cataract surgery by 2050.
Even with a common surgical procedure, such as cataract surgery, patient outcomes improve with surgeons' experience. Compared with surgeons who have performed more than 1000 cataract procedures, the estimated risk of adverse events is 2-, 4-, and 8-fold higher for surgeons who have performed 500 to 1000 procedures, 251 to 500 procedures, and fewer than 250 procedures, respectively. High complication rates are 9 times more likely for surgeons in their first year of independent practice than for those in their tenth. Furthermore, each year of independent practice reduces this risk of complication by about 10%. Academic settings are similar: the risk of complications when residents operate under supervision is higher for novice faculty than for experienced faculty. Continuing technical development may improve the quality of surgical care and outcomes, but surgeons lack structured resources during training and accessible resources after entering independent practice.
The status quo of providing surgeons with patient outcomes or subjective skill assessments is insufficient because most surgeons cannot readily translate such information into specific ways to improve how they operate. Current alternatives for continuous feedback for surgeons include subjective crowdsourced assessments and surgical coaching, either through direct observation in the operating room or through video review. Despite evidence of effectiveness, surgical coaching is limited by barriers including lack of time, limited access to qualified coaches, concerns about judgment by peers, and a sense of loss of autonomy. Therefore, it would be beneficial to have improved systems and methods for using context-specific quantitative metrics to automate the assessment of surgical skill in an operating room.
A method for determining or assessing a surgical skill is disclosed. The method includes determining one or more metrics of a surgical task being performed by a surgeon based at least partially upon a type of the surgical task being performed and a video of the surgical task being performed. The method also includes determining a surgical skill of the surgeon during the surgical task based at least partially upon the video, the one or more metrics, or a combination thereof.
In another embodiment, a method for determining a surgical skill of a surgeon during a surgical task is disclosed. The method includes capturing a video of a surgical task being performed by a surgeon. The method also includes segmenting the surgical task into a plurality of segments. The method also includes marking one or more portions in the video. The one or more marked portions include a hand of the surgeon, an instrument that the surgeon is using to perform the surgical task, an anatomy on which the surgical task is being performed, or a combination thereof. The method also includes determining one or more metrics of the surgical task based at least partially upon a type of the surgical task being performed, one or more of the segments, and the one or more marked portions. The one or more metrics describe movement of the instrument, an appearance of the anatomy, a change in the anatomy, an interaction between the instrument and the anatomy, or a combination thereof. The method also includes determining a surgical skill of the surgeon during the surgical task based at least partially upon the one or more metrics. The method may also include providing feedback about the surgical skill.
A system for determining a surgical skill of a surgeon during a surgical task is also disclosed. The system includes a computing system having one or more processors and a memory system. The memory system includes one or more non-transitory computer-readable media storing instructions that, when executed by at least one of the one or more processors, cause the computing system to perform operations. The operations include receiving a video of a surgical task being performed by a surgeon. The operations also include segmenting the surgical task into a plurality of segments. The operations also include marking one or more portions in the video. The one or more marked portions include a hand of the surgeon, an instrument that the surgeon is using to perform the surgical task, an anatomy on which the surgical task is being performed, or a combination thereof. The operations also include determining one or more metrics of the surgical task based at least partially upon a type of the surgical task being performed, one or more of the segments, and the one or more marked portions. The one or more metrics describe movement of the instrument, an appearance of the anatomy, a change in the anatomy, an interaction between the instrument and the anatomy, or a combination thereof. The operations also include determining a surgical skill of the surgeon during the surgical task based at least partially upon the one or more metrics. The operations also include providing feedback about the surgical skill.
The accompanying drawings provide visual representations, which will be used to more fully describe the representative embodiments disclosed herein and can be used by those skilled in the art to better understand them and their inherent advantages. In these drawings, like reference numerals identify corresponding elements.
The presently disclosed subject matter now will be described more fully hereinafter with reference to the accompanying Drawings, in which some, but not all embodiments of the inventions are shown. Like numbers refer to like elements throughout. The presently disclosed subject matter may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Indeed, many modifications and other embodiments of the presently disclosed subject matter set forth herein will come to mind to one skilled in the art to which the presently disclosed subject matter pertains having the benefit of the teachings presented in the foregoing descriptions and the associated Drawings. Therefore, it is to be understood that the presently disclosed subject matter is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims.
The present disclosure is directed to systems and methods for quantitative assessment of surgical skill using videos of the surgical field, based upon metrics that pertain to specific aspects of a given surgical procedure. More particularly, quantitative metrics that specifically describe different aspects of how a surgical task is performed may be determined. The metrics may be identified using textbooks, teachings by surgeons, etc. The metrics may be specific to the surgical context in a given scenario. The metrics may be described or defined in terms of objects in the surgical field (e.g., in a simulation and/or in an operating room). The objects may be or include the instruments used to perform the surgery, the anatomy of the patient, and specific interactions between the instruments and the anatomy that are observed during a surgery. The metrics may then be extracted using data from the surgical field. A subset of the extracted metrics may be selected to determine or predict skill. A skill assessment may then be generated based upon the subset. Because the metrics are specific to the task or activity being performed, they may translate into measurable changes in performance that surgeons can target during their learning.
The systems and methods described herein may develop and/or store a library of surgical videos, intuitively displayed on a dashboard on a computing system. This may allow a surgeon to watch the video of the full surgical task or one or more selected steps thereof. The system and method may also generate an unbiased, objective assessment of the surgeon's skill for target steps, and review pertinent examples with feedback on how to improve the surgeon's performance. The platform functionalities may be enabled and automated by machine learning (ML) techniques. These functionalities may include extraction of targeted segments of a surgical task, assessment of surgical skills for the extracted segments, identification of appropriate feedback, and relaying of the assessment and feedback to the surgeon.
The method 100 may also include performing a surgical task, as at 102. In one example, the surgical task may be or include at least a portion of a capsulorhexis procedure, and the following description of the method 100 is described using this example. However, as will be appreciated, the method 100 may be applied to any surgical task. In another example, the surgical task may be or include at least a portion of a trabeculectomy procedure or a prostatectomy procedure. As used herein, a “surgical task” refers to at least a portion of a “surgical procedure.”
The method 100 may also include capturing a video of the surgical task being performed, as at 104. This may also or instead include capturing a video of at least a portion of the full surgical procedure including the surgical task.
The method 100 may include segmenting the surgical task (e.g., into different portions), as at 106. This may also or instead include segmenting the surgical procedure (e.g., into different surgical tasks).
As mentioned above, in this particular example, the surgical task 300 that is segmented is at least a part of a capsulorhexis procedure. A capsulorhexis procedure is used to remove a membrane (e.g., the lens capsule) 310 from the eye during cataract surgery by shear and stretch forces. More particularly, during a capsulorhexis procedure, a surgeon may use one or more instruments (e.g., forceps) to hold the lens capsule 310 and tear it in discrete movements to create a round, smooth, and continuous aperture to access the underlying lens. For example, the instrument may be inserted into/through the lens capsule 310 at an insertion point 320 and used to tear the lens capsule 310 into four segments/portions: a subincisional quadrant 331, a postincisional quadrant 332, a supraincisional quadrant 333, and a preincisional quadrant 334. The subincisional quadrant 331 may be defined by a first tear line 341 and a second tear line 342. The postincisional quadrant 332 may be defined by the second tear line 342 and a third tear line 343. The supraincisional quadrant 333 may be defined by the third tear line 343 and a fourth tear line 344. The preincisional quadrant 334 may be defined by the fourth tear line 344 and the first tear line 341.
The method 100 may also include marking the video, as at 108. This may include marking (also referred to as localizing) the hand of the surgeon 210 that is performing the surgical task. This may also include marking an instrument or other element that is visible or hypothesized in the video and that is used (e.g., by the surgeon 210) to perform the surgical task. The hand, the instrument, or both may be referred to as an effector. This may also or instead include marking the anatomy (e.g., the appearance and/or change of the anatomy) of the patient 220 on which the surgical task is being performed (e.g., the lens capsule 310).
The portions 411-414 may be marked one or more times in the video. In one example, the portions 411-414 may be marked in each segment of the video. In another example, the portions 411-414 may be marked in each frame 400 of the video. In one example, coordinate points of the marked instrument tips 411, 412 may be standardized so that the middle of the marked insertion sites 413, 414 may be set as the origin in each marked frame. This may help to account for potential movement of the camera. However, other techniques may also or instead be used to account for movement of the camera.
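As an illustration, this standardization may be implemented as a simple per-frame coordinate shift. The following is a minimal Python sketch, assuming the marked coordinates are available as NumPy arrays; the function name and array layout are illustrative only:

```python
import numpy as np

def standardize_tips(tip_a, tip_b, insertion_a, insertion_b):
    """Re-express the marked instrument-tip coordinates relative to the
    midpoint of the two marked insertion sites, which serves as the
    origin of each marked frame (helping account for camera movement).

    All arguments are (N, 2) arrays of per-frame (x, y) pixel coordinates.
    """
    origin = (insertion_a + insertion_b) / 2.0  # midpoint of insertion sites
    return tip_a - origin, tip_b - origin
```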
In one embodiment, the portions 411-414 may be marked manually (e.g., using crowdsourcing). In another embodiment, the portions 411-414 may be marked automatically using an algorithm (e.g., a high-resolution net algorithm). For example, the algorithm may be able to predict the locations of the portions 411-414 (e.g., when the locations are not visible in the video). In yet another embodiment, step 108 (i.e., marking the video) may be omitted.
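High-resolution networks for keypoint localization typically regress one heat map per keypoint. The sketch below shows only the decoding step from such heat maps to pixel coordinates, assuming the network itself is already trained; it is an illustrative simplification, not the reference implementation:

```python
import numpy as np

def decode_keypoints(heatmaps):
    """Decode (K, H, W) per-keypoint heat maps into K (x, y) pixel
    locations by taking the argmax of each map, with the peak value
    serving as a confidence score."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)       # flattened location of each peak
    conf = flat.max(axis=1)         # peak height as confidence
    ys, xs = np.unravel_index(idx, (H, W))
    return np.stack([xs, ys], axis=1), conf
```

A low confidence value could, for example, flag frames in which a tip is occluded, so that the predicted location stands in for an unobserved one.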
The method 100 may also include determining one or more metrics of the surgical task, as at 110. The metrics may be based at least partially upon unmarked videos (from 104), the segments of the task (from 106), marked videos (from 108), or a combination thereof. The metrics may be measured in one or more frames (e.g., each frame 400) of the video, between two or more (e.g., consecutive) frames of the video, or a combination thereof. The metrics may be or include context-specific metrics for the particular surgical task (e.g., capsulorhexis procedure). In other words, each type of surgical task may have a different set of metrics. For example, the metrics may describe the movement of the anatomy (e.g., the lens capsule 310), the movement of the instrument 410, the interaction between the anatomy and the instrument 410, or a combination thereof.
In one embodiment, the metrics may be measured/determined manually in the video (e.g., using crowdsourcing). For example, a user (e.g., a surgeon) watching the video (or viewing the frames of the video) may measure/determine the metrics in one or more frames of the video based at least partially upon the marked portions 411-414. In another embodiment, the metrics may be measured/determined automatically in the video. For example, one or more artificial neural networks (ANNs) may measure/determine the metrics in one or more frames of the video (e.g., based at least partially upon the marked portions 411-414). In one embodiment, the ANN may be trained to determine the metrics using a library of videos of similar surgical tasks (e.g., capsulorhexis procedures). The metrics may have been previously determined in the videos in the library.
Each type of surgical task may have different metrics. Illustrative metrics for the particular surgical task (e.g., capsulorhexis procedure) described above may include when the instrument 410 is grasping the tissue/membrane (e.g., the lens capsule 310) and when the instrument 410 is tearing the lens capsule 310. The proximity of the tips 411, 412 of the instrument 410 may be used to determine when the instrument 410 is grasping and/or tearing. The distance between the marked tips 411, 412 may be measured/determined in one or more frames (e.g., each frame) of the video. In one embodiment, the tips 411, 412 of the instrument 410 may be defined as touching when the space between them is less than the sum of the mode (e.g., most frequent value) of the distance between the tips 411, 412 and the standard deviation of these values. This may be referred to as the touch distance threshold. The touch distance threshold may be verified manually through visual comparison with the video. The marked tips 411, 412 may be determined to be grasping the tissue/membrane (e.g., lens capsule 310) in response to a predetermined number of consecutive frames (e.g., two consecutive frames) of the video in which the marked tips 411, 412 are determined to be touching. Tears may be treated as a subset of grasps. For example, the instrument 410 may be determined to be tearing the tissue/membrane (e.g., lens capsule 310) in response to (1) the displacement of the instrument 410 during the grasp being greater than the touch distance threshold; and/or (2) the grasp lasting for longer than a predetermined period of time (e.g., 1 second).
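A minimal sketch of this grasp/tear logic follows, assuming per-frame tip coordinates as NumPy arrays. Rounding the distances before taking the mode, and using the tip midpoint as a proxy for the instrument's displacement, are assumptions made for illustration:

```python
import numpy as np
from scipy import stats

def detect_grasps_and_tears(tip_a, tip_b, fps, min_grasp_frames=2,
                            tear_min_duration_s=1.0):
    """Detect grasps and tears from per-frame tip coordinates ((N, 2)
    arrays) using the thresholds described above."""
    gap = np.linalg.norm(tip_a - tip_b, axis=1)     # tip separation per frame
    # Touch threshold: mode of the (rounded) tip distance plus one
    # standard deviation; rounding before the mode is an assumption.
    touch_thresh = stats.mode(np.round(gap), keepdims=False).mode + gap.std()
    touching = gap < touch_thresh
    midpoint = (tip_a + tip_b) / 2.0                # proxy for instrument position

    grasps, tears = [], []
    start = None
    for i, t in enumerate(np.append(touching, False)):  # sentinel ends last run
        if t and start is None:
            start = i
        elif not t and start is not None:
            if i - start >= min_grasp_frames:       # sustained touch = grasp
                grasps.append((start, i))
                displacement = np.linalg.norm(midpoint[i - 1] - midpoint[start])
                duration = (i - start) / fps
                # A tear is a grasp that moves farther than the touch
                # threshold or lasts longer than the minimum duration.
                if displacement > touch_thresh or duration > tear_min_duration_s:
                    tears.append((start, i))
            start = None
    return grasps, tears
```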
Additional metrics may include: the eye that was operated on (e.g., left or right), the location of the incision to access the eye, the direction of flap propagation, the area of the convex hull, the circularity of the convex hull, the total number of grasp movements, the total number of tears, the number of tears placed into quadrants, the average and standard deviation of tear distance (e.g., in pixels), the average and standard deviation of tear duration (e.g., in seconds), the average and standard deviation of retear distance (e.g., in pixels), the average and standard deviation of retear duration (e.g., in seconds), the average and/or standard deviation of the length of the tool within the eye (e.g., in pixels), the distance traveled to complete each quadrant (e.g., in pixels), the average and/or standard deviation of the changes in the angle relative to the insertion point for each quadrant (e.g., in degrees), the total change in the angle relative to the insertion point for each quadrant (e.g., in degrees, denoted Δθ1 through Δθ4 for the four quadrants), the difference between Δθ1 and Δθ2 as well as between Δθ3 and Δθ4 (e.g., in degrees), the number of tears placed in each quadrant, the average distance of each tear per quadrant (e.g., in pixels), the average duration of each tear per quadrant (e.g., in seconds), the average length of the tool within the eye per quadrant (e.g., in pixels), or a combination thereof. Table 1 below provides additional details about these metrics.
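For instance, two of these metrics — the area and circularity of the convex hull (e.g., of the tear boundary points) — may be computed as in the sketch below; the use of SciPy and the exact choice of point set are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_area_and_circularity(tear_points):
    """Compute the area and circularity of the convex hull of a set of
    2-D tear points. Circularity is 4*pi*area / perimeter**2, which
    equals 1.0 for a perfect circle."""
    hull = ConvexHull(np.asarray(tear_points))
    area = hull.volume       # for 2-D inputs, .volume is the enclosed area
    perimeter = hull.area    # and .area is the perimeter
    return area, 4.0 * np.pi * area / perimeter ** 2
```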
The method 100 may also include categorizing the one or more metrics into one or more categories, as at 112. This may be a sub-step of 110. In one embodiment, the metrics may be categorized manually (e.g., using user/expert input). In another embodiment, the metrics may be categorized automatically. For example, the ANN may categorize the metrics. In one embodiment, the ANN may be trained to categorize the metrics using the library of videos of similar surgical tasks where the metrics have been previously categorized.
Each type of surgical task may have different categories. Illustrative categories for the particular surgical task (e.g., capsulorhexis step) 200 described above may include: metrics that span the entire video and are unrelated to the quadrants; all of the metrics that are related to the quadrants; quadrant-specific metrics divided into each respective quadrant; all of the metrics that characterize grasps and/or tears, including quadrant-specific metrics; quadrant-specific metrics characterizing grasps and/or tears; and all metrics relating to the position, distance, and/or angle of the tips 411, 412 of the instrument 410. Table 2 below provides additional details about these categories.
The method 100 may also include determining (also referred to as assessing) a surgical skill (e.g., of a surgeon) during the surgical task, as at 114. The surgical skill may be determined based at least partially (or entirely) upon the unmarked video (from 104), the segments of the task (from 106), the marked portions 411-414 (from 108), the metrics (from 110), the categories (from 112), or a combination thereof. The determined surgical skill may be in the form of a score (e.g., on a scale from 0-100). More particularly, the score may be a continuous scale of surgical skill spanning from poor skill (e.g., novice) to superior skill (e.g., expert). In one embodiment, for capsulorhexis, the score may include two items with each item having a value of either 2 (e.g., novice), 3 (e.g., beginner), 4 (e.g., advanced beginner) or 5 (e.g., expert). In one embodiment, the surgical skill may be assessed in real-time (e.g., during the surgical task).
The surgical skill may be determined automatically. More particularly, a decision tree may determine the surgical skill. For example, the decision tree may be trained to select one or more subsets of the segments, the portions 411-414, the metrics, the categories, or a combination thereof, and the surgical skill may be determined therefrom. The decision tree may be trained using the library of videos of similar surgical tasks where the surgical skill has been previously determined. The ANN may also or instead use attention mechanisms/modules to identify segments and/or metrics in the video that may influence the network's determination. The ANN may also or instead be trained to function as a feature extractor from input data including videos, where the resulting metrics are analyzed to achieve one or more functionalities in the platform.
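One plausible realization of the tree-based approach is sketched below with scikit-learn, under the assumption that each video is summarized by a fixed-length vector of the extracted metrics; the ensemble size and subset size are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_skill_classifier(metrics, labels, n_keep=10):
    """Fit a random forest on the metric vectors (an (n_videos,
    n_metrics) array) and keep the most informative subset of metrics,
    mirroring the subset-selection step described above.
    `labels` are e.g. 0 = novice, 1 = expert."""
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    forest.fit(metrics, labels)
    # Rank metrics by impurity-based importance and keep the top n_keep.
    keep = np.argsort(forest.feature_importances_)[::-1][:n_keep]
    refit = RandomForestClassifier(n_estimators=500, random_state=0)
    score = cross_val_score(refit, metrics[:, keep], labels,
                            scoring="roc_auc", cv=5).mean()
    return keep, score
```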
In one embodiment, the surgical skill may be determined using the ANN (e.g., a temporal convolutional network (TCN)) applied to a video partially marked for the instrument tips 411, 412. In another embodiment, the surgical skill may be determined using a convolutional neural network (CNN), with or without a spatial attention module, to transform the unmarked video (e.g., frames) into a feature that is then run through a recurrent neural network (RNN), with or without temporal attention module(s). As used herein, a “feature” refers to spatial and temporal patterns in video frames that are extracted through convolutions and other operations within the ANN. In yet another embodiment, the surgical skill may be determined using a multi-task learning framework for training neural networks.
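A minimal PyTorch skeleton of the CNN → spatial attention → RNN variant is sketched below; the layer sizes, the GRU, and the single attention map per frame are illustrative assumptions rather than the reference architecture:

```python
import torch
import torch.nn as nn

class SkillNet(nn.Module):
    """Minimal CNN -> spatial attention -> RNN skeleton for per-video
    skill classification (e.g., novice vs. expert)."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                # per-frame feature extractor
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.Conv2d(feat_dim, 1, 1)    # 1x1 conv -> spatial attention
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, video):                    # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))    # (B*T, C, h, w)
        attn = torch.softmax(self.attn(feats).flatten(1), dim=1)
        attn = attn.view(feats.shape[0], 1, *feats.shape[2:])
        pooled = (feats * attn).sum(dim=(2, 3))  # attention-weighted pooling
        out, _ = self.rnn(pooled.view(B, T, -1))
        # Return per-video logits and the attention maps (for supervision).
        return self.head(out[:, -1]), attn.view(B, T, *feats.shape[2:])
```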
Conventional attention models, including the baseline model, learn attention maps with a task-oriented loss (e.g., cross-entropy loss). As used herein, an “attention map” refers to weights assigned to each pixel in an image. These attention maps, which may be computed within the attention modules mentioned in the previous paragraph, represent a layer of re-weighting or “attending to” the image features. However, without explicit supervision, they may not localize relevant regions in the images. As used herein, “explicit supervision” refers to guiding the network to specific known regions or time windows in the image features. Furthermore, without a large amount of training data, attention mechanisms may assign higher weights to regions having spurious correlations with the target label. To remedy these issues, the system and method herein may explicitly supervise the attention map using specific structured information or cues in the images that are related to the task of surgical skill to improve the accuracy of the model predictions. The structured information may include, for example, instrument tip locations, instrument pose, or specific changes in anatomy or other elements in the surgical field. Thus, in one embodiment, determining the surgical skill (e.g., step 114) may include explicit supervision of the attention map using instrument tip trajectories. In an example, binary trajectory heat maps B_i may be constructed for each frame i, combining the locations of all instrument tips, where s_k,m,n is a binary indicator variable denoting whether instrument tip k is located at pixel coordinates (m, n), for example:

B_i(m, n) = max_k s_k,m,n (Equation 1)
For training, the overall loss function may combine the binary cross-entropy for skill classification L_BCE and the Dice coefficient between the spatial attention map A_i^spatial and the tool-tip heat map B_i:

L_Dice = DL({A_i^spatial | i ∈ [1, N]}, {B_i | i ∈ [1, N]}) (Equation 2)

L = L_BCE + λ·L_Dice (Equation 3)
The weighting factor λ may empirically be set to a number from about 0.1 to about 0.9 (e.g., 0.5). The attention map A_i^spatial may be supervised using the trajectory heat map (which is one example of a structured element relevant for surgical skill) so that the attended image feature vector has greater weight on features around the structured element (the instrument tips).
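A sketch of the heat-map construction and the combined loss of Equations 2 and 3 follows, using a soft Dice term; treating each tip as a single active pixel (rather than, e.g., a dilated blob) and using a single-logit binary classifier are illustrative simplifications:

```python
import torch
import torch.nn.functional as F

def trajectory_heatmap(tip_coords, h, w):
    """Build the binary trajectory heat map B_i for one frame from a
    list of instrument-tip (x, y) pixel coordinates."""
    B = torch.zeros(h, w)
    for x, y in tip_coords:
        B[int(y), int(x)] = 1.0
    return B

def supervised_attention_loss(logits, labels, attn, heatmaps, lam=0.5, eps=1e-6):
    """Combined loss of Equation 3: binary cross-entropy for skill
    classification plus a soft Dice term aligning the (non-negative)
    spatial attention maps with the tool-tip heat maps.

    logits/labels: (B,) tensors; attn/heatmaps: (N, H, W) tensors.
    """
    l_bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    inter = (attn * heatmaps).sum()
    l_dice = 1.0 - (2.0 * inter + eps) / (attn.sum() + heatmaps.sum() + eps)
    return l_bce + lam * l_dice
```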
One or more views (e.g., cross-sectional views) 1220 of the instrument 410 may be determined based at least partially upon the first input 1210. The view(s) 1220 may be determined manually and/or automatically. The view(s) 1220 may be introduced into a first ANN 1230, which may be running a support vector machine (SVM) algorithm. One or more time series 1222 of the instrument 410 may also or instead be determined based at least partially upon the first input 1210. The time series 1222 may be determined manually and/or automatically. The time series 1222 may be introduced into a second ANN 1232, which may be running a recurrent neural network (RNN) algorithm.
One or more spatial features 1224 in the frames of the video may be determined based at least partially upon the second input 1212. The spatial features 1224 may be determined manually or automatically. The spatial features 1224 may be introduced into a third ANN 1234, which may be running a convolutional neural network (CNN) algorithm.
In one embodiment, the time series 1222 and/or the output from the third ANN 1234 may be introduced into a fourth ANN 1236, which may be running an RNN algorithm. The output from the third ANN 1234 may also or instead be introduced into a fifth ANN 1238, which may be running an RNN algorithm. One or more of the ANNs 1230, 1232, 1234, 1236, 1238 may categorize the metrics. Performance of the ANNs may be measured using the area under the receiver-operating characteristic curve (e.g., AUROC or AUC). AUROC may be interpreted as the probability that the algorithm correctly assigns a higher score to the expert video in a randomly drawn pair of expert and novice videos. The AUCs for the ANNs 1230, 1232, 1234, 1236, 1238 are shown at the bottom-left of the graph 1300.
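For example, with scikit-learn, the AUROC of skill scores against expert/novice labels may be computed as follows; the scores shown are made-up values for illustration:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical scores assigned to six videos (label 1 = expert, 0 = novice).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
# AUROC: the probability that a randomly drawn expert video receives a
# higher score than a randomly drawn novice video (8 of the 9 pairs here).
print(roc_auc_score(labels, scores))  # 0.888...
```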
The graph 1300 may be generated as part of step 114 to provide a visual representation of the performance of the algorithm used to determine surgical skill. The ANNs may receive different input data, including (e.g., manually) annotated instrument tips 411, 412 (represented as tool velocity; TV in the graph 1300).
Table 3 below illustrates results from an illustrative algorithm (e.g., a random forest algorithm) determining the surgical skill based upon the one or more metrics. As used in the table, “positive predictive value” refers to the probability that a video determined to be by an expert is actually by an expert. As used in the table, “negative predictive value” refers to the probability that a video determined to be by a novice is actually by a novice. As used in the table, “quadrant-specific” refers to metrics computed using data from one quadrant or segment of capsulorhexis, as described above.
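These two quantities follow directly from the confusion matrix, as the short sketch below illustrates with made-up counts:

```python
def ppv_npv(tp, fp, tn, fn):
    """PPV: probability that a video classified as expert is truly by an
    expert. NPV: probability that a video classified as novice is truly
    by a novice."""
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical counts: 40 expert videos correctly flagged, 5 novice videos
# mistakenly flagged as expert, 45 novices correctly flagged, 10 experts missed.
print(ppv_npv(tp=40, fp=5, tn=45, fn=10))  # (0.888..., 0.818...)
```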
The method 100 may also include providing feedback about the surgical skill, as at 116. The feedback may be determined and provided based at least partially upon the unmarked video (from 104), the segments of the task (from 106), the marked portions 411-414 (from 108), the metrics (from 110), the categories (from 112), the determined skill (from 114), or a combination thereof. The feedback may be targeted to a specific part of the surgical task (e.g., a particular segment). In one embodiment, the feedback may be provided in real-time (e.g., during the surgical task).
The feedback may be determined and provided automatically. More particularly, the ANN may determine and provide the feedback. The ANN may be trained using the library of videos of similar surgical tasks where the metrics and surgical skill have been previously determined. The feedback may be in the form of audio feedback, video feedback, written/text feedback, or a combination thereof.
The method 100 may also include predicting the surgical skill (e.g., of the surgeon) during a future task, as at 118. The surgical skill may be predicted based at least partially upon the unmarked video (from 104), the segments of the task (from 106), the marked portions 411-414 (from 108), the metrics (from 110), the categories (from 112), the determined skill (from 114), the feedback (from 116), or a combination thereof. The future task may be the same type of surgical task (e.g., a capsulorhexis procedure) or a different type of surgical task (e.g., a prostatectomy procedure).
Thus, the systems and methods described herein may use videos of the surgical task as input to a software solution to provide surgeons with information to support their learning. The solution includes a front end to interface with surgeons, whereby they upload videos of surgical tasks 200 they perform and receive/view objective assessments of surgical skill and specific feedback on how they can improve. On the back end, the software includes multiple algorithms that provide the functionalities in the platform. For example, when a surgeon uploads a video of a cataract surgery procedure, one implementation of an ANN extracts video for the capsulorhexis step, and additional implementations of ANNs predict a skill rating for capsulorhexis and specific feedback on how the surgeon can improve his/her performance. An additional element may include providing surgeons with narrative feedback. This feedback can effectively support surgeons' learning and improvement in skill.
A processor can include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device.
The storage media 1406A can be implemented as one or more computer-readable or machine-readable storage media. Note that while in the example embodiment the storage media 1406A are depicted as being within the computing system, in some embodiments the storage media 1406A may be distributed within and/or across multiple internal and/or external enclosures of the computing system and/or additional computing systems.
In some embodiments, computing system 1400 contains one or more fine scale surgical assessment module(s) 1408 which may be used to perform at least a portion of the method 100. It should be appreciated that computing system 1400 is only one example of a computing system, and that computing system 1400 may have more or fewer components than shown, may combine additional components not depicted in the example embodiment, and/or may have a different configuration or arrangement of the components depicted.
The many features and advantages of the invention are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the invention which fall within the true spirit and scope of the invention. Further, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
This patent application is the national stage entry of International Patent Application No. PCT/US2022/021258, filed on Mar. 22, 2022, and published as WO 2022/204083 A1 on Sep. 29, 2022, which claims the benefit of U.S. Provisional Patent Application Ser. No. 63/165,862, filed Mar. 25, 2021, both of which are hereby incorporated by reference herein in their entireties.
This invention was made with Government support under EY033065, awarded by the National Institutes of Health. The Government has certain rights in the invention.