Some enterprises implement services for generating transcripts of conversations. For example, automatic speech recognition (ASR) may be used to generate transcripts. Also, some enterprises provide natural language processing (NLP) services. However, general purpose ASR and NLP systems may not function well for medical conversations, for example, due to specialized terms used in the medical industry. Also, NLP services may provide acceptable results when asked to perform discrete low-level tasks, but may provide low quality results when asked to perform higher-level tasks such as generating an overall summary of a medical conversation.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. The drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
This specification includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
“Comprising.” This term is open-ended. As used in the claims, this term does not foreclose additional structure or steps. Consider a claim that recites: “An apparatus comprising one or more processor units . . . .” Such a claim does not foreclose the apparatus from including additional components.
“Configured To.” Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, “configured to” is used to connote structure by indicating that the units/components include structure that performs the task or tasks during operation. As such, the unit/component can be said to be configured to perform the task even when the specified unit/component is not currently operational (e.g., is not on). The units/components used with the “configured to” language include hardware, for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112, paragraph (f), for that unit/component. Additionally, “configured to” can include generic structure that is manipulated by software or firmware to operate in a manner that is capable of performing the task(s) at issue.
“Based On” or “Dependent On.” As used herein, these terms are used to describe one or more factors that affect a determination. These terms do not foreclose additional factors that may affect a determination. That is, a determination may be solely based on those factors or based, at least in part, on those factors. Consider the phrase “determine A based on B.” While in this case, B is a factor that affects the determination of A, such a phrase does not foreclose the determination of A from also being based on C. In other instances, A may be determined based solely on B.
“Or.” When used in the claims, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof.
It will also be understood that, although the terms 1, 2, N, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a component with the term 1 could be termed a second component, and, similarly, a component with the term 2 could be termed a first component, without departing from the scope of the present invention. The first component and the second component are both components, but they are not the same component. Also, the term N indicates that an Nth amount of the elements may or may not exist depending on the embodiments.
The burden of documenting clinical visits is one of the largest sources of inefficiency in healthcare. Physicians often spend considerable time navigating different tabs, fields, and drop-downs in electronic health record (EHR) systems to capture details such as medications, allergies, and medical conditions. Physicians also make short-hand notes during the consultation on topics such as a patient's history of illness or clinical assessment, and enter their summarized notes in the EHR systems after the visit, often during off-peak hours. Even with the help of scribes, creating clinical documentation and summaries can be time consuming and inefficient. Training a machine learning model for generating transcripts and summaries of medical conversations requires a range of resources, which adds cost and complexity.
Additionally, current machine learning models are not well suited to the nuanced tasks of generating summaries of medical conversations. For example, the large number of variables involved in generating a summary of a medical conversation and the specialized terms used in the medical industry may cause inaccurate results when using current machine learning models. Also, due to the importance of accuracy in medical records, a very low (or zero) error rate may be required in medical summaries. For example, mis-stating a drug dosage in a summary and using such information subsequently may lead to poor patient outcomes. Thus, a highly accurate medical audio summarization service is needed.
To address these issues and/or other issues, in some embodiments, a system may provide a HIPAA-eligible conversational intelligence capability trained to understand patient-physician conversations across diverse medical specialties. In order to overcome accuracy problems of current machine learning models, a medical transcription engine may be trained using medical training data that includes annotated medical entities that the medical transcription engine is to be trained to detect. Also, a medical natural language processing engine may be trained with annotated versions of medical transcripts generated by the medical transcription engine. Additionally, instead of using a single (or shared) machine learning model to perform various tasks involved in summary generation from a transcript, different tasks involved in generating a summary may be separated out into discrete tasks of a workflow performed by a medical natural language processing engine. Additionally, discrete machine learning models may be trained to perform respective ones of the discrete tasks of the workflow, wherein the discrete machine learning models are trained to perform a narrow task and are also trained using specialized training data (that has been annotated), wherein the specialized training data is based on outputs generated by a preceding task in the workflow, wherein the preceding task uses its own discrete machine learning model specially trained to perform a narrow task involved in the preceding task in the workflow. In this way, highly accurate medical transcripts may be generated with a high-level of confidence. For example, a single machine learning model that has been trained to go directly from an input transcript to an output summary may make errors in the summary, such as attributing phrases spoken by a patient to be marked as if they were spoken by a physician, incorrectly stating a drug name or dosage, etc.
However, when separated into discrete tasks and integrated together via shared data that builds on a preceding task, a medical natural language processing engine comprising multiple specially trained machine learning models may be configured to identify phrase attribution with very low error, and identify medical entities, such as drug names and dosages, with high accuracy, as well as perform other summarization tasks with a high-level of accuracy and confidence.
In some embodiments, a medical audio summarization service may receive a request to generate a transcript and a summary of a medical conversation, with a medical conversation job packet to be summarized, including audio data and meta data of the medical conversation. In some embodiments, the transcript may be generated via a medical transcription service based on the audio data from the medical conversation job packet. In some embodiments, the transcript may be generated while the medical conversation is occurring. The transcription service may receive audio data from the medical conversation and may begin to generate the transcript while continuing to receive audio data from the same medical conversation. The transcript and meta data may be provided to a medical natural language processing service to generate a summary of the medical conversation using the transcript.
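The incremental behavior described above, in which the transcript starts forming while audio is still arriving, can be illustrated with a generator. This is only a minimal sketch: `stream_transcript` and `fake_recognizer` are hypothetical names introduced here, and the recognizer is a stand-in that treats each audio chunk as if it already held text.

```python
from typing import Callable, Iterable, Iterator, List


def stream_transcript(
    audio_chunks: Iterable[bytes],
    recognize_chunk: Callable[[bytes], str],
) -> Iterator[str]:
    """Yield the transcript so far after each audio chunk is recognized."""
    parts: List[str] = []
    for chunk in audio_chunks:
        text = recognize_chunk(chunk)
        if text:
            parts.append(text)
        # A partial transcript is available before the audio has ended.
        yield " ".join(parts)


def fake_recognizer(chunk: bytes) -> str:
    # Hypothetical recognizer: pretends each chunk already holds UTF-8 text.
    return chunk.decode("utf-8").strip()


chunks = [b"I have had a cough ", b"for two weeks"]
partials = list(stream_transcript(chunks, fake_recognizer))
```

Each element of `partials` is a progressively longer transcript, so a consumer could display or store intermediate results without waiting for the conversation to finish.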
To generate the summary of the medical conversation, a plurality of specialized machine learning models may be implemented to perform discrete tasks, such as identifying medical entities and speaker roles in the transcript, determining sections of the transcript corresponding to the summary, extracting phrases for the summary, and/or abstracting phrases for the summary. In some embodiments, medical entities including but not limited to medical terms for medicines and diseases may be identified in the transcript using a first machine learning model. Using a second machine learning model, speaker roles, such as physician and patient, may be identified. Portions of the transcript that correspond to subject matter of sections for the summary may be determined using a third machine learning model. The third machine learning model or a fourth machine learning model may be implemented to extract phrases from the sections for the summary or to abstract phrases for the summary. In some embodiments, abstracting phrases may be performed by paraphrasing from the sections of the transcript. The summary may then be generated using the identified medical entities, identified speaker roles, determined sections, and extracted/abstracted phrases.
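The division into discrete tasks can be sketched as a chain of small functions, each adding one kind of result to a shared document. The four stage functions below are toy rule-based stand-ins for the first through fourth machine learning models, and every name, rule, and the tiny vocabulary are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]

KNOWN_ENTITIES = {"ibuprofen", "migraine"}  # toy vocabulary (assumption)


def detect_entities(doc: Document) -> Document:
    # Stand-in for the first model: flag words found in the toy vocabulary.
    found = []
    for turn in doc["turns"]:
        for word in turn["text"].lower().replace(".", "").split():
            if word in KNOWN_ENTITIES:
                found.append(word)
    return {**doc, "entities": found}


def identify_roles(doc: Document) -> Document:
    # Stand-in for the second model: toy rule, first speaker is the physician.
    first = doc["turns"][0]["speaker"]
    roles = {
        t["speaker"]: "physician" if t["speaker"] == first else "patient"
        for t in doc["turns"]
    }
    return {**doc, "roles": roles}


def assign_sections(doc: Document) -> Document:
    # Stand-in for the third model: patient turns -> HISTORY, others -> PLAN.
    sections: Dict[str, List[int]] = {"HISTORY": [], "PLAN": []}
    for i, turn in enumerate(doc["turns"]):
        role = doc["roles"][turn["speaker"]]
        sections["HISTORY" if role == "patient" else "PLAN"].append(i)
    return {**doc, "sections": sections}


def extract_phrases(doc: Document) -> Document:
    # Stand-in for the third or fourth model: keep turns mentioning an entity.
    phrases = {}
    for name, indexes in doc["sections"].items():
        texts = [doc["turns"][i]["text"] for i in indexes]
        phrases[name] = [
            t for t in texts if any(e in t.lower() for e in doc["entities"])
        ]
    return {**doc, "phrases": phrases}


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage sees everything the earlier stages produced.
    for stage in stages:
        doc = stage(doc)
    return doc


transcript = {"turns": [
    {"speaker": "spk_0", "text": "What brings you in today?"},
    {"speaker": "spk_1", "text": "I get a migraine every week."},
    {"speaker": "spk_0", "text": "Take ibuprofen when it starts."},
]}
result = run_pipeline(
    transcript,
    [detect_entities, identify_roles, assign_sections, extract_phrases],
)
```

Because each stage only reads keys written by earlier stages, a stage can be replaced by a trained model without changing the others.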
In some embodiments, the respective machine learning models may be used in different orders, but may be trained in whichever order the machine learning models are to be used. For example, in some embodiments, speaker role identification may be performed before medical entity identification, but in such a case, the medical entity identification model may be trained using training data that is output from the speaker role identification task. In other embodiments, medical entity identification may be performed prior to speaker role identification, in which case the speaker role identification model may be trained using training data that is output from the medical entity identification task.
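The dependency between usage order and training data can be made concrete with a small helper that, for a chosen task order, lists which annotation layers are already present in the transcripts fed to each model during training. The function name and task labels are hypothetical and serve only to illustrate the ordering rule.

```python
from typing import Dict, List


def training_layers(order: List[str]) -> Dict[str, List[str]]:
    """Map each task to the annotation layers its training inputs carry."""
    layers: Dict[str, List[str]] = {}
    seen: List[str] = []
    for task in order:
        # A model is trained on transcripts annotated by every earlier task.
        layers[task] = list(seen)
        seen.append(task)
    return layers


entities_first = training_layers(
    ["medical_entities", "speaker_roles", "sections"]
)
roles_first = training_layers(
    ["speaker_roles", "medical_entities", "sections"]
)
```

With entities first, the role model trains on entity-annotated transcripts; with roles first, the entity model trains on role-annotated transcripts, mirroring the two orderings described above.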
In some embodiments, a notification report indicating the generation of the summary document may be provided. An application programmatic interface (API) may be implemented for providing the summary for upload to an electronic health record service.
In some embodiments, the transcript may be merged with the results of a preceding model before being used for a future model. For example, a merged transcript including identified medical entities may be the transcript used for identifying the speaker roles with the second machine learning model. In some embodiments, the machine learning models may be trained based on merged transcripts comprising results from previously trained machine learning models. For example, if the first and second machine learning models are trained before the third machine learning model, the third machine learning model may be trained with a transcript including identified medical entities and speaker roles.
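One possible representation of a merged transcript is the original text plus named layers of character-offset annotations, where each merge returns a new object and leaves the earlier version untouched. The `MergedTranscript` class, the `merge` function, and the span layout are assumptions for this sketch, not a format taken from the description.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


@dataclass(frozen=True)
class MergedTranscript:
    text: str
    layers: Dict[str, List[Span]] = field(default_factory=dict)


def merge(
    transcript: MergedTranscript, layer_name: str, spans: List[Span]
) -> MergedTranscript:
    """Return a new transcript that also carries `spans` under `layer_name`."""
    if layer_name in transcript.layers:
        raise ValueError(f"layer {layer_name!r} already merged")
    for start, end, _label in spans:
        if not 0 <= start < end <= len(transcript.text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
    new_layers = dict(transcript.layers)
    new_layers[layer_name] = sorted(spans)
    return replace(transcript, layers=new_layers)


text = "Doctor: Any allergies? Patient: Penicillin."
start = text.index("Penicillin")
patient_at = text.index("Patient")

base = MergedTranscript(text)
with_entities = merge(
    base,
    "medical_entities",
    [(start, start + len("Penicillin"), "MEDICATION")],
)
with_roles = merge(
    with_entities,
    "speaker_roles",
    [(0, patient_at, "physician"), (patient_at, len(text), "patient")],
)
```

Here `with_entities` is what a role identification step would receive, and `with_roles` is what a later sectioning step would receive.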
In some embodiments, customer preferences may be uploaded to a customer interface to update the machine learning model of the transcription service and the machine learning models for generating the summary. For example, a physician may require a specific summary template and may upload the template to the customer interface to further train the machine learning models. In another example, a physician may upload their own training data including annotated transcripts for training the machine learning models. For example, physicians practicing in different specialized fields may desire to have models trained using terms specific to their specialties. In some embodiments, a medical audio summarization service may maintain specialty specific models regardless of whether or not a physician provides practice specific training data.
As will be appreciated by those skilled in the art, features of the system disclosed herein may be implemented in computer systems to solve technical problems in the state of the art and to improve the functioning of the computer systems. For example, as discussed above, and as discussed in more detail below, such features of the system improve medical conversation transcript and summary generating in a way that provides higher accuracy than prior approaches. This may be achieved, at least in part, by dividing the summarization process into discrete tasks and training discrete machine learning models in an integrated fashion to perform the respective discrete tasks. Such features also improve the functioning of the computer system by requiring less computational resources than conventional machine learning models. For example, computational resources required to perform the discrete tasks may sum to be significantly less than the computational resources required to perform summarization using a single model with significantly more variables. These and other features and advantages of the disclosed system are discussed in further detail below, in connection with the figures.
In some embodiments, medical audio summarization is performed, such as by a medical audio summarization service 100, and may resemble embodiments as shown in
In some embodiments, a medical natural language processing engine 122 may receive notification of a job request to generate a summary and may also receive the transcript needed for the job request via a transcript retrieval interface 124. Notification of the job request and the transcript may be provided to a control plane 126 for the medical natural language processing engine 122 and the job request and transcript may be provided to a job queue 128. A work flow processing engine 130 may be instantiated by the control plane 126 and may receive the job request and the transcript from the job queue 128. The work flow processing engine 130 may then invoke machine learning models such as a medical entity detection model 132 to identify medical entities, a role identification model 134 to identify speaker roles, and a summarization module 140 including a sectioning model 136 to determine sections for the summary, and an extraction/abstraction model 138 to extract or abstract phrases for the summary. The work flow processing engine 130 may then generate the summary based on the results from the invocation of the machine learning models. For example, a computing instance instantiated as a workflow processing engine 130 may access respective ones of the models to perform discrete tasks, such as medical entity detection, role identification, sectioning, extraction, and abstraction. The workflow processing engine 130 may merge results from each task into a current version of the transcript that is being updated as the discrete tasks are performed. The currently updated (and merged) version of the transcript may be used as an input to perform respective ones of the subsequent discrete tasks.
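The hand-off from control plane to job queue to workflow processing engine can be sketched with Python's standard `queue` module. The `SummaryJob` and `ControlPlane` classes and the `summarize` callable are hypothetical stand-ins; a real deployment would likely use a durable, distributed queue rather than an in-process one.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SummaryJob:
    job_id: str
    transcript: str


class ControlPlane:
    """Toy control plane: accepts jobs and drains them with one worker."""

    def __init__(self) -> None:
        self.job_queue: "queue.Queue[SummaryJob]" = queue.Queue()

    def submit(self, job: SummaryJob) -> None:
        self.job_queue.put(job)

    def run_worker(self, summarize: Callable[[str], str]) -> Dict[str, str]:
        # Stand-in for an instantiated workflow processing engine.
        results: Dict[str, str] = {}
        while True:
            try:
                job = self.job_queue.get_nowait()
            except queue.Empty:
                break
            results[job.job_id] = summarize(job.transcript)
            self.job_queue.task_done()
        return results


plane = ControlPlane()
plane.submit(SummaryJob("job-1", "Patient reports a cough."))
plane.submit(SummaryJob("job-2", "Patient reports knee pain."))
results = plane.run_worker(lambda transcript: transcript.upper())
```

Jobs are processed in the order they were queued, and the placeholder `summarize` callable marks where the model invocations would take place.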
For example, in some embodiments, the work flow processing engine 130 may merge the results from a task performed using a prior model with the transcript and use the merged transcript to determine results for a task that uses the next model. For example, a workflow worker instance of the work flow processing engine 130 may invoke a medical entity detection model 132 to identify medical entities in a transcript. The results may then be merged with the transcript to include the original transcript with the identified medical entities. The workflow worker instance may then invoke the role identification model 134 to identify speaker roles in the merged transcript. The identified speaker role results may then be merged with the merged transcript to include the identified medical entities and identified speaker roles. The workflow worker instance may invoke the sectioning model 136 to determine portions of the merged transcript corresponding to subject matter of sections for the summary document. The section results may then be merged with the merged transcript to include the identified medical entities, the identified speaker roles, and the determined sections. The workflow worker instance may invoke the extraction/abstraction model 138 to extract and/or abstract phrases in the merged transcript for the summary.
In some embodiments, the models may be invoked in a different order. For example, the role identification model 134 may be invoked first, causing the identified speaker role results to be merged with the transcript. The medical entity detection model 132 may then be invoked, meaning the workflow worker instance may use the transcript merged with the identified speaker role results to determine the medical entities in the merged transcript. In some embodiments, the information gained from use of the previous model may be used for the next model. For example, if the role identification model 134 is performed using a merged transcript including identified medical entities, the role identification model 134 may identify the physician as the speaker who used the most technical medical entities.
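The idea that previously identified medical entities can inform role identification can be illustrated with a simple counting rule: the speaker who uses the most entity terms is guessed to be the physician. This is a hand-written heuristic standing in for a trained role identification model, and the function name and sample vocabulary are assumptions.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple


def guess_physician(
    turns: List[Tuple[str, str]], entity_terms: Set[str]
) -> Optional[str]:
    """Return the speaker with the most entity mentions, or None if unclear."""
    counts: Counter = Counter()
    for speaker, text in turns:
        words = (w.strip(".,?!") for w in text.lower().split())
        counts[speaker] += sum(1 for w in words if w in entity_terms)
    ranked = counts.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None  # no entity mentions at all
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: the heuristic cannot decide
    return ranked[0][0]


entities = {"headache", "migraine", "sumatriptan", "gastritis"}
turns = [
    ("spk_0", "How long have you had the headache?"),
    ("spk_1", "About a week, and my stomach hurts."),
    ("spk_0", "This looks like migraine, so I will prescribe sumatriptan "
              "and check for gastritis."),
]
physician = guess_physician(turns, entities)
```

Returning `None` on a tie or when no entities are found keeps the sketch from asserting a role it has no evidence for.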
In some embodiments, a model training coordinator 142 may be used for training the machine learning models with labeled training data, such as annotated transcripts. For example, training is further discussed in detail in regard to
Once the summary is generated, the work flow processing engine 130 may provide the generated summary to an output interface 144. The output interface 144 may notify the customer of the completed job request. In some embodiments, the output interface may provide a notification of a completed job to the output API 146. In some embodiments, the output API 146 may be implemented to provide the summary for upload to an electronic health record or may push the summary out to an electronic health record, in response to a notification of a completed job.
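The notify-then-push behavior of the output side can be sketched as a small publish/subscribe object: one listener records a customer notification and another writes the summary to a stand-in record store. `OutputInterface`, the listener functions, and the in-memory `ehr_records` dictionary are all hypothetical and do not represent any actual electronic health record API.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, str], None]


class OutputInterface:
    """Toy output interface that fans out completed jobs to listeners."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def on_job_completed(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def publish(self, job_id: str, summary: str) -> None:
        for listener in self._listeners:
            listener(job_id, summary)


notifications: List[str] = []
ehr_records: Dict[str, str] = {}  # stand-in for an EHR system


def notify_customer(job_id: str, summary: str) -> None:
    notifications.append(f"job {job_id} completed")


def push_to_ehr(job_id: str, summary: str) -> None:
    # Hypothetical upload step, triggered by the completion notification.
    ehr_records[job_id] = summary


output = OutputInterface()
output.on_job_completed(notify_customer)
output.on_job_completed(push_to_ehr)
output.publish("job-1", "Patient reports weekly migraines.")
```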
Some embodiments, such as shown in
The extraction model 138A may be a machine learning model used to extract phrases from the transcript for the summary. The abstraction model 138B may be a machine learning model utilized to abstract phrases or sections from the transcript for the summary. In some embodiments, the extraction model 138A may be used first and the abstraction model 138B may be utilized with results from the extraction model 138A to abstract the extracted phrases. In some embodiments, the abstraction model 138B may be utilized without the results from the extraction model 138A and may analyze the sections of the transcript to paraphrase sections for the summary. The evidence linkage model 138C may be a generative model used to link either the extracted or abstracted phrases from the generated summary to the transcript based on confidence values. The higher the confidence value, the stronger the likelihood that the extracted or abstracted phrase from the generated summary is related to a specific portion of the transcript. Thus, the evidence linkage model 138C may be used to ensure the quality of the generated summary by requiring the confidence values to be higher than a determined threshold.
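The thresholding step can be illustrated by scoring each summary phrase against every transcript sentence and keeping a link only when the best score clears a threshold. Note the substitution: the description refers to a generative model producing confidence values, whereas this sketch uses plain token overlap (Jaccard similarity) as a stand-in score; the function names and the 0.5 threshold are assumptions.

```python
import re
from typing import List, Optional, Set, Tuple


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link_evidence(
    phrase: str, transcript_sentences: List[str], threshold: float = 0.5
) -> Optional[Tuple[int, float]]:
    """Return (sentence index, confidence) for the best match, or None."""
    phrase_tokens = _tokens(phrase)
    best: Optional[Tuple[int, float]] = None
    for i, sentence in enumerate(transcript_sentences):
        sentence_tokens = _tokens(sentence)
        union = phrase_tokens | sentence_tokens
        # Jaccard overlap stands in for a model-produced confidence value.
        score = len(phrase_tokens & sentence_tokens) / len(union) if union else 0.0
        if best is None or score > best[1]:
            best = (i, score)
    if best is None or best[1] < threshold:
        return None  # no sentence supports the phrase strongly enough
    return best


sentences = [
    "I take ibuprofen 200 mg twice a day.",
    "My knee hurts when I climb stairs.",
]
supported = link_evidence("Patient takes ibuprofen 200 mg twice a day", sentences)
unsupported = link_evidence("Patient has a fever", sentences)
```

A phrase for which `link_evidence` returns `None` has no sufficiently confident source in the transcript and could be flagged for review or dropped from the summary.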
Some embodiments, such as shown in
In various embodiments, the medical audio summarization service 100 may implement interface(s) 211 to allow clients (e.g., client(s) 250 or clients implemented internally within provider network 200, such as a client application hosted on another provider network service like an event driven code execution service or virtual compute service) to interact with the medical audio summarization service 100. The interface(s) 211 may be one or more of graphical user interfaces, programmatic interfaces that implement Application Program Interfaces (APIs) and/or command line interfaces, such as input interface 102, customer interface 108, and/or output interface 144, for example as shown in
In at least some embodiments, workflow processing engine(s) 130 may be implemented on servers 231 to initiate tasks for a medical transcription engine 110 and a medical natural language processing engine 122. The workload distribution 234, comprising one or more computing devices, may be responsible for selecting the particular server 231 in execution fleet 230 that is to be used to implement a workflow engine to be used to perform a given job. The medical audio summarization service 100 may implement control plane(s) 220 to perform various control operations to implement the features of medical audio summarization service 100, such as control plane 112 and control plane 126 in
The medical audio summarization service 100 may utilize machine learning resources 240. The machine learning resources 240 may include parameter tuning model 244 and models 242 such as the medical entity detection model 132, the role identification model 134, the sectioning model 136, and the extraction/abstraction model 138, for example as shown in
Generally speaking, clients 250 may encompass any type of client that can submit network-based requests to provider network 200 via network 260, including requests for the medical audio summarization service 100 (e.g., a request to generate a transcript and summary of a medical conversation). For example, a given client 250 may include a suitable version of a web browser, or may include a plug-in module or other type of code module that can execute as an extension to or within an execution environment provided by a web browser.
In some embodiments, a client 250 may provide access to provider network 200 to other applications in a manner that is transparent to those applications. Clients 250 may convey network-based services requests (e.g., requests to interact with services like medical audio summarization service 100) via network 260, in some embodiments. In various embodiments, network 260 may encompass any suitable combination of networking hardware and protocols necessary to establish network-based communications between clients 250 and provider network 200. For example, network 260 may generally encompass the various telecommunications networks and service providers that collectively implement the Internet. Network 260 may also include private networks such as local area networks (LANs) or wide area networks (WANs) as well as public or private wireless networks, in one embodiment. For example, both a given client 250 and provider network 200 may be respectively provisioned within enterprises having their own internal networks. In such an embodiment, network 260 may include the hardware (e.g., modems, routers, switches, load balancers, proxy servers, etc.) and software (e.g., protocol stacks, accounting software, firewall/security software, etc.) necessary to establish a networking link between the given client 250 and the Internet as well as between the Internet and provider network 200. It is noted that in some embodiments, clients 250 may communicate with provider network 200 using a private network rather than the public Internet.
In some embodiments, a process for generating a transcript and a summary of a medical conversation may resemble a process such as that which is shown in
Some embodiments, such as shown in
In block 400, a notification of a medical conversation transcript to be processed may be received. For example, the transcript retrieval interface 124 shown in
Some embodiments, such as shown in
In block 500, the merged medical conversation transcript (from block 408) may be submitted by the workflow worker instance to the role identification model. In block 502, the results from the role identification model may be received. An example of results includes the identified physician and patient roles bolded in the transcript of the medical conversation shown in
Some embodiments, such as shown in
In block 600, the merged medical conversation transcript (from block 504) may be submitted by the workflow worker instance to the sectioning model. In block 602, the results from the sectioning model may be received. An example of results includes the determined sections shown by labeled bracketed sections in the transcript of the medical conversation in
Some embodiments, such as shown in
In block 700, the merged medical conversation transcript (from block 604) may be submitted by the workflow worker instance to the extraction/abstraction model. In block 702, the results from the extraction/abstraction model may be received. Results may include important phrases to be included in the generated summary as shown by underlined phrases and short descriptions of the phrases in
Some embodiments, such as shown in
Some embodiments, such as shown in
Transcript C 908 may represent a transcript annotated with medical entities and speaker roles. To train a role identification model such as the role identification model 134 in
Transcript D 914 may represent a transcript annotated with medical entities, speaker roles, and determined sections. To train a sectioning model such as the sectioning model 136 in
Transcript E 920 may represent a transcript annotated with medical entities, speaker roles, determined sections, and extracted phrases. To train an extraction/abstraction model such as the extraction/abstraction model 138 in
In some embodiments, a process for generating a transcript and a summary of a medical conversation may resemble a process such as that which is shown in
Blocks 1021, 1022, 1023, 1024, and 1025 may further describe block 1020. In block 1021, the transcript and the meta data may be accessed from the medical transcription service. In block 1022, medical entities in the transcript may be identified using a first machine learning model. In block 1023, speaker roles in the transcript, comprising at least a patient and a physician, may be identified using a second machine learning model. In block 1024, portions of the transcript corresponding to subject matter of sections for the summary document may be determined using a third machine learning model. In block 1025, phrases may be extracted and/or abstracted, using the third or a fourth machine learning model, from the sections for the summary document. In block 1030, the summary document comprising the extracted or abstracted phrases may be provided. In block 1040, a notification report indicating generation of the summary document may be provided.
In at least some embodiments, a server that implements a portion or all of one or more of the technologies described herein, including the techniques for performing medical audio summarizing, may include a general-purpose computer system that includes or is configured to access one or more computer-accessible media.
In various embodiments, computing device 1100 may be a uniprocessor system including one processor 1102, or a multiprocessor system including several processors 1102 (e.g., two, four, eight, or another suitable number). Processors 1102 may be any suitable processors capable of executing instructions. For example, in various embodiments, processors 1102 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In multiprocessor systems, each of processors 1102 may commonly, but not necessarily, implement the same ISA. In some implementations, graphics processing units (GPUs) may be used instead of, or in addition to, conventional processors.
System memory 1110 may be configured to store instructions and data accessible by processor(s) 1102. In at least some embodiments, the system memory 1110 may comprise both volatile and non-volatile portions; in other embodiments, only volatile memory may be used. In various embodiments, the volatile portion of system memory 1110 may be implemented using any suitable memory technology, such as static random-access memory (SRAM), synchronous dynamic RAM or any other type of memory. For the non-volatile portion of system memory (which may comprise one or more NVDIMMs, for example), in some embodiments flash-based memory devices, including NAND-flash devices, may be used. In at least some embodiments, the non-volatile portion of the system memory may include a power source, such as a supercapacitor or other power storage device (e.g., a battery).
In various embodiments, memristor based resistive random-access memory (ReRAM), three-dimensional NAND technologies, Ferroelectric RAM, magnetoresistive RAM (MRAM), or any of various types of phase change memory (PCM) may be used at least for the non-volatile portion of system memory. In the illustrated embodiment, program instructions and data implementing one or more desired functions, such as those methods, techniques, and data described above, are shown stored within system memory 1110 as program instructions for medical audio summarizing 1112 and medical audio summarizing data 1114. For example, program instructions for medical audio summarizing 1112 may include program instructions for implementing a medical audio summarization service, such as medical audio summarization service 100 illustrated in
In one embodiment, I/O interface 1108 may be configured to coordinate I/O traffic between processor 1102, system memory 1110, and any peripheral devices in the device, including network interface 1116 or other peripheral interfaces such as various types of persistent and/or volatile storage devices. In some embodiments, I/O interface 1108 may perform any necessary protocol, timing, or other data transformations to convert data signals from one component (e.g., system memory 1110) into a format suitable for use by another component (e.g., processor 1102).
In some embodiments, I/O interface 1108 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example. In some embodiments, the function of I/O interface 1108 may be split into two or more separate components, such as a north bridge and a south bridge, for example. Also, in some embodiments some or all of the functionality of I/O interface 1108, such as an interface to system memory 1110, may be incorporated directly into processor 1102.
Network interface 1116 may be configured to allow data to be exchanged between computing device 1100 and other devices 1120 attached to a network or networks 1118, such as other computer systems or devices as illustrated in
In some embodiments, system memory 1110 may be one embodiment of a computer-accessible medium configured to store program instructions and data as described above for
In some embodiments, a plurality of non-transitory computer-readable storage media may collectively store program instructions that when executed on or across one or more processors implement at least a subset of the methods and techniques described above. A computer-accessible medium may include transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 1116.
Portions or all of multiple computing devices such as that illustrated in
The various methods as illustrated in the figures and described herein represent example embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of the methods may be changed, and various elements may be added, reordered, combined, omitted, modified, etc.
Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended that the invention encompass all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.