Embodiments of the present disclosure relate to audio recognition including natural language processing. More particularly, a personal artificial intelligence assistant is used to build a model for learning and tracking a patient's speech and language patterns to detect changes in speech patterns and other indicative language information to determine changes in the patient's overall cognitive health and health condition.
Various technologies exist that allow for a person to call for assistance when in distress, but require the person to activate a call response (e.g., press one or more buttons) or rely on vital sign monitoring (e.g., an electrocardiogram) to trigger a call-worthy condition before assistance is requested. However, some conditions, such as gradual cognitive decline, occur over long periods of time and may not present as recognized emergency situations. Accordingly, various vulnerable persons need non-intrusive technologies that allow for passive monitoring and assistance.
Certain embodiments provide a method that includes at a first time, capturing, via an artificial intelligence (AI) assistant device, first audio from an environment, detecting first utterances from the first audio for a patient, and adding the first utterances to a language tracking model. The method also includes at a second time, capturing, via the AI assistant device, second audio from the environment, detecting second utterances from the second audio for the patient, detecting, via a language tracking engine provided by the AI assistant device, the language tracking model, and the second utterances, a condition change indicating a condition of the patient has changed from the first time to the second time, and generating a condition notification comprising the condition change. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Other embodiments provide a non-transitory computer-readable storage medium including computer-readable program code that when executed using one or more computer processors, performs an operation. The operation includes at a first time, capturing, via an artificial intelligence (AI) assistant device, first audio from an environment, detecting first utterances from the first audio for a patient, and adding the first utterances to a language tracking model. The operation also includes at a second time, capturing, via the AI assistant device, second audio from the environment, detecting second utterances from the second audio for the patient, detecting, via a language tracking engine provided by the AI assistant device, the language tracking model, and the second utterances, a condition change indicating a condition of the patient has changed from the first time to the second time, and generating a condition notification comprising the condition change.
Other embodiments provide an artificial intelligence (AI) assistant device. The AI assistant device includes one or more computer processors and a memory containing a program which, when executed by the one or more computer processors, performs an operation. The operation includes at a first time, capturing, via the AI assistant device, first audio from an environment, detecting first utterances from the first audio for a patient, and adding the first utterances to a language tracking model. The operation also includes at a second time, capturing, via the AI assistant device, second audio from the environment, detecting second utterances from the second audio for the patient, detecting, via a language tracking engine provided by the AI assistant device, the language tracking model, and the second utterances, a condition change indicating a condition of the patient has changed from the first time to the second time, and generating a condition notification comprising the condition change.
The appended figures depict certain aspects of the one or more embodiments and are therefore not to be considered limiting of the scope of this disclosure.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
Embodiments herein determine when to place, and then place, a passive assistive call using personal artificial intelligence (AI) assistants. Assistant devices, by which various personal AI assistants are deployed, offer several benefits over previous assistive call systems, including the ability to use audio recorded in the environment to passively assess the condition of the persons in the environment over both short-term and long-term time frames. The assistant device may be in communication with various other sensors to enhance or supplement the audio assessment of the persons in the environment, and may be used in a variety of scenarios where prior monitoring and call systems struggled to quickly and accurately identify immediate distress in various monitored persons (e.g., patients) or mounting distress such as gradual health or cognitive decline (e.g., caused by minor strokes and other sources of cognitive decline).
AI assistants provide a bevy of services to their users. These services can include responding to voice-activated requests (e.g., responding via audio to a request for the day's forecast with a local weather prediction), integrating with a human user's calendar, controlling appliances or lights, placing phone calls, or the like. These AI assistants often reside partially on a local device, as a local client, and partially in a back-end service located remotely (e.g., in a cloud server) from the local device. The local client handles data collection, some preprocessing, and data output, while the back-end service may handle speech recognition, natural language processing, and data fetching (e.g., looking up the requested weather forecast).
However, assistant devices, although beneficial in the environment, are active devices that often require the users to speak an utterance with a cue phrase to activate the device, or require active input from the users to perform various tasks. Accordingly, although the assistant device can be unobtrusive, and users may seek to incorporate the assistant devices in various environments, the assistant devices often purposely exclude audio not related to human speech from collection and analysis. In contrast, the present disclosure improves upon the base functionalities of the assistant devices by routinely and actively engaging with a user to determine and set baseline health and cognitive conditions and monitor the user over time to detect any potential changes in the overall condition or health of the user.
Accordingly, the present disclosure provides for improved functionality in assistant devices and devices linked to the assistant devices, improved processing speed, improved data security, and improved outcomes in healthcare (including prophylactic care and improved accuracy in diagnoses and treatments).
In a healthcare context, the persons that the AI assistant device 110 may interact with include patients 120 whose health and well-being are monitored, authorized persons, including authorized person 130, who are currently authorized by the patients 120 to receive health information related to the patient 120 via the AI assistant device 110, and unauthorized persons 140 who are not currently authorized by the patients 120 to receive health information related to the patient 120. In various embodiments, the authorized person 130 and the unauthorized persons 140 may be permitted to interact with the AI assistant device 110 (or denied access to the AI assistant device 110) for non-healthcare related information independently of the permissions granted/denied for receiving health information related to the patient 120. Various other objects 170a-f (generally or collectively, objects 170) may also be present in the environment 100 or otherwise be observable by the AI assistant device 110 including, but not limited to: toilets 170a, sinks 170b, cars 170c, pets 170d, appliances 170e, audio sources 170f (e.g., televisions or radios), etc.
As used herein, a patient 120 may be one of several persons in the environment 100 to whom medical data and personally identifiable information (PII) pertain. Generally, a patient 120 is an authorized user for accessing their own data, and may grant rights for others to also access those data or to grant additional persons the ability to access these data on behalf of the patient 120 (e.g., via medical power of attorney). For example, a patient 120 may grant a personal health assistant, a nurse, a doctor, a trusted relative, or other person (herein, a provider) the ability to access medical data and PII. A patient 120 may also revoke access to the medical data and PII, and may grant or revoke access to some or all of the data. Accordingly, a patient 120 is a person that the medical data and PII relate to, authorized persons 130 are those with currently held rights to access some or all of the medical data and PII, and unauthorized persons 140 include those who have not yet been identified as well as those currently lacking rights to access the medical data and PII. The identification and classification of the various persons is discussed in greater detail in relation to
The AI assistant device 110 offers a user interface for requesting and receiving controlled access to health information. In some embodiments, the AI assistant device 110 is an audio-controlled computing device with which users may interact verbally, but various other devices may also be used as a user interface to request or provide health information to authorized parties in the environment. For example, a television may be used to output health information via a video overlay, a mobile telephone may be used to receive requests via touch-input and output health information via video or audio, etc. Generally, the AI assistant device 110 can be any device capable of hosting a local instance of an AI assistant and that remains in an “on” or “standby” mode to receive requests and provide outputs related to health information while remaining available for other tasks. For example, the AI assistant device 110 may also handle home automation tasks (e.g., controlling a thermostat, lights, appliances) on behalf of a user or interface with the television to provide health information while the patient 120 is watching a program. Example hardware for the AI assistant device 110 is discussed in greater detail in regard to
In various embodiments, the AI assistant device 110 captures audio in the environment 100 and, to determine how to respond to the captured audio, may locally process the audio, may be in communication with remote computing resources 160 via a network 150 to process the audio remotely, or may perform some audio processing locally and some audio processing remotely. The AI assistant device 110 may connect to the network 150 via wired technologies (e.g., wires, fiber optic cable, etc.), wireless technologies (e.g., WIFI, cellular, satellite, Bluetooth, etc.), or combinations thereof. The network 150 may be any type of communication network, including data and/or voice networks, local area networks, and the Internet.
To determine how or whether to respond to audio captured in the environment, the AI assistant device 110 may need to filter out unwanted noises from desired audio, identify the source of the audio, and determine the content of the audio. For example, if the AI assistant device 110 detects audio of a request for the next scheduled doctor's appointment for the patient 120, the AI assistant device 110 may need to determine whether the request was received from an audio source 170f as unwanted noise (e.g., a character speaking in a movie or television program), the patient 120, an authorized person 130 (e.g., an in-home care assistant looking up care details for the patient 120), or an unauthorized person 140 (e.g., a curious visitor without authorization to receive that information from the AI assistant device 110). Other filters may be used to identify and discard sounds made by other objects 170 in the environment 100.
In order to identify the content of the desired audio (e.g., a command to the AI assistant device 110), an audio recognition (AR) engine performs audio analysis/filtering and speech recognition on the captured audio signals and calculates a similarity between any audio identified therein and known audio samples (e.g., utterances for certain desired interactions). The AR engine then compares this similarity to a threshold and, if the similarity is greater than the threshold, the AR engine determines that a known audio cue has been received from the environment. The AR engine may use various types of speech and audio recognition techniques, such as large-vocabulary speech recognition techniques, keyword spotting techniques, machine-learning techniques (e.g., support vector machines (SVMs)), neural network techniques, or the like. In response to identifying an audio cue, the AI assistant device 110 may then use the audio cue to determine how to next respond. Some or all of the audio processing may be done locally on the AI assistant device 110, but the AI assistant device 110 may also offload more computationally difficult tasks to the remote computing resources 160 for additional processing.
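As a non-limiting illustration of the similarity-and-threshold comparison described above, the following sketch compares a fixed-length embedding of captured audio against embeddings of known audio cues and reports a match only when the similarity clears a threshold. The embedding size, the cue list, and the threshold value are assumptions for illustration only and do not describe a particular implementation of the AR engine.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # assumed tunable threshold, not a value from the disclosure

# Placeholder reference embeddings for known audio cues (illustrative only).
KNOWN_CUES = {
    "wake_phrase": np.random.rand(128),
    "cancel_request": np.random.rand(128),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_audio_cue(audio_embedding: np.ndarray) -> str | None:
    """Return the best-matching known cue when its similarity exceeds the threshold."""
    best_cue, best_score = None, 0.0
    for cue, reference in KNOWN_CUES.items():
        score = cosine_similarity(audio_embedding, reference)
        if score > best_score:
            best_cue, best_score = cue, score
    return best_cue if best_score >= SIMILARITY_THRESHOLD else None
```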
In various embodiments, the AI assistant device 110 may also access the electronic health records 180 via the network 150 or may store some of the electronic health records 180 locally for later access. The electronic health records 180 may include one or more of: medical histories for patients, upcoming or previous appointments, medications, personal identification information (PII), demographic data, emergency contacts, treating professionals (e.g., physicians, nurses, dentists, etc.), medical powers of attorney, and the like. The electronic health records 180 may be held by one or more different facilities (e.g., a first doctor's office, a second doctor's office, a hospital, a pharmacy) that the AI assistant device 110 authenticates with to receive the data. In some embodiments, the AI assistant device 110 may locally cache some of these electronic health records 180 for offline access or faster future retrieval. Additionally or alternatively, a patient 120 or authorized person 130 can locally supply the medical data, such as by requesting the AI assistant device 110 to “remind me to take my medicine every morning”, importing a calendar entry for a doctor's appointment from a linked account or computer, or the like.
Additionally, the AI assistant device 110 may store identifying information to distinguish the patient 120, authorized person 130, and unauthorized persons 140 when deciding whether to share the electronic health records 180 or data based on the electronic health records 180.
Generally, until a person has been identified, the AI assistant device 110 classifies that person as an unauthorized person 140, and may ignore commands or audio from that person. For example, at Time1, the AI assistant device 110 may know that two persons are present in the environment 200, but may not know the identities of those persons, and therefore treats the first person as a first unauthorized person 140a and the second person as a second unauthorized person 140b.
In various embodiments, persons can identify themselves directly to the AI assistant device 110 or may identify other parties to the AI assistant device 110. For example, when a first utterance 210a (generally or collectively, utterance 210) is received from the first unauthorized person 140a, the AI assistant device 110 may extract a first voice pattern 220a (generally or collectively, voice pattern 220) from the words (including pitch, cadence, tone, and the like) to compare against known voice patterns 220 to identify an associated known person. In the illustrated example, the first voice pattern 220a matches that of a patient 120, and the AI assistant device 110 therefore reclassifies the first unauthorized person 140a to be the patient 120.
The AI assistant device 110 may store various identity profiles for persons to identify those persons as a patient 120, authorized person 130 for that patient, or as unauthorized persons 140 for that patient, with various levels of rights to access or provide health information for the patient 120 and various interests in collecting or maintaining data related to that person.
Once a person has been identified as a patient 120 (or other authorized party trusted to identify other persons with whom access should be granted), the AI assistant device 110 may rely on utterances 210 from that trusted person to identify other persons. For example, the first utterance 210a can be used to identify the first unauthorized person 140a as the patient 120 based on the associated first voice pattern 220a, and the contents of the first utterance 210a can be examined for information identifying the other party. In the illustrated example, the AI assistant device 110 (either locally or via remote computing resources 160) may extract the identity “Dr. Smith” from the first utterance 210a to identify that the second unauthorized person 140b is Dr. Smith, who is an authorized person 130 for the patient 120, and the AI assistant device 110 therefore reclassifies the second unauthorized person 140b to be an authorized person 130 for the patient 120.
Additionally or alternatively, the AI assistant device 110 may identify Dr. Smith as an authorized person 130 based on a second voice pattern 220b extracted from the second utterance 210b spoken by Dr. Smith. The voice patterns 220 may be continuously used by the AI assistant device 110 to re-identify Dr. Smith or the patient 120 (e.g., at a later time) within the environment 200 or to distinguish utterances 210 as coming from a specific person within the environment 200.
When multiple persons are present in the environment 200, and potentially moving about the environment, the AI assistant device 110 may continually reassess which person is which. If a confidence score for a given person falls below a threshold, the AI assistant device 110 may reclassify one or more persons as unauthorized persons 140 until identities can be reestablished. In various embodiments, the AI assistant device 110 may use directional microphones to establish where a given person is located in the environment 200, and may rely on the sensors 230 to identify how many persons are located in the environment 200 and where those persons are located.
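As a hypothetical sketch of this continual reassessment, each tracked person may carry an identity confidence score that decays between voice-pattern matches and triggers reclassification when it falls below a threshold. The decay rate and threshold shown here are illustrative assumptions only.

```python
from dataclasses import dataclass

RECLASSIFY_THRESHOLD = 0.5   # assumed confidence floor for retaining an identity
DECAY_PER_INTERVAL = 0.05    # assumed decay applied when no new voice match arrives

@dataclass
class TrackedPerson:
    identity: str       # e.g., "patient", "authorized", or "unauthorized"
    confidence: float   # confidence that the identity label is still correct

def reassess(person: TrackedPerson, new_match_score: float | None) -> TrackedPerson:
    """Update identity confidence; reclassify as unauthorized when confidence is too low."""
    if new_match_score is not None:
        # A fresh voice-pattern match refreshes confidence in the assigned identity.
        person.confidence = max(person.confidence, new_match_score)
    else:
        person.confidence -= DECAY_PER_INTERVAL
    if person.confidence < RECLASSIFY_THRESHOLD:
        person.identity = "unauthorized"
    return person
```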
In some examples herein, immediate and dramatic changes to the speech and vocal patterns of the patient 120 are detected by the device 110 and indicate the patient is experiencing a medical emergency. In other examples, subtle changes occur in the patient's speech and vocal patterns over long periods of time (e.g., days, weeks, years), where the change in the condition of the patient is harder to identify based on presenting symptoms. In order to track and detect the subtle changes (as well as to enhance detection of more immediate changes) the device 110 builds/trains a language learning model discussed herein.
For ease of discussion, the steps of the method 300 are discussed with reference to
In each of
The AI assistant device 110 provides an audio recognition (AR) engine, which may be another machine learning model or an additional layer of the filtering machine learning model. The device 110 also provides a language recognition (LR) engine 406 that builds or otherwise trains a language learning model, such as model 405. The model 405 may be any type of machine learning model, such as a neural network or other type of network. In some examples, where the patient 120 actively and frequently engages with the device 110, the device 110 trains the model 405 without needing to prompt the patient 120 for speech. In some examples, the model 405 is generated and trained using conversation questions from the AI assistant device 110, where the conversation questions are configured to generate audio responses including conversation answers, from the patient 120. In some examples, the conversation questions are generated by the device 110 in order to entice the patient 120 to engage with the device 110 such that the device 110 may passively train the model 405.
For example, the conversation questions may include a greeting and service offering that generally elicit a response from the patient 120. The conversation questions can be the same during each scenario or time shown in
In addition to processing and classifying the environmental sounds 408 and listening for answers to conversation questions/output audio, in some embodiments, the audio recognition engine may include speech recognition for various key phrases. For example, various phrases may be preloaded for local identification by the AI assistant device 110, such as a name of the AI assistant to activate the AI assistant device 110 (e.g., “Hey, ASSISTANT”) or phrases to deactivate the AI assistant device 110 (e.g., “never mind”, “I'm fine”, “cancel request”, etc.). The AI assistant device 110 may offload further processing of speech sounds to a speech recognition system offered by remote computing resources 160 to identify the contents and intents of the various utterances from the patient 120 captured in the environment 400.
Referring back to
At block 308, the AI assistant device 110 detects, via natural language processing, tracked words in the first utterances at the time 401, including the utterances 420a and 440a. In some examples, tracked words are words that are expected to be spoken by the patient 120 frequently. For example, the patient 120 is expected to respond to or refer to the AI assistant device 110 using “you,” among other uses of the word “you” in utterances from the patient 120. In some examples, the tracked words may be preconfigured/preset by the AI assistant device 110. The tracked words may also be determined based on the patient 120. For example, if the patient 120 frequently uses a word when communicating with the device 110, that word may be added to the tracked words list. When the AI assistant device 110 detects the tracked words, method 300 proceeds to block 310 where the AI assistant device 110 marks the tracked words for tracking in the model 405 with an indication for enhanced tracking.
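A minimal sketch of blocks 308 and 310 follows, in which tracked words found in a patient utterance are marked in a simple dictionary standing in for the language tracking model. The tracked-word list and the dictionary structure are assumptions for illustration.

```python
TRACKED_WORDS = {"you", "appointment", "medicine"}  # assumed preset/patient-derived list

def mark_tracked_words(utterance: str, language_model: dict) -> dict:
    """Mark tracked words appearing in the utterance for enhanced tracking."""
    for word in utterance.lower().split():
        token = word.strip(".,!?")
        if token in TRACKED_WORDS:
            entry = language_model.setdefault(token, {"enhanced_tracking": False, "samples": []})
            entry["enhanced_tracking"] = True
            entry["samples"].append(utterance)  # retain the utterance as a pronunciation sample
    return language_model
```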
Additionally, at block 312, the AI assistant device 110 detects, via natural language processing, trigger words in the first utterances at the time 401, including the utterances 420a and 440a. For example, trigger words include words that may not indicate immediate distress or decline, but may be utilized over time to detect a change in a patient condition (including cognitive decline or depression, etc.). For example, patient 120 uses the word “okay” in utterance 420a. While the usage of the trigger word “okay” does not immediately indicate that the patient 120 is experiencing a condition change, the overuse of the trigger word may indicate a change in the future, as described herein. For example, repeating a word frequently may indicate a loss of vocabulary, among other cognitive changes.
At block 314, the AI assistant device 110 determines a baseline level for the trigger word and tracks a number of uses of the trigger word using the model 405 at block 316. For example, the frequency of occurrence of trigger words in utterances in the environment 400 is tracked using the model 405 and used to determine a change in the patient condition as described in more detail in relation to
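The baseline determination of blocks 312-316 may be sketched as counting trigger-word occurrences across the first utterances and storing a per-word baseline rate, as shown below. The trigger list and the model representation are illustrative assumptions.

```python
from collections import Counter

TRIGGER_WORDS = {"okay", "thing", "whatever"}  # assumed trigger-word list

def update_trigger_baseline(utterances: list[str], language_model: dict) -> dict:
    """Record a baseline usage rate for each trigger word across a batch of utterances."""
    counts, total_words = Counter(), 0
    for utterance in utterances:
        tokens = [w.strip(".,!?") for w in utterance.lower().split()]
        total_words += len(tokens)
        counts.update(t for t in tokens if t in TRIGGER_WORDS)
    for word in TRIGGER_WORDS:
        rate = counts[word] / total_words if total_words else 0.0
        language_model.setdefault(word, {})["baseline_rate"] = rate
    return language_model
```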
At block 318, the AI assistant device 110 adds the first utterances to the language tracking model, e.g., the language tracking model 405 shown in
At block 320, the AI assistant device 110 determines whether the model 405 is trained to a level sufficient to monitor the patient 120 for a condition change. In some examples, the model 405 may be a pre-trained model such that after one collection of utterances, such as at time 401, the model 405 is sufficiently trained to monitor the patient 120 for condition changes. In another example, the AI assistant device 110 determines that the model requires additional training by comparing the trained data to one or more predefined thresholds for monitoring a patient and proceeds back to block 302 of method 300 to transmit conversation questions to the patient 120.
In this example, method 300 proceeds to block 322 from block 320 and begins monitoring the patient 120 for condition changes using the model 405. In some examples, while the AI assistant device 110 is using the model 405 to monitor the patient 120 for condition changes, the AI assistant device 110 also continues training and updating the model 405. For example, the method 300 proceeds back to block 302 to transmit conversation questions to the patient 120.
For ease of discussion, the steps of the method 500 will be discussed with reference to
In each of
The AI assistant device 110 provides an audio recognition engine, which may be another machine learning model or an additional layer of the filtering machine learning model, that builds or otherwise trains a language learning model, such as model 405 which is trained as described in relation to
In addition to processing and classifying the environmental sounds and listening for answers to conversation questions/output audio, in some embodiments, the audio recognition engine may include speech recognition for various key phrases. For example, various phrases may be preloaded for local identification by the AI assistant device 110, such as a name of the AI assistant to activate the AI assistant device 110 (e.g., “Hey, ASSISTANT”) or phrases to deactivate the AI assistant device 110 (e.g., “never mind”, “I'm fine”, “cancel request”, etc.). The AI assistant device 110 may offload further processing of speech sounds to a speech recognition system offered by remote computing resources 160 to identify the contents and intents of the various utterances from the patient 120 captured in the environment 400 during the scenarios 601-603.
Referring back to
At block 502, the AI assistant device 110, using the model 405 and a language tracking engine, detects, via audio associated with utterances stored in the language tracking model, a change in voice tone of the patient. At block 510, the AI assistant device 110 associates the change in the voice tone with at least one predefined tone change indicator and determines, from the at least one predefined tone change indicator, a change in the condition of the patient. For example, utterances 620a and 640a in scenario 601 may include an aggressive and loud voice tone as detected by the AI assistant device 110.
The AI assistant device 110, using the model 405, associates aggressive and loud voice tones with an agitation indicator. An increase in the agitation indicator of the tone of the patient 120 may indicate symptoms of cognitive decline caused by memory loss conditions, cognitive decline caused by minor strokes, other neurological events, or other conditions such as depression or anxiety, among others. In some examples, the indicator indicates a change in the tone as determined at block 512 and the AI assistant device 110 updates a condition 605 of the patient 120 at block 514.
The condition 605 may also be updated using additional learning model methods as described in relation to blocks 520a-550. For example, a condition change from just a detected tone change may warrant a note for follow up or an update to the model 405, while a tone change along with other detected changes in the speech patterns may warrant a more immediate follow up or alert as described herein.
In some examples, at block 512, the AI assistant device 110 determines that the tone indicators do not indicate a change in the condition of the patient 120 and the method 500 proceeds to block 520a. For example, the tone of the utterances 620a and 640a may indicate agitation, but may be within an expected range as determined by the language engine of the AI assistant device 110 and the model 405.
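One possible sketch of the tone check in blocks 502-514 derives a coarse agitation indicator from simple prosodic features and flags a tone change when the indicator departs from the stored baseline. The feature weights and the margin are assumptions for illustration, not values taken from this disclosure.

```python
def agitation_indicator(mean_loudness_db: float, pitch_variance: float) -> float:
    """Map loudness and pitch variability to a rough agitation score (assumed weighting)."""
    return 0.6 * min(mean_loudness_db / 60.0, 1.0) + 0.4 * min(pitch_variance / 500.0, 1.0)

def tone_change_detected(current: float, baseline: float, margin: float = 1.5) -> bool:
    """Flag a tone change when the current indicator exceeds the baseline by the margin."""
    return baseline > 0 and current > baseline * margin
```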
In both examples, whether the AI assistant device 110 has determined there is a change associated with the tone of the patient or not, the AI assistant device 110 continues to check other factors, such as at blocks 520a and 520b of method 500, where the AI assistant device 110 detects the presence of tracked words. In the scenario 601, there are tracked words in the utterances 620a and 620b (i.e., “you”). For the scenario 601, at block 522b the AI assistant device 110 detects, via natural language processing and fuzzy matching processing, that a pronunciation of the tracked words has not changed in the utterances 620a and 640a.
In another example,
Additionally, the AI assistant device 110 may detect via natural language processing and fuzzy matching processing additional words that may have altered pronunciation. For example, “doing” is not a tracked word in the model 405, but the AI assistant device 110 uses the natural language processing and fuzzy matching processing to determine that the word the patient is pronouncing is “doing” and that the pronunciation is unexpected for the word and/or the patient 120.
When the AI assistant device 110 detects a change in pronunciation or unexpected pronunciation for words including tracked and non-tracked words (at either block 522a or block 522b), the AI assistant device 110 updates the patient's condition 605 at block 534. In another example, the AI assistant device 110 does not detect a pronunciation change and the method 500 proceeds to block 530b from 522b (e.g., in scenario 601). In an example, where no tone change is detected and no pronunciation change is detected, the method 500 proceeds to block 530a.
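A minimal sketch of this pronunciation check, using difflib as a stand-in for the fuzzy matching processing, flags a word whose spoken form is recognizably the expected word but deviates enough to suggest altered pronunciation. The two threshold values are assumptions.

```python
from difflib import SequenceMatcher

PROXIMITY = 0.6      # assumed lower bound: still recognizably the same word
CONFIRMATION = 0.9   # assumed upper bound: counts as a normal pronunciation

def pronunciation_changed(heard: str, expected: str) -> bool:
    """True when the heard token nearly, but not cleanly, matches the expected word."""
    score = SequenceMatcher(None, heard.lower(), expected.lower()).ratio()
    return PROXIMITY <= score < CONFIRMATION

# Example: pronunciation_changed("doin'", "doing") -> True  (ratio 0.8)
#          pronunciation_changed("doing", "doing") -> False (ratio 1.0)
```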
At blocks 530a and 530b, the AI assistant device 110 determines whether trigger words are present in the second utterances. In an example, where trigger words are present, the AI assistant device 110 compares the number of uses to the baseline level and predefined threshold for the trigger word. For example, in the scenario 601, there are trigger words in the utterances 620a and 620b (i.e., okay). At block 532b the AI assistant device 110 determines that the usage of the trigger word in scenario 601 is below a baseline according to the model 405.
In another example,
When the AI assistant device 110 detects a usage of trigger words above a respective baseline (at either block 532a or block 532b), the AI assistant device 110 updates condition 605 at block 534. In another example, the AI assistant device 110 does not detect a usage above a baseline and the method 500 proceeds to block 550 from 532b or to block 540 from block 532a. In some examples, none of the indicators in the model 405 indicate that the condition 605 of the patient 120 has changed.
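The trigger-word comparison of blocks 530-534 may be sketched as comparing the current usage rate against the baseline recorded during training, scaled by a predefined threshold. The multiplier value is an assumption for illustration.

```python
USAGE_THRESHOLD_MULTIPLIER = 2.0  # assumed: flag usage at twice the baseline rate

def trigger_usage_exceeds_baseline(utterances: list[str], word: str, language_model: dict) -> bool:
    """True when the trigger word's usage rate rises above its baseline by the threshold."""
    tokens = [t.strip(".,!?") for u in utterances for t in u.lower().split()]
    if not tokens:
        return False
    current_rate = tokens.count(word) / len(tokens)
    baseline = language_model.get(word, {}).get("baseline_rate", 0.0)
    return baseline > 0 and current_rate > baseline * USAGE_THRESHOLD_MULTIPLIER
```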
In an example where one or more indicators has caused an update to the condition 605, the AI assistant device 110 detects/determines the condition change from the condition 605. For example, the AI assistant device 110 aggregates the changes made in blocks 501-534 of method 500. In some examples, the AI assistant device 110 may also detect via a language tracking engine provided by the AI assistant device and the model additional factors/indicators for a condition change. For example, slurring or stuttering speech, slowed/paused speech, and other factors derived from the model 405 and the utterances in the scenarios 601-603 may further indicate a condition change.
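The aggregation step may be sketched as combining the individual indicator flags into a severity level that distinguishes a note for later follow up from a more immediate alert. The exact rule (one flag versus two or more) is an assumption consistent with the follow-up/alert distinction described herein.

```python
def aggregate_condition_change(indicators: dict[str, bool]) -> str:
    """Combine per-indicator flags (tone, pronunciation, trigger usage, slurring, pauses)
    into 'none', 'follow_up', or 'urgent'."""
    flagged = sum(1 for changed in indicators.values() if changed)
    if flagged == 0:
        return "none"
    return "follow_up" if flagged == 1 else "urgent"
```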
At block 560, the AI assistant device 110 generates a condition notification comprising the condition change indicating a condition of the patient has changed from the first time to the second time. Generation of the condition notification is further described in relation to
Method 700 begins at block 702 where the device 110 logs the condition notification for caretaker review. In some examples, the condition notification includes a minor change that does not warrant immediate follow up. For example, a detected change in the memory of the patient 120 may not require an immediate visit from a provider. In another example, the condition notification includes a significant or sudden change in the speech patterns, which requires attention as soon as feasible.
At block 704, the device 110 determines, from the condition change, whether an emergency condition change is indicated. In some examples, the device 110 may conduct further inquiries into the condition of the patient 120 to determine if a medical emergency is occurring, as described in
For example, two of the signs for rapid diagnosis of strokes in patients 120 include slurred or disjointed speech (generally, slurred or slurry speech) and facial paralysis, often on only one side of the face, causing “facial droop”, where an expression is present on one side of the face and muscle control has been lost in the lower face and one side of the upper face. Depending on the severity of the stroke in the patient 120, the patient 120 may no longer be able to produce intelligible speech or otherwise actively call for assistance. Accordingly, the AI assistant device 110 may analyze utterances 810a and 820a generated by the patient 120 to determine when to generate an alert for a medical professional to diagnose and aid the patient 120.
As illustrated in
For example, the first portion of the first utterance 210a of “Hey assitsn” may be compared to the cue phrase of “Hey Assistant” to determine that the uttered speech does not satisfy a confirmation threshold for the patient 120 to have spoken “Hey ASSISTANT” to activate the AI assistant device 110, but does satisfy a proximity threshold as being close to intelligibly saying “Hey ASSISTANT”. When the confidence in matching a received phrase to a known phrase falls between the proximity threshold (as a lower bound) and the confirmation threshold (as an upper bound), and does not satisfy a confirmation threshold for another known phrase (e.g., “Hey hon” to address a loved one as ‘hon’), the AI assistant device 110 may take further action to determine if the patient is in distress.
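The two-threshold window described above may be sketched as follows, where a match confidence at or above the confirmation threshold is treated as a normal activation, a confidence between the proximity and confirmation thresholds triggers a distress follow-up, and anything below the proximity threshold is ignored. The numeric values are assumptions for illustration.

```python
PROXIMITY_THRESHOLD = 0.6     # assumed lower bound of the confidence window
CONFIRMATION_THRESHOLD = 0.9  # assumed upper bound of the confidence window

def classify_cue_match(confidence: float) -> str:
    """Classify a phrase-match confidence against the proximity/confirmation thresholds."""
    if confidence >= CONFIRMATION_THRESHOLD:
        return "confirmed"           # intelligible match; normal activation
    if confidence >= PROXIMITY_THRESHOLD:
        return "possible_distress"   # close to a known phrase but not intelligible
    return "no_match"                # unrelated audio; ignore
```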
When the patient 120 is identified as being in distress, such as possibly suffering a stroke (i.e., an emergency condition), the AI assistant device 110 may generate audio outputs 850a and 860a to prompt the patient 120 to provide further utterances to gather additional speech samples to compare against the model 405 to guard against accents or preexisting speech impediments yielding false positives for detecting a potential stroke.
As illustrated in
The AI assistant device 110 may generate a second audio output 860a of “What sport is played during the World Series?” or another pre-arranged question/response pair that the patient 120 should remember the response to. The second audio output 860a prompts the patient 120 to reply via an utterance 810a, “Bizbull”, intended to convey the answer of “baseball”, albeit with a slurred speech pattern. Similarly, if the patient 120 were to supply an incorrect answer (e.g., basketball) after having established knowledge of the correct answer when setting up the pre-arranged question/response, the mismatch may indicate cognitive impairment, even if the speech is not otherwise slurred, which may be another sign of stroke.
When the AI assistant device 110 detects slurred speech via the utterances 210 almost (but not quite) matching known audio cues or not matching a pre-supplied audio clip of the patient 120 speaking the words from model 405, the AI assistant device 110 may activate various supplemental sensors to further identify whether the patient is in distress. For example, a camera sensor 230c of the sensors 230 may be activated and the images provided to a facial recognition system (e.g., provided by remote computing resources 160) to identify whether the patient 120 is experiencing partial facial paralysis, which is another sign of stroke.
Additionally or alternatively, the AI assistant device 110 may access the electronic health records 180 for the patient 120 to adjust the thresholds used to determine whether the slurred speech or facial paralysis is indicative of stroke. For example, when a patient 120 is observed with slurred speech, but the electronic health records 180 indicate that the patient 120 was scheduled for a dental cavity filling earlier in the day, the AI assistant device 110 may adjust the confidence window upward so that false positives for stroke are not generated due to the facial droop and speech impairment expected from local oral anesthesia. In another example, when the patient 120 is prescribed medications that affect motor control, the AI assistant device 110 may adjust the confidence window upward so that greater confidence in stroke is required before an alert is generated. In a further example, when the electronic health records 180 indicate that the patient 120 is at an elevated risk for stroke (e.g., due to medications, previous strokes, etc.), the AI assistant device 110 may adjust the confidence window downward so that lower confidence in stroke is required before an alert is generated.
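The record-based adjustment of the confidence window may be sketched as below; the record fields and the adjustment amounts are hypothetical and only illustrate raising or lowering the required confidence as described above.

```python
def adjusted_stroke_threshold(base_threshold: float, health_record: dict) -> float:
    """Raise or lower the stroke-alert confidence threshold based on health-record context."""
    threshold = base_threshold
    if health_record.get("recent_dental_anesthesia"):
        threshold += 0.1  # expected facial droop/slurring; require greater confidence
    if health_record.get("motor_control_medication"):
        threshold += 0.1  # medication side effects can mimic stroke signs
    if health_record.get("elevated_stroke_risk"):
        threshold -= 0.1  # prior strokes or risk factors; alert on lower confidence
    return min(max(threshold, 0.0), 1.0)
```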
When the patient 120 has slurred speech and/or exhibits partial facial paralysis sufficient to satisfy the thresholds for stroke, the AI assistant device 110 determines that the patient 120 is in distress and is non-responsive (despite potentially attempting to be responsive), and therefore generates an alert that the patient 120 is in distress.
In various embodiments, the AI assistant device 110 transmits the alert to an emergency alert system, alert system 880, to propagate according to various transmission protocols to one or more personal devices 885 associated with authorized person 130 to assist the patient 120. Example hardware as may be used in the alert system 880 and the personal devices 885 can include a computing system 900 as is discussed in greater detail in
Returning to method 700 of
For example, the AI assistant device 110 outputs the output audio 850b which states “Hello, I've noticed a change in your condition. This is not an emergency, but I would like to inform a provider. Can I inform the provider?” The patient 120 in turn may respond via utterance 810b granting permission to call a provider or respond via utterance 820b declining a call.
In some examples, the AI assistant device 110 may inform the patient 120 of the details of the detected change. For example, the AI assistant device 110 may inform the patient 120 that their speech indicates a high agitation level. In this example, the patient 120 may know that they are agitated for a specific reason (not related to a condition change) and decline the call from the AI assistant device 110. In this example, the AI assistant device 110 determines that the condition change may be followed up at a later time and does not transmit a call to a provider.
In another example, the AI assistant device 110 informs the patient 120 that a condition change indicates that minor strokes may have occurred. In this example, whether the patient declines the call or grants permission for the call, the AI assistant device 110 determines that the condition change requires provider notification and notifies at least one provider.
In various embodiments, the AI assistant device 110 transmits a condition notification to a call system 890 to propagate according to various transmission protocols to one or more personal devices 885 associated with authorized person 130 to assist the patient 120. The call system 890 provides a lower importance call level compared to the alert system 880. Example hardware as may be used in the system 890 and the personal devices 885 can include a computing system 900 as is discussed in greater detail in
The processor 950 retrieves and executes programming instructions stored in the memory 960. Similarly, the processor 950 stores and retrieves application data residing in the memory 960. An interconnect can facilitate transmission, such as of programming instructions and application data, between the processor 950, I/O devices 910, network interface 930, and memory 960. The processor 950 is included to be representative of a single processor, multiple processors, a single processor having multiple processing cores, and the like. The memory 960 is generally included to be representative of a random access memory. The memory 960 can also be a disk drive storage device. Although shown as a single unit, the memory 960 may be a combination of fixed and/or removable storage devices, such as magnetic disk drives, flash drives, removable memory cards or optical storage, network attached storage (NAS), or a storage area network (SAN). The memory 960 may include both local storage devices and remote storage devices accessible via the network interface 930. One or more machine learning models 971 may be maintained in the memory 960 to provide a localized portion of an AI assistant via the computing system 900. Additionally, one or more AR engines 972 may be maintained in the memory 960 to match identified audio to known events occurring in an environment where the computing system 900 is located.
Further, the computing system 900 is included to be representative of a physical computing system as well as virtual machine instances hosted on a set of underlying physical computing systems. Further still, although shown as a single computing system, one of ordinary skill in the art will recognize that the components of the computing system 900 shown in
As shown, the memory 960 includes an operating system 961. The operating system 961 may facilitate receiving input from and providing output to audio components 980 and non-audio sensors 990. In various embodiments, the audio components 980 include one or more microphones (including directional microphone arrays) to monitor the environment for various audio including human speech and non-speech sounds, and one or more speakers to provide simulated human speech to interact with persons in the environment. The non-audio sensors 990 may include sensors operated by one or more different computing systems, such as, for example, presence sensors, motion sensors, cameras, pressure or weight sensors, light sensors, humidity sensors, temperature sensors, and the like, which may be provided as separate devices in communication with the AI assistant device 110, or a managed constellation of sensors (e.g., as part of a home security system in communication with the AI assistant device 110). Although illustrated as external to the computing system 900, and connected via an I/O interface, in various embodiments, some or all of the audio components 980 and non-audio sensors 990 may be connected to the computing system 900 via the network interface 930, or incorporated in the computing system 900.
The preceding description is provided to enable any person skilled in the art to practice the various embodiments described herein. The examples discussed herein are not limiting of the scope, applicability, or embodiments set forth in the claims. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. For example, changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For instance, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Also, features described with respect to some examples may be combined in some other examples. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.
As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” may include resolving, selecting, choosing, establishing and the like.
The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.
The following claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims. Within a claim, reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.” All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
The following clauses describe various embodiments of the present disclosure.
Clause 1: A method, comprising: at a first time, capturing, via an Artificial Intelligence (AI) assistant device, first audio from an environment; detecting first utterances from the first audio for a patient; adding the first utterances to a language tracking model; at a second time, capturing, via the AI assistant device, second audio from the environment; detecting second utterances from the second audio for the patient; detecting via a language tracking engine provided by the AI assistant device, the language tracking model, and the second utterances, a condition change indicating a condition of the patient has changed from the first time to the second time; and generating a condition notification comprising the condition change.
Clause 2: In addition to the method of clause 1, further comprising: logging the condition notification for caretaker review; providing the condition notification to the patient for review; requesting patient permission to provide an alert to a caretaker for the patient via a call system; receiving the patient permission from the patient to provide the alert to the caretaker; and transmitting the condition notification via the call system, where the call system transmits the condition notification via a phone network to a personal device associated with the caretaker for the patient as at least one of: text message; or phone call using a synthesized voice.
Clause 3: In addition to the method of clauses 1 or 2, further comprising: at regular time intervals, transmitting, via the AI assistant device, conversation questions to the patient; capturing, via the AI assistant device, audio comprising conversation answers from the patient; adding the conversation answers to the language tracking model, and wherein detecting that the condition of the patient has changed further comprises comparing the conversation answers captured across the regular time intervals to detect changes in the conversation answers.
Clause 4: In addition to the method of clauses 1, 2, or 3, wherein detecting the condition of the patient has changed comprises: detecting, via audio associated with utterances stored in the language tracking model, a change in voice tone of the patient; associating the change in the voice tone with at least one predefined tone change indicator; and determining a change in the condition of the patient based on the at least one predefined tone change indicator.
Clause 5: In addition to the method of clauses 1, 2, 3, or 4, further comprising: detecting, via natural language processing, tracked words in the first utterances; marking the tracked words in the language tracking model with an indication for enhanced tracking; and wherein detecting the condition of the patient has changed comprises: detecting, via at least one of natural language processing or fuzzy matching processing, that a pronunciation of the tracked words has changed in the second utterances.
Clause 6: In addition to the method of clauses 1, 2, 3, 4, or 5, further comprising: determining, from the condition change, an emergency condition change indicating the patient is experiencing an emergency; providing emergency condition information to the patient via the AI assistant; generating an emergency alert via an alert system associated with the AI assistant, wherein when the alert system is a phone network, the emergency alert is sent to a personal device associated with a caretaker for the patient as at least one of: a text message; or a phone call using a synthesized voice, wherein when the alert system is part of an alert system in a group home or medical facility, the alert system transmits the emergency alert via a broadcast message to a plurality of personal devices associated with caretakers in the group home or medical facility.
Clause 7: In addition to the method of clauses 1, 2, 3, 4, 5 or 6, further comprising: detecting, using natural language processing, at least one trigger word in the first utterances; determining a baseline level for the at least one trigger word; tracking a number of uses of the at least one trigger word using the language tracking model; and wherein detecting the condition change comprises comparing the number of uses to the baseline level and predefined threshold for the at least one trigger word.
This application claims priority to U.S. Provisional Patent Application No. 63/362,246, filed Mar. 31, 2022, the entire content of which is incorporated herein by reference in its entirety.