The field of the invention is healthcare informatics, especially analysis of psychological or other medical conditions.
The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
Diagnosis, detection, and monitoring of medically-related conditions remain a critical need. The problems are often exacerbated by: (i) lack of access to neurologists or psychiatrists; (ii) lack of awareness of a given condition and the need to see a specialist; (iii) lack of an effective standardized diagnostic or endpoint for many of these health conditions; (iv) substantial transportation and cost involved in conventional or traditional solutions; and in some cases, (v) shortage of medical specialists in these fields.
There have been many efforts to address these problems, including use of telemedicine, in which a practitioner interacts with a patient or patients utilizing telecommunications. Telemedicine does not, however, resolve problems associated with insufficient numbers of trained practitioners, or available time of existing practitioners. Psychological conditions, in particular, can often require lengthy times spent with responding patients. Current systems for telemedicine also fail to address inadequacies in electronic communications, especially in rural areas where adequate line speed and reliability are lacking.
As used herein, the term “patient” means any person with whom a human or virtual practitioner is communicating with respect to a psychological or other condition, or potential such conditions, even if the person has not been diagnosed, and is not under the care of any practitioner. Where communication is via telecommunications, such person is also from time to time herein referred to as a “user”.
As used herein, the term “practitioner” broadly refers to any person whose vocation involves diagnosing, treating, or otherwise assisting in assessing or remediating psychological and/or other medical issues. In this usage, practitioners are not limited to medical doctors or nurses, or other degreed providers. Still further, as used herein, “medical conditions” should be interpreted as including psychological conditions, regardless of whether such conditions have any underlying physical etiology.
As used herein, the terms “assessment”, “assessing”, and related terms mean weighing information from which at least a tentative conclusion can be drawn. The at least tentative conclusion need not rise to the level of a formal diagnosis.
As used herein, the term “virtual agent” broadly refers to a computer or other non-human functionality configured to operate as a practitioner in assessing or remediating psychological and/or other medical issues.
In view of the challenges mentioned above, there is a need for a virtual agent that can assess one or more psychological and/or other medical conditions of a patient or other user, utilizing both semantic and affect content. There is also a need for a communication agent that can cooperate with a practitioner and/or virtual agent to individually compensate for adverse telecommunications environments encountered during assessment sessions.
The inventive subject matter provides apparatus, systems, and methods in which a virtual agent converses with a responding person to assess one or more psychological or other medical conditions of the user. The virtual agent uses both semantic and affect content from the responding person to branch the conversation, and also to interact with a data store to provide an assessment of the medical or psychological condition.
As used herein, the term “semantic content” means language information that a person is conveying, whether with verbalized words, with sign language, or with other body movements. Body movements used to convey semantic content can include facial expressions, gestures, postures, vocal intonations, and so forth. As a simple example, a person could answer a question with an audible “I don't know”, or simply shrug to convey “I don't know”. Either way, the semantic content is that the person doesn't know.
As used herein, the term “affect content” means the observable manifestations of an emotion. Emotions can also be gleaned from such manifestations as facial expressions, gestures, postures, vocal intonations, and so forth. Affect content can signal any emotion, including for example, anger, happiness, boredom, and frustration. In the example above, a person could unemotionally provide the semantic content that he/she does not know the answer to a question, and could alternatively provide that same semantic content, along with an angry facial expression, indicating the affect content of anger or frustration.
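The distinction between semantic and affect content, and its use in branching a conversation, can be illustrated with a minimal sketch. The `Response` class, the emotion labels, and the `next_prompt` function are hypothetical stand-ins for illustration only, and do not represent any particular disclosed implementation.

```python
from dataclasses import dataclass

@dataclass
class Response:
    """One turn from the responding person, as captured by the system."""
    semantic: str   # what was conveyed, e.g. transcribed speech or a recognized gesture
    affect: str     # observable emotion label, e.g. "neutral", "anger", "frustration"

def next_prompt(resp: Response) -> str:
    """Branch the conversation using both semantic and affect content."""
    if resp.semantic == "I don't know" and resp.affect in ("anger", "frustration"):
        # Same words as below, but the affect suggests the question itself is distressing.
        return "That's all right. Would you like to talk about something else?"
    if resp.semantic == "I don't know":
        return "No problem. Let me ask that a different way."
    return "Thank you. Tell me more about that."
```

As in the example above, identical semantic content (“I don't know”) yields different conversational branches depending on the accompanying affect content.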
In other aspects, a communication agent monitors a telecommunication session with a user, and if appropriate, modifies relative bandwidth utilization between the audio and image inputs. Such modification can be advantageously based at least in part on at least one of the semantic and affect contents. For example, if communication speeds are low, and the responding person is mumbling, but is otherwise communicating with little affect, the communications agent might divert a greater bandwidth to the audio communication, and a lesser bandwidth to the video communication.
In still other aspects, the communications agent could be configured to modify relative bandwidth utilization between audio and image inputs, based at least in part on content of at least one of the questions being asked, rapidity of the user's speech or movement of a hand or body part.
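The bandwidth reallocation described above can be sketched as a simple allocation rule; the function name, the fixed share adjustments, and the clamping bounds are illustrative assumptions rather than a disclosed algorithm.

```python
def allocate_bandwidth(total_kbps: float, speech_unclear: bool, affect_active: bool) -> dict:
    """Split a limited link between audio and video streams, favoring the
    modality carrying more of the session's semantic or affect content."""
    audio_share = 0.5
    if speech_unclear:
        audio_share += 0.2   # mumbling or unclear speech: divert bandwidth to audio
    if affect_active:
        audio_share -= 0.2   # expressive face or gestures: divert bandwidth to video
    audio_share = min(max(audio_share, 0.2), 0.8)  # never starve either stream
    return {"audio_kbps": total_kbps * audio_share,
            "video_kbps": total_kbps * (1.0 - audio_share)}
```

For example, on a 100 kbps link with mumbled speech and little visible affect, this rule diverts 70 kbps to audio and 30 kbps to video.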
In still other aspects, an artificial intelligence agent can assist the virtual agent in assessing the psychological or other medical condition(s) of the user.
In still other aspects, an artificial intelligence agent can simultaneously assist multiple virtual agents in parallel, each of which is conversing with a responding person and assessing that person's psychological or other medical condition(s).
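Parallel assistance of multiple virtual agents can be sketched as one task per session submitted to a shared pool. The keyword-counting “assessment” is a deliberately trivial placeholder, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def assist(session_id: str, transcript: str) -> tuple:
    """Placeholder for the AI agent scoring one virtual-agent session,
    here by counting mood-related keywords in the transcript."""
    score = sum(1 for w in transcript.lower().split() if w in {"sad", "tired", "hopeless"})
    return (session_id, score)

def assist_many(sessions: dict) -> dict:
    """Serve multiple virtual agents in parallel, one pooled task per session."""
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda kv: assist(*kv), sessions.items()))
```

A single artificial intelligence agent instance can thereby service many concurrent assessment sessions without serializing them.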
Although a virtual agent could rely solely on information from the responding person and the data store to assess the psychological or other medical condition(s) of the user, it is contemplated that the virtual agent could also make assessments with direct or indirect input from a human assessor, and/or from an artificial intelligence agent. In preferred embodiments, artificial intelligence agents would cooperate with multiple virtual agents and multiple human assessors to improve future assessments. Depending on the system architecture, the virtual agent, communications agent, and artificial intelligence agent can be entirely separate, or alternatively can overlap to any suitable degree.
Because of the focus on both semantic and affect contents, it is contemplated that the apparatus, systems, and methods disclosed herein can be especially useful in assessing disorder severity in multiple neurological and mental disorders. Specific examples include Parkinson's disease, schizophrenia, depression and autism spectrum disorder.
Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
The following discussion provides many example embodiments of the inventive subject matter. Although each embodiment represents a single combination of inventive elements, the inventive subject matter is considered to include all possible combinations of the disclosed elements. Thus if one embodiment comprises elements A, B, and C, and a second embodiment comprises elements B and D, then the inventive subject matter is also considered to include other remaining combinations of A, B, C, or D, even if not explicitly disclosed.
As used herein, and unless the context dictates otherwise, the term “coupled to” is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms “coupled to” and “coupled with” are used synonymously.
As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention. Unless a contrary meaning is explicitly stated, all ranges are inclusive of their endpoints, and open-ended ranges are to be interpreted as bounded on the open end by commercially feasible embodiments.
Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.
Practitioner 120 is using a computer 122 having an optional keyboard 123, a combination camera/microphone 124, and a speaker 126. Although the computer is depicted as a desktop model, the computer and its other electronic components should be viewed generically to include any device or devices fulfilling the usual functions of these components, including for example a laptop, an iPad™ or other tablet, and even a cell phone.
Data processing and storage functionality (depicted here as cloud 110) should be viewed generically as one or more computing and storage devices that collectively operate to execute the functions of a virtual agent 111, a data store 112, an artificial intelligence agent 113, and a communication agent 114, including storing and executing instructions stored on a tangible, non-transitory computer readable medium. For example, contemplated computing and storage devices include one or more computers operating as a web server, database server, or other type of computer server, and related storage devices, and can be physically local to one another, or more likely are distributed in different cities and even different countries. Accordingly, practitioner 120 and responding person 130 might be in different parts of the same building, or widely separated across the planet. One should also appreciate that such servers and storage devices can be re-configured from time to time to produce better conversational experiences for responding persons, and more reliable assessment accuracy.
It should be appreciated that virtual agent, data store, artificial intelligence agents, and communication agent are depicted within cloud 110 without clear boundaries. This is done intentionally to show that these items are not necessarily separate. For example, functionalities of the virtual agent might well be combined with those of the artificial intelligence agent and/or the communications agent, whether or not the corresponding software or firmware is physically operating from the same hardware.
Responding person 130 is also using a computer 132 having an optional keyboard 133, a combination camera/microphone 134 that provides inputs to the practitioner 120/virtual agent 111/artificial intelligence agent 113, and a speaker 136. Computer 122 might or might not be similar in features to computer 132, and here again, computer 132 should be viewed generically to include any device or devices fulfilling the usual functions of these components, including for example a laptop, an iPad™ or other tablet, and even a cell phone.
Practitioner 120 and responding person 130 are each depicted as sitting at a desk; however, it is contemplated that either or both of them could be interacting in any suitable posture, including for example, walking about, sitting on a couch, or lying in bed. Similarly, although practitioner 120 is shown as a middle-aged woman, and responding person 130 is shown as an older man, it is contemplated that the practitioner and responding person could each be of any age and gender.
It should be appreciated that practitioner 120 and responding person 130 should be viewed as sufficiently distant from one another that it is reasonable for them to be communicating through cloud 110.
As indicated above, guidance regarding suitable questions and comments to assess depression can be taken from the priority provisional application, and the relevant literature. Following is an example of a very short portion of a possible assessment.
In this example the virtual agent 111/AI agent 113 would utilize the speaker 226A to present the comment and question, and the responding person 210A would answer with the audible response and images coming through the combined camera/microphone 224A. The virtual agent/AI agent, in cooperation with the data store 112, would then analyze the semantic content of the spoken words, as well as the affect content provided by the tone of voice and facial expressions, to assist in assessing depression. In that way, both the semantic content and the affect content would be utilized to provide an assessment of a medical or psychological condition.
Here again, guidance regarding suitable questions and comments can be taken from the priority provisional application, and the relevant literature. Following is an example of a very short portion of a possible assessment.
In this example the virtual agent 111/AI agent 113 would utilize the speaker 226B to present the comment and question, and the responding person 210A would answer with the audible response and images coming through the combined camera/microphone 224B. The virtual agent/AI agent, in cooperation with the data store 112, would then analyze the semantic cues from the finger movement gestures, and affective content from the pitch glide. In that way, both the semantic content and the affect content would be utilized to provide an assessment of a medical or psychological condition.
Here again, guidance regarding suitable questions and comments can be taken from the priority provisional application, and the relevant literature. Following is an example of a very short portion of a possible assessment.
In this example the virtual agent 111/AI agent 113 would utilize the speaker 226C to present the comment and question, and the responding person 210A would answer with the audible response and images coming through the combined camera/microphone 224C. The virtual agent/AI agent, in cooperation with the data store 112, would then analyze the semantic cues from the spoken language, and affective content from the responding person exhibiting a still, expressionless face and then an emotionally responsive face with brows raised and mouth open. In that way, both the semantic content and the affect content would be utilized to provide an assessment of a medical or psychological condition.
Here again, guidance regarding suitable questions and comments can be taken from the priority provisional application, and the relevant literature. Following is an example of a very short portion of a possible assessment.
In this example, the virtual agent 111/AI agent 113 would utilize the speaker 226D to present the comment and question, and the responding person 210A would answer with the audible response and images coming through the combined camera/microphone 224D. The virtual agent/AI agent, in cooperation with the data store 112, would then analyze the semantic cues from the spoken language, and affective content from the responding person exhibiting completely different facial expressions from one day to the next. In that way, both the semantic content and the affect content would be utilized to provide an assessment of a medical or psychological condition.
Here again, guidance regarding suitable questions and comments can be taken from the priority provisional application, and the relevant literature. Following is an example of a very short portion of a possible assessment.
In this example, the virtual agent 111/AI agent 113 would utilize the speaker 226E to present the comment and question, and the responding person 210A would answer with the audible response and images coming through the combined camera/microphone 224E. The virtual agent/AI agent, in cooperation with the data store 112, would then use emotional content from the child's speech and facial expressions, together with the semantic and acoustic content of her speech while describing a picture, to form an assessment score. In that way, both the semantic content and the affect content would be utilized to provide an assessment of a medical or psychological condition.
In yet another example, not shown, a practitioner 120 and/or the virtual agent 111/AI agent 113, utilize verbal communication, a camera, and a microphone to assess Amyotrophic Lateral Sclerosis (ALS). As before, guidance regarding suitable questions and comments can be taken from the priority provisional application, and the relevant literature, and following is an example of a very short portion of a possible assessment.
Agent : “Please count up from 1 until you run out of breath”
Responding person: “1 . . . 2 . . . 3 . . . 4 . . . 5 . . . 6 . . . 7 . . . 8 . . . 9 . . .”
Agent: “Thank you. That was great. Can you now repeat the following sentences after me?”
Responding person: <repeats sentences>
In this example the virtual agent 111/AI agent 113, in cooperation with the data store 112, would use the rate of the responding person's speech to estimate semantic information, the duration of a breath to estimate respiratory information and the facial expression and prosody of speech to estimate affective content.
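The feature extraction described for this ALS example can be sketched as follows. The function name, inputs, and derived quantities are hypothetical illustrations of the kind of measurements contemplated (speech rate and breath duration from the counting task), not a disclosed method.

```python
def als_features(word_count: int, speaking_seconds: float, breath_seconds: float) -> dict:
    """Derive coarse features from the counting task: the rate of speech
    (toward semantic information) and the duration of one breath
    (toward respiratory information)."""
    if speaking_seconds <= 0:
        raise ValueError("speaking time must be positive")
    return {
        "speech_rate_wpm": 60.0 * word_count / speaking_seconds,
        "breath_duration_s": breath_seconds,
    }
```

For instance, counting to nine over six seconds on a single six-second breath yields a speech rate of 90 words per minute; affective content from facial expression and prosody would be estimated separately and combined with these features.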
In the different examples above, there can be differences in the relative importance of audio and video information coming from the responding person. For example, in some examples, hand movements are more important, and in other examples, the speech can be more important. These differences can become significant if there are transmission or other line difficulties. In such cases, the communication agent 114 is configured to make adjustments to prioritize audio over video, or vice versa. This can be done by adjusting the relative bandwidth of audio and video during data streaming and collection, or by using different weighted combinations of content extracted from post-processed audio and video streams in order to produce assessments or inferences.
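The second adjustment mentioned above, weighting content extracted from post-processed audio and video streams, can be sketched as a quality-weighted average. The function and its quality inputs are illustrative assumptions only.

```python
def combined_score(audio_score: float, video_score: float,
                   audio_quality: float, video_quality: float) -> float:
    """Weight each modality's assessment score by its received signal quality
    (0.0-1.0), so a degraded stream contributes less to the final inference."""
    total = audio_quality + video_quality
    if total == 0:
        raise ValueError("no usable signal in either modality")
    return (audio_score * audio_quality + video_score * video_quality) / total
```

Under this rule, if the video stream is entirely unusable, the combined inference falls back to the audio-derived score alone; when both streams arrive cleanly, the two contribute equally.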
It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification refers to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
This application claims priority to provisional patent application Ser. No. 63/050284, filed on Jul. 10, 2020. The provisional and all other referenced extrinsic materials are incorporated herein by reference in their entirety. Where a definition or use of a term in a reference that is incorporated by reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein is deemed to be controlling.