Not applicable.
This invention relates to artificial intelligence (AI), in particular AI based agents and methods of using them.
AI agents are digitally instantiated entities in complex systems that specifically sense their environment, exhibit attributes, adopt behaviors, and perform actions towards goals as determined by their underlying models. Such models may be deterministic or probabilistic. AI agents require sensors (to collect data) and actuators (to affect the world).
Interactive AI agents (or simply Interactive Agents [IA]) are a class of AI agents that directly or indirectly interface with one or more human users of the agent or the larger system in which they exist. At a basic level, IA both ingest human-sourced sensing data to inform the models driving them and output data in the form of human-destined communication, using one or more modalities (such as visual, aural, touch-based, smell-based, or taste-based).
Conversational agents (CAs) are a class of Interactive Agents that use natural language as their primary interface with humans as part of a sociotechnical system. The nature of their use influences the design of the underlying artificial intelligence and their embodiment. To date, CAs have been applied in three relevant use cases:
Humans interacting in conversations use various forms of information to establish and maintain context. This information includes not only the utterances during the conversation, but also, without limitation, the topic of the conversation, any materials that were provided prior to the conversation (e.g., read-ahead materials, presentation slides, assigned readings, etc.), memories of prior conversations, knowledge of other participants in the conversation, participants' positions (e.g., subordinate, peer, superior, etc.), participants' roles in the conversation (e.g., facilitator, subject matter expert, novice, unknown, etc.), the participants' previously known positions on the conversation's topic (e.g., pro, anti, neutral, etc.), and any indicators of a participant's emotional state (body language, facial expressions, speech volume, speech speed, inflections, emphasis, etc.). For CAs to effectively participate in a conversation and communicate with their human counterparts, they also need to establish and maintain context in terms of the conversation.
The following summary is included only to introduce some concepts discussed in the Detailed Description below. This summary is not comprehensive and is not intended to delineate the scope of protectable subject matter, which is set forth by the claims presented at the end.
In one example embodiment, a processor-based method of having an artificial intelligence (AI) agent system contribute one or more utterances to a current discussion is provided, the method comprising collecting information about a current discussion, generating one or more potential statements for the AI agent system, constructing one or more utterances from the one or more potential statements, and communicating the one or more utterances through a user interface.
In one example embodiment, an artificial intelligence (AI) agent system for participating in a current discussion is provided, the AI agent system comprising one or more processors, and one or more memory elements including instructions that, when executed, cause the one or more processors to perform operations comprising: collecting information about a current discussion, generating one or more potential statements for the AI agent system, constructing one or more utterances from the one or more potential statements, and communicating the one or more utterances through a user interface.
In one example embodiment, a processor-based method of communicating an attribute of an artificial intelligence agent system through an interface is provided, the method comprising receiving an input data at an agent subsystem, determining a first attribute value of a response attribute of an artificial intelligence agent system for a first temporal period given the input data, determining a second attribute value of the attribute of the artificial intelligence agent system for a second temporal period given the input data, determining a first interface attribute value of an interface attribute representing the first attribute value, determining a second interface attribute value of the interface attribute representing the second attribute value wherein the second attribute value is a different value than the first attribute value, and communicating the first interface attribute value and the second interface attribute value to the interface wherein the interface is a dynamic interface.
In one example embodiment, a processor-based method of determining an attribute value from a dialog input to an artificial intelligence agent system is provided, the method comprising receiving a dialog input data, associating a dialog input data value for the dialog input data with one or more attribute values of an attribute of the artificial intelligence agent system, determining a dialog fit between the dialog input data value and each of the one or more attribute values, and selecting the one of the one or more attribute values that optimizes the dialog fit as the attribute value.
In some embodiments, the dialog input data comprises a dialog data type selected from the group comprising a text data, an audio data and a visual data.
In some embodiments, the attribute represents a state of the agent subsystem or a response of the agent subsystem.
In some embodiments, the interface attribute comprises a perceptual attribute, a sociotechnical attribute, a presential attribute or one selected from the group consisting of a text attribute, a shape attribute, a size attribute, a color attribute, a motion attribute, a texture attribute, a space attribute, a form attribute, a sound attribute and a sensory attribute. In some embodiments, the sociotechnical attribute represents the agent subsystem as a peer to a human.
In some embodiments, the attribute engine is configured to perform the method of determining a first and second input data value from the input data, populating an attribution algorithm with the first and second input data value, determining the first and second attribute value with the attribution algorithm and communicating the first and second attribute value to the interface engine.
In some embodiments, the interface engine is configured to perform the method of receiving the first and second attribute value, populating an interfacing algorithm with the first and second attribute value, determining the first and second interface attribute value with the interfacing algorithm and communicating the first and second interface attribute value to the output interface.
The disclosed contextualized components, alone or implemented in an AI agent system, also called contextualized systems, combine interaction information, such as a dialog, with context reasoning to enhance the capabilities of artificial intelligence agent systems. The interactions may take advantage of cognitive principles and recent advances in interaction technology to improve the capabilities of artificial intelligence agent systems. The contextual information may place input information in better context for leveraging by the AI agent system, or the contextual information may be used to define attributes used by the output interface.
The contextualized systems may be implemented alone or in various systems such as AI agent systems. Using the example of a data analyst using a human-machine system in intelligence analysis, context reasoning over computational representations of context (including mission data, related intelligence, and user interactions) may be used to proactively support the analyst with partially automated product generation and recommendations of relevant information and information products. This enables analysts to more effectively process an ever-increasing amount of data.
Contextualized systems may quantify and maintain links between data from processing to dissemination. Unlike current static work products, work products generated with contextualized systems may be dynamic and interactive, enabling work product consumers to quickly access the information they need, when they need it. The dynamic nature of these products means that additional contextual information is available on-demand.
In one embodiment, a contextualized system is provided comprising: a context platform configured to receive an input data; the context platform configured to define, from the input data, a first property value of a first node corresponding to a multi-layer knowledge graph; the context platform configured to define a second property value of a second node of the multi-layer knowledge graph; the first node and the second node comprising a node pairing; the context platform defining a relationship property value of a relationship type between the first node and the second node; and a recommendation engine configured to execute a recommendation algorithm to automatically determine a context-aware recommendation of a third node based on a connection strength measure and a similarity measure.
In some embodiments, the recommendation algorithm comprises a graph traversal algorithm configured to: (a) identify one or more additional node pairings of the first node connected by any relationship type to another node in a graph layer of the multi-layered knowledge graph; (b) calculate a connection strength measure of the relationship type for each node pairing and associate the connection strength measure to each of the nodes in the node pairing; (c) calculate a similarity measure of the nodes in each node pairing and associate the similarity measure to each of the nodes in the node pairing; (d) iterate steps (a)-(c) for a next step out of the graph layer for subsequent node pairings of nodes connected by any relationship type until a threshold traversal depth of steps is reached; (e) define each of the nodes in each of the node pairings and the subsequent node pairings as a plurality of related nodes; (f) filter the plurality of related nodes to define a plurality of filtered nodes as a plurality of potential recommendations; (g) determine a weighted value of each of the plurality of filtered nodes as a function of the connection strength measure and the similarity measure; and (h) select the filtered node with the greatest weighted value as the context-aware recommendation.
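For illustration purposes only and not for limitation, a minimal Python sketch of such a graph traversal follows; the node identifiers, the use of Jaccard similarity over node attribute sets, and the multiplicative combination of connection strength and similarity are illustrative assumptions rather than required implementations:

```python
from collections import deque

# Illustrative multi-layer knowledge graph: node -> {neighbor: connection strength}.
GRAPH = {
    "activity:search":  {"content:report_A": 0.9, "actor:analyst_1": 0.6},
    "content:report_A": {"activity:search": 0.9, "mission:recon": 0.7},
    "actor:analyst_1":  {"activity:search": 0.6, "mission:recon": 0.4},
    "mission:recon":    {"content:report_A": 0.7, "actor:analyst_1": 0.4},
}
# Node attribute sets used by the (assumed) Jaccard similarity measure.
ATTRS = {
    "activity:search":  {"topic:terrain", "region:north"},
    "content:report_A": {"topic:terrain", "region:north", "format:text"},
    "actor:analyst_1":  {"topic:terrain"},
    "mission:recon":    {"region:north"},
}

def similarity(a, b):
    """Step (c): similarity measure of a node pairing (Jaccard over attributes)."""
    sa, sb = ATTRS.get(a, set()), ATTRS.get(b, set())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def recommend(start, depth=2, exclude=()):
    """Steps (a)-(h): traverse node pairings out to a threshold depth,
    weight each related node, filter, and return the highest-weighted node."""
    scores = {}
    seen = {start}
    frontier = deque([(start, 1.0, 0)])
    while frontier:
        node, strength_so_far, d = frontier.popleft()
        if d == depth:                                  # step (d): depth threshold
            continue
        for nbr, strength in GRAPH.get(node, {}).items():
            s = strength_so_far * strength              # step (b): connection strength
            if nbr != start:
                w = s * similarity(start, nbr)          # step (g): weighted value
                scores[nbr] = max(scores.get(nbr, 0.0), w)
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, s, d + 1))
    candidates = {n: w for n, w in scores.items() if n not in exclude}  # step (f)
    return max(candidates, key=candidates.get) if candidates else None  # step (h)

print(recommend("activity:search", exclude={"actor:analyst_1"}))
```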
In some embodiments, the first and the second node are selected from the group consisting of an activity node, a content node, an actor node and a mission node.
In some embodiments, the input data comprises a chat message. In some embodiments, the input data comprises a representation of a user activity with a user interface.
In some embodiments, the contextualized system further comprises a synonymy layer configured to translate the input data to match the first property value and the second property value as defined by a pre-defined domain model.
In some embodiments, a processor-based method of automatically determining a context-aware recommendation to a user of a human-machine system is provided, the method comprising: receiving an input data; defining, from the input data, an activity property value of an activity node corresponding to a multi-layer knowledge graph; defining a content property value of a content node of the multi-layer knowledge graph; defining a relationship property value of a relationship type between the content node and the activity node; and executing a recommendation algorithm to automatically determine a context-aware recommendation for a second activity node or a second content node based on a connection strength measure and a similarity measure.
Other objects, features, and advantages of the techniques disclosed in this specification will become more apparent from the following detailed description of embodiments in conjunction with the accompanying drawings.
In order that the manner in which the above-recited and other advantages and features of the invention are obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
FIG. 5 shows a process diagram illustrating one example embodiment of methods of using components of the contextualized system;
COPYRIGHT NOTICE: A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to any software and data as described below and in the drawings hereto: Copyright © 2020-2023, Aptima, Inc. All Rights Reserved.
AI agent systems and methods of use will now be described in detail with reference to the accompanying drawings. Notwithstanding the specific example embodiments set forth below, all such variations and modifications that would be envisioned by one of ordinary skill in the art are intended to fall within the scope of this disclosure. For example only, and not for limitation, although example embodiments include use of the disclosed systems and methods with AI agent and contextualized systems in the course of dialog, they may also be used with other input or interaction information and output or recommendation types.
As used herein, the term “module” refers to hardware and/or software implementing entities, and does not include a human being. The operations performed by the “module” are operations performed by the respective hardware and/or software implementations, e.g. operations that transform data representative of real things from one state to another state, and these operations do not include mental operations performed by a human being.
The term “sensor data”, as used herein, is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art (and is not to be limited to a special or customized meaning), and furthermore refers without limitation to any data associated with a sensor, such as a continuous analyte sensor.
Recent advances in AI—specifically the advent of transformer-based generative language models exemplified by OpenAI's GPT-2—have resulted in capabilities that can generate believable, if not strictly veridical, statements, allowing the model to support a broad range of improvisational discourses as a peer to humans.
However, there exists no process for operating generative models as peers in a sociotechnical system, nor definitions of the methods necessary to effectively enable interaction between an AI agent system leveraging such models and human participants.
Specific problems include gaps in technology and methods to:
Therefore, of specific interest is a process for operating a generative language model in multi-participant, improvisational, conjecture-based discourse providing:
The disclosed AI agent system addresses several of these technical problems and improves the function of AI agent systems, in particular those systems involved in active discourse with participants.
To understand a multi-party discourse and automatically provide a relevant response to the discourse, the disclosed AI agent system receives and processes multi-entity discourse data and applies response attribute algorithms to automatically provide a relevant response from the AI agent system. While the response is being generated, state attribution algorithms are applied to the discourse data to determine state attribute values, and these values are communicated to a dynamic user interface to represent the state of the AI agent system.
Input data is received from sensors, and this input is multiplexed and labeled dynamically into a single input data stream for analysis. The response and state attributes are quantified by applying reinforcement learning algorithms to identify attributes and attribute values from predetermined attributes and attribute values given the input/state data in the input data stream.
The disclosed system also improves AI agent systems by incorporating contextual information to inform the system when determining input data and when providing output to the output interface. Context-aware reasoning is provided by the use of multi-layer knowledge graphs and pattern recognition techniques across these graphs. By representing different attributes as different knowledge graphs, relationships between different attributes, such as contextual relationships, can be defined and more easily found and used to inform the AI agent system. The contextual information may be able to place input data, such as participants' statements and utterances, in better context for leveraging by the AI agent system and the contextual information may be used to define attributes used by the output interface.
The multi-layer knowledge graphs are used to establish connections between input data or interaction information such as participants, participants' attributes, participants' utterances, and conversations' attributes. For example, once captured, attributes regarding the participants' utterances, their affective state, their actions, and any prior utterances, affective state, and actions are represented by multiple knowledge graphs connected by common nodes.
A conversation can be represented as a single knowledge graph with connections containing participant utterances spoken over time: a sequence of utterances, each with attributes such as who spoke the utterance, the time of the utterance, and the context of the utterance. Simple attributes about the participant may be available, such as attributes about the speaker: their position in the conversation (moderator, facilitator, participant), their role in the organization, their background (education, experience, etc.). However, a conversation constructed as a connected set of knowledge graphs provides a connection from each utterance to knowledge graphs of the participant, rather than simply to attributes of the participant.
Representing a participant as a knowledge graph allows complex relationships to be constructed over the course of the conversation. With the relationships represented as a set of interconnected knowledge graphs, the contextualizer module can use methods such as pattern recognition techniques to identify relationships between different attributes, such as contextual relationships, to recommend information such as context information or work product to inform the AI agent system.
To assess and ensure the relevance and quality of statements and utterances generated by an AI agent system at-scale and at-speed prior to the dialog action, the technical solution disclosed employs a series of generative language (GL) models relying on the dual actionable data consisting of detected statements or utterances and the contextualized meta-data tagged to them. The GL models use the context provided by the current conversation to generate potential statements for the AI agent system. The GL models provide statements in multiple styles, for different components of a response or for different stages of a conversation.
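For illustration purposes only and not for limitation, a minimal sketch of conditioning one GL model on the current conversation follows; the prompt layout and the style and stage labels are hypothetical, and a real embodiment would pass the prompt to its generative model rather than print it:

```python
def build_prompt(conversation_log, style, stage):
    """Assemble a context-conditioned prompt for one GL model. The style
    and stage vocabularies, and the prompt layout, are assumptions."""
    recent = "\n".join(f'{u["speaker"]}: {u["text"]}' for u in conversation_log[-5:])
    return (f"Stage: {stage}\nStyle: {style}\n"
            f"Recent conversation:\n{recent}\n"
            f"Generate one candidate statement for the agent:")

log = [{"speaker": "moderator", "text": "What risks do you see?"},
       {"speaker": "panelist_1", "text": "Mostly data quality."}]
print(build_prompt(log, style="concise_expert", stage="response"))
```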
To select and aggregate the relevant attributes over which an AI agent system reasons to provide utterances as responses to fit in the conversation, the technical solution employs a probabilistic and iterative method to score the relevance of various attributes in determining the content and form of the AI agent system's contribution to the conversation. The solution reviews a list of pre-determined attributes and assesses their relevance against the content and context embedded in the actionable data from the conversation. Attributes are then ranked based on a multifactorial score and a subset is selected based on one or more criteria.
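For illustration purposes only and not for limitation, the following sketch shows one way such a multifactorial score could rank pre-determined attributes; the factor names, weights, and top-k selection criterion are illustrative assumptions:

```python
ATTRIBUTES = ["volume", "type", "content", "tone"]

def factor_scores(attribute, conversation_context):
    """Score one attribute against the content and context of the conversation.
    Each factor is in [0, 1]; real embodiments would derive these from the
    tagged actionable data rather than a lookup table."""
    table = conversation_context.get(attribute, {})
    return (table.get("content_fit", 0.0),
            table.get("context_fit", 0.0),
            table.get("recency", 0.0))

def rank_attributes(context, weights=(0.5, 0.3, 0.2), top_k=2):
    scored = []
    for attr in ATTRIBUTES:
        factors = factor_scores(attr, context)
        score = sum(w * f for w, f in zip(weights, factors))  # multifactorial score
        scored.append((score, attr))
    scored.sort(reverse=True)                                 # rank attributes
    return [attr for _, attr in scored[:top_k]]               # selection criterion

context = {"content": {"content_fit": 0.9, "context_fit": 0.7, "recency": 0.8},
           "type":    {"content_fit": 0.6, "context_fit": 0.8, "recency": 0.5},
           "volume":  {"content_fit": 0.3, "context_fit": 0.6, "recency": 0.4},
           "tone":    {"content_fit": 0.5, "context_fit": 0.4, "recency": 0.9}}
print(rank_attributes(context))  # e.g. ['content', 'type']
```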
To assess and ensure the coherence of the content and form, generally defined as attribute values, of the statements and utterances of an AI agent system, the technical solution employs a probabilistic and iterative method to score the relevance of various candidate attribute values in determining the coherence and fit of the AI agent system's contribution to the conversation. The solution constructs utterances (content) from one or more generated statements. This provides both a filter to eliminate unwanted candidate statements, and an opportunity to use the most relevant statement in each utterance. The solution also constructs multimodal communications (form) to embody the utterances to be shared with human participants.
The disclosed AI agent system achieves peer-level recognition from users by providing an interface that functions as a gateway from the language model to the other participants involved in the conversation. That gateway visually depicts the current state of the agent (thinking, speaking, idle, ready to interject) as a proxy for body language and other non-verbal communication, and provides a text-to-speech capability so that the agent can contribute utterances to the conversation.
The disclosed AI agent system is different from conventional solutions.
Regarding input collection, the disclosed system collects input from multiple sources, such as multiple participants (be they human or even AI) in real-time, whereas prior work has collected utterances from a single human participant.
Regarding input contextualization, the disclosed system contextualizes input from participants within the scope of the discourse as relevant background to incorporate into an utterance or as a direct prompt for an utterance, whereas prior work has treated all utterances as direct prompts or revisions to direct prompts.
Regarding utterance construction, the disclosed system enables utterance construction from discrete statements generated by multiple models, whereas prior work has only leveraged a single model designed for information retrieval, not generation.
Regarding attribute selection, the disclosed system dynamically and contextually optimizes its reasoning over multiple dimensions, whereas prior work has employed limited (sometimes single) and static attributes.
Regarding attribute value selection, the disclosed system provides a real-time, on-the-loop quality assessment and feedback for novel utterances, whereas prior work has assessed congruence of factual information relative to the request.
Regarding output embodiment, the disclosed system achieves conversational fit and peer-level recognition from humans because it:
The AI conversational agent system is not a conventional solution.
Regarding input collection and contextualization, this process:
Regarding attribute and attribute value selection, this process:
Regarding output embodiment, this process achieves peer-level recognition of the AI agent system from humans because it:
This technical solution is directed to the practical application of AI agents to interact with multiple users, such as by participating in conference panel discussions, moderating group brainstorms, or generating reports or other artifacts in a collaborative fashion.
Regarding input collection, the disclosed systems and methods provide a new feature to traditional speech-to-text as a sensor capability by multiplexing and labeling multiple participants into a single input stream for analysis. This integrated and labeled input stream allows the generative language model to differentiate between what other participants are discussing and what utterances it has formulated. This improves the ability of the language model to stay “on topic” while providing it the freedom to improvise and generate novel contributions to the multi-participant activity.
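For illustration purposes only and not for limitation, a minimal sketch of multiplexing labeled participant streams into a single time-ordered input stream follows; the stream format and the `self` label used to mark the agent's own utterances are assumptions:

```python
import heapq

def multiplex(streams):
    """Merge per-participant (timestamp, text) streams into one
    time-ordered input stream, labeling each utterance with its source
    so the language model can tell its own prior utterances apart."""
    merged = []
    for speaker, utterances in streams.items():
        for ts, text in utterances:
            heapq.heappush(merged, (ts, speaker, text))
    while merged:
        ts, speaker, text = heapq.heappop(merged)
        yield {"t": ts, "speaker": speaker, "text": text,
               "self": speaker == "agent"}   # label the agent's own contributions

streams = {
    "moderator": [(0.0, "Welcome, everyone."), (12.4, "Agent, your view?")],
    "panelist_1": [(5.1, "I think the data is ambiguous.")],
    "agent": [(9.8, "Prior work suggests otherwise.")],
}
for event in multiplex(streams):
    print(event)
```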
Regarding input contextualization, the AI agent and contextualized systems provide a new feature to augment GL models with context-aware meta-data that renders them more efficient and effective. The contextualizer module traverses and extracts the connections of knowledge graphs representing the organization of various sources of contextualized input collected over the conversation in order to provide a rank ordering of the context-aware meta-data. This improves the AI agent system's ability to fit with the conversational flow in a manner that provides value to the group, as opposed to being “on topic” but contributing content and form that do not move the conversation forward.
Regarding utterance construction, this AI agent system improves upon conventional utterance generation processes by incorporating multiple generative language models, with inherently different training and therefore performance characteristics, into a unified process. This produces two possible novel results that improve the dynamism and natural qualities of the discourse: 1) statements generated by two or more different generative language models can be constructed into a single utterance, thus creating a heterogeneous utterance; and 2) different models can be used to construct sequential utterances, thus creating a heterogeneous utterance sequence. Prior work only supports homogeneous utterances and utterance sequences.
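For illustration purposes only and not for limitation, the sketch below constructs a single heterogeneous utterance from statements generated by two stand-in models; the models, the relevance threshold, and the concatenation rule are illustrative assumptions:

```python
import random

# Stand-ins for two generative language models with different training
# characteristics; real embodiments would call actual GL models.
def model_a(prompt):
    return f"A factual framing of {prompt}."

def model_b(prompt):
    return f"A speculative take on {prompt}."

def score(statement, context):
    """Relevance of a candidate statement to the conversation context;
    a deterministic stand-in for the scoring described herein."""
    random.seed(hash((statement, context)) % (2**32))
    return random.random()

def construct_utterance(prompt, context, models=(model_a, model_b), threshold=0.3):
    candidates = []
    for model in models:
        statement = model(prompt)
        candidates.append((score(statement, context), statement))
    # Keep the most relevant statements from different models in one utterance.
    kept = [s for sc, s in sorted(candidates, reverse=True) if sc > threshold]
    return " ".join(kept)

print(construct_utterance("unmanned systems doctrine", context="panel"))
```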
Regarding attribute and attribute value selection, the AI agent system further improves the fit of the AI agent system's contribution by dynamically optimizing the content and the form of the system's contribution to maximize multi-dimensional objectives such as peerness, appropriateness, or relevance. In turn, the AI agent system becomes an adaptive and integral part of the group no matter how or where the group interactions go, as opposed to being a static tool or side element.
Regarding output construction, the AI agent system improves the efficiency of the human-machine interface by inverting the default human-computer interaction paradigm of human superiority. Altogether, the AI agent system is able to: take input from the moderator and/or other panelists as speech; generate natural language responses to questions that account for prior responses; vocalize the generated statements so that they can be heard by the audience; provide an embodiment on stage so that the audience and panel members are aware of its presence; and provide some indication of its internal state, such as readiness to speak.
For illustration purposes and not for limitation, one example embodiment of the AI agent system is shown in FIG. 1.
The AI agent system 100 shown in FIG. 1 generally comprises one or more input interface 110, an agent subsystem 120, one or more databases 180, and an output interface 170.
The agent subsystem 120 generally comprises a data processor module 130, an agent attribute engine 140 and an interface engine 160. The data processor module 130 is generally configured to receive, process, format and communicate received input data 112 and database data to the agent subsystem components such as the agent attribute engine 140. The data processor module 130 generally receives the input data 112, converts it to a digestible stream of data, tags the data and formats it for communication to the agent attribute engine 140. The data processor module 130 may also be able to identify the entity providing the input data and may be able to determine contextual information regarding the input. The agent attribute engine 140 is generally configured to receive processed input data and provide input to the interface engine 160. The agent attribute engine 140 generally defines attributes of the AI agent system, where the state attribute engine 142 generally defines relevant states that the AI agent can adopt based on the input data 112 and the response attribute engine 144 generally defines the relevant responses that the AI agent can contribute based on the input data 112. The interface engine 160 generally defines how attributes are communicated through the interface, where the state interface engine 162 defines how the state is communicated and the response interface engine 164 defines how the response is communicated. Both the attribute engine and the interface engine are recipients of input data 112 and system data from databases 180. The agent subsystem 120 generates data transmitted to an output interface 170 for communication to and consumption by third party users (humans or AI). The input data 112 is generally used to represent current environmental state information used by algorithms in both the agent attribute engine 140 and the interface engine 160 to determine a response and other AI agent system output to the interface.
The data processor module 130 generally receives the input data 112, such as interaction or conversation data, through one or more input interface 110 such as sensors or microphones. The converter 131 transforms the collected input data 112 to a digestible stream of data, for example by using a speech-to-text capability. The entity identifier module 132 dynamically identifies input data as coming from a particular entity in a multi-entity conversation. The entity identifier module 132 uses deterministic or probabilistic reasoning methods to delineate between the participants/entities for appropriate tagging of the data with relevant meta-data. The tagger module 133 dynamically applies meta-data tags to the converted data stream and the formatter module 134 formats the data for communication to the agent attribute engine 140. The tagging with meta-data allows the data to be used by one or more language models, such as natural language processing (NLP) models for analysis of the actionable data or generative language (GL) models as input so they can produce new statements or utterances. The contextualizer module 135 provides context-aware reasoning to generate additional actionable data that places the participants' statements and utterances in context for analysis by the AI agent system. The contextualizer module 135 may implement methods to contextualize input data such as those methods disclosed in co-pending U.S. patent application Ser. No. 17/143,152, entitled “CONTEXTUALIZED HUMAN MACHINE SYSTEMS AND METHODS OF USE” and filed on Jan. 6, 2021, the content of which is herein incorporated by reference in its entirety. The contextualizer module 135 may also perform an NLP analysis of detected statements and utterances in the input data to assess attributes that contextualize each statement or utterance. Examples of such attributes include speed of speech, tone, emotional valence, and the likelihood of links to previous statements or utterances. The data processor module 130 may use a series of GL models relying on the input data attributes present in detected statements or utterances and the contextualized meta-data tagged to them.
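For illustration purposes only and not for limitation, a minimal sketch of the converter, entity identifier, tagger, and formatter stages follows; the channel-based speaker lookup and the tag fields are illustrative assumptions standing in for speech-to-text and probabilistic diarization:

```python
from dataclasses import dataclass, field

@dataclass
class TaggedUtterance:
    text: str
    entity: str
    tags: dict = field(default_factory=dict)

def convert(raw_audio_chunk):
    """Converter 131: stand-in for a speech-to-text capability."""
    return raw_audio_chunk["transcript"]

def identify_entity(raw_audio_chunk):
    """Entity identifier 132: here a channel lookup; real embodiments may
    use probabilistic speaker diarization instead."""
    return raw_audio_chunk.get("channel", "unknown")

def tag(text, entity):
    """Tagger 133: attach illustrative contextual meta-data."""
    return {"entity": entity, "word_count": len(text.split()),
            "question": text.rstrip().endswith("?")}

def process(raw_audio_chunk):
    """Formatter 134: package the stream element for the attribute engine."""
    text = convert(raw_audio_chunk)
    entity = identify_entity(raw_audio_chunk)
    return TaggedUtterance(text=text, entity=entity, tags=tag(text, entity))

chunk = {"channel": "panelist_2", "transcript": "What does the model predict?"}
print(process(chunk))
```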
For the disclosed solution, context is defined in a manner that takes advantage of the input provided and allows for use by the AI agent system components. In the disclosed embodiments, context may be defined as any type of information. For example, context may be about the actors, content, mission or activity associated with the system that may impact interpretation of the input data or actions to be made on the input data. In this example of four attributes, the four attributes are treated as layers in a knowledge graph that can be analyzed using a variety of techniques. This graph-based context model can be dynamically populated by parsing input data according to a pre-defined domain model. And for system data that cannot be easily parsed with common technologies, additional tools such as a synonymy layer may be used to translate input data into domain model consistent language.
As shown, the interface engine 360 may comprise a state interface engine 362, which may further comprise output state interface attributes and values and a state interface algorithm module 363. As also shown, the interface engine 360 may retrieve and store data such as state and response interface attribute data 384.
Generally, the output of the interface engine 360 is communicated to the output interface 370.
Within the AI agent system, reinforcement learning may be used as an algorithm to make decisions on which attributes and what values should be applied to those attributes. Through reinforcement learning, the AI agent system employs trial and error to arrive at an attribute or value suited to the situation. The algorithm assigns either rewards or penalties for the actions the agent performs, and its goal is to maximize the total reward.
The reinforcement learning algorithms may be value-based, policy-based or model-based. For a value-based method, the algorithm generally attempts to maximize a value function V(s). In this method, the agent expects a long-term return of the current state under policy π. For a policy-based method, which may be deterministic or stochastic, the algorithm generally attempts to find a policy such that the action performed in every state helps the agent gain maximum reward in the future. For a model-based method, a virtual model must be created for each environment (e.g., attributes and interfaces) and the algorithm learns to perform in that specific environment.
For example, for a value-based method, the following formula and parameters may be used to define attribute values:

Vπ(s) = Eπ[ Σ_{t=0}^{∞} γ^t R_{t+1} | S_0 = s ]

where:

s is the current state, π is the policy being followed, γ ∈ [0, 1) is a discount factor weighting future rewards, and R_{t+1} is the reward received at step t+1. In other embodiments, other formulas and parameters may be used to define attribute values.
For example, for a model-based method, the virtual models may be a Markov Decision Process (MDP) model or a Q-learning model. For an MDP, the following parameters are used to get a solution: a set of agent and environment states S, a set of agent actions A, a reward function Ra(s, s′) for transitioning from state s to state s′ under action a, a state transition model, and a policy π.
For a Q-value or action-value (Q): the Q-value is similar to the value V, except that it takes an extra parameter, the current action a. Qπ(s, a) refers to the long-term return of the current state s, taking action a under policy π.
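For illustration purposes only and not for limitation, a tabular Q-learning sketch over attribute-value selection follows; the toy states, actions, and reward function are illustrative assumptions:

```python
import random

def q_learning_attribute_selection(states, actions, reward_fn,
                                   episodes=500, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning: Q[s][a] estimates the long-term return of
    choosing attribute value a in input state s, i.e. Q(s, a) above."""
    Q = {s: {a: 0.0 for a in actions} for s in states}
    for _ in range(episodes):
        s = random.choice(states)
        if random.random() < eps:                      # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(Q[s], key=Q[s].get)
        r, s_next = reward_fn(s, a)                    # observe reward and next state
        best_next = max(Q[s_next].values())
        Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])   # Q-update rule
    return {s: max(Q[s], key=Q[s].get) for s in states}        # greedy policy

# Toy environment: reward speaking when a question is pending, idling otherwise.
states = ["question_pending", "statement_made"]
actions = ["speak", "idle"]
def reward_fn(s, a):
    r = 1.0 if (s, a) in {("question_pending", "speak"),
                          ("statement_made", "idle")} else -0.5
    return r, random.choice(states)

print(q_learning_attribute_selection(states, actions, reward_fn))
```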
In some embodiments, the attribution algorithm comprises a reinforcement learning algorithm to select at least one attribute value, wherein the reward function comprises Ra(s, s′), wherein A represents a set of actions of an agent, S represents a set of an environmental state and an agent state of the agent, and the reward function is defined to determine a maximum relevance of an attribute value of the agent, and the attribute value with the maximum relevance defines the first interface attribute value and the second interface attribute value to the interface.
In some embodiments, the reinforcement learning algorithm comprises a Partially Observable Markov Decision Process (POMDP) and the reward function of the POMDP is defined to maximize a relevance of the attribute value wherein the attribute value defines the first interface attribute value and the second interface attribute value to the interface.
In some embodiments, the interfacing algorithm comprises a reinforcement learning algorithm to select at least one interface attribute value for the agent subsystem representing A, B, and C wherein A defines a context variable, B defines a temporal variable and C defines a desired meaning variable. In some embodiments, the reinforcement learning algorithm comprises a Partially Observable Markov Decision Process (POMDP) and the reward function of the POMDP is defined to maximize a peerness of the agent subsystem.
Generally, as used in the AI agent system, the reinforcement learning algorithms take input data representing the current state/situational data from the environment and map that current data against predefined attributes from the attribute and attribute value databases to populate the reinforcement learning algorithms. The populated algorithms are then used to determine the state of the AI agent system, determine a response, and determine how that response and state will be represented in the interface.
As shown in
In some embodiments, the steps of determining attributes may comprise those steps disclosed in co-pending U.S. patent application Ser. No. 17/374,974, entitled “ARTIFICIAL INTELLIGENCE AGENT SYSTEMS AND METHODS OF USE” and filed on Jul. 13, 2021, the entire contents of which are incorporated herein.
Referring to
In some embodiments, the state attribute values 347B are determined by the state attribute algorithm module 343, which generally defines the attributes of the state of the AI agent system based on input data. The state attribute algorithm module 343 may comprise a state attribution algorithm 346, which may further comprise a state selection algorithm 346A and a scoring algorithm 346B. The state attribution algorithm 346 generally reviews potential combinations of state attribute values against input data and outputs the set of state attribute values that maximizes relevance. The state selection algorithm 346A generally generates the combinations of state attribute values to be reviewed, based on a database of predetermined possible values. The scoring algorithm 346B generally computes the relevance of the selected set of state attribute values against the input data and returns a relevance score to the attribution algorithm.
The response attribute engine 344 generally defines the attributes of relevant responses that the agent can contribute based on inputs. The response attribute engine 344 may comprise response attributes and values 348 and a response attribute algorithm module 345. For example only, and not for limitation, examples of response attributes 348A may include volume, type of response, and content of response. For example only, and not for limitation, examples of response attribute values 348B for the response attribute volume may include loud or quiet, examples of response attribute values for the response attribute type may include a statement or a question, and examples of response attribute values for the response attribute content may include generated utterance A, generated utterance B or generated utterance C.
In some embodiments, the possible response attributes 348A are predetermined in a database such as the state and response attributes data 383 and state and response data 382. In some embodiments, the attribute values are determined by the response attribute algorithm module 345.
The response attribute algorithm module 345 may comprise a response attribution algorithm 349 which may further comprise a response generation algorithm 349B, a generative response model 349A and a scoring algorithm 349C. The response attribution algorithm 349 generally reviews potential combinations of response attribute values against input data and outputs the set of response attribute values that maximizes relevance. The response generation algorithm 349B generally generates the combinations of response attribute values to be reviewed, based on a database of predetermined possible values (e.g., “volume” or “type”) or based on a generative response model 349A. The scoring algorithm 349C generally computes the relevance of the selected set of response attribute values against the input data and returns a relevance score to the attribution algorithm. The generative response model 349A permits the creation of new utterances.
Referring to
The state interface algorithm module 363 may comprise a state interfacing algorithm 366, which may further comprise a state selection algorithm 366A and a state scoring algorithm 366B. The state interfacing algorithm 366 generally reviews potential combinations of state interface attribute values against input data and outputs the set of state interface attribute values that maximizes relevance, such as peerness. The state selection algorithm 366A generally generates the combinations of state interface attribute values to be reviewed, based on a database of predetermined possible values. The state scoring algorithm 366B generally computes scores for interface attributes (e.g., peerness, relevance, appropriateness, etc.) to determine the state interface attribute value to be communicated to the output interface.
The response interface engine 364 generally determines the output response interface attributes and values 368 provided by the AI agent system. The output response interface attributes and values 368 of the response interface engine 364 may comprise response interface attributes 368A such as, for example only and not for limitation, audible sound, visual image, utterances, peerness, relevance, appropriateness or presence. Typically, the response interface attributes 368A are predefined in the state and response interface attribute database 384. Response interface attribute values 368B for these attributes reflect an objective representation of these attributes and are determined by the response interfacing algorithm 369. The response interfacing algorithm 369 may comprise a response selection algorithm 369A to identify potential response attribute values, and the response scoring algorithm 369B scores the potential response interface attribute values to determine the response interface attribute value to be communicated to the output interface. For example, and not for limitation, examples of response interface attribute values for the response interface attribute of peerness may be the specific color or size of a graphic icon or the specific volume of an audible response to match the volume of other participants. For example, and not for limitation, examples of state attribute values for the state attribute of presence may comprise a changing graphic icon, a specific color, a specific size of icon or the specific volume of an audible response to mimic a human's breathing. Typically, the response interface attribute values 368B are determined by the response interface algorithm module 365.
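For illustration purposes only and not for limitation, the following sketch scores candidate interface attribute values against a peerness objective; the candidate values, the volume-matching score, and the color prior are illustrative assumptions:

```python
# Candidate interface attribute values; real embodiments would draw these
# from the state and response interface attribute database.
CANDIDATES = {
    "icon_color": ["muted_blue", "bright_red"],
    "speech_volume_db": [60, 75],
}

def peerness_score(attribute, value, participants):
    """Higher when the value matches the observed human participants,
    e.g. matching their average speaking volume."""
    if attribute == "speech_volume_db":
        volumes = participants["volumes_db"]
        avg = sum(volumes) / len(volumes)
        return 1.0 / (1.0 + abs(value - avg))          # closer to the group is better
    if attribute == "icon_color":
        return 0.8 if value == "muted_blue" else 0.2   # assumed calm-peer prior
    return 0.0

def select_interface_values(participants):
    return {attr: max(vals, key=lambda v: peerness_score(attr, v, participants))
            for attr, vals in CANDIDATES.items()}

print(select_interface_values({"volumes_db": [58, 63, 61]}))
```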
As shown in
These state interface attributes may also comprise nonvisual interface attributes such as sound, smell, texture, vibration or other sensory attributes.
These state interface attributes may also comprise multiple interface attributes such as a visual attribute combined with other attributes. For example, a transition from idle to thinking would be a change in the visual attribute and may also include the phrase “that's a good question, let me think about that”.
Response interface attributes may be similarly dynamic to represent a response of the AI agent system. For illustration purposes only and not for limitation, examples of output response interface attributes and values may include utterances, sounds, words, textual or other data provided as a response to the input data.
In one example embodiment, the attributes and the attribute values provided by the interface engine to the output interface are specifically chosen to represent “peerness” as an interface attribute. This peerness is a sociotechnical attribute specifically intended to help portray the AI agent system as a “peer” to the other entities it is interacting with. For illustration purposes only and not for limitation, examples of interface attributes for peerness for a graphic icon may comprise size, shape, color, rate of change or textual content. For illustration purposes only and not for limitation, examples of interface attributes for peerness for an audible interface may comprise volume, choice of words, content of words or rate of word/data flow. The values for these attributes would be selected in an attempt to have the interface come across as a peer of the other participants.
The dynamic features of the interface result in unique features for the AI agent system such as:
As shown in
For illustration purposes and not for limitation, one example embodiment of methods used within the contextualizer module is shown in FIG. 5.
Initializing context data at 535A is generally the population of multi-layer knowledge graphs with properties and property values defined by the domain model and the definition of context. Context data is created by unifying input data through tools such as NLP processes or through the use of tools such as the synonymy layer. With the unified input data, vertex and relationship property values can be determined according to the domain model, and these values are used to populate the appropriate knowledge graph. The multi-layer knowledge graphs may be populated with attributes such as: a topic attribute, a participant attribute, an agenda attribute, a place attribute, a modality attribute, a read-ahead attribute, and a location attribute. The multi-layer knowledge graphs may also be populated with attributes of an entity such as: a physiological data of the entity, a behavioral data of the entity, a location data of the entity, and an orientation data of the entity. Populated knowledge graphs are used as prior knowledge graphs to be updated and used with current interaction information.
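For illustration purposes only and not for limitation, a minimal sketch of initializing such a multi-layer knowledge graph follows, here using the networkx library; the node identifiers, property values, and relationship strengths are illustrative assumptions:

```python
import networkx as nx

# Minimal sketch of initializing a multi-layer context graph (step 535A).
# Layer names follow the actor/content/mission/activity model described above.
g = nx.MultiDiGraph()

def add_ctx_node(node_id, layer, **props):
    g.add_node(node_id, layer=layer, **props)

add_ctx_node("actor:analyst_1", "actor", role="moderator", topic="logistics")
add_ctx_node("content:slides_v2", "content", modality="read_ahead")
add_ctx_node("mission:brief", "mission", location="room_4")
add_ctx_node("activity:conversation_7", "activity", agenda="Q3 review")

# Relationship property values connect nodes across layers.
g.add_edge("actor:analyst_1", "activity:conversation_7",
           rel="participates_in", strength=0.9)
g.add_edge("content:slides_v2", "activity:conversation_7",
           rel="provided_for", strength=0.7)
g.add_edge("activity:conversation_7", "mission:brief",
           rel="supports", strength=0.8)

# Querying one layer of the populated graph:
actors = [n for n, d in g.nodes(data=True) if d["layer"] == "actor"]
print(actors)
```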
Receiving input data at 535B generally comprises the system components receiving current interaction information, such as communication data from the system user, or receiving data from information sources to which the user is subscribed. This input data is also used to update and create current knowledge graphs.
Contextualizing and making a recommendation at 535C generally comprises the use of the recommendation algorithm with the knowledge graphs. The recommendation algorithm takes graph node pairs and determines a connection strength measure of the pairs and a similarity measure of the pairs. Each of the graph nodes is factored with its corresponding connection strength and similarity measures to define a ranking of relevancy of the nodes, and the most relevant node is then used as the recommendation.
Referring to
In this layered model, the activity layer acts as a connective tissue to fuse the knowledge graphs, allowing for use case-specific implementation of the actor, content, and mission layers in the system.
Data for the activity nodes is captured by monitoring and logging the interaction information, such as the conversation or the activity of the analyst. Interaction information such as dialog is monitored and logged as described herein. Activity logging as may be used with an analyst is a well-documented practice in both academic and commercial realms. Traditional approaches model system-level events, such as mouse clicks, keystrokes, and/or window focus changes, then use post-hoc analyses to derive insights. These traditional methods have shortcomings, as they require significant additional information to imbue the system events with process semantics and are not well-suited to dynamic task changes. Instead, the disclosed systems capture and represent activity semantics explicitly, by translating system-level events into semantic events imbued with the requisite data from an activity modeling approach within the user interface. Data representing activity semantics may comprise any set of data sufficient to populate the node and represent the activity. In one embodiment, a minimum tuple that must be captured is defined as (actor; target; action), where the target can be an entity in the mission or content layers. This tuple is appropriate when the actor is mutating a property of an element, such as rating the credibility of the information contained in a snippet of text. In some embodiments, more complex activity may also include an output. In this case, the tuple would be adjusted to be (actor; source; output; action). Limits on the tuples that are captured are not required, and it is recommended that the system capture as much detail as possible. It is significant to label the actions of the activity nodes in a semantically meaningful way. Looking again at
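For illustration purposes only and not for limitation, a minimal sketch of translating a system-level event into the (actor; target; action) tuple described above follows; the event kinds and the action mapping are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SemanticEvent:
    """Minimum activity tuple (actor; target; action), extended with an
    optional output for more complex activity, per the text above."""
    actor: str
    target: str
    action: str
    output: Optional[str] = None

def log_system_event(actor, ui_event):
    """Translate a raw system-level event into a semantically labeled
    activity node payload. The UI event fields are illustrative."""
    action = {"click_rate": "rate_credibility",
              "key_annotate": "annotate_snippet"}.get(ui_event["kind"], "unknown")
    return SemanticEvent(actor=actor, target=ui_event["target"],
                         action=action, output=ui_event.get("value"))

event = {"kind": "click_rate", "target": "content:snippet_42", "value": "high"}
print(log_system_event("actor:analyst_1", event))
```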
With the known list of initial participants in the conversation, the initialization module can also load the information of the participants. This information on the participants may comprise title, position in an organization, role in the conversation (moderator, participant, subject matter expert, etc.), and location, as may be obtained from the context data repository 881. Additionally, any prior conversation knowledge graphs previously generated by the contextualizer module can be obtained. These previously generated knowledge graphs can be found based on the topic of the current conversation, the participants in the current conversation, or other information captured in the initialization module. The initialization module is complete once the current knowledge graph is interconnected to the previous knowledge graphs based on their common attributes, including topic, participant, location, agenda items, etc.
The maintenance module updates the current conversation with the utterance, which includes information such as which participant provided the utterance and the time the utterance was made. Other information about the utterance may be captured; for example, the volume of a spoken utterance and the pitch of the utterance may be captured and included in the knowledge graph. Beyond the utterance and its related information, data may be captured from the participant in real-time, which could include physiological data (heart rate, respiration, galvanic skin response, etc.), behavioral data (positional, tasking, activity data, etc.), and environmental data (temperature, humidity, location data, etc.). This data is then integrated in the knowledge graph.
In
Filtering related nodes at 1157 generally comprises the steps of:
Determining a recommendation at 1158 generally comprises the steps of:
In one example embodiment, a contextualizer module for conversation supports the AI agent system by continuously providing updated context as new utterances are made in a conversation. In this embodiment, a discussion includes a topic, a location, three human participants, and the AI agent system.
In some embodiments, the step of determining the state of the agent may comprise steps performed by a functional state estimation system as disclosed in co-pending U.S. patent application Ser. No. 17/000,327, entitled “SYSTEMS AND METHODS TO ESTIMATE A USER FUNCTIONAL STATE” (MOTOR) and filed on Aug. 23, 2020, the entire contents of which are incorporated herein. For example, see the functional estimator subsystem features shown in
Although not shown in
As will be readily apparent to those skilled in the art, one embodiment of the AI conversational agent systems and methods can be embodied in hardware, software, or a combination of hardware and software. For example, a computer system or server system, or other computer implemented apparatus combining hardware and software adapted for carrying out the methods described herein, may be suitable. One embodiment of a combination of hardware and software could be a computer system with a computer program that, when loaded and executed, carries out the respective methods described herein. In some embodiments, a specific use computer, containing specialized hardware or computer programming for carrying out one or more of the instructions of the computer program, may be utilized.
Computer program, software program, program, software or program code in the present context mean any expression, in any language, code or notation, of a set of instructions readable by a processor or computer system, intended to cause a system having an information processing capability to perform a particular function or bring about a certain result either directly or after either or both of the following: (a) conversion to another language, code or notation; and (b) reproduction in a different material form. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
The processor 1210 is capable of receiving the instructions and/or data and processing the instructions of a computer program for execution within the computer system 1200. In some embodiments, the processor 1210 is a single-threaded processor. In some embodiments, the processor 1210 is a multi-threaded processor. The processor 1210 is capable of processing instructions of a computer program stored in the memory 1220 or on the storage device 1230 to communicate information to the input/output device 1240. Suitable processors for the execution of the computer program instructions include, by way of example, both general and special purpose microprocessors, and a sole processor or one of multiple processors of any kind of computer.
The memory 1220 stores information within the computer system 1200. Memory 1220 may comprise a magnetic disk such as an internal hard disk or removable disk; a magneto-optical disk; an optical disk; or a semiconductor memory device such as PROM, EPROM, EEPROM or a flash memory device. In some embodiments, the memory 1220 comprises a transitory or non-transitory computer readable medium. In some embodiments, the memory 1220 is a volatile memory unit. In other embodiments, the memory 1220 is a non-volatile memory unit.
The processor 1210 and the memory 1220 can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
The storage device 1230 may be capable of providing mass storage for the system 1200. In various embodiments, the storage device 1230 may be, for example only and not for limitation, a computer readable medium such as a floppy disk, a hard disk, an optical disk, a tape device, CD-ROM and DVD-ROM disks, alone or with a device to read the computer readable medium, or any other means known to the skilled artisan for providing the computer program to the computer system for execution thereby. In some embodiments, the storage device 1230 comprises a transitory or non-transitory computer readable medium.
In some embodiments, the memory 1220 and/or the storage device 1230 may be located on a remote system such as a server system, coupled to the processor 1210 via a network interface, such as an Ethernet interface.
The input/output device 1240 provides input/output operations for the system 1200 and may be in communication with a user interface 1240A as shown. In one embodiment, the input/output device 1240 includes a keyboard and/or pointing device. In some embodiments, the input/output device 1240 includes a display unit for displaying graphical user interfaces or the input/output device 1240 may comprise a touchscreen. In some embodiments, the user interface 1240A comprises devices such as, but not limited to a keyboard, pointing device, display device or a touchscreen that provides a user with the ability to communicate with the input/output device 1240.
The computer system 1200 can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system can be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include, e.g., a LAN, a WAN, wireless phone networks and the computers and networks forming the Internet.
The publications and other material used herein to illuminate the invention or provide additional details respecting the practice of the invention are incorporated by reference herein, and for convenience are provided in the following bibliography.
Citation of any of the documents recited herein is not intended as an admission that any of the foregoing is pertinent prior art. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicant and do not constitute any admission as to the correctness of the dates or contents of these documents.
This application claims benefit of U.S. Pat. App. No. 63/371,404, filed on Aug. 15, 2022; this application is a Continuation in Part application of U.S. patent application Ser. No. 17/374,974, filed on Jul. 13, 2021; this application is a Continuation in Part application of U.S. patent application Ser. No. 17/143,152, filed on Jan. 6, 2021; this application is a Continuation in Part application of U.S. patent application Ser. No. 17/000,327, filed on Aug. 23, 2020; U.S. patent application Ser. No. 17/374,974 claims benefit of U.S. Pat. App. No. 63/051,305, filed on Jul. 13, 2020; U.S. patent application Ser. No. 17/143,152 claims benefit of U.S. Pat. App. No. 62/985,123, filed on Mar. 4, 2020; U.S. patent application Ser. No. 17/000,327 claims benefit of U.S. Pat. App. No. 62/916,077, filed on Oct. 16, 2019; and the entire contents of all are hereby incorporated by reference.
This invention was made with Government support under Contract No. FA8650-18-C-6869 awarded by the U.S. Air Force and FA8650-19-P-6002 awarded by USAF, AFMC, AFRL Wright Research Site. The Government has certain rights in the invention.
Number | Date | Country
---|---|---
63/371,404 | Aug 2022 | US
63/051,305 | Jul 2020 | US
62/985,123 | Mar 2020 | US
62/916,077 | Oct 2019 | US
Relation | Number | Date | Country
---|---|---|---
Parent | 17/374,974 | Jul 2021 | US
Child | 18/450,277 | | US
Parent | 17/143,152 | Jan 2021 | US
Child | 17/374,974 | | US
Parent | 17/000,327 | Aug 2020 | US
Child | 17/143,152 | | US