The present disclosure relates generally to the fields of natural language understanding (NLU) and artificial intelligence (AI), and more specifically, to an artifact pinning subsystem for NLU.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Cloud computing relates to the sharing of computing resources that are generally accessed via the Internet. In particular, a cloud computing infrastructure allows users, such as individuals and/or enterprises, to access a shared pool of computing resources, such as servers, storage devices, networks, applications, and/or other computing-based services. By doing so, users are able to access computing resources on demand that are located at remote locations, and these resources may be used to perform a variety of computing functions (e.g., storing and/or processing large quantities of computing data). For enterprise and other organization users, cloud computing provides flexibility in accessing cloud computing resources without accruing large up-front costs, such as purchasing expensive network equipment or investing large amounts of time in establishing a private network infrastructure. Instead, by utilizing cloud computing resources, users are able to redirect their resources to focus on their enterprise's core functions.
Such a cloud computing service may host a virtual agent, such as a chat agent, that is designed to automatically respond to issues with the client instance based on natural language requests from a user of the client instance. For example, a user may provide a request to a virtual agent for assistance with a password issue, wherein the virtual agent is part of a Natural Language Processing (NLP) or Natural Language Understanding (NLU) system. NLP is a general area of computer science and AI that involves some form of processing of natural language input. Examples of areas addressed by NLP include language translation, speech generation, parse tree extraction, part-of-speech identification, and others. NLU is a sub-area of NLP that specifically focuses on understanding user utterances. Examples of areas addressed by NLU include question-answering (e.g., reading comprehension questions), article summarization, and others. For example, an NLU system may use algorithms to reduce human language (e.g., spoken or written) into a set of known symbols for consumption by a downstream virtual agent. NLP is generally used to interpret free text for further analysis. Current approaches to NLP are typically based on deep learning, which is a type of AI that examines and uses patterns in data to improve the understanding of a program.
Certain existing virtual agents implementing NLU techniques attempt to derive meaning from a received user utterance by comparing features of the user utterance to a stored collection of sample utterances. Based on any matches therebetween, the virtual agents may understand a request of a received user utterance and perform suitable actions or provide suitable replies in response to the request. In such search-based implementations of NLU, it may be important to compare multiple interpretations of the user utterance to a sizeable quantity of sample utterances to provide a widely-scoped meaning search. However, undirected expansion of user utterances and/or sample utterances to achieve this wide search scope may introduce challenges with respect to processing and memory resources, inference latency, precision, and consistency during meaning derivation.
A summary of certain embodiments disclosed herein is set forth below. It should be understood that these aspects are presented merely to provide the reader with a brief summary of these certain embodiments and that these aspects are not intended to limit the scope of this disclosure. Indeed, this disclosure may encompass a variety of aspects that may not be set forth below.
Present embodiments are directed to an agent automation framework that is designed to extract meaning from user utterances, such as requests received by a virtual agent, and suitably respond to these user utterances. To perform these tasks, the agent automation framework includes an NLU framework and an intent-entity model having defined intents and entities (e.g., artifacts) that are associated with sample utterances. The NLU framework includes a meaning extraction subsystem that is designed to generate meaning representations for the sample utterances of the intent-entity model to construct an understanding model, as well as generate meaning representations for a received user utterance to construct an utterance meaning model. As noted herein, each meaning representation embodies a different understanding or interpretation of an underlying utterance, whether the utterance is a sample utterance or a received user utterance. Additionally, the disclosed NLU framework includes a meaning search subsystem that is designed to search the meaning representations of the understanding model (which defines the meaning search space) to locate matches for meaning representations of the utterance meaning model (which defines the meaning search keys). As discussed herein, the meaning search subsystem performs expansion of both the search space and the search keys, as well as targeted pinning of the search space, to provide structure-specific search participants for improved meaning derivation. As such, present embodiments generally improve search-based NLU by providing focused expansion of the search keys and the search space against which the search keys are compared.
More specifically, present embodiments are directed to an artifact pinning subsystem that includes the meaning extraction subsystem and the meaning search subsystem mentioned above. By coordinating operation of the meaning extraction subsystem and the meaning search subsystem, the artifact pinning subsystem leverages relational cues provided during user interaction with the virtual agent (e.g., a behavior engine) and/or during compilation of the understanding models for improved meaning searching. For example, the artifact pinning subsystem may receive structural information from embedded relationships within the intent-entity model, contextual information from a behavior engine (BE) guiding end-user interaction, or both, to generate a tailored search space against which the NLU framework compares user-utterance-based search keys to more adeptly interact with and satisfy requests of the users interfacing with the NLU framework. To generate a particular search space, the artifact pinning subsystem disclosed herein may first generate multiple meaning representations as utterance tree structures, which provide potential representations of the various understandings derivable from each sample utterance of the intent-entity model. In some cases, the meaning representations are generated via vocabulary cleansing, vocabulary injection, and/or various part-of-speech assignments that are applied to respective tokens associated with nodes of the utterance trees. However, any suitable techniques that generate alternative meaning representations corresponding to various understandings of the sample utterances, including alternative parse structure discovery, vocabulary substitution, re-expressions, and so forth, may be implemented within the NLU framework. A significant number of potential candidates for the search space are thus generated by construing each of the sample utterances to have multiple different understandings, each corresponding to a respective meaning representation.
Notably, to guide pruning of the potential candidates, the artifact pinning subsystem identifies that each sample utterance of the intent-entity model was annotated with artifact labels that define relationships between the intents and the one or multiple entities of each sample utterance, within the structure defined by the particular intent-entity model. For example, an author of the intent-entity model may identify that a particular sample utterance relates to a particular intent, then label or annotate any suitable entities within the respective sample utterance that correspond to (e.g., belong to, are related to) the particular intent. The author may similarly label additional entities corresponding to additional intents of each given sample utterance. As recognized herein, the artifact pinning subsystem leverages the artifact labels of the sample utterances to prune any meaning representations that are not valid representations of a particular sample utterance, thereby improving a quality of a subsequent intent match by excluding non-relevant meaning representations for the intent match.
In particular, with respect to each intent of the intent-entity model, the artifact pinning subsystem may generate a set of meaning representations from a respective sample utterance that include the respective intent, as well as one or multiple respective entities that correspond to the labeled entity of a corresponding sample utterance. As discussed in more detail below, verifying that the entities of the generated meaning representations align with the artifact labels annotated in a respective intent-entity model enables the artifact pinning subsystem to efficiently prune invalid or non-relevant meaning representations from consideration for improved meaning search quality. The artifact pinning subsystem may then re-express the set of meaning representations by altering the arrangement or included number of nodes of utterance trees associated with the set, removing any duplicate candidates, and finally, generating the search space based on the remaining meaning representations of the set that have the labeled entity in a proper entity format. Similarly, the artifact pinning subsystem may form the search keys by generating multiple potential meaning representations of a user utterance, then, during a meaning search or inference, compare the search keys to the search space that includes the meaning representations that survive the above-mentioned model-based entity pinning. The meaning search may also be performed with respect to an inferenced, contextual intent of conversation between the user and a behavior engine, such that meaning representation matches are identified based on their correspondence to the contextual intent (e.g., an intent previously inferenced by the NLU during a dialog with a user) and thereby guide targeted pruning or refinement of the search space. Further, during the meaning search, the artifact pinning subsystem may increase the contribution (e.g., increase the respective similarity score) of meaning representations within the search space that match the contextual intent to improve similarity scoring processes based on a particular conversational situation or identified topic of conversation. In other embodiments, the artifact pinning subsystem may remove meaning representations that are not associated with the contextual intent, expediting the similarity scoring processes via implementation of a narrower embodiment of the search space.
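For illustration only, the following sketch (in Python, with hypothetical names such as Candidate and pin_candidates that do not appear in the present disclosure) shows one way the described pruning could be realized: candidate meaning representations whose entities do not align with the labeled entities of the intent-entity model are discarded, and artifact-level duplicates are removed before the search space is compiled.

```python
# Minimal sketch (not the patented implementation) of model-based entity pinning.
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    """One possible understanding (meaning representation) of a sample utterance."""
    intent: str
    entities: Tuple[str, ...]      # entity tokens extracted under this interpretation
    structure_signature: str       # e.g., serialized utterance-tree shape, used for dedup


def pin_candidates(candidates: List[Candidate],
                   intent: str,
                   labeled_entities: Set[str]) -> List[Candidate]:
    """Keep (pin) candidates for `intent` whose entities align with the model's labels."""
    pinned, seen = [], set()
    for cand in candidates:
        if cand.intent != intent:
            continue
        # Valid entity formulation: every labeled entity appears in this interpretation.
        if not labeled_entities.issubset(cand.entities):
            continue
        # Remove artifact-level duplicates so equivalent interpretations appear only once.
        key = (cand.intent, cand.entities, cand.structure_signature)
        if key in seen:
            continue
        seen.add(key)
        pinned.append(cand)
    return pinned
```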
Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
As used herein, the term “computing system” or “computing device” refers to an electronic computing device such as, but not limited to, a single computer, virtual machine, virtual container, host, server, laptop, and/or mobile device, or to a plurality of electronic computing devices working together to perform the function described as being performed on or by the computing system. As used herein, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store one or more instructions or data structures. The term “non-transitory machine-readable medium” shall also be taken to include any tangible medium that is capable of storing, encoding, or carrying instructions for execution by the computing system and that cause the computing system to perform any one or more of the methodologies of the present subject matter, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such instructions. The term “non-transitory machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Specific examples of non-transitory machine-readable media include, but are not limited to, non-volatile memory, including by way of example, semiconductor memory devices (e.g., Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and flash memory devices), magnetic disks such as internal hard disks and removable disks, magneto-optical disks, and CD-ROM and DVD-ROM disks.
As used herein, the terms “application,” “engine,” and “plug-in” refer to one or more sets of computer software instructions (e.g., computer programs and/or scripts) executable by one or more processors of a computing system to provide particular functionality. Computer software instructions can be written in any suitable programming languages, such as C, C++, C#, Pascal, Fortran, Perl, MATLAB, SAS, SPSS, JavaScript, AJAX, and JAVA. Such computer software instructions can comprise an independent application with data input and data display modules. Alternatively, the disclosed computer software instructions can be classes that are instantiated as distributed objects. The disclosed computer software instructions can also be component software, for example JAVABEANS or ENTERPRISE JAVABEANS. Additionally, the disclosed applications or engines can be implemented in computer software, computer hardware, or a combination thereof.
As used herein, the term “framework” refers to a system of applications and/or engines, as well as any other supporting data structures, libraries, modules, and any other supporting functionality, that cooperate to perform one or more overall functions. In particular, a “natural language understanding framework” or “NLU framework” comprises a collection of computer programs designed to process and derive meaning (e.g., intents, entities, artifacts) from natural language utterances based on an understanding model. As used herein, a “behavior engine” or “BE,” also known as a reasoning agent or RA/BE, refers to a rule-based agent, such as a virtual agent, designed to interact with users based on a conversation model. For example, a “virtual agent” may refer to a particular example of a BE that is designed to interact with users via natural language requests in a particular conversational or communication channel. With this in mind, the terms “virtual agent” and “BE” are used interchangeably herein. By way of specific example, a virtual agent may be or include a chat agent that interacts with users via natural language requests and responses in a chat room environment. Other examples of virtual agents may include an email agent, a forum agent, a ticketing agent, a telephone call agent, and so forth, which interact with users in the format of email, forum posts, autoreplies to service tickets, phone calls, and so forth.
As used herein, an “intent” refers to a desire or goal of a user which may relate to an underlying purpose of a communication, such as an utterance. As used herein, an “entity” refers to an object, subject, or some other parameterization of an intent. It is noted that, for present embodiments, certain entities are treated as parameters of a corresponding intent. More specifically, certain entities (e.g., time and location) may be globally recognized and extracted for all intents, while other entities are intent-specific (e.g., merchandise entities associated with purchase intents) and are generally extracted only when found within the intents that define them. As used herein, “artifact” collectively refers to both intents and entities of an utterance. As used herein, an “understanding model” is a collection of models used by the NLU framework to infer meaning of natural language utterances. An understanding model may include a vocabulary model that associates certain tokens (e.g., words or phrases) with particular word vectors, an intent-entity model, an entity model, or a combination thereof. As used herein, an “intent-entity model” refers to a model that associates particular intents with particular sample utterances, wherein entities associated with the intent may be encoded as a parameter of the intent within the sample utterances of the model. As used herein, the term “agents” may refer to computer-generated personas (e.g., chat agents or other virtual agents) that interact with users within a conversational channel. As used herein, a “corpus” refers to a captured body of source data that includes interactions between various users and virtual agents, wherein the interactions include communications or conversations within one or more suitable types of media (e.g., a help line, a chat room or message string, an email string). As used herein, an “utterance tree” refers to a data structure that stores a meaning representation of an utterance. As discussed, an utterance tree has a tree structure (e.g., a dependency parse tree structure) that represents the syntactic and grammatical structure of the utterance (e.g., relationships between words, part-of-speech (POS) taggings), wherein nodes of the tree structure store vectors (e.g., word vectors, subtree vectors) that encode the semantic meaning of the utterance. As used herein, a “quality” of an inference or meaning search refers to a quantitative measure based on one or multiple of an accuracy, a precision, and/or any suitable F-score of the inference, as would be understood by one of ordinary skill in the art of NLU.
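As a purely illustrative aid, an utterance-tree node of the kind defined above might be sketched as the following data structure; the field names are readability-oriented assumptions rather than the actual structures of the NLU framework.

```python
# Hypothetical sketch of a single node of an utterance tree.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UtteranceTreeNode:
    token: str                              # word or phrase represented by this node
    pos_tag: str                            # part-of-speech tagging (e.g., "VERB", "NOUN")
    word_vector: List[float]                # semantic vector encoding the token's meaning
    dependency: Optional[str] = None        # grammatical relation to the parent node
    class_annotation: Optional[str] = None  # e.g., "intent" or "entity" annotation
    artifact_label: Optional[str] = None    # model-specific label (e.g., an entity of an intent)
    children: List["UtteranceTreeNode"] = field(default_factory=list)
```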
As used herein, “source data” or “conversation logs” may include any suitable captured interactions between various agents and users, including but not limited to, chat logs, email strings, documents, help documentation, frequently asked questions (FAQs), forum entries, items in support ticketing, recordings of help line calls, and so forth. As used herein, an “utterance” refers to a single natural language statement made by a user or agent that may include one or more intents. As such, an utterance may be part of a previously captured corpus of source data, and an utterance may also be a new statement received from a user as part of an interaction with a virtual agent. As used herein, “machine learning” or “ML” may be used to refer to any suitable statistical form of artificial intelligence capable of being trained using machine learning techniques, including supervised, unsupervised, and semi-supervised learning techniques. For example, in certain embodiments, ML techniques may be implemented using a neural network (NN) (e.g., a deep neural network (DNN), a recurrent neural network (RNN), a recursive neural network). As used herein, a “vector” (e.g., a word vector, an intent vector, a subject vector, a subtree vector) refers to a linear algebra vector that is an ordered n-dimensional list (e.g., a 300 dimensional list) of floating point values (e.g., a 1×N or an N×1 matrix) that provides a mathematical representation of the semantic meaning of a portion (e.g., a word or phrase, an intent, an entity, a token) of an utterance.
As used herein, the terms “dialog” and “conversation” refer to an exchange of utterances between a user and a virtual agent over a period of time (e.g., a day, a week, a month, a year, etc.). As used herein, an “episode” refers to distinct portions of dialog that may be delineated from one another based on a change in topic, a substantial delay between communications, or other factors. As used herein, “context” refers to information associated with an episode of a conversation that can be used by the BE to determine suitable actions in response to extracted intents and/or entities of a user utterance. As used herein, a “contextual intent” refers to an intent that was previously identified or processed by the NLU framework during a flow or conversation between the BE and the user. As used herein, “domain specificity” refers to how attuned a system is to correctly extracting intents and entities expressed in actual conversations in a given domain and/or conversational channel. As used herein, an “understanding” of an utterance refers to an interpretation or a construction of the utterance by the NLU framework. As such, it may be appreciated that different understandings of an utterance are generally associated with different meaning representations having different structures (e.g., different nodes, different relationships between nodes), different POS taggings, and so forth.
As mentioned, a computing platform may include a chat agent, or another similar virtual agent, that is designed to automatically respond to user requests to perform functions or address issues on the platform via NLU techniques. The disclosed NLU framework is based on principles of cognitive construction grammar (CCG), in which an aspect of the meaning of a natural language utterance can be determined based on the form (e.g., syntactic structure, shape) and semantic meaning of the utterance. The disclosed NLU framework is capable of generating multiple meaning representations that form one or more search keys for an utterance. Additionally, the disclosed NLU framework is capable of generating an understanding model having multiple meaning representations for certain sample utterances, which expands the search space for meaning search, thereby improving operation of the NLU framework. However, when attempting to derive user intent from natural language utterances by comparing the search keys to the search space, it is presently recognized that certain NLU frameworks may perform inefficient searches when considering search keys and search spaces that include a large number of meaning representations, potentially returning irrelevant intent matches and/or incurring undesirable inference latency that reduces user satisfaction with such NLU frameworks.
Accordingly, present embodiments are generally directed toward an agent automation framework capable of leveraging CCG techniques to generate multiple meaning representations for utterances, including sample utterances in the intent-entity model and utterances received from a user. In particular, an artifact pinning subsystem of the agent automation system directs a meaning extraction subsystem and a meaning search subsystem of the agent automation framework during both compilation of a search space and inference of a received user utterance that is transformed into one or more search keys and compared to the search space. During generation of the search space, the artifact pinning subsystem may determine multiple different understandings of sample utterances within one or multiple intent-entity models by performing vocabulary adjustment, varied part-of-speech assignment, and/or any other suitable processes that generate multiple meaning representations corresponding to various understandings of each sample utterance. As such, the artifact pinning subsystem generates a potentially-sizeable quantity of candidates for inclusion within the search space.
To selectively prune the candidates, the artifact pinning subsystem disclosed herein leverages artifact correlations of the sample utterances to prune any meaning representations that are not valid representations of a particular sample utterance. That is, the sample utterances generally each belong to an identified intent that may have been labeled (e.g., by an author or ML-based annotation subsystem) with any suitable number of corresponding entities, within a structure or set of relationships defined by the intent-entity model. To validate the relevance of each candidate meaning representation for an identified intent, the artifact pinning subsystem may identify and pin a set of meaning representations that include the particular intent and include one or multiple respective entities corresponding to one or multiple labeled entities of a corresponding sample utterance. That is, the meaning representations of the set are desirably retained (e.g., pinned) as valid formulations of an associated sample utterance, within the structure defined by artifact labels of a respective intent-entity model. The set of meaning representations may then be re-expressed or expanded via any suitable processes, such as altering the arrangement or included number of nodes of meaning representations (e.g., utterance trees) associated with the set. The artifact pinning subsystem of certain embodiments may therefore remove any duplicate candidates and generate (e.g., compile) the search space based on the remaining meaning representations with the appropriate pinned entity.
During a meaning search performed to derive meaning from an on-going conversation between a user and a behavior engine, the artifact pinning subsystem may form the one or more search keys to compare against the search space by generating multiple potential meaning representations of a user utterance. Notably, the artifact pinning subsystem may pin the search space with respect to an inferenced, contextual intent of the on-going conversation, guiding further targeted pruning of the search space that may otherwise decrease a quality or increase a resource demand of a meaning search. For example, the artifact pinning subsystem may identify relevant meaning representations within the search space or the underlying understanding model to provide a similarity scoring bonus to the relevant meaning representations or to prune other, non-relevant meaning representations from the search space, thereby providing more direct search paths for the meaning searches. As will be understood, the herein-disclosed pinning of the multiple various candidates of the search space enables the agent automation system to target particularly-relevant candidates for improved meaning search quality. Pruning the search space to these candidates may also limit computing resource usage and improve the efficiency of the NLU framework.
With the preceding in mind, the following figures relate to various types of generalized system architectures or configurations that may be employed to provide services to an organization in a multi-instance framework and on which the present approaches may be employed. Correspondingly, these system and platform examples may also relate to systems and platforms on which the techniques discussed herein may be implemented or otherwise utilized. Turning now to
For the illustrated embodiment,
In
To utilize computing resources within the platform 20, network operators may choose to configure the data centers 22 using a variety of computing infrastructures. In one embodiment, one or more of the data centers 22 are configured using a multi-tenant cloud architecture, such that one of the server instances 24 handles requests from and serves multiple customers. Data centers 22 with multi-tenant cloud architecture commingle and store data from multiple customers, where multiple customer instances are assigned to one of the virtual servers 24. In a multi-tenant cloud architecture, the particular virtual server 24 distinguishes between and segregates data and other information of the various customers. For example, a multi-tenant cloud architecture could assign a particular identifier for each customer in order to identify and segregate the data from each customer. Generally, implementing a multi-tenant cloud architecture may suffer from various drawbacks, such as a failure of a particular one of the server instances 24 causing outages for all customers allocated to the particular server instance.
In another embodiment, one or more of the data centers 22 are configured using a multi-instance cloud architecture to provide every customer its own unique customer instance or instances. For example, a multi-instance cloud architecture could provide each customer instance with its own dedicated application server and dedicated database server. In other examples, the multi-instance cloud architecture could deploy a single physical or virtual server 24 and/or other combinations of physical and/or virtual servers 24, such as one or more dedicated web servers, one or more dedicated application servers, and one or more database servers, for each customer instance. In a multi-instance cloud architecture, multiple customer instances could be installed on one or more respective hardware servers, where each customer instance is allocated certain portions of the physical server resources, such as computing memory, storage, and processing power. By doing so, each customer instance has its own unique software stack that provides the benefit of data isolation, relatively less downtime for customers to access the platform 20, and customer-driven upgrade schedules. An example of implementing a customer instance within a multi-instance cloud architecture will be discussed in more detail below with reference to
Although
As may be appreciated, the respective architectures and frameworks discussed with respect to
By way of background, it may be appreciated that the present approach may be implemented using one or more processor-based systems such as shown in
With this in mind, an example computer system may include some or all of the computer components depicted in
The one or more processors 82 may include one or more microprocessors capable of performing instructions stored in the memory 86. Additionally or alternatively, the one or more processors 82 may include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or other devices designed to perform some or all of the functions discussed herein without calling instructions from the memory 86.
With respect to other components, the one or more busses 84 include suitable electrical channels to provide data and/or power between the various components of the computing system 80. The memory 86 may include any tangible, non-transitory, and computer-readable storage media. Although shown as a single block in
It should be appreciated that the cloud-based platform 20 discussed above provides an example of an architecture that may utilize NLU technologies. In particular, the cloud-based platform 20 may include or store a large corpus of source data that can be mined to facilitate the generation of a number of outputs, including an intent-entity model. For example, the cloud-based platform 20 may include ticketing source data having requests for changes or repairs to particular systems, dialog between the requester and a service technician or an administrator attempting to address an issue, a description of how the ticket was eventually resolved, and so forth. Then, the generated intent-entity model can serve as a basis for classifying intents in future requests, and can be used to generate and improve a conversational model to support a virtual agent that can automatically address future issues within the cloud-based platform 20 based on natural language requests from users. As such, in certain embodiments described herein, the disclosed agent automation framework is incorporated into the cloud-based platform 20, while in other embodiments, the agent automation framework may be hosted and executed (separately from the cloud-based platform 20) by a suitable system that is communicatively coupled to the cloud-based platform 20 to process utterances, as discussed below.
With the foregoing in mind,
The embodiment of the agent automation framework 100 illustrated in
For the embodiment illustrated in
For the embodiment illustrated in
For the illustrated embodiment, the NLU framework 104 includes an NLU engine 116 and a vocabulary manager 118. It may be appreciated that the NLU framework 104 may include any suitable number of other components. In certain embodiments, the NLU engine 116 is designed to perform a number of functions of the NLU framework 104, including generating word vectors (e.g., intent vectors, subject or entity vectors, subtree vectors) from word or phrases of utterances, as well as determining distances (e.g., Euclidean distances) between these vectors. For example, the NLU engine 116 is generally capable of producing a respective intent vector for each intent of an analyzed utterance. As such, a similarity measure or distance between two different utterances can be calculated using the respective intent vectors produced by the NLU engine 116 for the two intents, wherein the similarity measure provides an indication of similarity in meaning between the two intents.
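By way of a non-limiting sketch, the distance or similarity comparison described above could be computed as follows; cosine similarity and Euclidean distance are shown as two common choices, without implying that either is the specific measure used by the NLU engine 116.

```python
# Illustrative similarity measures between two intent vectors.
import numpy as np


def intent_similarity(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; higher indicates closer meaning."""
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(vec_a, vec_b) / denom)


def euclidean_distance(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Smaller distance indicates more similar intents."""
    return float(np.linalg.norm(vec_a - vec_b))
```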
The vocabulary manager 118 addresses out-of-vocabulary words and symbols that were not encountered by the NLU framework 104 during vocabulary training. For example, in certain embodiments, the vocabulary manager 118 can identify and replace synonyms and domain-specific meanings of words and acronyms within utterances analyzed by the agent automation framework 100 (e.g., based on the collection of rules 114), which can improve the performance of the NLU framework 104 to properly identify intents and entities within context-specific utterances. Additionally, to accommodate the tendency of natural language to adopt new usages for pre-existing words, in certain embodiments, the vocabulary manager 118 handles repurposing of words previously associated with other intents or entities based on a change in context. For example, the vocabulary manager 118 could handle a situation in which, in the context of utterances from a particular client instance and/or conversation channel, the word “bike” actually refers to a motorcycle rather than a bicycle.
Once the intent-entity model 108 and the conversation model 110 have been created, the agent automation framework 100 is designed to receive a user utterance 122 (in the form of a natural language request) and to appropriately take action to address the request. For example, for the embodiment illustrated in
It may be appreciated that, in other embodiments, one or more components of the agent automation framework 100 and/or the NLU framework 104 may be otherwise arranged, situated, or hosted for improved performance. For example, in certain embodiments, one or more portions of the NLU framework 104 may be hosted by an instance (e.g., a shared instance, an enterprise instance) that is separate from, and communicatively coupled to, the client instance 42. It is presently recognized that such embodiments can advantageously reduce the size of the client instance 42, improving the efficiency of the cloud-based platform 20. In particular, in certain embodiments, one or more components of the artifact pinning subsystem discussed below may be hosted by a separate instance (e.g., an enterprise instance) that is communicatively coupled to the client instance 42, as well as other client instances, to enable improved meaning searching for suitable matching meaning representations within the search space to enable identification of artifact matches for the utterance 122.
With the foregoing in mind,
In particular, the NLU framework 104 illustrated in
For the embodiment of the agent automation framework 100 illustrated in
As illustrated in
As mentioned, the NLU framework 104 includes two primary subsystems that cooperate to convert the hard problem of NLU into a manageable search problem—namely: a meaning extraction subsystem and a meaning search subsystem. For example,
For the embodiment illustrated in
As an example of one of the meaning representations 158, 162 disclosed herein,
The form or shape of the utterance tree 166 illustrated in
Moreover, in other embodiments, each of the nodes 202 may be annotated by the structure subsystem with additional information about the word or phrase represented by the node to form an annotated embodiment of the utterance tree 166. For example, each of the nodes 202 may include a respective tag, identifier, shading, or cross-hatching that is indicative of a class annotation of the respective node. In particular, for the example utterance tree 166 illustrated in
As such, it may be appreciated that the utterance tree 166, from which the meaning representations are generated, serves as a basis (e.g., an initial basis) for artifact extraction. Further to this effect, the nodes 202 of certain embodiments of the utterance tree 166, such as those in which the utterance is a sample utterance 155 of the intent-entity model 108, may also be annotated or tagged with respective artifact labels (e.g., intent labels and/or entity labels) that identify a particular one of the nodes as a particular entity that is defined within a particular intent. For example, during construction of the intent-entity model 108, an author of the intent-entity model 108 may identify the sample utterance 155 as belonging to a particular intent. Within the intent-entity model 108, the author may then identify (e.g., highlight, annotate, label) certain tokens within the sample utterance 155, such as particular entities, as entities that belong to or are associated with the intent. For example, a “purchase product” intent may have a number of labeled entities within an intent-entity model 108 that belong to the intent, such as a “brand” entity, a “model” entity, a “color” entity, a “size” entity, a “shipping address” entity, and so forth, depending on the nature of the product. In some embodiments, at least a portion of the sample utterances 155 are associated with artifact labels by machine learning features of the NLU framework, either in addition to or in alternative to the manually-specified annotations of an author. In either case, the artifact labels for the sample utterances 155 may be specific to the particular relationships defined by the associated understanding model, as well as the underlying intent-entity model, providing a structure that enables improved inference within the scope of the particular intent-entity model 108.
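The following fragment is a hypothetical, simplified rendering of such an authored intent-entity model; the schema and field names are illustrative assumptions only and do not reflect the actual model format.

```python
# Hypothetical fragment of an authored intent-entity model: a "purchase product"
# intent with entities defined within it, and a sample utterance whose tokens
# carry author-applied artifact (entity) labels belonging to that intent.
intent_entity_model = {
    "intents": {
        "purchase_product": {
            "entities": ["brand", "model", "color", "size", "shipping_address"],
            "sample_utterances": [
                {
                    "text": "I want to buy a blue medium jacket",
                    # Token spans mapped to entities defined within the intent.
                    "labels": {"blue": "color", "medium": "size", "jacket": "model"},
                },
            ],
        }
    }
}
```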
Based on cognitive construction grammar techniques, the structure subsystem may consider the artifact labeling of the sample utterances 155 to guide determination of the shape of the utterance tree 166. It should be understood that the artifact labels applied to the tokens of the sample utterances 155 may be propagated to the meaning representations 158, and in particular, are included as parameters associated with the respective nodes of the meaning representations, where each node represents a token of the utterance. For example, in the present embodiment, the entirety of the sample utterance 155 of the utterance tree 166 may be identified as belonging to one (or multiple) of a desire intent, a travel intent, or a purchase intent, based on the presence of the “want” of node 202A, the “to go” of node 202B, the “to buy” of node 202C, and the “to return” of node 202D. These intents may therefore be leveraged within the structure of a particular intent-entity model 108 to identify meaning matches to meaning representations within the intent categories.
Further, certain nodes of the utterance tree 166 may be labeled as an entity that is associated with a particular intent of the sample utterance 155. For example, the tokens “store” of node 202F and “mall” of node 202G may include artifact labels to indicate that the tokens are labeled entities that correspond to the purchase intent. The nodes 202F, 202G may both be annotated as entities that represent locations within the travel intent. That is, each of these labeled entities further define their associated intent to enable the NLU framework 104 to derive meaning and perspective from user utterances 122 related to the associated intent. The artifact labels discussed herein are further leveraged by an artifact pinning subsystem, as discussed below, to verify whether certain cleansed forms and/or alternative forms of meaning representations generated for various sample utterances 155 have valid form interpretations that uphold the embedded artifact relationship information within a particular intent-entity model 108. By excluding interpretations that construe entities in manners unsuited to (e.g., different from, not in accordance with) the artifact labels of the sample utterances 155, the ambiguity of complex user or sample utterances, such as polysemic utterances and/or utterances having multi-word entities that are interpretable in multiple ways, can be reduced or eliminated. It should be understood that embodiments of the utterance tree 166 that are generated from a received user utterance 122 provided to the BE 102 may also include any other suitable tags that the BE 102 derives from the context of interaction with an end-user, such as tags indicating that a particular user utterance 122 was received during discussion of a particular intent, during a certain time of day, during occurrence of a particular news or weather event, and so forth.
Providing more detail herein with respect to generation of the search space 252, the artifact pinning subsystem 250 may aggregate the sample utterances 155 of a set 270 of intent-entity models, such as multiple intent-entity models 108 that are each suited for a particular purpose or domain. As discussed above, the sample utterances 155 are individually associated with artifact labels 272 that link model-specific relationships between various intents and various entities of the sample utterances 155. For example, each intent-entity model 108 of the set 270 may include sample utterances 155 that provide guidance for the NLU framework 104 to perform meaning searches with respect to any suitable natural language interaction with users, such as greeting users, managing meetings, managing a particular product of an enterprise, managing human resource actions, and/or concluding conversations with users, among many other suitable interactions. The sample utterances 155 are analyzed by the meaning extraction subsystem 150 of the artifact pinning subsystem 250 to generate a set 282 of meaning representations that assign possible forms to, as well as consider polysemic expression of, each respective sample utterance 155. For the set 282 of meaning representations, a respective understanding model of a set 284 of understanding models may therefore be generated, wherein each understanding model of the set 284 defines a respective model-specific search space 286.
As discussed below, the artifact pinning subsystem 250 desirably pins suitable meaning representation candidates of the multiple model-specific search spaces 286 to compile the search space 252 (e.g., compiled search space). In particular, for a given model-specific search space 286, the artifact pinning subsystem 250 identifies and pins suitable candidates from the set 282 of meaning representations that align with the artifact labels 272 of the particular one of the set 270 of intent-entity models from which the particular one of the set 284 of understanding models was derived. It should be understood that the artifact pinning subsystem 250 may compile the search space 252 after one or more conversations between a user and the BE 102, periodically, in response to receipt of new or updated sample utterances 155, and so forth.
Similarly, during search key generation and utilization, the artifact pinning subsystem 250 receives a user utterance 122 and derives a set 290 of meaning representations for the user utterance 122 that assigns potential entities to tokens within the user utterance 122. Notably, the artifact pinning subsystem 250 may not prune the set 290 of meaning representations because the search space 252 is already pruned to surviving meaning representations 158 that uphold the artifact relationships set forth by the artifact labels 272 of the set 270 of intent-entity models. Additionally, the artifact pinning subsystem 250 may further refine the set 282 of meaning representations of the set 270 of intent-entity models based on a context of a conversation between the user and the BE 102, such as a conversation to order a product or schedule a meeting. Thus, the artifact pinning subsystem 250 generates the tailored utterance meaning model 160 from the set 290 of meaning representations as the search keys 254 for comparison to the particularly context-aware search space 252. Indeed, as discussed in more detail below, the meaning search subsystem 152 may compare the meaning representations of the set 290 defining the search keys 254 to the meaning representations 158 of the search space 252 to identify any suitable, matching meaning representations 158, which enable the NLU framework 104 to identify the extracted artifacts 140 therefrom. The meaning search subsystem 152 may also score the matching meaning representations 158 and/or the artifacts therein with an accompanying confidence level to facilitate appropriate agent responses 124 and/or actions 142 to the most likely extracted artifacts 140 from the meaning representations 158. As will be understood, the disclosed embodiments of the model-based and context-specific expansion and pruning (e.g., pinning) of the meaning representations performed by the artifact pinning subsystem 250 provide enhancements to the search space 252 and/or the search keys 254 to facilitate efficient, relationship-aware identification of the extracted artifacts 140 for improved meaning search quality.
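As a schematic illustration (not the actual meaning search algorithm), the comparison of the search keys 254 against the compiled search space 252 could be organized as follows, where MeaningRep and the similarity callable are assumed placeholders for the structures and scoring used by the meaning search subsystem 152.

```python
# Schematic meaning search: compare every search key against every candidate in
# the compiled search space, score each pair, and return the best matches with
# their confidence scores.
from typing import Callable, List, Tuple


def meaning_search(search_keys: List["MeaningRep"],
                   search_space: List["MeaningRep"],
                   similarity: Callable[["MeaningRep", "MeaningRep"], float],
                   threshold: float = 0.7) -> List[Tuple["MeaningRep", float]]:
    matches = []
    for key in search_keys:                      # each interpretation of the user utterance
        for candidate in search_space:           # each pinned sample-utterance interpretation
            score = similarity(key, candidate)   # e.g., a form-aware vector comparison
            if score >= threshold:
                matches.append((candidate, score))
    # Highest-confidence matches first; extracted artifacts are read off the winners.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```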
Generally, each meaning representation candidate of the set 310 may have a respective CCG form that is assigned based on the structure of the sample utterance 155, including the interdependencies of the intents and entities within each meaning representation candidate. The disclosed artifact pinning subsystem 250 performs entity pinning to enable the meaning search subsystem 152 to focus on meaning representation candidates of the set 310 that align with the aforementioned relational cues embedded within the associated intent-entity model 108 (e.g., defined herein as meaning representations having valid entity formulations), thereby leveraging these embedded relationships to verify the prescribed forms of the set 310 of meaning representation candidates. That is, assuming that each meaning representation candidate of the set 310 is related to a particular intent, the artifact pinning subsystem 250 pins one or multiple entities 312 (illustrated via filled circles in the present embodiment) therein that have a valid entity formulation and that correspond with the labeled or annotated entities within the set 270 of intent-entity models (defined by the artifact labels 272). Other meaning representation candidates of set 310 that do not contain a valid entity formulation (e.g., that do not correspond with the artifact labels 272) may therefore be disregarded as unviable candidates, while the entity-pinned meaning representation candidates of the set 310 are retained (e.g., pinned). The entity pinning faculties discussed herein may be performed with respect to the search space 252 to leverage the artifact-level relationships embedded within the set 270 of intent-entity models for structure-verification-based pruning. Indeed, with the search space 252 pruned and compiled to match the embedded relationships of the intent-entity model 108, the artifact pinning subsystem 250 enables the search keys 254 to be expanded, not pinned, and utilized as broader search components for improved inference quality. However, in other embodiments, the search keys 254 may be pruned in a manner similar to that of the search space 252, but with respect to potential, estimated artifact labels that are generated via ML techniques and contrasted to the artifact labels 272 of the sample utterances 155.
As a particular example, for a sample utterance 155 of “Order [three-way switch],” the set 310 of meaning representation candidates may include entity formulations and associated entity assignments (indicated as brackets) that express the sample utterance 155 as a first meaning representation for a first interpretation of the utterance as “I would like to order one three-way switch” and a second meaning representation for a second interpretation of the utterance as “I would like to order three and to switch my way.” Because the first interpretation includes the particular labeled entity [three way switch], the artifact pinning subsystem 250 may retain and pin the first meaning representation as a valid parse of the sample utterance 155 in light of the associated intent-entity model 108. In contrast, because the second meaning representation does not include the particular labeled entity of the sample utterance 155, the artifact pinning subsystem 250 may determine that the second meaning representation is not a valid interpretation of the sample utterance 155 and remove or prune the second meaning representation from the search space 252, thereby directing appropriate consumption of utterances for improved understanding accuracy.
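A minimal sketch of this validity check follows, using the labeled entity from the example above; the helper name has_valid_entity_formulation is an illustrative assumption rather than a function of the disclosed subsystem.

```python
# Sketch of the validity check: a candidate interpretation is pinned only if the
# entity span labeled in the sample utterance ("three-way switch") is preserved
# as a single entity in that interpretation.
from typing import Iterable


def has_valid_entity_formulation(candidate_entities: Iterable[str],
                                 labeled_entity_spans: Iterable[str]) -> bool:
    """Return True if every labeled span appears intact among the candidate's entities."""
    candidate = {entity.lower() for entity in candidate_entities}
    return all(span.lower() in candidate for span in labeled_entity_spans)


labeled = ["three-way switch"]
first_interpretation = ["three-way switch"]   # "I would like to order one three-way switch"
second_interpretation = ["three", "way"]      # "I would like to order three and to switch my way"

assert has_valid_entity_formulation(first_interpretation, labeled)        # pinned
assert not has_valid_entity_formulation(second_interpretation, labeled)   # pruned
```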
Moreover, with respect to meaning searches performed by comparing the search keys 254 to the search space 252, the artifact pinning subsystem 250 may also leverage contextual information provided by the BE 102 to verify whether any of the meaning representation candidates of the set 310 generated for the search space 252 align with the context of conversation (e.g., a current or on-going conversation) between an end-user and the BE 102. For example, in response to receiving a user utterance 122 requesting to set up a meeting at a particular time, NLU framework 104 may perform a meaning search based on the user utterance 122 to determine that the utterance corresponds to a “meeting setup intent” defined in a particular intent-entity model 108. The NLU framework 104 provides this intent, along with any entities identified in the utterance, to the BE 102, as discussed above with respect to
As will be understood, these techniques thereby efficiently leverage implicit clues derivable from the current conversation and/or provided by an author of the intent-entity model 108 to pin candidate meaning representations as suitable search spaces 252 that provide an improved meaning search. It should be understood that although the artifact labels 272 of
The artifact pinning subsystem 250 performing the illustrated embodiment of the process 350 begins with a vocabulary refinement phase 352 that cleanses and validates (block 354) the sample utterances 155 stored within one intent-entity model of the set 270. For example, the artifact pinning subsystem 250 may utilize a vocabulary subsystem 356 of the NLU framework 104 (e.g., corresponding to the vocabulary manager 118 in certain embodiments) to access and apply the rules 114 stored in the database 106 that modify certain tokens (e.g., words, phrases, punctuation, emojis) of the sample utterances 155. In some embodiments, the vocabulary subsystem 356 performs the vocabulary refinement phase 352 based on a vocabulary model 360 that is stored with the intent-entity model 108 within a particular understanding model of the set 282 or based on an aggregated vocabulary model derived from each respective vocabulary model of the set 282. By way of example, in certain embodiments, cleansing may involve applying a rule that removes non-textual elements (e.g., emoticons, emojis, punctuation) from the sample utterances 155. In certain embodiments, cleansing may involve correcting misspellings or typographical errors in the sample utterances 155. Additionally, in certain embodiments, cleansing may involve substituting certain tokens with other tokens. For example, the vocabulary subsystem 356 may apply a rule that replaces all entities with references to time or color with a generic or global entity, such as a global entity for a phone number, a time, a color, a meeting room, and so forth. In certain cases in which the intent-entity model 108 is a pre-built vocabulary model, the artifact pinning subsystem 250 may omit cleansing, while proceeding with validation to ensure the sample utterances 155 are valid sample utterances in view of validation rules stored within the database 106.
Continuing the vocabulary refinement phase 352 of the process 350, the artifact pinning subsystem 250 then performs (block 362) vocabulary injection on tokens of the sample utterances 155, thereby re-rendering the sample utterances 155 by adjusting phraseology and/or terminology of the sample utterances 155. Based on the vocabulary model 360 (or aggregate vocabulary model) stored within the understanding model 157, the artifact pinning subsystem 250 utilizes the vocabulary subsystem 356 to replace the content of certain tokens of the sample utterances 155 with more discourse-appropriate phrases and/or terms. In certain embodiments, multiple phrases and/or terms may be replaced, and the various permutations of such replacements are used to generate a set 364 of utterances (e.g., vocabulary-adjusted sample utterances) for each sample utterance 155 of the particular intent-entity model 108. For example, in certain embodiments, the vocabulary subsystem 356 may access the vocabulary model 360 of the understanding model 157 to identify alternative vocabulary that can be used to generate re-expressions of the utterances having different tokens. By way of specific example, in an embodiment, the vocabulary subsystem 356 may determine that a synonym for “developer” is “employee,” and may generate a new utterance in which the term “developer” is substituted by the term “employee.”
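A rough sketch of the cleansing and vocabulary injection steps might look like the following; the cleansing rule and the synonym table are illustrative assumptions rather than the rules 114 or vocabulary model 360 themselves.

```python
# Illustrative cleansing rule followed by vocabulary injection, which produces a
# set of vocabulary-adjusted variants (permutations of substitutions) for each
# sample utterance.
import re
from itertools import product
from typing import Dict, List


def cleanse(utterance: str) -> str:
    utterance = re.sub(r"[^\w\s'-]", "", utterance)   # strip emojis/punctuation-like elements
    return re.sub(r"\s+", " ", utterance).strip()


def inject_vocabulary(utterance: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Generate permutations of the utterance with discourse-appropriate substitutions."""
    tokens = utterance.split()
    options = [[tok] + synonyms.get(tok.lower(), []) for tok in tokens]
    return [" ".join(choice) for choice in product(*options)]


variants = inject_vocabulary(cleanse("The developer needs a laptop!"),
                             {"developer": ["employee"], "laptop": ["notebook"]})
# -> ['The developer needs a laptop', 'The developer needs a notebook',
#     'The employee needs a laptop', 'The employee needs a notebook']
```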
For the embodiment illustrated in
The artifact pinning subsystem 250 may subsequently direct remaining candidates of the set 364 to the structure subsystem 370 of an alternative form expansion phase 372 of the process 350 for part-of-speech (POS) tagging and parsing. That is, in the illustrated embodiment, the artifact pinning subsystem 250 implements the structure subsystem 370 to generate (block 374) the set 282 of one or more meaning representations that are representative of the sample utterances 155 of the intent-entity model 108. For the embodiment illustrated in
The artifact pinning subsystem 250 performing the illustrated embodiment of
For example, given a sample utterance stating “Book meeting,” a first meaning representation of the set 282 of the meaning representations may indicate a parse and POS tagging in which “book” is interpreted as a verb-style intent (e.g., corresponding to a “schedule” intent) having “meeting” as an entity. As such, the first meaning representation of the set 282 may be interpreted as, “I want to schedule a meeting.” For a second meaning representation of the set 282, the parse and POS tagging may interpret “book” as a noun, such that the second meaning representation is interpreted as, “I want a book meeting.” Assuming that a “schedule” intent of the intent-entity model 108 includes at least one sample utterance 155 with author-labeled entities defined therein, the artifact pinning subsystem 250 analyzes the artifact labels 272 to determine whether “meeting” in the first meaning representation corresponds to at least one appropriate labeled entity of the “schedule” intent of the intent-entity model 108. In response to determining that “meeting” in the first meaning representation corresponds to a “meeting” entity of the “schedule” intent within the intent-entity model 108, the artifact pinning subsystem 250 may pin the first meaning representation as a valid formulation of the sample utterance 155. In contrast, upon analyzing a “desire” or “I want” intent of the intent-entity model 108 when considering the second meaning representation of the set 282, the artifact pinning subsystem 250 may determine that there are no sample utterances having a labeled “book meeting” entity defined for the “desire” intent within the intent-entity model 108. As such, the artifact pinning subsystem 250 may recognize that the second meaning representation is an invalid or irrelevant parse of the particular sample utterance 155, at least with respect to the particular domain in which the intent-entity model 108 is defined.
With the set 388 of meaning representations pinned as having suitable structure for each intent, the artifact pinning subsystem 250 may perform a re-expression phase 392 of the process 350 in which the artifact pinning subsystem 250 determines (block 394) suitable re-expressions of the tokens of the meaning representations of the set 388. For example, for the validated and number-reduced set 388 of meaning representations, the artifact pinning subsystem 250 may adjust the order of tokens, add tokens, remove tokens, transform tokens, generalize tokens, or otherwise manipulate the expression of each token of each meaning representation of the set 310. In certain embodiments, the re-expression phase 392 of the process 350 is performed by the NLU framework 104 described in U.S. patent application Ser. No. 16/239,218, entitled, “TEMPLATED RULE-BASED DATA AUGMENTATION FOR INTENT EXTRACTION,” filed Jan. 3, 2019, which is incorporated by reference herein in its entirety for all purposes.
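One way to picture the re-expressions of block 394 is with simple token-level rules, sketched below; the specific rules and the flat token-list representation are illustrative assumptions and do not reproduce the templated rules of the incorporated application.

```python
def reexpress(tokens):
    """Generate simple re-expressed variants of a tokenized meaning representation."""
    variants = {tuple(tokens)}
    # Remove optional politeness tokens (illustrative rule).
    variants.add(tuple(t for t in tokens if t not in {"please", "kindly"}))
    # Generalize first-person pronouns (illustrative rule).
    variants.add(tuple("someone" if t in {"i", "we"} else t for t in tokens))
    return [list(v) for v in variants]

for variant in reexpress(["i", "want", "to", "schedule", "a", "meeting", "please"]):
    print(" ".join(variant))
```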
The step of block 394 may generate multiple different meaning representations for each meaning representation of the set 388, again expanding the number of potential meaning representation candidates for inclusion in the search space 252. However, the artifact pinning subsystem 250 of certain embodiments may again leverage the artifact labels 272 to discard (block 396) certain meaning representations of the re-expressed variants of the set 388 that do not include the respective pinned entity therein. Then, the artifact pinning subsystem 250 removes (block 398) any artifact-level duplicates across the entire set 388 of meaning representations, as discussed above with respect to block 376. As such, the artifact pinning subsystem 250 may provide the suitably-distinct meaning representations remaining within the set 388 as meaningful candidates within a compiled understanding model 400, which corresponds to the compiled search space 252 introduced above. In some embodiments, the NLU framework 104 having the artifact pinning subsystem 250 may maintain the compiled search space 252 for a threshold amount of time, for use for a threshold number of meaning searches, and so forth.
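The pruning of blocks 396 and 398 amounts to a pinned-entity filter followed by deduplication, as in the minimal sketch below; the tuple-based duplicate key is an assumption about what artifact-level equivalence might look like, not a definitive implementation.

```python
def compile_candidates(variants, pinned_entity):
    """Filter re-expressed variants to those containing the pinned entity,
    then drop artifact-level duplicates."""
    kept, seen = [], set()
    for mr in variants:
        if pinned_entity not in mr["entities"]:
            continue  # block 396: discard variants missing the pinned entity
        key = (mr["intent"], tuple(sorted(mr["entities"])))  # assumed duplicate key
        if key in seen:
            continue  # block 398: remove artifact-level duplicates
        seen.add(key)
        kept.append(mr)
    return kept

variants = [
    {"intent": "schedule", "entities": ["meeting"]},
    {"intent": "schedule", "entities": ["meeting"]},   # duplicate
    {"intent": "schedule", "entities": ["laptop"]},    # missing pinned entity
]
print(compile_candidates(variants, "meeting"))
```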
In other embodiments, the refinement and expansion phase 452 may be modified to include any suitable processes that generate the set 454 of potential meaning representations having zero, one, or more suitably-distinct variations of the respective user utterance 122, in accordance with the present disclosure. For example, any one or multiple of vocabulary cleansing, vocabulary substitution, vocabulary injection, various part-of-speech assignments, alternative parse structure discovery, and re-expressions may be performed to generate the set 454 of potential meaning representations. As such, the artifact pinning subsystem 250 may aggregate the suitably-distinct meaning representations of each generated set 454 of potential meaning representations as meaningful candidates within the utterance meaning model 160, thereby forming the utterance meaning model 160 as the one or more search keys 254 suitable for efficient comparison to the search space 252.
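Purely as an illustrative sketch, the aggregation of suitably-distinct candidates into the search keys 254 might resemble the following; the expander functions and string-level representation are hypothetical simplifications of the refinement and expansion phase 452.

```python
def build_search_keys(utterance, expanders):
    """Aggregate suitably-distinct candidate expressions of a user utterance
    into a single collection of search keys."""
    candidates = {utterance}
    for expand in expanders:
        # Apply each expansion to every candidate accumulated so far.
        candidates.update(expand(u) for u in list(candidates))
    return sorted(candidates)

# Illustrative expanders standing in for vocabulary substitution and re-expression.
expanders = [
    lambda u: u.replace("developer", "employee"),
    lambda u: u.replace("i want to ", ""),
]
print(build_search_keys("i want to schedule a meeting with a developer", expanders))
```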
Notably, it is presently recognized that the artifact pinning subsystem 250 may leverage contextual intent information related to a current conversation between the user and the BE 102 via a BE context intent pinning phase 470 that improves targeted refinement of the search space 252. In the illustrated embodiment, the artifact pinning subsystem 250 may identify (block 472) a contextual intent 474 from the context of the current episode between the user and the BE 102. As examples, the contextual intent 474 may be a purchase intent, a meeting setup intent, a travel intent, a desire intent, and so forth, based on the current operation of the BE 102 with respect to the dialog between the user and the BE 102. In some embodiments, the BE 102 may provide or make available to the NLU framework 104 the current contextual intent 474 of the BE 102, which may be utilized to tailor operation of the NLU framework 104 for subsequent intent inference operations. For example, in certain embodiments, the contextual intent 474 may be in the form of an indication of a flow (e.g., script, process) that is currently being executed by the BE 102 in response to a previously received user utterance 122, wherein the flow corresponds to a particular intent (e.g., a purchase intent, a schedule meeting intent) that was inferenced by the NLU framework 104 from the previously received user utterance 122. In other embodiments, the artifact pinning subsystem 250 may determine the contextual intent 474 from previous episodes, or from any suitable context information associated with an episode of a conversation that the BE 102 may use to determine suitable actions in response to extracted intents and/or entities of the user utterance 122. Moreover, in certain embodiments, the artifact pinning subsystem 250 and/or the BE 102 may identify that a slot-filling operation or conversation is to be performed to gather additional information or entities associated with the contextual intent 474.
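By way of a minimal, non-limiting illustration of deriving the contextual intent 474 from the flow currently being executed by the BE 102, consider the sketch below; the flow-to-intent mapping and function name are hypothetical.

```python
# Hypothetical mapping from BE flows to the intents they were launched to satisfy.
flow_to_intent = {
    "purchase_item_flow": "purchase",
    "schedule_meeting_flow": "schedule meeting",
}

def identify_contextual_intent(active_flow, fallback=None):
    """Return the contextual intent associated with the BE's active flow, if any."""
    return flow_to_intent.get(active_flow, fallback)

print(identify_contextual_intent("schedule_meeting_flow"))  # "schedule meeting"
```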
Based on the contextual intent 474 of the BE 102, the artifact pinning subsystem 250 may aggregate (block 476) or identify relevant meaning representations 480 of the meaning representations 382 of the compiled understanding model 400 that are associated with (e.g., include) the contextual intent 474. It should be understood that the compiled understanding model 400 may have been previously generated via the model-based entity pinning phase 380 discussed above with respect to
With the relevant meaning representations 480 identified, the artifact pinning subsystem 250 may pin (block 482) or intent-pin the relevant meaning representations 480 within the search space 252. The artifact pinning subsystem 250 may therefore pin the meaning search to particular meaning representation candidates that relate to the contextual intent 474, therefore efficiently leveraging available information to improve how the user may interface with the BE 102. In certain embodiments, the pinning performed by the artifact pinning subsystem 250 at block 482 enables the agent automation system 100 to leverage the contextual intent 474 to soft-pin (e.g., identify and retain) particular subsets of the compiled understanding model 400 for particularly relevant meaning searches. The relevant meaning representations 480 identified within the compiled understanding model 400 may be provided a scoring bonus during such searching processes, thereby soft-pinning the generated search space 252 to provide more direct search paths for the meaning searches. The scoring bonus provided to the relevant meaning representations 480 (or scoring penalty provided to irrelevant meaning representations) may be any suitable increase above a threshold score. Thus, the artifact pinning subsystem 250 may increase the contribution of similarity scores determined for meaning representations within the search space 252 that match the contextual intent 474 (e.g., current topic of discussion) to improve similarity scoring processes, imparting a preference to the contextual intent 474 based on a particular conversational frame of reference. By influencing similarity scores (e.g., soft-pinning) instead of pruning the search space 252 based on the contextual intent 474, the artifact pinning subsystem 250 is designed to enable the agent automation system 100 to correctly interpret user utterances that change the topic of conversation, such as by enabling other intents to remain within meaning search results and potentially be identified as meaning search matches to the changed topic.
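Soft-pinning can be approximated as a score adjustment applied during the meaning search, as sketched below; the bonus value, the candidate representation, and the scoring interface are illustrative assumptions rather than the actual similarity scoring process.

```python
def soft_pin_scores(scored_candidates, contextual_intent, bonus=0.15):
    """Boost similarity scores of candidates whose intent matches the contextual
    intent, without removing any candidate from consideration."""
    adjusted = []
    for candidate, score in scored_candidates:
        if candidate["intent"] == contextual_intent:
            score += bonus  # soft-pin: prefer the current topic of discussion
        adjusted.append((candidate, score))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

scored = [
    ({"intent": "schedule meeting", "gloss": "book a room"}, 0.62),
    ({"intent": "order item", "gloss": "order a laptop"}, 0.70),
]
print(soft_pin_scores(scored, "schedule meeting"))
# Topic changes remain possible because non-matching candidates keep their scores.
```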
In other embodiments, the artifact pinning subsystem 250 executes the pinning of block 482 as hard-pinning, in which meaning representations 382 of the compiled understanding model 400 that are not associated with the contextual intent 474 are removed from consideration (e.g., pruned). The artifact pinning subsystem 250 may therefore generate a restrictive or constrained embodiment of the search space 252 against which the search keys 254 are compared to identify the extracted artifacts 240, directing searching and similarity scoring resources to the remaining relevant meaning representations 480 of the compiled understanding model 400. It should be understood that the artifact pinning subsystem 250 may be individually tailored to perform the BE context intent pinning phase 470 via any suitable combination of soft-pinning or hard-pinning. For example, the artifact pinning subsystem 250 may utilize soft-pinning in response to determining that the contextual intent 474 is relatively uncommon, is associated with an average episode duration that is below a threshold duration, or has any other suitable quality indicative of an increased likelihood of imminent topic change. In embodiments in which the contextual intent 474 is not associated with an increased likelihood of imminent topic change, the artifact pinning subsystem 250 may utilize hard-pinning. In any case, it should be understood that the process 450 may be repeated in response to subsequent user utterances 122 until the contextual intent 474 is satisfied (e.g., all sub-branches of a slot-filling operation are filled).
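Hard-pinning, and the choice between soft-pinning and hard-pinning based on the likelihood of an imminent topic change, could be sketched as follows; the threshold value and the notion of a per-intent topic-change likelihood are illustrative assumptions.

```python
def hard_pin(candidates, contextual_intent):
    """Prune candidates whose intent does not match the contextual intent."""
    return [c for c in candidates if c["intent"] == contextual_intent]

def pin_search_space(candidates, contextual_intent, topic_change_likelihood,
                     threshold=0.5):
    """Use hard-pinning when a topic change is unlikely; otherwise keep all
    candidates and rely on soft-pinning via score adjustment."""
    if topic_change_likelihood < threshold:
        return hard_pin(candidates, contextual_intent)
    return candidates

candidates = [{"intent": "purchase"}, {"intent": "schedule meeting"}]
print(pin_search_space(candidates, "purchase", topic_change_likelihood=0.2))
```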
The BE context intent pinning phase 470 may be better understood with respect to an example embodiment of a particular slot-filling operation of BE 102 in which the user has already indicated a particular intent to purchase an item. Within the purchase intent or context identified as a particular contextual intent 474 (e.g., target intent, intent identified by BE 102 from previous user utterance 122), the BE 102 may respond by prompting the user for additional information related to the purchase intent, and the artifact pinning subsystem 250 may analyze subsequently-received user utterances by targeting (e.g., intent-pinning) the entities as likely being slot-filling responses associated with the contextual intent 474. In such embodiments, the BE 102 progresses through an episode of conversation that requests entities from the user that are missing and important for satisfying the contextual intent 474, without redundantly requesting information that the user has already provided.
As another specific example, in response to receiving a particular user utterance 122 of “I want to schedule a meeting at 10:00 am tomorrow,” the NLU framework 104 may identify that the contextual intent 474 is a “schedule meeting” intent that is associated with a “meeting start time” entity within the intent-entity model 108. As such, the BE 102 may provide an agent response to request that the user provide entities for meeting participants, a meeting subject, and/or a meeting end time. Because the NLU framework 104 is performing the BE context intent pinning phase 470, the search space 252 for meaning search may be desirably narrowed to or directed toward the meeting setup intent, thereby improving agent responses by discrediting or disregarding meaning representations 382 that are not associated with the meeting setup intent.
In other words, the artifact pinning subsystem 250 may recognize that a particular intent is currently being discussed or was previously being discussed during the course of conversation based on input from the BE 102. As such, when the user has already requested scheduling of a meeting, the NLU framework 104 may receive a subsequent user utterance 122 indicating a meeting end time, which is identified as including a potential entity without any potential intent (e.g., a noun phrase). To better analyze the user utterance 122, the artifact pinning subsystem 250 desirably associates the contextual intent 474 with the entity of the user utterance 122 and boosts the confidence level (e.g., above a threshold confidence level) to encourage or force intent-specific entity matching, while permitting potential topic changes. This association effectively operates as a contextual intent pinning to facilitate high-quality and context-relevant meaning searches.
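The association described here, in which an entity-only follow-up utterance is attached to the contextual intent and its confidence is boosted, might be sketched as below; the parse dictionary, confidence values, and threshold are hypothetical assumptions for illustration.

```python
def pin_entity_only_utterance(parse, contextual_intent, min_confidence=0.8):
    """Attach the contextual intent to an utterance that carries only entities,
    boosting confidence so intent-specific entity matching is encouraged."""
    if parse.get("intent") is None and parse.get("entities"):
        parse = dict(parse, intent=contextual_intent)
        parse["confidence"] = max(parse.get("confidence", 0.0), min_confidence)
    return parse

# Follow-up utterance such as "11:00 am" parsed as an entity with no intent.
follow_up = {"intent": None, "entities": ["11:00 am"], "confidence": 0.4}
print(pin_entity_only_utterance(follow_up, "schedule meeting"))
```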
Technical effects of the present disclosure include providing an agent automation framework that implements an artifact pinning subsystem that controls operation of the meaning extraction subsystem and the meaning search subsystem to expand and subsequently narrow a search space used during intent inference. During generation of the search space, the artifact pinning subsystem may determine multiple different understandings of sample utterances within one or multiple intent-entity models by performing vocabulary adjustment and varied part-of-speech assignment to each sample utterance, thus generating a potentially-sizeable quantity of candidates for inclusion within the search space. To prune the candidates, the disclosed artifact pinning subsystem leverages artifact correlations of the sample utterances to discard meaning representations that are not valid representations of a particular sample utterance. That is, the sample utterances generally each belong to an intent that may have been labeled (e.g., by an author or ML-based annotation subsystem) with a particular entity, within the structure defined by the related intent-entity model. The artifact pinning subsystem analyzing the various meaning representations for the identified intent may therefore identify a set of meaning representations that are associated with the particular intent and include a respective entity corresponding to the labeled entity of a corresponding sample utterance. The artifact pinning subsystem may then re-express the set of meaning representations by altering the arrangement or number of nodes of the meaning representations in the set, remove any duplicate candidates, and generate the search space based on the remaining meaning representations with the appropriate pinned entity. During a meaning search based on a received utterance from a user, the search keys may be formed by generating multiple potential meaning representations of the received user utterance. The search space may also be generated or refined during the meaning search with respect to an inferenced, contextual intent or current topic of conversation between the user and a behavior engine, guiding further targeted pruning of the search space. As such, the artifact pinning subsystem improves the quality of meaning searches by removing unviable and irrelevant meaning representations, or by assigning them similarity scores below a threshold, during search space compilation and inference processes.
The specific embodiments described above have been shown by way of example, and it should be understood that these embodiments may be susceptible to various modifications and alternative forms. It should be further understood that the claims are not intended to be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.
The techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ”, it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). However, for any claims containing elements designated in any other manner, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).
This application claims priority from and the benefit of U.S. Provisional Application No. 62/869,811, entitled “PINNING ARTIFACTS FOR EXPANSION OF SEARCH KEYS AND SEARCH SPACES IN A NATURAL LANGUAGE UNDERSTANDING (NLU) FRAMEWORK,” filed Jul. 2, 2019, which is incorporated by reference herein in its entirety for all purposes. This application is also related to U.S. Provisional Application No. 62/869,864, entitled “SYSTEM AND METHOD FOR PERFORMING A MEANING SEARCH USING A NATURAL LANGUAGE UNDERSTANDING (NLU) FRAMEWORK”; U.S. Provisional Application No. 62/869,817, entitled “PREDICTIVE SIMILARITY SCORING SUBSYSTEM IN A NATURAL LANGUAGE UNDERSTANDING (NLU) FRAMEWORK”; and U.S. Provisional Application No. 62/869,826, entitled “DERIVING MULTIPLE MEANING REPRESENTATIONS FOR AN UTTERANCE IN A NATURAL LANGUAGE UNDERSTANDING (NLU) FRAMEWORK,” which were each filed Jul. 2, 2019 and are incorporated by reference herein in their entirety for all purposes.