This application claims priority to and the benefit of Korean Patent Application No. 2023-0108034, filed on Aug. 18, 2023, the disclosure of which is incorporated herein by reference in its entirety.
The present disclosure relates to a device and method for tracking a multidomain dialogue state, and more particularly, to a device and method for tracking a multidomain dialogue state using a hierarchical slot selector and a dual dynamic graph.
Task-oriented dialogue (TOD) systems aim to efficiently accomplish specific user needs, such as a restaurant reservation, hailing a taxi, and the like, from users' dialogues. A TOD system should be able to understand a user's request as expressed in a multiturn dialogue with the user, and generate an appropriate response to the user's request such that the user may achieve his or her goal through a natural dialogue with the TOD system.
To this end, TOD systems may perform dialogue state tracking (DST). DST tracks the user's request, which changes dynamically over a multiturn dialogue between the user and a TOD system, as a dialogue state expressed using triplets of domain, slot, and value, making it possible to accurately understand the user's intention. In other words, in DST, aims are extracted from dialogues and set as domains, an attribute that each domain may have is set as a slot, and state values of attributes extracted from the dialogues are allocated to slots, such that a dialogue state is tracked for each domain representing an aim. Here, DST involves changing the state value of each slot in each domain according to the content of the dialogue.
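As an illustrative, non-limiting sketch of the domain-slot-value bookkeeping described above, a dialogue state may be modeled as a mapping from (domain, slot) pairs to values, with slots carried over between turns unless the current turn updates them. The domain, slot, and value names below are examples only and are not drawn from any particular dataset.

```python
# Hypothetical sketch: a dialogue state as (domain, slot) -> value pairs.
# All names and values below are illustrative, not taken from the disclosure.
def update_state(state, turn_updates):
    """Carry the previous state over and overwrite slots mentioned this turn."""
    new_state = dict(state)          # inherit previous values
    new_state.update(turn_updates)   # update slots changed by the current turn
    return new_state

state = {}
state = update_state(state, {("hotel", "area"): "center"})
state = update_state(state, {("taxi", "destination"): "Fitzbillies restaurant",
                             ("hotel", "area"): "center"})
```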
A TOD system may include a separate DST device for the purpose of DST. The DST device continuously tracks and updates a user's intention, whose details, such as a location or dates for changing a hotel reservation, change frequently during a dialogue with the user, such that the TOD system can easily determine, for example, which hotel the user wants to stay at from the current dialogue.
Therefore, the DST device should be able to consistently track the user's intention across various domains during the multiturn dialogue. However, in multiturn and multidomain dialogues, state values of slots may have the same indicating words. For example, during a dialogue about traveling, state values of the departure point and destination slots in the domain "taxi" may be the same as the name of a hotel or restaurant mentioned in a previous dialogue turn. Also, slots may have multiple features, and thus it is necessary to distinguish between values that states may have. Therefore, graph model-based DST devices according to the related art, which only learn domain-slot relationships in consideration of a single feature of a slot without understanding the entire dialogue context, are inappropriate for tracking values with the same indicating words.
The present disclosure is directed to providing a dialogue state tracking (DST) device and method for accurately tracking a dialogue state by hierarchically classifying each slot in consideration of multiple features of domain-specific slots extracted from a dialogue with a user.
The present disclosure is also directed to providing a DST device and method for extracting a state value of an appropriate domain slot for a user's intention using a dual dynamic graph neural network.
According to an aspect of the present disclosure, there is provided a device for DST including a memory and a processor configured to execute at least some of operations based on a program stored in the memory. In dialogue states including pairs of each of multiple slots and a state value according to turn-specific dialogues of a multiturn dialogue, the processor determines whether a state value of each of the multiple slots is updated to classify the multiple slots as inherit slots and update slots, determines whether state values that the update slots may have are limited to multiple predefined possible values to hierarchically classify the update slots as categorical slots and span slots, and acquires and updates a state value of each of the categorical slots and the span slots using a possible value index and a dialogue index which are extracted by separately generating a value graph for the classified categorical slots and a dialogue graph for the span slots and performing neural network computation on the value graph and the dialogue graph.
The processor may extract the turn-specific dialogues from the multiturn dialogue based on a turn order, receive the turn-specific dialogues based on the turn order with the dialogue states which are tracked according to turn-specific dialogues until a previous turn, and encode the turn-specific dialogues and the dialogue states to generate multiple dialogue vectors and multiple slot vectors.
The processor may further acquire a possible value set including all state values that the span slots among the multiple slots may have and encode the acquired possible value set to generate multiple possible value vectors.
The processor may classify, on the basis of the multiple slot vectors, the multiple slots as the update slots of which state values are to be updated according to a current turn-specific dialogue and the inherit slots of which state values are to be carried over and may classify the update slots as the categorical slots which may have state values limited to the multiple possible values and the span slots which may have unlimited state values.
The processor may generate the value graph by connecting, using edges, multiple dialogue nodes, multiple slot nodes, and multiple possible value nodes, which are generated by projecting the multiple dialogue vectors, the multiple slot vectors, and multiple possible value vectors acquired by encoding the multiple possible values to an embedding space, and may generate the dialogue graph by connecting, using edges, multiple dialogue nodes and multiple slot nodes, which are generated by projecting the multiple dialogue vectors and the multiple slot vectors to the embedding space.
In the value graph, the processor may connect the multiple slot nodes to the multiple dialogue nodes and the multiple possible value nodes according to detailed formats of the slots using edges. In the dialogue graph, the processor may connect the multiple dialogue nodes based on the turn order using edges, connect a dialogue node for a current turn-specific dialogue to all other dialogue nodes using edges, and connect each of the multiple slot nodes to a dialogue node of a turn in which a state value of the dialogue state is updated using an edge.
The processor may estimate a weight of each edge by performing neural network computation on the value graph, determine a possible value that is most likely to be paired with each slot among the multiple possible values, and extract the possible value index, and may estimate a weight of each edge by performing neural network computation on the dialogue graph, determine a turn-specific dialogue that has the closest correlation with each slot among the multiple turn-specific dialogues, and extract the dialogue index.
The processor may receive and encode the turn-specific dialogues, the dialogue states, the possible value indexes, the dialogue indexes, and the multiple possible values using neural network computation to acquire a context representation vector.
The processor may perform neural network computation on the context representation vector to select and acquire a state value to be updated in the categorical slots from among the multiple possible values.
The processor may perform neural network computation on the context representation vector to extract a state value to be updated in the span slots from a current turn-specific dialogue.
According to another aspect of the present disclosure, there is provided a method for DST performed by a processor, the method including, in dialogue states including pairs of each of multiple slots and a state value according to turn-specific dialogues of a multiturn dialogue, classifying the multiple slots as inherit slots and update slots by determining whether a state value of each of the multiple slots is updated, and hierarchically classifying the update slots as categorical slots and span slots by determining whether state values that the update slots may have are limited to multiple predefined possible values, and acquiring and updating a state value of each of the categorical slots and the span slots using a possible value index and a dialogue index which are extracted by separately generating a value graph for the classified categorical slots and a dialogue graph for the span slots and performing neural network computation on the value graph and the dialogue graph.
The above and other objects, features and advantages of the present disclosure will become more apparent to those of ordinary skill in the art by describing exemplary embodiments thereof in detail with reference to the accompanying drawings, in which:
Hereinafter, specific embodiments of the present disclosure will be described with reference to the drawings. The following detailed description is provided to help with comprehensive understanding of a method, a device, and/or a system described in this specification. However, this is only an example, and the present invention is not limited thereto.
In describing embodiments of the present disclosure, when it is determined that detailed description of well-known technologies related to the present invention may unnecessarily obscure the gist of embodiments, the detailed description will be omitted. Terms to be described below are terms defined in consideration of functions in the present invention, and may vary depending on the intention, practice, or the like of a user or operator. Therefore, the terms should be defined on the basis of the overall content of this specification. Terms used in the detailed description are only used to describe embodiments and should not be construed as limiting. Unless otherwise clearly specified, a singular expression includes the plural meaning. In this description, an expression such as “include” or “have” is intended to indicate certain features, numerals, steps, operations, elements, or some or combinations thereof, and should not be construed as excluding the presence or possibility of one or more other features, numerals, steps, operations, elements, or some or combinations thereof. Also, the terms “unit,” “device,” “module,” “block,” and the like described in this specification refer to units for processing at least one function or operation, which may be implemented by hardware, software, or a combination of hardware and software.
As described above, a task-oriented dialogue (TOD) system may include a DST device, and the DST device extracts an aim from a multiturn dialogue between a user and the TOD system to set the aim as a domain, sets an attribute that each domain may have as a slot, and assigns a state value for an attribute extracted from a turn-specific dialogue to a slot, tracking a change in the state value of a slot for each domain representing the aim during the dialogue.
In the example of
The DST device extracts a state value of each slot set for each domain from a multiturn dialogue and matches the state value to a corresponding slot, thereby allocating the state value to the slot or updating a previously allocated state value.
In the multiturn dialogue shown in
The same indicating words and similar indicating words frequently appear in a multiturn dialogue. Accordingly, the DST device should extract "center" and "Fitzbillies restaurant" as a state value of the area slot of the hotel domain and a state value of the destination slot of the taxi domain but may incorrectly extract "east" and "Gonville hotel."
To prevent such a DST error, in consideration of multiple features of slots, the DST device of the present disclosure not only classifies each slot as an update slot of which a value is updated according to the context of a dialogue or an inherit slot which inherits a previous value of the slot but also hierarchically classifies each slot as a categorical slot or a span slot according to schema information. For a categorical slot, multiple possible values are set in advance as selectable state values, and a state value may be selected and set only from among the set possible values. For a span slot, no possible value is provided, and a state value may be extracted from a dialogue and set.
Also, a dual dynamic graph neural network is used for classified slots to understand the context of a dialogue such that a state value of an appropriate domain slot for a user's intention can be correctly extracted.
Referring to
The encoder 10 receives a dialogue Dt of a specific turn t and an already acquired dialogue state Bt−1 tracked until a previous turn t−1 from the entire dialogue history (D={D1, D2, . . . , DT}) including multiple turns t (here, t∈{1, 2, . . . , T}) with a user, acquires an input sequence Xt by combining the dialogue Dt with the dialogue state Bt−1, and acquires turn-specific dialogue vectors and slot vectors by encoding the acquired input sequence Xt.
The encoder 10 may include a dialogue collection module 11, an input sequence creation module 12, and a first embedding module 13. The dialogue collection module 11 acquires the dialogue history D of all the T turns with the user. Then, the dialogue collection module 11 divides the acquired entire dialogue history D by turn to extract a turn-specific dialogue Dt. The turn-specific dialogue Dt may be represented as Dt=Rt⊕;⊕Ut⊕[SEP]. Here, Ut may be user speech, and Rt may be a system response. Also, ";" is a speaker distinguishment token for distinguishing between the user speech Ut and the system response Rt, and [SEP] is a turn end token for indicating the end of a dialogue turn. ⊕ is a concatenation operator.
The input sequence creation module 12 sequentially receives the turn-specific dialogues Dt from the dialogue collection module 11, receives a dialogue state Bt−1 of a previous turn t−1 which is tracked from dialogues D1 to Dt−1 until the previous turn t−1 and stored in the dialogue state storage module 50, and combines the turn-specific dialogue Dt with the dialogue state Bt−1 of the previous turn t−1 to acquire an input sequence Xt of the current dialogue turn t.
Here, the input sequence Xt may be acquired according to Equation 1:

Xt=[CLS]⊕Dt⊕Bt−1  [Equation 1]
Here, [CLS] is a turn distinguishment token, a special token for distinguishing dialogue turns t in the dialogue history D of all the T turns. The turn distinguishment token [CLS] may be added when a turn-specific dialogue Dt is applied to the input sequence creation module 12 or when the dialogue collection module 11 extracts a turn-specific dialogue Dt from the dialogue history D.
The dialogue state Bt of each turn t stored in the dialogue state storage module 50 may be a set of slot-state-value pairs Btj=[SLOT]j⊕Sj⊕−⊕Vj, each of which is obtained by combining a slot Sj and a state value Vj. Here, "-" is a value distinguishment token for distinguishing between the slot Sj and the state value Vj in a jth slot-state-value pair Btj, and [SLOT]j is a slot distinguishment token for distinguishing the jth slot-state-value pair Btj from other slot-state-value pairs. Here, each slot Sj may be set as a combination of a domain name and a slot name, such as restaurant-area or taxi-destination in the example of
When the number of slots which are trackable by the DST device is J (e.g., 30), a dialogue state Bt may be a set Bt=Bt1⊕ . . . ⊕BtJ of J slot-state-value pairs (Bt1, Bt2, . . . , BtJ). Therefore, a dialogue state Bt of each turn t may be represented as Bt={(Sj, Vj)|1≤j≤J}.
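As an illustrative, non-limiting sketch of the sequence construction described above, the special tokens may be concatenated as plain strings. The exact token strings and utterances below are assumptions made for the sketch, not the disclosed tokenizer vocabulary.

```python
# Illustrative assembly of the input sequence X_t from the special tokens
# described above; the token strings and utterances are assumptions.
def build_turn(system_response, user_utterance):
    # D_t = R_t (+) ";" (+) U_t (+) [SEP]
    return f"{system_response} ; {user_utterance} [SEP]"

def build_state(slot_value_pairs):
    # B_{t-1} = concatenation of [SLOT] S_j - V_j pairs
    return " ".join(f"[SLOT] {s} - {v}" for s, v in slot_value_pairs)

def build_input_sequence(turn, prev_state):
    # X_t = [CLS] (+) D_t (+) B_{t-1}
    return f"[CLS] {turn} {prev_state}"

x_t = build_input_sequence(
    build_turn("Which area do you prefer?", "Somewhere in the center."),
    build_state([("hotel-area", "center"), ("taxi-destination", "none")]),
)
```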
The input sequence creation module 12, which may be omitted, is an element for inputting a turn-specific dialogue Dt and a dialogue state Bt−1 of a previous turn t−1 to the first embedding module 13.
The first embedding module 13 may receive the input sequence Xt obtained by combining the turn-specific dialogues Dt with the dialogue state Bt−1 of the previous turn t−1 and encode the input sequence Xt to generate a turn-specific vector and a slot vector. The first embedding module 13 may be implemented as an artificial neural network, specifically, a pretrained language model (PrLM). Here, the first embedding module 13 may generate a turn-specific dialogue vector from the turn-specific dialogue Dt of the input sequence Xt and separate slot-state-value pairs Bt−11, Bt−12, . . . , Bt−1J according to a slot distinguishment token [SLOT]j to generate multiple different slot vectors.
Meanwhile, the encoder 10 may further include a possible value setting module 14. The possible value setting module 14 may provide, to the input sequence creation module 12, possible values p1 to pn representing all possible state values that a slot may have. As described above, slots may be classified as categorical slots and span slots on the basis of characteristics of the state values that the slots may have. In the case of a categorical slot, a state value may be extracted from a dialogue, or a state value may be selected from among the multiple possible values p1 to pn which are set in advance. Therefore, in the case of a categorical slot, the possible values p1 to pn, which are the state values that each slot may have, are provided in advance such that a dialogue state can be tracked more effectively and accurately.
Accordingly, the possible value setting module 14 may transmit the multiple possible values p1 to pn to the input sequence creation module 12, and the input sequence creation module 12 may create the input sequence Xt by combining the multiple possible values p1 to pn with the turn-specific dialogues Dt and the dialogue state Bt−1. As shown in
Meanwhile, the hierarchical slot selector 20 receives the multiple slot vectors applied from the encoder 10 to hierarchically classify a feature of each slot. The hierarchical slot selector 20 may include a state update predictor 21 and a slot classifier 22.
As shown in
As an example, when a jth slot vector is applied, the state update predictor 21 estimates an update score Total_scorej for the jth slot vector. When the update score Total_scorej estimated as shown in Equation 2 exceeds a threshold δ, the state update predictor 21 determines the jth slot Sj of the dialogue state Bt as an update slot paired with a state value Vj which is to be updated on the basis of a tth turn dialogue Dt. On the other hand, when the update score Total_scorej is the threshold δ or less, the state update predictor 21 determines that the jth slot Sj is an inherit slot, that is, a state value Vj of the jth slot Sj in the previous dialogue state Bt−1 is to be carried over in the dialogue state Bt of the current turn t.
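As an illustrative, non-limiting sketch of the update/inherit decision described above, a per-slot update score may be compared against the threshold δ. The scoring values below are stand-ins for the output of the neural state update predictor, not actual model outputs.

```python
# Sketch of the update/inherit decision: each slot's update score is
# compared against a threshold delta. The scores are illustrative numbers
# standing in for the neural state update predictor's output.
def classify_slots(update_scores, delta=0.5):
    update, inherit = [], []
    for slot, score in update_scores.items():
        (update if score > delta else inherit).append(slot)
    return update, inherit

update_set, inherit_set = classify_slots(
    {"hotel-area": 0.91, "taxi-destination": 0.73, "restaurant-food": 0.12})
```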
When each of multiple slots is classified as an update slot or an inherit slot, the state update predictor 21 selects a set (Us={j|SUP(Sj)=update}) of update slots among the classified slots and transmits the update slot set Us to the slot classifier 22. In the case of an inherit slot, a state value Vj of a previous dialogue state Bt−1 is to be carried over to a current dialogue state Bt, and thus no additional task is required. Accordingly, in the present disclosure, no additional task is performed on inherit slots, which reduces computational cost.
When the update slot set Us is received from the state update predictor 21, the slot classifier 22 classifies the update slots of the received update slot set Us as categorical slots and span slots.
The slot classifier 22 may classify slots in various ways and may be implemented as an artificial neural network to classify slots. As an example, settable types, subtypes, names, and the like of slots have already been proposed in various studies, as shown in Table 1, on the basis of MultiWOZ 2.2, which is a representative dataset used for DST, and are being used; it is therefore assumed herein that slots are classified on the basis of these related-art studies.
For the categorical slots of Table 1, sets of possible values which are settable depending on the subtype may be defined in advance and provided as shown in Table 2. Different possible value sets are provided according to the subtypes of slots.
As shown in Tables 1 and 2, the slot classifier 22 may classify slots that may have limited state values as categorical slots, and classify other slots as span slots.
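As an illustrative, non-limiting sketch of this schema-based split, an update slot whose possible values are predefined may be treated as categorical and any other update slot as span. The possible value sets below are illustrative examples in the spirit of Table 2, not the actual table contents.

```python
# Schema-based split of update slots: slots with predefined possible-value
# sets become categorical, the rest span. The sets below are illustrative.
POSSIBLE_VALUES = {
    "hotel-area": {"north", "south", "east", "west", "center"},
    "hotel-parking": {"yes", "no"},
}

def split_update_slots(update_slots, possible_values=POSSIBLE_VALUES):
    categorical = [s for s in update_slots if s in possible_values]
    span = [s for s in update_slots if s not in possible_values]
    return categorical, span

cat_slots, span_slots = split_update_slots(["hotel-area", "hotel-name"])
```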
As shown in the example of
On the other hand, in the case of the hotel-name slot, no possible value is provided, and a label extractor 44 extracts the hotel name “Gonville Hotel” from the dialogue later. Accordingly, the slot classifier 22 may determine the hotel-name slot to be a span slot.
When the update slots of the update slot set Us are classified as categorical slots and span slots, the slot classifier 22 checks the number nc of classified categorical slots and the number ns of classified span slots. Since the sum of the number nc of classified categorical slots and the number ns of classified span slots is the number of update slots of the update slot set Us, |Us|=nc+ns.
The dual dynamic graph neural network module 30 generates a value graph and a dialogue graph according to the categorical slots and span slots classified by the slot classifier 22 and performs neural network computation on each of the generated value graph and dialogue graph to extract a value index pn
The dual dynamic graph neural network module 30 may include a value graph generation module 31, a value graph neural network model 32, a dialogue graph generation module 33, and a dialogue graph neural network model 34.
The value graph generation module 31 generates a value graph by embedding a slot vector, turn-specific dialogue vectors, and multiple possible value vectors, which are acquired from the first embedding module 13 for each slot classified as a categorical slot by the slot classifier 22, into an embedding space.
The value graph generation module 31 generates multiple dialogue nodes Nd by projecting multiple dialogue vectors according to turn-specific dialogues D1 to Dt until the current turn t to the virtual embedding space, generates multiple slot nodes Ns by projecting slot vectors according to multiple categorical slots, and generates multiple possible value nodes Nv by projecting multiple possible value vectors.
Then, the value graph generation module 31 connects each of the multiple dialogue nodes Nd to each of the multiple slot nodes Ns using an edge e and connects each slot node Ns, using edges, to the possible value nodes Nv of the possible values which may be paired with the slot of that slot node Ns. Here, as shown in
A graph generated in this way may be defined using a weighted graph (G=(V, E)), which includes an entire node set V including the dialogue nodes Nd, the slot nodes Ns, and the possible value nodes Nv and an edge set E including multiple edges connecting all nodes of the node set V to each other, and an N×N binary symmetric adjacency matrix S which represents whether there is an edge e connecting two nodes vi and vj of the node set V in the weighted graph (G=(V, E)). Here, N is the sum (N=Nd+Ns+Nv) of the number of dialogue nodes Nd, the number of slot nodes Ns, and the number of possible value nodes Nv, and each element [S]ij of the binary symmetric adjacency matrix S has a binary value as an element value according to whether there is an edge eij connecting two nodes vi and vj. For example, when there is no edge eij connecting two nodes vi and vj ((vi, vj)∉eij), an element value of the element [S]ij is 0 ([S]ij=0), and when there is an edge eij, the element value is 1 ([S]ij=1).
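As an illustrative, non-limiting sketch of the binary symmetric adjacency matrix S for a value graph, slot nodes may be connected to every dialogue node and to the possible-value nodes that can pair with the slot. The node ordering (dialogue nodes, then slot nodes, then possible value nodes) is an assumption of this sketch.

```python
# Minimal construction of the binary symmetric adjacency matrix S for a
# value graph. Node ordering (dialogues, slots, values) is an assumption.
def value_graph_adjacency(n_dialogue, slot_to_values, n_values):
    n_slots = len(slot_to_values)
    n = n_dialogue + n_slots + n_values
    S = [[0] * n for _ in range(n)]
    for k, (_, values) in enumerate(slot_to_values.items()):
        s = n_dialogue + k                      # index of the slot node
        for d in range(n_dialogue):             # slot <-> every dialogue node
            S[s][d] = S[d][s] = 1
        for v in values:                        # slot <-> its possible values
            v_idx = n_dialogue + n_slots + v
            S[s][v_idx] = S[v_idx][s] = 1
    return S

# Two dialogue turns, one categorical slot paired with two possible values.
S = value_graph_adjacency(2, {"hotel-area": [0, 1]}, 2)
```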
Each node vi in the weighted graph (G=(V, E)) corresponds to an F-dimension input vector (xi∈ℝF) (a slot vector, a turn-specific dialogue vector, or a possible value vector) extracted from the first embedding module 13 of the encoder 10. Accordingly, a value graph generated by the value graph generation module 31 may be represented using an N×F input matrix X including the input vectors xi and the binary symmetric adjacency matrix S.
Therefore, the value graph neural network model 32 receives the value graph (X;S) represented using the input matrix X and the binary symmetric adjacency matrix S and performs neural network computation on the value graph to extract a value index pn
The value graph neural network model 32 may be implemented as a neural network including at least one graph attention (GAT) layer and estimates a weight W of each edge e by performing neural network computation on the received value graph. Then, the value graph neural network model 32 extracts a possible value index pn
Meanwhile, the dialogue graph generation module 33 generates a dialogue graph by embedding a slot vector and turn-specific dialogue vectors, which are acquired by the first embedding module 13 for each slot classified as a span slot by the slot classifier 22, into the virtual embedding space.
The dialogue graph generation module 33 generates multiple dialogue nodes Nd by projecting multiple dialogue vectors according to turn-specific dialogues D1 to Dt until the current turn t to the virtual embedding space and generates multiple slot nodes Ns by projecting slot vectors according to multiple span slots. In the case of a span slot, there is no possible value, and thus the dialogue graph generation module 33 generates no possible value node unlike the value graph generation module 31.
The dialogue graph generation module 33 connects the multiple dialogue nodes Nd to each other using edges e based on a turn order. However, a dialogue node Nd for the current turn among the multiple dialogue nodes Nd is connected to all other dialogue nodes Nd using edges e. Also, each of the multiple dialogue nodes Nd is connected to a slot node Ns, specifically, a slot node Ns of a slot of which a state value is updated according to the turn-specific dialogues D1 to Dt, using an edge e. In the dialogue graph, there is no edge e connecting the slot nodes Ns.
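As an illustrative, non-limiting sketch of the dialogue graph edge pattern just described, consecutive turn nodes may be linked, the current-turn node linked to every other turn node, and each span-slot node linked only to the turn in which its value was updated. The node indexing is an assumption of this sketch.

```python
# Sketch of the dialogue graph: consecutive turns linked, current turn
# linked to all others, each slot node linked to its updating turn only,
# and no slot-to-slot edges. Node indexing is an assumption.
def dialogue_graph_adjacency(n_turns, slot_update_turns):
    n = n_turns + len(slot_update_turns)
    S = [[0] * n for _ in range(n)]
    for t in range(n_turns - 1):                # consecutive turns
        S[t][t + 1] = S[t + 1][t] = 1
    cur = n_turns - 1                           # current-turn node
    for t in range(n_turns - 1):                # current turn <-> all others
        S[cur][t] = S[t][cur] = 1
    for k, turn in enumerate(slot_update_turns):
        s = n_turns + k                         # slot node index
        S[s][turn] = S[turn][s] = 1             # slot <-> updating turn
    return S

# Three turns; one span slot whose value was updated in turn index 1.
S = dialogue_graph_adjacency(3, [1])
```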
The dialogue graph generated by the dialogue graph generation module 33 may also be represented using the input matrix X and the binary symmetric adjacency matrix S like the value graph.
The dialogue graph neural network model 34 receives the dialogue graph (X;S) represented using the input matrix X and the binary symmetric adjacency matrix S and performs neural network computation on the dialogue graph to extract a dialogue turn index dn
The dialogue graph neural network model 34 may also be implemented as a neural network including at least one GAT layer and estimates a weight W of each edge e by performing neural network computation on the received dialogue graph. Then, the dialogue graph neural network model 34 extracts a dialogue turn index dn
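As an illustrative, non-limiting sketch of the index extraction performed on both graphs, once edge weights have been estimated, the neighbour with the largest weight may be selected for each slot node. The weights below are illustrative numbers, not the output of a trained GAT layer.

```python
# Toy stand-in for the index-extraction step after graph attention: given
# estimated edge weights from one slot node to its neighbours, pick the
# neighbour with the largest weight. Weights here are illustrative only.
def extract_index(edge_weights):
    """edge_weights: {neighbour index: estimated edge weight} for one slot node."""
    return max(edge_weights, key=edge_weights.get)

# Value graph: neighbour indices are possible-value nodes.
value_index = extract_index({0: 0.1, 1: 0.7, 2: 0.2})
# Dialogue graph: neighbour indices are turn-specific dialogue nodes.
dialogue_index = extract_index({0: 0.05, 1: 0.15, 2: 0.8})
```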
The state generator 40 receives multiple possible value indexes pn
The state generator 40 may include a sequence recreation module 41, a second embedding module 42, a state value selector 43, and a label extractor 44.
Like the input sequence creation module 12, the sequence recreation module 41 receives a turn-specific dialogue Dt and possible values P1 to Pn of a current turn t and a dialogue state Bt−1 of a previous turn t−1 together and generates a reconfigured sequence C which is obtained by adding a possible value index and a dialogue turn index dn
Here, [SEP] is a sequence distinguishment token added for sequence distinguishment.
The sequence recreation module 41 may receive a turn-specific dialogue Dt and possible values P1 to Pn from the dialogue collection module 11 and the possible value setting module 14.
Like the input sequence creation module 12, the sequence recreation module 41 is an element for combining the turn-specific dialogue Dt and the possible values P1 to Pn of the turn t, the dialogue state Bt−1, the possible value index pn
The second embedding module 42 receives the reconfigured sequence C created by the sequence recreation module 41 and extracts a context representation vector Ht. The second embedding module 42 may be implemented as an artificial neural network and may encode the applied reconfigured sequence C through neural network computation to acquire the context representation vector Ht. The second embedding module 42 may be implemented as, for example, ALBERT (A Lite BERT), which is a pretrained language model.
Since the reconfigured sequence C includes both possible value indexes pn
As shown in
The state value selector 43 receives the context representation vector Ht and selects a state value which is to be updated appropriately for each slot from a possible value set PVj corresponding to a detailed format of each categorical slot. The state value selector 43 may be implemented as an artificial neural network and perform neural network computation on the received context representation vector Ht to select a state value to be updated from the possible value set PVj. The state value selector 43 may have at least one neural network computation layer to extract slot-specific features from the context representation vector Ht, perform a softmax operation on the extracted slot-specific features, and select a possible value y with the highest probability from among the multiple possible values of the possible value set PVj as a state value.
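As an illustrative, non-limiting sketch of this softmax selection over a possible value set, slot-specific scores may be normalized and the most probable value chosen. The logits below are illustrative numbers, not outputs of the state value selector's neural network layers.

```python
# Minimal softmax selection over a possible-value set. The logits are
# illustrative stand-ins for slot-specific features, not model outputs.
import math

def select_value(possible_values, logits):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return possible_values[best], probs[best]

value, prob = select_value(["north", "south", "center"], [0.2, -1.0, 2.3])
```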
Meanwhile, the label extractor 44 receives the context representation vector Ht and extracts a label which is a state value of each span slot. The label extractor 44 may also be implemented as an artificial neural network and perform neural network computation on the received context representation vector Ht to estimate a start position p and an end position q of the label in a turn-specific dialogue Dt of a current turn t. A label that represents a state value of a span slot does not have any designated format. Accordingly, a label is not extracted in a limited format such as a word, a phrase, or the like. Therefore, the label extractor 44 has neural network layers which are configured in parallel to estimate a start position p and an end position q of a label which is a state value of each span slot.
Since span slots are included in update slots through hierarchical slot classification, a state value to be updated in a span slot may be considered to be included in a turn-specific dialogue Dt of a turn t. Accordingly, the label extractor 44 may estimate a start position p and an end position q in the turn-specific dialogue Dt of the current turn t to acquire a state value of a span slot in the form of a partial section D1 (p:q) of the turn-specific dialogue Dt of the current turn t.
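As an illustrative, non-limiting sketch of the start/end position estimation described above, the argmax start position p and an end position q at or after it may be taken over the tokens of the current turn, yielding the span Dt(p:q). The token list and scores below are illustrative, not model outputs.

```python
# Sketch of span-label extraction: argmax start/end positions over the
# tokens of the current turn. Tokens and scores are illustrative only.
def extract_span(tokens, start_scores, end_scores):
    p = max(range(len(tokens)), key=start_scores.__getitem__)
    q = max(range(p, len(tokens)), key=end_scores.__getitem__)  # end >= start
    return " ".join(tokens[p:q + 1])

tokens = ["book", "the", "Gonville", "Hotel", "please"]
label = extract_span(tokens,
                     [0.1, 0.0, 0.9, 0.2, 0.0],
                     [0.0, 0.1, 0.2, 0.8, 0.1])
```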
The state value selector 43 and the label extractor 44 configured as artificial neural networks are trained in advance, and lossc and losse for training the state value selector 43 and the label extractor 44 may be set according to Equations 4 and 5.
Here, p̂ and q̂ are predicted probabilities of all possible start positions and end positions in a turn-specific dialogue Dt, and ŷ is a predicted probability of a possible value.
Losses calculated on the basis of training data according to Equations 4 and 5 may be backpropagated such that the state value selector 43 and the label extractor 44 may be trained.
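Equations 4 and 5 are not reproduced in this text. A conventional cross-entropy form consistent with the surrounding description, in which ŷ is the predicted probability over the possible values and p̂, q̂ are the predicted start- and end-position probabilities, would be (this reconstruction is an assumption, not the verbatim equations):

```latex
% Hedged reconstruction of Equations 4 and 5 as cross-entropy losses;
% y, p, q denote the one-hot ground-truth value, start, and end labels.
\mathrm{loss}_c = -\sum_{y} y \log \hat{y}, \qquad
\mathrm{loss}_e = -\Bigl(\sum_{p} p \log \hat{p} + \sum_{q} q \log \hat{q}\Bigr)
```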
The dialogue state storage module 50 receives state values (y, (p, q)) which are separately acquired from the state generator 40 according to categorical slots and span slots, pairs the slots with the state values, and stores the dialogue state Bt according to the turn-specific dialogues D1 to Dt until the current turn t in the entire dialogue. Then, the dialogue state storage module 50 transmits the stored dialogue state Bt to the input sequence creation module 12 and the sequence recreation module 41 such that a dialogue state Bt+1 according to a turn-specific dialogue Dt+1 of the subsequent turn t+1 is acquired.
The dialogue state storage module 50 is separated here to facilitate understanding, but it may be included in the encoder 10. Also, the dialogue state storage module 50, the dialogue collection module 11, and the possible value setting module 14 may be separated as input modules. The input modules may be implemented as storage devices, such as memories, but are not limited thereto.
As a result, the DST device according to the present disclosure divides an entire dialogue into turns; first classifies slots extracted from an acquired previous dialogue state and the turn-specific dialogues as update slots, of which state values are to be updated, and inherit slots, of which state values are not to be updated; and then hierarchically classifies the update slots as categorical slots, which may have only limited state values, and span slots, which may have unlimited state values. Then, the DST device generates a dual dynamic graph consisting of a value graph and a dialogue graph for the classified categorical slots and span slots, respectively, and performs neural network computation on the dual dynamic graph to extract possible value indexes pn
The DST device according to the present disclosure hierarchically classifies slots and estimates, for each classified slot, a state value appropriate for the dialogue context on the basis of the dual dynamic graph, thereby tracking a dialogue state very accurately.
In the embodiments shown in the drawings, each element may have functions and capabilities other than those described above, and additional elements other than those described above may be included. Also, in an exemplary embodiment, each element may be implemented by one or more physically divided devices, one or more processors, or a combination of one or more processors and software. Unlike the illustrated examples, elements may not be clearly divided in terms of specific operations.
The DST device shown in
In addition, the DST device may be installed in a computing device or server with hardware elements in the form of software, hardware, or a combination thereof. The computing device or server may be various devices including all or some of a communication device, such as a communication modem, for communicating with various types of equipment or wired or wireless communication networks, a memory for storing data for executing a program, a microprocessor for executing the program for computation and instructing, and the like.
The DST method of
When the turn-specific dialogue Dt and the dialogue state Bt−1 are acquired, the acquired turn-specific dialogue Dt and dialogue state Bt−1 are encoded using neural network computation to generate a dialogue vector and a slot vector (72). Also, when the possible values p1 to pn are acquired together therewith, a possible value vector may also be generated.
When the dialogue vector and the slot vector are generated, slots are hierarchically classified on the basis of the generated dialogue vector and slot vector (73). Here, the slots may be first classified as update slots and inherit slots depending on whether state values paired with the slots are to be updated according to the turn-specific dialogue Dt of the current turn t, and the classified update slots may be hierarchically classified as categorical slots which may have state values limited to possible values and span slots which may have unlimited state values.
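The two-level classification described above can be sketched as follows. The decision functions `is_update` and `is_categorical` are hypothetical stand-ins for the trained slot selector, and the slot names and decisions are illustrative assumptions.

```python
# Hierarchical slot classification sketch: slots are first split into update
# vs. inherit slots, and the update slots are then split into categorical
# (limited possible values) vs. span (unlimited values) slots.

def classify_slots(slots, is_update, is_categorical):
    update, inherit = [], []
    for s in slots:
        (update if is_update(s) else inherit).append(s)
    categorical = [s for s in update if is_categorical(s)]
    span = [s for s in update if not is_categorical(s)]
    return inherit, categorical, span

# Illustrative slots and decision rules (assumed, not learned here).
slots = ["hotel-area", "hotel-name", "taxi-departure"]
inherit, cat, span = classify_slots(
    slots,
    is_update=lambda s: s != "hotel-area",           # assumed selector output
    is_categorical=lambda s: s == "taxi-departure",  # assumed selector output
)
# inherit == ["hotel-area"]; cat == ["taxi-departure"]; span == ["hotel-name"]
```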
When the slots are hierarchically classified, a dual dynamic graph is generated according to the classified categorical slots and span slots (74). Here, the dual dynamic graph is generated as two graphs, that is, a value graph for the classified categorical slots and a dialogue graph for the span slots. The value graph includes multiple dialogue nodes Nd, multiple slot nodes Ns, multiple possible value nodes Nv, and edges connecting the nodes on the basis of the turn-specific dialogue vectors up to the current turn t and the slot vectors and possible value vectors for the categorical slots. In the value graph, each slot node Ns may be connected to the multiple dialogue nodes Nd and the multiple possible value nodes Nv using edges e, and there is no edge e between the dialogue nodes Nd, between the slot nodes Ns, or between the possible value nodes Nv.
The dialogue graph includes multiple dialogue nodes Nd and multiple slot nodes Ns on the basis of the turn-specific dialogue vectors up to the current turn t and the slot vectors for the span slots. The multiple dialogue nodes Nd are basically connected to each other through edges in turn order, and the dialogue node Nd for the turn-specific dialogue Dt of the current turn t is connected to all other dialogue nodes through edges e. Each dialogue node Nd is also connected through an edge to the slot node Ns of any slot whose state value is updated according to the corresponding turn-specific dialogue.
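The connectivity rules of the two graphs can be sketched as adjacency lists. This is a structural illustration only: the node labels (Nd, Ns, Nv) and the `updates` mapping from turns to updated slot ids are assumptions for the sketch, and no learned vectors are involved.

```python
# Edge-set construction for the dual dynamic graph, following the rules above.

def build_value_graph(n_dialogues, n_slots, n_values):
    """Each slot node connects to every dialogue node and every possible-value
    node; there are no edges among dialogue, slot, or value nodes themselves."""
    edges = []
    for s in range(n_slots):
        edges += [(f"Ns{s}", f"Nd{d}") for d in range(n_dialogues)]
        edges += [(f"Ns{s}", f"Nv{v}") for v in range(n_values)]
    return edges

def build_dialogue_graph(n_turns, updates):
    """Consecutive dialogue nodes are chained in turn order, the current-turn
    node is connected to all earlier nodes, and each dialogue node links to
    the slot nodes it updates (`updates` maps turn index -> slot ids)."""
    edges = [(f"Nd{t}", f"Nd{t + 1}") for t in range(n_turns - 1)]
    # Extra edges from the current turn to non-adjacent earlier turns.
    edges += [(f"Nd{n_turns - 1}", f"Nd{t}") for t in range(n_turns - 2)]
    edges += [(f"Nd{t}", f"Ns{s}") for t, slots in updates.items() for s in slots]
    return edges

value_edges = build_value_graph(n_dialogues=2, n_slots=1, n_values=3)
dialogue_edges = build_dialogue_graph(n_turns=3, updates={2: [0]})
```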
When the value graph and the dialogue graph are generated, neural network computation is performed on each of them to estimate the relationships between nodes and assign weights W to the edges, and the possible value or dialogue having the closest correlation with each slot according to the assigned weights is estimated to extract possible value indexes pn
When the possible value indexes pn
Like the input sequence Xt, when the possible value indexes pn
When the context representation vector Ht is extracted, neural network computation is performed on the context representation vector Ht to estimate state values according to slot formats (77). Here, for a categorical slot, the most appropriate possible value is selected from among multiple possible values that the categorical slot may have, and extracted as a state value. Also, for a span slot, a start position and an end position of words including a state value in the turn-specific dialogue Dt of the current turn t are estimated, and words between the estimated start position and end position are extracted as a state value.
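For a categorical slot, the selection step described above reduces to picking the possible value with the highest predicted probability. A minimal sketch, with assumed possible values and probabilities:

```python
# Categorical state-value selection: argmax over the slot's possible values.

def select_categorical_value(possible_values, probs):
    """Return the possible value whose predicted probability is highest."""
    return max(zip(possible_values, probs), key=lambda kv: kv[1])[0]

# Illustrative possible values for a price-range slot (assumed).
selected = select_categorical_value(
    ["cheap", "moderate", "expensive"],
    [0.10, 0.75, 0.15],
)  # -> "moderate"
```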
When the state values are estimated, the state value extracted for each of the categorical slots and span slots among the update slots is paired with its slot, excluding the inherit slots, which do not require state value updates. Accordingly, the dialogue state Bt including slot-state value pairs up to the current turn t can be tracked.
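The assembly of the tracked dialogue state Bt can be sketched as follows: inherit slots carry their values over from Bt−1, while update slots receive the newly estimated values. The slot names and values are illustrative assumptions.

```python
# Dialogue-state update sketch: B_t = inherited pairs from B_{t-1}
# plus the freshly estimated (slot -> value) pairs for the update slots.

def update_dialogue_state(prev_state, estimated, inherit_slots):
    state = {s: prev_state[s] for s in inherit_slots}
    state.update(estimated)  # estimated values for categorical/span slots
    return state

# Illustrative previous state and current-turn estimates (assumed).
B_prev = {"hotel-area": "north", "hotel-name": "acorn guest house"}
B_t = update_dialogue_state(
    B_prev,
    estimated={"hotel-name": "alexander b&b"},
    inherit_slots=["hotel-area"],
)
# B_t == {"hotel-area": "north", "hotel-name": "alexander b&b"}
```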
Although it has been described that operations of
In the exemplary embodiment shown in the drawing, each component may have functions and capabilities other than those described below, and additional components other than those described below may be included. A computing environment 90 shown in the drawing may include a computing device 91 to perform the DST method of
The computing device 91 includes at least one processor 92, a computer-readable storage medium 93, and a communication bus 95. The processor 92 may cause the computing device 91 to operate according to the foregoing exemplary embodiment. For example, the processor 92 may execute one or more programs 94 stored in the computer-readable storage medium 93. The one or more programs 94 may include one or more computer-executable instructions, and the computer-executable instructions may be configured to cause the computing device 91 to perform operations according to the exemplary embodiment when executed by the processor 92.
The communication bus 95 connects various components of the computing device 91 including the processor 92 and the computer-readable storage medium 93 to each other.
The computing device 91 may include at least one input/output interface 96 which provides an interface for at least one input/output device 98 and at least one communication interface 97. The input/output interface 96 and the communication interface 97 are connected to the communication bus 95. The input/output device 98 may be connected to other components of the computing device 91 through the input/output interface 96. The exemplary input/output device 98 may include input devices, such as a pointing device (a mouse, a trackpad, or the like), a keyboard, a touch input device (a touchpad, a touchscreen, or the like), a voice or sound input device, various types of sensor devices, and/or an imaging device, and/or output devices such as a display device, a printer, a speaker, and/or a network card. The exemplary input/output device 98 may be included in the computing device 91 as a component of the computing device 91 or connected to the computing device 91 as a device separate from the computing device 91.
With a device and method for DST according to the present disclosure, it is possible to accurately track an appropriate dialogue state for a user's intention by hierarchically classifying each slot in consideration of multiple features of domain-specific slots extracted from a dialogue with the user and tracking a state value of each of the hierarchically classified slots using a dual dynamic graph neural network.
Although the present invention has been described above with reference to the exemplary embodiments, those of ordinary skill in the art should understand that various modifications and other equivalent embodiments can be made from the embodiments. Therefore, the technical scope of the present invention should be determined from the technical spirit of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0108034 | Aug 2023 | KR | national |