RESPONSE DETERMINING METHOD AND APPARATUS

TECHNICAL FIELD

This application relates to the artificial intelligence field, and in particular, to a response determining method and apparatus.

BACKGROUND

Artificial intelligence (AI) is a theory, a method, a technology, or an application system that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by a digital computer, to perceive an environment, obtain knowledge, and achieve an optimal result based on the knowledge. In other words, artificial intelligence is a branch of computer science that is intended to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence research studies design principles and implementation methods of various intelligent machines, so that the machines have perception, inference, and decision-making functions.

A dialog system has a plurality of dialog types, for example, a chit-chat dialog (mainly for entertainment and companionship), a task-oriented dialog (for meeting specific requirements of users, such as ticket booking and hotel booking), and a question answering dialog (providing knowledge-related services for users and answering questions of the users). With the development of deep learning, dialog systems have made great progress.

In a conventional implementation, to enable the dialog system to simultaneously cope with user dialogs of the plurality of dialog types, a dialog model corresponding to each dialog type is separately trained, and different dialog models are organized together in an integrated manner to construct a multi-functional dialog system. However, a dialog system constructed in this manner has a complex system structure and occupies a large amount of storage space.

SUMMARY

This application provides a response determining method. A dialog type of a user dialog is identified by using a state determining network, and responses corresponding to different dialog types are generated by reusing a response generation network, which is equivalent to processing user statements of different dialog types by using a same model, thereby reducing model complexity and a model size of a dialog system.

According to a first aspect, this application provides a response determining method. The method includes:

- obtaining a to-be-responded first user statement.

In a possible implementation, the first user statement may be a text, such as a question or a request, input by a user to a question answering device. For example, the user may input a target question into the question answering device in a text form. In this case, the question answering device may directly obtain the first user statement in the text form. Alternatively, the user may input a target question into the question answering device in a speech form. In this case, the question answering device may convert received speech information into text information, to obtain the first user statement in the text form. Alternatively, the user may input a target question into the question answering device by using body language. In this case, the question answering device captures and analyzes a body movement of the user, and identifies the first user statement in the text form.

The method further includes: determining first state information of the first user statement based on the first user statement by using a state determining network, where the first state information includes a first dialog type of the first user statement, and the first dialog type is a chit-chat dialog, a task-oriented dialog, a question answering dialog, or a retrieval dialog.

The state determining network may be trained and has a capability of determining a corresponding dialog type based on a user statement.

It should be understood that the state determining network may have a capability of identifying all of the four dialog types (the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog), or may have a capability of identifying at least two of the four dialog types. This is not limited in this application.

It should be understood that, when the dialog type is determined, input of the state determining network may be the first user statement (optionally, may further include another historical statement of the user). This is not limited herein.

It should be understood that the dialog type may also be referred to as a dialog belief state.

The chit-chat dialog may also be referred to as a chat dialog.

The state determining network may be a part of a GPT model or a complete GPT model, a part of a DialoGPT model or a complete DialoGPT model, a part of a BART model or a complete BART model, or a part of a T5 model or a complete T5 model.

The method further includes: inputting the first user statement and the first dialog type into a response generation network, to obtain a response corresponding to the first user statement.

In a possible implementation, the response generation network may be a GPT model, a DialoGPT model, a BART model, or a T5 model. The response generation network may be a part of the GPT model or the complete GPT model, the response generation network may be a part of the DialoGPT model or the complete DialoGPT model, the response generation network may be a part of the BART model or the complete BART model, or the response generation network may be a part of the T5 model or the complete T5 model.

Optionally, the state determining network and the response generation network in this embodiment of this application may be two parts of a same network, or may be different networks.
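For illustration only, the case in which the state determining network and the response generation network are two parts of a same network may be sketched as follows. The sketch assumes a single autoregressive model that is called twice on a growing sequence. The function `toy_language_model`, the `[user]`, `[state]`, and `[response]` markers, and the keyword rule are hypothetical placeholders and are not defined in this application.

```python
def toy_language_model(prefix: str) -> str:
    """Hypothetical stand-in for one autoregressive model; returns the next segment."""
    if prefix.endswith("[state]"):
        # First role: state determination (toy keyword rule, not a trained model).
        return " task-oriented" if "book" in prefix.lower() else " chit-chat"
    if prefix.endswith("[response]"):
        # Second role: response generation conditioned on the state already in the prefix.
        state = prefix.split("[state]")[1].split("[response]")[0].strip()
        return f" ({state} reply)"
    raise ValueError("prefix must end with [state] or [response]")


def respond_with_single_network(user_statement: str) -> tuple:
    """Run the same model twice: once for the dialog type, once for the response."""
    prefix = f"[user] {user_statement} [state]"
    state = toy_language_model(prefix)
    prefix = f"{prefix}{state} [response]"
    response = toy_language_model(prefix)
    return state.strip(), response.strip()
```

In this sketch, the dialog type produced in the first call becomes part of the input of the second call, so no separate model is needed for each dialog type.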

It should be understood that the response generation network may further generate the response of the first user statement based on another user historical statement other than the first user statement. This is not limited herein.

It should be understood that user statements of different dialog types may be used as input of a same response generation network to obtain a response.

In this embodiment of this application, a dialog type of a user dialog is identified by using the state determining network, and responses corresponding to different dialog types are generated by reusing the dialog generation network, which is equivalent to processing user statements of different dialog types by using a same model. During model training, modes of a plurality of dialog types can be unified, so that the plurality of dialog types can be trained at the same time, and a trained dialog system has a capability of a plurality of dialog types, thereby reducing model complexity and a model size of the dialog system.
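The procedure of the first aspect may be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of this application: `determine_state` uses keyword rules in place of a trained state determining network, `generate_response` is a placeholder for the response generation network, and the `[type]` and `[user]` input layout is hypothetical.

```python
from dataclasses import dataclass

DIALOG_TYPES = ("chit-chat", "task-oriented", "question answering", "retrieval")


@dataclass
class StateInfo:
    dialog_type: str


def determine_state(user_statement: str) -> StateInfo:
    """Toy stand-in for the state determining network (keyword rules only)."""
    text = user_statement.lower()
    if "book" in text:
        return StateInfo("task-oriented")
    if text.endswith("?"):
        return StateInfo("question answering")
    return StateInfo("chit-chat")


def build_generator_input(user_statement: str, state: StateInfo) -> str:
    """Combine the user statement and the dialog type into one input sequence."""
    return f"[type] {state.dialog_type} [user] {user_statement}"


def generate_response(generator_input: str) -> str:
    """Toy stand-in for the single response generation network shared by all types."""
    return f"<response conditioned on: {generator_input}>"


def respond(user_statement: str) -> str:
    """Determine the dialog type, then reuse the same generator for any type."""
    state = determine_state(user_statement)
    return generate_response(build_generator_input(user_statement, state))
```

Regardless of which dialog type is determined, `respond` calls the same `generate_response` function, which mirrors the reuse of one response generation network across dialog types.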

In a possible implementation, the first state information may further include slot information, and the slot information may be a keyword in the first user statement.
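As a hedged illustration, state information that carries both a dialog type and slot information might be represented as follows. The `StateInfo` class, the `extract_slots` helper, the keyword-to-slot vocabulary, and the serialized text format are assumptions made for this sketch and do not appear in this application.

```python
from dataclasses import dataclass, field


@dataclass
class StateInfo:
    dialog_type: str
    slots: dict = field(default_factory=dict)

    def serialize(self) -> str:
        """Render the state as text so that it can be fed to a generation network."""
        slot_text = ", ".join(f"{k}={v}" for k, v in sorted(self.slots.items()))
        return f"{self.dialog_type} | {slot_text}" if slot_text else self.dialog_type


def extract_slots(user_statement: str, vocabulary: dict) -> dict:
    """Return slot name -> keyword for every known keyword found in the statement.

    `vocabulary` maps a keyword to a slot name (a hypothetical lookup table).
    """
    words = user_statement.lower().replace(",", " ").replace(".", " ").split()
    return {slot: keyword for keyword, slot in vocabulary.items() if keyword in words}
```

For example, with a vocabulary that maps `paris` to `destination` and `friday` to `date`, the statement "Book a flight to Paris on Friday." yields both slots, and the serialized state lists them in alphabetical order of slot name.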

In a possible implementation, the determining first state information of the first user statement based on the first user statement by using a state determining network includes: determining the first dialog type of the first user statement from a plurality of dialog types by using the state determining network, where the plurality of dialog types include at least two of the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog.

In a possible implementation, the first dialog type of the first user statement can be determined from the plurality of dialog types by using the state determining network, where the plurality of dialog types include at least two of the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog.

For example, the plurality of dialog types include the chit-chat dialog and the task-oriented dialog.

For example, the plurality of dialog types include the chit-chat dialog and the question answering dialog.

For example, the plurality of dialog types include the chit-chat dialog and the retrieval dialog.

For example, the plurality of dialog types include the task-oriented dialog and the question answering dialog.

For example, the plurality of dialog types include the task-oriented dialog and the retrieval dialog.

For example, the plurality of dialog types include the question answering dialog and the retrieval dialog.

For example, the plurality of dialog types include the chit-chat dialog, the task-oriented dialog, and the question answering dialog.

For example, the plurality of dialog types include the chit-chat dialog, the task-oriented dialog, and the retrieval dialog.

For example, the plurality of dialog types include the task-oriented dialog, the question answering dialog, and the retrieval dialog.

For example, the plurality of dialog types include the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog.
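Determining a dialog type from a configured plurality of dialog types may be sketched as follows. The sketch assumes that the state determining network outputs one score per dialog type; the function name `select_dialog_type` and the score dictionary are hypothetical.

```python
ALL_DIALOG_TYPES = ("chit-chat", "task-oriented", "question answering", "retrieval")


def select_dialog_type(scores: dict, supported_types: tuple) -> str:
    """Pick the highest-scoring dialog type among the supported ones.

    `scores` maps a dialog type to a score assumed to come from the state
    determining network; `supported_types` is the configured plurality.
    """
    if len(supported_types) < 2:
        raise ValueError("at least two dialog types must be supported")
    unknown = set(supported_types) - set(ALL_DIALOG_TYPES)
    if unknown:
        raise ValueError(f"unknown dialog types: {sorted(unknown)}")
    candidates = {t: s for t, s in scores.items() if t in supported_types}
    return max(candidates, key=candidates.get)
```

With a system configured for only the chit-chat dialog and the task-oriented dialog, a high score for the retrieval dialog is ignored and the best supported type is returned instead.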

In this embodiment of this application, responses corresponding to different dialog types can be generated by reusing the response generation network. In a possible implementation, the method further includes: obtaining a to-be-responded second user statement; determining second state information of the second user statement based on the second user statement by using the state determining network, where the second state information includes a second dialog type of the second user statement, the second dialog type is a chit-chat dialog, a task-oriented dialog, a question answering dialog, or a retrieval dialog, and the second dialog type is different from the first dialog type; and inputting the second user statement and the second dialog type into the response generation network, to obtain a response corresponding to the second user statement.

In a possible implementation, the state determining network and the response generation network each are a GPT model, a DialoGPT model, a BART model, or a T5 model. The state determining network and the response generation network each may be a complete GPT model, DialoGPT model, BART model, or T5 model; or the state determining network and the response generation network may be models whose network structures or network performance are similar to those of the GPT model, the DialoGPT model, the BART model, or the T5 model. This is not limited in this application. For example, the state determining network and the response generation network each may be a part of the GPT model, the DialoGPT model, the BART model, or the T5 model.

In a possible implementation, a dialog system may obtain, from the first user statement or a database based on the first user statement, a keyword or a key sentence for constructing the response; and input the first user statement, the first dialog type, and the keyword or the key sentence into the response generation network, to obtain the response corresponding to the first user statement.

In this embodiment of this application, data or text content related to a dialog can be obtained from an external resource such as an external database, a knowledge base, or a corpus based on the first user statement and the first dialog type, and is used as dialog information (namely, the keyword or the key sentence) in the dialog process.

In a possible implementation, the method further includes: inputting the first user statement and the first dialog type into a response generation network, to obtain a response corresponding to the first user statement; or inputting the first user statement, the first dialog type, and the keyword or the key sentence into the response generation network, to obtain the response corresponding to the first user statement.
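Obtaining a key sentence and adding it to the input of the response generation network may be sketched as follows. All names are hypothetical, `TOY_DATABASE` holds placeholder entries rather than real data, and the rule that only the question answering dialog and the retrieval dialog trigger a lookup is an assumption of this sketch, not a requirement of this application.

```python
from typing import Optional

# Placeholder external resource; the entries are illustrative only.
TOY_DATABASE = {
    "refund": "Toy entry about refunds.",
    "opening hours": "Toy entry about opening hours.",
}


def retrieve_key_sentence(user_statement: str, dialog_type: str,
                          database: dict) -> Optional[str]:
    """Return the first entry whose key occurs in the statement, else None."""
    if dialog_type not in ("question answering", "retrieval"):
        return None  # assumption: other dialog types skip the lookup
    text = user_statement.lower()
    for key, sentence in database.items():
        if key in text:
            return sentence
    return None


def build_generator_input(user_statement: str, dialog_type: str,
                          key_sentence: Optional[str] = None) -> str:
    """Build one input sequence; the knowledge segment is added only if present."""
    parts = [f"[type] {dialog_type}", f"[user] {user_statement}"]
    if key_sentence is not None:
        parts.append(f"[knowledge] {key_sentence}")
    return " ".join(parts)
```

The same `build_generator_input` function covers both alternatives described above: the input with only the user statement and the dialog type, and the input that additionally carries the keyword or the key sentence.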

According to a second aspect, this application provides a response determining method. The method includes:

- obtaining a first user statement, a first dialog type of the first user statement, and a first response corresponding to the first user statement, where the first dialog type is a real type of the first user statement, and the first dialog type is a chit-chat dialog, a task-oriented dialog, a question answering dialog, or a retrieval dialog;
- determining first state information of the first user statement based on the first user statement by using a state determining network, where the first state information includes a second dialog type of the first user statement;
- inputting the first user statement and the first dialog type into a response generation network, to obtain a second response corresponding to the first user statement;
- updating the state determining network based on a difference between the first dialog type and the second dialog type; and
- updating the response generation network based on a difference between the first response and the second response.

In this application, a dialog type of a user dialog is identified by using the state determining network, and responses corresponding to different dialog types are generated by reusing the dialog generation network, which is equivalent to processing user statements of different dialog types by using a same model. During model training, modes of a plurality of dialog types can be unified, so that the plurality of dialog types can be trained at the same time, and a trained dialog system has a capability of a plurality of dialog types, thereby reducing model complexity and a model size of the dialog system.
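The two differences used for the updates may be quantified, for example, as follows. This application does not specify a loss function; the sketch assumes cross-entropy for the dialog type and a mean negative log-likelihood for the response, and all function names are hypothetical. Note that, as in the method above, the response generation network receives the real (first) dialog type as input during training, not the predicted one.

```python
import math


def cross_entropy(predicted_probs: dict, true_label: str) -> float:
    """Negative log-probability that the network assigns to the real dialog type."""
    return -math.log(predicted_probs[true_label])


def sequence_nll(token_probs: list) -> float:
    """Mean negative log-likelihood of the reference response tokens.

    `token_probs` holds the probability that the response generation network
    assigns to each token of the first (reference) response.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def training_losses(type_probs: dict, true_type: str,
                    reference_token_probs: list) -> tuple:
    """Return (state loss, response loss) for one training sample."""
    state_loss = cross_entropy(type_probs, true_type)
    response_loss = sequence_nll(reference_token_probs)
    return state_loss, response_loss
```

Because samples of every dialog type produce the same pair of losses, samples of a plurality of dialog types can be mixed in one training run.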

In a possible implementation, the determining first state information of the first user statement based on the first user statement by using a state determining network includes:

- determining the second dialog type of the first user statement from a plurality of dialog types by using the state determining network, where the plurality of dialog types include at least two of the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog.

In a possible implementation, the method further includes:

- obtaining a second user statement, a third dialog type of the second user statement, and a third response corresponding to the second user statement, where the third dialog type is a real type of the second user statement;
- determining second state information of the second user statement based on the second user statement by using the state determining network, where the second state information includes a fourth dialog type of the second user statement, and the fourth dialog type is different from the third dialog type;
- inputting the second user statement and the third dialog type into the response generation network, to obtain a fourth response corresponding to the second user statement;
- updating the state determining network based on a difference between the fourth dialog type and the third dialog type; and
- updating the response generation network based on a difference between the fourth response and the third response.

In a possible implementation, the state determining network and the response generation network each are a GPT model, a DialoGPT model, a BART model, or a T5 model.

In a possible implementation, the inputting the first user statement and the first dialog type into a response generation network, to obtain a second response corresponding to the first user statement includes:

- obtaining, from the first user statement or a database based on the first user statement, a keyword or a key sentence for constructing the response; and
- inputting the first user statement, the first dialog type, and the keyword or the key sentence into the response generation network, to obtain the second response corresponding to the first user statement.

According to a third aspect, this application provides a response determining apparatus. The apparatus includes:

- an obtaining module, configured to obtain a to-be-responded first user statement;
- a state generation module, configured to determine first state information of the first user statement based on the first user statement by using a state determining network, where the first state information includes a first dialog type of the first user statement, and the first dialog type is a chit-chat dialog, a task-oriented dialog, a question answering dialog, or a retrieval dialog; and
- a response generation module, configured to input the first user statement and the first dialog type into a response generation network, to obtain a response corresponding to the first user statement.

This application provides a response determining apparatus. The apparatus includes: the obtaining module, configured to obtain the to-be-responded first user statement; the state generation module, configured to determine the first state information of the first user statement based on the first user statement by using the state determining network, where the first state information includes the first dialog type of the first user statement, and the first dialog type is the chit-chat dialog, the task-oriented dialog, the question answering dialog, or the retrieval dialog; and the response generation module, configured to input the first user statement and the first dialog type into the response generation network, to obtain the response corresponding to the first user statement. A dialog type of a user dialog is identified by using the state determining network, and responses corresponding to different dialog types are generated by reusing the dialog generation network, which is equivalent to processing user statements of different dialog types by using a same model. During model training, modes of a plurality of dialog types can be unified, so that the plurality of dialog types can be trained at the same time, and a trained dialog system has a capability of a plurality of dialog types, thereby reducing model complexity and a model size of the dialog system.
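The module division of the apparatus may be sketched as follows. The class names follow the module names above, but the constructor arguments, the method names, and the use of plain callables in place of trained networks are assumptions of this sketch.

```python
from typing import Callable


class ObtainingModule:
    def obtain(self, raw_input: str) -> str:
        """Return the to-be-responded user statement in text form."""
        return raw_input.strip()


class StateGenerationModule:
    def __init__(self, state_network: Callable[[str], str]):
        self.state_network = state_network

    def determine(self, user_statement: str) -> str:
        """Determine the dialog type by using the state determining network."""
        return self.state_network(user_statement)


class ResponseGenerationModule:
    def __init__(self, response_network: Callable[[str, str], str]):
        self.response_network = response_network

    def generate(self, user_statement: str, dialog_type: str) -> str:
        """Input the statement and the dialog type into the generation network."""
        return self.response_network(user_statement, dialog_type)


class ResponseDeterminingApparatus:
    def __init__(self, state_network, response_network):
        self.obtaining = ObtainingModule()
        self.state_generation = StateGenerationModule(state_network)
        self.response_generation = ResponseGenerationModule(response_network)

    def respond(self, raw_input: str) -> str:
        statement = self.obtaining.obtain(raw_input)
        dialog_type = self.state_generation.determine(statement)
        return self.response_generation.generate(statement, dialog_type)
```

The two callables may wrap two different networks or two parts of a same network; the module interfaces are the same in either case.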

In a possible implementation, the state generation module is specifically configured to:

- determine the first dialog type of the first user statement from a plurality of dialog types by using the state determining network, where the plurality of dialog types include at least two of the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog.

In a possible implementation:

- the obtaining module is further configured to obtain a to-be-responded second user statement;
- the state generation module is further configured to determine second state information of the second user statement based on the second user statement by using the state determining network, where the second state information includes a second dialog type of the second user statement, the second dialog type is a chit-chat dialog, a task-oriented dialog, a question answering dialog, or a retrieval dialog, and the second dialog type is different from the first dialog type; and
- the response generation module is further configured to input the second user statement and the second dialog type into the response generation network, to obtain a response corresponding to the second user statement.

In a possible implementation, the state determining network and the response generation network each are a GPT model, a DialoGPT model, a BART model, or a T5 model.

In a possible implementation, the response generation module is specifically configured to:

- obtain, from the first user statement or a database based on the first user statement, a keyword or a key sentence for constructing the response; and
- input the first user statement, the first dialog type, and the keyword or the key sentence into the response generation network, to obtain the response corresponding to the first user statement.

According to a fourth aspect, this application provides a response determining apparatus. The apparatus includes:

- an obtaining module, configured to obtain a first user statement, a first dialog type of the first user statement, and a first response corresponding to the first user statement, where the first dialog type is a real type of the first user statement, and the first dialog type is a chit-chat dialog, a task-oriented dialog, a question answering dialog, or a retrieval dialog;
- a state generation module, configured to determine first state information of the first user statement based on the first user statement by using a state determining network, where the first state information includes a second dialog type of the first user statement;
- a response generation module, configured to input the first user statement and the first dialog type into a response generation network, to obtain a second response corresponding to the first user statement; and
- a model update module, configured to: update the state determining network based on a difference between the first dialog type and the second dialog type; and update the response generation network based on a difference between the first response and the second response.

This application provides the response determining apparatus. The apparatus includes: the obtaining module, configured to obtain the first user statement, the first dialog type of the first user statement, and the first response corresponding to the first user statement, where the first dialog type is the real type of the first user statement, and the first dialog type is the chit-chat dialog, the task-oriented dialog, the question answering dialog, or the retrieval dialog; the state generation module, configured to determine the first state information of the first user statement based on the first user statement by using the state determining network, where the first state information includes the second dialog type of the first user statement; the response generation module, configured to input the first user statement and the first dialog type into the response generation network, to obtain the second response corresponding to the first user statement; and the model update module, configured to: update the state determining network based on the difference between the first dialog type and the second dialog type; and update the response generation network based on the difference between the first response and the second response. In this application, a dialog type of a user dialog is identified by using the state determining network, and responses corresponding to different dialog types are generated by reusing the dialog generation network, which is equivalent to processing user statements of different dialog types by using a same model. During model training, modes of a plurality of dialog types can be unified, so that the plurality of dialog types can be trained at the same time, and a trained dialog system has a capability of a plurality of dialog types, thereby reducing model complexity and a model size of the dialog system.
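One update performed by the model update module may be sketched, under strong simplifying assumptions, as a single gradient-descent step. The sketch treats the dialog type logits themselves as the only trainable parameters and uses cross-entropy as the measure of difference; a real state determining network has many parameters and is updated by backpropagation. The function names and the learning rate are hypothetical.

```python
import math


def softmax(logits: list) -> list:
    """Convert logits into probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_logits(logits: list, true_index: int, learning_rate: float = 0.5) -> list:
    """One gradient-descent step on the cross-entropy loss.

    For cross-entropy over a softmax, the gradient with respect to each logit
    is the predicted probability minus 1 for the real type, and minus 0 otherwise.
    """
    probs = softmax(logits)
    grads = [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]
    return [x - learning_rate * g for x, g in zip(logits, grads)]
```

After the step, the probability assigned to the real dialog type increases, which is the intended effect of updating the state determining network based on the difference between the real type and the predicted type.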

In a possible implementation, the state generation module is specifically configured to:

- determine the second dialog type of the first user statement from a plurality of dialog types by using the state determining network, where the plurality of dialog types include at least two of the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog.

In a possible implementation:

- the obtaining module is further configured to obtain a second user statement, a third dialog type of the second user statement, and a third response corresponding to the second user statement, where the third dialog type is a real type of the second user statement;
- the state generation module is further configured to determine second state information of the second user statement based on the second user statement by using the state determining network, where the second state information includes a fourth dialog type of the second user statement, and the fourth dialog type is different from the third dialog type;
- the response generation module is further configured to input the second user statement and the third dialog type into the response generation network, to obtain a fourth response corresponding to the second user statement; and
- the model update module is further configured to: update the state determining network based on a difference between the fourth dialog type and the third dialog type; and update the response generation network based on a difference between the fourth response and the third response.

In a possible implementation, the state determining network and the response generation network each are a GPT model, a DialoGPT model, a BART model, or a T5 model.

In a possible implementation, the response generation module is specifically configured to:

- obtain, from the first user statement or a database based on the first user statement, a keyword or a key sentence for constructing the response; and
- input the first user statement, the first dialog type, and the keyword or the key sentence into the response generation network, to obtain the second response corresponding to the first user statement.

According to a fifth aspect, an embodiment of this application provides a response determining apparatus. The apparatus may include a memory, a processor, and a bus system. The memory is configured to store a program, and the processor is configured to execute the program in the memory, to perform any optional method according to the first aspect.

According to a sixth aspect, an embodiment of this application provides a response determining apparatus. The apparatus may include a memory, a processor, and a bus system. The memory is configured to store a program, and the processor is configured to execute the program in the memory, to perform any optional method according to the second aspect.

According to a seventh aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is run on a computer, the computer is enabled to perform any optional method according to the first aspect, or any optional method according to the second aspect.

According to an eighth aspect, an embodiment of this application provides a computer program product, including code. When the code is executed, the computer program product is configured to implement any optional method according to the first aspect and any optional method according to the second aspect.

According to a ninth aspect, this application provides a chip system. The chip system includes a processor, configured to support an execution device or a training device in implementing functions in the foregoing aspects, for example, sending or processing data or information in the foregoing method. In a possible design, the chip system further includes a memory. The memory is configured to store program instructions and data that are necessary for the execution device or the training device. The chip system may include a chip, or may include a chip and another discrete component.

Embodiments of this application provide the response determining method. The method includes: obtaining the to-be-responded first user statement; determining the first state information of the first user statement based on the first user statement by using the state determining network, where the first state information includes the first dialog type of the first user statement, and the first dialog type is the chit-chat dialog, the task-oriented dialog, the question answering dialog, or the retrieval dialog; and inputting the first user statement and the first dialog type into the response generation network, to obtain the response corresponding to the first user statement. The dialog type of the user dialog is identified by using the state determining network, and the responses corresponding to different dialog types are generated by reusing the dialog generation network, which is equivalent to processing user statements of different dialog types by using the same model. During model training, modes of the plurality of dialog types can be unified, so that the plurality of dialog types can be trained at the same time, and the trained dialog system has the capability of a plurality of dialog types, thereby reducing model complexity and the model size of the dialog system.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic diagram of a structure of an artificial intelligence main framework;

FIG. 2 is a schematic diagram of a system architecture according to an embodiment of this application;

FIG. 3 is a schematic diagram of an embodiment of a response determining method according to an embodiment of this application;

FIG. 4 is a schematic diagram of an interface of a task-oriented dialog;

FIG. 5 is a schematic diagram of a model according to an embodiment of this application;

FIG. 6 is a schematic diagram of a model according to an embodiment of this application;

FIG. 7 is a schematic diagram of a response determining method according to an embodiment of this application;

FIG. 8 is a schematic diagram of a response determining apparatus according to an embodiment of this application;

FIG. 9 is a schematic diagram of a response determining apparatus according to an embodiment of this application;

FIG. 10 is a schematic diagram of a structure of an execution device according to an embodiment of this application;

FIG. 11 is a schematic diagram of a structure of a training device according to an embodiment of this application; and

FIG. 12 is a schematic diagram of a structure of a chip according to an embodiment of this application.

DESCRIPTION OF EMBODIMENTS

The following describes embodiments of this application with reference to the accompanying drawings. Terms used in implementations of this application are merely intended to explain specific embodiments of this application, and are not intended to limit this application. A person of ordinary skill in the art may learn that, with the development of technologies and the emergence of new scenarios, the technical solutions provided in embodiments of this application are also applicable to similar technical problems.

In the specification, claims, and accompanying drawings of this application, the terms “first”, “second”, and so on are intended to distinguish between similar objects but do not necessarily indicate a specific order or sequence. It should be understood that the terms used in such a way are interchangeable in appropriate circumstances, and are merely a manner of distinguishing between objects having a same attribute when the objects are described in embodiments of this application. In addition, the terms “include”, “have”, and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units not expressly listed or inherent to such a process, method, system, product, or device.

An overall working procedure of an artificial intelligence system is first described with reference to FIG. 1. FIG. 1 is a schematic diagram of a structure of an artificial intelligence main framework. The following describes the artificial intelligence main framework from two dimensions: an “intelligent information chain” (a horizontal axis) and an “IT value chain” (a vertical axis). The “intelligent information chain” indicates a process from data obtaining to data processing. For example, the “intelligent information chain” may be a general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a refining process of “data—information—knowledge—intelligence”. The “IT value chain” is an industrial ecological process from underlying infrastructure of artificial intelligence to information (providing and processing technical implementations) to a system, and indicates value brought by artificial intelligence to the information technology industry.

(1) Infrastructure

Infrastructure provides computing capability support for the artificial intelligence system, to communicate with the outside world and implement support by using basic platforms. The infrastructure communicates with the outside by using sensors. A computing capability is provided by intelligent chips (hardware acceleration chips such as a CPU, an NPU, a GPU, an ASIC, and an FPGA). The basic platforms include related platforms, for example, a distributed computing framework and network, for assurance and support. The basic platforms may include a cloud storage and computing network, an interconnection network, and the like. For example, the sensor communicates with the outside to obtain data, and the data is provided for an intelligent chip in a distributed computing system provided by the basic platform to perform computation.

(2) Data

Data at an upper layer of the infrastructure indicates a data source in the artificial intelligence field. The data relates to graphics, images, speech, and text, and further relates to internet of things data of conventional devices, and includes service data of a conventional system and perception data such as force, displacement, a liquid level, temperature, and humidity.

(3) Data Processing

Data processing usually includes manners such as data training, machine learning, deep learning, searching, reasoning, and decision-making.

The machine learning and the deep learning may be used for performing symbolic and formal intelligent information modeling, extraction, preprocessing, training, and the like on data.

The reasoning is a process of performing machine thinking and solving problems by simulating an intelligent reasoning mode of humans in a computer or intelligent system by using formal information and according to a reasoning control policy. Typical functions are searching and matching.

The decision-making is a process of performing decision-making after performing reasoning on intelligent information, and usually provides classification, sorting, prediction, and other functions.

(4) General Capabilities

After data undergoes the foregoing data processing, some general capabilities may be formed based on a data processing result. For example, the general capabilities may be an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, and image recognition.

(5) Intelligent Products and Industry Application

The intelligent products and industry applications are products and applications of the artificial intelligence system in various fields, and are a packaging of an overall solution of artificial intelligence, so that decision-making for intelligent information is productized and applied. Application fields mainly include a smart terminal, smart transportation, smart health care, autonomous driving, a smart city, and the like.

The following describes an example of an application scenario of this application.

A method and an apparatus provided in embodiments of this application are applied to a man-machine dialog scenario in a natural language processing (natural language processing, NLP) technology. Specifically, embodiments of this application are applied to a scenario of constructing a dialog robot and providing a semantic understanding and a dialog service for an end user. The dialog robot is, for example, a child accompanying education robot, an after-sales automatic answer application, a pre-sales consultation robot, or an intelligent voice assistant on a terminal.

The following describes an application architecture in embodiments of this application.

The following describes in detail the system architecture provided in embodiments of this application with reference to FIG. 2. FIG. 2 is a schematic diagram of a system architecture according to an embodiment of this application. As shown in FIG. 2, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data collection device 560.

The execution device 510 includes a computation module 511, an I/O interface 512, a preprocessing module 513, and a preprocessing module 514. The computation module 511 may include a state determining network/rule 501, and the preprocessing module 513 and the preprocessing module 514 are optional.

The data collection device 560 is configured to collect a training sample. The training sample may be text data or the like. In this embodiment of this application, the training sample is data for training the state determining network and a response generation network. After collecting the training samples, the data collection device 560 stores the training samples in the database 530.

It should be understood that the database 530 may further maintain a pre-trained model such as a state determining network and a response generation network, or a model obtained after the pre-trained model is fine-tuned (fine-tune) at least once.

The training device 520 may train the state determining network and the response generation network by using the training samples maintained in the database 530, to obtain the state determining network/rule 501. In this embodiment of this application, the state determining network/rule 501 may be a trained state determining network and response generation network.

It should be noted that, during actual application, the training samples maintained in the database 530 are not necessarily collected by the data collection device 560, but may be received from another device. It should further be noted that the training device 520 may not necessarily train the state determining network/rule 501 totally based on the training samples maintained in the database 530, or may obtain a training sample from a cloud or another place for model training. The foregoing descriptions should not be construed as a limitation on embodiments of this application.

Specifically, the training sample may be private data from the client device 540, and the training device 520 may use the private data from the client device 540 as the training sample to fine-tune the state determining network and the response generation network.

In this embodiment of this application, the training device 520 may train the state determining network and the response generation network by using the model training method in embodiments of this application, to obtain the trained state determining network and response generation network.

The state determining network/rule 501 obtained through training by the training device 520 is applied to different systems or devices, for example, the execution device 510 shown in FIG. 2. The execution device 510 may be a terminal, for example, a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, or a vehicle-mounted terminal; or may be a server, a cloud, or the like.

In FIG. 2, the input/output (input/output, I/O) interface 512 is configured in the execution device 510, and is configured to exchange data with an external device. A user may input data (for example, a first user statement and a second user statement in embodiments of this application) to the I/O interface 512 by using the client device 540.

The preprocessing module 513 and the preprocessing module 514 each are configured to perform preprocessing based on input data received by the I/O interface 512. It should be understood that there may be no preprocessing module 513 or preprocessing module 514, or there is only one preprocessing module. If the preprocessing module 513 and the preprocessing module 514 do not exist, the computation module 511 may be directly configured to process input data.

In a process in which the execution device 510 preprocesses the input data, or in a process in which the computation module 511 in the execution device 510 performs computing or other related processing, the execution device 510 may invoke data, code, and the like in the data storage system 550 for corresponding processing, and may further store, in the data storage system 550, data, instructions, and the like that are obtained through the corresponding processing.

Finally, the I/O interface 512 presents a processing result (for example, a response) to the client device 540, so as to provide the processing result to a user.
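For ease of understanding, the data flow through the execution device 510 described above may be sketched as follows. This is merely an illustrative sketch and does not constitute a limitation on embodiments of this application. The function name `run_execution_device` and its parameters are hypothetical names introduced only for illustration. The optional preprocessing modules are modeled as a possibly empty sequence of functions, and the computation module is modeled as a single function.

```python
from typing import Callable, Sequence


def run_execution_device(
    input_data: str,
    compute: Callable[[str], str],
    preprocessors: Sequence[Callable[[str], str]] = (),
) -> str:
    """Hypothetical model of the execution device data flow.

    input_data    -- data received through the I/O interface
    compute       -- stands in for the computation module
    preprocessors -- stands in for the optional preprocessing modules;
                     when empty, the computation module processes the
                     input data directly
    """
    data = input_data
    for preprocess in preprocessors:
        data = preprocess(data)
    # The returned value is the processing result that the I/O interface
    # presents to the client device.
    return compute(data)


# Example: two preprocessing steps followed by a placeholder computation.
result = run_execution_device(
    "  Hello  ",
    compute=lambda text: "response to: " + text,
    preprocessors=[str.strip, str.lower],
)
```

When no preprocessing function is supplied, the input data is passed to the computation function unchanged, which corresponds to the case in which neither preprocessing module exists.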

In a case shown in FIG. 2, the user may manually specify the input data, and the manual specification may be performed on an interface provided by the I/O interface 512. In another case, the client device 540 may automatically send the input data to the I/O interface 512. If the client device 540 needs to obtain authorization from the user before automatically sending the input data, the user may set corresponding permission on the client device 540. The user can view, on the client device 540, the result output by the execution device 510. The result may be specifically presented in a form such as display, a sound, or an action. The client device 540 may also serve as a data collector to collect, as new sample data, the input data input into the I/O interface 512 and the output result output from the I/O interface 512 that are shown in the figure, and store the new sample data into the database 530. Certainly, the client device 540 may alternatively not perform collection. Instead, the I/O interface 512 directly stores, as new sample data into the database 530, the input data input into the I/O interface 512 and the output result output from the I/O interface 512 that are shown in the figure.

It should be noted that FIG. 2 is merely the schematic diagram of the system architecture according to this embodiment of this application. A location relationship between the devices, the components, the modules, and the like shown in the figure does not constitute any limitation. For example, in FIG. 2, the data storage system 550 is an external memory relative to the execution device 510, but in another case, the data storage system 550 may alternatively be disposed in the execution device 510. It should be understood that the execution device 510 may be deployed in the client device 540.

In this embodiment of this application, the training device 520 may obtain code stored in a memory (not shown in FIG. 2, and the memory may be integrated into the training device 520 or may be separately deployed from the training device 520), to implement the response determining method in embodiments of this application.

In this embodiment of this application, the training device 520 may include hardware circuits (for example, an application-specific integrated circuit (application-specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA), a general purpose processor, a digital signal processor (digital signal processor, DSP), a microprocessor, a microcontroller, and the like), or a combination of these hardware circuits. For example, the training device 520 may be a hardware system having an instruction execution function, for example, a CPU or a DSP, or a hardware system having no instruction execution function, for example, an ASIC or an FPGA, or a combination of the hardware system having no instruction execution function and the hardware system having the instruction execution function.

Specifically, the training device 520 may be the hardware system having the instruction execution function. The response determining method provided in embodiments of this application may be software code stored in the memory. The training device 520 may obtain the software code from the memory, and execute the obtained software code to implement the response determining method provided in embodiments of this application.

It should be understood that the training device 520 may be the combination of the hardware system having no instruction execution function and the hardware system having the instruction execution function. Some steps of the model training method provided in embodiments of this application may alternatively be implemented by the hardware system, in the training device 520, having no instruction execution function. This is not limited herein.

It should be understood that the execution device may be a server on a cloud side or an electronic device on a terminal side.

Embodiments of this application relate to extensive application of neural networks. Therefore, for ease of understanding, the following first describes terms and concepts related to the neural network in embodiments of this application.

(1) Neural Network

The neural network may include a neuron. The neuron may be an operation unit that uses $x_s$ (namely, input data) and an intercept of 1 as input. Output of the operation unit may be as follows:

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right).$

$s = 1, 2, \ldots, n$, where $n$ is a natural number greater than 1; $W_s$ is a weight of $x_s$; $b$ is a bias of the neuron; and $f$ is an activation function (activation function) of the neuron, and is used for introducing a non-linear characteristic into the neural network, to convert an input signal of the neuron into an output signal. The output signal of the activation function may be used as input of a next convolutional layer. The activation function may be a sigmoid function. The neural network is a network constituted by linking a plurality of single neurons together. To be specific, output of one neuron may be input of another neuron. Input of each neuron may be connected to a local receptive field of a previous layer to extract a feature of the local receptive field. The local receptive field may be a region including several neurons.
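For ease of understanding, the output of a single neuron described above may be sketched as follows. This is merely an illustrative sketch under the assumption that the activation function is a sigmoid function. The function names `sigmoid` and `neuron_output` are hypothetical names introduced only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid activation function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, inputs, bias, activation=sigmoid):
    """Compute f(sum_s W_s * x_s + b) for a single neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Example: the weighted sum is 0.5 * 1.0 + (-0.5) * 1.0 + 0.0 = 0.0,
# and the sigmoid of 0.0 is 0.5.
out = neuron_output([0.5, -0.5], [1.0, 1.0], 0.0)
```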

(2) Deep Neural Network

The deep neural network (Deep Neural Network, DNN), also referred to as a multi-layer neural network, may be understood as a neural network having many hidden layers. The “many” herein does not have a special measurement standard. Based on locations of different layers, the layers in the DNN may be divided into three types: an input layer, a hidden layer, and an output layer. Generally, a first layer is the input layer, a last layer is the output layer, and the middle layers are hidden layers. Layers are fully connected. To be specific, any neuron at an $i$-th layer is necessarily connected to any neuron at an $(i+1)$-th layer. Although the DNN seems complex, the work at each layer is not complex. Simply, it is the following linear relationship expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is an input vector, $\vec{y}$ is an output vector, $\vec{b}$ is a bias vector, $W$ is a weight matrix (also referred to as a coefficient), and $\alpha(\cdot)$ is an activation function. At each layer, the output vector $\vec{y}$ is obtained by performing such a simple operation on the input vector $\vec{x}$. Because the DNN has a plurality of layers, there are also a plurality of coefficients $W$ and bias vectors $\vec{b}$. Definitions of these parameters in the DNN are as follows: The coefficient $W$ is used as an example. It is assumed that in a DNN having three layers, a linear coefficient from a fourth neuron at a second layer to a second neuron at a third layer is defined as $W_{24}^{3}$. The superscript 3 represents the layer at which the coefficient $W$ is located, and the subscript corresponds to the output third-layer index 2 and the input second-layer index 4. In conclusion, a coefficient from a $k$-th neuron at an $(L-1)$-th layer to a $j$-th neuron at an $L$-th layer is defined as $W_{jk}^{L}$.

It should be noted that there is no parameter $W$ at the input layer. In the deep neural network, more hidden layers make the network more capable of describing a complex case in the real world. Theoretically, a model with more parameters has higher complexity and a larger “capacity”. It indicates that the model can complete a more complex learning task. Training the deep neural network is a process of learning a weight matrix, and a final objective of the training is to obtain a weight matrix of all layers of the trained deep neural network (a weight matrix formed by vectors W at many layers).
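For ease of understanding, the per-layer operation and the indexing convention of the coefficient described above may be sketched as follows. This is merely an illustrative sketch. The function names `layer_forward` and `dnn_forward` and the example weights are hypothetical and are introduced only for illustration. In the sketch, `W[j][k]` is the coefficient from neuron `k` at the previous layer to neuron `j` at the current layer. Because Python indexes start from 0, the coefficient from the fourth neuron at the second layer to the second neuron at the third layer corresponds to `W[1][3]` of the third layer.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W, x, b, activation=sigmoid):
    """Compute y = activation(W x + b) for one fully connected layer.

    W[j][k] is the coefficient from input neuron k to output neuron j.
    """
    outputs = []
    for j in range(len(W)):
        z = sum(W[j][k] * x[k] for k in range(len(x))) + b[j]
        outputs.append(activation(z))
    return outputs


def dnn_forward(layers, x):
    """Apply each (W, b) pair in order; the input layer has no W."""
    for W, b in layers:
        x = layer_forward(W, x, b)
    return x


# Example network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.0, 0.0, 0.0]),
    ([[0.7, 0.8, 0.9]], [0.0]),
]
y = dnn_forward(layers, [1.0, 2.0])
```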

(3) Loss Function

In a process of training a deep neural network, because it is expected that output of the deep neural network is as close as possible to a target value that is actually expected, a predicted value of a current network and the target value that is actually expected may be compared, and then a weight vector of each layer of the neural network is updated based on a difference between the predicted value and the target value (certainly, there is usually an initialization process before a first update, to be specific, parameters are preconfigured for all layers of the deep neural network). For example, if the predicted value of the network is excessively large, the weight vector is adjusted to decrease the predicted value, and adjustment is continuously performed until the deep neural network can predict the target value that is actually expected or a value that is close to the target value. Therefore, how to obtain, through comparison, the difference between the predicted value and the target value needs to be predefined. This is the role of the loss function (loss function) or an objective function (objective function). The loss function and the objective function are important equations that measure the difference between the predicted value and the target value. The loss function is used as an example. A higher output value (loss) of the loss function indicates a larger difference. Therefore, training of the deep neural network is a process of minimizing the loss as much as possible.
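For ease of understanding, one possible loss function may be sketched as follows. This is merely an illustrative sketch. The mean squared error is used here only as an assumed example, and this application does not limit the specific form of the loss function. The function name `mse_loss` is a hypothetical name introduced only for illustration.

```python
def mse_loss(predicted, target):
    """Mean squared error between predicted values and target values.

    A higher return value indicates a larger difference between the
    predicted values and the target values.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# A prediction far from the target yields a higher loss than a close one.
far_loss = mse_loss([1.0, 2.0], [1.0, 4.0])
near_loss = mse_loss([1.0, 3.5], [1.0, 4.0])
```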

(4) Back Propagation Algorithm

An error back propagation (back propagation, BP) algorithm may be used to correct a value of a parameter in an initial model in a training process, so that an error loss of the model becomes smaller. Specifically, an input signal is transferred forward until an error loss is produced at the output, and the parameter in the initial model is updated by propagating error loss information backward, to make the error loss converge. The back propagation algorithm is a backward propagation process centered on the error loss, and is intended to obtain a parameter, such as a weight matrix, of an optimal model.
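For ease of understanding, the forward transfer and the backward propagation of the error loss may be sketched as follows for a minimal network having one hidden neuron and one output neuron. This is merely an illustrative sketch. The squared error loss, the sigmoid activation, the learning rate, and the initial parameter values are all assumptions introduced only for illustration, and the function name `backprop_step` is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def backprop_step(w1, w2, x, target, lr):
    """Run one forward pass and one backward pass, then update w1 and w2."""
    # Forward pass: the input signal is transferred forward to the output.
    h = sigmoid(w1 * x)          # hidden neuron output
    y = w2 * h                   # network output
    loss = (y - target) ** 2     # error loss at the output

    # Backward pass: the error loss is propagated backward by the chain rule.
    d_y = 2.0 * (y - target)             # d loss / d y
    grad_w2 = d_y * h                    # d loss / d w2
    d_h = d_y * w2                       # d loss / d h
    grad_w1 = d_h * h * (1.0 - h) * x    # d loss / d w1

    # Parameter update in the direction that reduces the loss.
    return w1 - lr * grad_w1, w2 - lr * grad_w2, loss


w1, w2 = 0.5, 0.5
losses = []
for _ in range(200):
    w1, w2, loss = backprop_step(w1, w2, x=1.0, target=0.5, lr=0.1)
    losses.append(loss)
```

In this sketch, the recorded loss decreases over the iterations, which corresponds to the convergence of the error loss described above.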

In the open field, ensemble learning is usually a learning method that integrates a plurality of functional modules or task types. In the research of dialog systems, different dialog types and different dialog fields can be integrated through ensemble learning. A transformer model is a common model architecture for modeling dialogs. The model includes a transformer encoder and a transformer decoder. The encoder module is configured to encode dialog context information, and the decoder module is configured to perform generation based on the dialog context. The conventional technology 1 proposes that a plurality of decoder modules may be used to model dialog fields, and each decoder module corresponds to one dialog field. In a model training process, the encoder modules corresponding to the dialog fields learn through parameter sharing, and a data set corresponding to each dialog field is used to learn the decoder module corresponding to the field. In addition, in the training process, the system learns a recurrent neural network-based module to determine the field to which a current dialog context belongs, and then performs weighted integration of the parameters of the plurality of decoders by using the determined probability distribution, to obtain a multi-domain dialog system.
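For ease of understanding, the weighted integration of decoder parameters may be sketched as follows. This is merely an illustrative sketch under the assumption that the weighted integration is a probability-weighted sum of the parameters of the decoders. The specific integration manner used in the conventional technology 1 is not described herein, and the function name `integrate_decoder_parameters` and the example values are hypothetical.

```python
def integrate_decoder_parameters(domain_params, domain_probs):
    """Return the probability-weighted sum of per-domain parameter vectors.

    domain_params -- maps a domain name to its decoder parameter vector
    domain_probs  -- maps a domain name to the probability that the
                     current dialog context belongs to that domain
    """
    size = len(next(iter(domain_params.values())))
    merged = [0.0] * size
    for domain, params in domain_params.items():
        if len(params) != size:
            raise ValueError("all parameter vectors must have the same length")
        prob = domain_probs[domain]
        for i, value in enumerate(params):
            merged[i] += prob * value
    return merged


# Example with two domains and toy two-element parameter vectors.
merged = integrate_decoder_parameters(
    {"hotel": [1.0, 0.0], "flight": [0.0, 1.0]},
    {"hotel": 0.75, "flight": 0.25},
)
```

The sketch also shows why the quantity of decoder parameters to be stored grows with the quantity of dialog domains: one parameter vector is kept for each domain.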

The ensemble learning method has many disadvantages, such as a complex model, high deployment costs, and high update costs. In the conventional technology 1, each field corresponds to one system submodule, which greatly increases model complexity and training overheads. When a quantity of dialog domains increases, a larger submodule is required to carry functions. Consistency between a plurality of domains is not achieved.

With evolution of technologies, user requirements will always evolve towards one system to resolve all problems. It is a development trend to use a dialog pre-training technology to enable a single model to support various dialog types and switch between tasks. Therefore, the solutions of the present invention provide a unified end-to-end dialog system framework, to unify dialog systems of different types into a same dialog mode. This implements unified training of different dialog types, so that a model has a capability of completing different types of dialogs.

The response determining method provided in embodiments of this application is described first by using a model inference phase as an example.

FIG. 3 is a schematic diagram of an embodiment of a response determining method according to an embodiment of this application. As shown in FIG. 3, the response determining method provided in this embodiment of this application includes the following steps.

301: Obtain a to-be-responded first user statement.

In a possible implementation, the first user statement may be a text, such as a question or a request, input by a user to a question answering device. For example, the user may input a target question into the question answering device in a text form. In this case, the question answering device may directly obtain the first user statement in the text form. The user may alternatively input a target question into the question answering device in a speech form. In this case, the question answering device may convert the received speech information into text information, to obtain the first user statement in the text form. The user may alternatively input a target question into the question answering device by using body language. In this case, the question answering device captures and analyzes body movements of the user, and identifies the first user statement in the text form.

302: Determine first state information of the first user statement based on the first user statement by using a state determining network, where the first state information includes a first dialog type of the first user statement, and the first dialog type is a chit-chat dialog, a task-oriented dialog, a question answering dialog, or a retrieval dialog.

In a possible implementation, after the first user statement is obtained, the first state information of the first user statement needs to be determined, where the first state information may include the first dialog type.

In a possible implementation, the first state information of the first user statement may be determined by using the state determining network.

In a possible implementation, the state determining network may be a generative pre-trained transformer (generative pre-trained transformer, GPT) model, a dialogue generative pre-trained transformer (dialogue generative pre-trained transformer, DialoGPT) model, a bidirectional and auto-regressive transformer (bidirectional and auto-regressive transformer, BART) model, or a T5 (text-to-text transfer transformer) model.

In a possible implementation, the first dialog type of the first user statement can be determined from a plurality of dialog types by using the state determining network, where the plurality of dialog types include at least two of the chit-chat dialog, the task-oriented dialog, the question answering dialog, and the retrieval dialog.
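For ease of understanding, determining the first dialog type from the plurality of dialog types may be sketched as follows. This is merely an illustrative sketch and does not constitute a limitation on embodiments of this application. The state determining network is replaced by a placeholder scoring function `toy_score_fn` based on simple keyword rules, which is an assumption introduced only to make the sketch runnable and is not the network described in this application. The names `StateInfo` and `determine_state` are also hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

DIALOG_TYPES = ("chit-chat", "task-oriented", "question answering", "retrieval")


@dataclass
class StateInfo:
    """Hypothetical container for the state information of a user statement."""
    dialog_type: str


def determine_state(
    statement: str,
    score_fn: Callable[[str], Dict[str, float]],
    candidate_types: Sequence[str] = DIALOG_TYPES,
) -> StateInfo:
    """Select the dialog type with the highest score among the candidates."""
    if len(candidate_types) < 2:
        raise ValueError("at least two candidate dialog types are required")
    scores = score_fn(statement)
    best = max(candidate_types, key=lambda t: scores.get(t, float("-inf")))
    return StateInfo(dialog_type=best)


def toy_score_fn(statement: str) -> Dict[str, float]:
    """Placeholder for the state determining network (keyword rules only)."""
    text = statement.lower()
    return {
        "chit-chat": 0.1,
        "task-oriented": 1.0 if "book" in text else 0.0,
        "question answering": 1.0 if text.endswith("?") else 0.0,
        "retrieval": 0.0,
    }


state = determine_state("Please book a hotel for tonight", toy_score_fn)
```

In an actual implementation, the scoring function would be replaced by the trained state determining network, and the same selection logic may be applied to any subset that includes at least two of the dialog types.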