This application claims priority to Chinese Patent Application No. 201910002553.3, filed on Jan. 2, 2019, which is hereby incorporated by reference in its entirety.
The technical field relates to voice interactions, and particularly to a voice interaction control method and apparatus.
In a full-duplex interaction scenario, a device is typically in a sound reception state. Various sounds will be recorded in the sound reception process, and excessive disturbance will be caused if the device responds to all of them. If a user wants to change the response of the device, the user needs to actively issue a command to stop the response.
For example, after the user says ‘Xiaodu, Xiaodu, play a song’, the device starts to play a song. If another function is needed, the user should say ‘pause playing’ to stop the device from playing. Then, the user says ‘what's the weather like today’, and the device gives an answer such as ‘it's sunny today; the highest temperature is xx, and the lowest temperature is xx’. Next, the user says ‘continue to play’, and the device continues to play the song. This experience of pausing and continuing the playback is unnatural and requires user education.
A voice interaction control method and apparatus are provided according to the embodiments of the present disclosure, so as to solve one or more technical problems in the existing technology.
In a first aspect, a voice interaction control method is provided according to the embodiments of the present disclosure. The method includes:
In one embodiment, the method further includes:
In one embodiment, the receiving of a negative feedback after responding to the voice interaction requirement and the deleting of the voice interaction requirement from the admission requirements in response to the negative feedback include:
In one embodiment, the negative feedback includes a negative feedback expression and/or a negative feedback behavior.
In one embodiment, the method further includes at least one of:
In a second aspect, a voice interaction control apparatus is provided according to the embodiments of the present disclosure. The apparatus includes:
In one embodiment, the apparatus further includes:
In one embodiment, the requirement deleting module is further configured to determine that the number of times the negative feedback is received after responding to the voice interaction requirement exceeds a set threshold, and delete the voice interaction requirement from the admission requirements.
In one embodiment, the negative feedback includes a negative feedback expression and/or a negative feedback behavior.
In one embodiment, the apparatus further includes at least one of:
In a third aspect, a voice interaction control apparatus is provided according to the embodiments of the present disclosure, and the functions thereof can be realized by hardware or by executing corresponding software through the hardware. The hardware or the software includes one or more modules corresponding to the above functions.
In a possible embodiment, the structure of the apparatus includes a memory configured to store a program supporting the apparatus in performing the voice interaction control method, and a processor configured to execute the program stored in the memory. The apparatus may further include a communication interface configured to communicate with another device or a communication network.
In a fourth aspect, a computer readable storage medium is provided according to the embodiments of the present disclosure, which is configured to store computer software instructions for use by a voice interaction control apparatus, including a program involved in performing the voice interaction control method.
One of the above technical solutions has the following advantages or beneficial effects: the natural experience requirement of the user can be met, the real requirement of the user can be learned during the user's use of the device, and a wrongly identified requirement can be corrected.
The above summary is for the purpose of description, and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present disclosure will be readily apparent with reference to the drawings and the following detailed descriptions.
In the drawings, unless otherwise specified, the same reference numeral refers to the same or similar parts or elements throughout the drawings. These drawings are not necessarily drawn to scale. It should be understood that these drawings depict only some embodiments disclosed in accordance with the present disclosure and should not be considered as limitations to the scope of the present disclosure.
In the following, certain embodiments are briefly described. As will be recognized by persons skilled in the art, the described embodiments can be modified in a variety of different ways without departing from the spirit or scope of the present disclosure. Accordingly, the drawings and descriptions are regarded as illustrative in nature rather than restrictive.
S11: identifying a voice signal received by a voice interaction device, to obtain a voice interaction requirement;
S12: determining that the voice interaction requirement is included in admission requirements learned in advance;
S13: responding to the voice interaction requirement.
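As a purely illustrative sketch (not part of the claimed method), the control flow of steps S11 to S13 may be expressed in Python-like form as follows, where identify_requirement and respond are assumed helper functions and admission_requirements is a set of requirements learned in advance:

def handle_voice_signal(voice_signal, admission_requirements, identify_requirement, respond):
    # S11: identify the received voice signal to obtain a voice interaction requirement
    requirement = identify_requirement(voice_signal)  # assumed helper; may return None
    if requirement is None:
        return None  # no requirement identified, so no response is made
    # S12: determine whether the requirement is included in the admission requirements learned in advance
    if requirement in admission_requirements:
        # S13: respond to the voice interaction requirement
        return respond(requirement)  # assumed helper performing the corresponding operation
    return None  # the requirement is not admitted; the device does not respond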
In the embodiments of the present disclosure, the voice interaction device may include various devices with a voice interaction function, such as a mobile phone, a notebook computer, a handheld computer, a smart speaker box, an audio and video player, etc.
After the voice interaction device is awakened, it enters a wake-up state and may begin to receive sounds continuously within a reception duration. The reception duration may be set according to the type of the voice interaction device and the requirement of the specific application scenario. If the voice interaction device identifies a voice interaction requirement from the voice signal received within the reception duration, a corresponding operation may be performed according to the voice interaction requirement. The voice interaction device may identify the voice signal locally, or send the received voice signal to another device, such as a voice identification server in the cloud, for identification.
In addition, the admission requirements for the voice interaction device may be learned in advance. The learned admission requirements may be different for various voice interaction devices, depending on characteristics such as their environments and user habits. The admission requirements for the voice interaction device can therefore reflect the personalized characteristics of that device.
In one example, if the user continuously utters identical or similar voices to the voice interaction device multiple times, a requirement corresponding to the identical or similar voices may be taken as an admission requirement. For example, if the user repeatedly utters the voices such as ‘hello’, ‘play a song’, ‘please turn off’ and ‘fast forward’ multiple times, the requirements corresponding to ‘hello’, ‘play a song’, ‘please turn off’ and ‘fast forward’ will be taken as the admission requirements.
In another example, it is assumed that a voice interaction device such as a speaker box is located in a studio, and the high-frequency or often-occurring voices in the studio may include, for example, ‘play music XX’, ‘open video XX’ and ‘turn off’.
Interference may be caused if a response is made whenever any of these high-frequency voices is received. Thus, the learned admission requirements for this speaker box do not include those corresponding to ‘play music XX’, ‘open video XX’ and ‘turn off’.
In another example, it is assumed that a voice interaction device such as a speaker box is located in a hotel, and the high-frequency or often-occurring voices in the hotel may include greetings such as ‘hello’ and ‘welcome’. Interference may be caused if a response is made whenever any of these high-frequency voices is received. Thus, the learned admission requirements for this speaker box do not include those corresponding to ‘hello’ and ‘welcome’.
In one embodiment, there are various modes for learning the admission requirements in the method; examples are given as follows.
In mode 1, a voice interaction requirement is taken as an admission requirement, if expressions approximate or identical to the voice interaction requirement are continuously detected within a set duration.
For example, if it is detected that the user continuously and repeatedly utters the voice ‘play a song’ to the device multiple times within 10 s, playing music may be taken as an admission requirement for the device.
For another example, if it is detected that the user continuously utters, multiple times within 10 s, voices similar to the requirement of playing music, such as ‘play a song’, ‘play music’ and ‘please play song XX’, playing music may be taken as an admission requirement for the device.
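By way of a non-limiting sketch of mode 1, and assuming that identical or similar expressions have already been mapped to the same requirement label, recent observations may be kept in a sliding window and a requirement admitted once it is expressed repeatedly within the set duration; the duration of 10 s and the repetition threshold of 3 below are illustrative values only:

import time
from collections import deque

class Mode1Learner:
    def __init__(self, duration_s=10.0, min_repetitions=3):
        self.duration_s = duration_s          # the set duration, e.g. 10 s
        self.min_repetitions = min_repetitions
        self.recent = deque()                 # (timestamp, requirement) pairs

    def observe(self, requirement, admission_requirements, now=None):
        now = time.time() if now is None else now
        self.recent.append((now, requirement))
        # drop observations that fall outside the set duration
        while self.recent and now - self.recent[0][0] > self.duration_s:
            self.recent.popleft()
        # count identical (or pre-grouped similar) expressions within the window
        if sum(1 for _, r in self.recent if r == requirement) >= self.min_repetitions:
            admission_requirements.add(requirement)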
In mode 2, an admission requirement is obtained by making statistics of responses of the voice interaction device to voice interaction requirements, and making statistics of feedbacks for the responses of the voice interaction device.
For example, a statistical analysis is made to determine the voice interaction requirements to which the device has responded, and to find a voice interaction requirement for which no negative feedback, such as prohibiting a response thereto, is received from the user. Next, the voice interaction requirement without negative feedback is taken as an admission requirement.
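A minimal sketch of mode 2, assuming that responses and negative feedback are recorded per requirement, maintains two counters and admits a requirement that has been responded to sufficiently often without any negative feedback; the minimum response count of 5 is an assumed illustrative value:

from collections import defaultdict

class Mode2Learner:
    def __init__(self, min_responses=5):
        self.min_responses = min_responses
        self.response_count = defaultdict(int)
        self.negative_feedback_count = defaultdict(int)

    def record_response(self, requirement):
        self.response_count[requirement] += 1

    def record_negative_feedback(self, requirement):
        self.negative_feedback_count[requirement] += 1

    def learned_admission_requirements(self):
        # requirements responded to often enough and never met with negative feedback
        return {r for r, n in self.response_count.items()
                if n >= self.min_responses and self.negative_feedback_count[r] == 0}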
In mode 3, a candidate requirement, to which the voice interaction device has responded, is taken as an admission requirement.
For example, 100 candidate requirements are preset. The device identifies the voices uttered by the user to obtain corresponding candidate requirements, and then responds to the candidate requirements. In addition, after the device responds, the user continues to interact with the device. In this case, a candidate requirement to which the device has responded may be taken as an admission requirement.
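A sketch of mode 3, assuming hypothetical lookup functions has_responded and user_continued_interaction, promotes a preset candidate requirement to an admission requirement once the device has responded to it and the user has kept interacting afterwards:

def learn_from_candidates(candidate_requirements, has_responded, user_continued_interaction):
    admission_requirements = set()
    for candidate in candidate_requirements:
        # a candidate to which the device responded, followed by continued interaction, is admitted
        if has_responded(candidate) and user_continued_interaction(candidate):
            admission_requirements.add(candidate)
    return admission_requirements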
In the above mode 1, the set duration may be a reception duration of the voice interaction device. There are many modes for calculating the reception duration; examples are given as follows.
In example 1, the duration from the latest timing at which a voice interaction requirement is identified to the current timing is taken as the reception duration.
For example, if the latest timing at which the voice interaction requirement ‘what's the weather like today’ is identified is 10:00:00, and the current timing is 10:00:05, the reception duration is 5 s.
In example 2, the duration from the latest timing at which a voice signal is detected to the current timing is taken as the reception duration.
For example, if the latest timing at which the voice signal is detected is 8:00:00, and the current timing is 8:00:07, the reception duration is 7 s.
Next, it is determined whether the reception duration has timed out. For example, a duration threshold is set to 8 s, and if the reception duration is less than or equal to 8 s, it has not timed out; otherwise, it has timed out.
In a case where the reception duration has not timed out, the voice interaction device can continue to receive sounds and identify the voice interaction requirement in the received voice signal.
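The reception-duration examples above may be sketched as follows, with all times expressed in seconds; the choice of reference timing (example 1 or example 2) and the 8 s threshold are taken directly from the examples, while the function names are assumptions:

def reception_duration(current_time, latest_reference_time):
    # example 1: latest timing at which a voice interaction requirement was identified
    # example 2: latest timing at which a voice signal was detected
    return current_time - latest_reference_time

def has_timed_out(duration_s, threshold_s=8.0):
    # less than or equal to the threshold: not timed out; otherwise timed out
    return duration_s > threshold_s

# example 1: identified at 10:00:00, current timing 10:00:05 -> 5 s, not timed out
assert not has_timed_out(reception_duration(5.0, 0.0))
# a reception duration of 9 s would exceed the 8 s threshold
assert has_timed_out(9.0)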
In one embodiment, as illustrated in
S21: receiving a negative feedback after responding to the voice interaction requirement; and deleting the voice interaction requirement from the admission requirements in response to the negative feedback.
In one embodiment, S21 includes:
In one embodiment, the negative feedback includes a negative feedback expression and/or a negative feedback behavior.
The negative feedback expression may include a voice uttered by the user after hearing a voice response from the voice interaction device, the voice indicating that the response is not needed. The negative feedback behavior may include a behavior made by the user after hearing a voice response from the voice interaction device, the behavior indicating that the response is not needed.
After a certain voice interaction requirement is responded to by the device, if negative feedback is received multiple times, it indicates that the user may not want the device to respond to that voice interaction requirement. If included in the admission requirements learned in advance, the voice interaction requirement may be deleted therefrom, so that the device no longer responds to it subsequently. In this way, it is beneficial to correct a misidentified requirement.
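As a non-authoritative sketch of this deletion logic, negative feedback may be counted per requirement and the requirement removed from the admission requirements once the count exceeds a set threshold; the threshold value of 3 is assumed for illustration:

from collections import defaultdict

class NegativeFeedbackTracker:
    def __init__(self, threshold=3):
        self.threshold = threshold            # the set threshold of negative feedbacks
        self.counts = defaultdict(int)

    def on_negative_feedback(self, requirement, admission_requirements):
        self.counts[requirement] += 1
        if self.counts[requirement] > self.threshold:
            # the requirement is deleted so the device no longer responds to it
            admission_requirements.discard(requirement)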
In one example, some default admission requirements may be preset for the voice interaction device. If no negative feedback is received subsequently, these default admission requirements will be retained. A default admission requirement may be deleted if negative feedback is received for it multiple times. For example, the default admission requirements include ‘play’, ‘what's the weather like’, etc. However, if most users prohibit the response to such a default requirement in a personalized manner, the default requirement will no longer be taken as an admission requirement.
The embodiments of the present disclosure can meet the natural experience requirement of the user, learn the real requirement of the user during the user's use of the device, and correct a wrongly identified requirement. By personalizing the user experience, a self-iterative closed loop of the user experience is realized, and the data really takes effect.
In one application example, the admission modes are shown in Table 1, the prohibition modes are shown in Table 2, and the device may be in the same state after the prohibition as before the admission. The limit value of the reception duration is assumed to be 8 s. If the characteristics of the learning signals within the 8 seconds are different, the feedback modes may be different. The response modes of the device may also be different after the initial admission and after the second admission following learning. In Tables 1 and 2, Q indicates the content said by the user, and A indicates the response content of the device. ‘An=Refuse’ indicates that the device refuses to respond at the n-th time. The user's positive follow-up indicates that the user has uttered an approximate or identical expression, etc., which is a positive signal for admission. The user's negative follow-up indicates that the user has uttered a negative expression, etc., which is a negative signal for admission.
Referring to Table 1, when the learning signal is ‘Expressed continuously, approximately and repeatedly in a short term when the device is not awakened’, it is not necessary to learn from some meaningless expressions, such as ultra-short sentences and expressions having no specific meaning like ‘play’, ‘of’ and ‘for’.
In one embodiment, as illustrated in
In one embodiment, the requirement deleting module 44 is further configured to determine that the number of times the negative feedback is received after responding to the voice interaction requirement exceeds a set threshold, and delete the voice interaction requirement from the admission requirements.
In one embodiment, the negative feedback includes a negative feedback expression and/or a negative feedback behavior.
In one embodiment, the apparatus further includes at least one of:
The function of each of the modules in the apparatus according to the embodiments of the present disclosure can refer to corresponding descriptions in the above method, and will not be repeated here.
The apparatus further includes:
The memory 910 may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
If implemented independently, the memory 910, the processor 920 and the communication interface 930 may be connected to each other through a bus and perform communications with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, etc. For the convenience of representation, a single thick line is used in
Alternatively, during implementation, if integrated onto one chip, the memory 910, the processor 920 and the communication interface 930 can perform communications with each other through internal interfaces.
A computer readable storage medium is provided according to the embodiments of the present disclosure. The storage medium is configured to store a computer program which, when executed by a processor, implements the method according to any one of the above embodiments.
Among the descriptions herein, a description referring to terms ‘one embodiment’, ‘some embodiments’, ‘example’, ‘specific example’, ‘some examples’, or the like means that specific features, structures, materials, or characteristics described in conjunction with the embodiment(s) or example(s) are included in at least one embodiment or example of the present disclosure. Moreover, the specific features, structures, materials, or characteristics described may be incorporated in any one or more embodiments or examples in a suitable manner. In addition, persons skilled in the art may incorporate and combine different embodiments or examples described herein and the features thereof without a contradiction therebetween.
In addition, the terms ‘first’ and ‘second’ are used for descriptive purposes only and cannot be understood as indicating or implying a relative importance or implicitly pointing out the number of the technical features indicated. Thus, the features defined with ‘first’ and ‘second’ may explicitly or implicitly include at least one of the features. In the description of the present disclosure, ‘a (the) plurality of’ means ‘two or more’, unless otherwise specified explicitly.
Any process or method description in the flow chart or otherwise described herein may be understood to mean a module, a segment, or a part including codes of executable instructions of one or more steps for implementing a specific logical function or process, and the scope of preferred embodiments of the present disclosure includes additional implementations, wherein the functions may be performed out of the sequence illustrated or discussed, including being performed in a substantially simultaneous manner or in a reverse sequence according to the functions involved, which should be understood by skilled persons in the technical field to which the embodiments of the present disclosure belong.
At least one of the logics and the steps represented in the flow chart or otherwise described herein, for example, may be considered as a sequencing list of executable instructions for implementing logical functions, and may be embodied in any computer readable medium for being used by or in conjunction with an instruction execution system, an apparatus or a device (e.g., a computer-based system, a system including a processor, or any other system capable of fetching and executing instructions from the instruction execution system, the apparatus, or the device). Regarding this specification, the ‘computer readable medium’ may be any means that can contain, store, communicate, propagate, or transfer a program for being used by or in conjunction with the instruction execution system, the apparatus, or the device. More specific examples (a non-exhaustive list) of the computer readable medium include an electrical connection portion (electronic device) having one or more wires, a portable computer enclosure (magnetic device), a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read only memory (CD-ROM). In addition, the computer readable medium may even be paper or any other suitable medium on which the program is printed, because the program can be obtained electronically, for example, by optically scanning the paper or the other medium, then editing, interpreting, or processing it in other suitable ways if necessary, and then storing it in a computer memory.
It should be understood that various parts of the present disclosure may be implemented by hardware, software, firmware, or combinations thereof. In the above embodiments, a plurality of steps or methods may be implemented by software or firmware stored in a memory and executed with a suitable instruction execution system. For example, if hardware is employed for implementation, as in another embodiment, the implementation may be made by any one or combinations of the following technologies known in the art: a discrete logic circuit having a logic gate circuit for implementing logic functions on data signals, an application specific integrated circuit having an appropriate combinational logic gate circuit, a programmable gate array (PGA), a field programmable gate array (FPGA), etc.
Persons of ordinary skill in the art can understand that all or part of the steps carried out by the above method embodiments can be implemented by instructing relevant hardware through a program, wherein the program may be stored in a computer readable storage medium and, when executed, includes one of or a combination of the steps of the method embodiments.
In addition, the functional units in various embodiments of the present disclosure may be integrated into one processing module, or may be physically present separately, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer readable storage medium, which may be a read only memory, a magnetic disk or an optical disk, etc.
Those described above are only embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Within the technical scope revealed in the present disclosure, any skilled person familiar with the technical field can easily conceive of various changes or replacements thereof, which should be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to that of the accompanied claims.