Embodiments relate to the field of information processing technologies, and in particular, to a voice user interface display method and a conference terminal.
With the emergence of artificial intelligence, a voice interaction technology is gradually used in various industries, for example, in a home smart speaker, a voice control vehicle-mounted terminal, a personal voice assistant, and a voice control conference system.
The voice control conference system is used in a public place such as a conference room, and a distinctive characteristic of the system is that its users vary: the organizer and the participants change from conference to conference. Currently, the voice control conference system presents a same user interface to all users.
However, different users participating in a conference may have different requirements in the conference. For example, for a user familiar with the conference system, the user expects to efficiently complete a voice conference control task. For a user using the conference system for the first time, the user expects to obtain more help. The current voice control conference system cannot meet different requirements of different users for the conference system.
Embodiments provide a voice user interface display method and a conference terminal, to resolve a problem that a current voice control conference system cannot meet different requirements of different users for the conference system.
According to a first aspect, an embodiment provides a voice user interface display method, including:
when voice information input by a user into a conference terminal is received, collecting a voice of the user, where the voice information includes a voice wakeup word or voice information starting with the voice wakeup word;
obtaining identity information of the user based on the voice of the user;
obtaining a user voice instruction based on the voice information;
generating user interface information that matches the user, based on the identity information of the user, a conference status of the conference terminal, and the user voice instruction; and
displaying the user interface information.
According to the voice user interface display method provided in the first aspect, when the voice information input by the user is received, the voice of the user is collected. The user voice instruction may be obtained based on the voice information input by the user. The identity information of the user may be obtained in real time based on the voice of the user. Further, the user interface information that matches the user may be displayed based on the identity information of the user, the user voice instruction, and the current conference status of the conference terminal. Because identity information of a user is considered, usage requirements of different users for a conference may be recognized, and user interface information is generated for a target user, thereby meeting different requirements of different users for a conference system, improving diversity of display of the user interface information, and improving user experience in using the conference system.
In a possible implementation, the user voice instruction is used to wake up the conference terminal, and the generating user interface information that matches the user, based on the identity information of the user, a conference status of the conference terminal, and the user voice instruction includes:
determining a type of the user based on the conference status and the identity information of the user, where the type of the user is used to indicate a degree of familiarity of the user in completing a conference control task by inputting the voice information; and
if the type of the user indicates that the user is a new user, generating conference operation prompt information and a voice input interface based on the conference status.
In a possible implementation, the method further includes:
if the type of the user indicates that the user is an experienced user, generating the voice input interface.
In a possible implementation, if the conference status indicates that the user has joined a conference, the method further includes:
obtaining role information of the user in the conference; and
the generating conference operation prompt information and a voice input interface based on the conference status includes:
generating the conference operation prompt information and the voice input interface based on the conference status and the role information.
In a possible implementation, the determining a type of the user based on the conference status and the identity information of the user includes:
obtaining a historical conference record of the user based on the identity information of the user, where the historical conference record includes at least one of the following data: latest occurrence time of different conference control tasks, a quantity of cumulative task usage times, and a task success rate; and
determining the type of the user based on the conference status and the historical conference record of the user.
In a possible implementation, the determining the type of the user based on the conference status and the historical conference record of the user includes:
obtaining data of at least one conference control task associated with the conference status in the historical conference record of the user; and
determining the type of the user based on the data of the at least one conference control task.
In a possible implementation, the determining the type of the user based on the data of the at least one conference control task includes:
for each conference control task, if data of the conference control task includes latest occurrence time, and a time interval between the latest occurrence time and current time is greater than or equal to a first preset threshold, and/or if data of the conference control task includes a quantity of cumulative task usage times, and the quantity of cumulative task usage times is less than or equal to a second preset threshold, and/or if data of the conference control task includes a task success rate, and the task success rate is less than or equal to a third preset threshold, determining that the user is a new user for the conference control task; or
for each conference control task, if at least one of latest occurrence time, a quantity of cumulative task usage times, and a task success rate that are included in data of the conference control task meets a corresponding preset condition, determining that the user is an experienced user for the conference control task, where a preset condition corresponding to the latest occurrence time is that a time interval between the latest occurrence time and current time is less than the first preset threshold, a preset condition corresponding to the quantity of cumulative task usage times is that the quantity of cumulative task usage times is greater than the second preset threshold, and a preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
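The per-task decision described above can be sketched as follows. The field names and the three threshold values are illustrative assumptions; the method itself only requires that each recorded metric be compared against its preset threshold:

```python
from datetime import datetime, timedelta

# Illustrative preset thresholds (not prescribed by the method itself).
FIRST_PRESET_THRESHOLD = timedelta(days=90)   # max gap since latest occurrence
SECOND_PRESET_THRESHOLD = 5                   # min quantity of cumulative usage times
THIRD_PRESET_THRESHOLD = 0.8                  # min task success rate

def classify_user_for_task(task_data: dict, now: datetime) -> str:
    """Return "experienced" if at least one recorded metric meets its
    preset condition, otherwise "new" for this conference control task."""
    last_time = task_data.get("latest_occurrence_time")
    count = task_data.get("cumulative_usage_count")
    success = task_data.get("task_success_rate")

    if last_time is not None and now - last_time < FIRST_PRESET_THRESHOLD:
        return "experienced"
    if count is not None and count > SECOND_PRESET_THRESHOLD:
        return "experienced"
    if success is not None and success > THIRD_PRESET_THRESHOLD:
        return "experienced"
    return "new"
```

A user who used the task recently, used it often, or used it successfully is treated as experienced for that task; a user with no metric meeting its condition is treated as new.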
In a possible implementation, the user voice instruction is used to execute a conference control task after waking up the conference terminal, a running result of the user voice instruction includes a plurality of candidates, and the generating user interface information that matches the user, based on the identity information of the user, a conference status of the conference terminal, and the user voice instruction includes:
sorting the plurality of candidates based on the identity information of the user to generate the user interface information that matches the user.
In a possible implementation, the sorting the plurality of candidates based on the identity information of the user to generate the user interface information that matches the user includes:
obtaining a correlation between each candidate and the identity information of the user; and
sorting the plurality of candidates based on the correlations to generate the user interface information that matches the user.
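A minimal sketch of this candidate sorting follows. The correlation function (a shared-attribute count between each candidate and the identity information) is an illustrative assumption; any correlation measure could be substituted:

```python
def sort_candidates(candidates, identity_info):
    """Sort candidate results so that those most correlated with the
    user's identity information are displayed first."""
    def correlation(candidate):
        # Count attributes the candidate shares with the user,
        # e.g. same department or same office site.
        return sum(1 for key, value in identity_info.items()
                   if candidate.get(key) == value)
    return sorted(candidates, key=correlation, reverse=True)
```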
In a possible implementation, the obtaining a user voice instruction based on the voice information includes:
performing semantic understanding on the voice information to generate the user voice instruction;
or
sending the voice information to a server; and
receiving the user voice instruction sent by the server, where the user voice instruction is generated after the server performs semantic understanding on the voice information.
In a possible implementation, the method further includes:
when the voice information input by the user into the conference terminal is received, collecting a profile picture of the user; and
the obtaining identity information of the user based on the voice of the user includes:
obtaining the identity information of the user based on the voice and the profile picture of the user.
In a possible implementation, the obtaining the identity information of the user based on the voice and the profile picture of the user includes:
determining a position of the user relative to the conference terminal based on the voice of the user;
collecting facial information of the user based on the position of the user relative to the conference terminal; and
determining the identity information of the user based on the facial information of the user and a facial information library.
In a possible implementation, the obtaining the identity information of the user based on the voice and the profile picture of the user further includes:
obtaining voiceprint information of the user based on the voice of the user; and
determining the identity information of the user based on the voiceprint information of the user and a voiceprint information library.
According to a second aspect, an embodiment provides a voice user interface display apparatus, including:
a receiving module, configured to, when voice information input by a user into a conference terminal is received, collect a voice of the user, where the voice information includes a voice wakeup word or voice information starting with the voice wakeup word;
a first obtaining module, configured to obtain identity information of the user based on the voice of the user;
a second obtaining module, configured to obtain a user voice instruction based on the voice information;
a generation module, configured to generate user interface information that matches the user, based on the identity information of the user, a conference status of the conference terminal, and the user voice instruction; and
a display module, configured to display the user interface information.
In a possible implementation, the user voice instruction is used to wake up the conference terminal, and the generation module includes:
a first determining unit, configured to determine a type of the user based on the conference status and the identity information of the user, where the type of the user is used to indicate a degree of familiarity of the user in completing a conference control task by inputting the voice information; and
a first generation unit, configured to, if the type of the user indicates that the user is a new user, generate conference operation prompt information and a voice input interface based on the conference status.
In a possible implementation, the generation module further includes:
a second generation unit, configured to, if the type of the user indicates that the user is an experienced user, generate the voice input interface.
In a possible implementation, if the conference status indicates that the user has joined a conference, the generation module further includes:
a first obtaining unit, configured to obtain role information of the user in the conference; and
the first generation unit is configured to:
generate the conference operation prompt information and the voice input interface based on the conference status and the role information.
In a possible implementation, the first determining unit includes:
a first obtaining subunit, configured to obtain a historical conference record of the user based on the identity information of the user, where the historical conference record includes at least one of the following data: latest occurrence time of different conference control tasks, a quantity of cumulative task usage times, and a task success rate; and
a determining subunit, configured to determine the type of the user based on the conference status and the historical conference record of the user.
In a possible implementation, the determining subunit is configured to:
obtain data of at least one conference control task associated with the conference status in the historical conference record of the user; and
determine the type of the user based on the data of the at least one conference control task.
In a possible implementation, the determining subunit is configured to:
for each conference control task, if data of the conference control task includes latest occurrence time, and a time interval between the latest occurrence time and current time is greater than or equal to a first preset threshold, and/or if data of the conference control task includes a quantity of cumulative task usage times, and the quantity of cumulative task usage times is less than or equal to a second preset threshold, and/or if data of the conference control task includes a task success rate, and the task success rate is less than or equal to a third preset threshold, determine that the user is a new user for the conference control task; or
for each conference control task, if at least one of latest occurrence time, a quantity of cumulative task usage times, and a task success rate that are included in data of the conference control task meets a corresponding preset condition, determine that the user is an experienced user for the conference control task, where a preset condition corresponding to the latest occurrence time is that a time interval between the latest occurrence time and current time is less than the first preset threshold, a preset condition corresponding to the quantity of cumulative task usage times is that the quantity of cumulative task usage times is greater than the second preset threshold, and a preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
In a possible implementation, the user voice instruction is used to execute a conference control task after waking up the conference terminal, a running result of the user voice instruction includes a plurality of candidates, and the generation module includes:
a third generation unit, configured to sort the plurality of candidates based on the identity information of the user to generate the user interface information that matches the user.
In a possible implementation, the third generation unit includes:
a second obtaining subunit, configured to obtain a correlation between each candidate and the identity information of the user; and
a generation subunit, configured to sort the plurality of candidates based on the correlations to generate the user interface information that matches the user.
In a possible implementation, the second obtaining module is configured to:
perform semantic understanding on the voice information to generate the user voice instruction;
or
send the voice information to a server; and
receive the user voice instruction sent by the server, where the user voice instruction is generated after the server performs semantic understanding on the voice information.
In a possible implementation, the receiving module is further configured to:
when the voice information input by the user into the conference terminal is received, collect a profile picture of the user; and
the first obtaining module is configured to obtain the identity information of the user based on the voice and the profile picture of the user.
In a possible implementation, the first obtaining module includes:
a second determining unit, configured to determine a position of the user relative to the conference terminal based on the voice of the user;
a collection unit, configured to collect facial information of the user based on the position of the user relative to the conference terminal; and
a third determining unit, configured to determine the identity information of the user based on the facial information of the user and a facial information library.
In a possible implementation, the first obtaining module further includes:
a second obtaining unit, configured to obtain voiceprint information of the user based on the voice of the user; and
a fourth determining unit, configured to determine the identity information of the user based on the voiceprint information of the user and a voiceprint information library.
According to a third aspect, an embodiment provides a conference terminal, including a processor, a memory, and a display.
The memory is configured to store program instructions.
The display is configured to display user interface information under control of the processor.
The processor is configured to invoke and execute the program instructions stored in the memory, and when the processor executes the program instructions stored in the memory, the conference terminal is configured to perform the method in any implementation of the first aspect.
According to a fourth aspect, an embodiment provides a chip system. The chip system includes a processor, and may further include a memory, and the chip system is configured to implement the method in any implementation of the first aspect. The chip system may include a chip or may include a chip and another discrete component.
According to a fifth aspect, an embodiment provides a program. When executed by a processor, the program is used to perform the method in any implementation of the first aspect.
According to a sixth aspect, an embodiment provides a computer program product including instructions. When the computer program product is run on a computer, the computer is enabled to perform the method in any implementation of the first aspect.
According to a seventh aspect, an embodiment provides a computer-readable storage medium. The computer-readable storage medium stores instructions. When the instructions are run on a computer, the computer is enabled to perform the method in any implementation of the first aspect.
Each device in the conference system may be pre-installed with a software program or an application (APP), to implement a voice interaction task between a user and the conference system by using a voice recognition technology and a semantic understanding technology.
It should be noted that a quantity of conference terminals 100 and a quantity of servers in the conference system are not limited in this embodiment.
The conference terminal 100 may include a sound collection device, a sound playing device, a photographing device, a memory, a processor, and the like. The sound collection device is configured to obtain a voice input by the user. The photographing device may collect an image or a video in a conference. The sound playing device may play a voice part in a result of the voice interaction task. Optionally, the conference terminal 100 may further include a transceiver. The transceiver is configured to communicate with another device and transmit data or instructions. Optionally, the conference terminal 100 may further include a display. The display is configured to display a displayable part in the result of the voice interaction task. Optionally, if the conference terminal 100 does not include a display, the conference terminal 100 may further perform data transmission with an external display device, so that the display device displays the displayable part in the result of the voice interaction task.
The following uses an example to describe the voice interaction task.
In some implementations or scenarios, the voice interaction task may also be referred to as a voice task, a conference control task, or the like. A function implemented by the voice interaction task is not limited in the embodiments.
For example, the user says a voice wakeup word “Hi, Scotty” to the conference terminal in a listening state. The voice interaction task may be waking up the conference terminal. After the task is performed, the conference terminal changes from the listening state to a standby state to wait for the user to continue to input a voice. In this case, a voice input window may be displayed on the display of the conference terminal.
For another example, the user says “Hi, Scotty, please call user A” to the conference terminal in a conference. The voice interaction task may be waking up the conference terminal and then initiating a call. After the task is performed, the conference terminal may be woken up and call user A. In this case, an interface of calling user A may be displayed on the display of the conference terminal.
It should be noted that a shape and a product type of the conference terminal 100 are not limited in this embodiment.
It should be noted that an implementation of each part in the conference terminal 100 is not limited in this embodiment. For example, the sound collection device may include a microphone or a microphone array, the sound playing device may include a loudspeaker, and the photographing device may be a camera of any resolution.
The following describes the conference system from a perspective of software.
In this embodiment, when the instruction is executed, a type of the user, a user identity, and a conference status may be all considered. By executing the instruction based on the foregoing factors, a running result of the instruction that matches the type of the user, the user identity, and the conference status may be obtained. This improves interface displaying flexibility and user experience.
For example, an identity recognition engine 24 may obtain identity information of the user from a user information database 27 by using at least one of a sound source localization technology, a sound source tracking technology, a voiceprint recognition technology, a facial recognition technology, a lip movement recognition technology, and the like. The identity recognition engine 24 outputs the identity information of the user to the central control module 23.
An identity type determining unit 25 may determine the type of the user. The type of the user is used to indicate a degree of familiarity of the user in completing a conference control task by inputting a voice. It should be noted that, for different conference control tasks, a same user may have different types. For example, if user B often organizes a conference, user B may be an experienced user for conference control tasks such as joining the conference, initiating the conference, and adding a participant to the conference. However, if user B only organizes a conference and does not participate in a subsequent conference, user B may be a new user for a conference control task such as ending the conference, sharing a screen in the conference, or viewing a site in the conference. The identity type determining unit 25 may output the type of the user to the central control module 23.
A prompt information management unit 26 may push different prompt information to the central control module 23 based on the conference status.
Finally, the central control module 23 executes the instruction based on outputs of the identity recognition engine 24, the identity type determining unit 25, the prompt information management unit 26, and the dialog management module 22 to obtain the execution result.
It should be noted that the conference system in this embodiment may implement the functions of all the foregoing modules.
The following describes the solutions in detail by using various embodiments. The following several embodiments may be combined with each other, and same or similar concepts or processes may not be described in detail in some embodiments.
It should be noted that the terms “first”, “second”, “third”, “fourth”, and so on (if existent) are intended to distinguish between similar objects but do not necessarily indicate a specific order or sequence.
S301: When voice information input by a user into the conference terminal is received, a voice of the user is collected.
The voice information includes a voice wakeup word or voice information starting with the voice wakeup word.
For example, before performing voice interaction with the conference terminal, the user needs to first wake up the conference terminal by using the voice wakeup word. The voice information input by the user may include only the voice wakeup word, for example, “Hi, Scotty”. Alternatively, the voice information may be voice information starting with the voice wakeup word, for example, “Hi, Scotty, please call user A”, “Hi, Scotty, please share the screen of conference room B”, or “Hi, Scotty, I want to end the conference”. A sound collection device is disposed on the conference terminal. When the user inputs the voice information into the conference terminal, the conference terminal may collect the voice of the user.
Optionally, a photographing device may be disposed on the conference terminal. When the user inputs the voice information into the conference terminal, the conference terminal may collect a profile picture of the user.
It should be noted that an implementation of the voice wakeup word is not limited in this embodiment.
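The check in S301 — voice information that is either the wakeup word alone or an utterance starting with it — can be sketched as follows. The wakeup word string and the upstream speech recognizer producing the text are illustrative assumptions:

```python
WAKEUP_WORD = "hi, scotty"  # illustrative wakeup word from the examples above

def parse_voice_information(recognized_text):
    """Return ("wake_up", None) for the wakeup word alone,
    ("command", remainder) for an utterance starting with it,
    or None for speech without the wakeup word."""
    text = recognized_text.strip().lower()
    if not text.startswith(WAKEUP_WORD):
        return None
    remainder = text[len(WAKEUP_WORD):].strip(" ,.")
    return ("wake_up", None) if not remainder else ("command", remainder)
```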
S302: Identity information of the user is obtained based on the voice of the user.
For example, when the voice information input by the user is received, the voice of the user is collected. Because the voice of the user is highly distinctive information, the identity information of the user may be obtained in real time by using the voice of the user, thereby improving timeliness of obtaining the identity information of the user.
Then, whether the user is an authorized user may be determined based on the identity information of the user, and a personalized conference display interface may be customized for the user. For example, different interfaces may be displayed based on different display styles for users in different departments.
Optionally, the identity information of the user may include at least one of the following:
name, gender, age, graduation date, work experience, onboarding date, department, employee ID, office site number, fixed-line phone number, mobile phone number, whether the user is on a business trip, business trip destination, and hobbies.
Optionally, if the profile picture of the user is further collected when the voice information input by the user into the conference terminal is received, that identity information of the user is obtained based on the voice of the user in S302 may include:
obtaining the identity information of the user based on the voice and the profile picture of the user.
For example, the profile picture of the user is also highly distinctive information. The identity information of the user is obtained based on both the voice and the profile picture of the user. This further improves accuracy of the identity information of the user, especially in a scenario in which a large quantity of users use the conference terminal and the users change frequently, for example, in a large enterprise with a large quantity of employees.
S303: A user voice instruction is obtained based on the voice information.
For example, after obtaining the voice information input by the user, the conference terminal may perform voice recognition and semantic understanding on the voice, to obtain the user voice instruction. The user voice instruction may be executed by the conference terminal.
It should be noted that an execution sequence of S302 and S303 is not limited in this embodiment. For example, S302 may be performed before or after S303, or S302 and S303 may be performed simultaneously.
S304: User interface information that matches the user is generated based on the identity information of the user, a conference status of the conference terminal, and the user voice instruction.
S305: The user interface information is displayed.
For example, user interface information that matches the user may be different based on different identity information of the user, different conference statuses, and different user voice instructions.
The following provides description by using an example.
In an example, the graduation date of a user is July 2018, and the onboarding date of the user is August 2018. Currently, it is November 2018. It indicates that the user is a new employee who has just worked for three months. It is assumed that the conference terminal is in a listening state. The user voice instruction is used to wake up the conference terminal. Therefore, after the conference terminal switches from the listening state to a standby state, displayed user interface information that matches the user may include prompt information related to entering a conference.
In another example, the onboarding date of a user is 2014. Currently, it is 2018. It indicates that the user has worked for four years. It may be determined that the user is familiar with a conference procedure. In a same scenario as the foregoing example, when the conference terminal switches from the listening state to the standby state, no prompt information may be displayed, and only a voice input window is displayed.
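The two examples above can be sketched as a single interface-generation step for the wake-up instruction: a new user receives conference operation prompt information together with the voice input window, while an experienced user receives only the window. The prompt texts and status keys are illustrative assumptions:

```python
def build_wakeup_interface(user_type, conference_status):
    """Generate user interface information for the wake-up instruction,
    matched to the type of the user and the conference status."""
    interface = {"voice_input_window": True, "prompts": []}
    if user_type == "new":
        # Prompt information is selected based on the conference status.
        prompts_by_status = {
            "not_joined": ["Say 'join conference 12345' to enter a conference."],
            "joined": ["Say 'share my screen' or 'view site B'."],
        }
        interface["prompts"] = prompts_by_status.get(conference_status, [])
    return interface
```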
The conference status is used to indicate an execution stage and an execution status of a conference or the conference terminal. Specific classification of the conference status is not limited in this embodiment.
Optionally, the conference status may include at least one of the following: not having joined a conference, having joined a conference, sharing a screen in a conference, viewing a site in a conference, and the like.
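The statuses listed above could be modeled as a simple enumeration; the value names below are illustrative assumptions:

```python
from enum import Enum

class ConferenceStatus(Enum):
    """Execution stage/status of a conference or the conference terminal."""
    NOT_JOINED = "not_joined"
    JOINED = "joined"
    SHARING_SCREEN = "sharing_screen"
    VIEWING_SITE = "viewing_site"
```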
It can be understood that, according to the voice user interface display method provided in this embodiment, when the voice information input by the user into the conference terminal is received, the voice of the user is collected. The user voice instruction may be obtained based on the voice information input by the user. The identity information of the user may be obtained in real time based on the voice of the user. Further, the user interface information that matches the user may be displayed based on the identity information of the user, the user voice instruction, and the current conference status of the conference terminal. Because identity information of a user is considered, usage requirements of different users for a conference may be recognized, and user interface information is generated for a target user, thereby meeting different requirements of different users for a conference system, improving diversity of display of the user interface information, and improving user experience in using the conference system.
Optionally, in S302, that identity information of the user is obtained based on the voice of the user may include:
obtaining voiceprint information of the user based on the voice of the user; and
determining the identity information of the user based on the voiceprint information of the user and a voiceprint information library.
For example, a voiceprint recognition technology or the like may be used to obtain the voiceprint information of the user, and the voiceprint information is then matched against the voiceprint information library to determine the identity information of the user.
Optionally, the voiceprint information library may be periodically updated.
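The library lookup can be sketched as a nearest-neighbor match over voiceprint embeddings. The use of cosine similarity and the 0.7 acceptance threshold are illustrative assumptions; the embedding extraction itself is assumed to be done upstream:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_voiceprint(embedding, voiceprint_library, threshold=0.7):
    """voiceprint_library maps user identity -> enrolled embedding.
    Return the closest enrolled identity, or None if no enrolled
    voiceprint is similar enough."""
    best_user, best_score = None, threshold
    for user_id, enrolled in voiceprint_library.items():
        score = cosine_similarity(embedding, enrolled)
        if score >= best_score:
            best_user, best_score = user_id, score
    return best_user
```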
Optionally, the obtaining the identity information of the user based on the voice and the profile picture of the user may include:
determining a position of the user relative to the conference terminal based on the voice of the user;
collecting facial information of the user based on the position of the user relative to the conference terminal; and
determining the identity information of the user based on the facial information of the user and a facial information library.
For example, the position of the user relative to the conference terminal may be determined by using a sound source tracking technology, a sound source localization technology, a lip movement recognition technology, or the like. Further, in an image or a video collected by the photographing device, a facial recognition technology or the like is used to collect the facial information of the user based on the position of the user relative to the conference terminal. Then, the facial information of the user is matched against the facial information library to determine the identity information of the user.
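As a sketch of one possible sound source localization approach (not necessarily the one used in the embodiments), the direction of the user may be estimated from the time difference of arrival (TDOA) between two microphones. The microphone spacing, sampling rate, and speed of sound below are illustrative assumptions.

```python
import numpy as np

def estimate_direction(sig_left, sig_right, mic_distance=0.2, fs=16000, c=343.0):
    """Estimate the horizontal angle (degrees) of a sound source from the
    time-difference-of-arrival between two microphone signals."""
    # Cross-correlate the two channels; the peak gives the sample lag
    # by which sig_left is delayed relative to sig_right.
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_right) - 1)
    tau = lag / fs
    # Far-field model: tau = mic_distance * sin(theta) / c
    sin_theta = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

Once the angle is known, a camera could be steered (or an image region selected) toward that direction to collect the facial information mentioned above.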
Optionally, the facial information library may be periodically updated.
Optionally, the position of the user relative to the conference terminal may include a direction of the user relative to the conference terminal.
Optionally, in S302, the obtaining the identity information of the user based on the voice and the profile picture of the user may further include:
obtaining voiceprint information of the user based on the voice of the user; and
determining the identity information of the user based on the voiceprint information of the user and a voiceprint information library.
In this implementation, after the voiceprint information of the user is obtained, it is matched against the voiceprint information library to determine the identity information of the user. Because the identity information of the user is determined based on both the voiceprint information and the facial information of the user, accuracy of the identity information of the user is further improved.
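A minimal sketch of combining the two modalities, assuming each recognizer outputs per-user match scores in [0, 1]; the fusion weights are illustrative assumptions and are not specified by the embodiments.

```python
def fuse_identity(voice_scores, face_scores, w_voice=0.6, w_face=0.4):
    """Fuse per-user match scores from voiceprint and facial recognition
    with a weighted sum, and return the top-scoring identity."""
    users = set(voice_scores) | set(face_scores)
    fused = {u: w_voice * voice_scores.get(u, 0.0)
                + w_face * face_scores.get(u, 0.0)
             for u in users}
    return max(fused, key=fused.get)
```

A weighted-sum fusion is only one option; score normalization or a learned combiner could equally be used.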
Optionally, in S303, that a user voice instruction is obtained based on the voice information includes:
performing semantic understanding on the voice information to generate the user voice instruction;
or
sending the voice information to a server; and
receiving the user voice instruction sent by the server, where the user voice instruction is generated after the server performs semantic understanding on the voice information.
For example, in an implementation, the conference terminal may perform voice recognition and semantic understanding on the voice information input by the user to generate the user voice instruction. This simplifies the process of obtaining the user voice instruction.
In another implementation, data transmission may be performed between the conference terminal and the server, the server performs voice recognition and semantic understanding on the voice information input by the user, and the server returns the user voice instruction to the conference terminal. This reduces the hardware requirements of the conference terminal and is easy to implement.
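For illustration, a toy local "semantic understanding" step may be sketched as simple pattern matching over the recognized text. A real conference terminal or server would use a full speech recognition and natural language understanding stack; the intent names and patterns below are hypothetical.

```python
import re

# Illustrative intent patterns for a few conference control tasks.
INTENT_PATTERNS = {
    "call":      re.compile(r"\bcall\s+(?P<contact>.+)", re.IGNORECASE),
    "join_conf": re.compile(r"\bjoin (the )?conference\b", re.IGNORECASE),
    "exit_conf": re.compile(r"\b(exit|leave) (the )?conference\b", re.IGNORECASE),
}

def parse_instruction(text):
    """Turn recognized text into a user voice instruction (intent + slots)."""
    for intent, pattern in INTENT_PATTERNS.items():
        m = pattern.search(text)
        if m:
            return {"intent": intent, "slots": m.groupdict()}
    return {"intent": "unknown", "slots": {}}
```

In the server-side variant, the same parsing would run remotely and only the resulting instruction structure would be sent back to the terminal.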
This embodiment provides the voice user interface display method, including: when the voice information input by the user into the conference terminal is received, collecting the voice of the user; obtaining the identity information of the user based on the voice of the user; obtaining the user voice instruction based on the voice information; generating the user interface information that matches the user, based on the identity information of the user, the conference status of the conference terminal, and the user voice instruction; and displaying the user interface information. According to the voice user interface display method provided in this embodiment, when the voice information input by the user is received, the voice of the user is collected. The identity information of the user may be obtained in real time based on the voice of the user. Because identity information of a user, a conference status of the conference terminal, and a voice interaction task that the user expects to perform are considered, usage requirements of different users for a conference may be recognized, and user interface information is generated for a target user, thereby meeting different requirements of different users for a conference system, improving diversity of display of the user interface information, and improving user experience in using the conference system.
As shown in
S401: A type of the user is determined based on the conference status and the identity information of the user.
The type of the user is used to indicate a degree of familiarity of the user in completing a conference control task by inputting the voice information.
S402: If the type of the user indicates that the user is a new user, conference operation prompt information and a voice input interface are generated based on the conference status.
S403: If the type of the user indicates that the user is an experienced user, the voice input interface is generated.
For example, for a same user, the degree of familiarity of the user in completing the conference control task by inputting the voice information may differ across conference statuses. When it is determined that the user is a new user, the conference operation prompt information and the voice input interface may be generated. The conference operation prompt information provides conference guidance for the new user, thereby improving efficiency and accuracy of inputting a voice by the new user and improving a success rate of completing the conference control task by the new user. This meets a conference requirement of the new user. When it is determined that the user is an experienced user, no prompt information is required. In this case, only the voice input interface is generated, and the user can complete the conference control task by directly inputting a voice. This saves the time and skips the step of displaying the conference operation prompt information, thereby improving efficiency of completing the conference control task by the experienced user. This meets a conference requirement of the experienced user and improves user experience.
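The branching in S402/S403 can be sketched as follows; the prompt texts and status names are illustrative placeholders, not content prescribed by the embodiments.

```python
def generate_ui(user_type, conference_status):
    """S402/S403 sketch: a new user gets conference operation prompt
    information plus the voice input interface; an experienced user
    gets only the voice input interface."""
    ui = {"voice_input_interface": True}
    if user_type == "new":
        # Hypothetical per-status prompt texts.
        prompts = {
            "not_joined": ["Say: join the conference"],
            "in_conference": ["Say: exit the conference"],
        }
        ui["prompts"] = prompts.get(conference_status, [])
    return ui
```

Keying the prompt table by conference status reflects the point above that a user's familiarity, and the relevant guidance, can differ per status.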
The following provides description by using an example.
Optionally, in an example,
As shown on the left side of
It should be noted that a display location, display content, and a display style of the prompt area 101 and the voice input interface 102 are not limited in this example.
Optionally, the prompt area 101 may be displayed in a noticeable area of the voice user interface, so that the new user can easily see the prompt area 101.
Optionally, in another example,
As shown on the left side of
Optionally, according to the voice user interface display method provided in this embodiment, if the conference status indicates that the user has joined a conference, the method may further include:
obtaining role information of the user in the conference.
Correspondingly, in S402, that conference operation prompt information and a voice input interface are generated based on the conference status may include:
generating the conference operation prompt information and the voice input interface based on the conference status and the role information.
For example, for a conference, there may be a plurality of conference statuses in an entire process from creating the conference to ending the conference. Different conference statuses may relate to different conference control tasks. When the user has joined the conference, the user may have a plurality of roles in the conference, for example, a conference host and a non-conference host. Role classification for the user in the conference is not limited in this embodiment.
Therefore, based on different conference statuses, if the conference status indicates that the user has joined the conference, the conference operation prompt information and the voice input interface are generated based on the conference status and the role information of the user in the conference, thereby further improving a matching degree between the prompt information and the user, and improving user experience in using the conference system.
The following provides description by using an example.
As shown in
As shown in
A comparison of scenario (a) with scenario (b) shows that, when the user has not joined the conference, a related conference control task may include "joining the conference" and does not include "exiting the conference"; after the user has joined the conference, a related conference control task may include "exiting the conference" and does not include "joining the conference".
As shown in
As shown in
As shown in
As shown in
Optionally, in S401, that a type of the user is determined based on the conference status and the identity information of the user may include:
obtaining a historical conference record of the user based on the identity information of the user, where the historical conference record includes at least one of the following data: latest occurrence time of different conference control tasks, a quantity of cumulative task usage times, and a task success rate; and
determining the type of the user based on the conference status and the historical conference record of the user.
The following provides description with reference to
It should be noted that a manner of recording data in the historical conference record library is not limited in this embodiment. For example, the data may be stored in a form of a table.
Optionally, the historical conference record library may be periodically updated.
Optionally, the determining the type of the user based on the conference status and the historical conference record of the user may include:
obtaining data of at least one conference control task associated with the conference status in the historical conference record of the user; and
determining the type of the user based on the data of the at least one conference control task.
For example, for a conference, there may be a plurality of conference statuses in an entire process from creating the conference to ending the conference. Different conference statuses may relate to different conference control tasks. The type of the user is determined based on the data of the at least one conference control task associated with the conference status, thereby further improving accuracy of determining the type of the user.
Optionally, the determining the type of the user based on the data of the at least one conference control task may include:
for each conference control task, if data of the conference control task includes latest occurrence time, and a time interval between the latest occurrence time and current time is greater than or equal to a first preset threshold, and/or if data of the conference control task includes a quantity of cumulative task usage times, and the quantity of cumulative task usage times is less than or equal to a second preset threshold, and/or if data of the conference control task includes a task success rate, and the task success rate is less than or equal to a third preset threshold, determining that the user is a new user for the conference control task.
For example, for the condition for determining that the user is a new user for a conference control task, it is sufficient that any one of the latest occurrence time, the quantity of cumulative task usage times, and the task success rate meets the corresponding new-user condition; in this case, it may be determined that the user is a new user.
For example, if the data of the conference control task includes the latest occurrence time and the task success rate, in an implementation, the time interval between the latest occurrence time and the current time is greater than or equal to the first preset threshold, and the task success rate is greater than the third preset threshold. Because the latest occurrence time meets the new-user condition, it is determined that the user is a new user even though the task success rate does not meet the new-user condition.
It should be noted that specific values of the first preset threshold, the second preset threshold, and the third preset threshold are not limited in this embodiment.
It should be noted that, if there are various types of data used to determine that the user is a new user, an execution sequence of determining whether various types of data meet the corresponding condition that the user is a new user is not limited.
Optionally, the determining the type of the user based on the data of the at least one conference control task may include:
for each conference control task, if each of the latest occurrence time, the quantity of cumulative task usage times, and the task success rate that are included in data of the conference control task meets a corresponding preset condition, determining that the user is an experienced user for the conference control task. A preset condition corresponding to the latest occurrence time is that a time interval between the latest occurrence time and current time is less than the first preset threshold, a preset condition corresponding to the quantity of cumulative task usage times is that the quantity of cumulative task usage times is greater than the second preset threshold, and a preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
For example, for the condition for determining that the user is an experienced user for a conference control task, only when all of the data included in the record, such as the latest occurrence time, the quantity of cumulative task usage times, and the task success rate, meets the corresponding experienced-user condition may it be determined that the user is an experienced user.
For example, if the data of the conference control task includes the latest occurrence time and the task success rate, only when the time interval between the latest occurrence time and the current time is less than the first preset threshold and the task success rate is greater than the third preset threshold may it be determined that the user is an experienced user for the conference control task.
For another example, if the data of the conference control task includes the latest occurrence time, the quantity of cumulative task usage times, and the task success rate, only when the time interval between the latest occurrence time and the current time is less than the first preset threshold, the quantity of cumulative task usage times is greater than the second preset threshold, and the task success rate is greater than the third preset threshold may it be determined that the user is an experienced user for the conference control task.
It should be noted that, if there are various types of data used to determine that the user is an experienced user, an execution sequence of determining whether various types of data meet the corresponding condition that the user is an experienced user is not limited.
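Putting the new-user and experienced-user conditions together, the classification for a single conference control task can be sketched as below. The concrete threshold values are illustrative assumptions, since the embodiments leave the first, second, and third preset thresholds open; any of the three data fields may be absent from the record.

```python
from datetime import datetime, timedelta

# Illustrative thresholds (the embodiments do not fix these values).
FIRST_THRESHOLD = timedelta(days=90)   # staleness of the latest occurrence time
SECOND_THRESHOLD = 5                   # quantity of cumulative task usage times
THIRD_THRESHOLD = 0.8                  # task success rate

def user_type_for_task(record, now=None):
    """Classify a user as 'new' or 'experienced' for one conference control
    task: 'new' if ANY present field meets its new-user condition,
    'experienced' only if every present field meets its preset condition."""
    now = now or datetime.now()
    new_user = False
    if "latest_time" in record and now - record["latest_time"] >= FIRST_THRESHOLD:
        new_user = True
    if "usage_count" in record and record["usage_count"] <= SECOND_THRESHOLD:
        new_user = True
    if "success_rate" in record and record["success_rate"] <= THIRD_THRESHOLD:
        new_user = True
    return "new" if new_user else "experienced"
```

Because the two rule sets are complements of each other over the fields that are present, a single pass over the record decides both cases.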
This embodiment provides the voice user interface display method, and the type of the user is determined based on the conference status and the identity information of the user. If the type of the user indicates that the user is a new user, the conference operation prompt information and the voice input interface are generated based on the conference status. If the type of the user indicates that the user is an experienced user, the voice input interface is generated. For a new user, the conference operation prompt information may provide good guidance for the new user, thereby improving efficiency and accuracy of inputting a voice by the new user and improving a success rate of completing the conference control task. For an experienced user, this avoids displaying redundant prompt information, and skips a process for guidance, thereby improving efficiency of completing the conference control task by the experienced user. This meets different requirements of different users for the conference system and improves user experience.
Embodiment 3 further provides a voice user interface display method. Based on the embodiment shown in
In this embodiment, the user voice instruction is used to execute the conference control task after waking up the conference terminal. If a running result of the user voice instruction includes a plurality of candidates, in S304, that user interface information that matches the user is generated based on the identity information of the user, a conference status of the conference terminal, and the user voice instruction may include:
sorting the plurality of candidates based on the identity information of the user to generate the user interface information that matches the user.
The following provides description by using an example.
It is assumed that the voice input by a user 1 is "Hi, Scotty, call Li Jun". The generated user voice instruction is used to wake up the conference terminal and then call Li Jun. However, there may be a plurality of employees named Li Jun in the company. In addition, because the input is a voice, there may be a plurality of names that have the same pronunciation as "Li Jun". In this case, the running result of the user voice instruction includes a plurality of candidates, and the plurality of candidates need to be sorted based on the identity information of the user to generate the user interface information that matches the user. Therefore, a matching degree between a displayed candidate result and the user is improved, and user experience is improved.
Optionally, the sorting the plurality of candidates based on the identity information of the user to generate the user interface information that matches the user may include:
obtaining a correlation between each candidate and the identity information of the user; and
sorting the plurality of candidates based on the correlations to generate the user interface information that matches the user.
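For illustration, the correlation-based sorting may be sketched as follows; the correlation features (shared department, shared project, past-contact count) and their weights are hypothetical examples of identity-related signals, not features specified by the embodiments.

```python
def correlation_score(candidate, caller, weights=(0.5, 0.3, 0.2)):
    """Toy correlation between a candidate contact and the caller's identity,
    built from a few illustrative signals."""
    w_dept, w_proj, w_hist = weights
    score = 0.0
    if candidate.get("department") == caller.get("department"):
        score += w_dept
    if candidate.get("project") == caller.get("project"):
        score += w_proj
    # Past-contact count, capped at 10, scaled into [0, w_hist].
    score += w_hist * min(candidate.get("contact_count", 0), 10) / 10
    return score

def sort_candidates(candidates, caller):
    """Order the candidate results (e.g. several employees named 'Li Jun')
    so that the candidate most correlated with the caller is shown first."""
    return sorted(candidates, key=lambda c: correlation_score(c, caller),
                  reverse=True)
```

With such an ordering, the candidate the caller most likely meant appears at the top of the displayed list.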
The following provides description by using an example.
Optionally, in an example,
Optionally, in another example,
Optionally, if there is only one running result of the user voice instruction, user interface information is directly displayed.
The following provides description by using an example.
Optionally,
This embodiment provides the voice user interface display method. When the user voice instruction is used to execute the conference control task after waking up the conference terminal, if the running result of the user voice instruction includes the plurality of candidates, the plurality of candidates are sorted based on the identity information of the user to generate the user interface information that matches the user. Therefore, the matching degree between the displayed candidate result and the user is improved, and user experience is improved.
The receiving module 1201 is configured to, when voice information input by a user into a conference terminal is received, collect a voice of the user, where the voice information includes a voice wakeup word or voice information starting with the voice wakeup word.
The first obtaining module 1202 is configured to obtain identity information of the user based on the voice of the user.
The second obtaining module 1203 is configured to obtain a user voice instruction based on the voice information.
The generation module 1204 is configured to generate user interface information that matches the user, based on the identity information of the user, a conference status of the conference terminal, and the user voice instruction.
The display module 1205 is configured to display the user interface information.
In a possible implementation, the user voice instruction is used to wake up the conference terminal, and the generation module 1204 includes:
a first determining unit, configured to determine a type of the user based on the conference status and the identity information of the user, where the type of the user is used to indicate a degree of familiarity of the user in completing a conference control task by inputting the voice information; and
a first generation unit, configured to, if the type of the user indicates that the user is a new user, generate conference operation prompt information and a voice input interface based on the conference status.
In a possible implementation, the generation module 1204 further includes:
a second generation unit, configured to, if the type of the user indicates that the user is an experienced user, generate the voice input interface.
In a possible implementation, if the conference status indicates that the user has joined a conference, the generation module 1204 further includes:
a first obtaining unit, configured to obtain role information of the user in the conference; and
the first generation unit is configured to:
generate the conference operation prompt information and the voice input interface based on the conference status and the role information.
In a possible implementation, the first determining unit includes:
a first obtaining subunit, configured to obtain a historical conference record of the user based on the identity information of the user, where the historical conference record includes at least one of the following data: latest occurrence time of different conference control tasks, a quantity of cumulative task usage times, and a task success rate; and
a determining subunit, configured to determine the type of the user based on the conference status and the historical conference record of the user.
In a possible implementation, the determining subunit is configured to:
obtain data of at least one conference control task associated with the conference status in the historical conference record of the user; and
determine the type of the user based on the data of the at least one conference control task.
In a possible implementation, the determining subunit is configured to:
for each conference control task, if data of the conference control task includes latest occurrence time, and a time interval between the latest occurrence time and current time is greater than or equal to a first preset threshold, and/or if data of the conference control task includes a quantity of cumulative task usage times, and the quantity of cumulative task usage times is less than or equal to a second preset threshold, and/or if data of the conference control task includes a task success rate, and the task success rate is less than or equal to a third preset threshold, determine that the user is a new user for the conference control task; or
for each conference control task, if each of the latest occurrence time, the quantity of cumulative task usage times, and the task success rate that are included in data of the conference control task meets a corresponding preset condition, determine that the user is an experienced user for the conference control task, where a preset condition corresponding to the latest occurrence time is that a time interval between the latest occurrence time and current time is less than the first preset threshold, a preset condition corresponding to the quantity of cumulative task usage times is that the quantity of cumulative task usage times is greater than the second preset threshold, and a preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
In a possible implementation, the user voice instruction is used to execute a conference control task after waking up the conference terminal, a running result of the user voice instruction includes a plurality of candidates, and the generation module 1204 includes:
a third generation unit, configured to sort the plurality of candidates based on the identity information of the user to generate the user interface information that matches the user.
In a possible implementation, the third generation unit includes:
a second obtaining subunit, configured to obtain a correlation between each candidate and the identity information of the user; and
a generation subunit, configured to sort the plurality of candidates based on the correlations to generate the user interface information that matches the user.
In a possible implementation, the second obtaining module 1203 is configured to:
perform semantic understanding on the voice information to generate the user voice instruction;
or
send the voice information to a server; and
receive the user voice instruction sent by the server, where the user voice instruction is generated after the server performs semantic understanding on the voice information.
In a possible implementation, the receiving module 1201 is further configured to:
when the voice information input by the user into the conference terminal is received, collect a profile picture of the user; and
the first obtaining module 1202 is configured to obtain the identity information of the user based on the voice and the profile picture of the user.
In a possible implementation, the first obtaining module 1202 includes:
a second determining unit, configured to determine a position of the user relative to the conference terminal based on the voice of the user;
a collection unit, configured to collect facial information of the user based on the position of the user relative to the conference terminal; and
a third determining unit, configured to determine the identity information of the user based on the facial information of the user and a facial information library.
In a possible implementation, the first obtaining module 1202 further includes:
a second obtaining unit, configured to obtain voiceprint information of the user based on the voice of the user; and
a fourth determining unit, configured to determine the identity information of the user based on the voiceprint information of the user and a voiceprint information library.
The voice user interface display apparatus provided in this embodiment may be configured to perform the foregoing solutions in the voice user interface display method embodiments. Implementation principles and effects of the apparatus are similar to those of the method embodiments. Details are not described herein again.
The memory 1302 is configured to store program instructions.
The display 1303 is configured to display user interface information under control of the processor 1301.
The processor 1301 is configured to invoke and execute the program instructions stored in the memory 1302, and when the processor 1301 executes the program instructions stored in the memory 1302, the conference terminal is configured to perform the solutions in the voice user interface display method embodiments. Implementation principles and effects of the conference terminal are similar to those of the method embodiments. Details are not described herein again.
It may be understood that
An embodiment further provides a chip system. The chip system includes a processor, and may further include a memory, configured to perform the foregoing solutions in the voice user interface display method embodiments. Implementation principles and effects of the chip system are similar to those of the method embodiments. Details are not described herein again. The chip system may include a chip or may include a chip and another discrete component.
An embodiment further provides a program. When executed by a processor, the program is used to perform the foregoing solutions in the voice user interface display method embodiments. Implementation principles and effects of the program are similar to those of the method embodiments. Details are not described herein again.
An embodiment further provides a computer program product including instructions. When the computer program product is run on a computer, the computer is enabled to perform the foregoing solutions in the voice user interface display method embodiments. Implementation principles and effects of the computer program product are similar to those of the method embodiments. Details are not described herein again.
An embodiment further provides a computer-readable storage medium. The computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is enabled to perform the foregoing solutions in the voice user interface display method embodiments. Implementation principles and effects of the computer-readable storage medium are similar to those of the method embodiments. Details are not described herein again.
In the embodiments, the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments. The general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed with reference to the embodiments may be directly performed by a hardware processor or may be performed by using a combination of hardware in the processor and a software module.
In the embodiments, the memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, such as a random access memory (RAM). Alternatively, the memory may be any other medium that can be used to carry or store expected program code in a form of an instruction or a data structure and that can be accessed by a computer. However, this is not limited thereto.
In the several embodiments provided, it should be understood that the disclosed apparatus and method may be implemented in other manners. The described apparatus embodiment is merely an example. For example, the unit division is merely logical function division and may be other division in an actual implementation; a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of the embodiments.
In addition, functional units in the embodiments may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in a form of hardware or may be implemented in a form of hardware combined with a software functional unit.
A person of ordinary skill in the art may understand that sequence numbers of the foregoing processes do not mean execution sequences in various embodiments. The execution sequences of the processes should be determined based on functions and internal logic of the processes and should not constitute any limitation on the implementation processes of the embodiments.
All or some of the foregoing embodiments may be implemented by using software, hardware, firmware, or any combination thereof. When software is used for implementation, all or some of the embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the procedure or functions according to the embodiments are all or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).
Number | Date | Country | Kind
---|---|---|---
201811467420.5 | Dec 2018 | CN | national
This application is a continuation of International Application No. PCT/CN2019/118081, filed on Nov. 13, 2019, which claims priority to Chinese Patent Application No. 201811467420.5, filed on Dec. 3, 2018. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2019/118081 | Nov 2019 | US
Child | 17331953 | | US