The present technology relates to an information processing apparatus and an information processing method, and more particularly, to an information processing apparatus and an information processing method capable of satisfactorily issuing an instruction relating to a written sentence of an utterance in dictation.
In a case where a plurality of persons conducts the dictation, it is difficult to identify whether the plurality of persons is making an unrelated conversation or alternately conducting the dictation. In addition, the manner of speaking differs from person to person. Hence, even when a command is distinguished with accuracy, the recognition result may not always be an intended one due to ambiguities in utterances of users, individual differences in expression, or the like.
For example, PTL 1 discloses that an input voice is divided into a plurality of segments, one or more phonemes are assigned to each segment, one or more words are decided based on the phoneme, one of the words stored in a storage unit is displayed on a monitor as a decided word, and words other than the decided word are set as next candidates of display.
[PTL 1]
Japanese Patent Laid-Open No. Hei 11-143487
In a case where one person conducts the dictation, such one person can determine, for example, whether or not what such one person inputs now is necessary. However, in a case where a plurality of persons conducts the dictation, it is impossible to determine whether one person is talking to another or making an input to an agent. Further, in a case where the plurality of persons alternately makes an input, since the characteristics and expressions in the utterances differ depending on the person, it may be difficult to correct incorrect recognition or the like with a candidate similar to that in a case where one person makes an input.
The present technology has an object to satisfactorily issue an instruction relating to a written sentence of an utterance in dictation.
A concept of the present technology lies in an information processing apparatus including a display control unit configured to control displaying of a written sentence of an utterance in dictation, a giving unit configured to give an initiative to a predetermined user, and an edit control unit configured to control such that the user to whom the initiative has been given is able to issue an instruction relating to the written sentence of the utterance.
In the present technology, the display control unit controls displaying of the written sentence of the utterance in the dictation. For example, the display control unit may display the written sentence of the utterance in a state in which a user who has made the utterance is identifiable. For example, by displaying in different colors or applying an icon or a symbol, the user who has made the utterance is made in an identifiable state. In addition, the display control unit may display the written sentence of the utterance in an undecided state until a decision is made. For example, blinking, gray characters, or the like is applicable. In this case, for example, the written sentence of the utterance may be decided by a timeout or a decision process.
The giving unit gives an initiative to a user. For example, the giving unit may give the initiative to the user who has started a dictation. In this case, for example, the giving unit may not give the initiative in a case where the user who has started the dictation has a predetermined attribute. This enables prevention of an occurrence of inconvenience due to the initiative being given to a user having a predetermined attribute. For example, the giving unit may not give the initiative in a case where the user who has started the dictation is equal to or younger than a predetermined age. This enables avoidance of mischief by a child. In addition, in this case, for example, the giving unit may give the initiative to the user depending on a receiver to whom the written sentence of the utterance is sent, even in the case where the user who has started the dictation is equal to or younger than the predetermined age. This enables a child to send a message to, for example, a family member.
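The initiative-giving rules described above can be sketched as follows. The age threshold, the function name, and the notion of a "family" receiver are illustrative assumptions for explanation, not elements of the present technology.

```python
# Sketch of the initiative-giving rules described above.
# ADULT_AGE is an assumed value for "a predetermined age"; actual
# criteria (age, other attributes) would be configurable.

ADULT_AGE = 13

def may_give_initiative(starter_age, receiver_is_family=False):
    """Return True if the user who started dictation should receive the initiative."""
    if starter_age > ADULT_AGE:
        return True  # a user above the predetermined age always receives the initiative
    # A user equal to or younger than the predetermined age receives the
    # initiative only depending on the receiver, e.g., a family member.
    return receiver_is_family
```

For instance, under these assumptions a child starting dictation of a message to a family member would still receive the initiative, while a child messaging an arbitrary receiver would not.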
The edit control unit controls such that the user to whom the initiative has been given is able to issue an instruction relating to the written sentence of the utterance. For example, the instruction relating to the written sentence of the utterance includes send, decide, complete, register, cancel, clear, and the like.
In such a manner, in the present technology, the instruction relating to the written sentence of the utterance can be issued by the user to whom the initiative has been given. Therefore, the user to whom the initiative has been given is able to satisfactorily issue an instruction relating to the written sentence of the utterance in the dictation. For example, even in an environment in which a message is created by a plurality of persons, the user having the initiative is able to create and send the message as the user intends.
Hereinafter, an embodiment for carrying out the invention (hereinafter, referred to as an “embodiment”) will be described. It is to be noted that the description will be given in the following order.
1. Embodiment
2. Modification
The control unit 101 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like and controls operations of each unit of the information processing apparatus 100. The input and output interface 102 connects the operation input device 103, the camera 104, the microphone 105, the speaker 106, and the display 107. The operation input device 103 constitutes an operation unit for performing various operation inputs by an administrator or a user of the information processing apparatus 100. The operation input device 103 also includes a touch panel arranged on a screen of the display 107.
The camera 104 captures an image of, for example, a user on the front side of the information processing apparatus 100 to obtain image data. The microphone 105 detects an utterance of a user to obtain voice data. The speaker 106 outputs a voice as a response output to the user. The display 107 outputs a screen as a response output to the user.
The user recognition unit 108 performs a face recognition process on the image data, detects a face of each user present in an image that is a field of view of the information processing apparatus 100, performs an image analysis process on an image of the detected face of each user, and identifies a user by comparing with a feature amount of each user that has been registered beforehand. Note that it is conceivable that the user recognition unit 108 analyzes the voice data, compares with the feature amount of each user that has been registered beforehand, and identifies the user. In addition, with respect to the user recognition, the user may designate himself or herself with any recognition means (button operation, voice operation, or the like), even when the user is not automatically recognized.
Further, the user recognition unit 108 performs the image analysis process on the image of each user's face that has been detected and detects the direction of each user's face and the visual line of each user. Further, the user recognition unit 108 performs an analysis process on the image data of each user and detects a finger pointing direction indicating which direction a finger points, in a case where, for example, the user is indicating a direction with the finger. Various types of detection information obtained by the user recognition unit 108 in such manners are sent to the control unit 101.
The voice recognition unit 109 performs a voice recognition process on the voice data to obtain utterance text information. The utterance text information is sent to the control unit 101. The utterance text information is held in association with a user, based on user identification information obtained by the user recognition unit 108, as described above. The communication interface 110 communicates with a cloud server, not illustrated, through a network such as the Internet, and obtains various types of information.
The semantic analysis guide database 111 is a database to be referred to in a case where an utterance of a user is in a request utterance mode, such as “tell me the weather for tomorrow,” or “what time is it now?” The dictation guide database 112 is a database to be referred to in a case where an utterance of a user is in a dictation mode, such as “send a message to oo,” “register schedule next month,” or “register ToDo.” Here, the dictation mode is a mode in which an utterance of a user is input into a text as it is, unlike an utterance of a request.
The information processing apparatus 100 illustrated in
A flowchart of
In a case where the mode identification is possible, the control unit 101 determines whether the mode corresponding to the utterance of the user is the request utterance mode or the dictation mode in step ST3. In a case of the request utterance mode, the control unit 101 performs the request utterance mode process in step ST4. On the other hand, in a case of the dictation mode, the control unit 101 performs the dictation mode process in step ST5.
In addition, in a case where the mode identification is not possible in step ST2, the control unit 101 performs the fuzzy mode process corresponding to both modes of the request utterance mode and the dictation mode in step ST6.
In the case of the request utterance mode, it is not necessary to write the words one by one precisely. It is sufficient if a command is communicated. In addition, in such a case, only the command may be executed without writing. In a case of incorrect recognition, it is considered that the user desires to know a candidate for re-execution as a command. Therefore, a command similar in a partial match or the like or a related command is presented together with an execution result.
In addition, in the case of the dictation mode, when a sentence is not written as the user said, the user desires to correct the sentence. In the case of incorrect recognition, it is considered that the user desires to see candidates for a correction utterance; hence, a partially replaced phrase or a phrase with a symbol such as a question mark "?" is presented.
Further, in the case of the fuzzy mode, either the request utterance or the dictation can be accepted. That is, while executing the request, a dictation standby is presented. In this case, the dictation standby is presented while the request is executed, such that separate areas are displayed on the presentation screen.
The dictation mode process will be further described.
In this case, Mammy makes an instruction utterance of “send.” This causes a message “buy milk on your way home, buy strawberry jam, too” to be sent to Daddy. In a case where “buy strawberry jam, too,” which is an utterance made by the child, is incorrect, the information processing apparatus 100 is not capable of identifying it. Hence, Mammy has to cancel that part intentionally. In addition, in this case, in a case where “buy strawberry jam, too” which is an utterance made by the child is incorrect and then the child makes an instruction utterance “send,” it is also important not to send the message “buy milk on your way home, buy strawberry jam, too.”
As illustrated in the above examples of
In addition, in the present embodiment, the user who has started the dictation has an initiative, and only the user having the initiative is able to issue an instruction such as send, decide, complete, register, cancel, and clear so as to prevent mischief or forcible interruption. In this case, in a case where the user who has started the dictation has a predetermined attribute (age, sex, character, ability, or the like), the initiative may not be given. This enables prevention of an occurrence of inconvenience caused by giving the initiative to a user having a predetermined attribute.
In this case, an utterance, an external sound, or the like that has been input unintentionally is subject to dictation but is not executed; therefore, this is not critical. In addition, as long as a decision process is not performed, the input may be displayed as temporary input information (for example, blinking or gray characters), and a timeout may be provided for the decision process. Further, in a case where a child or the like possibly conducts mischief, the initiative may be given to an adult only. In this case, for example, in a case where a user who has started the dictation is equal to or younger than a predetermined age, the initiative is not given. Furthermore, for example, the handling of the initiative may be changed depending on the person such that, in a case where a receiver is a family member, a child is also allowed to send. In this case, for example, depending on the receiver to whom the written sentence of the utterance is to be sent, the initiative is given even in the case where the user who has started the dictation is equal to or younger than a predetermined age.
For example,
In addition, in this case, the part “what time are you coming back home today?” and the part “buy a toy” are displayed so that the users that have made the respective utterances are identifiable, for example, in different colors. By displaying to be identifiable in such a manner, it is convenient to, for example, designate the part to be canceled.
It is to be noted that, in the above description, the example of canceling the utterance input by the child has been illustrated. However, in a similar manner, a meaningless written sentence due to incorrect recognition of an external sound or the like can be an utterance input. Also in such a case, the user having the initiative is able to delete the written sentence by making an instruction utterance “clear.” In addition, also in a case of being used in business or the like, it can be used as an application that gives the initiative to a person having a specific authority only.
Here, session management of inputs in the dictation mode will be described. In a case where a user who is making an utterance input in dictation is present, another user is able to make an utterance input additionally, without starting a new session in particular. In this case, another user near the user who is making the utterance input is detected, and the utterance input of that other user is additionally written. In addition, in a case where it is clearly understood from information regarding the face direction or the like of the other user that the utterance is not an additional utterance input, the utterance input is not written. By conducting the session management in such a manner, a user who performs an additional utterance input later does not have to mention a starting word, and each user is able to alternately make an utterance input.
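The session management above can be sketched as a simple acceptance test. The distance and face-angle thresholds below are illustrative assumptions; the apparatus would obtain these quantities from the user recognition unit 108.

```python
# Sketch of session management in the dictation mode: while one user is
# making an utterance input, an utterance by another user nearby joins the
# same session, unless the other user's face direction clearly indicates
# the utterance is conversation rather than an additional input.
# Both thresholds are illustrative assumptions.

NEARBY_DISTANCE = 2.0  # metres: assumed radius for "near the user"
FACING_ANGLE = 45.0    # degrees: assumed limit for "facing the apparatus"

def accept_additional_input(distance_to_speaker, face_angle_to_apparatus):
    """Return True if the other user's utterance should be written into the session."""
    if distance_to_speaker > NEARBY_DISTANCE:
        return False  # too far from the user making the utterance input
    # A face clearly turned away suggests talking to a person, not dictation.
    return abs(face_angle_to_apparatus) <= FACING_ANGLE
```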
Next, a decision process in the dictation mode will be described. A termination of an utterance is detected, and the decision process is performed for each termination. Such a decision process is performed by the user having the initiative making an instruction utterance “decide” or by a timeout due to the lapse of a certain period of time after the termination is detected. For example, an interruptive utterance can be cleared before the timeout at each termination. In a case of not being cleared, it is decided at the timeout or a decision utterance.
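The per-termination decision process can be sketched as follows; the four-second timeout follows the example given later in this description, and the function shape is an assumption for illustration.

```python
# Sketch of the decision process in the dictation mode: a pending written
# part is decided either by a "decide" instruction utterance made by the
# user having the initiative, or by a timeout after the utterance
# termination is detected. TIMEOUT_SEC follows the four-second example
# used later in this description.

TIMEOUT_SEC = 4.0

def is_decided(termination_time, now, decide_uttered, utterer_has_initiative):
    """Return True if the pending written part should be decided."""
    if decide_uttered and utterer_has_initiative:
        return True  # explicit decision by the user having the initiative
    # Otherwise, decide only when a certain period has elapsed since the
    # utterance termination (decision by timeout).
    return (now - termination_time) >= TIMEOUT_SEC
```

Note that a "decide" utterance by a user without the initiative has no effect, consistent with the initiative rule described earlier.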
The utterance input continues as it is, even when there is an utterance termination, until the user decides the utterance. In this case, in a case where a part is desired to be cleared, the part to be decided is designated and then decided. For example, the part to be decided can be designated by making an utterance "decide including 'coming back home?'" or "send including 'coming back home?'" In addition, clear is executed by designating the part desired to be cleared. For example, by making an utterance "buy," the part continuous from "buy" ("buy" and later) is cleared. Further, for example, by making an utterance "buy a toy," "buy a toy" as a whole is cleared.
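The clear designation above can be sketched as a small string operation. Matching on the literal uttered phrase is an assumption for illustration; an actual apparatus would match against the recognized written sentence.

```python
# Sketch of the clear designation: uttering a phrase clears the written
# sentence from that phrase onward ("buy" clears "buy" and later), and
# uttering a whole trailing part ("buy a toy") clears exactly that part.
# Simple substring matching is an illustrative assumption.

def clear_from(sentence, phrase):
    """Clear the part of `sentence` from the first occurrence of `phrase` onward."""
    index = sentence.find(phrase)
    if index < 0:
        return sentence  # phrase not found; nothing is cleared
    return sentence[:index].rstrip()
```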
Here, by using
Then, in the state illustrated in
Next, by using
In this state, when a certain period of time, for example, four seconds elapses, a timeout is determined. As illustrated in
In this case, the utterance part of Mammy "what time are you coming back home today?" and the utterance part of the child "buy a toy" are displayed so that the users who have made the respective utterances are identifiable, for example, in different colors. It is to be noted that, instead of using different colors, the users can also be identified by an icon or a symbol. For example,
Next, by using
Mammy, who has started the dictation, has the initiative. In the state of
It is to be noted that, in the example of
It is to be noted that, in the above description, the user having the initiative is able to perform the cancellation process in the state where the written sentence of the utterance input is in an undecided state. However, in such a state, each user is also able to perform a correction process on the sentence. Also in this case, the final decision of the correction process on the sentence can be performed by the user having the initiative.
In addition, in a case where a process such as cancellation or sentence correction is performed, its time point is set as a new timeout process start point, for example. Accordingly, even in a case where a user performs a plurality of processes such as cancellation and sentence correction, the user is able to perform the processes with enough time.
Further, by using
Then, Mammy makes an instruction utterance “send.” Then, as illustrated in
Next, by using
In this state, as illustrated in
Then, Mammy having the initiative makes an instruction utterance "send," as illustrated in
Next, by using
Mammy, who has started the dictation, has the initiative. In the state of
A flowchart of
First, the control unit 101 starts the dictation mode process in step ST11. Next, the control unit 101 gives an initiative to a start utterance user in step ST12. Next, the control unit 101 determines whether or not there is an utterance in step ST13.
In a case where there is an utterance, the control unit 101 determines whether or not the utterance is a correction instruction utterance in step ST14. In a case where the utterance is the correction instruction utterance, the control unit 101 performs a correction process on a written sentence in step ST15, and then returns to the process of step ST13.
In a case where the utterance is not the correction instruction utterance, the control unit 101 determines whether or not the utterance is another instruction utterance other than the correction instruction, such as “clear,” “decide,” “register,” “send,” or “correct,” in step ST16. In a case where the utterance is not another instruction utterance, the control unit 101 displays the written sentence corresponding to the utterance on the display 107 in step ST17, and then returns to the process of step ST13.
In a case where the utterance is another instruction utterance in step ST16, the control unit 101 determines whether or not an utterance user has the initiative in step ST18. In a case where the utterance user does not have the initiative, the another instruction utterance is made invalid, and the control unit 101 returns to the process of step ST13.
In a case where the utterance user has the initiative in step ST18, the control unit 101 determines whether or not the instruction is a decision (send, register, or the like) in step ST19. In a case where the instruction is not a decision (send, register, or the like), the control unit 101 performs a process other than the decision (send, register, or the like) in step ST20, and then returns to the process of step ST13.
On the other hand, in a case where the instruction is a decision (send, register, or the like), the control unit 101 performs a decision process (send, register, or the like) in step ST21, and then ends a series of processes in step ST22.
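The flow of steps ST11 to ST22 can be sketched as follows, processing a stream of utterance events. The event representation, the event kinds, and the instruction vocabulary are illustrative assumptions; the actual processing is performed by the control unit 101 as described above.

```python
# Sketch of the dictation mode process (steps ST11 to ST22), applied to a
# list of (user, utterance, kind) events. The first event's user is the
# start utterance user, who receives the initiative (ST12).

DECISIONS = {"decide", "send", "register"}  # assumed decision instructions

def run_dictation(events):
    """Return (written_parts, decided) after processing the event stream."""
    initiative_user = events[0][0] if events else None  # ST12
    written, decided = [], False
    for user, utterance, kind in events:                # ST13
        if kind == "correction":                        # ST14 -> ST15
            if written:
                written[-1] = utterance                 # correct the written sentence
        elif kind == "instruction":                     # ST16
            if user != initiative_user:                 # ST18: made invalid
                continue
            if utterance in DECISIONS:                  # ST19 -> ST21, ST22
                decided = True
                break
            if utterance == "clear" and written:        # ST20: non-decision process
                written.pop()
        else:                                           # ST17: display written sentence
            written.append(utterance)
    return written, decided
```

Under these assumptions, a "send" by a user other than the start utterance user is simply ignored, matching the Mammy-and-child example above.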
A case where a plurality of users desires to conduct different tasks will be described. In this case, the information processing apparatus 100 regards the utterances as alternate utterances in a case where a domain (intent) and a slot (entity) are the same. Here, the domain means, for example, sending of a message, calendar registration, ToDo registration, and the like. In addition, the slot means, for example, a destination in a case of the domain of sending of the message, a date and the like in a case of the calendar registration, and a target person in a case of the ToDo registration. Therefore, the case where the domain and the slot are the same corresponds to a case where destinations in sending a message are the same, a case where dates in the calendar registration are the same, a case where target persons in the ToDo registration are the same, or the like.
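The same-task determination above can be sketched as a simple comparison. The tuple representation of a task is an assumption for illustration.

```python
# Sketch of the same-task determination: utterances are regarded as
# alternate inputs to one task only when both the domain (intent) and the
# slot (entity) coincide. A task is represented here as a (domain, slot)
# tuple, which is an illustrative assumption.

def same_task(task_a, task_b):
    """task = (domain, slot), e.g. ("message", "Daddy") or ("calendar", "2024-06-01")."""
    return task_a[0] == task_b[0] and task_a[1] == task_b[1]
```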
It is to be noted that, even in a case where the slots are different, as long as the domains are the same and the display is possible, the information processing apparatus 100 performs a process on an identical screen. Further, in a case where the domains are different, the information processing apparatus 100 divides a screen and performs the processes in a presenting manner, or performs a process by substituting a voice output for a domain that cannot be displayed in a divided manner. For example, it is conceivable that, in a case of performing a message sending task based on an utterance of Mammy "send a message to Daddy" and performing a request task based on an utterance of a child "display the weather," the message sending task is performed on a screen, but the weather is communicated to the child by voice.
A conversion candidate for a written sentence will be described. As described above, in the dictation mode process, a written sentence of an utterance in the dictation is displayed. In this case, a conversion candidate for a correction utterance of incorrect recognition is displayed.
How to present the conversion candidate will be described. Basically, priority is given to similar-sound candidates over notation-variant candidates (for example, whether a character is converted into a Japanese Kanji character or remains in a Japanese Hiragana character, whether a number is represented in a Japanese Kanji character or an Arabic numeral, and the like). This is because, even when a notation variant is included, the meaning can still make sense. It is to be noted that a notation-variant candidate can be presented to a user who is particular about the notation variant. In addition, only Japanese Hiragana characters can be presented to a child user. Whether or not a user is particular about the notation variant may be determined based on a personality attribute database of the user, or may be determined based on correction history information of the user in the past. Further, whether or not a user is a child can be determined based on user recognition results.
Regarding how to present the conversion candidate, a history is utilized to present candidates for each utterance user. In this case, in a case where there is no similar-sound candidate in the history of the target user, the history of another user such as a family member can be referred to. In this case, a candidate similar to the utterance is presented from among the utterance input sentences of the target user in the past or from among the sentences used by another user in the past. In addition, in this case, a candidate suitable for a context, or a place, time, situation, and the like is presented with priority.
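The two priority rules above, similar-sound candidates before notation variants, and candidates from the target user's own history before those from another user's history, can be sketched as a sort key. The candidate representation is an assumption for illustration, and context- or place-based priority is omitted for brevity.

```python
# Sketch of conversion-candidate ordering: similar-sound candidates are
# preferred over notation variants, and candidates appearing in the target
# user's own utterance history are preferred over those drawn from another
# user's history. Each candidate is a (text, is_similar_sound) pair, which
# is an illustrative assumption.

def rank_candidates(candidates, own_history):
    """Return candidates sorted so higher-priority candidates come first."""
    def key(candidate):
        text, is_similar_sound = candidate
        return (
            0 if is_similar_sound else 1,     # similar sound before notation variant
            0 if text in own_history else 1,  # own history before another user's
        )
    return sorted(candidates, key=key)
```

Python's `sorted` is stable, so candidates that tie on both rules keep their original presentation order.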
Next, how to designate for correction will be described. In a case where identical utterances are input, the utterance part is determined to be incorrect recognition, and a conversion candidate is changed to be different from the previous one. For example, in a case where a first utterance is “have a dinner,” a second utterance (correction utterance) is “have a dinner,” and a first written sentence is “have a dinner,” a second written sentence is corrected to, for example, “have a dinner?” which is different from the first one.
In addition, in a case where a correction utterance “xx instead of oo” is made, the corresponding part “oo” in the written sentence is corrected to “xx.” For example, consideration is given to a case where, in response to an utterance input “yuuhan taberu? (“have a dinner?” in Japanese),” a written sentence that has been recognized is “Yuu ha taberu (“Yuu eat” in Japanese).” In this case, in a case where a correction utterance “yuuhan instead of Yuu ha” is made, the part “Yuu ha” is corrected to “yuuhan.”
In addition, correction of the written sentence is conducted by a correction utterance of a conversion candidate only or the designation of the number of the conversion candidate. For example, consideration is given to a case where, in response to an utterance input “yuuhan taberu?,” a written sentence that has been recognized is “Yuu ha taberu.” In this case, in a case where a correction utterance “yuuhan” is made, correction is made to “yuuhan taberu.”
In addition, with respect to the written sentence of an utterance of a certain user, a correction utterance made by another user is also processed equally with a correction utterance made by the certain user. This enables another family member to make a correction utterance, in a case where the certain user's voice is hardly picked up.
Correction in a case of alternately inputting a long sentence will be described. In this case, a sentence that has been input can be corrected. That is, while a certain user is inputting the next sentence, another user is able to correct a previous sentence. In this case, an utterance and an already input sentence are compared with each other. In a case where the similarity is equal to or more than a certain ratio, the utterance is regarded as an input of a correction sentence and a correction is made. In this case, the corrected part may be indicated so as to be understood by another user other than the user who has made the correction, for example, a user who is inputting the next sentence.
In addition, in this case, the sentence that has been input by a certain user can also be corrected by another user. In this case, the utterance and the already input sentence are compared with each other. In a case where the similarity is equal to or more than a certain ratio, the utterance is regarded as an input of a correction sentence. After the certain user confirms, the correction is decided. This prevents a sentence of a certain user from being corrected by another user without permission.
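The similarity-based detection of a correction sentence described above can be sketched with a standard sequence-similarity measure. The 0.6 threshold and the use of `difflib` are illustrative assumptions for the "certain ratio."

```python
import difflib

# Sketch of correction detection for alternately input long sentences: an
# utterance whose similarity to an already input sentence is equal to or
# more than a certain ratio is regarded as a correction of that sentence
# rather than a new input. The threshold value is an illustrative assumption.

SIMILARITY_THRESHOLD = 0.6

def find_correction_target(utterance, sentences):
    """Return the index of the sentence being corrected, or None for a new input."""
    best_index, best_ratio = None, 0.0
    for i, sentence in enumerate(sentences):
        ratio = difflib.SequenceMatcher(None, utterance, sentence).ratio()
        if ratio >= SIMILARITY_THRESHOLD and ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index
```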
Next, by an utterance input of the user 1 “this fiscal year's main activities are participation in cultural festival and citizens' festival,” a written sentence corresponding to the utterance input is added. Next, by an utterance input of the user 2 “as budget, 350,000 yen in total is calculated,” a written sentence corresponding to the utterance input is added.
Next, according to an instruction utterance input of the user 2 “delete as budget and later,” as budget and later is deleted from the written sentence. In this case, the deletion is indicated so as to be understood by the user 1 (see the hatched part). Next, by an utterance input of the user 2 “budget is 350,000 yen in total,” a written sentence corresponding to the utterance input is added. In this case, the added part is displayed to be different in color from the other parts so that the user 1 understands the added part.
Next, by an utterance input of the user 2 for a correction instruction "performances on stage in citizens' festival," the part of "citizens' festival" is corrected to "performances on stage in citizens' festival." Also in this case, the corrected part is displayed to be different in color from the other parts so that the user 1 understands the corrected part. In this case, the user corrects an input part of another user, which is not the user's own, and thus the corrected part is made more noticeable.
Next, by an utterance input of a third party in a remote area or who is not a co-writer “activity plan in the fiscal year of 18,” the part “activity plan” is corrected to “activity plan in the fiscal year of 18.” In this case, the third party corrects the input part of the user, and the corrected part becomes more noticeable. It is to be noted that such a notice can be made, for example, in a special color. However, in
Utilization of other modalities in a case of being performed by a plurality of persons will be described. Use of an instruction word and a position will be described. For example, it is conceivable to select a conversion candidate corresponding to an utterance such as “change to a middle one” for correction, on the basis of the position of a user who is making an utterance. In addition, for example, it is conceivable to detect a standing position of each user, select a conversion candidate that is relatively close in response to an utterance “this,” and select a conversion candidate that is relatively distant in response to “that” so as to make a correction.
Use of a hand, a gesture, and a visual line will be described. By making an utterance “correct to this,” “change to this,” or the like, while pointing with a finger, touching, or the like to designate a conversion candidate, a correction is made by the conversion candidate that has been designated.
Further, a conversion candidate is selected in combination of an utterance and touching or the like to make a correction. For example, consideration is given to a case where, in response to an utterance input of a user "kaerini juunanzai kattekite (“buy a softening agent on your way home” in Japanese)," a written sentence that has been recognized is "kaerini juumankai kattekite (“buy ten thousand times on your way home” in Japanese)," and conversion candidates (1) juumankai ("ten thousand times" in Japanese), (2) juunanzai ("a softening agent" in Japanese), and (3) juunansai ("a teenager" in Japanese) are presented. In this case, by making a second utterance "buy (touch (2)) on your way home" or "buy (2) on your way home," (2) a softening agent is selected as the conversion candidate and a correction is made.
It is to be noted that, in a case where a plurality of users makes utterances, it is conceivable that a conversion candidate is presented to be close to a user who is currently making an utterance so as to be easily visible and easily touched. In addition, it is conceivable that in a written sentence, by presenting only a conversion candidate relating to the part on which the user's visual line stays, the user is able to select the conversion candidate with accuracy.
In
Note that it is also conceivable that a conversion candidate for correcting the written sentence of the utterance made by each user is given by voice instead of the screen display. Also in such a case, the voice can be given to the user so that only that user can hear the voice.
In
Control by a display area will be described. In a case where a certain amount of the display area can be used, it is conceivable to display the whole text as the conversion candidates while emphasizing a difference between the candidates. In addition, in a case where the display area is small, it is conceivable to display only a part where a change is made. Further, for example, in a case where there is no display, it is conceivable to repeat the sentence by voice and, after only the part to be corrected is corrected, repeat the corrected part. It is to be noted that the case where there is no display corresponds to, for example, a wearable device of a watch type, an earphone type, or the like.
As described heretofore, in the information processing apparatus 100 illustrated in
It is to be noted that in the above-described embodiment, the request utterance mode and the dictation mode have been described. However, a mixed mode is also conceivable such that a request part and a dictation part are identified from an utterance and an appropriate input is made.
In addition, in the above-described embodiment, as examples of conducting dictation, sending of a message, calendar registration, and ToDo registration have been illustrated (see
It is to be noted that, in the above-described embodiment, an example of making an input by an utterance of a user has been described. However, it is conceivable to give the initiative to a user who has input earlier, also in a case where the inputs are made through touching or gestures. Accordingly, even in the case where the inputs are made through touching or gestures, the initiative can be given to the user who has started the dictation, so that the user to whom the initiative has been given is able to perform a decision operation and the like.
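Giving the initiative to the user who made the first input, regardless of input modality, can be sketched as follows; this is an illustrative assumption of one possible giving-unit behavior.

```python
def give_initiative(current_holder, user, input_type):
    """Give the initiative to the user who made the first input,
    whether by utterance, touch, or gesture (illustrative sketch).
    Once given, the initiative is not taken by later inputs."""
    if current_holder is None and input_type in ("utterance",
                                                 "touch", "gesture"):
        return user
    return current_holder

holder = give_initiative(None, "user1", "touch")     # first input
holder = give_initiative(holder, "user2", "gesture")  # later input
# holder == "user1": the user who started keeps the initiative
```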
Further, although not described above, it is conceivable that a list of coeditors is provided for each application, such as sending of a message and calendar registration. Providing the list in such a manner enables, for example, a specific user to be prevented from taking part in editing.
Further, although not described above, an Undo function may be provided in an editing process of a written sentence such as addition and correction in the dictation mode process. This enables the editing process, such as adding, clearing, and correcting, to be performed in an efficient manner.
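A minimal sketch of such an Undo function, keeping a stack of previous states of the written sentence, is shown below; the class and method names are illustrative assumptions.

```python
class WrittenSentenceEditor:
    """Minimal undo-capable editor for the written sentence
    (an illustrative sketch, not the actual editing process)."""
    def __init__(self):
        self.text = ""
        self.history = []          # stack of previous states

    def edit(self, new_text):
        """Apply an edit (addition, clearing, or correction)."""
        self.history.append(self.text)
        self.text = new_text

    def undo(self):
        """Restore the state before the most recent edit."""
        if self.history:
            self.text = self.history.pop()

editor = WrittenSentenceEditor()
editor.edit("buy a softening agent")
editor.edit("buy a softening agent on your way home")
editor.undo()
# editor.text == "buy a softening agent"
```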
Further, although not described above, in the dictation mode process, it is conceivable that utterances made by a specific user, for example, a child user, are ignored. This enables avoidance of additions to the written sentence caused by unnecessary utterances such as mischief.
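Ignoring utterances of a specific user can be sketched as a simple filter; the user identifiers below are illustrative assumptions.

```python
def should_write(user, ignored_users):
    """Return True if the user's utterance should be written;
    utterances from users on the ignore list (for example,
    child users) are dropped (illustrative sketch)."""
    return user not in ignored_users

ignored = {"child_user"}
writes = should_write("adult_user", ignored)    # written
blocked = should_write("child_user", ignored)   # ignored
```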
Further, in the above-described embodiment, a user who has started dictation has an initiative. However, it is conceivable that such an initiative can be passed on to another user while the dictation is being performed. This enables the other user to whom the initiative has been passed to end the dictation, even in a case where the user who started the dictation leaves for some reason while conducting the dictation.
Further, in the above-described embodiment, a user who has started dictation has an initiative. However, instead of deciding the user having the initiative at the time of starting the dictation, the user having the initiative may be decided when necessary.
Further, although not described above, depending on the application, which utterance has been made by which user may be stored. This enables the user who has made an utterance to be identifiable, by coloring the written sentence corresponding to the utterance of each user, or displaying an icon, a symbol, a name, or the like.
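Making the speaker of each written sentence identifiable can be sketched as attaching per-user display attributes; the color mapping below is an illustrative assumption.

```python
# Illustrative per-user display attributes; coloring could equally be
# an icon, a symbol, or a name, as described above.
USER_COLORS = {"user1": "red", "user2": "blue"}

def render_sentence(user, text):
    """Attach a per-user color so the user who made each utterance
    is identifiable in the displayed written sentence (sketch)."""
    return {"text": text, "user": user,
            "color": USER_COLORS.get(user, "black")}

rendered = render_sentence("user2", "as budget, 350,000 yen in total")
# rendered["color"] == "blue"
```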
Further, although not described above, in a case where a written sentence is cleared, filtering may be performed with a username, for example, “clear the utterances of oo,” or the like. This enables saving of the time and effort of designating the sentences to be cleared one at a time.
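Such username-based clearing can be sketched as a filter over sentences stored together with their speakers; the data shape here is an illustrative assumption.

```python
def clear_by_user(written, username):
    """Remove only the sentences attributed to the given user,
    e.g. in response to a "clear the utterances of ..." request
    (illustrative sketch)."""
    return [(user, text) for user, text in written if user != username]

written = [("user1", "activity plan"),
           ("user2", "as budget, 350,000 yen in total")]
remaining = clear_by_user(written, "user2")
# remaining == [("user1", "activity plan")]
```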
Further, in the above-described embodiment, a plurality of users who conducts the dictation includes humans. However, the plurality of users may partially include an AI (artificial intelligence) device.
Further, although not described above, in a case where the written sentence of the utterance in the dictation is cleared, such a cleared part may remain for a certain period of time, for example, in a translucent state, or the like. This enables confirmation of cleared contents and enables a mistakenly cleared content to be returned to the original one.
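Keeping a cleared part restorable for a certain period can be sketched as a soft delete with a retention window; the retention value and field names are illustrative assumptions.

```python
import time

def clear_sentence(entry, now=None):
    """Mark a sentence as cleared instead of deleting it; it remains
    (e.g. displayed in a translucent state) until a retention
    period elapses (illustrative sketch)."""
    entry["cleared_at"] = now if now is not None else time.time()
    return entry

def restore_if_possible(entry, retention=30.0, now=None):
    """Undo a mistaken clear while the translucent copy remains."""
    now = now if now is not None else time.time()
    cleared_at = entry.get("cleared_at")
    if cleared_at is not None and now - cleared_at <= retention:
        entry["cleared_at"] = None
        return True
    return False

entry = clear_sentence({"text": "buy a softening agent"}, now=0.0)
restored = restore_if_possible(entry, retention=30.0, now=10.0)
# restored is True: the mistakenly cleared sentence is back
```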
Further, although not described above, in inputs made by an utterance, a predetermined NG word may be filtered not to be written. In this case, it is conceivable to set the NG word for each user.
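Per-user NG word filtering can be sketched as follows; the word lists and masking style are illustrative assumptions.

```python
def filter_ng_words(user, utterance, ng_words_by_user):
    """Mask predetermined NG words before writing; the NG word
    list can be set for each user (illustrative sketch)."""
    for word in ng_words_by_user.get(user, []):
        utterance = utterance.replace(word, "***")
    return utterance

ng = {"child_user": ["badword"]}
out = filter_ng_words("child_user", "badword hello", ng)
# out == "*** hello"
```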
Further, although not described above, a written sentence made by an utterance of a user having an initiative may be displayed in an emphasized manner. This enables easy recognition of the written sentence of the utterance of the user having the initiative and enables understanding of who has the initiative.
Further, although not described above, in a case where an utterance of a user having an initiative overlaps with an utterance of another user, a written sentence relating to the utterance of the user having the initiative may be displayed first, and then a written sentence relating to the utterance of another user may be displayed.
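Displaying the initiative holder's written sentence first when utterances overlap can be sketched as a stable sort; the data shape is an illustrative assumption.

```python
def order_written_sentences(utterances, initiative_user):
    """When utterances overlap, place the written sentence of the
    user having the initiative first, then those of other users.
    sorted() is stable, so ties keep their arrival order (sketch)."""
    return sorted(utterances,
                  key=lambda u: 0 if u["user"] == initiative_user else 1)

ordered = order_written_sentences(
    [{"user": "user2", "text": "b"}, {"user": "user1", "text": "a"}],
    initiative_user="user1")
# ordered[0]["user"] == "user1"
```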
Further, although not described above, a written sentence relating to an utterance of another user may be merged onto the display position of a written sentence relating to an utterance of the user having the initiative. This enables easy understanding of which user has the initiative.
Next, in response to an utterance input of user 2, “as budget, 350,000 yen in total is calculated,” a written sentence corresponding to the utterance input is added. In this case, the sentence “as budget, 350,000 yen in total is calculated” is merged, on the display, onto the sentence “activity plan: this fiscal year's main activities are participation in cultural festival and citizens' festival” in an animation-like manner.
In addition, the present technology can also adopt the following configurations.
(1) An information processing apparatus including:
a display control unit configured to control displaying of a written sentence of an utterance in dictation;
a giving unit configured to give an initiative to a predetermined user; and
an edit control unit configured to control such that the user to whom the initiative has been given is able to issue an instruction relating to the written sentence of the utterance.
(2) The information processing apparatus described in the above (1), in which the display control unit displays the written sentence of the utterance in a state in which a user who has made the utterance is identifiable.
(3) The information processing apparatus described in the above (1) or (2), in which the display control unit displays the written sentence of the utterance in an undecided state until a decision is made.
(4) The information processing apparatus described in the above (3), in which the written sentence of the utterance is decided by a timeout or a decision process.
(5) The information processing apparatus described in one of the above (1) to (4), in which the giving unit gives the initiative to a user who has started the dictation.
(6) The information processing apparatus described in the above (5), in which the giving unit does not give the initiative in a case where the user who has started the dictation has a predetermined attribute.
(7) The information processing apparatus described in the above (6), in which the giving unit does not give the initiative in a case where the user who has started the dictation is equal to or younger than a predetermined age.
(8) The information processing apparatus described in the above (7), in which the giving unit gives the initiative to the user depending on a receiver to whom the written sentence of the utterance is sent, even in the case where the user who has started the dictation is equal to or younger than the predetermined age.
(9) An information processing method including:
a procedure of controlling displaying of a written sentence of an utterance in dictation;
a procedure of giving an initiative to a predetermined user; and
a procedure of controlling such that the user to whom the initiative is given is able to issue an instruction relating to the written sentence of the utterance.
100 . . . Information processing apparatus
101 . . . Control unit
102 . . . Input and output interface
103 . . . Operation input device
104 . . . Camera
105 . . . Microphone
106 . . . Speaker
107 . . . Display
108 . . . User recognition unit
109 . . . Voice recognition unit
110 . . . Communication interface
111 . . . Semantic analysis guide database
112 . . . Dictation guide database
113 . . . Bus
Number | Date | Country | Kind |
---|---|---|---|
2018-150961 | Aug 2018 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/029716 | 7/29/2019 | WO | 00 |