The present invention relates to a mechanism for providing a user with information in accordance with data that the user inputs to a terminal apparatus.
There is a mechanism for enabling a terminal apparatus to execute processing conforming to an instruction that is issued by its user by speech (hereinafter, this mechanism is referred to as “speech agent system”).
For example, Non-Patent Literature 1 introduces examples of tasks carried out by a speech agent system. One of them is a task to cause a smartphone to display information of taxi companies that can dispatch a taxi to the current location of a user in response to a speech made by the user into the smartphone: “Search for a taxi around here!”
Non-Patent Literature 1: NTT DOCOMO, Inc. What you can do with Shabette Concier. Retrieved Oct. 18, 2013, from http://www.nttdocomo.co.jp/service/information/shabette_concier/feature/index.html
A speech agent system enables a user to instruct a terminal apparatus to execute desired processing by speech. Generally, it takes less effort to issue an instruction by speech (hereinafter referred to as “speech instruction”) than to issue an instruction by character input and the like. However, a user who is unfamiliar with a speech instruction may not know what kind of speech he/she should make to cause a terminal apparatus to accurately execute processing conforming to an instruction. Even a user who is familiar with a speech instruction may not instantly come up with the content of a desirable speech directed to an instruction for processing that he/she desires.
In view of the foregoing issues, an object of the present invention is to alleviate difficulty experienced by a user when issuing a speech instruction.
To solve the problems, the present invention provides a terminal apparatus including: an attribute acquisition unit that acquires attribute data indicating an attribute of a user or an attribute of an environment surrounding the user; a sentence acquisition unit that acquires prompt sentence data indicating a sentence that prompts the user to issue a speech instruction, the prompt sentence data corresponding to the attribute indicated by the attribute data; a display control unit that instructs a display apparatus to display the sentence indicated by the prompt sentence data; a speech data acquisition unit that acquires speech data indicating a speech made by the user in response to the display apparatus displaying the sentence indicated by the prompt sentence data; a processing ID acquisition unit that acquires processing identification data identifying processing corresponding to an instruction indicated by the speech data; and a processing execution unit that executes the processing identified by the processing identification data.
The above terminal apparatus may further include a transmission unit that transmits the attribute data and the speech data to a server apparatus, and may be configured as follows: the sentence acquisition unit receives the prompt sentence data that is transmitted from the server apparatus in reply to transmission of the attribute data by the transmission unit; and the processing ID acquisition unit receives the processing identification data that is transmitted from the server apparatus in reply to transmission of the speech data by the transmission unit.
The above terminal apparatus may be configured as follows: the transmission unit transmits, to the server apparatus, prompt sentence identification data identifying the prompt sentence data indicating the sentence that is displayed by the display apparatus when the speech data is acquired by the speech data acquisition unit; and the processing ID acquisition unit receives the processing identification data that is transmitted from the server apparatus in reply to transmission of the speech data and the prompt sentence identification data, the processing identification data identifying the processing corresponding to a combination of the instruction indicated by the speech data and the sentence identified by the prompt sentence identification data.
The above terminal apparatus may be configured as follows: the attribute acquisition unit acquires the speech data indicating the speech made by the user as the attribute data.
The present invention also provides a server apparatus including: a reception unit that receives attribute data from a terminal apparatus, the attribute data indicating an attribute of a user of the terminal apparatus or an attribute of an environment surrounding the user; a sentence acquisition unit that acquires prompt sentence data indicating a sentence that prompts the user to issue a speech instruction, the prompt sentence data corresponding to the attribute indicated by the attribute data; a transmission unit that transmits the prompt sentence data to the terminal apparatus, wherein the reception unit receives speech data that is transmitted from the terminal apparatus after transmission of the prompt sentence data by the transmission unit; a speech recognition unit that recognizes an instruction indicated by the speech data; and a processing ID generation unit that generates processing identification data identifying processing corresponding to the instruction. The transmission unit transmits the processing identification data to the terminal apparatus in reply to the speech data received by the reception unit.
The above server apparatus may further include a storage control unit, and may be configured as follows: the reception unit receives the attribute data and the speech data from each of a plurality of terminal apparatuses; the storage control unit causes a storage apparatus to store the attribute data received by the reception unit from each terminal apparatus and instruction sentence data in association with each other, the instruction sentence data indicating a sentence of the instruction that is indicated by the speech data received by the reception unit from the terminal apparatus and that is recognized by the speech recognition unit; and the sentence acquisition unit generates prompt sentence data to be transmitted from the transmission unit to one of the plurality of terminal apparatuses using instruction sentence data that is stored in the storage apparatus in association with attribute data having a predetermined relationship with attribute data received by the reception unit from the one of the plurality of terminal apparatuses.
The above server apparatus may be configured as follows: the storage control unit causes the storage apparatus to store instruction sentence data and time data in association with each other, the time data indicating time of issuance of an instruction indicated by the instruction sentence data; and the sentence acquisition unit specifies instruction sentence data indicating an instruction that is used increasingly frequently as time elapses based on a plurality of pieces of instruction sentence data stored in the storage apparatus and time data stored in association with the plurality of pieces of instruction sentence data, and generates prompt sentence data to be transmitted from the transmission unit using the specified instruction sentence data.
The above server apparatus may be configured as follows: the storage control unit causes the storage apparatus to store instruction sentence data and terminal identification data in association with each other, the instruction sentence data being generated by the speech recognition unit from speech data, and the terminal identification data identifying a terminal apparatus that has transmitted the speech data; and the sentence acquisition unit generates, as prompt sentence data to be transmitted from the transmission unit to one of the plurality of terminal apparatuses, prompt sentence data prompting an instruction that does not bear a predetermined similarity to an instruction indicated by instruction sentence data that is stored in the storage apparatus in association with terminal identification data identifying the one of the plurality of terminal apparatuses.
The above server apparatus may further include a relevance data acquisition unit that acquires inter-processing relevance data indicating a magnitude of relevance between two arbitrary items of processing included among a plurality of items of processing, and may be configured as follows: the reception unit receives the speech data transmitted from the terminal apparatus as the attribute data; the speech recognition unit recognizes an instruction indicated by the attribute data; the processing ID generation unit generates processing identification data identifying an item of processing corresponding to the instruction indicated by the attribute data; and the sentence acquisition unit selects one item of processing from among the plurality of items of processing based on a magnitude of relevance, indicated by the inter-processing relevance data, to the item of processing corresponding to the instruction indicated by the attribute data, and acquires prompt sentence data indicating a sentence prompting an instruction for the selected one item of processing as prompt sentence data corresponding to the attribute indicated by the attribute data.
The present invention also provides a program for causing a computer to execute: a process of acquiring attribute data indicating an attribute of a user or an attribute of an environment surrounding the user; a process of acquiring prompt sentence data indicating a sentence that prompts the user to issue a speech instruction, the prompt sentence data corresponding to the attribute indicated by the attribute data; a process of instructing a display apparatus to display the sentence indicated by the prompt sentence data; a process of acquiring speech data indicating a speech made by the user in response to the display apparatus displaying the sentence indicated by the prompt sentence data; a process of acquiring processing identification data identifying processing corresponding to an instruction indicated by the speech data; and a process of executing the processing identified by the processing identification data.
The present invention prompts a user to issue a speech instruction corresponding to an attribute of the user or an attribute of the environment surrounding the user. The user can think about the content of a speech with reference to the content of the prompt.
This alleviates difficulty experienced by the user when issuing a speech instruction.
The following describes speech agent system 1 according to an embodiment of the present invention.
Terminal apparatus 11 includes the same hardware components as, for example, an ordinary slate personal computer equipped with a touch display. Alternatively, terminal apparatus 11 may be any of other types of computers.
Memory 101 is a storage apparatus including a volatile semiconductor memory, a non-volatile semiconductor memory, and the like. It stores an operating system (OS), application programs, and various types of data, such as user data, and is used as a working area for data processes executed by processor 102. Processor 102 is a processing apparatus, such as a central processing unit (CPU) and a graphics processing unit (GPU). Communication IF 103 is an interface that performs various types of wireless data communication with server apparatus 12 via communication network 19.
Touch display 104 includes display 1041 and touchscreen 1042. Display 1041 is a display apparatus, such as a liquid crystal display, and displays characters, graphics, photographs, and the like. Touchscreen 1042 is, for example, a capacitive touchscreen. It is an input device that, when a finger or a similar pointer has touched or become adjacent to the input device, accepts a user operation by specifying the position of the touch or adjacency. In the following description, the touch or adjacency is simply referred to as “touch” for the sake of convenience.
Display 1041 and touchscreen 1042 are stacked. When the user touches an image displayed on display 1041 with the pointer, the pointer actually touches touchscreen 1042, and the position of the touch is specified. In conformity to the OS and application programs, processor 102 specifies the content of an operation intended by the user's touch with the pointer based on the position specified by touchscreen 1042.
Microphone 105 is a sound pickup apparatus that picks up sound and generates sound data. In speech agent system 1, microphone 105 picks up the user's speech and generates speech data. Clock 106 is an apparatus that continuously measures a period elapsed since reference time, and generates time data indicating the current time. GPS unit 107 is an apparatus that receives signals from a plurality of satellites, specifies the current position of terminal apparatus 11 (that is to say, the current position of the user) based on the received signals, and generates position data indicating the specified position.
In terminal apparatus 11 including the foregoing hardware components, processor 102 executes processes conforming to the programs stored in memory 101. As a result, terminal apparatus 11 acts as an apparatus including the following functional components.
Terminal apparatus 11 includes attribute acquisition unit 111, transmission unit 112, sentence acquisition unit 113, display control unit 114, speech data acquisition unit 115, processing ID acquisition unit 116, and processing execution unit 117 as functional components.
Attribute acquisition unit 111 acquires attribute data indicating the attributes of the user of terminal apparatus 11 or the attributes of the environment surrounding the user. In the present embodiment, data indicating the gender, age, and current position of the user and the current time is used as the attribute data by way of example. The gender and age of the user are examples of the attributes of the user, whereas the current position of the user and the current time are examples of the attributes of the environment surrounding the user. Data indicating the gender and age of the user is input to terminal apparatus 11 by a user operation using, for example, touchscreen 1042, and attribute acquisition unit 111 acquires the data thus input by the user. On the other hand, attribute acquisition unit 111 acquires, for example, position data generated by GPS unit 107 as data indicating the current position of the user. In the present embodiment, data indicating the current time (time data) is generated by server apparatus 12 for use, and hence attribute acquisition unit 111 need not acquire time data.
Transmission unit 112 transmits the attribute data acquired by attribute acquisition unit 111 to server apparatus 12. Transmission unit 112 also transmits speech data acquired by speech data acquisition unit 115 to server apparatus 12.
Sentence acquisition unit 113 acquires prompt sentence data, which indicates a sentence prompting the user of terminal apparatus 11 to issue a speech instruction, by receiving the prompt sentence data from server apparatus 12. Display control unit 114 instructs display 1041 to display the sentence indicated by the prompt sentence data acquired by sentence acquisition unit 113.
Speech data acquisition unit 115 acquires, from microphone 105, speech data indicating a speech that has been made by the user and picked up by microphone 105. Transmission unit 112 described above transmits the speech data acquired by speech data acquisition unit 115 to server apparatus 12. Processing ID acquisition unit 116 acquires processing identification data that is transmitted from server apparatus 12 in reply to the speech data transmitted from transmission unit 112. The processing identification data acquired by processing ID acquisition unit 116 identifies processing corresponding to an instruction indicated by the speech data transmitted from transmission unit 112 to server apparatus 12. In the present embodiment, the processing identification data identifies processing using a combination of a function ID identifying a function and a parameter specifying specific processing of the function by way of example.
Processing execution unit 117 executes the processing identified by the processing identification data acquired by processing ID acquisition unit 116.
Components of server apparatus 12 will now be described. Server apparatus 12 has the same hardware components as an ordinary computer that can perform data communication with an external apparatus via communication network 19.
Memory 201 is a storage apparatus including a volatile semiconductor memory, a non-volatile semiconductor memory, and the like. It stores an OS, application programs, and various types of data, such as user data, and is used as a working area for data processes by processor 202. Processor 202 is a processing apparatus, such as a CPU and a GPU. Communication IF 203 is an interface that performs various types of data communication with other apparatuses via communication network 19.
Server apparatus 12 acts as an apparatus including the following functional components.
Reception unit 121 receives attribute data transmitted from each of terminal apparatuses 11. Reception unit 121 also receives speech data transmitted from each of terminal apparatuses 11.
Speech recognition unit 122 recognizes an instruction indicated by the speech data received by reception unit 121 through a known speech recognition process, and generates instruction sentence data indicating a sentence of the recognized instruction. Processing ID generation unit 123 generates processing identification data that identifies processing corresponding to the instruction sentence data generated by speech recognition unit 122.
Transmission unit 124 transmits the processing identification data generated by processing ID generation unit 123 to terminal apparatus 11 that transmitted the speech data used to generate the processing identification data. Transmission unit 124 also transmits prompt sentence data acquired by sentence acquisition unit 127 to terminal apparatus 11 that transmitted attribute data used to acquire the prompt sentence data.
Storage control unit 125 causes memory 201 to store the following items in association with one another: attribute data received by reception unit 121 from one of terminal apparatuses 11, instruction sentence data that has been generated by speech recognition unit 122 using speech data received by reception unit 121 from the same terminal apparatus 11, and time data (generated by timer unit 126) indicating the time of issuance of an instruction indicated by the instruction sentence data.
Timer unit 126 generates time data indicating the current time. Sentence acquisition unit 127 acquires prompt sentence data corresponding to the attributes indicated by attribute data received by reception unit 121 from one of terminal apparatuses 11. It generates the prompt sentence data using the pieces of attribute data that have been received from various terminal apparatuses 11 and stored in memory 201, and the pieces of instruction sentence data stored in memory 201 in association with those pieces of attribute data.
A structure of data stored in terminal apparatus 11 and server apparatus 12 will now be described. Memory 101 of terminal apparatus 11 stores terminal identification data that identifies terminal apparatus 11, and data indicating the gender and age of the user. The terminal identification data is acquired from server apparatus 12 when, for example, terminal apparatus 11 activates a program according to the present embodiment for the first time. The data indicating the gender and age of the user is, for example, input by the user with the use of touchscreen 1042.
Memory 201 of server apparatus 12 stores an attribute database, a synonym database, a relevance database, and a log database. The attribute database manages attribute data of the user of terminal apparatus 11. The synonym database manages synonym data indicating a correspondence relationship between a basic keyword (base keyword) and a keyword that is synonymous with the base keyword (synonymous keyword). The relevance database manages relevance data indicating the magnitudes of relevance between various keywords and various functions. The log database manages log data related to a speech instruction issued by the user of terminal apparatus 11.
In [keyword], text data indicating a keyword (one of the base keywords stored in the synonym database) is stored. Text data indicating a type(s) of a keyword is stored in [type]. For example, the type of the keyword “Shinjuku Station” is “(place)”.
A function ID that identifies a function is stored in [function ID]. Text data indicating a name of a function is stored in [function name]. Hereinafter, an individual function is referred to as the function “(function name)”.
Text data indicating a type of a parameter used for a function is stored in [parameter]. For example, the function “map display” uses a parameter of the type “(place)”.
A score, i.e., numeric data indicating a magnitude of relevance between a keyword and a function, is stored in [score]. Note that each record in the relevance database can store a plurality of sets of data in [function ID], [function name], [parameter], and [score].
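By way of a non-limiting illustration, the synonym database and the relevance database described above can be sketched as simple in-memory structures. The following Python sketch is not the actual schema: apart from the function “map display” (function ID “F2537”) and the type “(place)” taken from the description, all field names and values are assumptions.

```python
# A minimal sketch (not the actual schema) of the two databases.
# Score values and the request-phrase entry are assumptions.

# Synonym database: synonymous keyword -> base keyword.
SYNONYM_DB = {
    "Tell me the location of": "Please tell me the location of",
}

# Relevance database: one record per base keyword, holding the keyword
# type and one or more (function ID, name, parameter type, score) sets.
RELEVANCE_DB = {
    "Shinjuku Station": {
        "type": "(place)",
        "functions": [
            {"function_id": "F2537", "function_name": "map display",
             "parameter": "(place)", "score": 80},
        ],
    },
    "Please tell me the location of": {
        "type": None,  # assumed: request phrases carry no type
        "functions": [
            {"function_id": "F2537", "function_name": "map display",
             "parameter": "(place)", "score": 60},
        ],
    },
}
```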
The following describes tasks carried out by speech agent system 1 with the foregoing components.
First, when the user performs a predetermined operation on terminal apparatus 11, display control unit 114 of terminal apparatus 11 causes display 1041 to display a dialogue screen for the wait state (step S101). Transmission unit 112 then transmits the terminal identification data stored in memory 101 and the position data generated by GPS unit 107 to server apparatus 12 (step S102).
The communication connection established between terminal apparatus 11 and server apparatus 12 is maintained during display of the dialogue screen on display 1041. Once server apparatus 12 identifies terminal apparatus 11 upon establishment of the communication connection, it can thereafter keep identifying terminal apparatus 11 via the communication connection until the communication connection is terminated. Therefore, after terminal apparatus 11 transmits the terminal identification data to server apparatus 12 in step S102, it need not re-transmit the terminal identification data to server apparatus 12 in the processes described below.
Reception unit 121 of server apparatus 12 receives the terminal identification data and the position data transmitted from terminal apparatus 11 (step S103). Storage control unit 125 reads out the attribute database and updates the record associated with the received terminal identification data using the received position data (step S104).
After step S104, the processes of steps S105 to S108 are continuously executed. As a part of data used in these processes is generated in the processes of steps S111 to S121 described below, steps S111 to S121 will now be described first.
With the start of the wait state, speech data acquisition unit 115 of terminal apparatus 11 waits for output of speech data indicating the user's speech from microphone 105, in parallel with the process of step S102. If the user issues a speech instruction (“Yes” of step S111), microphone 105 outputs the speech data, and speech data acquisition unit 115 acquires the speech data (step S112). Transmission unit 112 transmits the speech data acquired by speech data acquisition unit 115 to server apparatus 12 (step S113).
When reception unit 121 of server apparatus 12 receives the speech data transmitted from terminal apparatus 11 (step S114), speech recognition unit 122 recognizes the content of the speech indicated by the speech data, and generates spoken sentence data indicating a sentence of the recognized content (instruction sentence data indicating an instruction sentence prior to synonym conversion) (step S115). For instance, if the user issues a speech instruction “Tell me the location of Shinjuku Station,” speech recognition unit 122 generates spoken sentence data indicating the sentence “Tell me the location of Shinjuku Station.”
Subsequently, processing ID generation unit 123 converts a keyword (synonymous keyword) contained in the sentence indicated by the spoken sentence data generated by speech recognition unit 122 into a base keyword in conformity to synonym data stored in the synonym database, thereby generating instruction sentence data (step S116).
Subsequently, processing ID generation unit 123 specifies processing corresponding to the instruction sentence indicated by the instruction sentence data generated in step S116, and generates processing identification data that identifies the specified processing (step S117). Specifically, processing ID generation unit 123 first extracts keywords contained in the instruction sentence indicated by the instruction sentence data. Subsequently, for each of the extracted keywords, processing ID generation unit 123 extracts a record that stores the keyword in [keyword] from the relevance database, and specifies a score for each function registered in the extracted records based on the data stored in [score].
For example, assume that instruction sentence data indicating a sentence “Please tell me the location of Shinjuku Station” is generated in step S116. In this case, processing ID generation unit 123 extracts “Please tell me the location of” and “Shinjuku Station” as keywords. Subsequently, processing ID generation unit 123 extracts, from the relevance database, a record that stores “Shinjuku Station” in [keyword] (the fourth record in the relevance database), and specifies the score stored in [score] for each function registered in the record; the same specification is performed for the record corresponding to “Please tell me the location of.”
Processing ID generation unit 123 specifies a function for which the highest score has been specified in the foregoing manner as a function corresponding to the instruction sentence. Subsequently, processing ID generation unit 123 extracts, from among the keywords extracted from the instruction sentence data, a keyword with a type indicated by data stored in [parameter] of relevance data associated with the specified function. Then, processing ID generation unit 123 generates processing identification data that includes a function ID identifying the function specified in the foregoing manner, and that includes the extracted keyword (if any) as a parameter. For example, processing ID generation unit 123 generates processing identification data including the function ID “F2537” of the function “map display” and a parameter “Shinjuku Station” as the processing identification data associated with the instruction sentence “Please tell me the location of Shinjuku Station.”
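A minimal sketch of the synonym conversion (step S116) and the score-based specification of a function and its parameters (step S117), building on the toy databases sketched above, might look as follows. The keyword extraction method and the summation of scores across keywords are assumptions, since the description leaves these details open.

```python
from collections import defaultdict

def to_instruction_sentence(spoken: str) -> str:
    """Step S116: replace synonymous keywords with base keywords."""
    for synonym, base in SYNONYM_DB.items():
        spoken = spoken.replace(synonym, base)
    return spoken

def generate_processing_id(instruction: str) -> dict | None:
    """Step S117: pick the highest-scoring function and attach, as
    parameters, keywords whose [type] matches its [parameter]."""
    # Keyword extraction (method left open by the embodiment): every
    # base keyword of the relevance database found in the sentence.
    keywords = [kw for kw in RELEVANCE_DB if kw in instruction]

    scores = defaultdict(int)   # per-function score aggregation
    param_type = {}             # function ID -> [parameter] type
    for kw in keywords:
        for f in RELEVANCE_DB[kw]["functions"]:
            scores[f["function_id"]] += f["score"]  # summation assumed
            param_type[f["function_id"]] = f["parameter"]

    if not scores:
        return None  # no known keyword found
    best = max(scores, key=scores.get)
    params = [kw for kw in keywords
              if RELEVANCE_DB[kw]["type"] == param_type[best]]
    return {"function_id": best, "parameters": params}

# Example: yields {'function_id': 'F2537', 'parameters': ['Shinjuku Station']}
print(generate_processing_id(
    to_instruction_sentence("Tell me the location of Shinjuku Station")))
```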
Transmission unit 124 transmits the processing identification data generated by processing ID generation unit 123, as a reply to the speech data received by reception unit 121 in step S114, to terminal apparatus 11 that transmitted the speech data (step S118). Processing ID acquisition unit 116 of terminal apparatus 11 receives the processing identification data transmitted from server apparatus 12 (step S119). Processing execution unit 117 executes processing identified by the processing identification data received by processing ID acquisition unit 116 (step S120). As a result, the processing execution screen exemplarily shown in the drawings is displayed on display 1041.
On the other hand, in parallel with the process of step S118, storage control unit 125 of server apparatus 12 updates the log database (step S121). Specifically, storage control unit 125 adds a record that stores, among other items, the instruction sentence data, the processing identification data, time data generated by timer unit 126 in [time], and the position data of terminal apparatus 11 in [position].
In this case, precisely speaking, time indicated by the time data stored in [time] is later than the time of issuance of the speech instruction by a period required to execute steps S112 to S117. However, as the difference therebetween is practically ignorable, this time data is used as data indicating the time of issuance of the speech instruction. Similarly, precisely speaking, the position indicated by the position data stored in [position] may be different from the position of the user at the time of issuance of the speech instruction. However, as the difference therebetween is also practically ignorable, this position data is used as data indicating the position of the user at the time of issuance of the speech instruction. In order to store data indicating more accurate time and position in the log database, for example, terminal apparatus 11 may include a timer unit and transmit, to server apparatus 12, time data indicating the time of acquisition of the speech data in step S112 as well as position data generated by GPS unit 107 at the time of acquisition of the speech data in step S112, and server apparatus 12 may store these pieces of data in the log database.
The process of step S121 is executed each time a speech instruction is issued by a user of any one of various terminal apparatuses 11. As a result, the log database accumulates log data related to the speech instructions issued on the various terminal apparatuses 11.
A description is now given of the processes of steps S105 to S108 that follow step S104. After storage control unit 125 has updated the attribute database in step S104, sentence acquisition unit 127 extracts, from the log database, records to be used in generating prompt sentence data (step S105).
Specifically, sentence acquisition unit 127 combines the log database and the attribute database, and extracts records that are stored in association with attribute data having a predetermined relationship with the attribute data received from terminal apparatus 11.
Subsequently, sentence acquisition unit 127 generates prompt sentence data using the records extracted in step S105 (step S106). Specifically, first, sentence acquisition unit 127 groups the records in such a manner that records in one group store the same data in [processing identification data]. Then, sentence acquisition unit 127 counts the number of records included in each group. Furthermore, for each group, sentence acquisition unit 127 specifies data that is largest in number among the entire data stored in [instruction sentence] of the records included in the group as representative instruction sentence data of the group. As a result, a data table exemplarily shown in the drawings (hereinafter referred to as the “instruction sentence list”) is generated.
Sentence acquisition unit 127 selects, from the instruction sentence list, a predetermined number of (e.g., 10) records in descending order of the number indicated by data stored in [number], and generates prompt sentence data indicating, for example, a sentence “An inquiry ‘XXX’ is often made recently” using the pieces of instruction sentence data stored in [instruction sentence] of the selected records (“XXX” denotes an instruction sentence indicated by each instruction sentence data). Note that the format of the sentence indicated by the prompt sentence data generated by sentence acquisition unit 127 is not limited to the foregoing example. For instance, instruction sentence data per se may be generated as the prompt sentence data. Alternatively, data indicating a sentence generated by extracting a part of a sentence indicated by instruction sentence data and embedding the extracted part in a model sentence may be generated as the prompt sentence data.
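The grouping and counting of steps S105 and S106 may be sketched as follows. The record field names (“processing_id”, “instruction”) and the treatment of processing identification data as a hashable key are assumptions for illustration.

```python
from collections import Counter, defaultdict

def build_prompt_sentences(log_records, limit=10):
    """Steps S105-S106 in miniature: group log records (already
    filtered by attribute similarity) by processing identification
    data, take the most frequent instruction sentence of each group as
    its representative, and turn the largest groups into prompts."""
    groups = defaultdict(list)
    for rec in log_records:
        # 'processing_id' is assumed hashable here, e.g. the function
        # ID joined with its parameters as one string.
        groups[rec["processing_id"]].append(rec["instruction"])

    # The instruction sentence list: (number of records, representative).
    table = sorted(((len(s), Counter(s).most_common(1)[0][0])
                    for s in groups.values()), reverse=True)

    return [f"An inquiry '{sentence}' is often made recently"
            for _, sentence in table[:limit]]
```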
Transmission unit 124 transmits the prompt sentence data generated by sentence acquisition unit 127, as a reply to the terminal identification data and the position data received by reception unit 121 in step S103, to terminal apparatus 11 that transmitted these pieces of data (step S107). Sentence acquisition unit 113 of terminal apparatus 11 receives the prompt sentence data transmitted from server apparatus 12 (step S108).
Consequently, terminal apparatus 11 acquires, from server apparatus 12, a predetermined number of (e.g., 10) pieces of prompt sentence data corresponding to the attributes of the user and the attributes of the environment surrounding the user. In this state, if a predetermined period (e.g., 10 seconds) has elapsed without issuance of a speech instruction since terminal apparatus 11 entered a state in which it waits for the speech instruction (“Yes” of step S131), display control unit 114 selects one piece of prompt sentence data, randomly for example, from among the predetermined number of pieces of prompt sentence data received in step S108, and causes display 1041 to display a dialogue screen presenting a sentence indicated by the selected piece of prompt sentence data (step S132). As a result, the dialogue screen exemplarily shown in the drawings is displayed.
Thereafter, if the user issues a speech instruction (“Yes” of step S111), the processes of steps S112 to S121 are repeated, and the processing execution screen exemplarily shown in the drawings is displayed on display 1041.
As described above, when the user intends to issue a speech instruction but does not instantly come up with the content of the speech instruction, speech agent system 1 presents the user with a prompt sentence corresponding to the attributes of the user and the attributes of the environment surrounding the user. This enables the user to issue a speech instruction with ease.
Speech agent system 1 described above is an embodiment of the present invention, and can be modified in various ways within the scope of the technical ideas of the present invention. Examples of such modifications will now be described. Below, the modification examples will be described mainly with a focus on the differences between the modification examples and the embodiment, and a description of components and tasks that are similar to those of the embodiment will be omitted as appropriate. Furthermore, among components of a speech agent system according to the following modification examples, components that are the same as or correspond to the components of speech agent system 1 according to the embodiment are given the same reference signs. Note that two or more of the embodiment and the following modification examples may be combined as appropriate.
(1) To generate prompt sentence data, sentence acquisition unit 127 may specify, from among the entire instruction sentence data stored in the log database, instruction sentence data indicating an instruction that is used increasingly frequently as time elapses, based on the time data stored in association with the instruction sentence data, and generate the prompt sentence data using the specified instruction sentence data.
This modification example has a high probability of presenting a user with a prompt sentence indicating an example speech instruction that is frequently used by many users recently. Therefore, this modification example is desirable for a user who wants to obtain hot-topic information that is attracting the attention of many other users at that point.
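How “used increasingly frequently as time elapses” is detected is not fixed by the description; one simple reading, sketched below in Python, compares each instruction’s count in the most recent window against the preceding window of equal length. The window length and the comparison rule are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def trending_instructions(log_records, now=None, window=timedelta(days=1)):
    """One possible test for modification (1): an instruction is
    'trending' if it occurs more often in the latest window than in
    the window before it. Each record is assumed to carry
    'instruction' and 'time' (a datetime) fields."""
    now = now or datetime.now()
    recent, earlier = Counter(), Counter()
    for rec in log_records:
        age = now - rec["time"]
        if age <= window:
            recent[rec["instruction"]] += 1
        elif age <= 2 * window:
            earlier[rec["instruction"]] += 1
    return [s for s, n in recent.items() if n > earlier.get(s, 0)]
```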
(2) Sentence acquisition unit 127 may generate prompt sentence data to be transmitted to, for example, terminal apparatus 11-X in such a manner that the generated prompt sentence data prompts an instruction that does not bear a predetermined similarity to an instruction indicated by instruction sentence data stored in the log database in association with the terminal identification data of terminal apparatus 11-X.
In a specific example of this modification example, sentence acquisition unit 127 extracts processing identification data stored in the log database in association with the terminal identification data of terminal apparatus 11-X, and generates prompt sentence data using log data other than log data that stores, in [processing identification data], processing identification data including the function ID included in the extracted processing identification data. In this case, a user of terminal apparatus 11-X is presented with a prompt sentence prompting a speech instruction for executing processing that uses a function different from any function that he/she has used via a speech instruction in the past. Thus, the user is given the opportunity to use a function that he/she has never used before.
In another specific example of this modification example, sentence acquisition unit 127 excludes, from the entire log data stored in the log database, log data that stores the terminal identification data of terminal apparatus 11-X in [terminal identification data], and generates prompt sentence data using only log data related to terminal apparatuses 11 different from terminal apparatus 11-X. Without this exclusion, when the number of pieces of log data stored in the log database is small, the generation of prompt sentence data to be transmitted to terminal apparatus 11-X has a high probability of using instruction sentence data included in log data related to speech instructions that were issued on terminal apparatus 11-X in the past. With the exclusion, this inconvenience does not occur.
The user is aware of speech instructions that he/she issued in the past, and generally there is no difficulty in issuing a similar speech instruction. Therefore, in light of the object of the present invention, it is not desirable to present the user with a prompt sentence prompting a speech instruction that is the same as or similar to a speech instruction that he/she issued in the past. The present modification example lowers the probability of the occurrence of such inconvenience.
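The first specific example above might be realized along the following lines; the flat record fields (“terminal_id”, “function_id”) are assumptions, standing in for the log database’s [terminal identification data] and the function ID within [processing identification data].

```python
def filter_for_terminal(log_records, terminal_id):
    """Modification (2), first variant in miniature: drop log records
    whose function ID the requesting terminal has already used, so
    the remaining records prompt only yet-unused functions."""
    used = {rec["function_id"] for rec in log_records
            if rec["terminal_id"] == terminal_id}
    return [rec for rec in log_records
            if rec["function_id"] not in used]
```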
(3) When a user issues a speech instruction during display of a prompt sentence on display 1041 in step S132, terminal apparatus 11 may transmit, to server apparatus 12, the prompt sentence data indicating the displayed prompt sentence together with the speech data. For example, assume that the user makes a speech “I am interested in that, too” while the prompt sentence “An inquiry ‘What is Akihabara Theater?’ is often made recently” is displayed.
In server apparatus 12, processing ID generation unit 123 specifies “that” included in the instruction sentence “I am interested in that, too” as “Akihabara Theater” included in the prompt sentence “An inquiry ‘What is Akihabara Theater?’ is often made recently.” Then, it generates a sentence “I am interested in Akihabara Theater, too” as well as processing identification data corresponding to this sentence (step S117).
In the foregoing example, prompt sentence data is transmitted from terminal apparatus 11 to server apparatus 12. In the present modification example, it is sufficient that data transmitted from terminal apparatus 11 to server apparatus 12 be data that identifies a prompt sentence (prompt sentence identification data), and prompt sentence data is an example of such data. Therefore, for example, server apparatus 12 may transmit individual prompt sentence data to terminal apparatus 11 with prompt sentence identification data attached thereto (step S107), and terminal apparatus 11 may transmit, to server apparatus 12, the prompt sentence identification data identifying the displayed prompt sentence together with the speech data.
In the present modification example, when the user wants to issue a speech instruction that is the same as or similar to an example speech instruction indicated by a prompt sentence, the user need not read out the example speech instruction, and can issue a speech instruction to terminal apparatus 11 in a more natural speaking style.
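The substitution described in this modification example might look as follows in miniature. The quoted-inquiry prompt format, the demonstrative to be replaced, and the keyword picker (reusing the toy RELEVANCE_DB from earlier) are all assumptions; actual resolution would be richer.

```python
import re

def resolve_against_prompt(instruction: str, prompt: str,
                           demonstrative: str = "that") -> str:
    """Modification (3) in miniature: replace a demonstrative in the
    user's speech with a keyword drawn from the displayed prompt."""
    quoted = re.search(r"'(.+?)'", prompt)  # the embedded inquiry
    if not quoted:
        return instruction
    # Assumed keyword picker: any relevance-database keyword occurring
    # in the quoted inquiry (e.g. "Akihabara Theater", if registered).
    keyword = next((kw for kw in RELEVANCE_DB
                    if kw in quoted.group(1)), None)
    if keyword is None:
        return instruction
    return re.sub(rf"\b{re.escape(demonstrative)}\b", keyword,
                  instruction, count=1)
```

With “Akihabara Theater” registered as a keyword, resolve_against_prompt("I am interested in that, too", "An inquiry 'What is Akihabara Theater?' is often made recently") would yield “I am interested in Akihabara Theater, too”.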
(4) In the embodiment, the gender and age of a user are used as the attributes of the user that are used to generate prompt sentence data. Furthermore, the current position of the user and the current time are used as the attributes of the environment surrounding the user that are used to generate prompt sentence data. In the present invention, the attributes of the user and the attributes of the environment surrounding the user, which are used to generate prompt sentence data, are not limited to the ones just mentioned above, and various types of other attributes can be used.
For example, the hobbies and occupation of the user, the number of times a speech instruction was issued in the past (indicating a skill in issuing a speech instruction), the frequency of issuance of a speech instruction in the past, and the like may constitute the attributes of the user that are used to generate prompt sentence data, either in addition to or in place of the gender and age.
Furthermore, for example, the current weather and air temperature of the area where the user is located, information indicating whether the user is at home, in an office, or in another place, information indicating whether today is a weekday or a day off, and the like may constitute the attributes of the environment surrounding the user that are used to generate prompt sentence data, either in addition to or in place of the current position and the current time.
(5) A speech instruction that was issued by a user in the past (e.g., most recently) serves as an attribute indicating the user's hobby or request. Therefore, a speech instruction that was issued by the user in the past may constitute the attributes of the user that are used to generate prompt sentence data. In this modification example, a speech agent system including terminal apparatus 31 and server apparatus 32 in place of terminal apparatus 11 and server apparatus 12 is used, and memory 201 of server apparatus 32 stores an inter-processing relevance database and a model sentence database. The inter-processing relevance database manages, for each arbitrary processing pair among a plurality of items of processing that can be executed by terminal apparatus 31, inter-processing relevance data indicating a magnitude of relevance between the processing pair. The model sentence database manages model sentence data indicating a model of a prompt sentence corresponding to each item of processing.
For example, each record of the inter-processing relevance database stores a function ID identifying one function, and stores, in [second function], one or more sets of [function ID] and [score], the function ID in each set identifying another function and the score indicating the magnitude of relevance between the two functions.
For example, each record of the model sentence database stores a function ID and model sentence data in association with each other, the model sentence data indicating a model sentence that contains a placeholder, such as “(place)”, to be substituted with a keyword.
In conformity to the inter-processing relevance data, sentence acquisition unit 127 of server apparatus 32 selects one item of processing from among a plurality of items of processing that can be executed by terminal apparatus 31 based on, for example, a magnitude of relevance to an item of processing identified by processing identification data that was generated by processing ID generation unit 123 most recently, and generates prompt sentence data prompting an instruction for the selected item of processing.
In the present modification example, first, a user of terminal apparatus 31 issues a speech instruction after the start of display of a dialogue screen (step S101). Then, terminal apparatus 31 and server apparatus 32 execute a sequence of processes (steps S112 to S120) corresponding to the speech instruction, and terminal apparatus 31 executes processing corresponding to the speech instruction.
Note that in the present modification example, after generating processing identification data that identifies processing corresponding to the speech instruction (step S117), server apparatus 32 generates prompt sentence data (step S301), and transmits the processing identification data (step S118) together with the prompt sentence data generated in step S301. Terminal apparatus 31 receives the prompt sentence data that has been transmitted from server apparatus 32 in response to transmission of the speech data (step S113), together with the processing identification data (step S119). The prompt sentence data received in step S119 is later used to display a prompt sentence (step S132).
The following describes an exemplary procedure in which server apparatus 32 generates the prompt sentence data in step S301. First, relevance data acquisition unit 321 searches the inter-processing relevance database for a record corresponding to the function ID included in the processing identification data that was generated by processing ID generation unit 123 in step S117, and passes the record to sentence acquisition unit 127.
Sentence acquisition unit 127 selects, from among data included in [second function] of the record received from relevance data acquisition unit 321, a function ID stored in [function ID] associated with [score] indicating the largest numeric value as a function ID that identifies a function of the greatest relevance to a function corresponding to a speech instruction that was issued by the user most recently. Subsequently, sentence acquisition unit 127 searches the model sentence database for model sentence data stored in association with the specified function ID, and reads out the model sentence data.
Subsequently, if keywords contained in the processing identification data that was generated by processing ID generation unit 123 in step S117 include a keyword whose type matches “(place)” or the like indicated by the model sentence data, sentence acquisition unit 127 substitutes a character string in “(place)” or the like with that keyword. Data indicating a post-substitution sentence serves as the prompt sentence data. This concludes the description of the exemplary procedure in which server apparatus 32 generates the prompt sentence data in step S301.
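The procedure of step S301 might be sketched as follows, building on the earlier toy structures. The inter-processing relevance entries, the function ID “F1024”, and the model sentence text are assumptions; only the “(place)” placeholder convention and the selection of the highest-scoring second function come from the description.

```python
# Toy stand-ins for the inter-processing relevance database and the
# model sentence database; contents are assumptions.
INTER_PROCESSING_DB = {
    "F2537": [("F1024", 90), ("F3301", 40)],  # first function ->
}                                             # (second function, score)
MODEL_SENTENCE_DB = {
    "F1024": "You can also ask for restaurants around (place)",
}

def prompt_after(processing_id: dict):
    """Step S301 in miniature: pick the function most relevant to the
    one just executed, fetch its model sentence, and substitute the
    placeholder with a keyword of a matching type, if any."""
    candidates = INTER_PROCESSING_DB.get(processing_id["function_id"], [])
    if not candidates:
        return None
    best, _ = max(candidates, key=lambda pair: pair[1])
    model = MODEL_SENTENCE_DB.get(best)
    if model is None:
        return None
    for kw in processing_id["parameters"]:
        placeholder = RELEVANCE_DB[kw]["type"]  # e.g. "(place)"
        if placeholder and placeholder in model:
            model = model.replace(placeholder, kw)
    return model

# Example: after a "map display" instruction for Shinjuku Station, this
# yields "You can also ask for restaurants around Shinjuku Station".
print(prompt_after({"function_id": "F2537",
                    "parameters": ["Shinjuku Station"]}))
```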
(6) In the embodiment, there is no particular restriction regarding the new/old states of the times of issuance of speech instructions indicated by log data used to generate prompt sentence data, and the entire log data stored in the log database is the target of extraction in step S105. Alternatively, for example, only log data whose time data stored in [time] indicates a time within a predetermined period in the past may be the target of extraction in step S105.
(7) In generating prompt sentence data, sentence acquisition unit 127 may exclude, from the entire log data stored in the log database, log data that stores processing identification data including a particular function ID in [processing identification data], and use only log data that does not include the particular function ID.
When issuing an instruction for execution of processing that uses a certain type of function (e.g., schedule management), a user may make heavy use of words that are specific to himself/herself (e.g., the names of his/her acquaintances). Therefore, information included in instruction sentence data related to that type of function may not be useful for other users, or may not be desirable in view of protection of personal information. With the present modification example, this inconvenience can be avoided.
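In miniature, this exclusion is a simple filter; the set of excluded function IDs and the record fields are assumptions.

```python
PRIVATE_FUNCTION_IDS = {"F9001"}  # e.g. an assumed schedule-management ID

def drop_private(log_records):
    """Modification (7) in miniature: drop log data whose processing
    identification data includes an excluded function ID."""
    return [rec for rec in log_records
            if rec["function_id"] not in PRIVATE_FUNCTION_IDS]
```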
(8) In the embodiment, to generate prompt sentence data (step S106), sentence acquisition unit 127 groups pieces of instruction sentence data associated with the same processing identification data into one group, and selects a predetermined number of groups in descending order of the number of pieces of log data included therein.
Grouping may be performed based on other criteria. For example, instead of categorizing instruction sentence data associated with the same processing identification data into one group, instruction sentence data indicating instruction sentences containing the same keyword may be categorized into one group. Furthermore, in selection of groups of instruction sentence data used to generate the prompt sentence data, the method of selecting a predetermined number of groups in descending order of the number of pieces of log data may be replaced by, for example, a method of excluding a predetermined number of groups in descending order of the number of pieces of log data (e.g., from the first to the fifth groups), and making a selection from the remaining groups in descending order of the number of pieces of log data (e.g., the sixth and subsequent groups). This prevents an inconvenient situation where only a speech instruction that is frequently issued by many users (e.g., “What is the weather like now?”) is repeatedly presented to a user as a prompt sentence. Moreover, instruction sentence data that was used by sentence acquisition unit 127 to generate prompt sentence data within a predetermined period in the past may not be used to generate new prompt sentence data (to be transmitted to the same terminal apparatus 11). This prevents an inconvenient situation where the same or similar prompt sentences are repeatedly presented to the same user.
(9) In the embodiment, sentence acquisition unit 127 of server apparatus 12 acquires prompt sentence data by generating the prompt sentence data using instruction sentence data included in log data extracted from the log database. Alternatively, sentence acquisition unit 127 need not generate the prompt sentence data; it may acquire prompt sentence data by reading it out from, for example, memory 201, or by receiving it from an external apparatus. For example, sentence acquisition unit 127 may retrieve instruction sentence data included in log data extracted from the log database based on similarity in attribute data, and acquire the retrieved instruction sentence data as-is as prompt sentence data.
(10) In the embodiment, server apparatus 12 executes a speech recognition process (step S115). Alternatively, terminal apparatus 11 may execute the speech recognition process, and transmit the resulting instruction sentence data to server apparatus 12 in place of the speech data.
(11) In the embodiment, a display apparatus, an input device, and a sound pickup apparatus, which are exemplarily described as display 1041, touchscreen 1042, and microphone 105, respectively, are all built in terminal apparatus 11. However, at least one of them may be configured as an external apparatus different from terminal apparatus 11. Furthermore, in the embodiment, various types of data used by terminal apparatus 11 are stored in memory 101 built in terminal apparatus 11. However, an entirety or a part of such data may be stored in an external storage apparatus. Similarly, an entirety or a part of various types of data used by server apparatus 12 may be stored in an external storage apparatus in place of memory 201.
(12) In the embodiment, terminal apparatus 11 transmits terminal identification data and position data to server apparatus 12 upon entering a state in which it waits for a speech instruction (steps S102 and S103). The timing of transmission of these pieces of data is, however, not limited to this example.
(13) In the embodiment or modification examples, terminal apparatus 11 may execute at least a part of processing prompted by a prompt sentence as background processing in a period that follows reception of prompt sentence data from server apparatus 12 by terminal apparatus 11 (a period that follows step S108 in the embodiment, or step S119 in the modification example (5)).
In a variation of the embodiment, in the sequence of processes shown in the drawings, terminal apparatus 11 executes, as background processing, at least a part of the processing prompted by the prompt sentence after receiving the prompt sentence data in step S108.
In a variation of the modification example (5), in the sequence of processes shown in the drawings, terminal apparatus 11 executes, as background processing, at least a part of the processing prompted by the prompt sentence after receiving the prompt sentence data in step S119.
Thereafter, if the user issues a speech instruction as prompted by the prompt sentence, terminal apparatus 11 displays the result of the processing that has already been executed in the background.
In this modification example, processing conforming to a speech instruction prompted by a prompt sentence is already executed before the speech instruction is issued. Therefore, when the user issues the speech instruction as prompted by the prompt sentence, the result of the processing is presented to the user at higher speed.
Note that the user does not necessarily issue the exact speech instruction prompted by the prompt sentence. If the speech instruction actually issued differs from the prompted one, the processes of steps S112 to S120 are executed as usual, without using the result of the background processing.
In a further variation of the modification examples, when a user issues a speech instruction in response to a prompt sentence, terminal apparatus 11 may display the result of processing conforming to the speech instruction without accessing server apparatus 12. In this case, in order to recognize the speech instruction that is issued by the user as prompted by the prompt sentence, terminal apparatus 11 includes components that are similar to speech recognition unit 122 and processing ID generation unit 123 included in server apparatus 12. When the user issues the speech instruction as prompted by the prompt sentence, terminal apparatus 11 recognizes the speech indicated by speech data and generates instruction sentence data as well as processing identification data (processes similar to the processes of steps S115 to S117), and displays the result of the processing identified by the processing identification data.
In this modification example, even if terminal apparatus 11 cannot communicate with server apparatus 12 when the user issues a speech instruction as prompted by a prompt sentence, terminal apparatus 11 presents the user with the result of processing conforming to the speech instruction.
(14) Terminal apparatus 11 may include a speaker, and a prompt sentence may be read out via the speaker. In this case, terminal apparatus 11 includes, as functional components, a speech data generation unit that generates speech data indicating a speech formed by reading out a sentence indicated by prompt sentence data, and a speech data output unit that outputs the speech data to the speaker. The speech data output unit outputs the speech data to the speaker at the same time as when display control unit 114 issues an instruction for displaying the prompt sentence. As a result, the prompt sentence is not only displayed, but also presented in the form of a speech, to the user.
(15) In the embodiment, terminal apparatus 11 and server apparatus 12 are realized by causing an ordinary computer to execute processes conforming to the program according to the present invention. Alternatively, one or both of terminal apparatus 11 and server apparatus 12 may be configured as a so-called dedicated apparatus.
The present invention is to be understood as a system, an example of which is the speech agent system, as a terminal apparatus and a server apparatus composing the system, as a method for processes executed by these apparatuses, as a program for causing a computer to function as these apparatuses, and as a computer-readable non-transitory recording medium having recorded therein this program. Note that the program according to the present invention may be provided to the computer via the recording medium or via a network, such as the Internet.