The present disclosure relates to an information processing device, an information processing method, and a program.
Detecting communication such as conversations occurring between users is useful, for example, to guess the relationship between the users. As technology therefor, for example, Patent Literature 1 proposes the technology of extracting a conversation group on the basis of the similarity between speech feature values such as frequency components extracted from the sound information transmitted from the terminal devices of respective users. This makes it possible to analyze conversations irregularly occurring between unspecified people.
Patent Literature 1: JP 2012-155374A
However, for a technology such as that described in Patent Literature 1, which detects a conversation on the basis of aggregated speech feature values such as frequency components, it is not necessarily easy to detect a short conversation between users or to detect in real time that a conversation has begun. Further, for example, in a case where there are a large number of users who can be candidates for conversation groups, or where users are in noisy environments, it can be difficult to robustly detect conversations.
The present disclosure then proposes a novel and improved information processing device, information processing method, and program that use feature values extracted from speech data and make it possible to more robustly detect conversations between users in a variety of phases.
According to the present disclosure, there is provided an information processing device including: a communication determination unit configured to determine, on the basis of a feature value extracted from speech data including at least a sound of speech of a user, whether communication occurs between users including the user, the feature value indicating an interaction between the users.
Further, according to the present disclosure, there is provided an information processing method including, by a processor: determining, on the basis of a feature value extracted from speech data including at least a sound of speech of a user, whether communication occurs between users including the user, the feature value indicating an interaction between the users.
Further, according to the present disclosure, there is provided a program for causing a computer to execute: a function of determining, on the basis of a feature value extracted from speech data including at least a sound of speech of a user, whether communication occurs between users including the user, the feature value indicating an interaction between the users.
As described above, according to the present disclosure, it is possible to use feature values extracted from speech data and more robustly detect conversations between users in a variety of phases.
Note that the effects described above are not necessarily limitative. With or in the place of the above effects, there may be achieved any one of the effects described in this specification or other effects that may be grasped from this specification.
Hereinafter, (a) preferred embodiment(s) of the present disclosure will be described in detail with reference to the appended drawings. In this specification and the appended drawings, structural elements that have substantially the same function and structure are denoted with the same reference numerals, and repeated explanation of these structural elements is omitted.
Hereinafter, the description will be made in the following order.
Next, as illustrated in (b), sensor data is acquired for both the target user and the candidate users. More specifically, the sensor data includes speech data acquired by microphones (sound sensors) and sensor data such as acceleration indicating motions of the users. As illustrated in (c), it is determined whether conversations occur between the target user and the candidate users, on the basis of feature values that are extracted from these kinds of sensor data and indicate interactions between the users. The target user can be categorized into a common conversation group along with the candidate users determined to have the conversations.
The wearable terminal 100 is worn by each user. The wearable terminal 100 includes, for example, a microphone (sound sensor), and acquires speech data including a sound of speech of the user. Further, the wearable terminal 100 may include other sensors such as an acceleration sensor and a gyro sensor, and acquire sensor data such as acceleration indicating a motion of the user. For example, the eyewear 100a can be capable of acquiring sensor data indicating the acceleration or the angular velocity corresponding to a nod of a user. Further, for example, the wristwear 100b can be capable of acquiring sensor data indicating the acceleration or the angular velocity corresponding to a movement of a user's hand, a biological indicator such as a pulse, or the like. Further, the wearable terminal 100 may use information generated through information processing according to the present embodiment described below for presentation to a user. More specifically, the wearable terminal 100 may include output devices such as a display and a speaker, and present information to a user from these output devices in the form of images and sounds. Additionally, although the wearable terminal 100 and the mobile terminal 200 are separately shown in the illustrated example, the function of the wearable terminal 100 may be included in the mobile terminal 200 in another example. In this case, the mobile terminal 200 acquires sensor data by using a microphone, an acceleration sensor, a gyro sensor, or the like, and presents information generated through information processing to a user.
The mobile terminal 200 is carried by each user. The mobile terminal 200 relays communication between the wearable terminal 100 and the server 300 in the illustrated example. More specifically, for example, the wearable terminal 100 communicates with the mobile terminal 200 through wireless communication such as Bluetooth (registered trademark), while the mobile terminal 200 communicates with the server 300 through network communication over the Internet or the like. Here, the mobile terminal 200 may process information received from the wearable terminal 100 as necessary, and then transmit the processed information to the server 300. For example, the mobile terminal 200 may analyze sensor data including speech data received from the wearable terminal 100, and extract an intermediate feature value. Alternatively, the mobile terminal 200 may transfer sensor data received from the wearable terminal 100 to the server 300 with no processing. In such a case, for example, the system 10 does not necessarily have to include the mobile terminal 200 as long as network communication is possible between the wearable terminal 100 and the server 300. Further, the mobile terminal 200 may use information generated through information processing according to the present embodiment described below for presentation to a user, instead of or in combination with the wearable terminal 100.
The server 300 is implemented by one or more information processing devices on a network, and provides a service to each user. For example, the server 300 extracts feature values from sensor data collected from the wearable terminal 100 of each user via the mobile terminal 200, and determines on the basis of the feature values whether a conversation occurs between users. The server 300 may generate information expressing the situation in which a conversation occurs between users, for example, on the basis of a result of the determination. This information may be used to display a screen for allowing, for example, a user (who can be a user not participating in a conversation or a user whose conversation is not a target of detection) to grasp, in real time, the situation in which a conversation occurs, or may be accumulated as a log. The information accumulated as a log may be, for example, referred to by the above-described user afterwards, or a graph structure that expresses the relationship between users may be specified on the basis of the information accumulated as a log. Additionally, these kinds of processing may be executed, for example, by a mobile terminal 200 serving as a host among the wearable terminals 100 and the mobile terminals 200 of the respective users. In this case, the system 10 does not necessarily have to include the server 300.
The sensing unit 11 includes a sensor such as a microphone (sound sensor) that acquires speech data as an input into the system 10, and an acceleration sensor or a gyro sensor that acquires sensor data such as acceleration indicating a motion of a user as an input into the system 10. Moreover, the sensing unit 11 includes a GNSS receiver or a wireless communication device for Wi-Fi or the like that acquires positional information of a user. The sensing unit 11 is implemented in the wearable terminal 100 such as the eyewear 100a and the wristwear 100b as illustrated, for example, in
The action detection unit 12 detects, from sensor data (that can include speech data) acquired by the sensing unit 11, an action of each user that provides sensor data. More specifically, for example, the action detection unit 12 detects a user's speech action from speech data. Here, the action detection unit 12 does not necessarily have to detect a feature of voice in speech or a speech content in the present embodiment. That is, the action detection unit 12 may simply detect whether a user speaks at a certain time. In a case where the action detection unit 12 can additionally detect a feature of voice, a speech content, or the like, the action detection unit 12 may also detect them. Further, for example, the action detection unit 12 detects an action such as a nod of a user or a movement (gesture) of a user's hand from sensor data of acceleration or angular velocity. Moreover, for example, the action detection unit 12 may also detect a psychological action of a user from sensor data of a biological indicator such as the pulse of the user.
The candidate selection unit 13 detects the positional relationship between users who each provide sensor data, from the sensor data acquired by the sensing unit 11, and selects users whose positional relationship satisfies a predetermined condition as candidates for users included in a conversation group. More specifically, the candidate selection unit 13 selects, as a candidate user, another user who is positioned near a target user, as indicated through GNSS positioning, Wi-Fi positioning, or the like. Additionally, positional information of each user does not necessarily have to be available for the candidate selection unit 13 to select a candidate user. For example, if the terminal devices (such as the wearable terminals 100 or the mobile terminals 200) of users can communicate directly through wireless communication such as Bluetooth (registered trademark), the candidate selection unit 13 may recognize that these users are in proximity to each other. Alternatively, the candidate selection unit 13 may select a candidate user on the basis of behavior information of each user. More specifically, for example, the candidate selection unit 13 may acquire a user's behavior recognition result (such as work or a meeting in the office) associated with a position, and select, as a candidate user, another user for whom a behavior recognition result common to that of the target user is acquired. Further, for example, the candidate selection unit 13 may acquire a user's schedule (such as work or a meeting in the office, similarly to a behavior recognition result) associated with a position, and select, as a candidate user, another user for whom a schedule common to that of the target user is acquired.
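As an illustrative sketch only, candidate selection of this kind might be implemented along the following lines in Python; the distance threshold, the haversine helper, and the data layout are assumptions made for the example and are not prescribed by the present embodiment.

    import math

    NEARBY_THRESHOLD_M = 30.0  # assumed radius for "positioned near" the target user

    def haversine_m(lat1, lon1, lat2, lon2):
        # Great-circle distance in meters between two GNSS fixes.
        r = 6371000.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def select_candidates(target, others):
        # Return users whose positional relationship (or shared behavior) satisfies the condition.
        candidates = []
        for user in others:
            near = haversine_m(target["lat"], target["lon"],
                               user["lat"], user["lon"]) <= NEARBY_THRESHOLD_M
            same_behavior = (target.get("behavior") is not None
                             and user.get("behavior") == target.get("behavior"))
            if near or same_behavior:
                candidates.append(user["id"])
        return candidates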
The feature value extraction unit 14 extracts a feature value indicating an interaction, that is, the mutual relationship of actions detected by the action detection unit 12, between each candidate user selected by the candidate selection unit 13 and the target user. Such a feature value is extracted on the basis of the temporal relationship between actions in the present embodiment.
For example, the feature value extraction unit 14 extracts, from speech data including a sound of speech of a user, a feature value indicating an interaction between users including the user. More specifically, the users include a first user and a second user, and the feature value extraction unit 14 extracts a feature value on the basis of the temporal relationship between a sound of speech of the first user (who can be a target user) and a sound of speech of the second user (who can be a candidate user). This feature value can indicate that the first user exchanges speech with the second user. For example, in a case where the first user converses with the second user, it is unlikely that speech sections of the first user overlap much with speech sections of the second user; rather, the speech sections of the respective users should occur alternately.
Additionally, speech data acquired by the sensing unit 11 may separately include first speech data including a sound of speech of the first user, and second speech data including a sound of speech of the second user in the above-described example. Alternatively, speech data acquired by the sensing unit 11 may include a single piece of speech data including a sound of speech of the first user and a sound of speech of the second user (a sound of speech of still another user may be included in the single piece of speech data or in different speech data). Additionally, in a case where a single piece of speech data includes sounds of speech of users, processing of separating the sounds of speech of the respective users can be executed, for example, on the basis of a speaker recognition result or the like.
Further, for example, the feature value extraction unit 14 may extract a feature value between the first user and the second user on the basis of the temporal relationship between a sound of speech of each user included in the speech data provided from the user, and a motion or a biological indicator indicated by the sensor data provided from each user in the same way. That is, for example, the feature value extraction unit 14 may extract a feature value on the basis of the relationship between a sound of speech of the first user included in the speech data provided from the first user, and a motion or a biological indicator indicated by the sensor data provided from the second user. Further, the feature value extraction unit 14 may not only extract a feature value between a target user and a candidate user, but also extract a feature value between candidate users.
The conversation determination unit 15 determines, on the basis of a feature value extracted by the feature value extraction unit 14, whether a conversation occurs between users. Since the candidate selection unit 13 is included in the present embodiment, the conversation determination unit 15 determines whether a conversation occurs between users selected, on the basis of the positional relationship, from among all the users who are processing targets. As already described for the candidate selection unit 13, users who are determination targets may also be selected on the basis of behavior information of each user. More specifically, for example, in a case where the occurrence probability of conversations calculated on the basis of a feature value extracted between the first user and the second user exceeds a predetermined threshold, the conversation determination unit 15 determines that a conversation occurs between the first user and the second user. The conversation determination unit 15 can specify a candidate user who has a conversation with a target user by calculating the occurrence probability on the basis of a feature value extracted by the feature value extraction unit 14 between the target user and the candidate user. Moreover, the conversation determination unit 15 can specify a conversation that occurs between candidate users by calculating the occurrence probability on the basis of a feature value extracted by the feature value extraction unit 14 between the candidate users. Specifying conversations occurring not only between a target user and a candidate user, but also between candidate users makes it possible to grasp the situation of conversations occurring around the target user.
The score calculation unit 16 calculates a score between users on the basis of a conversation occurrence history based on the determination of the conversation determination unit 15. For example, the score calculation unit 16 may calculate a score by integrating times for which conversations are occurring between users within a predetermined period of time. Alternatively, the score calculation unit 16 may calculate a score on the basis of the frequency of occurrence of conversations occurring between users for a predetermined time or more within a predetermined period of time. Further, for example, in a case where it is determined that a conversation occurs between users, the score calculation unit 16 may refer to the occurrence probability of conversations calculated by the conversation determination unit 15, and calculate a higher score between users who are determined to have conversations with a higher occurrence probability. Moreover, for example, in a case where the action detection unit 12 can detect a feature of a user's voice, a speech content, and the like, the score calculation unit 16 may estimate the degree to which a conversation is active, on the basis of them, and calculate a higher score between users having a more active conversation.
The grouping unit 17 groups users on the basis of a score calculated by the score calculation unit 16. There can be a variety of grouping expressions. For example, the grouping unit 17 categorizes users whose mutual scores exceed a threshold into a common group. Further, the grouping unit 17 may specify a graph structure expressing the relationship between users. The graph structure may be defined separately from a group, or a group may be defined in accordance with the presence or absence, or the strength of a link of the graph structure. Additionally, information based on a determination result of the conversation determination unit 15 in the present embodiment may be generated not only by the grouping unit 17, but may be generated in a variety of forms. Such another example will be described below.
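A minimal sketch of such grouping, assuming that pairwise scores are available as a dictionary and that users whose mutual score exceeds a threshold are merged into a common group (connected components); the threshold value and data layout are illustrative only.

    def group_users(scores, threshold=0.5):
        # scores: {(user_a, user_b): score}. Users linked by a score above the
        # threshold end up in a common conversation group.
        parent = {}

        def find(u):
            parent.setdefault(u, u)
            while parent[u] != u:
                parent[u] = parent[parent[u]]  # path halving
                u = parent[u]
            return u

        for (a, b), score in scores.items():
            ra, rb = find(a), find(b)
            if score > threshold:
                parent[ra] = rb  # merge the two groups
        groups = {}
        for u in list(parent):
            groups.setdefault(find(u), []).append(u)
        return list(groups.values())

A graph structure can be obtained in the same way by keeping the scores themselves as link weights instead of merging users into groups.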
Next, the extraction of feature values in an embodiment of the present disclosure will be described. The feature value extraction unit 14 in the system 10 calculates a feature value indicating an interaction between the first user and the second user in the present embodiment. The feature value extraction unit 14 extracts a positive feature value for an interaction between users, for example, on the basis of the following events. That is, in a case where the following events frequently occur, feature values indicating interactions between users can be higher.
Exchange of speech (the speech of the first user and the speech of the second user alternately occur)
Nod of a non-speaker during speech
Nod of a non-speaker within a short speech period of time
Concurrent nods of a speaker and a non-speaker
Speech during speech of a speaker+a response of a nod
Meanwhile, the feature value extraction unit 14 calculates a negative feature value for an interaction between users, for example, on the basis of the following events. That is, in a case where the following events frequently occur, feature values indicating interactions between users can be lower.
Coincidence of speech sections (the speech of the first user and the speech of the second user concurrently occur)
No reaction of a non-speaker to speech
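As a sketch under assumptions, the positive and negative events listed above might be turned into numeric feature values from per-frame detection results as follows, where the boolean arrays (sampled on a common clock) indicate whether speech or a nod of each user is detected in each frame; the particular combination of terms is illustrative, not the definition used by the feature value extraction unit 14.

    import numpy as np

    def interaction_features(speech_1, speech_2, nod_2):
        # speech_1, speech_2, nod_2: boolean arrays of equal length (True where the
        # action is detected in a frame). Components grow with alternating speech and
        # with nods of the non-speaker, and shrink with overlapping speech or with no
        # reaction of the non-speaker to speech.
        speech_1 = np.asarray(speech_1, dtype=bool)
        speech_2 = np.asarray(speech_2, dtype=bool)
        nod_2 = np.asarray(nod_2, dtype=bool)

        alternation = np.mean(speech_1 ^ speech_2)             # exactly one of the users speaks
        overlap = np.mean(speech_1 & speech_2)                 # both speak at once (negative cue)
        nod_during_speech = np.mean(speech_1 & nod_2)          # non-speaker nods during speech
        no_reaction = np.mean(speech_1 & ~speech_2 & ~nod_2)   # speech with no reaction

        return np.array([alternation, nod_during_speech, overlap, no_reaction])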
For example, the feature value extraction unit 14 calculates feature values based on the above-described events in a predetermined cycle (100 Hz as an example). The conversation determination unit 15 inputs the calculated feature values into a determination device in a predetermined cycle, which may be longer than the cycle for calculating the feature values (0.2 Hz as an example; in this case, the feature values may be treated as an average for every 30 s). The determination device may be, for example, a binary determination device, and determines whether the first user is likely or unlikely to converse with the second user. Such a determination device is generated, for example, through machine learning. For example, a support vector machine (SVM) can be used as a technique of machine learning, but a variety of known techniques other than this example can also be used. Further, any determination device can be used in the present embodiment as long as an output thereof enables the determination described below. More specifically, the determination device may be a binary determination device, or a determination device that outputs a probability. Further, the determination device does not necessarily have to be generated through machine learning.
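Assuming, for illustration, that scikit-learn is available and that labeled feature vectors (averaged per determination window as described above) have been collected, a probability-outputting determination device could be realized roughly as follows; this is one possible sketch, not the only determination device contemplated here.

    from sklearn.svm import SVC

    def train_determination_device(X_train, y_train):
        # X_train: one averaged feature vector per determination window,
        # y_train: 1 if the pair of users was conversing in that window, 0 otherwise.
        # probability=True makes the device output an occurrence probability of
        # conversations instead of only a binary decision.
        clf = SVC(kernel="rbf", probability=True)
        clf.fit(X_train, y_train)
        return clf

    def conversation_probability(clf, feature_window):
        # Estimated probability that a conversation occurs in this window.
        return clf.predict_proba([feature_window])[0][1]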
As already described, the occurrence probability of conversations is used in determination using a threshold, for example, as illustrated in
Meanwhile, in a case where a user U4 who is not involved in the conversations passes by the users U1 to U3, the position of the user U4 approaches the users U1 to U3. Accordingly, the user U4 can be treated as a candidate user, but the feature values that are extracted by the feature value extraction unit 14 and indicate interactions between users do not become positive for the occurrence of conversations as described above. Thus, the occurrence probability of conversations calculated by the conversation determination unit 15 is not high either, and therefore does not exceed the threshold. Although the narrow links L1 can be displayed between the user U1 and the user U4, and between the user U3 and the user U4, for example, as illustrated, the displayed links do not grow in width. Once the user U4 moves further away, the links L1 also disappear.
For example, in a case where conversations are detected on the basis of feature values such as frequency components of the speech data acquired by the wearable terminal 100 of each of the users U1 to U4, it is possible to categorize the users U1 to U4 into a single conversation group because the speech data provided from each of the users U1 to U4 can indicate similar feature values, but it is difficult to guess in what combinations of the users the conversations proceed in the above-described example illustrated
For example, the conversation determination unit 15 minimizes the energy of the generated graph structure, thereby performing the above-described optimization (rule of minimizing energy). Further, the conversation determination unit 15 may also optimize the graph structure in accordance with a rule based on the common knowledge that it is a single person who serves as a hub of conversations, for example, like the user U2 in the example of
As described in the example illustrated in
A speech section 132 is detected on the basis of speech data including sounds of speech of a user acquired by the microphone 112 as described in the example illustrated in
A body direction 136 is detected, for example, by using sensor data acquired by the geomagnetic sensor 126. As described above with reference to
A gesture 138 is detected, for example, by using sensor data acquired by the motion sensor 120 or the geomagnetic sensor 126. For example, similarly to a nod in the example described with reference to
A pulse 140 is detected, for example, by using the biological sensor 128. For example, since the pulse 140 is also likely to increase when users have an active conversation, it can be possible to estimate whether the state of the pulse matches the state of a conversation between users, that is, whether the users are actually conversing with each other (e.g., if another action or a feature value indicates an active conversation, but the pulse 140 does not increase, it is possible that the users are not actually conversing with each other).
In a case where the above-described detection results of actions are used, feature values indicating interactions between users can be higher, for example, when the following events frequently occur.
Reaction of a non-speaker in the form of gestures at the end of speech of a speaker
Words included in speech have commonality
Speech contents have commonality, and an answer matches the speech contents
Body directions of a speaker and a non-speaker intersect each other
Actions of walking, eating, or the like are common
Changes of a speaker and a non-speaker in pulse are correlated
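For instance, the correlation between the pulse changes of a speaker and a non-speaker mentioned in the last item could be turned into a feature value along the following lines; the use of a Pearson correlation coefficient over a shared window is an assumption made for illustration.

    import numpy as np

    def pulse_correlation_feature(pulse_speaker, pulse_listener):
        # Pearson correlation of frame-to-frame pulse changes over a shared window;
        # values near 1 suggest that both users are reacting to the same conversation.
        d_speaker = np.diff(np.asarray(pulse_speaker, dtype=float))
        d_listener = np.diff(np.asarray(pulse_listener, dtype=float))
        if d_speaker.std() == 0 or d_listener.std() == 0:
            return 0.0  # no variation, no usable correlation
        return float(np.corrcoef(d_speaker, d_listener)[0, 1])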
Further, the conversation determination unit 15 may consider the context of user behaviors or the way in which the specified conversation groups are used when categorizing users into conversation groups. For example, in a case where a private image of a user is shared within a specified conversation group, it is possible to prevent the image from being shared with an inappropriate user by setting a higher threshold for determining that a conversation occurs between users. Further, for example, setting a lower threshold makes it possible to categorize those who converse with a user into a conversation group without leaving them out in a party or the like, where participants are very likely to converse with each other over a wide area. Moreover, for example, a higher threshold may be set in the daytime, when a user is in a crowd in the city or the like in many cases, to prevent false detection, while a lower threshold may be set in the nighttime, when a user is in a less crowded place such as home in many cases.
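The context-dependent threshold described above could be selected, as a sketch with assumed context labels and values, as follows.

    def conversation_threshold(context):
        # Pick a determination threshold for the occurrence probability of conversations
        # depending on the use of the conversation group and the situation of the user.
        thresholds = {
            "share_private_image": 0.9,  # stricter: avoid sharing with an inappropriate user
            "party": 0.5,                # looser: group likely conversation partners broadly
            "daytime_crowd": 0.8,        # stricter: prevent false detection in a crowd
            "night_at_home": 0.6,        # looser: fewer nearby non-participants
        }
        return thresholds.get(context, 0.7)  # assumed default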
The UI generation unit 171 may provide a user interface that displays the states of conversations between users in a chronological order in the form of a graph, for example, as described above with respect to
For example, in a case where the states of conversations between users detected as described above are used for various kinds of use, ad hoc conversation group recognition between terminal devices as illustrated in
Terminal devices 100x and 100y (which only have to be terminal devices each used by a user, and may be, for example, the wearable terminal 100 or the mobile terminal 200 in the example of
More specifically, the candidate selection unit 13 selects a candidate user on the basis of positional information acquired by the sensing unit 11, and positional information acquired by the sensing unit 11 of the other user in the illustrated example. The users of the terminal devices 100x and 100y are then selected as each other's candidate user. Next, the action detection unit 12 specifies a section in which an action such as speech or nodding occurs, on the basis of the sensor data acquired by the sensing unit 11. Moreover, the feature value extraction unit 14 shares information such as the section specified by the action detection unit 12 of each terminal device via the communication unit 31, and extracts a feature value indicating an interaction between the users of the terminal devices 100x and 100y. The conversation determination unit 15 determines on the basis of the extracted feature value whether a conversation occurs between the users of the terminal devices 100x and 100y. The UI generation unit 171 generates a user interface such as the above-described graph or list in accordance with a result of the determination, and presents the generated user interface to each user via the display unit 32.
A conversation group of users is displayed in a screen 2100c in the form of a list in the example illustrated in
In a second example, the history of persons with whom a user has conversations is output onto a time line by a log output unit 175 and the link function 172 to social media.
The functional components described above can recommend, as a friend, another user in social media with whom, for example, the user has conversations to some extent (which can be determined on the basis of conversation time or high conversation probability). This eliminates the necessity of taking the trouble to register another user with whom a user has conversations as a friend in social media. Further, logs based on conversation occurrence histories can also be referred to in an application for social media or the like. Information such as topics of conversations recognized through processing of the speech recognition unit 34 and the topic recognition unit 35, information of places where conversations occur, images, or the like may then be added to the logs. For example, if conversation logs are filtered and displayed in accordance with topics or persons with whom a user has conversations, the conversation logs are useful as a tool for assisting the user's memory or as a means for recording memories.
It is possible in a third example to make an action on a person with whom conversations are not necessarily exchanged, for example, on social media as in the above-described second example. As described above, the feature value extraction unit 14 can not only extract feature values on the basis of the relationship between the respective sounds of speech of users, but can also extract feature values on the basis of the temporal relationship between a sound of speech of one user and an action (such as a motion or a biological indicator) other than speech of the other user in the present embodiment. If this is used, for example, a sporadically conversing person recognition unit 173 can recognize not only another user with whom a user exchanges speech and converses, but also another user who shows some reaction to the speech of the user, or another user to whose speech an action of the user is directed, and display him or her on a time line provided by the log output unit 175. The user can make an action 174 on the other user (who is not an acquaintance in many cases) in cloud computing on the basis of this. For example, only an avatar of the other user is visible at this time in the action in cloud computing because of privacy protection, and personal information does not necessarily have to be exchanged.
Additionally, the post-process unit 36 corresponds to the above-described sporadically conversing person recognition unit 173, and the action 174 in cloud computing. For example, the post-process unit 36 is implemented as software by a processor included in the terminal device 100w operating in accordance with a program.
The log output unit 175 outputs, as a log, a result obtained by generating a conversation group in the illustrated example. The post-process unit 36 specifies another user for whom communication including a conversation of a predetermined time or less or speech of only one user is detected in the log. Moreover, the post-process unit 36 can extract another user whom the user temporarily meets and make an action on such a user in cloud computing by removing users who have already been friends in social media from among the specified users.
The topic recommendation unit 183 illustrated in
Further, as another example, the topic recommendation unit 183 may provide a topic to the user in accordance with a log output by the log output unit 175 or an intimacy degree calculated by an intimacy degree graph generation unit 177 described below. More specifically, for example, in a case where the user converses with a person with whom the user constantly converses (a person having a large number of logs of conversations with the user) or a person having a high intimacy degree, the topic recommendation unit 183 may decide to provide a new topic when the conversation is inactive as described above, because the conversation is supposed to be active in such a relationship. Meanwhile, in a case where the user converses with a person with whom the user does not converse much (a person having few logs of conversations with the user) or a person having a low intimacy degree, the topic recommendation unit 183 may refrain from providing a new topic even though the conversation is estimated to be inactive as described above, because a conversation is not particularly necessary in some such cases.
The intimacy degree graph generation unit 177 illustrated in
As an example, the intimacy degree graph generation unit 177 may calculate an intimacy degree C with another user by using an equation such as the following expression 1. Additionally, it is assumed that each conversation occurring between the user and another user is provided with an index i. t_now represents the current time. t_past_i represents the time at which the i-th conversation with the other user occurred (older conversations thus have less influence on the intimacy degree in the expression 1). duration_i represents the total time of the i-th conversation. speak_i represents the speaking time in the i-th conversation. nod_i represents the nodding time in the i-th conversation (thus, as the speaking time increases as compared with the nodding time, the intimacy degree also increases in the expression 1). positive_i and negative_i represent the user's emotion (positive or negative) toward the other user in the i-th conversation, estimated on the basis of biological information or the like; if the positive emotion is stronger, the contribution to the intimacy degree is positive in the expression 1, while if the negative emotion is stronger, the contribution is negative.
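Expression 1 itself is not reproduced in this text. Purely as an illustrative assumption, one form consistent with the above description (exponential decay with elapsed time, a reward for speaking relative to nodding, and a signed emotion term) could be written as

    C = \sum_i e^{-\lambda (t_{now} - t_{past\_i})} \left( \alpha \, \frac{speak_i - nod_i}{duration_i} + positive_i - negative_i \right)

where \lambda and \alpha are assumed weighting constants; the actual expression 1 may differ.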
A desire-to-share graph generation unit 179 illustrated in
Further, the filter F corresponds to an adaptation graph generated by the adaptation graph generation unit 181, and the filter F related to information to be shared is selected. A graph of interests is selected from graphs of places, interests, groups, and the like, and the filter F corresponding thereto is applied in the illustrated example. As illustrated in
As a result, the positional relationship between the other users included in the graph changes in the desire-to-share graph G2 as compared with the group intimacy degree graph G1. A certain user has a link strengthened by applying the filter F, while another user has a link weakened by applying the filter F (the strengths of links are expressed in the form of distances from the center of the graph in the illustrated example). Consequently, in a case where content is shared with another user whose link strength exceeds a predetermined threshold (or such a user is treated as a candidate for a sharing destination of content), it is possible to set a sharing destination, or a candidate therefor, that is more appropriate to the type of content or the context in which the content is shared than in a case where a sharing destination or a candidate therefor is decided simply by using the group intimacy degree graph G1.
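A sketch of how the filter F might act on the link strengths of the group intimacy degree graph G1 to yield a desire-to-share graph such as G2, with weights and a threshold chosen only for illustration:

    def apply_filter(intimacy_links, filter_weights, share_threshold=0.5):
        # intimacy_links: {user: link_strength} from the group intimacy degree graph G1.
        # filter_weights: {user: weight} derived from the selected adaptation graph
        # (place, interest, group, ...). Returns the filtered links and the users whose
        # resulting links exceed the threshold (candidates for sharing destinations).
        filtered = {u: s * filter_weights.get(u, 1.0) for u, s in intimacy_links.items()}
        candidates = [u for u, s in filtered.items() if s > share_threshold]
        return filtered, candidates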
Here, a more specific example will be used to describe an example of dynamically selecting the adaptation graph from which the filter F originates. For example, in a case where a user goes on a trip, the adaptation graph corresponding to the attribute of places may be selected and the link to another user at the current position (trip destination) of a user may be strengthened (filter configured on the basis of the positional relationship between users included in the graph structure). Further, for example, in a case where a user is at work, the adaptation graph corresponding to work may be selected and the link to another user (such as a coworker) having a working relationship may be strengthened (filter configured on the basis of a group to which a user included in the graph structure belongs). Further, for example, in a case where a user is playing or watching a sport, the adaptation graph corresponding to interests may be selected and the link to another user interested in the sport may be strengthened (filter configured on the basis of what a user included in the graph structure is interested in). Further, for example, in a case where a user is participating in a party (social gathering) in which anyone can participate, an adaptation graph (filter configured on the basis of behavior information of a user included in the graph structure) may be selected to strengthen the link to another user having nothing to do at that time. Further, for example, in a case where a user is confronted with something unknown and has trouble, an adaptation graph (filter configured on the basis of the knowledge of a user included in the graph structure) may be selected to strengthen the link to another user who has the knowledge.
Additionally, adaptation graphs may be combined to configure the filter F. Further, it may be selectable to use no adaptation graph (to apply substantially no filter F). As described above, the adaptation graph generation unit 181 automatically (e.g., on a rule basis) selects an adaptation graph on the basis of the recognized context, the profile of a user, or the like. The adaptation graph generation unit 181 may, however, be capable of presenting selectable adaptation graphs to a user in the form of a list, tabs, or the like, and then selecting an adaptation graph in accordance with the selection of the user. In this case, the adaptation graph generation unit 181 may be configured to select an adaptation graph in accordance with the selection of the user at the initial stage, learn a selection criterion for an adaptation graph (based on the context of the situation of the user, the type of content to be shared, or the like) from the selection results of the user, and then automatically select an adaptation graph.
The link to the user C is strengthened in the illustrated example because the user A mentions the name of the user C in the actual speech. However, similar processing is also possible, for example, in a case where the name of the user C is included in sentences input by the user A (or the user B) when the user A and the user B have an on-line chat. The above-described example can also be an example in which the group intimacy degree graph generation unit 178 temporarily corrects an intimacy degree graph (graph structure expressing the relationship between users) specified on the basis of the occurrence histories of conversations between the user A and another user (including the user C) within a certain period of time (first period of time) in a case where the name of the user C is included in the contents sent by the user A in conversations (which may be actual conversations or virtual conversations such as on-line chats) occurring between the user A and another user (user B in the above-described example) within the most recent second period of time shorter than the first period of time. More specifically, the group intimacy degree graph generation unit 178 temporarily strengthens the relationship between the user A and the user C in a group intimacy degree graph in this example. As a similar example, the group intimacy degree graph generation unit 178 may temporarily strengthen the link to another user to whom the user casts the line of sight in the intimacy degree graph.
If content is shared as described in the above-described sixth example, the desire-to-share graph (G3 illustrated in
If content is shared in the above-described configuration, a user with whom the content is shared can be highly satisfied, because content of another really intimate user, or content in which the user can be interested, is selectively shared. Further, if content (such as watching a sports game live) that is experienced by a certain user in real time is shared with another user in a remote place in real time, the experience itself can be shared.
The embodiments of the present disclosure may include, for example, an information processing device as described above, a system, an information processing method executed by the information processing device or the system, a program for causing the information processing device to function, and a non-transitory tangible medium having the program recorded thereon.
Additionally, it has been described in the above-described embodiment that conversations can be detected between users in the system. However, the conversations detected between users in the above-described embodiment are not necessarily limited to conversations in which all the related users speak, as already described. For example, a case can also be detected where only some of the users speak, and the other users make an action such as nodding in accordance with the speech. Thus, what can be detected in an embodiment of the present disclosure can be the occurrence of communication between users (conversations being a type of communication), and in another embodiment such a case may be detected separately from conversations. The conversation determination unit can thus be an example of a communication determination unit.
The embodiment has been described above in which it is determined whether a conversation occurs between a target user and a candidate user, on the basis of a feature value indicating an interaction between the users. The following describes a second embodiment, which is an application example of the above-described first embodiment. A system in which positioning information is transferred between users will be described in the second embodiment.
GNSS positioning consumes much power. It is desirable to enable GNSS positioning with less power in a terminal such as the wearable terminal 100 or the mobile terminal 200 including a small battery. The following then describes an embodiment in which positioning information is transferred between users.
Next, as illustrated in B of
Additionally, the above-described GNSS positioning rights may be transferred at predetermined time intervals. Further, in a case where the remaining battery level of each wearable terminal 100 is recognized, the GNSS positioning right may be transferred to the wearable terminal 100 having a higher remaining battery level. If GNSS positioning is performed by the wearable terminal 100 having a higher remaining battery level in this way, it is possible to smooth the remaining battery levels of the terminals in the group. Further, as illustrated in
The server 300m includes a communication unit 37, an accompanying person recognition unit 38, and a GNSS positioning decision unit 39. The communication unit 37 communicates with each of the wearable terminals 100m and 100n. Further, the accompanying person recognition unit 38 groups accompanying persons on the basis of information sent from each of the wearable terminals 100m and 100n. Further, the GNSS positioning decision unit 39 decides to which user a GNSS positioning right is provided in a group recognized by the accompanying person recognition unit 38.
Further, the wearable terminals 100m and 100n include the communication unit 31, the display unit 32, the sensing unit 11, an accompanying person recognition unit 40, a GNSS positioning unit 41, a GNSS control unit 42, and a virtual GNSS positioning unit 43. Here, the communication unit 31 communicates with the server 300m. Further, the display unit 32 displays information such as information on users belonging to a group. Additionally, the communication unit 31 is implemented by a communication device for Bluetooth (registered trademark), Wi-Fi, or the like included in each of the wearable terminals 100m and 100n as described above.
Further, the sensing unit 11 may include a microphone, an acceleration sensor, and/or a gyro sensor as described above, and further include an imaging unit such as a camera. Further, the accompanying person recognition unit 40 receives information from the sensing unit 11 and the communication unit 31, and transmits the received information to the accompanying person recognition unit 38 of the server 300m via the communication unit 31. Further, the accompanying person recognition unit 40 receives information of an accompanying person recognized by the accompanying person recognition unit 38 of the server 300m. Additionally, this information of an accompanying person may also be displayed on the display unit 32, and the displayed information of an accompanying person may be corrected by a user.
The GNSS positioning unit 41 receives GNSS signals from a GNSS satellite for positioning. The virtual GNSS positioning unit 43 uses positioning information received from another terminal to determine the position of the own terminal. The GNSS control unit 42 switches which of the GNSS positioning unit 41 and the virtual GNSS positioning unit 43 to turn on, on the basis of a GNSS positioning right generated by the GNSS positioning decision unit 39 of the server 300m. Further, as described above with reference to
The operation of the above-described configuration will be specifically described below. The accompanying person recognition units 40 of the wearable terminals 100m and 100n receive the following information from the sensing unit 11, the GNSS control unit 42, or the communication unit 31.
(1) Positioning information generated by the GNSS positioning unit 41 or the virtual GNSS positioning unit 43
(2) Terminal identification information (ID) of Bluetooth (registered trademark) or Wi-Fi received from another terminal
(3) Sounds received by a microphone
(4) Information of captured images taken by a camera
The accompanying person recognition units 40 of the wearable terminals 100m and 100n transmit the information described above in (1) to (4) to the accompanying person recognition unit 38 of the server 300m. The accompanying person recognition unit 38 of the server 300m, which receives the information, then determines the distance to each wearable terminal 100, for example, from the positioning information in (1). If the distance is a predetermined distance or less, the user who possesses the wearable terminal 100 may be recognized as an accompanying person.
Further, with respect to the terminal identification information in (2), the accompanying person recognition unit 38 of the server 300m may recognize, as an accompanying person, the user who possesses the wearable terminal 100 whose terminal identification information is observed on a long-term basis. That is, in a case where the wearable terminal 100 having terminal identification information A observes the wearable terminal 100 having terminal identification information B on a long-term basis, the user who possesses the wearable terminal 100 having the terminal identification information B may be recognized as an accompanying person.
Further, the accompanying person recognition unit 38 of the server 300m may perform environmental sound matching on the basis of the sound information in (3), and recognize the user of a wearable terminal having similar sound information as an accompanying person. Further, the accompanying person recognition unit 38 of the server 300m may recognize, on the basis of the image information in (4), a person recognized in captured images within a predetermined period of time as an accompanying person. Person data (such as face image data) used for image recognition may be then stored in each of the wearable terminals 100m and 100n, and the accompanying person recognition units 40 of the wearable terminals 100m and 100n may transmit the person data to the server 300m.
Further, the above-described accompanying person recognition unit 38 of the server 300m may recognize an accompanying person on the basis of an action such as a user's nod or hand movement (gesture) described in the first embodiment, or a feature value indicating an interaction between users (i.e., accompanying persons) which is based on the sounds of speech between the users. Further, the various kinds of information in (1) to (4) and various kinds of information of an interaction between users may be integrated to recognize an accompanying person. When an accompanying person is recognized on the basis of the above-described various kinds of information, a recognition method corresponding to the conditions of the wearable terminals 100m and 100n is selected. For example, when a camera is activated, information of captured images of the camera may be used to recognize an accompanying person. Further, when a microphone is activated, sound information may be used to recognize an accompanying person. Further, integrating and using several kinds of information makes it possible to more accurately identify an accompanying person. As described above, the various kinds of information in (1) to (4) and various kinds of information of an interaction between users can be examples of accompanying person recognition information used to recognize an accompanying person.
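Combining, for example, the positioning information in (1), the terminal identification information in (2), and the sound information in (3) might look as follows; the thresholds and the simple voting scheme are assumptions made for the sketch.

    def recognize_accompanying(distance_m, bt_observed_s, same_sound_env=False,
                               max_distance_m=20.0, min_observed_s=600.0):
        # Return True if the other terminal is judged to belong to an accompanying person.
        # distance_m: distance computed from the positioning information in (1).
        # bt_observed_s: how long the terminal ID in (2) has been observed continuously.
        # same_sound_env: result of environmental sound matching based on (3).
        votes = 0
        votes += distance_m <= max_distance_m
        votes += bt_observed_s >= min_observed_s
        votes += same_sound_env
        return votes >= 2  # integrating several cues identifies an accompanying person more accurately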
The above describes the example in which an accompanying person is recognized by the accompanying person recognition unit 38 of the server 300m, that is, via the server 300m. An accompanying person may, however, be recognized through communication between the respective wearable terminals 100m and 100n.
The above describes the example in which GNSS positioning rights are transferred among grouped users. The following describes an example in which positioning information of a device such as a vehicle including a sufficiently large power source and capable of GNSS positioning is used.
The GNSS control unit 42 of the wearable terminal 100 associated with the vehicle 400 powers off the GNSS positioning unit 41. The GNSS control unit 42 then acquires positioning information measured by the GNSS positioning unit 45 of the vehicle 400 via the communication unit 31. The GNSS control unit 42 turns on the virtual GNSS positioning unit 43, and recognizes the position of the own terminal by using the acquired positioning information. Once the association of the wearable terminal 100 with the vehicle 400 is released, the wearable terminal 100 turns on the GNSS positioning unit 41 of the wearable terminal 100 and performs positioning by itself.
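A minimal sketch of the switching performed by the GNSS control unit 42 when the terminal is associated with a device such as the vehicle 400 (the class and method names are illustrative, not the actual implementation of the wearable terminal 100):

    class GnssControl:
        def __init__(self, gnss_unit, virtual_gnss_unit):
            self.gnss_unit = gnss_unit                  # GNSS positioning unit 41
            self.virtual_gnss_unit = virtual_gnss_unit  # virtual GNSS positioning unit 43

        def on_associated(self, external_source):
            # Associated with a vehicle (or granted no GNSS positioning right in a group):
            # stop own positioning and use the received positioning information instead.
            self.gnss_unit.power_off()
            self.virtual_gnss_unit.use_source(external_source)

        def on_released(self):
            # Association released: resume positioning by the terminal itself.
            self.virtual_gnss_unit.stop()
            self.gnss_unit.power_on()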
In a case where a device such as the vehicle 400 including a sufficient power source is associated with the wearable terminal 100 in this way, the wearable terminal 100 uses positioning information measured by the device including a sufficient power source. This reduces the power consumption of the wearable terminal 100.
The above describes the example of the system that uses positioning information measured by another device. The following describes an application example of the system. Positioning information is shared between terminals positioned adjacent to each other in the application example. This application example is effective in a situation in which a large amount of terminals crowd a limited area such as a shopping mall.
Next, in S102, the wearable terminal 100 determines the number of adjacent terminals scanned in S100. Next, in S106, the wearable terminal 100 performs intermittent positioning described below in detail on the basis of the number of adjacent terminals determined in S102.
Next, the wearable terminal 100 determines in S108 whether positioning information has been received from another terminal. Here, in a case where no positioning information is acquired from another terminal, the processing proceeds to S112 and the wearable terminal 100 performs GNSS positioning by itself. If the wearable terminal 100 receives positioning information from another terminal in S108, the processing proceeds to S110 and the wearable terminal 100 uses the positioning information received from the other terminal to recognize the position of the own terminal. The processing then returns to S100, and the above-described processing is repeated.
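The flow from S100 to S112 can be summarized by the following loop, in which the helper methods are assumed for the sake of the sketch and the step numbers follow the description above.

    def positioning_loop(terminal):
        while True:
            neighbors = terminal.scan_adjacent()             # S100: scan adjacent terminals
            n = len(neighbors)                               # S102: count them
            terminal.set_intermittence_rate_for(n)           # S106: more neighbors, higher rate
            fix = terminal.receive_position_from_neighbor()  # S108: positioning info received?
            if fix is not None:
                terminal.use_received_position(fix)          # S110: use the other terminal's fix
            else:
                terminal.gnss_positioning()                  # S112: perform GNSS positioning itself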
The above describes the operation of the application example of the second embodiment. The following describes the intermittent positioning in S106 of
Further, as described above, in a case where the wearable terminal 100 intermittently performs positioning, the intermittence rate may be changed in accordance with the number of adjacent terminals determined in S102. It is assumed that the number of adjacent terminals determined in S102 is, for example, ten, and that each performs positioning at an intermittence rate of 90%. Here, an intermittence rate of 90% means that the GNSS positioning unit 41 is turned on, for example, for only one second every ten seconds.
The probability that none of the ten adjacent terminals performs positioning during a given second is 0.9^10 ≈ 0.35 (35%) in the above-described situation. The probability that this continues for three straight seconds is 0.35^3 ≈ 0.04 (approximately 4%), which is very low. That is, there is a very high probability that the wearable terminal 100 can receive positioning information from another terminal at intervals of approximately three seconds or less. The wearable terminal 100 can therefore acquire positioning information with sufficient accuracy in the above-described system while maintaining an intermittence rate of 90%.
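The figures above can be checked with a few lines, assuming ten adjacent terminals, each idle for 90% of any given second, and independence between terminals.

    p_idle = 0.9      # probability that one adjacent terminal is not positioning in a given second
    n_adjacent = 10

    p_none = p_idle ** n_adjacent   # ≈ 0.35: no other terminal positions during one second
    p_none_3s = p_none ** 3         # ≈ 0.04: no other terminal positions for three straight seconds
    print(p_none, p_none_3s)        # 0.3486..., 0.0424...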
As understood from the above description, the wearable terminal 100 can increase the intermittence rate if more adjacent terminals are detected, while it has to decrease the intermittence rate if fewer adjacent terminals are detected. Intermittently operating the GNSS positioning unit 41 in this way allows the wearable terminal 100 to save power. Further, GNSS positioning may be executed by being complemented with past positioning information in the GNSS positioning method for intermittent positioning. At this time, if the past positioning information is too old, complementation can be impossible. Meanwhile, the use of the above-described system makes it possible to acquire positioning information from another terminal in spite of the increased intermittence rate. Accordingly, positioning information is appropriately complemented.
The embodiments of the present disclosure may include, for example, an information processing device as described above, a system, an information processing method executed by the information processing device or the system, a program for causing the information processing device to function, and a non-transitory tangible medium having the program recorded thereon.
Additionally, the example in which an accompanying person is recognized from various kinds of information detected by the wearable terminal 100 has been described in the above-described embodiment. An accompanying person may be, however, recognized by using a dedicated application that registers a user as an accompanying person in advance. Further, an accompanying person may be recognized by using a group function of an existing social network service (SNS).
Next, the hardware configuration of the information processing device according to the embodiment of the present disclosure will be described with reference to
The information processing device 900 includes a central processing unit (CPU) 901, read only memory (ROM) 903, and random access memory (RAM) 905.
Further, the information processing device 900 may include a host bus 907, a bridge 909, an external bus 911, an interface 913, an input device 915, an output device 917, a storage device 919, a drive 921, a connection port 923, and a communication device 925. Moreover, the information processing device 900 may include an imaging device 933 and a sensor 935 as necessary. The information processing device 900 may include a processing circuit such as a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a field-programmable gate array (FPGA) instead of or in combination with the CPU 901.
The CPU 901 functions as an operation processing device and a control device, and controls all or some of the operations in the information processing device 900 in accordance with a variety of programs recorded on the ROM 903, the RAM 905, the storage device 919, or a removable recording medium 927. The ROM 903 stores a program, an operation parameter, and the like which are used by the CPU 901. The RAM 905 primarily stores a program which is used in the execution of the CPU 901 and a parameter which is appropriately modified in the execution. The CPU 901, the ROM 903, and the RAM 905 are connected to each other by the host bus 907 including an internal bus such as a CPU bus. Moreover, the host bus 907 is connected to the external bus 911 such as a peripheral component interconnect/interface (PCI) bus via the bridge 909.
The input device 915 is a device which is operated by a user, such as a mouse, a keyboard, a touch panel, a button, a switch, and a lever. The input device 915 may be, for example, a remote control device using infrared light or other radio waves, or may be an external connection device 929 such as a mobile phone operable in response to the operation of the information processing device 900. The input device 915 includes an input control circuit which generates an input signal on the basis of information input by a user and outputs the input signal to the CPU 901. By operating the input device 915, a user inputs various types of data to the information processing device 900 or requests a processing operation.
The output device 917 includes a device capable of notifying a user of the acquired information via senses of sight, hearing, touch, and the like. The output device 917 can be a display device such as a liquid crystal display (LCD) or an organic electro-luminescence (EL) display, a sound output device such as a speaker or headphones, a vibrator, or the like. The output device 917 outputs a result obtained by the information processing device 900 performing processing as video such as text or images, audio such as speech or sounds, vibration, or the like.
The storage device 919 is a device for data storage which is configured as an example of a storage unit of the information processing device 900. The storage device 919 includes, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device. The storage device 919 stores a program, for example, to be executed by the CPU 901, various types of data, various types of data acquired from the outside, and the like.
The drive 921 is a reader/writer for the removable recording medium 927 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, and is built in the information processing device 900 or externally attached thereto. The drive 921 reads out information recorded in the removable recording medium 927 attached thereto, and outputs the read-out information to the RAM 905. Further, the drive 921 writes records into the removable recording medium 927 attached thereto.
The connection port 923 is a port used to connect a device to the information processing device 900. The connection port 923 may include, for example, a universal serial bus (USB) port, an IEEE1394 port, and a small computer system interface (SCSI) port. The connection port 923 may further include an RS-232C port, an optical audio terminal, a high-definition multimedia interface (HDMI) (registered trademark) port, and so on. The connection of the external connection device 929 to the connection port 923 makes it possible to exchange various types of data between the information processing device 900 and the external connection device 929.
The communication device 925 is, for example, a communication interface including a communication device or the like for a connection to a communication network 931. The communication device 925 may be, for example, a communication card for a local area network (LAN), Bluetooth (registered trademark), Wi-Fi, a wireless USB (WUSB), or the like. Further, the communication device 925 may be a router for optical communication, a router for an asymmetric digital subscriber line (ADSL), a modem for various kinds of communication, or the like. The communication device 925 transmits a signal to and receives a signal from, for example, the Internet or other communication devices on the basis of a predetermined protocol such as TCP/IP. Further, the communication network 931 connected to the communication device 925 may include a network connected in a wired or wireless manner, and is, for example, the Internet, a home LAN, infrared communication, radio wave communication, satellite communication, or the like.
The imaging device 933 is a device that images a real space by using an image sensor such as a complementary metal oxide semiconductor (CMOS) or a charge coupled device (CCD), and a variety of members such as a lens for controlling the formation of an object image on the image sensor, and generates a captured image. The imaging device 933 may be a device that captures a still image, and may also be a device that captures a moving image.
The sensor 935 includes a variety of sensors such as an acceleration sensor, an angular velocity sensor, a geomagnetic sensor, an illuminance sensor, a temperature sensor, a barometric sensor, or a sound sensor (microphone). The sensor 935 acquires information on a state of the information processing device 900, such as the attitude of the housing of the information processing device 900, and information on an environment around the information processing device 900, such as the brightness and noise around the information processing device 900. The sensor 935 may also include a global positioning system (GPS) receiver that receives GPS signals and measures the latitude, longitude, and altitude of the device.
The example of the hardware configuration of the information processing device 900 has been described above. Each of the above-described components may be configured with a general-purpose member, and may also be configured with hardware specialized in the function of each component. Such a configuration may also be modified as appropriate in accordance with the technological level at the time of the implementation.
The preferred embodiment(s) of the present disclosure has/have been described above with reference to the accompanying drawings, whilst the present disclosure is not limited to the above examples. A person skilled in the art may find various alterations and modifications within the scope of the appended claims, and it should be understood that they will naturally come under the technical scope of the present disclosure.
Further, the effects described in this specification are merely illustrative or exemplified effects, and are not limitative. That is, with or in the place of the above effects, the technology according to the present disclosure may achieve other effects that are clear to those skilled in the art from the description of this specification.
Additionally, the present technology may also be configured as below.
(1)
An information processing device including:
a communication determination unit configured to determine, on the basis of a feature value extracted from speech data including at least a sound of speech of a user, whether communication occurs between users including the user, the feature value indicating an interaction between the users.
(2)
The information processing device according to (1), in which
the users include a first user and a second user, and
the feature value is extracted on the basis of a temporal relationship between a sound of speech of the first user and a sound of speech of the second user which are included in the speech data.
(3)
The information processing device according to (2), in which the speech data includes first speech data including the sound of speech of the first user, and second speech data including the sound of speech of the second user.
(4)
The information processing device according to (2), in which
the speech data includes a single piece of speech data including the sound of speech of the first user, and the sound of speech of the second user.
(5)
The information processing device according to any one of (1) to (4), further including:
a feature value extraction unit configured to extract the feature value from the speech data.
(6)
The information processing device according to any one of (1) to (5), in which
the communication determination unit determines whether the communication occurs between users selected from the users on the basis of a positional relationship between the users.
(7)
The information processing device according to any one of (1) to (6), in which
the communication determination unit determines whether the communication occurs between users selected from the users on the basis of behavior information of each user.
(8)
The information processing device according to any one of (1) to (7), in which
the feature value is extracted further from sensor data indicating motions or biological indicators of the users.
(9)
The information processing device according to (8), in which
the users include a third user and a fourth user, and
the feature value is extracted on the basis of a relationship between a sound of speech of the third user included in the speech data, and a motion or a biological indicator of the fourth user indicated by the sensor data.
(10)
The information processing device according to any one of (1) to (9), further including:
a display control unit configured to display a screen for presenting the communication in a chronological order.
(11)
The information processing device according to (10), in which
the communication is presented in the screen in a form corresponding to occurrence probability of the communication calculated on the basis of the feature value.
(12)
The information processing device according to any one of (1) to (11), further including:
a log output unit configured to output, on the basis of an occurrence history of the communication, a log including at least one of information of a person with whom at least one user included in the users communicates, or information of a conversation with the person with whom the at least one user included in the users communicates.
(13)
The information processing device according to (12), in which
the log output unit outputs the log onto a time line presented to the at least one user.
(14)
The information processing device according to any one of (1) to (13), further including:
a relationship graph specification unit configured to specify a graph structure expressing a relationship between the users on the basis of an occurrence history of the communication.
(15)
The information processing device according to (14), further including:
a sharing user specification unit configured to apply, in a phase in which at least one user included in the users shares information, a filter related to the shared information to the graph structure, thereby specifying another user who shares the information.
(16)
The information processing device according to (15), in which
the filter is configured on the basis of a positional relationship with a user included in the graph structure, a group to which a user included in the graph structure belongs, an interest of a user included in the graph structure, behavior information of a user included in the graph structure, or knowledge of a user included in the graph structure.
(17)
The information processing device according to any one of (14) to (16), in which
the relationship graph specification unit temporarily corrects the graph structure specified on the basis of the occurrence history of the communication within a first period of time in accordance with a content of the communication occurring within a most recent second period of time that is shorter than the first period of time.
(18)
The information processing device according to (17), in which
the users include a fifth user and a sixth user, and
in a case where a content sent by the fifth user includes a name of the sixth user in the communication occurring within the second period of time, the relationship graph specification unit temporarily strengthens a relationship between the fifth user and the sixth user in the graph structure.
(19)
An information processing method including, by a processor:
determining, on the basis of a feature value extracted from speech data including at least a sound of speech of a user, whether communication occurs between users including the user, the feature value indicating an interaction between the users.
(20)
A program for causing a computer to execute:
a function of determining, on the basis of a feature value extracted from speech data including at least a sound of speech of a user, whether communication occurs between users including the user, the feature value indicating an interaction between the users.
(21)
The information processing device according to (1), including:
an accompanying person recognition unit configured to recognize an accompanying person of the user on the basis of accompanying person recognition information for recognizing the accompanying person; and
a GNSS positioning decision unit configured to determine whether a GNSS positioning right for GNSS positioning is provided to a first information processing device or a second information processing device, the first information processing device being possessed by the user, the second information processing device being possessed by the accompanying person.
(22)
The information processing device according to (21), in which
the accompanying person recognition information includes any one or a combination of a feature value indicating an interaction between the user and the accompanying person, or image information captured by the first information processing device possessed by the user, or information on a distance between the first information processing device and the second information processing device, or terminal identification information sent by the first information processing device or the second information processing device.
(23)
The information processing device according to (21) or (22), in which
remaining battery levels of the first information processing device and the second information processing device are recognized, and an information processing device to which the GNSS positioning right is provided is decided on the basis of the remaining battery levels.
(24)
The information processing device according to any one of (21) to (23), in which
in a case where a vehicle that is adjacent to the first information processing device and is capable of GNSS positioning is recognized, positioning information is acquired from the vehicle.
(25)
The information processing device according to any one of (21) to (24), further including:
a communication unit, in which
a frequency at which GNSS positioning is intermittently performed is changed in accordance with a number of adjacent terminals recognized by the communication unit.
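Purely as an illustration of the temporal-relationship feature referred to in configurations (2) to (4) above, one possible way to summarize how two users' speech segments alternate and overlap is sketched below. The segment representation, the two statistics, and the 2-second reply window are assumptions made for the example, not the feature extraction of the disclosure.

```python
def interaction_features(segments_a, segments_b, turn_gap_max=2.0):
    """Summarize how two users' speech segments alternate and overlap.

    Each segment is a (start, end) pair in seconds. The returned statistics
    are one possible feature set indicating an interaction: how often one
    user starts speaking shortly after the other stops, and how much the two
    users speak over each other.
    """
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    # Count replies: a segment of one user that begins within turn_gap_max
    # seconds after a segment of the other user ends.
    turns = sum(
        1 for sa in segments_a
        if any(0.0 <= sb[0] - sa[1] <= turn_gap_max for sb in segments_b)
    )
    turns += sum(
        1 for sb in segments_b
        if any(0.0 <= sa[0] - sb[1] <= turn_gap_max for sa in segments_a)
    )

    total_overlap = sum(overlap(sa, sb) for sa in segments_a for sb in segments_b)
    total_speech = sum(end - start for start, end in segments_a + segments_b)
    return {
        "turn_taking_count": turns,
        "overlap_ratio": total_overlap / total_speech if total_speech else 0.0,
    }
```

A communication determination unit could then, for example, threshold such statistics or feed them to a classifier to decide whether a conversation occurs between the two users.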
Number | Date | Country | Kind
---|---|---|---
2015-066901 | Mar 2015 | JP | national
PCT/JP2015/085187 | Dec 2015 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2016/057392 | 3/9/2016 | WO | 00