The present disclosure relates to an information processing apparatus, an information processing method, an information processing program, and an information processing system that control output to a user.
There is a technology that performs sound recognition on speech and environmental sound, selects content such as a piece of music on the basis of the recognized sound, and outputs the selected content (Patent Literature 1).
The technology that performs sound recognition on speech and environmental sound can only be applied to an environment in which sound is produced. Thus, there is a possibility that proper content will not be selected for a user who does not want to produce sound, or in a state in which sound is not allowed to be produced. Further, high computational performance is needed to perform natural language processing, which makes it difficult to perform the processing locally.
In view of the circumstances described above, it is an object of the present disclosure to provide an information processing apparatus, an information processing method, an information processing program, and an information processing system that control output to a user properly regardless of state.
An information processing apparatus according to an embodiment of the present disclosure includes:
The present embodiment makes it possible to control output to a user properly regardless of state.
The information processing apparatus may further include:
The user location estimator may estimate the user location using pedestrian dead reckoning (PDR).
The present embodiment makes it possible to control output to a user properly regardless of a state such as a state in which sound is not allowed to be produced, on the basis of a location of the user in a house and other user contexts.
The environment estimator may estimate the environment state on the basis of the location attribute.
For example, output to a user can be controlled such that the user can focus his/her mind on tasks when the user is at his/her desk for work during telecommuting, and output to the user can be controlled such that the user can get relaxed when the user is in a resting space.
The sensor section included in the wearable device may include at least one of an acceleration sensor, a gyroscope, a compass, a biological sensor, or a geomagnetic sensor.
The inside of a house is relatively narrow, which is different from the outdoors. Thus, there is generally a need for external equipment such as a high-accuracy beacon or camera in order to estimate a specific location in the house. On the other hand, the present embodiment makes it possible to specify, without external equipment, a location in a house using the acceleration sensor, the gyroscope, and/or the compass that are mounted on the wearable device.
The user location estimator may include
An angle at which the wearable device is worn differs depending on each user, and thus the angle formed by a sensor axis of each of the acceleration sensor and the gyroscope also differs depending on each user. The user location estimator can therefore estimate, for each user, an angle formed by the sensor axis of the sensor section, and can estimate an orientation (an angle) with a high degree of accuracy using the estimated angle as a correction value, without depending on an individual difference.
The user location estimator may estimate a route of movement of the user location, and
For example, the present embodiment makes it possible to control output such that a user can focus his/her mind on tasks when the user is at his/her desk for work during telecommuting, and to control output such that the user can get relaxed when the user is in a resting space.
The location attribute estimator may hold a plurality of the routes of the movement, and may perform matching by checking the estimated route of the movement against the plurality of the routes of the movement to estimate the location attribute after the movement.
According to the present embodiment, a pattern of movement between locations and an order of the movement are stored, and a location of a user after movement can be specified using most recent movement patterns.
When the location attribute estimator has failed to perform matching a specified number of times, the location attribute estimator may output an alert.
Accordingly, notification that a location attribute after movement is to be estimated from a new movement route can be given to a user when, for example, a movement route that is completely different from any of a plurality of held movement routes is consecutively detected due to the user moving into a different building (such as a co-working space) from his/her own house.
The location attribute estimator may perform the matching using dynamic time warping (DTW).
The location attribute estimator may estimate the location attribute by determining a period of time for which the user stays at a location at which the user is.
The location attribute can be estimated more accurately by determining a staying period of time in addition to a movement route.
The information processing apparatus may further include
The context may include at least one of location information regarding a location of the user or terminal information regarding the information processing apparatus.
The user state can be estimated more accurately by estimating the user state not only on the basis of a location attribute but also on the basis of a context of a user.
The user state estimator may estimate the user state on the basis of the detection value detected by the sensor section included in the wearable device and/or on the basis of the location attribute.
Accordingly, the user state can be estimated more accurately.
The user state may include a plurality of activity states of the user.
For example, the user state includes activity states at four levels that respectively correspond to “break time”, “neutral”, “do not disturb (DND)”, and “offline”. “Break time” refers to a most relaxed state of activity, “neutral” refers to an ordinary state of activity, “DND” refers to a relatively busy state of activity, and “offline” refers to a busiest state of activity.
The output controller may include
The content controller may play back content that enables a user to focus his/her mind or content that enables the user to feel relaxed. The notice controller may reduce the number of notices or may stop the notice such that the user can focus his/her mind, and may cause a usual number of notices to be provided while the user is feeling relaxed.
The sensor section included in the wearable device may include an acceleration sensor, and
According to the present embodiment, a correction value for an azimuth corresponding to a user can be calculated only using an acceleration sensor. This makes it possible to calculate the correction value even in an environment with fewer mounted sensors. This makes it possible to reduce costs, to reduce power consumption, and to make an apparatus smaller.
The information processing apparatus may further include:
When the user is at a stop for a first period of time, the matching section may check the new detection value against the registration-use detection value to perform matching, and
The database generator may register, as the registration-use detection value, an average of a plurality of the detection values detected by the sensor section for a specified period of time.
An information processing apparatus according to an embodiment of the present disclosure includes:
An information processing method according to an embodiment of the present disclosure includes:
An information processing program according to an embodiment of the present disclosure causes a processor of an information processing apparatus to operate as
An information processing system according to an embodiment of the present disclosure includes:
Embodiments according to the present disclosure will now be described below with reference to the drawings.
An information processing system 10 includes an information processing apparatus 100 and a wearable device 200.
The information processing apparatus 100 is a terminal apparatus, such as a smartphone, a tablet computer, or a personal computer, that is used by an end user. The information processing apparatus 100 is connected to a network such as the Internet.
The wearable device 200 is a device used by being worn on a head of a user. Typically, the wearable device 200 is a wireless earphone (
In the information processing apparatus 100, a processor such as a CPU of a control circuit operates as a context acquisition section 110, a pedestrian dead reckoning (PDR) section 120 (a user location estimator), a location estimator 130 (a location attribute estimator), a user state estimator 140, an environment estimator 150, and an output controller 160 by loading, into a RAM, an information processing program recorded in a ROM and executing the information processing program.
The context acquisition section 110 acquires a context of a user. The context of a user includes location information and terminal information. Here, the context refers to, for example, a sensor value acquired from the sensor section 210 and schedule information regarding a schedule of a user that is acquired from a calendar app. The context acquisition section 110 includes apparatuses such as a GPS sensor 111 and a beacon transmitter-receiver 112 that acquire location information as a context. The context acquisition section 110 further includes a terminal information acquiring section 113 that acquires terminal information as a context. As the terminal information corresponding to a context, the terminal information acquiring section 113 acquires, for example, screen lock information (locked or unlocked), behavior information regarding the behavior of a user (such as running, a bicycle, at rest, walking, or on board), a location (a specific location or an unspecific location at home, at the office, or the like), calendar app information (having or not having a schedule for a meeting), time information (on duty or off duty), phone app information (on the phone), sound-recognition-app information (during speech), an automatic do-not-disturb (DND) setting (in a time frame or outside of the time frame), and a manual do-not-disturb (DND) setting (on or offline).
The PDR section 120 (a user location estimator) estimates a user location on the basis of detection values (acceleration, angular velocity, and an azimuth) detected by the sensor section 210 included in the wearable device 200 worn by the user. Specifically, the PDR section 120 includes an angle correction section 121, an angle estimator 122, and a user location estimator 123. The angle correction section 121 calculates a correction value for an azimuth corresponding to the user on the basis of detection values (acceleration, angular velocity, and an azimuth) detected by the sensor section 210 included in the wearable device 200 worn by the user. The angle estimator 122 estimates the azimuth corresponding to the user on the basis of the detection values (acceleration, angular velocity, and an azimuth) detected by the sensor section 210 included in the wearable device 200 worn by the user, and on the basis of the correction value. The user location estimator 123 uses the azimuth after correction to estimate the location of the user. Pedestrian dead reckoning (PDR) is a technology used to measure a relative location from a certain reference point on the basis of detection values from a plurality of autonomously operating sensors. In this example, on the basis of acceleration, angular velocity, and an azimuth that are respective detection values from the acceleration sensor 211, the gyroscope 212, and the compass 213, the PDR section 120 estimates a change in user location from a certain room to another room, that is, a route of movement of a user location.
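The following is a minimal sketch, not the disclosed implementation, of how such a PDR update can accumulate a route of movement; the fixed step length and the function names are assumptions introduced here for illustration.

```python
import numpy as np

def pdr_step_update(position_xy, azimuth_rad, step_length_m=0.7):
    """Advance an estimated 2-D position by one detected step.

    position_xy: np.array([x, y]) in metres from an arbitrary reference point.
    azimuth_rad: corrected azimuth of the user (0 = +Y, clockwise positive) -- an assumed convention.
    step_length_m: illustrative fixed step length; a real system would estimate it per user.
    """
    step = step_length_m * np.array([np.sin(azimuth_rad), np.cos(azimuth_rad)])
    return position_xy + step

# Accumulating such steps between two stops yields the route of movement
# (a sequence of positions) that the location estimator matches later.
route = [np.zeros(2)]
for azimuth in np.deg2rad([0, 0, 90, 90, 90]):   # toy heading sequence
    route.append(pdr_step_update(route[-1], azimuth))
```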
The location estimator 130 (a location attribute estimator) estimates an attribute (a location attribute) of a location at which a user is, on the basis of a change in a user location estimated by the PDR section 120. In other words, the location estimator 130 estimates a location attribute after movement of a user on the basis of a movement route estimated by the PDR section 120. The location attribute refers to a portion, in a building, that is smaller than, for example, the building itself. The location attribute refers to, for example, a living room, a bedroom, a toilet, a kitchen, and a washroom in a house. Alternatively, the location attribute refers to, for example, a desk and a meeting room in a co-working space. However, the location attribute is not limited thereto, and the location attribute may represent, for example, a building itself, or may represent both a building itself and a portion in the building.
The user state estimator 140 estimates a user state on the basis of a context acquired by the context acquisition section 110, detection values (acceleration, angular velocity, and an azimuth) detected by the sensor section 210 included in the wearable device 200, and a location attribute estimated by the location estimator 130. The user state includes activity states of a user at a plurality of levels. For example, the user state includes activity states at four levels that respectively correspond to “break time”, “neutral”, “do not disturb (DND)”, and “offline”. “Break time” refers to a most relaxed state of activity, “neutral” refers to an ordinary state of activity, “DND” refers to a relatively busy state of activity, and “offline” refers to a busiest state of activity. Further, in addition to the four levels described above, any number of levels may be set in a system, or a user may be allowed to set the number of levels as appropriate.
The environment estimator 150 estimates an environment state to be provided to a user, on the basis of a user state estimated by the user state estimator 140. The environment estimator 150 may further estimate an environment state to be provided to a user, on the basis of a location attribute estimated by the location estimator 130. The environment state to be provided to a user is, for example, an environment state that enables the user to focus his/her mind (concentrate) or an environment state that enables the user to feel relaxed.
The output controller 160 controls output on the basis of an environment state estimated by the environment estimator 150. Specifically, the output controller 160 includes a content controller 161 and a notice controller 162. The content controller 161 plays back content (such as a piece of music and a video) that is selected on the basis of an environment state estimated by the environment estimator 150. For example, it is sufficient if the content controller 161 notifies a digital service provider (DSP) of an environment state through a network, receives content (such as content that enables a user to focus his/her mind or content that enables a user to feel relaxed) selected by the DSP on the basis of the notified environment state, and plays back the received content. The notice controller 162 controls the number of notices provided to a user, on the basis of the environment state. For example, the notice controller 162 may perform processing such as reducing the number of notices (such as a new-reception notice from an app or for a message) or stopping the notices such that the user can focus his/her mind, and causing a usual number of notices to be provided while the user is feeling relaxed.
Typically, the wearable device 200 is a wireless earphone. The wearable device 200, which is a wireless earphone, includes a speaker 221, a driver unit 222, and a sound tube 223 that connects the speaker 221 and the driver unit 222. The speaker 221 is inserted into an earhole to determine a position of the wearable device 200 relative to the ear, and the driver unit 222 is situated behind the ear. The sensor section 210 including the acceleration sensor 211 and the gyroscope 212 is included within the driver unit 222.
An angle that the driver unit 222 of the wearable device 200 forms with a frontal facial side differs depending on each user. Thus, an angle that a sensor axis of each of the acceleration sensor 211 and the gyroscope 212 of the sensor section 210 included within the driver unit 222 forms with a frontal facial side differs depending on each user. For example, a state in which the wearable device 200 is worn by a user by being shallowly inserted into an ear of the user is given in (a), and a state in which the wearable device 200 is immovably worn by a user by being deeply inserted into the ear of the user is given in (b). In some cases, a difference between the user of (a) and the user of (b) in an angle formed by a sensor axis with a frontal facial side is greater than or equal to 30 degrees. Thus, the PDR section 120 estimates, for each user, an angle formed by the sensor axis of the sensor section 210 with the frontal facial side, and estimates an orientation (an angle) of a face with a high degree of accuracy using the estimated angle as a correction value, without depending on an individual difference.
With respect to an azimuth corresponding to the wearable device 200, there is the following relationship between an update value AzimuthE obtained using sensor values acquired by the sensor section 210, and a difference AzimuthOffset in an orientation relative to a frontal facial side that is obtained when the wearable device 200 is worn: “Azimuth = AzimuthE + AzimuthOffset”. Here, AzimuthE is obtained from a three-dimensional pose obtained by summing sensor values acquired by the gyroscope 212 detecting angular velocity. On the other hand, AzimuthOffset differs depending on each user, and thus cannot be measured simply by the wearable device 200 being worn. Thus, there is a need to estimate AzimuthOffset for each user.
In order to estimate a pose, two coordinate systems are defined in a state in which the two ears are horizontalized. A coordinate system (1) corresponds to a global frame (fixed), and is a coordinate system formed by a Z axis that is a perpendicular that extends upward over the head, an X axis that is formed by connecting the two ears and of which a right direction represents a positive value, and a Y axis orthogonal to the X axis and the Z axis. A coordinate system (2) corresponds to a sensor frame, and is a coordinate system (XE, YE, ZE) that is fixed relative to the sensor section 210 of the wearable device 200. A pose difference (AzimuthOffset) that corresponds to a correction value represents an amount of rotation of the coordinate system (2) relative to the coordinate system (1).
A user wears the wearable device 200, and turns his/her head downward from a state of facing the front ((a) of
The angle correction section 121 obtains an axis of rotation [α_x, α_y, α_z]^T using the collected values of angular velocity acquired by the gyroscope 212. This axis of rotation is obtained with respect to the sensor axes. Next, the following is a rotation matrix (RotMat) at t0 that is defined by the angle correction section 121: “RotMat at t0 = RZ(yaw)*RX(pitch)*RY(roll)”. RotMat is obtained with respect to a frontal facial side. RZ(⋅), RX(⋅), and RY(⋅) respectively represent rotation matrices about the Z axis, the X axis, and the Y axis. Pitch and roll with respect to a frontal facial side are obtained using an acceleration sensor, whereas yaw is unknown. The angle correction section 121 can calculate yaw using the following relationship: RotMat*axis=[1;0;0] (the process of (4) in
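The following is a minimal numeric sketch of solving the relationship above for yaw, assuming the rotation axis has already been averaged and normalized; the helper names are illustrative only and do not appear in the disclosure.

```python
import numpy as np

def rx(a):  # rotation matrix about the X axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def ry(a):  # rotation matrix about the Y axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def estimate_yaw(axis_sensor, pitch, roll):
    """Solve RotMat @ axis = [1, 0, 0] for yaw, with RotMat = Rz(yaw) Rx(pitch) Ry(roll).

    axis_sensor: unit rotation axis of the head-down motion, in sensor coordinates.
    pitch, roll: inclinations obtained from the acceleration sensor.
    """
    v = rx(pitch) @ ry(roll) @ np.asarray(axis_sensor, dtype=float)
    # Rz(yaw) must map v onto the +X direction, so the Y component must vanish
    # after the rotation; this fixes yaw up to the sign handled by arctan2.
    return np.arctan2(-v[1], v[0])
```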
Next, a method for specifying a frontal side in a natural state, not in a state in which the head is turned downward, is described.
An initial pose (a head center pose) with a downwardly turned head is represented by I3×3 (an identity matrix). A pose of a sensor (referred to as a right sensor pose on the assumption that the sensor is worn in a right ear) is represented by Rt0 (RotMat in
If Rt0, which represents a pose (the right sensor pose) of a right sensor, is obtained using the method in
A user usually moves about in a house wearing the wearable device 200. The location estimator 130 stores a pattern and an order of the movement. The number of locations to which the user moves in the house and the number of orders in which the user moves between them are often finite. The location estimator 130 specifies a location from N (for example, N=3) most recent movement patterns.
In
For example, the user wears the wearable device 200, and starts working in the living room. After a while, the user goes to the toilet, then washes his/her hands in the washroom, and then returns to his/her desk. Then, after a while, the user moves to the kitchen for a drink, and returns to the living room with the drink. The following are movement patterns in this case. From the living room to the toilet (the route (3)). From the toilet to the living room (the route (4)). From the living room to the kitchen (the route (5)). From the kitchen to the living room (the route (6)).
The location estimator 130 stores the four patterns and an order of the patterns. When the user moves next time, a pattern of the movement is checked against the stored patterns to perform matching. The location estimator 130 can specify a location after the movement when the matching succeeds, and the location estimator 130 adds the movement pattern to a route list as a new pattern when the matching fails. The route list (on the right in
As described above, the location estimator 130 holds a plurality of movement routes, and performs matching by checking a movement route estimated by the PDR section 120 against the plurality of held movement routes. Accordingly, the location estimator 130 can estimate a location attribute (such as a living room, a bedroom, a toilet, a kitchen, and a washroom) after movement. Further, the location estimator 130 may estimate the location attribute by determining a staying period of time for which a user stays at a location at which the user is. The location attribute can be estimated more accurately by determining a staying period of time in addition to a movement route.
Patterns that leave the same departure point for the same destination but differ in the way of walking may fail to be matched. Thus, learning is performed by adding such a pattern as a stored pattern. As a result, a plurality of patterns is learned even for a movement between the same pair of locations. A change in a location of a user is given in a coordinate system illustrated in
The location estimator 130 assigns, to a route, a label used to indicate an attribute of the route when the location estimator 130 learns the route. This makes it possible to automatically display a label used to indicate an attribute when matching succeeds. Next, an operation of the location estimator 130 is more specifically described.
The PDR section 120 estimates a change in user location from a certain room to another room, that is, a route of movement of a user location (Step S201). The location estimator 130 detects that the user has stopped, on the basis of the change in user location that is detected by the PDR section 120 (Step S202, YES). The location estimator 130 increments (+1) a stop counter (Step S203). When the number of movements from a certain room to another room reaches N (for example, N=3) or more (Step S204, YES), the location estimator 130 performs matching by checking N (for example, N=3) most recent routes against a plurality of held movement routes (Step S205). When matching succeeds (Step S206, YES), the location estimator 130 specifies a location after movement (Step S207). On the other hand, when matching fails (Step S206, NO), the location estimator 130 adds the route to a route list as a new pattern (Step S208).
Here, there is a possibility that a movement route that is completely different from any of a plurality of held movement routes will be consecutively detected due to a user moving into a different building (such as a co-working space) from his/her own house. In this case, the location estimator 130 consecutively fails to perform matching (Step S206, NO) for some time (Step S209, YES). On the other hand, when the number of new movement routes sufficient to succeed in performing matching are accumulated in a route list (Step S208), matching succeeds (Step S206, YES), and this makes it possible to specify a location after movement (Step S207). When matching fails consecutively a specified number of times (Step S209, YES), the location estimator 130 outputs an alert indicating that the location may be another location that is not registered in the route list (Step S210). This makes it possible to notify a user that a location attribute after movement is to be estimated from a new movement route.
As described above, patterns that leave the same departure point for the same destination but differ in the way of walking may fail to be matched. Thus, learning is performed by adding such a pattern as a stored pattern. A method for performing such learning is described. A distance between a set of N most recent routes and each of N patterns stored in a database is calculated by dynamic time warping (DTW), and the calculated distance is compared to a threshold. Dynamic time warping (DTW) is an approach used to measure a distance or similarity between pieces of time-series data. When the ways of walking are different, the calculated distance may be greater than the DTW threshold. In this case, the route is stored as separate data (a new pattern).
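The following is a hedged sketch of such DTW-based matching; the route representation, the threshold value, and the function names are assumptions introduced for illustration.

```python
import numpy as np

def dtw_distance(route_a, route_b):
    """Classic dynamic time warping distance between two routes.

    Each route is a sequence of 2-D positions (shape: [steps, 2]).
    """
    a, b = np.asarray(route_a, float), np.asarray(route_b, float)
    n, m = len(a), len(b)
    dp = np.full((n + 1, m + 1), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            dp[i, j] = cost + min(dp[i - 1, j], dp[i, j - 1], dp[i - 1, j - 1])
    return dp[n, m]

def match_route(new_route, stored_routes, threshold=5.0):
    """Return the index of the closest stored route, or None when matching fails."""
    distances = [dtw_distance(new_route, r) for r in stored_routes]
    if not distances:
        return None
    best = int(np.argmin(distances))
    if distances[best] > threshold:
        return None   # failure: the route is later learned as a new pattern
    return best       # success: the label of this route gives the location attribute
```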
The location estimator 130 may estimate an attribute (a location attribute) of a location, outdoors in particular, at which a user is, on the basis of pieces of location information acquired by the GPS sensor 111 and the beacon transmitter-receiver 112. The location estimator 130 may estimate the attribute (a location attribute) of a location at which a user is, on the basis of biological information acquired by the biological sensor 214. For example, when it is determined, using the biological sensor 214 (such as a heartbeat sensor), that a user is falling asleep, the location estimator 130 may estimate a bedroom as a location attribute.
The context acquisition section 110 acquires a context of a user. The user state estimator 140 estimates a user state on the basis of the context acquired by the context acquisition section 110, detection values (acceleration, angular velocity, and an azimuth) detected by the sensor section 210 included in the wearable device 200, and a location attribute estimated by the location estimator 130. The environment estimator 150 estimates an environment state to be provided to a user (such as “focus” (“concentration”) or “relax”).
The user state estimator 140 estimates a user state on the basis of a context acquired by the context acquisition section 110, detection values (acceleration, angular velocity, and an azimuth) detected by the sensor section 210 included in the wearable device 200, and a location attribute estimated by the location estimator 130. The context of a user includes location information and terminal information. Examples of the terminal information include screen lock information (locked or unlocked), behavior information regarding the behavior of a user (such as running, a bicycle, at rest, walking, or on board), a location (a specific location or an unspecific location at home, at the office, or the like), calendar app information (having or not having a schedule for a meeting), time information (on duty or off duty), phone app information (on the phone), sound-recognition-app information (during speech), an automatic do-not-disturb (DND) setting (in a time frame or outside of the time frame), and a manual do-not-disturb (DND) setting (on or offline). The user state includes activity states of a user at a plurality of levels. For example, the user state includes activity states at four levels that respectively correspond to “break time”, “neutral”, “do not disturb (DND)”, and “offline”. “Break time” refers to a most relaxed state of activity, “neutral” refers to an ordinary state of activity, “DND” refers to a relatively busy state of activity, and “offline” refers to a busiest state of activity.
The user state estimator 140 estimates a user state by mapping a context to the user state. For example, the user state estimator 140 estimates the user state to be “DND” when screen lock information that is a context is “unlocked”, and estimates the user state to be “neutral” when the screen lock information is “locked”. The user state estimator 140 estimates the user state for each context other than the screen lock information. Further, the context is not limited to what is illustrated in
The user state estimator 140 estimates the user state to be “offline” when at least one “offline” is included with respect to a plurality of contexts. The user state estimator 140 estimates the user state to be “DND” when “offline” is not included and at least one “DND” is included with respect to a plurality of contexts. The user state estimator 140 estimates the user state to be “neutral” when “offline”/“DND”/“break time” are not included with respect to a plurality of contexts. The user state estimator 140 estimates the user state to be “break time” when “offline”/“DND” are not included and “break time” is included.
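Expressed as a sketch, this aggregation reduces to a priority rule over the per-context estimates; the label strings below simply mirror the four levels described above.

```python
def aggregate_user_state(per_context_states):
    """Combine per-context estimates ("offline", "DND", "break time", "neutral")
    into one user state, using the priority described above."""
    if "offline" in per_context_states:
        return "offline"
    if "DND" in per_context_states:
        return "DND"
    if "break time" in per_context_states:
        return "break time"
    return "neutral"

# e.g. screen unlocked -> "DND", calendar shows no meeting -> "neutral"
print(aggregate_user_state(["DND", "neutral"]))   # -> "DND"
```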
The environment estimator 150 estimates an environment state to be provided to a user, on the basis of a user state estimated by the user state estimator 140 and a location attribute estimated by the location estimator 130. The environment state to be provided to a user is, for example, an environment state that enables the user to focus his/her mind (concentrate) or an environment state that enables the user to feel relaxed. For example, (1) when the period of time is “on duty”, the user state is “neutral”, the behavior is “stay”, and the location is “desk”, the environment estimator 150 estimates, to be “focus”, the environment state to be provided to a user. (2) When the period of time is “on duty” and the user state is “break time”, the environment estimator 150 estimates, to be “relax”, the environment state to be provided to a user. (3) When the period of time is “off duty” and the user state is “break time”, the environment estimator 150 estimates, to be “relax”, the environment state to be provided to a user.
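The following sketch expresses the example rules (1) to (3) above; the field names and the fallback behavior are assumptions introduced for illustration.

```python
def estimate_environment(period, user_state, behavior=None, location=None):
    """Map a user state (and a few contexts) to an environment state, per rules (1)-(3)."""
    if period == "on duty" and user_state == "neutral" and behavior == "stay" and location == "desk":
        return "focus"                   # rule (1)
    if user_state == "break time":       # rules (2) and (3)
        return "relax"
    return None                          # other combinations are left to further rules

print(estimate_environment("on duty", "neutral", "stay", "desk"))  # -> "focus"
```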
The content controller 161 of the output controller 160 plays back content (such as a piece of music and a video) that is selected on the basis of an environment state estimated by the environment estimator 150. For example, it is sufficient if the content controller 161 notifies a digital service provider (DSP) of an environment state through a network, receives content (content that enables a user to focus his/her mind or content that enables a user to feel relaxed) selected by the DSP on the basis of the notified environment state, and plays back the received content. For example, when a user is on duty and the environment state is “focus”, the content controller 161 plays music that enables the user to concentrate, and when the environment state is “relax”, the content controller 161 plays music that enables the user to feel relaxed. For example, when a user is falling asleep and the environment state is “relax”, the content controller 161 plays music that promotes sleep, and when the user is asleep, the content controller 161 stops the music.
The notice controller 162 of the output controller 160 controls the number of notices provided to a user, on the basis of an environment state. For example, it is sufficient if the notice controller 162 reduces the number of notices (such as a new-reception notice from an app or for a message) or stops the notices such that the user can focus his/her mind, and causes a usual number of notices to be provided while the user is feeling relaxed. For example, when a user is on duty and the environment state is “focus”, the notice controller 162 reduces the number of notices, and when the environment state is “relax”, the notice controller 162 causes a usual number of notices to be provided.
There is a technology that performs sound recognition on speech and environmental sound, selects content such as a piece of music on the basis of the recognized sound, and outputs the selected content. The technology that performs sound recognition on speech and environmental sound can only be applied to an environment in which sound is produced. Thus, there is a possibility that proper content will not be selected by a user who does not want to produce sound, or in a state in which sound is not allowed to be produced. Further, there is a need for a high computational performance to perform natural language processing. This results in difficulty in performing processing locally.
On the other hand, the present embodiment makes it possible to output content that encourages a user to focus his/her mind (concentrate) or to get relaxed, on the basis of a location of the user in a house and other user contexts. The present embodiment makes it possible to control output to a user properly regardless of a state such as a state in which sound is not allowed to be produced. For example, on the basis of the user context, content that enables a user to focus his/her mind can be output when the user is at his/her desk for work during telecommuting, and music that enables the user to feel relaxed can be provided when the user is in a resting space.
The inside of a house is relatively narrow, which is different from the outdoors. Thus, there is generally a need for external equipment such as a high-accuracy beacon or camera in order to estimate a specific location in the house. On the other hand, the present embodiment makes it possible to specify, without external equipment, a location in a house using the sensor section 210 (the acceleration sensor 211, the gyroscope 212, and the compass 213) mounted on the wearable device 200. Specifically, a pattern of movement between locations and an order of the movement are stored, and a location of a user after movement can be specified using N most recent movement patterns.
Telecommuting is a common practice these days. Not only does a user get relaxed in a house, but the user also focuses his/her mind on work in the house for a long time. It is conceivable that there could be an increase in the number of users who do not want to produce sound in such a case, or there could be an increase in the number of states in which sound is not allowed to be produced in such a case, compared to when telecommuting was not widespread in the past. Thus, an approach with no speech, such as the present embodiment, that includes specifying a location in a house, estimating an environment state to be provided to a user, and controlling output to the user will offer a higher value in use in the future.
Further, in the present embodiment, a context acquired from each piece of sensor information is mapped to a user state to estimate the user state. This makes it possible to estimate the user state without producing sound by speech. In the present embodiment, a context acquired from each piece of sensor information is mapped to a user state. Thus, a calculation amount is greatly smaller, and this makes it easier to perform processing locally, compared to when natural language processing is applied.
A content playback system 20 includes the information processing apparatus 100 and the wearable device 200.
In the information processing apparatus 100, a processor such as a CPU of a control circuit loads, into a RAM, a content playback controlling application 300, a content provision application 400, and a preset application 500 that are recorded in a ROM, and executes the applications. Note that the content playback controlling application 300 does not have to be installed on the information processing apparatus 100, but may be installed on the wearable device 200 to be executed by the wearable device 200.
As described above, examples of the wearable device 200 include a wireless earphone (refer to
The content provision application 400 provides content. The content provision application 400 is a group of applications including different content provision applications 401 and 402 of a plurality of different content provision applications. For example, the different content provision applications 401 and 402 of the plurality of different content provision applications provide different genres of content (that is specifically sound content) such as a piece of music, environmental sound, healing sound, and a radio program. The different content provision applications 401 and 402 of the plurality of different content provision applications are each simply referred to as the “content provision application 400” when they are not particularly to be distinguished.
The content playback controlling application 300 includes the context acquisition section 110, the pedestrian dead reckoning (PDR) section 120 (a user location estimator), the location estimator 130 (a location attribute estimator), the user state estimator 140, the environment estimator 150, and the content controller 161 of the output controller 160 that are described above (refer to
The preset application 500 preassigns, to different functions of a plurality of different functions, different operations of a plurality of operations input by a user to the input apparatus 220 of the wearable device 200, the different functions of the plurality of different functions being related to services provided by the content provision application 400. For example, the preset application 500 preassigns the different operations to selections of the different content provision applications 401 and 402 of the plurality of different content provision applications. The different operations (such as a single tap, a double tap, a triple tap, and pressing of radio buttons) input by a user to the input apparatus 220 of the wearable device 200 are reassigned to selections of the different content provision applications 401 and 402 of the plurality of different content provision applications. The preset application 500 may be independent of the content playback controlling application 300, or may be included in the content playback controlling application 300.
For example, the preset application 500 includes a playback control GUI 710, a volume control GUI 720, and a quick access controlling GUI 730. Note that the kind of GUI provided by the preset application 500 and a settable combination of a function and an operation differ depending on a model of the wearable device 200.
Using the playback control GUI, a user can assign, to functions performed upon content playback, different operations of a plurality of different operations respectively input by the user to the input apparatuses 220 of the right and left wearable devices 200. For example, the user can assign a single-tap operation of the wearable device 200 on the right to playing and pausing, can assign a double-tap operation of the wearable device 200 on the right to playing of a next piece of music, can assign a triple-tap operation of the wearable device 200 on the right to playing of a previous piece of music, and can assign a long-press operation of the wearable device 200 on the right to start of a voice assistance function. Note that the functions assigned to the respective operations may be functions other than the functions described above, or the functions may be assigned to the operations by default.
Using the volume control GUI 720, a user can assign, to functions performed to control volume, different operations of a plurality of different operations respectively input by the user to the input apparatuses 220 of the right and left wearable devices 200. For example, the user can assign a single-tap operation of the wearable device 200 on the left to volume up, and can assign a long-press operation of the wearable device 200 on the left to volume down.
Using the quick access controlling GUI 730, a user can assign, to quick access functions, different operations of a plurality of different operations respectively input by the user to the input apparatuses 220 of the right and left wearable devices 200, each quick access function being performed to select and open a corresponding one of the different content provision applications 401 and 402 of the plurality of different content provision applications. For example, the user can assign a double-tap operation of the wearable device 200 on the left to opening of the content provision application 401, and can assign a triple-tap operation of the wearable device 200 on the left to opening of the content provision application 402.
As described above, the preset application 500 can assign different operations of a plurality of different operations not only to playback control and volume control that are performed while the content provision application 400 is running, but also to selection and opening of the content provision application 400, the different operations of the plurality of different operations being respectively input by the user to the input apparatuses 220 of the right and left wearable devices 200.
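For illustration only, such assignments could be held as a simple table keyed by earphone side and input operation; every name below is hypothetical and not part of the disclosure.

```python
# Hypothetical preset map mirroring the examples above (names are illustrative only).
PRESETS = {
    ("right", "single_tap"): "play_pause",
    ("right", "double_tap"): "next_track",
    ("right", "triple_tap"): "previous_track",
    ("right", "long_press"): "voice_assistant",
    ("left", "single_tap"): "volume_up",
    ("left", "long_press"): "volume_down",
    ("left", "double_tap"): "open_content_provision_app_401",
    ("left", "triple_tap"): "open_content_provision_app_402",
}

def handle_operation(side, gesture):
    """Resolve a tap or press on the earphone to the preset function name, if any."""
    return PRESETS.get((side, gesture))

print(handle_operation("left", "double_tap"))  # -> "open_content_provision_app_401"
```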
In the content playback controlling application 300, the context acquisition section 110 acquires a context of a user. The user state estimator 140 estimates a user state (activity states at four levels including “break time”, “neutral”, “do not disturb (DND)”, and “offline”) on the basis of the context acquired by the context acquisition section 110, detection values (acceleration, angular velocity, and an azimuth) detected by the sensor section 210 included in the wearable device 200, and a location attribute estimated by the location estimator 130. Here, the user states at four levels are described as an example, but the number of levels may be greater than or less than four. Further, any number of user states may be set by a user. On the basis of the user state estimated by the user state estimator 140, the environment estimator 150 estimates an environment state to be provided to a user (such as “focus” (“concentration”) or “relax”) (refer to
In the content playback controlling application 300, the content controller 161 of the output controller 160 selects the content provision application 400. For example, the content controller 161 selects the content provision application 400 on the basis of one of different operations that is input by a user to the input apparatus 220 of the wearable device 200. For example, the content controller 161 selects the content provision application 401 when the operation input by a user to the input apparatus 220 of the wearable device 200 is a double tap, and the content controller 161 selects the content provision application 402 when the operation input by the user to the input apparatus 220 of the wearable device 200 is a triple tap. Further, the content controller 161 selects the content provision application 400 on the basis of an environment state (a scenario described later) estimated by the environment estimator 150 (Step S302). Furthermore, the content controller 161 may select the content provision application 400 on the basis of training performed such that, despite the fact that the condition remains unchanged, a certain scenario no longer comes about as a result of repetition of refusal, or on the basis of setting performed by a user (such as setting the content provision application 400 in advance according to the state).
For example, the content controller 161 refers to a table 600 to select the content provision application 400. The table 600 includes an ID 601, a scenario 602, a user context 603, and a cue 604. The scenario 602 corresponds to an environment state estimated by the environment estimator 150. The user context 603 corresponds to a user state estimated by the user state estimator 140 on the basis of a context of a user that is acquired by the context acquisition section 110. The cue 604 is a cue used by the content provision application 400 to select content. The table 600 records therein a selection flag 605 for the content provision application 401 and a selection flag 606 for the content provision application 402 for each of nine records respectively assigned Music_01 to Music_09 given as the ID 601. The record for which only the selection flag 605 is recorded means that the content provision application 401 is selected when the scenario 602 (an environment state) corresponding to that record comes about. On the other hand, the record for which both of the selection flags 605 and 606 are recorded means that one of the content provision applications 401 and 402 is selected under a different condition when the scenario 602 (an environment state) corresponding to that record comes about. For example, the content controller 161 may learn in advance, for example, which of the content provision applications 400 is executed oftener at a current time and which of the content provision applications 400 is used oftener, and may perform selection.
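The following is a minimal sketch of such a table lookup; the record contents are abbreviated and the tie-breaking rule (preferring the more frequently used application) is an assumption consistent with the example above.

```python
# Abbreviated, hypothetical excerpt of table 600: scenario -> candidate provision apps.
TABLE_600 = {
    "Music_01": {"scenario": "focus", "apps": ["app_401"]},
    "Music_02": {"scenario": "relax", "apps": ["app_401", "app_402"]},
}

def select_provision_app(scenario, usage_count):
    """Pick a content provision application for the current scenario (Step S302).

    usage_count: how often each app has been used recently -- the assumed tie-breaker.
    """
    for record in TABLE_600.values():
        if record["scenario"] == scenario:
            candidates = record["apps"]
            if len(candidates) == 1:
                return candidates[0]
            return max(candidates, key=lambda app: usage_count.get(app, 0))
    return None

print(select_provision_app("relax", {"app_401": 3, "app_402": 7}))  # -> "app_402"
```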
In the content playback controlling application 300, the content controller 161 of the output controller 160 generates, on the basis of the scenario 602 (an environment state), a cue 604 used by a selected content provision application 400 to select content (Step S303). The content controller 161 outputs the generated cue to the selected content provision application 400, causes the content provision application 400 to select content on the basis of the cue, and causes the content to be played back by the wearable device 200 (Step S304). For example, the content provision application 400 may select a plurality of pieces of candidate content on the basis of a cue output from the content playback controlling application 300, and may select a piece of playback-target content from the plurality of pieces of candidate content on the basis of a detection value input by the sensor section 210 of the wearable device 200. Further, the content provision application 400 may select, for example, content with a fast tempo matched to a running speed of a user, on the basis of a detection value input by the sensor section 210.
After playback is started, the content controller 161 of the content playback controlling application 300 detects, on the basis of the environment state, a timing at which playback of other content is to be started (Step S301), selects the content provision application 400 (Step S302, which can be skipped), generates the cue 604 (Step S303), and causes the content to be played back by the wearable device 200 (Step S304). In other words, the content playback controlling application 300 has user information (that is, the user context 603 (a user state) and the scenario 602 (an environment state)) that is not available to the content provision application 400. Thus, the content playback controlling application 300 knows about a case in which it is desirable that content that is being played back by the content provision application 400 be changed. For example, a change in the feelings of a user can be induced by changing content that is being played back, with going to work or finishing work being used as a trigger. When the user information (that is, the user context 603 (a user state) and the scenario 602 (an environment state)) becomes available to the content playback controlling application 300, the content playback controlling application 300 transmits, to the content provision application 400, a cue used to change content that is being played back. This makes it possible to provide a user with more desirable content (such as a piece of music or healing sound).
Further, the content controller 161 of the content playback controlling application 300 generates a cue used by the content provision application 400 to stop (but not change) playback of content on the basis of the scenario 602 (an environment state) (Step S303), outputs the cue to the content provision application, and causes the content provision application 400 to stop the playback of the content on the basis of the cue (Step S304). For example, in some cases, it is better if music is stopped due to a change in state such as start of a meeting. The content playback controlling application 300 detects such a state to transmit a stop instruction to the content provision application 400.
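The following sketch illustrates this cue flow (Steps S303 and S304) under the assumption of a simple message format that is not specified in the disclosure.

```python
import json

def generate_cue(scenario, action="play"):
    """Build a cue for the content provision application from the scenario (Step S303).

    action: "play" to request content matching the scenario, "stop" to stop playback.
    The JSON shape is an illustrative assumption, not a defined interface.
    """
    return json.dumps({"action": action, "scenario": scenario})

def on_state_change(old_scenario, new_scenario, send_to_provision_app):
    """Send a cue only when the estimated environment state actually changes (Step S304)."""
    if new_scenario == old_scenario:
        return
    if new_scenario == "meeting":   # e.g. a meeting has started; "meeting" is an assumed state name
        send_to_provision_app(generate_cue(new_scenario, action="stop"))
    else:
        send_to_provision_app(generate_cue(new_scenario, action="play"))

on_state_change("focus", "relax", print)   # prints a "play" cue for "relax"
```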
Furthermore, during playback of content, the content provision application 400 may select content with a fast tempo matched to a running speed of a user, on the basis of a detection value input by the sensor section 210, that is, according to a specified value of, for example, a heart rate or acceleration, and may play back the selected content. In other words, during playback of content, the content provision application 400 can actively select, without receiving a cue from the content controller 161 of the content playback controlling application 300, an attribute (such as a tempo and a pitch) of playback-target content on the basis of a detection value input by the sensor section 210, and can play back the selected content. In short, the content provision application 400 can actively change playback-target content during playback of content.
According to the content playback system 20 of the present embodiment, the content playback controlling application 300 selects the content provision application 400 and outputs a cue to the selected content provision application 400. This results in there being no need to consider, on the side of the content provision application 400, whether the different content provision applications 401 and 402 of the plurality of different content provision applications compete with each other upon content playback.
Further, the content playback controlling application 300 generates, on the basis of an environment state corresponding to sensitive information regarding a user, a cue used by the content provision application 400 to select content. Thus, the content provision application 400 can play back content in which an environment state corresponding to sensitive information regarding a user is reflected, without the content playback controlling application 300 sharing the environment state corresponding to sensitive information regarding a user with the content provision application 400. This makes it possible to reduce the security risk and to provide an improved user experience.
Furthermore, the content playback controlling application 300 selects the content provision application 400, and the selected content provision application 400 plays back content. Moreover, the preset application 500 enables the content playback controlling application 300 to select the content provision application 400 on the basis of one of different operations that is input by a user to the input apparatus 220 of the wearable device 200. This makes it possible to provide a user experience obtained by integrating services provided by the different content provision applications 401 and 402 of the plurality of different content provision applications, without the need for a user to actively perform selection.
As described with reference to
Thus, in the embodiments above, the angle correction section 121 calculates an inclination in a pitch direction and an inclination in a roll direction using a value of acceleration acquired by the acceleration sensor 211 when a user turns his/her head downward ((b) of
On the other hand, how the angle correction section 121 calculates not only the inclination in the pitch direction and the inclination in the roll direction but also the inclination in the yaw direction only using the value of acceleration acquired by the acceleration sensor 211, without the value of angular velocity acquired by the gyroscope 212, is described below.
It is assumed that a setting application 800 that is a user interface is installed on the information processing apparatus 100 (such as a smartphone, a tablet computer, or a personal computer) and that a user can use the setting application 800 by use of a display apparatus and an operation apparatus (such as a touch panel) of the information processing apparatus 100.
First, a user operates the operation apparatus to cause the setting application 800 to give an instruction to start measurement. The setting application 800 outputs angle correction operating data 801 to the wearable device 200 (Step S400).
In response to receiving the instruction (the angle correction operating data 801) given by the setting application 800, the wearable device 200 starts transmitting, to the angle correction section 121, gravitational acceleration that is a detection value detected by the acceleration sensor 211.
The setting application 800 outputs, to the user wearing the wearable device 200, an instruction to face the front ((a) of
The angle correction section 121 calculates inclinations 802 in a pitch direction and in a roll direction using a value of gravitational acceleration occurring when the user faces the front (in the roll direction) ((a) of
Next, the setting application 800 outputs, to the user wearing the wearable device 200, an instruction to slowly turn his/her head upward and downward such that the head is not moved from side to side and then to stop the head for about one second ((b) and (c) of
The angle correction section 121 calculates an angle formed with a gravity axis, using axes of X, Y, and Z (Step S404). The angle correction section 121 determines whether the calculated angle satisfies a specified condition (Step S405). When a user faces the front, axes of X and Y for the acceleration sensor are nearly orthogonal to the gravity axis, and thus a measurement value gets closer to zero. The specified condition is a condition applied to avoid such a state, and is satisfied when an angle formed with a Z axis is a sufficient angle of bend and an error occurring due to a state in operation is not measured (which will be described in detail later). When the condition is not satisfied, the angle correction section 121 outputs measurement progress data 808 used to give an instruction to move the head upward and downward again (to display the instruction on the display apparatus) (Step S405, No).
On the other hand, when the condition is satisfied (Step S405, Yes), the angle correction section 121 calculates an inclination 803 of the user in a yaw direction using values of gravitational acceleration occurring when the user turns his/her head upward and downward (the pitch direction) ((b) and (c) of
The angle estimator 122 reads correction values 806 (the inclinations 802 in the pitch direction and in the roll direction as well as the inclination 803 in the yaw direction) stored in the nonvolatile storage region 805. The angle estimator 122 estimates an azimuth 807 corresponding to the user on the basis of a detection value (acceleration) detected by the acceleration sensor 211 of the sensor section 210 of the wearable device 200 worn by the user, and on the basis of the read correction values 806. The angle estimator 122 may output the azimuth 807 to the setting application 800.
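The following sketch illustrates how the stored correction values 806 might be applied, following the relationship Azimuth = AzimuthE + AzimuthOffset described earlier; the storage layout and the sign convention are assumptions.

```python
def apply_corrections(azimuth_e_deg, stored):
    """Correct a raw sensor azimuth (AzimuthE) into the azimuth of the user's face,
    per the relationship Azimuth = AzimuthE + AzimuthOffset described earlier.

    stored: dict read from the nonvolatile storage region, e.g.
            {"pitch": ..., "roll": ..., "yaw": ...} in degrees (assumed layout).
    """
    # The yaw inclination plays the role of AzimuthOffset; pitch and roll are used
    # upstream to level the sensor frame before AzimuthE is computed.
    return (azimuth_e_deg + stored["yaw"]) % 360.0

print(apply_corrections(95.0, {"pitch": 12.0, "roll": -4.0, "yaw": 30.0}))  # -> 125.0
```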
A coordinate system that is fixed to a user in a reference pose is represented using (X,Y,Z). An X axis (a pitch axis) as viewed by the user is horizontally oriented to the right, a Y axis (a roll axis) as viewed by the user is horizontally oriented toward the front (forwardly), and a Z axis (a yaw axis) as viewed by the user is vertically oriented upward. On the other hand, a three-dimensional local coordinate system of the acceleration sensor 211 mounted on the wearable device 200 is represented using (x,y,z). The three-dimensional coordinate systems are both a right-handed system.
Due to an individual difference in how a user wears the wearable device 200, there are relative shifts between the two coordinate systems respectively represented using (X, Y,Z) and (x,y,z) with respect to three degrees of freedom. The user coordinate system (X,Y,Z) can be obtained from the local coordinate system (x,y,z) of the wearable device 200 by specifying the shifts. Here, components of two degrees of freedom that represent inclinations with respect to a horizontal plane from among the shifts are calculated using a value measured by the acceleration sensor 211 of the wearable device 200 in a state in which a user is at rest in the reference pose.
Three angles that respectively represent amounts of relative shifts are defined. Various methods for defining an angle are conceivable. Here, in order to be suitable for quaternions described later, definition is made such that a coordinate axis corresponding to the user coordinate system is rotated three times to be a coordinate axis corresponding to the coordinate system of the wearable device 200. For example, a coordinate axis corresponding to the user coordinate system is rotated by an angle α about the X axis. The angle α is caused to be identical to an angle finally formed by the y axis with the horizontal plane. Next, about the y axis after the rotation, the coordinate axis is rotated by an angle β. Here, an angle formed by the x axis with the horizontal plane is caused to be identical to an angle (γ) finally formed by the x axis with the horizontal plane. At the end, the coordinate axis is rotated by an angle θ about the Z axis. The angle θ is caused to be identical to an angle formed by a horizontal-plane component of a finally obtained y-axis vector with the Y axis. The angles α and β are calculated using a value acquired by the acceleration sensor 211 when a user is at rest. Note that θ is obtained by another method since θ is not obtained by this calculation (every value could be a solution).
(Ax,Ay,Az) is assumed to be the measurement value measured by the acceleration sensor 211 along the (x,y,z) axes. Formula 1 defines the angle α in terms of this measurement value.
There are a vertical plane including an orientation vector of the x axis, and a vertical plane including an orientation vector of the z axis. Formula 2 is obtained using angles γ and δ respectively formed by the axes of x and z with the horizontal plane.
The angle β is then obtained by Formula 3, using Formula 2.
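Since Formulas 1 to 3 are not reproduced here, the following Python sketch only illustrates one plausible way to obtain the inclinations from a single at-rest accelerometer sample. It assumes that the angle each sensor axis forms with the horizontal plane satisfies sin(angle) = (gravity component along that axis)/|A|; the relation used for β is likewise an assumption and may differ from the actual Formula 3 in sign convention.

```python
import math

def rest_inclinations(ax: float, ay: float, az: float):
    """Estimate axis inclinations from one at-rest accelerometer sample.

    (ax, ay, az) is the measured gravitational acceleration in the sensor
    frame.  Assumption: the angle a sensor axis forms with the horizontal
    plane satisfies sin(angle) = (gravity component along that axis) / |A|.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    alpha = math.asin(ay / norm)   # inclination of the y axis
    gamma = math.asin(ax / norm)   # inclination of the x axis
    delta = math.asin(az / norm)   # inclination of the z axis
    # One plausible counterpart of Formula 3 (sign convention assumed, and the
    # device is assumed not to be tilted a full 90 degrees):
    beta = math.asin(max(-1.0, min(1.0, -math.sin(gamma) / math.cos(alpha))))
    return alpha, beta, gamma, delta

# Example: device tilted slightly forward and to the right while at rest.
print(rest_inclinations(0.8, 1.2, 9.7))
```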
Using the angles α and β obtained from the results described above, the coordinate system of the wearable device 200 that is represented using (x,y,z) is transformed into a coordinate system of the wearable device 200 that is represented using (x′,y′,Z) and in which inclinations with respect to the horizontal plane have been corrected. Both x′ and y′ are situated in the horizontal plane, and are respectively obtained by X and Y being rotated by the angle θ about the Z axis. An acceleration value acquired in the coordinate system of the wearable device 200 after the inclinations are corrected is used to calculate θ, as described later. This makes it possible to perform the calculation with a high degree of accuracy without an axial shift.
An example of rotation calculation performed using quaternions to transform an acceleration vector (Ax,Ay,Az) in a coordinate system of the wearable device 200 into an acceleration vector (Ax′,Ay′,Az′) in a coordinate system of the wearable device 200 after correction, is described. A relationship between the two coordinate systems is obtained by combining the first two rotations from among the rotations illustrated in
A quaternion R that represents rotation and is obtained by combining those rotation quaternions can be obtained using a formula indicated below. Here, * represents a quaternion conjugate.
R=Q1*Q2*
The calculation performed to transform a vector of acceleration measured in a coordinate system of the wearable device 200 into a vector in a coordinate system of the wearable device 200 after correction, can be represented using a formula indicated below by use of R.
(Ax′,Ay′,Az′)=R*(Ax,Ay,Az)R
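A minimal Python sketch of the quaternion calculation shown above, implementing R = Q1*Q2* and (Ax′,Ay′,Az′) = R*(Ax,Ay,Az)R. The rotation axes assumed for Q1 and Q2 (α about X, then β about y) follow the first two rotations described earlier; the concrete angle values are illustrative only.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` about `axis`."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / n
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_mul(p, q):
    """Hamilton product of two quaternions."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def quat_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate_accel(accel, q1, q2):
    """(Ax', Ay', Az') = R* (Ax, Ay, Az) R with R = Q1* Q2*."""
    r = quat_mul(quat_conj(q1), quat_conj(q2))
    v = (0.0, *accel)                      # embed the vector as a pure quaternion
    w = quat_mul(quat_mul(quat_conj(r), v), r)
    return w[1:]

# Assumed axes for the first two rotations: alpha about X, beta about y.
alpha, beta = 0.12, -0.08
q1 = quat_from_axis_angle((1.0, 0.0, 0.0), alpha)
q2 = quat_from_axis_angle((0.0, 1.0, 0.0), beta)
print(rotate_accel((0.8, 1.2, 9.7), q1, q2))
```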
A value (x,y,z) of gravitational acceleration measured by the acceleration sensor 211 in three axes is transformed into polar coordinates to calculate a yaw rotation. A distance from an origin is defined as r, an angle formed with the Z axis is defined as θ, and an angle formed with the X axis is defined as Φ. Here, (x,y,z) and (r,θ,Φ) are included in relational expressions indicated below.
x=r sin θ cos Φ,y=r sin θ sin Φ,z=r cos θ
These expressions are used to re-form expressions corresponding to Formula 5 (Step S404). Here, sgn represents a signum function.
The shift between the front as viewed by the user and the front as viewed from the sensor of the wearable device 200, which is obtained here using Φ, corresponds to the inclination in the yaw direction (Step S406).
Φ is calculated using results of measurement performed when a user faces upward and downward (
There is a possibility that, depending on a shape of an ear of a user and how the user wears the wearable device 200, the conditions will not be satisfied when the user faces upward or downward. Thus, two patterns of an upward movement and a downward movement are adopted.
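The following sketch illustrates how the polar-coordinate relations above might be used to extract the yaw shift. Because Formula 5 is not reproduced, the handling of the sign with a signum function and the use of ±90 degrees as the reference direction are assumptions; the two head-tilt patterns give consistent estimates under these assumptions.

```python
import math

def polar(ax, ay, az):
    """(r, theta, phi) from x = r sin(theta) cos(phi), y = r sin(theta) sin(phi), z = r cos(theta)."""
    r = math.sqrt(ax * ax + ay * ay + az * az)
    return r, math.acos(az / r), math.atan2(ay, ax)

def yaw_shift(ax, ay, az):
    """Assumed reading of the yaw shift.

    When the head is tilted in the pitch direction, the horizontal component
    of gravity would lie along the Y axis for a perfectly aligned device,
    i.e. phi would be +/-90 degrees.  The deviation of phi from +/-90 degrees
    (the sign handled with a signum, as Formula 5 appears to do) is taken as
    the inclination in the yaw direction.
    """
    _, _, phi = polar(ax, ay, az)
    sgn = 1.0 if ay >= 0 else -1.0
    return phi - sgn * math.pi / 2.0

# Two patterns: head tilted upward and downward; either (or an average) may be used.
print(math.degrees(yaw_shift(0.6, 2.1, 9.5)))      # upward-tilt sample, about -16 degrees
print(math.degrees(yaw_shift(-0.64, -2.27, 9.4)))  # downward-tilt sample, about -16 degrees
```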
Patent Literature 1 discloses detecting and adjusting turning of a head of a user. A rotation angle is measured by a gyroscope, a tilt of the gyroscope is measured by an acceleration sensor, and the "turning of the head of the user" is calculated to correct a location for localizing a sound image. A direction of the front is set by a user operation, and a turning movement from the front can be traced. However, all of the measurement is relative measurement using the "front as viewed by a user" as a reference. Thus, it cannot be applied to an absolute coordinate system such as azimuth.
Patent Literature 2 discloses calculating an angle at which a navigation apparatus is attached to an automobile after removing the effect of a tilt of a road. With respect to the yaw direction, an acceleration sensor is used in combination with a gyroscope, a traveling speed sensor, or a GPS. Data is collected while a state of the vehicle, such as being stopped or traveling, is being detected, and acceleration of the vehicle in a traveling direction and in a right-and-left direction is detected to calculate the attachment angle using the collected data. This technology depends on automobile-specific characteristics, and thus cannot be applied to a device worn by a human. Further, an auxiliary sensor is needed in addition to an acceleration sensor.
On the other hand, according to the present embodiment, a difference between a coordinate system of a sensor in a device that is worn on a head of a user, and a coordinate system that is determined by the user in any direction is measured to perform correction. This makes it possible to obtain a constant output result regardless of a shape of an ear of the user, a shape of the head of the user, and how the device is worn by the user. Correction is not performed with respect to relative coordinates. Thus, the present technology can also be applied to an absolute coordinate system such as azimuth.
According to the present embodiment, a user moves his/her head upward or downward (turns his/her head in the pitch direction) to calculate an inclination in the yaw direction using gravitational acceleration. When the yaw axis and the gravity axis are close to each other, it is difficult to calculate the inclination in the yaw direction using gravitational acceleration. However, a tilt of the head in the pitch direction results in a change in gravitational acceleration applied to each axis. This makes it possible to calculate the inclination in the yaw direction. Depending on an ear shape, it may be difficult to calculate the inclination in the yaw direction even if the head is tilted in a specific pitch direction. However, the calculation can be performed by measuring two patterns, that is, an upward pattern and a downward pattern.
According to the present embodiment, a correction value for an azimuth corresponding to a user can be calculated using only an acceleration sensor. This makes it possible to calculate the correction value even in an environment with fewer mounted sensors, which reduces costs, reduces power consumption, and makes an apparatus smaller. Further, a drift may occur in a gyroscope itself depending on an environment in which the gyroscope is used or due to continuous use of the gyroscope, whereas no drift occurs in an acceleration sensor. This increases the reliability of the calculation.
On the assumption that the wearable device 200 is worn at all times, the content controller 161 (
For example, the content controller 161 may resume playing back content on the basis of a trigger (such as a tap or a gesture) performed by a user (in an upper portion in
As an example, when the wearable device 200 is worn in the morning, the content controller 161 starts performing set resume play. When a user goes to work, the content controller 161 plays a playlist set for “going to work”. When the user arrives at the office, the content controller 161 changes the playlist according to the scene by playing a playlist set for “working at the office”. The content controller 161 stops playing the playlist when the user is at a meeting or is calling. When the meeting or the call is over, the content controller 161 resumes playing the playlist set for “working at the office”. For example, when the playing of a playlist is stopped in a state in which the wearable device 200 is being worn, the content controller 161 suggests starting playing a playlist according to the scene. When the user goes home, the content controller 161 plays a playlist set for “going home”. When the user arrives home, the content controller 161 stops content according to the scene by stopping content playback.
The previous day (the first day) has already been described with reference to
As an example, when the wearable device 200 is worn in the morning, the content controller 161 plays back content suitable for morning. When a user (a student in this example) goes to school, the content controller 161 resumes playing a piece of music last played (on the previous day) in a playlist set for “going to school”. When the user turns on a PC at a library, the content controller 161 plays a playlist set for “task”, and turns on a noise-canceling function. When the user is running, the content controller 161 plays back up-tempo content. When the user sits at his/her desk at home to study, the content controller 161 plays background music for concentration. When the user is under great stress, the content controller 161 plays background music for promoting meditation. When the user lies down on a bed at night, the content controller 161 plays background music for sleep, and when the user falls asleep, the content controller 161 stops content. Accordingly, content is automatically played back according to the behavior of a user just by the wearable device 200 being worn by the user at all times. This enables the user to live comfortably.
In the first implementation, a content control application 300 controls the content provision application 400. The content control application 300 determines a scene ID from, for example, a user state (“not busy”), and notifies the content provision application 400 of the scene ID. “Not busy” refers to a state that is not a state of “busy” (having a talk, a call, or an event scheduled in a calendar). The content provision application 400 determines, on the basis of the scene ID, a playlist suitable for a scene using a context, a user-specific content table, and last played back content, and plays the determined playlist.
In the second implementation, the content control application 300 records information regarding content last played back in a scene, and specifies a content ID for each context. The content control application 300 determines a scene ID from, for example, a user state (“not busy”), and notifies, on the basis of the scene ID, the content provision application 400 of the scene ID and an artist ID using a context, a user-specific content table, and last played-back content. The content provision application 400 selects a playlist that includes content specified using the content ID and the artist ID, and plays the selected playlist.
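A minimal sketch, with hypothetical class, method, and scene names, of how the two implementations might differ: in the first, only the scene ID is handed to the content provision application, which decides the playlist; in the second, the content control application also passes the content and artist last played in that scene.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LastPlayed:
    content_id: str
    artist_id: str

class ContentControlApp:
    """Hypothetical stand-in for the content control application 300."""

    def __init__(self, last_played_by_scene):
        self.last_played_by_scene = last_played_by_scene  # scene_id -> LastPlayed

    def determine_scene(self, user_state: str) -> Optional[str]:
        # "Not busy" means no talk, call, or calendar event is in progress.
        return "commute_morning" if user_state == "not busy" else None

    def notify_first_impl(self, provider, user_state):
        # First implementation: pass only the scene ID; the provider decides.
        scene = self.determine_scene(user_state)
        if scene:
            provider.play_for_scene(scene, hint=None)

    def notify_second_impl(self, provider, user_state):
        # Second implementation: also pass the content/artist last played in the scene.
        scene = self.determine_scene(user_state)
        if scene:
            provider.play_for_scene(scene, hint=self.last_played_by_scene.get(scene))

class ContentProvisionApp:
    """Hypothetical stand-in for the content provision application 400."""

    def play_for_scene(self, scene_id, hint: Optional[LastPlayed]):
        if hint is None:
            print(f"choose a playlist suited to scene '{scene_id}'")
        else:
            print(f"resume a playlist containing {hint.content_id} by {hint.artist_id}")

control = ContentControlApp({"commute_morning": LastPlayed("track-123", "artist-9")})
provider = ContentProvisionApp()
control.notify_first_impl(provider, "not busy")
control.notify_second_impl(provider, "not busy")
```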
As an example, the content controller 161 remembers information regarding content heard by a user for 30 seconds or more (playback for 30 seconds or more is counted as one playback). At this point, the content information is associated with a specified classification type of "scene", and is recorded, in the form of a log, together with information regarding contexts such as the time, a location, and the type of behavior. Examples of the content information include piece-of-music information, artist information, album information, playlist information, information regarding the order of pieces of music in a playlist, playback app information, and information regarding how many seconds a piece of music has been played for from the beginning. When the content controller 161 detects a context in conformity to a scene determining rule, the content controller 161 performs resume play at a point in time, in the scene, at which playback was stopped last time. Note that the number of seconds for which a user has to have heard content before the content controller 161 remembers the content information may be shorter than or longer than 30 seconds, may be set by the user as appropriate, or may be automatically set for each piece of content.
When content information is acquired using Audio/Video Remote Control Profile (AVRCP), this provides advantages in that there is no need for negotiation with a partner and there are no restrictions on the type of membership, that is, paid membership or free membership. On the other hand, this provides a disadvantage in that playing may be resumed with a low degree of reproducibility. In other words, playback app information is not acquired, and only text-based meta-information is acquired as piece-of-music information. Thus, there is a possibility that the content controller 161 will request that a service B perform playback on the basis of meta-information regarding a piece of music played by a user using a service A, but a matching piece of music will not be found and reproduction will fail.
When content information is acquired using Software Development Kit (SDK), this provides an advantage in that resuming of playing can be reproduced for each piece of music and for each artist. An ID for a piece of music, an artist ID, an album ID, and the like that are managed by the content provision application 400 can be acquired. Thus, an album that includes the piece of music and the artist can be played.
When content information is acquired using Generic Attribute Profile (GATT), this provides an advantage in that resuming of playing can be reproduced for each piece of music, for each artist, and for each playlist, which results in the highest quality of experience. If a uniform resource identifier (URI) of a playlist, the order of pieces of music, and the like are acquired in addition to an ID for a piece of music, an artist ID, and an album ID that are managed by the content provision application 400, resuming playback of a piece of music in a playlist from the middle of the piece of music can be reproduced.
With respect to a first example of a playlist, pieces of music are classified into categories on the basis of the user's preference, and pieces of music in each category are provided to the user in the form of a playlist that is dynamically generated on the basis of preference selected by the user. A second example of a playlist is a playlist generated by a creator selecting pieces of music (a set of pieces of music). In the second example, the following three choices are conceivable when playing of a playlist is finished: the content controller 161 returns to the beginning of the playlist and plays it again; the content controller 161 recommends a playlist that is likely to be related to the playlist of which playing has been finished, and plays the recommended playlist when a user accepts the recommendation; or the content controller 161 finishes playing.
When a user gets on a commuter train in the morning, content for "going to work in the morning" is recommended to the user by sound. When the user responds with "Yes", playback of the piece of music last heard while going to work in the morning starts at the point in time at which playing of the piece of music was stopped last time.
As an example of an implementation method, the content controller 161 remembers information regarding content heard by a user for 30 seconds or more (playback for 30 seconds or more is counted as one playback). Examples of the content information include piece-of-music information, artist information, album information, playlist information, information regarding the order of pieces of music in a playlist, playback app information, and information regarding how many seconds a piece of music has been played for from the beginning. At this point, the content controller 161 associates the content information with a specified classification type of "scene", and records the content information in the form of a log together with information regarding contexts such as the time, a location, and the type of behavior. When the content controller 161 detects a context in conformity to a "scene" determining rule, the content controller 161 performs resume play at a point in time, in the scene, at which playback was stopped last time. Note that the number of seconds for which a user has to have heard content before the content controller 161 remembers the content information may be shorter than or longer than 30 seconds, may be set by the user as appropriate, or may be automatically set for each piece of content.
For example, the content controller 161 can resume playing a certain playlist played last time in a certain scene by specifying the content “YYYY” included in the certain playlist and last played back in the certain scene, and the playback time on the basis of piece-of-music information, playlist information, and information regarding how many seconds a piece of music has been played for from the beginning.
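A sketch of the resume log described above, assuming a simple in-memory structure; the 30-second threshold, names, and fields are illustrative.

```python
import time
from dataclasses import dataclass, field

PLAYBACK_COUNT_THRESHOLD_S = 30  # configurable; may also be set per piece of content

@dataclass
class LogEntry:
    scene: str
    track: str
    playlist: str
    position_s: float          # how many seconds have been played from the beginning
    logged_at: float = field(default_factory=time.time)

class ResumeLog:
    def __init__(self):
        self._last_by_scene = {}

    def on_playback_progress(self, scene, track, playlist, position_s):
        # Only remember content that has been heard for the threshold or longer.
        if position_s >= PLAYBACK_COUNT_THRESHOLD_S:
            self._last_by_scene[scene] = LogEntry(scene, track, playlist, position_s)

    def resume(self, scene):
        """Return (track, playlist, position) to resume for this scene, if any."""
        entry = self._last_by_scene.get(scene)
        return (entry.track, entry.playlist, entry.position_s) if entry else None

log = ResumeLog()
log.on_playback_progress("commute_morning", "YYYY", "morning_mix", 95.0)
print(log.resume("commute_morning"))   # ('YYYY', 'morning_mix', 95.0)
```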
A user may specify in advance a playlist to be played for a scene. For example, a user sets a playlist A for a scene (1), sets a playlist B for a scene (2), and sets no playlist for a scene (3).
When no playlist is specified, the content controller 161 records a playlist that is being played by a user. The content controller 161 records a playlist C when the playlist C is being played for the scene (3). The content controller 161 plays the playlist C when the wearable device 200 is worn in the scene (3).
When a user does not change playlists manually, the content controller 161 does not change a combination of a scene and a playlist.
When a user changes playlists manually in the middle of a scene, for example, the content controller 161 starts playing the playlist A in the scene (1) and then changes to a playlist D in the middle of the scene (1) in response to the user operation. When the scene (1) is over and then starts again after the elapse of a certain period of time, the content controller 161 plays (suggests) the playlist A. When the playlist A is refused, the content controller 161 suggests the playlist C (suggestions are made in order of priority). The suggestion may be provided using a GUI. This also includes the case in which a user is encouraged to change playlists and accepts the change.
When a user displays a GUI in order to change playlists, the content controller 161 may make a recommendation on the basis of a scene.
The content controller 161 can reflect, in a dynamically generated playlist, preference with respect to pieces of music in a scene. The content controller 161 analyzes preference using "skips" and "likes" in each scene, and this enables the content controller 161 to reflect the preference in content desired in each scene and in a dynamic playlist generated for each scene.
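A sketch of how per-scene preference might be accumulated from "likes" and "skips" and used to order a dynamically generated playlist; the weights and names are assumptions.

```python
from collections import defaultdict

class ScenePreference:
    """Per-scene preference score built from 'like' and 'skip' feedback."""

    def __init__(self):
        self._scores = defaultdict(float)   # (scene, track) -> score

    def on_like(self, scene, track):
        self._scores[(scene, track)] += 1.0

    def on_skip(self, scene, track):
        self._scores[(scene, track)] -= 0.5   # a skip weighs less than a like (assumed)

    def rank(self, scene, candidates):
        """Order candidate tracks for a dynamically generated playlist."""
        return sorted(candidates, key=lambda t: self._scores[(scene, t)], reverse=True)

pref = ScenePreference()
pref.on_like("night_run", "track-A")
pref.on_skip("night_run", "track-B")
print(pref.rank("night_run", ["track-A", "track-B", "track-C"]))
```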
The content controller 161 keeps playing one playlist even when a plurality of apparatuses are used to perform playback in the same scene. For example, in the case in which a user plays a playlist for Saturday nights on a smartphone, stops the music for some time upon arriving home, and resumes playing music on an audio apparatus after a meal, the content controller 161 resumes playing the playlist for Saturday nights.
The content control application 300 defines a scene on the basis of the behavior of a user, information regarding the user, and an environment. Examples of the behavior include walking, running, laughing, being on a train, being at home, feeling good, and not feeling good. Examples of the information regarding a user include being at a meeting, being at the office, being out shopping, and being on duty. Examples of the environment include weather and a period of time. The scene is defined by combining the pieces of information described above (not all of the pieces of information necessarily have to be included). Examples of the scene include going to work, being at the office, and running on a day off.
The content control application 300 selects a playlist according to a scene and plays the selected playlist. Specifically, a playlist is selected to be played at the time of starting playing and at the time of changing scenes during playing. A scene and a playlist may be associated with each other by a user in advance. When there is a change in playlist in the middle of a scene, replacement may be performed (when playing of a playlist is stopped for some time, a preset playlist will be played again). For example, a playlist suitable for “during going to work” is selected to be played when a user goes to work, and a playlist that promotes concentration is selected to be played when a user is at the office. A piece of music that is being played at a timing at which there is a change in scene is played to the end, and a playlist suitable for a current scene is played after the playing of the piece of music is finished. Playing is suggested according to a scene when the wearable device 200 is worn. Not playing is another choice.
When a piece of music is selected for a certain scene, the content control application 300 plays a playlist previously played for the certain scene, the playing of the playlist starting at a point in time at which playing of the playlist was stopped last time. When playing of a piece of music is stopped in the middle of a scene, the piece of music is stored, and playing of the stored piece of music may be resumed when the certain scene occurs next time.
The content control application 300 confirms with a user whether the playlist is allowed to be changed when there is a change in scene. The user can refuse the change in playlist in response to the confirmation. Notice sound is provided at the same time as a currently played piece of music to notify a user that a playlist is to be changed. In response to the change confirmation being provided using notice sound, the user can refuse or accept the change using a key operation, voice, or input of a gesture.
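A sketch of the scene-change behavior described above: the piece that is playing when the scene changes is played to the end, a notice sound is given, and the playlist is switched only if the user does not refuse. The class and the player stand-in are hypothetical.

```python
class _Player:
    """Hypothetical stand-in for the playback layer."""
    def play_notice_sound(self):
        print("(notice sound over the current piece)")
    def play(self, playlist):
        print("now playing:", playlist)

class SceneChangeHandler:
    """Finish the current piece, notify with a notice sound, then switch
    to the playlist for the new scene unless the user refuses."""

    def __init__(self, player, playlists_by_scene):
        self.player = player
        self.playlists_by_scene = playlists_by_scene
        self.pending_scene = None

    def on_scene_change(self, new_scene):
        self.pending_scene = new_scene
        self.player.play_notice_sound()

    def on_track_finished(self, user_refused_change=False):
        if self.pending_scene and not user_refused_change:
            self.player.play(self.playlists_by_scene[self.pending_scene])
        self.pending_scene = None

handler = SceneChangeHandler(_Player(), {"at_office": "focus_playlist"})
handler.on_scene_change("at_office")
handler.on_track_finished()          # the change was not refused, so switch
```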
The content controller 161 can be applied to provision or recommendation of not only music but also other content. With respect to content other than music, the content controller 161 can provide showing-target content according to a scene. In a train used to go to work, the content controller 161 plays a playlist for a video of news about the economy. In a train used to go home, the content controller 161 plays a playlist for a video of a YouTuber whom a user likes. With respect to content on social media, the content controller 161 can also change pieces of display-target content according to a scene. The content controller 161 selects news about the economy in a train used to go to work, and selects, for example, entertainment content in a train used to go home. The content controller 161 can change pieces of provision-target content according to a detected scene by defining provision-target content according to a scene for each device (for each device category).
The wearable device 200 and the content playback controlling application 300 correspond to a user front end, and to an interface that connects the user to a creator or to an application provided by each company behind the interface.
(1) A user state that corresponds to a search key is acquired, and a search phrase is transmitted. (2) The search phrase is analyzed. (3) A highly relevant product from among products suggested by a creator is matched (content suitable for the state is selected). (4) Production information is provided. (5) An auction is held. (6) Content that corresponds to an advertisement is provided to a user. Accordingly, a creator who wants to deliver content can be connected to a user.
(1) A user state that corresponds to input performed using a sensor is acquired, and a search phrase is transmitted. (2) The search phrase is analyzed. (3) A highly relevant product from among products suggested by a creator is matched (content suitable for the state is selected to be automatically played). (4) Production information is provided. (5) An auction is held. (6) Content that corresponds to an advertisement is provided to a user, and a notice and music content are automatically played back. Accordingly, a notice and music content (information) can be connected to a user.
The content playback controlling application 300 provides a context of a user to the content provision application 400. The content provision application 400 makes a request that the content playback controlling application 300 allow content playback. The content playback controlling application 300 allows the content provision application 400 to play back content. The content provision application 400 provides content to the wearable device 200 to play back the content.
A creator adds a context to a created playlist, and this enables the creator to connect, to the behavior of a user, a scene in which music included in the created playlist is desired to be heard. At a timing at which the wearable device 200 is worn, the content playback controlling application 300 transmits a context and selects a playlist on the basis of a tag.
In order for a playlist to be specified as described above, a context list is provided to a creator in advance. The creator creates a playlist and sets a context with which the playlist is desired to be played. When a user has a specific context, the playlist provided by the creator is selected.
An object of the content playback controlling application 300 is not to cause music to be just heard, but to provide an experience, that is, to suggest a scene in which music is to be heard. The content playback controlling application 300 enables content that is not retrieved by a title to be retrieved by a tag. A tag can be attached by a user or a creator. The content playback controlling application 300 enables music content to be retrieved by a context (such as “run” or “night+run”). The content playback controlling application 300 makes it possible to retrieve a context using the behavior of a user as a search key.
The content playback controlling application 300 acquires a context of a user at a timing at which a "like" for music content is pressed, and when the user has the same context again, the content playback controlling application 300 plays back the same content to provide the content. For example, when the content playback controlling application 300 detects a context of "night+run" together with a "like", the content playback controlling application 300 plays back the same content in the case in which the user has the same context ("night+run") again. Accordingly, the content playback controlling application 300 plays pieces of music to which a shared tag is attached, and pieces of music detected as being harmonious with one another.
For example, it is assumed that, while a user is running (tag: “run”), night falls (tag: “night+run”). The content playback controlling application 300 detects a change in tag and dynamically changes pieces of playback-target content.
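A sketch of tag-driven switching: content "liked" under a tag set is replayed when the same tag set occurs again, and a change in the tag set (for example from "run" to "night"+"run") triggers a change of the playback-target content. The data layout is assumed.

```python
def current_tags(is_running: bool, is_night: bool):
    tags = set()
    if is_running:
        tags.add("run")
    if is_night:
        tags.add("night")
    return frozenset(tags)

class TagWatcher:
    """Replays content previously 'liked' under the same context tags and
    switches content when the tag set changes."""

    def __init__(self, liked_by_tags):
        self.liked_by_tags = liked_by_tags   # frozenset of tags -> list of tracks
        self._last_tags = None

    def update(self, tags):
        if tags != self._last_tags:
            self._last_tags = tags
            return self.liked_by_tags.get(tags, [])   # new playback-target content
        return None   # no change in tags, keep playing

watcher = TagWatcher({frozenset({"run"}): ["daytime-run-mix"],
                      frozenset({"night", "run"}): ["night-run-mix"]})
print(watcher.update(current_tags(is_running=True, is_night=False)))
print(watcher.update(current_tags(is_running=True, is_night=True)))
```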
An information processing system 30 has a configuration obtained by adding a registration-and-matching section 170 to the configuration of the information processing system 10 in
The registration-and-matching section 170 includes a sensor receiver 171, a stop-and-movement detector 172, an average processor 173, a database generator 174, a database 175, and a matching section 176.
The sensor receiver 171 receives detection values detected by the sensor section 210 (the acceleration sensor 211 and the geomagnetic sensor 215) and transmitted by a sensor transmitter 216 of the wearable device 200. Note that
The matching section 176 checks a detection value newly detected by the sensor section 210 against a registration-use detection value registered in the database 175 to perform matching, and determines whether a difference between the new detection value and the registration-use detection value registered in the database 175 is less than or equal to a matching threshold. When the difference between the detection value and the registered registration-use detection value is determined to be less than or equal to the matching threshold, the content controller 161 of the output controller 160 controls output on the basis of an environment state (such as “focus”) registered in the database 175 in association with the registration-use detection value. Processing performed by the registration-and-matching section 170 is more specifically described below.
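A sketch of the matching check, assuming a simple element-wise absolute-difference comparison against the matching threshold; the actual distance measure, threshold value, and sensor-axis layout are not specified in the disclosure and are illustrative here.

```python
def matches(new_value, registered_value, threshold):
    """True if the new detection value is within the matching threshold of a
    registration-use detection value (element-wise absolute difference assumed)."""
    return all(abs(n - r) <= threshold for n, r in zip(new_value, registered_value))

# Example values: acceleration (3 axes) followed by geomagnetism (3 axes).
registered = (0.12, 9.70, 1.05, 21.0, -4.3, 38.5)
new = (0.10, 9.68, 1.10, 20.6, -4.1, 38.9)
if matches(new, registered, threshold=0.8):
    print("control output using the environment state registered for this value, e.g. 'focus'")
```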
A user who is wearing the wearable device 200 stops (for example, sits down) at a location (for example, a location at which the user telecommutes) for which a detection value detected by the sensor section 210 is desired to be registered (Step S501). When the sensor receiver 171 receives a valid detection value from the sensor section 210 and checks for calibration, the registration-and-matching section 170 displays, for example, a status 902 of the acceleration sensor 211 and a status 903 of the geomagnetic sensor 215 on a GUI 900 displayed on the information processing apparatus 100 (such as a smartphone). The user operates a registration starting button 901 of the GUI 900 (Step S502). Then, the registration-and-matching section 170 displays, on the information processing apparatus 100 (such as a smartphone) or another information processing apparatus (such as a personal computer used for telecommuting), an instruction screen used to give an instruction to move the head. The user moves his/her head slowly according to the instruction screen, such that a position and an angle of the head are changed gradually (Step S503). The user operates a registration termination button 904 (Step S504).
When a registration starting button 901 is operated (Step S502), the registration-and-matching section 170 displays an icon 905 of, for example, an arrow on the GUI 900 displayed on the information processing apparatus 100 (such as a smartphone), and displays an animation of the icon 905 moving smoothly to trace an expected movement of a head of a user at a speed suitable for the user to move his/her head slowly (Step S503). When the movement of the icon 905 is completed, a message 906 used to make a request that the registration termination button 904 be operated (Step S504) is displayed.
When the registration starting button 901 is operated (Step S502), the registration-and-matching section 170 displays, on another information processing apparatus (such as a personal computer used upon telecommuting), an icon 907 of, for example, a circle and an instruction screen 908 on which, for example, instruction messages and pictures each showing an orientation of a face appear, and displays an animation of the icon 907 moving smoothly to trace an expected movement of a head of a user at a speed suitable for the user to move his/her head slowly (Step S503). When the movement of the icon 907 is completed, a message used to make a request that the registration termination button 904 displayed on the GUI 900 of the information processing apparatus 100 (such as a smartphone) be operated (Step S504) is displayed. Note that the icon 907 of, for example, a circle and the instruction screen 908 on which, for example, instruction messages and pictures each showing an orientation of a face appear may be displayed on the GUI 900 of the information processing apparatus 100 (such as a smartphone).
When a user operates the registration termination button 904 (Step S504), the average processor 173 calculates, as a registration-use detection value, an average of a plurality of detection values detected by the sensor section 210 for a specified period of time (from a point in time just before stopping (Step S501) to a point in time of terminating registration (Step S504)), and temporarily stores the calculated registration-use detection value in a memory. The database generator 174 associates the registration-use detection value detected by the sensor section 210 with an environment state (such as “focus”) to be provided to the user when the registration-use detection value is detected, and registers the associated detection value and environment state in the database 175.
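A sketch of how the registration-use detection value might be computed as the average of the samples collected over the registration window and stored together with the environment state; the tuple layout and dictionary form are assumptions.

```python
def registration_use_value(samples):
    """Average the detection values collected between just before stopping
    (Step S501) and the operation of the registration termination button
    (Step S504); each sample is a tuple of sensor axes."""
    n = len(samples)
    return tuple(sum(axis) / n for axis in zip(*samples))

window = [(0.11, 9.69, 1.02, 20.9, -4.2, 38.4),
          (0.13, 9.71, 1.07, 21.1, -4.4, 38.6),
          (0.12, 9.70, 1.06, 21.0, -4.3, 38.5)]
database_entry = {"value": registration_use_value(window), "environment_state": "focus"}
print(database_entry)
```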
Thereafter, when the user operates a checking starting button 907 (Step S505), the matching section 176 checks whether the registration-use detection value registered in the database 175 is valid (Step S506). Specifically, the matching section 176 checks a detection value newly detected by the sensor section 210 after registration against a registration-use detection value registered in the database 175 to perform matching, and determines whether a difference between the new detection value and the registration-use detection value registered in the database 175 is less than or equal to a matching threshold. When the difference is less than or equal to the matching threshold, the registration-use detection value registered in the database 175 is determined to be valid. The matching section 176 outputs (displays, on, for example, the GUI 900,) a checking result, that is, whether the registration-use detection value registered in the database 175 is valid.
In addition to the display of the GUI 900 used for registration and checking (
In the following description, a description of a step similar to the already described step is omitted. When a user who is wearing the wearable device 200 sits down (Step S501), the user operates one of icons 913 displayed on the layout image 912 that corresponds to a location for which registration is desired to be performed, and operates the location tag inputting button 911 (Step S507). Thereafter, the user performs operations as described above to move his/her head (Steps S502 to S504). The database generator 174 associates a registration-use detection value detected by the sensor section 210, an environment state (such as “focus”) to be provided to the user when the registration-use detection value is detected, and an input location tag (Step S507), and registers the associated detection value, environment state, and location tag in the database 175. The repetition of the processes described above makes it possible to perform registration for a plurality of locations.
For example, the registration-and-matching section 170 associates a detection value detected by the sensor section 210 during telecommuting in a living room, an environment state ("focus"), and a location tag "living", and registers the associated detection value, environment state, and location tag in the database 175. Further, the registration-and-matching section 170 associates a detection value detected by the sensor section 210 in a bedroom, an environment state ("relax"), and a location tag "room", and registers the associated detection value, environment state, and location tag in the database 175 as another entry. Accordingly, when a detection value that is matched to a registered registration-use detection value is newly detected, the information processing system 30 can output content that enables a user to focus his/her mind on tasks or content that enables a user to feel relaxed to sleep well, on the basis of the environment state registered in association with the registration-use detection value.
The additional registration is performed when registration has failed as a result of (1) registration and checking or (2) registration performed for a plurality of locations. A GUI used to perform additional registration using buttons is similar to the GUI 900 used for registration and checking (
When a user who is wearing the wearable device 200 sits down (Step S501), the user operates an additional registration button 914 (
A GUI used to perform additional registration using a layout is similar to the GUI 910 used to perform registration for a plurality of locations (
When a user who is wearing the wearable device 200 sits down (Step S501), the user operates one of the icons 913 displayed on the layout image 912 that corresponds to a location for which additional registration is desired to be performed (Step S510). Thereafter, the user performs operations as described above to move his/her head (Step S503). The database generator 174 associates a registration-use detection value detected by the sensor section 210 with an environment state (such as “focus”) to be provided to the user when the registration-use detection value is detected, and additionally registers the associated detection value and environment state in the database 175.
The stop-and-movement detector 172 determines, on the basis of a detection value detected by the sensor section 210 and transmitted by the sensor transmitter 216 of the wearable device 200, that the user has stopped (Step S511, YES) and that the user has remained at a stop for a first period of time (for example, three or four seconds) (Step S512, YES). Then, the average processor 173 calculates an average of a plurality of detection values detected by the sensor section 210 for a specified period of time (from a point in time just before stopping (Step S511) to a point in time just after the first period of time has elapsed (Step S512)), and temporarily stores the calculated average in a memory. The matching section 176 checks the new detection value temporarily stored in the memory against a registration-use detection value registered in the database 175 to perform matching (Step S514). Specifically, the matching section 176 determines whether a difference between the average of the new detection values temporarily stored in the memory and the registration-use detection value registered in the database 175 is less than or equal to a matching threshold. When the difference is determined to be less than or equal to the matching threshold (Step S514, YES), the content controller 161 of the output controller 160 controls output on the basis of an environment state (such as "focus") registered in the database 175 in association with the registration-use detection value. Note that, when the difference is greater than the matching threshold (Step S514, NO), the average of the new detection values temporarily stored in the memory remains stored without being deleted, since it may become a candidate to be registered in the database 175.
On the other hand, consider the case in which the user does not stop (Step S511, NO), the case in which the user is at a stop for a period of time shorter than the first period of time (Step S512, NO), and the case in which the difference between the average of the detection values and the registration-use detection value registered in the database 175 is greater than the matching threshold (Step S514, NO). The stop-and-movement detector 172 determines, on the basis of the detection value detected by the sensor section 210 and transmitted by the sensor transmitter 216 of the wearable device 200, that the user has started moving (Step S515, YES). The matching section 176 checks the new detection value against the registration-use detection value registered in the database 175 to perform matching (Step S516). Since the user has started moving, the difference between the detection value and the registration-use detection value registered in the database 175 is greater than the matching threshold (Step S516, NO). The database generator 174 calculates a stop period of time from the user stopping to the user starting moving (Step S515, YES) (Step S517), and determines whether the calculated stop period of time is longer than a second period of time (for example, M minutes) (Step S518). When the stop period of time is longer than the second period of time (Step S518, YES), the database generator 174 associates an average of a plurality of new detection values detected for the specified period of time from the user stopping to the user starting moving with an environment state to be provided to the user when the new detection values are detected, and newly registers the associated average and environment state in the database 175 (Step S519).
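A compressed sketch of one stop-and-move cycle of the automatic additional registration (Steps S511 to S519). The thresholds, the distance measure, and the environment state assigned on automatic registration are all assumptions.

```python
FIRST_PERIOD_S = 3          # stop period before attempting a match (3-4 s in the text)
SECOND_PERIOD_M = 10        # M minutes; a stop at least this long is worth registering

def on_user_stopped_then_moved(stop_duration_s, new_average, database, threshold, output_controller):
    """One stop-and-move cycle: try to match, otherwise register a long stop."""
    if stop_duration_s >= FIRST_PERIOD_S:
        for entry in database:
            if all(abs(n - r) <= threshold for n, r in zip(new_average, entry["value"])):
                output_controller(entry["environment_state"])   # Step S514, YES
                return
    # No match: register the averaged value if the stop lasted long enough (Step S518).
    if stop_duration_s >= SECOND_PERIOD_M * 60:
        database.append({"value": new_average, "environment_state": "focus"})  # Step S519

db = [{"value": (0.12, 9.70, 1.05), "environment_state": "focus"}]
on_user_stopped_then_moved(stop_duration_s=900,
                           new_average=(2.3, 9.4, 0.2),
                           database=db, threshold=0.5,
                           output_controller=lambda state: print("output for", state))
print(len(db))   # a new entry was registered because the stop exceeded M minutes
```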
The database generator 174 may update the database 175 when it is determined, from a degree of activeness of a user, that the user is in a state with a high degree of focusing or with a high degree of relaxing, the degree of activeness of the user being determined on the basis of detection values detected by the sensor section 210 (the biological sensors 214 such as a heartbeat sensor, a blood flow sensor, and a brain wave sensor in particular) and transmitted by the sensor transmitter 216 of the wearable device 200.
While a detection value detected by the sensor section 210 is being registered (Steps S501 to S504), the sensor receiver 171 receives a detection value (for example, 25 Hz) detected by the sensor section 210 and transmitted by the sensor transmitter 216 of the wearable device 200 (Step S521). The average processor 173 averages, every second, a plurality of detection values (for example, 25 Hz) detected by the sensor section 210, and shifts the averaging window every 0.5 seconds to extract M detection values at 2 Hz (Step S522). The database generator 174 creates a table that registers therein the M extracted detection values, and registers the created table in the database 175 (Step S523). For example, each detection value has feature amounts in eight dimensions, from feature amount 0 to feature amount 7 (N=7). The feature amounts in eight dimensions include, for example, a three-axis feature amount obtained by the acceleration sensor 211 and a three-axis feature amount obtained by the geomagnetic sensor 215.
After the detection value detected by the sensor section 210 is registered, the sensor receiver 171 receives a new detection value (for example, 25 Hz) detected by the sensor section 210 and transmitted by the sensor transmitter 216 of the wearable device 200 (Step S524). The matching section 176 calculates an average, over one second, of a plurality of detection values (for example, 25 Hz) detected by the sensor section 210 (Step S525). The matching section 176 calculates feature amounts in eight dimensions using the one-second average, and checks the calculated feature amounts against the feature amounts registered in the database 175 (Step S523) to perform matching (Step S526). Note that the above-described window lengths of one second and 0.5 seconds used to obtain averages, and the 2-Hz rate of the extracted detection values, are merely examples, and the numbers are not limited thereto.
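A sketch of the table-based approach: 25 Hz samples are averaged over one-second windows shifted by 0.5 seconds to obtain 2 Hz feature vectors, which are registered in a table and later matched against a new one-second average. The two feature dimensions beyond the six sensor axes are unspecified in the text and are left as placeholders here, and the threshold is illustrative.

```python
def extract_features(samples_25hz, window_s=1.0, shift_s=0.5, rate_hz=25):
    """Average 25 Hz samples over 1-second windows shifted by 0.5 s,
    yielding roughly 2 Hz feature vectors (integer truncation makes the
    shift approximate)."""
    window = int(window_s * rate_hz)
    shift = int(shift_s * rate_hz)
    features = []
    for start in range(0, len(samples_25hz) - window + 1, shift):
        chunk = samples_25hz[start:start + window]
        features.append(tuple(sum(axis) / len(chunk) for axis in zip(*chunk)))
    return features

def match_against_table(feature, table, threshold):
    """True if any registered feature is within the matching threshold."""
    return any(all(abs(a - b) <= threshold for a, b in zip(feature, row)) for row in table)

# 4 seconds of synthetic 8-dimensional samples at 25 Hz
# (3 acceleration axes, 3 geomagnetic axes, 2 unspecified dimensions).
samples = [(0.1, 9.7, 1.0, 21.0, -4.3, 38.5, 0.0, 0.0)] * 100
table = extract_features(samples)            # Steps S521-S523 (registration)
live = extract_features(samples)[:1]         # Steps S524-S526 (matching)
print(match_against_table(live[0], table, threshold=0.5))
```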
While a detection value detected by the sensor section 210 is being registered (Steps S501 to S504), the sensor receiver 171 receives a detection value (for example, 25 Hz) detected by the sensor section 210 and transmitted by the sensor transmitter 216 of the wearable device 200 (Step S521). The average processor 173 averages, every second, a plurality of detection values (for example, 25 Hz) detected by the sensor section 210, and shifts the averaging window every 0.5 seconds to extract M detection values at 2 Hz (Step S522). The database generator 174 causes a model using a neural network to learn the M extracted detection values (Step S527). For example, each detection value has feature amounts in eight dimensions, from feature amount 0 to feature amount 7 (N=7). The feature amounts in eight dimensions include, for example, a three-axis feature amount obtained by the acceleration sensor 211 and a three-axis feature amount obtained by the geomagnetic sensor 215. Note that the number of feature amounts is not limited thereto, and any number may be accepted.
After learning is performed, the sensor receiver 171 receives a new detection value (for example, 25 Hz) detected by the sensor section 210 and transmitted by the sensor transmitter 216 of the wearable device 200 (Step S524). The matching section 176 calculates, for one second, an average of a plurality of detection values (for example, 25 Hz) detected by the sensor section 210 (Step S525). The matching section 176 calculates feature amounts in eight dimensions using the average for one second. The matching section 176 inputs the calculated feature amounts to the model using a neural network that has performed learning (Step S527) to perform matching (Step S528).
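A sketch of the neural-network variant using PyTorch; the library choice, the architecture, and the two-class labeling are assumptions, since the disclosure only states that a model using a neural network learns the extracted detection values and is then used for matching.

```python
import torch
from torch import nn

# A small network over the 8-dimensional feature vectors (architecture assumed).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(32, 8)                 # M extracted registration features (synthetic)
labels = torch.randint(0, 2, (32,))           # e.g. 1 = registered location, 0 = other

for _ in range(100):                          # Step S527: learn the extracted values
    optim.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optim.step()

new_feature = torch.randn(1, 8)               # Step S528: matching on a new 1-second average
print(model(new_feature).argmax(dim=1).item())
```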
Note that a model that has performed learning at least once may continue to be used. Alternatively, learning may be performed, for example, upon starting registration (Step S502), upon performing additional registration (Step S508), or upon performing additional registration automatically (Step S511) to update the model.
According to the present embodiment, the registration-and-matching section 170 associates, in advance, a detection value detected by the sensor section 210 (a registration-use detection value) with a corresponding environment state (such as “focus”), and registers the associated detection value and environment state. Accordingly, when a detection value that is matched to a registered registration-use detection value is newly detected, the information processing system 30 can output content that enables a user to focus his/her mind on tasks or content that enables a user to feel relaxed to sleep well, on the basis of an environment state registered in association with the registration-use detection value.
The present disclosure may also include the following configurations.
Further, the present disclosure may also include the following configurations.
The embodiments and the modifications of the present technology have been described above. Of course the present technology is not limited to the embodiments described above, and various modifications may be made thereto without departing from the scope of the present technology.
Number | Date | Country | Kind |
---|---|---|---|
2021-056342 | Mar 2021 | JP | national |
PCT/JP2021/021259 | Jun 2021 | WO | international |
PCT/JP2021/043548 | Nov 2021 | WO | international |
PCT/JP2022/007705 | Feb 2022 | WO | international |
PCT/JP2022/013213 | Mar 2022 | WO | international |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/015297 | 3/29/2022 | WO |