This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-152313, filed on Jul. 31, 2015, the entire contents of which are incorporated herein by reference.
The technology disclosed herein relates to an information presentation method, an information presentation program and an information presentation apparatus.
In systems that allow interactive communication between user terminals, interactive communication services have been proposed in which life sound and the like are sensed and appropriate life sound is presented, for example, to watch over the elderly. As an element technology common to these services, a technology is available in which feature portions, such as the opening or closing sound of a door or the laughter of people, are extracted from collected life sound and presented to a user.
For example, a method has been proposed in which an interactive confirmation state, such as whether the communicating persons simultaneously confirm the state of their communication partners, is sensed to switch automatically between a privacy protection state, in which the information sent to the other party from within the substance of the conversation is restricted, and a conversation state, in which no such restriction is applied (refer to, for example, Patent Document 1 or 2).
[Patent Document 1] Japanese Laid-open Patent Publication No. 2011-10095
[Patent Document 2] Japanese Laid-open Patent Publication No. 2006-093775
[Patent Document 3] Japanese Laid-open Patent Publication No. 2003-153322
[Patent Document 4] Japanese Laid-open Patent Publication No. 2013-074598
According to an aspect of the invention, an information presentation method includes monitoring specified information in contents of interactive communications between a first apparatus and a second apparatus, the contents from the first apparatus being outputted to the second apparatus when a first condition is satisfied, the contents from the second apparatus being outputted to the first apparatus when a second condition is satisfied, and changing, when the specified information is detected in the contents, at least one of the first condition and the second condition so that the contents are outputted more easily.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
However, with the technology of the Patent Documents mentioned above, transmission of information is performed only after the communication parties simultaneously confirm the state of their respective communication partners. Therefore, even if the communication parties want to hold a conversation with each other, it is difficult to enter a conversation state unless they simultaneously confirm each other's situation.
For example, the technology has a problem in that, if sound that ought to be conveyed to the communication partner, such as a notification of "I have just come home" upon returning home, is cut off, then it is hard to establish communication.
Therefore, it is desirable to make it possible to smoothly perform interactive communication in a system that allows interactive communication between information presentation apparatuses used by users.
In the following, embodiments of the disclosed technology are described with reference to the accompanying drawings. It is to be noted that, in the specification and the drawings, like elements including a substantially like functional configuration are denoted by like reference symbols, and overlapping description of them is omitted herein.
[Example of Hardware Configuration of Information Presentation Apparatus]
First, an example of a hardware configuration of an information presentation apparatus according to first and second embodiments of the technology is described with reference to the drawings.
The information presentation apparatus 1 is implemented by a general-purpose computer, a workstation, a desktop personal computer (PC), a notebook PC or the like. The information presentation apparatus 1 includes a central processing unit (CPU) 110, a random access memory (RAM) 120, a read only memory (ROM) 130, a hard disk drive (HDD) 140, an inputting unit 150, an outputting unit 160, a communication unit 170 and a reading unit 180. The components mentioned are coupled to each other by a bus.
The CPU 110 controls the components of hardware in accordance with control programs stored in the ROM 130. The RAM 120 may be, for example, a static RAM (SRAM), a dynamic RAM (DRAM) or a flash memory. The RAM 120 temporarily stores data generated upon execution of the programs by the CPU 110. The control programs may include an information presentation program for executing an information presentation method according to the first and second embodiments.
The HDD 140 has various databases hereinafter described (each hereinafter referred to also as “DB”) stored therein. The control programs may otherwise be stored in the HDD 140. A solid state drive (SSD) may be provided in place of the HDD 140.
The inputting unit 150 includes a keyboard, a mouse, a touch panel and so forth for inputting data to the information presentation apparatus 1. Further, the inputting unit 150 includes, for example, a microphone 150a coupled thereto and inputs life sound and so forth collected by the microphone 150a.
It is to be noted that, in the present specification, the term "sound" is not limited to "sound" in the narrow sense, that is, vibration in the air acquired by a microphone, but signifies a concept in the broad sense including "vibration" propagating, for example, through air, material or liquid, as measured by a measuring instrument such as a microphone, a piezoelectric element or a laser micro displacement meter.
The outputting unit 160 outputs an image acquired by the information presentation apparatus 1 to a display apparatus 160a or outputs acquired sound to a speaker.
The communication unit 170 communicates with a different computer (for example, the information presentation apparatus 2) through a network. The reading unit 180 reads a portable storage medium 100a including a compact disk-ROM (CD-ROM) and a digital versatile disc-ROM (DVD-ROM). The CPU 110 may read a control program from the portable storage medium 100a through the reading unit 180 and store the control program into the HDD 140. Further, the CPU 110 may download a control program from a different computer through the network and store the control program into the HDD 140. Furthermore, the CPU 110 may read in a control program from a semiconductor memory 100b.
Now, an example of a functional configuration of an information presentation apparatus according to the first embodiment is described with reference to the drawings.
It is to be noted that the functions of the components of the information presentation apparatus 1 are implemented by cooperative operation of the control programs stored in the ROM 130 and hardware resources such as the CPU 110, the RAM 120 and so forth.
The information presentation apparatus 1 includes a life sound inputting unit 10a, a feature amount extraction unit 11a, a score decision unit 12a, a recording unit 13a, a threshold value changing unit 14a, a presentation decision unit 15a, a transmission unit 16a, a reception unit 17a and an outputting unit 18a. The recording unit 13a records a life sound DB 20a, a sound feature amount DB 21a, a sound cluster DB 22a, a score DB 23a, a sound list DB 24a and an output condition table 25a.
The life sound inputting unit 10a inputs life sound. The life sound includes sound and conversation generated when the user A lives. The recording unit 13a records inputted life sound into the life sound DB 20a. It is to be noted that the life sound inputting unit 10a is a functioning unit corresponding to the inputting unit 150 that is a hardware element.
The format of the sound data to be stored into the life sound DB 20a may be an uncompressed format such as the waveform audio format (WAV) (resource interchange file format (RIFF)) or the audio interchange file format (AIFF), or may be a compressed format such as moving picture experts group (MPEG)-1 audio layer-3 (MP3) or Windows Media (registered trademark) Audio (WMA).
The life sound inputting unit 10a passes sound data to the feature amount extraction unit 11a. The feature amount extraction unit 11a delimits sound data into time windows and calculates a feature amount for each delimited time window. The recording unit 13a records the calculated feature amounts into the sound feature amount DB 21a.
The score decision unit 12a performs matching between a feature amount received from the feature amount extraction unit 11a and feature amounts of clusters stored in the sound cluster DB 22a to determine a cluster to which the sound data that is a processing target is to belong.
If a cluster to which sound data of a processing target is to belong is determined, then the recording unit 13a records the ID of the determined cluster into the score DB 23a.
The score decision unit 12a calculates a score from a generation frequency. In the present embodiment, the score decision unit 12a uses the negative logarithm of the generation frequency as the score. Therefore, as the generation frequency of data of specific sound increases, the score decreases, and as the generation frequency decreases, the score increases. It is to be noted that the method of calculating the score is not limited to this; the generation frequency may be used as the score as it is.
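As an illustrative sketch of this scoring (the patent does not specify an implementation; the function and data names below are assumptions), the negative-log score could be computed as follows:

```python
import math

def score_from_frequency(cluster_counts, cluster_id):
    """Score a sound cluster by the negative logarithm of its relative
    generation frequency: frequent everyday sound scores low, rare
    sound scores high."""
    total = sum(cluster_counts.values())
    return -math.log(cluster_counts[cluster_id] / total)

counts = {"conversation": 120, "door": 40, "crash": 2}
print(score_from_frequency(counts, "conversation"))  # low score (frequent)
print(score_from_frequency(counts, "crash"))         # high score (rare)
```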
The threshold value changing unit 14a changes (moderates or restores) a threshold value for deciding whether or not sound data is to be transmitted to the communication partner. As the threshold value increases, more sound data is cut off, and most of the conversation may not be transmitted to the communication partner; therefore, privacy can be protected by setting the threshold value to a high level. On the other hand, as the threshold value decreases, less sound data is cut off, and the conversation is transmitted to some degree to the communication partner; therefore, it becomes easy to recognize the environment or situation of the communication partner at a remote place by decreasing the threshold value. The threshold value changing unit 14a decreases the threshold value when specific sound registered in the sound list DB 24a is included in the feature sound included in the sound data.
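A minimal sketch of this moderation and its later restoration (the hold period, widths and names are assumptions; the patent gives no implementation):

```python
import time

class ThresholdChanger:
    """Sketch of the threshold value changing unit 14a: lower the
    threshold when a registered specific sound is detected, and
    restore it after a fixed period."""

    def __init__(self, initial=5.0, hold_sec=60.0):
        self.initial = initial
        self.threshold = initial
        self.hold_sec = hold_sec
        self.moderated_at = None

    def on_sound(self, cluster_id, sound_list, decrease_width):
        # specific sound registered in the sound list DB 24a
        if cluster_id in sound_list:
            self.threshold = self.initial - decrease_width.get(cluster_id, 1.0)
            self.moderated_at = time.monotonic()

    def tick(self):
        # threshold restoration: return to the original value after hold_sec
        if (self.moderated_at is not None
                and time.monotonic() - self.moderated_at > self.hold_sec):
            self.threshold = self.initial
            self.moderated_at = None
```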
An example of the sound list DB 24a is depicted in the drawings.
For example, if the threshold value is a fixed value, then the sound of "are you all right?" following "crash" (an example of the specific sound) may be cut off and may not be transmitted to the communication partner. However, there is a difference in the status of the place between "are you all right?" following "crash" and "are you all right?" without "crash." In other words, it is anticipated that the former case is higher in urgency than the latter and that, in the situation of the former case, it is better to convey the sound to the communication partner so that the safety can be confirmed.
Therefore, in the present embodiment, the threshold value changing unit 14a lowers the threshold value only when specific sound registered in the sound list DB 24a is recognized, thereby allowing sound outputted immediately after the specific sound to be conveyed more readily to the communication partner. By moderating the cutoff level of sound when the urgency is high so that sound of a conversation or the like is conveyed to the communication partner more easily, communication can be implemented when it is demanded between users. For example, where a parent and a child live apart from each other, the threshold value is normally set to a high level so as to cut off daily conversation and protect privacy, but when it is desirable to confirm the safety of the parent from life sound, the threshold value is lowered so that the life sound is transmitted to the information presentation apparatus 1 of the child. This makes it possible to smoothly perform interactive communication in the system that allows interactive communication between the information presentation apparatuses 1 and 2 utilized by the users.
When the score recorded in the score DB 23a is equal to or higher than the threshold value, the presentation decision unit 15a decides that the sound is low in generation frequency and decides to transmit the sound recorded in the score DB 23a in correspondence with the score so as to present it to the communication partner. When the score recorded in the score DB 23a is lower than the threshold value, the presentation decision unit 15a decides that the sound is outputted frequently and cuts off the sound.
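Combined with the score above, the decision itself reduces to a threshold comparison; a minimal sketch:

```python
def decide_presentation(score, threshold):
    """Transmit sound whose score (rarity) is equal to or higher than
    the threshold; cut off frequent everyday sound below it."""
    return score >= threshold

assert decide_presentation(score=1.2, threshold=3.0) is False  # cut off
assert decide_presentation(score=4.5, threshold=3.0) is True   # transmitted
```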
The transmission unit 16a transmits sound data decided to be transmitted by the presentation decision unit 15a to the information presentation apparatus 2 of the communication partner. The reception unit 17a receives sound data transmitted from the information presentation apparatus 2 of the communication partner. The outputting unit 18a outputs the received sound data. It is to be noted that the outputting unit 18a is a functioning unit corresponding to the outputting unit 160 that is hardware. Meanwhile, the transmission unit 16a and the reception unit 17a are functioning units corresponding to the communication unit 170 that is hardware.
[Example of Information Presentation Process]
Now, the information presentation process according to the present embodiment is described with reference to the drawings.
The life sound inputting unit 10a inputs streaming sound data using the microphone or the like (step S10). Then, the feature amount extraction unit 11a delimits the inputted sound data for each given period of time and performs processing for each delimited period of time. That is, the feature amount extraction unit 11a delimits the sound data for each given period of time to divide the sound data into time windows (step S11). Then, the feature amount extraction unit 11a executes a feature amount calculation process for the sound data in a time window (step S12). When the feature amount calculation process ends, the feature amount extraction unit 11a displaces the time window and repetitively performs a similar process for the sound data in the displaced time window.
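A sketch of the window division of step S11 (the window length and displacement are assumptions; the patent only says "each given period of time"):

```python
import numpy as np

def iter_time_windows(samples, rate, window_sec=1.0, hop_sec=0.5):
    """Delimit sound data into fixed-length time windows (step S11),
    displacing the window by hop_sec on each iteration."""
    win, hop = int(rate * window_sec), int(rate * hop_sec)
    for start in range(0, len(samples) - win + 1, hop):
        yield samples[start:start + win]

rate = 16000
samples = np.zeros(rate * 3)              # 3 seconds of dummy input
windows = list(iter_time_windows(samples, rate))
print(len(windows))                       # 5 windows of 1 s at 0.5 s hop
```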
(Example of Feature Amount Calculation Process)
An example of the feature amount calculation process is described with reference to the drawings.
Then, the feature amount extraction unit 11a applies a filter to the mel-spectrum obtained at step S26 (step S27). The feature amount extraction unit 11a outputs the mel-spectrum after the filter is applied as a feature amount (step S28), and the processing is returned to the calling source.
Here, the filter is a power function whose exponent p is smaller than 1 and is represented, for example, by the following expression (1):
[Expression 1]
x_p(i) = (x_m(i))^p   (1)
where x_m(i) represents the i-th component of the mel-spectrum feature, x_p(i) represents the i-th component of the feature amount, p represents the exponent (p ∈ R, 0 ≤ p < 1), and R represents the set of all real numbers.
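A sketch of expression (1) applied to a mel-spectrum vector (the concrete value of p is an assumption; the patent only requires 0 ≤ p < 1):

```python
import numpy as np

def power_filter(mel_spectrum, p=0.5):
    """Expression (1): x_p(i) = (x_m(i))**p. With 0 <= p < 1 the filter
    compresses large mel-spectrum components relative to small ones."""
    if not (0.0 <= p < 1.0):
        raise ValueError("p must satisfy 0 <= p < 1")
    return np.power(np.asarray(mel_spectrum, dtype=float), p)

print(power_filter([0.04, 1.0, 25.0]))  # -> [0.2 1.  5. ]
```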
(Example of Threshold Value Changing Process)
An example of the threshold value changing process is described with reference to the drawings.
As a result, the decreasing width for the threshold value is recorded into the output condition table 25a for each cluster ID (in other words, for each sound cluster), as depicted in the drawings.
In the information presentation system according to the present embodiment, in order to protect privacy, everyday sound such as conversation is cut off, and only when comparatively rare sound, such as the opening or closing sound of a door, is detected is the sound conveyed to the other party side. However, if sound that ought to be conveyed to the other party side, such as "I have just come home" upon coming home, is also cut off, then communication becomes harder to perform. In contrast, with the information presentation apparatus 1 according to the present embodiment, when specific sound is detected on at least one of the own side and the other party side, an output condition (in the present embodiment, a threshold value) of at least one of the own side and the other party side is moderated to help convey the sound. Consequently, an interaction system with which communication can be performed readily can be provided.
(Example of Threshold Value Restoration Process)
In the threshold value restoration process described above, at least one of the threshold value Th1 at the own side and the threshold value Th2 at the other party side is returned to its original value, as illustrated in the drawings.
(Example of Feature Sound Learning Process)
First, the threshold value changing unit 14a inputs a cluster ID of generated specific sound and a flag (hereinafter referred to as “communication flag”) indicative of whether communication has been performed within a fixed period of time after the threshold value for the specific sound was moderated (lowered) (step S36). Then, the threshold value changing unit 14a decides on the basis of the communication flag whether communication has been performed within the fixed period of time after the threshold value for the specific sound was moderated (changed) (step S37). When “1” is set in the communication flag, this indicates that communication has been performed within the fixed period of time after the threshold value for the specific sound was moderated. When “0” is set in the communication flag, this indicates that no communication has been performed within the fixed period of time after the threshold value for the specific sound was moderated.
It is to be noted that whether communication has been performed (whether the communication flag has "1" set therein) can be detected, in the information presentation system of the present embodiment, for example, depending upon whether sound is uttered interactively a fixed number of times or more within a fixed period of time after the moderation. Alternatively, a button to be pushed every time communication is performed may be prepared, and the number of times the button is pushed may be counted.
If the threshold value changing unit 14a decides that communication has been performed within the fixed period of time after the threshold value for the specific sound was moderated, then the threshold value changing unit 14a increases the decreasing width for the threshold value (step S38). On the other hand, if it is decided that communication has not been performed within the fixed period of time after the threshold value for the specific sound was moderated, then the threshold value changing unit 14a decreases the decreasing width for the threshold value (step S39). Thereafter, the threshold value changing unit 14a sets “0” to the communication flag (step S40) and returns the processing to the calling source.
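A sketch of this update rule (the step size and bounds are assumptions not given in the patent):

```python
def update_decrease_width(width, communicated, step=0.1,
                          min_width=0.0, max_width=5.0):
    """Feature sound learning: widen the decreasing width if the
    moderation was followed by communication within the fixed period
    (step S38), narrow it otherwise (step S39)."""
    if communicated:
        return min(width + step, max_width)
    return max(width - step, min_width)
```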
With the feature sound learning process described above, when specific sound such as the sound of a gate or the sound of a cleaner is acquired, the threshold value is gradually decreased as illustrated in the drawings.
Then, it is detected whether the sound is followed by conversation as illustrated in the drawings.
If the "sound of cleaner" illustrated in the drawings is followed by conversation, then the decreasing width for the threshold value for that sound is increased, so that later sound passes more easily for a fixed period of time.
On the other hand, for example, if the sound of a gate is generated and is not followed by conversation, passage of later sound for a fixed period of time can be made more difficult by decreasing the decreasing width for the threshold value for the sound of the gate. The sound volume may also be decreased in proportion to the magnitude of the decreasing width for the threshold value. With the method described, by increasing or decreasing the threshold value stepwise, it is possible to transmit sound data that may become a trigger for communication with the communication partner while other sound data is cut off, so that privacy is protected accurately.
When the threshold value is changed to change the output condition for sound data, if there is an error in the extraction of the feature amount of the sound data for which moderation of the output condition is to be performed, then the output condition may be moderated in error, resulting in significant degradation of usability. On the other hand, there is a wide variety of information, especially in the real world (for example, life sound), and setting "which information is to be presented" in minute detail demands much labor; it is therefore not realistic to fix in advance how the threshold value is to be changed to change the output condition of sound data. Further, since whether or not an output condition should be moderated may differ depending upon the individual user or the operation environment, it is not preferable simply to moderate output conditions under the same conditions in all information presentation systems.
In contrast, in the present embodiment, an output condition is learned adaptively in accordance with the utilization condition of the user. In other words, when an output condition is actually moderated, if the moderation is followed by communication, then the degree of the moderation is increased, and if the moderation is not followed by communication, the degree of the moderation is decreased. These countermeasures suppress the situation in which an output condition is moderated in error, and improve usability.
For example, in the present embodiment, sound that may cause a problem of privacy infringement, such as conversation, is cut off. On the other hand, sound that suggests a change in the situation of the other party, such as the opening or closing sound of a door or the clacking of tableware, is conveyed to the user at the other party side. Consequently, a rough change in the state of the other party side can be recognized in real time while privacy is protected.
Further, in the present embodiment, unlike a case in which a separate sensor is used to change a presentation condition, there is no need to provide any sensor other than the information presentation apparatus 1, and therefore the system configuration can be simplified. It is to be noted that the threshold value for determining whether or not information presentation is to be performed in the present embodiment is an example of an output condition.
In the description of the information presentation method according to the first embodiment above, sound of a visual telephone system or the like is taken as an example. However, the information that can be handled by the information presentation method according to the embodiment is not limited to sound, and includes media information such as moving pictures, still pictures and text information, as well as sensor values.
At present, with the sophistication of networks and the spread of mobile apparatuses, social networking site (SNS) services that "gently convey the state of each other," such as Twitter (registered trademark), Facebook (registered trademark) or LINE (registered trademark), have become popular. Here, a service is assumed in which not only text information but also media information such as video and sound is routinely transferred to gently convey the state of each other, for example, a service by a normally-coupled visual telephone system that links a parent's home and a child's home.
In particular, such an SNS service may be an information presentation system in which life sound or media information is conveyed interactively to each other, no complicated operation is demanded, the feeling of being monitored one-sidedly is reduced, and one can nevertheless feel a "sign" of the other party living apart, as if that party lived in the neighborhood. In the second embodiment, the media information to be presented to the other party is decided on an SNS, and only the information decided to be presented is conveyed to the other party.
The information presentation system according to the second embodiment is applicable not only to the information presentation system according to the first embodiment but also, for example, to a system that decides the submission substance (hereinafter referred to also as "submission information") to an SNS and conveys the submission information only to users who have strong friendships on the SNS. In the SNS service, the submitted documents are filtered on the basis of the importance of words, and the publication range of each document is controlled (whether the document is presented restrictively to those having strong friendships or also to those having weak friendships). In particular, the information presentation method according to the second embodiment changes the range in which media information is published (the publication range) on the basis of specific substance information obtained from a result of analysis of media information provided, for example, on an SNS. It is to be noted that the publication range may include both the range of publication destinations and the range of information to be published.
Where the information presentation method according to the present embodiment is applied to submission data exchanged on an SNS, information to be conveyed to the other party side can be detected on the basis of whether a response to conveyed message information is received, for example, within a fixed period of time.
[Example of Functional Configuration of Information Presentation Apparatus]
The hardware configuration of the information presentation apparatus according to the second embodiment is similar to the hardware configuration of the information presentation apparatus according to the first embodiment, and therefore overlapping description thereof is omitted herein. Thus, an example of a functional configuration of the information presentation apparatus according to the second embodiment is described with reference to the drawings.
The information presentation apparatus 1 includes an information inputting unit 30a, an information analysis extraction unit 31a, a presentation condition decision unit 32a, a recording unit 33a, a presentation condition changing unit 34a, a presentation decision unit 35a, a transmission unit 36a, a reception unit 37a and an outputting unit 38a. The recording unit 33a records an input information DB 40a, a moderation condition DB 42a, a word list DB 44a and an output condition table 45a.
The information inputting unit 30a inputs media information exchanged on an SNS. The media information includes sound, moving pictures, still pictures and text information (message information). The recording unit 33a records the inputted media information into the input information DB 40a. In the input information DB 40a, a timestamp and a media information file name are recorded similarly as in the life sound DB 20a of the first embodiment.
The information analysis extraction unit 31a delimits submission data into time windows and executes morpheme analysis of the information for each delimited time window. The presentation condition decision unit 32a performs, on the basis of the analysis result received from the information analysis extraction unit 31a, matching with the words of the specific substance stored in the word list DB 44a, an example of which is depicted in the drawings.
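As an illustrative sketch (the morpheme analyzer itself is out of scope here; the morphemes are assumed to be already extracted), the matching might look like this:

```python
def extract_listed_words(tokens, word_list):
    """Return the specific-substance words registered in the word list
    DB 44a that appear among the morphemes of the submission data."""
    return [t for t in tokens if t in word_list]

print(extract_listed_words(["today", "I", "feel", "depression"],
                           {"depression", "help"}))  # -> ['depression']
```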
The word list DB 44a has registered therein in advance words indicating specific substances that are not used frequently in day-to-day submission data, such as words used in a dangerous or abnormal scene or words used in a scene involving urgency. A word group registered in the word list DB 44a is an example of the "specific substance information."
If the submission data of the processing target includes a word registered in the word list DB 44a, then the presentation condition changing unit 34a moderates (changes) the presentation condition for the word. The changed presentation condition is recorded into the moderation condition DB 42a depicted in the drawings.
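A sketch of the moderation itself, modeling the publication range as an ordered scale (the scale and the increasing widths are assumptions):

```python
def moderate_publication_range(matched_words, base_range, increase_width):
    """Widen the publication range according to the increasing width
    recorded for each matched word; a larger value means publication
    to a wider circle of users."""
    widest = base_range
    for word in matched_words:
        widest = max(widest, base_range + increase_width.get(word, 1))
    return widest

# 0 = strong friendships only; 2 = including weak friendships
print(moderate_publication_range(["depression"], 0, {"depression": 2}))  # 2
```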
The presentation decision unit 35a decides, in accordance with the publication range for the specific word, whether or not submission data including the specific word is to be presented to the communication partner. The transmission unit 36a transmits the submission data decided to be published by the presentation decision unit 35a to the communication partner to publish the submission data. The reception unit 37a receives submission data transmitted from the information presentation apparatus 2 of the communication partner. The outputting unit 38a outputs the received submission data.
[Example of Information Presentation Process]
Now, an information presentation process according to the present embodiment is described with reference to the drawings.
The information inputting unit 30a inputs submission data on an SNS (step S50). Then, the information analysis extraction unit 31a delimits the inputted submission data for each given period of time and performs processing for each delimited period. In particular, the information analysis extraction unit 31a delimits the submission data for each given period of time to divide the submission data into time windows (step S51). Then, the information analysis extraction unit 31a executes morpheme analysis of the submission data in a time window to extract a feature portion of the submission substance (step S52). When the morpheme analysis ends, the time window is displaced and a similar process is performed repetitively for the submission data in the displaced time window. Then, the presentation condition changing unit 34a performs a presentation condition changing process (step S53).
(Example of Presentation Condition Changing Process)
An example of the presentation condition changing process is described with reference to the drawings.
According to the example depicted in the drawings, submission data that does not include any word registered in the word list DB 44a is published only within the normal publication range.
On the other hand, where the submission data includes a word of the specific substance to be noted, such as "depression," registered in the word list DB 44a, according to the example depicted in the drawings, the publication range is widened so that the submission data is presented also to users to whom it would not normally be presented.
(Example of Presentation Condition Restoration Process)
According to the presentation condition restoration process, submission data continues to be conveyed for a fixed period of time until the publication range is returned to its original value, as illustrated in the drawings.
(Example of Presentation Condition Learning Process)
The presentation condition changing unit 34a receives, as inputs thereto, the ID of the extracted word and a flag (communication flag) indicative of whether communication has been performed after the publication range was changed using the word (step S66). Then, the presentation condition changing unit 34a decides on the basis of the communication flag whether communication has been performed after the change of the publication range (step S67). If the communication flag has “1” set therein, then this indicates that communication has been performed after the change of the publication range. If the communication flag has “0” set therein, then this indicates that communication has not been performed after the change of the publication range.
If the presentation condition changing unit 34a decides that communication has been performed after the change of the publication range, then the presentation condition changing unit 34a increases the increasing width of the publication range (step S68). On the other hand, if the presentation condition changing unit 34a decides that no communication has been performed after the change of the publication range, then the presentation condition changing unit 34a decreases the increasing width of the publication range (step S69). Thereafter, the presentation condition changing unit 34a sets the communication flag to “0” (step S70) and then returns the processing to the calling source.
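This update mirrors the feature sound learning of the first embodiment; a minimal sketch under the same assumptions about step size and bounds:

```python
def update_increase_width(width, communicated, step=1, lo=0, hi=3):
    """Widen the publication-range increasing width if the change was
    followed by communication (step S68), narrow it otherwise (step S69)."""
    return min(width + step, hi) if communicated else max(width - step, lo)
```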
With the information presentation systems according to the first and second embodiments described above, transmission of information such as sound data or submission data is usually cut off for privacy protection. Then, when event sound different from daily sound, or submission substance different from the ordinary submission substance, is generated, the transmission range of the sound or the submission substance sent to the communication partner is increased so that a greater amount of information can be conveyed to the communication partner. Consequently, interactive communication can be performed smoothly in the system that allows interactive communication between the information presentation apparatuses utilized by the users.
Although the information presentation method, the information presentation program and the information presentation apparatus are described in connection with the embodiments, the information presentation method, the information presentation program and the information presentation apparatus according to the present technology are not limited to the embodiments described above but can be modified and improved in various manners without departing from the spirit and scope of the present technology. Further, the embodiments described above can be combined within a range within which no contradiction arises.
Further, although the output condition in the embodiments described above is changed in real time, the change of the output condition is not limited to this; a plurality of events (sound data and submission data) may be accumulated, and a batch process may be performed so as to change the output condition on the basis of the plurality of pieces of accumulated data.
Additionally, the information presentation method in the embodiments includes (1) dividing the audio data for every specified period, (2) calculating a first feature of each frequency component of each piece of the divided audio data, (3) calculating a second feature of each piece of the divided audio data by applying a specified function to at least one component of each first feature, the specified function being a function of x that corresponds to each frequency component, a derivative or subderivative of the specified function with respect to x being monotonically decreasing within an interval a_b ≤ x ≤ a_t (0 ≤ a_b < a_t ≤ ∞), and the specified function having a lower bound T, and (4) detecting the specified sound in each piece of the divided audio data based on each second feature.
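A sketch of steps (1) through (4) under stated assumptions: the magnitude spectrum as the first feature, x^p with 0 ≤ p < 1 (whose derivative p·x^(p−1) decreases monotonically for x > 0) clipped below at a floor as the specified function with lower bound T, and a simple template matcher for the detection — none of these concrete choices are prescribed by the patent.

```python
import numpy as np

def second_features(audio, rate, p=0.5, floor=1e-3, window_sec=1.0):
    """Steps (1)-(3): divide the audio per window (1), take a magnitude
    spectrum per window as the first feature of each frequency
    component (2), then apply the specified function (3): here x**p
    clipped below at `floor` as a stand-in for the lower bound T."""
    win = int(rate * window_sec)
    return [np.maximum(np.abs(np.fft.rfft(audio[s:s + win])) ** p, floor)
            for s in range(0, len(audio) - win + 1, win)]

def detect_specified_sound(feats, template, max_dist=10.0):
    """Step (4): flag windows whose second feature lies close to a
    template second feature of the specified sound."""
    return [float(np.linalg.norm(f - template)) <= max_dist for f in feats]
```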
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.