The present invention relates to a character information appending method, a character information appending apparatus, and a program.
Conventionally, there is a method of assigning character information to still images and moving images (for example, Patent Literature 1). In this method, reference images to which characters representing the features of an image have been assigned in advance are prepared, a relevance degree between a given image and each reference image is calculated, and the character information assigned to a reference image whose relevance degree is equal to or greater than a threshold value is assigned to the given image.
However, in graphic recording and similar settings, in which information related to the content of a dialogue conducted by a plurality of persons is expressed as drawings, simple illustrations are often used symbolically, and it is difficult to assign character information based on the relevance degree with a reference image. In addition, the conventional method cannot use the content of the dialogue conducted in relation to the drawn picture for assigning character information, so the relevance degree between the assigned character information and the drawn picture becomes low.
The present invention was made in view of the above points, and an object thereof is to reduce the burden of assigning (appending) character information to drawing content.
In order to solve the above problems, a computer executes: a character information acquisition procedure of acquiring, for each utterance in a dialogue, character information indicating the content of the utterance and information indicating the timing at which the utterance is made; a picture area information acquisition procedure of acquiring information indicating the area of a picture drawn during the dialogue and information indicating the timing at which the picture is drawn; and an association procedure of specifying the character information to be associated with the area based on the timing at which the picture is drawn and the timing at which the utterance is made.
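For illustration only, the three procedures can be sketched as the following minimal data model; the class and field names (Utterance, PictureArea, frame, and so on) are assumptions of this sketch and do not appear in the specification.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    words: list      # character information indicating the utterance content
    frame: int       # index of the time frame in which the utterance was made

@dataclass
class PictureArea:
    bbox: tuple      # area of the drawn picture, e.g. a circumscribed rectangle
    frame: int       # index of the time frame in which the picture was drawn

def associate(utterances, areas):
    """Pair each drawn area with the words uttered in the same time frame."""
    words_by_frame = {}
    for u in utterances:
        words_by_frame.setdefault(u.frame, []).extend(u.words)
    return {a.bbox: words_by_frame.get(a.frame, []) for a in areas}
```

A refinement of this simple same-frame pairing (shifting the picture frame relative to the utterance frame) is described in the first embodiment.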
It is possible to reduce a burden of assigning (appending) character information to drawing content.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In a first embodiment, an example will be described in which, when a picture, an illustration, a drawing, or the like (hereinafter referred to as a “picture”) related to a dialogue is drawn at any time during the dialogue, as in graphic recording, character information is assigned (appended) to the picture by using the time (timing) at which the picture is drawn and the time (timing) of the dialogue.
A program for realizing processing in the character information assigning device 10 is provided by a recording medium 101 such as a CD-ROM. When the recording medium 101 storing the program is set in the drive device 100, the program is installed in the auxiliary storage device 102 from the recording medium 101 via the drive device 100. However, the program is not necessarily installed from the recording medium 101 and may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program and also stores necessary files, data, and the like.
When an instruction to start the program is issued, the memory device 103 reads the program from the auxiliary storage device 102 and stores the program. The CPU 104 executes a function related to the character information assigning device 10 according to the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network.
The character information acquisition unit 11 receives the voice of the dialogue as input, acquires character information from the voice (information indicating the timing of each utterance and character information indicating the content of the utterance), and records the acquired information in the character information storage unit 21 as a character information DB.
The picture area information acquisition unit 12 receives, as input, a photographed image of the area where drawing is scheduled to be performed during the dialogue (a paper surface, a whiteboard, a screen serving as a digital drawing destination, or the like) or a digitally drawn image captured at any time. From this input, it acquires information indicating the timing at which the picture is drawn and information indicating the area of the drawn picture, and records the acquired information in the picture area information storage unit 22 as a picture area information DB.
The association unit 13 generates an area character information correspondence DB by specifying character information to be associated with the area where drawing was performed, based on the information indicating the timing at which the utterance was performed and the information indicating the timing at which the drawing was performed, and records the area character information correspondence DB in the correspondence storage unit 23.
In the period during which the dialogue is performed, the character information acquisition unit 11 acquires character information from the input voice, and records the acquired information in the character information storage unit 21 as the character information DB (S101).
Specifically, the character information acquisition unit 11 specifies a time frame (timing) in which each utterance is made based on a voice input from a microphone installed in a place where the dialogue occurs.
Furthermore, the character information acquisition unit 11 extracts words uttered within each time frame as character information from the result of the morphological analysis of the utterance content included in the voice, generates a character information DB, and records the character information DB in the character information storage unit 21. However, morphological analysis is not necessarily used to extract words.
Furthermore, in order to extract characteristic words, only words uttered x or more times, rather than all uttered words, may be extracted. Further, for each word, the deviation of the word's appearance frequency across all the time frames may be calculated (for example, a standard deviation of the number of appearances in each time frame), and only words having a large deviation (for example, a standard deviation of 2.0 or more) may be extracted.
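As one possible sketch of this filtering, assuming the utterances have already been split into per-frame word lists; the function name and the default thresholds below are illustrative choices of this sketch, not values fixed by the specification.

```python
import statistics
from collections import Counter

def characteristic_words(frames, min_count=3, min_std=2.0):
    """frames: list of word lists, one list per time frame.
    Keep words uttered at least `min_count` times overall whose
    per-frame appearance counts have a standard deviation of at
    least `min_std`, i.e. words concentrated in a few frames."""
    total = Counter(w for frame in frames for w in frame)
    kept = []
    for word, n in total.items():
        if n < min_count:
            continue
        per_frame = [frame.count(word) for frame in frames]
        if statistics.pstdev(per_frame) >= min_std:
            kept.append(word)
    return kept
```

A word uttered evenly in every frame has a deviation near zero and is dropped, while a word repeated within one frame survives both filters.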
In addition, during the period in which the dialogue is conducted, the picture area information acquisition unit 12 acquires the picture area information and information indicating the timing at which drawing is performed from the input image (for example, an image obtained by photographing, with a camera, the paper surface, whiteboard, or the like on which a picture is drawn during the dialogue), and records the acquired information in the picture area information storage unit 22 as the picture area information DB (S102). That is, steps S101 and S102 are executed in parallel.
Specifically, for each time frame, the picture area information acquisition unit 12 extracts the area of the picture drawn in that time frame and generates the picture area information DB based on the extraction result. The picture area information may be, for example, information indicating the minimum circumscribed rectangle of the drawn picture.
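A minimum circumscribed (axis-aligned) rectangle can be sketched as follows, assuming the pixels drawn within one time frame have already been extracted, for example by differencing successive photographed images; the coordinate convention is an assumption of this sketch.

```python
def bounding_rectangle(points):
    """points: iterable of (x, y) pixel coordinates drawn in one
    time frame. Returns the minimum circumscribed rectangle as
    (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```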
At any timing in the middle of the dialogue (for example, periodic timing) or at any timing after the end of the dialogue, or in response to a predetermined input by the user, the association unit 13 generates the area character information correspondence DB by specifying the character information associated with the picture area based on the timing at which the picture is drawn and the timing at which the utterance is made, and records the area character information correspondence DB in the correspondence storage unit 23 (S103).
Specifically, the association unit 13 associates the character information (uttered words) with the picture area information based on the time information (time frames) of the character information DB and the picture area information DB. For example, the association unit 13 associates the words uttered in a certain time frame with the picture area of the time frame obtained by adding 30 seconds to that time frame. The association unit 13 records the result of associating the uttered words with the picture area information in the correspondence storage unit 23 as the area character information correspondence DB.
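Under the assumption that both DBs are keyed by the start time of each time frame in seconds, the 30-second shift can be sketched as follows; the dictionary layout is illustrative.

```python
def associate_with_shift(words_by_time, areas_by_time, shift=30):
    """words_by_time: {frame_start_seconds: [uttered words]}
    areas_by_time: {frame_start_seconds: picture area (e.g. a bbox tuple)}
    The words uttered in the frame starting at t are associated with
    the picture area of the frame starting at t + shift, reflecting
    that a picture tends to be drawn shortly after the related utterance."""
    result = {}
    for t, words in words_by_time.items():
        area = areas_by_time.get(t + shift)
        if area is not None:
            result[area] = result.get(area, []) + words
    return result
```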
In this manner, the content of the area character information correspondence DB indicates character information assigned to a picture which is drawing content during a dialogue.
Note that, in the above description, the shift time used when associating the character information with the picture area is set to 30 seconds, but a time other than 30 seconds may be used. In addition, instead of shifting the time uniformly, the shift time may be changed dynamically. For example, in a case where characters are included in a picture, character recognition may be performed, and the utterance time frame in which the same word appears may be associated with the picture area of the picture.
As described above, according to the first embodiment, character information is automatically assigned to the drawing content. Therefore, it is possible to reduce the burden of assigning character information to drawing content. That is, character information can easily be assigned to a picture expressed in relation to a dialogue while the dialogue is conducted, as in graphic recording, or to a picture referred to during the dialogue, and the labor required of the user to assign appropriate character information can be reduced. Furthermore, by using the content of the related dialogue for assigning the character information, character information having a high relevance degree to the image information can be assigned.
Next, a second embodiment will be described. In the second embodiment, points different from the first embodiment will be described. Points that are not specifically described in the second embodiment may be similar to those in the first embodiment. In the second embodiment, an example will be described in which character information is assigned to drawing content by using the timing of utterances and the timing at which the line of sight of a dialogue participant is directed.
The line-of-sight information acquisition unit 14 acquires, for a certain time frame, the area to which the line of sight of a participant of the dialogue was directed, out of the area in which drawing is performed (a paper surface, a whiteboard, or the like), and the timing at which the line of sight was directed to that area; it records information indicating the timing and information indicating the area to which the line of sight was directed (hereinafter referred to as a “line-of-sight area”) in the line-of-sight area information storage unit 24 as a line-of-sight area information DB. Note that the line of sight may be that of one specific participant (for example, the participant who is drawing) or those of a plurality of participants. In either case, the line-of-sight area of a participant may be specified by, for example, analyzing an image (video) obtained by photographing the dialogue, or by using a wearable device worn by the participant.
The association unit 13 generates the area character information correspondence DB by specifying character information associated with the line-of-sight area based on the timing at which the line of sight is directed to the area and the timing at which the utterance is made, and records the area character information correspondence DB in the correspondence storage unit 23. In the second embodiment, the line-of-sight area is estimated as a picture area. This is because, when drawing is being performed, there is a high possibility that the line of sight of the participant is focused on the drawn figure.
In the period during which the dialogue is performed, the line-of-sight information acquisition unit 14 acquires, for each time frame, information indicating the area where the line of sight of a participant of the dialogue stays for 10 seconds or more in total within the time frame, and records the information indicating the area in the line-of-sight area information storage unit 24 as the line-of-sight area information DB (S202). That is, step S202 is executed in parallel with step S201.
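The 10-second dwell condition can be sketched as follows, assuming gaze measurements arrive as (time, region) samples at a known rate; the sampling model, the frame length, and the region identifiers are all assumptions of this sketch.

```python
from collections import defaultdict

def dwell_areas(gaze_samples, frame_len=60, min_dwell=10, hz=1):
    """gaze_samples: list of (time_seconds, region_id) measurements,
    assumed to arrive at `hz` samples per second. For each time frame
    of `frame_len` seconds, return the regions on which the line of
    sight stayed for at least `min_dwell` seconds in total."""
    dwell = defaultdict(float)  # (frame_index, region) -> seconds
    for t, region in gaze_samples:
        dwell[(int(t // frame_len), region)] += 1.0 / hz
    result = defaultdict(list)
    for (frame, region), secs in dwell.items():
        if secs >= min_dwell:
            result[frame].append(region)
    return dict(result)
```

Regions that are only glanced at briefly fall below the dwell threshold and are excluded from the line-of-sight area information.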
At any timing in the middle of the dialogue (for example, periodic timing) or at any timing after the end of the dialogue, or in response to a predetermined input by the user, the association unit 13 generates the area character information correspondence DB by checking the character information DB against the line-of-sight area information DB and associating the character information with the line-of-sight area information based on the time information of each DB, and records the area character information correspondence DB in the correspondence storage unit 23 (S203).
Specifically, based on the character information of a certain time frame and the line-of-sight area information indicating the area to which the line of sight was directed in that time frame, the association unit 13 associates the uttered words extracted in the time frame with the line-of-sight area of the same time frame.
In this manner, the content of the area character information correspondence DB indicates character information assigned to a picture which is drawing content during a dialogue.
As described above, the same effects as those of the first embodiment can also be obtained by the second embodiment.
Next, a third embodiment will be described. In the third embodiment, points different from the second embodiment will be described. Points that are not specifically described in the third embodiment may be similar to those in the second embodiment.
In the third embodiment, an example in which weighting is performed on the character information (each word) recorded in association with each cell (each partial area constituting the picture area) in the area character information correspondence DB will be described.
The weight calculation unit 15 weights each word stored in the area character information correspondence DB based on the appearance frequency (the number of appearances) of the word in each cell. The result of the weighting by the weight calculation unit 15 is recorded in the weight information storage unit 25.
In step S301, the weight calculation unit 15 refers to the area character information correspondence DB and calculates a weighting coefficient for each word based on the appearance frequency of the word in each cell.
Subsequently, the weight calculation unit 15 records the weighting result as the weighting information DB in the weight information storage unit 25 (S302).
Note that the weight calculation unit 15 may also calculate, for each word in a certain cell, the deviation of the word's appearance frequency in time series (for example, a standard deviation of the appearance frequency in each time frame), and use a value corresponding to the magnitude of the deviation as the weighting coefficient. In this case, the larger the deviation, the larger the weighting coefficient.
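Both weighting schemes (total appearance count, and deviation of the per-frame counts) can be sketched as follows for a single cell; the dictionary layout mapping each word to its per-frame appearance counts is an assumption of this sketch.

```python
import statistics

def word_weights(per_frame_counts, use_deviation=False):
    """per_frame_counts: {word: [appearances in frame 0, frame 1, ...]}
    for one cell. By default the weighting coefficient is the total
    number of appearances; with use_deviation=True it is the standard
    deviation of the per-frame counts, so words concentrated in a few
    time frames receive larger coefficients."""
    if use_deviation:
        return {w: statistics.pstdev(c) for w, c in per_frame_counts.items()}
    return {w: sum(c) for w, c in per_frame_counts.items()}
```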
Note that the weight calculation unit 15 and the weight information storage unit 25 may be combined with the first embodiment.
As described above, according to the third embodiment, the word associated with each cell (partial area constituting the picture area) can be weighted. As a result, relative importance can be assigned to each word corresponding to a figure or the like corresponding to each cell.
Although the embodiments of the present invention have been described in detail above, the present invention is not limited to such specific embodiments, and various modifications and changes can be made within the scope of the gist of the present invention described in the claims.
| Filing Document | Filing Date | Country | Kind |
| --- | --- | --- | --- |
| PCT/JP2020/042780 | 11/17/2020 | WO | |