The present disclosure relates to an information processing device, a control method, and a recording medium for performing a process related to generation of a digest.
There are technologies which generate a digest by editing video data that is raw material data. For example, Patent Literature 1 discloses a method of generating a digest by identifying highlight scenes from a video stream of a sporting event at a venue. Further, Non-Patent Literature 1 discloses Grad-CAM (Gradient-weighted Class Activation Mapping), which is a technique for visualizing the basis for a determination made by a convolutional neural network.
Patent Literature 1: JP 2019-522948A
Non-Patent Literature 1: Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, [Searched on Apr. 27, 2020], Internet <URL: https://arxiv.org/pdf/1610.02391.pdf>
When the degree of importance is calculated for video data serving as raw material data and a digest is generated based on the degree of importance, the model for calculating the degree of importance is required to be sufficiently accurate. In such a case, therefore, it is necessary to appropriately evaluate whether or not the model for calculating the degree of importance has sufficient accuracy.
In view of the above-described issue, it is therefore an example object of the present disclosure to provide an information processing device, a control method, and a storage medium capable of acquiring information suitable for evaluation of a model for calculating the degree of importance used in digest generation.
In one mode of the information processing device, there is provided an information processing device including: an input data acquisition means configured to acquire input data including at least one of video data or audio data; an importance degree calculation means configured to calculate a degree of importance of the input data; and an attention part identification means configured to identify an attention part of the input data in a calculation of the degree of importance.
In one mode of the control method, there is provided a control method executed by a computer, the control method including: acquiring input data including at least one of video data or audio data; calculating a degree of importance of the input data; and identifying an attention part of the input data in a calculation of the degree of importance.
In one mode of the storage medium, there is provided a storage medium storing a program executed by a computer, the program causing the computer to function as: an input data acquisition means configured to acquire input data including at least one of video data or audio data; an importance degree calculation means configured to calculate a degree of importance of the input data; and an attention part identification means configured to identify an attention part of the input data in a calculation of the degree of importance.
An example advantage according to the present invention is to suitably specify a portion noted in the calculation of the degree of importance used in the digest generation.
Hereinafter, an example embodiment of an information processing device, a control method, and a storage medium will be described with reference to the drawings.
(1) System Configuration
The information processing device 1 performs data communication with the input device 2 and the display device 3 through a communication network or through wired or wireless direct communication. Further, when raw material data (also referred to as “input data Di”) to be the target of visualization of the attention part is inputted, the information processing device 1 identifies the attention part noted in generating a digest of the input data Di. The input data Di may be any raw material data stored in the storage device 4 or may be raw material data supplied to the information processing device 1 from an external device other than the storage device 4. Then, the information processing device 1 displays the information on the identified attention part on the display device 3. In this case, the information processing device 1 generates a display signal “S1” for displaying information relating to the identified attention part, and supplies the generated display signal S1 to the display device 3.
The input device 2 is any user interface configured to accept a user input, and examples of the input device 2 include a button, a keyboard, a mouse, a touch panel, and a voice input device. The input device 2 supplies the input signal “S2” generated based on the user input to the information processing device 1. The display device 3 is, for example, a display, a projector or the like, and displays predetermined information based on the display signal S1 supplied from the information processing device 1.
The storage device 4 is a memory configured to store various kinds of information necessary for the process by the information processing device 1. The storage device 4 stores, for example, the importance degree inference engine information D1. The importance degree inference engine information D1 includes parameters of an inference engine (also referred to as “importance degree inference engine”) learned to infer the degree of importance of video data when the video data is inputted thereto. The degree of importance described above serves as an index for determining whether each section constituting the input data Di is an important section or a non-important section in the generation of the digest. The learning model of the importance degree inference engine may be a learning model based on any machine learning technique, such as a neural network or a support vector machine. For example, if the model of the importance degree inference engine is a neural network such as a convolutional neural network, the importance degree inference engine information D1 includes various parameters regarding the layer structure, the neuron structure of each layer, the number of filters and the filter sizes in each layer, and the weights for each element of each filter. In addition, the storage device 4 may store raw material data, for generation of the digest, from which the input data Di is to be selected.
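For illustration, such an importance degree inference engine could be sketched as follows. This is a minimal sketch assuming a PyTorch convolutional model; the class name ImportanceInferenceEngine, the layer sizes, and the choice of stacking frames as channels are all illustrative assumptions, not details fixed by this disclosure.

```python
import torch
import torch.nn as nn

class ImportanceInferenceEngine(nn.Module):
    """Maps one piece of sample data (a short stack of frames) to a scalar
    degree of importance in [0, 1]."""
    def __init__(self, num_frames: int = 3):
        super().__init__()
        # The RGB frames of one section are stacked along the channel axis.
        self.features = nn.Sequential(
            nn.Conv2d(3 * num_frames, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3 * num_frames, H, W)
        return self.head(self.features(clip))
```

Under these assumptions, the learned parameters of such a model (its state dictionary) would correspond to the importance degree inference engine information D1.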
The storage device 4 may be an external storage device such as a hard disk connected to or built into the information processing device 1, or may be a storage medium such as a flash memory. The storage device 4 may be a server device configured to perform data communication with the information processing device 1. The storage device 4 may include a plurality of devices.
The configuration of the attention part visualization system 100 described above is merely an example, and various changes may be made to the configuration.
(2) Hardware Configuration of Information Processing Device
The processor 11 executes a predetermined process by executing a program stored in the memory 12. The processor 11 is one or more processors such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a quantum processor.
The memory 12 is configured by various volatile and non-volatile memories such as a RAM (Random Access Memory) and a ROM (Read Only Memory). In addition, a program executed by the information processing device 1 is stored in the memory 12. The memory 12 is used as a work memory and temporarily stores information acquired from the storage device 4. The memory 12 may function as the storage device 4. Similarly, the storage device 4 may function as the memory 12 of the information processing device 1. The program executed by the information processing device 1 may be stored in a storage medium other than the memory 12.
The interface 13 is an interface for electrically connecting the information processing device 1 to other devices. For example, the interface for connecting the information processing device 1 to other devices may be a communication interface such as a network adapter for performing transmission and reception of data to and from other devices by wired or wireless communication under the control of the processor 11. In other examples, the information processing device 1 may be connected to other devices by a cable or the like. In this instance, the interface 13 includes a hardware interface compliant with USB (Universal Serial Bus), SATA (Serial AT Attachment), and the like for exchanging data with other devices.
The hardware configuration of the information processing device 1 is not limited to the configuration described above.
(3) Functional Block
The input data acquisition unit 14 acquires the input data Di and supplies the acquired input data Di to the importance degree calculation unit 15 and the output control unit 17. In this case, for example, the input data acquisition unit 14 acquires, as the input data Di, the video data received from an external device through the interface 13. In another example, the input data acquisition unit 14 acquires, as the input data Di, video data that is specified, from among the video data stored in the storage device 4 or in the memory 12, by the input signal S2 generated based on the user input to the input device 2.
The importance degree calculation unit 15 calculates the degree of importance, in time series, of the input data Di based on the input data Di supplied from the input data acquisition unit 14. Then, the importance degree calculation unit 15 supplies information (also referred to as “importance degree information Ii”) indicating the calculated time-series degrees of importance to the output control unit 17. In this case, the importance degree calculation unit 15 configures the importance degree inference engine by referring to the importance degree inference engine information D1, and generates the importance degree information Ii obtained by inputting the input data Di to the importance degree inference engine. For example, data (also referred to as “sample data”) corresponding to each section with a predetermined time length, obtained by equally dividing the input data Di, is inputted to the importance degree inference engine. Here, the importance degree inference engine is a learning model learned to infer, when the sample data is inputted thereto, the degree of importance of the section corresponding to the inputted sample data. In this case, for example, the importance degree calculation unit 15 acquires the time-series degrees of importance of the input data Di by sequentially inputting all the sample data, obtained by dividing the input data Di in section units, to the importance degree inference engine.
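As a concrete sketch of this time-series calculation, the following illustrative Python function (continuing the assumptions of the previous sketch, including the hypothetical ImportanceInferenceEngine) equally divides a frame tensor into sections and scores each section:

```python
import torch

def compute_time_series_importance(frames: torch.Tensor, engine,
                                   frames_per_section: int = 3):
    """frames: (T, 3, H, W) video tensor; engine: e.g., the model sketched
    above. Returns one degree of importance per equal-length section."""
    scores = []
    for start in range(0, frames.shape[0] - frames_per_section + 1,
                       frames_per_section):
        section = frames[start:start + frames_per_section]  # one sample data
        clip = section.reshape(1, -1, *section.shape[2:])    # stack channels
        with torch.no_grad():
            scores.append(engine(clip).item())
    return scores  # the importance degree information Ii, in time series
```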
In addition, the importance degree calculation unit 15 supplies information (also referred to as “intermediate calculation information Im”) representing the intermediate result generated in the process of calculating the degree of importance to the attention part identification unit 16. In this case, for example, the importance degree inference engine has a multi-layer structure with three or more layers, and the importance degree calculation unit 15 supplies, to the attention part identification unit 16, the intermediate calculation information Im that is the output value (e.g., the gradient with respect to the output regarding the prediction class) from the intermediate layer(s) of the importance degree inference engine when the sample data described above is inputted thereto. The intermediate calculation information Im may be, for example, map information indicating the degree of attention for each pixel or for each sub-pixel of one or more images (frames) constituting the sample data, or may be information indicating the degree of attention of each of plural images constituting the sample data. It is noted that, for example, the importance degree calculation unit 15 can generate the intermediate calculation information Im described above by using a method according to Grad-CAM, a technique for visualizing the determination basis of a convolutional neural network, or according to an application of that technique.
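A hedged sketch of how the intermediate calculation information Im might be obtained in the manner of Grad-CAM (Non-Patent Literature 1) is shown below: the gradient of the importance score with respect to an intermediate convolutional layer is pooled into per-channel weights, which produce a per-pixel map of the degree of attention. The hook-based mechanics assume PyTorch; nothing here is mandated by the disclosure.

```python
import torch
import torch.nn.functional as F

def grad_cam_attention(engine, clip, target_layer):
    """Returns an (H, W) map of the degree of attention for one sample data."""
    activations, gradients = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda mod, inp, out: activations.update(a=out))
    h2 = target_layer.register_full_backward_hook(
        lambda mod, g_in, g_out: gradients.update(g=g_out[0]))
    score = engine(clip)              # degree of importance for this sample
    engine.zero_grad()
    score.sum().backward()            # gradients w.r.t. the intermediate layer
    h1.remove(); h2.remove()
    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=clip.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()     # map info in Im
```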
On the basis of the intermediate calculation information Im supplied from the importance degree calculation unit 15, the attention part identification unit 16 identifies the attention part in the input data Di and supplies information (also referred to as “attention part information In”) indicating the identified attention part to the output control unit 17. Details of the process by the attention part identification unit 16 will be described later.
The output control unit 17 generates a display signal S1 for explicitly indicating an attention part based on the input data Di supplied from the input data acquisition unit 14, the importance degree information Ii supplied from the importance degree calculation unit 15, and the attention part information In supplied from the attention part identification unit 16. Then, the output control unit 17 supplies the generated display signal S1 to the display device 3 via the interface 13. The display example by the output control unit 17 will be described later. The output control unit 17 may control, in addition to the display device 3, the sound output device for audio output. For example, the output control unit 17 may output a guidance voice or the like relating to the attention part to the sound output device.
Each component of the input data acquisition unit 14, the importance degree calculation unit 15, the attention part identification unit 16, and the output control unit 17 described above can be realized, for example, by the processor 11 executing a program.
(4) Identification of Attention Part
Next, specific examples of the identification of the attention part by the attention part identification unit 16 will be described.
In this case, the importance degree calculation unit 15 inputs the image 8 to the importance degree inference engine as the sample data and supplies the intermediate calculation information Im corresponding to the image 8 to the attention part identification unit 16. In this case, for example, the intermediate calculation information Im is map information indicative of the degree of attention for each pixel or for each sub-pixel in the image 8. Then, based on the intermediate calculation information Im supplied from the importance degree calculation unit 15, the attention part identification unit 16 identifies an area of the image 8 surrounded by the frame 9 as an area (also referred to as “attention area”) corresponding to the attention part. Here, the attention part identification unit 16 identifies, as the attention area, the minimum rectangular area surrounding all, or more than a predetermined percentage (e.g., 90%), of the pixels where the degree of attention according to the above-described map information is equal to or larger than a predetermined threshold value. Instead of identifying a rectangular area as the attention area, the attention part identification unit 16 may identify an area with an arbitrary shape as the attention area. In this case, the attention part identification unit 16 may identify, as the attention area, the partial area constituted by the pixels where the degree of attention is equal to or larger than the predetermined threshold value, as it is.
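The rectangle identification just described can be sketched as follows. This minimal NumPy version implements the variant that encloses all pixels at or above the threshold (the predetermined-percentage variant would discard outlying pixels first), and the threshold value is an assumption for illustration.

```python
import numpy as np

def attention_area(attention_map: np.ndarray, threshold: float = 0.5):
    """attention_map: (H, W) degrees of attention. Returns the minimal
    rectangle (top, left, bottom, right) enclosing every pixel whose degree
    of attention is at or above the threshold, or None if there is none."""
    ys, xs = np.where(attention_map >= threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())
```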
In this case, the importance degree calculation unit 15 inputs three images 8a to 8c as sample data to the importance degree inference engine and supplies the intermediate calculation information Im indicating the intermediate calculation result outputted by the importance degree inference engine to the attention part identification unit 16. In this case, the intermediate calculation information Im is, for example, map information indicative of the degree of attention in pixel units or in sub-pixel units for each of the images 8a to 8c. Then, based on the above-described map information supplied from the importance degree calculation unit 15, the attention part identification unit 16 identifies a partial area of the image 8a surrounded by the frame 9a, a partial area of the image 8b surrounded by the frame 9b, and a partial area of the image 8c surrounded by the frame 9c as attention areas corresponding to the attention parts, respectively.
In this way, when there are a plurality of images included in the sample data, the attention part identification unit 16 may identify, as the attention part, the attention area in each of the images included in the sample data, as in the example described above.
In this case, the importance degree calculation unit 15 inputs the three images 8a to 8c as sample data to the importance degree inference engine and supplies the intermediate calculation information Im indicating the intermediate calculation result outputted by the importance degree inference engine to the attention part identification unit 16. In this case, the intermediate calculation information Im is information indicating the degree of attention in image units for each of the images 8a to 8c included in the sample data. Then, the attention part identification unit 16 identifies an image (also referred to as “attention image”) corresponding to the attention part based on the intermediate calculation information Im. In this case, for example, the attention part identification unit 16 identifies, as the attention part, the image having the highest degree of attention, or one or more images having a degree of attention equal to or higher than a predetermined threshold.
In this way, when there are a plurality of images included in the sample data, the attention part identification unit 16 may identify the attention part in image units.
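A sketch of this image-unit identification, under the assumption that the intermediate calculation information Im supplies one degree of attention per image, might be:

```python
def attention_images(per_image_attention, threshold=None):
    """per_image_attention: one degree of attention per image in the sample
    data. Returns the indices of the attention image(s): the single highest
    image when no threshold is given, else all images at or above it."""
    if threshold is None:
        best = max(range(len(per_image_attention)),
                   key=per_image_attention.__getitem__)
        return [best]
    return [i for i, a in enumerate(per_image_attention) if a >= threshold]
```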
(5) Learning of Importance Degree Inference Engine
Next, the generation of the importance degree inference engine information D1 will be described.
The configuration of the learning device 6 is the same as that of the information processing device 1 described above.
The training data D2 is a set of training samples, each of which is a combination of video data, which serves as input data to the importance degree inference engine, and a correct answer label indicating whether the video data is important or non-important. The training data D2 contains both video data (non-importance data) associated with a correct answer label indicating non-importance and video data (importance data) associated with a correct answer label indicating importance. It is noted that the video data serving as the input data to the importance degree inference engine is data including one or more images.
The learning device 6 learns, by using the training data D2, the importance degree inference engine configured to output, when the video data is inputted thereto, the degree of importance indicated by the corresponding correct answer label. In this case, for example, the learning device 6 may regard the degree of importance as the minimum value for a correct answer label indicating non-importance, and regard the degree of importance as the maximum value for a correct answer label indicating importance.
Then, the learning device 6 determines the parameters of the importance degree inference engine so that the error (loss) between the output from the importance degree inference engine, when the video data included in the training data D2 is inputted thereto, and the correct answer label corresponding to the inputted video data is minimized. The algorithm for determining the parameters so as to minimize the loss may be any learning algorithm used in machine learning, such as the gradient descent method or the error back-propagation method.
Then, the learning device 6 stores the parameters of the importance degree inference engine obtained by learning as the importance degree inference engine information D1. The generated importance degree inference engine information D1 may be immediately stored in the storage device 4 by data communication between the storage device 4 and the learning device 6, or may be stored in the storage device 4 through a removable storage medium.
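For illustration, the learning by the learning device 6 might look like the following sketch, under the assumptions that the correct answer labels are encoded as 0 (non-important, minimum value) and 1 (important, maximum value) and that a binary cross-entropy loss is used; both encodings are consistent with, but not fixed by, the description above.

```python
import torch
import torch.nn as nn

def train_engine(engine, loader, epochs: int = 10, lr: float = 1e-3):
    """loader yields (video clip, correct answer label) pairs from the
    training data D2. Returns the learned parameters, i.e., the importance
    degree inference engine information D1."""
    optimizer = torch.optim.SGD(engine.parameters(), lr=lr)  # gradient descent
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        for clip, label in loader:
            optimizer.zero_grad()
            loss = loss_fn(engine(clip).squeeze(1), label.float())
            loss.backward()                # error back-propagation
            optimizer.step()               # minimize the error (loss)
    return engine.state_dict()
```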
(6) Display Examples
Next, a description will be given of display examples of the screen image displayed on the display device 3 under the control of the output control unit 17. Schematically, when any section corresponding to the input data Di is specified, the output control unit 17 displays on the display device 3 the attention part noted in the calculation of the degree of importance corresponding to the specified section, in association with the sample data corresponding to the specified section. Accordingly, the output control unit 17 enables the viewer to suitably check the information relating to the attention part on the screen image. In this case, the viewer can judge whether or not the importance degree inference engine attends to the correct part in calculating the degree of importance, and can therefore evaluate the learning accuracy of the importance degree inference engine. Hereafter, the screen image displayed on the display device 3 under the control of the output control unit 17 is also referred to as the “learning accuracy evaluation view”.
The output control unit 17 provides, on the learning accuracy evaluation view according to the first display example, an attention part display area 30 for displaying sample data and an attention part corresponding to a section specified by the user, and a seek bar 38 for specifying a section in which attention parts are visualized.
Here, the seek bar 38 is a bar that clearly indicates the playback time length (40 minutes in this case) of the input data Di, and a slide 39 is provided on the seek bar 38 for specifying a target section (in this case, a section corresponding to 12 minutes 30 seconds) for visualization of the attention parts. The output control unit 17 moves the slide 39 on the seek bar 38 to a position specified by the user based on the input signal S2 generated by the input device 2.
The output control unit 17 extracts the sample data corresponding to the section specified by the slide 39 from the input data Di, and displays the corresponding attention parts on the attention part display area 30 in association with the images included in the extracted sample data.
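The mapping from the position of the slide 39 to a section of the input data Di reduces to simple arithmetic; a sketch, assuming equal-length sections as in the earlier sketches:

```python
def section_index_for_time(seconds: float, section_length_s: float) -> int:
    """E.g., a slide at 12 min 30 s (750 s) with 6 s sections -> section 125."""
    return int(seconds // section_length_s)
```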
As described above, in the first display example, the output control unit 17 can suitably present, to the viewer, the attention parts identified by the attention part identification unit 16 for the sample data corresponding to the section specified by the user. This makes it possible for the viewer to confirm whether or not the importance degree inference engine attends to the correct parts in calculating the degree of importance, and to evaluate the learning accuracy of the importance degree inference engine. When the sample data is configured by a single image, the output control unit 17 displays the learning accuracy evaluation view, which displays the partial area serving as the attention part in the image, in the same manner as described above.
In the second display example, the attention part identification unit 16 identifies the attention image for each sample data as the attention part, and supplies the attention part information In indicating the attention image to the output control unit 17. Then, the output control unit 17 extracts the sample data corresponding to the section specified by the seek bar 38 from the input data Di, and displays the images 31a to 31c included in the extracted sample data on the attention part display area 30. In this case, on the basis of the attention part information In, the output control unit 17 highlights, by an edging effect, the image 31b specified as the attention image.
As described above, in the second display example, the output control unit 17 presents, to the viewer, the attention image identified by the attention part identification unit 16 for the sample data corresponding to the section specified by the user, and suitably enables the viewer to evaluate the learning accuracy of the importance degree inference engine. The output control unit 17 may also specify the degree of attention of each image (here, the images 31a to 31c) and display each degree of attention in association with the corresponding image.
(7) Process Flow
First, the input data acquisition unit 14 of the information processing device 1 acquires the input data Di (step S11). Next, the importance degree calculation unit 15 of the information processing device 1 extracts sample data, which is data for one sample that can be inputted to the importance degree inference engine, from the input data Di (step S12). In this case, for example, the importance degree calculation unit 15 extracts the sample data corresponding to an unextracted section of the input data Di, in order from the section with the earliest playback time.
Then, the importance degree calculation unit 15 calculates the degree of importance for the sample data extracted at step S12 (step S13). In this case, the importance degree calculation unit 15 configures the importance degree inference engine by referring to the importance degree inference engine information D1, and inputs the above-described sample data into the importance degree inference engine to calculate the degree of importance.
Further, the attention part identification unit 16 of the information processing device 1 identifies the attention part noted in the calculation of the degree of importance for the sample data extracted at step S12 (step S14). In this case, based on the intermediate calculation information Im supplied from the importance degree calculation unit 15, the attention part identification unit 16 identifies the attention area in each image included in the sample data or the attention image among the images included in the sample data as the attention part.
Next, the information processing device 1 determines whether or not the processes at step S12 to step S14 have been completed for the entire input data Di (step S15). If the information processing device 1 determines that the processes at step S12 to step S14 have not yet been completed for the entire input data Di (step S15; No), the information processing device 1 returns to the process at step S12. In this case, the information processing device 1 executes the processes at step S12 to step S14 on the sample data corresponding to an unextracted section of the input data Di.
On the other hand, if the information processing device 1 determines that the processes at step S12 to step S14 have been completed for the entire input data Di (step S15; Yes), the output control unit 17 of the information processing device 1 performs output control of the information relating to the attention part (step S16). In this case, based on the input data Di supplied from the input data acquisition unit 14, the importance degree information Ii supplied from the importance degree calculation unit 15, and the attention part information In supplied from the attention part identification unit 16, the output control unit 17 generates the display signal S1 relating to the learning accuracy evaluation view described above, and supplies the generated display signal S1 to the display device 3 via the interface 13.
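Combining the earlier sketches, the flow of steps S11 to S16 could be tied together as follows (same assumptions and hypothetical helper names as above):

```python
import torch

def visualize_attention_parts(frames, engine, target_layer,
                              frames_per_section: int = 3):
    """frames: (T, 3, H, W) input data Di acquired at step S11."""
    results = []  # one (degree of importance Ii, attention part In) per section
    for start in range(0, frames.shape[0] - frames_per_section + 1,
                       frames_per_section):                        # step S12
        section = frames[start:start + frames_per_section]
        clip = section.reshape(1, -1, *section.shape[2:])
        cam = grad_cam_attention(engine, clip, target_layer)        # Im
        with torch.no_grad():
            importance = engine(clip).item()                        # step S13
        area = attention_area(cam.numpy())                          # step S14
        results.append((importance, area))
    return results  # passed to the output control at step S16
```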
(8) Modifications
Next, a description will be given of modifications suitable for the above example embodiment. The following modifications may be applied to the example embodiment described above in any combination.
(First Modification)
When detecting a user input that specifies information regarding the correctness of an attention part on the learning accuracy evaluation view, the information processing device 1 may perform the training (learning) of the importance degree inference engine based on the information regarding the correctness specified by the user input.
The training (learning) unit 18 updates the importance degree inference engine information D1 by performing the training of the importance degree inference engine based on the input signal S2 that specifies the correctness of an attention part, or a correct attention part, on the learning accuracy evaluation view. For example, when the training unit 18 detects, based on the input signal S2, that the correctness of the attention part indicated on the learning accuracy evaluation view has been specified, the training unit 18 trains the importance degree inference engine, which outputs the intermediate calculation information Im, based on the presented sample data and attention part and on the specified correctness. For example, when the input signal S2 indicates that the attention part is correct, the training unit 18 trains the importance degree inference engine by using the combination of the sample data and the attention part indicated on the learning accuracy evaluation view as a positive example. Further, when a correct attention part is specified by the user input on the learning accuracy evaluation view, the training unit 18 trains the importance degree inference engine, which outputs the intermediate calculation information Im, by using the combination of the sample data inputted to the importance degree inference engine and the attention part specified by the user input.
In this case, the output control unit 17 extracts the sample data corresponding to the section (here, the section corresponding to 25 minutes and 39 seconds) specified by the seek bar 38 from the input data Di, and displays the image 31 which is the extracted sample data on the attention part display area 30 together with the rectangular frame 32 indicating the attention area. Further, on the learning accuracy evaluation view, the output control unit 17 displays a radio button 33 which is a button for selecting whether the attention part (here, the attention area) presented on the attention part display area 30 is appropriate or inappropriate.
Furthermore, when the attention area is inappropriate, the output control unit 17 displays a message indicating that the correct attention area should be specified on the image, and accepts the designation of the correct attention area on the image 31. In this example, the user specifies, on the image 31, the rectangular frame 35 indicating the correct attention area.
When the confirm button 34 is selected, the output control unit 17 supplies information relating to the selection result of the radio button 33 and to the designated position of the rectangular frame 35 on the image 31 to the training unit 18. Then, based on the information supplied from the output control unit 17, the training unit 18 performs the training of the importance degree inference engine, which outputs the intermediate calculation information Im used for determining the attention part.
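One way such feedback training could be realized, purely as an assumption for illustration, is an attention-guidance loss that pulls the attention map toward a mask built from the user's rectangular frame 35; the disclosure states only that the engine outputting Im is retrained from the feedback, not this particular loss.

```python
import torch
import torch.nn.functional as F

def feedback_loss(cam: torch.Tensor, correct_area, weight: float = 1.0):
    """cam: (H, W) attention map, computed with the graph retained so that
    this auxiliary loss can be back-propagated; correct_area: the user's
    (top, left, bottom, right) rectangle."""
    target = torch.zeros_like(cam)
    t, l, b, r = correct_area
    target[t:b + 1, l:r + 1] = 1.0    # 1 inside the correct attention area
    return weight * F.mse_loss(cam, target)
```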
Thus, according to this modification, it is also possible to improve the accuracy of the importance degree inference engine by accepting feedback from the user. When the attention image is presented on the attention part display area 30, the information processing device 1 receives a user input that specifies the correct attention image from among the plurality of images serving as the sample data on the learning accuracy evaluation view.
(Second Modification)
When audio data is included in the input data Di, the information processing device 1 may calculate the degree of importance in consideration of the audio data and identify the attention part in calculation of the degree of importance.
The output control unit 17 displays the image 31 corresponding to the section specified by the seek bar 38 on the attention part display area 30, and displays a sound playback icon 37 for playing back the audio data corresponding to the image 31. Here, as an example, it is assumed that one piece of sample data includes one image, as in the third display example of the learning accuracy evaluation view. Further, when detecting that the sound playback icon 37 is selected, the output control unit 17 plays back the audio data corresponding to the sample data.
Further, on the attention part display area 30, the output control unit 17 clearly indicates each degree of attention of the video data (here, the image) and the audio data in the calculation of the degree of importance. In this case, for example, the importance degree calculation unit 15 supplies the intermediate calculation information Im indicating at least the respective degrees of attention of the video data and the audio data to the attention part identification unit 16.
Then, based on the intermediate calculation information Im supplied from the importance degree calculation unit 15, the attention part identification unit 16 supplies the attention part information In, which indicates at least the ratio between the degree of attention of the video data and the degree of attention of the audio data, to the output control unit 17. Then, the output control unit 17 recognizes the ratio between the degree of attention of the video data and the degree of attention of the audio data (here, 8:2) in the calculation of the degree of importance based on the attention part information In, and displays the ratio on the attention part display area 30 as the degrees of attention of the video data and the audio data.
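As an illustrative sketch, such a ratio could be derived by comparing aggregate attention magnitudes per modality; the aggregation by summed magnitude is an assumption, not a detail fixed by the disclosure.

```python
import numpy as np

def modality_attention_ratio(video_attention, audio_attention):
    """Each argument: an array of degrees of attention for one modality.
    Returns (video share, audio share), e.g., (0.8, 0.2) for a ratio of 8:2."""
    v = float(np.abs(np.asarray(video_attention)).sum())
    a = float(np.abs(np.asarray(audio_attention)).sum())
    total = v + a
    return (v / total, a / total) if total > 0 else (0.5, 0.5)
```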
In the case where the sample data includes a plurality of images, the output control unit 17 may display the plurality of images side by side on the attention part display area 30, and display the respective degrees of attention of the video data, which includes the plurality of images, and of the audio data.
Accordingly, the information processing device 1 according to the second modification can suitably visualize the attention part in the calculation of the degree of importance even when the degree of importance is calculated based on both the video data and the audio data.
(Third Modification)
The information processing device 1 may calculate the degree of importance of the input data Di based only on the audio data. In this case, the information processing device 1 may identify the attention part in the audio data and display the information regarding the attention part.
In this case, the output control unit 17 extracts, from the input data Di, the sample data which is configured by audio data and which corresponds to the section (here, 7 minutes 13 seconds) specified by the seek bar 38. Then, the output control unit 17 displays the waveform of the extracted audio data in the sound waveform display area 41, and displays an image corresponding to the calculation result of the frequency spectrum of the audio data in the sound spectrogram display area 42.
Further, the output control unit 17 identifies the frequency region corresponding to the attention part based on the attention part information In supplied from the attention part identification unit 16, and highlights the identified frequency region on the sound spectrogram display area 42. Here, as an example, the importance degree calculation unit 15 supplies the intermediate calculation information Im indicating the degree of attention for each frequency to the attention part identification unit 16. Then, on the basis of the intermediate calculation information Im, the attention part identification unit 16 identifies, as the attention part, the frequency region where the degree of attention is higher than a threshold, and supplies the attention part information In indicating the identified frequency region to the output control unit 17. Instead of identifying a frequency region in the sample data as the attention part, the attention part identification unit 16 may identify, as the attention part, a sub-section having a particularly high degree of attention within the section corresponding to the sample data. In this case, the output control unit 17 may highlight the sub-section indicated by the attention part information In supplied from the attention part identification unit 16 on the sound waveform display area 41 or on the sound spectrogram display area 42.
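A sketch of this frequency-region identification, assuming the intermediate calculation information Im takes the form of per-bin degrees of attention over a spectrogram (the pooling over time is likewise an assumption):

```python
import numpy as np

def attention_frequencies(attention_spectrogram: np.ndarray,
                          threshold: float = 0.5):
    """attention_spectrogram: (freq_bins, time_frames) degrees of attention.
    Returns the indices of frequency bins whose peak degree of attention is
    at or above the threshold, i.e., the bins to highlight."""
    per_freq = attention_spectrogram.max(axis=1)
    return np.where(per_freq >= threshold)[0]
```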
Thus, even when the information processing device 1 calculates, based on the audio data, the degree of importance serving as an index necessary for digest generation, it can suitably visualize the attention part in the calculation of the degree of importance.
(Fourth Modification)
The attention part visualization system 100 may be a client-server model.
The terminal device 5 is a terminal equipped with an input function, a display function, and a communication function, and functions as the input device 2 and the display device 3 described above.
The information processing device 1A has the same configuration as the information processing device 1 described above.
<Second Example Embodiment>
The input data acquisition means 14X is configured to acquire input data “Di” including at least one of video data or audio data. The video data is data including at least one image. Examples of the input data acquisition means 14X include the input data acquisition unit 14 in the first example embodiment.
The importance degree calculation means 15X is configured to calculate a degree of importance of the input data Di. In this case, the importance degree calculation means 15X may equally divide the input data Di into sections each having a predetermined time length and calculate the degree of importance for each section. In this case, the importance degree calculation means 15X calculates the time-series degrees of importance of the input data Di. Examples of the importance degree calculation means 15X include the importance degree calculation unit 15 in the first example embodiment.
The attention part identification means 16X is configured to identify an attention part of the input data Di in a calculation of the degree of importance. It is noted that in a case where the importance degree calculation means 15X calculates the time-series degrees of importance of the input data Di, the attention part identification means 16X may identify an attention part corresponding to at least one of the time-series degrees of importance. Examples of the attention part identification means 16X include the attention part identification unit 16 in the first example embodiment.
The information processing device 1X according to the second example embodiment can suitably identify the attention part in the calculation of the degree of importance for the input data including at least one of the video data or the audio data.
In the example embodiments described above, the program is stored in any type of non-transitory computer-readable medium and can be supplied to a control unit or the like that is a computer. The non-transitory computer-readable medium includes any type of tangible storage medium. Examples of the non-transitory computer-readable medium include a magnetic storage medium (e.g., a flexible disk, a magnetic tape, a hard disk drive), a magneto-optical storage medium (e.g., a magneto-optical disk), a CD-ROM (Read Only Memory), a CD-R, a CD-R/W, and a solid-state memory (e.g., a mask ROM, a PROM (Programmable ROM), an EPROM (Erasable PROM), a flash ROM, a RAM (Random Access Memory)). The program may also be provided to the computer by any type of transitory computer-readable medium. Examples of the transitory computer-readable medium include an electrical signal, an optical signal, and an electromagnetic wave. The transitory computer-readable medium can provide the program to the computer through a wired channel such as wires and optical fibers or through a wireless channel.
The whole or a part of the example embodiments described above (including modifications, the same applies hereinafter) can be described as, but not limited to, the following Supplementary Notes.
[Supplementary Note 1]
An information processing device comprising:
an input data acquisition means configured to acquire input data including at least one of video data or audio data;
an importance degree calculation means configured to calculate a degree of importance of the input data; and
an attention part identification means configured to identify an attention part of the input data in a calculation of the degree of importance.
[Supplementary Note 2]
The information processing device according to Supplementary Note 1,
wherein the importance degree calculation means is configured to calculate the degree of importance of the input data based on an inference engine learned to infer, when data including at least one of video data or audio data is inputted thereto, the degree of importance of the inputted data.
[Supplementary Note 3]
The information processing device according to Supplementary Note 2,
wherein the inference engine has a multi-layer structure, and
wherein the attention part identification means is configured to identify the attention part based on an output from an intermediate layer of the inference engine.
[Supplementary Note 4]
The information processing device according to any one of Supplementary Notes 1 to 3,
wherein the input data includes the video data, and
wherein the attention part identification means is configured to identify, as the attention part, an attention area noted in the calculation of the degree of importance in an image included in the video data.
[Supplementary Note 5]
The information processing device according to any one of Supplementary Notes 1 to 3,
wherein the input data includes the video data, and
wherein the attention part identification means is configured to select, as the attention part, an attention image noted in the calculation of the degree of importance from images included in the video data.
[Supplementary Note 6]
The information processing device according to any one of Supplementary Notes 1 to 3,
wherein the input data includes the audio data, and
wherein the attention part identification means is configured to identify, as the attention part, a section or a frequency of the audio data noted in the calculation of the degree of importance.
[Supplementary Note 7]
The information processing device according to any one of Supplementary Notes 1 to 6,
wherein the input data includes both the video data and the audio data, and
wherein the attention part identification means is configured to identify each degree of attention of the video data and the audio data in the calculation of the degree of importance.
[Supplementary Note 8]
The information processing device according to any one of Supplementary Notes 1 to 7, further comprising
an output control means configured to display information relating to the attention part on a display device.
[Supplementary Note 9]
The information processing device according to Supplementary Note 8,
wherein, when a section corresponding to the input data is specified, the output control means is configured to display, on the display device, the attention part noted in the calculation of the degree of importance corresponding to the specified section in association with the input data corresponding to the section.
[Supplementary Note 10]
The information processing device according to any one of Supplementary Notes 1 to 9, further comprising
a training means configured to train, when there is a user input specifying information on correctness of the attention part, an inference engine used for calculating the degree of importance based on the information on the correctness.
[Supplementary Note 11]
The information processing device according to any one of Supplementary Notes 1 to 10,
wherein the degree of importance is an index that serves as a criterion in generating a digest of the input data.
[Supplementary Note 12]
A control method executed by a computer, the control method comprising:
acquiring input data including at least one of video data or audio data;
calculating a degree of importance of the input data; and
identifying an attention part of the input data in a calculation of the degree of importance.
[Supplementary Note 13]
A storage medium storing a program executed by a computer, the program causing the computer to function as:
an input data acquisition means configured to acquire input data including at least one of video data or audio data;
an importance degree calculation means configured to calculate a degree of importance of the input data; and
an attention part identification means configured to identify an attention part of the input data in a calculation of the degree of importance.
While the invention has been particularly shown and described with reference to example embodiments thereof, the invention is not limited to these example embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims. In other words, it is needless to say that the present invention includes various modifications that could be made by a person skilled in the art according to the entire disclosure, including the scope of the claims, and the technical philosophy. All Patent and Non-Patent Literatures mentioned in this specification are incorporated by reference in their entirety.
1, 1A, 1B, 1X Information processing device
2 Input device
3 Display device
4 Storage device
5 Terminal device
6 Learning device
100, 100B Attention part visualization system
Filing Document: PCT/JP2020/020770 | Filing Date: May 26, 2020 | Country: WO