1. Field of the Invention
The present invention relates to an apparatus for managing video data including a moving image and, more particularly, to a retrieval apparatus, a reproduction apparatus, a recording apparatus, and the like that utilize a feature or a pattern of video data.
2. Description of the Related Art
Conventionally, research has been conducted in the field of information retrieval. In particular, considerably high-accuracy retrieval has been achieved for text data. Similarly, for moving images or still images, services in which retrieval is performed using an input keyword have been provided. For example, a technique that utilizes meta-data of a moving image during retrieval has been proposed (Japanese Unexamined Patent Application Publication No. 2007-12013).
However, an appropriate keyword is not always assigned to video data. Also, for moving image data, photograph data, or the like that is privately recorded by the user, keyword search cannot be performed unless the user has previously associated a keyword with the data.
On the other hand, image recognition technology has advanced, and techniques of analyzing a feature or a pattern of an image to classify or search for video data have been studied (see U.S. Pat. No. 6,665,442). Also, a technique of creating a retrieval menu having good retrieval efficiency using various categorization patterns is known (see U.S. Pat. No. 6,219,665).
In recent years, video data recorders having large-capacity hard disks are becoming widespread. For such recorders, efficient retrieval of video data stored in the hard disk is required.
However, the conventional technique of associating keywords with privately recorded moving images or still images requires considerable time and effort. Also, classification techniques that use features or patterns of images are intended for use by experts or the like; presenting an easily recognizable classification reference to general users has not been taken into consideration.
To solve the above-described problem, the present invention provides the user with an intuitive interface by generating icons that are representative samples matching the results of analysis of feature amounts or patterns.
As described above, the recent increase in hard disk capacity leads to a demand for a function of easily retrieving a desired moving image or still image. Recent DVD (Digital Versatile Disc) recorders have a function of linking to a digital camera, so that retrieval of still images is also an important function. The types of video range widely over TV broadcast video, video downloaded from a network, video recorded by the user, and the like. The encoding format varies among them, and there is no standard format for retrieval. In such a situation, it would be considerably convenient to recognize actual features of moving images or still images and retrieve, for example, a specific human face or a specific sport.
With state-of-the-art image recognition technology, such images can be recognized to some extent, within limits. For example, sports played on a lawn often have features such as intense motion and a green background. News broadcasts have the feature that a person is seated behind a desk.
Data recorded by the user is typically biased, so that general categorization is not helpful. It is also considered that retrieval of data recorded by the user does not need to be perfect, and pattern recognition that provides some guidance is sufficient.
However, it is considerably difficult for the user to input a search pattern such as “green background and intense human motion”. The user usually desires to search for a scene based on its content rather than its image features. It is thus difficult to involve the user directly in such pattern recognition.
Therefore, a main object of the present invention is to explicitly present to the user, in an easily recognizable manner, a pattern that is actually extracted from video data. To achieve this object, an icon reflecting a feature amount of a moving image or a still image is presented to the user.
This icon is not a reduced version of an image (i.e., a so-called thumbnail); rather, it clearly represents a feature amount pattern and is dynamically generated depending on the content to be retrieved. The icon also provides a visible image of retrieval based on a feature amount. It is more universal than a thumbnail and can further emphasize the feature amount pattern. Whereas it is difficult to generate a thumbnail common to a plurality of moving images or still images, such a problem does not arise for icon generation based on a feature amount pattern. These properties are significantly advantageous when the icons are used for retrieval.
According to the present invention, by generating icons from feature amounts of video data, it is possible to create various icons that visually reflect various feature amounts and are easily recognized by the user.
Also, by presenting an icon indicating a feature amount to the user, who in turn selects the icon to perform retrieval using that feature amount, it is possible to achieve retrieval based on a feature amount that the user can easily imagine.
Hereinafter, embodiments in the best mode of the present invention will be described with reference to the accompanying drawings.
The hard disk 11 stores various types of video data, such as encoded moving image data or still image data (in some cases, audio data or meta-data is included).
The drive interface section 12 gives write data 36 to and receives read data 37 from the hard disk 11. The drive interface section 12 also gives write data 38 to and receives read data 39 from the DVD drive 30.
The decoder 13 decodes video data 40 received from the drive interface section 12. The decoding result is supplied as a decoded image 41 to the image synthesis section 19, and as feature amount extraction image data 46 to the feature amount extraction section 16. The decoder 13 also supplies audio data to the feature amount extraction section 16.
The meta-data processing section 14, for example, receives, from the drive interface section 12, meta-data 42 that is stored together with video data in the hard disk 11, and supplies a keyword 43 assigned to the video data to the image synthesis section 19.
The encoder 15, for example, during dubbing, encodes video data 44 received from the decoder 13, and supplies an encoded image 45 to the drive interface section 12.
The feature amount extraction section 16 extracts various feature amounts from video data 46 received from the decoder 13, and supplies feature amount information 48 to the feature amount index control section 17. As used herein, the feature amount ranges widely from an advanced feature amount for recognizing a specific human face to a feature amount representing only color tendency. The feature amount extraction section 16 also supplies algorithm selection information 47 to the decoder 13 so that an appropriate decoding algorithm is designated in the decoder 13.
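By way of illustration only, the following Python sketch shows how simple feature amounts of the kind mentioned above, such as a green-background ratio or a motion intensity, might be computed from decoded frames. The function names, pixel representation, and choice of feature amounts are hypothetical and do not limit the described apparatus.

```python
# Hypothetical sketch of simple feature amount extraction from decoded frames.
# A frame is assumed to be a list of (r, g, b) pixel tuples with values 0-255.

def green_ratio(frame):
    """Fraction of pixels whose green channel dominates (a rough 'lawn' cue)."""
    green = sum(1 for r, g, b in frame if g > r and g > b)
    return green / len(frame) if frame else 0.0

def motion_intensity(prev_frame, frame):
    """Mean absolute luminance difference between consecutive frames."""
    def luma(p):
        r, g, b = p
        return 0.299 * r + 0.587 * g + 0.114 * b
    diffs = [abs(luma(a) - luma(b)) for a, b in zip(prev_frame, frame)]
    return sum(diffs) / len(diffs) if diffs else 0.0

# Example with two tiny 4-pixel "frames"
f0 = [(10, 200, 30), (12, 190, 25), (200, 40, 40), (15, 180, 20)]
f1 = [(12, 205, 28), (10, 188, 27), (210, 45, 38), (20, 175, 22)]
print(green_ratio(f1))           # 0.75 -> strongly green background
print(motion_intensity(f0, f1))  # small value -> little motion
```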
The feature amount index control section 17 pairs the feature amount information 48 received from the feature amount extraction section 16 with the position in the hard disk 11 at which the corresponding video data is stored, records the pair as index information, and gives feature amount information 51 to and receives selected-feature amount information 52 from the icon generation section 18. If the feature amount extraction section 16 is operated during idle time to generate and record index information into the feature amount index control section 17, the speed of the image retrieval described below can be increased. For video data whose index information has not yet been generated, the feature amount index control section 17 receives new feature amount information 48 from the feature amount extraction section 16; in this case, index information may be generated and recorded.
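By way of illustration only, the following Python sketch shows one possible form of the index information kept by the feature amount index control section 17: each entry pairs extracted feature amounts with the storage position of the corresponding video data, so that a selected feature amount range can be mapped back to read positions. The class and field names are hypothetical.

```python
# Hypothetical sketch of a feature amount index: each entry pairs feature
# amounts with the storage position of the corresponding video data.

from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    storage_position: int                          # e.g. byte offset on the disk
    features: dict = field(default_factory=dict)   # feature name -> value

class FeatureIndex:
    def __init__(self):
        self.entries = []

    def add(self, storage_position, features):
        self.entries.append(IndexEntry(storage_position, dict(features)))

    def select(self, name, low, high):
        """Return storage positions whose feature 'name' lies in [low, high]."""
        return [e.storage_position for e in self.entries
                if low <= e.features.get(name, float("-inf")) <= high]

idx = FeatureIndex()
idx.add(1024, {"green_ratio": 0.8, "motion": 42.0})
idx.add(2048, {"green_ratio": 0.1, "motion": 3.5})
print(idx.select("green_ratio", 0.5, 1.0))  # -> [1024]
```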
The icon generation section 18 generates an icon that is a small image reflecting the feature amount information 51 received from the feature amount index control section 17, and supplies an icon image 53 to the image synthesis section 19 and the menu generation section 20.
The image synthesis section 19 combines the decoded image 41 received from the decoder 13, the keyword 43 received from the meta-data processing section 14, and the icon image 53 received from the icon generation section 18 into a single screen image, and supplies synthesized video data 54 to the display device 31.
The user interface section 21 receives user selection information 56 for icon selection via, for example, a remote controller, and supplies icon selection information 57 to the icon generation section 18.
The selected-feature amount information 52, which is supplied to the feature amount index control section 17 from the icon generation section 18 that has received the icon selection information 57, indicates the range of the selected feature amount. The feature amount index control section 17 selects the video data to be read from the hard disk 11 based on the selected-feature amount information 52, and gives a read command 49 to and receives a response signal 50 from the drive interface section 12.
The menu generation section 20 generates a menu for dubbing using the icon image 53 received from the icon generation section 18, and supplies menu data 55 to the drive interface section 12 so that the menu is written into, for example, a DVD.
The video data recorder 10 of
Note that the decoder 13 of
Note that the feature amount extraction section 16 does not require a perfect decoding function from the decoder 13. Depending on the extraction algorithm, the lowest resolution may be sufficient, or handling of very large motion may not be required. In particular, when feature amounts are extracted mainly from still images, it is not necessary to calculate a feature amount at very short time intervals. For example, the decoder 13 can process moving image data as still images provided at a rate of one per second.
Next, the operation of the icon generation section 18, which is a basis of the present invention, will be described. The purpose of an icon as used herein is to convert information about a feature amount into an image that the user can easily imagine. The icon may be a single image, and, when used for retrieval, may represent a feature of a plurality of moving images. In the latter case, when the feature amount varies among the moving images, the icon is not very suitable as a representation of their common feature. Therefore, the icon generation section 18 receives each type of feature amount and, if a plurality of moving images are present, a variance value as an index indicating the variation, and generates an icon accordingly. In other words, the icon generation section 18 receives the types of the feature amounts, the values of the feature amounts, and a variance value of those values. The icon types are categorized into one for the background, one for the foreground, and one relating to audio.
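By way of illustration only, the following Python sketch shows one possible form of the input handed to the icon generation section 18: for each feature amount, its type, its category (background, foreground, or audio), a representative value, and a variance value used when the icon stands for a plurality of moving images. The names and the variance threshold are hypothetical.

```python
# Hypothetical sketch of the input to an icon generator: feature amount type,
# category, representative value, and variance (spread across a group).

from dataclasses import dataclass

@dataclass
class FeatureForIcon:
    name: str          # e.g. "background_green", "face_count", "audio_level"
    category: str      # "background", "foreground", or "audio"
    value: float       # representative value (e.g. mean over the group)
    variance: float    # 0.0 when the icon represents a single image

def describe_icon_inputs(features):
    for f in features:
        spread = "uniform" if f.variance < 0.05 else "varied"
        print(f"{f.category}/{f.name}: value={f.value:.2f} ({spread})")

describe_icon_inputs([
    FeatureForIcon("background_green", "background", 0.72, 0.01),
    FeatureForIcon("human_motion", "foreground", 0.55, 0.20),
    FeatureForIcon("audio_level", "audio", 0.30, 0.02),
])
```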
Each feature amount is associated with corresponding basic icon data and its deformation type. These pieces of information are desirably recorded in the icon generation section 18. Various methods may be used to associate these pieces of information with their deformation types in the icon generation section 18; for general versatility, the association is desirably implemented in software and processed by a processor. In this case, the function can be easily extended by changing the software.
Note that a basic icon is registered for each feature amount. For example, in step 103, by applying to the basic icon a deformation algorithm corresponding to the value or variance of a feature amount, the value of the feature amount can be reflected in the icon display in various ways, so that the actual value or variance of the feature amount can be recognized by the user.
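By way of illustration only, the following Python sketch shows how a deformation algorithm of the kind used in step 103 might adjust a basic icon according to the value and variance of a feature amount. The particular drawable parameters (scale, saturation, blur) are hypothetical choices, not a required embodiment.

```python
# Hypothetical sketch of step 103: a registered basic icon is deformed
# according to the value and variance of the corresponding feature amount,
# so the display reflects how strong and how consistent the feature is.

def deform_icon(basic_icon, value, variance):
    """basic_icon: dict of drawable parameters; returns a deformed copy."""
    icon = dict(basic_icon)
    icon["scale"] = 0.5 + value                    # stronger feature -> larger motif
    icon["saturation"] = max(0.2, 1.0 - variance)  # high variance -> washed out
    icon["blur"] = variance * 5.0                  # high variance -> blurred edges
    return icon

lawn_icon = {"motif": "grass", "scale": 1.0, "saturation": 1.0, "blur": 0.0}
print(deform_icon(lawn_icon, value=0.8, variance=0.3))
```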
As described above, various visual representations can be used to cause the user to strongly imagine feature amounts. This large number of variations is the merit of generation of icons from feature amounts. If only predetermined icons are displayed, such a large number of variations cannot be represented.
Next, moving image retrieval in which the effect of the present invention is most significantly exhibited will be described.
Next, in step 202, the distributions of the feature amounts of the retrieval target files are examined, and the retrieval targets are divided into a plurality of groups. It is expected here that the feature amount distributions are mostly biased, depending on the features of the moving image files. For example, the files may be divided into those having a considerably large value of a specific feature amount and those having a considerably small value of that feature amount. In other words, a larger number of such feature amounts makes categorization easier. Such feature amounts are used to divide all the files into a plurality of groups in step 202. As described below, the categories are displayed as a menu, and therefore, the files are divided into only a number of categories appropriate for display and selection. Note that this number depends on the user's preference, and the number of categories may be set by the user to, for example, 10. In step 203, a representative feature amount is calculated for each category, and its variance is calculated.
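By way of illustration only, the following Python sketch shows one way steps 202 and 203 might be realized: a strongly biased feature amount is used to split the retrieval target files into groups, and a representative value and a variance are then computed for each group. The threshold and feature names are hypothetical.

```python
# Hypothetical sketch of steps 202-203: split retrieval targets on a biased
# feature amount, then summarize each resulting category.

from statistics import mean, pvariance

def split_by_feature(files, feature, threshold):
    """files: list of dicts of feature values; returns (low_group, high_group)."""
    low = [f for f in files if f[feature] < threshold]
    high = [f for f in files if f[feature] >= threshold]
    return low, high

def summarize(group, feature):
    values = [f[feature] for f in group]
    return {"representative": mean(values), "variance": pvariance(values)}

files = [
    {"green_ratio": 0.82, "motion": 0.7},
    {"green_ratio": 0.78, "motion": 0.9},
    {"green_ratio": 0.05, "motion": 0.1},
    {"green_ratio": 0.10, "motion": 0.2},
]
low, high = split_by_feature(files, "green_ratio", 0.5)
print(summarize(high, "green_ratio"))  # lawn-like category
print(summarize(low, "green_ratio"))   # non-lawn category
```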
In step 204, an icon is generated and displayed for each category. In this case, as shown in
In step 205, the process waits for selection by the user. In step 206, it is determined whether or not retrieval is ended. If retrieval continues, the retrieval range is narrowed, depending on the selected icon, in step 207, and thereafter, the process returns to step 202. More detailed retrieval operations are performed while icons corresponding to sub-categories are generated.
The above-described process can be repeated until the selection range becomes small. Since icons optimal for each selection range are displayed, this is highly convenient. When the number of choices becomes small, the user may select the desired video directly.
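By way of illustration only, the following Python sketch outlines the retrieval loop of steps 202 through 207: icons are generated for the current candidate set, the user's icon choice narrows the candidates, and the loop repeats until the remaining choices are few enough to select from directly. The toy categorizer, which simply splits the candidates in half, is purely illustrative.

```python
# Hypothetical sketch of the retrieval loop (steps 202-207).

def retrieve(candidates, choose_icon, categorize, max_choices=5):
    while len(candidates) > max_choices:
        categories = categorize(candidates)   # step 202: group the current set
        picked = choose_icon(categories)      # steps 204-205: show icons, wait
        if picked is None:                    # step 206: user ends retrieval
            break
        candidates = categories[picked]       # step 207: narrow the range
    return candidates                         # small enough to select directly

def halves(c):
    """Toy categorizer: split candidates around their median value."""
    mid = sorted(c)[len(c) // 2]
    return {"low": [x for x in c if x < mid], "high": [x for x in c if x >= mid]}

# Toy usage: the "user" always picks the "low" icon.
result = retrieve(list(range(20)), choose_icon=lambda cats: "low", categorize=halves)
print(result)  # -> [0, 1, 2, 3, 4]
```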
When retrieval is ended, the retrieved moving image(s) or still image(s) are reproduced and displayed in step 208. In this case, if there are a plurality of retrieved moving images or still images, they may be displayed successively. Also, when a feature amount represents a specific scene in a single moving image, only the matching scene may be displayed.
Note that the groups selected by icon selection are desirably the same as the groups obtained by categorization during menu generation, because then icon selection matches the contents of the retrieval. However, the evaluation of image recognition generally varies depending on the user's subjective recognition. Therefore, if the groups selected by icon selection are made to match exactly the groups obtained by categorization during menu generation, an image desired by the user may often fail to be included under the selected icon. Therefore, it is more desirable to select data having a feature amount within a range slightly wider than the feature amount range used for categorization during menu generation. In this way, the possibility of retrieval omission during icon selection can be reduced.
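By way of illustration only, the following Python sketch shows the margin described above: the feature amount range applied when an icon is selected is made slightly wider than the range used to build the menu category, which reduces retrieval omissions for borderline data. The margin ratio is an arbitrary example.

```python
# Hypothetical sketch: widen a feature amount range for icon selection so it
# is slightly larger than the range used for menu categorization.

def widen(low, high, margin_ratio=0.1):
    """Expand a feature amount range [low, high] by a relative margin."""
    margin = (high - low) * margin_ratio
    return low - margin, high + margin

menu_range = (0.5, 0.8)             # range used for categorization in the menu
selection_range = widen(*menu_range)
print(selection_range)              # roughly (0.47, 0.83): wider for selection
```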
Although the icon of the present invention does not require a keyword, a keyword is considered helpful for conveying an image to the user. Therefore, if a keyword is available at the time an icon is generated, the keyword can also be displayed. However, considerably many keywords may be assigned to a single icon. In an extreme case, the same keyword may be displayed for all icons, which is meaningless.
Therefore, the priority of a keyword to be displayed is determined using its frequency of occurrence. Specifically, a keyword that frequently appears in the data belonging to one icon and does not appear in the data belonging to the other icons is given a higher priority. By performing such a process, an appropriate keyword is displayed as required. Meta-data of the video data other than keywords can be handled in the same manner. Also, if an appropriate keyword is not found, no keyword needs to be displayed.
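By way of illustration only, the following Python sketch shows a possible keyword priority computation in the spirit described above: a keyword scores highly for an icon when it appears frequently in that icon's data and rarely in the data of the other icons (a TF-IDF-like weighting; the exact formula is illustrative only, not the required one).

```python
# Hypothetical sketch of keyword priority: frequent within one icon's data,
# rare in the others -> higher priority (TF-IDF-like weighting).

import math
from collections import Counter

def keyword_priorities(keywords_per_icon):
    """keywords_per_icon: dict icon_id -> list of keywords from its data."""
    icon_count = len(keywords_per_icon)
    doc_freq = Counter()                       # in how many icons a keyword appears
    for kws in keywords_per_icon.values():
        doc_freq.update(set(kws))
    scores = {}
    for icon, kws in keywords_per_icon.items():
        tf = Counter(kws)
        scores[icon] = sorted(
            ((kw, tf[kw] * math.log(icon_count / doc_freq[kw])) for kw in tf),
            key=lambda kv: kv[1], reverse=True)
    return scores

icons = {
    "lawn":   ["soccer", "soccer", "sports", "news"],
    "studio": ["news", "news", "weather", "sports"],
}
for icon, ranked in keyword_priorities(icons).items():
    print(icon, ranked[0])  # best keyword per icon; "news" scores low in both
```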
There are two methods of processing video data that has not yet been associated as index information in the feature amount index control section 17. One method is to associate all search patterns with data that has not yet been assigned. According to this method, data that has not yet been assigned never fails to be retrieved, so the user can certainly find desired data. The other method is to utilize the fact that data that has not yet been assigned is the most recently added data, and to additionally display an icon indicating the most recent data (the uncategorized icon of
As described above, the video data recorder 10 of this embodiment exhibits a considerably significant effect in retrieval of recorded moving images. Also, video data on a DVD as well as video data recorded in the hard disk 11 can be easily retrieved if index information is created for the video data in the DVD.
The method of using the icon of the present invention is not limited to image retrieval as described above. For example, it is preferable to provide a technique that causes the user to become more accustomed to the icons so that the icons can be used more easily for retrieval.
The new menu of
Also, video can be easily edited if an icon is provided for each scene. Video editing involves a scene retrieval operation. If the icon of the present invention is used for scene retrieval, the convenience of editing is improved.
For example, when video data is transferred to another apparatus, the format of the recorded video data may be changed to a format that allows the video data to be reproduced on the other apparatus. When the hard disk 11 is nearly full, data may be compressed again. In these cases, even if the encoding format is changed, the feature amount of the image does not change, and therefore the feature amount of the data does not need to be calculated again. Therefore, when such duplication is performed, which data is the original of the duplicate is recorded. Specifically, when data is duplicated, the feature amount of the duplicate is associated with the feature amount of the original, so that the feature amount does not need to be calculated during data duplication.
Here, attention should be paid to the case where the original data is deleted. In this case, the original video data is erased, and it may be desired to delete the corresponding feature amount data as well. In such a case, however, the feature amount information corresponding to the duplicated video data would also be lost. Therefore, most desirably, when the original video data is deleted, its feature amount information is correctly re-associated with the duplicated video data.
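By way of illustration only, the following Python sketch shows one way the handling of duplicates described above might be organized: when data is duplicated, the duplicate records a link to the original's feature amounts instead of recomputing them, and when the original is deleted, its feature amounts are handed over to the duplicate so that the index information survives. All identifiers are hypothetical.

```python
# Hypothetical sketch of reusing feature amounts across duplication and
# re-associating them when the original data is deleted.

class FeatureStore:
    def __init__(self):
        self.features = {}   # data_id -> feature amount dict
        self.original = {}   # duplicate_id -> id of the data it was copied from

    def register(self, data_id, features):
        self.features[data_id] = dict(features)

    def duplicate(self, original_id, duplicate_id):
        # Re-encoding does not change the image itself, so the duplicate only
        # records a link to the original's feature amounts (no re-extraction).
        self.original[duplicate_id] = original_id

    def get(self, data_id):
        return self.features[self.original.get(data_id, data_id)]

    def delete(self, data_id):
        # When an original is deleted, hand its feature amounts over to one of
        # its duplicates so that the duplicates' index information survives.
        heirs = [d for d, o in self.original.items() if o == data_id]
        if data_id in self.features and heirs:
            self.features[heirs[0]] = self.features.pop(data_id)
            for d in heirs:
                self.original[d] = heirs[0]
        else:
            self.features.pop(data_id, None)

store = FeatureStore()
store.register("rec001", {"green_ratio": 0.8, "motion": 0.6})
store.duplicate("rec001", "dvd_copy01")   # duplication: no recalculation needed
store.delete("rec001")                    # original deleted: features survive
print(store.get("dvd_copy01"))            # -> {'green_ratio': 0.8, 'motion': 0.6}
```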
Note that, in the case of moving image retrieval in which a feature amount of video data is used as described above, identical pieces of video should be treated as a single piece of video. Therefore, duplicates whose original data is present are excluded in advance from the retrieval targets. Since some duplicates may have degraded image quality, it is desirable to use the original data. In other words, if duplicated video data is excluded from the retrieval targets, the possibility that original data having higher image quality than the duplicated video data is retrieved can be improved.
As described above, the video data management apparatus of the present invention generates an icon from a feature amount of video data, and this icon can be used for retrieval. Also, when the icon is used during normal reproduction and the like, the correspondence between the icon and the video can be presented to the user in an easily recognizable manner, resulting in moving image retrieval that is considerably easy to use.
Therefore, the video data management apparatus of the present invention is particularly effective for moving image retrieval that can be easily understood by the user in a video recording/reproduction apparatus.
Number | Date | Country | Kind |
---|---|---|---
2007-162776 | Jun 2007 | JP | national |