The invention to which this application relates is a method of adapting video images to small screen sizes, in particular to small screen sizes of portable handheld terminals.
Mobile TV (mobile television) is a growing and promising market. It allows television signals to be received on small portable devices such as cell phones, smartphones or PDAs (Personal Digital Assistants). The screens of these small portable devices do not provide an image as detailed as that of stationary TV sets at home (currently SDTV, Standard Definition Television). Despite this essential difference in viewing conditions, mostly the same content is displayed on the screens of both mobile and stationary TV systems. However, producing a separate programme for mobile TV would require a huge expenditure of human resources and an increase in costs that broadcasters can hardly afford.
To overcome this situation, several proposals have been made to adapt video content with a high image resolution to smaller displays by cropping out parts of the image. These proposals deal with the automatic detection of regions of interest (ROIs) based on feature extraction with common video analysis methods. The regions of interest detected in a video signal are used to find an adequate crop (cutting) area and to compose a new image that contains all relevant information and is adapted to the displays of handheld devices.
However, such known cropping systems deal inadequately with a wide range of content, since they lack semantic knowledge and therefore generally defined methods.
It is the object of the present invention to improve a cropping system so that it covers a wide range of content for the smaller displays of handheld devices.
The above object is achieved by a method that starts from an aggregation of metadata and the corresponding video, e.g. in post-production, programme exchange and archiving, wherein
(a) the video is passed through to a video analysis stage, e.g. using motion detection, morphological filters, edge detection, etc., to deliver video,
(b) the separated video and metadata are combined to extract important features in context, wherein important information from the metadata is categorised and used to initialise a dynamically fitted chain of feature extraction steps adapted to the delivered video content,
(c) the extracted important features are combined to define regions of interest (ROIs), which are searched for in consecutive video frames by object tracking; the object tracking identifies the new position and deformation of each initialised ROI in consecutive video frames and returns this information to the feature extraction, thereby establishing permanent communication between the feature extraction and the object tracking,
(d) one or several ROIs are extracted and input, video frame by video frame, into a cropping step,
(e) based on weighting information, a well-composed image part is cropped by classifying the supplied ROIs by importance, and
(f) said cropped image area(s) are scaled to the desired small screen size.
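Steps (d) to (f) can be illustrated with a minimal sketch, assuming ROIs arrive as axis-aligned rectangles that already carry an importance weight from feature extraction. The names `ROI`, `crop_window`, `scale_factor` and the threshold `min_weight` are hypothetical, and the rule used here (keep every ROI above a weight threshold and fit a window of the target aspect ratio around them) is only one simple way of classifying ROIs by importance, not the claimed method itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ROI:
    x: float       # left edge in pixels
    y: float       # top edge in pixels
    w: float       # width in pixels (assumed > 0)
    h: float       # height in pixels (assumed > 0)
    weight: float  # importance from feature extraction (hypothetical 0..1 scale)


def crop_window(rois, frame_w, frame_h, target_aspect, min_weight=0.5):
    """Steps (d)-(e): keep ROIs classified as important and fit a window with the
    target aspect ratio (width / height) around them, inside the frame."""
    important = [r for r in rois if r.weight >= min_weight]
    if important:
        x0 = min(r.x for r in important)
        y0 = min(r.y for r in important)
        x1 = max(r.x + r.w for r in important)
        y1 = max(r.y + r.h for r in important)
    else:
        # Hypothetical fallback: no important ROI, so centre-crop the full frame.
        x0, y0, x1, y1 = 0, 0, frame_w, frame_h
    w, h = x1 - x0, y1 - y0
    # Enlarge the narrower dimension until the box has the target aspect ratio.
    if w / h < target_aspect:
        w = h * target_aspect
    else:
        h = w / target_aspect
    # Shrink uniformly if the box no longer fits (ROIs near the border may be cut).
    shrink = min(1.0, frame_w / w, frame_h / h)
    w, h = w * shrink, h * shrink
    # Centre the window on the ROIs, then shift it back inside the frame.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - w / 2, 0), frame_w - w)
    top = min(max(cy - h / 2, 0), frame_h - h)
    return round(left), round(top), round(w), round(h)


def scale_factor(crop, display_w, display_h):
    """Step (f): factor that maps the cropped area onto the small screen."""
    _, _, w, h = crop
    return min(display_w / w, display_h / h)


rider = ROI(300, 200, 100, 150, weight=0.9)
logo = ROI(600, 20, 80, 40, weight=0.2)
crop = crop_window([rider, logo], 720, 576, target_aspect=4 / 3)
print(crop)                          # (250, 200, 200, 150)
print(scale_factor(crop, 320, 240))  # 1.6
```

The low-weight logo is simply dropped here; a fuller implementation would presumably weigh composition rules as well, which this sketch does not attempt.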
Advantageously, the invention provides feature extraction in video signals with the aid of available metadata in order to crop important image regions and adapt them to displays with lower resolution.
Specific embodiments of the invention are now described with reference to the accompanying drawings wherein
The invention is aimed at file-based production formats (resulting from the shift from tape-based to tapeless recording), which allow various metadata to be used for post-production, programme exchange and archiving. Such metadata include content-related information describing the genre as well as specific information on details of the production procedure. The generated metadata are made available together with the video data in a container format. Such a container format allows different data to be multiplexed in a synchronised way, either as a file or as a stream. Combining the metadata information with known feature extraction methods results in the inventive method, which can be adapted individually to a wide range of content.
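The synchronised multiplex of video and metadata can be illustrated with a minimal sketch, assuming a stream of timestamped packets that are either video frames or metadata updates. The `Packet` type, its `kind` labels and the `demultiplex` function are hypothetical; a real container format is considerably more involved, and the sketch only shows how metadata can stay aligned with the frames they belong to.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Packet:
    pts: float    # presentation timestamp in seconds
    kind: str     # "video" or "metadata" (hypothetical labels)
    payload: Any  # a frame, or a dict of metadata fields


def demultiplex(packets):
    """Return (frame, metadata) pairs from a synchronised multiplex.

    Packets are processed in timestamp order; at equal timestamps metadata
    comes first, so it already applies to the frame it was multiplexed with.
    Each frame receives a snapshot of all metadata received so far.
    """
    current = {}
    pairs = []
    for pkt in sorted(packets, key=lambda p: (p.pts, p.kind != "metadata")):
        if pkt.kind == "metadata":
            current.update(pkt.payload)
        elif pkt.kind == "video":
            pairs.append((pkt.payload, dict(current)))
    return pairs


stream = [
    Packet(0.00, "metadata", {"genre": "showjumping"}),
    Packet(0.00, "video", "frame0"),
    Packet(0.04, "video", "frame1"),
    Packet(0.04, "metadata", {"camera": "cam2"}),
    Packet(0.08, "video", "frame2"),
]
for frame, meta in demultiplex(stream):
    print(frame, meta)
```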
The overall system according to
Block 1 performs the aggregation and parsing of metadata as shown in detail in
All three metadata types, namely technical, descriptive and optional data, are provided to the feature extraction module (Block 2).
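A minimal sketch of the aggregation and categorisation in block 1 follows, assuming metadata arrive as key/value dictionaries from several sources. The field names in `TECHNICAL_KEYS` and `DESCRIPTIVE_KEYS`, and the rule that later sources override earlier ones, are illustrative assumptions; the document does not specify a metadata schema or a parsing format.

```python
# Hypothetical field names: the document names the three categories, not their fields.
TECHNICAL_KEYS = {"resolution", "frame_rate", "aspect_ratio"}
DESCRIPTIVE_KEYS = {"genre", "title", "participants"}


def aggregate(*sources):
    """Merge metadata from several sources; later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


def categorise(metadata):
    """Sort fields into technical, descriptive and optional metadata."""
    groups = {"technical": {}, "descriptive": {}, "optional": {}}
    for key, value in metadata.items():
        if key in TECHNICAL_KEYS:
            groups["technical"][key] = value
        elif key in DESCRIPTIVE_KEYS:
            groups["descriptive"][key] = value
        else:
            groups["optional"][key] = value  # anything unrecognised is optional
    return groups


meta = aggregate(
    {"genre": "sport", "frame_rate": 25},
    {"genre": "showjumping", "camera_position": "grandstand"},
)
print(categorise(meta))
```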
Block 2 which is the feature extraction module is shown in detail in
Each genre type has a different combination of feature extraction methods and different parameters, which can be controlled dynamically by metadata or by other information obtained from extracted features. This is depicted in block 2 by a matrix that allocates specific feature extraction methods to each genre type. Subsequently, the detected features are weighted by importance, e.g. by their contextual position or size. Relevant and related features are then combined into an ROI and delivered to the tracking tool. The tracking tool identifies the new position and deformation of each initialised ROI in consecutive frames and returns this information to the feature extraction. This guarantees permanent communication between the feature extraction and the tracking tool, which can be used to exclude areas that are already being tracked from feature extraction. Finally, one or several ROIs are extracted. The weighting of each feature depends on the context of the present video content. The decision is made by an algorithm that aggregates and processes all available feature extraction data and metadata. These allocations provide the decision criteria for what should be an integral part of the newly composed image and how it should be arranged.
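The interplay of genre matrix, weighting, ROI combination and tracking feedback described above can be sketched as follows. Everything concrete here is an assumption: the `GENRE_MATRIX` entries and parameters, the size-plus-centrality weight, the threshold `min_weight`, and merging all kept features into one bounding box as a stand-in for combining "relevant and related" features. Features whose centre falls inside an already-tracked ROI are skipped, mirroring the suppression of tracked areas.

```python
from dataclasses import dataclass

# Hypothetical genre/method matrix with illustrative parameters; the document
# only states that block 2 allocates feature extraction methods to genre types.
GENRE_MATRIX = {
    "showjumping": {"motion_detection": {"threshold": 20},
                    "edge_detection": {"low": 50, "high": 150}},
    "news": {"face_detection": {"min_size": 40},
             "text_detection": {}},
}
DEFAULT_CHAIN = {"motion_detection": {"threshold": 20}}


@dataclass(frozen=True)
class Feature:
    x: float
    y: float
    w: float
    h: float
    method: str  # which extraction method produced the feature


def select_chain(genre):
    """Look up the feature extraction chain allocated to a genre."""
    return GENRE_MATRIX.get(genre, DEFAULT_CHAIN)


def weight(f, frame_w, frame_h):
    """Hypothetical importance: relative size plus closeness to the frame centre."""
    area = (f.w * f.h) / (frame_w * frame_h)
    cx = (f.x + f.w / 2) / frame_w
    cy = (f.y + f.h / 2) / frame_h
    centrality = 1 - (abs(cx - 0.5) + abs(cy - 0.5))  # 1 at centre, 0 in corners
    return 0.5 * area + 0.5 * centrality


def in_tracked_area(f, tracked):
    """True if the feature's centre lies inside an ROI that is already tracked."""
    cx, cy = f.x + f.w / 2, f.y + f.h / 2
    return any(x <= cx <= x + w and y <= cy <= y + h for x, y, w, h in tracked)


def build_roi(features, tracked, frame_w, frame_h, min_weight=0.3):
    """Skip already-tracked areas, keep weighted features, merge them into one ROI."""
    kept = [f for f in features
            if not in_tracked_area(f, tracked)
            and weight(f, frame_w, frame_h) >= min_weight]
    if not kept:
        return None
    x0 = min(f.x for f in kept)
    y0 = min(f.y for f in kept)
    x1 = max(f.x + f.w for f in kept)
    y1 = max(f.y + f.h for f in kept)
    return (x0, y0, x1 - x0, y1 - y0)


tracked = [(0, 0, 100, 100)]  # ROI returned by the tracking tool
features = [
    Feature(10, 10, 20, 20, "motion_detection"),    # inside tracked area: skipped
    Feature(340, 260, 40, 60, "motion_detection"),  # central: kept
    Feature(380, 250, 40, 40, "edge_detection"),    # central: kept
    Feature(680, 540, 40, 36, "edge_detection"),    # corner: low weight, dropped
]
print(sorted(select_chain("showjumping")))     # ['edge_detection', 'motion_detection']
print(build_roi(features, tracked, 720, 576))  # (340, 250, 80, 70)
```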
To explain the feature extraction performed in block 2 in more detail, a short example shown in
Block 3 and 4 (
In addition to the cropping parameters mentioned above, the viewing conditions of the different displays have to be considered. To this end, a benchmark defines the size the cropped area should have compared with the original image. Such a benchmark can be determined by comparing the viewing distances for both display resolutions. These considerations may change the size and shape of the cropped area again, so that it has to be adapted once more. Once a properly cropped area has been decided on, taking all content-related and technical issues into account, the image has to be scaled to the size of the target display.
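One hypothetical way to turn the comparison of viewing conditions into a benchmark is sketched below, assuming the goal is that objects subtend about the same visual angle on the handheld as on the home TV. Under the small-angle approximation a screen of height H viewed from distance D spans roughly H / D radians, which gives a kept share of the picture height of (H_mobile / D_mobile) / (H_tv / D_tv). The function names and the example distances are illustrative, not values from the document.

```python
def crop_fraction(tv_height, tv_distance, mobile_height, mobile_distance):
    """Benchmark: share of the original picture height to keep on the handheld.

    Assumed model: an object should subtend roughly the same visual angle on
    both screens. With the small-angle approximation a screen of height H seen
    from distance D spans about H / D radians, so the kept share is
    (H_mobile / D_mobile) / (H_tv / D_tv), capped at the full picture.
    """
    tv_angle = tv_height / tv_distance
    mobile_angle = mobile_height / mobile_distance
    return min(1.0, mobile_angle / tv_angle)


def crop_and_scale(frame_w, frame_h, fraction, display_w, display_h):
    """Crop size from the benchmark with the display's aspect ratio, plus the
    factor needed to scale the cropped area to the target display."""
    crop_h = frame_h * fraction
    crop_w = crop_h * display_w / display_h
    if crop_w > frame_w:  # too wide for the source: shrink both sides
        crop_h *= frame_w / crop_w
        crop_w = frame_w
    return round(crop_w), round(crop_h), display_h / crop_h


# Hypothetical viewing conditions (metres): 0.5 m high TV picture seen from 3 m,
# 0.04 m high handheld screen seen from 0.3 m.
f = crop_fraction(0.5, 3.0, 0.04, 0.3)
print(round(f, 3))  # 0.8
print(crop_and_scale(720, 576, f, 320, 240))
```

The returned crop size would then feed back into the placement of the crop window, since, as the text notes, these considerations can change the size and shape of the cropped area once more.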
As shown above, the example of extracting features for showjumping (
The proposed methodology describes a workflow controlled by metadata. In this way, a specially tailored feature extraction and cropping method can be applied to increase the reliability of the video analysis and the aesthetics of the composed image.
The video analysis and cropping example of showjumping explained above serves only to demonstrate one possible workflow in more detail; it is not part of the patent application. Moreover, the scope of application is not limited to TV productions. The invention can generally be used wherever video cropping is required and metadata in a known structure are available, e.g. for web streaming or locally stored videos.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2008/002266 | 3/20/2008 | WO | 00 | 12/13/2010 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2009/115101 | 9/24/2009 | WO | A |
Number | Date | Country |
---|---|---|
20110096228 A1 | Apr 2011 | US |