The present invention relates to a method of generating an image scaling curve for correcting aspect ratio of an image or image sequence. More specifically, the method relies on detecting local saliency in an image. The invention equally relates to a corresponding apparatus and a computer program product comprising instructions for implementing the steps of the method.
Recent developments in the field of display technologies have seen great diversity in display sizes. Displays vary from low resolution hand-held devices to high definition wide-screen TVs. Computing and communications infrastructures are also evolving to deliver images and video to this ever expanding set of potential displays. Visual content is becoming more important for sharing, expressing, and exchanging information on devices such as cell phones, hand-held personal computers (PCs), personal digital assistants (PDAs) with video capabilities and home-networked media appliances. The same content is required to be displayed in different dimensions and aspect ratios for different devices. Standard image processing methods of scaling and cropping are not proving to be sufficient. The aspect ratio is understood as being the ratio of the width of an image to the height of the image.
When computers are used to generate or convert video files, the resulting video is often distorted. This is usually caused by an inexperienced user operating the involved software incorrectly; correct operation requires considerable knowledge about computers and video, such as understanding that TV systems use a non-square pixel aspect ratio.
Furthermore, legacy computer video file formats such as the audio video interleave (AVI) container lack the appropriate means to store aspect ratio information. A typical example of this problem is a wide screen digital versatile disc (DVD) converted into a computer file and played on a TV, for instance via digital living network alliance (DLNA). Often movies are stored in anamorphic wide screen format. Unless care is taken during conversion into a computer file, the resulting movie will be rendered distorted during playback as can be seen by comparing
With the popularity of wide-screen TVs, efficient solutions that can effectively display video on displays other than those originally intended are needed. Traditionally, TVs implement a method called “black bar detection” to automatically adjust the aspect ratio. The video is scaled vertically in such a way that the black bars disappear. This is especially done in modern wide screen flat TVs.
U.S. Pat. No. 7,339,627 by Brian Schoner et al. describes a method for aspect ratio correction based on black bars surrounding the image. While applied in TVs in the market, this method has the disadvantage that it fails if the source video is encoded incorrectly (as is the case for many videos downloaded from the Internet), or for movies in the 2.35:1 movie aspect ratio, which must be shown with black bars even on a 16:9 widescreen TV.
In Philips TVs, a technique called Panoramic Stretch is used, where the boundaries of the image are stretched to take up the wider screen. Although the assumption on which the method is based, i.e. that the most essential information is in the centre of the view, is often a good one, there may be cases where such an anisotropic stretch is not the optimal solution. Better methods are desired enabling effective resizing for a variety of displays.
Aspect ratio correction may not be enough to render the image suitable for viewing. Image retargeting can also be invoked. Retargeting is scaling the image while taking the content, i.e. the important objects in the scene, into consideration. It is therefore often called content-aware resizing. In certain implementations, image retargeting consists of first cropping and then scaling the image. The video retargeting problem is more challenging than image retargeting. Due to motion and camera movement, determining the important aspects of a video is difficult. Moreover, maintaining temporal consistency when important aspects change dynamically is demanding. Fortunately, what is important to preserve depends highly on low-level visual saliency, which can be modelled quite well, but in some cases it can even depend on high-level aspects of the underlying story.
The prior art methods proposed for content-aware aspect ratio correction or retargeting usually lack temporal consistency, are computationally complex, or introduce unacceptable distortions in some cases.
It is thus the object of the present invention to overcome the above-identified difficulties and disadvantages by proposing an improved solution for image or video processing.
According to a first aspect of the invention, there is provided a method of generating an image scaling curve, the method comprising:
Thus, the present invention provides a very efficient method for generating an image scaling curve. Furthermore, the proposed method does not rely on meta data. The proposed method does not rely on knowledge of aspect ratio or orientation of the picture (or video) as the method automatically determines from the image itself important objects, regions and/or pixels. The present invention is of special interest because of recent plans to introduce ultra wide-screen TV (21:9). It is likely that the old Panoramic Stretch no longer suffices to upscale legacy content. On the other hand, current more advanced methods from the literature suffer from serious flaws for TV-viewing and are often too costly.
According to a second aspect of the invention, there is provided a computer program product comprising instructions for implementing the method according to the first aspect of the invention when loaded and run on computer means of an apparatus.
According to a third aspect of the invention, there is provided an apparatus for generating an image scaling curve, the apparatus comprising:
Other aspects of the invention are recited in the dependent claims attached hereto.
Other features and advantages of the invention will become apparent from the following description of non-limiting exemplary embodiments, with reference to the appended drawings, in which:
One embodiment of the present invention is based on the idea of providing a set of initial scaling curves and calculating a cost value for each of these curves. A new scaling curve to be used in the actual image scaling is then calculated as a weighted average of the individual curves, where the weights are inversely related to the aforementioned costs. Thus, by looking at the video sequence itself, the system does not rely on any meta-data that, even if available, could be wrong.
To arrive at a nonlinearly scaled image 601 as shown in
However, it is to be noted that the most relevant information is not always located near the centre of the image. For this purpose the present invention proposes a new solution, where different scaling curves can be advantageously used.
In accordance with an embodiment of the present invention, there is further provided the following units: saliency detector 709, accumulator 710, cost calculator 711, curve generator 713 and memory 715. The decoded image data from the decoder 703 is not only fed into the scaler 705 but is fed in parallel into the saliency detector 709, which is arranged to detect salient features, also referred to as local saliency, in images. The salient features reflect the perceived distortion in the image in case the corresponding image segment is stretched or shrunk. In this example the saliency detector 709 makes use of what is known to the person skilled in the art as “computer vision library”. A computer vision library is a library of programming functions stored in the memory 715 mainly aimed at real time computer vision. These functions can e.g. detect people's faces and especially certain features such as eyes or lips. Round structures, such as wheels and watches, can also be detected relatively easily. The exact definition of the object depends on the particular computer vision library used for the saliency detector 709. Instead of relying on the computer vision library, a simpler method could be used as well, such as detecting edges in the images.
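As an illustration of the simpler edge-based alternative mentioned above, the following is a minimal sketch only. It assumes a greyscale image held as a list of rows of numbers and uses the sum of absolute differences to the right-hand and lower neighbours as the saliency measure; the function name `edge_saliency` and this particular edge measure are hypothetical and are not taken from the described embodiment.

```python
def edge_saliency(image):
    """Return a per-pixel saliency map based on simple neighbour differences.

    `image` is a list of rows of grey values (an assumed representation).
    The edge measure used here is an illustrative choice, not the only one.
    """
    height = len(image)
    width = len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clamp neighbour coordinates at the image border.
            right = image[y][min(x + 1, width - 1)]
            down = image[min(y + 1, height - 1)][x]
            here = image[y][x]
            saliency[y][x] = float(abs(right - here) + abs(down - here))
    return saliency
```

In this sketch a vertical edge in the image produces a column of non-zero saliency values at the edge position, while flat regions yield zero.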
Information about the local saliency is fed into the accumulator 710, which is arranged to accumulate the detected local saliency in one direction (horizontal or vertical), as will be explained later in more detail. From the accumulator 710 the accumulated local saliency is fed into the cost calculator 711. The cost calculator is arranged to calculate costs for different scaling curves stored in the memory 715. In this example the memory 715 also contains a set of initial horizontal and/or vertical scaling curves that include the standard curve, i.e. the “bathtub” curve, of the Panoramic Stretch, but also some curves that might be suitable in cases where the standard curve fails. This happens mainly when the most important object(s) are near the side panels of the screen. The number of stored initial scaling curves is at least 2, but smaller than the number of pixels in the image. In most applications the usage of 3-10 initial scaling curves suffices.
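The accumulation in one direction can be sketched as follows. This is an illustrative example assuming that the saliency map is a list of rows and that a plain sum is used; the function names are hypothetical.

```python
def accumulate_vertically(saliency):
    """Sum the saliency over all rows, giving one value per column."""
    width = len(saliency[0])
    return [sum(row[x] for row in saliency) for x in range(width)]


def accumulate_horizontally(saliency):
    """Sum the saliency over all columns, giving one value per row."""
    return [sum(row) for row in saliency]
```

A profile accumulated vertically would then be used to evaluate horizontal scaling curves, and vice versa, since the scaling direction is orthogonal to the accumulation direction.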
Given the salient features or local saliency of the current image, a “cost” for each of these initial curves can be calculated. The cost of a scaling curve depends on the position of essential objects such as faces, moving objects, etc., in the image, such that the cost increases the more the local scaling factor differs from unity scaling (scaling factor 1) particularly at the position of these essential objects. In other words, a high number of salient features in locations where the scaling factor differs from 1 leads to a high cost value. For the calculation of the cost values, the salient features in locations where the scaling factor is 1 can be neglected.
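One possible cost measure consistent with this description is sketched below. It assumes that each candidate curve is represented by its local magnification per column and that the deviation from unity scaling is measured as an absolute difference; both the representation and the name `curve_cost` are assumptions made for illustration.

```python
def curve_cost(saliency_profile, magnification):
    """Cost of one candidate curve for an accumulated saliency profile.

    Each position contributes its saliency weighted by how far the local
    magnification is from 1 (absolute difference: an assumed measure).
    Positions with magnification 1 contribute nothing.
    """
    if len(saliency_profile) != len(magnification):
        raise ValueError("profile and curve must have the same length")
    return sum(s * abs(m - 1) for s, m in zip(saliency_profile, magnification))
```

With a profile such as `[0, 4, 0, 0]`, a curve that stretches only the outer columns (`[2, 1, 1, 2]`) has cost 0, whereas a curve that stretches the centre (`[1, 2, 2, 1]`) has cost 4.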
The curve generator 713 is arranged to calculate the scaling curve, i.e. the position transformation curve, to be used in the actual image rescaling as a weighted average of the individual curves, where the weights are inversely related to the aforementioned cost. This means that the weight of a predefined scaling curve decreases as its cost increases. All candidate curves (both horizontal and vertical scaling curves) individually cause the desired aspect ratio change. Consequently, provided that the sum of the weights equals 1, the resulting curve will also lead to the desired aspect ratio change. In case the input video sequence has good temporal stability (no scene change), the weights will change only gradually, so that the output retargeted video is also temporally stable. In the event of low temporal stability of the input video (scene change), the output can react immediately to the updated cost without remaining effects from the previous scene. Consequently, the desirable temporal stability of the proposed rescaling method does not prohibit rapid adaptation to the new shot. Moreover, by selecting the initial curves more or less ambitiously (i.e. by choosing how far the curves differ from the standard curve), it can be guaranteed that the artefacts of the aspect ratio correction are modest.
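The weighted averaging can be sketched as follows. The example assumes that the weight of each curve is the reciprocal of its cost plus a small constant, normalised so that the weights sum to 1; the text only requires the weights to be inversely related to the cost, so the reciprocal, the constant `epsilon` and the function names are illustrative assumptions. The curves are represented here by their local magnification per column.

```python
def weights_from_costs(costs, epsilon=1e-6):
    """Turn costs into normalised weights that decrease with cost.

    The reciprocal mapping and `epsilon` are assumed choices.
    """
    raw = [1.0 / (cost + epsilon) for cost in costs]
    total = sum(raw)
    return [value / total for value in raw]


def blend_curves(curves, weights):
    """Weighted average of candidate curves, position by position."""
    length = len(curves[0])
    return [
        sum(weight * curve[x] for weight, curve in zip(weights, curves))
        for x in range(length)
    ]
```

Because the weights sum to 1 and every candidate produces the same total output size, the blended curve in this sketch produces that size as well.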
Tables 1-4 illustrate concrete examples for calculating the correct magnification curve to be used in the image scaling. In the tables each column represents a specific horizontal location in the image. For simplicity the predefined scaling curves in these examples use only two different magnification values, namely values 1 and 2. More generally, these magnification values can also be referred to as local magnification values or local scaling curves. Thus, the predefined scaling curves can be considered as consisting of several local scaling curves that are glued together. The predefined set of scaling curves contains three scaling curves in each example. The quality figure shown in the tables is inversely related to the cost values, which are calculated for each curve in the predefined set by taking into account the local saliency in the image as was explained above. For the final scaling curve, for each location Y the resulting magnification in one direction, i.e. horizontal or vertical, can be calculated by using the following formula:
The resulting final magnification curves for Tables 1, 2, 3 and 4 are shown in
In step 1207, a set of initial scaling curves is obtained. These curves can be stored in the memory 715. Then in step 1209 costs are calculated for the different initial curves as explained above by taking into account the local saliency in the image. In step 1211 a new scaling curve is calculated based on the calculated costs. Finally in step 1213 the image is rescaled in a second direction (horizontal direction in this example) by the scaler 705 by applying the new scaling curve. The image is now ready to be displayed to the user. The second direction is substantially orthogonal to the first direction. It is to be noted that in the example above scaling was applied in just one direction, but it is equally possible to apply scaling in both horizontal and vertical directions. If the scaling is done in both of these directions, then the scaling apparatus shown in
Naturally not every scene or picture of a video sequence will have objects that can be recognised by a computer. This is not an issue since the aspect ratio usually remains constant throughout a large part of the video sequence, if not the whole of it. This also means that for performance reasons the method does not rely on monitoring all video frames, i.e. images, but can sample every x-th frame.
The flow chart of
In order to match the local magnification factor profile to a targeted image, it is constrained in step 1313 by adapting the profile so that the integral of the profile matches the desired output size while limiting the minimum and maximum magnification values of the local magnification factor profile. The present method can be used for both enlarging and compressing the image. The minimum scaling factor is 1 if the image is enlarged. On the other hand, the maximum scaling factor is 1, if the image is compressed. Next in step 1315 the local magnification factor profile is integrated to obtain the final scaling curve. Finally in step 1317 the image is rescaled in a second direction by applying the final scaling curve. Again the second direction is substantially orthogonal to the first direction.
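The constraining and integration steps can be sketched as follows. The sketch assumes that the profile is adapted by adding a constant offset to all values, clamping the result to the allowed range and choosing the offset by bisection so that the sum matches the desired output size. The offset-based adaptation and the function names are assumptions for illustration; the description above does not prescribe how the profile is adapted.

```python
def constrain_profile(profile, target_size, lowest, highest, iterations=60):
    """Shift and clamp a magnification profile so that it sums to target_size.

    `lowest` and `highest` bound the local magnification, e.g. lowest=1
    when enlarging. The constant-offset strategy is an assumed choice.
    """
    count = len(profile)
    if not count * lowest <= target_size <= count * highest:
        raise ValueError("target size is unreachable within the limits")

    def clamped(offset):
        return [min(highest, max(lowest, p + offset)) for p in profile]

    # The clamped sum grows monotonically with the offset, so bisect on it.
    low = lowest - max(profile)    # every value is clamped to `lowest`
    high = highest - min(profile)  # every value is clamped to `highest`
    for _ in range(iterations):
        middle = (low + high) / 2
        if sum(clamped(middle)) < target_size:
            low = middle
        else:
            high = middle
    return clamped(high)


def integrate_profile(profile):
    """Cumulative sum: output position of each input pixel boundary."""
    positions = [0.0]
    for magnification in profile:
        positions.append(positions[-1] + magnification)
    return positions
```

The last entry of the integrated curve equals the output size, so an input of four pixels constrained to a target size of six ends at position 6.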
Depending on the implementation details, it is possible to end up with a scaling curve that specifies where each input pixel should go in the output. However, if there are more output pixels than input pixels, then some output pixels remain unassigned. This issue can be solved by simple interpolation.
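The filling of unassigned output pixels can be sketched for a single image row as follows. The example assumes a forward mapping in which each input pixel is written to the nearest output position given by the scaling curve and the gaps are filled by linear interpolation between the nearest assigned neighbours; the function names and the choice of linear interpolation are illustrative assumptions.

```python
def forward_map_row(row, positions, out_width):
    """Place each input pixel at its rounded output position."""
    out = [None] * out_width
    for value, position in zip(row, positions):
        index = int(round(position))
        if 0 <= index < out_width:
            out[index] = float(value)
    return out


def fill_gaps(out):
    """Fill unassigned (None) entries by linear interpolation."""
    known = [i for i, value in enumerate(out) if value is not None]
    if not known:
        raise ValueError("no assigned pixels to interpolate from")
    filled = list(out)
    # Replicate the nearest assigned value at both borders.
    for i in range(known[0]):
        filled[i] = out[known[0]]
    for i in range(known[-1] + 1, len(out)):
        filled[i] = out[known[-1]]
    # Interpolate linearly between consecutive assigned pixels.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = (1 - t) * out[left] + t * out[right]
    return filled
```

For example, mapping the row `[10, 20, 30]` to output positions `[0, 2, 4]` leaves positions 1 and 3 unassigned, and the interpolation fills them with 15 and 25.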
The flow chart of
It should be noted that the term “accumulation” used above or in the claims does not limit the actual implementation of the calculation of the projections and costs. Most evidently, the linearity of the calculations in the steps performed for the saliency projection and the cost of the scaling curves allows for several implementation options. As an illustration: in the calculation of the cost of initial scaling curves this can mean that first the saliency is accumulated and later multiplied with the local magnification curve, or alternatively first the saliency can be multiplied with the local magnification and later accumulated. These are mathematically identical because of the linearity of the operations and the constancy of the local magnification in the direction of accumulation.
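The equivalence of the two orders of calculation can be illustrated with the following sketch. It assumes a horizontal scaling curve given as one local magnification per column, vertical accumulation, and a weighting by the absolute deviation of the magnification from 1; the weighting and the function names are assumptions for illustration.

```python
def cost_accumulate_first(saliency, magnification):
    """Accumulate the saliency per column, then weight by the curve."""
    width = len(saliency[0])
    profile = [sum(row[x] for row in saliency) for x in range(width)]
    return sum(profile[x] * abs(magnification[x] - 1) for x in range(width))


def cost_multiply_first(saliency, magnification):
    """Weight every pixel by the curve, then accumulate everything."""
    return sum(
        value * abs(m - 1)
        for row in saliency
        for value, m in zip(row, magnification)
    )
```

Both functions return the same value because the magnification is constant along each column, which is the direction of accumulation.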
Besides the above, accumulation is not limited to a simple sum or average of the saliency in a direction; the median, the maximum or a weighted sum can be used instead. In other words, other kinds of projections can be used, for instance a maximum (saliency) projection, where the maximum in a direction is projected. Furthermore, sub-sampling schemes can be imagined, or parts of the image might be discarded altogether. For instance, subtitle areas or logos might be discarded in the accumulations.
The present invention can be applied in display products such as TVs, monitors and projectors, especially when they are designed to play videos that could originate from a computer, for example via DLNA, Internet TV, USB and so forth. Another application where this invention is very beneficial is the area of “User Generated Content” on the Internet, i.e. websites such as YouTube. Due to the popularity of such services, these websites have to deal with a large amount of poorly generated content, including videos that are uploaded with wrong aspect ratios. By implementing the method described in this invention, such a website could apply the algorithm to uploaded videos and correct any distortions due to an incorrect aspect ratio before processing them further.
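Alternative projections of this kind can be sketched as follows. The example assumes vertical accumulation over a saliency map given as a list of rows, with an optional set of row indices to be discarded (for instance a subtitle band); the function name, the `mode` parameter and the `skip_rows` parameter are hypothetical.

```python
from statistics import median


def project_columns(saliency, mode="sum", skip_rows=()):
    """Project the saliency onto the horizontal axis, one value per column.

    `mode` selects the projection ("sum", "max" or "median"); rows listed
    in `skip_rows` are left out of the projection entirely.
    """
    reducers = {"sum": sum, "max": max, "median": median}
    if mode not in reducers:
        raise ValueError("unknown projection mode: " + mode)
    skipped = set(skip_rows)
    rows = [row for y, row in enumerate(saliency) if y not in skipped]
    if not rows:
        raise ValueError("all rows were discarded")
    width = len(rows[0])
    return [reducers[mode]([row[x] for row in rows]) for x in range(width)]
```

In this sketch, discarding the bottom row of a map whose last row contains strong subtitle saliency prevents that row from dominating the projection.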
The invention also relates to a computer program product that is able to implement any of the method steps as described above when loaded and run on computer means of an image resizing apparatus. The computer program may be stored/distributed on a suitable medium supplied together with or as a part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
The invention also relates to an integrated circuit that is arranged to perform any of the method steps in accordance with the embodiments of the invention.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive, the invention being not restricted to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. For instance, not all the steps shown in the flow charts need to be performed. More specifically, if the object is to simply obtain an image scaling curve, then the actual image scaling is not necessary.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used. Any reference signs in the claims should not be construed as limiting the scope of the invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 08306006.1 | Dec 2008 | EP | regional |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/IB09/55793 | 12/16/2009 | WO | 00 | 6/23/2011 |