The present application claims priority to European Patent Application 12 167 633.2, filed in the European Patent Office on May 11, 2012, the entire contents of which are incorporated herein by reference.
1. Field of the Disclosure
The present disclosure relates to an image enhancement apparatus and a corresponding method for enhancing an input image of a sequence of input images of at least a first view and obtaining an enhanced output image of at least said first view. Further, the present disclosure relates to a display device, a computer program and a computer-readable non-transitory medium.
2. Description of Related Art
Super-resolution can enhance the resolution of images and video sequences. The specific characteristic of super-resolution is that it is able to create high resolution frames which contain high spatial frequencies not present in any single low resolution input frame.
In M. Tanaka and M. Okutomi, “Toward Robust Reconstruction-Based Super-Resolution,” in Super-Resolution Imaging, P. Milanfar, Ed. Boca Raton: CRC Press, 2011, pp. 219-244, a system for generating a high resolution output sequence from multiple input frames is presented, which accumulates details from a number of input frames that are all available as input of the system. The output signal is assumed to have a higher pixel range than the input signal. Therefore, an internal up- and down-sampling is necessary.
In US 2010/0119176 A1 a system for generating a high resolution output sequence from a sequence with lower spatial resolution is presented. The system uses a temporal recursive super-resolution system in parallel to a spatial upscaling system. As the output signal has a higher pixel range than the input signal, an internal upsampling is used. The higher detail level is achieved by temporally accumulating details from multiple temporal instances from the input sequence using a recursive feedback loop.
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor(s), to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.
It is an object to provide an image enhancement apparatus and a corresponding image enhancement method for enhancing an input image of a sequence of input images of at least a first view and obtaining an enhanced output image of at least said first view, which particularly provide an image detail and sharpness enhancement for monoscopic as well as stereoscopic input sequences and avoid the generation of additional artifacts and noise. It is a further object to provide a corresponding computer program for implementing said method and a computer readable non-transitory medium.
According to an aspect there is provided an image enhancement apparatus for enhancing an input image of a sequence of input images of at least a first view and obtaining an enhanced output image of at least said first view, said apparatus comprising:
an unsharp masking unit configured to enhance the sharpness of the input image,
a motion compensation unit configured to generate at least one preceding motion compensated image by compensating motion in a preceding output image,
a weighted selection unit configured to generate a weighted selection image from said sharpness enhanced input image and said preceding motion compensated image based on a selection weighting factor,
a detail signal generation unit configured to generate a detail signal from said input image and said weighted selection image, and
a combination unit configured to generate said enhanced output image from said detail signal and from said input image and/or said weighted selection image.
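The interplay of the units enumerated above can be sketched in code. The following is a minimal illustrative model, not the claimed apparatus: the 3×3 box blur, the fixed selection weight alpha and the detail gain beta are stand-ins assumed here for illustration only; the apparatus derives the selection weighting factor from a reliability measure.

```python
import numpy as np

def box_blur3(img):
    # Simple 3x3 box blur with edge padding (stand-in for the low-pass
    # filter inside the unsharp masking unit).
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    acc = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / 9.0

def enhance_frame(y1, z1_mc_prev, alpha=0.5, beta=0.3):
    # Unsharp masking unit: sharpen the current input frame.
    y1_um = y1 + (y1 - box_blur3(y1))
    # Weighted selection unit: mix the motion-compensated previous
    # output with the sharpened input (alpha is a stand-in for the
    # selection weighting factor).
    x1 = alpha * z1_mc_prev + (1.0 - alpha) * y1_um
    # Detail signal generation unit: details not yet in the input.
    d = x1 - y1
    # Combination unit: add the (scaled) detail signal to the input.
    return y1 + beta * d
```

In the recursive apparatus, the returned frame would be stored in a frame buffer and motion compensated to serve as `z1_mc_prev` for the next temporal instance.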
According to a further aspect there is provided an image enhancement apparatus for enhancing an input image of a sequence of input images of at least a first view and obtaining an enhanced output image of at least said first view, said apparatus comprising:
an unsharp masking means for enhancing the sharpness of the input image,
a motion compensation means for generating at least one preceding motion compensated image by compensating motion in a preceding output image,
a weighted selection means for generating a weighted selection image from said sharpness enhanced input image and said preceding motion compensated image based on a selection weighting factor,
a detail signal generation means for generating a detail signal from said input image and said weighted selection image, and
a combination means for generating said enhanced output image from said detail signal and from said input image and/or said weighted selection image.
According to still further aspects, there are provided a corresponding image enhancement method, a computer program comprising program means for causing a computer to carry out the steps of the method disclosed herein when said computer program is carried out on a computer, and a non-transitory computer-readable recording medium that stores therein a computer program product which, when executed by a processor, causes the method disclosed herein to be performed.
Preferred embodiments are defined in the dependent claims. It shall be understood that the claimed image enhancement method, the claimed computer program and the claimed computer-readable recording medium have similar and/or identical preferred embodiments as the claimed image enhancement apparatus and as defined in the dependent claims.
One of the aspects of the disclosure is to provide a solution for image detail and sharpness enhancement for monoscopic as well as stereoscopic input sequences, particularly in current and future display devices, such as TV sets, which solution avoids the generation of additional artifacts and noise. Information from two or more input frames from the left and/or right view is used to generate an output signal with additional details and a perceived higher resolution and sharpness. Recursive processing allows keeping the required frame memory to a minimum (e.g. one additional frame buffer for each view), although information from two or more input frames is used. The provided apparatus and method are thus computationally efficient, require only little storage, resulting in low hardware costs, and deliver a high image or video output quality that is robust towards motion estimation errors and other side-effects.
The provided apparatus and method are able to handle different input and output scenarios including a) single view input, single view output, b) stereo input, single view output, and c) stereo input, stereo output. In case of stereo input, the details from multiple temporal instances of both views are accumulated, generating a monoscopic or stereoscopic output sequence with additional details.
In contrast to known solutions, the provided apparatus and method temporally accumulate details from one or two available input frames at each temporal instance using a recursive temporal feedback loop. Further, no internal up- and down-sampling is required as input and output signal generally have the same pixel range. Still further, the provided solution is also able to handle stereoscopic input. A complete spatial processing in parallel for stabilization is generally not necessary.
It is to be understood that both the foregoing general description of the invention and the following detailed description are exemplary, but are not restrictive, of the invention.
A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views,
In 2D to 2D processing the image enhancement is carried out on a single view input sequence, using information from multiple input frames from the input view to generate an output signal with a higher perceived resolution. For detecting corresponding pixel positions in the different input frames, preferably sub-pixel precise motion vectors, obtained by a preceding motion estimation, are used.
In 3D to 2D processing the image enhancement is carried out on the input sequence of view 1, using information from multiple input frames of the input sequences of view 1 and view 2 to generate an output signal for view 1 with a higher perceived resolution. For detecting corresponding pixel positions in the different input frames from the different input views, intra-view motion vectors (from view 1) and inter-view disparity vectors (between view 1 and view 2) are used. The preferably sub-pixel accurate motion and disparity vectors are preferably previously detected by use of a motion estimation and a disparity estimation.
In 3D to 3D processing the image enhancement is carried out on the stereo 3D input sequence, using information from multiple input frames from both input views, to generate output signals for view 1 and view 2 with higher perceived resolutions. For detecting corresponding pixel positions in the different input frames from the different input views, intra-view motion vectors (from view 1 and view 2) and inter-view disparity vectors (between view 1 and view 2 and between view 2 and view 1) are used. The preferably sub-pixel accurate motion and disparity vectors are preferably previously detected by use of a motion estimation and a disparity estimation.
A weighted selection unit 104 computes the reliability of the motion compensation by comparing Z1,mc(t-1) and Y1,UM and mixes the inputs depending on the computed reliability. In case of a high reliability, Z1,mc(t-1) is mainly forwarded; in case of a low reliability, Y1,UM is mainly forwarded to avoid artifacts from erroneous motion compensation caused by bad motion vectors.
The output of the weighted selection unit 104 is defined as X1. Based on X1 a detail signal D3 is computed using a detail signal generation unit 106. The detail signal generation unit 106, which preferably comprises at least a data model unit, generates the detail signal D3 by comparing X1 and the available current input frames. In case of only one available view as in the present embodiment, only the current input frame Y1 from view 1 and the weighted selection image X1 of the weighted selection unit 104 are used to generate the detail signal D3.
The resulting detail signal D3 is combined with Y1 in a combination unit 115, in this embodiment an addition unit, generating a signal with additional details which is used as the final output signal Z1 in this embodiment.
The two resulting detail signals D11, D12 are added, resulting in a combined detail signal D1, which is then subtracted in a first subtraction unit 107a from X1,n, generating an intermediate signal V1 with additional details. To generate a final difference signal D2 between the current input Y1 and the current result V1 of the processing, Y1 is subtracted from V1 in a second subtraction unit 107b.
As the processing should be reduced in edge areas to avoid over-enhancement, the final difference signal D2 is weighted with an edge strength dependent weighting factor in an edge dependent weighting unit 114. This weighting factor is based on the maximum local gradient G1 of X1,n obtained in a maximum local gradient unit 112. The weighted final difference signal D3 is finally added by an addition unit 115 to the current input signal Y1, generating the final result Z1.
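The edge dependent weighting step can be sketched as follows. The document does not reproduce the exact mapping from G1 to the weighting factor; the linear ramp and its thresholds g_low and g_high below are illustrative assumptions.

```python
import numpy as np

def edge_dependent_weighting(d2, g1, g_low=2.0, g_high=10.0):
    # Attenuate the final difference signal D2 where the maximum local
    # gradient G1 is large, to avoid over-enhancement at edges.  The
    # linear ramp and the thresholds g_low / g_high are illustrative
    # stand-ins for the actual edge strength dependent weighting.
    w = np.clip((g_high - g1) / (g_high - g_low), 0.0, 1.0)
    return w * d2  # weighted final difference signal D3
```

Where G1 is small (flat or weakly textured areas) the detail signal passes unchanged; at strong edges it is suppressed entirely.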
To further approximate a super-resolved solution, optionally Z1 can be internally fed back (set to X1,n+1) using a switch 116 controlling the image model and data model input, allowing multiple iterations of image model and data model processing. In a first iteration the switch 116 couples the output of the weighted selection unit 104 to the subsequent elements 106a, 106b, 112. In subsequent iterations the switch 116 couples the output signal Z1 to said subsequent elements 106a, 106b, 112. To realize a temporally recursive processing, the final result of the image enhancement apparatus 100b is stored to the frame buffer 108, so that in the temporally next processing step the results can be further enhanced. Hence, with the proposed embodiment 100b it is possible to accumulate the details from multiple input frames from two views, using one recursive feedback loop.
The third embodiment 100c is based on the second embodiment 100b, but in still another embodiment it can also be based on the first embodiment 100a, i.e. in the first embodiment input frames of a second view and disparity vectors from view 1 to view 2 may be available to provide another embodiment of the image enhancement apparatus.
The fourth embodiment 100d is based on the third embodiment 100c, but in still another embodiment it can also be based on the first embodiment 100a, i.e. the first embodiment 100a may be doubled (one for each view), and motion vectors and disparity vectors may be added, to provide another embodiment of the image enhancement apparatus.
Exemplary embodiments of the various elements of the above described embodiments of the proposed image enhancement apparatus are described in the following.
An embodiment of the unsharp masking unit 102 is depicted in
An embodiment of the weighted selection unit 104 is depicted in
The selection weighting factor SW is computed in a weighting factor computation unit 104b based on the local sum of absolute differences (SAD), which is computed inside a local block area, e.g. a 3×3 block area. A high SAD indicates a strong local difference between the originally aligned input and the compensated input, which points to a motion vector error. This assumption alone does not consider that in flat areas motion vector errors result in smaller differences between the originally aligned input and the compensated input than in textured areas. Therefore, a flat detection unit 104c is additionally utilized for the computation of the weighting factor, allowing bigger differences in detail areas than in flat areas while still strongly weighting the compensated input. This results in the following equation for the weighting factor computation:
Here, λtemp and λtemp,adapt are predefined control parameters.
For computation of the output of the weighted selection unit 104, the compensated input is multiplied in a multiplication unit 104d with the weighting factor and the originally aligned input is multiplied in a multiplication unit 104e with one minus the weighting factor. The resulting weighted signals W1, W2 are then summed up and used as the output signal X1 of the weighted selection unit 104.
For the flat map computation in the flat detection unit 104c, the absolute local Laplacian is computed in an embodiment and summed up over a block area, e.g. a 5×5 block area. Between a lower and an upper threshold, the computed sum is mapped to values between 0 (flat area) and 1 (texture area).
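The flat map computation can be sketched as follows; the concrete threshold values are not given in the text and are assumptions here.

```python
import numpy as np

def flat_map(img, lower=2.0, upper=20.0):
    # Absolute local Laplacian, summed over a 5x5 block area, mapped
    # linearly between a lower and an upper threshold to [0, 1]
    # (0 = flat area, 1 = texture area).  lower/upper are illustrative.
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # 4-neighbour discrete Laplacian.
    lap = (p[0:h, 1:w + 1] + p[2:h + 2, 1:w + 1] +
           p[1:h + 1, 0:w] + p[1:h + 1, 2:w + 2] - 4.0 * img)
    a = np.abs(lap)
    # Sum over the 5x5 block area.
    p2 = np.pad(a, 2, mode="edge")
    s = np.zeros_like(a)
    for dy in range(5):
        for dx in range(5):
            s += p2[dy:dy + h, dx:dx + w]
    return np.clip((s - lower) / (upper - lower), 0.0, 1.0)
```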
The embodiment of the image model unit 106b depicted in
The weighting factor W3 is selected based on several gradient thresholds and a given image model weight:
gradX(x,y)=Xn(x,y)−Xn(x−1,y)
gradY(x,y)=Xn(x,y)−Xn(x,y−1) (3)
Then the absolute gradient G4 is computed in an absolute gradient computation unit 112c by the following operation:
gradient=√(gradX²+gradY²)  (4)
Finally the maximum local gradient G1 is detected inside a local block area, e.g. a 3×3 block area, by a local maximum gradient computation unit 112d and written to the maximum local gradient map. This map describes the local edge strength in X.
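Equations (3) and (4) together with the local maximum detection can be sketched directly:

```python
import numpy as np

def max_local_gradient(x):
    # Backward differences per equation (3), Euclidean magnitude per
    # equation (4), then the maximum inside a 3x3 block area, giving
    # the maximum local gradient map G1 (the local edge strength).
    h, w = x.shape
    grad_x = np.zeros_like(x)
    grad_y = np.zeros_like(x)
    grad_x[:, 1:] = x[:, 1:] - x[:, :-1]   # gradX(x,y) = Xn(x,y) - Xn(x-1,y)
    grad_y[1:, :] = x[1:, :] - x[:-1, :]   # gradY(x,y) = Xn(x,y) - Xn(x,y-1)
    g4 = np.sqrt(grad_x ** 2 + grad_y ** 2)
    p = np.pad(g4, 1, mode="edge")
    g1 = np.zeros_like(g4)
    for dy in range(3):
        for dx in range(3):
            g1 = np.maximum(g1, p[dy:dy + h, dx:dx + w])
    return g1
```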
The embodiment of the data model unit 106a depicted in
To generate the detail signal D14 based on view 2, at first the disparity shift compared to view 1 has to be compensated, using a disparity compensation unit 306c with sub-pixel accuracy. After that the compensated Y2 is mixed with Y1 using the weighted selection unit 306d, which can be built in the same manner as the weighted selection unit 104 depicted in
Inside the data model unit 106a an adaptive low-pass filter 306a is used, an embodiment of which is depicted in
For filtering, the input image is separately convolved with the filter coefficients in the horizontal and vertical direction:
Then the difference images between the low-pass filtered results and Xn are computed. For each filtered image, the local description length is then computed inside a 5×5 block area using the following equation.
The local description length values are used to detect the standard deviation of the low-pass filter that induces the local minimum description length. Finally, Xn is adaptively filtered using the locally optimal filter kernel. The 2D filter is computed by:
For filtering, the input image is convolved with the 2D filter coefficients.
The result is the adaptive filter output F. Furthermore the local optimal standard deviations are written to a map which is forwarded so that it can be used for selection of a weighting factor.
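The adaptive filter selection can be sketched as follows. The document's description length equation is not reproduced here, so the per-pixel cost below (local 5×5 squared residual plus a penalty lam/sigma favouring stronger smoothing) is a stand-in assumption, as are lam and the sigma set; only the overall scheme (filter bank, per-pixel cost minimisation, map of locally optimal standard deviations) follows the description.

```python
import numpy as np

def gauss_kernel1d(sigma, radius=3):
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def adaptive_lowpass(xn, sigmas=(0.5, 1.0, 2.0), lam=0.05):
    h, w = xn.shape
    best_cost = np.full((h, w), np.inf)
    out = xn.copy()
    sigma_map = np.zeros((h, w))
    for sigma in sigmas:
        k = gauss_kernel1d(sigma)
        # Separable convolution: horizontal then vertical direction.
        f = np.apply_along_axis(
            lambda r: np.convolve(np.pad(r, 3, mode="edge"), k, "valid"), 1, xn)
        f = np.apply_along_axis(
            lambda c: np.convolve(np.pad(c, 3, mode="edge"), k, "valid"), 0, f)
        # Stand-in for the local description length: 5x5 residual energy
        # plus a penalty favouring stronger smoothing.
        res = (xn - f) ** 2
        p = np.pad(res, 2, mode="edge")
        cost = sum(p[dy:dy + h, dx:dx + w]
                   for dy in range(5) for dx in range(5)) + lam / sigma
        better = cost < best_cost
        best_cost[better] = cost[better]
        out[better] = f[better]
        sigma_map[better] = sigma
    # out is the adaptive filter output F; sigma_map is the map of
    # locally optimal standard deviations that is forwarded.
    return out, sigma_map
```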
To be able to control the enhancement level of the output signal, a final difference signal between the output of the complete processing and the current input signal is computed in an edge dependent weighting unit 114 as depicted in
For detail generation from spatially shifted inputs it is preferred to have a sub-pixel accurate compensation of the spatial shifts which are described by motion vectors and disparity vectors. A possible solution is the utilization of a bilinear interpolation. The luminance values of the compensated image are computed as follows:
vx and vy are the sub-pixel accurate motion/disparity vectors. If the accessed image position of the previous result is out of range, the luminance value of the reference input is copied.
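The bilinear compensation with the out-of-range fallback can be sketched as follows; the pixel-wise loop is kept for clarity, and the exact boundary convention is an assumption.

```python
import numpy as np

def compensate(prev, ref, vx, vy):
    # Sub-pixel accurate compensation via bilinear interpolation.
    # prev   : previous result to sample from
    # ref    : reference input, copied when the accessed image
    #          position of the previous result is out of range
    # vx, vy : per-pixel sub-pixel accurate motion/disparity vectors
    h, w = prev.shape
    out = np.empty_like(prev, dtype=float)
    for y in range(h):
        for x in range(w):
            sx, sy = x + vx[y, x], y + vy[y, x]
            x0, y0 = int(np.floor(sx)), int(np.floor(sy))
            if x0 < 0 or y0 < 0 or x0 + 1 >= w or y0 + 1 >= h:
                out[y, x] = ref[y, x]  # out of range: copy reference
                continue
            fx, fy = sx - x0, sy - y0
            top = (1 - fx) * prev[y0, x0] + fx * prev[y0, x0 + 1]
            bot = (1 - fx) * prev[y0 + 1, x0] + fx * prev[y0 + 1, x0 + 1]
            out[y, x] = (1 - fy) * top + fy * bot
    return out
```

With zero vectors the interior is reproduced exactly; a half-pixel horizontal shift yields the average of two horizontal neighbours.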
In summary, the present disclosure relates to a method and corresponding apparatus for the enhancement of detail level and sharpness in monoscopic (single view) and stereoscopic image sequences. The detail level is enhanced by temporally accumulating information from multiple input frames of a first view and the additional information obtained from a secondary view of a stereoscopic input sequence using a recursive feedback loop. The accumulation of details results in a higher perceived resolution and sharpness in the output sequence. In contrast to typical spatial sharpness enhancement methods like unsharp masking, the noise level is not amplified, due to temporal and inter-view averaging. Furthermore, typical side effects of methods using information from multiple input frames, like artifacts from erroneous motion and disparity vectors, can be strongly limited. Spatial artifacts are reduced by internally approximating an image model. The proposed method and apparatus is able to handle monoscopic as well as stereoscopic input sequences.
The various elements of the different embodiments of the provided image enhancement apparatus may be implemented as software and/or hardware, e.g. as separate or combined circuits. A circuit is a structural assemblage of electronic components including conventional circuit elements, integrated circuits including application specific integrated circuits, standard integrated circuits, application specific standard products, and field programmable gate arrays. Further a circuit includes central processing units, graphics processing units, and microprocessors which are programmed or configured according to software code. A circuit does not include pure software, although a circuit does include the above-described hardware executing software.
Obviously, numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
In so far as embodiments of the invention have been described as being implemented, at least in part, by software-controlled data processing apparatus, it will be appreciated that a non-transitory machine-readable medium carrying such software, such as an optical disk, a magnetic disk, semiconductor memory or the like, is also considered to represent an embodiment of the present invention. Further, such a software may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
Any reference signs in the claims should not be construed as limiting the scope.
Number | Date | Country | Kind
---|---|---|---
12167633.2 | May 2012 | EP | regional