1. Field of the Invention
The present invention relates to image processing in general and to re-sampling and improving resolution of images in particular.
2. Discussion of the Related Art
Super-resolution image reconstruction is a form of digital image processing that increases the amount of resolvable detail in images and thus their quality. Super-resolution generates a still image of a scene from a collection of similar lower-resolution images of the same scene. For example, several frames of low-resolution video may be combined using super-resolution techniques to produce one or more still images whose true (optical) resolution is significantly higher than that of any single frame of the original video. Because each low-resolution frame is slightly different and contributes some unique information that is absent from the other frames, the reconstructed still image contains more information, i.e., higher resolution, than any one of the original low-resolution images. Super-resolution techniques have many applications in diverse areas such as medical imaging, remote sensing, surveillance, still photography, and motion pictures.
Other related problems in image processing that benefit from the advancement in super-resolution are de-interlacing of video, inpainting (filling in missing information in an image/video), and other problems, where one desires a new form of visual data, re-sampled from the given images. Many of the techniques available for super-resolution are applicable to these problems as well.
In the mathematical formulation of the problem, the available low-resolution images are represented as resulting from a transformation of the unknown high-resolution image by effects of image warping due to motion, optical blurring, sampling, and noise. When improving the resolution of an image that is part of a sequence of images, such as images taken from a video camera, highly accurate (sub-pixel accuracy) motion estimation is required for improving the resolution. Known solutions for determining motion vectors do not provide sufficient results in case of non-continuous movement of objects, for example, a tree moving due to wind, or moving persons in a scene.
Known methods for improving the resolution of images either process data acquired within the image, and thus fail to recover details smaller than the sensor size, or also process data from other images, but require accurate motion estimation to do so. On general sequences, motion estimation is usually prone to inaccuracies as well as errors. These inaccuracies cause the outcome of the processing to be of relatively low quality. Therefore, it is desirable to provide a method and apparatus for improving the resolution of images without using motion vectors or, put more conservatively, a method that uses motion estimation implicitly rather than explicitly.
The entire discussion here is applicable to the other re-sampling problems mentioned above, such as inpainting and de-interlacing; for clarity, the following description focuses on super-resolution.
Exemplary non-limiting embodiments of the disclosed subject matter will be described, with reference to the following description of the embodiments, in conjunction with the figures. The figures are generally not shown to scale and any sizes are only meant to be exemplary and not necessarily limiting. Corresponding or like elements are optionally designated by the same numerals or letters.
The disclosed subject matter describes a novel and unobvious method for improving the resolution of an image and avoiding the requirement of motion estimation when handling a sequence of images.
One technical problem addressed by the subject matter is to improve the resolution of images in a sequence of images, and allow simultaneous processing of data both within the image and between images. Super-resolution (SR) refers in some cases to a group of methods of enhancing the resolution of an imaging system. Further, when a sequence of images contains motion of one or more objects, motion estimation is required for correcting the low-resolution images. Furthermore, motion of objects is necessary for classic SR. However, when the motion is not of a simple form, known motion-estimation solutions cannot provide sufficient results; for example, in many cases such solutions wrongfully identify multiple objects instead of one. Hence, a method for improving the resolution of images within a sequence of images while avoiding determination and storage of motion vectors and motion estimation is another technical problem addressed in the subject matter.
The technical solution to the above-discussed problem is a method for improving the resolution of a low-resolution image by utilizing data acquired from multiple neighboring images as well as from the handled image. The method does not attempt to determine one specific location for each pixel in a high-resolution image in the neighboring images. In an exemplary embodiment of the subject matter, the method utilizes temporal neighboring images of the handled image, for example 10 images captured before the handled image, 10 images captured after the handled image, and the handled image itself. For each pixel in the handled image, pixel values of the pixels surrounding the handled pixel are compared to pixel values of pixels located in the same locations or nearby locations in neighboring images.
After comparing the pixel values of the pixels surrounding the handled pixel with pixel values of pixels located in the same locations of neighboring images, a weight value is determined as a function of the pixel values. In other embodiments of the disclosed subject matter, comparison between images is performed using other image-related parameters besides pixel values, for example gradients, gradient size, gradient direction, frequency domain values, transform domain coefficients and other features that may be valuable for a person skilled in the art. Next, the pixel values of pixels located in the vicinity of the location of the handled pixel in the neighboring images are combined as a weighted average. In an exemplary embodiment of the subject matter, the above-identified combinations are summed and divided by the sum of all weight values for normalizing the value of the sum. The pixel value determined for the handled pixel is a function of pixel values of pixels in neighboring images and the weight values. In some embodiments, the pixel value is divided by a factor for normalizing.
The method described above is one embodiment of an algorithm for providing super resolution without motion compensation. Two implementations of parts of the algorithm detailed below provide better results than determining motion vectors, sometimes with less complexity. One algorithm discloses fuzzy motion techniques for super resolution and the other algorithm discloses the use of the non-local means (NLM) algorithm for determining an optimal penalty function that enables determining the optimal high-resolution image.
The steps detailed above are preferably implemented as interrelated sets of computer instructions written in any programming language such as C, C#, C++, Java, VB, VB.Net, or the like, and developed under any development environment, such as Visual Studio.Net, J2EE or the like. It will be appreciated that the applications can alternatively be implemented as firmware ported for a specific processor such as digital signal processor (DSP) or microcontrollers, or can be implemented as hardware or configurable hardware such as field programmable gate array (FPGA), application specific integrated circuit (ASIC), or a graphic processing unit (GPU). The methods can also be adapted to be executed on a computing platform, or any other type of computing platform that is provisioned with memory unit 120, processing unit 130, and I/O devices 110 as noted above.
In accordance with some embodiments of the subject matter, processing unit 130 handles the image 117 pixel by pixel. Next, processing unit 130 compares the area surrounding each handled pixel with the area surrounding the pixel in the same location or in nearby locations in the neighboring images. The neighboring images are preprocessed and up-scaled to be in a desired size, preferably the size of the desired super-resolution images, or a size that is a function of the size of the largest image in the sequence of low-resolution images. For example, when the handled images are 100×120 pixels, and the desired size is 300×240, the images are up-scaled, for example by an intra-polation or interpolation process, in either a linear or non-linear manner. In various embodiments of the subject matter, the rescaling factor is equal in both axes, so the desired image is 300×360. After the step of upscaling, the neighboring images and the handled image 117 are stored in storage device 140 or in memory unit 120. Pixel values of pixels that are part of the low-resolution images and the locations of those pixels in the high-resolution images are also stored in storage device 140.
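The up-scaling step described above can be sketched as follows. This is a minimal illustration, assuming bilinear interpolation with an equal integer factor on both axes; the function name `upscale_bilinear` and the clamping at the image border are hypothetical choices, and the disclosure equally allows other linear or non-linear schemes.

```python
def upscale_bilinear(image, factor):
    """Up-scale a 2-D list of pixel values by an integer factor.

    Bilinear interpolation is only one possible choice (an assumption here).
    """
    rows, cols = len(image), len(image[0])
    out_rows, out_cols = rows * factor, cols * factor
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        # Map the output row back to a fractional source row, clamped to the image.
        src_r = min(r / factor, rows - 1)
        r0 = int(src_r)
        r1 = min(r0 + 1, rows - 1)
        fr = src_r - r0
        for c in range(out_cols):
            src_c = min(c / factor, cols - 1)
            c0 = int(src_c)
            c1 = min(c0 + 1, cols - 1)
            fc = src_c - c0
            # Interpolate along the columns first, then along the rows.
            top = image[r0][c0] * (1 - fc) + image[r0][c1] * fc
            bottom = image[r1][c0] * (1 - fc) + image[r1][c1] * fc
            out[r][c] = top * (1 - fr) + bottom * fr
    return out
```

With this mapping, the output pixel at row `r * factor` and column `c * factor` keeps the value of the original low-resolution pixel at row `r` and column `c`, which gives one way to record where the original pixels sit in the up-scaled image. A 100-row by 120-column frame with a factor of 3 yields a 300 by 360 result.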
Processing unit 130 compares pixel values of the pixels surrounding the handled pixel in the handled image 117 with pixel values of pixels in temporal-neighboring images, preferably after at least some of the images are interpolated to a desired scale. Processing unit 130 assigns a weight value for at least a portion of the pixels in a three-dimensional or two-dimensional neighborhood of the handled pixel in the neighboring images, as a function of the difference between pixel values (or other measures) of each area within each temporal-neighboring image and the area surrounding the handled pixel in the handled image 117. Such difference may be measured by a Mean Squared Error (MSE) or any other function or measurable attribute that enables comparison between pixel values for determining differences between areas of images. The weight value is determined for each neighboring image as a function of the value described above. Such weight value may be the exponent of the value −MSE*T, where T is a predetermined value. An alternative weight value may be 1/MSE. The weight function and various image-related parameters required in the computation process may be adaptively selected for each handled pixel.
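The two weight functions named above can be sketched in a few lines. The function names are hypothetical, and the small constant `eps` in the 1/MSE variant is an added assumption that guards against division by zero when two areas are identical; the text itself does not specify how that case is handled.

```python
import math

def patch_mse(patch_a, patch_b):
    """Mean squared error between two equally sized 2-D areas of pixel values."""
    total, count = 0.0, 0
    for row_a, row_b in zip(patch_a, patch_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count

def exponential_weight(mse, t):
    """Weight exp(-MSE * T), where T is a predetermined value."""
    return math.exp(-mse * t)

def inverse_weight(mse, eps=1e-8):
    """Alternative weight 1 / MSE; eps is an assumption, not part of the text."""
    return 1.0 / (mse + eps)
```

Both functions decrease as the MSE grows, so areas that differ strongly from the area around the handled pixel contribute little to the result.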
In some exemplary embodiments of the disclosed subject matter, processing unit 130 receives pixel values from memory unit 120 and determines the weight values according to a method stored in storage device 140. Such method, and the values of parameters related to the method, may be selected by processing unit 130 from a variety of methods according to data related to the pixel values, location of pixels, image size, and the like.
The weight value may indicate an inverse correlation between the result of the previous comparisons and the importance of an area compared to an area containing the handled pixel. For example, when the difference between pixel values of two compared areas of pixels is large, the importance of the pixel values of one area in determining the pixel values of the other area is relatively low. Another parameter that may affect the weight value is the time elapsed or the number of captured images between capturing the handled image and the specific neighboring image assigned with the specific weight value.
Next, the value of the handled pixel is determined as a function of the weight values and the pixel values related to pixels within the original images. After determining the pixel value of the handled pixel, the pixel is updated with the new pixel value. Alternatively, another image is generated, and the new pixel values are inserted in the other image. In some exemplary embodiments of the subject matter, the weight values are multiplied by a function of all pixel values in the detected area of each neighboring image. Alternatively, only pixel values of pixels within the low-resolution image are multiplied by the weight value when determining the value of the handled pixel. In some embodiments of the method, the value of the handled pixel is determined as the sum of all multiplications of the neighboring images divided by the sum of weight values for normalizing the value. In other alternative embodiments, the value of the handled pixel is determined as a function of the pixel values and the weights. In other embodiments, the weights are re-calculated, using the pixel values determined after one iteration of the method and the new weights of the image after one iteration of the super resolution method of the disclosed subject matter.
In accordance with some alternative embodiments of the subject matter, determination of at least a portion of the pixel values of the handled image may be performed according to pixel values of the previous handled image in the sequence of images. For example, in case the level of similarity of one area in the handled image relative to an area in the previous image is higher than a predetermined threshold value, the pixel values of at least a portion of the pixels in the handled image are determined as a function of the pixel values of the previous image. This alternative method may be added to the method described above, for reducing complexity of the calculations, in accordance with predetermined conditions and terms related to image-related parameters.
After the pixel values in the up-scaled image are determined, a step of deblurring is performed using known methods such as total variation deblurring. Data required for deblurring, such as a set of rules for determining the proper method for improving the resolution of the handled image, may be stored in storage device 140. In an exemplary embodiment of the subject matter, the updated super resolution image 145 may be displayed on monitor 150. The steps described above, mainly of up-scaling the image and comparing pixel values of the detected image with neighboring images, obtaining weight values for each neighboring image and determining the pixel values of pixels in the high resolution image, are preferably performed by a computerized application.
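The shortcut described above can be sketched as a simple gate in front of the full computation. Everything specific here is an assumption made for illustration: the similarity score 1 / (1 + MSE), the threshold of 0.9, the choice to copy the previous value unchanged, and the names `similarity` and `resolve_pixel`.

```python
def patch_mse(patch_a, patch_b):
    """Mean squared error between two equally sized 2-D areas."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(patch_a, patch_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def similarity(patch_a, patch_b):
    """Hypothetical similarity score in (0, 1]; identical areas score 1.0."""
    return 1.0 / (1.0 + patch_mse(patch_a, patch_b))

def resolve_pixel(current_area, previous_area, previous_value,
                  full_estimate, threshold=0.9):
    """Reuse the previous image's value when the areas are similar enough.

    full_estimate is a callable that runs the complete weighted computation;
    it is only invoked when the similarity test fails.
    """
    if similarity(current_area, previous_area) > threshold:
        return previous_value
    return full_estimate()
```

Passing the full computation as a callable means the expensive path is skipped entirely for static regions, which is where the reduction in complexity would come from.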
In the example described below, the handled pixel 245 is located in row 32 and column 55 of handled image 240. Area 250 is determined to extend 10 pixels from handled pixel 245 in each direction. As a result, pixels belonging to rows 22-42 and columns 45-65 are part of area 250, which thus contains 21 rows and 21 columns. In some exemplary embodiments of the subject matter, the number of rows of an area may differ from the number of columns. The pixel values of pixels within area 250 are compared to pixel values of pixels within areas within neighboring images, such as area 230 of image N−M (220) and area 270 of image N+M (260). The location of area 230 in image N−M (220) is substantially the same as the location of area 250 in handled image N (240).
Additionally, area 250 is compared to areas in the neighboring images located near the location of area 250 in handled image N (240). In other embodiments, the pixel values of pixels in area 250 may be compared to areas in the handled image N (240). For example, in case area 250 is located in rows 22-42 and columns 45-65, additional comparisons are performed between area 250 and areas having an offset of one column to the left, i.e., comprising rows 22-42 and columns 44-64 within neighboring images. Another example is an area offset by four columns to the left and two rows down, relative to the location of area 250, i.e., comprising rows 24-44 and columns 41-61. In an exemplary embodiment using an offset of up to two rows in each direction and up to two columns in each direction, the number of areas in each neighboring image is 25. These 25 areas are extracted from at least a portion of the neighboring images and the handled image.
When comparing pixel values of area 250 with areas of neighboring images within the predetermined range, a weight value is obtained for each comparison. When determining the pixel value of handled pixel 245 within handled image N (240), one exemplary method is to determine the average of pixel values in each area, multiply the average by the respective weight value, and sum all multiplications. Another embodiment discloses steps of summing the pixel values of the centers of the areas, by multiplying the pixel values by the weights and dividing by the sum of weights. According to some exemplary embodiments of the subject matter, the next step is to divide the result of the multiplications by the sum of all weight values for normalizing the determined pixel value. In another exemplary embodiment of the method of determining the pixel value of handled pixel 245, the average associated with each area compared with area 250 refers only to pixel values of pixels that were part of the original low-resolution images, before the step of up-scaling. Such average is multiplied by the relevant weight value and divided by the sum of weight values to provide the pixel value of handled pixel 245.
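The row and column arithmetic of this example can be checked with a short sketch. The helper names are hypothetical, `half_side` denotes the number of pixels the area extends on each side of the handled pixel, and the sketch assumes the area lies entirely inside the image (border handling is not addressed here).

```python
def area_bounds(row, col, half_side):
    """Inclusive (first, last) row and column ranges of the area around a pixel."""
    return (row - half_side, row + half_side), (col - half_side, col + half_side)

def extract_area(image, row, col, half_side):
    """Return the square area around (row, col) as a 2-D list.

    Assumes the area does not cross the image border.
    """
    (r0, r1), (c0, c1) = area_bounds(row, col, half_side)
    return [image_row[c0:c1 + 1] for image_row in image[r0:r1 + 1]]
```

For the handled pixel in row 32 and column 55 with `half_side` equal to 10, `area_bounds` gives rows 22 to 42 and columns 45 to 65, that is 21 rows and 21 columns.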
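The set of offset areas can be enumerated as below. This is a sketch with hypothetical names; offsets are written as (row offset, column offset) pairs, with a negative column offset meaning a shift to the left.

```python
def candidate_offsets(max_row_offset, max_col_offset):
    """All (row, column) offsets within the given range, including (0, 0)."""
    return [(dr, dc)
            for dr in range(-max_row_offset, max_row_offset + 1)
            for dc in range(-max_col_offset, max_col_offset + 1)]

def shifted_bounds(row_range, col_range, offset):
    """Shift inclusive row and column ranges by an offset."""
    dr, dc = offset
    return ((row_range[0] + dr, row_range[1] + dr),
            (col_range[0] + dc, col_range[1] + dc))
```

With a range of two rows and two columns in each direction there are 5 × 5 = 25 offsets per neighboring image, and shifting rows 22-42 and columns 45-65 by one column to the left gives rows 22-42 and columns 44-64.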
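One of the embodiments above, the weighted sum of the centers of the compared areas divided by the sum of weights, can be sketched end to end. The sketch assumes that all frames are already up-scaled to the same size, that the weight is exp(−MSE*T), and that the handled pixel is far enough from the border; the function names and default parameter values are hypothetical.

```python
import math

def get_area(image, row, col, half):
    """Square area of side 2 * half + 1 centered at (row, col)."""
    return [r[col - half:col + half + 1] for r in image[row - half:row + half + 1]]

def area_mse(area_a, area_b):
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(area_a, area_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def estimate_pixel(frames, handled_index, row, col, half=1, search=1, t=0.01):
    """Normalized weighted average of the centers of all compared areas."""
    rows, cols = len(frames[0]), len(frames[0][0])
    margin = half + search
    if not (margin <= row < rows - margin and margin <= col < cols - margin):
        raise ValueError("handled pixel is too close to the image border")
    reference = get_area(frames[handled_index], row, col, half)
    weighted_sum = 0.0
    weight_total = 0.0
    for frame in frames:  # the handled image itself is included
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                r, c = row + dr, col + dc
                candidate = get_area(frame, r, c, half)
                weight = math.exp(-area_mse(reference, candidate) * t)
                weighted_sum += weight * frame[r][c]
                weight_total += weight
    # The zero-offset area of the handled image always has weight 1,
    # so weight_total is never zero.
    return weighted_sum / weight_total
```

No motion vector is selected at any point: every offset in every frame contributes, and dissimilar areas are simply suppressed by their small weights.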
The number of neighboring images compared to the handled image, the range and thus the number of areas compared to the area of the handled pixel in each neighboring image, and the size of the area 250 may be predetermined and uniform for each handled pixel or handled image, or may be determined per pixel according to several parameters. Such parameters may be the difference between pixel values of the handled image, previous MSE values, standard deviation or average of previous comparisons, and the like.
Basic area 340 of handled image N (330) is stored in memory unit (120 of
In the exemplary embodiment, basic area 320 of image N−1 (310) contains pixels 311-319, contained within rows i−1 to i+1 and columns j−1 to j+1. Pixel 315 is located on row i and column j. When comparing area 340 containing handled pixel 335 to areas in neighboring images, areas located near the basic areas are also compared to area 340. For example, area 321 is an offset area of image N−1 (310) located in rows i−2 to i and columns j−2 to j. Area 321 contains pixels 306-312, 314 and 315. The pixel value of each pixel in area 321 is compared to a pixel value of a respective pixel in area 340. For example, the pixel value of pixel 335 located in the center of area 340 is compared to the pixel value of pixel 311 located in the center of area 321. According to some embodiments of the method disclosed in
After comparing pixel values of each area within the range to area 340 that contains handled pixel 335, a weight value W(M,T) is obtained, associated with the offset M and the specific neighboring image T. For example, when comparing pixel values of area 340 to pixel values of area 321, the weight value W(M,T) is stored in memory unit 120 or storage 140 (both shown in
Another technical problem addressed in the subject matter is to provide a penalty function that avoids the determination of motion vectors and yet provides sufficient results. A penalty function is a method of developing a family of algorithms for improving the resolution of images. Such a penalty function receives known low-resolution images, and a candidate super-resolution outcome, and determines a penalty value as a function of these given items to indicate the quality of the super-resolution outcome match to the given low-resolution images. Determining efficient and accurate penalty functions leads to determining the high-resolution image from a low-resolution image.
One known penalty function for super-resolution is given by
Wherein parameter D refers to the resolution-scale factor, for example the number of rows, columns, or pixels that were previously removed when the image was downscaled. In other embodiments, D depends on the ratio between the number of pixels in the high-resolution image and the number of pixels in the low-resolution image. Alternatively, D refers to the ratio between the amount of data related to the high-resolution image and the amount of data related to the low-resolution image. Parameter H refers to the blurriness of the image, sometimes caused by the camera's point spread function (PSF), for which various solutions are known in the art. The parameter Ft refers to the warping of the image between the correct location of a pixel and the actual location of the pixel in the up-scaled image, in each neighboring image t for each pixel.
In order to find the super-resolution image that best fits the images yt, the penalty function is differentiated to determine its minimal value. Finding the minimal value of a penalty function is equivalent to determining the best method for transforming low-resolution images into the desired image X, according to the penalty term.
Finding the operators Ft is a problematic issue when determining the penalty function according to the algorithm disclosed in the prior art, since it requires determining and storing motion vectors for each pixel. The disclosed algorithm avoids determining the correction vector between the actual location of pixels in the low-resolution image provided to the computational entity that improves the resolution of the image and the correct location that should be in the desired high-resolution image. The parameter yt refers to the known low-resolution image and the parameter X refers to the desired high-resolution image. Indexing parameter t indicates summing over the number of T neighboring images compared to the handled image.
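For orientation, a widely used least-squares form of the classical super-resolution penalty that is consistent with the symbols defined above (D, H, Ft, yt, X, and a sum over the T neighboring images) can be written as follows. This is a hedged reconstruction from the surrounding definitions and from the general super-resolution literature, not a verbatim copy of the formula in the original filing; the factor 1/2 and the squared 2-norm in particular are assumptions.

```latex
\varepsilon(X) \;=\; \frac{1}{2} \sum_{t=1}^{T} \left\| D\,H\,F_t\,X - y_t \right\|_2^2
```

Under this form, evaluating the penalty for a candidate X requires every warp operator Ft, which is the point at which explicit motion estimation enters the classical formulation.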
The new and unobvious disclosed penalty function results from data acquired from the low-resolution images while avoiding the use of external data such as motion vectors, predictions, and the like. Additionally, the method disclosed in the subject matter uses only basic rather than complex computations. The new method also saves memory since motion vectors and the difference in pixel locations respective to other images are not stored. The result of the method of the subject matter is a penalty function shown below:
The new and unobvious penalty function uses fuzzy motion estimation. Parameters D and H are the same as in the penalty function provided in prior art methods. One major difference compared to prior art penalty functions is the lack of the traditional F parameter, used for finding the difference between the location of a pixel in the correct image and the location of the same pixel in the provided image. Parameter Fm denotes the set of possible simple translations that image X may undergo in order to transform the entire image X into a new location. Additionally, the parameter Fm may contain a set of transformations that contain various types of motions, such as rotations, zooms, and the like. For example, one translation is an offset of one column up performed on an area compared with an area surrounding the handled pixel (such as pixel 245 of
Another major difference in using fuzzy motion estimation for improving the resolution of an image is that the summation according to the subject matter is double, instead of the single summation suggested in the previous method. In other words, all neighboring images (T) and all offsets (M) are taken into consideration, instead of the prior art methods that refer to a single, constant offset for the entire image (M). The additional summation refers to the offsets (M) of the location of the areas compared to the area surrounding the handled pixel, relative to the location of the base areas. In case the area's offset is two rows up and down, and two columns to each side, the number of offset areas (M) for each neighboring image is 25 (5 in each dimension, including the same pixel and two pixels in each direction). The weight value (Wm,t) is a comparison function performed between pixel values or other image-related parameters of the handled area (such as area 250 of
Another approach to design a penalty function for the development of super-resolution techniques is based on the non-local means (NLM) method described below. As the NLM is originally designed for noise removal, it is first described for this task, and then extended to super-resolution.
The parameter y[k,l] refers to an area surrounding a pixel located on row k and column l and the power of e indicates the difference between the pixel value of a pixel having indices [k,l] and the pixel value of a pixel having indices [i,j]. The exponentiation e is multiplied by a function f that takes into account the distance between the location of index [i,j] and index [k,l] in the low-resolution image y (410).
In another embodiment, the weight value is a function of an NLM filter shown below. The main difference between the NLM filter and the bilateral filter is the use of areas (Rk,l) surrounding the pixel in index [k,l] when comparing images.
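The distinction drawn here between the two filters can be illustrated with a short sketch: the bilateral weight compares two single pixel values, whereas the NLM weight compares the areas surrounding the two pixels. The Gaussian form with parameter `sigma`, the box-shaped distance function standing in for f, and all function names are assumptions chosen for illustration, not the formulas of the original filing.

```python
import math

def spatial_falloff(k, l, i, j, radius):
    """Assumed form of f: 1 inside a square search window, 0 outside."""
    return 1.0 if max(abs(k - i), abs(l - j)) <= radius else 0.0

def bilateral_weight(y, k, l, i, j, sigma, radius):
    """Weight based on the difference between two single pixel values."""
    diff = y[k][l] - y[i][j]
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma)) * spatial_falloff(k, l, i, j, radius)

def nlm_weight(y, k, l, i, j, sigma, radius, half=1):
    """Weight based on the difference between the areas around two pixels.

    Assumes both areas lie fully inside the image.
    """
    ssd = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            d = y[k + dr][l + dc] - y[i + dr][j + dc]
            ssd += d * d
    return math.exp(-ssd / (2.0 * sigma * sigma)) * spatial_falloff(k, l, i, j, radius)
```

Two pixels with equal values but different surroundings receive the maximal bilateral weight and a reduced NLM weight, which is the practical effect of comparing areas instead of single pixels.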
An unobvious penalty function is defined below for transforming the low-resolution images yt into a desired super-resolution image X. The penalty function uses weight values resulting from the NLM or bilateral filters disclosed above, or weights relying on other image-related parameters. The weights determined for the penalty functions, as well as weights determined in the methods disclosed hereinafter in the subject matter, may be any function of image-related parameters and are not limited to pixel values. Further, the determination of weight values is not limited to the methods disclosed in the subject matter, but extends to any method or function provided by a person skilled in the art. The parameter Rk,l refers to the area surrounding the pixel in row k and column l, i.e., the pixel in index [k,l]. Parameter t indicates that the comparison between areas is performed for t neighboring images. Index [k,l] is detected in the entire image, while index [i,j] is detected only in the neighborhood of index [k,l]. The penalty function is:
An iterative approach is used to minimize this penalty, where the pixel value of each pixel in the low-resolution image y is updated on each iteration until the updated y is sufficiently similar to the desired image x, or has a level of resolution that is higher than a predetermined resolution value. According to one exemplary embodiment of the subject matter, the method for the iterative approach uses the formula below. xn is a desired image, resulting from n iterations starting from x0.
Using the iterative approach, the input to the penalty function is a low-resolution image x0. Next, an image sized as x0 is initialized, with all pixel values set to zero. The method reviews all pixels in the initialized image. In the example below, the reviewed pixel is pixel 420. For each reviewed pixel 420, an area 430 surrounding reviewed pixel 420 is used. Area 430 comprises multiple pixels, such as pixel 450, in the neighborhood of reviewed pixel 420. For each pixel located in area 430 in the neighborhood of reviewed pixel 420, an area 440 surrounding each pixel located in area 430 is retrieved. Preferably, area 440 is smaller than or equal to area 430. The pixel values of area 440 surrounding each pixel located in area 430 are multiplied by a weight value. The weight value is specific to the relations between reviewed pixel 420 and pixel 450 in the area 430 surrounding the reviewed pixel 420. Other methods for determining a weight value are provided in association with
In other embodiments, area 440 or area 430 is up-scaled, so both areas 430, 440 have the same size. Then, pixel values of area 430 are compared with pixel values of area 440, and the weight value is a function of the difference between the pixel values. After multiplying the pixel values of pixels located in area 440 by the weight value, the result is added to the pixel values of the initialized image, in the pixels surrounding the location of reviewed pixel 420. After the pixel values of pixels surrounding the location of reviewed pixel 420 are extracted, a step of normalizing is provided. In an exemplary embodiment of the subject matter, area 430 surrounding reviewed pixel 420 is larger than area 440 surrounding each pixel, such as pixel 450, that surrounds reviewed pixel 420. In an alternative embodiment of the disclosed subject matter, determining the weight values can be done using areas in the low-resolution images before the upscaling step instead of comparing interpolated images.
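One pass of the accumulate-then-normalize procedure described above can be sketched as follows, for a single image. The sketch makes several assumptions that are not taken from the text: the weight is a Gaussian of the summed squared difference between the area around the reviewed pixel and the area around each neighbor, pixels too close to the border are skipped, and pixels that receive no contribution keep their input value. The names `nlm_iteration`, `search`, `half` and `sigma` are hypothetical.

```python
import math

def nlm_iteration(image, search=1, half=1, sigma=10.0):
    """One accumulate-and-normalize pass over a 2-D list of pixel values."""
    rows, cols = len(image), len(image[0])
    acc = [[0.0] * cols for _ in range(rows)]   # image initialized to zero
    norm = [[0.0] * cols for _ in range(rows)]  # running sum of weights
    offsets = range(-half, half + 1)
    margin = search + half
    for k in range(margin, rows - margin):          # reviewed pixel [k, l]
        for l in range(margin, cols - margin):
            for i in range(k - search, k + search + 1):      # neighbor [i, j]
                for j in range(l - search, l + search + 1):
                    ssd = sum((image[k + dr][l + dc] - image[i + dr][j + dc]) ** 2
                              for dr in offsets for dc in offsets)
                    weight = math.exp(-ssd / (2.0 * sigma * sigma))
                    # Add the weighted area around the neighbor to the pixels
                    # surrounding the location of the reviewed pixel.
                    for dr in offsets:
                        for dc in offsets:
                            acc[k + dr][l + dc] += weight * image[i + dr][j + dc]
                            norm[k + dr][l + dc] += weight
    # Normalizing step: divide each accumulated value by its total weight.
    return [[acc[r][c] / norm[r][c] if norm[r][c] > 0 else image[r][c]
             for c in range(cols)] for r in range(rows)]
```

Feeding the output back in as the next input gives a simple iterative scheme; a stopping rule such as a fixed number of passes would have to be chosen separately.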
Another aspect of the disclosed subject matter relates to providing super resolution without explicit motion estimation. The first step is obtaining and minimizing a penalty function. The input to the penalty function is a set of low-resolution images. The method for improving the resolution of the images in the sequence of images is performed for each image separately. The size of the area taken from the high-resolution image is to be adjusted to fit the size of the areas of pixels taken from the low-resolution images. The adjustment is performed since the desired image X and the input images y have different sizes and different number of pixels, in order to accurately compare the pixel values of equivalent regions in the two types of images. The penalty function suggested is shown below.
The new and unobvious penalty term overcomes the technical problem of the operator Rkl that can only detect a minor portion of the pixels in the area surrounding a handled pixel. Operator Rkl cannot detect all pixels surrounding the handled pixel, since according to prior-art methods, the decimation step, which results in down-scaling the image, is performed prior to detecting pixel values. According to an exemplary embodiment of the subject matter, the method first detects pixel values and then decimates the area previously detected. The decimation is performed in order to enable comparing areas of pixels having substantially the same sizes in the penalty function. For example, when comparing the left quarter of a low-resolution image to the left quarter of a high-resolution image in a penalty function, the area detected from the high-resolution image should be decimated. When performing decimation after detecting the area of pixels, more data is detected and can be used to determine more accurate pixel values in the low-resolution image. Parameter Dp refers to the step of decimation performed on the area of pixels detected by operator Rkl from the high-resolution image X. As a result, the area detected by operator Rij from the low-resolution image yt can successfully be compared to an equivalent area detected by operator Rkl from the high-resolution image after the area detected by operator Rkl is decimated. In an exemplary embodiment of the subject matter, the ratio between the size of the area detected by operator Rkl and the size of the area detected by operator Rij is constant and is called a decimation factor, used for decimating areas detected by operator Rkl. The functional TV refers to a total variation value added for smoothing the low-resolution image, and it may be replaced by many regularizing functionals known to a person skilled in the art.
One technical effect of the methods described above is the ability to use several processors, each processor analyzing another part of the handled image, thus reducing the time required for improving the resolution. Another technical effect is the lack of requirement to determine, store and use motion vectors when improving the resolution of a sequence of images. Another technical effect is the use of an iterative approach that can be terminated when the level of resolution is higher than a predefined level. Another technical effect is the use of small areas in large numbers, for achieving better images.
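The order of operations described above, extracting an area from the high-resolution image first and decimating it afterwards, can be sketched as follows. The sketch makes simplifying assumptions: areas are addressed by their top-left corner, decimation keeps every `factor`-th pixel without any blurring, and the comparison is a plain sum of squared differences. All names are hypothetical.

```python
def extract_area(image, top, left, size):
    """Square area of the given size whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def decimate(area, factor):
    """Keep every factor-th pixel in both directions (assumed form of Dp)."""
    return [row[::factor] for row in area[::factor]]

def area_distance(high_res, low_res, k, l, i, j, low_size, factor):
    """Compare a decimated high-resolution area with a low-resolution area."""
    # Extract first, at full high-resolution size, then decimate.
    high_area = extract_area(high_res, k, l, low_size * factor)
    reduced = decimate(high_area, factor)
    low_area = extract_area(low_res, i, j, low_size)
    return sum((a - b) ** 2
               for row_a, row_b in zip(reduced, low_area)
               for a, b in zip(row_a, row_b))
```

Because the area is taken from the high-resolution image at full size, the extraction step sees `factor * factor` times as many pixels as the low-resolution area contains, and only the final comparison operates on areas of equal size.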
While the disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings without departing from the essential scope thereof. Therefore, it is intended that the disclosed subject matter not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but only by the claims that follow.
This application claims priority from provisional application No. 60/982,800 filed Oct. 26, 2007, provisional application No. 61/015,420 filed Dec. 20, 2007, and from which is hereby incorporated by reference.
Number | Date | Country
---|---|---
60982800 | Oct 2007 | US
61015420 | Dec 2007 | US