This invention pertains to the field of digital image processing and more particularly to a method for increasing the resolution of a digital image.
In order to reduce the amount of image data for storage and transmission, high-resolution images are often down-sized and then compressed to produce low-resolution images with a smaller file size. The consequence of these degradation processes, both down-sizing and compression, is the loss of high-frequency information from the original images. However, it is frequently desirable to invert the processes to produce a high-resolution image from a low-resolution image. Both down-sizing and compression are lossy processes, and are not mathematically invertible. Many algorithms have been developed that attempt to approximately invert the two degradation processes separately. Generally, down-sizing operations are performed in the spatial domain, while compression operations are performed in the frequency domain. It is challenging to develop an algorithm that simultaneously accounts for both losses.
Commonly used approaches for increasing image resolution are interpolation and machine learning. Interpolation algorithms are typically based on assumptions about the characteristics of the original image. The main problem with this approach is that the interpolation operation cannot restore the high-frequency spatial detail discarded by the down-sizing process. Simple interpolation methods, such as linear or cubic interpolation, tend to generate overly smooth images with ringing and jagged artifacts.
Machine learning algorithms are based on using prior knowledge to estimate the high-frequency details in the high-resolution images. In the article “Example-based super-resolution” (IEEE Computer Graphics and Applications, Vol. 22, pp. 56-65, 2002), Freeman et al. describe a promising framework for an example-based super-resolution algorithm. A set of basis functions forms a dictionary from which any new image region can be constructed as a sparse linear combination of elements in the dictionary. Matched low- and high-resolution dictionaries are created during a training phase. The coefficients necessary to recreate a low-resolution image region are applied to the high-resolution dictionary, forming a high-resolution test image region. The problem with this approach is that the dictionary needs to be large to produce smooth and graceful coefficient activity across diverse test imagery. If the dictionary has too few entries, there may not be enough variety in the spatial information to produce a good-quality high-resolution image. The solution of having a large dictionary, however, brings with it storage issues and the increased compute power necessary to solve for coefficients from so many dictionary elements.
JPEG is a commonly used compression algorithm for digital images. The JPEG compression process partitions an image into 8×8 pixel blocks, applying a discrete cosine transform (DCT) to each block to obtain an 8×8 DCT coefficient array F[u,v], where u and v are indices between 1 and 8. The DCT coefficients are quantized by an 8×8 array of numbers called the quantization table Q[u,v]. Quantized DCT coefficients F′[u,v] are determined according to the following equation:
F′[u,v]=round(F[u,v]/Q[u,v]) (1)
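By way of illustration only, the quantization of Eq. (1), together with the corresponding rescaling performed during decompression, can be sketched in a few lines of Python. The sketch assumes SciPy's dctn/idctn routines for the 8×8 DCT and is intended to illustrate the arithmetic of Eq. (1), not to describe any particular JPEG implementation:

import numpy as np
from scipy.fft import dctn, idctn

def quantize_block(block, Q):
    # block: 8x8 array of (level-shifted) pixel values; Q: quantization table
    F = dctn(block, norm='ortho')   # forward 8x8 DCT giving F[u,v]
    return np.round(F / Q)          # Eq. (1): F'[u,v] = round(F[u,v]/Q[u,v])

def dequantize_block(F_quant, Q):
    # The decoder can only rescale the quantized coefficients; the
    # information discarded by the rounding in Eq. (1) is lost.
    return idctn(F_quant * Q, norm='ortho')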
The luminance quantization table recommended by the JPEG standard (table T50) is shown in Eq. (2). The numbers in the table can be scaled up (or down) to increase (or decrease) the degree of compression at each individual frequency.
T50[u,v]=
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99 (2)
It is well known to those skilled in the art that the human visual system is generally more sensitive to lower frequencies (upper-left corner of the array), and is generally less sensitive to higher frequencies (lower-right corner of the array). Therefore, it is common for the quantization table to have smaller numbers in the upper-left corner and larger numbers in the lower-right corner. With this kind of quantization table, compression is achieved by removing the small high-frequency components of the DCT coefficients. However, when the image is up-scaled, the missing high-frequency information is shifted to lower frequencies, and the up-scaled image can therefore exhibit visible JPEG compression artifacts. Over the years, many artifact removal algorithms have been developed. Most of them concentrate on the block artifacts that result from a high degree of compression. Commonly assigned U.S. Pat. No. 7,139,437 to Jones et al., entitled “Method and system for removing artifacts in compressed images,” is an example of one such method.
There remains a need for a fast and robust technique to simultaneously increase the resolution of a digital image and perform JPEG compression artifact removal.
The present invention represents a method for processing an input digital image having a number of input image pixels to determine an output digital image having a larger number of output image pixels, comprising:
analyzing the image pixels in the input digital image to assign each image pixel to one of a plurality of different pixel classifications;
providing a plurality of pixel generation processes, each pixel generation process being associated with a different pixel classification and being adapted to operate on a neighborhood of input image pixels around a particular input image pixel to provide a plurality of output image pixels corresponding to the particular input image pixel;
for each input image pixel, using the pixel generation process associated with the assigned pixel classification to operate on the neighborhood of input image pixels around that input image pixel, thereby providing the corresponding output image pixels; and
storing the output digital image in a processor-accessible memory;
wherein the method is performed, at least in part, by a data processor.
This invention has the advantage that the increase of image resolution and the removal of JPEG artifacts, or of other image artifacts such as noise/grain, are addressed at the same time. It produces improved and more consistent results, especially with respect to artifacts that would otherwise become visible after the image resolution is increased.
This invention has the additional advantage that, since the processing models are generated based on prior knowledge of image resizing and JPEG compression, the method is able to recover some of the missing information lost in the degradation processes. As a result, the method can produce an enhanced image that is closer to the original high-resolution image.
Since the invention is based on a neural network framework, it has the additional advantage that it requires substantially less computation resources and processing time than many prior art methods.
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
In the following description, some embodiments of the present invention will be described in terms that would ordinarily be implemented as software programs. Those skilled in the art will readily recognize that the equivalent of such software may also be constructed in hardware. Because image manipulation algorithms and systems are well known, the present description will be directed in particular to algorithms and systems forming part of, or cooperating more directly with, the method in accordance with the present invention. Other aspects of such algorithms and systems, together with hardware and software for producing and otherwise processing the image signals involved therewith, not specifically shown or described herein may be selected from such systems, algorithms, components, and elements known in the art. Given the system as described according to the invention in the following, software not specifically shown, suggested, or described herein that is useful for implementation of the invention is conventional and within the ordinary skill in such arts.
The invention is inclusive of combinations of the embodiments described herein. References to “a particular embodiment” and the like refer to features that are present in at least one embodiment of the invention. Separate references to “an embodiment” or “particular embodiments” or the like do not necessarily refer to the same embodiment or embodiments; however, such embodiments are not mutually exclusive, unless so indicated or as are readily apparent to one of skill in the art. The use of singular or plural in referring to the “method” or “methods” and the like is not limiting. It should be noted that, unless otherwise explicitly noted or required by context, the word “or” is used in this disclosure in a non-exclusive sense.
The phrase, “digital image file”, as used herein, refers to any digital image file, such as a digital still image or a digital video file.
The data processing system 110 includes one or more data processing devices that implement the processes of the various embodiments of the present invention, including the example processes described herein. The phrases “data processing device” or “data processor” are intended to include any data processing device, such as a central processing unit (“CPU”), a desktop computer, a laptop computer, a mainframe computer, a personal digital assistant, a Blackberry™, a digital camera, a cellular phone, or any other device for processing data, managing data, or handling data, whether implemented with electrical, magnetic, optical, biological components, or otherwise.
The data storage system 140 includes one or more processor-accessible memories configured to store information, including the information needed to execute the processes of the various embodiments of the present invention, including the example processes described herein. The data storage system 140 may be a distributed processor-accessible memory system including multiple processor-accessible memories communicatively connected to the data processing system 110 via a plurality of computers or devices. On the other hand, the data storage system 140 need not be a distributed processor-accessible memory system and, consequently, may include one or more processor-accessible memories located within a single data processor or device.
The phrase “processor-accessible memory” is intended to include any processor-accessible data storage device, whether volatile or nonvolatile, electronic, magnetic, optical, or otherwise, including but not limited to, registers, floppy disks, hard disks, Compact Discs, DVDs, flash memories, ROMs, and RAMs.
The phrase “communicatively connected” is intended to include any type of connection, whether wired or wireless, between devices, data processors, or programs in which data may be communicated. The phrase “communicatively connected” is intended to include a connection between devices or programs within a single data processor, a connection between devices or programs located in different data processors, and a connection between devices not located in data processors at all. In this regard, although the data storage system 140 is shown separately from the data processing system 110, one skilled in the art will appreciate that the data storage system 140 may be stored completely or partially within the data processing system 110. Further in this regard, although the peripheral system 120 and the user interface system 130 are shown separately from the data processing system 110, one skilled in the art will appreciate that one or both of such systems may be stored completely or partially within the data processing system 110.
The peripheral system 120 may include one or more devices configured to provide digital content records to the data processing system 110. For example, the peripheral system 120 may include digital still cameras, digital video cameras, cellular phones, or other data processors. The data processing system 110, upon receipt of digital content records from a device in the peripheral system 120, may store such digital content records in the data storage system 140.
The user interface system 130 may include a mouse, a keyboard, another computer, or any device or combination of devices from which data is input to the data processing system 110. In this regard, although the peripheral system 120 is shown separately from the user interface system 130, the peripheral system 120 may be included as part of the user interface system 130.
The user interface system 130 also may include a display device, a processor-accessible memory, or any device or combination of devices to which data is output by the data processing system 110. In this regard, if the user interface system 130 includes a processor-accessible memory, such memory may be part of the data storage system 140 even though the user interface system 130 and the data storage system 140 are shown separately in
The additional image data 220 can include various types of non-pixel information that pertains to the input digital image 210. In a preferred embodiment, the additional image data 220 includes metadata residing in the image file used to store the input digital image 210. For example, the metadata can include data pertaining to image capture settings such as image width and height, camera exposure index, exposure time, lens aperture (i.e., F/#), an indication of whether the flash was fired, and image compression settings. The metadata can also include an indication of various user-selectable options. For example, the user-selectable options can include information relevant to the formation of the improved high-resolution digital image 280, such as an output image size, display/printer characteristics, output media, and output finishing options. The metadata can also include semantic information pertaining to the image, such as over-all scene type, regional content information, and the location/identity of persons and objects. In some cases, some or all of the semantic information can be generated automatically by analyzing the input digital image 210 using any appropriate image analysis method known in the art. In other cases, some or all of the semantic information can be provided by a user using an appropriate user interface. For example, the user can add a caption to the input digital image 210 or manually identify the persons in it.
An image enhancement block 230 operates on the input digital image 210 using the selected pixel generation models 245, thereby providing a high-resolution digital image 250. The high-resolution digital image 250 has an increased spatial resolution (i.e., a larger number of image pixels) and an improved image quality relative to the input digital image 210. Optionally, the high-resolution digital image 250 can be further enhanced in the post processing block 270 to provide the final improved high-resolution digital image 280. The post processing block 270 may operate responsive to various inputs including the additional image data 220 or data generated during the pixel generation process selection block 240.
After the improved high-resolution digital image 280 has been formed, it can be stored in a processor-accessible memory (e.g., in the data storage system 140).
Image regions within the input digital image 210 (or the entire input digital image 210) are classified using scene classification block 330 to associate each image region with one or more predefined classes. The resulting classifications are stored as classification data 335. The classes can include both global scene content classifications and local scene content classifications. Examples of classes that can be used in various embodiments include Face Region, Text Region, Graphics Region, Indoor Scene, Outdoor Scene, Beach Scene, Portrait Scene, and Macro Scene. The determination of the appropriate classes enables the selection of different pixel generation processes 245 for different types of image content. For example, the pixel generation process 245 that produces the best results for a text document may not produce the best results for a pictorial image of a person's face.
In some embodiments, the scene classification block 330 determines the classification data 335 by automatically analyzing one or both of the input digital image 210 and the additional image data 220. Algorithms for performing scene classification are well known in the art, and any appropriate algorithm can be used in accordance with the present invention. In some embodiments, a user interface providing various user-selectable options can be provided to enable a user to manually assign one or more classes to the input digital image 210 (or to regions in the input digital image 210).
An image characterization block 340 is used to analyze the image pixels in the input digital image 210 to determine image characteristics 345. The image characteristics 345 can include both global image characteristics and local image characteristics. Examples of image characteristics 345 that can be determined in various embodiments would include image size, edge activity, noise characteristics, and tone scale characteristics.
The image size characteristics of the input digital image 210, together with the desired output image size, provide an indication of the magnification that needs to be applied to the input digital image 210. The size of the input digital image 210 is typically stored as metadata in association with the image file (i.e., as part of the additional image data 220), and the output image size is typically provided by a user or specified in accordance with a particular application. In some embodiments, the total number of image pixels in the input digital image 210 can be used to represent the image size characteristics. In other embodiments, the image width or the image height, or both, can be used to represent the image size characteristics.
The edge activity image characteristic provides an indication of the local image structure at each pixel location. Various types of edge detection processes can be used to determine the edge activity. In some embodiments, the edge activity is determined by computing the magnitude of the intensity gradient. The edge activity can then be quantized into three ranges: low, medium, and high.
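By way of illustration, one simple way to compute and quantize the edge activity is sketched below in Python. The gradient-magnitude measure follows the description above; the two threshold values are illustrative assumptions rather than values prescribed by the method:

import numpy as np

def edge_activity(image, t_low=10.0, t_high=40.0):
    # Magnitude of the intensity gradient at each pixel location.
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    # Quantize into three ranges: 0 = low, 1 = medium, 2 = high.
    return np.digitize(mag, [t_low, t_high])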
The noise activity image characteristic provides an indication of the noise (i.e., unwanted random variations) in the input digital image 210. The noise activity can be determined using various types of noise estimation processes as is known in the art. In some embodiments, the noise activity can be determined by computing a standard deviation of the pixel values in a flat image area. In other embodiments, the noise activity can be determined based on the additional image data 220 (e.g., images captured at higher exposure index settings are known to exhibit higher noise levels than images captured at lower exposure index settings). In some embodiments different noise activity levels can be determined for different image regions or different image pixels (e.g., the noise activity may be larger for dark image pixels than for light image pixels).
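For illustration, the flat-area noise estimate mentioned above can be sketched as follows (the selection of the flat region is assumed to be supplied by the caller):

import numpy as np

def estimate_noise(image, flat_mask):
    # flat_mask: boolean mask (or slice) selecting a flat image area; the
    # standard deviation of the pixel values there estimates the noise.
    return float(np.std(image[flat_mask]))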
The tone scale activity provides an indication of the tone scale characteristics of the input digital image. The tone scale activity can be determined using various types of tone scale characterization methods. In some embodiments the tone scale activity provides an indication of image contrast, and can be measured from a histogram of the luminance values in the input digital image 210. For example, the tone scale activity can be determined by computing a difference between a minimum luminance code value (e.g., a code value which 99% of all image pixels in the input digital image 210 are greater than) and a maximum luminance code value (e.g., a code value which 99% of pixels in the input digital image 210 are less than). In some embodiments, a single tone scale activity value can be determined for the entire input digital image 210. In other embodiments, different tone scale activity values can be determined for different image regions, or on a pixel-by-pixel basis based on the image pixels in a local pixel neighborhood.
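For illustration, the histogram-based tone scale activity measure described above can be sketched as follows; the 1st and 99th percentiles correspond to the robust minimum and maximum luminance code values:

import numpy as np

def tone_scale_activity(luma):
    # Robust minimum/maximum luminance code values from the histogram.
    lo, hi = np.percentile(luma, [1.0, 99.0])
    return hi - lo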
An assign pixel classifications block 350 is used to assign pixel classifications 355 to each image pixel in the input digital image 210 responsive to some or all of the JPEG compression settings 325, the scene classification data 335 and the image characteristics 345. In some cases, the same pixel classification 355 can be assigned to every image pixel in the input digital image 210. In other cases, different pixel classifications 355 can be assigned to different image pixels.
The pixel classifications 355 can be assigned using different strategies. For example, in some embodiments, the pixel classifications 355 can be assigned solely based on a single attribute (e.g., the edge activity level) as illustrated in
In other embodiments, a more complex logic tree can be used to sequentially consider a set of different attributes as shown in
Depending on the strategy used by the assign pixel classifications block 350, it will be recognized that it is only necessary to compute the image attributes that are relevant to that strategy. For example, in some embodiments the assign pixel classifications block 350 can base the assignment of the pixel classifications solely on an edge activity characteristic as in
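By way of illustration, a simple logic tree for assigning a pixel classification 355 from a set of previously computed attributes might be sketched as follows. The particular attributes tested, their ordering, and the class labels are illustrative assumptions only:

def classify_pixel(edge_level, noise_level, jpeg_qf=None):
    # edge_level: 0/1/2 (low/medium/high) from the edge activity measure;
    # noise_level: 'low' or 'high'; jpeg_qf: JPEG quality factor, if known.
    if jpeg_qf is not None and jpeg_qf < 25:
        return 'HEAVILY_COMPRESSED'   # strong compression artifacts expected
    if edge_level == 2:
        return 'EDGE'                 # detail-preserving process preferred
    if noise_level == 'high':
        return 'NOISY_FLAT'           # noise-suppressing process preferred
    return 'SMOOTH'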
A select pixel generation processes step 360 is used to select appropriate pixel generation processes 245 to be used for each of the image pixels in the input digital image 210 in accordance with the corresponding pixel classifications 355. In a preferred embodiment, the pixel generation process 245 associated with a particular pixel classification 355 is adapted to produce optimal results for the corresponding type of image content.
Various types of pixel generation processes 245 can be used in accordance with the present invention, including pixel interpolation algorithms and machine learning algorithms. Examples of appropriate pixel interpolation algorithms include, but are not limited to, nearest neighbor interpolation, bilinear interpolation and bicubic interpolation. These types of interpolation algorithms are well-known to those skilled in the image resizing art. Examples of appropriate machine learning algorithms include, but are not limited to, artificial neural networks, sparse representations, regression trees, Bayesian networks, support vector machines, and various combinations thereof. These types of machine learning algorithms are well-known to those skilled in the image processing art.
In some embodiments, different pixel generation processes 245 can be selected to operate on different color channels of the input digital image 210. For example, if the input digital image 210 is in a luma-chroma color space, such as the well-known YCbCr color space, then it can be desirable to assign one pixel generation process to the luma color channel (i.e., “Y”) and a different pixel generation process to the chroma color channels (i.e., Cb and Cr). In some embodiments, different pixel generation processes 245 can be selected to operate on different frequency bands of the input digital image 210.
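For illustration, such a per-channel assignment can be sketched as follows in Python. The sketch uses simple interpolation stand-ins from scipy.ndimage for both channel types; in practice the luma channel would typically be handled by a trained model:

from scipy.ndimage import zoom

def bicubic_2x(channel):
    return zoom(channel, 2, order=3)    # bicubic interpolation

def bilinear_2x(channel):
    return zoom(channel, 2, order=1)    # bilinear interpolation

# Hypothetical per-channel assignment: a higher-quality process for luma,
# a cheaper process for the visually less critical chroma channels.
channel_processes = {'Y': bicubic_2x, 'Cb': bilinear_2x, 'Cr': bilinear_2x}

def upsample_ycbcr(y, cb, cr):
    return (channel_processes['Y'](y),
            channel_processes['Cb'](cb),
            channel_processes['Cr'](cr))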
In one exemplary embodiment, the image pixels in the input digital image are assigned to one of three different pixel classifications 355 corresponding to different edge activity levels as shown in
In some embodiments, the pixel classification can be assigned according to multiple image attributes as illustrated in
Returning to a discussion of
It will be recognized by one skilled in the art that the size of the input pixel neighborhood 410 and the block-size for the high-resolution image pixels 430 can be different in various embodiments. For example, the neural network machine learning algorithm can be trained to operate using a different number of input nodes (e.g., a 5×5 block of image pixels or a 9×9 block of image pixels). Likewise, the neural network machine learning algorithm can be trained to provide a different number of high-resolution image pixels 430 (e.g., a 3×3 block of high-resolution image pixels 430).
The size of the input pixel neighborhood 410 will also depend on the type of pixel generation process 245. For example, different types of pixel interpolation algorithms can require different size input pixel neighborhoods 410 (e.g., a single input pixel can be used for nearest neighbor interpolation, a 2×2 block of input pixels can be used for bilinear interpolation, and a 4×4 block of input pixels can be used for bicubic interpolation).
In the embodiment illustrated in
In other embodiments, the method of the present invention can be performed multiple times to increase the image resolution to the closest power of two to the desired final image resolution. A conventional image resizing algorithm can then be used to resize the image to the desired final resolution. For example, if it is desired to increase the resolution by a factor of 3.5×, the method of the present invention can be applied twice to provide a high-resolution image having a resolution increase of 4×. The resulting image can then be down-sampled using a conventional bicubic interpolation to obtain the desired output image resolution. When the method of the present invention is applied multiple times to a given input digital image 210, it may be desirable to train the pixel generation processes 245 differently for the first iteration and any additional iterations. For example, the pixel generation processes 245 used for the first iteration can be optimized to compensate for the compression artifacts in the input digital image 210, while the pixel generation processes 245 used for the additional iterations can be optimized to operate on digital images without such artifacts.
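For illustration, the 3.5× example described above can be sketched as follows, where super_resolve_2x stands for any 2× pixel generation process of the present method (the function name is hypothetical):

from PIL import Image

def upscale_3_5x(img, super_resolve_2x):
    # Apply the 2x process twice (4x total), then bicubically
    # down-sample to the desired 3.5x output size.
    hi = super_resolve_2x(super_resolve_2x(img))
    w, h = img.size
    return hi.resize((int(w * 3.5), int(h * 3.5)), Image.BICUBIC)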
A down-size images step 505 is used to reduce the size of the high-resolution training images 500 by a factor of 2×. A compress/decompress images step 510 is used to apply a JPEG compression operation to the down-sized images, followed by a decompression operation, thereby providing low-resolution images 515. A determine pixel classifications step 520 is used to determine pixel classifications 525 for the image pixels in the low-resolution images 515. The determine pixel classifications step 520 uses a process analogous to that described above for the assign pixel classifications block 350.
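For illustration, the degradation used to form the low-resolution images 515 can be sketched as follows in Python using the Pillow library (the JPEG quality setting shown is an illustrative assumption):

import io
from PIL import Image

def make_low_res(hi_res_image, quality=50):
    # Down-size by a factor of 2x, then JPEG compress/decompress in
    # memory so the training pairs embody both degradation processes.
    small = hi_res_image.convert('RGB').resize(
        (hi_res_image.width // 2, hi_res_image.height // 2), Image.BICUBIC)
    buf = io.BytesIO()
    small.save(buf, format='JPEG', quality=quality)
    buf.seek(0)
    return Image.open(buf)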
An apply neural network model step 530 is used to process the low-resolution images 515 to determine reconstructed high-resolution images 535 responsive to a set of neural network parameters 555. The neural network parameters 555 correspond to the weighting parameters for the interconnections in the neural network, and are initialized to nominal values at the start of the training process. (The apply neural network model step 530 only needs to process the image pixels in the low-resolution images 515 that have the particular pixel classification 525 for which the neural network model is being trained.)
A compute pixel differences step 540 is used to compute pixel differences 545 between the image pixels in the high-resolution training images 500 and the corresponding image pixels in the reconstructed high-resolution images 535. (The pixel differences 545 only need to be determined for image pixels that have the particular pixel classification 525 for which the neural network model is being trained.)
A back propagation training process is used to determine the neural network parameters 555 that minimize the pixel differences 545 for the training set. The following description is a high-level summary of an exemplary back propagation training process; the details of such processes will be well-known to one skilled in the art. A back propagation step 550 is used to update the neural network parameters 555 responsive to the pixel differences 545. The updated neural network parameters 555 are then used to determine an updated set of reconstructed high-resolution images 535 and an updated set of pixel differences 545. This process is repeated iteratively until the set of neural network parameters 555 that minimize the pixel differences 545 are identified.
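By way of illustration, the iterative update of the neural network parameters 555 can be sketched with a toy fully-connected network in NumPy. The architecture (49 inputs, here assumed to correspond to a 7×7 neighborhood; one hidden layer; 4 outputs for an assumed 2×2 block of high-resolution pixels) and the learning rate are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (49, 32)); b1 = np.zeros(32)   # parameters 555
W2 = rng.normal(0.0, 0.1, (32, 4));  b2 = np.zeros(4)

def train_step(x, target, lr=1e-3):
    # x: batch of flattened 7x7 neighborhoods; target: true 2x2 pixels.
    global W1, b1, W2, b2
    h = np.tanh(x @ W1 + b1)
    y = h @ W2 + b2
    diff = y - target                        # pixel differences 545
    # Back-propagate the pixel differences to update the weights.
    gW2 = h.T @ diff;  gb2 = diff.sum(axis=0)
    dh = (diff @ W2.T) * (1.0 - h**2)        # tanh derivative
    gW1 = x.T @ dh;    gb1 = dh.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
    return float((diff**2).mean())           # error to monitor convergence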
The neural network training process shown in
In some embodiments, other operations can be used to modify the down-sized training images in addition to the compress/decompress images step 510 in
In addition to removing unwanted artifacts, the machine learning training process in
In some embodiments, the high-resolution training images 500 can be modified using an optional enhance images step 560 after the formation of the low-resolution images 515 to introduce various image enhancements. In some embodiments, the image enhancements can include image processing operations such as image sharpening or noise removal. The image enhancements can also include applying artistic variations such as water color, line sketch, cartoonization, and other artistic effects. The neural network model can then be trained to automatically provide reconstructed high-resolution images 535 that are similar to the modified high-resolution training images, and will therefore include these special effects.
In some embodiments, the neural network model can be trained to generate intermediate data for up-stream processing. For example, in some applications, the high-resolution training images 500 can be images of products on an assembly line or medical diagnostic scans of blood samples. The enhance images step 560 can be used to manually or automatically modify the high-resolution training images 500 in a way that facilitates the identification of manufacturing defects or cell malignancies, respectively. In this way, the neural network model can then be trained to automatically provide reconstructed high-resolution images 535 in which the manufacturing defects or cell malignancies are similarly highlighted. It will be readily apparent to those skilled in the art that this approach can be extended to various types of image enhancement applications in the manufacturing, security, military, and consumer fields.
As mentioned earlier, other types of machine learning algorithms can also be used for the pixel generation processes in various embodiments of the present invention (e.g., sparse representations, regression trees, Bayesian networks, or support vector machines). Those skilled in the art will recognize that appropriate training processes can be used to train the machine learning algorithms accordingly. Pixel interpolation algorithms do not generally require a training process.
The type of pixel generation process 245 that is used for each pixel classification 355 can be determined in various ways. In some cases, a variety of different pixel generation processes 245 (e.g., different pixel interpolation algorithms and different machine learning algorithms) can be applied to a set of training images and optimized accordingly. The pixel generation process 245 that produces the best results (e.g., the lowest average pixel difference 545) can then be selected. In some cases, there will be significant differences in the computational complexity of the different pixel generation processes 245. Therefore, in some cases it may be appropriate to make a tradeoff between the quality of the results and the computational efficiency.
In some embodiments, the determination of the high-resolution image pixels 430 can also be responsive to a JPEG quality factor 445 that was used to compress the input digital image 210.
As will be well-known to one skilled in the compression art, the JPEG quality factor 445 (Qf) is a number that is used to scale the standard JPEG T50 quantization table, which was given in Eq. (2). The value of Qf can range from 1 to 100, and is used to compute a scale factor S:
S=50/Qf for Qf<50; S=2−Qf/50 for Qf≥50 (3)
The scale factor is then used to scale the T50 quantization table to provide the quantization table Q[u,v] to be used by the JPEG compression algorithm:
Q[u,v]=S*T50[u,v] (4)
Large values of the JPEG quality factor 445 (e.g., Qf>90) will result in negligible losses in image quality, while small values of the JPEG quality factor 445 (e.g., Qf<25) will result in significant losses in image quality.
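For illustration, the computation of the quantization table from the JPEG quality factor 445 according to Eqs. (3) and (4) can be sketched as follows (the clipping of the scaled entries to the valid quantizer range is a conventional implementation detail, not part of Eqs. (3) and (4)):

import numpy as np

def scale_factor(qf):
    # Eq. (3)
    return 50.0 / qf if qf < 50 else 2.0 - qf / 50.0

def quant_table(qf, t50):
    # Eq. (4): Q[u,v] = S * T50[u,v], rounded and clipped to [1, 255].
    q = np.round(scale_factor(qf) * np.asarray(t50, dtype=float))
    return np.clip(q, 1, 255).astype(int)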
The neural network machine learning algorithm used for the pixel generation process 245 can accordingly be trained separately for different values (or ranges of values) of the JPEG quality factor 445, so that the model appropriate to the compression level of the input digital image 210 can be selected.
In some embodiments, the determination of the high-resolution image pixels 430 (
As mentioned earlier, an optional post processing block 270 can be used to process the high-resolution digital image 250 to provide the final improved high-resolution digital image 280. The post processing block 270 can perform any type of image processing known in the art, such as additional resizing, tone/color processing, sharpening, halftoning, noise removal, or texture processing (e.g., noise addition or removal).
In an exemplary embodiment, the post processing block 270 applies a texture synthesis process 600 as illustrated in
It has been observed that in some cases, the high-resolution digital image 250 can have certain texture attributes that may appear unnatural to a human observer. For example, the texture in relatively flat facial regions (e.g., the cheeks and forehead regions) may be overly flat and cause the face to appear “plastic” and mannequin-like. In this case, the perceived image quality can be improved by adding an appropriate texture (i.e., spatially-varying “noise”) using the texture synthesis process 600. In a preferred embodiment, different texture characteristics are imparted by the texture synthesis process 600 responsive to the pixel classifications 355 (
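By way of illustration, a very simple texture synthesis of this kind can be sketched as follows; the noise amplitude and the restriction to pixels having low edge activity are illustrative assumptions:

import numpy as np

def add_texture(luma, edge_levels, sigma=2.0, seed=0):
    # Add mild random texture only where the edge activity is low
    # (edge_levels == 0), countering the "plastic" look of flat regions.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, luma.shape)
    return np.clip(luma + noise * (edge_levels == 0), 0, 255)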
A computer program product can include one or more non-transitory, tangible, computer-readable storage media, for example: magnetic storage media such as a magnetic disk (such as a floppy disk) or magnetic tape; optical storage media such as an optical disk, optical tape, or machine-readable bar code; solid-state electronic storage devices such as random access memory (RAM) or read-only memory (ROM); or any other physical device or medium employed to store a computer program having instructions for controlling one or more computers to practice the method according to the present invention.
The invention has been described in detail with particular reference to certain preferred embodiments thereof, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.
Reference is made to commonly assigned, co-pending U.S. patent application Ser. No. 13/346,816 (docket K000673), entitled: “Super-resolution using selected edge pixels”, by Adams et al., which is incorporated herein by reference.