When capturing a scene, a user device, such as a mobile phone, often creates an image of the scene that is lower in resolution than an image of the scene captured and rendered by other devices, such as a digital single-lens reflex (DSLR) camera. Images captured by these user devices may also be noisy and have lower dynamic range because the relatively small physical sensor size permitted by these user devices limits the user device camera's spatial resolution. The image sensor of the user device may also have smaller apertures, limiting the user device camera's light-gathering ability, and smaller pixels, reducing the signal-to-noise ratio that the user device uses to process a captured image.
Furthermore, the image sensor of the user device's camera often includes a color filter array (CFA), which traditionally requires digital image-processing hardware of the user device to use demosaicing techniques while rendering a captured image of a scene. Demosaicing techniques, in general, are detrimental to super-resolution rendering. Effects of demosaicing techniques can include chromatic aliasing, false gradients, and Moiré patterns that lead to the user device rendering the captured image of the scene at a poor resolution and with undesirable artifacts.
The present disclosure describes systems and techniques for creating a super-resolution image of a scene captured by a user device. Natural handheld motion introduces, across multiple frames of an image of a scene, sub-pixel offsets that enable the use of super-resolution computing techniques to form color planes, which are accumulated and merged to create a super-resolution image of the scene. These systems and techniques offer advantages over other systems and techniques that rely on demosaicing, providing the super-resolution image of the scene without detrimental artifacts, such as chromatic aliasing, false gradients, and Moiré patterns.
In some aspects, a method performed by a user device to render a super-resolution image of a scene is described. The method includes capturing, in a burst sequence, multiple frames of an image of a scene, where the multiple frames have respective, relative sub-pixel offsets of the image due to a motion of the user device during the capture of the multiple frames. The method includes using the captured, multiple frames to perform super-resolution computations that include computing Gaussian radial basis function (RBF) kernels and computing a robustness model. The method further includes accumulating, based on the super-resolution computations, color planes, combining the accumulated color planes to create the super-resolution image of the scene, and rendering the super-resolution image of the scene.
In other aspects, a method for providing color planes to an apparatus is described. The method includes computing Gaussian radial basis function kernels, wherein computing the Gaussian radial basis function kernels includes (i) selecting a reference frame and (ii) computing a kernel covariance matrix based on analyzing local gradient structure tensors, where the local gradient structure tensors correspond to edges, corners, or textured areas of content included in the reference frame.
The method also includes computing a robustness model, wherein computing the robustness model uses a statistical neighborhood model to compute a color mean and spatial standard deviation. Based on the computed Gaussian RBF kernels and the computed robustness model, the method includes determining the contribution of pixels to color planes and accumulating the color planes. The color planes are then provided to the apparatus.
In yet other aspects, a user device is described. The user device includes one or more processors, one or more image sensors, and a display. The user device also includes a computer-readable medium storing instructions of a super-resolution manager that, when executed by the one or more processors, directs the user device to capture, in a burst sequence, multiple frames of an image of a scene, where the multiple frames have respective, relative offsets of the image. The super-resolution manager also directs the user device to use the captured, multiple frames to perform super-resolution computations and accumulate planes based on the super-resolution computations, combine the accumulated planes to create the super-resolution image of the scene, and render the super-resolution image of the scene.
The details of one or more implementations are set forth in the accompanying drawings and the following description. Other features and advantages will be apparent from the description and drawings, and from the claims. This summary is provided to introduce subject matter that is further described in the Detailed Description and Drawings. Accordingly, a reader should not consider the summary to describe essential features nor limit the scope of the claimed subject matter.
The present disclosure describes details of one or more aspects associated with creating a super-resolution image of a scene captured by a user device.
The present disclosure describes techniques and systems for creating a super-resolution image of a scene. While features and concepts of the described systems and methods for super-resolution using natural handheld motion applied to a user device can be implemented in any number of different environments, systems, devices, and/or various configurations, aspects are described in the context of the following example devices, systems, and configurations.
The variations 104-108 of the image of the scene, captured in a burst sequence by the user device 102, include sub-pixel offsets that are a result of a natural handheld-motion 110 applied to the user device 102 while the user device 102 is capturing the image of the scene. The natural handheld-motion 110 may be caused, for example, by a hand-tremor of a user of the user device 102 that induces an in-plane motion, an out-of-plane motion, a pitch, a yaw, or a roll to the user device 102 while the user device 102 is capturing the variations 104-108 of the image of the scene.
In some instances, and as an alternative to the sub-pixel offsets resulting from the natural handheld-motion 110, the sub-pixel offsets may result from another motion applied to the user device 102, such as a haptic motion induced by a vibrating mechanism that is in contact with (or integrated with) the user device 102 or a vibration that is induced while the user device 102 is transported within the operating environment 100 (e.g., the user device 102 may be in motion in a vehicle, moved by the user, and so on).
The user device 102 includes a combination of one or more image sensor(s) 112 for capturing an image. The image sensor 112 may include a complementary metal-oxide semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor. In some instances, the image sensor 112 may include a color filter array (CFA) that overlays pixels of the image sensor 112 and limits intensities, as associated with color wavelengths, of light recorded through the pixels. An example of such a CFA is a Bayer CFA, which filters light according to a red wavelength, a blue wavelength, and a green wavelength. In an instance of multiples of the image sensor 112 (e.g., a combination of more than one image sensor, such as a dual image sensor), the multiples of the image sensor 112 may include combinations of pixel densities (e.g., 40 megapixel (MP), 32 MP, 16 MP, 8 MP) as well as different CFA configurations to support different image processing needs (e.g., inclusion of a Bayer CFA to support red green blue (RGB) image processing, exclusion of a CFA to support monochromatic-image processing). Light from images, when filtered through the Bayer CFA, may generate an image that can be referred to as a Bayer image or a Bayer frame.
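As an illustration of how a Bayer CFA partitions the sensor's pixels, the following sketch separates a raw Bayer frame into its color planes. The RGGB quad layout and the use of NumPy are assumptions made for illustration, not details specified by the present disclosure.

```python
import numpy as np

def split_bayer_planes(raw):
    """Split a raw Bayer frame into color planes (RGGB layout assumed).

    Each returned plane is quarter-resolution: one sample per 2x2 Bayer quad.
    """
    r = raw[0::2, 0::2]   # red sites: even rows, even columns
    g1 = raw[0::2, 1::2]  # green sites sharing rows with red
    g2 = raw[1::2, 0::2]  # green sites sharing rows with blue
    b = raw[1::2, 1::2]   # blue sites: odd rows, odd columns
    return r, g1, g2, b

# Example: a 4x4 raw frame holds four 2x2 Bayer quads.
raw = np.arange(16).reshape(4, 4)
r, g1, g2, b = split_bayer_planes(raw)
```

Each quad thus yields one red sample, two green samples, and one blue sample, which is why green carries twice the sampling density of red or blue in a Bayer image.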
The user device 102 also includes a display 114 for rendering the image. In some instances, the display 114 may be a touchscreen display. The user device 102 also includes a combination of one or more processor(s) 116. The processor 116 may be a single core processor or a multiple core processor composed of a variety of materials, such as silicon, polysilicon, high-K dielectric, copper, and so on. In an instance of multiples of the processor 116 (e.g., a combination of more than one processor), the multiples of processor 116 may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), or an image processing unit (IPU). Furthermore, and in such an instance, the multiples of the processor 116 may perform two or more computing operations using pipeline-processing.
The user device 102 also includes computer-readable storage media (CRM) 118 that includes executable instructions in the form of a super-resolution manager 120. The CRM 118 described herein excludes propagating signals. The CRM 118 may include any suitable memory or storage device such as random-access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), non-volatile RAM (NVRAM), read-only memory (ROM), or Flash memory useable to store the super-resolution manager 120.
Code or instructions of the super-resolution manager 120 may be executed, using the processor 116, to cause the user device 102 to perform operations directed to creating (and rendering) a super-resolution image 122 of the scene. Such operations may include capturing, using the image sensor 112, multiple frames of an image of a scene (e.g., the variations 104, 106, and 108 of the image of the scene). The operations may further include the user device 102 (e.g., the processor 116) performing super-resolution computations, accumulating color planes, combining the accumulated color planes to create a super-resolution image 122 of the scene, and rendering (e.g., through the display 114) the super-resolution image 122 of the scene. The super-resolution image 122 of the scene, in general, has a resolution that is higher than the resolution of the multiple frames of the image of the scene.
As illustrated in
The burst sequence may include capturing the multiple frames 202 at a set time interval that may range, for example, from one millisecond to three milliseconds, one millisecond to five milliseconds, or one-half millisecond to ten milliseconds. Furthermore, and in some instances, the time interval of the burst sequence may be variable based on a motion of the user device (e.g., a time interval may be “shorter” during a high-velocity motion of the user device 102 than another time interval during a low-velocity motion of the user device 102 to keep the offsets at less than one pixel).
As illustrated, the image of frame 206 is respectively offset, relative to the image of frame 204, one half-pixel horizontally and one half-pixel vertically. Furthermore, and as illustrated, the image of frame 208 is respectively offset, relative to the image of frame 204, one-quarter pixel horizontally. Respective, relative sub-pixel offsets can include different magnitudes and combinations of sub-pixel offsets (e.g., one sub-pixel offset associated with one frame might be one-quarter pixel horizontally and three-quarters of a pixel vertically, while another sub-pixel offset that is associated with another frame might be zero pixels horizontally and one-half of a pixel vertically). In general, the techniques and systems described by the present disclosure can accommodate sub-pixel offsets that are more random than the illustrations and descriptions of frames 204-208, including sub-pixel offsets that are non-linear.
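The sub-pixel offsets described above imply that an offset frame samples the scene between the pixel centers of the reference frame, so recovering values at such positions requires interpolation. The sketch below uses bilinear interpolation as an illustrative choice (the present disclosure does not prescribe an interpolation scheme) to shift an image by a fractional offset (dx, dy).

```python
import numpy as np

def bilinear_shift(img, dx, dy):
    """Sample img at positions offset by (dx, dy) pixels, 0 <= dx, dy < 1,
    using bilinear interpolation; the last row and column are dropped
    because their shifted positions fall outside the image."""
    top = (1 - dy) * ((1 - dx) * img[:-1, :-1] + dx * img[:-1, 1:])
    bottom = dy * ((1 - dx) * img[1:, :-1] + dx * img[1:, 1:])
    return top + bottom

# A half-pixel horizontal and vertical offset, as in the frame 206 example:
frame = np.array([[0.0, 1.0], [2.0, 3.0]])
shifted = bilinear_shift(frame, 0.5, 0.5)  # each output averages a 2x2 neighborhood
```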
As illustrated in
In support of the Gaussian RBF kernel computations, the user device 102 filters pixel signals from each frame of the multiple frames 202 to generate respective color-specific image planes corresponding to color channels. The user device 102 then aligns the respective color-specific image planes to a reference frame. In some instances, the reference frame may be formed through creating red/green/blue (RGB) pixels corresponding to Bayer quads by taking red and blue values directly and averaging green values together.
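A minimal sketch of the reference-frame construction just described, assuming an RGGB Bayer quad layout: red and blue values are taken directly, and the two green values of each quad are averaged.

```python
import numpy as np

def bayer_quads_to_rgb(raw):
    """Form RGB pixels from 2x2 Bayer quads (RGGB layout assumed):
    red and blue are taken directly; the two greens are averaged."""
    r = raw[0::2, 0::2]
    g = 0.5 * (raw[0::2, 1::2] + raw[1::2, 0::2])
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

raw = np.array([[10.0, 20.0], [30.0, 40.0]])  # a single RGGB quad
rgb = bayer_quads_to_rgb(raw)                 # one RGB pixel per quad
```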
The user device 102 then computes a covariance matrix. Computing the covariance matrix may include analyzing local gradient structure tensors for content of the reference frame (e.g., a local tensor may be local to an edge, a corner, or a textured area contained within the reference frame). Using the covariance matrix, the user device 102 can compute the Gaussian RBF kernels.
Computing the covariance matrix may rely on the following mathematical relationship:
In mathematical relationship (1), Ω represents a kernel covariance matrix, e1 and e2 represent orthogonal direction vectors and two associated eigenvalues λ1 and λ2, and k1 and k2 control a desired kernel variance.
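The equation itself is not reproduced in this text; a reconstruction consistent with the symbols described for relationship (1), using a standard eigendecomposition form, would be:

```latex
\Omega =
\begin{bmatrix} e_1 & e_2 \end{bmatrix}
\begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}
\begin{bmatrix} e_1^{\mathsf{T}} \\ e_2^{\mathsf{T}} \end{bmatrix}
\tag{1}
```

where $k_1$ and $k_2$ would be chosen as functions of the eigenvalues $\lambda_1$ and $\lambda_2$ to steer the kernel variance along each of the orthogonal directions.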
Computing the local gradient structure tensors may rely on the following mathematical relationship:
In mathematical relationship (2), Ix and Iy represent local image gradients in horizontal and vertical directions, respectively.
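The equation for relationship (2) is likewise not reproduced in this text; the standard local gradient structure tensor consistent with the description (summing products of the horizontal and vertical image gradients over a local window) would be:

```latex
\hat{\Omega} =
\begin{bmatrix}
\sum I_x^2 & \sum I_x I_y \\
\sum I_x I_y & \sum I_y^2
\end{bmatrix}
\tag{2}
```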
In support of the mentioned robustness model computations, the user device 102 may use a statistical neighborhood model to formulate probabilities of pixels contributing to a super-resolution image (e.g., pixels from the multiple frames 202 contributing to the super-resolution image 122 of the scene). The statistical neighborhood model may analyze local statistics, such as a mean, a variance, or a Bayer pattern local quad green channel disparity difference, to form a model that predicts aliasing (e.g., pixel signaling with frequency content above half of a sampling rate that manifests as a lower frequency after sampling).
The robustness model computations, in some instances, may include denoising computations to compensate for color differences. The denoising computations may, in some instances, rely on a spatial color standard deviation or a mean difference between frames.
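As a toy illustration of such a robustness test, a color difference between a frame and the local reference statistics can be compared against the local standard deviation: differences well inside the noise band are trusted, while outliers (likely misalignment) are rejected. The functional form and the parameters s and t below are assumptions made for illustration, not values from the present disclosure.

```python
import numpy as np

def robustness_weight(d, sigma, s=1.0, t=0.1):
    """Toy robustness score in [0, 1]: near 1 when the color
    difference d is small relative to the local standard deviation
    sigma, decaying toward 0 for outliers. s scales the tolerance
    and t applies a soft threshold; both are illustrative."""
    ratio = d / (sigma + 1e-6)  # small epsilon guards against sigma == 0
    return float(np.clip(s * np.exp(-ratio**2) - t, 0.0, 1.0))

# A difference far below the local noise level is trusted...
trusted = robustness_weight(d=0.01, sigma=0.2)
# ...while a large difference is rejected outright.
rejected = robustness_weight(d=1.0, sigma=0.2)
```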
Additional or alternative techniques may also be included in the super-resolution computations 302. For example, the super-resolution computations 302 may include analyzing downscaling operations to find regions of an image that cannot be aligned correctly. As another example, the super-resolution computations 302 may include detecting characteristic patterns to mitigate misalignment artifacts. In such an instance, signal gradient pattern analysis may detect artifacts such as “checkerboard” artifacts.
The super-resolution computations 302 are effective to estimate, for each of the multiple frames 202 (e.g., for frames 204, 206, and 208), the contribution of pixels to color channels associated with respective color planes, e.g., a first color plane 304 (which may be a red color plane associated with a red color channel), a second color plane 306 (which may be a blue color plane associated with a blue color channel), and a third color plane 308 (which may be a green color plane associated with a green color channel). The super-resolution computations 302 treat the pixels as separate signals and accumulate the color planes simultaneously.
Also, and as illustrated in
In mathematical relationship (3), x and y represent pixel coordinates, the sum Σn operates over (or is a sum of) contributing frames, the sum Σi is a sum of samples within a local neighborhood, cn,i represents a value of a Bayer pixel at a given frame n and sample i, wn,i represents a local sample weight, and {circumflex over (R)}n represents a local robustness.
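The equation for relationship (3) is not reproduced in this text; a normalized weighted accumulation consistent with the symbols defined above would be:

```latex
C(x, y) =
\frac{\sum_n \sum_i c_{n,i} \cdot w_{n,i} \cdot \hat{R}_n}
     {\sum_n \sum_i w_{n,i} \cdot \hat{R}_n}
\tag{3}
```

where the denominator normalizes the contributions so that the sample weights and robustness scores act as a weighted average rather than a raw sum.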
With respect to
The elements described by
Example methods 500 and 600 are described with reference to
At block 502, the user device 102 (e.g., the image sensor 112) captures, in a burst sequence, multiple frames 202 of an image of a scene, where the multiple frames 202 have respective, relative sub-pixel offsets of the image due to a motion of the user device during the capturing of the multiple frames. In some instances, the motion of the user device may correspond to a natural handheld-motion 110 made by a user of the user device. In other instances, the motion of the user device may correspond to a displacement induced by a vibrating mechanism that is in contact with, or part of, the user device 102.
At block 504, the user device (e.g., the processor 116 executing the instructions of the super-resolution manager 120) performs super-resolution computations 302. Performing the super-resolution computations 302 uses the captured, multiple frames to compute Gaussian radial basis function kernels and compute a robustness model. Computing the Gaussian radial basis function kernels may include multiple aspects, inclusive of filtering pixel signals from each of the multiple frames to generate color-specific image planes for respective color channels and aligning the color-specific image planes to a reference frame. In addition to corresponding to red, green, and blue color channels, the color-specific image planes may also correspond to achromatic channels (e.g., shades of black, white, and grey) or other color channels, such as cyan, violet, and so on.
Computing the Gaussian radial basis function kernels may also include computing a kernel covariance matrix (e.g., mathematical relationship (1)) based on analyzing local gradient structure tensors (e.g., mathematical relationship (2)) generated by aligning the color-specific image planes to the reference frame. In such instances, the local gradient structure tensors may correspond to edges, corners, or textured areas of content included in the reference frame. Furthermore, and also as part of block 504, computing the robustness model may include using a statistical neighborhood model to compute, for each pixel, a color mean and spatial standard deviation.
At block 506, the user device 102 (e.g., the processor 116 executing the instructions of the super-resolution manager 120) accumulates color planes based on the super-resolution computations 302 of block 504. Accumulating the color planes may include the user device performing computations (e.g., mathematical relationship (3)) that, for each color channel, normalize pixel contributions (e.g., normalize contributions of each pixel, of the multiple frames captured at block 502, to each color channel).
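The normalization of pixel contributions at block 506 can be sketched for a single output pixel of one color plane as follows; the function and its example inputs are hypothetical, illustrating a robustness-scaled weighted average rather than the disclosure's exact computation.

```python
def merge_samples(values, weights, robustness):
    """Normalized accumulation for one output pixel of one color plane:
    a weighted average of sample values, with each sample's kernel
    weight scaled by its frame's robustness score."""
    num = sum(c * w * r for c, w, r in zip(values, weights, robustness))
    den = sum(w * r for w, r in zip(weights, robustness))
    return num / den if den > 0 else 0.0

# Three contributing samples; the zero-robustness outlier does not count.
value = merge_samples(values=[1.0, 1.0, 5.0],
                      weights=[1.0, 1.0, 1.0],
                      robustness=[1.0, 1.0, 0.0])
```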
At block 508, the user device 102 combines the accumulated color planes to create the super-resolution image 122 of the scene. At block 510, the user device 102 (e.g., the display 114) renders the super-resolution image 122 of the scene.
Although the example method 500 of
At block 602, the user device 102 (e.g., the processor 116 executing the instructions of the super-resolution manager 120) computes Gaussian Radial Basis Function (RBF) kernels. Computing the Gaussian RBF kernels includes several aspects, including selecting a reference frame and computing a covariance matrix.
Computing the kernel covariance matrix (e.g., mathematical relationship (1)) is based on analyzing local gradient structure tensors (e.g., mathematical relationship (2)), where the local gradient structure tensors correspond to edges, corners, or textured areas of content included in the reference frame.
In some instances, at block 602, the multiple frames of the image of the scene 202 may have respective, relative sub-pixel offsets of the image across the multiple frames due to a motion of an image-capture device during the capture of the multiple frames 202. Furthermore, and in some instances, the motion of the image-capture device may correspond to a motion made by a user of the image-capture device. The motion, in some instances, may correspond to a natural handheld motion.
At block 604, the user device 102 computes a robustness model. Computing the robustness model includes using a statistical neighborhood model to compute a color mean and a spatial standard deviation.
At block 606, the user device 102 determines contributions of pixels to color planes. The user device 102 may base the determination on the computed Gaussian radial basis function kernels and the computed robustness model.
At block 608, the user device 102 accumulates the color planes. Accumulating the color planes may include normalization computations (e.g., using mathematical relationship (3)).
At block 610, the user device 102 provides, to the apparatus, the color planes. In some instances, providing the color planes to the apparatus includes providing the color planes to the apparatus for storage (e.g., storage in a computer-readable media of the apparatus). In other instances, providing the color planes to the apparatus includes providing the color planes to the apparatus for combining the color planes and rendering the color planes.
Although systems and methods of super-resolution using handheld motion applied to a user device have been described in language specific to features and/or methods, it is to be understood that the subject of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as example ways in which super-resolution using handheld motion applied to a user device can be implemented.
Variations to systems and methods of super-resolution using handheld motion applied to a user device, as described, are many. As a first example variation, super-resolution computations may generate (and accumulate) depth maps or other planes that are not associated with a specific color. As a second example variation, super-resolution computations may rely on sampling patterns that are other than Gaussian RBF sampling patterns. As a third example variation, super-resolution computations may rely on offsets corresponding to displacement fields instead of sub-pixel offsets. And, as a fourth example variation, super resolution-computations may rely on motion that is not induced through handheld movement (e.g., small motions of an image may generate necessary sub-pixel offsets or displacements to perform the super-resolution computations).
Further to the descriptions above, a user may be provided with controls allowing the user to make an election as to both if and when systems, programs or features described herein may enable collection of user information (e.g., images captured by a user, super-resolution images computed by a system, information about a user's social network, social actions or activities, profession, a user's preferences, or a user's current location), and if the user is sent content or communications from a server. In addition, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where location information is obtained (such as to a city, ZIP code, or state level), so that a particular location of a user cannot be determined. Thus, the user may have control over what information is collected about the user, how that information is used, and what information is provided to the user.
In the following, several examples are described.
Example 1: A method used to render a super-resolution image of a scene, the method performed by a user device and comprising: capturing, in a burst sequence, multiple frames of an image of a scene, the multiple frames having respective, relative sub-pixel offsets of the image due to a motion of the user device during the capturing of the multiple frames; performing super-resolution computations using the captured, multiple frames, the super-resolution computations including: computing Gaussian radial basis function kernels; and computing a robustness model; accumulating, based on the super-resolution computations, color planes; combining the accumulated color planes to create the super-resolution image of the scene; and rendering the super-resolution image of the scene.
Example 2: The method as recited by example 1, wherein performing the super-resolution computations determines contributions of pixels of the multiple frames of the image of the scene to the color planes.
Example 3: The method as recited by example 1 or 2, wherein the motion of the user device corresponds to a natural handheld-motion made by a user of the user device during the burst sequence.
Example 4: The method as recited by any of examples 1-3, wherein performing the super-resolution computations includes filtering pixel signals from each of the multiple frames to generate color-specific image planes for respective color channels.
Example 5: The method as recited by example 4, wherein performing the super-resolution computations includes aligning the color-specific image planes to a reference frame.
Example 6: The method as recited by any of examples 4-5, wherein the respective color channels correspond to a red color channel, a blue color channel, and a green color channel.
Example 7: The method as recited by any of examples 5-6, wherein computing the Gaussian radial basis function kernels includes computing a kernel covariance matrix based on analyzing local gradient structure tensors of color-specific image planes aligned to the reference frame.
Example 8: The method as recited by example 7, wherein the local gradient structure tensors correspond to edges, corners, or textured areas of content included in the reference frame.
Example 9: The method as recited by any of examples 1-8, wherein computing the robustness model uses a statistical neighborhood model to compute a spatial color standard deviation or a mean difference.
Example 10: A method of providing color planes to an apparatus, the method comprising: computing Gaussian radial basis function kernels, wherein computing the Gaussian radial basis function kernels includes: selecting a reference frame; and computing a kernel covariance matrix based on analyzing local gradient structure tensors, the local gradient structure tensors corresponding to edges, corners, or textured areas of content included in the reference frame; and computing a robustness model, wherein computing the robustness model includes using a statistical neighborhood model to compute a color mean and spatial standard deviation; determining, based on the computed Gaussian radial basis function kernels and the computed robustness model, contributions of pixels to color planes; accumulating the color planes; and providing, to the apparatus, the accumulated color planes.
Example 11: The method as recited by example 10, wherein providing the accumulated color planes includes providing the accumulated color planes to the apparatus to store.
Example 12: The method as recited by example 10 or 11, wherein providing the accumulated color planes to the apparatus includes providing the accumulated color planes to the apparatus to combine and render a super-resolution image.
Example 13: A user device, the user device comprising: one or more image sensors; one or more processors; a display; and a computer-readable medium comprising instructions of a super-resolution manager application that, when executed by the one or more processors, directs the user device to: capture, in a burst sequence using the one or more image sensors, multiple frames of an image of a scene, the multiple frames having respective, relative offsets of the image across the multiple frames; perform, using the one or more processors, super-resolution computations using the captured, multiple frames of the image of the scene; accumulate, using the one or more processors and based on the super-resolution computations, planes; combine, using the one or more processors, the accumulated planes to create a super-resolution image of the scene; and render, using the display, the super-resolution image of the scene.
Example 14: The user device as recited by example 13, wherein the one or more processors are multiple, different processors and include a central processing unit, an image processing unit, a digital signal processor, or a graphics processing unit.
Example 15: The user device as recited by example 13 or 14, wherein the one or more processors that are the multiple, different processors perform the super-resolution computations using pipeline-processing.
Example 16: A system comprising a means for performing any of the methods as recited by examples 1-9.
Example 17: A system comprising a means for performing any of the methods as recited by examples 10-12.
Example 19: A computer-readable storage medium including instructions that, when executed, configure a processor to perform any of the methods as recited by any of examples 1-9.
Example 20: A computer-readable storage medium including instructions that, when executed, configure a processor to perform any of the methods as recited by any of examples 10-12.
Example 21: A user device configured to perform the method as recited by any of examples 1-9.
Example 22: A user device configured to perform the method as recited by any of examples 10-12.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2019/045342 | 8/6/2019 | WO | 00
Number | Date | Country
---|---|---
62716921 | Aug 2018 | US