The present disclosure relates to image processing and, more particularly, to methods, apparatuses, and systems for providing non-local means image processing of multiple sequential images gathered by a camera array.
Various techniques have been developed for performing filtering and denoising of a single image. One such technique is bilateral filtering, in which a non-linear, edge-preserving, noise-reducing smoothing filter is applied to the image. The intensity value at each pixel in this single image is replaced by a weighted average of intensity values from nearby pixels. The weights depend not only on the Euclidean distances between pairs of pixels, but also on radiometric differences such as color intensity and depth distance. The weight may be based on a Gaussian distribution. Bilateral filtering preserves sharp edges in the image by systematically looping through each pixel in the image and adjusting the weights of one or more adjacent pixels accordingly.
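By way of illustration only, the following minimal NumPy sketch shows the bilateral filtering technique described above; the function name and parameter values are illustrative assumptions rather than part of this disclosure:

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_spatial=2.0, sigma_range=0.1):
    """Edge-preserving smoothing of a float greyscale image in [0, 1]:
    each output pixel is a weighted average of window pixels, weighted by
    spatial distance (Gaussian) and intensity difference (Gaussian)."""
    h, w = img.shape
    out = np.empty_like(img)
    # Spatial Gaussian over the (2*radius+1)^2 window, computed once.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial_w = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_spatial ** 2))
    padded = np.pad(img, radius, mode="reflect")
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Radiometric (range) weight penalizes intensity differences.
            range_w = np.exp(-(window - img[y, x]) ** 2
                             / (2.0 * sigma_range ** 2))
            weights = spatial_w * range_w
            out[y, x] = np.sum(weights * window) / np.sum(weights)
    return out
```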
Another image filtering and denoising technique is non-local means. Unlike local means filters, which take the mean value of a group of pixels surrounding a target pixel to smooth an image, non-local means filtering takes a mean value of all pixels in the image, weighted by how similar each pixel is to the target pixel. This approach provides a filtered image having greater post-filtering clarity and less loss of detail relative to an image that has been processed using a local means algorithm.
In at least some embodiments, the present invention relates to a method that includes gathering a plurality of sequential image sets from a scene. The plurality of sequential image sets includes at least a main sequential image set and one or more additional sequential image sets. The main sequential image set includes a plurality of main images comprising a main central image that is at or proximate to a temporal midpoint of the set, and one or more remaining main images. The one or more additional sequential image sets each include a plurality of additional images comprising an additional central image that is at or proximate to a temporal midpoint of the set, and one or more additional remaining images. A first patch is defined as including a first plurality of contiguous pixels in the main central image that surrounds or adjoins a first target pixel of the main central image. A plurality of second patches is defined as including a set of positions corresponding to the first patch within the one or more remaining main images and the plurality of additional images, wherein a single frame of reference is applied to each of the plurality of main images and each of the plurality of additional images. The single frame of reference includes an x-axis and a y-axis, such that a set of x and y coordinates specifying the first target pixel within the main central image also specifies a substantially identical position within each of the remaining main images and each of the plurality of additional images. A set of patch similarity weights is determined by comparing the first patch to each of the plurality of second patches. The set of patch similarity weights is applied to the first target pixel to provide a filtered pixel value for the first target pixel. The foregoing procedure is repeated for at least one additional target pixel of the main central image until a set of filtered pixel values is provided for substantially all pixels of the main central image. The set of filtered pixel values for the first target pixel and the additional target pixels may be used to provide a filtered main central image having an enhanced signal-to-noise ratio.
Additionally, in at least some embodiments, the present invention relates to an apparatus that includes a camera array comprising a main camera and one or more additional cameras, wherein the main camera is configured for gathering a main sequential image set from a scene, and the one or more additional cameras are configured for gathering one or more additional sequential image sets from the scene. The main sequential image set includes a plurality of main images comprising a main central image that is at or proximate to a temporal midpoint of the set, and one or more remaining main images. The one or more additional sequential image sets each include a plurality of additional images comprising an additional central image that is at or proximate to a temporal midpoint of the set, and one or more additional remaining images. The apparatus further includes a processing mechanism operatively coupled to the camera array. The processing mechanism is configured for defining a first patch as including a first plurality of contiguous pixels in the main central image that surrounds or adjoins a first target pixel of the main central image. A plurality of second patches is defined as including a set of positions corresponding to the first patch within the one or more remaining main images and the plurality of additional images, wherein a single frame of reference is applied to each of the plurality of main images and each of the plurality of additional images. The single frame of reference includes an x-axis and a y-axis, such that a set of x and y coordinates specifying the first target pixel within the main central image also specifies a substantially identical position within each of the remaining main images and each of the plurality of additional images. A set of patch similarity weights is determined by comparing the first patch to each of the plurality of second patches. The set of patch similarity weights is applied to the first target pixel to provide a filtered pixel value for the first target pixel. The foregoing procedure is repeated for at least one additional target pixel of the main central image until a set of filtered pixel values is provided for substantially all pixels of the main central image. The set of filtered pixel values for the first target pixel and the additional target pixels may be used to provide a filtered main central image having an enhanced signal-to-noise ratio.
Moreover, in at least some embodiments, the present invention relates to a non-transitory computer readable memory encoded with a computer program comprising computer readable instructions recorded thereon for execution of a method that includes gathering a plurality of sequential image sets from a scene. The plurality of sequential image sets includes at least a main sequential image set and one or more additional sequential image sets. The main sequential image set includes a plurality of main images comprising a main central image that is at or proximate to a temporal midpoint of the set, and one or more remaining main images. The one or more additional sequential image sets each include a plurality of additional images comprising an additional central image that is at or proximate to a temporal midpoint of the set, and one or more additional remaining images. A first patch is defined as including a first plurality of contiguous pixels in the main central image that surrounds or adjoins a first target pixel of the main central image. A plurality of second patches is defined as including a set of positions corresponding to the first patch within the one or more remaining main images and the plurality of additional images, wherein a single frame of reference is applied to each of the plurality of main images and each of the plurality of additional images. The single frame of reference includes an x-axis and a y-axis, such that a set of x and y coordinates specifying the first target pixel within the main central image also specifies a substantially identical position within each of the remaining main images and each of the plurality of additional images. A set of patch similarity weights is determined by comparing the first patch to each of the plurality of second patches. The set of patch similarity weights is applied to the first target pixel to provide a filtered pixel value for the first target pixel. The foregoing procedure is repeated for at least one additional target pixel of the main central image until a set of filtered pixel values is provided for substantially all pixels of the main central image. The set of filtered pixel values for the first target pixel and the additional target pixels may be used to provide a filtered main central image having an enhanced signal-to-noise ratio.
If a scene is photographed continuously or repeatedly by a camera array to gather a plurality of images, then a simple windowed averaging of the gathered images over time can theoretically achieve a higher signal-to-noise ratio (SNR) than would be possible for any single gathered image. However, practical application of this approach is limited by at least two factors. First, many scenes are not perfectly still. One or more objects or subjects may be in motion. Second, even if all of the objects and subjects in a given scene were to remain perfectly still, the camera array itself may be in motion. In situations where a scene includes one or more moving objects, or where the camera array is in motion, simple averaging does not perform well and may produce blurry images.
The operational sequence of the method is illustrated in the accompanying drawing figures as a series of numbered blocks (including blocks 103 and 113) that carry out the gathering, patch definition, weighting, and filtering steps summarized above. In the course of this procedure, a patch difference measure wpq between the first target pixel and an accounting pixel is calculated as:
wpq=|Vp−Vq|
where subscript p denotes the first target pixel (x1, y1) and subscript q denotes an accounting pixel (xq, yq). Accounting pixel (xq, yq) could, but need not, be the same pixel as second target pixel (x2, y2). Vp is a vector representing the group of contiguous pixels comprising first patch 510 and centered at first target pixel (x1, y1). Vq is a vector representing a group of contiguous pixels comprising a further patch centered at second target pixel (x2, y2). This further patch could, but need not, be identical to any of the plurality of second patches 530. The norm of the vector difference Vp−Vq, typically an L1 or L2 norm, provides a distance measure between the patches for first target pixel (x1, y1) and accounting pixel (xq, yq). Typically, some function f() of the norm is used as the weight. A non-local means (NLM) algorithm may illustratively be used to supply this function f().
Thus, the filtered pixel output value, denoted as Ip, is given by the normalized weighted average Ip=(Σq f(wpq)·Iq)/(Σq f(wpq)), where Iq denotes the intensity value of accounting pixel (xq, yq) and the summations are taken over the plurality of accounting pixels.
Each of a plurality of accounting pixels (xq, yq) may be selected from a contiguous patch area that is proximate to first target pixel (x1, y1). This patch area may be defined in terms of a window having a window size, where the window is centered at the first target pixel (x1, y1). At one extreme, the window size can be specified so as to cover the entire image, but such an implementation may render the filter too complex to implement. Vectors Vp and Vq typically represent square patches from approximately 3×3 to 11×11 pixels in size, but these patches can be larger or smaller. For purposes of illustration, when the size of the patch is set to a single pixel, the filtering result of the non-local means approach is close to the result that would be achieved using a bilateral filter.
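As an illustrative sketch (not a definitive implementation), the following Python code computes the filtered value Ip for a single target pixel using the wpq measure above, assuming an L2 norm and a Gaussian weighting function f(); the function name and parameter defaults are hypothetical:

```python
import numpy as np

def nlm_filter_pixel(img, y, x, patch=7, window=10, h=0.1):
    """Non-local means output Ip for target pixel p = (y, x): a weighted
    average over accounting pixels q in a search window, with weights
    f(wpq) = exp(-|Vp - Vq|^2 / h^2) derived from patch similarity."""
    r = patch // 2
    padded = np.pad(img, r, mode="reflect")

    def patch_vec(py, px):
        # Vector V of the patch x patch neighborhood centered at (py, px).
        return padded[py:py + patch, px:px + patch].ravel()

    v_p = patch_vec(y, x)
    num = den = 0.0
    for qy in range(max(0, y - window), min(img.shape[0], y + window + 1)):
        for qx in range(max(0, x - window), min(img.shape[1], x + window + 1)):
            d2 = np.sum((v_p - patch_vec(qy, qx)) ** 2)  # squared L2 norm, wpq^2
            f_w = np.exp(-d2 / (h * h))                  # weight f(wpq)
            num += f_w * img[qy, qx]
            den += f_w
    return num / den
```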
Returning to the case of sequential images, the set of pixels used to determine the weighted average is extended from pixels of a window around first target pixel (x1, y1) in the main central image 503 to pixels of corresponding windows, centered at the same (x1, y1) coordinates, in each of the other images of the set of sequential images.
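A minimal sketch of this multi-image extension, assuming the images are already expressed in the single frame of reference described above (all names are illustrative):

```python
import numpy as np

def multi_image_nlm_pixel(stack, ref, y, x, patch=7, window=10, h=0.1):
    """Multi-image NLM for target pixel (y, x) of the main central image
    stack[ref]. Because all images share a single frame of reference, the
    same coordinates index corresponding positions in every image, so
    accounting pixels are drawn from the search window of every image."""
    r = patch // 2
    padded = [np.pad(im, r, mode="reflect") for im in stack]
    v_p = padded[ref][y:y + patch, x:x + patch].ravel()
    num = den = 0.0
    for k, im in enumerate(stack):  # remaining main and additional images alike
        for qy in range(max(0, y - window), min(im.shape[0], y + window + 1)):
            for qx in range(max(0, x - window), min(im.shape[1], x + window + 1)):
                v_q = padded[k][qy:qy + patch, qx:qx + patch].ravel()
                f_w = np.exp(-np.sum((v_p - v_q) ** 2) / (h * h))
                num += f_w * im[qy, qx]
                den += f_w
    return num / den
```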
As a practical matter, it may be difficult or impossible for a camera array to remain perfectly still during the entire time interval over which the set of sequential images is gathered. This factor may cause individual images of the set of sequential images to be offset relative to one another, thereby decreasing the overall quality of filtering. One technique for addressing this issue is to determine a position alignment offset for each image of the set of sequential images prior to determining where a window center corresponding to the coordinates of pixel p in the central image shall be placed on the other, non-central images of the set. Although any of various techniques could be used to determine the offset, illustratively the offset is determined using feature point detection to identify one or more pixels corresponding to a given feature in each of a plurality of images of the set of sequential images. By identifying the location of the identified feature in each of the plurality of images, an appropriate offset is calculated for each of the plurality of images.
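The disclosure leaves the choice of offset-estimation technique open. As one possible sketch, using phase correlation rather than the feature-point method mentioned above, a global translation offset could be estimated as follows:

```python
import numpy as np

def translation_offset(ref, img):
    """Estimate the integer (dy, dx) shift that, applied to `img`, best
    aligns it to `ref`, via phase correlation (one possible stand-in for
    the feature-point-based offset estimation described above)."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(img)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12          # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak locations past the midpoint to negative shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx
```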
The aforementioned operational sequence may be performed using a camera array that includes at least a first image sensor array and a second image sensor array.
Pursuant to one set of illustrative embodiments, the first and second image sensor arrays each include a predetermined filter pattern of color pixels. For purposes of illustration, the first image sensor array may be implemented using a first Bayer camera array, and the second image sensor array may be implemented using a second Bayer camera array. The first Bayer camera array includes a first CCD having a predetermined color filter pattern, and the second Bayer camera array includes a second CCD having a predetermined color filter pattern substantially identical to that of the first CCD. Each of a plurality of squares on the first and second image sensor arrays represents a corresponding pixel. Bayer camera arrays incorporate a filter mosaic in the form of a color filter array (CFA) in which red, green, and blue color filtering elements are arranged in a predetermined repeating pattern on a square grid of photosensors. Each 2×2 block of the repeating pattern has two diagonally opposed green (G) filtering elements, with the remaining two positions occupied by one red (R) element and one blue (B) element. The pattern therefore includes 50% green filtering elements, 25% red filtering elements, and 25% blue filtering elements, and is referred to as an RGBG, GRGB, or RGGB array. These arrays are used in single-chip CCD sensors that are incorporated into digital cameras, camcorders, scanners, smartphones, and mobile devices to create a color image.
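For illustration, a short sketch of separating an RGGB mosaic into its color planes, under the assumption of a floating-point raw image with even dimensions (the function name is hypothetical):

```python
import numpy as np

def split_rggb(mosaic):
    """Split a raw RGGB Bayer mosaic into color planes. Each 2x2 block
    follows the pattern  R G
                         G B  : 50% green, 25% red, 25% blue sites."""
    r = mosaic[0::2, 0::2]    # red sites
    g1 = mosaic[0::2, 1::2]   # green sites on red rows
    g2 = mosaic[1::2, 0::2]   # green sites on blue rows
    b = mosaic[1::2, 1::2]    # blue sites
    return r, (g1 + g2) / 2.0, b  # average the two green planes
```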
Alternatively or additionally, any of a number of other types of image sensor arrays may be employed in the camera array configurations described herein.
In at least some sets of embodiments, a sequence of images is gathered using a plurality of Bayer camera arrays including at least a first Bayer camera array and a second Bayer camera array. Illustratively, the first Bayer camera array and the second Bayer camera array operate in close synchronization, such that each of the Bayer camera arrays gathers a respective image of the scene for the sequence at substantially the same instant in time. Next, a spatial alignment procedure is employed to align a first plurality of images that were gathered at a plurality of different moments in time by the first Bayer camera array, and also to align a second plurality of images that were gathered at the plurality of different moments in time by the second Bayer camera array. The spatial alignment procedure also aligns a third plurality of images that were gathered by either or both of the first Bayer camera array and the second Bayer camera array from different perspectives.
After this spatial alignment procedure is performed, the non-local means procedure described previously is applied to the aligned images.
Pursuant to another set of illustrative embodiments, the first image sensor array is implemented using a clear image sensor that does not include color pixels, and the second image sensor array includes a predetermined filter pattern of color pixels. Thus, the first image sensor array produces a greyscale (W) output, and the second image sensor array produces a red-green-blue (RGB) output. In order to process images gathered by the first and second image sensor arrays together, the RGB output of the second image sensor array may be converted to a standard Y′UV color space or a standard L*a*b color space (or, alternatively, a Luv or other color space). The Y′UV model defines a color space in terms of one luma (Y′) component and two chrominance (UV) components. When performing denoising, the intensity component (Y′ or L) is employed together with the greyscale output (W). Y′ is a linear combination of R, G, and B. There are standard conversion formulas for deriving this linear relationship, such as Y′=0.299R+0.587G+0.114B, U=0.492(B−Y′), and V=0.877(R−Y′). However, for matching color images with greyscale images, this linear combination will have different weight coefficients than a standard RGB to Y′ conversion.
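A sketch of the quoted conversion formulas, assuming an (H, W, 3) floating-point RGB image (the function name is illustrative):

```python
import numpy as np

def rgb_to_yuv(rgb):
    """Convert an (H, W, 3) RGB image to Y'UV using the standard
    conversion formulas quoted above (BT.601 luma weights)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return np.stack([y, u, v], axis=-1)
```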
When denoising a main central clear image together with a sequence of clear images from the first sensor and Bayer images from the second sensor, the calculation of optimal patch similarity weights wpq for clear target pixels should account for the different noise levels in the clear and Bayer luma images. Because clear pixels have a higher signal-to-noise ratio, their patch similarity should be accounted for with a correspondingly larger weight. The optimal multiplier relating clear pixel weight to Bayer luma pixel weight depends on the noise standard deviations of the clear and Bayer luma images and can be estimated from them. As discussed further herein, the noise standard deviation depends on pixel intensity, and the multiplier should therefore account for pixel intensity as well.
Joint NLM-based denoising of clear channel values W and color (Bayer) luma Y′ as described above can work well if, for matching points, W and Y′ have very similar values. Due to the nature of W and Y′, in most cases these values will not initially be similar or closely matching. Therefore, a special procedure can be applied to make these values match more closely in terms of absolute intensity values. Any of various approaches may be employed to achieve a better match. Pursuant to a first approach, a luma conversion formula such as Y′=0.299R+0.587G+0.114B can be re-optimized for the image with custom coefficients, using linear regression or any other method, to provide better matching. The optimization can be performed on preselected image areas, such as flat areas or special feature points, and then applied to the whole image and used further for joint denoising.
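A minimal sketch of this first approach, assuming samples have already been collected from preselected areas and fitting the custom coefficients by ordinary least squares (one of the methods the text permits; all names are hypothetical):

```python
import numpy as np

def fit_luma_coefficients(rgb_samples, w_samples):
    """Fit custom luma coefficients so that Y' = cr*R + cg*G + cb*B best
    matches the clear channel W on preselected image areas."""
    A = rgb_samples.reshape(-1, 3)  # (N, 3) matrix of R, G, B values
    coeffs, *_ = np.linalg.lstsq(A, w_samples.ravel(), rcond=None)
    return coeffs                   # (cr, cg, cb)
```

The fitted coefficients can then be applied over the whole image, e.g. `y_custom = rgb_image @ coeffs`, before joint denoising.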
A second approach may be employed to make the absolute intensity values for W and Y′ match more closely. A good measure of similarity can be achieved by creating a W-to-Y′ relation map over the whole Y′ image, using a disparity map to establish pixel correspondence. A relation factor K=W(p)/Y′(p) is calculated for every pixel and can then be filtered, for example, with a median filter, a bilateral filter, an edge-preserving filter, or another generic filter. The filtered relation map K is then applied to the whole Y′ image, producing a Y′ image that matches the clear image W very closely. Using this approach, one can obtain clear and color (Bayer) images that can be efficiently filtered together using the foregoing multi-image NLM approach. Other approaches are possible for making the luma derived from color (Bayer) images more similar to the clear images in terms of absolute intensity values, so that the clear and color images will work well together in the multi-image NLM approach.
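A sketch of this second approach, assuming W and Y′ have already been brought into pixel correspondence (e.g., via the disparity map) and choosing a median filter for the relation map, one of the filter options named above:

```python
import numpy as np
from scipy.ndimage import median_filter

def match_luma_to_clear(w, y_luma, filter_size=5, eps=1e-6):
    """Per-pixel relation factor K = W(p) / Y'(p), smoothed with a median
    filter and applied back to Y' so that Y' closely matches W."""
    k = w / (y_luma + eps)                  # raw relation map
    k = median_filter(k, size=filter_size)  # suppress noisy outliers in K
    return k * y_luma                       # Y' rescaled to match W
```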
One additional approach may be utilized for making the greyscale and color patches more similar, such that the absolute intensity values for W and Y′ match more closely. A mean offset value is calculated between a central patch surrounding (or adjoining) the target pixel of the main clear image and the luma Y′ component of a central patch of the color (Bayer) image. This mean offset value is then added in real time (i.e., on the fly) to all pixels within a window in the color (Bayer) luma image, thus making the patch values more similar to the central patch of the clear image.
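A sketch of this mean-offset approach, assuming the target pixel lies at least a patch radius away from the image border (all names and sizes are illustrative):

```python
import numpy as np

def offset_matched_window(clear, luma, y, x, patch=7, window=10):
    """Add the mean offset between the central clear patch and the central
    Bayer-luma patch to all luma pixels in the search window, making the
    luma patch values comparable to the central clear patch."""
    r = patch // 2
    offset = np.mean(clear[y - r:y + r + 1, x - r:x + r + 1]
                     - luma[y - r:y + r + 1, x - r:x + r + 1])
    win = luma[max(0, y - window):y + window + 1,
               max(0, x - window):x + window + 1]
    return win + offset  # offset applied on the fly to the luma window
```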
Notwithstanding the above discussion, the present disclosure is also intended to encompass additional manners or procedures of calculating intensity offset values between pixels of clear images and color (Bayer) images so as to provide more precise estimations of offsets. For example, it should be recognized that the patch size over which an intensity offset is estimated can be different from the patch size used for NLM's patch similarity weight calculations. Further for example, larger patch sizes can be useful for more noisy images, and particularly can allow for more accurate estimation of intensity offset in such circumstances. Also, intensity offset estimation can be performed through the use of a difference of mean pixel intensity values or through the use of a mean value of pixel intensity differences of patches.
Additionally for example, another manner of estimating intensity offset involves the use of histograms of patches' pixel intensity values. An offset can be determined by searching for an offset value that makes the color (Bayer) patch histogram most similar to the corresponding histogram of a clear patch, where that value can then be (or be used to further calculate) the desired intensity offset. Similarity of histograms at a certain offset can be calculated in any of a number of manners, for example, as a sum of absolute or squared differences or some other metric. Before applying these metrics, the histograms can be smoothed or filtered in any of a variety of manners to make the comparison more suitable for the circumstance. A similarity metric can also account for the positions of histogram maximum and minimum values or other histogram peculiarities. Given that different intensities can have different offsets, histogram similarity matching can be done not merely by optimizing a single parameter (such as offset), but by using additional parameters as well. Such additional parameters can be coefficients of a linear or higher-order polynomial fit of the offset-versus-intensity dependence. Further, once fit coefficients are estimated for a central clear image patch and the corresponding Bayer luma patch, those coefficients can be used for more precise luma offset estimation of neighboring Bayer luma pixels (against corresponding clear pixels) for further Bayer patch similarity weight estimation in enhanced joint NLM denoising. Thus, higher quality noise reduction for a main central clear image can be achieved.
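A sketch of the single-parameter (offset-only) histogram search, using a sum of absolute differences as the similarity metric; the search range, step count, and bin count are illustrative assumptions:

```python
import numpy as np

def histogram_offset(clear_patch, luma_patch, search=0.2, steps=81, bins=64):
    """Estimate the intensity offset that makes the Bayer-luma patch
    histogram most similar to the clear patch histogram, by minimizing
    the sum of absolute histogram differences over candidate offsets."""
    lo = min(clear_patch.min(), luma_patch.min()) - search
    hi = max(clear_patch.max(), luma_patch.max()) + search
    ref_hist, _ = np.histogram(clear_patch, bins=bins, range=(lo, hi))
    best_offset, best_cost = 0.0, np.inf
    for offset in np.linspace(-search, search, steps):
        h, _ = np.histogram(luma_patch + offset, bins=bins, range=(lo, hi))
        cost = np.sum(np.abs(h - ref_hist))  # sum of absolute differences
        if cost < best_cost:
            best_cost, best_offset = cost, offset
    return best_offset
```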
In the set of illustrative approaches described previously, the weights used to account for clear pixels and color (Bayer) pixels are fixed, and are typically determined based on the standard deviation of noise in the clear and color (Bayer) images. This determination is typically based upon an assumption that the noise possesses the statistics of shot noise across all light levels (or pixel intensity levels). However, in low-light situations, read noise prevails in some of the darker regions of an image. Depending on the exposure time of each image in an image sequence, the balance between read noise and shot noise may be different for each of the images in the sequence. Therefore, an optimal weighting of the set of pixels used for weighting or modifying a new central pixel should account for a more complex dependency of the noise standard deviation across a plurality of different pixel intensity levels. For each of a plurality of different pixel intensity levels, a set of optimal weights may be refined and estimated for optimal noise performance (where the weights can also be considered to be RGB to luma conversion coefficients). This refinement may involve offline profiling of sensor read noise and shot noise characteristics, with the profiled characteristics then applied at run time. Given the profile, a known gain, and a known exposure time, the noise standard deviation at a given intensity or local-area average intensity may be estimated, thus enabling derivation of custom weights for accounting for clear and color (Bayer) luma pixels for a given central pixel or patch.
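A sketch of how profiled read and shot noise characteristics might yield intensity-dependent channel weights; the noise model sigma(I) = sqrt(read_std**2 + shot_gain*I) and all parameter names are illustrative assumptions rather than the disclosed profiling procedure:

```python
import numpy as np

def noise_std(intensity, read_std, shot_gain):
    """Illustrative profiled noise model: read noise dominates in dark
    regions, shot noise grows with intensity (assumes read_std > 0)."""
    return np.sqrt(read_std ** 2 + shot_gain * np.asarray(intensity))

def channel_weights(intensity, read_std_clear, gain_clear,
                    read_std_luma, gain_luma):
    """Inverse-variance weights for combining clear and Bayer-luma
    evidence at a given intensity level: the lower-noise clear channel
    receives a correspondingly larger weight."""
    var_c = noise_std(intensity, read_std_clear, gain_clear) ** 2
    var_l = noise_std(intensity, read_std_luma, gain_luma) ** 2
    w_c, w_l = 1.0 / var_c, 1.0 / var_l
    total = w_c + w_l
    return w_c / total, w_l / total
```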
Any of the first, second, or additional approaches outlined in the foregoing paragraphs may be employed in the context of a first sensor array that includes a clear image sensor, and a second sensor array that includes a predetermined pattern of color pixels. The first sensor array outputs a greyscale W value having a first absolute intensity level. The second sensor array outputs a luma Y′ value having a second absolute intensity level. The approaches described herein improve the match between the first and second absolute intensity levels, such that the first absolute intensity level is closer to the second absolute intensity level.
In situations where the filtered pixel value at block 111 is determined using patches drawn from both clear and color (Bayer) images, the intensity matching procedures described above may first be applied so that patch similarity is evaluated on a consistent intensity scale.
Optionally, more than one clear image sensor array, or more than one color image sensor array, or any of various combinations thereof, may be provided. Using a clear image sensor and a color image sensor is similar to the previous example of using two color image sensors. However, clear image sensor arrays may provide pixel outputs having a higher signal-to-noise ratio than color image sensor arrays. The patch similarity weights at block 109 may therefore be adjusted to give correspondingly larger weight to patches drawn from the clear image sensor arrays.
According to a set of illustrative embodiments, the gathered sequential images may be greyscale images. According to another set of illustrative embodiments, the gathered sequential images are color images having red, green, and blue components, and filtering is applied to each of the red, green, and blue components separately.
Pursuant to another set of illustrative embodiments, the procedure described above may be simplified or extended to provide filtering of the chroma components of color images.
The simplified or extended approach to chroma and color image filtering can be applied to images gathered by the camera array 300.
As shown in the accompanying drawing figures, the mobile device 200 includes a plurality of internal components, among them wireless transceivers 202, a processor 204, a memory 206, output devices 208, input devices 210, a component interface 212, and sensors 228. In the present embodiment, the wireless transceivers 202 include a cellular transceiver 203 and a wireless local area network (WLAN) transceiver 205.
The WLAN transceiver 205 may, but need not, be configured to conduct Wi-Fi communications in accordance with the IEEE 802.11 (a, b, g, or n) standard with access points. In other embodiments, the WLAN transceiver 205 can instead (or in addition) conduct other types of communications commonly understood as being encompassed within Wi-Fi communications such as some types of peer-to-peer (e.g., Wi-Fi Peer-to-Peer) communications. Further, in other embodiments, the WLAN transceiver 205 can be replaced or supplemented with one or more other wireless transceivers configured for non-cellular wireless communications including, for example, wireless transceivers employing ad hoc communication technologies such as HomeRF (radio frequency), Home Node B (3G femtocell), Bluetooth, and/or other wireless communication technologies such as infrared technology. Thus, although in the present embodiment the mobile device 200 has the two wireless transceivers 203 and 205, the present disclosure is intended to encompass numerous embodiments in which any arbitrary number of (e.g., more than two) wireless transceivers employing any arbitrary number of (e.g., two or more) communication technologies are present.
Exemplary operation of the wireless transceivers 202 in conjunction with others of the internal components of the mobile device 200 can take a variety of forms and can include, for example, operation in which, upon reception of wireless signals, the internal components detect communication signals and the transceiver 202 demodulates the communication signals to recover incoming information, such as voice and/or data, transmitted by the wireless signals.
Depending upon the embodiment, the mobile device 200 may be equipped with one or more input devices 210, or one or more output devices 208, or any of various combinations of input devices 210 and output devices 208. The input and output devices 208, 210 can include a variety of visual, audio, and/or mechanical devices. For example, the output device(s) 208 can include one or more visual output devices 216 such as a liquid crystal display and a light emitting diode indicator, one or more audio output devices 218 such as a speaker, alarm, and/or buzzer, and/or one or more mechanical output devices 220 such as a vibrating mechanism. The visual output devices 216 can include, among other things, a video screen.
The input devices 210 include the camera array 300 described above.
The mobile device 200 may also include one or more of various types of sensors 228. The sensors 228 can include, for example, proximity sensors (a light detecting sensor, an ultrasound transceiver, or an infrared transceiver), touch sensors, altitude sensors, a location circuit that can include, for example, a Global Positioning System (GPS) receiver, a triangulation receiver, an accelerometer, a tilt sensor, a gyroscope, or any other information collecting device that can identify a current location or user-device interface (carry mode) of the mobile device 200. Although the sensors 228 are described as being distinct from the input devices 210, one or more of the sensors can also be considered to be input devices (and vice versa).
The memory 206 of the mobile device 200 can encompass one or more memory devices of any of a variety of forms (e.g., read-only memory, random access memory, static random access memory, dynamic random access memory, etc.), and can be used by the processor 204 to store and retrieve data. In some embodiments, the memory 206 can be integrated with the processor 204 in a single device (e.g., a processing device including memory or processor-in-memory (PIM)), albeit such a single device will still typically have distinct portions/sections that perform the different processing and memory functions and that can be considered separate devices.
The data that is stored by the memory 206 can include, but need not be limited to, operating systems, applications, and informational data, such as a database. Each operating system includes executable code that controls basic functions of the communication device, such as interaction among the various components of the mobile device 200, communication with external devices via the wireless transceivers 202 and/or the component interface 212, and storage and retrieval of applications and data to and from the memory 206.
In addition, the memory 206 can include one or more applications for execution by the processor 204. Each application can include executable code that utilizes an operating system to provide more specific functionality for the communication devices, such as file system service and the handling of protected and unprotected data stored in the memory 206. Informational data is non-executable code or information that can be referenced and/or manipulated by an operating system or application for performing functions of the communication device. One such application is a client application which is stored in the memory 206 and configured for performing the methods described herein. For example, one or more applications may be configured to implement the non-local means filter described at block 107 of the operational sequence discussed above.
The client application is intended to be representative of any of a variety of client applications that can perform the same or similar functions on any of various types of mobile devices, such as mobile phones, tablets, laptops, etc. The client application is a software-based application that operates on the processor 204 of the mobile device 200.
It should be appreciated that one or more embodiments encompassed by the present disclosure are advantageous in one or more respects. Thus, it is specifically intended that the present disclosure not be limited to the embodiments and illustrations contained herein, but include modified forms of those embodiments including portions of the embodiments and combinations of elements of different embodiments as come within the scope of the following claims.
This application claims the benefit of U.S. Provisional Patent Application No. 62/242,350, filed in October 2015.