This invention relates to Virtual Reality (VR) panorama generation, and more particularly to color, luminance, and sharpness balancing when stitching images together.
In a typical Virtual Reality (VR) application, a 360-degree panoramic image or video is captured. A user wearing special goggles such as a Head-Mounted Display (HMD) can actively select and vary his viewpoint to get an immersive experience in a 360-degree panoramic space.
A wide variety of interesting and useful applications are possible as VR camera technology improves and shrinks. A helmet cam such as a GoPro camera could be replaced by a VR panorama camera, allowing the capture of 360-degree panoramas while engaging in various activities such as mountain biking, skiing, skydiving, traveling, etc. A VR camera placed in a hospital operating room could allow a remote surgeon or medical student to observe and interact with the operation using a VR headset or other tools. Such applications could require a very accurate rendering of the virtual space.
How the 360-degree panoramic video is captured and generated can affect the quality of the VR experience. When multiple cameras are used, regions where two adjacent camera images intersect often have visual artifacts and distortion that can mar the user experience.
In
In
Image problems caused by stitching may have various causes. Exposure time and white balance may vary from image to image. Different focal lengths may be used for each camera in the ring. Some lenses may get dirty while other lenses remain clean.
The opposite effect is seen in the foreground illumination. The brighter sky in image 122 upsets the white balance so that the plaza in the foreground is noticeably darker in region 124 than in surrounding regions 126. Abrupt transitions occur at 112, 114 between region 124 and surrounding regions 126. These abrupt transitions 112, 114 would not be visible to the human eye looking at the actual scene—they are errors created by white-balancing mismatch between adjacent captured images. These abrupt luminance transitions are undesirable.
Various prior-art techniques have been used to adjust the color, luminance, and sharpness of stitched images. The intensities of pixels are globally adjusted for color balance in an attempt to render neutral colors correctly. Color balance is a more generic term that can include gray balance, white balance, and neutral balance. Color balance changes the overall mixture of colors but is often a manual technique that requires user input.
Gamma correction is a non-linear adjustment that uses a gamma curve that defines the adjustment. User input is often required to select or adjust the gamma curve.
Histogram-based matching adjusts an image so that its histogram matches a specified histogram. Artifacts (noise) are created when a color is matched to a darker reference image (the pixel is changed from a bright value to a darker value). Loss of image detail occurs when a color is matched to a brighter reference image (the pixel is changed from dark to bright). Misalignment in overlapping regions between images can lead to incorrect color matching.
Unsharp masking uses a blurred, or “unsharp”, negative image to create a mask of the original image. The unsharp mask is then combined with the positive (original) image, creating an image that is less blurry than the original. Unsharp masking suffers from the difficulty of choosing which parts of an image to sharpen.
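As a sketch of the general unsharp-masking technique (illustrative Python on a 1-D row of pixel values, not the patent's method), the result is the original plus an amount times the difference between the original and a blurred copy. The function names and parameters here are assumptions for illustration.

```python
def box_blur(signal, radius=1):
    """Blur a 1-D signal with a simple box filter (window clamped at the edges)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def unsharp_mask(signal, amount=1.0, radius=1):
    """Sharpen: original + amount * (original - blurred)."""
    blurred = box_blur(signal, radius)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]

# A step edge becomes steeper (with overshoot) after unsharp masking.
edge = [10, 10, 10, 200, 200, 200]
print(unsharp_mask(edge))
```

The overshoot on either side of the edge is what makes the edge appear sharper, and also why choosing which regions to sharpen matters: the same overshoot in smooth regions shows up as ringing.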
Brightening the sky pixels to fix the dark sky of image 120 to better match the surrounding sky of image 122 (
While histogram matching, white balancing, and other prior-art techniques are useful for eliminating abrupt color changes where images are stitched together in a panorama, these techniques can still produce visible artifacts, or result in a loss of image detail.
What is desired is a Virtual Reality (VR) panorama generator that reduces or eliminates artifacts or loss of detail at interfaces where images from adjacent cameras are stitched together. A panorama generator that performs white balance and sharpness adjustments at image interfaces without creating new artifacts or losing detail is desirable. A panorama generator using color, luminance, and sharpness balancing to better match stitched images is desired.
The present invention relates to an improvement in stitched image correction. The following description is presented to enable one of ordinary skill in the art to make and use the invention as provided in the context of a particular application and its requirements. Various modifications to the preferred embodiment will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed.
Histograms of pixel values are generated for pixels in the overlapping region, step 216. Each histogram shows, for every possible pixel value, the number of times that value occurs within the overlapping region. One histogram is generated for Y, another for U, and a third for V, for both the source image and the target image, for a total of 6 histograms. Only pixels within the overlapping region are included in the histograms.
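Step 216 can be sketched as follows (illustrative Python, not the patent's implementation; pixels in the overlapping regions are assumed to be (Y, U, V) tuples of 8-bit values):

```python
def channel_histogram(pixels, channel, num_levels=256):
    """Count occurrences of each value of one channel (0 = Y, 1 = U, 2 = V)."""
    hist = [0] * num_levels
    for px in pixels:
        hist[px[channel]] += 1
    return hist

def overlap_histograms(source_overlap, target_overlap):
    """Build the six histograms: Y, U, V for the source and for the target,
    using only pixels inside the overlapping region."""
    return {
        "src": [channel_histogram(source_overlap, c) for c in range(3)],
        "tgt": [channel_histogram(target_overlap, c) for c in range(3)],
    }

# Hypothetical overlapping-region pixels for the two images.
src = [(200, 128, 120), (200, 130, 120), (50, 128, 121)]
tgt = [(180, 126, 119), (60, 126, 119), (60, 127, 122)]
hists = overlap_histograms(src, tgt)
print(hists["src"][0][200])  # → 2: two source pixels have Y = 200
```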
The luminance Y values are processed separately from the chrominance U and V values. Y-channel process 220, shown later in
The adjusted Y, U, and V values are combined to form new YUV pixels, step 242, for the whole source image. These new YUV pixels replace the old YUV pixels in the source image. The source and target images are stitched together such as by using a blending algorithm with the new YUV values for the entire source image, including the overlapping region, step 244. Sharpening process 250 (
The Cumulative Density Function (CDF) is generated from the Y histograms for the source and target image, step 222. The Y color transfer curve is then generated from the two CDF's, step 224. This color transfer curve is then averaged to smooth it out, generating an averaged Y color transfer curve, step 226. A moving average or a sliding window can be used. Pixels from the source image are adjusted using the averaged Y color transfer curve to generate the new adjusted Y values for the whole source image, step 228. These new adjusted Y luminance values are then scaled by a ratio, step 229. The scaling ratio is the brightest Y value in the Y color transfer curve divided by the brightest Y value in the averaged Y color transfer curve. This scales the pixels up to the brightest value to compensate for any loss of brightness due to averaging.
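Steps 222 through 229 can be sketched as follows (an illustrative Python sketch, not the patent's implementation; histograms are assumed to be 256-entry lists for 8-bit Y values, and the window size is one possible choice):

```python
import bisect

def cdf(hist):
    """Cumulative distribution, normalized to 0..1, from a histogram (step 222)."""
    total = float(sum(hist))
    out, running = [], 0
    for count in hist:
        running += count
        out.append(running / total)
    return out

def transfer_curve(src_cdf, tgt_cdf):
    """Histogram matching (step 224): map each source value to the target
    value whose cumulative count first reaches the source's cumulative count."""
    return [min(bisect.bisect_left(tgt_cdf, c), len(tgt_cdf) - 1)
            for c in src_cdf]

def moving_average(curve, window=101):
    """Centered moving average (step 226); edges average the samples available."""
    half = window // 2
    out = []
    for i in range(len(curve)):
        seg = curve[max(0, i - half):i + half + 1]
        out.append(sum(seg) / float(len(seg)))
    return out

def adjust_y(y_values, curve, averaged):
    """Steps 228-229: look up the averaged curve, then rescale by
    max(curve)/max(averaged) so the brightest value is preserved."""
    ratio = max(curve) / float(max(averaged)) if max(averaged) else 1.0
    return [min(235, int(round(averaged[y] * ratio))) for y in y_values]

# Hypothetical histograms whose clusters reproduce the (210, 200) and
# (150, 30) source/target pairs used as examples later in the text.
src_h = [0] * 256
tgt_h = [0] * 256
src_h[150], src_h[210] = 5, 5   # source Y values cluster at 150 and 210
tgt_h[30], tgt_h[200] = 5, 5    # target Y values cluster at 30 and 200
curve = transfer_curve(cdf(src_h), cdf(tgt_h))
avg = moving_average(curve)
print(curve[150], curve[210])   # → 30 200
```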
A moving average is taken of these four histograms, step 232. The Cumulative Density Function (CDF) is generated from these moving averages of the U and V histograms for the source and target image, step 234. The U and V color transfer curves are generated from the four CDF's, step 236. Pixel U values from the source image are adjusted using the U color transfer curve to generate the new adjusted U values for the whole source image, step 238. Likewise, pixel V values from the source image are adjusted using the V color transfer curve to generate the new adjusted V values for the whole source image, step 238.
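Steps 232 through 236 can be sketched as follows (illustrative Python, not the patent's implementation; the 5-bar window with two zero-padded bars at each end matches the example given later, and the cluster values below are made up):

```python
import bisect

def smooth_hist(hist, window=5):
    """Step 232: moving average over histogram bars, padded with zero
    bars at each end (counts outside the value range are zero)."""
    half = window // 2
    padded = [0] * half + hist + [0] * half
    return [sum(padded[i:i + window]) / float(window)
            for i in range(len(hist))]

def cdf(hist):
    """Step 234: cumulative distribution, normalized to 0..1."""
    total = float(sum(hist))
    out, running = [], 0
    for count in hist:
        running += count
        out.append(running / total)
    return out

def chroma_transfer_curve(src_hist, tgt_hist):
    """Step 236: smooth both chroma histograms first, then match CDFs."""
    s, t = cdf(smooth_hist(src_hist)), cdf(smooth_hist(tgt_hist))
    return [min(bisect.bisect_left(t, c), len(t) - 1) for c in s]

src_u = [0] * 256
tgt_u = [0] * 256
src_u[140] = 10   # hypothetical: source U values cluster at 140
tgt_u[120] = 10   # hypothetical: target U values cluster at 120
curve_u = chroma_transfer_curve(src_u, tgt_u)
print(curve_u[140])  # → 120: the source cluster maps onto the target cluster
```

The same `chroma_transfer_curve` would be called a second time with the V histograms, per step 238.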
Similarly for target image 310, target-Y histogram 312 shows the counts of Y-values within overlapping region 313, target-U histogram 314 shows the counts of U-values within overlapping region 313, and target-V histogram 316 shows the counts of V-values within overlapping region 313. A total of 6 histograms are generated.
In
Also, in
In
This Y color transfer curve 352 could be looked up using the source Y values to get the new adjusted source Y values. However, the inventors have noticed that there can be abrupt changes in the slope of Y color transfer curve 352, and the inventors believe that these abrupt slope changes cause artifacts such as shown in
When Y values for pixels in the source image are adjusted, averaged Y color transfer curve 354 is used rather than Y color transfer curve 352. Using averaged Y color transfer curve 354 produces fewer artifacts because the rate of change of averaged Y color transfer curve 354 is less than for Y color transfer curve 352 due to the averaging.
Surprisingly, averaging can eliminate both the artifact problem and the loss-of-detail problem. Even though artifacts and loss of detail occur at opposite extremes, both are solved by averaging, which reduces extremes.
In
For example, a large cumulative count value intersects source CDF curve 332 at a Y-value of 210. This same large cumulative count value intersects target CDF curve 342 at a Y-value of 200. See the upper dashed line that intersects both source CDF curve 332 and target CDF curve 342. Thus one (source, target) pair is (210, 200).
Another, smaller cumulative count value intersects source CDF curve 332 at a Y-value of 150. This same smaller cumulative count value intersects target CDF curve 342 at a Y-value of 30. See the lower dashed line that intersects both source CDF curve 332 and target CDF curve 342. Thus another (source, target) pair is (150,30).
Many others of these (source, target) pairs are extracted in a similar fashion for all the other cumulative count values. These (source, target) pairs are then plotted as shown in
Using averaged Y color transfer curve 354 rather than Y color transfer curve 352 causes the new adjusted Y values to be less extreme. Instead of 200, 170 is used, and instead of 30, 50 is used. Using Y color transfer curve 352, the difference in Y values in the source image is 200 - 30, or 170, while using averaged Y color transfer curve 354 the Y value difference is 170 - 50, or 120. Since 120 is less than 170, these less extreme Y values should reduce spurious artifacts.
When performing color transfer, all pixels in the source image having a Y value of 210 are converted to new Y values of 170, using averaged Y color transfer curve 354. Likewise, all pixels in the source image having a Y value of 150 are converted to new Y values of 50. Any Y value in the source image can be looked up using averaged Y color transfer curve 354 to find the new Y value.
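The color transfer itself is a simple table lookup. A minimal sketch (illustrative Python; the curve entries below are hypothetical, chosen to match the example numbers above, with all other entries left as identity):

```python
def apply_curve(y_values, curve):
    """Replace every source Y value with its entry in the transfer curve."""
    return [curve[y] for y in y_values]

# Hypothetical averaged curve in which Y=210 maps to 170 and Y=150 maps to 50.
curve = list(range(256))      # start from the identity mapping
curve[210], curve[150] = 170, 50
print(apply_curve([210, 150, 150, 80], curve))  # → [170, 50, 50, 80]
```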
When the source image is bright, such as shown for source-Y histogram 302, and target image is dark, such as shown for target-Y histogram 312, (
Alternately, when the source image is dark and target image is bright, (
Averaging Y color transfer curve 352 to generate averaged Y color transfer curve 354 causes the shape to be smoothed out, reducing any bending that might cause the dark-to-bright artifacts to be generated (
As seen in the graph of
The maximum Y value MAX is 235 for some YUV pixel encodings. This maximum Y value MAX intersects Y color transfer curve 352 at point A. However, when averaged Y color transfer curve 354 is used, this maximum Y value MAX intersects averaged Y color transfer curve 354 at a smaller value B. Since B is smaller than A, using averaged Y color transfer curve 354 does not fully expand Y values to the full Y range of 0 to 235. This is undesirable, since saturated objects such as clouds in the sky should have the same saturated value in all images for better matching.
To compensate for the reduction of luminance range due to averaging, the new adjusted Y luminance values are scaled by a ratio of A/B. The scaling ratio is the brightest Y value in the Y color transfer curve divided by the brightest Y value in the averaged Y color transfer curve. This scales the pixels up to the brightest value to compensate for any loss of brightness due to averaging.
Using this process, adjacent color values tend to have similar color counts (histogram bar heights). Also, the color distribution is more even when averaging is performed on histograms. This reduces the introduction of extra color that might be caused by misalignment.
In
Target-U histogram 314 has averaged target-U histogram 364 superimposed, while target-V histogram 316 has averaged target-V histogram 368 superimposed. A shorter moving average can be used to make these averaged histograms more responsive, compared to the longer moving average used for generating averaged Y color transfer curve 354 (
In
In
A similar process is used for the V values to combine source-V CDF (not shown) and the target-V CDF (not shown) to create the V color transfer curve (not shown).
Without histogram averaging, step 232 of
With histogram averaging,
Using a color transfer curve generated with histogram averaging can minimize incorrect color matching due to mismatch of image contents in the overlapping regions (misalignment errors).
Since the human eye is more sensitive to brightness (Y) than to color (U,V), abrupt changes in U color transfer curve 380 do not create visible U,V artifacts.
Thus averaging of the Y color transfer curve prevents the creation of dark artifacts for pixels that are decreased in Y, or darkened by the balancing process. These bright-to-dark pixels do not create artifacts. Averaging Y color transfer curve 352 to use averaged Y color transfer curve 354 can both reduce artifacts (
The Y values are extracted from the panorama of stitched images, step 252. The entire panoramic image space is divided into blocks. Each block is further sub-divided into sub-blocks. For example, a 16×16 block can be sub-divided into 81 overlapping 8×8 sub-blocks, an 8×8 block can be sub-divided into 25 overlapping 4×4 sub-blocks, or a 4×4 block could be sub-divided into nine overlapping 2×2 sub-blocks, the sub-block window sliding by one pixel in each direction so that an N×N block yields (N-K+1)×(N-K+1) overlapping K×K sub-blocks. Just one sub-block size may be used for the whole panorama.
The sum-of-the-absolute difference (SAD) of the Y values is generated for each sub-block in each block, and the maximum of these SAD results (MAX SAD) is taken for each block, step 254. The MAX SAD value indicates the maximum difference among pixels within any one sub-block in the block. A block having a sub-block with a large pixel difference can occur when an edge of some visual object passes through the sub-block. Thus larger MAX SAD values indicate sharp features.
The MAX SAD value is used for the entire block. The MAX SAD value may be divided by 235 and then divided by 4 to normalize it to the 0 to 1 range. The MAX SAD value for each block is compared to one or more threshold levels, step 256. Blocks are separated into two or more sharpness regions, based on the threshold comparison, step 258. Sharpening is performed for all blocks in a sharpness region using a same set of sharpening parameters, regardless of which original image the block was extracted from. Different sharpness regions may use different parameters to control the sharpening process, step 262. The sharpened Y values overwrite the Y values of the YUV pixels, and the image is output for the entire panorama, step 260.
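One plausible reading of steps 252 through 258 in Python (illustrative only: the text does not state the SAD reference, so differences from the sub-block mean are assumed here, and the threshold values are made up):

```python
def sub_block_sad(block, r0, c0, size):
    """Sum of absolute differences of Y values from the sub-block mean
    (one plausible SAD definition; the patent does not specify the reference)."""
    vals = [block[r][c]
            for r in range(r0, r0 + size)
            for c in range(c0, c0 + size)]
    mean = sum(vals) / float(len(vals))
    return sum(abs(v - mean) for v in vals)

def max_sad(block, sub_size):
    """MAX SAD over all overlapping sub-blocks (window slides by one pixel)."""
    n = len(block)
    return max(sub_block_sad(block, r, c, sub_size)
               for r in range(n - sub_size + 1)
               for c in range(n - sub_size + 1))

def classify(block, sub_size=2, thresholds=(0.05, 0.2)):
    """Normalize MAX SAD by 235 and then by 4 (per the text) and bin the
    block into a sharpness region: 0 = more blurry, 1 = blurry, 2 = sharp."""
    score = max_sad(block, sub_size) / 235.0 / 4.0
    return sum(score >= t for t in thresholds)

flat = [[100] * 4 for _ in range(4)]           # featureless block
edge = [[0, 0, 235, 235] for _ in range(4)]    # block containing a hard edge
print(classify(flat), classify(edge))          # → 0 2
```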
For example, when there are two thresholds, blocks may be divided into three sharpness regions, such as sharp, blurry, and more blurry. These regions can span all images in the panorama, so sharpness is processed for the entire panorama space, not for individual images. This produces a more uniform panorama without abrupt changes in sharpness between images that are stitched together.
Blocks in upper sharpness region 152 can be processed with sharpening parameters that sharpen edges, while blocks in lower sharpness region 154 can be processed with other sharpening parameters that sharpen the white region. Thus the buildings are sharpened to a particular level, while the road pavement is sharpened to another level. This approach is intended to balance the sharpness of a whole panorama with different levels of sharpness for different regions. Since the sharpness regions span multiple stitched images, sharpening is consistent across all stitched images in the panorama.
Several other embodiments are contemplated by the inventors. For example, additional functions and steps could be added, and some steps could be performed simultaneously with other steps, such as in a pipeline, or could be executed in a re-arranged order. For example, adjusting the overall luminance by scaling Y values (
While a single panorama image space that is generated by stitching together images has been described, the images could be part of a sequence of images, such as for a video, and a sequence of panoramic images could be generated for different points in time. The panoramic space could thus change over time.
While YUV pixels have been described, other formats for pixels could be accepted and converted into YUV format. The YUV format itself may have different bit encodings and bit widths (8, 16, etc.) for its sub-layers (Y, U, V), and the definitions and physical mappings of Y, U, and V to the luminosity and color may vary. Other formats such as RGB, CMYK, HSL/HSV, etc. could be used. The term YUV is not restricted to any particular standard but can encompass any format that uses one sub-layer (Y) to represent the brightness, regardless of color, and two other sub-layers (U,V) that represent the color space.
The number of Y value data points that are averaged when generating averaged Y color transfer curve 354 can be adjusted. More data points being averaged together produces a smoother curve for averaged Y color transfer curve 354, while fewer Y data points in the moving average provides a more responsive curve that more closely follows Y color transfer curve 352. For example, when Y is in the range of 0 to 235, a moving average of 101 Y data values can be used. The moving average can contain data values from either or both sides of the current data value, and the ratio of left and right side data points can vary, or only data points to one side of the current data value may be used, such as only earlier data points. Extra data points for padding may be added, such as Y values of 0 at the beginning of the curve, and 235 at the end of the curve.
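The padded 101-tap moving average described here can be sketched as follows (illustrative Python; padding with 0 at the start and 235 at the end follows the text, and a centered window is one of the choices the text allows):

```python
def averaged_curve(curve, window=101, pad_lo=0, pad_hi=235):
    """Centered moving average with explicit padding: half a window of
    pad_lo values is prepended and half a window of pad_hi values appended."""
    half = window // 2
    padded = [pad_lo] * half + list(curve) + [pad_hi] * half
    return [sum(padded[i:i + window]) / float(window)
            for i in range(len(curve))]

identity = list(range(236))    # Y values 0..235, identity transfer curve
avg = averaged_curve(identity)
print(avg[117])                # → 117.0: interior points are unchanged
print(avg[0] > 0, avg[235] < 235)  # padding pulls both ends inward
```

Pulling the bright end inward is exactly the range loss that the A/B scaling ratio of step 229 compensates for.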
Likewise, the number of histogram bars that are averaged by the moving average that generates averaged U histogram 362 and other U, V chroma histograms can be varied. The moving average parameter or window size can be the same for all histograms or for all histograms and for averaged Y color transfer curve 354, or can be different. In one example, a moving average of 5 histogram bars is used with 2 padded values at the beginning and 2 padded values at the end.
The number of sharpness thresholds can be just one, or can be two or more for multi-thresholding. The amount of sharpening can vary from region to region, and can be adjusted based on the application, or for other reasons. Many different parameter values can be used.
Various resolutions could be used, such as HD, 4K, etc., and pixels and sub-layers could be encoded and decoded in a variety of ways with different formats, bit widths, etc. Additional masks could be used, such as for facial recognition, image or object tracking, etc.
While images showing errors such as bright-to-dark artifacts and loss of detail have been shown, the appearance of errors may vary greatly with the image itself, as well as with the processing methods, including any pre-processing. Such images are included in the drawings merely to better understand the problems involved and how the inventors solve those problems, and are not meant to be limiting or to define the invention.
Color pixels could be converted to gray scale for searching in search windows with a query patch. Color systems could be converted during pre or post processing, such as between YUV and RGB, or between pixels having different bits per pixel. Various pixel encodings could be used, and frame headers and audio tracks could be added. GPS data or camera orientation data could also be captured and attached to the video stream.
While sum-of-the-absolute difference (SAD) has been described, other methods may be used, such as Mean-Square-Error (MSE), Mean-Absolute-Difference (MAD), Sum-of-Squared Errors, etc. Rather than use macroblocks, smaller blocks may be used, especially around object boundaries, or larger blocks could be used for background or objects. Regions that are not block shaped may also be operated upon.
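The alternative block-difference metrics mentioned here can be sketched as follows (illustrative Python; `p` and `q` are two hypothetical rows of Y values being compared):

```python
def sad(a, b):
    """Sum of Absolute Differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mad(a, b):
    """Mean Absolute Difference: SAD normalized by the number of samples."""
    return sad(a, b) / float(len(a))

def sse(a, b):
    """Sum of Squared Errors: penalizes large differences more heavily."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mse(a, b):
    """Mean Squared Error: SSE normalized by the number of samples."""
    return sse(a, b) / float(len(a))

p = [10, 20, 30, 40]
q = [12, 18, 33, 40]
print(sad(p, q), mad(p, q), sse(p, q), mse(p, q))  # → 7 1.75 17 4.25
```

The squared-error metrics weight outliers more strongly than SAD/MAD, which can change which sub-block produces the maximum within a block.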
When used in various processes, the size of the macroblock may be 8×8, 16×16, or some other number of pixels. While macroblocks such as 16×16 blocks and 8×8 have been described, other block sizes can be substituted, such as larger 32×32 blocks, 16×8 blocks, smaller 4×4 blocks, etc. Non-square blocks can be used, and other shapes of regions such as triangles, circles, ellipses, hexagons, etc., can be used as a patch region or “block”. Adaptive patches and blocks need not be restricted to a predetermined geometrical shape. For example, the sub-blocks could correspond to content-dependent sub-objects within the object. Smaller block sizes can be used for very small objects.
The size, format, and type of pixels may vary, such as RGB, YUV, 8-bit, 16-bit, or may include other effects such as texture or blinking. When detecting overlapping regions from source and target images, a search range of a query patch in the search window may be fixed or variable and may have an increment of one pixel in each direction, or may increment in 2 or more pixels or may have directional biases. Adaptive routines may also be used. Larger block sizes may be used in some regions, while smaller block sizes are used near object boundaries or in regions with a high level of detail.
The number of images that are stitched together to form a panorama may vary with different applications and camera systems, and the relative size of the overlap regions could vary. Panoramic images and spaces could be 360-degree, or could be spherical or hemi-spherical, or could be less than a full 360-degree wrap-around, or could have image pieces missing for various reasons. The shapes and other features of curves and histograms can vary greatly with the image itself.
Graphs, curves, tables, and histograms are visual representations of data sets that may be stored in a variety of ways and formats, but such graphic representations are useful for understanding the data sets and operations performed. The actual hardware may store the data in various ways that do not at first appear to be the graph, curve, or histograms, but nevertheless are alternative representations of the data. For example, a linked list may be used to store the histogram data for each bar, and (source, target) pairs may also be stored in various list formats that still allow the graphs to be re-created for human analysis, while being in a format that is more useful for reading by a machine. A table could be used for averaged Y color transfer curve 354. The table has entries that are looked up by the source Y value, and the table entry is read to generate the new Y value. The table or linked list is an equivalent of averaged Y color transfer curve 354, and likewise tables or linked lists could be used to represent the histograms, etc.
Various combinations of hardware, programmable processors, software, and firmware may be used to implement functions and blocks. Pipelining may be used, as may parallel processing. Various routines and methods may be used, and factors such as the search range and block size may also vary.
It is not necessary to fully process all blocks in each time-frame. For example, only a subset or limited area of each image could be processed. It may be known in advance that a moving object only appears in a certain area of the panoramic frame, such as a moving car only appearing on the right side of a panorama captured by a camera that has a highway on the right but a building on the left. The “frame” may be only a subset of the still image captured by a camera or stored or transmitted.
The background of the invention section may contain background information about the problem or environment of the invention rather than describe prior art by others. Thus inclusion of material in the background section is not an admission of prior art by the Applicant.
Any methods or processes described herein are machine-implemented or computer-implemented and are intended to be performed by machine, computer, or other device and are not intended to be performed solely by humans without such machine assistance. Tangible results generated may include reports or other machine-generated displays on display devices such as computer monitors, projection devices, audio-generating devices, and related media devices, and may include hardcopy printouts that are also machine-generated. Computer control of other machines is another tangible result.
Any advantages and benefits described may not apply to all embodiments of the invention. When the word “means” is recited in a claim element, Applicant intends for the claim element to fall under 35 USC Sect. 112, paragraph 6. Often a label of one or more words precedes the word “means”. The word or words preceding the word “means” is a label intended to ease referencing of claim elements and is not intended to convey a structural limitation. Such means-plus-function claims are intended to cover not only the structures described herein for performing the function and their structural equivalents, but also equivalent structures. For example, although a nail and a screw have different structures, they are equivalent structures since they both perform the function of fastening. Claims that do not use the word “means” are not intended to fall under 35 USC Sect. 112, paragraph 6. Signals are typically electronic signals, but may be optical signals such as can be carried over a fiber optic line.
The foregoing description of the embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.