The disclosure relates to a system and method for estimating a scene illumination using a neural network configured to predict the scene illumination based on two or more images of the same scene that are simultaneously captured by two or more cameras having different spectral sensitivities, and performing white balance corrections on the captured images.
In processing camera captured images, illuminant estimation is a critical step for computational color constancy. Color constancy refers to the ability of the human visual system to perceive scene colors as being the same even when observed under different illuminations. Cameras do not innately possess this illumination adaptation ability, and a raw-RGB image recorded by a camera sensor has significant color cast due to the scene's illumination. As a result, computational color constancy is applied to the camera's raw-RGB sensor image as one of the first steps in the in-camera imaging pipeline to remove this undesirable color cast.
In the related art, color constancy is achieved using (1) a statistics-based method or (2) a learning-based method.
Statistics-based methods operate using statistics from an image's color distribution and spatial layout to estimate the scene illuminant. These statistics-based methods are fast and easy to implement. However, these statistics-based methods make very strong assumptions about scene content and fail in cases where these assumptions do not hold.
Learning-based methods use labelled training data where the ground truth illumination corresponding to each input image is known from physical color charts placed in the scene. In general, learning-based approaches are shown to be more accurate than statistics-based methods. However, learning-based methods in the related art usually include many more parameters than statistics-based ones. The number of parameters could reach up to tens of millions in some models, which results in a relatively longer training time.
One or more example embodiments provide a system and method for estimating a scene illumination using a neural network configured to predict the scene illumination based on two or more images of the same scene that are simultaneously captured by two or more cameras having different spectral sensitivities. The multiple-camera setup may provide a benefit of improving the accuracy of illuminant estimation.
According to an aspect of an example embodiment, an apparatus for processing image data may include: a memory storing instructions; and a processor configured to execute the instructions to: obtain a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially align the first image with the second image; obtain a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtain an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and perform a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
The neural network may be trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and the ground-truth illuminant color may be obtained from a color value of at least one achromatic patch in the color rendition chart.
The second image may show a wider view of the same scene than the first image, and the processor may be further configured to execute the instructions to: crop the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
The processor may be further configured to execute the instructions to: down-sample the first image to obtain a down-sampled first image; down-sample the cropped second image to obtain a down-sampled second image; and compute the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
The color transformation matrix may be a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
The output of the neural network may represent a ratio of RGB values of the estimated illuminant color.
The neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
The neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between all color values of the first reference image and all color values of the second reference image.
The color transformation matrix may correspond to a first color transformation matrix. The processor may be further configured to execute the instructions to: obtain, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image; spatially align the third image with the first image; spatially align the third image with the second image; obtain a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image; obtain a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image; concatenate the first, the second, and the third color transformation matrices to obtain a concatenated matrix; obtain the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and perform the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
The apparatus may be a user device in which the first camera and the second camera are mounted, and the first camera and the second camera may have different fields of view and different spectral sensitivities.
The apparatus may be a server including a communication interface configured to communicate with a user device including the first camera and the second camera, to receive the first image and the second image from the user device.
According to an aspect of an example embodiment, a method for processing image data may include: obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially aligning the first image with the second image; obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
The neural network may be trained to minimize a loss between the estimated illuminant color and a ground-truth illuminant color, and wherein the ground-truth illuminant color may be obtained from a color value of at least one achromatic patch in the color rendition chart.
The second image may show a wider view of the same scene than the first image, and the method may further include: cropping the second image to have a same view as the first image, to spatially align the first image with the cropped second image.
The method may further include: down-sampling the first image to obtain a down-sampled first image; down-sampling the cropped second image to obtain a down-sampled second image; and computing the color transformation matrix that maps the down-sampled first image to the down-sampled second image based on color values of the down-sampled first image and the down-sampled second image.
The color transformation matrix may be a three-by-three matrix that maps RGB values of the first image to RGB values of the second image.
The output of the neural network may represent a ratio of RGB values of the estimated illuminant color.
The neural network may be further trained using augmented images, and the augmented images may be obtained by re-illuminating a first reference image and a second reference image of different scenes under different illuminations that are captured by a same reference camera, based on color transformations between first color chart values of the first reference image and second color chart values of the second reference image.
The color transformation matrix may correspond to a first color transformation matrix. The method may further include: obtaining, from a third camera, a third image that captures the same scene in a view different from the views of the first image and the second image; spatially aligning the third image with the first image; spatially aligning the third image with the second image; obtaining a second color transformation matrix that maps the first image to the third image based on the color values of the first image and color values of the third image; obtaining a third color transformation matrix that maps the second image to the third image based on the color values of the second image and the color values of the third image; concatenating the first, the second, and the third color transformation matrices to obtain a concatenated matrix; obtaining the estimated illuminant color from the output of the neural network by inputting the concatenated matrix to the neural network; and performing the white balance correction on the first image based on the estimated illuminant color to output the corrected first image.
According to an aspect of an example embodiment, a non-transitory computer readable storage medium may store a program that is executable by at least one processor to perform a method for processing image data, the method including: obtaining a first image and a second image that capture a same scene in different views, from a first camera and a second camera, respectively; spatially aligning the first image with the second image; obtaining a color transformation matrix that maps the first image to the second image based on color values of the first image and the second image; obtaining an estimated illuminant color from an output of a neural network by inputting the color transformation matrix to the neural network, wherein the neural network is trained based on a pair of reference images of a same reference scene and a color rendition chart that are captured by different cameras having different spectral sensitivities; and performing a white balance correction on the first image based on the estimated illuminant color to output a corrected first image.
Additional aspects will be set forth in part in the description that follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments of the disclosure.
The above and other aspects and features of embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
The following detailed description of example embodiments refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
Example embodiments of the present disclosure are directed to estimating a scene illumination in the RGB color space of camera sensors, and applying a matrix computed from estimated scene illumination parameters to perform a white-balance correction.
As shown in
Referring to
Graphs (a) and (b) shown in
For example, the pitch of photodiodes and the overall resolutions of the two image sensors (e.g., charge-coupled device (CCD) sensors) mounted in the first camera 111 and the second camera 112 may be different from each other to accommodate the different optics associated with each sensor. Also, different color filter arrays (CFA) may be used in the first camera 111 and the second camera 112 according to the different optics, which may result in the different spectral sensitivities to incoming light as shown in graphs (a) and (b) of
The first camera 111 and the second camera 112 may simultaneously capture a first (unprocessed) raw-RGB image and a second (unprocessed) raw-RGB image of the same scene, respectively, that provide different spectral measurements of the scene.
The first raw-RGB image and the second raw-RGB image may have different views while capturing the same scene. The image signal processing according to an embodiment of the present disclosure may use the color values of the scene captured with the different spectral sensitivities to estimate the scene illumination since the color values are correlated with the scene illumination.
Referring back to
In image alignment operation S110, a global homography may be used to align two different images of the same scene having different fields of view, and then down-sampling is performed on the aligned two images, prior to computing color transformation between the two images.
Specifically, down-sampling S111 and S113 and warping and cropping S112 are performed to register the pair of the first raw-RGB image and the second raw-RGB image, which capture the same scene but have different fields of view.
In a first processing pipeline, the first raw-RGB image is downscaled by a preset factor (e.g., a factor of six) in operation S111.
In a second processing pipeline, either or both of image warping and image cropping S112 are performed on the second raw-RGB image to align the second raw-RGB image with the first raw-RGB image. For example, in the second processing pipeline, the second raw-RGB image is cropped to have the same size of the field of view as the first raw-RGB image. Additionally, any one or any combination of transformation, rotation, and translation may be applied to the second raw-RGB image so that the same objects in the first raw-RGB image and the second raw-RGB image are located at the same pixel coordinates.
As shown in
At least four points x′1, x′2, x′3, and x′4 are selected from image 1 to compute the perspective transform H.
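$$\mathbf{x}'_1 = (x'_1, y'_1, 1)^{\top}$$
$$\mathbf{x}'_2 = (x'_2, y'_2, 1)^{\top}$$
$$\mathbf{x}'_3 = (x'_3, y'_3, 1)^{\top}$$
$$\mathbf{x}'_4 = (x'_4, y'_4, 1)^{\top}$$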
The corresponding points x1, x2, x3, and x4 in image 2 are represented as follows:
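$$\mathbf{x}_1 = (x_1, y_1, 1)^{\top}$$
$$\mathbf{x}_2 = (x_2, y_2, 1)^{\top}$$
$$\mathbf{x}_3 = (x_3, y_3, 1)^{\top}$$
$$\mathbf{x}_4 = (x_4, y_4, 1)^{\top}$$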
Matrix h [h1, h2, h3, h4, h5, h6, h7, h8, h9] is obtained based on the following:
Using matrix h [h1, h2, h3, h4, h5, h6, h7, h8, h9], the perspective transform H is obtained as follows:
Once the perspective transform H is computed using the calibration pattern, the warp and crop operation for a new scene is performed by applying the perspective transform H to an image captured by the second camera 112 (e.g., the second raw-RGB image). In an example embodiment, the warp and crop operation may be performed only once for the two cameras 111 and 112, rather than being performed individually for new images captured by the cameras 111 and 112.
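As an illustration only, the perspective transform H may be estimated from four or more point correspondences using the standard direct linear transform (DLT). The following sketch is an assumption about how such a computation could be carried out and is not necessarily the formulation used in the embodiment. The function names estimate_homography and apply_homography are hypothetical, and the sketch assumes that H maps points of image 2 onto the corresponding points of image 1.

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate a 3x3 matrix H such that dst ~ H @ src (homogeneous coordinates).

    src_pts, dst_pts: arrays of shape (N, 2) with N >= 4 corresponding points.
    Standard DLT: stack two linear constraints per correspondence and take the
    null vector of the resulting system via SVD.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    A = np.array(rows)
    # The solution h = [h1..h9] is the right singular vector associated with
    # the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Normalize so that h9 = 1 (assumes h9 is non-zero).
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H and convert back from homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Because the camera arrangement is fixed for a given device, a matrix estimated this way from a calibration pattern could be stored and reused for later captures.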
Once the second raw-RGB image is aligned with the first raw-RGB image, down-sampling S113 is performed on the aligned second raw-RGB image.
The down-sampling S111 and the down-sampling S113 may use the same down-sampling factor to allow the down-sampled first raw-RGB image and the down-sampled second raw-RGB image to have substantially the same resolution.
However, the present embodiment is not limited thereto, and different down-sampling factors may be used for the down-sampling S111 and the down-sampling S113. Also, the first processing pipeline including operation S111 and the second processing pipeline including operations S112 and S113 may be executed in parallel or in sequence.
Performing the down-sampling S111 and the down-sampling S113 prior to computing the color transformation may make the illumination estimation robust to any small misalignments and slight parallax in the two views. Since the hardware arrangement of the two cameras 111 and 112 does not change for a given device (e.g., the user device 110), the homography can be pre-computed and remains fixed for all image pairs from the same device.
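The disclosure does not specify a particular down-sampling filter. As one possible sketch, and only as an assumption, the down-sampling may be implemented by averaging non-overlapping blocks of pixels, which also averages out small misalignments between the two views. The function name downsample is hypothetical, and the default factor of six follows the example factor given for operation S111.

```python
import numpy as np

def downsample(image, factor=6):
    """Down-sample an (H, W, 3) image by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are cropped away.
    Block averaging is an assumed filter, chosen here for simplicity.
    """
    image = np.asarray(image, dtype=float)
    h, w, c = image.shape
    h_crop, w_crop = h - h % factor, w - w % factor
    blocks = image[:h_crop, :w_crop].reshape(
        h_crop // factor, factor, w_crop // factor, factor, c
    )
    # Average over the two intra-block axes.
    return blocks.mean(axis=(1, 3))
```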
In color transformation operation S120, a color transformation matrix is computed to map the down-sampled first raw-RGB image from the first camera 111 to the corresponding aligned and down-sampled second raw-RGB image from the second camera 112. For a particular scene illuminant, the color transformation between the two different images of the same scene may have a unique signature that is related to the scene illumination. Accordingly, the color transformation itself may be used as the feature for illumination estimation.
Given the first raw-RGB image $I_1 \in \mathbb{R}^{n \times 3}$ and the second raw-RGB image $I_2 \in \mathbb{R}^{n \times 3}$ with n pixels of the same scene captured by the first camera 111 and the second camera 112, under the same illumination $L \in \mathbb{R}^{3}$, there exists a linear color transformation $T \in \mathbb{R}^{3 \times 3}$ between the color values of the first raw-RGB image $I_1$ and the second raw-RGB image $I_2$ as:
$$I_2 \approx I_1 T \tag{1}$$
such that T is unique to the scene illumination L.
T is computed using the pseudo inverse, as follows:
$$T = (I_1^{\top} I_1)^{-1} I_1^{\top} I_2 \tag{2}$$
For example, the linear color transformation T may be represented in a 3×3 color transformation matrix as follows:
More specifically, given A denotes pixel values in R, G, B color channels for the down-sampled first raw-RGB image, B denotes pixel values in R, G, B color channels for the aligned and down-sampled second raw-RGB image, the 3×3 color transformation matrix T between A and B is calculated as follows.
In the matrices of A and B, the three columns correspond to R, G, B color channels, and the rows correspond to the number of pixels in the down-sampled first raw-RGB image and the aligned and down-sampled second raw-RGB image, respectively.
Using a pseudo-inverse equation, the 3×3 color transformation matrix T is calculated as follows:
In the embodiment, the 3×3 color transformation matrix is used since the 3×3 color transformation matrix is linear, accurate, and computationally efficient. However, the size of the color transformation matrix is not limited thereto, and any 3×M color transformation matrix (wherein M≥3) may be used.
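The pseudo-inverse computation of the 3×3 color transformation matrix T may be sketched as follows. This is a minimal illustration under stated assumptions: the function name color_transform is hypothetical, the two inputs are assumed to be already aligned and down-sampled to the same resolution, and a least-squares solver is used in place of forming the inverse explicitly, which yields the same result as the pseudo-inverse formula when the first image has full column rank.

```python
import numpy as np

def color_transform(img1, img2):
    """Return the 3x3 matrix T minimizing ||A @ T - B|| in the least-squares sense.

    img1, img2: aligned images of shape (H, W, 3) or pixel lists of shape (n, 3).
    A and B hold one pixel per row and one color channel (R, G, B) per column.
    """
    a = np.asarray(img1, dtype=float).reshape(-1, 3)
    b = np.asarray(img2, dtype=float).reshape(-1, 3)
    # Equivalent to (A^T A)^-1 A^T B for full-rank A, but numerically safer.
    t, *_ = np.linalg.lstsq(a, b, rcond=None)
    return t
```

The nine entries of the returned matrix may then be flattened, for example with T.reshape(-1), to form the input feature for the illumination estimation.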
In illumination estimation operation S130, a neural network trained for estimating the illumination of the scene (e.g., the illuminant color) receives, as input, the color transformation, and outputs a two-dimensional (2D) chromaticity value that corresponds to the illumination estimation of the scene. The 2D chromaticity value may be represented by a ratio of R, G, and B values, such as 2D [R/G B/G]. For example, the estimated illumination $\hat{L}$ is expressed as:
Referring to
The neural network according to an example embodiment may be required to process only the nine parameters in the color transformation matrix. As a result, the neural network is lightweight compared with other image processing networks, and is capable of being run efficiently on-device in real time.
A method and a system for training the neural network will be described later with reference to
Referring back to
Parameters such as the R gain and the B gain (i.e., the gain values for the red color channel and the blue color channel) for white balance adjustment are calculated based upon a preset algorithm.
In an embodiment, white balance correction factors (e.g., α, β, γ) are selected for the first raw-RGB image based on the estimated illumination, and each color component (e.g., RWB, GWB, BWB) of the first raw-RGB image is multiplied with its respective correction factor (e.g., α, β, γ) to obtain white-balanced color components (e.g., αRWB, βGWB, γBWB).
In an embodiment, a R/G correction factor and a B/G correction factor may be computed based on the estimated illumination, to adjust the R/G gain and B/G gain of the first raw-RGB image.
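As a hedged illustration of the white balance correction, the correction factors α, β, and γ may be derived from the estimated 2D [R/G B/G] chromaticity by taking the reciprocal of each ratio while leaving the green channel unchanged. This diagonal correction is a common choice and is assumed here; it is not necessarily the preset algorithm of the embodiment. The function name white_balance is hypothetical.

```python
import numpy as np

def white_balance(raw_rgb, illuminant_rg_bg):
    """Apply per-channel gains derived from an estimated [R/G, B/G] illuminant.

    raw_rgb: array of shape (..., 3) holding R, G, B values.
    illuminant_rg_bg: pair (R/G, B/G) with the green channel normalized to 1.
    """
    r_over_g, b_over_g = illuminant_rg_bg
    # alpha, beta, gamma: gains for the R, G, and B channels (assumed form).
    gains = np.array([1.0 / r_over_g, 1.0, 1.0 / b_over_g])
    return np.asarray(raw_rgb, dtype=float) * gains
```

With these gains, a pixel whose color equals the illuminant color is mapped to equal R, G, and B values, that is, to a neutral gray.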
The user device 110 includes one or more devices configured to generate an output image. For example, the user device 110 may include a computing device (e.g., a desktop computer, a laptop computer, a tablet computer, a handheld computer, a smart speaker, a server, etc.), a mobile phone (e.g., a smart phone, a radiotelephone, etc.), a camera device, a wearable device (e.g., a pair of smart glasses or a smart watch), or a similar device.
The server 120 includes one or more devices configured to train a neural network for predicting the scene illumination using camera images to correct scene colors in the camera images. For example, the server 120 may be a server, a computing device, or the like. The server 120 may receive camera images from an external device (e.g., the user device 110 or another external device), train a neural network for predicting illumination parameters using the camera images, and provide the trained neural network to the user device 110 to permit the user device 110 to generate an output image using the neural network.
The network 130 includes one or more wired and/or wireless networks. For example, network 130 may include a cellular network (e.g., a fifth generation (5G) network, a long-term evolution (LTE) network, a third generation (3G) network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, or the like, and/or a combination of these or other types of networks.
The number and arrangement of devices and networks shown in
As shown in
The bus 210 includes a component that permits communication among the components of the device 200. The processor 220 is implemented in hardware, firmware, or a combination of hardware and software. The processor 220 is a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a microprocessor, a microcontroller, a digital signal processor (DSP), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or another type of processing component. The processor 220 includes one or more processors capable of being programmed to perform a function.
The memory 230 includes a random access memory (RAM), a read only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, and/or an optical memory) that stores information and/or instructions for use by the processor 220.
The storage component 240 stores information and/or software related to the operation and use of the device 200. For example, the storage component 240 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, and/or a solid state disk), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of non-transitory computer-readable medium, along with a corresponding drive.
The input component 250 includes a component that permits the device 200 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, and/or a microphone). The input component 250 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, and/or an actuator).
In particular, the input component 250 may include two or more cameras, including the first camera 111 and the second camera 112 illustrated in
The output component 260 includes a component that provides output information from the device 200 (e.g., a display, a speaker, and/or one or more light-emitting diodes (LEDs)).
The communication interface 270 includes a transceiver-like component (e.g., a transceiver and/or a separate receiver and transmitter) that enables the device 200 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. The communication interface 270 may permit device 200 to receive information from another device and/or provide information to another device. For example, the communication interface 270 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface, or the like.
The device 200 may perform one or more processes described herein. The device 200 may perform operations S110-S140 based on the processor 220 executing software instructions stored by a non-transitory computer-readable medium, such as the memory 230 and/or the storage component 240. A computer-readable medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
Software instructions may be read into the memory 230 and/or the storage component 240 from another computer-readable medium or from another device via the communication interface 270. When executed, software instructions stored in the memory 230 and/or storage component 240 may cause the processor 220 to perform one or more processes described herein.
Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software.
The neural network according to an embodiment is trained to predict the illuminant for the first camera 111 and the illuminant for the second camera 112 using the same color transforms, but for simplicity, the description of the training process in the present disclosure focuses on estimating the illuminant for the first camera 111.
As shown in
The first camera 111 and the second camera 112 may simultaneously capture a first raw-RGB image and a second raw-RGB image of the same scene, respectively, that provide different spectral measurements of the scene. The first raw-RGB image and the second raw-RGB image may have different views while capturing the same scene.
For the purposes of training the neural network, the first camera 111 and the second camera 112 may capture a color rendition chart as shown in
Hereinafter, the first raw-RGB image and the second raw-RGB image may be referred to as image 1 and image 2.
In operation S210, image 1 and image 2 are spatially aligned with each other, for example, using a global homography. For example, image 2 is cropped to have the same size of the field of view as image 1, and any one or any combination of transformation, rotation, and translation is applied to image 2 so that the same objects (e.g., the slide) in image 1 and image 2 are located at the same pixel coordinates.
In turn, the aligned image 1 and image 2 are down-sampled prior to computing color transformation between image 1 and image 2. The down-sampling may make the illumination estimation robust to any small misalignments and slight parallax in the two views of images 1 and 2. Since the hardware arrangement of the two cameras 111 and 112 does not change for a given device, the homography can be pre-computed and remains fixed for all image pairs from the same device.
In operation S220, a color transformation matrix is computed to map the down-sampled image 1 from the first camera 111 to the corresponding aligned and down-sampled image from the second camera 112. For example, the color transformation matrix may be computed based on Equations (1) and (2).
In operation S230, a neural network for estimating the illumination of the scene is constructed to have the structure shown in
In the training process, the neural network receives, as input, the parameters of the color transformation matrix, and outputs a two-dimensional (2D) chromaticity value that corresponds to the illumination estimation of the scene. The 2D chromaticity value may be represented as 2D [R/G B/G], indicating a ratio of a red color value to a green color value, and a ratio of a blue color value to the green color value.
Given a dataset of M image pairs $\mathcal{I} = \{(I_{11}, I_{21}), \ldots, (I_{1M}, I_{2M})\}$, the corresponding color transformations $T_1, \ldots, T_M$ between each pair of images are computed using Equation (2), as follows:
$$T = \{T_1, \ldots, T_M\}$$
(I11,I21) may denote image 1 and image 2, and T1 may denote color transformation between image 1 and image 2. The training process according to the embodiment is described using the pair of images 1 and 2, but a large number of paired images may be used for training the neural network. Augmented training images may be developed by applying mathematical transformation functions to camera captured images. The description of data augmentation will be provided later with reference to
In operation S240, a set of corresponding target ground truth illuminations L of image I1i (i.e., as measured by the first camera 111) is obtained from each pair of images as follows:
$$L = \{L_1, \ldots, L_M\}$$
L1 may denote a ground truth illumination of image 1. The ground truth illumination L1 may be obtained by extracting the image area of the neutral patches from image 1 and measuring pixel colors of the neutral patches since the neutral patches work as a good reflector of the scene illumination. For example, average pixel colors L1 [Ravg, Gavg, Bavg] inside the neutral patches may be used as the ground truth illumination L1 for image 1.
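The extraction of the ground truth illumination from the neutral patches may be sketched as follows. The sketch assumes that a boolean mask marking the neutral-patch pixels is already available (locating the color rendition chart is outside its scope), and it additionally converts the average color to the 2D [R/G B/G] form so that it is comparable with the output of the neural network. The function name ground_truth_illuminant and the mask argument are hypothetical.

```python
import numpy as np

def ground_truth_illuminant(image, patch_mask):
    """Average the neutral-patch pixels and return the [R/G, B/G] chromaticity.

    image: raw-RGB image of shape (H, W, 3).
    patch_mask: boolean array of shape (H, W), True inside the neutral patches
                (assumed to be supplied by a separate chart-detection step).
    """
    pixels = np.asarray(image, dtype=float)[patch_mask]  # shape (k, 3)
    r_avg, g_avg, b_avg = pixels.mean(axis=0)
    # Normalize by the green channel so that G is implicitly 1.
    return np.array([r_avg / g_avg, b_avg / g_avg])
```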
The neural network $f_\theta: T \rightarrow L$ is trained with parameters θ to model the mapping between the color transformations T and scene illuminations L. The neural network $f_\theta$ may predict the scene illumination L for the first camera 111 given the color transformation T between image 1 and image 2, as follows:
$$\hat{L} = f_\theta(T) \tag{3}$$
In operation S250, the neural network $f_\theta$ is trained to minimize the loss between the predicted illuminations $\hat{L}_i$ and the ground truth illuminations $L_i$ as follows:
The neural network according to an embodiment is lightweight, for example, consisting of a small number (e.g., 2, 5, or 16) of dense layers, wherein each layer has nine neurons only. The total number of parameters may range from 200 parameters for the 2-layer neural network up to 1460 parameters for the 16-layer neural network. The input to the neural network is the flattened nine values of the color transformation T and the output is two values corresponding to the illumination estimation in the 2D [R/G B/G] chromaticity color space where the green channel's value may be set to 1.
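The parameter counts above can be checked with a small sketch of such a network. The sketch assumes that the stated number of dense layers refers to hidden layers of nine neurons each, followed by a two-neuron output layer, which reproduces the counts of 200 and 1460. The ReLU activation, the random initialization, and the function names init_mlp, count_params, and predict_illuminant are assumptions made for illustration and are not taken from the disclosure.

```python
import numpy as np

def init_mlp(num_hidden_layers, width=9, in_dim=9, out_dim=2, seed=0):
    """Create (weight, bias) pairs for hidden layers plus one output layer."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim] + [width] * num_hidden_layers + [out_dim]
    return [
        (rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def count_params(params):
    """Total number of weights and biases in the network."""
    return sum(w.size + b.size for w, b in params)

def predict_illuminant(params, T):
    """Forward pass: flattened 3x3 transform in, 2D [R/G, B/G] estimate out."""
    x = np.asarray(T, dtype=float).reshape(-1)  # nine input values
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)  # ReLU is an assumed activation
    w, b = params[-1]
    return x @ w + b

if __name__ == "__main__":
    for n in (2, 5, 16):
        print(n, "hidden layers:", count_params(init_mlp(n)), "parameters")
```

Each nine-to-nine dense layer contributes 81 weights and 9 biases, and the output layer contributes 18 weights and 2 biases, so that a network with N hidden layers has 90N + 20 parameters.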
According to embodiments of the present disclosure, the user device 110 or the server 120 may use the neural network that has been trained by an external device without performing an additional training process on the user device 110 or the server 120, or alternatively may continue to train the neural network in real time on the user device 110 or the server 120.
Due to the difficulty in obtaining large datasets of image pairs captured with two cameras under the same illumination, a data augmentation process may be performed to increase the number of training samples and the generalizability of the model according to an example embodiment.
As shown in
Various methods may be used to re-illuminate an image which will be described with references to
As shown in
In order to re-illuminate the captured image I1 based on the color values of the captured image I2, the color rendition chart is extracted from each of the captured image I1 and the captured image I2. A color transformation matrix T is computed based on the color chart values of the captured image I1 and the color chart values of the captured image I2. The color transformation matrix T may convert the color chart values of the captured image I1 to the color chart values of the captured image I2.
The color transformation matrix T is applied to the captured image I1 to transform approximately all the colors in the captured image I1 and thereby to obtain the re-illuminated image I1′ which appears to be captured under illuminant L2.
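The chart-based re-illumination described above can be sketched as a least-squares fit followed by a per-pixel matrix multiply. This is a minimal illustration, assuming the chart values have already been extracted as 24×3 arrays with colors stored as row vectors; the function names `fit_color_transform` and `reilluminate` are hypothetical.

```python
import numpy as np

def fit_color_transform(src_chart, dst_chart):
    """Least-squares 3x3 matrix T such that src_chart @ T approximates dst_chart.

    src_chart, dst_chart: (n_patches, 3) arrays of color chart values.
    """
    T, *_ = np.linalg.lstsq(src_chart, dst_chart, rcond=None)
    return T

def reilluminate(image, T):
    """Apply T to every pixel of an (H, W, 3) image."""
    h, w, _ = image.shape
    return (image.reshape(-1, 3) @ T).reshape(h, w, 3)

# Synthetic example: chart values of I2 are an exact linear map of those of I1.
rng = np.random.default_rng(0)
chart_1 = rng.uniform(0.05, 1.0, size=(24, 3))
T_true = np.array([[1.2, 0.05, 0.0],
                   [0.0, 1.0, 0.1],
                   [0.02, 0.0, 0.7]])
chart_2 = chart_1 @ T_true

T = fit_color_transform(chart_1, chart_2)
image_1 = rng.uniform(0.0, 1.0, size=(8, 8, 3))
image_1_relit = reilluminate(image_1, T)
```

With real captures the fit is only approximate, which is consistent with the statement that the matrix transforms approximately all the colors in the image.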
While
In an example embodiment of the present disclosure, given a small dataset of raw-RGB image pairs captured with two cameras and including the color rendition charts, the color values of the color chart patches (e.g., the 24 color chart patches shown in
A color transformation $T^{C}_{1i \to 1j} \in \mathbb{R}^{3 \times 3}$ between each pair of images $(I_{1i}, I_{1j})$ from the first camera 111 is obtained based only on the color chart values from the two images, as follows:
$$T^{C}_{1i \to 1j} = (I_{1i}^{\top} I_{1i})^{-1} I_{1i}^{\top} I_{1j}$$
Similarly, the color transformation $T^{C}_{2i \to 2j}$ for image pairs $(I_{2i}, I_{2j})$ is obtained from the second camera 112 as follows:
$$T^{C}_{2i \to 2j} = (I_{2i}^{\top} I_{2i})^{-1} I_{2i}^{\top} I_{2j}$$
This bank of color transformations is applied to augment images by re-illuminating any given pair of images from the two cameras $(I_{1i}, I_{2i})$ to match their colors to any target pair of images $(I_{1j}, I_{2j})$, as follows:
$$I_{1i \to j} = I_{1i}\, T^{C}_{1i \to 1j}$$
$$I_{2i \to j} = I_{2i}\, T^{C}_{2i \to 2j}$$
where i→j means re-illuminating image i to match the colors of image j. Using this illuminant augmentation method, the number of training image pairs may be increased from M to $M^2$.
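The growth from M to M² training pairs can be illustrated with a short sketch that builds the bank of transformations for every ordered pair (i, j), including i = j, where the transformation reduces to the identity. The function names and the list-based data layout are assumptions made for illustration only.

```python
import numpy as np
from itertools import product

def chart_transform(chart_i, chart_j):
    """Normal-equations solution (Ci^T Ci)^-1 Ci^T Cj for (n_patches, 3) charts."""
    return np.linalg.solve(chart_i.T @ chart_i, chart_i.T @ chart_j)

def augment_pairs(images_cam1, images_cam2, charts_cam1, charts_cam2):
    """Re-illuminate every image pair i to match every target pair j."""
    M = len(images_cam1)
    augmented = []
    for i, j in product(range(M), repeat=2):
        T1 = chart_transform(charts_cam1[i], charts_cam1[j])
        T2 = chart_transform(charts_cam2[i], charts_cam2[j])
        # (H, W, 3) @ (3, 3) applies the matrix to each pixel's row vector.
        augmented.append((images_cam1[i] @ T1, images_cam2[i] @ T2))
    return augmented

# Synthetic data: M = 3 image pairs with 24-patch charts.
rng = np.random.default_rng(1)
M = 3
images_1 = [rng.uniform(size=(4, 4, 3)) for _ in range(M)]
images_2 = [rng.uniform(size=(4, 4, 3)) for _ in range(M)]
charts_1 = [rng.uniform(0.05, 1.0, size=(24, 3)) for _ in range(M)]
charts_2 = [rng.uniform(0.05, 1.0, size=(24, 3)) for _ in range(M)]

pairs = augment_pairs(images_1, images_2, charts_1, charts_2)
```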
According to the data augmentation process shown in
However, the data augmentation process is not limited to the method of using the color rendition charts as shown in
Referring to
The color transformation is applied to image I1 to change neutral color values of image I1 and thereby to obtain image I1′ which appears to be captured under the target illuminant L2[r2, g2, b2]. Image I1′ as well as image I1 may be used to train the neural network.
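One way to realize an illuminant-based re-illumination of this kind is a diagonal, per-channel scaling. The sketch below assumes that form, and also assumes the source illuminant of image I1 is known; the disclosure as reproduced here does not state the exact form of the transformation, so this is an illustrative stand-in rather than the described method. The function name `reilluminate_diagonal` is hypothetical.

```python
import numpy as np

def reilluminate_diagonal(image, source_illuminant, target_illuminant):
    """Scale each channel so neutral colors move from the source to the target.

    Assumption: a diagonal transform with gains [r2/r1, g2/g1, b2/b1].
    """
    src = np.asarray(source_illuminant, dtype=np.float64)
    dst = np.asarray(target_illuminant, dtype=np.float64)
    if np.any(src <= 0):
        raise ValueError("source illuminant must be strictly positive")
    gains = dst / src
    # Broadcasting multiplies the last (channel) axis by the three gains.
    return np.asarray(image, dtype=np.float64) * gains

L_source = np.array([0.5, 1.0, 0.8])   # illuminant of image I1 (assumed known)
L_target = np.array([0.9, 1.0, 0.4])   # target illuminant L2
image_1 = np.tile(L_source, (2, 2, 1))  # a fully neutral 2x2 test image
image_1_relit = reilluminate_diagonal(image_1, L_source, L_target)
```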
In an embodiment shown in
According to the embodiment shown in
When there are N cameras (wherein N>2), $\binom{N}{2}$ 3×3 color transformation matrices are constructed independently using the process described with reference to
The $\binom{N}{2}$ color transformation matrices are then concatenated and fed as input to the neural network. In particular, the feature vector that is input to the network is of the size of $9\binom{N}{2}$.
In detail, referring to
The raw-RGB image 1 and the raw-RGB image 2 are aligned with each other and down-sampled for calculation of a first color transformation between the down-sampled raw-RGB image 1 and the aligned and down-sampled raw-RGB image 2.
The raw-RGB image 1 and the raw-RGB image 3 are aligned with each other and down-sampled for calculation of a second color transformation between the down-sampled raw-RGB image 1 and the aligned and down-sampled raw-RGB image 3.
The raw-RGB image 2 and the raw-RGB image 3 are aligned with each other and down-sampled for calculation of a third color transformation between the down-sampled raw-RGB image 2 and the aligned and down-sampled raw-RGB image 3.
The first color transformation, the second color transformation, and the third color transformation are concatenated at a concatenation layer, and then are fed as input to a neural network for estimating the scene illumination.
Each of the first color transformation, the second color transformation, and the third color transformation may be a 3×3 matrix. The neural network may have an input layer having 27 nodes for receiving 27 parameters of the concatenated matrices, an output layer having 2 nodes for outputting a 2D chromaticity value for correcting color values of the raw-RGB image 1, and a set of hidden layers located between the input layer and the output layer.
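The construction of the concatenated input for three or more cameras can be sketched as follows. This is a minimal illustration, assuming the images have already been aligned and down-sampled into matching (n_pixels, 3) arrays, and assuming each transformation is fitted by least squares from the lower-indexed camera to the higher-indexed one; the name `pairwise_transform_features` is hypothetical.

```python
import numpy as np
from itertools import combinations

def pairwise_transform_features(pixel_sets):
    """Concatenate flattened 3x3 transforms for every camera pair.

    pixel_sets: list of (n_pixels, 3) arrays, one per camera, whose rows
                correspond to the same scene locations after alignment.
    Returns a vector of 9 values per camera pair.
    """
    features = []
    # For 3 cameras the pairs come out as (0, 1), (0, 2), (1, 2).
    for a, b in combinations(range(len(pixel_sets)), 2):
        T, *_ = np.linalg.lstsq(pixel_sets[a], pixel_sets[b], rcond=None)
        features.append(T.reshape(-1))
    return np.concatenate(features)

# Synthetic example with three cameras and 64 corresponding pixels each.
rng = np.random.default_rng(2)
cameras = [rng.uniform(0.05, 1.0, size=(64, 3)) for _ in range(3)]
x = pairwise_transform_features(cameras)  # 27 values for the input layer
```

For three cameras the vector has 27 entries, matching the 27-node input layer described above; a fourth camera would raise the count to six pairs and 54 entries.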
The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations.
As used herein, the term “component” is intended to be broadly construed as hardware, firmware, or a combination of hardware and software.
It will be apparent that systems and/or methods, described herein, may be implemented in different forms of hardware, firmware, or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the implementations. Thus, the operation and behavior of the systems and/or methods were described herein without reference to specific software code—it being understood that software and hardware may be designed to implement the systems and/or methods based on the description herein.
Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of possible implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set.
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items, and may be used interchangeably with “one or more.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, etc.), and may be used interchangeably with “one or more.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.
Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. For example, the expression, “at least one of a, b, and c,” should be understood as including only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or any variations of the aforementioned examples.
While such terms as “first,” “second,” etc., may be used to describe various elements, such elements must not be limited to the above terms. The above terms may be used only to distinguish one element from another.
This application is based on and claims priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application No. 63/114,079, filed on Nov. 16, 2020, and U.S. Provisional Patent Application No. 63/186,346, filed on May 10, 2021, in the U.S. Patent & Trademark Office, the disclosures of which are incorporated herein by reference in their entireties.
Number | Date | Country
---|---|---
63186346 | May 2021 | US
63114079 | Nov 2020 | US