The present invention relates to an apparatus and method for detecting an object automatically and estimating depth information of an image captured by an imaging device having a multiple color-filter aperture, and more particularly, to an apparatus and method for detecting an object region automatically and estimating depth information from an image captured by an imaging device having an aperture having a plurality of color filters of different colors installed therein, that is, a multiple color-filter aperture (MCA).
Much research has been conducted on a method of estimating three-dimensional depth information, which is used in a variety of fields, such as robot vision, human computer interface, intelligent visual surveillance, 3D image acquisition, intelligent driver assistant system, and so on.
Most conventional methods for 3D depth information estimation, such as stereo vision, depend on a plurality of images. Stereo matching is a method of estimating a depth by using binocular disparity that occurs in images obtained by two cameras. This method has many advantages, but has a fundamental limitation in that a pair of images of the same scene, obtained by two cameras, is needed.
Research is also being conducted on a monocular method as an alternative to the binocular-disparity method. As an example, the depth from defocus (DFD) method is a single camera-based depth estimation method, which estimates a degree of defocus blur by using a pair of images with different focuses that are captured from the same scene. However, the method has a limitation in that a fixed camera view is needed to capture a plurality of defocused images.
Thus, much research has been conducted on a method of estimating a depth by using one image, not a plurality of images.
Recently, a computational camera has been developed to obtain new information that could not be obtained by an existing digital camera, and thus may provide new functionality to a consumer video device. The computational camera generates a final image by using a new combination of optics and computation, and allows new image functions that cannot be achieved by an existing camera, such as an enhanced field of view, an increased spectral resolution, and an enlarged dynamic range.
Meanwhile, a color shift model using a multiple color-filter aperture (MCA) may provide depth information about objects positioned at different distances from a camera according to a relative direction and amount of shift between color channels of an image. However, existing MCA-based depth information estimation methods need a process of manually selecting an object region in an image in advance, in order to estimate object depth information.
The present invention is directed to providing an apparatus and method for detecting an object automatically and estimating depth information of an image captured by an imaging device having a multiple color-filter aperture, which can automatically detect an object from an image having a focus restored due to a shift characteristic of a color channel and estimate depth information on the detected object.
The present invention is also directed to providing a computer-readable recording medium storing a program for executing a method for automatically detecting an object and estimating depth information of an image captured by an imaging device having a multiple color-filter aperture, which can automatically detect an object from an image having a focus restored due to a shift characteristic of a color channel and estimate depth information on the detected object.
One aspect of the present invention provides an apparatus for automatically detecting an object of an image captured by an imaging device having a multiple color-filter aperture, the automatic object detection apparatus including: a background generation unit configured to detect a movement from a current image frame among a plurality of continuous image frames captured by an imaging device having different color filters installed in a plurality of openings formed in an aperture, to generate a background image frame corresponding to the current image frame; and an object detection unit configured to detect an object region included in the current image frame based on a difference between a plurality of color channels of the current image frame and the corresponding color channels of the background image frame.
Another aspect of the present invention provides a method of automatically detecting an object of an image captured by an imaging device having a multiple color-filter aperture, the automatic object detection method including: a background generation step of detecting a movement from a current image frame among a plurality of continuous image frames captured by an imaging device having different color filters installed in a plurality of openings formed in an aperture, to generate a background image frame corresponding to the current image frame; and an object detection step of detecting an object region included in the current image frame based on a difference between a plurality of color channels of the current image frame and the corresponding color channels of the background image frame.
Still another aspect of the present invention provides an apparatus for estimating depth information of an image captured by an imaging device having a multiple color-filter aperture, the depth information estimation apparatus including: a color shift vector calculation unit configured to calculate a color shift vector indicating a degree of color channel shift in an edge region extracted from color channels of an input image captured by an imaging device having different color filters installed in a plurality of openings formed in an aperture; and a depth map estimation unit configured to estimate a sparse depth map for the edge region by using a value of the calculated color shift vector, and interpolate depth information on a remaining region other than the edge region of the input image based on the sparse depth map to estimate a full depth map for the input image.
Yet another aspect of the present invention provides a method of estimating depth information of an image captured by an imaging device having a multiple color-filter aperture, the depth information estimation method including: calculating a color shift vector indicating a degree of color channel shift in an edge region extracted from color channels of an input image captured by an imaging device having different color filters installed in a plurality of openings formed in an aperture; estimating a sparse depth map for the edge region by using a value of the calculated color shift vector; and interpolating depth information on a remaining region other than the edge region of the input image based on the sparse depth map to estimate a full depth map for the input image.
With the apparatus and method for detecting an object automatically and estimating depth information of an image captured by an imaging device having a multiple color-filter aperture (MCA), it is possible to automatically detect an object by means of a repetitively updated background image frame, and to detect the object accurately by detecting it separately for each color channel in consideration of a property of the MCA camera. It is also possible to estimate the actual depth from the camera to the object by using the property that different color shift vectors are obtained according to the position of the object.
It is also possible to estimate a full depth map from one image captured by an imaging device having a multiple color-filter aperture (MCA) and to improve quality of the image by removing color-mismatching of the image by using the estimated full depth map. It is also possible to convert a 2D image into a 3D image by using the estimated full depth map.
An apparatus and method for detecting an object automatically and estimating depth information of an image captured by an imaging device having a multiple color-filter aperture according to a preferred embodiment of the present invention will be described below with reference to the accompanying drawings.
In order to describe a detailed configuration and operation of the present invention, a principle of an imaging device (hereinafter, referred to as an MCA camera) having a multiple color-filter aperture according to an embodiment of the present invention will be described, and then an operation of the present invention will be described in detail on an element-by-element basis.
Light forms an image at different positions of a camera sensor through the color filters installed in the respective openings, according to a distance between a lens and an object. When the object is positioned away from the focal distance of the camera, color deviation occurs in the obtained image.
When a center of openings of a general camera is aligned with an optical axis of a lens, a convergence pattern of an image plane forms a point or a circular region depending on a distance to a subject, as shown in a portion (a) of
The present invention has a configuration for automatically detecting an object from an image by using color deviation that occurs in an image captured by the MCA camera and also estimating information on a depth from the MCA camera to the object on the basis of a degree of color deviation.
Referring to
The background generation unit 110 detects a movement from a current image frame among a plurality of continuous image frames that are captured by the MCA camera and generates a background image frame corresponding to the current image frame. That is, the automatic object detection apparatus 100 according to an embodiment of the present invention may generate a background and detect an object in real time for each image frame of a video image configured of a plurality of continuous image frames.
The background generation unit 110 may estimate movement of a current image frame by using an optical flow in order to generate a background image frame corresponding to the current image frame. Optical flow information corresponding to respective pixels of the current image frame may be obtained from a relation between the current image frame and a previous image frame before the current image frame, as expressed in Equation 1 below.
$$D(x,y) = \sum_{i=x-w}^{x+w} \sum_{j=y-w}^{y+w} \left( f_t(i,j) - f_{t-1}(i+d_x,\, j+d_y) \right)^2 \qquad \text{[Equation 1]}$$
where $D(x,y)$ is the optical flow information corresponding to pixel $(x,y)$ of the current image frame, $f_t$ is the current image frame, $f_{t-1}$ is the previous image frame, and $(d_x,d_y)$ is the value minimizing $D(x,y)$ and indicates the shift of pixel $(x,y)$. In Equation 1, the size of the search region is set to $(2w+1) \times (2w+1)$.
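As a minimal illustration of Equation 1, the block-matching search could be sketched as follows in Python; the function name, the search radius r, and the assumption that (x, y) lies far enough from the image border are illustrative assumptions rather than details from the specification:

```python
import numpy as np

def optical_flow_cost(f_t, f_prev, x, y, w=2, r=4):
    """Equation 1: find the shift (dx, dy) of pixel (x, y) that minimizes
    the sum of squared differences over a (2w+1) x (2w+1) window,
    searching displacements in [-r, r]. Assumes (x, y) lies at least
    w + r pixels from every image border."""
    window_t = f_t[x - w:x + w + 1, y - w:y + w + 1].astype(np.float64)
    best_shift, best_cost = (0, 0), np.inf
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            window_p = f_prev[x + dx - w:x + dx + w + 1,
                              y + dy - w:y + dy + w + 1]
            cost = np.sum((window_t - window_p) ** 2)  # D(x, y) for this shift
            if cost < best_cost:
                best_cost, best_shift = cost, (dx, dy)
    return best_shift, best_cost
```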
If a value of the optical flow information D(x,y) in the pixel (x,y) of the current image frame is less than a predetermined Euclidean distance threshold, the corresponding pixel is determined to be included in the background. The background generation unit 110 updates a background image frame generated corresponding to the previous image frame, as expressed in Equation 2 below, by using pixels of the current image frame that are determined to be included in the background.
$$f_B^t(x,y) = (1-\alpha)\, f_t(x,y) + \alpha\, f_B^{t-1}(x,y) \qquad \text{[Equation 2]}$$
where $f_B^t$ and $f_B^{t-1}$ are the background image frames corresponding to the current and previous image frames, respectively, and $\alpha$ is a predetermined mixing ratio in the range $[0,1]$.
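A short sketch of the background classification and the update of Equation 2 might look as follows; the threshold tau and the mixing ratio alpha are illustrative values, not ones prescribed by the specification:

```python
import numpy as np

def update_background(f_t, f_bg_prev, D, tau=100.0, alpha=0.9):
    """Blend current-frame pixels judged to be background (optical-flow
    cost D(x, y) below threshold tau) into the background model via
    Equation 2: f_B^t = (1 - alpha) * f_t + alpha * f_B^(t-1).
    Pixels judged to be moving keep their previous background value."""
    is_bg = D < tau
    f_bg = f_bg_prev.astype(np.float64).copy()
    f_bg[is_bg] = (1 - alpha) * f_t[is_bg] + alpha * f_bg_prev[is_bg]
    return f_bg
```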
The object detection unit 120 detects an object region included in the current image frame on the basis of the difference between the current image frame and the background image frame of the current image frame. Conventional methods calculate only the difference between whole image frames to detect an object. In contrast, the object detection unit 120 of the automatic object detection apparatus 100 according to an embodiment of the present invention detects an object region for each color channel of the current image frame by calculating the difference between each of the plurality of color channels constituting the current image frame and the corresponding color channel of the background image frame.
For example, calculating the difference between the R channel of the current image frame and the R channel of the background image frame yields an object region corresponding to the R channel of the current image frame, and object regions corresponding to the G and B channels are obtained by the same process, respectively. By detecting an object region for each color channel of an image frame, as shown in
Specifically, the object detection unit 120 may detect an object region from the current image frame, as expressed in Equation 3 below.
where $f_O^c$ is a binary image corresponding to a color channel $c$ of the current image frame; the pixels having a value of 1 in $f_O^c$ represent the object region detected from the corresponding color channel.
After detecting the object regions, the object detection unit 120 may additionally remove noise around the object regions detected in each color channel by using a morphological filter.
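Since Equation 3 itself is not reproduced above, the following sketch shows one plausible reading of the per-channel detection: thresholded background differencing for each color channel, followed by a morphological opening for noise removal. The threshold, the 3×3 structuring element, and the function name are assumptions:

```python
import numpy as np
from scipy.ndimage import binary_opening

def detect_object_per_channel(f_t, f_bg, thresh=30):
    """Per-channel background differencing in the spirit of Equation 3:
    for each color channel c, mark the pixels whose absolute difference
    from the background exceeds a threshold (the binary map f_O^c),
    then suppress small noise blobs with a morphological opening."""
    masks = {}
    for c, name in enumerate(("R", "G", "B")):
        diff = np.abs(f_t[..., c].astype(np.int32) - f_bg[..., c].astype(np.int32))
        mask = diff > thresh
        masks[name] = binary_opening(mask, structure=np.ones((3, 3)))
    return masks
```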
As shown in the portion (a) of
Meanwhile, the automatic object detection apparatus 100 according to an embodiment of the present invention may estimate depth information, which is information about a distance from the MCA camera to the object corresponding to the object region, by using a degree of color shift that is included in an object region detected by the object detection unit 120.
In order to estimate the object depth information, a channel alignment process should be performed on an object region where deviation occurs between color channels as described above. The color channel alignment process may be performed by estimating color shift vectors (CSVs) that indicate information on directions and distances of other color channels (for example, a channel R and a channel B) with respect to a specific color channel (for example, a channel G).
Specifically, color shift vectors of the channel R and the channel B with respect to the channel G in an i-th object region of a plurality of object regions are expressed as Equation 4 below.
$$f_G(x,y) = f_B(x + \Delta x_{GB},\; y + \Delta y_{GB})$$
$$f_G(x,y) = f_R(x + \Delta x_{GR},\; y + \Delta y_{GR}) \qquad \text{[Equation 4]}$$
where $(\Delta x_{GB}, \Delta y_{GB})$ and $(\Delta x_{GR}, \Delta y_{GR})$ indicate the color shift vector for the GB channel pair (channel G and channel B) and the color shift vector for the GR channel pair (channel G and channel R), respectively. The two color shift vectors expressed in Equation 4 have the relation expressed in Equation 5, because of a property of the MCA camera as shown in the portion (a) of
In this case, the color shift vectors $(\Delta x_{GB}, \Delta y_{GB})$ and $(\Delta x_{GR}, \Delta y_{GR})$ may be estimated by minimizing the quadratic error functions of Equation 6.
where $E_{GB}$ is the error function corresponding to the color shift vector of the GB channel pair, $E_{GR}$ is the error function corresponding to the color shift vector of the GR channel pair, and $\Omega$ is the object region. Referring to Equation 6, the error function corresponding to the color shift vector of the GR channel pair may be expressed using the color shift vector of the GB channel pair, by virtue of the relation between the color shift vectors described above.
As a result, the error function of Equation 6 is a nonlinear function of $(\Delta x_{GB}, \Delta y_{GB})$, and thus an iterative method such as the Newton-Raphson algorithm may be used to find the $(\Delta x_{GB}, \Delta y_{GB})$ that minimizes Equation 6.
A linear Taylor-series approximation of the error functions of Equation 6 may be expressed as Equation 7 below.
where, for a color channel $c \in \{R, B\}$, $f_t^{Gc}(x,y) = f_G(x,y) - f_c(x,y)$, and $f_x^{Gc}(\cdot)$ and $f_y^{Gc}(\cdot)$ are the horizontal and vertical derivatives of $\tfrac{1}{2}\{f_G(x,y) + f_c(x,y)\}$, respectively.
The estimated error is represented in the form of a vector, as expressed in Equation 8 below.
Since $E(v)$ is a quadratic function of the vector $v$, the $v$ that minimizes the error may be obtained by finding the value at which the derivative of the error function with respect to $v$ is zero, as expressed in Equation 9.
Since Equation 9 is a linear equation, the vector v may be obtained as expressed in Equation 10 below.
$$v = \left[\sum_{x,y \in \Omega} C\, C^T\right]^{-1} \left[\sum_{x,y \in \Omega} C\, S\right] \qquad \text{[Equation 10]}$$
where $C = (c_{GB}, c_{GR})$ and $S = (s_{GB}, s_{GR})^T$. If the size of the detected object region is sufficiently large and the image contains sufficient content, the matrix $\sum_{x,y \in \Omega} C\,C^T$ in Equation 10 has an inverse.
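Because Equations 5 to 9 are not reproduced above, the sketch below shows only the general shape of the linearized estimate for a single channel pair: a Lucas-Kanade-style 2×2 normal-equation solve built from the definitions given for Equation 7 ($f_t^{Gc} = f_G - f_c$ and the derivatives of $(f_G + f_c)/2$). The single-solve (non-iterated) form and the function name are simplifying assumptions; the full method couples the GB and GR pairs and may iterate in Newton-Raphson fashion:

```python
import numpy as np

def estimate_shift_gc(f_G, f_c, mask):
    """One linearized shift estimate between channel G and channel c
    over the object region `mask`, using f_t = f_G - f_c and the
    spatial derivatives of (f_G + f_c) / 2, as described for
    Equation 7. Solves 2x2 normal equations analogous to Equation 10
    for the shift (dx, dy)."""
    avg = 0.5 * (f_G.astype(np.float64) + f_c.astype(np.float64))
    fy, fx = np.gradient(avg)          # vertical (rows), horizontal (cols)
    ft = f_G.astype(np.float64) - f_c.astype(np.float64)
    m = mask.astype(np.float64)
    A = np.array([[np.sum(fx * fx * m), np.sum(fx * fy * m)],
                  [np.sum(fx * fy * m), np.sum(fy * fy * m)]])
    b = np.array([np.sum(fx * ft * m), np.sum(fy * ft * m)])
    return np.linalg.solve(A, b)       # (dx, dy) color shift estimate
```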
Equation 10 may be further simplified based on a characteristic of the MCA camera. If the channel G and the channel B lie on the same horizontal axis, $\Delta y_{GB}$, the vertical component of the color shift vector, is equal to zero. Accordingly, the vector $v$ may be represented with the single parameter $\Delta x_{GB}$ by using triangle properties and the angle between the color filters of the aperture, as shown in a portion (b) of
Each of the numerator and the denominator of Equation 11 is a 1×1 matrix, so the final shift vector $v$, which combines the color shift vectors estimated for the respective color channels, may be estimated without computing an inverse matrix.
The automatic object detection apparatus 100 according to an embodiment of the present invention may further include the depth information estimation unit 140 for estimating information on an absolute depth from the MCA camera to the object. The depth information estimation unit 140 estimates information on a depth between the MCA camera and the object included in the object region on the basis of magnitude information of the final shift vector v.
Specifically, a conversion function indicating the relation between the distance to the object and the amount of color channel shift, that is, the magnitude of the shift vector, may be predetermined. The conversion function may be obtained by positioning an object at known distances from the MCA camera, repetitively capturing the same scene including the object for each position, and estimating the color shift vector at each position.
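As an illustration, such a predetermined conversion function could be stored as a calibration table and evaluated by interpolation; the use of linear interpolation and the sample values below are assumptions, since the specification only requires that the function be predetermined:

```python
import numpy as np

def shift_to_distance(shift_magnitude, calib_shifts, calib_distances):
    """Map the magnitude of a final shift vector to an object distance
    using a calibration table obtained as described above (objects
    placed at known distances, a color shift vector estimated for each
    position). calib_shifts must be sorted in increasing order."""
    return np.interp(shift_magnitude, calib_shifts, calib_distances)

# Hypothetical calibration samples: shift magnitude (pixels) vs. distance (m).
shifts = np.array([0.5, 1.2, 2.0, 3.5])
dists = np.array([5.0, 3.0, 2.0, 1.0])
print(shift_to_distance(1.6, shifts, dists))  # interpolated distance
```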
Referring to the portion (a) of
When a graph has been established as shown in the portion (b) of
Referring to
Further, the color shift vector estimation unit 130 estimates color shift vectors each indicating a shift direction and a distance between the object regions detected from color channels of the current image frame, and calculates a final shift vector corresponding to an object region by combining the color shift vectors estimated corresponding to color channels in operation S1030.
The depth information estimation unit 140 may estimate information on a depth to an object included in the object region on the basis of magnitude information of the final shift vector in operation S1040. As described above, it is preferable that a conversion function between the magnitude information of the shift vector and the distance information be predetermined.
Referring to
The image capture unit 210 includes a capture module (not shown) and captures a surrounding scene to obtain an image. The capture module includes an aperture (not shown), a lens unit (not shown), and an imaging device (not shown). The aperture is disposed in the lens unit, and configured to include a plurality of openings (not shown) and adjust an amount of light incident on the lens unit according to a degree of openness of the openings. The openings respectively include a red color filter, a green color filter, and a blue color filter. The capture module measures depth information of objects positioned at different distances and performs multi-focusing by using a multiple color-filter aperture (MCA). Since the multi-focusing has been described above with reference to the accompanying drawings, a detailed description thereof is omitted here.
The color shift vector calculation unit 230 calculates a color shift vector indicating a degree of color channel shift in an edge region that is extracted from a color channel of the image received from the image capture unit 210.
For example, the color shift vector calculation unit 230 calculates the color shift vectors of the green color channel and the blue color channel with respect to the red color channel in an edge region extracted from the color channels of the input image, by using a normalized cross correlation (NCC) combined with a color shifting mask map (CSMM), as expressed in Equation 12 below. Alternatively, the color shift vectors of the other color channels may be calculated with respect to the green color channel or the blue color channel among the three color channels.
$$CSV(x,y) = \arg\max_{(u,v)} \; C_N(u,v)\, CSMM(u,v) \qquad \text{[Equation 12]}$$
where $CSV(x,y)$ is the color shift vector estimated at $(x,y)$, $C_N(u,v)$ is the value obtained by the normalized cross correlation (NCC), and $CSMM(u,v)$ is the color shifting mask map (CSMM), which is predetermined based on the color shifting property of the multiple color-filter aperture (MCA), in which a color channel is shifted in a predetermined form.
Specifically, the normalized cross correlation (NCC) is expressed in Equation 13 below, which allows fast block matching to be performed.
where $f_1(x,y)$ is a block in the red color channel, and $f_2(x,y)$ is a block in the green or blue color channel. The NCC of Equation 13 may be evaluated efficiently by using a fast Fourier transform (FFT).
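A brute-force sketch of Equations 12 and 13 follows, with the CSMM applied as a multiplicative mask over candidate shifts (one plausible reading of "combined"); the block size, the search radius, the (2r+1)×(2r+1) CSMM layout, and the omission of the FFT acceleration are all simplifying assumptions:

```python
import numpy as np

def csv_at_pixel(f1, f2, x, y, csmm, half=7, r=10):
    """Equation 12 at a single edge pixel: evaluate the NCC of
    Equation 13 between a block of the reference channel f1 and
    shifted blocks of channel f2, weight each candidate shift by
    CSMM(u, v), and return the arg-max shift as the color shift
    vector. Assumes (x, y) is at least half + r pixels from the
    image border and csmm has shape (2r+1, 2r+1)."""
    b1 = f1[x - half:x + half + 1, y - half:y + half + 1].astype(np.float64)
    n1 = np.sqrt(np.sum(b1 * b1))
    best_shift, best_score = (0, 0), -np.inf
    for u in range(-r, r + 1):
        for v in range(-r, r + 1):
            if csmm[u + r, v + r] == 0:   # shift pattern ruled out a priori
                continue
            b2 = f2[x + u - half:x + u + half + 1,
                    y + v - half:y + v + half + 1].astype(np.float64)
            ncc = np.sum(b1 * b2) / (n1 * np.sqrt(np.sum(b2 * b2)) + 1e-12)
            score = ncc * csmm[u + r, v + r]
            if score > best_score:
                best_score, best_shift = score, (u, v)
    return best_shift
```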
Errors in the disparity estimated by the edge-based NCC, which arise from erroneously detected edges and from different intensity levels between color channels, may be reduced by enforcing the color shifting property of the multiple color-filter aperture (MCA) through the color shifting mask map (CSMM). That is, the disparity may be estimated accurately by applying an a priori constraint to the feasible pattern of color shift vectors (CSVs).
The color shift vector calculation unit 230 selects, as the color shift vector for the input image, the one having the higher matching ratio between the two calculated color shift vectors.
The depth map estimation unit 250 estimates a sparse depth map for the input image, as expressed in Equation 14 below, by using the color shift vector (CSV) for the input image that is estimated by the color shift vector calculation unit 230.
$$D(x,y) = -\operatorname{sign}(v) \times \sqrt{u^2 + v^2} \qquad \text{[Equation 14]}$$
where $(u,v)$ is the color shift vector estimated at $(x,y)$, and $\operatorname{sign}(v)$ is the sign of $v$.
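In code, Equation 14 is a direct mapping from a color shift vector to a signed depth value; the vectorized form below is a trivial sketch:

```python
import numpy as np

def sparse_depth(u, v):
    """Equation 14: signed depth from a color shift vector (u, v).
    The sign of v distinguishes the two shift directions produced by
    the MCA (e.g., objects nearer versus farther than the focal
    plane); u and v may be arrays covering all edge pixels."""
    return -np.sign(v) * np.sqrt(u * u + v * v)
```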
The depth map estimation unit 250 estimates a full depth map for the input image from the sparse depth map that is estimated using the color shift vector (CSV), by using a depth interpolation method. That is, in order to generate a full depth map from the sparse depth map detected in the edge region, the depth map estimation unit 250 fills in the remaining portion of the image by using the matting Laplacian method.
Specifically, the depth interpolation is performed by minimizing an energy function as expressed in Equation 15 below.
$$E(d) = d^T L\, d + \lambda\, (d - \hat{d})^T A\, (d - \hat{d}) \qquad \text{[Equation 15]}$$
where $d$ is the full depth map, $\hat{d}$ is the sparse depth map, $L$ is the matting Laplacian matrix, $A$ is a diagonal matrix in which $A_{ii}$ equals 1 if the $i$-th pixel is on an edge and 0 otherwise, and $\lambda$ is a constant that controls the trade-off between the smoothness of the interpolation and fidelity to the sparse depth map.
The matting Laplacian matrix $L$ is defined as expressed in Equation 16 below.

$$L_{ij} = \sum_{k \,:\, (i,j) \in w_k} \left( \delta_{ij} - \frac{1}{|w_k|} \left( 1 + (I_i - \mu_k)^T \left( \Sigma_k + \frac{\varepsilon}{|w_k|} U \right)^{-1} (I_j - \mu_k) \right) \right) \qquad \text{[Equation 16]}$$
where $\delta_{ij}$ is the Kronecker delta, $U$ is the 3×3 identity matrix, $\mu_k$ is the mean of the colors in window $w_k$, $\Sigma_k$ is the covariance matrix of the colors in window $w_k$, $I_i$ and $I_j$ are the colors of the input image $I$ at pixels $i$ and $j$, respectively, $\varepsilon$ is a regularization parameter, and $|w_k|$ is the number of pixels in window $w_k$.
The full depth map is obtained as expressed in Equation 17 below.
$$d = (L + \lambda A)^{-1}\, \lambda\, \hat{d} \qquad \text{[Equation 17]}$$
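Given a precomputed matting Laplacian (Equation 16), the interpolation reduces to a single sparse linear solve. The sketch below assumes L is available as a scipy sparse matrix and uses the exact minimizer of Equation 15, $(L + \lambda A)^{-1} \lambda A \hat{d}$, which coincides with Equation 17 because the sparse depth map is zero off the edge set; the value of lam is illustrative:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def interpolate_depth(L, d_sparse, edge_mask, lam=0.1):
    """Equations 15-17: recover the full depth map d from the sparse
    map d_hat by solving (L + lam * A) d = lam * A * d_hat, where A is
    the diagonal edge-indicator matrix. L (n x n, n = number of
    pixels) is assumed to be precomputed."""
    A = sp.diags(edge_mask.ravel().astype(np.float64))
    rhs = lam * (A @ d_sparse.ravel().astype(np.float64))
    d = spsolve((L + lam * A).tocsc(), rhs)
    return d.reshape(edge_mask.shape)
```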
The image correction unit 270 corrects the input image to a color-matched image by shifting the color channels of the input image using the full depth map estimated by the depth map estimation unit 250. Thus, it is possible to improve image quality by correcting a color-mismatched image using the full depth map for the input image. The image correction unit 270 may also convert the input image into a 3D image by using the full depth map.
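As a simplified illustration of the channel-shift correction, for a single depth layer rather than the per-pixel warp a full implementation would perform, the R and B channels can be rolled back by their estimated disparities so that they align with the G channel; the integer global shifts and the RGB channel ordering are assumptions:

```python
import numpy as np

def align_channels(img, dx_gb, dx_gr):
    """Shift the B and R channels of an RGB image horizontally by the
    negatives of their estimated disparities relative to G, producing
    a color-matched result for a scene at a single depth. A full
    implementation would drive a per-pixel shift from the full depth
    map instead of one global integer shift."""
    out = img.copy()
    out[..., 2] = np.roll(img[..., 2], -int(dx_gb), axis=1)  # blue
    out[..., 0] = np.roll(img[..., 0], -int(dx_gr), axis=1)  # red
    return out
```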
The image storage unit 290 stores the image corrected by the image correction unit 270 and a corresponding full depth map.
The depth information estimation apparatus 200 according to an embodiment of the present invention calculates a color shift vector from an edge extracted from a color channel of an input image captured by an MCA camera in operation S1110. That is, the depth information estimation apparatus 200 according to an embodiment of the present invention calculates the color shift vector from an edge extracted from the color channel of the input image with respect to a red color channel by using a normalized cross correlation (NCC) combined with a color shifting mask map (CSMM).
Subsequently, the depth information estimation apparatus 200 according to an embodiment of the present invention estimates a sparse depth map for the input image by using the color shift vector in operation S1120. That is, the depth information estimation apparatus 200 according to an embodiment of the present invention estimates the sparse depth map from the color shift vector as expressed in Equation 14 above.
Subsequently, the depth information estimation apparatus 200 according to an embodiment of the present invention estimates a full depth map from the sparse depth map by using the depth interpolation method in operation S1130. That is, the depth information estimation apparatus 200 according to an embodiment of the present invention estimates the full depth map by filling a remaining portion of an image by using a matting Laplacian method, in order to generate the full depth map using the sparse depth map detected in the edge region.
Subsequently, the depth information estimation apparatus 200 corrects the input image by using the estimated full depth map in operation S1140. For example, the depth information estimation apparatus 200 corrects the input image to a color-matched image by shifting the color channels of the input image using the full depth map. The depth information estimation apparatus 200 may also convert the input image into a 3D image by using the full depth map.
The invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices that store data which can thereafter be read by a computer system. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage. Further, the computer-readable recording medium may be implemented in the form of a carrier wave, such as Internet transmission. Also, the computer-readable recording medium may be distributed over computer devices connected through wired/wireless communication networks so that the computer-readable code is stored and executed in a distributed fashion.
While the present invention has been particularly shown and described with reference to preferred embodiments thereof, it should not be construed as being limited to the embodiments set forth herein. It will be understood by those skilled in the art that various changes in form and details may be made to the described embodiments without departing from the spirit and scope of the present invention as defined by the following claims.
Number | Date | Country | Kind
---|---|---|---
10-2012-0017438 | Feb 2012 | KR | national
10-2012-0042770 | Apr 2012 | KR | national
The present application is a continuation application of U.S. application Ser. No. 14/376,770, filed on Aug. 5, 2014, and the present application claims priority to Korean applications KR 10-2012-0017438, filed on Feb. 21, 2012, and KR 10-2012-0042770, filed on Apr. 24, 2012, and international application PCT/KR2012/009308, filed on Nov. 7, 2012, which is incorporated herein by reference in its entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | 14376770 | Aug 2014 | US
Child | 15805812 | | US