APPARATUS AND METHOD FOR GENERATING DEPTH MAPS FROM RAW DUAL PIXEL SENSOR DATA

Information

  • Patent Application
  • Publication Number
    20250157062
  • Date Filed
    February 10, 2023
  • Date Published
    May 15, 2025
Abstract
An image processing apparatus and method for generating a sub-pixel alignment estimate from dual-pixel sensor data are provided. The method includes parametrically fitting at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split, generating at least one bilinear measure on the at least two signals' fitting parameters, determining an alignment confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters, and determining the sub-pixel alignment estimate based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters.
Description
BACKGROUND
Field

The present disclosure relates to an image processing technique used with a dual-pixel sensor.


Description of Related Art

Dual pixel sensors have become common in modern cameras due to their ability to perform fast focusing by examining the differential signals between the dual pixels at regions of interest, such as the eyes of people or animals or other salient objects in a scene. Additionally, dual pixel raw data can provide information about focus that relates to the depth of objects relative to the focus depth. This is especially true when a lens with a large aperture allows the incoming light field to follow different paths when entering the dual pixels. Slight variations in the signals' positioning on the sensor from the differing paths can account for changes in depth for different parts of the scene. In order to recover depth, sub-pixel alignment techniques may be used to estimate sub-pixel misalignments of the two paths of light onto the sensor.


An exemplary approach to sub-pixel alignment may involve taking a few pixels along the direction of the dual pixel separation from both dual pixel images, fitting the signals to a parametric model, interpolating the parametric model to some sub-pixel resolution, shifting one interpolated signal with respect to the other, and determining the shift that minimizes the error between the two signals. One drawback of this exemplary processing is the increased computation power it requires; a system and method according to the present disclosure remedy this drawback.
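For contrast, the discrete search just described can be sketched as follows; the signal, search range, and step size are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

# Related-art sketch: interpolate one signal, slide it over a grid of
# candidate sub-pixel shifts, and keep the shift with the least squared error.
def discrete_shift_search(left, right, max_shift=1.0, step=0.05):
    x = np.arange(len(left), dtype=float)
    best_shift, best_err = 0.0, np.inf
    for delta in np.arange(-max_shift, max_shift + step / 2, step):
        # np.interp evaluates a piecewise-linear model of `left` shifted by delta
        shifted = np.interp(x, x + delta, left)
        err = float(np.sum((shifted - right) ** 2))
        if err < best_err:
            best_shift, best_err = delta, err
    return best_shift

# Toy dual-pixel line signals: `left` is `right` displaced by about 0.3 pixels.
x = np.arange(9, dtype=float)
right = np.cos(0.5 * x)
left = np.cos(0.5 * (x + 0.3))
estimate = discrete_shift_search(left, right)  # near 0.3, up to interpolation error
```

Note that every candidate shift requires a full interpolation and error evaluation, which is the computational drawback the disclosure addresses.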


SUMMARY

An image processing apparatus and method for generating a sub-pixel alignment estimate from dual-pixel sensor data are provided. The method includes parametrically fitting at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split, generating at least one bilinear measure on the at least two signals' fitting parameters, determining an alignment confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters, and determining the sub-pixel alignment estimate based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters.


These and other objects, features, and advantages of the present disclosure will become apparent upon reading the following detailed description of exemplary embodiments of the present disclosure, when taken in conjunction with the appended drawings, and provided claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates the hardware architecture of the present disclosure.



FIG. 2 illustrates an algorithm according to the present disclosure.



FIGS. 3A-3C illustrate observed signals and processing performed thereon according to the present disclosure.



FIGS. 4A-4D illustrate observed signals and processing performed thereon according to the present disclosure.



FIG. 5 is a graph illustrating alignment error of two signals according to the present disclosure.



FIG. 6 illustrates an exemplary pixel array according to the present disclosure.



FIG. 7 illustrates an exemplary image input prior to depth map processing being performed according to FIG. 2.



FIGS. 8A-8C illustrate the result of the depth map processing of FIG. 2.





Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope and spirit of the subject disclosure as defined by the appended claims.


DETAILED DESCRIPTION

According to the present disclosure, an image processing technique is provided that advantageously determines depth maps from an image capture device with one or more sensors that are dual-pixel image sensors. According to the disclosure below, the conventional manner in which depth may be estimated is improved by providing an analytic determination of the depth map: the two signals to be aligned are approximated and treated as continuous functions, where one function is allowed to shift by an arbitrary delta. In so doing, the error between the two functions is analyzed continuously and can be minimized over the shift delta. Through an analysis of an analytical representation of the two signals, the optimal signal shift may be obtained in a single step rather than by searching over a range of discrete candidate shifts, thereby making this method faster and capable of running in real time.



FIG. 1 illustrates an example embodiment of a system for generating depth maps from an image capture device including a dual pixel sensor that is configured to convert light captured on the dual pixel sensor into electrical signals. The system 1 includes a capture device 100 and a receiving device 110, which are specially-configured computing devices. In this embodiment, the capture device 100 and the receiving device 110 communicate via one or more networks 199, which may include a wired network, a wireless network, a LAN, a WAN, a MAN, and a PAN. Also, in some embodiments the devices communicate via other wired or wireless channels.


The two systems 100 and 110 include one or more respective processors 101 and 111, one or more respective I/O components 102 and 112, and respective storage 103 and 113. Also, the hardware components of the two systems 100 and 110 communicate via one or more buses or other electrical connections. Examples of buses include a universal serial bus (USB), an IEEE 1394 bus, a PCI bus, an Accelerated Graphics Port (AGP) bus, a Serial AT Attachment (SATA) bus, and a Small Computer System Interface (SCSI) bus.


The one or more processors 101 and 111 include one or more central processing units (CPUs), which may include one or more microprocessors (e.g., a single core microprocessor, a multi-core microprocessor); one or more graphics processing units (GPUs); one or more tensor processing units (TPUs); one or more application-specific integrated circuits (ASICs); one or more field-programmable-gate arrays (FPGAs); one or more digital signal processors (DSPs); or other electronic circuitry (e.g., other integrated circuits).


The I/O components 102 and 112 include communication components (e.g., a graphics card, a network-interface controller) that communicate with the respective image capture dual-pixel array 120 and a display device 130, the network 199, and other input or output devices (not illustrated), which may include a keyboard, a mouse, a printing device, a touch screen, a light pen, an optical-storage device, a scanner, a microphone, a drive, and a game controller (e.g., a joystick, a gamepad).


The storages 103 and 113 include one or more computer-readable storage media. As used herein, a computer-readable storage medium includes an article of manufacture, for example a magnetic disk (e.g., a floppy disk, a hard disk), an optical disc (e.g., a CD, a DVD, a Blu-ray), a magneto-optical disk, magnetic tape, and semiconductor memory (e.g., a non-volatile memory card, flash memory, a solid-state drive, SRAM, DRAM, EPROM, EEPROM). The storages 103 and 113, which may include both ROM and RAM, can store computer-readable data or computer-executable instructions.


The two systems 100 and 110 also include respective communication modules 103A and 113A. A module includes logic, computer-readable data, or computer-executable instructions. In the embodiment shown in FIG. 1, the modules are implemented in software (e.g., Assembly, C, C++, C#, Java, BASIC, Perl, Visual Basic, Python, Swift). However, in some embodiments the modules are implemented in hardware (e.g., customized circuitry) or, alternatively, a combination of software and hardware. When the modules are implemented, at least in part, in software, then the software can be stored in the storages 103 and 113. Also, in some embodiments the two systems 100 and 110 include additional or fewer modules, the modules are combined into fewer modules, or the modules are divided into more modules. One system may be similar to the other or may differ in terms of the inclusion or organization of the modules.


The regression module 103B includes operations programmed to carry out linear dual-pixel transformations as described in block 230 of FIG. 2 below. The non-linear module 103C includes operations programmed to carry out non-linear processing of the results of the regression module 103B, as described for example in block 240 of FIG. 2 below. The confidence module 103D and the shift module 103E include instructions to compute the respective confidence and shift values for a location, as described in block 250 of FIG. 2 below. The depth module 103F includes instructions that combine the confidence and shift values, for example by applying a confidence threshold to the confidence map and using the confident locations above the threshold as a depth pixel mask. The mesh generation module 103G includes instructions to generate a 3-D mesh using one or more confident depth locations as 3-dimensional mesh vertices and subsequently applying a triangulation method to the vertices, such as Delaunay triangulation, to generate a 3-D mesh. The video module 103H includes instructions to store or transmit still or streamed RGBD (red, green, blue, depth) images, or variants thereof, or 3-dimensional mesh data, which may also include corresponding UV texture maps generated from the RGB pixel values of the captured image.


The RGBD representation module 114 of system 110 includes instructions to receive and process for viewing the incoming data generated by module 103H of system 100 via the network 199. The processing of the incoming data may include rendering the data on a 2-D display unit 130, a stereographic display, or a holographic or light-field display.



FIG. 2 illustrates an exemplary algorithm that is embodied as a set of instructions that are stored in a memory and executed by one or more of the hardware processors discussed above with respect to FIG. 1. The basis as to why the algorithm of FIG. 2 improves the ability to determine depth map information from a dual pixel sensor array can best be understood based on the following information. This description sets the foundation for the algorithmic processing shown in FIG. 2.


Initially, it is important to identify or otherwise obtain a polynomial approximation of local signals from the dual pixel sensor in the image capture device. Consider a signal of interest s(x) centered around x=0 with observations s={s1, s2, . . . , sN} taken at {x1, x2, . . . , xN}. The signal may be approximated via a polynomial regression calculation such that





s(x)≈p(x;a)


where p(x; a) is a polynomial of x with polynomial coefficients a. Thereafter, the polynomials are parameterized by their coefficients:






a = [a_p, …, a_0]^T

p(x; a) = Σ_{k=0}^{p} a_k x^k = x_p^T a

x_p = [x^p, …, x, 1]^T

This leads to regression processing whereby the problem can be formulated as s ≈ Xa:







[ s_1 ]     [ x_1^p  ⋯  x_1^0 ]
[  ⋮  ]  ≈  [   ⋮    ⋱    ⋮   ] a
[ s_N ]     [ x_N^p  ⋯  x_N^0 ]





Processing is then performed to minimize the error of the regression where the following is obtained









min_a [s − Xa]^T [s − Xa]




Sometimes the error between observed samples and the polynomial are weighted to focus the attention of the regression on the center of the sampled signal. This involves the inclusion of a weighting matrix which is typically diagonal:









min_a [s − Xa]^T W [s − Xa]





The solution for the weighted least-squares problem is determined according to the following equation:






a = [(X^T W X)^{−1} X^T W] s





Thus, in an exemplary embodiment, the signal is sampled at regular intervals (evenly spaced pixels), and the scale of the range of x may be arbitrarily assigned. This describes the parametric fitting of a first signal observed across a first of two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split. The same method can be used to parametrically fit a second signal observed across a second of the two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split. Additionally, the signal may be zero-centered to facilitate calculations. For example, if the number of samples is N=2m+1, x can be made to range from −1.0 to +1.0 in increments of 1/m. Thus the matrix X has entries:







X_{i,j} = ((i − 1)/m − 1)^{p+1−j}






Since the matrix X and the weighting matrix W are pre-determined, the expression (XTWX)−1XTW can be pre-calculated as a linear regression function on samples of s such that









a = R s      (0)







And thus the signal as a function of position x can be approximated by a polynomial with coefficients Rs





s(x)≈p(x;Rs)


An example signal is shown in FIGS. 3A-3C, where a signal in the horizontal direction of a first image centered at zero is shown on a scale from −1 to 1 in FIG. 3A. In FIG. 3B, a sample weighting function is shown, which weights each sample's contribution (from FIG. 3A) to the polynomial regression fitting error. In FIG. 3C a cubic polynomial fit of the signal using the error weighting is illustrated. As such, FIGS. 3A-3C illustrate an exemplary observed signal from a first image of a dual-pixel signal, the weighting function used to weigh the regression error of a polynomial fit, and the polynomial (cubic) fit of the signal via a center-weighted regression.
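The center-weighted regression just described can be sketched as follows; the window size, polynomial order, and weighting σ are illustrative assumptions, not the disclosure's exact values.

```python
import numpy as np

p, m = 3, 4                       # cubic fit over N = 2m+1 = 9 samples
N = 2 * m + 1
x = np.linspace(-1.0, 1.0, N)     # zero-centered positions, step 1/m
X = np.vander(x, p + 1)           # columns [x^p, ..., x, 1], as in X_ij
W = np.diag(np.exp(-0.5 * (x / 0.5) ** 2))   # center-focused weights (assumed)
R = np.linalg.inv(X.T @ W @ X) @ X.T @ W     # pre-computed regression matrix

s = np.sin(1.3 * x)               # a toy observed dual-pixel line signal
a = R @ s                         # polynomial coefficients [a3, a2, a1, a0]
fit = X @ a                       # p(x; a) evaluated at the sample positions
```

Because R depends only on the sample grid and the weights, it can be computed once and applied to every windowed signal.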


Importantly, the image processing technique performs an analysis of the alignment error of the two polynomials. In an exemplary embodiment, two functions are sampled representing the same underlying phenomena but potentially locally misaligned. By approximating both signals locally as polynomials, a focus of the processing can be on the alignment error of the two polynomials. As such, the alignment error from a shift of δ of the two polynomials can be given by










J(a, b, δ) = ∫_{−∞}^{∞} G(x) [p(x + δ; a) − p(x; b)]^2 dx      (1)







Where G(x) is a (windowing) weighting function giving higher weight at x=0 and symmetrically decreasing the weighting of the polynomial error as x moves away from zero. This is illustrated in FIGS. 4A-4D, which represent observed signals and the various weights applied thereto as disclosed herein. FIG. 4A illustrates an observed signal from a first dual pixel image (e.g., the left pixel intensities of the dual pixels), and FIG. 4B illustrates the corresponding observed signal from the second dual pixel image (e.g., the right pixel intensities) at the same location. FIG. 4C illustrates the weighted cubic fits of the two signals, with the regression error weighted to prioritize the fit at the center of the signal. FIG. 4D illustrates the squared error at δ=0 weighted by a continuous Gaussian windowing function, which is the function inside the integral of equation (1). The total error in FIG. 4D (calculated by integrating the signal) can be calculated for any given shift δ. The ultimate goal is to determine the value of δ that produces the smallest alignment error. This δ, or signal shift, correlates to how out of focus the point is and how much the focus would need to change to bring these two signals into alignment (e.g., into focus).


Turning now to more detail on embodiments that determine the shift in a fast manner, we first describe how to represent polynomials of two variables. We note that the polynomial p(x+δ; a) can be written in a symmetric form







p(x + δ; a) = x_p^T β(a) δ_p







where






δ_p = [δ^p, …, δ, 1]^T





where β(a) is a lower anti-triangular matrix, where the anti-triangular bands starting from the center band are given by:










a_p · [C(p, 0), C(p, 1), …, C(p, p)]
a_{p−1} · [C(p−1, 0), C(p−1, 1), …, C(p−1, p−1)]
⋮
a_0







Where C(p, n) is the binomial coefficient defined as

C(p, n) = p! / (n! (p − n)!)








For example, for illustrative purposes, if p=3:







β(a) = [ 0      0      0      a_3
         0      0      3a_3   a_2
         0      3a_3   2a_2   a_1
         a_3    a_2    a_1    a_0 ]
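A sketch of constructing β(a) for an arbitrary order p from the band structure given above; the helper name and the cross-check point are our own choices. The check confirms p(x + δ; a) = x_p^T β(a) δ_p numerically.

```python
import numpy as np
from math import comb

def beta(a):
    """Lower anti-triangular beta(a), with a = [a_p, ..., a_0]."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    B = np.zeros((p + 1, p + 1))
    for i in range(p + 1):            # power of x
        for j in range(p + 1 - i):    # power of delta
            # coefficient of x^i d^j in p(x + d; a) is a_{i+j} * C(i+j, i)
            B[p - i, p - j] = a[p - (i + j)] * comb(i + j, i)
    return B

# Cross-check at one point: p(x + d; a) = x_p^T beta(a) d_p
rng = np.random.default_rng(0)
a = rng.normal(size=4)                 # random cubic [a3, a2, a1, a0]
x, d = 0.37, -0.21
xp = np.array([x**3, x**2, x, 1.0])
dp = np.array([d**3, d**2, d, 1.0])
```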





This allows us to revisit the alignment error of the two polynomials, rewriting equation (1) as










J(a, b, δ) = ∫_{−∞}^{∞} G(x) [x_p^T (β(a) δ_p − b)]^2 dx      (2)







The inner term x_p^T (β(a) δ_p − b) is a polynomial of both x and δ, which can be described via a matrix C as








x_p^T C(a, b) δ_p = x_p^T (β(a) δ_p − b)





And thus C is







C(a, b) = β(a) − b · [0, …, 0, 1]







And we note that [x_p^T C(a, b) δ_p]^2 can be expressed with the matrix D, which is the 2-D convolution of C with itself:








[x_p^T C(a, b) δ_p]^2 = x_{2p}^T D(a, b) δ_{2p}







Where D represents a polynomial of both x and δ, each of order 2p. Thus D is a (2p+1) by (2p+1) lower anti-triangular matrix.
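A sketch of building C(a, b) and D(a, b) for p=3 via the 2-D auto-convolution just described; the helper names and the evaluation point are ours. The check verifies [x_p^T C δ_p]^2 = x_{2p}^T D δ_{2p} numerically.

```python
import numpy as np

def beta3(a):
    # The p = 3 anti-triangular matrix printed above.
    a3, a2, a1, a0 = a
    return np.array([[0.0, 0, 0, a3],
                     [0, 0, 3 * a3, a2],
                     [0, 3 * a3, 2 * a2, a1],
                     [a3, a2, a1, a0]])

def C_matrix(a, b):
    C = beta3(a)
    C[:, -1] -= b          # subtract b from the delta-constant column
    return C

def D_matrix(a, b):
    C = C_matrix(a, b)
    D = np.zeros((7, 7))   # (2p+1) x (2p+1) for p = 3
    for i in range(4):
        for j in range(4):
            D[i:i + 4, j:j + 4] += C[i, j] * C   # 2-D auto-convolution
    return D

# Check the identity at one (x, delta) point with random coefficients.
rng = np.random.default_rng(3)
a, b = rng.normal(size=4), rng.normal(size=4)
x, d = 0.4, -0.15
xp = np.array([x**3, x**2, x, 1.0])
dp = np.array([d**3, d**2, d, 1.0])
x6 = x ** np.arange(6, -1, -1)
d6 = d ** np.arange(6, -1, -1)
```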


Equations 1 and 2 both contain a weighting function G(x). The weighting function can be a Gaussian function with zero mean and a standard deviation of σ. Thus the objective of minimizing alignment error can be rewritten as:










G(x) = (1 / √(2π σ^2)) e^{−x^2 / (2σ^2)}      (3)













J(a, b, δ) = ∫_{−∞}^{∞} G(x) x_{2p}^T D(a, b) δ_{2p} dx = [∫_{−∞}^{∞} G(x) x_{2p}^T dx] D(a, b) δ_{2p}










Here the first term in square brackets provides a vector of Gaussian central moments g2p with elements:











g_k = ∫_{−∞}^{∞} G(x) x^k dx = { σ^k (k − 1)!!   if k even, k ≠ 0
                                 0               if k odd
                                 1               if k = 0 }      (4)







Where (k−1)!! is the double factorial of (k−1):








(k − 1)!! = (k − 1)(k − 3) ⋯ (1) = k! / (2^{k/2} (k/2)!)








Thus, the main objective function (error function) is rewritten as










J(a, b, δ) = g_{2p}^T D(a, b) δ_{2p}      (5)







where g_{2p}^T = [g_{2p}, …, g_0] are the central moments of a zero-mean Gaussian, given by equation (4) as a function of σ.
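The closed-form moments of equation (4) can be sketched and cross-checked against a brute-force numerical integral; the grid extent, resolution, and σ are arbitrary choices.

```python
import numpy as np
from math import factorial

def gaussian_moment(k, sigma):
    """Central moment g_k of a zero-mean Gaussian, per equation (4)."""
    if k == 0:
        return 1.0
    if k % 2 == 1:
        return 0.0
    dfact = factorial(k) // (2 ** (k // 2) * factorial(k // 2))  # (k-1)!!
    return sigma ** k * dfact

# Brute-force cross-check of the closed form against a dense Riemann sum.
sigma = 0.7
x = np.linspace(-8 * sigma, 8 * sigma, 200001)
G = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
dx = x[1] - x[0]
for k in range(7):
    numeric = float(np.sum(G * x**k) * dx)
    assert abs(numeric - gaussian_moment(k, sigma)) < 1e-6
```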


In some cases, to improve the alignment results, multiple signals are aligned simultaneously via a weighted sum of alignment objectives to find a single shift value δ. In this case a set of reference signals from k = −w, …, +w is fit via polynomial regression along with a corresponding set of alignment signals, yielding A = {a_{−w}, …, a_w} and B = {b_{−w}, …, b_w} and generating a composite objective










J(A, B, δ) = Σ_{i=−w}^{w} ω_i J(a_i, b_i, δ)      (6)













J(A, B, δ) = g_{2p}^T [Σ_{i=−w}^{w} ω_i D(a_i, b_i)] δ_{2p}      (7)







where ω_i is a windowing/weighting function. To find the optimal shift δ, we minimize the above objective over δ. A convenient weighting function is again the zero-mean Gaussian function. Its parameter σ may be the same as that of the continuous-signal weighting Gaussian function, or may be different. Other weighting functions are also possible, such as binomial weighting, triangular-function weighting, or box-function weighting, to name a few examples. Equation (7) shows a bilinear measure on the two signals' fitting parameters.


Processing is then performed to solve for the optimal shift, where the objective is to minimize the composite objective of equations (6) and (7). In one embodiment, a root-finding processing method is employed, which obtains the derivative of the objective function and sets it equal to zero as a necessary condition for an extremum of the alignment error. Taking the derivative of equation (7), we get









0 = ∂J(A, B, δ)/∂δ = g_{2p}^T [Σ_{i=−w}^{w} ω_i D(a_i, b_i)] ∂δ_{2p}/∂δ      (8)









And ∂δ_{2p}/∂δ is











∂δ_{2p}/∂δ = [2p δ^{2p−1}, (2p − 1) δ^{2p−2}, …, 2δ, 1, 0]^T = Diag([2p, …, 0]) [δ^{2p−1}, …, 1, 0]^T






The necessary condition can be solved via a polynomial root-finding algorithm. In many cases the minimum of interest in the objective function is the one near δ=0. As such, the objective function may be minimized via a gradient-descent-type method starting at δ=0. It should be noted that equation (8) still involves a bilinear measure on the two signals' fitting parameters.


For the minimization using gradient descent from a starting point, the objective, written J(δ) for short, can be approximated with N terms via a Taylor series about some point δ_n:







J_N(δ; δ_n) = Σ_{k=0}^{N} ((δ − δ_n)^k / k!) J^{(k)}(δ_n)








Using N=2, the error is approximated with a quadratic about the point δ_n, and this approximation may be minimized over δ_{n+1}:









min_{δ_{n+1}} J_2(δ_{n+1}; δ_n) = min_{δ_{n+1}} [ J(δ_n) + (δ_{n+1} − δ_n) J′(δ_n) + ((δ_{n+1} − δ_n)^2 / 2) J″(δ_n) ]







Processing is performed to minimize by taking the derivative of the quadratic approximation and setting it equal to zero, yielding the update equation







δ_{n+1} = δ_n − J′(δ_n) / J″(δ_n)






This advantageously enables an embodiment using a one-step quadratic approximation, where the iteration may involve only a single step to arrive at a good estimate for delta:










δ_1 = −J′(0) / J″(0) = − [g_{2p}^T [Σ_{i=−w}^{w} ω_i D(a_i, b_i)]]_1 / (2 [g_{2p}^T [Σ_{i=−w}^{w} ω_i D(a_i, b_i)]]_2)      (9)







where [v]_k signifies the k-th order element of the polynomial coefficient vector v. FIG. 5 shows an example error function J(δ) for two corresponding dual-pixel image patches and the quadratic approximation of the error function about δ=0. The observation of a typical alignment function such as the one shown herein, along with the expectation that the solution for δ is typically close to zero, provides the motivation to approximate the function with a quadratic and to use the one-step solution approximation described by equation (9). Indexing into the 1st- and 2nd-order coefficients of the delta polynomial coefficient vector involves only two rows of the matrix D(a_i, b_i). Furthermore, the vector g_{2p}^T when using the Gaussian weighting function is non-zero only at every other element. Thus, while D is a (2p+1)×(2p+1) matrix containing 4p²+4p+1 coefficients (of which 2p²+3p+1 elements in the lower anti-triangle can be non-zero), only 2p terms in the matrix D are used. This indicates that for the one-step approximate solution in (9) the D matrix need not be fully computed for each (a_i, b_i) pair to arrive at a solution, thereby reducing the processing power and computational cost of the apparatus executing the algorithm.


The validity of the above solution may be judged via the denominator of equation (9), which is a measure of the convexity of the function. A large positive denominator indicates a better-conditioned solution than a denominator close to zero. The solution fails when the denominator is zero, in which case the zero-shift solution is sometimes preferred. Also, if the denominator is negative, the solution is a maximum instead of a minimum and should be avoided. In some cases the zero shift, or a shift at the bounds of the acceptable shift range, should be used (the two boundary conditions can be tested to determine which produces less error).
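The one-step estimate of equation (9), together with the convexity check on its denominator, can be sketched for a single signal pair with p=3. Here b is constructed as an exactly shifted copy of a via β(a), and σ and the true shift t are illustrative values.

```python
import numpy as np

def beta3(a):
    # The p = 3 anti-triangular matrix given earlier.
    a3, a2, a1, a0 = a
    return np.array([[0.0, 0, 0, a3],
                     [0, 0, 3 * a3, a2],
                     [0, 3 * a3, 2 * a2, a1],
                     [a3, a2, a1, a0]])

def D_matrix(a, b):
    C = beta3(a)
    C[:, -1] -= b                       # C(a, b): b enters the delta-constant column
    D = np.zeros((7, 7))
    for i in range(4):
        for j in range(4):
            D[i:i + 4, j:j + 4] += C[i, j] * C   # 2-D auto-convolution
    return D

sigma = 0.5
g = np.array([15 * sigma**6, 0, 3 * sigma**4, 0, sigma**2, 0, 1.0])  # [g6, ..., g0]

a = np.array([0.3, -0.5, 1.2, 0.1])     # reference coefficients [a3, a2, a1, a0]
t = 0.08                                 # true sub-pixel shift (small)
b = beta3(a) @ np.array([t**3, t**2, t, 1.0])   # b(x) = p(x + t; a)

c = g @ D_matrix(a, b)                  # J(delta) coefficients, descending powers
delta1 = -c[5] / (2 * c[4])             # one-step estimate; c[4] > 0 means convex
```

Since the quadratic model is only exact near the minimum, delta1 approximates t with a small bias that shrinks as t approaches zero.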


The above is advantageously applied to performing dual-pixel alignment for depth estimation. The following description outlines the steps for fast generation of alignment errors for various points in a pair of dual-pixel images. Given a respective point in a pair of dual-pixel images I_L and I_R, a window of (2w+1)×(2w+1) pixels centered around the point is sampled from each of the two images. Each windowed sample represents a collection of 1-dimensional signals in the direction of the dual-pixel split (e.g., horizontally). Thus sampled patches from the left and right images are








P_L = [ — s_{−w} — ]        P_R = [ — s_{−w} — ]
      [     ⋮      ]              [     ⋮      ]
      [ — s_{+w} — ]              [ — s_{+w} — ]






In these matrices s_j is the left pixel signal of 2w+1 samples shifted j pixels down from the center of the target patch to align. Thus P_L is a patch of the image generated from the left side of the pixels in a dual pixel sensor, centered around the location of interest/analysis. Similarly, P_R is an image made up of corresponding pixel locations but taken from the right side of the pixels in the dual pixel sensor. Thus the polynomial approximations of the patch signals may be calculated from equation (0) and used in equation (7) as





A = R P_L^T, and B = R P_R^T


And a_i denotes column i of the corresponding matrix.


Next, for each pair of columns (a_i, b_i) from the left and right regression coefficient matrices A and B, the following matrix is constructed







C(a_i, b_i) = β(a_i) − b_i · [0, …, 0, 1]







where β(a_i) is the lower anti-triangular matrix derived from a_i as described earlier in this document. For each i we derive the matrix D(a_i, b_i) from C(a_i, b_i) by performing a 2-dimensional auto-convolution on C(a_i, b_i). In some cases not all elements of D(a_i, b_i) are calculated. Processing is then performed to compute g_{2p}^T [Σ_{i=−w}^{w} ω_i D(a_i, b_i)] (or at least some of the terms of this vector, since some terms ultimately may not play into the final solution). Finally, the optimal δ is estimated from the above polynomial coefficients based on one of the methods laid out above (e.g., the root-finding method or equation (9)).
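The per-point processing just outlined can be sketched end to end for p=3. The window sizes, weighting functions, σ, and the sinusoidal test patch are all illustrative assumptions; the recovered shift is expressed in the normalized x units of the regression.

```python
import numpy as np

p, m, w = 3, 3, 2                          # cubic fit, 7-sample rows, 5-row patch
N = 2 * m + 1
xs = np.linspace(-1.0, 1.0, N)             # zero-centered sample positions
X = np.vander(xs, p + 1)
Wm = np.diag(np.exp(-2.0 * xs ** 2))       # center-focused regression weights (assumed)
R = np.linalg.inv(X.T @ Wm @ X) @ X.T @ Wm

def beta3(a):
    a3, a2, a1, a0 = a
    return np.array([[0.0, 0, 0, a3],
                     [0, 0, 3 * a3, a2],
                     [0, 3 * a3, 2 * a2, a1],
                     [a3, a2, a1, a0]])

def D_matrix(a, b):
    C = beta3(a)
    C[:, -1] -= b
    D = np.zeros((7, 7))
    for i in range(4):
        for j in range(4):
            D[i:i + 4, j:j + 4] += C[i, j] * C
    return D

sigma = 0.5
g = np.array([15 * sigma**6, 0, 3 * sigma**4, 0, sigma**2, 0, 1.0])
omega = np.exp(-0.5 * np.arange(-w, w + 1) ** 2)   # cross-row weights (assumed)

# Toy dual-pixel patches: each right-image row is the left row shifted by t.
t = 0.1
rows = np.arange(-w, w + 1)
PL = np.array([np.sin(1.1 * xs + 0.4 * k) for k in rows])
PR = np.array([np.sin(1.1 * (xs + t) + 0.4 * k) for k in rows])

A = R @ PL.T                               # columns a_i
B = R @ PR.T                               # columns b_i
Dsum = sum(omega[i] * D_matrix(A[:, i], B[:, i]) for i in range(2 * w + 1))
c = g @ Dsum                               # J(delta) coefficients, descending powers
delta1 = -c[5] / (2 * c[4])                # one-step estimate of the shift
```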


Based on the above, a further reduction in the computation needed to solve for the estimate of delta is advantageously obtained by using a polynomial order of 3. In that case, only 6 of the 49 terms in the matrix D need to be computed, with 3 of the 6 terms used for the calculation of the first derivative in equation (9) and the other 3 used for the calculation of the second derivative. This is performed for each D matrix for the 2w+1 regressed lines in the patch. A matrix G is defined that collects the 3 first-derivative terms for each row in a patch, a 3×(2w+1) matrix, where the i-th column is given by the i-th regression coefficients a and b










G_i = [ 2a_2 e_3 + 3a_3 e_2
        3a_3 e_0 + 2a_2 e_1 + a_1 e_2
        a_1 e_0 ]      (10)







where e is defined as






e = a − b





A matrix H is also defined that collects the 3 second-derivative terms for each row in a patch, a 3×(2w+1) matrix, where the i-th column is given by the i-th regression coefficients a and b










H_i = [ 3a_3 (3a_3 + 2e_3)
        2a_2 (2a_2 + e_2) + 6a_3 (a_1 + e_1)
        a_1^2 + 2a_2 e_0 ]      (11)







From these expressions equation (9) can be rewritten as










δ_1 = −J′(0) / J″(0) = − (2 g̃^T G w) / (2 g̃^T H w + λ)      (12)







where G, H, g̃^T and w^T are






G = [G_1, G_2, …, G_{2w+1}]

H = [H_1, H_2, …, H_{2w+1}]

g̃^T = [3σ^4, σ^2, 1]

w^T = [ω_{−w}, …, ω_0, …, ω_w]





From this, the following is true:









E = [e_1, …, e_{2w+1}] = A − B = R P_L^T − R P_R^T = R (P_L^T − P_R^T)      (13)







So, the coefficients for E can be generated by applying the regression matrix to the difference of the two dual-pixel images' centered patches. Also, the values 3a_3, 2a_2, and a_1 recur in equations (10) and (11), and the value of a_0 does not occur at all. Thus the patch P_L^T can be regressed to produce only 3a_3, 2a_2, and a_1 using a regression matrix S










Ã = [ 3a_3 ]     { [ 3 0 0 0 ]     }
    [ 2a_2 ]  =  { [ 0 2 0 0 ]  R  } P_L^T = S P_L^T      (14)
    [ a_1  ]     { [ 0 0 1 0 ]     }







where S can be pre-calculated as a 3×4 matrix, which requires less computation to apply than using R. Because of this, the algorithm described in FIG. 2 is possible. In equation (12) the numerator and denominator both involve a bilinear measure on the two signals' fitting parameters. Furthermore, the denominator can determine an alignment confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters. The shift estimate of equation (12) is the sub-pixel alignment estimate based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters (e.g., the numerator and the denominator). When other methods are used to find the shift, the solution still involves a bilinear measure on the at least two signals' fitting parameters (e.g., solving equation (8) using a root-finding method).
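A sketch cross-checking the reduced cubic path against the full D-matrix computation for one row pair. Note that direct expansion of J′(0) and J″(0) yields the moment vector g̃ = [3σ⁴, σ², 1] and the last entry of G_i as a_1·e_0; those derivation-consistent values are used here as an assumption.

```python
import numpy as np

def beta3(a):
    a3, a2, a1, a0 = a
    return np.array([[0.0, 0, 0, a3],
                     [0, 0, 3 * a3, a2],
                     [0, 3 * a3, 2 * a2, a1],
                     [a3, a2, a1, a0]])

def full_coeffs(a, b, sigma):
    """J'(0) and J''(0) computed from the full D matrix, as in equation (9)."""
    C = beta3(a)
    C[:, -1] -= b
    D = np.zeros((7, 7))
    for i in range(4):
        for j in range(4):
            D[i:i + 4, j:j + 4] += C[i, j] * C
    g = np.array([15 * sigma**6, 0, 3 * sigma**4, 0, sigma**2, 0, 1.0])
    c = g @ D
    return c[5], 2 * c[4]

def reduced_coeffs(a, b, sigma):
    """J'(0) and J''(0) from the 6 reduced terms of equations (10)-(11)."""
    a3, a2, a1, _ = a
    e3, e2, e1, e0 = a - b
    Gi = np.array([2 * a2 * e3 + 3 * a3 * e2,
                   3 * a3 * e0 + 2 * a2 * e1 + a1 * e2,
                   a1 * e0])
    Hi = np.array([3 * a3 * (3 * a3 + 2 * e3),
                   2 * a2 * (2 * a2 + e2) + 6 * a3 * (a1 + e1),
                   a1**2 + 2 * a2 * e0])
    gt = np.array([3 * sigma**4, sigma**2, 1.0])
    return 2 * gt @ Gi, 2 * gt @ Hi

rng = np.random.default_rng(1)
a, b = rng.normal(size=4), rng.normal(size=4)
```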


In FIG. 2, the flow starts in block 205 and moves to block 210, where parameters are pre-calculated. In some example embodiments, the system pre-calculates the polynomial regression matrix R, the continuous even central moments g̃^T, the cross-direction weighting vector w, and λ. λ is a regularization parameter, which can be, for example, the reciprocal of a delta standard-deviation prior about zero, or some other small number. Flow then moves to block 220, where a location in the image is selected for processing. The flow then moves to block 230.


Some embodiments of block 230 estimate à from equation (14) and E from equation (13) for the patch and patch difference surrounding the pixel (note that if patches overlap from one pixel to the next, some columns of à and E may be reused from calculations for a previous pixel). Other embodiments compute other linear transformations on the patch around the selected location.
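The column reuse across overlapping patches can also be realized by applying S to every window of a scan line at once. A minimal sketch (the matrix S and line values below are arbitrary placeholders):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def regress_line(line, S):
    """Apply the pre-computed matrix S to every window of a scan line.

    Overlapping windows share samples, so each column of the patch is
    read once per window rather than re-extracted per location.
    """
    windows = sliding_window_view(line, S.shape[1])   # shape (n - 3, 4)
    return windows @ S.T                              # one [3a3, 2a2, a1] row per location

# Usage with an arbitrary 3x4 S and an 8-sample line:
S = np.arange(12.0).reshape(3, 4)
line = np.linspace(0.0, 1.0, 8)
A_rows = regress_line(line, S)                        # shape (5, 3)
```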


Next flow moves to block 240 where additional non-linear processing takes place based on the linear processing performed in the previous step. For example, some embodiments calculate G and H from equations (10) and (11) using a bilinear combination of the results found in block 230. Other embodiments use the 2-D convolution of a matrix containing the block 230 results such as described by the matrix D derived from equation (2) above. Other embodiments process this step through other non-linear functions of the results from block 230 such as non-linear activations and network layers in a neural network.


Flow then moves to block 250, where a confidence measure and a shift measure are computed for the location. In some embodiments, {tilde over (g)}THw is calculated as a confidence measure (a measure of convexity at 0). Some embodiments also calculate {tilde over (g)}TGw and combine that result with the confidence measure to compute the delta (shift) estimate, such as is described in equation (12). This term may determine a confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters.
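As a non-limiting sketch of block 250, the two bilinear measures can be combined into a shift and a confidence. The exact form and signs of equation (12) are not reproduced here; the Newton-style one-step estimate and the placement of λ in the denominator are assumptions of this sketch.

```python
import numpy as np

def shift_and_confidence(G, H, g_tilde, w, lam=1e-3):
    """Sketch of block 250 under assumed forms.

    Confidence is the bilinear measure g~^T H w (curvature of the
    approximated error at zero); the shift is a one-step estimate
    -g~^T G w / (g~^T H w + lam), with lam the regularizer of block 210.
    """
    confidence = g_tilde @ H @ w
    delta = -(g_tilde @ G @ w) / (confidence + lam)
    return delta, confidence

# Toy usage with 2x2 placeholder matrices:
g = np.array([1.0, 0.0])
w = np.array([1.0, 0.0])
H = np.array([[2.0, 0.0], [0.0, 0.0]])
G = np.array([[4.0, 0.0], [0.0, 0.0]])
delta, conf = shift_and_confidence(G, H, g, w, lam=0.0)
```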


In block 260 it is determined whether there are more locations to process. If so, flow returns to 220. In some embodiments, the flow from 220 to 250 can be performed in parallel for multiple locations simultaneously via multiple processing units, such as a multicore CPU, a GPU, a TPU, or other specialized and parallelized hardware. If all locations have been processed, the shift/delta maps and confidence maps are returned from the module in block 270, and flow then ends in block 280. Based on the output, a confidence threshold may be determined, and a map of delta values for the pixels above the confidence threshold can be generated.
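A minimal sketch of the parallel dispatch over locations, using a thread pool as one of many possible execution strategies (the `process_location` body is a placeholder for the per-location pipeline of blocks 220-250):

```python
from concurrent.futures import ThreadPoolExecutor

def process_location(loc):
    """Placeholder for the per-location pipeline (blocks 220-250)."""
    r, c = loc
    return (r + c) * 0.1          # stand-in for a (delta, confidence) result

# Dispatch all locations; map() preserves input ordering, so the
# results can be reshaped back into delta/confidence maps.
locations = [(r, c) for r in range(4) for c in range(4)]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_location, locations))
```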


For raw dual-pixel data, the data is examined in Bayer format. The format usually interlaces pixels in a quad pattern such as shown in FIG. 6. In this case, the Blue (B) and Red (R) channels can be processed by sub-sampling every other row and column of the raw data, centered at a Red or Blue pixel. This process follows the approach previously described. For the Green channels, every other row is staggered with respect to the previous row. Thus, one row (for example, the center row in the patch) may be regressed as outlined above with 2w+1 samples (from −2w, −2w+2, . . . , 0, . . . , 2w−2, 2w). The preceding and subsequent rows can be sampled with 2w samples at (−2w+1, −2w+3, . . . , −3, −1, 1, 3, . . . , 2w−3, 2w−1). For the staggered rows, the regression matrix will be different because the sampling is different. For example, when weighting signals across multiple rows, the regression matrix used for alternating rows must change to account for the staggered G pattern seen in FIG. 6.
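The two sampling patterns above can be sketched as follows. An RGGB quad layout is assumed for illustration; only the column offsets from the description are used.

```python
import numpy as np

def rb_row_samples(raw, r, c, w):
    """2w+1 same-channel samples on row r centered at column c,
    at offsets -2w, -2w+2, ..., 0, ..., 2w-2, 2w."""
    return raw[r, c - 2*w : c + 2*w + 1 : 2]

def g_staggered_row_samples(raw, r, c, w):
    """2w samples on a staggered Green row, at offsets
    -2w+1, -2w+3, ..., -1, 1, ..., 2w-1."""
    return raw[r, c - 2*w + 1 : c + 2*w : 2]

# Usage on a toy 8x8 raw frame with w = 1:
raw = np.arange(64.0).reshape(8, 8)
center = rb_row_samples(raw, 4, 4, 1)          # 3 samples, columns 2, 4, 6
stag = g_staggered_row_samples(raw, 3, 4, 1)   # 2 samples, columns 3, 5
```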



FIG. 7 illustrates an example rotated input image taken from one channel of the raw dual-pixel image. The image is in portrait mode and is therefore rotated for viewing purposes. Because intensities vary across the dual-pixel images due to optical-pathway and sensor variations, the first image and the second image may be intensity adjusted. One such adjustment method profiles the ratio of the two images for each channel and takes the median across each column of the un-rotated image. The adjustment factor is fit to a low-order polynomial or other smooth function to account for regions with large depth changes across the dual-pixels.
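A non-limiting sketch of this intensity-matching step (the polynomial degree and the epsilon guarding against division by zero are assumed knobs):

```python
import numpy as np

def column_gain(left, right, degree=3, eps=1e-6):
    """Per-column median of the left/right ratio, smoothed by a
    low-order polynomial fit, as in the adjustment method described."""
    ratio = left / (right + eps)
    profile = np.median(ratio, axis=0)      # one ratio value per column
    cols = np.arange(profile.size)
    coeffs = np.polyfit(cols, profile, degree)
    return np.polyval(coeffs, cols)         # smooth per-column gain

# If one image is exactly twice the other, the fitted gain is ~2:
rng = np.random.default_rng(0)
right = rng.uniform(1.0, 2.0, size=(16, 12))
gain = column_gain(2.0 * right, right)
```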



FIGS. 8A-8C show the results of the dual-pixel alignment tasks for the input images. FIG. 8A shows the confidence map of the dual-pixel alignment, which represents the second derivative of the approximated error function. The second derivative is a measure of the objective's convexity near zero and informs the confidence that the one-step approximation method leads to a good approximation of the local minimum near zero. FIG. 8B shows the local minima for delta for all pixels, bounded from about −1.5 to 1.5 pixels, and FIG. 8C shows the delta value estimates for the pixels of high confidence only.


The red, green, and blue channels of the image each produce different confident delta maps. These results may be combined to produce a fused delta map. In some embodiments, the confidence map is converted to a confidence score from zero to one via a sigmoid function. The function maps high-confidence points to 1.0, maps low-confidence points to 0.0, and produces an in-between score for mid confidence. These scores can be combined to produce a softmax-like weighting that is then used to average the results of the three channels. However, if all of the confidence scores are low, the point is not estimated. Additionally, there may exist biases or scaling differences in the delta estimates across the three channels. To adjust for this, pixels confident in all channels may be compared by normalizing the three delta values such that the green-channel delta value is one. Then, the median of the other two channels may indicate a relative adjustment scale that may be applied before averaging across multiple channels. Depth information can be obtained from sub-pixel alignment estimates via disparity calculations for a given focal length and pixel spacing.
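A non-limiting sketch of the channel-fusion step. The sigmoid slope `k` and the low-confidence threshold are assumed parameters; the per-channel scale normalization described above is omitted for brevity.

```python
import numpy as np

def fuse_channels(deltas, confidences, k=10.0, threshold=0.1):
    """Fuse per-channel delta maps using sigmoid confidence scores.

    deltas, confidences: arrays of shape (3, H, W) for the R, G, B
    channels.  Points where all scores are low are left unestimated
    (NaN), matching the behavior described above.
    """
    scores = 1.0 / (1.0 + np.exp(-k * confidences))     # map to (0, 1)
    total = scores.sum(axis=0)
    weights = scores / np.maximum(total, 1e-12)         # softmax-like weighting
    fused = (weights * deltas).sum(axis=0)
    return np.where(total > threshold, fused, np.nan)

# Usage: three agreeing channels with strong confidence fuse to 0.5.
deltas = np.full((3, 2, 2), 0.5)
confs = np.full((3, 2, 2), 5.0)
fused = fuse_channels(deltas, confs)
```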






$$z = \frac{f\,(1 + 2\delta)}{\Delta_{\text{pix}}}$$

where f is the focal length and Δpix is the pixel spacing in the same units.
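The relation can be applied per pixel once a fused delta map is available; a minimal sketch with illustrative values only (f and Δpix are not specified by the disclosure):

```python
def depth_from_shift(delta, f, delta_pix):
    """Depth from the sub-pixel shift via z = f * (1 + 2*delta) / delta_pix,
    where f is the focal length and delta_pix the pixel spacing in the
    same units."""
    return f * (1.0 + 2.0 * delta) / delta_pix

# Illustrative values: f = 50 mm (0.05 m), pixel pitch = 10 um (1e-5 m).
z0 = depth_from_shift(0.0, 0.05, 1e-5)     # ≈ 5000, the f / delta_pix baseline
z1 = depth_from_shift(0.25, 0.05, 1e-5)    # ≈ 7500
```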


As such, the present disclosure advantageously provides a real-time system and method for sensing a scene and inferring a Red, Green, Blue, Depth (RGB-D) estimate from existing, commonly available sensors. Because the method is capable of real-time performance, it may be used to augment video. The addition of depth estimation to the camera output enables applications in image editing and object recognition by providing richer data than is typically generated. It can also enable robotic applications, VR applications, gaming, automotive sensing, and fast background removal, to name a few.


As such, the above illustrates a method for generating a sub-pixel alignment estimate from dual-pixel sensor data. This method is embodied as computer-readable instructions stored in memory that are executed by one or more processors of an information processing apparatus. The stored instructions, when executed, cause the apparatus to parametrically fit at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split, generate at least one bilinear measure on the at least two signals' fitting parameters, determine an alignment confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters, and determine the sub-pixel alignment estimate based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters. In certain embodiments, the depth of a pixel relative to the focus depth is estimated based at least in part on the lens focal length, the pixel spacing, and the sub-pixel alignment estimate. Further, a sub-pixel alignment map for an image is estimated by repeatedly applying the above to a plurality of dual-pixels in an image. In certain embodiments, a mesh containing sets of vertices and surfaces, including one or more textures, is generated based on the sub-pixel alignment estimate.


The at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split include parallel scan lines across the dual pixels for a patch centered at the location of the alignment estimate. A weighting of the parallel scan lines may be applied, and the weighting weighs scan lines central to the patch more heavily.
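A non-limiting sketch of such a center-heavy weighting; the description only requires heavier weight on central scan lines, so the triangular shape used here is an assumption.

```python
import numpy as np

def scanline_weights(n_lines):
    """Triangular weights over n_lines parallel scan lines, heaviest
    at the center of the patch and normalized to sum to one."""
    center = n_lines // 2
    w = (center + 1) - np.abs(np.arange(n_lines) - center)
    return w / w.sum()

# Usage: 5 scan lines weighted [1, 2, 3, 2, 1] / 9.
w = scanline_weights(5)
```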


At least some of the above-described devices, systems, and methods can be implemented, at least in part, by providing one or more computer-readable media that contain computer-executable instructions for realizing the above-described operations to one or more computing devices that are configured to read and execute the computer-executable instructions. The systems or devices perform the operations of the above-described embodiments when executing the computer-executable instructions. Also, an operating system on the one or more systems or devices may implement at least some of the operations of the above-described embodiments.


Furthermore, some embodiments use one or more functional units to implement the above-described devices, systems, and methods. The functional units may be implemented in only hardware (e.g., customized circuitry) or in a combination of software and hardware (e.g., a microprocessor that executes software).


Additionally, some embodiments of the devices, systems, and methods combine features from two or more of the embodiments that are described herein. Also, as used herein, the conjunction “or” generally refers to an inclusive “or,” though “or” may refer to an exclusive “or” if expressly indicated or if the context indicates that the “or” must be an exclusive “or.”


While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments.

Claims
  • 1. A method for generating a sub-pixel alignment estimate from dual-pixel sensor data, the method comprising: parametrically fitting at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split, generating at least one bilinear measure on the at least two signals' fitting parameters, determining an alignment confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters, and determining the sub-pixel alignment estimate based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters.
  • 2. The method of claim 1 further comprising estimating the depth of a pixel relative to the focus depth based at least in part on the lens focal length and pixel spacing and the sub-pixel alignment estimate.
  • 3. The method of claim 1 further comprising, estimating a sub-pixel alignment map for an image by processing a plurality of dual-pixels in an image.
  • 4. The method of claim 1 wherein the at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split include parallel scan lines across the dual pixels for a patch centered at the location of the alignment estimate.
  • 5. The method of claim 4 wherein a weighting of the parallel scan lines is applied.
  • 6. The method of claim 5 wherein the weighting weighs scan lines central to the patch more heavily.
  • 7. The method of claim 1, further comprising generating, based on the subpixel alignment estimate, a mesh containing sets of vertices and surfaces including one or more textures.
  • 8. An apparatus that generates a sub-pixel alignment estimate from dual-pixel sensor data, comprising: one or more memories storing instructions; one or more processors that, upon execution of the instructions, are configured to parametrically fit at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split, generate at least one bilinear measure on the at least two signals' fitting parameters, determine an alignment confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters, and determine the sub-pixel alignment estimate based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters.
  • 9. The apparatus of claim 8, wherein execution of the instructions further configures the one or more processors to estimate the depth of a pixel relative to the focus depth based at least in part on the lens focal length and pixel spacing and the sub-pixel alignment estimate.
  • 10. The apparatus of claim 8, wherein execution of the instructions further configures the one or more processors to estimate a sub-pixel alignment map for an image by repeatedly applying the process of claim 1 to a plurality of dual-pixels in an image.
  • 11. The apparatus of claim 8, wherein the at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split include parallel scan lines across the dual pixels for a patch centered at the location of the alignment estimate.
  • 12. The apparatus of claim 11 wherein a weighting of the parallel scan lines is applied.
  • 13. The apparatus of claim 12 wherein the weighting weighs scan lines central to the patch more heavily.
  • 14. An apparatus that generates color and depth data, comprising: one or more memories storing instructions; one or more processors that, upon execution of the instructions, are configured to read data from a dual-pixel RGB sensor array, parametrically fit at least two signals observed across two dual-pixels of the dual-pixel sensor in the direction of the dual-pixel split, generate at least one bilinear measure on the at least two signals' fitting parameters, determine an alignment confidence based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters, determine the sub-pixel alignment estimate based at least in part on the at least one bilinear measure on the at least two signals' fitting parameters, generate the color and depth data based on the color information from the dual-pixel sensor array and from the sub-pixel alignment estimate, and store or transmit the generated color and depth data.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority from U.S. Provisional Patent Application Ser. No. 63/309,328 filed on Feb. 11, 2022, the entirety of which is incorporated herein by reference.

PCT Information
Filing Document Filing Date Country Kind
PCT/US2023/012766 2/10/2023 WO
Provisional Applications (1)
Number Date Country
63309328 Feb 2022 US