The present application relates to exposure control for digital cameras, and more particularly to event-guided auto-exposure control.
Exposure control (EC) is essential for digital photography [17] in order to capture images or video frames with appropriate dynamic range for high-quality visualization [30] or reliable vision-based applications, including object detection and tracking [24], simultaneous localization and mapping (SLAM) [44], recognition [23] and robotics [29]. Without EC, images can become saturated, which degrades detection and other downstream algorithms. See
Auto-exposure (AE) facilitates EC by adjusting the exposure parameters with a sophisticated feedback controller [29, 30, 44]. Correspondingly, the optimal exposure level is determined either from an irradiance prediction [29, 30] or by leveraging an image assessment [18, 38, 44]. Both require well-exposed images, which may not be available in natural scenes, especially under harsh lighting conditions or unpredictable relative motions, resulting in a large number of failed trials for the auto-exposure control [38]. In summary, the two challenging issues are harsh lighting, which saturates the images used as feedback, and relative motion, which blurs them.
Using a combination of events and image frames for exposure control has become a popular area of research in recent years. The majority of previous work in this area [7, 40, 41] directly uses event representation methods, such as the time surface [21] or event stacking [11], to generate event frames and then combines them with image frames. More recently, researchers have become interested in using optimization frameworks to combine the two modalities [34, 43]. Wang et al. [43] used a motion compensation framework to filter the events in combination with high frame rate images. Pan et al. [34] used an integral model to describe the latent image and then completed the deblurring and frame reconstruction. However, that work did not consider the complete image formation process: it assumes a linear camera radiometric response, which hardly converges to the optimal setting when the actual response is non-linear. Besides, it can generate an image with correct contrast but cannot directly estimate the irradiance.
Conventionally, most vision-based systems rely on built-in AE algorithms [17, 28] to adjust the exposure time. Research methods for AE can be classified into three types. The first type uses image statistics as feedback. The most common approaches [19, 36] move the average intensity of images to a mid-range value (e.g., 128 for 8-bit images). Improved methods adopt the image entropy [24] and histograms [27, 30] to increase robustness. However, converging to a proper exposure requires many image samples and an even distribution of scene illumination, making the adjustment slow in a natural scene.
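As an illustration of this first type of method, a minimal proportional feedback rule that drives the mean intensity toward 128 might look as follows. The function name, gain, and update rule are illustrative assumptions only, not any specific cited method:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Minimal sketch of a mean-intensity feedback AE rule (illustrative only):
// drive the average intensity of an 8-bit image toward a mid-range target
// (e.g., 128) by scaling the exposure time in proportion to the error.
double updateExposureTime(const std::vector<std::uint8_t>& image,
                          double currentExposureTimeUs,
                          double targetIntensity = 128.0,
                          double gain = 0.5)
{
    if (image.empty()) return currentExposureTimeUs;

    double sum = 0.0;
    for (std::uint8_t px : image) sum += px;
    const double meanIntensity = sum / static_cast<double>(image.size());

    // Brighter-than-target frames shorten the exposure, darker frames
    // lengthen it; convergence typically takes several frames, which is the
    // slowness criticized above.
    const double ratio = targetIntensity / std::max(meanIntensity, 1.0);
    return currentExposureTimeUs * (1.0 + gain * (ratio - 1.0));
}
```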
The second type of research leverages prior knowledge, like a predefined pattern [29], to increase the convergence speed. But such methods work poorly in an unknown scene.
The third type introduces quality-based loss functions to get better performance in a natural scene. Shim et al. [37] used the image gradient as a metric. They computed a linear loss for each image synthesized with gamma mapping to find a proper exposure adjustment step. For a faster convergence, they further introduced a nonlinear loss in [38]. Zhang et al. [44] improved the loss and considered the camera response function for SLAM tasks. However, metric-based methods are easily affected by motion blur. The blurred image hardly provides correct information for calculating the quality loss, thus limiting their performance.
A natural scene has a very high dynamic range, much higher than a camera can sense. Thus, AE methods are needed to control the total amount of light received by the sensor, i.e., exposure, to help keep most pixels immune from saturation.
Meter-based auto-exposure (M-AE). The task of M-AE is to use light meters [15] to measure the radiance L(u, t) and compute the optimal camera exposure Hj(u) of the j-th capture:
where u=(x; y)T is the pixel position, t is time, K(u) denotes the calibrated camera lens parameters, and fM is a linear function for estimating the exposure. The estimated result is inaccurate because the light meter can only give the average of the scene's illumination. Besides, the optical system for metering makes a camera that uses it bulky and expensive.
Image-based auto-exposure (I-AE). Without light meters, most vision-based systems directly use images to adjust the exposure:
where fI is a function that maps images to the optimal exposure, and Ij−1={Ii(u)|i=1, . . . , j−1} is the image set that contains the images captured before the j-th capture. Most I-AE methods rely on a feedback pipeline to gradually converge to an optimal exposure. That not only wastes image samples but also makes I-AE easily affected by challenging natural illumination.
The problems of the prior art are addressed according to the present application by extending the formulations of Pan et al. [34], which use an integral model to describe the latent image and then complete the deblurring and frame reconstruction, so as to fit the non-linear case and then combine it with a contrast value calculated from a hardware parameter, allowing fast irradiance computation. This is achieved by introducing a novel sensor, i.e., a dynamic vision sensor (DVS) [5], to conduct the EC together with a conventional active pixel sensor (APS).
This novel event-guided auto-exposure (EG-AE) leverages the dynamic vision sensor's high dynamic range and low latency properties to guide the exposure setting of the active pixel sensor. Physical connections of images and events are used to estimate the irradiance variations, which is further fed into the EG-AE for calculating the exposure setting.
Event-guided auto-exposure (EG-AE). Complementary to the active pixel sensor (APS), the dynamic vision sensor (DVS) provides a low latency event stream, which encodes the scene's illumination changes in a very high dynamic range. That means the event stream can be the ideal data to guide the camera exposure. Thus, the task of the EG-AE is to utilize two modalities to estimate the scene illumination and compute the optimal exposure:
where fEG is the function that leverages the events set ε and images set Ij−1 to give the optimal exposure. The main challenges of EG-AE come from finding the physical connection between two modalities and developing an efficient framework that fully exploits this connection to compute the irradiance for camera exposure control.
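Collecting the three task formulations above, a plausible reconstruction consistent with the stated definitions (the original typeset equations are not reproduced here) is:

```latex
% Reconstruction (for illustration only) of the three auto-exposure task
% formulations described in the text.
\begin{align*}
  \text{M-AE:}  \quad & H_j(\mathbf{u}) = f_M\!\left(L(\mathbf{u}, t),\, K(\mathbf{u})\right) \\
  \text{I-AE:}  \quad & H_j(\mathbf{u}) = f_I\!\left(\mathcal{I}_{j-1}\right),
      \qquad \mathcal{I}_{j-1} = \{\, I_i(\mathbf{u}) \mid i = 1, \ldots, j-1 \,\} \\
  \text{EG-AE:} \quad & H_j(\mathbf{u}) = f_{EG}\!\left(\varepsilon,\, \mathcal{I}_{j-1}\right)
\end{align*}
```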
The advantages of DVS include high dynamic range (HDR, >130 dB [5]), low latency (1 μs [12]), and low motion blur [10], which make it an ideal sensor for operating in harsh environments. The HDR and low motion blur of the DVS give it the HDR sensing ability to compute the scene's illumination without being affected by relative motions. The low latency of the DVS allows the exposure adjustment to be done before the APS exposure, thereby vastly shrinking the response time to the microsecond level. However, as the DVS only responds to illumination change, it cannot be used alone to directly compute the correct absolute illumination. An estimation framework leveraging both the absolute signals from an APS and the relative signals from a DVS is therefore needed. This application uses an efficient event-based framework for irradiance estimation.
The nonlinearity of the camera response and the DVS hardware principles are considered in extending the previous event-based double integration of Pan et al. [34] to provide physically meaningful irradiance effectively. Based on this result, a novel event-guided auto-exposure is used, which is believed to be the first AE method based on the event camera. The method of the present application is simple yet effective, largely shrinking the response time and reducing the number of saturated image samples. As shown in
This patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The foregoing and other objects and advantages of the present application will become more apparent when considered in connection with the following detailed description and appended drawings in which like designations denote like elements in the various views, and wherein:
Optical mapping. There are two steps for a camera to map the scene's illumination into a digital signal, i.e., the optical mapping and the sensor response. First, the lens 14 will linearly map the scene radiance to the sensor irradiance. The mapping of the lens system can be described by E(u; t)=K(u)L(u; t), where E(u; t) and L(u; t) are the irradiance and radiance of pixel u=(x; y)T at time t, respectively. K(u) is the lens parameter of pixel u, which is a constant for most cameras. Using the recently introduced DAVIS 346 event camera [5], which gives pixel-aligned DVS events and APS images concurrently, the optical mapping applies to both events and images. The DAVIS 346 is a 346×260 pixel event camera that includes a DVS together with an active pixel frame sensor (APS). After the optical mapping, the APS and DVS will respond to the irradiance and transform it into images and events, respectively.
APS radiometric response. For a continuous irradiance signal, the APS exposes the scene to form a sequence of images. There are two equivalent ways to control the exposure, i.e., by adjusting aperture size or exposure time. The present embodiment of the application assumes, but is not limited to, a camera using a lens with a fixed aperture. Thus, the exposure is solely controlled by the exposure time. The APS will accumulate the irradiance during the exposure time and transform it to digital images. Thus, exposure equals the integration of irradiance over exposure time, which also equals the average irradiance times the exposure time:
where Hj(u) and Tj are the exposure and the exposure time of image j, respectively, tj is the starting time of the j-th exposure, and Ēj(u) is the average value of the irradiance over the duration of the j-th exposure. Then the APS transforms the exposure to digital images in a nonlinear manner. This nonlinear radiometric response can be described by the camera response function (CRF) [13] as follows:
where f defines the CRF that maps exposure Hj(u) of the j-th image to corresponding image intensity Ij(u)ϵ{0, . . . , 255}.
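For reference, the exposure model and radiometric response just described can be written out as follows. This is a reconstruction for illustration only, consistent with the definitions given above; the numbering follows the in-text references rather than reproducing the original drawings:

```latex
% Sketch of the exposure model and radiometric response described above.
\begin{align*}
  H_j(\mathbf{u}) &= \int_{t_j}^{t_j + T_j} E(\mathbf{u}, t)\,\mathrm{d}t
      && \text{(Eq.~1)} \\
  H_j(\mathbf{u}) &= \bar{E}_j(\mathbf{u})\, T_j
      && \text{(Eq.~2)} \\
  I_j(\mathbf{u}) &= f\!\left(H_j(\mathbf{u})\right), \qquad
      I_j(\mathbf{u}) \in \{0, \ldots, 255\}
      && \text{(Eq.~3)}
\end{align*}
```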
DVS log-irradiance response. The DVS works asynchronously to respond to changes in the log of irradiance and generates a stream of timestamped address events ε={ek|k=1, . . . , Nev}, where Nev is the number of events. Each event is a 4-dimensional tuple ek≐(uk, tk, pk), where uk=(xk, yk)T is the pixel position, tk is the triggering time, and pkϵ{−1, +1} is the polarity indicating an increase (ON event) or decrease (OFF event) of log irradiance, i.e., pk=+1 if θk≥CON and pk=−1 if θk≤COFF. Here CON>0 and COFF<0 are contrast values, and θk=ln(E(uk, tk))−ln(E(uk, tk−Δt)) is the change in the log of irradiance of pixel uk from time tk−Δt to tk.
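By way of illustration, a simplified single-pixel model of this triggering behavior might be sketched as follows. This is an assumption-laden sketch, not the sensor's actual circuit; all names are illustrative:

```cpp
#include <cmath>
#include <optional>

// Illustrative model of DVS event generation at a single pixel: an ON event
// (polarity +1) fires when the log-irradiance change exceeds C_ON > 0, an OFF
// event (polarity -1) fires when it falls below C_OFF < 0.
struct Event {
    int x, y;
    double t;      // triggering time
    int polarity;  // +1 (ON) or -1 (OFF)
};

class PixelEventModel {
public:
    PixelEventModel(int x, int y, double cOn, double cOff, double initialIrradiance)
        : x_(x), y_(y), cOn_(cOn), cOff_(cOff),
          lastLogE_(std::log(initialIrradiance)) {}

    // Returns an event if the change in log irradiance since the last event
    // crosses either contrast threshold.
    std::optional<Event> update(double irradiance, double t) {
        const double theta = std::log(irradiance) - lastLogE_;
        if (theta >= cOn_) {
            lastLogE_ += cOn_;   // advance the reference by one contrast step
            return Event{x_, y_, t, +1};
        }
        if (theta <= cOff_) {
            lastLogE_ += cOff_;
            return Event{x_, y_, t, -1};
        }
        return std::nullopt;
    }

private:
    int x_, y_;
    double cOn_, cOff_;
    double lastLogE_;
};
```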
Event-based Irradiance Estimation (E-IE). The task of EG-AE requires the irradiance to compute the desired exposure time. However, it is difficult to estimate the irradiance based solely on events or images. Events only encode the relative change of log irradiance, missing the absolute reference. Motion blur makes it impossible for the image to provide an irradiance reference. Thus, events and images must be unified. Events provide an estimation of the fluctuating irradiance during the exposure, allowing the image to give a correct reference.
To do that, the estimate is formulated as a reference irradiance multiplied by a relative irradiance, as follows:
where Ê(u, t; tref) is the estimated irradiance of pixel u at time t, computed from the reference irradiance E(u, tref) at reference time tref, and ΔE(u, t; tref) denotes the relative irradiance from reference time tref to time t:
The reference irradiance E (u; tref) can be derived from the image exposure by letting tj be the reference time tref. Relating Eq. 1 and Eq. 2 and separating the continuous irradiance using Eq. 5:
As the reference irradiance E(u; tj) is a constant, it can be moved out of the integration in Eq. 6. Then, by moving the exposure time Tj to the right, it can be seen that the right side is a reference irradiance times the average of the relative irradiance:
where ΔĒj(u; tj) is the average of the relative irradiance over the duration of the j-th exposure. Rearranging Eq. 7 produces:
Plugging Eq. 8 into Eq. 4 (with tj=tref) turns the irradiance estimation into three approximations, i.e., the approximation of the relative irradiance ΔE(u, t; tj), of its average ΔĒj(u; tj), and of the average irradiance Ēj(u):
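The chain of relations described in the preceding paragraphs can be reconstructed for illustration as follows, consistent with the stated definitions (the numbering follows the in-text references to Eq. 4 through Eq. 9; the original typeset equations are not reproduced):

```latex
% Illustrative reconstruction of the estimation chain: the estimate is a
% reference irradiance times a relative irradiance, and the reference is
% recovered from the image exposure of the j-th frame.
\begin{align*}
  \hat{E}(\mathbf{u}, t; t_{\mathrm{ref}}) &= E(\mathbf{u}, t_{\mathrm{ref}})\,
      \Delta E(\mathbf{u}, t; t_{\mathrm{ref}})
      && \text{(Eq.~4)} \\
  \Delta E(\mathbf{u}, t; t_{\mathrm{ref}}) &=
      \frac{E(\mathbf{u}, t)}{E(\mathbf{u}, t_{\mathrm{ref}})}
      && \text{(Eq.~5)} \\
  \bar{E}_j(\mathbf{u})\, T_j &= \int_{t_j}^{t_j+T_j} E(\mathbf{u}, t_j)\,
      \Delta E(\mathbf{u}, t; t_j)\,\mathrm{d}t
      && \text{(Eq.~6)} \\
  \bar{E}_j(\mathbf{u}) &= E(\mathbf{u}, t_j)\,
      \underbrace{\frac{1}{T_j}\int_{t_j}^{t_j+T_j}
      \Delta E(\mathbf{u}, t; t_j)\,\mathrm{d}t}_{\Delta\bar{E}_j(\mathbf{u};\, t_j)}
      && \text{(Eq.~7)} \\
  E(\mathbf{u}, t_j) &= \frac{\bar{E}_j(\mathbf{u})}{\Delta\bar{E}_j(\mathbf{u}; t_j)}
      && \text{(Eq.~8)} \\
  \hat{E}(\mathbf{u}, t; t_j) &=
      \frac{\bar{E}_j(\mathbf{u})}{\Delta\bar{E}_j(\mathbf{u}; t_j)}\,
      \Delta E(\mathbf{u}, t; t_j)
      && \text{(Eq.~9)}
\end{align*}
```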
Approximation of the relative irradiance. The DVS events encode the relative irradiance in the log space. Thus, all of the events can be directly summed from the reference time tj to time t to approximate the log relative irradiance, which is then exponentiated to obtain the relative irradiance:
where each event ek=(uk, tk, pk) is subject to tkϵ[tj, t], and h is the mapping function that maps an event at position uk to the corresponding contrast value:
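A minimal sketch of this approximation for a single pixel, assuming a single pair of contrast values CON and COFF as in the calibration described later (all names are illustrative):

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch: approximate the relative irradiance of one pixel over
// [tRef, t] by summing the contrast contribution of each of its events
// (C_ON for ON events, C_OFF for OFF events) and exponentiating the sum.
struct PixelEvent {
    double t;      // triggering time
    int polarity;  // +1 (ON) or -1 (OFF)
};

double relativeIrradiance(const std::vector<PixelEvent>& pixelEvents,
                          double tRef, double t,
                          double cOn, double cOff)
{
    double logDelta = 0.0;
    for (const PixelEvent& e : pixelEvents) {
        if (e.t < tRef || e.t > t) continue;          // keep only events in [tRef, t]
        logDelta += (e.polarity > 0) ? cOn : cOff;    // h(e_k): event -> contrast value
    }
    return std::exp(logDelta);  // back from log space to a multiplicative factor
}
```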
Approximation of the average of irradiance. Plugging Eq. 2 into Eq. 3, the image intensity Ij(u) can be inversely mapped to corresponding average irradiance Ēj(u):
where f−1 is the inverse CRF. Here only pixel intensities from 1 to 254 are used for estimation, because the values 0 and 255 indicate that the exposure is beyond the dynamic range.
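A short sketch of this inverse mapping, assuming the calibrated inverse CRF is available as a 256-entry lookup table (names are illustrative):

```cpp
#include <array>
#include <optional>

// Illustrative sketch: invert the radiometric response for one pixel. The
// calibrated inverse CRF is assumed to be a 256-entry lookup table;
// saturated intensities (0 and 255) are rejected as unreliable.
std::optional<double> averageIrradiance(int intensity,
                                        double exposureTimeSec,
                                        const std::array<double, 256>& inverseCrf)
{
    if (intensity <= 0 || intensity >= 255) return std::nullopt;  // out of dynamic range
    return inverseCrf[static_cast<std::size_t>(intensity)] / exposureTimeSec;
}
```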
Approximation of the average of relative irradiance. A straightforward method to approximate the average involves summing up all relative irradiance at a fixed step and taking its average. But since the event noise is evenly distributed over time, this method gives an equal weight to the noise. Thus, the result will be biased by noise, as shown in FIG. 3B where
The average of relative irradiance over duration [tj; t] is given by:
where Nev is the total number of events. The reconstructed images using Eq. 13 are shown in
Irradiance reconstruction. After the above approximations, the irradiance at time t can be estimated by plugging Eq. 10, Eq. 12 and Eq. 13 into Eq. 9. As saturated pixels cannot give correct exposure, the latest unsaturated pixels are used to recover the reference irradiance and they are combined with the irradiance estimated from previous unsaturated pixels:
where Ê(u; t) is the output irradiance of the E-IE framework, tn is the exposure starting time of the previous unsaturated pixel, j>n, and n=0 indicates that the irradiance is estimated from the initial value E(u; 0), which is a high dynamic range (HDR) irradiance synthesized using [8]. Given the estimated irradiance Ê(u; t) from Eq. 14, the intensity images can be reconstructed using the camera response function (CRF):
where Î(u; t) is the image intensity of pixel u at time t, and Tj is the exposure time of image j. When the irradiance is accurately estimated, the reconstructed images will be clear and free of blur, and the frame rate of the reconstructed images could, in theory, be as high as the DVS's event rate. Thus, in addition to irradiance estimation, the image deblurring and high-rate frame reconstruction tasks are also completed.
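A minimal sketch of this reconstruction step, with the calibrated CRF passed in as a callable (names and interfaces are illustrative, not the actual implementation):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative sketch: rebuild an intensity image from estimated irradiance
// by re-applying the calibrated CRF (passed in as a callable) to the product
// of irradiance and exposure time.
template <typename Crf>
std::vector<std::uint8_t> reconstructFrame(const std::vector<double>& irradiance,
                                           double exposureTimeSec,
                                           Crf crf)  // crf: exposure H -> intensity in [0, 255]
{
    std::vector<std::uint8_t> image(irradiance.size());
    std::transform(irradiance.begin(), irradiance.end(), image.begin(),
                   [&](double e) {
                       const double h = e * exposureTimeSec;            // exposure H = E * T
                       const double i = std::clamp(crf(h), 0.0, 255.0); // keep within 8-bit range
                       return static_cast<std::uint8_t>(i + 0.5);
                   });
    return image;
}
```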
Next, the event-guided auto exposure (EG-AE) uses the estimated irradiance Ê (u; t) to compute the desired exposure time for image capturing.
The dynamic range of the active pixel sensor (APS) is generally low, i.e., <60 dB, and is unable to fully cover the lighting in a natural scene. Thus, vision-based systems working in natural scenes need to adjust the exposure time to make sure most pixels in the APS are immune from saturation. Using events from the DVS, the possible sensing range extends beyond 130 dB. Thus, the HDR sensing capability of the DVS can be leveraged to guide the exposure setting for APS capturing. To do that, the average of the estimated irradiance can be mapped to the middle value of the CRF using a proper desired intensity Id, i.e., Id=f(½(f−1(255)+f−1(0))). In this way, except for extreme irradiance distributions, the illumination of most scenes can be covered by the camera's dynamic range. Given the desired intensity Id, the desired exposure time is given by:
where Np is the pixel number in a region of interest (ROI) P. For most vision-based systems, the ROI can be set to the whole imaging plane.
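The following sketch illustrates one way such a computation could be arranged, assuming the relation H=Ē·T so that the mean estimated irradiance over the ROI is mapped to the desired intensity Id through the inverse CRF (names are illustrative):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Illustrative sketch: pick the exposure time so that the mean estimated
// irradiance over a region of interest P maps to the desired mid-range
// intensity I_d, using H = E_bar * T and the calibrated inverse CRF:
//   T_d = N_p * f^{-1}(I_d) / sum_{u in P} E_hat(u, t)
double desiredExposureTime(const std::vector<double>& roiIrradiance,   // E_hat(u, t) over the ROI
                           const std::array<double, 256>& inverseCrf,  // calibrated f^{-1}
                           int desiredIntensity)                       // I_d
{
    double sum = 0.0;
    for (double e : roiIrradiance) sum += e;
    if (sum <= 0.0) return 0.0;  // degenerate ROI; a caller would fall back to a default

    const double np = static_cast<double>(roiIrradiance.size());
    return np * inverseCrf[static_cast<std::size_t>(desiredIntensity)] / sum;
}
```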
In order to evaluate the methods of the present application, real-world data was recorded by DAVIS 346 event cameras.
Calibration of contrast values. To estimate the relative irradiance from a set of events, the contrast values CON and COFF in Eq. 11 need to be known. However, current methods as disclosed in articles [6, 42] estimate the contrast from pre-recorded data and thus suffer from noise. To remove the effect of noise, the method established in article [31] is introduced, which directly computes a relatively accurate contrast from the DVS settings using known hardware parameters:
where Kn=0.7 and Kp=0.7 are the back gate coefficients of the n and p FET transistors, C1/C2 is the capacitor ratio of the DVS (130/6 for the DAVIS 346 sensor), and Id, ION, and IOFF are the bias currents set by the user in a coarse-fine bias generator [9]. These currents can be computed with the jAER toolbox [3]. The contrast was set to CON=0.2609 and COFF=−0.2605, and the remaining settings were fixed in all experiments. As can be seen in Table 1, where lower values indicate better deblurring, the present application performed better than the methods in articles [33], [16] and [34] with respect to PIQE [39], SSEQ [22] and BRISQUE [26].
Calibration of CRF. Camera manufacturers design the CRF to be different for each kind of camera. Thus, the CRF needs to be calibrated and the inverse CRF needs to be derived from the calibrated result. The method of article [8] is used for calibrating the CRF, which requires a set of static images with different exposure times.
Dataset. Since current datasets do not include the details of hardware settings and radiometric calibration data, an evaluation dataset was recorded, which included different motions (e.g., shaking, random move) and scene illuminations (e.g., low lighting conditions, sunlight, artificial light).
Implementation details. All the experiments were conducted using a laptop with an Intel i7-10870H@2.2 GHz CPU. A synchronized stereo rig was constructed with identical hardware setups for two DAVIS 346 cameras. C++ software was used to implement the algorithms on a DV-Platform. The overall framework event rate of the embodiments exceeded 8 million events per second.
Irradiance reconstruction. In order to evaluate the effectiveness of the event-based irradiance estimation (E-IE) framework of the present application, the irradiance at the starting time of each exposure was first reconstructed and then used with the corresponding exposure time to reconstruct an image from the irradiance. When the irradiance estimation was accurate, the reconstructed images were clear. Otherwise, the estimation error caused the image to blur. The results of the present application were then compared with other state-of-the-art image deblurring methods, including the conventional method [33], the learning-based method [16], and the event-frame reconstruction method [34]. Qualitative results are shown in
From Table 1, it can be seen that the application (EG-AE) achieves the best score for all metrics. The other methods get suboptimal scores because the heavily blurred images hardly provide any useful information for reconstruction. From
Exposure Control. The present invention was compared with multiple EC methods, including manually adjusted exposure (ME), the built-in AE of the DAVIS, and the state-of-the-art I-AE method of article [38]. For ME, the exposure time was adjusted at the beginning of each experiment and then remained fixed. For I-AE, the non-linear version in article [38] was re-implemented with stable parameters. Comparisons were made in pairs, i.e., one DAVIS runs the EG-AE of the present invention while the other runs a conventional method for comparison. The EG-AE of the present application was tested under challenging illuminations that contain very bright lighting. Without a proper adjustment, images taken by the APS will be heavily over-exposed. Thus, these methods were first evaluated using the over-exposed rate: the average number of over-exposed pixels divided by the total number of pixels over the whole sequence. The results are shown in Table 2. The image quality was also evaluated, and the results are summarized in Table 3.
From Table 2 it can be seen that the EG-AE application significantly reduces the over-exposed rate. That means the EG-AE application can properly set the exposure time to alleviate saturation, which improves the image quality and helps the EG-AE get the best quality score in Table 3.
In the book scene illustrated in
In
Comparisons with off-the-shelf devices. To further evaluate the effectiveness of the new EG-AE method, tests of the method were carried out under an extreme illumination condition and compared with current state-of-the-art devices, including the HUAWEI Mate 30 [2], GoPro HERO 7 [1], and DAVIS 346 [5]. In these tests, the sensor was first covered with a book to generate a low lighting condition (<90 Lux), and then the book was moved away quickly to expose the sensor directly to sunlight (>3000 Lux). To fully evaluate the application, the snapshot mode of the DAVIS camera was used to take images. The first image was captured under the low lighting condition and the second image was captured after the covering book had been fully moved away. In this way, there was no chance to use image samples in between to estimate the irradiance. The results are shown in
The methods of the present application increase the imaging quality and thus may benefit many applications. Here are two examples.
Visual tag detection. The event-based irradiance estimation framework (E-IE) of the EG-AE method of the present application can improve visual tag detection because of its effectiveness for image deblurring. As shown in
Feature matching and tracking. The present methods can benefit feature matching and tracking due to their capability of substantially improving image quality. As shown in
The present invention was compared with conventional EC methods for feature tracking using multiple indoor trajectories. The results are summarized in Table 4. As can be seen, EG-AE achieves the most effective matching for all features in that table. That is because the EG-AE of the present invention can quickly adjust the exposure time, which preserves more gradient information for feature tracking.
The present invention provides a novel auto-exposure (AE) method that leverages the event camera's high dynamic range and low latency properties to adjust the exposure setting. The physical connections of the DVS and APS and other hardware properties are considered in the proposed event-based irradiance estimation (E-IE) framework, allowing exposure control, irradiance estimation, and image deblurring to be done efficiently. Extensive experiments have demonstrated that the methods of the present invention can robustly tackle challenging lighting variations and alleviate saturation. Besides, these methods substantially increase the image quality, thereby benefiting multiple downstream tasks.
The cited references in this application are incorporated herein by reference in their entirety and are as follows:
While the application is explained in relation to certain embodiments, it is to be understood that various modifications thereof will become apparent to those skilled in the art upon reading the specification. Therefore, it is to be understood that the application disclosed herein is intended to cover such modifications as fall within the scope of the appended claims.
This application is a U.S. National Stage Application under 35 U.S.C. § 371 of International Patent Application No. PCT/CN2022/114456, filed Aug. 24, 2022, and claims the benefit of priority under 35 U.S.C. Section 119(e) of U.S. Application No. 63/236,375, filed Aug. 24, 2021, all of which are incorporated herein by reference in their entireties. The International Application was published on Mar. 2, 2023 as International Publication No. WO 2023/025185 A1.
Filing Document | Filing Date | Country | Kind
--- | --- | --- | ---
PCT/CN2022/114456 | 8/24/2022 | WO | 

Number | Date | Country
--- | --- | ---
63/236,375 | Aug 2021 | US