This disclosure relates generally to light-field images, and more particularly, to processing light-field images using deep learning models.
The light-field camera has recently received increased attention. Based on digital processing of the captured light-field image, it can be used to recompute the image at a different focus point or from a different point of view. The light-field camera also finds application in estimating depth to three-dimensional objects that it images, possibly followed by three-dimensional reconstruction of those objects or of the entire three-dimensional scene.
Deep learning has also recently received increased attention. Much research has been directed to studying a multitude of deep learning architectures for solving a variety of classification problems. For image processing applications, a training set of images is typically used to train the deep learning model and, once trained, the model can be used to classify new images. However, selection of the training set and pre-processing of the images can be important aspects affecting the performance of the model. Furthermore, light-field images contain information that is different from the information in conventional two-dimensional images, yet to date there has not been much research on how to architect deep learning models to work with light-field images.
Thus, there is a need for better approaches to combine deep learning with light-field images.
The present disclosure overcomes the limitations of the prior art by masking light-field data to identify regions of interest before applying the data to deep learning models. In one approach, a light-field camera captures a light-field image of an object to be classified. The light-field image includes many views of the object taken simultaneously from different viewpoints. The light-field image is pre-processed, and the resulting data is provided as input to a deep learning model. The pre-processing includes determining and then applying masks that select regions of interest within the light-field data. In this way, less relevant data can be excluded from the deep learning model. Based on the masked data, the deep learning model produces a decision classifying the object.
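In outline, the approach above can be sketched as follows. This is only an illustrative sketch: `classify_with_masking`, `find_roi`, and the toy `model` are hypothetical stand-ins, not the disclosed implementations.

```python
import numpy as np

def classify_with_masking(light_field, find_mask, model):
    """Sketch of the pipeline: determine a region-of-interest mask,
    apply it to the light-field data, and classify the masked data."""
    mask = find_mask(light_field)      # e.g. derived from depth or pixel values
    masked = light_field * mask        # less relevant data is zeroed out
    return model(masked)               # the model's classification decision

# Toy stand-ins for illustration (the real model would be a deep network)
views = np.linspace(0.0, 1.0, 4 * 8 * 8).reshape(4, 8, 8)  # 4 views, 8x8 each
find_roi = lambda lf: (lf.mean(axis=0, keepdims=True) > 0.5).astype(lf.dtype)
model = lambda x: int(x.sum() > 0)     # placeholder binary "classifier"
decision = classify_with_masking(views, find_roi, model)
```

Because the mask multiplies the data before the model runs, everything outside the region of interest contributes nothing to the decision.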
The masking can be determined in different ways, for example based on depth information extracted from the light-field image or based on the pixel values within the light-field image. The masking can also be applied at different stages: to the entire light-field image at once, separately to each of the views, separately for different color channels, or to epipolar images generated from the light-field image, to give a few examples.
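The two mask-determination examples above can be sketched as simple thresholding operations; the depth range and brightness threshold here are illustrative values, not parameters from the disclosure.

```python
import numpy as np

def mask_from_depth(depth, near, far):
    """Keep only pixels whose estimated depth falls within a range of
    interest (a depth map can be extracted from the light-field image)."""
    return ((depth >= near) & (depth <= far)).astype(float)

def mask_from_pixels(view, threshold):
    """Keep only sufficiently bright pixels of a single view."""
    return (view > threshold).astype(float)

depth = np.array([[0.5, 2.0],
                  [1.2, 3.5]])
m = mask_from_depth(depth, near=1.0, far=3.0)  # selects 2.0 and 1.2
```

Either kind of mask can then be applied at any of the stages listed above.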
Other aspects include components, devices, systems, improvements, methods, processes, applications, computer readable mediums, and other technologies related to any of the above.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Embodiments of the disclosure have other advantages and features which will be more readily apparent from the following detailed description and the appended claims, when taken in conjunction with the examples in the accompanying drawings, in which:
The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
The system includes a light-field camera 110 and a deep-learning classifier 190. The classifier 190 in turn includes a pre-processing module 192 and a deep learning model 196. The deep-learning classifier 190 is typically implemented as a computer system with a processor and computer instructions. The light-field camera 110 captures light-field images 170 of objects 150. Each light-field image 170 contains many views of the object 150 taken simultaneously from different viewpoints.
The pre-processing module 192 receives the light-field images from the camera. It may do so through a standardized interface (not shown in
Now consider each of the components shown in
The secondary imaging array 214 may be referred to as a microimaging array. The secondary imaging array 214 and sensor array 280 together may be referred to as a light-field sensor module. In this example, the secondary imaging array 214 is a microlens array. Other examples of microimaging arrays 214 include arrays of pinholes and waveguide/channel arrays. The microimaging array 214 can be a rectangular array, hexagonal array or other types of arrays.
These components form two overlapping imaging subsystems. In the first imaging subsystem, the objective lens 212 forms an optical image 255 of the object 150 at the primary image plane IP. This imaging subsystem has a pupil plane, marked SP′ in
The bottom portion of
However, the different views 255A-D are separated in an interleaved fashion at the sensor plane, as shown in
Areas 1-9 are similarly arranged. Thus, the areas 1A-9A that make up view 170A are spread out across the composite light-field image 170, separated by portions of the other views 170B-D. Put another way, if the sensor is a rectangular array of individual sensor elements, the overall array can be divided into rectangular subarrays 271(1)-(9) of sensor elements (only one subarray 271(1) is shown by the dashed lines in
It should be noted that
The light-field image 170 is processed by the deep-learning classifier 190.
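The interleaved layout described above can be demultiplexed into its individual views before further processing. A minimal sketch, assuming each subarray is k × k sensor elements with exactly one pixel per view (the function name and layout are illustrative assumptions):

```python
import numpy as np

def extract_views(sensor_image, k):
    """Demultiplex an interleaved composite light-field image:
    each k x k subarray holds one pixel of each of the k*k views,
    so view (r, c) is every k-th pixel starting at offset (r, c)."""
    H, W = sensor_image.shape
    assert H % k == 0 and W % k == 0
    return {(r, c): sensor_image[r::k, c::k]
            for r in range(k) for c in range(k)}

# 2x2 views interleaved over a 4x4 sensor
sensor = np.array([[ 0,  1,  2,  3],
                   [ 4,  5,  6,  7],
                   [ 8,  9, 10, 11],
                   [12, 13, 14, 15]])
views = extract_views(sensor, 2)
# views[(0, 0)] collects the top-left pixel of every subarray
```

Strided slicing makes the demultiplexing a constant-time view of the data rather than a copy.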
Masking can be determined using different techniques, some examples of which are shown in
In
Masking can also be manually generated, for example by having a person outline the region of interest. Alternately, it can be automatically generated, for example by segmenting the views into different regions based on identified features. In yet another approach, masking may be generated by comparing a current image with a previous image and selecting only the differences.
In another aspect, the masks may be binary, taking on values of 1 or 0, as in the examples of
The masks can also be applied to different types of light-field data. In
Various other combinations are possible.
In
Different types of voting schema can be used. Majority voting, plurality voting and weighted voting are some examples. In weighted majority voting, the majority vote wins but the intermediate decisions are given unequal weights. In one approach, the weighting is related to the masking used to produce that intermediate decision. In addition, the voting schema does not have to output a binary decision (as with majority voting). Rather, the intermediate decisions can be combined to produce a continuous final decision (e.g., probability of outcome). The voting schema may also be machine-based and learned.
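The voting schema described above can be sketched directly; the particular decisions and weights below are illustrative only.

```python
def weighted_majority_vote(decisions, weights):
    """Combine intermediate per-mask decisions into a final decision.
    Each decision is 0 or 1; the weights may reflect, e.g., the
    masking used to produce each intermediate decision."""
    total = sum(w for d, w in zip(decisions, weights) if d == 1)
    return 1 if total > sum(weights) / 2 else 0

def soft_vote(decisions, weights):
    """Continuous variant: a weighted average read as a probability."""
    return sum(d * w for d, w in zip(decisions, weights)) / sum(weights)

final = weighted_majority_vote([1, 0, 1], [0.5, 0.3, 0.2])
prob = soft_vote([1, 0, 1], [0.5, 0.3, 0.2])   # approximately 0.7
```

A learned schema would replace these fixed rules with weights fitted from training data.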
Different deep learning architectures will be apparent. Deep learning architectures include multiple layers. Each layer typically includes a set of filters applied to a set of inputs (images or views in this case), and a non-linear step (e.g., a voting schema) which reduces the size of the output. When the filtering is a convolution between the inputs and the filters, the architecture is typically referred to as a convolutional neural network (CNN). When the network has only a few layers, it is a shallow network. When the number of layers is high, it is a deep learning network. Because of the multiple layers, a deep learning network is computationally and spatially expensive to implement. Masking the input image or the first layers drastically reduces the overall complexity of the network.
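The filter-plus-nonlinearity layer described above can be sketched as a naive 2D convolution followed by a ReLU. Masking the input zeroes whole regions, which a sparse implementation could skip entirely; the kernel, image, and mask here are illustrative.

```python
import numpy as np

def conv_relu(image, kernel):
    """One CNN-style layer: 2D 'valid' convolution followed by ReLU."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return np.maximum(out, 0.0)        # the non-linear step

image = np.ones((5, 5))
mask = np.zeros((5, 5))
mask[1:4, 1:4] = 1.0                   # keep only the central region
out = conv_relu(image * mask, np.ones((3, 3)))
```

Every multiply against a zeroed pixel contributes nothing, which is why masking the input or the first layers reduces the effective work of the network.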
In
For example, in
In the examples thus far, the masking is applied in the (image) domain. That is, an (x,y) mask is defined and applied to one of the views, which is a slice of the light-field data along an (x,y) plane.
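Masks can equally be defined on other slices of the four-dimensional light-field data. The sketch below contrasts an (x,y) view slice with an epipolar slice; the `L[u, v, y, x]` index layout is a hypothetical convention chosen for illustration.

```python
import numpy as np

# Light field indexed as L[u, v, y, x]: (u, v) selects the view,
# (y, x) selects the pixel within that view.
L = np.arange(2 * 2 * 4 * 4).reshape(2, 2, 4, 4)

def view_slice(L, u, v):
    """An (x, y) slice: a single view, the domain masked so far."""
    return L[u, v]

def epipolar_slice(L, y, v):
    """An (x, u) slice at fixed y and v: an epipolar image, to which
    masks can alternatively be applied."""
    return L[:, v, y, :]
```

The same masking machinery applies to either slice; only the plane through the 4D data changes.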
Notice also that the (channel) coordinate is re-ordered. All the red epipolar images are together, all the green ones are together and all the blue ones are together, suggesting that some reduction along the (channel) dimension is achieved by the deep learning model.
In
One channel is output in the intermediate image (as seen in
where L is the input (RGB) light-field image, i is the index for the color channel of the input image (i takes the values red, green and blue), j is the index for channels in layers after the input layer, w(u,v) are the weights of the (view) filter, and g is the rectified linear unit. As seen from the formula, the weights w of a channel j combine different color channels i. The output given by Eqn. 1 is passed into the remaining part of the deep learning architecture 1030.
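Since Eqn. 1 itself is not reproduced here, the following is one plausible reading of its verbal description: each output channel j weights every color channel i and view (u,v) by w, sums, and applies the rectified linear unit g. The array shapes and `einsum` index layout are assumptions for illustration.

```python
import numpy as np

def eqn1_layer(L, w):
    """Sketch of the described first layer: for each output channel j,
    weight every color channel i and view (u, v) of the light field by
    w[j, i, u, v], sum the contributions, and apply g = ReLU.
    L has shape (I, U, V, H, W): color, view row/col, pixel row/col."""
    pre = np.einsum('jiuv,iuvhw->jhw', w, L)   # combine colors and views
    return np.maximum(pre, 0.0)                # g, the rectified linear unit

I, U, V, H, W = 3, 2, 2, 4, 4
L = np.ones((I, U, V, H, W))
w = np.full((5, I, U, V), 1.0 / (I * U * V))   # 5 output channels j
out = eqn1_layer(L, w)
```

As in the formula, each output channel j mixes all color channels i, so the color dimension is collapsed in the first layer.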
Although the detailed description contains many specifics, these should not be construed as limiting the scope of the invention but merely as illustrating different examples. It should be appreciated that the scope of the disclosure includes other embodiments not discussed in detail above. Various other modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope as defined in the appended claims. Therefore, the scope of the invention should be determined by the appended claims and their legal equivalents.
Alternate embodiments are implemented in computer hardware, firmware, software, and/or combinations thereof. Implementations can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps can be performed by a programmable processor executing a program of instructions to perform functions by operating on input data and generating output. Embodiments can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits) and other forms of hardware.
U.S. Patent Documents

Number | Name | Date | Kind |
---|---|---|---|
20150035803 | Wassvik et al. | Feb 2015 | A1 |
20150065803 | Douglas | Mar 2015 | A1 |
20160180195 | Martinson | Jun 2016 | A1 |
20170200067 | Zhou et al. | Jul 2017 | A1 |
20170256059 | Tosic et al. | Sep 2017 | A1 |
Foreign Patent Documents

Number | Date | Country |
---|---|---|
107392234 | Nov 2017 | CN |
2014-049118 | Mar 2014 | JP |
2017-146957 | Aug 2017 | JP |
Other Publications

Entry |
---|
Wu et al. “Light Field Image Processing: An Overview.” IEEE Journal of Selected Topics in Signal Processing, vol. 11, No. 7, Oct. 2017, pp. 926-954 (Year: 2017). |
Raghavendra et al. “Combining Iris and Periocular Recognition using Light Field Camera.” 2nd IAPR Asian Conference on Pattern Recognition, Nov. 5, 2013, pp. 155-159 (Year: 2013). |
Wang, T. et al., “A 4D Light-Field Dataset and CNN Architectures for Material Recognition,” European Conference on Computer Vision, 2016, 16 pages. |
European Patent Office, Extended European Search Report and Opinion, EP Patent Application No. 19160959.3, Jul. 24, 2019, 10 pages. |
Zhu, H. et al., “Light field imaging: models, calibrations, reconstructions, and applications,” Frontiers of Information Technology & Electronic Engineering, vol. 18, No. 9, Oct. 27, 2017, pp. 1236-1249. |
Japanese Patent Office, Notice of Reasons for Refusal, JP Patent Application No. 2019-026989, dated Feb. 12, 2020, 14 pages. |
Number | Date | Country |
---|---|---|
20190279051 A1 | Sep 2019 | US |