Systems, methods, and media for reconstructing a space-time volume from a coded image are provided.
Cameras face a fundamental trade-off between spatial resolution and temporal resolution. For example, many digital still cameras can capture images with high spatial resolution, while many high-speed video cameras suffer from low spatial resolution. This limitation is due in many instances to hardware factors such as the readout and analog-to-digital (A/D) conversion time of image sensors. Although it is possible to increase readout throughput by introducing parallel A/D converters and frame buffers, doing so often requires more transistors per pixel, which lowers the fill factor and increases the cost of such image sensors. As a compromise, many current camera manufacturers implement a “thin-out” mode, which directly trades off spatial resolution for higher temporal resolution, thereby degrading the image quality.
Accordingly, new mechanisms for providing improved temporal resolution without sacrificing spatial resolution are desirable.
Systems, methods, and media for reconstructing a space-time volume from a coded image are provided. In accordance with some embodiments, systems for reconstructing a space-time volume from a coded image are provided, the systems comprising: an image sensor that outputs image data; and at least one processor that: causes a projection of the space-time volume to be captured in a single image of the image data in accordance with a coded shutter function; receives the image data; and performs a reconstruction process on the image data to provide a space-time volume corresponding to the image data.
In accordance with some embodiments, methods for reconstructing a space-time volume from a coded image are provided, the methods comprising: causing a projection of the space-time volume to be captured by an image sensor in a single image of image data in accordance with a coded shutter function using a hardware processor; receiving the image data using a hardware processor; and performing a reconstruction process on the image data to provide a space-time volume corresponding to the image data using a hardware processor.
In accordance with some embodiments, non-transitory computer-readable media containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for reconstructing a space-time volume from a coded image are provided, the method comprising: causing a projection of the space-time volume to be captured in a single image of image data in accordance with a coded shutter function; receiving the image data; and performing a reconstruction process on the image data to provide a space-time volume corresponding to the image data.
Systems, methods, and media for reconstructing a space-time volume from a coded image are provided. In some embodiments, these systems, methods, and media can provide improved temporal resolution without sacrificing spatial resolution in a captured video.
In accordance with some embodiments, a video can be produced by reconstructing a space-time volume E from a single coded image I captured using a per-pixel coded shutter function S which defines how pixels of a camera sensor capture the coded image I.
In terms of the space-time volume E and the coded shutter function S, the coded image I can be defined as shown in equation (1):
$$I(x, y) = \sum_{t=1}^{N} S(x, y, t) \cdot E(x, y, t), \qquad (1)$$
where x and y correspond to the two spatial dimensions of an M×M pixel neighborhood of a camera sensor, t indexes the N intervals within one integration time of the camera sensor, and the resolution of this space-time volume E is M×M×N. Although a neighborhood of a camera sensor is described herein as being square (M×M) for simplicity and consistency, in some embodiments, a neighborhood need not be square and can be any suitable shape.
Equation (1) can also be written in matrix form as I=SE, where I (the observations) and E (the unknowns) are vectors with M×M and M×M×N elements, respectively, and S is a matrix with M×M rows and M×M×N columns. Because the number of observations (M×M) is significantly lower than the number of unknowns (M×M×N), this is an under-determined system. In some embodiments, this system can be solved and the unknown signal E can be recovered if the signal E is sparse and the sampling satisfies the restricted isometry property:
$$I = SE = SD\alpha, \qquad (2)$$
where D is a basis in which E is sparse, and α is the sparse representation of E.
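Purely as an illustration of equations (1) and (2), the following sketch (in Python with NumPy; the patch size, number of intervals, and array names are illustrative assumptions and not part of any embodiment described above) simulates a coded exposure of a single M×M pixel neighborhood and verifies that the per-pixel summation of equation (1) matches the matrix form I=SE.

```python
import numpy as np

M, N = 7, 9                           # illustrative patch size and number of time intervals
rng = np.random.default_rng(0)

E = rng.random((M, M, N))             # space-time volume E(x, y, t) for one M x M neighborhood
S = rng.integers(0, 2, (M, M, N))     # a binary per-pixel shutter function S(x, y, t)

# Equation (1): each pixel integrates the volume through its own shutter code.
I = (S * E).sum(axis=2)               # coded image, shape (M, M)

# Matrix form I = S E: M*M observations, M*M*N unknowns (under-determined).
E_vec = E.reshape(-1)                             # vector of M*M*N unknowns
S_mat = np.zeros((M * M, M * M * N))
S_rows = S.reshape(M * M, N)
for p in range(M * M):                            # row p samples only pixel p's time slice
    S_mat[p, p * N:(p + 1) * N] = S_rows[p]
assert np.allclose(S_mat @ E_vec, I.reshape(-1))  # both forms give the same coded image
```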
Turning to FIG. 1, an example 100 of a process for reconstructing a space-time volume from a coded image in accordance with some embodiments is shown. As illustrated, after process 100 begins at 102, a coded image can be captured at 104 in accordance with a coded shutter function.
Any suitable coded shutter function can be used to capture an image at 104, and the shutter function used can have any suitable attributes. For example, in some embodiments, the shutter function can have the attribute of being a binary shutter function (i.e., S(x, y, t) ∈ {0, 1}) wherein, at every time interval t, the shutter is either integrating light (on) or not (off). As another example, in some embodiments, the shutter function can have the attribute of having only one continuous exposure period (or “bump”) for each pixel during a camera sensor's integration time. As yet another example, in some embodiments, the shutter function can have the attribute of having one or more bump lengths (i.e., durations of exposure) measured in intervals t. As still another example, in some embodiments, the shutter function can have the attribute of having bumps that start at periodic or random times. As a further example, in some embodiments, the shutter function can have the attribute of having groups of pixels having the same start time based on location (e.g., in the same row) in a camera sensor. As a still further example, in some embodiments, the shutter function can have the attribute that at least one pixel of each M×M pixel neighborhood of a camera sensor is sampled at each interval during the camera sensor's integration time.
In some embodiments, a coded shutter function can include a combination of such attributes. For example, in some embodiments, a coded shutter function can be a binary shutter function, can have only one continuous exposure period (or “bump”) for each pixel during a camera sensor's integration time, can have only one bump length, can have bumps that start at random times, and can have the attribute that at least one pixel of each M×M pixel neighborhood of a camera sensor is sampled at each interval during the camera sensor's integration time.
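As one hypothetical illustration of a shutter function combining these attributes, the following sketch (Python/NumPy; the helper names random_single_bump_shutter and covers_every_interval, and the example values of M, N, and the bump length, are assumptions made only for this example) assigns each pixel of an M×M neighborhood a single continuous bump of fixed length starting at a pseudo-random interval, and redraws the assignment until every interval is sampled by at least one pixel.

```python
import numpy as np

def random_single_bump_shutter(M, N, bump_len, rng):
    """Binary shutter with one continuous bump of length bump_len per pixel,
    each bump starting at a pseudo-random interval within the integration time."""
    S = np.zeros((M, M, N), dtype=np.uint8)
    starts = rng.integers(0, N - bump_len + 1, size=(M, M))
    for x in range(M):
        for y in range(M):
            S[x, y, starts[x, y]:starts[x, y] + bump_len] = 1
    return S

def covers_every_interval(S):
    """True if, at every interval t, at least one pixel of the neighborhood is on."""
    return bool(S.any(axis=(0, 1)).all())

rng = np.random.default_rng(0)
S = random_single_bump_shutter(M=7, N=36, bump_len=4, rng=rng)
while not covers_every_interval(S):       # redraw until every interval is sampled somewhere
    S = random_single_bump_shutter(M=7, N=36, bump_len=4, rng=rng)
```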
A process 200 for generating such a coded shutter function in accordance with some embodiments is illustrated in FIG. 2.
As shown, after process 200 begins at 202, the process can set a first bump length at 204. Any suitable bump length can be set as the first bump length. For example, in some embodiments, the first bump length can be set to one interval t.
Next, at 206, the process can select the first camera sensor pixel. Any suitable pixel can be selected as the first camera sensor pixel. For example, the camera sensor pixel with the lowest set of coordinate values can be set as the first camera sensor pixel.
Then, at 208, process 200 can randomly select (or pseudo-randomly select) a start time during the integration time of the camera's sensor for the selected pixel and assign the bump length and start time to the pixel. At 210, it can be determined if the selected pixel is the last pixel. If not, then process 200 can select the next pixel (using any suitable technique) at 212 and loop back to 208.
Otherwise, process 200 can next select a first M×M pixel neighborhood at 214. This neighborhood can be selected in any suitable manner. For example, a first M×M pixel neighborhood can be selected as the M×M pixel neighborhood with the lowest set of coordinates.
At 216, the process can then determine if at least one pixel in the selected neighborhood was sampled at each time t. This determination can be made in any suitable manner. For example, in some embodiments, the process can loop through each time t and determine if a pixel in the neighborhood has a bump that occurs during that time t. If no pixel in the neighborhood is determined to have a bump during the time t, then the neighborhood can be determined as not having at least one pixel being sampled at each time t and process 200 can loop back to 206.
Otherwise, the process can determine if the current neighborhood is the last neighborhood at 218. This determination can be made in any suitable manner. For example, in some embodiments, the current neighborhood can be determined as being the last neighborhood if it has the highest coordinate pair of all of the neighborhoods. If it is determined that the current neighborhood is not the last neighborhood, then process 200 can select the next neighborhood at 220 and loop back to 216.
Otherwise, at 222, process 200 can next simulate image capture using the bump length and start time assigned to each pixel. Image capture can be simulated in any suitable manner. For example, in some embodiments, image capture can be simulated using real high-speed video data. Next, at 224, reconstruction of the M×M×N sub-volumes and averaging of the sub-volumes to provide a single volume can be performed as described in connection with 106 and 108 of FIG. 1. Then, at 226, a peak signal-to-noise ratio (PSNR) for the reconstruction can be calculated.
At 228, process 200 can determine if the current bump length is the last bump length to be checked. This can be determined in any suitable manner. For example, when the bump length is equal to the camera sensor's integration time, the bump length can be determined to be the last bump length. If the bump length is determined to not be the last bump length, then process 200 can select the next bump length at 230 and loop back to 206. The next bump length can be selected in any suitable manner. For example, in some embodiments, the next bump length can be set to the previous bump length plus one interval t.
Otherwise, the bump length and starting time assignments with the best PSNR can be selected as the coded shutter function at 232. The best PSNR can be selected on any suitable basis. For example, in some embodiments, the best PSNR can be selected as the highest PSNR value determined in the presence of noise similar to anticipated camera noise.
Finally, once the bump length and starting time assignments with the best PSNR are selected as the coded shutter function, process 200 can terminate at 234.
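The following sketch outlines the search loop of process 200 for a single M×M neighborhood, reusing the random_single_bump_shutter and covers_every_interval helpers from the sketch above. The simulate_capture and reconstruct callables stand in for the simulation at 222 and the reconstruction/averaging at 224 (described in connection with 106 and 108 of FIG. 1); they, the psnr helper, and all parameter values are illustrative assumptions rather than a definitive implementation.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio between ground-truth video data and a reconstruction."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def search_shutter_function(E_train, M, N, simulate_capture, reconstruct, rng):
    """Sweep bump lengths, assign pseudo-random start times, simulate a coded capture
    of high-speed video data, reconstruct, and keep the assignment with the best PSNR."""
    best_score, best_shutter = -np.inf, None
    for bump_len in range(1, N + 1):                          # 204, 228, 230: sweep bump lengths
        S = random_single_bump_shutter(M, N, bump_len, rng)   # 206-212: random start times
        while not covers_every_interval(S):                   # 214-220: coverage check; redraw if needed
            S = random_single_bump_shutter(M, N, bump_len, rng)
        I = simulate_capture(S, E_train)                      # 222: simulate the coded image
        E_hat = reconstruct(I, S)                             # 224: reconstruct and average sub-volumes
        score = psnr(E_train, E_hat)                          # 226: evaluate reconstruction quality
        if score > best_score:
            best_score, best_shutter = score, S
    return best_shutter                                       # 232: assignment with the best PSNR
```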
Referring back to FIG. 1, at 106, a reconstruction process can be performed on the captured coded image. For example, in some embodiments, for each M×M pixel patch of the coded image, a sparse representation $\hat{\alpha}$ of the corresponding space-time sub-volume can be estimated by solving the following approximation problem:
$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad \|SD\alpha - I\|_2^2 < \varepsilon, \qquad (3)$$
where:
α is a sparse representation of E;
S is a matrix of the shutter function;
D is an over-complete dictionary;
I is a vector of the captured coded image; and
ε is a tolerance bounding the reconstruction error. Any suitable mechanism can be used to solve this approximation problem. For example, in accordance with some embodiments, the orthogonal matching pursuit (OMP) algorithm can be used.
Once $\hat{\alpha}$ has been found, the space-time volume can be computed as $\hat{E} = D\hat{\alpha}$.
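As one possible way of solving the approximation problem of equation (3), the following sketch implements a generic orthogonal matching pursuit in Python/NumPy, operating on the effective sensing matrix A = SD (the shutter matrix applied to the dictionary) for one vectorized coded patch. The function name, stopping parameters, and the commented usage lines are assumptions for illustration only.

```python
import numpy as np

def omp(A, y, max_atoms, tol):
    """Greedy orthogonal matching pursuit: find a sparse alpha with ||A @ alpha - y||^2 < tol."""
    residual = y.astype(float).copy()
    support = []
    alpha = np.zeros(A.shape[1])
    while len(support) < max_atoms and residual @ residual > tol:
        # Pick the dictionary atom most correlated with the current residual.
        k = int(np.argmax(np.abs(A.T @ residual)))
        if k in support:                        # no new atom helps; stop early
            break
        support.append(k)
        # Re-fit the coefficients on the current support by least squares.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coeffs
        residual = y - A[:, support] @ coeffs
    return alpha

# Hypothetical usage for one M x M patch, with S_mat and D as defined above:
# alpha_hat = omp(S_mat @ D, I_vec, max_atoms=10, tol=1e-6)
# E_hat = (D @ alpha_hat).reshape(M, M, N)     # recovered space-time sub-volume
```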
Any suitable over-complete dictionary D can be used in some embodiments, and such a dictionary can be formed in any suitable manner. For example, in accordance with some embodiments, an over-complete dictionary for sparsely expressing target video volumes can be built from a large collection of natural video data. Such an over-complete dictionary can be trained from patches of natural scenes in a training data set using the K-SVD algorithm, as described in Aharon et al., “K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, November 2006, which is hereby incorporated by reference herein in its entirety. Such training can occur any suitable number of times (such as only once) and can occur at any suitable point(s) in time.
Any suitable number of videos of any suitable type can be used to train the dictionary in some embodiments. For example, in some embodiments, a random selection of 20 video sequences with frame rates close to a target frame rate (e.g., 300 fps) can be used. To add variability to the data set, spatial rotations can be performed on the sequences, and the sequences can be used for training in both their forward (i.e., normal playback) and backward (i.e., reverse playback) directions, in some embodiments. Any suitable rotations can be performed in some embodiments. For example, in some embodiments, rotations of 0, 45, 90, 135, 180, 225, 270, and 315 degrees can be performed. Any suitable number of basis elements (e.g., 5000) can be extracted from each sequence in some embodiments. As a result, the learned dictionary can capture various features, such as shifting edges in various orientations, in some embodiments.
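Purely as an illustration of this data augmentation and patch extraction (not of the K-SVD training itself, which is described in Aharon et al.), the following sketch rotates a high-speed sequence in 45-degree increments, uses it in both playback directions, and extracts random M×M×N training patches; the helper names and the use of scipy.ndimage.rotate are assumptions made for this example.

```python
import numpy as np
from scipy.ndimage import rotate

def augmented_sequences(video):
    """Yield rotated and time-reversed copies of a (T, H, W) high-speed sequence."""
    for angle in (0, 45, 90, 135, 180, 225, 270, 315):
        rotated = rotate(video, angle, axes=(1, 2), reshape=False, order=1)
        yield rotated          # forward (normal playback) direction
        yield rotated[::-1]    # backward (reverse playback) direction

def sample_patches(video, M, N, count, rng):
    """Randomly extract count M x M x N space-time patches, vectorized for dictionary training."""
    T, H, W = video.shape
    patches = np.empty((count, M * M * N))
    for i in range(count):
        t = rng.integers(0, T - N + 1)
        r = rng.integers(0, H - M + 1)
        c = rng.integers(0, W - M + 1)
        patch = video[t:t + N, r:r + M, c:c + M]                  # (N, M, M) block
        patches[i] = np.transpose(patch, (1, 2, 0)).reshape(-1)   # reorder to (x, y, t) and vectorize
    return patches
```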
After the reconstruction process has been performed for all patch positions, the overlapping reconstructed patches can be averaged and the full space-time volume obtained at 108, and process 100 can then terminate at 110.
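A minimal sketch of the averaging at 108, assuming the reconstructed M×M×N sub-volumes and their top-left patch positions are available (the function and argument names are illustrative), accumulates the overlapping patches and divides by the per-voxel overlap count:

```python
import numpy as np

def average_overlapping_patches(patches, positions, H, W, M, N):
    """Average reconstructed M x M x N sub-volumes placed at (row, col) positions
    into a single H x W x N space-time volume."""
    volume = np.zeros((H, W, N))
    counts = np.zeros((H, W, N))
    for patch, (r, c) in zip(patches, positions):
        volume[r:r + M, c:c + M, :] += patch
        counts[r:r + M, c:c + M, :] += 1.0
    return volume / np.maximum(counts, 1.0)   # avoid division by zero where no patch was placed
```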
The resulting space-time volume video can then be used in any suitable manner. For example, this video can be presented on a display, can be stored, can be analyzed, etc.
Turning to FIG. 3, an example of hardware that can be used to capture a coded image in accordance with some embodiments is shown.
The scene can be imaged onto a virtual image plane 304 using objective lens 302. Objective lens 302 can be any suitable lens, such as an objective lens with a focal length of 25 mm, for example. The virtual image can then be re-focused onto an image plane of LCoS chip 312 via relay lenses 306 and 310 and polarizing beam splitter 308. LCoS chip 312 can be any suitable LCoS chip, such as part number SXGA-3DM from Forth Dimension Displays Ltd. of Birmingham, UK. Relay lenses 306 and 310 can be any suitable lenses, such as relay lenses with focal lengths of 100 mm, for example. Polarizing beam splitter 308 can be any suitable polarizing beam splitter.
The image formed on the image plane of LCoS chip 312 can be polarized according to the shutter function and reflected back to polarizing beam splitter 308, which can reflect the image through relay lens 314, which in turn can focus the image onto image sensor 316. Relay lens 314 can be any suitable relay lens, such as a relay lens with a focal length of 100 mm, for example. Image sensor 316 can be any suitable image sensor, such as a Point Grey Grasshopper sensor from Point Grey Research Inc. of Richmond, BC, Canada.
As stated above, the virtual image can be focused on both the image plane of the LCoS chip and the image sensor, thereby enabling per-pixel alignment between the pixels of the LCoS chip and the pixels of the image sensor. A trigger signal from the LCoS chip into computer 318 can be used to temporally synchronize the LCoS chip and the image sensor. The LCoS chip can be run at any suitable frequency. For example, in some embodiments, the LCoS chip can be run at 9-18 times the frame-rate of the image sensor.
As an alternative to using an LCoS chip to perform a shutter function, the shutter function can be performed by pixel-wise control of the reset and readout of the pixels in an image sensor 416 as shown in FIG. 4.
Computer 318 can be used to perform the functions described above and any additional or alternative function(s). For example, computer 318 can be used to perform the functions described above in connection with FIGS. 1 and 2.
In some embodiments, any suitable computer readable media can be used for storing instructions for performing the processes described herein. For example, in some embodiments, computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention, which is only limited by the claims which follow. Features of the disclosed embodiments can be combined and rearranged in various ways.
This application is a continuation of and claims the benefit under 35 U.S.C. § 120 of U.S. patent application Ser. No. 15/405,962, now U.S. Pat. No. 9,979,945, titled “SYSTEMS, METHODS, AND MEDIA FOR RECONSTRUCTING A SPACE-TIME VOLUME FROM A CODED IMAGE”, filed on Jan. 13, 2017, which is a continuation of and claims the benefit under 35 U.S.C. § 120 of U.S. patent application Ser. No. 14/001,139, titled “SYSTEMS, METHODS, AND MEDIA FOR RECONSTRUCTING A SPACE-TIME VOLUME FROM A CODED IMAGE”, filed on Mar. 24, 2014, which is a U.S. National Stage Application under 35 U.S.C. § 371, based on International Application No. PCT/US2012/026816, filed on Feb. 27, 2012, which claims the benefit of U.S. Provisional Patent Application No. 61/446,970, filed Feb. 25, 2011. Each of these applications is hereby incorporated by reference herein in its entirety.
Entry |
---|
Gil Bub, Matthias Tecza, Michiel Helmes, Peter Lee & Peter Kohl, Temporal pixel multiplexing for simultaneous high-speed, high-resolution imaging, Published online Feb. 14, 2010, Nature Methods, vol. 7, No. 3., pp. 209-213. |
U.S. Appl. No. 13/399,222, filed Feb. 17, 2012, Kaizu et al. |
U.S. Appl. No. 13/452,977, filed Apr. 23, 2012, Jo et al. |
U.S. Appl. No. 13/457,774, filed Apr. 27, 2012, Kaizu et al. |
U.S. Appl. No. 13/487,682, filed Jun. 4, 2012, Kaizu et al. |
U.S. Appl. No. 13/504,905, filed Dec. 7, 2012, Gu et al. |
U.S. Appl. No. 13/559,720, filed Jul. 27, 2012, Kaizu et al. |
U.S. Appl. No. 13/565,202, filed Aug. 2, 2012, Takeuchi. |
U.S. Appl. No. 13/983,672, filed Aug. 5, 2013, Mitsunaga. |
U.S. Appl. No. 14/000,647, filed Aug. 21, 2013, Kaizu et al. |
U.S. Appl. No. 14/001,139, filed Mar. 24, 2014, Hitomi et al. |
U.S. Appl. No. 14/131,221, filed Mar. 21, 2014, Kasai et al. |
U.S. Appl. No. 14/238,216, filed Feb. 11, 2014, Mitsunaga. |
U.S. Appl. No. 14/319,212, filed Jun. 30, 2014, Kaizu et al. |
U.S. Appl. No. 14/326,963, filed Jul. 9, 2014, Kaizu et al. |
U.S. Appl. No. 14/472,153, filed Aug. 28, 2014, Jo et al. |
U.S. Appl. No. 14/652,397, filed Jun. 15, 2015, Gupta et al. |
U.S. Appl. No. 14/816,976, filed Aug. 3, 2015, Gu et al. |
U.S. Appl. No. 15/143,336, filed Apr. 29, 2016, Mitsunaga. |
U.S. Appl. No. 15/302,670, filed Oct. 7, 2016, Jo et al. |
U.S. Appl. No. 15/405,962, filed Jan. 13, 2017, Hitomi et al. |
U.S. Appl. No. 15/452,420, filed Mar. 7, 2017, Gupta et al. |
International Search Report and Written Opinion dated May 23, 2012 in connection with International Application No. PCT/US2012/026816. |
International Preliminary Report on Patentability dated Sep. 6, 2013 in connection with International Application No. PCT/US2012/026816. |
Number | Date | Country |
---|---|---|
20180234672 A1 | Aug 2018 | US |

Number | Date | Country |
---|---|---|
61446970 | Feb 2011 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 15405962 | Jan 2017 | US |
Child | 15950297 | | US |
Parent | 14001139 | | US |
Child | 15405962 | | US |