1. Field of the Invention
The present invention relates to spatial multiplexing cameras, and more particularly, to video compressive sensing for spatial multiplexing cameras.
2. Brief Description of the Related Art
Compressive sensing (CS) enables one to sample well below the Nyquist rate, while still enabling the recovery of signals that admit a sparse representation in some basis. See, E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, pp. 489-509, February 2006; D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, pp. 1289-1306, April 2006; and U.S. Pat. No. 7,271,747. Since many natural (and artificial) signals exhibit sparsity, CS has the potential to reduce the sampling rates and costs of corresponding devices in numerous applications.
Compressive sensing deals with the recovery of a signal vector x∈ℝN from M&lt;N non-adaptive linear measurements
y=Φx+z, (1)
where Φ∈ℝM×N is the sensing matrix and z represents measurement noise. Estimating the signal x from the compressive measurements y is ill-posed, in general, since the (noiseless) system of equations y=Φx is underdetermined. Nevertheless, a fundamental result from CS theory states that the signal vector x can be recovered stably from
M˜K log(N/K) (2)
measurements if: i) the signal x admits a K-sparse representation s=ΨTx in an orthonormal basis Ψ, and ii) the matrix ΦΨ satisfies the restricted isometry property (RIP). For example, if the entries of the matrix Φ are i.i.d. zero mean (sub-)Gaussian distributed, then ΦΨ is known to satisfy the RIP with overwhelming probability. Furthermore, any K-sparse signal x can be recovered stably from the noisy measurements y, with M satisfying (2), by solving a convex optimization problem such as
minimize ∥ΨTx∥1 subject to ∥y−Φx∥2≦ε (P1)
where (•)T denotes matrix transposition and ε controls the accuracy of the estimate.
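By way of illustration, (P1) can be prototyped with an off-the-shelf convex solver. The following is a minimal sketch, not the method of the invention itself; it assumes an i.i.d. Gaussian Φ and a DCT basis standing in for a generic Ψ, and uses the CVXPY package. All sizes are illustrative.

```python
import numpy as np
import cvxpy as cp
from scipy.fft import idct

rng = np.random.default_rng(0)
N, M, K = 256, 80, 8                            # signal length, measurements, sparsity

Psi = idct(np.eye(N), axis=0, norm='ortho')     # orthonormal DCT basis standing in for Psi
s = np.zeros(N)
s[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x_true = Psi @ s                                # K-sparse signal in the basis Psi

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # i.i.d. Gaussian sensing matrix
z = 1e-3 * rng.standard_normal(M)               # measurement noise
y = Phi @ x_true + z

# (P1): minimize ||Psi^T x||_1 subject to ||y - Phi x||_2 <= eps
x = cp.Variable(N)
eps = 1.5 * np.linalg.norm(z)
cp.Problem(cp.Minimize(cp.norm(Psi.T @ x, 1)),
           [cp.norm(y - Phi @ x, 2) <= eps]).solve()
print(np.linalg.norm(x.value - x_true) / np.linalg.norm(x_true))  # relative recovery error
```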
The single-pixel camera (SPC), the flexible voxels camera, and the P2C2 camera are practical imaging architectures that rely on the theory of CS. See, M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag., vol. 25, pp. 83-91, March 2008; M. Gupta, A. Agrawal, A. Veeraraghavan, and S. Narasimhan, “Flexible voxels for motion-aware videography,” in Euro. Conf. Comp. Vision, (Crete, Greece), September 2010; and D. Reddy, A. Veeraraghavan, and R. Chellappa, “P2C2: Programmable pixel compressive camera for high speed imaging,” in IEEE Conf. Comp. Vision and Pattern Recog, (Colorado Springs, CO, USA), June 2011, and U.S. Patent Application Publication No. 2006/0239336.
Spatial-multiplexing cameras (SMCs) are practical imaging architectures that build upon the ideas of CS. Such cameras employ a spatial light modulator, e.g., a digital micro-mirror device (DMD) or liquid crystal on silicon (LCOS), to optically calculate a series of linear projections of a scene x by implementing the sensing process in (1) above using pseudo-random patterns that ultimately determine the sensing matrix Φ. A prominent example of an SMC architecture is the single-pixel camera (SPC); its main feature is the ability to acquire images by using only a single sensor element (i.e., a single pixel) and by taking significantly fewer measurements than the number of pixels of the scene to be recovered. Since SMCs rely on only a few sensor elements, they can operate at wavelengths where corresponding full-frame sensors are too expensive. In the recovery stage the image x is recovered from the compressive measurements collected in y. In practice, recovery is performed either by using (P1) above, total variation (TV)-based convex optimization, or greedy algorithms.
One approach for video-CS for SMC architectures relies on the observation that perception of motion is heavily dependent on the spatial resolution of the video. Specifically, for a given scene, reducing its spatial resolution lowers the error caused by a static scene assumption. Simultaneously, decreasing the spatial resolution reduces the dimensionality of the individual video frames. Both properties build the foundation of the multi-scale recovery approach proposed in J. Y. Park and M. B. Wakin, “A multiscale framework for compressive sensing of video,” in Pict. Coding Symp., (Chicago, Ill., USA), May 2009, where several compressive measurements are acquired at multiple scales for each video frame. The recovery of the video at coarse scales (small spatial resolution) is used to estimate motion, which is then used to boost the recovery at finer scales (high spatial resolution). The key drawback of this approach is the fact that it relies on the assumption that each frame of the video remains static during the acquisition of CS measurements at various scales. For scenes violating this assumption—as is the case in virtually all real-world situations—this approach results in poor recovery quality.
Another recovery method was described in D. Reddy, A. Veeraraghavan, and R. Chellappa, “P2C2: Programmable pixel compressive camera for high speed imaging,” in IEEE Conf. Comp. Vision and Pattern Recog., (Colorado Springs, CO, USA), June 2011 for the P2C2 camera, which differs considerably from SMC architectures in that it performs temporal multiplexing (instead of spatial multiplexing) with the aid of a full-frame sensor and a per-pixel shutter. The recovery of videos from the P2C2 camera is achieved by using the optical flow between consecutive frames of the video. The implementation of the recovery procedure, however, is tightly coupled with that imaging architecture, which inhibits its use with SMC architectures.
The present invention is a compressive-sensing (CS)-based multi-scale video recovery method and apparatus for scenes acquired by spatial multiplexing cameras (SMCs). The invention includes a design of a new class of sensing matrices and an optical-flow-based video reconstruction algorithm. In particular, the invention includes multi-scale sensing (MSS) matrices that i) exhibit no noise enhancement when performing least-squares estimation at a lower spatial resolution and ii) preserve information about high spatial frequencies to enable recovery of the high-resolution scene. It further includes an MSS matrix having a fast transform, which enables the computation of instantaneous low-resolution images of the scene at low computational cost. The preview computation supports a large number of novel applications for SMC-based devices, such as providing a digital viewfinder, enabling human-camera interaction, or triggering adaptive sensing strategies. Finally, the present framework, referred to herein as CS-MUVI, is the first video CS algorithm for the SPC that works well for scenes with fast and complex motion.
The performance degradation of recovery of time-varying scenes caused by violating the static-scene assumption of conventional systems and methods is severe, even at moderate levels of motion. The present compressive-sensing strategy for SMC architectures overcomes the static-scene assumption. The present system and method are illustrated in the accompanying figures and described in detail below.
In a preferred embodiment, the present invention is a method for video compressive sensing for spatial multiplexing cameras. The method comprises the steps of sensing a time-varying scene with a spatial multiplexing camera, computing a least-squares estimate of a sensed scene, generating a low-resolution preview video of said sensed scene using the computed least-squares estimate of said sensed scene, estimating an optical flow of said time-varying scene using said low-resolution preview video, and recovering a full-resolution video of said sensed time-varying scene using sparse signal recovery algorithms. A generated low-resolution preview video may be displayed on a display. The method may further comprise displaying a recovered full-resolution video of said time-varying scene. The step of sensing a time-varying scene with a spatial multiplexing camera may comprise, for example, sensing a time-varying scene with a single-pixel camera, a flexible voxels camera, or a P2C2 camera.
Many variations may be used with the invention. Sensing patterns used in sensing the time-varying scene are generated using a multi-scale sensing (MSS) matrix. The MSS matrix may have a fast transform when right-multiplied by upsampling operators. The MSS matrix may be designed for two scales. A downsampled version of the sensing matrix may be orthogonal or may have a fast inverse transform. The optical flow may be approximated using block-matching techniques or may be computed using an upsampled version of the preview frames. Information about the scene may be extracted from the low-resolution preview. The extracted information may comprise the location, intensity, speed, distance, orientation, and/or size of objects in the scene. Information about the scene may be taken into account in the recovery procedure of the high-resolution video. Subsequent sensing patterns may be automatically adapted based on extracted information. Parameters (optics, shutter, orientation, aperture, etc.) of the spatial multiplexing camera may be automatically or manually adjusted using the extracted information. The dynamic foreground and static background may be separated. The recovery may be performed using l1-norm minimization, total variation minimization, or greedy algorithms.
Still other aspects, features, and advantages of the present invention are readily apparent from the following detailed description, simply by illustrating preferred embodiments and implementations. The present invention is also capable of other and different embodiments and its several details can be modified in various obvious respects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and descriptions are to be regarded as illustrative in nature, and not as restrictive. Additional objects and advantages of the invention will be set forth in part in the description which follows and in part will be obvious from the description, or may be learned by practice of the invention.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description and the accompanying drawings, in which:
A preferred embodiment of a method and apparatus for video compressive sensing for spatial multiplexing cameras is described with reference to the accompanying figures.
Spatial multiplexing cameras (SMCs) acquire random (or coded) projections of a (typically static) scene using a digital micro-mirror device (DMD) or liquid crystal on silicon (LCOS) in combination with a few optical sensing elements, such as photodetectors or bolometers. The use of a small number of optical sensors—in contrast to a full-frame sensor—turns out to be extremely useful when acquiring scenes at non-visible wavelengths. In particular, sensing beyond the visual spectrum requires sensors built from exotic materials, which renders corresponding full-frame sensor devices cumbersome or too expensive.
Obviously, sampling with only a few sensors is, in general, not sufficient for acquiring complex scenes. Hence, SMCs acquire scenes by taking multiple consecutive measurements over time. For still images and for a single-pixel SMC architecture, this sensing strategy has been shown to deliver good results, but it fails for time-variant scenes (videos). The key challenge of video-CS for SMCs is the fact that the scene to be captured is ephemeral, i.e., each compressive measurement senses a (slightly) different scene; the situation is further aggravated when dealing with SMCs having a small number of sensors (e.g., only one for the SPC). Virtually all proposed methods for CS-based video recovery seem to overlook this important aspect. See, for example, J. Y. Park and M. B. Wakin, “A multiscale framework for compressive sensing of video,” in Pict. Coding Symp., (Chicago, Ill., USA), May 2009; A. C. Sankaranarayanan, P. Turaga, R. Baraniuk, and R. Chellappa, “Compressive acquisition of dynamic scenes,” in Euro. Conf. Comp. Vision, (Crete, Greece), September 2010; N. Vaswani, “Kalman filtered compressed sensing,” in IEEE Conf. Image Process., (San Diego, Calif., USA), October 2008; M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk, “Compressive imaging for video representation and coding,” in Pict. Coding Symp., (Beijing, China), April 2006; and S. Mun and J. E. Fowler, “Residual reconstruction for block based compressed sensing of video,” in Data Comp. Conf., (Snowbird, UT, USA), April 2011. Indeed, these approaches treat scenes as a sequence of static frames (i.e., videos) as opposed to a continuously changing scene. This disconnectedness between the real-world operation of SMCs and the assumptions commonly made for video CS renders existing recovery algorithms futile.
Successful video-CS recovery methods for camera architectures relying on temporal multiplexing (in contrast to spatial multiplexing as for SMCs) are generally inspired by video compression (i.e., exploit motion estimation). See, for example, D. Reddy, A. Veeraraghavan, and R. Chellappa, “P2C2: Programmable pixel compressive camera for high speed imaging,” in IEEE Conf. Comp. Vision and Pattern Recog., (Colorado Springs, CO, USA), June 2011; A. Veeraraghavan, D. Reddy, and R. Raskar, “Coded strobing photography: Compressive sensing of high speed periodic events,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, pp. 671-686, April 2011; and Y. Hitomi, J. Gu, M. Gupta, T. Mitsunaga, and S. K. Nayar, “Video from a single coded exposure photograph using a learned over-complete dictionary,” in IEEE Intl. Conf. Comp. Vision, (Barcelona, Spain), November 2011. The use of such techniques for SMC architectures, however, results in a fundamental problem: On the one hand, obtaining motion estimates (e.g., optical flow or via block matching) requires knowledge of the individual video frames. On the other hand, the recovery of the video frames from an SMC in the absence of motion estimates is difficult, especially when using low sampling rates and a small number of sensor elements. Attempts that address this “chicken-and-egg” problem either perform multi-scale sensing strategies or sense separate patches of the individual frames. Both approaches ignore the time-varying nature of real-world scenes and rely on a piece-wise static model.
A recovery error results from the static-scene assumption while sensing a time-varying scene (video) with an SMC. There is a fundamental tradeoff underlying a multi-scale recovery procedure. Since the single-pixel camera is the most challenging SMC architecture (i.e., it provides only a single sensor element), it is used herein as an example; generalization to other SMC architectures having more than one sensor is straightforward.
The compressive measurements yt∈ℝ taken by a single-sensor SMC at the sample instants t=1, . . . , T can be written as yt=⟨φt,xt⟩+zt, where T is the total number of acquired samples, φt∈ℝN×1 is the sensing vector, zt∈ℝ is the measurement noise, and xt∈ℝN×1 is the scene (or frame) at sample instant t; here, ⟨•,•⟩ denotes the inner product. Hereafter, we assume that the 2-dimensional scene consists of n×n spatial pixels, which, when vectorized, results in the vector xt of dimension N=n2. We also use the notation y1:W to represent the vector consisting of a window of W≦T successive compressive measurements (samples), i.e.,
y1:W=[y1, y2, . . . , yW]T. (3)
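As a concrete illustration of this sampling model, the sketch below simulates the per-sample measurements of a single-sensor SMC on a hypothetical time-varying scene; the scene generator, sizes, and noise level are illustrative assumptions.

```python
import numpy as np

n = 64; N = n * n; T = 1000
rng = np.random.default_rng(1)

def scene(t):
    """Hypothetical time-varying scene: a bright square drifting one pixel every 50 samples."""
    frame = np.zeros((n, n))
    c = 10 + t // 50
    frame[c:c+8, c:c+8] = 1.0
    return frame.reshape(N)                     # vectorized frame x_t of dimension N = n^2

Phi = rng.choice([-1.0, 1.0], size=(T, N))      # one +/-1 sensing pattern phi_t per sample
y = np.empty(T)
for t in range(T):
    y[t] = Phi[t] @ scene(t) + 1e-3 * rng.standard_normal()  # y_t = <phi_t, x_t> + z_t
```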
Suppose that we rewrite our (time-varying) scene xt for a window of W consecutive sample instants as follows:
xt=b+Δxt, t=1, . . . , W.
Here, b is a static component (assumed to be invariant for W samples), and Δxt=xt−b is the error at sample instant t caused by assuming a static scene. By defining et=⟨φt,Δxt⟩, we can rewrite (3) as
y1:W=Φb+e1:W+z1:W, (4)
where Φ∈ℝW×N is a sensing matrix whose t-th row corresponds to the transposed vector φt.
We now consider the error caused by spatial downsampling of the static component b in (4). To this end, let bL∈ℝNL denote a down-sampled version of b having NL&lt;N pixels, i.e., bL=Db, where D∈ℝNL×N is a down-sampling matrix; let U∈ℝN×NL be a corresponding up-sampling matrix satisfying DU=I. We can then rewrite (4) as
y1:W=ΦUbL+Φ(I−UD)b+e1:W+z1:W, (5)
since bL=Db. Inspection of (5) reveals three sources of error in the CS measurements of the low-resolution static scene ΦUbL: i) the spatial-approximation error Φ(I−UD)b caused by down-sampling, ii) the temporal-approximation error e1:W caused by assuming the scene remains static for W samples, and iii) the measurement error z1:W.
In order to analyze the trade-off that arises from the static-scene assumption and the down-sampling procedure, consider the scenario where the effective matrix ΦU is of dimension W×NL with W≧NL; that is, we aggregate at least as many compressive samples as the down-sampled spatial resolution. If ΦU has full (column) rank, then we can obtain a least-squares (LS) estimate b̂L of the low-resolution static scene bL from (5) as
b̂L=(ΦU)†y1:W, (6)
where (•)† denotes the (pseudo) inverse. From (6) we can observe the following facts: i) the window length W controls a trade-off between the spatial-approximation error Φ(I−UD)b and the error e1:W induced by assuming a static scene b, and ii) the LS estimator matrix (ΦU)† (potentially) amplifies all three error sources.
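By way of illustration, the LS preview (6) can be computed with a standard solver. The following minimal sketch assumes a nearest-neighbor down/up-sampling pair and reuses the Phi and y arrays from the sampling sketch above; with random ±1 patterns, ΦU is typically ill-conditioned, so this preview exhibits the noise enhancement discussed below.

```python
import numpy as np

def upsampling_operator(n, nL):
    """U in {0,1}^(N x N_L): nearest-neighbor upsampling from an nL x nL image
    to an n x n image (both vectorized row-major)."""
    N, NL, r = n * n, nL * nL, n // nL
    U = np.zeros((N, NL))
    for i in range(n):
        for j in range(n):
            U[i * n + j, (i // r) * nL + (j // r)] = 1.0
    return U

nL = 8
NL = nL * nL                     # down-sampled resolution N_L
W = NL                           # aggregate W = N_L samples, the minimum for (6)
U = upsampling_operator(64, nL)

# LS estimate (Phi U)^+ y_{1:W}; the pseudo-inverse is applied via lstsq rather
# than formed explicitly.
PhiU = Phi[:W] @ U
b_L, *_ = np.linalg.lstsq(PhiU, y[:W], rcond=None)
preview = b_L.reshape(nL, nL)
```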
As developed above, the spatial-approximation error and the temporal-approximation error are both a function of the window length W. We now show that carefully selecting W minimizes the combined spatial and temporal error in the low-resolution estimate b̂L. Inspection of (6) shows that for W=1, the temporal-approximation error is zero, since the static component b is able to perfectly represent the scene at each sample instant t. As W increases, the temporal-approximation error increases for time-varying scenes; simultaneously, increasing W reduces the error caused by down-sampling, Φ(I−UD)b (see the accompanying figures).
In order to bootstrap CS-MUVI, a low-resolution estimate of the scene is required. We next show that carefully designing the CS sensing matrix Φ enables us to compute high-quality low-resolution scene estimates at low complexity, which improves the performance of video recovery.
The choices of the sensing matrix Φ and the upsampling operator U are critical to arriving at a high-quality estimate of the low-resolution image bL. Indeed, if the compound matrix ΦU is ill-conditioned, then application of (ΦU)† amplifies all three sources of error in (6), resulting in a poor estimate. For a large class of conventional CS matrices Φ, such as i.i.d. (sub-)Gaussian matrices, as well as sub-sampled Fourier or Hadamard matrices, right-multiplying them with an upsampling operator U typically results in an ill-conditioned matrix. Hence, using well-established CS matrices for obtaining a low-resolution preview turns out to be a poor choice.
In order to achieve good CS recovery performance and minimum noise enhancement when computing low-resolution estimates b̂L according to (6), the present invention uses a new class of sensing matrices, referred to as multi-scale sensing (MSS) matrices. In particular, the present invention uses matrices that i) satisfy the RIP and ii) remain well-conditioned when right-multiplied by certain up-sampling operators U. The second condition requires mutual orthogonality among the columns of ΦU to minimize the noise enhancement in (6). Random matrices or certain sub-sampled orthogonal transforms are known to satisfy the RIP with overwhelming probability; however, they typically fail to meet the second condition, because the resulting matrices ΦU have decaying singular values. The power of MSS matrices with a particular dual-scale sensing (DSS) design, described below, is demonstrated in the accompanying figures.
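This conditioning gap can be checked numerically. The sketch below, with illustrative sizes, compares the condition number of ΦU for an i.i.d. random ±1 matrix against a Hadamard-based construction satisfying ΦU=H (the F=0 case of the DSS design described below); the DMD ±1 constraint is ignored here and is handled by the full construction later. It reuses the upsampling_operator from the preview sketch above.

```python
import numpy as np
from scipy.linalg import hadamard

n, nL = 32, 8
N, NL = n * n, nL * nL
W = NL
U = upsampling_operator(n, nL)        # nearest-neighbor upsampler from the sketch above
D = U.T / (n // nL) ** 2              # block-averaging downsampler; satisfies D U = I

Phi_rand = np.random.choice([-1.0, 1.0], size=(W, N))
H = hadamard(W).astype(float)
Phi_dss = H @ D                        # simplest matrix with Phi U = H (the F = 0 case)

print(np.linalg.cond(Phi_rand @ U))    # typically much larger than 1: noise enhancement
print(np.linalg.cond(Phi_dss @ U))     # = 1 up to numerics, since H has orthogonal columns
```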
If we additionally impose the constraint that a downsampled MSS matrix ΦU has a fast inverse transform, then the recovery of the low-resolution scene is significantly sped up. Such a “fast” MSS matrix has the key capability of generating a high-quality preview of the scene (see the accompanying figures).
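For instance, when ΦU=H for a Sylvester-ordered Hadamard matrix H, the preview of a window of W measurements reduces to a single fast Walsh-Hadamard transform, since H=HT and HH=WI imply H−1=H/W. A minimal sketch, with illustrative function names:

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform (Sylvester ordering), O(W log W);
    a.size must be a power of two."""
    a = np.asarray(a, dtype=float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def preview(y_window):
    """Low-resolution preview b_L = H^{-1} y = (1/W) H y for a window with Phi U = H."""
    return fwht(y_window) / len(y_window)
```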
Real-time preview: Conventional SMC architectures do not enable the observation of the scene until CS recovery is performed. Due to the high computational complexity of most existing CS recovery algorithms, there is typically a large latency between the acquisition of a scene and its observation. Fast MSS matrices offer an instantaneous visualization of the scene, i.e., they can provide us with a real-time digital viewfinder. This capability substantially simplifies the setup of an SMC in practice.
The immediate knowledge of the scene—even at a low resolution—can potentially be used to design adaptive sensing strategies. For example, one may seek to extract the changes that occur in a scene from one frame to the next or track moving objects, while avoiding the latency caused by sparse signal recovery algorithms.
There are many ways to construct fast MSS matrices. In this section, we detail one design that is particularly suited for SMC architectures. In SMC architectures, we are constrained in the choice of the sensing matrix Φ. Practically, the DMD limits us to matrices having entries of constant modulus (e.g., ±1). Since we are interested in a fast MSS matrix, we design the matrix Φ to satisfy H=ΦU, where H is a W×W Hadamard matrix and U is a predefined up-sampling operator. For SMC architectures, Hadamard matrices have the following advantages: i) they have orthogonal columns, ii) they exhibit optimal SNR properties over matrices restricted to {−1,+1} entries, and iii) applying the (inverse) Hadamard transform requires very low computational complexity (i.e., comparable to that of a fast Fourier transform). See, M. Harwit and N. Sloane, Hadamard Transform Optics, New York: Academic Press, 1979; Y. Y. Schechner, S. K. Nayar, and P. N. Belhumeur, “Multiplexing for optimal lighting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, pp. 1339-1354, August 2007.
We now show the construction of a suitable fast MSS matrix Φ for two scales (one for the preview frames at a given low resolution and one for the full-resolution frames), referred to as a dual-scale sensing (DSS) matrix (see the accompanying figures). Specifically, we construct the matrix as
Φ=HD+F, (7)
where D is a down-sampling matrix satisfying DU=I, and F∈ℝW×N is an auxiliary matrix that obeys the following constraints: i) the entries of Φ are ±1, ii) the matrix Φ has good CS recovery properties (e.g., satisfies the RIP), and iii) F is chosen such that FU=0. Note that an easy way to ensure that the entries of Φ are ±1 is to interpret F as sign flips of the Hadamard matrix H. Note also that one could choose F to be an all-zeros matrix; this choice, however, results in a sensing matrix Φ having poor CS recovery properties.
In particular, such a matrix would inhibit the recovery of high spatial frequencies. Choosing random entries in F such that FU=0 (i.e., by using random patterns of high spatial frequency) provides excellent performance. To arrive at an efficient implementation of CS-MUVI, we additionally want to avoid the storage of an entire W×N matrix. To this end, we generate each row fi∈ℝN as follows: associate each row vector fi with an n×n image of the scene, partition the scene into blocks of size (n/nL)×(n/nL), and associate an (n/nL)2-dimensional vector f̂i with each block. We can now use the same vector f̂i for each block and choose f̂i such that the full matrix satisfies FU=0. We also permute the columns of the Hadamard matrix H to achieve better incoherence with the sparsifying bases (see the accompanying figures).
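The sketch below illustrates one way to realize ±1 patterns in the spirit of (7): each Hadamard row is sign-modulated inside every block by a shared high-frequency ±1 pattern f̂i, so the block sums ΦU remain a row-scaled (well-conditioned) Hadamard matrix while high spatial frequencies are injected. This is an illustrative variant under stated assumptions, not necessarily the exact construction of the invention, and rows are generated on the fly so F is never stored.

```python
import numpy as np
from scipy.linalg import hadamard

def dss_rows(n, nL, rng):
    """Generate W = nL*nL sensing rows of a +/-1 DSS-style matrix, one at a time.

    Row i is the i-th Hadamard pattern of the low-resolution blocks, modulated
    inside every block by a shared +/-1 pattern fhat_i; since sum(fhat_i) != 0,
    the block sums give (Phi U)[i, :] = sum(fhat_i) * H[i, :], a row-scaled
    Hadamard matrix with orthogonal columns.
    """
    r = n // nL                                   # block side length
    W = nL * nL
    H = hadamard(W)
    # pixel -> low-resolution block index map, shape (n, n)
    blk = np.repeat(np.repeat(np.arange(W).reshape(nL, nL), r, axis=0), r, axis=1)
    for i in range(W):
        fhat = rng.choice([-1.0, 1.0], size=(r, r))
        while fhat.sum() == 0:                    # keep every block sum away from zero
            fhat = rng.choice([-1.0, 1.0], size=(r, r))
        row = H[i][blk] * np.tile(fhat, (nL, nL))  # modulate every block by fhat_i
        yield row.reshape(-1)                      # vectorized n*n pattern, entries +/-1
```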
Optical-Flow-Based Video Recovery
We next detail the second part of CS-MUVI.
Thanks to the preview mode, we can estimate the optical flow between any two (low-resolution) frames b̂Li and b̂Lj. For CS-MUVI, we compute optical-flow estimates at full spatial resolution between pairs of upsampled preview frames; this approach turns out to result in more accurate optical-flow estimates than first estimating the optical flow at low resolution and then upsampling the flow. Hence, we start by upsampling the preview frames according to b̂i=Ub̂Li or via a conventional upsampling procedure (e.g., linear or bicubic interpolation), and then extract the optical flow at full resolution. The optical flow at full resolution can be written as
b̂i(x,y)=b̂j(x+ux,y, y+vx,y),
where b̂i(x,y) denotes the pixel (x,y) in the n×n plane of b̂i, and ux,y and vx,y correspond to the translation of the pixel (x,y) between frames i and j. See, C. Liu, Beyond Pixels: Exploring New Representations and Applications for Motion Analysis, PhD thesis, Mass. Inst. Tech., 2009, and B. Horn and B. Schunck, “Determining optical flow,” Artif. Intel., vol. 17, pp. 185-203, April 1981. Other methods to compute the optical flow rely on block matching, which is commonly used in many video compression schemes.
In practice, the estimated optical flow may contain subpixel translations, i.e., ux,y and vx,y are not necessarily integers. In this case, we approximate b̂j(x+ux,y, y+vx,y) as a linear combination of its four closest neighboring pixels,
b̂j(x+ux,y, y+vx,y)≈Σk∈{0,1}Σl∈{0,1}wk,l b̂j(└x+ux,y┘+k, └y+vx,y┘+l),
where └•┘ denotes rounding towards −∞ and the weights wk,l are chosen according to the location within the four neighboring pixels. In order to obtain robustness against occlusions, we enforce consistency between the forward and backward optical flows; specifically, we discard optical-flow constraints at pixels where the sum of the forward and backward flow causes a displacement greater than one pixel.
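A minimal sketch of this sub-pixel handling, with illustrative function names; the consistency check shown is a simplified variant that adds the forward and backward flow fields at the same pixel.

```python
import numpy as np

def bilinear_weights(u, v):
    """Split a sub-pixel displacement (u, v) into its four neighboring integer
    offsets and the corresponding bilinear weights w_{k,l}."""
    fu, fv = np.floor(u), np.floor(v)      # rounding towards -inf
    au, av = u - fu, v - fv                # fractional parts
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    weights = [(1 - au) * (1 - av), (1 - au) * av, au * (1 - av), au * av]
    return int(fu), int(fv), offsets, weights

def consistency_mask(u_fwd, v_fwd, u_bwd, v_bwd, tol=1.0):
    """Keep flow constraints only where forward + backward flow displace a pixel
    by at most `tol` pixels; larger residuals likely indicate occlusions."""
    du, dv = u_fwd + u_bwd, v_fwd + v_bwd
    return np.hypot(du, dv) <= tol
```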
Before we detail the individual steps of the CS-MUVI video-recovery procedure, it is important to specify the rate of the frames to be recovered. When sensing scenes with SMC architectures, there is no obvious notion of frame rate. Our sole criterion is that we want each “frame” to contain only a small amount of motion. In other words, we wish to find the largest window size ΔW≦W such that there is virtually no motion at full resolution (n×n). In practice, an estimate of ΔW can be obtained by analyzing the preview frames. Hence, given a total number of T compressive measurements, we ultimately recover F=T/ΔW full-resolution frames (see the accompanying figures).
We are now ready to detail the final steps of CS-MUVI. Assume that ΔW is chosen such that there is little to no motion associated with each preview frame. Next, associate with each high-resolution frame x̂k, k∈{1, . . . , F}, a preview frame computed by grouping W=NL compressive measurements in the immediate vicinity of that frame (since ΔW&lt;W). Then, compute the optical flow between successive (up-scaled) preview frames.
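This grouping amounts to simple index bookkeeping; a minimal sketch, assuming 0-based indices and that ΔW evenly divides T:

```python
def frame_index(t, delta_W):
    """Map sample index t to its frame index k = I(t)."""
    return t // delta_W

def window_for_frame(k, delta_W, W, T):
    """Return the W = N_L measurement indices centered on frame k (clipped at the
    sequence ends), used to compute that frame's preview."""
    center = k * delta_W + delta_W // 2
    start = min(max(center - W // 2, 0), T - W)
    return range(start, start + W)
```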
We can now recover the individual high-resolution video frames as follows. Each frame x̂k is assumed to have a sparse representation in a 2-dimensional orthogonal wavelet basis Ψ; hence, our objective is to minimize the overall l1-norm Σk=1F∥ΨTx̂k∥1. We furthermore consider the following two constraints: i) consistency with the acquired CS measurements, i.e., ⟨φt,x̂I(t)⟩≈yt, where I(t) maps the sample index t to the associated frame index k, and ii) the estimated optical-flow constraints between consecutive frames. Together, we arrive at the following convex optimization problem:
minimize Σk=1F∥ΨTx̂k∥1
subject to |yt−⟨φt,x̂I(t)⟩|≦ε1, t=1, . . . , T,
and |x̂i(x,y)−x̂j(x+ux,y, y+vx,y)|≦ε2 for all optical-flow correspondences, (PV)
which can be solved using off-the-shelf algorithms tuned to l1-recovery problems. See, for example, E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM J. Scientific Comp., vol. 31, pp. 890-912, November 2008. The parameters ε1≧0 and ε2≧0 can be used to “tweak” the recovery performance. Alternatively, we can recover the scene via total variation (TV)-based methods; such approaches essentially amount to replacing the l1-norm objective by the total variation norm. See, for example, L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1, pp. 259-268, 1992.
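By way of illustration, (PV) can be prototyped for small problem sizes with a generic convex solver. The sketch below uses CVXPY, a DCT basis standing in for the wavelet Ψ, and integer-rounded flow correspondences in place of the bilinear-weighted constraints; all names, sizes, and tolerances are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp
from scipy.fft import dct

def recover_frames(y, Phi, frame_of, flows, n, n_frames, eps1=1e-3, eps2=1e-2):
    """Sketch of (PV): joint l1 recovery of n_frames frames of size n x n.

    y, Phi   : T scalar measurements and the T x N matrix of sensing rows phi_t
    frame_of : frame_of[t] = I(t), the frame index associated with sample t
    flows    : (k, pi, pj) triples linking pixel pi of frame k to pixel pj of
               frame k+1 (integer-rounded optical-flow correspondences)
    """
    N = n * n
    X = cp.Variable((N, n_frames))                 # one column per frame
    PsiT = dct(np.eye(N), axis=0, norm='ortho')    # DCT analysis operator in place of Psi^T
    objective = sum(cp.norm(PsiT @ X[:, k], 1) for k in range(n_frames))
    constraints = [cp.abs(y[t] - Phi[t] @ X[:, frame_of[t]]) <= eps1
                   for t in range(len(y))]
    constraints += [cp.abs(X[pi, k] - X[pj, k + 1]) <= eps2
                    for (k, pi, pj) in flows]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return X.value.T.reshape(n_frames, n, n)
```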
We validate the performance and capabilities of the CS-MUVI framework for several scenes. All simulation results were generated from video sequences having a spatial resolution of n×n=256×256 pixels. The preview videos have a spatial resolution of 64×64 pixels (i.e., W=4096). We assume an SPC architecture as described in M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag., vol. 25, pp. 83-91, March 2008. Noise was added to the compressive measurements using an i.i.d. Gaussian noise model such that the resulting SNR was 60 dB. Optical-flow estimates were extracted using the method of C. Liu, Beyond Pixels: Exploring New Representations and Applications for Motion Analysis, PhD thesis, Mass. Inst. Tech., 2009, and (PV) is solved using SPGL1. See, E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM J. Scientific Comp., vol. 31, pp. 890-912, November 2008. The computation time of CS-MUVI is dominated by solving (PV), which requires 2-3 hours using an off-the-shelf quad-core CPU. The low-resolution preview is, of course, extremely fast to compute.
Synthetic Scene with Sub-Pixel Motion
Recovery results for a synthetic scene exhibiting sub-pixel motion are shown in the accompanying figures.
Video Sequences from a High-Speed Camera
The results shown in the accompanying figures were obtained from video sequences captured by a high-speed camera.
Comparison with the P2C2 Algorithm:
There are some artifacts visible in the recovered videos shown in the accompanying figures, caused in part by inaccurate optical-flow estimates.
A smaller portion of the recovery artifacts is caused by using dense measurement matrices, which spread local errors (such as those from the inaccurate optical-flow estimates) across the entire image. This problem is inherent to imaging with SMCs that involve a high degree of spatial multiplexing; imaging architectures that perform only local spatial multiplexing (such as the P2C2 camera) do not suffer from this problem.
The videos in the accompanying figures also illustrate a limitation of the present approach. Since CS-MUVI relies on optical-flow estimates obtained from low-resolution images, it can fail to recover small objects with rapid motion. More specifically, moving objects that are of sub-pixel size in the preview mode are lost.
The foregoing description of the preferred embodiment of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiment was chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto, and their equivalents. The entirety of each of the aforementioned documents is incorporated by reference herein.
The present application claims the benefit of the filing date of U.S. Provisional Patent Application Ser. No. 61/604,095 filed by the present inventors on Feb. 28, 2012. The aforementioned provisional patent application is hereby incorporated by reference in its entirety.
This invention was made with government support under Grant Number N66001-11-C-4092 awarded by the Defense Advanced Research Projects Agency, Grant No. FA9550-09-1-0432 awarded by the Air Force Office of Scientific Research, Grant No. W911NF-09-1-0383 awarded by the Army Research Laboratory and Grant No. N66001-08-1-2065 awarded by the SPAWAR Systems Center (SSC) Pacific. The government has certain rights in the invention.