This invention relates generally to processing videos, and more particularly to subtracting backgrounds in the videos to determine moving foreground objects.
Background subtraction segments moving objects from the background in a video acquired of a scene. The segmentation of moving objects can be used to determine trajectories of the objects, and to improve object detection and classification.
Background subtraction can be done by statistical motion flow analysis or an algebraic decomposition. Statistical motion flow methods generally utilize Gaussian mixture models (GMM) of image plane motion to determine the motion of the objects.
Algebraic decompositions model the background scene as a low-dimensional subspace. The moving objects are then segmented as error terms in the orthogonal complement of the background subspace. When the camera is stationary, the background subspace is low dimensional, i.e., the matrix of stacked images is low rank, and methods such as robust principal component analysis (RPCA) can successfully segment the foreground from the background. When the camera is moving, the low-rank structure no longer holds, and adaptive subspace estimation techniques are used to track the background subspace.
The embodiments of the invention provide a method for processing a video, i.e., a sequence of images. The method uses an algebraic decomposition to solve the background subtraction problem with a factorized robust matrix completion (FRMC) procedure with global motion compensation. The method decomposes a sequence of images (a group of pictures (GOP)) in the video into the sum of a low rank component and a sparse motion component. The low rank component represents the background in the video, and the sparse component represents the moving objects in the video. The method alternates between the solutions for each component, following a Pareto curve for each subproblem.
For videos with a moving background, motion vectors are extracted from the video. If the video is encoded, the motion vectors are extracted from the bitstream. The motion vectors are used to compensate for changes in the camera perspective.
The FRMC has a pre-specified rank to determine the low rank component. A block coordinate descent is applied, using spectral projected gradient steps, to alternate between the solutions of the low rank component and the sparse component while traversing an updated Pareto curve of each subproblem.
For video acquired of the scene with a moving camera, we first extract the motion vectors from an encoded video bitstream and fit the global motion in every image to a parametric perspective model with eight parameters. Then, we align the images to match the perspective of the first image in the GOP, and use the FRMC to fill in the background pixels that are missing from the individual images in the GOP.
The method can be performed by a processor in batch mode or in real time, and does not require any training step to learn the initial subspace.
As shown in
Factorized Robust Matrix Completion
First, we describe our factorized robust matrix completion (FRMC) used to solve the background subtraction problem.
A matrix Y ∈ ℝ^(m×n), i.e., the video 101, is decomposed into the sum of the low rank component X0 111 and the sparse component S0 121, such that Y = X0 + S0. A restriction operator A: ℝ^(m×n) → ℝ^p selects a subset Ω of p of the mn samples, generally pixels, in the matrix Y.
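To make the restriction operator concrete, the following minimal sketch stores matrices as lists of lists and represents Ω as a list of (row, column) pairs. The helper names `restrict` and `restrict_adjoint` are hypothetical, and the toy background and object values are invented for illustration only.

```python
# Hypothetical helpers illustrating the restriction operator A and its adjoint.
# Matrices are plain lists of lists; omega is a list of (row, col) index pairs.

def restrict(Y, omega):
    """A(Y): return the p sampled entries of Y, in the order given by omega."""
    return [Y[i][j] for (i, j) in omega]


def restrict_adjoint(b, omega, m, n):
    """A*(b): place the samples back into an m x n matrix, zeros elsewhere."""
    Z = [[0.0] * n for _ in range(m)]
    for value, (i, j) in zip(b, omega):
        Z[i][j] = value
    return Z


# Toy video: m = 3 pixels, n = 2 images. X0 is a rank-1 (static) background,
# S0 is a sparse "moving object" that touches a single pixel.
X0 = [[5.0, 5.0], [2.0, 2.0], [7.0, 7.0]]
S0 = [[0.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
Y = [[x + s for x, s in zip(rx, rs)] for rx, rs in zip(X0, S0)]

omega = [(0, 0), (1, 1), (2, 0), (2, 1)]  # p = 4 of the mn = 6 entries observed
b = restrict(Y, omega)
print(b)  # [5.0, 6.0, 7.0, 7.0]
```

Applying `restrict` after `restrict_adjoint` returns the original samples, which is the property the solver relies on when it moves between the sample vector b and the full matrix.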
Herein, we define the FRMC problem as the problem of determining X0 and S0 from the incomplete samples b = A(Y). This problem can be formulated as a multi-objective minimization problem
where λ is a positive weighting parameter, and ∥X∥* denotes the nuclear norm of the matrix X, which is equal to the sum of the singular values of X. When the entries of the matrix Y are fully observed, i.e., A is the identity operator, the problem is known as robust principal component analysis (RPCA), and the optimization problem (1) is referred to as principal component pursuit (PCP).
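Written out, a plausible form of problem (1) is shown below. This is an assumption based on the standard principal component pursuit objective and on the error tolerance σ used later in this description; the ℓ1 penalty on S and the ℓ2 data-fit constraint are not stated explicitly in the text above.

```latex
\min_{X,\,S}\; \|X\|_* + \lambda \|S\|_1
\quad \text{subject to} \quad \|b - \mathcal{A}(X + S)\|_2 \le \sigma
```

With A equal to the identity and σ = 0, the constraint reduces to Y = X + S, which is the PCP setting mentioned above.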
The choice λ = n̂^(−1/2), with n̂ := max{m, n}, guarantees the recovery of X0 and S0 with high probability when rank(X0) ≤ C n̂ (log n̂)^(−2), for some constant C that depends on the coherence of the subspace of X0. Solving the problem in equation (1) requires computing a full (or partial) singular value decomposition X = UDV^T in every iteration, where U and V contain the left and right singular vectors and D is a diagonal matrix containing the singular values. This computation can become prohibitively expensive when the dimensions are large.
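As a quick worked example of the default weighting, the sketch below evaluates λ = n̂^(−1/2) for an illustrative GOP of 30 CIF images (352 × 288 pixels), stacked as columns. The function name `default_lambda` and the GOP size are assumptions chosen for illustration.

```python
import math


def default_lambda(m, n):
    """Weighting parameter lambda = n_hat^(-1/2) with n_hat = max(m, n)."""
    n_hat = max(m, n)
    return 1.0 / math.sqrt(n_hat)


# Illustrative size: 30 CIF images (352 x 288 pixels each) stacked as columns
# gives m = 101376 rows and n = 30 columns, so n_hat = m.
lam = default_lambda(352 * 288, 30)
print(round(lam, 5))  # 0.00314
```

Because video matrices are typically much taller than they are wide, n̂ is set by the number of pixels per image, and λ is correspondingly small.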
To overcome this problem, we use a proxy for the nuclear norm of a rank-r matrix X defined by the following factorization
where inf denotes the infimum, and the superscript T denotes the transpose.
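The factorization referred to here is, in its standard form, the following identity, which holds for any matrix X of rank at most r. The presentation below is a reconstruction of that known identity, not a verbatim copy of the original equation (2).

```latex
\|X\|_* \;=\; \inf_{\substack{L \in \mathbb{R}^{m \times r},\; R \in \mathbb{R}^{n \times r} \\ X = L R^T}}
\; \frac{1}{2}\left( \|L\|_F^2 + \|R\|_F^2 \right)
```

The infimum is attained at L = U D^(1/2) and R = V D^(1/2), where X = UDV^T is a (thin) singular value decomposition. This is why the proxy avoids an SVD in every iteration: only the m×r and n×r factors are updated.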
The nuclear norm proxy has been used to scale standard nuclear norm minimization procedures to very large matrix completion problems. Moreover, if the factors L and R of X have a rank greater than or equal to the true rank of X, then a spectral projected gradient procedure using the nuclear norm proxy is guaranteed to converge to the solution of the corresponding nuclear norm minimization problem.
Our FRMC-based method extends the procedure described by Aravkin et al., “A robust SVD approach to matrix completion, with applications to interpolation of large scale data,” Cornell University Library, arXiv:1302.4886, May 1, 2013. We solve our FRMC by alternating between the solutions of two subproblems as shown in
Each subproblem in the FRMC method is a least absolute shrinkage and selection operator (LASSO) problem that we solve using spectral projected gradient iterations. The rationale behind this approach follows the technique for solving the basis pursuit denoise problem.
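A spectral projected gradient iteration takes a gradient step and then projects back onto the constraint set of the LASSO subproblem. The minimal sketch below shows the two projections such iterations would plausibly need, assuming the low rank subproblem constrains ½(∥L∥F² + ∥R∥F²) ≤ τX and the sparse subproblem constrains ∥S∥1 ≤ τS. The helper names are hypothetical, and the ℓ1 projection uses the well-known sort-and-threshold method rather than anything specified in this description.

```python
import math


def project_l1_ball(v, tau):
    """Euclidean projection of vector v onto {s : ||s||_1 <= tau} (sort-based)."""
    if tau <= 0:
        return [0.0] * len(v)
    if sum(abs(x) for x in v) <= tau:
        return [float(x) for x in v]
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - tau) / j
        if uj > t:
            theta = t  # the last j with u_j > t gives the threshold
    # Soft-threshold every entry by theta, keeping its sign.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def project_factors(L, R, tau):
    """Project (L, R) onto {(L, R) : 0.5 * (||L||_F^2 + ||R||_F^2) <= tau}."""
    c = 0.5 * (sum(x * x for row in L for x in row)
               + sum(x * x for row in R for x in row))
    if c <= tau:
        return L, R
    s = math.sqrt(tau / c)  # uniform rescaling onto the ball boundary
    return [[s * x for x in row] for row in L], [[s * x for x in row] for row in R]


print(sum(abs(x) for x in project_l1_ball([3.0, -1.0, 0.5], 2.0)))  # 2.0
print(project_factors([[2.0]], [[2.0]], 1.0))  # ([[1.0]], [[1.0]])
```

The factor constraint is a Euclidean ball in the stacked variable (L, R), so its projection is a single rescaling; no SVD is needed in either projection.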
For every fixed sparse component S, with b_LR = b − A(S), the iterates X_τ = LR^T, where (L, R) is the solution to
are samples on the Pareto curve of the nuclear norm minimization problem
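For reference, a plausible form of the two problems in this paragraph, together with the analogous sparse subproblem, is given below. These forms are assumptions consistent with the nuclear norm proxy and the error tolerance σ; the first is presumably equation (3), and the names τS and b_S are hypothetical.

```latex
\min_{L,\,R}\; \|b_{LR} - \mathcal{A}(L R^T)\|_2
  \quad \text{s.t.} \quad \tfrac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \le \tau_X,
  \qquad b_{LR} = b - \mathcal{A}(S)

\min_{X}\; \|X\|_*
  \quad \text{s.t.} \quad \|b_{LR} - \mathcal{A}(X)\|_2 \le \sigma

\min_{S}\; \|b_{S} - \mathcal{A}(S)\|_2
  \quad \text{s.t.} \quad \|S\|_1 \le \tau_S,
  \qquad b_{S} = b - \mathcal{A}(L R^T)
```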
The update rule for τ_X is a Newton root-finding step for the equation φ(τ_X) = σ, where
In updating τ_X, we first determine the current residual vector M = b_LR − A(LR^T) and apply the rule specified in step 8 of
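The Newton update on the Pareto curve can be illustrated on a toy problem in which it is easy to verify. The sketch below swaps in the ℓ1-constrained LASSO with A equal to the identity (the basis pursuit denoise setting), where φ(τ) is the residual norm of the τ-constrained solution and φ′(τ) = −∥A^T r∥∞ / ∥r∥2. In the nuclear norm subproblem, the ℓ∞ norm would presumably be replaced by its dual, the spectral norm of A*(M). The function names are hypothetical.

```python
import math


def project_l1_ball(v, tau):
    """Euclidean projection of v onto the l1 ball of radius tau."""
    if tau <= 0:
        return [0.0] * len(v)
    if sum(abs(x) for x in v) <= tau:
        return [float(x) for x in v]
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - tau) / j
        if uj > t:
            theta = t
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def pareto_residual(b, tau):
    """phi(tau) = ||b - x_tau||_2 for A = identity, where x_tau solves the LASSO."""
    x = project_l1_ball(b, tau)  # with A = I the LASSO solution is a projection
    r = [bi - xi for bi, xi in zip(b, x)]
    return math.sqrt(sum(ri * ri for ri in r)), r


def newton_tau(b, sigma, tau=0.0, tol=1e-10, max_iter=50):
    """Find tau with phi(tau) = sigma by Newton steps on the Pareto curve."""
    for _ in range(max_iter):
        phi, r = pareto_residual(b, tau)
        if abs(phi - sigma) <= tol:
            break
        dual = max(abs(ri) for ri in r)  # ||A^T r||_inf, the dual norm of l1
        if dual == 0.0:
            break
        # phi'(tau) = -dual / phi, so the Newton step is tau + (phi - sigma) * phi / dual.
        tau += (phi - sigma) * phi / dual
    return tau


tau = newton_tau([3.0, -1.0, 0.5], sigma=0.5)
print(round(tau, 4))  # 3.634
```

Because φ is convex and decreasing, Newton iterates that start at τ = 0 approach the root from below without overshooting. Here they reach τ = 4.5 − √3/2 in a few steps.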
Global Motion Parametrization
In videos acquired by a moving camera, applying the FRMC method directly to the video images fails to segment the correct motion, because the background is non-stationary and therefore does not lie in a low-rank subspace. Hence, we first estimate the global motion parameters in the video to compensate for the camera motion. Then, we align the background and apply the FRMC method to segment the moving objects.
One approach for global motion estimation relates the coordinates (x1, y1) in a reference image I1 to the coordinates (x2, y2) in a target image I2 using an 8-parameter homography vector h such that
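The 8-parameter relation itself is not reproduced here. A common way to write it, assumed in the sketch below, is x2 = (h0 + h2·x1 + h3·y1)/(1 + h6·x1 + h7·y1) and y2 = (h1 + h4·x1 + h5·y1)/(1 + h6·x1 + h7·y1). In this assumed ordering, h0 and h1 are translations, h2 to h5 form the affine part, and h6 and h7 are the perspective terms. The function name `apply_homography` is hypothetical.

```python
def apply_homography(h, x1, y1):
    """Map (x1, y1) in I1 to (x2, y2) in I2 with an assumed 8-parameter ordering."""
    h0, h1, h2, h3, h4, h5, h6, h7 = h
    denom = 1.0 + h6 * x1 + h7 * y1  # perspective (projective) denominator
    x2 = (h0 + h2 * x1 + h3 * y1) / denom
    y2 = (h1 + h4 * x1 + h5 * y1) / denom
    return x2, y2


identity = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(apply_homography(identity, 3.0, 4.0))  # (3.0, 4.0)
shift = [2.0, -1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(apply_homography(shift, 3.0, 4.0))  # (5.0, 3.0)
```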
Given a homography vector h=[h0 h1 h2 h3 h4 h5 h6 h7]T that relates the two images, we can warp the perspective of image I2 to match the perspective of image I1, thereby aligning the backgrounds of both images. However, estimating h from the raw pixel domain requires determining point-to-point matches between a subset of the pixels of the two images.
To determine h, we use the horizontal and vertical motion vectors (mx, my) that are available from the encoded video bitstream or during an encoding process. Here, we assume that motion estimation is performed using the previous video image as the reference image. The motion vectors provide relatively accurate point matches between the two images. Note, however, that we are only interested in matching pixels from the moving background. Therefore, we first determine a 32-bin histogram of each of the motion vector components mx and my. Next, we extract a subset A of the indices of pixels whose motion vectors are shared by at least 20% of the pixels in the image. Our assumption here is that foreground objects correspond to less than 20% of the moving pixels in the image. This threshold can vary between different scenes and video sequences. We then use the motion vectors indexed by A to estimate the homography parameter vector h by solving the following least squares problem:
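The background-selection step can be sketched as follows, assuming one motion vector per pixel, as when block motion vectors are replicated over their blocks. The sketch also assumes that "shared by at least 20% of the pixels" means that both the mx bin and the my bin of a pixel each hold at least 20% of all pixels. Both readings, and the function name `background_indices`, are hypothetical.

```python
def background_indices(mx, my, nbins=32, min_fraction=0.2):
    """Indices of pixels whose mx and my bins each hold >= min_fraction of pixels."""
    n_pix = len(mx)

    def bin_ids(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0] * len(values)  # all values identical: a single bin
        scale = nbins / (hi - lo)
        return [min(int((v - lo) * scale), nbins - 1) for v in values]

    bx, by = bin_ids(mx), bin_ids(my)
    count_x = [0] * nbins
    count_y = [0] * nbins
    for i in range(n_pix):
        count_x[bx[i]] += 1
        count_y[by[i]] += 1
    threshold = min_fraction * n_pix
    return [i for i in range(n_pix)
            if count_x[bx[i]] >= threshold and count_y[by[i]] >= threshold]


mx = [1.0] * 9 + [5.0]  # nine background pixels and one outlier
my = [0.0] * 10
print(background_indices(mx, my))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```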
where the function min returns the minimum, and
and the matrix
where the subscript A indicates a restriction of the indices to the set A.
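The vector and matrix of the least squares problem are not reproduced in this text. One standard construction, assumed here, multiplies out the denominator of the assumed 8-parameter model (x2 = (h0 + h2·x1 + h3·y1)/(1 + h6·x1 + h7·y1), and likewise for y2). This yields two linear equations in h per matched point, with (x2, y2) = (x1 + mx, y1 + my) for the points indexed by A. The sketch solves the normal equations by Gaussian elimination. This is adequate for a small, well-scaled example, but a QR-based solver would be more robust. All function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def estimate_homography(points, mx, my):
    """Least-squares h from background matches (x1, y1) -> (x1 + mx, y1 + my)."""
    rows, targets = [], []
    for (x1, y1), dx, dy in zip(points, mx, my):
        x2, y2 = x1 + dx, y1 + dy
        # Linearized model: x2 * (1 + h6 x1 + h7 y1) = h0 + h2 x1 + h3 y1, same for y2.
        rows.append([1.0, 0.0, x1, y1, 0.0, 0.0, -x1 * x2, -y1 * x2])
        targets.append(x2)
        rows.append([0.0, 1.0, 0.0, 0.0, x1, y1, -x1 * y2, -y1 * y2])
        targets.append(y2)
    # Normal equations (Phi^T Phi) h = Phi^T t.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(8)]
    return solve_linear(AtA, Atb)


# Synthetic background matches generated from a known h.
h_true = [0.5, -0.3, 1.1, 0.05, -0.04, 0.95, 0.01, -0.02]
points = [(float(x), float(y)) for x in range(-2, 3) for y in range(-2, 3)]
mx, my = [], []
for x1, y1 in points:
    d = 1.0 + h_true[6] * x1 + h_true[7] * y1
    mx.append((h_true[0] + h_true[2] * x1 + h_true[3] * y1) / d - x1)
    my.append((h_true[1] + h_true[4] * x1 + h_true[5] * y1) / d - y1)
h_est = estimate_homography(points, mx, my)
print([round(v, 6) for v in h_est])  # approximately h_true
```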
We align the nonaligned GOP 402 following the steps in
Due to the camera motion, the warped images Î2 generally occupy a larger viewing area than the reference image I1. Consequently, applying a forward map ƒ: (x1, y1) → (x̂2, ŷ2) often leaves holes in the warped image. To correct this problem, we determine the reverse mapping g: (x̂2, ŷ2) → (x2, y2) as a function of h and warp the image to obtain Î2(x̂2, ŷ2) = I2(g(x̂2, ŷ2)).
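Reverse mapping visits every output pixel exactly once and looks up its source location, so no holes appear inside the covered area. The minimal sketch below takes the reverse map g as a hypothetical Python callable, which in the method would be derived from h. It uses nearest-neighbour sampling, which is an assumption because the interpolation is not specified. Output pixels whose source falls outside I2 are left as None to mark missing intensities.

```python
import math


def warp_reverse(src, out_h, out_w, g):
    """Inverse warping: out(xh, yh) = src(g(xh, yh)), nearest-neighbour sampling.

    Pixels whose source location falls outside src are left as None, which
    marks them as missing intensity values.
    """
    out = [[None] * out_w for _ in range(out_h)]
    for yh in range(out_h):
        for xh in range(out_w):
            xs, ys = g(xh, yh)
            xi, yi = math.floor(xs + 0.5), math.floor(ys + 0.5)  # nearest pixel
            if 0 <= yi < len(src) and 0 <= xi < len(src[0]):
                out[yh][xh] = src[yi][xi]
    return out


I2 = [[1, 2, 3],
      [4, 5, 6]]
print(warp_reverse(I2, 2, 3, lambda x, y: (x + 1, y)))
# [[2, 3, None], [5, 6, None]]
```

The output canvas here has the same size as the input for simplicity. A larger canvas would hold the extended viewing area of the warped images.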
After the images are warped and aligned using global motion compensation, we vectorize the images and stack the vectors into the matrix Y of size m×n, where m is the number of pixels in each GOP image and n is the number of images in the GOP. The warped images can contain large areas where there are no intensity measurements. Therefore, we construct the restriction operator A that identifies the pixels that contain intensity values. Applying the operator A to the video Y results in a vector b = A(Y) that contains only the pixels with intensity values. Then, we determine the low-rank approximation X of Y that corresponds to the background pixels, and a sparse component S corresponding to the moving objects in the scene, such that ∥b−A(X+S)∥F ≤ σ, where σ is a predetermined error tolerance, i.e., the termination condition, and F denotes the Frobenius norm.
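The stacking step and the tolerance check can be sketched as follows. Each aligned image is vectorized row by row into one column of Y. Pixels without intensity, marked None as a hypothetical convention, are excluded from Ω, so b holds only measured values. The helper names and the toy images are illustrative assumptions.

```python
import math


def stack_gop(images):
    """Vectorize each aligned image (row-major) as one column of Y; None = missing."""
    n = len(images)
    flat = [[p for row in img for p in row] for img in images]
    m = len(flat[0])
    Y = [[flat[j][i] for j in range(n)] for i in range(m)]
    omega = [(i, j) for i in range(m) for j in range(n) if Y[i][j] is not None]
    b = [Y[i][j] for (i, j) in omega]  # b = A(Y): observed intensities only
    return Y, omega, b


def within_tolerance(b, omega, X, S, sigma):
    """Check ||b - A(X + S)||_2 <= sigma on the observed entries."""
    res = math.sqrt(sum((bk - X[i][j] - S[i][j]) ** 2
                        for bk, (i, j) in zip(b, omega)))
    return res <= sigma


img1 = [[10.0, 20.0], [30.0, None]]  # bottom-right pixel not covered after warping
img2 = [[11.0, 21.0], [31.0, 41.0]]
Y, omega, b = stack_gop([img1, img2])
print(len(Y), len(Y[0]), len(b))  # 4 2 7

X = [[v if v is not None else 0.0 for v in row] for row in Y]
S = [[0.0] * 2 for _ in range(4)]
print(within_tolerance(b, omega, X, S, sigma=1e-9))  # True
```

Because the check only touches entries in Ω, whatever X places at the missing pixels is unconstrained by the data term. Those entries are filled in by the low rank structure alone.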
FRMC in Batch Mode
In batch mode, disjoint groups of n video images are warped and aligned separately into matrices Yj, where j indicates the GOP number. The FRMC method is then applied to each matrix Yj resulting in the background matrix Xj and the foreground matrix Sj. Note that every column of Sj is a vectorization of a single image containing moving objects in the GOP.
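The batch-mode bookkeeping is simple enough to sketch directly. The frames are split into disjoint groups of n, and each column of a foreground matrix Sj is reshaped back into an image. How a final group with fewer than n frames is handled is not stated, so keeping it as a shorter group is an assumption, as are the helper names.

```python
def split_into_gops(frames, n):
    """Split a frame sequence into disjoint groups of n frames (last may be shorter)."""
    return [frames[k:k + n] for k in range(0, len(frames), n)]


def column_to_image(column, height, width):
    """Reshape one column of S_j (row-major vectorization) back into an image."""
    return [column[r * width:(r + 1) * width] for r in range(height)]


print(split_into_gops(list(range(7)), 3))         # [[0, 1, 2], [3, 4, 5], [6]]
print(column_to_image([1, 2, 3, 4, 5, 6], 2, 3))  # [[1, 2, 3], [4, 5, 6]]
```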
FRMC in Real-Time Mode
In real-time mode, we first extract a background estimate according to the decomposition X_1 = L_1 R_1^T using the first n > 1 video images. The number n can be selected to satisfy a maximum delay requirement. For every subsequent image indexed by i = n+1, ..., N, we align L_{i−1} with the perspective of image i to produce L̂_{i−1}, and match the background X_{i−1} = L_{i−1} R_{i−1}(1,→)^T to that of the new image. We then perform a single gradient update of equation (3) initialized with (L̂_{i−1}, R_{i−1}(1,→)), where R_{i−1}(1,→) is the first row of the matrix R_{i−1}.
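A single gradient update for one new image can be sketched as follows. The sketch assumes the data term is ½ of the squared residual between the new image y and L̂r on the observed pixels, where r plays the role of the first row of R. It also assumes a fixed step size and omits the projection onto the τX ball and the alignment of L, none of which are specified here. The helper names and toy numbers are hypothetical.

```python
def loss(y, mask, L, r):
    """0.5 * squared residual of y - L r on observed pixels (mask[i] is True)."""
    total = 0.0
    for yi, ok, Li in zip(y, mask, L):
        if ok:
            e = yi - sum(a * b for a, b in zip(Li, r))
            total += 0.5 * e * e
    return total


def gradient_step(y, mask, L, r, step):
    """One simultaneous gradient step on (L, r) for the masked least-squares loss."""
    k = len(r)
    # Residual on observed pixels; unobserved pixels contribute nothing.
    e = [(yi - sum(a * b for a, b in zip(Li, r))) if ok else 0.0
         for yi, ok, Li in zip(y, mask, L)]
    grad_r = [-sum(Li[c] * ei for Li, ei in zip(L, e)) for c in range(k)]
    # d(loss)/dL[i][c] = -e[i] * r[c], so descending adds step * e[i] * r[c].
    new_L = [[Li[c] + step * ei * r[c] for c in range(k)] for Li, ei in zip(L, e)]
    new_r = [r[c] - step * grad_r[c] for c in range(k)]
    return new_L, new_r


L_hat = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # aligned basis: 3 pixels, rank 2
r = [0.5, 0.5]                                # first row of R from the previous step
y = [1.0, 2.0, 3.0]                           # new image, all pixels observed
mask = [True, True, True]
print(loss(y, mask, L_hat, r))                # 3.25
L_new, r_new = gradient_step(y, mask, L_hat, r, step=0.05)
print(loss(y, mask, L_new, r_new) < 3.25)     # True
```

One step per image keeps the per-frame cost proportional to the size of L, which is what makes the real-time mode cheap compared with re-solving the full problem.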
In order to speed up the computation of the sparse component, we replace equation (4) in
where z0 = b, S0 = 0, η is the soft-thresholding operator
and γ is an adaptive threshold, equal to the mode of the histogram of S1.
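The two ingredients named here can be sketched on their own. The recursion that uses them is not reproduced in this text. The soft-thresholding operator η(x, γ) = sign(x)·max(|x| − γ, 0) is standard. For the threshold, the sketch takes the centre of the most populated bin of a histogram of the magnitudes of S1. Using magnitudes and the bin count are assumptions, as are the helper names.

```python
import math


def soft_threshold(values, gamma):
    """eta(x, gamma) = sign(x) * max(|x| - gamma, 0), applied elementwise."""
    return [math.copysign(max(abs(x) - gamma, 0.0), x) for x in values]


def histogram_mode(values, nbins=32):
    """Centre of the most populated histogram bin of |values| (assumed reading of 'mode')."""
    mags = [abs(v) for v in values]
    lo, hi = min(mags), max(mags)
    if hi == lo:
        return lo
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in mags:
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    best = max(range(nbins), key=lambda k: counts[k])
    return lo + (best + 0.5) * width


print(soft_threshold([3.0, 0.5, -2.0], 1.0))  # [2.0, 0.0, -1.0]
print(round(histogram_mode([0.1, 0.12, 0.11, 0.9, 0.5], nbins=4), 6))  # 0.2
```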
Effect of the Invention
The invention provides a video background subtraction method based on factorized matrix completion with global motion compensation. The method decomposes a sequence of video images into the sum of a low rank background component and a sparse motion component. The method alternates between the solutions for each component, following a Pareto curve trajectory for each subproblem. For videos with a moving background, we use motion vectors extracted from the video to compensate for changes in the camera perspective. Performance evaluations show that our approach is faster than state-of-the-art solvers and results in highly accurate motion segmentation for both stationary and non-stationary scenes. In fact, the method can estimate the perspective parameters, align the images, and segment common intermediate format (CIF) resolution videos at a rate that exceeds ten images per second.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.