A challenge exists to deliver quality and value to consumers, for example, by providing mobile devices, such as cell phones and personal digital assistants, that are cost effective. Additionally, businesses may desire to provide new features to such mobile devices. Further, businesses may desire to enhance the performance of one or more components of such mobile devices.
The following detailed description references the drawings, wherein:
Image stabilization includes techniques used to reduce jitter associated with the motion of a camera. It can compensate for pan and tilt (angular movement, equivalent to yaw and pitch) of a camera or other imaging device. It can be used in still and video cameras, including those found in mobile devices such as cell phones and personal digital assistants (PDAs). With still cameras, movement or shake is particularly problematic at slow shutter speeds in lower lighting conditions. With video cameras, movement or shake causes visible frame-to-frame jitter in the recorded video.
The quality of image stabilization depends in large part on two factors. First, the camera motion needs to be modeled and measured accurately. Second, the image stabilization technique needs to be able to distinguish between which part of the motion is intended and should be preserved and which part of the motion should be filtered out as unintended motion in order to produce a smooth result that is visually free from camera shake.
Solutions for image stabilization can generally be classified as either two-dimensional (2D) or three-dimensional (3D) according to the motion model that is used. Some 3D solutions may use Structure from Motion algorithms to recover both the 3D camera motion and the 3D scene structure. This information can then be used to synthesize views as seen from a smooth camera trajectory. Such techniques can produce high quality results, but are computationally expensive and often lacking in robustness.
Two-dimensional techniques model the effect of camera motion on the imaged scenes as a 2D transform or homography (affine or perspective) between adjacent frames. These models are accurate if the scene is planar or if the camera motion consists only of rotations around an optical axis. Two-dimensional image stabilization techniques offer a better balance between performance, quality, and robustness than 3D image stabilization techniques. Two-dimensional techniques have a motion estimation component that needs to be carefully designed for accuracy while still enabling real-time performance. Image stabilization techniques that utilize direct or intensity-based motion estimation, such as optical flow or gradient descent searching, are often too expensive for real-time performance. Projection profile correlation techniques can offer real-time performance in mobile devices, but have a lower accuracy and can typically model only translation instead of a more complete affine motion. Image stabilization techniques that utilize feature-based motion estimation typically use multi-scale methods such as scale invariant feature transform (SIFT) or speeded up robust features (SURF), which still tend to fall short of real-time performance.
Many image stabilization techniques are thus too computationally expensive for real-time performance in mobile device applications. Other image stabilization techniques fall short of the accuracy required in such devices. A significant challenge and need therefore exist to accomplish accurate image stabilization in a mobile device with real-time performance (e.g., 30 frames per second (fps)).
A block diagram of an example of an image stabilization method 10 is shown in
Briefly, motion estimation module or component 14 determines the amount of motion occurring between an original frame and an adjacent frame in a sequence. This may be accomplished in a variety of ways including feature based image registration. Motion filtering module or component 16 analyzes the measured motion history and determines which component of the absolute motion needs to be preserved as intended motion and which component needs to be eliminated or filtered out. Motion compensation module or component 18 renders a new frame as a warped version of the original image frame to remove the unintended motion.
A block diagram of an example of a feature based image registration method 22 is shown in
Method 22 additionally includes a feature matching component 32 that operates on both first image 24 and second image 26. As will be discussed in more detail below, feature matching component 32 selects pairs of key points for each of first image 24 and second image 26 based on a measure of closeness of their feature descriptors. Method 22 further includes a geometric transform estimation module 34. As will be discussed in more detail below, geometric transform estimation module 34 utilizes a list of matching pairs of key points selected by feature matching component 32 and the positions of such key points to map reference image 24 into target image 26.
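As an illustration only, selecting pairs of key points by closeness of their feature descriptors can be sketched as a nearest-neighbor search. The descriptor layout (a short list of numbers), the sum-of-absolute-differences distance, and the names `match_features` and `max_distance` are assumptions made for this sketch and are not taken from the description.

```python
def match_features(desc_a, desc_b, max_distance):
    """Pair each descriptor in desc_a with its closest descriptor in desc_b.

    Closeness is measured here by the sum of absolute differences (an
    assumed metric). Pairs farther apart than max_distance are dropped.
    Returns a list of (index_in_a, index_in_b) tuples.
    """
    matches = []
    for i, da in enumerate(desc_a):
        best_j, best_d = None, None
        for j, db in enumerate(desc_b):
            d = sum(abs(p - q) for p, q in zip(da, db))
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= max_distance:
            matches.append((i, best_j))
    return matches
```

The resulting index pairs, together with the key point positions, are the kind of input a geometric transform estimation step would consume.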
A block diagram of an example of a feature extraction method 36 (for feature extraction component 28) is shown in
Feature extraction method 36 includes the element or component 40 of generating a blurred input image which involves convolving input image 38 with a two-dimensional box filter to create a box filtered image. The dimensions and size of this box filter can vary. For example, an N×N box filter, where N=8, may be used for video image applications. As another example, an N×N box filter, where N can vary between 8 and 32, may be used for still image applications.
An arbitrary size box filter of N×N can be computed efficiently with only four (4) operations (two (2) adds and two (2) subtracts) per pixel in input image 38. This is done by maintaining a one-dimensional (1D) array 42 which stores the sum of N consecutive image rows, for example the first eight (8) rows, where N=8, as generally illustrated in
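The running-sum idea can be sketched as follows, assuming the image is a list of rows of numbers and that only positions where the N×N box fits entirely inside the image are produced. The function name `box_filter` is hypothetical. The sketch returns box sums; dividing by N×N would give the mean.

```python
def box_filter(image, n):
    """Return N x N box sums for every position where the box fits."""
    height, width = len(image), len(image[0])
    if n > height or n > width:
        raise ValueError("box size exceeds image size")
    # 1D array: for each column, the sum of n consecutive image rows.
    col_sums = [sum(image[r][c] for r in range(n)) for c in range(width)]
    out = []
    for top in range(height - n + 1):
        if top > 0:
            # Move the row window down: one add and one subtract per column.
            for c in range(width):
                col_sums[c] += image[top + n - 1][c] - image[top - 1][c]
        # Slide horizontally over the column sums: one add and one
        # subtract per output pixel after the first window.
        window = sum(col_sums[:n])
        row_out = [window]
        for left in range(1, width - n + 1):
            window += col_sums[left + n - 1] - col_sums[left - 1]
            row_out.append(window)
        out.append(row_out)
    return out
```

The cost per pixel does not depend on N, which is why larger boxes for still images need not be slower than the N=8 case.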
Referring again to
and the determinant of H is: $\det H = f_{xx} f_{yy} - f_{xy}^2$. This means that the determinant of the Hessian matrix (detH) is only computed for 1/16th of the blurred input image, which increases the speed of method 36. Examples of the kernels fxx, fyy, and fxy used in computing the second-order partial derivatives are shown in
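A minimal sketch of evaluating the determinant of the Hessian only on a coarse grid is given below. The kernels used in the described method are shown in a figure that is not reproduced here, so this sketch substitutes plain central finite differences. The grid spacing of every fourth row and column (which covers 1/16th of the pixels) and the function names are also assumptions.

```python
def det_hessian(img, x, y):
    """Determinant of the Hessian at (x, y) using central differences."""
    fxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    fyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    fxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return fxx * fyy - fxy * fxy


def det_hessian_on_grid(img, step=4):
    """Evaluate det H only at every step-th row and column (assumed spacing)."""
    height, width = len(img), len(img[0])
    return {(x, y): det_hessian(img, x, y)
            for y in range(1, height - 1, step)
            for x in range(1, width - 1, step)}
```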
Referring again to
The pre-determined image dependent threshold can be calculated as follows. The laplacian of the first input image 38 is computed in coarse grid 50. The laplacian is computed with the kernel:
This computation is performed only for every 1/16th row and every 1/16th column. The initial threshold is given by: ThI=2 sdev(lapi), where sdev is the standard deviation of lapi. Using ThI on detH results in an initial number of feature points. If this is larger than the target, ThI is reduced until the target is reached. This is efficiently done using a histogram of the values of detH. If numI represents the initial number of feature points and numT represents the targeted number, then for the next input image 26 the laplacian is not computed and the initial threshold is computed as: ThI(k+1)=(0.9 numI/numT) ThI(k), where ThI(k+1) is the initial threshold for the next input image 38 and ThI(k) is the initial threshold for the previous input image 38.
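The two threshold formulas can be written out directly. This is a sketch only: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which estimator sdev denotes.

```python
import statistics


def initial_threshold(laplacian_values):
    """ThI = 2 * sdev(lap), with sdev taken as the population standard deviation."""
    return 2.0 * statistics.pstdev(laplacian_values)


def next_threshold(prev_threshold, num_initial, num_target):
    """ThI(k+1) = (0.9 * numI / numT) * ThI(k)."""
    return 0.9 * num_initial / num_target * prev_threshold
```

For example, if the previous image yielded twice the targeted number of feature points, the update multiplies the threshold by 1.8 for the next image.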
Method 36 further includes the element or component 56 of determining the high resolution feature points in the blurred input image. This is accomplished by applying a fine grid 58 shown in
Referring again to
Referring again
Referring again
Once one or more matching pairs of feature points are determined by feature matching module 32, feature based image registration method 22 proceeds to geometric transform estimation module 34. Module 34 utilizes the matching pairs of feature points and their positions to estimate a global affine transformation that maps first or reference image 24 into second or target image 26. Robustness against outliers is obtained by using either random sample consensus (RANSAC) or M-Estimation. Other approaches (e.g., a robust mean or utilization of the median of the motion vectors defined by matching pairs of feature points) can be used if correction of only translation is required, rather than translation, rotation, scaling and shear. These approaches also tend to be computationally less expensive and faster.
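For the translation-only case, the median of the motion vectors mentioned above can be sketched as follows. The function name and the input layout (a list of matched key point positions) are assumptions for illustration.

```python
import statistics


def estimate_translation(pairs):
    """Estimate a global translation from matched key points.

    pairs is a list of ((x1, y1), (x2, y2)) positions in the reference and
    target images. The median of the per-pair motion vectors is taken
    separately in x and y, which limits the influence of outlier matches.
    """
    dx = statistics.median([x2 - x1 for (x1, _), (x2, _) in pairs])
    dy = statistics.median([y2 - y1 for (_, y1), (_, y2) in pairs])
    return dx, dy
```

A single badly mismatched pair shifts a mean estimate but leaves the median unchanged as long as most pairs agree.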
Referring again to
The intended motion may be determined using a Kalman filter on the history of motion parameters. The unintended motion for which compensation is required is determined by subtracting the determined intended motion from the motion determined by module or component 14. A first order kinematic model may be used for Kalman filtering of the cumulative affine motion parameters. The process for obtaining the cumulative affine motion parameters is as follows. Let dZn represent the global motion affine transform between image frame (n−1) and image frame (n). In homogeneous coordinates dZn is given by:
The cumulative affine transform from frame 0 to frame (n) can be computed as:
$Z_n = dZ_n Z_{n-1}$
with initial condition Z0=I.
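The recursion can be sketched with 3×3 homogeneous matrices stored as nested lists. The helper names are hypothetical. Each new frame-to-frame transform is multiplied on the left of the running product, starting from the identity.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def accumulate_transforms(frame_transforms):
    """Return [Z_1, Z_2, ...] where Z_n = dZ_n * Z_(n-1) and Z_0 = I."""
    z = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    history = []
    for dz in frame_transforms:
        z = matmul3(dz, z)
        history.append(z)
    return history
```

Because matrix multiplication does not commute, the left-multiplication order matters whenever the frame-to-frame transforms include rotation, scaling, or shear.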
Zn can also be expressed in homogeneous coordinates:
The process for obtaining the cumulative affine motion parameters for translational motion only can be simplified as follows. Let dzn represent a global translation parameter between frame (n−1) and frame (n). The cumulative parameters from frame 0 to frame (n) can be computed as:
$z_n = dz_n + z_{n-1}$
with initial condition z0=0.
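For a single translation parameter the recursion is a running sum, which can be sketched with the standard library. The function name is hypothetical, and one scalar parameter (for example the horizontal shift) is assumed per call.

```python
from itertools import accumulate


def cumulative_translation(frame_deltas):
    """Return [z_1, z_2, ...] where z_n = dz_n + z_(n-1) and z_0 = 0."""
    return list(accumulate(frame_deltas))
```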
The first order kinematic model used for Kalman filtering of the cumulative affine motion parameters is as follows. Let dan represent a sequence of the true intended values of any of the affine transform parameters describing the motion between two adjacent image frames and let an represent the cumulative values. The noisy cumulative motion measurements zn are related to the true intended values by:
$z_n = H x_n + v_n$
where xn represents the state vector
The matrix H=[1 0] maps the state to the true intended value and vn represents the measurement noise or unintended motion which is assumed to be normally distributed with zero mean and covariance R.
The filtering process is described by the linear stochastic difference equation:
$x_n = F x_{n-1} + w_{n-1}$
where the state transition matrix is given by
and wn represents the process noise which is assumed to be normally distributed with zero mean and covariance Q.
An example of the Kalman filter 78 utilized by module or component 16 is shown in
Covariances R and Q are used to increase or decrease the amount of filtering as well as the inertia in the system. In one application in accordance with the present invention, R is a scalar representing the variance of the cumulative motion parameters. It is common to express R in standard deviation form σR (the square root of R). As an example for translational motion, values of σR in a range between one (1) and ten (10) pixels are typical. Low values of σR increase the influence of the noisy measurements and decrease the filtering accordingly. Similarly, high values of σR decrease the influence of the noisy measurements and increase the filtering. Also, in the one application in accordance with the present invention referenced above, Q is a diagonal 2×2 matrix with equal entries representing the process variance. As an example for translational motion, values of Q in a range between 0.0001 and 0.1 are typical. Low values of Q force the system to conform to a first order kinematic model (i.e., one of constant velocity or constant motion vectors). Consequently, both filtering and system inertia are increased. High values of Q decrease both filtering and inertia.
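A minimal sketch of such a filter for one cumulative motion parameter is shown below. Several details are assumptions for illustration, because the state vector and state transition matrix are not reproduced in the text here: the state is taken to be the cumulative value and its per-frame change, F is taken to be the constant-velocity matrix [[1, 1], [0, 1]], and the initial covariance and default values of `sigma_r` and `q` are arbitrary. H = [1 0], scalar R, and diagonal Q with equal entries follow the description.

```python
def kalman_smooth(measurements, sigma_r=5.0, q=0.01):
    """Estimate the intended values of one cumulative motion parameter."""
    if not measurements:
        return []
    r = sigma_r ** 2                       # measurement noise variance R
    a, v = float(measurements[0]), 0.0     # state: value and per-frame change
    p00, p01, p10, p11 = r, 0.0, 0.0, 1.0  # initial covariance (assumed)
    intended = [a]
    for z in measurements[1:]:
        # Predict: x = F x and P = F P F^T + Q with F = [[1, 1], [0, 1]].
        a = a + v
        pp00 = p00 + p01 + p10 + p11 + q
        pp01 = p01 + p11
        pp10 = p10 + p11
        pp11 = p11 + q
        # Update with H = [1, 0]: only the cumulative value is measured.
        s = pp00 + r
        k0, k1 = pp00 / s, pp10 / s
        innovation = z - a
        a += k0 * innovation
        v += k1 * innovation
        p00, p01 = (1.0 - k0) * pp00, (1.0 - k0) * pp01
        p10, p11 = pp10 - k1 * pp00, pp11 - k1 * pp01
        intended.append(a)
    return intended
```

Subtracting each returned value from the corresponding measurement gives the unintended motion for which compensation is required.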
Referring again to
Additionally, in portion 82 of example 80, $\hat{A}_n Z_n^{-1}$ represents the unconstrained correction affine warp that takes out the original motion Zn and replaces it with the filtered intended motion Ân. Further, in portion 82 of example 80, $\psi(\hat{A}_n Z_n^{-1})$ represents a clipping function constraining the affine parameters to avoid warping into undefined regions.
Additionally, in portion 86 of example 84, $\psi(c_{n-1} + dc_n)$ represents a clipping function constraining the correction to the interval [cmin, cmax].
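The clipping of the translational correction can be sketched as below. The function names are hypothetical, and `delta` stands in for the per-frame increment dcn, whose definition is not reproduced in the text here.

```python
def clip(value, c_min, c_max):
    """Constrain value to the interval [c_min, c_max]."""
    return max(c_min, min(c_max, value))


def update_correction(prev_correction, delta, c_min, c_max):
    """c_n = psi(c_(n-1) + dc_n), with psi clipping to [c_min, c_max]."""
    return clip(prev_correction + delta, c_min, c_max)
```

Keeping the correction inside a fixed interval keeps the shifted frame within the available image margin.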
One way in which an implementation in accordance with the present invention can utilize Kalman filtering for image stabilization that differs from other implementations is illustrated in step (2) of both
An additional benefit of motion filtering module or component 16 of the present invention relates to the way in which degradations in system behavior during panning motion are addressed. Such degradations occur when the system transitions in and out of frames for which motion information cannot be extracted. An example of such a situation is panning in and out of scenes lacking features (e.g., a textureless wall). This results in noticeable and undesirable accelerations and decelerations in the motion filter output. A solution to this problem for a translational motion case in accordance with the present invention is as follows. First, the global motion vectors are assigned a weight at each iteration of the filter (i.e., at every frame). The default normal value for this weight is one (1). As a frame for which motion cannot be extracted is transitioned into, the last valid known motion vector is used and its value is decreased towards zero (0) by decreasing its weight linearly for every frame as long as true motion information is unavailable. At the same time, the Kalman filtering is decreased by halving the measurement noise standard deviation σR at each frame until the minimum allowable value is reached. Similarly, transition into a frame with available motion information is accomplished by increasing the weight linearly towards one (1) and using this weight on the known value of the global motion vector. At the same time, the Kalman filtering is increased by doubling σR at each frame until the nominal default value is reached.
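The per-frame bookkeeping for this transition scheme can be sketched as follows. The function names, the linear step size, and the minimum and nominal values of σR are assumptions chosen for illustration; the description does not give specific values.

```python
def step_transition(weight, sigma_r, motion_available,
                    weight_step=0.25, sigma_min=0.5, sigma_nominal=4.0):
    """Advance the motion-vector weight and sigma_R by one frame."""
    if motion_available:
        # Ramp the weight linearly back to 1 and double sigma_R up to nominal.
        weight = min(1.0, weight + weight_step)
        sigma_r = min(sigma_nominal, sigma_r * 2.0)
    else:
        # Ramp the weight linearly to 0 and halve sigma_R down to the minimum.
        weight = max(0.0, weight - weight_step)
        sigma_r = max(sigma_min, sigma_r / 2.0)
    return weight, sigma_r


def weighted_motion(weight, measured, last_valid):
    """Weight the measured vector, or the last valid one if none is available."""
    source = measured if measured is not None else last_valid
    return weight * source
```

Calling `step_transition` once per frame and feeding `weighted_motion` into the filter gives a gradual hand-over rather than an abrupt change in the filter input.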
Although several examples have been described and illustrated in detail, it is to be clearly understood that the same are intended by way of illustration and example only. These examples are not intended to be exhaustive or to limit the invention to the precise form or to the exemplary embodiments disclosed. Implementations, modifications and variations may well be apparent to those of ordinary skill in the art. For example, motion compensation module or component 18 can be implemented for the translational case illustrated in
Additionally, reference to an element in the singular is not intended to mean one and only one, unless explicitly so stated, but rather means one or more. Moreover, no element or component is intended to be dedicated to the public regardless of whether the element or component is explicitly recited in the following claims.