This application claims priority under 35 U.S.C. §119 to GB Patent Application No. 1414743.3, filed Aug. 19, 2014, the entire contents of which are hereby incorporated herein by reference.
1. Field of the Invention
The present invention relates to a method for processing a video stream and producing a framed video stream.
2. Description of the Related Technology
When capturing a video stream of a subject in a scene, framing the scene effectively, for example to produce an aesthetically pleasing composition, may be difficult, especially if the subject is in motion. For example, the camera operator may not move the camera smoothly, and may be unable to accurately track the subject's motion. A preview of the video stream may be available to the user while capturing the video stream, but this may be of limited use, for example if bright light is incident on the camera's preview screen, or if the camera is held such that the screen is not easily visible.
In addition, objects may not always be present in a video stream. For example, an object may exit an area being filmed, or may be obscured by another object.
A method is required for improving the automatic framing of a video stream.
According to a first aspect of the present invention, there is provided a method of framing a video stream, the method comprising: detecting the presence and position of an object in the video stream; detecting motion of the camera with respect to the scene being imaged; and framing the video stream in dependence on the position of the object and on the motion of the camera.
The method improves the framing of a video stream by making the framing dependent on two parameters instead of one. For example, the framing can follow the motion of the object while the object is present in the video stream, and be stabilized according to the motion of the camera while the object is not present. An aesthetically pleasing composition may thus be obtained both while the object is present and while it is not.
The invention further relates to a system for framing a video stream, the system comprising a camera and a processing unit arranged to perform the method described above.
Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings.
FIGS. 6a and 6b show two systems for implementing the method.
The detection of the object may for example use known algorithms such as extraction of a histogram of oriented gradients. The histogram of oriented gradients may be analyzed by a feature classifier such as a support vector machine. Use of a support vector machine in this manner is known, for example from "Support-Vector Networks" (Cortes and Vapnik, Machine Learning 20, 273–297 (1995), Kluwer Academic Publishers), and involves comparison of part of an image with a template produced by training the algorithm on images containing identified objects. The object detection may, in some embodiments, involve formation of a feature vector from features of the image, to which a trained classifier is applied. Object detection algorithms may output a detection score indicating the confidence of detection, with scores over a threshold value being interpreted as detection of the presence of the object in a frame. Once an object has been detected, its position within a frame can be determined. The position of the object may, for example, be expressed as the center of a box bounding the object.
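By way of illustration only, the following minimal sketch performs such a detection step using OpenCV's pre-trained HOG-plus-SVM people detector; the choice of OpenCV and the threshold value are assumptions made for this sketch and are not part of the method itself.

```python
import numpy as np
import cv2

# Pre-trained HOG descriptor with OpenCV's default people detector
# (a linear SVM trained on histogram-of-oriented-gradients features).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

DETECTION_THRESHOLD = 0.5  # illustrative threshold on the detection score

def detect_object(frame):
    """Return (present, center) for the best detection in a frame."""
    rects, weights = hog.detectMultiScale(frame)
    best_score, best_center = -1.0, None
    for (x, y, w, h), score in zip(rects, np.ravel(weights)):
        # Scores over the threshold are interpreted as a detection.
        if score >= DETECTION_THRESHOLD and score > best_score:
            best_score = score
            # Express the position as the center of the bounding box.
            best_center = (x + w / 2.0, y + h / 2.0)
    return best_center is not None, best_center
```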
Motion of the camera 105 with respect to the scene being imaged is detected (step 120). This may be performed by direct analysis of camera motion 125, for example by use of accelerometers mounted on the camera 105. As another example, motion of the camera 105 may be determined by analysis of the video stream 110 using known techniques such as dividing the frames of the video stream 110 into tiles, and determining a motion vector for each tile. Techniques for stabilization of video streams to compensate for camera shake are known in the art. Such techniques include use of an optical stabilization module to move the camera sensor with respect to the lens (for example as described in U.S. Pat. No. 3,942,862) and digital processing including selection of a sub-frame of the full frame whose position relative to the full frame moves in a manner opposing the motion of the camera (for example as described in U.S. Pat. No. 7,956,898).
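As a sketch of the tile-based technique mentioned above (with the grid size chosen arbitrarily for illustration), global camera motion between two frames may be estimated by phase correlation on each tile, taking the median so that tiles dominated by independently moving objects do not skew the result:

```python
import numpy as np
import cv2

def estimate_camera_motion(prev_gray, curr_gray, grid=4):
    """Estimate global (dx, dy) camera motion between two grayscale frames."""
    h, w = prev_gray.shape
    th, tw = h // grid, w // grid
    shifts = []
    for row in range(grid):
        for col in range(grid):
            y0, x0 = row * th, col * tw
            a = np.float32(prev_gray[y0:y0 + th, x0:x0 + tw])
            b = np.float32(curr_gray[y0:y0 + th, x0:x0 + tw])
            # Phase correlation yields one motion vector per tile.
            (dx, dy), _response = cv2.phaseCorrelate(a, b)
            shifts.append((dx, dy))
    # The median is robust to tiles containing moving foreground objects.
    return (float(np.median([s[0] for s in shifts])),
            float(np.median([s[1] for s in shifts])))
```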
A framing step 130 generates a framed video stream 135, using a framing in dependence on the motion of the camera 105 and the position of the object. The framing may include cropping and/or scaling the video stream, for example selecting a sub-frame comprising a region of a frame of the video stream provided by the camera and discarding the pixels outside that sub-frame. The sub-frame is preferably rectangular, with a specified origin and dimensions. The sequence of sub-frames selected from a sequence of frames from the camera forms the framed video stream 135.
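The cropping operation itself is then straightforward; a minimal sketch, assuming frames are NumPy arrays and clamping the window so it stays inside the full frame:

```python
import numpy as np

def crop_subframe(frame, origin, size):
    """Select a rectangular sub-frame and discard the pixels outside it.

    frame:  full frame as an H x W x C array
    origin: (x, y) of the sub-frame's top-left corner
    size:   (width, height) of the sub-frame
    """
    x, y = origin
    w, h = size
    # Clamp the origin so the sub-frame lies entirely within the frame.
    x = int(max(0, min(x, frame.shape[1] - w)))
    y = int(max(0, min(y, frame.shape[0] - h)))
    return frame[y:y + h, x:x + w]
```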
For example, when the object is present, the framing may depend only on the position of the object, for example such that the object remains in substantially the same position within each frame, such as the middle of the frame. When the object is not present, the video stream may be framed depending on the motion of the camera to compensate for the motion; that is to say the video stream may be stabilized.
According to some aspects of the method, when the object is present in the video stream, the framing may simultaneously depend on the motion of the camera and on the position of the object. The relative degree of dependence on the motion of the camera and position of the object may depend on a relative weighting factor. The relative weighting factor may depend on a detection score assigned to the object by an object detection algorithm. For example if an object is identified with a high degree of certainty, the framing may depend almost entirely on the motion of the object, whereas if the object is identified with a low degree of certainty, the framing may depend to a greater degree on the motion of the camera.
In some embodiments, multiple objects may be identified in a video stream, or in a single frame. The method may include selection of a single one of these objects for determining the framing, the selection being based on its position within a frame of the video stream, or based on the size of the object, or based on the type of the object (for example “person” or “car”). The selection may alternatively be performed by manual selection of an object by a user.
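As one illustrative selection rule among those listed above, the sketch below picks the largest detected bounding box, breaking ties toward the box nearest the center of the frame; both criteria are assumptions for the sketch rather than requirements of the method.

```python
def select_object(detections, frame_size):
    """Pick one (x, y, w, h) bounding box from a list of detections."""
    cx, cy = frame_size[0] / 2.0, frame_size[1] / 2.0

    def preference(box):
        x, y, w, h = box
        dist = ((x + w / 2.0 - cx) ** 2 + (y + h / 2.0 - cy) ** 2) ** 0.5
        # Larger area wins; among equal areas, nearer the center wins.
        return (w * h, -dist)

    return max(detections, key=preference) if detections else None
```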
In some embodiments, as shown in the accompanying drawings, the framing is expressed as a crop window 310 selected within each frame of the video stream.
If the framing is expressed as a crop window 310, the position of the crop window 310 in a given frame may be expressed as the displacement of the crop window 310 with respect to the position of the crop window in the previous frame. The relative contributions of the camera motion and the object position to the framing may, for example, be combined as follows, where Δx is the horizontal displacement and Δy is the vertical displacement of the crop window 310 with respect to the position in the previous frame:
Δx = αF1(δ′x) + (1−α)F2(−δx)
Δy = αF1(δ′y) + (1−α)F2(−δy)
where δ′x and δ′y are the horizontal and vertical displacements of the object from its position in the previous frame; δx and δy are the amount of horizontal and vertical motion of the camera relative to the previous frame; F1 and F2 are spatial/temporal filters which may be applied to the motion of the camera and/or to the motion of the object to smooth the motion of the crop window between frames; and α is the relative weighting factor as described above.
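Read as a per-frame update of the crop-window origin, the two equations above might be implemented as in the sketch below, with the filters F1 and F2 supplied as callables; identity functions reproduce the unfiltered case described next.

```python
def update_crop_window(origin, obj_disp, cam_motion, alpha,
                       f1=lambda v: v, f2=lambda v: v):
    """One framing step for the crop window.

    origin:     (x, y) of the crop window in the previous frame
    obj_disp:   (dx', dy') displacement of the object since the previous frame
    cam_motion: (dx, dy) motion of the camera since the previous frame
    alpha:      relative weighting factor in [0, 1]
    f1, f2:     optional spatial/temporal filters (identity by default)
    """
    dx = alpha * f1(obj_disp[0]) + (1 - alpha) * f2(-cam_motion[0])
    dy = alpha * f1(obj_disp[1]) + (1 - alpha) * f2(-cam_motion[1])
    return origin[0] + dx, origin[1] + dy
```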
In some aspects of the invention, the spatial/temporal filters are not applied. This is equivalent to setting:
F1(δ′x) = δ′x
F1(δ′y) = δ′y
F2(−δx) = −δx
F2(−δy) = −δy
in the equations above. In aspects in which spatial/temporal filters F1, F2 are applied, they may be applied to frames in which the object is present, or frames in which the object is not present, or both, in order to smooth the motion of the crop window between frames. An example of such a filter is a linear temporal filter FLT, which may be defined as:
FLT(x(t)) = βx(t) + (1−β)x(t−1)
where x(t) is the position of the crop window in frame t, and β is a temporal smoothing parameter. A similar filter may be applied to the motion of the camera. A smaller value of the temporal smoothing parameter causes a smoother motion of the crop window between frames. More complex temporal filters, such as a non-linear filter with a stability region, or a spatio-temporal filter which takes into account the size of the displacement, may alternatively be incorporated into the method.
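A minimal stateful sketch of the linear temporal filter, taking x(t−1) to be the previously filtered position (i.e. an exponential moving average, which is one reasonable reading of the definition above):

```python
class LinearTemporalFilter:
    """F_LT(x(t)) = beta * x(t) + (1 - beta) * x(t - 1).

    A smaller beta gives more weight to the previous position and so
    produces smoother motion of the crop window between frames.
    """

    def __init__(self, beta):
        self.beta = beta
        self.prev = None

    def __call__(self, x):
        if self.prev is None:
            self.prev = x  # first frame: no history to smooth against
        out = self.beta * x + (1 - self.beta) * self.prev
        self.prev = out
        return out
```

Separate instances would be kept for the object term and the camera-motion term, since each filter carries its own history.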
In some embodiments, the relative weighting factor is equal to 1 if an object is detected in a given frame, and equal to 0 otherwise, with the consequence that the framing depends entirely on the position of the object when the object is present, and entirely on the motion of the camera when the object is not present. In other embodiments, in frames in which the object is detected, the relative weighting factor may have a value between 0 and 1, providing a balance between tracking the position of the object and compensating for the motion of the camera. In such embodiments, a higher value of the relative weighting factor causes a greater degree of dependence on the position of the object, and a lower value causes a greater degree of dependence on the motion of the camera. Spatial and/or temporal filtering may be applied when determining the weighting factor, for example to effect a smooth transition between dependence of the framing on the position of the object and dependence on the motion of the camera.
The framing may include scaling the crop window by a scaling factor, which may be dependent on the size of the object. In some embodiments, this may be implemented as depicted in the accompanying drawings.
The scaling factor, here termed Z, may be defined as:
Z=γS
where γ is a scaling parameter and S is a measure of the size of the detected object, for example its height or width. It may be desirable to apply spatial or temporal filtering to the scaling factor in order to ensure smooth transitions of the crop window between frames. In such embodiments, the scaling factor may be defined as:
Z=αF(γS)
where α is equal to 1 if the object is present in a given frame and equal to 0 otherwise, and F is a filter, for example a linear temporal filter as described above.
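A corresponding sketch for the scale term, reusing a filter such as the one above; the handling of the object-absent case (leaving the window unscaled rather than collapsing it when Z = 0) is an interpretation made for this sketch.

```python
def scaled_crop_size(base_size, object_size, gamma, alpha, flt=None):
    """Compute the crop-window size under Z = alpha * F(gamma * S).

    base_size:   (width, height) of the unscaled crop window
    object_size: S, e.g. the height or width of the detected object
    gamma:       scaling parameter
    alpha:       1 if the object is present in this frame, 0 otherwise
    flt:         optional filter F, e.g. a LinearTemporalFilter
    """
    z = gamma * object_size
    if flt is not None:
        z = flt(z)
    z *= alpha
    if z <= 0:
        return base_size  # object absent: leave the crop window unscaled
    return (base_size[0] * z, base_size[1] * z)
```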
According to some aspects of the invention, the framing may include positioning the object in a position offset from the center of a frame, in which the offset may depend on the orientation of the object. The orientation may be identified by applying multiple classifiers to the video stream, each classifier being trained to identify an object in a different orientation. For example, classifiers may be used which are trained to identify a human head oriented to the right, oriented to the left, or facing the camera. Each classifier typically outputs a detection score, with the highest-scoring classifier giving the best estimate of the orientation of the object.
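With a deliberately generic (hypothetical) classifier interface, orientation estimation reduces to an argmax over the per-orientation detection scores:

```python
def estimate_orientation(frame, classifiers):
    """Return the orientation label whose classifier scores highest.

    classifiers: mapping from an orientation label (e.g. "facing_left",
                 "facing_right", "facing_camera") to an object with a
                 score(frame) method returning a detection score.
                 This interface is assumed for illustration only.
    """
    return max(classifiers, key=lambda label: classifiers[label].score(frame))
```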
In an exemplary embodiment, to obtain an aesthetically pleasing composition it may be desirable to position a right-facing person not in the center but to the left of the center of a frame of the framed video stream 135, and vice versa. For example, the horizontal and vertical offsets X0 and Y0 of the object from the center of the frame may be set as follows, where h and v are the horizontal and vertical dimensions of the frame:
X0 = h/3 if the person is facing left;
X0 = −h/3 if the person is facing right;
X0 = 0 if the person is facing the camera;
Y0 = v/2 if the person is facing up; and
Y0 = −v/2 if the person is facing down.
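The offsets above translate directly into a lookup keyed on the estimated orientation; here h and v are taken to be the horizontal and vertical dimensions of the frame, an assumption consistent with the h/3 and v/2 terms:

```python
def composition_offset(orientation, h, v):
    """Offset (X0, Y0) of the object from the center of the frame."""
    x0 = {"facing_left": h / 3.0, "facing_right": -h / 3.0}
    y0 = {"facing_up": v / 2.0, "facing_down": -v / 2.0}
    # Orientations not listed (e.g. "facing_camera") default to no offset.
    return x0.get(orientation, 0.0), y0.get(orientation, 0.0)
```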
Two exemplary embodiments of a system for carrying out the above described methods are shown in FIGS. 6a and 6b. FIG. 6a shows a source 605 of a video stream connected to a processing unit 610 and a memory 615.
FIG. 6b shows a camera 625 providing a video stream. The camera is connected to a processing unit 610 and a memory 615 as described above. The processing unit 610 and memory 615 may be included within a computer 630 separate from the camera.
The above embodiments are to be understood as illustrative examples of the invention. Further embodiments of the invention are envisaged. For example, the source may be a memory within a computer, and the source 605, processing unit 610 and memory 615 may all be contained within a computer. It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.