This application is related to video processing.
When a video user wishes to extract a frame of video having high quality as in a still photo (i.e., a single frame of video data), a computer may be employed to process the data. For example, a frame may be exported from an output frame buffer, copied and stored. The stored frame of data can then be converted to a photo format, such as Joint Photographic Experts Group (JPEG), bitmap (BMP), or Graphics Interchange Format (GIF).
There are, however, a variety of artifacts that become noticeable upon pausing (or “freezing”) the frame which corrupt the image. During real-time video viewing or video playback, such artifacts are not perceptible as a single frame may last for only 1/60 of a second, for example. However, when the video is slowed down or stopped for a still frame, the artifacts may be quite visible.
In one embodiment of some aspects of the invention, there is described a method for extracting a still photo from a video signal which includes selecting at least one center-of-mass frame from the video signal, where the center-of-mass frame represents a candidate for the still photo, and the selecting is based on input, such as user input, that indicates a frame of interest. Pixel data in the at least one selected center-of-mass frame is corrected using pixel data from temporally offset frames to produce a corrected frame. A plurality of corrected frames is produced by repeating the selecting and the correcting, and a still photo is extracted from the plurality of corrected frames based on an image quality assessment of the corrected frames.
A system for extracting a still photo from a video signal includes a video capturing system for producing source data, a graphical user interface, and a processing unit configured to receive the source data and to receive input from the graphical user interface. The processing unit is further configured to select at least one center-of-mass frame from the video signal, where the center-of-mass frame represents a candidate for the still photo, and, in a further embodiment, the selecting is based on a user input that indicates a frame of interest. The processing unit is further configured to correct pixel data in the at least one selected center-of-mass frame using pixel data from temporally offset frames to produce a corrected frame. The processing unit repeats the selection and correction of pixel data to produce a plurality of corrected frames. The still photo is extracted by the processing unit from the corrected frames based on an image quality assessment of the corrected frames.
A non-transitory computer readable medium has instructions stored thereon that, when executed, perform an extraction of a still photo from a video signal according to the following steps. At least one center-of-mass frame is selected from the video signal, where the center-of-mass frame represents a candidate for the still photo, and the selecting is based on input that indicates a frame of interest. Pixel data is corrected in the at least one center-of-mass frame using pixel data from temporally offset frames to produce a corrected frame. The selecting and the correcting are repeated to produce a plurality of corrected frames. The still photo is extracted from the corrected frames based on an image quality assessment of the corrected frames.
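By way of non-limiting illustration, the selection, correction, repetition, and quality-based extraction steps described above might be organized as in the following sketch. The function names, the number of candidates, the offset count, and the sharpness heuristic are illustrative assumptions only and are not part of the described embodiments.

```python
# Hypothetical sketch of the extraction flow described above.
# Assumes frames are same-sized numpy arrays (H x W x 3, uint8).
import numpy as np

def extract_still_photo(frames, frame_of_interest, num_candidates=3, offset=2):
    """Select candidate center-of-mass frames, correct each using temporally
    offset frames, and return the corrected frame with the best quality score."""
    corrected = []
    for k in range(num_candidates):
        center = min(frame_of_interest + k, len(frames) - 1)   # candidate center-of-mass frame
        lo, hi = max(0, center - offset), min(len(frames), center + offset + 1)
        neighbors = [frames[i] for i in range(lo, hi) if i != center]
        corrected.append(correct_frame(frames[center], neighbors))
    # Image quality assessment: here a simple sharpness proxy (gradient energy).
    scores = [np.mean(np.abs(np.diff(f.astype(np.float32), axis=0))) for f in corrected]
    return corrected[int(np.argmax(scores))]

def correct_frame(center, neighbors):
    """Placeholder correction: blend the center frame with a temporal median of
    its neighbors to suppress transient artifacts (stand-in for the correction step)."""
    stack = np.stack([center] + neighbors).astype(np.float32)
    return (0.5 * center + 0.5 * np.median(stack, axis=0)).astype(center.dtype)
```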
A system and method are provided for extracting a still photo from video. The system and method allow a user to select from among various available settings as presented on a display of a graphical user interface. Categories of settings include, but are not limited to, entering input pertaining to known types of defects in the video data, selecting a real-time or a playback mode, selecting the video data sample size to be analyzed (e.g., number of frames), identifying blur contributors (e.g., velocity of a moving camera), and selecting various color adjustments. A user interface may also include selection of frames of interest within a video segment from which a still photo is desired. This user interface may provide a result for several iterations of the extraction process, allowing the user to select, display, save and/or print one or more extracted photos. The system may include a processing unit that extracts an optimized photo from among a series of initial extracted photos, with or without user input. The system and method may include a user interface that allows the user to have selective control of a sliding scale relationship between quality of result and processing time/processing resource allocation.
Source data 111 may be received from a video capturing system 105, such as a video camera. Alternatively, the source data 111 may be received from a video storage device 107 (e.g., a hard drive or a flash memory device) which may play back a video stream (e.g., internet protocol (IP) packets). A decoder or decompressor 108 generates uncompressed pixels from the video storage 107, which are subsequently filtered for improved quality or cleanup. The improved quality may be achieved by, but is not limited to, deriving corrections using motion vectors. The photo frame 112 is the output of the processing unit 101 following the photo extraction processing. While the compute shader 102, fixed function filters 103, and decoder/decompressor 108 are shown as being separate from the processing unit 101, the processing unit 101 may be configured to include any or all of these units as elements of a single unit 101′.
The user may designate a real-time mode for photo extraction as shown in step 202, using the GUI 104, which activates the photo extraction processing to occur during real-time operation of the video signal capture. In response to the selection of the real-time mode, a temporary load, rather than a sustained load, is added on the processing unit 101 for the photo extraction processing. The processing unit 101 restricts the photo extraction process during the real-time mode to the minimum time needed to extract the still photo, so as not to burden the related graphics and processing subsystems.
In step 203, the GUI 104 displays frames from which the user may select a single frame or a sequence of frames of interest for the photo extraction, which may be a portion of video data in temporal units.
Based on a selected single frame of interest, the processing unit 101 identifies a center-of-mass frame in step 204 within the source video data 111. This center-of-mass frame is a frame that the processor uses to analyze and process the pixel data to extract the photo. The center-of-mass frame may be the selected frame of interest, or it may be a nearby frame. If the user selects several frames of interest, the processing unit 101 selects the first frame of interest or a nearby frame as a first center-of-mass frame, the second frame of interest or a nearby frame as a second center-of-mass frame, and so on, until all frames of interest are each designated with a corresponding center-of-mass frame. From the multiple center-of-mass frames, the user or the processing unit 101 may select a final center-of-mass frame based on quality or preference of the user.
As an example of a center-of-mass frame, an I-frame in a Moving Picture Experts Group (MPEG) stream may be used for the center-of-mass frame. Alternatively, a frame close to where the user would like to pause (perhaps within a defined threshold number of frames), which has small motion vectors and/or small errors (both of which are specified within the MPEG standard), may be used to select a center-of-mass frame.
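By way of non-limiting example, this alternative selection might be sketched as follows. The per-frame motion-vector magnitudes and residual errors are assumed to be supplied by the decoder as simple per-frame scalars; the weights and search distance are illustrative assumptions.

```python
import numpy as np

def pick_center_of_mass(pause_idx, motion_mags, residuals, max_dist=5,
                        w_motion=1.0, w_resid=1.0):
    """Pick a frame near the desired pause point whose motion vectors and
    coding residuals are small. motion_mags and residuals are per-frame
    scalars assumed to have been extracted from the compressed stream."""
    best_idx, best_cost = pause_idx, float("inf")
    for i in range(max(0, pause_idx - max_dist),
                   min(len(motion_mags), pause_idx + max_dist + 1)):
        cost = w_motion * motion_mags[i] + w_resid * residuals[i]
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx
```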
Further alternatives to the step 204 selection of a center-of-mass frame include the following. The user may use the GUI 104 to select a single frame based on either the composition of the frame or the timing of the sequence. Alternatively, the user may use the GUI 104 to select a single frame as a first approximation for the center-of-mass frame based on composition or timing (e.g., the image in the frame is compelling in some way as it relates to the image content and/or to a particular moment in time). It may be, however, that the first approximation for the center-of-mass frame has an image quality that is less than desired. For example, the subject matter may not be properly lit, or may be off-center, clipped, and/or blurry. To remedy this, the processing unit 101 may select a nearby frame, as a second center-of-mass frame, which may also have the preferred characteristics of the first center-of-mass frame, but with improved quality (e.g., absence of motion blur and other artifacts). Alternatively, there may be no user intervention in the initial approximation. Instead, the processing unit 101 may select one or more frames of interest based on a quality parameter, such as whether detected eyes of a face in the image are open, centering of the image subject, size of a detected face, whether a detected face is directly or indirectly facing the camera, brightness, and so on. In this alternative, the processing unit 101 is configured with a non-transitory medium having stored instructions that, upon execution, perform algorithms to determine the above quality parameters.
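One hypothetical way such quality parameters could be ranked is sketched below. The brightness and sharpness proxies and their weighting are illustrative assumptions; face and eye detection would require a separate detector and is omitted here.

```python
import numpy as np

def quality_score(frame, target_brightness=128.0):
    """Score a frame (H x W x 3, uint8) on simple proxies for the quality
    parameters mentioned above: brightness near a target level and overall
    sharpness. Higher is better."""
    gray = frame.astype(np.float32).mean(axis=2)
    brightness_term = 1.0 - abs(gray.mean() - target_brightness) / 255.0
    gy, gx = np.gradient(gray)                  # sharpness proxy: gradient energy
    sharpness_term = np.sqrt(gx ** 2 + gy ** 2).mean() / 255.0
    return 0.5 * brightness_term + 0.5 * sharpness_term

# The highest-scoring candidate could then serve as a frame of interest:
# best = max(candidate_frames, key=quality_score)
```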
In any of the preceding examples for selecting the center-of-mass frame, the decision may be tiered by spatial aspects and/or temporal aspects of the image within the frame or frames of interest and nearby candidate frames. For example, a first selection may be a frame sequence in which the general composition of the image spatially within the frame has a quality or characteristic of interest, and a second selection may be a single frame within the frame sequence based on a momentary event displayed in the video frame. Alternatively, the tiered decision may be based on temporal aspects before spatial aspects. For example, the first selection may be a frame sequence based on a particular time segment, and the second selection may be a single frame within the frame sequence based on the size, position and/or orientation of the image content within the frame. The spatial aspect decision may also include input from the user having selected a region of interest 452 within the frame, as described above in step 203. Alternatively, the decision may be tiered based on various spatial aspects alone, or based on various temporal aspects alone.
In step 205, pixel data is collected from one or more temporally offset frames preceding the center-of-mass frame and one or more temporally offset frames following the center-of-mass frame for referencing and comparison to determine the artifacts for correction. The number of temporally offset frames from which the processing unit 101 collects pixel data may be adjusted by the processing unit 101 using an optimization algorithm that weighs processing time against quality assessment based on historical results. In addition, the number of offset frames may be a selectable fixed number based on the photo extraction mode setting. For example, if the real-time extraction mode 411 is activated, the processing unit 101 may set a lower number of offset frames, which allows restriction of the entire photo extraction process to an acceptable limited time duration as previously described. This adjustable number may also be selected by the user using an offset selector 421 displayed on the GUI 104 as shown in
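A minimal sketch of gathering the offset frames, with a smaller offset count in a real-time mode, follows. The mode names and default offset counts are assumptions for illustration only.

```python
def collect_offset_frames(frames, center_idx, mode="playback",
                          realtime_offsets=1, playback_offsets=4):
    """Gather temporally offset frames before and after the center-of-mass
    frame. A real-time mode uses fewer offsets to bound processing time."""
    n = realtime_offsets if mode == "realtime" else playback_offsets
    before = [frames[i] for i in range(max(0, center_idx - n), center_idx)]
    after = [frames[i] for i in range(center_idx + 1,
                                      min(len(frames), center_idx + n + 1))]
    return before, after
```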
In step 206, the compute shaders 102 and/or the fixed function filters 103 perform correction of pixel data to remove the artifacts related to various parameters including, but not limited to: poor color, motion blur, poor deinterlacing, video compression artifacts, poor brightness level, and poor detail. To correct the artifacts, an assessment of motion vectors within the video content is performed, whereby the degree of motion per pixel is established. Processing of pixel motion may include horizontal motion, vertical motion, or combined horizontal and vertical motion. A comparison of the current frame to a previous frame, a next frame, and/or previous and next frames combined, may be performed by the compute shaders 102 and/or fixed function filters 103. The pixel data may be processed to subtract, substitute, interpolate, or a combination thereof, to minimize blur, color errors, noise, or other aberrations associated with any non-uniform object motion (e.g., an object in accelerating or decelerating motion) with respect to uniform motion pixels (e.g., X, Y spatial interpolation, or X, Y spatial interpolation with Z temporal interpolation). Alternatively, pixel data from temporally offset frames may be substituted instead of being subtracted. In the case of interlaced content, a multiple frame motion corrected weave technique may be employed. Techniques that might otherwise take too long within a 1/60 second frame duration may be employed. Techniques such as edge enhancement, inter-macroblock edge smoothing, contrast enhancements, etc., may be developed in view of the time constraint. Given that a still image is being processed in this embodiment, more robust but more computationally intensive versions of these same techniques may be applied. The above artifact correction techniques may be constrained to the spatial coordinates of a single frame. In addition, more data within the spatial domain and/or the temporal domain from other frames may be employed. Other techniques that may be applied to remove the artifacts include consensus, substitution, or arithmetic combining operations that may be implemented by, for example, the compute shaders 102 (e.g., using a sum of absolute differences (SAD) instruction), or the fixed function filters 103. Alternatively, the user may selectively adjust motion blur and/or edge corrections while viewing an extracted still photo during a playback photo extraction mode, and save the settings as a profile for future photo extraction processing of a video clip where frames of interest have similar characteristics. For example, in a case where the camera is arranged to move at a certain velocity, such as a camera mounted on a rail system or on a guy wire, the similar characteristics may include camera velocity. The blur and/or edges may then be corrected based on a known camera velocity, and if stored in the profile, subsequent corrections may be easily repeated. The user may make the blur and/or edge correction selections using the GUI 104 at a camera velocity selector 461 as shown in
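As a hedged illustration of the subtraction/substitution/consensus idea, the sketch below blends the center-of-mass frame with its offset frames, weighting each offset frame by an inverse SAD measure. A real implementation would be motion-compensated per pixel; this simplified version ignores motion vectors for brevity and is not the described embodiment itself.

```python
import numpy as np

def temporal_consensus(center, offset_frames):
    """Blend the center-of-mass frame with temporally offset frames, weighting
    each offset frame by how closely it matches the center (inverse SAD).
    Frames that differ strongly (e.g., due to motion blur or transient noise)
    contribute less to the corrected result."""
    center_f = center.astype(np.float32)
    acc = np.copy(center_f)
    total_w = 1.0
    for frame in offset_frames:
        f = frame.astype(np.float32)
        sad = np.mean(np.abs(f - center_f))   # frame-level sum of absolute differences
        w = 1.0 / (1.0 + sad)                 # closer frames receive larger weight
        acc += w * f
        total_w += w
    return np.clip(acc / total_w, 0, 255).astype(center.dtype)
```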
With respect to color correction, the processing unit 101 may apply any one or more of the following techniques: gamma correction, modification to the color space conversion matrix, white balance, skin tone enhancement, blue stretch, red stretch, or green stretch. Alternatively, the user may selectively apply these color corrections, while viewing an extracted still photo during a playback photo extraction mode, and save the settings as a profile for future photo extraction processing of a video clip where frames of interest have similar characteristics, which may include, for example, environment, lighting condition, or camera setting. The user may make the color correction selections using the GUI 104 at the following displayed selectors as shown in
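A minimal sketch of two of these corrections, gamma correction and a gray-world white balance, is given below; the other adjustments listed above would follow a similar per-pixel pattern. The default gamma value and the gray-world assumption are illustrative choices.

```python
import numpy as np

def gamma_correct(frame, gamma=2.2):
    """Apply gamma correction to an 8-bit RGB frame."""
    norm = frame.astype(np.float32) / 255.0
    return np.clip((norm ** (1.0 / gamma)) * 255.0, 0, 255).astype(np.uint8)

def gray_world_white_balance(frame):
    """Scale each channel so its mean matches the overall mean (gray-world assumption)."""
    f = frame.astype(np.float32)
    channel_means = f.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / np.maximum(channel_means, 1e-6)
    return np.clip(f * gains, 0, 255).astype(np.uint8)
```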
In step 207, the processing unit 101 may optionally select another center-of-mass frame temporally offset from the initial center-of-mass frame (for example, if necessitated by an unsatisfactory quality of the processed center-of-mass frame) and may repeat steps 205 and 206 to correct detected artifacts while generating a histogram of the results. Using an image quality assessment of the results, an optimized photo extraction is achieved, and the optimized extracted photo 453 is displayed on a display of the GUI 104 as shown in
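One hypothetical way to compare the iterations using their histograms is sketched below, treating the entropy (spread) of the luminance histogram as a stand-in for the image quality assessment; the bin count and the use of entropy are assumptions, not the described assessment itself.

```python
import numpy as np

def histogram_quality(frame):
    """Score a corrected frame by the entropy of its luminance histogram;
    a wider spread is treated here as better contrast/detail."""
    gray = frame.astype(np.float32).mean(axis=2)
    hist, _ = np.histogram(gray, bins=64, range=(0, 255))
    p = hist / max(hist.sum(), 1)
    return float(-(p[p > 0] * np.log2(p[p > 0])).sum())

def pick_optimized(corrected_frames):
    """Select the optimized extraction from the corrected candidates."""
    scores = [histogram_quality(f) for f in corrected_frames]
    return corrected_frames[int(np.argmax(scores))], scores
```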
The processing unit 101 may halt further processing of adjacent frames when a scene change is detected in the frame sequence, since the detection indicates that such a frame is not suitable as the center-of-mass frame because it does not contain an image of interest.
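A simple hedged sketch of such a scene-change check compares the luminance histograms of adjacent frames against a threshold; the bin count and threshold value are illustrative assumptions.

```python
import numpy as np

def is_scene_change(prev_frame, next_frame, threshold=0.5):
    """Flag a scene change when adjacent frames have very different luminance
    histograms. Processing of further adjacent frames can halt when this
    returns True."""
    def lum_hist(frame):
        gray = frame.astype(np.float32).mean(axis=2)
        hist, _ = np.histogram(gray, bins=32, range=(0, 255))
        return hist / max(hist.sum(), 1)
    diff = 0.5 * np.abs(lum_hist(prev_frame) - lum_hist(next_frame)).sum()
    return diff > threshold
```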
Using this autonomous process, the processing unit 101 may select a “best” choice from the optimized photo extraction. The user may then select the extracted photo based on the initial center-of-mass frame or the optimized result according to user preference by comparing the displayed results on the GUI 104 as shown in
The GUI 104 displays as shown in
It should be noted that combinations of the above techniques may be used to address additional artifacts in the video signal frame, such as algorithms that may be used in a GPU post-processing system. In the context of the embodiments described herein, the algorithms may be modified in complexity or in processing theme, consistent with processing of photo pixels rather than a video stream with a dynamic nature. For example, but not by way of limitation, the following may be modified: the number of filter taps, deeper edge smoothing or softening (for correcting a jagged edge caused by aliasing), or selecting a photo color space in place of a video color space or vice-versa (i.e., a matrix transform may be used to remap the pixel color coordinates of the first color space to those of the other color space). The result is a photo extracted from the video signal input where the extracted photo has quality equivalent to or superior to one obtained by using a digital photo device.
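For the color-space remapping mentioned above, the following sketch uses the standard BT.601 conversion from YCbCr (a common video color space) to RGB (a common photo color space) as one possible matrix transform; the specific matrix choice is an example, not a requirement of the described embodiments.

```python
import numpy as np

# Approximate BT.601 YCbCr -> RGB conversion matrix.
YCBCR_TO_RGB = np.array([[1.0,  0.0,    1.402],
                         [1.0, -0.344, -0.714],
                         [1.0,  1.772,  0.0]], dtype=np.float32)

def ycbcr_to_rgb(frame_ycbcr):
    """Remap pixel color coordinates from YCbCr to RGB with a matrix transform."""
    f = frame_ycbcr.astype(np.float32)
    f[..., 1:] -= 128.0                       # center the chroma channels
    rgb = f @ YCBCR_TO_RGB.T                  # per-pixel matrix multiply
    return np.clip(rgb, 0, 255).astype(np.uint8)
```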
The processor 302 may include a central processing unit (CPU), a graphics processing unit (GPU), a CPU and GPU located on the same die, or one or more processor cores, wherein each processor core may be a CPU or a GPU, as in an accelerated processing unit (APU). The processor 302 may be configured to perform the functions as described above with reference to the processing unit 101/101′ shown in
The storage 306 may include a fixed or removable storage, for example, a hard disk drive, a solid state drive, an optical disk, or a flash drive, similar to the video storage device 107 shown in
The input driver 312 communicates with the processor 302 and the input devices 308, and permits the processor 302 to receive input from the input devices 308. The output driver 314 communicates with the processor 302 and the output devices 310, and permits the processor 302 to send output to the output devices 310. It is noted that the input driver 312 and the output driver 314 are optional components, and that the device 300 will operate in the same manner if the input driver 312 and the output driver 314 are not present.
It should be understood that many variations are possible based on the disclosure herein. Although features and elements are described above in particular combinations, each feature or element may be used alone without the other features and elements or in various combinations with or without other features and elements.
The methods provided may be implemented in a general purpose computer, a processor, or a processor core. Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine. Such processors may be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediary data including netlists (such instructions capable of being stored on a computer readable media). The results of such processing may be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements aspects of the present invention.
The methods or flow charts provided herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable storage medium for execution by a general purpose computer or a processor. Examples of computer-readable storage mediums include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
Number | Date | Country
--- | --- | ---
61581823 | Dec 2011 | US