1. Technical Field
This disclosure relates generally to video encoding, and, more specifically, to motion estimation.
2. Description of the Related Art
Performance of a video encoder can be thought of as a point in a three-dimensional space. The three axes represent quality (e.g., peak signal-to-noise ratio (PSNR) or other metrics), size of the encoded result (e.g., compression ratio), and time (e.g., effort or encoder speed). In general, if an encoder gains performance in one of the three dimensions, performance suffers in another one. For example, to improve picture quality in a given encoder, encoding would need to be performed either using finer (e.g., lower) quantizers, thereby increasing the size of the encoded result, or using exhaustive methods (e.g., evaluating all possible combinations to find the most profitable one in terms of quality), thereby increasing the time (and decreasing the speed) of the encoding.
This disclosure describes techniques and structures that facilitate scene dependent motion search range adaptation. In one embodiment, a current frame may be encoded by determining a plurality of motion vectors for the current frame. Statistical information regarding sizes of the determined motion vectors may be generated. The search range of a next frame may be modified based on the generated statistical information.
In one embodiment, generating statistical information regarding sizes of the determined motion vectors may include generating a histogram that includes a number of points. Each point of the histogram may represent a motion vector size frequency based on the determined motion vectors for the current frame. In some embodiments, the statistical information may be analyzed by determining a ratio of a first portion and a second portion of the histogram. Modifying the search range of the next frame may be based on the determined ratio.
A better understanding of embodiments of the invention can be obtained when the following detailed description is considered in conjunction with the following drawings, in which:
FIGS. 6a-6c illustrate a comparison of speed, quality, and size between the motion search range adaptation according to some embodiments and a fixed search range, for a mixed case of scene motion.
FIGS. 7a-7c illustrate a comparison of speed, quality, and size between the motion search range adaptation according to some embodiments and a fixed search range, for a case of low scene motion.
FIGS. 8a-8c illustrate a comparison of speed, quality, and size between the motion search range adaptation according to some embodiments and a fixed search range, for a case of high motion and complex texture.
This specification includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
Terminology. The following paragraphs provide definitions and/or context for terms found in this disclosure (including the appended claims):
“Comprising.” This term is open-ended. As used in the appended claims, this term does not foreclose additional structure or steps. Consider a claim that recites: “An apparatus comprising one or more processor units . . . .” Such a claim does not foreclose the apparatus from including additional components (e.g., a network interface unit, graphics circuitry, etc.).
“Configured To.” Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, “configured to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” language include hardware—for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/circuit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. §112, sixth paragraph, for that unit/circuit/component. Additionally, “configured to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.
“First,” “Second,” etc. As used herein, these terms are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.). For example, in a video having eight frames, the terms “first” and “second” frames can be used to refer to any two of the eight frames. In other words, the “first” and “second” frames are not limited to frames 0 and 1.
“Based On.” As used herein, this term is used to describe one or more factors that affect a determination. This term does not foreclose additional factors that may affect a determination. That is, a determination may be solely based on those factors or based, at least in part, on those factors. Consider the phrase “determine A based on B.” While B may be a factor that affects the determination of A, such a phrase does not foreclose the determination of A from also being based on C. In other instances, A may be determined based solely on B.
In the following discussion, a scene dependent motion search range adaptation is disclosed that allows for dynamic adaptation of motion estimation effort depending on the type and nature of content. The disclosure first describes an exemplary computing device that includes a video encoder, followed by a description of an encoder configured to perform scene dependent motion search range adaptation.
Turning now to
In various embodiments, encoder 104 may be implemented in hardware. For example, encoder 104 may be implemented on a graphics processing unit (GPU), as a dedicated video encoder, as a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC) or other suitable processor. In other embodiments, encoder 104 may be implemented as program instructions executed by a central processing unit (CPU) or other processing unit (including, but not limited to, a DSP, an accelerated processing unit (APU)—e.g., a CPU combined with GPU or DSP and the like). In various embodiments, encoder 104 may be a combination of hardware and software. Encoder 104 may be any type of encoder, such as a block based motion compensated video encoder, a full search encoder, etc. Encoder 104 may implement the disclosed techniques and produce encoded/compressed video. As shown, computing device 102 may output the encoded video to other devices over network 106.
Examples of those other devices may include, for example, computer system 108 (shown as a laptop), TV 110, tablet, set top box, mobile device 112 and the like. Such devices may include decoders (e.g., hardware or software) that are compatible with the encoded video format produced by encoder 104.
Turning now to
In one embodiment, motion estimator and encoder 202 may be configured to encode a current frame of a video. Encoding a current frame of a video may include determining a number of motion vectors for the current frame. Determining a number of motion vectors for the current frame may include estimating a difference in pixels between the current frame and a previous frame. Motion estimator and encoder 202 may perform motion estimation in a manner such that it respects four boundaries of a search range. For instance, the search range may include left, right, top, and bottom borders, which may be referred to as x_min, x_max, y_min, and y_max, respectively.
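As a minimal illustrative sketch, the four-boundary search range described above might be represented as follows. The class and method names here (SearchRange, contains) are assumptions for this example only, not drawn from any particular encoder implementation:

```python
from dataclasses import dataclass

@dataclass
class SearchRange:
    """A motion search range with four independently adjustable boundaries."""
    x_min: int  # left border (negative horizontal offset, in pixels)
    x_max: int  # right border
    y_min: int  # top border (negative vertical offset, in pixels)
    y_max: int  # bottom border

    def contains(self, dx: int, dy: int) -> bool:
        """Return True if candidate motion vector (dx, dy) lies inside the range."""
        return self.x_min <= dx <= self.x_max and self.y_min <= dy <= self.y_max

# Example: a search window spanning -16..+16 pixels in each direction.
default_range = SearchRange(x_min=-16, x_max=16, y_min=-16, y_max=16)
```

A motion estimator respecting the search range would only evaluate candidate vectors (dx, dy) for which `contains(dx, dy)` holds.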
Turning back to
In one embodiment, search range calculator 204 may modify the four boundaries independent of one another. For example, the update to x_min may be modified (e.g., calculated) separately from x_max, y_min, and y_max. The updated search range may be calculated for the next frame based on motion vector statistics of the present frame. Encoder 104 may also include logic (e.g., motion estimator and encoder 202, search range calculator 204, or some other logic not shown in
In one embodiment, to determine the search range, statistical information regarding sizes of the determined motion vectors for the current frame may be generated and in some embodiments, encoder 104 may save the generated statistical information. In one embodiment, generating statistical information may include generating a histogram (or data that could be used to generate such a histogram); the terms will be used interchangeably throughout. The histogram data may include a plurality of points, with each point representing a motion vector size frequency based on the determined motion vectors for the current frame. Thus, each point on the histogram curve may represent the number of times that a particular motion was found in the scene. In one embodiment, the histogram data may represent the overall frame and the frequency may indicate how many macroblocks have that particular motion. In such an embodiment, at the end of each frame, a histogram may be generated that includes the magnitudes of motion vectors found throughout the whole frame. As such, the histogram may indicate how dynamic the motion is in a given scene (e.g., large amounts of motion, or slow and smooth transitions). An example histogram distribution can be seen in
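The per-frame histogram data described above can be sketched as a simple frequency count of motion vector magnitudes along one axis (function name and the example vectors are illustrative assumptions):

```python
from collections import Counter

def mv_histogram(motion_vectors, axis=0):
    """Count how many macroblocks share each motion-vector magnitude
    along one axis (0 = horizontal, 1 = vertical). Each entry of the
    returned mapping is one "point" of the histogram: magnitude -> frequency."""
    return Counter(abs(mv[axis]) for mv in motion_vectors)

# Example: (dx, dy) motion vectors for a frame's macroblocks.
mvs = [(0, 0), (1, 0), (1, 2), (3, 1), (1, 0)]
hist = mv_histogram(mvs, axis=0)
```

In this example, `hist[1]` is 3 because three macroblocks moved one pixel horizontally; a histogram dominated by small magnitudes would indicate slow, smooth motion, while heavy tails would indicate a dynamic scene.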
With reference to
In various embodiments, search range calculator 204 may modify the search range of a next frame based on the generated statistical information. In one embodiment, search range calculator 204 may determine a ratio of a first portion and a second portion of the generated histogram. The ratio may be the ratio of the area under the entire histogram (A+B) to the area under the curve in the lower portion, A (e.g., lower 25%). The ratio may represent a value indicating how far the motion vectors of the current frame are from the current limit (e.g., x_min in the example shown in
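One way the ratio test might be realized is sketched below. This is an assumption-laden illustration: the threshold values, the 25% portion, and the interpretation of A as the portion of the histogram nearest the current limit are chosen for the example, not specified by the disclosure:

```python
def range_decision(hist, limit, increase_thresh=4.0, decrease_thresh=16.0,
                   portion=0.25):
    """Decide whether one search-range boundary should grow, shrink, or stay.

    hist:  mapping of motion-vector magnitude -> frequency for one axis/direction.
    limit: current boundary magnitude for that direction.
    A is taken as the histogram area in the `portion` of magnitudes nearest
    the limit; the ratio is (A + B) / A, i.e. total area over A.
    """
    cutoff = limit * (1 - portion)      # magnitudes above this count as "near the limit"
    total = sum(hist.values())          # A + B
    near = sum(f for m, f in hist.items() if m >= cutoff)  # A
    if near == 0:
        return "decrease"               # no motion anywhere near the limit
    ratio = total / near
    if ratio < increase_thresh:         # much of the motion crowds the limit
        return "increase"
    if ratio > decrease_thresh:         # hardly any motion near the limit
        return "decrease"
    return "keep"
```

A small ratio means A is large relative to the whole, i.e., many vectors sit near the boundary and the range should expand; a large ratio means the boundary is generous and can contract.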
In one embodiment, the increase or decrease in search range size may be by one pixel, two pixels, or some other number of pixels. The adjustment increment may differ depending on the direction and limit. For example, for vertical search range adjustments, the adjustment amount may be 2 pixels while for horizontal search range adjustments, the adjustment may be 1 pixel. In other embodiments, the adjustment amounts may be the same for horizontal and vertical search range adjustments. In some embodiments, the adjustment amounts may be variable such that the amount of search adjustment may vary depending on the content. For example, for a sharp scene transition that goes from low to high motion, the search range size at the start of the high motion may have been sized for low motion and used a small search window. Based on the histogram, search range calculator 204 may adapt the search range size more quickly to match the sharp change in content rather than incrementing the search range window at a constant rate for each frame in which it is determined that the search range should adjust.
Using example numbers, consider a scenario in which the search window at the transition from low motion to high motion was 4 pixels by 4 pixels. Further consider in this example that a search window normally increases or decreases a boundary of the search range by 2 pixels based on the calculated ratio and the threshold values. As the scene transitions from low to high motion, the histogram and calculated ratio may indicate a large difference in motion. In such a scenario, search range calculator 204 may increase the search range in one or more directions and/or limits by more than the typical 2 pixels. For example, the histogram and calculated ratio may indicate a large difference in motion in the y_max boundary (e.g., the calculated ratio may be much smaller than the increase threshold). Thus, in one embodiment, the search range may increase by 10 pixels in the y_max direction for the next frame to arrive at an appropriately sized search window more quickly than if the search range were increased by the same smaller amount from frame to frame. For the next frame, the calculated ratio may be larger than before but may still be smaller than the increase threshold. Accordingly, for the next frame, the search range may be increased by 4 pixels instead of 10 pixels, which may represent a search range that is closing in on an appropriately sized search range (for that boundary) for the motion in the frame. In such an embodiment, not only may the search range be adaptable from frame to frame, but it may also flexibly adapt such that quicker convergence may be achieved.
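The variable step size in the example above can be sketched as follows. The particular scaling rule (step grows with how far the ratio lies below the increase threshold) and all constants are illustrative assumptions chosen to reproduce the flavor of the example, not a prescribed formula:

```python
def adjustment_step(ratio, increase_thresh, base_step=2, max_step=10):
    """Choose a per-frame boundary adjustment amount.

    When the ratio sits far below the increase threshold (a sharp
    low-to-high motion transition), return a larger step so the search
    window converges in a few frames rather than creeping up at a
    constant base_step per frame."""
    if ratio >= increase_thresh:
        return base_step
    # The further below the threshold, the larger the step, capped at max_step.
    scale = increase_thresh / max(ratio, 1e-9)
    return min(max_step, int(base_step * scale))
```

With an increase threshold of 4.0, a ratio of 1.0 yields a step of 8 pixels, while a ratio of 2.0 on the following frame yields a smaller step of 4 pixels, mirroring the 10-then-4 convergence pattern described above.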
In one embodiment, encoder 104 may determine the type of frame. For example, encoder 104 may determine that a frame is an intracoded-frame (I-frame), or an interpredicted frame (e.g., bi-predictive picture frame (B-frame) or predicted picture frame (P-frame)). In one embodiment, if encoder 104 determines that a frame is an I-frame, search range calculator 204 may reset the search range to one or more default values. The default value may include default values for each direction and limit (e.g., x_min, x_max, y_min, y_max). If encoder 104 determines that the frame is not an I-frame, then motion estimator and encoder 202 and search range calculator 204 may perform the disclosed techniques to adapt the search range for the next frame. In various embodiments, if the frame is a B-frame, then the search forward and search backward may use different search ranges, or one of the search ranges may be used in both directions (e.g., the larger search range).
In various embodiments, the adaptive search range technique described herein may occur at the frame level per frame, at the macroblock level or for each of a plurality of groupings of macroblocks that collectively make up the frame. As an example, the current frame may be encoded by determining a number of motion vectors of the current frame, statistical information regarding sizes of the determined motion vectors for the current frame may be generated, and the search range for the next frame may be modified based on the generated statistical information at the macroblock level or for groupings of macroblocks.
By implementing an encoder that is configured to perform the disclosed content-dependent, adaptive search range techniques, improvements in the time/speed dimension may be achieved by reducing the search range of motion vectors without compromising the other two metrics (e.g., quality and compression ratio). By doing so, effort spent on motion estimation may be saved.
Turning now to
At 302, a frame type may be determined. For example, the frame type may be an I-frame, B-frame, or a P-frame.
As shown at 304, if the determined frame is an I-frame, the method may proceed to block 306. If the determined frame type is not an I-frame (e.g., a P-frame), the method may skip block 306 and proceed to block 308.
As illustrated at 306, the search range may be reset to default. For example, the search range for each direction and limit (e.g., x_min, x_max, y_min, y_max) may be reset to a default value. In one embodiment, the default values may result in a 32-pixel by 32-pixel search range. The search range may be reset to default values for the search range corresponding to each macroblock or grouping of macroblocks.
At 308, a current frame may be encoded by determining a plurality of motion vectors for the current frame. Determining the plurality of motion vectors for the current frame may include estimating a difference in pixels between the current frame and a previous frame.
As shown at 310, statistical information regarding sizes of the determined motion vectors for the current frame may be generated. Generating statistical information regarding sizes of the determined motion vectors may include generating a histogram. The histogram may include a number of points where each point represents a motion vector size frequency based on the determined motion vectors for the frame. The statistical information may be analyzed by determining a ratio of first and second portions of the generated histogram. For example, the ratio may be the total area of both portions of the histogram divided by the area of the first portion. In one embodiment, the first portion may be approximately 25% (e.g., approximately 20-30%) of the area under the histogram and the second portion may be the remaining area under the histogram. In one embodiment, a histogram may be generated for each axis (e.g., x, y, etc.).
As illustrated at 312, the search range for the next frame may be modified based on the generated statistical information. In one embodiment, the search range may be modified based on the determined ratio. For example, the determined ratio may indicate that a given boundary of the search range (e.g., x_min) should be reduced, enlarged, or remain the same. In some embodiments, the determination to reduce, enlarge, or not modify the limit may be based on the determined ratio as compared to one or more threshold values. For instance, if the determined ratio falls above a decrease threshold value, the search range may be reduced. Modification of the search range may also factor in an absolute lower or upper bound. In the previous example, where the determined ratio may fall above a decrease threshold value, a reduction in the search range may cause that limit to cross a lower bound. In such an example, the limit may be set to the lower bound, or the limit may remain the same in various embodiments. In some embodiments, the modification of the search range for a limit may vary from frame to frame. For example, the modification of the x_min limit for the search range from the current frame to the next frame may be by 8 pixels while the modification of the x_min limit for the search range from the next frame to the subsequent next frame may be by 2 pixels. Thus, the modification may be by a different amount from frame to frame. In one embodiment, the modification of the search range may be performed separately for each boundary (e.g., x_min, x_max, y_min, y_max) of the search range.
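Putting the blocks of the method together, one pass of the frame-level update might look like the following sketch. The reset values, thresholds, step size, and bounds are all illustrative assumptions; the disclosure only requires that each boundary be updated independently, that I-frames reset the range, and that modifications respect absolute bounds:

```python
DEFAULT = {"x_min": -16, "x_max": 16, "y_min": -16, "y_max": 16}

def update_search_range(frame_type, search_range, per_boundary_ratios,
                        increase_thresh=4.0, decrease_thresh=16.0,
                        step=2, lower_bound=4, upper_bound=64):
    """One pass of the method above: reset on an I-frame (blocks 304-306),
    otherwise grow or shrink each of the four boundaries independently
    based on that boundary's histogram ratio (blocks 310-312)."""
    if frame_type == "I":
        return dict(DEFAULT)                      # block 306: reset to defaults
    updated = dict(search_range)
    for boundary, ratio in per_boundary_ratios.items():
        mag = abs(updated[boundary])
        if ratio < increase_thresh:               # motion crowds this limit: enlarge
            mag = min(upper_bound, mag + step)
        elif ratio > decrease_thresh:             # limit is generous: shrink,
            mag = max(lower_bound, mag - step)    # clamped at the absolute lower bound
        sign = -1 if boundary in ("x_min", "y_min") else 1
        updated[boundary] = sign * mag
    return updated
```

For example, on a P-frame where the x_min ratio indicates crowding (ratio 2.0) and the x_max ratio indicates slack (ratio 20.0), x_min widens from -16 to -18 while x_max tightens from 16 to 14, with the y boundaries left unchanged.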
FIGS. 6a-6c, 7a-7c, and 8a-8c illustrate a comparison of speed, quality, and size between a fixed search range and an adaptive motion search range, according to some embodiments, for three different scenes. The fixed search range encoding and adaptive motion search range encoding were implemented on the same MPEG-4 encoder, using the same encoding parameters for the sake of a fair comparison. Thus, there were no variations in the process for testing the two search range implementations; for example, the same motion estimator (full search) was used for both cases and the quantizer was kept at a fixed value (QP=5) for both as well. Accordingly, the sole difference between the two encoding sessions is that one keeps the search range fixed (at −16 to +16 for both horizontal and vertical) whereas the other uses the disclosed search range adaptation techniques, allowing the search range to follow the motion trends as the sequence moves along. For both a fixed search range and adaptive search range,
FIGS. 6a-6c illustrate a comparison of speed, quality, and size between the disclosed motion search range adaptation and a fixed search range, for a mixed case of scene motion. For example, the scene was shot with a hand-held camera and contains a small amount of global motion throughout the sequence. The sequence begins with little motion, is followed by a large motion (e.g., a pan), and then settles to little or no motion. As shown in
FIG. 6b illustrates frame size in bytes on the y-axis and searches on the x-axis. As shown, no significant size difference exists between the frames produced by the two encodings, even in areas with very low search effort in the adaptive search range encoding. When the adaptive search range expands the search range (e.g., from around frame 180 to 200), however, smaller frames (nearly half the size) are produced for the adaptive search range encoding than for the fixed range encoding. Smaller frames may be made possible by more accurate block matches in the adaptive search range encoding. The PSNR graph in
FIGS. 7a-7c illustrate a comparison of speed, quality, and size between the disclosed motion search range adaptation and a fixed search range, for a case of low scene motion (e.g., to represent low bandwidth telecom scenarios).
FIGS. 8a-8c illustrate a comparison of speed, quality, and size between the motion search range adaptation according to some embodiments and a fixed search range, for a case of high motion and complex texture. For example, the scene used in generating these results was one in which a large part of the scene (the background) has some global motion (a pan) whereas a significant part (a bus) remains substantially stationary in the middle of the screen. As seen in
Turning now to
Processor subsystem 980 may include one or more processors or processing units. For example, processor subsystem 980 may include one or more processing units (each of which may have multiple processing elements or cores) that are coupled to one or more resource control processing elements 920. In various embodiments of computer system 900, multiple instances of processor subsystem 980 may be coupled to interconnect 960. In various embodiments, processor subsystem 980 (or each processor unit or processing element within 980) may contain a cache or other form of on-board memory. In one embodiment, processor subsystem 980 may include processor 10 described above.
System memory 920 is usable by processor subsystem 980. System memory 920 may be implemented using different physical memory media, such as hard disk storage, floppy disk storage, removable disk storage, flash memory, random access memory (RAM—static RAM (SRAM), extended data out (EDO) RAM, synchronous dynamic RAM (SDRAM), double data rate (DDR) SDRAM, RAMBUS RAM, etc.), read only memory (ROM—programmable ROM (PROM), electrically erasable programmable ROM (EEPROM), etc.), and so on. Memory in computer system 900 is not limited to primary storage such as memory 920. Rather, computer system 900 may also include other forms of storage such as cache memory in processor subsystem 980 and secondary storage on I/O Devices 950 (e.g., a hard drive, storage array, etc.). In some embodiments, these other forms of storage may also store program instructions executable by processor subsystem 980.
I/O interfaces 940 may be any of various types of interfaces configured to couple to and communicate with other devices, according to various embodiments. In one embodiment, I/O interface 940 is a bridge chip (e.g., Southbridge) from a front-side to one or more back-side buses. I/O interfaces 940 may be coupled to one or more I/O devices 950 via one or more corresponding buses or other interfaces. Examples of I/O devices include storage devices (hard drive, optical drive, removable flash drive, storage array, SAN, or their associated controller), network interface devices (e.g., to a local or wide-area network), or other devices (e.g., graphics, user interface devices, etc.). In one embodiment, computer system 900 is coupled to a network via a network interface device.
Program instructions that are executed by computer systems (e.g., computer system 900) may be stored on various forms of computer readable storage media. Generally speaking, a computer readable storage medium may include any non-transitory/tangible storage media readable by a computer to provide instructions and/or data to the computer. For example, a computer readable storage medium may include storage media such as magnetic or optical media, e.g., disk (fixed or removable), tape, CD-ROM, or DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, or Blu-Ray. Storage media may further include volatile or non-volatile memory media such as RAM (e.g. synchronous dynamic RAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM, low-power DDR (LPDDR2, etc.) SDRAM, Rambus DRAM (RDRAM), static RAM (SRAM), etc.), ROM, Flash memory, non-volatile memory (e.g. Flash memory) accessible via a peripheral interface such as the Universal Serial Bus (USB) interface, etc. Storage media may include microelectromechanical systems (MEMS), as well as storage media accessible via a communication medium such as a network and/or a wireless link.
In some embodiments, a computer-readable storage medium can be used to store instructions read by a program and used, directly or indirectly, to fabricate hardware for encoder 104 described above. For example, the instructions may outline one or more data structures describing a behavioral-level or register-transfer level (RTL) description of the hardware functionality in a high level design language (HDL) such as Verilog or VHDL. The description may be read by a synthesis tool, which may synthesize the description to produce a netlist. The netlist may comprise a set of gates (e.g., defined in a synthesis library), which represent the functionality of encoder 104. The netlist may then be placed and routed to produce a data set describing geometric shapes to be applied to masks. The masks may then be used in various semiconductor fabrication steps to produce a semiconductor circuit or circuits corresponding to encoder 104.
Although specific embodiments have been described above, these embodiments are not intended to limit the scope of the present disclosure, even where only a single embodiment is described with respect to a particular feature. Examples of features provided in the disclosure are intended to be illustrative rather than restrictive unless stated otherwise. The above description is intended to cover such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.
The scope of the present disclosure includes any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof, whether or not it mitigates any or all of the problems addressed herein. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the appended claims.