The technology described herein encodes pixel data of an image or video frame to support multiple access patterns, including access in raster and macroblock formats, for image data that are captured, processed, stored, or displayed in a computing system.
In today's (2013) imaging applications, it is often desirable to capture, process, display, and store images in mobile, portable, and stationary devices. The prodigious number of pixels captured during image and video processing can create bottlenecks for system speed and performance in such devices. In imaging applications, at least two access patterns are common: raster-based access (accessing sequential pixels along successive horizontal rows, or rasters, of image frames) and block-based access (accessing square [or rectangular] subsets of pixels, where the entire image frame is tiled with squares [or rectangles]). Compression of image frames using standard video compression algorithms such as MPEG2 and H.264 reduces these bottlenecks at the cost of additional computations and storage for reference frames (previously decoded image frames). In video applications, if lossless or lossy compression of macroblocks within reference frames were used to reduce memory capacity requirements and memory access time, it would be desirable that such macroblock encoding be computationally efficient, to minimize demands on computing resources. It would be further desirable that the macroblock encoding method support both raster-based and block-based access patterns.
Imaging systems are ubiquitous in both consumer and industrial applications using microprocessors, computers, and dedicated integrated circuits called systems-on-chip (SoCs) or application-specific integrated circuits (ASICs). Such imaging systems can be found in personal computers, laptops, tablets, and smart phones; in televisions, satellite and cable television systems, and set-top boxes (STBs); and in industrial imaging systems that include one or more cameras and a network for capturing video from monitored systems as diverse as factories, office buildings, and geographical regions (such as when unmanned aerial vehicles or satellites perform reconnaissance). Such imaging and video systems typically capture frames of image data from image sensors that require raster-based access. Similarly, such imaging and video systems typically use monitors or displays on which users view the captured still images or videos. Because digital video systems require memory access at rates of tens or even hundreds of Megabytes (MByte) per second for recording or playback, several generations of video compression standards, including Moving Picture Experts Group (MPEG and MPEG2), ITU H.264, and the new H.265 (High Efficiency Video Codec), were developed to reduce the memory bandwidth and capacity requirements of video recording and playback. These video processing standards achieve compression ratios between 10:1 and 50:1 by exploiting pixel similarities between successive frames. Many pixels in the current frame can be identical to, or only slightly shifted horizontally and/or vertically from, corresponding pixels in previous frames. The aforementioned image compression standards operate by comparing areas of similarity between subsets (typically called macroblocks, or MacBlks) of the current image frame to equal-sized subsets in one or more previous frames. Macroblocks are the basic element used for many prediction and motion estimation techniques in video codec processes.
In the remainder of this document, we use the abbreviation ‘MacBlk’ for the term ‘macroblock’, to distinguish it from the abbreviation ‘MByte’ for the term ‘Megabyte’ (10^6 Bytes). The encoding process that searches for, and then determines, the location of similar MacBlks is commonly called Motion Estimation (ME). The decoding process that retrieves MacBlks from prior frames while creating MacBlks for the current frame is commonly called Motion Compensation (MC). Both ME and MC processes typically access pixels from prior frames in 16×16 pixel MacBlks. During both encoding and decoding, prior video frames whose MacBlks are searched (encoding) or used as a reference (decoding) are called reference frames. As of today (2013), ME and MC processes access uncompressed MacBlks (pieces of reference frames) in main memory, also called dynamic random access memory (DRAM) or double data rate (DDR) memory.
Especially in mobile and portable devices, where only a limited amount of power is available due to battery limitations, it is desirable to use as little power for video recording and playback as possible. A significant (>30%) amount of power is consumed during video encoding when the ME process accesses MacBlks in reference frames stored in off-chip DDR memory, and during video decoding when the MC process accesses MacBlks in reference frames stored in off-chip DDR memory. In today's portable computers, tablets, and smart phones, the video encoding and decoding process is often orchestrated by one or more cores of a multi-core integrated circuit (IC).
The present specification describes an access encoder for performing low complexity encoding of reference frame MacBlks in a user-programmable way that supports both raster and MacBlk-based access. As MacBlks from reference frames are written to DDR memory, they are encoded according to user-selected parameters, such as the desired encoding ratio or the desired image quality (optionally including lossless compression). Similarly, as encoded MacBlks from reference frames are read from off-chip DDR memory, they are decoded according to the parameters selected or calculated during prior MacBlk encoding. The access encoder organizes the pixel data in a manner that supports both raster-based access and macroblock-based access. In prior video processing systems, additional steps such as transposition are typically required to convert between the decoded macroblocks in reference frames (stored in MacBlk access patterns for the convenience of standard video encoding and decoding) and the raster-based access preferred by image sensors and image displays and monitors. The access encoder described herein does not require such steps.
Commonly owned patents and applications describe a variety of compression techniques applicable to fixed-point, or integer, representations of numerical data or signal samples. These include U.S. Pat. No. 5,839,100 (the '100 patent), entitled “Lossless and loss-limited Compression of Sampled Data Signals” by Wegener, issued Nov. 17, 1998. The commonly owned U.S. Pat. No. 7,009,533 (the '533 patent), entitled “Adaptive Compression and Decompression of Bandlimited Signals,” by Wegener, issued Mar. 7, 2006, incorporated herein by reference, describes compression algorithms that are configurable based on the signal data characteristic and measurement of pertinent signal characteristics for compression. The commonly owned U.S. Pat. No. 8,301,803 (the '803 patent), entitled “Block Floating-point Compression of Signal Data,” by Wegener, issued Apr. 28, 2011, incorporated herein by reference, describes a block-floating-point encoder and decoder for integer samples. The commonly owned U.S. patent application Ser. No. 13/534,330 (the '330 application), filed Jun. 27, 2012, entitled “Computationally Efficient Compression of Floating-Point Data,” by Wegener, incorporated herein by reference, describes algorithms for direct compression of floating-point data by processing the exponent values and the mantissa values of the floating-point format. The commonly owned patent application Ser. No. 13/617,061 (the '061 application), filed Sep. 14, 2012, entitled “Conversion and Compression of Floating-Point and Integer Data,” by Wegener, incorporated herein by reference, describes algorithms for converting floating-point data to integer data and compression of the integer data.
The commonly owned patent application Ser. No. 13/617,205 (the '205 application), filed Sep. 14, 2012, entitled “Data Compression for Direct Memory Access Transfers,” by Wegener, incorporated herein by reference, describes providing compression for direct memory access (DMA) transfers of data and parameters for compression via a DMA descriptor. The commonly owned patent application Ser. No. 13/616,898 (the '898 application), filed Sep. 14, 2012, entitled “Processing System and Method Including Data Compression API,” by Wegener, incorporated herein by reference, describes an application programming interface (API), including operations and parameters for the operations, which provides for data compression and decompression in conjunction with processes for moving data between memory elements of a memory system.
The commonly owned patent application Ser. No. 13/358,511 (the '511 application), filed Jan. 12, 2012, entitled “Raw Format Image Data Processing,” by Wegener, incorporated herein by reference, describes encoding of image sensor rasters during image capture, and the subsequent use of encoded rasters during image compression using a standard image compression algorithm such as JPEG or JPEG2000.
In order to better meet MacBlk access requirements during video capture, processing, and display, and to reduce memory utilization and complexity during both raster-based and block-based access, a need exists for a flexible, computationally efficient MacBlk encoding and decoding method that supports both raster and MacBlk access patterns.
In one embodiment, the access encoder described herein is applied to unencoded or previously decoded image data organized as macroblocks. The access encoder encodes the macroblocks for storage in memory in an order that supports both raster and MacBlk access to the stored, encoded macroblocks. Supplemental location information is also stored to be used for retrieving the desired portion of image data in macroblock or raster formats for further processing or display. In one aspect, MacBlk encoding and decoding for image data may be implemented using resources of a computer system.
Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description and the claims, which follow.
FIG. 15a illustrates an example of a video encoder where previously decoded reference frames are stored in a memory.
FIG. 15b illustrates an example of a video decoder where previously decoded reference frames are stored in a memory.
FIG. 15c illustrates an example of the access encoder and access decoder providing memory access during the video decoder's motion compensation process.
FIGS. 17a and 17b illustrate examples of systems in which a video encoder and a video decoder include an access encoder and an access decoder.
Embodiments of the access encoder and access decoder described herein may encompass a variety of computing architectures that represent image data using a numerical representation. Image data may include both integer data of various bit widths, such as 8 bits, 10 bits, 16 bits, etc., and floating-point data of various bit widths, such as 32 bits or 64 bits. The image data may be generated by a variety of applications, and the computing architectures may be general purpose or specialized for particular applications. The image data may result from data detected from a physical process, from computer simulation, or from intermediate values of data processing, either for eventual display on a display device or monitor, or simply for intermediate storage. For example, the numerical data may arise from image sensor signals that are converted to digital form by an analog-to-digital converter (ADC) in the image sensor, where the digital samples are typically represented in an integer format. Common color representations of image pixels include RGB (Red, Green, Blue) and YUV (brightness/chroma1/chroma2). Image data may be captured and/or stored in a planar format (e.g. for RGB, all R components, followed by all G components, followed by all B components) or in an interleaved format (e.g. a sequence of {R,G,B} triplets).
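The planar versus interleaved distinction can be made concrete with a short C sketch. The function name and the 8-bit RGB assumption below are ours, purely for illustration; they are not part of the access encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n_pixels of interleaved {R,G,B} triplets to planar layout:
   all R components, followed by all G components, followed by all B
   components, as described above. The caller supplies both buffers,
   each of size 3 * n_pixels Bytes. */
void interleaved_to_planar(const uint8_t *ilv, uint8_t *planar, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        planar[i]                  = ilv[3 * i];     /* R plane */
        planar[n_pixels + i]       = ilv[3 * i + 1]; /* G plane */
        planar[2 * n_pixels + i]   = ilv[3 * i + 2]; /* B plane */
    }
}
```

The inverse (planar to interleaved) simply swaps the two index expressions.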
An image frame has horizontal and vertical dimensions H_DIM and V_DIM, respectively, as well as a number of color planes N_COLORS (typically 3 [RGB or YUV] or 4 [RGBA or YUVA], including an alpha channel). H_DIM can vary between 240 and 2160, while V_DIM can vary between 320 and 3840, with typical H_DIM and V_DIM values of 1080 and 1920, respectively, for a 1080p image or video frame. A single 1080p frame requires at least 1080×1920×3 Bytes=6 MByte of storage, when each color component is stored using 8 bits (a Byte). Video frame rates typically vary between 10 and 120 frames per second, with a typical frame rate of 30 frames per second (fps). As of 2013, industry-standard video compression algorithms called H.264 and H.265 achieve compression ratios between 10:1 and 50:1 by exploiting the correlation between pixels in MacBlks of successive frames, or between MacBlks of the same frame. The compression or decompression processing by industry-standard codecs requires storage of the last N frames prior to the frame that is currently being processed. These prior frames are stored in off-chip memory and are called reference frames. The access encoder described below accelerates access to the reference frames between a processor and off-chip memory, reducing the bandwidth and capacity required for reference frame MacBlks.
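The storage and bandwidth figures above follow from simple arithmetic, sketched below; the helper names are illustrative only, not part of the access encoder.

```c
#include <stdint.h>

/* Storage for one uncompressed frame: H_DIM x V_DIM pixels, each with
   N_COLORS color components of one Byte (8 bits), as described above. */
uint64_t frame_bytes(uint32_t h_dim, uint32_t v_dim, uint32_t n_colors)
{
    return (uint64_t)h_dim * v_dim * n_colors;
}

/* Memory traffic in Bytes per second when writing frames at a given
   frame rate, before any compression. */
uint64_t frame_bandwidth(uint32_t h_dim, uint32_t v_dim, uint32_t n_colors,
                         uint32_t fps)
{
    return frame_bytes(h_dim, v_dim, n_colors) * fps;
}
```

For the 1080p example, `frame_bytes(1080, 1920, 3)` yields 6,220,800 Bytes (about 6 MByte), and at 30 fps the uncompressed write traffic alone exceeds 186 MByte per second, which is why reference-frame compression matters.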
Access encoder processing continues in this way for subsequent slices of 16 rasters from the image sensor, filling encoded MacBlk regions in the allocated DDR addresses (9 MacBlks per DDR page) until all image sensor rasters have been encoded and stored. To summarize, after reading all 1,080 rasters from the image sensor, and filling 120 MacBlks per 16 rasters, 16 encoded pixels at a time, the access encoder has received 1,080 image sensor rasters and has stored 8,160 encoded MacBlks in DDR memory, accessing 120 encoded MacBlk regions in DDR memory per 16 rasters. In this example, the access encoder has stored the encoded MacBlks in half the DDR memory that would have been required to store the uncompressed YUV 4:2:0 pixels for this frame. Furthermore, the time taken to write the encoded pixels to DDR memory was also reduced by 2×, which decreases the power consumption of transferring and writing the image sensor's pixels to DDR memory. Thus the access encoder provides both a capacity and a power savings to systems that capture and store image sensor data in DDR memory.
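The MacBlk totals in this example can be verified with a short sketch. The ceiling division below assumes the final partial slice of 8 rasters (1,080 = 67×16 + 8) is padded to a full 16-raster slice, which the 8,160-MacBlk total implies but the text does not state explicitly.

```c
#define MB_DIM 16  /* MacBlk dimension: 16x16 pixels */

/* 1920 pixels per raster / 16 = 120 MacBlks per 16-raster slice. */
unsigned macblks_per_slice(unsigned pixels_per_raster)
{
    return pixels_per_raster / MB_DIM;
}

/* Ceiling division: ceil(1080 / 16) = 68 slices per frame. */
unsigned slices_per_frame(unsigned rasters_per_frame)
{
    return (rasters_per_frame + MB_DIM - 1) / MB_DIM;
}

/* Total encoded MacBlks per frame: 68 slices x 120 MacBlks = 8,160. */
unsigned macblks_per_frame(unsigned pixels_per_raster,
                           unsigned rasters_per_frame)
{
    return macblks_per_slice(pixels_per_raster)
         * slices_per_frame(rasters_per_frame);
}
```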
The C pseudo-code below illustrates an example of data structures and software methods useful for implementing the access encoding process for use case 1, just described above.
The function encodeFrame uses the following parameters:
Lines 54-97 include pseudo-code that controls the access encoder's encoding operations, where a 16-raster slice is processed 1 raster at a time to build up 120 encoded MacBlks per 16-raster slice. The outer “for” loop (index i, line 58) iterates 16 rasters at a time over the RASTERS_PER_FRAME rasters, while the inner pseudo-code “for” loop (index j, line 64) iterates 16 pixels at a time over the PIXELS_PER_RASTER pixels. In this manner, the two control loops provide the APAX_encode_MB_row function (lines 88-89) with 16 pixels at a time to encode and to store at the address specified by rasterPtr[k]. The function APAX_encode_MB_row implements the compression operations applied to the samples in the macroblock row. In a preferred embodiment, the compression operations comprise calculating sample differences followed by block floating point encoding. For each outer loop iteration (index i), index k is cleared, initializing the local rasterPtr index in the inner loop. At the start of each inner loop iteration, when j==0, the macBlkPtr value and local rasterPtr value are initialized, and the encoded buffer pointer is incremented by encMBsize. Each call to APAX_encode_MB_row encodes the current PIXELS_PER_MB_ROW input pixels into compressed data having a number of compressed Bytes, N_compBytes. Because N_compBytes returned by each APAX_encode_MB_row call may vary from call to call, the local rasterPtr array maintains separate pointers for each encoded MacBlk in this 16-raster ‘slice.’ Thus every time the function APAX_encode_MB_row is called, IN_PTR_INC bytes are consumed from the input raster, while N_compBytes encoded bytes are generated. After the inner and outer loops complete, the pseudo-code has generated 8,160 encoded MacBlks and 8,160 encoded MacBlk pointers, stored in the macBlkDir array.
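A simplified, self-contained rendition of these control loops for a single 16-raster slice might look as follows. The real APAX_encode_MB_row (sample differences followed by block floating point encoding) is replaced here by a trivial stand-in that copies pixels and returns a fixed Byte count, so only the rasterPtr and macBlkDir bookkeeping described above is exercised; the names and signatures are illustrative, not the actual implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB_DIM 16
#define PIXELS_PER_RASTER 1920
#define MACBLKS_PER_SLICE (PIXELS_PER_RASTER / MB_DIM)  /* 120 */

/* Stand-in for APAX_encode_MB_row: encodes one 16-pixel row of one
   MacBlk and returns the number of compressed Bytes produced. A real
   encoder returns a varying N_compBytes; this stub just copies. */
static unsigned stub_encode_MB_row(const uint8_t *pixels, uint8_t *out)
{
    memcpy(out, pixels, MB_DIM);
    return MB_DIM;
}

/* Encode one 16-raster slice. rasterPtr[k] tracks the write position
   inside each of the 120 encoded MacBlks, since each MacBlk grows by a
   different amount per call; macBlkDir records each MacBlk's start. */
void encode_slice(uint8_t raster[MB_DIM][PIXELS_PER_RASTER],
                  uint8_t *encBuf, unsigned encMBsize,
                  uint8_t *macBlkDir[MACBLKS_PER_SLICE])
{
    uint8_t *rasterPtr[MACBLKS_PER_SLICE];

    for (unsigned r = 0; r < MB_DIM; r++) {          /* one raster at a time */
        unsigned k = 0;
        for (unsigned j = 0; j < PIXELS_PER_RASTER; j += MB_DIM, k++) {
            if (r == 0) {                            /* first raster of slice */
                rasterPtr[k] = encBuf + (size_t)k * encMBsize;
                macBlkDir[k] = rasterPtr[k];         /* directory entry */
            }
            rasterPtr[k] += stub_encode_MB_row(&raster[r][j], rasterPtr[k]);
        }
    }
}
```

Because a real encoder returns a varying N_compBytes, the per-MacBlk rasterPtr entries advance by different amounts, which is why a single shared write pointer would not suffice.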
The C pseudo-code below illustrates an example of data structures and software methods useful for implementing the access decoding operations. The pseudo-code for a function called getMacBlk retrieves the encoded version of the specified (requested) MacBlk from DDR memory and returns the decoded version of that MacBlk to the calling function. The access decoder decodes the encoded MacBlk to re-create the pixels of the requested MacBlk.
The function getMacBlk is given the requested MacBlk number, macBlkNum, and is also given macBlkDir (an array of encoded MacBlk pointers), and the address where the decoded MacBlk's pixels shall be stored (macBlkPixels). The getMacBlk function first calculates the DDR memory start address for macBlkNum by retrieving that block's starting address (macBlkAddr) from the array (macBlkDir) of encoded MacBlk start addresses. The function APAX_decode_MB (line 15) decodes the encoded MacBlk whose encoded values begin at address macBlkAddr, and stores the decoded pixels into the macBlkPixels memory buffer. The function APAX_decode_MB is also provided with the pixelType (such as RGB 4:4:4 or YUV 4:2:2), which indicates the requested color components and their widths. In a preferred embodiment, the function APAX_decode_MB performs block floating point decoding and integration operations to invert the operations of the function APAX_encode_MB_row described above.
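The getMacBlk lookup can be sketched as follows. APAX_decode_MB is replaced by a stand-in that copies Bytes, since the block floating point decoding and integration are not reproduced here; the signatures and the 16×16 single-component assumption are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define MB_DIM 16

/* Stand-in for APAX_decode_MB: a real decoder performs block floating
   point decoding and integration; this stub just copies Bytes so the
   directory-lookup logic can run. */
static void stub_decode_MB(const uint8_t *enc, uint8_t *pixels,
                           unsigned nBytes)
{
    memcpy(pixels, enc, nBytes);
}

/* Retrieve and decode one MacBlk: look up its start address in the
   directory of encoded-MacBlk pointers (macBlkDir), then decode into
   the caller's pixel buffer, as described above. */
void getMacBlk(unsigned macBlkNum, uint8_t *const macBlkDir[],
               uint8_t *macBlkPixels)
{
    const uint8_t *macBlkAddr = macBlkDir[macBlkNum]; /* directory lookup */
    stub_decode_MB(macBlkAddr, macBlkPixels, MB_DIM * MB_DIM);
}
```

The directory indirection is what allows variable-length encoded MacBlks to be addressed in constant time.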
The C pseudo-code below illustrates an example of how certain data structures and software methods can implement the access decoding and MacBlk-to-raster operations. The function decodeFrame includes input parameters for the DDR start address of the first encoded MacBlk, a directory of pointers into the encoded frame (one pointer or start address per encoded MacBlk), a pointer to where the decoded pixels should be stored (one raster at a time), and a pixelType parameter that indicates the color components and color decimation parameters of the encoded MacBlks.
The decodeFrame function initializes various constants such as PIXELS_PER_RASTER and RASTERS_PER_FRAME, and allocates space for a local rasterDir array that stores pointers for each decoded MacBlk. The local rasterDir array (line 25) is needed because each encoded MacBlk may use a different number of Bytes for each group of 16 encoded pixels. Thus each encoded MacBlk may be decoded in different-sized increments as the decoded rasters are created by decodeFrame. A local frame pointer (framePtr, line 31) is initialized to point at the first byte of the decoded raster. The decoded frame buffer may occupy a contiguous area of memory large enough to hold the entire frame, to hold just one raster of the frame, to hold just one pixel of the current raster (such as the “wide” register described above), or to hold just one Byte (one color component) of the current pixel (such as the “narrow” register described above).
The decodeFrame's outer loop, beginning at line 37, iterates over all rasters in the frame (in this example, one raster at a time). The decodeFrame's inner loop, beginning at line 43, generates 16 pixels at a time, where the 16 pixels are decoded from one of 120 encoded MacBlks in this example. The pointers into the 120 encoded MacBlks are stored in the local rasterPtr array, whose index k is incremented 120 times per decoded raster of 1,920 pixels in this example. Since the number of encoded Bytes that corresponds to every 16 decoded pixels can vary from MacBlk to MacBlk, the APAX_decode_MB_row function returns the variable NencBytes, which advances the local rasterPtr address with every 16 decoded pixels (line 59). After 16 pixels have been decoded from each of 120 encoded MacBlks, the 1,920 pixel raster can be written to the raster-oriented display register, buffer, or frame.
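The decodeFrame inner and outer loops can be sketched for one 16-raster slice as follows, using the rasterPtr naming from the description. As above, the stand-in for APAX_decode_MB_row simply copies Bytes and reports a fixed NencBytes, whereas a real decoder reports a count that varies from MacBlk to MacBlk; everything here is an illustrative sketch.

```c
#include <stdint.h>
#include <string.h>

#define MB_DIM 16
#define PIXELS_PER_RASTER 1920
#define MACBLKS_PER_SLICE (PIXELS_PER_RASTER / MB_DIM)  /* 120 */

/* Stand-in for APAX_decode_MB_row: decodes 16 pixels and returns the
   number of encoded Bytes consumed (NencBytes). */
static unsigned stub_decode_MB_row(const uint8_t *enc, uint8_t *pixels)
{
    memcpy(pixels, enc, MB_DIM);
    return MB_DIM;
}

/* Decode one 16-raster slice, one raster at a time. rasterPtr[k] holds
   the current read position inside encoded MacBlk k and advances by the
   Bytes consumed per 16 decoded pixels, mirroring the loops above. */
void decode_slice(uint8_t *const macBlkDir[MACBLKS_PER_SLICE],
                  uint8_t raster[MB_DIM][PIXELS_PER_RASTER])
{
    const uint8_t *rasterPtr[MACBLKS_PER_SLICE];

    for (unsigned k = 0; k < MACBLKS_PER_SLICE; k++)
        rasterPtr[k] = macBlkDir[k];                 /* directory entries */

    for (unsigned r = 0; r < MB_DIM; r++) {          /* one raster at a time */
        unsigned k = 0;
        for (unsigned j = 0; j < PIXELS_PER_RASTER; j += MB_DIM, k++)
            rasterPtr[k] += stub_decode_MB_row(rasterPtr[k], &raster[r][j]);
    }
}
```

After each outer iteration, one complete 1,920-pixel raster is ready for the raster-oriented display register, buffer, or frame.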
The use cases described with respect to
FIGS. 15a and 15b illustrate examples of macroblock-based video encoding and decoding algorithms, such as MPEG2, H.264, and H.265 (HEVC), that use one or more reference frames stored in a memory for encoding a current frame of pixels. The macroblock-based video encoding algorithms have previously encoded the reference frames, decoded the encoded reference frames, and stored the previously decoded reference frames RF_1 to RF_6 for use in motion estimation calculations for encoding the current frame.
FIG. 15c illustrates an example of the access encoder and access decoder providing memory access during the video decoder's motion compensation process. During macroblock-based decoding of a current frame, the access encoder stores re-encoded (compressed) versions of the MacBlks that comprise reference frames RF_1C through RF_6C; these MacBlks replace those originally associated with the MacBlks of the previously decoded reference frames RF_1 through RF_6.
Depending on the distance between “anchor” frames (also called P frames), the difference between the approximated macroblock and the original macroblock may cause some amount of ‘drift’ in the interpolated frames (also called B frames) between P frames. In macroblock-based video encoding algorithms, such as MPEG2, H.264, and H.265 (HEVC), the distance between P frames is called the Group Of Pictures (GOP) distance. GOP distance is a user-selectable parameter of the video encoding algorithm. The smaller the GOP distance, the more likely that the approximated macroblocks of the reference frames RF_1A through RF_6A remain close enough to the original macroblocks that any drift is imperceptible.
The access encoder of
FIGS. 17a and 17b illustrate examples of systems in which a video encoder and a video decoder include an access encoder and an access decoder.
1. The original image components (such as RGB or YUV),
2. The first difference between corresponding image components, where the variable “i” indicates the current image component along a row or raster, such as:
i. R(i)-R(i−1), followed by
ii. G(i)-G(i−1), followed by
iii. B(i)-B(i−1); or
iv. Y(i)-Y(i−1), followed by
v. U(i)-U(i−1), followed by
vi. V(i)-V(i−1)
3. The difference between corresponding image components from the previous row (raster), where the variable i indicates the current image component along a row or raster, and the variable j indicates the current row or raster number, such as:
i. R(i,j)-R(i,j−1), followed by
ii. G(i,j)-G(i,j−1), followed by
iii. B(i,j)-B(i,j−1); or
iv. Y(i,j)-Y(i,j−1), followed by
v. U(i,j)-U(i,j−1), followed by
vi. V(i,j)-V(i,j−1)
During the encoding of the current MacBlk, the redundancy remover determines which of these three streams will use the fewest bits, i.e. will compress the most. That stream is selected as the “best derivative” for the next encoded MacBlk. The “best derivative” selection is encoded in the encoded MacBlk's header (as indicated by the DERIV_N parameter).
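The selection among the three candidate streams can be sketched as below. The bit-cost estimate (sum of sample magnitudes) is a stand-in for counting block floating point mantissa bits, and the function names are ours; a real redundancy remover would measure actual encoded sizes.

```c
#include <stdint.h>

/* Approximate the cost of one stream as the sum of sample magnitudes:
   smaller magnitudes need fewer mantissa bits under block floating
   point encoding, so this proxy suffices to rank the candidates. */
static unsigned long stream_cost(const int samples[], unsigned n)
{
    unsigned long cost = 0;
    for (unsigned i = 0; i < n; i++)
        cost += (unsigned long)(samples[i] < 0 ? -samples[i] : samples[i]);
    return cost;
}

/* Pick the "best derivative" for one component row:
   0 = original samples,
   1 = horizontal first difference x(i) - x(i-1),
   2 = difference from the same position in the previous row,
       x(i,j) - x(i,j-1).
   The result would be recorded in the encoded MacBlk's header as the
   DERIV_N parameter. n is assumed to be at most 256. */
int best_derivative(const int row[], const int prevRow[], unsigned n)
{
    int d1[256], d2[256]; /* scratch for the two difference streams */
    d1[0] = row[0];
    for (unsigned i = 1; i < n; i++)
        d1[i] = row[i] - row[i - 1];
    for (unsigned i = 0; i < n; i++)
        d2[i] = row[i] - prevRow[i];

    unsigned long c0 = stream_cost(row, n);
    unsigned long c1 = stream_cost(d1, n);
    unsigned long c2 = stream_cost(d2, n);

    if (c1 < c0 && c1 <= c2) return 1;
    if (c2 < c0 && c2 < c1)  return 2;
    return 0;
}
```

A smooth horizontal ramp favors stream 1, while a row nearly identical to the previous row favors stream 2, matching the intuition behind the three candidates.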
The access encoder/decoder can reduce the amount of DDR memory required to store reference frames in image compression applications such as H.264 and similar algorithms that encode image frames using MacBlks, as well as the time required to access the reference frame's pixels. The access encoder/decoder can also reduce the amount of memory required to capture image sensor frames, and to store display frames. The access encoder/decoder allows for variation in frame dimensions (PIXELS_PER_RASTER and RASTERS_PER_FRAME), macroblock dimensions (PIXELS_PER_MB_ROW), pixel color encoding and color space decimation (BYTES_PER_PIXEL and pixelType), encoding (compression) ratio (encRatio), and DDR memory page size (DDR_PAGE_SIZE). The access encoder/decoder provides a flexible, user-controllable method of reducing both DDR memory capacity and memory bandwidth required for common image capture, processing, storage, and display functions. Speed and latency of the access encoding and decoding processes can be modified by varying the number of pipeline stages in the combinatorial logic for the flexible encoding and decoding functions. Other implementations of the access encoder and decoder functions may use dedicated input and output registers in addition to or instead of the memory and registers described in the examples of the present specification.
A variety of implementation alternatives exist for the embodiments of the access encoder and access decoder, such as implementation in a microprocessor, graphics processor, digital signal processor, field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), or system-on-chip (SoC). The implementations can include logic to perform the access encoding and access decoding processes described herein, where the logic can include dedicated logic circuits, configurable logic such as FPGA blocks configured to perform the functions, general purpose processors or digital signal processors programmed to perform the functions, and various combinations thereof.
The access encoder and access decoder operations can be implemented in hardware, software, or a combination of both, and incorporated in computing systems. The hardware implementations include an ASIC, an FPGA, or an intellectual property (IP) block for an SoC. The access encoder and access decoder operations can be implemented in software or firmware on a programmable processor, such as a digital signal processor (DSP), microprocessor, microcontroller, multi-core CPU, or GPU.
In one embodiment for a programmable processor, programs including instructions for operations of the access encoder and access decoder are provided in a library accessible to the processor. The library is accessed by a compiler, which links the application programs to the components of the library selected by the programmer. Access to the library by a compiler can be accomplished using a header file (for example, a file having a “.h” file name extension) that specifies the parameters for the library functions, and a corresponding library file (for example, a file having a “.lib” or “.obj” file name extension for a Windows operating system, or a file having a “.so” file name extension for a Linux operating system) that uses the parameters and implements the operations for the access encoder/decoder. The components linked by the compiler to applications to be run by the computer are stored, possibly as compiled object code, for execution as called by the application. In other embodiments, the library can include components that can be dynamically linked to applications, and such dynamically linkable components are stored in the computer system memory, possibly as compiled object code, for execution as called by the application. The linked or dynamically linkable components may comprise part of an application programming interface (API) that may include parameters for compression operations as described in the '898 application.
For implementation using FPGA circuits, the technology described here can include a memory storing a machine readable specification of the access encoder logic, and a machine readable specification of the access decoder logic, in the form of a configuration file for the FPGA block. For the systems shown in
When received within a computer system via one or more computer-readable media, such data and/or instruction-based expressions of the above described circuits may be processed by a processing entity (e.g., one or more processors) within the computer system in conjunction with execution of one or more other computer programs including, without limitation, netlist generation programs, place and route programs and the like, to generate a representation or image of a physical manifestation of such circuits. Such representation or image may thereafter be used in device fabrication, for example, by enabling generation of one or more masks that are used to form various components of the circuits in a device fabrication process.
While the preferred embodiments of the invention have been illustrated and described, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the invention, as described in the claims.
This application claims benefit of U.S. Provisional Patent Application No. 61/759,805 filed on 1 Feb. 2013, which application is incorporated by reference herein.