The present invention is directed to data compression and decompression and, more particularly, to balancing workloads in a multi-core video decoder.
Data compression is used for reducing the volume of data stored, transmitted or reconstructed (decoded and played back), especially for video content. Decoding recovers the video content from the compressed data in a format suitable for display. Various standards of formats for encoding and decoding compressed signals efficiently are available. Some standards that are commonly used are the International Telecommunications Union standards such as ITU-T H.264 ‘Advanced video coding for generic audiovisual services’, the standards of the Moving Picture Experts Group (MPEG), the VPx standards and the VC-1 standard.
Techniques used in video compression include inter-coding and intra-coding. Inter-coding uses motion vectors for block-based inter-prediction to exploit temporal statistical dependencies between items in different pictures (which may relate to different frames, fields, slices or macroblocks or smaller partitions). Intra-coding uses various spatial prediction modes to exploit spatial statistical dependencies (redundancies) in the source signal for items within a single picture. Prediction residuals, which define residual differences between the reference picture item and the currently encoded item, are then further compressed using a transform to remove spatial correlation inside the transform block before it is quantized during encoding. Finally, the motion vectors or intra-prediction modes are combined with the quantized transform coefficient information and encoded.
The decoding process involves taking the compressed data in the order in which it is received, decoding it for the different picture items, and combining the inter-coded and intra-coded items according to the motion vectors or intra-prediction modes. Decoding an intra-coded picture can be done without reference to other pictures, while decoding an inter-coded picture item uses the motion vectors together with blocks of sample values from a reference picture item selected by the encoder.
Decoding compressed video signals includes parsing parameters for a picture or slice from an input bit-stream. The parameters identify syntax element values, such as raw byte sequence payloads (RBSP) slice header, slice data and macroblock syntax elements. The parsing of the syntax elements enables the decoder to identify inter-coded and intra-coded items, any reference picture items, motion vectors or intra-prediction modes and prediction residuals, for example.
In a multi-core video decoder, the cores can be allocated to different tasks, and certain tasks can be performed by different cores in parallel. However, the parsing operations may be a bottleneck, because there are interdependencies in variable length decoding and a picture may contain only one slice for resynchronization. This restricts the performance of the decoder to an extent that may vary. It would be advantageous to have a multi-core video decoder in which the parsing and decoding operations were balanced in order to improve the overall performance of the decoder.
The present invention, together with objects and advantages thereof, may best be understood by reference to the following description of embodiments thereof shown in the accompanying drawings. Elements in the drawings are illustrated for simplicity and clarity and have not necessarily been drawn to scale.
The decoder 100 comprises multi-core processing resources 202 that perform parsing operations (110) and decoding operations (106) on picture data to be decoded. The multi-core processing resources 202 may perform parsing operations (110) in parallel with decoding operations (106). The control module 108 may adapt the resources of the cores by allocating each of a selected number of cores to parsing operations on data of a respective picture serially, and allocating other cores to decoding operations on picture data in parallel. The control module 108 may allocate the multi-core processing resources 202 between operations of parsing picture data (110) and decoding picture data (106) as a function of a workload parameter M, (TD-TP) related to the relative workloads of the parsing and decoding operations.
The adaptation of the multi-core processing resources 202 offers flexibility in balancing the parsing and decoding operations. The number of cores (one or more than one) that the control module 108 allocates to parsing operations can be selected to achieve a greater measure of balance between the parsing and decoding workloads. Certain cores can be allocated to parsing the data of respective pictures simultaneously; while the parsing operations of different pictures occur in parallel, each parsing core can parse serially the data of the respective picture, avoiding blocking the parsing operations. The decoding of one or more pictures can be distributed between one or more groups of the decoding cores in parallel.
The workload parameter M for current picture data may be related to relative durations P and D of parsing operations and of decoding operations for preceding picture data. The workload parameter M may be related to the relative values P/D of a duration P of parsing operations for preceding picture data that is a function of a difference between end and start times of the parsing operations on a core, and of a duration D of decoding operations that is a function of decoding times of samples of picture elements for the preceding picture data and of a sample rate. The duration D of decoding operations may be a function of the decoding times of the samples of picture elements after deduction of waiting times.
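The relationship between P, D and M can be illustrated with a minimal sketch. The description does not give the exact form of the function for D, so the sketch assumes that only a fraction of the picture elements is timed (the hypothetical `sample_rate`), that waiting time is subtracted per sample, and that the net total is scaled up to the whole picture. All function and parameter names are illustrative.

```python
def parsing_duration(start_time, end_time):
    """P: difference between end and start times of parsing on a core."""
    return end_time - start_time


def decoding_duration(decode_times, wait_times, sample_rate):
    """D: net decoding time of the timed samples, scaled to the whole picture.

    Assumption: sample_rate is the fraction of picture elements that were
    timed (0 < sample_rate <= 1); the description does not define it further.
    """
    net = sum(d - w for d, w in zip(decode_times, wait_times))
    return net / sample_rate


def workload_parameter(p, d):
    """M: relative value P/D for the preceding picture data."""
    return p / d


# Illustrative figures only (milliseconds).
p = parsing_duration(0.0, 48.0)
d = decoding_duration([3.0, 2.5, 2.5], [1.0, 0.5, 0.5], sample_rate=0.25)
m = workload_parameter(p, d)
print(p, d, m)
```

With these made-up figures, parsing takes twice as long as decoding, so M is 2.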
The workload parameter may be a time difference (TD-TP) between a completion time TD of decoding operations and a completion time TP of parsing operations for corresponding preceding picture data relative to a threshold value TTH. The control module 108 may allocate 302 unchanged numbers N, (X−N) of the cores to the parsing operations and decoding operations as long as the parsing operations for current picture data are completed in time for prompt decoding operations for the same picture data.
The control module 108 may allocate respective numbers N, (X−N) of the cores to the parsing and decoding operations, and adapt the numbers at 304 as a function of the workload parameter.
The control module 108 may allocate a plurality N of the cores to the N serial parsing operations of data of N respective pictures. The decoder 100 includes temporary storage 104 for storing the results of the parsing operations. The control module 108 allocates at least one other of the cores to decoding data of at least one picture using the stored parsing results.
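The arrangement described above, with N parsers each working serially on its own picture, a temporary store for the parsing results, and separate decoding workers that consume the stored results, can be sketched as follows. This is only a structural illustration: Python threads stand in for cores, a dictionary stands in for the temporary storage 104, and `parse_picture`, `decode_picture` and `run_pipeline` are hypothetical placeholders rather than real parsing or decoding routines.

```python
from concurrent.futures import ThreadPoolExecutor

temp_storage = {}  # stands in for temporary storage 104


def parse_picture(picture):
    """Hypothetical serial parse: walk one picture's bytes in order."""
    syntax_elements = []
    for byte in picture["data"]:
        syntax_elements.append(byte)
    temp_storage[picture["id"]] = syntax_elements
    return picture["id"]


def decode_picture(parse_future):
    """Hypothetical decode: wait for the parse result, then consume it."""
    picture_id = parse_future.result()
    return picture_id, sum(temp_storage[picture_id])


def run_pipeline(pictures, n_parse, n_decode):
    """Run n_parse parsing workers and n_decode decoding workers in parallel."""
    with ThreadPoolExecutor(max_workers=n_parse) as parsers, \
         ThreadPoolExecutor(max_workers=n_decode) as decoders:
        parse_futures = [parsers.submit(parse_picture, p) for p in pictures]
        decode_futures = [decoders.submit(decode_picture, f)
                          for f in parse_futures]
        return [f.result() for f in decode_futures]


pictures = [{"id": i, "data": bytes([i, i + 1, i + 2])} for i in range(4)]
results = run_pipeline(pictures, n_parse=2, n_decode=2)
print(results)
```

Using two separate pools keeps a decoding worker that is waiting for a parse result from occupying a parsing worker.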
The control module 108 may adapt the resources of the cores repeatedly according to at least one of the following criteria: periodically, on detection of a change of bit rate of the picture data to be decoded, and/or on a change in the number X of cores available for parsing and decoding operations.
In more detail, in the decoding process 300 illustrated in
At 314, a decision is taken whether to re-evaluate the number N of the cores in the multi-core processing resources 202 to allocate to the syntax parser 110. If the decision is not to re-evaluate the number N, the process proceeds at 302 to the next parsing operations 316. If the decision is to re-evaluate the number N, the process reverts to step 306 at 304. Factors influencing the decision 314 may include whether a change of bit rate of the picture data to be decoded is detected, and the calculation overhead associated with more frequent re-allocation of cores. Alternatively, or additionally, the decision 314 can be based on whether a change occurs in the number X of cores available for parsing and decoding operations. Alternatively, the process can revert 304 to 306 periodically.
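The decision at 314 can be sketched as a simple predicate. The criteria (bit rate change, change in available cores, periodic re-evaluation) come from the description; the names `DecoderState` and `should_reevaluate`, the 10% bit rate tolerance and the period of 30 pictures are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecoderState:
    bit_rate: int             # bits per second of the incoming picture data
    available_cores: int      # X, cores available for parsing and decoding
    pictures_since_eval: int  # pictures processed since N was last evaluated


def should_reevaluate(prev, cur, period=30, bit_rate_tolerance=0.1):
    """Return True if the number N of parsing cores should be re-evaluated."""
    if cur.available_cores != prev.available_cores:
        return True  # the number X of available cores changed
    if abs(cur.bit_rate - prev.bit_rate) > bit_rate_tolerance * prev.bit_rate:
        return True  # bit rate changed by more than the assumed tolerance
    return cur.pictures_since_eval >= period  # periodic re-evaluation


prev = DecoderState(bit_rate=4_000_000, available_cores=8, pictures_since_eval=0)
steady = DecoderState(bit_rate=4_100_000, available_cores=8, pictures_since_eval=5)
burst = DecoderState(bit_rate=6_000_000, available_cores=8, pictures_since_eval=5)
print(should_reevaluate(prev, steady), should_reevaluate(prev, burst))
```

A larger tolerance or period reduces the calculation overhead of frequent re-allocation, at the cost of a slower response to workload changes.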
The workload parameter M for current picture data is calculated as the relative values P/D of the durations P and D of parsing and decoding operations for the preceding picture data at 414. At 416, the number of cores allocated to the parser 110 is calculated as: N=M*X/(M+1). If the number calculated is an integer, it can be applied directly. However, if N is not an integer, the next integer above can be used for a series of picture items and then the next integer below used for the next series. For example, if M=2 (parsing time is double the decoding time), X=8 (eight cores available), N=2*8/3≈5.33. The number of cores can be balanced by using N=5 cores for 20 picture items and then N=6 cores for 10 picture items out of a total of 30 picture items.
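The calculation at 416 and the alternation between the two neighbouring integers can be checked with a short sketch. The formula and the example figures are taken from the description; the helper names `parser_core_count` and `split_series`, and the rule of rounding the share of picture items to the nearest whole item, are assumptions for illustration.

```python
import math


def parser_core_count(m, x):
    """Ideal, possibly fractional, number of parsing cores: N = M*X/(M+1)."""
    return m * x / (m + 1)


def split_series(n_ideal, total_items):
    """Split total_items between floor(N) and ceil(N) cores.

    The split is chosen so that the average number of parsing cores over
    all picture items approximates the fractional N.
    """
    low, high = math.floor(n_ideal), math.ceil(n_ideal)
    if low == high:
        return {low: total_items}
    items_high = round((n_ideal - low) * total_items)
    return {low: total_items - items_high, high: items_high}


n = parser_core_count(2, 8)      # M = 2, X = 8
schedule = split_series(n, 30)   # 30 picture items in total
print(schedule)  # {5: 20, 6: 10}
```

The result reproduces the example: 5 cores for 20 picture items and 6 cores for 10 picture items average 160/30, or about 5.33 parsing cores.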
The control module 108 may allocate at 302 unchanged numbers N, (X−N) of the cores to the parsing operations and decoding operations as long as the parsing operations for current picture data are completed in time for prompt decoding operations for the same picture data.
The invention may be implemented at least partially in a non-transitory machine-readable medium containing a computer program for running on a computer system, the program at least including code portions for performing steps of a method according to the invention when run on a programmable apparatus, such as a computer system or enabling a programmable apparatus to perform functions of a device or system according to the invention.
The computer program may be stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on non-transitory computer-readable media permanently, removably or remotely coupled to an information processing system. The computer-readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD ROM, CD R, etc.) and digital video disk storage media; nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM and so on; and data transmission media including computer networks, point-to-point telecommunication equipment, and carrier wave transmission media, just to name a few.
A computer program is a list of instructions such as a particular application program and/or an operating system. The computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
In the foregoing specification, the invention has been described with reference to specific examples of embodiments of the invention. It will, however, be evident that various modifications and changes may be made therein without departing from the broader spirit and scope of the invention as set forth in the appended claims.
Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. Similarly, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality.
Furthermore, those skilled in the art will recognize that boundaries between the above described operations are merely illustrative. The multiple operations may be combined into a single operation, a single operation may be distributed in additional operations and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.
Also, the invention is not limited to physical devices or units implemented in non-programmable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code, such as mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices, commonly denoted in this application as ‘computer systems’.
In the claims, the word ‘comprising’ or ‘having’ does not exclude the presence of other elements or steps than those listed in a claim. Furthermore, the terms “a” or “an,” as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”. The same holds true for the use of definite articles. Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage.
Number | Date | Country | Kind |
---|---|---|---|
201510610852.7 | Jul 2015 | CN | national |