This application claims the benefit of Japanese Application No. 2008-318665 filed in Japan on Dec. 15, 2008, the contents of which are incorporated herein by this reference.
1. Field of the Invention
The present invention relates to a multi-core streaming processor configured to perform parallel processing of streams, an operation method of the streaming processor and a processor system including the streaming processor.
2. Description of the Related Art
In the distribution of digital motion pictures, for example in digital TV broadcasting, efficient encoding techniques, including data compression, are indispensable for reducing the bandwidth required for data transmission. For example, “H.264”, a standard for compression encoding of motion picture data that was recommended by the International Telecommunication Union in May 2003, is widely used as an encoding method for digital TV broadcasting and the like.
An encoded digital motion picture is decoded by a receiver and displayed on a display device.
In the decoding process, a distributed stream has to be processed on a real-time basis. A streaming processor has been developed which performs, using a multi-core processor having one general-purpose processor core and multiple operation processor cores, parallel processing by assigning multiple processes of the decoding process to the operation processor cores in order to perform processing within a limited time.
For example, a processor system 101 shown in
Here, each of the decoding processes A to D corresponds to any of the processes including the entropy decoding process, described with reference to
In the streaming processor 110, it is determined in advance which process of the decoding process is to be assigned to which of the operation processor cores 112A to 112G. However, since the processing load of an encoded stream is not known until decoding process of the stream is started, assignment of cores is determined on the basis of a stream with the maximum load which may be inputted. Therefore, when a stream with a low processing load is inputted, the operation processor cores are not effectively used.
As shown in
As described above, in the conventional streaming processor 110 and processor system 101, the inherent performance of the processor cannot be fully exploited, for example because other programs cannot be executed even when the operation processor cores have spare capacity, and there is a possibility that decoding processing of stream data cannot be efficiently performed.
A streaming processor of an embodiment of the present invention is a streaming processor configured to perform decoding processing of an encoded stream, and includes: one general-purpose processor core and multiple operation processor cores configured to perform in parallel multiple processes constituting the decoding processing; wherein the streaming processor performs stream analysis processing which includes load estimation processing for estimating a processing load for each stream on the basis of stream information about the stream and assignment processing for assigning the processes to be performed by the operation processor cores on the basis of the estimated processing load.
An operation method of a streaming processor of another embodiment of the present invention is an operation method of a streaming processor configured to perform decoding processing of an encoded stream, wherein the streaming processor includes one general-purpose processor core and multiple operation processor cores configured to perform in parallel multiple processes constituting the decoding processing; and the operation method includes: separating the stream which has been inputted, into an H.264 stream and an audio stream; analyzing a NAL unit in the separated H.264 stream to acquire stream information; estimating a processing load for each of multiple processes for performing decoding processing of the H.264 stream; determining the number of necessary operation processor cores on the basis of an estimated maximum processing load; assigning the processes to be performed by the operation processor cores; and causing the operation processor cores to perform the processes.
A processor system of still another embodiment of the present invention includes: a streaming processor configured to perform decoding processing of an encoded stream having: one general-purpose processor core and multiple operation processor cores configured to perform in parallel multiple processes constituting the decoding processing, wherein the streaming processor performs stream analysis processing which includes load estimation processing for estimating a processing load for each stream on the basis of stream information about the stream and assignment processing for assigning the processes to be performed by the operation processor cores on the basis of the estimated processing load; an input device configured to input the encoded stream to the streaming processor; an output device configured to output a decoded stream inputted from the streaming processor; and a storage device configured to store programs for the multiple processes, a table for correspondence between the stream information and the processing load, and a table for correspondence between the processing load and the number of the operation processor cores to be used.
A streaming processor 10 and a processor system 1 of a first embodiment of the present invention will be described below with reference to drawings.
As shown in
The storage device 4 stores programs 17 for processes divided and assigned to the operation cores 12, a parameter/processing load correspondence table 15 and a processing load/number of used cores correspondence table 16. The general-purpose core 11 performs processing for a program having a stream input function, a video/audio output function and the like to be read into a memory not shown. Each of the operation cores 12 performs processing for each of programs having other functions to be read into the memory not shown. In addition, as shown in
The streaming processor 10 is a multi-core processor having the one general-purpose core 11 and the seven operation cores 12A to 12G. However, the numbers of the processor cores are not limited to the numbers in the present embodiment.
For example, the input device 2 is a receiving section of a digital high-definition TV broadcast receiver or of a hard disk recorder having a digital high-definition TV broadcast receiving function, the output device 3 is a monitor or a speaker, and the storage device 4 is a hard disk device.
Next, the operation of the streaming processor 10 and the processor system 1 of the present embodiment will be described with the use of
Description will be made in accordance with the flowchart in
Encoded stream data is inputted to the streaming processor 10 via the input device 2 and the general-purpose core 11.
The encoded stream inputted to the streaming processor 10 is sent to a stream separation section of the operation core 12A and separated into an H.264 stream and an audio stream. As shown under “start time” in
The separated H.264 stream and audio stream are sent to the stream analysis section 12B1 of the operation core 12B and an audio decoding section of the operation core 12G, respectively.
The stream analysis section 12B1 configured to perform stream analysis processing confirms whether an SPS parameter and a PPS parameter are included in a NAL unit, from the H.264 stream. If the parameters are not included (No), then, for example, each program is read into each memory in accordance with predetermined assignment of the programs 17 to the operation cores 12 as shown in
If the SPS parameter and the PPS parameter are included in the NAL unit (S23: Yes), then the stream analysis section 12B1 performs decoding processing only of the NAL unit and takes out parameter information which is added information. The operation of the stream analysis section 12B1 will be described later in detail with reference to
The stream analysis section 12B1 estimates the maximum processing load of the inputted stream with the use of acquired stream information on the basis of the parameter/processing load correspondence table 15. In the following description, the “processing load on each processor core at the time of processing each stream” will be referred to as the “stream load”. Together with a stream analysis program, the program 17 is read into a memory section of one operation core 12C from the storage device 4, and the program 17 operates and measures the processing load of each process for multiple streams with different parameters. On the basis of the result of the measurement, the parameter/processing load correspondence table 15 is created and stored into the operation core 12C.
For example, in the case of a 30 fps stream input, the processing load of each process refers to the sum total of the number of instruction cycles required for decoding processing of thirty frames, and the unit is “cycles/sec”.
For example, if the processing loads of processes are the values as shown below in the case where a level 4.1 stream is decoded, then the values shown below immediately constitute the parameter/processing load correspondence table 15 as shown in
Process A: 50 G cycles/sec
Process B: 60 G cycles/sec
. . .
Process E: 4 G cycles/sec
The stream analysis section 12B1 determines the minimum number of processor cores required for maintaining the performance for performing decoding processing of an inputted stream on a real-time basis, on the basis of the estimated maximum processing load and the processing load/number of used cores correspondence table 16. The processing load/number of used cores correspondence table 16 has been already read into the memory section of the operation core 12C from the storage device 4 together with the stream analysis program.
To explain this in greater detail, the processing load of the whole program and the ratio of the processing load of each process to the processing load of the whole program are calculated from the processing load measurement result obtained when the parameter/processing load correspondence table 15 is created. Then, from the result and the processing performance of the operation processor cores 12, the minimum number of used cores required by each process to maintain the real-time decoding processing performance is determined, and the processing load/number of used cores correspondence table 16 is created.
Specifically, when the processing load/number of used cores correspondence table 16 is created, the ratio of the processing load of each process to the load of the whole decoding process is calculated. That is, if the stream measurement result is as shown below, the decoding processing load for thirty frames is 10 G cycles/sec.
Process A: 3 G cycles/sec
Process B: 5 G cycles/sec
Process C: 0.8 G cycles/sec
Process D: 0.3 G cycles/sec
Process E: 0.9 G cycles/sec
Then, the decoding processing load ratios are as follows:
Process A: 30%
Process B: 50%
Process C: 8%
Process D: 3%
Process E: 9%
For example, in the case where the operating frequency of the operation cores 12 is 2.0 GHz, and processing corresponding to 2 G cycles can be executed per second, the minimum number of operation cores required to decode a stream with a processing load of 10 G cycles/sec on a real-time basis is 10 G/2 G=5. In this case, the number of cores assigned to each process, that is, the minimum number of used cores required by each process is calculated from the above ratios, as shown below.
Process A=5×0.3=1.5
Process B=5×0.5=2.5
Process C=5×0.08=0.4
Process D=5×0.03=0.15
Process E=5×0.09=0.45
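The arithmetic above can be reproduced with a short sketch. It assumes Python as the illustration language, and the names `process_loads_mcycles`, `CORE_CAPACITY_MCYCLES` and `core_shares` are hypothetical and do not appear in the embodiment. Loads are held as integers in M cycles/sec so that the ceiling division is exact.

```python
# Measured load of each process in M cycles/sec (3 G, 5 G, 0.8 G, 0.3 G, 0.9 G from the text).
process_loads_mcycles = {"A": 3000, "B": 5000, "C": 800, "D": 300, "E": 900}

# One operation core at 2.0 GHz executes 2 G cycles per second (value from the text).
CORE_CAPACITY_MCYCLES = 2000

# Load of the whole decoding process: 10 G cycles/sec.
total_load = sum(process_loads_mcycles.values())

# Ratio of each process to the whole decoding load (A: 30%, B: 50%, ...).
load_ratios = {p: load / total_load for p, load in process_loads_mcycles.items()}

# Minimum number of cores for real-time decoding, as an integer ceiling division.
min_cores = -(-total_load // CORE_CAPACITY_MCYCLES)

# Share of those cores required by each process (may be fractional).
core_shares = {
    p: (min_cores * load) / total_load for p, load in process_loads_mcycles.items()
}

for p in process_loads_mcycles:
    print(f"Process {p}: ratio {load_ratios[p]:.0%}, cores {core_shares[p]:g}")
```

The fractional shares sum to the minimum core count, which is why several low-load processes can share one operation core while a high-load process spans more than one.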
The stream analysis section 12B1 assigns decoding processing to be performed by each of the operation cores 12. That is, as for the stream 1 shown in
The operation cores 12 perform decoding processing in parallel as processing sections of the read programs.
The streaming processor 10 outputs decoded data, that is, stream video data and audio data, to the output device 3 via the general-purpose core 11 after synchronizing output timings.
The streaming processor 10 repeats the above stream processing until an end instruction is given.
As described above, in the streaming processor 10 and the processor system 1 of the present embodiment, optimum processing arrangement is performed for each stream, and therefore, it is possible to efficiently perform decoding processing of stream data.
Next, the flow of the operation of information acquisition processing by a stream information analysis section of the streaming processor 10 of the present embodiment will be described in detail.
Description will be made in accordance with the flowchart in
As already described, the stream analysis section 12B1 of the operation core 12B of the streaming processor 10 acquires a NAL (Network Abstraction Layer) unit from an H.264 stream. The NAL unit includes various pieces of information about the stream. The streaming processor 10 estimates a processing load using a profile, a level, a macroblock size and an entropy coding mode. In the case where a bit rate is included in the NAL unit as the stream information, the streaming processor 10 also uses the bit rate to estimate the processing load.
When a stream is inputted, the stream analysis section 12B1 acquires a NAL unit included in the stream and the value of (NAL_unit_type) associated with the NAL unit.
If the value of NAL_unit_type of the NAL unit equals 7, the NAL unit includes a sequence parameter set (hereinafter referred to as “SPS”). Therefore, the stream analysis section 12B1 performs decoding processing of the SPS and acquires the values of the included profile (profile_idc), level (level_idc) and macroblock size (pic_width_in_mbs_minus1, pic_height_in_map_units_minus1). Here, the macroblock is a block with 16×16 pixels, which is a processing unit in H.264. The macroblock size is the number of blocks constituting video, that is, a video size.
If the value of (vui_parameters_present_flag) included in the SPS equals 1, and either or both of (nal_hrd_parameters_present_flag) and (vcl_hrd_parameters_present_flag) are present in the SPS with a value of 1, then the stream analysis section 12B1 acquires the value of the bit rate (bit_rate_value_minus1) present in the SPS.
That is, the bit rate is not indispensable information as the stream information used for the streaming processor 10 to perform processing load estimation processing.
If the value of (NAL_unit_type) equals 8, the NAL unit includes a picture parameter set (hereinafter referred to as “PPS”). Therefore, the stream analysis section 12B1 performs decoding processing of the PPS and acquires the value of the included entropy coding mode (entropy_coding_mode_flag).
When decoding processing of both the SPS and the PPS and acquisition of the necessary parameters included in them are completed, the stream analysis section 12B1 performs input stream load estimation processing at step S42. If only one of the SPS and the PPS has been processed, the stream analysis section 12B1 acquires the next NAL unit included in the stream.
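The acquisition loop described above can be sketched as follows. This is a minimal illustration, not the embodiment's stream analysis program: it assumes the H.264 stream is an Annex B byte stream delimited by 00 00 01 start codes, and the function names `iter_nal_units` and `collect_stream_info` are hypothetical. Only the fixed-position fields profile_idc and level_idc are read. The macroblock size, the bit rate and entropy_coding_mode_flag are Exp-Golomb or bit-level fields that need a full SPS/PPS parser, which is omitted here.

```python
START_CODE = b"\x00\x00\x01"

def iter_nal_units(data: bytes):
    """Yield each NAL unit (header byte first) from an Annex B byte stream."""
    pos = data.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # Drop zero bytes that belong to the next (4-byte) start code.
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            yield nal
        pos = nxt

def collect_stream_info(data: bytes):
    """Scan NAL units until both an SPS (type 7) and a PPS (type 8) are seen.

    Returns a dict with profile_idc and level_idc, or None if either
    parameter set is missing from the supplied data.
    """
    info = {}
    seen_sps = seen_pps = False
    for nal in iter_nal_units(data):
        nal_unit_type = nal[0] & 0x1F  # low 5 bits of the NAL header byte
        if nal_unit_type == 7 and len(nal) >= 4:
            info["profile_idc"] = nal[1]  # byte after the header
            info["level_idc"] = nal[3]    # follows the constraint-flag byte
            seen_sps = True
        elif nal_unit_type == 8:
            seen_pps = True  # entropy_coding_mode_flag parsing omitted
        if seen_sps and seen_pps:
            return info
    return None
```

Returning `None` when only one parameter set has been seen corresponds to the case in which the analysis keeps fetching the next NAL unit.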
In the input stream load estimation processing, the stream analysis section 12B1 estimates a stream load corresponding to the combination of the parameters acquired from the SPS and the PPS, on the basis of the parameter/processing load correspondence table 15. As the parameters, that is, as the stream information, the profile, the level, the macroblock size and the entropy coding mode are indispensable information. When bit rate information can be obtained, the stream analysis section 12B1 also uses the bit rate information for the load estimation processing.
For example, as shown in
Furthermore, the stream analysis section 12B1 determines the number of used operation cores 12 corresponding to the estimated load, on the basis of the processing load/number of used cores correspondence table 16.
That is, as shown in
Of course, the processing load/number of used cores correspondence table 16 and the parameter/processing load correspondence table 15 may be shown not in a tabular form but in an expression.
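As a hedged illustration of the two forms, the sketch below expresses a load-to-cores correspondence once as an expression and once as a table. The per-core capacity of 2 G cycles/sec is the figure used earlier for a 2.0 GHz operation core. The table rows, the seven-core limit and the names `cores_from_expression`, `CORE_TABLE` and `cores_from_table` are assumptions made for illustration and are not the contents of the correspondence table 16.

```python
CORE_CAPACITY_MCYCLES = 2000   # one core at 2.0 GHz, in M cycles/sec
MAX_OPERATION_CORES = 7        # operation cores 12A to 12G in the embodiment

def cores_from_expression(load_mcycles: int) -> int:
    """Expression form: ceiling of load divided by per-core capacity."""
    return -(-load_mcycles // CORE_CAPACITY_MCYCLES)

# Tabular form: (upper bound of load range in M cycles/sec, number of cores).
# Hypothetical rows derived from the expression above.
CORE_TABLE = [
    (n * CORE_CAPACITY_MCYCLES, n) for n in range(1, MAX_OPERATION_CORES + 1)
]

def cores_from_table(load_mcycles: int) -> int:
    """Table form: first row whose load range covers the estimated load."""
    for upper_bound, cores in CORE_TABLE:
        if load_mcycles <= upper_bound:
            return cores
    raise ValueError("estimated load exceeds the capacity of all operation cores")
```

Under these assumptions both forms give five cores for a 10 G cycles/sec stream. A table makes it easier to store measured, non-linear correspondences, while an expression needs no storage.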
As described above, in the streaming processor 10 and the processor system 1 of the present embodiment, optimum processing arrangement is performed for each stream on the basis of the processing load/number of used cores correspondence table 16 and the parameter/processing load correspondence table 15, and therefore, it is possible to efficiently perform decoding processing of stream data.
A streaming processor and a processor system of a first variation example of the first embodiment of the present invention will be described below with reference to drawings. Since the streaming processor and the processor system of this variation example are similar to the streaming processor 10 and the processor system 1 of the first embodiment, the same description will be omitted.
As shown in
In addition to the advantages of the streaming processor 10 and the processor system 1 of the first embodiment, the streaming processor and the processor system of this variation example can perform decoding processing of stream data more efficiently.
A streaming processor and a processor system of a second variation example of the first embodiment of the present invention will be described below with reference to drawings. Since the streaming processor and the processor system of this variation example are similar to the streaming processor 10 and the processor system 1 of the first embodiment, the same description will be omitted.
As shown in
Furthermore, in the streaming processor of this variation example, if the stream load is low and there is an operation core 12 that is not used for decoding processing even during decoding time, that operation core 12 is used to perform signal processing other than the decoding processing, as shown under “decoding time” in
However, when the function of the stream analysis section 12B1 ends, the stream analysis section 12B1 is erased from the operation core 12B at decoding time, and the operation core 12B performs processing as a different decoding processing section A. As shown under “decoding time 2”, when the operation of the stream analysis section 12B1 is required again, the stream analysis program is read into the operation core 12B again and functions as the stream analysis section 12B1. Of course, the operation core 12 operating as the stream analysis section 12B1 is not limited to the operation core 12B. A different operation core 12 may be used.
In addition to the advantages of the streaming processor 10 and the processor system 1 of the first embodiment, the streaming processor and the processor system of this variation example can perform decoding processing of stream data more efficiently. Furthermore, the streaming processor can be utilized for so-called best-effort processing other than decoding processing, which is real-time processing. Furthermore, in the case where only processing with a low load is performed, there may be an operation core 12 which does not load a processing program. Since an operation core 12 which is not used at all consumes almost no power, the power consumption of the whole processor system can be reduced.
In the streaming processor and the processor system of the present invention, an upper limit of the number of operation cores 12 to be used by each decoding process may be set with the use of a so-called processor pool function. It is also possible to pool the number of cores to be used by the whole decoding process and perform the decoding process and processes other than the decoding process, balancing the decoding process and the other processes. It is also possible for the stream analysis section 12B1 to estimate a processing load for performing the whole decoding process, calculate the number of operation cores 12 for performing the whole decoding process and perform assignment processing prior to best-effort processing.
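One way to picture the pooling described above is the sketch below. It is an assumption-laden illustration rather than the processor pool function itself: the function `plan_core_pool`, its parameters and the returned keys are hypothetical. It reserves cores for the decoding process first, up to a configured upper limit, and reports what remains for other processing.

```python
def plan_core_pool(total_cores: int, decode_cores_needed: int, decode_core_limit: int) -> dict:
    """Split the operation cores between decoding and other processing.

    Decoding is assigned first (it is real-time processing), capped by the
    pool's upper limit; the remaining cores are left for other processes.
    """
    decode = min(decode_cores_needed, decode_core_limit, total_cores)
    return {
        "decode": decode,
        "available_for_other": total_cores - decode,
        # Non-zero when the limit prevents real-time decoding of this stream.
        "decode_shortfall": decode_cores_needed - decode,
    }
```

For example, with seven operation cores, a stream estimated to need five cores and an upper limit of six, this sketch assigns five cores to decoding and leaves two for other processing or for remaining idle.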
The description has been made with H.264-encoded data as an example. However, the advantages of the present invention can also be obtained in the case of other encoded data, for example, MPEG-4, MPEG-2 and VC-1 data, if the data is an encoded stream.
Having described the preferred embodiments of the invention referring to the accompanying drawings, it should be understood that the present invention is not limited to those precise embodiments and various changes and modifications thereof could be made by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2008-318665 | Dec 2008 | JP | national |