The present invention relates to a coding method and a coding apparatus for audio data and video data in the fields of digital broadcasting and communication, and especially to a technique for improving the transmission efficiency of coded data.
Some conventional coding apparatuses (encoders) that improve the transmission efficiency of coded data determine a quantization width based on the amount of data in a buffer (for example, refer to document 1).
As shown in
The coding unit 202 receives audio data 201 as input and internally calculates a coded data size. The coding unit 202 generates coded data so that the coded data falls within the calculated coded data size, and the generated coded data is sequentially stored in the storage unit 203. The data stored in the storage unit 203 is divided into packets 205 by the packetization unit 204. The storage unit 203 transfers the stored data size to the quantization width calculation unit 206, and the quantization width calculation unit 206 transfers the quantization width to the coding unit 202.
In the case where the coded data outputted by the coding unit 202 is of variable length, the total size of the data stored in the storage unit 203 also varies.
The procedure at the time of dividing the data stored by the storage unit 203 will be shown in
The coded data stored by the storage unit 203 is called an elementary stream (hereafter abbreviated as ES). For audio data, an ES is made up of a plurality of audio frames 301, 302 and 308. Next, the ES is converted into a packetized elementary stream (abbreviated as PES). The ES is converted into a PES by adding a PES header 303 to it, and the whole is called a PES packet. In the PES packet, the data part stored by the storage unit 203 is called a PES payload 304. This PES packet is then converted into a transport stream (hereafter called TS): the PES packet is divided into fixed lengths, and a TS header 305 is added to each of the divided parts so as to make the corresponding TS packets. A TS payload 306 follows the TS header. In mapping from an ES to a PES or a TS, with a view to making access on an audio frame basis easier, there may be a case of mapping an integer number of audio frames onto a single PES packet so that a single audio frame is not divided into plural PES packets, and also mapping so that a single TS packet does not carry data of plural PES packets, as shown in
Document 1: Japanese Laid-Open Patent Application Publication No. 2001-292448.
However, in a conventional audio encoder, in the above-mentioned case, padding 307 may be generated on a PES packet basis when the PES packet is divided into TS packets. In particular, the proportion of padding relative to the size of the PES packet becomes larger when the PES packet is small, and there emerges a problem that the transmission efficiency of the TS packets decreases.
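The padding problem described above can be illustrated with a short calculation. The following is only a sketch under stated assumptions: the function names are hypothetical, each PES packet is assumed to start in a fresh TS packet, and the default TS payload size of 184 bytes assumes a 188-byte TS packet with a 4-byte header and no adaptation field.

```python
def ts_padding(pes_packet_size: int, ts_payload_size: int = 184) -> tuple[int, int]:
    """Return (number of TS packets, padding bytes) for one PES packet.

    Assumes each PES packet starts in a fresh TS packet, so the last TS
    packet of the PES packet is filled up with padding.
    """
    if pes_packet_size <= 0 or ts_payload_size <= 0:
        raise ValueError("sizes must be positive")
    n_packets = -(-pes_packet_size // ts_payload_size)  # ceiling division
    padding = n_packets * ts_payload_size - pes_packet_size
    return n_packets, padding


def padding_ratio(pes_packet_size: int, ts_payload_size: int = 184) -> float:
    """Padding expressed as a fraction of the PES packet size."""
    _, padding = ts_padding(pes_packet_size, ts_payload_size)
    return padding / pes_packet_size


if __name__ == "__main__":
    for size in (200, 2000):
        print(size, ts_padding(size), round(padding_ratio(size), 3))
```

With these assumed sizes, a 200-byte PES packet needs two TS packets and 168 bytes of padding, whereas a 2000-byte PES packet needs eleven TS packets and only 24 bytes of padding, which shows why small PES packets are affected most.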
Also, although it is possible to improve the transmission efficiency of the TS packets by changing the coded data size so as to restrain padding, there is a problem that sound quality deteriorates when the coded data size is changed, more specifically, when the coded data size is reduced or is changed constantly and without restriction during the encoding processing.
The first object of the present invention is to provide a coding apparatus that solves the above-mentioned first problem, that is, a coding apparatus that reduces padding data and improves the transmission efficiency.
Also, the second object of the present invention is to provide a coding apparatus that solves the above-mentioned second problem, that is, a coding apparatus that minimizes sound deterioration while improving the transmission efficiency.
In order to solve the above-mentioned problem, the coding apparatus concerning the present invention generates coded data by coding inputted data in a predetermined unit basis, maps one or more predetermined units of the coded data onto a packet of fixed length and outputs the packet, comprising: a storage unit operable to store the coded data and output a data size of the stored coded data; a packetization unit operable to map the coded data stored in the storage unit onto the packet and output a header size and a payload size of the packet assigned to one or more units of the coded data; a control unit operable to generate control information based on the data size of the outputted coded data, the header size and the payload size, and output the control information; and a coding unit operable to calculate an outer candidate value that makes the data size of the coded data to be mapped onto the packet become an integral multiple of the payload size of the packet based on the outputted control information, and code the predetermined units of inputted data up to the outer candidate value so as to generate the coded data.
In this way, it is possible to restrain padding of packets and improve the transmission efficiency by feeding back the information such as the size of the data stored by the storage unit, the packet size of the packetization unit and the like and adjusting the size of the coded data.
Also, in the coding apparatus concerning the present invention, further, the packetization unit may output a flag for judging timing for calculating the outer candidate value, the control unit may generate the control information including the outputted flag, and the coding unit may generate the coded data up to the outer candidate value in the case where the flag is ON.
Also, in the coding apparatus concerning the present invention, the packetization unit turns the flag ON before the data size of the coded data to be mapped onto the packet becomes the integral multiple of the payload size of the packet.
Also, in the coding apparatus concerning the present invention, the coding unit previously calculates an inner candidate value that is a unique coded data size at the time of coding, judges which of the inner candidate value and the outer candidate value should be used and generates the coded data up to the candidate value determined by the judgment.
Also, in the coding apparatus concerning the present invention, the coding unit generates the coded data up to the inner candidate value in the case where a ratio of difference between an inner candidate value and an outer candidate value to the inner candidate value exceeds a threshold.
Also, in the coding apparatus concerning the present invention, the inputted data is audio data, the coded data is an elementary stream of MPEG audio, and the packet of fixed length constitutes an MPEG transport stream.
Also, in the coding apparatus concerning the present invention, the inputted data is video data, the coded data is an elementary stream of MPEG video, and the packet of fixed length constitutes an MPEG transport stream.
As is clear from the above-mentioned explanation, according to the coding apparatus of the present invention, it becomes possible to reduce the number of cases where padding occupies a large part of a packet at the time of dividing packets, and to improve the transmission efficiency.
Also, in the case where the change in the coded data size, more specifically the reduction of the data size, exceeds the threshold value, sound deterioration can be avoided by not changing the coded data size.
Also, it is possible to minimize sound deterioration because coded data size is changed only at a specific timing.
Therefore, the present invention can improve the transmission efficiency and minimize the deterioration of sound, and is highly practical today, when digital broadcasting and content distribution via the Internet and the like have become popular.
Note that the present invention can be realized not only as this coding apparatus but also as an integrated circuit including the characteristic units of this coding apparatus, as a coding method having the characteristic units of this coding apparatus as steps, and as a program that causes a computer to execute these steps. In addition, the program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
Further Information about Technical Background to this Application
The disclosure of Japanese Patent Application No. 2003-383526 filed on Nov. 13, 2003 including specification, drawings and claims is incorporated herein by reference in its entirety.
These and other objects, advantages and features of the invention will become apparent from the following description thereof taken in conjunction with the accompanying drawings that illustrate a specific embodiment of the invention. In the Drawings:
The embodiment of the present invention will be explained in detail with reference to figures below.
As shown in
The coding unit 102 generates audio frame data from the inputted audio data, and generates coded data by coding the audio frame data based on the control information 106 from the control unit 107.
The storage unit 103 includes a buffer for storing audio frame data and coded data, and a unit for obtaining the size of the stored audio frames. The size of the audio frames stored in the storage unit 103 is transferred from the storage unit 103 to the control unit 107 as storage data information 108. The structural example of the storage data information 108 will be shown in
The packetization unit 104 transmits, to the control unit 107, the packetization information 109, which includes a flag indicating the timing for dividing the coded data stored by the storage unit 103 into packets, the size of the packet 105 and the like. The structural example of the packetization information 109 will be shown in
The packetization information 109, as shown in
The control unit 107 generates the control information 106 based on the storage data information 108 that is sent and the packetization information 109 and communicates the generated control information 106 to the coding unit 102. The structural example of the control information will be shown in
Next, the procedure of packetization processing of the audio frame in the coding apparatus 100 will be explained with reference to
The coding unit 102 first converts the inputted audio data 101 (S701) into an audio frame (S702) and stores the audio frame in the storage unit 103 (S703). At this time, the storage unit 103 updates the storage data information 108 that it holds (S704). After that, the storage unit 103 transmits the storage data information 108 to the control unit 107 (S704). At this time, the data size of the audio frames to be stored in the PES packet and the TS packets that are generated next is stored in the storage data information 108.
After that, when the coded data is stored in the storage unit 103, the packetization unit 104 packetizes the stored data. Also, the packetization unit 104 controls the cycle of the packetization. In this embodiment, it is assumed that packetization is performed every six audio frames. In this case, packetization is performed after these six audio frames are stored in the storage unit 103.
The packetization unit 104 transfers the packetization information 109 in which the packetization flag 401 is turned ON to the control unit 107 after the phase immediately before the packetization is finished, in other words, after the fifth audio frame is stored (S705, S706 and S707). In this embodiment, in the case of “turn ON”, all 32 bits of the flag data are set to “1”. Actual packetization is performed in the next phase, that is, after the sixth audio frame is stored (S709, S710). At this time, the packetization flag 401 is turned OFF and the packetization information 109 is transmitted to the control unit 107 (S711). In this embodiment, in the case of “turn OFF”, all 32 bits of the flag data are set to “0”. After each of the other phases finishes (“N” in S709), the packetization flag 401 is kept OFF and the packetization information 109 is transmitted to the control unit 107.
Note that, in the embodiment, a 32-bit data structure is employed as the packetization flag: all 32 bits are set to “1” in the case of “turn ON”, and all 32 bits are set to “0” in the case of “turn OFF”. However, a 1-bit data structure may be used as the packetization flag, since the flag only needs to indicate two states.
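The flag timing of this embodiment can be sketched as a small simulation. This is an illustrative sketch only: the constant and function names are hypothetical, and the log format (frames stored, flag value, whether packetization took place) is an assumption introduced for the example.

```python
FLAG_ON = 0xFFFFFFFF   # all 32 bits set to "1"
FLAG_OFF = 0x00000000  # all 32 bits set to "0"


def simulate_flags(n_frames: int, frames_per_pes: int = 6):
    """Return one (frames_stored, flag, packetized) entry per stored frame.

    The flag is turned ON after the frame just before the packetization
    point is stored (the fifth of six frames), and turned OFF again when
    packetization is performed after the last frame of the group.
    """
    log = []
    stored = 0
    for _ in range(n_frames):
        stored += 1
        if stored == frames_per_pes:
            # Packetization is performed here; the flag goes back to OFF.
            log.append((stored, FLAG_OFF, True))
            stored = 0
        elif stored == frames_per_pes - 1:
            # Next frame is the last one before packetization: flag ON.
            log.append((stored, FLAG_ON, False))
        else:
            log.append((stored, FLAG_OFF, False))
    return log


if __name__ == "__main__":
    for entry in simulate_flags(12):
        print(entry)
```

Because only two states are distinguished, the same logic works unchanged if `FLAG_ON` and `FLAG_OFF` are replaced by a single bit.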
The control unit 107 receives the packetization information 109 from the packetization unit 104, combines the packetization information 109 with the storage data information 108 from the storage unit 103 so as to generate the control information 106 and transmits the generated control information 106 to the coding unit 102 (S708).
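How the control unit might combine the two pieces of information can be sketched with simple record types. The field layout below is an assumption made for illustration: the class and field names are hypothetical, and the only items taken from the description are the packetization flag 401, the PES header size 602, the TS payload size 603 and the storage data size 604.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageDataInfo:
    """Storage data information 108 (assumed layout)."""
    stored_data_size: int


@dataclass(frozen=True)
class PacketizationInfo:
    """Packetization information 109 (assumed layout)."""
    packetization_flag: int
    pes_header_size: int
    ts_payload_size: int


@dataclass(frozen=True)
class ControlInfo:
    """Control information 106 handed to the coding unit (assumed layout)."""
    packetization_flag: int
    pes_header_size: int
    ts_payload_size: int
    stored_data_size: int


def make_control_info(pkt: PacketizationInfo, stg: StorageDataInfo) -> ControlInfo:
    """Combine packetization and storage information into control information."""
    return ControlInfo(
        packetization_flag=pkt.packetization_flag,
        pes_header_size=pkt.pes_header_size,
        ts_payload_size=pkt.ts_payload_size,
        stored_data_size=stg.stored_data_size,
    )
```

Keeping the records immutable reflects that each unit only reports its current state; a new record is produced each time the state is updated.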
After receiving the control information 106, the coding unit 102 generates the coded data by executing the coding processing of the audio frame data based on the control information 106 (S702).
The procedure of the coding processing using the control information in the audio encoder (detailed processing in S702) will be explained in
The control unit 107 generates control information 106 from the sent packetization information 109 and the storage data information 108 and transmits the control information 106 to the coding unit 102 (S802).
The coding unit 102 receives PCM data as input (S801), calculates a candidate value of the coded data size for outputting the audio frame and performs coding up to this size. The coding unit 102 calculates this candidate value internally (S803). This candidate value is called the inner candidate value (nInnerCandidateValue). The coding unit 102 uses the inner candidate value as the coded data size candidate value in the case where the packetization flag 401 of the control information 106 sent from the control unit 107 is OFF (S808). In other words, the inner candidate value is used for the first to fifth frames.
On the other hand, in the case where the packetization flag is ON, the coding unit 102 calculates a candidate value of the coded data size based on the control information 106 (S805). This candidate value is called the outer candidate value (nOuterCandidateValue). In other words, the outer candidate value is basically used for the sixth frame.
As shown in formula (1), the outer candidate value is calculated by subtracting, from the inner candidate value, the remainder of dividing the total of the inner candidate value, the PES header size 602 and the storage data size 604 by the TS payload size 603. Note that “%” in formula (1) is the operator for calculating the remainder. After calculating the outer candidate value, the coding unit 102 next performs processing for determining which of the inner candidate value and the outer candidate value is to be used as the coded data size candidate value.
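nOuterCandidateValue=nInnerCandidateValue−(nInnerCandidateValue+nPESHeaderSize+nStockedDateSize)%nTSPayloadSize (1)
nStockedDateSize:Size of coded data presently stored in storage unit 103
nPESHeaderSize:PES header size
nTSPayloadSize:TS payload size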
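The calculation described for formula (1) can be written out as a short function. This is a minimal sketch: the function and parameter names are hypothetical, and the numbers in the usage lines are arbitrary illustrative values, not values from the embodiment.

```python
def outer_candidate_value(
    inner_candidate: int,
    pes_header_size: int,
    stored_data_size: int,
    ts_payload_size: int,
) -> int:
    """Outer candidate value as described for formula (1).

    The remainder of (inner candidate + PES header + stored data) divided
    by the TS payload size is subtracted from the inner candidate, so that
    the resulting PES packet size is an integral multiple of the payload.
    """
    remainder = (inner_candidate + pes_header_size + stored_data_size) % ts_payload_size
    return inner_candidate - remainder


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the embodiment).
    outer = outer_candidate_value(200, 14, 900, 184)
    print(outer, (outer + 14 + 900) % 184)
```

With these assumed values the remainder is 10, so the outer candidate value is 190 and the PES packet size of 1104 bytes is exactly six TS payloads. The sketch does not handle the case where the remainder is larger than the inner candidate value.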
The processing for determining the candidate value of the coded data size will be explained as follows. Initially, the coding unit 102 calculates an evaluation value using formula (2).
dEvaluateValue=(nInnerCandidateValue−nOuterCandidateValue)/nOuterCandidateValue (2)
dEvaluateValue:Evaluation value
The coding unit 102 judges whether or not the evaluation value, that is, the relative difference between the inner candidate value and the outer candidate value, is small enough. This judgment is made using threshold processing (S807). In the case where the evaluation value is small enough (“Y” in S807), the coding unit 102 judges that the sound deterioration is small when coding is performed up to the outer candidate value, substitutes the outer candidate value into the coded data size candidate value (S809), and performs coding up to this value (S810). Note that, although the threshold processing here assumes a constant threshold, in the case where the evaluation value only slightly exceeds the threshold, it is possible to change the threshold to a larger one and perform coding up to the outer candidate value.
On the other hand, in the case where the evaluation value is judged to be large (“N” in S807), the coding unit 102 judges that the sound deterioration is remarkable when coding is performed up to the outer candidate value, substitutes the inner candidate value into the coded data size candidate value (S808) and performs coding up to this value (S810).
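The selection between the two candidate values can be sketched as follows. This is only an illustration: the function name, the default threshold of 0.05 and the guard for a non-positive outer candidate value are assumptions, while the evaluation value follows formula (2) as written.

```python
def select_coded_data_size(
    inner_candidate: int,
    outer_candidate: int,
    flag_on: bool,
    threshold: float = 0.05,  # hypothetical value; the text gives none
) -> int:
    """Choose the coded data size candidate value (S807 to S809)."""
    if not flag_on:
        # Packetization flag OFF: always use the inner candidate value.
        return inner_candidate
    if outer_candidate <= 0:
        # Assumed guard: an unusable outer candidate falls back to inner.
        return inner_candidate
    # Evaluation value as written in formula (2).
    evaluate_value = (inner_candidate - outer_candidate) / outer_candidate
    if evaluate_value <= threshold:
        return outer_candidate  # small change: sound deterioration is small
    return inner_candidate      # large change: keep the original size


if __name__ == "__main__":
    print(select_coded_data_size(200, 190, True, threshold=0.1))
    print(select_coded_data_size(200, 150, True, threshold=0.1))
```

With a threshold of 0.1, reducing 200 to 190 is accepted, while reducing 200 to 150 is rejected and the inner candidate value is kept.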
By determining the target coded data size in this way, it becomes possible to restrain both sound deterioration and padding efficiently. Also, as the coded data size is changed only when the packetization flag is ON, not all the time during the encoding processing, it is possible to minimize the number of audio frames whose data size is changed in controlling the encoding processing, and thus the deterioration in sound caused by the change of the coded data size.
Note that mapping six audio frames onto a single PES packet has been explained in the embodiment, but the method for mapping audio frames onto a PES packet is not limited to this, and any mapping may be used.
Also, in this embodiment, the packetization flag is turned ON at the timing when the fifth audio frame is stored, and the coded data size is then changed, but the present invention is not limited to this, and a similar effect can be obtained even in the case of changing the coded data size at an arbitrary timing. In this case, changing the coded data size of only the data to be coded immediately before performing packetization makes it possible to reduce the deterioration in sound caused by the change of the coded data size.
Also, it is possible to change the coded data size of the fifth and the sixth audio frames in the case of an encoder whose inner candidate value of the sixth audio frame becomes equal to its inner candidate value of the fifth audio frame. In this encoder, the sizes of the fifth and the sixth audio frames are always equal to each other as shown in
nInnerCandidateValue=179
After that, since two audio frames are handled, processing that doubles nInnerCandidateValue is added before formula (1) is executed, as below.
nInnerCandidateValue=179×2=358
Next, the formula (1) is executed like the conventional method, and the outer candidate value is calculated. Note that the PES header size is made to be 10, and the PES payload size is made to be 80.
Finally, as two audio frames are handled, processing for dividing nOuterCandidateValue by two is added as below, and the resulting value is used as the outer candidate value of the fifth and the sixth audio frames.
In the case where this outer candidate value is employed, the storage unit 103 becomes like.
The padding becomes 57 in the case of
Unlike the case where the outer candidate value is used only in a single audio frame, calculating and employing the outer candidate value over a plurality of audio frames in this way makes it possible to spread the sound deterioration that would otherwise fall on a single audio frame across plural audio frames, and thereby restrain the sound deterioration of the music as a whole.
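The variant that spreads the adjustment over the fifth and the sixth audio frames can be sketched as below. The function name and the returned leftover value are hypothetical. The inner candidate value of 179 and the sizes 10 and 80 are taken from the example above, with 80 used as the divisor of formula (1), which is an assumption. The stored data size of 716 (four frames of 179) is also an assumption and is not taken from the figures.

```python
def shared_outer_candidate(
    inner_per_frame: int,
    n_frames: int,
    pes_header_size: int,
    stored_data_size: int,
    payload_size: int,
) -> tuple[int, int]:
    """Spread the size adjustment of formula (1) over several frames.

    Returns (outer candidate value per frame, leftover bytes). The leftover
    is non-zero only when the combined outer candidate value is not evenly
    divisible by the number of frames.
    """
    combined_inner = inner_per_frame * n_frames          # e.g. 179 x 2
    remainder = (combined_inner + pes_header_size + stored_data_size) % payload_size
    combined_outer = combined_inner - remainder          # formula (1)
    return divmod(combined_outer, n_frames)              # split across frames


if __name__ == "__main__":
    # 716 is an assumed stored size, used only for illustration.
    print(shared_outer_candidate(179, 2, 10, 716, 80))
```

Under these assumptions each of the two frames is coded up to 157 instead of 179, and the total of 1040 is exactly thirteen payloads of 80, so each frame gives up 22 instead of one frame giving up 44.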
Also, in this embodiment, mapping onto TS packets is performed on a PES packet basis, and padding is generated because plural PES packets are prevented from being mapped onto a single TS packet, but the present invention is not limited to this. The same effect is obtained by applying the present invention to any audio encoder that performs mapping where padding is frequently generated at the time of generating TS packets.
Also, although an audio encoder has been shown in the embodiment, the present invention is not limited to this. The present invention is also applicable to a video encoder or any other encoder where padding is generated at the time of mapping the coded data.
Although only an exemplary embodiment of this invention has been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiment without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention.
The present invention may be used not only for an audio encoder and a video encoder but also for improving the transmission efficiency in the case of transmitting variable length frame data in fixed length packets in a communication field.
Number | Date | Country | Kind |
---|---|---|---|
2003-383526 | Nov 2003 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP04/16693 | 11/4/2004 | WO | | 8/23/2005 |