[Not Applicable]
Audio decoding of compressed audio data is preferably performed in real time to provide a quality audio output. While decompressing audio data in real time can consume significant processing bandwidth, there may also be time periods where the processing core is down. This can happen if the processing core decompresses the audio data ahead of schedule beyond a certain threshold.
The down time periods may not be sufficient to encode entire audio frames. Utilization of a faster processor to allow encoding of audio data during the down time periods is disadvantageous due to cost reasons.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
Described herein are system(s), method(s) and apparatus for efficient background audio encoding/transcoding in a real time system, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
These and other features and advantages of the present invention may be appreciated from a review of the following detailed description of the present invention, along with the accompanying figures in which like reference numerals refer to like parts throughout.
Referring now to
The audio data 5 can comprise audio data that is encoded in accordance with any one of a variety of encoding standards, such as one of the audio compression standards promulgated by the Moving Picture Experts Group (MPEG). The audio data 5 comprises a plurality of frames 5(0) . . . 5(n). Each frame can correspond to a discrete time period.
The audio data 10 for encoding can comprise digital samples representing an analog audio signal. The digital samples representing the analog audio signal are divided into discrete time periods. The digital samples falling into a particular time period form a frame 10(0) . . . 10(m).
In accordance with an embodiment of the present invention, after decoding a first audio frame, e.g., audio frame 5(0), an encoding task is performed on audio frame 10(0). This results in a partially encoded audio frame 10(0)′.
After partially encoding the audio frame 10(0)′, audio frame 5(1) is decoded. After decoding audio frame 5(1), at least another task is executed encoding the partially encoded second audio frame, 10(0)′, thereby resulting in partially encoded audio frame 10(0)″. After the foregoing, a third audio frame is decoded, audio frame 5(2).
It is noted that although audio frame 10(0) is partially encoded after each audio frame 5(0) . . . 5(n) is decoded in the foregoing embodiment, audio frame 10(0) does not necessarily have to be encoded after each audio frame in other embodiments of the present invention. Additionally, the number of audio frames that are decoded for a given format between successive partial encodings of audio frame 10(0) is not necessarily constant; it depends upon the number of encoding tasks scheduled in between, as well as the frame size and sampling rate selected for a given decode audio format.
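By way of illustration only, the interleaving described above can be sketched as follows. The function name `interleave`, the parameter `decodes_per_task`, and the string labels are hypothetical and do not appear in the embodiments; a fixed ratio is used here purely for simplicity, although the ratio need not be constant in practice.

```python
from typing import List


def interleave(decode_frames: List[str],
               encode_tasks: List[str],
               decodes_per_task: int = 1) -> List[str]:
    """Return an execution order in which one encoding task runs after
    every `decodes_per_task` decoded frames (hypothetical fixed ratio)."""
    if decodes_per_task < 1:
        raise ValueError("decodes_per_task must be at least 1")
    order: List[str] = []
    pending = list(encode_tasks)  # encoding tasks still to be executed
    for count, frame in enumerate(decode_frames, start=1):
        order.append(f"decode {frame}")
        # After the configured number of decodes, run one encoding task.
        if pending and count % decodes_per_task == 0:
            order.append(f"encode-task {pending.pop(0)}")
    return order
```

With `decodes_per_task=1` the order matches the embodiment above, in which an encoding task follows each decoded frame; a larger value models embodiments in which several frames are decoded between successive encoding tasks.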
Referring now to
After partially encoding the audio frame 10(0)′, at 23, audio frame 5(1) is decoded. After decoding audio frame 5(1), at 24 at least another task is executed encoding the partially encoded second audio frame, 10(0)′, thereby resulting in partially encoded audio frame 10(0)″. At 25, a third audio frame is decoded, audio frame 5(2).
An audio processing core for decoding audio data can also encode audio data. As noted above, audio frames 5(0) . . . 5(n) correspond to discrete time periods. For quality of audio playback, it is desirable to decode audio frames 5(0) . . . 5(n) at least a certain threshold of time prior to the discrete time period corresponding therewith. The failure to do so can result in not having audio data for playback at the appropriate time.
Where the audio data is decoded prior to the time for playback, the audio data can be stored in a buffer until the time for playback. However, if the processing core decodes the audio data too early, the buffer can overflow.
To avoid overflowing, the processing core temporarily ceases decoding the audio data once it is ahead of playback by more than another threshold. The periods during which decoding is suspended are referred to herein as “down times”. During down times, the processing core can encode audio data 10. A single down time may be too short to encode an entire audio frame 10(0). Therefore, in certain embodiments of the present invention, the process of encoding and/or compressing audio data is divided into discrete portions. During down times, one or more of the discrete portions can be executed. Accordingly, audio frame 10(0) can be encoded over the course of several non-continuous down times, as the processing power available for encoding/transcoding permits.
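A minimal sketch of this behavior is given below, under stated assumptions. The model assumes that playback consumes one decoded frame per playback period, that the core has a fixed number of work slots per period, and that the buffer threshold is expressed as a frame count; the names `simulate_down_times`, `high_watermark`, and `slots_per_period` are hypothetical and are not taken from the embodiments.

```python
from collections import deque
from typing import Iterable, List


def simulate_down_times(periods: int,
                        high_watermark: int,
                        encode_portions: Iterable[str],
                        slots_per_period: int = 2) -> List[str]:
    """Simulate a core that decodes until the buffer reaches the
    high watermark and spends the remaining slots (down time) on
    discrete encoding portions."""
    buffered = 0                      # decoded frames awaiting playback
    pending = deque(encode_portions)  # encoding portions not yet executed
    log: List[str] = []
    for _ in range(periods):
        for _ in range(slots_per_period):
            if buffered < high_watermark:
                buffered += 1         # decoding stays ahead of playback
                log.append("decode")
            elif pending:
                # Down time: buffer is full enough, so encode instead.
                log.append("encode:" + pending.popleft())
            else:
                log.append("idle")
        if buffered > 0:
            buffered -= 1             # playback consumes one frame
    return log
```

For example, `simulate_down_times(3, 2, ["model", "mdct", "quantize"])` decodes until two frames are buffered and then alternates between decoding and executing one encoding portion per down time.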
Referring now to
After decoding frame 100(0), an acoustic model for frame F0 is generated and data bits for encoding frame F0 are allocated. After the foregoing, audio frame 100(1) can be decoded. After decoding audio frame 100(1), a modified discrete cosine transformation (MDCT) may be applied to frame F0, resulting in a frame MDCT0 of 1024 frequency coefficients 150, e.g., MDCTx(0) . . . MDCTx(1023).
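For illustration, a direct evaluation of a modified discrete cosine transform is sketched below. It assumes the common formulation in which 2N input samples yield N coefficients, so that 1024 coefficients would correspond to a block of 2048 samples; the input block length is an assumption and is not stated above. Windowing and any fast algorithm are omitted, and the function name `mdct` is hypothetical.

```python
import math
from typing import List, Sequence


def mdct(samples: Sequence[float]) -> List[float]:
    """Direct O(N^2) MDCT: 2N input samples produce N coefficients.

    X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    """
    if len(samples) == 0 or len(samples) % 2:
        raise ValueError("input length must be a non-zero even number")
    n_out = len(samples) // 2          # N coefficients
    shift = 0.5 + n_out / 2.0          # phase offset of the transform
    scale = math.pi / n_out
    return [
        sum(x * math.cos(scale * (n + shift) * (k + 0.5))
            for n, x in enumerate(samples))
        for k in range(n_out)
    ]
```

A production encoder would normally apply an analysis window before the transform and use an FFT-based implementation; the direct form is shown only because it follows the definition term by term.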
After the foregoing, audio frame 100(2) can be decoded. After decoding audio frame 100(2), the set of frequency coefficients MDCT0 may be quantized, thereby resulting in quantized frequency coefficients, QMDCT0. After the foregoing, audio frame 100(3) is decoded.
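The quantization step can be illustrated with a uniform scalar quantizer, shown below as a sketch only. The embodiments do not specify the quantizer; practical audio encoders typically use non-uniform quantization driven by the bit allocation, so the fixed `step` parameter and the names `quantize` and `dequantize` are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def quantize(coefficients: Sequence[float], step: float) -> List[int]:
    """Map each coefficient to the nearest integer multiple of `step`
    (halves round upward)."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [int(math.floor(c / step + 0.5)) for c in coefficients]


def dequantize(indices: Sequence[int], step: float) -> List[float]:
    """Reconstruct approximate coefficients from quantizer indices."""
    return [i * step for i in indices]
```

With this scheme the reconstruction error of each coefficient is at most half the step size, which is the trade-off a bit allocation would control by choosing the step.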
After decoding audio frame 100(3), the set of quantized frequency coefficients QMDCT0 can be packed into packets for transmission, forming what is known as a packetized elementary stream (PES). The PES may be packetized and padded with extra headers to form an Audio Transport Stream (Audio TS). Transport streams may be multiplexed together, stored, and/or transported for playback on a playback device. After the foregoing, audio frame 100(4) can be decoded. The foregoing can be repeated allowing for the background encoding of audio data F0 . . . Fx while decoding audio data 100 in real time.
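The sequence above can be sketched as a resumable encoder that advances by one stage after each decoded frame. The generator-based structure and the names `encode_in_stages` and `background_encode` are hypothetical; the stage labels are placeholders for the actual processing and no real encoding is performed.

```python
from itertools import chain
from typing import Iterator, List, Sequence


def encode_in_stages(frame: str) -> Iterator[str]:
    """Yield one label per discrete encoding portion of a frame."""
    yield f"acoustic model and bit allocation for {frame}"
    yield f"MDCT of {frame}"
    yield f"quantize {frame}"
    yield f"pack {frame}"


def background_encode(decode_frames: Sequence[str],
                      encode_frames: Sequence[str]) -> List[str]:
    """Decode frames in order, running one encoding stage after each."""
    stages = chain.from_iterable(encode_in_stages(f) for f in encode_frames)
    log: List[str] = []
    for frame in decode_frames:
        log.append(f"decode {frame}")
        stage = next(stages, None)    # resume encoding where it stopped
        if stage is not None:
            log.append(stage)
    return log
```

Calling `background_encode(["100(0)", "100(1)", "100(2)", "100(3)", "100(4)"], ["F0"])` reproduces the order described above: each decode is followed by the next pending stage for F0 until F0 is packed.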
Referring now to
The audio processing core 412 encodes and decodes audio data. The video processing core 415 decodes video data. The SRAM 420 stores data associated with the audio frames that are encoded and decoded.
As noted above, audio frames correspond to discrete time periods and are desirably decoded at least a certain threshold of time prior to the discrete time period corresponding therewith. The failure to do so can result in not having audio data for playback at the appropriate time.
Where the audio data is decoded prior to the time for playback, the audio data can be stored in DRAM 405 until the time for playback. However, if the processing core decodes the audio data too early, the DRAM 405 can overflow.
To avoid overflowing, the audio processing core 412 temporarily ceases decoding the audio data beyond another threshold. During down times, the processing core can encode audio data. As will be described in further detail below, the process of encoding and/or compressing audio data is divided into discrete portions. During down times, one or more of the discrete portions can be executed. Therefore, an audio frame can be encoded over the course of several non-continuous down times.
The SRAM 420 stores data associated with the encoded audio frames and decoded audio frames that are operated on by the audio processing core 412. About the time the audio processing core 412 switches from encoding to decoding or vice versa, the direct memory access (DMA) controller 425 copies the contents of the SRAM 420 to the DRAM 405, and copies to the SRAM 420 the data associated with the audio frame that will next be encoded/transcoded/decoded.
The foregoing allows for a reduction in the amount of SRAM 420 used by the audio processing core 412. In certain embodiments, the SRAM 420 can comprise no more than 20 KB. In certain embodiments, the DMA controller 425 schedules the direct memory accesses so that the data is available when the audio processing core 412 switches from encoding to decoding and vice versa.
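The swapping of working data between the SRAM and the DRAM can be modeled in software as shown below. This is a sketch under stated assumptions, not a description of the DMA hardware: the class name `ScratchMemory`, the method `dma_swap`, and the context keys are hypothetical, and the 20 KB default merely reflects the upper bound mentioned above.

```python
from typing import Dict, Optional


class ScratchMemory:
    """Toy model of a small SRAM backed by a larger DRAM."""

    def __init__(self, sram_capacity: int = 20 * 1024) -> None:
        self.sram_capacity = sram_capacity
        self.sram: bytes = b""             # working data of the active task
        self.owner: Optional[str] = None   # task whose data is in the SRAM
        self.dram: Dict[str, bytes] = {}   # saved working data per task

    def dma_swap(self, next_owner: str) -> None:
        """Save the SRAM contents to DRAM, then load the next task's data."""
        if self.owner is not None:
            self.dram[self.owner] = self.sram
        incoming = self.dram.get(next_owner, b"")
        if len(incoming) > self.sram_capacity:
            raise MemoryError("working data does not fit in the SRAM")
        self.sram = incoming
        self.owner = next_owner
```

Because only one task's working data resides in the SRAM at a time, the SRAM needs to be sized for the larger of the two working sets rather than for their sum.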
Referring now to
At 515, the DMA controller 425 copies the contents of the SRAM 420 (audio samples F0) to the DRAM 405 and writes data associated with the audio frame 100(1) to the SRAM 420. At 520, audio processing core 412 decodes audio frame 100(1). At 522, the DMA controller 425 copies the contents of SRAM 420 to the DRAM 405 and writes audio samples F0 from the DRAM 405 to the SRAM 420.
At 525, the audio processing core 412 applies the modified discrete cosine transformation (MDCT) to the samples F0, resulting in frequency coefficients MDCT0. At 530, the DMA controller 425 copies the frequency coefficients MDCT0 from the SRAM 420 to the DRAM 405 and copies the data associated with audio frame 100(2) from the DRAM 405 to the SRAM 420.
At 535, the audio processing core 412 decodes audio frame 100(2). At 540, the DMA controller 425 copies the decoded audio data associated with audio frame 100(2) from the SRAM 420 to the DRAM 405 and copies the frequency coefficients MDCT0 from the DRAM 405 to the SRAM 420.
At 545, the audio processing core 412 quantizes the set of frequency coefficients MDCT0, thereby resulting in quantized frequency coefficients QMDCT0. At 550, the DMA controller 425 copies the quantized frequency coefficients QMDCT0 from the SRAM 420 to the DRAM 405, and copies the data associated with audio frame 100(3) from the DRAM 405 to the SRAM 420.
At 555, the audio processing core 412 decodes the audio frame 100(3). At 560, the DMA controller 425 copies the decoded audio data associated with audio frame 100(3) from the SRAM 420 to the DRAM 405 and copies the quantized frequency coefficients QMDCT0 from the DRAM 405 to the SRAM 420.
At 565, the audio processing core 412 packs the quantized frequency coefficients QMDCT0 into packets for transmission, forming what is known as an audio elementary stream (AES). The AES may be packetized and padded with extra headers to form an Audio Transport Stream (Audio TS). Transport streams may be multiplexed together, stored, and/or transported for playback on a playback device.
The embodiments described herein may be implemented as a board level product, as a single chip, application specific integrated circuit (ASIC), or with varying levels of the system integrated with other portions of the system as separate components. Alternatively, if the processor is available as an ASIC core or logic block, then the commercially available processor can be implemented as part of an ASIC device wherein certain aspects of the present invention are implemented as firmware.
The degree of integration may primarily be determined by the speed and cost considerations. Because of the sophisticated nature of modern processors, it is possible to utilize a commercially available processor, which may be implemented external to an ASIC implementation.
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.