Certain embodiments of the invention relate to handling of music files. More specifically, certain embodiments of the invention relate to a method and system for redundancy-based decoding of audio content.
In some conventional receivers and/or electronic media players, improvements may require extensive system modifications that may be very costly and, in some cases, may even be impractical. Determining the right approach to achieve design improvements may depend on the optimization of a system to a particular modulation type and/or to the various kinds of noises that may be introduced by a transmission channel. For example, the optimization of a receiver system or media player may be based on whether the signals being received, generally in the form of successive symbols or information bits, are interdependent. Signals received from and/or generated by, for example, a convolutional encoder, may be interdependent signals, that is, signals with memory. In this regard, a convolutional encoder may generate NRZI or continuous-phase modulation (CPM), which is generally based on a finite state machine operation.
One method or algorithm for signal detection in a receiver system or media player that decodes convolutional encoded data is maximum-likelihood sequence detection or estimation (MLSE). The MLSE is an algorithm that performs soft decisions while searching for a sequence that minimizes a distance metric in a trellis that characterizes the memory or interdependence of the transmitted signal. In this regard, an operation based on the Viterbi algorithm may be utilized to reduce the number of sequences in the trellis search when new signals are received. Another method or algorithm for signal detection of convolutional encoded data that makes symbol-by-symbol decisions is maximum a posteriori probability (MAP). The optimization of the MAP algorithm is based on minimizing the probability of a symbol error. In many instances, the MAP algorithm may be difficult to implement because of its computational complexity.
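The Viterbi search mentioned above can be illustrated with a minimal hard-decision decoder. This sketch assumes a textbook rate-1/2, constraint-length-3 convolutional code with generator polynomials 7 and 5 (octal) and a Hamming-distance branch metric. The code parameters and the names `branch_output`, `conv_encode`, and `viterbi_decode` are illustrative assumptions, not taken from the described receiver, which may use soft decisions and a different code.

```python
from typing import List

# Illustrative rate-1/2, constraint-length-3 code with generators 7 and 5 (octal).
G = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)  # 4 trellis states


def branch_output(state: int, bit: int) -> List[int]:
    """Return the two coded bits emitted when `bit` enters the encoder in `state`."""
    reg = (bit << (K - 1)) | state  # newest bit in the most significant position
    return [bin(reg & g).count("1") % 2 for g in G]


def conv_encode(bits: List[int]) -> List[int]:
    """Encode bits and append K-1 zero tail bits so the encoder ends in state 0."""
    state, out = 0, []
    for b in bits + [0] * (K - 1):
        out.extend(branch_output(state, b))
        state = ((b << (K - 1)) | state) >> 1
    return out


def viterbi_decode(received: List[int]) -> List[int]:
    """Hard-decision Viterbi decoding with a Hamming-distance branch metric."""
    inf = float("inf")
    metrics = [0.0] + [inf] * (N_STATES - 1)  # the encoder starts in state 0
    paths: List[List[int]] = [[] for _ in range(N_STATES)]
    for t in range(0, len(received), 2):
        pair = received[t:t + 2]
        new_metrics = [inf] * N_STATES
        new_paths: List[List[int]] = [[] for _ in range(N_STATES)]
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue  # state not yet reachable
            for b in (0, 1):
                nxt = ((b << (K - 1)) | state) >> 1
                dist = sum(x != y for x, y in zip(pair, branch_output(state, b)))
                if metrics[state] + dist < new_metrics[nxt]:  # keep only the survivor
                    new_metrics[nxt] = metrics[state] + dist
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[0][:-(K - 1)]  # the tail forces state 0; drop the tail bits
```

Because the survivor into each state keeps only the lowest accumulated metric, the number of sequences examined grows linearly with the frame length rather than exponentially, which is the reduction the Viterbi operation provides.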
In audio applications, for example, improvements in the design and implementation of receivers or media players for decoding convolutional encoded audio data may require modifications to the application of the MLSE algorithm, the Viterbi algorithm, and/or the MAP algorithm in accordance with the manner in which the signal was transmitted. The overall performance of the receiver or media player may therefore depend on the ability of the system to optimize the decoding of audio content.
Audio content, such as music, sounds, and/or voice data, may generally be comprised within an audio file format that is used to digitally store the audio data on a computer system, for example. There may be many different types of formats that may be utilized for storing audio files. Some files may be generated without using data compression while others may be based on lossless or lossy compression techniques. For example, the Apple Lossless and the lossless Windows Media Audio (WMA) formats are based on lossless compression techniques while MPEG-1 Audio Layer 3 (MP3) and lossy WMA are based on lossy compression techniques. The overall performance of a receiver or media player may therefore depend on the ability of the system to optimize the decoding of content within an audio file format.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
A system and/or method is provided for redundancy-based decoding of audio content, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
These and other advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
Certain embodiments of the invention may be found in a method and system for redundancy-based decoding of audio content. A redundancy parameter may be generated for verifying a decoded bit sequence that comprises audio content, such as a decoded audio frame. The redundancy parameter may be a cyclic redundancy check (CRC) value and/or a length of frame value associated with the decoded audio frame. Information associated with the redundancy parameter may be comprised within a header of the audio frame. For example, a length of frame value, a bitrate value, a sampling rate frequency value, and/or a frame padding value may be comprised within the header of the audio frame. If the verification of the decoded audio frame fails, subsequent decoding of the previously decoded audio frame may be performed by imposing at least one physical constraint that results from the encoding of the audio frame.
The burst process block 102 may comprise suitable logic, circuitry, and/or code that may enable a burst process portion of the decoding operation of the receiver 100. The burst process block 102 may comprise, for example, a channel estimation operation and a channel equalization operation. Results from the channel estimation operation may be utilized by the channel equalization operation to generate a plurality of data bursts based on a maximum-likelihood sequence estimation (MLSE) operation, for example. In audio applications, the data bursts generated by the burst process block 102 may correspond to audio data bursts, for example. The output of the burst process block 102 may be transferred to the de-interleaver 104. The de-interleaver 104 may comprise suitable logic, circuitry, and/or code that may enable multiplexing of bits from a plurality of data bursts received from the burst process block 102 to form the frame inputs to the frame process block 106. Interleaving may be utilized to reduce the effect of channel fading distortion, for example. In audio applications, the frame inputs to the frame process block 106 may correspond to audio frame inputs, for example.
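A minimal sketch of de-interleaving, assuming a simple rows x cols block interleaver in which bits are written row by row and read out column by column. The block structure and the helper names are hypothetical; practical systems may instead use diagonal interleaving across several bursts. The usage lines show how a burst of consecutive channel errors is spread apart after de-interleaving, which is what reduces the effect of fading.

```python
from typing import List


def interleave(bits: List[int], rows: int, cols: int) -> List[int]:
    """Write bits row by row into a rows x cols matrix and read them column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits: List[int], rows: int, cols: int) -> List[int]:
    """Invert interleave(): input position c*rows + r returns to position r*cols + c."""
    assert len(bits) == rows * cols
    out = [0] * (rows * cols)
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[c * rows + r]
    return out


# A burst hitting the first 3 transmitted bits lands on frame positions 0, 4 and 8.
burst = [1, 1, 1] + [0] * 9
spread = deinterleave(burst, 3, 4)
error_positions = [i for i, e in enumerate(spread) if e]
```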
The channel decoder 108 may comprise suitable logic, circuitry, and/or code that may enable decoding of the bit sequences in the input frames received from the de-interleaver 104. The channel decoder 108 may utilize a Viterbi algorithm to improve the decoding of the input frames. The audio decoder 110 may comprise suitable logic, circuitry, and/or code that may enable audio-specific processing operations on the results of the channel decoder 108 for specified audio file formats such as MP3 and/or lossy or lossless WMA, for example. The audio decoder 110 may be utilized to recognize and/or decode more than one audio file format, for example. The audio decoder 110 may be utilized to reconstruct an encoded audio file or an encoded audio sequence for playback via a speaker, a headset, and/or ear buds, for example. Notwithstanding, the audio decoder 110 need not be so limited.
In some instances, audio decoding applications need not require burst process operations. In this regard, operations provided by the burst process block 102 and/or the de-interleaver 104 may be disabled and/or by-passed, for example, to allow direct frame process operations by the frame process block 106 on the received audio frames.
Regarding the frame process operation in the receiver 100 in
For certain data formats, for example, the inherent redundancy of the physical constraints may result from the packaging of the data and the generation of a redundancy verification parameter, such as a cyclic redundancy check (CRC), for the packaged data. Moreover, data generated by entropy encoders or variable length coding (VLC) operations may also need to satisfy certain internal constraints when decoded. For example, VLC operations utilize a statistical coding technique where short codewords may be utilized to represent values that occur frequently and long codewords may be utilized to represent values that occur less frequently. A correctly decoded VLC bit stream may therefore be expected to parse into a sequence of valid codewords, and a bit sequence that does not may be identified as erroneous.
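To illustrate the internal constraint that variable length coding may impose, the sketch below decodes a bit string with a small, hypothetical prefix-free codebook in which the pattern `111` is deliberately unused. The codebook and the function name are assumptions for illustration only. A decoded bit sequence that produces an unused pattern, or that ends in the middle of a codeword, cannot be valid and may be flagged.

```python
from typing import Dict, List, Optional

# Hypothetical prefix-free codebook: the most frequent value gets the shortest codeword.
# The pattern "111" is deliberately left unused, so not every bit string is decodable.
CODEBOOK: Dict[str, int] = {"0": 0, "10": 1, "110": 2}
MAX_LEN = max(len(code) for code in CODEBOOK)


def vlc_decode(bits: str) -> Optional[List[int]]:
    """Decode a string of '0'/'1' characters, or return None if it violates the code."""
    values: List[int] = []
    current = ""
    for bit in bits:
        current += bit
        if current in CODEBOOK:
            values.append(CODEBOOK[current])
            current = ""
        elif len(current) >= MAX_LEN:
            return None  # no codeword matches this pattern
    return values if current == "" else None  # None: stream ends inside a codeword
```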
The maximum-likelihood sequence estimate (MLSE) for a bit sequence may be a preferred approach for decoding convolutional encoded data. A general solution that maximizes the conditional probability P(X|R), where the estimated sequence X meets a certain set of physical constraints C(X), may still be difficult to implement. In this regard, an efficient implementation may rely on a suboptimal solution that takes into consideration the complexity and the implementation requirements of utilizing physical constraints in the decoding operation. In audio applications, determining the appropriate physical constraints for the audio content may be necessary in order to implement an efficient solution for redundancy-based decoding operations.
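The constrained estimate may be written as follows, where R is the received signal and C(X) denotes the physical constraints that a candidate sequence X must satisfy. Because the constrained maximization runs over a subset of the sequences considered by the unconstrained MLSE, its likelihood can never exceed that of the unconstrained estimate, and the two coincide whenever the unconstrained MLSE already satisfies C(X).

```latex
\hat{X}_{\mathrm{MLSE}} = \arg\max_{X} P(X \mid R),
\qquad
\hat{X}_{C} = \arg\max_{X \,:\, C(X)\ \text{holds}} P(X \mid R),
\qquad
P(\hat{X}_{C} \mid R) \le P(\hat{X}_{\mathrm{MLSE}} \mid R).
```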
The frame sync 301a may comprise a plurality of bits that may be utilized to synchronize the contents of the audio file frame 300. For example, a decoder may search through at least a portion of a file comprising the audio file frame 300 to detect the frame sync 301a in order to decode the audio file frame 300. The frame sync 301a may comprise 11 set bits (0x7FF) or 12 set bits (0xFFF), for example. Audio file formats other than MP3 may utilize a corresponding synchronization field that comprises fewer or more set bits than the frame sync 301a, for example.
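A frame sync search may be sketched as a byte scan, assuming the sync begins on a byte boundary as it does in MP3 files. The function name and the `sync_bits` parameter are hypothetical. A practical decoder would also validate the header fields that follow, since 0xFF bytes may occur inside audio data.

```python
from typing import Optional


def find_frame_sync(data: bytes, start: int = 0, sync_bits: int = 11) -> Optional[int]:
    """Return the offset of the first byte-aligned frame sync at or after `start`.

    sync_bits=11 looks for 0x7FF (accepts MPEG 1, 2 and 2.5); sync_bits=12 looks
    for 0xFFF, which rejects MPEG 2.5 frames. Returns None if no sync is found.
    """
    # Bits of the second byte that must also be set: 3 for an 11-bit sync, 4 for 12.
    mask = 0xE0 if sync_bits == 11 else 0xF0
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & mask) == mask:
            return i
    return None
```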
The frame header 301b may comprise a plurality of fields. For example, the frame header 301b may comprise an audio version 304, a layer 306, a protection bit 308, a bitrate 310, a frequency 312, a pad bit 314, a private bit 316, a mode 318, a mode extension 320, a copy 322, a home 324, and an emphasis 326. The audio version 304 may comprise at least one bit that may be utilized to indicate the MPEG audio version ID utilized in the compression of the audio content in the audio file frame 300. For example, when two bits are utilized, ‘00’ may correspond to MPEG version 2.5, ‘10’ may correspond to MPEG version 2 (ISO/IEC 13818-3), ‘11’ may correspond to MPEG version 1 (ISO/IEC 11172-3), and ‘01’ may be reserved. The MPEG version 2.5 may be an extension of the standard that may be utilized in low bit rate files. When the MPEG version 2.5 is not supported by, for example, a decoder utilized to decode the received audio file frame, then utilizing a 12-bit frame sync 301a may provide better synchronization.
The layer 306 may comprise at least one bit that may be utilized to indicate the layer description. For example, when two bits are utilized, ‘01’ may correspond to Layer III, ‘10’ may correspond to Layer II, ‘11’ may correspond to Layer I, and ‘00’ may be reserved. The protection bit 308 may comprise at least one bit that may indicate whether the audio file frame 300 is protected by, for example, CRC. In this regard, when a single bit is utilized, a ‘0’ may indicate that the audio file frame 300 is protected by CRC while a ‘1’ may indicate that the audio file frame 300 is not protected by CRC. The CRC may be a 16-bit CRC that may follow the frame header 301b. In some instances, the CRC may be adjacent to and/or comprised within the side info 301c and/or within the main audio data 301d, for example. The CRC may be utilized to enable redundancy-based decoding of audio file frames, for example.
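The 16-bit CRC may be sketched as follows. This assumes the CRC-16 commonly described for MPEG audio (generator polynomial x^16 + x^15 + x^2 + 1, that is 0x8005, register initialized to 0xFFFF, processed most significant bit first, no final XOR) and a Layer III frame in which the CRC covers the last two header bytes plus the side information and is stored in the two bytes immediately after the header. The side information length is supplied by the caller, and the function names are hypothetical; these layout details should be checked against ISO/IEC 11172-3 before relying on them.

```python
def mpeg_audio_crc16(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x8005, MSB first, no final XOR (assumed MPEG CRC)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def check_frame_crc(frame: bytes, side_info_len: int) -> bool:
    """Verify a CRC-protected Layer III frame laid out as header(4) + CRC(2) + side info."""
    covered = frame[2:4] + frame[6:6 + side_info_len]  # last 2 header bytes + side info
    stored = int.from_bytes(frame[4:6], "big")
    return mpeg_audio_crc16(covered) == stored
```

A failed `check_frame_crc` is the condition that would trigger the redundancy-based decoding described in the remainder of this application.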
The bitrate 310 may comprise a plurality of bits that may be utilized to indicate the bitrate index in kilobits-per-second (kbps) utilized in encoding the audio content comprised within audio file frame 300. The following table illustrates exemplary bitrates that may be supported in MP3 when four bits are utilized to indicate the bitrates:
where V1 corresponds to MPEG version 1, V2 corresponds to MPEG version 2 and 2.5, L1 corresponds to Layer I, L2 corresponds to Layer II, L3 corresponds to Layer III, ‘free’ may indicate a free format, and ‘bad’ may indicate that the ‘1111’ value is not a valid value and/or is not allowed. Since MPEG files may be encoded with a variable bitrate (VBR), different audio file frames 300 within the same MP3 file may be created utilizing different bitrates.
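The bitrate table described above may be represented as a lookup keyed by version group and layer. The values below are the MPEG audio bitrates as they are commonly tabulated from ISO/IEC 11172-3 and ISO/IEC 13818-3; they are supplied here as an assumption for illustration, and the function name is hypothetical.

```python
from typing import Optional

# Bitrates in kbps for bitrate indices 1..14 ('0000' = free format, '1111' = bad).
# Keys: (version group, layer), where "V1" is MPEG-1 and "V2" covers MPEG-2 and 2.5.
BITRATES_KBPS = {
    ("V1", 1): [32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    ("V1", 2): [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    ("V1", 3): [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    ("V2", 1): [32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256],
    ("V2", 2): [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}
BITRATES_KBPS[("V2", 3)] = BITRATES_KBPS[("V2", 2)]  # V2 Layer II and III share a column


def bitrate_kbps(index: int, version_group: str, layer: int) -> Optional[int]:
    """Map a 4-bit bitrate index to kbps; None for free format (0) or the bad value (15)."""
    if index in (0, 15):
        return None
    return BITRATES_KBPS[(version_group, layer)][index - 1]
```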
The frequency 312 may comprise at least one bit that may be utilized to indicate the sampling rate frequency utilized in creating the audio file frame 300. The following table illustrates exemplary sampling rate frequencies that may be supported in MP3 when two bits are utilized to indicate the sampling rate frequencies:
where all frequency values are in Hz.
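Similarly, the sampling rate frequency field may be decoded with a small lookup. The values below are the commonly tabulated MPEG audio sampling rates, supplied as an assumption, and the function name is hypothetical. The same 2-bit index maps to different frequencies depending on the audio version 304.

```python
from typing import Dict, List, Optional

# Commonly tabulated MPEG audio sampling rates in Hz, indexed by the 2-bit field.
SAMPLE_RATES_HZ: Dict[str, List[int]] = {
    "1": [44100, 48000, 32000],   # MPEG version 1
    "2": [22050, 24000, 16000],   # MPEG version 2
    "2.5": [11025, 12000, 8000],  # MPEG version 2.5
}


def sample_rate_hz(index: int, version: str) -> Optional[int]:
    """Map the frequency field to Hz; '11' (index 3) is reserved and yields None."""
    if index == 3:
        return None
    return SAMPLE_RATES_HZ[version][index]
```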
The pad bit 314 may comprise at least one bit that may be utilized to indicate padding of the audio file frame 300. For example, when one bit is utilized, a ‘0’ may indicate that the frame is not padded while a ‘1’ may indicate that the frame is padded. In this regard, padding may be utilized to fit the bitrates exactly. For example, for a 128 kbps bitrate at a 44.1 kHz sampling rate frequency, Layer II applications may utilize a mix of 417-byte and 418-byte frames to get as close as possible to the 128 kbps bitrate. A padded frame is one slot longer than an unpadded frame; for Layer I the slot may be 32 bits long, while for Layer II and Layer III the slot may be 8 bits long, for example.
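The frame lengths quoted above may be reproduced with the frame length formulas commonly used for MPEG audio, which are an assumption here rather than something stated in this application: Layer I frames carry 384 samples in 4-byte slots, Layer II and MPEG-1 Layer III frames carry 1152 samples in 1-byte slots, and MPEG-2/2.5 Layer III frames carry 576 samples. The function name and parameters are hypothetical.

```python
def frame_length_bytes(layer: int, mpeg1: bool, bitrate_bps: int,
                       sample_rate_hz: int, padded: bool) -> int:
    """Frame length in bytes, header included, for a fixed (non-free) bitrate."""
    pad = 1 if padded else 0  # padding adds exactly one slot
    if layer == 1:
        # Layer I: 384 samples per frame, 4-byte slots.
        return (12 * bitrate_bps // sample_rate_hz + pad) * 4
    # Layer II and MPEG-1 Layer III: 1152 samples per frame, 1-byte slots.
    # MPEG-2/2.5 Layer III: 576 samples per frame, so the coefficient is halved.
    coeff = 72 if (layer == 3 and not mpeg1) else 144
    return coeff * bitrate_bps // sample_rate_hz + pad
```

For the 128 kbps, 44.1 kHz Layer II case, 144 x 128000 / 44100 is about 417.96, so unpadded frames are 417 bytes and padded frames are 418 bytes; mixing the two keeps the average close to the nominal bitrate.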
The private bit 316 may comprise at least one bit that may be utilized for specific needs of an application. In this regard, the private bit 316 may be utilized to carry information that may be utilized in redundancy-based decoding applications, for example. The mode 318 may comprise at least one bit that may be utilized to indicate the type of channel mode. For example, when two bits are utilized, a ‘00’ may indicate a stereo mode, a ‘01’ may indicate a joint stereo mode, a ‘10’ may indicate a dual channel or two mono channels, and a ‘11’ may indicate a single channel or mono. The mode extension 320 may comprise at least one bit that may be utilized in joint stereo mode to co-join channel data, for example. In this regard, the mode extension may utilize two bits to indicate the appropriate extension operation.
The copy 322 may comprise at least one bit that may be utilized to indicate whether the contents in the audio file frame 300 are copyrighted. For example, when a single bit is utilized, a ‘0’ may indicate that the copyright is off, that is, the contents are not copyrighted, and a ‘1’ may indicate that the copyright is on, that is, the contents are copyrighted. The home 324 may comprise at least one bit that may be utilized to indicate whether the contents are original or a copy of an original. For example, when a single bit is utilized, a ‘0’ may indicate that the contents are a copy of an original file while a ‘1’ may indicate that the contents are those of an original file. The emphasis 326 may comprise at least one bit that may be utilized to indicate the emphasis applied in the original recording. In some instances, the emphasis 326 may utilize two bits for emphasis indication.
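The header fields described above may be unpacked from the four header bytes with shifts and masks. This sketch assumes the standard 32-bit MP3 header layout, most significant bit first: an 11-bit frame sync followed by the version, layer, protection, bitrate, frequency, pad, private, mode, mode extension, copyright, home (original), and emphasis fields. The class, function, and field names are hypothetical.

```python
from dataclasses import dataclass

VERSIONS = {0b00: "2.5", 0b01: None, 0b10: "2", 0b11: "1"}  # '01' is reserved
LAYERS = {0b00: None, 0b01: 3, 0b10: 2, 0b11: 1}            # '00' is reserved
MODES = {0b00: "stereo", 0b01: "joint stereo", 0b10: "dual channel", 0b11: "mono"}


@dataclass
class FrameHeader:
    version: str
    layer: int
    crc_protected: bool
    bitrate_index: int
    frequency_index: int
    padded: bool
    private: bool
    mode: str
    mode_extension: int
    copyrighted: bool
    original: bool
    emphasis: int


def parse_header(data: bytes) -> FrameHeader:
    """Unpack the 32-bit frame header fields (11-bit frame sync assumed)."""
    if len(data) < 4:
        raise ValueError("need 4 header bytes")
    h = int.from_bytes(data[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("frame sync not found")
    version = VERSIONS[(h >> 19) & 0b11]
    layer = LAYERS[(h >> 17) & 0b11]
    if version is None or layer is None:
        raise ValueError("reserved version or layer value")
    return FrameHeader(
        version=version,
        layer=layer,
        crc_protected=((h >> 16) & 1) == 0,  # protection bit '0' means a CRC follows
        bitrate_index=(h >> 12) & 0xF,
        frequency_index=(h >> 10) & 0b11,
        padded=bool((h >> 9) & 1),
        private=bool((h >> 8) & 1),
        mode=MODES[(h >> 6) & 0b11],
        mode_extension=(h >> 4) & 0b11,
        copyrighted=bool((h >> 3) & 1),
        original=bool((h >> 2) & 1),  # the 'home' bit
        emphasis=h & 0b11,
    )
```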
The side info 301c may comprise at least one bit that may be utilized to provide additional information that may be based on the audio version 304 and/or the mode 318. In this regard, the side info 301c may be a variable bit length structure, for example. The main audio data 301d may comprise a plurality of bits that may correspond to the compressed or encoded sound content within the audio file frame 300. The ancillary data 301e may comprise a plurality of bits that may be utilized to provide user defined data such as song or audio file title, for example.
When the protection bit 308 indicates that a CRC for the audio file frame 300 is available, the CRC may be utilized for redundancy-based decoding of the audio file frame 300 and the audio contents within it. Moreover, additional information within the frame header 301b may be utilized to indicate redundancy or physical characteristics of the contents of the audio file frame 300, which may be utilized for redundancy-based decoding applications. For example, the bitrate 310, the frequency 312, and/or the pad bit 314 may be utilized to determine the length of the audio file frame 300. The length of the audio file frame 300 may be based on the appropriate encoding of the audio contents and may therefore reflect physical information, such as musical and/or voice spectral content, contained within the audio information in the main audio data 301d. Notwithstanding, other audio file frame formats may utilize a field in, for example, a frame header, to provide direct information as to the frame length, which may be utilized for redundancy-based decoding applications.
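As an illustration of using the frame length as a redundancy parameter, the sketch below checks that the length implied by each header lands exactly on the next frame sync. For brevity it assumes MPEG-1 Layer III frames with a fixed (non-free) bitrate and uses the commonly published bitrate and sampling rate values; the function names and that restriction are hypothetical.

```python
def header_frame_length(header: bytes) -> int:
    """Frame length implied by an MPEG-1 Layer III header (restriction assumed)."""
    bitrates = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
    rates = [44100, 48000, 32000, None]
    h = int.from_bytes(header[:4], "big")
    kbps = bitrates[(h >> 12) & 0xF]
    sr = rates[(h >> 10) & 0b11]
    if kbps is None or sr is None:
        raise ValueError("free-format, bad, or reserved field")
    return 144 * kbps * 1000 // sr + ((h >> 9) & 1)  # pad bit adds one byte


def frame_lengths_consistent(stream: bytes, start: int, n_frames: int) -> bool:
    """Check that each header's implied length lands exactly on the next frame sync."""
    pos = start
    for _ in range(n_frames):
        if pos + 4 > len(stream) or stream[pos] != 0xFF or (stream[pos + 1] & 0xE0) != 0xE0:
            return False  # the previous frame's length did not lead to a sync
        pos += header_frame_length(stream[pos:pos + 4])
    return True
```

A bit error in the bitrate, frequency, or pad field of a decoded header typically changes the implied length, so the next sync is missed and the check fails.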
Returning to step 408, when the CRC verification test is not successful for the decoded audio frame, the process may proceed to step 410. In step 410, the audio receiver may perform a redundancy algorithm that may result in the same number of or fewer decoding errors, when reconstructing the audio content, than those that may occur from utilizing the standard Viterbi algorithm. After step 410, the operation may proceed to end step 414.
For some audio applications, for example, the redundancy algorithm may comprise searching for the MLSE that may also meet the CRC condition and the physical constraints. In this regard, a set of k bit sequences {S1, S2, . . . , Sk} may be determined from the MLSE that meet the CRC constraint. Once the set of k sequences is determined, a best sequence, Sb, may be determined that also meets at least one of a plurality of physical constraints associated with a specified audio content.
In step 428, the audio receiver may determine whether the CRC verification test was successful for the current hypothesis. When the CRC verification test is not successful, the operation may proceed to step 432. In step 432, the iteration counter may be incremented. After step 432, in step 434, the audio receiver may determine whether the iteration counter is less than a predetermined limit. When the iteration counter is greater than or equal to the predetermined limit, the operation may proceed to step 446 where a bad audio frame indication is generated. When the iteration counter is less than the predetermined limit, the operation may proceed to step 436 where a next maximum likelihood solution may be determined. After step 436, the operation may proceed to step 426 where the CRC of the decoded audio frame may be determined based on the maximum likelihood solution determined in step 436.
Returning to step 428, when the CRC verification test is successful, the operation may proceed to step 430. In step 430, the hypothesis counter may be incremented. After step 430, in step 438, the audio receiver may determine whether the hypothesis counter is less than a predetermined limit. When the hypothesis counter is less than the predetermined limit, the operation may proceed to step 424 where the iteration counter may be set to an initial value. When the hypothesis counter is equal to the predetermined limit, the operation may proceed to step 440 where the best hypothesis may be chosen from the source constraints.
After step 440, in step 442, the audio receiver may determine whether the best hypothesis chosen in step 440 is sufficient to accept the decoded audio frame. When the chosen hypothesis is sufficient to accept the decoded audio frame, the operation may proceed to step 444 where the decoded audio frame may be accepted. When the chosen hypothesis is not sufficient to accept the decoded frame, the operation may proceed to step 446 where a bad audio frame indication is generated. After step 444 or step 446, the operation may proceed to end step 414.
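The flow of steps 424 through 446 may be sketched as a loop over successive maximum likelihood solutions. In this sketch, the candidate generator, the CRC check, the physical-constraint score, the two limits, and the acceptance threshold are all hypothetical inputs supplied by the caller; the step numbers in the comments indicate only the approximate correspondence to the flow described above.

```python
from typing import Callable, Iterator, List, Optional, Sequence

Bits = Sequence[int]


def redundancy_decode(
    candidates: Iterator[Bits],
    crc_ok: Callable[[Bits], bool],
    score: Callable[[Bits], float],
    max_hypotheses: int,
    max_iterations: int,
    accept_threshold: float,
) -> Optional[Bits]:
    """Return the accepted frame bits, or None to signal a bad audio frame.

    `candidates` yields maximum likelihood solutions in decreasing likelihood;
    `score` is a physical-constraint score where lower is better (hypothetical).
    """
    hypotheses: List[Bits] = []
    exhausted = False
    while len(hypotheses) < max_hypotheses and not exhausted:  # hypothesis limit (438)
        found = False
        for _ in range(max_iterations):  # iteration counter reset and limit (424, 432, 434)
            candidate = next(candidates, None)  # next maximum likelihood solution (436)
            if candidate is None:
                exhausted = True
                break
            if crc_ok(candidate):  # CRC computed and verified (426, 428)
                hypotheses.append(candidate)  # hypothesis counter incremented (430)
                found = True
                break
        if not found and not exhausted:
            return None  # iteration limit reached: bad audio frame (446)
    if not hypotheses:
        return None
    best = min(hypotheses, key=score)  # best hypothesis from source constraints (440)
    return best if score(best) <= accept_threshold else None  # accept or reject (442-446)
```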
The search process for a T hypothesis that meets the CRC or redundancy verification parameter for audio decoding applications may start with the selected trellis junction with the highest metric. In this example, the junction labeled 6 has the highest metric and the search process may start at that point. A new search tree 500 branch or row may be created from the junction labeled 6 and a trace back pointer may be utilized to track the search operation. The new branch or row results in three additional estimated bit sequences or three junctions labeled 11 through 13. As a result, the three junctions in the top row with the lowest metrics, junctions 3, 9, and 10, may be dropped. This is shown by a small dash across the dark circle at the end of the diagonal line. Again, the new branch or row is verified for CRC. As shown, the CRC fails for this new branch and a next branch may be created from the junction with the highest metric or junction 12 as shown. In this instance, the branch that results from junction 12 meets the CRC constraint and the search process may return to the top row and to the junction with the next highest metric. The estimated bit sequence associated with junction 12 may be selected as one of the bit sequences for the set of k sequences {S1, S2, . . . , Sk}.
Junction 4 represents the next highest metric after junction 6 on the top row and a new branch or row may be created from junction 4. In this instance, the new branch meets the CRC constraint and the estimated bit sequence associated with junction 4 may be selected as one of the bit sequences for the set of k sequences {S1, S2, . . . , Sk}. This approach may be followed until the limit of k sequences is exceeded or the search from all the remaining selected junctions is performed. In this regard, a plurality of trace back pointers may be calculated during the search operation. The size of the set of k bit sequences {S1, S2, . . . , Sk} may vary.
Once the set of k sequences {S1, S2, . . . , Sk} has been determined by following the search as described in
For each of the candidate bit sequences in the set of k bit sequences {S1, S2, . . . , Sk}, a set of T1 different physical constraint tests, {Test(1), . . . , Test(T1)}, may be performed. The physical constraint tests correspond to tests of quantifiable characteristics of the type of audio data received for a particular audio application, for example. The scores of the physical constraint tests for an ith bit sequence, {T_SC(i, 1), . . . , T_SC(i, T1)}, may be utilized to determine whether the bit sequence passed or failed a particular test. Examples of quantifiable characteristics of audio content in MP3 frames may include information regarding the variable length of the audio frame, the bitrate, the sampling rate frequency, and/or the bit padding. For example, when T_SC(i, j)>0, the ith bit sequence is said to have failed the jth physical constraint test. When T_SC(i, j)<=0, the ith bit sequence is said to have passed the jth physical constraint test. In some instances, a smaller test score value may indicate a more reliable result.
Once the physical constraint tests are applied to the candidate estimated bit sequences, the following exemplary approach may be followed: when a score is positive, the candidate bit sequence may be rejected; for a particular physical constraint test, the candidate with the best score, that is, the lowest score value, may be found; and the candidate that has the best score for the largest number of tests may be selected as the best bit sequence, Sb.
Table 3 illustrates an exemplary embodiment of the invention in which a set of five candidate bit sequences, {S1, S2, S3, S4, and S5}, may be tested using a set of four physical constraint tests, {Test(1), Test(2), Test(3), and Test(4)}. The scores may be tabulated to identify passing and failing of various tests for each of the candidate bit sequences. In this instance, S2 and S4 are rejected for having positive scores for Test(2) and Test(4) respectively. The bit sequence S3 is shown to have the lowest score in Test(1), Test(3), and Test(4) and may be selected as the best bit sequence, Sb.
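The selection rule described above may be sketched as follows. Because Table 3 itself is not reproduced in this text, the scores in the usage lines are made-up values chosen only to be consistent with the outcome described (S2 and S4 rejected, S3 lowest in Test(1), Test(3), and Test(4)). The function name is hypothetical, and ties are resolved arbitrarily in this sketch.

```python
from collections import Counter
from typing import Dict, List, Optional


def select_best(scores: Dict[str, List[float]]) -> Optional[str]:
    """Choose Sb from scores T_SC(i, j); a positive score means the test failed."""
    # Reject every candidate with a positive score on any test.
    survivors = {s: row for s, row in scores.items() if all(v <= 0 for v in row)}
    if not survivors:
        return None
    n_tests = len(next(iter(survivors.values())))
    wins: Counter = Counter()
    for j in range(n_tests):
        # For test j, the surviving candidate with the lowest score wins.
        wins[min(survivors, key=lambda s: survivors[s][j])] += 1
    # The candidate that wins the most tests is selected as Sb.
    return wins.most_common(1)[0][0]


# Hypothetical scores for Test(1)..Test(4).
SCORES = {
    "S1": [-2, -1, -3, -1],
    "S2": [-1, 2, -1, -2],  # fails Test(2)
    "S3": [-5, -2, -4, -6],
    "S4": [-3, -7, -2, 1],  # fails Test(4)
    "S5": [-4, -3, -1, -2],
}
best = select_best(SCORES)  # "S3": lowest score in Test(1), Test(3), and Test(4)
```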
Accordingly, the present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in at least one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.
This application makes reference to, claims priority to, and claims the benefit of U.S. Provisional Application Ser. No. 60/970,354 filed Sep. 6, 2007. This patent application makes reference to: U.S. patent application Ser. No. 11/189,509 filed on Jul. 26, 2005; U.S. patent application Ser. No. 11/189,634 filed on Jul. 26, 2005; and U.S. Provisional Patent Application Ser. No. 60/957,096 filed on Aug. 21, 2007. Each of the above stated applications is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60970354 | Sep 2007 | US