The present invention relates to systems and methods for decoding source messages that may be corrupted during transmission. These systems and methods can be used in any communication system in which fixed-length or variable-length coding is employed, including audio, image, and video communication.
A simplified block diagram of a typical communication system is illustrated in
In the conventional system shown in
Furthermore, the combination of source and channel decoding can be realized as a symbol-clocked sequential decoder, such as the stack algorithm or M-algorithm. When the VLC-coded message is terminated by a known sequence of bits (as is the case, for example, in audio scalefactor coding by the low complexity mono MPEG-4 AAC source coder, where the terminating sequence is 000), it was suggested in the art that such knowledge can improve the decoding performance. This is because a known terminating sequence helps suppress the paths that have mismatched lengths compared to the correct path, and helps achieve a better estimate of the correct path length.
Some in this field have discussed the problem of separating different media frames in the same transmission packet, a problem which has been referred to as “burst segmentation.” In this setting, media frames are stacked back-to-back, so each frame is terminated by the header of the next frame in the sequence. Since headers usually contain some a priori known or predictable information, such knowledge can be used to help separate the frames and improve decoding for the same reason discussed in the previous paragraph, namely, a known terminating sequence helps suppress the paths that have mismatched lengths compared to the correct path. Some VLCs possess so-called Self-Synchronization Strings (SSSs) with the following property: in a bit stream produced by the VLC, the first bit following the end of an SSS is the first bit of some codeword from that VLC. An SSS helps the decoder regain symbol synchronization with the encoder. Certain systems have used SSSs within a bit-clocked maximum a posteriori (MAP) decoder to improve performance. However, although an SSS will help the decoder regain symbol synchronization with the encoder following a bit error, the number of decoded symbols prior to the SSS will, in general, differ from the number of encoded symbols, and the path length measured in source symbols will differ from the original path length. Moreover, since SSSs are combinations of codewords, they occur in the bit stream only if and when the corresponding sequence of source symbols is observed. The encoder itself has no control over when and how often SSSs occur. It would therefore be advantageous if the encoder could control where synchronization happens, as this would enable the system to synchronize in the most effective way.
Therefore, there is a need for a method and system that increase control over the synchronization process in joint source-channel decoding and improve source decoding performance.
According to certain embodiments, joint (as opposed to separate) source-channel decoding is performed in a manner that improves performance of a receiver. Methods and systems are provided to insert specifically-designed synchronization strings or sequences of symbols into bit streams in order to increase control over the synchronization process in such joint source-channel decoding, and to improve source decoding performance by suppressing sequences that exhibit some form of mismatch relative to the synchronization pattern. These methods and systems can be employed for standalone source decoding of noisy bit streams, as well as iterative joint source-channel decoding.
These methods and systems augment a source message bit sequence, or segments thereof, with one or more bits or synchronization sequences to facilitate better source decoding. The bits or synchronization sequences may be placed at different bit positions within the source message. For example, a single synchronization sequence may be added at the end of the source message bit sequence. The bits or synchronization sequences may be selected based on characteristics of the source message bit sequence, such as the sequence weight modulo-q and/or other source-sequence-specific attributes. They may be inserted to maximize the distance properties of different segment values or to minimize their probability of erroneous decoding. The synchronization sequences may be selected to minimize the bit error rate or packet error rate of source decoding, to maximize the Hamming distances between synchronization sequences in the set, and/or to have minimum or low autocorrelation sidelobes. The segments to be augmented may be selected based on their frequency of occurrence, how prone they are to errors, and/or how important they are for decoding or interpretation, such as significant control fields or segments of high relevance for perceptual quality.
The properties of a source message can be used to improve decoding, synchronization performance, or both. For example, a window sequence and window shape fields may be jointly decoded, and a global gain value of one channel may be used to improve the decoding of another channel, whereby synchronization sequences and/or bits may be inserted at particular locations and between various fields in the bit stream. Additionally, a source packet may be encoded for unequal error protection by partitioning the source packet, encoding the subpackets to provide additional error protection (e.g., via a forward error correction code, a single parity check code, augmentation of the source bits with synchronization sequences), and additionally encoding the subpackets with a CRC code to facilitate separate error concealment.
Other benefits and features of the present invention may become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims.
Further features of the invention, its nature and various advantages will be more apparent from the following detailed description of the embodiments, taken in conjunction with the accompanying drawings in which:
Certain aspects of the present invention pertain to various systems and methods to improve the decoding of a source message, as compared to conventional communication systems (such as the one depicted in
The communication system of
The potential improvements in decoding performance brought by various embodiments are illustrated in terms of decoding of MPEG-4 AAC bit streams, whose structure is illustrated in
Most fields in
Joint source-channel decoding of MPEG-4 AAC bit streams is known in the art for a specific configuration of the low complexity mono MPEG-4 AAC frame in which fields 305, 306, and 307 are all equal to 0. Hence, under such a configuration, the binary sequence 000 is a known terminating sequence for the VLC-coded scalefactors (304), and this knowledge is exploited to improve decoder performance. According to certain aspects of the present invention, specially designed synchronization sequences may be inserted not only at the end of VLC-coded fields, but also within those fields. Also, no specific configuration of the MPEG-4 AAC frame is assumed, and thus the principles of the present invention are more general and applicable to any MPEG-4 AAC configuration, as well as to other VLC-coded data, such as images and video.
It should be noted that, although a significant portion of the discussion herein is presented in the context of a system that employs a particular kind of variable-length coding strategy (e.g., the variable-length codes used in MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and H.264), the principles of the invention are applicable to other coding strategies, including ones that employ fixed-length codes, such as ISO 8859-15, UTF-32/UCS-4, etc.
In certain embodiments, alternating synchronization sequences that are applicable to both fixed-length and variable-length codes are used to augment the source message. The alternating synchronization sequence scheme in these embodiments consists of two synchronization sequences that are chosen to have maximum Hamming distance from each other to facilitate better differentiation among sequences with different attributes. An attribute is a suitably chosen property of the source sequence. One example of an attribute, used for illustration purposes in these exemplary embodiments, is the sequence weight modulo-2 (i.e., an even or odd number of +1's). Logical bits 1 and 0 could also be interchangeably represented with antipodal levels +1 and −1, respectively, or with some other levels as per a specific convention. In addition, SYNC sequences are preferably chosen to have good autocorrelation properties, so that they can be better distinguished from sequences of source symbols/bits that are out of synchronization. Examples of such sequences, among many others, are Barker and Willard sequences. As an example, with synchronization sequences of length 4 bits, a valid pair of alternating synchronization sequences (assuming antipodal ±1 bit coding) is:
Sync_Seq#1=[−1,−1,−1,+1]
Sync_Seq#2=[+1,+1,+1,−1]
One of these synchronization sequences is appended at the end of the source message depending on the source message weight. If the weight is an even number, Sync_Seq#1 is appended, and if the weight is an odd number, Sync_Seq#2 is appended. The source message with a SYNC sequence appended at the end is shown in
The following example illustrates how this alternating SYNC synchronization technique helps with decoding. Consider the case where the source message weight is 10. Since this weight is an even number, Sync_Seq#1 is appended to the end of the source message and transmitted to the receiver. At the receiver, the decoder runs a defined decoding algorithm (e.g., M-Algorithm) and generates a number of possible source sequences (also referred to as paths) that could correspond to the received sequence. Now consider three different cases with and without synchronization sequences:
This alternating SYNC technique turns out to be very effective, especially when entropy-achieving source codes are used for compression. Entropy-achieving codes like Huffman codes usually have weak distance properties. Minimum distances as low as 1 or 2 are frequently observed even between relatively long sequences. This is the case with scalefactor data in MPEG-4 AAC that are used in several embodiments as an example. Therefore, at high channel signal-to-noise ratios (SNR), where the channel produces only a few errors, these few errors often cause an incorrect detection of the source message such that a decoded message may have a Hamming distance of 1 or 2 bits from the correct message. In cases where weight-1 error patterns are dominant, alternating synchronization sequences will help suppress these weight-1 error patterns and correct a significant fraction of erroneous packets.
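The double-choice scheme can be sketched in a few lines of Python. This is a minimal illustration, not the decoder behind the reported results: it assumes antipodal levels (logical 1 as +1), treats each candidate's path metric as a plain correlation score such as an M-algorithm on an AWGN channel might accumulate, and the function names (`encode`, `rerank`, and so on) are hypothetical. The `autocorrelation` helper checks the sidelobes of the two sequences given in the text.

```python
# Antipodal convention assumed here: logical 1 -> +1, logical 0 -> -1.
SYNC_SEQ_1 = [-1, -1, -1, +1]  # appended when the message weight is even
SYNC_SEQ_2 = [+1, +1, +1, -1]  # appended when the message weight is odd


def weight(levels):
    """Sequence weight: the number of +1 levels (logical 1s)."""
    return sum(1 for x in levels if x > 0)


def expected_sync(levels):
    """SYNC implied by a sequence's weight parity."""
    return SYNC_SEQ_1 if weight(levels) % 2 == 0 else SYNC_SEQ_2


def encode(message):
    """Append the alternating SYNC selected by the message weight parity."""
    return list(message) + expected_sync(message)


def autocorrelation(seq):
    """Aperiodic autocorrelation at lags 0 .. len(seq) - 1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]


def rerank(candidates, received_sync):
    """Add a SYNC correlation term to each candidate's path metric.

    candidates: list of (levels, path_metric) pairs, e.g. decoder survivors.
    received_sync: soft values observed at the SYNC position.
    Returns the candidate sequences sorted best-first.
    """
    scored = []
    for levels, metric in candidates:
        corr = sum(e * r for e, r in zip(expected_sync(levels), received_sync))
        scored.append((metric + corr, levels))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [levels for _, levels in scored]
```

With the weight-10 example from the text, `encode` appends Sync_Seq#1. In the re-ranking check, a competing path that differs from the correct path in a single bit has the opposite weight parity, so it is scored against the other SYNC and its correlation term flips sign (−3.8 instead of +3.8 in the toy values used below), which is enough to drop it below the correct path.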
The method of alternating synchronization can be applied to any VLC-coded field in the MPEG-4 AAC frame (
In other embodiments, synchronization sequences are applied not only at the end of the source message, but also within the source message, i.e., by partitioning the source message into several segments and placing a synchronization sequence at the end of each segment. Applying additional synchronization sequences within the source message will help the decoder to make a better decision about surviving paths at each stage of the decoding algorithm. This may be especially advantageous for the family of sequence decoding algorithms that keep only a fraction of best paths at each stage. Consider a source message consisting of N symbols, where each symbol is mapped to a binary codeword from a codebook of a variable-length code (e.g., Huffman code). Let the length of the source message be L bits. Now, consider two different paths: (i) the correct path, i.e., the path consisting of the N symbols of the transmitted source message, and (ii) a path different from the correct path, also consisting of N symbols and L bits. Using a non-alternating synchronization sequence only at the end of the source message, as in
The above-mentioned technique can be extended to a larger number of synchronization sequences at different positions in the source message. For example, one could place 3 synchronization sequences, one at the end of each third of the source message.
Additional gains can be achieved by replacing the non-alternating synchronization sequences by alternating synchronization sequences as employed in other embodiments. For example, in the example above with two synchronization sequences in the middle and at the end of the source message, instead of the two non-alternating synchronization sequences, two alternating synchronization sequences could be employed. In this case, each alternating synchronization sequence is chosen based on the weight of the corresponding segment. By applying these two alternating sequences, the competing path (path (ii)) is suppressed if any one or more of the three conditions below occur:
For a better understanding of the error correction capabilities of the alternating synchronization sequences in the middle, consider high channel SNR scenarios, where the low-weight error patterns are dominant. Consider the following two schemes: Scheme1, where one alternating synchronization sequence is applied at the end of the source message, and Scheme2, where two alternating synchronization sequences are applied, one in the middle and one at the end of the source message. The error-correcting capabilities of Scheme1 and Scheme2 are compared in the following situations:
Thus, adding alternating synchronization sequences in the middle of the source message improves the decoder's performance by increasing its ability to detect paths with incorrect segment lengths as well as paths with incorrect segment weights.
Alternatively, more than two alternating synchronization sequences could be employed, each located at the end of one of the segments of a source message. This will further improve the capability of detecting incorrect segment lengths and incorrect segment weights, as explained above.
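A sketch of the segmented variant, assuming the message is split into near-equal runs of bits (a VLC implementation would more likely split at codeword boundaries) and reusing the two length-4 SYNCs from the text. `segment_syncs` is what a decoder could recompute for each candidate path; comparing it with the SYNCs observed at the segment boundaries localizes a weight mismatch to a particular segment. All names are illustrative.

```python
SYNC_EVEN = [-1, -1, -1, +1]  # Sync_Seq#1: segment weight is even
SYNC_ODD = [+1, +1, +1, -1]   # Sync_Seq#2: segment weight is odd


def segment_bounds(length, num_segments):
    """(start, end) index pairs splitting `length` positions into near-equal segments."""
    return [(s * length // num_segments, (s + 1) * length // num_segments)
            for s in range(num_segments)]


def segment_syncs(message, num_segments):
    """SYNC implied by the weight parity of each segment (antipodal levels)."""
    syncs = []
    for start, end in segment_bounds(len(message), num_segments):
        ones = sum(1 for x in message[start:end] if x > 0)
        syncs.append(SYNC_EVEN if ones % 2 == 0 else SYNC_ODD)
    return syncs


def augment_segments(message, num_segments):
    """Insert each segment's SYNC immediately after that segment."""
    out = []
    bounds = segment_bounds(len(message), num_segments)
    for (start, end), sync in zip(bounds, segment_syncs(message, num_segments)):
        out.extend(message[start:end])
        out.extend(sync)
    return out


def mismatched_segments(candidate, observed_syncs, num_segments):
    """Indices of segments whose implied SYNC differs from the observed one."""
    implied = segment_syncs(candidate, num_segments)
    return [i for i, (a, b) in enumerate(zip(implied, observed_syncs)) if a != b]
```

Flipping one bit in the second half of an 8-bit message changes only the second SYNC, so the mismatch is attributed to segment 1. Replacing `segment_bounds` with an arbitrary list of boundaries would give non-uniformly spaced SYNCs.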
Due to the two above-mentioned advantages, Scheme (ii) outperforms Scheme (i) by about 0.3 dB at a PER of 10^−2. However, as the channel SNR increases from 7 dB to 7.5 dB, the PER of Scheme (i) decays faster than that of Scheme (ii). This is due to the fact that weight-1 error patterns are dominant at high SNRs. In Scheme (ii), such error patterns are suppressed by one and only one of the two length-4 alternating sequences (i.e., either by the sequence located in the middle of the source message or by the sequence located at the end of the source message, depending on where the error has occurred). Scheme (i), in contrast, suppresses weight-1 error patterns with a length-7 sequence, i.e., paths containing weight-1 error patterns are penalized more heavily in Scheme (i), and the probability of decoding the correct path increases. However, in joint source-channel decoding applications, the lower SNR range is of most interest, and Scheme (ii) is therefore advantageous in such scenarios.
One additional advantage of using Scheme (ii) is that the synchronization sequences in the middle help the decoder decide which paths to keep and which ones to discard. This will reduce the probability that the decoder discards the correct path in early stages of decoding, and will increase the probability that the correct path stays as one of the paths with highest metrics and survives until the end of decoding.
To further illustrate this effect, Table III compares, for Scheme (i) and Scheme (ii), the percentage of cases in which the correct path is not among the surviving paths at the end of decoding. The results in Table III show that applying synchronization sequences in the middle of the source message considerably reduces the percentage of cases in which the correct path does not survive. Specifically, over the considered range of channel SNR (3 to 5 dB), applying an additional length-4 synchronization sequence in the middle of the source message increases the probability that the correct path survives on the list by a factor of approximately 1.5 to 3.
As a final comment on
From
In
As with the previous method, this method can also be applied to any VLC-coded field in the MPEG-4 AAC frame (
In other embodiments, the effect of alternating synchronization sequences can be further enhanced by employing more than two alternating SYNCs corresponding to more than two sets of sequence weights. For example, Table IV shows how to employ three alternating synchronization sequences corresponding to three sets of sequence weights. As shown in Table IV, the encoder calculates the packet weight and divides it by 3. If the remainder is 0, the encoder appends SYNC1 to the end of the source message; if the remainder is 1, it appends SYNC2; and if the remainder is 2, it appends SYNC3. That is, the three sets of sequence weights correspond to weights 0, 1, and 2 modulo 3. At the decoder, the path metrics are modified by adding a correlation metric between the SYNC corresponding to the path's sequence weight and the bit sequence at the expected SYNC position. If the weights of the correct path and the top-ranked path leave the same remainder after division by 3, then the SYNC expected for the top-ranked path matches the transmitted SYNC, its correlation is high, and the top-ranked path stays at the top of the list. Otherwise, if the remainders differ, errors exist in the top-ranked path. In this case, its correlation with the received SYNC will be low, and the top-ranked path may move down the list and be replaced by the correct path or by another surviving path with the appropriate path weight. By applying this synchronization method, all surviving paths that have error patterns with weights 3w+1 or 3w+2, for w=0, 1, 2, . . . , are suppressed. At high SNRs, the most frequently occurring error patterns have weights 1 and 2, and these are suppressed by the scheme described above.
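A minimal sketch of the triple-choice scheme. Table IV's sequences are not reproduced in this text, so the three 7-bit SYNCs below are placeholders: they are the generator rows of the [7,3] simplex code, chosen only because every pair differs in exactly 4 positions. Candidate paths and their metrics stand in for the survivors of a sequential decoder, and the function names are hypothetical.

```python
# Placeholder SYNC dictionary (not the sequences of Table IV).
SYNCS = [
    [0, 0, 0, 1, 1, 1, 1],  # SYNC1: weight % 3 == 0
    [0, 1, 1, 0, 0, 1, 1],  # SYNC2: weight % 3 == 1
    [1, 0, 1, 0, 1, 0, 1],  # SYNC3: weight % 3 == 2
]


def to_antipodal(bits):
    """Map logical 0/1 to -1/+1."""
    return [2 * b - 1 for b in bits]


def encode(message):
    """Append SYNC(r + 1), where r is the message weight modulo 3."""
    return list(message) + SYNCS[sum(message) % 3]


def sync_correlation(path_bits, received_sync):
    """Correlate the SYNC implied by a path's weight with the received SYNC values."""
    expected = to_antipodal(SYNCS[sum(path_bits) % 3])
    return sum(e * r for e, r in zip(expected, received_sync))


def rerank(candidates, received_sync):
    """candidates: list of (bits, path_metric); returns bits sorted best-first."""
    scored = [(metric + sync_correlation(bits, received_sync), bits)
              for bits, metric in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [bits for _, bits in scored]
```

In the example checked below, the correct message has weight 4 (remainder 1, SYNC2). A path with one flipped bit has weight 3 and is scored against SYNC1, so it loses 8 in correlation relative to the correct path and drops below it. A path whose weight differs by 3 maps to the same SYNC and is not penalized, consistent with only weights 3w+1 and 3w+2 being listed as suppressed.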
In
As shown in
Despite its better performance at higher SNRs, the triple-choice scheme is slightly outperformed at lower channel SNRs by the scheme that applies synchronization sequences in the middle of the source message (Scheme (ii), marked by Triangles in
In other embodiments, the triple-choice alternating synchronization scheme can be extended to suppress error patterns with weights 1, 2, . . . , n−1, for any given integer n>3. For this purpose, the dictionary of alternating synchronization sequences is extended to include n sequences. The encoder calculates the remainder of the source message weight divided by n, i.e., the sequence weight modulo-n, and applies synchronization sequence number i if the remainder is i−1. For example, to suppress error patterns with weights 1, 2, and 3, a dictionary of 4 synchronization sequences, SYNC1, SYNC2, SYNC3, and SYNC4, is needed. If the remainder of the source message weight divided by 4 is 0, SYNC1 is used, and if, for example, the remainder is 3, SYNC4 is used. Otherwise, the decoding method is the same as in the embodiments with three SYNCs corresponding to n=3 sets of sequence weights. In this example, all error patterns with weights 4w+1, 4w+2, and 4w+3, for all w=0, 1, . . . , will be suppressed. At high SNRs, the most common error patterns to be suppressed are those with weights 1, 2, and 3.
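For general n, the encoder rule reduces to an index computation. The sketch below (hypothetical names, logical 0/1 bits) also reports whether a given set of flipped positions moves a message to a different SYNC. It works with the net weight change produced by the flipped bits, since that is the quantity the SYNC selection actually observes.

```python
def sync_number(message_bits, n):
    """1-based SYNC number: SYNC i is used when the weight modulo n equals i - 1."""
    return sum(message_bits) % n + 1


def net_weight_change(message_bits, flip_positions):
    """Change in sequence weight caused by flipping the given bit positions."""
    return sum(1 if message_bits[p] == 0 else -1 for p in flip_positions)


def changes_sync(message_bits, flip_positions, n):
    """True if the flips move the message to a different SYNC in an n-entry dictionary."""
    return net_weight_change(message_bits, flip_positions) % n != 0
```

For n=4, a weight-3 message selects SYNC4 as in the text, and turning one, two, or three 0s into 1s always changes the selected SYNC, while a net change of 4 does not.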
This method can be applied to any VLC-coded field in the MPEG-4 AAC frame (
In yet other embodiments, as a special case of alternating synchronization sequences, one can apply single parity check bits for different segments of the source message. As an example, consider again
Despite its comparable performance (or even slightly better performance) at lower channel SNRs, Scheme (iii) is outperformed by Scheme (i) at higher channel SNRs. At higher channel SNRs, weight-1 error patterns are dominant. Scheme (i) suppresses weight-1 error patterns by a length-7 sequence; however, Scheme (iii) suppresses weight-1 error patterns by a length-1 synchronization sequence (i.e., a single synchronization bit) that is located at the end of the segment containing the erroneous bit. Since Scheme (i) penalizes the weight-1 error patterns considerably more than Scheme (iii), Scheme (i) has a better potential to suppress incorrect paths and decode the correct path at high SNRs.
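With logical bits, a per-segment single parity check bit is simply the segment weight modulo 2, i.e., a length-1 alternating SYNC ("0" for even weight, "1" for odd weight). A minimal sketch, assuming equal-length bit segments and hypothetical function names; three segments in the example is an arbitrary choice.

```python
def segment_bounds(length, num_segments):
    """(start, end) index pairs splitting `length` positions into near-equal segments."""
    return [(s * length // num_segments, (s + 1) * length // num_segments)
            for s in range(num_segments)]


def segment_parities(bits, num_segments):
    """Weight modulo 2 of each segment (logical 0/1 bits)."""
    return [sum(bits[s:e]) % 2 for s, e in segment_bounds(len(bits), num_segments)]


def add_parity_bits(bits, num_segments):
    """Append one parity bit after every segment."""
    out = []
    bounds = segment_bounds(len(bits), num_segments)
    for (s, e), p in zip(bounds, segment_parities(bits, num_segments)):
        out.extend(bits[s:e])
        out.append(p)
    return out


def mismatched_segments(candidate, received_parities, num_segments):
    """Segments where a candidate path's parity disagrees with the received parity bit."""
    implied = segment_parities(candidate, num_segments)
    return [i for i, (a, b) in enumerate(zip(implied, received_parities)) if a != b]
```

A single bit error is penalized only by the one parity bit of its own segment, which is the weakness at high SNR discussed above.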
Alternatively, in other embodiments, alternating synchronization sequences may have non-uniform spacing, such that they are located at unequal distances from each other. For example, when applying 3 alternating synchronization sequences, one may apply the first sequence after ½ of the source symbols/bits, the second sequence after ⅗ of the symbols/bits, and the third sequence at the end of the source message.
Alternatively, in other embodiments, one may apply alternating sequences with different lengths at different locations of the source message. For example, when applying two alternating synchronization sequences, one may apply a length-7 sequence in the middle and a length-4 sequence at the end of the source message, or vice versa.
Alternatively, in other embodiments, one may apply different types of alternating sequences (in terms of maximum weight of error patterns that they could suppress) at different locations of the source message. For example, one may apply a triple-choice alternating sequence (e.g., the one given in Table V) in the middle, and a “double-choice” synchronization sequence (e.g., Table II) at the end of the source message. Such combined use of SYNC sequences with different error (weight) suppression capabilities may sometimes be referred to as “mixed-type” alternating SYNC scheme, whereas using the same type of sequence at one or multiple positions, as described in some of the previous embodiments, may be referred to as “single-type” alternating SYNC scheme. For the same amount of overhead, combining synchronization sequences of different types and lengths may provide better performance than using sequences of one type and length only. As an example, consider the following two schemes: Scheme (i) that uses two 8-bit synchronization sequences from the Hamming (8,4) code shown in Table VI, one in the middle and one at the end of the bit stream (as shown in
Alternatively, in other embodiments, one may apply a combination of non-alternating and alternating synchronization sequences. For example, one may apply a non-alternating sequence in the middle and an alternating sequence at the end of source message.
Alternatively, in other embodiments, one may apply a combination of two or more of features described in previous embodiments to satisfy system requirements for different digital communications systems with optimal performance complexity trade-off. For example, one may apply a larger number of synchronization sequences and/or more powerful/longer sequences in the earlier parts of the MPEG-4 AAC bit stream (
In other embodiments, separate CRC codes are provided for more protected and less protected parts of the source sequence, respectively, to facilitate better unequal error protection and error concealment. For example, earlier parts of MPEG-4 AAC packets, e.g., ICS, section data and scalefactor fields, say subpacket A, could be more protected by source sequence augmentation methods described in other embodiments, while the latter parts, subpacket B, could be less protected or without additional source sequence augmentation. Then subpackets A and B are protected separately by CRC codes CRC_A and CRC_B, respectively, wherein CRC codes CRC_A and CRC_B may also have different strengths depending on the desired probability of undetected errors for subpackets A and B. Since subpacket A will have lower packet error probability, on average, than subpacket B, whenever it happens that subpacket A is correct and subpacket B is incorrect, the error concealment will be used only for less important data in subpacket B.
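A sketch of separate CRCs for the two subpackets. The generator polynomials are illustrative choices (x^8 + x^2 + x + 1 for the more important subpacket A, x^4 + x + 1 for subpacket B), and the CRC is computed as a plain GF(2) polynomial remainder, without the initial register values or output XORs that standardized CRCs often add. The decision returned by `concealment_plan` (a hypothetical helper) mirrors the idea that concealment is applied only to the subpacket whose check fails.

```python
CRC_A_POLY = [1, 0, 0, 0, 0, 0, 1, 1, 1]  # x^8 + x^2 + x + 1 (8 check bits)
CRC_B_POLY = [1, 0, 0, 1, 1]              # x^4 + x + 1 (4 check bits)


def crc_remainder(bits, poly):
    """Remainder of bits(x) * x^r divided by poly(x) over GF(2), r = deg(poly)."""
    r = len(poly) - 1
    reg = list(bits) + [0] * r
    for i in range(len(bits)):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[len(bits):]


def protect(subpacket_a, subpacket_b):
    """Append CRC_A to subpacket A and CRC_B to subpacket B."""
    return (list(subpacket_a) + crc_remainder(subpacket_a, CRC_A_POLY),
            list(subpacket_b) + crc_remainder(subpacket_b, CRC_B_POLY))


def concealment_plan(rx_a, rx_b):
    """Decode subpackets whose CRC passes; conceal the ones whose CRC fails."""
    ok_a = not any(crc_remainder(rx_a, CRC_A_POLY))
    ok_b = not any(crc_remainder(rx_b, CRC_B_POLY))
    return {"A": "decode" if ok_a else "conceal",
            "B": "decode" if ok_b else "conceal"}
```

A single bit error in subpacket B leaves subpacket A decodable, so concealment is confined to the less important data.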
Additionally, in stereo or multichannel MPEG-4 AAC packets, unequal CRC and/or augmentation protection may be applied on each channel. For example, the more important parts in channel 1, such as ICS, section data and scalefactors, referred to as subpacket A1, could be protected by more powerful CRC and/or sequence augmentation, while the less important parts of channel 1, referred to as subpacket B1, could be protected by less powerful CRC and/or sequence augmentation. To facilitate the detection of the start of channel 2, a SYNC sequence may be placed at the end of channel 1 bit stream. Within channel 2 bit stream, one may again distinguish the more important part (subpacket A2), which is protected using more powerful CRC and/or sequence augmentation, and the less important part (subpacket B2), which is protected using less powerful CRC and/or sequence augmentation. With this arrangement, the more important parts of both channels (subpackets A1 and A2) will have lower error probabilities than the less important parts (subpackets B1 and B2), which will facilitate error concealment in the less important parts.
Additionally, one may distinguish more than two levels of importance within each audio channel, and apply unequal CRC and/or augmentation protection on each part of the bit stream according to the importance of that part of the bit stream in order to achieve the desired performance.
Alternatively, in other embodiments, one may apply SYNCs in a “smarter” way by tailoring the SYNC bits to the properties of the specific encoded source sequences. In the previously described embodiments, SYNCs were chosen based on one attribute of the source sequence, namely the sequence weight. However, it is also possible to utilize other attributes of the source message in SYNC sequence selection. For example, for the scalefactors in MPEG-4 AAC encoded bit streams, the most frequent symbols are 59, 60, and 61. In one approach, some sync bits could be dedicated to the count of one or more of these symbols, or of other symbols in addition, within the scalefactor field. Also, the codewords of some symbols in the vicinity of symbol 60 are at Hamming distance 1 from each other (e.g., the codewords of symbols 57 and 63). Therefore, a single bit error could result in an erroneous symbol decision. To detect such errors, some sync bits could be used to identify the number of rising edges or falling edges, or both, in the symbol sequence. For example, if the current symbol is 60 and the next symbol is 63, the number 3 is added to the rising edge count. If the decoder decodes symbol 57 instead of 63 due to one bit error, the number 3 will instead be added (|60−57|=3) to the falling edge count. Therefore, both the rising edge count and the falling edge count will differ by 3 from the original rising and falling edge counts, and the SYNCs will detect this mismatch. To represent the rising and falling edge counts with a finite, relatively small number of bits, say q bits, a modulo-2^q operation could be applied to the total count. To provide better resolution, one could also encode the peak value and the location of the peak symbol within the scalefactor field. Details of an exemplary assignment of bits to the different fields of the “smart” SYNC are shown in Table VII.
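A sketch of how some of the “smart” SYNC attributes could be computed from a scalefactor symbol sequence. The bit width is an assumption (q=3 for the modulo-2^q counts), the function names are hypothetical, and Table VII's actual allocation and the subsequent Golay encoding are not reproduced. The check uses the 60→63 versus 60→57 case from the text.

```python
FREQUENT_SYMBOLS = (59, 60, 61)  # most frequent scalefactor symbols per the text


def smart_sync_fields(symbols, q=3):
    """Attributes of a symbol sequence; counts are reduced modulo 2**q."""
    mod = 2 ** q
    steps = list(zip(symbols, symbols[1:]))
    rising = sum(b - a for a, b in steps if b > a)   # total upward step size
    falling = sum(a - b for a, b in steps if a > b)  # total downward step size
    peak = max(symbols)
    return {
        "counts": tuple(symbols.count(s) % mod for s in FREQUENT_SYMBOLS),
        "rising": rising % mod,
        "falling": falling % mod,
        "peak": peak,
        "peak_position": symbols.index(peak),
    }


def mismatched_fields(decoded, reference, q=3):
    """Names of the attributes on which a decoded path disagrees with the reference."""
    a = smart_sync_fields(decoded, q)
    b = smart_sync_fields(reference, q)
    return sorted(key for key in a if a[key] != b[key])
```

For that case the rising and falling counts swap (3 and 0 versus 0 and 3), and the peak value and position also differ, so a decoder comparing these fields against the received SYNC would flag the erroneous path.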
The 12 assigned bits are further encoded using a (24,12) Golay code and are added as the synchronization sequence to the end of the scalefactor field. Simulation results using this “smart” SYNC are shown in
The above embodiments were described for the scenario where the input to the source decoder consists of soft bits, as in
In yet other embodiments, an error detection code is associated with the source-encoded packet prior to augmenting the source sequence with additional bits and SYNC sequences. At the receiver, error detection is performed prior to source decoding on the bit sequence stripped of the augmenting bits and SYNC sequences. If the error detection code indicates that the packet is correct, i.e., error-free, the augmented source decoding is skipped. In the case of detected errors, source decoding is performed as described in the various embodiments. In some embodiments, a CRC code is used as the error detection code. SISO CRC decoding is performed first on the source sequence stripped of the augmenting bits and SYNC sequences, followed by soft-input source decoding as described in the various embodiments. In iterative joint source-channel decoding, extrinsic information, as known in the art, is passed to the channel decoder for the next iteration. Alternatively, SISO source decoding could be performed first, followed by SISO CRC decoding on the sequence stripped of the augmenting bits and SYNC sequences, followed by passing extrinsic information to the channel decoder for the next iteration if errors are detected by the CRC.
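A sketch of the CRC-gated receiver, with simplifications: the frame layout is assumed to be [source bits | CRC bits | SYNC bits], the check is a hard-decision CRC check rather than SISO CRC decoding, the polynomial x^4 + x + 1 is an arbitrary choice, and `soft_source_decoder` is a placeholder for the augmented (SYNC-aided) soft-input source decoder.

```python
CRC_POLY = [1, 0, 0, 1, 1]  # x^4 + x + 1, an illustrative choice


def crc_remainder(bits, poly):
    """Remainder of bits(x) * x^r divided by poly(x) over GF(2), r = deg(poly)."""
    r = len(poly) - 1
    reg = list(bits) + [0] * r
    for i in range(len(bits)):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[len(bits):]


def receive(soft, sync_len, soft_source_decoder, poly=CRC_POLY):
    """Run augmented source decoding only when the CRC detects an error.

    soft: received soft values, positive meaning logical 1.
    Returns (result, path_taken).
    """
    hard = [1 if s > 0 else 0 for s in soft]
    protected = hard[:len(hard) - sync_len]  # strip the trailing SYNC bits
    if not any(crc_remainder(protected, poly)):
        # CRC passes: skip augmented decoding and drop the CRC bits.
        return protected[:len(protected) - (len(poly) - 1)], "crc_pass"
    return soft_source_decoder(soft), "augmented_decoding"
```

On an error-free frame the hard decisions are returned directly; once a bit in the protected part is flipped, the CRC fails and the (placeholder) soft decoder is invoked instead.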
The success of Joint Source Channel Decoding (JSCD) depends on the ability of both the source and channel decoders to distinguish between valid (permissible) and invalid (impermissible) sequences. Efficient source encoding, however, generates many valid sequences at distance 1 from each other. Consider Huffman coding, for example. Since the Huffman code tree is a complete tree, as known in the art, a Huffman decoder is able to interpret any bit sequence, even a random one, as a sequence of codewords. Hence, such a source decoder by itself will not provide any gain in JSCD. The methods described in the previous paragraphs effectively increase the distance between permissible source sequences, thereby increasing the source decoding gain in JSCD. Performance of such methods was illustrated in terms of decoding of Huffman-coded scalefactor fields in MPEG-4 AAC (blocks 304 and 310 in
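The completeness argument can be checked numerically: a prefix code whose codeword lengths satisfy the Kraft inequality with equality has a complete code tree, so every bit string parses into codewords (apart from possibly an unfinished last codeword). The four-word code below is an illustrative Huffman code for probabilities 1/2, 1/4, 1/8, 1/8, not an MPEG-4 AAC codebook, and the helper names are hypothetical.

```python
from fractions import Fraction

# Illustrative Huffman code for symbol probabilities 1/2, 1/4, 1/8, 1/8.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def kraft_sum(code):
    """Sum of 2^(-length) over all codewords; equals 1 for a complete prefix code."""
    return sum(Fraction(1, 2 ** len(word)) for word in code.values())


def parse(bits, code):
    """Greedy prefix-code parsing; returns the symbols and any unfinished tail."""
    inverse = {word: sym for sym, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    return symbols, buffer
```

Flipping the first bit of `0110111100` still yields a valid parse (`d a d b a` instead of `a c d b a`), which is why the source decoder alone cannot flag the error.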
Decoding of MPEG-4 AAC Fields
ICS Decoding: Individual sub-fields of the ICS field are shown in
The field following the wss field is the max_sfb field. This field is 6 bits long in LONG windows (block 603), and 4 bits long in SHORT windows (block 610). However, in the LONG windows, 6-bit values higher than 51 (i.e., 52, 53, . . . 63) are not allowed. Further, the set of values observed in practice is actually much smaller. Table IX lists the set of values of max_sfb observed on a large audio dataset (over 2 hours of audio) spanning several program types such as jazz, classical music, newscast, talk shows, etc., encoded using the Nero codec at several different bitrates. As seen in the table, the set of values occurring in practice is much smaller than the set of possible values, which may be taken advantage of in decoding, either for error detection or correction.
In addition to these existing properties of wss and max_sfb fields, which help in decoding, one can strategically insert bits into the ICS field at the encoder to further increase the distance among permissible source sequences. Some possibilities are illustrated in
Following the max_sfb field in LONG windows is the 1-bit predictor_data_present flag (block 604 in
Following the predictor_data_present flag in LONG windows and the sfg field in SHORT windows is the 2-bit MS mask present (mp) field, shown as blocks 605 and 612 in
Section data is composed of an alternating sequence of 4-bit Huffman codebook indices and variable-length section lengths, each of which is composed of one or more section length increments that are 3 bits long in SHORT windows and 5 bits long in LONG windows. It is possible to add parity bits to individual Huffman codebook indices or section length increments, or to groups of these fields. It should be mentioned that section data provides a syntax check that can be used to detect certain error patterns. In particular, according to the MPEG-4 standard, the sum of all section length increments should equal the total number of scalefactor bands. Hence, bit errors in section length increments are likely to cause a syntax violation.
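The section-length syntax check can be illustrated with a simplified parser. It is modeled on the section_data syntax of ISO/IEC 14496-3 (a 4-bit codebook index, then a section length built from 3-bit increments in SHORT windows or 5-bit increments in LONG windows, where an increment equal to the escape value 2^bits − 1 means another increment follows), but it handles a single window group only and ignores codebook-specific rules. The function name is hypothetical, and a length sum that overshoots max_sfb is reported as a syntax violation.

```python
def parse_section_data(bits, max_sfb, short_window):
    """Parse one window group of section data and check it against max_sfb.

    Returns (sections, bits_consumed), where sections is a list of
    (codebook, length) pairs. Raises ValueError on a syntax violation.
    """
    sect_bits = 3 if short_window else 5
    esc_val = (1 << sect_bits) - 1
    pos = 0

    def read(n):
        nonlocal pos
        if pos + n > len(bits):
            raise ValueError("bit stream exhausted inside section data")
        value = 0
        for b in bits[pos:pos + n]:
            value = (value << 1) | b
        pos += n
        return value

    sections, band = [], 0
    while band < max_sfb:
        codebook = read(4)
        length = 0
        increment = read(sect_bits)
        while increment == esc_val:  # escape value: the length continues
            length += increment
            increment = read(sect_bits)
        length += increment
        band += length
        if band > max_sfb:
            raise ValueError("section lengths exceed max_sfb")
        sections.append((codebook, length))
    return sections, pos
```

A corrupted increment typically makes the running band count overshoot max_sfb or exhausts the bits, which is the kind of syntax violation mentioned above.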
Correct decoding of section data is important for spectral data decoding. For the decoding of scalefactors, however, which are thought to be more important for perceptual audio quality, it is only important to decode the length of the section data correctly, because the scalefactors immediately follow the section data. This point is illustrated by the following example, in which one (8,4) Hamming sync sequence (Table VI) is appended at the end of the ICS field and another Hamming sync sequence is appended at the end of the scalefactor field. The 4-bit index of the sync sequence appended to the scalefactor field is simply chosen based on the weight of the scalefactor field as
$$\text{scalefactor\_SYNC\_index} = \text{scalefactor\_weight} \,\%\, 16 \tag{1}$$
where % denotes the modulo operation. Meanwhile, the 4-bit index of the sync sequence appended to the ICS field, ICS_SYNC_index=(b1, b2, b3, b4), is chosen based on several attributes of the ICS field. In SHORT windows, the index bits are chosen as
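$$
\begin{aligned}
b_1 &= \text{sfg\_bit}_4 \oplus \text{sfg\_bit}_5\\
b_2 &= \text{sfg\_bit}_6 \oplus \text{sfg\_bit}_7\\
(b_3, b_4) &= \text{ICS\_weight} \,\%\, 4
\end{aligned}
\tag{2}
$$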
where sfg_bit_i is the i-th bit of the sfg field (611 in
$$
\begin{aligned}
b_1 &= \text{mp\_bit}_2\\
(b_2, b_3, b_4) &= \text{ICS\_weight} \,\%\, 8
\end{aligned}
\tag{3}
$$
where mp_bit_i is the i-th bit of the mp field (605 in
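The attribute-based SYNC selection of Eqs. (1) to (3) can be sketched in Python as follows. Several assumptions are made for illustration: "weight" is taken to be the Hamming weight (number of ones) of the field, bits are numbered from 1 starting at the most significant bit, (b3, b4) and (b2, b3, b4) are the binary digits of the modulo result with the first listed bit most significant, and, because Table VI is not reproduced here, the index is mapped to a codeword of a systematic extended (8,4) Hamming code that may not match the table's ordering. All function names are hypothetical.

```python
# Sketch of attribute-based SYNC index selection, Eqs. (1)-(3), plus a
# hypothetical index-to-codeword map for an extended (8,4) Hamming code.
from typing import List, Sequence


def weight(bits: Sequence[int]) -> int:
    """Hamming weight (number of ones); assumed meaning of 'weight'."""
    return sum(bits)


def scalefactor_sync_index(sf_bits: Sequence[int]) -> int:
    return weight(sf_bits) % 16                           # Eq. (1)


def ics_sync_index_short(ics_bits: Sequence[int], sfg_bits: Sequence[int]) -> int:
    """Eq. (2); sfg_bits[0] is taken to be sfg_bit_1 (1-based, MSB first)."""
    b1 = sfg_bits[3] ^ sfg_bits[4]                        # sfg_bit_4 XOR sfg_bit_5
    b2 = sfg_bits[5] ^ sfg_bits[6]                        # sfg_bit_6 XOR sfg_bit_7
    b3b4 = weight(ics_bits) % 4
    return (b1 << 3) | (b2 << 2) | b3b4


def ics_sync_index_long(ics_bits: Sequence[int], mp_bits: Sequence[int]) -> int:
    """Eq. (3); mp_bits[1] is taken to be mp_bit_2."""
    b1 = mp_bits[1]
    b2b3b4 = weight(ics_bits) % 8
    return (b1 << 3) | b2b3b4


def hamming84_codeword(index: int) -> List[int]:
    """Systematic extended (8,4) Hamming codeword for a 4-bit index
    (hypothetical mapping; Table VI may order the sequences differently)."""
    d1, d2, d3, d4 = [(index >> s) & 1 for s in (3, 2, 1, 0)]
    p1, p2, p3 = d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4
    p4 = d1 ^ d2 ^ d3 ^ d4 ^ p1 ^ p2 ^ p3                 # overall even parity
    return [d1, d2, d3, d4, p1, p2, p3, p4]
```

Since the 16 sync sequences are at Hamming distance at least 4 from one another, a decoder that finds the appended sequence can also check it against the index recomputed from the decoded field, which ties the SYNC to the content that precedes it.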
With such attribute-based sync selection, the block error rates in scalefactor decoding were measured for the hard decoder and two soft decoders. The first soft decoder uses the noisy ICS field. It first finds the 7 lowest-magnitude input soft bits, generates 2^7 = 128 hypotheses by replacing these 7 soft bits with all combinations of the corresponding hard bits, and then finds the MAP estimate of the ICS data. The ICS data then provide the number of scalefactors, while the length of the ICS field indicates the start position of the scalefactor field. Scalefactors are then decoded using the M-algorithm with M=30. The second soft decoder uses the noiseless ICS data and runs the M-algorithm on the scalefactors with M=30, just like the first soft decoder, but with exact knowledge of the start position of the scalefactor field and the number of scalefactors. The results are shown in
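The hypothesis search performed by the first soft decoder resembles Chase-style list decoding. A minimal Python sketch follows, assuming the soft bits are log-likelihood ratios (positive favouring 0) and that the caller supplies a hypothetical log_prior function returning the log a priori probability of a candidate ICS bit pattern, or -inf when the pattern is not syntactically permissible. It illustrates the search only; the actual MAP metric used in the measurements is not specified here.

```python
# Sketch: list decoding of a short fixed-format field by trying all values of
# its least reliable bits (Chase-style search) and keeping the MAP hypothesis.
from itertools import product
from typing import Callable, List, Optional, Sequence

NEG_INF = float("-inf")


def list_decode_least_reliable(
    llr: Sequence[float],
    log_prior: Callable[[List[int]], float],
    num_flip: int = 7,
) -> Optional[List[int]]:
    """llr[i] = log P(bit_i = 0 | y) / P(bit_i = 1 | y).
    log_prior(candidate) returns log P(candidate), or -inf for invalid candidates.
    Returns the best candidate, or None if every hypothesis is invalid."""
    hard = [0 if v >= 0 else 1 for v in llr]
    weakest = sorted(range(len(llr)), key=lambda i: abs(llr[i]))[:num_flip]
    best, best_score = None, NEG_INF
    for combo in product((0, 1), repeat=len(weakest)):   # 2**num_flip hypotheses
        cand = list(hard)
        for pos, bit in zip(weakest, combo):
            cand[pos] = bit
        prior = log_prior(cand)
        if prior == NEG_INF:
            continue                                      # not a permissible ICS pattern
        # Channel log-likelihood up to a constant: +L/2 for a 0, -L/2 for a 1.
        channel = sum(v / 2 if b == 0 else -v / 2 for v, b in zip(llr, cand))
        if channel + prior > best_score:
            best, best_score = cand, channel + prior
    return best
```

With num_flip=7 the loop evaluates exactly 128 hypotheses, matching the count above; the cost grows as 2**num_flip, which is the knob for trading complexity against performance.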
Scalefactor Decoding: In MPEG-4 AAC, scalefactors are encoded using a Huffman codebook consisting of 121 codewords. As already mentioned, successful decoding requires knowledge of the starting position of the scalefactor field and of the number of scalefactors. In some specific encoder configurations known in the art, such as the low-complexity mono MPEG-4 AAC configuration, a known 3-bit sequence 000 terminates the scalefactor bit stream, which can be utilized to improve decoding performance. In other configurations, and also where different performance-complexity trade-offs are desired, various features described herein, such as alternating SYNC sequences, “smart” SYNCs, and combinations thereof, may be employed. Scalefactor decoding was used as a running example in the description of some of the embodiments to illustrate various aspects and performance gains.
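A symbol-clocked M-algorithm that exploits a known start position and a known number of symbols can be sketched in Python as follows. The sketch additionally assumes that the end position of the field is known (for example, from a terminating sequence or an appended SYNC), which lets it discard partial paths that can no longer finish exactly at the last bit. The three-codeword codebook in the usage example is a toy stand-in for the 121-codeword scalefactor codebook, the path metric (sum of log a posteriori bit probabilities computed from LLRs) is one reasonable choice rather than the metric of the embodiments, and all names are hypothetical.

```python
# Sketch: symbol-clocked M-algorithm for a VLC field with known start,
# known number of symbols, and (assumed) known total bit length.
import math
from typing import Dict, List, Optional, Sequence, Tuple


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _bit_log_post(llr: float, bit: str) -> float:
    """log P(bit | y), given llr = log P(0 | y) / P(1 | y)."""
    return -_softplus(-llr) if bit == "0" else -_softplus(llr)


def m_algorithm_decode(
    llr: Sequence[float],
    codebook: Dict[int, str],
    num_symbols: int,
    M: int = 30,
    log_prior: Optional[Dict[int, float]] = None,
) -> List[int]:
    """Decode num_symbols symbols occupying exactly len(llr) bits,
    keeping at most M partial paths after each symbol."""
    total = len(llr)
    min_len = min(len(w) for w in codebook.values())
    max_len = max(len(w) for w in codebook.values())
    paths: List[Tuple[float, int, List[int]]] = [(0.0, 0, [])]  # (metric, bits used, symbols)
    for step in range(num_symbols):
        remaining = num_symbols - step - 1
        extended = []
        for score, pos, syms in paths:
            for sym, word in codebook.items():
                left = total - (pos + len(word))
                # Known start and end: drop extensions that cannot finish exactly
                # at the last bit with the remaining number of symbols.
                if left < remaining * min_len or left > remaining * max_len:
                    continue
                inc = sum(_bit_log_post(llr[pos + j], c) for j, c in enumerate(word))
                if log_prior is not None:
                    inc += log_prior[sym]
                extended.append((score + inc, pos + len(word), syms + [sym]))
        if not extended:
            raise ValueError("no path fits the known length constraints")
        extended.sort(key=lambda p: p[0], reverse=True)
        paths = extended[:M]                      # keep the M best partial paths
    return paths[0][2]                            # every survivor ends at bit `total`
```

For instance, with the toy codebook {0: "0", 1: "10", 2: "11"} and the transmitted bits 100110, a weak error that turns the hard decision into 101110 (which parses as only three symbols) is corrected to [1, 0, 2, 0], because the symbol-count and length constraints rule out the hard-decision path.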
Following the scalefactor field is the pulse data field (305 in
TNS Decoding: The next field in the MPEG-4 AAC packet is the Temporal Noise Shaping (TNS) field, indicated by block 306 in
Each of these parameters is encoded into the bit stream as the binary representation of its value rather than by VLC coding. Hence, decoding the TNS field is more similar in spirit to decoding the ICS field than to decoding the scalefactor field. Concepts described in the previous embodiments, such as SYNC sequence insertion, parity bit insertion, or bit insertion for the purpose of increasing the distance among the allowed values of a particular parameter, as well as combinations thereof, are all applicable to the TNS field as well. Generally, the parameters that appear earlier in the TNS field, such as the number of filters, coefficient resolution, and filter order, are more important for successful decoding than the parameters that appear later, such as filter coefficients, because earlier parameters govern the parsing and interpretation of later ones. For example, an error in the first parameter (the number of filters) will lead to a wrong decoded length of the TNS field, whereas an error in a filter coefficient will not. Hence, the goal of SYNC sequence or bit insertion would be to achieve a desired performance-complexity trade-off by exploiting this structure and these properties of the TNS data field.
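The dependence of the TNS field length on its early parameters can be illustrated with the following Python sketch, which returns the number of bits consumed by TNS data for one LONG window. The field widths (2-bit number of filters, 1-bit coefficient resolution, 6-bit length, 5-bit order, 1-bit direction, 1-bit coefficient compression, and coefficients of 3 + coef_res - coef_compress bits) reflect our reading of the long-window syntax and should be verified against ISO/IEC 14496-3; the function name is hypothetical.

```python
# Sketch: bits consumed by TNS data for one LONG window (field widths as we
# read the long-window syntax; verify against ISO/IEC 14496-3).
from typing import Sequence


def tns_long_window_length(bits: Sequence[int]) -> int:
    pos = 0

    def read(n: int) -> int:
        nonlocal pos
        if pos + n > len(bits):
            raise ValueError("truncated TNS field")
        value = 0
        for b in bits[pos:pos + n]:
            value = (value << 1) | b
        pos += n
        return value

    n_filt = read(2)                       # number of filters
    if n_filt:
        coef_res = read(1)                 # coefficient resolution
        for _ in range(n_filt):
            read(6)                        # filter length (value not needed here)
            order = read(5)                # filter order
            if order:
                read(1)                    # direction
                coef_compress = read(1)
                coef_bits = 3 + coef_res - coef_compress
                for _ in range(order):
                    read(coef_bits)        # filter coefficients
    return pos
```

Flipping a coefficient bit leaves the parsed length unchanged, whereas flipping a bit of the number of filters makes the parser consume extra filters from the bits that follow, shifting the start of every subsequent field.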
Following the TNS data field is the gain control data field (307 in
Spectral Data Decoding: The next field in the MPEG-4 AAC packet is the spectral data field (308 in
Other codebooks, referred to as “unsigned” codebooks, code only the magnitudes of quantized spectral coefficients, while their signs are inserted into the bit stream separately, immediately following the codeword for the magnitudes. One can think of a codeword followed by its sign bits as an “extended” codeword that includes the sign information:
$$\text{extended\_codeword} = [\text{unsigned\_codeword}, \text{sign\_bits}] \tag{4}$$
Hence, although seemingly different, the decoding of spectral data encoded with unsigned Huffman codebooks is essentially the same as scalefactor decoding if one operates on the extended codewords defined above. The same concepts, such as alternating and “smart” SYNCs, described in the context of scalefactor decoding, are directly applicable to this case as well.
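A minimal Python sketch of building extended codewords from an unsigned codebook follows. It assumes, as in AAC, that a sign bit is sent only for each nonzero value, and it assumes the convention that a sign bit of 1 means negative. The two-dimensional toy codebook in the usage example is hypothetical and is not one of the AAC spectral codebooks.

```python
# Sketch: build "extended" codewords [unsigned_codeword, sign_bits] so that an
# unsigned Huffman codebook can be decoded like any other VLC.
from itertools import product
from typing import Dict, Sequence, Tuple

Magnitudes = Tuple[int, ...]


def extended_codeword(values: Sequence[int], unsigned_cb: Dict[Magnitudes, str]) -> str:
    """Codeword for the magnitudes followed by one sign bit per nonzero value
    (assumed convention: '1' = negative)."""
    word = unsigned_cb[tuple(abs(v) for v in values)]
    signs = "".join("1" if v < 0 else "0" for v in values if v != 0)
    return word + signs


def extended_codebook(unsigned_cb: Dict[Magnitudes, str]) -> Dict[Magnitudes, str]:
    """Enumerate every signed tuple and its extended codeword."""
    ext: Dict[Magnitudes, str] = {}
    for mags in unsigned_cb:
        nonzero = [i for i, m in enumerate(mags) if m != 0]
        for signs in product((1, -1), repeat=len(nonzero)):
            vals = list(mags)
            for i, s in zip(nonzero, signs):
                vals[i] = s * mags[i]
            ext[tuple(vals)] = extended_codeword(vals, unsigned_cb)
    return ext
```

With the hypothetical codebook {(0, 0): "0", (1, 0): "10", (0, 1): "110", (1, 1): "111"}, the extended codebook has nine entries and remains prefix-free, because the number of sign bits is determined by the magnitude codeword; it can therefore be passed directly to a VLC decoder such as the M-algorithm used for scalefactors.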
In MPEG-4 AAC coding of stereo audio, after the spectral data for the first channel, the data of the second audio channel are encoded, starting with the global gain and section data (309 in
The various features described in the foregoing are applicable to other source encoders, including other audio encoders, as well as image, video, and text encoders. As with the examples above, the goal of “smart” SYNCs and bit insertion would be to increase the distance among the permissible source sequences, utilizing the source structure to gain efficiency, which results in improved decoding performance.
While there have been shown and described various novel features of the invention as applied to particular embodiments thereof, it will be understood that various omissions and substitutions and changes in the form and details of the systems and methods described and illustrated may be made by those skilled in the art without departing from the spirit of the invention. Those skilled in the art will recognize, based on the above disclosure and an understanding of the teachings of the invention, that the particular methods, hardware and devices that are part of this invention, and the general functionality provided by and incorporated therein, may vary in different embodiments of the invention. Accordingly, the particular system components shown in
Number | Name | Date | Kind |
---|---|---|---|
5278844 | Murphy et al. | Jan 1994 | A |
5289476 | Johnson et al. | Feb 1994 | A |
5315583 | Murphy et al. | May 1994 | A |
5465396 | Hunsinger et al. | Nov 1995 | A |
5511099 | Ko et al. | Apr 1996 | A |
5517535 | Kroeger et al. | May 1996 | A |
5523726 | Kroeger et al. | Jun 1996 | A |
5559830 | Dapper et al. | Sep 1996 | A |
5566214 | Kroeger et al. | Oct 1996 | A |
5579345 | Kroeger et al. | Nov 1996 | A |
5588022 | Dapper et al. | Dec 1996 | A |
5606576 | Dapper et al. | Feb 1997 | A |
5633896 | Carlin et al. | May 1997 | A |
5646947 | Cooper et al. | Jul 1997 | A |
5703954 | Dapper et al. | Dec 1997 | A |
5745525 | Hunsinger et al. | Apr 1998 | A |
5757854 | Hunsinger et al. | May 1998 | A |
5764706 | Carlin et al. | Jun 1998 | A |
5809065 | Dapper et al. | Sep 1998 | A |
5828705 | Kroeger et al. | Oct 1998 | A |
5850415 | Hunsinger et al. | Dec 1998 | A |
5878089 | Dapper et al. | Mar 1999 | A |
5903598 | Hunsinger et al. | May 1999 | A |
5949813 | Hunsinger et al. | Sep 1999 | A |
5956373 | Goldston et al. | Sep 1999 | A |
5956624 | Hunsinger et al. | Sep 1999 | A |
6014407 | Hunsinger et al. | Jan 2000 | A |
6108810 | Kroeger et al. | Aug 2000 | A |
6128350 | Shastri et al. | Oct 2000 | A |
6148007 | Kroeger | Nov 2000 | A |
6178317 | Kroeger et al. | Jan 2001 | B1 |
6259893 | Kroeger et al. | Jul 2001 | B1 |
6292511 | Goldston et al. | Sep 2001 | B1 |
6292917 | Sinha et al. | Sep 2001 | B1 |
6295317 | Hartup et al. | Sep 2001 | B1 |
6301430 | Oguro et al. | Oct 2001 | B1 |
6317470 | Kroeger et al. | Nov 2001 | B1 |
6345377 | Kroeger et al. | Feb 2002 | B1 |
6353637 | Mansour et al. | Mar 2002 | B1 |
6366888 | Kroon et al. | Apr 2002 | B1 |
6400758 | Goldston et al. | Jun 2002 | B1 |
6405338 | Sinha et al. | Jun 2002 | B1 |
6430227 | Kroeger et al. | Aug 2002 | B1 |
6430401 | Lou et al. | Aug 2002 | B1 |
6452977 | Goldston et al. | Sep 2002 | B1 |
6480536 | Hartup et al. | Nov 2002 | B2 |
6487256 | Kroeger et al. | Nov 2002 | B2 |
6510175 | Hunsinger et al. | Jan 2003 | B1 |
6523147 | Kroeger et al. | Feb 2003 | B1 |
6532258 | Goldston et al. | Mar 2003 | B1 |
6539063 | Peyla et al. | Mar 2003 | B1 |
6549544 | Kroeger et al. | Apr 2003 | B1 |
6556639 | Goldston et al. | Apr 2003 | B1 |
6563880 | Hunsinger et al. | May 2003 | B1 |
6570943 | Goldston et al. | May 2003 | B2 |
6590944 | Kroeger | Jul 2003 | B1 |
6622008 | Kroeger et al. | Sep 2003 | B2 |
6639949 | Kroeger et al. | Oct 2003 | B2 |
6671340 | Kroeger et al. | Dec 2003 | B1 |
6892343 | Sayood et al. | May 2005 | B2 |
6982948 | Kroeger et al. | Jan 2006 | B2 |
7256718 | Nakagawa et al. | Aug 2007 | B2 |
7305056 | Kroeger | Dec 2007 | B2 |
7724850 | Kroeger et al. | May 2010 | B2 |
7796716 | Bhukania et al. | Sep 2010 | B2 |
20020026616 | Kikuchi et al. | Feb 2002 | A1 |
20020041570 | Ptasinski et al. | Apr 2002 | A1 |
20020075974 | Mill | Jun 2002 | A1 |
20040155802 | Lamy et al. | Aug 2004 | A1 |
20040165512 | Kim et al. | Aug 2004 | A1 |
20050175123 | Gurney et al. | Aug 2005 | A1 |
20050249266 | Brown et al. | Nov 2005 | A1 |
20050275573 | Raveendran | Dec 2005 | A1 |
20060268965 | Ibrahim et al. | Nov 2006 | A1 |
20070140375 | Jeanne et al. | Jun 2007 | A1 |
20070257786 | King et al. | Nov 2007 | A1 |
20080250294 | Ngo et al. | Oct 2008 | A1 |
20090274248 | Hepler et al. | Nov 2009 | A1 |
20090295607 | Au et al. | Dec 2009 | A1 |
20100077282 | Shen et al. | Mar 2010 | A1 |
20100272157 | Lakkis | Oct 2010 | A1 |
20110035647 | Eidson et al. | Feb 2011 | A1 |
20120060070 | Stark | Mar 2012 | A1 |
20120072802 | Chinnici et al. | Mar 2012 | A1 |
20120140769 | Hwang et al. | Jun 2012 | A1 |
Other References |
---|
ISO/IEC JTC1/SC29/WG11 14496-3, “MPEG-4: Information technology—Coding of audio-visual objects—Part 3: Audio,” 2009. [Online]. Available: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html. [Accessed 2011]. |
Y. Gao, “Audio coding standard overview: MPEG4-AAC, HE-AAC, and HE-AAC V2,” Chapter 21 in Mobile Multimedia Broadcasting Standards: Technology and Practice, F.-L. Luo (Ed.), Springer, 2009. |
J. Herre and H. Purnhagen, “General audio coding,” Chapter 11 in The MPEG-4 Book, F. Pereira and T. Ebrahimi (Eds.), Prentice-Hall, 2002. |
J.-S. Lee, J.-H. Jeong, and T.-G. Chang, “An efficient method of Huffman decoding for MPEG-2 AAC and its performance analysis,” IEEE Trans. Speech and Audio Processing, vol. 13, No. 6, pp. 1206-1209, Nov. 2005. |
P. Duhamel and M. Kieffer, Joint source-channel decoding, Academic Press, Jan. 2010, Chapters 5 and 8. |
R. Hu, X. Huang, M. Kieffer, O. Derrien and P. Duhamel, “Robust critical data recovery for MPEG-4 AAC encoded bitstreams,” in Proc. IEEE ICASSP, Dallas, TX, 2010. |
O. Derrien, M. Kieffer and P. Duhamel, “Joint source/channel decoding of scalefactors in MPEG-AAC encoded bitstreams,” in Proc. EUSIPCO, Lausanne, Switzerland, 2008. |
C. Marin, Y. Leprovost, M. Kieffer and P. Duhamel, “Robust MAC-lite and soft header recovery for packetized multimedia transmission,” IEEE Trans. Communications, vol. 58, No. 3, pp. 775-784, Mar. 2010. |
L. Cao, L. Yao and C. W. Chen, “MAP decoding of variable length codes with self-synchronization strings,” IEEE Trans. Signal Processing, vol. 55, No. 8, pp. 4325-4330, Aug. 2007. |
M. Park and D. J. Miller, “Joint source-channel decoding for variable-length encoded data by exact and approximate MAP sequence estimation,” IEEE Trans. Communications, vol. 48, No. 1, pp. 1-6, Jan. 2000. |
K. Sayood, H. H. Otu, and N. Demir, “Joint source/channel coding for variable length codes,” IEEE Trans. Communications, vol. 48, No. 5, pp. 787-794, May 2000. |
C. Guillemot and P. Siohan, “Joint source-channel decoding of variable-length codes with soft information: A survey,” EURASIP J. Appl. Signal Processing, vol. 6, pp. 906-927, 2005. |
J. B. Anderson and S. Mohan, Source and channel coding: An algorithmic approach, Kluwer Academic Publishers, 1991, Chapter 6.2. |
L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Information Theory, vol. 20, No. 2, pp. 284-287, Mar. 1974. |
J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Information Theory, vol. 42, No. 2, pp. 429-445, Mar. 1996. |
J. Hagenauer and C. Kuhn, “The list-sequential (LISS) algorithm and its application,” IEEE Trans. Communications, vol. 55, No. 5, pp. 918-928, May 2007. |
P. A. Regalia, “Iterative decoding of concatenated codes: A tutorial,” EURASIP J. Appl. Signal Processing, vol. 6, pp. 762-774, 2005. |
A. Hedayat and A. Nosratinia, “Iterative list decoding of concatenated source-channel codes,” EURASIP J. Appl. Signal Processing, vol. 6, pp. 954-960, 2005. |
K. K. Y. Wong and P. J. McLane, “Bi-directional soft-output M-algorithm for iterative decoding,” Proc. IEEE ICC, vol. 2, pp. 792-797, Paris, France, Jun. 2004. |
K. K. Y. Wong, The Soft-Output M-Algorithm and Its Applications, Ph.D. Thesis, Queen's University, Kingston, ON, Canada, Aug. 2006. |
Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration, International Search Report and Written Opinion of the International Searching Authority, dated May 13, 2014, PCT/US2013/072907. |
Number | Date | Country |
---|---|---|
20140153654 A1 | Jun 2014 | US |