There are many different audio and video compression formats, including, but not limited to, Windows Media Audio (WMA), Advanced Audio Coding (AAC), MP3, MPEG-2, and MPEG-4. In general, these formats use Huffman coding.
Efficient extraction of variable-length bit fields is critical for high-performance and ultra-low-power decoding of compressed audio and video content. Existing implementations use techniques that may be memory efficient at the cost of performance, or vice versa. Performance improvements are usually achieved by using large lookup tables that consume a large amount of memory.
For example, a Huffman-like compression format that relies on a two-bit lookup per stage may be highly inefficient as every decode has to go through multiple lookups to decode a symbol. The most frequent Huffman table used in such a compression format may have, for example, a worst-case codeword length of 21 bits. Consequently, a worst-case element goes through 11 lookups before the element can be resolved. In general, the average codeword length (ACL) for such a compression format would be around eight (8) bits. Thus, on average, the decode process needs to do between four (4) and five (5) two-bit lookups before an average field can be resolved, which is not very efficient. Furthermore, the tables are structured for the worst case, so that even frequently occurring elements have to go through a decision process, making the lookup process inherently inefficient.
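The lookup counts in the preceding example follow from dividing the codeword length by the number of bits resolved per lookup and rounding up. The following is a minimal sketch of that arithmetic only; the function name lookups_needed is hypothetical and is not part of any described implementation.

```python
import math

def lookups_needed(codeword_bits: int, bits_per_lookup: int = 2) -> int:
    """Number of fixed-width lookups needed to consume a whole codeword."""
    return math.ceil(codeword_bits / bits_per_lookup)

worst_case = lookups_needed(21)  # 21-bit worst-case codeword, 2 bits per stage
average = lookups_needed(8)      # codeword of the average length (8 bits)
```

Under this sketch, a 21-bit codeword needs 11 two-bit lookups and an 8-bit codeword needs 4. Codewords of 9 or 10 bits need 5, which is consistent with the figure of between four and five lookups for codeword lengths clustered around 8 bits.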
Also, after a Huffman index is resolved, conventional implementations perform another lookup to get the actual information associated with the index. The additional lookup table is relatively large, because it has to resolve all the indices for which there are valid Huffman codes. The table packs 4 bytes per element, as it has to encode all the worst possible run-length-last scenarios. This essentially doubles the table size requirement, without a commensurate improvement in performance. The additional lookup also increases by one the average number of lookups per codeword (ALPC). Thus, in the above example, ALPC is increased to between five (5) and six (6).
According to embodiments of the present invention, a sequence of bits in a data stream is accessed and used as an index to find an element in a primary lookup table. Not all of the bits in the sequence may be needed to find the element. If a designated leaf bit in that element is one, then the element is a terminating (leaf) element, and the set of bits used to index that element is resolved. Specifically, the combination of run, level, and length information included in the element is mapped to a decoded symbol. If, on the other hand, the designated leaf bit in the indexed element is zero, then the element is a non-terminating (non-leaf) element, and the information included in that element is used to locate a next (e.g., secondary) table that is accessed to advance the decoding process. Additional bits are accessed from the data stream and used to index an element in the secondary table. In a manner similar to that just described, the designated leaf bit of the indexed element in the secondary table is used to determine whether the element is a terminating element or a non-terminating element. If the element is a terminating element, then the combination of run, level, and length information included in the element is mapped to a decoded symbol; otherwise, the information in the element is used to locate a next (e.g., tertiary) table, and the process just described is repeated until it leads to a terminating element that can be mapped to a decoded symbol.
In general, embodiments according to the invention use different types of formats for lookup table elements, depending on whether the element is a terminating element or a non-terminating element, and depending on whether the lookup is the primary lookup or a secondary (or tertiary, etc.) lookup. The lengths of the table indexes and the table elements are chosen to minimize, or at least reduce, ALPC, which significantly improves performance. As well, the combination of information (e.g., run, level, etc.) packed into the table elements, as well as the lengths of the fields used to pack that information, are chosen such that ALPC is reduced/minimized to a range of 1-2. Furthermore, by using the table hierarchy in combination with the element formats just described, memory requirements are reduced. According to embodiments of the invention, one-fourth of the memory used by conventional implementations is needed. As such, the tables can be implemented in internal random access memory, which reduces power consumption and helps meet ultra-low-power audio decoding targets.
These and other objects and advantages of the various embodiments of the present invention will be recognized by those of ordinary skill in the art after reading the following detailed description of the embodiments that are illustrated in the various drawing figures.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the various embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with these embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be understood that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.
Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “indexing,” “retrieving,” “encoding,” “decoding,” “fetching,” “accessing,” “determining,” “examining,” or the like, refer to actions and processes (e.g., flowchart 500 of
Embodiments described herein may be discussed in the general context of computer-executable instructions residing on some form of computer-usable medium, such as program modules, executed by one or more computers or other devices. By way of example, and not limitation, computer-usable media may comprise computer storage media and communication media. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or distributed as desired in various embodiments.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, random access memory (RAM), read only memory (ROM), electrically erasable programmable ROM (EEPROM), flash memory or other memory technology, compact disk ROM (CD-ROM), digital versatile disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed to retrieve that information.
Communication media can embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared and other wireless media.
Compressed data streams such as WMA, AAC, MP3, MPEG-2, and MPEG-4 may be used in many applications including wireless data and voice transmission, audio and video recording and transmission, satellite communications, and so on. Encoded data streams can include variable length codewords and sign information, and each codeword can encode one or more symbols. Data streams may be processed in which sign information follows encoded symbols in an encoded data stream.
Referring to
In one example, an accelerator includes a micro-programmed sequencer 14 that executes a series of decoding steps. The sequencer 14 can generate fetch requests to receive encoded data from the encoded data stream 10, and can use fetched data to look up elements in decode tables 12. The sequencer 14 can be adapted to extract bit fields from the looked-up elements and to interpret information in the bit fields through combinations of bit testing, bit field comparisons, additional fetches, and additional table lookups. The sequencer 14 can return decoded data 16 including available sign information. The sequencer 14 can also manage a bit stream pointer that indicates a current position in the data stream from which data may be fetched.
The sequencer 14 can be implemented on one or more devices. In some embodiments, the sequencer 14 is programmable and can be configured to handle a plurality of data encoding schemes. In certain embodiments, configuration information 15 is provided by the processor 17 based on the data stream received. In some embodiments, the sequencer 14 can detect data stream format and select a configuration suitable to decode the data stream. In certain embodiments, decode tables can be provided and updated based on characteristics of the encoded data stream 10. In some embodiments, the processor 17 can be eliminated and the sequencer 14 can include logic and devices to identify or load decode tables and a sequencing algorithm appropriate for the detected data stream 10.
Efficient utilization of storage can be obtained by providing a hierarchy of decode tables 12 such that symbols having higher probabilities of occurrence are decoded in the higher levels of the hierarchy, so that higher probability symbols require fewer table lookups. In one embodiment, elements retrieved from the decode tables 12 are classified as either terminating elements or non-terminating elements. Terminating elements, also referred to as leaf elements or resolved elements, indicate that decoding of a codeword is completed/resolved. Non-terminating elements, also referred to as non-leaf elements or unresolved elements, indicate that additional table lookups are required. For example, in one embodiment, a particular field or flag in an element that is retrieved from the decode tables 12 can indicate whether the element is terminating or is non-terminating. In general, a terminating element includes information that maps or resolves to a decoded symbol, while a non-terminating element provides information such as an offset value that identifies the location of a table in the next level of the hierarchy of decode tables 12. Sign information can be retrieved from the data stream 10 after a terminating condition is encountered.
A data stream 10 includes a sequence of bits B0, B1, . . . , Bn. A portion of the sequence (a set of contiguous bits) is fetched from the data stream 10 and used as an index 22 for indexing the primary table 23. In one embodiment, the index 22 consists of 8 bits. Not all of the bits in the index 22 may be needed to index (locate an entry in) the primary table 23. If, for example, 5 of the 8 bits can be mapped to an entry in the primary table 23, then the three (3) bits that are not used are included in the next sequence of 8 bits that are used as the next index; that is, the 3 bits that are not needed in the first index are combined with the next 5 bits in the data stream 10 to form the next index of 8 bits. If not all 8 bits are needed in the next lookup, then the unused bits from that lookup are used as part of the next lookup, and so on.
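The reuse of unused index bits just described can be sketched with a bit-stream pointer that is advanced only by the number of bits actually consumed. The following is a minimal illustration, not the described implementation: the class name BitReader, its methods, the most-significant-bit-first ordering, and the zero padding past the end of the stream are all assumptions made for the example.

```python
class BitReader:
    """Hypothetical bit-stream reader; bits are taken MSB-first (an assumption)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit-stream pointer, counted in bits

    def peek(self, n: int) -> int:
        """Return the next n bits without advancing; pad with zeros past the end."""
        value = 0
        for i in range(n):
            bit_index = self.pos + i
            byte_index = bit_index >> 3
            if byte_index < len(self.data):
                bit = (self.data[byte_index] >> (7 - (bit_index & 7))) & 1
            else:
                bit = 0
            value = (value << 1) | bit
        return value

    def consume(self, n: int) -> None:
        """Advance the pointer by the number of bits actually used."""
        self.pos += n


reader = BitReader(bytes([0b10110011, 0b01011100]))
first_index = reader.peek(8)   # 8-bit index for the first primary lookup
reader.consume(5)              # suppose only 5 of the 8 bits were needed
second_index = reader.peek(8)  # 3 leftover bits followed by the next 5 bits
```

In this example the first index is 10110011. After 5 bits are consumed, the second index is 01101011: the 3 unused bits (011) followed by the next 5 bits of the stream (01011).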
In general, the primary table 23 includes a number of indexed entries or elements (see
As noted above, a terminating element includes information that maps or resolves to a decoded symbol, while a non-terminating element provides information such as an offset value that identifies the location of a table in the next level of the hierarchy of decode tables. Location may be identified as a start address relative to the start or end of the primary table 23 or relative to the non-terminating element that contained the offset value. Location may also be identified using other means known in the art and can, for example, be identified using an index or an absolute address that is referenced by the offset value.
Continuing with reference to
In the example of
If the first set of additional bits 30 indexes a terminating element in the secondary table 25B, then the information in that element can be resolved to a respective decoded symbol 40. If the first set of additional bits 30 indexes a non-terminating element in the secondary table 25B, then the information in that element (specifically, offset 26) can be used to identify/locate one of the tertiary tables (e.g., tertiary table 27B). In that case, a second set of additional bits 32 is also fetched from the data stream 10 and used to index an element or entry in the selected tertiary table.
Some tertiary tables include both terminating elements (Format 1, Format 3 or Format 4) and non-terminating elements (Format 2); other tertiary tables include only terminating elements (Format 1, Format 3 or Format 4). In general, to decode a codeword or symbol, a process such as that just described advances from the primary table through the hierarchy of decode tables until a terminating element is reached, allowing the fetched sequence of bits to be decoded and resolved.
In one embodiment, a constant number of bits are fetched from the data stream 10 to form the set of additional bits 30, the set of additional bits 32, and any other set of additional bits used to index tables below the tertiary tables. In one such embodiment, the constant number of bits is two (2). Thus, the number of bits used to index lower level tables (that is, those tables other than the primary table 23) is known in advance, and so it is not necessary to store that value. Also, because 2 bits are used, each of the lower level tables includes only 4 elements or entries.
The hierarchical table structure in the example of
More specifically, codewords identifying the highest probability symbols can be decoded using the highest level table (e.g., the primary table 23). The secondary tables 25 can be used to decode lower probability symbols (measured relative to those decoded using the primary table), and the tertiary tables 27 (and other tables lower in the hierarchy) can be used to decode even lower probability symbols. Thus, higher probability symbols can be decoded using fewer lookups than lower probability symbols. According to embodiments of the invention, the average length codeword can be resolved in a single lookup.
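The effect of resolving high-probability symbols in the primary table can be illustrated by computing ALPC as a probability-weighted average of the number of lookups. The following is a sketch only; the function name average_lookups is hypothetical, and the probabilities are invented for illustration and are not measurements from any real data stream.

```python
def average_lookups(distribution):
    """ALPC as the sum over table levels of (probability * lookups at that level)."""
    return sum(probability * lookups for probability, lookups in distribution)

# Hypothetical split of codewords across the table hierarchy (illustrative only):
# (fraction of codewords, lookups needed to resolve them)
example_distribution = [
    (0.90, 1),  # resolved in the primary table
    (0.08, 2),  # resolved in a secondary table
    (0.02, 3),  # resolved in a tertiary table
]
alpc = average_lookups(example_distribution)
```

With these assumed fractions the result is approximately 1.12 lookups per codeword, which falls within a range of 1 to 2.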
In one embodiment, in the primary table 23 (
In one embodiment, in the tables other than the primary table 23 (e.g., in the secondary tables 25A and 25B and the tertiary tables 27A and 27B), each of the indexes 302 is 2 bits in length, and each of the table elements 304 is up to 16 bits in length. As mentioned above and as described further below, the table elements 304 in the secondary, tertiary, etc., tables can be packed in Format 1, Format 2, Format 3, or Format 4.
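The storage implied by these index and element lengths can be estimated directly, because a table indexed by n bits holds 2^n elements. The following is a rough sketch that assumes every element occupies the full 16 bits; the function names and the count of 40 lower-level tables are hypothetical and chosen only to make the arithmetic concrete.

```python
def table_bytes(index_bits: int, element_bits: int = 16) -> int:
    """Bytes occupied by one table with 2**index_bits elements."""
    return (1 << index_bits) * element_bits // 8

def total_bytes(num_lower_tables: int) -> int:
    """One 8-bit-indexed primary table plus 2-bit-indexed lower-level tables."""
    return table_bytes(8) + num_lower_tables * table_bytes(2)

primary_size = table_bytes(8)  # 256 elements of 2 bytes each
lower_size = table_bytes(2)    # 4 elements of 2 bytes each
example_total = total_bytes(40)  # 40 lower-level tables is an assumed count
```

Under these assumptions the primary table occupies 512 bytes, each lower-level table occupies 8 bytes, and the example hierarchy occupies 832 bytes in total.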
With reference back to the example of
In general, Format 1 is used for decoded symbols that are resolved in the first lookup (that is, in a single lookup). In Format 1, a table element 304 (
In Format 1 of
In Format 1, the value of the length field (Len) identifies the number of bits in the index 22 (
In Format 1, the length field is packed in “length+1” format so that the value of zero (0) can be used; that is, the actual length is the stored value plus one. There will not be a case in which the length is 0, so reserving a code for that case would waste part of the field's range. Instead, a binary value of [0 0 0] is used to indicate a length of 1, [0 0 1] is used to indicate a length of 2, and so on.
In Format 1, the leaf field (L) is a flag used to indicate that the entry is a terminating element. For example, a value of 1 can be used to indicate that the entry is a terminating (leaf) element.
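The packing of the length field can be sketched as a pair of helper functions. This is a minimal illustration under stated assumptions: the 3-bit field width is inferred from the three-digit binary values given above and from the 8-bit index, and the names pack_length and unpack_length are hypothetical.

```python
LEN_FIELD_BITS = 3  # assumed width: enough for lengths 1 through 8

def pack_length(length: int) -> int:
    """Store a length of 1..8 as 0..7, so that no code is spent on length 0."""
    if not 1 <= length <= (1 << LEN_FIELD_BITS):
        raise ValueError("length must be between 1 and 8")
    return length - 1

def unpack_length(field: int) -> int:
    """Recover the actual length: the stored value plus one."""
    return (field & ((1 << LEN_FIELD_BITS) - 1)) + 1
```

For example, pack_length(1) yields binary 000 and pack_length(2) yields 001, while unpack_length(0b111) recovers a length of 8, so a full 8-bit index still fits in the 3-bit field.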
Format 1 can be used for secondary lookups as well as for decoded symbols that are resolved in a single lookup, by using a combination of run, level, length, EOB, and leaf fields that maintains a 16-bit table element.
In Format 2, a table element 304 (
curr_ptr=curr_ptr+offset;
where “curr_ptr” denotes the location of the current table. Location can be identified in various ways, as previously discussed herein. In Format 2, a value of 0 is used in the leaf field to indicate that additional lookups are needed to advance decoding.
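The pointer update above can be sketched for the case in which all tables are stored in one flat array and a location is an element position within that array. That storage arrangement is only one of the ways of identifying a location mentioned herein and is an assumption of this example, as are the function names and the offset value of 256.

```python
def follow_offset(curr_ptr: int, offset: int) -> int:
    """Apply curr_ptr = curr_ptr + offset to reach the next table."""
    return curr_ptr + offset

def element_location(curr_ptr: int, index: int) -> int:
    """Position of the indexed element within the table that starts at curr_ptr."""
    return curr_ptr + index

curr_ptr = 0                                 # primary table assumed to start at 0
curr_ptr = follow_offset(curr_ptr, 256)      # hypothetical offset from a Format 2 element
location = element_location(curr_ptr, 0b10)  # index formed from the next 2 bits
```

Here the next table begins immediately after a 256-element primary table, and the 2-bit index 10 selects the element at position 258.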
Thus, in Format 1, the leaf field has a value of 1 but, in Format 2, the leaf field has a value of 0. Thus, the format used for a particular element in the primary table 23 (
Formats 3 and 4 are used for the elements resolved in secondary, tertiary, etc., lookups. In Format 3 and in Format 4, the data is packed into the following fields: level, run, length, and leaf, where those fields are defined as described above.
The primary difference between Formats 3 and 4 is in the sizes of the level and run fields. A Huffman-like compression scheme may utilize different tables to encode symbols—Format 3 represents a format that can be used with one of the Huffman tables, and Format 4 represents a format that can be used with another of the Huffman tables. Means known in the art and external to the decoding tables described herein can be used to identify which Huffman table was used during encoding. Although only Format 3 and Format 4 are described herein, embodiments described herein are not so limited—there can be additional formats, depending on how many different tables are used in the encoding scheme. The primary difference between Format 3 and Format 4 is a 1 bit selection per block—this can be extended to N additional formats for N different tables.
In Formats 3 and 4, in one embodiment, the length field (Ln) needs only 1 bit because, as mentioned above, all secondary lookups use only 2 bits. Therefore, only a single bit is needed to resolve a bit length of 1 or 2 (a bit length of 0 is not possible). Also, because the number of bits to be fetched for each of the secondary lookups is known in advance (all the secondary lookups use 2 bits at a time), it is not necessary to store that value.
Furthermore, Formats 3 and 4 do not include an EOB marker because all EOB markers are less than or equal to 8 bits in length. Thus, EOB markers are resolved in the first lookup, and EOB markers do not need to be flagged in the secondary, tertiary, etc. lookups.
In Formats 1, 2, 3, and 4, information is packed using, at most, 2 bytes (16 bits) per index (per element). By using the table hierarchy of
In general, in operation, a sequence of bits (index 22 of
In block 502, with reference also to
In block 506 of
In block 508 of
In block 510 of
In block 514, the leaf bit (bit L of
If the leaf bit in the entry in the secondary table indicates that a symbol can be retrieved, then the flowchart 500 proceeds to block 522; otherwise, the flowchart proceeds to block 516 to continue decoding.
In block 516 of
In block 518 of
As previously described herein, more than three levels of tables can be provided, so that the process described by flowchart 500 can be extended to include lower level tables. Ultimately, in block 522, a terminating element (Format 1, 3, or 4) is reached, and the combination of level, run, and length information included in that element can be used to retrieve a symbol 40 (
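The overall flow (index the primary table with 8 bits, test the leaf bit, and follow offsets into 2-bit-indexed tables until a terminating element is reached) can be sketched as follows. This is an illustration under loudly stated assumptions, not the described implementation: the element layout (leaf flag in bit 0, length minus one in bits 1 to 3, a payload in the upper bits standing in for the run, level, and EOB fields), the helper names, and the toy unary-style code are all hypothetical. Sign handling is omitted.

```python
PRIMARY_BITS = 8    # index width for the primary lookup
SECONDARY_BITS = 2  # index width for every lower-level lookup

def leaf(payload: int, length: int) -> int:
    """Terminating element (assumed layout): payload | (length - 1) | leaf bit = 1."""
    return (payload << 4) | ((length - 1) << 1) | 1

def branch(offset: int) -> int:
    """Non-terminating element (assumed layout): offset with leaf bit = 0."""
    return offset << 1

def peek(bits: str, pos: int, n: int) -> int:
    """Next n bits of a '0'/'1' string as an integer, zero-padded past the end."""
    return int(bits[pos:pos + n].ljust(n, "0"), 2)

def decode_one(tables, bits, pos):
    """Decode one codeword; return (payload, new bit position, lookups used)."""
    curr_ptr, width, lookups = 0, PRIMARY_BITS, 0
    while True:
        element = tables[curr_ptr + peek(bits, pos, width)]
        lookups += 1
        if element & 1:                      # leaf bit set: terminating element
            length = ((element >> 1) & 0b111) + 1
            return element >> 4, pos + length, lookups
        pos += width                         # all index bits were consumed
        curr_ptr += element >> 1             # curr_ptr = curr_ptr + offset
        width = SECONDARY_BITS

def build_toy_tables():
    """Toy code: symbol k (0..7) is k ones then a zero; longer codes branch."""
    tables = [0] * (256 + 4)
    for index in range(255):
        ones = 0
        while (index >> (7 - ones)) & 1:
            ones += 1
        tables[index] = leaf(ones, ones + 1)
    tables[255] = branch(256)                # eight ones: go to the secondary table
    tables[256] = tables[257] = leaf(8, 1)   # suffix "0"
    tables[258] = leaf(9, 2)                 # suffix "10"
    tables[259] = leaf(10, 2)                # suffix "11"
    return tables

tables = build_toy_tables()
stream = "0" + "110" + "1111111110" + "111111110"
pos, symbols, lookup_counts = 0, [], []
while pos < len(stream):
    payload, pos, used = decode_one(tables, stream, pos)
    symbols.append(payload)
    lookup_counts.append(used)
```

For this toy stream, the two short codewords are each resolved in a single primary lookup, and the two codewords longer than 8 bits each require one additional 2-bit lookup. In an actual embodiment the payload would be the run, level, and EOB information packed in the formats described above.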
In summary, embodiments according to the invention use different types of formats for lookup table elements, depending on whether the element is a terminating element or a non-terminating element, and depending on whether the lookup is the primary lookup or a secondary (or tertiary, etc.) lookup. The lengths of the table indexes and the table elements are chosen to minimize, or at least reduce, ALPC, significantly improving performance. As well, the chosen combination of information (e.g., run, level, etc.) packed into the table elements, as well as the lengths of the fields used to pack that information, reduces/minimizes ALPC. In addition, memory requirements and power consumption are reduced.
Embodiments according to the present invention are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.