This invention relates to lookup tables, and in particular to an improved method and apparatus for storing and retrieving information in lookup tables.
A lookup table can be considered as a data structure, usually in the form of an array, which is used to replace a run-time computation with a simpler lookup operation. Typically, the addresses are sequential from zero to the maximum size needed, and each address is associated with a data value. For example, a lookup table may be provided for trigonometric computations. Thus, rather than calculating the cosine of a different angle each time that information is required by an application program operating on a computer, the application can reference a lookup table in which the addresses correspond to various angles, and the data in the array provides the cosine for that angle.
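The cosine example above can be sketched in a few lines. This is a minimal illustration only; the whole-degree step size and the names `COS_TABLE` and `cos_lookup` are assumptions made for the example and do not come from the description.

```python
import math

# Assumed resolution: one table entry per whole degree, addresses 0..359.
COS_TABLE = [math.cos(math.radians(deg)) for deg in range(360)]


def cos_lookup(angle_deg: int) -> float:
    """Return the cosine of a whole-degree angle by indexing the table.

    The address is the angle itself (wrapped into 0..359), so the
    run-time cost is one array access instead of a cosine computation.
    """
    return COS_TABLE[angle_deg % 360]
```

The table is computed once; every later request for a cosine is a single indexed read.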
The invention described here relates to a technique for speeding up the retrieval of video-related data from a lookup table.
In one embodiment a system for providing data from a lookup table includes a video processing engine which provides a pixel address as an output signal. A leading zero detector receives the pixel address as an input signal and provides a corresponding address in response. A lookup table is then accessed based upon the corresponding address, which in return provides the necessary data to the video processing engine.
In another embodiment of the invention, a method for performing faster table lookup operations in a computing system where the lookup table contains redundant information includes the steps of creating a lookup table having a series of values associated with addresses, some of the values being used more than once in the table, and organizing the table so that all entries having the same value have the same number of leading zeros in their addresses. When a value is to be retrieved, the number of leading zeros in its address is used as an index into the table. The value retrieved at that index is then used in an image processing operation, or other desired operation.
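The method can be sketched in software as follows. This is a hedged illustration, not the claimed implementation: the 9-bit address width (giving 512 addresses and ten possible leading-zero counts, 0 through 9), the placeholder values in `WEIGHTS`, and all function names are assumptions chosen for the example.

```python
ADDR_BITS = 9  # assumed address width: 2**9 = 512 addresses

# Placeholder data values, indexed by leading-zero count 0..9.
# These numbers are illustrative only.
WEIGHTS = [20, 18, 16, 14, 12, 10, 8, 6, 4, 2]


def leading_zeros(addr: int, width: int = ADDR_BITS) -> int:
    """Count the leading zeros of addr viewed as a width-bit number."""
    if not 0 <= addr < (1 << width):
        raise ValueError(f"address {addr} does not fit in {width} bits")
    return width - addr.bit_length()


# A redundant full-size table: every address with the same leading-zero
# count holds the same value.
FULL_TABLE = [WEIGHTS[leading_zeros(a)] for a in range(1 << ADDR_BITS)]


def compress(full_table, width: int = ADDR_BITS):
    """Build the small table, checking the leading-zero property holds."""
    small = [None] * (width + 1)
    for addr, value in enumerate(full_table):
        lz = leading_zeros(addr, width)
        if small[lz] is None:
            small[lz] = value
        elif small[lz] != value:
            raise ValueError(f"address {addr} breaks the leading-zero rule")
    return small


def lookup(small_table, addr: int, width: int = ADDR_BITS):
    """Retrieve a value using the leading-zero count as the index."""
    return small_table[leading_zeros(addr, width)]
```

Under these assumptions, `compress(FULL_TABLE)` yields a ten-entry table, and `lookup` on it returns the same value as indexing `FULL_TABLE` directly for every one of the 512 addresses.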
In commercially significant implementations of video compression systems, such as the Moving Picture Experts Group (MPEG)-4 Part 10 standard, also known as H.264, the table may have as many as 512 addresses, but only ten different data values. In implementations such as this, as the original table becomes larger, the number of unique data values may grow slowly while the size of the table grows almost exponentially.
Data can be loaded into the weight table or the binary encoder via a data entry module 20. The data value is provided to the weight table 12 and the 4 bit weight table address for that data is provided to the binary encoder. The implementation of
Numerous circuit implementations for lookup tables are well known. Implementations of a leading zero detector and a binary encoder are also well known; however, logic equations for such implementations, such as for
Using the same naming conventions as in Table 1, the logic equations for the binary encoder are shown in Table 2 below. In Table 2, the vertical line represents the logic operation “OR.”
The system described above for improving table lookup has been described in terms of a hardware implementation. Software implementations may also be employed. In a software implementation, large table lookups often suffer from a problem known as cache line victimization. A typical cache line in most computers is on the order of 16 to 64 bytes. As the motion estimation engine moves around the search window, the cache will eventually fill with the table data, e.g., 512 entries, meaning on the order of 1000 bytes. This is a large amount of data for a cache memory, particularly when there are really only ten unique data entries, occupying on the order of 20 bytes. This results in victimizing good data in the cache and causing significant bus traffic, in addition to slowing cache access.
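The cache footprint can be worked through with simple arithmetic. The sketch below assumes 2 bytes per table entry (inferred from ten unique entries occupying about 20 bytes) and tables aligned to a cache line boundary; the function name `cache_lines` is hypothetical.

```python
ENTRY_BYTES = 2        # assumption: ten entries occupy about 20 bytes
FULL_ENTRIES = 512     # size of the original, redundant table
UNIQUE_ENTRIES = 10    # number of distinct data values


def cache_lines(table_bytes: int, line_bytes: int) -> int:
    """Number of cache lines an aligned table of this size occupies."""
    return -(-table_bytes // line_bytes)  # ceiling division


full_bytes = FULL_ENTRIES * ENTRY_BYTES      # 1024 bytes
small_bytes = UNIQUE_ENTRIES * ENTRY_BYTES   # 20 bytes

# Footprint at both ends of the 16 to 64 byte cache line range.
footprint = {
    line: (cache_lines(full_bytes, line), cache_lines(small_bytes, line))
    for line in (16, 64)
}
```

With these assumptions the full table spans 64 lines of 16 bytes or 16 lines of 64 bytes, while the ten unique entries span 2 lines or 1 line respectively, which is the saving the leading-zero organization is meant to capture.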
Furthermore, many digital signal processors have small fast memories in addition to their cache memories. These smaller memories are usually on the order of 4 kilobytes in size. The table of this invention fits into this fast memory; however, the large size of the table poses the same problem as with the cache memory, that is, a systematic reduction in the size of the cache in exchange for 20 bytes of unique data.
The invention described herein illustrates how to dramatically reduce the cache line victimization and data memory problems. In particular, by employing an instruction that counts the number of leading zeros, an address may be used to retrieve the appropriate data value. If desired, the leading zero count can be subtracted from a pointer address to get the data value from one of the ten unique data entries. For example, if the lookup table of
The data in the table can be rearranged to allow for different accesses or different constants to be used with the top of table pointer.
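The pointer arithmetic described above can be modeled with a flat array standing in for data memory. This is a sketch under stated assumptions: the 9-bit address width, the placeholder values, the memory size, and the locations `TOP_OF_TABLE` and `BASE_OF_TABLE` are all hypothetical. It shows one layout in which the leading zero count is subtracted from a top of table pointer, and a rearranged layout in which the count is added to a base pointer instead.

```python
ADDR_BITS = 9  # assumed address width (512 addresses)

# Placeholder unique values, indexed by leading-zero count 0..9.
UNIQUE_VALUES = [20, 18, 16, 14, 12, 10, 8, 6, 4, 2]


def leading_zeros(addr: int, width: int = ADDR_BITS) -> int:
    """Software stand-in for a count-leading-zeros instruction."""
    if not 0 <= addr < (1 << width):
        raise ValueError(f"address {addr} does not fit in {width} bits")
    return width - addr.bit_length()


MEMORY = [0] * 64  # toy model of data memory

# Layout 1: entries stored downward from a top-of-table pointer, so the
# count is subtracted from the pointer.
TOP_OF_TABLE = 40  # hypothetical location
for lz, value in enumerate(UNIQUE_VALUES):
    MEMORY[TOP_OF_TABLE - lz] = value


def fetch_from_top(addr: int) -> int:
    return MEMORY[TOP_OF_TABLE - leading_zeros(addr)]


# Layout 2: the same data rearranged upward from a base pointer, so the
# count is added instead.
BASE_OF_TABLE = 50  # hypothetical location
for lz, value in enumerate(UNIQUE_VALUES):
    MEMORY[BASE_OF_TABLE + lz] = value


def fetch_from_base(addr: int) -> int:
    return MEMORY[BASE_OF_TABLE + leading_zeros(addr)]
```

Both layouts return the same value for any address; which one is preferable would depend on the addressing modes the target processor offers.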
Although a preferred embodiment has been described, various changes and substitutions are intended in the present invention. In some instances, features of the invention can be employed without a corresponding use of other features, without departing from the scope of the invention as set forth below. Therefore, many modifications may be made to adapt a particular configuration or method disclosed, without departing from the essential scope and spirit of the present invention. It is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments and equivalents falling within the scope of the claims.