Claims
- 1. A method for automated generation of an abbreviated instruction set comprising the steps of:
generating a compiled and optimized code segment, and dynamic execution profile information for an application; providing the original instruction set, the compiled and optimized code segment, a set of application specific requirements, and the dynamic execution profile information as inputs to an analysis and abbreviation tool; and automatically producing an application specific high density representation of the original instructions used in the code segment and a decoding table utilizing the analysis and abbreviation tool.
- 2. The method of claim 1 further comprising the step of:
decoding the application specific high density representation of the original instructions in a digital signal processor system utilizing the decoding table and an instruction decoder.
- 3. The method of claim 1 wherein said step of automatically producing further comprises the step of:
partitioning an original code segment consisting of instructions from the original instruction set into groups of instructions based on a selected strategy defining the group where each group has one or more patterns of bits that do not change and patterns of bits that do change associated with it.
- 4. The method of claim 3 wherein said step of partitioning further comprises the step of:
defining each pattern in each group of instructions by at least two bit vectors, a mix mask and a bit group.
- 5. The method of claim 3 wherein said selected strategy defining the group comprises all instructions having the same operation code.
- 6. The method of claim 3 wherein said selected strategy defining the group comprises all instructions which execute on the same functional execution unit.
- 7. The method of claim 6 wherein the functional execution units comprise a store unit, a load unit, an arithmetic logic unit (ALU), a multiply accumulate unit (MAU), and a data select unit (DSU).
- 8. The method of claim 3 wherein said selected strategy defining the group comprises all instructions having a defined common characteristic.
- 9. The method of claim 8 wherein the defined common characteristic is frequency of instruction usage in the code segment.
- 10. The method of claim 3 wherein said selected strategy defining the group comprises a combination of other group selection strategies.
- 11. The method of claim 3 wherein the step of partitioning further comprises the step of selecting the best strategy defining the group.
- 12. The method of claim 3 wherein the step of partitioning further comprises the steps of:
defining a sequential number for each group specifying where that group is among other groups; and defining a static frequency of a particular instruction in the application under consideration.
- 13. The method of claim 1 wherein said decoding table comprises a translation table storing bit patterns in an x/y dictionary.
- 14. The method of claim 13 further comprising the step of loading the translation table into a translation memory.
- 15. The method of claim 14 further comprising the step of:
replacing each instruction of the original instruction set used in the code segment with an abbreviated instruction containing a pair of indices that provide addressing information for accessing the translation memory.
- 16. A sequential implementation shuffler comprising:
a register for holding an original uncompressed instruction; a mix mask register; an x bit group register; a y bit group register; and a multiplexer, wherein the multiplexer passes one bit from the x bit group register to said register for holding if the current most significant bit of the mix mask register is equal to one, and one bit from the y bit group register otherwise, and at the end of the cycle, all registers are shifted left by one and the process is repeated.
- 17. A parallel implementation shuffler comprising:
a register for holding an original uncompressed instruction; a mix mask register; an x bit group register; a y bit group register; a plurality of parallel connected multiplexers; and a priority decoder for controlling the plurality of parallel connected multiplexers.
- 18. A parallel implementation shuffler comprising:
a multiplexer tree logic; and a priority decoder.
- 19. Apparatus for automated generation of an abbreviated instruction set comprising:
means for generating a compiled and optimized code segment, and dynamic execution profile information for an application; means for providing the original instruction set, the compiled and optimized code segment, a set of application specific requirements, and the dynamic execution profile information as inputs to an analysis and abbreviation tool; and means for automatically producing an application specific high density representation of the original instructions used in the code segment and a decoding table utilizing the analysis and abbreviation tool.
- 20. The apparatus of claim 19 further comprising:
means for decoding the application specific high density representation of the original instructions in a digital signal processor system utilizing the decoding table and an instruction decoder.
- 21. The apparatus of claim 19 wherein said means for automatically producing further comprises:
means for partitioning an original code segment consisting of instructions from the original instruction set into groups of instructions based on a selected strategy defining the group where each group has one or more patterns of bits that do not change and patterns of bits that do change associated with it.
- 22. The apparatus of claim 21 wherein said means for partitioning further comprises:
means for defining each pattern in each group of instructions by at least two bit vectors, a mix mask and a bit group.
- 23. The apparatus of claim 21 wherein said selected strategy defining the group comprises all instructions having the same operation code.
- 24. The apparatus of claim 21 wherein said selected strategy defining the group comprises all instructions which execute on the same functional execution unit.
- 25. The apparatus of claim 24 wherein the functional execution units comprise a store unit, a load unit, an arithmetic logic unit (ALU), a multiply accumulate unit (MAU), and a data select unit (DSU).
- 26. The apparatus of claim 21 wherein said selected strategy defining the group comprises all instructions having a defined common characteristic.
- 27. The apparatus of claim 26 wherein the defined common characteristic is frequency of instruction usage in the code segment.
- 28. The apparatus of claim 21 wherein said selected strategy defining the group comprises a combination of other group selection strategies.
- 29. The apparatus of claim 21 wherein the partitioning means further comprises means for selecting the best strategy defining the group.
- 30. The apparatus of claim 21 wherein the means for partitioning further comprises:
means for defining a sequential number for each group specifying where that group is among other groups; and means for defining a static frequency of a particular instruction in the application under consideration.
- 31. The apparatus of claim 19 wherein said decoding table comprises a translation table storing bit patterns in an x/y dictionary.
- 32. The apparatus of claim 31 further comprising:
means for loading the translation table into a translation memory.
- 33. The apparatus of claim 32 further comprising means for replacing each instruction of the original instruction set used in the code segment with an abbreviated instruction containing a pair of indices that provide addressing information for accessing the translation memory.
- 34. A translation memory (TM) system comprising:
a byte addressable X-TM portion; a byte addressable Y-TM portion; an X-TM K-by-K byte-wide switch; a Y-TM K-by-K byte-wide switch; an X-TM switch output latch; and a Y-TM switch output latch, wherein overlapping X entries are stored in the X-TM to minimize the X-TM size and overlapping Y entries are stored in the Y-TM to minimize the Y-TM size.
- 35. A translation memory (TM) system comprising:
a double ported read accessible byte addressable memory with one of the two read access ports associated with the X translation table entries and the other read access port associated with the Y translation table entries; an X-TM port K-by-K byte-wide switch; a Y-TM port K-by-K byte-wide switch; an X-TM port switch output latch; and a Y-TM port switch output latch, wherein overlapping X entries are stored in the X-TM read port accessible portion to minimize the X-TM size and overlapping Y entries are stored in the Y-TM read port accessible portion to minimize the Y-TM size.
- 36. The TM system of claim 34 where K is 4 for a 32-bit, 4-byte-wide original instruction.
- 37. The TM system of claim 35 where K is 4 for a 32-bit, 4-byte-wide original instruction.
- 38. The TM system of claim 35 where uncompressed instructions can be stored in the memory.
- 39. A translation memory (TM) system comprising:
a K-by-J abbreviated instruction memory for storing X-TM and Y-TM indices; an X-TM base address register; an X-TM address adder for adding an X-TM base address register value with an X-TM index creating an X-TM address; a Y-TM base address register; a Y-TM address adder for adding a Y-TM base address register value with a Y-TM index creating a Y-TM address; an X-TM for holding X translation table entries; a Y-TM for holding Y translation table entries; and a shuffler for combining an X-TM addressed entry with a Y-TM addressed entry to recreate an original instruction in a form suitable for execution on a core processor.
- 40. A method for automated generation of an abbreviated instruction set comprising the steps of:
generating a compiled and optimized code segment; generating dynamic execution profile information for the code segment; partitioning the instructions used in the code segment according to a selected grouping strategy; analyzing each group, using binary analysis, to determine the fixed pattern of bits versus the pattern of bits which change within the selected group; determining the X and Y translation table contents; determining the X and Y translation memory contents; verifying branch target addresses and recalculating these addresses according to the requirements of the abbreviated code segment; and determining whether a fitness factor has been met and, if not, repeating the method steps, beginning with repartitioning the instructions used in the code segment according to a new grouping strategy, until the fitness factor is met, thereby creating the abbreviated code segment and decoder tables.
- 41. The method of claim 40 wherein the determining of the X and Y translation memory contents is optimized by a dynamic analysis of the code segment using the generated translation tables and their associated X-TM and Y-TM base address register reload requirements for the purpose of minimizing the effect of the X-TM and Y-TM reload operations on the performance and size of the code segment.
- 42. The method of claim 40 wherein the TM contents are optimized by, first, grouping instructions into threads according to their execution frequencies and assigning indices sequentially in each thread in descending weight order and, second, for a fixed number of iterations, swapping the indices that cause TM base address register reloading with the highest weight with indices of a lower weight.
- 43. Apparatus for automated generation of an abbreviated instruction set comprising:
means for generating a compiled and optimized code segment, and dynamic execution profile information for an application; means for providing the original variable width instruction set, the compiled and optimized code segment using variable width instructions, a set of application specific requirements, and the dynamic execution profile information as inputs to an analysis and abbreviation tool; and means for automatically producing an application specific high density representation of the original variable width instructions used in the code segment and a decoding table utilizing the analysis and abbreviation tool.
- 44. The apparatus of claim 43 further comprising:
means for decoding the application specific high density representation of the original variable width instructions in a digital signal processor system utilizing the decoding table and an instruction decoder.
- 45. The apparatus of claim 43 wherein said means for automatically producing further comprises:
means for partitioning an original code segment consisting of variable width instructions from the original instruction set into groups of instructions based on a selected strategy defining the group where each group has one or more patterns of bits that do not change and patterns of bits that do change associated with it.
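
For illustration only, the mix mask, bit group, shuffler, and index-pair elements recited above can be sketched in software. The sketch below is not part of the claims and is not the claimed implementation. It assumes a 32-bit instruction, assumes that a mix mask bit of one marks a position that varies within a group (supplied by the x bit group) and a zero marks a fixed position (supplied by the y bit group), and assumes that each bit group is stored compacted, so that a bit group is advanced only when a bit is taken from it. The translation memories are modeled as plain lists rather than byte addressable memories. All function and parameter names are hypothetical.

```python
WIDTH = 32  # assumed instruction width in bits


def varying_mask(group):
    """Return a mask with ones where instructions in `group` differ."""
    first = group[0]
    mask = 0
    for instr in group:
        mask |= instr ^ first  # any differing bit position becomes one
    return mask


def split(instr, mix_mask, width=WIDTH):
    """Split an instruction into compacted (x, y) bit groups, MSB first.

    Assumption: x collects bits where the mix mask is one, y the rest.
    """
    x = y = 0
    for pos in range(width - 1, -1, -1):
        bit = (instr >> pos) & 1
        if (mix_mask >> pos) & 1:
            x = (x << 1) | bit
        else:
            y = (y << 1) | bit
    return x, y


def shuffle(mix_mask, x, y, width=WIDTH):
    """Rebuild an instruction one bit per step, MSB first.

    A software analogue of a sequential shuffler: each step selects a
    bit from x when the current mix mask bit is one, otherwise from y.
    """
    nx = bin(mix_mask & ((1 << width) - 1)).count("1")  # bits held in x
    ny = width - nx                                     # bits held in y
    instr = 0
    for pos in range(width - 1, -1, -1):
        if (mix_mask >> pos) & 1:
            nx -= 1
            bit = (x >> nx) & 1
        else:
            ny -= 1
            bit = (y >> ny) & 1
        instr = (instr << 1) | bit
    return instr


def decode(abbrev, x_tm, y_tm, x_base, y_base, mix_mask):
    """Expand an abbreviated instruction given as a pair of indices.

    Assumption: each index is added to a base address to select one
    entry from a list standing in for the X-TM or Y-TM.
    """
    x_index, y_index = abbrev
    return shuffle(mix_mask, x_tm[x_base + x_index], y_tm[y_base + y_index])


if __name__ == "__main__":
    group = [0x12340001, 0x12340002, 0x12340003]
    mask = varying_mask(group)
    for instr in group:
        x, y = split(instr, mask)
        assert shuffle(mask, x, y) == instr  # round trip restores the word
```

In this sketch the three example instructions share a single y entry and differ only in a two-bit x entry, which is the kind of redundancy an x/y dictionary is meant to exploit.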
Parent Case Info
[0001] The present application claims the benefit of U.S. Provisional Application Serial No. 60/283,582 filed Apr. 13, 2001, entitled “Methods and Apparatus for Automated Generation of Abbreviated Instruction Set and Configurable Processor Architecture,” which is incorporated by reference herein in its entirety.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60283582 | Apr 2001 | US |