This application claims the benefit of China application Serial No. 202211295072.4, filed on Oct. 21, 2022, the subject matter of which is incorporated herein by reference.
The present invention generally relates to instruction compression and instruction decompression, and, more particularly, to instruction compression and instruction decompression related to jump instructions (also referred to as branch instructions).
Generally speaking, a process (e.g., an image processing process, a booting process, etc.) usually includes at least one jump instruction. However, existing platforms cannot apply variable-length instruction compression when processing jump logic, which leads to the need for a larger instruction register, an increase in the number of long jump instructions, and a decrease in process execution efficiency. Therefore, there is a need for an instruction compression method, an instruction decompression method, and a process compression method that reduce the space required by the instruction register and the number of long jump instructions.
In view of the issues of the prior art, an object of the present invention is to provide an instruction compression method, an instruction decompression method, and a process compression method, so as to make an improvement to the prior art.
According to one aspect of the present invention, an instruction decompression method is provided. The instruction decompression method is applied to a hardware circuit that decompresses an instruction and executes the instruction. The instruction includes a header, and the header includes a reference value. The instruction decompression method includes the following steps: reading a first parameter of the instruction to obtain a total number of mismatched parameters when the reference value of the instruction is a preset value; and using a plurality of second parameters of the instruction to set a plurality of corresponding parameters of the hardware circuit, wherein the number of the second parameters is equal to the total number of mismatched parameters.
According to another aspect of the present invention, an instruction compression method is provided. The instruction compression method is for compressing an instruction to generate a compressed instruction. The instruction includes a header and a plurality of parameters, and the header includes a reference value. The instruction compression method includes the following steps: comparing the instruction with a previous instruction to find a plurality of mismatched parameters between the instruction and the previous instruction; setting the reference value of the compressed instruction to a preset value; setting a target parameter of the compressed instruction to the number of mismatched parameters; and setting other parameters of the compressed instruction to the mismatched parameters.
According to still another aspect of the present invention, a process compression method is provided. The process compression method is for compressing a process that includes a jump instruction. The process compression method includes the following steps: dividing the process into a plurality of blocks according to a position of the jump instruction in the process and a destination of the jump instruction; storing a jump relationship between the blocks; performing instruction compression on the blocks; recalculating a jump address of the jump instruction according to the jump relationship; determining a plurality of groups according to sizes of the blocks and the jump relationship; and determining whether the jump instruction is a jump instruction of a first type or a jump instruction of a second type according to a relationship between the jump instruction and the groups.
The technical means embodied in the embodiments of the present invention can solve at least one of the problems of the prior art. Therefore, compared with the prior art, the present invention can reduce the space required by the instruction register and/or the number of long jump instructions.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiments with reference to the various figures and drawings.
The following description is written by referring to terms of this technical field. If any term is defined in this specification, such term should be interpreted accordingly. In addition, the connection between objects or events in the below-described embodiments can be direct or indirect provided that these embodiments are practicable under such connection. Said “indirect” means that an intermediate object or a physical space exists between the objects, or an intermediate event or a time interval exists between the events.
The disclosure herein includes an instruction compression method, an instruction decompression method, and a process compression method. Because some or all elements of the Intelligent Processing Unit (IPU) may be known, the details of such elements are omitted to the extent that they have little to do with the features of this disclosure, and such omission does not fail to satisfy the specification and enablement requirements. Some or all of the processes of the instruction compression method, the instruction decompression method, and the process compression method may be implemented by software and/or firmware. A person having ordinary skill in the art can choose components or steps equivalent to those described in this specification to carry out the present invention, which means that the scope of this invention is not limited to the embodiments in the specification.
The decoder 110 includes a memory 112 (e.g., a Static Random Access Memory (SRAM)), an instruction prefetch circuit 114, an instruction delivery circuit 116, and a jump logic circuit 118. The memory 112 can store instructions to be executed by the IPU 100. The instruction prefetch circuit 114 is used to fetch the instruction from the memory 112, and then the instruction delivery circuit 116 delivers the instruction to the corresponding hardware circuit (i.e., the DMA 120, the vector circuit 130, or the convolution circuit 140) according to the flag InstFlag of the instruction.
In some embodiments, if the difference between the destination of a jump instruction (i.e., the address of the target instruction in the memory 112) and the address of the jump instruction itself in the memory 112 is less than a threshold (e.g., the size of the instruction buffer of the memory 112), then the jump instruction is a short jump instruction; otherwise, the jump instruction is a long jump instruction. That is to say, the jump range of a short jump instruction is smaller than the jump range of a long jump instruction. When processing long jump instructions, the decoder 110 needs the DMA 120 to fetch more instructions from the external memory of the IPU 100 (e.g., a Dynamic Random Access Memory (DRAM), not shown); there is no such need for short jump instructions. Therefore, long jump instructions consume more time and system resources than short jump instructions.
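The short/long classification described above may be sketched as follows (a minimal Python illustration; the function name and the use of the absolute address difference are illustrative assumptions, not part of the disclosure):

```python
# Hedged sketch: classify a jump by comparing the distance between the jump
# instruction's own address and its destination address against a threshold
# (e.g., the size of the instruction buffer of the memory 112).
def classify_jump(jump_addr: int, dest_addr: int, threshold: int) -> str:
    """Return 'short' when the jump range is below the threshold, else 'long'."""
    return "short" if abs(dest_addr - jump_addr) < threshold else "long"
```

In this sketch a long jump is any jump whose range reaches or exceeds the buffer size, which is why it would require fetching additional instructions from external memory.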
Step S210: Dividing the process into multiple blocks according to the position(s) of the jump instruction(s) in a process (e.g., the address(es) of the jump instruction(s) in the memory 112) and the destination(s) of the jump instruction(s) (e.g., the address(es) of the destination(s) in the memory 112). More specifically, step S210 scans the instructions in a process and sets at least one block boundary BB to divide the process into multiple blocks. Step S210 is discussed in detail below.
Step S220: Storing the inter-block jump relationship (i.e., a jump relationship between blocks).
Step S230: Performing instruction compression on the blocks block by block to obtain multiple compressed blocks. This step is discussed in detail below.
Step S240: Recalculating the jump address(es) (i.e., the destination address of the jump instruction) according to the inter-block jump relationship. Because the blocks have been compressed in step S230, the address of the destination of the compressed jump instruction is no longer the original address before compression, so the jump address must be recalculated or updated.
Step S250: Determining groups according to the block sizes and the jump relationship. The purpose of this step is to divide the blocks into groups. Step S250 is discussed in detail below.
Step S260: Determining whether the jump instruction is the jump instruction of the first type (e.g., the short jump instruction) or the jump instruction of the second type (e.g., the long jump instruction) according to the relationship between the jump instruction and the groups. By grouping the blocks into multiple groups, the present invention can more accurately separate short jump instructions from long jump instructions to avoid errors during the execution of the process. Step S260 is discussed in detail below.
Step S510: Reading an instruction.
Step S520: Determining whether the instruction is a jump instruction or the destination of a jump instruction. If NO, then the flow proceeds to step S510 to read the next instruction; if YES, the flow proceeds to step S530.
Step S530: Setting a block boundary BB to determine a block. More specifically, if the instruction is a jump instruction, step S530 sets a block boundary BB after the instruction (e.g., between the instruction INST5 and the instruction INST6 and between the instruction INST11 and the instruction INST12); if the instruction is the destination of a jump instruction, step S530 sets a block boundary BB before the instruction.
Step S540: Determining whether there is still an instruction to be processed in the process. If YES, then the flow proceeds to step S510 to read the next instruction; if NO, the flow proceeds to step S550.
Step S550: Setting the block boundary BB before finishing.
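The block-division flow of steps S510 through S550 might be sketched as follows (a minimal Python illustration; the dict-based instruction representation and the field names `is_jump` and `dest` are assumptions for illustration, not part of the disclosure):

```python
# Hedged sketch of steps S510-S550: scan a process and set block boundaries
# after each jump instruction and before each jump destination.
def divide_into_blocks(process):
    """process: list of instruction dicts. Returns a list of blocks."""
    # Gather destination addresses of all jump instructions (cf. step S520).
    dests = {inst["dest"] for inst in process if inst.get("is_jump")}
    blocks, current = [], []
    for addr, inst in enumerate(process):
        # A destination starts a new block: boundary BB before it (step S530).
        if addr in dests and current:
            blocks.append(current)
            current = []
        current.append(inst)
        # A jump ends the current block: boundary BB after it (step S530).
        if inst.get("is_jump"):
            blocks.append(current)
            current = []
    if current:                 # final boundary before finishing (step S550)
        blocks.append(current)
    return blocks
```

For example, a six-instruction process whose third instruction jumps to the sixth would be divided into three blocks of sizes 3, 2, and 1.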
Step S610: Reading an instruction of a block.
Step S620: Determining whether the instruction is the first instruction of the block. If YES, then the flow proceeds to step S610 to read the next instruction of the block; if NO, the flow proceeds to step S630. Instruction compression is not performed on the first instruction of a block because there is no previous instruction to be used as a reference.
Step S630: Comparing the instruction with the previous instruction to find out the mismatched parameter(s) between the instruction and the previous instruction.
Step S640: Setting the reference value HDLen of the header HD of the compressed instruction to the preset value to mark the compressed instruction.
Step S650: Setting the first parameter of the compressed instruction to the total number Nd of mismatched parameter(s).
Step S660: Setting other parameters of the compressed instruction to the mismatched parameter(s). This step sets the second to xth (x=1+Nd) parameter(s) of the compressed instruction INST_k′ to the mismatched parameter(s) obtained in step S630.
Step S670: Determining whether there is still an instruction to be processed in the block. If YES, the flow proceeds to step S610 to read the next instruction of the block; if NO, the method ends.
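The compression flow of steps S610 through S670 might be sketched as follows. This is a hedged illustration only: the `PRESET` constant, the dict layout, and in particular the choice to store each mismatch as an (index, value) pair are assumptions; the disclosure itself states only that the mismatched parameters are stored, leaving the exact encoding to the implementation.

```python
# Hedged sketch of steps S610-S670: compress every instruction of a block
# except the first by keeping only the parameters that differ from the
# previous instruction. PRESET is an illustrative header marker value.
PRESET = 0xF

def compress_block(block):
    """block: list of {'header': int, 'params': [int, ...]} instructions."""
    out = [block[0]]                    # first instruction stays uncompressed
    for prev, inst in zip(block, block[1:]):
        # Step S630: find the mismatched parameters against the previous one.
        diff = [(i, p) for i, (p, q) in
                enumerate(zip(inst["params"], prev["params"])) if p != q]
        # Steps S640-S660: header = PRESET, first parameter = count Nd,
        # remaining parameters = (index, value) of each mismatch.
        out.append({"header": PRESET,
                    "params": [len(diff)] + [v for pair in diff for v in pair]})
    return out
```

An instruction identical to its predecessor thus compresses to a header plus a single parameter Nd = 0.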
Step S810: Selecting a block and updating the size of the current group according to the size of the block. The size of a group is the sum of the sizes of all the blocks contained in that group. This step updates the size of the current group to the sum of the size of the current group and the size of the block.
Step S820: Determining whether the size of the current group is greater than a threshold. In some embodiments, the threshold may be the size of the instruction buffer of the memory 112. If the result of step S820 is NO, then the flow proceeds to step S830; otherwise, the flow proceeds to step S840 and step S850.
Step S830: Making the block a part of the current group. The result of step S820 being negative means that adding the selected block to the current group does not make the current group too large (greater than the threshold); thus, step S830 makes the block a part of the current group. Continuing the above example, if the sum of the size of the block BLK1 and the size of the block BLK2 does not exceed the threshold SR, this step sets the block BLK2 to be in the same group as the block BLK1.
Step S840: Making the block a part of a new group. The result of step S820 being positive means that adding the selected block to the current group makes the current group too large (greater than the threshold); thus, step S840 determines the current group (i.e., sets the group boundary GB), and then makes the selected block a part of the new group (the new group contains only this block at this time).
Step S850: Setting the size of the new group to the size of the block. Continuing the above example, since the group GRP2 contains only the block BLK3 at this time, the size of the new group is the size of the block BLK3. Note that the new group becomes the current group in the next iteration (i.e., when step S810 is performed again).
Step S860: Determining whether there is still a block to be processed. If YES, the flow proceeds to step S810 to select the next block; if NO, the flow proceeds to step S870.
Step S870: Setting a group boundary before finishing.
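The grouping flow of steps S810 through S870 might be sketched as follows (a minimal Python illustration; representing blocks by their integer sizes and the function name are assumptions for illustration):

```python
# Hedged sketch of steps S810-S870: accumulate consecutive blocks into groups
# so that no group exceeds the threshold (e.g., the instruction buffer size).
def group_blocks(sizes, threshold):
    """sizes: list of block sizes. Returns a list of groups of block indices."""
    groups, current, current_size = [], [], 0
    for idx, size in enumerate(sizes):
        # Steps S810/S820: would adding this block exceed the threshold?
        if current and current_size + size > threshold:
            groups.append(current)                  # step S840: close group
            current, current_size = [idx], size     # step S850: new group
        else:
            current.append(idx)                     # step S830: keep in group
            current_size += size
    if current:
        groups.append(current)                      # step S870: final boundary
    return groups
```

With block sizes [3, 3, 5, 2] and a threshold of 7, for instance, the first two blocks form one group and the last two form another.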
Step S910: Selecting a group.
Step S920: Selecting a block of the group.
Step S930: Determining whether the following conditions are satisfied: the block is not the first block of the group, and the block is the destination of a jump instruction of another group.
Step S940: Determining whether there is still a block to be processed in the group. If YES, the flow proceeds to step S920 to select the next block of the group; otherwise, the flow proceeds to step S960.
Step S950: Setting the group boundary GB, that is, dividing the current group into two groups.
Step S960: Determining whether there is still a group to be processed. If YES, the flow proceeds to step S910 to select the next group; if NO, the method ends.
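The group-splitting flow of steps S910 through S960 might be sketched as follows. This is a hedged illustration: representing each group as a list of block identifiers and passing in `ext_dests` (the set of blocks targeted by jumps from other groups) as a precomputed input are assumptions for illustration.

```python
# Hedged sketch of steps S910-S960: split a group before any block that is
# (a) not the first block of the group and (b) the destination of a jump
# instruction from another group (step S930), setting a boundary GB (S950).
def split_groups(groups, ext_dests):
    """groups: lists of block ids; ext_dests: block ids jumped to externally."""
    result = []
    for group in groups:                 # step S910: select a group
        current = [group[0]]             # the first block never triggers S930
        for block in group[1:]:          # step S920: select the next block
            if block in ext_dests:
                result.append(current)   # step S950: set group boundary GB
                current = [block]
            else:
                current.append(block)
        result.append(current)
    return result
```

Splitting at externally targeted blocks keeps every jump destination at a group head, which supports the short/long classification of the next stage.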
Step S1110: Selecting a jump instruction.
Step S1120: Determining whether the destination of the jump instruction is in the group to which the jump instruction belongs.
Step S1130: Making the jump instruction a long jump instruction.
Step S1140: Making the jump instruction a short jump instruction.
Step S1150: Determining whether there is still a jump instruction to be processed. If YES, the flow proceeds to step S1110 to select the next jump instruction; if NO, the method ends.
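The classification flow of steps S1110 through S1150 might be sketched as follows (a minimal Python illustration; modeling each jump as a source/destination block pair and the `block_to_group` mapping are assumptions for illustration):

```python
# Hedged sketch of steps S1110-S1150: a jump whose destination lies in the
# same group as the jump itself is a short jump (step S1140); otherwise it
# is a long jump (step S1130).
def classify_jumps(jumps, block_to_group):
    """jumps: list of (src_block, dest_block). Returns 'short'/'long' labels."""
    labels = []
    for src, dest in jumps:              # steps S1110/S1150: iterate jumps
        if block_to_group[src] == block_to_group[dest]:   # step S1120
            labels.append("short")       # step S1140
        else:
            labels.append("long")        # step S1130
    return labels
```

Because each group fits the instruction buffer by construction, an intra-group jump never requires the DMA 120 to fetch instructions from external memory.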
Step S1310: Reading or receiving an instruction, for example, reading an instruction from the memory, or receiving an instruction delivered by the instruction delivery circuit 116.
Step S1320: Determining whether the reference value HDLen of the header HD of the instruction is a preset value. If NO (meaning that the instruction is not a compressed instruction), the flow proceeds to step S1330; if YES (meaning that the instruction is a compressed instruction), the flow proceeds to step S1340 and step S1350.
Step S1330: Using all the parameters of the instruction to set corresponding parameters of the hardware circuit (e.g., setting the register values of registers).
Step S1340: Reading the first parameter of the instruction to obtain the total number Nd of mismatched parameter(s).
Step S1350: Using the second to (Nd+1)th parameters of the instruction to set the corresponding parameters of the hardware circuit (e.g., setting the register values of the registers).
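The decompression flow of steps S1310 through S1350 might be sketched as follows. This hedged illustration mirrors the compression sketch given earlier: the `PRESET` marker, the dict layout, the list of hardware parameters, and the (index, value) pair encoding of mismatches are assumptions for illustration, not part of the disclosure.

```python
# Hedged sketch of steps S1310-S1350: apply an (optionally compressed)
# instruction to the hardware circuit's parameter set.
PRESET = 0xF

def decompress(inst, hw_params):
    """inst: {'header': int, 'params': [...]}; hw_params mutated in place."""
    if inst["header"] != PRESET:
        # Step S1330: an uncompressed instruction sets every parameter.
        hw_params[:] = inst["params"]
    else:
        # Step S1340: the first parameter is the mismatch count Nd.
        nd = inst["params"][0]
        # Step S1350: the following parameters update only the mismatches,
        # stored here as (index, value) pairs (an encoding assumption).
        pairs = inst["params"][1:1 + 2 * nd]
        for i in range(0, len(pairs), 2):
            hw_params[pairs[i]] = pairs[i + 1]
    return hw_params
```

Only the mismatched hardware parameters are rewritten for a compressed instruction, which is what allows the compressed encoding to omit the unchanged ones.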
To sum up, by compressing the instructions, the present invention reduces the space that the instruction register requires (i.e., reduces the size of the memory 112) to save cost. In addition, because the instructions are compressed, the present invention can also reduce the number of long jump instructions to improve the execution efficiency of the process.
The variable-length instructions and IPUs are intended to illustrate the invention by way of examples, rather than to limit the scope of the claimed invention. People having ordinary skill in the art may apply the present invention to other types of instructions and control circuits according to the discussions made above.
The aforementioned descriptions represent merely the preferred embodiments of the present invention, without any intention to limit the scope of the present invention thereto. Various equivalent changes, alterations, or modifications based on the claims of the present invention are all consequently viewed as being embraced by the scope of the present invention.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 202211295072.4 | Oct 2022 | CN | national |

| Number | Date | Country |
| --- | --- | --- |
| 20240134649 A1 | Apr 2024 | US |