Method for superscalar delay optimization

Information

  • Patent Application
  • Publication Number
    20240378059
  • Date Filed
    July 13, 2023
  • Date Published
    November 14, 2024
  • Inventors
    • Xu; Xiuquan
Abstract
Disclosed is a method for superscalar delay optimization, wherein an issue queue is divided into three queues: a ready queue, a wait 1 queue, and a wait 2 queue. Compared with the prior art, each queue is one-third the length of the original issue queue, so that the delay of scanning the entire issue queue from beginning to end in each clock cycle is reduced to one-third of the original.
Description
1. TECHNICAL FIELD

The invention relates to the field of superscalar technology, in particular to a method for superscalar delay optimization.


2. BACKGROUND ART

In a superscalar issue queue, each issue operation scans the queue from beginning to end to find the commands that are ready to issue. When the issue queue is long, as with a centralized issue queue (CIQ), several executable commands must be selected from a queue holding a very large number of commands. The resulting delay is large enough to lengthen the cycle time, which lowers the clock frequency and execution efficiency of the processor. The delay is proportional to the capacity of the issue queue: the more commands the queue can hold, the greater the delay.


In traditional superscalar techniques, scanning the queue from beginning to end for the commands to be issued takes O(n) time in the worst case, and wake-up follows the scan. Wake-up therefore cannot begin until the O(n) scan of the entire issue queue has completed within the cycle.
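
As a rough illustration of this baseline, the following Python sketch models a unified issue queue as a list and counts how many entries a front-to-back scan examines before `issue_width` ready commands are found. The `Command` class, the `src_ready` field, and the function name are hypothetical and do not come from the application; real hardware performs this selection in logic rather than in a loop.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Command:
    # Hypothetical record: a name plus one readiness flag per source register.
    name: str
    src_ready: Tuple[bool, bool]


def scan_unified_queue(queue: List[Command], issue_width: int):
    """Scan front to back; return (selected commands, entries examined)."""
    selected = []
    examined = 0
    for cmd in queue:
        if len(selected) == issue_width:
            break  # enough commands found, stop scanning
        examined += 1
        if all(cmd.src_ready):
            selected.append(cmd)
    return selected, examined
```

When the only ready command sits at the tail of the queue, the scan examines every entry, which corresponds to the O(n) worst case described above.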


3. SUMMARY OF THE INVENTION

The technical problem to be solved by the invention is to reduce the delay that traditional superscalar technology incurs in scanning the entire issue queue from beginning to end in each clock cycle to one-third of the original.


To solve the above technical problem, the invention provides a method for superscalar delay optimization in which the issue queue is divided into three queues: a ready queue, a wait 1 queue, and a wait 2 queue. Each queue is one-third the length of the original issue queue, so that the delay of scanning the issue queue from beginning to end in each clock cycle is reduced to one-third of the original. The three queues hold commands as follows:

    • 1) ready queue: the data of both source registers is ready, and the command can be issued directly;
    • 2) wait 1 queue: the data of exactly one source register is not ready;
    • 3) wait 2 queue: the data of neither source register is ready.


The three queues have three command entry and exit modes:

    • 1) the first issue-width positions at the front of the ready queue serve as the issue ports, from which commands are sent to the execution unit;
    • 2) the wait 1 queue is scanned; if allow 1 is 1, the wait 1 queue is allowed to issue, and any command in it whose source registers are both ready moves into the ready queue;
    • 3) the wait 2 queue is scanned; if allow 2 is 1, the wait 2 queue is allowed to issue, and any command in it with one or both source registers ready moves into the wait 1 queue.
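
One cycle of the three modes can be sketched in Python as follows. This is a minimal model under stated assumptions, not the claimed implementation: commands are hypothetical `(name, src_a, src_b)` tuples, `is_ready` is an assumed callback that reports register readiness, `allow1` and `allow2` are treated as simple gate flags because the text does not define them further, and queue capacity limits are ignored. All three selections are computed from the start-of-cycle state before any command moves, to mimic the modes acting at the same time.

```python
from collections import deque


def step(ready, wait1, wait2, is_ready, issue_width, allow1=1, allow2=1):
    """Advance the three queues by one cycle and return the issued commands."""
    # Decide all three modes from the state at the start of the cycle.
    n_issue = min(issue_width, len(ready))
    to_ready = [c for c in wait1
                if allow1 and is_ready(c[1]) and is_ready(c[2])]
    to_wait1 = [c for c in wait2
                if allow2 and (is_ready(c[1]) or is_ready(c[2]))]

    # Mode 1: the front positions of the ready queue issue to execution.
    issued = [ready.popleft() for _ in range(n_issue)]
    # Mode 2: wait 1 commands with both sources ready enter the ready queue.
    for c in to_ready:
        wait1.remove(c)
        ready.append(c)
    # Mode 3: wait 2 commands with at least one source ready enter wait 1.
    for c in to_wait1:
        wait2.remove(c)
        wait1.append(c)
    return issued
```

In this model a wait 2 command whose sources both become ready still enters wait 1 first, as the text states, and needs a further cycle to reach the ready queue.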


Compared with the prior art, the invention has the following advantage: because the issue queue is divided into a ready queue, a wait 1 queue, and a wait 2 queue, each one-third the length of the original issue queue, the delay of scanning the issue queue from beginning to end in each clock cycle is reduced to one-third of the original. Further, the three command entry and exit modes operate in parallel.


Further, the first issue-width positions at the front of the ready queue are set as the issue ports: the first position is the port for the first command to issue, the second position is the port for the second command, and so on up to the issue-width position. The commands to be issued and executed can therefore be found in O(1) time within one cycle, so that wake-up can follow after a very small constant delay.
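
The fixed mapping between ports and queue positions can be pictured with a short Python sketch. The function name and the use of `None` for an unoccupied port are illustrative assumptions. The point is only that port `i` reads slot `i` directly, so the work done depends on the issue width and not on the queue length.

```python
def read_issue_ports(ready_queue, issue_width):
    """Return what each issue port sees; port i is wired to slot i."""
    # No readiness check and no search: every entry in the ready queue
    # is already issuable, so only the first issue_width slots are read.
    return [ready_queue[i] if i < len(ready_queue) else None
            for i in range(issue_width)]
```

For example, with an issue width of 2 the ports read the same two front slots whether the ready queue holds 3 commands or 300.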


Further, the three command entry and exit modes apply to any command type, provided that each command has at most two source registers and one destination register. This covers RISC instructions, ARM instructions, micro-instructions, and any other instruction type that meets the condition.

4. SPECIFIC EMBODIMENT OF THE INVENTION

A specific implementation of the invention involves the following parameters:

    • 1) machine width: the length of the machine queue, that is, the maximum number of commands that the processor can decode and rename at one time;
    • 2) issue width: the maximum number of commands that can be issued from the issue queue and executed in one cycle.


The queues in this solution can store any command type, provided that each command has at most two source registers and one destination register. This covers RISC instructions, ARM instructions, micro-instructions, and any other instruction type that meets the condition.
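
The operand condition can be expressed as a small validated record. The following Python sketch is illustrative only: the class name `QueueCommand` and its fields are hypothetical, and the single optional `dest` field is simply one way to model "at most one destination register".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueueCommand:
    # Hypothetical queue entry; register names are plain strings here.
    opcode: str
    sources: Tuple[str, ...] = ()
    dest: Optional[str] = None  # a single field allows at most one destination

    def __post_init__(self):
        # Reject commands that fall outside the stated condition.
        if len(self.sources) > 2:
            raise ValueError("a command may have at most two source registers")
```

A command such as `QueueCommand("add", ("r1", "r2"), "r3")` is accepted, while one with three source registers raises `ValueError`.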


In one embodiment of the invention, the core principle of the solution is as follows: the issue queue is divided into three queues, namely a ready queue, a wait 1 queue, and a wait 2 queue, each one-third the length of the original issue queue, so that the delay of scanning the issue queue from beginning to end in each clock cycle is reduced to one-third of the original.


In detail, the issue queue includes:

    • 1) ready queue: the data of both source registers is ready, and the command can be issued directly;
    • 2) wait 1 queue: the data of exactly one source register is not ready;
    • 3) wait 2 queue: the data of neither source register is ready.
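
The classification above amounts to counting the source registers that are not yet ready. A minimal Python sketch follows; the function name and the string labels are illustrative, and the sketch assumes that a command with fewer than two source registers simply supplies fewer flags, so that an absent operand counts as ready.

```python
def classify(src_ready):
    """Pick a queue from the readiness flags of up to two source registers."""
    not_ready = sum(1 for flag in src_ready if not flag)
    # 0 pending sources -> ready, 1 -> wait 1, 2 -> wait 2
    return ("ready", "wait1", "wait2")[not_ready]
```

Under these assumptions, `classify((True, False))` selects the wait 1 queue and `classify((False, False))` selects the wait 2 queue.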


In one embodiment of the invention, the three queues have three command entry and exit modes:

    • 1) the first issue-width positions at the front of the ready queue serve as the issue ports, from which commands are sent to the execution unit;
    • 2) the wait 1 queue is scanned; if allow 1 is 1, the wait 1 queue is allowed to issue, and any command in it whose source registers are both ready moves into the ready queue;
    • 3) the wait 2 queue is scanned; if allow 2 is 1, the wait 2 queue is allowed to issue, and any command in it with one or both source registers ready moves into the wait 1 queue.


The three processes above run in parallel, so the queue-scan delay is reduced to one-third of the original.
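
The one-third figure follows from a simple linear model, sketched here in Python. The model is an assumption for illustration: scan delay is taken to be proportional to the number of entries, as the background section states, the cost of moving commands between queues is ignored, and the function names and the 96-entry example are hypothetical.

```python
def scan_delay(entries, delay_per_entry=1.0):
    """Delay of one front-to-back scan under a linear model."""
    return entries * delay_per_entry


def split_scan_delay(total_entries, delay_per_entry=1.0):
    """Per-cycle scan delay when three equal queues are scanned in parallel."""
    sub_entries = total_entries // 3  # each queue is one-third of the original
    # Parallel scans finish when the slowest of them finishes.
    return max(scan_delay(sub_entries, delay_per_entry) for _ in range(3))
```

With a 96-entry issue queue, the unified scan costs 96 units under this model, while the three 32-entry queues scanned in parallel cost 32 units.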


In one embodiment of the invention, the working principle of the solution is as follows: the first issue-width positions at the front of the ready queue are set as the issue ports, where the first position is the port for the first command to issue, the second position is the port for the second command, and so on up to the issue-width position. The commands to be issued and executed can therefore be found in O(1) time within one cycle, so that wake-up can follow after a very small constant delay.


The invention and the embodiments thereof are described hereinabove, and this description is not restrictive. What is shown in the drawings is only one of the embodiments of the invention, and the actual structure is not limited thereto. All in all, structural methods and embodiments similar to the technical solution without deviating from the purpose of the invention made by those of ordinary skill in the art without creative design shall all fall within the protection scope of the invention. The protection scope of the invention is defined by the appended claims and the equivalents thereof.

Claims
  • 1. A method for superscalar delay optimization, wherein an issue queue is divided into three types, namely, ready queue, wait 1 queue, and wait 2 queue; the length of each type of queue is one-third of the issue queue, so that the delay of scanning the entire issue queue from beginning to end per clock cycle is reduced to one-third of the original; in the issue queue, the details are: ready queue: the data of both source registers is ready and can be issued directly; wait 1 queue: the data of only one source register is not ready; wait 2 queue: the data of both source registers is not ready; in the issue queue, the three queues include three command entry and exit modes, specifically: the positions of issue width in the front of ready queue are used as the issue ports, and the commands are sent to the execution unit; scan wait 1 queue; if allow 1 is 1, then wait 1 queue is allowed to issue; there are commands in wait 1 queue that both source registers are ready, and the issue enters ready queue; scan wait 2 queue; if allow 2 is 1, then wait 2 queue is allowed to issue; there are commands in wait 2 queue that one source register is ready or both source registers are ready, and the issue enters wait 1 queue.
  • 2. The method for superscalar delay optimization of claim 1, wherein the three command entry and exit modes are parallel.
  • 3. The method for superscalar delay optimization of claim 1, wherein the positions of issue width in the front of ready queue are set as the issue ports, the first position is the issue port for issuing the first issue queue command, and the second position is the issue port for issuing the second issue queue command, and so on to the issue width position, so that it is possible to find the commands that need to be issued and executed from the issue queue in a time complexity of O (1) in one cycle, that is, it can be woken up after a very small constant time complexity.
  • 4. The method for superscalar delay optimization of claim 1, wherein in the issue queue, the three command entry and exit modes are applicable to any command type, but it needs to meet the condition that one command has at most two source registers and one destination register (applicable to risc instruction type, arm instruction type, micro instruction type that meets the condition, and any other instruction type that meets the condition).
Priority Claims (1)
Number         Date      Country  Kind
2023105185126  May 2023  CN       national