The present application claims the priority to Chinese Patent Application No. 202211350246.2, filed on Oct. 31, 2022, the entire disclosure of which is incorporated herein by reference as portion of the present application.
Embodiments of the present disclosure relate to a decoding method, a processor, a chip, and an electronic device.
In modern processors, instructions need to go through processes such as instruction fetching, instruction decoding, and executing; and instruction decoding is a process of parsing and translating a fetched instruction to obtain a micro-operation (micro-op, Uop). Instruction decoding is an important task of a processor, and how to improve decoding performance of the processor has always been a research topic for those skilled in the art.
In view of this, the embodiments of the present disclosure provide a decoding method, a processor, a chip, and an electronic device to implement parallel decoding of instructions and obtain a micro-op sequence consistent with an instruction fetching order, thereby improving decoding performance of the processor.
To achieve the above objectives, the embodiments of the present disclosure provide the following technical solutions.
The embodiments of the present disclosure provide a decoding method, applied to a processor, and the method includes:
The embodiments of the present disclosure further provide a processor, including:
The embodiments of the present disclosure further provide a chip, including the processor mentioned above.
The embodiments of the present disclosure further provide an electronic device, including the chip mentioned above.
The decoding method provided by the embodiments of the present disclosure may be applied to a processor, at least one switch tag may be carried in an instruction fetching request, and the switch tag at least indicates an instruction position for performing decoder group switch. Thus, when responding to a micro-op being obtained as decoded by a decoder group, the processor may acquire an instruction stream fetched by the instruction fetching request, and determine the instruction position for performing decoder group switch in the instruction stream according to the switch tag carried by the instruction fetching request; further, allocate the instruction stream to a plurality of decoder groups for parallel decoding according to the instruction position; and attach the switch tag to a target micro-op obtained by decoding a target instruction; the target instruction is an instruction corresponding to the instruction position, such that micro-ops obtained as decoded by the plurality of decoder groups may be merged subsequently according to the switch tag attached to the target micro-op, to obtain micro-ops corresponding to an instruction fetching order. When responding to a micro-op being obtained through searching by a micro-op cache, in the case where the instruction fetching request is hit in the micro-op cache, the processor according to the embodiments of the present disclosure may not decode instructions through the decoder group, but instead acquire the micro-op corresponding to the instruction fetching request from the micro-op cache.
The embodiments of the present disclosure may allow in the processor having the micro-op cache and the decoder that, by carrying the switch tag in the instruction fetching request and at least indicating the instruction position for performing decoder group switch by the switch tag, in a decoder mode of obtaining the micro-op through decoding by the decoder group, the switch tag may be transparently transmitted through the target instruction and the target micro-op, to support parallel decoding by the plurality of decoder groups and merging of the decoded micro-ops according to an instruction fetching order of the instructions, thereby improving decoding efficiency; while in a micro-op cache mode of obtaining the micro-op through searching by the micro-op cache in the processor, the embodiments of the present disclosure may allow processing not based on the switch tag carried in the instruction fetching request, so as to be compatible with the micro-op cache mode of the processor. The embodiments of the present disclosure may support parallel decoding in the processor that supports the decoder mode and the micro-op cache mode, so as to improve decoding performance.
In order to clearly illustrate the embodiments of the present disclosure, the drawings of the embodiments will be briefly described. It is obvious that the described drawings in the following are only some embodiments of the present disclosure, and for those ordinarily skilled in the art, other drawings can be obtained on the basis of these provided drawings without inventive work.
The technical solutions of the embodiments of the present disclosure will be described in a clearly and fully understandable way in connection with the drawings related to the embodiments of the present disclosure. Apparently, the described embodiments are just a part but not all of the embodiments of the present disclosure. Based on the described embodiments herein, those ordinarily skilled in the art can obtain other embodiment(s), without any inventive work, which should be within the scope of the disclosure.
An instruction is a command that controls a computer to perform an operation, also referred to as a machine instruction. The instruction plays a role in coordinating operation relationships between respective hardware components, reflects a basic function possessed by a computer, and is a smallest functional unit of computer operation. When a computer executes a certain operation command, the processor needs to process the instruction and transform the same into machine language that can be recognized by a machine. In the processor, pipeline technology is usually used to process instructions.
In a pipeline operation of a processor, an instruction needs to go through processing procedures such as instruction fetching, instruction decoding, and executing. Instruction fetching refers to fetching an instruction corresponding to program operation from a cache or a main memory of the processor; instruction decoding is decoding the fetched instruction to determine an operation code and/or an address code of the instruction; and the executing operation refers to executing the instruction operation according to the operation code and/or the address code obtained, to implement program operation. Due to presence of a branch instruction that changes a program stream in the instruction, in order to solve pipeline delay caused by the processor having to wait for an execution result of the branch instruction to determine a next instruction fetching when processing the branch instruction, a front end of the pipeline where the processor processes the instruction may also be provided with a branch predicting unit to implement branch prediction of the instruction.
The branch predicting unit 101 is a digital circuit that may perform branch prediction on the instruction and generate an instruction fetching request based on a branch prediction result. The branch prediction result is, for example, whether the current instruction is a branch instruction, a branch result (direction, address, target address, etc.) of the branch instruction, etc. In one implementation, the branch predicting unit may perform branch prediction on the instruction based on historical execution information and a result of the branch instruction, so as to obtain an instruction fetching address range of the instruction and generate an instruction fetching request. The instruction fetching request generated by the branch predicting unit includes instruction fetching addresses of a plurality of instructions, for reading corresponding instructions from the instruction cache 102.
The instruction cache 102 has an instruction stored therein, and the instruction cache 102 mainly stores the instruction through an instruction cache block. In the instruction cache, each instruction cache block corresponds to a tag, for identifying the instruction cache block in the instruction cache, so that the instruction cache may find a corresponding instruction cache block based on the tag during instruction fetching according to the instruction fetching request. The instruction fetching address generated by the branch predicting unit may correspond to a plurality of instructions, and the plurality of instructions may form an instruction stream. Optionally, the instruction cache 102 may be a cache portion located in a Level I cache in the processor for storing instructions.
It should be noted that there are a tag region (an address identification region) and an index region (an address index region) in the instruction fetching address generated by the branch predicting unit. By using the index region in the instruction fetching address, tags of a plurality of instruction cache blocks may be read from the instruction cache, and then the read tags of the plurality of instruction cache blocks of the instruction cache may be matched with the tag of the instruction fetching address for judgment, to obtain a storage position of the instruction corresponding to the instruction fetching address in the instruction cache (i.e., a position of the instruction cache block), so as to read a corresponding instruction.
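For illustration only, the tag/index lookup described above may be sketched as follows; the cache geometry (number of sets, block size) and all names are illustrative assumptions and are not part of the embodiments.

```python
# Illustrative sketch of tag/index lookup in an instruction cache.
# NUM_SETS, BLOCK_BITS and INDEX_BITS are assumed values for the example.

NUM_SETS = 64      # assumed number of cache sets
BLOCK_BITS = 6     # assumed 64-byte instruction cache blocks
INDEX_BITS = 6     # log2(NUM_SETS)

def split_fetch_address(addr):
    """Split an instruction fetching address into its tag region
    (address identification region) and index region (address index region)."""
    index = (addr >> BLOCK_BITS) % NUM_SETS
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index

def lookup(icache, addr):
    """icache: list of sets, each a list of (tag, block) ways.
    Read the tags in the indexed set and match them against the
    tag of the fetch address to locate the instruction cache block."""
    tag, index = split_fetch_address(addr)
    for way_tag, block in icache[index]:
        if way_tag == tag:     # tag match: cache block found
            return block
    return None                # no match: cache miss
```

This only models the matching judgment; a real instruction cache would also handle fills, replacement, and multi-instruction reads.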
The decoder group 103 is capable of parsing and translating the instruction, and through the decoding operation performed by the decoder group on the instruction, a decoded instruction may be obtained. The decoded instruction may be machine executable operation information obtained by translating the instruction, for example, a machine executable micro-op (Uop or microinstruction) formed by a control field; that is, the decoder group can decode the instruction, so as to obtain the micro-op.
The processor architecture shown in
In one example, the plurality of decoder groups may be two decoder groups, for example, may be decoder group 0 and decoder group 1, respectively. While decoder group 0 is performing a decoding operation on instructions, decoder group 1 may also perform a decoding operation on instructions; for example, within one clock cycle of the processor, decoder group 0 and decoder group 1 are capable of simultaneously performing decoding operations on instructions and obtaining micro-ops, so as to implement parallel decoding on the instructions. Meanwhile, decoder group 0 and decoder group 1 need not decode in the order of the instructions, but may support parallel decoding of instructions. It should be noted that in practical applications, the processor may set up more than two decoder groups as needed; and for ease of understanding, the embodiments of the present disclosure only show an example of two decoder groups.
However, a plurality of decoder groups performing parallel decoding on instructions is different from a single decoder group performing sequential decoding on instructions; more complex situations have to be confronted for the case of a plurality of decoder groups performing parallel decoding on instructions, for example, how to allocate the instruction stream fetched by the instruction cache to the plurality of decoder groups, and how to merge the micro-ops obtained as decoded by the plurality of decoder groups so that micro-ops ultimately executed correspond to an instruction fetching order of instructions. To address the above-described problems, the embodiments of the present disclosure provide a further improved processor architecture, and
As an optional implementation,
Step S21: generating an instruction fetching request, in which the instruction fetching request carries at least one switch tag, and the switch tag at least indicates an instruction position for performing decoder group switch.
In some embodiments, the instruction fetching request generated by the branch predicting unit may carry a switch tag. When the branch predicting unit performs branch prediction, the branch prediction direction is mainly grouped into two types: branch instruction jumping and branch instruction not jumping; correspondingly, the instruction fetching addresses generated by the branch predicting unit may be grouped into two types: an instruction fetching address corresponding to a branch prediction direction of jumping, and an instruction fetching address corresponding to a branch prediction direction of not jumping. As an optional implementation, in the embodiments of the present disclosure, a switch tag may be set according to an address position corresponding to the branch prediction direction of jumping, and an instruction fetching request carrying at least one switch tag is generated.
In some embodiments, the switch tag may also be set through other mechanisms, but not limited to the case that the switch tag is set by the branch predicting unit in the instruction fetching request based on a branch prediction situation. As an optional implementation, after the branch predicting unit generates an instruction fetching request (without carrying a switch tag), the embodiments of the present disclosure may utilize other devices in the processor (e.g., the instruction cache) to set a switch tag in the instruction fetching request. In an implementation example, the instruction cache may set a switch tag in the instruction fetching request based on an instruction boundary after acquiring the instruction fetching request from the branch predicting unit; and the instruction boundary may represent an end position of the instruction.
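Purely as an illustration of the two mechanisms above, carrying switch tags in an instruction fetching request may be sketched as follows; the request format, field names, and offsets are assumptions for the example, not part of the embodiments.

```python
# Illustrative sketch: a fetch request carrying switch tags at the
# instruction positions where a decoder group switch is performed.
# These positions may come from the branch predicting unit (e.g.
# predicted-taken branch positions) or be set later based on
# instruction boundaries; both sources produce the same tag form here.

def build_fetch_request(fetch_range, switch_positions):
    """fetch_range: (start, end) instruction fetching address range.
    switch_positions: positions indicating where decoder group
    switch is to be performed (assumed encoding)."""
    return {
        "range": fetch_range,
        "switch_tags": sorted(switch_positions),
    }

req = build_fetch_request((0x1000, 0x1040), {0x1028, 0x1010})
```

The switch tags sit alongside the address range rather than inside it, consistent with the point below that they do not affect the fetched instruction stream.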
Step S22: acquiring an instruction stream fetched by the instruction fetching request, and determining the instruction position for performing decoder group switch in the instruction stream according to the switch tag carried by the instruction fetching request.
The instruction cache acquires the instruction fetching request of the branch predicting unit, and fetches instructions according to the instruction fetching address in the instruction fetching request, so as to obtain the instruction stream corresponding to the instruction fetching request. In the case where the instruction fetching request carries a switch tag, and the switch tag at least indicates the instruction position for performing decoder group switch, then the instruction position for performing decoder group switch in the instruction stream may be determined according to the switch tag carried by the instruction fetching request.
It may be understood that an instruction stream is an instruction sequence including a plurality of instructions; and when there is no clear boundary in the instruction sequence, the end position of instructions in the instruction sequence cannot be determined. In the embodiments of the present disclosure, by determining the instruction position for performing decoder group switch in the instruction stream according to the switch tag carried by the instruction fetching request, the boundary of the instruction stream may be determined, and then the instruction position may serve as the end position. In the case where the instruction corresponding to the instruction position serves as a target instruction, the instruction position is an end position of the target instruction, so that the end position of the target instruction in the instruction stream may be determined according to the instruction position indicated by the switch tag carried by the instruction fetching request.
It should be noted that because the switch tag is used to at least indicate the instruction position for performing decoder group switch, the setting position of the switch tag in the instruction fetching request will not affect the instruction stream fetched by the instruction cache, nor will it cause damage to a structure of the instruction stream fetched. Moreover, the specific setting position and representation form of the switch tag will not be limited in the embodiments of the present disclosure; for example, the switch tag may be represented by an indicator field that exists outside the instruction stream fetched by the instruction cache, or through a switch tag indicator.
Step S23: allocating the instruction stream to a plurality of decoder groups for parallel decoding according to the instruction position; and attaching the switch tag to a target micro-op obtained by decoding a target instruction, in which the target instruction is an instruction corresponding to the instruction position.
In some embodiments, based on the instruction stream fetched by the instruction cache, the instruction allocating unit may allocate the instruction stream to the plurality of decoder groups according to the instruction position (i.e., the instruction position for performing decoder group switch) for parallel decoding. In an optional example, the instruction allocating unit may split the instruction stream according to the instruction position to obtain a plurality of instruction groups, and further allocate the plurality of instruction groups to the plurality of decoder groups for parallel decoding.
Further, in some embodiments, the instruction allocating unit splitting the instruction stream according to the instruction position may be splitting the instruction stream into a plurality of instruction groups with the instruction position as a boundary in the instruction stream, and a target instruction that serves as a boundary in two adjacent instruction groups is grouped into a previous instruction group. Thus, when allocating the instruction stream to the plurality of decoder groups for parallel decoding, the instruction allocating unit may allocate a decoder group for a next instruction group according to a switch tag corresponding to the target instruction grouped into the previous instruction group, and the decoder group allocated for the previous instruction group is different from the decoder group allocated for the next instruction group; for example, the switch tag corresponding to the target instruction may be a switch tag corresponding to an end position indicating the target instruction.
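For illustration, the splitting rule described above, where each instruction position serves as a boundary and the target instruction is grouped into the previous instruction group, may be sketched as follows; the data representation is an assumption for the example.

```python
# Illustrative sketch of splitting a fetched instruction stream into
# instruction groups at the instruction positions indicated by switch
# tags; the target instruction at each boundary closes the previous group.

def split_instruction_stream(instructions, switch_positions):
    """instructions: list of (position, insn) pairs in fetch order.
    switch_positions: set of end positions of target instructions."""
    groups, current = [], []
    for pos, insn in instructions:
        current.append(insn)
        if pos in switch_positions:   # target instruction: close the group
            groups.append(current)
            current = []
    if current:                       # trailing instructions after the
        groups.append(current)        # last switch position
    return groups
```

Each resulting group would then be handed to a different decoder group for parallel decoding.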
It should be noted that after the plurality of decoder groups perform decoding operations on the allocated instruction groups to obtain micro-ops, in order to merge the micro-ops decoded by the plurality of decoder groups in the instruction fetching order, in the embodiments of the present disclosure, a switch tag may be attached to a target micro-op obtained by decoding the target instruction, through parsing and translating the target instruction corresponding to the instruction position. As an optional implementation, the target micro-op obtained by decoding the target instruction may be a combination of two micro-ops, one of which is a micro-op not attached with a switch tag and the other is a micro-op attached with a switch tag.
In some embodiments, after obtaining the micro-ops decoded by the plurality of decoder groups, in the embodiments of the present disclosure, the micro-ops obtained as decoded by the plurality of decoder groups may be further merged according to the switch tag attached to the target micro-op to obtain the micro-ops corresponding to the instruction fetching order. It may be understood that in order to implement complete operation of the program, it is necessary to merge the micro-ops obtained as decoded by the plurality of decoder groups, and an order of the micro-op sequence obtained after merging the micro-ops must also correspond to the instruction fetching order.
In the embodiments of the present disclosure, in the case of the instruction fetching request carrying the switch tag, the switch tag at least indicates the instruction position for performing decoder group switch. Thus, the instruction stream fetched by the instruction fetching request may be acquired; the instruction position for performing decoder group switch in the instruction stream may be determined according to the switch tag carried by the instruction fetching request; further, according to the instruction position, the instruction stream may be allocated to the plurality of decoder groups for parallel decoding; and the switch tag is attached to the target micro-op obtained by decoding the target instruction, and the target instruction is an instruction corresponding to the instruction position. In the embodiments of the present disclosure, the instruction position for performing decoder group switch may be indicated by the switch tag, and the switch tag is transparently transmitted from the instruction fetching request to the instruction position in the fetched instruction stream, so as to implement splitting the fetched instruction stream based on the instruction position and allocating the same to the plurality of decoder groups for parallel decoding, which effectively improves decoding efficiency of the processor. Further, in the embodiments of the present disclosure, the switch tag may be transparently transmitted into the target micro-op by parsing and translating the target instruction serving as the boundary; after obtaining the micro-ops decoded by the plurality of decoder groups, the micro-ops obtained as decoded by the plurality of decoder groups may be merged according to the switch tag, to obtain the micro-ops corresponding to the instruction fetching order, for accurate execution of the micro-ops.
In some embodiments,
Based on the principle of the method flow shown in
The branch predicting unit 101 generates an instruction fetching request carrying a switch tag, and sends the instruction fetching request to the instruction cache 102, to read a corresponding instruction stream according to the instruction fetching request in the instruction cache 102. The switch tag of the instruction fetching request indicates the instruction position for performing decoder group switch, without affecting the instruction fetching address searching for the instruction corresponding to the instruction position in the instruction cache.
The instruction cache 102 reads the instruction stream according to the instruction fetching address of the instruction fetching request, and the switch tag carried by the instruction fetching request does not affect the fetched instructions of the instruction cache. After reading the instruction stream, the instruction cache 102 may determine the instruction position for performing decoder group switch in the instruction stream according to the switch tag carried by the instruction fetching request.
The instruction allocating unit 201 splits the instruction stream according to the instruction position, to obtain a plurality of instruction groups, and allocates the plurality of instruction groups to instruction queues 2021 to 202n corresponding to decoder groups 1031 to 103n. As an optional implementation, the number of instruction positions indicated by the switch tag may be plural, so that the number of instruction positions determined in the instruction stream may be plural; when splitting the instruction stream, the instruction allocating unit may perform one split when one of the instruction positions is recognized in the instruction stream, group a target instruction corresponding to the instruction position into a previous instruction group, and allocate a decoder group for a next instruction group according to the switch tag corresponding to the target instruction grouped into the previous instruction group; and the decoder group allocated for the previous instruction group is different from the decoder group allocated for the next instruction group. In this way, with the instruction position as a boundary, the instruction stream is split into the plurality of instruction groups, and the plurality of instruction groups are allocated to the plurality of decoder groups for parallel decoding.
In some embodiments, based on the instruction queue that is set corresponding to the decoder group and is used for saving instructions to be decoded, the instruction allocating unit 201 may save a first instruction group among the plurality of instruction groups to an instruction queue corresponding to a default decoder group; with respect to a non-first instruction group among the plurality of instruction groups, the instruction allocating unit 201 may determine a decoder group from the plurality of decoder groups that is different from the decoder group allocated for the previous instruction group, according to the switch tag corresponding to the target instruction in the previous instruction group, and then save the non-first instruction group to the instruction queue corresponding to the determined decoder group.
In an optional implementation of allocating non-first instruction groups among the plurality of instruction groups to decoder groups, the decoder groups allocated for the respective non-first instruction groups are sequentially determined from the plurality of decoder groups, in an order of the plurality of decoder groups, according to the switch tag corresponding to the target instruction in a previous instruction group of a non-first instruction group. For example, in the case where the first decoder group 1031 is a default decoder group, the first instruction group after instruction stream splitting is allocated to the decoder group 1031, and then the respective non-first instruction groups are sequentially allocated to decoder groups after the decoder group 1031 in the order of the decoder groups, until decoder group 103n.
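As an illustration of allocating instruction groups in the order of the decoder groups, the scheme above may be sketched as follows; representing decoder groups by indices and wrapping around after the last decoder group are assumptions made for the example.

```python
# Illustrative sketch: the first instruction group goes to the default
# decoder group; on each switch tag, the next group is allocated to the
# next decoder group in order, so adjacent groups never share a decoder
# group (wrap-around after the last group is an assumption).

def allocate_groups(instruction_groups, num_decoder_groups, default=0):
    """Return a list of (decoder_group_index, instruction_group)."""
    allocation, current = [], default
    for i, group in enumerate(instruction_groups):
        if i > 0:  # switch tag of previous group's target instruction
            current = (current + 1) % num_decoder_groups
        allocation.append((current, group))
    return allocation
```

With two decoder groups this degenerates into simple alternation between decoder group 0 and decoder group 1.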
It may be understood that the allocation principle may be that the instruction allocating unit allocates the decoder groups for the instruction groups in an order of decoder groups, and a decoder group different from the decoder group allocated for the previous instruction group is allocated for the next instruction group, according to the switch tag corresponding to the target instruction in the previous instruction group, so as to reasonably allocate instruction groups in the instruction queues corresponding to the decoder groups, which ensures that the plurality of decoder groups may read instructions to be decoded in a corresponding instruction queue, and implements parallel decoding of the plurality of decoder groups.
It should be noted that in another optional implementation, the switch tag may further include information about a decoder group to be switched to, for specifically indicating the decoder group to be switched to. Thus, the instruction allocating unit is capable of allocating a specific decoder group for the next instruction group based on the switch tag corresponding to the target instruction in the previous instruction group, to implement allocating decoder groups for instruction groups without following the order of the decoder groups. For example, with respect to decoder groups 1031 to 103n, according to an order in which switch tags appear, in the case where information of decoder group 103n is recorded in a first switch tag, after allocating the first instruction group to the default decoder group, decoder group 103n specifically indicated by the first switch tag is allocated for the next instruction group, for which decoding is performed by decoder group 103n; and in the case where the next switch tag specifically indicates decoder group 1031, the decoder group allocated for the next instruction group is decoder group 1031, for which decoding is performed by decoder group 1031. In this way, the instruction allocating unit allocates a corresponding decoder group for an instruction group according to a decoder group to be switched as specifically indicated in the switch tag, until the instruction groups are allocated completely.
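The variant in which the switch tag specifically indicates the decoder group to be switched to may be sketched as follows, again with indices and field layout as assumptions for the example.

```python
# Illustrative sketch: each switch tag carries the index of the decoder
# group to switch to, so allocation need not follow decoder-group order.

def allocate_with_targets(instruction_groups, tag_targets, default=0):
    """tag_targets[i]: decoder group specifically indicated by the
    switch tag of the target instruction ending group i (None for the
    last group, which carries no switch tag)."""
    allocation = [(default, instruction_groups[0])]
    for i in range(1, len(instruction_groups)):
        # the previous group's switch tag names the decoder group
        allocation.append((tag_targets[i - 1], instruction_groups[i]))
    return allocation
```

For instance, a first switch tag recording decoder group n sends the second instruction group directly to decoder group n, regardless of decoder-group order.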
It should be further noted that the default decoder group may be the first decoder group allocated in order, or may also be a decoder group whose allocation is specified by the processor, which will not be limited by the present disclosure.
With further reference to
The merging unit 204 reads micro-ops from micro-op queues 2031 to 203n, and merges the read micro-ops, to obtain a micro-op sequence that may be executed. When the micro-ops in micro-op queues 2031 to 203n are merged by the merging unit 204, the merging unit 204 may sequentially read the micro-ops in micro-op queues 2031 to 203n for merging based on the switch tag attached to the target micro-op. For example, the merging unit reads micro-ops in micro-op queue 2031, in the order of micro-op queues; when reading a target micro-op attached with a switch tag in micro-op queue 2031, sequentially switches to a next micro-op queue after micro-op queue 2031 and reads micro-ops in the micro-op queue; and when reading a target micro-op attached with a switch tag in the micro-op queue, continues to switch to a next micro-op queue to read micro-ops, and so on, until the micro-ops are read completely.
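The queue-switching merge described above may be sketched as follows; representing each micro-op as a pair with a boolean switch-tag flag, and wrapping around to the first queue, are assumptions for the example, and the sketch presumes well-formed input where every non-final queue segment ends in a tagged target micro-op.

```python
# Illustrative sketch of the merging unit: read micro-ops from the
# current micro-op queue; on reading a target micro-op attached with a
# switch tag, switch to the next queue in order, until all queues are
# drained. Each micro-op is (uop, has_switch_tag).

def merge_micro_ops(uop_queues, first=0):
    merged = []
    cursors = [0] * len(uop_queues)
    q = first
    while any(cursors[i] < len(uop_queues[i]) for i in range(len(uop_queues))):
        uop, tagged = uop_queues[q][cursors[q]]
        cursors[q] += 1
        merged.append(uop)
        if tagged:                       # switch tag: move to next queue
            q = (q + 1) % len(uop_queues)
    return merged
```

Because the tags mirror the boundaries at which the instruction stream was split, the merged sequence corresponds to the instruction fetching order.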
In some embodiments, the first micro-op queue from which the merging unit reads micro-ops may correspond to the instruction queue to which the first instruction group is allocated (e.g., a micro-op queue and an instruction queue belonging to the same decoder group); in one example, in the case where the instruction allocating unit allocates the first instruction group to instruction queue 2021 corresponding to decoder group 1031, the merging unit, when merging the micro-ops, firstly reads micro-ops in micro-op queue 2031, to provide support such that the merged micro-op sequence is capable of corresponding to the instruction fetching order.
In other embodiments, the switch tag corresponding to the target instruction may further specifically indicate the decoder group to be switched to, and the switch tag attached to the target micro-op obtained by decoding the target instruction may specifically indicate the micro-op queue to be switched to. Thus, when merging the micro-ops, the merging unit may read the micro-op queue in a switching manner based on the switch tag attached to the target micro-op, so as to read without following the order of the micro-op queue. In an example, assuming that the target micro-op exists in micro-op queue 2031, and the switch tag attached to the target micro-op specifically indicates that the micro-op queue to be switched to is 203n, the merging unit, when reading the switch tag attached to the target micro-op in micro-op queue 2031, may switch to micro-op queue 203n to continue to read micro-ops; and when the switch tag attached to the target micro-op is read in micro-op queue 203n, which specifically indicates that the micro-op queue to be switched to is 2031, the merging unit switches to micro-op queue 2031 to read micro-ops.
To facilitate understanding the principle of splitting the instruction stream based on the instruction position for performing decoder group switch in the instruction stream, hereinafter, it will be introduced by taking two decoder groups as an example.
As shown in
Referring to
To facilitate understanding the principle of merging the micro-ops, based on a micro-op attached with a switch tag, hereinafter, it will be introduced by taking two decoder groups as an example.
Decoder group 0 decodes an instruction group from instruction 310 to target instruction 31k, to obtain micro-op 320 to target micro-op 32k (not shown); decoder group 0 parses and translates target instruction 31k, to obtain target micro-op 32k, which is a combination of micro-op 32k′ and micro-op 32k″; micro-op 32k′ is a micro-op not attached with a switch tag, and micro-op 32k″ is a micro-op attached with a switch tag. Decoder group 1 decodes an instruction group from instruction 31k+1 to instruction 31m, to obtain micro-op 32k+1 to micro-op 32m. Micro-op 320 to target micro-op 32k are saved in a micro-op queue of decoder group 0, and micro-op 32k+1 to micro-op 32m are saved in a micro-op queue of decoder group 1.
When merging the micro-ops, the micro-ops may be read from the micro-op queue of decoder group 0 firstly in an order of decoder groups; when reading micro-op 32k″ attached with a switch tag in target micro-op 32k, it switches to the micro-op queue of decoder group 1 to read the micro-ops; that is, when reading the micro-op attached with a switch tag from the currently read micro-op queue, it switches to a next micro-op queue to read micro-ops, until the micro-ops are read completely. Because the fetched instruction stream is split according to the instruction position indicated by the switch tag in the target instruction and allocated to the plurality of decoder groups for parallel decoding, when reading the micro-ops obtained through decoding, the micro-ops may be read in a switched manner between the micro-op queues of the plurality of decoder groups according to the switch tag attached to the target micro-op, so that the read micro-ops may correspond to the instruction fetching order.
It is described above that in the case where the processor has a plurality of decoder groups, the instruction position for performing decoder group switch is indicated by the switch tag of the instruction fetching request, and the switch tag is transparently transmitted through the target instruction in the instruction stream to the micro-op obtained as decoded by the decoder group, so as to support parallel decoding by the plurality of decoder groups and sequential merging of micro-ops, which effectively improves decoding efficiency of the processor. Although the plurality of decoder groups of the processor may implement parallel decoding of instructions, the processor still has to undergo the process of instruction fetching and decoding so as to obtain micro-ops, which makes the process of obtaining micro-ops more cumbersome; based on this, in order to improve the speed of acquiring micro-ops, the embodiments of the present disclosure further provide a high-performance processor having a Micro-Op Cache (OC).
When acquiring micro-ops in the micro-op cache based on the instruction fetching request generated by the branch predicting unit, the micro-op cache may make a hit judgment between a start address of the instruction fetching request and an address of a first micro-op of all table entries; if hit, a micro-op in a first table entry is obtained. When an end address of the last micro-op in the micro-op cache table entry is less than an end address of an address range of the instruction fetching request, the end address in the address range corresponding to the last micro-op needs to be used, to further make a hit judgment with the address of the first micro-op in all table entries; if hit, a micro-op in a second table entry is obtained. The above-described process is repeated until the end address of the address range in the instruction fetching request is less than the end address of the last micro-op in the table entry, and then the micro-ops may be read from the micro-op cache based on the instruction fetching request.
In some embodiments, when all the addresses in the instruction fetching request generated by the branch predicting unit may be hit in the micro-op cache, the micro-op cache is capable of outputting corresponding micro-ops; and when the start address in the instruction fetching request generated by the branch predicting unit fails to be hit in the micro-op cache table entry, the micro-op cache is incapable of outputting micro-ops.
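The repeated hit judgment described above may be sketched as a behavioral model in Python. The table-entry layout (a mapping from the address of an entry's first micro-op to that entry's micro-ops and end address) and all names are assumptions made for illustration only.

```python
def lookup_micro_op_cache(entries, fetch_start, fetch_end):
    """Model of the repeated hit judgment: start from the start address
    of the instruction fetching request, chain through table entries
    until the fetch range is covered, and return None on any miss
    (in which case the decoder mode is used instead)."""
    micro_ops = []
    addr = fetch_start
    while True:
        if addr not in entries:
            return None  # miss: the micro-op cache cannot output micro-ops
        ops, entry_end = entries[addr]
        micro_ops.extend(ops)
        if entry_end >= fetch_end:
            return micro_ops  # the fetch address range is fully covered
        # otherwise use the end address of this entry's last micro-op
        # for the next hit judgment against the first-micro-op addresses
        addr = entry_end
```

With two chained entries covering 0x100-0x110 and 0x110-0x120, a fetch request for 0x100-0x118 hits both entries, while a request starting mid-entry misses.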
Based on the processor having the micro-op cache and the decoder group, the processor may include a variety of decoding modes, and the variety of decoding modes include a decoder mode and a micro-op cache mode, the decoder mode is obtaining a micro-op through decoding by the decoder group, and the micro-op cache mode is obtaining a micro-op through searching by the micro-op cache.
As shown in
In response to the micro-op being obtained as decoded by the decoder group, that is, in the decoder mode, the branch predicting unit 101 issues an instruction fetching request carrying a switch tag to the instruction cache 102; the instruction cache 102 fetches an instruction stream according to an address in the instruction fetching request; and the instruction allocating unit 201 splits the instruction stream according to an instruction position in the instruction stream to obtain a plurality of instruction groups, and allocates the obtained plurality of instruction groups to instruction queues 2021 to 202n corresponding to the plurality of decoder groups. The plurality of decoder groups 1031 to 103n read the instructions to be decoded in instruction queues 2021 to 202n respectively corresponding thereto and perform decoding operations to obtain micro-ops, further save the decoded micro-ops into corresponding micro-op queues, and based on existence of the micro-op cache 104, may further cache the decoded micro-ops into the micro-op cache.
In response to a micro-op being obtained through searching by the micro-op cache, that is, in the micro-op cache mode, the branch predicting unit 101 issues an instruction fetching request carrying a switch tag to the micro-op cache 104, so that the micro-op cache may correspondingly output a micro-op, based on hit of the instruction fetching request in the micro-op cache, and based on existence of the micro-op queue, is capable of saving the acquired micro-op to a micro-op queue corresponding to a default decoder group.
In some embodiments, if the instruction fetching request is hit in the micro-op cache, the acquired micro-op is saved to the micro-op queue corresponding to the default decoder group; the micro-op queue corresponding to the default decoder group may be a micro-op queue corresponding to a first decoder group determined in order, or may also be a micro-op queue corresponding to a decoder group specified by the processor, or may also be a micro-op queue corresponding to the decoder group, determined according to whether the last instruction decoded by the decoder group has an instruction position indicated by a corresponding switch tag before the decoding mode is switched to the micro-op cache mode.
As an optional implementation, when the last instruction decoded by the decoder group does not have an instruction position indicated by a corresponding switch tag before the decoding mode of the processor is switched to the micro-op cache mode, the micro-op read in the micro-op cache is saved to a micro-op queue corresponding to a decoder group decoding the last instruction before switching the decoding mode.
As another optional implementation, when the last instruction decoded by the decoder group has an instruction position indicated by a corresponding switch tag before the decoding mode of the processor is switched from the decoder mode to the micro-op cache mode, the micro-op read in the micro-op cache is saved to the micro-op queue corresponding to the decoder group switched to as indicated by the switch tag according to the switch tag corresponding to the last instruction.
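The two optional implementations above may be condensed into a small behavioral sketch. The function name is hypothetical, and it assumes (as in the two-group example earlier) that a switch tag rotates decoding to the next decoder group in order.

```python
def default_queue_index(last_group, last_had_switch_tag, num_groups):
    """Select the micro-op queue that receives micro-ops read from the
    micro-op cache after the decoding mode is switched from the decoder
    mode to the micro-op cache mode."""
    if last_had_switch_tag:
        # the last decoded instruction carried a switch tag:
        # use the queue of the decoder group switched to
        return (last_group + 1) % num_groups
    # no switch tag: stay with the group that decoded the last instruction
    return last_group
```

For two decoder groups, a tagged last instruction decoded by group 0 directs cache-read micro-ops to group 1's queue; an untagged one keeps them in group 0's queue.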
It may be understood that because the switch tag carried in the instruction fetching request at least indicates the instruction position for performing decoder group switch, and the micro-op cache mode does not undergo instruction decoding by the decoder group, the micro-op cache may not respond to the switch tag carried in the instruction fetching request, and the read micro-op is not attached with the switch tag.
In other embodiments, in the micro-op cache mode, when the instruction fetching request is not hit in the micro-op cache, it enters the decoder mode; the plurality of decoder groups in the decoder mode perform parallel decoding on the instructions corresponding to the instruction fetching request to obtain micro-ops, and the micro-ops obtained as decoded in the decoder mode may be saved in the micro-op cache. In the decoder mode switched to, after the instruction stream fetched by the instruction fetching request is split according to the instruction position indicated by the switch tag, the first instruction group among the obtained plurality of instruction groups is allocated to the instruction queue corresponding to the default decoder group; the instruction queue corresponding to the default decoder group may be an instruction queue corresponding to the first decoder group determined in order, or may also be an instruction queue corresponding to a decoder group specified by the processor, or may also be an instruction queue corresponding to the corresponding decoder group, determined according to whether the last instruction decoded by the decoder group has the instruction position indicated by the corresponding switch tag before entering the micro-op cache mode.
As an optional implementation, when the last instruction decoded by the decoder group does not have an instruction position indicated by a corresponding switch tag before the decoding mode of the processor is switched to the micro-op cache mode, after the micro-op cache mode is switched to the decoder mode, the first instruction group among the plurality of instruction groups obtained by splitting the instruction stream is allocated to an instruction queue corresponding to the decoder group decoding the last instruction, before switching to the micro-op cache mode. With further reference to
As another optional implementation, when the last instruction decoded by the decoder group has an instruction position indicated by a corresponding switch tag before the decoding mode of the processor is switched to the micro-op cache mode, after the micro-op cache mode is switched to the decoder mode, the first instruction group among the plurality of instruction groups obtained by splitting the instruction stream is allocated to the instruction queue corresponding to the decoder group switched to as indicated by the switch tag corresponding to the last instruction, before switching to the micro-op cache mode. With further reference to
The embodiments of the present disclosure may allow in the processor having the micro-op cache and the decoder that, by carrying the switch tag in the instruction fetching request and at least indicating the instruction position for performing decoder group switch by the switch tag, in a decoder mode (corresponding to obtaining the micro-op through decoding by the decoder group), the switch tag may be transparently transmitted through the target instruction and the micro-op, to support parallel decoding by the plurality of decoder groups, thereby improving decoding efficiency; while in a micro-op cache mode of the processor (corresponding to obtaining the micro-op through searching by the micro-op cache), the embodiments of the present disclosure may allow processing not based on the switch tag carried in the instruction fetching request, so as to be compatible with the micro-op cache mode of the processor. The embodiments of the present disclosure may support parallel decoding in the processor that supports the decoder mode and the micro-op cache mode, so as to improve decoding performance. That is, the embodiments of the present disclosure may support parallel decoding in the decoder mode in the case where the processor is compatible with the micro-op cache mode, so as to improve decoding performance.
As an optional example,
Step S60: generating an instruction fetching request.
Optionally, step S60 may be executed by the branch predicting unit; and the branch predicting unit may set a switch tag in the instruction fetching request according to a branch prediction jump result, for indicating the instruction position for performing decoder group switch.
Step S61: judging whether the current processor is in the micro-op cache mode, if NO, executing step S62; if YES, executing step S69.
The micro-op cache mode is obtaining a micro-op through searching by the micro-op cache.
Step S62: accessing the instruction cache and fetching an instruction stream according to the instruction fetching request.
It should be noted that when the instruction cache reads the instruction stream according to the instruction fetching request, the switch tag carried by the instruction fetching request will be transparently transmitted outside the fetched instruction stream; because the switch tag at least indicates the instruction position for performing decoder group switch, the instruction position for performing the decoder group switch in the instruction stream may be determined, and further the target instruction corresponding to the instruction position is determined.
Step S63: judging whether there is a switch tag in the instruction fetching request; if NO, executing step S64; if YES, executing step S65.
Step S64: sending the instruction stream to the default decoder group, decoding, by the default decoder group, the instruction, and saving micro-ops obtained to the corresponding micro-op queue.
It may be understood that the switch tag carried by the instruction fetching request is at least used to indicate the instruction position for performing decoder group switch; when there is no switch tag in the instruction fetching request, there is no need to switch the decoder group; and the instruction stream is decoded by the default decoder group, and the micro-ops obtained are saved in the micro-op queue corresponding to the default decoder group. For example, the default decoder group may be the first decoder group allocated in order, or may also be a decoder group allocated as specified by the processor; in addition, during processing of instructions, for each instruction fetching request, the default decoder group may also be a decoder group currently switched to. The default decoder group according to the embodiments of the present disclosure is not specified as a certain fixed decoder group, and may be selected according to actual needs.
Step S65: splitting the instruction stream according to the switch tag, allocating the target instruction and instructions before the target instruction to the instruction queue corresponding to the default decoder group, decoding by the default decoder group, and saving micro-ops obtained to the corresponding micro-op queue, in which the target micro-op obtained by decoding the target instruction is attached with a switch tag.
Step S66: judging whether the remaining instructions still have a target instruction corresponding to the switch tag; if NO, executing step S67; if YES, executing step S68.
Step S67: allocating the remaining instructions to the instruction queue corresponding to a next decoder group different from the previous decoder group, decoding by the next decoder group, and saving micro-ops obtained to the corresponding micro-op queue.
Step S68: allocating the target instruction and instructions before the target instruction to the instruction queue corresponding to the next decoder group different from the previous decoder group, decoding by the next decoder group, and saving micro-ops obtained to the corresponding micro-op queue, in which the micro-op obtained by decoding the target instruction is attached with a switch tag; and returning to execute step S66.
It should be noted that because the instructions are allocated to the instruction queue of the decoder group, and then the instructions are read from the instruction queue by the decoder group for decoding, when the speed of allocating the instructions to the instruction queue is faster than the decoding speed of the decoder group, the embodiments of the present disclosure may implement allocating the split instruction stream to the instruction queue of the plurality of decoder groups, and cause the plurality of decoder groups to perform parallel decoding based on the instructions already allocated to the instruction queue. It should be further noted that step S65 to step S68 are only an optional implementation of splitting the instruction stream according to the switch tag corresponding to the target instruction, to obtain a plurality of instruction groups, and allocating the plurality of instruction groups to the plurality of decoder groups for parallel decoding according to the embodiments of the present disclosure.
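As a behavioral illustration of this optional implementation of step S63 to step S68, the splitting and allocation may be sketched in Python. The representation of switch tags as a list of target-instruction indices, the function name, and the dictionary-of-lists allocation are all assumptions for illustration, not the hardware implementation.

```python
def split_and_allocate(instructions, tagged_positions, num_groups,
                       default_group=0):
    """Split the fetched instruction stream at each instruction position
    indicated by a switch tag (steps S63-S65), and allocate the resulting
    instruction groups to decoder groups in a rotating order starting from
    the default decoder group (steps S66-S68)."""
    allocation = {g: [] for g in range(num_groups)}
    group = default_group
    start = 0
    for pos in tagged_positions:
        # the target instruction and the instructions before it
        # form one instruction group (steps S65 and S68)
        allocation[group].extend(instructions[start:pos + 1])
        start = pos + 1
        group = (group + 1) % num_groups  # next decoder group
    # remaining instructions with no further target instruction (step S67);
    # with no switch tag at all, everything goes to the default group (S64)
    allocation[group].extend(instructions[start:])
    return allocation
```

For example, five instructions with target instructions at indices 1 and 3 split into three groups allocated alternately to two decoder groups, so the groups can decode in parallel once their instruction queues are filled.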
Step S69: fetching the micro-op from the micro-op cache, and saving the obtained micro-op to the corresponding micro-op queue.
The selection of a micro-op queue is as described above, and no details will be repeated here.
It should be noted that the prerequisite for fetching a micro-op in the micro-op cache based on the instruction fetching request is that the instruction fetching request may be hit in the micro-op cache; when the instruction fetching request cannot be hit in the micro-op cache, it enters the decoder mode to execute step S62.
The decoder mode is obtaining a micro-op through decoding by the decoder group.
Step S70: reading the micro-op from the micro-op queue corresponding to the first decoder group.
Step S71: judging whether the read micro-op is attached with a switch tag; if YES, executing step S72; if NO, returning to execute step S70, until the micro-ops are read completely.
Step S72: switching to a micro-op queue corresponding to a next decoder group to read micro-ops, and returning to step S71.
After the micro-ops are read completely, the micro-ops may be further executed according to the embodiments of the present disclosure.
The embodiments of the present disclosure may support parallel decoding in the processor that supports the decoder mode and the micro-op cache mode, so as to improve decoding performance.
The embodiments of the present disclosure further provide a processor,
Optionally, the instruction position is an end position of the target instruction; and the step of determining, by the instruction cache, the instruction position for performing decoder group switch in the instruction stream according to the switch tag carried by the instruction fetching request, includes:
Optionally, the step of allocating, by the instruction allocating unit, the instruction stream to a plurality of decoder groups for parallel decoding according to the instruction position, includes:
Optionally, the step of splitting, by the instruction allocating unit, the instruction stream according to the instruction position to obtain a plurality of instruction groups, includes:
The step of allocating, by the instruction allocating unit, the plurality of instruction groups to the plurality of decoder groups for parallel decoding, includes:
Optionally, one decoder group is correspondingly provided with one instruction queue for saving instructions to be decoded;
Optionally, the step of determining, by the instruction allocating unit, a decoder group from the plurality of decoder groups that is different from the decoder group allocated for the previous instruction group, for a non-first instruction group among the plurality of instruction groups, according to the switch tag corresponding to the target instruction in the previous instruction group, includes:
Optionally, the step of saving, by the instruction allocating unit, the first instruction group among the plurality of instruction groups to the instruction queue corresponding to the default decoder group, includes:
Optionally, the processor further includes a merging unit, which is configured to merge micro-ops obtained as decoded by the plurality of decoder groups, in the decoder mode, according to the switch tag attached to the target micro-op, to obtain micro-ops corresponding to an instruction fetching order.
Optionally, one decoder group is correspondingly provided with one micro-op queue;
Optionally, the step of merging, by the merging unit, the micro-ops in a switched manner between the micro-op queues corresponding to the respective decoder groups, according to the switch tag attached to the target micro-op, to obtain the micro-ops corresponding to the instruction fetching order, includes:
Optionally, in the case where the read micro-op is attached with the switch tag, the step of determining, by the merging unit, a next micro-op queue to be switched to for micro-op reading, according to the switch tag attached to the micro-op, includes:
Optionally, the micro-op cache is further configured to save an obtained micro-op to a micro-op queue corresponding to the default decoder group, in the micro-op cache mode;
The embodiments of the present disclosure further provide a chip, which may include the processor described above.
The embodiments of the present disclosure further provide an electronic device, which may include the chip described above.
The above describes a plurality of embodiments of the present disclosure, and in a case of no conflict, the respective optional modes introduced in the respective embodiments may be subjected to mutual combination and cross reference, thereby extending various possible embodiments, which may be considered as disclosed embodiments of the present disclosure.
Although the embodiments of the present disclosure are disclosed as above, the present disclosure is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure. Therefore, the scope of the present disclosure should be the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202211350246.2 | Oct 2022 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2023/078435 | 2/27/2023 | WO |