The present disclosure is generally related to a branch target predictor.
Advances in technology have resulted in more powerful computing devices. For example, a variety of computing devices currently exist, including small, lightweight wireless computing devices that are easily carried by users (such as portable wireless telephones, personal digital assistants (PDAs), and paging devices), laptop and desktop computers, and servers.
A computing device may include a processor that is operable to execute different instructions in an instruction set (e.g., a program). The instruction set may include direct branches and indirect branches. An indirect branch may specify the fetch address of the next instruction to be executed from an instruction memory. The next instruction may be indirectly fetched because the instruction address is resident in some other storage element (e.g., a processor register). Thus, the indirect branch may not embed the offset to the address of the target instruction within one of the instruction fields in the branch instruction. Non-limiting examples of an indirect branch include a computed jump, an indirect jump, and a register-indirect jump. In order to attempt to increase performance at the processor, the processor may predict the fetch address. To predict the fetch address, the processor may use multiple predictor tables, where each predictor table includes multiple prediction entries, and where each prediction entry stores a fetch address.
Because each prediction entry stores an entire fetch address and multiple predictor tables may include similar entries, in certain scenarios, there may be a relatively large amount of overhead at each predictor table. For example, some prediction entries in a predictor table may never be used by an application, multiple predictor tables may include identical prediction entries (e.g., target duplication), and the number of predictor table entries may not be adjustable independently of the number of target instructions.
The processor may also utilize a stored global history from past indirect branches to predict the fetch address. For example, the processor may predict the fetch address based on predicted fetch addresses for the previous ten indirect branches to provide context. Each fetch address stored in the global history may utilize approximately ten bits of storage. For example, twenty previously predicted fetch addresses stored in the global history may utilize approximately two-hundred bits of storage. Thus, a relatively large amount of storage may be used for the global history.
According to one implementation of the present disclosure, an apparatus for predicting a fetch address of a next instruction to be fetched includes a memory system, first selection logic, and second selection logic. The memory system includes a plurality of predictor tables and a target table. The plurality of predictor tables includes a first predictor table and a second predictor table. The first predictor table includes a first entry having a first way identifier, and the second predictor table includes a second entry having a second way identifier. The target table includes a first way that stores a first fetch address associated with the first way identifier and a second way that stores a second fetch address associated with the second way identifier. The first way and the second way are associated with an active fetch address. According to one implementation, the first way identifier and the second way identifier may “point” to the same way. According to another implementation, the first way identifier and the second way identifier may point to different ways. The first selection logic is configured to select the first way identifier or the second way identifier as a way pointer based on the active fetch address and historical prediction data. The second selection logic is configured to select the first fetch address or the second fetch address as a predicted fetch address based on the way pointer. By using a separate table (e.g., the target table) to store multiple fetch addresses as opposed to storing multiple (and sometimes identical) fetch addresses at different predictor tables, an amount of overhead may be reduced. Additionally, the historical prediction data may include an “abbreviated version” of the previously used fetch addresses (e.g., some bits of previously used fetch addresses) as opposed to the entire fetch addresses, data associated with way identifiers of the previously used fetch addresses, or a combination of both.
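The overhead argument above can be illustrated with a rough storage count. The following is a minimal sketch only: the table sizes, the 64-bit address width, and the function names are hypothetical values chosen for illustration, and tag storage is ignored in both schemes.

```python
# Rough storage comparison under assumed (hypothetical) sizes.
# Scheme 1: every predictor entry stores a full fetch address.
# Scheme 2: every predictor entry stores only a way identifier, and a
#           single shared target table stores the full fetch addresses.

def bits_with_full_addresses(num_tables, entries_per_table, address_bits):
    """Bits needed when each predictor entry holds an entire fetch address."""
    return num_tables * entries_per_table * address_bits


def bits_with_way_identifiers(num_tables, entries_per_table,
                              target_sets, target_ways, address_bits):
    """Bits needed when predictor entries hold way identifiers only."""
    # Enough bits to name any way of the target table (4 ways -> 2 bits).
    way_id_bits = (target_ways - 1).bit_length()
    predictor_bits = num_tables * entries_per_table * way_id_bits
    target_bits = target_sets * target_ways * address_bits
    return predictor_bits + target_bits


# Assumed configuration: 4 predictor tables of 256 entries each,
# a 64-set by 4-way target table, and 64-bit fetch addresses.
full = bits_with_full_addresses(4, 256, 64)
shared = bits_with_way_identifiers(4, 256, 64, 4, 64)
```

Under these assumed sizes the shared target table needs fewer bits, and the number of predictor entries can grow without adding a full address per entry.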
The most significant bits of a fetch address may not substantially change from one fetch address to another fetch address. Lower order bits (or a hash function) may be used to reduce a particular fetch address into a smaller number of bits. According to one example, the historical prediction data may include a way number (e.g., a way identifier) in the target table for each previously used fetch address. Thus, instead of 64-bit previously used fetch addresses, the historical prediction data may include some bits (e.g., three to five bits) for each previously used fetch address and a relatively small number of bits (e.g., two to three bits) to identify the way of each previously used fetch address. This reduction in bits may reduce the overhead at the processing system compared to conventional processing systems for predicting a fetch address of a target instruction.
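One way to picture the abbreviated history is as a shift register of small packed entries. The sketch below is illustrative only: the 4-bit and 2-bit widths are assumptions chosen from within the ranges mentioned above, and the function names and packing order are hypothetical.

```python
ADDRESS_LOW_BITS = 4  # assumed; the text gives three to five bits as an example
WAY_BITS = 2          # assumed; the text gives two to three bits as an example
ENTRY_BITS = ADDRESS_LOW_BITS + WAY_BITS


def compress_history_entry(fetch_address, way_index):
    """Pack the low-order address bits and the way number into one entry."""
    low = fetch_address & ((1 << ADDRESS_LOW_BITS) - 1)
    way = way_index & ((1 << WAY_BITS) - 1)
    return (low << WAY_BITS) | way


def push_history(history, entry, depth):
    """Shift a new entry into a history register that keeps `depth` entries."""
    mask = (1 << (ENTRY_BITS * depth)) - 1
    return ((history << ENTRY_BITS) | entry) & mask


# Example: a previously used fetch address found in way index 2.
entry = compress_history_entry(0x80881323, 2)
history = push_history(0, entry, depth=10)
```

With these assumed widths, a ten-deep history occupies 60 bits, compared with 640 bits if ten full 64-bit fetch addresses were kept.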
According to another implementation of the present disclosure, a method for predicting a fetch address of a next instruction to be fetched includes selecting, at a processor, a first way identifier or a second way identifier as a way pointer based on an active fetch address and historical prediction data. A first predictor table includes a first entry having the first way identifier and a second predictor table includes a second entry having the second way identifier. The method also includes selecting a first fetch address or a second fetch address as a predicted fetch address based on the way pointer. A target table includes a first way storing the first fetch address and a second way storing the second fetch address. The first way and the second way are associated with the active fetch address. The first fetch address is associated with the first way identifier and the second fetch address is associated with the second way identifier.
According to another implementation of the present disclosure, a non-transitory computer-readable medium includes commands for predicting a fetch address of a next instruction to be fetched. The commands, when executed by a processor, cause the processor to perform operations including selecting a first way identifier or a second way identifier as a way pointer based on an active fetch address and historical prediction data. A first predictor table includes a first entry having the first way identifier and a second predictor table includes a second entry having the second way identifier. The operations also include selecting a first fetch address or a second fetch address as a predicted fetch address based on the way pointer. A target table includes a first way storing the first fetch address and a second way storing the second fetch address. The first way and the second way are associated with the active fetch address. The first fetch address is associated with the first way identifier and the second fetch address is associated with the second way identifier.
According to another implementation of the present disclosure, an apparatus for predicting a fetch address of a next instruction to be fetched includes means for storing data. The means for storing data includes a plurality of predictor tables and a target table. The plurality of predictor tables includes a first predictor table and a second predictor table. The first predictor table includes a first entry having a first way identifier, and the second predictor table includes a second entry having a second way identifier. The target table includes a first way that stores a first fetch address associated with the first way identifier and a second way that stores a second fetch address associated with the second way identifier. The first way and the second way are associated with an active fetch address. The apparatus also includes means for selecting the first way identifier or the second way identifier as a way pointer based on the active fetch address and historical prediction data. The apparatus also includes means for selecting the first fetch address or the second fetch address as a predicted fetch address based on the way pointer.
Referring to
As explained below, the processing system 100 may predict the fetch address of the target instruction based on an active fetch address 110. According to one implementation, the active fetch address 110 may be based on a current program counter (PC) value. The processing system 100 includes a plurality of predictor tables, a global history table 112, first selection logic 114, a target table 118, and second selection logic 120. According to one implementation, the first selection logic 114 includes a first multiplexer and the second selection logic 120 includes a second multiplexer.
The plurality of predictor tables includes a predictor table 102, a predictor table 104, a predictor table 106, and a predictor table 108. Although four predictor tables 102-108 are shown, in other implementations, the processing system 100 may include additional (or fewer) predictor tables. As a non-limiting example, the processing system 100 may include eight predictor tables in another implementation.
Each predictor table 102-108 includes multiple entries that identify different fetch addresses. For example, the predictor table 102 includes a first plurality of entries 150, the predictor table 104 includes a second plurality of entries 160, the predictor table 106 includes a third plurality of entries 170, and the predictor table 108 includes a fourth plurality of entries 180. According to one implementation, different predictor tables 102-108 may have different sizes. To illustrate, different predictor tables 102-108 may have a different number of entries. As a non-limiting example, the fourth plurality of entries 180 may include more entries than the second plurality of entries 160.
The predictor tables 102-108 of the processing system 100 are shown in greater detail in
The predictor table 102 includes an entry 152, an entry 154, an entry 156, and an entry 158. According to one implementation, each entry 152-158 may be included in the first plurality of entries 150 of
The predictor table 104 includes an entry 162, an entry 164, an entry 166, and an entry 168. According to one implementation, each entry 162-168 may be included in the second plurality of entries 160 of
The predictor table 106 includes an entry 172, an entry 174, an entry 176, and an entry 178. According to one implementation, each entry 172-178 may be included in the third plurality of entries 170 of
The predictor table 108 includes an entry 182, an entry 184, an entry 186, and an entry 188. According to one implementation, each entry 182-188 may be included in the fourth plurality of entries 180 of
“0X80889823” and may include the way identifier “B”, the entry 186 may include a tag “0X80881323” and may include the way identifier “C”, and the entry 188 may include a tag “0X80888888” and may include the way identifier “D”.
A processor (e.g., in the processing system 100 of
The processor may determine that there are no entries in the predictor table 106 that match the active fetch address 110. Thus, the processor may not provide a way identifier to the first selection logic 114 as an output tag indicator 107 of the predictor table 106. The processor may determine that the entry 186 in the predictor table 108 matches the active fetch address 110. Based on this determination, the processor may provide the way identifier “C” to the first selection logic 114 as an output tag indicator 109 of the predictor table 108.
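The per-table tag match described above can be sketched as a simple search. This is a simplified illustration rather than the disclosed circuit: it compares full tags directly, the function name is hypothetical, and the active fetch address value used in the example is an assumption.

```python
from typing import List, Optional, Tuple


def lookup_way_identifier(entries: List[Tuple[int, str]],
                          active_fetch_address: int) -> Optional[str]:
    """Return the way identifier of the entry whose tag matches, else None.

    None models the case where a predictor table provides no way
    identifier to the first selection logic.
    """
    for tag, way_identifier in entries:
        if tag == active_fetch_address:
            return way_identifier
    return None


# Entries as (tag, way identifier) pairs; the tag values echo those
# quoted in the text, but the grouping into one table is illustrative.
example_entries = [
    (0x80889823, "B"),
    (0x80881323, "C"),
    (0x80888888, "D"),
]

# Assumed active fetch address, chosen so that one entry matches.
assumed_active_fetch_address = 0x80881323
hit = lookup_way_identifier(example_entries, assumed_active_fetch_address)
miss = lookup_way_identifier(example_entries, 0x12345678)
```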
In the illustrative example, each output tag indicator 103, 105, 107, 109 provides a different way identifier to the first selection logic 114. The first selection logic 114 may be configured to select the output tag indicator of the predictor table that has an entry matching the active fetch address 110 and that utilizes the largest amount of historical prediction data (associated with the global history table 112), as explained below. As described above, the output tag indicators 103, 105, 109 correspond to entries 152, 164, 186, respectively, having tags that identify the active fetch address 110. Thus, as explained below, the first selection logic 114 may determine which output tag indicator 103, 105, 109 to select based on the amount of historical prediction data associated with each output tag indicator 103, 105, 109. In a scenario where only one output tag indicator corresponds to an entry having a tag that identifies the active fetch address 110, the first selection logic 114 may select that output tag indicator.
Referring back to
The processing system 100 may provide the historical prediction data 113 to the predictor table 104, to the predictor table 106, and to the predictor table 108. For example, the processing system 100 may provide a first amount of the historical prediction data 113 to the predictor table 104 with the active fetch address 110 to generate the output tag indicator 105, the processing system 100 may provide a second amount of the historical prediction data 113 (that is greater than the first amount) to the predictor table 106 with the active fetch address 110 to generate the output tag indicator 107, and the processing system 100 may provide a third amount of the historical prediction data 113 (that is greater than the second amount) to the predictor table 108 with the active fetch address 110 to generate the output tag indicator 109.
Because the processing system 100 generates the output tag indicator 103 from the predictor table 102 based solely on the active fetch address 110, the output tag indicator 103 may not be as reliable as the output tag indicators 105, 107, 109 that are generated based on increasing amounts of the historical prediction data 113. Furthermore, because the output tag indicator 107 is generated using more of the historical prediction data 113 than the amount of historical prediction data 113 used to generate the output tag indicator 105, the output tag indicator 107 may be more reliable than the output tag indicator 105. Similarly, because the output tag indicator 109 is generated using more of the historical prediction data 113 than the amount of historical prediction data 113 used to generate the output tag indicator 107, the output tag indicator 109 may be more reliable than the output tag indicator 107.
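The preference for the output generated from the most history can be sketched as a priority selection. This is a minimal illustration, assuming the outputs are listed in order of increasing history and that a table with no matching entry contributes None; the function name and the way identifiers shown for the first two tables are hypothetical.

```python
from typing import Optional, Sequence


def select_way_pointer(outputs: Sequence[Optional[str]]) -> Optional[str]:
    """Pick the way identifier from the matching table that uses the most history.

    `outputs` is ordered from the table using the least history to the
    table using the most history; None means that table had no match.
    """
    for way_identifier in reversed(outputs):
        if way_identifier is not None:
            return way_identifier
    return None


# Outputs for four tables in order of increasing history. The third table
# has no match; "A" and "B" are hypothetical placeholders.
outputs = ["A", "B", None, "C"]
way_pointer = select_way_pointer(outputs)
```

If the table using the most history had missed as well, the selection would fall back to the next matching table in the list.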
In the example illustrated in
The target table 118 includes multiple fetch addresses that are separated by sets (e.g., rows) and ways (e.g., columns). In the illustrative example, the target table 118 includes four sets (e.g., “Set 1”, “Set 2”, “Set 3”, and “Set 4”). The target table 118 may also include four ways (e.g., “Way A”, “Way B”, “Way C”, and “Way D”). Although the target table 118 is shown to include four sets and four ways, in other implementations, the target table 118 may include additional (or fewer) ways and sets. As a non-limiting example, target table 118 may include sixteen sets and thirty-two ways.
The processing system 100 may provide the active fetch address 110 to the target table 118. The active fetch address 110 may indicate a particular set of fetch addresses in the target table 118 to be selected. In the illustrative example of
Each way in the target table 118 corresponds to a particular way identifier in the predictor tables 102-108. As described with respect to the example in
Thus, the second selection logic 120 may select the predicted fetch address 140 in the target table 118 as a fetch address 122 for a target instruction based on the way indicated by the selected way pointer 116 and the set indicated by the active fetch address 110. The processing system 100 may use the fetch address 122 to locate the next instruction to be executed (e.g., the target instruction).
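The set-and-way lookup in the target table can be sketched as a two-dimensional index. This is an illustration under stated assumptions: the set is derived from the low-order bits of the active fetch address (the text does not specify the mapping), the stored addresses are placeholder values, and the function names are hypothetical.

```python
WAYS = ("A", "B", "C", "D")


def set_index(active_fetch_address, num_sets):
    """Assumed mapping: low-order address bits choose the set."""
    return active_fetch_address % num_sets


def predict_fetch_address(target_table, active_fetch_address, way_pointer):
    """Select the set from the address and the way from the way pointer."""
    row = target_table[set_index(active_fetch_address, len(target_table))]
    return row[WAYS.index(way_pointer)]


# Placeholder 4-set by 4-way target table; the contents are made up.
target_table = [
    [0x1000 * (s + 1) + 0x10 * w for w in range(len(WAYS))]
    for s in range(4)
]

# Assumed active fetch address and a way pointer of "C".
predicted = predict_fetch_address(target_table, 0x80881323, "C")
```

Here the second selection logic is modeled by the final indexing step: the set narrows the candidates to one row, and the way pointer picks a single fetch address from that row.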
The techniques described with respect to
Referring to
The method 300 includes selecting, at a processor, a first way identifier or a second way identifier as a way pointer based on an active fetch address and historical prediction data, at 302. A first predictor table includes a first entry having the first way identifier and a second predictor table includes a second entry having the second way identifier. For example, referring to
The method 300 also includes selecting a first fetch address or a second fetch address as a predicted fetch address based on the way pointer, at 304. A target table includes a first way storing the first fetch address and a second way storing the second fetch address. The first way and the second way may be associated with the active fetch address. The first fetch address is associated with the first way identifier and the second fetch address is associated with the second way identifier. For example, referring to
According to one implementation of the method 300, the first selection logic 114 includes a first multiplexer, and the second selection logic 120 includes a second multiplexer. The method 300 may also include storing the historical prediction data 113 at the global history table 112 that is accessible to the processor (e.g., the processing system 100). The historical prediction data 113 includes one or more fetch addresses for one or more previous indirect branches. The method 300 may also include storing most significant bits of each fetch address of the one or more fetch addresses at the global history table to reduce overhead.
According to one implementation, the method 300 includes generating the first entry based on a first amount of the historical prediction data. For example, the entries 162-168 in the predictor table 104 may be generated based on the first amount of the historical prediction data 113. The method 300 may also include generating the second entry based on a second amount of the historical prediction data that is greater than the first amount of the historical prediction data. For example, the entries 172-178 in the predictor table 106 may be generated based on the second amount of the historical prediction data 113 that is greater than the first amount of the historical prediction data 113. According to one implementation, the method 300 includes selecting the second way identifier as the way pointer if the second entry (e.g., the entry generated based on a larger amount of the historical prediction data) matches the active fetch address. The method 300 may also include selecting the first way identifier as the way pointer if the second entry fails to match the active fetch address and the first entry matches the active fetch address.
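The two steps of the method 300, together with the priority rule just described, can be sketched end to end. This is a simplified model: entries are plain dictionaries with hypothetical keys, the ways of the selected set are given as a mapping, and returning None when neither entry matches is an assumption, since the text does not address that case.

```python
def predict_with_two_entries(first_entry, second_entry,
                             ways_for_active_set, active_fetch_address):
    """Sketch of steps 302 and 304 for one first entry and one second entry.

    The second entry is assumed to have been generated from more
    historical prediction data than the first entry.
    """
    # Step 302: select a way identifier as the way pointer.
    if second_entry["tag"] == active_fetch_address:
        way_pointer = second_entry["way"]
    elif first_entry["tag"] == active_fetch_address:
        way_pointer = first_entry["way"]
    else:
        return None  # assumption: no prediction when nothing matches

    # Step 304: select the fetch address stored in the indicated way.
    return ways_for_active_set[way_pointer]


# Hypothetical entries and target-table contents for illustration.
active = 0x80881323
first = {"tag": 0x80881323, "way": "A"}
second = {"tag": 0x80881323, "way": "B"}
ways = {"A": 0x4000, "B": 0x4010}

both_match = predict_with_two_entries(first, second, ways, active)
only_first = predict_with_two_entries(first, {"tag": 0, "way": "B"}, ways, active)
```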
The method 300 of
In particular implementations, the method 300 of
Referring to
The memory 432 may be a memory device, such as a random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). The memory device may include commands (e.g., the commands 460) that, when executed by a computer (e.g., processor 410), may cause the computer to perform the method 300 of
In conjunction with the described implementations, an apparatus for predicting a fetch address of a next instruction to be fetched includes means for storing data. For example, the means for storing data may include a memory system component (e.g., components storing the tables) of the processing system 100 of
The apparatus may also include means for selecting the first way identifier or the second way identifier as a way pointer based on the active fetch address and historical prediction data. For example, the means for selecting the first way identifier or the second way identifier may include the first selection logic 114 of
The apparatus may also include means for selecting the first fetch address or the second fetch address as a predicted fetch address based on the way pointer. For example, the means for selecting the first fetch address or the second fetch address may include the second selection logic 120 of
The foregoing disclosed devices and functionalities may be designed and configured into computer files (e.g., RTL, GDSII, GERBER, etc.) stored on computer-readable media. Some or all such files may be provided to fabrication handlers who fabricate devices based on such files. Resulting products include semiconductor wafers that are then cut into semiconductor dies and packaged into semiconductor chips. The chips are then employed in devices, such as a communications device (e.g., a mobile phone), a tablet, a laptop, a personal digital assistant (PDA), a set top box, a music player, a video player, an entertainment unit, a navigation device, a fixed location data unit, a server, or a computer.
Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processing device such as a hardware processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or executable software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in a memory device, such as random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, or a compact disc read-only memory (CD-ROM). An exemplary memory device is coupled to the processor such that the processor can read information from, and write information to, the memory device. In the alternative, the memory device may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.
The previous description of the disclosed implementations is provided to enable a person skilled in the art to make or use the disclosed implementations. Various modifications to these implementations will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other implementations without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the implementations shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.