The present invention relates generally to the area of accessing memory. More particularly, the present invention relates to quickly selecting a wordline from a memory array in a case where an address is based on a combination of operands.
Memory addresses in traditional processors are typically computed by adding two operands, such as a base address and an offset address, to arrive at an effective address. Base+offset addressing is typically used to address memory within data caches as well as data or instructions within other CPU memory units. For example, Translation Lookaside Buffers (TLBs) typically use base+offset addition in order to access a buffer location within the TLB. Because an addition is typically performed to arrive at the effective address, traditional processors usually take at least two cycles to access the memory. A first cycle is used to add the base and offset addresses and a second cycle is used to access the memory. Consequently, the cycle immediately following a load instruction cannot use the result of the load operation. This delay is referred to as “load latency.” Load latency is a performance-limiting factor in traditional processors. It often manifests itself in a pipelined processor as a load-use penalty, with the load results being unavailable for two machine cycles.
Therefore, what is needed is a system and method that improves access to a memory array based on multiple operand addressing.
The foregoing and further and more specific objects and advantages of the invention will become readily apparent to those skilled in the art from the following detailed description of a preferred embodiment thereof taken in conjunction with the following drawings:
In one aspect, an address for accessing a data entry is obtained using the sum of two operands. To determine whether data is present in the cache, a TAG in the cache is accessed. Only a few bits of the address are used for accessing the TAG. The corresponding bits from the two operands are used directly to initiate the access of the TAG rather than waiting for the complete sum of the two operands. The corresponding bits from the two operands are further divided into two subsets of bits. The subsets from each operand are input to a fast address decoder (FADec) that decodes both the sum with a carry and the sum without a carry. Both cases are decoded because the carry is not known until the addition of the operands is complete. A further decode is provided based on the outputs of the FADecs. The sum of the operands then becomes available so that the proper entry in the memory is provided as the TAG. Thus, much of the activity required for providing the TAG is accomplished while the sum of the two operands is being calculated. This is better understood by reference to the drawings and the following description.
The following is intended to provide a detailed description of an example of the invention and should not be taken to be limiting of the invention itself. Rather, any number of variations may fall within the scope of the invention, which is defined in the claims following the description.
At step 110, the base and offset addresses (operands) are received. Two parallel processes commence at this point. One process evaluates the address bits (e.g., bits 48 through 51) to arrive at two possible wordlines (as used herein, a “wordline” is an address of an entry in the memory array or an actual memory array entry, as the context indicates). The other process determines if a carry results from bits in the operands (e.g., bits 52 through 63) and adds the carry value to the LSBs of the bits of the Operand A and B used to address the memory entry. The summation value determines which of the possible wordlines is the actual wordline.
The first parallel process commences at step 115 which runs the bits that are used to access the memory array (e.g., bits 48 through 51 for both Operands A and B) through PGZO generation logic. PGZO generation logic combines pairs of bits using logical operators (XOR, OR, AND, NAND) to create PGZO values. PGZO values are generated for the MSBs (bit 48 from both operands), bit 49 from both operands, bit 50 from both operands and from the LSBs (bit 51 from both operands). In the example shown, four bits are provided from the base and offset to generate a four bit effective address. Therefore, in the example shown, the effective address can be used to access a memory entry from a sixteen entry memory array. In step 120, the PGZO values for the various pairs of bits are run through wordline generators (see
In the embodiment shown, there are two wordline possibilities because there may be a carry resulting from the bits that are less significant than the LSB used in the address. In the embodiment shown, the bits that are less significant are bits 52 through 63 for both operands A and B. The second parallel process is used to determine whether the odd or even wordline is the correct wordline from memory array 130. Steps 140 and 150 take place in parallel with steps 115 and 120. In step 140, a fast carry generation is performed for bits 52 through 63 for both operands A and B. In step 150, the carry out value generated in step 140 is summed (added) to the least-significant-bits (LSBs) of the Operands A and B. A determination is made as to whether the sum operation results in a “1” or a “0” (decision 160). If the sum operation results in a “0,” decision 160 branches to “no” branch 165 whereupon, at step 170, even possible wordline 175 is selected. On the other hand, if the sum operation results in a “1,” then decision 160 branches to “yes” branch 180 whereupon, at step 185, odd possible wordline 190 is selected. At step 195, the selected wordline is retrieved from memory array 130.
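The flow of steps 110 through 195 can be illustrated with a small behavioral model. The following Python sketch is not the circuit itself: it assumes 64-bit operands numbered with bit 0 as the most significant bit and bit 63 as the least significant bit, and the helper names (`field`, `candidate_wordlines`, `sum_bit`, `select_wordline`, `reference_wordline`) are hypothetical names introduced only for illustration. It models the candidate wordlines arithmetically rather than through PGZO logic.

```python
MASK64 = (1 << 64) - 1

def field(x, first, last):
    """Return bits first..last of a 64-bit value (bit 0 = MSB, bit 63 = LSB)."""
    width = last - first + 1
    return (x >> (63 - last)) & ((1 << width) - 1)

def candidate_wordlines(a, b):
    """Steps 115/120: the two possible wordlines, returned as (even, odd)."""
    no_carry = (field(a, 48, 51) + field(b, 48, 51)) & 0xF
    with_carry = (no_carry + 1) & 0xF
    # The two candidates differ by one, so exactly one is even and one is odd.
    return (no_carry, with_carry) if no_carry % 2 == 0 else (with_carry, no_carry)

def sum_bit(a, b):
    """Steps 140/150: carry out of bits 52..63 added to bit 51 of both operands."""
    carry = (field(a, 52, 63) + field(b, 52, 63)) >> 12
    return field(a, 51, 51) ^ field(b, 51, 51) ^ carry

def select_wordline(a, b):
    """Decision 160: a sum of 1 selects the odd candidate, 0 the even one."""
    even, odd = candidate_wordlines(a, b)
    return odd if sum_bit(a, b) else even

def reference_wordline(a, b):
    """Conventional approach: full 64-bit add first, then take bits 48..51."""
    return field((a + b) & MASK64, 48, 51)

print(select_wordline(0x7FFF, 0x1))  # -> 8
```

In this model the selected wordline matches the one obtained from a full addition, because the true value of bit 51 of the sum has the same parity as exactly one of the two candidates.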
In parallel with PGZO generation logic 115 and wordline generators 200 and 210, fast carry generation logic is performed on bits 52 through 63 and the carry value is added to the LSB of the memory address bits of the Operands A and B. This results in sum value 230 which is either ‘0’ or ‘1,’ and sum bar 235 which is the opposite of the sum value (‘1’ if sum is ‘0’, ‘0’ if sum is ‘1’).
Match selector and DLatch 250 selects either the possible odd memory array entry address (205) or the possible even memory array entry address (215) depending on the value of sum and sum bar. The selected memory array address (270) is then retrieved from memory array 130.
Fast carry generation and sum logic 225 includes fast carry generation circuitry 300 that receives the less significant bits from Operands A and B (bits 52-63) and generates carry out value 305. Fast carry generation and sum logic 225 also includes addition circuitry 310 that adds the least significant address bit (LSB bit 51) from Operand A, the least significant address bit (LSB bit 51) from Operand B, and the carry out value to generate sum 230 and sum bar 235.
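As a rough illustration of what carry generation and the one-bit addition compute, the sketch below derives the carry out from per-bit generate and propagate terms and then forms sum and sum bar. It assumes the carry is produced from bits 52 through 63 (as in the flowchart description), bit 0 as the MSB, and a carry-in of zero. The function names are hypothetical, and the loop is a sequential stand-in for whatever parallel carry logic the circuitry actually uses.

```python
def bit(x, i):
    """Bit i of a 64-bit value, with bit 0 as the MSB and bit 63 as the LSB."""
    return (x >> (63 - i)) & 1

def carry_out(a, b, first=52, last=63):
    """Carry out of bits first..last, built from generate/propagate terms."""
    group_generate = 0  # carry-in is assumed to be zero
    for i in range(last, first - 1, -1):  # from bit 63 up to bit 52
        g = bit(a, i) & bit(b, i)  # this bit position generates a carry
        p = bit(a, i) ^ bit(b, i)  # this bit position propagates a carry
        group_generate = g | (p & group_generate)
    return group_generate

def sum_and_sum_bar(a, b):
    """Add bit 51 of both operands and the carry; return (sum, sum bar)."""
    s = bit(a, 51) ^ bit(b, 51) ^ carry_out(a, b)
    return s, s ^ 1

print(carry_out(0xFFF, 0x001))         # -> 1
print(sum_and_sum_bar(0x7FFF, 0x001))  # -> (0, 1)
```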
Match selector and Dlatch circuitry 250 includes match selector circuitry 320 which receives the possible odd and even memory array entry wordlines (205 and 215) along with sum 230 and sum bar 235 and selects one wordline. Dlatch circuitry 330 operates to latch a memory array wordline corresponding to the selected memory array entry address from memory array 130, resulting in matching memory array entry 195. Memory array 130 may be a TLB, a data cache or an instruction cache. Matching memory array entry 195, therefore, may be data or an instruction used by a process or processed by a processor.
In an alternate embodiment, the two possible wordline entries corresponding to odd memory array entry wordline 205 and even memory array entry wordline entry 215 are retrieved from memory array 130 and stored in a separate buffer (buffer 350) prior to the latching operation. This embodiment may be used when the possible wordline entries (205 and 215) are identified before sum 230 and sum bar 235 are provided by sum logic 225. In this embodiment, latch circuitry 330 operates to latch one of the two memory array entries that have been stored in buffer 350 resulting in matching memory array entry 195.
Sum 230 is ANDed with each of the possible odd wordlines and sum bar 235 is ANDed with each of the possible even wordlines. In other words, both the wordline and the sum or sum bar have to be enabled in order for the signal to access one of the array entries within memory array 130. For example, assume that the possible odd wordline is WL 7 and the possible even wordline is WL 6. If sum is enabled (i.e., ‘1’), then sum bar would be ‘0’ and the result of the AND operations would result in WL 7 being selected (both WL 7 and sum are enabled) and WL 6 would not be selected (WL 6 being enabled but sum bar not being enabled). On the other hand, if sum bar is enabled, then the opposite result would occur: both WL 6 and sum bar would be enabled so the result of the AND operations would propagate the WL 6 signal to memory array 130, and WL 7 would not propagate because while WL 7 is enabled, sum would not be enabled.
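The AND gating described above can be sketched as follows. This is a behavioral illustration only: the sixteen wordlines are modeled as a Python list of 0/1 values, and the function name `gate_wordlines` is hypothetical.

```python
def gate_wordlines(possible_wordlines, sum_value):
    """AND odd wordlines with sum and even wordlines with sum bar."""
    sum_bar = sum_value ^ 1
    return [
        wl & (sum_value if index % 2 else sum_bar)
        for index, wl in enumerate(possible_wordlines)
    ]

# Example from the text: WL 6 (even) and WL 7 (odd) are both possible.
possible = [0] * 16
possible[6] = 1
possible[7] = 1

selected = gate_wordlines(possible, sum_value=1)
print(selected.index(1))  # -> 7
```

With `sum_value=0` the same call leaves only WL 6 enabled, mirroring the second case in the paragraph above.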
The result of each of the PGZO generations is a P value (by XORing the inputs), a G value (by ANDing the inputs), a Z value (by ANDing the inverted inputs), and an O value (by ORing the inputs). In addition, a P bar value and a G bar value are generated, with P bar being the inverse of the XOR value (by XNORing the inputs), and with G bar being the inverse of the AND value (by NANDing the inputs). As used herein, “PGZO” refers to one or more values generated by XORing bits, XNORing bits, ANDing bits, NANDing bits, ORing bits, and ANDing inverted bit values. Each logical operation may not be performed for every pair of bits. As input to the wordline generators shown in
The NMOS n1 and n2 are in the top level of the NMOS stacks. Either n1 or n2 would be ON depending on the inputs a and aa. Similarly, the NMOS n3, n4 and n5 are at the same level below the top level of the NMOS stacks. Only one of n3, n4 and n5 would be ON depending on the inputs b, bb and bbb. The NMOS n6, n7 and n8 are in the middle level of the NMOS stack. Only one of n6, n7 and n8 would be ON depending on the inputs c, cc and ccc.
The NMOS n9 and n10 are in the lower level of the NMOS stack. Either n9 or n10 would be ON depending on the inputs d and dd. Therefore, during the time when clk is high, there are two possibilities. Depending upon the inputs, a conductive path from the precharged node 730 to the ground GND may discharge the precharged node 730 to LOW. In that case, the inverter 720, whose input is connected to the precharged node, drives a HIGH to the output WL. The inverter 710, whose input is also connected to the precharged node 730, drives a HIGH to PMOS p2, turning OFF the PMOS p2. Alternatively, when there is no conductive path from the precharged node 730 to ground GND, the precharged node 730 remains in the precharged state. The keeper PMOS p2 actively keeps the precharged node 730 in the precharged state.
In
The NMOS n12 and n13 are in the top level of the NMOS stacks. Either n12 or n13 would be ON depending on the inputs a and aa. Similarly, the NMOS n14, n15 and n16 are at the same level below the top level of the NMOS stacks. Only one of n14, n15 and n16 would be ON depending on the inputs b, bb and bbb. The NMOS n17, n18 and n19 are in the middle level of the NMOS stack. Only one of n17, n18 and n19 would be ON depending on the inputs c, cc and ccc. The NMOS n20 and n21 are in the lower level of the NMOS stack. Either n20 or n21 would be ON depending on the inputs d and dd. Therefore, during the time when clk is high, there are two possibilities. Depending upon the inputs, a conductive path from the precharged node 830 to the ground GND may discharge the precharged node 830 to LOW. In that case, the inverter 820, whose input is connected to the precharged node, drives a HIGH to the output WL. The inverter 710, whose input is also connected to the precharged node 830, drives a HIGH to PMOS p4, turning OFF the PMOS p4. Alternatively, when there is no conductive path from the precharged node 830 to ground GND, the precharged node 830 remains in the precharged state. The keeper PMOS p4 actively keeps the precharged node 830 in the precharged state.
Two wordline generators are depicted in
In the embodiment shown, a sixteen entry memory array is used. Larger or smaller memory arrays could be used according to the teachings provided herein. To determine if the first memory entry is a possibility (WL 0), PGZO inputs are provided to the Or11n wordline generator (see
In order to determine if the third memory entry is a possibility (WL 2), PGZO inputs are provided to the Or22n wordline generator (see
In order to determine if the fifth memory entry is a possibility (WL 4), PGZO inputs are provided to the Or22n wordline generator (see
To determine if the seventh memory entry is a possibility (WL 6), PGZO inputs are provided to the Or11n wordline generator (see
To determine if the ninth memory entry is a possibility (WL 8), PGZO inputs are provided to the Or11n wordline generator (see
In order to determine if the eleventh memory entry is a possibility (WL 10), PGZO inputs are provided to the Or22n wordline generator (see
In order to determine if the thirteenth memory entry is a possibility (WL 12), PGZO inputs are provided to the Or22n wordline generator (see
Finally, to determine if the fifteenth memory entry is a possibility (WL 14), PGZO inputs are provided to the Or11n wordline generator (see
As a result of the PGZO values being mapped and supplied to the wordline generators as described above, two possible wordlines will be ON and will provide input to match selector/Dlatch circuitry 250. In addition, circuitry 250 receives sum and sum bar from fast carry generation and sum logic 225. In one embodiment, shown in
The subscript next to each P, G, Z, or O value indicates which bit pairing is used to generate the respective value, with ‘1’ being the LSB and ‘4’ being the MSB. In addition, a line over a P, G, Z, or O indicates that the inverse of the logic function is provided as input. For example, a P4 indicates that the input is a result of an XOR of the MSBs (i.e., bit 48 from Operands A and B). Likewise, a G3 indicates that the input is a result of an AND of bit 49 from Operands A and B. A Z2 indicates that the input is a result of an AND of the inverted bit values of bit 50 from Operands A and B. An O1 indicates that the input is a result of an OR of the LSBs (bit 51 from Operands A and B).
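The PGZO values and the subscript convention can be illustrated with a short sketch. It uses the logic functions given in the description (XOR for P, AND for G, AND of the inverted inputs for Z, OR for O, plus the inverses of P and G) and assumes bit 0 is the MSB of a 64-bit operand. The function names and the dictionary layout are hypothetical and chosen only for readability.

```python
def bit(x, i):
    """Bit i of a 64-bit value, with bit 0 as the MSB and bit 63 as the LSB."""
    return (x >> (63 - i)) & 1

def pgzo(a_bit, b_bit):
    """PGZO values for one pair of operand bits."""
    p = a_bit ^ b_bit
    g = a_bit & b_bit
    return {
        "P": p,                          # XOR of the inputs
        "G": g,                          # AND of the inputs
        "Z": (a_bit ^ 1) & (b_bit ^ 1),  # AND of the inverted inputs
        "O": a_bit | b_bit,              # OR of the inputs
        "P_bar": p ^ 1,                  # XNOR of the inputs
        "G_bar": g ^ 1,                  # NAND of the inputs
    }

def pgzo_for_address_bits(a, b):
    """Map subscripts 4..1 to bits 48..51 (subscript 4 = MSB, 1 = LSB)."""
    return {sub: pgzo(bit(a, 52 - sub), bit(b, 52 - sub)) for sub in (4, 3, 2, 1)}

values = pgzo_for_address_bits(1 << 15, 0)  # only bit 48 of Operand A is set
print(values[4]["P"], values[1]["Z"])       # -> 1 1
```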
The tables below detail the inputs shown in
To determine whether WL 0 is a possibility, a copy of the Or11n wordline generator is used (see
To determine whether WL 1 is a possibility, a copy of the Or11n wordline generator is used (see
To determine whether WL 2 is a possibility, a copy of the Or22n wordline generator is used (see
To determine whether WL 3 is a possibility, a copy of the Or22n wordline generator is used (see
Turning to
To determine whether WL 5 is a possibility, a copy of the Or22n wordline generator is used (see
To determine whether WL 6 is a possibility, a copy of the Or11n wordline generator is used (see
To determine whether WL 7 is a possibility, a copy of the Or11n wordline generator is used (see
Turning to
To determine whether WL 9 is a possibility, a copy of the Or11n wordline generator is used (see
To determine whether WL 10 is a possibility, a copy of the Or22n wordline generator is used (see
To determine whether WL 11 is a possibility, a copy of the Or22n wordline generator is used (see
Turning to
To determine whether WL 13 is a possibility, a copy of the Or22n wordline generator is used (see
To determine whether WL 14 is a possibility, a copy of the Or11n wordline generator is used (see
Finally, in order to determine whether WL 15 is a possibility, a copy of the Or11n wordline generator is used (see
PCI bus 1414 provides an interface for a variety of devices that are shared by host processor(s) 1400 and Service Processor 1416 including, for example, flash memory 1418. PCI-to-ISA bridge 1435 provides bus control to handle transfers between PCI bus 1414 and ISA bus 1440, universal serial bus (USB) functionality 1445, power management functionality 1455, and can include other functional elements not shown, such as a real-time clock (RTC), DMA control, interrupt support, and system management bus support.
Nonvolatile RAM 1420 is attached to ISA Bus 1440. Service Processor 1416 includes JTAG and I2C buses 1422 for communication with processor(s) 1400 during initialization steps. JTAG/I2C buses 1422 are also coupled to L2 cache 1404, Host-to-PCI bridge 1406, and main memory 1408 providing a communications path between the processor, the Service Processor, the L2 cache, the Host-to-PCI bridge, and the main memory. Service Processor 1416 also has access to system power resources for powering down information handling device 1401.
Peripheral devices and input/output (I/O) devices can be attached to various interfaces (e.g., parallel interface 1462, serial interface 1464, keyboard interface 1468, and mouse interface 1470) coupled to ISA bus 1440. Alternatively, many I/O devices can be accommodated by a super I/O controller (not shown) attached to ISA bus 1440.
In order to attach computer system 1401 to another computer system to copy files over a network, LAN card 1430 is coupled to PCI bus 1410. Similarly, to connect computer system 1401 to an ISP to connect to the Internet using a telephone line connection, modem 1475 is connected to serial port 1464 and PCI-to-ISA Bridge 1435.
While the computer system described in
One of the preferred implementations of the invention is a client application, namely, a set of instructions (program code) or other functional descriptive material in a code module that may, for example, be resident in the random access memory of the computer. Until required by the computer, the set of instructions may be stored in another computer memory, for example, in a hard disk drive, or in a removable memory such as an optical disk (for eventual use in a CD ROM) or floppy disk (for eventual use in a floppy disk drive), or downloaded via the Internet or other computer network. Thus, the present invention may be implemented as a computer program product for use in a computer. In addition, although the various methods described are conveniently implemented in a general purpose computer selectively activated or reconfigured by software, one of ordinary skill in the art would also recognize that such methods may be carried out in hardware, in firmware, or in more specialized apparatus constructed to perform the required method steps. Functional descriptive material is information that imparts functionality to a machine. Functional descriptive material includes, but is not limited to, computer programs, instructions, rules, facts, definitions of computable functions, objects, and data structures.
Shown in
In operation, FADec 1504 receives three bits A52-A54 from operand A and three bits B52-B54 from operand B and provides an active output on one of the four even bits 1520 and one of the four odd bits 1522. Similarly, FADec 1506 receives three bits A55-A57 from operand A and three bits B55-B57 from operand B and provides an active output on one of the four even bits 1524 and one of the four odd bits 1526. In this example of six bits 52-57, bits 55-57 are the least significant bits (lsb) and bits 52-54 are the most significant bits (msb). The two active lines in even bits 1520 and odd bits 1522 indicate the two possible selected values based on having a carry bit and not having a carry bit. Thus, the lower value is the decode of the partial sum of bits A52-A54 and B52-B54 without a carry bit, which can also be stated as a carry of zero. The higher value, which is one higher than the lower value, is a decode of the sum of A52-A54 and B52-B54 with a carry bit. Since they are only one apart, one decoded value is certain to be for an odd sum of bits A52-A54 and B52-B54 and the other is certain to be for an even sum. This operation is carried out while the relatively slower 64-bit adder 1502 is summing A0-A63 and B0-B63. The speed of an adder generally decreases as the number of bits being added increases. Thus, although the function being performed by FADec 1504 is more complex than just an add, it is significantly faster than adder 1502 because far fewer bits, only three from each operand, are being decoded.
FADec 1506 operates in the same fashion as FADec 1504 except with bits A55-A57 and B55-B57 as the inputs. Thus, FADec 1506 provides one of four even outputs 1524 in an active state and one of four odd outputs 1526 in an active state. The two active lines are the decode of the sum of A55-A57 and B55-B57 with and without a carry bit, which are the only two possibilities.
Decoder 1508 responds to the active signals of bits 1520, 1522, 1524, and 1526 by providing one active wordline signal for each of arrays 1510, 1512, 1514, and 1516. The logical combination of even bits 1520 and even bits 1524 determines the selected wordline for array 1510. The logical combination of even bits 1520 and odd bits 1526 determines the selected wordline for array 1512. The logical combination of odd bits 1522 and even bits 1524 determines the selected wordline for array 1514. The logical combination of odd bits 1522 and odd bits 1526 determines the selected wordline for array 1516. Decoder 1508 also functions as a wordline driver. The 16 lines coming into each of arrays 1510, 1512, 1514, and 1516 are effectively wordlines; the memory cells connected to them, which are in the arrays, are not shown. This function of decoder 1508 occurs while adder 1502 continues to complete the summing function. Arrays 1510, 1512, 1514, and 1516 respond to the wordline activation by each providing an output, which is 36 bits, to multiplexer 1518. Adder 1502 by this time has calculated the sum and provides the result for bits 54 and 57 to multiplexer 1518. These two bits are sufficient to determine which of arrays 1510, 1512, 1514, and 1516 has the data corresponding to the correct address. If bit 54 is a one, then the activated line of odd bits 1522 is the valid line. On the other hand, if bit 54 is a zero, then the activated line of even bits 1520 is the valid line. Similarly, if bit 57 is a one, then the activated line of odd bits 1526 is the valid line. On the other hand, if bit 57 is a zero, then the activated line of even bits 1524 is the valid line. Multiplexer 1518 thus functions to select which of arrays 1510, 1512, 1514, and 1516 provides the output as the TAG output of TAG memory 1500.
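The cooperation of the two FADecs, decoder 1508, the four arrays, and multiplexer 1518 can be modeled behaviorally. The sketch below is only an illustration under stated assumptions: bit 0 is the MSB of a 64-bit operand, a 64-entry TAG is split into four 16-entry arrays keyed by the parity of the msb and lsb groups, and the wordline ordering inside each array (msb line index times four plus lsb line index) is an arbitrary choice made for this model. All function and variable names are hypothetical, and placeholder strings stand in for the 36-bit array outputs.

```python
MASK64 = (1 << 64) - 1

def field(x, first, last):
    """Bits first..last of a 64-bit value (bit 0 = MSB, bit 63 = LSB)."""
    return (x >> (63 - last)) & ((1 << (last - first + 1)) - 1)

def fadec(a3, b3):
    """Decode a 3-bit partial sum with and without carry into one-hot lines."""
    s0 = (a3 + b3) & 7          # sum without a carry
    s1 = (s0 + 1) & 7           # sum with a carry
    even_val, odd_val = (s0, s1) if s0 % 2 == 0 else (s1, s0)
    even = [int(even_val == 2 * i) for i in range(4)]
    odd = [int(odd_val == 2 * i + 1) for i in range(4)]
    return even, odd

def decode(msb_lines, lsb_lines):
    """Decoder: AND every msb line with every lsb line, giving 16 wordlines."""
    return [m & l for m in msb_lines for l in lsb_lines]

def split_into_arrays(tags):
    """Distribute 64 entries over four arrays keyed by (msb parity, lsb parity)."""
    return {
        (mp, lp): [tags[((2 * i + mp) << 3) | (2 * j + lp)]
                   for i in range(4) for j in range(4)]
        for mp in (0, 1) for lp in (0, 1)
    }

def tag_lookup(a, b, arrays):
    msb_even, msb_odd = fadec(field(a, 52, 54), field(b, 52, 54))
    lsb_even, lsb_odd = fadec(field(a, 55, 57), field(b, 55, 57))
    msb = {0: msb_even, 1: msb_odd}
    lsb = {0: lsb_even, 1: lsb_odd}
    # Each array is read speculatively using its own active wordline.
    outputs = {
        key: array[decode(msb[key[0]], lsb[key[1]]).index(1)]
        for key, array in arrays.items()
    }
    # The full sum arrives later; bits 54 and 57 pick the valid array output.
    total = (a + b) & MASK64
    return outputs[(field(total, 54, 54), field(total, 57, 57))]

TAGS = [f"tag{i}" for i in range(64)]  # placeholder contents
ARRAYS = split_into_arrays(TAGS)
```

In this model, `tag_lookup(a, b, ARRAYS)` returns the same entry as indexing `TAGS` with bits 52 through 57 of the full sum, while everything before the final selection depends only on the operand bits and not on the completed addition.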
Shown in
Another possibility is to select the value in one of FADecs 1504 and 1506. This would be a little slower still than the example shown in
Various other changes and modifications to the embodiments herein chosen for purposes of illustration will readily occur to those skilled in the art. For example, specific numbers of bits were described but other numbers could be used. Examples were described to aid in understanding. It was not intended that these examples were the only examples. To the extent that such modifications and variations do not depart from the spirit of the invention, they are intended to be included within the scope thereof which is assessed only by a fair interpretation of the following claims.
This application is a continuation-in-part of U.S. patent application Ser. No. 11/257,932, titled “System and Method for Memory Array Access with Fast Address Decoder,” filed Oct. 25, 2005, and assigned to the assignee hereof.
Number | Date | Country | |
---|---|---|---|
20070094480 A1 | Apr 2007 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 11257932 | Oct 2005 | US |
Child | 11552817 | US |