This invention relates to shunted interleaved addressing of memories having plural memory banks and particularly to shunted interleaves for accessing memory data for cache lines in banked and paged memories of the type that partially access data.
Some interleaved memories contain banks that when one of the individually addressable data cells of the bank is accessed for a cache line, the bank activates the entire row of cells which includes the accessed cell. Generally, each bank holds only one or a few of its rows currently active, holding the row active for an indefinite amount of time from when the cell was first accessed to cause the row to become active until another cell is accessed to cause another one or more rows of the bank to become active. Thus, after initial use, each bank holds some one or more of its rows active as “partially accessed”.
The purpose for holding a row active is that a cell is more quickly accessed if it is in a row that is already active. This feature is particularly useful where cache lines of related data are stored in data cells of a single row. Thus, accessing a single data cell to store a cache line in a cache also activates all other data cells of the row for possible quicker access. Consequently, a memory containing a plurality of banks, each with partial access capability, can access a number of cells more quickly than all its other cells; the number of cells ready for quick access being equal to the memory's total number of banks times the total number of active rows of each bank times the number of cells in each of the memory's rows.
Examples of memory banks with partial access capability include Extended Data Out Dynamic Random Access Memories (EDO DRAMs) and Fast Page Mode Dynamic Random Access Memories (FPM DRAMs), in which a row of the DRAM is kept charged while performing multiple reads or writes so that successive reads or writes within the row do not suffer the delay of pre-charge and access to the row.
Classical interleave patterns of memories having 2^B banks, where B≧1, employ simple rotations through all 2^B banks. Within each rotation, each successively addressed row of the memory has the same row address but lies in the memory's next bank, and is selected by simply decoding the B address bits, which are usually just lower in order than the row address bits and just higher in order than the cell address bits. Thus, each rotation of a classical interleave addresses the same respective row in each successive bank, e.g. rows 0 of banks 0, 1, 2, . . . , 2^B−1, rows 1 of banks 0, 1, 2, . . . , 2^B−1, rows 2 of banks 0, 1, 2, . . . , 2^B−1, through the last rows of banks 0, 1, 2, . . . , 2^B−1.
One problem with classically interleaved memories containing banks having partial access capability is that large blocks of data spanning some number ≧2^B of rows have each of their respective positions (beginning, ending and each position between) in the same banks of the memory. Because the same positions of different blocks of data tend to be accessed for processing at the same time, at different periods of time a different one of the banks of the memory tends to be the only bank being successively accessed for the data of two or more different blocks. When this happens, each of the banks in turn becomes the one being excessively accessed for cache lines in cells of different rows. Consequently, each row that has just become active returns to being inactive before another of the row's cells can benefit from the quicker access. Since its recognition, this problem has been only marginally solved.
The above-described problem was marginally solved by variously toggling and un-toggling one or more of the memory address's B bank-address bits, such as by using exclusive-OR (XOR) logic gates whose inputs are a bank-address bit and either one of the address's bits higher in order than the B bits or a single-bit sum of more than one of the higher bits. The output of each gate replaced the bank-address bit that feeds the gate, and was decoded in that bit's place as for the classic interleave. Thus, one or more pairs of banks swapped positions in the order of rotation for the classic interleave, doing this differently for different ranges of the memory's addressing. An example of this toggling technique is described in “Pseudo-Randomly Interleaved Memory,” Proceedings of the Association for Computing Machinery, September 1991 by B. R. Rau.
These solutions were only marginally successful because where a swapping helped one pair of blocks of data for accessing more data more quickly from a row already partially accessed, it often harmed another pair. Consequently, different ranges of the memory's addressing had different orders of rotation for successively addressed rows being in successively selected banks and therefore had marginal success for having some same positions of blocks in different banks. While such pseudo-random toggling improved access to some data, it caused another problem. More particularly, accessing different blocks of different clashing rotations caused some additional accessing of data of different blocks in the same bank.
A solution is needed which distributes all data positions of all blocks of data of all memory from the smallest blocks (each row of cells) up to those blocks as large as the largest pages for mapping virtual data into physical memory, statistically distributing all of them evenly (in equal numbers) and finely (every few successive addresses) among all the interleaved banks, while avoiding clashing by also preserving a consistent respective order of rotation for successively selected banks for all addressing of memory. Consider a series of consecutively addressed same-sized blocks of data, of all memory, where each block's common size is the size of any one of all the successively doubling interleaved sizes—1 row, 2 rows, 4 rows, 8 rows, etc.—through the number of rows in the largest page for virtual mapping, and the number of rows of the smallest block is no less than the number of banks of one simple rotation. Such a series of blocks will have all respectively positioned rows of the blocks—all first (beginning) rows, second rows, third rows and so on through all last (ending) rows of the blocks—residing evenly and finely in all the banks of the rotation, equal or nearly equal (differing by no more than one) numbers of rows per bank. Thus in some embodiments of the present invention, the number of rows per bank per simple rotation equals the column height occupied by representations , F, X, Y and Z in ROM 14 of each of the shunted interleaves of
My aforementioned patent application Ser. No. 11/719,926 describes abbreviated interleave patterns for successively accessing plural banks in a memory to retrieve or store data interleaved among a plurality of banks. In binary embodiments, an odd number of banks are accessed by a processor during each of a plurality of abbreviated interleaves to retrieve or store data elements in the banks. Thus, as applied to a memory containing 2^B banks, Q banks are accessed during each abbreviated interleave, where Q is an odd number (that is, prime to the total number of banks), preferably a prime number, and 1<Q<2^B. A rotation of 2^B successive abbreviated interleave patterns accesses each bank 2^A times, with each successive abbreviated interleave accessing a different set of Q banks, also 2^A times. During each interleave, each bank of a set of Q banks is accessed L times and R banks of the set of Q banks are accessed one additional time each, where 2^A=L·Q+R and R=mod(2^A,Q). My prior application describes a powerful technique that greatly increases the efficiency of the memory over that of the classic interleave technique for interleaving and accessing elements not via cache, and that is about the same as the Ranade and Rau prime degree interleaves. Its advantage over the Rau and Ranade interleaves is that it employs a memory with 2^B banks, permitting efficient binary addressing and use of the memory for paging. Also, the abbreviated interleave pattern has the efficiency of the Rau and Ranade interleaves for interleaving partially accessed rows for accessing cache lines, while not requiring an address apparatus that must input all physical address bits to address rows.
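The relation 2^A=L·Q+R above can be checked with a minimal Python sketch. The function name `loops_and_remainder` and its parameter names are hypothetical, chosen only for illustration; the sketch assumes nothing beyond the stated relation with 0≦R<Q.

```python
def loops_and_remainder(a_bits, q):
    """Return (L, R) such that 2**a_bits == L*q + R and 0 <= R < q.

    a_bits stands for the width A of the A field; q stands for Q, the
    number of banks accessed during one abbreviated interleave.
    """
    return divmod(2 ** a_bits, q)


# Example: A = 20 and Q = 7 (values also used later for the 7/8 interleave).
loops, remainder = loops_and_remainder(20, 7)
print(loops, remainder)
```

Under this sketch, each of the Q banks of an interleave is accessed `loops` times, and `remainder` of them are accessed one additional time.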
The present invention is directed to shunted interleave patterns that are particularly useful for accessing plural memory banks for use with a cache. Shunted interleave patterns according to the present invention may be abbreviated (Q less than the number of memory banks) or unabbreviated (Q equal to the number of memory banks), and Q may be odd or even but not a power of two.
In one form of the invention, a computer memory containing a plurality of individually and successively addressable memory banks is addressed using a bank select device that defines a plurality of offset interleave patterns. The bank select device comprises a plurality of addressable locations, and a plurality of storage locations. Each storage location is correlated to a respective one or more of the plurality of addressable locations and each addressable location is correlated to one of the plurality of storage locations. Each storage location contains a respective bank select for addressing a respective one of the addressable memory banks, there being at least as many storage locations as there are addressable memory banks. The addressable locations and storage locations are grouped into a plurality of interleave patterns such that, for each interleave pattern, there are Q storage locations and 2^A addressable locations arranged in L sequential loops each containing Q sequentially addressable locations and a single remainder loop containing R sequentially addressable locations, where L>0, L·Q+R=2^A, Q≧3 with Q no greater than the number of memory banks, 0<R<Q, and Q has at least one odd factor greater than 1. A shunt, S, defines an offset for each interleave so that each interleave commences with a different bank select and a complete rotation of all of the interleaves addresses each of the memory banks an equal number of times. The bank select device is responsive to an input address to an addressable location to choose the bank select from the correlated storage location, such that successive input addresses to successive address locations during execution of a respective interleave pattern choose successive bank selects.
The plurality of interleave patterns are organized so that bank selects are chosen to address a different set of Q memory banks during execution of different ones of the plurality of interleave patterns, and execution of a complete rotation of the plurality of interleave patterns addresses all of the plurality of memory banks. Address apparatus is responsive to the bank select device to address successive memory banks using the chosen successive banks selects.
In some embodiments, the shunt (S) is selected from the group consisting of mod(2^A,Q) and −Q+mod(2^A,Q), where S is ±1 or is otherwise prime to the number of memory banks. In some embodiments, values of A, Q and/or the number of memory banks are adjusted to derive a desired value of S using a gain, G.
In some embodiments each memory bank is capable of storing data in a plurality of data storage locations as rows of data, such that each row has a plurality of individually addressable data cells and each memory bank is responsive to an access of a data cell to make at least the row of data containing the accessed data cell ready for a quicker additional access to the row's data cells than for an access to data cells in others of the rows of data. The apparatus further comprises a cache coupled to the memory for individually storing data from each of a plurality of respective data cells in respective rows in respective memory banks as respective individually addressable cache lines and for transferring altered cache lines to respective memory banks as data for storage in respective data cells in respective rows. A processor is coupled to the cache to execute instructions on data received from data elements in individually addressable cache lines of data and to send results of executed instructions as data elements to the cache for storage as data elements in cache lines to alter the data of respective cache lines.
A computer process is provided as another form of the invention to carry out shunted interleave patterns on a plurality of memory banks.
A classic interleave distributes 2^(B+C−Λ) addresses of rows among 2^B banks at the rate of one address per bank for each rotation through all the banks, where B≧1, where 2^(C−Λ) is the total number of data blocks or rows addressed per bank, 2^C is the number of cells per bank, and 2^Λ is the number of cells per interleaved data block or row. The distribution for any one rotation is repeated 2^(C−Λ) times for all memory. Thus, the B field of the address is ordered below all the C−Λ bits, and because field A has zero bits, the bank selection is a simple B-bit decoder to decode the value [B], the content of field B. The row selection is a simple (C−Λ)-bit decode, and cell selection is a simple decode of Λ bits.
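The three simple decodes of the classic interleave can be sketched as follows. This is an illustrative model only: the function name `classic_decode` and its parameters are hypothetical, and the field layout (Λ cell bits lowest, then B bank bits, then C−Λ row bits) is taken from the ordering described above.

```python
def classic_decode(address, b_bits, c_bits, lam_bits):
    """Split a cell address into (row, bank, cell) for a classic interleave.

    Assumed layout, from low order to high order:
      lam_bits            cell-within-row bits (field Lambda)
      b_bits              bank-select bits (field B)
      c_bits - lam_bits   row-within-bank bits
    """
    cell = address & ((1 << lam_bits) - 1)
    bank = (address >> lam_bits) & ((1 << b_bits) - 1)
    row = (address >> (lam_bits + b_bits)) & ((1 << (c_bits - lam_bits)) - 1)
    return row, bank, cell


# Eight banks (B = 3), 32 cells per row (Lambda = 5), 2**10 cells per bank.
# Stepping the address by one row (32 cells) walks banks 0..7, then wraps
# to bank 0 of the next row.
for row_index in range(10):
    print(classic_decode(row_index * 32, b_bits=3, c_bits=10, lam_bits=5))
```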
The efficiency of the classic interleave employing eight memory banks is 67.2%, meaning that 67.2% of the banks, on the average, hold memory data of a large range of fixed stride values. But for accessing cache lines already partially accessed in bit addressed memories, rows of large power of two strides need to be well distributed among banks. For such accessing, a classic interleave does not do well because of having all rows of such strides in a single one of its banks. Efficiency for such accessing for eight classically interleaved banks is 12.5%, such poor efficiency causing each respective 1st, 2nd, 3rd, . . . or last data position of large blocks of data to be in only one of the eight banks. Thus, for applications accessing large blocks of data, there is a poor chance that a classic interleave will access a cache line from a row when the row is already partially accessed.
An interleave scheme with a greater efficiency for distributing respective same rows of large blocks is the Ranade interleave, which employs an odd number of data banks greater than 2, of the form 2^B−1. In the context of a contiguous physical memory, the Ranade interleave would distribute every set of 2^B−1 consecutive addresses of (2^B−1)·2^(C−Λ) total addresses of rows among the 2^B−1 banks at the rate of one address per bank for each of 2^(C−Λ) repeats of the bank select pattern. The Ranade scheme thus interleaves data evenly and finely among all of the banks. In the Ranade scheme all the B+C−Λ bits of the memory's row address are the respectively ordered bits summed by Ranade's pyramid of B-bit ones' complement adders. For all stride values, a Ranade interleave scheme employing seven banks has an efficiency of 87.8%, significantly better than the classic interleave. But, more importantly for addressing partially accessed rows for accessing cache lines, the Ranade technique has 100% efficiency for row-data of all powers-of-two strides. However, the Ranade interleave is unsuitable for paged memories when interleaving elements for direct access (not via a cache), where different pages of the same size preferably have the same interleave pattern. The Ranade technique is suitable where the physical memory is small and the interleaving of rows for accessing cache lines is more important than pages having different patterns. For interleaving rows, Ranade's addressing apparatus for generating bank selects must input all the bits of the entire physical address of rows, which contains too many bits for any reasonable Ranade apparatus to have good performance if physical memory is large. Thus, the Ranade interleave is unsuitable for the memories found in most computers.
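The contrast between the two schemes for power-of-two strides can be illustrated with a small Python sketch. It rests on an assumption not stated in the text: that bank selection can be modeled as the row address modulo the number of banks (for the classic interleave this is the low B bits of the row address; for a Ranade-style interleave it models the result of end-around-carry addition). The function name `banks_touched` is hypothetical.

```python
def banks_touched(stride, num_banks, accesses=64):
    """Count distinct banks hit by row addresses 0, stride, 2*stride, ...

    Assumes (for illustration only) bank = row_address mod num_banks.
    """
    return len({(i * stride) % num_banks for i in range(accesses)})


for stride in (8, 16, 64, 1024):
    classic = banks_touched(stride, 8)   # eight classically interleaved banks
    ranade = banks_touched(stride, 7)    # seven banks, of the form 2**3 - 1
    print(stride, classic / 8, ranade / 7)
```

Under this model, every power-of-two stride of at least eight rows lands in a single one of the eight classic banks (1/8 = 12.5%), while the same strides reach all seven banks of the odd-bank scheme.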
Where the B field addresses data blocks at least as large as the largest pages for mapping virtual addresses, and the F field of the address according to the present invention is relatively large because it is not inputted by the addressing apparatus, a shunted interleave according to the present invention applied to the Ranade technique of 2^B−1 memory banks (the number of banks and Q both equal to 2^B−1) becomes quite suitable for interleaving row addresses of large physical memories.
B.R. Rau's aforementioned paper also proposed a “prime degree” interleave which employs a non-unitary odd prime number of banks (3, 5, 7, 13, 17, . . . ) in a manner similar to Ranade. While the efficiency of the Rau prime degree interleave is the same as that of the Ranade interleave and significantly better than the classic interleave, the use of an odd number of memory banks in the prime-degree interleave scheme described by Rau is unsuitable for paged memories and bit-addressed memories, which require a power-of-two number of memory banks so that same-sized pages are interleaved the same. Rau offers no usable addressing apparatus for the prime degree interleave. Thus, the prime degree interleave becomes possible with the present invention where the number of banks equals Q, which equals a non-unitary odd prime number.
As used herein, interleaves are defined by the total number of interleaved banks and the number, Q, of banks accessed during a single simple rotation of one interleave. An interleave is odd if Q is odd; an interleave is even if Q is even. An interleave is unabridged if the number of banks is a power of two (2^B); an interleave is abridged if the number of banks is less than 2^B, where B is the ceiling of log2 of the number of banks. In an abridged interleave the number of banks may be odd, or even other than a power of two. An interleave is abbreviated if Q is less than the number of banks; an interleave is unabbreviated if Q equals the number of banks. All abbreviated interleaves are shunted interleaves; unabbreviated interleaves may be shunted according to the present invention.
The present invention is directed to shunted interleaves where individual interleaves of a complete rotation of a plurality of interleaves are offset from each other in such a way that the first position of different ones of the plurality of interleaves addresses different ones of the interleaved banks. The offset is defined by a shunt, S, to cause successive pairs of the interleaves to have respective banks be shunted, S banks from interleave to next interleave. The interleaves may be odd or even, abridged or unabridged, and abbreviated or unabbreviated, but not even interleaves where Q is a power of two. Most preferred are odd abbreviated unabridged interleaves.
Each interleave of Q banks of the plurality of shunted interleaves of all banks according to the present invention contains 2A representations, where 2A is both the number of addressable rows distributed among all the Q banks for each interleave pattern of a rotation of interleave patterns and the number of addressable rows in each of all banks for each rotation of all interleave patterns, each of the banks being a different one (1st, 2nd, . . . ) of the Q banks in respective Q of the interleave patterns, each of the interleaves having at least one bank in common with the fewer of either 2(Q−1) or −1 of the other interleave patterns. Each individual interleave pattern of each complete rotation of the interleave patterns comprises a plurality of L loops each containing Q representations of addresses to address Q banks. As explained in conjunction with
Examples of abbreviated interleave patterns (Q less than the number of banks) include:
Unabbreviated interleave patterns include a 7/7 interleave, which is unabbreviated because Q equals the number of banks. It is also an odd interleave, because Q is odd, and is abridged oddly because the number of banks is 7. A 10/10 unabbreviated interleave pattern is even and abridged evenly.
There are two types of even interleave patterns, namely where Q is power of two and where Q is not a power of two although even. Examples of interleaves where Q is a power of two include 4/10 (abbreviated abridged even), 8/16 (abbreviated unabridged), 4/7 (abbreviated abridged odd), and 16/16 (unabbreviated unabridged). For reasons given below, interleaves where Q is a power of two (and the processor operates in binary), are not considered suitable for the present invention. Additional examples of interleaves where Q is even and not a power of two include 6/10 (abbreviated abridged even), 10/16 (abbreviated unabridged), 6/7 (abbreviated abridged odd), and 6/6 (unabbreviated abridged even). Also for reasons given below, interleaves where Q is even and not power of two or is odd (any having an odd factor greater than one), and where the processor operates in binary, are considered suitable for the present invention.
For reasons similar to those given in my prior application, abbreviated interleave patterns are preferred over unabbreviated interleave patterns, and odd interleave patterns (where the processor operates in binary) are preferred over even abbreviated interleave patterns. Also for reasons to be given herein, and although less preferred than Q odd, Q being an even number having an odd factor larger than one is considered suitable for this invention. Most preferred are odd abbreviated unabridged interleaves where the number of banks is 2^B and Q is a prime number of three or more.
An interleave pattern (odd or even, unabbreviated or abbreviated, unabridged or abridged) is a shunted interleave pattern if a shunt value is applied so that initial bank selects of each of the plurality of interleave patterns is assigned to select a different bank (S≠0, where −Q<S<+Q) so the period of bank usage is Q during and between the Q interleaves using a bank. Thus, all of the above exemplary interleave patterns of the invention are shunted if, for each example, S≠0 so that initial bank selects of each of the plurality of interleaves is assigned to select a different bank to maintain, during transitions between successive interleaves, bank usage as once each Q usages for banks common to the interleaves. The adjacent B and A address fields address the addressable locations 102, 104, 106, 108 of each interleave pattern and establish a single remainder loop so that bank usage for each of all () memory banks is only once each Q usages during each of the (Q of all ) interleaves which use the bank.
The classic interleave pattern and Rau's pseudo random interleave pattern, such as a 2B/2B interleave that has A=0, Q=2B, =2B and S=0, are unshunted (not shunted) interleaves. Each rotation of both the classic and Rau's pseudo random interleaves is of all of the interleaved banks of the memory for the same address of data per bank because A=0 and, because S=0, each rotation commences at the same bank or, if Rau, a bank swapped for the bank. Again Q is the minimum period of bank usage for the Ranade interleave also being unshunted because the addressing apparatus of Ranade employs adders and natural adder wraparound to produce a single simple rotation of the bank selects for all addresses interleaved which works only for =Q=2B−1 and Γ=0. For Rau's prime degree interleave, Rau implies only a single simple rotation of the Q bank selects as for A=0, where Q= where Q and are prime numbers.
Processor 16 contains address apparatus 10 containing bank select apparatus 14 responsive to address 11 for selecting banks in main memory 12. As shown particularly in
As shown in
As shown in
As shown in
If cache 18 retains cache lines on a recently-used basis, then when the cache becomes full, a next cache line to be copied from memory 12 replaces the cache line that had not been accessed for the longest period of time. Thus cache 18 would always contain some number more than of the cache lines most recently-accessed for elements by the processor. In the event that data in a cache line are altered, such as due to a processing result to be stored to memory by processor 16, the altered data are stored in the respective cache line with a flag, sometimes called a “dirty bit”, that indicates the data within the cache line has been altered from the copy in the respective data cell in the respective row of the respective bank 20. Before being discarded, such as when being replaced by a new cache line, the altered cache line is copied back into memory 12 at the respective cell, row and bank from which it was copied into cache before becoming dirty. In some embodiments if the row is already partially accessed, the altered cache line is stored back to the memory bank more quickly than if the row is not already partially accessed.
In preferred embodiments there are 2^B banks. For banks containing 32 data cells per row, there are as many as 32·2^B partially accessed data cells in main memory 12. Thus, if memory has 16 (B=4) banks 20, there are up to 512 partially accessed data cells which can hold enough data for 512 cache lines, and if each cell has 16 elements, up to 8,192 partially accessed elements in main memory 12.
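The arithmetic for the 16-bank case can be reproduced with a short Python sketch. The function and parameter names are hypothetical, and one active row per bank is assumed, as in the example.

```python
def partially_accessed(b_bits, cells_per_row, elements_per_cell,
                       active_rows_per_bank=1):
    """Return (cells, elements) ready for quicker access across 2**b_bits banks.

    Assumes every bank holds active_rows_per_bank rows active at once.
    """
    cells = (2 ** b_bits) * active_rows_per_bank * cells_per_row
    return cells, cells * elements_per_cell


# B = 4 (16 banks), 32 cells per row, 16 elements per cell.
cells, elements = partially_accessed(4, 32, 16)
print(cells, elements)
```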
The line immediately below each sub-table identifies relative bank select values 100, identified in left to right order as values 0, 1, 2, . . . , −1, some specifically, some in expressions of Q, R, S, or , and some implied by dots. The relative bank select values are ordered identically for all sub-tables. The illustration of each sub-table of
In the embodiment of
For purposes of explanation, the first location 102 of each loop of Q or R locations is designated F or , with the first location of the first loop of each abbreviated interleave being . The last location 104 of each loop of Q locations is designated Y, the last location 106 of a loop of R locations is designated Z and each location between a first and last location of each Q or R locations is either explicitly designated or implied (dots between Xs) to be X. Each letter F, , X, Y, Z represents a different value, allowed value of [B,A], of the lower order B+A bits of a data (row) address from address 11. Successive next values of [B,A] are from left to right within loops and top to bottom between loops with wraparound, 0 of location 108 following (·2A)−1 of location Z 110, [B,A] not allowed where ·2A≦[B,A]<2A+B. The B-bit relative bank select values 100 are arranged commonly across all columns of all sub-tables (same column is for same bank select for each interleave regardless of notation differences), so that each allowed B+A-bit address (value) from address 11 represented by a letter /F,X,Y/Z in each loop L and R in each sub-table corresponds to a respective bank select value 100.
Fields Γ and B are all the bits of the physical address 11 for identifying blocks of data as large as or larger than one of the memory's largest pages for virtual addressing. In order that bank select 14 has the least number of individually addressable data cells, all physical address bits of address 11 in field Γ and B are preferably all that are needed to identify a particular one of memory's largest pages for virtual addressing. By having more A field bits and fewer Γ field bits, it is permissible to include more cells for holding bank select values in ROM 14, at the cost of increasing the size of the ROM.
The A+B bits from fields A and B of physical address 11 address 2A respective ones of the 2A+B cells of ROM 14. Thus, the B-bits select one of the more than 2B−1 up to 2B interleaves (or sub-tables) and the A-bits select one of the 2A representations within the selected interleave. As shown in
Since, in the case illustrated in
A given row of the 2Γ+A rows of all memory selected according to the value, [Γ,B,A], of fields Γ, B and A, along with the Λ bits of field A specific to 2Λ cells of each row, specifies cell (
As shown particularly by comparison of abbreviated interleaves 0 and 1, the first position shown by of the first loop of Q in any sub-table (such as interleave 1) is at some different column from the column of the first position of the just prior abbreviated interleave or next higher sub-table (with top to bottom wraparound for interleave −1 being just prior to interleave 0). In preferred embodiments which have minimum period of bank usage=Q, and as shown in
The first positions, and F, of the loops of each sub-table, and hence each abbreviated interleave, are offset from the first positions, and F, of the loops of the immediate prior abbreviated interleave by shunt S. In preferred embodiments, shunt S is determined as
S=mod(2^A,Q), if 2^A modulo Q is +1 or a positive number prime to the number of banks, and if the resulting value for G is large enough, [1]
or S=−Q+mod(2^A,Q), if −Q+(2^A modulo Q) is −1 or a negative number prime to the number of banks, and if the resulting value for G is large enough. [2]
If each of equations [1] and [2] produces a valid value for S, then preferably S is chosen as the valid value of [1] or [2] closest to zero. If neither equation [1] or [2] provides a valid value for S, then a gain value, G, is identified to adjust the values of one or more of A, Q and to iteratively determine a value for S using equations [1] and [2]. If the shunt value is determined as S=+1, the first positions F of each loop of each of the plurality of interleave patterns 0 through −1 will choose bank selects 0, 1, 2, . . . , −1, respectively. If S is a positive number prime to , the first positions F of each loop of each of the interleave patterns 0 through −1 will choose bank selects 0, R, mod(2R,), mod(3R,), . . . , mod((−1)R,), respectively. If S is a negative number prime to , the first positions F of each loop of each interleave pattern will choose bank selects 0, −(Q−R), −mod(2(Q−R),), −mod(3(Q−R),), . . . , −mod((−1)(Q−R),), respectively. If S=−1, the first positions F of each loop of each interleave pattern will choose bank selects 0, −1, −2, . . . , −(−1) or −1.
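The selection between equations [1] and [2] can be sketched in Python as below. This is a simplified illustration, not the full procedure: it deliberately ignores the gain G, and the names `candidate_shunts`, `preferred_shunt` and `num_banks` (standing for the total number of interleaved banks) are hypothetical.

```python
from math import gcd


def candidate_shunts(a_bits, q, num_banks):
    """Return the valid shunt values from equations [1] and [2], in that order.

    A candidate is treated as valid when it is +/-1 or otherwise prime to
    num_banks. The test on the gain G is NOT modeled here (assumption).
    """
    r = (2 ** a_bits) % q
    if r == 0:  # Q is a power of two: no remainder loop, no valid shunt
        return []
    candidates = [r, r - q]  # equation [1], equation [2]
    return [s for s in candidates if gcd(abs(s), num_banks) == 1]


def preferred_shunt(a_bits, q, num_banks):
    """Pick the valid candidate closest to zero, or None if neither is valid."""
    valid = candidate_shunts(a_bits, q, num_banks)
    return min(valid, key=abs) if valid else None


print(preferred_shunt(20, 7, 8))    # 7/8 interleave
print(preferred_shunt(21, 7, 16))   # 7/16 interleave
```

When neither candidate is valid the sketch returns `None`, which corresponds to the case where A, Q and/or the number of banks must be adjusted.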
Where Q< and the period of bank usage is Q during and between interleaves using a bank, the gain value, G, can be determined as a measure of how large a stride can be to have the interleaves remain as of a simple interleave of Q banks. For strides larger than G/Q (both respecting addresses interleaved), the interleaves begin periods of transitioning from as of Q banks to of banks back to as of Q banks, the interleaves behaving the same for strides differing by multiples of ·2A. G is not often a concern in cases where the size in rows of the largest page for mapping virtual addresses is not more than 2A; the largest page size in rows also being the largest stride (also in rows) which is the usual threshold immediately before concern for G. But having A be smaller is an opportunity for also having a smaller ROM 14. For having a smaller ROM 14, if Q<, period of bank usage is Q during and between interleaves using a bank, and =2B, 2A can be less than largest page size in rows if G is equal to or greater than the largest page size in rows where
G=Q+mod((−Q−R+1),S)+trunc((−Q−R+1)÷S)·2A if S is positive, or G=R+1+trunc((Q−)÷S)·2A if S is negative.
Thus, the numerical value of G can be used in some cases to reduce the required size of the A field, reducing the field by some number of bits until just before G becomes less than the largest page size in rows. Generally with each reduction, S must be readjusted to a different preferred value because the most preferred evaluations for S are functions of A, namely S=mod(2^A,Q) and S=−Q+mod(2^A,Q). Also, where Q is not much smaller than the number of banks and a preferred S is not near to zero (near floor(Q/2) from zero), G may show the A field needs an additional bit.
Conveniently, the process of adjusting A (as well as Q and/or ) for G can be accomplished with the aid of a spreadsheet. Alternatively, a set of three-dimensional graphs could be used, each representing a different value of interest of some fourth variable being least considered for change, such as . Each graph has A and Q for values of interest as the X- and Y-coordinates and G as the Z-coordinate where G can be considered to be of no interest and set to either 2A or zero where not defined (either Q= or minimum period of bank usage is less than Q). The graph with the largest G for interest ranges indicates the values of A and Q along with its value together providing a best value for S. The minimum period of bank usage equals if (S<R,R−S, if (Q−S>,R+Q−−S, if (Q>S,R+Q−S , if (−S+R<Q,−S+R,Q)))).
Each bit of reduction or expansion halves or doubles the size of ROM 14, respectively, which can be very significant. Where Q==2B for S=0, field A may be reduced any amount because bank usage transitioning from once each Q to once each is no change. In such a case, if A becomes zero, then the interleave becomes a classic interleave. Where <2B, the expression for G is valid and might indicate a reduction, but field A ought not be reduced to 2A<largest page because to do so may cause discontinuities in the usable address spaces of largest pages.
It will be appreciated that the computation of S can be applied to unabbreviated interleaves as well as abbreviated interleaves. Thus selection of the equation or method of determining S for both abbreviated and unabbreviated interleaves can be summarized as follows:
For an odd/unabridged/abbreviated interleave pattern where =2B, and Qodd<, one of the two equations [1] or [2] for determining S can always be used and the other cannot, depending on the values of A, B (for ) and Q. For example, for A=20, B=3 for =8, and Q=7 (7/8 interleave), equation [1] provides an invalid value of S=4, whereas equation [2] provides a valid value of S=−3 which is prime to 8; for A=21, B=4 and Q=7 (7/16 interleave), equation [1] provides a valid value of S=1, whereas equation [2] provides an invalid value of S=−6 which is not prime to 16. G for A=20, Q=7 and B=3 shows either A needs adjusting to 21 where S becomes +1 or S needs adjusting closer to zero to −1 if minimum period being only 5 is acceptable; no adjustment is needed for A=21, Q=7 and B=4.
For an odd/abridged-oddly/unabbreviated interleave pattern where <2B, Qodd=, equations [1] and [2] each gives a valid value for S. For example, for A=20 and Q==9, equation [1] provides a valid value of S=4 and equation [2] provides a valid value of S=−5, both being prime to 9, both resulting in minimum period of bank usage equal to Q, S=4 being nearer to zero than S=−5 (actually, where Q=, each S value is the same offset modulo and so produces exactly the same interleave).
For an odd/abbreviated/abridged-odd interleave pattern, none, one or both results of equations [1] and [2] may be usable. If neither equation [1] nor [2] arrives at a solution, then one or more of A, Q and are adjusted for a second iteration. For example, where A=20, =77 and Q=43, equation [1] will result in the invalid value S=21 which is not prime to 77, equation [2] will also result in the invalid value S=−22 which is also not prime to 77; so adjustment of A to 21 gives S=−1. Where A=20 for a largest page size of 220 rows, =9 and Q=7, equation [1] will result in the valid value S=4 which is prime to 9, but equation [2] will result in the invalid value S=−3. Nevertheless, A needs adjusting because G=10 is too small (G<largest page for virtual mapping). Adjusting A to 21, S becomes +1 and G becomes greater than 220 and thus greater than the largest pages, which is large enough. Where A=20, =11 and Q=7, equation [1] will result in the valid value S=4 and equation [2] will result in the valid value S=−3, both being prime to 11, both having minimum usage of Q, −3 where G is adequate (>220) being nearer to zero than +4 where G=8.
For an odd/abbreviated/abridged-even interleave pattern, neither or one of equations [1] and [2] will result in a valid shunt value, S. Thus, where A=20, =12 and Q=7, equation [1] will result in the invalid value S=4 and equation [2] will result in the invalid value S=−3, each being a factor of 12, so S is solved as described above. Whereas, if =10, equation [1] gives the invalid S=4, whereas equation [2] gives the valid S=−3. Noteworthy, in this example, the results of equations [1] and [2] can never both be prime to an even because the result of one of the equations will always be even.
For an even/abbreviated/abridged-odd interleave pattern, neither, one or both of equations [1] and [2] will result in a valid shunt value, S. Thus, where A=20, =45 and Q=22, equation [1] will result in the invalid value S=12 and equation [2] will result in the invalid value S=−10, both not being prime to 45. Therefore, S needs to be adjusted as described above. Where A=20, =15 and Q=10, equation [1] gives the invalid S=6 due to the common factor 3, whereas equation [2] gives the valid S=−4. A further adjustment to A=20, =7 and Q=6, results in equation [1] giving the valid S=4 and equation [2] giving the valid S=−2, both being prime to 7, but neither results in a valid G at least as great as the preferred minimum size of G≧2A or =220. (Noteworthy here is that S=−2 results in minimum usage being less than Q, whereas S=4 and S=−3 each result in minimum usage of Q. Thus, −3 being nearer to zero than 4 means S=−3 would be preferred except that the value of G is too small. G is too small because S=−3 is not a result of equation [2] which is required (but not sufficient) for usage to be exactly Q to thereby allow G to possibly be large enough (not too small). Only S of equation [2] can sometimes result in minimum period of bank usage being less than Q. S of equation [1] always results in period of Q. S other than of equations [1] and [2] always results in usage not always Q and therefore, G being too small. For knowing what's occurring, unexpected minimum usage should be checked for a negative S along with G being checked for both positive and negative S.)
In the cases of the even/abbreviated/unabridged (6/8), even/unabbreviated/unabridged (16/16), even/unabbreviated/abridged (6/6) and even/abbreviated/abridged-even (4/6) interleave patterns, equations [1] and [2] will not supply a valid value for S because both Q and are even. Therefore, the shunt value is determined by adjusting one or more of Q and so that either or both are odd before proceeding as described above.
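The case analysis above can be checked numerically. The following Python sketch is a hedged illustration, not the claimed method: it assumes equation [1] is S=mod(2^A,Q) and equation [2] is S=−Q+mod(2^A,Q), as the worked values imply, and it treats a shunt as valid when it is non-zero and prime to the number of banks. The names `shunt_candidates` and `num_banks` are hypothetical.

```python
from math import gcd


def shunt_candidates(a_bits: int, q: int, num_banks: int) -> dict:
    """Return {equation label: (S, is_valid)} for the two preferred shunt values.

    Validity is assumed to mean: S is non-zero and shares no factor with
    num_banks (so +1 and -1 are always valid).
    """
    r = (1 << a_bits) % q                       # mod(2^A, Q)
    candidates = {"[1]": r, "[2]": r - q}       # S = R and S = -Q + R
    return {
        label: (s, s != 0 and gcd(abs(s), num_banks) == 1)
        for label, s in candidates.items()
    }


if __name__ == "__main__":
    # 7/8 interleave with A=20: [1] gives 4 (shares a factor with 8), [2] gives -3.
    print(shunt_candidates(20, 7, 8))
    # 7/16 interleave with A=21: [1] gives 1, [2] gives -6.
    print(shunt_candidates(21, 7, 16))
```

When neither candidate is valid, the text's remedy of adjusting A, Q or the number of banks amounts to calling the function again with the adjusted values.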
In all cases, the offset, when applied to the interleave patterns of a complete rotation, assures that each bank 200, 201, . . . , (
If the consecutive table positions representing R and/or Q addresses reach beyond the bounds of the right-most column of bank select value −1, each sequence wraps around to the beginning of the same line as in the same loop from the −1 to the 0 valued position, as shown in sub-tables of interleaves 1 and −1. In any case, it will be appreciated that there will be −Q unused positions in each line with a loop containing Q addressed positions and there will be −R unused positions in each line with a remainder loop.
It will be appreciated that the total number of addressable locations in the Q and R sequences on each interleave pattern is L·Q+R, and that the Q and R sequences of letters represent 2A addresses of [A] as 0, 1, 2, . . . , 2A−1 of the corresponding interleave of any rotation so aligned (parameter =0). Thus, 2A=L·Q+R. Thus, each letter ,F,X,Y,Z represents a (B+A)-bit address input to ROM 14 where a corresponding -bit bank select is stored. Also, each of the bank select values of the columns indicates which one of the outputted bank select bits is the only bit set to select a bank of any rotation so aligned (parameter K=0). The shape of the two-dimensional layout of letters in
As shown in
As for Spositive, the position of of a next rotation is offset right by +S from the position of F of the last interleave of the prior rotation; conversely left instead of right for Snegative. The position of each or F of each of the rotations is the same.
In
The data content of ROM 14 produces at least one period of the smallest repeated pattern of bank usage aligned in the form of a rotation of interleaves. The value of S establishes the offset which progressively positions the common same relative pattern of each successive interleave 0, 1, . . . , −1 (numbers vertically annotated along the right side of ROM or bank select 14,
In operation, fields A and B of physical address 11 (
As previously described, according to the present invention Q is odd or even having an odd factor greater than one, and 3≦Q≦. The interleaves are overlapping in that each of the banks is one of the Q banks for Q of the interleaves; that is, the bank is the first of the Q banks for one of the Q interleaves, the second of the Q banks for another of the Q interleaves, and so on through the last of the Q banks of yet another of the Q interleaves. Additionally and in the same bank order, the first Q−1 banks of each of the interleaves are the last Q−1 banks of another interleave, the first Q−2 banks of each interleave are the last Q−2 banks of another interleave, and so on until the first bank of an interleave pattern is the last bank of yet another interleave pattern. This is the case for any valid shunt, ±1 or ±prime to . Therefore, with wraparound, each interleave pattern has at least one bank in common with the Q−1 interleave patterns before it and with the Q−1 interleave patterns after it, where some of the furthest before may be the same interleave patterns as the same number furthest after because 2(Q−1) may be more than −1.
Upon processor 16 issuing an address 11 to address apparatus 10, the Λ field is decoded within a selected bank to generate a 2Λ-bit cell select and the Γ+A concatenated fields are decoded within the selected bank to generate a -bit row select. The B+A concatenated fields address one of 2A representations at a , F, X, Y or Z position of one of interleaves to select the correlated -bit bank select where the A bits address the 2A and the B bits address the with the allowed values of [B], 0, 1, . . . , −1. Thus, the bank select, row select and cell select together access an addressed cell in an addressed row in an addressed bank and, if the bank is coupled to cache 18 and is of the type that addresses and accesses one of the cells of an thereby also addressed row for a cache line, the addressed row preferably therefore becomes partially accessed for quicker additional accesses of its cells if not already.
Consider a memory having =4 banks 0, 1, 2, 3 each containing 2A=16 addressable rows 0 . . . 15 for a rotation of =4 shunted interleaves 0 . . . 3 so that B=ceiling(log2())=2 and A=log2(16)=4 for a ROM 14 being addressed by B+A=6 bits as ·2A=64 different addresses or address values. Consider the interleaves to be abbreviated interleaves where Q=3 so that for each of its =4 interleaves, ROM 14 of
Where the addressing is of successive rows, the bank selects are of successive ones of Q banks but where of successive cells, each 2Λ successive cells are of a single one of the Q banks. More particularly, field Λ of address 11 advances through the field's 2Λ cell addresses before advancing the value of the A field to select a next row and bank address as described above.
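The field handling described above can be sketched in Python as follows. This is an illustration under assumptions, not the apparatus itself: the fields are assumed to be packed from high order to low order as Γ, B, A, Λ, the field widths are free parameters, and the names `split_address`, `row_select` and `rom_index` are hypothetical.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    gamma: int   # high-order row bits (field Γ)
    b: int       # interleave number (field B)
    a: int       # row within the interleave (field A)
    lam: int     # cell within the row (field Λ)


def split_address(addr: int, gamma_bits: int, b_bits: int,
                  a_bits: int, lam_bits: int) -> Fields:
    """Split a physical address, assuming the order Γ, B, A, Λ from high to low."""
    lam = addr & ((1 << lam_bits) - 1)
    a = (addr >> lam_bits) & ((1 << a_bits) - 1)
    b = (addr >> (lam_bits + a_bits)) & ((1 << b_bits) - 1)
    gamma = (addr >> (lam_bits + a_bits + b_bits)) & ((1 << gamma_bits) - 1)
    return Fields(gamma, b, a, lam)


def row_select(f: Fields, a_bits: int) -> int:
    """Concatenate Γ and A to form the row address within the selected bank."""
    return (f.gamma << a_bits) | f.a


def rom_index(f: Fields, a_bits: int) -> int:
    """Concatenate B and A to form the address presented to the bank-select ROM."""
    return (f.b << a_bits) | f.a


if __name__ == "__main__":
    fields = split_address(0b101_10_1001_10001, 3, 2, 4, 5)
    print(fields, row_select(fields, 4), rom_index(fields, 4))
```

With Λ in the low-order position, incrementing the address steps through all cells of one row before the A field, and hence the bank select, changes.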
It will be appreciated that the tables in ROM 14 (
Efficiency for distributing data of powers of two strides among banks is one of four important qualities for judging different interleaves for interleaving partially accessed rows for accessing cache lines, namely distribution, order, adequacy and impediment ratio. The first three qualities are subjective and easily examined, whereas the impediment ratio of an interleave is an objective measure given by Q2/, where Q is the number of banks accessed during each of successive simple rotations of bank usage of the interleave scheme and is the total number of memory banks accessed during all successive rotations. The impediment ratio is a measure of the odds that two relatively long data streams accessed via cache are alternately accessing the same bank. It is desirable that the impediment ratio be as small as practical, less than one (its maximum possible value) and preferably significantly less than one, although without Q being so small as to degrade adequacy for having multiple data streams accessing in the same one of interleaves at the same time.
Most interleave techniques distribute all data evenly among all of a memory's banks but may not do so finely so that successively interleaved addresses of the data are distributed among all banks. Where Q=H, such as the classic, Ranade and Rau's pseudo random and prime degree interleaves, the best possible distribution is achieved. But for interleaving partially accessed rows for accessing cache lines, the important data are of large powers of two strides, and these are distributed very poorly by the classic interleave and only slightly better by Rau's pseudo random; they are, however, distributed very well by Ranade and Rau's prime degree interleaves. Distribution by a shunted interleave of the present invention does poorly where Q is a power of two, less poorly where Q is an even number having an odd factor, and well if Q is an odd number. Some interleave patterns, such as described by Lunteren in U.S. Pat. No. 6,381,668, employ plural interleave patterns, the rotation of which are selected by an address field, X, separated from the address field(s) selecting the banks by some number of bits N, N≧0. However, each additional N bit degrades the fineness of the interleave pattern such that 2N−1/2N of same positions of each 1/ of same sized blocks from 2X rows to 2X+N rows are in same banks. Additionally, bank usage within each given interleave pattern of 2X+N rows becomes more disparate with more N bits, using some banks N+1 more times as other banks.
The second quality, order, of an interleave pattern is good if the interleave holds to a constant overall relative order for distributing adjacently addressed data among banks. Rau's pseudo random interleave does not have a good rating for order; the classic, Ranade, Rau's prime degree and the shunted interleave of the present invention have good ratings for order. Good order is important for having large amounts of data of different data streams flow entirely unimpeded, without having to alternately access cache lines from different rows of the same bank.
The third quality, adequacy, of an interleave pattern is best if Q= for allowing up to Q different data streams to be accessing nearby data or data interleaved among the same Q banks. A shunted interleave of the present invention can have Q< which is preferred for having a low impediment ratio even while reducing adequacy and distribution to still be within tolerable levels.
A shunted interleave according to the present invention may be odd or even, abridged or unabridged, and abbreviated or unabbreviated. Most preferred are odd abbreviated unabridged interleaves, second most are odd abbreviated abridged (even or odd) interleaves, and third most are odd unabbreviated (abridged odd) interleaves. Where Q is even, other than power of two, the effectiveness for interleaving rows is nearly as good as if Q is equal to its largest odd factor. Of all shunted interleaves, only odd interleaves are preferred for Q being prime to powers of two strides. Unabridged interleaves are preferred for allowing full use of the range of addressing of physical address 11 and potentially allowing a reduction in the A field to reduce the size of ROM 14. Abridged interleaves do not allow the A field, and therefore ROM, to be reduced because they disallow use of physical addresses where for the B field, ≦[B]≦2B−1 (if A were reduced, then the void physical address spaces would be within at least largest page, an unmanageable condition for the operating system); the operating system not mapping pages to use any such physical addresses. Perhaps most importantly, abbreviated interleaves are preferred for having smaller impediment ratios.
One feature of the interleave pattern of the present invention is that the single remainder row of the pattern assures that the Q banks are accessed nearly equally during execution of each interleave pattern. More particularly, during execution of any one interleave pattern to successively access rows, R of the Q banks are each accessed L+1 times and each of the remaining Q−R banks is accessed one less time or L times. Moreover, the shunt causes execution of a complete rotation of all interleave patterns to access each of the banks 2A times and may cause no bank to be used more often than once each Q bank usages (thus alleviating the R banks of each interleave pattern from being used any more often than the Q−R). While placement of the remainder row is preferably at the end of each interleave pattern 0, 1, . . . , −1, it would be possible to place the remainder row elsewhere in each interleave pattern, but doing so would risk creating a lack of opportunity for shunt occurring with each update of B field to cause separation between accesses to a same bank during execution of successive interleave patterns, particularly where the remainder row contains a single representation (e.g., R=1).
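The access counts described above can be checked on a small model. The Python sketch below is a minimal illustration under assumptions, not a transcription of the contents of ROM 14: it assumes that interleave k begins at bank k·S modulo the number of banks, that row address a within an interleave selects the bank (a mod Q) positions further on, and that the rotation is aligned. The names `bank_table`, `min_reuse_gap` and `num_banks` are hypothetical.

```python
from collections import Counter
from math import gcd


def bank_table(a_bits: int, q: int, num_banks: int, s: int) -> list:
    """table[k][a] is the bank assumed to serve row a of interleave k."""
    if s == 0 or gcd(abs(s), num_banks) != 1:
        raise ValueError("shunt must be non-zero and prime to the number of banks")
    rows = 1 << a_bits
    return [
        [(k * s + a % q) % num_banks for a in range(rows)]
        for k in range(num_banks)
    ]


def min_reuse_gap(table: list) -> int:
    """Smallest distance between two uses of one bank over two full rotations."""
    sequence = [bank for interleave in table for bank in interleave] * 2
    last_seen = {}
    best = len(sequence)
    for position, bank in enumerate(sequence):
        if bank in last_seen:
            best = min(best, position - last_seen[bank])
        last_seen[bank] = position
    return best


if __name__ == "__main__":
    # Four banks, 2^A = 16 rows, Q = 3, so L = 5 and R = 1; shunt S = 1.
    table = bank_table(4, 3, 4, 1)
    print(Counter(table[0]))                      # usage within interleave 0
    print(Counter(b for row in table for b in row))  # usage over the rotation
    print(min_reuse_gap(table))
```

In this model one bank of interleave 0 is used L+1=6 times and the other two are used L=5 times, every bank is used 2^A=16 times over the rotation, and no bank is reused sooner than every Q=3 accesses.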
The present invention thus provides apparatus and process for shunted interleave access of interleaved memories having partially accessed data cells containing cache lines. By finely interleaving data amounts partially accessed per bank for all sizes of blocks up to largest pages for mapping virtual addressing, the shunted interleave patterns provide improved odds of quicker access to memory than prior classic and Rau's pseudo random interleaves and provide a finer and more even distribution of access for data of most interest (data of power of two strides as naturally has same positions in different blocks of data of sizes from single rows to largest pages for mapping virtual addresses) over all of the banks of a memory. The shunted interleave patterns also allow practical designs for addressing apparatuses to translate fewer address bits for accessing large bit-addressed physical memories, thus providing a significant improvement over the Ranade and Rau prime degree interleaves.
Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.
The present application is a continuation-in-part of and claims priority of U.S. patent application Ser. No. 11/719,926, filed May 22, 2007, now U.S. Pat. No. 7,779,198 granted Aug. 17, 2010, which in turn is a Section 371 National Stage Application of International Application No. PCT/US/2005/042107 filed Nov. 21, 2005 published as WO/2006/057949 on Jan. 6, 2006 in English, which in turn is based on and claims benefit of U.S. Provisional Application No. 60/630,551 filed Nov. 23, 2004, the content of each of which is hereby incorporated by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5293607 | Brockmann et al. | Mar 1994 | A |
5379393 | Yang | Jan 1995 | A |
5452470 | Kantner | Sep 1995 | A |
5596686 | Duluk | Jan 1997 | A |
5634034 | Foster | May 1997 | A |
5790110 | Baker et al. | Aug 1998 | A |
5995438 | Jeng | Nov 1999 | A |
6131146 | Aono | Oct 2000 | A |
6252611 | Kondo | Jun 2001 | B1 |
6292194 | Powell, III | Sep 2001 | B1 |
6381668 | Lunteren | Apr 2002 | B1 |
6430672 | Dhong et al. | Aug 2002 | B1 |
6480943 | Douglas et al. | Nov 2002 | B1 |
6665768 | Redford | Dec 2003 | B1 |
6732253 | Redford | May 2004 | B1 |
6807602 | Hornung et al. | Oct 2004 | B1 |
6807603 | Gupta et al. | Oct 2004 | B2 |
6874070 | Gupta et al. | Mar 2005 | B2 |
6895488 | Feung et al. | May 2005 | B2 |
6925589 | Ben-Ezri | Aug 2005 | B1 |
6931518 | Redford | Aug 2005 | B1 |
7266132 | Liu et al. | Sep 2007 | B1 |
7266651 | Cypher | Sep 2007 | B1 |
7318114 | Cypher | Jan 2008 | B1 |
7337275 | Wolrich et al. | Feb 2008 | B2 |
7418571 | Wolrich et al. | Aug 2008 | B2 |
7471589 | Kim et al. | Dec 2008 | B2 |
7487505 | Rosenbluth et al. | Feb 2009 | B2 |
7515453 | Rajan | Apr 2009 | B2 |
7515588 | Naik et al. | Apr 2009 | B2 |
7610451 | Wolrich et al. | Oct 2009 | B2 |
7610457 | Lee | Oct 2009 | B2 |
7634621 | Coon et al. | Dec 2009 | B1 |
7647470 | Sasaki et al. | Jan 2010 | B2 |
20040019756 | Pergo | Jan 2004 | A1 |
20040093457 | Heap | May 2004 | A1 |
20050185437 | Wolrich et al. | Aug 2005 | A1 |
20090043943 | Hutson | Feb 2009 | A1 |
20100312945 | Hutson | Dec 2010 | A1 |
Number | Date | Country | |
---|---|---|---|
20100138587 A1 | Jun 2010 | US |
Number | Date | Country | |
---|---|---|---|
60630551 | Nov 2004 | US |
Relation | Number | Country |
---|---|---|
Parent | 11719926 | US |
Child | 12698719 | US |