1. Technical Field of the Invention
This invention relates generally to register files for storing data in a processor, and more particularly to register files adapted for use in conjunction with matrix arithmetic logic units.
2. Background Art
When data are read out of the register file, they are held in a latch before being provided as input to an arithmetic logic unit (ALU). Commonly, a register file has at least two read ports and can simultaneously provide at least two outputs which are used as operand inputs into one or more ALUs. For simplicity, only a single read port is shown. Commonly, the register file has one or more write ports into which the ALU's result is written back to the register file. For simplicity, no write port is shown.
Within a data item stored within a physical row, the most significant byte (MSB) is toward column 7 and the least significant byte (LSB) is toward column 0, in a “big-endian” configuration. Both the prior art and the present invention will be discussed in terms of big-endian configurations, although neither is thus limited.
Although the storage locations have a “physical” size, such as one byte each, most digital logic systems include hardware for enabling the software to utilize data items of one or more different “logical” sizes. For purposes of illustration, the prior art and the present invention will be explained as being able to access logical data including single-byte data, word (2-byte) data, double-word (4-byte) data, and quad-word (8-byte) data.
Some digital processing systems access and process vectors of these data types, in which two or more data items (of a particular size) are grouped and processed together. Some digital processing systems access and process scalar data (single data items of a particular size). Both types of systems can benefit from improved register file access performance, and from improved register file wiring and configuration.
A word data item occupying e.g. storage locations C7 and C6 is referred to as C7:6, a double-word item occupying storage locations A3, A2, A1, and A0 is referred to as A3:0, and a quad-word item occupying storage locations G7, G6, . . . G0 is referred to as G7:0.
Previously, the register file configuration as it appears to software, known as the “logical” configuration, has been identical with the way that the register file is actually constructed, known as the “physical” configuration. If the logical configuration is an 8-by-8 matrix of single-byte storage locations, then the register file has been physically constructed as an 8-by-8 matrix of single-byte storage locations.
This is not a problem when the register file is being accessed row-wise, because each of the storage elements' data can be directly, vertically driven onto its own, local set of bit lines. However, when the register file is accessed column-wise, the storage elements' data must be driven both vertically and horizontally (so data from multiple data elements in a single column can be driven onto different vertical bit lines). This requires a substantial amount of additional, horizontal wiring and control logic in each row of the register file. Existing systems limit the vector element size that can be accessed column-wise, to avoid an explosion in the number and complexity of required wires and logic.
One example of a recent attempt to deal with this problem is presented in a paper entitled “A Register File with Transposed Access Mode” by Yoochang Jung, Stefan G. Berg, Donglok Kim, and Yongmin Kim of the Image Computing Systems Laboratory at the University of Washington, Seattle, Wash., 98195. Although it does permit both row-wise and column-wise accesses, the Jung register file has several significant drawbacks. It requires separate address decoders for row-wise access and for column-wise access. For each data element size (byte, word, etc.) it requires a separate copy of each of the row-wise and column-wise address decoders. The number of rows useable in column-wise access is reduced by a factor of X, where X is the number of bytes in the data element size. And, as nearly as we can ascertain, the width of each row's bus is equal to the largest permissible data element size.
Thus, for all row-wise accesses and for byte-sized column-wise accesses, the existing systems perform adequately. The problem manifests itself when word-sized and larger column-wise accesses are performed.
Referring to
However, referring to
Some existing systems have solved this problem by adding additional decoders which require vertical column select lines and additional horizontal routing associated with each data port and for each data size which the system is able to access. For example, to perform the read shown in
What is needed, then, is an improved matrix register file which, with a minimal amount of additional wiring, allows logical rows and columns to be accessed using any of several elemental data sizes.
FIGS. 20A-D together show one embodiment of the contents of a lookup table, or of the output of logic, including the element selection line values and column output selection line values which result from each address combination of row-wise indicator, row/column index, and data element size indicator.
The invention will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described, but are for explanation and understanding only.
In the prior art, logical rows and physical rows were the same thing. In the prior art, logical columns and physical columns were the same thing. In other words, a storage location's logical address e.g. “D4” precisely indicated its physical location within the register file. Row-wise access was performed simply by decoding the register address and activating a single “row select” line.
According to the present invention, logical rows are organized in physical columns, and within each physical column, the storage locations have been reordered differently. The result is that logical rows and physical rows are not only not the same thing, but the physical locations that make up a logical row are not even stored in the same physical row.
This reorganization can be done in a variety of manners.
Within physical column 7, the storage locations of Logical Row A are stored sequentially from physical row 7 to physical row 0. Within physical column 6, the storage locations of Logical Row B are stored sequentially from physical row 3 to physical row 0, wrapping to physical row 7 and continuing to physical row 4. Within physical column 5, the storage locations of Logical Row C are stored sequentially from physical row 5 to physical row 0, wrapping to physical row 7 and continuing to physical row 6. Within physical column 4, the storage locations of Logical Row D are stored sequentially from physical row 1 to physical row 0, wrapping to physical row 7 and continuing to physical row 2. Within physical column 3, the storage locations of Logical Row E are stored sequentially from physical row 0, wrapping to physical row 7 and continuing to physical row 1. Within physical column 2, the storage locations of Logical Row F are stored sequentially from physical row 4 to physical row 0, wrapping to physical row 7 and continuing to physical row 5. Within physical column 1, the storage locations of Logical Row G are stored sequentially from physical row 6 to physical row 0, wrapping to physical row 7. Within physical column 0, the storage locations of Logical Row H are stored sequentially from physical row 2 to physical row 0, wrapping to physical row 7 and continuing to physical row 3.
No two Logical Rows have their storage starting in the same physical row, and, significantly, the physical storage locations which are accessed in any single logical column-wise access of any data size are all stored within different physical rows. Each physical row contains exactly one element from each logical row.
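The reordering described above can be checked mechanically. The following is a minimal sketch, not the patented implementation: the starting rows are taken from the preceding description, while the function names and, in particular, the way a column-wise access groups consecutive logical rows (`column_access`) are assumptions made for illustration only.

```python
# Starting physical row (where byte 7 is stored) of each logical row, as listed above.
START_ROW = {"A": 7, "B": 3, "C": 5, "D": 1, "E": 0, "F": 4, "G": 6, "H": 2}
# Logical Row A occupies physical column 7, Row B column 6, ..., Row H column 0.
PHYS_COL = {name: 7 - i for i, name in enumerate("ABCDEFGH")}


def physical_location(logical_row, byte_index):
    """Return (physical_row, physical_column) of one byte; byte_index 7 is the MSB."""
    phys_row = (START_ROW[logical_row] - (7 - byte_index)) % 8
    return phys_row, PHYS_COL[logical_row]


def column_access(size, row_group, col_group):
    """Assumed grouping: 8 // size elements of `size` bytes each, taken from
    consecutive logical rows at the same aligned byte positions."""
    n = 8 // size
    rows = "ABCDEFGH"[row_group * n:(row_group + 1) * n]
    byte_indices = range(col_group * size, (col_group + 1) * size)
    return [physical_location(r, b) for r in rows for b in byte_indices]


def conflict_free(locations):
    """True when no two locations share a physical row."""
    phys_rows = [row for row, _ in locations]
    return len(set(phys_rows)) == len(phys_rows)


def all_accesses_conflict_free():
    """Check every byte, word, double-word, and quad-word access under the assumed grouping."""
    for size in (1, 2, 4, 8):
        for row_group in range(size):        # 8 rows split into groups of 8 // size
            for col_group in range(8 // size):
                if not conflict_free(column_access(size, row_group, col_group)):
                    return False
    return True
```

Under this assumed grouping, every access touches eight storage locations in eight different physical rows, which is the property the mapping is required to have. The quad-word case also covers an ordinary row-wise access, since it reads all eight bytes of one logical row.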
Each storage location has a dedicated selection logic element 32 and a dedicated column output control logic element 36. In one embodiment, the selection logic element is a three-input AND gate with its inputs in positive or negative (inverted) state as indicated by the three-digit binary value, such that if that three-digit value is asserted on the ESel line, exactly that one selection logic element in the physical row will produce an active output enable signal to its storage cell. In the illustrated embodiment, the storage cell responds to this enable signal by outputting its stored value onto a common busN which is shared by the storage elements in that physical row. In one embodiment, the output control logic element operates similarly, such that if its corresponding three-digit value is asserted on the COut line, exactly that one output control element will pass onto its corresponding eight-bit column output bit line 26 the value on the bus.
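The behavior of one physical row can be modeled in a few lines. This is a sketch under stated assumptions: it supposes that the selection logic element of the cell in physical column c is wired to match the three-bit code c, and that the output control element feeding column output c is likewise wired to match code c. The function names are hypothetical.

```python
def and3_match(code, value):
    """Three-input AND gate whose inputs are inverted wherever `code` has a 0 bit.

    Returns 1 only when the 3-bit `value` on the lines equals `code`.
    """
    out = 1
    for bit in range(3):
        line = (value >> bit) & 1
        wanted = (code >> bit) & 1
        out &= line if wanted else 1 - line
    return out


def read_physical_row(cells, esel, cout):
    """Model one physical row.

    cells: list of 8 byte values indexed by physical column (assumed wiring:
    the cell in column c matches ESel code c).
    Returns (output_column, byte) for the single cell that was selected.
    """
    enables = [and3_match(col, esel) for col in range(8)]
    bus = cells[enables.index(1)]          # exactly one cell drives the shared bus
    passes = [and3_match(col, cout) for col in range(8)]
    return passes.index(1), bus            # exactly one output element passes the bus
```

For example, with ESel = 5 and COut = 2, the byte held in physical column 5 of that row is driven onto the row's bus and passed onto column output 2.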
The ESel and COut values are, in one embodiment, driven from a lookup table. The lookup table is indexed by the logical row or logical column identifier, a data size indicator, and a column-wise/row-wise selector value.
FIGS. 20A-D together illustrate one example of a suitable lookup table for generating the ESel and COut values. For ease of understanding, the respective byte, word, double-word, and quad-word sections have been grouped vertically; however, the two-bit value which selects between these four addressing modes might typically be utilized in conjunction with the row-wise selector bit and the three-bit row or column selector value. In other words, the lookup table may be indexed by a 6-bit value comprising:
<1-bit row-wise indicator><3-bit row or column index><2-bit size indicator>
If the row-wise indicator value is 1, the register file is being accessed row-wise; if it is 0, the register file is being accessed column-wise. The row or column index is a value in the range 111 (7) through 000 (0). A size indicator of 00 may cause byte-sized data access, 01 may cause word-sized data access, 10 may cause double-word-sized data access, and 11 may cause quad-word-sized data access. If other sizes are permitted, the indicator will need to be encoded accordingly. Similarly, the size of the row or column index will need to be selected according to the size of the register file.
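As an illustration of the 6-bit index layout just described, the fields could be packed and unpacked as follows. The function and constant names are hypothetical; only the field order and the size encodings come from the text above.

```python
# Size indicator encodings given above, mapped to element size in bytes.
SIZE_BYTES = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}


def pack_index(row_wise, index, size_code):
    """Build <1-bit row-wise indicator><3-bit row or column index><2-bit size indicator>."""
    if row_wise not in (0, 1) or not 0 <= index <= 7 or size_code not in SIZE_BYTES:
        raise ValueError("field out of range")
    return (row_wise << 5) | (index << 2) | size_code


def unpack_index(value):
    """Split a 6-bit lookup index back into (row_wise, index, size_code)."""
    return (value >> 5) & 1, (value >> 2) & 0b111, value & 0b11
```

A column-wise, word-sized access to index 5 would thus be encoded as `pack_index(0, 5, 0b01)`, i.e. binary 010101.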
Typically, the table will output forty-eight bits, consisting of the three-bit ESel value and the three-bit COut value for each of the eight physical rows in the register file. Within each cell of the following table, the eight three-bit values are organized top to bottom indicating the ESel or COut values provided to physical row 7 through physical row 0. The number of bits output per table access will depend on the size of the register file.
In other embodiments, rather than the ESel and COut values being stored in a table, they could be generated by decoder logic. This may offer some opportunity for die area savings. For example, in row-wise access mode, the ESel value is simply the same as the row/column index value, which can be passed straight through the decoder logic without the need for any storage cells. Similarly, in byte-size column-wise access mode, the ESel and COut values are identical, and in quad-word column-wise access mode, the ESel value is the same as the row/column index value. These and other embodiments and optimizations will be readily apparent to those skilled in the art, armed with the teachings of this disclosure.
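A decoder of this kind might be sketched as follows for the two simplest modes. This is an illustration under assumptions, not the contents of FIGS. 20A-D: it assumes the storage layout described earlier, that a cell's ESel code equals its physical column number (so logical row index 7 denotes Logical Row A), and that a COut code names the destination output column. All names are hypothetical.

```python
# START[c]: physical row holding byte 7 of the logical row stored in physical column c.
START = {7: 7, 6: 3, 5: 5, 4: 1, 3: 0, 2: 4, 1: 6, 0: 2}


def decode_row_wise(index):
    """(ESel, COut) pairs for physical rows 7 down to 0, row-wise access.

    ESel is the index passed straight through; COut is the byte position
    that the selected logical row stores in each physical row.
    """
    return [(index, (r + 7 - START[index]) % 8) for r in range(7, -1, -1)]


def decode_byte_column(index):
    """(ESel, COut) pairs for physical rows 7 down to 0, byte column-wise access.

    Each physical row holds byte `index` of exactly one logical row; that
    row's byte is steered to the output column matching its logical row.
    """
    pairs = []
    for r in range(7, -1, -1):
        col = next(c for c in range(8) if (START[c] - 7 + index) % 8 == r)
        pairs.append((col, col))           # ESel and COut are identical in this mode
    return pairs


def pack48(pairs):
    """Concatenate eight (ESel, COut) pairs into one 48-bit word, physical row 7 first."""
    word = 0
    for esel, cout in pairs:
        word = (word << 6) | (esel << 3) | cout
    return word
```

In this sketch the row-wise decoder needs no stored ESel values at all, and the byte column-wise decoder produces one three-bit value per row that serves as both ESel and COut, which is where the potential area savings would come from.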
There are a variety of such mappings which can be applied to the physical register file within the teachings of this invention. What matters is that, regardless of which logical row or column and which data element size is used in the access, no physical row contains two or more of the required storage locations.
When the digital logic system (not shown) makes an access of a logical row or column whose address puts it within the first portion of the register file, the lookup table (or other suitable means such as a state machine or hard coded logic) uses the row-wise indicator, data size indicator, and row/column index to generate the appropriate ESel and COut values to access the required storage elements within the first portion of the register file. The ESel values select the correct storage element in each respective row of that portion of the register file, and the COut values steer them onto their correct bit lines. The first portion of the register file thus permits accessing both logical rows and logical columns.
When the digital logic system makes an access of a logical row whose address puts it within the second portion of the register file, e.g. if the first portion contains 16 logical rows 0 through 15 and the access is to logical row 27, decoder logic responds to the logical row index to generate a row select signal enabling access of a physical row within the second portion of the register file. Because the second portion does not use the COut logic, the bytes within the selected row cannot be steered and are simply output on the bit lines at their respective column positions. Thus, the second portion of the register file permits accessing only logical rows. The COut lines in the first portion of the register file are enhanced with an extra “enable” bit which, when deasserted, prevents that row from being coupled to any of the bit lines. Alternatively, a single enable line could be added to decouple the first portion's bit lines from the second portion's bit lines.
In other embodiments, the second portion of the register file could be modified to permit accessing logical columns as well. In one such embodiment, the technique of this invention could be used. In other embodiments, other techniques could be used.
In one embodiment, two or more register files according to the teachings of this invention may be stacked vertically, to share bit lines. For example, if the physical row is 8 bytes wide, it may be convenient to include 8 physical rows in the register file so it is square. Then, if more than 8 rows are needed, it may be convenient to simply stack two such register files vertically, and use the most significant bit of the row/column index value to select between the two register files.
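The stacked arrangement amounts to splitting a wider index into a file select bit and a local index. A minimal sketch, assuming two 8-row register files and a 4-bit row/column index (the function name is hypothetical):

```python
ROWS_PER_FILE = 8  # each stacked register file has 8 physical rows of 8 bytes


def select_stacked_file(index):
    """Split a 4-bit row/column index into (file_select, 3-bit local index).

    The most significant bit chooses between the two stacked register files;
    the remaining three bits address a row or column within the chosen file.
    """
    if not 0 <= index < 2 * ROWS_PER_FILE:
        raise ValueError("index out of range for two stacked register files")
    return index >> 3, index & 0b111
```

Index 11 (binary 1011), for instance, would select the second register file and local index 3 within it.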
When one component is said to be “adjacent” to another component, it should not be interpreted to mean that there is absolutely nothing between the two components, only that they are in the order indicated.
The various features illustrated in the figures may be combined in many ways, and should not be interpreted as though limited to the specific embodiments in which they were explained and shown.
Except where expressly indicated otherwise, the term “line” should not be interpreted as meaning exactly one single wire; rather, it generally indicates one or more wires carrying one or more related bits of data.
Those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present invention. Indeed, the invention is not limited to the details described above. Rather, it is the following claims including any amendments thereto that define the scope of the invention.