The execution of machine language instructions by a processor may involve storing data chunks in a destination device, such as a system memory or a cache memory. For reasons such as prioritization of accesses to the destination device, or for any other reason, the destination device may not be accessible for storing the data chunks at the same time the data chunks are available to be stored.
In order to bridge the time gap between the availability of the data chunks and the accessibility of the destination device, or for any other reason, an intermediate buffer (“write buffer”) may be used in the processor to temporarily store the data chunks until they can be stored in the destination device.
Such a write buffer may be implemented, for example, as a pointer-based first-in-first-out (FIFO) memory. The pointer-based FIFO may include, for example, a random access memory, and a control unit may select any entry in the random access memory to store a data chunk received through an input port of the FIFO. In addition, the control unit may control an output multiplexing unit of the FIFO to retrieve the data chunks from the random access memory in the same order the data chunks were received through the input port, and may control outputting the data chunks through an output port. A specific data chunk is written to and read from only one location in the random access memory. The read and write pointers of the FIFO change from one data chunk to another.
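By way of illustration only, the behavior of a pointer-based FIFO may be sketched in software as follows. The class name `PointerFifo`, the circular advance of the pointers and the `entry_writes` counter are assumptions made for this sketch and are not taken from the description above, which allows the control unit to select any entry.

```python
class PointerFifo:
    """Sketch of a pointer-based FIFO: a chunk is written to and read from one entry."""

    def __init__(self, depth):
        self.ram = [None] * depth   # models the random access memory
        self.depth = depth
        self.wr = 0                 # write pointer (assumed to advance circularly)
        self.rd = 0                 # read pointer (assumed to advance circularly)
        self.count = 0              # number of occupied entries
        self.entry_writes = 0       # total storage writes, for comparison purposes

    def push(self, chunk):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.ram[self.wr] = chunk   # the only write this chunk ever causes
        self.entry_writes += 1
        self.wr = (self.wr + 1) % self.depth
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        chunk = self.ram[self.rd]   # models the output multiplexing unit selecting entry rd
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return chunk
```

In this sketch, pushing three data chunks and popping them returns the chunks in the order received, and `entry_writes` equals the number of chunks pushed.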
In another example, a write buffer may be implemented as a shift-based FIFO memory. The shift-based FIFO may have an input storage element, an output storage element and intermediate storage elements. A data chunk received through an input port of the write buffer may be initially stored in the input storage element, and may propagate through all the intermediate storage elements, one at a time, according to the availability of empty storage elements and accessibility of the destination device, until it is stored in the output storage element. The destination device may receive the data chunk from the output storage element of the write buffer.
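By way of illustration only, a shift-based FIFO may be modeled as a list of storage elements through which a data chunk advances by at most one element per clock cycle. The class name `ShiftFifo`, the `clock` method, the `dest_ready` flag and the `element_writes` counter are hypothetical names introduced for this sketch.

```python
class ShiftFifo:
    """Sketch of a shift-based FIFO; stages[0] is the input element, stages[-1] the output element."""

    def __init__(self, depth):
        self.stages = [None] * depth
        self.element_writes = 0     # total storage-element writes, for comparison purposes

    def clock(self, chunk_in=None, dest_ready=False):
        """Advance one clock cycle; return the chunk handed to the destination, or None."""
        out = None
        # The destination takes the chunk from the output element only when accessible.
        if dest_ready and self.stages[-1] is not None:
            out = self.stages[-1]
            self.stages[-1] = None
        # Each chunk moves one element toward the output if the next element is empty.
        for i in range(len(self.stages) - 1, 0, -1):
            if self.stages[i] is None and self.stages[i - 1] is not None:
                self.stages[i] = self.stages[i - 1]
                self.stages[i - 1] = None
                self.element_writes += 1
        # A newly received chunk is stored in the input element.
        if chunk_in is not None:
            if self.stages[0] is not None:
                raise OverflowError("input storage element occupied")
            self.stages[0] = chunk_in
            self.element_writes += 1
        return out
```

In this sketch, a single chunk entering an empty four-element FIFO is written four times, once per storage element, before it can be handed to the destination.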
A write buffer implemented using a pointer-based FIFO may have dynamic power consumption that is lower than the dynamic power consumption of a write buffer implemented using a shift-based FIFO. One possible reason for the difference in dynamic power consumption may be that a data chunk that is written to one entry of the pointer-based FIFO is outputted from the same entry, while a data chunk is propagated through several storage elements of the shift-based FIFO before being outputted.
On the other hand, a write buffer implemented using a pointer-based FIFO may require more silicon area than a write buffer implemented using a shift-based FIFO, and may have higher combinatorial propagation delays that may impair the frequency performance of the pointer-based FIFO write buffer. One possible reason for the larger silicon area and the lower frequency performance may be the output multiplexing unit of the pointer-based FIFO.
Embodiments of the invention are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals indicate corresponding, analogous or similar elements, and in which:
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Write buffer 24 may be able to receive from LSUs 38 and 40, via input ports 42 and 44, respectively, data chunks to be stored in data memory 6, and to store the received data chunks internally. Write buffer 24 may also be able to receive data chunks from elsewhere in processor 10 and to store the received data chunks internally. In some processors, the size of a data chunk may be variable, whereas in other processors, the size of a data chunk may be fixed. The size of a data chunk may be any number of bits; the following description is for a fixed size of 32 bits.
Output ports 46 and 48 of write buffer 24 may be coupled to, for example, data memory bus 12, and write buffer 24 may be able to output internally stored data chunks through output ports 46 and/or 48 to data memory bus 12, prior to these data chunks being stored in data memory 6.
32-bit address buses from LSU 38 to input port 42, from LSU 40 to input port 44, from output port 46 to data memory bus 12 and from output port 48 to data memory bus 12, as well as the 32-bit address portion of data memory bus 12 are not shown in
Write buffer 24 may receive control signals 50 that may be generated by CBU 20 and/or DAAU 18 and/or PCU 16 and/or memory subsystem controller 22 and/or any other unit of processor 10. Control signals 50 may control reception of data chunks by write buffer 24 and may control outputting the data chunks by write buffer 24.
In addition, control signals 50 may control the number of cycles of a clock 52 that pass between reception of a particular data chunk by write buffer 24 and output of the particular data chunk from write buffer 24. Clock 52 is not necessarily a regular clock with cycles of a fixed time period. Rather, clock 52 may be generated by any logic function and different cycles of clock 52 may have different time periods.
Write buffer 24 is a dual-input, dual-output write buffer. Write buffer 24 may include one or more routing blocks 62, controlled by control signals 50, to provide alternative propagation paths for data chunks from input ports 42 and 44 to output ports 46 and 48. In the example shown in
In write buffer 24, routing blocks 62A, 62B, 62C, 62D, 62E and 62F each have two data-chunk-sized inputs and one data-chunk-sized output. Routing blocks 62G and 62H each have four data-chunk-sized inputs and one data-chunk-sized output. Control signals 50 couple one of the inputs of a routing block to the output of the routing block.
Routing block 62A couples input ports 42 and 44 to storage element 60A. Routing block 62B couples input port 44 and storage element 60A to storage element 60B.
Routing block 62C couples storage elements 60A and 60B to storage element 60C. Routing block 62D couples storage elements 60B and 60C to storage element 60D. Routing block 62E couples storage elements 60C and 60D to storage element 60E. Routing block 62F couples storage elements 60D and 60E to storage element 60F.
Routing block 62G couples storage elements 60C, 60D, 60E and 60F to storage element 60G. Routing block 62H couples storage elements 60D, 60E, 60F and 60G to storage element 60H. The output of storage element 60G is coupled to output port 48, and the output of storage element 60H is coupled to output port 46.
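By way of illustration only, the couplings described above may be recorded as a table listing, for each storage element, the sources selectable by the routing block that feeds it. The labels `in42`, `in44`, `out46` and `out48` and the function `is_valid_path` are hypothetical names introduced for this sketch.

```python
# Sources selectable by each routing block, keyed by the storage element it feeds.
SOURCES = {
    "60A": ("in42", "in44"),                 # routing block 62A
    "60B": ("in44", "60A"),                  # routing block 62B
    "60C": ("60A", "60B"),                   # routing block 62C
    "60D": ("60B", "60C"),                   # routing block 62D
    "60E": ("60C", "60D"),                   # routing block 62E
    "60F": ("60D", "60E"),                   # routing block 62F
    "60G": ("60C", "60D", "60E", "60F"),     # routing block 62G
    "60H": ("60D", "60E", "60F", "60G"),     # routing block 62H
}

# Storage elements whose outputs are coupled to the output ports.
OUTPUTS = {"60G": "out48", "60H": "out46"}


def is_valid_path(in_port, elements, out_port):
    """Return True if every hop of the path is a coupling listed in SOURCES."""
    if not elements or OUTPUTS.get(elements[-1]) != out_port:
        return False
    previous = in_port
    for element in elements:
        if previous not in SOURCES[element]:
            return False
        previous = element
    return True
```

For example, `is_valid_path("in42", ["60A", "60C", "60E", "60G"], "out48")` returns `True`, whereas a path that attempts to enter storage element 60B directly from input port 42 is rejected, because routing block 62B is coupled only to input port 44 and storage element 60A.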
Lengths of alternative propagation paths are independently selectable for each data chunk received through input port 42 or 44. Different lengths of propagation paths result in a variable effective depth of the write buffer.
Many different propagation paths are possible in write buffer 24. Arbitrarily selected, some propagation paths are presented in TABLE 1 to demonstrate possible lengths of alternative propagation paths. Each row of TABLE 1 represents a propagation path. The length of a propagation path is recorded as the number of storage elements through which a data chunk is propagated, and the storage elements that form part of the propagation path are marked with “X”.
It should be noted that one configuration of routing blocks 62 concurrently provides path “e” from input port 42 to output port 48 via storage elements 60A, 60C, 60E and 60G and path “j” from input port 44 to output port 46 via storage elements 60B, 60D, 60F and 60H. Path “e” and path “j” each delay the data chunks by at least 4 clock cycles. If a destination device is not accessible, the delay may be even longer. A different configuration of routing blocks provides path “a” from input port 42 to output port 46 via all the storage elements 60A-60H. Path “a” delays the data chunks by at least 8 clock cycles. If a destination device is not accessible, the delay may be even longer. Yet another configuration of routing blocks provides path “d” from input port 42 to output port 48 via storage elements 60A, 60B, 60C, 60D and 60G.
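By way of illustration only, the minimum delays noted above may be reproduced by counting the storage elements of each path, assuming a data chunk advances by one storage element per clock cycle. The dictionary `PATHS` and the functions `min_delay_cycles` and `share_no_elements` are hypothetical names introduced for this sketch, and the figure obtained for path “d” is an inference from that assumption rather than a value stated above.

```python
# Storage elements of the paths named in the text (ports omitted).
PATHS = {
    "a": ["60A", "60B", "60C", "60D", "60E", "60F", "60G", "60H"],
    "d": ["60A", "60B", "60C", "60D", "60G"],
    "e": ["60A", "60C", "60E", "60G"],
    "j": ["60B", "60D", "60F", "60H"],
}


def min_delay_cycles(name):
    """Minimum delay in clock cycles, assuming one storage element per cycle."""
    return len(PATHS[name])


def share_no_elements(first, second):
    """True if two paths use disjoint storage elements, so one configuration can provide both."""
    return set(PATHS[first]).isdisjoint(PATHS[second])
```

Under this assumption, paths “e” and “j” each yield 4 cycles and use disjoint storage elements, path “a” yields 8 cycles, and path “d” would yield 5 cycles.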
Write buffer 124 is a single-input, single-output write buffer. Write buffer 124 may include a routing block 162, controlled by control signals 150, to provide alternative propagation paths for data chunks from an input port 142 to an output port 146. Routing block 162 has four data-chunk-sized inputs and one data-chunk-sized output. Control signals 150 couple one of the inputs of routing block 162 to the output of routing block 162.
The input of storage element 160A is coupled to input port 142. The inputs of storage elements 160B, 160C, 160D, 160E, 160F and 160G are coupled to the outputs of storage elements 160A, 160B, 160C, 160D, 160E and 160F, respectively. The output of storage element 160H is coupled to output port 146.
Routing block 162 couples storage elements 160D, 160E, 160F and 160G to storage element 160H.
Lengths of alternative propagation paths are independently selectable for each data chunk received through input port 142. Different lengths of propagation paths result in a variable effective depth of the write buffer.
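By way of illustration only, the effect of routing block 162 on the effective depth of write buffer 124 may be sketched as follows. The names `CHAIN`, `SELECTABLE` and `path_for` are hypothetical, and the sketch covers only the four couplings of routing block 162 described above.

```python
# Storage elements 160A-160G are chained in order from input port 142.
CHAIN = ["160A", "160B", "160C", "160D", "160E", "160F", "160G"]

# Storage elements that routing block 162 can couple to storage element 160H.
SELECTABLE = ("160D", "160E", "160F", "160G")


def path_for(selected):
    """Return the storage elements traversed when `selected` feeds storage element 160H."""
    if selected not in SELECTABLE:
        raise ValueError(f"{selected} is not coupled to routing block 162")
    # The chunk traverses the chain up to the selected element, then enters 160H.
    return CHAIN[: CHAIN.index(selected) + 1] + ["160H"]


def effective_depths():
    """Path lengths obtainable from the four selections, in ascending order."""
    return sorted(len(path_for(element)) for element in SELECTABLE)
```

In this sketch, selecting storage element 160G yields a path through all eight storage elements, while selecting storage element 160D bypasses storage elements 160E, 160F and 160G and yields a path of five storage elements.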
Some propagation paths from input port 142 to output port 146 are presented in TABLE 2 to demonstrate possible lengths of alternative propagation paths. Each row of TABLE 2 represents a propagation path. The length of a propagation path is recorded as the number of storage elements through which a data chunk is propagated, and the storage elements that form part of the propagation path are marked with “X”.
It should be noted that the different paths “aa”, “bb”, “cc”, “dd” and “ee” have different lengths. It should also be noted that path “aa” includes all of the storage elements 160, while the other paths exclude at least one of the storage elements. In paths “bb”, “cc”, “dd” and “ee”, the excluded storage elements are a chain of one or more storage elements that immediately precede storage element 160H.
While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the spirit of the invention.