The invention relates to memories, and more particularly relates to transferring information to a memory via a memory interface.
Computer memories have normally been designed to use one memory device for each bit, or for each small group of bits, of any individual computer word. The word size is governed by the choice of computer, and word sizes typically have ranged from 4 to 64 bits. Thus, each memory device is usually connected to one of a series of data lines. One or more devices may be connected to each data line, but typically only a small number of data lines are connected to a single memory device. Data is thus accessed (i.e. read) or provided (i.e. written) in parallel, for each memory read or write operation, respectively.
Different memory access techniques use various burst lengths for providing optimum performance of each application. For example, graphics require short bursts, whereas cache-filling uses long bursts. However, if only short or long bursts are used, then power and performance are necessarily lost.
RAMBUS has defined a method that specifies the burst length at the same time as the burst access is performed. The RAMBUS method is limited to predefined burst lengths that are multiples of 2. A pair of RAMBUS patents are incorporated herein by reference: Farmwald I (U.S. Pat. No. 6,032,214), and Farmwald II (U.S. Pat. No. 6,034,918). Incidentally, those two patents refer to the burst length as “block size.”
It is well known that memory can have a predefined burst length, which is typically a multiple of 2 (e.g. 2, 4, 8 or 16), as in the aforementioned RAMBUS patents. If a memory bus operates at high speed, then it is important to give the memory a maximum time to fetch the right amount of data for optimum performance and power consumption. A known way to do this is to select, from certain fixed values, one particular value that will be used during a burst, either before the burst access is made or while the burst access is performed (as in Farmwald I and II).
It is also known to have a continuous burst without limit, and to stop this kind of continuous burst with a command in a command bus. According to such a method, a burst stop (BST) command is given when it is desired to stop the burst. Unfortunately, that method causes empty clocks regarding the data bus, at least in case of Dynamic Random Access Memory (DRAM) busses. A continuous burst stop is described, for example, in a document from MICRON TECHNOLOGY, INC. titled “Mobile Double Data Rate (DDR) SDRAM,” which is also incorporated herein by reference. The MICRON document refers to a continuous burst stop as a “burst terminate (BST)”.
Data mask signaling is often used in case of a write operation, and data mask signaling has also been used for read operations in a DRAM environment. The status of a data mask pair indicates whether data on the bus is valid or should instead be ignored. Unfortunately, that method does not remove invalid data from the bus, and leaving invalid data in the bus causes the problem that data bus cycles are lost.
Some of these problems are addressed by pending U.S. Provisional Application 60/779,269 titled “Method, Mobile Device, System and Software for Flexible Burst Length Control” (filed 2 Mar. 2006) which is incorporated in its entirety by reference herein. However, that pending application still leaves some challenges with respect to several byte masking cases. Although that pending application uses data masks for burst stop, a new method is needed to overcome the challenges regarding some of the masking cases. That pending application provided flexible burst length for write and read to achieve optimum performance, and that method can still be used, but supplementary improvements are needed.
The present invention discloses a way to have flexible burst length for write operations of RAM or memories with similar functionality. In particular, an improved write method for DRAM-type execution memory is presented. In combination with earlier known methods, the present invention allows full masking functionality like word masking.
According to this invention, the memory will always use a continuous burst or maximum supported burst length (which can be stopped). The method of the present invention is especially useful for allowing flexible burst length with a fast memory interface.
Thus, an unspecified burst length can be used until the burst is stopped, or alternatively a maximum burst length could be used (e.g. 16) but is stopped in case of a shorter burst. The stop method can be accomplished, for example, by reusing data mask signals to indicate when data is supposed to be stopped.
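The stop behavior described above can be illustrated with a minimal sketch. It assumes a simplified model in which one stop flag (standing in for a reused data mask signal) is sampled per data word, and the burst ends at the first asserted flag or at the maximum burst length. The names `write_burst`, `stop_flags` and `MAX_BURST` are hypothetical and are not taken from any memory specification.

```python
MAX_BURST = 16  # example maximum burst length mentioned in the text


def write_burst(data_words, stop_flags, max_burst=MAX_BURST):
    """Accept data words until a stop flag is seen or max_burst is reached.

    Assumption: stop_flags[i] is sampled together with data_words[i], and an
    asserted flag means that word i and all later words are not written.
    """
    accepted = []
    for word, stop in zip(data_words, stop_flags):
        if stop or len(accepted) == max_burst:
            break
        accepted.append(word)
    return accepted


# A burst that is stopped after six words, although up to 16 were allowed.
stopped = write_burst(list(range(16)), [False] * 6 + [True] + [False] * 9)

# A burst that is never stopped runs to the maximum supported length.
full = write_burst(list(range(20)), [False] * 20)
```

In this model a short burst costs only as many data cycles as it has valid words, which is the motivation for stopping the burst rather than masking the unused cycles.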
A preferred embodiment of the present invention will now be described, merely to illustrate one way of implementing the invention, and without limiting the scope or coverage of what is described elsewhere in this application.
According to this embodiment, a new write command is introduced, and an existing write command's functionality is redefined. This means that a first write command is used for a short burst (e.g. the length of the prefetch, such as 4), and another write command is used for a longer burst whose length is a multiple (N>1) of the prefetch (e.g. 8 or 16).
A memory controller can use mask functionality only with the short burst. Conversely, the burst stop can only be used for the long burst.
Compared to previous behavior, the mask signal is now advanced so that it optimally has a latency of one clock, although other values, such as two clocks, are also possible. This way, conflicts between two meanings of byte mask pins cannot exist, since potential mask signals for a short burst do not occur when the stop signal could be active.
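The division of roles between the two write commands can be sketched as follows. This is an illustrative model only, assuming a prefetch of 4 as in the example above. The names `WriteCommand`, `interpret_dm` and `is_valid_burst_length` are hypothetical.

```python
from enum import Enum

PREFETCH = 4  # example prefetch length from the text


class WriteCommand(Enum):
    SHORT = "short"  # burst equals the prefetch; byte masking is allowed
    LONG = "long"    # burst is a multiple (N>1) of the prefetch; stop is allowed


def interpret_dm(command, dm_asserted):
    """Return what an asserted data mask pin means for the given command."""
    if not dm_asserted:
        return "none"
    # The same pin carries two meanings, selected by the write command.
    return "mask" if command is WriteCommand.SHORT else "stop"


def is_valid_burst_length(command, length, prefetch=PREFETCH):
    """Check a requested burst length against the command's rule."""
    if command is WriteCommand.SHORT:
        return length == prefetch
    return length > prefetch and length % prefetch == 0
```

Because the meaning of the pin is fixed by the command that opened the burst, a mask indication and a stop indication can never be confused within the same burst.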
As seen in
Instead of introducing a new command, it is thus possible to separate different write cases with a parameter which indicates data mask usage (i.e. allowed or not allowed). Advantages of this embodiment include bringing all combined advantages of earlier technology to bear on all use cases. That includes optimum power and performance. The performance benefit comes also from improved data mask operations. In other words, during the write, no useless data cycles are needed. Furthermore, there are no additional pins, and no command cycles are lost (in case of signal or calculation-based methods).
The present invention can operate in conjunction with at least four related types of burst length control. Each of these four implementations attains the same principal functionality.
The first of the four implementations is a signal-based method. In an optimized controller, one of the data mask signals (or alternatively some other additional/existing signal) is used to indicate to the memory a time of column address strobe (tCAS) before which the data bus must be released. The tCAS indicates the time (e.g. number of clock cycles) needed to access valid data on the data bus. However, the tCAS time is a minimum, and in some cases it makes sense to allow a longer duration, giving the memory more time to act. In case of very fast buses and over-optimized performance, it can be useful to provide an additional write burst stop with read or write, since it is possible to have a timing conflict for usage of the mask signal in this case, or power/performance might be lost because of a stop indication that is too late (e.g. the memory has already started the next fetch).
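The timing constraint of the signal-based method can be expressed as simple arithmetic. The sketch below assumes one reading of the paragraph above: the stop signal must be asserted at least tCAS clocks before the data bus is to be released, with an optional extra margin. The function and parameter names are hypothetical.

```python
def latest_stop_clock(release_clock, tcas_clocks, margin_clocks=0):
    """Latest clock at which the stop signal may still be asserted.

    Assumption: tCAS is a minimum lead time, and margin_clocks models the
    optional longer duration that gives the memory more time to act.
    """
    if tcas_clocks < 0 or margin_clocks < 0:
        raise ValueError("clock counts must be non-negative")
    return release_clock - tcas_clocks - margin_clocks


def stop_is_in_time(assert_clock, release_clock, tcas_clocks):
    """True if a stop asserted at assert_clock respects the tCAS minimum."""
    return assert_clock <= release_clock - tcas_clocks
```

For example, if the bus must be released at clock 10 and tCAS is 3 clocks, the stop signal has to be asserted by clock 7. A stop asserted at clock 8 would be the late indication that the text warns about.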
The second of the four implementations is a register-based method. This method could be understood as advanced burst stop, used already in DRAMs. The novelty of this implementation resides in predefining the time, once a stop indication arrives, and a register for storing this time is needed. The way to indicate a burst stop can be either a command, or a register write, or even a signal.
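A minimal sketch of the register-based idea follows, assuming that the register holds a delay in clocks and that the burst ends exactly that many clocks after the stop indication arrives. `StopTimeRegister` and `burst_end_clock` are hypothetical names used only for illustration.

```python
class StopTimeRegister:
    """Hypothetical register holding the predefined stop delay in clocks."""

    def __init__(self, delay_clocks):
        if delay_clocks < 0:
            raise ValueError("delay must be non-negative")
        self.delay_clocks = delay_clocks


def burst_end_clock(register, stop_clock):
    """Clock at which the burst ends, given when the stop indication arrived.

    The stop indication itself may be a command, a register write, or a
    signal; only its arrival time matters in this model.
    """
    return stop_clock + register.delay_clocks


# With a programmed delay of 2 clocks, a stop seen at clock 5 ends the burst
# at clock 7.
end = burst_end_clock(StopTimeRegister(2), 5)
```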
The third of the four implementations is a calculation-based implementation. In this method, the memory has a capability to recognize when a counter starts (i.e. some command starts the counter), and to recognize when the counter stops due to commands or addresses. This counter value would then indicate how many data cycles are needed. For example, an indication of a start could be a row address which comes with a row activate command and a finishing column address which comes with a read or write command, or in case of column-only then a column address could be split into two or even more cycles. The challenge for this method is that different burst lengths require different timing: the interval between the counter start and stop commands (e.g. the address cycles) is dictated by the needed burst length, which makes bus usage more complicated.
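The counter behavior can be sketched as below. The model assumes one command per clock, with a row activate starting the counter and a read or write command stopping it. The mnemonics `ACT`, `RD`, `WR` and `NOP` and the function name are illustrative assumptions, not a definition of any real command set.

```python
def data_cycles_from_commands(commands):
    """Count clocks from a start command to the following stop command.

    commands is a per-clock list of command mnemonics. "ACT" starts (or
    restarts) the counter and "RD" or "WR" stops it. Returns the counter
    value, or None if no complete start/stop pair is found.
    """
    counter = None
    for cmd in commands:
        if cmd == "ACT":
            counter = 0
        elif counter is not None:
            counter += 1
            if cmd in ("RD", "WR"):
                return counter
    return None


# Activate at clock 1, write at clock 4: the counter reads 3.
cycles = data_cycles_from_commands(["NOP", "ACT", "NOP", "NOP", "WR"])
```

The sketch also shows the drawback noted above: the controller has to place the stop command at a specific distance from the start command for every desired burst length.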
The fourth of the four implementations is an enhanced signal-based method. In an optimized controller, one of the data mask signals (or alternatively some other additional/existing signal) is used to indicate to the memory a time which defines when the data bus must be released. In this implementation, the memory counts rising clock edges from a read/write command (falling edges could be used as well, but then of course the formula is different). The resulting count (the sum) is then used in a formula, such as 2 to the power of the sum, or 2 times the sum. This formula could also be used with the calculation-based implementation. According to the power formula, if a burst length of 8 is desired, for instance, then 3 rising edges would be provided. This method could be enhanced: e.g. the count could start from 2, 4 or something else. In case of starting from 2, one counted rising edge would result in a burst length of four. This implementation gives the longest time for the memory to behave properly. Among these four implementations, the signal-based methods are likely to be the best, especially a data-mask signal-based implementation in a case such as DRAM.
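The two example formulas can be checked with a short sketch. It assumes one reading of "starting from 2", namely that the first counted edge is assigned the value 2 instead of 1. Under that reading, both formulas give a burst length of four for a single edge. The function name and the `first_count` parameter are hypothetical.

```python
def burst_length(edges, first_count=1, formula="power"):
    """Derive a burst length from the number of counted rising clock edges.

    Assumption: first_count is the value given to the first counted edge,
    so the sum equals first_count + edges - 1.
    """
    if edges < 1:
        raise ValueError("at least one edge must be counted")
    total = first_count + edges - 1
    if formula == "power":
        return 2 ** total  # "2 to the power of the sum"
    if formula == "times":
        return 2 * total   # "2 times the sum"
    raise ValueError("unknown formula: " + formula)


# Three rising edges with the power formula give a burst length of 8.
eight = burst_length(3)

# Counting from 2, one rising edge gives a burst length of 4.
four = burst_length(1, first_count=2)
```

The power formula reaches long bursts with few edges, whereas the multiplicative formula allows burst lengths such as 6 that are not powers of 2.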
According to the flow chart 300 of
Turning now to the system 400 of
The present invention can be implemented using a general purpose or specific-use computer system, with program code conforming to the method described herein. The program code is designed to drive the operation of the particular hardware of the system, and to be compatible with other system components and I/O controllers. The computer system of this embodiment may include a CPU processor, comprising a single processing unit, or multiple processing units capable of parallel operation, or the processor can be distributed across one or more processing units in one or more locations, e.g., on a client and server. The memory containing the memory component 430 may comprise any known type for data storage, including magnetic media, optical media, random access memory (RAM), read-only memory (ROM), a data cache, a data object, etc. Moreover, the memory may reside at a single physical location, comprising one or more types of data storage, or be distributed across a plurality of physical systems in various forms.
It is to be understood that the present figures, and the accompanying narrative discussions of best mode embodiments, do not purport to be completely rigorous treatments of the method, system, apparatus, and software product under consideration. A person skilled in the art will understand that the steps and signals of the present application represent general cause-and-effect relationships that do not exclude intermediate interactions of various types, and will further understand that the various steps and structures described in this application can be implemented by a variety of different sequences and configurations, using various different combinations of hardware and software which need not be further detailed herein.
This application claims priority to U.S. Provisional Application 60/842,196 filed Aug. 31, 2006.
Number | Date | Country
---|---|---
60842196 | Aug 2006 | US