In general, the present disclosure is directed to accessing data from memory in a processor-based system. More particularly, the present disclosure is directed to cache systems and allowing the access of data in cache memory while a cacheline is being filled to thereby increase the processor access speed.
The demand on computer systems to quickly process, store, and retrieve large amounts of data and/or instructions continues to increase. One way to speed up a processor's access of stored data is to use cache memory for storing a duplicate copy of the data that the processor most recently retrieved from main memory. When the processor requests data that resides in the cache, the data can be retrieved much more quickly from cache than if the processor is required to retrieve the same data from main memory. Since software is typically written such that the same locations in memory are accessed over and over, it has been known in the art to incorporate some type of cache system in communication with the processor for speeding up the data access time by making the needed data more quickly accessible.
The cache controller 22 is configured to be connected in the cache system 20 so as to control the operations associated with the cache 24. When the processor 12 requests to access data from main memory 14, the cache controller 22 first checks to see if the data is already in the cache 24. If it is, then this access is considered a “cache hit” and the data can be quickly retrieved from the cache 24. If the data is not in the cache 24, then the result is a “cache miss” and the processor 12 will have to request the data from main memory 14 and store a copy of the data in the cache 24 for possible use at a later time.
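By way of illustration only, the hit or miss determination described above may be modeled in software roughly as follows. The direct-mapped organization, the line size, and the names used (CACHE_LINES, CacheLine, lookup) are illustrative assumptions rather than details of the conventional cache system 20:

```python
# Minimal software model of a cache lookup; parameters and organization are assumed.
CACHE_LINES = 1024   # assumed number of cachelines in the cache array
LINE_SIZE = 8        # entries per cacheline, matching the eight-entry example below

class CacheLine:
    def __init__(self):
        self.valid = False              # one valid bit per cacheline
        self.tag = None                 # identifies which block of main memory is stored
        self.data = [0] * LINE_SIZE     # the entries of the cacheline

cache = [CacheLine() for _ in range(CACHE_LINES)]

def lookup(address):
    """Return (cacheline, hit): a cache hit if the line is valid and holds the
    requested block, otherwise a cache miss."""
    index = (address // LINE_SIZE) % CACHE_LINES
    tag = address // (LINE_SIZE * CACHE_LINES)
    line = cache[index]
    return line, (line.valid and line.tag == tag)
```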
The operation of the cache controller 22 will now be described. When the processor makes a memory access request, the cache controller 22 determines whether the access is a cache miss or a cache hit. For a cache miss, the cache controller 22 allocates a cacheline in the cache array to be filled. Before filling a cacheline 26, however, the cache controller first invalidates the cacheline 26 since the data being filled cannot be accessed until the entire cacheline 26 is filled. Then the cache controller 22 retrieves data from main memory 14 and fills the cacheline 26 one entry at a time to replace the old values in the cacheline 26. The cache controller 22 retrieves data not only from the one location being requested, but also from a series of sequential memory locations. This is typically done in anticipation of the processor 12 possibly needing the data from these additional locations as well. For example, with a cacheline having eight entries, a request to address 200 will cause the cache controller to fill the data from addresses 200 through 207 into the respective entries 28 of the cacheline 26. When data is written to the cache 24, it is written into one entry 28 at a time until that cacheline 26 is completely filled. After completely filling the cacheline 26, the cache controller 22 validates the filled cacheline 26 to indicate that data can then be accessed therefrom. One valid bit is used per cacheline 26 to indicate the validity of that cacheline 26.
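Continuing the illustrative model above, the conventional fill sequence (invalidate, fill every entry from sequential memory locations, then validate) can be sketched as follows; read_main_memory is a hypothetical helper standing in for the fetch from main memory 14:

```python
def fill_cacheline(line, address, read_main_memory):
    """Conventional fill: invalidate the line, fill all entries one at a time,
    then set the single valid bit for the whole cacheline."""
    base = address - (address % LINE_SIZE)        # a request to address 200 fills 200 through 207
    line.valid = False                            # data cannot be accessed while the line is filling
    line.tag = base // (LINE_SIZE * CACHE_LINES)
    for offset in range(LINE_SIZE):
        line.data[offset] = read_main_memory(base + offset)   # one entry at a time
    line.valid = True                             # validated only after the entire line is filled
```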
A problem with the conventional cache system 20, however, is that when the processor 12 requests access to data in a filling cacheline, this request is neither a cache hit nor a cache miss. It is not considered a cache hit because the filling cacheline is flagged as invalid while it is filling, and it is not treated as an ordinary cache miss because the requested data is already being retrieved from main memory 14 into that cacheline. Therefore, this situation is handled differently than for a cache hit or cache miss. In this situation, the cache controller 22 asserts a wait signal for “waiting the processor”, or, in other words, causing the processor to wait, for the amount of time necessary for the cacheline to be filled and validated. Then, the access to the filled cacheline will hit in the cache and the data can be retrieved.
If the request in decision block 32 hits in the cache, then the data in cache can be accessed. In this case, flow is directed to decision block 36 where it is determined whether the request is a read or a write. For a read request, flow goes to block 38, but if the request is a write, then flow goes to block 40. In block 38, the data can be immediately read from cache and the processor resumes operation with its next instructions. In block 40, a process for writing data into the cache begins. In this writing process, data to be stored is written to cache and can be written to main memory at the same time or, alternatively, data can be written to main memory after the write-to-cache operation.
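A rough sketch of this hit handling, continuing the same illustrative model (the write_main_memory helper is hypothetical, and the choice between a simultaneous and a deferred main-memory write is shown only schematically):

```python
def access_on_hit(line, address, is_read, value=None, write_main_memory=None):
    """Blocks 36, 38, and 40: a read returns the cached entry immediately; a write
    stores the entry in the cache and may update main memory at the same time."""
    offset = address % LINE_SIZE
    if is_read:                                   # block 38: read immediately from cache
        return line.data[offset]
    line.data[offset] = value                     # block 40: write the data into the cache
    if write_main_memory is not None:
        write_main_memory(address, value)         # written to main memory at the same time
    return None                                   # alternatively, the main-memory write may follow later
```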
As can be seen from the flowchart of
However, it should be evident that flowchart 42 of
If it is determined in block 44 that the request does hit in the filling cacheline, then the flow proceeds to decision block 50, which determines whether or not the data access request is made for a location (entry) in the cacheline that has already been filled. If block 50 determines that the location has not yet been filled, then flow is directed to block 52, where the processor is waited and the filling process is continued for the filling cacheline until the location is filled. When the requested location is filled, the data will also be fed back to the processor (block 56) if the request is determined to be a read in decision block 48. If it is determined in block 50 that the location in the filling cacheline has already been filled, then flow is directed to block 54.
In block 48, it is determined whether or not the request is a read or a write. For a write, the flow proceeds to block 54, but for a read, the flow proceeds to block 56. In block 54, the processor is waited until the entire cacheline is filled. After the cacheline is filled, the process flow continues on to block 36, where the steps mentioned above with respect to
Even though
Cache systems and methods associated with cache controlling, described in the present disclosure, provide improvements to the performance of a processor by allowing the processor to access data at an increased speed. One embodiment of a cache system according to the teaching of the present disclosure comprises a cache controller that is in communication with a processor and cache memory that is in communication with the cache controller. The cache memory comprises a number of cachelines for storing data, wherein each cacheline has a number of entries. The cache system further includes a buffer system that is in communication with the cache controller. The buffer system comprises a number of registers, wherein each register corresponds to one of the entries of a filling cacheline. Each respective register stores the same data that is being filled into the corresponding entry of the filling cacheline. The cache controller of the cache system is configured to store the same data in both the filling cacheline and in the registers of the buffer system. During a cacheline fill process, the data in the registers of the buffer system can be accessed even though the valid bit associated with the filling cacheline indicates it is invalid.
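As a rough software model of this arrangement (the class name CachelineFillBuffer and the four-entry configuration are illustrative; they follow the example embodiment described later rather than any required implementation):

```python
class CachelineFillBuffer:
    """One register per entry of the filling cacheline, each holding a duplicate of
    the data being filled into the corresponding entry, plus per-entry valid flags."""
    def __init__(self, entries=4):
        self.registers = [0] * entries            # one register per cacheline entry
        self.offset_valid = [False] * entries     # which entries have been filled so far
        self.line_address = None                  # identifies the cacheline being filled
        self.active = False                       # the buffer is valid while the cacheline is not

    def begin_fill(self, line_address):
        self.active = True
        self.line_address = line_address
        self.offset_valid = [False] * len(self.registers)

    def fill_entry(self, offset, value):
        self.registers[offset] = value            # same data that goes into the cacheline entry
        self.offset_valid[offset] = True

    def validate_cacheline(self):
        self.active = False                       # once the line is valid, the buffer is no longer needed
```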
Many aspects of the embodiments of the present disclosure can be better understood with reference to the following drawings. It can be noted that like reference numerals designate corresponding parts throughout the drawings.
The cache system 58 of
When the processor requests a write to a cache location that hits in a filling cacheline, the cache controller 62 writes the data into the buffer system 66 and allows this data to be written to cache 64 when the rest of the cacheline has been filled. Thus, with the updated data written into the buffer system 66, if the processor makes a subsequent read request of that location prior to the completion of the cacheline fill, then the cache controller 62 will read the appropriate value out of the buffer system 66.
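Continuing the model above, this behavior can be sketched roughly as follows (the merge of the buffered write into the cache 64 at the end of the fill is omitted for brevity):

```python
def write_during_fill(buffer, offset, value):
    # The processor's write is captured in the buffer register; the cacheline itself
    # is updated once the rest of the line has been filled.
    buffer.fill_entry(offset, value)

def read_during_fill(buffer, offset):
    # A subsequent read of that location, before the fill completes, is served from
    # the buffer register rather than from the still-invalid cacheline.
    if buffer.active and buffer.offset_valid[offset]:
        return buffer.registers[offset]
    return None    # entry not yet filled; the processor would be waited
```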
The buffer system 66 stores the data in accessible registers while the same data is being filled into the cacheline. By storing a duplicate copy of the data in the buffer registers, the buffer system 66 allows data to be accessed without interrupting the filling cacheline or causing undesirable processor waiting times. Since the buffer system 66 stores a copy of the data that is also being filled in the filling cacheline, there will actually be three copies of this data—the data that is stored in main memory, the data being filled in the cacheline, and the data stored in the buffer system 66. Since data in the filling cacheline cannot always be accessed, as explained above, and the data in main memory takes a relatively long time to access, the buffer system 66 in these embodiments is capable of being accessed at the faster processor speed while the cacheline fill process is going on. Therefore, for accesses of data in a filling cacheline, these embodiments allow access to this same data in the buffer to free up the processor and allow it to move on to its next instructions, thereby increasing the operational speed of the processor.
In this example illustrated in
The write controlling module 68 is configured to receive a “processor_read” signal along line 78 and a “processor_write” signal along line 80. These signals are sent from the processor to indicate whether the request is a read request or a write request. Also, the buffer system 66 receives from the processor an “address” signal 82, corresponding to the address of the requested data as stored either in main memory or in the cache 64. The address signal 82, having a number of bits n, is split such that the two least significant bits of the address (address [1:0]) are input into the write controlling module 68 along lines 84 and the remaining bits (address [n:2]) are input into the buffer hit detecting module 70 along lines 86.
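For the four-entry cacheline of this example, the address split amounts to the following (a sketch only; masking of any unused upper address bits is omitted):

```python
def split_address(address):
    """address [1:0] selects one of the four entries (the offset within the cacheline);
    address [n:2] identifies the cacheline itself."""
    offset_bits = address & 0b11    # address [1:0], routed to the write controlling module 68
    line_bits = address >> 2        # address [n:2], routed to the buffer hit detecting module 70
    return offset_bits, line_bits
```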
The buffer hit detecting module 70 is further configured to receive a “begin_fill” bit along line 88 and a “validate_cacheline” bit along line 90. The begin_fill bit indicates the start of the cacheline filling process and will remain high until the cacheline is completely filled. The validate_cacheline bit indicates whether or not the cacheline has been completely filled. If so, then the cacheline is indicated to be valid by a high validate_cacheline bit. If the cacheline is still in the process of being filled, then the validate_cacheline bit will be low to indicate that the cacheline is not yet valid. The cache controller 62 checks to see if data in the cacheline can be accessed based on whether the requested cacheline has been validated. The buffer hit detecting module 70 outputs a “buffer_hit” bit along line 96 to the write controlling module 68 for indicating when a request hits in the filling cacheline and consequently also hits in the cacheline fill buffer 74.
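The hit detection just described reduces to a simple condition: the request hits in the cacheline fill buffer 74 only while the line is still filling and the requested address falls within that line. A sketch:

```python
def buffer_hit(request_line_bits, latched_line_bits, begin_fill, validate_cacheline):
    """buffer_hit is asserted only while the cacheline is filling (begin_fill high,
    validate_cacheline low) and the request falls within that cacheline."""
    filling = begin_fill and not validate_cacheline
    return filling and (request_line_bits == latched_line_bits)
```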
The validate_cacheline bit along line 90 is also input into the buffer location validating module 72. In addition to indicating the validity of the filling cacheline, the validate_cacheline bit also indicates whether or not the cacheline fill buffer 74 is valid or invalid, since the cacheline fill buffer 74 will be valid during the cacheline fill process when the filling cacheline itself is not valid. Therefore, either the cacheline itself, when completely filled, will indicate it is valid or the cacheline fill buffer 74, during cacheline filling, will indicate it is valid, but not both. A high validate_cacheline bit can therefore be used as a reset signal to invalidate the cacheline fill buffer 74.
Furthermore, the buffer location validating module 72 is configured to receive a “fill_cache_write” bit along line 92 and a two-bit “cache array address [1:0]” signal along line 94. The buffer location validating module 72 outputs four “validate_offset” bits along lines 98 and four “offset_valid” bits along lines 100 to the write controlling module 68, as described in more detail below. The write controlling module 68 outputs a “processor_read_buffer_hit” bit along line 102 for indicating when a processor read request hits in the cacheline fill buffer 74. Also, the write controlling module 68 outputs four “processor_write_offset” bits along lines 104 and four “register_offset_write” bits along lines 106 to the cacheline fill buffer 74. These signals are also described in more detail below.
In addition to the signals along lines 104 and 106, the cacheline fill buffer 74 also receives an eight-bit “fill_write_data [7:0]” signal along lines 108 and an eight-bit “processor_write_data [7:0]” signal along lines 110. The cacheline fill buffer 74 outputs four eight-bit “register_offset [7:0]” signals along lines 112 to the multiplexer 76, which also receives the processor_address [1:0] signal along line 84. The multiplexer 76 includes four inputs 00, 01, 10, and 11 for receiving the signals along lines 112 and a selection input for receiving the processor_address [1:0] signal from line 84. The multiplexer 76 outputs a “buffer_read_data [7:0]” signal along line 114 at the output of the buffer system 66, representing the data that the processor requested, which, unbeknownst to the processor, was being held in the cacheline fill buffer 74.
When the begin_fill bit along line 88 is high, indicating that the cacheline has begun filling, and the validate_cacheline bit is low, indicating firstly that the cacheline is in the process of filling and is not validated and secondly that the cacheline fill buffer 74 is active, then the output of flip-flop 118 will be high. At this time, it will be known that the cacheline is filling and not yet complete, therefore indicating that the cacheline fill buffer 74 is valid. The high begin_fill bit along line 88 clocks the flip-flop 116, which latches the address [n:2] signal and outputs it to the comparator 120. The comparator 120 detects when the latched address [n:2] is equal to the address [n:2] of the current request and at that time outputs a high buffer_hit signal along line 96 to indicate that a request to access data hits in the filling cacheline and that the data can actually be accessed from the cacheline fill buffer 74. This buffer_hit bit is sent to the write controlling module 68 for further processing, as described below.
The buffer location validating module 72, according to this embodiment, includes a validation signal generating module 122 and four flip-flops 126-0, 126-1, 126-2, 126-3. In other embodiments, the buffer location validating module 72 may be designed to include any combination of logic and/or discrete elements to perform substantially similar functions as described herein. The flip-flops 126 essentially operate as set-reset flip-flops but, for example, may comprise D-type flip-flops and accompanying logic components. It should be recognized that the number of flip-flops 126 depends upon the number of entries in the cacheline, wherein each flip-flop 126 corresponds to an entry in the cacheline for indicating which entries are being or have been filled. Also, the validation signal generating module 122 contains any suitable combination of logic components for decoding the input signals along lines 92 and 94 and providing the appropriate responses along lines 124.
During operation of the buffer location validating module 72, the validate_cacheline signal along line 90 will be low, indicating that the cacheline is still filling and is not validated, but, on the other hand, that the cacheline fill buffer 74 is valid. At this time, access requests to the filling cacheline will hit in the cacheline fill buffer 74. When the cacheline is completely filled, and the validate_cacheline signal goes high to indicate that the cacheline is validated, then the flip-flops 126 are reset, and all of the outputs along lines 100 will be low to indicate that none of the locations in the cacheline fill buffer 74 are valid. At this time, however, access requests to the cacheline will hit in the completely filled cacheline and the cacheline fill buffer 74 is therefore not needed in this case. The cacheline fill buffer 74 will therefore be flagged as invalid for the completely filled cacheline and can be used in parallel with another cacheline to be filled.
The validation signal generating module 122 receives the fill_cache_write signal along line 92 and the two-bit address [1:0] signal along line 94. These signals are received from the cache controller 62 and indicate that the requested data is currently filling the location in the cacheline corresponding to address [1:0]. In this example, there are four entries, so two bits are required to address the four registers corresponding to the four entries in the cacheline. This address may be used to designate an “offset” for identifying the registers in the cacheline fill buffer 74. For example, in this embodiment, the offset identifies one of the four registers to indicate the stage of the cacheline filling routine.
The validation signal generating module 122 outputs validate_offset bits along lines 124-0, 124-1, 124-2, and 124-3 to the “set” inputs of respective flip-flops 126. These bits are also transmitted along lines 98 leading to the write controlling module 68. The validate_offset bits indicate which one of the registers in the cacheline fill buffer 74, and the corresponding entry in the cacheline of the cache array, is currently in the process of being filled. A validate_offset_0 bit is sent along line 124-0 to flip-flop 126-0 to indicate that the zero offset register in the cacheline fill buffer 74 is being filled and validated; a validate_offset_1 bit is sent along line 124-1 to flip-flop 126-1; a validate_offset_2 bit is sent along line 124-2 to flip-flop 126-2; and a validate_offset_3 bit is sent along line 124-3 to flip-flop 126-3. The validation signal generating module 122 outputs these validate_offset bits according to the truth table shown below:
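The truth table itself does not appear in the text above; the following reconstruction is consistent with the foregoing description, assuming active-high signals and assuming that no validate_offset bit is asserted while fill_cache_write is low:

    fill_cache_write    address [1:0]    validate_offset [3:0]
           0                 xx                  0000
           1                 00                  0001
           1                 01                  0010
           1                 10                  0100
           1                 11                  1000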
The flip-flops 126 are set with the respective validate_offset bits and can be reset by the validate_cacheline bit along line 90. The outputs of the flip-flops 126 are referred to herein as offset_valid bits, which are sent along lines 100 to the write controlling module 68 shown in
In contrast to the prior art, which merely determines whether the entire cacheline is valid, these offset_valid bits indicate which entries stored in the cacheline fill buffer are valid. The term “offset” used herein refers to the location of the registers in the cacheline fill buffer 74, wherein a zero offset refers to the register location corresponding to the actual requested address from main memory. For example, if address 200 were requested, then the register corresponding to address 200 has a “0” offset. The register corresponding to address 201 has an offset of “1”; the register corresponding to address 202 has an offset of “2”; and the register corresponding to address 203 has an offset of “3”. Therefore, a high offset_valid bit along one or more of lines 100 is used as a flag to indicate that the corresponding offset registers in the cacheline fill buffer 74 are valid.
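In terms of the illustrative software model given earlier, an access during the fill would consult these per-offset flags roughly as follows:

```python
def can_access_buffer_entry(buffer, address):
    """A request that hits in the filling cacheline can be served from the cacheline
    fill buffer only if the offset_valid flag for its offset (address [1:0]) is set."""
    offset = address & 0b11
    line_bits = address >> 2
    hit_in_filling_line = buffer.active and buffer.line_address == line_bits
    return hit_in_filling_line and buffer.offset_valid[offset]
```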
As an alternative to using an offset_valid bit for each register in the cacheline fill buffer 74, the cache 64 itself may be configured such that there is a valid bit for each entry in each cacheline. However, since the cache 64 may have on the order of 1024 cachelines, the number of valid bits would be very large. Assuming that there are 1024 cachelines and each cacheline includes eight entries, then 8192 valid bits would be required to indicate the validity of each entry in such a cache. Of course, caches of greater size would require even more entry valid bits. Although this alternative embodiment is feasible, the use of the cacheline fill buffer as described herein requires only 1032 valid bits for the above example of a cache with 1024 eight-entry cachelines, whereby one valid bit is used for each of the 1024 cachelines and one valid bit is used for each of the eight entries of the filling cacheline. Therefore, the embodiments of
Reference is made again to
Still referring to
Selection inputs to the multiplexers 128 are connected to lines 104, which carry the processor_write_offset signals as described with reference to the truth tables above. These signals select whether data to be stored in the cacheline fill buffer 74 is received from the main memory or from the processor. The selected output from each multiplexer 128 is provided to the corresponding register 130, shown here as D-type flip-flops. The registers 130 also receive the register_offset_write bits from the write controlling module 68 along lines 106 at a clock input thereof. The register_offset_write bits are output from the write controlling module 68 according to the logic shown in
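Behaviorally, one offset of the cacheline fill buffer 74 (multiplexer 128 plus register 130) can be sketched as follows; the signal names are taken from the description above, but the model is illustrative rather than a definition of the hardware:

```python
def register_slice(current_value, fill_write_data, processor_write_data,
                   processor_write_offset, register_offset_write):
    """Multiplexer 128 selects between fill data from main memory and write data from
    the processor; register 130 captures the selected value only when its
    register_offset_write bit is asserted."""
    mux_out = processor_write_data if processor_write_offset else fill_write_data
    return mux_out if register_offset_write else current_value
```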
If the decision in block 132 determines that the request was a cache miss, then flow proceeds to decision block 142, where it is determined whether or not the request hits in the filling cacheline. If not, flow proceeds to block 144, and if so, then flow proceeds to decision block 146. In block 144, since the request does not hit in the cache or in a filling cacheline, then the processor is waited while the cacheline fill process begins. In contrast to
In decision block 146, it is determined whether or not the access request is made to a location that has already been filled in the filling cacheline. If not, flow proceeds to block 148, and, if so, then flow proceeds to decision block 150. In block 148, when the request hits in the filling cacheline but the specific location in the cacheline has not yet been filled, then the processor is waited while the cacheline and cacheline fill buffer continue to fill. The filling process in block 148 continues until the location in the cacheline fill buffer is filled. At this point, the flowchart proceeds to block 150. Also in block 154, the processor resumes, enabling it to make another data request if necessary, even a request to access data in the partially filled cacheline as recorded in the cacheline fill buffer, and even a request to read the data stored in the cacheline fill buffer during the previous write request.
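Continuing the earlier software model, the portion of this flow that involves the filling cacheline can be sketched as follows; wait_one_cycle stands in for asserting the wait signal, and only the decision blocks explicitly described above are annotated:

```python
def access_during_fill(buffer, address, is_read, value, wait_one_cycle):
    """Serve a request that falls in the filling cacheline from the cacheline fill
    buffer instead of waiting for the entire line to fill."""
    offset = address & 0b11
    line_bits = address >> 2
    if not (buffer.active and buffer.line_address == line_bits):   # block 142: not in the filling line
        return None                                                # handled as an ordinary miss (block 144)
    while not buffer.offset_valid[offset]:                         # block 146: has this entry been filled?
        wait_one_cycle()                                           # block 148: wait until the entry arrives
    if is_read:
        return buffer.registers[offset]                            # read served from the buffer; processor resumes
    buffer.fill_entry(offset, value)                               # write captured in the buffer register
    return None                                                    # processor resumes with its next instructions
```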
As can be seen from
It should be emphasized that the above-described embodiments of the present application are merely possible examples of implementations that have been set forth for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and principles of the invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.