The present invention relates generally to the field of computer memory cache access via computer instructions, and more particularly to improving processor efficiency in the course of memory caching by filtering of unnecessary cache accesses.
Embodiments of the present invention disclose a method, system, and computer program product for refining access to a cache by a processing unit. One or more previous requests to access data from a cache are stored. A current request to access data from the cache is retrieved. A determination is made whether the current request is seeking the same data from the cache as at least one of the one or more previous requests. A further determination is made whether the at least one of the one or more previous requests seeking the same data was successful in arbitrating access to a processing unit when seeking access. A next cache write access is suppressed if the at least one of the one or more previous requests seeking the same data was successful in arbitrating access to the processing unit.
In a second embodiment, the present invention discloses a method, system, and computer program product for refining access to a cache by a processing unit. A determination is made whether a new stream of instructions has begun. A further determination is made whether a successful cache access has occurred in the new stream of instructions. A next cache write access is suppressed if the new stream of instructions has begun and the successful cache access has occurred.
In a third embodiment, the present invention discloses a method, system, and computer program product for refining access to a multilevel cache system by a processing unit. A determination is made whether a request to access data from a lower level cache in a multilevel cache system is likely to succeed. A further determination is made, if the request to access data is not likely to succeed, whether the request to access data from the lower level cache is redundant of a previous request. If the request to access data from the lower level cache is not redundant, a cache write access is scheduled. Data is requested from a higher level cache. The requested data from the higher level cache is awaited. If the request to access data from the higher level cache is not successful, a subsequent cache write access is scheduled.
Nearly every modern processor uses memory caching to access frequently needed data as quickly as possible, rather than always accessing main system memory directly. First, second, third, and, in some processor designs, fourth and even higher level caches each present fast, progressively larger locations in which the processor can read and write data. Even though each successive cache level is more distant from the microprocessor, all cache levels are closer to it, and allow faster access, than main system memory. The goal of making caches available very close to the processor is to improve memory search times for frequently needed data, with the end result of reducing the time needed to execute.
Cache access, as with other computer processes, occurs via instructions executed by the processor. Each “stream” of instructions may include, for example, one or more “read” passes, in which data is sought to be accessed from a cache, followed by one or more “write” passes, in which the data read is written elsewhere in the computer system, as well as various other steps. The scheduled write passes may also allow for rescheduling the read pass if it is missed (because of a failure to arbitrate access or because the data exists in a higher level cache), so that data can be delivered in the fastest way possible.
Because instructions are executed in sequential order, if data is successfully obtained from a cache in a “read” pass (also known as a “read access” or a “read cache access”), or is predicted to be successfully obtained, at least one subsequent “write” pass (also known as a “write access” or a “write cache access”) may need to be descheduled, because the previous “read” pass has succeeded. Such “write” passes are unnecessary, waste processor time, and generate heat within the microprocessor when executed. It is disadvantageous, however, to simply remove from scheduling every “write” pass following a “read” pass, despite their redundancy. Some duplication allows for quicker execution if a previous instruction is blocked, whether because of a simultaneous access to the processor in a multi-threaded application or for any other reason.
Presented is a system, method, and computer program product for removing redundant cache write accesses (also known as “write passes”) from scheduling when they are not needed, while still retaining the benefits of redundant “write” requests.
At step 120, a current request to access data from the cache is retrieved. The current request is currently executing, has just executed, or will be executed by the processing unit very shortly. At step 130, a determination is made whether the current request is seeking the same data as at least one of the one or more previous requests (which were stored at step 110). The current request and any one of the previous request(s) may be seeking the same data if each is accessing the same logical address within a cache directory. The logical address of the cache directory references a cache memory location within the cache, and if the same logical address is accessed by the current request and at least one previous request (and no changes have occurred in the interim), the two or more requests are identical. If none of the previous requests is seeking the same data as the current request, execution of the first embodiment terminates at end 190. If, on the other hand, it is determined that the current request is seeking the same data as at least one of the one or more previous requests, execution continues to step 140.
At step 140, a determination is made whether at least one of the one or more previous requests seeking the same data was successful in arbitrating access to the processing unit when seeking access. In effect, the processing unit determines not only that at least one of the previous request(s) was seeking the same data (at step 130), but also that the previous request was successful (i.e., it was not blocked by another thread, by a load/store unit accessing the processing unit at the same time, by some other concurrent read/write access, or for any other reason). If none of the previous requests seeking the same data arbitrated successfully, execution of the first embodiment terminates at end 190. If, on the other hand, at least one of the previous requests seeking the same data successfully arbitrated access, execution may proceed directly to step 160, where a next cache write access is suppressed as unnecessary. This serves to increase efficiency of the processing unit, as discussed above. If not suppressed, the next cache write access may prevent concurrent read accesses by the processing unit.
Execution finally proceeds to end 190, in any outcome. Note that end 190, as shown in
At step 230, the processing unit determines whether a successful cache access has occurred in the new stream of instructions. The successful cache access may occur at any time after the new stream of instructions has begun. In practice, the processing unit may determine whether the successful cache access has occurred by determining whether the hold latch has been set (which occurs after successful cache access). If the processing unit determines that a successful cache access has not occurred, execution proceeds to end 290 (as discussed above). If the processing unit determines that a successful cache access has occurred, execution continues to step 250, and a next cache write access is suppressed. The next cache write access may also prevent concurrent read access, so suppressing unnecessary cache write accesses may further allow for faster execution of the remainder of the instructions and decreased heat generation, as well as other benefits. The processing unit may further prevent all other unexecuted cache write accesses after the successful cache access has occurred (and the hold latch has been set, etc.).
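The hold-latch check described above can be sketched in software, although the embodiment describes processor logic rather than a program. The following Python sketch is illustrative only: the class `StreamWriteFilter` and its method names are hypothetical, and clearing the latch at the start of each new stream is an assumption that the text does not state.

```python
class StreamWriteFilter:
    """Hypothetical model of hold-latch filtering of cache write accesses."""

    def __init__(self):
        self.hold_latch = False

    def begin_stream(self):
        # Assumption: the latch is cleared when a new stream of instructions begins.
        self.hold_latch = False

    def record_cache_access(self, successful):
        # The hold latch is set after a successful cache access.
        if successful:
            self.hold_latch = True

    def allow_write_access(self):
        # Step 230 checks the latch; step 250 suppresses the write if it is set.
        return not self.hold_latch


stream_filter = StreamWriteFilter()
stream_filter.begin_stream()
allowed_before = stream_filter.allow_write_access()   # no successful access yet
stream_filter.record_cache_access(successful=True)
allowed_after = stream_filter.allow_write_access()    # latch set, write suppressed
```

Because the latch stays set for the rest of the stream, every later call to `allow_write_access` also returns `False`, which corresponds to preventing all other unexecuted cache write accesses after the successful access.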
In any event, execution eventually proceeds to end 290. Note that end 290, as shown in
At step 130, a determination is made whether the current request 430 is seeking the same data as at least one of the previous requests (411-417) in cache directory 450. Depending upon the level of the cache involved and system architecture, cache directory 450 may be replaced by a translation lookaside buffer (not shown). The processing unit 400 accesses stored data regarding previous requests (411-417). Previous read request i3 (411) and previous read request i4 (413) have attempted to access logical address 0x000002000 (453) in cache directory 450. Previous read request i5 (415) has attempted to access logical address 0x000004000 (455) in cache directory 450. Previous read request i6 (417) has attempted to access logical address 0x000005000 (457). Current request 430 is also seeking data at logical address 0x000004000 (455) in cache directory 450. The processing unit 400 has thus determined that current request 430 is seeking the same data as previous request i5 (415), specifically data at logical address 0x000004000 (455) in cache directory 450.
At step 140, a determination is made by the processing unit 400 whether at least one of the one or more previous requests (411-417) was successful in arbitrating access to the processing unit when seeking access. In effect, a determination is made whether one of the one or more previous requests 411-417, when executed, successfully accessed data at the specified logical address and did not fail, for example because the processing unit 400 was blocked by another thread, because load/store unit access to the processing unit 400 was occurring, or for any other reason. The processing unit 400 determines that previous read request i3 (411), previous read request i5 (415), and previous read request i6 (417) were successful in arbitrating access to the processing unit 400, but since only previous read request i5 (415) was seeking the same data, only this previous read request is considered. The processing unit 400 therefore determines that previous read request i5 (415) is equivalent to current request i2 (430), and that previous read request i5 (415) was successful in arbitrating access to the processing unit 400. At step 160, after making this determination, the processing unit 400 suppresses the i2 cache write access 470. This cache write access 470 would be duplicative of a previous cache write access and is no longer necessary. Execution then proceeds to end 190.
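The determinations at steps 130 through 160 can be sketched in software, although the embodiment describes processor logic rather than a program. The following Python sketch is illustrative only: the `PreviousRequest` record, its field names, and the function `should_suppress_write` are hypothetical. The addresses mirror the example above, and treating request i4 as having failed arbitration is an assumption inferred from its absence from the list of successful requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreviousRequest:
    """Hypothetical record of a stored previous read request."""
    name: str
    logical_address: int
    arbitrated: bool  # True if the request won access to the processing unit


def should_suppress_write(current_address, previous_requests):
    """Return True when the next cache write access can be suppressed."""
    # Step 130: keep only previous requests seeking the same data.
    same_data = [r for r in previous_requests
                 if r.logical_address == current_address]
    # Step 140: at least one of them must have arbitrated successfully.
    return any(r.arbitrated for r in same_data)


# Values mirroring the example; i4's failed arbitration is an assumption.
previous = [
    PreviousRequest("i3", 0x000002000, True),
    PreviousRequest("i4", 0x000002000, False),
    PreviousRequest("i5", 0x000004000, True),
    PreviousRequest("i6", 0x000005000, True),
]

# Current request i2 targets the same logical address as i5.
suppress_i2 = should_suppress_write(0x000004000, previous)
```

Filtering by address before checking arbitration reflects the point made above: i3 and i6 arbitrated successfully, but they do not count because they sought different data.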
At step 740, a determination is made whether at least one of the one or more previous requests seeking the same data was successful in arbitrating access to the processing unit. At step 750, a determination is made whether at least one previous cache write access followed the previous requests seeking the same data. At step 770, a next cache write access is suppressed if at least one of the one or more previous requests seeking the same data was successful in arbitrating access to the processing unit. Execution proceeds to end at step 780.
At step 930, if the request to access data from the lower level cache is not redundant, a cache write access is scheduled. All cache write accesses may prevent concurrent read accesses, depending upon system architecture, so limiting them may be useful for saving processor time, extending battery life, lowering heat generation by the processing unit, etc. At step 940, data is requested by the processing unit from a higher level cache. The higher level cache may be an L2 cache 640 or an L3 cache 650. At step 950, the processing unit waits for the requested data from the higher level cache. The “waiting” may actually take a significant period of time, or occur quickly. If the request to access data from the higher level cache is not successful, a subsequent cache write access is scheduled at step 960. Execution proceeds to end 980.
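The flow of this third embodiment can be sketched as a small decision function. The Python below is a sketch under stated assumptions, not the disclosed hardware: the function `plan_accesses`, its parameters, and the action labels are hypothetical. It also assumes that nothing further is scheduled when the lower level request is likely to succeed, and that the higher level cache is requested whether or not the lower level request was redundant; the text does not settle either point.

```python
def plan_accesses(likely_to_succeed, is_redundant, fetch_from_higher_level):
    """Return the actions taken for one lower level cache request.

    fetch_from_higher_level is a hypothetical callable that returns the
    requested data, or None if the higher level request did not succeed.
    """
    actions = []
    if likely_to_succeed:
        # Assumption: a likely hit needs no further scheduling.
        return actions
    if not is_redundant:
        # Step 930: schedule a cache write access for a non-redundant request.
        actions.append("schedule_write_access")
    # Step 940: request the data from a higher level cache.
    actions.append("request_higher_level")
    # Step 950: wait for the requested data (modeled as a blocking call).
    data = fetch_from_higher_level()
    if data is None:
        # Step 960: the higher level request failed, so schedule another write.
        actions.append("schedule_subsequent_write")
    return actions


miss_everywhere = plan_accesses(False, False, lambda: None)
redundant_hit = plan_accesses(False, True, lambda: b"data")
```

In this sketch a redundant request that is satisfied by the higher level cache schedules no write access at all, which is the case the filtering is meant to exploit.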
Based on the foregoing, a computer system, method and program product have been disclosed for filtering of redundantly scheduled write passes. However, numerous modifications and substitutions can be made without deviating from the scope of the present invention. The embodiments herein may be combined, altered, or portions removed. Therefore, the present invention has been disclosed by way of example and not limitation.
Number | Date | Country | |
---|---|---|---|
20190018779 A1 | Jan 2019 | US |