The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
One aspect of the exemplary embodiments is a method for addressing live-locks between dispatching data machines (DMs). In another aspect of the exemplary embodiments, a set of logic is provided for breaking out of live-locks without knowing whether one exists at any given moment in time. In yet another exemplary embodiment, the breaking out of live-locks is accomplished by randomly stopping the dispatch to any DM within an L2 cache until all the DMs in that L2 cache are idle. Once all the DMs are idle, that L2 cache proceeds to a “single dispatch mode” for a random, short period of time, whereby a DM may only be dispatched if all the DMs contained within that L2 cache are idle.
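By way of illustration only, the following C++ fragment sketches the dispatch-gating rule implied by these modes as a software model of the hardware behavior. The names DispatchMode, LiveLockBreaker, and allDMsIdle are hypothetical and chosen solely for this example; the exemplary embodiments would realize this logic in hardware.

    // Illustrative model of the three dispatch modes described above.
    enum class DispatchMode { NORMAL, DRAINING, SINGLE_DISPATCH };

    struct LiveLockBreaker {
        DispatchMode mode = DispatchMode::NORMAL;

        // May a request be dispatched to a DM this cycle?
        bool mayDispatch(bool allDMsIdle) const {
            switch (mode) {
                case DispatchMode::NORMAL:          return true;       // multiple DMs may be in use
                case DispatchMode::DRAINING:        return false;      // stopped: waiting for all DMs to go idle
                case DispatchMode::SINGLE_DISPATCH: return allDMsIdle; // at most one DM active at a time
            }
            return false;
        }
    };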
Therefore, because it is difficult to predict ahead of time the live-locks that could occur and because it may be expensive (i.e., in complexity and hardware) to detect a live-lock in progress, it is justified to merely assume that live-locks simply occur. As a result of this presumption, the logic is designed to break out of live-locks without knowing whether it is really in one at any given moment in time. The breaking out of live-locks is described in detail with regard to
Referring to
Referring to
The following are two live-lock examples illustrating
In the first example, the processors 12 may be polling an address and thus generate a great deal of load traffic to that address. As a result, it is possible for one processor 12 to get locked out and be prevented from polling. Specifically, the following steps may take place:
P0 and P1 each send load@A to a cache 14 (L2) at the same time;
P0 wins arbitration to the L2 access execution pipeline;
P1 wins arbitration to the L2 access execution pipeline;
P1's load gets rejected due to a conflict with P0's request. It then proceeds into a load Q to wait for P0's load to finish;
P0's load finishes;
P1's load is asked to retry;
P2 sends load@A to L2 and gets to the arbiter a cycle ahead of when the P1 load is able to make its request;
P2 wins arbitration to the L2 access pipeline;
P1 wins arbitration to the L2 access pipeline;
P1's load gets rejected due to a conflict with P2's request. It then proceeds into the load Q to wait for P2's load to finish;
Each time that it appears that P1's load is able to get moving through the execution pipeline, another processor slips ahead of it, and it ends up being rejected.
At this point, the live-lock breaker alters the conditions, in accordance with the exemplary embodiments of the present invention. For instance, the live-lock breaker levels the playing field by stopping all requests for a period of time, and it ensures that the P1 load and the P2 load requests are seen by the arbiter at the same time. This processing enables the P1 load to win either randomly (given enough head-to-head chances, it will prevail at some point) or by favoring the older request in the arbiter.
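A minimal software sketch of such arbitration follows, assuming a hypothetical Request record whose age field counts cycles spent waiting; favoring the oldest request, with a random tie-break, lets a repeatedly rejected load such as P1's eventually win.

    #include <cstdlib>
    #include <vector>

    // Hypothetical pending request; 'age' counts cycles spent waiting.
    struct Request { int requestor; unsigned age; };

    // Once the live-lock breaker stops all requests for a period of time, the
    // pending loads (e.g., P1's and P2's) reach the arbiter in the same cycle.
    int arbitrate(const std::vector<Request>& pending) {
        int winner = -1;
        unsigned oldestAge = 0;
        for (const Request& r : pending) {
            if (winner < 0 || r.age > oldestAge ||
                (r.age == oldestAge && (std::rand() & 1))) { // random tie-break
                winner = r.requestor;
                oldestAge = r.age;
            }
        }
        return winner; // -1 if nothing is pending
    }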
In a second example, the processors 12 may be generating enough new requests to their shared L2 that it cannot complete an older operation. As a result, another L2 may be prevented from gaining access to the line affected by the older operation. Specifically, the following steps may take place:
P0 sends store1@A to L2-0;
Store1 gets into DM7 (a random data machine) and is an L2-0 miss;
Data@A comes into L2-0 and merges with store1's data;
DM7 has ownership of the line and also has the data. It is now ready to write L2-0 cache and L2-0 directory so that it can free up;
P1, P2, P3, and P0 start sending lots of load requests to L2-0;
All are to unique addresses, with no address conflicts or hazards;
Because processor and system performance are very dependent on load latency, loads have priority over other requests to the cache/directory. Therefore, DM7 keeps requesting access and keeps losing arbitration to the steady stream of new load requests;
P4 sends load1@A to L2-1;
Load1 is an L2-1 miss, and L2-1 makes a read request on the system bus, which becomes a snoop into the other L2s to see whether they have the data;
L2-0 responds “retry”; it is not able to service the request because the request is to the same line as a DM (e.g., DM7) that is trying to update the cache/directory and go idle. L2-0 cannot service a snoop for that address until DM7 goes idle;
Each time that L2-1 retries its read request, it gets rejected because DM7 is prevented from completing due to all of the load traffic. L2-1 is making requests to the bus but is not making progress for any request having address A;
Thus, L2-1, and as a result P4, are prevented from making forward progress due to the volume of load traffic to L2-0 from P0, P1, P2, and P3; and
The live-lock breaker randomly prevents the L2 arbiter from granting requests to gain access to the DMs. This stops the loads from being dispatched to DMs and allows the outstanding requests (e.g., DM7 in this case) to complete their processing.
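The following fragment sketches this behavior in the same illustrative style; the window length and the trigger probability are arbitrary values chosen for the example, not parameters taken from the exemplary embodiments.

    #include <cstdlib>

    // Illustrative L2 arbiter gate: during a randomly opened blocking window,
    // new loads are not granted access to the DMs, so an outstanding DM
    // (such as DM7 above) can finally win the cache/directory port and go idle.
    struct L2ArbiterGate {
        int blockCyclesLeft = 0; // > 0 while new-load grants are suppressed

        bool grantNewLoad() {
            if (blockCyclesLeft == 0 && (std::rand() % 100000) == 0)
                blockCyclesLeft = 64;  // open a short blocking window
            if (blockCyclesLeft > 0) {
                --blockCyclesLeft;
                return false;          // loads lose; outstanding DM work completes
            }
            return true;               // normally, loads have priority
        }
    };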
Referring to
The exemplary embodiments address live-locks between dispatching DMs. In particular, the dispatching to any DM in an L2 is randomly stopped (e.g., every few hundred thousand cycles) until all DMs in that L2 are idle. Once all DMs in that L2 have been idle for a short period of time (e.g., tens of cycles), the L2 goes into “single dispatch mode” for a random, short period of time, whereby a DM may only be dispatched if all DMs are idle. At the end of that short period of time, the L2 returns to normal dispatch mode to let multiple DMs be used simultaneously. The reason for this is to periodically present the DM dispatch with varying system conditions, as randomly as possible. Otherwise, it may be possible to get into a significantly large live-lock loop among multiple bus masters.
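Extending the DispatchMode sketch given earlier, a cycle-by-cycle model of this sequence might look as follows; the interval lengths and the use of std::rand() are illustrative stand-ins for whatever pseudo-random timers the hardware would employ.

    #include <cstdlib>

    // Repeated from the earlier sketch so this fragment is self-contained.
    enum class DispatchMode { NORMAL, DRAINING, SINGLE_DISPATCH };

    struct DispatchController {
        DispatchMode mode = DispatchMode::NORMAL;
        unsigned timer = 300000 + std::rand() % 200000; // cycles until the next random stop
        unsigned idleCycles = 0;

        // Called once per cycle; returns whether a DM may be dispatched.
        bool tick(bool allDMsIdle) {
            switch (mode) {
            case DispatchMode::NORMAL:
                if (--timer == 0) {                   // randomly stop dispatching
                    mode = DispatchMode::DRAINING;
                    idleCycles = 0;
                    return false;
                }
                return true;                          // multiple DMs may be in use
            case DispatchMode::DRAINING:              // stopped until all DMs are idle
                idleCycles = allDMsIdle ? idleCycles + 1 : 0;
                if (idleCycles >= 32) {               // idle for tens of cycles
                    mode = DispatchMode::SINGLE_DISPATCH;
                    timer = 100 + std::rand() % 400;  // random, short period
                }
                return false;
            case DispatchMode::SINGLE_DISPATCH:       // dispatch only when all DMs idle
                if (--timer == 0) {
                    mode = DispatchMode::NORMAL;      // return to normal dispatch mode
                    timer = 300000 + std::rand() % 200000;
                }
                return allDMsIdle;
            }
            return false;
        }
    };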
The exemplary embodiments do not apply only to L2 caches. The processing of the exemplary embodiments may apply to L3 caches, L4 caches, memories, and any other resource that has multiple requestors vying for limited resources.
The capabilities of the present invention can be implemented in software, firmware, hardware, or some combination thereof.
As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.