Claims
- 1. A method for deadlock avoidance in a clustered symmetric multiprocessor system having a plurality of clusters interaced with one another, each cluster having:
(a) a local fetch interface controller, and (b) a local store interface controller, and (c) a remote fetch controller, and (d) a remote storage controller, and (e) a local-to-remote data bus and (f) an interface controller, (d) a plurality of processors, (e) a shared cache memory, (f) a plurality of I/O adapters, and (g) a main memory accessible from the cluster, said method comprising the steps of:
detecting a plurality of fetch requests between a local and a remote cluster, responding to the detected fetch requests to permit at least one request of said plurality of fetch requests to proceed while actively delaying action among remaining requests of said plurality of requests, causing a remote fetch controller to wait until said one request which was permitted to proceed has completed, and then proceeding with a delayed request among said remaining requests of said plurality of requests.
- 2. A method for deadlock avoidance in a clustered symmetric multiprocessor system having a plurality of clusters interfaced with one another, each cluster having:
(a) a local fetch interface controller, (b) a local store interface controller, (c) a remote fetch controller, (d) a remote storage controller, (e) a local-to-remote data bus, (f) an interface controller, (g) a plurality of processors, (h) a shared cache memory, (i) a plurality of I/O adapters, and (j) a main memory accessible from the cluster, said method comprising the steps of:
detecting a plurality of fetch requests between a local and a remote cluster, responding to the detected fetch requests to permit one request of said plurality of fetch requests to proceed while actively rejecting the remaining requests of said plurality of requests and notifying the rejected requesting remote controllers of the rejection, causing a remote fetch controller to fetch data in response to said one request which was permitted to proceed, and then storing the data of said response that proceeded from said remote fetch controller in said shared cache memory; and, after a remote fetch controller has received notification of said rejection, reinitiating the rejected request from that controller with a fresh directory lookup of said shared cache memory.
- 3. A method for deadlock avoidance in a clustered symmetric multiprocessor system having a plurality of clusters interfaced with one another, each cluster having:
(a) a local fetch interface controller, (b) a local store interface controller, (c) a remote fetch controller, (d) a remote storage controller, (e) a local-to-remote data bus, (f) an interface controller, (g) a plurality of processors, (h) a shared cache memory, (i) a plurality of I/O adapters, and (j) a main memory accessible from the cluster, said method comprising the steps of:
detecting a plurality of storage requests between a local and a remote cluster, responding to the detected storage requests to permit one request of said plurality of storage requests to proceed while actively aborting all remaining requests of said plurality of requests and notifying the requesting remote controllers whose requests were aborted of a successful completion, causing a remote storage controller to store data in response to said one request which was permitted to proceed, and then storing the data of said response that proceeded from said remote storage controller into said main memory.
- 4. The method according to claim 2 whereby deadlock avoidance is achieved when multiple nodes are processing simultaneous data storage operations, wherein said storage operations may involve processor storage operations to shared memory and/or memory storage operations resulting from casting out aged data from a shared L2 cache.
- 5. The method according to claim 2 whereby deadlock avoidance is achieved when one node is processing data access operations while another remote node is simultaneously processing data storage operations, wherein said storage operations may involve processor storage operations to shared memory and/or memory storage operations resulting from casting out aged data from a shared cache.
- 6. The method according to claim 2 whereby deadlock avoidance is achieved when one node is processing simultaneous data access operations while another remote node is simultaneously processing cache management operations such as directory invalidation.
- 7. The method according to claim 2 whereby deadlock avoidance is achieved when one node is processing simultaneous data storage operations while another remote node is simultaneously processing cache management operations such as directory invalidation.
- 8. The method according to claim 2 whereby deadlock avoidance is achieved when a plurality of nodes are processing simultaneous cache management operations such as directory invalidation.
- 9. The method according to claim 2 whereby deadlock avoidance is achieved by providing a guaranteed 1-to-1 affinity between local memory controller resources and their corresponding remote resource, whereby said remote resource is acting as an agent servicing a data access, data storage or cache management operation on behalf of the local memory controller, and said remote resource is guaranteed to be available to service said request due to the dedicated pairing of local and remote resources.
- 10. The method according to claim 1 whereby any type of remote operation involves at most one pair of local and remote resources which is achieved through the use of an improved cache management scheme for data accesses coupled with an improved I/O Store mechanism which eliminates the need to perform “forced cast outs” which would require an additional pair of resources to support completion of the primary operation undergoing processing on the first pair of resources.
- 11. The method according to claim 10 whereby a remote interrogation command is used to query one or more remote nodes during I/O Stores to determine whether there is a cache hit in a remote cache, and in the case where said I/O Store does hit in a remote cache, the remote store controller resource is held in reserve to await transfer of the I/O Store data from the local to the remote node.
- 12. The method according to claim 10 whereby the need for complex compare logic and cumbersome signaling between a local memory controller and one or more remote controllers residing on the same node but processing different (unrelated) operations is eliminated through the use of a single pair of local and remote resources to process any operation.
- 13. The method according to claim 1 whereby deadlock avoidance is achieved through the use of asynchronous cast outs which are permitted direct access to shared memory without the need to perform address compare interlocks against other simultaneous operations, and ensuring the cast out will complete without interruption from said simultaneous operations.
- 14. The method according to claim 1 whereby memory operations utilize an abort mechanism which allows the move page operation to cease prior to performing any memory access and permits other operations to continue, and wherein said memory operations include any combination of:
a. page move store operations involving movement of data from one memory location to a target memory location;
b. cache coherency operations involving invalidation of remote shared caches;
c. I/O store operations involving storage of I/O data into main memory or a shared cache; and
d. store pad operations involving replication of data patterns into main memory.
- 15. The method according to claim 1 whereby a remote fetch controller processing a fetch request on behalf of a local memory controller utilizes a miss response in place of a reject response which permits the operation to complete without the need to recycle the operation back to the initiating processor.
- 16. The method according to claim 1 involving a fast hang quiesce mechanism embedded in the remote fetch and store controllers which prevents deadlocks by detecting system hangs and rejecting the controllers' current operations in an effort to permit the system operation to complete.
- 17. The method according to claim 1 involving a fast hang quiesce mechanism embedded in the remote fetch and store controllers which prevents deadlocks caused by their own operations by detecting an internally generated hang period and using this hang period to signal the other controllers to quiesce their pending operations to permit the current remote fetch and/or store operation to complete.
- 18. The method according to claim 1 wherein, when responding to the detected fetch requests to permit more than one request of said plurality of fetch requests to proceed while actively delaying action among remaining requests of said plurality of requests, the permitted requests proceed in parallel.
- 19. The method according to claim 1 wherein, when responding to detected storage requests, the process applicable to fetch requests is applied to store requests.
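By way of illustration only, the following minimal C sketch models the permit-one, delay-the-rest arbitration step of claim 1. The structure names, fields, and routines (fetch_req_t, arbitrate_crossing_fetches, resume_delayed) are hypothetical stand-ins for the actual controller hardware and do not form part of the claimed design.

```c
/* Illustrative sketch only: permit one crossing fetch, delay the rest,
 * and let a delayed remote fetch controller proceed once the permitted
 * request completes (claim 1). */

#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_IDLE, REQ_GRANTED, REQ_DELAYED, REQ_DONE } req_state_t;

typedef struct {
    unsigned long line_addr;   /* cache line being fetched                 */
    bool          remote;      /* request originates on the other cluster  */
    req_state_t   state;
} fetch_req_t;

/* Permit exactly one of the crossing fetch requests for a line to proceed;
 * actively delay the rest instead of letting both clusters' remote fetch
 * controllers block each other. */
static void arbitrate_crossing_fetches(fetch_req_t *reqs, size_t n,
                                        unsigned long line_addr)
{
    bool granted = false;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].line_addr != line_addr || reqs[i].state != REQ_IDLE)
            continue;
        if (!granted) {
            reqs[i].state = REQ_GRANTED;   /* this one proceeds now   */
            granted = true;
        } else {
            reqs[i].state = REQ_DELAYED;   /* wait; do not reject     */
        }
    }
}

/* A delayed remote fetch controller simply waits for the granted request
 * to complete, then proceeds with its own request. */
static void resume_delayed(fetch_req_t *reqs, size_t n,
                           const fetch_req_t *completed)
{
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].state == REQ_DELAYED &&
            reqs[i].line_addr == completed->line_addr)
            reqs[i].state = REQ_GRANTED;   /* proceed after completion */
    }
}
```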
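A similar sketch, again with hypothetical hooks (dir_lookup, send_reject, fetch_and_install) standing in for the real fabric and directory interfaces, illustrates the reject-and-retry flow of claim 2: the winning fetch installs the line in the shared cache, and a rejected controller re-initiates its request with a fresh directory lookup rather than resuming blindly.

```c
/* Illustrative sketch only: reject losing fetch requests and retry them
 * with a fresh directory lookup after the winner completes (claim 2). */

#include <stdbool.h>

typedef struct {
    unsigned long line_addr;
    bool          rejected;   /* set when the remote side says "retry" */
} remote_fetch_req_t;

/* Hypothetical hooks into the cluster fabric and the shared cache directory. */
extern bool dir_lookup(unsigned long line_addr);          /* hit in shared cache?              */
extern void send_reject(remote_fetch_req_t *req);         /* notify a rejected controller      */
extern void fetch_and_install(unsigned long line_addr);   /* fetch data, store in shared cache */

/* The winning request fetches the data and installs it in the shared cache;
 * every losing request is rejected and explicitly told so. */
static void resolve_conflict(remote_fetch_req_t *winner,
                             remote_fetch_req_t **losers, int n_losers)
{
    for (int i = 0; i < n_losers; i++)
        send_reject(losers[i]);
    fetch_and_install(winner->line_addr);
}

/* A rejected controller does not resume blindly: it re-initiates the request
 * with a fresh directory lookup, since the winner may already have installed
 * the line in the shared cache. */
static void retry_after_reject(remote_fetch_req_t *req)
{
    if (req->rejected) {
        req->rejected = false;
        if (!dir_lookup(req->line_addr))      /* fresh lookup: still a miss? */
            fetch_and_install(req->line_addr);
        /* On a hit, the data is already local and the fetch completes from
         * the shared cache without returning to the remote cluster. */
    }
}
```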
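The dedicated 1-to-1 pairing of local and remote resources recited in claim 9 can likewise be pictured as a fixed index mapping rather than a shared pool; the array size, structure names, and launch routine below are illustrative assumptions only.

```c
/* Illustrative sketch only: a fixed pairing of local controllers and remote
 * agents removes contention for remote resources (claim 9). */

#include <stdbool.h>

#define N_CONTROLLERS 8   /* illustrative count */

typedef struct { int id; bool busy; } local_ctrl_t;
typedef struct { int id; bool busy; } remote_agent_t;

static local_ctrl_t   local_ctrl[N_CONTROLLERS];
static remote_agent_t remote_agent[N_CONTROLLERS];

/* Dedicated pairing: local controller i always uses remote agent i.
 * There is no shared pool to contend for, hence no circular wait on
 * remote resources. */
static remote_agent_t *agent_for(const local_ctrl_t *lc)
{
    return &remote_agent[lc->id];
}

/* Launching a remote operation never blocks on agent availability:
 * if the local controller is free, its agent is free by construction. */
static bool launch_remote_op(local_ctrl_t *lc)
{
    remote_agent_t *ra = agent_for(lc);
    if (lc->busy)
        return false;
    lc->busy = ra->busy = true;   /* both sides of the pair engage together */
    return true;
}
```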
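Finally, the fast hang quiesce mechanism of claims 16 and 17 can be sketched as a per-controller wait counter that, past a threshold, broadcasts a quiesce signal to its peers; the threshold value and the signaling functions shown are assumptions made purely for illustration.

```c
/* Illustrative sketch only: detect an internally generated hang period and
 * ask peer controllers to quiesce, or back off when asked (claims 16-17). */

#include <stdbool.h>

#define HANG_THRESHOLD 4096u   /* illustrative number of idle cycles */

typedef struct {
    unsigned wait_cycles;
    bool     quiesce_requested;   /* asked by a hung peer to back off */
    bool     op_pending;
} rsc_ctrl_t;                     /* remote fetch/store controller state */

extern void broadcast_quiesce(void);            /* hypothetical fabric signal */
extern void reject_current_op(rsc_ctrl_t *c);   /* hypothetical reject hook   */

static void per_cycle(rsc_ctrl_t *c)
{
    if (c->op_pending && ++c->wait_cycles > HANG_THRESHOLD) {
        /* Internally generated hang period: ask the other controllers to
         * quiesce so this controller's operation can complete (claim 17). */
        broadcast_quiesce();
        c->wait_cycles = 0;
    }
    if (c->quiesce_requested && c->op_pending) {
        /* Give way: reject the current operation so the hung operation
         * elsewhere in the system can make forward progress (claim 16). */
        reject_current_op(c);
        c->quiesce_requested = false;
    }
}
```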
RELATED APPLICATIONS
[0001] This application entitled “Method for deadlock avoidance in a cluster environment” is related to U.S. Ser. No. ______, filed ______, and entitled “High Speed Remote Storage Controller” and also to U.S. Ser. No. ______, filed ______, and entitled “Clustered Computer System with Deadlock Avoidance”.
[0002] These co-pending applications and the present application are owned by one and the same assignee, International Business Machines Corporation of Armonk, N.Y.
[0003] The descriptions set forth in these co-pending applications are hereby incorporated into the present application by this reference.
[0004] Trademarks: S/390 and IBM are registered trademarks of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names such as z900, e(logo)Server may be registered trademarks or product names of International Business Machines Corporation or other companies.