1. Field of Invention
The present disclosure relates generally to multi-tasking multi-processor environments, and in particular, to managing recovery of a link in a multi-tasking multi-processor environment.
2. Description of Background
When operating a communications link in a multi-tasking multi-processor environment, numerous failures can occur and there are a variety of ways in which the communication link can be recovered. For example, if a channel needs to be recovered in the existing coupling technologies for multi-tasking multi-processor environments, the operation is supported by dedicated hardware. The hardware link can be reset in order to achieve this Loss of Link operation, often by dropping light or cutting power.
In the case of the new coupling technology based upon industry-standard InfiniBand, multiple channels can be emulated across a single physical link. Because resetting the physical link would disrupt every channel that shares it, there is no hardware assist that can be called upon to aid in the recovery of a single channel. As such, recovery is left up to the firmware, which must be able to handle such recovery on a single channel without impacting the other channels that share the physical link.
An exemplary embodiment includes a computer program product for managing recovery of a communications link in a multi-tasking multi-processor environment, the computer program product including a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method including shutting off timers for a failed channel associated with the communications link, storing a loss of link condition in a data structure, disabling communications on the failed channel and sending an external notification of the loss of link condition.
Another exemplary embodiment includes an apparatus for managing recovery of a communications link in a multi-tasking multi-processor environment, the apparatus including a processor performing a method including shutting off timers for a failed channel associated with the communications link, storing a loss of link condition in a data structure, disabling communications on the failed channel and sending an external notification of the loss of link condition.
A further exemplary embodiment includes a method for managing recovery of a communications link in a multi-tasking multi-processor environment, the method including shutting off timers for a failed channel associated with the communications link, storing a loss of link condition in a data structure, disabling communications on the failed channel and sending an external notification of the loss of link condition.
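By way of illustration only, the four operations recited in the foregoing embodiments can be sketched as follows. This is a minimal Python sketch and not the firmware itself: the `Channel` structure, its field names, the timer names, and the `notify` callback are hypothetical, and the ordering simply follows the order in which the operations are recited.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Channel:
    """Hypothetical per-channel state; all field names are illustrative."""
    channel_id: int
    timers: Dict[str, bool] = field(
        default_factory=lambda: {"heartbeat": True, "request": True}
    )
    comms_enabled: bool = True
    status: Dict[str, object] = field(default_factory=dict)


def handle_loss_of_link(
    channel: Channel, reason: str, notify: Callable[[int, str], None]
) -> None:
    # 1. Shut off the timers for the failed channel.
    for name in channel.timers:
        channel.timers[name] = False
    # 2. Store the loss of link condition in a data structure.
    channel.status["loss_of_link"] = True
    channel.status["reason"] = reason
    # 3. Disable communications on the failed channel.
    channel.comms_enabled = False
    # 4. Send an external notification of the loss of link condition.
    notify(channel.channel_id, reason)


# Usage: record notifications in a list instead of sending them anywhere.
notifications: List[Tuple[int, str]] = []
failed = Channel(channel_id=3)
handle_loss_of_link(
    failed, "heartbeat timeout", lambda cid, why: notifications.append((cid, why))
)
```

In this sketch the notification is issued last, so that any party receiving it observes a channel whose timers are already stopped and whose communications are already disabled.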
Other articles of manufacture, apparatuses, and/or methods according to embodiments will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional articles of manufacture, apparatuses, and/or methods be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
In accordance with an aspect of the present invention, the systems, methods and computer program products described herein implement an out-of-band command and control interface to guide the process of managing recovery of a link via loss of link in a multi-tasking multi-processor environment. In exemplary embodiments, in the event of a communication link failure, a host system sends a message to a remote partner system to indicate that a channel recovery is required. The code associated with the systems then systematically disconnects the channel resources and ensures that these resources are cleaned up and made available for re-use. Once the aforementioned process is complete, the channel can be brought up again.
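By way of illustration only, the exchange described above, in which one system informs its partner that channel recovery is required and both sides then release the channel resources for re-use, can be sketched as follows. The `System` class, its method names, and the resource names are hypothetical, and the out-of-band message is modeled as a direct method call rather than as traffic on an actual link.

```python
from typing import Dict, List, Optional


class System:
    """Hypothetical endpoint holding per-channel resources."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.partner: Optional["System"] = None
        self.in_use: Dict[int, List[str]] = {}  # channel id -> resources held
        self.free_resources: List[str] = []     # resources available for re-use

    def allocate(self, channel_id: int, resources: List[str]) -> None:
        self.in_use[channel_id] = list(resources)

    def request_recovery(self, channel_id: int) -> None:
        # Tell the remote partner that recovery is required (stands in for
        # the out-of-band message), then clean up the local side.
        if self.partner is not None:
            self.partner.on_recovery_request(channel_id)
        self._release(channel_id)

    def on_recovery_request(self, channel_id: int) -> None:
        self._release(channel_id)

    def _release(self, channel_id: int) -> None:
        # Disconnect the channel resources and return them to the free list.
        self.free_resources.extend(self.in_use.pop(channel_id, []))


host, remote = System("host"), System("remote")
host.partner, remote.partner = remote, host
host.allocate(1, ["qp-a", "buf-a"])
remote.allocate(1, ["qp-b", "buf-b"])
host.request_recovery(1)
```

After `request_recovery` returns, neither side holds resources for channel 1, which is the precondition the description gives for bringing the channel up again.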
In exemplary embodiments, in a loss of link (LOL) scenario, the recovery is a temporary transition to recover from a problem that could not be corrected in a less intrusive fashion. Therefore, the transition has to be as fast and efficient as possible. Recovery of a channel can result from many different situations, including: timeout of communications across the link as detected by the heartbeat support; a communication error that caused a buffer to go into error; a software bug that compromised the integrity of the communications across the channel; an operating system request to recycle the channel; an operator request to disable the channel; and the remote partner informing the local channel that the channel is going through a loss of link operation.
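By way of illustration only, the situations listed above can be captured as an enumeration so that the stored loss of link condition records why recovery was entered. The type name, the member names, and the helper `is_locally_originated` are hypothetical labels chosen for this sketch and do not appear in the described firmware.

```python
from enum import Enum, auto


class LossOfLinkCause(Enum):
    """Hypothetical labels for the recovery triggers listed in the text."""
    HEARTBEAT_TIMEOUT = auto()         # timeout detected by heartbeat support
    BUFFER_ERROR = auto()              # communication error put a buffer in error
    SOFTWARE_FAULT = auto()            # bug compromised channel integrity
    OS_RECYCLE_REQUEST = auto()        # operating system asked to recycle
    OPERATOR_DISABLE_REQUEST = auto()  # operator asked to disable the channel
    REMOTE_PARTNER_LOL = auto()        # partner reported its own loss of link


def is_locally_originated(cause: LossOfLinkCause) -> bool:
    # Every listed cause except the partner's report arises on the local side.
    return cause is not LossOfLinkCause.REMOTE_PARTNER_LOL
```

Recording the cause in this way also gives later stages something to consult when deciding whether the channel should be restored.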
Once the channel has been disabled, the I/O processor (IOP) can conditionally begin to restore the channel to operation, depending upon the original cause of the loss of link. In an exemplary embodiment, the IOP initializes its control blocks for the channel, and then informs the channel layer to start the hardware dependent initialization. The channel initialization process includes ensuring that the control blocks associated with the channel are in their initial state, establishing and connecting the queue pairs for the out of band signaling connection, negotiating buffer counts and sizes, connecting each of the queue pairs associated with data buffers, and then exchanging Node Descriptors (system identification information). Completing these steps ensures that the channel is clean and ready to resume normal data communications.
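By way of illustration only, the ordered channel initialization described above can be sketched as a list of steps that are executed in sequence and abandoned at the first failure. The step names, the `initialize_channel` function, and the convention that each step returns `True` on success are assumptions made for this sketch.

```python
from typing import Callable, Dict, List

# Order follows the description; the identifiers themselves are hypothetical.
INIT_STEPS: List[str] = [
    "reset_control_blocks",
    "connect_out_of_band_queue_pair",
    "negotiate_buffer_counts_and_sizes",
    "connect_data_queue_pairs",
    "exchange_node_descriptors",
]


def initialize_channel(step_fns: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each step in order; stop at the first one that reports failure."""
    completed: List[str] = []
    for name in INIT_STEPS:
        if not step_fns[name]():
            break
        completed.append(name)
    return completed


def channel_ready(completed: List[str]) -> bool:
    # The channel resumes normal traffic only after every step has succeeded.
    return completed == INIT_STEPS


# Usage with stub steps: one run succeeds, one fails during negotiation.
all_ok = {name: (lambda: True) for name in INIT_STEPS}
negotiation_fails = dict(all_ok, negotiate_buffer_counts_and_sizes=lambda: False)

good_run = initialize_channel(all_ok)
bad_run = initialize_channel(negotiation_fails)
```

Keeping the steps in a single ordered list makes the dependency explicit: the data queue pairs are never connected before the out of band connection exists and the buffer parameters have been agreed.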
Much of the coordination of this activity resides in the firmware, and involves the auxiliary queue pair. The auxiliary queue pair, or out of band signaling, manages not only the breaking down of the channel (the loss of link operation), but also the entire rebuilding and connecting of the channel across the link.
Technical effects of exemplary embodiments include the ability to recover a loss of link for existing coupling connections emulated in firmware. A single channel which is multiplexed across a shared physical link is capable of being recovered without in any way impacting the other channels that share the physical connection.
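By way of illustration only, the isolation property stated above can be sketched by keeping recovery state per channel rather than per physical link. The `PhysicalLink` class, its method names, and the state strings are hypothetical.

```python
from typing import Dict, Iterable


class PhysicalLink:
    """Hypothetical model of one physical link carrying several channels."""

    def __init__(self, channel_ids: Iterable[int]) -> None:
        # State is tracked per channel, never for the link as a whole.
        self.state: Dict[int, str] = {cid: "operational" for cid in channel_ids}

    def lose_link(self, channel_id: int) -> None:
        # Only the named channel enters loss of link; the link is not reset.
        self.state[channel_id] = "loss_of_link"

    def restore(self, channel_id: int) -> None:
        self.state[channel_id] = "operational"


link = PhysicalLink([0, 1, 2])
link.lose_link(1)
during_recovery = dict(link.state)
link.restore(1)
```

Here `during_recovery` shows channel 1 in loss of link while channels 0 and 2 remain operational, which a hardware reset of the shared link could not achieve.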
As described above, embodiments can be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. In exemplary embodiments, the invention is embodied in computer program code executed by one or more network elements. Embodiments include a computer program product 300 as depicted in
While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance, but rather the terms first, second, etc. are used to distinguish one element from another. Furthermore, the use of the terms a, an, etc. does not denote a limitation of quantity, but rather denotes the presence of at least one of the referenced item.
Priority based on U.S. Provisional Patent Application Ser. No. 61/031,315, filed Feb. 25, 2008, and entitled “MULTI-TASKING MULTI-PROCESSOR ENVIRONMENTS OVER INFINIBAND” is claimed, the entire contents of which are incorporated herein by reference.
Number | Date | Country |
---|---|---|
20090216923 A1 | Aug 2009 | US |
Number | Date | Country |
---|---|---|
61031315 | Feb 2008 | US |