The present invention relates to the field of data communications. More particularly, this invention relates to methods and systems for recovering from an interruption in, or failure of, tunnel communications when using the Layer 2 Tunneling Protocol.
It is known that data and voice communications can be carried over telephone lines or other data communication channels, including the Internet. In both cases, modern communications systems commonly divide the communicated signals into discrete packets of a known format for delivery over a communications network. The packets are transmitted from a sending end of the network to a receiving end of the network, and are managed in various ways to provide secure and almost error free communication.
In the case of long-distance communications, packets may be exchanged over a telephone network, requiring potentially expensive long-distance calls to be made. Alternatively, packets may be exchanged at a generally lower cost by having each of the two parties to the communication connect to a local electronic, or computer, server, where the respective local servers exchange information using the Internet's infrastructure.
A standard model for exchanging data packets over the Internet is the Open Systems Interconnection (OSI) model. The OSI model comprises a family of known protocols that cooperatively provide services that collectively result in successful transmission and reception of data packets over the Internet. More information on the operation and use of the OSI model in Internet communications can be found in the literature known to those skilled in the art of digital communications. Briefly, the OSI model provides seven layers of functionality (including the physical layer, the data link layer, the network layer, the transport layer, the session layer, the presentation layer, and the application layer), where each layer is interdependent on the surrounding layers or levels. At one end of the protocol stack (L7, or the application layer), the end-user of the communication communicates using a software application. At the other end of the protocol stack (L1, or the physical layer) resides the physical signal transmission medium.
The principles of the present invention apply primarily to the second layer of the OSI model (L2), which is known as the data link layer. L2 describes the logical organization of data bits being transmitted on a particular medium, and ensures bit-level integrity of the data being transmitted on the medium. L2 accomplishes its function by properly encapsulating the underlying packets in frames according to the L2 communication protocol. The framing of the data packets allows for bit error checking and correction, and includes network addressing information so that the frame can be properly delivered to its intended destination.
The Layer 2 Tunneling Protocol (L2TP), which is a known secure protocol used for connecting users to networks (e.g., virtual private networks (VPNs)) over public lines (such as the Internet), encapsulates Point-to-Point Protocol (PPP) frames that are sent using the Internet Protocol (IP). The PPP payload information can be compressed and/or encrypted as desired. Header information and datagram content are included in a User Datagram Protocol (UDP) message that is sent across a “tunnel” from one server to another.
A tunnel is established between two servers using L2TP as provided in RFC 2661, available at http://www.ietf.org/rfc/rfc2661.txt, which is hereby incorporated by reference herein in its entirety. Establishing an L2TP tunnel generally requires the communicating servers (tunnel ends) to exchange tunnel state information (TSI) with each other, which, as explained below, may include various information. Additionally, during the communication session, packet sequence numbers are exchanged to maintain the sent and received packets in a proper synchronized order.
In past communications systems, a difficulty generally arises when an L2TP tunnel is interrupted (e.g., lost or broken) or fails due to, for example, a software or a hardware fault (or failure) at one or both of the tunnel endpoint servers. In these systems, packet sequence numbers can become lost or de-synchronized between the two ends of the L2TP tunnel. In this situation, the receiving server will no longer recognize the misnumbered packets as being properly directed to it, and will ignore, or “drop,” the incoming packets. The effect of improper packet sequence numbering is thus to lose the L2TP communications tunnel, requiring a time-consuming reconnection of the tunnel.
Accordingly, it is desirable to provide methods and systems for recovering from an interruption in, or failure of, an L2TP tunnel, such that reconnection of the tunnel is not required.
Methods and systems are provided for recovering from an interruption in, or failure of, an L2TP tunnel endpoint, such that reconnection of the tunnel is not required. According to the invention, redundant storage of tunnel state information is used to avoid having to reconnect an interrupted or failed tunnel. Also, according to the invention, extraction of sequence numbers from a peer L2TP endpoint server is achieved by subdividing the universe, or set, of all available sequence numbers into an appropriate number of divisions, and sending a control message having a sequence number from each of the divisions to the peer endpoint server to elicit a response.
In a first embodiment, the invention provides a method for recovering a communications session over a tunnel established using the Layer 2 Tunneling Protocol when a task fails and causes failure of the tunnel, where the method includes storing tunnel state information, descriptive of the tunnel, in more than one location prior to the failure of the tunnel, activating at least one backup task that is substantially capable of replacing the failed task upon the failure of the tunnel, providing the stored tunnel state information to the at least one backup task, and resuming communication over the tunnel using the stored tunnel state information and the backup task.
In a second embodiment, the invention provides a method for recovering a communications session over a tunnel established using the Layer 2 Tunneling Protocol when a task fails and causes failure of the tunnel, where the method includes storing tunnel state information, descriptive of the tunnel, in more than one location prior to the failure of the tunnel, restarting the failed task upon the failure of the tunnel, providing the stored tunnel state information to the restarted task when the failed task is restarted, and resuming communication over the tunnel using the stored tunnel state information and the restarted task.
In a third embodiment, the invention provides a method for recovering a failed communications session over a tunnel established using the Layer 2 Tunneling Protocol after a message sequence number of a first tunnel endpoint is no longer available, where the method includes sending a first control message from the first tunnel endpoint to a second tunnel endpoint, the first control message having a first sequence number, sending a second control message from the first tunnel endpoint to the second tunnel endpoint, the second control message having a second sequence number that is different from the first sequence number, receiving a response control message from the second tunnel endpoint, the response control message containing a peer sequence number of the second tunnel endpoint, and using the peer sequence number of the second tunnel endpoint as a basis for determining future message sequence numbers of control messages sent from the first tunnel endpoint.
In a fourth embodiment, the invention provides a communications system capable of recovering a communications session over a tunnel established using the Layer 2 Tunneling Protocol following a failure of the tunnel, where the system includes a first component that stores tunnel state information descriptive of the tunnel prior to the failure of the tunnel, wherein the failure of a first task running on the first component causes the failure of the tunnel, the system includes a second component that stores the tunnel state information prior to the failure of the tunnel, and the system includes a third component that does not store the tunnel state information prior to the failure of the tunnel, wherein, upon failure of the tunnel, the tunnel state information stored in the second component is provided to the third component, and communication over the tunnel is resumed using a second task running on the third component and the tunnel state information provided to the third component.
In a fifth embodiment, the invention provides a communications system capable of recovering a communications session over a tunnel established using the Layer 2 Tunneling Protocol following a failure of the tunnel, where the system includes a first component that stores tunnel state information descriptive of the tunnel prior to the failure of the tunnel, wherein the failure of a first task running on the first component causes the failure of the tunnel, and the system includes a second component that stores the tunnel state information prior to the failure of the tunnel, wherein, upon failure of the tunnel, the failed first task is restarted and subsequently provided the tunnel state information stored in the second component, and wherein communication over the tunnel is resumed using the restarted first task and the tunnel state information from the second component.
In a sixth embodiment, the invention provides a system for recovering a communications session over a tunnel established using the Layer 2 Tunneling Protocol when a task fails and causes failure of the tunnel, where the system includes means for storing tunnel state information, descriptive of the tunnel, in more than one location prior to the failure of the tunnel, means for activating at least one backup task that is substantially capable of replacing the failed component upon the failure of the session, means for providing the stored tunnel state information to the at least one backup task, and means for resuming communication over the tunnel using the stored tunnel state information and the backup task.
In a seventh embodiment, the invention provides a system for recovering a communications session over a tunnel established using the Layer 2 Tunneling Protocol when a task fails and causes failure of the tunnel, where the system includes means for storing tunnel state information, descriptive of the tunnel, in more than one location prior to the failure of the tunnel, means for restarting the failed task upon the failure of the tunnel, means for providing the stored tunnel state information to the restarted task after the failed task has been restarted, and means for resuming communication over the tunnel using the stored tunnel state information and the restarted task.
Additional embodiments of the invention, its nature and various advantages, will be more apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
Methods and systems are described in greater detail below for recovering from an interruption in, or failure of, an L2TP tunnel endpoint, such that reconnection of the tunnel is not required. It should be noted that certain features, which are well known in the art, are not described in great detail in order to avoid complication of the subject matter of the present invention.
As described briefly above, improper data packet sequence numbering in an L2TP tunnel can result in lost communications, possibly requiring re-establishment of the L2TP tunnel and causing reduced communications efficiency. The various aspects and embodiments of the present invention described in greater detail below provide for re-establishing L2TP connections to recover from faults (or failures) that resulted in a loss of packet sequence numbering data and other data used in maintaining L2TP tunnels under the present standards. According to various embodiments of the invention, the present invention is used by persons communicating with a home network while they are geographically away from the home network. In other embodiments, the invention is used by persons communicating with other persons separated from them over a significant geographical distance. The invention is not, however, limited in this manner.
As mentioned above, a second, alternate method for connecting user 110 to home network 150 is also shown in
In the second method, local server 120 establishes a connection over IP network 140 (e.g., the Internet) to remote server 130, which provides the gatekeeping function to home network 150 as described above. Here, an L2TP tunnel is established between servers 120 and 130 (also referred to in this context as endpoint servers, or tunnel endpoint servers), and datagrams conforming to L2TP are exchanged between servers 120 and 130. It should also be noted that server 120, which is at the user/originator end of the L2TP tunnel, is also referred to as an L2TP Access Concentrator (LAC). In addition, server 130, which is at the receiving end of the L2TP tunnel, is also referred to as an L2TP Network Server (LNS).
In order to establish and carry on a successful communications session using an L2TP tunnel, both endpoint server 120 (or LAC 120) and endpoint server 130 (or LNS 130) must exchange certain required information according to L2TP, and may exchange other optional information as well. Following successful establishment of an L2TP tunnel, the endpoint servers can exchange payload information. The process of establishing an L2TP tunnel is now discussed in more detail, followed by a discussion concerning the consequences of a failure at either L2TP tunnel endpoint 120 or 130 and ways to recover from such a failure according to various embodiments of the present invention.
LAC 120 and LNS 130 establish an L2TP tunnel by making a connection and exchanging control data as mentioned earlier. Three particular exchanges between LAC 120 and LNS 130 are called for by L2TP to establish an L2TP tunnel.
First, LAC 120 sends a Start Control Connection Request (SCCRQ) control message to LNS 130 when an incoming call is detected from user 110. The SCCRQ control message generally includes the message type, assigned session ID, and host name. The SCCRQ control message may optionally include receive window size, authentication challenge, and other information. Moreover, the SCCRQ control message is used to indicate to LNS 130 that an L2TP tunnel is to be established between the two servers to facilitate the incoming call.
The second step in establishing an L2TP tunnel is for LNS 130 to respond to the SCCRQ control message by sending a Start Control Connection Reply (SCCRP) message to LAC 120. The SCCRP is a control message indicating that the SCCRQ was successfully received, and that LNS 130 is ready to establish the requested tunnel. Generally speaking, the SCCRP message includes a message type and an assigned session ID number. In addition, according to various embodiments of the invention, the SCCRP includes other control information or parameters provided by LNS 130 that may be useful for establishing the L2TP tunnel.
The third and final step in establishing an L2TP session between LAC 120 and LNS 130 is for LAC 120 to send to LNS 130 a Start Control Connection Connected (SCCCN) control message (in response to the received SCCRP message). The SCCCN indicates that the SCCRP message was accepted by LAC 120, and that the L2TP tunnel should move to the established state. The SCCCN message generally includes a message type and a challenge response.
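The three-message exchange described above can be illustrated with a minimal sketch. The `Endpoint` class, the `establish` function, and the state labels are hypothetical names chosen for illustration; message contents, timers, and error handling are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical tunnel endpoint that only tracks a state label."""
    name: str
    state: str = "idle"


def establish(lac: Endpoint, lns: Endpoint) -> list:
    """Walk through the three control messages and return them in order."""
    trace = []

    # Step 1: the LAC asks for a tunnel.
    trace.append((lac.name, "SCCRQ"))
    lac.state = "wait-ctl-reply"

    # Step 2: the LNS confirms it received the request and is ready.
    trace.append((lns.name, "SCCRP"))
    lns.state = "wait-ctl-conn"

    # Step 3: the LAC accepts the reply; both ends move to "established".
    trace.append((lac.name, "SCCCN"))
    lac.state = "established"
    lns.state = "established"
    return trace


lac = Endpoint("LAC 120")
lns = Endpoint("LNS 130")
messages = establish(lac, lns)
```

After the call, `messages` lists the SCCRQ, SCCRP, and SCCCN messages together with their senders, and both endpoints are in the established state.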
According to various embodiments of the present invention, as shown in
According to the invention, at least one respective Data Manager task runs on both LAC 120 and LNS 130. For example, according to various embodiments such as the one shown in
Also running on LAC 120 shown in
The sequence of events that generally takes place in the event of a fault (or failure) associated with a Data Manager task or a Control Manager task during an L2TP session is now explained. As mentioned above, in past communications systems, a failure of an L2TP tunnel during an active L2TP session would result in a significant degradation in performance. This is because, in past communications systems, the TSI information for an L2TP tunnel was likely to be lost as a result of the failure, in which case the tunnel would have to be re-established according to the three-step scheme described above. Re-establishment of an L2TP tunnel session, however, carries with it performance penalties because of the time it takes to re-establish the tunnel. Reconnection of the tunnel will also require the user's network connection to be re-established. In addition, degraded reliability results from dropped or lost data packets that are sent during the time the tunnel is inoperative. Also, there is no guarantee that the other endpoint server will accept the tunnel session when a recovering server attempts to re-establish the tunnel.
According to various embodiments of the present invention, in order to reduce (or eliminate) the possibility of a lost L2TP tunnel, redundancy is used to store TSI for any (or all) active L2TP tunnels. More particularly, according to various embodiments of the present invention, TSI information is stored with more than one component in both LAC 120 and LNS 130. For example, in connection with LAC 120, TSI information associated with L2TP tunnel 160 may be stored with a first component (e.g., circuit board, or card) on which Control Manager task 220 is running, as well as on a second component (e.g., circuit board, or card) on which Data Manager task 224 is running. In this case, upon a fault or failure of the Data Manager task 224 or the second component on which it is running, the Control Manager task 220, under control of software instructions, transfers a copy of the TSI information being stored with the first component to a third component on which Backup task 228 (or another Backup task that is not shown) is running. In turn, Backup task 228 promptly takes the place of failed Data Manager task 224. Because Backup task 228 is operationally and by design capable of carrying out substantially the same functions as failed Data Manager task 224, it is able to be pressed into service by providing it with the copied TSI information.
According to various embodiments of the present invention, in the case of a failure of Control Manager task 220 (which generally controls a plurality of L2TP tunnels) or the first component on which it is running, failure control software causes duplicate TSI information associated with each active Data Manager task (e.g., Data Manager tasks 223 and 224 of LAC 120) to be transferred to the component on which Backup task 228 is running, after which time Backup task 228 is able to carry out the functions of the failed Control Manager task 220.
It should be noted that faults and failures such as described above are not always fatal, and in some instances, the failed task (or failed component on which a task is running) can be restarted and used once again. In this case, when the same task or component that failed is to be returned to service, the TSI information may be delivered to the failed task or component after it has been restarted (and there is generally no need to use a Backup task running on another component).
In the manner described above, redundant TSI information is stored, and selectively provided in the L2TP endpoint servers so that it is not necessary to go through the process of re-establishing a failed L2TP tunnel. Rather, as explained above, data communication over the same L2TP tunnel can continue promptly after a redundant component (or the same component, after rebooting or restarting) is provided with the necessary TSI information.
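The redundancy scheme described above can be sketched as follows. This is an illustrative model only: the `Component` class, the `store_redundantly` and `recover` functions, and the fields placed in the TSI dictionary are all assumptions made for the example, not the actual software of the endpoint servers.

```python
import copy


class Component:
    """Hypothetical card (circuit board) with its own local storage."""

    def __init__(self, name):
        self.name = name
        self.tsi = {}  # tunnel id -> tunnel state information


def store_redundantly(tunnel_id, tsi, components):
    """Keep an independent copy of the TSI on every listed component."""
    for component in components:
        component.tsi[tunnel_id] = copy.deepcopy(tsi)


def recover(failed, survivor, target):
    """Model a fault on `failed`, then hand the surviving TSI to `target`.

    `target` is a backup component, or `failed` itself when the same
    task is simply restarted.
    """
    failed.tsi.clear()  # state held by the failed task is lost
    target.tsi = copy.deepcopy(survivor.tsi)
    return target


control_card = Component("control-manager")
data_card = Component("data-manager")
backup_card = Component("backup")

tsi = {"peer": "LNS 130", "ns": 17, "nr": 42}  # illustrative fields only
store_redundantly(160, tsi, [control_card, data_card])

# The Data Manager's card fails; the backup card receives the copy
# that was held alongside the Control Manager.
recover(failed=data_card, survivor=control_card, target=backup_card)
```

Passing the failed component as `target` models the restart case, in which the stored TSI is returned to the same task rather than to a backup.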
According to various embodiments of the invention, tunnel endpoint servers 120 and 130 are considered peers. Moreover, it should be noted that tunnel endpoint servers 120 and 130 can, according to various embodiments of the invention, have similar or identical construction. However, servers 120 and 130 are generally not required to be identical (or even similar), so long as they can provide L2TP connection services. For example, one or both of servers 120 and 130 may be implemented in the ST-16 Intelligent Mobile Gateway from Starent Networks, Inc. of Tewksbury, Mass., USA. It should also be noted that other L2TP tunnels and OSI-compatible connections to other servers, and other auxiliary network devices and features, are also connectable to those shown. It should be appreciated that, as explained above, each of the Control Manager tasks 220 and 230, Data Manager tasks 223-225 and 232-235, and Backup tasks 228 and 238 may be running on different components (e.g., circuit boards, or cards), each of which will generally include onboard processing capability and data storage capability. Moreover, servers 120 and 130 shown in
Other aspects of the present invention, which are now described in greater detail, concern the control message sequence number handling at (or about) the time of a failure of an L2TP tunnel. For example, according to various embodiments of the present invention, methods and systems compatible with the L2TP standard (RFC 2661) are provided for a first L2TP tunnel endpoint server to obtain the correct message sequence numbers from a second L2TP tunnel endpoint server after sequence numbers become lost (e.g., due to a fault or failure at the first L2TP tunnel endpoint server). In this manner, it is unnecessary to re-establish a failed L2TP tunnel session because the sequence numbers were lost. Rather, according to the invention, the connection may continue without the second L2TP tunnel endpoint dropping or ignoring mis-sequenced messages from the first L2TP tunnel endpoint.
As was discussed earlier, control data is exchanged between two L2TP tunnel endpoint servers (e.g., LAC 120 and LNS 130). The sequence numbers in the L2TP header include a sent sequence number (Ns) and a received or expected sequence number (Nr).
Ns signifies the sequence number of a message being sent by an L2TP endpoint server. In L2TP, the sequence numbers are generally 16-bit numbers, and therefore, Ns (as well as Nr) begins at 0 and is continually incremented by one (MOD 2**16). For example, once the incremental value reaches 65,535, Ns becomes equal to 0, and then continues to increment from 0. The Ns value is incremented for each new control message transmitted to the peer.
Nr, on the other hand, signifies the sequence number of a message expected in the next control message received by an L2TP endpoint server. Thus, Nr is equal to the Ns of the last in-order message received from the peer plus one (MOD 2**16). Accordingly, as with the value of Ns, the maximum value of Nr is 65,535, after which point the value becomes 0, 1, and so on.
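The wrap-around behavior of Ns and Nr described above can be shown with a short sketch. The `ControlChannel` class and its method names are hypothetical; the model ignores retransmission and treats every sent message as a new control message.

```python
SEQ_MOD = 1 << 16  # sequence numbers are 16-bit values, 0..65535


def next_seq(n):
    """Increment a sequence number by one (MOD 2**16)."""
    return (n + 1) % SEQ_MOD


class ControlChannel:
    """Hypothetical per-tunnel counters for one endpoint."""

    def __init__(self):
        self.ns = 0  # sequence number of the next message to send
        self.nr = 0  # sequence number expected in the next received message

    def send(self):
        """Return the (Ns, Nr) pair for the outgoing header, then advance Ns."""
        header = (self.ns, self.nr)
        self.ns = next_seq(self.ns)
        return header

    def receive_in_order(self, peer_ns):
        """Advance Nr only when the peer's Ns is the one expected."""
        if peer_ns != self.nr:
            return False
        self.nr = next_seq(self.nr)
        return True


channel = ControlChannel()
first_header = channel.send()
wrapped = next_seq(65535)
```

Here `first_header` is `(0, 0)` and `wrapped` is `0`, matching the statement that the value following 65,535 is 0.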
As persons versed in the art will appreciate, Nr and Ns are mandatory for L2TP control messages. Strict sequence number checking is generally done when a packet is received to make sure that the sequence number of the control message is correct and expected. By using Nr and Ns, L2TP implements reliable transport for the control messages, so there is message retransmission in the event of a “lost” message. Nr serves as an acknowledgement sequence number, indicating to the peer that it has successfully received the message up to sequence number Nr. Nr/Ns sequence numbers are optional for L2TP data messages, and message retransmission is not recommended.
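For illustration, the position of Ns and Nr in a control message header can be sketched with Python's `struct` module. The layout assumed here follows the control header of RFC 2661 (flags and version, length, tunnel ID, session ID, Ns, Nr, each 16 bits); the function names `pack_zlb` and `unpack_seq` are hypothetical, and no attribute-value pairs are encoded.

```python
import struct

# Flags/version word for a control message: T, L and S bits set, version 2.
CTRL_FLAGS_VER = 0xC802

# Six unsigned 16-bit fields in network byte order:
# flags/version, length, tunnel ID, session ID, Ns, Nr.
HEADER = struct.Struct("!HHHHHH")


def pack_zlb(tunnel_id, ns, nr, session_id=0):
    """Build a header-only (zero-length body) control message."""
    return HEADER.pack(CTRL_FLAGS_VER, HEADER.size, tunnel_id, session_id, ns, nr)


def unpack_seq(data):
    """Extract (Ns, Nr) from the start of a control message."""
    _flags, _length, _tunnel_id, _session_id, ns, nr = HEADER.unpack_from(data)
    return ns, nr


message = pack_zlb(tunnel_id=7, ns=3, nr=9)
sequence_numbers = unpack_seq(message)
```

A message with no body is 12 bytes long in this layout, and `sequence_numbers` recovers the `(3, 9)` pair that was packed.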
As will be appreciated by persons versed in the art, a relatively wide window of tolerance is allowed under L2TP to determine whether a received control message was a duplicate (retransmitted) message. In effect, the L2TP standard allows any Ns sequence number falling within the last 32K of the correct Ns value to be considered duplicative. That is, if the peer is retransmitting any message with a sequence number in the last 32K range, it is considered to be a duplicate. Any peer tunnel endpoint server receiving a control message (e.g., a “Hello” keep-alive message) within this window will retransmit its last acknowledgement (e.g., using a Zero Length Body (ZLB) reply), including the current Nr and Ns of that tunnel. Therefore, if a recovering L2TP endpoint server sends out two “Hello” control messages, each having a value of Ns separated by 32K in the 64K universe of possible Ns values, then at least one of the two “Hello” control messages will be considered to be a duplicate and will elicit a useful ZLB response message, including the peer's current Ns and Nr sequence numbers in response. In this manner, a failed L2TP endpoint server can “trick” its peer into proffering the unknown or lost sequence numbers by eliciting the peer's Ns and Nr sequence numbers, then resetting its sequence numbers (Ns, Nr) to their corresponding proper values. The recovering L2TP endpoint server can thereby fully restore its lost Ns and Nr sequence numbers to what they should be in order to continue communicating with the peer endpoint server.
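The reasoning above can be checked with a small sketch. It assumes that the duplicate window consists of the 32,768 values immediately preceding the Ns value the peer expects next; the exact boundary handling is an assumption of this example, and the names `is_duplicate` and `probe_pair` are hypothetical.

```python
SEQ_MOD = 1 << 16  # size of the sequence number space (64K)
HALF = 1 << 15     # width of the assumed duplicate window (32K)


def is_duplicate(ns, expected):
    """True if `ns` lies in the 32K values just before `expected`."""
    behind = (expected - ns) % SEQ_MOD
    return 1 <= behind <= HALF


def probe_pair(x):
    """Two Ns values separated by 32K in the 64K space."""
    return x % SEQ_MOD, (x + HALF) % SEQ_MOD


def exactly_one_is_duplicate(expected):
    """Check every possible starting value x against one expected value."""
    for x in range(SEQ_MOD):
        first, second = probe_pair(x)
        if is_duplicate(first, expected) == is_duplicate(second, expected):
            return False
    return True


covered = all(exactly_one_is_duplicate(e) for e in (0, 1, 12345, 65535))
```

Under this assumption, whichever value x the recovering server picks, one of the two probes falls inside the duplicate window, so `covered` evaluates to `True` for the expected values tried here.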
Therefore, according to the principles of the present invention, recovering endpoint server 400 sends two successive interrogatory “Hello” control messages 420 and 422 containing selected sequence numbers Ns and Nr so that recovering endpoint server 400 can elicit from the endpoint server 410 the correct sequence numbers (Ns, Nr) that endpoint server 410 is expecting. More specifically, a first “Hello” (keep-alive) control message 420 is sent by recovering endpoint server 400 to peer endpoint server 410 with potentially contrived, or arbitrary sequence numbers Ns=x and Nr=y, where x and y are between 0 and 64K (again, an approximation of 65,535 is being used for the sake of simplicity). In addition, recovering endpoint server 400 sends to peer endpoint server 410 a second “Hello” control message 422. The second “Hello” control message 422, however, is provided with the sequence numbers Ns=[(x+(2**15)) MOD (2**16)] and Nr=y. It will be understood by persons versed in the art that the Nr value can be any number, given that only the Ns needs to fit the “duplicate message” definition for a response to be received from the peer. Thus, it is assured that at least one of the two “Hello” control messages 420 and 422 will lie within the window allowed by L2TP for duplication, and that peer endpoint server 410 will respond to this “Hello” control message with the desired ZLB message 430 that carries the desired Ns=n and Nr=m values that recovering endpoint 400 will use to reset its sequence numbers (Ns, Nr) to their corresponding proper values.
At step 504, the recovering endpoint server sends a second control message (e.g., also a “Hello” control message) to its peer endpoint server, with the second control message having a value of Ns greater than that of the first control message's Ns by 32K, or Ns=[(x+(2**15)) MOD (2**16)], and Nr=y. It should be noted that, in practice, the second control message may have a value of Ns greater than that of the first control message's Ns by other than exactly 32K. The peer tunnel endpoint server is thus drawn to accepting one of the “Hello” messages as a duplicative message (according to the tolerance built into the L2TP standard). Accordingly, at step 506, the peer endpoint server sends, for example, a ZLB message to the recovering tunnel endpoint server, with the ZLB message including the proper sequence numbers (Ns, Nr) from the point of view of the peer endpoint server.
Once the desired Ns=n and Nr=m values are received from the peer endpoint server at step 506, the recovering endpoint server uses these values of the sequence numbers to properly formulate its next message's sequence numbers, at step 508. Specifically, the next Ns and Nr values of the recovering endpoint server would be equal to the respective Nr and Ns values of the peer endpoint server. Subsequent L2TP tunnel messages will thus be communicated without loss at the peer endpoint server's end of the tunnel because they were sent with the correct expected sequence numbers.
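The recovery steps above can be drawn together in a minimal simulation. The `Peer` class and the `recover_sequence_numbers` function are hypothetical, and the peer model is deliberately simplified: it answers a duplicate “Hello” with its current (Ns, Nr) and ignores any other probe, and the duplicate window is assumed to be the 32,768 values preceding the peer's Nr.

```python
SEQ_MOD = 1 << 16
HALF = 1 << 15


class Peer:
    """Simplified peer endpoint that still holds valid sequence numbers."""

    def __init__(self, ns, nr):
        self.ns = ns  # Ns = n, the peer's next send sequence number
        self.nr = nr  # Nr = m, the sequence number the peer expects next

    def on_hello(self, probe_ns):
        """Return the ZLB's (Ns, Nr) for a duplicate probe, else None."""
        behind = (self.nr - probe_ns) % SEQ_MOD
        if 1 <= behind <= HALF:
            return self.ns, self.nr
        return None


def recover_sequence_numbers(peer, x=0):
    """Send two probes 32K apart and derive the local (Ns, Nr) pair."""
    for probe_ns in (x % SEQ_MOD, (x + HALF) % SEQ_MOD):
        zlb = peer.on_hello(probe_ns)
        if zlb is not None:
            peer_ns, peer_nr = zlb
            # The local Ns is what the peer expects (its Nr), and the
            # local Nr is what the peer will send next (its Ns).
            return peer_nr, peer_ns
    return None


restored = recover_sequence_numbers(Peer(ns=100, nr=40000))
```

In this run the first probe (Ns=0) lies outside the assumed window and is ignored, the second probe (Ns=32768) is treated as a duplicate, and `restored` becomes `(40000, 100)`, that is, the peer's Nr and Ns swapped into the recovering server's Ns and Nr.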
Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes to the details of implementation of the invention can be made without departing from the spirit and scope of the invention. For example, if the L2TP standard, or another standard adapted to the present concepts uses values different than those described herein or in RFC 2661, those differences will fall within the scope of the present disclosure.
Additionally, for example, if the tolerance band or window of acceptable sequence numbers that would trigger a ZLB message from a peer is changed, a different division of the circle 300 of
Therefore, other embodiments, extensions, and modifications of the ideas presented above are comprehended and should be within the reach of one versed in the art upon reviewing the present disclosure. Accordingly, the scope of the present invention in its various aspects should not be limited by the examples presented above. The individual aspects of the present invention, and the entirety of the invention, should be regarded so as to allow for such design modifications and future developments within the scope of the present disclosure. The present invention is limited only by the claims which follow.