The present disclosure relates to convergence for connectivity fault management.
IEEE 802.1ag (“IEEE Standard for Local and Metropolitan Area Networks Virtual Bridged Local Area Networks Amendment 5: Connectivity Fault Management”) is a standard defined by the IEEE (Institute of Electrical and Electronics Engineers). IEEE 802.1ag is largely identical with ITU-T Recommendation Y.1731, which additionally addresses performance management.
IEEE 802.1ag defines protocols and practices for OAM (Operations, Administration, and Maintenance) for paths through IEEE 802.1 bridges and local area networks (LANs). IEEE 802.1ag defines maintenance domains, their constituent maintenance points, and the managed objects required to create and administer them. IEEE 802.1ag also defines the relationship between maintenance domains and the services offered by virtual local area network (VLAN)-aware bridges and provider bridges. IEEE 802.1ag also describes the protocols and procedures used by maintenance points to maintain and diagnose connectivity faults within a maintenance domain.
A Maintenance Domain (MD) is a management space on a network, typically owned and operated by a single entity. Maintenance End Points (MEPs) are points at the edge of an MD and define the boundary of the domain. A maintenance association (MA) is a set of MEPs configured with the same maintenance association identifier (MAID) and MD level.
IEEE 802.1ag Ethernet CFM (Connectivity Fault Management) comprises three protocols that work together to help administrators debug Ethernet networks: Continuity Check, Linktrace, and Loopback.
Continuity Check Messages (CCMs) are “heartbeat” messages for CFM and provide a means to detect connectivity failures in an MA. CCMs are multicast messages confined to a maintenance domain (MD). They are unidirectional and do not solicit a response. Each MEP transmits a periodic multicast CCM inward towards the other MEPs in the MA.
IEEE 802.1ag specifies that a CCM can be transmitted and received as often as every 3.3 ms for each VLAN to monitor the continuity of each VLAN. A network bridge can typically have up to 4K VLANs. At roughly 300 CCMs per second per VLAN, it follows that a bridge may be required to transmit on the order of 1.2 million CCM messages per second and to receive N times that number, where N is the average number of remote end-points per VLAN within the network. This requirement creates an overwhelming control plane processing overhead for a network switch and thus presents significant scalability issues.
Accordingly, a need exists for an improved method of verifying point-to-point, point-to-multipoint, and multipoint-to-multipoint Ethernet connectivity among a group of Ethernet endpoints. A further need exists for such a solution that allows OAM protocols such as those defined by IEEE 802.1ag and ITU-T Y.1731 to utilize this verification method. A further need exists for such a solution that is scalable to support the full range of VLANs available on a network bridge.
A solution for convergence for connectivity fault management includes, at a device having a network interface, maintaining a continuity state. The continuity state is associated with a Connectivity Fault Management (CFM) Maintenance Association (MA) comprising multiple Maintenance End Points (MEPs) including a first MEP associated with the device. The maintaining includes setting the state to a value indicating continuity of the MA if a converged notification is received from the first MEP. The maintaining also includes setting the state to a value indicating loss of continuity of the MA if a predetermined number of echo packets sent by the device towards the MEPs other than the first MEP are not received by the device within a predetermined time period.
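By way of non-limiting illustration only, the following sketch in C (using hypothetical names, types, and thresholds that are not drawn from any standard) shows one way the maintaining of the continuity state may be realized: a converged notification marks the MA as having continuity, and missing more than a predetermined number of echo packets within a detection period marks a loss of continuity.

    /* Minimal sketch (hypothetical names) of the continuity-state maintaining
     * described above. */
    enum continuity_state { MA_CONTINUITY_UP, MA_CONTINUITY_DOWN };

    struct ma_session {
        enum continuity_state state;
        unsigned expected_echoes;   /* echo packets expected per detection cycle */
        unsigned received_echoes;   /* echo packets actually received this cycle */
        unsigned loss_threshold;    /* maximum tolerated number of lost echoes   */
    };

    /* Called when the first (local) MEP reports that the MA has converged. */
    static void on_converged_notification(struct ma_session *s)
    {
        s->state = MA_CONTINUITY_UP;
    }

    /* Called at the expiry of each predetermined time period (timer driven). */
    static void on_detection_period_expiry(struct ma_session *s)
    {
        unsigned lost = s->expected_echoes - s->received_echoes;
        if (lost > s->loss_threshold)
            s->state = MA_CONTINUITY_DOWN;  /* loss of continuity on the MA */
        s->received_echoes = 0;             /* start a new detection cycle  */
    }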
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
In the drawings:
Embodiments of the present invention are described herein in the context of convergence for connectivity fault management. Those of ordinary skill in the art will realize that the following detailed description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the present invention will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the present invention as illustrated in the accompanying drawings. The same reference indicators will be used throughout the drawings and the following detailed description to refer to the same or like parts.
In the interest of clarity, not all of the routine features of the implementations described herein are shown and described. It will, of course, be appreciated that in the development of any such actual implementation, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with application- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of this disclosure.
According to one embodiment, the components, process steps, and/or data structures may be implemented using various types of operating systems (OS), computing platforms, firmware, computer programs, computer languages, and/or general-purpose machines. The method can be run as a programmed process running on processing circuitry. The processing circuitry can take the form of numerous combinations of processors and operating systems, connections and networks, data stores, or a stand-alone device. The process can be implemented as instructions executed by such processing circuitry, as hardware alone, or as any combination thereof. The software may be stored on a program storage device readable by a machine.
According to one embodiment, the components, processes and/or data structures may be implemented using machine language, assembler, C or C++, Java and/or other high level language programs running on a data processing computer such as a personal computer, workstation computer, mainframe computer, or high performance server running an OS such as Solaris® available from Sun Microsystems, Inc. of Santa Clara, Calif., Windows Vista™, Windows NT®, Windows XP, Windows XP PRO, and Windows® 2000, available from Microsoft Corporation of Redmond, Wash., Apple OS X-based systems, available from Apple Inc. of Cupertino, Calif., or various versions of the Unix operating system such as Linux available from a number of vendors. The method may also be implemented on a multiple-processor system, or in a computing environment including various peripherals such as input devices, output devices, displays, pointing devices, memories, storage devices, media interfaces for transferring data to and from the processor(s), and the like. In addition, such a computer system or computing environment may be networked locally, or over the Internet or other networks. Different implementations may be used and may include other types of operating systems, computing platforms, computer programs, firmware, computer languages and/or general-purpose machines. In addition, those of ordinary skill in the art will recognize that devices of a less general purpose nature, such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herein.
In the context of the present invention, the term “network” includes any manner of data network, including, but not limited to, networks sometimes (but not always and sometimes overlappingly) called or exemplified by local area networks (LANs), wide area networks (WANs), metro area networks (MANs), storage area networks (SANs), residential networks, corporate networks, inter-networks, the Internet, the World Wide Web, cable television systems, telephone systems, wireless telecommunications systems, fiber optic networks, token ring networks, Ethernet networks, Fibre Channel networks, ATM networks, frame relay networks, satellite communications systems, and the like. Such networks are well known in the art and consequently are not further described here.
In the context of the present invention, the term “identifier” describes an ordered series of one or more numbers, characters, symbols, or the like. More generally, an “identifier” describes any entity that can be represented by one or more bits.
In the context of the present invention, the term “distributed” describes a digital information system dispersed over multiple computers and not centralized at a single location.
In the context of the present invention, the term “processor” describes a physical computer (either stand-alone or distributed) or a virtual machine (either stand-alone or distributed) that processes or transforms data. The processor may be implemented in hardware, software, firmware, or a combination thereof.
In the context of the present invention, the term “data store” describes a hardware and/or software means or apparatus, either local or distributed, for storing digital or analog information or data. The term “data store” describes, by way of example, any such devices as random access memory (RAM), read-only memory (ROM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), Flash memory, hard drives, disk drives, floppy drives, tape drives, CD drives, DVD drives, magnetic tape devices (audio, visual, analog, digital, or a combination thereof), optical storage devices, electrically erasable programmable read-only memory (EEPROM), solid state memory devices and Universal Serial Bus (USB) storage devices, and the like. The term “data store” also describes, by way of example, databases, file systems, record systems, object oriented databases, relational databases, SQL databases, audit trails and logs, program memory, cache and buffers, and the like.
In the context of the present invention, the term “network interface” describes the means by which users access a network for the purposes of communicating across it or retrieving information from it.
In the context of the present invention, the term “system” describes any computer information and/or control device, devices or network of devices, of hardware and/or software, comprising processor means, data storage means, program means, and/or user interface means, which is adapted to communicate with the embodiments of the present invention, via one or more data networks or connections, and is adapted for use in conjunction with the embodiments of the present invention.
In the context of the present invention, the term “switch” describes any network equipment with the capability of forwarding data bits from an ingress port to an egress port. Note that “switch” is not used in a limited sense to refer to Fibre Channel (FC) switches. A “switch” can be an FC switch, Ethernet switch, TRILL routing bridge (RBridge), IP router, or any type of data forwarder using open-standard or proprietary protocols.
The terms “frame” or “packet” describe a group of bits that can be transported together across a network. “Frame” should not be interpreted as limiting embodiments of the present invention to Layer 2 networks. “Packet” should not be interpreted as limiting embodiments of the present invention to Layer 3 networks. “Frame” or “packet” can be replaced by other terminologies referring to a group of bits, such as “cell” or “datagram.”
It should be noted that the convergence for connectivity fault management system is illustrated and discussed herein as having various modules which perform particular functions and interact with one another. It should be understood that these modules are merely segregated based on their function for the sake of description and represent computer hardware and/or executable software code which is stored on a computer-readable medium for execution by appropriate computing hardware. The various functions of the different modules and units can be combined or segregated as hardware and/or software stored on a computer-readable medium as above as modules in any manner, and can be used separately or in combination.
In example embodiments of the present invention, a continuity verification service is provided to an Ethernet OAM module, allowing augmentation of the Ethernet OAM continuity check messaging model to improve scalability of the continuity check function. When the continuity verification service is coupled with Ethernet OAM CC, Ethernet OAM CC may be configured to execute at a relatively low rate while the continuity verification service described herein executes at a relatively high rate, maintaining a desired continuity fault detection time while minimizing control plane overhead.
According to one embodiment, echo module 120 and convergence module 115 are combined into a single module. According to another embodiment, all or part of convergence module 115 and echo module 120 are integrated within Ethernet OAM module 135.
According to one embodiment, the one or more processors 110 are further configured to set the state to a value indicating loss of continuity if a nonconverged notification is received, or if a notification that the first MEP has been disabled is received.
According to one embodiment, the one or more processors 110 are further configured to send the state towards the first MEP. Ethernet OAM module 135 may use the state forwarded by the one or more processors 110 to update its continuity status.
According to one embodiment, Ethernet OAM module 135 is further configured to perform continuity checking at a relatively low frequency. The one or more processors 110 are further configured to perform the maintaining (continuity service) at a relatively high frequency. For example, continuity checking by Ethernet OAM module 135 may be configured to execute at 5-second intervals, and the maintaining may be configured to execute at 3.3 ms intervals.
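By way of illustration only, the following sketch (with hypothetical names and values drawn from the example above) shows how the two rates may be configured so that the echo-based maintaining, rather than the slow CCM exchange, governs the fault detection time.

    /* Illustrative rate configuration (hypothetical names): the Ethernet OAM
     * continuity check runs at a low rate while the echo-based continuity
     * service runs at a high rate. */
    #define CCM_INTERVAL_MS   5000u   /* Ethernet OAM continuity check: 5 s */
    #define ECHO_INTERVAL_MS  3.3     /* echo packet interval: 3.3 ms       */
    #define ECHOES_PER_CYCLE  3u      /* N echo packets per detection cycle */

    /* Detection cycle = N * T = 3 * 3.3 ms = 9.9 ms, far shorter than the
     * 5-second CCM interval, so loss of continuity is detected in roughly
     * 10 ms while the control plane handles only one CCM every 5 s per MA. */
    static double detection_cycle_ms(void)
    {
        return ECHOES_PER_CYCLE * ECHO_INTERVAL_MS;
    }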
According to one embodiment, upon receiving from convergence module 115 an indication of loss of continuity, Ethernet OAM module 135 behaves as if it had lost CCM frames for a remote MEP within the MA for a predetermined number of consecutive CCM frame intervals. According to one embodiment, the predetermined number is three. Similarly, according to one embodiment, upon receiving from convergence module 115 an indication of continuity, Ethernet OAM module 135 behaves as if it has just started to receive CCM frames from the disconnected remote MEP again.
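By way of non-limiting illustration, the following sketch (hypothetical names; the count of three is the predetermined number from the embodiment above) shows how the Ethernet OAM module may map a convergence indication onto its existing CCM defect handling: a loss indication is treated as if three consecutive CCM frames had been missed, and a continuity indication is treated as if CCM frames had just resumed.

    #define CCM_LOSS_COUNT 3   /* predetermined number of missed CCM intervals */

    enum convergence_indication { INDICATION_CONTINUITY, INDICATION_LOSS };

    struct remote_mep_state {
        unsigned missed_ccm_intervals;  /* drives the remote-MEP defect state  */
        int defect;                     /* nonzero when a CCM defect is raised */
    };

    static void oam_handle_convergence_indication(struct remote_mep_state *rmep,
                                                  enum convergence_indication ind)
    {
        if (ind == INDICATION_LOSS) {
            rmep->missed_ccm_intervals = CCM_LOSS_COUNT; /* as if 3 CCMs lost  */
            rmep->defect = 1;
        } else {
            rmep->missed_ccm_intervals = 0;              /* as if CCMs resumed */
            rmep->defect = 0;
        }
    }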
According to one embodiment, the one or more processors 110 are further configured to, if the converged notification is received, receive a value indicating a quantity of remote MEPs in the MA, and possibly physical addresses associated with the remote MEPs. The physical addresses may be, for example, MAC addresses. The physical addresses may be used to maintain per-node continuity status.
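The following sketch (a hypothetical layout, not required by any embodiment) illustrates the information that may accompany a converged notification: the quantity of remote MEPs in the MA and, optionally, their physical (MAC) addresses, which allow continuity status to be kept per remote node.

    #include <stdint.h>

    #define MAX_REMOTE_MEPS 64            /* illustrative bound */

    struct remote_mep_entry {
        uint8_t  mac[6];                  /* physical address of the remote MEP */
        unsigned echoes_this_cycle;       /* echo packets seen in current cycle */
        int      continuity_up;           /* per-node continuity status         */
    };

    struct ma_convergence_info {
        unsigned remote_mep_count;        /* quantity of remote MEPs in the MA  */
        struct remote_mep_entry mep[MAX_REMOTE_MEPS];
    };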
According to one embodiment, the echo packets sent by the device 100 comprise point-to-point echo packets. This is described in more detail below, with reference to
According to one embodiment, the device 100 is configured as one or more of a switch, a bridge, a router, a gateway, and an access device. Device 100 may also be configured as other types of network devices.
While convergence module 210 is in the “DOWN” state 222, continuity state values 238 received from echo module 242 will cause convergence module 210 to remain in the “DOWN” state 222, and the continuity state value 238 is passed 202 to the Ethernet OAM module 200. Once Ethernet OAM module 200 re-converges on the MA, Ethernet OAM module 200 sends a converged notification 206 to convergence module 210, driving convergence module 210 to the “UP” state 234.
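A minimal sketch of the convergence module state machine just described is shown below (hypothetical names; reference numerals in the comments correspond to those used above): while in the “DOWN” state, continuity state values from the echo module are passed up to the Ethernet OAM module, and a converged notification drives the machine to the “UP” state.

    enum conv_state { CONV_UP, CONV_DOWN };

    struct conv_module {
        enum conv_state state;
        void (*notify_oam)(int continuity_value);  /* pass state 202 to OAM 200 */
    };

    /* Continuity state value 238 received from echo module 242. */
    static void conv_on_echo_state(struct conv_module *c, int continuity_value)
    {
        if (c->state == CONV_DOWN) {
            /* Remain DOWN and forward the value to the Ethernet OAM module. */
            c->notify_oam(continuity_value);
        }
    }

    /* Converged notification 206 received from Ethernet OAM module 200. */
    static void conv_on_converged_notification(struct conv_module *c)
    {
        c->state = CONV_UP;
    }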
According to one embodiment, echo module 120 is configured to off-load the Continuity Check processing overhead of Ethernet OAM module 200. For each Ethernet OAM MA (per VLAN or per Service Instance) there is one and only one MEP designated as a “beacon” entity for convergence module 115. All other MEPs in the MA are considered “non-beacon” entities.
The “beacon” entity is configured in active mode while all “non-beacon” entities are configured in passive mode. That is, a beacon entity configured in active mode sends out echo packets periodically. Non-beacon entities configured in passive mode do not actively send out echo packets.
When a beacon entity receives an echo packet, it updates its continuity state value in convergence module 210. The convergence module 115 is configured to use this information to determine whether loss of continuity should be reported to its parent session, which will in turn notify its Ethernet OAM application client 200.
A non-beacon entity is configured to, when it receives an echo packet, modify the echo packet and loop it back. Depending on the connection model (i.e., point-to-point, point-to-multipoint, multipoint-to-multipoint), the echo packet will undergo different packet modification rules before it is looped back. According to one embodiment, the returned echo packet is sent on a different VLAN ID (VID). This is described in more detail below with reference to
According to one embodiment, a continuity detection cycle is defined as a sequence of N echo packets sent with a time interval T ms (milliseconds) between each echo packet. Therefore, a detection cycle is N*T milliseconds. For example, if N=3 and T=3.3, then a detection cycle is 9.9 ms. In other words, 3 echo packets should be received every 9.9 ms.
According to one embodiment, convergence module 115 attempts to detect loss of continuity at the MA level, without necessarily identifying exactly which entity's connectivity is lost.
This class of detection logic has the potential to achieve minimal resource overhead. For example, if an MA has M non-beacon MEPs and an (N*T) detection cycle, a beacon node should expect to receive a total of (M*N) echo packets every detection cycle. If a “MA continuity threshold” CT is defined as the maximum number of lost echo packets before a signal failure condition is declared on the MA, and X is the actual number of echo packets the beacon node received within the detection cycle, then it follows that loss of continuity is declared on the MA when (M*N)−X exceeds CT.
By adjusting the threshold value, an MA fault tolerance factor can be defined.
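A sketch of this MA-level detection logic, using the symbols defined above and hypothetical function and parameter names, is shown below.

    /* MA-level detection: with M non-beacon MEPs and N echo packets per
     * detection cycle, the beacon expects M*N echoes per cycle; if more than
     * CT of them are lost, loss of continuity is declared on the MA. */
    static int ma_continuity_lost(unsigned m_non_beacon_meps,
                                  unsigned n_echoes_per_cycle,
                                  unsigned ct_threshold,
                                  unsigned x_received)
    {
        unsigned expected = m_non_beacon_meps * n_echoes_per_cycle;
        unsigned lost = (x_received < expected) ? expected - x_received : 0;
        return lost > ct_threshold;   /* nonzero: declare signal failure on MA */
    }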
According to one embodiment, the one or more processors 110 are further configured to set the state to a value indicating loss of continuity if a predetermined number of echo packets sent by the device towards the MA, or towards a particular one of the MEPs other than the first MEP 200, are not received by the device within a predetermined time period.
For this class of detection, convergence module 210 attempts to detect loss of continuity to any entity which is a member of the MA.
This class of detection logic stores additional information, such as a list of physical addresses for every MEP in an MA. Convergence module 115 is configured to use this additional information to identify which non-beacon MEP has lost continuity. Let X(MEP-ID) represent the number of echo packets received within a detection cycle from the non-beacon MEP with identifier MEP-ID; then loss of continuity to that particular MEP can be declared when X(MEP-ID) falls short of N by more than a per-MEP continuity threshold.
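A sketch of this per-MEP detection logic follows, with hypothetical names; the per-MEP echo counts play the role of X(MEP-ID) and are indexed by the stored list of remote MEPs.

    /* Per-MEP detection: each non-beacon MEP should return N echo packets per
     * detection cycle; a MEP whose count falls short by more than the per-MEP
     * threshold is flagged as having lost continuity. */
    static void per_mep_detection(unsigned remote_mep_count,
                                  unsigned echoes_this_cycle[],   /* X(MEP-ID) */
                                  int continuity_up[],            /* result    */
                                  unsigned n_echoes_per_cycle,
                                  unsigned per_mep_threshold)
    {
        for (unsigned i = 0; i < remote_mep_count; i++) {
            unsigned x = echoes_this_cycle[i];
            unsigned lost = (x < n_echoes_per_cycle) ? n_echoes_per_cycle - x : 0;
            continuity_up[i] = (lost <= per_mep_threshold);
            echoes_this_cycle[i] = 0;   /* reset for the next detection cycle */
        }
    }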
When a network device configured as a beacon node (“device A”) initiates an echo packet to a network device configured as a non-beacon node (“device B”), device A puts a device A reserved physical address in the source physical address field and a device B reserved physical address in the destination address field. Device A also fills the VLAN-ID with a specific value and sends the Ethernet frame to device B.
When device B replies with an echo packet back to device A, device B swaps the source and destination fields of the received echo packet, fills the VLAN-ID with a specific value, and sends the Ethernet frame back to device A. The VLAN-ID can be the same VLAN-ID as in the received echo packet, or it can be a different value.
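By way of illustration only, the following sketch (a hypothetical, simplified field layout; the addresses and VID values are parameters, not prescribed constants) shows the addressing behavior just described: device A builds the echo frame with its own reserved physical address as source and device B's reserved physical address as destination, and device B loops the frame back by swapping those fields and optionally rewriting the VLAN-ID.

    #include <stdint.h>
    #include <string.h>

    struct echo_frame {
        uint8_t  dst[6];     /* destination physical address   */
        uint8_t  src[6];     /* source physical address        */
        uint16_t vlan_id;    /* 12-bit VID carried in the tag  */
        /* remainder of the echo frame omitted for brevity     */
    };

    /* Device A (beacon) builds the outgoing echo frame. */
    static void build_echo(struct echo_frame *f,
                           const uint8_t a_addr[6], const uint8_t b_addr[6],
                           uint16_t vid)
    {
        memcpy(f->src, a_addr, 6);
        memcpy(f->dst, b_addr, 6);
        f->vlan_id = vid;
    }

    /* Device B (non-beacon) loops the echo frame back. */
    static void loop_back_echo(struct echo_frame *f, uint16_t reply_vid)
    {
        uint8_t tmp[6];
        memcpy(tmp, f->dst, 6);
        memcpy(f->dst, f->src, 6);      /* swap source and destination */
        memcpy(f->src, tmp, 6);
        f->vlan_id = reply_vid;         /* same as received VID, or different */
    }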
According to one embodiment, an echo frame is identified by a reserved physical address, for example a unicast or group physical address. According to another embodiment, an echo frame is identified by a reserved Ethertype in the length/type field.
According to one embodiment, a device configured as a beacon node and devices configured as non-beacon nodes encapsulate packets according to a standard. According to one embodiment, the standard is IEEE 802.1Q. An Ethernet frame according to IEEE 802.1Q is shown in Table 1 below.
According to another embodiment, packets are encapsulated according to IEEE 802.1ad Q-in-Q frame format. According to another embodiment, packets are encapsulated according to IEEE 802.1ah MAC-in-MAC frame format.
According to one embodiment, echo module 242 uses the outer-most VLAN tag of an Ethernet frame format. For example, echo module 242 uses the C-tag in the case of an IEEE 802.1Q frame. As a further example, echo module 242 uses the S-tag in the case of an IEEE 802.1ad frame. As a further example, echo module 242 uses the B-tag in the case of an IEEE 802.1ah frame.
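The following sketch (hypothetical types and field names; the tag fields are simplified placeholders rather than full tag encodings) illustrates selecting the outer-most VLAN tag according to the encapsulation in use.

    enum encapsulation { ENCAP_802_1Q, ENCAP_802_1AD, ENCAP_802_1AH };

    struct tagged_frame {
        unsigned short c_tag;   /* customer tag (802.1Q)            */
        unsigned short s_tag;   /* service tag (802.1ad outer tag)  */
        unsigned short b_tag;   /* backbone tag (802.1ah outer tag) */
        enum encapsulation encap;
    };

    /* Return the tag the echo module treats as the outer-most VLAN tag. */
    static unsigned short outermost_tag(const struct tagged_frame *f)
    {
        switch (f->encap) {
        case ENCAP_802_1AH: return f->b_tag;  /* MAC-in-MAC: B-tag */
        case ENCAP_802_1AD: return f->s_tag;  /* Q-in-Q: S-tag     */
        default:            return f->c_tag;  /* 802.1Q: C-tag     */
        }
    }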
Many other devices or subsystems (not shown) may be connected in a similar manner. Also, it is not necessary for all of the devices shown in
While embodiments and applications of this invention have been shown and described, it would be apparent to those skilled in the art having the benefit of this disclosure that many more modifications than mentioned above are possible without departing from the inventive concepts herein. The invention, therefore, is not to be restricted except in the spirit of the appended claims.