Computer virtualization has dramatically and quickly changed the information technology (IT) industry in terms of efficiency, cost, and the speed of delivering new applications and/or services. The trend continues to evolve towards network virtualization, where a set of virtual machines (VMs) or servers may communicate in a virtual network environment that is decoupled from the underlying physical networks in a data center (DC). An overlay virtual network is one approach to provide network virtualization services to a set of VMs or servers. An overlay virtual network may enable the construction of many virtual tenant networks on a common network infrastructure, where each virtual tenant network may have an independent address space, independent network configurations, and traffic isolation from the other tenant networks, all of which are decoupled from the underlying network infrastructure. In addition, an overlay virtual network may support VM migration, since VM placement is no longer limited by the physical network. Further, an overlay virtual network may speed up the configuration of multi-tenant cloud applications and virtual DCs, leading to potential new DC applications, such as a software defined DC.
An overlay virtual network may provide communication among a set of tenant systems (TSs), where TSs may be VMs on a server or physical servers. An overlay virtual network may provide Layer 2 (L2) or Layer 3 (L3) services to the connected TSs via network virtualization edges (NVEs), where NVEs may be implemented as part of a virtual switch within a hypervisor, and/or a physical switch or router. An NVE encapsulates ingress tenant traffic and sends the encapsulated traffic over a tunnel across an underlying network toward an egress NVE. The egress NVE at the remote tunnel endpoint decapsulates the traffic prior to delivering the original data packet to the appropriate TS. There are a number of encapsulation protocols available in the industry today, such as virtual eXtensible Local Area Network (VXLAN) encapsulation, Microsoft's Network Virtualization over Generic Routing Encapsulation (NVGRE), and Internet Protocol (IP) Generic Routing Encapsulation (GRE). In some instances, the NVEs in an overlay virtual network instance may not employ the same encapsulation protocol. In addition, an overlay virtual network may interwork with a non-overlay virtual network, such as a virtual local area network (VLAN). Consequently, there is a need in the art for a solution that enables multiple data plane encapsulations in an overlay virtual network by automatically mapping services and identifiers and translating encapsulation semantics between different encapsulation protocols.
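For illustration only, the following simplified sketch (written in Python, which is not part of the disclosure) models the generic encapsulation and decapsulation behavior described above; the field names, types, and packet representation are assumptions made for readability rather than the format of any particular encapsulation protocol.

```python
# Illustrative model of tunnel encapsulation at an ingress NVE and
# decapsulation at an egress NVE. This is a conceptual sketch, not a
# VXLAN/NVGRE/GRE implementation; all field names are hypothetical.
from dataclasses import dataclass

@dataclass
class EncapsulatedPacket:
    outer_src_ip: str   # ingress tunnel endpoint address
    outer_dst_ip: str   # egress tunnel endpoint address
    encap_type: str     # e.g., "VXLAN", "NVGRE", "GRE"
    vn_id: int          # virtual network identifier carried in the encapsulation header
    inner_frame: bytes  # original tenant frame, carried unmodified

def encapsulate(frame: bytes, vn_id: int, encap_type: str,
                ingress_ip: str, egress_ip: str) -> EncapsulatedPacket:
    """Ingress NVE: wrap the tenant frame in encapsulation and tunnel headers."""
    return EncapsulatedPacket(ingress_ip, egress_ip, encap_type, vn_id, frame)

def decapsulate(pkt: EncapsulatedPacket) -> tuple:
    """Egress NVE: strip the headers and recover the original tenant frame."""
    return pkt.vn_id, pkt.inner_frame
```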
In one example embodiment, a tunnel endpoint communicates in an overlay virtual network (OVN) with multiple data plane encapsulations by joining the OVN, advertising a supported route and a plurality of supported encapsulation types including overlay and non-overlay encapsulations, tracking other OVN members' routes and corresponding encapsulation types, maintaining a forwarding table with the routes and the corresponding encapsulation types in the OVN, performing encapsulation translation when receiving a data packet with a first encapsulation type that is destined to an egress tunnel endpoint of a second encapsulation type, and forwarding the data packet to the destination according to a route to the egress tunnel endpoint retrieved from an entry in the forwarding table.
In another example embodiment, a computer program product comprises computer executable instructions stored on a non-transitory medium that, when executed by a processor, cause a local tunnel endpoint to perform control plane functions in an OVN with multiple data plane encapsulations. In this example embodiment, the control plane functions comprise joining an OVN, advertising a supported route and a supported encapsulation type, obtaining other OVN members' routes and corresponding encapsulation types, maintaining a forwarding table with the routes and the corresponding encapsulation types in the OVN, and establishing overlay tunnels to the peers with an encapsulation type that is identical to the supported encapsulation type.
In yet another example embodiment, the Border Gateway Protocol (BGP) is extended to automatically support the control signaling in an OVN with multiple data plane encapsulations. In this example embodiment, the automatic control signaling comprises joining the OVN, advertising a supported capability in a BGP Open message, advertising a supported route and a supported tunnel encapsulation attribute in a BGP Update message, obtaining capabilities, routes, and corresponding tunnel encapsulation attributes of OVN members, and maintaining a forwarding table with the OVN members' routes and the corresponding tunnel encapsulation attributes.
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that, although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
Disclosed herein are methods, apparatuses, and/or computer program products for communicating over an OVN that may support multiple data plane encapsulations. An NVE may communicate with a peer NVE directly when the peer NVE employs the same encapsulation type, and may automatically select an OVG that performs encapsulation translation when the peer NVE employs a different encapsulation type. An OVG or an NVE within an OVN may perform control plane functions, such as advertising its supported data plane encapsulation types and routes, tracking encapsulation types supported by other OVGs and/or peer NVEs in the same OVN, and maintaining forwarding routes to reach other OVGs and/or peer NVEs in the same OVN. In an example embodiment, BGP may be extended to facilitate the control signaling automatically for multiple data plane encapsulations in an OVN. It should be noted that other control plane protocols may also be employed to implement the invention in the present disclosure.
It should be noted that in the present disclosure, the terms “underlying network”, “infrastructure network”, and “DC network” all refer to the actual physical network and may be used interchangeably. The terms “overlay virtual network” (“OVN”), “tenant network”, “overlay instance”, “overlay network”, and “network virtual overlay instance” refer to network virtualization overlay as described in the Internet Engineering Task Force (IETF) document draft-narten-nvo3-arch-00, which is incorporated herein by reference, and the terms may be used interchangeably. However, a “tenant network” may also comprise one or more OVNs. The terms “tenant system” (“TS”) and “endpoint” refer to an entity that originates or receives data from an OVN, and may be used interchangeably.
NVEs 120 may be implemented using software components, hardware, or a combination of both, and may be located on a virtual switch within a hypervisor, a physical switch, or server. NVEs 120 may perform routing, bridging, forwarding functions, and/or overlay virtual network functions. Overlay virtual network functions may include creation and maintenance of OVN states, data plane encapsulations/decapsulations, overlay tunnel initiations/establishments/tear downs, and automatic selection of overlay tunnels.
TSs 110 may include, but are not limited to, VMs on a server, hosts, physical servers, or other types of end devices that may originate data to or receive data from the overlay network via an NVE 120. TSs 110 may comprise an L2 Ethernet interface used to communicate with their associated NVEs 120. TSs 110 may be unaware of the overlay network. TSs 110 may communicate with remote TSs 110 in the same tenant network by sending packets directly to their associated NVEs 120.
The underlying network 130 is a physical network that provides connectivity between NVEs 120, but may be completely unaware of the overlay packets, the overlay tunnels 140, and the OVN. For instance, the underlying network 130 may be a DC physical network comprising Top of Rack (ToR) switches, aggregation switches, core switches, and/or DC gateway routers. Alternatively, the underlying network 130 may be multiple interconnected DC networks where NVEs 120 may be located in the same or different DC networks. In addition, the underlying network 130 may support multiple independent OVNs.
Typically, a large data center may deploy servers with different capacities and/or features, and servers may be rolled out at different times. For example, a data center may comprise a combination of virtual servers and physical servers, which may be equipped with virtual switches. The servers that are equipped with hypervisor-based virtual switches may support different encapsulation protocols, such as VXLAN encapsulation, Microsoft's NVGRE, IP GRE, Multiprotocol Label Switching (MPLS), or other encapsulation protocols. In order to enable communication between NVEs 120 in an OVN with multiple data plane encapsulations, there is a need to have an entity, either on a gateway or as a standalone entity, that may map network services and network identifiers and modify packet encapsulation semantics between different encapsulations.
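As a way to picture the identifier mapping such an entity might maintain, the following sketch (illustrative only; the tenant name, identifier values, and function name are hypothetical and not part of the disclosure) keys a per-tenant table by encapsulation type so that a virtual network identifier can be translated from one encapsulation's identifier space to another's.

```python
# Hypothetical per-tenant identifier mapping kept by a gateway entity.
# The identifier widths reflect common usage (24-bit VXLAN/NVGRE identifiers,
# 20-bit MPLS labels), but the table contents are made-up examples.
VN_ID_MAP = {
    "tenant-blue": {
        "VXLAN": 0x1234AB,   # VXLAN network identifier
        "NVGRE": 0x00AB12,   # NVGRE virtual subnet identifier
        "MPLS":  0x0F123,    # MPLS label
    },
}

def translate_vn_id(tenant: str, from_encap: str, to_encap: str, vn_id: int) -> int:
    """Map a virtual network identifier between two encapsulations' identifier spaces."""
    ids = VN_ID_MAP[tenant]
    if ids[from_encap] != vn_id:
        raise ValueError("identifier does not match the tenant's mapping")
    return ids[to_encap]
```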
In one example embodiment, a DC operator may configure several OVGs 220 in a network overlay instance for load balancing. OVGs 220 in an OVN 230 may establish tunnels between each other and may perform load balancing on one or more OVGs 220. In addition, a DC operator may configure an OVG 220 or an NVE 120 to support multiple OVNs 230. However, the control plane functions described in method 700 with respect to
An OVN 230 may also support broadcast and/or multicast traffic. Broadcast and/or multicast traffic may be used to deliver common data packets to all tunnel endpoints in an OVN 230 and/or a set of tunnel endpoints in an OVN 230, respectively. In one example embodiment, an NVE 120 that is the ingress point of the broadcast and/or multicast data packet may replicate the broadcast and/or multicast data packets to other peer NVEs 120 or OVGs 220 that support the same data plane encapsulation. In another example embodiment, the NVE 120 may first route the broadcast and/or multicast data packets to an OVG 220, and the OVG 220 may send the broadcast and/or multicast data packets over a P2MP overlay tunnel 140 to reach other NVEs 120. In order to avoid packet duplication in an OVN 230 with multiple OVGs 220, a DC operator may configure one OVG 220 as a designated gateway to forward all or a set of multicast and/or broadcast traffic. The designated OVG may determine a set of tunnel endpoints that may receive the multicast and/or broadcast data packet. The designated OVG may determine the encapsulation types supported by the set of receiving tunnel endpoints and encapsulate the data packet into the corresponding encapsulation types, which may or may not be the same. The designated OVG may then forward the corresponding encapsulated data packets to the receiving tunnel endpoints. When a non-designated OVG 220 receives a multicast and/or broadcast data packet, the non-designated OVG 220 may drop the data packet.
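For illustration only, the following sketch (hypothetical table contents and packet representation) captures the designated-gateway replication logic described above: a non-designated gateway drops the packet, while the designated gateway emits one copy per receiving tunnel endpoint, encapsulated in that endpoint's type.

```python
# Illustrative designated-gateway replication of broadcast/multicast traffic.
RECEIVERS = {                      # receiving tunnel endpoint IP -> its encapsulation type
    "10.0.0.2": "VXLAN",
    "10.0.0.3": "NVGRE",
}

def replicate_multicast(is_designated: bool, ovn_id: int, frame: bytes,
                        gateway_ip: str) -> list:
    if not is_designated:
        return []                  # non-designated OVGs drop the packet to avoid duplication
    copies = []
    for endpoint_ip, encap_type in RECEIVERS.items():
        copies.append({"outer_src": gateway_ip, "outer_dst": endpoint_ip,
                       "encap": encap_type, "vnid": ovn_id, "payload": frame})
    return copies
```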
The overlay tunnels 140a-d may transport data packets with a packet header comprising an inner address field, an encapsulation header, and an outer address field. In one example embodiment, the inner address field may comprise a MAC address of a remote TS that the data packet is destined to and a MAC address of the source TS that originated the data packet. The encapsulation header may comprise a VNID and/or other encapsulation type specific information. The outer address field may comprise IP addresses of the source and egress tunnel endpoints (e.g. NVE 120a-c or OVG 220), and thus the outer address field may also be referred to as the tunnel header.
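For illustration only, the sketch below lays the three parts of such a packet header out in the usual outermost-to-innermost wire order (tunnel header, then encapsulation header, then inner addresses ahead of the payload); the field widths are simplifications and do not follow any specific encapsulation format.

```python
# Simplified byte layout of the three-part header described above.
import socket
import struct

def build_header(outer_src_ip: str, outer_dst_ip: str, vnid: int,
                 inner_src_mac: str, inner_dst_mac: str) -> bytes:
    outer = socket.inet_aton(outer_src_ip) + socket.inet_aton(outer_dst_ip)  # tunnel header
    encap = struct.pack("!I", vnid & 0xFFFFFF)                               # 24-bit VNID in 4 bytes
    inner = (bytes.fromhex(inner_dst_mac.replace(":", "")) +                 # destination TS MAC
             bytes.fromhex(inner_src_mac.replace(":", "")))                  # source TS MAC
    return outer + encap + inner
```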
In one example embodiment, the inner address field may be provided by an originating TS (not shown in
As discussed earlier, NVE 120a may not send a packet directly to a peer NVE 120c that supports a different encapsulation type. Instead, NVE 120a may first send the VXLAN encapsulated packet to an OVG 220.
Another example embodiment of a packet header 500 is shown in
In order to facilitate multiple data plane encapsulations, an OVN 230 may employ a control plane protocol, such as BGP or an Interior Gateway Protocol (IGP), without manual configuration. Control plane functions may include establishing an OVN 230, advertising encapsulation types and tunnel routes, tracking peers' routes and corresponding encapsulation types, and maintaining lookup tables for routing and forwarding. Typically, a DC operator may configure a plurality of NVEs 120 and/or one or more OVGs 220 to be members of an OVN 230. Subsequent overlay functionalities may be performed by the NVEs 120 and/or the OVGs 220 through a control plane protocol.
At step 750, method 700 may check if the peer supports the same encapsulation type. If the peer supports the same encapsulation type, method 700 may proceed to step 760 to check if a prior overlay tunnel has been established with the peer. Method 700 may continue to step 770 and establish an overlay tunnel when an overlay tunnel has not been established with the peer. Otherwise, method 700 may return to step 720 from step 760 when a prior overlay tunnel has already been established with the peer. Recall in
Returning to step 730, method 700 may proceed to step 780 when method 700 does not receive a peer advertisement that represents the encapsulation type and routes supported by the peer. At step 780, method 700 may determine whether the packet is a TS attachment or detachment message. The attachment or detachment message may be advertised by an NVE when a TS attaches to or detaches from an NVE, respectively. Upon reception of the TS attachment or detachment message, method 700 may continue to step 790 and update an address mapping table with the addresses (e.g. MAC addresses or IP addresses) of the TS and the associated NVE. After method 700 completes step 790, method 700 may return to step 720 and continue to listen for a packet. Alternatively, at step 780, method 700 may return to step 720 when the received packet is not a TS attachment or detachment message. In another example embodiment, method 700 may skip the address mapping performed in steps 780 and 790 and may instead obtain the TS to NVE address mapping by employing other address mapping protocols, such as the Address Resolution Protocol (ARP).
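The event handling of method 700 might be rendered, purely for illustration, as the following sketch; the table layouts, the local encapsulation set, and the establish_overlay_tunnel placeholder are assumptions and not part of the disclosed method.

```python
# Hypothetical control plane state and event handlers in the spirit of method 700.
forwarding_table = {}      # peer route -> set of encapsulation types advertised by the peer
address_map = {}           # TS address (MAC or IP) -> associated NVE address
established_tunnels = set()
LOCAL_ENCAPS = {"VXLAN"}   # encapsulation types supported locally (assumed)

def establish_overlay_tunnel(peer_route: str) -> None:
    """Placeholder for tunnel establishment (e.g., programming the data plane)."""

def on_peer_advertisement(peer_route: str, peer_encaps: set) -> None:
    forwarding_table[peer_route] = set(peer_encaps)       # track routes and encapsulation types
    if peer_encaps & LOCAL_ENCAPS and peer_route not in established_tunnels:
        establish_overlay_tunnel(peer_route)              # same type and no prior tunnel
        established_tunnels.add(peer_route)

def on_ts_attachment(ts_addr: str, nve_addr: str) -> None:
    address_map[ts_addr] = nve_addr                       # update the address mapping table

def on_ts_detachment(ts_addr: str) -> None:
    address_map.pop(ts_addr, None)                        # remove the mapping on detachment
```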
At step 835, method 800 may check if the egress NVE supports the same encapsulation type as the ingress NVE. If the egress NVE supports the same encapsulation type, method 800 may proceed to step 836 to add a tunnel header to the encapsulated data packet. The tunnel header may comprise the egress NVE's IP address and the ingress NVE's IP address, as described in packet header 300 with respect to
Returning to step 835, if the egress NVE supports a different encapsulation type, method 800 may select an overlay tunnel to an OVG that may support both the ingress NVE and the egress NVE encapsulation types as shown in step 838. At step 839, method 800 may add a tunnel header to the encapsulated data packet, which may comprise the OVG's IP address and the ingress NVE's IP address, as described in packet header 400 with respect to
Returning to step 820, an NVE may also receive a data packet destined to one of its associated TSs via an overlay tunnel either from an OVG or a peer NVE. In this case, method 800 may operate as the egress NVE. At step 851, method 800 may remove the tunnel header from the received data packet. At step 852, method 800 may decapsulate the received data packet (i.e. removing the encapsulation header). At step 853, method 800 may deliver the data packet to the destination TS.
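The ingress decision of method 800 can be pictured with the following sketch (for illustration only; the table contents, the local encapsulation type, the example VNID, and the dictionary-based packet representation are assumptions): the encapsulated packet is tunneled directly to the egress NVE when the encapsulation types match, and to an OVG otherwise.

```python
# Hypothetical ingress forwarding decision in the spirit of method 800.
LOCAL_ENCAP = "VXLAN"
ADDRESS_MAP = {"02:00:00:00:00:02": "10.0.0.2"}   # destination TS MAC -> egress NVE IP
ENCAP_OF    = {"10.0.0.2": "NVGRE"}               # egress NVE IP -> its encapsulation type
OVG_IP      = "10.0.0.9"                          # gateway assumed to support both types

def ingress_forward(frame: bytes, dst_ts_mac: str, local_ip: str) -> dict:
    egress_nve = ADDRESS_MAP[dst_ts_mac]          # TS-to-NVE lookup from the address mapping table
    inner = {"encap": LOCAL_ENCAP, "vnid": 100, "payload": frame}  # add the local encapsulation header
    if ENCAP_OF[egress_nve] == LOCAL_ENCAP:
        tunnel_dst = egress_nve                   # same type: tunnel header addressed to the egress NVE
    else:
        tunnel_dst = OVG_IP                       # different type: tunnel header addressed to the OVG
    return {"outer_src": local_ip, "outer_dst": tunnel_dst, **inner}
```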
At step 930, method 900 may retrieve an address mapping table entry with a mapping of the destination TS address to its associated NVE address (i.e. egress NVE address). The destination TS address may be obtained from an inner address field of the data packet. The address mapping table may be built previously from control plane as described in method 700 of
The control plane functions described in method 700 of
In one example embodiment, an NVE and/or an OVG may advertise its capability via the BGP Open message and may advertise its routes and corresponding encapsulation types via the BGP Update message. When an OVG receives route information, the OVG may not need to redistribute the route to other NVEs. However, if the OVG is an edge node (e.g. located at the edge or boundary of a network), the OVG may advertise the route information to an external domain.
The version field 1020 may be about one octet long and may be an unsigned integer that indicates the BGP version of the message. The AS field 1030 may be about two octets long and may indicate the AS number of the sending BGP. The hold time field 1040 may be about two octets long and may be an unsigned integer that indicates the number of seconds the sending BGP proposes for the value of the hold timer, which is used for calculating the maximum duration between transmissions of successive Keep Alive and/or Update messages. The BGP identifier field 1050 may be about four octets long and may be used to identify the IP address assigned to the sending BGP. The optional parameter length field 1060 may be about one octet long and may be an unsigned integer that indicates the total length of the optional parameter field 1070. The optional parameter field 1070 may comprise a plurality of optional parameters and may vary in length. The optional parameter field 1070 is TLV encoded. A TLV encoded message may include a type field that may indicate the message type, followed by a length field that may indicate the size of the message value, and a variable-sized series of bytes that carry the data for the message.
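To make the TLV framing above concrete, the following minimal helpers (illustrative only, not a BGP implementation) encode and decode values in a one-octet type, one-octet length, variable-length value layout.

```python
# Minimal TLV helpers: one-octet type, one-octet length, then the value bytes.
def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    return bytes([tlv_type, len(value)]) + value

def decode_tlvs(data: bytes):
    """Yield (type, value) pairs from a concatenated run of TLVs."""
    i = 0
    while i < len(data):
        tlv_type, length = data[i], data[i + 1]
        yield tlv_type, data[i + 2:i + 2 + length]
        i += 2 + length
```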
In one example embodiment, an OVG capability TLV 1080 may be added to the optional parameter field 1070 in the BGP Open message 1000 to indicate the support of OVNs 230 with multiple data plane encapsulations. The OVG capability TLV 1080 may comprise a capability code 1081, a length field 1082, and an OVG capability message value 1083. The capability code 1081 may be assigned by the Internet Assigned Numbers Authority (IANA). The length field 1082 may indicate the size of the OVG capability message value 1083. The OVG capability message value 1083 may comprise a supported encapsulation sub-TLV 1091, a supported NLRI type sub-TLV 1092, and a supported service function sub-TLV 1093. Since capability announcement messages may be optional in BGP, a BGP peer may send an Open message without the OVG capability TLV 1080 when that peer does not support the OVG capability. A BGP session may only begin when the BGP peers agree on the supported functions. If the BGP peers support the capability but do not support the same set of mechanisms, the responding BGP may set a flag to enable the support for both BGP peers in a session. In one example embodiment, the supported mechanism in each direction may also be different.
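Reusing the encode_tlv helper from the previous sketch, the OVG capability TLV 1080 might be composed as follows; the capability code and sub-TLV type numbers are placeholders (the disclosure notes that the actual capability code would be assigned by IANA), and the sub-TLV values are treated as opaque byte strings.

```python
# Hypothetical composition of the OVG capability TLV 1080 and its sub-TLVs.
OVG_CAPABILITY_CODE = 0x80   # placeholder; not an IANA-assigned value
SUB_SUPPORTED_ENCAP = 1      # placeholder type for supported encapsulation sub-TLV 1091
SUB_NLRI_TYPE       = 2      # placeholder type for supported NLRI type sub-TLV 1092
SUB_SERVICE_FUNC    = 3      # placeholder type for supported service function sub-TLV 1093

def build_ovg_capability(encaps: bytes, nlri_types: bytes, services: bytes) -> bytes:
    value = (encode_tlv(SUB_SUPPORTED_ENCAP, encaps) +
             encode_tlv(SUB_NLRI_TYPE, nlri_types) +
             encode_tlv(SUB_SERVICE_FUNC, services))
    # the result is carried in the optional parameter field 1070 of the Open message
    return encode_tlv(OVG_CAPABILITY_CODE, value)
```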
When an NVE or an OVG advertises its routes, the supported encapsulation type or types may also be advertised via the tunnel encapsulation attribute TLV 1440. The tunnel encapsulation attribute TLV 1440 may comprise an encapsulation sub-TLV 1441, a protocol type sub-TLV 1442, and a color sub-TLV 1443. Currently, the encapsulation types defined in the encapsulation sub-TLV 1441 may only include Layer Two Tunneling Protocol Version 3 (L2TPv3), GRE, and Internet Protocol in Internet Protocol (IP in IP). In order to support the encapsulation types described herein, three encapsulation sub-TLVs for VXLAN, NVGRE, and MPLS may be added. The protocol type sub-TLV 1442 may be encoded to indicate the type of the payload packets that will be encapsulated. When the encapsulation type is VXLAN, NVGRE, or MPLS, the payload may be an Ethernet frame, an IP packet, or another packet type. The color sub-TLV 1443 may carry a value used to color (i.e. mark) the corresponding tunnel TLV.
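Similarly, and again reusing the encode_tlv helper sketched earlier, a tunnel encapsulation attribute TLV 1440 carrying the three sub-TLVs might be assembled as below; the sub-TLV type numbers, the VXLAN tunnel type codepoint, and the two-octet outer framing are illustrative assumptions rather than assigned values.

```python
# Hypothetical assembly of a tunnel encapsulation attribute TLV 1440.
SUBTLV_ENCAPSULATION = 1     # encapsulation sub-TLV 1441 (placeholder type)
SUBTLV_PROTOCOL_TYPE = 2     # protocol type sub-TLV 1442 (placeholder type)
SUBTLV_COLOR         = 4     # color sub-TLV 1443 (placeholder type)
TUNNEL_TYPE_VXLAN    = 8     # placeholder codepoint for a VXLAN tunnel type

def build_tunnel_encap_tlv(vnid: int, payload_type: int, color: int) -> bytes:
    subs = (encode_tlv(SUBTLV_ENCAPSULATION, vnid.to_bytes(3, "big")) +      # e.g., 24-bit VNID
            encode_tlv(SUBTLV_PROTOCOL_TYPE, payload_type.to_bytes(2, "big")) +
            encode_tlv(SUBTLV_COLOR, color.to_bytes(4, "big")))
    # outer framing: two-octet tunnel type, two-octet length, then the sub-TLVs
    return (TUNNEL_TYPE_VXLAN.to_bytes(2, "big") +
            len(subs).to_bytes(2, "big") + subs)
```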
The BGP extensions may facilitate the control plane functions in an OVN with multiple data plane encapsulations described in the present disclosure. The tunnel initiation/termination, tunnel selections, data plane encapsulations/decapsulations, and encapsulation translations may be independent from the control plane protocol employed. The control plane protocol may simply provide automatic signaling mechanisms for peers (e.g. NVEs 120, OVGs 220 from
It is understood that by programming and/or loading executable instructions onto the NE 1800, at least one of the processor 1830, the cache, and the long-term storage are changed, transforming the NE 1800 in part into a particular machine or apparatus, e.g., a multi-core forwarding architecture, having the novel functionality taught by the present disclosure. It is fundamental to the electrical engineering and software engineering arts that functionality that can be implemented by loading executable software into a computer can be converted to a hardware implementation by well-known design rules. Decisions between implementing a concept in software versus hardware typically hinge on considerations of stability of the design and numbers of units to be produced rather than any issues involved in translating from the software domain to the hardware domain. Generally, a design that is still subject to frequent change may be implemented in software, because re-spinning a hardware implementation is more expensive than re-spinning a software design. Generally, a design that is stable that will be produced in large volume may be implemented in hardware, for example in an ASIC, because for large production runs the hardware implementation may be less expensive than the software implementation. Often a design may be developed and tested in a software form and later transformed, by well-known design rules, to an equivalent hardware implementation in an application specific integrated circuit that hardwires the instructions of the software. In the same manner as a machine controlled by a new ASIC is a particular machine or apparatus, likewise a computer that has been programmed and/or loaded with executable instructions may be viewed as a particular machine or apparatus.
At least one embodiment is disclosed and variations, combinations, and/or modifications of the embodiment(s) and/or features of the embodiment(s) made by a person having ordinary skill in the art are within the scope of the disclosure. Alternative embodiments that result from combining, integrating, and/or omitting features of the embodiment(s) are also within the scope of the disclosure. Where numerical ranges or limitations are expressly stated, such express ranges or limitations should be understood to include iterative ranges or limitations of like magnitude falling within the expressly stated ranges or limitations (e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater than 0.10 includes 0.11, 0.12, 0.13, etc.). For example, whenever a numerical range with a lower limit, Rl, and an upper limit, Ru, is disclosed, any number falling within the range is specifically disclosed. In particular, the following numbers within the range are specifically disclosed: R=Rl+k*(Ru−Rl), wherein k is a variable ranging from 1 percent to 100 percent with a 1 percent increment, i.e., k is 1 percent, 2 percent, 3 percent, 4 percent, 5 percent, . . . 50 percent, 51 percent, 52 percent, . . . , 95 percent, 96 percent, 97 percent, 98 percent, 99 percent, or 100 percent. Moreover, any numerical range defined by two R numbers as defined in the above is also specifically disclosed. The use of the term “about” means ±10% of the subsequent number, unless otherwise stated. Use of the term “optionally” with respect to any element of a claim means that the element is required, or alternatively, the element is not required, both alternatives being within the scope of the claim. Use of broader terms such as comprises, includes, and having should be understood to provide support for narrower terms such as consisting of, consisting essentially of, and comprised substantially of. All documents described herein are incorporated herein by reference.
The present application claims priority to U.S. Provisional Patent Application 61/706,067, filed Sep. 26, 2012 by Lucy Yong, and entitled “System and Method of Network Virtual Overlay Gateway for Multiple Data Plane Encapsulation”, which is incorporated herein by reference as if reproduced in its entirety.