The present invention generally relates to Internet Protocol address resolution in data centers. A data center is a physical complex housing physical servers, network switches and routers, network service appliances, and networked storage. The purpose of a data center is to provide application, computing and/or storage services to customers. In a data center, a customer is called a “tenant”. A tenant can be a person, an organization within an enterprise, or an enterprise, and is associated with a set of compute, storage and network resources of the data center. A virtual layer 2 or layer 3 domain that belongs to a tenant constitutes a virtual network. One of the services provided to the tenants of a data center is virtualized infrastructure, also known as infrastructure as a service: several virtual machines share the resources of a single physical computer server using the services of a hypervisor. A hypervisor is server virtualization software, running on a physical compute server, that hosts virtual machines. The hypervisor provides shared computing, memory, storage, and network connectivity to the virtual machines that it hosts. It often embeds a virtual network node, such as a virtual switch or a virtual router, that provides services similar to those of a physical Ethernet switch or a physical IP router, respectively. This virtual node forwards frames between virtual machines (through their virtual network interface controllers) within the same physical server, or between a virtual machine and a physical network interface controller card connecting the server to a physical Ethernet switch or router. It also enforces network isolation between virtual machines that should not communicate with each other.
A data center also uses network virtualization, referred to as software-defined networking (SDN): the process of merging hardware resources, software resources, and networking functionality into a software-based virtual network. SDN gives network administrators programmable, centralized control of network traffic via an SDN controller, without requiring manual management of each individual network node. An SDN configuration can create a logical network control plane that is decoupled from the data forwarding plane hardware: a network node forwards packets, while a separate server runs the network control plane (i.e. the controller).
Address resolution in a local area network is classically performed using ARP (Address Resolution Protocol, IETF RFC 826) in IPv4 (Internet Protocol version 4). NDP (Neighbor Discovery Protocol, IETF RFC 4861) is used similarly in IPv6 (Internet Protocol version 6). These protocols map an IP address to a physical machine address that is recognized in the local network. For example, an IPv4 address is 32 bits long, whereas a physical address in an Ethernet local area network is 48 bits long. (In this case, the physical machine address is also called a Media Access Control (MAC) address.) A table, usually called the ARP cache, maintains the correlation between each MAC address and the corresponding IP address. ARP defines the rules for establishing this correlation and performs the address conversion.
When a sender sends a packet destined for a receiver on a local area network, it first asks its local ARP program to find the physical address, e.g. the MAC address, associated with the destination IP address. The ARP program looks into the ARP cache. If it finds the destination MAC address, it provides the latter to the sender so that the IP packet can be encapsulated into a MAC frame of the right length and format, and then sent to the receiver. If no matching entry is found in the ARP cache for the destination IP address, ARP broadcasts a request packet, in a special format, to all the machines on the local area network, to check whether one machine has this IP address associated with it. The machine that recognizes this IP address as its own returns a reply indicating its hardware address. ARP on the sender side then updates the ARP cache for future reference, and sends the packet with the MAC address of the receiver (i.e. the machine that replied to the ARP request).
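The classical lookup just described can be modeled in a few lines. This is a minimal sketch, not an ARP implementation: the class `ArpResolver`, the `broadcast` callback standing in for the broadcast request frame, and the addresses are all hypothetical.

```python
from typing import Callable, Dict, Optional


class ArpResolver:
    """Toy model of the ARP cache lookup with broadcast fallback."""

    def __init__(self, broadcast: Callable[[str], Optional[str]]) -> None:
        self.cache: Dict[str, str] = {}   # IP address -> MAC address
        self.broadcast = broadcast        # stands in for the LAN-wide request
        self.broadcasts_sent = 0

    def resolve(self, ip: str) -> Optional[str]:
        mac = self.cache.get(ip)
        if mac is not None:
            return mac                    # cache hit: no network traffic
        self.broadcasts_sent += 1         # cache miss: every host is asked
        mac = self.broadcast(ip)
        if mac is not None:
            self.cache[ip] = mac          # remember the reply for next time
        return mac


# Hosts on a hypothetical LAN; each one answers only for its own IP address.
lan = {"10.0.0.2": "02:00:00:00:00:02", "10.0.0.3": "02:00:00:00:00:03"}


def broadcast_request(ip: str) -> Optional[str]:
    # Every host receives the request; only the owner of the IP replies.
    return lan.get(ip)


resolver = ArpResolver(broadcast_request)
print(resolver.resolve("10.0.0.3"))   # miss, triggers one broadcast
print(resolver.resolve("10.0.0.3"))   # served from the cache
print(resolver.broadcasts_sent)
```

Every cache miss costs one broadcast that all hosts of the layer 2 domain must receive and process, which is the cost that grows with the size of that domain.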
In a data center, the greatest flexibility in virtual machine and workload management is achieved when a virtual machine (or workload) can be placed anywhere in the data center, regardless of the IP address it uses and of how the physical network is laid out. Moving virtual machines within a data center is easiest when their placement and movement do not conflict with the IP subnet boundaries of the data center's network, so that the IP address of a virtual machine does not need to be changed to reflect its actual point of attachment to the network from a layer 3 perspective. Thus, from a virtual machine management perspective, operations are simplified if all the servers involved in virtual machine movement are in the same layer 2 domain. However, this implies a large layer 2 domain, and address resolution using ARP in IPv4, or NDP in IPv6, raises scalability concerns in data centers comprising a very large number of hosts, e.g. virtual machines. The IETF (Internet Engineering Task Force) has described this concern in IETF RFC 6820, Address Resolution Problems in Large Data Center Networks, T. Narten et al., January 2013. It also created a Working Group named ARMD (Address Resolution for Massive numbers of hosts in the Data center) to work on a standardized solution specification.
1) Overlay networks, such as those specified within the IETF NVO3 (Network Virtualization Over layer 3) Working Group, can be deployed to extend a layer 2 domain across the physical network (more precisely, across the underlay IP network) while keeping it within a scalable ARP broadcast domain. Overlay networks are built using tunneling methods such as those described in the documents:
This solution is not good enough, as it requires the implementation of a large number of multicast trees in the underlay network, because the broadcast domain of the overlay network is mapped to a multicast domain of the underlay network. This implies maintaining a large amount of state within the underlay network.
Note: Layer 3 overlay network implementations raise other concerns with regard to virtual machine mobility, such as a large amount of signaling (e.g. Open Shortest Path First (OSPF) signaling).
2) The best-known alternative solution is provided by the document IETF draft-shah-armd-arp-reduction-02, ARP Broadcast Reduction for Large Data Centers, H. Shah et al., October 2011. This document proposes a method to reduce the number of ARP broadcasts sent throughout the network. The method is applied by a top of rack switch (ToR switch), i.e. a network node aggregating traffic from all the processing resources installed in the same rack. The ToR switch processes ARP frames intelligently, rather than simply broadcasting them throughout the broadcast domain. When this processing succeeds, the ToR switch sends an ARP proxy response containing the MAC address of the destination.
This solution is not good enough, as it complicates the design of a ToR switch and thus increases its cost. Indeed, the ToR switch has to monitor all ARP traffic transiting through it, and has to process it in the following manner:
These operations require additional processing power, and additional memory space for ARP caching (the layer 2 domain may be large). Moreover, the use of an ARP aging timer on the ToR switch can lead to inconsistency when virtual machines are moved or deleted. For instance, the following tricky situation can occur: host A is attached to top of rack switch #1, and host B is attached to top of rack switch #2. If host B issues an ARP request for host A, and the entry is available at switch #2, then switch #2 sends the ARP reply on behalf of host A. Host A may no longer be available, but switch #2 has no way of knowing this, and it continues to respond on behalf of host A until its entry for host A times out.
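The aging-timer inconsistency can be reproduced with a toy proxy-ARP cache. The sketch is hypothetical: `ProxyArpCache`, the 300-second timer and the addresses are illustrative assumptions, not values from the draft. It only shows that a cache with no deletion signal keeps answering for a host that no longer exists.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ProxyArpCache:
    """Hypothetical ToR switch proxy-ARP cache with an aging timer."""
    aging_timer: float                                   # seconds
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def learn(self, ip: str, mac: str, now: float) -> None:
        # Entry snooped from ARP traffic transiting through the switch.
        self.entries[ip] = (mac, now)

    def proxy_reply(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None:
            return None                  # unknown: the request must be flooded
        mac, learned_at = entry
        if now - learned_at > self.aging_timer:
            del self.entries[ip]         # entry has aged out
            return None
        return mac                       # reply sent on behalf of the host


switch2 = ProxyArpCache(aging_timer=300.0)
switch2.learn("10.0.0.1", "02:00:00:00:00:0a", now=0.0)   # host A learned
# Host A is deleted at t = 10 s, but switch #2 is never told.
stale = switch2.proxy_reply("10.0.0.1", now=200.0)        # still answers
expired = switch2.proxy_reply("10.0.0.1", now=400.0)      # only now gives up
print(stale, expired)
```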
Thus, there is a need to provide a better technical solution for Internet protocol address resolution in data centers.
This problem is solved by the method according to the invention.
A first object of the invention is a method for centralized address resolution in a data center comprising at least one processing resource, at least one processing resource controller, at least one networking resource, and at least one networking resource software-defined network controller; this method comprising the steps of:
Thanks to the local databases respectively associated with the controllers, the method according to the invention replaces a set of ARP broadcast messages (or NDP multicast messages) on the data plane by a single unicast message on the control plane. Using a single unicast message solves the scalability issue presented in the first section, without reducing flexibility with regard to virtual machine mobility.
This method also improves scalability because it accommodates an architecture with multiple SDN controllers, each one being responsible for a part of the data center resources.
A second object of the invention is a computer program product comprising computer-executable instructions for performing this method when the program is run on a computer.
Other features and advantages of the present invention will become more apparent from the following detailed description of embodiments of the present invention, when taken in conjunction with the accompanying drawings.
In order to illustrate in detail features and advantages of embodiments of the present invention, the following description will be with reference to the accompanying drawings:
The description below focuses on an improvement to ARP in IPv4, but the described principles can also be applied to improve NDP in IPv6 (e.g. Neighbor Solicitation and Neighbor Advertisement messages).
The exemplary data center represented on
It is noted that an intermediate hierarchy of address resolution databases could be implemented between the local databases (e.g. LDB1, LDB2) and the centralized database CARDB, depending on the size and the architecture of the data center.
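One way to picture such a hierarchy is as an ordered chain of databases consulted from the most local to the most central. In this sketch the lookup order, the `hierarchical_lookup` helper, the (tenant, virtual network, IP address) key and the sample entries are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]   # (tenant_id, vn_id, ip_address), assumed key


def hierarchical_lookup(levels: List[Dict[Key, str]], key: Key
                        ) -> Tuple[Optional[str], int]:
    """Query each level in order, from a local database up to CARDB.

    Returns the MAC address (or None) and the number of levels consulted.
    """
    for depth, db in enumerate(levels, start=1):
        mac = db.get(key)
        if mac is not None:
            return mac, depth
    return None, len(levels)


ldb1 = {("A", "X", "10.1.0.1"): "02:00:00:00:01:01"}
intermediate = {("A", "X", "10.1.0.7"): "02:00:00:00:01:07"}
# CARDB holds copies of the lower levels plus entries from other controllers.
cardb = {**ldb1, **intermediate, ("A", "X", "10.1.0.9"): "02:00:00:00:01:09"}

print(hierarchical_lookup([ldb1, intermediate, cardb], ("A", "X", "10.1.0.9")))
```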
As shown in
For instance, the table TX2 for the tenant A within the virtual local area network VN X:
For instance, the table TY2 for the tenant A within the virtual local area network VN Y:
For instance, the table TZ2 for the tenant A within the virtual local area network VN Z:
So the local address resolution database LDB1 contains all the information about MAC and IP addresses of the processing resources controlled by the processing resource SDN controller, as well as the associations between these addresses.
Similarly, the local address resolution database LDB2 is organized into three separate address resolution tables, TX3, TY3, TZ3, one table for each (tenant identifier, virtual network identifier) pair. These address resolution tables store all the associations between IP addresses and MAC addresses for all the networking resources controlled by the networking resource SDN controller.
For instance, the table TX3 for the tenant A within the virtual local area network VN X:
For instance, the table TY3 for the tenant A within the virtual local area network VN Y:
For instance, the table TZ3 for the tenant A within the virtual local area network VN Z:
So the local address resolution database LDB2 contains all the information about MAC and IP addresses of the networking resources controlled by the networking resource SDN controller, as well as the associations between these addresses.
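The per-(tenant, virtual network) organization of LDB1 and LDB2 maps naturally onto a dictionary of dictionaries. The class and method names below are hypothetical; the sketch only illustrates that each (tenant identifier, virtual network identifier) pair gets its own IP-to-MAC table, like TX2, TY2, TZ2 or TX3, TY3, TZ3.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Optional, Tuple

TableKey = Tuple[str, str]   # (tenant identifier, virtual network identifier)


class LocalAddressResolutionDB:
    """Hypothetical model of a local database such as LDB1 or LDB2."""

    def __init__(self) -> None:
        # One IP -> MAC table per (tenant, virtual network) pair.
        self.tables: DefaultDict[TableKey, Dict[str, str]] = defaultdict(dict)

    def add(self, tenant: str, vn: str, ip: str, mac: str) -> None:
        self.tables[(tenant, vn)][ip] = mac

    def lookup(self, tenant: str, vn: str, ip: str) -> Optional[str]:
        # .get() avoids creating an empty table as a side effect of a miss.
        table = self.tables.get((tenant, vn))
        return None if table is None else table.get(ip)


ldb1 = LocalAddressResolutionDB()
ldb1.add("A", "X", "192.168.1.10", "02:00:00:00:0a:10")
ldb1.add("A", "Y", "192.168.1.10", "02:00:00:00:0b:10")   # same IP, other VN
print(ldb1.lookup("A", "X", "192.168.1.10"))
print(ldb1.lookup("A", "Z", "192.168.1.10"))
```

Keying tables by the pair, rather than by IP address alone, is what lets the same IP address appear in several virtual networks of a tenant without collision.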
The centralized address resolution database CARDB is populated by copying the entries written to the local address resolution databases LDB1 and LDB2.
The important point is that all controllers (including the network management system, NMS, if applicable) communicate with the centralized address resolution database CARDB via a new protocol according to the invention, called the centralized address resolution protocol (CARP), which is described in what follows.
According to the invention, the CARP protocol allows each controller (respectively associated with the local address resolution databases LDB1, LDB2) to communicate with the centralized address resolution database CARDB in order to:
The proposed CARP protocol is also used for communication between the virtual machines VM1, . . . , VM6 and the local address resolution database LDB1. In the same way, it is used for communication between the local address resolution database LDB2 and network nodes such as the network virtualization edges NVE1, . . . , NVE4, the top of rack switches ToRS1, ToRS2, ToRS3, core 1 and core 2.
According to the proposed CARP protocol, the address resolution procedure comprises the following steps:
Thanks to the invention, an IP host (e.g. a virtual machine VM1, . . . , VM6) or an IP interface of a router (e.g. a virtual port of a virtual router) in most cases only needs to send a unicast CARP read message to its controller to obtain the MAC-IP address association, instead of broadcasting a classical ARP request message onto the virtual local area network. Thus, a unicast read message sent on the control channel replaces the classical broadcast/multicast request message sent on the data plane (e.g. a virtual local area network). This solves the ARP broadcast scalability issue on the data plane, as described previously in the Prior Art section.
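A plausible sketch of this exchange, under stated assumptions: the controller answers a READ from its local database when it can, otherwise it reads the CARDB, and a miss at both levels means the host falls back to ARP broadcast, as is done for resources left outside CARP. The exact step sequence of the CARP procedure may differ; all class and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]   # (tenant_id, vn_id, ip_address), assumed key


class Cardb:
    """Hypothetical centralized address resolution database."""

    def __init__(self) -> None:
        self.entries: Dict[Key, str] = {}

    def read(self, key: Key) -> Optional[str]:
        return self.entries.get(key)


class Controller:
    """Hypothetical SDN controller answering unicast CARP READ messages."""

    def __init__(self, cardb: Cardb) -> None:
        self.ldb: Dict[Key, str] = {}   # this controller's local database
        self.cardb = cardb
        self.cardb_reads = 0

    def handle_read(self, key: Key) -> Optional[str]:
        mac = self.ldb.get(key)
        if mac is None:
            # Not one of this controller's resources: ask the CARDB.
            self.cardb_reads += 1
            mac = self.cardb.read(key)
        return mac


def resolve(controller: Controller, key: Key) -> Optional[str]:
    # One unicast READ on the control channel replaces the data-plane
    # broadcast. None means no database knows the address; the host would
    # then fall back to a classical ARP broadcast.
    return controller.handle_read(key)


cardb = Cardb()
cardb.entries[("A", "X", "10.1.0.20")] = "02:00:00:00:01:20"   # owned elsewhere
ctrl = Controller(cardb)
ctrl.ldb[("A", "X", "10.1.0.10")] = "02:00:00:00:01:10"        # local VM

print(resolve(ctrl, ("A", "X", "10.1.0.10")), ctrl.cardb_reads)
print(resolve(ctrl, ("A", "X", "10.1.0.20")), ctrl.cardb_reads)
```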
The proposed CARP protocol creates and maintains a separate local address resolution table, in the local address resolution database associated with a processing resource SDN controller, for each virtual local area network pertaining to a given tenant.
Similarly, the proposed CARP protocol creates and maintains a separate network-related address resolution table, in the local address resolution database associated with a networking resource SDN controller, for each virtual local area network pertaining to a given tenant.
Within each local address resolution table, entries are sorted in a predefined order (e.g. increasing IP address) to speed up lookups. It is noted that the same IP address and/or the same MAC address can appear in different address resolution tables (i.e. the tables can overlap).
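Keeping each table sorted by IP address allows a binary search instead of a linear scan. A minimal sketch, assuming one address family per table and using Python's standard `bisect` and `ipaddress` modules; the class name is hypothetical. Addresses are compared as integers, because plain string order would wrongly place 10.0.0.10 before 10.0.0.9.

```python
import bisect
import ipaddress
from typing import List, Optional


class SortedResolutionTable:
    """One (tenant, VN) table whose entries stay in increasing IP order."""

    def __init__(self) -> None:
        self._keys: List[int] = []   # IP addresses as integers, ascending
        self._macs: List[str] = []   # MAC address stored at the same index

    def insert(self, ip: str, mac: str) -> None:
        k = int(ipaddress.ip_address(ip))
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            self._macs[i] = mac      # update in place: no duplicate entry
        else:
            self._keys.insert(i, k)
            self._macs.insert(i, mac)

    def lookup(self, ip: str) -> Optional[str]:
        k = int(ipaddress.ip_address(ip))
        i = bisect.bisect_left(self._keys, k)   # O(log n) search
        if i < len(self._keys) and self._keys[i] == k:
            return self._macs[i]
        return None


table = SortedResolutionTable()
for ip, mac in [("10.0.0.9", "02:00:00:00:00:09"),
                ("10.0.0.10", "02:00:00:00:00:10"),
                ("10.0.0.2", "02:00:00:00:00:02")]:
    table.insert(ip, mac)
print(table.lookup("10.0.0.10"))
```

Insertions cost O(n) because of list shifting, which is acceptable if lookups dominate, as one would expect when every resolution is a READ.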
It is noted that an NMS can play the role of a networking resource SDN controller with the CARP protocol. However, if some routers cannot be included in the procedure described above (e.g. because their NMS does not support the proposed CARP protocol), traditional ARP broadcast on the data plane should be applied to resolve the addresses associated with those routers. Nevertheless, to avoid the scalability problems of ARP broadcast in a data center, it is recommended to include as many processing resources and networking resources as possible in the CARP procedure, especially virtualized resources (e.g. virtual machines and network virtualization edges), which tend to be very numerous and very dynamic (e.g. a short time between their creation and their deletion).
Both the processing resource controller and the networking resource controller copy the entries of their respective local databases LDB1 and LDB2 to the centralized address resolution database CARDB by using a CARP WRITE message (see below for more details). Duplicates (e.g. the same IP address written twice for the same virtual network identifier) and write conflicts should be avoided when writing into the centralized address resolution database CARDB, using well-known database techniques.
Finally, the centralized address resolution database CARDB contains a merged version of all the local address resolution databases associated with SDN controllers.
The centralized address resolution database CARDB keeps track of the controller that is the origin of each of its entries (i.e. which entry is written by which controller). Its entries are said to be “colored” with the “color” of the origin controller.
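The merge, duplicate avoidance and coloring described above can be sketched as follows. The key (tenant, virtual network, IP address), the last-writer-wins recoloring when a different controller writes the same key, and all names are assumptions made for illustration; the description leaves these details to well-known database techniques.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]   # (tenant_id, vn_id, ip_address), assumed key


class ColoredCardb:
    """Hypothetical CARDB whose entries carry the color of their origin."""

    def __init__(self) -> None:
        self.entries: Dict[Key, Tuple[str, str]] = {}   # key -> (mac, color)

    def write(self, color: str, key: Key, mac: str) -> bool:
        """Apply a WRITE; return False when it would only duplicate an entry."""
        if self.entries.get(key) == (mac, color):
            return False                    # duplicate: nothing to store
        self.entries[key] = (mac, color)    # new entry, or recolor (assumed)
        return True

    def entries_of(self, color: str) -> List[Key]:
        return sorted(k for k, (_, c) in self.entries.items() if c == color)


ldb1 = {("A", "X", "10.1.0.1"): "02:00:00:00:01:01"}     # processing resources
ldb2 = {("A", "X", "10.1.0.254"): "02:00:00:00:01:fe"}   # networking resources

cardb = ColoredCardb()
for key, mac in ldb1.items():
    cardb.write("processing-ctrl", key, mac)
for key, mac in ldb2.items():
    cardb.write("network-ctrl", key, mac)

# Re-sending the same entry is detected as a duplicate.
print(cardb.write("processing-ctrl", ("A", "X", "10.1.0.1"), "02:00:00:00:01:01"))
print(cardb.entries_of("network-ctrl"))
```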
Note: For IPv6, the centralized and the local address resolution tables can contain additional information such as:
This allows an SDN controller, or the centralized address resolution database, to group neighbors on the same IPv6 subnet, and to identify the router interface IPv6 addresses present on a given IPv6 subnet.
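A sketch of such grouping with Python's standard `ipaddress` module. It assumes that each entry carries a flag marking router interfaces and that subnets are /64; the helper name, the flag and the prefix length are assumptions, since the exact additional IPv6 fields are not specified in this passage.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Tuple

Subnet = ipaddress.IPv6Network


def group_by_subnet(entries: List[Tuple[str, bool]], prefix_len: int = 64
                    ) -> Tuple[Dict[Subnet, List[str]], Dict[Subnet, List[str]]]:
    """Group (ipv6_address, is_router) entries by IPv6 subnet.

    Returns (neighbors, routers): all addresses per subnet, and the router
    interface addresses per subnet.
    """
    neighbors: Dict[Subnet, List[str]] = defaultdict(list)
    routers: Dict[Subnet, List[str]] = defaultdict(list)
    for addr, is_router in entries:
        # strict=False masks the host bits: 2001:db8:1::10 -> 2001:db8:1::/64
        subnet = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        neighbors[subnet].append(addr)
        if is_router:
            routers[subnet].append(addr)
    return dict(neighbors), dict(routers)


table = [("2001:db8:1::10", False), ("2001:db8:1::1", True),
         ("2001:db8:2::20", False)]
neighbors, routers = group_by_subnet(table)
for subnet, addrs in neighbors.items():
    print(subnet, addrs, routers.get(subnet, []))
```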
CARP messages can be conveyed with an IP/UDP (Internet Protocol/User Datagram Protocol) header. UDP port numbers can be assigned in a proprietary manner, but standardization is recommended for broad deployment. In the latter case, the UDP port number for the CARP protocol should be assigned by IANA (Internet Assigned Numbers Authority).
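Carrying a CARP message over UDP needs nothing beyond standard sockets. In this sketch the payload is an opaque placeholder, since the message format is described separately, and because no port has been assigned the receiver binds to an OS-chosen port on the loopback interface; the helper name and the payload are hypothetical.

```python
import socket
from typing import Tuple


def send_carp_over_udp(payload: bytes, dest: Tuple[str, int]) -> None:
    # One datagram per CARP message; UDP adds no delivery guarantee.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, dest)


# Stand-in for the CARDB or a controller listening for CARP messages.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
receiver.settimeout(2.0)
carp_port = receiver.getsockname()[1]  # placeholder for an assigned port

send_carp_over_udp(b"CARP READ (placeholder payload)", ("127.0.0.1", carp_port))
data, sender = receiver.recvfrom(2048)
receiver.close()
print(data)
```

Because UDP does not retransmit, a deployment would presumably rely on the sequence numbers and response messages of CARP itself to detect lost requests.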
CARP messages can also be conveyed with an Ethernet header over a point-to-point link (e.g. between an SDN controller and the centralized address resolution database CARDB), with a destination MAC address chosen within the link-local address range.
In an exemplary embodiment, the CARP message format comprises the following fields:
Note: Optionally, the centralized address resolution database CARDB can warn other SDN controllers, in real time, that entries have just been deleted by the origin SDN controller. It does this by sending a DELETE_RESP message with a special flag (Flag=0xF1) to all those SDN controllers. In this case, the SDN controllers receiving the message ignore the Sequence Number field.
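On the receiving side, the special flag lets a controller tell an unsolicited deletion warning apart from the response to its own DELETE. The sketch assumes that a controller keeps copies of entries it previously read from the CARDB and tracks the sequence numbers of its pending DELETE requests; apart from the flag value 0xF1 and the DELETE_RESP name, the message class, field names and handler are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

NOTIFY_FLAG = 0xF1            # flag value given in the description
Key = Tuple[str, str, str]    # (tenant_id, vn_id, ip_address), assumed key


@dataclass
class CarpMessage:
    """Hypothetical in-memory view of a received CARP message."""
    msg_type: str
    flag: int
    sequence_number: int
    keys: List[Key] = field(default_factory=list)


class ControllerSide:
    """Hypothetical SDN controller state relevant to DELETE_RESP handling."""

    def __init__(self) -> None:
        self.cached_remote: Dict[Key, str] = {}   # copies read from the CARDB
        self.pending: Set[int] = set()            # seq numbers of own DELETEs

    def on_delete_resp(self, msg: CarpMessage) -> None:
        if msg.msg_type != "DELETE_RESP":
            raise ValueError(f"unexpected message type {msg.msg_type}")
        if msg.flag == NOTIFY_FLAG:
            # Warning that another controller deleted these entries:
            # drop local copies; the sequence number is ignored.
            for key in msg.keys:
                self.cached_remote.pop(key, None)
        else:
            # Ordinary response to one of this controller's own DELETEs.
            self.pending.discard(msg.sequence_number)


ctrl = ControllerSide()
key = ("A", "X", "10.1.0.5")
ctrl.cached_remote[key] = "02:00:00:00:01:05"
ctrl.pending.add(7)
# The warning carries sequence number 7 by chance; it must not clear our DELETE.
ctrl.on_delete_resp(CarpMessage("DELETE_RESP", NOTIFY_FLAG, 7, [key]))
print(key in ctrl.cached_remote, ctrl.pending)
```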
Different Type-Length-Value (TLV) structures are, for instance, defined as follows:
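Whatever the concrete definitions, a TLV structure is encoded and parsed in the same way. This sketch assumes a 1-byte Type and a 2-byte Length in network byte order; the type codes 0x01 (IPv4 address) and 0x02 (MAC address) and the helper names are hypothetical, not values from the CARP definitions.

```python
import ipaddress
import struct
from typing import Iterator, Tuple

# Hypothetical type codes, chosen only for this illustration.
TLV_IPV4 = 0x01
TLV_MAC = 0x02


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    # 1-byte Type, 2-byte Length (network byte order), then the Value bytes.
    return struct.pack("!BH", tlv_type, len(value)) + value


def decode_tlvs(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < 3:
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BH", buf, offset)
        offset += 3
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        offset += length
        yield tlv_type, value


def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


payload = (encode_tlv(TLV_IPV4, ipaddress.IPv4Address("10.1.0.1").packed)
           + encode_tlv(TLV_MAC, mac_to_bytes("02:00:00:00:01:01")))
decoded = list(decode_tlvs(payload))
print(len(payload), decoded[0][0], ipaddress.IPv4Address(decoded[0][1]))
```

The explicit Length field is what lets a receiver skip TLVs whose Type it does not recognize, which keeps the message format extensible.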
3) Operations when a Virtual Machine Moves:
Usually, a virtual machine move is organized by a single processing resource controller (e.g. a move within the same hypervisor/physical machine) within the same IP subnet. The virtual machine keeps the same IP and MAC addresses, and nothing needs to be modified, either in the address resolution tables of its associated controller or in those of the centralized address resolution database CARDB. Such a virtual machine move is therefore seamless with regard to the address resolution databases according to the invention.
The virtual machine moves within the same IP subnet, but the move involves two different processing resource controllers. The virtual machine still keeps the same IP and MAC addresses, but both processing resource controllers have to modify their respective address resolution tables, one deleting an entry and the other adding an entry. Both controllers should also inform the centralized address resolution database, so that the latter changes the “color” of the concerned entries. The controller that the virtual machine is leaving sends a DELETE message to the centralized address resolution database CARDB, whereas the other sends a WRITE message to it. These DELETE and WRITE messages can arrive at the centralized address resolution database CARDB at different instants:
A virtual machine moves to a different IP subnet. In this case, the old entry should be deleted and a new entry added in the tables associated with the impacted controller(s), and possibly in the tables of the centralized address resolution database CARDB (if two controllers are involved).
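For the same-subnet move involving two controllers, one policy that makes the final CARDB state independent of the order in which DELETE and WRITE arrive is sketched below. It is an assumption, not necessarily the handling specified for each case: a WRITE recolors the entry to the new controller, and a DELETE is honored only if the entry still carries the deleting controller's color. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]   # (tenant_id, vn_id, ip_address), assumed key


class MoveAwareCardb:
    """Hypothetical CARDB with color-checked deletion."""

    def __init__(self) -> None:
        self.entries: Dict[Key, Tuple[str, str]] = {}   # key -> (mac, color)

    def write(self, color: str, key: Key, mac: str) -> None:
        # The writing controller takes ownership: the entry is recolored.
        self.entries[key] = (mac, color)

    def delete(self, color: str, key: Key) -> bool:
        # Only the controller whose color the entry carries may delete it,
        # so a late DELETE from the old controller cannot erase the new entry.
        entry = self.entries.get(key)
        if entry is not None and entry[1] == color:
            del self.entries[key]
            return True
        return False


def move(order: List[str]) -> Optional[Tuple[str, str]]:
    db = MoveAwareCardb()
    key = ("A", "X", "10.1.0.5")
    mac = "02:00:00:00:01:05"
    db.write("ctrl-1", key, mac)          # VM initially under controller 1
    for op in order:                      # arrival order at the CARDB
        if op == "DELETE":
            db.delete("ctrl-1", key)      # sent by the controller being left
        else:
            db.write("ctrl-2", key, mac)  # sent by the new controller
    return db.entries.get(key)


print(move(["DELETE", "WRITE"]))
print(move(["WRITE", "DELETE"]))
```

With DELETE first, the entry is briefly absent from the CARDB until the WRITE arrives; READs in that window would miss, which a real design would have to tolerate or mask.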
The method according to the invention also has the advantage of shifting complexity from the ToR switches (i.e. specific hardware) towards the controllers and the centralized database (i.e. commodity hardware), which reduces cost.
Number | Date | Country | Kind |
---|---|---|---|
13306342.0 | Sep 2013 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2014/067337 | 8/13/2014 | WO | 00 |