Virtualization allows the abstraction and pooling of hardware resources to support virtual machines in a Software-Defined Networking (SDN) environment, such as a Software-Defined Data Center (SDDC). For example, through server virtualization, virtualized computing instances such as virtual machines (VMs) running different operating systems may be supported by the same physical machine (referred to as a “host”). Each virtual machine is generally provisioned with virtual resources to run an operating system and applications. The virtual resources may include central processing unit (CPU) resources, memory resources, storage resources, network resources, etc. In practice, a VM may experience performance issues when there is a large volume of traffic going through its virtual network adapter, in which case packets may be dropped.
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the drawings, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
Challenges relating to packet handling will now be explained in more detail using
In the example in
Hypervisor 114 maintains a mapping between underlying hardware 112 of host 110A and virtual resources allocated to respective VMs 130-140. Hardware 112 includes suitable physical components, such as central processing unit(s) or processor(s) 120, memory 122, physical network interface controllers (PNICs) 124, storage controller 126, and storage disk(s) 128, etc. Virtual resources are allocated to VM 130/140 to support applications 131/141 and guest operating system (OS) 132/142. For example, corresponding to hardware 112, the virtual resources or virtual devices may include virtual CPU, guest physical memory (i.e., memory visible to the guest OS running in a VM), virtual disk(s), virtual network interface controller (VNIC), etc.
Virtual machine monitor (VMM) 134/144 is implemented by hypervisor 114 to emulate various hardware resources for VM 130/140. For example, VMM1 134 is configured to emulate VNIC1 135 to provide network access for VM1 130, and VMM2 144 to emulate VNIC2 145 for VM2 140. In practice, VMM 134/144 may be considered as part of VM 130/140, or alternatively, separated from VM 130/140. In both cases, VMM 134/144 maintains the state of VNIC 135/145 to facilitate migration of VM 130/140. In practice, one VM may have multiple VNICs (each VNIC having its own network address). Any suitable virtual network adapter technology may be used for VNIC 135/145, such as VMXNET3 (available from VMware, Inc.), E1000 network interface (an emulated version of a Gigabit Ethernet adapter), or the like.
Although examples of the present disclosure refer to virtual machines, it should be understood that a “virtual machine” running on a host is merely one example of a “virtualized computing instance” or “workload.” A virtualized computing instance may represent an addressable data compute node or isolated user space instance. In practice, any suitable technology may be used to provide isolated user space instances, not just hardware virtualization. Other virtualized computing instances may include containers (e.g., running within a VM or on top of a host operating system without the need for a hypervisor or separate operating system or implemented as an operating system level virtualization), virtual private servers, client computers, etc. Such container technology is available from, among others, Docker, Inc. The VMs may also be complete computational environments, containing virtual equivalents of the hardware and software components of a physical computing system. The term “hypervisor” may refer generally to a software layer or component that supports the execution of multiple virtualized computing instances, including system-level software in guest VMs that supports namespace containers such as Docker, etc. Hypervisor 114 may implement any suitable virtualization technology, such as VMware ESX® or ESXi™ (available from VMware, Inc.), kernel-based virtual machine (KVM), etc.
Hypervisor 114 further implements virtual switch 116 to handle traffic forwarding to and from VMs 130-140. For example, VM 130/140 may send egress (i.e., outgoing) packets and receive ingress packets (i.e., incoming) via VNIC 135/145 and logical port 161/162. As used herein, the term “logical port” may refer generally to a port on a logical switch to which a virtualized computing instance is connected. A “logical switch” may refer generally to an SDN construct that is collectively implemented by multiple virtual switches, whereas a “virtual switch” may refer generally to a software switch or software implementation of a physical switch. In practice, there is usually a one-to-one mapping between a logical port on a logical switch and a virtual port on virtual switch 116. However, the mapping may change in some scenarios, such as when the logical port is mapped to a different virtual port on a different virtual switch after migration of the corresponding virtualized computing instance (e.g., when the source and destination hosts do not have a distributed virtual switch spanning them).
SDN controller 170 and SDN manager 180 are example management entities that facilitate management and configuration of SDN environment 100. One example of an SDN controller is the NSX controller component of VMware NSX® (available from VMware, Inc.) that may be a member of a controller cluster (not shown) and configurable using an SDN manager (not shown for simplicity). One example of an SDN manager is the NSX manager component that provides an interface for end users to perform any suitable configuration in SDN environment 100. In practice, management entity 170/180 may be implemented using physical machine(s), virtual machine(s), a combination thereof, etc. SDN controller 170 may send configuration information to each host 110A/110B/110C via a control-plane channel established between them, such as using TCP over Secure Sockets Layer (SSL), etc.
As used herein, the term “packet” may refer generally to a group of bits that can be transported together from a source to a destination, such as a “segment,” “frame,” “message,” “datagram,” etc. The term “traffic” may refer generally to multiple packets. The term “layer-2” may refer generally to a link layer or Media Access Control (MAC) layer; “layer-3” to a network or Internet Protocol (IP) layer; and “layer-4” to a transport layer (e.g., using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), etc.), in the Open Systems Interconnection (OSI) model, although the concepts described herein may be used with other networking models. Physical network 102 may be any suitable network, such as wide area network, virtual private network (VPN), etc.
In practice, VM 130/140 may experience performance issues when there is a large volume of traffic going through VNIC 135/145. For example, packets may be dropped at VNIC 135/145 because there is insufficient memory space to store the packets and/or insufficient CPU cycles to process them. Such performance issues affect packet processing at VMs. In one example scenario, some VMs may rely on a reliable exchange of control packets among peers. In this case, the loss of control packets may lead to service disruption, which is undesirable.
Filter-Based Packet Handling
According to examples of the present disclosure, packet handling may be implemented at virtual network adapters in an improved manner. In particular, a “filter-based” approach may be implemented to provide a finer granularity of control for steering packets to dedicated queues at VNIC 135/145, or dropping the packets to protect against malicious attacks. Throughout the present disclosure, the term “virtual network adapter” or “virtual network interface controller” (e.g., VNIC 135/145) may refer to a virtual device that connects a virtualized computing instance (e.g., VM) to physical network 102 via a physical network adapter (e.g., PNIC 124).
In more detail,
At 210 in
At 230 and 240 in
At 250 and 260 in
As will be discussed further using
Depending on the desired implementation, any suitable filters may be configured and matched to ingress packets to support various use cases, such as differentiated (or prioritized) packet handling for control packets and data packets (see
Example Filter Configuration
At 310-315 in
Each filter (Fj) may specify a set of packet characteristic(s) to be matched to an ingress packet, and an action to be performed when there is a match. Using M=total number of filters configured for VNIC1 135, a particular filter may be denoted as Fj, where j∈{0, . . . , M−1}. Using N=number of queues, a particular queue may be denoted as Qi, where i∈{0, . . . , N−1}. Any suitable action may be specified by filter (Fj), such as action=ASSIGN an ingress packet to a particular queue (Qi) of VNIC1 135, action=DROP the ingress packet, etc.
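As an illustration only, a filter (Fj) and its action may be sketched as a small Python data structure. The names Action, Filter and validate, as well as the dictionary layout used for the packet characteristics, are hypothetical and are not taken from any actual virtual network adapter implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ASSIGN = "ASSIGN"  # steer the ingress packet to a particular queue Qi
    DROP = "DROP"      # discard the ingress packet


@dataclass
class Filter:
    """One filter Fj: characteristics to match plus an action (hypothetical layout)."""
    name: str
    match: dict                  # e.g. {"protocol": "UDP", "dst_port": 3784}
    action: Action
    queue: Optional[int] = None  # index i of Qi; only meaningful for ASSIGN


def validate(filters, num_queues):
    """Check that every ASSIGN filter targets a queue index i in {0, ..., N-1}."""
    for f in filters:
        if f.action is Action.ASSIGN:
            if f.queue is None or not 0 <= f.queue < num_queues:
                raise ValueError(f"{f.name}: queue index out of range")
        elif f.queue is not None:
            raise ValueError(f"{f.name}: DROP filters take no queue")
    return True
```

In this sketch, a set of M filters is simply a list of Filter objects, and validate may be called with N=number of queues before the filters are installed.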
Any suitable packet characteristic may be specified by filter (Fj), such as packet header information (e.g., inner header and/or outer header), packet payload information, packet metadata, etc. Example inner/outer header information specified by filter (Fj) may include: a source IP address, source MAC address, source port number, destination IP address, destination MAC address, destination port number, protocol, logical overlay network information (e.g., VNI), or any combination thereof, etc. In practice, a packet characteristic may be defined using a range of values, a group that includes a set of distinct values or entities, etc.
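The following Python sketch shows one possible way of matching a packet characteristic that is defined as a single value, a range of values, or a group of distinct values. The function names field_matches and packet_matches, the field keys (e.g., "dst_port", "vni"), and the convention of representing a range as an inclusive (low, high) tuple are assumptions made purely for illustration.

```python
import ipaddress


def field_matches(spec, value):
    """Return True if a packet field value satisfies one filter characteristic.

    spec may be a single value, an inclusive (low, high) range, a set of
    distinct values, or an IP network for address fields.
    """
    if isinstance(spec, (ipaddress.IPv4Network, ipaddress.IPv6Network)):
        return ipaddress.ip_address(value) in spec
    if isinstance(spec, tuple) and len(spec) == 2:
        low, high = spec
        return low <= value <= high
    if isinstance(spec, (set, frozenset)):
        return value in spec
    return spec == value


def packet_matches(match, packet):
    """A packet matches when every characteristic named in the filter is satisfied."""
    return all(
        key in packet and field_matches(spec, packet[key])
        for key, spec in match.items()
    )
```

For example, under these assumptions a filter characteristic of {"dst_port": (400, 500)} would match any packet whose destination port number lies between 400 and 500, and an empty set of characteristics would match every packet.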
At 320-325 in
At 420 in
Filters 421-424 may be matched with data packets. The term “data packet” may refer generally to a packet that includes any suitable information that a source wishes to send to a destination, such as for processing, querying, etc. At 421, a second filter (labelled “F1”) may specify (protocol=TCP, port number=150) to be matched with TCP data packets, and action=ASSIGN to queue “Q1” 411 (i=1). At 422, a third filter (labelled “F2”) may specify (protocol=HTTPS, port number=443) to be matched with HTTPS data packets, and action=ASSIGN to queue “Q2” 412 (i=2). At 423, a fourth filter (labelled “F3”) may specify (protocol=TCP, source IP address=10.10.10.1) and action=DROP to block packets from a particular source. At 424, a fifth filter (labelled “F4”) may assign all remaining packets to queue “Q3” 413 (i=3).
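The example filters may be sketched in Python as an ordered table that is evaluated from the highest precedence to the lowest, with the first matching filter determining the action. This is a simplified sketch under several assumptions: filter “F0” is given the control packet characteristics (protocol=UDP, destination port number=3784) and is assumed to assign to queue “Q0”; the port number in “F1” is assumed to be a destination port number; and the function name classify and the dictionary-based packet representation are hypothetical.

```python
DROP = "DROP"

# (name, characteristics, action) in order of precedence, highest first.
# An integer action means ASSIGN to the queue Qi with that index.
FILTERS = [
    ("F0", {"protocol": "UDP", "dst_port": 3784}, 0),   # control packets -> Q0 (assumed)
    ("F1", {"protocol": "TCP", "dst_port": 150}, 1),    # TCP data packets -> Q1
    ("F2", {"protocol": "HTTPS", "dst_port": 443}, 2),  # HTTPS data packets -> Q2
    ("F3", {"protocol": "TCP", "src_ip": "10.10.10.1"}, DROP),
    ("F4", {}, 3),                                      # all remaining packets -> Q3
]


def classify(packet, filters=FILTERS):
    """Return (filter name, action) for the first filter the packet matches."""
    for name, match, action in filters:
        if all(packet.get(key) == value for key, value in match.items()):
            return name, action
    return None, None  # no filter matched (cannot happen while F4 is present)
```

Note that, with first-match evaluation in this order, a TCP packet from 10.10.10.1 to port 150 would be matched by “F1” and queued rather than dropped by “F3”. If such packets should be blocked, the DROP filter would need a higher precedence in this sketch.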
Filters 420-424 may be arranged in an order of precedence. For example, filter “F0” 420 has the highest priority (or highest precedence) and overrides all other filters 421-424. “F1” 421 has the second highest priority, followed by “F2” 422, “F3” 423 and “F4” 424 (i.e., lowest priority or precedence). In practice, an ingress packet may be matched with “F0” 420, followed by subsequent filters 421-424. Although not shown in
Example Packet Handling
At 330, 340 and 345 in
Depending on the desired implementation, examples of the present disclosure may be implemented together with network driver technology such as RSS as a form of optimization. When RSS is enabled at VNIC 135/145, ingress packet processing for a particular packet flow may be shared across multiple processors or processor cores (instead of a single processor). In this case, at block 335 in
(a) Steering of Control and Data Packets
Some examples will be explained using
Depending on the desired implementation, hosts 110A-C in
In the example in
Each member of the HA cluster may monitor each other's status (i.e., alive or not) by exchanging control packets, such as using a fault detection or continuity check protocol such as BFD. For example in
According to examples of the present disclosure, filters 420-424 may be applied to identify control packets and data packets, and deliver them to different queues at VNIC1 135. At 510 in
At 520 in
As such, filter “F0” 420 may be configured to specify control packet characteristics (e.g., protocol=UDP, destination port number=3784) associated with HA configuration to identify control packets. Filter “F2” 422 may be configured to specify data packet characteristics (e.g., protocol=HTTPS, destination port number=443) to identify data packets that, for example, require processing by VM1 130. This way, even when VNIC1 135 has to handle a large volume of data traffic (see 510) and a low volume of control traffic (see 520), filters may be applied to separate the two types of traffic, such that control traffic may be delivered to VM1 130 in a more reliable manner.
Depending on the desired implementation, VM1 130 may retrieve packets 510/520 from queues 410-413 using any suitable approach. For example, the processing of control packets may be assigned a higher priority than that of data packets. By steering control packets and data packets to respective dedicated queues, differentiated or prioritized packet handling may be performed.
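One possible retrieval approach is strict priority, in which lower-indexed queues are always drained before higher-indexed ones. The Python sketch below assumes that the first queue holds control packets and that a per-call packet budget applies; the function name retrieve and the budget parameter are hypothetical, and other approaches (e.g., weighted sharing among queues) may equally be used.

```python
from collections import deque


def retrieve(queues, budget):
    """Take up to `budget` packets, always draining lower-indexed queues first.

    queues: list of deques, with queues[0] assumed to hold control packets.
    """
    out = []
    for q in queues:
        while q and len(out) < budget:
            out.append(q.popleft())
        if len(out) >= budget:
            break
    return out
```

With strict priority, control packets are not delayed behind a backlog of data packets. A trade-off in this sketch is that a sustained burst in a higher-priority queue may starve the lower-priority queues.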
(b) Intrusion Detection and Prevention
In another example, filter-based packet handling may be implemented for intrusion detection and prevention to protect against malicious attacks. For example, a distributed denial of service (DDoS) attack is a malicious network attack in which attackers send a large volume of traffic to a specific service or website with the intention of overwhelming it with illegitimate traffic. To protect against such malicious attacks, a particular filter (Fj) may be configured according to 310-325 in
(c) Filter-Based Load Balancing
In the example in
Filters 620-623 may be configured by application(s) 141 by generating and sending a request to virtual device backend 146 via driver 143. In the example in
Hardware Offload Capability
According to examples of the present disclosure, two layers of filter-based packet handling may be implemented, i.e., a first layer at PNIC 124 and a second layer at VNIC 135/145. In this case, prior to matching an ingress packet to one of multiple filters 420-424 configured for VNIC 135/145, the ingress packet may be matched to one of multiple PNIC filters configured for PNIC 124 to assign the ingress packet to one of multiple PNIC queues. An example is shown in
At the first layer, PNIC filters 720-723 (labelled “PF0” to “PF3”) may be configured to assign matching ingress packets to respective PNIC queues 710-713 (labelled “PQ0” to “PQ3”). At the second layer, ingress packets in physical NIC queues 710-713 may be matched to filters 420-424 in
The configuration of PNIC filters 720-723 may be initiated using virtual device backend 136/146 (see also 730 in
Container Implementation
Although explained using VMs, it should be understood that SDN environment 100 may include other virtual workloads, such as containers, etc. As used herein, the term “container” (also known as “container instance”) is used generally to describe an application that is encapsulated with all its dependencies (e.g., binaries, libraries, etc.). In the examples in
Computer System
The above examples can be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof. The above examples may be implemented by any suitable computing device, computer system, etc. The computer system may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc. The computer system may include a non-transitory computer-readable medium having stored thereon instructions or program code that, when executed by the processor, cause the processor to perform processes described herein with reference to
The techniques introduced above can be implemented in special-purpose hardwired circuitry, in software and/or firmware in conjunction with programmable circuitry, or in a combination thereof. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others. The term ‘processor’ is to be interpreted broadly to include a processing unit, ASIC, logic unit, programmable gate array, etc.
The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
Those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computing systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of skill in the art in light of this disclosure.
Software and/or firmware to implement the techniques introduced here may be stored on a non-transitory computer-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “computer-readable storage medium”, as the term is used herein, includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant (PDA), mobile device, manufacturing tool, any device with a set of one or more processors, etc.). A computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
The drawings are only illustrations of an example, wherein the units or procedure shown in the drawings are not necessarily essential for implementing the present disclosure. Those skilled in the art will understand that the units in the device in the examples can be arranged in the device in the examples as described, or can be alternatively located in one or more devices different from that in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.