Virtualization allows the abstraction and pooling of hardware resources to support virtual machines in a Software-Defined Networking (SDN) environment, such as a Software-Defined Data Center (SDDC). For example, through server virtualization, virtualized computing instances such as virtual machines (VMs) running different operating systems may be supported by the same physical machine (e.g., referred to as a “host”). Each VM is generally provisioned with virtual resources to run an operating system and applications. The virtual resources may include central processing unit (CPU) resources, memory resources, storage resources, network resources, etc. In practice, traffic among VMs may be susceptible to various network issues, which may affect the performance of hosts and VMs.
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the drawings, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein. Although the terms “first,” “second” and so on are used to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. A first element may be referred to as a second element, and vice versa.
Challenges relating to network diagnosis will now be explained in more detail using
Each host 110A/110B/110C may include suitable hardware 112A/112B/112C and virtualization software (e.g., hypervisor-A 114A, hypervisor-B 114B, hypervisor-C 114C) to support various virtual machines (VMs) 131-136. For example, host-A 110A supports VM1 131 and VM2 132; host-B 110B supports VM3 133 and VM4 134; and host-C 110C supports VM5 135 and VM6 136. Hypervisor 114A/114B/114C maintains a mapping between underlying hardware 112A/112B/112C and virtual resources allocated to respective VMs 131-136. Hardware 112A/112B/112C includes suitable physical components, such as central processing unit(s) (CPU(s)) or processor(s) 120A/120B/120C; memory 122A/122B/122C; physical network interface controllers (NICs) 124A/124B/124C; storage controller 126A/126B/126C; and storage disk(s) 128A/128B/128C, etc.
Virtual resources are allocated to respective VMs 131-136 to support a guest operating system (OS) and application(s). For example, the virtual resources may include virtual CPU, guest physical memory, virtual disk, virtual network interface controller (VNIC), etc. Hardware resources may be emulated using virtual machine monitors (VMMs). For example in
Although examples of the present disclosure refer to VMs, it should be understood that a “virtual machine” running on a host is merely one example of a “virtualized computing instance” or “workload.” A virtualized computing instance may represent an addressable data compute node (DCN) or isolated user space instance. In practice, any suitable technology may be used to provide isolated user space instances, not just hardware virtualization. Other virtualized computing instances may include containers (e.g., running within a VM or on top of a host operating system without the need for a hypervisor or separate operating system or implemented as an operating system level virtualization), virtual private servers, client computers, etc. Such container technology is available from, among others, Docker, Inc. The VMs may also be complete computational environments, containing virtual equivalents of the hardware and software components of a physical computing system.
The term “hypervisor” may refer generally to a software layer or component that supports the execution of multiple virtualized computing instances, including system-level software in guest VMs that supports namespace containers such as Docker, etc. Hypervisors 114A-C may each implement any suitable virtualization technology, such as VMware ESX® or ESXi™ (available from VMware, Inc.), Kernel-based Virtual Machine (KVM), etc. The term “packet” may refer generally to a group of bits that can be transported together, and may be in another form, such as “frame,” “message,” “segment,” etc. The term “traffic” may refer generally to multiple packets. The term “layer-2” may refer generally to a link layer or Media Access Control (MAC) layer; “layer-3” to a network or Internet Protocol (IP) layer; and “layer-4” to a transport layer (e.g., using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), etc.), in the Open System Interconnection (OSI) model, although the concepts described herein may be used with other networking models.
Hypervisor 114A/114B/114C implements virtual switch 115A/115B/115C and logical distributed router (DR) instance 117A/117B/117C to handle egress packets from, and ingress packets to, corresponding VMs 131-136. In SDN environment 100, logical switches and logical DRs may be implemented in a distributed manner and can span multiple hosts to connect VMs 131-136. For example, logical switches that provide logical layer-2 connectivity may be implemented collectively by virtual switches 115A-C and represented internally using forwarding tables 116A-C at respective virtual switches 115A-C. Forwarding tables 116A-C may each include entries that collectively implement the respective logical switches. Further, logical DRs that provide logical layer-3 connectivity may be implemented collectively by DR instances 117A-C and represented internally using routing tables 118A-C at respective DR instances 117A-C. Routing tables 118A-C may each include entries that collectively implement the respective logical DRs.
Packets may be received from, or sent to, each VM via an associated logical switch port. For example, logical switch ports 151-156 (labelled “LSP1” to “LSP6”) are associated with respective VMs 131-136. Here, the term “logical port” or “logical switch port” may refer generally to a port on a logical switch to which a virtualized computing instance is connected. A “logical switch” may refer generally to a software-defined networking (SDN) construct that is collectively implemented by virtual switches 115A-C in the example in
SDN manager 170 and SDN controller 160 are example network management entities in SDN environment 100. To send and receive the control information, each host 110A/110B/110C may implement local control plane (LCP) agent (not shown) to interact with SDN controller 160. For example, control-plane channel 101/102/103 may be established between SDN controller 160 and host 110A/110B/110C using TCP over Secure Sockets Layer (SSL), etc. Management entity 160/170 may be implemented using physical machine(s), virtual machine(s), a combination thereof, etc. Hosts 110A-C may also maintain data-plane connectivity with each other via physical network 104.
Through virtualization of networking services in SDN environment 100, logical overlay networks may be provisioned, changed, stored, deleted and restored programmatically without having to reconfigure the underlying physical hardware architecture. A logical overlay network (also known as “logical network”) may be formed using any suitable tunneling protocol, such as Generic Network Virtualization Encapsulation (GENEVE), Virtual eXtensible Local Area Network (VXLAN), Stateless Transport Tunneling (STT), etc. For example, tunnel encapsulation may be implemented according to a tunneling protocol to extend layer-2 segments across multiple hosts. In relation to a logical overlay network, the term “tunnel” may refer generally to a tunnel established between a pair of VTEPs over physical network 104, over which respective hosts are in layer-3 connectivity with one another.
Hypervisor 114A/114B/114C may implement a virtual tunnel endpoint (VTEP) to encapsulate and decapsulate packets with an outer header (also known as a tunnel header) identifying a logical overlay network (e.g., VNI=5000) to facilitate communication over the logical overlay network. For example, hypervisor-A 114A implements a first VTEP-A associated with (IP address=IP-A, MAC address=MAC-A, VTEP label=VTEP-A), hypervisor-B 114B a second VTEP-B with (IP-B, MAC-B, VTEP-B) and hypervisor-C 114C a third VTEP-C with (IP-C, MAC-C, VTEP-C). Encapsulated packets may be sent via a logical overlay tunnel established between a pair of VTEPs over physical network 104. In practice, a host may support more than one VTEP.
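As an illustration of the encapsulation and decapsulation described above, the following is a minimal sketch that assumes VXLAN is the tunneling protocol in use. The 8-byte header layout (a flags byte with the VNI-valid bit, followed by a 24-bit VNI) follows the VXLAN specification; the function names are hypothetical, and the outer Ethernet/IP/UDP headers that a VTEP would also add are omitted for brevity.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid
VXLAN_HEADER_LEN = 8


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header carrying the 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet: bytes) -> Tuple[int, bytes]:
    """Strip the VXLAN header and return (VNI, inner frame) (hypothetical helper)."""
    if len(packet) < VXLAN_HEADER_LEN:
        raise ValueError("packet shorter than a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", packet[:VXLAN_HEADER_LEN])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8, packet[VXLAN_HEADER_LEN:]
```

For example, calling `vxlan_encapsulate(5000, frame)` at a source VTEP and `vxlan_decapsulate(...)` at a destination VTEP would recover VNI=5000 together with the original inner frame.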
To protect VMs 131-136 against security threats caused by unwanted packets, hypervisor 114A/114B/114C may implement distributed firewall (DFW) engine 119A/119B/119C to filter packets to and from associated VMs. For example, at host-A 110A, hypervisor 114A implements DFW engine 119A to filter packets for VM1 131 and VM2 132. SDN controller 160 may be used to configure firewall rules that are enforceable by DFW engine 119A/119B/119C. In practice, network packets may be filtered according to firewall rules at any point along the datapath from a source (e.g., VM1 131) to a physical NIC (e.g., 124A). In one embodiment, a filter component (not shown) may be incorporated into each VNIC 141-144 to enforce firewall rules that are associated with the VM (e.g., VM1 131) corresponding to that VNIC (e.g., VNIC 141). The filter components may be maintained by DFW engines 119A-C.
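The rule-based filtering described above may be sketched as follows. This is an illustrative example only: the `Packet` and `FirewallRule` types, the first-match evaluation order and the default action are all assumptions made for the sketch and are not specified by the present disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int


@dataclass(frozen=True)
class FirewallRule:
    action: str  # "ALLOW" or "DROP"
    # A field left as None acts as a wildcard (assumption for this sketch).
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    protocol: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        pairs = (
            (self.src_ip, pkt.src_ip),
            (self.dst_ip, pkt.dst_ip),
            (self.protocol, pkt.protocol),
            (self.dst_port, pkt.dst_port),
        )
        return all(want is None or want == got for want, got in pairs)


def filter_packet(rules: List[FirewallRule], pkt: Packet,
                  default_action: str = "DROP") -> str:
    """Return the action of the first matching rule, else the default action."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default_action
```

In this sketch, a filter component associated with a VNIC would call `filter_packet` for each packet and forward or discard the packet according to the returned action.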
In practice, network diagnosis may be implemented to identify various issues in SDN environment 100, such as security threats, misuses, invalid configurations or performance issues. One approach is to monitor for network events that provide an insight into how well a network or a workload is performing. Conventionally, information relating to network events is often sent to a database to facilitate subsequent retrieval using a query language such as structured query language (SQL). In other network diagnosis approaches, fixed queries may be made against streaming data for analysis. Such conventional approaches are often less effective, for example because of the time lag between event detection and subsequent analysis. This may in turn expose hosts 110A-C and VMs 131-136 to security and performance risks.
Dynamic Event Processing for Network Diagnosis
According to examples of the present disclosure, dynamic event processing may be implemented to monitor packet flows at runtime. Examples of the present disclosure may be implemented for detecting events and analyzing them dynamically so that remediation action(s) may be performed substantially close to the time at which the events were detected. As used herein, the term “dynamic” may refer generally to the execution of event processing in real time or near real time. A related term is “runtime,” which may refer generally to a period of time during which a monitoring target (e.g., packet flow) is active. The term “dynamic” may also refer generally to the adaptive execution of event processing based on any suitable configuration of events, rules and signatures (to be discussed below). Such a dynamic approach should be contrasted against conventional approaches using fixed queries, which are usually non-modifiable and rely on static events.
In more detail,
At 210 in
As used herein, the term “event” may refer generally to an incident of interest associated with the runtime flow. The term “signature” may refer generally to pattern(s) of interest that may be derived from the set of multiple events. As network flows are created and terminated dynamically, multiple events may be associated with these flows during the lifetime of the flows. In practice, any suitable events may be detected for network diagnosis purposes, ranging from simple events (e.g., failed TCP handshake) to more complex ones (e.g., a variety of destinations in a connection). For example, block 230 may involve using a set of event maps to match the set of multiple events to the set of signatures in a more efficient manner.
At 220 in
As will be explained using
At 230, host-A 110A may perform a second stage of event processing by comparing (a) predefined characteristic information specified by the first signature against (b) runtime characteristic information associated with the runtime flow. The second signature may be disregarded or eliminated from further processing during the second stage. At 240, in response to detecting an issue based on the second stage of event processing, remediation action(s) may be performed. Any suitable “issue” may be detected at block 240, such as a security-related issue to support intrusion detection and/or prevention, a performance-related issue to facilitate resource optimization, etc.
According to examples of the present disclosure, the second signature (i.e., a partial match) may be eliminated from the second stage of dynamic event processing. Since the second stage involves comparison of characteristic information and usually takes up the bulk of the processing time, dynamic event processing may be performed in a more efficient manner. Depending on the desired implementation, examples of the present disclosure may be implemented to facilitate large-scale, compound event processing in a real-time manner.
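The two-stage approach may be sketched as follows. This is a simplified illustration under stated assumptions: events are modeled as strings, a signature is modeled as a set of required events plus a check on runtime characteristic information, and a signature is treated as a full match when all of its required events have been detected. The `Signature` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Signature:
    name: str
    required_events: FrozenSet[str]
    # Predefined characteristic check applied to runtime flow information.
    check: Callable[[Dict[str, int]], bool]


def first_stage(signatures: List[Signature],
                detected_events: Set[str]) -> List[Signature]:
    """Keep full matches only; partially matched signatures are eliminated."""
    return [s for s in signatures if s.required_events <= detected_events]


def second_stage(candidates: List[Signature],
                 runtime_info: Dict[str, int]) -> List[str]:
    """Run the costlier characteristic comparison on surviving signatures."""
    return [s.name for s in candidates if s.check(runtime_info)]
```

With detected events (A, C, D), a signature requiring (A, C) would proceed to the second stage, whereas a signature requiring (A, B) would be a partial match and its characteristic check would never be evaluated, which is where the saving in processing time arises.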
Dynamic Rule Configuration
According to examples of the present disclosure, mapping rules may be configured to process events in a more efficient manner. Each mapping rule may specify any suitable match fields to match a set of events to a signature. Some examples will be explained using
At 310 in
At 320 in
Using M=3 in
At 330 in
For example in
Using examples of the present disclosure, analysis of network flow information may be enriched with contextual information across the lifetime of packet flow(s). The contextual information may be represented as events, and based on dynamic rules (e.g., defined by security administrators), compound event processing may be performed in real time. The example framework described herein may be implemented to enhance event processing capabilities by adding support for compound events across packet flows through suitable definition of events 310, signatures 320 and mapping rules 330.
Dynamic Event Processing
According to examples of the present disclosure, mapping rules 331-333 in
(a) First Stage 401
At 410-415 in
At 420 in
At 425-430 in
In contrast, at 550 in
Depending on the desired implementation, block 425 may involve marking a runtime mask to represent whether events are detected, such as (1, 0, 1, 1, 0) for events (A, C, D), where index=1 for EVENT-1=A, index=3 for EVENT-3=C and index=4 for EVENT-4=D. The runtime mask may then be compared with a static mask defined using “mask( )” for each rule in
(b) Second Stage 402
At 450, 455 and 460 in
In more detail, at 560, a “filter” property may be configured to specify various filters for media access control (MAC) information, network layer information, transport layer information and application layer information. Layer-4 filters may specify attributes such as source IP address, destination IP address, source port number and destination port number. Layer-7 filters may specify attributes such as application ID, protocol, etc. This way, a particular signature may be matched against events or attributes from different layers of the networking stack. At 570, a “track by” property may be configured to instruct host-A 110A to track runtime packet flow(s) 510, such as on a per-source, per-destination, per-source-and-destination or per-flow basis, etc. At 580, “threshold,” “limit,” “count” and “time” properties may be configured to specify a minimum threshold, maximum threshold, counter and duration, respectively.
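The signature properties described above may be represented, for illustration, by a configuration structure such as the following. The property meanings follow the description, but the field names, the layer keys (e.g., "layer4", "layer7"), the "track by" values and the helper functions are assumptions made for this sketch rather than a format defined by the present disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SignatureConfig:
    name: str
    # "filter" property: per-layer attribute filters, keyed by a layer label.
    filters: Dict[str, Dict[str, object]] = field(default_factory=dict)
    # "track by" property: "source", "destination",
    # "source_and_destination" or "flow".
    track_by: str = "flow"
    threshold: Optional[int] = None  # minimum threshold
    limit: Optional[int] = None      # maximum threshold
    count: int = 0                   # counter
    time: float = 0.0                # duration, in seconds


def tracking_key(cfg: SignatureConfig, flow: Dict[str, object]) -> Tuple:
    """Build the key under which runtime state for a flow is tracked."""
    if cfg.track_by == "source":
        return (flow["src_ip"],)
    if cfg.track_by == "destination":
        return (flow["dst_ip"],)
    if cfg.track_by == "source_and_destination":
        return (flow["src_ip"], flow["dst_ip"])
    if cfg.track_by == "flow":
        return (flow["src_ip"], flow["src_port"],
                flow["dst_ip"], flow["dst_port"])
    raise ValueError(f"unknown track_by value: {cfg.track_by}")


def passes_filters(cfg: SignatureConfig,
                   attrs: Dict[str, Dict[str, object]]) -> bool:
    """Return True when every configured filter attribute matches the flow."""
    return all(
        attrs.get(layer, {}).get(attr) == want
        for layer, wanted in cfg.filters.items()
        for attr, want in wanted.items()
    )
```

A signature tracked per source and destination would, in this sketch, accumulate its counter under the `(src_ip, dst_ip)` key, so that flows between different endpoint pairs do not contribute to the same threshold.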
At 465-470 in
Although explained using three mapping rules 331-333 and signatures 321-323 in
More complex events may include: port-to-APP-ID mismatch (e.g., destination port==80 and APP ID!=HTTP); number of drop rule hits within one period (e.g., (L4 Drop>10000) || (L7 Drop>100) per destination, threshold=10 seconds); logins per second with small transactions (e.g., SQL.Transaction<3 and Login.Username is UNIQUE, threshold count=10, 10 seconds); and a high rate of mini flows (e.g., packet count per flow<20, per source/destination, threshold count=100, 10 seconds).
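As one illustration, the “high rate of mini flows” event may be detected using a sliding time window per tracked source. The sketch below uses the example values given above (packet count per flow<20, threshold count=100, 10 seconds) as defaults; the class name, the choice to track per source only, and the decision to report the event when the count reaches the threshold are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class MiniFlowDetector:
    """Flags a source that produces many small flows within a time window."""

    def __init__(self, max_packets: int = 20, threshold_count: int = 100,
                 window_seconds: float = 10.0) -> None:
        self.max_packets = max_packets
        self.threshold_count = threshold_count
        self.window_seconds = window_seconds
        # Timestamps of recent mini flows, kept per source.
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def observe_flow(self, source: str, packet_count: int, now: float) -> bool:
        """Record a flow; return True when the mini-flow rate reaches the threshold."""
        window = self._recent[source]
        if packet_count < self.max_packets:
            window.append(now)
        # Expire mini flows that have fallen out of the time window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold_count
```

A host could call `observe_flow` whenever a flow terminates, passing a monotonic timestamp, and treat a `True` result as the compound event to be handled in the second stage.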
According to examples of the present disclosure, any suitable mapping rules may be configured to detect compound events of different complexities from any suitable number of packet flows. Additional examples are shown in
During a first stage of event processing, host-A 110A may identify a first set of mapping rules (see 630 in
During a second stage of event processing, first signatures=(SIG-5 325, SIG-6 326) may be analyzed further. This stage is generally more resource-intensive and involves comparing (a) predefined characteristic information specified by SIG-5 325 and SIG-6 326 and (b) runtime characteristic information associated with runtime flows 611-612. In practice, flows 611-612 may be tracked for a period of time to identify any potential issues for network diagnosis purposes. Since corresponding (RULE-5 335, RULE-6 336) may be configured to define any suitable compound events, examples of the present disclosure may be implemented to facilitate dynamic compound event processing. This way, hosts 110A-C may perform event processing in a more efficient and reactive manner compared to conventional approaches that necessitate event processing by a remote entity (i.e., not by hosts 110A-C).
Container Implementation
Although explained using VMs, it should be understood that SDN environment 100 may include other virtual workloads, such as containers, etc. As used herein, the term “container” (also known as “container instance”) is used generally to describe an application that is encapsulated with all its dependencies (e.g., binaries, libraries, etc.). In the examples in
Computer System
The above examples can be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof. The above examples may be implemented by any suitable computing device, computer system, etc. The computer system may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc. The computer system may include a non-transitory computer-readable medium having stored thereon instructions or program code that, when executed by the processor, cause the processor to perform process(es) described herein with reference to
The techniques introduced above can be implemented in special-purpose hardwired circuitry, in software and/or firmware in conjunction with programmable circuitry, or in a combination thereof. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others. The term ‘processor’ is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array etc.
The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
Those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computing systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of this disclosure.
Software and/or firmware to implement the techniques introduced here may be stored on a non-transitory computer-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “computer-readable storage medium”, as the term is used herein, includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant (PDA), mobile device, manufacturing tool, any device with a set of one or more processors, etc.). A computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
The drawings are only illustrations of an example, wherein the units or procedure shown in the drawings are not necessarily essential for implementing the present disclosure. Those skilled in the art will understand that the units in the device in the examples can be arranged in the device in the examples as described, or can be alternatively located in one or more devices different from that in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.