Virtualization allows the abstraction and pooling of hardware resources to support virtual machines in a Software-Defined Networking (SDN) environment, such as a Software-Defined Data Center (SDDC). For example, through server virtualization, virtualized computing instances such as virtual machines (VMs) running different operating systems may be supported by the same physical machine (e.g., referred to as a “host”). Each VM is generally provisioned with virtual resources to run an operating system and applications. The virtual resources may include central processing unit (CPU) resources, memory resources, storage resources, network resources, etc. Using a plane separation architecture, the SDN environment may be divided into multiple planes having different functionalities. In practice, state inconsistencies between different planes may lead to incorrect network behavior, which is undesirable and affects network performance.
According to examples of the present disclosure, state consistency monitoring may be implemented to detect state inconsistencies that may lead to incorrect network behavior(s) in a network environment with plane separation architecture. One example may involve a computer system (e.g., witness system 210 in
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the drawings, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein. Although the terms “first” and “second” are used to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. For example, a first element may be referred to as a second element, and vice versa.
Each host 110A/110B/110C may include suitable hardware 112A/112B/112C and virtualization software (e.g., hypervisor-A 114A, hypervisor-B 114B, hypervisor-C 114C) to support various VMs. For example, hosts 110A-C may support respective VMs 131-136 (see also
Virtual resources are allocated to respective VMs 131-136 to support a guest operating system (OS) and application(s). For example, VMs 131-136 support respective applications 141-146 (see “APP1” to “APP6”). The virtual resources may include virtual CPU, guest physical memory, virtual disk, virtual network interface controller (VNIC), etc. Hardware resources may be emulated using virtual machine monitors (VMMs). For example in
Although examples of the present disclosure refer to VMs, it should be understood that a “virtual machine” running on a host is merely one example of a “virtualized computing instance” or “workload.” A virtualized computing instance may represent an addressable data compute node (DCN) or isolated user space instance. In practice, any suitable technology may be used to provide isolated user space instances, not just hardware virtualization. Other virtualized computing instances may include containers (e.g., running within a VM or on top of a host operating system without the need for a hypervisor or separate operating system or implemented as an operating system level virtualization), virtual private servers, client computers, etc. Such container technology is available from, among others, Docker, Inc. The VMs may also be complete computational environments, containing virtual equivalents of the hardware and software components of a physical computing system.
The term “hypervisor” may refer generally to a software layer or component that supports the execution of multiple virtualized computing instances, including system-level software in guest VMs that supports namespace containers such as Docker, etc. Hypervisors 114A-C may each implement any suitable virtualization technology, such as VMware ESX® or ESXi™ (available from VMware, Inc.), Kernel-based Virtual Machine (KVM), etc. The term “packet” may refer generally to a group of bits that can be transported together, and may be in another form, such as “frame,” “message,” “segment,” etc. The term “traffic” or “flow” may refer generally to multiple packets. The term “layer-2” may refer generally to a link layer or media access control (MAC) layer; “layer-3” to a network or Internet Protocol (IP) layer; and “layer-4” to a transport layer (e.g., using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), etc.), in the Open System Interconnection (OSI) model, although the concepts described herein may be used with other networking models. There are two versions of IP: IP version 4 (IPv4) and IP version 6 (IPv6) that will be discussed below.
Hypervisor 114A/114B/114C implements virtual switch 115A/115B/115C and logical distributed router (DR) instance 117A/117B/117C to handle egress packets from, and ingress packets to, corresponding VMs. In SDN environment 100, logical switches and logical DRs may be implemented in a distributed manner and can span multiple hosts. For example, logical switches that provide logical layer-2 connectivity, i.e., an overlay network, may be implemented collectively by virtual switches 115A-C and represented internally using forwarding tables 116A-C at respective virtual switches 115A-C. Forwarding tables 116A-C may each include entries that collectively implement the respective logical switches. Further, logical DRs that provide logical layer-3 connectivity may be implemented collectively by DR instances 117A-C and represented internally using routing tables 118A-C at respective DR instances 117A-C. Routing tables 118A-C may each include entries that collectively implement the respective logical DRs.
Packets may be received from, or sent to, each VM via an associated logical port. For example, logical switch ports 161-166 (see “LP1” to “LP6”) are associated with respective VMs 131-136. Here, the term “logical port” or “logical switch port” may refer generally to a port on a logical switch to which a virtualized computing instance is connected. A “logical switch” may refer generally to an SDN construct that is collectively implemented by virtual switches 115A-C in
To protect VMs 131-136 against security threats caused by unwanted packets, hypervisors 114A-C may implement firewall engines to filter packets. For example, distributed firewall engines 171-176 (see “DFW1” to “DFW6”) are configured to filter packets to, and from, respective VMs 131-136 according to firewall rules. In practice, network packets may be monitored and filtered according to firewall rules at any point along a datapath from a VM to corresponding physical NIC 124A/124B/124C. In one embodiment, a filter component (not shown) is incorporated into each VNIC 151-156 that enforces firewall rules that are associated with the endpoint corresponding to that VNIC and maintained by respective distributed firewall engines 171-176.
Through virtualization of networking services in SDN environment 100, logical networks (also referred to as overlay networks or logical overlay networks) may be provisioned, changed, stored, deleted and restored programmatically without having to reconfigure the underlying physical hardware architecture. A logical network may be formed using any suitable tunneling protocol, such as Virtual eXtensible Local Area Network (VXLAN), Stateless Transport Tunneling (STT), Generic Network Virtualization Encapsulation (GENEVE), etc. For example, VXLAN is a layer-2 overlay scheme on a layer-3 network that uses tunnel encapsulation to extend layer-2 segments across multiple hosts which may reside on different layer 2 physical networks.
SDN manager 180 and SDN controller 184 are example network management entities in SDN environment 100. One example of an SDN controller is the NSX controller component of VMware NSX® (available from VMware, Inc.) that operates on a central control plane. SDN controller 184 may be a member of a controller cluster (not shown for simplicity) that is configurable using SDN manager 180, which may be part of a manager cluster operating on a management plane. Network management entity 180/184 may be implemented using physical machine(s), VM(s), or both. Logical switches, logical routers, and logical overlay networks may be configured using SDN controller 184, SDN manager 180, etc. To send or receive control information, a local control plane (LCP) agent 119A/119B/119C on host 110A/110B/110C may interact with SDN controller 184 via control-plane channel 101A/101B/101C.
Hosts 110A-C may also maintain data-plane connectivity among themselves via physical network 104 to facilitate communication among VMs located on the same logical overlay network. Hypervisor 114A/114B/114C may implement a virtual tunnel endpoint (VTEP) (not shown) to encapsulate and decapsulate packets with an outer header identifying the relevant logical overlay network (e.g., using a VXLAN or “virtual” network identifier (VNI) added to a header field). For example in
Plane Separation Architecture
As used herein, the term “control plane” may refer generally to functions that manage the intents of network administrators, maintain the desired network topology, and define traffic routing. Depending on the desired implementation, the control plane may include both CCP 202 and LCP 203. The term “management plane” may refer generally to functions relating to management of various planes 201-204, including providing user interface(s) for managing and configuring various network entities, troubleshooting, diagnosis, etc. The term “data plane” may refer generally to functions that handle traffic forwarding along a datapath between two endpoints (e.g., VM1 131 on host-A 110A and VM2 132 on host-B 110B). Data plane entities 205A-C may include physical and/or logical forwarding entities (also known as “forwarders”), such as physical/logical port(s), physical/logical switch(es), physical/logical router(s), VNIC(s), PNIC(s), edge appliance(s), etc. In practice, an edge appliance may be a transport node that resides on both LCP 203 and DP 204. It should be noted that DP 204 may include network services, such as firewall, load balancer, service insertion, etc. These network services may be distributed services implemented by hypervisor 114A/114B/114C and/or centralized services implemented by an edge appliance.
Control-data plane separation, for example, provides various technical benefits. First, the network may be managed in a centralized manner, which reduces if not eliminates the complexity in configuring a network entity locally with awareness of configurations and states of adjacent network entities. Second, it allows the control plane and data plane to evolve and be developed independently, which provides better vendor neutrality and interoperability across the network. The control-data plane separation architecture has been widely adopted in various computer systems (i.e., not limited to SDN environment 100). In a simplified perspective, interactions between control and data planes may include (1) reporting realized state from the data plane to the control plane, and (2) enforcing desired states (which include user's intents and realized states of surrounding network entities) from the control plane to the data plane.
In practice, state inconsistency or discrepancy between two planes (e.g., control plane and data plane) may occur due to various reasons, such as communication errors, software issues, etc. Any discrepancy is undesirable because it may lead to incorrect behavior in SDN environment 100. Some major challenges in addressing such issues are summarized below. First, many of the issues may only be manifested with specific configurations and workloads. Second, the occurrence patterns of these issues are usually irregular and unpredictable before root causes are known, which makes it difficult to apply workarounds and/or collect necessary debugging information in a timely manner. Third, in some cases, workarounds may be unavailable, causing users to temporarily make changes to the network design or wait for a fix, which substantially hinders user experience. As such, state inconsistencies are undesirable in SDN environment 100.
State Consistency Monitoring
According to examples of the present disclosure, state consistency monitoring may be performed to detect state inconsistencies that may lead to incorrect network behavior(s) in a network environment (i.e., not limited to an SDN environment) with multiple planes. As used herein, the term “plane” or “network plane” may refer generally to a logical division of a network environment with an architecture that is logically divided or separated into multiple divisions. Each plane may be associated with one or more network entities residing on that plane, such as SDN manager(s) 180 residing on MP 201, SDN controller(s) 184 on CCP 202, LCP agents 119A-C on LCP 203 and physical/logical forwarding entities on DP 204 in
Depending on the desired implementation, any suitable computer system may be deployed to perform state consistency monitoring, such as a centralized witness system (see 210 in
The example in
At 310 in
As will be explained below, an “association chain” (denoted as Li) may be identified from an equivalence specification that includes a set of equivalence targets. The equivalence specification may be denoted as {([T1]pi, [T2]qi, Li)}. Here, each equivalence target ([T1]pi, [T2]qi, Li) specifies (a) [T1]pi=particular first field (pi) of a first table (T1) in the first state information, (b) [T2]qi=particular second field (qi) of a second table (T2) in the second state information and (c) Li=association chain between [T1]pi and [T2]qi. The association chain (Li) may define a mapping or relationship (e.g., binary relation) between [T1]pi and [T2]qi via zero or more intermediate fields (e.g., a third field in a third table). The intermediate field(s) may be part of the first or second state information. See 221-222 in
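For illustration only, the equivalence specification and its association chains may be captured using simple data structures. The following Python sketch assumes that fields are referenced by (table name, field name) pairs; the names FieldRef, AssociationChain, EquivalenceTarget and EquivalenceSpec are hypothetical and are not part of the present disclosure.

from dataclasses import dataclass
from typing import List, Tuple

FieldRef = Tuple[str, str]        # (table name, field name), e.g. ("T1", "p1")

@dataclass
class AssociationChain:
    # Ordered links; each link pairs a field of one table with a field of the
    # next table, e.g. [(("TA", "Q"), ("TC", "Q")), (("TC", "R"), ("TB", "R"))].
    links: List[Tuple[FieldRef, FieldRef]]

@dataclass
class EquivalenceTarget:
    first_field: FieldRef         # [T1]pi in the first state information
    second_field: FieldRef        # [T2]qi in the second state information
    chain: AssociationChain       # Li, with zero or more intermediate fields

# An equivalence specification E(T1, T2) is simply a set of such targets.
EquivalenceSpec = List[EquivalenceTarget]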
At 320 in
At 330 in
At 340 and 350 in
Using examples of the present disclosure, state inconsistencies between two network planes may be identified automatically by witness system 210 in real time such that appropriate remediation action(s) may be performed to address the state inconsistencies. This way, network performance may be improved by reducing the likelihood of incorrect network behavior(s) and system downtime. For the purposes of quality engineering, examples of the present disclosure may help to proactively identify product issues related to state discrepancy between multiple planes.
In the following, various examples will be described using centralized witness system 210 that operates independently from various network planes 201-204. Depending on the desired implementation, any suitable computer system that is capable of interacting with network planes 201-204 and processing state information may be configured to perform examples of the present disclosure. State consistency monitoring may be performed periodically (e.g., user-configurable interval) such that state inconsistencies may be detected and addressed in a real-time manner.
Example Control-Data Plane Separation Architecture
Similarly, the terms “first state information” and “second state information” may refer generally to state information associated with respective “first plane” and “second plane.” For example, first state information associated with CCP 202 or LCP 203 may be used as the source of truth of desired states, which are user intents and logical network configurations computed by controller(s). Second state information associated with DP 204 may be used as the source of runtime states, which are ephemeral states of network entities residing on DP 204.
In the example in
Controller 410 (denoted as C) on the control plane may include state collector 411 and remediation unit 412. State information collector 411 may be configured to collect state information 450 associated with controller 410, and remediation unit 412 to perform remediation action(s) based on notification(s) 480 from witness system 210. Similarly, each forwarder (Fm) 42m may include state information collector 43m and remediation unit 44m. Collector 43m may be configured to collect state information 45m associated with forwarder 42m, and remediation unit 44m to perform remediation action(s) based on notification(s) 48m from witness system 210. Each collector 411/43m may track state information using semantically equivalent state tables that are synchronized with state database 220 of witness system 210. In practice, any suitable number of controllers and forwarders may be deployed.
State information associated with a network entity (i.e., controller 410 or forwarder 42m) may be formulated as state table(s). The schemas of state tables involved in consistency checking may be predefined by the system and/or user(s). For example, first state information 450 may include state table(s) denoted as T1. Second state information 45m associated with each forwarder 42m may include state table(s) denoted as T2(Fm), where Fm denotes the mth forwarder and m∈[1, . . . , M]. In this case, state consistency monitoring may be performed for each pair of state tables (T1, T2).
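For illustration only, the following sketch shows how such state tables might be laid out in a relational state database; the SQLite backend, the table names and the columns (e.g., forwarder_id, switch_uuid, remote_vtep_ip) are assumptions for this example and are not mandated by the present disclosure.

import sqlite3

db = sqlite3.connect(":memory:")   # stands in for state database 220

# T1: controller-side desired state, keyed in part on a forwarder identifier.
db.execute("""CREATE TABLE T1_remote_switch_state (
                forwarder_id TEXT, switch_uuid TEXT, remote_vtep_ip TEXT,
                PRIMARY KEY (forwarder_id, switch_uuid, remote_vtep_ip))""")

# T2(Fm): realized state reported by one forwarder Fm.
db.execute("""CREATE TABLE T2_F1_remote_switch_state (
                switch_uuid TEXT, remote_vtep_ip TEXT,
                PRIMARY KEY (switch_uuid, remote_vtep_ip))""")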
Depending on the desired implementation, witness system 210 may be a centralized entity that includes (a) state database 220 to store state information, (b) consistency check unit 211 to perform state consistency monitoring by querying state database and (c) remediation dispatcher 212 to dispatch instruction(s) based on the result of state consistency monitoring. In practice, witness system 210 may be implemented as part of any suitable network intelligence platform, such as VMware NSX® Intelligence (available from VMware, Inc.), etc. This way, witness system 210 may (1) reuse the data processing infrastructure of the platform, as well as (2) operate independently from network entities (e.g., SDN manager 180, SDN controller 184 and transport nodes in the form of hosts 110A-C) to be monitored. Witness system 210 is applicable to any network environment that employs plane separation architecture.
For each pair of state tables (T1, T2), consistency check unit 211 (i.e., query generator) may generate and send queries to state database 220 (see 460 in
Results of the consistency check may be utilized to generate remediation action(s) to be dispatched and applied to the corresponding controller 410 and/or forwarder 42m. For example, in response to a determination that there is a state inconsistency, remediation dispatcher 212 may perform remediation action(s) by dispatching instruction(s), such as to implement the desired state, to remediation unit 44m associated with the relevant forwarder 42m. See 480 and 48m for 1≤m≤M. Using examples of the present disclosure, any state inconsistency between multiple planes (e.g., control plane and data plane) may be detected, and remediation action(s) performed in real time. Examples of the present disclosure may improve debuggability for state inconsistency issues because users may be prompted to collect debugging information timely when discrepancy begins to manifest.
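For illustration only, one monitoring cycle of witness system 210 may be sketched as follows; the callables check() and dispatch() are assumed stand-ins for consistency check unit 211 and remediation dispatcher 212, respectively, and are not part of the present disclosure.

import time

def monitoring_cycle(table_pairs, check, dispatch, interval_s=60):
    # Each entry of table_pairs is assumed to be (T1, T2, equivalence_spec, forwarder_id).
    while True:
        for (t1, t2, spec, forwarder_id) in table_pairs:
            stale, missing = check(t1, t2, spec)        # query state database per table pair
            if stale or missing:
                dispatch(forwarder_id, stale, missing)  # send remediation instruction(s)
        time.sleep(interval_s)                          # user-configurable interval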
Example Detailed Process and Formulations
(a) State Information
At 510 in
Definition 1: A state domain D=(L, ⊥, ⊤, ⊑) is a semilattice over a set of values L, where ⊑ is the partial order over L and ⊥, ⊤∈L such that ∀l∈L, ⊥⊑l⊑⊤. Here, ⊤ denotes ‘ANY’ and ⊥ denotes ‘NIL’ or ‘NULL.’ A state table T over state domains D1, . . . , Dn is defined as T⊆D1× . . . ×Dn. [T]i (where 1≤i≤n) denotes the ith column or field of T. The notation D[T]
Depending on the desired implementation, only a fragment of state information may be required to be modeled into state tables according to the requirements of state consistency monitoring. The schemas of state tables are predefined. Each state table has a primary key set on one or more fields; no foreign key is set because referential integrity is not required for state tables.
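For illustration only, a flat state domain in the sense of definition 1 may be sketched as follows, under the assumption that concrete values are mutually incomparable and lie between the NIL and ANY elements; the example value strings are hypothetical.

NIL, ANY = "\u22a5", "\u22a4"   # bottom (NIL/NULL) and top (ANY) elements of the domain

def leq(x, y):
    """Partial order of a flat domain: NIL <= v <= ANY for every value v."""
    return x == NIL or y == ANY or x == y

# Every value lies between the bottom and top elements; distinct values are incomparable.
assert leq(NIL, "vni-5001") and leq("vni-5001", ANY) and not leq("vni-5001", "vni-5002")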
For controller (C) 410, first state information 450 may be formulated as S1=set of state tables (each denoted as T1) that are stored in state database 220 of witness system 210. Similarly, for forwarder (Fm) 42m, second state information 45m may be formulated as SF
For each table T1, there may be a column/field of forwarder identifier as part of the primary key, which may be leveraged to filter the states to be enforced to specific forwarders. When a new configuration arrives at controller (C) 410, records with proper forwarder identifiers may be inserted into a state table according to the span of this configuration. For each n-ary state table T∈SF
(b) Association Chain(s)
At 520 in
Definition 2: Given state tables T1, . . . , Tk, an association chain denoted as L=<([T1]p1, [T2]q2), ([T2]p2, [T3]q3), . . . , ([Tk-1]pk-1, [Tk]qk)> may be defined for ([T1]p1, [Tk]qk), where domain D[Ti]pi=D[Ti+1]qi+1 for 1≤i<k. L defines a relation RL over D[T1]p1×D[Tk]qk such that (x, y)∈RL when (1) x is a value in [T1]p1 and y is a value in [Tk]qk, and (2) there exist entries t1∈T1, . . . , tk∈Tk such that x=[T1]p1(t1), y=[Tk]qk(tk) and [Ti]pi(ti)=[Ti+1]qi+1(ti+1) for 1≤i<k.
More generally, the association between [T1]p1 and [Tk]qk may be formulated into a graph. For simplicity, all associations between table fields may be regarded as linear chains. The following procedures and conclusions may be generalized for non-linear associations.
Definition 3: Given two state tables T1 and T2, an equivalence target for (T1, T2) may be denoted as ([T1]pi, [T2]qi, Li), where Li=association chain for fields [T1]pi and [T2]qi in respective state tables. An equivalence specification E(T1,T2) may be defined as a set of equivalence targets for (T1, T2). Using E(T1,T2)={([T1]pi, [T2]qi,Li)} for 1≤i≤k, state tables T1 and T2 may satisfy the equivalence specification if the following conditions are satisfied:
According to definition 2, an association chain may be defined for checking the consistency of two tables that do not have common fields but are associated with other state table(s). For example, consider table TA with fields=(P, Q) and table TB with fields=(R, S) that are associated with table TC with fields=(Q, R) according to association chain L=<([TA]Q, [TC]Q), ([TC]R, [TB]R)>. The association chain defines a binary relation between [TA]Q and [TB]R via intermediate fields [TC]Q and [TC]R. Based on statement (1) in definition 2, when (x, y) is in this binary relation, x is in [TA]Q and y in [TB]R. Based on statement (2), there is an entry (tA) from table TA, an entry (tB) from table TB and an entry (tC) from table TC such that (a) the field Q of tA is equivalent to the field Q of tC, and (b) the field R of tC is equal to the field R of tB.
According to definition 3, an equivalence target specifies that two table fields should be consistent over a given association chain. Continuing from the above example with association chain L=<([TA]Q, [TC]Q), ([TC]R, [TB]R)>, TA and TB satisfy equivalence specification E(TA, TB)={([TA]Q, [TB]R, L)} provided conditions (1) and (2) are satisfied. Based on condition (1), for an arbitrary value x in [TA]Q, there is a value y in [TB]R such that (x, y) is in the binary relation defined by the association chain. Based on condition (2), for an arbitrary value y in [TB]R, there is a value x in [TA]Q such that (x, y) is in the binary relation defined by the association chain.
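For illustration only, the binary relation induced by association chain L=<([TA]Q, [TC]Q), ([TC]R, [TB]R)> may be computed as follows; the table contents are hypothetical values used purely for this example.

TA = {("p1", "q1"), ("p2", "q2")}   # fields (P, Q)
TB = {("r1", "s1"), ("r2", "s2")}   # fields (R, S)
TC = {("q1", "r1"), ("q2", "r2")}   # fields (Q, R) associating TA with TB

# (x, y) is in R_L when entries tA, tC, tB agree on the intermediate fields.
R_L = {(q_a, r_b)
       for (_, q_a) in TA
       for (q_c, r_c) in TC if q_c == q_a
       for (r_b, _) in TB if r_b == r_c}

print(R_L)   # {('q1', 'r1'), ('q2', 'r2')} (set order may vary)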
Consider an example with two state tables T1, T2⊆D1×D2 that do not satisfy an equivalence target. For example, T1={(d11, d21), (d12, d22)} and T2={(d11, d21), (d12, d23)} where (d11, d12) are distinct elements in D1 and (d21, d22, d23) are distinct elements in D2. In this case, equivalence specification E(T1,T2) may include two equivalence targets in the form of ([T1]1, [T2]1, <([T1]1, [T2]1)>) and ([T1]2, [T2]2, <([T1]2, [T2]2)>). By definition, T1 and T2 do not satisfy E(T1,T2) because (1) for (d12, d22)∈T1, there is no t2∈T2 such that d12=[T2]1(t2) and d22=[T2]2(t2); and (2) for (d12, d23)∈T2, there is no t1∈T1 such that d12=[T1]1(t1) and d23=[T1]2(t1).
(c) Database Queries
At 530, 540 and 550 in
Without loss of generality, let T1 and T2 be the respective state tables for controller (C) 410 and a particular forwarder 42m. State consistency monitoring may involve querying state database 220 to perform a consistency check by comparing first field(s) in T1 with second field(s) in T2. The consistency check is performed to determine whether there is at least one state inconsistency between multiple planes. For example, using T1 as the source of truth, a state inconsistency may be stale information (denoted as Δ+T1) or missing information (denoted as Δ−T2) in T2.
At 542 in
Theorem: Given state tables (T1, . . . , Tk) and association chain L=<([T1]p1, [Tk1]u1), ([Tk1]v1, [Tk2]v2), . . . , ([Tkj-1]uj-1, [Tkj]vj-1), ([Tkj]uj, [T2]q1)> that defines an equivalence target ([T1]p1, [T2]q1, L). For t1∈T1 and tk∈Tk, consider the two following statements. A first statement (denoted P) is: T*=T1 [T
In a relational database, an inner join operation may refer to the joining of multiple tables to create a new table containing the rows that have matching values across the joined tables. After performing the inner join operation(s), X has the same columns as T1, T2 and possibly other tables involved in the association chain(s). Next, at line 7, a rename operation may be performed to rename [Tkj]vj-1 as [T2]q1 based on the association chain (Li). The result of the rename operation may be projected on the columns/fields of T2 such that the projected table is ready for comparison with T2. This way, a state inconsistency (if any) may be identified at line 11 (i.e., stale information) and line 13 (i.e., missing information) of the query procedure (see 542 in
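For illustration only, the join-rename-project-compare steps described above may be expressed as SQL against the state database; the single intermediate table TC, the column names and the SQLite backend are assumptions for this sketch rather than the actual query procedure of the disclosure.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
  CREATE TABLE T1 (p1 TEXT);
  CREATE TABLE T2 (q1 TEXT);
  CREATE TABLE TC (q TEXT, r TEXT);   -- intermediate table of the association chain
""")

# Inner join along the chain, rename the final chain field as q1, and project
# onto the columns of T2 so the result is ready for comparison with T2.
projected = "SELECT TC.r AS q1 FROM T1 JOIN TC ON T1.p1 = TC.q"

stale   = db.execute(f"SELECT q1 FROM T2 EXCEPT {projected}").fetchall()   # in T2 but not expected
missing = db.execute(f"{projected} EXCEPT SELECT q1 FROM T2").fetchall()   # expected but absent from T2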
(d) State Inconsistencies
In more detail, at 550 in
Consider an example with two state tables T1, T2⊆D1×D2, where T1={(d11, d21), (d12, d22)}, T2={(d11, d21), (d12, d23)}. Here, (d11, d12) are distinct elements in D1 and (d21, d22, d23) are distinct elements in D2. In this case, equivalence specification E(T1, T2) may include ([T1]1, [T2]1, <([T1]1, [T2]1)>) and ([T1]2, [T2]2, <([T1]2, [T2]2)>). Let (Δ+T2, Δ−T2) be the output of a database query procedure QUERY(E(T1, T2), T1). In this case, stale information Δ+T2={(d12, d23)} and missing information Δ−T2={(d12, d22)} may be detected.
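The same result can be reproduced with plain set operations because each association chain here is a direct (single-link) chain over both columns, so the projection of T1 is T1 itself; this is only a quick sketch of the example above.

T1 = {("d11", "d21"), ("d12", "d22")}   # source of truth
T2 = {("d11", "d21"), ("d12", "d23")}   # realized state
stale, missing = T2 - T1, T1 - T2       # stale = Δ+T2, missing = Δ−T2
print(stale, missing)                   # {('d12', 'd23')} {('d12', 'd22')}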
Using QUERY(E(T1,T2), T1) at 542 in
(e) Remediation Action(s)
At 560 in
For stale information (Δ+T2), remediation dispatcher 212 may generate and send a request to remediation unit 44m of forwarder (Fm) 42m to address the state inconsistency, such as by updating its state information to remove the stale information. For missing information (Δ−T2), remediation dispatcher 212 may generate and send a request to remediation unit 44m of associated forwarder (Fm) 42m to add the missing information. The remediation request(s) may be implementation-specific, such as updating a configuration store, invoking input/output control (ioctl) command(s) to the kernel, etc. Specific sequences for applying remediation to different state tables may be required, subject to the dependencies among the state tables.
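For illustration only, dispatching such remediation requests may be sketched as follows; the remediation_unit.send() call and the request format are assumptions, since the actual request(s) are implementation-specific as noted above.

def dispatch_remediation(remediation_unit, stale, missing):
    # Ask the forwarder's remediation unit to remove stale entries and add missing ones.
    for entry in stale:
        remediation_unit.send({"action": "remove", "entry": entry})
    for entry in missing:
        remediation_unit.send({"action": "add", "entry": entry})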
Depending on the desired implementation, a user may be notified of any state inconsistency via an alarm, Simple Network Management Protocol (SNMP) trap, email bot, etc. The remediation request(s) may be dispatched to the relevant network entity or entities to apply remediation automatically. Any other remediation action(s) may be performed, such as executing a runbook that diagnoses certain unhealthy symptoms, generating a log bundle immediately when the system is still in a problematic state, etc. The user may be asked to determine appropriate action(s) to be taken, such as whether to apply the remediation automatically, whether to execute the runbook, whether to generate the log bundle, etc.
At 570 in
Using the example in
At 610 in
At 640, 645 and 650 in
At 660 in
Based on the result of the state consistency check, witness system 210 may perform any suitable remediation action(s) to address state inconsistencies 661-662. In practice, a remediation action to address a state inconsistency (e.g., add missing information or remove stale information) may be performed in an implementation-specific manner, such as updating a configuration store, invoking an ioctl command to kernel space, etc. Specific sequences for applying remediation to different state tables may be required, subject to the dependencies among the state tables.
In the example in
In practice, witness system 210 may generate and send a notification to a user (e.g., network administrator) to raise an alarm about any state inconsistency detected. Depending on the desired implementation, the remediation request may be sent to the relevant network entity after obtaining the user's approval. The latter approach may reduce the likelihood of any undesirable side effects caused by the remediation request. In this case, the user has the opportunity to approve or reject the suggested remediation request based on their review.
The first state information from LCP 203 may include table A (see 720) with fields=(LOCAL-REMOTE, LOCAL SPAN, REMOTE SPAN). Here, the “LOCAL-REMOTE” field may specify IP addresses of respective local and remote VTEPs. The “LOCAL SPAN” and “REMOTE SPAN” fields may each specify a universal unique identifier (UUID) of either a logical switch or routing domain. For example, when (local span=S1, remote span=S2), S1 and S2 may communicate via a logical overlay tunnel that is established between a pair of local and remote IP addresses in the “LOCAL-REMOTE” field.
The second state information from DP 204 may include table B (see 730) with fields=(LOCAL-REMOTE, REMOTE SPAN). Similarly, the “LOCAL-REMOTE” field may specify IP addresses of respective local and remote VTEPs. The “REMOTE SPAN” field may denote the VNI of a logical switch or routing domain that uses the VTEP specified by a remote IP address in the “LOCAL-REMOTE” field for overlay networking. During state consistency monitoring, witness system 210 may check the consistency of the “REMOTE SPAN” field in tables A and B. However, it is not possible to perform the check directly because DP 204 uses VNI to represent a logical switch in its “REMOTE SPAN” field, whereas LCP 203 uses the UUID of a logical switch.
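For illustration only, the indirect check may be sketched as follows for the logical switch case, assuming a logical switch definition table that maps each UUID to its VNI serves as the intermediate table of the association chain; all values shown are hypothetical.

table_a     = {(("10.0.0.1", "10.0.0.2"), "uuid-ls-1")}   # LCP: (LOCAL-REMOTE, REMOTE SPAN as UUID)
table_b     = {(("10.0.0.1", "10.0.0.2"), 5001)}          # DP:  (LOCAL-REMOTE, REMOTE SPAN as VNI)
switch_defs = {"uuid-ls-1": 5001}                         # logical switch definition: UUID -> VNI

# Project table A into DP terms by following the chain through the switch definitions.
projected = {(tunnel, switch_defs[uuid]) for (tunnel, uuid) in table_a if uuid in switch_defs}

stale_in_b   = table_b - projected   # reported by DP but not expected by LCP (empty here)
missing_in_b = projected - table_b   # expected by LCP but not realized on DP (empty here)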
At 710 in
At 750, 755 and 760 in
At 770 in
Experimental Evaluation
A prototype has been developed to evaluate state consistency monitoring according to examples of the present disclosure, such as to check the consistency of desired states related to L2 networking between LCP 203 and DP 204. Using the prototype, the following research questions (RQs) are considered: (RQ1) how effectively and efficiently does the prototype identify and fix state consistency issues, and (RQ2) what is the overhead introduced by the prototype?
To answer RQ1, the prototype may be evaluated with (a) an artificial benchmark and (b) a real-world benchmark. The artificial benchmark may be generated by a script that randomly adds/removes one or more configurations (e.g., in a vdl2 kernel module associated with DP 204) and then checks whether consistency is restored within a specified timeout. For the real-world benchmark, the prototype may be evaluated with two state inconsistency situations caused by two product issues. To answer RQ2, the CPU and memory usages of a host may be computed in the cases where the state consistency check is enabled or disabled, with respect to logical topologies of different scales.
Some examples will be discussed using
(a) States of Interest
The following desired states related to L2 networking are considered: (1) logical switch definition (which includes properties of a logical switch, such as name, replication mode, VNI and routing domain ID), (2) remote logical switch state (which includes MAC/IP information of connected vNICs and VTEPs chosen by a given logical switch in a remote host), (3) remote routing domain state (which includes VTEPs chosen by a given routing domain in a remote host), (4) Bidirectional Forwarding Detection (BFD) table (which includes pairs of local and remote VTEPs to have overlay tunnels established, and references to each tunnel by local/remote logical switches/routing domains). The equivalence specification is defined for the consistency of remote logical switch states, remote routing domain states and BFD tables between LCP and DP. The source of truth is on LCP. All the involved associations of table fields may be linear.
(b) Remediation
Any remediation on the remote state of a logical switch or routing domain may be applied prior to the remediation on the BFD table when they are in the same batch, because updating the BFD table requires correct information on remote logical switches and routing domains.
(c) Artificial Benchmark
The testing script has three variable parameters: the number of test cases, the edit distance (which refers to the number of added/removed configurations from the vdl2 kernel module for each case) and the timeout for the restoration of consistency. 4 groups of experiments are designed for edit distances 1-4, while the number of test cases is set to 16 and the timeout is set to 100 s for all the groups. Each case may involve mixed stale and missing configurations in the vdl2 kernel module. Between two adjacent cases, the script sleeps for 50 s to make sure the configuration converges before the subsequent case starts. If one case fails, the remaining cases in the group will be skipped because state consistency is required prior to each test run. For this experiment, each server has 6 VMs deployed, every two of which connect to one logical switch, and all 3 logical switches are connected to one T0 logical router.
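For illustration only, the benchmark driver may be sketched as follows; perturb_vdl2() and is_consistent() are assumed hooks into the test bed and are not part of the actual testing script.

import time

def run_group(perturb_vdl2, is_consistent, num_cases=16, edit_distance=1,
              timeout_s=100, settle_s=50):
    for case in range(num_cases):
        perturb_vdl2(edit_distance)            # randomly add/remove vdl2 configurations
        deadline = time.time() + timeout_s
        while not is_consistent():             # wait for consistency to be restored
            if time.time() > deadline:
                print(f"case {case} failed; skipping the remaining cases in this group")
                return False
            time.sleep(1)
        time.sleep(settle_s)                   # let the configuration converge before the next case
    return True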
The results are listed in Table 1 (see 810 in
(d) Real-World Benchmark
Two product issues listed in Table 2 (see 820 in
As Table 2 in
(e) Runtime Overhead
Runtime overhead may be measured by comparing average CPU/memory metrics over 5 minutes (e.g., using the esxtop utility) when state consistency monitoring is enabled or disabled on all components. The experiment is conducted using an example topology where (1) there are n (e.g., n=1, 5, 10, 25, 50) logical switches connected to one T0 logical router; (2) each server has 2n VMs deployed and every two connect to one logical switch.
Table 3 (see 830 in
(f) Threats to Validity
First, the overhead of consistency checking generally grows with the number of states, state tables and equivalence targets. The experiment data does not provide much insight on this aspect because the prototype only supports consistency checking between LCP and DP with respect to a limited set of state kinds. One candidate solution to this challenge is to leverage multiple workers in witness system 210 to perform consistency checking in parallel. Second, remediation may have non-negligible latency when the whole system is in an unstable or overloaded state. If the remediation is not realized within a predetermined interval (τ), the unremedied discrepancy may still be captured by the next working cycle.
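For illustration only, the candidate solution of parallel checking may be sketched as follows; check_pair() is an assumed callable that wraps the query procedure for one pair of state tables.

from concurrent.futures import ThreadPoolExecutor

def check_all_pairs(check_pair, table_pairs, max_workers=4):
    # Spread independent consistency checks across a pool of workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check_pair, table_pairs))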
Container Implementation
Although discussed using VMs 131-136, it should be understood that state consistency monitoring may be performed for other virtualized computing instances, such as containers, etc. The term “container” (also known as “container instance”) is used generally to describe an application that is encapsulated with all its dependencies (e.g., binaries, libraries, etc.). For example, multiple containers may be executed as isolated processes inside VM1 131, where a different VNIC is configured for each container. Each container is “OS-less”, meaning that it does not include any OS that could weigh 10s of Gigabytes (GB). This makes containers more lightweight, portable, efficient and suitable for delivery into an isolated OS environment. Running containers inside a VM (known as the “containers-on-virtual-machine” approach) not only leverages the benefits of container technologies but also that of virtualization technologies.
Computer System
The above examples can be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof. The above examples may be implemented by any suitable computing device, computer system, etc. The computer system may include processor(s), memory unit(s) and physical NIC(s) that may communicate with each other via a communication bus, etc. The computer system may include a non-transitory computer-readable medium having stored thereon instructions or program code that, when executed by the processor, cause the processor to perform processes described herein with reference to
The techniques introduced above can be implemented in special-purpose hardwired circuitry, in software and/or firmware in conjunction with programmable circuitry, or in a combination thereof. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and others. The term ‘processor’ is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array etc.
The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
Those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computing systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one of skill in the art in light of this disclosure.
Software and/or firmware to implement the techniques introduced here may be stored on a non-transitory computer-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “computer-readable storage medium”, as the term is used herein, includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant (PDA), mobile device, manufacturing tool, any device with a set of one or more processors, etc.). A computer-readable storage medium may include recordable/non-recordable media (e.g., read-only memory (ROM), random access memory (RAM), magnetic disk or optical storage media, flash memory devices, etc.).
The drawings are only illustrations of an example, wherein the units or procedure shown in the drawings are not necessarily essential for implementing the present disclosure. Those skilled in the art will understand that the units in the device in the examples can be arranged in the device in the examples as described or can be alternatively located in one or more devices different from that in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.
The present application claims the benefit of Patent Cooperation Treaty (PCT) Application No. PCT/CN2022/097519, filed Jun. 8, 2022, which is incorporated herein by reference.