Identifying and remediating anomalies in a self-healing network

Information

  • Patent Grant
    12034587
  • Patent Number
    12,034,587
  • Date Filed
    Monday, March 27, 2023
  • Date Issued
    Tuesday, July 9, 2024
Abstract
Some embodiments of the invention provide a method of remediating anomalies in an SD-WAN implemented by multiple forwarding elements (FEs) located at multiple sites connected by the SD-WAN. The method is performed iteratively. The method receives multiple performance metrics that over a duration of time express a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration. The method uses the received performance metrics to update generated weight values for a topology graph that includes (1) multiple nodes representing the multiple FEs and (2) multiple edges between the multiple nodes representing paths traversed between the FEs by the flows associated with the particular application, said generated weight values associated with said paths. The method uses a topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows. For an identified anomaly, the method implements a remedial action to modify the SD-WAN in order to remediate the identified anomaly.
Description
BACKGROUND

Today, a software-defined wide-area network (SD-WAN) allows enterprises to build flexible WANs using programmable network components. The decoupling of the control plane and data plane in a software-defined architecture enables sophisticated Artificial Intelligence for IT Operations (AIOps) platforms to continually monitor and program the network to respond to higher level security and application considerations. For example, AIOps systems can look at granular flow metadata coupled with global underlay and overlay topology data to reliably detect problems and apply remedial control actions in an analytics-controlled feedback loop.


As workforces become more distributed and applications migrate to multiple clouds, networks become critical components of enterprise productivity. As such, issues on LANs (local area networks), WANs, datacenters, and/or within applications themselves have a direct impact on end-user application performance, such as a network router on a flow path suddenly experiencing high packet drops on its outgoing links and causing the end-to-end application to slow down.


BRIEF SUMMARY

Some embodiments of the invention provide a method of detecting and autonomously remediating anomalies in a self-healing SD-WAN (software-defined wide-area network) implemented by multiple forwarding elements (FEs), each of which is located at one of multiple sites connected by the SD-WAN. The method of some embodiments is performed by an anomaly detection and remediation system for the SD-WAN. In some embodiments, the anomaly detection and remediation system includes one or more anomaly detection and remediation processes executed by one or more machines in a cluster. The one or more machines, in some embodiments, include one or more host computers. Also, in some embodiments, the anomaly detection and remediation system is implemented as part of an ENI (Edge Network Intelligence) platform.


From the multiple FEs, the method of some embodiments receives multiple sets of flow data associated with application traffic that traverses the multiple FEs (e.g., edge routers, gateway routers, and hub routers). The method uses a first set of machine-trained processes to analyze the multiple sets of flow data in order to identify at least one anomaly associated with at least one particular FE. The method then uses a second set of machine-trained processes to identify at least one remedial action for remediating the identified anomaly. The method implements the identified remedial action by directing an SD-WAN controller deployed in the SD-WAN to implement the identified remedial action.


In some embodiments, the anomaly detection and remediation system includes multiple sub-systems that execute various processes to detect and remediate anomalies. These sub-systems, in some embodiments, include a data ingestion system, an analytics system, and a control action system. The data ingestion system, of some embodiments, receives the flow data from the FEs that implement the SD-WAN and parses the received flow data into a data structure used internally by the anomaly detection and remediation system. The analytics system receives the parsed flow data from the ingestion system, in some embodiments, aggregates the data, computes performance scores from the data, and uses one or more anomaly detection machine-trained processes to analyze the computed performance scores to identify any anomalies. The control action system of some embodiments receives identified anomalies, uses one or more machine-trained processes to identify remedial actions to obviate the identified anomalies, and sends API calls to the SD-WAN controller to direct the SD-WAN controller to implement the remedial actions in the SD-WAN.


The received flow data, in some embodiments, is associated with multiple applications. In some embodiments, each received set of flow data includes a five-tuple identifier for the associated flow, an application identifier associated with the flow, and a protocol associated with the flow. The flow data, in some embodiments, also includes a set of flow statistics associated with the flow, an overlay route type associated with the flow, a next-hop overlay node for the flow, and a destination hop overlay node associated with the flow. In some embodiments, the set of flow statistics includes an amount of TX bytes associated with the flow, an amount of RX bytes associated with the flow, TCP latency associated with the flow, and a number of TCP re-transmissions associated with the flow.
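

To make the flow-data fields listed above concrete, the following Python sketch shows one possible in-memory representation of a single flow record; the field names and types are illustrative assumptions, not the data structure actually used by the described embodiments.

```python
# Illustrative sketch of one flow record; field names and types are assumptions.
from dataclasses import dataclass

@dataclass
class FlowRecord:
    # Five-tuple identifier of the flow
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    # Application and overlay routing context
    app_id: str
    overlay_route_type: str   # e.g., "direct", "edge-gateway", "edge-hub-edge"
    next_hop_node: str
    dest_hop_node: str
    # Flow statistics for the reporting interval
    tx_bytes: int
    rx_bytes: int
    tcp_latency_ms: float
    tcp_retransmissions: int
```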


In some embodiments, the analytics system aggregates the flow data by performing a time aggregation operation on the flow data at a first granularity in order to generate a set of aggregated flow data. For each FE of the multiple FEs, the analytics system of some embodiments uses the set of aggregated flow data to generate a set of performance scores at a second granularity. In some embodiments, the first granularity is a per-minute granularity, and the second granularity is a per-application granularity such that for each FE, each performance score in the set of performance scores for the FE corresponds to a particular minute of time (i.e., in a duration of time over which the flow data was collected) and a particular application. In some embodiments, the performance scores are per-edge, per-application, and per-path.


The machine-trained processes used to analyze the performance scores, in some embodiments, are part of a set of anomaly detection processes that detect anomalies at a shorter timescale (e.g., 30 minutes) and at a longer timescale (e.g., two weeks). In some embodiments, the shorter timescale anomaly detection process identifies, for each particular FE of the multiple FEs, performance scores associated with each application of multiple applications for which the particular FE forwards traffic flows.


For each particular application, the shorter timescale anomaly detection process generates a distribution graph (e.g., a Gaussian distribution curve) that shows the identified performance scores associated with the particular application for the particular FE over a first duration of time. The shorter timescale anomaly detection process then analyzes the generated distribution graphs using a machine-trained process (e.g., a sliding window Gaussian outlier detection process) to identify one or more per-application incidents by identifying that a threshold number of performance scores associated with the particular application (1) are outliers with respect to the generated distribution graph for the particular application and (2) occurred within a second duration of time.


In some embodiments, each generated distribution graph represents a distribution of the set of performance scores for the particular application over the first duration of time (e.g., 30 minutes). To generate the distribution graph, in some embodiments, the shorter timescale anomaly detection process computes a sample mean of the performance scores for the first duration of time and a standard deviation of the performance scores for the first duration of time. In some embodiments, the computed sample mean and standard deviation are dynamic parameters that change over time based on the generated performance scores. As such, each distribution graph generated for a particular FE varies compared to each other distribution graph generated for the particular FE as the performance scores computed for different durations of time affect the dynamic parameters used to generate the distribution graphs.


In order to identify that a threshold number of performance scores associated with the particular application are outliers with respect to the generated distribution graph for the particular application, some embodiments use the dynamic parameters to determine whether a threshold number of performance scores in the set of performance scores for the particular application exceed a specified threshold of performance. In some embodiments, the first and second durations of time are different durations of time (e.g., different 30 minute time windows), while in other embodiments, the first and second durations of time are the same duration of time (e.g., the same 30 minute time window). In still other embodiments the second duration of time is a subset of the first duration of time (e.g., a 5 minute subset of time within the 30 minute window).
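

One way to realize the sliding-window outlier check described above is sketched below in Python; the window length, z-score threshold, and minimum outlier count are illustrative assumptions rather than values prescribed by the description.

```python
import statistics

def detect_incident(scores, window=30, recent=5, z_thresh=3.0, min_outliers=3):
    """Flag a per-application incident when enough of the most recent
    per-minute scores are outliers with respect to the Gaussian fit of the
    current window (a sketch with illustrative parameters)."""
    window_scores = scores[-window:]            # e.g., last 30 per-minute scores
    if len(window_scores) < window:
        return False                            # not enough data yet
    mean = statistics.mean(window_scores)       # dynamic sample mean
    stdev = statistics.pstdev(window_scores)    # dynamic standard deviation
    if stdev == 0:
        return False
    # Count outliers that occurred within the most recent sub-window
    outliers = sum(1 for s in window_scores[-recent:]
                   if abs(s - mean) / stdev > z_thresh)
    return outliers >= min_outliers
```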


In some embodiments, the longer timescale anomaly detection process is performed iteratively. The longer timescale anomaly detection process of some embodiments receives multiple performance scores that over a duration of time (e.g., two weeks) express a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration. The longer timescale anomaly detection process uses the received performance scores, in some embodiments, to update generated weight values for a topology graph that includes (1) multiple nodes representing the multiple FEs and (2) multiple edges between the multiple nodes representing paths traversed between the FEs by the flows associated with the particular application, with the generated weight values being associated with said paths.
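

The topology graph described above can be represented with a small amount of code; the sketch below uses the networkx library and a simple exponential-smoothing update for the per-path weights, both of which are assumptions made for illustration.

```python
import networkx as nx

def update_topology_graph(graph, path_scores, alpha=0.2):
    """Blend newly received per-path performance scores into the edge weights
    of a per-application topology graph (an illustrative sketch)."""
    for (src_fe, dst_fe), score in path_scores.items():
        if graph.has_edge(src_fe, dst_fe):
            old = graph[src_fe][dst_fe]["weight"]
            graph[src_fe][dst_fe]["weight"] = (1 - alpha) * old + alpha * score
        else:
            graph.add_edge(src_fe, dst_fe, weight=score)
    return graph

# Nodes are FEs; edges are the overlay paths used by one application's flows.
g = nx.Graph()
g = update_topology_graph(g, {("edge-1", "gateway-1"): 72.0,
                              ("gateway-1", "hub-1"): 64.5})
```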


The longer timescale anomaly detection process of some embodiments uses a topology-based machine-trained process to analyze the topology graph with the updated generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows. For an identified anomaly, the longer timescale anomaly detection process implements a remedial action to modify the SD-WAN in order to remediate the identified anomaly (e.g., by sending an API call identifying the remedial action to an SD-WAN controller), according to some embodiments.


In some embodiments, when using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly, the longer timescale anomaly detection process also determines whether the identified anomaly is isolated to a particular FE (e.g., isolated to a particular edge FE or due to a particular transit FE), or affects the overall application. For instance, in some embodiments, the identified anomaly is a network impairment on a first transit FE that is a next-hop FE for application traffic associated with the particular application and forwarded by a first edge FE located at a first branch site. In some such embodiments, the identified remedial action includes updating a transit FE order configuration for the first edge FE to change the next-hop transit FE for application traffic associated with the particular application and forwarded by the first edge FE from the first transit FE to a second transit FE.


The first transit FE, in some embodiments, is also a next-hop transit FE for application traffic associated with the particular application and forwarded by a second edge FE located at a second branch site. In some such embodiments, the identified anomaly is also associated with the second edge FE when application traffic for the particular application and forwarded by the second edge FE is also affected by the first transit FE's anomalous behavior. The identified remedial action, in some embodiments, is to update transit FE orders for both the first edge FE and the second edge FE.


In some embodiments, the particular application is a first application and the first transit FE is also a next-hop transit FE for application traffic associated with a second application and forwarded by the first edge FE. In some such embodiments, the identified remedial action includes updating the transit FE order configuration for the first edge FE to change the next-hop transit FE from the first transit FE to the second transit FE for application traffic associated with both the first and second applications. When the network impairment only affects traffic associated with the first application, the transit FE order configuration for the particular edge FE is only updated for traffic associated with the first application and not the second application, according to some embodiments.
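

The per-application scoping of the transit FE order update described above can be expressed as a small selection step; the helper below is a hypothetical sketch whose names and data shapes are not taken from the description.

```python
def build_transit_order_updates(edge_fe, impaired_transit, alternate_transit,
                                apps_using_transit, impaired_apps):
    """Return per-application next-hop updates for one edge FE, re-homing only
    the applications actually affected by the impairment (a sketch)."""
    updates = {}
    for app in apps_using_transit:
        if app in impaired_apps:
            updates[app] = {"edge": edge_fe,
                            "old_next_hop": impaired_transit,
                            "new_next_hop": alternate_transit}
    return updates

# If the impairment affects only app-1, app-2 keeps its current transit FE.
print(build_transit_order_updates("edge-1", "gateway-1", "gateway-2",
                                  ["app-1", "app-2"], {"app-1"}))
```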


When an identified anomaly is determined to require remediation to improve performance of a set of one or more flows, in some embodiments, the control action system mentioned above is utilized for identifying and implementing one or more remedial actions that modify the SD-WAN. For a particular anomaly, the control action system, in some embodiments, identifies a set of two or more remedial actions for remediating the particular anomaly in the SD-WAN.


For each identified remedial action in the set, the control action system selectively implements the identified remedial action for a subset of the set of flows for a duration of time in order to collect a set of performance metrics associated with SD-WAN performance during the duration of time for which the identified remedial action is implemented. Based on the collected sets of performance metrics, the control action system of some embodiments uses a machine-trained process to select one of the identified remedial actions as an optimal remedial action to implement for all of the flows in the set, and uniformly implements the selected remedial action for all of the flows in the set.
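

The trial-and-select behavior described above can be summarized in a few lines; the sketch below divides the flows into subsets, measures each candidate action, and rolls out the best one, with apply_action and measure_score standing in for the controller API calls and analytics queries (all names are illustrative assumptions).

```python
def choose_remedial_action(flows, candidate_actions, apply_action, measure_score):
    """Trial each candidate remedial action on a disjoint subset of flows,
    then uniformly apply the best-scoring action to all flows (a sketch)."""
    subsets = [flows[i::len(candidate_actions)]
               for i in range(len(candidate_actions))]
    scores = {}
    for action, subset in zip(candidate_actions, subsets):
        apply_action(action, subset)             # e.g., re-route this subset only
        scores[action] = measure_score(subset)   # performance observed during trial
    best = max(scores, key=scores.get)
    apply_action(best, flows)                    # roll the winner out to all flows
    return best
```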


The particular anomaly, in some embodiments, is an increased latency associated with a first transit FE that forwards application data traffic between one or more edge FEs located at one or more branch sites connected by the SD-WAN and one or more applications deployed to a first cloud datacenter connected by the SD-WAN. The set of two or more remedial actions, in some embodiments, includes two or more alternate routes through the SD-WAN to the particular application. For example, the alternate routes of some embodiments include at least (1) a first alternate route between the one or more edge FEs and the one or more applications deployed to a second cloud datacenter connected to the SD-WAN via a second transit FE, and (2) a second alternate route between the one or more edge FEs and the one or more applications deployed to a third cloud datacenter connected to the SD-WAN via a third transit FE.


In some embodiments, the control action system selectively implements each identified remedial action (e.g., each identified alternate path) for a subset of the set of flows for the duration of time by directing (e.g., via an API call to the SD-WAN controller specifying the remedial action) the one or more edge FEs to use the second transit FE to forward a first subset of flows to the one or more applications deployed to the second cloud datacenter, and directing the one or more edge FEs to use the third transit FE to forward a second subset of flows to the one or more applications deployed to the third cloud datacenter. The one or more edge FEs of some embodiments continue to use the first transit FE to forward a remaining third subset of flows to the one or more applications deployed to the first cloud datacenter.


As the performance metrics are collected for each selectively implemented remedial action, the control action system of some embodiments receives (e.g., from the analytics system described above) or itself computes a performance score for each remedial action. When a performance score generated for the first alternate route is higher than a performance score generated for the second alternate route (i.e., is more optimal), in some embodiments, the machine-trained process of the control action system selects the first alternate route to implement for the set of flows. Alternatively, when the performance score generated for the second alternate route is higher than the performance score generated for the first alternate route, the machine-trained process of the control action system selects the second alternate route to implement for the set of flows. The control action system of some embodiments then sends an API call to the SD-WAN controller to direct the SD-WAN controller to update configurations for the one or more edge FEs to cause the edge FEs to use the selected remedial action (e.g., selected alternate path) for all flows in the set.


The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, the Detailed Description, the Drawings, and the Claims is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, the Detailed Description, and the Drawings.





BRIEF DESCRIPTION OF FIGURES

The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.



FIG. 1 conceptually illustrates a schematic diagram of a self-healing SD-WAN overlay network architecture of some embodiments.



FIG. 2 conceptually illustrates a block diagram that includes an analytics machine cluster of some embodiments.



FIG. 3 conceptually illustrates a block diagram of interactions of some embodiments between a control action system of an ENI platform deployed in an SD-WAN and an SD-WAN controller for the SD-WAN.



FIG. 4 illustrates a list of the above-mentioned API calls and the responses to these API calls, in some embodiments.



FIG. 5 conceptually illustrates a more detailed block diagram of an ENI platform of some embodiments that includes separate anomaly detectors for the shorter and longer timescales described above and an in-depth view of the shorter timescale anomaly detector.



FIG. 6 illustrates a graph of some embodiments that includes an example of a Gaussian distribution curve that is generated using performance scores computed over a particular 30 minute time window.



FIG. 7 illustrates an example graph of some embodiments that includes four different distribution curves.



FIG. 8 illustrates a process performed in some embodiments to identify anomalies at the shorter timescale.



FIG. 9 conceptually illustrates another block diagram of the ENI platform of FIG. 5 with an in-depth view of the longer timescale anomaly detector of some embodiments.



FIG. 10 illustrates simplified examples of two topology graphs of some embodiments for first and second applications.



FIG. 11 conceptually illustrates a process performed in some embodiments to identify anomalies at the longer timescale.



FIG. 12 conceptually illustrates a block diagram that provides a more in-depth view of the control action system of some embodiments.



FIG. 13 conceptually illustrates a reinforcement learning process performed by the control action system of some embodiments using the greedy algorithm described above.



FIG. 14 conceptually illustrates an example diagram of a self-healing SD-WAN, of some embodiments, in which alternate routes are monitored for sample flows between an edge router and an application.



FIG. 15 conceptually illustrates another example diagram of a self-healing SD-WAN of some embodiments in which alternate routes are identified between edge devices located at different branch sites for sending VOIP traffic between client devices at the different branch sites.



FIG. 16 conceptually illustrates a process performed in some embodiments to identify and remediate performance incidents in an SD-WAN.



FIG. 17 illustrates the layout of an incident, in some embodiments.



FIG. 18 illustrates an example layout of a recommendation, in some embodiments, that includes QoE (quality of experience) score comparisons for 6 gateways, as well as edge alternate overlay node QoE scores.



FIG. 19 conceptually illustrates a computer system with which some embodiments of the invention are implemented.





DETAILED DESCRIPTION

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.


Some embodiments of the invention provide a method of detecting and autonomously remediating anomalies in a self-healing SD-WAN (software-defined wide-area network) implemented by multiple forwarding elements (FEs), each of which is located at one of multiple sites connected by the SD-WAN. The method of some embodiments is performed by an anomaly detection and remediation system for the SD-WAN. In some embodiments, the anomaly detection and remediation system includes one or more anomaly detection and remediation processes executed by one or more machines in a cluster. The one or more machines, in some embodiments, include one or more host computers. Also, in some embodiments, the anomaly detection and remediation system is implemented as part of an ENI (Edge Network Intelligence) platform.


From the multiple FEs, the method of some embodiments receives multiple sets of flow data associated with application traffic that traverses the multiple FEs, such as edge forwarding elements deployed at sites (e.g., branch sites), and transit FEs, including gateway forwarding elements deployed in cloud datacenters, and hub forwarding elements deployed in public or private datacenters. In some embodiments, these FEs are routers, e.g., are edge routers, cloud gateway routers, and hub routers.


The method uses a first set of machine-trained processes to analyze the multiple sets of flow data in order to identify at least one anomaly associated with at least one particular FE. The method then uses a second set of machine-trained processes to identify at least one remedial action for remediating the identified anomaly. The method implements the identified remedial action by directing an SD-WAN controller deployed in the SD-WAN to implement the identified remedial action.


In some embodiments, the anomaly detection and remediation system includes multiple sub-systems that execute various processes to detect and remediate anomalies. These sub-systems, in some embodiments, include a data ingestion system, an analytics system, and a control action system. The data ingestion system, of some embodiments, receives the flow data from the FEs that implement the SD-WAN and parses the received flow data into a data structure used internally by the anomaly detection and remediation system. The analytics system receives the parsed flow data from the ingestion system, in some embodiments, aggregates the data, computes performance scores from the data, and uses one or more anomaly detection machine-trained processes to analyze the computed performance scores to identify any anomalies. The control action system of some embodiments receives identified anomalies, uses one or more machine-trained processes to identify remedial actions to obviate the identified anomalies, and sends API calls to the SD-WAN controller to direct the SD-WAN controller to implement the remedial actions in the SD-WAN.


The received flow data, in some embodiments, is associated with multiple applications. In some embodiments, each received set of flow data includes a five-tuple identifier for the associated flow, an application identifier associated with the flow, and a protocol associated with the flow. The flow data, in some embodiments, also includes a set of flow statistics associated with the flow, an overlay route type associated with the flow, a next-hop overlay node for the flow, and a destination hop overlay node associated with the flow. In some embodiments, the set of flow statistics includes an amount of TX bytes associated with the flow, an amount of RX bytes associated with the flow, TCP latency associated with the flow, and a number of TCP re-transmissions associated with the flow.


In some embodiments, the analytics system aggregates the flow data by performing a time aggregation operation on the flow data at a first granularity in order to generate a set of aggregated flow data. For each FE of the multiple FEs, the analytics system of some embodiments uses the set of aggregated flow data to generate a set of performance scores at a second granularity. In some embodiments, the first granularity is a per-minute granularity, and the second granularity is a per-application granularity such that for each FE, each performance score in the set of performance scores for the FE corresponds to a particular minute of time (i.e., in a duration of time over which the flow data was collected) and a particular application. In some embodiments, the performance scores are per-edge, per-application, and per-path.


The machine-trained processes used to analyze the performance scores, in some embodiments, are part of a set of anomaly detection processes that detect anomalies at a shorter timescale (e.g., 30 minutes) and at a longer timescale (e.g., two weeks). In some embodiments, the shorter timescale anomaly detection process identifies, for each particular FE of the multiple FEs, performance scores associated with each application of multiple applications for which the particular FE forwards traffic flows.


For each particular application, the shorter timescale anomaly detection process generates a distribution graph (e.g., a Gaussian distribution curve) that shows the identified performance scores associated with the particular application for the particular FE over a first duration of time. The shorter timescale anomaly detection process then analyzes the generated distribution graphs using a machine-trained process (e.g., a sliding window Gaussian outlier detection process) to identify one or more per-application incidents by identifying that a threshold number of performance scores associated with the particular application (1) are outliers with respect to the generated distribution graph for the particular application and (2) occurred within a second duration of time.


In some embodiments, each generated distribution graph represents a distribution of the set of performance scores for the particular application over the first duration of time (e.g., 30 minutes). To generate the distribution graph, in some embodiments, the shorter timescale anomaly detection process computes a sample mean of the performance scores for the first duration of time and a standard deviation of the performance scores for the first duration of time. In some embodiments, the computed sample mean and standard deviation are dynamic parameters that change over time based on the generated performance scores. As such, each distribution graph generated for a particular FE varies compared to each other distribution graph generated for the particular FE as the performance scores computed for different durations of time affect the dynamic parameters used to generate the distribution graphs.


In order to identify that a threshold number of performance scores associated with the particular application are outliers with respect to the generated distribution graph for the particular application, some embodiments use the dynamic parameters to determine whether a threshold number of performance scores in the set of performance scores for the particular application exceed a specified threshold of performance. In some embodiments, the first and second durations of time are different durations of time (e.g., different 30 minute time windows), while in other embodiments, the first and second durations of time are the same duration of time (e.g., the same 30 minute time window). In still other embodiments the second duration of time is a subset of the first duration of time (e.g., a 5 minute subset of time within the 30 minute window).


In some embodiments, the longer timescale anomaly detection process is performed iteratively. The longer timescale anomaly detection process of some embodiments receives multiple performance scores that over a duration of time (e.g., two weeks) express a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration. The longer timescale anomaly detection process uses the received performance scores, in some embodiments, to update generated weight values for a topology graph that includes (1) multiple nodes representing the multiple FEs and (2) multiple edges between the multiple nodes representing paths traversed between the FEs by the flows associated with the particular application, with the generated weight values being associated with said paths.


The longer timescale anomaly detection process of some embodiments uses a topology-based machine-trained process to analyze the topology graph with the updated generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows. For an identified anomaly, the longer timescale anomaly detection process implements a remedial action to modify the SD-WAN in order to remediate the identified anomaly (e.g., by sending an API call identifying the remedial action to an SD-WAN controller), according to some embodiments.


In some embodiments, when using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly, the longer timescale anomaly detection process also determines whether the identified anomaly is isolated to a particular FE (e.g., isolated to a particular edge FE or due to a particular transit FE), or affects the overall application. For instance, in some embodiments, the identified anomaly is a network impairment on a first transit FE that is a next-hop FE for application traffic associated with the particular application and forwarded by a first edge FE located at a first branch site. In some such embodiments, the identified remedial action includes updating a transit FE order configuration for the first edge FE to change the next-hop transit FE for application traffic associated with the particular application and forwarded by the first edge FE from the first transit FE to a second transit FE.


The first transit FE, in some embodiments, is also a next-hop transit FE for application traffic associated with the particular application and forwarded by a second edge FE located at a second branch site. In some such embodiments, the identified anomaly is also associated with the second edge FE when application traffic for the particular application and forwarded by the second edge FE is also affected by the first transit FE's anomalous behavior. The identified remedial action, in some embodiments, is to update transit FE orders for both the first edge FE and the second edge FE.


In some embodiments, the particular application is a first application and the first transit FE is also a next-hop transit FE for application traffic associated with a second application and forwarded by the first edge FE. In some such embodiments, the identified remedial action includes updating the transit FE order configuration for the first edge FE to change the next-hop transit FE from the first transit FE to the second transit FE for application traffic associated with both the first and second applications. When the network impairment only affects traffic associated with the first application, the transit FE order configuration for the particular edge FE is only updated for traffic associated with the first application and not the second application, according to some embodiments.


When an identified anomaly is determined to require remediation to improve performance of a set of one or more flows, in some embodiments, the control action system mentioned above is utilized for identifying and implementing one or more remedial actions that modify the SD-WAN. For a particular anomaly, the control action system, in some embodiments, identifies a set of two or more remedial actions for remediating the particular anomaly in the SD-WAN.


For each identified remedial action in the set, the control action system selectively implements the identified remedial action for a subset of the set of flows for a duration of time in order to collect a set of performance metrics associated with SD-WAN performance during the duration of time for which the identified remedial action is implemented. Based on the collected sets of performance metrics, the control action system of some embodiments uses a machine-trained process to select one of the identified remedial actions as an optimal remedial action to implement for all of the flows in the set, and uniformly implements the selected remedial action for all of the flows in the set.


The particular anomaly, in some embodiments, is an increased latency associated with a first transit FE that forwards application data traffic between one or more edge FEs located at one or more branch sites connected by the SD-WAN and one or more applications deployed to a first cloud datacenter connected by the SD-WAN. The set of two or more remedial actions, in some embodiments, includes two or more alternate routes through the SD-WAN to the particular application. For example, the alternate routes of some embodiments include at least (1) a first alternate route between the one or more edge FEs and the one or more applications deployed to a second cloud datacenter connected to the SD-WAN via a second transit FE, and (2) a second alternate route between the one or more edge FEs and the one or more applications deployed to a third cloud datacenter connected to the SD-WAN via a third transit FE.


In some embodiments, the control action system selectively implements each identified remedial action (e.g., each identified alternate path) for a subset of the set of flows for the duration of time by directing (e.g., via an API call to the SD-WAN controller specifying the remedial action) the one or more edge FEs to use the second transit FE to forward a first subset of flows to the one or more applications deployed to the second cloud datacenter, and directing the one or more edge FEs to use the third transit FE to forward a second subset of flows to the one or more applications deployed to the third cloud datacenter. The one or more edge FEs of some embodiments continue to use the first transit FE to forward a remaining third subset of flows to the one or more applications deployed to the first cloud datacenter.


As the performance metrics are collected for each selectively implemented remedial action, the control action system of some embodiments receives (e.g., from the analytics system described above) or itself computes a performance score for each remedial action. When a performance score generated for the first alternate route is higher than a performance score generated for the second alternate route (i.e., is more optimal), in some embodiments, the machine-trained process of the control action system selects the first alternate route to implement for the set of flows. Alternatively, when the performance score generated for the second alternate route is higher than the performance score generated for the first alternate route, the machine-trained process of the control action system selects the second alternate route to implement for the set of flows. The control action system of some embodiments then sends an API call to the SD-WAN controller to direct the SD-WAN controller to update configurations for the one or more edge FEs to cause the edge FEs to use the selected remedial action (e.g., selected alternate path) for all flows in the set.


SD-WAN forms the middle layer of the network connection between clients and devices on one end of the network (e.g., at branch, campus, and/or work-from-anywhere locations) and applications on the other end (e.g., cloud applications, datacenter applications). In some embodiments, SASE (secure access service edge) provides cloud-enabled security and network services over the SD-WAN. SASE, in some embodiments, encompasses multiple SDN (software-defined network) and security services such as an SD-WAN, Cloud Web Security, Zero Trust Network Access, etc.



FIG. 1 conceptually illustrates a schematic diagram of a self-healing SD-WAN overlay network architecture 100 of some embodiments. The self-healing SD-WAN augments an SD-WAN network with intelligence by using real-time network data and artificial intelligence/machine learning (AI/ML) algorithms (e.g., machine-trained processes) to monitor, detect, and proactively take control actions to auto-remediate end-user application and security issues (e.g., by programmatically reconfiguring SD-WAN network elements), according to some embodiments.


Multiple branch sites 130, 132, and 134 and devices 150 (e.g., user devices) located at the multiple branch sites 130-134 are connected to the SD-WAN 100 by the SD-WAN edge forwarding elements (FEs) 120, 122, and 124 (e.g., edge routers), and the datacenter 140 that hosts datacenter resources 155 is connected to the SD-WAN by the SD-WAN hub 145 (e.g., a hub router). A gateway FE 165 (e.g., gateway router) is deployed to a cloud 160 in the SD-WAN 100 to connect the SD-WAN edge FEs 120-124 to each other, to the SD-WAN hub FE 145, and to software as a service (SaaS) applications and cloud applications 110. The gateway FE 165, in some embodiments, also connects the SD-WAN 100 to external networks (not shown).


Additionally, the SD-WAN 100 includes an SD-WAN controller 105 for managing the elements of the SD-WAN 100, and an ENI platform 170 for collecting and analyzing flow data to detect and remediate issues (e.g., anomalous behavior associated with FEs, routes, applications, etc.). In some embodiments, the FEs (e.g., edge, hub, and gateway FEs) of the SD-WAN 100 are in a full mesh topology in which each forwarding element is connected to every other forwarding element. In other embodiments, the SD-WAN elements are in partial mesh topologies. Also, in some embodiments, the hub FE 145 serves as a hub in a hub-spoke architecture in which the edge FEs 120-124 serve as spokes.


The ENI platform 170 is a cluster of machines, in some embodiments, that implement a set of processes, including multiple machine-trained processes, for detecting and remediating application issues in the SD-WAN 100. The ENI platform 170 ingests real-time flow data from network nodes (e.g., the SD-WAN edge FEs 120-124, the SD-WAN gateway FE 165, and the SD-WAN hub FE 145), analyzes and extracts insights from the flow data using AI/ML algorithms, and takes control actions by invoking certain APIs on the SD-WAN controller 105. The control actions alter the appropriate configurations of network nodes to remediate issues detected using machine-trained processes (e.g., machine learning algorithms).


The machine-trained processes are field-trained using unsupervised learning, in some embodiments, while in other embodiments, the machine-trained processes are trained prior to in-field use (e.g., supervised learning in a controlled environment). In still other embodiments, the machine-trained processes are trained both prior to in-field use as well as during in-field use (i.e., a combination of supervised and unsupervised learning).


The SD-WAN controller 105, in some embodiments, is a cluster of network managers and controllers that serves as a central point for managing (e.g., defining and modifying) configuration data that is provided to the edge FEs 120-124 and/or hubs and gateways (e.g., the SD-WAN gateway FE 165 and SD-WAN hub FE 145) to configure some or all of their operations. In some embodiments, this SD-WAN controller 105 is in one or more public cloud datacenters, while in other embodiments it is in one or more private datacenters. In some embodiments, the SD-WAN controller 105 has a set of manager servers that defines and modifies the configuration data, and a set of controller servers that distributes the configuration data to the edge FEs, hubs and/or gateways. In some embodiments, the SD-WAN controller 105 directs edge FEs and hub FEs to use certain gateways (i.e., assigns a gateway to the edges and hubs).


As described above, the SD-WAN 100 includes two types of forwarding nodes (also called forwarding elements in the discussion below): (1) one or more edge forwarding nodes (also called edge forwarding elements or edges), and (2) one or more transit forwarding nodes (also called transit forwarding elements). An edge node (such as edge 120, 122, or 124) resides at the overlay network boundary, in some embodiments, and connects the local-area network (LAN) of a branch site (e.g., the LAN at branch site 130, 132, or 134) with the overlay WAN network (e.g., the SD-WAN).


A transit node serves as the intermediary node on the overlay network for routing application flows to their respective destination servers, according to some embodiments. It provides several network-management functions and improves application performance by utilizing highly optimized network routes to reach the application servers. Examples of transit nodes include cloud gateways (e.g., cloud gateway 165) and hubs (e.g., hub 145). A hub forwarding element provides access to the resources of a datacenter (e.g., resources 155 of the datacenter 140) and also serves as a transit node for passing flows from one edge FE of one branch site to another edge FE of another branch site, according to some embodiments. For example, the SD-WAN hub 145 is connected to each of the SD-WAN edge FEs 120-124.


In some embodiments, flows that traverse the edge FEs and transit FEs are associated with various applications. Examples of applications, in some embodiments, include VOIP (voice over IP) applications, database applications, web applications, and applications for running virtual machines (VMs). Each application, in some embodiments, executes on a device operating at a site connected to the SD-WAN. For example, applications of some embodiments execute on devices operating at datacenters (e.g., public datacenters or private datacenters), in clouds (e.g., public clouds or private clouds), and at branch sites (e.g., on user devices operating at the branch sites). In some embodiments, different instances of the same application (e.g., a VOIP application) execute on separate user devices at separate branch sites and communicate via paths between the branch sites (e.g., direct paths between edge routers at each branch site, paths between the edge routers that traverse transit routers, etc.).


The datacenter 140 is one of multiple cloud datacenters connected by the SD-WAN, in some embodiments. In some such embodiments, each cloud datacenter can be provided by the same or different providers, while each of the branch sites 130-134 belongs to the same entity, according to some embodiments. The branch sites 130-134, in some embodiments, are multi-machine sites of the entity. Examples of multi-machine sites of some embodiments include multi-user compute sites (e.g., branch offices or other physical locations having multi-user computers and other user-operated devices and serving as source computers and devices for requests to other machines at other sites), datacenters (e.g., locations housing servers), etc. These multi-machine sites are often at different physical locations (e.g., different buildings, different cities, different states, etc.). In some embodiments, the cloud datacenters are public cloud datacenters, while in other embodiments the cloud datacenters are private cloud datacenters. In still other embodiments, the cloud datacenters may be a combination of public and private cloud datacenters. Examples of public clouds are public clouds provided by Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, etc., while examples of entities include a company (e.g., corporation, partnership, etc.), an organization (e.g., a school, a non-profit, a government entity, etc.), etc.


The datacenter 140 includes a hub 145, as mentioned above, for connecting the datacenter 140 to the SD-WAN 100 (e.g., to the SD-WAN gateway 165 and/or the edge FEs 120-124), and for connecting branch sites 130-134 to the resources 155 of the datacenter 140. The datacenter resources 155, in some embodiments, are application resources. In some embodiments, additional SD-WAN gateways may be present and can include multi-tenant, stateless service gateways deployed in strategic points of presence (PoPs) across the globe. Some such gateways serve as gateways to various clouds and datacenters, such as the SaaS/Cloud applications 110. Also, in some embodiments, other additional SD-WAN forwarding elements may be present, including additional edge devices located at other branch sites of the entity, as well as additional SD-WAN hub FEs. Hub FEs, in some embodiments, use or have one or more service engines to perform services (e.g., middlebox services) on data messages that they forward from one branch site to another branch site.


In some embodiments, between any two SD-WAN nodes (e.g., an SD-WAN edge 120-124 and an SD-WAN gateway 165), a network path is established between every available network interface pair. WAN optimization technology is used, in some embodiments, to send packets over the available paths in response to time-varying underlying network conditions. An example of such a WAN optimization technology used in some embodiments is a proprietary technology called Dynamic Multi-path Optimization (DMPO). When DMPO is used, in some embodiments, active paths with the best instantaneous network quality are used to send the application packets. In some embodiments, the outcome of DMPO optimization is a reliable overlay link between any two SD-WAN nodes even though the underlay network links may experience time-varying fluctuations.


At the flow level, in some embodiments, an application flow can take several different paths on the overlay network. Examples of such paths in some embodiments include (1) a Direct/NSD path (i.e., a non-SD-WAN path), (2) an Edge-Gateway/Hub-Application path, and (3) an Edge-Gateway/Hub-Edge-Application path. For the direct/NSD path, the application flow does not traverse the overlay network and is either put directly on the public internet or is routed through a non-SD-WAN (NSD) tunnel to the application destination, in some embodiments. For the Edge-Gateway/Hub-App path, in some embodiments, the application flow traverses the path established between the Edge node and the Gateway/Hub node and is then eventually routed to the application destination. Finally, for the Edge-Gateway/Hub-Edge-App path, the application flow is routed from one Edge node to another Edge node via the Gateway/Hub and then to the application destination, according to some embodiments.


On the overlay SD-WAN network, in some embodiments, there are several control knobs that affect the end-to-end application performance. In some embodiments, these control knobs can be categorized into two buckets: (1) link-level parameters and (2) path-level parameters. Link level parameters refer to several parameters inside the DMPO protocol on a single overlay link that control the packet transmission reliability on a single overlay link, according to some embodiments. Examples of such parameters, in some embodiments, include rate limit, traffic classification/QoS prioritization and interface switching configuration. Path level parameters refer to which paths are selected for routing application flows, in some embodiments. These include selecting direct versus overlay paths, in some embodiments, and, within overlay paths, selecting which transit node to choose as the intermediary node. In some embodiments, these parameters trade off application performance with network stability and uniform network load distribution.


While the self-healing SD-WAN technology encompasses control actions across all aspects of the network, some embodiments specifically focus on the aspect of dynamically selecting an overlay transit node (e.g., a hub node or a gateway node) as the control parameter to dynamically re-route application traffic in response to real-time end-to-end application performance conditions. The SD-WAN edge FEs 120-124, in some embodiments, are configured to stream flow metadata to a data ingestion system within an Edge Network Intelligence (ENI) Platform (not shown) of the SD-WAN 100.


The flow metadata messages streamed by the SD-WAN edge FEs, in some embodiments, include contextual information for each flow, such as the source and destination IP (Internet Protocol) addresses, source and destination ports, network protocol, and flow statistics. Examples of flow statistics, in some embodiments, include the average packet latency, drop and jitter, the number of bytes transmitted and received over the last minute, and the next hop and destination hop overlay nodes. In some embodiments, these flow metadata messages provide granular per-flow and device-level information and, in some embodiments, are streamed to an ENI analytics machine or machine cluster (not shown) through a message broker (e.g., Apache Kafka message broker) after cleaning and data normalization.


In some embodiments, the analytics process of the ENI platform is responsible for taking raw edge flow data and identifying application performance incidents in a streaming fashion. To facilitate parallel processing of edge data, in some embodiments, it is run on an analytics cluster (e.g., an Apache Spark cluster) in a three-stage process of aggregation, scoring, and anomaly detection.



FIG. 2, for instance, conceptually illustrates a block diagram 200 that includes an analytics cluster of some embodiments. As shown, the diagram includes SD-WAN edge routers 260 and an ENI platform 205. The ENI platform includes a data ingestion system 270, an analytics cluster 275, and a control action system 280. The data ingestion system 270 includes an ENI manager 210 and a message broker 215, while the analytics cluster 275 includes an aggregation pipeline 220, a scoring pipeline 230, and an anomaly detection pipeline 240.


As mentioned above, the SD-WAN edge routers 260 are configured to stream flow metadata to the data ingestion system 270 of the ENI platform 205. The ENI manager 210 (also referred to as an ENI backend) receives flow data from the edge routers 260, parses the flow data, and converts it into ENI internal data structures, according to some embodiments. In some embodiments, the ENI manager 210 executes as a Java process. The flow data, in some embodiments, are protobuf messages. In some embodiments, each message includes a 5-tuple for the flow, an application identifier, protocol, flow statistics (e.g., TX bytes, RX bytes, TCP latency, and TCP retransmissions), and overlay route information (e.g., overlay route type, next hop overlay node, and destination hop overlay node), as also mentioned above. The edge routers 260 of some embodiments stream the flow data to the ENI manager 210 at a 1 minute stream frequency.


As the ENI manager 210 receives, parses, and converts flow data, the ENI manager 210 passes the converted flow data (i.e., in the ENI internal data structures) to the message broker 215. The message broker 215, in some embodiments, is a message broker between the ENI manager 210 and analytics cluster 275. As such, as the message broker 215 receives the converted flow data from the ENI manager 210, the message broker 215 passes the converted flow data to the aggregation pipeline 220 of the analytics cluster 275.


The aggregation pipeline 220 of the analytics cluster 275 reads raw data from the message broker 215 and performs a time aggregation operation to aggregate metrics (e.g., collected operational values) extracted from the raw data at a particular granularity. The granularity, in some embodiments, is a per-minute granularity such that a set of aggregated metrics are generated for each minute of a duration of time for which the raw data was collected.


In some embodiments, the messages streamed into the analytics cluster 275 have variable delays, and as such, the aggregation pipeline 220 is designed to accommodate late-arriving data by storing aggregates for a small duration past the end of the aggregation window, in some embodiments, and transmitting them downstream only after this duration has elapsed. For example, when calculating the aggregation for a window between times 12:03 and 12:04, some embodiments wait until 12:05 to receive data timestamped between 12:03 and 12:04 and add this additional data to the aggregate. Only at 12:05 will this aggregate be transmitted to the scoring pipeline 230 of the analytics cluster. In some embodiments (e.g., when using an Apache Spark cluster), this is done by maintaining a state for each edge router, application, and overlay next-hop node.
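

The hold-and-flush behavior described in this example can be sketched as a small stateful aggregator; the record fields, bucket keys, and one-minute hold below are illustrative assumptions.

```python
from collections import defaultdict

class MinuteAggregator:
    """Aggregate flow records into per-minute buckets keyed by
    (edge, application, next-hop), holding each bucket open one extra minute
    for late-arriving data before emitting it downstream (a sketch)."""

    def __init__(self, hold_seconds=60):
        self.hold = hold_seconds
        self.buckets = defaultdict(lambda: {"tx": 0, "rx": 0, "count": 0})

    def add(self, record):
        minute = int(record["ts"] // 60) * 60    # floor timestamp to the minute
        key = (record["edge"], record["app"], record["next_hop"], minute)
        bucket = self.buckets[key]
        bucket["tx"] += record["tx_bytes"]
        bucket["rx"] += record["rx_bytes"]
        bucket["count"] += 1

    def flush(self, now_ts):
        """Emit buckets whose window ended at least `hold` seconds ago, e.g.
        the 12:03-12:04 bucket is emitted at 12:05."""
        done = [k for k in self.buckets if k[3] + 60 + self.hold <= now_ts]
        return {k: self.buckets.pop(k) for k in done}
```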


After aggregating raw data to the minute level, the aggregation pipeline 220 provides the data to the scoring pipeline 230 of the analytics cluster 275. The scoring pipeline 230 of some embodiments transforms this data into measurements of end-user application performance at a second granularity. In some embodiments, the scoring pipeline 230 performs this transformation by combining multiple raw metrics into a single score per key (e.g., a tuple for every edge router, application, and flow combination) that represents a holistic assessment of the average application performance of an edge router for each minute.


The performance scores, in some embodiments, are application performance QoE scores on a (1) per-edge, (2) per-application, (3) per-overlay-route, and (4) per-minute level. For example, for a TCP flow, packet latency and retransmit percentage are used, in some embodiments, against pre-determined thresholds to compute a score. The self-healing system of some embodiments, however, allows flexibility to customize the scoring function on an application level and enterprise level. The scoring pipeline 230 of the analytics cluster 275 streams these scores to the anomaly detection pipeline 240.
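

As an illustration of such a scoring function, the Python sketch below folds TCP latency and retransmit percentage into a single 0-100 score per key; the thresholds and weighting are assumptions chosen for the example, not values from the description.

```python
def tcp_qoe_score(latency_ms, retransmit_pct,
                  latency_threshold_ms=150.0, retransmit_threshold_pct=2.0):
    """Combine TCP latency and retransmit percentage against pre-determined
    thresholds into a single 0-100 QoE score (an illustrative sketch)."""
    latency_penalty = min(latency_ms / latency_threshold_ms, 2.0) * 25
    retrans_penalty = min(retransmit_pct / retransmit_threshold_pct, 2.0) * 25
    return max(0.0, 100.0 - latency_penalty - retrans_penalty)

# One score per (edge, application, overlay route, minute) key.
print(tcp_qoe_score(latency_ms=90.0, retransmit_pct=0.5))   # healthy flow
print(tcp_qoe_score(latency_ms=400.0, retransmit_pct=6.0))  # degraded flow
```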


The anomaly detection pipeline 240 of the analytics cluster 275 receives an overall performance score at the minute level for every edge, application, next-hop and destination overlay node. These scores are passed through machine learning models, in some embodiments, to detect large deviations in performance from the normal baseline. In some embodiments, the outcome of this step is an application performance incident that is generated and sent to the control action system 280.


In some embodiments, as will be further described below, the anomaly detection pipeline 240 is broken into a fast acting change detection pipeline (e.g., a shorter timescale anomaly detector) and a global application performance analysis and machine learning recommendation pipeline (e.g., a longer timescale anomaly detector). The fast acting change detection pipeline of some embodiments runs time-series machine learning models on a (1) per-edge, (2) per-application, and (3) per-overlay-nexthop-node level to analyze and detect sudden degradation in performance, according to some embodiments.


In some embodiments, the global application performance analysis and machine learning recommendation pipeline runs machine learning models on a longer time-window data and computes application performance insights at the global topology level. In some embodiments, the global application performance analysis and machine learning recommendation pipeline identifies problematic edge, application, overlay-nexthop-node combinations and extracts valuable insights on application performance across the entire customer deployment.


After incidents have been detected by the analytics cluster, the control action system 280 takes a corrective action by 1) identifying whether a remedial action on an impacted edge is necessary, 2) determining which remedial action to take, and 3) autonomously applying the chosen remedial action (i.e., applying the chosen action without input from a user). This creates a closed-loop system that self-heals end-to-end application issues. Determining which remedial action to take, in some embodiments, includes using a machine-trained reinforcement learning process, as will be further described by embodiments below.


In some embodiments, the control action system 280 combines data from the (mainly domain agnostic) machine learning-based pipelines and (domain specific) topology and configuration information from an SD-WAN controller (not shown) to generate appropriate control actions which are then applied to the SD-WAN system through the SD-WAN controller. A control action (also referred to as a remedial action) involves programmatically altering the controllable parameters in the SD-WAN system, in some embodiments. The control action system 280 of some embodiments applies the control actions through API calls to the SD-WAN controller (not shown). For example, for VMware SD-WAN solutions, these changes are handled by a cloud-based management system called the VeloCloud Orchestrator (VCO). The VCO stores and periodically synchronizes edge configurations to all the network edges. This configuration can be modified by users via a GUI or an API which then dynamically alters the flow of network traffic.



FIG. 3 conceptually illustrates a block diagram 300 of interactions of some embodiments between a control action system of an ENI platform deployed in an SD-WAN and an SD-WAN controller for the SD-WAN. After the analytics cluster (not shown) identifies edge incidents 315 (e.g., anomalies and deviations in performance associated with one or more edge-application pairs), the edge incidents 315 are provided to the control action system 305. Additionally, for cases in which the SD-WAN architecture includes a hub-spoke topology and where an identified control action is to dynamically change the order of hubs for the edge (i.e., configure the edge to use a different hub for forwarding at least a subset of application traffic), the control action system 305 queries a datastore 350 for existing hub performance metrics 320, which were populated earlier by the analytics engine. An example of the datastore 350 used in some embodiments is an Apache Cassandra table.


When a remedial action is identified for remediating at least one of the edge incidents 315 for at least one edge-application pair, in some embodiments, the control action system 305 also retrieves existing configuration data 325 for the affected edge from the SD-WAN controller. The control action system 305, of some embodiments, is run on the analytics cluster described by FIG. 2 above. In some embodiments, the analytics cluster is an open-source analytics cluster such as an Apache Spark cluster.


The control action system 305 communicates with the SD-WAN controller using an API polling process that facilitates communication between the ENI platform that includes the control action system 305 and other third-party APIs, according to some embodiments. The API-poller 310, of some embodiments, queries the SD-WAN controller API for the existing edge configuration data 325 and writes the data to a datastore 340. The datastore 340, in some embodiments, is an open-source distributed wide-column datastore (e.g., Cassandra).


The open-source analytics cluster on which the control action system 305 runs, in some embodiments, queries the datastore 340 to make recommendations (i.e., for remediating identified network incidents). With the existing edge configuration 325 and knowledge of the hub performance metrics 320 for the affected edge-application pair, the control action system 305 of some embodiments creates a recommendation for a new hub order and writes this recommendation to a distributed search and analytics datastore 360 (e.g., an Elasticsearch datastore). In some embodiments, the API-poller 310 queries the recommendations from the datastore 360 and sends the queried recommendations as API calls to the SD-WAN controller to make the actual configuration changes. In some embodiments, the control action system 305 uses a controller API (e.g., the VCO API) to apply its recommended edge configuration changes. For example, the API-poller 310 sends the pushed configuration updates 330 to the SD-WAN controller in the block diagram 300.


Examples of the API calls invoked by the API-poller 310, in some embodiments, include getHubOrder to retrieve the hub order for one or more edges, updateHubOrder to change the configured hub order, getBusinessPolicies to retrieve business policies associated with a particular enterprise and/or edge FE(s), addBusinessPolicy to add a new business policy rule for an enterprise and/or particular edge FE(s), and removeBusinessPolicy to delete a business policy for an enterprise and/or particular edge FE(s). For each of these API calls, in some embodiments, the API-poller 310 specifies one or more edge identifiers associated with one or more edge FEs for which a configuration (e.g., hub order, business policy, etc.) is being retrieved or changed, and an enterprise identifier associated with the particular enterprise to which the one or more edge FEs belong.
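
As a hedged sketch of how the API-poller 310 might shape one of these calls, the snippet below builds an updateHubOrder request that carries the edge and enterprise identifiers described above; the JSON field names, payload shape, and transport are illustrative assumptions and do not reflect the actual controller API wire format.

```scala
object HubOrderCallSketch {
  final case class UpdateHubOrder(enterpriseId: Int, edgeIds: Seq[Int], hubOrder: Seq[String])

  // Serializes the request into an illustrative JSON body (a real implementation would
  // use a JSON library and the controller's documented request format).
  def toJson(call: UpdateHubOrder): String = {
    val edges = call.edgeIds.mkString("[", ",", "]")
    val hubs  = call.hubOrder.map(h => "\"" + h + "\"").mkString("[", ",", "]")
    s"""{"method":"updateHubOrder","enterpriseId":${call.enterpriseId},"edgeIds":$edges,"hubOrder":$hubs}"""
  }

  def main(args: Array[String]): Unit = {
    // Swap the primary and secondary hubs for one edge of one enterprise.
    println(toJson(UpdateHubOrder(enterpriseId = 42, edgeIds = Seq(7), hubOrder = Seq("hub-b", "hub-a"))))
  }
}
```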


In some embodiments, the getBusinessPolicies API call can instead be getBusinessPolicy with a “name” argument to search for the “Internet backhaul” rule directly. Additionally, in some embodiments, all of these API calls require an additional argument for segment logicalID or segment name. In some embodiments, the API calls can be separated into override/profile-level rules or combined into a single list. FIG. 4 illustrates a list of the above-mentioned API calls and the responses to these API calls, in some embodiments.


In some embodiments, the control action system looks to identify poor-performing edges and applications in a hub-spoke model and to add business policies updating the hub order at the edge-override level. On the input side, the ENI incidents of some embodiments contain the affected ENI companyId, the logicalId of the affected edge, and a list of affected applications identified by ENI appStackId. Depending on the number of applications affected, some embodiments apply this new business policy to the affected applications, while in other embodiments, a single business policy is created to switch the hub order for all TCP traffic.


Because the ENI “companyId”, edge “logicalId”, and ENI “appStackId” are not recognized by APIv1, the first step in the control action of some embodiments is to resolve these values into identifiers usable by APIv1, such as the enterprise's ID, the edge's ID, and a configuration module's “appId” (i.e., inside the field “match”). The “appId” is resolvable through the ENI file “vco_to_voyance_app_ids.json” and requires no calls to APIv1. For “enterpriseId”, the call /network/getNetworkEnterprises is made, in some embodiments, to get a list of all enterprise “logicalId” values and their corresponding “id” values, which are stored as an internal map. In some embodiments, a database (e.g., MongoDB) is then used to obtain the “logicalId” from the “companyId” and look up the appropriate “id” from the “logicalId”. For edges, /enterprise/getEnterpriseEdges is used, in some embodiments, to obtain a list of edges on a given “enterpriseId”, which is then used to construct a map of edge “logicalId” to edge “id”.
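
A minimal sketch of this identifier resolution is shown below; the record shapes and the sample values stand in for the JSON actually returned by /network/getNetworkEnterprises and /enterprise/getEnterpriseEdges, and the parsing of those responses is elided.

```scala
object IdResolutionSketch {
  final case class EnterpriseRecord(logicalId: String, id: Int)
  final case class EdgeRecord(logicalId: String, id: Int)

  // Stand-ins for the parsed results of /network/getNetworkEnterprises and
  // /enterprise/getEnterpriseEdges (illustrative values only).
  val enterprises = Seq(EnterpriseRecord("ent-logical-1", 42), EnterpriseRecord("ent-logical-2", 43))
  val edges       = Seq(EdgeRecord("edge-logical-a", 7), EdgeRecord("edge-logical-b", 8))

  // Internal maps of logicalId -> id used when constructing later APIv1 calls.
  val enterpriseIdByLogicalId: Map[String, Int] = enterprises.map(e => e.logicalId -> e.id).toMap
  val edgeIdByLogicalId: Map[String, Int]       = edges.map(e => e.logicalId -> e.id).toMap

  def main(args: Array[String]): Unit = {
    // An ENI incident carries the edge's logicalId; resolve it to the APIv1 edge id.
    println(edgeIdByLogicalId("edge-logical-a")) // 7
  }
}
```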


Using the enterprise and edge identifiers, some embodiments use the API to get the affected edge's configuration modules and apply changes. In some embodiments, this begins with a call to /edge/getEdgeConfigurationStack. Inside the resulting JSON, both the profile-level configurations and the edge (override)-level configurations are used, in some embodiments. The profile-level configuration, in some embodiments, is assumed to contain a business policy named “Internet backhaul” which contains the current hub order for the edge. This business policy is used as a template, in some embodiments, to construct a new business policy at the override level. At both the profile- and override-level, in some embodiments, the QoS module is extracted. In some embodiments, the profile-level “Internet backhaul” business policy is taken as the template, the desired hub order changes and, if applicable, the appId are applied to it, the resulting policy is combined with the existing policies at the override level, and the configuration change is made.


There are three cases for the configuration change, according to some embodiments. If the edge has no edge overrides and has not had any in the past, no QoS module at the override level will exist, in some embodiments, and as such, some embodiments create one with /configuration/insertConfigurationModule. If there have been previous edge overrides (or edge overrides currently exist), a QoS module will exist, which can be updated with /configuration/updateConfigurationModule, according to some embodiments. However, if there are no current edge overrides, in some embodiments, the global segment for the QoS will not exist (i.e., the “segments” field will be an empty JSON array). In the case where the global segment does not exist or a configuration module must be inserted, some embodiments first determine the logicalId of the global segment at the edge override-level. In some embodiments, doing so requires a call to /enterprise/getEnterpriseNetworkSegments. For simplicity of the internal Scala logic, this call is made, in some embodiments, even when it is not strictly necessary because of an existing global segment.
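
The decision between inserting and updating the configuration module can be sketched as follows; the simplified QoS-module shape (an optional override-level module whose segments list may be empty) and the action names are assumptions made only for illustration.

```scala
object ConfigChangeSketch {
  // Simplified override-level QoS module: just the list of segment logicalIds.
  final case class QosModule(segments: Seq[String])

  sealed trait Action
  case object InsertConfigurationModule        extends Action // no override-level QoS module exists
  case object UpdateModuleWithNewGlobalSegment extends Action // module exists but its "segments" field is empty
  case object UpdateConfigurationModule        extends Action // module and global segment both exist

  def chooseAction(overrideQos: Option[QosModule]): Action = overrideQos match {
    case None                          => InsertConfigurationModule
    case Some(m) if m.segments.isEmpty => UpdateModuleWithNewGlobalSegment
    case Some(_)                       => UpdateConfigurationModule
  }
}
```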


The API calls, in some embodiments, include /network/getNetworkEnterprises, /enterprise/getEnterpriseEdges, /edge/getEdgeConfigurationStack, /enterprise/getEnterpriseNetworkSegments, /configuration/updateConfigurationModule, and /configuration/insertConfigurationModule. In some embodiments, ENI incidents include a companyId, which is an identifier for the customer whose network is affected in the incident. Currently, ENI determines a network controller enterprise's logicalId from a companyId, in some embodiments. The enterprise's identifier must subsequently be determined, in some embodiments, to be provided as an argument to other API calls. To do so, some embodiments use /network/getNetworkEnterprises to fetch a list of all enterprises on a network controller, and use this list to construct a map of logicalId→id. When using APIv2, which works with logicalId directly, some embodiments do not need an analogous functionality for a v2 implementation of the control action system.


In some embodiments, ENI incidents include an edge's logicalId, but do not contain the edge's id. As such, some embodiments use /enterprise/getEnterpriseEdges to fetch a list of all edges on a given enterprise, and use this list to construct a map of logicalId→id. As with /network/getNetworkEnterprises, an analogous functionality for a v2 implementation of the control action system is not needed, in some embodiments.


To apply the hub order switch control action, in some embodiments, an update needs to be applied to an edge's QoS configuration module at the edge-override level. To simplify the construction of the hub order switch business policy, it is assumed, in some embodiments, that the edge has an existing rule at the profile level that includes the current hub order. Additionally, some embodiments use the override-level QoS module for the edge because, in order to add a new business policy without removing old ones, the full JSON of the current QoS module is required.


In some embodiments, new business policies are applied to the global segment in self-healing. When no current edge-specific overrides exist, in some embodiments, the global segment may be empty, and so the override-level QoS module will include an empty “segments” field. In some such embodiments, a new global segment is inserted (i.e., rather than simply updating the existing one). Since no global segment exists on this module, in some embodiments, the call to /enterprise/getEnterpriseNetworkSegments is made to determine the ID of the global segment that is being inserted.


The API call /configuration/updateConfigurationModule, in some embodiments, applies the new business policy. The business policy is constructed from a template policy at the profile-level named “Internet backhaul”, in some embodiments, which is retrieved from /edge/getEdgeConfigurationStack. In some embodiments, the newly constructed policy is combined with existing policies (if any) and given as the data to /configuration/updateConfigurationModule.


In some embodiments, the API call /configuration/insertConfigurationModule also applies the new business policy. When an edge has no edge-level QoS module (e.g., when the edge has no edge-level overrides and has not had any such overrides in the past), in some embodiments, /configuration/updateConfigurationModule cannot be called as there is no module to update. Hence, some embodiments insert one instead. Beyond this detail, the logic for constructing a new business policy, in some embodiments, is largely similar between the insert and update calls.


In some embodiments, there are two modes in which self-healing operates. The first mode, in some embodiments, provides recommendations with manual remediation, while the second mode provides automated remediations, which are configurable at the level of an edge, application, and enterprise. When automatic remediation is disabled, in some embodiments, the recommendations by the control action system 305 are not immediately applied by the API-poller 310 to make configuration updates. Instead, they are shown to an end-user on a user interface (UI) provided by the ENI platform, in some embodiments, to allow the end-user to opt to apply, or disregard, these recommendations afterwards. To allow this, the ENI UI of some embodiments calls an ENI-internal API to apply the recommendation, which updates the recommendation's status in the distributed search and analytics datastore 360 and is subsequently read by the API-poller and applied.


The self-healing SD-WAN system of some embodiments also detects application performance anomalies on two different timescales. The first timescale is a shorter timescale at the minutes level, while the second timescale is a longer timescale at the days level, according to some embodiments. The first timescale, in some embodiments, addresses application issues that are happening currently (e.g., acute issues) and need to be addressed soon. Examples of such issues, in some embodiments, include sudden excessive packet drops at a router on an end-to-end path, a sudden network failure inside a datacenter, or a sudden issue with an application server. In some embodiments, the second timescale addresses application issues that are systemic and require a longer-term network optimization. Examples of these issues include an inefficient network setup that causes flows to be routed through inefficient routes, or a change in network utilization that has rendered initial configurations inefficient, in some embodiments.


As mentioned above, each SD-WAN edge FE streams per-application flow data to the ENI platform where the analytics cluster aggregates the flow data at a minute-level granularity, in some embodiments. For an edge e and application a, let $\{m^i_{e,a}(t)\}$ denote the collection of flow metrics such as packet latency, packet drops, jitter, bytes sent/received, etc. Let $\{s_{e,a}(t)\}$ denote the application performance scores computed from the above flow metrics for each edge and application. For example, $s_{e,app1}(t)$ represents the performance score for a first application (e.g., an Office 365 application) at edge e for flows routed through a respective configured gateway FE.


At the shorter timescale, which is on the order of minutes, the goal of the self-healing system, in some embodiments, is to detect an application performance issue that suddenly affects certain edges and flows on the network. To this end, a time-series outlier detection methodology is utilized, in some embodiments, as will be explained in greater detail below.


For instance, let Ω denote the outlier detection model applied to $s_{e,a}(t)$ to detect if there is an anomaly or a sudden change in the application performance score for a specific edge and application that warrants the self-healing system to take remediation action. While Ω can be any general timeseries anomaly detection model, a specific example used, in some embodiments, is a sliding window Gaussian outlier detection model. Let W denote a sliding window of data (e.g., 30 minutes) and assume that within this window, $s_{e,a}(t)$ follows a Gaussian distribution. Let $\tilde{\mu}_{e,a}$ and $\tilde{\sigma}_{e,a}$ denote the sample mean and standard deviation in W. The instantaneous score value at time t is considered a deviation event δ(t) if $|s_{e,a}(t) - \tilde{\mu}_{e,a}| / \tilde{\sigma}_{e,a} > \gamma$, for a specified threshold γ, according to some embodiments. To minimize false positives due to a single point variation, some embodiments look for consecutive points of deviation and then combine them to declare an application incident. Specifically, some embodiments generate an application performance incident if there are at least η (e.g., η=3) consecutive deviation events.
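
A minimal sketch of this sliding-window Gaussian outlier model follows; the 30-sample window, the threshold γ = 3, and η = 3 match the examples above, while the function names and the way incidents are reported are illustrative assumptions.

```scala
object GaussianOutlierSketch {
  def meanAndStd(window: Seq[Double]): (Double, Double) = {
    val mu    = window.sum / window.size
    val sigma = math.sqrt(window.map(s => (s - mu) * (s - mu)).sum / window.size)
    (mu, sigma)
  }

  // Returns the indices in `scores` at which an application performance incident is declared,
  // i.e., the points where the eta-th consecutive deviation event occurs.
  def detectIncidents(scores: Seq[Double], windowSize: Int = 30, gamma: Double = 3.0, eta: Int = 3): Seq[Int] = {
    var consecutive = 0
    val incidents = scala.collection.mutable.ArrayBuffer.empty[Int]
    for (t <- windowSize until scores.size) {
      val (mu, sigma) = meanAndStd(scores.slice(t - windowSize, t))
      val isDeviation = sigma > 0 && math.abs(scores(t) - mu) / sigma > gamma
      consecutive = if (isDeviation) consecutive + 1 else 0
      if (consecutive == eta) incidents += t
    }
    incidents.toSeq
  }
}
```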



FIG. 5 conceptually illustrates a more detailed block diagram of an ENI platform of some embodiments that includes separate anomaly detectors for the shorter and longer timescales described above and an in-depth view of the shorter timescale anomaly detector. As shown, the ENI platform 500 includes a data ingestion system 505, an analytics system 510, and a control action system 515. The analytics system 510 (e.g., analytics cluster) includes an aggregator 520, storage 522, score calculator 524, and anomaly detector 526. In this example, the anomaly detector 526 includes a longer timescale detector 530 and a shorter timescale detector 540, which includes a graph generator 542 and a graph analyzer 544. The aggregator 520, storage 522, score calculator 524, and anomaly detector 526, in some embodiments, are processes run by one or more machines of the analytics system 510.


As the data ingestion system 505 receives flow data {me,ai(t)} from network nodes (e.g., edge nodes and transit nodes), the data ingestion system 505 provides this flow data to the aggregator 520 of the analytics system 510 as also described above. The flow data, in some embodiments, includes packet-latency, packet drop, jitter, and bytes sent/received. The aggregator 520 aggregates the received flow data on a per-minute level, and places the aggregated data in the storage 522. In other embodiments, the aggregator 520 provides the aggregated data directly to the score calculator 524.


The score calculator 524 retrieves the aggregated data from the storage 522, or receives the aggregated data directly from the aggregator 520, in some embodiments, for use in calculating performance scores $\{s_{e,a}(t)\}$ for each minute of aggregated data on a per-edge router, per-application, and per-path basis. As the score calculator 524 calculates the performance scores, the score calculator 524 provides the performance scores to the anomaly detector 526. As illustrated, the performance scores are iteratively provided to both the longer timescale detector 530 of the anomaly detector 526 and the shorter timescale detector 540 of the anomaly detector 526. In some embodiments, the score calculator 524 provides the performance scores continuously as they are calculated, while in other embodiments, the score calculator 524 provides sets of performance scores. The processes of the longer timescale detector 530 will be described further below by FIG. 9.


The graph generator 542 of the shorter timescale detector 540 receives the performance scores from the score calculator 524 and uses the scores to generate graphs (e.g., Gaussian distribution curves). For a particular 30 minute window, the graph generator 542 computes a sample mean $\tilde{\mu}_{e,a}$ and a standard deviation $\tilde{\sigma}_{e,a}$. In embodiments that utilize a Gaussian distribution, the sample mean $\tilde{\mu}_{e,a}$ determines the center of the distribution curve, while the standard deviation $\tilde{\sigma}_{e,a}$ determines the width of the distribution curve. Additionally, the height of any such Gaussian distribution curve is determined by $\alpha = 1/(\tilde{\sigma}_{e,a}\sqrt{2\pi})$. Based on these calculations, the graph generator 542 generates a graph for each 30 minute window for which it has received performance scores.



FIG. 6 illustrates a graph 600 of some embodiments that includes an example of a Gaussian distribution curve that is generated using performance scores computed over a particular 30 minute time window. The curve 620 has a height 640 and a center 630, as shown. While the majority of the performance scores, which are represented by multiple plotted points, fall within the curve 620, one performance score 660 falls outside of the curve. In some embodiments, this performance score 660 is considered an outlier. When three consecutive outliers are detected, in some embodiments, a deviation event is generated, as will be further described below. It should be noted that both the curve 620 and the plotted performance scores in this example are meant to be exemplary and are not representative of actual generated performance scores.


Because the graph generator 542 generates graphs for various 30 minute time windows, the curves of the generated graphs vary based on the performance scores used to generate them. More specifically, the sample mean and standard deviation, which determine the center and width of a distribution curve, are dynamic parameters that change over time based on the performance scores computed and received from the score calculator 524.



FIG. 7 illustrates an example graph 700 of some embodiments that includes four different distribution curves 720, 722, 724, and 726. In this example, three of the curves 720-724 have the same center (i.e., sampled mean), while each curve otherwise varies in both height and width (i.e., standard deviation). As such, not only do the distributions change from window to window, but what is considered an outlier (i.e., performance scores that fall outside of any given curve) changes from window to window as well. While the examples illustrated by FIGS. 6 and 7 show distribution curves, other examples can include any type of graph, including but not limited to bar graphs, scatter plots, histograms, etc.


As the graph generator 542 generates the graphs for various time windows (e.g., 30 minute time windows), it provides the graphs to the graph analyzer 544 of the shorter timescale detector 540 for analysis. In other embodiments, the shorter timescale detector includes a storage (not shown) that the graph generator 542 adds generated graphs to, and from which the graph analyzer 544 retrieves graphs for processing. In addition to receiving graphs from the graph generator 542, the graph analyzer 544 receives the performance scores from the score calculator 524 for use in analyzing the graphs. In other embodiments, the graph generator 542 provides the performance scores to the graph analyzer 544 along with the generated graphs.


In some embodiments, the graph generator 542 computes values that can be used in the generation of graphs, as well as analyzed without generating a graph. For example, in some embodiments, the graph generator 542 computes the standard deviation and the sample mean, and provides these values, along with the performance scores used to compute these values, to the graph analyzer 544 for analysis (i.e., with or without a graph or graphs).


The graph analyzer 544 analyzes the graphs to identify outliers that are indicative of performance issues, according to some embodiments. The graph analyzer 544 identifies outliers by determining, for each performance score within a given 30 minute time window, whether the performance score is a deviation event δ(t) using $|s_{e,a}(t) - \tilde{\mu}_{e,a}| / \tilde{\sigma}_{e,a} > \gamma$, where $s_{e,a}(t)$ is the performance score, $\tilde{\mu}_{e,a}$ is the computed sampled mean for the time window, $\tilde{\sigma}_{e,a}$ is the standard deviation computed for the time window, and γ is a specified threshold value, in some embodiments. False positives are minimized, in some embodiments, by identifying consecutive outliers and classifying a set of outliers as a deviation event when that set of outliers includes at least, e.g., 3, consecutive outliers within any particular 30 minute time window.


Once the graph analyzer 544 has identified a deviation event, the graph analyzer 544 provides the deviation event to the control action system 515. The control action system 515 identifies one or more remedial actions to implement in order to mitigate or eliminate anomalous behavior that led to the deviation event. Additional details regarding the processes performed by the control action system 515, as well as potential remedial actions, will be further described below.



FIG. 8 illustrates a process 800 performed in some embodiments to identify anomalies at the shorter timescale. The process 800 is performed by an analytics system, such as the analytics system 510 described above. As such, the process 800 will be described below with references to FIG. 5. In some embodiments, the process 800 is performed iteratively such that each step is performed repeatedly as additional performance scores are received.


The process 800 starts when the analytics system computes (at 810) performance scores based on flow data received from FEs during a particular time window. In some embodiments, the analytics system computes performance scores for each FE on a per-FE basis, a per-application basis, and a per-path (i.e., per-route) basis. The particular time window, in some embodiments, is a 30 minute time window. In some embodiments, the performance scores are computed at the per-minute level such that there is a performance score computed for each minute of flow data in the 30 minute time window. For example, the score calculator 524 described above computes performance scores based on the flow data it retrieves from the storage 522 and/or that it receives directly from the aggregator 520.


The process 800 computes (at 820) a sample mean and standard deviation of the performance scores for the time window. The sample mean is used to determine the center of the distribution curve of the performance scores, while the standard deviation is used to determine the width of the distribution curve, in some embodiments. The graph generator 542, for example, uses the performance scores received from the score calculator 524 to compute a sampled mean and standard deviation for each 30 minute time window for which it has received performance scores (e.g., by computing the sampled mean and standard deviation for minutes 1-30, 2-31, 3-32, and so on).


In some embodiments, the graph generator generates a distribution graph using the computed standard deviation, sampled mean, and performance scores. As illustrated in the block diagram 500, for instance, the distribution graph generator 542 passes several graphs to the distribution graph analyzer 544. Each of these several graphs, in some embodiments, is a respective distribution curve graph generated for a respective edge-application pair such that a distribution graph is generated per-edge, per-application, per-path for multiple 30 minute windows. For example, FIG. 7 described above illustrates a graph 700 that shows multiple distribution curves of some embodiments. In some embodiments, each curve is associated with the same edge-application pair for 4 separate time windows, or with the same edge and different applications, or different edges and the same application, etc.


The process 800 uses (at 830) a timescale outlier machine-trained process to identify any performance scores that deviate from the threshold during the time window. As described above, outliers δ(t) are identified using $|s_{e,a}(t) - \tilde{\mu}_{e,a}| / \tilde{\sigma}_{e,a} > \gamma$, where $s_{e,a}(t)$ is the performance score, $\tilde{\mu}_{e,a}$ is the computed sampled mean for the time window, $\tilde{\sigma}_{e,a}$ is the standard deviation computed for the time window, and γ is a specified threshold value, in some embodiments. That is, in some embodiments, the dynamic parameters $\tilde{\mu}_{e,a}$ and $\tilde{\sigma}_{e,a}$ are used to determine whether a given performance score exceeds the specified threshold γ.


The outlier identification is performed, in some embodiments, by the graph analyzer 544 as part of a shorter timescale detection process. The distribution graph analyzer 544, in some embodiments, analyzes the generated distribution curve graphs and determines whether any performance scores fall outside of the generated distribution curves (i.e., whether any of the performance scores are outliers with respect to the generated distribution graph). At least one performance score 660 falls outside of the curve 620 in the graph 600 described above, for example.


The process 800 determines (at 840) whether at least η consecutive deviations have been detected. In some embodiments, to prevent or at least minimize false positives, at least η (e.g., η=3) number of consecutive outliers are detected before a deviation event is generated. In other embodiments, a single detected outlier triggers a deviation event to be generated. In still other embodiments, the anomaly detector determines whether at least η outliers are detected within a time window (i.e., regardless of whether the outliers are consecutive) before the outliers are considered to be a deviation event. The time window during which at least η outliers are detected, in some embodiments, is the same as the time window for which the performance scores have been computed (e.g., 30 minutes), while in other embodiments, the time window is a smaller time window (e.g., a subset of 15 minutes) within the time window for which the performance scores have been computed. In still other embodiments, the time window during which at least η outliers are detected can span multiple time windows (e.g., multiple 30 minute windows).


When fewer than η outliers have been detected, the process 800 returns to compute (at 810) performance scores based on flow data (e.g., metrics) received from FEs during a time window. When at least η consecutive deviations have been detected, the process 800 transitions to generate (at 850) a deviation event based on the η consecutive deviations. The detection of deviations (e.g., outliers) is done by the graph analyzer 544, in some embodiments, based on data received from the graph generator 542 and, in some embodiments, data (e.g., performance scores) received from the score calculator 524. Each deviation event, in some embodiments, specifies “sampleTime” (i.e., the time of the event), “latestScore”, “historicalMean”, “scoreTimeSeries” (i.e., the last 30 minutes of the 1-minute score series), “edgeId” (i.e., the edge FE identifier), “nextHopId”, and “rootCauseIndicators”.
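
For illustration only, the deviation-event fields listed above could be carried in a record such as the following; the field types are assumptions, since the description does not specify them.

```scala
// A minimal sketch of a deviation event; field names follow the description above, types are assumed.
final case class DeviationEvent(
  sampleTime: Long,               // time of the event
  latestScore: Double,
  historicalMean: Double,
  scoreTimeSeries: Seq[Double],   // last 30 minutes of the 1-minute score series
  edgeId: String,                 // edge FE identifier
  nextHopId: String,
  rootCauseIndicators: Seq[String]
)
```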


The process 800 uses (at 860) a reinforcement learning machine-trained process to identify a remedial action for remediating the deviation event. For example, once a deviation event has been generated by the graph analyzer 544, the deviation event is provided to the control action system 515, which uses the deviation event, and other data (e.g., configuration data associated with edge and/or transit nodes identified by the deviation event, performance scores, flow data, other metrics, etc.) to identify a remedial action to implement for remediating the deviation event.


After one or more remedial actions have been identified, the process 800 sends (at 870) an API call to the network controller to direct the network controller to implement the identified remedial action(s). The API call, in some embodiments, includes any configuration changes to be implemented as remedial actions, and is sent by an API poller of the control action system. Examples of API calls utilized, in some embodiments, are described above, such as the API calls illustrated by FIG. 4. Following 870, the process 800 ends.


In some embodiments, the process 800 is performed for only some of the critical edge routers, but not for all edge routers and not for transit routers (e.g., gateway routers or hub routers). In other embodiments, the process 800 is performed for every edge router, but not for the transit routers (e.g., gateway routers or hub routers). In still other embodiments, the process 800 is performed not only for some or all of the edge routers, but also for each transit router (e.g., each gateway router and each hub router). In yet other embodiments, the process 800 is performed just for transit routers (e.g., gateway routers and hub routers) but not for edge routers.


At the longer timescale, which is on the order of days, the goal of self-healing is to detect systemic issues in the application performance that require longer term network configuration changes, in some embodiments. Examples of such changes, in some embodiments, include assigning an edge to a different primary gateway for routing certain application flows. To achieve this, some embodiments use a global topology-based outlier analysis to detect systemic application issues.


In some embodiments, the global topology-based outlier analysis starts with a collection of application performance score data, $\{s_{e,a}(t)\}$, over a time window W. In this analysis, however, the time window is much larger than in the shorter timescale analysis (e.g., the last two weeks). In some embodiments, the long timescale detector 530 generates a custom topology graph for each application in a set of applications that have flows traversing through the SD-WAN (e.g., a virtual SD-WAN for each application).


The long timescale detector 530 then iteratively updates its generated topology graph as follows. Let F denote a graph representing the flow of traffic on the overlay SD-WAN topology for a specific application. Each node in the graph represents an SD-WAN node, and each edge represents the flow of traffic for an application. For example, in the case of application flows routed to their destination via an SD-WAN gateway or SD-WAN hub FE, this graph is a bi-partite graph. One set of vertices of the bi-partite graph represents the SD-WAN edges in the overlay network, and the other set of vertices represents the SD-WAN gateway FEs. A new graph edge is drawn between an edge, say e1, and a gateway, say g1, if flows for that application originating at e1 are now routed through g1 (i.e., as opposed to being routed through a different gateway g2).


In some embodiments, when the initial topology graphs are generated, weights are assigned to each edge (i.e., path between two nodes) of each graph. In some embodiments, the weights are default weights. As such, after a topology graph has been updated based on the performance scores, the assigned weights are updated by mapping the application performance score timeseries to a number, according to some embodiments. The goal of this mapping function, in some embodiments, is to compute a value that represents the application performance for the corresponding edge router and gateway router combination over the respective time window. In its simplest form, the mapping function of some embodiments is the average of the performance scores $s_{e,a}(t)$ over the time window W. Another more complex mapping function of some embodiments is the average of $s_{e,a}(t)$ after filtering out low network utilization points.
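
Both mapping functions can be sketched briefly as below; the sample record (score plus utilization) and the utilization cutoff are illustrative assumptions.

```scala
object EdgeWeightMappingSketch {
  // One per-minute sample for an (edge router, gateway router) combination.
  final case class Sample(score: Double, utilization: Double) // utilization in [0, 1]

  // Simplest mapping: the plain average of the scores over the time window W.
  def plainAverage(samples: Seq[Sample]): Double =
    if (samples.isEmpty) 0.0 else samples.map(_.score).sum / samples.size

  // More complex mapping: average after filtering out low network utilization points.
  def filteredAverage(samples: Seq[Sample], minUtilization: Double = 0.1): Double =
    plainAverage(samples.filter(_.utilization >= minUtilization))
}
```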


In some embodiments, the topology-based outlier analysis is carried out on the above edge-weighted graph F to detect whether an application issue is isolated to an edge FE, a gateway FE, or to the overall application. In the first case, where an application issue is isolated to an edge FE, the edge weight connecting the edge FE to the respective gateway FE deviates significantly from the other edge weights in F, according to some embodiments. In the second case, in some embodiments, where the application issue is due to a gateway FE, the summed weights of the edges connecting to the respective gateway deviate significantly as compared to other gateway FEs. Finally, in the third case of an overall application issue, the edge weights across the graph deviate significantly from the edge weights in other application graphs, in some embodiments.
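
A minimal sketch of these three checks is given below, with the edge weights of the bipartite graph F held in a map keyed by (edge FE, gateway FE); the z-score style comparison and its threshold are illustrative stand-ins for “deviates significantly”, not the analysis actually used.

```scala
object TopologyOutlierSketch {
  type Weights = Map[(String, String), Double] // (edge FE, gateway FE) -> edge weight

  private def deviates(value: Double, population: Seq[Double], threshold: Double = 3.0): Boolean = {
    val mu    = population.sum / population.size
    val sigma = math.sqrt(population.map(w => (w - mu) * (w - mu)).sum / population.size)
    sigma > 0 && math.abs(value - mu) / sigma > threshold
  }

  // Case 1: the issue is isolated to an edge FE whose weight deviates from the other weights in F.
  def edgeIsolatedIssues(weights: Weights): Seq[(String, String)] =
    weights.keys.toSeq.filter(k => deviates(weights(k), weights.values.toSeq))

  // Case 2: the issue is due to a gateway FE whose summed edge weights deviate from the other gateways'.
  def gatewayIssues(weights: Weights): Seq[String] = {
    val sums = weights.groupBy(_._1._2).map { case (gw, ws) => gw -> ws.values.sum }
    sums.keys.toSeq.filter(gw => deviates(sums(gw), sums.values.toSeq))
  }

  // Case 3: an overall application issue, detected when this application's weights deviate
  // from the weights in the topology graphs of other applications.
  def overallIssue(thisApp: Weights, otherApps: Seq[Weights]): Boolean = {
    val thisAvg = thisApp.values.sum / thisApp.size
    deviates(thisAvg, otherApps.map(w => w.values.sum / w.size))
  }
}
```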



FIG. 9 conceptually illustrates another block diagram of the ENI platform of FIG. 5 with an in-depth view of the longer timescale anomaly detector of some embodiments. As shown, the longer timescale detector 530 includes a score storage 932, a topology graph updater 934, and a topology graph analyzer 936. As the score calculator 524 computes performance scores from the flow data (e.g., metrics) aggregated by the aggregator 520, the score calculator 524 provides the performance scores to both the shorter timescale detector 540 and the longer timescale detector 530.


As mentioned above, the longer timescale detector 530 of some embodiments generates a custom topological graph for each particular application in a set of applications that have flows traversing through the SD-WAN. In some embodiments, the topology graph updater 934 generates the custom topology graphs for each application, while in other embodiments, the longer timescale detector 530 includes a separate topology graph generator process.


To generate a topology graph for each application, the longer timescale detector 530 in some embodiments defines one graph node for each edge or transit router (e.g., each edge, gateway, or hub router) that is used to forward one or more flows of the application through the SD-WAN. For each pair of nodes that represent a pair of routers through which one or more flows of the application traverse, the detector 530 also defines a graph edge in the graph to represent the tunnel between the pair of routers through which the application's flows traverse.


In each custom topology graph, a path between first and second edge nodes traverses zero or more transit nodes in the graph and one or more graph edges between these edge and/or transit nodes. This path in the graph is equivalent to a routing path between the first and second edge routers represented by the first and second edge nodes in the graph. When the graph path traverses through one or more transit nodes, the routing path similarly traverses through one or more transit routers (e.g., hub or gateway routers). The inter-node graph edges in a graph path in some embodiments are equivalent to the tunnels between the routers in the routing paths, as mentioned above.


The long timescale detector 530 iteratively updates its generated topology graph(s) as it receives new performance scores from the score calculator 524. Specifically, as performance scores are received from the score calculator 524, the received performance scores are added to the score storage 932. When a threshold amount (e.g., n number of days- or weeks-worth) of performance scores are received for a particular key (e.g., edge router, application, route tuple), the topology graph updater 934 retrieves the collection of performance scores from the score storage 932 and uses the performance scores to update a topology graph corresponding to the particular key. Each topology graph includes nodes representing SD-WAN nodes (e.g., edge nodes and transit nodes), and edges representing traffic flows between the nodes for an application. In addition to updating the topology graphs, the topology graph updater 934 updates weights assigned to the edges of the topology graph based on performance scores for the nodes connected by the edges.



FIG. 10 illustrates simplified examples of two topology graphs of some embodiments for first and second applications. As shown, each topology graph 1010 and 1020 includes multiple nodes, labeled with an “e” to indicate edge node, or a “t” to indicate transit node (e.g., hub node or gateway node). Additionally, each edge between two nodes includes an assigned weight. For example, the edge between edge node “e1” and transit node “t3” is assigned a weight of 3 in the application 1 topology graph 1010, while this same edge is assigned a weight of 4 in the application 2 topology graph 1020.
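
Such per-application weighted topology graphs could be represented as simply as the sketch below; only the e1-t3 weights (3 and 4) come from the figure, and the remaining node pairs and weights are placeholders.

```scala
object TopologyGraphSketch {
  // Adjacency-map representation: (node, node) -> weight assigned to the graph edge between them.
  final case class TopologyGraph(application: String, edgeWeights: Map[(String, String), Double])

  val app1Graph = TopologyGraph("application-1", Map(("e1", "t3") -> 3.0, ("e2", "t3") -> 2.0))
  val app2Graph = TopologyGraph("application-2", Map(("e1", "t3") -> 4.0, ("e2", "t3") -> 2.0))
}
```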


As the topology graph updater 934 updates the weighted topology graphs, it provides these graphs to the topology graph analyzer 936 as shown. The topology graph analyzer 936 analyzes the weighted topology graphs to identify application issues for the time window represented by the graph (e.g., a two week window), and whether these application issues are isolated to an edge FE, whether they are due to a gateway FE, or whether they are overall application issues, according to some embodiments. Once an application issue and its cause (i.e., isolated to an edge FE, due to a gateway FE, or affecting the overall application) have been identified, the topology graph analyzer 936 generates an event identifying the issue and provides the event to the control action system 515 for remediation.



FIG. 11 conceptually illustrates a process 1100 performed in some embodiments to identify anomalies at the longer timescale. The process 1100 is performed by an analytics system, such as the analytics system 510 described above. The process 1100 will be described below with references to FIGS. 9 and 10. Like the process 800, the process 1100 is performed iteratively, in some embodiments, such that each step is performed repeatedly as additional performance scores (e.g., application score data) are received.


The process 1100 starts when the process receives (at 1110) a collection of application score data over a particular time window (e.g., a two-week time window). For instance, the topology graph updater 934 of some embodiments retrieves application score data from the score storage 932 once enough scores from the particular time window have been added to the score storage 932. As the performance scores are computed on a per-edge router, per-application, and per-route basis, the collection of application score data, in some embodiments, includes performance scores aggregated by application, such that each performance score is associated with the same application, but with different edge routers and routes.


Based on the collection of application score data, the process 1100 updates (at 1120) a topology graph that includes nodes for each FE and edges between the nodes to represent application traffic flows. That is, in some embodiments, an initial topology graph is generated based on edge routers, gateway routers, hub routers, and the paths between them, and this topology graph is then updated using the collection of application score data. For example, in some embodiments, changes to the paths and/or forwarding element configuration, such as added or removed paths and/or forwarding elements, are reflected in the performance score data and used to update the initial (or otherwise previous version of) the topology graph.


The process 1100 then uses (at 1130) the collection of application score data to assign or adjust weights to the graph edges. Each of the edges in the graphs 1010 and 1020, for instance, has an assigned weight, as also described above. Each weight value, in some embodiments, can be mapped back to a corresponding performance score, or an average of a set of performance scores over the time window.


In some embodiments, the weight values are generated for a range of scores over the longer time duration (e.g., a two-week duration of time), such as by using a weight calculator to produce a weight score from each performance score that has been collected for the longer time duration. The computed averages of the performance scores, in some embodiments, are blended averages. For instance, in some embodiments, a blended average is computed from all performance scores in the time duration (e.g., a two-week time window), while treating older scores (e.g., from week 1 of the two-week time window) with less significance than newer scores (e.g., by using another set of weight values to give more weight to the newer scores compared to older scores). Also, in some embodiments, the generated weight values are per-path (i.e., end-to-end), not per-edge (i.e., an edge between two nodes).
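
As a small illustration of such a blended average, the sketch below applies a linear fade so that older scores in the window carry less significance than newer ones; the linear weighting itself is an assumption, since the description only requires that newer scores be weighted more heavily.

```scala
object BlendedAverageSketch {
  // `scores` is ordered oldest-first over the long time window (e.g., two weeks of per-minute scores).
  def blendedAverage(scores: Seq[Double]): Double = {
    // Weight the i-th oldest score by (i + 1), so the newest scores count the most.
    val weighted   = scores.zipWithIndex.map { case (s, i) => (s * (i + 1), (i + 1).toDouble) }
    val (num, den) = weighted.foldLeft((0.0, 0.0)) { case ((n, d), (ws, w)) => (n + ws, d + w) }
    if (den == 0.0) 0.0 else num / den
  }
}
```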


The process 1100 uses (at 1140) a topology-based outlier analysis to analyze the weighted topology graph in order to detect an application issue. In some embodiments, the topology-based outlier analysis is performed by comparing different topology graphs generated for different applications to determine whether any of the weights in a particular topology graph deviate significantly from other topology graphs, and/or whether the summed weights of edges connected to a particular transit node in a first topology graph deviate significantly compared to the summed weights of edges connected to the particular transit node in a second topology graph. The comparisons, in some embodiments, are between topology graphs generated for different applications during the same time period (i.e., the same two-week window), or between topology graphs generated for the same application during different time periods (i.e., different two-week windows).


The process 1100 determines (at 1150) whether the application issue is isolated to an edge FE. In some embodiments, the application issue is determined to be isolated to an edge FE when the edge weight connecting the edge FE to a respective gateway FE deviates significantly from other edge weights in F. When the process 1100 determines that the application issue is isolated to an edge FE, the process 1100 transitions to 1180.


When the process 1100 determines that the application issue is not isolated to an edge FE, the process 1100 transitions to determine (at 1160) whether the application issue is due to a transit FE (e.g., a hub FE or gateway FE). In some embodiments, when the application issue is due to a transit FE, the summed weights of the edges connecting to the respective transit FE deviate significantly as compared to other transit FEs during the same time period (i.e., same two-week window) and/or different time periods (i.e., different two-week windows). In other embodiments, when the application issue is due to a transit FE, the summed weights of the edges connecting to the transit FE for the particular time period deviate significantly as compared to the summed weights of the edges connecting to the transit FE for other time periods (i.e., previous two-week windows). When the process 1100 determines that the application issue is due to a transit FE, the process 1100 transitions to 1180.


When the process 1100 determines that the application issue is not due to a transit FE, the process 1100 transitions to determine (at 1170) that the application issue is an overall application issue. In some embodiments, when the application issue is an overall application issue, the edge weights across the topology graph for the particular time period (i.e., a particular two-week window) deviate significantly from edge weights in other application graphs (i.e., for other applications) for the particular time period (i.e., the particular two-week window). In other embodiments, when the application issue is an overall application issue, the edge weights across the topology graph for the particular time period (i.e., the particular two-week window) deviate significantly from edge weights in other application graphs for the same application during other particular time periods (i.e., previous two-week windows).


After the application issue, and the cause of the application issue, has been determined, the process 1100 uses (at 1180) a reinforcement learning machine-trained process to identify a remedial action for remediating the application issue. For instance, after the topology graph analyzer 936 has identified an application issue, the topology graph analyzer 936 generates an event identifying the issue and provides this event to the control action system 515 for remediation.


In some embodiments, when the application issue is isolated to an edge router, the application issue is due to a particular link used by the edge router to forward traffic for the application. In some such embodiments, a remedial action is to direct the edge router to forward traffic for the application on a different link. For example, an edge router of some embodiments is directed not to use one of three tunnels available to the edge router for outbound traffic for the application. In some other embodiments, a branch site can have more than one edge router (e.g., one edge per physical link), and when an issue is isolated to an edge router at such a site with multiple edge routers, a remedial action is to redirect traffic to a different edge device at the site.


After identifying a remedial action to implement in the SD-WAN to remediate the application issue, the process 1100 then sends (at 1190) an API call to the network controller to direct the network controller to implement the identified remedial action. Examples of such API calls are described above and illustrated by FIG. 4. Following 1190, the process 1100 ends.


In some embodiments, once the system detects that certain application flows are having an issue, the control action system takes remediation actions to resolve the issue. As described above, some embodiments focus on path-level parameters as the control knob, and, more specifically, dynamically selecting the overlay transit node to re-route affected application flows on alternate paths.


In some embodiments, depending on the network setup, there are two scenarios of route adaptation. In the first scenario, in some embodiments, application traffic at an edge cannot be sampled and partially routed on alternate paths. In the second scenario, application traffic at an edge can be sampled, in some embodiments, and a fraction of traffic can be routed on an alternate path. For example, with the assumption that there are 1000 flows for a first application at edge e1 routed through gateway g1, in the case when flow sampling is allowed, a fraction of these flows (e.g., 10%) can be routed on an alternate path through gateway g* without affecting the rest of the traffic flow, according to some embodiments. The difference between sampling and non-sampling of flows, in some embodiments, manifests in terms of how alternate paths are determined for the control action.


Let $s_g(t) = \{s_{e,a}(t)\}_g$ denote the application performance scores for edges whose application flows are routed through gateway node g. For the edge e (with flows currently routed through gateway node g) detected as having an application performance anomaly (e.g., a sudden drop in application score due to excessive packet drops on the current route), let $\Phi_e$ denote the set of all possible gateways available for that edge. The objective of the route-adaptation control action is to determine the best alternate gateway $g \in \Phi_e$ to re-route flows for edge e and application a to alleviate the application issue, in some embodiments.


For the scenario where traffic flows cannot be sampled and routed on different alternate routes, some embodiments utilize a ranking-based approach to determine the best alternate gateway. In some embodiments, this is achieved by maintaining a real-time aggregate score of each gateway FE based on the flows that are routed through it and selecting the gateway with the best instantaneous score. Let $f(s_g(t))$ denote the real-time gateway score function; then the control action is to choose $g^*$ such that,

$g^* = \arg\max_{g \in \Phi_e} f(s_g(t))$
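
A minimal sketch of this ranking-based selection is shown below; the map of real-time gateway scores stands in for $f(s_g(t))$, and the tie-breaking behavior is an assumption.

```scala
object RankingSelectionSketch {
  // Picks g* as the candidate gateway in Phi_e with the best instantaneous aggregate score.
  def bestAlternateGateway(candidateGateways: Set[String],
                           realtimeGatewayScore: Map[String, Double]): Option[String] =
    candidateGateways.toSeq
      .filter(realtimeGatewayScore.contains)
      .sortBy(g => -realtimeGatewayScore(g)) // highest score first
      .headOption
}
```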


For the second case when traffic flows can be sampled and routed on alternate paths before making a control action decision, in some embodiments, the actual performance on alternate paths can be measured. This lends itself naturally to a reinforcement learning-based approach, in some embodiments, and a greedy algorithm to find the best alternate path as described below.


In some embodiments, the algorithm proceeds by sampling flows and re-routing them through an alternate gateway $g \in \Phi_e$ in a round-robin fashion. The algorithm then picks the gateway node that has the best application performance score among the sampled flows, in some embodiments. Specifically, let ε denote the fraction of flows that are sampled and routed through the alternate gateway g, and let $s_g$ be the corresponding application performance score for the sampled flows. Then, $g^*$ is chosen such that,

$g^* = \arg\max_{g \in \Phi_e} s_g$
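
A minimal sketch of this greedy, sampling-based selection follows; the measurement callback stands in for the feedback loop that routes a fraction ε of flows through each candidate and observes $s_g$, and the round-robin order is simply the order of the candidate list.

```scala
object GreedySamplingSketch {
  // Samples flows through each candidate gateway in turn, observes the score of the sampled
  // flows, and returns the gateway whose sampled flows scored best (g*).
  def bestGatewayBySampling(candidateGateways: Seq[String],
                            measureSampledFlowScore: String => Double): Option[String] = {
    if (candidateGateways.isEmpty) None
    else {
      val observed = candidateGateways.map(g => g -> measureSampledFlowScore(g))
      Some(observed.maxBy(_._2)._1)
    }
  }
}
```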


As compared to a ranking-based approach, the greedy algorithm uses the actual performance measurement values from the sampled flows at an edge, in some embodiments. This allows the system of some embodiments to make decisions on direct path measurements from the edge to the alternate gateway node.



FIG. 12 conceptually illustrates a block diagram that provides a more in-depth view of the control action system of some embodiments. As shown, the control action system 1200 includes a flow data storage 1220, incidents storage 1225, existing configurations storage 1230, actions storage 1235, remedial action identifier 1240, remedial action selector 1250, and an API poller 1215. The flow data storage 1220 and incidents storage 1225 are populated by the analytics system 1205, as shown.


In some embodiments, the flow data storage 1220 stores flow data (e.g., metrics) and performance scores (e.g., performance scores computed from the flow data), while in other embodiments, the control action system 1200 includes a separate storage for performance scores. The incidents storage 1225 stores incident events generated by the analytics system 1205 based on short timescale and long timescale outliers/deviations detected by the analytics system 1205. The existing configurations storage 1230 stores existing configuration data for FEs retrieved by the API poller 1215 from the network controller 1210 based on what FEs are associated with the incidents in the incidents storage 1225.


The remedial action identifier 1240 uses flow data and/or performance scores from the flow data storage 1220, incident events from the incidents storage 1225, and existing configuration data from the existing configurations storage 1230 in order to identify sets of potential remedial actions for each incident in the incident events storage 1225, in some embodiments. For example, for an incident event identifying an application issue that is due to a particular transit FE, the remedial action identifier 1240 of some embodiments identifies a set of alternate routes that do not traverse the particular transit FE for each affected edge FE-application pair. As the remedial action identifier 1240 identifies sets of potential remedial actions, the remedial action identifier 1240 adds these sets of potential remedial actions to the actions storage 1235.


The API poller 1215, of some embodiments, retrieves the sets of potential remedial actions and directs the network controller 1210 to implement the potential remedial actions for certain sampled flows for temporary time periods. As the sampled flows traverse the FEs, the FEs provide flow data to a data ingestion system as described above. The data is then processed, in some embodiments, by the analytics system 1205, and the performance scores are added to the flow data storage 1220, according to some embodiments. In other embodiments, these performance scores are stored to a separate test-performance-score storage (not shown).


The remedial action selector 1250 of some embodiments retrieves the performance scores associated with the potential remedial actions from the storage 1220 and the potential remedial actions from the actions storage 1235 in order to select the potential remedial action having the best associated performance score. The remedial action selector 1250 uses a reinforcement learning machine-trained process, in some embodiments, to select a remedial action for implementation for all affected flows. As described above, for embodiments where the remedial actions include routing flows through alternate transit FEs (e.g., hub FEs and gateway FEs), ε denotes the fraction of flows that are sampled and routed through the alternate gateway g (or other transit FE), $s_g$ is the corresponding application performance score for the sampled flows, and $g^*$ is chosen such that $g^* = \arg\max_{g \in \Phi_e} s_g$.


Once the remedial action selector 1250 has selected a remedial action, the remedial action selector 1250 provides the selected remedial action to the API poller 1215. The API poller then sends an API call that includes the remedial action (e.g., an updated configuration for an edge FE) to the network controller 1210 to autonomously direct the network controller 1210 to implement the remedial action.



FIG. 13 conceptually illustrates a reinforcement learning process 1300 performed by the control action system of some embodiments using the greedy algorithm described above. The process 1300 starts when the control action system receives a detected anomaly from the anomaly detection process of the analytics system. For example, the analytics system 1205 adds a new incident event to the incidents storage 1225 of the control action system 1200.


The process 1300 determines (at 1310) that the anomaly detected in the SD-WAN requires remediation. In some embodiments, this determination is performed by the anomaly detector before generating and sending a deviation event to the control action system. For example, in some embodiments, η (e.g., η=3) consecutive outlying performance scores are detected before a determination is made that a particular edge FE associated with the performance scores (e.g., for the shorter timescale analysis) is exhibiting anomalous behavior that requires remediation, while fewer than η consecutive outlying performance scores would not result in such a determination.


In other embodiments, η outlying performance scores within a particular time window (e.g., at least 3 within a 30 minute time window), regardless of whether these outliers are consecutive, results in a determination that a particular edge FE associated with the performance scores (e.g., for the shorter timescale analysis) is exhibiting anomalous behavior that requires remediation. As such, in some embodiments, determining that a detected anomaly requires remediation is performed along with receiving an incident event identifying anomalous behavior that requires remediation from the analytics system 1205. Also, in some embodiments, determining that an anomaly requires remediation includes a determination that remediating the anomaly will improve performance for one or more flows that traverse the SD-WAN.
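To make these trigger conditions concrete, the following Python sketch (a minimal sketch, assuming the example values η=3 and a 30-minute window from the text; the function and variable names are hypothetical) checks both variants, consecutive outliers and outliers within a time window:

from datetime import timedelta

def has_consecutive_outliers(is_outlier_flags, eta=3):
    # True if at least eta consecutive per-minute scores were flagged as outliers.
    run = 0
    for flagged in is_outlier_flags:
        run = run + 1 if flagged else 0
        if run >= eta:
            return True
    return False

def has_windowed_outliers(outlier_timestamps, eta=3, window=timedelta(minutes=30)):
    # True if at least eta outliers (not necessarily consecutive) fall within the window.
    ts = sorted(outlier_timestamps)
    for i in range(len(ts) - eta + 1):
        if ts[i + eta - 1] - ts[i] <= window:
            return True
    return False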


The process 1300 identifies (at 1320) a set of two or more remedial actions for remediating the detected anomaly. For example, in some embodiments, the detected anomaly is a spike in latency associated with a particular hub FE that is used as a next-hop by one or more edge FEs when forwarding application traffic to one or more applications deployed to, e.g., a cloud datacenter, and the identified remedial actions include a set of alternate routes to the application(s). These alternate routes, in some embodiments, include routes through other FEs (e.g., other hub FEs and/or gateway FEs), direct routes, etc. As described above, this is performed by the remedial action identifier 1240 of the control action system 1200, in some embodiments, using data from each of the storages 1220, 1225, and 1230.


The process 1300 selects (at 1330) a remedial action from the identified set of remedial actions to implement for a sample of flows during a specified time period. As described above, in some embodiments, the greedy algorithm used in the reinforcement learning samples flows and re-routes these flows through alternate FEs (i.e., re-routes through available alternate routes) in a round-robin fashion. As such, in some embodiments, the API poller 1215 sends each potential remedial action to the network controller 1210 individually to selectively implement the potential remedial action, while in other embodiments, the API poller 1215 sends a single API call that includes each of the potential remedial actions for implementation. The API calls, in some embodiments, also specify a time duration for which each potential remedial action should be implemented, as well as a set of one or more flows for which each potential remedial action should be implemented.
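The round-robin sampling described above can be pictured with the short Python sketch below (a minimal sketch under assumed names such as candidate_routes and sample_fraction; it is not the patented algorithm), which sends a small fraction of flows over each candidate alternate route while the remaining flows stay on the current route:

def assign_sampled_flows(flows, candidate_routes, current_route, sample_fraction=0.1):
    # Assign a small sample of flows to the candidate routes in round-robin order;
    # all remaining flows keep using the current (possibly anomalous) route.
    n_sampled = max(len(candidate_routes), int(len(flows) * sample_fraction))
    assignment = {}
    for i, flow in enumerate(flows):
        if i < n_sampled:
            assignment[flow] = candidate_routes[i % len(candidate_routes)]
        else:
            assignment[flow] = current_route
    return assignment

# Example: 20 flows, two alternate routes, the rest remain on "via-hub-1".
routes = ["via-hub-2", "via-hub-3"]
print(assign_sampled_flows(["flow-%d" % i for i in range(20)], routes, "via-hub-1"))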


The process 1300 monitors (at 1340) performance of the SD-WAN for the sample of flows during the specified time period for which the selected remedial action is implemented. The monitoring, in some embodiments, includes collecting performance measurement values associated with the sampled flows from one or more edge FEs for which the remedial action (e.g., an alternate route from the edge FE to the application(s)) is applicable.


The process 1300 then generates (at 1350) a performance score for the selected remedial action based on the monitored performance. The collected performance measurement values, in some embodiments, are stored in the flow data storage 1220, and retrieved by the remedial action selector 1250 to generate the performance scores. In other embodiments, the performance scores are computed by the analytics system 1205 based on the performance measurement values and added to the flow data storage 1220 for retrieval by the remedial action selector. When the remedial action is an alternate route, the generated performance score, s_g, is representative of application performance when using said alternate route.


The process 1300 determines (at 1360) whether there are additional remedial actions to implement. In some embodiments, each of the remedial actions is implemented simultaneously such that steps 1330-1350 are performed in parallel for each identified remedial action using respective sampled flows that are assigned in a round-robin fashion. In other embodiments, each remedial action is implemented and monitored individually, and once the specified time period for implementation and monitoring has timed out, a next remedial action is selected for implementation and monitoring. As such, when additional remedial actions have yet to be temporarily implemented, the process 1300 returns to select (at 1330) a remedial action from the identified set.


When the process 1300 determines (at 1360) that there are no additional remedial actions to temporarily implement (i.e., all available remedial actions have been implemented, monitored, and scored), the process 1300 identifies (at 1370) the remedial action having the best generated performance score. As described above, the remedial action (e.g., alternate gateway) having the best application performance score is selected by the remedial action selector 1250 such that g* = arg max_{g ∈ Φ_ε} s_g.


The process 1300 then implements (at 1380) the identified remedial action for all applicable flows. In some embodiments, the applicable flows are associated with a single application and a single edge FE, while in some other embodiments, the applicable flows are associated with two or more applications and two or more edge FEs. The API poller 1215 sends the remedial action in an API call to the network controller 1210 for autonomous implementation, in some embodiments. Following 1380, the process 1300 ends.



FIG. 14 conceptually illustrates an example diagram of a self-healing SD-WAN 1400 in which alternate routes are monitored for sample flows between an edge router and an application. As shown, the SD-WAN 1400 includes multiple SD-WAN edge FEs 1410, 1412, 1414, and 1416 that each connect one or more client devices 1420 to the SD-WAN, and multiple hub FEs 1430, 1432, and 1434 that connect the SD-WAN edge FEs 1410-1416 to multiple clouds 1440, 1442, and 1444. Each of the clouds 1440-1444 hosts all three applications (App1, App2, App3). Additionally, the FEs in this example are in a full mesh such that each SD-WAN edge FE 1410-1416 connects to each hub FE 1430-1434.


The edge FE 1410 in this example is initially configured to use hub 1 1430 as a next-hop to reach application 1 in the cloud 1440 via the route 1450, which is dashed to indicate an anomaly with this route. The application 1 is also hosted by cloud 1442, which is reachable via the hub 2 1432, and by cloud 1444, which is reachable via the hub 3 1434. Accordingly, in some embodiments, to determine a best alternate path (i.e., as a remedial action based on the anomaly associated with the first path 1450), sampled flows are routed via alternate routes 1455a and 1455b between the edge router 1410 and application 1 in each of the clouds 1442 and 1444.


For example, assume application 1 is a web application (e.g., Microsoft 365). In order to determine which of the paths 1455a or 1455b is a better alternate route for the web application traffic, some embodiments send a first subset of these web application traffic flows (i.e., sampled flows) to the instance of the web application in the cloud 1442 via the hub FE 1432 on path 1455a, and a second subset of these web application traffic flows to the instance of the web application in the cloud 1444 via the hub FE 1434 on path 1455b, while the remaining flows for this web application will continue to be sent to the instance of the web application in the cloud 1440 via the hub 1430 on path 1450.


As these alternate routes are implemented, the edge router 1410 collects performance measurement values and provides these values to the control action system (not shown) for use in generating performance scores for each route 1455a and 1455b representing application performance for application 1 (e.g., a web application such as Microsoft 365) by each route. In some embodiments, each edge router 1410-1416 runs a process for collecting flow data as they process and forward application traffic flows to and from client devices 1420.


Once either of the routes 1455a or 1455b has been selected based on the performance scores generated for these alternate routes, all subsequent application traffic flows for the web application are sent via the selected alternate route. For example, when the path 1455a has a better performance score than path 1455b, all traffic flows for the web application are sent to the web application instance in the cloud 1442 via the hub FE 1432 on path 1455a, while paths 1450 and 1455b will not be used for application traffic for this web application.


In some embodiments, the flow data collected by the edge routers includes flow data associated with applications executing on devices that operate at different branch sites. For example, FIG. 15 conceptually illustrates another example diagram of a self-healing SD-WAN 1500 of some embodiments in which alternate routes are identified between edge devices located at different branch sites for sending VOIP traffic between client devices at the different branch sites. As shown, the SD-WAN 1500 includes multiple forwarding elements such as the SD-WAN edge routers 1510 and 1515 located at branch sites 1570 and 1575, and SD-WAN gateway routers 1530 and 1535 deployed to respective clouds 1550 and 1555.


The SD-WAN edge router 1510 is located at a branch site 1570 which also includes a client device 1520, while the edge router 1515 is located at a branch site 1575 which also includes a client device 1525. Each of the client devices 1520 and 1525 executes a respective VOIP (voice over IP) application instance 1540 and 1545. The SD-WAN edge routers 1510 and 1515 forward VOIP traffic flows between the client devices 1520 and 1525.


In this example, the SD-WAN edge routers 1510 and 1515 use the path 1560, which traverses the SD-WAN gateway 1535, for forwarding VOIP traffic flows. In addition to the path 1560 between the edge routers 1510 and 1515, traffic flows can also be sent using either of the alternate paths 1562 and 1564. The alternate path 1562 is a direct route between the edge routers 1510 and 1515, while the alternate path 1564 traverses the SD-WAN gateway router 1530, as shown.


Each path, in some embodiments, is defined by tunnels established between the different forwarding elements that implement the SD-WAN (e.g., edge routers, gateway routers, and hub routers). For example, the path 1560 is defined by tunnels established between the SD-WAN edge router 1510 and SD-WAN gateway router 1535, and between the SD-WAN gateway router 1535 and the SD-WAN edge router 1515. The path 1562 is defined by a direct tunnel established between the SD-WAN edge routers 1510 and 1515. Lastly, the path 1564 is defined by tunnels established between the SD-WAN edge router 1510 and SD-WAN gateway router 1530, and between the SD-WAN gateway router 1530 and the SD-WAN edge router 1515.
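As a minimal data-structure sketch of paths defined by tunnels (the class and field names below are illustrative assumptions, with FE names mirroring the figure), each path can be represented as an ordered list of tunnels between forwarding elements:

from dataclasses import dataclass

@dataclass(frozen=True)
class Tunnel:
    src_fe: str  # forwarding element at one end of the tunnel
    dst_fe: str  # forwarding element at the other end

@dataclass
class Path:
    name: str
    tunnels: list  # ordered tunnels traversed by the path

path_1560 = Path("1560", [Tunnel("edge-1510", "gateway-1535"), Tunnel("gateway-1535", "edge-1515")])
path_1562 = Path("1562", [Tunnel("edge-1510", "edge-1515")])  # direct tunnel between the edges
path_1564 = Path("1564", [Tunnel("edge-1510", "gateway-1530"), Tunnel("gateway-1530", "edge-1515")])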


When an anomaly is detected with the path 1560 (e.g., due to anomalous behavior by the gateway router 1535), in some embodiments, the control action system described above identifies the alternate paths 1562 and 1564 and tests each path to determine which is the optimal path for forwarding the VOIP traffic flows. For instance, in some embodiments, the control action system directs a first subset of the VOIP traffic flows to the direct path 1562 and a second subset of the VOIP traffic flows to the path 1564 through the SD-WAN gateway router 1530, while all remaining VOIP flows continue to be sent on the path 1560. After a temporary period of time, the control action system then selects either the path 1562 or the path 1564 and directs the edge routers 1510 and 1515 (e.g., by sending an API call to a network controller that manages the edge routers) to forward all VOIP traffic flows on the selected path.



FIG. 16 conceptually illustrates a process performed in some embodiments to identify and remediate performance incidents in an SD-WAN. The process 1600 is performed, in some embodiments, by a set of one or more anomaly detection and anomaly remediation processes executing on one or more host machines to monitor, detect, and auto-remediate end-user application and security issues. In some embodiments, these one or more host machines operate as part of an ENI platform (e.g., the ENI platform 170). The process 1600 will be described below with references to the self-healing SD-WAN 100.


The process 1600 starts by receiving (at 1610) multiple sets of flow data associated with multiple packet flows that traverse multiple forwarding elements in the SD-WAN. The flow data, in some embodiments, includes five-tuple data for the flow, an application identifier, protocol, flow statistics (e.g., TX bytes, RX bytes, TCP latency, and TCP retransmissions), and overlay route information (e.g., overlay route type, next hop overlay node, and destination hop overlay node). As illustrated by the SD-WAN 100, the SD-WAN edge FEs 120-124, the SD-WAN gateway FE 165, and the SD-WAN hub FE 145 all provide network data to the ENI platform 170.


The process 1600 aggregates (at 1620) the received sets of flow data on a per-minute level to generate aggregated sets of metrics. In some embodiments, as described above, the ENI platform 170 includes one or more machines that execute multiple processes, such as those illustrated by FIGS. 2, 3, 5, 9, and 12, including an ENI backend for receiving the flow data from the edge FEs and a message broker for passing converted flow data from the ENI backend to the data aggregation pipeline. For each minute of flow data received, in some embodiments, the data aggregation pipeline (or data aggregation and application score computation pipeline) aggregates flow data for that minute to generate a set of per-minute metrics.
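A minimal sketch of such a per-minute aggregation step is shown below in Python (the record fields roughly follow the flow statistics listed above; the grouping key and field names are assumptions for illustration):

from collections import defaultdict

def aggregate_per_minute(flow_records):
    # Each record is assumed to carry a timestamp (seconds), edge id, application id,
    # and raw counters; records are grouped into per-minute, per-edge, per-application buckets.
    buckets = defaultdict(lambda: {"tx_bytes": 0, "rx_bytes": 0,
                                   "latency_sum_usec": 0, "latency_samples": 0})
    for rec in flow_records:
        minute = int(rec["timestamp"]) // 60          # truncate to the minute
        key = (minute, rec["edge"], rec["app"])
        bucket = buckets[key]
        bucket["tx_bytes"] += rec["tx_bytes"]
        bucket["rx_bytes"] += rec["rx_bytes"]
        bucket["latency_sum_usec"] += rec["tcp_latency_sum_usec"]
        bucket["latency_samples"] += rec["tcp_latency_samples"]
    return dict(buckets)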


The process 1600 uses (at 1630) a first set of one or more machine-trained processes to analyze the aggregated sets of metrics and identify any performance incidents. In some embodiments, the machine-trained processes include timeseries anomaly detection processes (e.g., the sliding window Gaussian outlier detection model), global topology-based outlier detection processes, and piecewise function processes. The ENI platform processes the aggregated per-minute metrics, in some embodiments, to generate QoE scores representing performance on a per-edge, per-application, and per-overlay route level for each minute of aggregated data.
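As one hedged illustration of a sliding-window Gaussian outlier check on per-minute scores (the window contents, the threshold k, and the one-sided test for drops are assumptions, not the exact model used), consider the Python sketch below:

import statistics

def is_outlier(window_scores, new_score, k=3.0):
    # Flag the new score as an outlier when it falls more than k standard
    # deviations below the mean of the recent sliding window of scores.
    if len(window_scores) < 2:
        return False
    mean = statistics.mean(window_scores)
    stdev = statistics.pstdev(window_scores)
    if stdev == 0:
        return new_score < mean  # any drop from a perfectly flat history is flagged
    return new_score < mean - k * stdev

recent = [92, 95, 90, 93, 94, 91]
print(is_outlier(recent, 35))  # True: a sudden large drop in the QoE score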


The process 1600 determines (at 1640) whether any performance incidents have been identified. In some embodiments, the QoE scores generated by the scoring pipeline of the ENI platform are not indicative of any issues in the network. As such, when no performance incidents have been identified, the process 1600 ends.


Otherwise, when at least one performance incident has been identified, the process 1600 transitions to use (at 1650) a second set of one or more machine-trained processes to identify at least one remedial action for remediating each identified performance incident. Examples of performance incidents, in some embodiments, include spikes in latency measurements and increased packet drops. The remedial actions, in some embodiments, depend on network nodes (e.g., edge nodes and/or transit nodes) associated with the identified performance incident.


For instance, a remedial action of some embodiments includes changing the order of hubs for one or more edges based on determinations that (1) latency for a particular application's traffic forwarded by the edges is too high and (2) each of the one or more edges uses the same hub as a next hop. Another example of a remedial action, in some embodiments, is to instantiate a new transit node (e.g., a new gateway router or a new hub router) on the network for forwarding traffic associated with one or more applications for which anomalies have been identified.


After the remedial actions have been identified, the process 1600 sends (at 1660) an API call specifying the identified remedial action(s) to an SD-WAN controller to direct the SD-WAN controller to implement the identified remedial action(s). That is, the ENI platform of some embodiments, in coordination with the SD-WAN controller, autonomously implements remedial actions without requiring any end-user input. In some embodiments, the remedial actions are provided to the SD-WAN controller as configuration updates to, e.g., a configured hub order for a particular edge FE.
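As a purely hypothetical sketch of such a configuration-update call (the controller endpoint, payload schema, and authentication scheme below are assumptions and not the actual SD-WAN controller API), a hub-order change might be pushed as follows in Python:

import requests

def push_hub_order_update(controller_url, edge_id, app_id, new_hub_order, api_token):
    # Hypothetical endpoint and payload: direct the controller to update the
    # configured hub order for one edge FE and one application.
    payload = {
        "edge": edge_id,
        "application": app_id,
        "hubOrder": new_hub_order,  # e.g., ["hub-2", "hub-3", "hub-1"]
    }
    resp = requests.post(
        controller_url + "/api/config/hub-order",  # assumed endpoint, for illustration only
        json=payload,
        headers={"Authorization": "Bearer " + api_token},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()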


For embodiments where the remedial action is the addition of a new transit node (e.g., a new gateway router or new hub router), the SD-WAN controller implements the remedial action by instantiating and configuring a new machine to serve as the new transit node, as well as by adjusting configurations of edge routers or other transit routers that are to use the new transit node as a next-hop for forwarding traffic for one or more applications. The new transit node, in some embodiments, is instantiated in the same cloud or datacenter as an existing transit node that was associated with the anomaly that the new transit node is meant to remediate. In other embodiments, the new transit node is instantiated in a different cloud or datacenter. Also, in some embodiments, more than one new transit node is instantiated and configured to obviate the anomaly. Following 1660, the process 1600 ends.


Multiple examples of network impairments (i.e., network issues) detected in some embodiments and corresponding solutions are described below. As a first example, a set of edge routers of some embodiments are initially configured to use a particular hub FE as a primary hub FE. Upon detection of a network impairment on the particular hub FE's WAN links causing increased latency of outgoing traffic, some embodiments implement a solution to change the hub order for each of the edge routers in the set.


As a second example, in a set of four edge routers, two edge routers are configured to use a first hub as their primary hub FE, while the other two edge routers in the set are configured to use a second hub FE as their primary hub FE. When a network impairment on WAN links of the first hub is detected and increases latency of outgoing traffic, the solution implemented in some embodiments is to change the hub order for the two edge routers configured to use the first hub FE as their primary hub FE, while the two edge routers configured to use the second hub FE as their primary hub FE can continue to use that second hub FE as primary.


In some embodiments, a third example uses the same initial configuration as the second example where two edge routers are configured to use a first hub as their primary hub FE, while the other two edge routers in the set are configured to use a second hub FE as their primary hub FE. Upon detection of a network impairment on WAN links of the first hub FE that is causing increased latency for outgoing traffic for clients connected to a first of the two edge routers configured to use the first hub FE as a primary hub FE, the solution implemented in some embodiments is to change the hub order only for the first edge router, while each other edge router continues to use their initially configured primary hub FE (i.e., the first hub FE for the second edge router in the set and the second hub FE for the other two edge routers configured to use the second hub FE as the primary hub FE).


As a fourth example of some embodiments, each of four edge routers is configured to use the same hub FE as a primary hub FE. Upon detection of a network impairment on that hub FE's WAN links, causing increased latency of outgoing traffic for a single application, some embodiments implement a solution that changes the hub order for each of the four edge routers, but only for the single application for which latency has increased. That is, in some embodiments, when traffic for only one application is affected, the hub order for the edge routers is only changed for that application, while the edge routers continue to use the original hub FE as primary for each other application.


In some embodiments, combinations of the solutions described in the above examples are implemented. As a fifth example, when latency increases due to a network impairment on WAN links of a first hub FE for which a subset of edge routers (e.g., two of four) are configured to use as a next hop for traffic for a first application, the solution, in some embodiments, is to change the hub order for only the subset of edge routers and only for traffic associated with the first application.


A sixth example, in some embodiments, involves an initial configuration where a first edge router sends traffic for a first application via a first hub FE, a second edge router sends traffic for the first application via a second hub FE, and third and fourth edge routers send traffic for the first application via a third hub FE. Upon detecting a network impairment on the first and second hub FEs for traffic for the first application, the implemented solution, in some embodiments, starts with the self-healing network detecting a drop in performance across both the first and second hub FEs. Incidents and recommendations are then generated, in some embodiments, including new hub orders for the first and second edge routers. In some embodiments, the recommended new hub orders should not have the first or second hub FEs as the first recommended hub FEs for the first and second edge routers.


A seventh and final example, of some embodiments, begins with an initial configuration where a first edge router sends traffic for first and second applications via a first hub FE, a second edge router sends traffic for the first and second applications via a second hub FE, and third and fourth edge routers send traffic for the first and second applications via a third hub FE. When a network impairment is detected that affects the first application but not the second application, in some embodiments, the solution is to perform a validation check to determine whether a different hub order is recommended for the affected application (i.e., the first application) while the second application continues using the same hub FE.


In some embodiments, the hub order is changed manually through a UI (user interface) provided by a controller for the network. For instance, in some embodiments, after a self-healing incident is generated, the hub order is manually changed (e.g., by a network administrator) through the network controller's UI before the automated remediation is applied. In that case, the automated remediation, in some embodiments, is not applied and instead shows up in a "failed" state.


In some embodiments, WAN optimizations (e.g., Dynamic Multi-Path Optimization (DMPO)) and the self-healing system described above are complementary, but they also differ in several ways. WAN optimizations, such as DMPO, in some embodiments, are performed from a local link-level perspective, while the self-healing system of some embodiments operates from a global topology perspective. WAN optimizations like DMPO involve network packet-level optimizations using overlay tunnel metrics, while the self-healing system involves flow/overlay route-level optimizations using end-to-end application performance metrics, according to some embodiments. Additionally, in some embodiments, WAN optimization such as DMPO solves underlay and last-mile link level issues, while the self-healing system solves issues outside optimized tunnels (e.g., VeloCloud Multipath (VCMP) tunnels) such as WAN issues upstream of a gateway/hub, datacenter network issues, and localized application issues.


In some embodiments, from a time-scale perspective, WAN optimizations like DMPO operate at the milliseconds timescale, making decisions at the local link level and responding to underlay and last-mile link issues caused by changes in network conditions, while the self-healing system operates at the minutes and days timescales, making decisions at the global topology level. At the minutes level, the self-healing system of some embodiments responds to sudden network issues outside the optimized tunnels (e.g., VCMP tunnels). At the days level, in some embodiments, the self-healing system responds to systemic inefficiencies in the network causing sustained application performance issues.


Currently, two main features of the self-healing system of some embodiments include incidents and recommendations. The goal of the incidents feature, in some embodiments, is to detect and remediate a sudden and significant application performance degradation. In some embodiments, an "incident" is created when the analytics system detects a sudden drop in application performance as compared to the recent past history of 30 minutes. The incidents feature works at the minutes timescale of the self-healing system, in some embodiments.



FIG. 17 illustrates the layout of an incident, in some embodiments. The incident summary 1700 includes the impact 1710, flow statistics 1720, flow overlay route 1730, other impact 1740, remediation 1750, and a line graph depicting performance drops. The impact 1710 denotes the number of edges that have been affected by the application performance drop. In this example, 10 out of 100, or 10%, of edges experienced a greater than 90% drop in Application 1 performance in the last few minutes. Specifically, the average performance drop was 95.7% with the most common root cause being high latency, as indicated. The incident summary 1700 also includes a line graph 1760 that provides a visualization of the performance drop.


The flow statistics 1720 identifies the specific flow metrics that show a significant change in value. In this example, the average TCP latency spiked to 2.7 s following a disruption. The flow overlay route 1730 denotes the most common next hop node on the SD-WAN overlay (or Direct) among the affected edges. As shown, 6 out of the 10 affected edges had the same hub, Hub-10, as the next hop overlay node. The other impact 1740 identifies if the application issue is specific to that application, or whether it affects other applications. In this example, no other applications were impacted at the same time, as illustrated.


Lastly, remediation 1750 identifies the remediation action that is automatically applied when automatic remediation is enabled, or manually applied (e.g., by an end-user through the user interface using a cursor or other selection means). In some embodiments, if automatic remediation is not enabled, the self-healing system only provides a suggested remediation action and a UI workflow to trigger the remediation action. The remediation action can be triggered directly through the incident alert, or through an ENI application available to the end-user. In this example, the remediation has been applied automatically (i.e., without any manual trigger), and as such, the remediation 1750 indicates that the remediation was executed by the self-healing system and provides a timestamp identifying when the remediation was automatically applied. The remediation 1750 also includes an option for an end-user to see details associated with the remediation, as shown.


The goal of the recommendations feature, in some embodiments, is to identify systemic issues in the network that are not transient but manifest repeatedly over time. A "recommendation" is created when the analytics system identifies an edge experiencing significantly worse application performance than other edges in the network over a longer time window (e.g., days). The recommendations feature works at the days timescale of the self-healing system, in some embodiments.


In some embodiments, the recommendations feature addresses a few main questions and correlates certain data. For example, the recommendations feature of some embodiments (1) addresses long-term application performance across edges, (2) identifies edges that are outliers and generally have worse performance than the average, (3) correlates application performance with important attributes (e.g., service provider, SD-WAN topology (e.g., direct versus using gateways/hubs), and next hop overlay node), and (4) identifies if the application issue is local to an edge, linked to the overlay/direct route, or is a general application issue. FIG. 18 illustrates an example layout 1800 of a recommendation, in some embodiments, that includes QoE (quality of experience) score comparisons for 6 gateways, as well as edge alternate overlay node QoE scores.


To calculate application QoE scores, the AI/ML platform of some embodiments calculates scores per minute, per edge, per application, and per route (i.e., next hops and destination hops). The metrics used to calculate application performance, in some embodiments, include tx_pkts, rx_pkts, tcpRxRexmit_pkts, tcpTxRexmit_pkts, tcpLatencySum_usec, and tcpLatencySamples. First, avgTcpLatency and percentTcpPacketDrops are calculated, where avgTcpLatency=tcpLatencySum_usec/tcpLatencySamples and percentTcpPacketDrops=(tcpTxRexmit_pkts+tcpRxRexmit_pkts)/(tx_pkts+rx_pkts). Then, when total packets are above 50 (i.e., tx_pkts+rx_pkts>50), scores are calculated. An intermediate score is calculated for avgTcpLatency using a piecewise function (e.g., with thresholds from a configuration file applied across all tenants). If avgTcpLatency<=40,000 us, the score is 100 (i.e., benign). If 40,000 us<avgTcpLatency<200,000 us, the score is 100−(avgTcpLatency−40,000 us)/(200,000−40,000)*100. If avgTcpLatency>=200,000 us, the score is 0 (i.e., bad). Next, an intermediate score for percentTcpPacketDrops is calculated using a piecewise function. If percentTcpPacketDrops<=0.05, the score is 100. If 0.05<percentTcpPacketDrops<0.15, the score is 100−(percentTcpPacketDrops−0.05)/(0.15−0.05)*100. If percentTcpPacketDrops>=0.15, the score is 0 (i.e., bad). Lastly, the minimum (i.e., the worse) of the two intermediate scores becomes the application QoE score, in some embodiments.
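The scoring just described can be summarized in the following Python sketch (a minimal sketch using the thresholds quoted above; the downward linear interpolation between the thresholds is written so that the segments meet the stated endpoint scores of 100 and 0):

def latency_score(avg_tcp_latency_usec):
    # Intermediate score for avgTcpLatency: 100 at or below 40,000 us,
    # 0 at or above 200,000 us, linearly interpolated in between.
    if avg_tcp_latency_usec <= 40000:
        return 100.0
    if avg_tcp_latency_usec >= 200000:
        return 0.0
    return 100.0 - (avg_tcp_latency_usec - 40000) / (200000 - 40000) * 100.0

def drop_score(percent_tcp_packet_drops):
    # Intermediate score for percentTcpPacketDrops: 100 at or below 0.05, 0 at or above 0.15.
    if percent_tcp_packet_drops <= 0.05:
        return 100.0
    if percent_tcp_packet_drops >= 0.15:
        return 0.0
    return 100.0 - (percent_tcp_packet_drops - 0.05) / (0.15 - 0.05) * 100.0

def application_qoe(tx_pkts, rx_pkts, tcp_lat_sum_usec, tcp_lat_samples, tcp_tx_rexmit, tcp_rx_rexmit):
    # Score only when there is enough traffic (more than 50 total packets).
    if tx_pkts + rx_pkts <= 50 or tcp_lat_samples == 0:
        return None
    avg_latency = tcp_lat_sum_usec / tcp_lat_samples
    drops = (tcp_tx_rexmit + tcp_rx_rexmit) / (tx_pkts + rx_pkts)
    # The worse (minimum) of the two intermediate scores becomes the application QoE score.
    return min(latency_score(avg_latency), drop_score(drops))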


In some embodiments, basic statistics computed include the number of edges per company and the number of TCP applications monitored by the self-healing system (i.e., those with application QoE scores). Basic application statistics computed, in some embodiments, include (1) for each edge, total bytes sent direct versus through an SD-WAN tunnel and a comparison of application scores for direct versus SD-WAN tunnel traffic, with table columns that include edge, total bytes direct, total bytes through SD-WAN tunnel, average appTcpQoe at peak times for direct, and average appTcpQoe at peak times for SD-WAN tunnel, (2) the top 10 applications by score and their traffic volumes, and (3) the bottom 10 applications by score and their traffic volumes.


Analysis of edges where the application performance deviates significantly from average/normal includes, in some embodiments, correlating with the overlay route topology and providing insights connecting edges, applications, and overlay routes. Overall edge/application/overlay statistics are provided in a table, in some embodiments, that lists all outlier edge-application combinations along with the overlay nextHop and performance statistics, with table columns that include application, edge, overlay next hop, traffic, application QoE score, and total poor application minutes. Analysis from the edge perspective, in some embodiments, is provided in a table listing each edge, the application and overlay next hop combination, application QoE score, and fraction (e.g., as a ratio) of poor performance time to total time. This analysis also includes the edge versus applications that have issues, and the top applications with poor performance per outlier edge.


Analysis from the perspective of the overlay route, in some embodiments, includes a comparison of direct versus SD-WAN tunnel for the outlier edges/applications. A table for the overlay route perspective of some embodiments includes columns for overlay next hop, edge-application pair, application QoE score, and ratio of poor performance time to total time. Lastly, analysis from an application perspective, in some embodiments, includes a table with columns including application, edge-overlay next hop pair, application QoE score, and ratio of poor performance time to total time.


In some embodiments, alternate path insights and recommendations are provided, such as those described above for FIGS. 17-18. For each outlier edge and application, in some embodiments, insights are provided regarding how alternate gateways perform if the application is a SaaS application routed through a gateway, and how alternate hubs perform if the application is a hosted application routed through a hub. For example, insights of some embodiments regarding performance of alternate gateways include the number of edges actively sending traffic to the alternate gateway, application QoE scores of those edges connecting to the alternate gateway, alternate gateway score(s), and other geographically close gateways and their scores. In some embodiments, insights regarding hosted applications routed through a hub include the number of edges actively sending traffic to the alternate hub, application QoE scores of those edges connecting to the alternate hub, and alternate hub score(s).


While the embodiments described above are described as performing analyses on a per-edge, per-application, per-path basis, other embodiments may perform the analyses differently, such as on a per-edge, per-application, per-physical link basis (e.g., 5G link vs MPLS link vs cable modem(s)), on a per-edge, per-path basis, or on a per-edge, per-physical link basis. Also, while the topology graphs described above are described as being per-application, other embodiments create one topology graph for several applications.


Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer-readable storage medium (also referred to as computer-readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer-readable media include, but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc. The computer-readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.


In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.



FIG. 19 conceptually illustrates a computer system 1900 with which some embodiments of the invention are implemented. The computer system 1900 can be used to implement any of the above-described hosts, controllers, gateways, and edge forwarding elements. As such, it can be used to execute any of the above-described processes. This computer system 1900 includes various types of non-transitory machine-readable media and interfaces for various other types of machine-readable media. Computer system 1900 includes a bus 1905, processing unit(s) 1910, a system memory 1925, a read-only memory 1930, a permanent storage device 1935, input devices 1940, and output devices 1945.


The bus 1905 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the computer system 1900. For instance, the bus 1905 communicatively connects the processing unit(s) 1910 with the read-only memory 1930, the system memory 1925, and the permanent storage device 1935.


From these various memory units, the processing unit(s) 1910 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) 1910 may be a single processor or a multi-core processor in different embodiments. The read-only-memory (ROM) 1930 stores static data and instructions that are needed by the processing unit(s) 1910 and other modules of the computer system 1900. The permanent storage device 1935, on the other hand, is a read-and-write memory device. This device 1935 is a non-volatile memory unit that stores instructions and data even when the computer system 1900 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1935.


Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 1935, the system memory 1925 is a read-and-write memory device. However, unlike storage device 1935, the system memory 1925 is a volatile read-and-write memory, such as random access memory. The system memory 1925 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 1925, the permanent storage device 1935, and/or the read-only memory 1930. From these various memory units, the processing unit(s) 1910 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.


The bus 1905 also connects to the input and output devices 1940 and 1945. The input devices 1940 enable the user to communicate information and select commands to the computer system 1900. The input devices 1940 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 1945 display images generated by the computer system 1900. The output devices 1945 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as touchscreens that function as both input and output devices 1940 and 1945.


Finally, as shown in FIG. 19, bus 1905 also couples computer system 1900 to a network 1965 through a network adapter (not shown). In this manner, the computer 1900 can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks (such as the Internet). Any or all components of computer system 1900 may be used in conjunction with the invention.


Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.


While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.


As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification, the terms “computer-readable medium,” “computer-readable media,” and “machine-readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral or transitory signals.


While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

Claims
  • 1. A method of remediating anomalies in an SD-WAN (software-defined wide-area network) implemented by a plurality of forwarding elements (FEs) located at a plurality of sites connected by the SD-WAN, the method comprising: iteratively: receiving a plurality of performance metrics that over a duration of time expresses a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration; using the received performance metrics to update generated weight values for a topology graph that comprises (i) a plurality of nodes representing the plurality of FEs and (ii) a plurality of edges between the plurality of nodes representing paths traversed between the FEs by the flows associated with the particular application, said generated weight values associated with said paths; using a topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows; and for an identified anomaly, implementing a remedial action to modify the SD-WAN in order to remediate the identified anomaly.
  • 2. The method of claim 1, wherein using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly further comprises determining whether the identified anomaly (i) is isolated to a particular FE in the plurality of FEs or (ii) affects the overall application.
  • 3. The method of claim 1, wherein the topology-based machine-trained process is a topology-based first machine-trained process, wherein identifying the remedial action comprises using a second machine-trained process for (i) identifying a set of potential remedial actions, (ii) testing the identified set of remedial actions, and (iii) based on said testing, selecting a remedial action from a set of potential remedial actions to modify the SD-WAN in order to remediate the identified anomaly.
  • 4. The method of claim 1, wherein implementing the remedial action comprises sending an API (application programming interface) call specifying the identified remedial action to an SD-WAN network controller to direct the SD-WAN network controller to implement the remedial action.
  • 5. The method of claim 1, wherein the plurality of FEs comprises (i) a plurality of edge FEs located at a plurality of branch sites connected by the SD-WAN, (ii) a plurality of transit FEs for connecting the plurality of edge FEs to each other and to a plurality of datacenter sites.
  • 6. The method of claim 5, wherein the identified anomaly comprises a network impairment on a first transit FE of the plurality of transit FEs that is a next-hop FE for application traffic associated with the particular application and forwarded by a first edge FE of the plurality of edge FEs located at a first branch site of the plurality of branch sites.
  • 7. The method of claim 6, wherein the identified remedial action comprises updating a transit FE order configuration for the first edge FE to change the next-hop transit FE for application traffic associated with the particular application and forwarded by the first edge FE from the first transit FE to a second transit FE in the plurality of transit FEs.
  • 8. The method of claim 7, wherein: the first transit FE is a next-hop transit FE for application traffic associated with the particular application and forwarded by a second edge FE in the plurality of edge FEs located at a second branch site in the plurality of branch sites,the identified anomaly is associated with the first edge FE and the second edge FE, andthe identified remedial action comprises updating the transit FE order for the first edge FE and updating a transit FE order for the second edge FE.
  • 9. The method of claim 5, wherein the particular application is a first application, wherein the identified anomaly comprises a network impairment on a first transit FE in the plurality of transit FEs that is a next-hop transit FE for application traffic (i) associated with at least the first application and a second application and (ii) forwarded by a first edge FE in the plurality of edge FEs located at a first branch site in the plurality of branch sites.
  • 10. The method of claim 9, wherein the identified remedial action comprises updating a transit FE order configuration for the particular edge FE to change the next-hop transit FE from the first transit FE to a second transit FE for application traffic (i) associated with the first application and the second application and (ii) forwarded by the particular edge FE.
  • 11. The method of claim 9, wherein the network impairment on the first transit FE affects application traffic associated with the first application and does not affect application traffic associated with the second application.
  • 12. The method of claim 11, wherein the identified remedial action comprises updating a transit FE order configuration for the particular edge FE to change the next-hop transit FE from the first transit FE to a second transit FE for application traffic (i) associated with the first application and (ii) forwarded by the particular edge FE.
  • 13. The method of claim 12, wherein after the remedial action has been applied, the particular edge FE (i) uses the second transit FE as a next hop for application traffic associated with the first application and (ii) continues to use the first transit FE as a next hop for application traffic associated with the second application.
  • 14. The method of claim 1, wherein the set of performance metrics is a first set of performance metrics that comprise performance scores computed for the duration of time based on a second set of performance metrics collected from the plurality of FEs.
  • 15. The method of claim 1, wherein the time window comprises two weeks.
  • 16. The method of claim 1, wherein each path in the plurality of paths is comprised of a set of one or more links, each link connecting two FEs in the plurality of FEs via a tunnel established over the link, the method further comprising generating the topology graph by (i) defining, for each FE in the plurality of FEs, a corresponding node in the plurality of nodes and (ii) defining, for each tunnel between two FEs, a corresponding edge in the plurality of edges.
  • 17. A non-transitory machine readable medium storing a program for execution by a set of processing units, the program for remediating anomalies in an SD-WAN (software-defined wide-area network) implemented by a plurality of forwarding elements (FEs) located at a plurality of sites connected by the SD-WAN, the program comprising sets of instructions for: iteratively: receiving a plurality of performance metrics that over a duration of time expresses a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration; using the received performance metrics to update generated weight values for a topology graph that comprises (i) a plurality of nodes representing the plurality of FEs and (ii) a plurality of edges between the plurality of nodes representing paths traversed between the FEs by the flows associated with the particular application, said generated weight values associated with said paths; using a topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows; and for an identified anomaly, implementing a remedial action to modify the SD-WAN in order to remediate the identified anomaly.
  • 18. The non-transitory machine readable medium of claim 17, wherein the set of instructions for using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly further comprises a set of instructions for determining whether the identified anomaly (i) is isolated to a particular FE in the plurality of FEs or (ii) affects the overall application.
  • 19. The non-transitory machine readable medium of claim 17, wherein the topology-based machine-trained process is a topology-based first machine-trained process, wherein the set of instructions for identifying the remedial action comprises a set of instructions for using a second machine-trained process for (i) identifying a set of potential remedial actions, (ii) testing the identified set of remedial actions, and (iii) based on said testing, selecting a remedial action from a set of potential remedial actions to modify the SD-WAN in order to remediate the identified anomaly.
  • 20. The non-transitory machine readable medium of claim 17, wherein the set of instructions for implementing the remedial action comprises a set of instructions for sending an API (application programming interface) call specifying the identified remedial action to an SD-WAN network controller to direct the SD-WAN network controller to implement the remedial action.
US Referenced Citations (1021)
Number Name Date Kind
5652751 Sharony Jul 1997 A
5909553 Campbell et al. Jun 1999 A
6154465 Pickett Nov 2000 A
6157648 Voit et al. Dec 2000 A
6201810 Masuda et al. Mar 2001 B1
6363378 Conklin et al. Mar 2002 B1
6445682 Weitz Sep 2002 B1
6744775 Beshai et al. Jun 2004 B1
6976087 Westfall et al. Dec 2005 B1
7003481 Banka et al. Feb 2006 B2
7280476 Anderson Oct 2007 B2
7313629 Nucci et al. Dec 2007 B1
7320017 Kurapati et al. Jan 2008 B1
7373660 Guichard et al. May 2008 B1
7581022 Griffin et al. Aug 2009 B1
7680925 Sathyanarayana et al. Mar 2010 B2
7681236 Tamura et al. Mar 2010 B2
7751409 Carolan Jul 2010 B1
7962458 Holenstein et al. Jun 2011 B2
8094575 Vadlakonda et al. Jan 2012 B1
8094659 Arad Jan 2012 B1
8111692 Ray Feb 2012 B2
8141156 Mao et al. Mar 2012 B1
8224971 Miller et al. Jul 2012 B1
8228928 Parandekar et al. Jul 2012 B2
8243589 Trost et al. Aug 2012 B1
8259566 Chen et al. Sep 2012 B2
8274891 Averi et al. Sep 2012 B2
8301749 Finklestein et al. Oct 2012 B1
8385227 Downey Feb 2013 B1
8516129 Skene Aug 2013 B1
8566452 Goodwin, III et al. Oct 2013 B1
8588066 Goel et al. Nov 2013 B2
8630291 Shaffer et al. Jan 2014 B2
8661295 Khanna et al. Feb 2014 B1
8724456 Hong et al. May 2014 B1
8724503 Johnsson et al. May 2014 B2
8745177 Kazerani et al. Jun 2014 B1
8797874 Yu et al. Aug 2014 B2
8799504 Capone et al. Aug 2014 B2
8804745 Sinn Aug 2014 B1
8806482 Nagargadde et al. Aug 2014 B1
8855071 Sankaran et al. Oct 2014 B1
8856339 Mestery et al. Oct 2014 B2
8964548 Keralapura et al. Feb 2015 B1
8989199 Sella et al. Mar 2015 B1
9009217 Nagargadde et al. Apr 2015 B1
9015299 Shah Apr 2015 B1
9055000 Ghosh et al. Jun 2015 B1
9060025 Xu Jun 2015 B2
9071607 Twitchell, Jr. Jun 2015 B2
9075771 Gawali et al. Jul 2015 B1
9100329 Jiang et al. Aug 2015 B1
9135037 Petrescu-Prahova et al. Sep 2015 B1
9137334 Zhou Sep 2015 B2
9154327 Marino et al. Oct 2015 B1
9203764 Shirazipour et al. Dec 2015 B2
9225591 Beheshti-Zavareh et al. Dec 2015 B2
9306949 Richard et al. Apr 2016 B1
9323561 Ayala et al. Apr 2016 B2
9336040 Dong et al. May 2016 B2
9354983 Yenamandra et al. May 2016 B1
9356943 Lopilato et al. May 2016 B1
9379981 Zhou et al. Jun 2016 B1
9413724 Xu Aug 2016 B2
9419878 Hsiao et al. Aug 2016 B2
9432245 Sorenson, III et al. Aug 2016 B1
9438566 Zhang et al. Sep 2016 B2
9450817 Bahadur et al. Sep 2016 B1
9450852 Chen et al. Sep 2016 B1
9462010 Stevenson Oct 2016 B1
9467478 Khan et al. Oct 2016 B1
9485163 Fries et al. Nov 2016 B1
9521067 Michael et al. Dec 2016 B2
9525564 Lee Dec 2016 B2
9542219 Bryant et al. Jan 2017 B1
9559951 Sajassi et al. Jan 2017 B1
9563423 Pittman Feb 2017 B1
9602389 Maveli et al. Mar 2017 B1
9608917 Anderson et al. Mar 2017 B1
9608962 Chang Mar 2017 B1
9614748 Battersby et al. Apr 2017 B1
9621460 Mehta et al. Apr 2017 B2
9641551 Kariyanahalli May 2017 B1
9648547 Hart et al. May 2017 B1
9665432 Kruse et al. May 2017 B2
9686127 Ramachandran et al. Jun 2017 B2
9692714 Nair et al. Jun 2017 B1
9715401 Devine et al. Jul 2017 B2
9717021 Hughes et al. Jul 2017 B2
9722815 Mukundan et al. Aug 2017 B2
9747249 Cherian et al. Aug 2017 B2
9755965 Yadav et al. Sep 2017 B1
9787559 Schroeder Oct 2017 B1
9807004 Koley et al. Oct 2017 B2
9819540 Bahadur et al. Nov 2017 B1
9819565 Djukic et al. Nov 2017 B2
9825822 Holland Nov 2017 B1
9825911 Brandwine Nov 2017 B1
9825992 Xu Nov 2017 B2
9832128 Ashner et al. Nov 2017 B1
9832205 Santhi et al. Nov 2017 B2
9875355 Williams Jan 2018 B1
9906401 Rao Feb 2018 B1
9923826 Murgia Mar 2018 B2
9930011 Clemons, Jr. et al. Mar 2018 B1
9935829 Miller et al. Apr 2018 B1
9942787 Tillotson Apr 2018 B1
9996370 Khafizov et al. Jun 2018 B1
10038601 Becker et al. Jul 2018 B1
10057183 Salle et al. Aug 2018 B2
10057294 Xu Aug 2018 B2
10116593 Sinn et al. Oct 2018 B1
10135789 Mayya et al. Nov 2018 B2
10142226 Wu et al. Nov 2018 B1
10178032 Freitas Jan 2019 B1
10178037 Appleby et al. Jan 2019 B2
10187289 Chen et al. Jan 2019 B1
10200264 Menon et al. Feb 2019 B2
10229017 Zou et al. Mar 2019 B1
10237123 Dubey et al. Mar 2019 B2
10250498 Bales et al. Apr 2019 B1
10263832 Ghosh Apr 2019 B1
10320664 Nainar et al. Jun 2019 B2
10320691 Matthews et al. Jun 2019 B1
10326830 Singh Jun 2019 B1
10348767 Lee et al. Jul 2019 B1
10355989 Panchal et al. Jul 2019 B1
10425382 Mayya et al. Sep 2019 B2
10454708 Mibu Oct 2019 B2
10454714 Mayya et al. Oct 2019 B2
10461993 Turabi et al. Oct 2019 B2
10498652 Mayya et al. Dec 2019 B2
10511546 Singarayan et al. Dec 2019 B2
10523539 Mayya et al. Dec 2019 B2
10550093 Ojima et al. Feb 2020 B2
10554538 Spohn et al. Feb 2020 B2
10560431 Chen et al. Feb 2020 B1
10565464 Han et al. Feb 2020 B2
10567519 Mukhopadhyaya et al. Feb 2020 B1
10574482 Oré et al. Feb 2020 B2
10574528 Mayya et al. Feb 2020 B2
10594516 Cidon et al. Mar 2020 B2
10594591 Houjyo et al. Mar 2020 B2
10594659 El-Moussa et al. Mar 2020 B2
10608844 Cidon et al. Mar 2020 B2
10630505 Rubenstein et al. Apr 2020 B2
10637889 Ermagan et al. Apr 2020 B2
10666460 Cidon et al. May 2020 B2
10666497 Tahhan et al. May 2020 B2
10686625 Cidon et al. Jun 2020 B2
10693739 Naseri et al. Jun 2020 B1
10708144 Mohan et al. Jul 2020 B2
10715427 Raj et al. Jul 2020 B2
10749711 Mukundan et al. Aug 2020 B2
10778466 Cidon et al. Sep 2020 B2
10778528 Mayya et al. Sep 2020 B2
10778557 Ganichev et al. Sep 2020 B2
10805114 Cidon et al. Oct 2020 B2
10805272 Mayya et al. Oct 2020 B2
10819564 Turabi et al. Oct 2020 B2
10826775 Moreno et al. Nov 2020 B1
10841131 Cidon et al. Nov 2020 B2
10911374 Kumar et al. Feb 2021 B1
10938693 Mayya et al. Mar 2021 B2
10951529 Duan et al. Mar 2021 B2
10958479 Cidon et al. Mar 2021 B2
10959098 Cidon et al. Mar 2021 B2
10992558 Silva et al. Apr 2021 B1
10992568 Michael et al. Apr 2021 B2
10999100 Cidon et al. May 2021 B2
10999137 Cidon et al. May 2021 B2
10999165 Cidon et al. May 2021 B2
10999197 Hooda et al. May 2021 B2
11005684 Cidon May 2021 B2
11018995 Cidon et al. May 2021 B2
11044190 Ramaswamy et al. Jun 2021 B2
11050588 Mayya et al. Jun 2021 B2
11050644 Hegde et al. Jun 2021 B2
11071005 Shen et al. Jul 2021 B2
11089111 Markuze et al. Aug 2021 B2
11095612 Oswal et al. Aug 2021 B1
11102032 Cidon et al. Aug 2021 B2
11108595 Knutsen et al. Aug 2021 B2
11108851 Kurmala et al. Aug 2021 B1
11115347 Gupta et al. Sep 2021 B2
11115426 Pazhyannur et al. Sep 2021 B1
11115480 Markuze et al. Sep 2021 B2
11121962 Michael et al. Sep 2021 B2
11121985 Cidon et al. Sep 2021 B2
11128492 Sethi et al. Sep 2021 B2
11146632 Rubenstein Oct 2021 B2
11153230 Cidon et al. Oct 2021 B2
11171885 Cidon et al. Nov 2021 B2
11212140 Mukundan et al. Dec 2021 B2
11212238 Cidon et al. Dec 2021 B2
11223514 Mayya et al. Jan 2022 B2
11245641 Ramaswamy et al. Feb 2022 B2
11252079 Michael et al. Feb 2022 B2
11252105 Cidon et al. Feb 2022 B2
11252106 Cidon et al. Feb 2022 B2
11258728 Cidon et al. Feb 2022 B2
11310170 Cidon et al. Apr 2022 B2
11323307 Mayya et al. May 2022 B2
11349722 Mayya et al. May 2022 B2
11363124 Markuze et al. Jun 2022 B2
11374904 Mayya et al. Jun 2022 B2
11375005 Rolando et al. Jun 2022 B1
11381474 Kumar et al. Jul 2022 B1
11381499 Ramaswamy et al. Jul 2022 B1
11388086 Ramaswamy et al. Jul 2022 B1
11394640 Ramaswamy et al. Jul 2022 B2
11418997 Devadoss et al. Aug 2022 B2
11438789 Devadoss et al. Sep 2022 B2
11444865 Ramaswamy et al. Sep 2022 B2
11444872 Mayya et al. Sep 2022 B2
11477127 Ramaswamy et al. Oct 2022 B2
11489720 Kempanna et al. Nov 2022 B1
11489783 Ramaswamy et al. Nov 2022 B2
11509571 Ramaswamy et al. Nov 2022 B1
11516049 Cidon et al. Nov 2022 B2
11522780 Wallace et al. Dec 2022 B1
11526434 Brooker et al. Dec 2022 B1
11533248 Mayya et al. Dec 2022 B2
11552874 Pragada et al. Jan 2023 B1
11575591 Ramaswamy et al. Feb 2023 B2
11575600 Markuze et al. Feb 2023 B2
11582144 Ramaswamy Feb 2023 B2
11582298 Hood et al. Feb 2023 B2
11601356 Gandhi et al. Mar 2023 B2
11606225 Cidon et al. Mar 2023 B2
11606286 Michael et al. Mar 2023 B2
11606314 Cidon et al. Mar 2023 B2
11606712 Devadoss et al. Mar 2023 B2
11611507 Ramaswamy et al. Mar 2023 B2
11637768 Ramaswamy et al. Apr 2023 B2
11677720 Mayya et al. Jun 2023 B2
11689959 Devadoss et al. Jun 2023 B2
11700196 Michael et al. Jul 2023 B2
11706126 Silva et al. Jul 2023 B2
11706127 Michael et al. Jul 2023 B2
11709710 Markuze et al. Jul 2023 B2
11716286 Ramaswamy et al. Aug 2023 B2
11722925 Devadoss et al. Aug 2023 B2
11729065 Ramaswamy et al. Aug 2023 B2
20020049687 Helsper et al. Apr 2002 A1
20020075542 Kumar et al. Jun 2002 A1
20020085488 Kobayashi Jul 2002 A1
20020087716 Mustafa Jul 2002 A1
20020152306 Tuck Oct 2002 A1
20020186682 Kawano et al. Dec 2002 A1
20020198840 Banka et al. Dec 2002 A1
20030050061 Wu et al. Mar 2003 A1
20030061269 Hathaway et al. Mar 2003 A1
20030088697 Matsuhira May 2003 A1
20030112766 Riedel et al. Jun 2003 A1
20030112808 Solomon Jun 2003 A1
20030126468 Markham Jul 2003 A1
20030161313 Jinmei et al. Aug 2003 A1
20030189919 Gupta et al. Oct 2003 A1
20030202506 Perkins et al. Oct 2003 A1
20030219030 Gubbi Nov 2003 A1
20040059831 Chu et al. Mar 2004 A1
20040068668 Lor et al. Apr 2004 A1
20040165601 Liu et al. Aug 2004 A1
20040224771 Chen et al. Nov 2004 A1
20050078690 DeLangis Apr 2005 A1
20050149604 Navada Jul 2005 A1
20050154790 Nagata et al. Jul 2005 A1
20050172161 Cruz et al. Aug 2005 A1
20050195754 Nosella Sep 2005 A1
20050210479 Andjelic Sep 2005 A1
20050265255 Kodialam et al. Dec 2005 A1
20060002291 Alicherry et al. Jan 2006 A1
20060034335 Karaoguz et al. Feb 2006 A1
20060114838 Mandavilli et al. Jun 2006 A1
20060171365 Borella Aug 2006 A1
20060182034 Klinker et al. Aug 2006 A1
20060182035 Vasseur Aug 2006 A1
20060193247 Naseh et al. Aug 2006 A1
20060193252 Naseh et al. Aug 2006 A1
20060195605 Sundarrajan et al. Aug 2006 A1
20060245414 Susai et al. Nov 2006 A1
20070050594 Augsburg et al. Mar 2007 A1
20070064604 Chen et al. Mar 2007 A1
20070064702 Bates et al. Mar 2007 A1
20070083727 Johnston et al. Apr 2007 A1
20070091794 Filsfils et al. Apr 2007 A1
20070103548 Carter May 2007 A1
20070115812 Hughes May 2007 A1
20070121486 Guichard et al. May 2007 A1
20070130325 Lesser Jun 2007 A1
20070162619 Aloni et al. Jul 2007 A1
20070162639 Chu et al. Jul 2007 A1
20070177511 Das et al. Aug 2007 A1
20070195797 Patel et al. Aug 2007 A1
20070237081 Kodialam et al. Oct 2007 A1
20070260746 Mirtorabi et al. Nov 2007 A1
20070268882 Breslau et al. Nov 2007 A1
20080002670 Bugenhagen et al. Jan 2008 A1
20080049621 McGuire et al. Feb 2008 A1
20080055241 Goldenberg et al. Mar 2008 A1
20080080509 Khanna et al. Apr 2008 A1
20080095187 Jung et al. Apr 2008 A1
20080117930 Chakareski et al. May 2008 A1
20080144532 Chamarajanagar et al. Jun 2008 A1
20080168086 Miller et al. Jul 2008 A1
20080175150 Bolt et al. Jul 2008 A1
20080181116 Kavanaugh et al. Jul 2008 A1
20080219276 Shah Sep 2008 A1
20080240121 Xiong et al. Oct 2008 A1
20080263218 Beerends et al. Oct 2008 A1
20090013210 McIntosh et al. Jan 2009 A1
20090028092 Rothschild Jan 2009 A1
20090125617 Klessig et al. May 2009 A1
20090141642 Sun Jun 2009 A1
20090154463 Hines et al. Jun 2009 A1
20090182874 Morford et al. Jul 2009 A1
20090247204 Sennett et al. Oct 2009 A1
20090268605 Campbell et al. Oct 2009 A1
20090274045 Meier et al. Nov 2009 A1
20090276657 Wetmore et al. Nov 2009 A1
20090303880 Maltz et al. Dec 2009 A1
20100008361 Guichard et al. Jan 2010 A1
20100017802 Lojewski Jan 2010 A1
20100046532 Okita Feb 2010 A1
20100061379 Parandekar et al. Mar 2010 A1
20100080129 Strahan et al. Apr 2010 A1
20100088440 Banks et al. Apr 2010 A1
20100091782 Hiscock Apr 2010 A1
20100091823 Retana et al. Apr 2010 A1
20100107162 Edwards et al. Apr 2010 A1
20100118727 Draves et al. May 2010 A1
20100118886 Saavedra May 2010 A1
20100128600 Srinivasmurthy et al. May 2010 A1
20100165985 Sharma et al. Jul 2010 A1
20100191884 Holenstein et al. Jul 2010 A1
20100223621 Joshi et al. Sep 2010 A1
20100226246 Proulx Sep 2010 A1
20100290422 Haigh et al. Nov 2010 A1
20100309841 Conte Dec 2010 A1
20100309912 Mehta et al. Dec 2010 A1
20100322255 Hao et al. Dec 2010 A1
20100332657 Elyashev et al. Dec 2010 A1
20110001604 Ludlow et al. Jan 2011 A1
20110007752 Silva et al. Jan 2011 A1
20110032939 Nozaki et al. Feb 2011 A1
20110035187 DeJori et al. Feb 2011 A1
20110040814 Higgins Feb 2011 A1
20110075674 Li et al. Mar 2011 A1
20110078783 Duan et al. Mar 2011 A1
20110107139 Middlecamp et al. May 2011 A1
20110110370 Moreno et al. May 2011 A1
20110141877 Xu et al. Jun 2011 A1
20110142041 Imai Jun 2011 A1
20110153909 Dong Jun 2011 A1
20110235509 Szymanski Sep 2011 A1
20110255397 Kadakia et al. Oct 2011 A1
20110302663 Prodan et al. Dec 2011 A1
20120008630 Ould-Brahim Jan 2012 A1
20120027013 Napierala Feb 2012 A1
20120039309 Evans et al. Feb 2012 A1
20120099601 Haddad et al. Apr 2012 A1
20120136697 Peles et al. May 2012 A1
20120140935 Kruglick Jun 2012 A1
20120157068 Eichen et al. Jun 2012 A1
20120173694 Yan et al. Jul 2012 A1
20120173919 Patel et al. Jul 2012 A1
20120182940 Taleb et al. Jul 2012 A1
20120221955 Raleigh et al. Aug 2012 A1
20120227093 Shatzkamer et al. Sep 2012 A1
20120240185 Kapoor et al. Sep 2012 A1
20120250682 Vincent et al. Oct 2012 A1
20120250686 Vincent et al. Oct 2012 A1
20120266026 Chikkalingaiah et al. Oct 2012 A1
20120281706 Agarwal et al. Nov 2012 A1
20120287818 Corti et al. Nov 2012 A1
20120300615 Kempf et al. Nov 2012 A1
20120307659 Yamada Dec 2012 A1
20120317270 Vrbaski et al. Dec 2012 A1
20120317291 Wolfe Dec 2012 A1
20130007505 Spear Jan 2013 A1
20130019005 Hui et al. Jan 2013 A1
20130021968 Reznik et al. Jan 2013 A1
20130044764 Casado et al. Feb 2013 A1
20130051237 Ong Feb 2013 A1
20130051399 Zhang et al. Feb 2013 A1
20130054763 Merwe et al. Feb 2013 A1
20130086267 Gelenbe et al. Apr 2013 A1
20130097304 Asthana et al. Apr 2013 A1
20130103729 Cooney et al. Apr 2013 A1
20130103834 Dzerve et al. Apr 2013 A1
20130117530 Kim et al. May 2013 A1
20130124718 Griffith et al. May 2013 A1
20130124911 Griffith et al. May 2013 A1
20130124912 Griffith et al. May 2013 A1
20130128889 Mathur et al. May 2013 A1
20130142201 Kim et al. Jun 2013 A1
20130170354 Takashima et al. Jul 2013 A1
20130173768 Kundu et al. Jul 2013 A1
20130173788 Song Jul 2013 A1
20130182712 Aguayo et al. Jul 2013 A1
20130185446 Zeng et al. Jul 2013 A1
20130185729 Vasic et al. Jul 2013 A1
20130191688 Agarwal et al. Jul 2013 A1
20130223226 Narayanan et al. Aug 2013 A1
20130223454 Dunbar et al. Aug 2013 A1
20130235870 Tripathi et al. Sep 2013 A1
20130238782 Zhao et al. Sep 2013 A1
20130242718 Zhang Sep 2013 A1
20130254599 Katkar et al. Sep 2013 A1
20130258839 Wang et al. Oct 2013 A1
20130258847 Zhang et al. Oct 2013 A1
20130266015 Qu et al. Oct 2013 A1
20130266019 Qu et al. Oct 2013 A1
20130283364 Chang et al. Oct 2013 A1
20130286846 Atlas et al. Oct 2013 A1
20130297611 Moritz et al. Nov 2013 A1
20130297770 Zhang Nov 2013 A1
20130301469 Suga Nov 2013 A1
20130301642 Radhakrishnan et al. Nov 2013 A1
20130308444 Sem-Jacobsen et al. Nov 2013 A1
20130315242 Wang et al. Nov 2013 A1
20130315243 Huang et al. Nov 2013 A1
20130329548 Nakil et al. Dec 2013 A1
20130329601 Yin et al. Dec 2013 A1
20130329734 Chesla et al. Dec 2013 A1
20130346470 Obstfeld et al. Dec 2013 A1
20140016464 Shirazipour et al. Jan 2014 A1
20140019604 Twitchell, Jr. Jan 2014 A1
20140019750 Dodgson et al. Jan 2014 A1
20140040975 Raleigh et al. Feb 2014 A1
20140064283 Balus et al. Mar 2014 A1
20140071832 Johnsson et al. Mar 2014 A1
20140092907 Sridhar et al. Apr 2014 A1
20140108665 Arora et al. Apr 2014 A1
20140112171 Pasdar Apr 2014 A1
20140115584 Mudigonda et al. Apr 2014 A1
20140122559 Branson et al. May 2014 A1
20140123135 Huang et al. May 2014 A1
20140126418 Brendel et al. May 2014 A1
20140156818 Hunt Jun 2014 A1
20140156823 Liu et al. Jun 2014 A1
20140157363 Banerjee Jun 2014 A1
20140160935 Zecharia et al. Jun 2014 A1
20140164560 Ko et al. Jun 2014 A1
20140164617 Jalan et al. Jun 2014 A1
20140164718 Schaik et al. Jun 2014 A1
20140173113 Vemuri et al. Jun 2014 A1
20140173331 Martin et al. Jun 2014 A1
20140181824 Saund et al. Jun 2014 A1
20140189074 Parker Jul 2014 A1
20140208317 Nakagawa Jul 2014 A1
20140219135 Li et al. Aug 2014 A1
20140223507 Xu Aug 2014 A1
20140229210 Sharifian et al. Aug 2014 A1
20140244851 Lee Aug 2014 A1
20140258535 Zhang Sep 2014 A1
20140269690 Tu Sep 2014 A1
20140279862 Dietz et al. Sep 2014 A1
20140280499 Basavaiah et al. Sep 2014 A1
20140310282 Sprague et al. Oct 2014 A1
20140317440 Biermayr et al. Oct 2014 A1
20140321277 Lynn, Jr. et al. Oct 2014 A1
20140337500 Lee Nov 2014 A1
20140337674 Ivancic et al. Nov 2014 A1
20140341109 Cartmell et al. Nov 2014 A1
20140355441 Jain Dec 2014 A1
20140365834 Stone et al. Dec 2014 A1
20140372582 Ghanwani et al. Dec 2014 A1
20150003240 Drwiega et al. Jan 2015 A1
20150016249 Mukundan et al. Jan 2015 A1
20150029864 Raileanu et al. Jan 2015 A1
20150039744 Niazi et al. Feb 2015 A1
20150046572 Cheng et al. Feb 2015 A1
20150052247 Threefoot et al. Feb 2015 A1
20150052517 Raghu et al. Feb 2015 A1
20150056960 Egner et al. Feb 2015 A1
20150058917 Xu Feb 2015 A1
20150088942 Shah Mar 2015 A1
20150089628 Lang Mar 2015 A1
20150092603 Aguayo et al. Apr 2015 A1
20150096011 Watt Apr 2015 A1
20150100958 Banavalikar et al. Apr 2015 A1
20150106809 Reddy et al. Apr 2015 A1
20150124603 Ketheesan et al. May 2015 A1
20150134777 Onoue May 2015 A1
20150139238 Pourzandi et al. May 2015 A1
20150146539 Mehta et al. May 2015 A1
20150163152 Li Jun 2015 A1
20150169340 Haddad et al. Jun 2015 A1
20150172121 Farkas et al. Jun 2015 A1
20150172169 DeCusatis et al. Jun 2015 A1
20150188823 Williams et al. Jul 2015 A1
20150189009 Bemmel Jul 2015 A1
20150195178 Bhattacharya et al. Jul 2015 A1
20150201036 Nishiki et al. Jul 2015 A1
20150222543 Song Aug 2015 A1
20150222638 Morley Aug 2015 A1
20150236945 Michael et al. Aug 2015 A1
20150236962 Veres et al. Aug 2015 A1
20150244617 Nakil et al. Aug 2015 A1
20150249644 Xu Sep 2015 A1
20150257081 Ramanujan et al. Sep 2015 A1
20150264055 Budhani et al. Sep 2015 A1
20150271056 Chunduri et al. Sep 2015 A1
20150271104 Chikkamath et al. Sep 2015 A1
20150271303 Neginhal et al. Sep 2015 A1
20150281004 Kakadia et al. Oct 2015 A1
20150312142 Barabash et al. Oct 2015 A1
20150312760 O'Toole Oct 2015 A1
20150317169 Sinha et al. Nov 2015 A1
20150326426 Uo et al. Nov 2015 A1
20150334025 Rader Nov 2015 A1
20150334696 Gu et al. Nov 2015 A1
20150341271 Gomez Nov 2015 A1
20150349978 Wu et al. Dec 2015 A1
20150350907 Timariu et al. Dec 2015 A1
20150358232 Chen et al. Dec 2015 A1
20150358236 Roach et al. Dec 2015 A1
20150363221 Terayama et al. Dec 2015 A1
20150363733 Brown Dec 2015 A1
20150365323 Duminuco et al. Dec 2015 A1
20150372943 Hasan et al. Dec 2015 A1
20150372982 Herle et al. Dec 2015 A1
20150381407 Wang et al. Dec 2015 A1
20150381462 Choi et al. Dec 2015 A1
20150381493 Bansal et al. Dec 2015 A1
20160019317 Pawar et al. Jan 2016 A1
20160020844 Hart et al. Jan 2016 A1
20160021597 Hart et al. Jan 2016 A1
20160035183 Buchholz et al. Feb 2016 A1
20160036924 Koppolu et al. Feb 2016 A1
20160036938 Aviles et al. Feb 2016 A1
20160037434 Gopal et al. Feb 2016 A1
20160072669 Saavedra Mar 2016 A1
20160072684 Manuguri et al. Mar 2016 A1
20160080268 Anand et al. Mar 2016 A1
20160080502 Yadav et al. Mar 2016 A1
20160105353 Cociglio Apr 2016 A1
20160105392 Thakkar et al. Apr 2016 A1
20160105471 Nunes et al. Apr 2016 A1
20160105488 Thakkar et al. Apr 2016 A1
20160117185 Fang et al. Apr 2016 A1
20160134461 Sampath et al. May 2016 A1
20160134527 Kwak et al. May 2016 A1
20160134528 Lin et al. May 2016 A1
20160134591 Liao et al. May 2016 A1
20160142373 Ossipov May 2016 A1
20160147607 Dornemann et al. May 2016 A1
20160150055 Choi May 2016 A1
20160164832 Bellagamba et al. Jun 2016 A1
20160164914 Madhav et al. Jun 2016 A1
20160173338 Wolting Jun 2016 A1
20160191363 Haraszti et al. Jun 2016 A1
20160191374 Singh et al. Jun 2016 A1
20160192403 Gupta et al. Jun 2016 A1
20160197834 Luft Jul 2016 A1
20160197835 Luft Jul 2016 A1
20160198003 Luft Jul 2016 A1
20160205071 Cooper et al. Jul 2016 A1
20160210209 Verkaik et al. Jul 2016 A1
20160212773 Kanderholm et al. Jul 2016 A1
20160218947 Hughes et al. Jul 2016 A1
20160218951 Vasseur et al. Jul 2016 A1
20160234099 Jiao Aug 2016 A1
20160234161 Banerjee et al. Aug 2016 A1
20160255169 Kovvuri et al. Sep 2016 A1
20160255542 Hughes et al. Sep 2016 A1
20160261493 Li Sep 2016 A1
20160261495 Xia et al. Sep 2016 A1
20160261506 Hegde et al. Sep 2016 A1
20160261639 Xu Sep 2016 A1
20160269298 Li et al. Sep 2016 A1
20160269926 Sundaram Sep 2016 A1
20160285736 Gu Sep 2016 A1
20160299775 Madapurath et al. Oct 2016 A1
20160301471 Kunz et al. Oct 2016 A1
20160308762 Teng et al. Oct 2016 A1
20160315912 Mayya et al. Oct 2016 A1
20160323377 Einkauf et al. Nov 2016 A1
20160328159 Coddington et al. Nov 2016 A1
20160330111 Manghirmalani et al. Nov 2016 A1
20160337202 Ben-Itzhak et al. Nov 2016 A1
20160352588 Subbarayan et al. Dec 2016 A1
20160353268 Senarath et al. Dec 2016 A1
20160359738 Sullenberger et al. Dec 2016 A1
20160366187 Kamble Dec 2016 A1
20160371153 Dornemann Dec 2016 A1
20160378527 Zamir Dec 2016 A1
20160380886 Blair et al. Dec 2016 A1
20160380906 Hodique et al. Dec 2016 A1
20170005986 Bansal et al. Jan 2017 A1
20170006499 Hampel et al. Jan 2017 A1
20170012870 Blair et al. Jan 2017 A1
20170019428 Cohn Jan 2017 A1
20170024260 Chandrasekaran et al. Jan 2017 A1
20170026273 Yao et al. Jan 2017 A1
20170026283 Williams et al. Jan 2017 A1
20170026355 Mathaiyan et al. Jan 2017 A1
20170034046 Cai et al. Feb 2017 A1
20170034052 Chanda et al. Feb 2017 A1
20170034129 Sawant et al. Feb 2017 A1
20170048296 Ramalho et al. Feb 2017 A1
20170053258 Carney et al. Feb 2017 A1
20170055131 Kong et al. Feb 2017 A1
20170063674 Maskalik et al. Mar 2017 A1
20170063782 Jain et al. Mar 2017 A1
20170063783 Yong et al. Mar 2017 A1
20170063794 Jain et al. Mar 2017 A1
20170064005 Lee Mar 2017 A1
20170075710 Prasad et al. Mar 2017 A1
20170093625 Pera et al. Mar 2017 A1
20170097841 Chang et al. Apr 2017 A1
20170104653 Badea et al. Apr 2017 A1
20170104755 Arregoces et al. Apr 2017 A1
20170109212 Gaurav et al. Apr 2017 A1
20170118067 Vedula Apr 2017 A1
20170118173 Arramreddy et al. Apr 2017 A1
20170123939 Maheshwari et al. May 2017 A1
20170126475 Mahkonen et al. May 2017 A1
20170126516 Tiagi et al. May 2017 A1
20170126564 Mayya et al. May 2017 A1
20170134186 Mukundan et al. May 2017 A1
20170134520 Abbasi et al. May 2017 A1
20170139789 Fries et al. May 2017 A1
20170142000 Cai et al. May 2017 A1
20170149637 Banikazemi et al. May 2017 A1
20170155557 Desai et al. Jun 2017 A1
20170155566 Martinsen et al. Jun 2017 A1
20170155590 Dillon et al. Jun 2017 A1
20170163473 Sadana et al. Jun 2017 A1
20170171024 Anerousis et al. Jun 2017 A1
20170171310 Gardner Jun 2017 A1
20170180220 Leckey et al. Jun 2017 A1
20170181210 Nadella et al. Jun 2017 A1
20170195161 Ruel et al. Jul 2017 A1
20170195169 Mills et al. Jul 2017 A1
20170201568 Hussam et al. Jul 2017 A1
20170201585 Doraiswamy et al. Jul 2017 A1
20170207976 Rovner et al. Jul 2017 A1
20170214545 Cheng et al. Jul 2017 A1
20170214701 Hasan Jul 2017 A1
20170223117 Messerli et al. Aug 2017 A1
20170236060 Ignatyev Aug 2017 A1
20170237710 Mayya et al. Aug 2017 A1
20170242784 Heorhiadi et al. Aug 2017 A1
20170257260 Govindan et al. Sep 2017 A1
20170257309 Appanna Sep 2017 A1
20170264496 Ao et al. Sep 2017 A1
20170279717 Bethers et al. Sep 2017 A1
20170279741 Elias et al. Sep 2017 A1
20170279803 Desai et al. Sep 2017 A1
20170280474 Vesterinen et al. Sep 2017 A1
20170288987 Pasupathy et al. Oct 2017 A1
20170289002 Ganguli et al. Oct 2017 A1
20170289027 Ratnasingham Oct 2017 A1
20170295264 Touitou et al. Oct 2017 A1
20170302501 Shi et al. Oct 2017 A1
20170302565 Ghobadi et al. Oct 2017 A1
20170310641 Jiang et al. Oct 2017 A1
20170310691 Vasseur et al. Oct 2017 A1
20170317954 Masurekar et al. Nov 2017 A1
20170317969 Masurekar et al. Nov 2017 A1
20170317974 Masurekar et al. Nov 2017 A1
20170324628 Dhanabalan Nov 2017 A1
20170337086 Zhu et al. Nov 2017 A1
20170339022 Hegde et al. Nov 2017 A1
20170339054 Yadav et al. Nov 2017 A1
20170339070 Chang et al. Nov 2017 A1
20170346722 Smith et al. Nov 2017 A1
20170364419 Lo Dec 2017 A1
20170366445 Nemirovsky et al. Dec 2017 A1
20170366467 Martin et al. Dec 2017 A1
20170373950 Szilagyi et al. Dec 2017 A1
20170374174 Evens et al. Dec 2017 A1
20180006995 Bickhart et al. Jan 2018 A1
20180007005 Chanda et al. Jan 2018 A1
20180007123 Cheng et al. Jan 2018 A1
20180013636 Seetharamaiah et al. Jan 2018 A1
20180014051 Phillips et al. Jan 2018 A1
20180020035 Boggia et al. Jan 2018 A1
20180034668 Mayya et al. Feb 2018 A1
20180041425 Zhang Feb 2018 A1
20180062875 Tumuluru Mar 2018 A1
20180062914 Boutros et al. Mar 2018 A1
20180062917 Chandrashekhar et al. Mar 2018 A1
20180063036 Chandrashekhar et al. Mar 2018 A1
20180063193 Chandrashekhar et al. Mar 2018 A1
20180063233 Park Mar 2018 A1
20180063743 Tumuluru et al. Mar 2018 A1
20180069924 Tumuluru et al. Mar 2018 A1
20180074909 Bishop et al. Mar 2018 A1
20180077081 Lauer et al. Mar 2018 A1
20180077202 Xu Mar 2018 A1
20180084081 Kuchibhotla et al. Mar 2018 A1
20180091370 Arai Mar 2018 A1
20180097725 Wood et al. Apr 2018 A1
20180114569 Strachan et al. Apr 2018 A1
20180123910 Fitzgibbon May 2018 A1
20180123946 Ramachandran et al. May 2018 A1
20180131608 Jiang et al. May 2018 A1
20180131615 Zhang May 2018 A1
20180131720 Hobson et al. May 2018 A1
20180145899 Rao May 2018 A1
20180159796 Wang et al. Jun 2018 A1
20180159856 Gujarathi Jun 2018 A1
20180167378 Kostyukov et al. Jun 2018 A1
20180176073 Dubey et al. Jun 2018 A1
20180176082 Katz et al. Jun 2018 A1
20180176130 Banerjee et al. Jun 2018 A1
20180176252 Nimmagadda et al. Jun 2018 A1
20180181423 Gunda et al. Jun 2018 A1
20180205746 Boutnaru et al. Jul 2018 A1
20180213472 Ishii et al. Jul 2018 A1
20180219765 Michael et al. Aug 2018 A1
20180219766 Michael et al. Aug 2018 A1
20180234300 Mayya et al. Aug 2018 A1
20180248790 Tan et al. Aug 2018 A1
20180260125 Botes et al. Sep 2018 A1
20180261085 Liu et al. Sep 2018 A1
20180262468 Kumar et al. Sep 2018 A1
20180270104 Zheng et al. Sep 2018 A1
20180278541 Wu et al. Sep 2018 A1
20180287907 Kulshreshtha et al. Oct 2018 A1
20180295101 Gehrmann Oct 2018 A1
20180295529 Jen et al. Oct 2018 A1
20180302286 Mayya et al. Oct 2018 A1
20180302321 Manthiramoorthy et al. Oct 2018 A1
20180307851 Lewis Oct 2018 A1
20180316606 Sung et al. Nov 2018 A1
20180351855 Sood et al. Dec 2018 A1
20180351862 Jeganathan et al. Dec 2018 A1
20180351863 Vairavakkalai et al. Dec 2018 A1
20180351882 Jeganathan et al. Dec 2018 A1
20180359323 Madden Dec 2018 A1
20180367445 Bajaj Dec 2018 A1
20180373558 Chang et al. Dec 2018 A1
20180375744 Mayya et al. Dec 2018 A1
20180375824 Mayya et al. Dec 2018 A1
20180375967 Pithawala et al. Dec 2018 A1
20190013883 Vargas et al. Jan 2019 A1
20190014038 Ritchie Jan 2019 A1
20190020588 Twitchell, Jr. Jan 2019 A1
20190020627 Yuan Jan 2019 A1
20190021085 Mochizuki et al. Jan 2019 A1
20190028378 Houjyo et al. Jan 2019 A1
20190028552 Johnson et al. Jan 2019 A1
20190036808 Shenoy et al. Jan 2019 A1
20190036810 Michael et al. Jan 2019 A1
20190036813 Shenoy et al. Jan 2019 A1
20190046056 Khachaturian et al. Feb 2019 A1
20190058657 Chunduri et al. Feb 2019 A1
20190058709 Kempf et al. Feb 2019 A1
20190068470 Mirsky Feb 2019 A1
20190068493 Ram et al. Feb 2019 A1
20190068500 Hira Feb 2019 A1
20190075083 Mayya et al. Mar 2019 A1
20190081894 Yousaf et al. Mar 2019 A1
20190103990 Cidon et al. Apr 2019 A1
20190103991 Cidon et al. Apr 2019 A1
20190103992 Cidon et al. Apr 2019 A1
20190103993 Cidon et al. Apr 2019 A1
20190104035 Cidon et al. Apr 2019 A1
20190104049 Cidon et al. Apr 2019 A1
20190104050 Cidon et al. Apr 2019 A1
20190104051 Cidon et al. Apr 2019 A1
20190104052 Cidon et al. Apr 2019 A1
20190104053 Cidon et al. Apr 2019 A1
20190104063 Cidon et al. Apr 2019 A1
20190104064 Cidon et al. Apr 2019 A1
20190104109 Cidon et al. Apr 2019 A1
20190104111 Cidon et al. Apr 2019 A1
20190104413 Cidon et al. Apr 2019 A1
20190109769 Jain et al. Apr 2019 A1
20190132221 Boutros et al. May 2019 A1
20190132234 Dong et al. May 2019 A1
20190132322 Song et al. May 2019 A1
20190140889 Mayya et al. May 2019 A1
20190140890 Mayya et al. May 2019 A1
20190149525 Gunda et al. May 2019 A1
20190158371 Dillon et al. May 2019 A1
20190158605 Markuze et al. May 2019 A1
20190199539 Deng et al. Jun 2019 A1
20190220703 Prakash et al. Jul 2019 A1
20190222499 Chen et al. Jul 2019 A1
20190238364 Boutros et al. Aug 2019 A1
20190238446 Barzik et al. Aug 2019 A1
20190238449 Michael et al. Aug 2019 A1
20190238450 Michael et al. Aug 2019 A1
20190238483 Marichetty et al. Aug 2019 A1
20190238497 Tourrilhes et al. Aug 2019 A1
20190268421 Markuze et al. Aug 2019 A1
20190268973 Bull et al. Aug 2019 A1
20190278631 Bernat et al. Sep 2019 A1
20190280962 Michael et al. Sep 2019 A1
20190280963 Michael et al. Sep 2019 A1
20190280964 Michael et al. Sep 2019 A1
20190288875 Shen et al. Sep 2019 A1
20190306197 Degioanni Oct 2019 A1
20190306282 Masputra et al. Oct 2019 A1
20190313278 Liu Oct 2019 A1
20190313907 Khachaturian et al. Oct 2019 A1
20190319847 Nahar et al. Oct 2019 A1
20190319881 Maskara et al. Oct 2019 A1
20190327109 Guichard et al. Oct 2019 A1
20190334786 Dutta et al. Oct 2019 A1
20190334813 Raj et al. Oct 2019 A1
20190334820 Zhao Oct 2019 A1
20190342201 Singh Nov 2019 A1
20190342219 Liu et al. Nov 2019 A1
20190356736 Narayanaswamy et al. Nov 2019 A1
20190364099 Thakkar et al. Nov 2019 A1
20190364456 Yu Nov 2019 A1
20190372888 Michael et al. Dec 2019 A1
20190372889 Michael et al. Dec 2019 A1
20190372890 Michael et al. Dec 2019 A1
20190394081 Tahhan et al. Dec 2019 A1
20200014609 Hockett et al. Jan 2020 A1
20200014615 Michael et al. Jan 2020 A1
20200014616 Michael et al. Jan 2020 A1
20200014661 Mayya et al. Jan 2020 A1
20200014663 Chen et al. Jan 2020 A1
20200021514 Michael et al. Jan 2020 A1
20200021515 Michael Jan 2020 A1
20200036624 Michael et al. Jan 2020 A1
20200044943 Bor-Yaliniz et al. Feb 2020 A1
20200044969 Hao et al. Feb 2020 A1
20200059420 Abraham Feb 2020 A1
20200059457 Raza et al. Feb 2020 A1
20200059459 Abraham et al. Feb 2020 A1
20200067831 Spraggins et al. Feb 2020 A1
20200092207 Sipra et al. Mar 2020 A1
20200097327 Beyer et al. Mar 2020 A1
20200099625 Mgit et al. Mar 2020 A1
20200099659 Cometto et al. Mar 2020 A1
20200106696 Michael et al. Apr 2020 A1
20200106706 Mayya et al. Apr 2020 A1
20200119952 Mayya et al. Apr 2020 A1
20200127905 Mayya et al. Apr 2020 A1
20200127911 Gilson et al. Apr 2020 A1
20200153701 Mohan et al. May 2020 A1
20200153736 Liebherr et al. May 2020 A1
20200159661 Keymolen et al. May 2020 A1
20200162407 Tillotson May 2020 A1
20200169473 Rimar et al. May 2020 A1
20200177503 Hooda et al. Jun 2020 A1
20200177550 Valluri et al. Jun 2020 A1
20200177629 Hooda et al. Jun 2020 A1
20200186471 Shen et al. Jun 2020 A1
20200195557 Duan et al. Jun 2020 A1
20200204460 Schneider et al. Jun 2020 A1
20200213212 Dillon et al. Jul 2020 A1
20200213224 Cheng et al. Jul 2020 A1
20200218558 Sreenath et al. Jul 2020 A1
20200235990 Janakiraman et al. Jul 2020 A1
20200235999 Mayya et al. Jul 2020 A1
20200236046 Jain et al. Jul 2020 A1
20200241927 Yang et al. Jul 2020 A1
20200244721 S et al. Jul 2020 A1
20200252234 Ramamoorthi et al. Aug 2020 A1
20200259700 Bhalla et al. Aug 2020 A1
20200267184 Vera-Schockner Aug 2020 A1
20200267203 Jindal et al. Aug 2020 A1
20200280587 Janakiraman et al. Sep 2020 A1
20200287819 Theogaraj et al. Sep 2020 A1
20200287976 Theogaraj et al. Sep 2020 A1
20200296011 Jain et al. Sep 2020 A1
20200296026 Michael et al. Sep 2020 A1
20200301764 Thoresen et al. Sep 2020 A1
20200314006 Mackie et al. Oct 2020 A1
20200314614 Moustafa et al. Oct 2020 A1
20200322230 Natal et al. Oct 2020 A1
20200322287 Connor et al. Oct 2020 A1
20200336336 Sethi et al. Oct 2020 A1
20200344089 Motwani et al. Oct 2020 A1
20200344143 Faseela et al. Oct 2020 A1
20200344163 Gupta et al. Oct 2020 A1
20200351188 Arora et al. Nov 2020 A1
20200358878 Bansal et al. Nov 2020 A1
20200366530 Mukundan et al. Nov 2020 A1
20200366562 Mayya et al. Nov 2020 A1
20200366611 Kommula Nov 2020 A1
20200382345 Zhao et al. Dec 2020 A1
20200382387 Pasupathy et al. Dec 2020 A1
20200403821 Dev et al. Dec 2020 A1
20200412483 Tan et al. Dec 2020 A1
20200412576 Kondapavuluru et al. Dec 2020 A1
20200413283 Shen et al. Dec 2020 A1
20210006482 Hwang et al. Jan 2021 A1
20210006490 Michael et al. Jan 2021 A1
20210021538 Meck et al. Jan 2021 A1
20210029019 Kottapalli Jan 2021 A1
20210029088 Mayya et al. Jan 2021 A1
20210036888 Makkalla et al. Feb 2021 A1
20210036987 Mishra et al. Feb 2021 A1
20210037159 Shimokawa Feb 2021 A1
20210049191 Masson et al. Feb 2021 A1
20210067372 Cidon et al. Mar 2021 A1
20210067373 Cidon et al. Mar 2021 A1
20210067374 Cidon et al. Mar 2021 A1
20210067375 Cidon et al. Mar 2021 A1
20210067407 Cidon et al. Mar 2021 A1
20210067427 Cidon et al. Mar 2021 A1
20210067442 Sundararajan et al. Mar 2021 A1
20210067461 Cidon et al. Mar 2021 A1
20210067464 Cidon et al. Mar 2021 A1
20210067467 Cidon et al. Mar 2021 A1
20210067468 Cidon et al. Mar 2021 A1
20210073001 Rogers et al. Mar 2021 A1
20210092062 Dhanabalan et al. Mar 2021 A1
20210099360 Parsons et al. Apr 2021 A1
20210105199 H et al. Apr 2021 A1
20210111998 Saavedra Apr 2021 A1
20210112034 Sundararajan et al. Apr 2021 A1
20210126830 R et al. Apr 2021 A1
20210126853 Ramaswamy et al. Apr 2021 A1
20210126854 Guo et al. Apr 2021 A1
20210126860 Ramaswamy Apr 2021 A1
20210144091 H et al. May 2021 A1
20210160169 Shen et al. May 2021 A1
20210160813 Gupta et al. May 2021 A1
20210176255 Hill et al. Jun 2021 A1
20210184952 Mayya et al. Jun 2021 A1
20210184966 Ramaswamy Jun 2021 A1
20210184983 Ramaswamy Jun 2021 A1
20210194814 Roux et al. Jun 2021 A1
20210226880 Ramamoorthy et al. Jul 2021 A1
20210234728 Cidon et al. Jul 2021 A1
20210234775 Devadoss et al. Jul 2021 A1
20210234786 Devadoss et al. Jul 2021 A1
20210234804 Devadoss et al. Jul 2021 A1
20210234805 Devadoss et al. Jul 2021 A1
20210235312 Devadoss et al. Jul 2021 A1
20210235313 Devadoss et al. Jul 2021 A1
20210266262 Subramanian et al. Aug 2021 A1
20210279069 Salgaonkar et al. Sep 2021 A1
20210314289 Chandrashekhar et al. Oct 2021 A1
20210314385 Pande et al. Oct 2021 A1
20210328835 Mayya et al. Oct 2021 A1
20210336880 Gupta et al. Oct 2021 A1
20210377109 Shrivastava et al. Dec 2021 A1
20210377156 Michael et al. Dec 2021 A1
20210392060 Silva et al. Dec 2021 A1
20210392070 Tootaghaj et al. Dec 2021 A1
20210399920 Sundararajan et al. Dec 2021 A1
20210399978 Michael et al. Dec 2021 A9
20210400113 Markuze et al. Dec 2021 A1
20210400512 Agarwal et al. Dec 2021 A1
20210409277 Jeuk et al. Dec 2021 A1
20220006726 Michael et al. Jan 2022 A1
20220006751 Ramaswamy et al. Jan 2022 A1
20220006756 Ramaswamy Jan 2022 A1
20220029902 Shemer et al. Jan 2022 A1
20220035673 Markuze et al. Feb 2022 A1
20220038370 Vasseur et al. Feb 2022 A1
20220038557 Markuze et al. Feb 2022 A1
20220045927 Liu et al. Feb 2022 A1
20220052928 Sundararajan et al. Feb 2022 A1
20220061059 Dunsmore et al. Feb 2022 A1
20220086035 Devaraj et al. Mar 2022 A1
20220094644 Cidon et al. Mar 2022 A1
20220123961 Mukundan et al. Apr 2022 A1
20220131740 Mayya et al. Apr 2022 A1
20220131807 Srinivas et al. Apr 2022 A1
20220131898 Hooda et al. Apr 2022 A1
20220141184 Oswal et al. May 2022 A1
20220158923 Ramaswamy et al. May 2022 A1
20220158924 Ramaswamy et al. May 2022 A1
20220158926 Wennerström et al. May 2022 A1
20220166713 Markuze et al. May 2022 A1
20220191719 Roy Jun 2022 A1
20220198229 López et al. Jun 2022 A1
20220210035 Hendrickson et al. Jun 2022 A1
20220210041 Gandhi et al. Jun 2022 A1
20220210042 Gandhi et al. Jun 2022 A1
20220210122 Levin et al. Jun 2022 A1
20220217015 Vuggrala et al. Jul 2022 A1
20220231949 Ramaswamy et al. Jul 2022 A1
20220231950 Ramaswamy et al. Jul 2022 A1
20220232411 Vijayakumar et al. Jul 2022 A1
20220239596 Kumar et al. Jul 2022 A1
20220294701 Mayya et al. Sep 2022 A1
20220335027 Seshadri et al. Oct 2022 A1
20220337553 Mayya et al. Oct 2022 A1
20220353152 Ramaswamy Nov 2022 A1
20220353171 Ramaswamy et al. Nov 2022 A1
20220353175 Ramaswamy et al. Nov 2022 A1
20220353182 Ramaswamy et al. Nov 2022 A1
20220353190 Ramaswamy et al. Nov 2022 A1
20220360500 Ramaswamy et al. Nov 2022 A1
20220407773 Kempanna et al. Dec 2022 A1
20220407774 Kempanna et al. Dec 2022 A1
20220407790 Kempanna et al. Dec 2022 A1
20220407820 Kempanna et al. Dec 2022 A1
20220407915 Kempanna et al. Dec 2022 A1
20230006929 Mayya et al. Jan 2023 A1
20230025586 Rolando et al. Jan 2023 A1
20230026330 Rolando et al. Jan 2023 A1
20230026865 Rolando et al. Jan 2023 A1
20230028872 Ramaswamy Jan 2023 A1
20230039869 Ramaswamy et al. Feb 2023 A1
20230041916 Zhang et al. Feb 2023 A1
20230054961 Ramaswamy et al. Feb 2023 A1
20230105680 Simlai et al. Apr 2023 A1
20230121871 Mayya et al. Apr 2023 A1
20230156826 Palermo May 2023 A1
20230179445 Cidon et al. Jun 2023 A1
20230179502 Ramaswamy et al. Jun 2023 A1
20230179521 Markuze et al. Jun 2023 A1
20230179543 Cidon et al. Jun 2023 A1
20230216768 Zohar et al. Jul 2023 A1
20230216801 Markuze et al. Jul 2023 A1
20230216804 Zohar et al. Jul 2023 A1
20230221874 Markuze et al. Jul 2023 A1
20230224356 Markuze et al. Jul 2023 A1
20230224759 Ramaswamy Jul 2023 A1
20230231845 Manoharan et al. Jul 2023 A1
20230239234 Zohar et al. Jul 2023 A1
20230261974 Ramaswamy Aug 2023 A1
20240031264 Nigam Jan 2024 A1
Foreign Referenced Citations (52)
Number Date Country
1926809 Mar 2007 CN
102577270 Jul 2012 CN
102811165 Dec 2012 CN
104956329 Sep 2015 CN
106656847 May 2017 CN
106998284 Aug 2017 CN
110447209 Nov 2019 CN
111198764 May 2020 CN
116783874 Sep 2023 CN
117178535 Dec 2023 CN
1912381 Apr 2008 EP
2538637 Dec 2012 EP
2763362 Aug 2014 EP
3041178 Jul 2016 EP
3297211 Mar 2018 EP
3509256 Jul 2019 EP
3346650 Nov 2019 EP
106230650 Dec 2016 IN
2002368792 Dec 2002 JP
2010233126 Oct 2010 JP
2014200010 Oct 2014 JP
2017059991 Mar 2017 JP
2017524290 Aug 2017 JP
20170058201 May 2017 KR
2574350 Feb 2016 RU
03073701 Sep 2003 WO
2005071861 Aug 2005 WO
2007016834 Feb 2007 WO
2012167184 Dec 2012 WO
2015092565 Jun 2015 WO
2016061546 Apr 2016 WO
2016123314 Aug 2016 WO
2017083975 May 2017 WO
2019070611 Apr 2019 WO
2019094522 May 2019 WO
2020012491 Jan 2020 WO
2020018704 Jan 2020 WO
2020091777 May 2020 WO
2020101922 May 2020 WO
2020112345 Jun 2020 WO
2021040934 Mar 2021 WO
2021118717 Jun 2021 WO
2021150465 Jul 2021 WO
2021211906 Oct 2021 WO
2022005607 Jan 2022 WO
2022082680 Apr 2022 WO
2022154850 Jul 2022 WO
2022159156 Jul 2022 WO
2022231668 Nov 2022 WO
2022235303 Nov 2022 WO
2022265681 Dec 2022 WO
2023009159 Feb 2023 WO
Non-Patent Literature Citations (58)
Non-Published Commonly Owned U.S. Appl. No. 18/211,568, filed Jun. 19, 2023, 37 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/222,864, filed Jul. 17, 2023, 350 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/222,868, filed Jul. 17, 2023, 22 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/224,466, filed Jul. 20, 2023, 56 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/235,879, filed Aug. 20, 2023, 173 pages, VMware, Inc.
Alsaeedi, Mohammed, et al., “Toward Adaptive and Scalable OpenFlow-SDN Flow Control: A Survey,” IEEE Access, Aug. 1, 2019, 34 pages, vol. 7, IEEE, retrieved from https://ieeexplore.ieee.org/document/8784036.
Alvizu, Rodolfo, et al., “SDN-Based Network Orchestration for New Dynamic Enterprise Networking Services,” 2017 19th International Conference on Transparent Optical Networks, Jul. 2-6, 2017, 4 pages, IEEE, Girona, Spain.
Author Unknown, “VeloCloud Administration Guide: VMware SD-WAN by VeloCloud 3.3,” Month Unknown 2019, 366 pages, VMware, Inc., Palo Alto, CA, USA.
Barozet, Jean-Marc, “Cisco SD-WAN as a Managed Service,” BRKRST-2558, Jan. 27-31, 2020, 98 pages, Cisco, Barcelona, Spain, retrieved from https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2020/pdf/BRKRST-2558.pdf.
Barozet, Jean-Marc, “Cisco SDWAN,” Deep Dive, Dec. 2017, 185 pages, Cisco, retrieved from https://www.coursehero.com/file/71671376/Cisco-SDWAN-Deep-Divepdf/.
Bertaux, Lionel, et al., “Software Defined Networking and Virtualization for Broadband Satellite Networks,” IEEE Communications Magazine, Mar. 18, 2015, 7 pages, vol. 53, IEEE, retrieved from https://ieeexplore.ieee.org/document/7060482.
Cox, Jacob H., et al., “Advancing Software-Defined Networks: A Survey,” IEEE Access, Oct. 12, 2017, 40 pages, vol. 5, IEEE, retrieved from https://ieeexplore.ieee.org/document/8066287.
Del Piccolo, Valentin, et al., “A Survey of Network Isolation Solutions for Multi-Tenant Data Centers,” IEEE Communications Society, Apr. 20, 2016, vol. 18, No. 4, 37 pages, IEEE.
Duan, Zhenhai, et al., “Service Overlay Networks: SLAs, QoS, and Bandwidth Provisioning,” IEEE/ACM Transactions on Networking, Dec. 2003, 14 pages, vol. 11, IEEE, New York, NY, USA.
Fortz, Bernard, et al., “Internet Traffic Engineering by Optimizing OSPF Weights,” Proceedings IEEE Infocom 2000, Conference on Computer Communications, Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Mar. 26-30, 2000, 11 pages, IEEE, Tel Aviv, Israel.
Francois, Frederic, et al., “Optimizing Secure SDN-enabled Inter-Data Centre Overlay Networks through Cognitive Routing,” 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Sep. 19-21, 2016, 10 pages, IEEE, London, UK.
Funabiki, Nobuo, et al., “A Frame Aggregation Extension of Routing Algorithm for Wireless Mesh Networks,” 2014 Second International Symposium on Computing and Networking, Dec. 10-12, 2014, 5 pages, IEEE, Shizuoka, Japan.
Guo, Xiangyi, et al., (U.S. Appl. No. 62/925,193), filed Oct. 23, 2019, 26 pages.
Huang, Cancan, et al., “Modification of Q.SD-WAN,” Rapporteur Group Meeting—Doc, Study Period 2017-2020, Q4/11-DOC1 (190410), Study Group 11, Apr. 10, 2019, 19 pages, International Telecommunication Union, Geneva, Switzerland.
Jivorasetkul, Supalerk, et al., “End-to-End Header Compression over Software-Defined Networks: a Low Latency Network Architecture,” 2012 Fourth International Conference on Intelligent Networking and Collaborative Systems, Sep. 19-21, 2012, 2 pages, IEEE, Bucharest, Romania.
Lasserre, Marc, et al., “Framework for Data Center (DC) Network Virtualization,” RFC 7365, Oct. 2014, 26 pages, IETF.
Li, Shengru, et al., “Source Routing with Protocol-oblivious Forwarding (POF) to Enable Efficient e-Health Data Transfers,” 2016 IEEE International Conference on Communications (ICC), May 22-27, 2016, 6 pages, IEEE, Kuala Lumpur, Malaysia.
Lin, Weidong, et al., “Using Path Label Routing in Wide Area Software-Defined Networks with Open Flow,” 2016 International Conference on Networking and Network Applications, Jul. 2016, 6 pages, IEEE.
Long, Feng, “Research and Application of Cloud Storage Technology in University Information Service,” Chinese Excellent Masters' Theses Full-text Database, Mar. 2013, 72 pages, China Academic Journals Electronic Publishing House, China.
Michael, Nithin, et al., “HALO: Hop-by-Hop Adaptive Link-State Optimal Routing,” IEEE/ACM Transactions on Networking, Dec. 2015, 14 pages, vol. 23, No. 6, IEEE.
Ming, Gao, et al., “A Design of SD-WAN-Oriented Wide Area Network Access,” 2020 International Conference on Computer Communication and Network Security (CCNS), Aug. 21-23, 2020, 4 pages, IEEE, Xi'an, China.
Mishra, Mayank, et al., “Managing Network Reservation for Tenants in Oversubscribed Clouds,” 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, Aug. 14-16, 2013, 10 pages, IEEE, San Francisco, CA, USA.
Mudigonda, Jayaram, et al., “NetLord: A Scalable Multi-Tenant Network Architecture for Virtualized Datacenters,” Proceedings of the ACM SIGCOMM 2011 Conference, Aug. 15-19, 2011, 12 pages, ACM, Toronto, Canada.
Non-Published Commonly Owned U.S. Appl. No. 17/574,225, filed Jan. 12, 2022, 56 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/574,236, filed Jan. 12, 2022, 54 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/833,555, filed Jun. 6, 2022, 34 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/833,566, filed Jun. 6, 2022, 35 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/966,814, filed Oct. 15, 2022, 176 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/966,820, filed Oct. 15, 2022, 26 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/976,717, filed Oct. 28, 2022, 37 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/088,554, filed Dec. 24, 2022, 34 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/088,555, filed Dec. 24, 2022, 35 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/088,556, filed Dec. 24, 2022, 27 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/096,001, filed Jan. 11, 2023, 34 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/100,369, filed Jan. 23, 2023, 55 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/100,381, filed Jan. 23, 2023, 55 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/100,397, filed Jan. 23, 2023, 55 pages, VMware, Inc.
Non-Published Commonly Owned Related U.S. Appl. No. 18/126,989 with similar specification, filed Mar. 27, 2023, 83 pages, VMware, Inc.
Non-Published Commonly Owned Related U.S. Appl. No. 18/126,991 with similar specification, filed Mar. 27, 2023, 84 pages, VMware, Inc.
Non-Published Commonly Owned Related U.S. Appl. No. 18/126,992 with similar specification, filed Mar. 27, 2023, 84 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/137,584, filed Apr. 21, 2023, 57 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/197,090, filed May 14, 2023, 36 pages, Nicira, Inc.
Noormohammadpour, Mohammad, et al., “DCRoute: Speeding up Inter-Datacenter Traffic Allocation while Guaranteeing Deadlines,” 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), Dec. 19-22, 2016, 9 pages, IEEE, Hyderabad, India.
Ray, Saikat, et al., “Always Acyclic Distributed Path Computation,” University of Pennsylvania Department of Electrical and Systems Engineering Technical Report, May 2008, 16 pages, University of Pennsylvania ScholarlyCommons.
Sarhan, Soliman Abd Elmonsef, et al., “Data Inspection in SDN Network,” 2018 13th International Conference on Computer Engineering and Systems (ICCES), Dec. 18-19, 2018, 6 pages, IEEE, Cairo, Egypt.
Taleb, Tarik, “D4.1 Mobile Network Cloud Component Design,” Mobile Cloud Networking, Nov. 8, 2013, 210 pages, MobileCloud Networking Consortium, retrieved from http://www.mobile-cloud-networking.eu/site/index.php?process=download&id=127&code=89d30565cd2ce087d3f8e95f9ad683066510a61f.
Tootaghaj, Diman Zad, et al., “Homa: An Efficient Topology and Route Management Approach in SD-WAN Overlays,” IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, Jul. 6-9, 2020, 10 pages, IEEE, Toronto, ON, Canada.
Valtulina, Luca, “Seamless Distributed Mobility Management (DMM) Solution in Cloud Based LTE Systems,” Master Thesis, Nov. 2013, 168 pages, University of Twente, retrieved from http://essay.utwente.nl/64411/1/Luca_Valtulina_MSc_Report_final.pdf.
Webb, Kevin C., et al., “Blender: Upgrading Tenant-Based Data Center Networking,” 2014 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Oct. 20-21, 2014, 11 pages, IEEE, Marina del Rey, CA, USA.
Xie, Junfeng, et al., “A Survey of Machine Learning Techniques Applied to Software Defined Networking (SDN): Research Issues and Challenges,” IEEE Communications Surveys & Tutorials, Aug. 23, 2018, 38 pages, vol. 21, Issue 1, IEEE.
Yap, Kok-Kiong, et al., “Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering,” SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, Aug. 21-25, 2017, 14 pages, Los Angeles, CA.
Zakurdaev, Gieorgi, et al., “Dynamic On-Demand Virtual Extensible LAN Tunnels via Software-Defined Wide Area Networks,” 2022 IEEE 12th Annual Computing and Communication Workshop and Conference, Jan. 26-29, 2022, 6 pages, IEEE, Las Vegas, NV, USA.
Non-Published Commonly Owned U.S. Appl. No. 15/803,964, filed Nov. 6, 2017, 15 pages, The Mode Group.