Identifying and remediating anomalies in a self-healing network

Information

  • Patent Grant
    12034587
  • Patent Number
    12,034,587
  • Date Filed
    Monday, March 27, 2023
  • Date Issued
    Tuesday, July 9, 2024
Abstract
Some embodiments of the invention provide a method of remediating anomalies in an SD-WAN implemented by multiple forwarding elements (FEs) located at multiple sites connected by the SD-WAN. The method is performed iteratively. The method receives multiple performance metrics that over a duration of time express a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration. The method uses the received performance metrics to update generated weight values for a topology graph that includes (1) multiple nodes representing the multiple FEs and (2) multiple edges between the multiple nodes representing paths traversed between the FEs by the flows associated with the particular application, said generated weight values associated with said paths. The method uses a topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows. For an identified anomaly, the method implements a remedial action to modify the SD-WAN in order to remediate the identified anomaly.
Description
BACKGROUND

Today, a software-defined wide-area network (SD-WAN) allows enterprises to build flexible WANs using programmable network components. The decoupling of the control plane and data plane in a software-defined architecture enables sophisticated Artificial Intelligence for IT Operations (AIOps) platforms to continually monitor and program the network to respond to higher level security and application considerations. For example, AIOps systems can look at granular flow metadata coupled with global underlay and overlay topology data to reliably detect problems and apply remedial control actions in an analytics-controlled feedback loop.


As workforces become more distributed and applications migrate to multiple clouds, networks become critical components of enterprise productivity. As such, issues on LANs (local area networks), WANs, datacenters, and/or within applications themselves have a direct impact on end-user application performance, such as a network router on a flow path suddenly experiencing high packet drops on its outgoing links and causing the end-to-end application to slow down.


BRIEF SUMMARY

Some embodiments of the invention provide a method of detecting and autonomously remediating anomalies in a self-healing SD-WAN (software-defined wide-area network) implemented by multiple forwarding elements (FEs), each of which is located at one of multiple sites connected by the SD-WAN. The method of some embodiments is performed by an anomaly detection and remediation system for the SD-WAN. In some embodiments, the anomaly detection and remediation system includes one or more anomaly detection and remediation processes executed by one or more machines in a cluster. The one or more machines, in some embodiments, include one or more host computers. Also, in some embodiments, the anomaly detection and remediation system is implemented as part of an ENI (Edge Network Intelligence) platform.


From the multiple FEs, the method of some embodiments receives multiple sets of flow data associated with application traffic that traverses the multiple FEs (e.g., edge routers, gateway routers, and hub routers). The method uses a first set of machine-trained processes to analyze the multiple sets of flow data in order to identify at least one anomaly associated with at least one particular FE. The method then uses a second set of machine-trained processes to identify at least one remedial action for remediating the identified anomaly. The method implements the identified remedial action by directing an SD-WAN controller deployed in the SD-WAN to implement the identified remedial action.


In some embodiments, the anomaly detection and remediation system includes multiple sub-systems that execute various processes to detect and remediate anomalies. These sub-systems, in some embodiments, include a data ingestion system, an analytics system, and a control action system. The data ingestion system, of some embodiments, receives the flow data from the FEs that implement the SD-WAN and parses the received flow data into a data structure used internally by the anomaly detection and remediation system. The analytics system receives the parsed flow data from the ingestion system, in some embodiments, aggregates the data, computes performance scores from the data, and uses one or more anomaly detection machine-trained processes to analyze the computed performance scores to identify any anomalies. The control action system of some embodiments receives identified anomalies, uses one or more machine-trained processes to identify remedial actions to obviate the identified anomalies, and sends API calls to the SD-WAN controller to direct the SD-WAN controller to implement the remedial actions in the SD-WAN.


The received flow data, in some embodiments, is associated with multiple applications. In some embodiments, each received set of flow data includes a five-tuple identifier for the associated flow, an application identifier associated with the flow, and a protocol associated with the flow. The flow data, in some embodiments, also includes a set of flow statistics associated with the flow, an overlay route type associated with the flow, a next-hop overlay node for the flow, and a destination hop overlay node associated with the flow. In some embodiments, the set of flow statistics includes an amount of TX bytes associated with the flow, an amount of RX bytes associated with the flow, TCP latency associated with the flow, and a number of TCP re-transmissions associated with the flow.
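

To make the flow-data fields listed above concrete, the following Python sketch shows one possible in-memory representation of a single flow record; the field names and types are illustrative assumptions, not the data structure actually used by the described embodiments.

```python
# Illustrative sketch of one flow record; field names and types are assumptions.
from dataclasses import dataclass

@dataclass
class FlowRecord:
    # Five-tuple identifier of the flow
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    # Application and overlay routing context
    app_id: str
    overlay_route_type: str   # e.g., "direct", "edge-gateway", "edge-hub-edge"
    next_hop_node: str
    dest_hop_node: str
    # Flow statistics for the reporting interval
    tx_bytes: int
    rx_bytes: int
    tcp_latency_ms: float
    tcp_retransmissions: int
```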


In some embodiments, the analytics system aggregates the flow data by performing a time aggregation operation on the flow data at a first granularity in order to generate a set of aggregated flow data. For each FE of the multiple FEs, the analytics system of some embodiments uses the set of aggregated flow data to generate a set of performance scores at a second granularity. In some embodiments, the first granularity is a per-minute granularity, and the second granularity is a per-application granularity such that for each FE, each performance score in the set of performance scores for the FE corresponds to a particular minute of time (i.e., in a duration of time over which the flow data was collected) and a particular application. In some embodiments, the performance scores are per-edge, per-application, and per-path.


The machine-trained processes used to analyze the performance scores, in some embodiments, are part of a set of anomaly detection processes that detect anomalies at a shorter timescale (e.g., 30 minutes) and at a longer timescale (e.g., two weeks). In some embodiments, the shorter timescale anomaly detection process identifies, for each particular FE of the multiple FEs, performance scores associated with each application of multiple applications for which the particular FE forwards traffic flows.


For each particular application, the shorter timescale anomaly detection process generates a distribution graph (e.g., a Gaussian distribution curve) that shows the identified performance scores associated with the particular application for the particular FE over a first duration of time. The shorter timescale anomaly detection process then analyzes the generated distribution graphs using a machine-trained process (e.g., a sliding window Gaussian outlier detection process) to identify one or more per-application incidents by identifying that a threshold number of performance scores associated with the particular application (1) are outliers with respect to the generated distribution graph for the particular application and (2) occurred within a second duration of time.


In some embodiments, each generated distribution graph represents a distribution of the set of performance scores for the particular application over the first duration of time (e.g., 30 minutes). To generate the distribution graph, in some embodiments, the shorter timescale anomaly detection process computes a sample mean of the performance scores for the first duration of time and a standard deviation of the performance scores for the first duration of time. In some embodiments, the computed sample mean and standard deviation are dynamic parameters that change over time based on the generated performance scores. As such, each distribution graph generated for a particular FE varies compared to each other distribution graph generated for the particular FE as the performance scores computed for different durations of time affect the dynamic parameters used to generate the distribution graphs.


In order to identify that a threshold number of performance scores associated with the particular application are outliers with respect to the generated distribution graph for the particular application, some embodiments use the dynamic parameters to determine whether a threshold number of performance scores in the set of performance scores for the particular application exceed a specified threshold of performance. In some embodiments, the first and second durations of time are different durations of time (e.g., different 30 minute time windows), while in other embodiments, the first and second durations of time are the same duration of time (e.g., the same 30 minute time window). In still other embodiments the second duration of time is a subset of the first duration of time (e.g., a 5 minute subset of time within the 30 minute window).
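

One way to realize the sliding-window outlier check described above is sketched below in Python; the window length, z-score threshold, and minimum outlier count are illustrative assumptions rather than values prescribed by the description.

```python
import statistics

def detect_incident(scores, window=30, recent=5, z_thresh=3.0, min_outliers=3):
    """Flag a per-application incident when enough of the most recent
    per-minute scores are outliers with respect to the Gaussian fit of the
    current window (a sketch with illustrative parameters)."""
    window_scores = scores[-window:]            # e.g., last 30 per-minute scores
    if len(window_scores) < window:
        return False                            # not enough data yet
    mean = statistics.mean(window_scores)       # dynamic sample mean
    stdev = statistics.pstdev(window_scores)    # dynamic standard deviation
    if stdev == 0:
        return False
    # Count outliers that occurred within the most recent sub-window
    outliers = sum(1 for s in window_scores[-recent:]
                   if abs(s - mean) / stdev > z_thresh)
    return outliers >= min_outliers
```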


In some embodiments, the longer timescale anomaly detection process is performed iteratively. The longer timescale anomaly detection process of some embodiments receives multiple performance scores that over a duration of time (e.g., two weeks) express a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration. The longer timescale anomaly detection process uses the received performance scores, in some embodiments, to update generated weight values for a topology graph that includes (1) multiple nodes representing the multiple FEs and (2) multiple edges between the multiple nodes representing paths traversed between the FEs by the flows associated with the particular application, with the generated weight values being associated with said paths.
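

The topology graph described above can be represented with a small amount of code; the sketch below uses the networkx library and a simple exponential-smoothing update for the per-path weights, both of which are assumptions made for illustration.

```python
import networkx as nx

def update_topology_graph(graph, path_scores, alpha=0.2):
    """Blend newly received per-path performance scores into the edge weights
    of a per-application topology graph (an illustrative sketch)."""
    for (src_fe, dst_fe), score in path_scores.items():
        if graph.has_edge(src_fe, dst_fe):
            old = graph[src_fe][dst_fe]["weight"]
            graph[src_fe][dst_fe]["weight"] = (1 - alpha) * old + alpha * score
        else:
            graph.add_edge(src_fe, dst_fe, weight=score)
    return graph

# Nodes are FEs; edges are the overlay paths used by one application's flows.
g = nx.Graph()
g = update_topology_graph(g, {("edge-1", "gateway-1"): 72.0,
                              ("gateway-1", "hub-1"): 64.5})
```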


The longer timescale anomaly detection process of some embodiments uses a topology-based machine-trained process to analyze the topology graph with the updated generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows. For an identified anomaly, the longer timescale anomaly detection process implements a remedial action to modify the SD-WAN in order to remediate the identified anomaly (e.g., by sending an API call identifying the remedial action to an SD-WAN controller), according to some embodiments.


In some embodiments, when using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly, the longer timescale anomaly detection process also determines whether the identified anomaly is isolated to a particular FE (e.g., isolated to a particular edge FE or due to a particular transit FE), or affects the overall application. For instance, in some embodiments, the identified anomaly is a network impairment on a first transit FE that is a next-hop FE for application traffic associated with the particular application and forwarded by a first edge FE located at a first branch site. In some such embodiments, the identified remedial action includes updating a transit FE order configuration for the first edge FE to change the next-hop transit FE for application traffic associated with the particular application and forwarded by the first edge FE from the first transit FE to a second transit FE.


The first transit FE, in some embodiments, is also a next-hop transit FE for application traffic associated with the particular application and forwarded by a second edge FE located at a second branch site. In some such embodiments, the identified anomaly is also associated with the second edge FE when application traffic for the particular application and forwarded by the second edge FE is also affected by the first transit FE's anomalous behavior. The identified remedial action, in some embodiments, is to update transit FE orders for both the first edge FE and the second edge FE.


In some embodiments, the particular application is a first application and the first transit FE is also a next-hop transit FE for application traffic associated with a second application and forwarded by the first edge FE. In some such embodiments, the identified remedial action includes updating the transit FE order configuration for the first edge FE to change the next-hop transit FE from the first transit FE to the second transit FE for application traffic associated with both the first and second applications. When the network impairment only affects traffic associated with the first application, the transit FE order configuration for the particular edge FE is only updated for traffic associated with the first application and not the second application, according to some embodiments.
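

The per-application scoping of the transit FE order update described above can be expressed as a small selection step; the helper below is a hypothetical sketch whose names and data shapes are not taken from the description.

```python
def build_transit_order_updates(edge_fe, impaired_transit, alternate_transit,
                                apps_using_transit, impaired_apps):
    """Return per-application next-hop updates for one edge FE, re-homing only
    the applications actually affected by the impairment (a sketch)."""
    updates = {}
    for app in apps_using_transit:
        if app in impaired_apps:
            updates[app] = {"edge": edge_fe,
                            "old_next_hop": impaired_transit,
                            "new_next_hop": alternate_transit}
    return updates

# If the impairment affects only app-1, app-2 keeps its current transit FE.
print(build_transit_order_updates("edge-1", "gateway-1", "gateway-2",
                                  ["app-1", "app-2"], {"app-1"}))
```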


When an identified anomaly is determined to require remediation to improve performance of a set of one or more flows, in some embodiments, the control action system mentioned above is utilized for identifying and implementing one or more remedial actions that modify the SD-WAN. For a particular anomaly, the control action system, in some embodiments, identifies a set of two or more remedial actions for remediating the particular anomaly in the SD-WAN.


For each identified remedial action in the set, the control action system selectively implements the identified remedial action for a subset of the set of flows for a duration of time in order to collect a set of performance metrics associated with SD-WAN performance during the duration of time for which the identified remedial action is implemented. Based on the collected sets of performance metrics, the control action system of some embodiments uses a machine-trained process to select one of the identified remedial actions as an optimal remedial action to implement for all of the flows in the set, and uniformly implements the selected remedial action for all of the flows in the set.
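

The trial-and-select behavior described above can be summarized in a few lines; the sketch below divides the flows into subsets, measures each candidate action, and rolls out the best one, with apply_action and measure_score standing in for the controller API calls and analytics queries (all names are illustrative assumptions).

```python
def choose_remedial_action(flows, candidate_actions, apply_action, measure_score):
    """Trial each candidate remedial action on a disjoint subset of flows,
    then uniformly apply the best-scoring action to all flows (a sketch)."""
    subsets = [flows[i::len(candidate_actions)]
               for i in range(len(candidate_actions))]
    scores = {}
    for action, subset in zip(candidate_actions, subsets):
        apply_action(action, subset)             # e.g., re-route this subset only
        scores[action] = measure_score(subset)   # performance observed during trial
    best = max(scores, key=scores.get)
    apply_action(best, flows)                    # roll the winner out to all flows
    return best
```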


The particular anomaly, in some embodiments, is an increased latency associated with a first transit FE that forwards application data traffic between one or more edge FEs located at one or more branch sites connected by the SD-WAN and one or more applications deployed to a first cloud datacenter connected by the SD-WAN. The set of two or more remedial actions, in some embodiments, includes two or more alternate routes through the SD-WAN to the particular application. For example, the alternate routes of some embodiments include at least (1) a first alternate route between the one or more edge FEs and the one or more applications deployed to a second cloud datacenter connected to the SD-WAN via a second transit FE, and (2) a second alternate route between the one or more edge FEs and the one or more applications deployed to a third cloud datacenter connected to the SD-WAN via a third transit FE.


In some embodiments, the control action system selectively implements each identified remedial action (e.g., each identified alternate path) for a subset of the set of flows for the duration of time by directing (e.g., via an API call to the SD-WAN controller specifying the remedial action) the one or more edge FEs to use the second transit FE to forward a first subset of flows to the one or more applications deployed to the second cloud datacenter, and directing the one or more edge FEs to use the third transit FE to forward a second subset of flows to the one or more applications deployed to the third cloud datacenter. The one or more edge FEs of some embodiments continue to use the first transit FE to forward a remaining third subset of flows to the one or more applications deployed to the first cloud datacenter.


As the performance metrics are collected for each selectively implemented remedial action, the control action system of some embodiments receives (e.g., from the analytics system described above) or itself computes a performance score for each remedial action. When a performance score generated for the first alternate route is higher than a performance score generated for the second alternate route (i.e., is more optimal), in some embodiments, the machine-trained process of the control action system selects the first alternate route to implement for the set of flows. Alternatively, when the performance score generated for the second alternate route is higher than the performance score generated for the first alternate route, the machine-trained process of the control action system selects the second alternate route to implement for the set of flows. The control action system of some embodiments then sends an API call to the SD-WAN controller to direct the SD-WAN controller to update configurations for the one or more edge FEs to cause the edge FEs to use the selected remedial action (e.g., selected alternate path) for all flows in the set.


The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, the Detailed Description, the Drawings, and the Claims is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, the Detailed Description, and the Drawings.





BRIEF DESCRIPTION OF FIGURES

The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.



FIG. 1 conceptually illustrates a schematic diagram of a self-healing SD-WAN overlay network architecture of some embodiments.



FIG. 2 conceptually illustrates a block diagram that includes an analytics machine cluster of some embodiments.



FIG. 3 conceptually illustrates a block diagram of interactions of some embodiments between a control action system of an ENI platform deployed in an SD-WAN and an SD-WAN controller for the SD-WAN.



FIG. 4 illustrates a list of the above-mentioned API calls and the responses to these API calls, in some embodiments.



FIG. 5 conceptually illustrates a more detailed block diagram of an ENI platform of some embodiments that includes separate anomaly detectors for the shorter and longer timescales described above and an in-depth view of the shorter timescale anomaly detector.



FIG. 6 illustrates a graph of some embodiments that includes an example of a Gaussian distribution curve that is generated using performance scores computed over a particular 30 minute time window.



FIG. 7 illustrates an example graph of some embodiments that includes four different distribution curves.



FIG. 8 illustrates a process performed in some embodiments to identify anomalies at the shorter timescale.



FIG. 9 conceptually illustrates another block diagram of the ENI platform of FIG. 5 with an in-depth view of the longer timescale anomaly detector of some embodiments.



FIG. 10 illustrates simplified examples of two topology graphs of some embodiments for first and second applications.



FIG. 11 conceptually illustrates a process performed in some embodiments to identify anomalies at the longer timescale.



FIG. 12 conceptually illustrates a block diagram that provides a more in-depth view of the control action system of some embodiments.



FIG. 13 conceptually illustrates a reinforcement learning process performed by the control action system of some embodiments using the greedy algorithm described above.



FIG. 14 conceptually illustrates an example diagram of a self-healing SD-WAN, of some embodiments, in which alternate routes are monitored for sample flows between an edge router and an application.



FIG. 15 conceptually illustrates another example diagram of a self-healing SD-WAN of some embodiments in which alternate routes are identified between edge devices located at different branch sites for sending VOIP traffic between client devices at the different branch sites.



FIG. 16 conceptually illustrates a process performed in some embodiments to identify and remediate performance incidents in an SD-WAN.



FIG. 17 illustrates the layout of an incident, in some embodiments.



FIG. 18 illustrates an example layout of a recommendation, in some embodiments, that includes QoE (quality of experience) score comparisons for 6 gateways, as well as edge alternate overlay node QoE scores.



FIG. 19 conceptually illustrates a computer system with which some embodiments of the invention are implemented.





DETAILED DESCRIPTION

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.


Some embodiments of the invention provide a method of detecting and autonomously remediating anomalies in a self-healing SD-WAN (software-defined wide-area network) implemented by multiple forwarding elements (FEs), each of which is located at one of multiple sites connected by the SD-WAN. The method of some embodiments is performed by an anomaly detection and remediation system for the SD-WAN. In some embodiments, the anomaly detection and remediation system includes one or more anomaly detection and remediation processes executed by one or more machines in a cluster. The one or more machines, in some embodiments, include one or more host computers. Also, in some embodiments, the anomaly detection and remediation system is implemented as part of an ENI (Edge Network Intelligence) platform.


From the multiple FEs, the method of some embodiments receives multiple sets of flow data associated with application traffic that traverses the multiple FEs, such as edge forwarding elements deployed at sites (e.g., branch sites), and transit FEs, including gateway forwarding elements deployed in cloud datacenters, and hub forwarding elements deployed in public or private datacenters. In some embodiments, these FEs are routers, e.g., are edge routers, cloud gateway routers, and hub routers.


The method uses a first set of machine-trained processes to analyze the multiple sets of flow data in order to identify at least one anomaly associated with at least one particular FE. The method then uses a second set of machine-trained processes to identify at least one remedial action for remediating the identified anomaly. The method implements the identified remedial action by directing an SD-WAN controller deployed in the SD-WAN to implement the identified remedial action.


In some embodiments, the anomaly detection and remediation system includes multiple sub-systems that execute various processes to detect and remediate anomalies. These sub-systems, in some embodiments, include a data ingestion system, an analytics system, and a control action system. The data ingestion system, of some embodiments, receives the flow data from the FEs that implement the SD-WAN and parses the received flow data into a data structure used internally by the anomaly detection and remediation system. The analytics system receives the parsed flow data from the ingestion system, in some embodiments, aggregates the data, computes performance scores from the data, and uses one or more anomaly detection machine-trained processes to analyze the computed performance scores to identify any anomalies. The control action system of some embodiments receives identified anomalies, uses one or more machine-trained processes to identify remedial actions to obviate the identified anomalies, and sends API calls to the SD-WAN controller to direct the SD-WAN controller to implement the remedial actions in the SD-WAN.


The received flow data, in some embodiments, is associated with multiple applications. In some embodiments, each received set of flow data includes a five-tuple identifier for the associated flow, an application identifier associated with the flow, and a protocol associated with the flow. The flow data, in some embodiments, also includes a set of flow statistics associated with the flow, an overlay route type associated with the flow, a next-hop overlay node for the flow, and a destination hop overlay node associated with the flow. In some embodiments, the set of flow statistics includes an amount of TX bytes associated with the flow, an amount of RX bytes associated with the flow, TCP latency associated with the flow, and a number of TCP re-transmissions associated with the flow.


In some embodiments, the analytics system aggregates the flow data by performing a time aggregation operation on the flow data at a first granularity in order to generate a set of aggregated flow data. For each FE of the multiple FEs, the analytics system of some embodiments uses the set of aggregated flow data to generate a set of performance scores at a second granularity. In some embodiments, the first granularity is a per-minute granularity, and the second granularity is a per-application granularity such that for each FE, each performance score in the set of performance scores for the FE corresponds to a particular minute of time (i.e., in a duration of time over which the flow data was collected) and a particular application. In some embodiments, the performance scores are per-edge, per-application, and per-path.


The machine-trained processes used to analyze the performance scores, in some embodiments, are part of a set of anomaly detection processes that detect anomalies at a shorter timescale (e.g., 30 minutes) and at a longer timescale (e.g., two weeks). In some embodiments, the shorter timescale anomaly detection process identifies, for each particular FE of the multiple FEs, performance scores associated with each application of multiple applications for which the particular FE forwards traffic flows.


For each particular application, the shorter timescale anomaly detection process generates a distribution graph (e.g., a Gaussian distribution curve) that shows the identified performance scores associated with the particular application for the particular FE over a first duration of time. The shorter timescale anomaly detection process then analyzes the generated distribution graphs using a machine-trained process (e.g., a sliding window Gaussian outlier detection process) to identify one or more per-application incidents by identifying that a threshold number of performance scores associated with the particular application (1) are outliers with respect to the generated distribution graph for the particular application and (2) occurred within a second duration of time.


In some embodiments, each generated distribution graph represents a distribution of the set of performance scores for the particular application over the first duration of time (e.g., 30 minutes). To generate the distribution graph, in some embodiments, the shorter timescale anomaly detection process computes a sample mean of the performance scores for the first duration of time and a standard deviation of the performance scores for the first duration of time. In some embodiments, the computed sample mean and standard deviation are dynamic parameters that change over time based on the generated performance scores. As such, each distribution graph generated for a particular FE varies compared to each other distribution graph generated for the particular FE as the performance scores computed for different durations of time affect the dynamic parameters used to generate the distribution graphs.


In order to identify that a threshold number of performance scores associated with the particular application are outliers with respect to the generated distribution graph for the particular application, some embodiments use the dynamic parameters to determine whether a threshold number of performance scores in the set of performance scores for the particular application exceed a specified threshold of performance. In some embodiments, the first and second durations of time are different durations of time (e.g., different 30 minute time windows), while in other embodiments, the first and second durations of time are the same duration of time (e.g., the same 30 minute time window). In still other embodiments the second duration of time is a subset of the first duration of time (e.g., a 5 minute subset of time within the 30 minute window).


In some embodiments, the longer timescale anomaly detection process is performed iteratively. The longer timescale anomaly detection process of some embodiments receives multiple performance scores that over a duration of time (e.g., two weeks) express a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration. The longer timescale anomaly detection process uses the received performance scores, in some embodiments, to update generated weight values for a topology graph that includes (1) multiple nodes representing the multiple FEs and (2) multiple edges between the multiple nodes representing paths traversed between the FEs by the flows associated with the particular application, with the generated weight values being associated with said paths.


The longer timescale anomaly detection process of some embodiments uses a topology-based machine-trained process to analyze the topology graph with the updated generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows. For an identified anomaly, the longer timescale anomaly detection process implements a remedial action to modify the SD-WAN in order to remediate the identified anomaly (e.g., by sending an API call identifying the remedial action to an SD-WAN controller), according to some embodiments.


In some embodiments, when using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly, the longer timescale anomaly detection process also determines whether the identified anomaly is isolated to a particular FE (e.g., isolated to a particular edge FE or due to a particular transit FE), or affects the overall application. For instance, in some embodiments, the identified anomaly is a network impairment on a first transit FE that is a next-hop FE for application traffic associated with the particular application and forwarded by a first edge FE located at a first branch site. In some such embodiments, the identified remedial action includes updating a transit FE order configuration for the first edge FE to change the next-hop transit FE for application traffic associated with the particular application and forwarded by the first edge FE from the first transit FE to a second transit FE.


The first transit FE, in some embodiments, is also a next-hop transit FE for application traffic associated with the particular application and forwarded by a second edge FE located at a second branch site. In some such embodiments, the identified anomaly is also associated with the second edge FE when application traffic for the particular application and forwarded by the second edge FE is also affected by the first transit FE's anomalous behavior. The identified remedial action, in some embodiments, is to update transit FE orders for both the first edge FE and the second edge FE.


In some embodiments, the particular application is a first application and the first transit FE is also a next-hop transit FE for application traffic associated with a second application and forwarded by the first edge FE. In some such embodiments, the identified remedial action includes updating the transit FE order configuration for the first edge FE to change the next-hop transit FE from the first transit FE to the second transit FE for application traffic associated with both the first and second applications. When the network impairment only affects traffic associated with the first application, the transit FE order configuration for the particular edge FE is only updated for traffic associated with the first application and not the second application, according to some embodiments.


When an identified anomaly is determined to require remediation to improve performance of a set of one or more flows, in some embodiments, the control action system mentioned above is utilized for identifying and implementing one or more remedial actions that modify the SD-WAN. For a particular anomaly, the control action system, in some embodiments, identifies a set of two or more remedial actions for remediating the particular anomaly in the SD-WAN.


For each identified remedial action in the set, the control action system selectively implements the identified remedial action for a subset of the set of flows for a duration of time in order to collect a set of performance metrics associated with SD-WAN performance during the duration of time for which the identified remedial action is implemented. Based on the collected sets of performance metrics, the control action system of some embodiments uses a machine-trained process to select one of the identified remedial actions as an optimal remedial action to implement for all of the flows in the set, and uniformly implements the selected remedial action for all of the flows in the set.


The particular anomaly, in some embodiments, is an increased latency associated with a first transit FE that forwards application data traffic between one or more edge FEs located at one or more branch sites connected by the SD-WAN and one or more applications deployed to a first cloud datacenter connected by the SD-WAN. The set of two or more remedial actions, in some embodiments, includes two or more alternate routes through the SD-WAN to the particular application. For example, the alternate routes of some embodiments include at least (1) a first alternate route between the one or more edge FEs and the one or more applications deployed to a second cloud datacenter connected to the SD-WAN via a second transit FE, and (2) a second alternate route between the one or more edge FEs and the one or more applications deployed to a third cloud datacenter connected to the SD-WAN via a third transit FE.


In some embodiments, the control action system selectively implements each identified remedial action (e.g., each identified alternate path) for a subset of the set of flows for the duration of time by directing (e.g., via an API call to the SD-WAN controller specifying the remedial action) the one or more edge FEs to use the second transit FE to forward a first subset of flows to the one or more applications deployed to the second cloud datacenter, and directing the one or more edge FEs to use the third transit FE to forward a second subset of flows to the one or more applications deployed to the third cloud datacenter. The one or more edge FEs of some embodiments continue to use the first transit FE to forward a remaining third subset of flows to the one or more applications deployed to the first cloud datacenter.


As the performance metrics are collected for each selectively implemented remedial action, the control action system of some embodiments receives (e.g., from the analytics system described above) or itself computes a performance score for each remedial action. When a performance score generated for the first alternate route is higher than a performance score generated for the second alternate route (i.e., is more optimal), in some embodiments, the machine-trained process of the control action system selects the first alternate route to implement for the set of flows. Alternatively, when the performance score generated for the second alternate route is higher than the performance score generated for the first alternate route, the machine-trained process of the control action system selects the second alternate route to implement for the set of flows. The control action system of some embodiments then sends an API call to the SD-WAN controller to direct the SD-WAN controller to update configurations for the one or more edge FEs to cause the edge FEs to use the selected remedial action (e.g., selected alternate path) for all flows in the set.


SD-WAN forms the middle layer of the network connection between clients and devices on one end of the network (e.g., at branch, campus, and/or work-from-anywhere locations) and applications on the other end (e.g., cloud applications, datacenter applications). In some embodiments, SASE (secure access service edge) provides cloud-enabled security and network services over the SD-WAN. SASE, in some embodiments, encompasses multiple SDN (software-defined network) and security services such as an SD-WAN, Cloud Web Security, Zero Trust Network Access, etc.



FIG. 1 conceptually illustrates a schematic diagram of a self-healing SD-WAN overlay network architecture 100 of some embodiments. The self-healing SD-WAN augments an SD-WAN network with intelligence by using real-time network data and artificial intelligence/machine learning (AI/ML) algorithms (e.g., machine-trained processes) to monitor, detect, and proactively take control actions to auto-remediate end-user application and security issues (e.g., by programmatically reconfiguring SD-WAN network elements), according to some embodiments.


Multiple branch sites 130, 132, and 134 and devices 150 (e.g., user devices) located at the multiple branch sites 130-134 are connected to the SD-WAN 100 by the SD-WAN edge forwarding elements (FEs) 120, 122, and 124 (e.g., edge routers), and the datacenter 140 that hosts datacenter resources 155 is connected to the SD-WAN by the SD-WAN hub 145 (e.g., a hub router). A gateway FE 165 (e.g., gateway router) is deployed to a cloud 160 in the SD-WAN 100 to connect the SD-WAN edge FEs 120-124 to each other, to the SD-WAN hub FE 145, and to software as a service (SaaS) applications and cloud applications 110. The gateway FE 165, in some embodiments, also connects the SD-WAN 100 to external networks (not shown).


Additionally, the SD-WAN 100 includes an SD-WAN controller 105 for managing the elements of the SD-WAN 100, and an ENI platform 170 for collecting and analyzing flow data to detect and remediate issues (e.g., anomalous behavior associated with FEs, routes, applications, etc.). In some embodiments, the FEs (e.g., edge, hub, and gateway FEs) of the SD-WAN 100 are in a full mesh topology in which each forwarding element is connected to every other forwarding element. In other embodiments, the SD-WAN elements are in partial mesh topologies. Also, in some embodiments, the hub FE 145 serves as a hub in a hub-spoke architecture in which the edge FEs 120-124 serve as spokes.


The ENI platform 170 is a cluster of machines, in some embodiments, that implement a set of processes, including multiple machine-trained processes, for detecting and remediating application issues in the SD-WAN 100. The ENI platform 170 ingests real-time flow data from network nodes (e.g., the SD-WAN edge FEs 120-124, the SD-WAN gateway FE 165, and the SD-WAN hub FE 145), analyzes and extracts insights from the flow data using AI/ML algorithms, and takes control actions by invoking certain APIs on the SD-WAN controller 105. The control actions alter the appropriate configurations of network nodes to remediate issues detected using machine-trained processes (e.g., machine learning algorithms).


The machine-trained processes are field-trained using unsupervised learning, in some embodiments, while in other embodiments, the machine-trained processes are trained prior to in-field use (e.g., supervised learning in a controlled environment). In still other embodiments, the machine-trained processes are trained both prior to in-field use as well as during in-field use (i.e., a combination of supervised and unsupervised learning).


The SD-WAN controller 105, in some embodiments, is a cluster of network managers and controllers that serves as a central point for managing (e.g., defining and modifying) configuration data that is provided to the edge FEs 120-124 and/or hubs and gateways (e.g., the SD-WAN gateway FE 165 and SD-WAN hub FE 145) to configure some or all of their operations. In some embodiments, this SD-WAN controller 105 is in one or more public cloud datacenters, while in other embodiments it is in one or more private datacenters. In some embodiments, the SD-WAN controller 105 has a set of manager servers that defines and modifies the configuration data, and a set of controller servers that distributes the configuration data to the edge FEs, hubs and/or gateways. In some embodiments, the SD-WAN controller 105 directs edge FEs and hub FEs to use certain gateways (i.e., assigns a gateway to the edges and hubs).


As described above, the SD-WAN 100 includes two types of forwarding nodes (also called forwarding elements in the discussion below): (1) one or more edge forwarding nodes (also called edge forwarding elements or edges), and (2) one or more transit forwarding nodes (also called transit forwarding elements). An edge node (such as edge 120, 122, or 124) resides at the overlay network boundary, in some embodiments, and connects the local-area network (LAN) of a branch site (e.g., the LAN at branch site 130, 132, or 134) with the overlay WAN network (e.g., the SD-WAN).


A transit node serves as the intermediary node on the overlay network for routing application flows to their respective destination servers, according to some embodiments. It provides several network-management functions and improves application performance by utilizing highly optimized network routes to reach the application servers. Examples of transit nodes include cloud gateways (e.g., cloud gateway 165) and hubs (e.g., hub 145). A hub forwarding element provides access to the resources of a datacenter (e.g., resources 155 of the datacenter 140) and also serves as a transit node for passing flows from one edge FE of one branch site to another edge FE of another branch site, according to some embodiments. For example, the SD-WAN hub 145 is connected to each of the SD-WAN edge FEs 120-124.


In some embodiments, flows that traverse the edge FEs and transit FEs are associated with various applications. Examples of applications, in some embodiments, include VOIP (voice over IP) applications, database applications, web applications, and applications for running virtual machines (VMs). Each application, in some embodiments, executes on a device operating at a site connected to the SD-WAN. For example, applications of some embodiments execute on devices operating at datacenters (e.g., public datacenters or private datacenters), in clouds (e.g., public clouds or private clouds), and at branch sites (e.g., on user devices operating at the branch sites). In some embodiments, different instances of the same application (e.g., a VOIP application) execute on separate user devices at separate branch sites and communicate via paths between the branch sites (e.g., direct paths between edge routers at each branch site, paths between the edge routers that traverse transit routers, etc.).


The datacenter 140 is one of multiple cloud datacenters connected by the SD-WAN, in some embodiments. In some such embodiments, each cloud datacenter can be provided by the same or different providers, while each of the branch sites 130-134 belongs to the same entity, according to some embodiments. The branch sites 130-134, in some embodiments, are multi-machine sites of the entity. Examples of multi-machine sites of some embodiments include multi-user compute sites (e.g., branch offices or other physical locations having multi-user computers and other user-operated devices and serving as source computers and devices for requests to other machines at other sites), datacenters (e.g., locations housing servers), etc. These multi-machine sites are often at different physical locations (e.g., different buildings, different cities, different states, etc.). In some embodiments, the cloud datacenters are public cloud datacenters, while in other embodiments the cloud datacenters are private cloud datacenters. In still other embodiments, the cloud datacenters may be a combination of public and private cloud datacenters. Examples of public clouds are public clouds provided by Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, etc., while examples of entities include a company (e.g., corporation, partnership, etc.), an organization (e.g., a school, a non-profit, a government entity, etc.), etc.


The datacenter 140 includes a hub 145, as mentioned above, for connecting the datacenter 140 to the SD-WAN 100 (e.g., to the SD-WAN gateway 165 and/or the edge FEs 120-124), and for connecting branch sites 130-134 to the resources 155 of the datacenter 140. The datacenter resources 155, in some embodiments, are application resources. In some embodiments, additional SD-WAN gateways may be present and can include multi-tenant, stateless service gateways deployed in strategic points of presence (PoPs) across the globe. Some such gateways serve as gateways to various clouds and datacenters, such as the SaaS/Cloud applications 110. Also, in some embodiments, other additional SD-WAN forwarding elements may be present, including additional edge devices located at other branch sites of the entity, as well as additional SD-WAN hub FEs. Hub FEs, in some embodiments, use or have one or more service engines to perform services (e.g., middlebox services) on data messages that they forward from one branch site to another branch site.


In some embodiments, between any two SD-WAN nodes (e.g., an SD-WAN edge 120-124 and an SD-WAN gateway 165), a network path is established between every available network interface pair. WAN optimization technology is used, in some embodiments, to send packets over the available paths in response to time-varying underlying network conditions. An example of such a WAN optimization technology used in some embodiments is a proprietary technology called Dynamic Multi-path Optimization (DMPO). When DMPO is used, in some embodiments, active paths with the best instantaneous network quality are used to send the application packets. In some embodiments, the outcome of DMPO optimization is a reliable overlay link between any two SD-WAN nodes even though the underlay network links may experience time-varying fluctuations.


At the flow level, in some embodiments, an application flow can take several different paths on the overlay network. Examples of such paths in some embodiments include (1) a Direct/NSD path (i.e., a non-SD-WAN path), (2) an Edge-Gateway/Hub-Application path, and (3) an Edge-Gateway/Hub-Edge-Application path. For the direct/NSD path, the application flow does not traverse the overlay network and is either put directly on the public internet or is routed through a non-SD-WAN (NSD) tunnel to the application destination, in some embodiments. For the Edge-Gateway/Hub-App path, in some embodiments, the application flow traverses the path established between the Edge node and the Gateway/Hub node and is then eventually routed to the application destination. Finally, for the Edge-Gateway/Hub-Edge-App path, the application flow is routed from one Edge node to another Edge node via the Gateway/Hub and then to the application destination, according to some embodiments.


On the overlay SD-WAN network, in some embodiments, there are several control knobs that affect the end-to-end application performance. In some embodiments, these control knobs can be categorized into two buckets: (1) link-level parameters and (2) path-level parameters. Link level parameters refer to several parameters inside the DMPO protocol on a single overlay link that control the packet transmission reliability on a single overlay link, according to some embodiments. Examples of such parameters, in some embodiments, include rate limit, traffic classification/QoS prioritization and interface switching configuration. Path level parameters refer to which paths are selected for routing application flows, in some embodiments. These include selecting direct versus overlay paths, in some embodiments, and, within overlay paths, selecting which transit node to choose as the intermediary node. In some embodiments, these parameters trade off application performance with network stability and uniform network load distribution.


While the self-healing SD-WAN technology encompasses control actions across all aspects of the network, some embodiments specifically focus on the aspect of dynamically selecting an overlay transit node (e.g., a hub node or a gateway node) as the control parameter to dynamically re-route application traffic in response to real-time end-to-end application performance conditions. The SD-WAN edge FEs 120-124, in some embodiments, are configured to stream flow metadata to a data ingestion system within an Edge Network Intelligence (ENI) Platform (not shown) of the SD-WAN 100.


The flow metadata messages streamed by the SD-WAN edge FEs, in some embodiments, include contextual information for each flow, such as the source and destination IP (Internet Protocol) addresses, source and destination ports, network protocol, and flow statistics. Examples of flow statistics, in some embodiments, include the average packet latency, drop and jitter, the number of bytes transmitted and received over the last minute, and the next hop and destination hop overlay nodes. In some embodiments, these flow metadata messages provide granular per-flow and device-level information and, in some embodiments, are streamed to an ENI analytics machine or machine cluster (not shown) through a message broker (e.g., Apache Kafka message broker) after cleaning and data normalization.


In some embodiments, the analytics process of the ENI platform is responsible for taking raw edge flow data and identifying application performance incidents in a streaming fashion. To facilitate parallel processing of edge data, in some embodiments, it is run on an analytics cluster (e.g., an Apache Spark cluster) in a three-stage process of aggregation, scoring, and anomaly detection.



FIG. 2, for instance, conceptually illustrates a block diagram 200 that includes an analytics cluster of some embodiments. As shown, the diagram includes SD-WAN edge routers 260 and an ENI platform 205. The ENI platform includes a data ingestion system 270, an analytics cluster 275, and a control action system 280. The data ingestion system 270 includes an ENI manager 210 and a message broker 215, while the analytics cluster 275 includes an aggregation pipeline 220, a scoring pipeline 230, and an anomaly detection pipeline 240.


As mentioned above, the SD-WAN edge routers 260 are configured to stream flow metadata to the data ingestion system 270 of the ENI platform 205. The ENI manager 210 (also referred to as an ENI backend) receives flow data from the edge routers 260, parses the flow data, and converts it into ENI internal data structures, according to some embodiments. In some embodiments, the ENI manager 210 executes as a Java process. The flow data, in some embodiments, are protobuf messages. In some embodiments, each message includes a 5-tuple for the flow, an application identifier, protocol, flow statistics (e.g., TX bytes, RX bytes, TCP latency, and TCP retransmissions), and overlay route information (e.g., overlay route type, next hop overlay node, and destination hop overlay node), as also mentioned above. The edge routers 260 of some embodiments stream the flow data to the ENI manager 210 at a 1 minute stream frequency.


As the ENI manager 210 receives, parses, and converts flow data, the ENI manager 210 passes the converted flow data (i.e., in the ENI internal data structures) to the message broker 215. The message broker 215, in some embodiments, is a message broker between the ENI manager 210 and analytics cluster 275. As such, as the message broker 215 receives the converted flow data from the ENI manager 210, the message broker 215 passes the converted flow data to the aggregation pipeline 220 of the analytics cluster 275.


The aggregation pipeline 220 of the analytics cluster 275 reads raw data from the message broker 215 and performs a time aggregation operation to aggregate metrics (e.g., collected operational values) extracted from the raw data at a particular granularity. The granularity, in some embodiments, is a per-minute granularity such that a set of aggregated metrics are generated for each minute of a duration of time for which the raw data was collected.


In some embodiments, the messages streamed into the analytics cluster 275 have variable delays, and as such, the aggregation pipeline 220 is designed to accommodate late-arriving data by storing aggregates for a small duration past the end of the aggregation window, in some embodiments, and transmitting them downstream only after this duration has elapsed. For example, when calculating the aggregation for a window between times 12:03 and 12:04, some embodiments wait until 12:05 to receive data timestamped between 12:03 and 12:04 and add this additional data to the aggregate. Only at 12:05 will this aggregate be transmitted to the scoring pipeline 230 of the analytics cluster. In some embodiments (e.g., when using an Apache Spark cluster), this is done by maintaining a state for each edge router, application, and overlay next-hop node.
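

The hold-and-flush behavior described in this example can be sketched as a small stateful aggregator; the record fields, bucket keys, and one-minute hold below are illustrative assumptions.

```python
from collections import defaultdict

class MinuteAggregator:
    """Aggregate flow records into per-minute buckets keyed by
    (edge, application, next-hop), holding each bucket open one extra minute
    for late-arriving data before emitting it downstream (a sketch)."""

    def __init__(self, hold_seconds=60):
        self.hold = hold_seconds
        self.buckets = defaultdict(lambda: {"tx": 0, "rx": 0, "count": 0})

    def add(self, record):
        minute = int(record["ts"] // 60) * 60    # floor timestamp to the minute
        key = (record["edge"], record["app"], record["next_hop"], minute)
        bucket = self.buckets[key]
        bucket["tx"] += record["tx_bytes"]
        bucket["rx"] += record["rx_bytes"]
        bucket["count"] += 1

    def flush(self, now_ts):
        """Emit buckets whose window ended at least `hold` seconds ago, e.g.
        the 12:03-12:04 bucket is emitted at 12:05."""
        done = [k for k in self.buckets if k[3] + 60 + self.hold <= now_ts]
        return {k: self.buckets.pop(k) for k in done}
```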


After aggregating raw data to the minute level, the aggregation pipeline 220 provides the data to the scoring pipeline 230 of the analytics cluster 275. The scoring pipeline 230 of some embodiments transforms this data into measurements of end-user application performance at a second granularity. In some embodiments, the scoring pipeline 230 performs this transformation by combining multiple raw metrics into a single score per key (e.g., a tuple for every edge router, application, and flow combination) that represents a holistic assessment of the average application performance of an edge router for each minute.


The performance scores, in some embodiments, are application performance QoE scores on a (1) per-edge, (2) per-application, (3) per-overlay-route, and (4) per-minute level. For example, for a TCP flow, packet latency and retransmit percentage are used, in some embodiments, against pre-determined thresholds to compute a score. The self-healing system of some embodiments, however, allows flexibility to customize the scoring function on an application level and enterprise level. The scoring pipeline 230 of the analytics cluster 275 streams these scores to the anomaly detection pipeline 240.
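

As an illustration of such a scoring function, the Python sketch below folds TCP latency and retransmit percentage into a single 0-100 score per key; the thresholds and weighting are assumptions chosen for the example, not values from the description.

```python
def tcp_qoe_score(latency_ms, retransmit_pct,
                  latency_threshold_ms=150.0, retransmit_threshold_pct=2.0):
    """Combine TCP latency and retransmit percentage against pre-determined
    thresholds into a single 0-100 QoE score (an illustrative sketch)."""
    latency_penalty = min(latency_ms / latency_threshold_ms, 2.0) * 25
    retrans_penalty = min(retransmit_pct / retransmit_threshold_pct, 2.0) * 25
    return max(0.0, 100.0 - latency_penalty - retrans_penalty)

# One score per (edge, application, overlay route, minute) key.
print(tcp_qoe_score(latency_ms=90.0, retransmit_pct=0.5))   # healthy flow
print(tcp_qoe_score(latency_ms=400.0, retransmit_pct=6.0))  # degraded flow
```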


The anomaly detection pipeline 240 of the analytics cluster 275 receives an overall performance score at the minute level for every edge, application, next-hop and destination overlay node. These scores are passed through machine learning models, in some embodiments, to detect large deviations in performance from the normal baseline. In some embodiments, the outcome of this step is an application performance incident that is generated and sent to the control action system 280.


In some embodiments, as will be further described below, the anomaly detection pipeline 240 is broken into a fast acting change detection pipeline (e.g., a shorter timescale anomaly detector) and a global application performance analysis and machine learning recommendation pipeline (e.g., a longer timescale anomaly detector). The fast acting change detection pipeline of some embodiments runs time-series machine learning models on a (1) per-edge, (2) per-application, and (3) per-overlay-nexthop-node level to analyze and detect sudden degradation in performance, according to some embodiments.


In some embodiments, the global application performance analysis and machine learning recommendation pipeline runs machine learning models on a longer time-window data and computes application performance insights at the global topology level. In some embodiments, the global application performance analysis and machine learning recommendation pipeline identifies problematic edge, application, overlay-nexthop-node combinations and extracts valuable insights on application performance across the entire customer deployment.


After incidents have been detected by the analytics cluster, the control action system 280 takes a corrective action by 1) identifying whether a remedial action on an impacted edge is necessary, 2) determining which remedial action to take, and 3) autonomously applying the chosen remedial action (i.e., applying the chosen action without input from a user). This creates a closed-loop system that self-heals end-to-end application issues. Determining which remedial action to take, in some embodiments, includes using a machine-trained reinforcement learning process, as will be further described by embodiments below.


In some embodiments, the control action system 280 combines data from the (mainly domain agnostic) machine learning-based pipelines and (domain specific) topology and configuration information from an SD-WAN controller (not shown) to generate appropriate control actions which are then applied to the SD-WAN system through the SD-WAN controller. A control action (also referred to as a remedial action) involves programmatically altering the controllable parameters in the SD-WAN system, in some embodiments. The control action system 280 of some embodiments applies the control actions through API calls to the SD-WAN controller (not shown). For example, for VMware SD-WAN solutions, these changes are handled by a cloud-based management system called the VeloCloud Orchestrator (VCO). The VCO stores and periodically synchronizes edge configurations to all the network edges. This configuration can be modified by users via a GUI or an API which then dynamically alters the flow of network traffic.



FIG. 3 conceptually illustrates a block diagram 300 of interactions of some embodiments between a control action system of an ENI platform deployed in an SD-WAN and an SD-WAN controller for the SD-WAN. After the analytics cluster (not shown) identifies edge incidents 315 (e.g., anomalies and deviations in performance associated with one or more edge-application pairs), the edge incidents 315 are provided to the control action system 305. Additionally, for cases in which the SD-WAN architecture includes a hub-spoke topology and where an identified control action is to dynamically change the order of hubs for the edge (i.e., configure the edge to use a different hub for forwarding at least a subset of application traffic), the control action system 305 queries a datastore 350 for existing hub performance metrics 320, which were populated earlier by the analytics engine. An example of the datastore 350 used in some embodiments is an Apache Cassandra table.


When a remedial action is identified for remediating at least one of the edge incidents 315 for at least one edge-application pair, in some embodiments, the control action system 305 also retrieves existing configuration data 325 for the affected edge from the SD-WAN controller. The control action system 305, of some embodiments, is run on the analytics cluster described by FIG. 2 above. In some embodiments, the analytics cluster is an open-source analytics cluster such as an Apache Spark cluster.


The control action system 305 communicates with the SD-WAN controller using an API polling process that facilitates communication between the ENI platform that includes the control action system 305 and other third-party APIs, according to some embodiments. The API-poller 310, of some embodiments, queries the SD-WAN controller API for the existing edge configuration data 325 and writes the data to a datastore 340. The datastore 340, in some embodiments, is an open-source distributed wide-column datastore (e.g., Cassandra).


The open-source analytics cluster on which the control action system 305 runs, in some embodiments, queries the datastore 340 to make recommendations (i.e., for remediating identified network incidents). With the existing edge configuration 325 and knowledge of the hub performance metrics 320 for the affected edge-application pair, the control action system 305 of some embodiments creates a recommendation for a new hub order and writes this recommendation to a distributed search and analytics datastore 360 (e.g., an Elasticsearch datastore). In some embodiments, the API-poller 310 queries the recommendations from the datastore 360 and sends the queried recommendations as API calls to the SD-WAN controller to make the actual configuration changes. In some embodiments, the control action system 305 uses a controller API (e.g., the VCO API) to apply its recommended edge configuration changes. For example, the API-poller 310 sends the pushed configuration updates 330 to the SD-WAN controller in the block diagram 300.


Examples of the API calls invoked by the API-poller 310, in some embodiments, include getHubOrder to retrieve the hub order for one or more edges, updateHubOrder to change the configured hub order, getBusinessPolicies to retrieve business policies associated with a particular enterprise and/or edge FE(s), addBusinessPolicy to add a new business policy rule for an enterprise and/or particular edge FE(s), and removeBusinessPolicy to delete a business policy for an enterprise and/or particular edge FE(s). For each of these API calls, in some embodiments, the API-poller 310 specifies one or more edge identifiers associated with one or more edge FEs for which a configuration (e.g., hub order, business policy, etc.) is being retrieved or changed, and an enterprise identifier associated with the particular enterprise to which the one or more edge FEs belong.
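
As a hedged sketch of how the API-poller 310 might shape one of these calls, the snippet below builds an updateHubOrder request that carries the edge and enterprise identifiers described above; the JSON field names, payload shape, and transport are illustrative assumptions and do not reflect the actual controller API wire format.

```scala
object HubOrderCallSketch {
  final case class UpdateHubOrder(enterpriseId: Int, edgeIds: Seq[Int], hubOrder: Seq[String])

  // Serializes the request into an illustrative JSON body (a real implementation would
  // use a JSON library and the controller's documented request format).
  def toJson(call: UpdateHubOrder): String = {
    val edges = call.edgeIds.mkString("[", ",", "]")
    val hubs  = call.hubOrder.map(h => "\"" + h + "\"").mkString("[", ",", "]")
    s"""{"method":"updateHubOrder","enterpriseId":${call.enterpriseId},"edgeIds":$edges,"hubOrder":$hubs}"""
  }

  def main(args: Array[String]): Unit = {
    // Swap the primary and secondary hubs for one edge of one enterprise.
    println(toJson(UpdateHubOrder(enterpriseId = 42, edgeIds = Seq(7), hubOrder = Seq("hub-b", "hub-a"))))
  }
}
```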


In some embodiments, the getBusinessPolicies API call can instead be getBusinessPolicy with a “name” argument to search for the “Internet backhaul” rule directly. Additionally, in some embodiments, all of these API calls require an additional argument for segment logicalID or segment name. In some embodiments, the API calls can be separated into override/profile-level rules or combined into a single list. FIG. 4 illustrates a list of the above-mentioned API calls and the responses to these API calls, in some embodiments.


In some embodiments, the control action system looks to identify poor-performing edges and applications in a hub-spoke model and to add business policies updating the hub order at the edge-override level. On the input side, the ENI incidents of some embodiments contain the affected ENI companyId, the logicalId of the affected edge, and a list of affected applications identified by ENI appStackId. Depending on the number of applications affected, some embodiments apply this new business policy to the affected applications, while in other embodiments, a single business policy is created to switch the hub order for all TCP traffic.


Because the ENI “companyId”, edge “logicalId”, and ENI “appStackId” are not recognized by APIv1, the first step in the control action of some embodiments is to resolve these values into identifiers usable by APIv1, such as the enterprise's ID, the edge's ID, and a configuration module's “appId” (i.e., inside the field “match”). The “appId” is resolvable through the ENI file “vco_to_voyance_app_ids.json” and requires no calls to APIv1. For “enterpriseId”, the call /network/getNetworkEnterprises is made, in some embodiments, to get a list of all enterprise “logicalId” values and their corresponding “id” values, which are stored as an internal map. In some embodiments, a database (e.g., MongoDB) is then used to obtain the “logicalId” from the “companyId” and look up the appropriate “id” from the “logicalId”. For edges, /enterprise/getEnterpriseEdges is used, in some embodiments, to obtain a list of edges on a given “enterpriseId”, which is then used to construct a map of edge “logicalId” to edge “id”.
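
A minimal sketch of this identifier resolution is shown below; the record shapes and the sample values stand in for the JSON actually returned by /network/getNetworkEnterprises and /enterprise/getEnterpriseEdges, and the parsing of those responses is elided.

```scala
object IdResolutionSketch {
  final case class EnterpriseRecord(logicalId: String, id: Int)
  final case class EdgeRecord(logicalId: String, id: Int)

  // Stand-ins for the parsed results of /network/getNetworkEnterprises and
  // /enterprise/getEnterpriseEdges (illustrative values only).
  val enterprises = Seq(EnterpriseRecord("ent-logical-1", 42), EnterpriseRecord("ent-logical-2", 43))
  val edges       = Seq(EdgeRecord("edge-logical-a", 7), EdgeRecord("edge-logical-b", 8))

  // Internal maps of logicalId -> id used when constructing later APIv1 calls.
  val enterpriseIdByLogicalId: Map[String, Int] = enterprises.map(e => e.logicalId -> e.id).toMap
  val edgeIdByLogicalId: Map[String, Int]       = edges.map(e => e.logicalId -> e.id).toMap

  def main(args: Array[String]): Unit = {
    // An ENI incident carries the edge's logicalId; resolve it to the APIv1 edge id.
    println(edgeIdByLogicalId("edge-logical-a")) // 7
  }
}
```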


Using the enterprise and edge identifiers, some embodiments use the API to get the affected edge's configuration modules and apply changes. In some embodiments, this begins with a call to /edge/getEdgeConfigurationStack. Inside the resulting JSON, both the profile-level configurations and the edge (override)-level configurations are used, in some embodiments. The profile-level configuration, in some embodiments, is assumed to contain a business policy named “Internet backhaul” which contains the current hub order for the edge. This business policy is used as a template, in some embodiments, to construct a new business policy at the override level. At both the profile- and override-level, in some embodiments, the QoS module is extracted. In some embodiments, the profile-level “Internet backhaul” business policy is taken as the template, the desired hub order changes and, if applicable, the appId are applied to it, the resulting policy is combined with the existing policies at the override level, and the configuration change is made.


There are three cases for the configuration change, according to some embodiments. If the edge has no edge overrides and has not had any in the past, no QoS module at the override level will exist, in some embodiments, and as such, some embodiments create one with /configuration/insertConfigurationModule. If there have been previous edge overrides (or edge overrides currently exist), a QoS module will exist, which can be updated with /configuration/updateConfigurationModule, according to some embodiments. However, if there are no current edge overrides, in some embodiments, the global segment for the QoS will not exist (i.e., the “segments” field will be an empty JSON array). In the case where the global segment does not exist or a configuration module must be inserted, some embodiments first determine the logicalId of the global segment at the edge override-level. In some embodiments, doing so requires a call to /enterprise/getEnterpriseNetworkSegments. For simplicity of the internal Scala logic, this call is made, in some embodiments, even when it is not strictly necessary because of an existing global segment.
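
The decision between inserting and updating the configuration module can be sketched as follows; the simplified QoS-module shape (an optional override-level module whose segments list may be empty) and the action names are assumptions made only for illustration.

```scala
object ConfigChangeSketch {
  // Simplified override-level QoS module: just the list of segment logicalIds.
  final case class QosModule(segments: Seq[String])

  sealed trait Action
  case object InsertConfigurationModule        extends Action // no override-level QoS module exists
  case object UpdateModuleWithNewGlobalSegment extends Action // module exists but its "segments" field is empty
  case object UpdateConfigurationModule        extends Action // module and global segment both exist

  def chooseAction(overrideQos: Option[QosModule]): Action = overrideQos match {
    case None                          => InsertConfigurationModule
    case Some(m) if m.segments.isEmpty => UpdateModuleWithNewGlobalSegment
    case Some(_)                       => UpdateConfigurationModule
  }
}
```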


The API calls, in some embodiments, include /network/getNetworkEnterprises, /enterprise/getEnterpriseEdges, /edge/getEdgeConfigurationStack, /enterprise/getEnterpriseNetworkSegments, /configuration/updateConfigurationModule, and /configuration/insertConfigurationModule. In some embodiments, ENI incidents include a companyId, which is an identifier for the customer whose network is affected in the incident. Currently, ENI determines a network controller enterprise's logicalId from a companyId, in some embodiments. The enterprise's identifier must subsequently be determined, in some embodiments, to be provided as an argument to other API calls. To do so, some embodiments use /network/getNetworkEnterprises to fetch a list of all enterprises on a network controller, and use this list to construct a map of logicalId→id. When using APIv2, which works with logicalId directly, some embodiments do not need an analogous functionality for a v2 implementation of the control action system.


In some embodiments, ENI incidents include an edge's logicalId, but do not contain the edge's id. As such, some embodiments use /enterprise/getEnterpriseEdges to fetch a list of all edges on a given enterprise, and use this list to construct a map of logicalId→id. As with /network/getNetworkEnterprises, an analogous functionality for a v2 implementation of the control action system is not needed, in some embodiments.


To apply the hub order switch control action, in some embodiments, an update needs to be applied to an edge's QoS configuration module at the edge-override level. To simplify the construction of the hub order switch business policy, it is assumed, in some embodiments, that the edge has an existing rule at the profile level that includes the current hub order. Additionally, some embodiments use the override-level QoS module for the edge because, in order to add a new business policy without removing old ones, the full JSON of the current QoS module is required.


In some embodiments, new business policies are applied to the global segment in self-healing. When no current edge-specific overrides exist, in some embodiments, the global segment may be empty, and so the override-level QoS module will include an empty “segments” field. In some such embodiments, a new global segment is inserted (i.e., rather than simply updating the existing one). Since no global segment exists on this module, in some embodiments, the call to /enterprise/getEnterpriseNetworkSegments is made to determine the ID of the global segment that is being inserted.


The API call /configuration/updateConfigurationModule, in some embodiments, applies the new business policy. The business policy is constructed from a template policy at the profile-level named “Internet backhaul”, in some embodiments, which is retrieved from /edge/getEdgeConfigurationStack. In some embodiments, the newly constructed policy is combined with existing policies (if any) and given as the data to /configuration/updateConfigurationModule.


In some embodiments, the API call /configuration/insertConfigurationModule also applies the new business policy. When an edge has no edge-level QoS module (e.g., when the edge has no edge-level overrides and has not had any such overrides in the past), in some embodiments, /configuration/updateConfigurationModule cannot be called as there is no module to update. Hence, some embodiments insert one instead. Beyond this detail, the logic for constructing a new business policy, in some embodiments, is largely similar between the insert and update calls.


In some embodiments, there are two modes in which self-healing operates. The first mode, in some embodiments, provides recommendations with manual remediation, while the second mode provides automated remediations, which are configurable at the level of an edge, application, and enterprise. When automatic remediation is disabled, in some embodiments, the recommendations by the control action system 305 are not immediately applied by the API-poller 310 to make configuration updates. Instead, they are shown to an end-user on a user interface (UI) provided by the ENI platform, in some embodiments, to allow the end-user to opt to apply, or disregard, these recommendations afterwards. To allow this, the ENI UI of some embodiments calls an ENI-internal API to apply the recommendation, which updates the recommendation's status in the distributed search and analytics datastore 360 and is subsequently read by the API-poller and applied.


The self-healing SD-WAN system of some embodiments also detects application performance anomalies on two different timescales. The first timescale is a shorter timescale at the minutes level, while the second timescale is a longer timescale at the days level, according to some embodiments. The first timescale, in some embodiments, addresses application issues that are happening currently (e.g., acute issues) and need to be addressed soon. Examples of such issues, in some embodiments, include sudden excessive packet drops at a router on an end-to-end path, a sudden network failure inside a datacenter, or a sudden issue with an application server. In some embodiments, the second timescale addresses application issues that are systemic and require a longer-term network optimization. Examples of these issues include an inefficient network setup that causes flows to be routed through inefficient routes, or a change in network utilization that has rendered initial configurations inefficient, in some embodiments.


As mentioned above, each SD-WAN edge FE streams per-application flow data to the ENI platform where the analytics cluster aggregates the flow data at a minute-level granularity, in some embodiments. For an edge e and application a, let $\{m^i_{e,a}(t)\}$ denote the collection of flow metrics such as packet latency, packet drops, jitter, bytes sent/received, etc. Let $\{s_{e,a}(t)\}$ denote the application performance scores computed from the above flow metrics for each edge and application. For example, $s_{e,app1}(t)$ represents the performance score for a first application (e.g., an Office 365 application) at edge e for flows routed through a respective configured gateway FE.


At the shorter timescale, which is on the order of minutes, the goal of the self-healing system, in some embodiments, is to detect an application performance issue that suddenly affects certain edges and flows on the network. To this end, a time-series outlier detection methodology is utilized, in some embodiments, as will be explained in greater detail below.


For instance, let Ω denote the outlier detection model applied to $s_{e,a}(t)$ to detect if there is an anomaly or a sudden change in the application performance score for a specific edge and application that warrants the self-healing system to take remediation action. While Ω can be any general timeseries anomaly detection model, a specific example used, in some embodiments, is a sliding window Gaussian outlier detection model. Let W denote a sliding window of data (e.g., 30 minutes) and assume that within this window, $s_{e,a}(t)$ follows a Gaussian distribution. Let $\tilde{\mu}_{e,a}$ and $\tilde{\sigma}_{e,a}$ denote the sample mean and standard deviation in W. The instantaneous score value at time t is considered a deviation event δ(t) if $|s_{e,a}(t) - \tilde{\mu}_{e,a}| / \tilde{\sigma}_{e,a} > \gamma$, for a specified threshold γ, according to some embodiments. To minimize false positives due to a single point variation, some embodiments look for consecutive points of deviation and then combine them to declare an application incident. Specifically, some embodiments generate an application performance incident if there are at least η (e.g., η=3) consecutive deviation events.
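
A minimal sketch of this sliding-window Gaussian outlier model follows; the 30-sample window, the threshold γ = 3, and η = 3 match the examples above, while the function names and the way incidents are reported are illustrative assumptions.

```scala
object GaussianOutlierSketch {
  def meanAndStd(window: Seq[Double]): (Double, Double) = {
    val mu    = window.sum / window.size
    val sigma = math.sqrt(window.map(s => (s - mu) * (s - mu)).sum / window.size)
    (mu, sigma)
  }

  // Returns the indices in `scores` at which an application performance incident is declared,
  // i.e., the points where the eta-th consecutive deviation event occurs.
  def detectIncidents(scores: Seq[Double], windowSize: Int = 30, gamma: Double = 3.0, eta: Int = 3): Seq[Int] = {
    var consecutive = 0
    val incidents = scala.collection.mutable.ArrayBuffer.empty[Int]
    for (t <- windowSize until scores.size) {
      val (mu, sigma) = meanAndStd(scores.slice(t - windowSize, t))
      val isDeviation = sigma > 0 && math.abs(scores(t) - mu) / sigma > gamma
      consecutive = if (isDeviation) consecutive + 1 else 0
      if (consecutive == eta) incidents += t
    }
    incidents.toSeq
  }
}
```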



FIG. 5 conceptually illustrates a more detailed block diagram of an ENI platform of some embodiments that includes separate anomaly detectors for the shorter and longer timescales described above and an in-depth view of the shorter timescale anomaly detector. As shown, the ENI platform 500 includes a data ingestion system 505, an analytics system 510, and a control action system 515. The analytics system 510 (e.g., analytics cluster) includes an aggregator 520, storage 522, score calculator 524, and anomaly detector 526. In this example, the anomaly detector 526 includes a longer timescale detector 530 and a shorter timescale detector 540, which includes a graph generator 542 and a graph analyzer 544. The aggregator 520, storage 522, score calculator 524, and anomaly detector 526, in some embodiments, are processes run by one or more machines of the analytics system 510.


As the data ingestion system 505 receives flow data {me,ai(t)} from network nodes (e.g., edge nodes and transit nodes), the data ingestion system 505 provides this flow data to the aggregator 520 of the analytics system 510 as also described above. The flow data, in some embodiments, includes packet-latency, packet drop, jitter, and bytes sent/received. The aggregator 520 aggregates the received flow data on a per-minute level, and places the aggregated data in the storage 522. In other embodiments, the aggregator 520 provides the aggregated data directly to the score calculator 524.


The score calculator 524 retrieves the aggregated data from the storage 522, or receives the aggregated data directly from the aggregator 520, in some embodiments, for use in calculating performance scores $\{s_{e,a}(t)\}$ for each minute of aggregated data on a per-edge router, per-application, and per-path basis. As the score calculator 524 calculates the performance scores, the score calculator 524 provides the performance scores to the anomaly detector 526. As illustrated, the performance scores are iteratively provided to both the longer timescale detector 530 of the anomaly detector 526 and the shorter timescale detector 540 of the anomaly detector 526. In some embodiments, the score calculator 524 provides the performance scores continuously as they are calculated, while in other embodiments, the score calculator 524 provides sets of performance scores. The processes of the longer timescale detector 530 will be described further below by FIG. 9.


The graph generator 542 of the shorter timescale detector 540 receives the performance scores from the score calculator 524 and uses the scores to generate graphs (e.g., Gaussian distribution curves). For a particular 30 minute window, the graph generator 542 computes a sample mean $\tilde{\mu}_{e,a}$ and a standard deviation $\tilde{\sigma}_{e,a}$. In embodiments that utilize a Gaussian distribution, the sample mean $\tilde{\mu}_{e,a}$ determines the center of the distribution curve, while the standard deviation $\tilde{\sigma}_{e,a}$ determines the width of the distribution curve. Additionally, the height of any such Gaussian distribution curve is determined by $\alpha = 1/(\tilde{\sigma}_{e,a}\sqrt{2\pi})$. Based on these calculations, the graph generator 542 generates a graph for each 30 minute window for which it has received performance scores.



FIG. 6 illustrates a graph 600 of some embodiments that includes an example of a Gaussian distribution curve that is generated using performance scores computed over a particular 30 minute time window. The curve 620 has a height 640 and a center 630, as shown. While the majority of the performance scores, which are represented by multiple plotted points, fall within the curve 620, one performance score 660 falls outside of the curve. In some embodiments, this performance score 660 is considered an outlier. When three consecutive outliers are detected, in some embodiments, a deviation event is generated, as will be further described below. It should be noted that both the curve 620 and the plotted performance scores in this example are meant to be exemplary and are not representative of actual generated performance scores.


Because the graph generator 542 generates graphs for various 30 minute time windows, the curves of the generated graphs vary based on the performance scores used to generate them. More specifically, the sample mean and standard deviation, which determine the center and width of a distribution curve, are dynamic parameters that change over time based on the performance scores computed and received from the score calculator 524.



FIG. 7 illustrates an example graph 700 of some embodiments that includes four different distribution curves 720, 722, 724, and 726. In this example, three of the curves 720-724 have the same center (i.e., sampled mean), while each curve otherwise varies in both height and width (i.e., standard deviation). As such, not only do the distributions change from window to window, but what is considered an outlier (i.e., performance scores that fall outside of any given curve) changes from window to window as well. While the examples illustrated by FIGS. 6 and 7 show distribution curves, other examples can include any type of graph, including but not limited to bar graphs, scatter plots, histograms, etc.


As the graph generator 542 generates the graphs for various time windows (e.g., 30 minute time windows), it provides the graphs to the graph analyzer 544 of the shorter timescale detector 540 for analysis. In other embodiments, the shorter timescale detector includes a storage (not shown) that the graph generator 542 adds generated graphs to, and from which the graph analyzer 544 retrieves graphs for processing. In addition to receiving graphs from the graph generator 542, the graph analyzer 544 receives the performance scores from the score calculator 524 for use in analyzing the graphs. In other embodiments, the graph generator 542 provides the performance scores to the graph analyzer 544 along with the generated graphs.


In some embodiments, the graph generator 542 computes values that can be used in the generation of graphs, as well as analyzed without generating a graph. For example, in some embodiments, the graph generator 542 computes the standard deviation and the sample mean, and provides these values, along with the performance scores used to compute these values, to the graph analyzer 544 for analysis (i.e., with or without a graph or graphs).


The graph analyzer 544 analyzes the graphs to identify outliers that are indicative of performance issues, according to some embodiments. The graph analyzer 544 identifies outliers by determining, for each performance score within a given 30 minute time window, whether the performance score is a deviation event δ(t) using $|s_{e,a}(t) - \tilde{\mu}_{e,a}| / \tilde{\sigma}_{e,a} > \gamma$, where $s_{e,a}(t)$ is the performance score, $\tilde{\mu}_{e,a}$ is the computed sampled mean for the time window, $\tilde{\sigma}_{e,a}$ is the standard deviation computed for the time window, and γ is a specified threshold value, in some embodiments. False positives are minimized, in some embodiments, by identifying consecutive outliers and classifying a set of outliers as a deviation event when that set of outliers includes at least, e.g., 3, consecutive outliers within any particular 30 minute time window.


Once the graph analyzer 544 has identified a deviation event, the graph analyzer 544 provides the deviation event to the control action system 515. The control action system 515 identifies one or more remedial actions to implement in order to mitigate or eliminate anomalous behavior that led to the deviation event. Additional details regarding the processes performed by the control action system 515, as well as potential remedial actions, will be further described below.



FIG. 8 illustrates a process 800 performed in some embodiments to identify anomalies at the shorter timescale. The process 800 is performed by an analytics system, such as the analytics system 510 described above. As such, the process 800 will be described below with references to FIG. 5. In some embodiments, the process 800 is performed iteratively such that each step is performed repeatedly as additional performance scores are received.


The process 800 starts when the analytics system computes (at 810) performance scores based on flow data received from FEs during a particular time window. In some embodiments, the analytics system computes performance scores for each FE on a per-FE basis, a per-application basis, and a per-path (i.e., per-route) basis. The particular time window, in some embodiments, is a 30 minute time window. In some embodiments, the performance scores are computed at the per-minute level such that there is a performance score computed for each minute of flow data in the 30 minute time window. For example, the score calculator 524 described above computes performance scores based on the flow data it retrieves from the storage 522 and/or that it receives directly from the aggregator 520.


The process 800 computes (at 820) a sample mean and standard deviation of the performance scores for the time window. The sample mean is used to determine the center of the distribution curve of the performance scores, while the standard deviation is used to determine the width of the distribution curve, in some embodiments. The graph generator 542, for example, uses the performance scores received from the score calculator 524 to compute a sampled mean and standard deviation for each 30 minute time window for which it has received performance scores (e.g., by computing the sampled mean and standard deviation for minutes 1-30, 2-31, 3-32, and so on).


In some embodiments, the graph generator generates a distribution graph using the computed standard deviation, sampled mean, and performance scores. As illustrated in the block diagram 500, for instance, the distribution graph generator 542 passes several graphs to the distribution graph analyzer 544. Each of these several graphs, in some embodiments, is a respective distribution curve graph generated for a respective edge-application pair such that a distribution graph is generated per-edge, per-application, per-path for multiple 30 minute windows. For example, FIG. 7 described above illustrates a graph 700 that shows multiple distribution curves of some embodiments. In some embodiments, each curve is associated with the same edge-application pair for 4 separate time windows, or with the same edge and different applications, or different edges and the same application, etc.


The process 800 uses (at 830) a timescale outlier machine-trained process to identify any performance scores that deviate from the threshold during the time window. As described above, outliers δ(t) are identified using $|s_{e,a}(t) - \tilde{\mu}_{e,a}| / \tilde{\sigma}_{e,a} > \gamma$, where $s_{e,a}(t)$ is the performance score, $\tilde{\mu}_{e,a}$ is the computed sampled mean for the time window, $\tilde{\sigma}_{e,a}$ is the standard deviation computed for the time window, and γ is a specified threshold value, in some embodiments. That is, in some embodiments, the dynamic parameters $\tilde{\mu}_{e,a}$ and $\tilde{\sigma}_{e,a}$ are used to determine whether a given performance score exceeds the specified threshold γ.


The outlier identification is performed, in some embodiments, by the graph analyzer 544 as part of a shorter timescale detection process. The distribution graph analyzer 544, in some embodiments, analyzes the generated distribution curve graphs and determines whether any performance scores fall outside of the generated distribution curves (i.e., whether any of the performance scores are outliers with respect to the generated distribution graph). At least one performance score 660 falls outside of the curve 620 in the graph 600 described above, for example.


The process 800 determines (at 840) whether at least η consecutive deviations have been detected. In some embodiments, to prevent or at least minimize false positives, at least η (e.g., η=3) number of consecutive outliers are detected before a deviation event is generated. In other embodiments, a single detected outlier triggers a deviation event to be generated. In still other embodiments, the anomaly detector determines whether at least η outliers are detected within a time window (i.e., regardless of whether the outliers are consecutive) before the outliers are considered to be a deviation event. The time window during which at least η outliers are detected, in some embodiments, is the same as the time window for which the performance scores have been computed (e.g., 30 minutes), while in other embodiments, the time window is a smaller time window (e.g., a subset of 15 minutes) within the time window for which the performance scores have been computed. In still other embodiments, the time window during which at least η outliers are detected can span multiple time windows (e.g., multiple 30 minute windows).


When fewer than η outliers have been detected, the process 800 returns to compute (at 810) performance scores based on flow data (e.g., metrics) received from FEs during a time window. When at least η consecutive deviations have been detected, the process 800 transitions to generate (at 850) a deviation event based on the η consecutive deviations. The detection of deviations (e.g., outliers) is done by the graph analyzer 544, in some embodiments, based on data received from the graph generator 542 and, in some embodiments, data (e.g., performance scores) received from the score calculator 524. Each deviation event, in some embodiments, specifies “sampleTime” (i.e., the time of the event), “latestScore”, “historicalMean”, “scoreTimeSeries” (i.e., the last 30 minutes of the 1-minute score series), “edgeId” (i.e., the edge FE identifier), “nextHopId”, and “rootCauseIndicators”.
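
For illustration only, the deviation-event fields listed above could be carried in a record such as the following; the field types are assumptions, since the description does not specify them.

```scala
// A minimal sketch of a deviation event; field names follow the description above, types are assumed.
final case class DeviationEvent(
  sampleTime: Long,               // time of the event
  latestScore: Double,
  historicalMean: Double,
  scoreTimeSeries: Seq[Double],   // last 30 minutes of the 1-minute score series
  edgeId: String,                 // edge FE identifier
  nextHopId: String,
  rootCauseIndicators: Seq[String]
)
```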


The process 800 uses (at 860) a reinforcement learning machine-trained process to identify a remedial action for remediating the deviation event. For example, once a deviation event has been generated by the graph analyzer 544, the deviation event is provided to the control action system 515, which uses the deviation event, and other data (e.g., configuration data associated with edge and/or transit nodes identified by the deviation event, performance scores, flow data, other metrics, etc.) to identify a remedial action to implement for remediating the deviation event.


After one or more remedial actions have been identified, the process 800 sends (at 870) an API call to the network controller to direct the network controller to implement the identified remedial action(s). The API call, in some embodiments, includes any configuration changes to be implemented as remedial actions, and is sent by an API poller of the control action system. Examples of API calls utilized, in some embodiments, are described above, such as the API calls illustrated by FIG. 4. Following 870, the process 800 ends.


In some embodiments, the process 800 is performed for only some of the critical edge routers, but not for all edge routers and not for transit routers (e.g., gateway routers or hub routers). In other embodiments, the process 800 is performed for every edge router, but not for the transit routers (e.g., gateway routers or hub routers). In still other embodiments, the process 800 is performed not only for some or all of the edge routers, but also for each transit router (e.g., each gateway router and each hub router). In yet other embodiments, the process 800 is performed just for transit routers (e.g., gateway routers and hub routers) but not for edge routers.


At the longer timescale, which is on the order of days, the goal of self-healing is to detect systemic issues in the application performance that require longer term network configuration changes, in some embodiments. Examples of such changes, in some embodiments, include assigning an edge to a different primary gateway for routing certain application flows. To achieve this, some embodiments use a global topology-based outlier analysis to detect systemic application issues.


In some embodiments, the global topology-based outlier analysis starts with a collection of application performance score data, $\{s_{e,a}(t)\}$, over a time window W. In this analysis, however, the time window is much larger than in the shorter timescale analysis (e.g., the last two weeks). In some embodiments, the long timescale detector 530 generates a custom topology graph for each application in a set of applications that have flows traversing through the SD-WAN (e.g., a virtual SD-WAN for each application).


The long timescale detector 530 then iteratively updates its generated topology graph as follows. Let F denote a graph representing the flow of traffic on the overlay SD-WAN topology for a specific application. Each node in the graph represents an SD-WAN node, and each edge represents the flow of traffic for an application. For example, in the case of application flows routed to their destination via an SD-WAN gateway or SD-WAN hub FE, this graph is a bi-partite graph. One set of vertices of the bi-partite graph represents the SD-WAN edges in the overlay network, and the other set of vertices represents the SD-WAN gateway FEs. A new graph edge is drawn between an edge, say e1, and a gateway, say g1, if flows for that application originating at e1 are now routed through g1 (i.e., as opposed to being routed through a different gateway g2).


In some embodiments, when the initial topology graphs are generated, weights are assigned to each edge (i.e., path between two nodes) of each graph. In some embodiments, the weights are default weights. As such, after a topology graph has been updated based on the performance scores, the assigned weights are updated by mapping the application performance score timeseries to a number, according to some embodiments. The goal of this mapping function, in some embodiments, is to compute a value that represents the application performance for the corresponding edge router and gateway router combination over the respective time window. In its simplest form, the mapping function of some embodiments is the average of the performance scores $s_{e,a}(t)$ over the time window W. Another more complex mapping function of some embodiments is the average of $s_{e,a}(t)$ after filtering out low network utilization points.
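
Both mapping functions can be sketched briefly as below; the sample record (score plus utilization) and the utilization cutoff are illustrative assumptions.

```scala
object EdgeWeightMappingSketch {
  // One per-minute sample for an (edge router, gateway router) combination.
  final case class Sample(score: Double, utilization: Double) // utilization in [0, 1]

  // Simplest mapping: the plain average of the scores over the time window W.
  def plainAverage(samples: Seq[Sample]): Double =
    if (samples.isEmpty) 0.0 else samples.map(_.score).sum / samples.size

  // More complex mapping: average after filtering out low network utilization points.
  def filteredAverage(samples: Seq[Sample], minUtilization: Double = 0.1): Double =
    plainAverage(samples.filter(_.utilization >= minUtilization))
}
```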


In some embodiments, the topology-based outlier analysis is carried out on the above edge-weighted graph F to detect whether an application issue is isolated to an edge FE, a gateway FE, or to the overall application. In the first case, where an application issue is isolated to an edge FE, the edge weight connecting the edge FE to the respective gateway FE deviates significantly from the other edge weights in F, according to some embodiments. In the second case, in some embodiments, where the application issue is due to a gateway FE, the summed weights of the edges connecting to the respective gateway deviate significantly as compared to other gateway FEs. Finally, in the third case of an overall application issue, the edge weights across the graph deviate significantly from the edge weights in other application graphs, in some embodiments.
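
A minimal sketch of these three checks is given below, with the edge weights of the bipartite graph F held in a map keyed by (edge FE, gateway FE); the z-score style comparison and its threshold are illustrative stand-ins for “deviates significantly”, not the analysis actually used.

```scala
object TopologyOutlierSketch {
  type Weights = Map[(String, String), Double] // (edge FE, gateway FE) -> edge weight

  private def deviates(value: Double, population: Seq[Double], threshold: Double = 3.0): Boolean = {
    val mu    = population.sum / population.size
    val sigma = math.sqrt(population.map(w => (w - mu) * (w - mu)).sum / population.size)
    sigma > 0 && math.abs(value - mu) / sigma > threshold
  }

  // Case 1: the issue is isolated to an edge FE whose weight deviates from the other weights in F.
  def edgeIsolatedIssues(weights: Weights): Seq[(String, String)] =
    weights.keys.toSeq.filter(k => deviates(weights(k), weights.values.toSeq))

  // Case 2: the issue is due to a gateway FE whose summed edge weights deviate from the other gateways'.
  def gatewayIssues(weights: Weights): Seq[String] = {
    val sums = weights.groupBy(_._1._2).map { case (gw, ws) => gw -> ws.values.sum }
    sums.keys.toSeq.filter(gw => deviates(sums(gw), sums.values.toSeq))
  }

  // Case 3: an overall application issue, detected when this application's weights deviate
  // from the weights in the topology graphs of other applications.
  def overallIssue(thisApp: Weights, otherApps: Seq[Weights]): Boolean = {
    val thisAvg = thisApp.values.sum / thisApp.size
    deviates(thisAvg, otherApps.map(w => w.values.sum / w.size))
  }
}
```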



FIG. 9 conceptually illustrates another block diagram of the ENI platform of FIG. 5 with an in-depth view of the longer timescale anomaly detector of some embodiments. As shown, the longer timescale detector 530 includes a score storage 932, a topology graph updater 934, and a topology graph analyzer 936. As the score calculator 524 computes performance scores from the flow data (e.g., metrics) aggregated by the aggregator 520, the score calculator 524 provides the performance scores to both the shorter timescale detector 540 and the longer timescale detector 530.


As mentioned above, the longer timescale detector 530 of some embodiments generates a custom topological graph for each particular application in a set of applications that have flows traversing through the SD-WAN. In some embodiments, the topology graph updater 934 generates the custom topology graphs for each application, while in other embodiments, the longer timescale detector 530 includes a separate topology graph generator process.


To generate a topology graph for each application, the longer timescale detector 530 in some embodiments defines one graph node for each edge or transit router (e.g., each edge, gateway, or hub router) that is used to forward one or more flows of the application through the SD-WAN. For each pair of nodes that represent a pair of routers through which one or more flows of the application traverse, the detector 530 also defines a graph edge in the graph to represent the tunnel between the pair of routers through which the application's flows traverse.


In each custom topology graph, a path between first and second edge nodes traverses zero or more transit nodes in the graph and one or more graph edges between these edge and/or transit nodes. This path in the graph is equivalent to a routing path between the first and second edge routers represented by the first and second edge nodes in the graph. When the graph path traverses through one or more transit nodes, the routing path similarly traverses through one or more transit routers (e.g., hub or gateway routers). The inter-node graph edges in a graph path in some embodiments are equivalent to the tunnels between the routers in the routing paths, as mentioned above.


The long timescale detector 530 iteratively updates its generated topology graph(s) as it receives new performance scores from the score calculator 524. Specifically, as performance scores are received from the score calculator 524, the received performance scores are added to the score storage 932. When a threshold amount (e.g., n number of days- or weeks-worth) of performance scores are received for a particular key (e.g., edge router, application, route tuple), the topology graph updater 934 retrieves the collection of performance scores from the score storage 932 and uses the performance scores to update a topology graph corresponding to the particular key. Each topology graph includes nodes representing SD-WAN nodes (e.g., edge nodes and transit nodes), and edges representing traffic flows between the nodes for an application. In addition to updating the topology graphs, the topology graph updater 934 updates weights assigned to the edges of the topology graph based on performance scores for the nodes connected by the edges.



FIG. 10 illustrates simplified examples of two topology graphs of some embodiments for first and second applications. As shown, each topology graph 1010 and 1020 includes multiple nodes, labeled with an “e” to indicate edge node, or a “t” to indicate transit node (e.g., hub node or gateway node). Additionally, each edge between two nodes includes an assigned weight. For example, the edge between edge node “e1” and transit node “t3” is assigned a weight of 3 in the application 1 topology graph 1010, while this same edge is assigned a weight of 4 in the application 2 topology graph 1020.
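
Such per-application weighted topology graphs could be represented as simply as the sketch below; only the e1-t3 weights (3 and 4) come from the figure, and the remaining node pairs and weights are placeholders.

```scala
object TopologyGraphSketch {
  // Adjacency-map representation: (node, node) -> weight assigned to the graph edge between them.
  final case class TopologyGraph(application: String, edgeWeights: Map[(String, String), Double])

  val app1Graph = TopologyGraph("application-1", Map(("e1", "t3") -> 3.0, ("e2", "t3") -> 2.0))
  val app2Graph = TopologyGraph("application-2", Map(("e1", "t3") -> 4.0, ("e2", "t3") -> 2.0))
}
```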


As the topology graph updater 934 updates the weighted topology graphs, it provides these graphs to the topology graph analyzer 936 as shown. The topology graph analyzer 936 analyzes the weighted topology graphs to identify application issues for the time window represented by the graph (e.g., a two week window), and whether these application issues are isolated to an edge FE, whether they are due to a gateway FE, or whether they are overall application issues, according to some embodiments. Once an application issue and its cause (i.e., isolated to an edge FE, due to a gateway FE, or affecting the overall application) have been identified, the topology graph analyzer 936 generates an event identifying the issue and provides the event to the control action system 515 for remediation.



FIG. 11 conceptually illustrates a process 1100 performed in some embodiments to identify anomalies at the longer timescale. The process 1100 is performed by an analytics system, such as the analytics system 510 described above. The process 1100 will be described below with references to FIGS. 9 and 10. Like the process 800, the process 1100 is performed iteratively, in some embodiments, such that each step is performed repeatedly as additional performance scores (e.g., application score data) are received.


The process 1100 starts when the process receives (at 1110) a collection of application score data over a particular time window (e.g., a two-week time window). For instance, the topology graph updater 934 of some embodiments retrieves application score data from the score storage 932 once enough scores from the particular time window have been added to the score storage 932. As the performance scores are computed on a per-edge router, per-application, and per-route basis, the collection of application score data, in some embodiments, includes performance scores aggregated by application, such that each performance score is associated with the same application, but with different edge routers and routes.


Based on the collection of application score data, the process 1100 updates (at 1120) a topology graph that includes nodes for each FE and edges between the nodes to represent application traffic flows. That is, in some embodiments, an initial topology graph is generated based on edge routers, gateway routers, hub routers, and the paths between them, and this topology graph is then updated using the collection of application score data. For example, in some embodiments, changes to the paths and/or forwarding element configuration, such as added or removed paths and/or forwarding elements, are reflected in the performance score data and used to update the initial (or otherwise previous version of) the topology graph.


The process 1100 then uses (at 1130) the collection of application score data to assign or adjust weights to the graph edges. Each of the edges in the graphs 1010 and 1020, for instance, has an assigned weight, as also described above. Each weight value, in some embodiments, can be mapped back to a corresponding performance score, or an average of a set of performance scores over the time window.


In some embodiments, the weight values are generated for a range of scores over the longer time duration (e.g., a two-week duration of time), such as by using a weight calculator to produce a weight score from each performance score that has been collected for the longer time duration. The computed averages of the performance scores, in some embodiments, are blended averages. For instance, in some embodiments, a blended average is computed from all performance scores in the time duration (e.g., a two-week time window), while treating older scores (e.g., from week 1 of the two-week time window) with less significance than newer scores (e.g., by using another set of weight values to give more weight to the newer scores compared to older scores). Also, in some embodiments, the generated weight values are per-path (i.e., end-to-end), not per-edge (i.e., an edge between two nodes).
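
As a small illustration of such a blended average, the sketch below applies a linear fade so that older scores in the window carry less significance than newer ones; the linear weighting itself is an assumption, since the description only requires that newer scores be weighted more heavily.

```scala
object BlendedAverageSketch {
  // `scores` is ordered oldest-first over the long time window (e.g., two weeks of per-minute scores).
  def blendedAverage(scores: Seq[Double]): Double = {
    // Weight the i-th oldest score by (i + 1), so the newest scores count the most.
    val weighted   = scores.zipWithIndex.map { case (s, i) => (s * (i + 1), (i + 1).toDouble) }
    val (num, den) = weighted.foldLeft((0.0, 0.0)) { case ((n, d), (ws, w)) => (n + ws, d + w) }
    if (den == 0.0) 0.0 else num / den
  }
}
```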


The process 1100 uses (at 1140) a topology-based outlier analysis to analyze the weighted topology graph in order to detect an application issue. In some embodiments, the topology-based outlier analysis is performed by comparing different topology graphs generated for different applications to determine whether any of the weights in a particular topology graph deviate significantly from other topology graphs, and/or whether the summed weights of edges connected to a particular transit node in a first topology graph deviate significantly compared to the summed weights of edges connected to the particular transit node in a second topology graph. The comparisons, in some embodiments, are between topology graphs generated for different applications during the same time period (i.e., the same two-week window), or between topology graphs generated for the same application during different time periods (i.e., different two-week windows).


The process 1100 determines (at 1150) whether the application issue is isolated to an edge FE. In some embodiments, the application issue is determined to be isolated to an edge FE when the edge weight connecting the edge FE to a respective gateway FE deviates significantly from other edge weights in F. When the process 1100 determines that the application issue is isolated to an edge FE, the process 1100 transitions to 1180.


When the process 1100 determines that the application issue is not isolated to an edge FE, the process 1100 transitions to determine (at 1160) whether the application issue is due to a transit FE (e.g., a hub FE or gateway FE). In some embodiments, when the application issue is due to a transit FE, the summed weights of the edges connecting to the respective transit FE deviate significantly as compared to other transit FEs during the same time period (i.e., same two-week window) and/or different time periods (i.e., different two-week windows). In other embodiments, when the application issue is due to a transit FE, the summed weights of the edges connecting to the transit FE for the particular time period deviate significantly as compared to the summed weights of the edges connecting to the transit FE for other time periods (i.e., previous two-week windows). When the process 1100 determines that the application issue is due to a transit FE, the process 1100 transitions to 1180.


When the process 1100 determines that the application issue is not due to a transit FE, the process 1100 transitions to determine (at 1170) that the application issue is an overall application issue. In some embodiments, when the application issue is an overall application issue, the edge weights across the topology graph for the particular time period (i.e., a particular two-week window) deviate significantly from edge weights in other application graphs (i.e., for other applications) for the particular time period (i.e., the particular two-week window). In other embodiments, when the application issue is an overall application issue, the edge weights across the topology graph for the particular time period (i.e., the particular two-week window) deviate significantly from edge weights in other application graphs for the same application during other particular time periods (i.e., previous two-week windows).


After the application issue, and the cause of the application issue, has been determined, the process 1100 uses (at 1180) a reinforcement learning machine-trained process to identify a remedial action for remediating the application issue. For instance, after the topology graph analyzer 936 has identified an application issue, the topology graph analyzer 936 generates an event identifying the issue and provides this event to the control action system 515 for remediation.


In some embodiments, when the application issue is isolated to an edge router, the application issue is due to a particular link used by the edge router to forward traffic for the application. In some such embodiments, a remedial action is to direct the edge router to forward traffic for the application on a different link. For example, an edge router of some embodiments is directed not to use one of three tunnels available to the edge router for outbound traffic for the application. In some other embodiments, a branch site can have more than one edge router (e.g., one edge per physical link), and when an issue is isolated to an edge router at such a site with multiple edge routers, a remedial action is to redirect traffic to a different edge device at the site.


After identifying a remedial action to implement in the SD-WAN to remediate the application issue, the process 1100 then sends (at 1190) an API call to the network controller to direct the network controller to implement the identified remedial action. Examples of such API calls are described above and illustrated by FIG. 4. Following 1190, the process 1100 ends.


In some embodiments, once the system detects that certain application flows are having an issue, the control action system takes remediation actions to resolve the issue. As described above, some embodiments focus on path-level parameters as the control knob, and, more specifically, dynamically selecting the overlay transit node to re-route affected application flows on alternate paths.


In some embodiments, depending on the network setup, there are two scenarios of route adaptation. In the first scenario, in some embodiments, application traffic at an edge cannot be sampled and partially routed on alternate paths. In the second scenario, application traffic at an edge can be sampled, in some embodiments, and a fraction of traffic can be routed on an alternate path. For example, with the assumption that there are 1000 flows for a first application at edge e1 routed through gateway g1, in the case when flow sampling is allowed, a fraction of these flows (e.g., 10%) can be routed on an alternate path through gateway g* without affecting the rest of the traffic flow, according to some embodiments. The difference between sampling and non-sampling of flows, in some embodiments, manifests in terms of how alternate paths are determined for the control action.


Let $s_g(t) = \{s_{e,a}(t)\}_g$ denote the application performance scores for edges whose application flows are routed through gateway node g. For the edge e (with flows currently routed through gateway node g) detected as having an application performance anomaly (e.g., a sudden drop in application score due to excessive packet drops on the current route), let $\Phi_e$ denote the set of all possible gateways available for that edge. The objective of the route-adaptation control action is to determine the best alternate gateway $g \in \Phi_e$ to re-route flows for edge e and application a to alleviate the application issue, in some embodiments.


For the scenario where traffic flows cannot be sampled and routed on different alternate routes, some embodiments utilize a ranking-based approach to determine the best alternate gateway. In some embodiments, this is achieved by maintaining a real-time aggregate score of each gateway FE based on the flows that are routed through it and selecting the gateway with the best instantaneous score. Let $f(s_g(t))$ denote the real-time gateway score function; then the control action is to choose $g^*$ such that,

$g^* = \arg\max_{g \in \Phi_e} f(s_g(t))$
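
A minimal sketch of this ranking-based selection is shown below; the map of real-time gateway scores stands in for $f(s_g(t))$, and the tie-breaking behavior is an assumption.

```scala
object RankingSelectionSketch {
  // Picks g* as the candidate gateway in Phi_e with the best instantaneous aggregate score.
  def bestAlternateGateway(candidateGateways: Set[String],
                           realtimeGatewayScore: Map[String, Double]): Option[String] =
    candidateGateways.toSeq
      .filter(realtimeGatewayScore.contains)
      .sortBy(g => -realtimeGatewayScore(g)) // highest score first
      .headOption
}
```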


For the second case when traffic flows can be sampled and routed on alternate paths before making a control action decision, in some embodiments, the actual performance on alternate paths can be measured. This lends itself naturally to a reinforcement learning-based approach, in some embodiments, and a greedy algorithm to find the best alternate path as described below.


In some embodiments, the algorithm proceeds by sampling flows and re-routing them through an alternate gateway $g \in \Phi_e$ in a round-robin fashion. The algorithm then picks the gateway node that has the best application performance score among the sampled flows, in some embodiments. Specifically, let ε denote the fraction of flows that are sampled and routed through the alternate gateway g, and let $s_g$ be the corresponding application performance score for the sampled flows. Then, $g^*$ is chosen such that,

$g^* = \arg\max_{g \in \Phi_e} s_g$
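
A minimal sketch of this greedy, sampling-based selection follows; the measurement callback stands in for the feedback loop that routes a fraction ε of flows through each candidate and observes $s_g$, and the round-robin order is simply the order of the candidate list.

```scala
object GreedySamplingSketch {
  // Samples flows through each candidate gateway in turn, observes the score of the sampled
  // flows, and returns the gateway whose sampled flows scored best (g*).
  def bestGatewayBySampling(candidateGateways: Seq[String],
                            measureSampledFlowScore: String => Double): Option[String] = {
    if (candidateGateways.isEmpty) None
    else {
      val observed = candidateGateways.map(g => g -> measureSampledFlowScore(g))
      Some(observed.maxBy(_._2)._1)
    }
  }
}
```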


As compared to a ranking-based approach, the greedy algorithm uses the actual performance measurement values from the sampled flows at an edge, in some embodiments. This allows the system of some embodiments to make decisions on direct path measurements from the edge to the alternate gateway node.



FIG. 12 conceptually illustrates a block diagram that provides a more in-depth view of the control action system of some embodiments. As shown, the control action system 1200 includes a flow data storage 1220, incidents storage 1225, existing configurations storage 1230, actions storage 1235, remedial action identifier 1240, remedial action selector 1250, and an API poller 1215. The flow data storage 1220 and incidents storage 1225 are populated by the analytics system 1205, as shown.


In some embodiments, the flow data storage 1220 stores flow data (e.g., metrics) and performance scores (e.g., performance scores computed from the flow data), while in other embodiments, the control action system 1200 includes a separate storage for performance scores. The incidents storage 1225 stores incident events generated by the analytics system 1205 based on short timescale and long timescale outliers/deviations detected by the analytics system 1205. The existing configurations storage 1230 stores existing configuration data for FEs retrieved by the API poller 1215 from the network controller 1210 based on what FEs are associated with the incidents in the incidents storage 1225.


The remedial action identifier 1240 uses flow data and/or performance scores from the flow data storage 1220, incident events from the incidents storage 1225, and existing configuration data from the existing configurations storage 1230 in order to identify sets of potential remedial actions for each incident in the incident events storage 1225, in some embodiments. For example, for an incident event identifying an application issue that is due to a particular transit FE, the remedial action identifier 1240 of some embodiments identifies a set of alternate routes that do not traverse the particular transit FE for each affected edge FE-application pair. As the remedial action identifier 1240 identifies sets of potential remedial actions, the remedial action identifier 1240 adds these sets of potential remedial actions to the actions storage 1235.


The API poller 1215, of some embodiments, retrieves the sets of potential remedial actions and directs the network controller 1210 to implement the potential remedial actions for certain sampled flows for temporary time periods. As the sampled flows traverse the FEs, the FEs provide flow data to a data ingestion system as described above. The data is then processed, in some embodiments, by the analytics system 1205, and the performance scores are added to the flow data storage 1220, according to some embodiments. In other embodiments, these performance scores are stored to a separate test-performance-score storage (not shown).


The remedial action selector 1250 of some embodiments retrieves the performance scores associated with the potential remedial actions from the storage 1220 and the potential remedial actions from the actions storage 1235 in order to select the potential remedial action having the best associated performance score. The remedial action selector 1250 uses a reinforcement learning machine-trained process, in some embodiments, to select a remedial action for implementation for all affected flows. As described above, for embodiments where the remedial actions include routing flows through alternate transit FEs (e.g., hub FEs and gateway FEs), ε denotes the fraction of flows that are sampled and routed through the alternate gateway g (or other transit FE), $s_g$ is the corresponding application performance score for the sampled flows, and $g^*$ is chosen such that $g^* = \arg\max_{g \in \Phi_e} s_g$.


Once the remedial action selector 1250 has selected a remedial action, the remedial action selector 1250 provides the selected remedial action to the API poller 1215. The API poller then sends an API call that includes the remedial action (e.g., an updated configuration for an edge FE) to the network controller 1210 to autonomously direct the network controller 1210 to implement the remedial action.



FIG. 13 conceptually illustrates a reinforcement learning process 1300 performed by the control action system of some embodiments using the greedy algorithm described above. The process 1300 starts when the control action system receives a detected anomaly from the anomaly detection process of the analytics system. For example, the analytics system 1205 adds a new incident event to the incidents storage 1225 of the control action system 1200.


The process 1300 determines (at 1310) that the anomaly detected in the SD-WAN requires remediation. In some embodiments, this determination is performed by the anomaly detector before generating and sending a deviation event to the control action system. For example, in some embodiments, η (e.g., η=3) consecutive outlying performance scores are detected before a determination is made that a particular edge FE associated with the performance scores (e.g., for the shorter timescale analysis) is exhibiting anomalous behavior that requires remediation, while fewer than η consecutive outlying performance scores would not result in such a determination.


In other embodiments, η outlying performance scores within a particular time window (e.g., at least 3 within a 30 minute time window), regardless of whether these outliers are consecutive, results in a determination that a particular edge FE associated with the performance scores (e.g., for the shorter timescale analysis) is exhibiting anomalous behavior that requires remediation. As such, in some embodiments, determining that a detected anomaly requires remediation is performed along with receiving an incident event identifying anomalous behavior that requires remediation from the analytics system 1205. Also, in some embodiments, determining that an anomaly requires remediation includes a determination that remediating the anomaly will improve performance for one or more flows that traverse the SD-WAN.
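To make these trigger conditions concrete, the following Python sketch (a minimal sketch, assuming the example values η=3 and a 30-minute window from the text; the function and variable names are hypothetical) checks both variants, consecutive outliers and outliers within a time window:

from datetime import timedelta

def has_consecutive_outliers(is_outlier_flags, eta=3):
    # True if at least eta consecutive per-minute scores were flagged as outliers.
    run = 0
    for flagged in is_outlier_flags:
        run = run + 1 if flagged else 0
        if run >= eta:
            return True
    return False

def has_windowed_outliers(outlier_timestamps, eta=3, window=timedelta(minutes=30)):
    # True if at least eta outliers (not necessarily consecutive) fall within the window.
    ts = sorted(outlier_timestamps)
    for i in range(len(ts) - eta + 1):
        if ts[i + eta - 1] - ts[i] <= window:
            return True
    return False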


The process 1300 identifies (at 1320) a set of two or more remedial actions for remediating the detected anomaly. For example, in some embodiments, the detected anomaly is a spike in latency associated with a particular hub FE that is used as a next-hop by one or more edge FEs when forwarding application traffic to one or more applications deployed to, e.g., a cloud datacenter, and the identified remedial actions include a set of alternate routes to the application(s). These alternate routes, in some embodiments, include routes through other FEs (e.g., other hub FEs and/or gateway FEs), direct routes, etc. As described above, this is performed by the remedial action identifier 1240 of the control action system 1200, in some embodiments, using data from each of the storages 1220, 1225, and 1230.


The process 1300 selects (at 1330) a remedial action from the identified set of remedial actions to implement for a sample of flows during a specified time period. As described above, in some embodiments, the greedy algorithm used in the reinforcement learning samples flows and re-routes these flows through alternate FEs (i.e., re-routes through available alternate routes) in a round-robin fashion. As such, in some embodiments, the API poller 1215 sends each potential remedial action to the network controller 1210 individually to selectively implement the potential remedial action, while in other embodiments, the API poller 1215 sends a single API call that includes each of the potential remedial actions for implementation. The API calls, in some embodiments, also specify a time duration for which each potential remedial action should be implemented, as well as a set of one or more flows for which each potential remedial action should be implemented.
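The round-robin sampling described above can be pictured with the short Python sketch below (a minimal sketch under assumed names such as candidate_routes and sample_fraction; it is not the patented algorithm), which sends a small fraction of flows over each candidate alternate route while the remaining flows stay on the current route:

def assign_sampled_flows(flows, candidate_routes, current_route, sample_fraction=0.1):
    # Assign a small sample of flows to the candidate routes in round-robin order;
    # all remaining flows keep using the current (possibly anomalous) route.
    n_sampled = max(len(candidate_routes), int(len(flows) * sample_fraction))
    assignment = {}
    for i, flow in enumerate(flows):
        if i < n_sampled:
            assignment[flow] = candidate_routes[i % len(candidate_routes)]
        else:
            assignment[flow] = current_route
    return assignment

# Example: 20 flows, two alternate routes, the rest remain on "via-hub-1".
routes = ["via-hub-2", "via-hub-3"]
print(assign_sampled_flows(["flow-%d" % i for i in range(20)], routes, "via-hub-1"))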


The process 1300 monitors (at 1340) performance of the SD-WAN for the sample of flows during the specified time period for which the selected remedial action is implemented. The monitoring, in some embodiments, includes collecting performance measurement values associated with the sampled flows from one or more edge FEs for which the remedial action (e.g., an alternate route from the edge FE to the application(s)) is applicable.


The process 1300 then generates (at 1350) a performance score for the selected remedial action based on the monitored performance. The collected performance measurement values, in some embodiments, are stored in the flow data storage 1220, and retrieved by the remedial action selector 1250 to generate the performance scores. In other embodiments, the performance scores are computed by the analytics system 1205 based on the performance measurement values and added to the flow data storage 1220 for retrieval by the remedial action selector. When the remedial action is an alternate route, the generated performance score, s_g, is representative of application performance when using said alternate route.


The process 1300 determines (at 1360) whether there are additional remedial actions to implement. In some embodiments, each of the remedial actions is implemented simultaneously such that steps 1330-1350 are performed in parallel for each identified remedial action using respective sampled flows that are assigned in a round-robin fashion. In other embodiments, each remedial action is implemented and monitored individually, and once the specified time period for implementation and monitoring has timed out, a next remedial action is selected for implementation and monitoring. As such, when additional remedial actions have yet to be temporarily implemented, the process 1300 returns to select (at 1330) a remedial action from the identified set.


When the process 1300 determines (at 1360) that there are no additional remedial actions to temporarily implement (i.e., all available remedial actions have been implemented, monitored, and scored), the process 1300 identifies (at 1370) the remedial action having the best generated performance score. As described above, the remedial action (e.g., alternate gateway) having the best application performance score is selected by the remedial action selector 1250 such that g* = arg max_{g ∈ Φ_ε} s_g.


The process 1300 then implements (at 1380) the identified remedial action for all applicable flows. In some embodiments, the applicable flows are associated with a single application and a single edge FE, while in some other embodiments, the applicable flows are associated with two or more applications and two or more edge FEs. The API poller 1215 sends the remedial action in an API call to the network controller 1210 for autonomous implementation, in some embodiments. Following 1380, the process 1300 ends.



FIG. 14 conceptually illustrates an example diagram of a self-healing SD-WAN 1400 in which alternate routes are monitored for sample flows between an edge router and an application. As shown, the SD-WAN 1400 includes multiple SD-WAN edge FEs 1410, 1412, 1414, and 1416 that each connect one or more client devices 1420 to the SD-WAN, and multiple hub FEs 1430, 1432, and 1434 that connect the SD-WAN edge FEs 1410-1416 to multiple clouds 1440, 1442, and 1444. Each of the clouds 1440-1444 hosts all three applications (App1, App2, App3). Additionally, the FEs in this example are in a full mesh such that each SD-WAN edge FE 1410-1416 connects to each hub FE 1430-1434.


The edge FE 1410 in this example is initially configured to use hub 1 1430 as a next-hop to reach application 1 in the cloud 1440 via the route 1450, which is dashed to indicate an anomaly with this route. The application 1 is also hosted by cloud 1442, which is reachable via the hub 2 1432, and by cloud 1444, which is reachable via the hub 3 1434. Accordingly, in some embodiments, to determine a best alternate path (i.e., as a remedial action based on the anomaly associated with the first path 1450), sampled flows are routed via alternate routes 1455a and 1455b between the edge router 1410 and application 1 in each of the clouds 1442 and 1444.


For example, assume application 1 is a web application (e.g., Microsoft 365). In order to determine which of the paths 1455a or 1455b is a better alternate route for the web application traffic, some embodiments send a first subset of these web application traffic flows (i.e., sampled flows) to the instance of the web application in the cloud 1442 via the hub FE 1432 on path 1455a, and a second subset of these web application traffic flows to the instance of the web application in the cloud 1444 via the hub FE 1434 on path 1455b, while the remaining flows for this web application will continue to be sent to the instance of the web application in the cloud 1440 via the hub 1430 on path 1450.


As these alternate routes are implemented, the edge router 1410 collects performance measurement values and provides these values to the control action system (not shown) for use in generating performance scores for each route 1455a and 1455b representing application performance for application 1 (e.g., a web application such as Microsoft 365) by each route. In some embodiments, each edge router 1410-1416 runs a process for collecting flow data as they process and forward application traffic flows to and from client devices 1420.


Once either of the routes 1455a or 1455b has been selected based on the performance scores generated for these alternate routes, all subsequent application traffic flows for the web application are sent via the selected alternate route. For example, when the path 1455a has a better performance score than path 1455b, all traffic flows for the web application are sent to the web application instance in the cloud 1442 via the hub FE 1432 on path 1455a, while paths 1450 and 1455b will not be used for application traffic for this web application.


In some embodiments, the flow data collected by the edge routers includes flow data associated with applications executing on devices that operate at different branch sites. For example, FIG. 15 conceptually illustrates another example diagram of a self-healing SD-WAN 1500 of some embodiments in which alternate routes are identified between edge devices located at different branch sites for sending VOIP traffic between client devices at the different branch sites. As shown, the SD-WAN 1500 includes multiple forwarding elements such as the SD-WAN edge routers 1510 and 1515 located at branch sites 1570 and 1575, and SD-WAN gateway routers 1530 and 1535 deployed to respective clouds 1550 and 1555.


The SD-WAN edge router 1510 is located at a branch site 1570 which also includes a client device 1520, while the edge router 1515 is located at a branch site 1575 which also includes a client device 1525. Each of the client devices 1520 and 1525 executes a respective VOIP (voice over IP) application instance 1540 and 1545. The SD-WAN edge routers 1510 and 1515 forward VOIP traffic flows between the client devices 1520 and 1525.


In this example, the SD-WAN edge routers 1510 and 1515 use the path 1560, which traverses the SD-WAN gateway 1535, for forwarding VOIP traffic flows. In addition to the path 1560 between the edge routers 1510 and 1515, traffic flows can also be sent using either of the alternate paths 1562 and 1564. The alternate path 1562 is a direct route between the edge routers 1510 and 1515, while the alternate path 1564 traverses the SD-WAN gateway router 1530, as shown.


Each path, in some embodiments, is defined by tunnels established between the different forwarding elements that implement the SD-WAN (e.g., edge routers, gateway routers, and hub routers). For example, the path 1560 is defined by tunnels established between the SD-WAN edge router 1510 and SD-WAN gateway router 1535, and between the SD-WAN gateway router 1535 and the SD-WAN edge router 1515. The path 1562 is defined by a direct tunnel established between the SD-WAN edge routers 1510 and 1515. Lastly, the path 1564 is defined by tunnels established between the SD-WAN edge router 1510 and SD-WAN gateway router 1530, and between the SD-WAN gateway router 1530 and the SD-WAN edge router 1515.
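As a minimal data-structure sketch of paths defined by tunnels (the class and field names below are illustrative assumptions, with FE names mirroring the figure), each path can be represented as an ordered list of tunnels between forwarding elements:

from dataclasses import dataclass

@dataclass(frozen=True)
class Tunnel:
    src_fe: str  # forwarding element at one end of the tunnel
    dst_fe: str  # forwarding element at the other end

@dataclass
class Path:
    name: str
    tunnels: list  # ordered tunnels traversed by the path

path_1560 = Path("1560", [Tunnel("edge-1510", "gateway-1535"), Tunnel("gateway-1535", "edge-1515")])
path_1562 = Path("1562", [Tunnel("edge-1510", "edge-1515")])  # direct tunnel between the edges
path_1564 = Path("1564", [Tunnel("edge-1510", "gateway-1530"), Tunnel("gateway-1530", "edge-1515")])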


When an anomaly is detected with the path 1560 (e.g., due to anomalous behavior by the gateway router 1535), in some embodiments, the control action system described above identifies the alternate paths 1562 and 1564 and tests each path to determine which is the optimal path for forwarding the VOIP traffic flows. For instance, in some embodiments, the control action system directs a first subset of the VOIP traffic flows to the direct path 1562 and a second subset of the VOIP traffic flows to the path 1564 through the SD-WAN gateway router 1530, while all remaining VOIP flows continue to be sent on the path 1560. After a temporary period of time, the control action system then selects either the path 1562 or the path 1564 and directs the edge routers 1510 and 1515 (e.g., by sending an API call to a network controller that manages the edge routers) to forward all VOIP traffic flows on the selected path.



FIG. 16 conceptually illustrates a process performed in some embodiments to identify and remediate performance incidents in an SD-WAN. The process 1600 is performed, in some embodiments, by a set of one or more anomaly detection and anomaly remediation processes executing on one or more host machines to monitor, detect, and auto-remediate end-user application and security issues. In some embodiments, these one or more host machines operate as part of an ENI platform (e.g., the ENI platform 170). The process 1600 will be described below with references to the self-healing SD-WAN 100.


The process 1600 starts by receiving (at 1610) multiple sets of flow data associated with multiple packet flows that traverse multiple forwarding elements in the SD-WAN. The flow data, in some embodiments, includes five-tuple data for the flow, an application identifier, protocol, flow statistics (e.g., TX bytes, RX bytes, TCP latency, and TCP retransmissions), and overlay route information (e.g., overlay route type, next hop overlay node, and destination hop overlay node). As illustrated by the SD-WAN 100, the SD-WAN edge FEs 120-124, the SD-WAN gateway FE 165, and the SD-WAN hub FE 145 all provide network data to the ENI platform 170.


The process 1600 aggregates (at 1620) the received sets of flow data on a per-minute level to generate aggregated sets of metrics. In some embodiments, as described above, the ENI platform 170 includes one or more machines that execute multiple processes, such as those illustrated by FIGS. 2, 3, 5, 9, and 12, including an ENI backend for receiving the flow data from the edge FEs and a message broker for passing converted flow data from the ENI backend to the data aggregation pipeline. For each minute of flow data received, in some embodiments, the data aggregation pipeline (or data aggregation and application score computation pipeline) aggregates flow data for that minute to generate a set of per-minute metrics.
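A minimal sketch of such a per-minute aggregation step is shown below in Python (the record fields roughly follow the flow statistics listed above; the grouping key and field names are assumptions for illustration):

from collections import defaultdict

def aggregate_per_minute(flow_records):
    # Each record is assumed to carry a timestamp (seconds), edge id, application id,
    # and raw counters; records are grouped into per-minute, per-edge, per-application buckets.
    buckets = defaultdict(lambda: {"tx_bytes": 0, "rx_bytes": 0,
                                   "latency_sum_usec": 0, "latency_samples": 0})
    for rec in flow_records:
        minute = int(rec["timestamp"]) // 60          # truncate to the minute
        key = (minute, rec["edge"], rec["app"])
        bucket = buckets[key]
        bucket["tx_bytes"] += rec["tx_bytes"]
        bucket["rx_bytes"] += rec["rx_bytes"]
        bucket["latency_sum_usec"] += rec["tcp_latency_sum_usec"]
        bucket["latency_samples"] += rec["tcp_latency_samples"]
    return dict(buckets)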


The process 1600 uses (at 1630) a first set of one or more machine-trained processes to analyze the aggregated sets of metrics and identify any performance incidents. In some embodiments, the machine-trained processes include timeseries anomaly detection processes (e.g., the sliding window Gaussian outlier detection model), global topology-based outlier detection processes, and piecewise function processes. The ENI platform processes the aggregated per-minute metrics, in some embodiments, to generate QoE scores representing performance on a per-edge, per-application, and per-overlay route level for each minute of aggregated data.
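As one hedged illustration of a sliding-window Gaussian outlier check on per-minute scores (the window contents, the threshold k, and the one-sided test for drops are assumptions, not the exact model used), consider the Python sketch below:

import statistics

def is_outlier(window_scores, new_score, k=3.0):
    # Flag the new score as an outlier when it falls more than k standard
    # deviations below the mean of the recent sliding window of scores.
    if len(window_scores) < 2:
        return False
    mean = statistics.mean(window_scores)
    stdev = statistics.pstdev(window_scores)
    if stdev == 0:
        return new_score < mean  # any drop from a perfectly flat history is flagged
    return new_score < mean - k * stdev

recent = [92, 95, 90, 93, 94, 91]
print(is_outlier(recent, 35))  # True: a sudden large drop in the QoE score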


The process 1600 determines (at 1640) whether any performance incidents have been identified. In some embodiments, the QoE scores generated by the scoring pipeline of the ENI platform are not indicative of any issues in the network. As such, when no performance incidents have been identified, the process 1600 ends.


Otherwise, when at least one performance incident has been identified, the process 1600 transitions to use (at 1650) a second set of one or more machine-trained processes to identify at least one remedial action for remediating each identified performance incident. Examples of performance incidents, in some embodiments, include spikes in latency measurements and increased packet drops. The remedial actions, in some embodiments, depend on network nodes (e.g., edge nodes and/or transit nodes) associated with the identified performance incident.


For instance, a remedial action of some embodiments includes changing the order of hubs for one or more edges based on determinations that (1) latency for a particular application's traffic forwarded by the edges is too high and (2) each of the one or more edges uses the same hub as a next hop. Another example of a remedial action, in some embodiments, is to instantiate a new transit node (e.g., a new gateway router or a new hub router) on the network for forwarding traffic associated with one or more applications for which anomalies have been identified.


After the remedial actions have been identified, the process 1600 sends (at 1660) an API call specifying the identified remedial action(s) to an SD-WAN controller to direct the SD-WAN controller to implement the identified remedial action(s). That is, the ENI platform of some embodiments, in coordination with the SD-WAN controller, autonomously implements remedial actions without requiring any end-user input. In some embodiments, the remedial actions are provided to the SD-WAN controller as configuration updates to, e.g., a configured hub order for a particular edge FE.
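As a purely hypothetical sketch of such a configuration-update call (the controller endpoint, payload schema, and authentication scheme below are assumptions and not the actual SD-WAN controller API), a hub-order change might be pushed as follows in Python:

import requests

def push_hub_order_update(controller_url, edge_id, app_id, new_hub_order, api_token):
    # Hypothetical endpoint and payload: direct the controller to update the
    # configured hub order for one edge FE and one application.
    payload = {
        "edge": edge_id,
        "application": app_id,
        "hubOrder": new_hub_order,  # e.g., ["hub-2", "hub-3", "hub-1"]
    }
    resp = requests.post(
        controller_url + "/api/config/hub-order",  # assumed endpoint, for illustration only
        json=payload,
        headers={"Authorization": "Bearer " + api_token},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()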


For embodiments where the remedial action is the addition of a new transit node (e.g., a new gateway router or new hub router), the SD-WAN controller implements the remedial action by instantiating and configuring a new machine to serve as the new transit node, as well as by adjusting configurations of edge routers or other transit routers that are to use the new transit node as a next-hop for forwarding traffic for one or more applications. The new transit node, in some embodiments, is instantiated in the same cloud or datacenter as an existing transit node that was associated with the anomaly that the new transit node is meant to remediate. In other embodiments, the new transit node is instantiated in a different cloud or datacenter. Also, in some embodiments, more than one new transit node is instantiated and configured to obviate the anomaly. Following 1660, the process 1600 ends.


Multiple examples of network impairments (i.e., network issues) detected in some embodiments and corresponding solutions are described below. As a first example, a set of edge routers of some embodiments are initially configured to use a particular hub FE as a primary hub FE. Upon detection of a network impairment on the particular hub FE's WAN links causing increased latency of outgoing traffic, some embodiments implement a solution to change the hub order for each of the edge routers in the set.


As a second example, in a set of four edge routers, two edge routers are configured to use a first hub as their primary hub FE, while the other two edge routers in the set are configured to use a second hub FE as their primary hub FE. When a network impairment on WAN links of the first hub is detected and increases latency of outgoing traffic, the solution implemented in some embodiments is to change the hub order for the two edge routers configured to use the first hub FE as their primary hub FE, while the two edge routers configured to use the second hub FE as their primary hub FE can continue to use that second hub FE as primary.


In some embodiments, a third example uses the same initial configuration as the second example where two edge routers are configured to use a first hub as their primary hub FE, while the other two edge routers in the set are configured to use a second hub FE as their primary hub FE. Upon detection of a network impairment on WAN links of the first hub FE that is causing increased latency for outgoing traffic for clients connected to a first of the two edge routers configured to use the first hub FE as a primary hub FE, the solution implemented in some embodiments is to change the hub order only for the first edge router, while each other edge router continues to use their initially configured primary hub FE (i.e., the first hub FE for the second edge router in the set and the second hub FE for the other two edge routers configured to use the second hub FE as the primary hub FE).


As a fourth example of some embodiments, each of four edge routers is configured to use the same hub FE as a primary hub FE. Upon detection of a network impairment on that hub FE's WAN links, causing increased latency of outgoing traffic for a single application, some embodiments implement a solution that changes the hub order for each of the four edge routers, but only for the single application for which latency has increased. That is, in some embodiments, when traffic for only one application is affected, the hub order for the edge routers is only changed for that application, while the edge routers continue to use the original hub FE as primary for each other application.


In some embodiments, combinations of the solutions described in the above examples are implemented. As a fifth example, when latency increases due to a network impairment on WAN links of a first hub FE for which a subset of edge routers (e.g., two of four) are configured to use as a next hop for traffic for a first application, the solution, in some embodiments, is to change the hub order for only the subset of edge routers and only for traffic associated with the first application.


A sixth example, in some embodiments, involves an initial configuration where a first edge router sends traffic for a first application via a first hub FE, a second edge router sends traffic for the first application via a second hub FE, and third and fourth edge routers send traffic for the first application via a third hub FE. Upon detecting a network impairment on the first and second hub FEs for traffic for the first application, the implemented solution, in some embodiments, starts with the self-healing network detecting a drop in performance across both the first and second hub FEs. Incidents and recommendations are then generated, in some embodiments, including new hub orders for the first and second edge routers. In some embodiments, the recommended new hub orders should not have the first or second hub FEs as the first recommended hub FEs for the first and second edge routers.


A seventh and final example, of some embodiments, begins with an initial configuration where a first edge router sends traffic for first and second applications via a first hub FE, a second edge router sends traffic for the first and second applications via a second hub FE, and third and fourth edge routers send traffic for the first and second applications via a third hub FE. When a network impairment is detected that affects the first application but not the second application, in some embodiments, the solution is to perform a validation check to determine whether a different hub order is recommended for the affected application (i.e., the first application) while the second application continues using the same hub FE.


In some embodiments, the hub order is changed manually through a UI (user interface) provided by a controller for the network. For instance, in some embodiments, after a self-healing incident is generated, the hub order is manually changed (e.g., by a network administrator) through the network controller's UI before the automated remediation is applied. In that case, the automated remediation, in some embodiments, is not applied and instead shows up in a "failed" state.


In some embodiments, WAN optimizations (e.g., Dynamic Multi-Path Optimization (DMPO)) and the self-healing system described above are complementary, but they also differ in several ways. WAN optimizations, such as DMPO, in some embodiments, are performed from a local link-level perspective, while the self-healing system of some embodiments operates from a global topology perspective. WAN optimizations like DMPO involve network packet-level optimizations using overlay tunnel metrics, while the self-healing system involves flow/overlay route-level optimizations using end-to-end application performance metrics, according to some embodiments. Additionally, in some embodiments, WAN optimization such as DMPO solves underlay and last-mile link level issues, while the self-healing system solves issues outside optimized tunnels (e.g., VeloCloud Multipath (VCMP) tunnels) such as WAN issues upstream of a gateway/hub, datacenter network issues, and localized application issues.


In some embodiments, from a time-scale perspective, WAN optimizations like DMPO operate at the milliseconds timescale, making decisions at the local link level and responding to underlay and last-mile link issues caused by changes in network conditions, while the self-healing system operates at the minutes and days timescales, making decisions at the global topology level. At the minutes level, the self-healing system of some embodiments responds to sudden network issues outside the optimized tunnels (e.g., VCMP tunnels). At the days level, in some embodiments, the self-healing system responds to systemic inefficiencies in the network causing sustained application performance issues.


Currently, two main features of the self-healing system of some embodiments include incidents and recommendations. The goal of the incidents feature, in some embodiments, is to detect and remediate a sudden and significant application performance degradation. In some embodiments, an "incident" is created when the analytics system detects a sudden drop in application performance as compared to the recent past history of 30 minutes. The incidents feature works at the minutes timescale of the self-healing system, in some embodiments.



FIG. 17 illustrates the layout of an incident, in some embodiments. The incident summary 1700 includes the impact 1710, flow statistics 1720, flow overlay route 1730, other impact 1740, remediation 1750, and a line graph depicting performance drops. The impact 1710 denotes the number of edges that have been affected by the application performance drop. In this example, 10 out of 100, or 10%, of edges experienced a greater than 90% drop in Application 1 performance in the last few minutes. Specifically, the average performance drop was 95.7% with the most common root cause being high latency, as indicated. The incident summary 1700 also includes a line graph 1760 that provides a visualization of the performance drop.


The flow statistics 1720 identifies the specific flow metrics that show a significant change in value. In this example, the average TCP latency spiked to 2.7 s following a disruption. The flow overlay route 1730 denotes the most common next hop node on the SD-WAN overlay (or Direct) among the affected edges. As shown, 6 out of the 10 affected edges had the same hub, Hub-10, as the next hop overlay node. The other impact 1740 identifies if the application issue is specific to that application, or whether it affects other applications. In this example, no other applications were impacted at the same time, as illustrated.


Lastly, remediation 1750 identifies the remediation action that is automatically applied when automatic remediation is enabled, or manually applied (e.g., by an end-user through the user interface using a cursor or other selection means). In some embodiments, if automatic remediation is not enabled, the self-healing system only provides a suggested remediation action and a UI workflow to trigger the remediation action. The remediation action can be triggered directly through the incident alert, or through an ENI application available to the end-user. In this example, the remediation has been applied automatically (i.e., without any manual trigger), and as such, the remediation 1750 indicates that the remediation was executed by the self-healing system and provides a timestamp identifying when the remediation was automatically applied. The remediation 1750 also includes an option for an end-user to see details associated with the remediation, as shown.


The goal of the recommendations feature, in some embodiments, is to identify systemic issues in the network that are not transient but manifest repeatedly over time. A "recommendation" is created when the analytics system identifies an edge experiencing significantly worse application performance than other edges in the network over a longer time window (e.g., days). The recommendations feature works at the days timescale of the self-healing system, in some embodiments.


In some embodiments, the recommendations feature addresses a few main questions and correlates certain data. For example, the recommendations feature of some embodiments (1) addresses long-term application performance across edges, (2) identifies edges that are outliers and generally have worse performance than the average, (3) correlates application performance with important attributes (e.g., service provider, SD-WAN topology (e.g., direct versus using gateways/hubs), and next hop overlay node), and (4) identifies if the application issue is local to an edge, linked to the overlay/direct route, or is a general application issue. FIG. 18 illustrates an example layout 1800 of a recommendation, in some embodiments, that includes QoE (quality of experience) score comparisons for 6 gateways, as well as edge alternate overlay node QoE scores.


To calculate application QoE scores, the AI/ML platform of some embodiments calculates scores per minute, per edge, per application, and per route (i.e., next hops and destination hops). The metrics used to calculate application performance, in some embodiments, include tx_pkts, rx_pkts, tcpRxRexmit_pkts, tcpTxRexmit_pkts, tcpLatencySum_usec, and tcpLatencySamples. First, avgTcpLatency and percentTcpPacketDrops are calculated, where avgTcpLatency=tcpLatencySum_usec/tcpLatencySamples and percentTcpPacketDrops=(tcpTxRexmit_pkts+tcpRxRexmit_pkts)/(tx_pkts+rx_pkts). Then, when total packets are above 50 (i.e., tx_pkts+rx_pkts>50), scores are calculated. An intermediate score is calculated for avgTcpLatency using a piecewise function (e.g., with thresholds from a configuration file applied across all tenants). If avgTcpLatency<=40,000 us, the score is 100 (i.e., benign). If 40,000 us<avgTcpLatency<200,000 us, the score is 100−(avgTcpLatency−40,000 us)/(200,000−40,000)*100. If avgTcpLatency>=200,000 us, the score is 0 (i.e., bad). Next, an intermediate score for percentTcpPacketDrops is calculated using a piecewise function. If percentTcpPacketDrops<=0.05, the score is 100. If 0.05<percentTcpPacketDrops<0.15, the score is 100−(percentTcpPacketDrops−0.05)/(0.15−0.05)*100. If percentTcpPacketDrops>=0.15, the score is 0 (i.e., bad). Lastly, the minimum (i.e., the worse) of the two intermediate scores becomes the application QoE score, in some embodiments.
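The scoring just described can be summarized in the following Python sketch (a minimal sketch using the thresholds quoted above; the downward linear interpolation between the thresholds is written so that the segments meet the stated endpoint scores of 100 and 0):

def latency_score(avg_tcp_latency_usec):
    # Intermediate score for avgTcpLatency: 100 at or below 40,000 us,
    # 0 at or above 200,000 us, linearly interpolated in between.
    if avg_tcp_latency_usec <= 40000:
        return 100.0
    if avg_tcp_latency_usec >= 200000:
        return 0.0
    return 100.0 - (avg_tcp_latency_usec - 40000) / (200000 - 40000) * 100.0

def drop_score(percent_tcp_packet_drops):
    # Intermediate score for percentTcpPacketDrops: 100 at or below 0.05, 0 at or above 0.15.
    if percent_tcp_packet_drops <= 0.05:
        return 100.0
    if percent_tcp_packet_drops >= 0.15:
        return 0.0
    return 100.0 - (percent_tcp_packet_drops - 0.05) / (0.15 - 0.05) * 100.0

def application_qoe(tx_pkts, rx_pkts, tcp_lat_sum_usec, tcp_lat_samples, tcp_tx_rexmit, tcp_rx_rexmit):
    # Score only when there is enough traffic (more than 50 total packets).
    if tx_pkts + rx_pkts <= 50 or tcp_lat_samples == 0:
        return None
    avg_latency = tcp_lat_sum_usec / tcp_lat_samples
    drops = (tcp_tx_rexmit + tcp_rx_rexmit) / (tx_pkts + rx_pkts)
    # The worse (minimum) of the two intermediate scores becomes the application QoE score.
    return min(latency_score(avg_latency), drop_score(drops))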


In some embodiments, basic statistics computed include the number of edges per company and the number of TCP applications monitored by the self-healing system (i.e., those with application QoE scores). Basic application statistics computed, in some embodiments, include (1) for each edge, total bytes sent direct versus through an SD-WAN tunnel and a comparison of application scores for direct versus SD-WAN tunnel traffic, with table columns that include edge, total bytes direct, total bytes through SD-WAN tunnel, average appTcpQoe at peak times for direct, and average appTcpQoe at peak times for SD-WAN tunnel, (2) the top 10 applications by score and their traffic volumes, and (3) the bottom 10 applications by score and their traffic volumes.


Analysis of edges where the application performance deviates significantly from average/normal includes, in some embodiments, correlating with the overlay route topology and providing insights connecting edges, applications, and overlay routes. Overall edge/application/overlay statistics are provided in a table, in some embodiments, that lists all outlier edge-application combinations along with the overlay nextHop and performance statistics, with table columns that include application, edge, overlay next hop, traffic, application QoE score, and total poor application minutes. Analysis from the edge perspective, in some embodiments, is provided in a table listing each edge, the application and overlay next hop combination, application QoE score, and fraction (e.g., as a ratio) of poor performance time to total time. This analysis also includes the edge versus applications that have issues, and the top applications with poor performance per outlier edge.


Analysis from the perspective of the overlay route, in some embodiments, includes a comparison of direct versus SD-WAN tunnel for the outlier edges/applications. A table for the overlay route perspective of some embodiments includes columns for overlay next hop, edge-application pair, application QoE score, and ratio of poor performance time to total time. Lastly, analysis from an application perspective, in some embodiments, includes a table with columns including application, edge-overlay next hop pair, application QoE score, and ratio of poor performance time to total time.


In some embodiments, alternate path insights and recommendations are provided, such as those described above for FIGS. 17-18. For each outlier edge and application, in some embodiments, insights are provided regarding how alternate gateways perform if the application is a SaaS application routed through a gateway, and how alternate hubs perform if the application is a hosted application routed through a hub. For example, insights of some embodiments regarding performance of alternate gateways include the number of edges actively sending traffic to the alternate gateway, application QoE scores of those edges connecting to the alternate gateway, alternate gateway score(s), and other geographically close gateways and their scores. In some embodiments, insights regarding hosted applications routed through a hub include the number of edges actively sending traffic to the alternate hub, application QoE scores of those edges connecting to the alternate hub, and alternate hub score(s).


While the embodiments described above are described as performing analyses on a per-edge, per-application, per-path basis, other embodiments may perform the analyses differently, such as on a per-edge, per-application, per-physical link basis (e.g., 5G link vs MPLS link vs cable modem(s)), on a per-edge, per-path basis, or on a per-edge, per-physical link basis. Also, while the topology graphs described above are described as being per-application, other embodiments create one topology graph for several applications.


Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer-readable storage medium (also referred to as computer-readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer-readable media include, but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc. The computer-readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.


In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.



FIG. 19 conceptually illustrates a computer system 1900 with which some embodiments of the invention are implemented. The computer system 1900 can be used to implement any of the above-described hosts, controllers, gateways, and edge forwarding elements. As such, it can be used to execute any of the above-described processes. This computer system 1900 includes various types of non-transitory machine-readable media and interfaces for various other types of machine-readable media. Computer system 1900 includes a bus 1905, processing unit(s) 1910, a system memory 1925, a read-only memory 1930, a permanent storage device 1935, input devices 1940, and output devices 1945.


The bus 1905 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the computer system 1900. For instance, the bus 1905 communicatively connects the processing unit(s) 1910 with the read-only memory 1930, the system memory 1925, and the permanent storage device 1935.


From these various memory units, the processing unit(s) 1910 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) 1910 may be a single processor or a multi-core processor in different embodiments. The read-only-memory (ROM) 1930 stores static data and instructions that are needed by the processing unit(s) 1910 and other modules of the computer system 1900. The permanent storage device 1935, on the other hand, is a read-and-write memory device. This device 1935 is a non-volatile memory unit that stores instructions and data even when the computer system 1900 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1935.


Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 1935, the system memory 1925 is a read-and-write memory device. However, unlike storage device 1935, the system memory 1925 is a volatile read-and-write memory, such as random access memory. The system memory 1925 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 1925, the permanent storage device 1935, and/or the read-only memory 1930. From these various memory units, the processing unit(s) 1910 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.


The bus 1905 also connects to the input and output devices 1940 and 1945. The input devices 1940 enable the user to communicate information and select commands to the computer system 1900. The input devices 1940 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 1945 display images generated by the computer system 1900. The output devices 1945 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as touchscreens that function as both input and output devices 1940 and 1945.


Finally, as shown in FIG. 19, bus 1905 also couples computer system 1900 to a network 1965 through a network adapter (not shown). In this manner, the computer 1900 can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks (such as the Internet). Any or all components of computer system 1900 may be used in conjunction with the invention.


Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.


While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.


As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification, the terms “computer-readable medium,” “computer-readable media,” and “machine-readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral or transitory signals.


While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

Claims
  • 1. A method of remediating anomalies in an SD-WAN (software-defined wide-area network) implemented by a plurality of forwarding elements (FEs) located at a plurality of sites connected by the SD-WAN, the method comprising: iteratively: receiving a plurality of performance metrics that over a duration of time expresses a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration; using the received performance metrics to update generated weight values for a topology graph that comprises (i) a plurality of nodes representing the plurality of FEs and (ii) a plurality of edges between the plurality of nodes representing paths traversed between the FEs by the flows associated with the particular application, said generated weight values associated with said paths; using a topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows; and for an identified anomaly, implementing a remedial action to modify the SD-WAN in order to remediate the identified anomaly.
  • 2. The method of claim 1, wherein using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly further comprises determining whether the identified anomaly (i) is isolated to a particular FE in the plurality of FEs or (ii) affects the overall application.
  • 3. The method of claim 1, wherein the topology-based machine-trained process is a topology-based first machine-trained process, wherein identifying the remedial action comprises using a second machine-trained process for (i) identifying a set of potential remedial actions, (ii) testing the identified set of remedial actions, and (iii) based on said testing, selecting a remedial action from a set of potential remedial actions to modify the SD-WAN in order to remediate the identified anomaly.
  • 4. The method of claim 1, wherein implementing the remedial action comprises sending an API (application programming interface) call specifying the identified remedial action to an SD-WAN network controller to direct the SD-WAN network controller to implement the remedial action.
  • 5. The method of claim 1, wherein the plurality of FEs comprises (i) a plurality of edge FEs located at a plurality of branch sites connected by the SD-WAN, (ii) a plurality of transit FEs for connecting the plurality of edge FEs to each other and to a plurality of datacenter sites.
  • 6. The method of claim 5, wherein the identified anomaly comprises a network impairment on a first transit FE of the plurality of transit FEs that is a next-hop FE for application traffic associated with the particular application and forwarded by a first edge FE of the plurality of edge FEs located at a first branch site of the plurality of branch sites.
  • 7. The method of claim 6, wherein the identified remedial action comprises updating a transit FE order configuration for the first edge FE to change the next-hop transit FE for application traffic associated with the particular application and forwarded by the first edge FE from the first transit FE to a second transit FE in the plurality of transit FEs.
  • 8. The method of claim 7, wherein: the first transit FE is a next-hop transit FE for application traffic associated with the particular application and forwarded by a second edge FE in the plurality of edge FEs located at a second branch site in the plurality of branch sites,the identified anomaly is associated with the first edge FE and the second edge FE, andthe identified remedial action comprises updating the transit FE order for the first edge FE and updating a transit FE order for the second edge FE.
  • 9. The method of claim 5, wherein the particular application is a first application, wherein the identified anomaly comprises a network impairment on a first transit FE in the plurality of transit FEs that is a next-hop transit FE for application traffic (i) associated with at least the first application and a second application and (ii) forwarded by a first edge FE in the plurality of edge FEs located at a first branch site in the plurality of branch sites.
  • 10. The method of claim 9, wherein the identified remedial action comprises updating a transit FE order configuration for the particular edge FE to change the next-hop transit FE from the first transit FE to a second transit FE for application traffic (i) associated with the first application and the second application and (ii) forwarded by the particular edge FE.
  • 11. The method of claim 9, wherein the network impairment on the first transit FE affects application traffic associated with the first application and does not affect application traffic associated with the second application.
  • 12. The method of claim 11, wherein the identified remedial action comprises updating a transit FE order configuration for the particular edge FE to change the next-hop transit FE from the first transit FE to a second transit FE for application traffic (i) associated with the first application and (ii) forwarded by the particular edge FE.
  • 13. The method of claim 12, wherein after the remedial action has been applied, the particular edge FE (i) uses the second transit FE as a next hop for application traffic associated with the first application and (ii) continues to use the first transit FE as a next hop for application traffic associated with the second application.
  • 14. The method of claim 1, wherein the set of performance metrics is a first set of performance metrics that comprise performance scores computed for the duration of time based on a second set of performance metrics collected from the plurality of FEs.
  • 15. The method of claim 1, wherein the time window comprises two weeks.
  • 16. The method of claim 1, wherein each path in the plurality of paths is comprised of a set of one or more links, each link connecting two FEs in the plurality of FEs via a tunnel established over the link, the method further comprising generating the topology graph by (i) defining, for each FE in the plurality of FEs, a corresponding node in the plurality of nodes and (ii) defining, for each tunnel between two FEs, a corresponding edge in the plurality of edges.
  • 17. A non-transitory machine readable medium storing a program for execution by a set of processing units, the program for remediating anomalies in an SD-WAN (software-defined wide-area network) implemented by a plurality of forwarding elements (FEs) located at a plurality of sites connected by the SD-WAN, the program comprising sets of instructions for: iteratively: receiving a plurality of performance metrics that over a duration of time expresses a performance of the SD-WAN for at least one particular application associated with flows that traverse the SD-WAN during the time duration; using the received performance metrics to update generated weight values for a topology graph that comprises (i) a plurality of nodes representing the plurality of FEs and (ii) a plurality of edges between the plurality of nodes representing paths traversed between the FEs by the flows associated with the particular application, said generated weight values associated with said paths; using a topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly in the topology graph that is indicative of an anomaly in the SD-WAN for the particular application's traffic flows; and for an identified anomaly, implementing a remedial action to modify the SD-WAN in order to remediate the identified anomaly.
  • 18. The non-transitory machine readable medium of claim 17, wherein the set of instructions for using the topology-based machine-trained process to analyze the topology graph with the generated weight values in order to identify an anomaly further comprises a set of instructions for determining whether the identified anomaly (i) is isolated to a particular FE in the plurality of FEs or (ii) affects the overall application.
  • 19. The non-transitory machine readable medium of claim 17, wherein the topology-based machine-trained process is a topology-based first machine-trained process, wherein the set of instructions for identifying the remedial action comprises a set of instructions for using a second machine-trained process for (i) identifying a set of potential remedial actions, (ii) testing the identified set of remedial actions, and (iii) based on said testing, selecting a remedial action from a set of potential remedial actions to modify the SD-WAN in order to remediate the identified anomaly.
  • 20. The non-transitory machine readable medium of claim 17, wherein the set of instructions for implementing the remedial action comprises a set of instructions for sending an API (application programming interface) call specifying the identified remedial action to an SD-WAN network controller to direct the SD-WAN network controller to implement the remedial action.
US Referenced Citations (1021)
Number Name Date Kind
5652751 Sharony Jul 1997 A
5909553 Campbell et al. Jun 1999 A
6154465 Pickett Nov 2000 A
6157648 Voit et al. Dec 2000 A
6201810 Masuda et al. Mar 2001 B1
6363378 Conklin et al. Mar 2002 B1
6445682 Weitz Sep 2002 B1
6744775 Beshai et al. Jun 2004 B1
6976087 Westfall et al. Dec 2005 B1
7003481 Banka et al. Feb 2006 B2
7280476 Anderson Oct 2007 B2
7313629 Nucci et al. Dec 2007 B1
7320017 Kurapati et al. Jan 2008 B1
7373660 Guichard et al. May 2008 B1
7581022 Griffin et al. Aug 2009 B1
7680925 Sathyanarayana et al. Mar 2010 B2
7681236 Tamura et al. Mar 2010 B2
7751409 Carolan Jul 2010 B1
7962458 Holenstein et al. Jun 2011 B2
8094575 Vadlakonda et al. Jan 2012 B1
8094659 Arad Jan 2012 B1
8111692 Ray Feb 2012 B2
8141156 Mao et al. Mar 2012 B1
8224971 Miller et al. Jul 2012 B1
8228928 Parandekar et al. Jul 2012 B2
8243589 Trost et al. Aug 2012 B1
8259566 Chen et al. Sep 2012 B2
8274891 Averi et al. Sep 2012 B2
8301749 Finklestein et al. Oct 2012 B1
8385227 Downey Feb 2013 B1
8516129 Skene Aug 2013 B1
8566452 Goodwin, III et al. Oct 2013 B1
8588066 Goel et al. Nov 2013 B2
8630291 Shaffer et al. Jan 2014 B2
8661295 Khanna et al. Feb 2014 B1
8724456 Hong et al. May 2014 B1
8724503 Johnsson et al. May 2014 B2
8745177 Kazerani et al. Jun 2014 B1
8797874 Yu et al. Aug 2014 B2
8799504 Capone et al. Aug 2014 B2
8804745 Sinn Aug 2014 B1
8806482 Nagargadde et al. Aug 2014 B1
8855071 Sankaran et al. Oct 2014 B1
8856339 Mestery et al. Oct 2014 B2
8964548 Keralapura et al. Feb 2015 B1
8989199 Sella et al. Mar 2015 B1
9009217 Nagargadde et al. Apr 2015 B1
9015299 Shah Apr 2015 B1
9055000 Ghosh et al. Jun 2015 B1
9060025 Xu Jun 2015 B2
9071607 Twitchell, Jr. Jun 2015 B2
9075771 Gawali et al. Jul 2015 B1
9100329 Jiang et al. Aug 2015 B1
9135037 Petrescu-Prahova et al. Sep 2015 B1
9137334 Zhou Sep 2015 B2
9154327 Marino et al. Oct 2015 B1
9203764 Shirazipour et al. Dec 2015 B2
9225591 Beheshti-Zavareh et al. Dec 2015 B2
9306949 Richard et al. Apr 2016 B1
9323561 Ayala et al. Apr 2016 B2
9336040 Dong et al. May 2016 B2
9354983 Yenamandra et al. May 2016 B1
9356943 Lopilato et al. May 2016 B1
9379981 Zhou et al. Jun 2016 B1
9413724 Xu Aug 2016 B2
9419878 Hsiao et al. Aug 2016 B2
9432245 Sorenson, III et al. Aug 2016 B1
9438566 Zhang et al. Sep 2016 B2
9450817 Bahadur et al. Sep 2016 B1
9450852 Chen et al. Sep 2016 B1
9462010 Stevenson Oct 2016 B1
9467478 Khan et al. Oct 2016 B1
9485163 Fries et al. Nov 2016 B1
9521067 Michael et al. Dec 2016 B2
9525564 Lee Dec 2016 B2
9542219 Bryant et al. Jan 2017 B1
9559951 Sajassi et al. Jan 2017 B1
9563423 Pittman Feb 2017 B1
9602389 Maveli et al. Mar 2017 B1
9608917 Anderson et al. Mar 2017 B1
9608962 Chang Mar 2017 B1
9614748 Battersby et al. Apr 2017 B1
9621460 Mehta et al. Apr 2017 B2
9641551 Kariyanahalli May 2017 B1
9648547 Hart et al. May 2017 B1
9665432 Kruse et al. May 2017 B2
9686127 Ramachandran et al. Jun 2017 B2
9692714 Nair et al. Jun 2017 B1
9715401 Devine et al. Jul 2017 B2
9717021 Hughes et al. Jul 2017 B2
9722815 Mukundan et al. Aug 2017 B2
9747249 Cherian et al. Aug 2017 B2
9755965 Yadav et al. Sep 2017 B1
9787559 Schroeder Oct 2017 B1
9807004 Koley et al. Oct 2017 B2
9819540 Bahadur et al. Nov 2017 B1
9819565 Djukic et al. Nov 2017 B2
9825822 Holland Nov 2017 B1
9825911 Brandwine Nov 2017 B1
9825992 Xu Nov 2017 B2
9832128 Ashner et al. Nov 2017 B1
9832205 Santhi et al. Nov 2017 B2
9875355 Williams Jan 2018 B1
9906401 Rao Feb 2018 B1
9923826 Murgia Mar 2018 B2
9930011 Clemons, Jr. et al. Mar 2018 B1
9935829 Miller et al. Apr 2018 B1
9942787 Tillotson Apr 2018 B1
9996370 Khafizov et al. Jun 2018 B1
10038601 Becker et al. Jul 2018 B1
10057183 Salle et al. Aug 2018 B2
10057294 Xu Aug 2018 B2
10116593 Sinn et al. Oct 2018 B1
10135789 Mayya et al. Nov 2018 B2
10142226 Wu et al. Nov 2018 B1
10178032 Freitas Jan 2019 B1
10178037 Appleby et al. Jan 2019 B2
10187289 Chen et al. Jan 2019 B1
10200264 Menon et al. Feb 2019 B2
10229017 Zou et al. Mar 2019 B1
10237123 Dubey et al. Mar 2019 B2
10250498 Bales et al. Apr 2019 B1
10263832 Ghosh Apr 2019 B1
10320664 Nainar et al. Jun 2019 B2
10320691 Matthews et al. Jun 2019 B1
10326830 Singh Jun 2019 B1
10348767 Lee et al. Jul 2019 B1
10355989 Panchal et al. Jul 2019 B1
10425382 Mayya et al. Sep 2019 B2
10454708 Mibu Oct 2019 B2
10454714 Mayya et al. Oct 2019 B2
10461993 Turabi et al. Oct 2019 B2
10498652 Mayya et al. Dec 2019 B2
10511546 Singarayan et al. Dec 2019 B2
10523539 Mayya et al. Dec 2019 B2
10550093 Ojima et al. Feb 2020 B2
10554538 Spohn et al. Feb 2020 B2
10560431 Chen et al. Feb 2020 B1
10565464 Han et al. Feb 2020 B2
10567519 Mukhopadhyaya et al. Feb 2020 B1
10574482 Oré et al. Feb 2020 B2
10574528 Mayya et al. Feb 2020 B2
10594516 Cidon et al. Mar 2020 B2
10594591 Houjyo et al. Mar 2020 B2
10594659 El-Moussa et al. Mar 2020 B2
10608844 Cidon et al. Mar 2020 B2
10630505 Rubenstein et al. Apr 2020 B2
10637889 Ermagan et al. Apr 2020 B2
10666460 Cidon et al. May 2020 B2
10666497 Tahhan et al. May 2020 B2
10686625 Cidon et al. Jun 2020 B2
10693739 Naseri et al. Jun 2020 B1
10708144 Mohan et al. Jul 2020 B2
10715427 Raj et al. Jul 2020 B2
10749711 Mukundan et al. Aug 2020 B2
10778466 Cidon et al. Sep 2020 B2
10778528 Mayya et al. Sep 2020 B2
10778557 Ganichev et al. Sep 2020 B2
10805114 Cidon et al. Oct 2020 B2
10805272 Mayya et al. Oct 2020 B2
10819564 Turabi et al. Oct 2020 B2
10826775 Moreno et al. Nov 2020 B1
10841131 Cidon et al. Nov 2020 B2
10911374 Kumar et al. Feb 2021 B1
10938693 Mayya et al. Mar 2021 B2
10951529 Duan et al. Mar 2021 B2
10958479 Cidon et al. Mar 2021 B2
10959098 Cidon et al. Mar 2021 B2
10992558 Silva et al. Apr 2021 B1
10992568 Michael et al. Apr 2021 B2
10999100 Cidon et al. May 2021 B2
10999137 Cidon et al. May 2021 B2
10999165 Cidon et al. May 2021 B2
10999197 Hooda et al. May 2021 B2
11005684 Cidon May 2021 B2
11018995 Cidon et al. May 2021 B2
11044190 Ramaswamy et al. Jun 2021 B2
11050588 Mayya et al. Jun 2021 B2
11050644 Hegde et al. Jun 2021 B2
11071005 Shen et al. Jul 2021 B2
11089111 Markuze et al. Aug 2021 B2
11095612 Oswal et al. Aug 2021 B1
11102032 Cidon et al. Aug 2021 B2
11108595 Knutsen et al. Aug 2021 B2
11108851 Kurmala et al. Aug 2021 B1
11115347 Gupta et al. Sep 2021 B2
11115426 Pazhyannur et al. Sep 2021 B1
11115480 Markuze et al. Sep 2021 B2
11121962 Michael et al. Sep 2021 B2
11121985 Cidon et al. Sep 2021 B2
11128492 Sethi et al. Sep 2021 B2
11146632 Rubenstein Oct 2021 B2
11153230 Cidon et al. Oct 2021 B2
11171885 Cidon et al. Nov 2021 B2
11212140 Mukundan et al. Dec 2021 B2
11212238 Cidon et al. Dec 2021 B2
11223514 Mayya et al. Jan 2022 B2
11245641 Ramaswamy et al. Feb 2022 B2
11252079 Michael et al. Feb 2022 B2
11252105 Cidon et al. Feb 2022 B2
11252106 Cidon et al. Feb 2022 B2
11258728 Cidon et al. Feb 2022 B2
11310170 Cidon et al. Apr 2022 B2
11323307 Mayya et al. May 2022 B2
11349722 Mayya et al. May 2022 B2
11363124 Markuze et al. Jun 2022 B2
11374904 Mayya et al. Jun 2022 B2
11375005 Rolando et al. Jun 2022 B1
11381474 Kumar et al. Jul 2022 B1
11381499 Ramaswamy et al. Jul 2022 B1
11388086 Ramaswamy et al. Jul 2022 B1
11394640 Ramaswamy et al. Jul 2022 B2
11418997 Devadoss et al. Aug 2022 B2
11438789 Devadoss et al. Sep 2022 B2
11444865 Ramaswamy et al. Sep 2022 B2
11444872 Mayya et al. Sep 2022 B2
11477127 Ramaswamy et al. Oct 2022 B2
11489720 Kempanna et al. Nov 2022 B1
11489783 Ramaswamy et al. Nov 2022 B2
11509571 Ramaswamy et al. Nov 2022 B1
11516049 Cidon et al. Nov 2022 B2
11522780 Wallace et al. Dec 2022 B1
11526434 Brooker et al. Dec 2022 B1
11533248 Mayya et al. Dec 2022 B2
11552874 Pragada et al. Jan 2023 B1
11575591 Ramaswamy et al. Feb 2023 B2
11575600 Markuze et al. Feb 2023 B2
11582144 Ramaswamy Feb 2023 B2
11582298 Hood et al. Feb 2023 B2
11601356 Gandhi et al. Mar 2023 B2
11606225 Cidon et al. Mar 2023 B2
11606286 Michael et al. Mar 2023 B2
11606314 Cidon et al. Mar 2023 B2
11606712 Devadoss et al. Mar 2023 B2
11611507 Ramaswamy et al. Mar 2023 B2
11637768 Ramaswamy et al. Apr 2023 B2
11677720 Mayya et al. Jun 2023 B2
11689959 Devadoss et al. Jun 2023 B2
11700196 Michael et al. Jul 2023 B2
11706126 Silva et al. Jul 2023 B2
11706127 Michael et al. Jul 2023 B2
11709710 Markuze et al. Jul 2023 B2
11716286 Ramaswamy et al. Aug 2023 B2
11722925 Devadoss et al. Aug 2023 B2
11729065 Ramaswamy et al. Aug 2023 B2
20020049687 Helsper et al. Apr 2002 A1
20020075542 Kumar et al. Jun 2002 A1
20020085488 Kobayashi Jul 2002 A1
20020087716 Mustafa Jul 2002 A1
20020152306 Tuck Oct 2002 A1
20020186682 Kawano et al. Dec 2002 A1
20020198840 Banka et al. Dec 2002 A1
20030050061 Wu et al. Mar 2003 A1
20030061269 Hathaway et al. Mar 2003 A1
20030088697 Matsuhira May 2003 A1
20030112766 Riedel et al. Jun 2003 A1
20030112808 Solomon Jun 2003 A1
20030126468 Markham Jul 2003 A1
20030161313 Jinmei et al. Aug 2003 A1
20030189919 Gupta et al. Oct 2003 A1
20030202506 Perkins et al. Oct 2003 A1
20030219030 Gubbi Nov 2003 A1
20040059831 Chu et al. Mar 2004 A1
20040068668 Lor et al. Apr 2004 A1
20040165601 Liu et al. Aug 2004 A1
20040224771 Chen et al. Nov 2004 A1
20050078690 DeLangis Apr 2005 A1
20050149604 Navada Jul 2005 A1
20050154790 Nagata et al. Jul 2005 A1
20050172161 Cruz et al. Aug 2005 A1
20050195754 Nosella Sep 2005 A1
20050210479 Andjelic Sep 2005 A1
20050265255 Kodialam et al. Dec 2005 A1
20060002291 Alicherry et al. Jan 2006 A1
20060034335 Karaoguz et al. Feb 2006 A1
20060114838 Mandavilli et al. Jun 2006 A1
20060171365 Borella Aug 2006 A1
20060182034 Klinker et al. Aug 2006 A1
20060182035 Vasseur Aug 2006 A1
20060193247 Naseh et al. Aug 2006 A1
20060193252 Naseh et al. Aug 2006 A1
20060195605 Sundarrajan et al. Aug 2006 A1
20060245414 Susai et al. Nov 2006 A1
20070050594 Augsburg et al. Mar 2007 A1
20070064604 Chen et al. Mar 2007 A1
20070064702 Bates et al. Mar 2007 A1
20070083727 Johnston et al. Apr 2007 A1
20070091794 Filsfils et al. Apr 2007 A1
20070103548 Carter May 2007 A1
20070115812 Hughes May 2007 A1
20070121486 Guichard et al. May 2007 A1
20070130325 Lesser Jun 2007 A1
20070162619 Aloni et al. Jul 2007 A1
20070162639 Chu et al. Jul 2007 A1
20070177511 Das et al. Aug 2007 A1
20070195797 Patel et al. Aug 2007 A1
20070237081 Kodialam et al. Oct 2007 A1
20070260746 Mirtorabi et al. Nov 2007 A1
20070268882 Breslau et al. Nov 2007 A1
20080002670 Bugenhagen et al. Jan 2008 A1
20080049621 McGuire et al. Feb 2008 A1
20080055241 Goldenberg et al. Mar 2008 A1
20080080509 Khanna et al. Apr 2008 A1
20080095187 Jung et al. Apr 2008 A1
20080117930 Chakareski et al. May 2008 A1
20080144532 Chamarajanagar et al. Jun 2008 A1
20080168086 Miller et al. Jul 2008 A1
20080175150 Bolt et al. Jul 2008 A1
20080181116 Kavanaugh et al. Jul 2008 A1
20080219276 Shah Sep 2008 A1
20080240121 Xiong et al. Oct 2008 A1
20080263218 Beerends et al. Oct 2008 A1
20090013210 McIntosh et al. Jan 2009 A1
20090028092 Rothschild Jan 2009 A1
20090125617 Klessig et al. May 2009 A1
20090141642 Sun Jun 2009 A1
20090154463 Hines et al. Jun 2009 A1
20090182874 Morford et al. Jul 2009 A1
20090247204 Sennett et al. Oct 2009 A1
20090268605 Campbell et al. Oct 2009 A1
20090274045 Meier et al. Nov 2009 A1
20090276657 Wetmore et al. Nov 2009 A1
20090303880 Maltz et al. Dec 2009 A1
20100008361 Guichard et al. Jan 2010 A1
20100017802 Lojewski Jan 2010 A1
20100046532 Okita Feb 2010 A1
20100061379 Parandekar et al. Mar 2010 A1
20100080129 Strahan et al. Apr 2010 A1
20100088440 Banks et al. Apr 2010 A1
20100091782 Hiscock Apr 2010 A1
20100091823 Retana et al. Apr 2010 A1
20100107162 Edwards et al. Apr 2010 A1
20100118727 Draves et al. May 2010 A1
20100118886 Saavedra May 2010 A1
20100128600 Srinivasmurthy et al. May 2010 A1
20100165985 Sharma et al. Jul 2010 A1
20100191884 Holenstein et al. Jul 2010 A1
20100223621 Joshi et al. Sep 2010 A1
20100226246 Proulx Sep 2010 A1
20100290422 Haigh et al. Nov 2010 A1
20100309841 Conte Dec 2010 A1
20100309912 Mehta et al. Dec 2010 A1
20100322255 Hao et al. Dec 2010 A1
20100332657 Elyashev et al. Dec 2010 A1
20110001604 Ludlow et al. Jan 2011 A1
20110007752 Silva et al. Jan 2011 A1
20110032939 Nozaki et al. Feb 2011 A1
20110035187 DeJori et al. Feb 2011 A1
20110040814 Higgins Feb 2011 A1
20110075674 Li et al. Mar 2011 A1
20110078783 Duan et al. Mar 2011 A1
20110107139 Middlecamp et al. May 2011 A1
20110110370 Moreno et al. May 2011 A1
20110141877 Xu et al. Jun 2011 A1
20110142041 Imai Jun 2011 A1
20110153909 Dong Jun 2011 A1
20110235509 Szymanski Sep 2011 A1
20110255397 Kadakia et al. Oct 2011 A1
20110302663 Prodan et al. Dec 2011 A1
20120008630 Ould-Brahim Jan 2012 A1
20120027013 Napierala Feb 2012 A1
20120039309 Evans et al. Feb 2012 A1
20120099601 Haddad et al. Apr 2012 A1
20120136697 Peles et al. May 2012 A1
20120140935 Kruglick Jun 2012 A1
20120157068 Eichen et al. Jun 2012 A1
20120173694 Yan et al. Jul 2012 A1
20120173919 Patel et al. Jul 2012 A1
20120182940 Taleb et al. Jul 2012 A1
20120221955 Raleigh et al. Aug 2012 A1
20120227093 Shatzkamer et al. Sep 2012 A1
20120240185 Kapoor et al. Sep 2012 A1
20120250682 Vincent et al. Oct 2012 A1
20120250686 Vincent et al. Oct 2012 A1
20120266026 Chikkalingaiah et al. Oct 2012 A1
20120281706 Agarwal et al. Nov 2012 A1
20120287818 Corti et al. Nov 2012 A1
20120300615 Kempf et al. Nov 2012 A1
20120307659 Yamada Dec 2012 A1
20120317270 Vrbaski et al. Dec 2012 A1
20120317291 Wolfe Dec 2012 A1
20130007505 Spear Jan 2013 A1
20130019005 Hui et al. Jan 2013 A1
20130021968 Reznik et al. Jan 2013 A1
20130044764 Casado et al. Feb 2013 A1
20130051237 Ong Feb 2013 A1
20130051399 Zhang et al. Feb 2013 A1
20130054763 Merwe et al. Feb 2013 A1
20130086267 Gelenbe et al. Apr 2013 A1
20130097304 Asthana et al. Apr 2013 A1
20130103729 Cooney et al. Apr 2013 A1
20130103834 Dzerve et al. Apr 2013 A1
20130117530 Kim et al. May 2013 A1
20130124718 Griffith et al. May 2013 A1
20130124911 Griffith et al. May 2013 A1
20130124912 Griffith et al. May 2013 A1
20130128889 Mathur et al. May 2013 A1
20130142201 Kim et al. Jun 2013 A1
20130170354 Takashima et al. Jul 2013 A1
20130173768 Kundu et al. Jul 2013 A1
20130173788 Song Jul 2013 A1
20130182712 Aguayo et al. Jul 2013 A1
20130185446 Zeng et al. Jul 2013 A1
20130185729 Vasic et al. Jul 2013 A1
20130191688 Agarwal et al. Jul 2013 A1
20130223226 Narayanan et al. Aug 2013 A1
20130223454 Dunbar et al. Aug 2013 A1
20130235870 Tripathi et al. Sep 2013 A1
20130238782 Zhao et al. Sep 2013 A1
20130242718 Zhang Sep 2013 A1
20130254599 Katkar et al. Sep 2013 A1
20130258839 Wang et al. Oct 2013 A1
20130258847 Zhang et al. Oct 2013 A1
20130266015 Qu et al. Oct 2013 A1
20130266019 Qu et al. Oct 2013 A1
20130283364 Chang et al. Oct 2013 A1
20130286846 Atlas et al. Oct 2013 A1
20130297611 Moritz et al. Nov 2013 A1
20130297770 Zhang Nov 2013 A1
20130301469 Suga Nov 2013 A1
20130301642 Radhakrishnan et al. Nov 2013 A1
20130308444 Sem-Jacobsen et al. Nov 2013 A1
20130315242 Wang et al. Nov 2013 A1
20130315243 Huang et al. Nov 2013 A1
20130329548 Nakil et al. Dec 2013 A1
20130329601 Yin et al. Dec 2013 A1
20130329734 Chesla et al. Dec 2013 A1
20130346470 Obstfeld et al. Dec 2013 A1
20140016464 Shirazipour et al. Jan 2014 A1
20140019604 Twitchell, Jr. Jan 2014 A1
20140019750 Dodgson et al. Jan 2014 A1
20140040975 Raleigh et al. Feb 2014 A1
20140064283 Balus et al. Mar 2014 A1
20140071832 Johnsson et al. Mar 2014 A1
20140092907 Sridhar et al. Apr 2014 A1
20140108665 Arora et al. Apr 2014 A1
20140112171 Pasdar Apr 2014 A1
20140115584 Mudigonda et al. Apr 2014 A1
20140122559 Branson et al. May 2014 A1
20140123135 Huang et al. May 2014 A1
20140126418 Brendel et al. May 2014 A1
20140156818 Hunt Jun 2014 A1
20140156823 Liu et al. Jun 2014 A1
20140157363 Banerjee Jun 2014 A1
20140160935 Zecharia et al. Jun 2014 A1
20140164560 Ko et al. Jun 2014 A1
20140164617 Jalan et al. Jun 2014 A1
20140164718 Schaik et al. Jun 2014 A1
20140173113 Vemuri et al. Jun 2014 A1
20140173331 Martin et al. Jun 2014 A1
20140181824 Saund et al. Jun 2014 A1
20140189074 Parker Jul 2014 A1
20140208317 Nakagawa Jul 2014 A1
20140219135 Li et al. Aug 2014 A1
20140223507 Xu Aug 2014 A1
20140229210 Sharifian et al. Aug 2014 A1
20140244851 Lee Aug 2014 A1
20140258535 Zhang Sep 2014 A1
20140269690 Tu Sep 2014 A1
20140279862 Dietz et al. Sep 2014 A1
20140280499 Basavaiah et al. Sep 2014 A1
20140310282 Sprague et al. Oct 2014 A1
20140317440 Biermayr et al. Oct 2014 A1
20140321277 Lynn, Jr. et al. Oct 2014 A1
20140337500 Lee Nov 2014 A1
20140337674 Ivancic et al. Nov 2014 A1
20140341109 Cartmell et al. Nov 2014 A1
20140355441 Jain Dec 2014 A1
20140365834 Stone et al. Dec 2014 A1
20140372582 Ghanwani et al. Dec 2014 A1
20150003240 Drwiega et al. Jan 2015 A1
20150016249 Mukundan et al. Jan 2015 A1
20150029864 Raileanu et al. Jan 2015 A1
20150039744 Niazi et al. Feb 2015 A1
20150046572 Cheng et al. Feb 2015 A1
20150052247 Threefoot et al. Feb 2015 A1
20150052517 Raghu et al. Feb 2015 A1
20150056960 Egner et al. Feb 2015 A1
20150058917 Xu Feb 2015 A1
20150088942 Shah Mar 2015 A1
20150089628 Lang Mar 2015 A1
20150092603 Aguayo et al. Apr 2015 A1
20150096011 Watt Apr 2015 A1
20150100958 Banavalikar et al. Apr 2015 A1
20150106809 Reddy et al. Apr 2015 A1
20150124603 Ketheesan et al. May 2015 A1
20150134777 Onoue May 2015 A1
20150139238 Pourzandi et al. May 2015 A1
20150146539 Mehta et al. May 2015 A1
20150163152 Li Jun 2015 A1
20150169340 Haddad et al. Jun 2015 A1
20150172121 Farkas et al. Jun 2015 A1
20150172169 DeCusatis et al. Jun 2015 A1
20150188823 Williams et al. Jul 2015 A1
20150189009 Bemmel Jul 2015 A1
20150195178 Bhattacharya et al. Jul 2015 A1
20150201036 Nishiki et al. Jul 2015 A1
20150222543 Song Aug 2015 A1
20150222638 Morley Aug 2015 A1
20150236945 Michael et al. Aug 2015 A1
20150236962 Veres et al. Aug 2015 A1
20150244617 Nakil et al. Aug 2015 A1
20150249644 Xu Sep 2015 A1
20150257081 Ramanujan et al. Sep 2015 A1
20150264055 Budhani et al. Sep 2015 A1
20150271056 Chunduri et al. Sep 2015 A1
20150271104 Chikkamath et al. Sep 2015 A1
20150271303 Neginhal et al. Sep 2015 A1
20150281004 Kakadia et al. Oct 2015 A1
20150312142 Barabash et al. Oct 2015 A1
20150312760 O'Toole Oct 2015 A1
20150317169 Sinha et al. Nov 2015 A1
20150326426 Uo et al. Nov 2015 A1
20150334025 Rader Nov 2015 A1
20150334696 Gu et al. Nov 2015 A1
20150341271 Gomez Nov 2015 A1
20150349978 Wu et al. Dec 2015 A1
20150350907 Timariu et al. Dec 2015 A1
20150358232 Chen et al. Dec 2015 A1
20150358236 Roach et al. Dec 2015 A1
20150363221 Terayama et al. Dec 2015 A1
20150363733 Brown Dec 2015 A1
20150365323 Duminuco et al. Dec 2015 A1
20150372943 Hasan et al. Dec 2015 A1
20150372982 Herle et al. Dec 2015 A1
20150381407 Wang et al. Dec 2015 A1
20150381462 Choi et al. Dec 2015 A1
20150381493 Bansal et al. Dec 2015 A1
20160019317 Pawar et al. Jan 2016 A1
20160020844 Hart et al. Jan 2016 A1
20160021597 Hart et al. Jan 2016 A1
20160035183 Buchholz et al. Feb 2016 A1
20160036924 Koppolu et al. Feb 2016 A1
20160036938 Aviles et al. Feb 2016 A1
20160037434 Gopal et al. Feb 2016 A1
20160072669 Saavedra Mar 2016 A1
20160072684 Manuguri et al. Mar 2016 A1
20160080268 Anand et al. Mar 2016 A1
20160080502 Yadav et al. Mar 2016 A1
20160105353 Cociglio Apr 2016 A1
20160105392 Thakkar et al. Apr 2016 A1
20160105471 Nunes et al. Apr 2016 A1
20160105488 Thakkar et al. Apr 2016 A1
20160117185 Fang et al. Apr 2016 A1
20160134461 Sampath et al. May 2016 A1
20160134527 Kwak et al. May 2016 A1
20160134528 Lin et al. May 2016 A1
20160134591 Liao et al. May 2016 A1
20160142373 Ossipov May 2016 A1
20160147607 Dornemann et al. May 2016 A1
20160150055 Choi May 2016 A1
20160164832 Bellagamba et al. Jun 2016 A1
20160164914 Madhav et al. Jun 2016 A1
20160173338 Wolting Jun 2016 A1
20160191363 Haraszti et al. Jun 2016 A1
20160191374 Singh et al. Jun 2016 A1
20160192403 Gupta et al. Jun 2016 A1
20160197834 Luft Jul 2016 A1
20160197835 Luft Jul 2016 A1
20160198003 Luft Jul 2016 A1
20160205071 Cooper et al. Jul 2016 A1
20160210209 Verkaik et al. Jul 2016 A1
20160212773 Kanderholm et al. Jul 2016 A1
20160218947 Hughes et al. Jul 2016 A1
20160218951 Vasseur et al. Jul 2016 A1
20160234099 Jiao Aug 2016 A1
20160234161 Banerjee et al. Aug 2016 A1
20160255169 Kovvuri et al. Sep 2016 A1
20160255542 Hughes et al. Sep 2016 A1
20160261493 Li Sep 2016 A1
20160261495 Xia et al. Sep 2016 A1
20160261506 Hegde et al. Sep 2016 A1
20160261639 Xu Sep 2016 A1
20160269298 Li et al. Sep 2016 A1
20160269926 Sundaram Sep 2016 A1
20160285736 Gu Sep 2016 A1
20160299775 Madapurath et al. Oct 2016 A1
20160301471 Kunz et al. Oct 2016 A1
20160308762 Teng et al. Oct 2016 A1
20160315912 Mayya et al. Oct 2016 A1
20160323377 Einkauf et al. Nov 2016 A1
20160328159 Coddington et al. Nov 2016 A1
20160330111 Manghirmalani et al. Nov 2016 A1
20160337202 Ben-Itzhak et al. Nov 2016 A1
20160352588 Subbarayan et al. Dec 2016 A1
20160353268 Senarath et al. Dec 2016 A1
20160359738 Sullenberger et al. Dec 2016 A1
20160366187 Kamble Dec 2016 A1
20160371153 Dornemann Dec 2016 A1
20160378527 Zamir Dec 2016 A1
20160380886 Blair et al. Dec 2016 A1
20160380906 Hodique et al. Dec 2016 A1
20170005986 Bansal et al. Jan 2017 A1
20170006499 Hampel et al. Jan 2017 A1
20170012870 Blair et al. Jan 2017 A1
20170019428 Cohn Jan 2017 A1
20170024260 Chandrasekaran et al. Jan 2017 A1
20170026273 Yao et al. Jan 2017 A1
20170026283 Williams et al. Jan 2017 A1
20170026355 Mathaiyan et al. Jan 2017 A1
20170034046 Cai et al. Feb 2017 A1
20170034052 Chanda et al. Feb 2017 A1
20170034129 Sawant et al. Feb 2017 A1
20170048296 Ramalho et al. Feb 2017 A1
20170053258 Carney et al. Feb 2017 A1
20170055131 Kong et al. Feb 2017 A1
20170063674 Maskalik et al. Mar 2017 A1
20170063782 Jain et al. Mar 2017 A1
20170063783 Yong et al. Mar 2017 A1
20170063794 Jain et al. Mar 2017 A1
20170064005 Lee Mar 2017 A1
20170075710 Prasad et al. Mar 2017 A1
20170093625 Pera et al. Mar 2017 A1
20170097841 Chang et al. Apr 2017 A1
20170104653 Badea et al. Apr 2017 A1
20170104755 Arregoces et al. Apr 2017 A1
20170109212 Gaurav et al. Apr 2017 A1
20170118067 Vedula Apr 2017 A1
20170118173 Arramreddy et al. Apr 2017 A1
20170123939 Maheshwari et al. May 2017 A1
20170126475 Mahkonen et al. May 2017 A1
20170126516 Tiagi et al. May 2017 A1
20170126564 Mayya et al. May 2017 A1
20170134186 Mukundan et al. May 2017 A1
20170134520 Abbasi et al. May 2017 A1
20170139789 Fries et al. May 2017 A1
20170142000 Cai et al. May 2017 A1
20170149637 Banikazemi et al. May 2017 A1
20170155557 Desai et al. Jun 2017 A1
20170155566 Martinsen et al. Jun 2017 A1
20170155590 Dillon et al. Jun 2017 A1
20170163473 Sadana et al. Jun 2017 A1
20170171024 Anerousis et al. Jun 2017 A1
20170171310 Gardner Jun 2017 A1
20170180220 Leckey et al. Jun 2017 A1
20170181210 Nadella et al. Jun 2017 A1
20170195161 Ruel et al. Jul 2017 A1
20170195169 Mills et al. Jul 2017 A1
20170201568 Hussam et al. Jul 2017 A1
20170201585 Doraiswamy et al. Jul 2017 A1
20170207976 Rovner et al. Jul 2017 A1
20170214545 Cheng et al. Jul 2017 A1
20170214701 Hasan Jul 2017 A1
20170223117 Messerli et al. Aug 2017 A1
20170236060 Ignatyev Aug 2017 A1
20170237710 Mayya et al. Aug 2017 A1
20170242784 Heorhiadi et al. Aug 2017 A1
20170257260 Govindan et al. Sep 2017 A1
20170257309 Appanna Sep 2017 A1
20170264496 Ao et al. Sep 2017 A1
20170279717 Bethers et al. Sep 2017 A1
20170279741 Elias et al. Sep 2017 A1
20170279803 Desai et al. Sep 2017 A1
20170280474 Vesterinen et al. Sep 2017 A1
20170288987 Pasupathy et al. Oct 2017 A1
20170289002 Ganguli et al. Oct 2017 A1
20170289027 Ratnasingham Oct 2017 A1
20170295264 Touitou et al. Oct 2017 A1
20170302501 Shi et al. Oct 2017 A1
20170302565 Ghobadi et al. Oct 2017 A1
20170310641 Jiang et al. Oct 2017 A1
20170310691 Vasseur et al. Oct 2017 A1
20170317954 Masurekar et al. Nov 2017 A1
20170317969 Masurekar et al. Nov 2017 A1
20170317974 Masurekar et al. Nov 2017 A1
20170324628 Dhanabalan Nov 2017 A1
20170337086 Zhu et al. Nov 2017 A1
20170339022 Hegde et al. Nov 2017 A1
20170339054 Yadav et al. Nov 2017 A1
20170339070 Chang et al. Nov 2017 A1
20170346722 Smith et al. Nov 2017 A1
20170364419 Lo Dec 2017 A1
20170366445 Nemirovsky et al. Dec 2017 A1
20170366467 Martin et al. Dec 2017 A1
20170373950 Szilagyi et al. Dec 2017 A1
20170374174 Evens et al. Dec 2017 A1
20180006995 Bickhart et al. Jan 2018 A1
20180007005 Chanda et al. Jan 2018 A1
20180007123 Cheng et al. Jan 2018 A1
20180013636 Seetharamaiah et al. Jan 2018 A1
20180014051 Phillips et al. Jan 2018 A1
20180020035 Boggia et al. Jan 2018 A1
20180034668 Mayya et al. Feb 2018 A1
20180041425 Zhang Feb 2018 A1
20180062875 Tumuluru Mar 2018 A1
20180062914 Boutros et al. Mar 2018 A1
20180062917 Chandrashekhar et al. Mar 2018 A1
20180063036 Chandrashekhar et al. Mar 2018 A1
20180063193 Chandrashekhar et al. Mar 2018 A1
20180063233 Park Mar 2018 A1
20180063743 Tumuluru et al. Mar 2018 A1
20180069924 Tumuluru et al. Mar 2018 A1
20180074909 Bishop et al. Mar 2018 A1
20180077081 Lauer et al. Mar 2018 A1
20180077202 Xu Mar 2018 A1
20180084081 Kuchibhotla et al. Mar 2018 A1
20180091370 Arai Mar 2018 A1
20180097725 Wood et al. Apr 2018 A1
20180114569 Strachan et al. Apr 2018 A1
20180123910 Fitzgibbon May 2018 A1
20180123946 Ramachandran et al. May 2018 A1
20180131608 Jiang et al. May 2018 A1
20180131615 Zhang May 2018 A1
20180131720 Hobson et al. May 2018 A1
20180145899 Rao May 2018 A1
20180159796 Wang et al. Jun 2018 A1
20180159856 Gujarathi Jun 2018 A1
20180167378 Kostyukov et al. Jun 2018 A1
20180176073 Dubey et al. Jun 2018 A1
20180176082 Katz et al. Jun 2018 A1
20180176130 Banerjee et al. Jun 2018 A1
20180176252 Nimmagadda et al. Jun 2018 A1
20180181423 Gunda et al. Jun 2018 A1
20180205746 Boutnaru et al. Jul 2018 A1
20180213472 Ishii et al. Jul 2018 A1
20180219765 Michael et al. Aug 2018 A1
20180219766 Michael et al. Aug 2018 A1
20180234300 Mayya et al. Aug 2018 A1
20180248790 Tan et al. Aug 2018 A1
20180260125 Botes et al. Sep 2018 A1
20180261085 Liu et al. Sep 2018 A1
20180262468 Kumar et al. Sep 2018 A1
20180270104 Zheng et al. Sep 2018 A1
20180278541 Wu et al. Sep 2018 A1
20180287907 Kulshreshtha et al. Oct 2018 A1
20180295101 Gehrmann Oct 2018 A1
20180295529 Jen et al. Oct 2018 A1
20180302286 Mayya et al. Oct 2018 A1
20180302321 Manthiramoorthy et al. Oct 2018 A1
20180307851 Lewis Oct 2018 A1
20180316606 Sung et al. Nov 2018 A1
20180351855 Sood et al. Dec 2018 A1
20180351862 Jeganathan et al. Dec 2018 A1
20180351863 Vairavakkalai et al. Dec 2018 A1
20180351882 Jeganathan et al. Dec 2018 A1
20180359323 Madden Dec 2018 A1
20180367445 Bajaj Dec 2018 A1
20180373558 Chang et al. Dec 2018 A1
20180375744 Mayya et al. Dec 2018 A1
20180375824 Mayya et al. Dec 2018 A1
20180375967 Pithawala et al. Dec 2018 A1
20190013883 Vargas et al. Jan 2019 A1
20190014038 Ritchie Jan 2019 A1
20190020588 Twitchell, Jr. Jan 2019 A1
20190020627 Yuan Jan 2019 A1
20190021085 Mochizuki et al. Jan 2019 A1
20190028378 Houjyo et al. Jan 2019 A1
20190028552 Johnson et al. Jan 2019 A1
20190036808 Shenoy et al. Jan 2019 A1
20190036810 Michael et al. Jan 2019 A1
20190036813 Shenoy et al. Jan 2019 A1
20190046056 Khachaturian et al. Feb 2019 A1
20190058657 Chunduri et al. Feb 2019 A1
20190058709 Kempf et al. Feb 2019 A1
20190068470 Mirsky Feb 2019 A1
20190068493 Ram et al. Feb 2019 A1
20190068500 Hira Feb 2019 A1
20190075083 Mayya et al. Mar 2019 A1
20190081894 Yousaf et al. Mar 2019 A1
20190103990 Cidon et al. Apr 2019 A1
20190103991 Cidon et al. Apr 2019 A1
20190103992 Cidon et al. Apr 2019 A1
20190103993 Cidon et al. Apr 2019 A1
20190104035 Cidon et al. Apr 2019 A1
20190104049 Cidon et al. Apr 2019 A1
20190104050 Cidon et al. Apr 2019 A1
20190104051 Cidon et al. Apr 2019 A1
20190104052 Cidon et al. Apr 2019 A1
20190104053 Cidon et al. Apr 2019 A1
20190104063 Cidon et al. Apr 2019 A1
20190104064 Cidon et al. Apr 2019 A1
20190104109 Cidon et al. Apr 2019 A1
20190104111 Cidon et al. Apr 2019 A1
20190104413 Cidon et al. Apr 2019 A1
20190109769 Jain et al. Apr 2019 A1
20190132221 Boutros et al. May 2019 A1
20190132234 Dong et al. May 2019 A1
20190132322 Song et al. May 2019 A1
20190140889 Mayya et al. May 2019 A1
20190140890 Mayya et al. May 2019 A1
20190149525 Gunda et al. May 2019 A1
20190158371 Dillon et al. May 2019 A1
20190158605 Markuze et al. May 2019 A1
20190199539 Deng et al. Jun 2019 A1
20190220703 Prakash et al. Jul 2019 A1
20190222499 Chen et al. Jul 2019 A1
20190238364 Boutros et al. Aug 2019 A1
20190238446 Barzik et al. Aug 2019 A1
20190238449 Michael et al. Aug 2019 A1
20190238450 Michael et al. Aug 2019 A1
20190238483 Marichetty et al. Aug 2019 A1
20190238497 Tourrilhes et al. Aug 2019 A1
20190268421 Markuze et al. Aug 2019 A1
20190268973 Bull et al. Aug 2019 A1
20190278631 Bernat et al. Sep 2019 A1
20190280962 Michael et al. Sep 2019 A1
20190280963 Michael et al. Sep 2019 A1
20190280964 Michael et al. Sep 2019 A1
20190288875 Shen et al. Sep 2019 A1
20190306197 Degioanni Oct 2019 A1
20190306282 Masputra et al. Oct 2019 A1
20190313278 Liu Oct 2019 A1
20190313907 Khachaturian et al. Oct 2019 A1
20190319847 Nahar et al. Oct 2019 A1
20190319881 Maskara et al. Oct 2019 A1
20190327109 Guichard et al. Oct 2019 A1
20190334786 Dutta et al. Oct 2019 A1
20190334813 Raj et al. Oct 2019 A1
20190334820 Zhao Oct 2019 A1
20190342201 Singh Nov 2019 A1
20190342219 Liu et al. Nov 2019 A1
20190356736 Narayanaswamy et al. Nov 2019 A1
20190364099 Thakkar et al. Nov 2019 A1
20190364456 Yu Nov 2019 A1
20190372888 Michael et al. Dec 2019 A1
20190372889 Michael et al. Dec 2019 A1
20190372890 Michael et al. Dec 2019 A1
20190394081 Tahhan et al. Dec 2019 A1
20200014609 Hockett et al. Jan 2020 A1
20200014615 Michael et al. Jan 2020 A1
20200014616 Michael et al. Jan 2020 A1
20200014661 Mayya et al. Jan 2020 A1
20200014663 Chen et al. Jan 2020 A1
20200021514 Michael et al. Jan 2020 A1
20200021515 Michael Jan 2020 A1
20200036624 Michael et al. Jan 2020 A1
20200044943 Bor-Yaliniz et al. Feb 2020 A1
20200044969 Hao et al. Feb 2020 A1
20200059420 Abraham Feb 2020 A1
20200059457 Raza et al. Feb 2020 A1
20200059459 Abraham et al. Feb 2020 A1
20200067831 Spraggins et al. Feb 2020 A1
20200092207 Sipra et al. Mar 2020 A1
20200097327 Beyer et al. Mar 2020 A1
20200099625 Mgit et al. Mar 2020 A1
20200099659 Cometto et al. Mar 2020 A1
20200106696 Michael et al. Apr 2020 A1
20200106706 Mayya et al. Apr 2020 A1
20200119952 Mayya et al. Apr 2020 A1
20200127905 Mayya et al. Apr 2020 A1
20200127911 Gilson et al. Apr 2020 A1
20200153701 Mohan et al. May 2020 A1
20200153736 Liebherr et al. May 2020 A1
20200159661 Keymolen et al. May 2020 A1
20200162407 Tillotson May 2020 A1
20200169473 Rimar et al. May 2020 A1
20200177503 Hooda et al. Jun 2020 A1
20200177550 Valluri et al. Jun 2020 A1
20200177629 Hooda et al. Jun 2020 A1
20200186471 Shen et al. Jun 2020 A1
20200195557 Duan et al. Jun 2020 A1
20200204460 Schneider et al. Jun 2020 A1
20200213212 Dillon et al. Jul 2020 A1
20200213224 Cheng et al. Jul 2020 A1
20200218558 Sreenath et al. Jul 2020 A1
20200235990 Janakiraman et al. Jul 2020 A1
20200235999 Mayya et al. Jul 2020 A1
20200236046 Jain et al. Jul 2020 A1
20200241927 Yang et al. Jul 2020 A1
20200244721 S et al. Jul 2020 A1
20200252234 Ramamoorthi et al. Aug 2020 A1
20200259700 Bhalla et al. Aug 2020 A1
20200267184 Vera-Schockner Aug 2020 A1
20200267203 Jindal et al. Aug 2020 A1
20200280587 Janakiraman et al. Sep 2020 A1
20200287819 Theogaraj et al. Sep 2020 A1
20200287976 Theogaraj et al. Sep 2020 A1
20200296011 Jain et al. Sep 2020 A1
20200296026 Michael et al. Sep 2020 A1
20200301764 Thoresen et al. Sep 2020 A1
20200314006 Mackie et al. Oct 2020 A1
20200314614 Moustafa et al. Oct 2020 A1
20200322230 Natal et al. Oct 2020 A1
20200322287 Connor et al. Oct 2020 A1
20200336336 Sethi et al. Oct 2020 A1
20200344089 Motwani et al. Oct 2020 A1
20200344143 Faseela et al. Oct 2020 A1
20200344163 Gupta et al. Oct 2020 A1
20200351188 Arora et al. Nov 2020 A1
20200358878 Bansal et al. Nov 2020 A1
20200366530 Mukundan et al. Nov 2020 A1
20200366562 Mayya et al. Nov 2020 A1
20200366611 Kommula Nov 2020 A1
20200382345 Zhao et al. Dec 2020 A1
20200382387 Pasupathy et al. Dec 2020 A1
20200403821 Dev et al. Dec 2020 A1
20200412483 Tan et al. Dec 2020 A1
20200412576 Kondapavuluru et al. Dec 2020 A1
20200413283 Shen et al. Dec 2020 A1
20210006482 Hwang et al. Jan 2021 A1
20210006490 Michael et al. Jan 2021 A1
20210021538 Meck et al. Jan 2021 A1
20210029019 Kottapalli Jan 2021 A1
20210029088 Mayya et al. Jan 2021 A1
20210036888 Makkalla et al. Feb 2021 A1
20210036987 Mishra et al. Feb 2021 A1
20210037159 Shimokawa Feb 2021 A1
20210049191 Masson et al. Feb 2021 A1
20210067372 Cidon et al. Mar 2021 A1
20210067373 Cidon et al. Mar 2021 A1
20210067374 Cidon et al. Mar 2021 A1
20210067375 Cidon et al. Mar 2021 A1
20210067407 Cidon et al. Mar 2021 A1
20210067427 Cidon et al. Mar 2021 A1
20210067442 Sundararajan et al. Mar 2021 A1
20210067461 Cidon et al. Mar 2021 A1
20210067464 Cidon et al. Mar 2021 A1
20210067467 Cidon et al. Mar 2021 A1
20210067468 Cidon et al. Mar 2021 A1
20210073001 Rogers et al. Mar 2021 A1
20210092062 Dhanabalan et al. Mar 2021 A1
20210099360 Parsons et al. Apr 2021 A1
20210105199 H et al. Apr 2021 A1
20210111998 Saavedra Apr 2021 A1
20210112034 Sundararajan et al. Apr 2021 A1
20210126830 R et al. Apr 2021 A1
20210126853 Ramaswamy et al. Apr 2021 A1
20210126854 Guo et al. Apr 2021 A1
20210126860 Ramaswamy Apr 2021 A1
20210144091 H et al. May 2021 A1
20210160169 Shen et al. May 2021 A1
20210160813 Gupta et al. May 2021 A1
20210176255 Hill et al. Jun 2021 A1
20210184952 Mayya et al. Jun 2021 A1
20210184966 Ramaswamy Jun 2021 A1
20210184983 Ramaswamy Jun 2021 A1
20210194814 Roux et al. Jun 2021 A1
20210226880 Ramamoorthy et al. Jul 2021 A1
20210234728 Cidon et al. Jul 2021 A1
20210234775 Devadoss et al. Jul 2021 A1
20210234786 Devadoss et al. Jul 2021 A1
20210234804 Devadoss et al. Jul 2021 A1
20210234805 Devadoss et al. Jul 2021 A1
20210235312 Devadoss et al. Jul 2021 A1
20210235313 Devadoss et al. Jul 2021 A1
20210266262 Subramanian et al. Aug 2021 A1
20210279069 Salgaonkar et al. Sep 2021 A1
20210314289 Chandrashekhar et al. Oct 2021 A1
20210314385 Pande et al. Oct 2021 A1
20210328835 Mayya et al. Oct 2021 A1
20210336880 Gupta et al. Oct 2021 A1
20210377109 Shrivastava et al. Dec 2021 A1
20210377156 Michael et al. Dec 2021 A1
20210392060 Silva et al. Dec 2021 A1
20210392070 Tootaghaj et al. Dec 2021 A1
20210399920 Sundararajan et al. Dec 2021 A1
20210399978 Michael et al. Dec 2021 A9
20210400113 Markuze et al. Dec 2021 A1
20210400512 Agarwal et al. Dec 2021 A1
20210409277 Jeuk et al. Dec 2021 A1
20220006726 Michael et al. Jan 2022 A1
20220006751 Ramaswamy et al. Jan 2022 A1
20220006756 Ramaswamy Jan 2022 A1
20220029902 Shemer et al. Jan 2022 A1
20220035673 Markuze et al. Feb 2022 A1
20220038370 Vasseur et al. Feb 2022 A1
20220038557 Markuze et al. Feb 2022 A1
20220045927 Liu et al. Feb 2022 A1
20220052928 Sundararajan et al. Feb 2022 A1
20220061059 Dunsmore et al. Feb 2022 A1
20220086035 Devaraj et al. Mar 2022 A1
20220094644 Cidon et al. Mar 2022 A1
20220123961 Mukundan et al. Apr 2022 A1
20220131740 Mayya et al. Apr 2022 A1
20220131807 Srinivas et al. Apr 2022 A1
20220131898 Hooda et al. Apr 2022 A1
20220141184 Oswal et al. May 2022 A1
20220158923 Ramaswamy et al. May 2022 A1
20220158924 Ramaswamy et al. May 2022 A1
20220158926 Wennerström et al. May 2022 A1
20220166713 Markuze et al. May 2022 A1
20220191719 Roy Jun 2022 A1
20220198229 López et al. Jun 2022 A1
20220210035 Hendrickson et al. Jun 2022 A1
20220210041 Gandhi et al. Jun 2022 A1
20220210042 Gandhi et al. Jun 2022 A1
20220210122 Levin et al. Jun 2022 A1
20220217015 Vuggrala et al. Jul 2022 A1
20220231949 Ramaswamy et al. Jul 2022 A1
20220231950 Ramaswamy et al. Jul 2022 A1
20220232411 Vijayakumar et al. Jul 2022 A1
20220239596 Kumar et al. Jul 2022 A1
20220294701 Mayya et al. Sep 2022 A1
20220335027 Seshadri et al. Oct 2022 A1
20220337553 Mayya et al. Oct 2022 A1
20220353152 Ramaswamy Nov 2022 A1
20220353171 Ramaswamy et al. Nov 2022 A1
20220353175 Ramaswamy et al. Nov 2022 A1
20220353182 Ramaswamy et al. Nov 2022 A1
20220353190 Ramaswamy et al. Nov 2022 A1
20220360500 Ramaswamy et al. Nov 2022 A1
20220407773 Kempanna et al. Dec 2022 A1
20220407774 Kempanna et al. Dec 2022 A1
20220407790 Kempanna et al. Dec 2022 A1
20220407820 Kempanna et al. Dec 2022 A1
20220407915 Kempanna et al. Dec 2022 A1
20230006929 Mayya et al. Jan 2023 A1
20230025586 Rolando et al. Jan 2023 A1
20230026330 Rolando et al. Jan 2023 A1
20230026865 Rolando et al. Jan 2023 A1
20230028872 Ramaswamy Jan 2023 A1
20230039869 Ramaswamy et al. Feb 2023 A1
20230041916 Zhang et al. Feb 2023 A1
20230054961 Ramaswamy et al. Feb 2023 A1
20230105680 Simlai et al. Apr 2023 A1
20230121871 Mayya et al. Apr 2023 A1
20230156826 Palermo May 2023 A1
20230179445 Cidon et al. Jun 2023 A1
20230179502 Ramaswamy et al. Jun 2023 A1
20230179521 Markuze et al. Jun 2023 A1
20230179543 Cidon et al. Jun 2023 A1
20230216768 Zohar et al. Jul 2023 A1
20230216801 Markuze et al. Jul 2023 A1
20230216804 Zohar et al. Jul 2023 A1
20230221874 Markuze et al. Jul 2023 A1
20230224356 Markuze et al. Jul 2023 A1
20230224759 Ramaswamy Jul 2023 A1
20230231845 Manoharan et al. Jul 2023 A1
20230239234 Zohar et al. Jul 2023 A1
20230261974 Ramaswamy Aug 2023 A1
20240031264 Nigam Jan 2024 A1
Foreign Referenced Citations (52)
Number Date Country
1926809 Mar 2007 CN
102577270 Jul 2012 CN
102811165 Dec 2012 CN
104956329 Sep 2015 CN
106656847 May 2017 CN
106998284 Aug 2017 CN
110447209 Nov 2019 CN
111198764 May 2020 CN
116783874 Sep 2023 CN
117178535 Dec 2023 CN
1912381 Apr 2008 EP
2538637 Dec 2012 EP
2763362 Aug 2014 EP
3041178 Jul 2016 EP
3297211 Mar 2018 EP
3509256 Jul 2019 EP
3346650 Nov 2019 EP
106230650 Dec 2016 IN
2002368792 Dec 2002 JP
2010233126 Oct 2010 JP
2014200010 Oct 2014 JP
2017059991 Mar 2017 JP
2017524290 Aug 2017 JP
20170058201 May 2017 KR
2574350 Feb 2016 RU
03073701 Sep 2003 WO
2005071861 Aug 2005 WO
2007016834 Feb 2007 WO
2012167184 Dec 2012 WO
2015092565 Jun 2015 WO
2016061546 Apr 2016 WO
2016123314 Aug 2016 WO
2017083975 May 2017 WO
2019070611 Apr 2019 WO
2019094522 May 2019 WO
2020012491 Jan 2020 WO
2020018704 Jan 2020 WO
2020091777 May 2020 WO
2020101922 May 2020 WO
2020112345 Jun 2020 WO
2021040934 Mar 2021 WO
2021118717 Jun 2021 WO
2021150465 Jul 2021 WO
2021211906 Oct 2021 WO
2022005607 Jan 2022 WO
2022082680 Apr 2022 WO
2022154850 Jul 2022 WO
2022159156 Jul 2022 WO
2022231668 Nov 2022 WO
2022235303 Nov 2022 WO
2022265681 Dec 2022 WO
2023009159 Feb 2023 WO
Non-Patent Literature Citations (58)
Non-Published Commonly Owned U.S. Appl. No. 18/211,568, filed Jun. 19, 2023, 37 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/222,864, filed Jul. 17, 2023, 350 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/222,868, filed Jul. 17, 2023, 22 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/224,466, filed Jul. 20, 2023, 56 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/235,879, filed Aug. 20, 2023, 173 pages, VMware, Inc.
Alsaeedi, Mohammed, et al., “Toward Adaptive and Scalable OpenFlow-SDN Flow Control: A Survey,” IEEE Access, Aug. 1, 2019, 34 pages, vol. 7, IEEE, retrieved from https://ieeexplore.ieee.org/document/8784036.
Alvizu, Rodolfo, et al., “SDN-Based Network Orchestration for New Dynamic Enterprise Networking Services,” 2017 19th International Conference on Transparent Optical Networks, Jul. 2-6, 2017, 4 pages, IEEE, Girona, Spain.
Author Unknown, “VeloCloud Administration Guide: VMware SD-WAN by VeloCloud 3.3,” Month Unknown 2019, 366 pages, VMware, Inc., Palo Alto, CA, USA.
Barozet, Jean-Marc, “Cisco SD-WAN as a Managed Service,” BRKRST-2558, Jan. 27-31, 2020, 98 pages, Cisco, Barcelona, Spain, retrieved from https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2020/pdf/BRKRST-2558.pdf.
Barozet, Jean-Marc, “Cisco SDWAN,” Deep Dive, Dec. 2017, 185 pages, Cisco, retrieved from https://www.coursehero.com/file/71671376/Cisco-SDWAN-Deep-Divepdf/.
Bertaux, Lionel, et al., “Software Defined Networking and Virtualization for Broadband Satellite Networks,” IEEE Communications Magazine, Mar. 18, 2015, 7 pages, vol. 53, IEEE, retrieved from https://ieeexplore.ieee.org/document/7060482.
Cox, Jacob H., et al., “Advancing Software-Defined Networks: A Survey,” IEEE Access, Oct. 12, 2017, 40 pages, vol. 5, IEEE, retrieved from https://ieeexplore.ieee.org/document/8066287.
Del Piccolo, Valentin, et al., “A Survey of Network Isolation Solutions for Multi-Tenant Data Centers,” IEEE Communications Society, Apr. 20, 2016, vol. 18, No. 4, 37 pages, IEEE.
Duan, Zhenhai, et al., “Service Overlay Networks: SLAs, QoS, and Bandwidth Provisioning,” IEEE/ACM Transactions on Networking, Dec. 2003, 14 pages, vol. 11, IEEE, New York, NY, USA.
Fortz, Bernard, et al., “Internet Traffic Engineering by Optimizing OSPF Weights,” Proceedings IEEE Infocom 2000, Conference on Computer Communications, Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Mar. 26-30, 2000, 11 pages, IEEE, Tel Aviv, Israel.
Francois, Frederic, et al., “Optimizing Secure SDN-enabled Inter-Data Centre Overlay Networks through Cognitive Routing,” 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Sep. 19-21, 2016, 10 pages, IEEE, London, UK.
Funabiki, Nobuo, et al., “A Frame Aggregation Extension of Routing Algorithm for Wireless Mesh Networks,” 2014 Second International Symposium on Computing and Networking, Dec. 10-12, 2014, 5 pages, IEEE, Shizuoka, Japan.
Guo, Xiangyi, et al., (U.S. Appl. No. 62/925,193), filed Oct. 23, 2019, 26 pages.
Huang, Cancan, et al., “Modification of Q.SD-WAN,” Rapporteur Group Meeting—Doc, Study Period 2017-2020, Q4/11-DOC1 (190410), Study Group 11, Apr. 10, 2019, 19 pages, International Telecommunication Union, Geneva, Switzerland.
Jivorasetkul, Supalerk, et al., “End-to-End Header Compression over Software-Defined Networks: a Low Latency Network Architecture,” 2012 Fourth International Conference on Intelligent Networking and Collaborative Systems, Sep. 19-21, 2012, 2 pages, IEEE, Bucharest, Romania.
Lasserre, Marc, et al., “Framework for Data Center (DC) Network Virtualization,” RFC 7365, Oct. 2014, 26 pages, IETF.
Li, Shengru, et al., “Source Routing with Protocol-oblivious Forwarding (POF) to Enable Efficient e-Health Data Transfers,” 2016 IEEE International Conference on Communications (ICC), May 22-27, 2016, 6 pages, IEEE, Kuala Lumpur, Malaysia.
Lin, Weidong, et al., “Using Path Label Routing in Wide Area Software-Defined Networks with Open Flow,” 2016 International Conference on Networking and Network Applications, Jul. 2016, 6 pages, IEEE.
Long, Feng, “Research and Application of Cloud Storage Technology in University Information Service,” Chinese Excellent Masters' Theses Full-text Database, Mar. 2013, 72 pages, China Academic Journals Electronic Publishing House, China.
Michael, Nithin, et al., “HALO: Hop-by-Hop Adaptive Link-State Optimal Routing,” IEEE/ACM Transactions on Networking, Dec. 2015, 14 pages, vol. 23, No. 6, IEEE.
Ming, Gao, et al., “A Design of SD-WAN-Oriented Wide Area Network Access,” 2020 International Conference on Computer Communication and Network Security (CCNS), Aug. 21-23, 2020, 4 pages, IEEE, Xi'an, China.
Mishra, Mayank, et al., “Managing Network Reservation for Tenants in Oversubscribed Clouds,” 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, Aug. 14-16, 2013, 10 pages, IEEE, San Francisco, CA, USA.
Mudigonda, Jayaram, et al., “NetLord: A Scalable Multi-Tenant Network Architecture for Virtualized Datacenters,” Proceedings of the ACM SIGCOMM 2011 Conference, Aug. 15-19, 2011, 12 pages, ACM, Toronto, Canada.
Non-Published Commonly Owned U.S. Appl. No. 17/574,225, filed Jan. 12, 2022, 56 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/574,236, filed Jan. 12, 2022, 54 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/833,555, filed Jun. 6, 2022, 34 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/833,566, filed Jun. 6, 2022, 35 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/966,814, filed Oct. 15, 2022, 176 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/966,820, filed Oct. 15, 2022, 26 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 17/976,717, filed Oct. 28, 2022, 37 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/088,554, filed Dec. 24, 2022, 34 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/088,555, filed Dec. 24, 2022, 35 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/088,556, filed Dec. 24, 2022, 27 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/096,001, filed Jan. 11, 2023, 34 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/100,369, filed Jan. 23, 2023, 55 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/100,381, filed Jan. 23, 2023, 55 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/100,397, filed Jan. 23, 2023, 55 pages, VMware, Inc.
Non-Published Commonly Owned Related U.S. Appl. No. 18/126,989 with similar specification, filed Mar. 27, 2023, 83 pages, VMware, Inc.
Non-Published Commonly Owned Related U.S. Appl. No. 18/126,991 with similar specification, filed Mar. 27, 2023, 84 pages, VMware, Inc.
Non-Published Commonly Owned Related U.S. Appl. No. 18/126,992 with similar specification, filed Mar. 27, 2023, 84 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/137,584, filed Apr. 21, 2023, 57 pages, VMware, Inc.
Non-Published Commonly Owned U.S. Appl. No. 18/197,090, filed May 14, 2023, 36 pages, Nicira, Inc.
Noormohammadpour, Mohammad, et al., “DCRoute: Speeding up Inter-Datacenter Traffic Allocation while Guaranteeing Deadlines,” 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), Dec. 19-22, 2016, 9 pages, IEEE, Hyderabad, India.
Ray, Saikat, et al., “Always Acyclic Distributed Path Computation,” University of Pennsylvania Department of Electrical and Systems Engineering Technical Report, May 2008, 16 pages, University of Pennsylvania ScholarlyCommons.
Sarhan, Soliman Abd Elmonsef, et al., “Data Inspection in SDN Network,” 2018 13th International Conference on Computer Engineering and Systems (ICCES), Dec. 18-19, 2018, 6 pages, IEEE, Cairo, Egypt.
Taleb, Tarik, “D4.1 Mobile Network Cloud Component Design,” Mobile Cloud Networking, Nov. 8, 2013, 210 pages, MobileCloud Networking Consortium, retrieved from http://www.mobile-cloud-networking.eu/site/index.php?process=download&id=127&code=89d30565cd2ce087d3f8e95f9ad683066510a61f.
Tootaghaj, Diman Zad, et al., “Homa: An Efficient Topology and Route Management Approach in SD-WAN Overlays,” IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, Jul. 6-9, 2020, 10 pages, IEEE, Toronto, ON, Canada.
Valtulina, Luca, “Seamless Distributed Mobility Management (DMM) Solution in Cloud Based LTE Systems,” Master Thesis, Nov. 2013, 168 pages, University of Twente, retrieved from http://essay.utwente.nl/64411/1/Luca_Valtulina_MSc_Report_final.pdf.
Webb, Kevin C., et al., “Blender: Upgrading Tenant-Based Data Center Networking,” 2014 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Oct. 20-21, 2014, 11 pages, IEEE, Marina del Rey, CA, USA.
Xie, Junfeng, et al., “A Survey of Machine Learning Techniques Applied to Software Defined Networking (SDN): Research Issues and Challenges,” IEEE Communications Surveys & Tutorials, Aug. 23, 2018, 38 pages, vol. 21, Issue 1, IEEE.
Yap, Kok-Kiong, et al., “Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering,” SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, Aug. 21-25, 2017, 14 pages, Los Angeles, CA.
Zakurdaev, Gieorgi, et al., “Dynamic On-Demand Virtual Extensible LAN Tunnels via Software-Defined Wide Area Networks,” 2022 IEEE 12th Annual Computing and Communication Workshop and Conference, Jan. 26-29, 2022, 6 pages, IEEE, Las Vegas, NV, USA.
Non-Published Commonly Owned U.S. Appl. No. 15/803,964, filed Nov. 6, 2017, 15 pages, The Mode Group.