Claims
- 1. An integrated Overload Monitoring System (OMS) for Critical Service Level Agreement (CSLA) Identification based on bandwidth (BW) and Near-Optimal Traffic Analysis for Forecasting BW-related CSLA Operator Violations at a future time point, comprising:
a) a subsystem for the identification of critical SLAs from the data obtained from external repositories, wherein a critical SLA is an SLA that is of strategic importance to an operator and hence requires monitoring;
b) a subsystem to regionalize (cluster) the nodes in the network using hierarchical clustering based on similar load behavior, wherein the nodes under consideration are the nodes in a provider's network;
c) a Forecast Model (FM) selection subsystem to select horizontal and vertical usage pattern models based on the usage pattern exhibited in the historical data;
d) a subsystem to determine the overall load due to critical SLAs by determining a Usage Upper Bound (UUB) for all non-critical SLAs at periodic time intervals and further subtracting the load due to non-critical SLAs from the overall load on the network;
e) a subsystem for offline SLA load prediction to predict the load due to a critical SLA at a future time point based on historical traffic data using vertical FM;
f) a plurality of universal network probes that execute the role of master and/or slave probes to perform a near-optimal analysis of the network traffic for critical SLAs;
g) a subsystem for network traffic analysis by the network probes based on two distinct configurable clocks with different frequencies to minimize the monitor-data flow across the network;
h) a subsystem for real-time SLA load prediction to predict the load due to critical SLAs at a future time point based on current traffic data using horizontal FM; and
i) a subsystem to predict alarm set points based on horizontal and vertical FMs, to generate alarms based on alarm set point consistency, and further to escalate an alarm if no acknowledgement is received for it.
- 2. The system of claim 1, wherein said CSLA subsystem comprises a procedure to determine critical SLAs for monitoring and forecasting purposes using:
a) an automatic SLA selection technique to identify SLAs based on an optimization procedure (O-CSLAs);
b) past violation history of SLAs (V-CSLAs);
c) manual selection based on operator preference (M-CSLAs); and
d) new SLAs (N-CSLAs).
- 3. The system of claim 2 further comprises an SLA load analysis procedure to compute the load due to the SLA over a predefined analysis time period (ATP) by partitioning the ATP into contiguous analysis time windows (ATWs), wherein the load due to the SLA is computed over the ATWs.
- 4. The system of claim 3, wherein said load analysis procedure computes the load for the SLA at each ATW by aggregating traffic (packets) specific to the SLA at any node based on the source time-stamp of packets and dividing by the ATW period.
- 5. The system of claim 3 further comprises a ranking procedure wherein the load due to the SLAs at each ATW is aggregated and ranked in decreasing order of load, and a configurable number of ATWs, termed sensitive ATWs (SATWs), are identified.
- 6. The system of claim 2 further comprises a three-step CSLA-Optimum set identification procedure to identify O-CSLAs.
- 7. The system of claim 6, wherein step 1 of said three-step procedure comprises an optimization procedure to associate a weightage with each SLA over SATWs, wherein the optimization procedure minimizes the variation in the load due to SLAs based on the peak load characteristics of the SLAs.
- 8. The system of claim 6, wherein step 2 of said three-step procedure applies a filtering procedure over each of the SATWs on each of the weighted SLAs to filter out SLAs that have crossed their Soft Upper Bound (SUB) in any one SATW, wherein SUBs are pre-agreed BW values as per the SLAs.
- 9. The system of claim 6, wherein step 3 of said three-step procedure determines the set of O-CSLAs by ranking the aggregated weights that pertain to an SLA across SATWs in increasing order, from which a configurable number of O-CSLAs are identified.
- 10. The system of claim 2 further comprises a CSLA-Violation set identification procedure to identify a configurable number of V-CSLAs, wherein CSLAs are selected based on the historical data associated with SLA violations and SLA-related complaints.
- 11. The system of claim 2 further comprises a CSLA-Manual set identification procedure to identify a configurable number of M-CSLAs, wherein CSLAs are selected manually based on operator preferences.
- 12. The system of claim 2 further comprises a CSLA-NewSLA set identification procedure to identify a configurable number of N-CSLAs based on the agreed-upon BW as per the SLA of the new subscribers.
- 13. The system of claim 2 further comprises means to obtain a final set of CSLAs as the union of the CSLAs obtained from the CSLA-Optimum set procedure, CSLA-Violation set procedure, CSLA-Manual set procedure and CSLA-NewSLA set procedure.
- 14. The system of claim 1, wherein said node regionalization subsystem comprises a procedure to regionalize nodes, wherein the nodes under consideration are the nodes in the provider's network.
- 15. The system of claim 14 further comprises a procedure to identify a contiguous region of nodes based on the number of hops from the identified seed nodes in the network.
- 16. The system of claim 15 further comprises a procedure to form SLA-contiguous regions for each SLA by extracting only those nodes that carry SLA-related traffic from the said contiguous regions.
- 17. The system of claim 16 further comprises a procedure to analyze the load due to an SLA over ATP by partitioning the traffic data into ATWs in each of these SLA-contiguous regions.
- 18. The system of claim 17 further comprises a procedure to compute load vectors for the said ATWs in each of the nodes in each of the said SLA-contiguous regions, wherein the node-related traffic data is characterized by the load vector.
- 19. The system of claim 18 further comprises a hierarchical clustering procedure, based on the similarity exhibited in the load pattern within an SLA-contiguous region, to form clusters of load vectors at various hierarchical levels.
- 20. The system of claim 19, wherein said hierarchical clustering procedure groups a collection of load vectors with similar mean characteristics into Similar Load Sub-Clusters (SLSCs) based on the correlation among the said load vectors, wherein the elements of each SLSC exhibit similar load pattern behavior.
- 21. The system of claim 20 further comprises a procedure to identify a node in the centroid's position of an SLSC as the Master Node (MN) and the other nodes in the SLSC as Slave Nodes (SNs).
- 22. The system of claim 19 further comprises a procedure to identify and combine all singleton clusters obtained from the said hierarchical clustering procedure into a Variable Load Sub-Cluster (VLSC).
- 23. The system of claim 22 further comprises a procedure to randomly identify a node as the Master Node for the VLSC and to identify the other nodes in the VLSC as Slave Nodes.
- 24. The system of claim 1, wherein said forecast model (FM) selection subsystem comprises a procedure to forecast the load due to an SLA based on the historical data.
- 25. The system of claim 24 further comprises a procedure that applies a suite of numerical forecasting models (SFM) over MaxTBOL and identifies the forecast model that generates the least prediction error based on the historical bandwidth utilization data of the SLA, wherein MaxTBOL is the maximum future time point of interest.
- 26. The system of claim 24 further comprises a forecast procedure that forecasts the load due to SLAs at DTP sub-intervals across a set of ATPs, wherein DTP is a sub-period of ATP.
- 27. The system of claim 26, wherein said forecast procedure comprises an Analysis Intervals Identification procedure based on correlation in SLA load behavior to identify Analysis Intervals (AIs) within each DTP over ATPs.
- 28. The system of claim 26 further comprises a procedure to identify FMs for an SLA using two procedures, namely, a Horizontal BFM Model Selection procedure and a Vertical BFM Model Selection procedure.
- 29. The system of claim 28, wherein said Horizontal BFM Model Selection procedure identifies FMs for an SLA using the corresponding actual data of the last DTP to generate predictions and further selects the FM that generates the least model error as the Best Fit Model (BFMh) for the AI.
- 30. The system of claim 28, wherein said Vertical BFM Model Selection procedure identifies FMs for an SLA using the corresponding actual data of previous DTPs to generate predictions and further selects the FM that generates the least model error as the Best Fit Model (BFMv) for the AI.
- 31. The system of claim 24 further comprises a procedure wherein consecutive AIs are merged into continuous intervals (CIs) based on least merge error (LME) when the number of AIs identified for the DTP is more than the pre-specified number of AIs.
- 32. The system of claim 31, wherein said procedure comprises a procedure wherein the merge error is computed by applying the identified BFMvs and BFMhs corresponding to AIs across consecutive AIs and selecting the model that generates the LME across the consecutive AIs as the BFM for the CI.
- 33. The system of claim 1, wherein said UUB subsystem comprises a procedure that determines the Usage Upper Bound (UUB) for a non-critical SLA (X-CSLA) by computing the load due to the SLA at ATWs over a pre-specified ATP.
- 34. The system of claim 33 further comprises a load analysis procedure to compute the load for an X-SLA at each ATW by aggregating traffic (packets) specific to the SLA at any node based on the source time-stamp of packets and dividing by the ATW period.
- 35. The system of claim 33 further comprises a procedure that forecasts the most probable load over DTP due to an X-SLA by analyzing its past load over multiple associated DTPs across ATPs, and further divides the X-SLA load over the DTP into H analysis sub-intervals.
- 36. The system of claim 33 further comprises a procedure to determine the peak load (UUB) at the H sub-intervals for all the X-CSLAs.
- 37. The system of claim 35, wherein said procedure is computed for each DTP associated with an ATP.
- 38. The system of claim 33 further consists of a procedure to determine the bandwidth hard upper bound (HUB) for CSLAs by computing the difference between the overall network capacity (NWC) and the sum of the forecasted UUBs for all the non-critical SLAs, wherein the overall network bandwidth is the network capacity determined based on network topology.
- 39. The system of claim 1, wherein said offline prediction subsystem comprises a procedure that predicts a network overload at a given TBOL time point and further checks if the load due to the SLA is below its SUB and the usage pattern of the SLA shows an upward trend at the pre-defined points around TBOL.
- 40. The system of claim 39 further generates an SLA tuple of 0s and 1s, wherein a tuple entry of 1 indicates confirmation of an operator SLA violation, and wherein the SLA tuple is used by the online component for generating operator violations.
- 41. The system of claim 1, wherein said UNP subsystem comprises a procedure to pre-install the universal network probe (UNP) in all the network nodes to aid in SLA pre-violation predictions.
- 42. The system of claim 41 further comprises a procedure that installs Slave Node Probes (SNPs) in Slave Nodes and Master Node Probes (MNPs) in Master Nodes, wherein the SNP collects traffic data related to its node and the MNP receives traffic data from SNPs in its SLSC, collects traffic data from its node, and forecasts the load due to the nodes in its SLSC.
- 43. The system of claim 1, wherein said parent probe subsystem comprises a procedure that creates unique parent probes for every critical SLA to forecast the load due to the CSLAs.
- 44. The system of claim 41 further comprises a procedure for network traffic analysis based on two distinct configurable clocks, namely, the master clock and the slave clock, wherein the Master Clock Frequency (MCF) is much higher than the Slave Clock Frequency (SCF) to aid in reducing the monitor-data flow across the network.
- 45. The system of claim 41 further comprises a procedure wherein the parent probe associated with a CSLA communicates a set of distinct parameters at ATP intervals to UNPs that need to monitor the traffic related to the CSLA.
- 46. The system of claim 41 further comprises a procedure wherein the parent probe associated with a CSLA communicates parameters such as probe type (Master), MCF interval and the CSLA-related information to the MNPs for the CSLA as per claim 45 at ATP intervals.
- 47. The system of claim 41 further comprises a procedure wherein the parent probe associated with an SLA communicates parameters such as probe type (Slave), the associated MN, SCF and the CSLA-related information to the SNPs for the SLA as per claim 45 at ATP intervals.
- 48. The system of claim 41 further comprises a procedure wherein an SNP associated with a CSLA filters and computes the load on the SN and communicates the load data to the associated MNP at SCF intervals.
- 49. The system of claim 41 further comprises a procedure wherein the MNP associated with a CSLA filters and computes the load on the MN at MCF intervals.
- 50. The system of claim 41 further comprises a procedure wherein the MNP estimates the load on the nodes in the associated SLSC at MCF based on the received past traffic data from the SNPs at SCF and its own computed load at MCF.
- 51. The system of claim 41 further comprises a procedure wherein the MNP communicates the associated SLSC forecasted load to the parent probe at MCF.
- 52. The system of claim 41 further comprises a procedure wherein the parent probe communicates the same master clock frequency to both slave and master nodes of a VLSC.
- 53. The system of claim 41 further comprises a procedure wherein the MNP computes the load on the nodes in the associated VLSC at MCF based on the received past traffic data from the SNPs at MCF and its own computed load at MCF.
- 54. The system of claim 43 further comprises a procedure that uses a Time Before OverLoad (TBOL) point, wherein TBOL is the future time point at which load on the network due to critical SLAs is forecasted.
- 55. The system of claim 43 further comprises a procedure wherein the parent probe forecasts the load due to its associated CSLA at each MCF, wherein the forecast is for a set of Time Points Of Prediction (TTPP) around TBOL and the forecast is based on the horizontal prediction model (BFMh) generated during offline analysis for the TBOL point.
- 56. The system of claim 43 further comprises a trending procedure wherein a parent probe analyzes the forecasted load associated with its CSLA and identifies the underlying trend (an upward or downward tendency) of the forecasted load pattern.
- 57. The system of claim 1, wherein said Alarm Manager Component subsystem comprises a procedure that receives the forecasted load and the trend information related to CSLAs from parent probes.
- 58. The system of claim 57 further comprises an Alarm Prediction Module, wherein the Alarm Prediction Module, based on the information received from the parent probes, computes the load due to all critical SLAs and determines an overload condition when the forecasted load exceeds HUB.
- 59. The system of claim 58, wherein said Alarm Prediction Module further determines network overload at a given TBOL time point and checks if the load due to the SLA is below its SUB and the usage pattern of the SLA shows an upward trend at all the pre-defined points around TBOL.
- 60. The system of claim 58, wherein said Alarm Prediction Module further generates an Online CSLA tuple of 1s if the condition in the system of claim 59 is met.
- 61. The system of claim 58, wherein said Alarm Prediction Module further forms an Alarm Tuple by combining the said Online CSLA tuple of 1s with the CSLA tuple generated during offline analysis, wherein both tuples correspond to the same TBOL time point.
- 62. The system of claim 58, wherein said Alarm Prediction Module further predicts an Alarm Set Point (ASP) for the CSLA when the number of 1s in the Alarm Tuple exceeds a pre-specified value.
- 63. The system of claim 57 further comprises an Alarm Confirmation Module that confirms the ASP, when there is a consistent ASP at a pre-specified number of successive MCF intervals, by setting an Operator Violation Alarm (OVA).
- 64. The system of claim 57 further comprises an Alarm Generation & Escalation Module that associates an alarm escalation level to the said pre-specified candidate operators.
- 65. The system of claim 64, wherein said Alarm Generation & Escalation Module further checks the health of the network after a pre-specified MTTR (Mean Time To Repair) interval if it receives an acknowledgement for the OVA alarm.
- 66. The system of claim 64, wherein said Alarm Generation & Escalation Module further generates SLA Recovery Notifications (SRNs), wherein SRNs are generated if the health of the network after a pre-specified MTTR is in the recovered state.
- 67. The system of claim 64, wherein said Alarm Generation & Escalation Module further re-generates the said OVA alarms if the health of the network after a pre-specified MTTR is not in the recovered state.
- 68. The system of claim 64 further comprises an alarm escalation procedure that escalates the alarm to the next level of the pre-specified candidate operators after a pre-specified Acknowledgement Time (AT) if an acknowledgement is not received for the generated OVA alarm within AT.
- 69. An apparatus for CSLA identification based on bandwidth and near-optimal traffic analysis for forecasting bandwidth-related CSLA operator violations at a future time point, comprising:
a) an offline computer system to execute offline procedures; and
b) an online computer system to execute online procedures and network probes in network element systems that are part of a provider's network.
- 70. The system of claim 69, wherein said offline procedures consist of a procedure for critical SLA identification, a procedure for regionalization of network nodes, a procedure for historical data based selection of forecast models, a procedure for computing overall load due to CSLAs and a procedure for offline operator SLA violation prediction at a future time point based on vertical FM.
- 71. The system of claim 69, wherein said online procedures consist of a procedure for creation of unique probes for each CSLA, a procedure in a network element system to perform traffic analysis at master clock frequencies, a procedure in a network element system to perform traffic analysis at slave clock frequencies, a procedure for activation of master and slave probes in network element systems, a procedure to forecast CSLA load at a future time point, a procedure to predict real-time operator SLA violations, a procedure to check operator violation prediction consistency, and a procedure to generate and escalate operator violation alarms.
- 72. An apparatus for CSLA identification based on bandwidth and near-optimal traffic analysis for forecasting bandwidth-related CSLA operator violations at a future time point, coupled to a communication system for communicating a plurality of information comprising:
a) offline computational results related to CSLA identification and regionalization at ATP intervals to the online computer system that is part of the said apparatus;
b) offline computational results related to vertical FM based load forecast and CSLA load computation at DTP intervals to the online computer system that is part of the said apparatus; and
c) traffic analysis results at MCF intervals from network element systems that are part of a provider's network to the online computer system that is part of the said apparatus.
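By way of illustration only, the load analysis and ranking recited in claims 3 to 5 might be sketched as follows. The function names, the packet representation as (source time-stamp, size in bits) pairs, and the use of seconds as the time unit are assumptions made for this sketch and do not appear in the claims.

```python
def load_per_atw(packets, atp_start, atw_seconds, num_atws):
    """Return the load (bits per second) of one SLA in each ATW.

    packets: iterable of (source_timestamp_seconds, size_bits) pairs,
    already filtered to the SLA of interest (hypothetical input shape).
    """
    totals = [0.0] * num_atws
    for timestamp, size_bits in packets:
        index = int((timestamp - atp_start) // atw_seconds)
        if 0 <= index < num_atws:  # ignore packets outside the ATP
            totals[index] += size_bits
    # Dividing the aggregated traffic by the ATW period gives a rate.
    return [total / atw_seconds for total in totals]


def sensitive_atws(loads_by_sla, count):
    """Return indices of the `count` ATWs with the highest aggregate load.

    loads_by_sla: dict mapping an SLA identifier to its per-ATW loads.
    """
    num_atws = len(next(iter(loads_by_sla.values())))
    aggregate = [
        sum(loads[i] for loads in loads_by_sla.values())
        for i in range(num_atws)
    ]
    # Rank ATWs in decreasing order of aggregated load.
    ranked = sorted(range(num_atws), key=lambda i: aggregate[i], reverse=True)
    return ranked[:count]
```

In this sketch the configurable number of SATWs is the `count` argument; the ATW indices returned are positions within the ATP.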
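The grouping of nodes recited in claims 19 to 23 could, under simplifying assumptions, be sketched as below. The sketch is not the claimed hierarchical clustering procedure: it substitutes a single-level, single-linkage grouping at one fixed correlation threshold. The threshold value, the function names, and the choice of Euclidean distance to locate the node nearest the centroid are all assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length load vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def cluster_nodes(load_vectors, threshold=0.9):
    """Split nodes into SLSCs (correlated groups) and one VLSC (singletons)."""
    nodes = list(load_vectors)
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    # Link every pair of nodes whose load vectors are strongly correlated.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if pearson(load_vectors[a], load_vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    slscs = [members for members in groups.values() if len(members) > 1]
    vlsc = [members[0] for members in groups.values() if len(members) == 1]
    return slscs, vlsc


def master_of_slsc(members, load_vectors):
    """Pick the node whose load vector lies closest to the SLSC centroid."""
    dim = len(load_vectors[members[0]])
    centroid = [
        sum(load_vectors[m][i] for m in members) / len(members)
        for i in range(dim)
    ]
    return min(members, key=lambda m: math.dist(load_vectors[m], centroid))


def master_of_vlsc(members, rng=random):
    """Pick the VLSC master at random, as recited in claim 23."""
    return rng.choice(members)
```

All nodes other than the selected master in a sub-cluster would then act as Slave Nodes.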
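The best-fit model selection recited in claims 25, 29 and 30 may be illustrated by the following minimal sketch. The claims do not enumerate the suite of forecasting models; the three models shown (last value, moving average, least-squares linear trend), the mean absolute error metric, and all names are assumptions chosen only to make the selection step concrete.

```python
def naive(history, steps):
    """Repeat the last observed load."""
    return [history[-1]] * steps


def moving_average(history, steps, window=3):
    """Repeat the mean of the most recent `window` observations."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * steps


def linear_trend(history, steps):
    """Extrapolate a least-squares line (needs at least two observations)."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    denom = sum((x - mean_x) ** 2 for x in range(n))
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(history)
    ) / denom
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical suite of forecast models (SFM) for this sketch.
SUITE = {
    "naive": naive,
    "moving_average": moving_average,
    "linear_trend": linear_trend,
}


def mean_abs_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def select_best_fit(history, actual, suite=SUITE):
    """Return (model name, error) for the model with the least error."""
    errors = {
        name: mean_abs_error(model(history, len(actual)), actual)
        for name, model in suite.items()
    }
    best = min(errors, key=errors.get)
    return best, errors[best]
```

Under this reading, the horizontal selection (BFMh) would pass the samples of the last DTP as `history`, while the vertical selection (BFMv) would pass the samples of the same AI taken from previous DTPs; the selection step itself is identical.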
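The arithmetic of claims 36 and 38 can be illustrated with a short sketch. The forecasting step of claim 35 is replaced here by a plain maximum over past DTPs, which is an assumption rather than the claimed procedure; the function names and the equal-length sample layout are likewise hypothetical.

```python
def uub_per_subinterval(dtp_loads, h):
    """Peak load of one non-critical SLA in each of H sub-intervals.

    dtp_loads: list of load series, one per past DTP, all the same length.
    The peak over past DTPs stands in for the forecast of claim 35.
    """
    samples = len(dtp_loads[0])
    size = samples // h
    bounds = []
    for j in range(h):
        start = j * size
        # The last sub-interval absorbs any leftover samples.
        end = samples if j == h - 1 else start + size
        bounds.append(max(max(series[start:end]) for series in dtp_loads))
    return bounds


def hub_per_subinterval(network_capacity, uubs_by_sla):
    """HUB for CSLAs: network capacity minus the summed non-critical UUBs."""
    h = len(next(iter(uubs_by_sla.values())))
    return [
        network_capacity - sum(uubs[j] for uubs in uubs_by_sla.values())
        for j in range(h)
    ]
```

For example, with a network capacity of 100 units and two non-critical SLAs whose UUBs over two sub-intervals are [2, 5] and [10, 20], the HUB available to critical SLAs is 88 and 75 units respectively.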
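One possible reading of the alarm logic in claims 58 to 63 and 68 is sketched below. The claims do not specify how the online and offline tuples are combined or how the trend is evaluated; the sketch assumes concatenation of the two tuples and a point-to-point comparison for the upward trend. All names and parameters are hypothetical.

```python
def online_tuple(total_forecast, sla_forecast, current_sla_load, hub, sub):
    """Build one 0/1 entry per prediction point around TBOL.

    An entry is 1 when the total CSLA load exceeds HUB while this SLA is
    still below its SUB and its forecasted load is rising (assumed test).
    """
    entries = []
    previous = current_sla_load
    for total, load in zip(total_forecast, sla_forecast):
        overloaded = total > hub
        below_sub = load < sub
        rising = load > previous
        entries.append(1 if overloaded and below_sub and rising else 0)
        previous = load
    return tuple(entries)


def alarm_set_point(online, offline, min_ones):
    """Predict an ASP when the Alarm Tuple holds more than `min_ones` 1s."""
    alarm_tuple = tuple(online) + tuple(offline)  # assumed combination rule
    return sum(alarm_tuple) > min_ones


def confirm_ova(asp_history, required_consecutive):
    """Confirm an OVA when the ASP held at the last N MCF intervals."""
    recent = asp_history[-required_consecutive:]
    return len(recent) == required_consecutive and all(recent)


def escalation_level(elapsed_since_alarm, ack_time, num_levels):
    """Level of operator to notify while no acknowledgement has arrived."""
    return min(int(elapsed_since_alarm // ack_time), num_levels - 1)
```

In this sketch, each elapsed Acknowledgement Time without an acknowledgement moves the alarm one level up the list of candidate operators, stopping at the highest level.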
FIELD OF INVENTION
[0001] The present invention relates to real-time monitoring of SLAs in general and, more particularly, to monitoring a subset of SLAs. Still more particularly, the present invention relates to a system and method for prediction of operator SLA violation based on monitor-data.