The present invention relates to methods for accurate downtime calculation; more particularly, to methods for downtime calculation of systems with a stable baseline signal, especially distributed ledger systems, in which block height increments are monitored as input data to determine the downtime interval and total downtime.
A service-level objective (SLO) is a key element of a service-level agreement (SLA) between a service provider and a customer. SLOs are agreed upon as a means of measuring the performance of the service provider and are outlined as a way of avoiding disputes between the two parties owing to misunderstanding. Several types of measures are frequently applied in SLOs, such as availability, service desk response, response time, etc. Among them, availability is the degree to which a system or equipment is in a specified operable and committable state at the start of a mission when the mission is called. Therefore, for setting feasible SLOs, it is crucial for a service provider to accurately measure the availability of the provided services.
Downtime refers to periods when a system is unavailable or offline, usually because the system fails to function owing to an unplanned event or routine maintenance. Measuring system downtime is a direct way to determine the availability of the system. For a system that normally operates with a stable baseline for specific signals (e.g. the voltage in a power supply), system downtime can be measured by monitoring the signal from the system, and downtime can be declared when the signal deviates from the baseline and does not recover within a predetermined short period.
With the development of blockchain techniques, the clearance and settlement of digital property using blockchain consensus in a network within a business alliance has become available, as provided in International Application PCT/US2017/012635 filed on Jan. 6, 2017. This method, however, requires a majority of nodes to operate normally in order to reach a consensus on a transaction. To evaluate the availability of this service, a probabilistic model may be applied to check the status of the validator nodes, and thus the overall network status. The probabilistic model counts the number of healthy validators on the network at a fixed interval (e.g. once every minute). If more than a specific fraction of the validators (e.g. ½ or ⅓) are down, the network halts, and the system is declared to be in downtime.
The drawback of the probabilistic model is that it is more likely to report false-positive downtime, meaning that the network is still up but is reported as down. This is especially the case when signals from some nodes do not reach the network administrator reliably. The probabilistic model only estimates network performance from signals relayed by the agent coordinator. For example, if more than ⅓ of the signals are missing due to a temporary outage of the agent coordinator while the network is actually still growing, the probabilistic model will report that the network is down.
Accordingly, it is desirable to develop methods that improve system downtime calculation for the measurement of service-level objectives.
To resolve these problems, the present invention provides a method for accurate downtime calculation that uses measured block height increments as input data to determine the downtime interval and total downtime of the system.
The present disclosure describes a method for signal processing, comprising filtering an input signal from a system by a closing process and an opening process to generate a first smoothed signal. The closing process comprises a first dilation process followed by a first erosion process, and the opening process comprises a second erosion process followed by a second dilation process.
In one embodiment, each of the first dilation process and the second dilation process is performed by outputting its local maximum within a normal dilation window for each data point of the input signal, and each of the first erosion process and the second erosion process is performed by outputting its local minimum within a normal erosion window for each data point of the input signal.
In one embodiment, the method of the present invention further comprises generating a second smoothed signal by a moving average filter, wherein each data point of the second smoothed signal is generated by calculating the average value of all local data points within an averaging window for each data point of the first smoothed signal, and outputting the averaging result.
In one embodiment, the method further comprises: (1) generating a series of morphology event labels representing system down events or up events based on the first smoothed signal; and (2) generating a series of average event labels representing system down events or up events based on the second smoothed signal. The series of morphology event labels indicates the positions of sharp downward steps and sharp upward steps of the first smoothed signal, and the series of average event labels indicates the positions where the second smoothed signal crosses an average threshold value.
In one embodiment, the series of average up labels further comprises a series of artificial up labels indicating that the second smoothed signal goes above the average threshold value following a morphology down label.
In one embodiment, the method further comprises determining one or more downtime intervals based on the morphology event labels and the average event labels, wherein each of the downtime intervals indicates the interval when the system does not operate normally.
In one embodiment, the downtime interval is determined by a state machine with multiple states with the steps of: (1) integrating the morphology event labels and the average event labels into the same index based on their corresponding positions; (2) determining the state of the first position in the region of interest; (3) determining the state of each position from the second position to the last position in the region of interest based on the state of its previous position and the existence of a morphology event label or an average event label at that position; and (4) determining the downtime interval based on the states of all positions in the region of interest.
In one embodiment, the states in the state machine comprise Up state, Metastable Down state and Deep Down state, wherein the state of the first position is set to Up state; and the states of the remaining positions are determined by the following rules: (1) if no morphology event label or average event label is identified at that position, the state of the position is the same as that of the previous position; (2) if an average up label is identified at that position, the state of the position is Up state; (3) if an average down label is identified at that position, the state of the position is Deep Down state; (4) if the previous position is Up state or Metastable Down state, and a morphology up label is identified at that position, the state of the position is Up state; (5) if the previous position is Up state or Metastable Down state, and a morphology down label is identified at that position, the state of the position is Metastable Down state; and (6) if the previous position is Deep Down state and a morphology up label or a morphology down label is identified at that position, the state of the position is Deep Down state.
In one embodiment, the downtime interval is determined by: (1) identifying the positions determined as Metastable Down state or Deep Down state where the previous state is Up state as the starting points of downtime; (2) identifying the positions determined as Up state where the previous state is Metastable Down state or Deep Down state as the ending points of downtime; and (3) pairing all starting points of downtime with their following ending points of downtime to identify all downtime intervals.
The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is used in conjunction with a detailed description of certain specific embodiments of the technology. Certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be specifically defined as such in this Detailed Description section.
The embodiments introduced below can be implemented by programmable circuitry programmed or configured by software and/or firmware, or entirely by special-purpose circuitry, or in a combination of such forms. Such special-purpose circuitry (if any) can be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), etc.
In the present disclosure, the block height increments of the distributed ledger are used as the input data to determine the downtime interval and total downtime of the system if downtimes exist. Theoretically, the ledger number should monotonically increase at a constant rate when the network is up. The block height increment curve, i.e. the ledger number deltas, should therefore stay around a constant value; that is, the derivative of the ledger number is constant. On the other hand, if the block height increment curve drops dramatically and remains at a comparatively small value for a while (the depth of the drop passes a predefined threshold and the width of the low-value region exceeds another predefined value), a network downtime will be declared by the method of this invention. However, it should be noted that the method in this disclosure is applicable to any system with a stable baseline signal, and is not limited to the example provided herein.
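As an illustration of the input signal described above, the following minimal Python sketch derives block height increments from a series of sampled block heights. The function name `block_height_increments` and the sample values are assumptions made for illustration only and do not appear in the original disclosure.

```python
def block_height_increments(block_heights):
    """Return the increase in block height between consecutive samples."""
    return [later - earlier
            for earlier, later in zip(block_heights, block_heights[1:])]


# Hypothetical block heights sampled at a fixed interval.
heights = [100, 103, 106, 109, 109, 109, 112, 115]
increments = block_height_increments(heights)
print(increments)  # [3, 3, 3, 0, 0, 3, 3]
```

In this hypothetical series the increment stays at 3 while the ledger grows and falls to 0 while the block height stalls, which is the kind of drop the method is designed to detect.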
The method for downtime determination disclosed herein is designed to detect the following events of the system: (1) a sudden failure that lasts for a while; (2) a slowly deteriorating failure that lasts for a while; (3) a quick recovery after a period of failure; and (4) a gradual recovery after a period of failure. In addition, noise such as small, short system glitches and network fluctuations should be eliminated to avoid false downtime calculation.
A morphology filter refers to a machine or a module implemented in a programmable circuit which performs the function of morphology filtering as described above. The morphology filter may also further process the first smoothed signal to find down-spikes and up-spikes in the first smoothed signal. A moving average filter refers to a machine or a module implemented in a programmable circuit which performs the function of moving average calculation. A morphology event generator refers to a machine or a module implemented in a programmable circuit which generates morphology event labels indicating a rapid drop or lift of the input signal. The morphology event labels may be further divided into (1) morphology down labels indicating the occurrence of a rapid drop, and (2) morphology up labels indicating the occurrence of a rapid lift. A moving average event generator (also referred to as an average event generator) refers to a machine or a module implemented in a programmable circuit which generates moving average event labels indicating a gradual decline or rise of the input signal. The moving average event labels may be further divided into (1) average down labels indicating that the signal gradually falls below the watermark, and (2) average up labels indicating that the signal gradually rises above the watermark. The moving average event generator may also receive the morphology labels from a morphology event generator to generate a series of artificial up labels in addition to the normal average up labels to avoid miscalculation of system downtime. A downtime detector refers to a machine or a module implemented in a programmable circuit which integrates the event labels from a morphology event generator and an average event generator to determine the downtime intervals and total downtime of the system. The downtime detector may employ a finite state machine to determine the state of the system along the time series, and calculate the downtime based on the states determined.
In general, the method comprises generating a first smoothed signal for an input signal by closing and opening processes, generating a second smoothed signal for the first smoothed signal by moving average smoothing, generating morphology event labels and average event labels based on the first and second smoothed signals, and using the generated labels to determine the downtime interval and total downtime of the system.
Firstly, the present invention provides a method for signal processing by a morphology filter, comprising filtering an input signal with a closing process and an opening process to generate a first smoothed signal, wherein the closing process comprises a dilation process followed by an erosion process, and the opening process comprises an erosion process followed by a dilation process. In one embodiment of the invention, the dilation process is performed by finding the local maximum of all data points within a normal dilation window for each data point of the input signal in the region of interest, and the erosion process is performed by finding the local minimum of all data points within a normal erosion window for each data point. The closing process aims to remove downward noise spikes, and the opening process aims to remove upward noise spikes. The normal dilation and erosion windows applied for local maximum/minimum calculation are generally small and extend only several points from the position of the input data point. The method disclosed herein can be a closing process followed by an opening process (i.e. dilation, erosion, erosion and dilation in sequence), or an opening process followed by a closing process (i.e. erosion, dilation, dilation and erosion in sequence). Both procedures should yield similar results. The region of interest is usually the region where the downtime interval will be determined. With the morphology filtering process, noise caused by glitches or by temporary outages of the network monitor can be removed.
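The closing and opening processes described above may be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code of the original disclosure: the function names, the symmetric window of 2 × radius + 1 points, and the choice to clamp the window at the signal boundaries are all assumptions.

```python
def dilate(signal, radius):
    """Local maximum over a centered window, clamped at the signal edges."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def erode(signal, radius):
    """Local minimum over a centered window, clamped at the signal edges."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def closing(signal, radius):
    """Dilation followed by erosion."""
    return erode(dilate(signal, radius), radius)


def opening(signal, radius):
    """Erosion followed by dilation."""
    return dilate(erode(signal, radius), radius)


def morphology_smooth(signal, radius):
    """Closing followed by opening, yielding the first smoothed signal."""
    return opening(closing(signal, radius), radius)


# A one-point dip and a one-point peak are both treated as noise.
noisy = [5, 5, 5, 0, 5, 5, 5, 9, 5, 5, 5, 5]
print(closing(noisy, 1))            # [5, 5, 5, 5, 5, 5, 5, 9, 5, 5, 5, 5]
print(morphology_smooth(noisy, 1))  # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]

# A drop wider than the window survives the filter unchanged.
outage = [5] * 5 + [0] * 6 + [5] * 5
print(morphology_smooth(outage, 1) == outage)  # True
```

In this sketch the closing step fills the narrow dip, the opening step removes the narrow peak, and a drop that is wider than the window keeps its original edges.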
The generated first smoothed signal is then sent to a moving average filter to generate a second smoothed signal. The second smoothed signal is generated by calculating the average value of all local data points within a predetermined averaging window for each corresponding data point of the first smoothed signal, with the corresponding data point of the first smoothed signal placed at the center of the predetermined averaging window. The aim of applying a moving average filter is to find the “trend” of the signal and to smooth the local fluctuation of the first smoothed signal.
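The centered moving average described above may be sketched as follows. The function name `moving_average`, the symmetric window of 2 × radius + 1 points, and the decision to average over a shortened window near the signal boundaries are assumptions made for illustration; the disclosure does not specify the boundary handling at this point.

```python
def moving_average(signal, radius):
    """Average each point with its neighbors in a centered window.

    Near the boundaries the window is truncated, and the average is
    taken over the points that actually exist (an assumption).
    """
    n = len(signal)
    smoothed = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


first_smoothed = [6, 6, 6, 0, 0, 0, 6, 6, 6]
print(moving_average(first_smoothed, 1))
# [6.0, 6.0, 4.0, 2.0, 0.0, 2.0, 4.0, 6.0, 6.0]
```

The sharp step in the first smoothed signal becomes a gradual slope in the second smoothed signal, which is what makes a threshold crossing on the averaged curve a conservative indicator of a sustained change.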
The first smoothed signal (morphology smoothed signal) and the second smoothed signal (moving average smoothed signal) are then used to generate a series of morphology event labels and average event labels representing the positions where the system goes down and goes up. The morphology event labels are generated based on the first smoothed signal by a morphology event generator. The morphology labels indicate system down events or up events by labeling the positions of sharp downward steps and sharp upward steps of the first smoothed signal. In one embodiment of the invention, the morphology event labels are divided into two groups: morphology down labels indicating the positions of sharp downward steps of the first smoothed signal, and morphology up labels indicating the positions of sharp upward steps of the first smoothed signal. On the other hand, the second smoothed signal is used by an average event generator to generate a series of average event labels indicating system down events or up events, wherein the average event labels are the positions where the second smoothed signal crosses a predetermined average threshold value. In one embodiment, the average event labels are divided into two groups: average down labels indicating the positions where the second smoothed signal crosses the average threshold value from above to below, and average up labels indicating the positions where the second smoothed signal crosses the average threshold value from below to above.
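The generation of average event labels from threshold crossings may be sketched as follows. The function name, the label strings "A.Down" and "A.Up", and the convention that a value equal to the threshold counts as above it are assumptions made for illustration.

```python
def average_event_labels(second_smoothed, threshold):
    """Return (position, label) pairs where the signal crosses the threshold."""
    labels = []
    for i in range(1, len(second_smoothed)):
        was_above = second_smoothed[i - 1] >= threshold
        is_above = second_smoothed[i] >= threshold
        if was_above and not is_above:
            labels.append((i, "A.Down"))  # crossed from above to below
        elif not was_above and is_above:
            labels.append((i, "A.Up"))    # crossed from below to above
    return labels


second_smoothed = [6.0, 6.0, 4.0, 2.0, 0.0, 2.0, 4.0, 6.0, 6.0]
print(average_event_labels(second_smoothed, 3.0))
# [(3, 'A.Down'), (6, 'A.Up')]
```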
In an embodiment of the present invention, the positions of the morphology down labels, which represent the positions of sharp downward steps, are identified by (1) performing a bias dilation process on the first smoothed signal to generate a bias dilated signal; (2) subtracting all data points of the first smoothed signal from the corresponding data points of the bias dilated signal to obtain a first spike signal representing the positions of downward steps; and (3) finding the positions where the value of the first spike signal is equal to or above a predetermined down-spike threshold value. The positions where the first spike signal reaches the down-spike threshold value are defined as the positions of morphology down events. Likewise, the positions of the morphology up labels, which represent the positions of sharp upward steps, are identified by (1) performing a bias erosion process on the first smoothed signal to generate a bias eroded signal; (2) subtracting all data points of the bias eroded signal from the corresponding data points of the first smoothed signal to obtain a second spike signal representing the positions of upward steps; and (3) finding the positions where the value of the second spike signal is equal to or above a predetermined up-spike threshold value. The positions where the second spike signal reaches the up-spike threshold value are defined as the positions of morphology up events. The bias dilation process and bias erosion process are similar to the normal dilation and erosion processes described above, except that the data point of the input first smoothed signal is placed at the last position of the dilation or erosion window instead of at the center. Furthermore, the bias dilation/erosion window used in the bias dilation/erosion process is smaller than half of the normal dilation/erosion window in order to accurately extract the positions of sharp downward/upward steps.
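The bias dilation and bias erosion steps above may be sketched as follows. The function names, the label strings "M.Down" and "M.Up" as used here, and the choice to emit a label only at the first position where a spike signal reaches its threshold are assumptions made for illustration.

```python
def bias_dilate(signal, left_radius):
    """Local maximum over a window that ends at the current point."""
    return [max(signal[max(0, i - left_radius):i + 1])
            for i in range(len(signal))]


def bias_erode(signal, left_radius):
    """Local minimum over a window that ends at the current point."""
    return [min(signal[max(0, i - left_radius):i + 1])
            for i in range(len(signal))]


def morphology_event_labels(first_smoothed, left_radius,
                            down_threshold, up_threshold):
    """Return (position, label) pairs for sharp downward and upward steps."""
    dilated = bias_dilate(first_smoothed, left_radius)
    eroded = bias_erode(first_smoothed, left_radius)
    down_spike = [d - s for d, s in zip(dilated, first_smoothed)]
    up_spike = [s - e for s, e in zip(first_smoothed, eroded)]
    labels = []
    for i in range(len(first_smoothed)):
        # Label only the position where a spike first reaches its threshold.
        if down_spike[i] >= down_threshold and (
                i == 0 or down_spike[i - 1] < down_threshold):
            labels.append((i, "M.Down"))
        if up_spike[i] >= up_threshold and (
                i == 0 or up_spike[i - 1] < up_threshold):
            labels.append((i, "M.Up"))
    return labels


first_smoothed = [6, 6, 6, 0, 0, 0, 0, 6, 6, 6]
print(morphology_event_labels(first_smoothed, 2, 3, 3))
# [(3, 'M.Down'), (7, 'M.Up')]
```

Because the window looks only backward, the difference signals are non-zero starting exactly at the step position, so the labels land on the first low point and the first recovered point respectively.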
In another embodiment, the series of average up labels further comprises a series of artificial up labels indicating that the second smoothed signal goes above the average threshold value following a morphology down label. In this embodiment, the generated morphology event labels are received by the average event generator and used to generate a series of artificial up labels, wherein an artificial up label is generated when a morphology down label is identified but no average event label exists in a predetermined range following the morphology down label. This can be determined by checking all values of the second smoothed signal within a predetermined range starting from the position at which the morphology down label is identified. If all the checked values are above the average threshold value, an artificial up label is generated correspondingly. In the following downtime determination step, the generated artificial up labels are treated as normal average up labels. The artificial up labels are introduced to handle a special case in which the signal drops abruptly and recovers relatively smoothly, so that only a morphology down label is generated and no average up label exists to declare an uptime (i.e. the end of a downtime interval) when the network actually recovers, as shown in
The application of both a morphology-based filter and a moving-average-based filter to generate both morphology event labels and average event labels has the advantage of finding the true starting point of a downtime more sensitively while preventing false-positive results caused by noise. Generally speaking, for both the moving average filter and the morphology event labels, the smaller the threshold values are, the more sensitive the downtime detection will be, but also the more likely it is to be influenced by noise (e.g. a glitch). On the other hand, if the threshold values are set to be large, the detected downtime will always be declared much later than the actual downtime. Consequently, morphology-based filters are applied to achieve satisfactory sensitivity, and moving-average-based filters confirm the downtime results from the morphology-based filters so as to avoid false positives as much as possible.
Lastly, a downtime detector integrates the morphology event labels and the average event labels into the same index based on their corresponding positions, and determines the downtime interval based on the morphology event labels and the average event labels. In the embodiment where the artificial up labels are generated, the artificial up labels are treated the same as the average up labels. In detail, the downtime determination process includes (1) determining the state of the first position; (2) determining the state of each position from the second position to the last position in the region of interest based on the state of its previous position and the existence of a morphology event label or an average event label at that position; and (3) determining the downtime interval based on the states of all positions in the region of interest. To prevent problems when a morphology event label and an average event label are present at the same position, in an embodiment of the invention, when the series of morphology event labels and the series of average event labels are integrated, the average event label at that position is entered and the morphology event label at the same position is discarded.
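The determination of the downtime interval can be performed by a finite state machine with three available states: “Up” state, “Metastable Down” state, and “Deep Down” state. Each state represents the status of the signal. The Up state represents a status in which the signal stays around the baseline, whereas the Metastable Down and Deep Down states (collectively “Down states”) represent a status in which the signal stays below and far off the baseline. The state of the first position in the region of interest is set to the “Up” state, and the state of each following position is determined by the state of its previous position and the existence of a morphology event label or an average event label at that position. The rules for determining the states are as follows: (1) if no morphology event label or average event label is identified at that position, the state of the position is the same as that of the previous position; (2) if an average up label is identified at that position, the state of the position is Up state; (3) if an average down label is identified at that position, the state of the position is Deep Down state; (4) if the previous position is Up state or Metastable Down state, and a morphology up label is identified at that position, the state of the position is Up state; (5) if the previous position is Up state or Metastable Down state, and a morphology down label is identified at that position, the state of the position is Metastable Down state; and (6) if the previous position is Deep Down state and a morphology up label or a morphology down label is identified at that position, the state of the position is Deep Down state.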
The rules above can be summarized as shown in Table 1 and
Next, the downtime interval is determined by (1) identifying the positions determined as Metastable Down state or Deep Down state where the previous state is Up state as the starting points of downtime; (2) identifying the positions determined as Up state where the previous state is Metastable Down state or Deep Down state as the ending points of downtime; and (3) pairing all starting points of downtime with their following ending points of downtime to identify all downtime intervals. Each downtime interval comprises a pair of data points, which represent the starting point and the ending point of the downtime. Based on the downtime intervals, the length of each downtime and the length of total system downtime within the interval of downtime detection can both be easily calculated.
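The state machine and the pairing of starting and ending points may be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the label strings ("M.Down", "M.Up", "A.Down", "A.Up") are chosen for illustration, labels are passed as (position, label) pairs, and a downtime that has not ended by the last position is left unpaired and therefore not reported.

```python
UP, METASTABLE_DOWN, DEEP_DOWN = "Up", "Metastable Down", "Deep Down"


def merge_labels(morphology_labels, average_labels):
    """Index labels by position; an average label overrides a morphology one."""
    merged = dict(morphology_labels)
    merged.update(dict(average_labels))
    return merged


def next_state(previous, label):
    """Apply the state transition rules to one position."""
    if label is None:
        return previous
    if label == "A.Up":
        return UP
    if label == "A.Down":
        return DEEP_DOWN
    if previous == DEEP_DOWN:
        return DEEP_DOWN  # morphology labels are ignored in Deep Down
    return UP if label == "M.Up" else METASTABLE_DOWN


def determine_states(length, labels_by_position):
    """The first position is Up; each later state follows from the rules."""
    states = [UP]
    for position in range(1, length):
        label = labels_by_position.get(position)
        states.append(next_state(states[-1], label))
    return states


def downtime_intervals(states):
    """Pair each Up-to-Down transition with the next Down-to-Up transition."""
    intervals, start = [], None
    for position in range(1, len(states)):
        if states[position - 1] == UP and states[position] != UP:
            start = position
        elif states[position - 1] != UP and states[position] == UP:
            intervals.append((start, position))
            start = None
    return intervals


labels = merge_labels([(3, "M.Down"), (12, "M.Up")],
                      [(5, "A.Down"), (10, "A.Up")])
states = determine_states(15, labels)
print(downtime_intervals(states))  # [(3, 10)]
```

In this hypothetical run the morphology down label opens the downtime at position 3, the average down label confirms it at position 5, and the average up label closes it at position 10, so the downtime length is 7 sampling periods.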
In one embodiment, to avoid errors due to the boundary conditions, the interval of downtime determination is smaller than the interval of state determination, so that the downtime detector only uses the middle part of the states determined by the state machine to generate the downtime interval. Therefore, the uncertainty of the state caused by the boundary can be avoided.
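One possible way to restrict the reported downtime to the middle part of the analyzed region is sketched below. Clipping already-paired intervals to a reporting window is an assumption made for illustration, as are the function names and the sample positions; the disclosure only states that the middle part of the determined states is used.

```python
def clip_to_report_window(intervals, report_start, report_end):
    """Keep the parts of each interval inside [report_start, report_end)."""
    clipped = []
    for start, end in intervals:
        lo, hi = max(start, report_start), min(end, report_end)
        if lo < hi:
            clipped.append((lo, hi))
    return clipped


def total_downtime(intervals):
    """Sum the lengths of all intervals, in sampling periods."""
    return sum(end - start for start, end in intervals)


# Hypothetical intervals found over positions 0..179, reported for 60..119.
found = [(50, 70), (100, 130), (150, 160)]
reported = clip_to_report_window(found, 60, 120)
print(reported)                  # [(60, 70), (100, 120)]
print(total_downtime(reported))  # 30
```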
The following example is provided to further illustrate the present invention. In the example system, the core monitor receives a signal from the network every 10 seconds representing the block number of the distributed ledger. Theoretically, the ledger number should monotonically increase at a constant rate when the system works normally. Therefore, by calculating the block height increment (the block number at a specific point minus the block number at the previous point) at every point, a block height increment curve is obtained as the input signal, as shown in
The input signal was then filtered by a morphology filter, which is a collection of non-linear operations intended to smooth out the noise from the original curve and to produce a result that indicates the radical changes on the curve. Two types of morphology-based operations, dilation and erosion, were implemented. The dilation operation finds the local maximum value of the data points within a predetermined dilation window, i.e. y[i]=max(x[i−r_left: i+r_right]), where x[ ] is the input data point value and y[ ] is the output data point value, with the boundary condition: “if i−r_left<0, use 0; if i+r_right>the length of x−1, use the length of x−1.” In this example, the values of both r_left (left radius) and r_right (right radius) were set to 3 (r_left=r_right=M_r=3, wherein M_r is defined as the radius), and the dilation window contained 7 data points in total. With the parameters r_left=r_right=M_r=3, the position of each output data point is at the center of the corresponding input window. Similarly, the erosion operation finds the local minimum value of the data points within a predetermined erosion window, i.e. y[i]=min(x[i−r_left: i+r_right]), where x[ ] is the input data point value and y[ ] is the output data point value, with the boundary condition: “if i−r_left<0, use 0; if i+r_right>the length of x−1, use the length of x−1.” As described above, the values of both r_left and r_right were set to 3, and the erosion window also contained 7 data points in total. In summary, a 4-step procedure comprising first dilation, first erosion, second erosion and second dilation was performed. The processed signals after the first dilation, first erosion, second erosion and second dilation are shown in
To find the morphology labels indicating downward steps, firstly the morphology filter performed a bias dilation process on the first smoothed signal with left radius r_left=ceil(½*M_r)=2 and right radius r_right=0. The bias dilation window contains 3 data points and the output is at the last position of the bias dilation window. The bias dilation process utilized the same Python code as the normal dilation process in Snippet 2. The resulting bias dilated signal is as shown in
The moving average filter yields the second smoothed signal, which is a moving average curve used for detecting trends of change in the original input signal (
In this example, the moving average filter in the equation form is:
where (left_radius+right_radius) is the averaging window size, x[ ] is the input data point value and y[ ] is the output data point value. The Python code for calculating moving average signal (the second smoothed signal) is shown in Snippet 7. In the code, left_radius=right_radius=floor(½*A_r), where A_r is the averaging window. The averaging window A_r was set to 6, and the resulting second smoothed signal is shown in
The detection stage processes the intermediate calculation results from both the morphology-based filter and the moving-average-based filter to find accurate downtime intervals.
First, the morphology event generator scanned both downtime spikes (as shown in
In this example, there were two types of morphology event labels—morphology down labels (M.Down) and morphology up labels (M.Up). The M.Down label was triggered by the down spikes representing a sudden drop on the original input signal (
Each morphology event label records not only the type of change but also the index and the value of the original input signal at which the radical change happened. For instance, the morphology event label in
The moving average event generator (abbreviated as average event generator) calculated moving average event labels (abbreviated as average event labels or a_events). In order to be able to utilize the Python code in Snippet 9, a dip function was applied first to the second smoothed signal to obtain the dip average signal as shown in
In addition, the morphology event labels were also sent to the average event generator to avoid a scenario where a down event (morphology down label) exists without any up event (morphology up label or average up label). In a special case, only a morphology down label (M.Down label) may be generated, with neither a morphology up label (M.Up label) nor an average up label (A.Up label) to declare an uptime (i.e. the end of a downtime interval) when the system recovers, as shown in
The downtime interval was detected by a downtime detector, in which all types of filter labels (M.filter labels and A.filter labels) were processed to calculate final downtime intervals. The following are the steps utilized to calculate final downtime intervals in the example:
The result of the final downtime interval of the input signal is shown in
In this example, the downtime calculation was performed periodically, once every 10 minutes, and reported only the middle part of the analyzed data to avoid errors due to the boundary conditions. Each run read the block height increment data from the past 30 minutes for calculation, and output the downtime intervals, if any, that fell between 20 minutes and 10 minutes before the start of the run. In other words, if the filter-based downtime calculation starts at time t, it uses the block height increment data from [t−30 mins, t) for calculation and outputs the downtime interval results within the time range [t−20 mins, t−10 mins). The next run will start at t+10 mins, as shown in
Below are embodiments of pseudo code for implementing the methods described above.
The foregoing description of embodiments is provided to enable any person skilled in the art to make and use the subject matter. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the novel principles and subject matter disclosed herein may be applied to other embodiments without the use of the innovative faculty. The claimed subject matter set forth in the claims is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. It is contemplated that additional embodiments are within the spirit and true scope of the disclosed subject matter. Thus, it is intended that the present invention covers modifications and variations that come within the scope of the appended claims and their equivalents.
This application claims the benefit of U.S. provisional application 63/195,720 filed on Jun. 2, 2021, titled “METHOD AND SYSTEMS TO MEASURE SERVICE LEVEL OBJECTIONS,” which is incorporated herein by reference in its entirety.
| Number | Date | Country |
|---|---|---|
| 63195720 | Jun 2021 | US |