Disclosed are embodiments related to predicting multiple parameters and managing resources.
Predicting a parameter (a.k.a. variable) from historical data is a common practice. One area where this arises involves efficiently handling human resources (such as domain specialists), and particularly the time of those resources. In managed services, for example, this is a significant challenge. Many service industries are looking to reduce the manpower necessary for various tasks (such as maintenance), and are attempting to replace human workers with intelligent robots. This means that fewer human resources are available to be allotted to critical tasks. Due to the demanding requirements of many service industries, various attempts have been made to better allocate resources, such as algorithms that optimize route selection, and allocating engineers to tasks only after learning details such as that a given fault has occurred at a given location. Better allocation of resources is generally applicable to many industrial service scenarios. Some examples include attending to the services of “smart objects” in new Internet-of-Things (IoT) type environments (such as cities and particular industrial sites). For discussion purposes, a specific problem will be addressed below using the telecommunications service industry as an example.
A base transceiver station (BTS) is a piece of equipment that facilitates wireless communication between a user equipment (UE) (such as a mobile handset) and a network (such as a wireless communications network like a 4G network or a 5G network). These BTSs include many components, and often need repairs associated with those components in order to remain in good working order. These repairs can be handled efficiently by service engineers who are specialized in the domain. The time taken for a repair typically depends on the experience of the service engineer and the difficulty of the problem. There can be thousands of towers (each of which can house a BTS) in a typical city; therefore, it can be a complex task to assign human resources to handle and solve problems as they arise, in time to address the problems and maintain good network conditions.
There are known approaches available in the literature to predict a single parameter such as the number of faults. One approach is to model the faults as time-series data and predict the location and possible number of faults in a computer-software system. Other approaches address the usage of reinforcement learning (RL) in the management of resources based on the prediction information of the tasks.
Assigning resources should be done optimally. For example, assigning service engineers to handle repairs (such as in the telecommunications service industry example discussed above) should be done optimally so that the repairs can be handled efficiently, e.g. to minimize time and/or cost and/or network downtime. Another factor to consider is that the service engineers may be located anywhere in a city, and they can be assigned to repair a tower which is far away from their current location. Therefore, the method of assigning repairs should consider the distance from the current location of the service engineer to the location of the tower and/or the expected time to traverse that distance.
Given all the above considerations, if faults necessitating repair and their specific locations are known before those faults occur, then resources can be more optimally allocated and the problems can be repaired more efficiently, e.g. more cheaply and more quickly. However, predicting faults in advance is a complex problem and implicates various other issues, some of which are hard or impossible to know in advance with certainty, e.g. external environmental parameters. In view of all of this, there is a need for improved resource allocation, such as an improved digital workforce which can predict fault invariant parameters on a real-time basis and automate and improve the workforce handling those issues. Such improvements could be useful in a wide variety of industrial applications, e.g. for managing the repair of “smart objects” in IoT environments. The improvements can also create value by solving problems quickly and providing the most satisfaction to customers utilizing the services being offered.
Problems with existing solutions for predicting faults abound. For example, such systems are generally limited to predicting a single parameter. Further, such systems have trouble being applied in the above telecommunications service industry scenario, or generally where there are various equipment with different specifications (such as in an IoT platform). Also, because these systems are generally limited to predicting a single parameter, they cannot readily be modified to handle the prediction of multiple parameters, e.g. multiple correlated features relevant to a fault (such as fault location, time of fault, and fault type). Moreover, existing work with RL typically requires the user to specify a high quality reward matrix in order to produce reasonably accurate predictions, which is difficult to obtain in real-time scenarios.
In embodiments described here, systems and methods are provided for predicting faults in advance by optimally updating weights (e.g. by tuning a reward function) in consideration of the multiple predictions learned from past rewards, and using this information to optimize the current rewards. Embodiments are able to optimize the current rewards without the use of any rules (e.g. domain-specific rules) for predicting multiple parameters. In some embodiments such parameters include fault invariants such as time of fault, location of fault, and fault type.
Embodiments are able to predict multiple parameters while minimizing prediction error.
As an example of how embodiments may be used, consider the telecommunications service industry example discussed above. Service engineers may be assigned tasks so that each task is resolved within a maximum resolution time. This may involve, e.g., (1) predicting faults periodically (e.g. every four-hour period) from the historical and current data; and (2) assigning these faults optimally to service engineers on a real-time basis by considering each engineer's present location, the distance and/or time to reach the location of the fault, the engineer's domain knowledge and expertise, and the level and type of the faults involved. At each period (e.g., every four-hour period) the prediction models can be further optimized based on additional data.
Advantages include that the prediction and allocation may be applied to a large number of different settings, including efficient allocation of human resources in diverse industries, and efficient allocation of resources more generally. For example, computing resources in a cloud-computing type environment can be allocated by some embodiments. The prediction and allocation may also be applied when resources are very limited or otherwise constrained, including during natural disasters or other types of irregular events that strain a system's resources.
According to a first aspect, a method for managing resources is provided. The method includes applying an ensemble model, the ensemble model comprising a plurality of sub-models such that an output of the ensemble model is a weighted average of predictions from the sub-models, and such that the output is a prediction of multiple parameters. The method further includes determining that an accuracy of the ensemble model is below a first threshold. The method further includes optimizing weights for the predictions from the sub-models as a result of determining that an accuracy of the trained ensemble model is below a first threshold. Optimizing weights for the predictions from the sub-models includes applying reinforcement learning, such that the weights are selected for a given time instance to improve prediction accuracy based at least in part on a reward function; and updating the weights selected by the reinforcement learning by looking ahead over a prediction horizon and optimizing the reward function at the given time instance. The method further includes using the prediction of the multiple parameters to manage resources.
In some embodiments, updating the weights selected by the reinforcement learning by looking ahead over a prediction horizon and optimizing the reward function at the given time instance comprises:
Step 1) initializing weights for the predictions from the sub-models;
Step 2) computing predictions of multiple parameters over the prediction horizon using the weights for the predictions from the sub-models;
Step 3) computing a minimization function to update the reward function to minimize prediction error, whereby the weights for the predictions from the sub-models are updated;
Step 4) computing the predictions of multiple parameters over the prediction horizon using the weights for the predictions from the sub-models that were updated in step 3; and
Step 5) determining whether a prediction error is less than a second threshold.
In some embodiments, as a result of step 5, it is determined that the prediction error is not less than the second threshold, and updating the weights selected by the reinforcement learning further comprises: discarding a sample used in step 2 for computing the predictions of multiple parameters over the prediction horizon using the weights for the predictions from the sub-models; and repeating steps 2 through 5 until it is determined that the prediction error is less than the second threshold. In some embodiments, computing the minimization function of step 3 comprises optimizing
$$\min_{R} \sum_{i=k+1}^{k+N} \left( f\left(R,\, y[i-p],\, u[i-p]\right) - y[i] \right), \quad p = 1, 2, 3, \ldots$$
where R is the reward function, y[i] is the actual output calculated at the given time instant i, f(.) is the reinforcement learning model and u is the multiple parameters.
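Steps 1 to 5 can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not part of the method described here: the quantity being tuned is the ensemble weight vector itself rather than a reward function driving an RL model; step 3 uses inverse-error weighting (each sub-model weighted by the inverse of its own mean squared error over the horizon) as a stand-in for the minimization; and squared error is used so that the error measure is non-negative. The names `horizon_error` and `update_weights` are hypothetical.

```python
from typing import List, Sequence, Tuple


def horizon_error(preds: Sequence[Sequence[float]], actual: Sequence[float],
                  weights: Sequence[float], start: int, horizon: int) -> float:
    """Mean squared error of the weighted ensemble over samples start .. start+horizon-1."""
    total_w = sum(weights)
    err = 0.0
    for i in range(start, start + horizon):
        y_hat = sum(w * p for w, p in zip(weights, preds[i])) / total_w
        err += (y_hat - actual[i]) ** 2
    return err / horizon


def update_weights(preds: Sequence[Sequence[float]], actual: Sequence[float],
                   horizon: int, threshold: float,
                   eps: float = 1e-9) -> Tuple[List[float], float, int]:
    n_models = len(preds[0])
    weights = [1.0 / n_models] * n_models  # step 1: equal initial weights
    start = 0
    while start + horizon <= len(actual):
        # step 2: error of the current weights over the prediction horizon
        before = horizon_error(preds, actual, weights, start, horizon)
        # step 3 (stand-in): weight each sub-model by the inverse of its own horizon MSE
        inv = []
        for k in range(n_models):
            only_k = [1.0 if j == k else 0.0 for j in range(n_models)]
            inv.append(1.0 / (horizon_error(preds, actual, only_k, start, horizon) + eps))
        total = sum(inv)
        candidate = [v / total for v in inv]
        # step 4: recompute the horizon predictions with the updated weights
        after = horizon_error(preds, actual, candidate, start, horizon)
        if after > before:  # keep the previous weights if the update did not help
            candidate, after = weights, before
        weights = candidate
        # step 5: stop once the horizon error is below the threshold
        if after < threshold:
            return weights, after, start
        start += 1  # otherwise discard the oldest sample and repeat steps 2-5
    raise RuntimeError("error threshold not reached before the data ran out")


actual = [1.0, 2.0, 3.0, 4.0]
preds = [[y, y + 1.0] for y in actual]  # sub-model 0 exact, sub-model 1 biased by +1
w, err, start = update_weights(preds, actual, horizon=2, threshold=1e-3)
print(start, round(w[0], 6))  # 0 1.0
```

Because sub-model 1 is consistently biased, the loop moves almost all weight to sub-model 0 within the first window; with noisier data, samples are discarded and the window slides forward until the threshold is met.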
In some embodiments, at least one of the multiple parameters is related to a fault, and using the prediction of the multiple parameters to manage resources comprises assigning resources to correct the predicted fault. In some embodiments, the multiple parameters include (i) a location of a fault, (ii) a type of the fault, (iii) a level of a node where the fault occurred, and (iv) a time of the fault. In some embodiments, using the prediction of the multiple parameters to manage resources comprises solving an integer linear programming (ILP) problem as follows:
where $d$ is the distance to the location of the fault, $t_{ij}$ is the time taken by resource $i$ to reach location $j$, and $M$ is the total number of predicted faults in a time period; the constraint $\sum_{j=1}^{M}\sum_{i=1}^{N} a_{ji} = M$ ensures that $M$ resources are assigned, the constraint $\sum_{j=1}^{M} a_{ji} \le 1 \;\forall i = 1, \ldots, N$ ensures that at most one object is assigned to each resource, and the constraint $a_{ji} \in \{0, 1\}$ ensures that a resource is either selected or not.
In some embodiments, using the prediction of the multiple parameters to manage resources comprises assigning human resources based on one or more of the multiple parameters. In some embodiments, using the prediction of the multiple parameters to manage resources comprises assigning computing resources based on one or more of the multiple parameters.
According to a second aspect, a node adapted to perform the method of any one of the embodiments of the first aspect is provided. In some embodiments, the node includes a data storage system; and a data processing apparatus comprising a processor. The data processing apparatus is coupled to the data storage system, and the data processing apparatus is configured to apply an ensemble model, the ensemble model comprising a plurality of sub-models such that an output of the ensemble model is a weighted average of predictions from the sub-models, and such that the output is a prediction of multiple parameters. The data processing apparatus is further configured to determine that an accuracy of the ensemble model is below a first threshold. The data processing apparatus is further configured to optimize weights for the predictions from the sub-models as a result of determining that an accuracy of the trained ensemble model is below a first threshold. Optimizing weights for the predictions from the sub-models includes applying reinforcement learning, such that the weights are selected for a given time instance to improve prediction accuracy based at least in part on a reward function; and updating the weights selected by the reinforcement learning by looking ahead over a prediction horizon and optimizing the reward function at the given time instance. The data processing apparatus is further configured to use the prediction of the multiple parameters to manage resources.
According to a third aspect, a node is provided. The node includes an applying unit configured to apply an ensemble model, the ensemble model comprising a plurality of sub-models such that an output of the ensemble model is a weighted average of predictions from the sub-models, and such that the output is a prediction of multiple parameters. The node further includes a determining unit configured to determine that an accuracy of the ensemble model is below a first threshold. The node further includes an optimizing unit configured to optimize weights for the predictions from the sub-models as a result of the determining unit determining that an accuracy of the trained ensemble model is below a first threshold. Optimizing weights for the predictions from the sub-models includes: applying reinforcement learning, such that the weights are selected for a given time instance to improve prediction accuracy based at least in part on a reward function; and updating the weights selected by the reinforcement learning by looking ahead over a prediction horizon and optimizing the reward function at the given time instance. The node further includes a managing unit configured to use the prediction of the multiple parameters to manage resources.
According to a fourth aspect, a computer program is provided. The computer program includes instructions which, when executed by processing circuitry of a node, cause the node to perform the method of any one of the embodiments of the first aspect.
According to a fifth aspect, a carrier containing the computer program of the fourth aspect is provided. The carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
As shown in
System 100 may be used in a wide variety of resource allocation problems. For example, system 100 may be used to optimally assign computing resources in a cloud-computing type environment. As another example, system 100 may be used to predict faults in a telecommunications network and assign them to service engineers based on the engineers' present locations, the distance and/or time to travel to the fault location, and domain knowledge, and also based on the fault's level or severity and the type of fault. In this example, resource allocation involves two steps: (i) predicting fault invariants and (ii) assigning service engineers to the predicted faults in a timely manner.
Ensemble models 110 may include a set of N models, resulting in N predictions at a given time instant. Using an ensemble model in this manner can be advantageous for certain types of data. For example, the alarm data for faults in the telecommunications service industry is noisy, it may have many outliers, and the time scale of the faults is not uniform (for example, a first fault can occur at 2:00 AM, a second fault at 2:01 AM, and a third fault at 4:00 AM). In addition, some of the variables discussed here are categorical variables. In this case, using only one model can lead to poor predictions, and every model has its own limitations. Hence, system 100 employs ensemble models to provide more accurate predictions: instead of relying on a single model, a prediction is generated by combining the outputs of N models. However, the output of the ensemble model depends on the weights chosen. A simple choice is equal weights, which yields the arithmetic mean of all the predictions. This choice is not always optimal, however, as it is sometimes appropriate to weight one model more heavily than another.
To address the issue of selecting weights for the ensemble models 110, system 100 employs a reinforcement learning 112. The reinforcement learning 112 may use reinforcement learning (RL) methods to select weights. Typical RL methods tend to require very precise rewards in order to produce good results. Here, system 100 introduces a weight updater 114 to help produce optimal weights by computing a prediction of future fault invariants and minimizing the prediction error. In some embodiments, weight updater 114 may apply a reward tuning function to optimally update weights.
The individual components of system 100 are now explained in greater detail.
The prediction is computed as an ensemble average of predictions obtained from different models. Assuming that there are N such models (for ensemble models 110) which predict N different values at a given time, the weighted average of these N predictions is computed as:
where $p_k$ are the predictions obtained from the N different models, and $w_k$ are the corresponding weights for the different models.
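The weighted average can be sketched as follows, assuming $S(t) = \sum_k w_k p_k / \sum_k w_k$, i.e. weights normalized by their sum; equal weights then reduce to the arithmetic mean. The function name `ensemble_prediction` is hypothetical.

```python
from typing import Sequence


def ensemble_prediction(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average S(t) of N sub-model predictions p_k with weights w_k.

    Weights are divided by their sum, so they need not add up to 1
    (an assumption made for this sketch).
    """
    if len(predictions) != len(weights):
        raise ValueError("need one weight per sub-model prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * p for w, p in zip(weights, predictions)) / total


preds = [2.0, 3.0, 7.0]
print(ensemble_prediction(preds, [1, 1, 1]))        # 4.0 (arithmetic mean)
print(ensemble_prediction(preds, [0.5, 0.5, 0.0]))  # 2.5 (third model ignored)
```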
To make sure that the prediction $S(t)$ is optimal, it is important to select optimal weights $w_k$. The reinforcement learning 112 and weight updater 114 work together to achieve this. The aim of the RL here is to interact with the environment and learn the optimal weights for the time instant. The learning of the optimal weights also depends on the errors obtained at previous time instants. In RL, the objective is to change the state of the agent such that the agent receives the maximum reward, i.e.
$$\max_{\text{state of the agent}} \text{Reward}$$
To apply this, we need to define states and actions. The states here represent the prediction obtained at time t and the nearby values (e.g. the preceding values in the sequence). The actions represent the choice of weights. The reward matrix is computed at every time instant based on the inverse of the prediction error obtained at that time instant (so that minimizing the prediction error, which maximizes the prediction accuracy, maximizes the reward). The state transition corresponds to the choice of weighting functions, which influence the prediction of the time data. The objective of RL is to choose optimal weights such that the overall prediction accuracy is improved.
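As a rough illustration of this reward definition, the sketch below scores each candidate weight vector (action) by the inverse of the absolute prediction error at one time instant and picks the highest-scoring one. This one-step greedy choice ignores state transitions and future rewards, so it is not a full RL agent; `reward`, `best_action` and the small constant `eps` are assumptions for illustration.

```python
from typing import Sequence, Tuple


def reward(prediction: float, actual: float, eps: float = 1e-6) -> float:
    """Reward at one time instant: inverse of the absolute prediction error."""
    return 1.0 / (abs(prediction - actual) + eps)  # eps avoids division by zero


def best_action(sub_preds: Sequence[float], actual: float,
                candidates: Sequence[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Pick the candidate weight vector whose ensemble prediction earns the largest reward."""
    def ensemble(ws: Tuple[float, ...]) -> float:
        return sum(w * p for w, p in zip(ws, sub_preds)) / sum(ws)
    return max(candidates, key=lambda ws: reward(ensemble(ws), actual))


# Three sub-models predict 2, 3 and 7; the observed value is 3.
actions = [(1.0, 1.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
print(best_action([2.0, 3.0, 7.0], 3.0, actions))  # (0.0, 1.0, 0.0)
```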
In the RL, $X = \{x_i\}$ is a set of states, $U$ is a set of actions, $P_{XU}$ are the state transition probabilities, and $r$ is the reward function for state transition $x_i$. Hence the total payoff can be computed as:
$$R = r(x_0) + \gamma\, r(x_1) + \gamma^2 r(x_2) + \cdots$$
In the above equation, $\gamma$ is called a discount factor. Typically, the rewards are chosen as scalar values, and in some cases where a large number of states and actions exist, deep RL may be used to approximate the reward function. In RL, the idea is to choose actions $u_t$ such that the total payoff $R$ is maximized. This is similar to a Markov Decision Process in which the transition probabilities are estimated by a maximum likelihood estimate.
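The total payoff is a discounted sum of the rewards collected along a trajectory. A minimal sketch (the function name is hypothetical):

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Total payoff R = r(x0) + gamma*r(x1) + gamma**2*r(x2) + ..."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r  # reward at step t is discounted by gamma**t
    return total


print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

A smaller discount factor makes the agent favor immediate rewards; a value close to 1 weights future rewards almost as heavily as current ones.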
The predictions are computed periodically, and after a given period, the problem is solved again to obtain optimal predictions for the next period. For the next period, the real-time data from the preceding period may be added to the models as additional historic data. For example, in the telecommunications service industry example, predictions may be computed every four hours by taking the real-time faults that occurred during the four-hour time period. At the end of the time period, the faults that occurred are added to the historic data, thereby improving the performance of the model. A four-hour period is chosen because the average time spent by a service engineer to repair a fault is about four hours. A longer or shorter period may also be chosen.
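The periodic cycle can be sketched as a loop over four-hour windows: a forecast is made from the current history, and the faults observed during the window are then appended to the history. The `predict` argument is a hypothetical stand-in for the ensemble and weight-update pipeline; the usage below passes `len` only to keep the example self-contained.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Sequence, Tuple

PERIOD = timedelta(hours=4)  # prediction period used in the telecom example


def rolling_forecast(history: Sequence[datetime], live_events: Sequence[datetime],
                     start: datetime, n_periods: int,
                     predict: Callable[[List[datetime]], int]) -> Tuple[List[int], List[datetime]]:
    """Forecast each period from the current history, then fold that period's events into it."""
    history = list(history)
    forecasts = []
    for k in range(n_periods):
        lo = start + k * PERIOD
        hi = lo + PERIOD
        forecasts.append(predict(history))  # forecast made at the start of the period
        history.extend(e for e in live_events if lo <= e < hi)  # observed faults become history
    return forecasts, history


t0 = datetime(2017, 1, 1)
past = [t0 - timedelta(hours=3), t0 - timedelta(hours=1)]
live = [t0 + timedelta(hours=h) for h in (1, 2, 5, 9)]
forecasts, hist = rolling_forecast(past, live, t0, n_periods=2, predict=len)
print(forecasts)  # [2, 4]
print(len(hist))  # 5 (the fault at 09:00 falls outside the two periods)
```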
It should be noted that the predictions obtained from this step may not be optimal. For good predictions, existing RL requires a lot of historical data, and particularly data that is not noisy. (The fault data in the telecommunications service industry example, for instance, is particularly noisy.) In many scenarios, there may not be enough historical data available, it may be too noisy, or there may not be space available to store it. Therefore, the policy learnt in existing RL may not be optimal. To improve this, the weight updater 114 is provided to help learn the optimal policy by predicting the future fault invariants and changing the current policy such that the prediction error is minimized.
Updating weights (such as by reward tuning) as described here may be considered analogous to a model predictive controller (MPC), which overlaps significantly with RL techniques. The idea of MPC has not previously been used in the design of optimal rewards.
To review, the objective of an MPC is to drive the output of a process y[k] (where k is the time at which the output is recorded) to a fixed value s as quickly as possible. This is achieved by predicting the output of the system over the next N instants and changing the input of the system at the current time such that the actual output reaches the fixed value s as soon as possible. The output of the process y[k] is controlled by changing the input of the process u[k]. This is analogous to the RL mechanism, where s is the optimal policy, y[k] is the state of the process, and u[k] is the set of actions to be taken. In MPC, the model between u[k] and y[k] is used to minimize the value y[k]−s. Mathematically, it can be written as:
In the above equation, N is known as the prediction horizon, and f(.) is the model built between the output and the input of the process. Note that the N used here is different from, and not related to, the number of models in the set of ensemble models; rather, it is the number of time instants predicted ahead, hence the name “prediction horizon.” The optimization problem is solved at the current instant and the input u[k] is estimated. Similarly, the input is calculated at each subsequent sampling instant by solving the optimization problem:
Here f(.) may be chosen as a state-space model.
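The receding-horizon idea behind MPC can be shown on a toy problem. The sketch assumes a hypothetical scalar linear model y[k+1] = a*y[k] + b*u[k], a quadratic cost, the sum of (y[k+j] − s)² for j = 1..N, and an input held constant over the horizon (a common move-blocking simplification), which gives a closed-form least-squares input. Only that input is applied before the problem is solved again at the next instant. The function names and the parameters `a` and `b` are assumptions for illustration.

```python
from typing import List


def mpc_input(y_k: float, a: float, b: float, s: float, horizon: int) -> float:
    """Constant input u minimizing sum_{j=1..N} (y[k+j] - s)**2 for y[k+1] = a*y[k] + b*u."""
    num = den = 0.0
    for j in range(1, horizon + 1):
        alpha = a ** j * y_k                      # free response j steps ahead
        beta = b * sum(a ** m for m in range(j))  # response to holding u for j steps
        num += beta * (s - alpha)
        den += beta * beta
    return num / den  # least-squares solution of the horizon problem


def run_mpc(y0: float, a: float, b: float, s: float, horizon: int, steps: int) -> List[float]:
    """Receding horizon: solve at each instant, apply the input, then re-solve."""
    y, trajectory = y0, [y0]
    for _ in range(steps):
        u = mpc_input(y, a, b, s, horizon)
        y = a * y + b * u
        trajectory.append(y)
    return trajectory


print(run_mpc(0.0, 1.0, 1.0, 5.0, horizon=1, steps=2))  # [0.0, 5.0, 5.0]
```

With a longer horizon the controller moves toward the set-point more cautiously, because it also accounts for the effect of the same input on later instants.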
In the case of prediction as discussed here, the input u[k] consists of past data when there are no independent variables. In the case of independent variables, the input u[k] consists of both past data and independent variables.
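Applying this to the system 100, the process is first modeled with a deep RL model using randomly chosen rewards; then, using the computed predictions, the following optimization problem may be solved:
$$\min_{R} \sum_{i=k+1}^{k+N} \left( f\left(R,\, y[i-p],\, u[i-p]\right) - y[i] \right), \quad p = 1, 2, 3, \ldots$$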
where the output at the current instant is predicted as
$$\hat{y}[i] = f\left(R,\, y[i-p],\, u[i-p]\right)$$
In the above equation, R is the reward chosen to compute the predictions, and y[i] is the actual output at instant i. Here f(.) is the RL model built during the process, and u is the set of independent variables.
It should be remarked that the weights chosen at the end of this step are not necessarily optimal in a strict sense. The optimality of the solution depends on many factors, such as the initial choice of rewards, the current predictions, the choice of the models, and so on. Notwithstanding, predictions obtained at the end of this step have lower prediction error than predictions obtained without the weight updater (e.g., without applying a reward tuning function), and such predictions are referred to as optimal predictions 106.
The pseudo-code for updating weights by the weight updater is given below. The pseudo-code shown predicts the time of the fault in the table; the code to predict the other parameters involves similar calculations.
$r_0 = [1\ 1\ 1\ \cdots]^T$, $K = 0$ and $P = 10$.
Based on the prediction output, it is possible to then assign or allocate resources. Allocating resources will depend on the nature of resources being allocated. For example, for a cloud-computing type environment, where the resources include computing resources, different computing resources may have heterogeneous capabilities (e.g. number of graphics processors, available RAM, size of L2 cache).
Turning to the telecommunications service industry example previously discussed, resources may include service engineers. These engineers may have different levels of experience and may be positioned at different geographic locations, for instance. The problem of assigning the service engineers, in some embodiments, can be solved as an integer linear programming (ILP) problem where the decision variables are either 0 or 1. Established solvers exist for such ILPs; for example, Gurobi can readily be used as a solver from CVX. The proposed formulation of the ILP problem uses parameters such as the distance a service engineer has to travel, domain expertise, and the time to reach the destination. The final function to be solved is
where $d$ is the distance travelled by a service engineer, and $t_{ij}$ is the time taken by service engineer $i$ to reach destination $j$. The first constraint, $\sum_{j=1}^{M}\sum_{i=1}^{N} a_{ji} = M$, ensures that $M$ objects (the total number of predicted faults in the four-hour period) are assigned. The second constraint, $\sum_{j=1}^{M} a_{ji} \le 1 \;\forall i = 1, \ldots, N$, ensures that at most one object is assigned to each person. The third constraint, $a_{ji} \in \{0, 1\}$, ensures that each service engineer is either selected or not.
Using the above ILP technique, all objects are assigned optimally to the service engineers.
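The sketch below assumes a hypothetical cost alpha*d + beta*t for each engineer/fault pair and finds the optimal 0/1 assignment by exhaustive search, so that every predicted fault gets one engineer and no engineer gets more than one fault. Exhaustive search is only practical for a handful of engineers and faults; for realistic sizes an ILP solver (such as Gurobi) would be used instead. The function name and the weights `alpha` and `beta` are assumptions.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def assign_engineers(distance: Sequence[Sequence[float]],
                     travel_time: Sequence[Sequence[float]],
                     alpha: float = 1.0, beta: float = 1.0) -> Tuple[List[List[int]], float]:
    """Exhaustive search over 0/1 assignments of M faults to N >= M engineers.

    distance[i][j] and travel_time[i][j] refer to engineer i and fault j.
    The result a[j][i] satisfies sum(a) == M and sum_j a[j][i] <= 1.
    """
    n_eng, n_faults = len(distance), len(distance[0])
    if n_faults > n_eng:
        raise ValueError("need at least as many engineers as predicted faults")
    best_cost, best = float("inf"), None
    for chosen in permutations(range(n_eng), n_faults):  # chosen[j] = engineer sent to fault j
        cost = sum(alpha * distance[i][j] + beta * travel_time[i][j]
                   for j, i in enumerate(chosen))
        if cost < best_cost:
            best_cost, best = cost, chosen
    a = [[0] * n_eng for _ in range(n_faults)]  # a[j][i] = 1 if engineer i handles fault j
    for j, i in enumerate(best):
        a[j][i] = 1
    return a, best_cost


dist = [[1.0, 10.0], [10.0, 1.0], [5.0, 5.0]]  # 3 engineers x 2 predicted faults
hours = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
a, cost = assign_engineers(dist, hours)
print(a, cost)  # [[1, 0, 0], [0, 1, 0]] 2.0
```

Domain expertise could be folded in the same way, e.g. by adding a penalty term for engineers who lack the skills for a given fault type.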
A couple of illustrations will now be described.
Illustration 1: In this illustration, based on the telecommunications service industry example, the possible faults occurring in a particular object, along with the level and type of each fault and its possible time, are to be predicted. For this, we assume that the history of the faults is known, along with the level, type, and time of each fault. Sample data is shown in Table 1.
A deep learning model has been trained on historical data such as this (as part of an ensemble method) to understand and predict the faults. This results in a set of ensemble models. Since the data considered in this illustration is noisy and can have outliers (e.g. some towers have one or fewer faults), the resulting model has poor accuracy. An example of an outlier may be the third row in Table 1 above, where a specific alarm type (X.733) occurred at a particular tower in a particular location only once. Similar outliers can exist elsewhere in the data. Hence, to improve the accuracy of the model, it can be beneficial to modify the ensemble method by using the RL and weight updater modules described herein.
In any reinforcement learning algorithm, a reward function must be specified. One example is a constant reward function, which may be deduced from past data. For example, in the table above, there are three instances of a specific fault type (X.733) in a particular tower at a particular location (600013). In this case, the reward of the state (fault at the location 600013) could be calculated as 3/(total number of faults in the data). A similar constant function could be calculated for the remaining faults. This choice of reward function, however, is not typically optimal, as noisy data can lead to more reward than is appropriate. Another problem with this choice is that the optimal policy calculation can take longer to converge. Another choice could be a polynomial function giving more weight to frequently repeated faults and less weight to rarely repeated faults. This also may not be optimal. These reward functions can result in poorer predictions because they are chosen independently of the environment.
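The count-based rewards described above can be computed as follows. With `power=1`, each state's reward is its number of occurrences divided by the total number of faults (e.g. 3/4 if X.733 at location 600013 makes up three of four records); `power>1` is one possible polynomial choice that gives more weight to repeated faults. The function name and the second state in the usage example (`ALARM_B` at location `600020`) are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Dict, Sequence


def count_rewards(states: Sequence[Hashable], power: float = 1.0) -> Dict[Hashable, float]:
    """Constant reward per state from past data: count**power, normalized to sum to 1.

    power=1 gives occurrences / total faults; power>1 favors frequently repeated faults.
    """
    counts = Counter(states)
    weights = {s: n ** power for s, n in counts.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


faults = [("X.733", "600013")] * 3 + [("ALARM_B", "600020")]
print(count_rewards(faults)[("X.733", "600013")])             # 0.75
print(count_rewards(faults, power=2)[("X.733", "600013")])    # 0.9
```

Both variants depend only on past counts, not on how the resulting policy performs, which is why they are described above as independent of the environment.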
Tests were run using 10,000 sample faults (with data such as that shown in Table 1). For these tests, 8,000 faults were used for training and 2,000 faults were used for testing the models. The prediction horizon used was 2, i.e. the rewards for each next sample were predicted based on the prediction error obtained for the next two samples. As a result, rewards were computed; as an example, the first 6 sample entries so computed were {0.5,0.4,0.3,0.43,0.54,0.21}. The rewards were then fed back to the system to compute the predictions. Choosing a high value for the desired accuracy threshold increases the number of iterations and can also increase the risk of over-fitting. On the other hand, choosing a lower value can result in poorer predictions. This value should therefore be chosen with care.
The output of the minimization problem (that is, equation (3) above) results in optimal rewards being calculated. Once the rewards were calculated, the RL model was used to obtain the predictions. Based on the prediction error obtained, the rewards were then recalculated by predicting the rewards for next instants by solving the optimization problem for N future instants, by considering them
If required (depending on the desired accuracy), the rewards are once again calculated (by solving the optimization problem at next instant) to improve the accuracy of the predictions.
The type of solution (global or local) depends on the model chosen to implement the process. For a one-layer network with a linear activation function, the model is linear, and hence the minimization problem yields a global solution that is easy to obtain. However, if the network has multiple layers and a non-linear activation function, then the optimization problem is non-linear and generally converges to a local solution. Embodiments can readily adapt to both cases.
Illustration 2: In this illustration, based on the telecommunications service industry example, fault alarms in a real-time scenario are predicted. Alarm data was obtained from a service provider and collected over four months (the first four months of 2017). Three months of the data were used to train the model, and the fourth month of the data was used for testing. Part of the data used is shown in Table 2 below.
While the underlying data included more columns, we focused here on alarm type, node type, location, and time of the fault. It should be noted that the columns (Alarm type, node type, location) are categorical variables while the time is a continuous variable.
The data considered here is obtained from 19 locations across the world. There are 4 alarm types and 20 different node types in the data. The 4 alarm types considered are shown in table 3 below. The unique node types considered are shown in table 4 below.
The different parameters (here the columns of data including Alarm type, node type, location, and time) were analyzed for any correlation. Correlation plots of the data were obtained in order to facilitate this analysis. The Alarm type and node type were correlated, and time was correlated with itself. We also found that the location was not correlated with any of the parameters. Therefore for this illustration, location was not used as a predictor variable. The data was then filtered across each location and the following process performed to predict the remaining variables.
Predicting the Time of the Fault
According to the data for this illustration, the time of the fault is independent of all the other variables. The time of the fault was therefore modeled using a time-series model. First, the data was split into 80% for training the model and 20% for testing it. Next, an ensemble model was built on the training data and used to predict the time of the fault. The accuracy of each model was then calculated and, depending on the accuracy, the RL and weight updater modules described above were used to calculate the rewards by assigning proper weights to the ensemble model. This was repeated until the desired accuracy was achieved.
As part of this, one of the rewards was plotted as a function of the number of iterations, as shown in
The desired accuracy is generally chosen based on the application. In this illustration, a desired accuracy of 99% was used for time-series prediction, as the time should be estimated with high accuracy. In addition to time, the node type and type of the fault are also predicted.
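The iterative loop described for time prediction (score the ensemble, reward the better models, reweight, and repeat until the desired accuracy is reached) can be sketched as follows. This is not the RL and weight updater modules themselves: as an assumption, a Hedge-style multiplicative update stands in for them, the accuracy measure is a hypothetical one minus the mean absolute percentage error, and `target=0.99` mirrors the desired accuracy used for time prediction. All function names are illustrative.

```python
import math


def ensemble_predict(weights, sub_preds):
    """Weighted average of sub-model predictions; sub_preds[m][t] is model m at step t."""
    horizon = len(sub_preds[0])
    return [sum(w * p[t] for w, p in zip(weights, sub_preds)) for t in range(horizon)]


def accuracy(pred, actual):
    """Hypothetical accuracy: 1 - mean absolute percentage error (actual must be non-zero)."""
    mape = sum(abs(p - a) / abs(a) for p, a in zip(pred, actual)) / len(actual)
    return 1.0 - mape


def fit_ensemble_weights(sub_preds, actual, target=0.99, eta=5.0, max_iter=200):
    """Reweight sub-models until the ensemble reaches `target` accuracy or max_iter is hit.

    Returns (weights, ensemble_accuracy, iterations_used).
    """
    m = len(sub_preds)
    weights = [1.0 / m] * m  # start from a uniform ensemble
    for it in range(max_iter):
        acc = accuracy(ensemble_predict(weights, sub_preds), actual)
        if acc >= target:
            return weights, acc, it
        # Reward each sub-model by its own accuracy; a higher reward grows its weight.
        rewards = [accuracy(p, actual) for p in sub_preds]
        weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
        total = sum(weights)
        weights = [w / total for w in weights]  # keep the weights summing to 1
    return weights, accuracy(ensemble_predict(weights, sub_preds), actual), max_iter
```

With an accurate sub-model `[10.0, 12.0, 11.0, 13.0]`, a noisy one `[15.0, 6.0, 16.0, 8.0]`, and actual values `[10.0, 12.0, 11.0, 13.0]`, the loop shifts nearly all of the weight onto the accurate model within a few iterations and stops once the ensemble accuracy reaches 0.99. The same loop, with a lower `target` such as 0.85, could be reused for the classification targets.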
Predicting the Node Type
According to the data for this illustration, the node type is correlated only with the time of the fault. Therefore, the time of the fault was used as the independent variable for predicting the node type, and the same steps used to predict the time were applied. In this example, the desired accuracy for node type was set at 85%.
Predicting the Alarm Type
According to the data for this illustration, the alarm type is correlated with both the time of the fault and the node type. Therefore, the time of the fault and the node type were used as independent variables for predicting the alarm type, again following the same steps used to predict the time and node type. In this example, the desired accuracy for alarm type was also set at 85%.
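The dependency order used in this illustration (time of the fault first, then node type from time, then alarm type from time and node type) can be expressed as a prediction chain. In the sketch below, a hypothetical conditional-mode lookup (the most frequent label seen for each combination of inputs in the training data) stands in for the ensemble sub-models. The fault time is assumed to arrive as an hour of day already produced by the time predictor, and the labels `node_A`, `alarm_1`, and so on are placeholders rather than real node or alarm types.

```python
from collections import Counter, defaultdict


def fit_mode_classifier(features, labels):
    """Hypothetical stand-in model: map each feature tuple to its most frequent training label."""
    counts_by_key = defaultdict(Counter)
    for x, y in zip(features, labels):
        counts_by_key[x][y] += 1
    # Unseen feature tuples fall back to the overall most frequent label.
    fallback = Counter(labels).most_common(1)[0][0]
    table = {x: c.most_common(1)[0][0] for x, c in counts_by_key.items()}
    return lambda x: table.get(x, fallback)


def fit_chain(rows):
    """rows: (fault_hour, node_type, alarm_type) training tuples for one location."""
    # Node type depends only on the time of the fault.
    node_model = fit_mode_classifier([(h,) for h, _, _ in rows], [n for _, n, _ in rows])
    # Alarm type depends on both the time of the fault and the node type.
    alarm_model = fit_mode_classifier([(h, n) for h, n, _ in rows], [a for _, _, a in rows])

    def predict(predicted_hour):
        # The hour is assumed to come from the separate time-of-fault predictor.
        node = node_model((predicted_hour,))
        alarm = alarm_model((predicted_hour, node))
        return node, alarm

    return predict
```

Feeding the predicted node type into the alarm-type model, rather than the true node type, means that at inference time errors in an earlier stage propagate to later ones, which is one reason each stage is given its own desired accuracy.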
After predicting the time of fault, the node type, and the alarm type, the accuracies of these predictions were recorded for observation. These accuracies are provided below:
From the table, it is evident that the proposed algorithm makes good predictions on the real-time data. For comparison, similar predictions were made using RL alone, without modifying the reward function with the weight updater, and the accuracies of these predictions were also recorded. In this case, the RL did not converge because of the noisiness of the data. The method uses a stochastic-gradient-type approach to estimate the optimal rewards. The accuracies obtained are given in the table below.
As further evidence, these results can also be compared with another system for predicting faults. The accuracies of predictions made using the method disclosed in Ostrand, Thomas J., Elaine J. Weyuker, and Robert M. Bell, “Predicting the location and number of faults in large software systems.” IEEE Transactions on Software Engineering 31, no. 4 (2005): 340-355, are shown in the table below.
As these results show, the accuracies obtained using the proposed algorithm compare favorably with those of the existing method, which demonstrates the efficacy of the proposed method.
In some embodiments, updating the weights selected by the reinforcement learning by looking ahead over a prediction horizon and optimizing the reward function at the given time instance comprises:
Step 1) initializing weights for the predictions from the sub-models;
Step 2) computing predictions of multiple parameters over the prediction horizon using the weights for the predictions from the sub-models;
Step 3) computing a minimization function to update the reward function to minimize prediction error, whereby the weights for the predictions from the sub-models are updated;
Step 4) computing the predictions of multiple parameters over the prediction horizon using the weights for the predictions from the sub-models that were updated in step 3; and
Step 5) determining whether a prediction error is less than a second threshold.
In some embodiments, as a result of step 5, it is determined that the prediction error is not less than the second threshold, and updating the weights selected by the reinforcement learning further comprises: discarding a sample used in step 2 for computing the predictions of multiple parameters over the prediction horizon using the weights for the predictions from the sub-models; and repeating steps 2 through 5 until it is determined that the prediction error is less than the second threshold. In some embodiments, computing the minimization function of step 3 comprises optimizing

$$\min_{R} \sum_{i=k+1}^{k+N} \left( f\left(R,\, y[i-p],\, u[i-p]\right) - y[i] \right), \quad p = 1, 2, 3, \ldots$$

where R is the reward function, y[i] is the actual output calculated at the given time instant i, f(·) is the reinforcement learning model, u denotes the multiple parameters, and the sum runs over the N time instants of the prediction horizon following the current time instant k.
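Steps 1 through 5, together with the discard-and-repeat rule, can be sketched as below under several assumptions that the text does not state: f(·) is taken to be a weighted combination of fixed sub-model predictions, those weights play the role of R, the signed sum in the objective is replaced by a mean squared error (a signed sum of errors can be driven arbitrarily negative), and plain gradient descent performs the minimization in step 3. The function names and the parameters `lr`, `steps`, and `max_rounds` are hypothetical.

```python
def ensemble_output(weights, sub_preds, i):
    """Hypothetical f(R, .): weighted combination of the sub-model predictions for sample i."""
    return sum(w * p[i] for w, p in zip(weights, sub_preds))


def horizon_mse(weights, sub_preds, actual, window):
    """Mean squared prediction error over the samples in the horizon window."""
    errs = [(ensemble_output(weights, sub_preds, i) - actual[i]) ** 2 for i in window]
    return sum(errs) / len(errs)


def lookahead_update(sub_preds, actual, threshold, lr=0.02, steps=500, max_rounds=10):
    """sub_preds[m][i]: prediction of sub-model m for sample i; actual[i]: observed y[i]."""
    m = len(sub_preds)
    weights = [1.0 / m] * m  # Step 1: initialize the sub-model weights uniformly
    mse = float("inf")
    for start in range(min(max_rounds, len(actual))):
        window = range(start, len(actual))  # samples still inside the horizon
        # Steps 2-3: predict over the horizon and take gradient steps on the squared error.
        for _ in range(steps):
            grads = [0.0] * m
            for i in window:
                err = ensemble_output(weights, sub_preds, i) - actual[i]
                for j in range(m):
                    grads[j] += 2.0 * err * sub_preds[j][i] / len(window)
            weights = [w - lr * g for w, g in zip(weights, grads)]
        # Step 4: recompute the horizon predictions with the updated weights.
        mse = horizon_mse(weights, sub_preds, actual, window)
        # Step 5: stop once the prediction error is below the second threshold.
        if mse < threshold:
            break
        # Otherwise the oldest sample is discarded (start grows by 1) and steps 2-5 repeat.
    return weights, mse
```

For example, with sub-model predictions `[1.0, 2.0, 3.0, 4.0]` and `[2.0, 1.0, 4.0, 3.0]` against actual values `[1.0, 2.0, 3.0, 4.0]` and a threshold of `1e-3`, the weights converge close to `[1.0, 0.0]` in the first round. The learning rate must be small relative to the scale of the predictions for gradient descent to remain stable.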
In some embodiments, at least one of the multiple parameters is related to a fault, and using the prediction of the multiple parameters to manage resources comprises assigning resources to correct the predicted fault. In some embodiments, the multiple parameters include (i) a location of a fault, (ii) a type of the fault, (iii) a level of a node where the fault occurred, and (iv) a time of the fault. In some embodiments, using the prediction of the multiple parameters to manage resources comprises solving an integer linear programming (ILP) problem as follows:
where d is the distance to the location of the fault, $t_{ij}$ is the time taken by resource i to reach location j, and M is the total number of predicted faults in a time period. The constraint $\sum_{j=1}^{M} \sum_{i=1}^{N} a_{ji} = M$ ensures that M resources are assigned, the constraint $\sum_{j=1}^{M} a_{ji} \le 1 \;\; \forall i = 1, \ldots, N$ ensures that at most one object is assigned to each resource, and the constraint $a_{ji} \in \{0, 1\}$ ensures that a resource is either selected or not.
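The objective of the ILP is not reproduced above, so the following sketch assumes that it is to minimize the total travel time, the sum of $a_{ji} t_{ij}$ over all resource and fault pairs, and additionally assumes that each predicted fault receives exactly one resource, an assignment that satisfies all three stated constraints. It enumerates every feasible assignment by brute force, which is practical only for small M and N; larger instances would call for an ILP solver or an assignment routine such as SciPy's `scipy.optimize.linear_sum_assignment`. The function name is hypothetical.

```python
from itertools import permutations


def assign_resources(travel_time):
    """travel_time[i][j]: t_ij, the time for resource i to reach predicted fault j.

    Returns (assignment, total_time), where assignment[j] is the resource index for fault j.
    """
    n = len(travel_time)     # N available resources
    m = len(travel_time[0])  # M predicted faults
    if m > n:
        raise ValueError("need at least as many resources as predicted faults")
    best, best_cost = None, float("inf")
    # Each ordered choice of M distinct resources gives one resource per fault, so
    # sum(a_ji) = M, each resource serves at most one fault, and a_ji is 0 or 1.
    for chosen in permutations(range(n), m):
        cost = sum(travel_time[i][j] for j, i in enumerate(chosen))
        if cost < best_cost:
            best, best_cost = list(chosen), cost
    return best, best_cost
```

For three resources and two predicted faults with travel times `[[4, 1], [2, 3], [5, 5]]`, the cheapest assignment sends resource 1 to fault 0 and resource 0 to fault 1, for a total travel time of 3, leaving resource 2 unassigned.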
In some embodiments, using the prediction of the multiple parameters to manage resources comprises assigning human resources based on one or more of the multiple parameters. In some embodiments, using the prediction of the multiple parameters to manage resources comprises assigning computing resources based on one or more of the multiple parameters.
While various embodiments of the present disclosure are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IN2019/050189 | 3/5/2019 | WO | 00 |