This application is based upon and claims the benefit of priority of the prior Indian Patent Application number 202311052286, filed on Aug. 3, 2023, the entire contents of which are incorporated herein by reference.
The present invention relates to forecasting network usage data, and in particular to a computer-implemented method, a computer program, and an information processing apparatus.
Mobile technologies and services contributed €757 billion to European GDP in 2021 and it has been predicted that by 2025, 5G will account for nearly half of mobile connections in Europe (44%) and that major European markets will lag global peers (https://www.telefonica.com/en/communication-room/blog/the-future-of-connectivity-in-europe-perspectives-and-policies/).
Setting up 5G connections and infrastructure requires more hardware resources compared to other networks such as LTE, 4G, and 3G, etc. Achieving maximum resource utilization can reduce the hardware cost and hence may result in cheaper internet and more benefits to network service providers. It is therefore useful to forecast 5G usage data in regions which do not currently have 5G connection, for example to see what infrastructure might be required and/or whether it would be worth the cost/effort of setting up the infrastructure.
In view of the above, an improved method for forecasting 5G usage data in areas without 5G connectivity is desired.
According to an embodiment of a first aspect there is disclosed herein a computer-implemented method comprising performing a forecasting process to predict 5G usage data for a target geographical area (which does not have 5G connectivity/for which historical 5G usage data does not exist or is not available) to include (wherein the forecasting process comprises) generating at least one of first to third 5G usage data predictions for the target geographical area, wherein generating the first 5G usage data prediction comprises: using a first model, which has been trained using data of the target geographical area (to predict non-network data of a second time period based on non-network data of a first time period before the second time period), to generate a first intermediate prediction by predicting non-network data of the target geographical area of a future time period based on non-network data of the target geographical area of a past time period; using a second model, which has been trained using data of at least one reference geographical area (which has or has had 5G connectivity) (to predict non-5G network usage data of the second time period based on non-network data and non-5G network usage data of the first time period), to generate a second intermediate prediction by predicting non-5G network usage data of the target geographical area of the future time period based on the non-network data and non-5G network usage data of the target geographical area of the past time period; and using a third model, which has been trained using data of the at least one reference geographical area (to predict 5G usage data of the second time period based on non-network data and the non-5G network usage data of the second time period), to generate the first 5G usage data prediction by predicting 5G usage data of the target geographical area of the future time period based on the predicted non-network data of the first intermediate prediction and the predicted non-5G network 
usage data of the second intermediate prediction, wherein generating the second 5G usage data prediction comprises: using a fourth model, which has been trained using data of the target geographical area (to predict non-5G network usage data of the second time period based on the non-network data and non-5G network usage data of the first time period), to generate a third intermediate prediction by predicting non-5G network usage data of the target geographical area of the future time period based on the non-network data and the non-5G network usage data of the target geographical area of the past time period; and using the third model to generate the second 5G usage data prediction by predicting 5G usage data of the target geographical area of the future time period based on the predicted non-network data of the first intermediate prediction and the predicted non-5G network usage data of the third intermediate prediction, wherein generating the third 5G usage data prediction comprises: using a fifth model, which has been trained using data of the at least one reference geographical area (to predict combined network usage data of the second time period based on the non-network data of the first time period (the combined network usage data comprising usage data relating to 5G and non-5G networks)), to generate a fourth intermediate prediction by predicting combined network usage data of the target geographical area of the future time period based on the non-network data of the target geographical area of the past time period (the combined network usage data comprising usage data relating to 5G and non-5G networks); using a sixth model, which has been trained using data of the at least one reference geographical area (to predict 5G usage data of the second time period based on the non-network data and the non-5G network usage data of the first time period), to generate a fifth intermediate prediction by predicting 5G usage data of the target geographical area of the 
future time period based on the non-network data and the non-5G network usage data of the target geographical area of the past time period; subtracting the predicted 5G usage data of the fifth intermediate prediction from the combined network usage data of the fourth intermediate prediction to generate a sixth intermediate prediction comprising predicted non-5G network usage data of the target geographical area of the future time period; and using the third model to generate the third 5G usage data prediction by predicting 5G usage data of the target geographical area of the future time period based on the predicted non-network data of the first intermediate prediction and the predicted non-5G network usage data of the sixth intermediate prediction, wherein non-network data comprises any of location data, (geographical data,) demographic data, weather data, infrastructure data, and (vehicular) traffic data.
Features relating to any aspect/embodiment may be applied to any other aspect/embodiment.
Reference will now be made, by way of example, to the accompanying drawings, in which:
The following definitions may be used in the description but are not exhaustive.
Cold-start forecasting: Generating a forecast for a variable for which there is no historical data (or without using historical data about that variable). Cold-start forecasting is common in the e-commerce domain, for example where a forecast (e.g. of sales) is generated for a new product which, because it is new, has no historical data.
Multi-variate forecasting: Generating a forecast for multiple variables together and/or based on multiple variables together.
Spatio-temporal forecasting: Generating a forecast comprising and/or based on variables that vary across both time and space.
Distant Supervision: A machine learning/deep learning technique that may be used to learn direct and/or latent relations between entities or features from data in different distributions. Distant supervision in this sense borrows from its use in natural language processing (NLP). In NLP, the core intuition behind distant supervision is that if two entities are related in a knowledge base and/or clean unstructured text data, then any sentence that mentions those two entities is likely to express that relation. Distant supervision may comprise training a machine learning model using training data generated based on existing training data.
Forecasting 5G usage data in an area which does not currently have 5G connectivity requires a cold-start forecasting approach because historical data of 5G usage for that area does not exist. Cold-start 5G network usage forecasting provides valuable insights that may support informed decision-making, efficient resource allocation, improved service delivery, and enhanced user experiences, benefiting various stakeholders in the telecommunications industry.
In the e-commerce domain cold-start forecasting may be used to forecast sales of a new product. For example, historical data related to user behavior and related to sales of other similar products (among other data) may be used to forecast sales data for the new product. This technique for cold-start forecasting in the e-commerce domain may be referred to herein as a first comparative method.
In the weather domain, cold-start forecasting may be used to predict weather patterns in an area in which weather data has not previously been collected. For example, historical data related to weather in other areas, geographical features, and weather patterns in other areas may be used to forecast weather data for the new area. This forecasting comprises, for example, using existing overlapping weather patterns of other areas. This technique for cold-start weather forecasting may be referred to herein as a second comparative method.
The approaches used in the first and second comparative methods are not appropriate for generating forecasts of 5G usage data in an area which does not currently have 5G connectivity. That is, those approaches, when applied to cold-start 5G forecasting, do not generate useful forecasts.
A reason the approach of the first comparative method is not appropriate for 5G forecasting is that the distribution of data in a 5G area is too different from the distribution of data in a non-5G area. That is, the data of the 5G area belongs to a different distribution from that of the data of the non-5G area. In contrast, in the first comparative method there exist products similar to the new product for which historical data exists, as well as a common user base. It may be considered that a reason for the difference in distributions in the cold-start 5G forecasting case is that 5G and other networks (e.g. 4G, 3G, 2G, LTE, etc.) are too different from each other.
A reason that the approach of the second comparative method is not appropriate for 5G forecasting is that, similar to the above reasoning, the data of the 5G area belongs to a different distribution to that of the data of the non-5G area. In contrast, in the second comparative method the distributions of the data of the different geographical areas are more similar to each other (at least for some geographical areas). Another reason is that in the second comparative method there exist overlapping weather patterns as well as an overall common weather pattern/background for the entire world (or at least for large areas comprising multiple smaller areas). Such patterns can be learned and used in the forecasting. In contrast, in the cold-start 5G forecasting scenario there are no such overlapping patterns (or at least no useful overlapping patterns).
A problem with cold-start forecasting for 5G usage may be set out as follows. Let D={d0, d1, d2, . . . , di, . . . , dn}, where D is the set of time series data distributions from areas having 5G network usage history. di represents the ith area, having spatial information, a 5G usage time series history, and non-5G (e.g. 2G, 3G, 4G and other networks) network usage history. Here, an area means a geographical area having 5G network infrastructure (for example, covering at least 1 km²).
Let D′={d0′, d1′, d2′, . . . , dj′, . . . , dn′}, where D′ is the set of time series data distributions from areas having no 5G network usage history. dj′ represents the jth area, having spatial information and non-5G (e.g. 2G, 3G, 4G and other networks) network usage history. Here, an area means a geographical area having non-5G network infrastructure (for example, covering at least 1 km²).
Here, D and D′ are different distributions. There is no overlap or direct correlation between D and D′ (at least no useful overlap or correlation; for example, there may be overlap or correlation in e.g. weather data, but this is not useful). An aim of aspects disclosed herein is to forecast the 5G usage time series for any given jth area of D′ (represented as dj′) for which 5G usage history is not available. Aspects disclosed herein may provide forecasted values directly and/or ranges of values (e.g. minimum and maximum values).
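The sets D and D′ can be pictured as collections of per-area records that differ only in whether a 5G usage history is present. The following Python sketch is illustrative only: the class name AreaSeries, its fields, and the rule that an area belongs to D exactly when it carries a 5G usage history are assumptions, not a prescribed data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AreaSeries:
    """Hypothetical record for one area (one element of D or D')."""
    area_id: str
    spatial_info: Dict[str, float]        # e.g. coordinates or area size (assumed keys)
    non_network: Dict[str, List[float]]   # variable name -> value per time step
    non_5g_usage: Dict[str, List[float]]  # e.g. {"call_in": [...], "sms_out": [...]}
    usage_5g: Optional[Dict[str, List[float]]] = None  # None when no 5G history exists


def split_areas(areas: List[AreaSeries]):
    """Partition areas into D (5G history available) and D' (no 5G history)."""
    D = [a for a in areas if a.usage_5g is not None]
    D_prime = [a for a in areas if a.usage_5g is None]
    return D, D_prime
```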
Aspects disclosed herein may be considered to comprise learning the time series correlation patterns between different variables of the distributions D, learning the time series patterns of different variables of the given distribution from D′ (say, dj′), and using the (distantly) learned correlation patterns of different variables of the distribution D and time series patterns of dj′ to forecast the 5G usage time series data for dj′.
Aspects disclosed herein may comprise a cold-start forecasting approach to indirectly learn patterns from different distributions (i.e. data of non-5G areas and data of 5G areas) and to use the (distantly) learned patterns to forecast the 5G usage data for a given area or areas. It may be considered that learning patterns from different distributions employs distant-supervision-based strategies, as explained later.
The data used in the methods disclosed herein may be broadly divided into three categories: non-network data; non-5G network usage data; and 5G network usage data.
Non-network data may be referred to as “geographical and other data”, non-5G network data may be referred to as “other networks data” or “other networks (2G, 3G, 4G, etc.) data”, and 5G network usage data may be referred to as “5G network data”.
Each category of data may comprise any number of variables. In the example representation shown in
Call (in) means incoming call, call (out) means outgoing call, SMS (in) means incoming SMS, and SMS (out) means outgoing SMS. All these variables use some network bandwidth. The methods disclosed herein may be applied to any number of other variables which use network bandwidth, for example IoT-based variables, which use internet bandwidth.
As shown in
The weather data may comprise at least one of temperature, wind speed/flow, rain frequency, and rainfall (amount). The traffic data may comprise at least one of a number/density of vehicles, an average speed of vehicles, an average journey time of vehicles, a level of congestion, a number of traffic jams, a level of use of a road network, and a maximum transit capacity (of a road network).
The non-network data may comprise infrastructure data which may comprise any of a number of rail lines, a number of bus lines, a number of tram lines, distance to the nearest airport, the capacity of the nearest airport, and an average frequency of buses and/or trains and/or trams, among others. It will be appreciated that some of these variables may not vary with time. The non-network data may comprise demographic data which may include any of the population or population density, and the economic status of the area or of the people therein, among others.
The variables illustrated in
Non-5G network usage data comprises usage data relating to non-5G networks, for example any of 2G, 3G, 4G, LTE, etc. (for example whatever (telecommunications) networks are installed/used in the area concerned).
In the following description of the training process according to the running example, the reference area is an area for which 5G usage data is available and the target area is an area for which 5G usage data is not available. The reference area may be considered an area having 5G connectivity (or at least which had 5G connectivity in the past). The target area may be considered an area which does not have 5G connectivity (or did not have it in the past, so that (sufficient) 5G usage data is not available). Of course, the target area need not actually lack 5G connectivity. For example, it may be that 5G usage data has not been collected for the target area or is not available for the target area. Alternatively, 5G usage data may be available for the target area but it is desired to forecast 5G usage data without using the historical 5G usage data for the target area. Having and not having 5G connectivity may be referred to as having and not having 5G network infrastructure.
The training process according to the running example comprises training model 1 using non-network data of a reference area before time T, non-5G network usage data of the reference area before time T, and 5G usage data of the reference area after time T. Model 1 is trained to predict the 5G usage data of the reference area after time T based on the non-network data of a reference area before time T and the non-5G network usage data of the reference area before time T.
“Until time T” may be used interchangeably herein with “before time T”. The time period referred to as “before time T” may be referred to as a first time period. The time period referred to as “after time T” may be referred to as a second time period.
In other words, training model 1 comprises training model 1 to predict 5G usage data of the reference area of the second time period based on the non-network data of the reference area of the first time period and the non-5G network usage data of the reference area of the first time period.
The training process according to the running example comprises training model 2 using non-network data of the reference area before T and combined network usage data of the reference area after T. Model 2 is trained to predict the combined network usage data of the reference area after T based on the non-network data of the reference area before T. In other words, training model 2 comprises training model 2 to predict the combined network usage data of the reference area of the second time period based on the non-network data of the reference area of the first time period.
The training process according to the running example comprises training model 3 using non-network data of the reference area before T, non-5G network usage data of the reference area before T, and non-5G network usage data of the reference area after T. Model 3 is trained to predict the non-5G network usage data of the reference area after T based on the non-network data of the reference area before T and the non-5G network usage data of the reference area before T. In other words, training model 3 comprises training model 3 to predict the non-5G network usage data of the reference area of the second time period based on the non-network data of the reference area of the first time period and the non-5G network usage data of the reference area of the first time period.
The training process according to the running example comprises training model 4 using non-network data of a target area before T, non-5G network usage data of the target area before T, and non-5G network usage data of the target area after T. Model 4 is trained to predict the non-5G network usage data of the target area after T based on the non-network data of the target area before T and the non-5G network usage data of the target area before T. In other words, training model 4 comprises training model 4 to predict the non-5G network usage data of the target area of the second time period based on the non-network data of the target area of the first time period and the non-5G network usage data of the target area of the first time period.
The training process according to the running example comprises training model 5 using non-network data of the target area before T and non-network data of the target area after T. Model 5 is trained to predict the non-network data of the target area after T based on the non-network data of the target area before T. In other words, training model 5 comprises training model 5 to predict the non-network data of the target area of the second time period based on the non-network data of the target area of the first time period.
The training process according to the running example comprises training model 6 using non-network data of the reference area after T, non-5G network usage data of the reference area after T, and 5G usage data of the reference area after T. Model 6 is trained to predict the 5G usage data of the reference area after T based on the non-network data of the reference area after T and the non-5G network usage data of the reference area after T. In other words, training model 6 comprises training model 6 to predict the 5G usage data of the reference area of the second time period based on the non-network data of the reference area of the second time period and the non-5G network usage data of the reference area of the second time period.
As disclosed herein, model 1 may be referred to as a sixth model, model 2 as a fifth model, model 3 as a second model, model 4 as a fourth model, model 5 as a first model, and model 6 as a third model. Ordered by these names, the first to sixth models are models 5, 3, 6, 4, 2, and 1, respectively.
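As an illustration of how the six training sets relate to the split at time T, the following Python sketch builds (inputs, target) pairs for models 1-6 from per-time-step feature vectors. It is a sketch under stated assumptions: the helper names are hypothetical, series are plain lists indexed by time step, and the combined network usage target of model 2 is assumed to be the element-wise sum of non-5G and 5G usage over aligned variables (consistent with the subtraction used in the third forecasting process). The optional t2 argument covers the variant in which a gap separates the first and second time periods.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Series = List[List[float]]  # one feature vector per time step


def split(series: Sequence, t1: int, t2: int) -> Tuple[Series, Series]:
    """First period = steps before t1; second period = steps from t2 onwards."""
    return list(series[:t1]), list(series[t2:])


def training_pairs(ref: Dict[str, Series], tgt: Dict[str, Series],
                   t1: int, t2: Optional[int] = None) -> Dict[str, tuple]:
    """Build (inputs, target) pairs for models 1-6 (hypothetical helper).

    ref: reference area, keys "A" (non-network), "B" (non-5G usage), "C" (5G usage).
    tgt: target area, keys "A" and "B" only; no 5G usage is needed.
    t2 defaults to t1, i.e. the two periods directly follow each other.
    """
    t2 = t1 if t2 is None else t2
    A1, A2 = split(ref["A"], t1, t2)
    B1, B2 = split(ref["B"], t1, t2)
    _, C2 = split(ref["C"], t1, t2)
    a1, a2 = split(tgt["A"], t1, t2)
    b1, b2 = split(tgt["B"], t1, t2)
    # Assumption: combined usage = element-wise non-5G usage + 5G usage.
    combined2 = [[x + y for x, y in zip(b, c)] for b, c in zip(B2, C2)]
    return {
        "model1": ((A1, B1), C2),      # sixth model (reference area)
        "model2": ((A1,), combined2),  # fifth model (reference area)
        "model3": ((A1, B1), B2),      # second model (reference area)
        "model4": ((a1, b1), b2),      # fourth model (target area)
        "model5": ((a1,), a2),         # first model (target area)
        "model6": ((A2, B2), C2),      # third model (reference area, same period)
    }
```

Note that only model 6 maps inputs and outputs of the same (second) time period, which is what later allows it to turn predicted future non-network and non-5G data into 5G usage.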
Training any of models 1-6 comprises adjusting at least one network weight of the model (for example to bring the prediction output by the model to or towards the ground truth data). Training any of models 1-6 may comprise utilizing traditional deep learning-based algorithms, for example traditional supervised deep learning algorithms. That is, training any of the models 1-6 may comprise deep learning-based training.
Disclosed herein are forecasting processes for forecasting 5G usage data, i.e. for predicting/forecasting 5G usage data in a target area in a future time period. For example, disclosed herein are first to third forecasting processes.
“Until time T′” may be used interchangeably herein with “before time T′”. The time period referred to as “before time T′” may be referred to as a past time period. The time period referred to as “after time T′” may be referred to as a future time period.
In other words, the first forecasting process comprises
In other words, the second forecasting process comprises
The third forecasting process further comprises, using the fifth model (model 2), generating a fourth intermediate prediction by predicting combined network usage data of the target area after T′ based on the non-network data of the target area before T′. The third forecasting process further comprises, using the sixth model (model 1), generating a fifth intermediate prediction by predicting 5G usage data of the target area after T′ based on the non-network data of the target area before T′ and the non-5G network usage data of the target area before T′.
The third forecasting process further comprises combining the fourth and fifth intermediate predictions to generate a sixth intermediate prediction. The combining comprises subtracting the predicted 5G usage data of the target area after T′ of the fifth intermediate prediction from the predicted combined network usage data of the target area after T′ of the fourth intermediate prediction to generate the sixth intermediate prediction.
The third forecasting process further comprises generating the third 5G usage data prediction by, using the third model (model 6), predicting 5G usage data of the target area after time T′ based on the predicted non-network data of the first intermediate prediction and the predicted non-5G network usage data of the sixth intermediate prediction.
In other words, the third forecasting process comprises
A forecasting process may comprise generating any (at least one) of the first to third 5G usage data predictions, i.e. may comprise any (at least one) of the first to third forecasting processes. The first intermediate prediction may be generated only once in a forecasting process and reused, rather than being generated multiple times.
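The three forecasting processes differ only in how the predicted non-5G network usage of the future time period is obtained before it is passed, together with the first intermediate prediction, to the third model. A minimal Python sketch follows, assuming each trained model is available as a hypothetical callable that maps input series to a predicted series, and that 5G and combined usage share the same variables so that they can be subtracted element-wise.

```python
from typing import Callable, Dict, List

Series = List[List[float]]            # [time_step][variable]
Model = Callable[..., Series]         # hypothetical trained model


def subtract(x: Series, y: Series) -> Series:
    """Element-wise x - y for two aligned series."""
    return [[a - b for a, b in zip(u, v)] for u, v in zip(x, y)]


def forecast_5g(models: Dict[str, Model], A_past: Series, B_past: Series,
                routes=(1, 2, 3)) -> Dict[int, Series]:
    """Run any subset of the first to third forecasting processes for one target area."""
    # First intermediate prediction (first model = model 5), computed once and shared.
    A_future = models["first"](A_past)
    preds = {}
    if 1 in routes:
        B_future = models["second"](A_past, B_past)   # second intermediate prediction
        preds[1] = models["third"](A_future, B_future)
    if 2 in routes:
        B_future = models["fourth"](A_past, B_past)   # third intermediate prediction
        preds[2] = models["third"](A_future, B_future)
    if 3 in routes:
        combined = models["fifth"](A_past)            # fourth intermediate prediction
        usage_5g = models["sixth"](A_past, B_past)    # fifth intermediate prediction
        B_future = subtract(combined, usage_5g)       # sixth intermediate prediction
        preds[3] = models["third"](A_future, B_future)
    return preds
```

For a quick check, each entry of models can be a simple stub function; in practice each would be one of the trained encoder-decoder models.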
A forecasting process may comprise performing at least two of the first to third forecasting processes and combining the 5G usage data predictions to generate a final 5G forecast. The final 5G forecast may comprise an average of the at least two 5G usage data predictions or a predicted range of 5G usage data.
For example, in a forecasting process in which two of the first to third forecasting processes are performed, a predicted range of 5G usage data may comprise, for each variable in the forecast and for each time step, a range whose endpoints are the values of that variable at that time step in the two 5G usage data predictions.
A forecasting process may comprise all of the first to third forecasting processes. In that case, a predicted range of 5G usage data may comprise, for each variable in the forecast and for each time step, a range whose endpoints are the lowest and highest values of that variable at that time step among the first to third 5G usage data predictions.
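The combination options above (an average of the predictions, or a range whose endpoints are the lowest and highest predicted values) can be sketched in Python as follows. Predictions are assumed to be nested lists indexed as [time_step][variable]; the function names are hypothetical.

```python
from typing import List, Tuple

Series = List[List[float]]  # [time_step][variable]


def average_forecast(preds: List[Series]) -> Series:
    """Element-wise mean of several 5G usage data predictions."""
    n = len(preds)
    # zip(*preds) walks the time steps; zip(*steps) walks the variables at that step.
    return [[sum(vals) / n for vals in zip(*steps)] for steps in zip(*preds)]


def range_forecast(preds: List[Series]) -> List[List[Tuple[float, float]]]:
    """Per time step and variable, the (lowest, highest) value across the predictions."""
    return [[(min(vals), max(vals)) for vals in zip(*steps)] for steps in zip(*preds)]
```

With exactly two predictions, the range endpoints reduce to the two predicted values themselves.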
For a forecasting process comprising the first to third forecasting processes a predicted range of 5G usage data may be computed as follows.
In other words, in an implementation example, combining the first to third forecasts comprises for at least one (or each, or the) variable: computing the mean of the variable's predicted values (at each time step) in the first to third 5G usage data predictions; and selecting two values among the variable's predicted values (at each time step) which are closest to the corresponding mean as endpoints of a predicted range for the variable.
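A Python sketch of this closest-to-mean range follows, using the same [time_step][variable] layout assumption as above (function names hypothetical). Ties in distance to the mean are broken by the order of the predictions, which is an arbitrary choice made here.

```python
from typing import List, Sequence, Tuple

Series = List[List[float]]  # [time_step][variable]


def closest_to_mean_range(values: Sequence[float]) -> Tuple[float, float]:
    """Return the two values closest to the mean of `values` as (low, high)."""
    mean = sum(values) / len(values)
    two = sorted(values, key=lambda v: abs(v - mean))[:2]  # stable sort keeps order on ties
    return (min(two), max(two))


def predicted_range(preds: List[Series]) -> List[List[Tuple[float, float]]]:
    """Apply the closest-to-mean rule for every variable at every time step."""
    return [[closest_to_mean_range(vals) for vals in zip(*steps)] for steps in zip(*preds)]
```

Compared with the plain min/max range, this rule discards the single prediction farthest from the mean, so one outlying forecasting process does not widen the range.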
Aspects disclosed herein include methods which comprise a training process and/or a forecasting process—the training process in such a method may comprise training only the models which are used in the subsequent forecasting process (if included).
In testing, the forecasting process comprising the first to third forecasting processes and the combination thereof as described with reference to
The training and/or forecasting processes disclosed herein may comprise collecting at least some of the training and/or input data (any of non-network or network data). Traffic data may comprise data obtained from sensors in a geographical area concerned. The sensors may comprise any of: at least one on-board vehicle sensor; at least one user equipment; at least one camera; and at least one speed sensor. Weather data may comprise data obtained from sensors in a geographical area concerned.
The first time period may be, for example, a number of months (e.g. 1, 6, or others) or weeks or years, etc. The second time period may be, for example, a number of months (e.g. 1, 6, or others) or weeks or years, etc. The past time period may comprise the first and second time periods, for example in a method comprising a training process followed by a forecasting process using the trained models. The time T′ may be the present or close to the present, or may be in the past. Although the first and second time periods have been described above as directly following each other this is not necessary—that is, instead of “before time T” and “after time T”, the time periods “before time T1” and “after time T2” may be employed, such that there is a time period between the first and second time periods. Corresponding considerations may apply to the past and future time periods.
Training processes may comprise using data from multiple target areas and/or multiple reference areas. Forecasting processes may comprise forecasting 5G data usage for multiple target areas. Forecasting processes may comprise forecasting 5G data usage for one or more or all of the target areas used in the training process used to train the models used in the forecasting process.
The first BILSTM layers may be considered to learn (long-term) dependencies between time steps in each set of the input data which is time series data. Each layer performs additive interactions, which can help improve gradient flow over long sequences during training. Each layer comprises a number of BILSTM blocks. The function of a BILSTM Layer may be defined as follows:
h_t=LSTM_forward(x_t,h_{t−1})
g_t=LSTM_backward(x_t,g_{t+1})
y_t=f(concatenate(h_t,g_t))
LSTM_forward and LSTM_backward denote the LSTM functions for forward and backward directions, respectively. ‘concatenate’ is the operation that concatenates the outputs of the forward and backward LSTMs. ‘f’ is the activation function that transforms the concatenated output into the final output y_t.
The bidirectional LSTM processes the input sequence x_t from left to right with the forward LSTM and from right to left with the backward LSTM. The final output y_t is passed to the next layer.
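To make the equations concrete, the following NumPy sketch runs one bidirectional LSTM layer over a single sequence. It is illustrative only: the gate layout (input, forget, output, and candidate gates stacked in one weight matrix), zero initial states, and tanh as the activation f are assumptions, and a practical implementation would typically use a deep learning framework's built-in bidirectional LSTM.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_direction(xs, W, U, b, hidden):
    """Single-direction LSTM over xs (time_steps, input_dim) -> (time_steps, hidden).

    W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,).
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outs = []
    for x in xs:
        z = W @ x + U @ h + b               # stacked gate pre-activations
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)          # additive cell-state update
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)


def bilstm(xs, params_fwd, params_bwd, hidden, f=np.tanh):
    """y_t = f(concatenate(h_t, g_t)) for every time step t."""
    h = lstm_direction(xs, *params_fwd, hidden)              # h_t: left to right
    g = lstm_direction(xs[::-1], *params_bwd, hidden)[::-1]  # g_t: right to left, re-aligned to t
    return f(np.concatenate([h, g], axis=-1))
```

The backward pass is run on the reversed sequence and then reversed again, so that g_t at index t has seen x_t and every later input, matching g_t=LSTM_backward(x_t,g_{t+1}).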
The concatenation layer takes inputs (the outputs from the first BILSTM layers) and concatenates them along a specified dimension. That is, the concatenation layer concatenates the first BILSTM layer outputs (based on two sets of input data) to generate a concatenation which has a dimension appropriate for the next stage.
The first DNNs learn relations and correlations in the data and add nonlinearity in the data. The function of a DNN may be defined as follows:
A DNN (Deep Neural Network) layer refers to a layer within a deep neural network architecture. A typical DNN layer can be mathematically represented as follows:
y=f(Wx+b)
The DNN layer takes the input x, performs a linear transformation by multiplying it with the weight matrix W, and adds the bias vector b. The resulting weighted sum is then passed through the activation function f to compute the output y of the layer. This output is passed to the next layer.
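A NumPy sketch of y=f(Wx+b) follows, with ReLU assumed as the activation f (the description does not fix a particular activation) and two such layers stacked so that the output of one layer is the input of the next. The weights are arbitrary example values.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def dense(x, W, b, f=relu):
    """One DNN layer: linear transform Wx + b followed by activation f."""
    return f(W @ x + b)


# Two stacked layers: the output of the first is passed to the second.
W1, b1 = np.array([[1.0, -1.0], [2.0, 0.0]]), np.array([0.0, -1.0])
W2, b2 = np.array([[1.0, 1.0]]), np.array([0.5])
x = np.array([1.0, 3.0])
h = dense(x, W1, b1)   # W1 @ x + b1 = [-2, 1]; ReLU gives [0, 1]
y = dense(h, W2, b2)   # W2 @ h + b2 = [1.5]; ReLU leaves it unchanged
```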
The self-attention layer focuses on “important” parts of the data based on correlations computed between parts of the input data. The self-attention mechanism allows the model to focus on different parts of the input sequence based on their relevance to each other. By attending to the relevant context, the model can better capture long-range dependencies and improve its ability to generate accurate predictions or representations.
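The description does not fix a particular attention formulation. As one common choice, scaled dot-product self-attention can be sketched in NumPy as follows; the projection matrices Wq, Wk, Wv and the function names are assumptions for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X of shape (time_steps, d_model).

    Returns the attended sequence and the (time_steps, time_steps) attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise relevance between time steps
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights
```

Each output row is a weighted average of the value vectors of all time steps, with weights determined by how strongly that time step correlates with the others; this is how the layer can relate distant parts of the sequence directly.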
The second BILSTM layer learns dependencies in the modified data output from the self-attention layer. The output of each of the BiLSTM blocks of the second BILSTM layer is taken as the output of the second BiLSTM layer.
The second DNNs make predictions based on the outputs of the second BILSTM layer. The second DNNs are implemented with a time-distributed wrapper to extract the predictions at the relevant time and at the relevant time steps. The time-distributed wrapper works as follows:
A time-distributed dense layer is used as the time-distributed wrapper. The time-distributed dense layer is particularly useful when dealing with variable-length sequences or when applying a dense layer to each time step of a sequence individually. Mathematically, the time-distributed dense layer can be represented as follows:
Here, batch_size represents the number of sequences in the batch, time_steps is the length of each sequence, input_dim is the dimensionality of the input at each time step, and units is the number of units or neurons in the dense layer.
The time-distributed dense layer reshapes the input tensor to (batch_size*time_steps, input_dim), applies a standard dense layer with units neurons, and then reshapes the output back to (batch_size, time_steps, units).
By using a time-distributed dense layer, the network produces an output at each time step while applying the same weights at every time step, so the number of parameters does not depend on the sequence length. The temporal patterns and dependencies within the sequence are captured by the preceding layers (e.g. the second BiLSTM layer), and the time-distributed dense layer maps the representation at each time step to the corresponding prediction.
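A NumPy sketch of the reshape, apply, and reshape procedure follows, with no activation on the output (an assumption) and hypothetical names. It shows that a single weight matrix W and bias b are applied to every time step.

```python
import numpy as np


def time_distributed_dense(x, W, b):
    """Apply one dense layer (W: (input_dim, units), b: (units,)) at every time step.

    x: (batch_size, time_steps, input_dim) -> (batch_size, time_steps, units)
    """
    batch_size, time_steps, input_dim = x.shape
    flat = x.reshape(batch_size * time_steps, input_dim)     # merge batch and time axes
    out = flat @ W + b                                       # one standard dense layer
    return out.reshape(batch_size, time_steps, W.shape[1])   # restore the time axis
```

Because W and b have no time index, the parameter count is input_dim*units + units regardless of time_steps.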
The repeat vector layer duplicates the output of the first DNN(s) so that two branches of processing are carried out. Each branch comprises a self-attention network/layer, a second BiLSTM layer, and a second at least one DNN. The architecture comprises two branches of processing because model 2 is configured to output two predicted sets of data. Of course, the two predicted sets of data may be considered combined so that the model is configured ultimately to output one predicted set of data (combined network usage).
The stages of the architecture operate in a similar manner to that described above with reference to
The architectures described above are not essential and it will be appreciated that variations of the architectures may be used.
It may be said that the first to sixth models are encoder-decoder (network) models, and/or that the first to sixth models each comprises a deep neural network, DNN, and/or that the first to sixth models are self-attention-based models, and/or that the first to sixth models are self-attention-based encoder-decoder (network) models, and/or that the first to sixth models each comprises a self-attention network, and/or that the first to sixth models each comprises a BiLSTM layer. It may be said that the first to sixth models each comprises a first BiLSTM layer, a first at least one deep neural network, DNN, a self-attention network, a second BiLSTM layer, and a second at least one DNN.
In a forecasting process any number of variables may be forecasted and not all of the variables in the input data need to be forecasted.
The training and forecasting processes may in some aspects be considered to be divided into stages 1, 2, and 3.
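As a rough illustration of how training and intermediate predictions chain together to produce the first 5G usage data prediction (using the first, second and third models of the first aspect), the following sketch replaces each trained network with a linear least-squares model and uses randomly generated series. The helper name fit_linear, the one-step-ahead horizon and the array shapes are assumptions for illustration only; the models described herein are BiLSTM/self-attention networks, and "A", "B" and "C" denote non-network data, non-5G network usage data and 5G usage data respectively.

```python
import numpy as np


def fit_linear(X, Y):
    """Hypothetical stand-in for a trained model: least-squares linear map with bias."""
    X1 = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ W


rng = np.random.default_rng(0)
T = 48
# Synthetic series (illustrative only): the reference area has A, B and C,
# while the target area has A and B only (no 5G usage history).
ref_A, ref_B, ref_C = rng.normal(size=(T, 2)), rng.normal(size=(T, 3)), rng.normal(size=(T, 3))
tgt_A, tgt_B = rng.normal(size=(T, 2)), rng.normal(size=(T, 3))

# Training: first model on target-area data, second and third models on reference-area data.
model1 = fit_linear(tgt_A[:-1], tgt_A[1:])                           # A(past) -> A(future)
model2 = fit_linear(np.hstack([ref_A[:-1], ref_B[:-1]]), ref_B[1:])  # A, B(past) -> B(future)
model3 = fit_linear(np.hstack([ref_A, ref_B]), ref_C)                # A, B(same period) -> C

# Intermediate predictions for the target area's next time step.
A_hat = model1(tgt_A[-1:])
B_hat = model2(np.hstack([tgt_A[-1:], tgt_B[-1:]]))

# First 5G usage data prediction, built from the two intermediate predictions.
C_hat = model3(np.hstack([A_hat, B_hat]))
print(C_hat.shape)  # (1, 3)
```

The point of the sketch is the data flow: the third model never sees target-area 5G data, only predicted non-network and non-5G usage data for the target area.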
Stage 1 comprises training models 1-6. That is, letting “A” represent non-network data, “B” non-5G network usage data, and “C” 5G usage data, stage 1 comprises, for the at least one reference area:
Stage 1 comprises, for the at least one target area:
Stage 2 comprises generating the intermediate predictions. That is, stage 2 comprises, using data of the target area(s):
Stage 3 comprises generating the first to third 5G usage data predictions and the predicted range. Stage 3 comprises:
The methodology according to some aspects disclosed herein may be described as follows:
Aspects disclosed herein may be considered to make use of multi-variate time series forecasting. In general, multi-variate time series forecasting may be explained as follows.
Assumption—Assume complex relations between multiple time-series.
To emphasize the relationships among multiple time-series, the problem of multivariate time-series forecasting may be formulated based on a data structure called a multivariate temporal graph (which may be considered a case of non-Euclidean learning), denoted $G=(X,W)$, where $X=\{x_{it}\}\in\mathbb{R}^{N\times T}$ is the multivariate time-series input, $N$ is the number of time-series (nodes), and $T$ is the number of timestamps. The observed values at timestamp $t$ are denoted $X_t\in\mathbb{R}^{N}$. $W\in\mathbb{R}^{N\times N}$ is the adjacency matrix, where $w_{ij}>0$ indicates that there is an edge connecting nodes $i$ and $j$, and the value of $w_{ij}$ indicates the strength of this edge. Problem Definition: Given the observed values of the previous $K$ timestamps $X_{t-K},\ldots,X_{t-1}$, the task of multivariate time-series forecasting aims to predict the node values in the multivariate temporal graph $G=(X,W)$ for the next $H$ timestamps, denoted $\hat{X}_t,\hat{X}_{t+1},\ldots,\hat{X}_{t+H-1}$. These values may be inferred by a forecasting model $F$ with parameters $\phi$ and a graph structure $G$, where $G$ can be input as a prior or automatically inferred from the data. Thus:
$$\hat{X}_t,\hat{X}_{t+1},\ldots,\hat{X}_{t+H-1}=F(X_{t-K},\ldots,X_{t-1};G;\phi)$$
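The input/target structure of this problem definition (the $K$ past timestamps $X_{t-K},\ldots,X_{t-1}$ as input, the next $H$ timestamps as target) can be illustrated by slicing $X\in\mathbb{R}^{N\times T}$ into sliding windows. This is a minimal sketch; the function name make_windows is hypothetical, and neither the forecasting model $F$ nor the adjacency matrix $W$ is implemented here.

```python
import numpy as np


def make_windows(X, K, H):
    """Split X (shape N x T) into (input, target) pairs: K past timestamps -> next H timestamps."""
    N, T = X.shape
    inputs, targets = [], []
    for t in range(K, T - H + 1):
        inputs.append(X[:, t - K:t])   # X_{t-K}, ..., X_{t-1}
        targets.append(X[:, t:t + H])  # X_t, ..., X_{t+H-1}
    return np.stack(inputs), np.stack(targets)


X = np.arange(3 * 10).reshape(3, 10)  # N=3 series (nodes), T=10 timestamps
inp, tgt = make_windows(X, K=4, H=2)
print(inp.shape, tgt.shape)  # (5, 3, 4) (5, 3, 2)
```

Each window pair corresponds to one instance of the mapping $F(X_{t-K},\ldots,X_{t-1};G;\phi)\mapsto\hat{X}_t,\ldots,\hat{X}_{t+H-1}$.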
The technique of distant supervision may be understood as follows. Traditionally, most machine learning techniques require training data. A common approach for collecting training data is to have humans label data. For example, for a marriage relation (e.g., if a model is being trained to extract said relation), human annotators may label the pair “David Beckham” and “Victoria Beckham” as a positive training example. This approach is expensive, and even if the data corpus is large, it will not generate sufficient data for some ML algorithms. Furthermore, the resulting training data may be noisy due to human error.
According to an alternative approach to generating training data, which may be referred to as “Distant Supervision”, an already existing database is used to collect examples of the relation to be extracted, and these examples are used to automatically generate training data. For example, if a database contains the fact that Barack Obama and Michelle Obama are married, then every sentence, in any dataset used to train the ML model, in which “Barack Obama” and “Michelle Obama” both appear is labelled as a positive example of the marriage relation. In this way, a large amount of (possibly noisy) training data may be generated. Applying distant supervision to obtain positive training examples may be considered relatively straightforward, but generating negative examples is more difficult.
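The labelling rule in the marriage-relation example can be sketched as follows. The knowledge base KNOWN_MARRIED, the function distant_labels and the two-sentence corpus are hypothetical, and plain substring matching stands in for proper entity recognition.

```python
# Hypothetical knowledge base of known facts for the "married" relation.
KNOWN_MARRIED = {("Barack Obama", "Michelle Obama")}


def distant_labels(sentences, kb):
    """Mark a sentence as a positive example if it mentions both entities of any known pair.

    Sentences that are not marked positive are simply unlabelled; they are
    not automatically negative examples.
    """
    labelled = []
    for s in sentences:
        positive = any(e1 in s and e2 in s for e1, e2 in kb)
        labelled.append((s, positive))
    return labelled


corpus = [
    "Barack Obama and Michelle Obama attended the ceremony.",
    "Michelle Obama gave a speech in Chicago.",
]
for sentence, label in distant_labels(corpus, KNOWN_MARRIED):
    print(label, sentence)
```

The first sentence is labelled positive even though it says nothing about marriage, which illustrates why distantly supervised data may be noisy.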
The aspects disclosed herein may be considered to apply a distant supervision technique in the sense that they learn multiple significant relationships from diverse source data, which can be instrumental in forecasting the demand for 5G data usage in target areas (having no 5G infrastructure and no 5G usage history) with different distributions. These learned relationships encompass various aspects, including but not limited to:
By integrating these relationship patterns and correlations with changes in geographical infrastructure and non-5G usage, a clearer understanding is obtained of the anticipated demand for 5G data usage in the near future in target areas where 5G infrastructure and usage history are absent.
Aspects disclosed herein may also be applied in other domains, such as e-commerce, to forecast sales of a new product (for example, a product for which no data is available about a similar previous product). That is, aspects may use distant supervision to learn distant time-series patterns from different e-commerce providers relating to sales of different products (and user bases, etc.), and use the distantly learned time-series patterns to forecast the time series of a new product on a different e-commerce platform.
Aspects disclosed herein may comprise distant-supervision-based, cold-start, multi-step, multi-variate, and spatiotemporal time-series forecasting. Distant supervision is used to learn distant patterns for time-series forecasting from different distributions. Systems use the distantly learned patterns for cold-start forecasting of time-series data belonging to a different distribution.
Aspects disclosed herein include aspects for cold-start, spatiotemporal, multi-variate, and multi-step time-series forecasting. According to an aspect, a system outputs both direct forecast values and a forecast range of minimum and maximum values (where, for each of the variables and for each of the time steps, the minimum and maximum values are indicated). The source dataset (i.e., the dataset used to train the time-series forecasting system) and the target dataset (i.e., the dataset used for time-series forecasting) may belong to different distributions. Because the distributions differ, the common/useful patterns cannot be obtained beforehand using traditional strategies. The concept of distant supervision may instead be applied to learn, distantly, patterns that are useful for time-series forecasting. Thus, the system may be considered to address a scenario of true cold-start forecasting.
In other words, in the scenarios considered herein, the training data (for the areas having 5G service) belongs to multiple different distributions, while the test data (for the areas having no 5G service) belongs to different distributions. Distant-supervision-based strategies are used to learn patterns indirectly from the different distributions so that they may be used to forecast 5G time series for areas belonging to other distributions. The system uses the distantly learned patterns to forecast 5G requirements at cell level (a cell is a grid-type structure used by mobile operators: the entire geographical area is divided into small sub-areas in order to manage the network effectively).
As mentioned above, previous methods include the first and second comparative methods. Such methods perform cold-start forecasting of, for example, sales of new products (in a previously defined domain or in a new domain) or weather in new areas. However, such methods use a common distribution for all such forecasting. As a result, such sales-forecasting methods benefit from trends of sales, user behavior, etc. Similarly, such weather-forecasting systems use many common features (i.e., features common to both the existing area and the cold-start area) together with other weather-related on-the-ground conditions and information. Such approaches will not work if time-series data from different distributions is used for training and testing (to achieve cold-start forecasting). In the proposed scenario (5G forecasting for an area without historical 5G usage data), there are no (useful) dependencies among features, no common metadata, and no common background information.
Aspects disclosed herein may comprise using distant supervision to identify useful patterns from different data distributions (i.e., a source training dataset and a target test dataset belonging to different domains, with minimal feature overlap) for cold-start, multi-step, multi-variate, and spatiotemporal time-series forecasting. Some aspects disclosed herein are capable of cold-start, multi-step, multi-variate, and spatiotemporal time-series forecasting for data relating to a new distribution. Some aspects disclosed herein can effectively predict cold-start time-series data for areas that have no history of the event concerned and that belong to distributions different from the distribution used to train the system. Distant-supervision-based approaches can be applied to many different time-series applications.
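The point forecast and the minimum/maximum range could be derived from the first to third 5G usage data predictions roughly as follows. This sketch follows one literal reading of statements S22 and S24 below: a per-variable, per-time-step mean, with the two predictions closest to that mean taken as the range endpoints. The function name combine_predictions, the (3, time_steps, n_vars) array layout and the example values are assumptions.

```python
import numpy as np


def combine_predictions(preds):
    """Combine the first to third 5G usage data predictions.

    preds: array-like of shape (3, time_steps, n_vars).
    Returns (mean, low, high), each of shape (time_steps, n_vars).
    """
    preds = np.asarray(preds, dtype=float)
    mean = preds.mean(axis=0)                         # final forecast: mean per variable and time step
    dist = np.abs(preds - mean)                       # distance of each prediction from the mean
    closest = np.argsort(dist, axis=0)[:2]            # the two predictions closest to the mean
    two = np.take_along_axis(preds, closest, axis=0)  # their values, shape (2, time_steps, n_vars)
    return mean, two.min(axis=0), two.max(axis=0)     # endpoints of the predicted range


# Hypothetical predictions for one time step and two variables.
first = [[100.0, 5.0]]
second = [[110.0, 6.0]]
third = [[130.0, 5.4]]
mean, low, high = combine_predictions([first, second, third])
```

Under this reading, the mean itself is not guaranteed to lie between the two endpoints; a different reading of S24 (for example, taking the nearest prediction on each side of the mean) would yield a range that brackets it.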
The computing device 10 comprises a processor 993 and memory 994. Optionally, the computing device also includes a network interface 997 for communication with other such computing devices, for example with other computing devices of invention embodiments. Optionally, the computing device also includes one or more input mechanisms such as keyboard and mouse 996, and a display unit such as one or more monitors 995. These elements may facilitate user interaction. The components are connectable to one another via a bus 992.
The memory 994 may include a computer readable medium, which term may refer to a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) configured to carry computer-executable instructions. Computer-executable instructions may include, for example, instructions and data accessible by and causing a computer (e.g., one or more processors) to perform one or more functions or operations. For example, the computer-executable instructions may include those instructions for implementing any of the method steps or processes described above. Thus, the term “computer-readable storage medium” may also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any of the method steps or processes described above. The term “computer-readable storage medium” may accordingly be taken to include, but not be limited to, solid-state memories, optical media and magnetic media. By way of example, and not limitation, such computer-readable media may include non-transitory computer-readable storage media, including Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory devices (e.g., solid state memory devices).
The processor 993 is configured to control the computing device and execute processing operations, for example executing computer program code stored in the memory 994 to implement any of the method steps or processes described above. The memory 994 stores data being read and written by the processor 993 and may store training data and/or test data and/or network weights and/or intermediate predictions and/or predictions and/or values and/or other data, described above, and/or programs for executing any of the method steps or processes described above. As referred to herein, a processor may include one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. The processor may include a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processor may also include one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In one or more embodiments, a processor is configured to execute instructions for performing the operations and steps discussed herein. The processor 993 may be considered to comprise any of the modules described above. Any operations described as being implemented by a module may be implemented as a method by a computer and e.g. by the processor 993.
The display unit 995 may display a representation of data stored by the computing device, such as a representation of training data and/or predictions and/or GUI windows and/or interactive representations enabling a user to interact with the apparatus 10 by e.g. drag and drop or selection interaction, and/or any other output described above, and may also display a cursor and dialog boxes and screens enabling interaction between a user and the programs and data stored on the computing device. The input mechanisms 996 may enable a user to input data and instructions to the computing device, such as enabling a user to input any user input described above.
The network interface (network I/F) 997 may be connected to a network, such as the Internet, and is connectable to other such computing devices via the network. The network I/F 997 may control data input/output from/to other apparatus via the network.
Other peripheral devices such as a microphone, speakers, printer, power supply unit, fan, case, scanner, trackball, etc. may be included in the computing device.
Methods/processes embodying the present invention may be carried out on a computing device/apparatus 10 such as that illustrated in
A method/process embodying the present invention may be carried out by a plurality of computing devices operating in cooperation with one another. One or more of the plurality of computing devices may be a data storage server storing at least a portion of the data.
The invention may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The invention may be implemented as a computer program or computer program product, i.e., a computer program tangibly embodied in a non-transitory information carrier, e.g., in a machine-readable storage device, or in a propagated signal, for execution by, or to control the operation of, one or more hardware modules.
A computer program may be in the form of a stand-alone program, a computer program portion or more than one computer program and may be written in any form of programming language, including compiled or interpreted languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a data processing environment. A computer program may be deployed to be executed on one module or on multiple modules at one site or distributed across multiple sites and interconnected by a communication network.
Method steps of the invention may be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Apparatus of the invention may be implemented as programmed hardware or as special purpose logic circuitry, including e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions coupled to one or more memory devices for storing instructions and data.
The above-described embodiments of the present invention may advantageously be used independently of any other of the embodiments or in any feasible combination with one or more others of the embodiments.
The disclosure extends to the following statements:
S1. A computer-implemented method comprising:
S2. The computer-implemented method according to statement S1, wherein non-network data comprises a population or population density of the area concerned.
S3. The computer-implemented method according to statement S1, wherein non-network data comprises a population or population density of the area concerned and location data indicating the location and extent of the area concerned.
S4. The computer-implemented method according to any of the preceding statements, wherein non-network data, 5G usage data, and non-5G network usage data comprise time series data.
S5. The computer-implemented method according to any of the preceding statements, wherein weather data comprises at least one of temperature, wind speed/flow, rain frequency, and rainfall.
S6. The computer-implemented method according to any of the preceding statements, wherein traffic data comprises at least one of a number/density of vehicles, an average speed of vehicles, an average journey time of vehicles, a level of congestion, a number of traffic jams, a level of use of a road network, and a maximum transit capacity (of a road network).
S7. The computer-implemented method according to any of the preceding statements, wherein demographic data comprises at least one of a population/population density and an economic background.
S8. The computer-implemented method according to any of the preceding statements, wherein infrastructure data comprises at least one of a number of train lines, a number of bus lines, a number of tram lines, a capacity of a nearest airport, a distance to the nearest airport, a frequency of buses, a frequency of trains, and a frequency of trams.
S9. The computer-implemented method according to any of the preceding statements, wherein location data comprises information indicating a location and/or an extent of the area concerned.
S10. The computer-implemented method according to any of the preceding statements, wherein geographical data comprises information indicating a location and/or an extent of the area concerned and/or information indicating geographical features of the area concerned.
S11. The computer-implemented method according to any of the preceding statements, wherein 5G usage data comprises values over time of at least one variable, the at least one variable comprising any of: a number of active users (over time); a number and/or length of video streams (over time) and/or an amount of data/bandwidth used for video streaming (over time); a number and/or length of calls (over time) and/or an amount of data/bandwidth used for calls (over time); a number and/or size of SMS messages (over time) and/or an amount of data/bandwidth used for SMS messages (over time); and a usage amount of the internet (over time) and/or an amount of data/bandwidth used for internet-related processes (over time) and/or an amount of data/bandwidth exchanged via the internet (over time).
S12. The computer-implemented method according to any of the preceding statements, wherein non-5G network usage data comprises values over time of at least one variable, the at least one variable comprising any of: a number of active users (over time); a number and/or length of video streams (over time) and/or an amount of data/bandwidth used for video streaming (over time); a number and/or length of calls (over time) and/or an amount of data/bandwidth used for calls (over time); a number and/or size of SMS messages (over time) and/or an amount of data/bandwidth used for SMS messages (over time); and a usage amount of the internet (over time) and/or an amount of data/bandwidth used for internet-related processes (over time) and/or an amount of data/bandwidth exchanged via the internet (over time).
S13. The computer-implemented method according to any of the preceding statements, wherein combined network usage data comprises values over time of at least one variable, the at least one variable comprising any of: a number of active users (over time); a number and/or length of video streams (over time) and/or an amount of data/bandwidth used for video streaming (over time); a number and/or length of calls (over time) and/or an amount of data/bandwidth used for calls (over time); a number and/or size of SMS messages (over time) and/or an amount of data/bandwidth used for SMS messages (over time); and a usage amount of the internet (over time) and/or an amount of data/bandwidth used for internet-related processes (over time) and/or an amount of data/bandwidth exchanged via the internet (over time).
S14. The computer-implemented method according to any of the preceding statements, wherein non-5G network usage data comprises usage data of at least one non-5G telecommunications network.
S15. The computer-implemented method according to any of the preceding statements, wherein non-5G network usage data comprises usage data of the non-5G telecommunications network or networks that exist in the area concerned.
S16. The computer-implemented method according to any of the preceding statements, wherein non-5G network usage data comprises usage data of at least one of 2G, 3G, 4G, and LTE networks.
S17. The computer-implemented method according to any of the preceding statements, wherein predicted 5G usage data comprises predicted values over time of at least one variable, the at least one variable comprising any of: a number of active users (over time); a number and/or length of video streams (over time) and/or an amount of data/bandwidth used for video streaming (over time); a number and/or length of calls (over time) and/or an amount of data/bandwidth used for calls (over time); a number and/or size of SMS messages (over time) and/or an amount of data/bandwidth used for SMS messages (over time); and a usage amount of the internet (over time) and/or an amount of data/bandwidth used for internet-related processes (over time) and/or an amount of data/bandwidth exchanged via the internet (over time).
S18. The computer-implemented method according to any of the preceding statements, wherein the forecasting process comprises generating at least two of the first to third 5G usage data predictions and combining the at least two 5G usage data predictions to generate a final 5G forecast.
S19. The computer-implemented method according to statement S18, wherein combining the at least two 5G usage data predictions comprises computing a mean 5G usage data prediction (comprising computing a mean for each variable of the at least two 5G usage data predictions).
S20. The computer-implemented method according to any of the preceding statements, wherein the forecasting process comprises generating at least two of the first to third 5G usage data predictions and combining the at least two 5G usage data predictions to generate a predicted range of 5G usage data.
S21. The computer-implemented method according to any of the preceding statements, wherein the forecasting process comprises generating the first to third 5G usage data predictions and combining the first to third 5G usage data predictions to generate a final 5G forecast.
S22. The computer-implemented method according to statement S21, wherein combining the first to third 5G usage data predictions comprises computing a mean 5G usage data prediction (comprising computing a mean for each variable of the first to third 5G usage data predictions).
S23. The computer-implemented method according to any of the preceding statements, wherein the forecasting process comprises generating the first to third 5G usage data predictions and combining the first to third 5G usage data predictions to generate a predicted range of 5G usage data.
S24. The computer-implemented method according to statement S21 or S23, wherein combining the first to third 5G usage data predictions to generate a final 5G forecast comprises, for at least one variable (of the first to third 5G usage data predictions): computing the mean of the variable's predicted values (at each time step) in the first to third 5G usage data predictions; and selecting two values among the variable's predicted values (at each time step) which are closest to the (corresponding) mean as endpoints of a predicted range for the variable.
S25. The computer-implemented method according to any of the preceding statements, wherein the first model has been trained based on non-network data of the target geographical area of a first time period and non-network data of the target geographical area of a second time period after the first time period (to predict the non-network data of the target geographical area of the second time period based on the non-network data of the target geographical area of the first time period); and/or the second model has been trained based on non-5G network usage data of the at least one reference geographical area of the second time period and based on non-network data and non-5G network usage data of the at least one reference geographical area of the first time period (to predict the non-5G network usage data of the at least one reference geographical area of the second time period based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the first time period); and/or the third model has been trained based on 5G usage data, non-network data, and the non-5G network usage data of the at least one reference geographical area of the second time period (to predict the 5G usage data of the at least one reference geographical area of the second time period based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the second time period); and/or the fourth model has been trained based on non-5G network usage data of the target geographical area of the second time period and based on the non-network data and non-5G network usage data of the target geographical area of the first time period (to predict the non-5G network usage data of the target geographical area of the second time period based on the non-network data and the non-5G network usage data of the target geographical area of the first time period); and/or the fifth model has been trained based on combined network usage data of the at least one reference geographical area of the second time period (wherein the combined network usage data comprises usage data relating to 5G and non-5G networks) and based on the non-network data of the at least one reference geographical area of the first time period (to predict the combined network usage data of the at least one reference geographical area of the second time period based on the non-network data of the at least one reference geographical area of the first time period); and/or the sixth model has been trained based on the 5G usage data of the at least one reference geographical area of the second time period and based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the first time period (to predict the 5G usage data of the at least one reference geographical area of the second time period based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the first time period).
S26. The computer-implemented method according to any of the preceding statements, wherein the computer-implemented method comprises performing a training process before performing the forecasting process, the training process comprising training at least one of the first to sixth models.
S27. The computer-implemented method according to statement S26, wherein the training process comprises: based on non-network data of the target geographical area of a first time period and non-network data of the target geographical area of a second time period after the first time period, training the first model to predict the non-network data of the target geographical area of the second time period based on the non-network data of the target geographical area of the first time period; based on non-5G network usage data of the at least one reference geographical area of the second time period and based on non-network data and non-5G network usage data of the at least one reference geographical area of the first time period, training the second model to predict the non-5G network usage data of the at least one reference geographical area of the second time period based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the first time period; based on 5G usage data, non-network data, and the non-5G network usage data of the at least one reference geographical area of the second time period, training the third model to predict the 5G usage data of the at least one reference geographical area of the second time period based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the second time period; based on non-5G network usage data of the target geographical area of the second time period and based on the non-network data and non-5G network usage data of the target geographical area of the first time period, training the fourth model to predict the non-5G network usage data of the target geographical area of the second time period based on the non-network data and the non-5G network usage data of the target geographical area of the first time period; based on combined network usage data of the at least one reference geographical area of the second time period and based on the non-network data of the at least one reference geographical area of the first time period, training the fifth model to predict the combined network usage data of the at least one reference geographical area of the second time period based on the non-network data of the at least one reference geographical area of the first time period (wherein the combined network usage data comprises usage data relating to 5G and non-5G networks); and based on the 5G usage data of the at least one reference geographical area of the second time period and based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the first time period, training the sixth model to predict the 5G usage data of the at least one reference geographical area of the second time period based on the non-network data and the non-5G network usage data of the at least one reference geographical area of the first time period.
S28. The computer-implemented method according to any of the preceding statements, wherein the first to sixth models are/comprise encoder-decoder (network) models.
S29. The computer-implemented method according to any of the preceding statements, wherein the first to sixth models each comprises a deep neural network, DNN.
S30. The computer-implemented method according to any of the preceding statements, wherein the first to sixth models are/comprise self-attention-based models.
S31. The computer-implemented method according to any of the preceding statements, wherein the first to sixth models are/comprise self-attention-based encoder-decoder (network) models.
S32. The computer-implemented method according to any of the preceding statements, wherein the first to sixth models each comprise a self-attention network.
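Statements S30 to S32 do not say which attention variant is meant. As an assumption, the minimal NumPy sketch below uses scaled dot-product self-attention, the form popularised by Transformer models: queries, keys and values are all projections of the same sequence, so each time step is re-expressed as a weighted mix of every time step. The function names and weight shapes are illustrative, not taken from the source.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Scaled dot-product self-attention over a sequence h of shape (T, d).

    Queries, keys and values are all projections of h itself, so every
    time step attends to every time step of the same sequence.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v          # each (T, d_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (T, T) similarity of step i to step j
    weights = softmax(scores, axis=-1)           # row i: how much step i attends to each step
    return weights @ v, weights                  # (T, d_k) context vectors, (T, T) weights


rng = np.random.default_rng(0)
T, d, d_k = 12, 8, 4
h = rng.normal(size=(T, d))                      # e.g. encoder outputs for 12 time steps
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))
context, weights = self_attention(h, w_q, w_k, w_v)
print(context.shape, weights.shape)              # (12, 4) (12, 12)
```

In a trained model `w_q`, `w_k` and `w_v` would be learned; random weights are used here only to show the data flow.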
S33. The computer-implemented method according to any of the preceding statements, wherein the first to sixth models each comprise a (bidirectional) long short-term memory, LSTM, layer.
S34. The computer-implemented method according to any of the preceding statements, wherein the first to sixth models each comprise a first (bidirectional) long short-term memory, LSTM, layer, a first at least one deep neural network, DNN, a self-attention network, a second LSTM layer, and a second at least one DNN.
S35. The computer-implemented method according to any of the preceding statements, wherein the first model comprises a first (bidirectional) long short-term memory, LSTM, layer (to receive data on which the prediction concerned is based), a first at least one deep neural network, DNN, a self-attention network, a second LSTM layer, and a second at least one DNN (to output predicted data).
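The single-input layer stack of statements S34 and S35 can be sketched as a NumPy forward pass with untrained random weights. This is a minimal sketch under stated assumptions, not the claimed implementation: each "at least one DNN" is reduced to one dense layer (ReLU for the first, linear for the second), the attention is scaled dot-product self-attention, the hidden sizes `H` and `D` are arbitrary, and the past and future periods are assumed to have the same number of time steps so the output aligns step-by-step with the input. All function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm(x, W, U, b):
    """Single-direction LSTM over x of shape (T, d_in); returns all hidden states (T, H)."""
    hidden = U.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b                 # (4H,) stacked gate pre-activations
        i, f, o, g = np.split(z, 4)             # input, forget, output gates, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(x, fw, bw):
    """Bidirectional LSTM: run forwards and on the reversed sequence, then concatenate."""
    backward = lstm(x[::-1], *bw)[::-1]         # re-reverse so time steps line up
    return np.concatenate([lstm(x, *fw), backward], axis=-1)   # (T, 2H)


def self_attention(h, wq, wk, wv):
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_params(rng, f_in, f_out, H=16, D=16):
    def normal(*shape):
        return rng.normal(scale=0.1, size=shape)

    def lstm_params(d_in):
        return normal(d_in, 4 * H), normal(H, 4 * H), np.zeros(4 * H)

    return {
        "enc_fw": lstm_params(f_in), "enc_bw": lstm_params(f_in),
        "d1_w": normal(2 * H, D), "d1_b": np.zeros(D),
        "wq": normal(D, D), "wk": normal(D, D), "wv": normal(D, D),
        "dec": lstm_params(D),
        "d2_w": normal(H, f_out), "d2_b": np.zeros(f_out),
    }


def first_model_forward(x, p):
    """x: past non-network data (T, f_in) -> predicted future non-network data (T, f_out)."""
    h = bilstm(x, p["enc_fw"], p["enc_bw"])             # first (bidirectional) LSTM layer
    h = np.maximum(0.0, h @ p["d1_w"] + p["d1_b"])      # first DNN (one ReLU layer here)
    h = self_attention(h, p["wq"], p["wk"], p["wv"])    # self-attention network
    h = lstm(h, *p["dec"])                               # second LSTM layer
    return h @ p["d2_w"] + p["d2_b"]                     # second DNN (linear output)


rng = np.random.default_rng(0)
past = rng.normal(size=(24, 5))      # e.g. 24 time steps of 5 non-network features
forecast = first_model_forward(past, init_params(rng, f_in=5, f_out=5))
print(forecast.shape)                # (24, 5)
```

The bidirectional encoder lets each encoded step see both earlier and later parts of the past window, while the self-attention layer lets the decoder weight distant time steps directly rather than only through the recurrent state.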
S36. The computer-implemented method according to any of the preceding statements, wherein the second model comprises a pair of first (bidirectional) long short-term memory, LSTM, layers (to receive data on which the prediction concerned is based), a concatenation layer, a first at least one deep neural network, DNN, a self-attention network, a second LSTM layer, and a second at least one DNN (to output predicted data).
S37. The computer-implemented method according to any of the preceding statements, wherein the third model comprises a pair of first (bidirectional) long short-term memory, LSTM, layers (to receive data on which the prediction concerned is based), a concatenation layer, a first at least one deep neural network, DNN, a self-attention network, a second LSTM layer, and a second at least one DNN (to output predicted data).
S38. The computer-implemented method according to any of the preceding statements, wherein the fourth model comprises a pair of first (bidirectional) long short-term memory, LSTM, layers (to receive data on which the prediction concerned is based), a concatenation layer, a first at least one deep neural network, DNN, a self-attention network, a second LSTM layer, and a second at least one DNN (to output predicted data).
S39. The computer-implemented method according to any of the preceding statements, wherein the fifth model comprises a first (bidirectional) long short-term memory, LSTM, layer (to receive data on which the prediction concerned is based), a first at least one deep neural network, DNN, a repeat vector layer, a pair of self-attention networks, a pair of second LSTM layers, and a pair of second at least one DNNs (to output predicted data).
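The repeat vector layer in statement S39 suggests that the encoder compresses the past window into a single vector, which is then copied once per future time step (the role played, for example, by Keras's `RepeatVector` layer). The pure-Python shape trace below illustrates that reading under assumptions: the first LSTM returns only its final state, the hidden sizes `H` and `D` are arbitrary, self-attention keeps its input shape, and the two parallel branches are simply shown as two output heads, since S39 does not say what each head predicts. All names are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def bilstm(shape: Shape, units: int, return_sequences: bool) -> Shape:
    t, _ = shape
    # Forward and backward states are concatenated, doubling the feature size.
    return (t, 2 * units) if return_sequences else (2 * units,)


def dense(shape: Shape, units: int) -> Shape:
    return shape[:-1] + (units,)                 # acts on the last axis only


def repeat_vector(shape: Shape, n: int) -> Shape:
    (features,) = shape
    return (n, features)                         # the same vector copied n times


def self_attention(shape: Shape) -> Shape:
    return shape                                 # assumed to keep the (T, D) shape


def lstm(shape: Shape, units: int) -> Shape:
    t, _ = shape
    return (t, units)                            # returning the full sequence


def fifth_model_shapes(t_in: int, f_in: int, t_out: int, f_out: int,
                       H: int = 32, D: int = 32) -> List[Shape]:
    h = bilstm((t_in, f_in), H, return_sequences=False)  # (2H,) summary of the past
    h = dense(h, D)                                       # first DNN: (D,)
    h = repeat_vector(h, t_out)                           # (t_out, D), one copy per future step
    heads = []
    for _ in range(2):                                    # pair of decoder branches
        g = self_attention(h)
        g = lstm(g, H)
        heads.append(dense(g, f_out))                     # second DNN of this branch
    return heads


print(fifth_model_shapes(t_in=24, f_in=6, t_out=12, f_out=1))   # [(12, 1), (12, 1)]
```

One consequence of this design is that the forecast horizon `t_out` is decoupled from the length of the past window `t_in`, which the equal-length sequence-to-sequence layout cannot offer.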
S40. The computer-implemented method according to any of the preceding statements, wherein the sixth model comprises a pair of first (bidirectional) long short-term memory, LSTM, layers (to receive data on which the prediction concerned is based), a concatenation layer, a first at least one deep neural network, DNN, a self-attention network, a second LSTM layer, and a second at least one DNN (to output predicted data).
S41. The computer-implemented method according to any of the preceding statements, wherein each of the second, third, fourth, and sixth models comprises a pair of first (bidirectional) long short-term memory, LSTM, layers (to receive data on which the prediction concerned is based), a concatenation layer, a first at least one deep neural network, DNN, a self-attention network, a second LSTM layer, and a second at least one DNN (to output predicted data).
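The two-input stack shared by the second, third, fourth and sixth models (a pair of first LSTM layers feeding a concatenation layer) can be traced at the level of tensor shapes. This is a pure-Python sketch under assumptions: each first LSTM returns its full sequence, the concatenation joins the two branches along the feature axis (so both inputs must span the same time steps), self-attention keeps its input shape, and the sizes `H` and `D` are arbitrary. Function names are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, int]   # (time steps, features)


def bilstm(shape: Shape, units: int) -> Shape:
    # Returns the full sequence; forward and backward states are concatenated.
    return (shape[0], 2 * units)


def concatenate(a: Shape, b: Shape) -> Shape:
    # Assumption: the concatenation layer joins the two branches along the
    # feature axis, which requires both inputs to span the same time steps.
    if a[0] != b[0]:
        raise ValueError(f"time steps differ: {a[0]} vs {b[0]}")
    return (a[0], a[1] + b[1])


def dense(shape: Shape, units: int) -> Shape:
    return (shape[0], units)


def self_attention(shape: Shape) -> Shape:
    return shape                       # assumed to keep the (T, D) shape


def lstm(shape: Shape, units: int) -> Shape:
    return (shape[0], units)


def two_input_model_shapes(x1: Shape, x2: Shape, f_out: int,
                           H: int = 32, D: int = 64) -> Shape:
    h = concatenate(bilstm(x1, H), bilstm(x2, H))   # pair of BiLSTMs -> (T, 4H)
    h = dense(h, D)                                  # first DNN
    h = self_attention(h)                            # self-attention network
    h = lstm(h, H)                                   # second LSTM layer
    return dense(h, f_out)                           # second DNN -> (T, f_out)


# e.g. the second model: 24 steps of 6 non-network features and 3 non-5G usage features
print(two_input_model_shapes((24, 6), (24, 3), f_out=3))   # (24, 3)
```

Giving each input stream its own first LSTM layer lets the non-network data and the non-5G network usage data be encoded separately before they are fused, rather than mixing heterogeneous features at the very first layer.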
S42. A computer program which, when run on a computer, causes the computer to carry out a method comprising:
S43. An information processing apparatus comprising a memory and a processor connected to the memory, wherein the processor is configured to:
Foreign application priority data:

| Number | Date | Country | Kind |
|---|---|---|---|
| 202311052286 | Aug 2023 | IN | national |