The disclosure relates to a computer-implemented method for processing data items for use in training a machine learning model to identify a relationship between the data items, a computer-implemented method for training the machine learning model to identify the relationship between the data items, and entities configured to operate in accordance with those methods.
With an ever-increasing demand for a high-speed and high-quality user experience, it is important that a telecommunications network is able to serve large volumes of traffic (e.g. for online sessions) for a large number of end users of the network. In some scenarios, in order to assist with this, a network can be configured to deploy and allocate surrogate servers according to requests received from end users, e.g. via the online visit sessions of those end users.
A challenge that is associated with providing an optimum user experience is how to, automatically and efficiently, detect events in the network that may have an impact on the end user experience (e.g. events such as a network session failure, a connection failure, a network failure, etc.). This can be particularly challenging where surrogate servers are deployed, e.g. in a high-speed streaming network, such as a video content delivery network (CDN) or other networks providing similar services.
There already exist techniques for detecting events in a telecommunications network. In some of these existing techniques, artificial intelligence (AI) and machine learning (ML) are used in the detection of events, and such techniques often rely on a regular ML model or a deep recurrent neural network (RNN). However, these existing techniques can be inaccurate and inefficient.
As mentioned earlier, existing techniques that use a regular ML model or deep RNN in the detection of events in a telecommunications network can be inaccurate and inefficient. In particular, it has been realised that it is not possible for a regular ML model to appropriately capture the characteristics of sequential behaviour in the network, while a deep RNN may be able to learn some contexts from sequential behaviours in the network but performs poorly for longer sequences. In addition, for a deep RNN, it is generally more difficult to train on longer sequences and to apply the trained model for the fast prediction of events in real time. This can be particularly problematic when applied to realistic cases, such as high-speed streaming network operations.
The existing techniques for detecting events in a telecommunications network mainly apply traditional machine learning methods (such as regular tree-based algorithms) or deep neural network models (such as an RNN model) for sequence learning, e.g. a long short-term memory (LSTM) and a gated recurrent unit (GRU). However, due to the training cost associated with such methods and models, there are few engineering-applicable solutions for LSTM and GRU in the real-time prediction of events in a network (i.e. in network inference).
It is an object of the disclosure to obviate or eliminate at least some of the above-described disadvantages associated with existing techniques.
Therefore, according to an aspect of the disclosure, there is provided a first computer-implemented method for processing data items for use in training a machine learning model to identify a relationship between the data items. The data items correspond to one or more features of a telecommunications network. The first method comprises, for each feature of the one or more features, organising the corresponding data items into a sequence according to time to obtain at least one sequence of data items. The first method also comprises encoding a single sequence of data items comprising the at least one sequence of data items to obtain an encoded sequence of data items. The single sequence of data items is encoded with information indicative of a position of data items in the single sequence of data items. The encoded sequence of data items is for use in training the machine learning model to identify the relationship between the data items.
According to another aspect of the disclosure, there is provided a second computer-implemented method for training a machine learning model to identify a relationship between data items corresponding to one or more features of a telecommunications network. The second method comprises training the machine learning model to identify the relationship between the data items in an encoded sequence of data items. The encoded sequence of data items is obtained by, for each feature of the one or more features, organising the corresponding data items into a sequence according to time to obtain at least one sequence of data items, and encoding a single sequence of data items comprising the at least one sequence of data items. The single sequence of data items is encoded with information indicative of a position of data items in the single sequence of data items. The relationship between the data items in the encoded sequence of data items is identified based on the information indicative of the position of data items in the single sequence of data items.
According to another aspect of the disclosure, there is provided a third computer-implemented method performed by a system. The third method comprises the first method described earlier and the second method described earlier.
According to another aspect of the disclosure, there is provided a first entity configured to operate in accordance with the first method described earlier. In some embodiments, the first entity may comprise processing circuitry configured to operate in accordance with the first method described earlier. In some embodiments, the first entity may comprise at least one memory for storing instructions which, when executed by the processing circuitry, cause the first entity to operate in accordance with the first method described earlier.
According to another aspect of the disclosure, there is provided a second entity configured to operate in accordance with the second method described earlier. In some embodiments, the second entity may comprise processing circuitry configured to operate in accordance with the second method described earlier. In some embodiments, the second entity may comprise at least one memory for storing instructions which, when executed by the processing circuitry, cause the second entity to operate in accordance with the second method described earlier.
According to another aspect of the disclosure, there is provided a system comprising the first entity described earlier and the second entity described earlier.
According to another aspect of the disclosure, there is provided a computer program comprising instructions which, when executed by processing circuitry, cause the processing circuitry to perform the first method described earlier and/or the second method described earlier.
According to another aspect of the disclosure, there is provided a computer program product, embodied on a non-transitory machine-readable medium, comprising instructions which are executable by processing circuitry to cause the processing circuitry to perform the first method described earlier and/or the second method described earlier.
Therefore, there is provided an advantageous technique for processing data items for use in training a machine learning model to identify a relationship between the data items corresponding to one or more features of a telecommunications network. There is also provided an advantageous technique for training the machine learning model to identify the relationship between the data items. The manner in which the data items are processed, and the use of data items processed in this way in training a machine learning model to identify a relationship between the data items, provides a trained machine learning model that can more accurately and efficiently predict the relationship between the data items in practice.
For a better understanding of the techniques, and to show how they may be put into effect, reference will now be made, by way of example, to the accompanying drawings, in which:
Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Other embodiments, however, are contained within the scope of the subject-matter disclosed herein; the disclosed subject-matter should not be construed as limited to only the embodiments set forth herein. Rather, these embodiments are provided by way of example to convey the scope of the subject-matter to those skilled in the art.
As mentioned earlier, there is described herein an advantageous technique for processing data items for use in training a machine learning model to identify a relationship between the data items corresponding to one or more features of a telecommunications network. This technique can be performed by a first entity. There is also described herein an advantageous technique for training the machine learning model to identify the relationship between the data items. This technique can be performed by a second entity. The first entity and the second entity described herein may communicate with each other, e.g. over a communication channel, to implement the techniques described herein. In some embodiments, the first entity and the second entity may communicate over the cloud. The techniques described herein can be implemented in the cloud according to some embodiments. The techniques described herein are computer-implemented.
The telecommunications network referred to herein can be any type of telecommunications network. For example, the telecommunications network referred to herein can be a mobile network, such as a fourth generation (4G) mobile network, a fifth generation (5G) mobile network, a sixth generation (6G) mobile network, or any other generation mobile network. In some embodiments, the telecommunications network referred to herein can be a radio access network (RAN), or any other type of telecommunications network. In some embodiments, the telecommunications network referred to herein may be a content delivery network (CDN).
The advantageous techniques described herein involve the use of artificial intelligence/machine learning (AI/ML). For example, an AI/ML engine can be embedded on a back-end of a network node (e.g. a server) in order to provide training and inference according to the techniques described herein. In general, techniques based on AI/ML allow a back-end engine to provide accurate and fast inference and feedback, e.g. in nearly real-time. In particular, the techniques described herein can beneficially enable detection of an event in a network accurately and efficiently.
As illustrated in
Briefly, the processing circuitry 12 of the first entity 10 is configured to, for each feature of the one or more features, organise the corresponding data items into a sequence according to time to obtain at least one sequence of data items. The processing circuitry 12 of the first entity 10 is also configured to encode a single sequence of data items comprising the at least one sequence of data items to obtain an encoded sequence of data items. The single sequence of data items is encoded with information indicative of a position of data items in the single sequence of data items. The encoded sequence of data items is for use in training the machine learning model to identify the relationship between the data items.
As illustrated in
The processing circuitry 12 of the first entity 10 can be connected to the memory 14 of the first entity 10. In some embodiments, the memory 14 of the first entity 10 may be for storing program code or instructions which, when executed by the processing circuitry 12 of the first entity 10, cause the first entity 10 to operate in the manner described herein in respect of the first entity 10. For example, in some embodiments, the memory 14 of the first entity 10 may be configured to store program code or instructions that can be executed by the processing circuitry 12 of the first entity 10 to cause the first entity 10 to operate in accordance with the method described herein in respect of the first entity 10. Alternatively or in addition, the memory 14 of the first entity 10 can be configured to store any information, data, messages, requests, responses, indications, notifications, signals, or similar, that are described herein. The processing circuitry 12 of the first entity 10 may be configured to control the memory 14 of the first entity 10 to store information, data, messages, requests, responses, indications, notifications, signals, or similar, that are described herein.
In some embodiments, as illustrated in
Although the first entity 10 is illustrated in
With reference to
The single sequence of data items referred to herein effectively provides an encoded representation of the at least one sequence of data items. In an embodiment where the at least one sequence comprises a plurality of sequences, each sequence of the plurality of sequences may be encoded and these encoded sequences can be concatenated together to obtain the encoded representation of the plurality of sequences. In some embodiments, the encoded representation referred to herein may be an encoded representation vector, e.g. for a machine learning model.
Although not illustrated in
In some embodiments, the relationship between the data items that is referred to herein can be a similarity measure (e.g. a similarity score). The similarity measure (e.g. similarity score) can quantify the similarity between the data items (e.g. between any two data items) in the single sequence of data items. A person skilled in the art will be aware of various techniques that can be used to determine a similarity measure (e.g. similarity score). In an example, the single sequence of data items may comprise the data items in the form of sequential vectors x = [x1, x2, . . . , xn]. Each data item xi can represent an embedded vector with a dimension emb_dim (i.e. xi ∈ ℝ^emb_dim), which can be encoded from the raw data items. In some embodiments, the relationship between any two data items xi and xj can be calculated by an attention mechanism. For example, the relationship (“Attention”) between any two data items xi and xj may be calculated as follows:
Attention(xi, xj) = Σk similarity(xi, xk) xj,
where the subscript k denotes the index of each data item in the single sequence of data items, except the data item xj. The similarity in the above equation may be defined by a scaled dot-product, for example, using a softmax function (or normalized exponential function) as follows:

similarity(xi, xj) = softmax(xi · xj / √d) = exp(xi · xj / √d) / Σk exp(xi · xk / √d),

where d is the number of units in a layer (namely, the attention layer) that performs the calculation. The scaled dot-product can ensure that the similarity measure (e.g. similarity score) will not saturate, as can occur with a sigmoid-like calculation.
Although also not illustrated in
In some embodiments, each feature of the one or more features may have a time stamp for use in organising the corresponding data items into the sequence according to time. Thus, the data items may be organised into the sequence according to the associated time stamp according to some embodiments.
Although not illustrated in
Although not illustrated in
Although also not illustrated in
In some embodiments, the method may comprise, if the predicted probability of the event occurring in the telecommunications network is above a predefined threshold, initiating an action in the telecommunications network to prevent or minimise an impact of the event. In some embodiments, the predicted probability may be a binary value, where a value of 1 can be indicative that the event will occur and a value of 0 can be indicative that the event will not occur. The predefined threshold may, for example, be set to a value of 0.5 as a fair decision boundary for such a binary classification. However, in other embodiments, another predefined threshold may be identified, e.g. a brute force search may be used to identify an appropriate (or the best) threshold. If an action is to be initiated in the telecommunications network, the first entity 10 (e.g. the processing circuitry 12 of the first entity 10) can be configured to initiate this action. For example, the first entity 10 (e.g. the processing circuitry 12 of the first entity 10) can be configured to itself implement the action or can be configured to cause, e.g. via a communications interface 16 of the first entity 10, another entity to implement the action.
In some embodiments, the action may be an adjustment to at least one network node (e.g. server or base station) of the telecommunications network. In some embodiments, the event may be any one or more of a failure of a communication session in the telecommunications network, a failure of a network node (e.g. server or base station) of the telecommunications network, an anomaly in a behaviour of the telecommunications network, and any other event in the telecommunications network. In some embodiments, the event may be a connection failure in the telecommunications network.
In some embodiments, the method may comprise initiating transmission of information indicative of the prediction of an event occurring in the telecommunications network. In this way, feedback can be provided. The first entity 10 (e.g. the processing circuitry 12 of the first entity 10) can be configured to initiate the transmission of this information. For example, the first entity 10 (e.g. the processing circuitry 12 of the first entity 10) can be configured to itself transmit the information or can be configured to cause, e.g. via a communications interface 16 of the first entity 10, another entity to transmit the information. The information may be utilised to make a decision on whether or not to take an action in the telecommunications network and/or what action to take in the telecommunications network, e.g. so as to prevent or minimise an impact of the event. For example, the decision can be about resource allocation in the telecommunications network (such as whether or not to adjust the allocation of resources in the telecommunications network, e.g. so as to achieve a more efficient allocation for a future incoming load to the network).
As illustrated in
Briefly, the processing circuitry 22 of the second entity 20 is configured to train the machine learning model to identify the relationship between the data items in an encoded sequence of data items. The encoded sequence of data items is obtained by, for each feature of the one or more features, organising the corresponding data items into a sequence according to time to obtain at least one sequence of data items, and encoding a single sequence of data items comprising the at least one sequence of data items. The single sequence of data items is encoded with information indicative of a position of data items in the single sequence of data items. The relationship between the data items in the encoded sequence of data items is identified based on the information indicative of the position of data items in the single sequence of data items.
As illustrated in
The processing circuitry 22 of the second entity 20 can be connected to the memory 24 of the second entity 20. In some embodiments, the memory 24 of the second entity 20 may be for storing program code or instructions which, when executed by the processing circuitry 22 of the second entity 20, cause the second entity 20 to operate in the manner described herein in respect of the second entity 20. For example, in some embodiments, the memory 24 of the second entity 20 may be configured to store program code or instructions that can be executed by the processing circuitry 22 of the second entity 20 to cause the second entity 20 to operate in accordance with the method described herein in respect of the second entity 20. Alternatively or in addition, the memory 24 of the second entity 20 can be configured to store any information, data, messages, requests, responses, indications, notifications, signals, or similar, that are described herein. The processing circuitry 22 of the second entity 20 may be configured to control the memory 24 of the second entity 20 to store information, data, messages, requests, responses, indications, notifications, signals, or similar, that are described herein.
In some embodiments, as illustrated in
Although the second entity 20 is illustrated in
With reference to
In some embodiments, each feature of the one or more features may have a time stamp. In some of these embodiments, organising the corresponding data items into the sequence according to time may comprise organising the corresponding data items into the sequence according to time using the time stamp of each feature of the one or more features. In some embodiments, the at least one sequence of data items may be embedded into the single sequence of data items. In some embodiments, the data items may be from (i.e. may originate from) at least one network node of the telecommunications network.
Although not illustrated in
Although also not illustrated in
Although also not illustrated in
Although also not illustrated in
In some embodiments, the method may comprise, if the predicted probability of the event occurring in the telecommunications network is above a predefined threshold, initiating an action in the telecommunications network to prevent or minimise an impact of the event. The second entity 20 (e.g. the processing circuitry 22 of the second entity 20) can be configured to initiate this action. For example, the second entity 20 (e.g. the processing circuitry 22 of the second entity 20) can be configured to itself implement the action or can be configured to cause, e.g. via a communications interface 26 of the second entity 20, another entity to implement the action. In some embodiments, the action may be an adjustment to at least one network node (e.g. server or base station) of the telecommunications network. In some embodiments, the event may be any one or more of a failure of a communication session in the telecommunications network, a failure of a network node (e.g. server or base station) of the telecommunications network, an anomaly in a behaviour of the telecommunications network, and any other event in the telecommunications network.
In some embodiments, the information referred to herein that is indicative of the position of data items in the single sequence of data items may comprise information indicative of a position of at least one of the data items in the single sequence of data items relative to at least one other data item in the single sequence of data items and/or information indicative of a relative distance between at least two of the data items in the single sequence of data items. In some embodiments, the information referred to herein that is indicative of the position of data items in the single sequence of data items may be obtained by applying an exponential decay function to the single sequence of data items. In some of these embodiments, applying the exponential decay function to the single sequence of data items may comprise inputting values into the exponential decay function. The values can be indicative of the position of at least two of the data items in the single sequence of data items. In some embodiments, each of the at least one sequence of data items referred to herein may be in the form of a vector.
In some embodiments, the one or more features of the telecommunications network referred to herein may comprise one or more features of at least one network node (e.g. server or base station) of the telecommunications network. In some of these embodiments, the at least one network node may comprise at least one network node that is configured to replicate one or more resources of at least one other network node. For example, the at least one network node may be a surrogate server of a content delivery network (CDN). Generally, a CDN may comprise one or more surrogate servers that replicate content from a central (or an origin) server. The surrogate servers can be placed in strategic locations to enable an efficient delivery of content to users of the CDN. In some embodiments, the one or more features of the telecommunications network referred to herein may comprise one or more features of a session a user (or user equipment, UE) has with the telecommunications network. Examples of the one or more features include, but are not limited to, an internet protocol (IP) address, a server identifier (ID), an account offering gate, a hypertext transfer protocol (HTTP) request, an indication of session failure, and/or any other feature of the telecommunications network.
In some embodiments, the data items referred to herein may correspond to a UE served by the telecommunications network. In some of these embodiments, an identifier that identifies the UE (or a location of the UE) may be assigned to the at least one sequence of data items. For example, the identifier may comprise information indicative of a geolocation of the UE. Alternatively or in addition, the identifier may be an IP address associated with the UE. In some embodiments, the data items referred to herein may comprise information indicative of a quality of a connection between a UE and the telecommunications network. In some of these embodiments, the connection between the UE and the telecommunications network can be a connection between the UE and at least one network node (e.g. server or base station) of the telecommunications network.
In some embodiments, the machine learning model referred to herein may be trained to identify the relationship between the data items in the encoded sequence of data items using a multi-head attention mechanism. In some embodiments, the machine learning model referred to herein may be a machine learning model that is suitable for natural language processing, and/or the machine learning model referred to herein may be a deep learning model. In some embodiments, this deep learning model may be a transformer (or a transformer model).
There is also provided a system comprising the first entity 10 described herein and the second entity 20 described herein. A computer-implemented method performed by the system comprises the method described herein in respect of the first entity 10 and the method described herein in respect of the second entity 20.
As mentioned earlier, the telecommunications network in respect of which the techniques described herein can be implemented may be any type of telecommunications network and one example is a content delivery network (CDN).
Generally, the CDN can be configured to allocate the surrogate servers 304, 306, 308, 310, 312, 314, 316, 318, 320, 322 according to visit sessions from different Internet Protocol (IP) addresses, e.g. corresponding to different users of the CDN 300. The surrogate servers 304, 306, 308, 310, 312, 314, 316, 318, 320, 322 can replicate (network) content from a server of the central network 302. The surrogate servers 304, 306, 308, 310, 312, 314, 316, 318, 320, 322 can be placed in strategic locations to enable a more efficient delivery of content to users. With the increased amount of content (e.g. video traffic) in recent years, it is valuable for the CDN 300 to be able to cope with high demand and speed in order to provide satisfactory user experiences. In this context, it can be beneficial to (track and) analyse data items corresponding to one or more features of the CDN 300 (such as data items comprising information indicative of a quality of a connection between a UE and the CDN 300), e.g. in order to provide better network services.
Feedback from visiting sessions of users (e.g. from collected traces) can be related to key performance indicators (KPIs), which may be provided by data from back-end event records (e.g. log files). Such feedback may allow the CDN 300 to evaluate a quality of service (QoS) offered to users of the CDN 300 and this evaluation may be used to influence a quality of experience (QoE) for the users. For example, KPIs can comprise a download bit rate (DBR), which is indicative of a rate at which data may be transferred from a surrogate server to a user, a content (e.g. video) quality level (QL), and/or any other KPI, or any combination of KPIs. KPI features can be formulated in a time-series sequence for serial sessions, some of which may fail during the connection. These failure events and other events in the network may be rare, but it is beneficial to be able to (e.g. accurately and efficiently) detect such events. For example, this can provide valuable information to better configure and/or operate the CDN 300, e.g. for a better reallocation of resources in the CDN 300.
Although a CDN has been described by way of an example of a telecommunications network, it will be understood that the description in respect of the CDN can also apply to any other type of telecommunications network.
Although not illustrated in
For example, in some embodiments, the first entity 10 (or the processing circuitry 12 of the first entity 10) described herein may comprise the data collection and processing pipeline engine 400, and the transformer model engine 402, whereas the second entity 20 (or the processing circuitry 22 of the second entity 20) described herein may comprise the trained model 406 and optionally also the inference engine 408. In some embodiments, for example, the data collection and processing pipeline engine 400 can be configured to perform the organising of data items as described herein (e.g. with reference to step 102 of
The system illustrated in
As illustrated in
An issue that can exist for efficient CDN services is that the central network 302 may need to allocate appropriate surrogate servers 304, 306, 308, 310, 312, 314, 316, 318, 320, 322 in terms of their maximal loads and characteristics of each user equipment (UE) of the CDN 300. The same may be true of other telecommunications networks in terms of allocating an appropriate network node (e.g. server or base station) to a UE. A UE may be identified by an identifier, such as an internet protocol (IP) address. Each UE visit (e.g. from a particular IP address) can comprise one or more interactive sessions. For example, the one or more interactive sessions may comprise image viewing, texting, web browsing, and/or any other interactive session, or combination thereof. The interactive sessions can be associated with a time series. The session quality may be (e.g. largely) affected by one or more features of the CDN 300, such as a surrogate server identifier (ID) and/or a current content (e.g. video, text, and/or other content), which occupies network bandwidth. The interactive session may fail due to the connection quality or disproportionate load balancing between the surrogate servers of the CDN 300. Therefore, predicting a probability of an event occurring that can have an impact on a session (e.g. that can cause a failure of a session) can be a useful indicator of network quality.
The probability of such an event occurring can be relatively low compared with most successful sessions, making it difficult to accurately predict the event in time for action to be taken to avoid it (e.g. in real-time). However, using the advantageous techniques described herein, it is possible to embed a machine learning model (e.g. a deep learning model) into a system that can accurately and efficiently predict the event. For example, the machine learning model may be trained using (e.g. large volumes of) network session data (e.g. historical logged session data) to perform inference. The system described herein uses a cutting-edge methodology, which can be applied to, among others, the telecommunication domain. The core engine for the machine learning model training described herein, according to some embodiments, may advantageously be based on a deep transformer network model, as originally proposed to solve language translation tasks.
The system illustrated in
In more detail, in some embodiments, as illustrated at block 508 of
As illustrated in
For each feature of the one or more features 602, the first entity 10 (e.g. the processing circuitry 12, such as the data collection and processing pipeline engine 400 or the data collection and pre-processing engine 506, of the first entity 10) described herein, can organise (e.g. all of) the corresponding data items 600 into a sequence according to time to obtain at least one sequence of data items 604. As illustrated in
It may be assumed that the probability of an event occurring in the telecommunications network (e.g. a network failure) at a time T will be affected by sequenced data items from times T−1, T−2, …, T−n (where n may be a maximum number of data items in a sequence that the machine learning model will accept as input). The processing of the data items described herein can easily and efficiently be adapted for parallel processing, particularly since the at least one sequence of data items referred to herein (e.g. in a dictionary format) can easily and efficiently be retrieved during an inference (or prediction) phase, e.g. by using the identifier (e.g. IP address) that identifies the user concerned.
The first entity 10 (e.g. the processing circuitry 12 of the first entity 10) described herein can comprise at least part of the transformer model engine 700 and/or the second entity 20 (e.g. the processing circuitry 22 of the second entity 20) described herein can comprise at least part of the transformer model engine 700. Thus, at least some steps (e.g. sequence embedding 702 and positional encoding 704) described with reference to the transformer model engine 700 can also be said to be performed by the first entity 10 (e.g. the processing circuitry 12 of the first entity 10) and/or at least some steps (e.g. training 706) described with reference to the transformer model engine 700 can also be said to be performed by the second entity 20 (e.g. the processing circuitry 22 of the second entity 20). As illustrated in
As illustrated at block 702 of
As illustrated at block 706 of
As illustrated in
At block 702 of
At block 710 of
The technique described herein can outperform existing techniques (e.g. recurrent neural network (RNN) techniques) because it can learn not only the relationship between two data items that are close together in their position in the sequence of data items, but also the relationship between two data items having a similar meaning, even if those data items are positioned far apart in the sequence of data items.
In some embodiments, the overall output of the transformer structure illustrated in
The single sequence of data items 900 comprising the at least one sequence (e.g. all sequences) of data items 902, 904, 906 is encoded using positional encoding 704 to obtain an encoded sequence of data items. More specifically, the single sequence of data items 900 comprising the at least one sequence (e.g. all sequences) of data items 902, 904, 906 is encoded with information indicative of a position of data items in the single sequence of data items 900. The function of positional encoding can be to enable the machine learning model to learn the relative positions of each data item in the single sequence of data items, e.g. irrespective of the length of that single sequence of data items. Thus, the technique can be used on a sequence of data items of any length, even a long sequence of data items. In some embodiments, a (e.g. mathematical) function, such as an exponential decay function, can be used for the positional encoding as described earlier.
In some embodiments, the implementation may take into consideration (e.g. all) sequential behaviours of one or more historical sessions in the telecommunications network and realise the functionality of data items and sequence embedding. In some embodiments, as illustrated at block 714 of
In some embodiments, after encoding 704 and optionally also embedding 714, the encoded sequence of data items 900 may be input into a multi-head attention block 706 (e.g. an 8-layered multi-head attention block), which may be a part of the model engine according to some embodiments. As illustrated in
The use of a multi-head attention mechanism can ensure that any bias, e.g. from random seeding in the system, is reduced. Typically, multiple calculations based on a single attention head can be performed with different random seeds, which generate different initial embedding vectors x. For example, multiple outputs can be obtained for different attention matrices, e.g. attention1, attention2, . . . attentionN may be obtained based on different random seeds. The random seeds can, for example, be set by a user (e.g. modeller). Following the multiple calculations performed with different random seeds, a multi-head attention vector may be obtained by concatenating the outputs of these calculations, e.g. as follows:
MultiHeadedAtten = [attention1, attention2, …, attentionN].
After the operation for multi-headed attention is complete, a regular feedforward layer can be applied on the above multi-headed attention vector. In some embodiments, the trained machine learning model may be stored in memory (e.g. the model saver 708), which may be a memory of the second entity 20 described herein or another memory.
As illustrated in
The single sequence of data items comprising the at least one sequence (e.g. all sequences) of data items is encoded using positional encoding 704 to obtain an encoded sequence of data items. More specifically, the single sequence of data items comprising the at least one sequence (e.g. all sequences) of data items is encoded with information indicative of a position of data items in the single sequence of data items. In some embodiments, a (e.g. mathematical) function, such as an exponential decay function, can be used for the positional encoding as described earlier. An example of an exponential decay function is illustrated in
At block 706 of
At block 712 of
As illustrated in
As illustrated in
The input data items 1102 may be formulated into updated sequences 1104 taking into account the input data items 1102 and optionally also historical (e.g. previously stored) data items 1106. For example, for each sequence of data items, the sequence may be recursively shifted from x0, x1, …, x(T−1) to x1, x2, …, xT to ensure the length of the sequence of data items is the same as for the model input. Afterwards, the previously trained machine learning (e.g. transformer) model may be called from a memory (e.g. a model saver 1108) to predict an output (e.g. to predict a probability of an event occurring in the telecommunications network, such as a session failure) 1110.
An inference test simulator, which can mimic a real-world network operation, has been developed. The following table illustrates a summary of the prediction performance and inference time for two existing machine learning models (namely, a light gradient boosting machine model and a recurrent neural network model) and a transformer model, which is an example of a machine learning model that can be used according to some embodiments described herein. The machine learning models were tested using a test data set.
By way of the above table, the performance of the two existing machine learning models can be compared with the transformer machine learning model referred to herein. The main aspects considered during testing were off-line training performance, online inference accuracy, and response time. In order to evaluate the different existing machine learning models against the transformer model referred to herein, the testing included training using a training data set comprising 4 million samples and testing on a test data set comprising 500K samples. The lightGBM model that was tested is an example of a traditional tree-based machine learning model, and the RNN model that was tested is an example of a long short-term memory (LSTM) model. To mimic a real-time scenario, during the inference phase of testing, batch streaming data (with 64 samples in one batch from different IP addresses) was used as input to the previously trained machine learning models.
The overall performance is shown in the above table. As illustrated in the above table, it can be concluded that the transformer model, which can be used according to some embodiments described herein, takes much less time during the offline training phase. This is advantageous as it ensures that the previously trained machine learning model (e.g. on a back-end of a network server) can be updated frequently when a historical data set is updated. During online testing/inference, the transformer model was able to achieve 97% accuracy (which was measured as the number of correct predictions per number of samples). An area under curve (AUC) score was also evaluated by considering precision and recall for binary classes. In general, the AUC score is considered to be a fairer evaluation metric for imbalanced data, such as rare failure or anomaly cases. As illustrated in the above table, the transformer model was shown to achieve an AUC score of 0.96. In addition, the transformer model achieves the lowest inference time of all the models tested. More specifically, the transformer model can reach a 3-millisecond prediction time when parallel processing is applied.
In summary, the evaluation test results illustrate that the transformer model, which can be used according to some embodiments described herein, can achieve a higher accuracy of inference (or prediction) in less time than existing techniques in a real-world scenario.
At block 1206 of
At block 1212 of
At block 1216 of
There is also provided a computer program comprising instructions which, when executed by processing circuitry (such as the processing circuitry 12 of the first entity 10 described herein and/or the processing circuitry 22 of the second entity 20 described herein), cause the processing circuitry to perform at least part of the method described herein. There is provided a computer program product, embodied on a non-transitory machine-readable medium, comprising instructions which are executable by processing circuitry (such as the processing circuitry 12 of the first entity 10 described herein and/or the processing circuitry 22 of the second entity 20 described herein) to cause the processing circuitry to perform at least part of the method described herein. There is provided a computer program product comprising a carrier containing instructions for causing processing circuitry (such as the processing circuitry 12 of the first entity 10 described herein and/or the processing circuitry 22 of the second entity 20 described herein) to perform at least part of the method described herein. In some embodiments, the carrier can be any one of an electronic signal, an optical signal, an electromagnetic signal, an electrical signal, a radio signal, a microwave signal, or a computer-readable storage medium.
In some embodiments, the first entity functionality and/or the second entity functionality described herein can be performed by hardware. Thus, in some embodiments, the first entity 10 and/or the second entity 20 described herein can be a hardware entity. However, it will also be understood that optionally at least part or all of the first entity functionality and/or the second entity functionality described herein can be virtualized. For example, the functions performed by the first entity 10 and/or second entity 20 described herein can be implemented in software running on generic hardware that is configured to orchestrate the first entity functionality and/or the second entity functionality. Thus, in some embodiments, the first entity 10 and/or second entity 20 described herein can be a virtual entity. In some embodiments, at least part or all of the first entity functionality and/or the second entity functionality described herein may be performed in a network enabled cloud. Thus, the method described herein can be realised as a cloud implementation according to some embodiments. The first entity functionality and/or second entity functionality described herein may all be at the same location or at least some of the functionality may be distributed, e.g. the first entity functionality may be performed by one or more different entities and/or the second entity functionality may be performed by one or more different entities.
It will be understood that at least some or all of the method steps described herein can be automated in some embodiments. That is, in some embodiments, at least some or all of the method steps described herein can be performed automatically. The method described herein can be a computer-implemented method.
The techniques described herein include an advantageous technique for organising data items corresponding to one or more features of a telecommunications network (e.g. user streaming data) for input into a machine learning model, an advantageous technique for training such a machine learning model (e.g. a deep transformer model), and an advantageous technique for using the trained machine learning model to perform inference on incoming data items (e.g. comprising streaming data). The inference performed according to the techniques described herein is efficient and/or can be performed in (e.g. near) real-time. The response time achieved using the techniques described herein is greatly reduced compared to existing techniques. In this way, the potential for human error caused by subjective assessment is reduced.
Owing to their nature of automation and efficiency, the techniques described herein can scale up network failure detection and optimisation for all existing and future telecommunications networks, such as 5G telecommunications networks and any other generations of telecommunications network. The techniques can be broadly applied to many use cases and it will be understood that they are not limited to the example use cases described herein.
It should be noted that the above-mentioned embodiments illustrate rather than limit the idea, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2021/055913 | 7/1/2021 | WO |