The present invention relates to methods of monitoring parts of a communications network including manipulating a stream of monitoring data using a stored index, to methods of adapting the manipulation, and to corresponding apparatus for manipulating and apparatus for adapting the manipulation and programs for such methods.
Monitoring of network and service performance in large-scale operational networks is becoming increasingly important, especially with the fast deployment of mobile broadband networks and services. Operators need to be able to respond quickly to customer complaints, and to be pro-active by continuously monitoring the network in (near) real time and responding to service-affecting changes in network behavior.
(Near) real time monitoring means that there is some delay between an event being recorded by the network and the relevant information from that event being presented to the operator.
Continuous monitoring means that all relevant events (e.g. subscriber activities) are continuously generated in the network and are either logged/collected in files and sent to an OSS, or streamed directly to the OSS.
Existing solutions include real-time event stream processing, which involves classifying and aggregating events (and calculating Key Performance Indicators (KPIs) for networks and services) based on attribute values.
Attribute values of an event are set based on attributes of the corresponding signaling procedure. An example of an attribute field is a cell ID, which can take various defined values. In order to carry out complex event processing, such as KPI calculation on cell groups or clustering-based analysis, the event processing engine has to correlate events in real time with additional data sources (such as cell group definitions or topological information about cells). This imposes significant CPU and disk/DB I/O overhead when event stream volumes are high.
One practice in redundancy elimination for event streaming is to transmit duplicate values (such as UE information, and PDN session information such as the APN) only in the first event record of the stream, and to transmit only the key IDs, such as the IMSI and PDN ID (and the bearer ID if necessary), in subsequent events.
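The following minimal Python sketch illustrates this practice, assuming events have already been decoded into dictionaries; the field names (`imsi`, `pdn_id`, `apn`, `ue_info`) are hypothetical. The sender strips the session-level values after the first event of a session, and the receiver has to keep a live-session table to restore them.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical field names; real event formats define their own attributes.
KEY_FIELDS = ("imsi", "pdn_id")
SESSION_FIELDS = ("apn", "ue_info")

def strip_repeats(events: Iterable[dict]) -> Iterator[dict]:
    """Sender side: send session values only in the first event of each session."""
    seen = set()
    for ev in events:
        key = tuple(ev[f] for f in KEY_FIELDS)
        if key in seen:
            yield {k: v for k, v in ev.items() if k not in SESSION_FIELDS}
        else:
            seen.add(key)
            yield dict(ev)

def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Receiver side: restore missing session values from a live-session table."""
    sessions: Dict[Tuple, dict] = {}
    for ev in events:
        key = tuple(ev[f] for f in KEY_FIELDS)
        if all(f in ev for f in SESSION_FIELDS):
            sessions[key] = {f: ev[f] for f in SESSION_FIELDS}  # first record of session
            yield dict(ev)
        else:
            yield {**ev, **sessions[key]}  # look up the values stripped by the sender
```

Note that in this sketch both `seen` and `sessions` grow for as long as sessions are live, which is the state the receiver must maintain for this scheme to work.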
Another known method of reducing data transmission overhead is to apply data compression to streams, i.e. to compress data sent over a socket. Generally, data compression involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossy compression reduces bits by identifying marginally important information and removing it. In contrast, lossless compression exploits statistical redundancy to represent data more concisely without losing information.
In particular, lossless compression exploits redundancy by finding patterns of bits or bytes inside the data block to be compressed. Without a priori knowledge of the nature of the data, patterns can only be detected within finite blocks of data. Some of the most popular lossless compression methods are the Lempel-Ziv (LZ) method, DEFLATE (a variation on LZ, used in PKZIP, gzip and PNG) and LZW (Lempel-Ziv-Welch, used in GIF images).
In the case of event streams, where communication is continuous and open-ended (unlike transactional communication with bounded open-transmit-close operations), compression can be applied by writing one or more events into a buffer, compressing the buffered events and transmitting them without closing the underlying stream.
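As a sketch of this buffer-compress-transmit pattern, Python's standard `zlib` module can compress each batch of buffered events with one long-lived compressor and a sync flush, so that the stream is never closed. The batch contents are illustrative only.

```python
import zlib
from typing import Iterable, Iterator

def compress_event_stream(batches: Iterable[bytes]) -> Iterator[bytes]:
    """Compress buffered event batches for an open-ended stream.

    One compressor object is kept for the whole stream. Z_SYNC_FLUSH pushes out
    all pending output on a byte boundary without terminating the stream, so the
    receiver can decode each batch as soon as it arrives.
    """
    comp = zlib.compressobj()
    for batch in batches:
        yield comp.compress(batch) + comp.flush(zlib.Z_SYNC_FLUSH)

def decompress_event_stream(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Receiver side: one decompressor object for the lifetime of the stream."""
    decomp = zlib.decompressobj()
    for frame in frames:
        yield decomp.decompress(frame)
```

A full flush (`Z_FINISH`) is deliberately never issued, because it would end the compressed stream.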
Redundancy elimination (RE) replaces duplicate information with labels, which are transmitted in its place. This brings the following benefits: reduced bandwidth usage cost, reduced network congestion on access links, higher throughput, and reduced transfer completion times.
In particular, existing solutions in redundancy identification involve caching data blocks that have been previously seen, including:
The basic principle of the protocol-independent RE approach is to identify repeating patterns in the outgoing raw data traffic (at bit/byte level) and replace them with labels, so that the original raw data can be reconstructed on the receiver side.
The outgoing data is split into sub-strings (also referred to as chunks) and the redundancy is detected by looking for repeating chunks. The major processing stages are as follows.
Fingerprinting Data.
The goal of this operation is to facilitate identification of repeating patterns in the data stream. Rabin fingerprinting is the common approach. The algorithm calculates a fingerprint over a sliding window of data. The byte at which a special fingerprint (for example, one whose low-order bits are all zero) is found becomes the last byte of the current data chunk. The result of fingerprinting is a set of fingerprints and corresponding byte positions.
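The following is a minimal Python sketch of content-defined chunking. For simplicity it uses a Rabin-Karp style polynomial rolling hash modulo a prime in place of a true Rabin fingerprint (which uses polynomial arithmetic over GF(2)). The window size, base, modulus and boundary mask are all assumed values. Boundaries depend only on the bytes inside the window, so an insertion near the start of the data leaves later chunks unchanged.

```python
from typing import List, Tuple

WINDOW = 16            # bytes in the sliding window (assumed)
BASE = 257             # polynomial base (assumed)
MOD = (1 << 61) - 1    # large prime modulus (assumed)
MASK = (1 << 6) - 1    # boundary when the low 6 bits are zero: ~64-byte average chunks

def chunk_boundaries(data: bytes) -> List[Tuple[int, int]]:
    """Return (fingerprint, byte position) pairs that end a chunk."""
    out = []
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # shift in the newest byte
        if i >= WINDOW - 1 and (h & MASK) == 0:
            out.append((h, i))  # byte i becomes the last byte of the current chunk
    return out

def split_chunks(data: bytes) -> List[bytes]:
    """Cut data at the fingerprint-selected positions."""
    chunks, start = [], 0
    for _, end in chunk_boundaries(data):
        chunks.append(data[start:end + 1])
        start = end + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes with no boundary yet
    return chunks
```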
Indexing and Lookup.
The goal of the lookup is to use the fingerprints from the previous step to search for repeating data in the local cache. If the data has been seen earlier and saved to the cache, the cache must contain the corresponding fingerprints; finding them means that redundancy has been detected.
Storing Data.
Data chunks have to be saved in the cache for the purpose of repetition detection. RE systems tend to increase the redundancy elimination ratio by increasing storage capacity, so eventually the data has to be stored on hard disk.
Assuming that the system is completely protocol-unaware (i.e. it cannot tell, in general, whether a given piece of data is unique or highly repetitive), the decision whether to store data can be based on whether the data is new or already present in the cache. On one hand, saving all data to the cache consumes more storage space; on the other hand, it improves data access locality and may improve compression.
Reconstructing Data.
The task of data reconstruction is to restore the original data from the encoded data. The approach has two major advantages. First, it is protocol-independent: because the method is applied to raw data, there is no need to know which protocol is used for a particular data transfer, and redundancies are identified across the whole protocol stack. Second, the mechanism tolerates modifications of the original objects: if a data object, such as a file, is partially modified before it is transferred for the second time, the parts of it that remain unchanged still benefit from the optimization.
An implementation of a middle-box based approach is as follows. RE requires the installation of two middle-boxes: one closer to the server (the encoder) and one closer to the client (the decoder). As data flows from the server to the client, it passes through the boxes and is broken into chunks. The chunks are stored on the persistent storage of each box. For each chunk, a representative fingerprint (hash) that maps to the actual chunk is generated and stored in memory (e.g. a 1 KB chunk can be represented by a 20-byte hash that is, in practice, collision-free). The two boxes communicate through an out-of-band TCP connection so that the data are delivered in order. Because both boxes store the same data, their caches stay synchronized. When a chunk is referenced a second time, the encoding box sends the hash value instead of the actual bytes.
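The encoder/decoder pair can be sketched in Python as follows. SHA-1 is used because its digest is 20 bytes, matching the hash size mentioned above; it is collision-resistant rather than strictly collision-free. The token format and class name are hypothetical. The sketch keeps its cache in memory with no eviction, and chunks may come from any chunking method.

```python
import hashlib
from typing import Dict, List, Tuple

Token = Tuple[str, bytes]  # ("raw", chunk bytes) or ("ref", 20-byte SHA-1 digest)

class REBox:
    """One end of a redundancy-elimination pair (encoder or decoder).

    Each end holds a cache mapping SHA-1 digest -> chunk. The caches stay
    identical because both ends insert every new chunk, in stream order.
    """

    def __init__(self) -> None:
        self.cache: Dict[bytes, bytes] = {}

    def encode(self, chunks: List[bytes]) -> List[Token]:
        out: List[Token] = []
        for chunk in chunks:
            digest = hashlib.sha1(chunk).digest()  # 20 bytes
            if digest in self.cache:
                out.append(("ref", digest))        # repeated chunk: send the hash only
            else:
                self.cache[digest] = chunk
                out.append(("raw", chunk))
        return out

    def decode(self, tokens: List[Token]) -> List[bytes]:
        out: List[bytes] = []
        for kind, payload in tokens:
            if kind == "raw":
                self.cache[hashlib.sha1(payload).digest()] = payload
                out.append(payload)
            else:
                out.append(self.cache[payload])    # reconstruct from the local cache
        return out
```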
According to an aspect of the present invention, there is provided a method of monitoring a part of a communications network, having the steps of: receiving a stream of monitoring data relating to the part of the network, the stream of monitoring data having a repeating data format of attribute fields, and manipulating the stream of monitoring data automatically by: detecting the attribute fields in the stream of monitoring data, and for an attribute value of selected attribute fields, looking up a corresponding indexed value in a stored index, and selectively replacing that attribute value in the data stream with a corresponding indexed value, wherein the selective replacement is based on a characteristic of at least one of: the stream of monitored data, and the network being monitored.
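The core manipulation step can be sketched in Python, assuming each event has already been decoded into a dictionary of attribute fields. The stored index, its field names and the example values are hypothetical. Only values present in the index are replaced, which is what makes the replacement selective. A real implementation would operate directly on the repeating record format.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical stored index: attribute field -> {attribute value -> indexed value}.
# Only the fields and values selected for replacement appear here.
StoredIndex = Dict[str, Dict[str, int]]

def manipulate(events: Iterable[Dict[str, object]], index: StoredIndex) -> Iterator[Dict[str, object]]:
    """Replace selected attribute values with indexed values; leave the rest untouched."""
    for event in events:
        out = dict(event)
        for field, mapping in index.items():   # detect the selected attribute fields
            value = out.get(field)
            if value in mapping:                # selected value: look up and replace
                out[field] = mapping[value]
        yield out
```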
A benefit of such selective replacement at the attribute field level is that the monitoring data can be enriched or compressed more efficiently with less processing overhead by exploiting knowledge of the data format and the network configuration, which cannot be achieved by other lower level compression techniques. Also it is compatible with hardware implementations for faster processing. Furthermore, since the replacement is carried out at the attribute field level rather than a bit level, and because the values are indexed, they are consistent and predictable and so some data processing operations on the attribute values, such as scaling, can be carried out instead on the mapping. Thus repetition of the same operation on multiple instances of the same attribute value can be avoided and thus the processing resource required can be reduced. See
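The point about carrying out operations on the mapping can be illustrated with a minimal Python sketch. It assumes a hypothetical decode table from indexed values to numeric attribute values (here, rates in kbit/s). The scaling runs once per distinct indexed value, and each event then needs only a table lookup.

```python
from typing import Callable, Dict, List

def transform_mapping(decode_table: Dict[int, float], op: Callable[[float], float]) -> Dict[int, float]:
    """Apply op once per distinct indexed value instead of once per event."""
    return {code: op(value) for code, value in decode_table.items()}

# Hypothetical example: an attribute reported in kbit/s, indexed with small codes.
decode_table = {0: 64.0, 1: 128.0, 2: 256.0}
codes_in_stream: List[int] = [1, 1, 2, 0, 1, 2, 2, 1]  # indexed values seen in events

scaled = transform_mapping(decode_table, lambda kbps: kbps / 1000.0)  # to Mbit/s: 3 operations
values = [scaled[c] for c in codes_in_stream]                         # lookups only
```

With eight events but only three distinct values, the division is performed three times rather than eight; the saving grows with the stream length.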
Any additional features can be added to these features; some such additional features are set out below, set out in the dependent claims, and described in more detail. One such additional feature is the characteristic being redundancy in the stream of monitored data, with the selective replacement being based on at least how many unique attribute values there are for the respective attribute field, and on the frequencies of occurrence of the different attribute values. This can help to limit replacement to, for example, those attribute values with the most redundancy. This means that fewer indexed values may be needed, so they can be shorter, thus improving compression efficiency. It can also mean that fewer replacements are needed for a given reduction in redundancy, thus reducing the processing resource needed to make the replacements. See
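One possible selection rule is sketched below in Python. It indexes the most frequent values until they cover a chosen fraction of all occurrences. The coverage threshold is an assumed policy; the text requires only that the selection depends on the number of unique values and their frequencies. The most frequent values receive the smallest codes, and rare values stay unreplaced.

```python
from collections import Counter
from typing import Dict, Iterable

def select_values(values: Iterable[str], coverage: float = 0.9) -> Dict[str, int]:
    """Pick the most frequent values that together cover `coverage` of occurrences.

    Returns a mapping value -> indexed value, most frequent first.
    """
    counts = Counter(values)
    total = sum(counts.values())
    index: Dict[str, int] = {}
    covered = 0
    for value, n in counts.most_common():
        if covered >= coverage * total:
            break                      # remaining values are left as they are
        index[value] = len(index)
        covered += n
    return index
```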
Another such additional feature is the step of replacing at least one of the attribute values comprising replacing it with a corresponding indexed value comprising embedded information concerning the part of the network which is generating the monitoring data. This can enable some types of processing of the indexed values to take into account such embedded information without needing to carry out additional steps of looking up such information from an external source for each of the indexed values. Thus such processing can be carried out much more efficiently or quickly. See
Another such additional feature is the embedded information concerning the part of the network comprising information about relationships between the part and other parts of the network. Such information about relationships can enable correlation with data from monitoring such other parts to be carried out more efficiently for example. See
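A minimal Python sketch of indexed values with embedded relationship information follows. It assumes a hypothetical 16-bit layout in which the high 8 bits identify the cell group and the low 8 bits number the cell within that group. The group of any cell can then be recovered from its indexed value with a shift, without consulting a topology source.

```python
from typing import Dict

GROUP_BITS = 8   # hypothetical layout: high bits = cell group
LOCAL_BITS = 8   # low bits = cell number within the group

def build_cell_index(cell_groups: Dict[str, str]) -> Dict[str, int]:
    """Assign each cell ID an indexed value that embeds its cell group."""
    group_ids: Dict[str, int] = {}
    next_local: Dict[int, int] = {}
    index: Dict[str, int] = {}
    for cell, group in sorted(cell_groups.items()):
        g = group_ids.setdefault(group, len(group_ids))
        local = next_local.get(g, 0)
        if g >= 1 << GROUP_BITS or local >= 1 << LOCAL_BITS:
            raise ValueError("layout too small for this configuration")
        next_local[g] = local + 1
        index[cell] = (g << LOCAL_BITS) | local
    return index

def group_of(indexed_value: int) -> int:
    """Recover the cell group from the indexed value alone."""
    return indexed_value >> LOCAL_BITS
```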
Another such additional feature is the step of processing the monitoring data using the indexed values. This can help avoid the need to reconvert the indexed values back to the original attribute values and thus reduce processing resource requirements. As the indexed values can be shorter, this can also help reduce the processing resource requirement. See
Another such additional feature is the processing step comprising correlating monitored data from related parts of the network using the embedded information about relationships between the part and other parts of the network. Such correlation is a particularly useful type of processing, for example to locate causes of faults. Notably this can be carried out much more efficiently or more quickly by using information about relationships in the indexed values since there is less time and resource spent looking up the relationship information from elsewhere. See
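As an illustration, the following Python sketch correlates failures across related cells by counting them per cell group. It assumes the hypothetical layout in which an indexed cell value carries the group in the bits above its low 8 bits. The grouping is a single shift per event, with no per-event query against a topology database.

```python
from collections import Counter
from typing import Iterable, Tuple

LOCAL_BITS = 8  # must match the (hypothetical) layout used when building the index

def failures_per_group(events: Iterable[Tuple[int, bool]]) -> Counter:
    """Count failed events per cell group straight from indexed cell values.

    Each event is (indexed_cell_value, failed).
    """
    counts: Counter = Counter()
    for cell_code, failed in events:
        if failed:
            counts[cell_code >> LOCAL_BITS] += 1  # group recovered from the code itself
    return counts
```

A cluster of failures in one group then shows up directly in the counts and can point towards a common cause.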
Another such additional feature is the step of dynamically adapting how the stream of monitoring data is manipulated in use by altering any of: the selection of attribute field, the selection of attribute values to be replaced, and the corresponding indexed values in the stored index. This can enable adaptation to changing conditions, such as changes in value occurrence frequencies over different time periods or over multiple networks, or changes in how the monitored data is to be processed, for example from group-based analysis (pre-defined cell groups) to cluster analysis based on cell geographical distributions (e.g. to identify areas with weak wireless signals in a mobile network).
Another such additional feature is the step of observing frequencies of occurrence of the different attribute values in use and adapting the selection of attribute values to be replaced, based on the observed frequencies. This can help to improve the efficiency of compression, and adapt it to changing conditions. See
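A minimal Python sketch of observing frequencies in use follows. It keeps decayed counts, so that values which were popular in earlier periods gradually lose their place in the selection. The decay factor, the top-k policy and the period boundaries are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

class FrequencyTracker:
    """Track value frequencies with exponential decay so old traffic fades out.

    `decay` is applied at the end of each observation period; the `top_k`
    values by decayed count are proposed for replacement.
    """

    def __init__(self, decay: float = 0.5, top_k: int = 2) -> None:
        self.decay = decay
        self.top_k = top_k
        self.counts: Dict[str, float] = defaultdict(float)

    def observe(self, value: str) -> None:
        self.counts[value] += 1.0

    def end_period(self) -> List[str]:
        # Highest decayed count first; ties broken by value for determinism.
        selected = sorted(self.counts, key=lambda v: (-self.counts[v], v))[: self.top_k]
        for v in list(self.counts):
            self.counts[v] *= self.decay
        return selected
```

A production version would also need to prune values whose decayed counts approach zero.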
Another such additional feature is the step of replacing the attribute values comprising replacing variable length attribute values with corresponding indexed values having a fixed length. This can make processing of such index values faster, and can help enable processing hardware to be optimized for such fixed lengths. See
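The effect of fixed-length indexed values can be sketched with Python's `struct` module. It uses a hypothetical record of a 2-byte APN code followed by a 4-byte counter. Once the variable-length APN string is replaced, every record has the same size, so record i starts at byte i × 6 and no scanning for delimiters or length prefixes is needed. This sketch assumes every APN is present in the index.

```python
import struct
from typing import Dict, List, Tuple

RECORD = struct.Struct("!HI")  # hypothetical record: 2-byte APN code + 4-byte counter

def pack_records(rows: List[Tuple[str, int]], apn_index: Dict[str, int]) -> bytes:
    """Replace variable-length APN strings with fixed 2-byte codes."""
    return b"".join(RECORD.pack(apn_index[apn], count) for apn, count in rows)

def unpack_records(blob: bytes) -> List[Tuple[int, int]]:
    """Fixed-size records: each one starts at a multiple of RECORD.size."""
    return [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]
```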
Another such additional feature is the step of maintaining a database at the network management system and storing the received monitoring data in the database after the step of replacing at least one of the attribute values, without reconverting the indexed values back to the corresponding attribute values. This can help reduce processing and storage resource requirements. The index can be stored also to help enable later reconversion. See
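This storage scheme can be sketched with Python's built-in `sqlite3` module. Events are stored with the compact indexed value, and the index is stored alongside them. Aggregation runs on the codes directly, and reconversion to original values is a join that is performed only when needed. The schema and example values are hypothetical.

```python
import sqlite3

# Hypothetical schema: events keep the compact indexed value; the index is
# stored alongside so the original attribute value can be recovered on demand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apn_index (code INTEGER PRIMARY KEY, apn TEXT NOT NULL);
    CREATE TABLE events (ts INTEGER, apn_code INTEGER REFERENCES apn_index(code), outcome INTEGER);
""")
conn.executemany("INSERT INTO apn_index VALUES (?, ?)",
                 [(0, "internet.operator.example"), (1, "ims")])
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [(1, 1, 0), (2, 0, 0), (3, 1, 1)])

# Aggregation runs directly on the indexed values...
per_code = conn.execute(
    "SELECT apn_code, COUNT(*) FROM events GROUP BY apn_code ORDER BY apn_code").fetchall()

# ...and reconversion is a join, done only when a human-readable report is needed.
readable = conn.execute(
    "SELECT i.apn, COUNT(*) FROM events e JOIN apn_index i ON e.apn_code = i.code "
    "GROUP BY i.apn ORDER BY i.apn").fetchall()
```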
Another aspect provides a method of adapting manipulation, by apparatus having a stored index, of a stream of monitoring data relating to a part of a communications network, the stream of monitoring data having a repeating data format of attribute fields, the manipulating involving replacing selected attribute values with indexed values, the stored index having a mapping of indexed values corresponding to attribute values to be replaced. The adapting involves generating selection information about which of the attribute fields and which of their values are for replacement, and generating corresponding indexed values, based on a characteristic of at least one of: the stream of monitored data, and the network being monitored. The selection information and the corresponding indexed values are sent to the apparatus to cause it to adapt its manipulation of the monitoring data according to the selection information and corresponding indexed values. By adapting such manipulating apparatus in this way, the manipulation can be made more efficient for the prevailing conditions, including the network configuration. See
Another such additional feature is, for each of the attribute fields, identifying how many unique attribute values there are for the respective attribute field, and identifying frequencies of occurrence of the different attribute values, and deriving a characteristic of the stream of monitoring data comprising amounts of redundancy for different ones of the attribute fields, based on the numbers of unique attribute values, and how frequently the different attribute values occur and generating the selection information based on these redundancy characteristics. See
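One way to turn unique-value counts and frequencies into a per-field redundancy figure is sketched below in Python. It compares the Shannon entropy of the observed values with the declared field width. This metric is an illustrative assumption and is not prescribed by the text. Entropy estimated from a short observation period will underestimate high-cardinality fields such as IMSIs.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def field_redundancy(values: List[str], declared_bits: int) -> Tuple[int, float, float]:
    """Return (unique values, entropy in bits per value, redundancy fraction).

    Redundancy is taken here as 1 - H / declared_bits: the share of the
    declared field width not needed to convey the observed values.
    """
    counts = Counter(values)
    n = len(values)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return len(counts), entropy, 1.0 - entropy / declared_bits

def rank_fields(observed: Dict[str, Tuple[List[str], int]]) -> List[str]:
    """Order attribute fields by redundancy, most redundant first."""
    return sorted(observed, key=lambda f: -field_redundancy(*observed[f])[2])
```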
Another such additional feature is the step of identifying how many unique attribute values there are for the respective attribute field comprising a step of deriving this from at least one of: configuration information about the network, and observations of monitoring data over a period of time in use. For some types of attribute field, the network configuration limits the possible values, which enables the number of unique values to be deduced. See
Another such additional feature is the step of identifying a frequency of occurrence comprises the step of observing frequencies of occurrence of the different attribute values in use over a period of time. This can help to improve the efficiency of compression, and adapt it to changing conditions. See
Another such additional feature is the step of selecting an indexed value to have embedded information concerning the part of the network which is generating the monitoring data. By including such information in the indexed values this can enable some types of processing of the indexed values to take into account such information without needing to carry out additional steps of looking up such information for each of the indexed values. Thus such processing can be carried out much more efficiently or quickly. See
Another such additional feature is the embedded information concerning the part of the network comprising information about relationships between the part and other parts of the network. By including such information about relationships in the indexed values, this can enable correlation with data from monitoring such other parts to be carried out more efficiently for example. See
Another aspect provides apparatus configured to carry out methods as set out above. Another aspect provides apparatus for manipulating a stream of monitoring data relating to a part of a communications network, the stream of monitoring data having a repeating data format of attribute fields, the apparatus having a stored index having a mapping of indexed values corresponding to attribute values selected for replacement based on a characteristic of at least one of: the stream of monitored data, and the network being monitored. The apparatus also has look-up circuitry configured to detect the attribute fields in the stream of monitoring data and, for selected attribute fields, to use the attribute values to look up corresponding indexed values in the stored index. Replacing circuitry is provided, configured to selectively replace the attribute values in the data stream with the corresponding indexed values according to the stored index.
Another such additional feature is a monitoring data processor configured to carry out real time processing on the monitoring data having the indexed values. See
Another such additional feature is adaptation apparatus coupled to any of the stored index, the look up circuitry and the replacing circuitry for dynamically altering the manipulating in use by altering any of: the selection of attribute field, the selection of attribute values to be replaced, and the corresponding indexed values in the stored index.
Another such additional feature is the indexing adaptation circuitry having a selector configured to determine amounts of redundancy in attribute fields, and an indexing part configured to selectively create the indexed values for the mapping in the index. See
Another aspect provides adaptation apparatus for adaptation of a manipulating operation by apparatus having a stored index, to manipulate a stream of monitoring data relating to a part of a communications network, the stream of monitoring data having a repeating data format of attribute fields, the manipulating involving replacing selected attribute values with indexed values, the stored index having a mapping of indexed values corresponding to attribute values to be replaced. The adaptation apparatus has a processor and memory configured to generate selection information about which of the attribute fields and which of their values are for replacement, and to generate corresponding indexed values, based on a characteristic of at least one of: the stream of monitored data, and the network being monitored. An interface is provided for sending the selection information and the corresponding indexed values to the apparatus having the stored index, to cause it to adapt its manipulation of the monitoring data according to the selection information and corresponding indexed values. Such adaptation can enable the manipulation to become more efficient, for example to match changing conditions or the network configuration. In some cases it can enable the index and the selection to be built up and maintained automatically. The adaptation part can be located centrally, which helps the network management system to control the manipulation more easily than if it were distributed around the network. If the look-up part and the replacing part are located remotely, the benefit of the compression applies to the transmission of the monitoring data, as well as assisting with aggregation, processing and storage of the monitoring data at the network management system. See
Another aspect provides a computer program having instructions on a non-transitory computer readable medium which, when executed by a processor, cause the processor to carry out any of the methods above, involving manipulating the monitoring data using the index, or adapting the operation of manipulating.
Any of the additional features can be combined together and combined with any of the aspects. Other effects and consequences will be apparent to those skilled in the art, especially compared to other prior art. Numerous variations and modifications can be made without departing from the claims of the present invention. Therefore, it should be clearly understood that the form of the present invention is illustrative only and is not intended to limit the scope of the present invention.
Embodiments of the invention will be described, by way of example only, with reference to the accompanying drawings in which:
9 and 10, show methods of adapting the selective replacement according to embodiments,
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes.
XML Extensible Markup Language
Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps and should not be interpreted as being restricted to the means listed thereafter. Where an indefinite or definite article is used when referring to a singular noun e.g. “a” or “an”, “the”, this includes a plural of that noun unless something else is specifically stated.
Elements or parts of the described nodes or networks may comprise logic encoded in media for performing any kind of information processing. Logic may comprise software encoded in a disk or other computer-readable medium and/or instructions encoded in an application specific integrated circuit (ASIC), field programmable gate array (FPGA), or other processor or hardware.
References to nodes can encompass any kind of switching node, not limited to the types described, not limited to any level of integration, or size or bandwidth or bit rate and so on.
References to programs or software can encompass any type of programs in any language executable directly or indirectly on processing hardware.
References to processors, hardware, processing hardware or circuitry can encompass any kind of logic or analog circuitry, integrated to any degree, and not limited to general purpose processors, digital signal processors, ASICs, FPGAs, discrete components or logic and so on. References to a processor are intended to encompass implementations using multiple processors which may be integrated together, or co-located in the same node or distributed at different locations for example.
The functionality of circuits or circuitry described herein can be implemented in hardware, software executed by a processing apparatus, or by a combination of hardware and software. The processing apparatus can comprise a computer, a processor, a state machine, a logic array or any other suitable processing apparatus. The processing apparatus can be a general-purpose processor which executes software to cause the general-purpose processor to perform the required tasks, or the processing apparatus can be dedicated to perform the required functions. Embodiments can have programs in the form of machine-readable instructions (software) which, when executed by a processor, perform any of the described methods. The programs may be stored on an electronic memory device, hard disk, optical disk or other machine-readable storage medium or non-transitory medium. The programs can be downloaded to the storage medium via a network connection.
Modifications and other embodiments of the disclosed invention will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of this disclosure. Although specific terms may be employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
By way of introduction to features of embodiments of the invention, some discussion of known features will be presented.
Events and event-based monitoring will be described as an example of one type of stream of monitoring data, though many embodiments are not limited to such events. An event is an object that is a record of an activity in a system. This concept differs from the everyday usage of “event” as “something that happens”, since an event here is an object signifying an activity; in event processing, event objects are processed, not activities. Also, an event is not just a message. Generating a message is a common way of generating an event; however, an event also contains data describing the activity it signifies (such as the results of a signalling procedure). Due to the relations (such as time and causality) between activities, events have the same relationships to one another as the activities do. In particular, events from a mobile radio system are defined as follows: an object or signal carrying information about any discernible occurrence that has significance for the management of the mobile radio infrastructure or the delivery of service, and for the evaluation of the impact a deviation might cause to the services.
A mobile radio system consists of user equipment (for example mobile phones or laptop modems), radio access networks and core networks. Events are defined in such systems to monitor network and service performance and manage subscriber service experiences.
A mobility or session management message (for example, Create Session Request message) used in a signaling procedure (for example, Initial Attach procedure) triggers an event (for example, SESSION_CREATION event). Refer to 3GPP TS 23.401 for more information about mobility and session message procedures and messages.
The sGW nodes set an event outcome, together with detailed contextual information about the signaling procedure (timestamps, PDN information, UE information and bearer statistics), into a data block using pre-defined structures. The events may be encoded in a bit-packed binary format based on an XML file that defines the format. Events are then transmitted to an external post-processing system (such as an EBM application), either as files that the system decodes every ROP (for example, every 15 minutes) or as a stream to a specified IP address.
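The idea of a format-defining XML file driving the decoding of bit-packed events can be sketched in Python with the standard `xml.etree.ElementTree` module. The XML schema, event name and field widths below are hypothetical. Real event specifications also cover variable-length and optional parameters, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical format definition; real event specifications are far richer.
FORMAT_XML = """
<event name="SESSION_CREATION">
  <param name="result" bits="2"/>
  <param name="cause" bits="6"/>
  <param name="cell_id" bits="16"/>
</event>
"""

def load_format(xml_text: str) -> List[Tuple[str, int]]:
    """Read (field name, width in bits) pairs in declaration order."""
    root = ET.fromstring(xml_text.strip())
    return [(p.get("name"), int(p.get("bits"))) for p in root.findall("param")]

def decode(record: bytes, fmt: List[Tuple[str, int]]) -> Dict[str, int]:
    """Read fixed-width fields MSB-first from a bit-packed record."""
    acc = int.from_bytes(record, "big")
    pos = len(record) * 8
    out: Dict[str, int] = {}
    for name, bits in fmt:
        pos -= bits
        out[name] = (acc >> pos) & ((1 << bits) - 1)
    return out
```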
Event based applications collect generated events, either in (compressed) ROP files or in real-time streams, and process events, including:
For these purposes, event parameters such as failure codes and sub failure codes are defined to help troubleshooting the network. Such O&M information is unique to events of mobile radio networks and useful in the context of (root) cause analysis of network problems.
Unless specifically stated otherwise, the event concept in this document refers to O&M events, i.e. objects carrying information about O&M occurrences.
Compared to other monitoring methodologies, event based monitoring has the following advantages:
These advantages of event based monitoring are at the cost of extra overhead in transmission and processing. Contextual information dramatically increases the size of an event, while real-time transmission and processing of events poses stringent requirements on the capacity of both networks and systems.
Compared to messages or events for general purposes (such as Twitter messages), event stream traffic has the following characteristics.
Aggregate traffic; thousands (or even tens of thousands) of simultaneous streams originating from nodes are sent towards one O&M server; for example, at least one operational network in use now has up to 50,000 event streams;
Constant event streams; event data is pushed from nodes constantly; event streams (such as events describing the status of subscriber mobility management and session management; refer to 3GPP TS 23.401, TS 29.274, TS 36.401) are long-lasting and persistent in a production/operational network;
One-way data transmission; event data is pushed from nodes towards the NMS only;
Low event/data rate per stream but potentially very high after aggregation; the event/data rate per stream is low, typically 10 kbps to 100 kbps; however, the aggregated event/data rate can reach gigabits per second, which poses significant challenges for event processing and forwarding;
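A rough aggregate can be obtained by combining the 50,000-stream figure quoted above with the per-stream range. This assumes, for simplicity, that every stream runs at the same rate:

```latex
\begin{aligned}
50\,000 \times 10\ \text{kbit/s}  &= 5 \times 10^{8}\ \text{bit/s} = 0.5\ \text{Gbit/s} \\
50\,000 \times 100\ \text{kbit/s} &= 5 \times 10^{9}\ \text{bit/s} = 5\ \text{Gbit/s}
\end{aligned}
```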
Well structured events, but with diverse event format; event streams may come from multiple data sources (RAN/Core/Transport); event formats and encoding vary over node types (with different technologies/domains), node versions and node vendors; a large number of encoding formats are used in practice, such as Google protocol buffer, ASN.1 (BER/XER/PER) or XML. The data may be bit-packed, byte-packed, or even in XML format or ASCII format; even the same types of events may have different definitions (such as bearer session traffic statistics from Cisco BGW nodes as compared to those from CPG/sGW nodes) in terms of attribute definitions.
Spatially varied stream characteristics; event streams from different networks, of different node types and node locations, may have significantly different event rates;
Time varied stream characteristics; even from the same network, event rate may vary significantly over different time periods; busy hour (BH) event rate is expected to be higher than average event rate, while event rate during peak time periods can be three times higher than busy hour event rate.
Attributes of an event can carry contextual details about a signaling procedure or measurement, including for example IP addresses of participating nodes, PDN information and UE information (IMSIs etc.). Event stream traffic exhibits a large amount of redundancy: duplicate values of event attributes appear frequently across events of the same stream, or of multiple streams. This redundancy has the following causes:
Event attributes are usually defined to be 3GPP compliant (refer to 3GPP TS 23.003). The size/length of each attribute is defined to be capable of accommodating the maximum number of unique values. For example, an IP address attribute is defined as a 4-byte array, an APN as 100 octets, and a URI as 800 bits. The data type of each attribute is defined based on the nature of the attribute; for example, APNs and URIs are octet strings.
(2) Locality of Attribute Values with Restricted Value Space:
O&M events are different from other events (such as events from financial markets). In a real-world scenario, possible unique values of an attribute are limited by local configurations. The value space of some attributes is quite restricted.
For example, a typical (large) operational network may configure 1000 APNs, with 40 EPG nodes (which corresponds to 40 pGW IP addresses). Signaling procedures are associated with these nodes and identifiers. Because of this, the same values of an attribute may be frequently accessed. The same values may appear frequently in the same stream or multiple streams, either with events of the same type or with events of different types.
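The gap between the declared attribute widths and the restricted value space can be quantified with a short Python calculation. It uses the example figures above: 1000 APNs in a 100-octet field, and 40 pGW addresses in a 4-byte IPv4 field. The helper computes the width a fixed-length indexed value would need. It ignores any reserved code for values outside the index, which a real scheme would probably need.

```python
def index_bits(unique_values: int) -> int:
    """Bits needed for a fixed-length indexed value that can number every unique value."""
    return max(1, (unique_values - 1).bit_length())

# Value counts from the example network above; declared widths from the
# attribute definitions (APN: 100 octets, IPv4 address: 4 bytes).
FIELDS = {
    "apn": (1000, 100 * 8),
    "pgw_ip": (40, 4 * 8),
}

for name, (unique, declared_bits) in FIELDS.items():
    bits = index_bits(unique)
    print(f"{name}: {declared_bits} bits declared, {bits} bits indexed")
```

For these figures, each APN occurrence shrinks from 800 bits to 10 bits, and each pGW address from 32 bits to 6 bits.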
In particular, there are two basic types of locality: temporal locality and spatial locality. Temporal locality refers to re-occurrence of specific attribute values within relatively small time periods (for example, busy hours within a day or URLs associated with sudden events). Spatial locality refers to re-occurrence of specific attribute values within relatively close locations.
Not all attribute values appear with the same frequency or probability. The frequency of (at least some of) the attribute values is expected to follow a Zipf distribution, i.e. the popularity of an attribute value is inversely proportional to its rank in the frequency table: a few values appear extremely frequently, a moderate number of values appear with moderate frequency, and a huge number of values are relatively inactive. Because they occur so often, these popular attribute values contribute most of the data redundancy in event streams.
Problems with Real-Time Event Stream Processing
Existing event stream processing solutions (such as group-based analysis or clustering-based analysis) require correlating event streams with topological information in real time. Given the high event rate, such lookup/correlation operations significantly increase CPU and disk/DB I/O overhead, which degrades the performance of the whole system.
Problems with Redundancy Elimination Solutions
The once-for-all transmission approach mentioned above has the following problems. Transmitting duplicate values only in the starting records, and only identifiers in the subsequent records, reduces transmission overhead. However, to process the events (for example, APN-based aggregations), the subsequent events have to be enriched with the missing information before the aggregation can be carried out, whether processing is online or offline. A list of live session information therefore has to be maintained so that the missing values can be looked up.
Another drawback of this solution is that it only applies to a single event stream and cannot remove redundancy across multiple event streams.
Data compression can only reduce redundancy inside the data block to be transmitted. It cannot handle redundancy across multiple event blocks, or across multiple event streams.
As mentioned earlier, RE solutions identify redundancy by caching information (objects, packets or chunks) that has been previously seen. The major problem of applying such redundancy elimination to event based real-time monitoring is its storage and processing cost. Protocol-independent RE detects redundancies based on repetition of byte- or bit-level data blocks. Effective protocol-independent RE requires looking for small redundant chunks of the order of 32-64 bytes (because most transfers involve just a few packets each). The standard algorithms for such fine-scale redundancy are very expensive in memory (i.e. caching of data that has been seen) and processing, especially for continuous event streams originating from nodes like MMEs or CPGs, potentially with very high event rates.
In particular,
(1) A large storage is not feasible or cost-effective for a network node.
(2) Processing overhead in redundancy identification may have a major impact on node performance.
(3) On the receiver side, event data has to be reconstructed before any further processing. This would introduce high processing overhead for network management applications, considering the volume of aggregate event traffic.
(4) It introduces additional equipment for each connection to support RE function.
At step 126, there is a look-up step to provide an indexed value in the stored index corresponding to an attribute value of an attribute field. This can be implemented in various ways, for example using a conventional addressable memory, a content addressable memory or other circuitry. At step 128, that attribute value in the data stream is replaced with the corresponding indexed value according to the stored index. This selective replacement at the attribute field level is based on a characteristic of the monitored data or of the network, and can serve various purposes, for example to reduce redundancy in the monitored data or to enrich it with embedded information that speeds up further processing.
Note that existing redundancy reduction solutions, including data compression and RE solutions, work at a different layer from the proposed embodiments. Hence the proposed embodiments can be deployed in combination with existing redundancy reduction solutions.
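The look-up and replacement of steps 126 and 128 can be sketched in a few lines of Python. This is a minimal illustration, assuming events are dictionaries keyed by attribute field name and that the stored index is a per-field dictionary; the field names and values are hypothetical, and a real encoder would also need a way to tell indexed values from original ones.

```python
def replace_attributes(event, stored_index):
    """Selectively replace attribute values that have an entry in the stored index.

    event: {field name: attribute value}
    stored_index: {field name: {attribute value: indexed value}}
    Fields or values without an entry are passed through unchanged.
    """
    replaced = {}
    for field, value in event.items():
        field_index = stored_index.get(field, {})  # unselected fields have no index
        replaced[field] = field_index.get(value, value)
    return replaced


# Hypothetical event and stored index (illustrative values only).
stored_index = {
    "APN": {"internet.example": 7},
    "PGW_IP": {"192.0.2.10": 3},
}
event = {"IMSI": "240991234567890", "APN": "internet.example", "PGW_IP": "192.0.2.10"}
print(replace_attributes(event, stored_index))
# {'IMSI': '240991234567890', 'APN': 7, 'PGW_IP': 3}
```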
Step 126 shows looking up the indexed value corresponding to the attribute value in the stored index. At step 129, there is a step of selectively replacing the attribute values with the corresponding indexed values. In some cases this can involve replacing only some of the attribute values of the selected attribute field, such selection again being based on an amount of redundancy.
Step 126 shows looking up the indexed value corresponding to the attribute value in the stored index. At step 131, there is a step of selectively replacing the attribute values with the corresponding indexed values which have embedded information characteristic of the network, for example information about relationships between the part being monitored and other parts of the network. This can be useful in enabling easier or more rapid processing of the monitoring data. In particular examples the information is about relationships with other parts such as which nodes are neighbouring or in the same cluster, or on the same path for example. This can enable correlation of monitoring data from different parts or averaging or comparing monitoring data from related parts of the network, with less need for additional resources to look up such information.
At step 130, the monitoring data with indexed values, for example those having the embedded information, is processed. This enables the monitoring data processor to process the monitoring data while it still contains indexed values, which can be more efficient than equivalent processing before the replacement step.
The embodiments of
Stream collection/termination functions bind to pre-defined ports and wait for event streams to arrive. This embodiment enables an enhanced event based real-time monitoring solution by (1) analyzing redundancy of event attributes and identifying redundancies, (2) selectively building indices for attribute fields of an event with topological relations, and (3) processing events in real time based on the indices inside the events.
In at least some embodiments, attribute fields of an event are analyzed in terms of field size, number of unique values and frequency of each field value, and redundancy is identified from the result. Indices are built upon the selected redundant values of an attribute field, with topological relations between indexed items (such as cell groups). Values of attribute fields of an event are replaced with the built indices before the event is transmitted to a network management application. The events can then be processed in real time using the indices instead of the original values, exploiting information about the part of the network, such as topological knowledge, embedded in the indices.
Compared to known replacement of repetitive data with labels or signatures, the embodiments described can provide selective replacement using a stored index based on knowledge of the data (i.e. events), including event format definitions and domain knowledge such as topologies, for the purpose of redundancy reduction, instead of reactively caching data to identify repetition at byte/bit level or object level. Another notable feature of some embodiments is building the stored index of attribute fields so that the indexed values (also called indices) include information about the part of the network being monitored, such as the topological relations between the indexed attribute values, and then carrying out complex event processing (group-based analysis or clustering analysis) on those indices.
Here topological relation (or topological knowledge) refers to the layout of the connections (or relations) between the items the attributes are representing (for example the items being cells if the attribute is cell ID). It may be physical, such as geo distance between cells, or logical, such as pre-defined cell groups or logical distance between cells (i.e. neighboring relations between cells).
The selector program can incorporate a redundancy analyser program coupled to receive information such as streams of monitoring data, real time or historic, so as to observe frequencies of occurrence of attribute values. It can also receive the definitions of data format of the monitoring data, and the network configuration information, if this is useful to establish limits on numbers of unique attribute values for example.
The redundancy analyser program can be configured to determine which attribute fields have the most redundancy by calculating how many indexed values are needed (and thus how much shorter the indexed values can be than the attribute values) and how frequently the values occur, so that the best selection can be made of which attribute values to replace, to maximise the bandwidth reduction. Amounts of redundancy or similar indications can be output to the indexing program 80, which can select which attribute values to replace and which indexed values to replace them with. These selections can be stored in the index as a mapping, so that the looking up part described earlier can use the stored index to obtain the indexed values. For attribute values not selected, the stored index can return a null, for example, or this information can be stored separately by the looking up part. Various ways of implementing this can be envisaged.
At step 230, characteristics such as amounts of redundancy are derived for different attribute fields based on how many unique indexed values are needed and thus how much shorter the indexed values can be, as well as from the frequency of occurrence of the different values.
A further step 150 is shown of dynamically adapting the manipulation by altering in use the selection of attribute field and/or the selection of attribute values and/or the choice of corresponding indexed values. This can be done as a preliminary step or carried out at any time, if conditions change or if the network configuration changes.
The following sections give further details of the key parts or algorithms. Note that there are multiple implementation options. The following sections are based on the option that redundancy analysis and selective indexing run on network management applications, while indices based streaming runs on network nodes. Later sections give details of other implementation options.
Redundancy analysis over real-time event transmissions focuses on duplicate attribute values across multiple events. The proposed analysis can exploit the localization of data carried by events, based on knowledge of event formats, local network configurations, and locally initiated event data. As mentioned earlier, redundancy analysis can run on a network management application as one possible implementation.
The proposed analysis is not limited to events or to monitoring data from any particular type of source. It is applicable to monitoring data such as events from multiple sources and can analyze redundancies across multiple event streams of different types.
The event format of a particular source is pre-defined in XML files based on schema files. The format includes at least the following information: the name and identifier of an event; and the attribute/parameter definitions of the event, including the name and size of each attribute.
One example event is defined as follows.
Attributes of an event and size of each attribute are extracted from the definitions.
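Because the example definition itself is not reproduced in this text, the following sketch uses a hypothetical XML layout (the `event` and `param` element names, the `L_ATTACH` event and the `size` attribute in bits are all assumptions, not a real vendor schema) to show how attribute names and sizes could be extracted with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical event definition; element/attribute names are assumptions.
EVENT_XML = """
<event name="L_ATTACH" id="1">
  <param name="IMSI" size="64"/>
  <param name="APN" size="800"/>
  <param name="PGW_IP" size="32"/>
</event>
"""


def extract_attributes(xml_text):
    """Return (event name, event id, {attribute name: size in bits})."""
    root = ET.fromstring(xml_text.strip())
    sizes = {param.get("name"): int(param.get("size")) for param in root.iter("param")}
    return root.get("name"), int(root.get("id")), sizes


name, event_id, sizes = extract_attributes(EVENT_XML)
print(name, event_id, sizes)
# L_ATTACH 1 {'IMSI': 64, 'APN': 800, 'PGW_IP': 32}
```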
This step calculates the number of unique values (i.e. the cardinality) of an attribute in an event, in order to estimate the required index length. Three types of cardinality are relevant here:
Cardinality by definition: the number of possible unique values of an attribute, limited by the size of the attribute or defined explicitly in the event format (for example, by an enum data type). Such information shall be extracted from event format definitions.
Cardinality by configuration: it is common that only a small subset of available values is used in a network instance. Cardinality by configuration refers to the number of unique values of an attribute that are possible in local configuration. Such information shall be extracted from network configurations, such as network topology repositories.
Cardinality by observation: it is likely that not all of the attribute values are relevant to the event streams whose redundancy is being studied. This is the number of unique values observed over the studied event streams during a pre-defined time window (for example, 24 hours or 7 days by default). In the proposed best-mode implementation (mentioned earlier), this cardinality can be calculated by storing the events in a data warehouse for the pre-defined period and then using DB functions to calculate it.
This step further counts the occurrence frequency of attribute values over the studied event streams during a pre-defined time window (24 hours or 7 days by default). The aim is to further reduce the required index length by only considering the most frequent values in the indexing. In one proposed implementation (mentioned earlier), this frequency analysis can be done by storing the events in a data warehouse for the predefined period and then using ranking functions within DB aggregation functions to get the most frequent items of an attribute field.
The popularity of each attribute value is estimated to follow a Zipf-style distribution (http://en.wikipedia.org/wiki/Zipfslaw), i.e. the frequency of any attribute value is inversely proportional to its rank in the frequency table. By focusing on only the frequent items, the index length is minimized, which significantly reduces the overhead introduced by the proposed solution.
Note that frequency analysis may be applied to attribute values of large size (for example, > 4 bytes) to maximize the resource savings. After these three steps, a redundancy table can be constructed for further analysis, having, in this example, columns for: attribute name, attribute size in bits, number of unique values defined, number of unique values limited by network configuration, number of unique values observed, list of top-K frequent values, and frequency of the top-K frequent values.
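The three cardinalities, and the index length each would imply, can be computed as in the following sketch. It assumes an index must give every value of the chosen cardinality a distinct code, so its length is ceil(log2(cardinality)) bits; the pGW addresses are hypothetical.

```python
import math


def index_bits(cardinality):
    """Bits needed to give each of `cardinality` values a distinct index."""
    return max(1, math.ceil(math.log2(cardinality)))


def cardinalities(size_bits, configured_values, observed_values, enum_values=None):
    """Return (by definition, by configuration, by observation)."""
    # By definition: an explicit enum if the format has one, else all bit patterns.
    by_definition = len(enum_values) if enum_values is not None else 2 ** size_bits
    by_configuration = len(set(configured_values))
    by_observation = len(set(observed_values))
    return by_definition, by_configuration, by_observation


# Hypothetical pGW IP attribute: 32-bit field, 40 configured addresses,
# of which 25 were seen in the observation window.
configured = [f"10.0.0.{i}" for i in range(40)]
observed = configured[:25] * 100
cards = cardinalities(32, configured, observed)
print(cards, [index_bits(c) for c in cards])
# (4294967296, 40, 25) [32, 6, 5]
```

Each successive cardinality is usually smaller than the previous one, which is why configuration and observation data shorten the indices.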
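Why focusing on frequent items pays off can be illustrated under an assumed Zipf model with exponent 1 (an idealization; real attribute distributions would need to be measured). The following sketch computes the share of occurrences covered by the top-K of N values, which under this model equals H_K / H_N for harmonic numbers H.

```python
import math


def zipf_top_k_coverage(n_values, k, s=1.0):
    """Share of occurrences covered by the k most popular of n_values under Zipf(s)."""
    weights = [1.0 / rank ** s for rank in range(1, n_values + 1)]
    return sum(weights[:k]) / sum(weights)


# With 1000 distinct values, indexing only the top 100 needs 7-bit indices
# instead of 10-bit ones, yet covers roughly 69% of occurrences under this model.
for k in (10, 100, 500, 1000):
    coverage = zipf_top_k_coverage(1000, k)
    print(k, math.ceil(math.log2(k)), round(coverage, 3))
```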
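One row of such a redundancy table could be assembled from a window of observed values as below. This is a sketch under assumptions: the field names mirror the column list above, the `top_k_frequency` column is taken to be the total count of the top-K values, and the APN values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RedundancyRow:
    attribute: str
    size_bits: int
    unique_defined: int
    unique_configured: int
    unique_observed: int
    top_k_values: list = field(default_factory=list)
    top_k_frequency: int = 0  # total occurrences of the top-K values in the window


def build_row(attribute, size_bits, configured, observed, k, unique_defined=None):
    counts = Counter(observed)
    top = counts.most_common(k)
    return RedundancyRow(
        attribute=attribute,
        size_bits=size_bits,
        # Without an explicit enum, every bit pattern of the field is a defined value.
        unique_defined=unique_defined if unique_defined is not None else 2 ** size_bits,
        unique_configured=len(set(configured)),
        unique_observed=len(counts),
        top_k_values=[value for value, _ in top],
        top_k_frequency=sum(count for _, count in top),
    )


# Hypothetical APN attribute (100 octets = 800 bits) over one observation window.
configured_apns = [f"apn{i}.example" for i in range(1000)]
observed_apns = (["apn0.example"] * 50 + ["apn1.example"] * 30
                 + ["apn2.example"] * 15 + ["apn3.example"] * 5)
row = build_row("APN", 800, configured_apns, observed_apns, k=2)
print(row.unique_configured, row.unique_observed, row.top_k_values, row.top_k_frequency)
# 1000 4 ['apn0.example', 'apn1.example'] 80
```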
Note that redundancy analysis may be implemented in different ways: offline analysis, on-the-fly analysis or a hybrid approach. If the redundancy is analyzed based on event format (for attribute size) and network configurations (for number of configurable unique values), the redundancy analysis can be done immediately by importing event format definitions and topology into the redundancy analysis engine.
If the redundancy is analyzed based on event traffic observations (for the observed number of unique values and top-K frequent values), the redundancy analysis requires a pre-defined observation period before indices can be built. Moreover, the built indices are subject to on-the-fly changes if the frequency is observed periodically, i.e. in every observation window.
This step builds indices on attribute fields of an event for the following two purposes:
In particular, the algorithm is to select attribute values to be indexed and then determine the index for each attribute value based on their topological relations (such as groups for group analysis and distances for clustering analysis). Note that indexing for real-time processing can be implemented separately from indexing for redundancy elimination.
The output of this algorithm is an index table of values for selected event attributes. In one implementation (as mentioned earlier), a network management application is responsible for the redundancy analysis and then for building the indices. The generated indices table shall be distributed by the network management application to the network nodes, which use the indices table in event generation.
The index does not have to be a one-to-one mapping. Each attribute value should be mapped onto only one index value; however, multiple attribute values may be mapped onto the same index value. For example, one SGSN node may have multiple SGSN IP addresses, and all SGSN IP addresses of the same SGSN node may be mapped onto the same index value; however, looking up a particular SGSN IP address must yield exactly one index value (to keep the integrity of the data).
This can help maximize the gains (i.e. resource savings) and minimize the overhead of introducing indices into event transmission and processing. If compression is the aim, the ideal attribute values to index are those with:
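A many-to-one index with the integrity check described above could look like the following minimal sketch. The SGSN addresses are hypothetical documentation-range IPs, and raising an error on a conflicting mapping is one assumed way to enforce that each attribute value has exactly one index value.

```python
def build_index(groups):
    """Build a many-to-one index: {index value: [attribute values]} -> {value: index}.

    Several attribute values may share an index value, but each attribute
    value must map to exactly one index value.
    """
    index = {}
    for index_value, attribute_values in groups.items():
        for value in attribute_values:
            if value in index and index[value] != index_value:
                raise ValueError(f"{value} mapped to both {index[value]} and {index_value}")
            index[value] = index_value
    return index


# Two hypothetical SGSN nodes, each with several IP addresses.
sgsn_index = build_index({
    1: ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
    2: ["198.51.100.1", "198.51.100.2"],
})
print(sgsn_index["192.0.2.2"], sgsn_index["198.51.100.1"])
# 1 2
```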
Accordingly, based on the candidate redundancy table calculated previously, one implementation of the selection algorithm is as follows:
Note that this is only one of various possible implementations. The observed cardinality, or the configured cardinality, may be used to calculate the required length of index, instead of using top-K attribute values. One reason for this is that finding top-K attribute values may consume considerable resources.
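Because the selection algorithm itself is not reproduced here, the following sketch instead applies the savings criterion stated for the fifth group of embodiments later in this description: savings = (attribute size - index length) x frequency, with the field selected if the savings exceed a threshold. The extra indexing-flag bit, the field sizes, frequencies and threshold are all assumptions for illustration.

```python
import math


def select_fields(rows, threshold_bits):
    """Select attribute fields worth indexing.

    rows: iterable of (name, size_bits, k, top_k_frequency), where k is the
    number of values to index and top_k_frequency how often they occurred.
    Returns {name: index length in bits} for the selected fields.
    """
    selected = {}
    for name, size_bits, k, frequency in rows:
        # ceil(log2 k) bits for the index plus one indexing-flag bit (assumed).
        index_len = max(1, math.ceil(math.log2(k))) + 1
        savings = (size_bits - index_len) * frequency
        if savings > threshold_bits:
            selected[name] = index_len
    return selected


# Hypothetical candidates observed over one window.
candidates = [
    ("APN", 800, 1000, 10_000),
    ("URI", 800, 1000, 8_000),
    ("PGW_IP", 32, 40, 10_000),
    ("CAUSE_CODE", 8, 200, 10_000),  # index would be longer than the field
]
print(select_fields(candidates, threshold_bits=1_000_000))
# {'APN': 11, 'URI': 11}
```

Lowering the threshold admits smaller fields such as `PGW_IP`, while a field whose index would be longer than the field itself yields negative savings and is never selected.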
In one example the following fields can be considered for indexing:
Not all values of an attribute need to be indexed (i.e. partial indexing). An extra bit may therefore be required as an indexing flag, to differentiate indexed values from original values of the attribute.
This option uses indexed attribute values to facilitate real-time event processing. Originally, each attribute field is encoded with its values as given. In order to carry out real-time analysis on event attributes, such as group-based analysis on nodes (based on node IP addresses or node IDs), terminal types (based on IMEISV), URIs and APNs, a list of group memberships must be maintained and looked up for each event processed. With an extremely high aggregate event rate, this may consume significant system resources.
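The indexing flag can be illustrated with a bit-string encoding. This is a sketch only, assuming an indexed value is sent as `1` followed by a fixed-width index and an unindexed value as `0` followed by the original value zero-padded to the field size; real events would use packed binary rather than Python strings, and the APN values are hypothetical.

```python
def encode_field(value: bytes, index: dict, index_bits: int, raw_bits: int) -> str:
    """Encode one attribute value as '1' + index, or '0' + raw padded value."""
    if value in index:
        return "1" + format(index[value], f"0{index_bits}b")
    assert len(value) * 8 <= raw_bits, "value longer than the attribute field"
    padded = value.ljust(raw_bits // 8, b"\x00")
    return "0" + "".join(format(byte, "08b") for byte in padded)


def decode_field(bits: str, reverse_index: dict, index_bits: int) -> bytes:
    """Inverse of encode_field; assumes raw values never end in NUL bytes."""
    if bits[0] == "1":
        return reverse_index[int(bits[1:1 + index_bits], 2)]
    raw = bytes(int(bits[i:i + 8], 2) for i in range(1, len(bits), 8))
    return raw.rstrip(b"\x00")


# Hypothetical partially indexed APN field (100 octets = 800 bits).
apn_index = {b"internet.example": 0, b"ims.example": 1}
reverse = {code: value for value, code in apn_index.items()}
indexed_bits = encode_field(b"ims.example", apn_index, index_bits=10, raw_bits=800)
raw_encoded = encode_field(b"corp.example", apn_index, index_bits=10, raw_bits=800)
print(len(indexed_bits), len(raw_encoded))
# 11 801
```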
It is proposed to build index values for the attributes that are to be processed in real time by the network management applications. In particular, the built index can reflect the topological relations between the attribute values. Note that the detailed indexing scheme depends upon the operations of the complex event processing. At least the following indexing options can be considered:
This option adds group IDs as part of the index, in preparation for group-based analysis. The group IDs may be hierarchically organized with sub-group IDs. The group IDs shall be pre-defined based on group definitions (such as terminal groups based on handset terminal types, subscriber groups based on IMSIs, SGSN groups, APN groups and URI groups). The groups are usually manually defined at the network management applications. Examples of a format of an indexed value having additional information are shown in tables 1 and 2 below:
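Since the layouts of tables 1 and 2 are not reproduced in this text, the following sketch assumes one possible bit layout (4-bit group ID, 4-bit sub-group ID, 8-bit item number in a 16-bit index) to show how a group-aware index could be packed and unpacked with shifts and masks; the widths and example IDs are hypothetical.

```python
GROUP_BITS, SUBGROUP_BITS, ITEM_BITS = 4, 4, 8  # assumed field widths


def make_index(group, subgroup, item):
    """Pack group | sub-group | item into one integer index."""
    if not (group < 1 << GROUP_BITS and subgroup < 1 << SUBGROUP_BITS and item < 1 << ITEM_BITS):
        raise ValueError("component too large for the assumed layout")
    return (group << (SUBGROUP_BITS + ITEM_BITS)) | (subgroup << ITEM_BITS) | item


def split_index(index):
    """Unpack an index into (group, sub-group, item)."""
    item = index & ((1 << ITEM_BITS) - 1)
    subgroup = (index >> ITEM_BITS) & ((1 << SUBGROUP_BITS) - 1)
    group = index >> (SUBGROUP_BITS + ITEM_BITS)
    return group, subgroup, item


# Hypothetical terminal type: terminal group 3, sub-group 2, item 17.
idx = make_index(3, 2, 17)
print(idx, split_index(idx))
# 12817 (3, 2, 17)
```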
(2) Hierarchical Indexing with Topological Proximity or Other Logical Proximity
This option indexes attribute values such as node IP addresses to reflect logical or topological proximity between nodes, to facilitate clustering-based analysis. One example is a tree structure having a single top-level node, three middle-level nodes (1, 2, 3) and five lower-level nodes. The first lower-level node is labelled 1-1-1, as it is linked to the first middle-level node. The next three lower-level nodes are linked to the second middle-level node and so can be labelled 1-2-2, 1-2-3 and 1-2-4. The fifth lower-level node in this example is linked to the third middle-level node and so can be labelled 1-3-5.
In a simplified network, the cell IDs at the bottom of the topology can be indexed hierarchically so that the values of the indices reflect neighboring relations between cells. Further clustering based analysis can be applied based on such logical proximities, to cluster cells and pinpoint root causes of failures.
For example, if events associated with 1-2-2, 1-2-3 and 1-2-4 report handover failures, it can be preliminarily inferred that node 1-2 may be the cause. Without such enriched indexed values, the network management applications would have to map events onto network topologies before carrying out clustering, since plain cell IDs carry no information about network topology. This is infeasible, or at least expensive, to implement in the presence of high event rates.
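The inference in this example amounts to finding the longest common prefix of the hierarchical labels of the cells reporting failures. A minimal sketch, assuming labels are the dash-separated strings used in the tree example (a deployed index would more likely be a fixed-width binary value):

```python
def common_parent(labels):
    """Longest common prefix of hierarchical labels such as '1-2-3'."""
    prefix = []
    for level in zip(*(label.split("-") for label in labels)):
        if len(set(level)) != 1:
            break
        prefix.append(level[0])
    return "-".join(prefix)


# Cells reporting handover failures, labelled as in the tree example.
print(common_parent(["1-2-2", "1-2-3", "1-2-4"]))  # 1-2
print(common_parent(["1-1-1", "1-3-5"]))           # 1
```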
(3) Building Indices with Geographical Proximity
This option indexes attribute values such as node IDs with geographical information (such as cell geo-location), so that the index reflects geographical distances, for example between cells. Clustering analysis can then be carried out based on the geo information within the index. Note that attribute values may be indexed on demand, so the indices may be dynamic. If group-based (complex) real-time event processing is enabled, the index may be built to incorporate group IDs. New indices might be built for the same attribute values for different types of real-time event calculations. A validity time window may be associated with such an index; after this pre-defined period, the index shall be invalidated and removed from the indices table.
As mentioned earlier, complex event processing such as group-based analysis and cluster analysis becomes much simpler with indices, since the indices already contain information about groups and distances between indexed items. No lookup or correlation operations are required, and the corresponding counters can be configured to calculate performance indicators in real time.
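A geographical index with a validity window might be sketched as follows. Quantizing coordinates onto a fixed grid is an assumed scheme (the description does not specify one), as are the 0.01-degree resolution, the Chebyshev grid distance used as a proximity measure and the example coordinates.

```python
import math
from dataclasses import dataclass

CELL_DEG = 0.01  # assumed grid resolution in degrees (about 1.1 km of latitude)


@dataclass
class GeoIndex:
    row: int
    col: int
    expires_at: float  # end of the validity time window (seconds)


def geo_index(lat, lon, now, ttl_s):
    """Index a location by its grid cell, valid for ttl_s seconds from now."""
    return GeoIndex(math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG), now + ttl_s)


def grid_distance(a, b):
    """Chebyshev distance in grid cells: a cheap proximity measure for clustering."""
    return max(abs(a.row - b.row), abs(a.col - b.col))


def purge_expired(table, now):
    """Drop indices whose validity window has passed."""
    return {key: idx for key, idx in table.items() if idx.expires_at > now}


# Hypothetical cells with their validity windows.
table = {
    "cellA": geo_index(59.3293, 18.0686, now=0, ttl_s=3600),
    "cellB": geo_index(59.3350, 18.0750, now=0, ttl_s=3600),
    "cellC": geo_index(59.8586, 17.6389, now=0, ttl_s=600),
}
print(grid_distance(table["cellA"], table["cellB"]), grid_distance(table["cellA"], table["cellC"]))
print(sorted(purge_expired(table, now=1200)))
```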
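To illustrate why no lookup is needed, the following sketch computes a per-group failure ratio directly from indexed events. It assumes the group ID occupies the top bits of a 16-bit index (4-bit group, 4-bit sub-group, 8-bit item); this layout and the event data are hypothetical.

```python
from collections import Counter

SUBGROUP_BITS, ITEM_BITS = 4, 8  # assumed layout: group | sub-group | item


def group_of(index):
    """Read the group ID straight out of the index: no membership table needed."""
    return index >> (SUBGROUP_BITS + ITEM_BITS)


def failure_ratio_per_group(events):
    """events: iterable of (index, failed); returns {group ID: failure ratio}."""
    totals, failures = Counter(), Counter()
    for index, failed in events:
        group = group_of(index)
        totals[group] += 1
        failures[group] += int(failed)
    return {group: failures[group] / totals[group] for group in totals}


# Hypothetical indexed events: group 1 (index 0x1005) and group 2 (index 0x2009).
events = [(0x1005, True), (0x1005, False), (0x2009, False), (0x2009, False)]
print(failure_ratio_per_group(events))
# {1: 0.5, 2: 0.0}
```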
The proposed solutions may be implemented in several different ways:
(1) Plug-in based implementation: the redundancy analysis (and the replacement of attribute values with indices) can be implemented as plug-ins to the event encoding process. Before events are written into binary blocks, the original value of each attribute is examined by the redundancy analyzer and replaced with its index if an index exists in the indices table.
(2) Middle-box implementation: alternatively, some or all of the proposed operations can be done in a separate middle box. This may introduce extra cost but has little impact on existing systems.
(3) Sender-initiated redundancy elimination: the solution described in previous sections assumes that the network management applications carry out redundancy analysis and build indices. Alternatively, these operations may be carried out by event senders, i.e. network nodes. The generated indices then need to be sent to the network management applications, and indices from different nodes may be merged.
(4) Event re-construction: the proposed solution does not require events to be reconstructed at the receiver side. That is, the receiver may keep the received events as they are, without replacing the indices with the original values. This further reduces the storage required for the same number of events, since the indices are much shorter than the original values. Accordingly, attribute values used in SQL queries need to be looked up in the indices table before the queries are executed on the stored events.
(5) Hardware implementation of event processing based on indices: a hardware-based processing solution is particularly suitable because of the following benefits. Firstly, indices can be used to remove differences between the lengths of attribute values; that is, all attribute values of a field or of an event can have equal lengths. This reduces the complexity of using CAM or TCAM (http://en.wikipedia.org/wiki/Content-addressable_memory) in event processing.
Secondly, the indices contain all the information required for the analysis, so no further memory accesses or searches are needed for correlation operations.
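The benefit of equal-length, hierarchically structured indices for CAM/TCAM can be illustrated by emulating a ternary match in software: a single value/mask entry selects every leaf under one middle-level node of the tree example given earlier. This is a behavioral sketch, not a TCAM device API; the 4/4/8-bit key layout is an assumption.

```python
def ternary_match(key, value, mask):
    """True if key equals value on every bit set in mask ('don't care' elsewhere)."""
    return (key & mask) == (value & mask)


def make_key(level1, level2, leaf):
    """Fixed-width 16-bit key: 4-bit level 1 | 4-bit level 2 | 8-bit leaf (assumed)."""
    return (level1 << 12) | (level2 << 8) | leaf


# One ternary entry matching every leaf below middle-level node 1-2.
entry_value, entry_mask = make_key(1, 2, 0), 0xFF00
keys = {
    "1-1-1": make_key(1, 1, 1),
    "1-2-2": make_key(1, 2, 2),
    "1-2-3": make_key(1, 2, 3),
    "1-2-4": make_key(1, 2, 4),
    "1-3-5": make_key(1, 3, 5),
}
print([label for label, key in keys.items() if ternary_match(key, entry_value, entry_mask)])
# ['1-2-2', '1-2-3', '1-2-4']
```

Because every key has the same width, one masked comparison replaces a per-event topology lookup.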
As has been described, some embodiments use a method, algorithms and functions for event based real-time monitoring of large-scale operational networks and services by selectively building indices for attribute fields of an event to (1) eliminate redundancy in event transmission and processing and (2) facilitate real-time event processing. This can involve a method of event based real-time monitoring of large-scale operational networks and services, comprising: analyzing the size, the number of unique values and the frequency of each value of an attribute field inside an event, to identify candidates for redundancy elimination; selectively indexing the identified redundant attribute values; transmitting the built indices to network management applications; generating an event recording/describing node-level or subscriber-level behaviors, procedures or periodic reports, according to predefined formats; replacing the identified field values of the event with the corresponding indices and transmitting the event (for example over TCP/IP protocols) to network management applications; and processing the events in real time based on the indices inside the received events.
Other embodiments involve methods of eliminating redundancy in event transmission from a plurality of nodes to network management applications in a large-scale operational network, by: identifying redundancy across multiple events in an event stream by analyzing event format definitions, the number of unique values of an attribute field inside an event and the frequency of each value of an attribute field; building indices selectively for the identified attribute fields and transmitting the indices to network management applications; and replacing the identified field values of the event with the corresponding indices and transmitting the event (for example over TCP/IP protocols) to network management applications.
A third group of embodiments have steps of analyzing redundancy for real-time transmissions in event based monitoring, by: analyzing event formats from different sources/nodes and extracting attributes, unique values (if available) and size of each attribute; calculating number of unique values for each attribute field extracted based on local domain knowledge, including network topologies and configurations; identifying most frequent values of each attribute field extracted by periodically sampling events in the real-time transmissions; and identifying candidate redundant attribute fields using calculated results from previous steps.
A fourth group of embodiments has methods of real-time complex event processing based on indices of attribute fields of an event, by: building indices for an attribute field, with topological relations between indexed items; replacing values of attribute fields of an event with the built indices and transmitting the events over TCP/IP protocols to network management applications; and processing the events using indices instead of original values, using topological relations between indexed items for real-time processing.
A fifth group of embodiments involves methods of selectively building indices for attribute fields of an event to eliminate redundancy, by: calculating the possible unique values of an attribute field of an event based on network topology and network configurations; counting the number of unique values of the attribute field that are observed in event streams during a pre-defined time period; calculating the frequency of occurrence of each value of the attribute field to identify the most frequent values of the attribute field during a pre-defined time period; estimating the length of an index required for the most frequent values of the attribute field; calculating bandwidth savings by subtracting the index length from the attribute size and multiplying the result by the frequency of the attribute field; and building indices for the attribute field if the bandwidth savings are above a pre-defined threshold.
A sixth group of embodiments involves methods of event based network and service monitoring of a large-scale operational network by selecting attribute fields and building indices for selected attribute fields based on redundancy observations on pre-defined network segments within a pre-defined time period. (This shows dynamically building indices based on temporal/spatial observations of redundancy.)
A seventh group of embodiments has methods of event based network and service monitoring of a large-scale operational network by: inputting the type of aggregation calculations of the event processing engine (clustering or group analysis); building indices based on the type of aggregation calculations and topological knowledge of indexed attribute fields, so that the values of the indices reflect the topological relations of the attribute fields; and transmitting and processing events with indices until further instructions on the type of aggregation calculations are received. (This shows dynamically building indices on demand, based on the types of processing to be carried out by network management applications, and changing or adding indices if a different type of processing is expected.)
Various embodiments have some or all of the following benefits over existing solutions for compressing streams of monitoring data such as event based streams:
(1) Facilitated real-time complex event stream processing by indexing attribute fields based on processing requirements. By implementing the solution using hardware such as CAM, performance can reach multiple millions of events per second (based on mature DPI performance: a 10 Gbps DPI solution with an average packet size of 500 bytes processes about 2.5 million packets per second).
(2) Reduced transmission overhead: the proposed redundancy elimination solution combines knowledge of events (i.e. event formats), domain knowledge (in determining the number of unique values of an attribute field) and the frequency of each value of the attribute field in redundancy identification, which maximizes the ratio of redundancy elimination. For example, by building indices on URIs (top 1000 URIs) and APNs (a total of 1000 APNs), the proposed solution could reduce an event of 3431 bits to 456 bits, a reduction of about 87% in overhead.
(3) Reduced event processing overhead: indices, which can be in binary form, can be decoded and loaded into a DB much faster than the original data. In addition, since the values are indexed, any operations on the attribute values, such as type conversion, can be done once in the indices table instead of on the attribute of each event.
(4) Improved event processing throughput: attribute fields of events have variable lengths, which makes hardware-based processing difficult. By building indices of equal length, it becomes feasible to use hardware such as TCAM to speed up event processing.
(5) Event independent solution: The proposed solutions are not limited to any specific event types. Redundancy is identified by analyzing event definitions (preferably XML definition files).
(6) Lossless redundancy elimination: instead of removing duplicated values from events, the attribute fields are re-encoded with indices, so no information is lost in the process.
(7) Compatibility with other redundancy elimination or data compression solutions: the proposed solution operates at the event/application layer, unlike RE solutions that run on raw data. Those redundancy elimination or stream compression techniques can therefore be applied side by side with the proposed selective indexing.
(8) Unlike RE solutions, there is no need to recover data after redundancy elimination; original attribute values can be accessed using the stored index.
As has been described above, streams of monitoring data relating to part of a communications network are manipulated automatically by looking up, for an attribute value of an attribute field, a corresponding indexed value in a stored index, and selectively replacing that attribute value with the corresponding indexed value. The selective replacement is based on a characteristic of the stream of monitored data or of the network being monitored. The selection or the indexed values can be adapted dynamically. Such selective replacement at the attribute field level can enable the data to be enriched with embedded information, or be compressed more efficiently with less processing overhead, by exploiting knowledge of the data format and the network configuration. It is compatible with hardware implementations. The embedded information can enable subsequent processing of the monitored data to be sped up.
Other variations can be envisaged within the scope of the claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2012/065111 | 8/2/2012 | WO | 00 | 5/28/2015 |