The present disclosure generally relates to the field of stream data processing and, more specifically, to a technique for efficiently collecting data in a distributed stream data processing system that employs a network to convey one or more data streams.
So-called “big data” has encouraged organizations to collect as much data as possible in order to perform data analytics on the collected data. Data volumes and the diversity of new data sources are growing rapidly. More than ever before, businesses rely on data collected from a plurality of sources in order to make the best possible decision based on the information available. Time is becoming a critical factor for decision makers, and this has increased the demand for processing streaming data in real-time so that insights can be applied to business operations with minimum delay. Nowadays, everything that happens in a company can be recorded, collected, transmitted to a stream data analytics platform, and collated with data collected from a plurality of data sources, to make it all available as a real-time stream. Data should be collected as soon as it is available from the site where it has been produced and transferred to the place where it will be stored, correlated with other data sources, analysed and/or brokered to some other final destination.
However, data collection is costly and consumes a lot of resources, especially when the data volume is high and there are multiple data sources that produce the data to be collected. The shorter the reporting time period of each data producer, the greater the problem of collecting such an amount of data. There is a need for collecting and transmitting data in the most efficient way possible.
As an illustrative example, in Machine to Machine scenarios, new applications are increasing the volume, variety and velocity of data that can be used for many different purposes such as tracking location, health monitoring, etc. Another example is event-based monitoring applications, which have gained renewed interest and have the potential to scale to hundreds of data sources and possibly thousands of user clients. Event-based monitoring is used, among other possible applications, to obtain and/or execute rule-based actions in real-time taking into account the data that have previously been analysed at any time by the system. A device (a data producer) such as a sensor or smart meter might record events by means of some data (such as temperature, CPU load, energy consumption by an appliance, etc.) that are transmitted through a network from the data producer to an application or server (a data consumer) that will correlate, analyse and produce meaningful information from the data from time to time or in real-time (using, for instance, stream data processing technology). Although the majority of the devices may report data rather infrequently, many other devices (such as smart meters, tracking position devices, etc.) release data in almost real-time, increasing the volume of data transmitted and the utilization of resources.
As a yet further example, in the case of communication networks, traffic loads are increasing continuously due to the growing use of smartphones and applications. Identifying potential bottlenecks early helps operators continuously maintain good quality of service (QoS) for their users. It is becoming challenging to efficiently manage network capacity to guarantee the QoS for subscribers, which makes it important to obtain more information about what is happening in the network, with no delay. Real-time contextual data about how the network is performing at any time (involving data from several nodes and related bearers, capacity, etc.) allows systems managers to proactively monitor and improve customer experience. Such intelligent applications require stream data collected from multiple sources with various latencies to be correlated and analysed in order to reveal actionable insights that might be useful for maintaining the required QoS.
In all these cases, obtaining data from multiple sources is costly in terms of network bandwidth and other computational resources that are involved in the process. The challenge grows with the number of high-throughput data producers, such as sensors or smartphones. Furthermore, reducing the reporting time period increases the volume of data to be transmitted through the network and consequently the load on the server responsible for their collection and processing. There is a need to identify and handle, in an efficient way, which data are released from data producers and how these data are transmitted. A reduction in the data exchanged between the data producers and the data consumers will also reduce the chance of overload situations and will release some extra network capacity that can be used for other purposes. Moreover, streams are typically of a very high rate and have to be transferred continuously. When compared to the abundant processing power provided by a large number of servers, network bandwidth is the bottleneck in such a context. When applications fail or their performance degrades, the system must react intelligently to reduce the workload across the entire system.
While an approach requiring the application to release messages immediately may not be problematic in small-scale systems, in larger systems the need to simultaneously update information from all the components can present a significant impediment. In order to conserve bandwidth and reduce storage and processing requirements, storing and transmitting data in an efficient way is highly desirable.
The present inventors have identified shortcomings in various conventional approaches to solving the problems identified above. For example, one possible solution that allows data loads to be controlled is load shedding, which discards data until enough processing/storage resources become available. However, load shedding methods present several shortcomings. Firstly, in some systems it is not possible to shed data because all requests need to be handled (i.e. no information loss is acceptable). Secondly, they are implemented on the data consumer side; as the data consumer is responsible only for deciding which data will be shed and for how long, load shedding does not prevent data producers from sending data across the network, which wastes network resources in terms of both bandwidth and processing.
Another conventional approach is to delay, for a period of time, the reporting of new events. Thus, data producers store some data locally, within this time window, until reporting is enabled again. The advantage is obvious: as long as the data is not being sent out, less processing is being done by the system. However, this approach is unfeasible when data needs to be released in real-time. In these scenarios, the value of the stored data may rapidly decline over time, meaning that when it is finally ready to be transmitted, it might not be useful at all.
Another approach aims at constraining the data producers' reporting capabilities until data consumers are ready to handle the load. However, this approach has several drawbacks. Data are usually discarded on a time basis, without considering whether they are really relevant for the data consumer or not. Besides, some data producers do not support any filtering mechanism at all (e.g. the aforementioned time-based one or any other based on the application of certain filtering rules).
The present inventors have devised a scheme of collecting data records in a distributed stream data processing system that exploits the tendency in some practical applications for data records in the flow to follow a pattern. The embodiments of the present invention described herein allow the amount of data that is to be transmitted between data sources and data consumers to be diminished, thereby freeing up valuable network resources to process other traffic.
A communication system according to an embodiment of the present invention comprises a first network node and a second network node, where the first network node is arranged to transmit a flow of data records to the second network node via a network, and the second network node includes a data record processing module arranged to receive and process the data records. In the embodiment, data records of the flow are acquired and analysed to determine whether they match a part of a pattern of one or more patterns each defining a respective sequence of data records. When the acquired data records match part of one of the patterns, the matching pattern is identified, and an indication of the matching pattern is generated along with at least one transmission control signal for the first network node to prevent the first network node from transmitting to the second network node remaining data records in the flow that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern. The communication system of the embodiment also has a pattern handler that includes a data store which stores the one or more patterns, the pattern handler being communicatively coupled to the data record processing module via a communication path that is separate from the network and thus uses none of the network's resources. In response to the indication of the matching pattern, the pattern handler predicts the remaining data records using the pattern of the stored patterns that is indicated by the indication, and provides the predicted data records to the data record processing module via the communication path. 
In this way, the data record processing module can be provided with predicted data records that are the same (or substantially the same) as those that would have been transmitted via the network, at the mere cost of communicating the aforementioned indication of the matching pattern or the at least one transmission control signal across the network, which would, in general, place a much smaller burden on the available network resources than the transmission of data records corresponding to those that have been predicted. Valuable network resources can thus be made available for handling other network traffic, without compromising on the accuracy of data records provided to the data record processing module of the second network node.
More specifically, the present inventors have devised a communication system comprising a first network node and a second network node, wherein the first network node is arranged to transmit a flow of data records to the second network node via a network, and the second network node comprises a data record processing module arranged to receive and process the data records. The communication system further comprises a controller for controlling the transmission of data records by the first network node to the second network node, the controller comprising: an acquisition module operable to acquire data records of the flow of data records; a pattern recognition module arranged to determine whether the data records acquired by the acquisition module match a part of a pattern of one or more patterns each defining a respective sequence of data records and, when the acquired data records match part of a pattern of the one or more patterns, to identify which of the one or more patterns the acquired data records match; and a control signal generator module arranged to generate, when the pattern recognition module has identified a pattern matching the acquired data records, an indication of the matching pattern and at least one transmission control signal for the first network node to prevent the first network node from transmitting to the second network node remaining data records in the flow that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern. 
The communication system further comprises a pattern handler comprising a data store that stores the one or more patterns, the pattern handler being communicatively coupled to the data record processing module via a communication path that is separate from the network and responsive to the indication of the matching pattern to predict the remaining data records using the pattern of the stored patterns that is indicated by the indication, and provide the predicted data records to the data record processing module via the communication path.
The present inventors have further devised a controller for use in a communication system, the communication system comprising: a first network node and a second network node, wherein the first network node is arranged to transmit a flow of data records to the second network node via a network, and the second network node comprises a data record processing module arranged to receive and process the data records; and a pattern handler comprising a data store that stores one or more patterns each defining a respective sequence of data records, the pattern handler being communicatively coupled to the data record processing module via a communication path that is separate from the network, wherein the pattern handler is responsive to an indication of a pattern to predict data records using a pattern of the stored patterns that is indicated by the indication, and provide the predicted data records to the data record processing module via the communication path. The controller is arranged to control the transmission of data records by the first network node to the second network node, and comprises: an acquisition module operable to acquire data records of the flow of data records; a pattern recognition module arranged to determine whether the data records acquired by the acquisition module match part of a pattern of the one or more patterns and, when the acquired data records match part of a pattern of the one or more patterns, to identify which of the one or more patterns the acquired data records match; and a control signal generator module arranged to generate, when the pattern recognition module has identified a pattern matching the acquired data records, at least one transmission control signal for the first network node to prevent the first network node from transmitting to the second network node remaining data records in the flow that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern, and an indication of the matching pattern to cause the pattern handler to predict the remaining data records and provide the predicted data records to the data record processing module via the communication path.
The present inventors have further devised a network node operable to transmit, via a network, a flow of data records to a second network node comprising a data record processing module which is arranged to receive and process the data records transmitted by the network node, the second network node being communicatively coupled to a pattern handler via a communication path that is separate from the network, wherein the pattern handler is responsive to an indication of a pattern to predict data records using a pattern of the stored patterns that is indicated by the indication and provide the predicted data records to the data record processing module via the communication path, wherein the network node comprises a controller as set out above.
The present inventors have further devised a pattern handler for use in a communication system comprising: a first network node and a second network node, wherein the first network node is arranged to transmit a flow of data records to the second network node via a network, and the second network node comprises a data record processing module arranged to receive and process the data records; and a controller for controlling the transmission of data records by the first network node to the second network node. The controller comprises: an acquisition module operable to acquire data records of the flow of data records; a pattern recognition module arranged to determine whether the data records acquired by the acquisition module match part of a pattern of one or more patterns each defining a respective sequence of data records and, when the acquired data records match part of a pattern of the one or more patterns, to identify which of the one or more patterns the acquired data records match; and a control signal generator module arranged to generate, when the pattern recognition module has identified a pattern matching the acquired data records, an indication of the matching pattern and at least one transmission control signal for the first network node to prevent the first network node from transmitting to the second network node remaining data records in the flow that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern. 
The pattern handler is operable to communicate with the data record processing module via a communication path that is separate from the network, and comprises: a data store that stores the one or more patterns; and a data record prediction module arranged to select a pattern of the stored patterns based on the indication generated by the control signal generator, predict data records using the selected pattern, and provide the predicted data records to the data record processing module via the communication path.
The inventors have further devised a network node operable to receive a flow of data records that has been transmitted by a second network node via a network, the network node comprising: a data record processing module arranged to receive and process the data records; a data store that stores one or more patterns each defining a respective sequence of data records; a controller for controlling the transmission of data records by the second network node. The controller comprises an acquisition module operable to acquire data records of the flow of data records; a pattern recognition module arranged to determine whether the data records acquired by the acquisition module match part of a pattern of the one or more patterns stored in the data store and, when the acquired data records match part of a pattern of the one or more patterns, to identify which of the one or more patterns the acquired data records match; and a control signal generator module arranged to generate, when the pattern recognition module has identified a pattern matching the acquired data records, an indication of the matching pattern and at least one transmission control signal for the second network node to prevent the second network node from transmitting remaining data records in the flow that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern. The network node further comprises a pattern handler responsive to the indication of the matching pattern to predict the remaining data records using the pattern of the stored patterns that is indicated by the indication of the matching pattern, and provide the predicted data records to the data record processing module.
The present inventors have further devised a method of controlling the transmission of data records in a communication system comprising: a first network node and a second network node, wherein the first network node is arranged to transmit a flow of data records to the second network node via a network, and the second network node comprises a data record processing module arranged to receive and process the data records; and a pattern handler comprising a data store that stores one or more patterns each defining a respective sequence of data records, the pattern handler being communicatively coupled to the data record processing module via a communication path that is separate from the network, wherein the pattern handler is responsive to an indication of a pattern to predict data records using a pattern of the stored patterns that is indicated by the indication, and to provide the predicted data records to the data record processing module via the communication path. The method comprises: acquiring data records of the flow of data records; determining whether the acquired data records match a part of a pattern of the one or more patterns; and generating, when the acquired data records have been determined to match a part of a pattern of the one or more patterns: (i) at least one transmission control signal for the first network node to prevent the first network node from transmitting to the second network node remaining data records in the flow that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern; and (ii) an indication of the matching pattern for use by the pattern handler to predict the remaining data records.
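Purely by way of illustration, the controlling method set out above may be sketched in Python as follows. The names `Pattern`, `ControlOutput` and `control_step` are hypothetical, and the sketch assumes, for simplicity, that patterns are stored as explicit sequences and that a match is an exact match against the beginning of a pattern; neither assumption is required by the method itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pattern:
    pattern_id: str           # hypothetical identifier of the stored pattern
    records: Sequence[float]  # full sequence of data records defined by the pattern


@dataclass(frozen=True)
class ControlOutput:
    pattern_id: str      # indication of the matching pattern, for the pattern handler
    suppress_count: int  # number of following records the first node must not transmit


def control_step(acquired: Sequence[float],
                 patterns: Sequence[Pattern]) -> Optional[ControlOutput]:
    """Return the control output for the first matching pattern, or None."""
    n = len(acquired)
    for pattern in patterns:
        # Only a *part* of the pattern may be matched: at least one record
        # of the pattern must remain to be suppressed and predicted.
        if len(pattern.records) > n and list(pattern.records[:n]) == list(acquired):
            return ControlOutput(pattern.pattern_id, len(pattern.records) - n)
    return None
```

In this sketch, `suppress_count` corresponds to the number of data records in the remaining part of the matching pattern, and `pattern_id` is the only item that needs to reach the pattern handler.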
The present inventors have further devised a method of processing data records in a communication system comprising: a first network node and a second network node, wherein the first network node is arranged to transmit a flow of data records to the second network node via a network, and the second network node comprises a data record processing module arranged to receive and process the data records; and a controller for controlling the transmission of data records by the first network node to the second network node. The controller comprises: an acquisition module operable to acquire data records of the flow of data records; a pattern recognition module arranged to determine whether the data records acquired by the acquisition module match part of a pattern of one or more patterns each defining a respective sequence of data records and, when the acquired data records match part of a pattern of the one or more patterns, to identify which of the one or more patterns the acquired data records match; and a control signal generator module arranged to generate, when the pattern recognition module has determined a pattern matching the acquired data records, an indication of the matching pattern and at least one transmission control signal for the first network node to prevent the first network node from transmitting to the second network node remaining data records in the flow that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern. The method comprises: receiving the indication of the matching pattern generated by the control signal generator; selecting a pattern of the stored patterns based on the received indication of the matching pattern; predicting the remaining data records using the selected pattern; and providing the predicted data record to the data record processing module via a communication path that is separate from the network.
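A minimal sketch of the processing method set out above is given below, again in Python and by way of example only. The function name `handle_indication` is hypothetical, the `deliver` callable stands in for the communication path that is separate from the network, and the `matched_count` parameter is an assumption of this sketch (the indication itself may simply identify the pattern, with the number of matched records being known in advance).

```python
from typing import Callable, Dict, List, Sequence


def handle_indication(pattern_id: str,
                      matched_count: int,
                      store: Dict[str, Sequence[float]],
                      deliver: Callable[[float], None]) -> List[float]:
    """Select the indicated pattern, predict the remaining records, deliver them."""
    pattern = store[pattern_id]               # select a pattern of the stored patterns
    predicted = list(pattern[matched_count:])  # remaining part of the pattern
    for record in predicted:
        deliver(record)                       # hand over via the separate path
    return predicted
```

For instance, with a stored pattern of six data records of which four have already been received over the network, the two remaining data records are predicted and delivered without using the network.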
The present inventors have further devised a computer program product, comprising a non-transitory computer-readable storage medium or a signal, carrying computer program instructions which, when executed by a processor, cause the processor to perform at least one of the methods set out above.
Embodiments of the invention will now be explained by way of example only, in detail, with reference to the accompanying figures, in which:
The second network node 300 is provided at a data consumer site and comprises a data record processing module 600 which is configured to receive the data records and process them in any way required by the data consumer. The second network node 300 may alternatively function as a forwarding element that forwards data records received thereby to a data consumer site or another intervening forwarding element.
In the present embodiment, the first network node 200 is configured to process data records received from a data producer site (not shown) in any suitable or desirable way and to forward the processed data records towards the second node 300 via the network 500. However, the first network node 200 may alternatively generate the data records itself. Depending on the use case and the scenario, the information contained in the data records may change. For example, in the context of a smart metering application, a data record may provide an indication of electricity consumption measured at any time by the smart meter. As another example, in the context of a performance monitoring application for a network management system, the data record may be related to any key performance indicator (KPI), CPU consumption, bandwidth utilisation, etc. provided by the system being monitored at any time and sent towards the location of the performance monitoring application server.
The first network node 200 comprises a controller 700 for controlling the transmission of data records by the first network node 200 to the second network node 300. Functional components of the controller are illustrated in
The controller 700 may further comprise a data store 740 that stores one or more patterns each defining a respective sequence of data records, where each of the one or more patterns is stored in association with a respective pattern identifier that identifies the pattern, as well as an indication of the pattern's accuracy (which, as will be explained in the following, will depend on how many times departures from the pattern have been observed during prior operation of the system). In the present embodiment, a plurality of patterns is stored in the data store 740, each in association with a respective pattern identifier and accuracy indication. The controller 700 may, as in the present embodiment, also include a pattern monitoring module 750 and a pattern learning module 760. The functionalities of these components of the controller 700 will be described below in detail.
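As an illustrative sketch only, a data store of this kind might be organised as shown below. The class names `StoredPattern` and `PatternStore`, the two counters, and in particular the formula used for the accuracy indication (one minus the fraction of uses in which a departure was observed) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoredPattern:
    pattern_id: str       # identifier stored in association with the pattern
    records: List[float]  # sequence of data records defined by the pattern
    uses: int = 0         # assumed counter: times the pattern has been applied
    departures: int = 0   # assumed counter: times the flow departed from it

    @property
    def accuracy(self) -> Optional[float]:
        # No accuracy indication is available before the pattern has been used.
        if self.uses == 0:
            return None
        return 1.0 - self.departures / self.uses


class PatternStore:
    def __init__(self) -> None:
        self._patterns: Dict[str, StoredPattern] = {}

    def add(self, pattern: StoredPattern) -> None:
        self._patterns[pattern.pattern_id] = pattern

    def get(self, pattern_id: str) -> StoredPattern:
        return self._patterns[pattern_id]

    def record_outcome(self, pattern_id: str, departed: bool) -> None:
        # Update the counters from which the accuracy indication is derived.
        pattern = self._patterns[pattern_id]
        pattern.uses += 1
        if departed:
            pattern.departures += 1
```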
Referring again to
The pattern handler 800 is communicatively coupled to the data record processing module 600 via a communication path 900 that is separate from the network and, more specifically, internal to the second network node 300. Where, as in the present embodiment, the functions of the data record processing module 600 and the pattern handler 800 are implemented in common data processing hardware, the communication path 900 is internal to that hardware. However, in other possible implementations, wherein the data record processing module 600 and the pattern handler 800 are implemented in separate hardware, the communication path 900 may, for example, take the form of a data bus or a direct data link that is separate from the network 500 and thus uses none of its resources. As will be explained in the following, the pattern handler 800 is configured to predict data records under certain circumstances and to provide the predicted data records to the data record processing module 600 via the communication path 900. The pattern handler 800 is also responsible for analysing the validity of the existing patterns and providing feedback about the accuracy of the patterns in use to the controller 700 in order to update the way in which patterns are learned and recognized.
The controller 700 and the pattern handler 800 could be physically implemented in a number of different ways. For example, a programmable signal processing apparatus of the general kind shown schematically in
The signal processing apparatus 1000 comprises a communications module 1100, a processor 1200, a working memory 1300, and an instruction store 1400 storing computer-readable instructions which, when executed by the processor 1200, cause the processor 1200 to perform the processing operations hereinafter described to generate at least one transmission control signal for the first network node 200 and a pattern indicator based on data records acquired from the flow 400 and the one or more patterns stored in the data store 740 (when implementing the functionality of the controller 700) or to predict data records based on stored one or more patterns and a received pattern indicator (when implementing the functionality of the pattern handler 800).
The instruction store 1400 is a data storage device which may comprise a non-volatile memory, for example in the form of a ROM, a magnetic computer storage device (e.g. a hard disk) or an optical disc, which is pre-loaded with the computer-readable instructions. Alternatively, the instruction store 1400 may comprise a volatile memory (e.g. DRAM or SRAM), and the computer-readable instructions can be input thereto from a computer program product, such as a computer-readable storage medium 1500 (e.g. an optical disc such as a CD-ROM, DVD-ROM etc.) or a computer-readable signal 1600 carrying the computer-readable instructions.
The working memory 1300 functions to temporarily store data to support the processing operations executed in accordance with the processing logic stored in the instruction store 1400. As shown in
In the present embodiment, the combination 1700 of the processor 1200, working memory 1300 and the instruction store 1400 (when appropriately programmed by techniques familiar to those skilled in the art) together constitute the components of the controller 700 shown in
The processing operations performed by the controller 700 to control the transmission of data records via the network 500 will now be described with reference to
At start-up, the controller 700 may, as in the present embodiment, access its data store 740 to acquire the patterns stored therein and update the data store 820 of the pattern handler 800 via the network 500 to store the same patterns and associated pattern identifiers as the data store 740 of the controller 700. The pattern handler 800 stores each received pattern in the data store 820 in association with the pattern identifier that identifies that pattern.
Each of the stored patterns may be a sequence of actual data records that is known to repeat from time to time in the data flow. However, the stored pattern may, as in the present embodiment, be provided in the more compact form of a mathematical function that models the repeating sequence of data records. This function, together with an indication of the sequence length (which defines the time-frame of the pattern), can be used to reconstruct the repeating sequence of data records. Regardless of their form, the patterns may be entered directly by a user who is familiar with the behaviour of the data record source(s) and/or they may be learned autonomously by the pattern learning module 760 in the manner described below.
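By way of example only, a pattern held in the compact form described above (a function plus a sequence length) could be represented and expanded as in the following sketch. The class name `FunctionPattern`, the indexing of the function by record position `k`, and the linear `ramp` model are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FunctionPattern:
    pattern_id: str
    model: Callable[[int], float]  # assumed form: maps record position k to a value
    length: int                    # sequence length (time-frame of the pattern)

    def reconstruct(self) -> List[float]:
        """Rebuild the repeating sequence of data records from the model."""
        return [self.model(k) for k in range(self.length)]


# Hypothetical pattern: a reading that rises by 2.0 per record from 10.0.
ramp = FunctionPattern("ramp", lambda k: 10.0 + 2.0 * k, 6)
```

Storing only the function and the length, rather than every data record, keeps the pattern compact both in the data store and when patterns are copied to the pattern handler.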
In step S20, the acquisition module 710 acquires a data record that is to be transmitted by the first network node 200 towards the second network node 300. During the repeated execution of step S20 that is described below, the acquisition module 710 acquires each data record from the flow, in turn. However, in other embodiments, the acquisition module 710 may alternatively acquire only some of the data records (e.g. every jth data record in the flow, where j is an integer).
In step S30, the acquisition module 710 determines whether the control signal generator module 730 has disabled the transmission of data records by the first network node 200. As will be explained below, the transmission of data records by the first network node 200 is disabled when the pattern recognition module 720 has determined that the data records that are to be transmitted appear to be following a known pattern. If the transmission of data records by the first network node 200 has not been disabled, the process proceeds to step S40, otherwise it proceeds to the pattern monitoring process S50 described below.
In step S40, the acquisition module 710 stores the data record acquired in step S20 in, e.g. a First-In, First-Out (FIFO) buffer. Then, in step S60, the acquisition module 710 determines whether the FIFO buffer is full. In general, the FIFO buffer has the capacity to store N data records, where N is an integer greater than or equal to two. By way of example, N=4 in the present embodiment. If the FIFO buffer is not yet full, the process loops back to step S20, the next data record in the flow is acquired and, in a repeat of step S40, added to the FIFO buffer. By the repeated performance of steps S20 to S60, the FIFO buffer is filled up, one data record at a time, to store a sequence of N=4 data records that follow one another in the data record flow 400 (i.e. the ith, (i+1)th, (i+2)th and (i+3)th data records in the flow).
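The buffer-filling loop of steps S20, S40 and S60 may be sketched as follows, as one possible illustration. The function name `fill_buffer` is hypothetical, and the check of step S30 (whether transmission has been disabled) is omitted from the sketch for brevity.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


def fill_buffer(flow: Iterable[float], n: int = 4) -> Optional[List[float]]:
    """Fill a FIFO buffer of capacity n from the flow, one record at a time."""
    if n < 2:
        raise ValueError("N must be an integer greater than or equal to two")
    buffer: Deque[float] = deque(maxlen=n)
    for record in flow:         # step S20: acquire the next data record
        buffer.append(record)   # step S40: store it in the FIFO buffer
        if len(buffer) == n:    # step S60: is the FIFO buffer full?
            return list(buffer)
    return None                 # the flow ended before the buffer was filled
```

When `flow` is an iterator, records beyond the first N remain in the iterator and are available to subsequent processing.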
Once the FIFO buffer has been filled up, the process proceeds to step S70, where the pattern recognition module 720 determines whether the N=4 data records that have been acquired match a part of a pattern of the patterns that are stored in the data store 740. In other words, the pattern recognition module 720 determines whether the sequence of acquired data records appears, in the same form or with data record values that are the same to within a predetermined tolerance (e.g. 2%, 5% or 10%), in any part (preferably at the beginning) other than a concluding part of a sequence of data records that has been constructed using any of the patterns stored in the data store 740. Thus, the pattern recognition module 720 attempts to model the acquired data records using at least some of the stored patterns, looking for a pattern that provides a satisfactory fit to the acquired data records. The goodness of fit for each pattern considered may be determined in any suitable way known to those skilled in the art. If the acquired data records are determined in this way to match any of the stored patterns, the pattern recognition module 720 determines the pattern identifier of the matching pattern, i.e. the pattern identifier associated with the pattern which the acquired data records have been found to follow and which data records subsequent to those acquired might be expected to also follow.
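One possible way of performing the comparison of step S70 is sketched below. The helper names and the use of a relative tolerance are assumptions made for illustration; as noted above, the goodness of fit may be determined in any suitable way. The sketch skips the concluding part of the pattern, since a match there would leave no remaining data records to predict.

```python
from typing import Optional, Sequence


def within_tolerance(actual: float, expected: float, tol: float) -> bool:
    """True if actual lies within a relative tolerance band around expected."""
    return abs(actual - expected) <= tol * abs(expected)


def find_match_offset(acquired: Sequence[float],
                      pattern: Sequence[float],
                      tol: float = 0.05) -> Optional[int]:
    """Return the first offset at which acquired matches pattern, or None.

    The offset that would place the acquired records at the very end of
    the pattern is excluded, because nothing would remain to be predicted.
    """
    n = len(acquired)
    for offset in range(len(pattern) - n):  # stops before the concluding part
        window = pattern[offset:offset + n]
        if all(within_tolerance(a, e, tol) for a, e in zip(acquired, window)):
            return offset
    return None
```

An offset of zero corresponds to the preferred case in which the acquired data records match the beginning of the pattern.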
In case the pattern recognition module 720 determines in step S70 that the acquired data records match a part of each of two or more of the patterns stored in the data store 740, it may, as in the present embodiment, select from those candidate patterns the pattern that is indicated in the data store 740 to have the highest accuracy. In case there are two or more matching patterns that are currently indicated to have the same accuracy (or in case accuracy data is not available or not yet available) but one of those matching patterns defines a shorter sequence of data records than each of the one or more other matching patterns, the pattern recognition module 720 preferably selects the shortest pattern as the matching pattern that is being followed by the acquired data records. This selection rule is based on the inventors' finding that patterns defining shorter sequences of data records are more likely to be consistently followed than patterns defining longer sequences of data records.
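The selection rule described above may be illustrated by the following sketch. The `Candidate` record and the treatment of missing accuracy data as the lowest possible accuracy are assumptions introduced for the example only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    pattern_id: int
    length: int                        # number of records in the pattern
    accuracy: Optional[float] = None   # None when no accuracy data is available


def select_pattern(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the most accurate candidate; break ties by the shortest sequence."""
    if not candidates:
        return None

    def key(c: Candidate):
        # Assumption: a pattern without accuracy data ranks below any rated one.
        accuracy = c.accuracy if c.accuracy is not None else float("-inf")
        return (-accuracy, c.length)

    return min(candidates, key=key)
```

When none of the candidates has accuracy data, the key reduces to the sequence length alone, so the shortest matching pattern is selected.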
If the pattern recognition module 720 identifies a pattern matching the acquired data records in step S70, the process proceeds to step S90, otherwise it proceeds to step S80. In step S80, the control signal generator module 730 controls the first network node 200 to serialise (i.e. appropriately format for transmission through the network 500) and transmit the first of the data records to enter the FIFO buffer (i.e. the “oldest” data record in the buffer) to the second network node 300. The process then loops back to step S20 and then on to step S40, in which the FIFO buffer is replenished to store the next data record from the flow 400 that immediately follows that previously added to the FIFO buffer. In this way, the controller 700 continues to look for a pattern that matches the data records in the flow 400, in the meantime causing the first network node 200 to forward data records to the second network node 300 via the network 500.
When the pattern recognition module 720 identifies a pattern matching the acquired data records then, in step S90, the control signal generator module 730 generates and transmits to the pattern handler 800, via the network 500, a message comprising the pattern identifier of the matching pattern that was determined by the pattern recognition module 720 in step S70. In addition, the control signal generator module 730 disables the transmission of data records by the first network node 200 by generating at least one transmission control signal for the first network node 200 to prevent the first network node 200 from transmitting to the second network node 300 remaining data records in the flow that follow the acquired data records, the number of the remaining data records that are not to be transmitted to the second network node 300 being equal to the number of data records in the remaining part of the matching pattern. Thus, the data records that are expected to complete the matching pattern are not transmitted via the network 500 and, instead, the pattern identifier associated with the matching pattern is transmitted to the pattern handler 800. As will be explained further below, the pattern handler 800 is arranged to respond to receipt of the pattern identifier by retrieving, from the data store 820, the pattern which is associated with the received pattern identifier, to use the retrieved pattern to predict the remaining data records, and to provide the predicted data records to the data record processing module 600 via the communication path 900.
More specifically, the control signal generator module 730 may, as in the present embodiment, generate in step S90 a first (“stop”) signal to prevent the first network node 200 from transmitting data records to the second network node 300, and subsequently a second (“start”) signal to cause the first network node 200 to resume transmitting data records, where the time interval between the transmission of the first and second signals is set to allow the pattern handler 800 to predict and provide the remaining data records of the matching pattern to the data record processing module 600. However, in other embodiments, the control signal generator module 730 may generate in step S90 a single transmission control signal for the first network node 200, which specifies the number of data records whose transmission to the second network node 300 is to be prevented.
Furthermore, the generation of the transmission control signal(s) and the indication of the matching pattern by the control signal generator module 730 may be made conditional on the network being close to a congested state. In this case, the control signal generator module 730 may be arranged to determine whether usage of network bandwidth available for communication between the first network node 200 and the second network node 300 exceeds a predetermined level, and to generate the indication of the matching pattern and the at least one transmission control signal when the determined usage exceeds the predetermined level.
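A minimal sketch of the single-signal variant, combined with the optional congestion condition, is given below. The names `SuppressSignal` and `make_signal`, the representation of bandwidth usage as a fraction, and the threshold of 0.8 are all hypothetical and serve only to illustrate the logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SuppressSignal:
    pattern_id: int   # sent to the pattern handler as the indication of the match
    skip_count: int   # number of records the first node must not transmit


def make_signal(pattern_id: int, pattern_length: int, matched_upto: int,
                bandwidth_usage: float,
                threshold: float = 0.8) -> Optional[SuppressSignal]:
    """Build a single 'skip k records' signal, but only when the link is busy.

    matched_upto is the number of pattern records already covered by the
    acquired records; the remainder is what the pattern handler will predict.
    """
    if bandwidth_usage <= threshold:
        return None  # usage does not exceed the predetermined level
    return SuppressSignal(pattern_id, pattern_length - matched_upto)
```

For a pattern of twelve data records of which the first four have been matched, the signal would specify that eight data records are to be withheld.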
In step S100, the acquisition module 710 empties the FIFO buffer and, in step S110, sets a counter “i”, which is used by the pattern monitoring module 750 as hereinafter described, to 1. The process then loops back to step S20, where the acquisition module 710 acquires the next data record from the flow 400.
In some embodiments, the data record source(s), which provide, over time, the data records that are to be transmitted by the first network node 200 to the second network node 300, may be certain to provide some of their data records in sequences that never deviate from the stored patterns. In these scenarios, once acquired data records are determined to follow one of the stored patterns, it is certain that subsequent data records in the flow 400 will continue to follow the matching pattern. In these cases, the controller 700 may control the first network node 200 to simply discard remaining data records in the flow 400 that follow the acquired data records and whose number is equal to the number of data records in the remaining part of the matching pattern (i.e. the part of the sequence of data records of the pattern other than the part found to match the acquired data records in step S70).
However, the present embodiment is configured to cater for more unpredictable data record sources, whose data records may deviate from the pattern they had been following. In order to ensure that such deviations are not overlooked by the pattern handler 800, the controller 700 of the present embodiment comprises a pattern monitoring module 750 as shown in
In step S51, the pattern monitoring module 750 generates a reference data record that is the (N+i)th data record of the sequence of data records defined by the pattern that was identified in step S70. Then, in step S52, the pattern monitoring module 750 determines whether the reference data record matches the data record acquired in the last performance of step S20 (whose transmission has been prevented by the transmission control signal generated by the control signal generator module 730 in step S90). In other words, the pattern monitoring module 750 determines in step S52 whether the reference data record value is the same as, or within a tolerance band (e.g. ±2%, 5% or 10%) of, the value of the data record acquired in the last performance of step S20.
If the pattern monitoring module 750 determines there to be a match in step S52, this provides new feedback for the pattern learning module 760 (described in more detail below) about the validity of the pattern, and the process proceeds to step S53, where the pattern monitoring module 750 determines whether N+i has reached M, which is the number of data records in the sequence defined by the matching pattern. If N+i has not reached M, then the counter “i” is incremented by 1 in step S54, and the process then loops back to step S20 in
However, if the pattern monitoring module 750 determines there not to be a match in step S52, then the process proceeds to step S55, where the non-matching acquired data record is stored by the pattern monitoring module 750. Then, in step S57, the pattern monitoring module 750 determines whether a predetermined number (in this example, four, although one, two, three or a number greater than four could alternatively be chosen) of consecutive non-matching acquired data records have been stored. If not, then the process proceeds to step S54. However, if the pattern monitoring module 750 determines that four consecutive non-matching acquired data records have been stored, this indicates that the acquired data records have deviated significantly from their expected values (i.e. the values that would be expected if the acquired data records had continued to follow the identified pattern), and the process proceeds to step S58. In step S58, the pattern monitoring module 750 causes the control signal generator module 730 to control the first network node 200 to transmit to the second network node 300 the four stored data records whose transmission to the second network node 300 had been prevented and which were determined not to follow the identified pattern. The process then proceeds to step S55, where the transmission of data records by the first network node 200 is enabled so that data records from the flow 400 subsequent to the four non-matching data records can be transmitted to the second network node 300 via the network 500.
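The monitoring of withheld data records described above may be sketched, under simplifying assumptions, as follows. The function name `monitor` is hypothetical, a single relative tolerance is used for every data record, and it is assumed that a matching data record resets the run of consecutive non-matching data records.

```python
from typing import Iterable, List, Sequence, Tuple


def monitor(suppressed: Iterable[float], reference: Sequence[float],
            tol: float = 0.05, limit: int = 4) -> Tuple[bool, List[float]]:
    """Compare withheld records with reference records from the pattern.

    Returns (deviated, held): deviated is True once `limit` consecutive
    records fall outside the tolerance band, and held contains those
    records, which would then be transmitted to the second network node.
    """
    held: List[float] = []
    for actual, expected in zip(suppressed, reference):
        if abs(actual - expected) <= tol * abs(expected):   # step S52
            held.clear()            # assumption: a match resets the run
        else:
            held.append(actual)     # store the non-matching record
            if len(held) == limit:  # predetermined number reached
                return True, held   # these records are sent on (step S58)
    return False, []
```

A single outlier followed by a matching data record therefore does not interrupt prediction, whereas four successive outliers cause the stored data records to be released.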
Where the pattern monitoring module 750 determines that four consecutive non-matching acquired data records have been stored, this indicates a failure in the definition of the pattern in use. This may be investigated by the pattern learning module 760 (described in more detail below), which can decide if the affected pattern needs to be updated or even disabled to prevent future inaccuracies. The decision will vary according to the statistical relevance of the detected failure. If it has just happened the first time, the decision may be to wait until further evidence about the failure is collected. This depends on the nature of the application in which the pattern-based event reporting system is used. If guaranteed accuracy is required, the failure will impose an update in the pattern if possible or otherwise the pattern will be disabled, and the failure will be fed back to the pattern learning module 760 to learn new similar patterns better in future closely-related situations.
In the present embodiment, the pattern monitoring module 750 requires four consecutive acquired data records to differ from their respective reference data records by more than a predetermined amount (e.g. ±2%, 5% or 10%, as noted above). However, in a variant of this embodiment, the pattern monitoring module 750 may be configured to determine that at least one data record whose transmission has been prevented does not follow the identified pattern when each of the at least one data record differs from the corresponding reference data record by at least a respective predetermined amount. Thus, in general, the tolerance bands for the first, second, third and fourth consecutive data records in the above embodiment need not be the same. For example, in an embodiment where small, short-lived departures from the matching pattern are acceptable but more rapid and pronounced departures are not, the finding of a first non-matching acquired data record may require a larger tolerance band to be used in the assessment of the next acquired data record and, where that next acquired data record is also found not to follow the pattern, a yet larger tolerance band to be used in the assessment of the next acquired data record, and so on.
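The variant with progressively wider tolerance bands may be illustrated as follows. The band values and the function name `follows_pattern` are assumptions chosen for the example; only the principle of widening the band after each consecutive non-matching data record is taken from the description above.

```python
from typing import Sequence


def follows_pattern(actuals: Sequence[float], references: Sequence[float],
                    bands: Sequence[float] = (0.02, 0.05, 0.10, 0.20)) -> bool:
    """Return False once len(bands) consecutive records miss their bands.

    bands[k] is the relative tolerance applied after k consecutive misses,
    so the band widens with each successive non-matching record.
    """
    misses = 0
    for actual, expected in zip(actuals, references):
        if abs(actual - expected) <= bands[misses] * abs(expected):
            misses = 0              # back inside the band: reset the run
        else:
            misses += 1
            if misses == len(bands):
                return False        # rapid, pronounced departure
    return True
```

With these bands, a departure is reported only if the deviation keeps growing from one data record to the next, which corresponds to the rapid and pronounced departures mentioned above.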
In summary, the controller 700 performs a method of controlling the transmission of data records in the above-described communication system that comprises the key steps shown in the flow diagram of
As noted above, the controller 700 comprises a pattern learning module 760, which can operate in parallel with other components of the controller 700 in order to learn new patterns and supplement the data store 740 with the new patterns that have been found in the data record flow 400. During its operation, the pattern learning module 760 receives the flow of data records 400 and searches for an occurrence of a sequence of data records that repeats at least once in the flow of data records 400. When a repeating sequence of data records has been found, the pattern learning module 760 generates a pattern defining the repeating sequence of data records and stores the generated pattern, in association with a corresponding pattern identifier, as one of the stored patterns and associated pattern identifiers in the data store 740. The pattern learning module 760 also transmits the generated pattern and the associated pattern identifier to the pattern handler 800 via the network 500 for storage as one of the patterns and associated pattern identifiers in the data store 820. The pattern learning module 760 may discard any patterns that are rarely followed by acquired data records.
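A deliberately simple sketch of the search for a repeating sequence is given below. It assumes a fixed candidate length and exact equality of data record values, neither of which is required by the embodiment; the function name `find_repeating_sequence` is likewise hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def find_repeating_sequence(flow: Sequence[float],
                            length: int) -> Optional[Tuple[float, ...]]:
    """Return a run of `length` records that appears twice without overlap."""
    first_seen: Dict[Tuple[float, ...], int] = {}
    for start in range(len(flow) - length + 1):
        window = tuple(flow[start:start + length])
        first = first_seen.setdefault(window, start)
        if start - first >= length:   # second, non-overlapping occurrence
            return window
    return None
```

A sequence found in this way could then be stored together with a newly assigned pattern identifier and transmitted to the pattern handler 800, as described above.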
The pattern learning module 760 may follow the workflow shown in
In the pattern learning process, every new data record is analysed together with the data records collected previously, looking for possible patterns or to extend any of the current patterns with the new data record. Any of the following possibilities may occur:
1. If the data record does not extend the information contained in any of the existing patterns, this may mean that the new data record starts a new pattern. This is verified by analysing the data records that follow the data record, and determining whether the data record and the surrounding data records in arrival time really constitute a new pattern or not.
2. The data record extends one or more existing patterns. The information from the new data record is incorporated into any of the existing patterns.
3. The data record does not match the pattern that is already active. In this case, the data record is sent towards the pattern handler 800 and the active pattern is deactivated, as described above. The active pattern then needs to be updated. The update process may take several forms. One possibility is that the active pattern remains deactivated to prevent the same failure happening in the future. Another possibility is to set up a new pattern from the part of the pattern that had been successfully detected up to that moment, and to remove the rest from the pattern description.
The pattern learning module 760 of the present embodiment provides new patterns (that describe the data records analysed so far) and updates the existing patterns to keep their accuracy as high as possible. The pattern learning module 760 may offer several modes of operation. The mode of operation can be selected by the data consumer system through the pattern handler 800. The modes of operation may include:
a) No error mode: this means that patterns will not be applied for a period of time due to some application requirements. This may happen when the application, at the data consumer site, must guarantee completely the accuracy of the results.
b) Overload prevention mode: this changes the way in which patterns are built, extending the validity time period of the patterns as much as possible. This mode of operation looks for patterns that are valid for a longer period of time, reducing the number of messages sent across the network 500.
c) Normal operation: patterns are built with the highest accuracy possible, meaning that the validity time period will be shorter.
In the present embodiment, the pattern handler 800 is operable in a forwarding mode to de-serialise any data records it receives from the first network node 200 via the network 500 and forward the de-serialised data records to the data record processing module 600. However, when the pattern handler 800 receives the indication of the matching pattern from the control signal generator module 730, the pattern handler switches to operating in a data record prediction mode, as will now be described with reference to
In step S400, the pattern handler 800 receives the indication of the matching pattern generated by the control signal generator module 730. More particularly, the pattern handler 800 receives the pattern identifier transmitted by the control signal generator module 730 in step S90 of
The time span of each pattern is continuously checked to detect if it remains valid or needs to be updated. After the data record prediction module 810 has predicted the final data record in the sequence of data records defined by the indicated pattern, the pattern handler 800 reverts to operating in the aforementioned forwarding mode. However, up to that point, the pattern handler continues to operate in the data record prediction mode (predicting data records and providing them to the data record processing module 600), unless a data record is received from the first network node 200 via the network 500. When a data record is received under these circumstances (i.e. before the remaining data records of the identified pattern have all been predicted by the data record prediction module 810), the data record prediction module 810 responds by terminating its operation in the data record prediction mode, and resumes operating in the forwarding mode. In this way, the data record processing module 600 is fed accurately predicted data records up to the point when the deviation occurs, and is then fed actual data records that have been transmitted via the network 500 and appropriately de-serialised, in place of predicted data records that would not accurately reflect the data records which deviate from the (previously) matching pattern.
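The switching of the pattern handler 800 between the forwarding mode and the data record prediction mode may be sketched as follows. The class and method names are hypothetical, the data store 820 is represented by a dictionary, and a list stands in for the data record processing module 600.

```python
from typing import Dict, List, Sequence


class PatternHandler:
    """Hypothetical receiver-side handler with forwarding and prediction modes."""

    def __init__(self, store: Dict[int, Sequence[float]]):
        self.store = store                 # pattern identifier -> full sequence
        self.pending: List[float] = []     # records still to be predicted
        self.delivered: List[float] = []   # stands in for the processing module

    def on_pattern_id(self, pattern_id: int, already_sent: int) -> None:
        """Enter prediction mode for the records not yet sent."""
        self.pending = list(self.store[pattern_id][already_sent:])

    def on_tick(self) -> None:
        """Deliver one predicted record, if in prediction mode."""
        if self.pending:
            self.delivered.append(self.pending.pop(0))

    def on_record(self, record: float) -> None:
        """A real record arrived: abandon prediction and forward it."""
        self.pending.clear()
        self.delivered.append(record)
```

In this sketch, an empty `pending` list corresponds to the forwarding mode, so that the receipt of a data record during prediction returns the handler to forwarding without any further signalling.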
Based on feedback that may be provided by a data consumer system connected to the second network node 300, the pattern handler 800 may analyse the pattern accuracy and send the corresponding insights back to the controller 700. These insights may be used to reinforce the pattern learning process or to update how patterns are detected, for instance by adjusting the validity time period for patterns like the one being analysed.
The operations of the controller 700 and pattern handler 800 may be synchronised in any suitable way to ensure that the data record processing module 600 seamlessly transitions between receiving data records that have been transmitted from the first network node 200 via the network 500, and predicted data records that have been generated by the data record prediction module 810, with no data records being lost or duplicated during the transition. For example, these components may operate on the basis of a common clock signal provided via the network 500, with e.g. the acquisition of each data record and its processing by the pattern monitoring module 750 in S50 being timed to substantially coincide with the prediction of the corresponding data record by the data record prediction module 810.
In the above-described first embodiment, the controller 700 is provided as part of the first network node 200 (where it might be provided as a plug-in, if possible) while the pattern handler 800 is provided as part of the second network node 300. However, these components may be deployed in many other ways in the communication system. For example, the controller 700 may alternatively be provided as a stand-alone device in the network 500 (or a component of any intervening node or other component of the network 500), which eavesdrops on traffic being transmitted from the first network node 200 to the second network node 300 to acquire transmitted data records, and performs the above-described processes of interrupting the transmission of data records through the network that are found to follow a known pattern, and causing the data records whose transmission has been withheld to be predicted and passed to the second network node 300 by the pattern handler 800. In the present embodiment, the controller is provided as part of the second network node 300′, as illustrated in
The controller 700′ of the second embodiment differs from that of the first embodiment in that it does not comprise the data store 740 that stores the patterns, pattern identifiers and accuracy levels as described above. Instead, the controller 700′ of the present embodiment (and, more specifically, its pattern recognition module) is arranged to access the data store 820 of the pattern handler 800 and determine whether the data records from the received data record flow 400 that have been acquired by the acquisition module match part of a pattern of the patterns stored in the data store 820. Similarly, the pattern learning module of the controller 700′ is configured to store the new patterns it generates (together with the associated pattern identifier) in the data store 820 as one of the stored pattern and pattern identifier combinations.
The first network node 200′ may, as in the present embodiment, comprise a second data store, which stores the same information as the data store 740 of the first embodiment and is therefore labelled with a like numeral in
Furthermore, in the present embodiment, the control signal generator of the controller 700′ is configured to transmit the transmission control signal(s) it generates to the first network node 200′ via the network 500 (instead of internally, within a node, as in the case of the first embodiment). The control signal(s) may be the same as described above with reference to the first embodiment. Alternatively, the control signal generator module may, as in the present embodiment, be arranged to transmit, as the at least one control signal, the indication of the matching pattern to the first network node 200′ via the network 500, the indication comprising the pattern identifier associated with the matching pattern. In this example, the first network node 200′ is responsive to the receipt of the pattern identifier to stop transmitting data records to the second network node 300′, to use the pattern identifier to identify the associated pattern stored in the second data store 740, to use the identified pattern to determine the number of data records whose transmission to the second network node 300′ is to be prevented, and to transmit data records that follow the determined number of data records whose transmission to the second network node 300′ is to be prevented such that the second network node 300′ receives said transmitted data records after the remaining data records have been predicted and provided to the data record processing module 600.
The first network node 200′ may, as shown in
Many modifications and variations can be made to the embodiments described above.
For example, the order of some of the process steps in
In the above-described embodiments, the flow of data records 400 takes the exemplary form of a single stream of data records, as shown in
In this way, the pattern matching techniques described in the first and second embodiments may be extended to two-dimensional patterns that can occur in data record flows comprising a plurality of data record streams, which may originate from different data record sources (e.g. sensors).
In the above-described embodiments, the pattern handler 800 is arranged to receive data records from the first network node 200 and forward the received data records to the data record processing module 600. In these embodiments, it is therefore possible to configure the pattern handler 800 to interpret the receipt of a data record before the remaining data records of the matching pattern have been predicted as an indication that a data record whose transmission by the first network node has been prevented does not follow/match the identified pattern being used for data record prediction. In these embodiments, the transmission of at least one data record by the first network node 200 may be sufficient to cause the pattern handler 800 to stop predicting the remaining data records and to revert to passing received data records to the data record processing module 600. However, in other embodiments, the pattern handler may be configured not to receive any data records and to instead start and stop predicting data records and passing them to the data record processing module 600 under instruction of the controller 700. 
In such alternative embodiments, the pattern handler 800 may be arranged to stop predicting data records in response to a stopping signal, and the pattern monitoring module 750 may be arranged, when at least one data record whose transmission has been prevented is determined not to follow the identified pattern, to cause the control signal generator 730 to generate and transmit the stopping signal via the network 500 to stop the pattern handler 800 predicting data records, and to control the first network node 200 to transmit to the second network node 300 the at least one data record whose transmission had been prevented and which was determined not to follow the identified pattern, such that the data record processing module 600 receives said data records instead of the corresponding predicted data records whose generation has been prevented by the stopping signal.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/060185 | 5/8/2015 | WO | 00 |