The technology described in this application relates to data dissemination in computer environments, and especially relates to extraction of changes in data sets for distribution of the changes in computer networks.
Today it is very common that data is sent over computer networks. The amount of data being sent is rapidly increasing due to advances in technology that make it possible to send and process more data at higher speed. Furthermore, new applications demand more data as they become more complex. An electronic trading system is an example of a computer system in which data dissemination technology is important.
Electronic trading of securities, derivatives, commodities and other financial instruments results in a large amount of data which has to be distributed to users that need the data for making trade decisions, statistical calculations and other assessments. The process of extracting and sending all this information in a high performance computer system can be very demanding for a processor. CPU time is a scarce resource that should not be wasted on executing steps that could be avoided.
Furthermore, the users connected to such a centralised trading system typically want the information as soon as possible. In these cases it may not be enough to boost the performance of the central system by, for example, upgrading the hardware.
In order to get rid of a bottleneck or other latency problem in the system, additional techniques may have to be used.
One such additional technique is to make the work of the processor more efficient. For example, there are different approaches to updating a data set on a user terminal.
One commonly used solution is to always send the complete new data set, which replaces the old data set. This is often inefficient when only part of the data in the set has changed. A more efficient approach may therefore be to send only the parts of the data set that have changed. A further enhancement is to send only the delta changes, that is, only the changed part of each changed data element.
Another known technology is to send operators that describe the differences between two data sets. Applying the operators on the first data set converts it into the second data set.
With the selection of a good set of operators acting on the data set, optimisation is possible. However, the methods available today for extracting data set differences are poor.
Hence, there is a need for techniques for extracting and selecting operators in an efficient way, for example in fewer steps, in order to reduce the load on a processor and to reduce the amount of data disseminated, and thus the bandwidth used, in a computer system.
Thus it is an object to generate an update data set to be sent to remote terminals.
It is another object to efficiently extract data from a data set.
It is another object to efficiently extract and/or select operators based on differences between data sets.
It is another object to generate a data structure to be sent to remote terminals.
It is another object to use less processor time.
According to a first aspect the above and other objects are achieved by a computer system for generating an update data set to be sent to remote terminals, the update data set comprising operators describing differences between a first data set comprising sorted data elements and a second data set comprising sorted data elements, the computer system comprising:
The computer system has the advantage that it makes it possible for a computer system such as a trading system to generate an update data set more efficiently by using less CPU time, for example because the computer system makes it possible to compare the first and second data sets preferably with only one run through each data set.
The computer system may further comprise a communicator associated with the selector for generating and sending an update message comprising the update data set. The message may comprise a data structure such as a list, an array, bitboard, stack, heap, tree or collection and so forth comprising the data set to be sent to remote terminals. Preferably the message is sent using the FIX standards; however, any other protocol well known to the person skilled in the art may be used to send the message, such as Omnet API, XTP, SSL or any other standard or proprietary protocol.
Preferably the first and second data sets comprise dynamic data elements changing over time. For example, the first data set may be an orderbook in a trading system at the time T1, and the second data set may be the orderbook at the time T2. The data in the orderbook may have changed between the times T1 and T2, since new orders to buy or sell financial instruments may have been entered. Information dissemination systems may disseminate information to remote terminals either at certain predetermined time intervals or upon an activity in the orderbook. For example, if the orderbook does not receive any new order there is no point in sending new updates to the remote terminal. However, when a new order enters the orderbook the new information has to be propagated to the remote terminals. Thus, at the point in time when an activity occurs it may be necessary to send update messages. Another example of when changes may occur in the orderbooks is during a state transition, for example when an orderbook opens or closes.
Hence the second data set is a later version of the first data set. In this way the central system can determine which changes have occurred, and since the central system knows what data set the remote terminals have, the central system can extract the operators to store and send in the update data set. Furthermore, it may also be possible to send certain parts of the changes between the first and the second data set to specific remote terminals, since the users of the remote terminals may want information relating to different parts of the data sets.
Preferably the operators are determined from a group comprising the following operators: Add operator, Delete operator, Replace operator. By determining these operators based on the change parameter it is possible to generate the update data set. The operators may be combined as described later in this document.
The change parameter preferably comprises a first counter related to the first data set and a second counter related to the second data set. In this way the selection process can be monitored and managed in a more accurate way. Thus, the selector may determine operators based on the relation between the first counter related to the first data set and the second counter related to the second data set. This speeds up the process of selection since the relation is preferably a relation chosen from a group of relations, the group comprising: >, <, =, ≧, ≦ or ≠. Based on the relation between the counters at least one of the operators: Add, Delete, Replace, is determined.
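The selection from the counter relation can be sketched as follows. This is a minimal illustration and not the selector of the described system: the function name `select_operators` is hypothetical, the `=` and `>` cases follow the worked example given later in this description (Replace when the counters are equal, Delete followed by Add when the delete counter is larger), and the handling of the `<` case is an assumption.

```python
def select_operators(n_del: int, n_add: int) -> list[tuple[str, int]]:
    """Pick operators for one run of changes from the two counters.

    n_del counts elements found only in the first data set,
    n_add counts elements found only in the second data set.
    """
    if n_del == n_add:
        # Equal counters: each removed element is overwritten by a new one.
        return [("Replace", n_add)] if n_add else []
    # Unequal counters: a Delete followed by an Add, skipping empty ones.
    # Treating '<' the same way as '>' is an assumption of this sketch.
    ops = []
    if n_del:
        ops.append(("Delete", n_del))
    if n_add:
        ops.append(("Add", n_add))
    return ops
```

With the counter values of the later example, `select_operators(2, 2)` gives one Replace of two elements and `select_operators(5, 1)` gives a Delete of five elements followed by an Add of one.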
The change parameter may further comprise a first position parameter associated with the first data set and a second position parameter associated with the second data set for keeping track of the logical position of the comparator in the first and second data set. In this way the result of the sequential comparison of the first and second list can be monitored and managed in a more accurate way.
The operators may comprise delta changes. Thus if only a part of a data element has changed the operator may describe the part of the data element that has changed. However the operator could also describe that the whole data element should be e.g. deleted or replaced or added.
Each data element normally comprises at least a key. The data element may also comprise a data part, which may be empty. In a trading system a typical key is a price in an order book, and the data part could for example be the aggregated volume for that price. It is not always necessary to send the whole key: if the key comprises price and time, sorting of the elements may be based on price and time while only the price is sent to a remote terminal.
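A data element of this kind might be modelled as in the sketch below. The class name `Element`, its field names and the sample values are assumptions made for illustration only; the price is held as an integer number of ticks to keep comparisons exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    price: int       # part of the key (in ticks); also sent to terminals
    time: int        # part of the key; used for sorting only
    volume: int = 0  # data part; 0 stands for an empty data part here

    def sort_key(self) -> tuple[int, int]:
        # Sorting is based on the whole key: price first, then time.
        return (self.price, self.time)

    def wire_form(self) -> tuple[int, int]:
        # Only the price and the data part are disseminated.
        return (self.price, self.volume)


# Hypothetical order book entries, sorted by price and time.
book = [Element(1010, 2, 50), Element(1005, 7, 30), Element(1010, 1, 20)]
book.sort(key=Element.sort_key)
```

After sorting, the two entries at price 1010 are ordered by time, while `wire_form()` omits the time component.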
In a second aspect the above and other objects are achieved by an electronic trading system comprising the computer system as mentioned above.
The computer system may thus be an integrated module in an electronic trading system. It can also be a stand alone module that can be sold separately as an information extraction system.
In a third aspect, the above and other objects are fulfilled by a method for generating an update data set to be sent to remote terminals, the update data set comprising operators describing differences between a first data set comprising sorted data elements and a second data set comprising sorted data elements, the method comprising the steps of:
The method has the advantage that it makes it possible for a computer system such as a trading system to generate an update data set more efficiently by using less CPU time, since the technology described in this application makes it possible to compare the first and second data sets preferably with only one run through each data set.
The method may further comprise the step of associating the determined operators with the compared data elements. In this way the system is able to keep track of which data element each operator should be applied to.
The method may also comprise the step of: determining at least one operator from a group comprising the following operators: Add operator, Delete operator, Replace operator. By determining these operators based on the change parameter it is possible to generate the update data set. The operators may be combined as described later in this document.
As mentioned above the change parameter may comprise a first counter related to the first data set and a second counter related to the second data set. Hence the method may further comprise the step of determining operators based on a relation between the first counter related to the first data set and the second counter related to the second data set.
Preferably each data element comprises a key and a data part, and the method may further comprise the step of comparing at least one of the key and data part of an element of the first data set with at least one of the key and data part of an element of the second data set.
As mentioned earlier the first and second data sets may comprise dynamic information changing over time.
In a fourth aspect, the above and other objects are fulfilled by a computer program product according to any of the previous described aspects and/or embodiments, the computer program product being stored on a data carrier.
These and other aspects will be apparent from and elucidated with reference to the non-limiting, example embodiments described hereinafter.
In the following, the details of how the operators may be determined/selected will be explained.
Below follows an example of a method for selecting operators that efficiently describe the difference between two data sets.
The data sets consist of one or several elements where each element has a key and optionally a data part. The key together with a sorting algorithm gives each element a logical position within the data set.
This method is based on:
The method uses the following set of operators to describe changes between two data sets:
Method Introduction
An element that is identical (both key and data part) in the two data sets is unchanged. The algorithm considers an unchanged element as a barrier. The barriers are used by the method to identify when an optimisation of the operator can be performed. The method is based on the observation that optimisations may be possible for the operators used before a barrier is detected.
Assume the data sets A and B shown in
While the data sets are traversed, two counters, #Del and #Add, are preferably used. These are initially set to zero or any other corresponding value that serves the same purpose.
Thereafter the data set A and the data set B are traversed preferably from the beginning, according to the sorting order of keys.
The current logical position within data set A is hereafter named APos, which comprises the position parameter for data set A. The current logical position within data set B is hereafter named BPos, which comprises the position parameter for data set B. APos and BPos are initially set to the first logical position within the data sets A and B.
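The single traversal with the two counters and the two position parameters can be sketched as below. This is an illustration under stated assumptions, not the exact method of this description: elements are taken to be `(key, data)` tuples sorted by key, an element with the same key but a different data part is assumed to increase both counters, and the function name `count_runs` and the sample data are hypothetical. The sample data were chosen only so that the counters reach the values quoted in the worked example (2 and 2 before the first barrier, then 5 and 1).

```python
def count_runs(a, b):
    """Traverse sorted lists a and b of (key, data) tuples once.

    Returns one (a_pos_start, b_pos_start, n_del, n_add) tuple per run of
    changes between barriers. Positions are 1-based logical positions.
    """
    runs = []
    a_pos = b_pos = 1
    a_start, b_start = a_pos, b_pos
    n_del = n_add = 0
    while a_pos <= len(a) or b_pos <= len(b):
        ea = a[a_pos - 1] if a_pos <= len(a) else None
        eb = b[b_pos - 1] if b_pos <= len(b) else None
        if ea is not None and ea == eb:
            # Barrier: identical key and data part. Close the current run.
            if n_del or n_add:
                runs.append((a_start, b_start, n_del, n_add))
            n_del = n_add = 0
            a_pos += 1
            b_pos += 1
            a_start, b_start = a_pos, b_pos
        elif eb is None or (ea is not None and ea[0] < eb[0]):
            # Element exists only in A.
            n_del += 1
            a_pos += 1
        elif ea is None or eb[0] < ea[0]:
            # Element exists only in B.
            n_add += 1
            b_pos += 1
        else:
            # Same key, different data part (assumed to count as both).
            n_del += 1
            n_add += 1
            a_pos += 1
            b_pos += 1
    if n_del or n_add:
        runs.append((a_start, b_start, n_del, n_add))
    return runs


# Hypothetical data sets; not the ones shown in the figures.
A = [(20, 5), (40, 5), (50, 5), (60, 5), (70, 5), (80, 5), (90, 5), (100, 5)]
B = [(10, 7), (30, 7), (50, 5), (90, 9)]
runs = count_runs(A, B)
```

Each list is read once from start to end, so the number of comparisons is bounded by the combined length of the two data sets.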
Initialize:
The algorithm: Iterate through dataset A and B, and Count #Add and #Del:
Below follows a non-limiting example describing the different method steps with reference to the figures.
Imagine two data sets A and B, according to the tables shown in
Initialize:
The method and system start by initializing the counters that are preferably used throughout the iteration of the data sets.
Thereafter an iteration phase is started in which the change parameters #Add and #Del are updated.
STEP 1. Compare the element from set A with the element from set B. Since APos and BPos are both set to 1, the first elements of the two data sets are compared. However, if the elements in the data sets are sorted in another way the method may start by comparing other elements.
The comparison is shown in
Continue with step 1 in the algorithm and compare the next element.
STEP 2. Compare the element from set A with the element from set B. BPos has now been increased by 1; therefore the element at (APos=1) in data set A is compared with the element at (BPos=2) in data set B.
The comparison is illustrated in
Continue with step 1 in the algorithm and compare the next element.
STEP 3. Compare the element from set A with the element from set B. Similar to above.
The comparison is illustrated in
Continue with step 1 in the algorithm and compare the next element.
STEP 4. Compare the element from set A with the element from set B. The comparison is illustrated in
Continue with step 1 in the algorithm and compare the next element.
STEP 5. Compare the element from set A with the element from set B. The comparison is illustrated in
Initiate the selector and select the operators using method B.
STEP 6. Compare #Del with #Add. According to method B, step b is applicable in this situation:
(b). #Del (2)=#Add (2) and therefore select the following operators:
Replace 2 (#Add) operator, selecting data from set B, starting from logical position AposStart=1.
The selected operators are shown in
STEP 7. The iteration and comparison of the elements in data set A and data set B continues.
Set #Add=0 and #Del=0.
In order to get past the barrier the position parameters are increased, APos by 1 and BPos by 1, giving APos=4 and BPos=4.
Set APosStart=APos and BPosStart=BPos, giving APosStart=4 and BPosStart=4.
Continue with step 1 in the algorithm and compare the next elements.
STEP 8. Compare the element from set A with the element from set B. The comparison is illustrated in
Continue with step 1 in the algorithm and compare the next element.
STEP 9. Compare the element from set A with the element from set B. The comparison is illustrated in
Continue with step 1 in the algorithm and compare the next element.
STEP 10. Compare the element from set A with the element from set B. The comparison is illustrated in
Continue with step 1 in the algorithm and compare the next element.
STEP 11. Compare the element from set A with the element from set B. The comparison is illustrated in
Increase #Add by 1, giving #Add=1.
Increase #Del by 1, giving #Del=4.
Increase APos by 1, giving APos=8.
Increase BPos by 1, giving BPos=5.
Continue with step 1 in the algorithm and compare the next element.
STEP 12. Compare the element from set A with the element from set B. The comparison is illustrated in
Increase #Del by 1, giving #Del=5. Increase APos by 1, giving APos=9.
Continue with step 1 in the algorithm and compare the next element.
STEP 13. Compare the element from set A with the element from set B. The comparison is illustrated in
After the selection continue with step 2 (Done).
STEP 14. Compare the counter #Del with the counter #Add, by use of selection method B.
According to the selection step (a) in selection method B, #Del (5)>#Add (1) and therefore the operators are preferably selected as follows:
Delete 5 (#Del) operator at logical position APosStart (4).
Followed by:
Add 1 (#Add) operator from the set B elements, starting from logical position APosStart (4).
The operators selected are shown in
STEP 15. The final step 2 (Done) in the algorithm is reached, as mentioned above in step 14.
After the complete iteration of the algorithm and method B, the complete set of operators that describe the changes from data set A to data set B is stored in the memory, preferably together with the corresponding logical position, key and data part. An update data set is generated comprising the operators, logical positions, keys and data parts, as illustrated in
In another embodiment the logical position could be represented by the key. In this case the terminal receiving the update data set preferably comprises information of how logical positions relate to each other, so as to be able to sort the data elements. Thus it is not necessary for a logical position to be 1, 2, 3 . . . etc. It can also be represented by A, B, C and so forth, as long as the positions have a relation to each other so as to facilitate sorting.
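A terminal that addresses elements by key rather than by numeric position might apply updates as in the following sketch. The update format `(operator, key, data)`, the function name `apply_keyed` and the rule that an Add for an already present key overwrites it are all assumptions made for illustration. The standard `bisect` module is used to find where each key belongs in the sorted order.

```python
import bisect


def apply_keyed(elements, updates):
    """Apply (operator, key, data) updates in place to a key-sorted list.

    elements is a list of (key, data) tuples sorted by key; the key itself
    serves as the logical position.
    """
    keys = [k for k, _ in elements]
    for op, key, data in updates:
        i = bisect.bisect_left(keys, key)
        present = i < len(keys) and keys[i] == key
        if op == "Delete":
            if present:
                del keys[i]
                del elements[i]
        elif present:
            # Replace, or an Add for a key that already exists.
            elements[i] = (key, data)
        else:
            # New key: insert where the sort order places it.
            keys.insert(i, key)
            elements.insert(i, (key, data))
    return elements


# Hypothetical terminal state and update data set.
terminal = [("A", 1), ("C", 3)]
updates = [("Add", "B", 2), ("Replace", "C", 30), ("Delete", "A", None)]
result = apply_keyed(terminal, updates)
```

Because the keys carry the ordering, the updates need no numeric positions and can be applied in any order that respects their dependencies.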
With knowledge of dataset A, and the update data set comprising the operators as illustrated in
Below follows a stepwise description of how this may be done, illustrated with the accompanying
By applying the first row, shown in
By applying the second row, shown in
By applying the third row, shown in
By applying the fourth row, shown in
The operators have now transformed the Data Set A into Data Set B as illustrated in
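The transformation of data set A into data set B by positional operators can be sketched as follows. The entry format `(operator, position, count, elements)` and the function name `apply_update` are assumptions, as are the sample data, which are not the data shown in the figures. Positions are interpreted as 1-based logical positions in the original data set A; in the worked example these coincide with the positions quoted in the text.

```python
def apply_update(a, update):
    """Build the new data set from list a and positional operators.

    update entries are (operator, position, count, elements), ordered by
    position, with position referring to the ORIGINAL list a (1-based).
    """
    result = []
    cursor = 0  # index of the next element of a not yet consumed
    for op, pos, count, elements in update:
        start = pos - 1
        if start > cursor:
            # Copy unchanged elements (including barriers) up to the operator.
            result.extend(a[cursor:start])
            cursor = start
        if op in ("Delete", "Replace"):
            # Skip the elements of a that are removed or overwritten.
            cursor = max(cursor, start + count)
        if op in ("Add", "Replace"):
            result.extend(elements)
    result.extend(a[cursor:])  # copy whatever follows the last operator
    return result


# Hypothetical data sets and the operators of the worked example:
# Replace 2 at position 1, Delete 5 at position 4, Add 1 at position 4.
A = [(20, 5), (40, 5), (50, 5), (60, 5), (70, 5), (80, 5), (90, 5), (100, 5)]
B = [(10, 7), (30, 7), (50, 5), (90, 9)]
update = [
    ("Replace", 1, 2, [(10, 7), (30, 7)]),
    ("Delete", 4, 5, []),
    ("Add", 4, 1, [(90, 9)]),
]
rebuilt = apply_update(A, update)
```

Building a new list in one pass avoids shifting positions while operators are applied, which is one possible design choice; a terminal could equally apply the operators in place.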
In the above description the term “comprising” does not exclude other elements or steps and “a” or “an” does not exclude a plurality.
Furthermore the terms “include” and “contain” do not exclude other elements or steps.
Number | Date | Country |
---|---|---|
20080155527 A1 | Jun 2008 | US |