The present invention relates to the distribution of data within a network, such as the distribution of a security patch for an operating system or of a media data file.
In many instances, networks are constructed, at least in part, in a hierarchical fashion, with some computing entities in a network being topologically located closer to a server computing node than others. Moreover, it is a natural consequence of network configurations that, practically speaking, the number of computing nodes increases the further away one moves topologically from a server node. The distribution of data to a large number of computing nodes at the same time is, therefore, frequently apt to result in a bottleneck as data is transmitted to and from the server node.
One way to ameliorate this is to ensure that many of the computing entities on the network are capable of file sharing with other entities. A file-sharing computing node which has received data is thus capable of passing it on to another computing entity (usually known as a ‘peer’), which, in theory, reduces the load on the server node. This is not, however, necessarily the case. In the event of demand overload, file-sharing software such as BitTorrent, which is required on both the client and the server side, is often unable to dispatch even a single full copy of the requested data file to a single other computing entity. Further, precisely because client computing nodes are usually less powerful than server nodes, client nodes are more easily overloaded.
The present invention provides a method of distributing data across a network from a server computing node to one or more client computing nodes across a network, the method comprising the steps of:
Embodiments of the invention will now be described, by way of example, and with reference to the accompanying drawings, in which:
Referring now to
To enable optimum usage of the network capacity, the server preferably possesses: a mapping of the network topology; data on the inherent connective capacity of each of the links; and data on the computational capacity of each of the entities P and C. Such a mapping is relatively easy to obtain by a variety of methods, for example by using network utilities such as TraceRoute and Ping in relation to topology, and various other known tools for establishing connective and computing capacity. This ‘intrinsic map’ can also be updated as new information becomes available, e.g. as a result of observing events such as the allocation of an IP address. Clearly more information on the nature of the network can be obtained in the case of an intranet than would typically be the case for an internet scenario, but in either case sufficient data can usually be legitimately obtained. Further, each computing entity is equipped with a file-sharing application. Typically this will take the form of an application which interacts with and supplements the functionality of a web browser, known as a ‘helper’ application. One example of such an application is BitTorrent. Since such a helper application is required in order to transmit data from one computing node to another by file-sharing, it follows that a computing node's ability to perform the necessary processing operations has an effect upon the ability of the network to transmit data in this manner. It is for this reason that data relating to the computing capability of nodes is gathered and taken into consideration.
A schematic representation of the data held in respect of each computing entity is illustrated in
Installation of the relevant utilities (in order readily to obtain the requisite mapping information) and of helper applications (in order to ensure that file-sharing capacity is present on each computing entity) is relatively easy to impose upon users in an Intranet scenario, since the computing entities of employees working for an organisation can be subjected to forced installation. By contrast, in an Internet scenario, users will typically agree to join a community in which this information is shared openly in return for certain benefits, which will become apparent subsequently.
In addition to data on the intrinsic capacity of the various network elements, the server also stores what might be thought of as a ‘temporal’ profile of typical usage of computing and connective capacity against time of day for each of the links and each of the computing entities, or if not each of them, then a significant proportion. This provides information on the degree of usage of the inherent capacity. Thus, a given link between two computing entities may be intrinsically capable of transmitting 11 Mbps, but at a particular time of day the rate of data transmission along that link may typically be 7 Mbps, meaning that the link has a residual capacity of only 4 Mbps. Typically this kind of information can be obtained most easily from users by the installation of an appropriate utility on the computing entities which returns the relevant data to the server. An example of such a temporal profile for a particular network element (i.e. a link or a computing node) is illustrated in FIGS. 4A-C. The profile is generated from a graph (
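The residual-capacity arithmetic described above can be illustrated with a minimal Python sketch. The function name, the dictionary layout of the temporal profile, and every figure other than the 11 Mbps and 7 Mbps values taken from the text are assumptions made purely for illustration.

```python
def residual_capacity(intrinsic_mbps, typical_usage_mbps):
    """Capacity left once typical traffic is subtracted; never negative."""
    return max(intrinsic_mbps - typical_usage_mbps, 0.0)


# Hypothetical temporal profile for one link: hour of day -> typical usage (Mbps).
# Only the 7 Mbps entry corresponds to the example in the text.
profile = {9: 7.0, 13: 9.5, 22: 1.0}
intrinsic = 11.0  # intrinsic capacity of the link, as in the text

residual_by_hour = {
    hour: residual_capacity(intrinsic, used) for hour, used in profile.items()
}

print(residual_by_hour[9])  # 4.0, matching the 11 Mbps / 7 Mbps example
```

Clamping at zero is a design choice of this sketch, so that a link whose typical usage exceeds its nominal capacity is reported as having no spare capacity rather than a negative figure.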
One purpose of obtaining the mapping data and temporal usage profile is to enable the server node to distribute data efficiently. On occasions when the connective capacity of the link L100 is not being approached or exceeded by the aggregate level of data transmission which is requested of the server by the computing entities P and C, efficiency is of less importance. However, in a number of scenarios where heavy demand is placed upon the server, efficient usage of the network capacity can be more significant. Examples include the streaming of media files to all computing entities (e.g. where a Chief Executive wishes to address employees directly by streaming media); the distribution of large files to a limited number of computing entities; and the distribution of smaller data files, such as a security patch, to all computing entities. Thus, in any scenario where connective capacity is likely to be approached, efficient network usage becomes an issue.
In order to enable efficient usage of the network capacity at any given instant, the server node generates what can be thought of as a substantially real-time useable network map, having the form of a connective tree by applying the temporal usage profile for each element of the network (i.e. link and computing entity) to the intrinsic mapping information. The map is ‘real-time’ in the sense that it represents the current usable topology to within a defined temporal resolution. Referring now to
It should be appreciated that different trees may be created from the same intrinsic network topology and the same temporal usage profile at the same time of day as a result of differing selection of the threshold value TR (in
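The dependence of the tree on the chosen threshold can be sketched as follows. This is a minimal illustration, not the method of the specification: it assumes that the threshold is applied as a minimum residual capacity which a link must offer to be included, and that the tree is then formed by a breadth-first traversal from the server. The node names, link identifiers and capacities are hypothetical.

```python
from collections import deque


def usable_tree(links, residual, root, threshold):
    """Breadth-first tree over links whose residual capacity meets the threshold.

    links:    iterable of (node_a, node_b, link_id)
    residual: dict mapping link_id -> residual capacity (Mbps) at the current time
    Returns a dict mapping each reachable node to its parent (root maps to None).
    """
    adjacency = {}
    for a, b, link_id in links:
        if residual[link_id] >= threshold:  # assumed meaning of the threshold
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)

    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return parent


# Hypothetical four-node network with server S.
links = [("S", "P1", "La"), ("S", "P2", "Lb"), ("P1", "C1", "Lc"), ("P2", "C1", "Ld")]
residual = {"La": 6.0, "Lb": 1.0, "Lc": 5.0, "Ld": 8.0}

print(usable_tree(links, residual, "S", threshold=1.0))  # P2 is reached directly from S
print(usable_tree(links, residual, "S", threshold=4.0))  # P2 is reached via P1 and C1
```

With the lower threshold the congested link Lb is still used, so P2 hangs directly from the server; with the higher threshold Lb is excluded and P2 is reached through C1 instead, giving a different tree from the same intrinsic topology and the same usage data.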
The utility of this approach can be readily appreciated by consideration of a simple example. Having generated the tree illustrated in
Thus, for example, instead of attempting to return copies of the data file to the entities P2-P5 directly, and to the adjacent C entities via entities P2-P5, the server node instead dispatches a copy of the data file initially to entities C1 and C8. Because in the present embodiment, all entities in the network are provided with a file-sharing helper application, the data file is dispatched to C1 and C8 via entities P1 and C2 in turn, each of which, in the course of transmitting the data file, thus creates and stores a copy of it for its own use. In sending the data file to entities C1 and C8, it has, in each case, been transmitted across only three network links: L100-L120-L210 in the case of C1, and L100-L120-L220 in the case of C8; in each case this involves transmission across only one further link than would be required via the topologically fastest route. Transmission of the data file to other entities can then be achieved in a number of ways.
One way is to instruct the entities C1 and C8 to transmit the file onwards to the other entities which are below them in the tree, which again requires, at most, transmission across two further network links. This is possible because file-sharing software is present on each entity in an Intranet scenario (in an Internet scenario, the capability of the computing nodes with regard to file sharing will be taken into account in constructing the tree). Alternatively, the server node can merely transmit a pointer via a different route to the remaining entities which require the data file, directing them to the nearest entity which possesses a copy: C1 in the case of C4, C3 and P2; C8 in the case of C7, P4, C10, C9, P5, C6, P3 and C5. The pointer, being very small in size, may, depending upon the circumstances, be easily transmitted directly via the topologically fastest route to the relevant requesting entities. (NB this may apply either where the current usage map need not be followed, due to the small amount of data involved, or where, once the higher level nodes (i.e. those topologically closest to the server) have been bypassed, a different usage map may be created, due to the absence of any need to pass data along the higher level links, so that the usage map at a more local level represents more closely the intrinsic topology.) This then further reduces the load on the server node as the remaining requesting entities direct their request to the nearest ‘peer’ entity in possession of the data file, as determined by the tree structure. In a further modification, a combination of the two approaches can be advantageously employed. Thus, for example, a copy of the data file can be distributed to node C6 and a pointer to C6 then sent to P3, C5 and C7 to distribute the load across different nodes (in the latter two cases using the degenerate connections to C7 and C5 to advantage to enable load distribution).
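The pointer approach relies on identifying, for each requesting entity, the nearest entity in the tree which already holds a copy. A minimal Python sketch of one way to do this is given below, assuming that ‘nearest’ means the fewest tree links and using a breadth-first search outwards from the requester. The function name and the small example tree are hypothetical and do not reproduce the tree of the drawings.

```python
from collections import deque


def nearest_holder(tree_edges, holders, requester):
    """Return the closest node (in tree hops) that holds a copy, or None."""
    adjacency = {}
    for a, b in tree_edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    seen = {requester}
    queue = deque([requester])
    while queue:
        node = queue.popleft()
        if node in holders:
            return node  # first holder reached is at minimal hop distance
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None


# Hypothetical tree: S is the server; B and E already hold the data file.
tree_edges = [("S", "A"), ("A", "B"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
holders = {"B", "E"}

# The pointer sent to each requester names the peer it should contact.
pointers = {node: nearest_holder(tree_edges, holders, node) for node in ("C", "F", "D")}
print(pointers)  # {'C': 'B', 'F': 'E', 'D': 'E'}
```

Where two holders are equally distant, this sketch simply returns whichever is reached first; a fuller treatment might instead choose between them according to their current load.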
It should be noted that, preferably, the data file is transmitted to a location within the network such that any bottleneck is not simply relocated to another point. This would have been the case, for example, had the data file simply been sent to C2 and the remaining computing nodes then addressed their requests for the file to C2. In contrast, the file was distributed to nodes C1 and C8, with the result that the initial load on each of these nodes is comparable, and with the subsequent benefit that the load can be further distributed by use of sub-trees with roots: node C4 in the case of C2 (having further sub-trees with roots C3 and C9) and nodes C6 and C7 in the case of C8. The load is thus, as a result of file-sharing, distributed relatively evenly with increasing topological distance from the P nodes.
In a modification, often applicable in the case of media data files, the file is dispatched in segments X1, X2 . . . Xn. Depending on the nature of the tree which is available to transmit data along, by dispatching different segments of the file to different parts of the network, and then instructing entities to obtain the remaining parts of the file from a neighbouring entity, the total traffic can be reduced and the file can be again to
The server node preferably updates the tree upon the occurrence of one or more specified events. Thus, for example, in the event of the failure of a particular data file to navigate a particular route successfully (an occurrence which may be caused, for example, by corruption or un-installation, whether inadvertent or deliberate, of file-sharing software), the tree is preferably updated to reflect the change in useable network topology at that time. Preferably a corresponding change is made to the temporal usage profile in order that subsequent tree generation results in as accurate a real-time representation of the useable network as possible.
Number | Date | Country | Kind |
---|---|---|---|
0522223.7 | Oct 2005 | GB | national |