The present disclosure relates generally to content popularity tracking, and more particularly, to content popularity tracking for use in memory eviction.
Content eviction algorithms are used to provide effective cache utilization. Various replacement policies may be used to decide which objects remain in cache and which are evicted to make space for new objects. When the number of cached objects grows extremely large, conventional centralized systems do not provide adequate support.
An example of a distributed system is a distributed hash table (DHT). DHTs are a class of decentralized distributed systems that provide a lookup service similar to a hash table. Name and value pairs are stored in the DHT and any participating node can efficiently retrieve the value associated with a given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to large numbers of nodes and to handle continual node arrivals, departures, and failures.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.
In one embodiment, a method generally comprises storing at a node in a distributed network of nodes, an object associated with an object descriptor comprising popularity information for the object, each of the nodes storing a plurality of the objects and object descriptors, the node in communication with a server storing objects that are less popular than the objects stored at the nodes, and transmitting from the node to the server, one of the objects identified as less popular than one of the objects stored at the server. One of the nodes receives and stores the object from the server.
In another embodiment, an apparatus generally comprises memory for storing at a node configured for operation in a distributed network of nodes, objects associated with object descriptors each comprising popularity information for the object, each of the nodes comprising a plurality of objects and object descriptors, the node configured for communication with a server storing objects that are less popular than the objects stored at the nodes. The apparatus further includes a processor for transmitting from the apparatus to the server, one of the objects identified as less popular than one of the objects at the server. One of the nodes receives and stores the object from the server.
The following description is presented to enable one of ordinary skill in the art to make and use the embodiments. Descriptions of specific embodiments and applications are provided only as examples and various modifications will be readily apparent to those skilled in the art. The general principles described herein may be applied to other embodiments and applications. Thus, the embodiments are not to be limited to those shown, but are to be accorded the widest scope consistent with the principles and features described herein. For purpose of clarity, features relating to technical material that is known in the technical fields related to the embodiments have not been described in detail.
The embodiments described herein provide a distributed content popularity tracking scheme for use in content eviction decisions. Since the scheme is distributed, it can track popularity of a very large number of objects. The embodiments operate to evict content in a globally fair manner (i.e., evict the globally least popular content from the distributed system). As described in detail below, objects (e.g., content including, for example, data, video, audio, or any combination thereof) are stored in a distributed system comprising nodes for storing popular objects, and a server for storing less popular objects. The embodiments may be used, for example, by a content distributor to store popular content on edge servers or other nodes in an overlay network and evict less popular content. The embodiments may also be used by content providers to extract popularity analytics about their content, for example.
The embodiments operate in the context of a data communication network including multiple network elements. Referring now to the figures, and first to
The nodes 12 are in communication with a server (referred to herein as an eviction server) 14. The eviction server 14 comprises one or more storage network devices (e.g., backend server). The eviction server 14 may also comprise a network of nodes (e.g., storage nodes 12) or any other type or arrangement of network devices configured to store and deliver content. As described in detail below, less popular objects, which typically comprise a large majority of the content, are stored in the eviction server 14, while more popular objects, which are typically few in number, are stored in the nodes 12. The eviction server 14 thus stores content that is less popular, and less frequently accessed, than content stored at the nodes 12. Although the content stored at the eviction server 14 is not requested as often as the more popular content, it is still available for delivery (e.g., streaming, file transfer, etc.) to users or other network devices whenever requested.
Popularity refers to the relative access frequencies of requested objects. Any policy may be used to determine popularity of an object, including, for example, GDS (Greedy Dual Size), LFU (Least Frequently Used), etc. Popularity information may be updated when an object is accessed or at periodic intervals.
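By way of illustration only, a frequency-based (LFU-style) popularity measure can be sketched as a per-object access counter that is updated on each access. The class name `LfuPopularity` and its methods are hypothetical and are not taken from the embodiments; any other policy (e.g., GDS) could be substituted.

```python
from collections import Counter


class LfuPopularity:
    """Hypothetical tracker: popularity is the raw access count (LFU-style)."""

    def __init__(self):
        self._hits = Counter()

    def record_access(self, object_id):
        # Update popularity at access time; a periodic update is another option.
        self._hits[object_id] += 1
        return self._hits[object_id]

    def popularity(self, object_id):
        # Objects that were never accessed have popularity 0.
        return self._hits[object_id]


tracker = LfuPopularity()
for oid in ["a", "b", "a", "a", "c"]:
    tracker.record_access(oid)
```

In this sketch, object "a" ends with a higher count than "b" or "c", so it would be treated as the more popular object.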
In one embodiment, the nodes 12 form an overlay distributed hash table (DHT) network. The nodes 12 comprise distributed hash table storage nodes that form a ring to cache descriptors of objects. The distributed hash table is a data structure that is distributed in the nodes 12 in the network. Each node 12 belonging to the DHT is responsible for a range of a complete space of keys. Each key can have one or more values assigned thereto. Each storage node 12 may be a DHT per se or may be another DHT-like entity that supports a distributed interface of a DHT even though it may be implemented in another way internally.
The DHT ring operates as an autonomous content indexing and delivery system via basic PUT/GET operations. Data is stored in the DHT by performing a PUT (key, value) operation. The value is stored at a location, typically in one DHT storage node 12, that is indicated by the key field of the PUT message. Data is retrieved using a GET (key) operation, which returns the value stored at the location indicated by the key field in the GET message. Content is indexed by hashing an extensible resource identifier (xri) of the content to generate a key. The value associated with the key is a descriptor that contains metadata about the object, such as the locations where the content is stored (resources) and the popularity of the object. The content can be located by hashing the xri and performing a GET on the generated key to retrieve the descriptor. The content can then be downloaded from the resources listed in the descriptor. It is to be understood that the distributed hash table described herein is just one example of a distributed data structure and that other types of distributed systems may be used without departing from the scope of the embodiments.
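The PUT/GET indexing described above may be sketched as follows. This is a simplified single-process model under stated assumptions: SHA-1 is assumed as the hash function, the key space is assumed to be split into equal contiguous ranges, and the names `Descriptor`, `key_for`, and `ToyDht` as well as the example xri string are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

KEY_BITS = 160  # SHA-1 digest size in bits (the hash choice is an assumption)


@dataclass
class Descriptor:
    resources: list = field(default_factory=list)  # where the content is stored
    popularity: int = 0


def key_for(xri: str) -> int:
    """Hash an xri into an integer key in [0, 2**KEY_BITS)."""
    return int.from_bytes(hashlib.sha1(xri.encode("utf-8")).digest(), "big")


class ToyDht:
    """Each 'node' is a dict that owns one contiguous range of the key space."""

    def __init__(self, num_nodes: int):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: int) -> dict:
        # Map the key to the node whose range contains it.
        return self.nodes[(key * len(self.nodes)) >> KEY_BITS]

    def put(self, key: int, value: Descriptor) -> None:
        self._node_for(key)[key] = value

    def get(self, key: int):
        return self._node_for(key).get(key)


dht = ToyDht(num_nodes=4)
k = key_for("xri://example/video-1")
dht.put(k, Descriptor(resources=["node-3"], popularity=7))
found = dht.get(k)
```

Here `found.resources` lists where the content could be downloaded from, which mirrors the lookup sequence of hashing the xri, retrieving the descriptor, and then fetching the content.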
In one embodiment, each node 12 includes a hash bucket 16 and the eviction server 14 includes one or more eviction buckets 18. As described in detail below with respect to
It is to be understood that the network shown in
An example of a network device 12 that may be used to implement embodiments described herein is shown in
Memory 26 may be a volatile memory or non-volatile storage, which stores various applications, modules, and data for execution and use by the processor 24. Logic may be encoded in one or more tangible media for execution by the processor 24. For example, the processor 24 may execute code stored in a computer-readable medium such as memory 26. The computer-readable medium may be, for example, electronic (e.g., RAM (random access memory), ROM (read-only memory), EPROM (erasable programmable read-only memory)), magnetic, optical (e.g., CD, DVD), electromagnetic, semiconductor technology, or any other suitable medium.
The network interface 28 may comprise one or more wired or wireless interfaces (line cards, ports) for receiving signals or data or transmitting signals or data to other devices.
It is to be understood that the network device 12 shown in
When a new object is sent to the eviction bucket 18 or the popularity of an object has been updated, the popularity of the most popular object in the eviction bucket 18 is compared to the popularity of the least popular objects in the hash buckets 16 (steps 36 and 38). The popularity of the most popular object in the eviction bucket 18 should not exceed the popularity of the least popular objects in the hash buckets 16. If the most popular object in the eviction bucket 18 is more popular than the least popular objects in the hash buckets 16, the objects are swapped. The less popular object in the hash bucket 16 is moved to the eviction bucket 18 (step 40) and the most popular object in the eviction bucket 18 is moved to one of the hash buckets 16 (step 41). The node 12 that receives the object from the eviction server 14 may be the same node that sent an object to the eviction server or may be a different node.
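The comparison and swap may be sketched as follows. This is an illustrative model only: buckets are assumed to be plain dictionaries mapping an object name to its popularity, the function name `rebalance` is hypothetical, and the promoted object is placed in the same hash bucket that gave up an object, which is only one of the placements the description permits.

```python
def rebalance(hash_buckets, eviction_bucket):
    """Swap objects until the eviction bucket's most popular object is no more
    popular than the least popular object held in any hash bucket.

    hash_buckets: list of dicts {name: popularity}
    eviction_bucket: dict {name: popularity}
    Returns the number of swaps performed.
    """
    swaps = 0
    while eviction_bucket:
        # Most popular object currently in the eviction bucket.
        top = max(eviction_bucket, key=eviction_bucket.get)
        # Least popular object of each non-empty hash bucket.
        candidates = [(min(b, key=b.get), b) for b in hash_buckets if b]
        if not candidates:
            break
        low, bucket = min(candidates, key=lambda c: c[1][c[0]])
        if eviction_bucket[top] <= bucket[low]:
            break  # ordering between the two tiers already holds
        # Demote the less popular object, promote the more popular one.
        eviction_bucket[low] = bucket.pop(low)
        bucket[top] = eviction_bucket.pop(top)
        swaps += 1
    return swaps


hash_buckets = [{"a": 9, "b": 2}, {"c": 5}]
eviction_bucket = {"x": 4, "y": 1}
swaps = rebalance(hash_buckets, eviction_bucket)
```

In this example, object "x" (popularity 4) is more popular than object "b" (popularity 2), so the two are swapped once and no further swap is needed.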
For example, if object A in hash bucket i is less popular than object B in the eviction bucket 18, the two objects are swapped: object A is moved to the eviction bucket 18 and object B is moved to one of the hash buckets 16. Object A becomes the new most popular object in the eviction bucket 18. Object B, however, may not become the new least popular object in the hash bucket it moves to, even if that bucket is hash bucket i. The singly-linked queue of content popularity that exists inside each hash bucket 16 is used to find the proper place for object B in the queue, the new least popular object of the hash bucket to which object B was moved (if it is different from i), and the new least popular object of hash bucket i. The global popularity descriptor is also updated.
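Finding the proper place for object B in a singly-linked popularity queue may be sketched as a sorted insertion. The queue is assumed here to be ordered from least popular (head) to most popular; that ordering, the `Entry` class, and the `insert_sorted` function are illustrative assumptions rather than details given in the description.

```python
class Entry:
    """One node of a hypothetical singly-linked popularity queue."""

    def __init__(self, name, popularity, next_entry=None):
        self.name = name
        self.popularity = popularity
        self.next = next_entry


def insert_sorted(head, name, popularity):
    """Insert keeping ascending popularity order; return the (possibly new) head.

    The head is always the least popular object of the bucket.
    """
    new = Entry(name, popularity)
    if head is None or popularity <= head.popularity:
        new.next = head
        return new
    cur = head
    # Walk until the next entry is at least as popular as the new one.
    while cur.next is not None and cur.next.popularity < popularity:
        cur = cur.next
    new.next = cur.next
    cur.next = new
    return head


head = None
head = insert_sorted(head, "c", 5)
head = insert_sorted(head, "a", 9)
head = insert_sorted(head, "B", 7)  # B arrives from the eviction bucket

names = []
node = head
while node is not None:
    names.append(node.name)
    node = node.next
```

Object B lands between "c" and "a", so "c" remains the least popular object of the bucket, which illustrates why an arriving object does not necessarily become the new least popular entry.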
During normal operation, each hash bucket 16 maintains an approximately constant number of descriptors, as descriptors are only added to the hash buckets 16 via a swap with the eviction bucket 18. The eviction bucket 18, however, can continue to grow as new object descriptors are created. When the eviction bucket 18 reaches an eviction threshold, a block of descriptors and their related content are evicted from the system. The evicted objects need not be the least popular objects in the system, since all of them already reside in the least popular bucket, the eviction bucket 18.
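Threshold-triggered block eviction may be sketched as follows. The threshold and block size values, the function name `add_descriptor`, and the choice of evicting the oldest entries are all assumptions made for illustration; the description only requires that a block be evicted once the threshold is reached.

```python
EVICTION_THRESHOLD = 5  # hypothetical: descriptor count that triggers eviction
BLOCK_SIZE = 2          # hypothetical: descriptors evicted per trigger


def add_descriptor(eviction_bucket, name, popularity):
    """Add a new descriptor; evict a block once the threshold is reached.

    eviction_bucket: dict {name: popularity}, kept in insertion order.
    Returns the list of evicted names (empty if nothing was evicted).
    """
    eviction_bucket[name] = popularity
    evicted = []
    if len(eviction_bucket) >= EVICTION_THRESHOLD:
        # The block need not hold the globally least popular objects;
        # here the oldest entries are taken for simplicity.
        for victim in list(eviction_bucket)[:BLOCK_SIZE]:
            eviction_bucket.pop(victim)
            evicted.append(victim)
    return evicted


bucket = {}
evicted_log = []
for i in range(5):
    evicted_log.extend(add_descriptor(bucket, f"o{i}", popularity=i))
```

Adding the fifth descriptor reaches the assumed threshold, so the two oldest descriptors are evicted and three remain in the bucket.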
The descriptor table 44 is preferably updated when any changes are made to the objects (e.g., new object received at the node, popularity of one or more objects updated).
It is to be understood that the tables shown in
Although the method and apparatus have been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations made to the embodiments without departing from the scope of the embodiments. Accordingly, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.