“Memcached” is a cache system used by web service providers to expedite data retrieval and reduce database workload. A Memcached server may be situated between a front-end web server (e.g., Apache) and a back-end data store (e.g., an SQL database). Such a server may cache content or query results from the data store, thereby reducing the need to access the back-end.
As noted above, web service providers may utilize Memcached to reduce database workload. In a Memcached system, objects may be cached across multiple machines with a distributed system of hash tables. When a hash table is full, subsequent inserts may cause older cached objects to be purged in least recently used (“LRU”) order. Memcached servers primarily handle network requests, perform hash table lookups, and access data. However, stress tests have shown that Memcached servers spend most of their time engaging in activity other than core Memcached functions. For example, one test shows that Memcached servers spend a considerable amount of time on network processing. Moreover, multiple web applications may generate millions of requests for cached objects; stress tests show that Memcached servers may also spend a significant amount of time handling and keeping track of these requests.
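By way of illustration only, LRU purging may be modeled in software with a recency-ordered doubly linked list, as in the following C sketch; the structure and names below are hypothetical and are not part of any Memcached implementation described herein.

```c
/* Illustrative sketch of LRU purging: entries form a doubly linked
 * list ordered by recency; when the table is full, the least recently
 * used entry at the tail is purged first. Names are hypothetical. */
#include <stddef.h>

struct lru_entry {
    struct lru_entry *prev, *next;
    /* key/value payload omitted for brevity */
};

struct lru_list {
    struct lru_entry *head;  /* most recently used  */
    struct lru_entry *tail;  /* least recently used */
};

/* Unlink an entry already present in the recency list. */
static void lru_unlink(struct lru_list *l, struct lru_entry *e)
{
    if (e->prev) e->prev->next = e->next; else l->head = e->next;
    if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
    e->prev = e->next = NULL;
}

/* Insert a new or just-unlinked entry at the head (most recent). */
static void lru_push_front(struct lru_list *l, struct lru_entry *e)
{
    e->prev = NULL;
    e->next = l->head;
    if (l->head) l->head->prev = e;
    l->head = e;
    if (!l->tail) l->tail = e;
}

/* On a cache hit, move an entry already in the list to the head. */
void lru_touch(struct lru_list *l, struct lru_entry *e)
{
    lru_unlink(l, e);
    lru_push_front(l, e);
}

/* When the hash table is full, purge the least recently used entry. */
struct lru_entry *lru_evict(struct lru_list *l)
{
    struct lru_entry *victim = l->tail;
    if (victim) lru_unlink(l, victim);
    return victim;
}
```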
In addition to performance bottlenecks, tests show that power consumption may also be a concern for conventional Memcached servers. For example, a study shows that a Memcached server with two Intel Xeon central processing units (“CPUs”) and 64 Gigabytes of DRAM consumes 258 Watts of total power. Of that total, 190 Watts were distributed between the two CPUs in the system, 64 Watts were consumed by DRAM memory, and 8 Watts were consumed by a 1 GbE network interface card. Thus, this study confirms that the CPUs may consume a disproportionate amount of power.
In view of the foregoing, disclosed herein are an apparatus, integrated circuit, and method for caching objects. In one example, at least one hash table of a circuit comprises a predetermined arrangement that maximizes cache memory space and minimizes a number of cache memory transactions. In a further example, the circuit handles requests by a remote device to obtain or cache an object. By integrating the networking, processing, and memory aspects of Memcached systems, more time may be spent on core Memcached functions. Thus, the techniques disclosed herein alleviate the bottlenecks of conventional Memcached systems. The aspects, features and other advantages of the present disclosure will be appreciated when considered with reference to the following description of examples and accompanying figures. The following description does not limit the application; rather, the scope of the disclosure is defined by the appended claims and equivalents.
Caching circuit 104 may include a packet decipher engine 107 to determine whether a packet is a get command or a set command. Packet decipher engine 107 may analyze the received packets and may store respective field information for further command processing. Irrespective of whether a packet is a set or get command, the packet may comprise a header field, which may include data such as an operation code, a key length, and a total data length. After the header field, the packet format may vary depending on the type of operation. For example, a set command may comprise an object to be cached in the hash table, user data, and a key. In a similar manner, a get command may comprise a basic header field and a key used to determine the location of the cached object. The key may be generated by the client requesting the set or get command, and may be a string associated with the cached object. For example, if a phone number of a person named “John” is the cached object, “John” may be the key and hash(“John”) may represent the hash table address where the key “John” and its associated phone number (i.e., the key-value pair) will be stored. In another example, the key may be a database query and the cached object may be the data returned by the query.
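By way of example only, the following C sketch shows the header fields a packet decipher engine may examine, assuming the standard Memcached binary protocol layout; the struct and function names below are illustrative rather than a description of circuit 104.

```c
/* Sketch of the 24-byte basic header shared by get and set packets,
 * per the standard Memcached binary protocol. Names are illustrative. */
#include <stdint.h>
#include <arpa/inet.h>  /* ntohs/ntohl: fields arrive in network order */

enum { MC_OP_GET = 0x00, MC_OP_SET = 0x01 };

struct mc_header {
    uint8_t  magic;             /* 0x80 marks a request packet      */
    uint8_t  opcode;            /* operation code: get, set, ...    */
    uint16_t key_length;        /* key length (network byte order)  */
    uint8_t  extras_length;
    uint8_t  data_type;
    uint16_t vbucket_id;
    uint32_t total_body_length; /* total data length after header   */
    uint32_t opaque;
    uint64_t cas;
};

/* Classify a packet and extract the fields needed downstream. */
uint8_t decipher(const struct mc_header *h,
                 uint16_t *key_len, uint32_t *body_len)
{
    *key_len  = ntohs(h->key_length);
    *body_len = ntohl(h->total_body_length);
    return h->opcode;  /* caller branches on MC_OP_GET vs. MC_OP_SET */
}
```

In this layout, a set packet carries its extras, key, and value after the 24-byte header, while a get packet carries only the key.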
Key to memory management module 115 may comprise a data path for objects being cached. Memory management module 119 may comprise a collection of functional units that perform caching of objects. Memory management module 119 may further comprise a dynamic random access memory (“DRAM”) module divided into two sections: hash memory and slab memory. The slab memory may be used to allocate memory suitable for objects of a certain type or size. Memory management module 119 may keep track of these memory allocations such that a request to cache a data object of a certain type and size can instantly be met with a pre-allocated memory location. In another example, destruction of an object makes a memory location available, and that location may be placed on a list of free slots by memory management module 119. Thus, a set command requiring memory of the same size may reuse the now unused memory slot. Accordingly, the need to search for suitable memory space may be eliminated and memory fragmentation may be alleviated.
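A minimal software sketch of this free-slot bookkeeping follows; the names (slab_class, free_list, slab_alloc) are assumptions, and memory management module 119 is presumed to keep equivalent per-size lists in hardware.

```c
/* Sketch of per-size free-slot bookkeeping: destroyed objects return
 * their slots to a free list, and a set command of a matching size is
 * met instantly from that list. Names are hypothetical. */
#include <stddef.h>
#include <stdlib.h>

struct free_slot { struct free_slot *next; };

struct slab_class {
    size_t item_size;            /* fixed object size for this class;
                                    must be >= sizeof(struct free_slot) */
    struct free_slot *free_list; /* slots freed by destroyed objects    */
};

/* Satisfy a set command: pop a pre-allocated slot if one is free. */
void *slab_alloc(struct slab_class *sc)
{
    if (sc->free_list) {
        void *slot = sc->free_list;
        sc->free_list = sc->free_list->next;  /* reuse, no searching */
        return slot;
    }
    return malloc(sc->item_size);  /* stand-in for carving a new slab */
}

/* Destroying an object puts its memory slot on the free list. */
void slab_free(struct slab_class *sc, void *slot)
{
    struct free_slot *fs = slot;
    fs->next = sc->free_list;
    sc->free_list = fs;
}
```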
Key to hash decoder module 113 may comprise a data path for objects to be hashed, and hash decoder 117 may generate a hash for an incoming key associated with an object to be cached. In one implementation, hash decoder 117 may accept three inputs; each input may be a 4 byte segment of the key, and the three segments may be combined into three internal variables (e.g., a, b, and c). Initially, the hash algorithm may accumulate the first set of 12 byte key segments with a constant, so that the mix module has an initial state. After the combine stage is processed, the internal variables may be passed to the mix stage. A counter, which may be called length_of_key, may be decremented by 12 bytes in each iteration of combine and mix module execution. After each iteration, hash decoder 117 may determine whether the length_of_key counter is greater than 12 bytes. If the remaining length is less than or equal to 12 bytes, the intermediate key may be routed to a final addition block, which may execute the combine functionality for key lengths less than or equal to 12 bytes. Hash decoder 117 may then combine the internal variables a, b, and c with the final addition/combine block. Hash decoder 117 may then pass the variables to a final mix data path to post-process the internal state and generate the final hash value.
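The combine/mix flow described above mirrors Bob Jenkins' lookup3 hash, which is the default hash function in the Memcached software distribution. The following C sketch illustrates that flow; the constants are lookup3's, and the simplified tail handling (zero-padding the final segment) is an assumption rather than a description of hash decoder 117.

```c
/* Software sketch of the combine/mix hashing flow, modeled on Bob
 * Jenkins' lookup3 hash. The hardware decoder is assumed to pipeline
 * the same steps; tail handling here is simplified. */
#include <stdint.h>
#include <string.h>

#define ROT(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

/* Mix stage: scrambles the three internal variables a, b, c. */
#define MIX(a, b, c) do {                   \
    a -= c; a ^= ROT(c,  4); c += b;        \
    b -= a; b ^= ROT(a,  6); a += c;        \
    c -= b; c ^= ROT(b,  8); b += a;        \
    a -= c; a ^= ROT(c, 16); c += b;        \
    b -= a; b ^= ROT(a, 19); a += c;        \
    c -= b; c ^= ROT(b,  4); b += a;        \
} while (0)

/* Final mix data path: post-processes the internal state. */
#define FINAL(a, b, c) do {                 \
    c ^= b; c -= ROT(b, 14);                \
    a ^= c; a -= ROT(c, 11);                \
    b ^= a; b -= ROT(a, 25);                \
    c ^= b; c -= ROT(b, 16);                \
    a ^= c; a -= ROT(c,  4);                \
    b ^= a; b -= ROT(a, 14);                \
    c ^= b; c -= ROT(b, 24);                \
} while (0)

uint32_t hash_key(const void *key, size_t length_of_key)
{
    uint32_t a, b, c;
    const uint8_t *p = key;

    /* Initial state: a constant accumulated with the key length. */
    a = b = c = 0xdeadbeef + (uint32_t)length_of_key;

    /* Combine/mix loop: consume 12-byte (3 x 4-byte) segments,
     * decrementing length_of_key by 12 in each iteration. */
    while (length_of_key > 12) {
        uint32_t k[3];
        memcpy(k, p, 12);
        a += k[0]; b += k[1]; c += k[2];  /* combine */
        MIX(a, b, c);                     /* mix     */
        p += 12;
        length_of_key -= 12;
    }

    /* Final addition block: combine the remaining <= 12 bytes
     * (zero-padded here for simplicity), then the final mix. */
    if (length_of_key > 0) {
        uint32_t k[3] = {0, 0, 0};
        memcpy(k, p, length_of_key);
        a += k[0]; b += k[1]; c += k[2];
        FINAL(a, b, c);
    }
    return c;  /* final hash value -> hash table address */
}
```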
Controller 111 may comprise control logic to perform a set or get command by coordinating activities between hash decoder 117 and memory management module 119. Controller 111 may instruct hash decoder 117 to perform a hash on a key to determine the hash table address. Once hash decoder 117 signals controller 111 that it has completed execution of the hash function, controller 111 may then signal memory management module 119 to perform the get or set command. For example, during a get command, once the hash value is ready, memory management module 119 may look up the hash table address. Once the value is retrieved, controller 111 may place the data on a FIFO queue in preparation for response packet generator 109. If the data is not found in the hash bucket, controller 111 may instruct response packet generator 109 to generate a miss response. When a set command is received, hash decoder 117 may perform a hash of the key to determine the hash table location of the new key-value pair, and memory management module 119 may cache the object into the corresponding entry. Once completed, controller 111 may instruct response packet generator 109 to reply to the client with a completion message.
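A hypothetical software analogue of this get-command coordination may look as follows; hash_key refers to the hash sketch above, and bucket_lookup, fifo_push, send_value, and send_miss are stand-in names for the roles of memory management module 119 and response packet generator 109.

```c
/* Sketch of controller 111's get path: hash the key, look up the
 * bucket, then either stage the value for the response packet
 * generator or signal a miss. Stub bodies below are placeholders. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

uint32_t hash_key(const void *key, size_t len);  /* see hash sketch */

/* Stand-in for a lookup in the hash bucket at address h. */
static const void *bucket_lookup(uint32_t h, const void *key, size_t len)
{
    (void)h; (void)key; (void)len;
    return NULL;  /* always misses in this stub */
}

static void fifo_push(const void *v) { (void)v; }       /* stage for 109 */
static void send_value(void) { puts("hit response"); }  /* 109: hit      */
static void send_miss(void)  { puts("miss response"); } /* 109: miss     */

void controller_get(const void *key, size_t key_len)
{
    /* 1. Instruct hash decoder 117 to hash the key. */
    uint32_t h = hash_key(key, key_len);

    /* 2. Once the hash is ready, have memory management module 119
     *    look up the hash table address. */
    const void *value = bucket_lookup(h, key, key_len);

    /* 3. Hit: queue the data on a FIFO for response packet generator
     *    109. Miss: instruct 109 to generate a miss response. */
    if (value != NULL) {
        fifo_push(value);
        send_value();
    } else {
        send_miss();
    }
}
```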
Working examples of the apparatus, integrated circuit, and method are shown in
As shown in block 202 of
Referring now to
Referring now to
As noted above, circuit 100 may be an ASIC, a PLD, or an FPGA. As such, the different example hash tables shown in
Referring back to
Although the disclosure herein has been described with reference to particular examples, it is to be understood that these examples are merely illustrative of the principles of the disclosure. It is therefore to be understood that numerous modifications may be made to the examples and that other arrangements may be devised without departing from the spirit and scope of the disclosure as defined by the appended claims. Furthermore, while particular processes are shown in a specific order in the appended drawings, such processes are not limited to any particular order unless such order is expressly set forth herein; rather, processes may be performed in a different order or concurrently and steps may be added or omitted.