This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2013-116072, filed May 31, 2013; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate to a data transferring apparatus, a data transferring system and a program.
Conventionally, there is a server that stores data as files in a storage, and that, in response to a request from a client, reads data from the storage to transmit them or writes data to the storage. In this case, frequently accessed data are placed in a DRAM (Dynamic Random Access Memory), and thereby a fast response can be achieved. However, there is a problem in that achieving a fast response with a large storage capacity requires a correspondingly large capacity of DRAM.
According to one embodiment, there is provided a data transferring apparatus to communicate with a communicating apparatus through a network in accordance with a predetermined protocol, including: a writing controller and a transmission controller.
The writing controller performs control of writing a first response message containing first data to a storage.
The transmission controller performs control of reading the first response message from the storage and transmitting the first response message to the communicating apparatus, upon receiving a data acquisition request message for the first data from the communicating apparatus.
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
A caching server 101 and a client 201 are connected through a network 301. Although only a single client is shown, multiple clients may be connected with the network 301.
In the embodiment, as the network 301, a LAN such as Ethernet (R), for example, is assumed, but any scheme is allowable. The network 301 is not limited to a LAN, and may be a wide area network or the Internet. In addition, the network may be a wired network or may be a wireless network.
The client 201 is a communication client apparatus that transmits and receives messages with the caching server 101 in a procedure (cache protocol) determined by the caching server 101. The caching server 101 includes a storage for storing data. One characteristic of the embodiment is that the storage stores data in a format of a response message in the cache protocol.
When receiving a data saving request message containing data as an object to be saved, from the client 201, the caching server 101 makes a response message (response data) containing the data and saves the response data in the storage. When receiving a data acquisition request message from the client 201, the caching server 101 reads the response data containing the acquisition-requested data from the storage.
The caching server 101 adds headers to the read response data so as to configure packets, and transmits the packets to the client 201. When receiving the acquisition request, it is not necessary to make a response message; it is only necessary to read the response message (response data) made in advance from the storage, add headers thereto, and transmit the resulting packets. Therefore, even if the capacity of the storage is made larger, a fast response is possible. Here, the client to perform a saving request of the data and the client to perform an acquisition request of the data may be the same or different.
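The flow above can be sketched as follows. This is a minimal illustration, assuming a memcached-style text protocol for the response format; the function names and the in-memory dictionary standing in for the file-backed storage are illustrative, not part of the embodiment.

```python
def build_get_response(key: bytes, value: bytes) -> bytes:
    # memcached-style text-protocol response: "VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n"
    header = b"VALUE %s 0 %d\r\n" % (key, len(value))
    return header + value + b"\r\nEND\r\n"

store = {}  # stands in for the file-backed storage

def handle_set(key: bytes, value: bytes) -> None:
    # Pre-render the full response message once, at save time.
    store[key] = build_get_response(key, value)

def handle_get(key: bytes) -> bytes:
    # No message construction here: just return the stored bytes as-is.
    return store[key]

handle_set(b"xxxx", b"hello")
print(handle_get(b"xxxx"))
```

The point of the design is visible in `handle_get`: because the response was materialized at "SET" time, serving a "GET" is a pure read, regardless of how large the storage is.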
In the embodiment, the caching server 101 adopts a storing method in which keys and values correspond to one another. Therefore, in some cases, the caching server 101 is called a key-value store (KVS) server, and the client 201 may be called a KVS client.
As shown in the lower side of the figure, the caching server 101 includes a data transferring apparatus and a SATA (Serial Advanced Technology Attachment) storage 12. The data transferring apparatus includes a direct transmission HW (HardWare) unit (transmission apparatus) 11, a CPU 31, a main memory 32, and a bus bridge 33. The CPU 31 and the main memory 32 are a general CPU and main memory. As for the bus bridge 33, any scheme is allowable, as long as it is a bus to connect with a wide variety of devices, such as PCI Express.
The functional block diagram in the upper side of the figure will be described below.
The direct transmission HW unit 11 includes a network input/output unit 21, a TCP process offloading unit 22, a direct-storage-access processor 23, a SATA input/output unit 24, and a bus input/output unit 25.
The network input/output unit 21 is a communication interface to connect with the network 301, and performs processes on the MAC layer.
The TCP process offloading unit 22 is a processing device to execute some of the TCP/IP communication protocol processes in hardware (HW). At the time of transmitting data, it adds a TCP header and an IP header to the transmitting data. At the time of receiving data, it checks whether the data has a connection associated therewith, based on the destination IP address of the IP header and the destination port of the TCP header. In the case of having such a connection, the received data is passed to a TCP/IP protocol stack 43 of the OS 41. In the embodiment, the TCP process offloading unit 22 is configured by hardware, but the function of the TCP process offloading unit may instead be implemented by the CPU executing software (implemented in the OS, for example).

The SATA input/output unit 24 is an interface unit for connecting with the SATA storage 12. Here, the storage 12 is externally connected with the data transferring apparatus, but it may be configured to be connected through a network such as a LAN. In this case, the network may be the network 301 connected with the client 201, or may be a network different from this.
The direct-storage-access processor 23 accesses the SATA storage 12 in accordance with an instruction from a storage driver (writing controller) 44 of the OS 41. Furthermore, the direct-storage-access processor 23 reads data (here, response data) from the SATA storage 12 and passes the data to the TCP process offloading unit 22, in accordance with an instruction from a direct transmission HW driver (transmission controller) 47 of the OS 41. The TCP process offloading unit 22, having received the data, adds a TCP header and an IP header, and transmits the data to which these headers have been added to the network 301 through the network input/output unit 21.
The bus input/output unit 25 is a device-side interface unit for connecting with the CPU 31 and the main memory 32, through a bus such as PCI Express, for example. The bus input/output unit 25 exchanges data between the CPU 31 and the main memory 32. Also, exchanging of process instructions by the CPU is performed between the CPU 31 and the main memory 32. Thereby, the instructions and data from the software (OS) can be exchanged with the direct transmission HW unit 11.
The OS 41 includes a network driver 42, the TCP/IP protocol stack 43, the storage driver 44, a buffer cache 45, a file system 46, the direct transmission HW driver (transmission controller) 47, and a system call unit 48. The application program 51 is a program to operate on the OS 41, and in the embodiment, the program is a caching server program.
The network driver 42 of the OS 41 is a device driver to transmit and receive data to and from the network 301 through the network input/output unit 21 included in the direct transmission HW unit 11.
The TCP/IP protocol stack 43 is a processor to implement the transmitting and receiving of data in accordance with the TCP/IP protocol, and has a function to share processes with the TCP process offloading unit 22. As described above, the header process is performed by the TCP process offloading unit 22, which is hardware, and other processes relevant to the TCP session control are performed by the TCP/IP protocol stack 43, which is a software part.
The storage driver 44 is a device driver to implement the access to the storage 12 connected by the SATA, through the SATA input/output unit 24 included in the direct transmission HW unit 11.
The buffer cache 45 caches some data in the storage 12 into the main memory 32, replaces reading and writing to the storage with a cache access to main memory 32, and thereby reduces the number of times of access to the storage 12. This achieves an increase in data access speed.
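Conceptually, the role of the buffer cache can be modeled as a small read cache in front of the storage. The class below is an illustrative sketch under that assumption, not the actual implementation; an LRU eviction policy is assumed here for concreteness.

```python
from collections import OrderedDict

class BufferCache:
    """Caches storage blocks in main memory so repeated reads avoid the storage."""

    def __init__(self, backing_read, capacity=64):
        self._read = backing_read          # function performing the real storage read
        self._cache = OrderedDict()        # block number -> cached data, in LRU order
        self._capacity = capacity
        self.storage_reads = 0             # counts accesses that reached the storage

    def read(self, block_no):
        if block_no in self._cache:
            self._cache.move_to_end(block_no)   # mark as most recently used
            return self._cache[block_no]
        self.storage_reads += 1
        data = self._read(block_no)
        self._cache[block_no] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)     # evict the least recently used block
        return data

bc = BufferCache(lambda n: b"block-%d" % n)
bc.read(0)
bc.read(0)   # second read of block 0 is served from main memory
bc.read(1)
print(bc.storage_reads)  # 2
```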
The file system 46 is a processor to logically format the storage area of the storage 12 and to implement the data management by files. Examples of such a logical format scheme include the FAT file system, the exFAT file system, the UNIX file system, the Berkeley Fast File System, and the ext file system. In the embodiment, any scheme is allowable.
The direct transmission HW driver 47 is a device driver to take out the corresponding data (here, response data) from a series of sectors that constitute a file in the SATA storage 12, and to perform an instruction to transmit the response data directly by TCP/IP (that is, without the process by the TCP/IP protocol stack 43). In the embodiment, the caching server program unit 51 performs a data transmission instruction with a “sendfile( )” system call of the system call unit 48, and on this occasion, the direct transmission HW driver 47 instructs the direct-storage-access processor 23 and the TCP process offloading unit 22 to start the processes. The direct-storage-access processor 23 extracts a sector sequence with respect to a file designated from the direct transmission HW driver 47, takes out the corresponding data (response data) from the extracted sector sequence, and passes the response data to the TCP process offloading unit 22. The TCP process offloading unit 22 adds an IP header and a TCP header to the response data to make packets, and transmits the packets to the network 301.
The system call unit 48 is a processor to provide a program interface with the caching server program unit (application program) 51. The implementation method varies depending on the OS, and in many cases, software exceptions provided by the CPU 31 are used. Also, the function to be provided varies depending on the OS, and in the embodiment, a “recv( )” system call for receiving data from a network, a “write( )” system call for writing data to a file (a storage), and a “sendfile( )” system call for transmitting file data to the network are shown in the figure.
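The “sendfile( )” path described above can be illustrated with Python's `os.sendfile` wrapper over the sendfile(2) system call (available on Linux): the kernel is handed a file descriptor, an offset, and a length, and the stored bytes flow to a socket without passing through an application buffer. The file contents and offsets here are illustrative, and a local socket pair stands in for the network connection.

```python
import os
import socket
import tempfile

# A file standing in for the storage holding pre-built response data,
# with 4 bytes of unrelated data before the response message.
f = tempfile.TemporaryFile()
f.write(b"....VALUE xxxx 0 5\r\nhello\r\nEND\r\n")
f.flush()

s1, s2 = socket.socketpair()   # stands in for the TCP connection to the client

# Send 28 bytes starting at "file offset" 4, directly from file to socket.
sent = os.sendfile(s1.fileno(), f.fileno(), 4, 28)
data = s2.recv(64)
print(sent, data)
```

The application never calls `read( )` on the file; the offset/length pair plays the role of the “file offset” and data length recorded in the data-holding structure.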
The caching server program unit 51 includes a cache protocol processor 52, a data-storing instructor 53, a direct transmission instructor 54, and a data placement managing unit 55.
The cache protocol processor 52 is a processor to receive a cache protocol message that is sent from the client 201 through the network 301, and to interpret the content of the cache protocol message. In the embodiment, memcached is assumed as the cache protocol, but the cache protocol is not limited thereto. As the types of basic request messages, there are a “GET” request message for requesting transmission (acquisition) of data, and a “SET” request message for requesting storing (saving) of data. Other types of request messages may also be implemented. By calling the “recv( )” system call of the system call unit 48, the cache protocol processor 52 receives the request message from the OS 41.
When the cache protocol processor 52 receives a “SET” request message, the data-storing instructor 53, by an instruction of the data placement managing unit 55, calls the “write( )” system call for performing a process to write a response message (response data) containing data included in the “SET” request message, to a file.
When the cache protocol processor 52 receives a “GET” request message, the direct transmission instructor 54, by an instruction of the data placement managing unit 55, calls the “sendfile( )” system call for performing a process to read data (response data) in a file and transmit the data to the network 301.
The data placement managing unit 55 is a processor to manage used areas and unused areas in the storage 12, and to manage which data is at which location in which file. The data placement managing unit 55 uses data-holding structures, each of which describes a key and in which file and at which location the data corresponding to the key is held, and thereby manages data for each key. The key is generally called an identifier. The “SET” request and the “GET” request each contain a designated key, and the “SET” request further contains the data that is the object to be saved. The data placement managing unit 55 has a hash table of keys (a data-holding management table), and provides a scheme by which a targeted data-holding structure can be quickly retrieved from a key.
The “key” is a character string or byte sequence that is a retrieval key of data.
The “value length” is the byte length of the data for which saving is requested from the client, corresponding to the “key.”
The “file descriptor” is a descriptor for identifying a file in which stored data is written. The “file descriptor” is an identifier that is used for a file access in a general OS. For example, an “open( )” system call, which converts a file name into a descriptor, is well known. As long as the data is information for identifying a file in which stored data is written, another format is allowable.
The “file offset” is a storage location in a file where the data (the response data containing data for which saving is requested from the client) corresponding to the “key” is stored. The “file offset” indicates a byte location from the top of the file. The figure shows that the corresponding response data is stored at the shaded location in a file “A.” One file holds many pieces of data, and the respective pieces of data are separated by the vertical lines. The embodiment assumes that these respective pieces of data have a format of the response message in the cache protocol.
The data-holding management table is a table that is indexed by the hash value calculated from the “key” (character string or byte sequence). Here, there is a possibility that the same hash value is obtained from a plurality of “keys.” This table has a plurality of entries, and each entry contains a hash value (an index) and a list of the data-holding structures for the “keys” having that hash value. Thereby, the retrieval of a data-holding structure having a targeted “key” can be implemented by calculating the hash value of the “key,” and then, from the list of data-holding structures in the entry whose index is the calculated hash value, retrieving the data-holding structure with the matching “key” (character string or byte sequence) value. Concretely, in the example of the figure, the data-holding structures in the list are checked in order from the leftmost one, with a rightward shift, until a data-holding structure with the matching “key” value is found. The “NULL” indicates the termination of a list, and, if the retrieval reaches the “NULL” without identifying a data-holding structure, the judgment is made that the data having the targeted “key” are not present.
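The lookup described above can be modeled as a fixed-size hash table whose entries hold lists of data-holding structures, searched left to right until the “key” matches. The hash function, table size, and field names below are illustrative assumptions, not taken from the embodiment.

```python
HASH_SIZE = 256

def hash_index(key: bytes) -> int:
    # Any hash over the key bytes would do; a sum-mod hash is enough for the sketch.
    return sum(key) % HASH_SIZE

# index -> list of data-holding structures (chaining resolves hash collisions)
table = [[] for _ in range(HASH_SIZE)]

def insert(key, file_descriptor, file_offset):
    table[hash_index(key)].append(
        {"key": key, "fd": file_descriptor, "offset": file_offset})

def lookup(key):
    # Walk the entry's list left to right; returning None plays the role of
    # reaching "NULL" without finding a matching "key".
    for structure in table[hash_index(key)]:
        if structure["key"] == key:
            return structure
    return None

insert(b"xxxx", "z", 0xABCD00)
print(hex(lookup(b"xxxx")["offset"]))  # 0xabcd00
print(lookup(b"yyyy"))                 # None
```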
The unused structure management table manages unused data-holding structures as a list. The “class” is a classification by the size of the storage area that is allocated for a data-holding structure. It can be determined by design, for example, that 128 bytes or less is “class 1,” over 128 bytes to 256 bytes or less is “class 2,” and over 256 bytes to 1,024 bytes or less is “class 3.” In the case where the response message (response data) generated from the data for which saving is requested by the “SET” request message has a size of 128 bytes or less, a data-holding structure to be used is specified from the list of the data-holding structures classified into the “class 1,” and the data is stored from the top of the location shown by the specified data-holding structure. The used data-holding structure is deleted from the unused structure management table, and is moved to the data-holding management table shown in the figure.
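The class selection can be sketched as follows, using the example thresholds given above (128 / 256 / 1,024 bytes); the function name is illustrative.

```python
# (class number, maximum response-data size in bytes), smallest first
CLASS_LIMITS = [(1, 128), (2, 256), (3, 1024)]

def size_class(response_length: int) -> int:
    # Pick the smallest class whose storage area can hold the response data.
    for cls, limit in CLASS_LIMITS:
        if response_length <= limit:
            return cls
    raise ValueError("no class large enough for this response data")

print(size_class(100))   # 1
print(size_class(200))   # 2
print(size_class(700))   # 3
```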
First, the “GET” request message is received ((1)). The cache protocol processor 52 receives the “GET” request message, through the network input/output unit 21, the TCP process offloading unit 22, the network driver 42, the TCP/IP protocol stack 43, and the system call unit 48 (“recv( )” system call).
Next, a data retrieval process ((2)), a data transmission instruction process ((3)), a data acquisition process ((4)) and a response data transmission ((5)) are continuously performed.
This continuous flow will be concretely described using the figure.
As shown in the figure, the hash value of the “key” (“xxxx”) designated by the “GET” request message is calculated, and the entry of 0x05 corresponding to the calculated hash value is referred to in the data-holding management table. In the entry of 0x05, the “key” value of the data-holding structure positioned at the top is checked, and since it is confirmed that the “key” value is “xxxx,” the retrieval is completed ((2)).
The “file descriptor” and “file offset” contained in the data-holding structure indicate the location of the targeted data. The direct transmission instructor 54 sets these “file descriptor” and “file offset” as arguments and calls the “sendfile( )” system call, and thereby instructs data transmission to the direct transmission HW driver 47 ((3)).
By controlling the direct-storage-access processor 23, the direct transmission HW driver 47 reads the data (response data) at the offset position (“0xabcd00”) in the file designated by the “sendfile( )” (here, the “file descriptor” “z” designates the file “A”) from the SATA storage 12, and passes the data to the TCP process offloading unit 22 ((4)). The TCP process offloading unit 22 adds headers to the passed data to configure packets, and transmits the packets to the network ((5)). Thereby, the transmission of the response data is completed.
Next, a sequence when the “SET” request message is received will be described with reference to the figure.
The cache protocol processor 52 receives the “SET” request message, through the network input/output unit 21, the TCP process offloading unit 22, the network driver 42, the TCP/IP protocol stack 43, and the system call unit 48 (“recv( )” system call) ((1)).
It is supposed that the “SET” request message is a message whose key is “mmmm” (each “m” represents a numerical value) and that instructs holding of data with a length of “nnnn” (each “n” represents a numerical value), as shown in the figure.
First, the data placement managing unit 55 checks the value of “nnnn” and specifies the header length of the response data (response message) that is generated from the data with the length of “nnnn.” It is supposed that the total value of the length of “nnnn” and the header length is a length of the “class 3.” Here, as the header length of the response data, the length of a prescribed longest format (for example, the key is 250 bytes and other numerical values are 16 bytes) may be used. Alternatively, it is allowable to temporarily make a header of the response data when determining the class, and to use the length of that header.
One unused data-holding structure is taken out from the list of the “class 3” entry in the unused structure management table. In this data-holding structure, a storage area with a size of the “class 3” is allocated at an offset “0xdcba00” in the file “A” (the “file descriptor” is “z”).
As shown in the figure, the response data containing the saving-requested data is then written to the allocated storage area through the “write( )” system call ((2)).
Here, in the writing to the file, first, the buffer cache 45 performs caching in the main memory 32 ((3) of the figure), and the writing to the storage 12 is performed thereafter.
A sequence when the response data held by the buffer cache 45 contains the data for which acquisition is requested by the subsequent “GET” request message will be shown. Up to the data transmission instruction process, the processes are the same as those previously described.
It is supposed that a certain client A performs a “GETS” request and receives response data with a check code. Next, when the client A attempts to update the value by designating the check code, in some cases the check code designated by the client does not match the check code that the server manages for the key. That is, another client B has already updated the value for the key after the client A received the “GETS” response and before the client A updated the value by designating the check code. In such cases, the value update with the check code designation by the client A is rejected. In some cases, the check code is called a “casID (Check and Set ID).”
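The check-code behaviour described above can be sketched as follows. This is a simplified model with illustrative names; a monotonic counter stands in for however the server actually generates check codes.

```python
import itertools

_cas_counter = itertools.count(1)   # generates a fresh check code per update
data = {}                           # key -> (value, check code)

def gets(key):
    # "GETS": return the value together with its current check code.
    return data[key]

def set_value(key, value):
    data[key] = (value, next(_cas_counter))

def cas(key, value, check_code) -> bool:
    # Update succeeds only if the caller's check code still matches the server's.
    if data[key][1] != check_code:
        return False
    set_value(key, value)
    return True

set_value("k", "v0")
_, code = gets("k")                 # client A reads the value and its check code
set_value("k", "v1")                # client B updates the value first
rejected = cas("k", "v2", code)     # client A's stale check code no longer matches
print(rejected)                     # False
```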
In some cases, the caching server 101 needs to deal with both of the two types of response data shown in the figure, that is, response data with the check code and response data without the check code.
The discrimination of the type of the response data by the presence or absence of the check code is just one example, and the type of the response data is not limited to this. For example, a case where the protocol supports both text messages (in which “GET” and the like are human-readable) and binary messages (encoded so that their contents appear as numerical values), and a case where the contents of the response data for the same key and value differ in some other way, for example, in whether the data are compressed, are also allowable.
For example, a text response is performed for a request message of ASCII strings, and a binary response is performed for a request message of binary data. Thus, a value is not only stored corresponding to a key, as in a typical key-value database; if the data (byte representation) to be responded differ for each request, response data containing a different value (data) is stored for each correspondence of the request and the key.
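The storage keyed by the correspondence of request type and key can be sketched as follows; the names and the placeholder binary bytes are illustrative (the real binary encoding of the protocol is not reproduced here).

```python
store = {}

def save_response(request_type: str, key: bytes, response_bytes: bytes) -> None:
    # The stored unit is a full response message, held per (request type, key)
    # correspondence, not merely one value per key.
    store[(request_type, key)] = response_bytes

def load_response(request_type: str, key: bytes) -> bytes:
    return store[(request_type, key)]

# Same key and same logical value, but different stored response data:
save_response("text", b"k", b"VALUE k 0 2\r\nhi\r\nEND\r\n")
save_response("binary", b"k", b"<binary-encoded response>")  # placeholder bytes
print(load_response("text", b"k") != load_response("binary", b"k"))  # True
```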
Here, in the example of the figure, the response data of both types are held for the same “key.”
In the embodiment, the block configuration in the upper side of the figure is implemented by the CPU 31 executing programs, but some or all of the blocks may be implemented by hardware.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind
---|---|---|---
2013-116072 | May 2013 | JP | national