The present application claims priority from Japanese patent application JP 2014-006135 filed on Jan. 16, 2014, the content of which is hereby incorporated by reference into this application.
The present invention relates to a gateway device, a file server system, and a file distribution method.
In recent years, there has been an approach in which a large amount of data, typified by big data, is stored in a data center and subjected to batch processing to obtain knowledge (information) useful for business. When processing large amounts of data, disk I/O performance (throughput) becomes an issue. Under these circumstances, in distributed file system technologies typified by the Hadoop Distributed File System (HDFS), large files are divided into small units (blocks) and stored on the local disks of plural servers, and when a file is read, it is read from the plural servers (disks) in parallel to realize a high throughput (for example, refer to the items of “Architecture”, “Deployment-Administrative commands”, and “HDFS High Availability Using the Quorum Journal Manager”, [online], The Apache Software Foundation, [searched on Nov. 15, 2013], the Internet http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html). On the other hand, in the service delivery platforms of telecommunications carriers and the system control platforms of social infrastructure operators in power or traffic, non-stop operation of the service is one of the top priorities, and in the event of a system failure the failed server is disconnected and operation is switched to a standby server, thereby realizing high reliability.
For example, JP-A-2012-173996 proposes a method of preventing an unnecessary service stop when split brain (abnormal operation caused by a loss of synchronization between servers due to a network failure) occurs in a cluster system (for example, refer to the summary).
In a distributed file system, a large number of servers operate in coordination to realize distributed processing, and the processing performance can be improved by increasing the number of servers. On the other hand, because an increase in the number of servers raises the probability that a failure occurs, the system as a whole is required to continue processing normally even when some of the servers are not operating normally.
In the technique disclosed in “HDFS High Availability Using the Quorum Journal Manager”, [online], The Apache Software Foundation, [searched on Nov. 15, 2013], the Internet <http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html>, the redundancy of the NameNode that manages the metadata of the file system becomes an issue. For that reason, the technique provides an active NameNode server and a standby NameNode server, and when the active NameNode fails, the active server is stopped and processing is switched to the standby server to realize high reliability. However, in the technique disclosed in “HDFS High Availability Using the Quorum Journal Manager”, there is a risk that the service is interrupted during switching between the active server and the standby server, or that the switching process fails and the service stops. Further, in order to apply the technique of “HDFS High Availability Using the Quorum Journal Manager”, the software of all the servers needs to be updated, which raises the problem that the operational costs (implementation costs) are large.
In the technique disclosed in JP-A-2012-173996, as described above, a method of preventing an unnecessary service stop when split brain occurs in a cluster system is proposed. However, the technique of JP-A-2012-173996 uses a shared storage for synchronization processing between the servers, and suffers from the problem that a failure of the shared storage is not considered. Further, the technique of JP-A-2012-173996 does not consider the redundancy of data, and cannot ensure the data availability of a distributed file system.
The present invention improves the availability of a system having a distributed file system including plural file servers and local disks.
For example, there is provided a gateway device that mediates requests between a client device that transmits a request including any one of file storage, file read, and file deletion, and a distributed file system having a plurality of file server clusters that perform file processing according to the request, the gateway device comprising:
a health check function unit that monitors an operating status of the file server cluster; and
a data control function unit that receives the request for the distributed file system from the client device, and selects one or more of the file server clusters that are normally in operation, and distributes the request to the selected file server clusters.
For another example, there is provided a file server system comprising:
a distributed file system having a plurality of file server clusters that perform any one of file storage, file read, and file deletion according to a request,
a gateway device that mediates requests between a client device that transmits the request including any one of file storage, file read, and file deletion, and the distributed file system,
wherein the gateway device comprises:
a health check function unit that monitors an operating status of the file server cluster; and
a data control function unit that receives the request for the distributed file system from the client device, and selects one or more of the file server clusters that are normally in operation, and distributes the request to the selected file server clusters.
For another example, there is provided a file distribution method in a file server system, the file server system comprising:
a distributed file system having a plurality of file server clusters that perform any one of file storage, file read, and file deletion according to a request,
a gateway device that mediates requests between a client device that transmits the request including any one of file storage, file read, and file deletion, and the distributed file system,
wherein the gateway device
monitors an operating status of the file server cluster, and
receives the request for the distributed file system from the client device, and selects one or more of the file server clusters that are normally in operation, and distributes the request to the selected file server clusters.
It is possible, according to the disclosure of the specification and figures, to improve the availability of a system having a distributed file system including plural file servers and local disks.
The details of one or more implementations of the subject matter described in the specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
This embodiment relates to a gateway device installed on a communication path between a terminal and a server device, a file server system having the gateway device, and a file distribution method, in a network system in which data is communicated between the terminal and server devices in, for example, the world wide web (WWW), a file storage system, or a data center. Hereinafter, respective embodiments will be described with reference to the drawings.
A computer system (file server system) according to this embodiment includes one or plural client devices (hereinafter referred to merely as “client”) 10, one or plural application extension gateway devices (hereinafter referred to also as “gateway device”) 30, and one or plural file server clusters 40, and the respective devices are connected to each other through the networks 20 and 21.
Each of the clients 10 is a terminal that creates a file and/or executes an application referring to the file. The application extension gateway device 30 is a server that is installed between the clients 10 and the file server clusters 40, and implements a function unit program of this embodiment. For example, each of the clients 10 transmits a request including any one of file storage, file read, file deletion, and file search.
Each of the file server clusters 40 includes at least one name node 50 that manages metadata such as data location and status, and one or plural data nodes 60 that hold data, and one or plural file server clusters 40 configure the distributed file system.
In this embodiment, the application extension gateway device 30 and the file server clusters 40 are configured by separate hardware. Alternatively, the application extension gateway device 30 and the file server clusters 40 may be configured to operate on the same hardware.
Also, in this embodiment, a configuration of the computer system having one application extension gateway device 30 will be described. Alternatively, the computer system may have plural gateway devices 30. In this case, information is shared or synchronization is performed among the plural gateway devices 30.
The gateway device 30 includes, for example, at least one CPU 101, network interfaces (NW I/F) 102 to 104, an input/output device 106, and a memory 105. The respective units are connected to each other through a communication path 107 such as an internal bus, and are realized on a computer. The NW I/F 102 is connected to the client 10 through the network 20. The NW I/F 103 is connected to the name node 50 of a file server cluster through the network 21. The NW I/F 104 is connected to the data node 60 of the file server cluster through the network 21. The memory 105 stores the respective programs of a client API function unit 111, a cluster setting function unit 112, a health check function unit 113, a data control function unit 114, a data restore function unit 115, and a data policy setting function unit 117, which will be described below, as well as a cluster management table 121, a data index management table 122, and a data policy management table 123. The respective programs are executed by the CPU 101 to realize the operation of the respective function units. The respective tables need not be in table form, and may be any appropriate storage region.
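As an aid to reading the following description, here is a minimal sketch of the three tables as plain Python structures. The field names and example values are illustrative assumptions chosen to mirror the columns referred to later (name node address 803, operating status 804, data redundancy 904, read majority determination 905, cluster ID 1003, application type 1005); they are not the actual implementation.

```python
# Illustrative sketch of the management tables held in the memory 105.
# All names and values below are assumptions for explanation only.

# Cluster management table 121: one entry per file server cluster.
cluster_table = [
    {"cluster_id": "#1001", "name_node_address": "10.0.0.1", "operating_status": "normal"},
    {"cluster_id": "#1002", "name_node_address": "10.0.0.2", "operating_status": "abnormal"},
    {"cluster_id": "#1003", "name_node_address": "10.0.0.3", "operating_status": "normal"},
]

# Data policy management table 123: per application type, the data redundancy 904
# and whether majority determination 905 is applied on read.
policy_table = {
    "sensor-log": {"data_redundancy": 2, "read_majority": True},
}

# Data index management table 122: keyed by file name; records the clusters that
# store the file, the application type 1005, the file size, and the updated date.
data_index_table = {
    "sensor-2014-01.log": {
        "cluster_ids": ["#1001", "#1003"],
        "application_type": "sensor-log",
        "file_size": 1048576,
        "updated_date": "2014-01-16",
    },
}
```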
The respective programs may be stored in the memory 105 of the gateway device 30 in advance, or may be introduced into the memory 105, when needed, through a recording medium available to the gateway device 30. The recording medium means, for example, a recording medium detachably attached to the input/output device 106, or a communication medium (that is, a wired, wireless, or optical network connected to the NW I/F 102 to 104, or a carrier wave or digital signal that propagates through the network).
The input/output device 106 includes, for example, an input unit that receives data according to the operation of a manager 70, and a display unit that displays data. The input/output device 106 may be connected to an external management terminal operated by the manager so as to receive data from the management terminal, or output data to the management terminal.
The data control function unit 114 distributes the following processing according to a request type from the client 10 (S601).
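As a sketch of this dispatch (S601), the request can be pictured as carrying a type field that selects one of the four handlers described below; the field names and handler names in this sketch are assumptions for illustration, not the actual interface.

```python
# Hypothetical sketch of the request-type dispatch (S601).
def dispatch_request(request, data_control):
    # 'request' is assumed to be a dict such as {"type": "file_read", "file_name": "a.log"}.
    handlers = {
        "file_creation": data_control.create_file,   # S602-S607
        "file_read":     data_control.read_file,     # S611-S615
        "file_deletion": data_control.delete_file,   # S621-S626
        "file_search":   data_control.search_file,   # S631-S632
    }
    handler = handlers.get(request["type"])
    if handler is None:
        raise ValueError("unknown request type: " + request["type"])
    return handler(request)
```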
First, the file creation will be described. If the request type is <file creation>, the data control function unit 114 selects the clusters that store the file with reference to the cluster management table 121 and the data policy management table 123, and acquires the name node addresses of the selected clusters (S602). For example, the data control function unit 114 acquires the corresponding data redundancy 904 with reference to the data policy management table 123 on the basis of the application type included in the request from the client 10. Also, the data control function unit 114 selects, from the clusters whose operating status 804 indicates normal in the cluster management table 121, as many clusters as the data redundancy. The clusters are selected according to the cluster distribution rule 1101.
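The following is a minimal sketch of this selection step, assuming the cluster table layout shown earlier; a simple round-robin rotation stands in for the cluster distribution rule 1101, whose concrete definition is not reproduced here.

```python
import itertools

_round_robin = itertools.count()  # rotation counter standing in for rule 1101

def select_clusters(cluster_table, data_redundancy):
    # Keep only clusters whose operating status 804 is normal.
    normal = [c for c in cluster_table if c["operating_status"] == "normal"]
    if len(normal) < data_redundancy:
        raise RuntimeError("not enough normally operating clusters for the redundancy policy")
    # Rotate the starting position so that load is spread over the clusters.
    start = next(_round_robin) % len(normal)
    rotated = normal[start:] + normal[:start]
    return rotated[:data_redundancy]   # their name node addresses 803 are used in S603
```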
The data control function unit 114 inquires of the name node of each selected cluster, using the acquired name node address, whether the file can be created (S603). If the data control function unit 114 receives a response from the name node that the file can be created, the data control function unit 114 requests an appropriate data node to create the file (S604). The data control function unit 114 acquires the file from the client 10 at an appropriate time and transfers the file to the data node. On the other hand, in any case other than receiving a response from the name node that the file can be created, the data control function unit 114 selects another cluster. The manner of selecting the clusters is identical with the above-mentioned manner. The cases other than receiving such a response include, for example, a case in which the name node responds that the file creation cannot be permitted due to a capacity shortage of the cluster, and a case in which there is no response from the name node.
The data control function unit 114 repeats the processing of Steps S603 and S604 until the file creation processing satisfying the data redundancy policy is completed (S605). Upon completion of the file creation processing, the data control function unit 114 updates the data index management table 122 (S606). For example, the data control function unit 114 obtains the data key from the file name, and stores the data key, the cluster IDs of the one or plural clusters that store the file, the file name, the application type, the file size, and the updated date in the data index management table 122. Also, the data control function unit 114 returns the file creation results to the client API function unit 111 (S607). The file creation results include, for example, the completion of the file creation and the clusters that have created the file. The file creation results are transmitted to the client 10 through the client API function unit 111.
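Putting Steps S602 to S607 together, a compact sketch of the creation loop might look as follows; ask_name_node() and create_on_data_node() are hypothetical placeholders for the actual name node and data node access, which is not specified here, and the index entry keyed directly by file name is a simplification.

```python
# Sketch of the file creation flow (S602-S607). The two helpers are placeholders.
def ask_name_node(address, operation, file_name):
    """Placeholder for the inquiry to a name node (S603); True means 'permitted'."""
    return True

def create_on_data_node(cluster, file_name, content):
    """Placeholder for transferring the file to a data node of the cluster (S604)."""

def create_file(request, cluster_table, policy_table, data_index_table):
    redundancy = policy_table[request["application_type"]]["data_redundancy"]  # 904
    stored, tried = [], set()
    while len(stored) < redundancy:
        candidates = [c for c in cluster_table
                      if c["operating_status"] == "normal" and c["cluster_id"] not in tried]
        if not candidates:
            break                                    # the redundancy policy cannot be met
        cluster = candidates[0]
        tried.add(cluster["cluster_id"])
        # S603: ask the cluster's name node whether the file can be created.
        if ask_name_node(cluster["name_node_address"], "create", request["file_name"]):
            # S604: transfer the file to an appropriate data node of that cluster.
            create_on_data_node(cluster, request["file_name"], request["content"])
            stored.append(cluster["cluster_id"])
        # On refusal (e.g. capacity shortage) or no response, another cluster is tried.
    # S606: update the data index management table 122.
    data_index_table[request["file_name"]] = {
        "cluster_ids": stored,
        "application_type": request["application_type"],
        "file_size": len(request["content"]),
        "updated_date": request.get("updated_date", ""),
    }
    # S607: the result is returned to the client API function unit 111.
    return {"created": len(stored) == redundancy, "clusters": stored}
```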
If the request type is <file read> (S601), the data control function unit 114 acquires the name node address of the cluster in which the file to be read is stored with reference to the cluster management table 121 and the data index management table 122 (S611). For example, the data control function unit 114 acquires the corresponding application type 1005 and cluster ID 1003 with reference to the data index management table 122 on the basis of the file name included in the request from the client 10. Also, the data control function unit 114 acquires the corresponding read majority determination information 905 with reference to the data policy management table 123 on the basis of the acquired application type. Further, the data control function unit 114 acquires the corresponding name node address 803 with reference to the cluster management table 121 on the basis of the acquired cluster ID.
The data control function unit 114 inquires of the name node of the selected cluster, using the acquired name node address, whether the file can be read (S612). If the data control function unit 114 receives a response from the name node that the file can be read, the data control function unit 114 requests the data node of the appropriate cluster to read the file (S613). As a result, the data control function unit 114 reads the file from the data node. On the other hand, in any case other than receiving a response from the name node that the file can be read, the data control function unit 114 selects another cluster from the clusters that store the target file, and repeats Step S612. For example, another cluster ID is selected from the cluster IDs acquired with reference to the data index management table 122. The data control function unit 114 repeats the processing in Steps S612 and S613 until the file read processing satisfying the majority determination policy is completed (S614).
When the majority determination policy is applied, the data control function unit 114 reads the file from the plural data nodes, and if the number of copies having the same contents is, for example, a majority of the total number of copies acquired, the data control function unit 114 determines that the read processing is completed. The identity of the copies can be checked by calculating a hash value such as MD5 for each copy and comparing the values. Upon completion of the file read processing, the data control function unit 114 returns the file read results to the client API function unit 111 (S615). The file read results include, for example, the read file. The file read results are transmitted to the client 10 through the client API function unit 111.
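A minimal sketch of this majority determination is shown below; read_from_cluster() is a placeholder for the name node inquiry and data node read of Steps S612 and S613, and MD5 is used for the comparison as mentioned above.

```python
import hashlib
from collections import Counter

def read_from_cluster(cluster_id, file_name):
    """Placeholder for S612/S613: returns the file contents read from one cluster,
    or None if the cluster does not respond."""
    return b"example file contents"

def read_with_majority(cluster_ids, file_name):
    # Read the file from every cluster that stores it and hash each copy (e.g. MD5).
    copies = {}
    for cid in cluster_ids:
        data = read_from_cluster(cid, file_name)
        if data is not None:
            copies[cid] = (hashlib.md5(data).hexdigest(), data)
    if not copies:
        raise RuntimeError("the file could not be read from any cluster")
    # The read is complete when copies with the same hash form a majority of those acquired.
    counts = Counter(digest for digest, _ in copies.values())
    digest, votes = counts.most_common(1)[0]
    if votes * 2 > len(copies):
        return next(data for d, data in copies.values() if d == digest)
    raise RuntimeError("no majority among the copies that were read")
```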
Also, if the request type is <file deletion>, the data control function unit 114 acquires the name node address of the cluster in which the file to be deleted is stored with reference to the cluster management table 121 and the data index management table 122 (S621). For example, the data control function unit 114 acquires the corresponding cluster ID 1003 with reference to the data index management table 122 on the basis of the file name included in the request from the client 10. Also, the data control function unit 114 acquires the corresponding name node address 803 with reference to the cluster management table 121 on the basis of the acquired cluster ID.
The data control function unit 114 inquires of the name node of the selected cluster, using the acquired name node address, whether the file can be deleted (S622). When receiving a response from the name node that the file can be deleted, the data control function unit 114 requests the appropriate data node to delete the file (S623). As a result, the data control function unit 114 deletes the file from the data node in which the file is stored. On the other hand, in any case other than receiving a response from the name node that the file can be deleted, the data control function unit 114 selects another cluster. The data control function unit 114 repeats Steps S622 and S623 until the file deletion processing is completed on every cluster that holds the data (S624). Upon completion of the file deletion processing, the data control function unit 114 updates the data index management table 122 (S625). For example, the data control function unit 114 deletes the entry of the file name to be deleted. Also, the data control function unit 114 returns the file deletion results (S626). The file deletion results include, for example, information indicating that the file has been correctly deleted. If there is a cluster on which the file could not be deleted, the identification information of that cluster may be included in the file deletion results. The file deletion results are transmitted to the client 10 through the client API function unit 111.
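The deletion flow of Steps S621 to S626 can be sketched in the same style; ask_name_node() and delete_on_data_node() are hypothetical placeholders for the actual cluster access, and iterating over every cluster that holds the file is one possible reading of the retry loop.

```python
def ask_name_node(address, operation, file_name):
    """Placeholder for S622; True means the name node permits the deletion."""
    return True

def delete_on_data_node(cluster_id, file_name):
    """Placeholder for S623: delete the file from the data nodes of the cluster."""

def delete_file(file_name, data_index_table, cluster_table):
    entry = data_index_table[file_name]
    addresses = {c["cluster_id"]: c["name_node_address"] for c in cluster_table}
    failed = []
    for cid in entry["cluster_ids"]:                  # every cluster that holds the file
        if ask_name_node(addresses.get(cid), "delete", file_name):   # S622
            delete_on_data_node(cid, file_name)                      # S623
        else:
            failed.append(cid)        # cluster on which the file could not be deleted
    if not failed:
        del data_index_table[file_name]               # S625: remove the entry
    # S626: the result, including any clusters on which deletion failed, is returned.
    return {"deleted": not failed, "failed_clusters": failed}
```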
Also, if the request type is <file search>, the data control function unit 114 searches the data index management table 122 according to the search condition included in the request information (S631). The search condition includes, for example, the designation of a file name, the designation of a size, or a range designation of the updated date, but may be other conditions. For example, if the search condition is the designation of a file name, the data control function unit 114 acquires the respective pieces of information (the identification information of the file, and the above-mentioned file information) of the matching entry with reference to the data index management table 122 on the basis of the file name included in the request information. Then, the data control function unit 114 returns the file search results to the client API function unit 111 (S632). The file search results include, for example, the respective pieces of information of the matching entries acquired from the data index management table 122. The file search results are transmitted to the client 10 via the client API function unit 111.
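Because the search uses only the data index management table 122, no cluster access is needed. The sketch below assumes the table layout from the earlier sketch; the condition keys are illustrative assumptions.

```python
def search_files(data_index_table, condition):
    """Sketch of S631/S632: filter the data index management table 122 by the
    search condition. The condition keys used here are illustrative assumptions."""
    results = []
    for file_name, entry in data_index_table.items():
        if "file_name" in condition and file_name != condition["file_name"]:
            continue
        if "min_size" in condition and entry["file_size"] < condition["min_size"]:
            continue
        if "updated_after" in condition and entry["updated_date"] < condition["updated_after"]:
            continue
        results.append({"file_name": file_name, **entry})
    return results   # returned to the client 10 via the client API function unit 111

# Example: search_files(data_index_table, {"updated_after": "2014-01-01"})
```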
Specifically, the data restore function unit 115 searches for entries in which the cluster ID of the abnormally operating cluster (first file server cluster) is registered, with reference to the cluster IDs of the data index management table 122. The data restore function unit 115 acquires the corresponding data redundancy with reference to the data policy management table 123 on the basis of the application type 1005 of the matching entry. If the data redundancy is two or more, the redundancy has been reduced because the abnormally operating cluster is in an abnormal state. Therefore, the data restore function unit 115 again refers to the matching entry of the data index management table 122, and specifies a cluster ID other than that of the abnormally operating cluster. The data restore function unit 115 reads the file from the cluster (second file server cluster) indicated by the specified cluster ID in the same manner as the file read processing described above.
The processing of the data restore function unit 115 is called by the health check function unit 113, but may be executed by another appropriate trigger, or may be periodically executed.
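One possible shape of such a restore pass is sketched below; read_from_cluster() and write_to_cluster() are placeholders, and the final step of copying the file to another normally operating cluster and updating the index is an assumption about how the redundancy is restored, since those details are not spelled out here.

```python
def read_from_cluster(cluster_id, file_name):
    """Placeholder: read the file from one cluster (same manner as the file read flow)."""
    return b"example file contents"

def write_to_cluster(cluster_id, file_name, data):
    """Placeholder: store the file on one cluster (same manner as the file creation flow)."""

def restore_after_failure(failed_id, data_index_table, cluster_table, policy_table):
    normal = [c["cluster_id"] for c in cluster_table if c["operating_status"] == "normal"]
    for file_name, entry in data_index_table.items():
        if failed_id not in entry["cluster_ids"]:
            continue                                  # this file is not affected
        redundancy = policy_table[entry["application_type"]]["data_redundancy"]
        if redundancy < 2:
            continue                                  # no redundant copy to restore from
        # Read the file from a healthy cluster that also stores it (second file server cluster).
        source = next((c for c in entry["cluster_ids"] if c != failed_id), None)
        # Assumed restore step: copy it to a normal cluster that does not yet hold it.
        target = next((c for c in normal if c not in entry["cluster_ids"]), None)
        if source is None or target is None:
            continue
        write_to_cluster(target, file_name, read_from_cluster(source, file_name))
        entry["cluster_ids"] = [c for c in entry["cluster_ids"] if c != failed_id] + [target]
```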
First, the gateway device 30 refers to the cluster management table 121 and the data index management table 122, and acquires the name node addresses of the clusters in which the file to be deleted is stored.
The gateway device 30 transmits a file deletion request to the name nodes of the clusters #1001 and #1003, and if the response to the file deletion request is “acceptable”, the gateway device 30 executes the file deletion on the appropriate data nodes and deletes the file. Also, the gateway device 30 notifies the client 10 of the completion of the file deletion.
After receiving the file search API request, the gateway device 30 first searches the data index management table 122 according to the search condition included in the request, and returns the information of the matching entries to the client 10.
The health check function unit 113 of the gateway device 30 monitors the operating status of the file server clusters 40 (S2201). Also, the data control function unit 114 of the gateway device 30 receives the request for the distributed file system from the client device (S2202). The data control function unit 114 of the gateway device 30 selects one or more file server clusters 40 that are normally in operation (S2203). Also, the data control function unit 114 of the gateway device 30 distributes the request to the selected file server clusters 40 for transmission (S2204).
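Steps S2201 to S2204 can be summarized in the following sketch; probe_cluster() and forward_request() are placeholders for the actual health check and request transfer, and the ten-second interval is an arbitrary example value.

```python
import time

def probe_cluster(cluster):
    """Placeholder for the health check, e.g. a periodic inquiry to the cluster's name node."""
    return True

def forward_request(cluster, request):
    """Placeholder for transferring the request to one file server cluster."""
    return {"cluster": cluster["cluster_id"], "ok": True}

def health_check_loop(cluster_table, interval_sec=10):
    while True:                                       # S2201: monitor the operating status
        for cluster in cluster_table:
            cluster["operating_status"] = "normal" if probe_cluster(cluster) else "abnormal"
        time.sleep(interval_sec)

def distribute_request(cluster_table, request, redundancy=1):
    # S2202-S2204: receive the request, select normally operating clusters,
    # and distribute the request to the selected clusters.
    normal = [c for c in cluster_table if c["operating_status"] == "normal"]
    return [forward_request(c, request) for c in normal[:redundancy]]

# In practice health_check_loop() would run in a background thread or separate process.
```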
According to this embodiment, in a system in which a distributed file system configured by plural client terminals, servers, and local disks exchanges a large amount of data, the availability of the overall system can be improved.
Also, according to this embodiment, in response to a request to the distributed file system, the application extension gateway device can select appropriate servers that process the data at the level required by the application, and implement distributed processing of the data. Also, the application extension gateway device distributes and manages the data and the metadata of the data across the plural servers at the level required by the application, thereby being capable of continuing data processing without stopping the service when a fault occurs in a server or a local disk.
Further, according to this embodiment, the application extension gateway device can be introduced without changing the server software of the distributed file system. Further, in the gateway device, the management policy of data can be flexibly set and executed according to the application type, and an additional function unit such as the file search function unit can be added.
Also, according to this embodiment, no high-performance server is required, no software that manages a large amount of data is required, and the introduction is easy. On the other hand, in order to ensure reliability, the same processing is executed by the plural servers in parallel and the data is made redundant, thereby making it possible to maintain high reliability.
In the first embodiment, a case in which the plural file server clusters are configured in advance and the data can be distributed from the start has been described. In a second embodiment, a data migration method for a case in which only one file server cluster is present in the initial stage and data that has not been distributed is present will be described.
The operation after the data has been migrated is identical with that in the first embodiment. For that reason, in this embodiment, differences from the first embodiment will be mainly described.
According to this embodiment, data can be migrated by the gateway device from a system having only one file server cluster to the distributed system. Also, after the migration, the processing of the gateway device in the first embodiment can be applied. In the above example, a case in which only one file server cluster is provided has been described, but the present invention can also be applied to a case in which a new file server cluster is added to a system having plural file server clusters.
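As a rough illustration of such a migration pass, the sketch below reads each not-yet-distributed file from the existing cluster and stores copies on normally operating clusters according to the redundancy policy, registering it in the gateway's index. The helper functions and the procedure itself are assumptions for illustration; the concrete migration steps of this embodiment are not reproduced here.

```python
def list_files(cluster_id):
    """Placeholder: enumerate (file_name, application_type) pairs stored on the cluster."""
    return [("sensor-2014-01.log", "sensor-log")]

def read_from_cluster(cluster_id, file_name):
    """Placeholder: read the file from the existing cluster."""
    return b"example file contents"

def write_to_cluster(cluster_id, file_name, data):
    """Placeholder: store the file on one of the clusters of the distributed system."""

def migrate(old_cluster_id, cluster_table, policy_table, data_index_table):
    for file_name, app_type in list_files(old_cluster_id):     # data not yet distributed
        data = read_from_cluster(old_cluster_id, file_name)
        redundancy = policy_table[app_type]["data_redundancy"]
        targets = [c["cluster_id"] for c in cluster_table
                   if c["operating_status"] == "normal"][:redundancy]
        for target in targets:
            if target != old_cluster_id:                        # the original copy may be kept
                write_to_cluster(target, file_name, data)
        # Register the file so that later requests are handled through the gateway tables.
        data_index_table[file_name] = {"cluster_ids": targets, "application_type": app_type}
```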
The present invention is not limited to the above embodiments, and includes various modified examples. For example, in the above-mentioned embodiments, specific configurations are described in order to facilitate understanding of the present invention, but the present invention is not necessarily limited to configurations having all of the components described above. Also, a part of one configuration example can be replaced with another configuration example, and the configuration of another embodiment can be added to the configuration of one embodiment. Further, for a part of each configuration example, other configurations can be added, deleted, or substituted.
Also, parts or all of the above-described configurations, functions, and processing units may be realized, for example, as an integrated circuit or other hardware. Also, the above configurations and functions may be realized by a processor interpreting and executing programs that realize the respective functions; that is, they may be realized by software. The programs, tables, and files for realizing the respective functions can be stored in a storage device such as a memory, a hard disk, or an SSD (solid state drive), or on a storage medium such as an IC card, an SD card, or a DVD.
Also, only the control lines and information lines considered necessary for the description are illustrated; not all of the control lines and information lines required in a product are necessarily illustrated. In practice, it may be considered that almost all of the configurations are connected to each other.
Although the present disclosure has been described with reference to example embodiments, those skilled in the art will recognize that various changes and modifications may be made in form and detail without departing from the spirit and scope of the claimed subject matter.