This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2013-164452, filed on Aug. 7, 2013, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein is related to a storage system, a storage control device, and a storage medium storing a control program.
In recent years, distributed file systems that operate multiple file servers in the same manner as one file server and allow access to files across multiple computer networks have come into increasing use. A distributed file system enables multiple users to share files and storage resources on multiple machines. As for the way such distributed file systems are used, multiple file servers in the same building or on the same site used to be virtually integrated into one file server; nowadays, wide-area deployment of file servers on a global scale is becoming widespread.
A distributed file system is constructed of a global name space and a file system. The global name space integrates the file name spaces separately managed by the file servers into one, thereby realizing a virtual file name space, and is a core technology for distributed file systems. The distributed file system is a system that provides clients with the virtual name space created by the global name space.
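To make the global-name-space idea concrete, the following is a minimal illustrative sketch; the server names and paths are invented, not taken from the embodiment. One virtual path space maps each file onto the file server that actually holds it and onto that server's local name space:

```python
# Hypothetical sketch: the global name space maps a virtual path to the
# file server holding the file and to that server's local path.
namespace = {
    "/global/docs/a.txt": ("server_tokyo", "/export/docs/a.txt"),
    "/global/docs/b.txt": ("server_london", "/data/b.txt"),
}

# A client resolves a virtual path without knowing which server holds the file.
server, local_path = namespace["/global/docs/a.txt"]
print(server)  # -> server_tokyo
```

A client thus sees one uniform tree, while each server continues to manage only its own local name space.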
When a client such as a personal computer (PC) 91d out of clients 91a to 91d accesses a certain file, the client 91d issues a request to the name node 92 (1). Here, assume that files 94a, 94c, and 94d are triplexed by, for example, three data nodes 93a, 93c, and 93d out of the multiple data nodes 93a to 93d.
The name node 92 includes a meta-information storage unit 92a that stores therein meta-information including information on locations of the clients 91a to 91d and locations of the data nodes 93a to 93d, and file information, etc. The name node 92 instructs the data node 93d nearest to the client 91d, which has issued the request, to transfer the file on the basis of the meta-information (2). The data node 93d directly transfers the file to the client 91d on the basis of the instruction from the name node 92 (3).
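The selection of the nearest data node described above might be sketched as follows; the distance table and node names are hypothetical stand-ins for the location information held in the meta-information storage unit 92a:

```python
# Illustrative sketch: among the data nodes holding a replica of the
# requested file, pick the one nearest to the requesting client.
def nearest_replica(client, replicas, distance):
    """Return the replica data node with the smallest distance to the client."""
    return min(replicas, key=lambda node: distance[(client, node)])

# Hypothetical client-to-node distances (e.g. network latency).
distance = {
    ("client_91d", "data_93a"): 120,
    ("client_91d", "data_93c"): 45,
    ("client_91d", "data_93d"): 3,
}

print(nearest_replica("client_91d", ["data_93a", "data_93c", "data_93d"], distance))
# -> data_93d
```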
However, when the function of the name node 92 is distributed over multiple nodes, a problem arises in that the time taken for meta-information synchronization among the name nodes, which is required for maintaining the consistency of the global name space, needs to be shortened. Maintaining the consistency of the global name space here means keeping the meta-information consistent among the multiple name nodes.
According to an embodiment, it is possible to shorten the time taken for meta-information synchronization among name nodes.
According to an aspect of an embodiment, a storage system in which multiple nodes each including a storage device and a management device are connected by a network includes: a first management device, out of the multiple management devices, that stores, when data has been created, the data in the storage device of its own node and manages an identifier of the data in association with a storage location of the data in the storage device; and a second management device that receives, from the first management device and asynchronously with the creation of the data, an instruction to associate information indicating that the data is under the management of the first management device with the identifier of the data, and manages the information in association with the identifier.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. Incidentally, the embodiment does not limit the technology discussed herein.
First, a configuration of a distributed file system according to the embodiment is explained.
The name nodes 1 to 3 each have meta-information, and manage file names of files in the entire distributed file system 101. Furthermore, data nodes in which files are stored are placed in each area. Specifically, data nodes 61 to 63 are placed in the area 51, data nodes 71 to 73 are placed in the area 52, and data nodes 81 to 83 are placed in the area 53.
Incidentally, here, the name node and the data nodes placed in each area are collectively referred to as a node. For example, the name node 1 and the data nodes 61 to 63 form a node placed in the area 51.
Clients in each area request the name node in their area to allow a file access. That is, the clients 51a to 51c in the area 51 request the name node 1 to allow a file access, the clients 52a to 52c in the area 52 request the name node 2 to allow a file access, and the clients 53a to 53c in the area 53 request the name node 3 to allow a file access.
Meta-information is synchronized among name nodes.
Then, the name node 1 instructs the nearest name node 2 to create a copy of the file 6a, and the name node 2, as a slave name node, creates the file 6a in the data node 7 and stores meta-information of the file 6a in a meta-information storage unit 2a. A slave name node of a file here is a name node that manages the file as a slave subordinate to the master.
In this manner, in the distributed file system 101 according to the embodiment, a file and meta-information are synchronized between a master name node and a slave name node in real time. That is, in the distributed file system 101 according to the embodiment, when a file has been created in a master name node, the file and meta-information of the file are synchronized between the master name node and its slave name node.
On the other hand, when a file has been created in the name node 1, the name node 3 does not create a copy of the file in the data node 8. Like the name node 3, a name node that does not create data in a data node thereof when a file has been created in a master name node is here referred to as a dummy name node that operates as a dummy with respect to the file. Between a dummy name node and a master name node, real-time synchronization of meta-information is not performed, and meta-information synchronization is performed asynchronously with file creation. Furthermore, in the meta-information synchronization performed asynchronously with file creation, the dummy name node acquires only a part of the meta-information from the master name node.
Subsequently, a functional configuration of a name node according to the embodiment is explained. Incidentally, the name nodes 1 to 3 have the same functional configuration, so the functional configuration of the name node 1 is explained here as an example.
As illustrated in
The meta-information storage unit 10 stores therein information that the name node 1 manages, such as meta-information of a file and node location information.
The metadata includes an inode number, type, master, slave, create, time, path, and hashvalue as members.
Furthermore, the meta-information storage unit 10 stores therein log information of the file. The log information includes the number of accesses to the file from clients.
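The meta-information members listed above could be modeled, for illustration only, as follows; the field values and the access-count log are hypothetical examples:

```python
# Sketch of an inode record with the members named in the description
# (inode number, type, master, slave, create, time, path, hashvalue),
# plus per-file log information holding the number of client accesses.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Inode:
    number: str                 # e.g. "inode#x"
    type: str                   # this name node's role for the file: master/slave/dummy
    master: str                 # name node managing the file as master
    slave: str                  # name node managing the file as slave
    create: str                 # name node on which the file was created
    time: float = 0.0           # timestamp
    path: Optional[str] = None  # local path; None ("null") on a dummy name node
    hashvalue: str = ""         # hash value of the file contents

@dataclass
class LogInfo:
    access_count: int = 0       # number of accesses to the file from clients

meta = Inode("inode#x", "master", "name_node_1", "name_node_2", "name_node_1",
             path="/mnt1/A")
print(meta.type)  # -> master
```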
As illustrated in
Then, the file creating unit 11 of the name node 1 registers a file name “aaa” in a directory in a manner corresponding to inode#x. Then, the file creating unit 11 creates an inode indicating that an inode number is inode#x, type of the name node 1 is master, master is the name node 1, slave is the name node 2, create is the name node 1, and path is /mnt1/A.
Furthermore, a file creating unit of the name node 2 creates an actual file in the data node 7, and registers a file name “aaa” in a directory in a manner corresponding to inode#y. Then, the file creating unit of the name node 2 creates an inode indicating that an inode number is inode#y, type of the name node 2 is slave, master is the name node 1, slave is the name node 2, create is the name node 1, and path is /mnt1/B.
Incidentally, at this point, the name node 3, which is a dummy name node, does not create meta-information of the file whose file name is “aaa”. When the name node 3 has received an instruction for resynchronization, the name node 3 creates the meta-information of the file whose file name is “aaa”. The term “resynchronization” here means synchronization performed between a master and a dummy: whereas synchronization between the master and a slave is performed when a file is created, synchronization between the master and a dummy is not performed at that time, and is therefore carried out later.
Furthermore, so that a name node can create a file with the same file name as a file that has been created in the name node 1 while the meta-information on the file created in the name node 1 has not yet been reflected in the other name node, a file is identified by its file name plus its create information.
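The identification rule just described — a file is identified by its file name plus its create information — can be sketched as follows (node names are hypothetical):

```python
# Sketch: the (file name, creating name node) pair is the system-wide
# identity, so two files named "aaa" created on different name nodes
# do not collide before resynchronization has propagated everywhere.
def file_key(file_name, create_node):
    return (file_name, create_node)

k1 = file_key("aaa", "name_node_1")
k2 = file_key("aaa", "name_node_3")
print(k1 == k2)  # -> False: same name, different creator, distinct files
```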
The resynchronization unit 12 performs resynchronization of meta-information among name nodes regularly or on the basis of an instruction from a system administrator. As for a file of which the master is the name node 1, the resynchronization unit 12 instructs a dummy name node to create a dummy; as for a file of which the dummy is the name node 1, the resynchronization unit 12 creates a dummy. Creating a dummy here is to create meta-information of a file in a dummy name node.
Then, a resynchronization unit of the name node 3 registers a file name “aaa” in a directory in a manner corresponding to inode#z. Then, the resynchronization unit of the name node 3 creates an inode indicating that an inode number is inode#z, type of the name node 3 is dummy, master is the name node 1, slave is the name node 2, create is the name node 1, and path is null. The term “null” here indicates that there is no path to the file. That is, the dummy name node includes only meta-information, and the file is not in an area in which the dummy name node is placed.
In this manner, the resynchronization unit 12 performs resynchronization among the name nodes, so that, upon a request for access to a file for which it is the dummy, a dummy name node can recognize the destination to which the access request is to be transferred.
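As a hedged sketch of what resynchronization hands to a dummy name node (the field names follow the inode description above; everything else is invented): a dummy entry copies only the routing fields of the master's inode and sets path to null:

```python
# Sketch: build the dummy-side meta-information from the master's inode.
# Only routing fields are copied; no path and no file data are transferred.
def make_dummy_entry(master_inode):
    return {
        "type": "dummy",
        "master": master_inode["master"],
        "slave": master_inode["slave"],
        "create": master_inode["create"],
        "path": None,  # "null": the file itself is not in the dummy's area
    }

master_inode = {"type": "master", "master": "name_node_1",
                "slave": "name_node_2", "create": "name_node_1",
                "path": "/mnt1/A"}
dummy = make_dummy_entry(master_inode)
print(dummy["path"])  # -> None
```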
The file open unit 13 performs open processing, such as checking whether there is a file, in response to a file open request. When the name node 1 is a master of the file requested to be opened, the file open unit 13 performs open processing on the file in a data node of the name node 1, and instructs a slave name node to open the file. On the other hand, when the name node 1 is not a master of the file requested to be opened, the file open unit 13 transfers the file open request to a master name node, and transmits a response to a requestor on the basis of a response from the master name node.
The file reading unit 14 reads out data from an opened file and transmits the read data to a client. When the name node 1 is a master or slave of the file requested to be read, the file reading unit 14 reads out the file from a data node of the name node 1 and transmits the read file to the client. On the other hand, when the name node 1 is a dummy of the file requested to be read, the file reading unit 14 requests a master name node or a slave name node, whichever is closer to the name node 1, to transfer the file, and transmits the transferred file to the client.
As illustrated in
The file writing unit 15 writes data specified in a file write request from a client to a specified file. When the name node 1 is a master of the file to which the data is requested to be written, the file writing unit 15 writes the data to the file in a data node of the name node 1, and instructs a slave name node to write the data to the file. On the other hand, when the name node 1 is not a master of the file, the file writing unit 15 transfers the request to a master name node, and transmits a response to a requestor on the basis of a response from the master name node.
The file close unit 16 performs a process of completing input-output to a file specified in a file close request. When the name node 1 is a master of the file requested to be closed, the file close unit 16 performs the completing process on the file in a data node of the name node 1, and instructs a slave name node to close the file. On the other hand, when the name node 1 is not a master of the file requested to be closed, the file close unit 16 transfers the file close request to a master name node, and transmits a response to a requestor on the basis of a response from the master name node.
The file deleting unit 17 performs a process of deleting a file specified in a file delete request. When the name node 1 is a master of the file requested to be deleted, the file deleting unit 17 performs a process of deleting the file in the name node 1, and instructs a slave name node to delete the file. On the other hand, when the name node 1 is not a master of the file requested to be deleted, the file deleting unit 17 transfers the file delete request to a master name node, and transmits a response to a requestor on the basis of a response from the master name node.
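The open, write, close, and delete units described above all share one routing rule, which might be sketched as follows; this is a simplification that omits the instruction to the slave name node and the details of relaying the response:

```python
# Sketch of the common routing rule: handle the request locally when this
# name node is the file's master; otherwise forward it to the master.
def handle_request(local_node, inode, operation):
    if inode["master"] == local_node:
        # The master also instructs its slave name node (omitted here).
        return ("handled_locally", operation)
    # A non-master transfers the request and relays the master's response.
    return ("forwarded_to", inode["master"])

inode = {"master": "name_node_1", "slave": "name_node_2"}
print(handle_request("name_node_1", inode, "open"))  # -> ('handled_locally', 'open')
print(handle_request("name_node_3", inode, "open"))  # -> ('forwarded_to', 'name_node_1')
```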
The statistical processing unit 18 records log information including the number of accesses to a file from clients on the meta-information storage unit 10.
The migration unit 19 performs migration of a file on the basis of a migration policy.
“Schedule” indicates that migration is performed according to a schedule; the migration unit 19 migrates a specified file or a specified directory to a specified name node at a specified time. For example, when a file written in Tokyo is referenced or updated in London and is further referenced or updated in New York, by migrating the file regularly according to the time differences, the distributed file system 101 can speed up access to the file.
Incidentally, a migration schedule is shared by all name nodes, and migration is performed on the initiative of a master of each file. Name nodes which are not a master of a file ignore a migration schedule of the file.
“Manual” indicates that migration is performed on the basis of an instruction from a system administrator; the migration unit 19 migrates a specified file or directory to a specified name node.
“Automatic” indicates that migration is performed on the basis of access frequency; the migration unit 19 migrates a file to the node having the highest access frequency. For example, when a file created in Tokyo has been referenced or updated in Tokyo for a given period of time and is then referenced in New York over a long period of time, by migrating the file on the basis of the access frequency, the distributed file system 101 can speed up access to the file.
“Fixed” indicates that migration is not performed. For example, if a file created in Tokyo is used only in Tokyo, the file requires no migration.
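The four policies above could be dispatched, for example, as follows; the decision inputs (schedule flag, administrator request, access counts) are invented for illustration:

```python
# Sketch of dispatching over the four migration policies
# (schedule / manual / automatic / fixed).
def should_migrate(policy, *, scheduled_now=False, admin_request=False,
                   remote_accesses=0, local_accesses=0):
    if policy == "schedule":
        return scheduled_now          # migrate at the scheduled time
    if policy == "manual":
        return admin_request          # migrate on an administrator's instruction
    if policy == "automatic":
        return remote_accesses > local_accesses  # migrate toward higher access frequency
    return False                      # "fixed": never migrate

print(should_migrate("automatic", remote_accesses=50, local_accesses=10))  # -> True
print(should_migrate("fixed", remote_accesses=50, local_accesses=10))      # -> False
```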
The migration unit 19 performs, as a migration process, copying of a file from a source to a migration destination and update of meta-information.
In this case, as a migration process, the migration unit 19 of the name node 1 copies a file 6a from the name node 2 to the name node 1. Then, migration units of the name nodes 1 to 3 update meta-information of the file 6a.
Specifically, the migration unit 19 of the name node 1 updates type of the name node 1 from dummy (D) to master (M), and updates the master name node from the name node 2 to the name node 1, and updates the slave name node from the name node 3 to the name node 2. A migration unit of the name node 2 updates type of the name node 2 from master (M) to slave (S), and updates the master name node from the name node 2 to the name node 1, and updates the slave name node from the name node 3 to the name node 2. A migration unit of the name node 3 updates type of the name node 3 from slave (S) to dummy (D), and updates the master name node from the name node 2 to the name node 1, and updates the slave name node from the name node 3 to the name node 2.
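The role rotation just described — the migration destination was a dummy and becomes the master, the old master becomes the slave, and the old slave becomes a dummy — might be sketched as follows (node names follow the example):

```python
# Sketch: each involved name node updates its own inode entry after migration.
def update_after_migration(inode, local_node, new_master, old_master):
    if local_node == new_master:
        inode["type"] = "master"   # e.g. dummy -> master
    elif local_node == old_master:
        inode["type"] = "slave"    # master -> slave
    else:
        inode["type"] = "dummy"    # old slave -> dummy
    inode["master"] = new_master
    inode["slave"] = old_master    # the old master becomes the new slave
    return inode

inode = {"type": "dummy", "master": "name_node_2", "slave": "name_node_3"}
print(update_after_migration(inode, "name_node_1", "name_node_1", "name_node_2"))
# -> {'type': 'master', 'master': 'name_node_1', 'slave': 'name_node_2'}
```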
Incidentally, as for the name node 4 which is a dummy name node both before and after the migration, traffic related to the migration does not occur. Therefore, the distributed file system 101 can reduce traffic among name nodes at the time of migration.
When the name node 4 has received the file read request, as the name node 4 is not a master, a file reading unit of the name node 4 transfers the read request to the name node 2, which is determined from the meta-information to be the master. Incidentally, the master name node at this point is the name node 1; however, the meta-information of the name node 4 has not been updated, so the file reading unit of the name node 4 determines that the master is the name node 2.
When the name node 2 has received the file read request, as the name node 2 is not a master, a file reading unit of the name node 2 transfers the read request to the name node 1, which is determined from the meta-information to be the master. Then, the file reading unit 14 of the name node 1 reads out the file 6a from the data node 6, and transfers the file 6a together with its meta-information to the name node 4. Then, the file reading unit of the name node 4 transmits the file 6a to the client, and updates the meta-information. That is, the file reading unit of the name node 4 updates the master name node of the file 6a transmitted to the client to the name node 1, and updates the slave name node of the file 6a to the name node 2.
Subsequently, the flow of a file creating process is explained.
As illustrated in
Then, the file creating unit 11 transmits a slave-file create request to the name node 2, and the name node 2 receives the slave-file create request (Step S4). Then, the file creating unit of the name node 2 creates an actual file (Step S5), and creates an inode indicating type=slave (Step S6).
Then, the file creating unit of the name node 2 transmits a response of completion of slave file creation to the name node 1, and the file creating unit 11 of the name node 1 transmits completion of file creation to the client (Step S7).
In this manner, when a file has been created, a master name node performs the synchronization process with only a slave name node, and does not perform the synchronization process with the other name nodes, which are dummy name nodes; therefore, the distributed file system 101 can shorten the time taken to perform the synchronization process.
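The creation-time rule summarized above — synchronize with the slave only and leave dummies for later resynchronization — reduces, in sketch form, to the following (node names are hypothetical):

```python
# Sketch: at file creation, only the slave name node receives a slave-file
# create request; nodes that are neither master nor slave remain dummies
# and are synchronized later by resynchronization.
def creation_sync_targets(all_name_nodes, master, slave):
    # The master handles the file itself, so it is not a sync target.
    return [n for n in all_name_nodes if n == slave]

targets = creation_sync_targets(["name_node_1", "name_node_2", "name_node_3"],
                                master="name_node_1", slave="name_node_2")
print(targets)  # -> ['name_node_2']
```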
Subsequently, the flow of a resynchronization process is explained.
As illustrated in
Then, the resynchronization unit 12 of the name node 1 waits for completion responses of dummy creation from the dummy name nodes (Step S14), and, when having received the dummy creation responses from all the dummy name nodes, terminates the process.
In this manner, the resynchronization unit performs creation of a dummy asynchronously with file creation, so that the distributed file system 101 can maintain the consistency of meta-information among multiple name nodes.
Subsequently, the flow of a file reading process is explained.
As illustrated in
On the other hand, if the name node 4 is not a master, the file reading unit transfers the read request to a master name node (Step S23). Then, a file reading unit of the master name node reads out the file from a data node and transmits the read file to a dummy name node, i.e., the name node 4 (Step S24). Then, the file reading unit of the name node 4 transmits the file to the client (Step S25).
In this manner, a dummy name node transfers a file read request to a master name node, and therefore can respond to a request for reading of a file that is not in a data node thereof. Incidentally, here, a dummy name node transfers a file read request to a master name node; however, the dummy name node can transfer the file read request to the master name node or a slave name node, whichever is closer to the dummy name node.
Subsequently, the flow of the migration process is explained.
As illustrated in
Then, the migration unit 19 of the name node 1, which is the master name node after migration, copies the file from the requestor name node into the name node 1 (Step S32), and updates the inode information (Step S33). Specifically, as the update of the inode information, the migration unit 19 changes type of the name node 1 from dummy (D) to master (M), changes the master name node from the name node 2 to the name node 1, and changes the slave name node from the name node 3 to the name node 2.
Furthermore, the migration unit of the name node 3, which is the slave name node before migration, updates the inode information (Step S34), and sets the file as a deletable object (Step S35). Specifically, the migration unit changes type of the name node 3 from slave (S) to dummy (D), changes the master name node from the name node 2 to the name node 1, and changes the slave name node from the name node 3 to the name node 2.
Then, the migration unit of the name node 2 waits for completion responses from the name nodes 1 and 3 (Step S36), and, when having received responses from both, updates the inode information (Step S37). Specifically, the migration unit of the name node 2 changes type of the name node 2 from master (M) to slave (S), changes the master name node from the name node 2 to the name node 1, and changes the slave name node from the name node 3 to the name node 2.
In this manner, the migration unit does not transmit a migration request to any name node other than the new master name node and the slave name node; therefore, the distributed file system 101 can shorten the time taken to perform the synchronization process.
Subsequently, the flow of an after-migration file reading process is explained.
As illustrated in
On the other hand, if the name node 4 is not a master, the file reading unit transfers the file read request to the name node 2 which is a master name node in accordance with inode information (Step S42). Then, the file reading unit of the name node 2 receives the file read request, and determines whether the name node 2 is a master of the file to be read (Step S43). As a result, if the name node 2 is a master, the file reading unit of the name node 2 reads out the file from a data node and transfers the read file to the name node 4 (Step S46). Then, the process control moves on to Step S47.
On the other hand, if the name node 2 is not a master, the file reading unit of the name node 2 transfers the file read request to the master name node (the name node 1 in this example) (Step S44). That the name node 2 is not a master here means that the name node 4 received the file read request from the client after the master name node had been switched from the name node 2 to the name node 1 but before resynchronization was performed.
Then, the file reading unit 14 of the name node 1 receives the file read request, and reads out the file from a data node and transfers the read file to the name node 4 (Step S45). Then, the process control moves on to Step S47.
Then, the file reading unit of the name node 4 transmits the file to the client (Step S47), and updates the inode information (Step S48). Specifically, the file reading unit of the name node 4 changes the master name node from the name node 2 to the name node 1, and changes the slave name node from the name node 3 to the name node 2.
In this manner, when a dummy name node has transferred a file read request to an old master name node based on old inode information before migration, the old master name node transfers the file read request to a new master name node. Therefore, a file can be transferred from the master name node to the dummy name node.
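The forwarding chain described above for a read that arrives with stale meta-information might be sketched as follows (node names follow the example above):

```python
# Sketch: the dummy forwards the read request to the node it believes is the
# master; if that node is no longer the master, it forwards the request once
# more to the current master, and the file flows back to the dummy.
def resolve_read(start, believed_master, actual_master):
    hops = [start, believed_master]
    if believed_master != actual_master:
        hops.append(actual_master)   # the old master forwards the request on
    return hops

print(resolve_read("name_node_4", "name_node_2", "name_node_1"))
# -> ['name_node_4', 'name_node_2', 'name_node_1']
```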
Subsequently, the flow of a master switching process using a hash value is explained.
As illustrated in
The migration unit 19 of the name node 1, which is the master name node after migration, determines whether it holds the actual file to be migrated and whether the received hash value coincides with the hash value of the actual file held therein (Step S52). As a result, if the migration unit 19 holds the actual file to be migrated and the received hash value coincides with the hash value of the held file, the migration unit 19 skips copying of the file, and the process control moves on to Step S54. On the other hand, if the migration unit 19 does not hold the actual file to be migrated, or if the received hash value does not coincide with the hash value of the held file, the migration unit 19 copies the file from the old master name node (Step S53).
Then, the migration unit 19 updates the inode information (Step S54). Specifically, as the update of the inode information, the migration unit 19 changes type of the name node 1 from dummy to master, changes the master name node from the name node 2 to the name node 1, and changes the slave name node from the name node 3 to the name node 2.
Then, when having received a response from the name node 1, the migration unit of the name node 2 updates the inode information (Step S55). Specifically, the migration unit of the name node 2 changes type of the name node 2 from master to slave, changes the master name node from the name node 2 to the name node 1, and changes the slave name node from the name node 3 to the name node 2.
In this manner, when the migration unit 19 holds the actual file to be migrated and the received hash value coincides with the hash value of the held file, the migration unit 19 skips copying of the file; therefore, it is possible to reduce the load of the master switching process.
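The hash comparison above might look as follows; the embodiment does not fix a hash algorithm, so SHA-256 is assumed here purely for illustration:

```python
# Sketch: the new master copies the file only when it holds no local copy,
# or when its local copy's hash differs from the hash value received from
# the old master name node.
import hashlib

def needs_copy(local_file_bytes, received_hash):
    if local_file_bytes is None:          # no actual file held locally
        return True
    local_hash = hashlib.sha256(local_file_bytes).hexdigest()
    return local_hash != received_hash    # copy only on mismatch

data = b"contents of file 6a"
h = hashlib.sha256(data).hexdigest()
print(needs_copy(data, h))  # -> False: an identical copy is held, copying is skipped
print(needs_copy(None, h))  # -> True
```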
Subsequently, the flow of an automatic migration process based on access frequency is explained.
As illustrated in
On the other hand, if the number of accesses to the file exceeds the threshold value, the migration unit requests a master for migration (Step S63). Then, a migration unit of the master name node determines whether the number of accesses from the requesting dummy name node is larger than the number of accesses of the master name node (Step S64), and, if the number of accesses from the requesting dummy name node is larger, the migration unit of the master name node performs a migration process illustrated in
Then, the migration unit of the dummy name node determines whether there is a dummy file that has not yet been checked as to whether automatic migration thereof is necessary (Step S66); if there is, the process control returns to Step S62, and if not, the process is terminated.
In this manner, the migration unit performs the automatic migration process, so that a file can be migrated to an area having a high access frequency and the distributed file system 101 can speed up access to the file.
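The two checks in the automatic migration flow above — the dummy's threshold test and the master's comparison of access counts — can be sketched as follows; the threshold and access counts are invented for illustration:

```python
# Sketch: a dummy name node requests migration only when its access count for
# the file exceeds a threshold, and the master grants the request only when
# the dummy's access count exceeds the master's own.
def dummy_should_request(dummy_accesses, threshold):
    return dummy_accesses > threshold

def master_should_migrate(dummy_accesses, master_accesses):
    return dummy_accesses > master_accesses

print(dummy_should_request(120, 100))   # -> True
print(master_should_migrate(120, 30))   # -> True
print(master_should_migrate(120, 500))  # -> False
```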
As described above, in the embodiment, at the time of file creation, a master name node performs synchronization of meta-information with only a slave name node, and does not perform the meta-information synchronization with a dummy name node. The master name node performs the meta-information synchronization with the dummy name node asynchronously with file creation. Therefore, the distributed file system 101 can shorten the time required for meta-information synchronization performed among name nodes.
Furthermore, in the embodiment, the same meta-information as that stored in the master name node is stored in the slave name node, and a file is stored in a data node placed in the same node as the slave name node and a data node placed in the same node as the master name node. Therefore, the distributed file system 101 can provide a highly-reliable file system.
Moreover, in the embodiment, a migration unit of the master name node transmits a migration request to a dummy name node that is the migration destination, and the dummy name node that has received the migration request becomes the master name node of the file to be migrated. Therefore, when the same file is accessed from multiple areas among which there are time differences, the area in which the file is stored is switched according to the time difference, so the distributed file system 101 can speed up access to the file.
Furthermore, in the embodiment, a migration unit of the migration destination determines the presence or absence of a file to be subject to migration, and, if the migration destination has the file, the migration unit does not copy the file from a migration source. Therefore, the distributed file system 101 can shorten the time required for a migration process.
Moreover, in the embodiment, at scheduled time, a migration unit of the dummy name node determines whether the number of accesses to a file to be subject to automatic migration exceeds a predetermined threshold value, and, if it exceeds the predetermined threshold value, transmits a migration request to the master name node. Therefore, the distributed file system 101 can place the file in a node having a high access frequency, and can speed up access to the file.
Incidentally, in the embodiment, name nodes are discussed; however, the components of a name node can be realized by software, whereby a name management program having the same functions as a name node can be obtained. A computer that executes the name management program is explained below.
The main memory 110 is a memory that stores therein a program and an intermediate result of execution of the program, etc. The CPU 120 is a central processor that reads out the program from the main memory 110 and executes the program. The CPU 120 includes a chip set having a memory controller.
The LAN interface 130 is an interface for connecting the computer 100 to another computer via a LAN. The HDD 140 is a disk device that stores therein a program and data, and the super IO 150 is an interface for connecting input devices such as a mouse and a keyboard to the computer 100. The DVI 160 is an interface for connecting a liquid crystal display device to the computer 100, and the ODD 170 is a device that performs reading and writing to a DVD.
The LAN interface 130 is connected to the CPU 120 by PCI Express (PCIe), and the HDD 140 and the ODD 170 are connected to the CPU 120 by SATA (Serial Advanced Technology Attachment). The super IO 150 is connected to the CPU 120 by LPC (Low Pin Count).
The name management program executed by the computer 100 is stored on a DVD, and is read out from the DVD by the ODD 170 and installed on the computer 100. Or, the name management program is stored in a database of another computer connected to the computer 100 via the LAN interface 130, and is read out from the database and installed on the computer 100. Then, the installed name management program is stored in the HDD 140, and is read into the main memory 110 and executed by the CPU 120.
Furthermore, in the embodiment, the case where a data node stores a file is described; however, the present invention is not limited to this, and can also be applied to a case where a data node stores another form of data.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2013-164452 | Aug 2013 | JP | national |