The present application claims priority from Japanese Patent Application 2013-029852 filed on Feb. 19, 2013, the content of which is hereby incorporated by reference into this application.
The present invention relates to an apparatus and a method for preventing duplication of data strings of files in an autonomous distributed type file system, and, more particularly, to an effective technique applicable to controlling replicated data of a storage device which can be connected to a plurality of networks of different kinds.
With a rapid increase in the amount of data handled in computer systems, techniques have been developed for realizing high-speed, large-volume access to storage device systems. In such a technique, a plurality of disk array devices (hereinafter referred to as storage device systems) and servers are connected using a dedicated network (Storage Area Network, hereinafter referred to as SAN), to manage enormous data with high efficiency. To realize high-speed data transfer by connecting the storage device systems and the servers through the SAN, a network is generally constructed using a communication unit in accordance with a fiber channel protocol.
In general, contents that have different file names are stored independently in a storage device even if the contents are exactly the same. The storage capacity is thus consumed wastefully. It is therefore important to attain a technique for preventing storage of files with duplicated contents.
Japanese Unexamined Patent Application Publication No. 2009-237979 discloses a server which reduces the growth of data stored in a plurality of file servers and can cut down the storage cost for file maintenance. In the invention of Japanese Unexamined Patent Application Publication No. 2009-237979, when duplicated files are included in the files stored by a controlling file server, a proxy server for file server management causes a user terminal to show the files as a plurality of files while, in fact, only one file is stored, thereby attempting to reduce the duplicated files. According to this server, in response to a request from a user terminal for storing a file, a file access management unit acquires a hash value of the requested file and checks, based on the hash value, whether the same file exists. A file management unit manages only registration information regarding the requested file if the same file already exists, and manages both the registration information and the file data if it does not.
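By way of a non-limiting illustration, the hash-based registration scheme of this publication can be sketched as follows; the class and method names are assumptions of the sketch, not drawn from the publication itself:

```python
import hashlib

class DedupFileServer:
    """Toy model: file data is stored at most once; each file name keeps
    only a registration record pointing at the stored contents."""

    def __init__(self):
        self.blobs = {}     # hash value -> file data (the single stored copy)
        self.registry = {}  # file name -> hash value (registration information)

    def store(self, name, data):
        h = hashlib.sha256(data).hexdigest()
        # If the same contents already exist, register only the metadata.
        if h not in self.blobs:
            self.blobs[h] = data
        self.registry[name] = h

    def read(self, name):
        return self.blobs[self.registry[name]]

server = DedupFileServer()
server.store("a.txt", b"same contents")
server.store("b.txt", b"same contents")   # duplicate: only a registration is added
print(len(server.blobs))                  # 1
print(server.read("b.txt"))               # b'same contents'
```

The user terminal thus sees two files, while only one copy of the data occupies the storage.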
Japanese Unexamined Patent Application Publication No. 2009-129441 discloses a technique for deduplicating data in a storage system by calculating a hash value of a current virtual file and searching for real file information based on the same hash value. In the invention of Japanese Unexamined Patent Application Publication No. 2009-129441, an attempt is made both to reduce the consumed storage capacity through duplicate prevention and to preserve data security. That is, duplicated real data is deleted; however, the duplicate prevention process is not performed once the duplication degree becomes equal to or greater than a threshold value. This makes it possible to ease problems such as the risk of loss of stored data and a decrease in reliability and performance over a plurality of target data items.
In the field of big data, for storing and processing data in a range from several hundred terabytes to several hundred petabytes, it is demanded to realize both a distributed storage system that can be accessed in parallel by distributing storage device units, and a duplicate prevention technique that allows enormous data to be stored.
Duplicate prevention performed in conventional storage units reduces the amount of real data by removing sectors whose contents are entirely identical.
In the invention of Japanese Unexamined Patent Application Publication No. 2009-237979, the amount of data is reduced by deleting duplicated files. However, the opportunity is lost for a plurality of user terminals to access these data items in parallel.
In the invention of Japanese Unexamined Patent Application Publication No. 2009-129441, duplicated real data is deleted until the duplication degree reaches a threshold value. The amount of data is reduced by the deletion of the duplicated real data. However, the opportunity is lost for a plurality of user terminals to access these data items in parallel.
Accordingly, Japanese Unexamined Patent Application Publication Nos. 2009-237979 and 2009-129441 give no sufficient consideration to achieving both duplicate prevention of the same data in the file system and parallel access processing.
An object of the present invention is to provide an autonomous distributed type file system, a storage device, and a data access method which achieve both prevention of excess duplication of the same data, in order to increase the effective amount of data storage, and parallel access processing.
A typical example of the present invention is as follows. A file system may be an autonomous distributed type file system which is connected to a data reference device through a first network, comprising: a plurality of storage device units which are mutually connected through a second network and connected to the first network; a storage directory; and a duplicated data maintaining unit, each of the storage device units including a local storage, wherein the storage directory has a function of keeping, in relation to data to be kept, an ID of a logical block and an ID of a physical block of the local storage of each of the storage device units, a value of a link to a node ID of the same or another storage device unit, and a value of a link to the logical block ID at that node ID, and wherein the duplicated data maintaining unit refers to the storage directory, continuously keeps, in duplicate, one real data item of the data and at least one replicated data item so long as the storage capacity of each of the storage device units is not exhausted, and restricts or prevents writing of the replicated data when there is no free space in the storage capacity.
According to the present invention, the duplication degree of the same data in the file system is appropriately controlled, and it is possible to realize both prevention of excess duplication and parallel access.
According to a typical embodiment of the present invention, an autonomous distributed type file system which is connected to a data reference device includes a function/configuration for writing files (data strings) and preventing duplication thereof. Each of the files is a holder for keeping data, or the data itself that is kept, and a single file is composed of sequential record strings. In the record strings of one file, a pointer for referring to another file is embedded as a link. In the autonomous distributed type file system of the present invention, a link is set up to the same part of files (data strings) in different storage device units, that is, to the same contents of real data. The file system continues to keep the entity of the real data so long as the storage capacity of the corresponding storage device unit is not exhausted. Further, at the time of reading out data, the system reads the file contents located in the nearest position, thereby making it possible to reduce the access time and to perform parallel access. When the storage capacity of the corresponding storage device unit is nearly exhausted, a link is set up to the same part of the real data, its entity (or entities) is deleted, and the number of entities with the same contents is decreased. This makes it possible to increase the amount of stored data (different data) and to maintain the efficiency of parallel processing, without increasing the total storage capacity of the file system.
In the present invention, “data” implies data based on a write request from the data reference device, in other words, data kept in different files. For example, assume that the full text dall of a particular research paper is composed of a title (d1)+an abstract (d2)+a body text (d3 to d98)+a conclusion (d99). The “data” may be each of data D1-99 of the full text dall, data D2 of the abstract (d2), and data D20-25 of a particular subject matter (d20 to d25) of the body text. These “data” items are kept in respectively different files. “Same data items” imply, for example, data D20-25 and D′20-25 of the same particular subject matter (d20 to d25). On the contrary, the data D20-25 and the data D3-98 of the body text including it are not the same data, but different data.
Descriptions will now be made to details of the present invention, with reference to the drawings.
A server connected to a network will hereinafter be described by way of example, as a data reference device for the autonomous distributed type file system. However, the present invention is not limited to this, and is applicable to various terminals.
In the autonomous distributed type file system, a plurality of servers as data reference devices are connected through a plurality of access paths, and each of the access paths is connected to a storage device unit which stores files keeping data. That is, the plurality of servers 1000 (“a” to “n”) are connected to a plurality of autonomous distributed type storage device units 1001 (“a” to “m”) through a first network 1006. Each of the storage device units (hereinafter referred to also as nodes) 1001a to 1001m writes or reads out data of a file (data string), based on a request from each server.
The storage device units 1001 (“a” to “m”) are mutually connected through a second network 1007. The first network 1006 and the second network 1007 may include, for example, an SAN, a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, a public line, or a dedicated line. For example, when the network is a LAN or a WAN, the plurality of storage device units and servers are mutually connected through an NAS (Network Attached Storage), and communication is performed in accordance with the TCP/IP protocol. When the network is an SAN, communication is performed in accordance with a fiber channel protocol. In this case, the first network 1006 is configured with an SAN, while the second network 1007 is configured with a LAN.
Each of the storage device units 1001 (“a” to “m”) includes a storage interface 1101, a local storage 1102, and a local controller 1103. The local controller 1103 includes a hash value calculation unit 1130 which calculates a hash value, a data comparison unit 1131 which compares data items, a hash value comparison unit 1132 which compares hash values of data, a network interface 1133, a storage directory 1134, and a duplicated data maintaining unit 1135.
The number of storage device units 1001 (“a” to “m”) in the entire system may be determined appropriately depending on the use. As an example, one file system is preferably configured with a plurality of storage device units 1001, for example, ten or fewer. Each of the storage device units 1001 (“a” to “m”) is assigned a unique node ID value in advance. For example, the smallest ID value is given to the storage device unit 1001a, while the largest ID value is given to the storage device unit 1001m. The assignment may be reversed, or any other setting is possible. Descriptions will hereinafter be made to a case where the smallest ID value is given to the storage device unit 1001a.
Each of the storage device units 1001 (“a” to “m”) includes channel control units 1101 (functioning as a storage interface), a local storage 1102, and a local controller 1103. The local controller 1103 includes a network interface 1133, a connection unit 1137, and a management terminal 1140, and controls the local storage 1102 in accordance with commands received from the servers 1000 (“a” to “n”). For example, upon reception of a data input/output request from the server 1000a, the controller performs a process for inputting/outputting data stored in the local storage 1102a. The local controller 1103a also gives and receives various commands for managing its storage device unit 1001a and for interaction with each of the servers 1000 (“a” to “n”).
The channel control units 1101 are assigned respective network addresses (for example, IP addresses). The local controller 1103 receives file access requests sent from the server 1000 through the SAN 1006 via the channel control units 1101. The server 1000 sends a data access request (block access request) in units of blocks, in accordance with a fiber channel protocol, to each of the storage device units 1001.
The local storage 1102 includes a plurality of disk drives (physical disks), and provides the server 1000 with a storage area. Data is stored in a logical volume (LU) as a storage area set logically on the physical storage area provided by the disk drive. The local storage 1102 may have a configuration of a disk array using, for example, the plurality of disk drives. In this case, the storage area provided for the server 1000 is provided using the plurality of disk drives managed by RAID (Redundant Arrays of Inexpensive Disks).
Disk control units 1139 for controlling the local storage 1102 are provided between the local controller 1103 and the local storage 1102. Data or commands are given and received between the channel control units 1101 and the disk control units 1139 through the connection unit 1137.
The disk control units 1139 write data into the local storage 1102 in accordance with a data write command which is received by the channel control unit 1101 from the server 1000. A data access request for the LU based on a logical address specification, transmitted by the channel control unit 1101, is converted into a data access request for a physical disk based on a physical address specification. When the physical disks in the local storage are managed by RAID, data access is performed in accordance with the RAID configuration. The disk control unit 1139 also performs control for managing replication of the data stored in the local storage 1102, as well as backup.
The management terminal 1140 is a computer which maintains and manages the storage device unit 1001. As illustrated in
The memory 1142 stores a physical disk management table 1143, an LU management table 1144, a storage directory 1134, and a program 1146. The CPU 1141 executes the program 1146, thereby controlling the management terminal 1140 entirely.
The storage directory 1134 manages writing or reading of data to or from each server, for each storage device unit 1001 (“a” to “m”) in the autonomous distributed type file system, in accordance with the free space of the storage device unit. The storage directories 1134 (“a” to “m”) are configured to be mutually connected with each other. Thus, the storage directory 1134 is configured to include some of the functions inherent in the LU management table or the physical disk management table. That is, each of the storage directories 1134 includes some or all of the functions of the physical disk management table 1143 and the LU management table 1144, and is configured as a higher-level table than these. Alternatively, the LU management table 1144 may be omitted, and the storage directories 1134 may be provided for the storage device units in one-to-one correspondence.
The physical disk management table 1143 is a table for managing the physical disks (disk drives) included in the local storage 1102. This physical disk management table 1143 records and manages disk numbers of the plurality of physical disks (included in the local storage 1102), the capacity of the physical disks, the RAID configuration, and the status of use. The LU management table 1144 is to manage the LU which is logically set on each of the physical disks. This LU management table 1144 records and manages the LU numbers of the plurality of LUs set on the local storage 1102, the physical disk numbers, the capacity, and the RAID configuration. The port 1147 is connected to an internal LAN or SAN. The storage device 1148 is, for example, a hard disk device, a flexible disk device, or a semiconductor device.
The logical block ID 11341 is a logical file path managed in each storage device unit 1001 (1001a to 1001m), and is uniquely set to each of all the files of the local storage. For example, logical block IDs “4000”, “4001”, “4002”, and “4003” . . . are set in the storage device unit 1001e.
The physical block ID 11342 is a real file path of a file actually stored in each storage device unit 1001 (1001a to 1001m). For example, in the storage device unit 1001e, “5123” is set as the physical block ID corresponding to the logical block ID “4000”, that is, the ID of the physical block in which the real data of the file is stored. Each server can access the file of the storage device unit 1001 using the IDs of this storage directory 1134.
The hash value 11343 indicates a hash value (“6100” or the like) of a file, which is used for accessing the file. When files are duplicated, the same hash value is given. Instead of the hash value, any other feature value may be used.
The link 11344 to the node ID indicates a link from the storage device unit 1001 of the own node to the storage device unit of another node. The link 11345 to the block ID indicates a link to the logical block ID at that node. For example, in association with the logical block ID 4002 of the storage device unit 1001e, a link is set up to the logical block ID 4121 of the storage device unit 1001c for the data of the hash value 6103.
The in-process flag 11346 indicates whether each node is in an in-process state (=1) or not (=0).
Each of the other storage device units also includes the same storage directory 1134 as that of the storage device unit 1001e.
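The directory fields 11341 to 11346 described above may be modeled, purely for illustration, as one record per logical block; the Python field names and types below are assumptions of the sketch, not part of the disclosure:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DirectoryEntry:
    """One row of the storage directory 1134 (field names are illustrative)."""
    logical_block_id: int                 # e.g. 4000 (field 11341)
    physical_block_id: Optional[int]      # e.g. 5123; None when only a link exists (11342)
    hash_value: int                       # feature value of the data, e.g. 6100 (11343)
    link_node_id: Optional[str] = None    # link to the node ID of another unit (11344)
    link_block_id: Optional[int] = None   # logical block ID at the link target (11345)
    in_process: int = 0                   # in-process flag: 1 while being updated (11346)

# The example from the text: logical block 4002 of unit 1001e holds data with
# hash value 6103 and links to logical block 4121 of unit 1001c.
entry = DirectoryEntry(logical_block_id=4002, physical_block_id=None,
                       hash_value=6103, link_node_id="1001c", link_block_id=4121)
print(entry.link_node_id, entry.link_block_id)   # 1001c 4121
```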
As an alternative to this embodiment, a management server may be provided and connected to the first network and the second network of the autonomous distributed type file system. Some of the functions of the local controller 1103 of each storage device unit may be centrally managed by this management server. That is, the storage directory 1134 is provided in this management server, while the physical disk management table and the LU management table are provided in each storage device unit. The logical position, the data, and the feature value in each storage device unit 1001 at the time of data writing are kept in the storage directory of the management server. In this case, at the time of data readout, the server inquires of this management server, which obtains the position of the storage device unit having the data with reference to the storage directory 1134.
Descriptions will now be made to characteristic functions of the autonomous distributed type file system according to this embodiment, with reference to
The local controller of each of the storage device units b and e includes a duplicated data maintaining unit and a function of calculating and comparing hash values and data values. With this function, when there is free space in the logical blocks of the local storage, in other words, as long as the storage capacity is not exhausted, one real data item and at least one replicated data item are continuously kept in duplicate. When there is no free space in the logical blocks, in other words, when the storage capacity has no extra space, writing of replicated data is restricted or prevented. More specific descriptions will be made below.
(1) Each storage device unit 1001 calculates and records a feature value (hash value or the like) of data held by the own node, in the storage directory 1134.
(2) When a server (connected to the storage) writes new data D into a logical position p (of logical/physical blocks), the storage device unit which has received the data (1001e in this example) calculates a feature value (hash value) H of the new data D, extracts data having the same hash value from the list of feature values recorded in the own node, and sets up a link to data D′ if the own node has data D′ which duplicates the new data D.
(3) The storage device unit 1001e which has received the data reports the feature value (hash value) H of the new data D to another storage device unit i (hereinafter represented as the storage device unit 1001b) included in the storage system.
(4) The storage device unit b which has received the feature value selects data having the same hash value from the list of feature values recorded in the own node. The storage device unit b, if it holds the same value H′, requests the storage device unit e for the data D.
(5) The storage device unit e transfers data D to the storage device unit b.
(6) The storage device unit b determines whether the own node has the same data D′ as the data D, and sends the determination result to the storage device unit e.
(7) When there is a storage device unit b having the same data D′, the storage device unit e keeps the data D as a replica of the data D′, creates a link from the data D to the data D′, and records it in the storage directory 1134e. The creation of the link to the storage device unit b implies that the data D is marked as “data that can be prevented from being duplicated” when the storage capacity of the storage device unit e is nearly exhausted.
The storage directory of the storage device unit b records that the data D′ (the same as the data D) held by the storage device unit b is linked to from another unit.
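Steps (1) to (7) above may be sketched with a simplified in-memory node model; the Node class, its dictionaries, and SHA-256 as the feature value are all assumptions of this sketch:

```python
import hashlib

def feature(data):
    """Feature value (hash value) of the data; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}       # logical position -> data (real data or replica)
        self.directory = {}   # logical position -> {"hash": ..., "link": ...}

    def write(self, pos, data, peers):
        # Steps (1)-(2): record the feature value of the newly written data.
        h = feature(data)
        self.store[pos] = data
        self.directory[pos] = {"hash": h, "link": None}
        # Steps (3)-(7): report the hash; a peer with the same hash pulls the
        # data, compares it, and the match is recorded as a link in the directory.
        for peer in peers:
            match = peer.check_duplicate(h, data)
            if match is not None:
                # Keep D as a replica of D' and record the link (node, position).
                self.directory[pos]["link"] = (peer.node_id, match)
                break

    def check_duplicate(self, h, data):
        # Steps (4)+(6): hash comparison first, then full data comparison.
        for pos, meta in self.directory.items():
            if meta["hash"] == h and self.store.get(pos) == data:
                return pos
        return None

e, b = Node("1001e"), Node("1001b")
b.write(10, b"subject matter d20-d25", peers=[])
e.write(20, b"subject matter d20-d25", peers=[b])
print(e.directory[20]["link"])   # ('1001b', 10): D is kept as a replica with a link
```

Note that the entity of D remains in node e; the link only marks it as removable when capacity runs short.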
(1) The server (x) specifies a logical position p, and requests the storage device unit e for data D.
(2) The storage device unit e sends the data D when it holds the data D at the logical position p.
(3) When the own node does not have the requested data but there is a link for “p”, the storage device unit e requests the storage device unit b, as the link destination, to transfer the data D′.
(4) The storage device unit e receives the data D′ from the storage device unit b, and thereafter sends the data D′ to the server.
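The readout procedure (1) to (4) can be illustrated with a minimal link-following sketch; the class layout and method names are assumptions:

```python
class StorageNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}   # logical position p -> data held by the own node
        self.links = {}    # logical position p -> (destination node ID, position)

    def read(self, p, cluster):
        # (2) Serve the data directly when the own node holds it.
        if p in self.blocks:
            return self.blocks[p]
        # (3)-(4) Otherwise follow the link and forward the destination's copy.
        if p in self.links:
            dest_id, dest_p = self.links[p]
            return cluster[dest_id].read(dest_p, cluster)
        raise KeyError(f"no data and no link for logical position {p}")

b = StorageNode("1001b")
e = StorageNode("1001e")
cluster = {"1001b": b, "1001e": e}
b.blocks[10] = b"D'"
e.links[20] = ("1001b", 10)    # the entity was deleted from e; only the link remains
print(e.read(20, cluster))     # b"D'"
```

As noted below for the first embodiment, the transfer may also bypass node e and go directly from node b to the server.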
Descriptions will now be made to a process mainly performed by the duplicated data maintaining unit 1135, when a server writes data into one storage device unit e, with reference to
Upon reception of written data (D1) from a server (x) (S2001), the storage device unit e determines whether there is free space in logical blocks of the storage directory 1134e of the own node (S2002).
If there is no free space in the logical blocks of the directory, it is determined that “there is no free space” in the storage, and the process ends (S2003). If there is free space in the logical blocks of the directory, it is determined whether there is free space in the physical blocks of the directory (S2004). If there is no free space in the physical blocks of the directory (“NO” in S2004), it is determined whether there is a logical block having a link in the directory (S2005). If there is no logical block with a link, a response of “there is no free space” is made, and the process ends (S2003). If there is a duplicated physical block (for example, a physical block “D2”) in the directory, the pointer to that physical block is deleted (S2006), and a free block is secured. Then, the data (D1) is stored in this free block, an entry for this block is created in the storage directory (S2007), and an “in-process flag” is set in the storage directory (S2008).
Further, a hash value H1 of the data D1 is calculated (S2009). It is determined whether there is a block having the same hash value H1 in the storage directory of the own node (S2010). If there is a block having the same hash value H1, it is determined whether there is a block having the same data D1′ in the storage directory of the own node (S2011). When the block having the same data D1′ belongs to a different file, the flow proceeds to Step 2019, in which a link to the data D1′ is created, and the data D1 is treated as a replicated block of the data D1′. When no block having the same data is in the storage directory of the own node, the hash value H1 is delivered to the other nodes (S2012).
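The admission-control part of this flow (S2002 to S2007) may be sketched as follows; the directory entry layout and the fixed block counts are assumptions of the sketch:

```python
def handle_write(directory, data_id, max_logical, max_physical):
    """Sketch of S2002-S2007: decide whether one write can be admitted.
    Each entry: {"id", "has_entity", "replica"} (layout is an assumption)."""
    if len(directory) >= max_logical:              # S2002: logical blocks full
        return "no free space"                     # S2003
    entities = [en for en in directory if en["has_entity"]]
    if len(entities) >= max_physical:              # S2004: physical blocks full
        # S2005/S2006: free a physical block by dropping a replicated block's entity.
        victims = [en for en in entities if en["replica"]]
        if not victims:
            return "no free space"                 # S2003
        victims[0]["has_entity"] = False           # delete pointer to the physical block
    directory.append({"id": data_id, "has_entity": True, "replica": False})  # S2007
    return "stored"

# A replicated block "D2" occupies the only physical block; writing "D1" evicts it.
d = [{"id": "D2", "has_entity": True, "replica": True}]
print(handle_write(d, "D1", max_logical=4, max_physical=1))  # stored
print(d[0]["has_entity"])                                     # False: pointer deleted
```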
In Step 2017 of
After this, in Step 1115 (b, e), the storage device unit 1001b receives written data Dn at “t=t3”, while the storage device unit 1001e receives written data Dm at “t=t4” from the servers. In a state where there is free space in the logical blocks and there is no free space in the physical blocks in the directories of both storage device units (corresponding to “NO” in S2004 of
As a comparative example,
In the present invention, according to the “function of keeping one real data item in a particular node, keeping one or more replicas in this particular node or another node, or creating a link”, when there are a plurality of data items with the same entities, only the data of the particular node (for example, the data of a small node ID) remains, thus preventing the entities of data from being simultaneously deleted from all the storage device units 1001. As a method for setting the particular node, the magnitude relation of the ID values of the nodes may be reversed, or the magnitude relation may be determined using a reference value (for example, an intermediate value) instead of the minimum ID value.
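The particular-node selection policy can be illustrated by a small sketch; the numeric node IDs and the concrete tie-breaking behavior are artifacts of the sketch, not of the invention:

```python
def entity_keeper(node_ids, reference=None):
    """Pick the node that keeps the sole real data item (an assumed policy sketch).
    By default the smallest node ID wins; as the text notes, the relation may be
    reversed, or judged against a reference value such as an intermediate ID."""
    if reference is None:
        return min(node_ids)
    # Nearest node ID to the reference value keeps the entity.
    return min(node_ids, key=lambda n: abs(n - reference))

print(entity_keeper([3, 7, 5]))               # 3: smallest ID keeps the entity
print(entity_keeper([3, 7, 5], reference=6))  # 7: closest to the reference value
```

Because every node applies the same deterministic rule, all nodes agree on which entity survives without any extra coordination.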
Instead of performing the procedures from S1504 to S1506, when the storage directory e has a link corresponding to the logical position p (“YES” in S1502), the storage device unit 1001e may transmit a request to the storage device unit b, as the link destination, to transmit the data from the storage device unit b directly to the server (x). The storage device unit b which has received this request may directly transmit the requested data to the server (x).
In this embodiment, when there is free space in the logical block of the directory, duplicate writing of the same data is permitted. Thus, it is possible to reduce the access time from the server (x) and to perform parallel access from a plurality of servers (x).
That is, when a plurality of servers transmit requests for reading/writing data to the storage device units connected through the plurality of access paths, each server can transmit a request for reading/writing data to/from a different storage device unit, and each storage device unit can individually process the requested reading/writing of data. Thus, requests are not concentrated on one storage device unit, thereby allowing high-speed access to the data.
Descriptions will now be made to a process in a case where a plurality of servers perform accessing to one storage device unit in the file system of this embodiment, using
The data writing process is continuously performed by the “function of keeping one real data item in a particular node, and keeping one or more replicas in this particular node or another node, or creating a link” in this embodiment. In the end, one real data item each of D1, D2, . . . , DZ, and one or a plurality of replicated data items D1′, D2′, . . . , DZ′ are kept in the file system, and one link to each of these data items is created. To store data evenly and efficiently in the storage device units and increase the amount of stored data (different data items) in the file system, it is necessary to operate the duplicated data maintaining unit, that is, to prevent a plurality of replicated data items from being kept in the particular node, through the procedure after “YES” of Step 2018 of
Accordingly, the file system of this embodiment permits the same data to exist in duplicate in the own node or another node, and keeps the other data with a link set up thereto, when there is free space in the logical blocks or physical blocks of the storage device unit 1001. That is, a plurality of entities and replicas of the same data are continuously kept in the file system as long as the storage capacity is not exhausted, and the contents located in the nearest position are read at the time of reading out data from the server, thereby making it possible to reduce the access time and realize parallel access.
On the contrary, when there is no free space in the logical blocks and physical blocks of the storage device unit 1001, the file system prevents a plurality of the same data items from existing in the same node or another node. As a result, the file system can reduce the access time for arbitrary data from the servers. That is, under circumstances where the storage capacity is exhausted to some extent, the “function of keeping one real data item in a particular node, and keeping one or more replicas in this particular node or another node, or creating a link” is realized.
Accordingly, in the file system, the duplication degree of the same data is appropriately controlled, and it is possible to realize both prevention of excess duplication and parallel access.
Descriptions will now be made to an autonomous distributed type file system according to a second embodiment of the present invention. What differs from the first embodiment is that the storage device units actively prevent duplicate writing in their own nodes. The “duplicate prevention” function of the first embodiment can be said to be a function of performing “duplicate prevention against another node”. The data maintaining unit of the second embodiment has a function of “duplicate prevention in the own node” as described below, in addition to the function of “duplicate prevention against another node” of the first embodiment.
(1) When a server writes new data D into a logical position p (of logical/physical blocks), a storage device unit (in this example, 1001e) which has received the data calculates a feature value (hash value) H of the new data D, extracts data having the same hash value from the list of feature values recorded in the own node, and sets up a link to data D′ if duplicated data D′ exists in the own node.
(2) The storage device unit 1001e reports the feature value (hash value) H of the new data D to the other storage device units i (hereinafter represented as a storage device unit 1001b) included in the storage system.
(Like the first embodiment, the function of “duplicate prevention of another node” is executed.)
(3) When it is about time to delete data because the capacity is running out, the storage device unit e deletes the duplicated replicated data D′ in the own node.
The procedures from Step 12000 to Step 12011 are the same as the procedures from Step 2000 to Step 2011 in the flow of the first embodiment. In Step 12011, when there is a block having the same data D1′, the replicated data D′ of the data D1′ in the own node is deleted. That is, the pointer to the physical block in the storage directory e is deleted (S12022), and the process thereafter proceeds to Step 12018. When no block having the same data is in the storage directory of the own node, the hash value H1 is delivered to the other nodes (S12012). The following procedures are the same as those of the flow of the first embodiment.
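The own-node duplicate prevention of Step S12022 may be sketched as follows; for brevity, the sketch lets a hash match stand in for the full data comparison of S12011, and the entry layout is an assumption:

```python
import hashlib

def feature(data):
    """Feature value (hash value) of the data; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()

def own_node_dedup(directory, new_data):
    """Sketch of S12011/S12022: on writing data that duplicates D1' already held
    by the own node, delete the replicated copy D' by dropping its pointer to
    the physical block, so a single entity plus links remains."""
    h = feature(new_data)
    replicas = [en for en in directory
                if en["hash"] == h and en["replica"] and en["has_entity"]]
    for r in replicas:
        r["has_entity"] = False   # S12022: delete the pointer to the physical block
    return len(replicas)

directory = [
    {"id": "D1'", "hash": feature(b"d20-d25"), "replica": False, "has_entity": True},
    {"id": "D'",  "hash": feature(b"d20-d25"), "replica": True,  "has_entity": True},
]
print(own_node_dedup(directory, b"d20-d25"))   # 1: one replicated entity deleted
print(directory[1]["has_entity"])              # False; D1' alone keeps the entity
```

If reduced access time is preferred over capacity, the loop could stop after leaving two or three replicas, matching the variation described below.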
Descriptions will be made to accessing to one storage device unit from a plurality of servers, in the second embodiment, using
When there is free space in the physical blocks of the storage device unit 1001e, a plurality of the same data items (for example, D1′) are permitted to exist in the same node or in other nodes 1001b or 1001m, and a replica of data D2′ with a link set up thereto is also kept. This is the same as the case of
When there is no free space in the physical blocks of the storage device unit 1001e, a plurality of the same data items are prevented from existing in the same node or another node. For example, in the storage device unit 1001e, the replicated data D2′ with a link set up to the data D2 of the storage device unit 1001b is deleted, and the replicated data D1 duplicating the data D1′ (1) is deleted in the own node, with a link set up from the data D1 to the data D1′ (1). The data D1′ (3) of the storage device unit 1001b (as a particular node) is kept as is, as an entity. As a result, an attempt is made to reduce the time for access to arbitrary data (for example, the data D1 and the data D1′ (2) to D1′ (3)), without increasing the total amount of data.
In this embodiment, if the data writing process is continuously performed by the “function of keeping one real data item in a particular node, and keeping one or more replicas in this particular node or another node, or creating a link” of the duplicated data maintaining unit, one real data item each of D1, D2, . . . , and DZ, and one replicated data item each of D1′, D2′, . . . , and DZ′ are kept in the file system in the end, and one link to each of these data items is created. As a result, it is possible to increase the amount of stored data (different data) in the file system. When the use of the file system requires a reduction in the access time, the duplicated data maintaining unit may function in such a manner that each node is permitted to keep two or three replicated data items in Step 12022 of
Accordingly, in this embodiment, a plurality of data items with the same contents are continuously kept to the extent that the storage capacity is not exhausted. Under circumstances where the storage capacity is nearly exhausted, the “function of keeping one real data item in a particular node, and keeping one or more replicas in this particular node or another node, or creating a link” is realized. Accordingly, in the file system, the duplication degree of the same data is appropriately controlled, and it is possible to realize both prevention of excess duplication and parallel access.
Number | Date | Country | Kind |
---|---|---|---|
2013-029852 | Feb 2013 | JP | national |