The present disclosure relates to U.S. patent application Ser. No. 11/256,410, titled “SYSTEMS AND METHODS FOR PROVIDING VARIABLE PROTECTION,” U.S. Pat. No. 7,346,720, titled “SYSTEMS AND METHODS FOR MANAGING CONCURRENT ACCESS REQUESTS TO A SHARED RESOURCE,” U.S. patent application Ser. No. 11/255,818, titled “SYSTEMS AND METHODS FOR MAINTAINING DISTRIBUTED DATA,” U.S. patent application Ser. No. 11/256,317, titled “SYSTEMS AND METHODS FOR USING EXCITEMENT VALUES TO PREDICT FUTURE ACCESS TO RESOURCES,” and U.S. patent application Ser. No. 11/255,337, titled “SYSTEMS AND METHODS FOR ACCESSING AND UPDATING DISTRIBUTED DATA,” each filed on even date herewith and each hereby incorporated by reference herein in its entirety.
This disclosure relates to systems and methods for scanning files in distributed file systems.
Operating systems generally manage and store information on one or more memory devices using a file system that organizes data in a file tree. File trees identify relationships between directories, subdirectories, and files.
In a distributed file system, data is stored among a plurality of network nodes. Files and directories are stored on individual nodes in the network and combined to create a file tree for the distributed file system to identify relationships and the location of information in directories, subdirectories and files distributed among the nodes in the network. Files in distributed file systems are typically accessed by traversing the overall file tree.
Occasionally, a file system may scan a portion or all of the files in the file system. For example, the file system or a user may want to search for files created or modified in a certain range of dates and/or times, files that have not been accessed for a certain period of time, files that are of a certain type, files that are a certain size, files with data stored on a particular memory device (e.g., a failed memory device), files that have other particular attributes, or combinations of the foregoing. Scanning for files by traversing multiple file tree paths in parallel is difficult because the tree may be very wide or very deep. Thus, file systems generally scan for files by sequentially traversing the file tree. However, file systems, and particularly distributed file systems, can be large enough to store hundreds of thousands of files, or more. Thus, it can take a considerable amount of time for the file system to sequentially traverse the entire file tree.
Further, sequentially traversing the file tree wastes valuable system resources, such as the availability of central processing units to execute commands or bandwidth to send messages between nodes in a network. System resources are wasted, for example, by accessing structures stored throughout a cluster from one location, which may require significant communication between the nodes and scattered access to memory devices. The performance characteristics of disk drives, for example, vary considerably based on the access pattern. Thus, scattered access to a disk drive based on sequentially traversing a file tree can significantly increase the amount of time used to scan the file system.
Thus, it would be advantageous to use techniques and systems for scanning file systems by searching metadata, in parallel, for selected attributes associated with a plurality of files. In one embodiment, content data, parity data and metadata for directories and files are distributed across a plurality of network nodes. When performing a scan of the distributed file system, two or more nodes in the network search their respective metadata in parallel for the selected attribute. When a node finds metadata corresponding to the selected attribute, the node provides a unique identifier for the metadata to the distributed file system.
According to the foregoing, in one embodiment, a method is provided for scanning files and directories in a distributed file system on a network. The distributed file system has a plurality of nodes. At least a portion of the nodes include metadata with attribute information for one or more files striped across the distributed file system. The method includes commanding at least a subset of the nodes to search their respective metadata for a selected attribute and to perform an action in response to identifying the selected attribute in their respective metadata. The subset of nodes is capable of searching their respective metadata in parallel.
In one embodiment, a distributed file system includes a plurality of nodes configured to store data blocks corresponding to files striped across the plurality of nodes. The distributed file system also includes metadata data structures stored on at least a portion of the plurality of nodes. The metadata data structures include attribute information for the files. At least two of the plurality of nodes are configured to search, at substantially the same time, their respective metadata data structures for a selected attribute.
In one embodiment, a method for recovering from a failure in a distributed file system includes storing metadata corresponding to one or more files on one or more nodes in a network. The metadata points to data blocks stored on the one or more nodes. The method also includes detecting a failed device in the distributed file system, commanding the nodes to search their respective metadata for location information corresponding to the failed device, receiving responses from the nodes, the responses identifying metadata data structures corresponding to information stored on the failed device, and accessing the identified metadata data structures to reconstruct the information stored on the failed device.
For purposes of summarizing the invention, certain aspects, advantages and novel features of the invention have been described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment of the invention. Thus, the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other advantages as may be taught or suggested herein.
Systems and methods that embody the various features of the invention will now be described with reference to the following drawings.
Systems and methods which represent one embodiment and example application of the invention will now be described with reference to the drawings. Variations to the systems and methods which represent other embodiments will also be described.
For purposes of illustration, some embodiments will be described in the context of a distributed file system. The inventors contemplate that the present invention is not limited by the type of environment in which the systems and methods are used, and that the systems and methods may be used in other environments, such as, for example, the Internet, the World Wide Web, a private network for a hospital, a broadcast network for a government agency, an internal network of a corporate enterprise, an intranet, a local area network, a wide area network, and so forth. The figures and descriptions, however, relate to an embodiment of the invention wherein the environment is that of distributed file systems. It is also recognized that in other embodiments, the systems and methods may be implemented as a single module and/or implemented in conjunction with a variety of other modules and the like. Moreover, the specific implementations described herein are set forth in order to illustrate, and not to limit, the invention. The scope of the invention is defined by the appended claims.
I. Overview
Rather than sequentially traversing a file tree searching for a particular attribute during a scan, a distributed file system, according to one embodiment, commands a plurality of network nodes to search their respective metadata for the particular attribute. The metadata includes, for example, attributes and locations of file content data blocks, metadata data blocks, and protection data blocks (e.g., parity data blocks and mirrored data blocks). Thus, two or more nodes in the network can search for files having the particular attribute at the same time.
In one embodiment, when a node finds metadata corresponding to the selected attribute, the node provides a unique identifier for a corresponding metadata data structure to the distributed file system. The metadata data structure includes, among other information, the location of or pointers to file content data blocks, metadata data blocks, and protection data blocks for corresponding files and directories. The distributed file system can then use the identified metadata data structure to perform one or more operations on the files or directories. For example, the distributed file system can read an identified file, write to an identified file, copy an identified file or directory, move an identified file to another directory, delete an identified file or directory, create a new directory, update the metadata corresponding to an identified file or directory, recover lost or missing data, and/or restripe files across the distributed file system. In other embodiments, these or other file system operations can be performed by the node or nodes that find metadata corresponding to the selected attribute.
In one embodiment, the distributed file system commands the nodes to search for metadata data structures having location information corresponding to a failed device on the network. The metadata data structures identified in the search may then be used to reconstruct lost data that was stored on the failed device.
In the following description, reference is made to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific embodiments or processes in which the invention may be practiced. Where possible, the same reference numbers are used throughout the drawings to refer to the same or like components. In some instances, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. The present disclosure, however, may be practiced without the specific details or with certain alternative equivalent components and methods to those described herein. In other instances, well-known components and methods have not been described in detail so as not to unnecessarily obscure aspects of the present disclosure.
II. Distributed File System
In one embodiment, at least one of the nodes 102, 104, 106, 108, 110, 112 comprises a conventional computer or any device capable of communicating with the network 114 including, for example, a computer workstation, a LAN, a kiosk, a point-of-sale device, a personal digital assistant, an interactive wireless communication device, an interactive television, a transponder, or the like. The nodes 102, 104, 106, 108, 110, 112 are configured to communicate with each other by, for example, transmitting messages, receiving messages, redistributing messages, executing received messages, providing responses to messages, combinations of the foregoing, or the like. In one embodiment, the nodes 102, 104, 106, 108, 110, 112 are configured to communicate RPC messages between each other over the communication medium 114 using TCP. An artisan will recognize from the disclosure herein, however, that other message or transmission protocols can be used.
In one embodiment, the network 100 comprises a distributed file system as described in U.S. patent application Ser. No. 10/007,003, entitled “System and Method for Providing a Distributed File System Utilizing Metadata to Track Information About Data Stored Throughout the System,” filed Nov. 9, 2001, which claims priority to application Ser. No. 60/309,803, filed Aug. 3, 2001, and U.S. patent application Ser. No. 10/714,326, filed Nov. 14, 2003, which claims priority to application Ser. No. 60/426,464, filed Nov. 14, 2002, all of which are hereby incorporated by reference herein in their entirety. For example, the network 100 may comprise an intelligent distributed file system that enables the storing of file data among a set of smart storage units which are accessed as a single file system and utilizes a metadata data structure to track and manage detailed information about each file. In one embodiment, individual files in a file system are assigned a unique identification number that acts as a pointer to where the system can find information about the file. Directories (and subdirectories) are files that list the name and unique identification number of files and subdirectories within the directory. Thus, directories are also assigned unique identification numbers that reference where the system can find information about the directory.
In addition, the distributed file system may be configured to write data blocks or restripe files distributed among a set of smart storage units in the distributed file system wherein data is protected and recoverable if a system failure occurs.
In one embodiment, at least some of the nodes 102, 104, 106, 108, 110, 112 include one or more memory devices for storing file content data, metadata, parity data, directory and subdirectory data, and other system information.
A. Metadata
Metadata data structures include, for example, the device and block locations of a file's data blocks to permit different levels of replication and/or redundancy within a single file system, to facilitate the change of redundancy parameters, to provide high-level protection for metadata distributed throughout the network 100, and to replicate and move data in real-time. Metadata for a file may include, for example, an identifier for the file, the location of or pointer to the file's data blocks, the type of protection for the file or for each block of the file, and the location of the file's protection blocks (e.g., parity data or mirrored data). Metadata for a directory may include, for example, an identifier for the directory, a listing of the files and subdirectories of the directory, the identifier for each of those files and subdirectories, and the type of protection for each file and subdirectory. In other embodiments, the metadata may also include the location of the directory's protection blocks (e.g., parity data or mirrored data). In one embodiment, the metadata data structures are stored in the distributed file system.
B. Attributes
In one embodiment, the metadata includes attribute information corresponding to files and directories stored on the network 100. The attribute information may include, for example, file size, file name, file type, file extension, file creation time (e.g., time and date), file access time (e.g., time and date), file modification time (e.g., time and date), file version, file permission, file parity scheme, file location, combinations of the foregoing, or the like. The file location may include, for example, information useful for accessing the physical location in the network of content data blocks, metadata data blocks, parity data blocks, mirrored data blocks, combinations of the foregoing, or the like. The location information may include, for example, node id, device id, and address offset, though other location information may be used.
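By way of illustration only, such attribute and location information might be modeled as in the following Python sketch; the field names (BlockLocation, FileAttributes, and so forth) are hypothetical and are not taken from the disclosure.

```python
# Hypothetical model of per-file attribute and location metadata; the field
# names are illustrative, not taken from the disclosure.
from dataclasses import dataclass
from typing import List

@dataclass
class BlockLocation:
    node_id: int    # node holding the block
    device_id: int  # memory device within that node
    offset: int     # address offset on the device

@dataclass
class FileAttributes:
    name: str
    size: int                 # bytes
    file_type: str            # e.g., "regular" or "directory"
    created: float            # timestamps as seconds since the epoch
    accessed: float
    modified: float
    parity_scheme: str        # e.g., "2+1"
    content_blocks: List[BlockLocation]
    parity_blocks: List[BlockLocation]
```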
C. Exemplary Metadata File Tree
Since the metadata includes location information for files and directories stored on the network 100, the distributed file system according to one embodiment uses a file tree comprising metadata data structures.
As illustrated in the example, the inode 202 (stored on devices D and H) for the root directory points to an inode 204 for a directory named “dir1,” an inode 206 (stored on devices C and F) for a directory named “dir2,” and an inode 208 for a directory named “dir3.”
The inode 206 for directory dir2 points to an inode 210 (stored on devices D and G) for a directory named “dir4,” an inode 212 (stored on devices B and C) for a directory named “dir5,” an inode 214 (stored on devices A and E) for a directory named “dir6,” and an inode 216 (stored on devices A and B) for a file named “file1.zzz.” The inode 208 for directory dir3 points to an inode 218 (stored on devices A and F) for a file named “file2.xyz.” The inode 214 for the directory dir6 points to an inode 220 (stored on devices A and C) for a file named “file3.xxx,” an inode 222 (stored on devices B and C) for a file named “file4.xyz,” and an inode 224 (stored on devices D and G) for a file named “file5.xyz.”
D. Exemplary Metadata Data Structures
In one embodiment, the metadata data structure (e.g., inode) includes attributes and a list of devices having data to which the particular inode points.
The inode 202 for the root directory also includes location information 306 corresponding to the directories dir1, dir2, and dir3.
The inode 220 also includes a list of devices used 604. In this example, content data blocks and parity data blocks corresponding to the file file3.xxx are striped across devices B, C, D, and E using a 2+1 parity scheme. Thus, for every two blocks of content data stored on the devices B, C, D, and E, a parity data block is also stored. The parity groups (e.g., two content data blocks and one parity data block) are distributed such that each block in the parity group is stored on a different device.
The inode 220 for the file file3.xxx also includes location information 606 corresponding to the content data blocks (e.g., block0, block1, block2, and block3) and parity data blocks (e.g., parity0 and parity1).
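Continuing the example, one illustrative rendering of the inode 220 follows. The disclosure places only block0 and block3 on the device B; the remaining block placements below are assumptions made for illustration.

```python
# Sketch of the example inode 220 for "file3.xxx" (2+1 parity across devices
# B, C, D, and E). The text places block0 and block3 on device B; the other
# placements here are invented for illustration.
inode220 = {
    "lin": 220,
    "devices_used": ["B", "C", "D", "E"],
    "locations": {
        # parity group 1: two content blocks plus one parity block,
        # each stored on a different device
        "block0": "B", "block1": "C", "parity0": "D",
        # parity group 2
        "block2": "D", "block3": "B", "parity1": "E",
    },
}

# Invariant of the 2+1 scheme: blocks within a parity group never share a device.
for group in [("block0", "block1", "parity0"), ("block2", "block3", "parity1")]:
    devices = {inode220["locations"][name] for name in group}
    assert len(devices) == len(group)
```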
III. Scanning Distributed File Systems
In one embodiment, the distributed file system is configured to scan a portion or all of the files and/or directories in the distributed file system by commanding nodes to search their respective metadata for a selected attribute. As discussed in detail below, the nodes can then search their respective metadata in parallel and perform an appropriate action when metadata is found having the selected attribute.
Commanding the nodes to search their respective metadata in parallel with other nodes greatly reduces the amount of time necessary to scan the distributed file system. Compare the conventional sequential lookup: to read a file at the path /dir1/fileA.xxx of a file tree (where “/” is the top level or root directory and “xxx” is the file extension of the file named fileA in the directory named dir1), the file system reads the file identified by the root directory's predefined unique identification number, searches the root directory for the name dir1, reads the file identified by the unique identification number associated with the directory dir1, searches the dir1 directory for the name fileA.xxx, and reads the file identified by the unique identification number associated with fileA.xxx.
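A sketch of that sequential lookup is shown below, assuming a hypothetical read_inode helper; in the distributed case, each such read may be a message to whichever node stores the inode.

```python
# Illustrative sequential path resolution; each read_inode call stands in for
# a metadata read that, in a distributed system, may cross the network.
INODES = {
    2:  {"entries": {"dir1": 10}},       # root directory
    10: {"entries": {"fileA.xxx": 11}},  # directory dir1
    11: {"data": "..."},                 # file fileA.xxx
}

def read_inode(lin):
    # Stand-in for a (potentially remote) inode read by unique identifier.
    return INODES[lin]

def lookup(path, root_lin=2):
    inode = read_inode(root_lin)  # read the root directory's inode
    for name in filter(None, path.split("/")):
        inode = read_inode(inode["entries"][name])  # one read per component
    return inode

# Three sequential inode reads resolve /dir1/fileA.xxx; a full scan repeats
# such a walk for every path in the tree.
print(lookup("/dir1/fileA.xxx"))
```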
For example, referring to the file tree 200, the inode 202 corresponding to the root directory may be read to determine the names and locations of the directories (e.g., dir1, dir2, and dir3) that it points to, and the inode 204 corresponding to the directory dir1 may then be read in the same manner.
After sequentially stepping through the subdirectory and file paths of the directory dir1, the inode 206 corresponding to the directory dir2 may be read to determine the names and locations of the subdirectories (e.g., dir4, dir5, and dir6) and files (e.g., file1.zzz) that the inode 206 points to. This process may then be repeated for each directory and subdirectory in the distributed file system. Since content data, metadata and parity data are spread throughout the nodes 102, 106, 108, 110, 112 in the network 100, sequentially traversing the file tree 200 requires a large number of messages to be sent between the nodes and uses valuable system resources. Thus, sequentially traversing the file tree 200 is time consuming and reduces the overall performance of the distributed file system.
However, commanding the nodes 102, 106, 108, 110, 112 to search their respective metadata in parallel, according to certain embodiments disclosed herein, reduces the number of messages sent across the network 100 and allows the nodes 102, 106, 108, 110, 112 to access their respective devices A-H sequentially.
In one embodiment, for example, one or more of the devices A-H are hard disk drives that are capable of operating faster when accessed sequentially. For example, a disk drive that yields approximately 100 kbytes/second when reading a series of data blocks from random locations on the disk drive may yield approximately 60 Mbytes/second when the data blocks are read from sequential locations on the disk drive. Thus, allowing the nodes 102, 106, 108, 110, 112 to respectively access their respective drives sequentially, rather than traversing an overall file tree for the network 100 (which repeatedly accesses small amounts of data scattered across the devices A-H), greatly reduces the amount of time used to scan the distributed file system.
The nodes 102, 106, 108, 110, 112 may perform additional processing, but the additional work is spread across the nodes 102, 106, 108, 110, 112 and reduces overall network traffic and processing overhead. For example, in one embodiment, rather than reading all the metadata from the node 102 across the network, the node 102 searches its metadata and only the metadata satisfying the search criteria is read across the network. Thus, overall network traffic and processing overhead is reduced.
From the block 710, the process 700 proceeds, in parallel, to blocks 712 and 714. In the block 714, file system operations are performed. The file system operations may include, for example, continuing to distribute data blocks for files and directories across the nodes in the network, writing files, reading files, restriping files, repairing files, updating metadata, waiting for user input, and the like. The distributed file system operations can be performed while the system waits for a command to scan and/or while the distributed file system performs a scan as discussed below.
In the block 712, the system queries whether to scan the distributed file system to identify the files and directories having a selected attribute. For example, the distributed file system or a user of the network 100 may determine that a scan for files or directories having a selected attribute is needed or desired.
If a scan is desired or needed, the process 700 proceeds to a block 716 where the distributed file system commands the nodes to search their respective metadata data blocks for a selected attribute. Advantageously, the nodes are capable of searching their metadata data blocks in parallel with one another. For example, the nodes 102, 106, 108, 110, 112 may each receive the command to search their respective metadata data blocks for the selected attribute. The nodes 102, 106, 108, 110, 112 can then execute the command as node resources become available. Thus, rather than waiting for each node 102, 106, 108, 110, 112 to scan its respective metadata data blocks one at a time, two or more of the nodes 102, 106, 108, 110, 112 that have sufficient node resources may search their respective metadata data blocks at the same time. It is recognized that the distributed file system may command a subset of the nodes to conduct the search.
In one embodiment, the metadata data blocks for a particular node are sequentially searched for the selected attribute. For example, a node may include a drive that is divided into a plurality of cylinder groups. The node may sequentially step through each cylinder group, reading its respective metadata data blocks. In other embodiments, metadata data blocks within a particular node are also searched in parallel. For example, the node 108 includes devices C and D that can be searched for the selected attribute at the same time. Exemplary pseudocode illustrating one embodiment of accessing metadata data blocks (e.g., stored in data structures referred to herein as inodes) in parallel is sketched below.
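A minimal Python rendering of that idea follows, assuming one worker per device and hypothetical names throughout (Device, cylinder_groups, inode.lin); it is a sketch of the technique, not the disclosed implementation.

```python
# Illustrative stand-in for the scan: each of a node's devices is searched by
# its own worker, and each worker reads that device's inodes sequentially.
# Device, cylinder_groups, inodes, and lin are assumed names.
from concurrent.futures import ThreadPoolExecutor

def scan_device(device, predicate):
    """Sequentially read one device's metadata, collecting matching LINs."""
    matches = []
    for cylinder_group in device.cylinder_groups:  # sequential, disk-friendly
        for inode in cylinder_group.inodes:
            if predicate(inode):
                matches.append(inode.lin)
    return matches

def scan_node(devices, predicate):
    """Search all of a node's devices in parallel (e.g., devices C and D)."""
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        results = pool.map(lambda dev: scan_device(dev, predicate), devices)
    return [lin for per_device in results for lin in per_device]
```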
In a block 718, the distributed file system commands the nodes to perform an action in response to identifying the selected attribute in their respective metadata and proceeds to an end state 720. An artisan will recognize that the command to search for the selected attribute and the command to perform an action in response to identifying the selected attribute can be sent to the nodes using a single message (e.g., sent to the nodes 102, 106, 108, 110, 112) or using two separate messages. The action may include, for example, writing data, reading data, copying data, backing up data, executing a set of instructions, and/or sending a message to one or more of the other nodes in the network. For example, the node 102 may find one or more of its inodes that point to files or directories created within a certain time range. In response, the node 102 may read the files or directories and write a backup copy of the files or directories.
In one embodiment, the action in response to identifying the attribute includes sending a list of unique identification numbers (e.g., logical inode number or “LIN”) for inodes identified as including the selected attribute to one or more other nodes. For example, the nodes 102, 106, 108, 110, 112 may send a list of LINs for their respective inodes with the selected attribute to one of the other nodes in the network 100 for processing. The node that receives the LINs may or may not have any devices. For example, the node 104 may be selected to receive the LINs from the other nodes 102, 106, 108, 110, 112 and to perform a function using the LINs.
After receiving the LINs from the other nodes 102, 106, 108, 110, 112, the node 104 reads the inodes identified by the LINs for the location of or pointers to content data blocks, metadata data blocks, and/or protection data blocks (e.g., parity data blocks and mirrored data blocks). In certain such embodiments, the node 104 also checks the identified inodes to verify that they still include the selected attribute. For example, the search may select for files and directories that have not been modified for more than 100 days, and the node 104 may be configured to delete such files and directories. However, between the time that the node 104 receives the list of LINs and the time that the node 104 reads a particular identified inode, the particular identified inode may be updated to indicate that its corresponding file or directory has recently been modified. The node 104 then deletes only files and directories with identified inodes that still indicate that they have not been modified for more than 100 days.
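A minimal sketch of that re-verification step, assuming hypothetical read_inode and delete_file helpers and the 100-day example above:

```python
# Illustrative re-check before acting: the coordinator re-reads each candidate
# inode and re-applies the predicate, since the file may have been modified
# between the scan and the action. read_inode and delete_file are assumed.
import time

MAX_AGE = 100 * 24 * 60 * 60  # 100 days, in seconds

def not_recently_modified(inode):
    return time.time() - inode["modified"] > MAX_AGE

def process_candidates(lins, read_inode, delete_file):
    for lin in lins:
        inode = read_inode(lin)           # current state, not the scanned state
        if not_recently_modified(inode):  # the attribute must still hold
            delete_file(lin)
```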
While process 700 illustrates an embodiment for scanning files and directories in a distributed file system such that all devices are scanned in parallel, it is recognized that the process 700 may be used on a subset of the devices. For example, one or more devices of the distributed file system may be offline. In addition, the distributed file system may determine that the action to be performed references only a subset of the devices such that only those devices are scanned, and so forth.
A. Example Scan Transactions
High-level exemplary transactions are provided below that illustrate scanning a distributed file system according to certain embodiments. The exemplary transactions include a data backup transaction and a failure recovery transaction. An artisan will recognize from the disclosure herein that many other transactions are possible.
1. Example Data Backup Transaction
The following example illustrates how backup copies of information stored on the network 100 can be created by scanning the distributed file system to find files and directories created or modified during a certain time period (e.g., since the last backup copy was made). In this example, the node 104 is selected to coordinate the backup transaction on the distributed file system. An artisan will recognize, however, that any of the nodes can be selected to coordinate the backup transaction.
The node 104 begins the backup transaction by sending a command to the nodes 102, 106, 108, 110, 112 to search their respective metadata so as to identify inodes that point to files and directories created or modified within a certain time range. As discussed above, the exemplary nodes 102, 106, 108, 110, 112 are capable of searching their metadata in parallel with one another. After searching, the nodes 102, 106, 108, 110, 112 each send a list of LINs to the node 104 to identify their respective inodes that point to files or directories created or modified within the time range. The node 104 then accesses the identified inodes and reads locations of or pointers to content data blocks, metadata blocks, and/or protection data blocks corresponding to the files or directories created or modified within the time range. The node 104 then writes the content data blocks, metadata blocks, and/or protection data blocks to a backup location.
2. Example Failure Recovery Transaction
Beginning at a start state 808, the process 800 proceeds to block 810. In block 810, the process 800 detects a failed device in a distributed file system. For example, in one embodiment, the nodes 102, 106, 108, 110, 112 include a list of their own devices and share this list with the other nodes. When a device on a node fails, the node notifies the other nodes of the failure. For example, when device B fails, the node 106 sends a message to the nodes 102, 104, 108, 110, 112 to notify them of the failure.
In a block 812, the process 800 includes commanding the nodes to search their respective metadata for location information corresponding to the failed device. In one embodiment, the message notifying the nodes 102, 104, 108, 110, 112 of the failure of the device B includes the command to search for metadata that identifies the location of content data blocks, metadata data blocks, and protection data blocks (e.g., parity data blocks and mirrored data blocks) stored on the failed device B.
After receiving the command to search metadata for location information corresponding to the failed device B, the nodes 102, 108, 110, 112 begin searching for inodes that include the failed device B in their list of devices used. For example, as discussed above in one embodiment, the inode 202 for the root directory is stored on devices D and H and includes the location of or pointers to the inodes 204, 206, 208 for the directories dir1, dir2 and dir3, respectively.
Similarly, the nodes 108 (for device C) and 110 (for device F) will include the LIN for the inode 206 in their respective lists of LINs that meet the search criteria, the nodes 102 (for device A) and 110 (for device E) will include the LIN for the inode 214 in their respective lists of LINs that meet the search criteria, and the nodes 102 (for device A) and 108 (for device C) will include the LIN for the inode 220 in their respective lists of LINs that meet the search criteria. While this example returns the LIN of the inode, it is recognized that other information may be returned, such as, for example, the LIN for inode 208. In other embodiments, rather than return any identifier, the process may initiate reconstruction of the data or other related actions.
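For such a failure scan, the search predicate may reduce to a membership test on each inode's list of devices used, as in the following illustrative sketch (all names assumed):

```python
# Illustrative predicate for the failure-recovery scan: an inode matches when
# the failed device appears in its list of devices used.
def references_device(inode, failed_device):
    return failed_device in inode["devices_used"]

# For example, inode 220 (devices B, C, D, and E) matches a failure of
# device "B", so the scanning node reports its LIN to the coordinator.
inode220 = {"lin": 220, "devices_used": ["B", "C", "D", "E"]}
assert references_device(inode220, "B")
```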
In other embodiments, the list of devices used for a particular inode includes one or more devices on which copies of the particular inode are stored.
As discussed above, the nodes 102, 108, 110, 112 are capable of searching their respective metadata in parallel with one another. In one embodiment, the nodes 102, 108, 110, 112 are also configured to execute the command to search their respective metadata so as to reduce or avoid interference with other processes being performed by the node. The node 102, for example, may search a portion of its metadata, stop searching for a period of time to allow other processes to be performed (e.g., a user initiated read or write operation), and search another portion of its metadata. The node 102 may continue searching as the node's resources become available.
In one embodiment, the command to search the metadata includes priority information and the nodes 102, 108, 110, 112 are configured to determine when to execute the command in relation to other processes that the nodes 102, 108, 110, 112 are executing. For example, the node 102 may receive the command to search its metadata for the location information as part of the overall failure recovery transaction and it may also receive a command initiated by a user to read certain content data blocks. The user initiated command may have a higher priority than the command to search the metadata. Thus, the node 102 will execute the user initiated command before searching for or completing the search of its metadata for the location information corresponding to the failed device B.
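One way such resumable, priority-aware searching might be realized is sketched below; the portioned generator and all names are assumptions made for illustration, not the disclosed implementation.

```python
# Illustrative resumable, low-priority search: the scan yields after each
# portion so higher-priority commands (e.g., user reads and writes) can run,
# then resumes where it left off. Portion size and names are assumptions.
def incremental_scan(inodes, predicate, portion_size=128):
    matches = []
    for start in range(0, len(inodes), portion_size):
        for inode in inodes[start:start + portion_size]:
            if predicate(inode):
                matches.append(inode["lin"])
        yield list(matches)  # pause point: other commands may run here

# A scheduler would advance the generator only when no higher-priority
# command is pending.
for partial in incremental_scan([{"lin": 1, "size": 10}], lambda i: i["size"] > 5):
    print(partial)  # -> [1]
```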
In one embodiment, the nodes 102, 108, 110, 112 are configured to read their respective inodes found during the search and reconstruct the lost data (as discussed below) that the inodes point to on the failed device B.
In a block 814, the process 800 includes receiving responses from the nodes that identify metadata data structures corresponding to information stored on the failed device. For example, the nodes 102, 108, 110 may send their lists of LINs to the node 112. In a block 816, the process 800 includes accessing the identified metadata data structures to reconstruct the lost information stored on the failed device and proceeds to an end state 818. For example, after receiving the lists of LINs from the nodes 102, 108, 110, the node 112 may use the received LINs and any LINs that it has identified to read the corresponding inodes to determine the locations of content data blocks, metadata blocks and protection data blocks corresponding to the lost information on the failed device B.
For example, as discussed above, the node 112 in one embodiment may receive the LIN for the inode 202 from the node 108 (for device D) and may also identify that LIN during its own search (for device H). The node 112 then reads the inode 202 from either the device D or the device H to determine that it includes pointers to the inode 208 for the directory dir3 stored on the failed device B.
As another example, the node 112 also receives lists of LINs from the nodes 102 and 108 that include the LIN for the inode 220. The node 112 then reads the inode 220 from either the device A or the device C for the location of or pointers to content data blocks (block0 and block3) stored on the failed device B.
As discussed above, the file file3.xxx uses a 2+1 parity scheme in which a first parity group includes block0, block1 and parity0 and a second parity group includes block2, block3, and parity1. If needed or desired, the node 112 can recover the block0 information that was lost on the failed device B by using the pointers in the inode 220 to read the block1 content data block and the parity0 parity data block, and XORing block1 and parity0. Similarly, the node 112 can recover the block3 information that was lost on the failed device B by using the pointers in the inode 220 to read the block2 content data block and the parity1 parity data block, and XORing block2 and parity1. In one embodiment, the node 112 writes the recovered block0 and block3 to the remaining devices A, C, D, E, F, G, H. In another embodiment, the node 112 can then change the protection scheme, if needed or desired, and restripe the file file3.xxx across the remaining devices A, C, D, E, F, G, H.
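The XOR recovery itself can be illustrated with a short sketch; the block contents below are invented. In a 2+1 scheme, the parity block is the XOR of its two content blocks, so a lost block is recovered as the XOR of the two survivors.

```python
# Illustrative 2+1 parity recovery by XOR; block contents are invented. In the
# example, block0 and block3 were lost with the failed device B.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

block1  = bytes.fromhex("1234")
parity0 = bytes.fromhex("abcd")       # written as parity0 = block0 XOR block1

block0 = xor_blocks(block1, parity0)  # recover the lost content block
assert xor_blocks(block0, block1) == parity0
print(block0.hex())  # -> "b9f9"
```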
Thus, the distributed file system can quickly find metadata for information that was stored on the failed device B. Rather than sequentially traversing the entire file tree 200, the distributed file system searches the metadata of the remaining nodes 102, 108, 110, 112 in parallel for location information corresponding to the failed device B. This allows the distributed file system to quickly recover the lost data and restripe any files, if needed or desired.
While certain embodiments of the inventions have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.