Scalable system for partitioning and accessing metadata over multiple servers

Information

  • Patent Grant
  • Patent Number
    8,433,735
  • Date Filed
    Monday, December 20, 2010
  • Date Issued
    Tuesday, April 30, 2013
Abstract
In an aggregated file system, metadata is partitioned into multiple metadata volumes. On receipt of a file processing request, a file switch examines its mount entry cache to identify a target metadata volume that hosts the metadata of the requested file. The identification begins with mount entries at a root volume and continues recursively by examining a portion of the absolute pathname of the file until the target metadata volume is identified. Finally, the file switch forwards the request to a metadata server managing the target metadata volume. Since the identification process is carried out completely within the file switch, there is no need for multiple expensive network accesses to different metadata servers.
Description
FIELD OF THE INVENTION

The present invention relates generally to the field of storage networks, and more specifically to a system and method for partitioning and accessing metadata over multiple servers in an aggregated file system.


BACKGROUND

With the arrival of gigabit and multi-gigabit network technology, storing, accessing and sharing large volumes of data over a network has become more and more efficient. For instance, a single Gigabit Ethernet or FibreChannel connection is capable of communicating data at a rate of up to 240 Megabytes/second (MB/s), which is even faster than most locally attached storage devices. As a result, many users can store and manipulate their data in an aggregated file system that is located remotely and managed by professional system administrators. In order to ensure a smooth and secure operation of the aggregated file system, however, a large amount of metadata needs to be stored and accessed. The volume of metadata and the volume of access requests to the metadata may exceed the capability of a single metadata server. There is a need, therefore, for an improved aggregated file system for managing large amounts of metadata.


SUMMARY

A system and method is described for partitioning and accessing metadata in multiple metadata volumes in an aggregated file system. The aggregated file system includes a plurality of file servers storing user-specific data, a plurality of metadata servers, each metadata server hosting one or more metadata volumes associated with the user-specific data, and a plurality of file switches receiving user requests regarding the user-specific data and acting on the data accordingly.


Each of the metadata volumes may have links to other metadata volumes. As a result, the metadata volumes form a hierarchical structure. This hierarchical structure is built through the use of special metadata files that create the links across distinct metadata volumes. These inter-volume links are called “Mount Entries”. In particular, there is a root metadata volume that includes a plurality of mount entries. A respective mount entry is accessible via a pathname and references a respective distinct child metadata volume using a unique volume ID. The respective child metadata volume may, in turn, have its own mount entries further referencing its own respective distinct child metadata volumes using unique volume IDs.


In one embodiment, each metadata volume stores a list of mount entries. At system start-up time, a file switch retrieves mount entries from at least a subset of the metadata volumes and caches them in its memory. The retrieval of the mount entries starts with the root metadata volume and traverses each of the child metadata volumes, recursively. When a user submits to the file switch a processing request including an absolute pathname to a file, the file switch analyzes the absolute pathname of the file and identifies a target metadata volume that hosts the metadata of the requested file. The identification of the target metadata volume begins with the root metadata volume, which is treated as the current metadata volume. The mount entries associated with the current metadata volume are first examined to locate a child metadata volume that matches a portion of the absolute pathname. If no child metadata volume is located, the file switch assumes that the metadata of the requested file is stored in the current metadata volume, which is the target metadata volume. Otherwise, the child metadata volume becomes the current metadata volume and the identification process continues recursively after removing the matched portion of the absolute pathname until a target metadata volume is identified. In other words, when there are no mount entries in the current metadata volume that match a portion of the residual pathname, the current metadata volume is the target metadata volume. In some embodiments, the mount entries are cached in the file switch, thereby enabling searches for a target metadata volume to be completed quickly and efficiently.
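For illustration only, the recursive identification described above can be sketched in a few lines of Python; the volume IDs, the mount_entries table and the resolve function are hypothetical and not part of the patented implementation.

```python
# Minimal sketch of the recursive target-volume identification described above.
# For each metadata volume, mount_entries maps the relative pathname of each of
# its mount entries to the child volume it references (a stand-in for the cached
# mount entry lists); the volume names are hypothetical.
mount_entries = {
    "ROOT":  {"home/alice": "VOL_A", "proj/data": "VOL_B"},
    "VOL_A": {"builds/nightly": "VOL_C"},
    "VOL_B": {},
    "VOL_C": {},
}

def resolve(absolute_pathname, root="ROOT"):
    """Return (target metadata volume, residual pathname)."""
    current_volume = root
    current_path = absolute_pathname.lstrip("/")
    while True:
        for relative_path, child_volume in mount_entries[current_volume].items():
            # A mount entry matches when its relative pathname equals a leading
            # run of whole components of the current pathname.
            if current_path == relative_path or current_path.startswith(relative_path + "/"):
                current_volume = child_volume
                current_path = current_path[len(relative_path):].lstrip("/")
                break
        else:
            # No mount entry matched: the current volume is the target volume.
            return current_volume, current_path

print(resolve("/home/alice/builds/nightly/log.txt"))  # ('VOL_C', 'log.txt')
```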





BRIEF DESCRIPTION OF THE DRAWINGS

The aforementioned features and advantages of the invention as well as additional features and advantages thereof will be more clearly understood hereinafter as a result of a detailed description of embodiments of the invention when taken in conjunction with the drawings.



FIG. 1 is a diagram illustrating an embodiment of an aggregated file system.



FIG. 2a is a diagram illustrating an embodiment of a user view of a hierarchical directory structure of the aggregated file system.



FIG. 2b is a diagram illustrating an embodiment of a metadata hierarchical directory structure implemented using a plurality of metadata volumes corresponding to the user view shown in FIG. 2a.



FIG. 2c is a diagram illustrating an embodiment of the metadata hierarchical directory structure partitioned across a plurality of metadata volumes, along with the links that aggregate the structure shown in FIG. 2a.



FIG. 3a is a diagram illustrating an embodiment of data structures of the mount entry cache supporting the metadata hierarchical directory structure shown in FIGS. 2b and 2c.



FIG. 3b is a diagram illustrating the content of additional disk-resident metadata structures used to aggregate the metadata hierarchy in FIGS. 2b, 2c and 3a, partitioned across a plurality of metadata volumes.



FIG. 4 is an overview flowchart illustrating an embodiment of the operations of the aggregated file system in response to different types of user requests.



FIG. 5 is a flowchart illustrating an embodiment of the operations in a file switch during system initialization.



FIG. 6 is a flowchart illustrating an embodiment of the operations in a mount entry lookup with respect to the metadata hierarchical directory structure.



FIG. 7 is a flowchart further illustrating an embodiment of the operations in identifying the matching mount entry in the respective metadata volume.



FIG. 8 is a flowchart illustrating an embodiment of the operations in a mount entry insertion with respect to the metadata hierarchical directory structure.



FIG. 9 is a flowchart illustrating an embodiment of the operations in a mount entry deletion with respect to the metadata hierarchical directory structure.



FIG. 10 is a diagram illustrating an embodiment of a file switch configuration.



FIG. 11 is a diagram illustrating an exemplary embodiment of identifying a target metadata volume and a residual pathname in response to an absolute pathname.





Like reference numerals refer to corresponding parts throughout the several views of the drawings.


DESCRIPTION OF EMBODIMENTS


FIG. 1 illustrates an embodiment of an aggregated file system 140 that includes a group of file servers (142, 144), a group of metadata servers (162, 164), and a group of file switches (152, 154) that have connections to the file servers (142, 144) and the metadata servers (162, 164), respectively. While FIG. 1 shows pairs of file servers (142, 144), file switches (152, 154) and metadata servers (162, 164), some embodiments have more than two file servers, file switches and/or metadata servers. The aggregated file system 140 typically manages a large number of user files. Each file stored by the system 140 has an associated unique pathname, which identifies where the file is stored in a logical hierarchy of directories. The user files may include many types of files, including documents of various types, computer programs, database files, and other types of information storing files.


To be efficient and scalable, the aggregated file system 140 splits the user files into multiple volumes, with a respective file server, such as file server 142, hosting one or more of the multiple volumes. A respective user file also has an associated metadata file storing information identifying at least a subset of the file servers (142, 144) that store the user file and directory structures on the subset of file servers (142, 144).


Typically, the aggregated file system 140 includes one or more file switches (152, 154) that receive a user request, e.g., file open, regarding the user file from one of a plurality of clients (102, 104, 106 and 108) through a communications network 120, e.g., the Internet, and a network interface 130. At least one of the file switches (152, 154), such as file switch 152, acts on the user files stored in one or more of the file servers (142, 144) in accordance with the user request. The user request includes (or, alternately, specifies or identifies) an absolute pathname of the requested user file. Prior to acting on the user file, the file switch 152 needs to identify the exact subset of file servers (142, 144) hosting the user file and determine their respective status. This process of identifying the subset of the hosting file servers (142, 144) for the requested user file is typically implemented as a query to the metadata servers (162, 164) to locate the corresponding metadata file associated with the requested user file.


The user files are organized by a hierarchical directory structure of the aggregated file system 140 (an example is shown in FIG. 2a), which is defined by the metadata directory structure. The primary goal of the query is to scan through the metadata hierarchical data structure to identify a particular metadata file that corresponds to the absolute pathname of the requested user file. The identified metadata file includes information identifying the subset of the file servers (142, 144) hosting the requested user file. However, as discussed above in the background section, a single metadata server, such as metadata server 162, may lack the storage or computational capacity required to host all the metadata files of the aggregated file system 140. Accordingly, the metadata files are also split into multiple volumes. A respective metadata volume is assigned to one of the metadata servers (162, 164), and covers only a portion of the metadata hierarchical directory structure. In some embodiments, more than one metadata volume may be assigned to a metadata server. The group of metadata servers (162, 164) jointly cover the entire metadata hierarchical directory structure through the metadata volumes they manage.


It will be understood by one skilled in the art that FIG. 1 shows two sets of servers for the purpose of illustrating that there are generally two types of data in an aggregated file system. However, the present invention is also applicable to embodiments in which both user data and metadata volumes are physically stored in a same server.



FIG. 2b depicts an illustrative metadata hierarchical directory structure that can support the user view shown in FIG. 2a, implemented using a plurality of metadata volumes (MDV). In this example, there is a root metadata volume, MDV1_1, at the root level that has respective links to a plurality of child metadata volumes MDV2_1 (via Mount Entry ME2_1, whose pathname within MDV1_1 is “usr/joe”) and MDV2_2 (via Mount Entry ME2_2, whose pathname within MDV1_1 is “usr/bill/data”). This means that the portion of the file system hierarchy below the root directory of MDV2_1 is effectively seen by the client as being placed beneath the “usr/joe” Mount Entry, and that the latter appears as a directory in MDV1_1 to the client. Metadata volume MDV2_2 has no further links to any other metadata volumes. Metadata volume MDV2_1, however, has links to MDV3_1 (via Mount Entry ME3_1, whose pathname within MDV2_1 is “old/archive”), MDV3_2 (via Mount Entry ME3_2, whose pathname within MDV2_1 is “progs/code”) and MDV3_3 (via Mount Entry ME3_3, whose pathname within MDV2_1 is “arch/prodx/sw”). Metadata volumes MDV3_1 and MDV3_3 have no further links, whereas volume MDV3_2 has two more links to MDV4_1 (via Mount Entry ME4_1, whose pathname within MDV3_2 is “src/C”) and to MDV4_2 (via Mount Entry ME4_2, whose pathname within MDV3_2 is “src/java”). Finally, volumes MDV4_1 and MDV4_2 have no further links. The resulting file system hierarchy is depicted in FIG. 2c, which identifies individual metadata volumes, Mount Entries and Reverse Mappers, each of which links back the root directory of a child metadata volume to the referencing Mount Entry in its respective parent metadata volume.


The interpretation of pathnames in the metadata service occurs in terms of absolute pathnames that may span multiple metadata volumes. When a pathname to be interpreted is passed to a standard File System running on the root metadata volume, the File System has no knowledge about Mount Entries, which appear to it as pure data files. Thus, pathnames that include Mount Entries as intermediate components would cause the File System to return an error indicating that the pathname does not exist. Therefore, it is necessary to break up absolute pathnames into multiple components so that each component (between two Mount Entries, or between the root of the pathname and a Mount Entry) is interpreted in the context of one specific metadata volume. Since this process must be carried out every time a client sends a request containing a pathname, there is a need for an efficient process to break the absolute pathname into the multiple components and to direct the request containing the final residual portion of the pathname to the metadata volume whose File System can interpret it. The process of retrieving the target metadata volume in response to any pathname could involve hopping across multiple volumes, which may imply network exchanges with various servers. This process potentially has two impacts: 1) the time it takes to get to the target server would be considerably extended by the number of network interactions needed; 2) since the process would start from the root volume, the server that hosts the root volume would have the largest communications load because every pathname translation begins with the root volume.


It would therefore be advantageous to provide a metadata access system and process that meets the following requirements: 1) the system or process must be capable of performing partial matches of pathnames through intermediate Mount Entries detection, regardless of the number of pathname components; 2) the system or process must be efficient; 3) the system or process must efficiently handle pathname changes (e.g., pathname changes performed in response to user requests); 4) the system or process must avoid overloading the servers that manage the metadata volumes highest up in the metadata hierarchy, e.g., the root volume; 5) in a system having multiple File Switches (152), all the File Switches should have a common view of the metadata volume hierarchy. Typically, the metadata hierarchy may evolve over time, but only quite slowly. Also, the matching of pathnames must be performed only up to the final Mount Entry (which points to the target metadata volume).


In some embodiments, the volume-resident metadata structures that support the partitioning of metadata are a Mount Entry List, metadata files that implement the Mount Entries and metadata files that implement the Reverse Mappers. Besides providing the basic cross-volume link information, these metadata files also provide a certain amount of redundancy that allows missing links to be reconstructed. Reconstruction may be required, for example, if a system crash leaves the aggregated file system in an inconsistent state. Below are definitions of the three types of objects; an illustrative sketch of these structures follows the list:

    • Mount Entries: metadata files that point to a target volume ID. They are the actual cross-volume links interpreted by the aggregated file system that runs in the File Switch. The file system, using this mechanism, makes the root directory of the target volume to which a Mount Entry points appear as if it were located at the pathname of the Mount Entry.
    • Mount Entry List: these are ancillary metadata files that contain one entry for each Mount Entry in the volume where they reside. Each entry in the list is an ordered pair that provides the pathname of the Mount Entry relative to the root directory of the volume where it resides and the ID of the volume it points to (see FIG. 3b for examples of Mount Entry Lists for volumes MDV1_1 and MDV2_1, in the context of the example in FIGS. 2a and 2b). This file is used in loading the Mount Entry Cache on startup. In one embodiment, this file is the one that is the ultimate reference for the existing Mount Entries in the volume.
    • Reverse Mapper: these are ancillary metadata files of which there is only one in the root directory of each volume. Each such file contains an ordered pair: the ID of the volume where the Mount Entry that references this volume resides and the pathname of the Mount Entry relative to the root directory of the volume where it resides. FIG. 3b shows the content of the Reverse Mappers for volume MDV2_1 and MDV3_3 of the example in FIGS. 2a and 2b.
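For illustration only, the three object types defined above might be modeled as the following records; the class and field names are hypothetical and simply restate the definitions.

```python
# Hypothetical record layouts restating the three on-disk metadata objects above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MountEntry:
    """Cross-volume link: a metadata file whose pathname within its volume stands
    for the root directory of the target volume."""
    relative_pathname: str  # pathname of the Mount Entry within its volume
    target_volume_id: str   # ID of the child metadata volume it points to

@dataclass
class MountEntryList:
    """Ancillary file, one per volume: (relative pathname, target volume ID) pairs
    for every Mount Entry residing in that volume; used to load the Mount Entry
    Cache at startup."""
    entries: List[Tuple[str, str]]

@dataclass
class ReverseMapper:
    """Ancillary file, one in the root directory of each non-root volume: the
    parent volume and the relative pathname of the Mount Entry in the parent
    that references this volume."""
    parent_volume_id: str
    mount_entry_pathname: str

# Example values mirroring FIG. 3b for volume MDV2_1.
mdv2_1_me_list = MountEntryList(entries=[("old/archive", "MDV3_1"),
                                         ("progs/code", "MDV3_2"),
                                         ("arch/prodx/sw", "MDV3_3")])
mdv2_1_reverse_mapper = ReverseMapper(parent_volume_id="MDV1_1",
                                      mount_entry_pathname="usr/joe")
```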


The requirements related to the partitioned metadata, discussed above, can be met by means of a Mount Entry Cache residing in each File Switch. In some embodiments, the cache contains all the existing Mount Entries, rather than a subset of them that are frequently used.



FIG. 4 is an overview flowchart illustrating an embodiment of the operations in the aggregated file system 140 (FIG. 1) in response to user requests. The system begins by conducting a system-level initialization 401. More details about this initialization process are provided below in connection with FIG. 5. After the system-level initialization 401, the file switches in the aggregated file system 140 (FIG. 1) wait for a subsequent or next user request 403. Different types of user requests are followed by different types of operations, such as a mount entry insertion 405, a mount entry lookup 407 or a mount entry deletion 409. After an operation, the system waits for the next user request 403 or terminates operation 411, e.g., in response to a power-off instruction.



FIG. 5 is a flowchart illustrating an embodiment of the operations in the file switch during the system-level initialization 401 (FIG. 4). The file switch identifies the root metadata volume 502 and loads the mount entry list (ME list) of the root metadata volume into a mount entry cache (MEC) 504 of the file switch. An empty ME list indicates that the root metadata volume itself has all the metadata of the aggregated file system 140 (FIG. 1), in which case any user request can be resolved within the root metadata volume itself. However, as shown in FIG. 2b, this root metadata volume's ME list typically includes a plurality of list entries and each respective list entry further comprises a group of mount entries. Therefore, the file switch selects an entry from the ME list that has not been selected before 506, identifies the child metadata volume associated with the selected entry 508 and recursively loads the ME list associated with the child metadata volume into the Mount Entry Cache (MEC) 510 until there is no remaining mount entry that has not been selected by the file switch 512. In the context of 506, “the ME List” includes all ME lists that have been loaded into the MEC. At the end of the system-level initialization 401 (FIG. 4), a copy of the metadata hierarchical directory structure of the aggregated file system 140 (FIG. 1) is stored in the MEC of the file switch.
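For illustration only, the recursive loading of ME lists into the mount entry cache might look as follows; fetch_me_list and the example data are hypothetical stand-ins for reading a volume's ME list from its metadata server.

```python
# Sketch of the system-level initialization of FIG. 5: load the Mount Entry list
# of the root volume and, recursively, of every child volume into the Mount
# Entry Cache.

def fetch_me_list(volume_id):
    """Return {relative pathname of Mount Entry: child volume ID} for a volume."""
    example = {
        "MDV1_1": {"usr/joe": "MDV2_1", "usr/bill/data": "MDV2_2"},
        "MDV2_1": {"old/archive": "MDV3_1", "progs/code": "MDV3_2"},
    }
    return example.get(volume_id, {})

def initialize_mount_entry_cache(root_volume_id):
    """Build the Mount Entry Cache: per-volume ME lists, loaded starting at the
    root volume and descending into each child volume."""
    cache = {}
    pending = [root_volume_id]
    while pending:                      # iterative equivalent of the recursion
        volume = pending.pop()
        me_list = fetch_me_list(volume)
        cache[volume] = me_list         # an empty ME list marks a leaf volume
        pending.extend(me_list.values())
    return cache

mec = initialize_mount_entry_cache("MDV1_1")
```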


Referring to FIG. 1, since the aggregated file system 140 often includes multiple file switches (152, 154), in some embodiments each of the multiple file switches, such as file switch 152, has its own copy of the metadata hierarchical directory structure in its own mount entry cache (153, 155). In some embodiments, a subset of the multiple file switches (152, 154) have their own copy of the metadata hierarchical directory structure. In one embodiment, each file switch conducts the system-level initialization 401 (FIG. 4) independently to generate its own copy of the metadata hierarchical directory structure. In yet another embodiment, after the metadata hierarchical directory structure is created in the mount entry cache of a respective file switch, it is broadcast and replicated in the other file switches. A significant advantage of caching the metadata hierarchical directory structure in the memory of the respective file switch is that the metadata volume responsive to a user request, served by the appropriate metadata server such as metadata server 162, can be identified efficiently, because the cache eliminates the need to retrieve metadata from multiple volumes.


Depending on the types of the user requests, there are three primary operations associated with the metadata hierarchical directory structure in the MEC. FIG. 6 is a flowchart illustrating an embodiment of the operations in the mount entry lookup 407 (FIG. 4), which is the most typical operation of the three. On receipt of the user request for accessing a user-specified file 403, the file switch retrieves an absolute pathname of the user-specified file from the user request (601). The file switch sets the root metadata volume as the current metadata volume (603), sets the absolute pathname as the current pathname (605) and examines its associated mount entries for any one whose relative pathname partially matches the current pathname (607). If a matching mount entry is found, the file switch sets the metadata volume associated with the matching entry as the new current metadata volume (609), creates the residual pathname by removing the relative pathname of the matching entry from the current pathname (611), sets the residual pathname as the new current pathname (613) and returns to operation (607) to search for another matching mount entry in the new current ME list for the metadata volume.


The aforementioned process repeats itself recursively until no mount entry is found in the ME list of the current metadata volume whose relative pathname matches a beginning portion of the current pathname. Then the file switch sets the current metadata volume as the target metadata volume (615), sets the current pathname as the residual pathname (617), and information identifying the target metadata volume and the residual pathname is returned (619). Based on the returned information, the file switch may directly visit the location in the target metadata volume as represented by the residual pathname and retrieve the corresponding metadata information associated with the user-specified file.


An efficient Mount Entry Cache capable of matching strings could be based on a tree data structure. This would be adequate if pathname changes were infrequent. However, since pathname changes are controlled by client applications, there is no such guarantee. A preferred design for the Mount Entry Cache is based on the following. The cache is organized as a tree of descriptors for Mount Entries pointing to physical metadata volumes. Each Mount Entry that references a volume containing Mount Entries points to one or more List Entries. For example, the file system tree in FIG. 2a, implemented as shown in FIGS. 2b and 2c, is stored in the Mount Entry Cache as shown in FIG. 3a. The tree (FIG. 3a) has a root node, root Mount Entry ME1_1. In some embodiments, each Mount Entry (ME1_1, ME2_1, ME2_2, ME3_1, ME3_2, ME3_3, ME4_1, ME4_2 in FIGS. 2b and 2c) stores the ID of the associated physical volume, the relative pathname of the Mount Entry (which would be null for the root Mount Entry, ME1_1, as shown in FIG. 3a), the maximum number of pathname components for the Mount Entries in the associated physical volume, and a linked list of List Entries that point or link to all the Mount Entries in the volume. One List Entry exists for each set of Mount Entries that are contained in the volume referenced by the parent Mount Entry and that have the same number of pathname components. Within each List Entry, the number of pathname components for all the Mount Entries the List Entry references is stored. Each List Entry also points to the next List Entry, if any. The list of List Entries is ordered by the number of components in the pathnames of the listed Mount Entries. In some embodiments, each List Entry contains a hash table that allows access to the Mount Entries via the hash code computed from the absolute pathname of the Mount Entry. Mount Entries with the same hash code are accessed as a linked list referenced by the bucket or record associated with the hash code (e.g., the hash table may point to the first entry of the linked list, or may point to a record that contains or points to the first entry of the linked list).


Mapping a given absolute pathname into the Mount Entry that points to the appropriate metadata volume is accomplished by searching for a Mount Entry that either exactly matches all the components in the pathname, or for a terminal Mount Entry that is a partial match, i.e., an exact match to a stem of the pathname. The search is based on efficiently matching strings of variable length until the maximal match is found.


In some embodiments, the data structures used to perform the matching (see FIG. 3a) and the associated algorithm (see FIGS. 6 and 7) minimize the computational resources used to map a pathname to a matching Mount Entry. The input pathname is scanned and a hash code is generated for the pathname. Two data structures are used: the Mount Entries (2000, 2010, 2020, 2030, 2040, 2050, 2060 and 2070 in FIG. 3a) and the List Entries (2100, 2110, 2120, 2130 and 2140 in FIG. 3a). A Mount Entry (ME) 2000 in FIG. 3a comprises the following fields:

    • the ID of the MDV the ME points to (2001)—in this case this entry (the root ME) points to MDV1_1;
    • the maximum number of pathname components in the MEs that are within the MDV this ME points to (2002)—this field would be set to 0 if there were no MEs in the MDV; in this case it is set to 3, as there are subordinate MEs with pathname component counts of 2 and 3;
    • the relative pathname of this ME with respect to the root directory of the volume (2003)—for this ME the relative pathname is a null string as this ME is the global file system root;
    • a pointer to an LE that points to MEs within this MDV (2004)—this would be a null pointer if there were no MEs in the MDV this ME points to; in this case it points to LE 2100;
    • a pointer to the next ME, if any, that has the same hash code as this ME (2005)—this is a null pointer when there are no other MEs, as in the case of this ME.


In some embodiments, a List Entry (LE), such as LE 2100, includes the following fields (an illustrative sketch of the ME and LE layouts follows this list):

    • a hash table (2101) in which each non-null item (2102) is a pointer to a list of MEs whose hash code maps to that hash table entry;
    • the count of components in the MEs that this LE points to (2103)—in this case, pathnames with two components; and
    • a pointer to the next LE (2104), if any, where the next LE points to MEs within the same volume with a number of pathname components that is higher than the number of pathname components for the MEs referenced by the current LE. For example, the next LE pointer in LE 2100 points to LE 2110, which points to MEs having three pathname components.
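For illustration only, the two cache-resident structures may be sketched as below; the field names follow reference numerals 2001-2005 and 2101-2104, but the classes themselves are hypothetical rather than the patented implementation.

```python
# Illustrative in-memory layout of the Mount Entry Cache structures; field names
# follow reference numerals 2001-2005 (Mount Entry) and 2101-2104 (List Entry).
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class MountEntryNode:
    volume_id: str                      # 2001: ID of the MDV this ME points to
    max_components: int                 # 2002: max pathname components of MEs in that MDV
    relative_pathname: str              # 2003: pathname of this ME within its volume
    first_list_entry: Optional["ListEntry"] = None      # 2004: LEs for MEs within the MDV
    next_same_hash: Optional["MountEntryNode"] = None   # 2005: next ME with the same hash code

@dataclass
class ListEntry:
    component_count: int                # 2103: pathname component count of the MEs listed
    hash_table: Dict[int, List[MountEntryNode]] = field(default_factory=dict)  # 2101/2102
    next_list_entry: Optional["ListEntry"] = None        # 2104: LE with the next higher count
```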


A flowchart of the pathname lookup process used in some embodiments is shown in FIGS. 6 and 7. FIG. 7 is a flowchart illustrating the operations of identifying the matching mount entry 407 (FIG. 4) within a metadata volume. In particular, FIG. 7 represents one implementation of operation 607, which searches for a matching mount entry within the current metadata volume.


To look at a concrete example, consider the case in FIG. 11, in which the pathname "/usr/joe/progs/code/src/java/Applets/app.java" is to be looked up on the file system of FIG. 2a, built on the MDVs shown in FIG. 2b. The steps are carried out on the basis of the data structure in FIG. 3a and are summarized in table 1200 of FIG. 11; an illustrative code sketch follows the list of steps:

    • In the first step, the pathname is handed off to the lookup engine of the Mount Entry Cache. The engine starts with the root ME (2000), after having removed the starting forward slash. So, at this point the pair to be interpreted is made of ME1_1 (pointing to MDV1_1) and of the residual pathname: “usr/joe/progs/code/src/java/Applets/app.java”.
    • The hash code for the pathname is computed. This computes a different hash code for each preliminary or beginning portion of the pathname up to the maximum number of components specified in ME1_1 (field 2002). In this example, the hash codes are computed for the strings: “usr”, “usr/joe” and “usr/joe/progs”. In some embodiments, the hash code for each string is computed incrementally from the previous one.
    • Now, LE 2100 is looked at: it handles pathnames with 2 components (field 2103), therefore the first hash code need not be used, and the second one will be selected. This will map to a given entry within the hash table (2101). Therefore, the link will be followed until an ME that matches the second string is found. This leads to ME2_1 (2010) and to a residual pathname of "progs/code/src/java/Applets/app.java" (step 1210 in FIG. 11). In general, the lookup could have been unsuccessful if the pathname comprised just one pathname component or if no match for "usr/joe" had been found. In the first case, the resulting pair would be made of MDV1_1 and pathname "usr/joe/progs/code/src/java/Applets/app.java". In the second case, the lookup would have continued with the following LE in the list (2110), which is used to locate or match pathnames having three components.
    • The starting point now is ME2_1 (2010) and the residual pathname is “progs/code/src/java/Applets/app.java”. The maximum number of components handled within the context of ME2_1 is three. So, the pathname strings to be considered are: “progs”, “progs/code” and “progs/code/src” and the three hash codes are incrementally computed.
    • Now, LE 2120 is looked at. Since it deals with pathnames with two components, the string to be considered is: “progs/code”. The hash code for this string maps to ME3_2 (2050) and the strings match (step 1220 in FIG. 11). So, the next step is to interpret the residual pathname “src/java/Applets/app.java”, in the context of MDV3_2.
    • At the new starting point, the pathname strings are "src", "src/java" and "src/java/Applets", for which hash codes are incrementally computed. Now, LE 2140 is looked at. Since it maps pathnames with two components, only "src/java" and its hash code are considered. This leads to ME4_2 (2060, FIG. 3a) and to the residual pathname "Applets/app.java". Since the maximum pathname component count for ME4_2 (2060) is zero, there is no underlying ME to go to. So, the final result is the ordered pair <MDV4_2, "Applets/app.java"> (step 1230 in FIG. 11) and the request will be sent to MDV4_2, which will interpret and process the request (see 1300 in FIG. 11).
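The trace above can be reproduced with the illustrative sketch below, which performs the same maximal-prefix matching using plain string comparison (the hash tables of FIG. 3a are omitted for brevity); the data and function are hypothetical.

```python
# Illustrative reproduction of the lookup trace above (hashing omitted).
mount_entry_lists = {
    "MDV1_1": {"usr/joe": "MDV2_1", "usr/bill/data": "MDV2_2"},
    "MDV2_1": {"old/archive": "MDV3_1", "progs/code": "MDV3_2", "arch/prodx/sw": "MDV3_3"},
    "MDV3_2": {"src/C": "MDV4_1", "src/java": "MDV4_2"},
}

def lookup(pathname, root="MDV1_1"):
    """Return the successive <volume, residual pathname> pairs; the last pair
    identifies the target metadata volume and its residual pathname."""
    volume, residual = root, pathname.lstrip("/")
    hops = [(volume, residual)]
    while True:
        for relpath, child in mount_entry_lists.get(volume, {}).items():
            if residual == relpath or residual.startswith(relpath + "/"):
                volume, residual = child, residual[len(relpath):].lstrip("/")
                hops.append((volume, residual))
                break
        else:
            return hops

for volume, residual in lookup("/usr/joe/progs/code/src/java/Applets/app.java"):
    print(volume, residual)
# MDV1_1 usr/joe/progs/code/src/java/Applets/app.java
# MDV2_1 progs/code/src/java/Applets/app.java   (step 1210)
# MDV3_2 src/java/Applets/app.java              (step 1220)
# MDV4_2 Applets/app.java                       (step 1230)
```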



FIG. 7 represents one implementation of operation 607, which searches for a matching mount entry in the current ME list. The matching mount entry is the one whose relative pathname matches a portion of the current pathname beginning with its first path component. For example, if the current pathname is "user/local/tmp" and there are two mount entries in the ME list whose relative pathnames are, respectively, "user" and "tmp", the former entry is the matching entry since its relative pathname matches the first path component in the current pathname, while the latter is not the matching entry because its relative pathname does not match the first component in the current pathname. The fact that the latter entry "tmp" matches the third path component in the current pathname is irrelevant for the purposes of locating a matching mount entry. The comparison between the path components in the current pathname and those in the relative pathname may be implemented as a string comparison. In one embodiment, if the pathname in a user request is not already in the same format as the relative pathnames in the mount entries, the pathname in the user request is converted into the format of the relative pathnames in the mount entries.


An efficient method of identifying the matching mount entry in the ME list is to calculate the hash code for a first portion of the current pathname and to perform a hash table lookup based on the hash code, because hash code calculation and table lookup are often faster than string comparison. The result of the hash table lookup directs the file switch (which is performing the pathname search operation) to an appropriate bucket, i.e., the matching mount entry in the ME list. When the current pathname has multiple path components, multiple hash codes may be generated. For example, if the current pathname is "user/local/tmp", the file switch may generate three respective hash codes for the partial pathnames "user", "user/local" and "user/local/tmp". Among them, there is at most one hash code, if any, having a matching mount entry in the ME list (i.e., with a matching hash code), and this matching entry must belong to one of the list entries having a path component count equal to the path component count of the portion of the current pathname used to generate the matching hash code.
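For illustration only, computing the hash codes for the leading portions of a pathname incrementally might look as follows; the polynomial hash is an arbitrary stand-in, chosen only to show the incremental update.

```python
# Toy illustration of incrementally hashing each leading portion of a pathname
# ("user", "user/local", "user/local/tmp"); the hash function is arbitrary.
def prefix_hashes(pathname, max_components):
    """Return {component count: hash code of that leading portion}."""
    components = pathname.split("/")
    hashes, h = {}, 0
    for count, comp in enumerate(components[:max_components], start=1):
        piece = comp if count == 1 else "/" + comp
        for ch in piece:                          # extend the previous hash
            h = (h * 31 + ord(ch)) & 0xFFFFFFFF   # incremental polynomial hash
        hashes[count] = h
    return hashes

print(prefix_hashes("user/local/tmp", 3))  # hash codes for 1, 2 and 3 components
```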


It is noted that the path component counts for the list entries in an ME list need not be continuous. Therefore, it may not be necessary to calculate a hash code for every possible partial current pathname. Rather, for a given current pathname, hash codes need to be generated only for those path component counts that (A) have an associated list entry in the current metadata volume's ME list, and (B) do not exceed the number of components in the current pathname. Further, as explained next, these hash codes can be computed one at a time, starting with the smallest component count, until either a matching entry is found, or the search for a matching entry is exhausted without success.


In the embodiment shown in FIG. 7, the file switch identifies a set of unique path component counts in the ME list of the current metadata volume (710), each count corresponding to one list entry in the ME list. Starting with the smallest unique path component count (712), the file switch calculates the hash code for the partial current pathname having that number of path components (720) and compares the hash code with the hash codes in the corresponding list entry (LE) (730). If a matching mount entry is found in the list entry (730—Yes), the file switch examines the matching metadata volume's ME list for a new match (609). If no match is found in the list entry (730—No), the file switch determines whether the set of unique path component counts has at least one count larger than the current count (740). If not (740—No), there is no mount entry in any list entry matching any portion of the current pathname, and the file switch performs operation 615 (described above) to find the metadata file associated with the user-specified file.


Otherwise (740—Yes), the file switch identifies the next unique path component count, which is the smallest path component count not yet processed, and returns to operation 720 to generate a new hash code for a new partial current pathname. In one embodiment, if the hash code generation is not completely unique, the file switch may need to conduct a string comparison after the hash code-based matching (730) to verify that it has located an entry with a relative pathname matching a partial pathname of the specified file.
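For illustration only, the per-volume matching loop of FIG. 7 may be sketched as follows, under the simplifying assumption that each List Entry is a (component count, hash table) pair whose hash table maps a hash code to a list of (relative pathname, child volume ID) entries; the helper names and the hash function are hypothetical.

```python
# Sketch of the per-volume matching loop of FIG. 7 (operation reference numerals
# noted in comments). List Entries are simplified to
# (component_count, {hash_code: [(relative_pathname, child_volume_id), ...]}).

def path_hash(s):
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

def find_matching_mount_entry(current_pathname, list_entries):
    """Return (relative pathname, child volume ID) of the matching Mount Entry,
    or None if no Mount Entry in this volume matches (leading to operation 615)."""
    components = current_pathname.split("/")
    for count, hash_table in sorted(list_entries, key=lambda le: le[0]):  # 712: smallest count first
        if count > len(components):                                       # 740 -- No: search exhausted
            break
        partial = "/".join(components[:count])                            # 720: partial current pathname
        for relative_pathname, child_volume in hash_table.get(path_hash(partial), []):
            if relative_pathname == partial:                              # 730: verify by string compare
                return relative_pathname, child_volume                    # continue at 609
    return None

# Example: the List Entries of MDV2_1, grouped by pathname component count.
mdv2_1_list_entries = [
    (2, {path_hash("old/archive"): [("old/archive", "MDV3_1")],
         path_hash("progs/code"): [("progs/code", "MDV3_2")]}),
    (3, {path_hash("arch/prodx/sw"): [("arch/prodx/sw", "MDV3_3")]}),
]
print(find_matching_mount_entry("progs/code/src/java/Applets/app.java",
                                mdv2_1_list_entries))   # ('progs/code', 'MDV3_2')
```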



FIG. 8 is a flowchart illustrating an embodiment of the operations in the mount entry insertion 405 (FIG. 4) with respect to the metadata hierarchical directory structure. A mount entry insertion occurs when a new mount entry is added to the mount entry list for an identified metadata volume. The new mount entry represents a new metadata volume that has been added to the aggregated file system. Alternately, when a child metadata volume is moved within the hierarchical directory structure represented by the metadata volumes, a mount entry deletion and a mount entry insertion are required in order to implement the change of the child metadata volume's position in the hierarchy.


On receipt of the mount entry insertion request 403, the file switch retrieves (810) from the insertion request information identifying the parent metadata volume, the child metadata volume (e.g., a new metadata volume being added to the system) and the relative pathname of the child metadata volume in the directory structure of the parent metadata volume. The file switch identifies the child metadata volume and creates a reverse mapper in the child metadata volume (820). In some embodiments, the reverse mapper is a file located in or referenced by the root directory of the child metadata volume. The reverse mapper includes the ID of the parent metadata volume and the relative pathname of the ME pointing to the volume with respect to the root directory of the parent volume. The file switch subsequently opens the ME list of the parent metadata volume and inserts into it a new mount entry pointing to the child metadata volume (830) according to the relative pathname. Next, if the ME list of the parent metadata volume has been loaded into the mount entry cache, the file switch synchronizes the mount entry cache with the ME list of the parent metadata volume by inserting the newly created mount entry into the mount entry cache (840). In some embodiments, the file switch further identifies an appropriate directory in the parent metadata volume and creates a new mount entry in the directory (850) (e.g., by storing a record within the directory, or by storing within the directory a reference to a file containing the new mount entry). The new mount entry in the directory includes the ID of the child metadata volume.
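For illustration only, the insertion sequence of FIG. 8 might be sketched as follows; the storage helpers are hypothetical stand-ins for the metadata-server operations described above and merely record what would be written.

```python
# Sketch of the mount entry insertion of FIG. 8; operation numerals are noted in
# the comments. The three helpers are hypothetical placeholders that only log.
journal = []

def write_reverse_mapper(child, parent, relpath):
    journal.append(f"reverse mapper in {child}: ({parent}, {relpath})")

def append_to_me_list(parent, relpath, child):
    journal.append(f"ME list of {parent}: add ({relpath}, {child})")

def create_mount_entry_file(parent, relpath, child):
    journal.append(f"mount entry file {relpath} in {parent} -> {child}")

def insert_mount_entry(mec, parent_volume, child_volume, relative_pathname):
    write_reverse_mapper(child_volume, parent_volume, relative_pathname)     # 820
    append_to_me_list(parent_volume, relative_pathname, child_volume)        # 830
    mec.setdefault(parent_volume, {})[relative_pathname] = child_volume      # 840: sync the MEC
    create_mount_entry_file(parent_volume, relative_pathname, child_volume)  # 850

mec = {"MDV1_1": {"usr/joe": "MDV2_1"}}
insert_mount_entry(mec, "MDV1_1", "MDV_NEW", "usr/new/data")  # MDV_NEW is hypothetical
```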


Note that creating a reverse mapper (820) is redundant, since the file switch only needs to visit the ME list itself within each parent metadata volume to create a complete mount entry cache for the aggregated file system 140. However, storing such redundant information in the metadata volumes ensures that the aggregated file system 140 (FIG. 1) is able to efficiently reconstruct the metadata hierarchical directory structure after a system crash.



FIG. 9 is a flowchart illustrating an embodiment of the mount entry deletion operation 409 (FIG. 4) with respect to the metadata hierarchical directory structure. Note that the processing order of the mount entry deletion 409 (FIG. 4) is opposite to that of the mount entry insertion 405 (FIG. 4). On receipt of the mount entry deletion request 403, the file switch retrieves from the deletion request information identifying a pair of metadata volumes including a parent metadata volume and a child metadata volume (910). The file switch identifies and deletes a mount entry in its associated mount entry cache that corresponds to the pair of parent and child metadata volumes (920). As a result, the metadata files stored in the child metadata volume are immediately inaccessible to the client. The file switch furthermore identifies and deletes a mount entry from the ME list of the parent metadata volume that points to the child metadata volume (930) and, if the child metadata volume is still part of the aggregated file system, a reverse mapper from the child metadata volume that points to the parent metadata volume (940). In some embodiments, the file switch also identifies and deletes a mount entry from a directory in the parent metadata volume that is associated with the child metadata volume according to its relative pathname (950).
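For illustration only, the deletion sequence of FIG. 9 might be sketched as follows, processed in the opposite order of the insertion; the storage helpers are again hypothetical placeholders that only log.

```python
# Sketch of the mount entry deletion of FIG. 9: the cached entry is removed
# first (920), then the on-disk structures (930-950).
log = []

def remove_from_me_list(parent, relpath, child):
    log.append(f"ME list of {parent}: remove ({relpath}, {child})")

def remove_reverse_mapper(child):
    log.append(f"remove reverse mapper from {child}")

def remove_mount_entry_file(parent, relpath):
    log.append(f"remove mount entry file {relpath} from {parent}")

def delete_mount_entry(mec, parent_volume, child_volume):
    me_list = mec.get(parent_volume, {})
    relative_pathname = next((rp for rp, child in me_list.items()
                              if child == child_volume), None)
    if relative_pathname is None:
        return
    del me_list[relative_pathname]                                       # 920: child becomes unreachable
    remove_from_me_list(parent_volume, relative_pathname, child_volume)  # 930
    remove_reverse_mapper(child_volume)                                  # 940 (if the volume remains)
    remove_mount_entry_file(parent_volume, relative_pathname)            # 950

mec = {"MDV1_1": {"usr/joe": "MDV2_1", "usr/bill/data": "MDV2_2"}}
delete_mount_entry(mec, "MDV1_1", "MDV2_2")
```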


In some embodiments, a file switch (e.g., file switch 152 or 154) (FIG. 1) of the aggregated file system 140 (FIG. 1) is implemented using a computer system schematically shown in FIG. 10. The file switch comprises one or more processing units (CPUs) 1000, a memory device 1009, one or more network or other communication interface circuits 1004 for interconnecting a plurality of clients 1006, file servers 1007 and metadata servers 1008 (each managing one or more metadata volumes), and a switch 1003 or bus interface for connecting the network interface circuits to one or more system buses 1001 that interconnect these components. The file switch may optionally have a user interface 1002, although in some embodiments the file switch is managed using a workstation connected to the file switch via one of the network interface circuits 1004. In alternate embodiments, much of the functionality of the file switch may be implemented in one or more application specific integrated circuits (ASICs), thereby either eliminating the need for the CPU, or reducing the role of the CPU in the handling of file access requests by client computers.


The memory 1009 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices. The memory 1009 may include mass storage that is remotely located from the central processing unit(s) 1000. The memory 1009 stores:

    • an operating system 1010 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
    • a network communication module 1011 that is used for controlling communication between the system and clients 1006, file servers 1007 and metadata servers 1008 via the network interface circuits 1004 and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, or combinations of two or more of these networks;
    • a file switch module 1012, for implementing many of the main aspects of the aggregated file system 140 (FIG. 1), including a mount entry cache (MEC) initialization module 1013, a MEC insertion module 1014, a MEC lookup module 1015 and a MEC deletion module 1016;
    • file state information 1020, including transaction state information 1021, open file state information 1022 and locking state information 1023; and
    • cached information 1024, including mount entry cache 1025, cached (and aggregated) data files 1026 and corresponding metadata files 1027.


The file switch module 1012, the state information 1020 and the cached information 1024 may include executable procedures, sub-modules, tables and other data structures. In other embodiments, additional or different modules and data structures may be used, and some of the modules and/or data structures listed above may not be used.


As shown in FIG. 1, a user-specific file's metadata is stored in one or more metadata servers (162, 164) and separated from the file's user data that is stored in one or more file servers (142, 144). In order to satisfy a request to access a particular file, a file switch first visits a set of metadata servers based on the identified target metadata volume to identify a set of file servers hosting the user data of the requested file and then visits each of the file servers to retrieve the requested user data. One benefit inherent in this configuration is that a file switch may be able to retrieve the user data from multiple file servers more efficiently, e.g., in a parallel mode, when certain data striping or mirroring strategies are employed by the aggregated file system.


According to another embodiment, a self-sustained file server can be incorporated into the aggregated file system by generating in the file system's mount entry caches a new mount entry representing the file server and making the file system hierarchy associated with the file server a subset of the file system hierarchy associated with the aggregated file system. In this embodiment, the volumes managed by the file server remain in their native format and the file server is insulated from any data striping or mirroring strategy implemented in the aggregated file system. When a file switch processes a file access request for a file stored in the self-sustained file server, it is only responsible for identifying a volume within the file server. All subsequent processing, including access to the requested files, is exclusively handled by the file server itself with respect to both the file's user data and the metadata.


To incorporate this self-sustained file server into an existing aggregated file system, little modification to the aggregated file system is required beyond inserting into the mount entry caches of the aggregated file system a new mount entry corresponding to the file server and associating the new entry with some existing ones in the metadata hierarchical directory structure of the file system. A file access request can be satisfied by just one visit to the file server since both the file's metadata and user data can be found therein. By the same token, the task of disconnecting the file server from the aggregated file system is also less complicated. The file system only needs to identify and update or eliminate entries in the mount entry caches that are relevant to the file server.


In particular, if the self-sustained file server joins or leaves an aggregated file system as a member associated with one leaf node of the hierarchical directory structure of the aggregated file system, the only change to the mount entry cache is to update the mount entry list associated with the parent node of the leaf node, because this is the only member of the existing system that has a logical connection with the self-sustained file server. But if the file server joins or leaves the aggregated file system as a member associated with an intermediate node, additional changes to the data in the mount entry cache are needed to ensure that the mount entry lists associated with its parent and child nodes are updated to reflect the change to the hierarchy and to ensure there is no name conflict between the new file server and any existing ones in the metadata hierarchical data structure.


The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.

Claims
  • 1. A method of handling a file processing request in an aggregated file system, comprising: receiving a request with respect to a file, the request identifying an absolute pathname of the file in the aggregated file system; identifying a target metadata volume that hosts metadata of the file among a plurality of metadata volumes in accordance with the request and one or more mount entries each associated with a respective one of the plurality of metadata volumes, including recursively: identifying a mount entry when associated with a current metadata volume, wherein the mount entry identifies a relative pathname matching a respective portion of the absolute pathname, the identified relative pathname further comprising pathname sub-components of sizes ranging from one pathname component to a size of the identified relative pathname; and selecting at least one of the pathname sub-components using the identified mount entry based upon a number indicated in a list entry pointed to by the identified mount entry, until no mount entry associated with a current metadata volume matches a respective portion of the absolute pathname, to set the current metadata volume; and identifying the current metadata volume as the target metadata volume and sending the request to a metadata server hosting the identified target metadata volume in response to the request to a file switch for assembling the file.
  • 2. The method of claim 1, wherein the recursively identifying includes producing a residual pathname by removing from the absolute pathname the relative pathname associated with each identified mount entry.
  • 3. The method of claim 1, wherein the absolute pathname of the file is unique in the aggregated file system.
  • 4. The method of claim 1, wherein at least a subset of the plurality of metadata volumes includes one or more mount entries, each mount entry identifying a respective distinct child metadata volume and an associated relative pathname.
  • 5. The method of claim 4, wherein at least one metadata volume in the subset of the plurality of metadata volumes includes a reverse mapper, the reverse mapper identifying a parent metadata volume.
  • 6. The method of claim 1, wherein: the plurality of metadata volumes include at least one root metadata volume, one or more intermediate metadata volumes and one or more leaf metadata volumes, and wherein the root metadata volume is a metadata volume that has no parent metadata volume, the leaf metadata volume is a metadata volume that has no child metadata volume and the intermediate metadata volume is a metadata volume that has both at least one parent and at least one child metadata volumes; the target metadata volume is one of the root metadata volumes when none of the child metadata volumes directly referenced by at least one of the root metadata volumes has an associated relative pathname matching a portion of the absolute pathname; the target metadata volume is one of the intermediate metadata volumes when at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname and none of the child metadata volumes referenced by at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname; and the target metadata volume is one of the leaf metadata volumes referenced by at least one of the root metadata volumes using one or more mount entries when at least one of the leaf metadata volumes has an associated relative pathname matching a portion of the absolute pathname.
  • 7. The method of claim 1, further comprising: forming an ordered pair comprising the identified target metadata volume and a residual pathname, if any, based upon said identifying of the target metadata volume.
  • 8. The method of claim 1 further comprising: accessing a plurality of mount entries, wherein each of the plurality of mount entries is associated with a respective one of the plurality of metadata volumes as a parent metadata volume and identifies a respective distinct child metadata volume and the relative pathname associated with the child metadata volume; resetting the current metadata volume to the child metadata volume identified by the identified mount entry, until no mount entry associated with the current metadata volume is found that matches a respective portion of the absolute pathname, the target metadata volume comprising the last child metadata volume so identified; and returning information identifying the target metadata volume, hosted metadata and its associated residual pathname in response to the request to a file switch for assembling the file in response to the request.
  • 9. The method of claim 1, wherein the identified relative pathname has a size based upon a maximum number of pathname components of the relative pathname that can be handled by the identified mount entry.
  • 10. An apparatus for handling a file processing request in an aggregated file system, comprising: at least one processor; and memory coupled to the at least one processor which is configured to execute program instructions stored in the memory comprising: receiving a request with respect to a specified file, the request including an absolute pathname of the specified file in the aggregated file system; identifying a target metadata volume that hosts metadata of the file among a plurality of metadata volumes in accordance with the request and one or more mount entries each associated with a respective one of the plurality of metadata volumes, including recursively: identifying a mount entry when associated with a current metadata volume, wherein the mount entry identifies a relative pathname matching a respective portion of the absolute pathname, the identified relative pathname further comprising pathname sub-components of sizes ranging from one pathname component to a size of the identified relative pathname; and selecting at least one of the pathname sub-components using the identified mount entry based upon a number indicated in a list entry pointed to by the identified mount entry, until no mount entry associated with a current metadata volume matches a respective portion of the absolute pathname, to set the current metadata volume; and identifying the current metadata volume as the target metadata volume and sending the request to a metadata server hosting the identified target metadata volume in response to the request to a file switch for assembling the file.
  • 11. The apparatus of claim 10, wherein the recursively identifying includes producing a residual pathname by removing from the absolute pathname the relative pathname associated with each identified mount entry.
  • 12. The apparatus of claim 10, wherein the absolute pathname of the file is unique in the aggregated file system.
  • 13. The apparatus of claim 10, wherein at least a subset of the plurality of metadata volumes includes one or more mount entries, each mount entry identifying a respective distinct child metadata volume and an associated relative pathname.
  • 14. The apparatus of claim 13, wherein at least one metadata volume in the subset of the plurality of metadata volumes includes a reverse mapper, the reverse mapper identifying a parent metadata volume.
  • 15. The apparatus of claim 10, wherein: the plurality of metadata volumes include at least one root metadata volume, one or more intermediate metadata volumes and one or more leaf metadata volumes, and wherein the root metadata volume is a metadata volume that has no parent metadata volume, the leaf metadata volume is a metadata volume that has no child metadata volume and the intermediate metadata volume is a metadata volume that has both at least one parent and at least one child metadata volumes; the target metadata volume is one of the root metadata volumes when none of the child metadata volumes directly referenced by at least one of the root metadata volumes has an associated relative pathname matching a portion of the absolute pathname; the target metadata volume is one of the intermediate metadata volumes when at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname and none of the child metadata volumes referenced by at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname; and the target metadata volume is one of the leaf metadata volumes referenced by at least one of the root metadata volumes using one or more mount entries when at least one of the leaf metadata volumes has an associated relative pathname matching a portion of the absolute pathname.
  • 16. The apparatus of claim 10 wherein the at least one processor is further configured to execute programmed instructions stored in the memory further comprising forming an ordered pair comprising the identified target metadata volume and a residual pathname, if any, based upon said identifying of the target metadata volume.
  • 17. The apparatus of claim 10 wherein the at least one processor is further configured to execute programmed instructions stored in the memory further comprising: accessing a plurality of mount entries, wherein each of the plurality of mount entries is associated with a respective one of the plurality of metadata volumes as a parent metadata volume and identifies a respective distinct child metadata volume and the relative pathname associated with the child metadata volume; resetting the current metadata volume to the child metadata volume identified by the identified mount entry, until no mount entry associated with the current metadata volume is found that matches a respective portion of the absolute pathname, the target metadata volume comprising the last child metadata volume so identified; and returning information identifying the target metadata volume, hosted metadata and its associated residual pathname in response to the request to a file switch for assembling the file in response to the request.
  • 18. The apparatus of claim 10, wherein the identified relative pathname has a size based upon a maximum number of pathname components of the relative pathname that can be handled by the identified mount entry.
  • 19. A non-transitory computer readable medium having stored thereon instructions for handling a file processing request in an aggregated file system comprising machine executable code which when executed by at least one processor, causes the processor to perform steps comprising: receiving a request with respect to a specified file, the request including an absolute pathname of the specified file in the aggregated file system; identifying a target metadata volume that hosts metadata of the file among a plurality of metadata volumes in accordance with the request and one or more mount entries each associated with a respective one of the plurality of metadata volumes, including recursively: identifying a mount entry when associated with a current metadata volume, wherein the mount entry identifies a relative pathname matching a respective portion of the absolute pathname, the identified relative pathname further comprising pathname sub-components of sizes ranging from one pathname component to a size of the identified relative pathname; and selecting at least one of the pathname sub-components using the identified mount entry based upon a number indicated in a list entry pointed to by the identified mount entry, until no mount entry associated with a current metadata volume matches a respective portion of the absolute pathname, to set the current metadata volume; and identifying the current metadata volume as the target metadata volume and sending the request to a metadata server hosting the identified target metadata volume in response to the request to a file switch for assembling the file.
  • 20. The medium of claim 19, wherein the recursively identifying includes producing a residual pathname by removing from the absolute pathname the relative pathname associated with each identified mount entry.
  • 21. The medium of claim 19, wherein the absolute pathname of the file is unique in the aggregated file system.
  • 22. The medium of claim 19, wherein at least a subset of the plurality of metadata volumes includes one or more mount entries, each mount entry identifying a respective distinct child metadata volume and an associated relative pathname.
  • 23. The medium of claim 22, wherein at least one metadata volume in the subset of the plurality of metadata volumes includes a reverse mapper, the reverse mapper identifying a parent metadata volume.
  • 24. The medium of claim 19, wherein: the plurality of metadata volumes include at least one root metadata volume, one or more intermediate metadata volumes and one or more leaf metadata volumes, and wherein the root metadata volume is a metadata volume that has no parent metadata volume, the leaf metadata volume is a metadata volume that has no child metadata volume and the intermediate metadata volume is a metadata volume that has both at least one parent and at least one child metadata volumes; the target metadata volume is one of the root metadata volumes when none of the child metadata volumes directly referenced by at least one of the root metadata volumes has an associated relative pathname matching a portion of the absolute pathname; the target metadata volume is one of the intermediate metadata volumes when at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname and none of the child metadata volumes referenced by at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname; and the target metadata volume is one of the leaf metadata volumes referenced by at least one of the root metadata volumes using one or more mount entries when at least one of the leaf metadata volumes has an associated relative pathname matching a portion of the absolute pathname.
  • 25. The medium of claim 19, further having stored thereon instructions that when executed by the processor causes the processor to perform steps comprising forming an ordered pair comprising the identified target metadata volume and a residual pathname, if any, based upon said identifying of the target metadata volume.
  • 26. The medium of claim 19 further having stored thereon instructions that when executed by the processor causes the processor to perform steps comprising: accessing a plurality of mount entries, wherein each of the plurality of mount entries is associated with a respective one of the plurality of metadata volumes as a parent metadata volume and identifies a respective distinct child metadata volume and the relative pathname associated with the child metadata volume; resetting the current metadata volume to the child metadata volume identified by the identified mount entry, until no mount entry associated with the current metadata volume is found that matches a respective portion of the absolute pathname, the target metadata volume comprising the last child metadata volume so identified; and returning information identifying the target metadata volume, hosted metadata and its associated residual pathname in response to the request to a file switch for assembling the file in response to the request.
  • 27. The medium of claim 19, wherein the identified relative pathname has a size based upon a maximum number of pathname components of the relative pathname that can be handled by the identified mount entry.
  • 28. An aggregated file system, comprising: a plurality of file servers; a plurality of metadata servers including a plurality of metadata volumes; and a plurality of file switches, each file switch including: at least one processor; and memory coupled to the at least one processor which is configured to execute program instructions stored in the memory comprising: receiving a request with respect to a specified file, the request including an absolute pathname of the specified file in the aggregated file system; identifying a target metadata volume that hosts metadata of the file among the plurality of metadata volumes in accordance with the request and one or more mount entries each associated with a respective one of the plurality of metadata volumes, including recursively: identifying a mount entry when associated with a current metadata volume, wherein the mount entry identifies a relative pathname matching a respective portion of the absolute pathname, the identified relative pathname further comprising pathname sub-components of sizes ranging from one pathname component to a size of the identified relative pathname; and selecting at least one of the pathname sub-components using the identified mount entry based upon a number indicated in a list entry pointed to by the identified mount entry, until no mount entry associated with a current metadata volume matches a respective portion of the absolute pathname, to set the current metadata volume; and identifying the current metadata volume as the target metadata volume and sending the request to a metadata server hosting the identified target metadata volume in response to the request to a file switch for assembling the file.
  • 29. The system of claim 28, wherein the recursively identifying includes producing a residual pathname by removing from the absolute pathname the relative pathname associated with each identified mount entry.
  • 30. The system of claim 28, wherein the absolute pathname of the file is unique in the aggregated file system.
  • 31. The system of claim 28, wherein at least a subset of the plurality of metadata volumes includes one or more mount entries, each mount entry identifying a respective distinct child metadata volume and an associated relative pathname.
  • 32. The system of claim 31, wherein at least one metadata volume in the subset of the plurality of metadata volumes includes a reverse mapper, the reverse mapper identifying a parent metadata volume.
  • 33. The system of claim 28, wherein: the plurality of metadata volumes include at least one root metadata volume, one or more intermediate metadata volumes and one or more leaf metadata volumes, and wherein the root metadata volume is a metadata volume that has no parent metadata volume, the leaf metadata volume is a metadata volume that has no child metadata volume and the intermediate metadata volume is a metadata volume that has both at least one parent and at least one child metadata volumes; the target metadata volume is one of the root metadata volumes when none of the child metadata volumes directly referenced by at least one of the root metadata volumes has an associated relative pathname matching a portion of the absolute pathname; the target metadata volume is one of the intermediate metadata volumes when at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname and none of the child metadata volumes referenced by at least one of the intermediate metadata volumes has an associated relative pathname matching a portion of the absolute pathname; and the target metadata volume is one of the leaf metadata volumes referenced by at least one of the root metadata volumes using one or more mount entries when at least one of the leaf metadata volumes has an associated relative pathname matching a portion of the absolute pathname.
  • 34. The system of claim 28 wherein the at least one processor is further configured to execute programmed instructions stored in the memory further comprising forming an ordered pair comprising the identified target metadata volume and a residual pathname, if any, based upon said identifying of the target metadata volume.
  • 35. The system of claim 28 wherein the at least one processor is further configured to execute programmed instructions stored in the memory further comprising: accessing a plurality of mount entries, wherein each of the plurality of mount entries is associated with a respective one of the plurality of metadata volumes as a parent metadata volume and identifies a respective distinct child metadata volume and the relative pathname associated with the child metadata volume; resetting the current metadata volume to the child metadata volume identified by the identified mount entry, until no mount entry associated with the current metadata volume is found that matches a respective portion of the absolute pathname, the target metadata volume comprising the last child metadata volume so identified; and returning information identifying the target metadata volume, hosted metadata and its associated residual pathname in response to the request to a file switch for assembling the file in response to the request.
  • 36. The system of claim 28, wherein the identified relative pathname has a size based upon a maximum number of pathname components of the relative pathname that can be handled by the identified mount entry.
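
The three claim families above (apparatus, medium and system) recite the same lookup procedure: starting at a root metadata volume, the file switch matches mount-entry relative pathnames against leading portions of the absolute pathname, descends into each referenced child volume while stripping the matched portion, and stops when no mount entry matches, yielding the target metadata volume and a residual pathname. The sketch below is a minimal, hypothetical illustration of that procedure in Python; the names (MetadataVolume, find_target_volume, the volume IDs) and the data layout are assumptions made for illustration and are not taken from the patent.

```python
# Minimal sketch (not from the patent) of the recursive target-volume lookup
# recited in the claims above. All identifiers here are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MetadataVolume:
    """A metadata volume with its mount entries and an optional parent link."""
    volume_id: str
    parent_id: Optional[str] = None  # stands in for the reverse mapper (parent volume)
    # Mount entries: relative pathname (one or more components) -> child volume ID.
    mount_entries: Dict[str, str] = field(default_factory=dict)


def find_target_volume(
    volumes: Dict[str, MetadataVolume],
    root_id: str,
    absolute_pathname: str,
) -> Tuple[str, str]:
    """Return (target volume ID, residual pathname) for an absolute pathname.

    Starting at the root volume, look for a mount entry whose relative pathname
    matches a leading portion of the residual pathname; when one matches,
    descend into the referenced child volume and strip the matched portion.
    When no mount entry matches, the current volume is the target.
    """
    current = volumes[root_id]
    trimmed = absolute_pathname.strip("/")
    residual = trimmed.split("/") if trimmed else []

    while True:
        match = None
        # Try longer relative pathnames first so multi-component mount entries
        # (e.g. "alice/projects") take precedence over single components.
        for rel_path, child_id in sorted(
            current.mount_entries.items(), key=lambda kv: -len(kv[0].split("/"))
        ):
            components = rel_path.strip("/").split("/")
            if residual[: len(components)] == components:
                match = (components, child_id)
                break

        if match is None:
            # No mount entry of the current volume matches a portion of the
            # residual pathname: the current volume hosts the file's metadata.
            return current.volume_id, "/".join(residual)

        matched_components, child_id = match
        residual = residual[len(matched_components):]  # produce the residual pathname
        current = volumes[child_id]                    # continue in the child volume


if __name__ == "__main__":
    # Three volumes: a root, an intermediate volume, and a leaf volume.
    volumes = {
        "vol-root": MetadataVolume("vol-root", mount_entries={"home": "vol-home"}),
        "vol-home": MetadataVolume("vol-home", parent_id="vol-root",
                                   mount_entries={"alice/projects": "vol-proj"}),
        "vol-proj": MetadataVolume("vol-proj", parent_id="vol-home"),
    }
    # Prints ("vol-proj", "report.txt"): the target volume and residual pathname.
    print(find_target_volume(volumes, "vol-root", "/home/alice/projects/report.txt"))
```

In this sketch the parent_id field plays the role of the reverse mapper of claims 14, 23 and 32, and the returned (volume ID, residual pathname) pair corresponds to the ordered pair of claims 16, 25 and 34; the longest-match rule approximates the selection among pathname sub-components described in claims 19 and 28.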
RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 11/337,190, filed Jan. 20, 2006, which claims priority to U.S. Provisional Patent Application No. 60/646,214, filed Jan. 20, 2005, entitled “Scalable System For Partitioning And Accessing Metadata Over Multiple Servers”, each of which is incorporated herein by reference. This application is related to U.S. patent application Ser. No. 10/043,413, entitled File Switch and Switched File System, filed Jan. 10, 2002, and U.S. Provisional Patent Application No. 60/261,153, entitled File Switch And Switched File System, filed Jan. 11, 2001, both of which are incorporated herein by reference.

Related Publications (1)
Number Date Country
20110087696 A1 Apr 2011 US
Provisional Applications (1)
Number Date Country
60646214 Jan 2005 US
Continuations (1)
Number Date Country
Parent 11337190 Jan 2006 US
Child 12972825 US