The present invention relates generally to data storage systems, and more particularly to network file servers.
In a data network it is conventional for a network server containing disk storage to service storage access requests from multiple network clients. The storage access requests, for example, are serviced in accordance with a network file access protocol such as the Network File System (NFS), the Common Internet File System (CIFS) protocol, the Hypertext Transfer Protocol (HTTP), or the File Transfer Protocol (FTP). NFS is described in Bill Nowicki, “NFS: Network File System Protocol Specification,” Network Working Group, Request for Comments: 1094, Sun Microsystems, Inc., Mountain View, Calif., March 1989. CIFS is described in Paul L. Leach and Dilip C. Naik, “A Common Internet File System,” Microsoft Corporation, Redmond, Wash., Dec. 19, 1997. HTTP is described in R. Fielding et al., “Hypertext Transfer Protocol—HTTP/1.1,” Request for Comments: 2068, Network Working Group, Digital Equipment Corp., Maynard, Mass., January 1997. FTP is described in J. Postel & J. Reynolds, “FILE TRANSFER PROTOCOL (FTP),” Network Working Group, Request for Comments: 959, ISI, Marina del Rey, Calif., October 1985.
A network file server typically includes a digital computer for servicing storage access requests in accordance with at least one network file access protocol, and an array of disk drives. The computer has been called by various names, such as a storage controller, a data mover, or a file server. The computer typically performs client authentication, enforces client access rights to particular storage volumes, directories, or files, and maps directory and file names to allocated logical blocks of storage.
System administrators have been faced with an increasing problem of integrating multiple storage servers of different types into the same data storage network. In the past, it was often possible for the system administrator to avoid this problem by migrating data from a number of small servers into one new large server. The small servers were then removed from the network, and the storage for the data was managed effectively using the storage management tools of the one new large server.
When system administrators integrate multiple storage servers of different types into the same data storage network, they must deal with problems of allocating the data to be stored among the various servers based on the respective storage capacities and data access bandwidths of the various servers. This should be done in such a way as to minimize any disruption to data access by client applications. To address these problems, storage management tools are being offered for allocation and migration of the data to be stored among various servers to enforce storage management policies. These tools often have limitations when the various servers use different high-level storage access protocols or are manufactured by different storage vendors. In addition, when files are migrated between servers in order to add or remove a server, it may be necessary for the system administrator to access network clients to re-map a server share away from a server that is removed or onto a server that is added.
The present invention is directed to a namespace server for a data processing network having clients and servers using different file access protocols, and in particular servicing a client that uses a file access protocol that supports redirection. Typically there has been a separate namespace for each protocol and a separate repository of translation information for each protocol. For example, in a data processing system including CIFS clients and servers, and also NFS clients and servers, the namespace repository for CIFS/DFS has been separate from the namespace repository for NFS/automounter. In such a system, problems arise because different clients see different namespaces, and administrators must manage the same data under different views. The inventors recognize that these problems can be solved by providing a unified namespace for such heterogeneous clients and servers, and by appropriate redirection or translation and forwarding of client file access requests.
In accordance with one aspect of the invention, a multi-protocol namespace server provides a unified client-server network namespace to clients using different file access protocols to access files in different file servers in a network attached storage (NAS) network namespace. Some of the clients use file access protocols that support redirection, and others of the clients use file access protocols that do not support redirection. Some of the file servers support file access protocols that are not supported by others of the file servers. The multi-protocol namespace server includes memory for storing translation information for translating pathnames in the client-server network namespace to respective translated pathnames in the NAS network namespace and for storing protocol information defining file access protocols for accessing files at the respective translated pathnames in the NAS network namespace. The multi-protocol namespace server further includes at least one processor coupled to the memory for accessing the translation information and the protocol information. The at least one processor is programmed for receiving requests from the clients for access to files referenced by pathnames in the client-server network namespace, and translating the pathnames in the client-server network namespace to respective translated pathnames in the NAS network namespace. The at least one processor is also programmed for responding to some of the requests from said some of the clients by returning redirection replies to said some of the clients. The redirection replies include translated pathnames in the NAS network namespace. The at least one processor is also programmed for responding to the requests from the others of the clients by forwarding translated requests to the file servers. The translated requests include translated pathnames in the NAS network namespace. 
The at least one processor is also programmed for translating and forwarding a request of a client supporting redirection for access to a file upon determining that the file to be accessed by the client supporting redirection is stored in a file server that does not support redirection from the client supporting redirection. For example, the at least one processor is programmed to determine that the file to be accessed by the client supporting redirection is stored in a file server that does not support redirection from the client supporting redirection upon finding that the file to be accessed by the client supporting redirection is not accessible at the respective translated pathname in the NAS network namespace using any file access protocol used by the client supporting redirection.
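The choice between returning a redirection reply and translating and forwarding the request can be sketched as follows. This is a minimal illustration, not the patented implementation: `ClientRequest`, `redirect_or_forward`, `translate`, and `protocols_at` are hypothetical names standing in for the translation information and protocol information held in the namespace server's memory.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class ClientRequest:
    client_protocols: FrozenSet[str]   # protocols the client can use, e.g. {"CIFS"}
    supports_redirection: bool         # e.g. a CIFS/DFS client
    pathname: str                      # pathname in the client-server network namespace


def redirect_or_forward(
    request: ClientRequest,
    translate: Callable[[str], str],
    protocols_at: Callable[[str], Set[str]],
) -> Tuple[str, str]:
    """Return ("redirect", nas_path) or ("forward", nas_path).

    translate and protocols_at are hypothetical stand-ins for the
    translation information and the protocol information.
    """
    nas_path = translate(request.pathname)
    # A redirection is useful only if the file is reachable at the translated
    # pathname by at least one file access protocol the client itself uses.
    reachable = bool(request.client_protocols & protocols_at(nas_path))
    if request.supports_redirection and reachable:
        return ("redirect", nas_path)
    # Otherwise the namespace server translates and forwards the request itself.
    return ("forward", nas_path)
```

A redirection capable CIFS client is thus redirected only when the backend object is reachable over CIFS; an NFS client without redirection support is always served by forwarding.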
In accordance with another aspect, the invention provides a data processing system including a namespace server, at least one redirection capable client, and at least one file server. The at least one redirection capable client is linked to the namespace server for transmission of file access requests from the at least one redirection capable client to the namespace server and return of redirection replies from the namespace server to the at least one redirection capable client. The at least one file server is in a network attached storage (NAS) network and is linked to the namespace server for receipt of forwarded file access requests from the namespace server and linked to the at least one redirection capable client for receipt of redirected file access requests from the at least one redirection capable client. The namespace server is programmed for responding to a file access request from the at least one redirection capable client by translating a client-server network pathname in the file access request from the at least one redirection capable client into a NAS network pathname of a physical share in the at least one file server, and returning to the at least one redirection capable client a redirection reply specifying the NAS network pathname of the physical share in the at least one file server. The at least one redirection capable client is programmed for responding to the redirection reply by redirecting the file access request to the NAS network pathname of the physical share in the at least one file server, and subsequently sending file access requests for access to the physical share in the at least one file server directly to the at least one file server without redirection from the namespace server. 
The at least one file server is programmed for returning a redirection reply to the at least one redirection capable client in response to an access request from the at least one redirection capable client requesting access to a share, directory, or file that is offline for migration or for which the at least one redirection capable client is requesting a kind of access for which the at least one redirection capable client does not have access permission. Moreover, the at least one redirection capable client is programmed for responding to the redirection reply from the at least one file server by redirecting access to the namespace server.
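The redirection in the opposite direction, from a file server back to the namespace server, can be sketched in the same way. The sketch below assumes that permissions are a simple mapping from user name to the kinds of access granted; `ShareState`, `file_server_reply`, `next_server`, and the server name `\\NS` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class ShareState:
    offline_for_migration: bool = False
    # Assumed model: user name -> kinds of access granted, e.g. {"alice": {"read"}}
    permissions: Dict[str, Set[str]] = field(default_factory=dict)


def file_server_reply(state: ShareState, user: str, access: str,
                      namespace_server: str) -> Tuple[str, Optional[str]]:
    """File server side: redirect back to the namespace server when the object
    is offline for migration or the requested kind of access is not permitted."""
    if state.offline_for_migration or access not in state.permissions.get(user, set()):
        return ("redirect", namespace_server)
    return ("ok", None)


def next_server(reply: Tuple[str, Optional[str]], current_server: str) -> str:
    """Client side: follow a redirection reply, otherwise stay on the current server."""
    kind, target = reply
    return target if kind == "redirect" and target is not None else current_server
```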
In accordance with yet another aspect, the invention provides a method of request redirection in a data processing system. The data processing system includes a namespace server, at least one redirection capable client linked to the namespace server for transmission of file access requests from the at least one redirection capable client to the namespace server and return of redirection replies from the namespace server to the at least one redirection capable client, and at least one file server in a network attached storage (NAS) network linked to the namespace server for receipt of forwarded file access requests from the namespace server and for receipt of redirected file access requests from the at least one redirection capable client. The method includes the namespace server responding to a file access request from the at least one redirection capable client by translating a client-server network pathname in the file access request from the at least one redirection capable client into a NAS network pathname of a physical share in the at least one file server, and returning to the at least one redirection capable client a redirection reply specifying the NAS network pathname of the physical share in said at least one file server. The method further includes the at least one redirection capable client responding to the redirection reply by redirecting the file access request to the NAS network pathname of the physical share in the at least one file server, and subsequently sending file access requests for access to the physical share in the at least one file server directly to the at least one file server without redirection from the namespace server. 
The method further includes the at least one file server returning a redirection reply to the at least one redirection capable client in response to an access request from the at least one redirection capable client requesting access to a share, directory, or file that is offline for migration or for which the at least one redirection capable client is requesting a kind of access for which the at least one redirection capable client does not have access permission. The method further includes the at least one redirection capable client responding to the redirection reply from the at least one file server by redirecting access back to the namespace server.
Additional features and advantages of the invention will be described below with reference to the drawings, in which:
FIGS. 16 to 18 together comprise a flowchart of programming for the namespace server of
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown in the drawings and will be described in detail. It should be understood, however, that it is not intended to limit the invention to the particular forms shown, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the appended claims.
With reference to
The clients that use the UNIX operating system, for example, use the NFS protocol for access to NFS file servers, and the clients that use the WINDOWS operating system use the CIFS protocol for access to CIFS file servers. A file server may have multi-protocol functionality, so that it may serve NFS clients as well as CIFS clients. A multi-protocol file server may support additional file access protocols such as NFS version 4 (NFSv4), HTTP, and FTP. Various aspects of the network file servers 28, 29, for example, are further described in Vahalia et al., U.S. Pat. No. 5,893,140 issued Apr. 6, 1999, incorporated herein by reference, and Xu et al., U.S. Pat. No. 6,324,581, issued Nov. 27, 2001, incorporated herein by reference. Such network file servers are manufactured and sold by EMC Corporation, 176 South Street, Hopkinton, Mass. 01748.
In the client-server network 21, the operating systems of the clients 22, 23, 24 see a namespace identifying the file servers 28, 29 and identifying groups of related files in the file servers. In the terminology of the WINDOWS operating system, the files are grouped into one or more disjoint sets called “shares.” In UNIX terminology, such a share is referred to as a file system depending from a root directory. For example, assume that the file server 28 is a NFS file server named “TOM”, and has two shares 30 and 31 named “A” and “B”, respectively. Assume that the file server 29 is a CIFS file server named “DICK”, and has two shares 32 and 33, also named “A” and “B”, respectively. In this case, the UNIX operating system in the NFS client 22 could see the shares of the NFS file server 28 mounted to a root directory “X:” as shown in
In the client-server network of
At this point, even though each of the clients can now access the new file server, the job is still not done. Since the new storage appears at a particular path in the namespace, the system administrator 27 should inform the users 25, 26 about the details of the new shares (name, IP or ID) where they can go to find more storage space. It is up to the individual users to make use of the new storage, by creating files there, or moving files from existing directories over to new directories. Even if the system administrator has a tool to migrate files automatically to the new file server, users must still be informed of the migration. Otherwise they will have no way of finding the files that have moved. Moreover, the system administrator has no easy or automatic way to enforce a policy about which files get placed on the new file server. For example, the new file server may provide enhanced bandwidth or storage access time, so it should be used by the most demanding applications, rather than by less demanding applications such as backup applications.
Overall, the process of adding a new file server turns out to be so expensive, in terms of management cost and disruption to end users, that the system administrator adds much more storage for each user group than is necessary to meet current demands, in order to avoid frequent installations of new file servers. This storage over-provisioning, with its extra storage head-room and resulting lower storage utilization, increases the cost of ownership.
What is desired is a way of adding file server storage capacity to specific user groups without disruption to the users and their clients and applications. It is desired to provide a way of automatically and transparently balancing file server storage usage across multiple file servers, in order to drive up storage usage and eliminate wasted capacity. It is also desired to automatically and transparently match files with storage resources that exhibit an appropriate service level profile, based on business rules established for user groups, allowing users to deploy low-cost storage where appropriate. Files should be automatically migrated without user disruption between service levels as the file data progresses through its natural life-cycle, again based on the business rules established for each user group. User access should be routed automatically and transparently to replicas in case of server or site failures. Point-in-time copies should also be made available through a well-defined interface. In short, end users should be protected from disruption due to changes in data location, protection, or service level, and the end users should benefit from having access to all of their data in a timely and efficient manner.
The present invention is directed to a namespace server that permits the namespace for client access to file servers to be different from the namespace used by the file servers. This provides a single unified namespace for client access that may combine storage in servers accessible only by different file access protocols. This single unified namespace is accessible to clients using different file access protocols. The clients send file access requests to the namespace server, the namespace server translates names in these file access requests to produce translated file access requests, and the namespace server sends the translated file access requests to the file servers. For a translated file access request sent to a file server, the namespace server receives a response from the file server and transfers the response back to the client. Neither this background activity between the namespace server and the file server nor the actual location where the file or object is stored is visible to the client. The file can therefore be location agnostic: although a file may seem to a client to be local and bound to a server, it may actually reside elsewhere. The namespace server directs data and control from and to the actual location or locations of the file.
The name translation permits file server storage capacity to be added for specific user groups without disruption to the users and their clients and applications. For example, when a new server is added, the client can continue to address file access requests to an old server, yet the namespace server can translate these requests to address files in the old server or files in the new server. The translation process permits a client to continue to access a file by addressing file access requests to the same network pathname for the file as the file is migrated from one file server to another file server due to load balancing, recovery in case of file server failure, or a change in a desired level of service for accessing the file.
As shown in
The policy engine server 45 decides when a file in one file server (i.e., a source file server) should be migrated to another file server (i.e., a target file server). The policy engine server 45 is activated at scheduled times, or it may respond to events concerning a specific file type, size, or owner, or to a need for free storage capacity in a file server. Migration may be triggered by these events, or by any other logic. When free storage capacity is needed in a file server, the policy engine server 45 scans file attributes in the file server in order to select a file to be migrated to another file server. The policy engine server 45 may then select a target file server to which the file is migrated. Then the policy engine server sends a migration command to the source file server. The migration command specifies the selected file to be migrated and the selected target file server.
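The selection performed by the policy engine server when free capacity is needed might look like the following sketch. The specific policy is an assumption made for illustration only: least recently accessed files are chosen first until enough space is freed, and the target is the other file server with the most free space. `FileAttrs`, `MigrationCommand`, and `plan_migration` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FileAttrs:
    path: str
    size: int            # bytes
    last_access: float   # seconds since the epoch


@dataclass(frozen=True)
class MigrationCommand:
    files: Tuple[str, ...]   # files selected for migration
    target_server: str


def plan_migration(files: List[FileAttrs], bytes_needed: int,
                   free_space: Dict[str, int], source: str) -> Optional[MigrationCommand]:
    # Hypothetical policy: migrate least recently accessed files first
    # until enough capacity has been freed on the source file server.
    chosen: List[str] = []
    freed = 0
    for f in sorted(files, key=lambda f: f.last_access):
        if freed >= bytes_needed:
            break
        chosen.append(f.path)
        freed += f.size
    # Hypothetical target choice: the other file server with the most free
    # space that can hold everything selected.
    candidates = {s: free for s, free in free_space.items()
                  if s != source and free >= freed}
    if not chosen or not candidates:
        return None
    target = max(candidates, key=candidates.__getitem__)
    return MigrationCommand(tuple(chosen), target)
```

Any other business rule (file type, owner, service level) could replace the sort key and the target choice without changing the shape of the migration command.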
A share, directory or file can be migrated from a source file server to a target file server while permitting clients to have concurrent read-write access to the share, directory or file. The target file server issues directory read requests and file read requests to the source file server in accordance with a network file access protocol (e.g., NFS or CIFS) to transfer the share, directory or file from the source file server to the target file server. Concurrent with the transfer of the share, directory or file from the source file server to the target file server, the target file server responds to client read/write requests for access to the share, directory or file. For example, the target file server maintains a hierarchy of on-line inodes and off-line inodes. The online inodes represent file system objects (i.e., shares, directories or files) that have been completely migrated, and the offline inodes represent file system objects that have not been completely migrated. The target file server executes a background process that walks through the hierarchy in order to migrate the objects of the offline inodes. When an object has been completely migrated, the target file server changes the offline inode for the object to an online inode for the object. Such a migration method is further described in Bober et al., U.S. Ser. No. 09/608,469 filed Jun. 30, 2000, U.S. Pat. No. 6,938,039 issued Aug. 30, 2005, incorporated herein by reference.
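The background walk over the hierarchy of online and offline inodes can be sketched as a depth-first traversal. This sketch assumes that a directory counts as completely migrated only after everything beneath it has been migrated, and that `read_file` stands in for the NFS or CIFS read requests sent to the source file server; `Inode` and `migrate_tree` are hypothetical names, and concurrent client access is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Inode:
    name: str
    is_dir: bool = False
    online: bool = False                       # True once completely migrated
    children: List["Inode"] = field(default_factory=list)
    data: Optional[bytes] = None


def migrate_tree(node: Inode, read_file: Callable[[str], bytes],
                 parent_path: str = "") -> None:
    """Post-order walk: migrate the children, then the object itself,
    then change its offline inode to an online inode."""
    path = f"{parent_path}/{node.name}"
    for child in node.children:
        migrate_tree(child, read_file, path)
    if not node.online:
        if not node.is_dir:
            node.data = read_file(path)        # copy file contents from the source
        node.online = True
```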
A comparison of
The namespace server can be programmed to translate not only network pathnames but also the high-level format of the file access requests. For example, a NFS client sends a file access request to the namespace server using the NFS protocol, and the namespace server translates the request into one or more CIFS requests that are transmitted to a CIFS file server. The namespace server receives one or more replies from the CIFS file server, and translates the replies into a NFS reply that is returned to the client. In another example, a CIFS client sends a file access request to the namespace server using the CIFS protocol, and the namespace server translates the request into one or more NFS requests that are transmitted to a NFS file server. The namespace server receives one or more replies from the NFS file server, and translates the replies into a CIFS reply that is returned to the client.
The namespace server could also be programmed to translate NFS, CIFS, HTTP, and FTP requests from clients in the client-server network into NAS commands sent to a NAS server in the backend NAS network. The namespace server could also cache files in a locally owned file system to the extent that local disk space and cache memory would be available in the namespace server. A client could be served directly by the namespace server.
Client request translation and forwarding 57 to file servers includes name substitution, and also format translation if the client and server use different high-level file access protocols. The programming for the client request translation and forwarding to NFS or NFSv4 file servers includes the NFS or NFSv4 protocol layer software found in an NFS or NFSv4 client since the namespace server is acting as a NFS or NFSv4 proxy client when forwarding the translated requests to NFS or NFSv4 file servers. The programming for the client request translation and forwarding to CIFS file servers includes the CIFS protocol layer software found in a CIFS client since the namespace server is acting as a CIFS proxy client when forwarding the translated requests to CIFS file servers. The programming for the client request translation and forwarding to HTTP file servers includes the HTTP protocol layer software found in an HTTP client since the namespace server is acting as an HTTP proxy client when forwarding the translated requests to HTTP file servers.
A database of file server addresses and connections 58 is accessed to find the network protocol or machine address for a particular file server to receive each request, and a particular protocol or connection to use for forwarding each request to each file server. For example, the connection database 58 for the preferred implementation includes the following fields: for CIFS, the Server Name, Share name, User name, Password, Domain Server, and WINS server; and for NFS, the Server name, Path of exported share, Use Root credential flag, Transport protocol, Secondary server NFS/Mount port, Mount protocol version, and Local port to make connection. Using the connection database avoids storing all the credential information in the offline inode.
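The connection database fields listed above can be modeled as two record types, with offline inodes holding only a pointer (here an integer key) into the database rather than the credentials themselves. The field types, the integer key scheme, and the names `CifsConnection`, `NfsConnection`, `ConnectionDatabase`, and `OfflineInode` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class CifsConnection:
    server_name: str
    share_name: str
    user_name: str
    password: str
    domain_server: str
    wins_server: str


@dataclass(frozen=True)
class NfsConnection:
    server_name: str
    exported_share_path: str
    use_root_credential: bool
    transport_protocol: str        # e.g. "tcp" or "udp"
    nfs_mount_port: int            # "Secondary server NFS/Mount port"
    mount_protocol_version: int
    local_port: int                # local port to make the connection


Connection = Union[CifsConnection, NfsConnection]


class ConnectionDatabase:
    def __init__(self) -> None:
        self._by_id: Dict[int, Connection] = {}
        self._next_id = 1

    def add(self, conn: Connection) -> int:
        cid = self._next_id
        self._by_id[cid] = conn
        self._next_id += 1
        return cid

    def get(self, cid: int) -> Connection:
        return self._by_id[cid]


@dataclass(frozen=True)
class OfflineInode:
    backend_pathname: str
    connection_id: int   # pointer into the connection database, not credentials
```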
A backend NAS network interface port 59 transmits the translated file access requests to file servers on the backend NAS network 40. A request and reply decoder 60 receives requests and replies from the backend NAS network 40. File server reply modification and redirection to clients 61 includes modification in accordance with namespace translation and also format translation if the reply is from a server that uses a different high-level file access protocol than is used by the client to which the reply is directed. The client-server network port 51 transmits the replies to the clients over the client-server network 21.
In a preferred implementation, whenever the namespace server returns a file identifier (i.e., a file handle or fid) to a client, the namespace tree will include an inode for the file. Therefore, the process of a client-server network namespace lookup for the pathname of a directory or file in the backend NAS network will cause instantiation of an inode for the directory or file if the namespace tree does not already include an inode for the directory or file. This eliminates any need for the file identifier to include any information about where an object (i.e., a share, directory, or file) referenced by the file identifier is located in the backend NAS network. Instead, the namespace server may issue file identifiers that identify inodes in the namespace tree in a conventional fashion. Consequently, an object referenced by a file identifier issued to a client can be migrated from one location to another in the backend NAS network without causing the file identifier to become stale. The growth of the namespace tree caused by the issuance of file identifiers could be balanced by a background pruning task that removes from the namespace tree leaf inodes for directories and files that are in the file servers in the backend NAS network and have not been accessed for a certain length of time in excess of a file identifier lifetime.
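The reason file identifiers stay valid across migration can be illustrated with a flat table of namespace-server entries. This is a simplification: the sketch keeps one entry per looked-up pathname instead of a tree of leaf inodes, and uses caller-supplied timestamps. `NamespaceHandles` and its methods are hypothetical names.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class HandleEntry:
    client_path: str     # pathname in the client-server network namespace
    backend_path: str    # current pathname in the backend NAS network
    last_access: float


class NamespaceHandles:
    def __init__(self, lifetime: float) -> None:
        self.lifetime = lifetime               # file identifier lifetime
        self._ids = itertools.count(1)
        self.by_handle: Dict[int, HandleEntry] = {}
        self.by_path: Dict[str, int] = {}

    def lookup(self, client_path: str, backend_path: str, now: float) -> int:
        """Return the handle for client_path, instantiating an entry if needed."""
        handle = self.by_path.get(client_path)
        if handle is None:
            handle = next(self._ids)
            self.by_path[client_path] = handle
            self.by_handle[handle] = HandleEntry(client_path, backend_path, now)
        self.by_handle[handle].last_access = now
        return handle

    def migrate(self, client_path: str, new_backend_path: str) -> None:
        # Only the entry changes; handles already issued to clients stay valid.
        self.by_handle[self.by_path[client_path]].backend_path = new_backend_path

    def resolve(self, handle: int, now: float) -> str:
        entry = self.by_handle[handle]
        entry.last_access = now
        return entry.backend_path

    def prune(self, now: float) -> int:
        """Drop entries idle for longer than the lifetime; return how many."""
        stale = [h for h, e in self.by_handle.items()
                 if now - e.last_access > self.lifetime]
        for h in stale:
            del self.by_path[self.by_handle.pop(h).client_path]
        return len(stale)
```

Because the handle encodes nothing about the backend location, `migrate` only rewrites the entry, and a client holding the handle is unaffected.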
The inode 74 for the virtual file system “TOM” has an entry 75 pointing to an offline share named “A” in the client-server network namespace, an entry 76 pointing to an offline share named “B” in the client-server network namespace, and an entry 77 pointing to an offline share named “C” in the client-server network namespace. The offline inode 78 has an entry 79 indicating that the offline share having the pathname “\\TOM\A” in the client-server network namespace has a pathname of “\\TOM\A” in the backend NAS network namespace. The offline inode 80 has an entry 81 indicating that the offline share having the pathname “\\TOM\B” in the client-server network namespace has a pathname of “\\TOM\B” in the backend NAS network namespace. The offline inode 82 has an entry 83 indicating that the offline share having the pathname “\\TOM\C” in the client-server network namespace has a pathname of “\\HARRY\A” in the backend NAS network namespace.
The inode 84 for the virtual file system “DICK” has an entry 85 pointing to an offline share named “A” in the client-server network namespace, an entry 86 pointing to an offline share named “B” in the client-server network namespace, and an entry 87 pointing to an offline share named “C” in the client-server network namespace. The offline inode 88 has an entry 89 indicating that the offline share having the pathname “\\DICK\A” in the client-server network namespace has a pathname of “\\DICK\A” in the backend NAS network namespace. The offline inode 90 has an entry 91 indicating that the offline share having the pathname “\\DICK\B” in the client-server network namespace has a pathname of “\\DICK\B” in the backend NAS network namespace. The offline inode 92 has an entry 93 indicating that the offline share having the pathname “\\DICK\C” in the client-server network namespace has a pathname of “\\HARRY\B” in the backend NAS network namespace.
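The share-level translations recorded in entries 79, 81, 83, 89, 91 and 93 amount to a prefix substitution on UNC pathnames. The sketch below assumes exact (case-sensitive) matching of the server and share components; `SHARE_MAP` and `translate` are hypothetical names.

```python
# Share-level translations from the offline inodes 78 to 92 described above.
SHARE_MAP = {
    r"\\TOM\A": r"\\TOM\A",
    r"\\TOM\B": r"\\TOM\B",
    r"\\TOM\C": r"\\HARRY\A",
    r"\\DICK\A": r"\\DICK\A",
    r"\\DICK\B": r"\\DICK\B",
    r"\\DICK\C": r"\\HARRY\B",
}


def translate(client_path: str) -> str:
    """Map a client-server network pathname to a backend NAS network pathname."""
    if not client_path.startswith("\\\\"):
        raise ValueError(f"not a UNC pathname: {client_path!r}")
    server, _, rest = client_path[2:].partition("\\")
    share, sep, remainder = rest.partition("\\")
    key = "\\\\" + server + "\\" + share
    try:
        backend_share = SHARE_MAP[key]
    except KeyError:
        raise LookupError(f"no translation for share {key!r}") from None
    # The part of the pathname below the share is carried over unchanged.
    return backend_share + sep + remainder
```

For example, `translate(r"\\TOM\C\docs\x.txt")` yields `\\HARRY\A\docs\x.txt`, while `\\TOM\A` pathnames map to themselves.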
In practice, the inodes in the namespace tree can be inodes of a UNIX-based file system, and conventional UNIX facilities can be used for searching through the namespace tree for a given pathname in the client-server network namespace. However, the inodes of a UNIX-based file system include numerous fields that are not needed, so that the inodes have excess memory capacity, especially for the online inodes. Considerable memory savings can be realized by eliminating the unused fields from the inodes.
The inode 134 has a second entry 136 pointing to an inode 139 for a virtual file named “D”. The inode 139 includes a first entry 140 pointing to an offline inode 142 named “@L”. The offline inode 142 has an entry 143 pointing to the contents of a file having a backend NAS network pathname of “\\DICK\A\F2”. The inode 139 has a second entry 141 pointing to an offline inode 144 named “@M”. The offline inode 144 has an entry 145 pointing to the contents of a file having a backend NAS network pathname of “\\HARRY\F3”.
For NFS, at mount time a handle to a root directory is sent to the client. In a client-server network, user identity and access permissions are checked before the handle to the root directory is sent to the client. For subsequent file accesses, the handle to the root directory is unchanged. A mount operation is also performed in order to obtain a handle for a share. In order to access a file, an NFS client must first obtain a handle to the file. This is done by resolving a full pathname to the file by successive directory lookups, culminating in a lookup which returns the handle for the file. The client uses the file handle for the file in a request to read from or write to the file.
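Resolving a full pathname by successive directory lookups can be sketched as a loop that starts from the root directory handle obtained at mount time. Here `lookup` is a hypothetical stand-in for the NFS LOOKUP procedure (directory handle plus component name in, file handle out), and handles are modeled as plain integers.

```python
from typing import Callable


def resolve(root_handle: int, path: str,
            lookup: Callable[[int, str], int]) -> int:
    """Resolve a slash-separated pathname one component at a time."""
    handle = root_handle
    for component in path.strip("/").split("/"):
        if component:                           # skip empty components
            handle = lookup(handle, component)  # one LOOKUP per component
    return handle
```

The handle returned by the final lookup is the one the client then places in its read and write requests.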
For CIFS, a typical client request—server reply sequence for access to a file includes the following:
1. SMB_COM_NEGOTIATE. This is the first message sent by the client to the server. It includes a list of Server Message Block (SMB) dialects supported by the client. The server response indicates which SMB dialect should be used.
2. SMB_COM_SESSION_SETUP_ANDX. This message from the client transmits the user's name and credentials to the server for verification. A successful server response has a user identification (Uid) field set in the SMB header that is used for subsequent SMBs on behalf of this user.
3. SMB_COM_TREE_CONNECT_ANDX. This message from the client transmits the name of the disk share that the client wants to access. A successful server response has a Tid field set in a SMB header used for subsequent SMBs referring to this resource.
4. SMB_COM_OPEN_ANDX. This message from the client transmits the name of the file, relative to Tid, the client wants to open. A successful server response includes a file id (Fid) the client should supply for subsequent operations on this file.
5. SMB_COM_READ. This message from the client transmits the Tid, Fid, file offset, and number of bytes to read. A successful server response includes the requested file data.
6. SMB_COM_CLOSE. This message from the client requests the server to close the file represented by Tid and Fid. The server responds with a success code.
7. SMB_COM_TREE_DISCONNECT. This message from the client requests the server to disconnect the client from the resource represented by Tid.
By using a CIFS request batching mechanism (called the “AndX” mechanism), the second to sixth messages in this sequence can be combined into one, so there are really only three round trips in the sequence, and the last one can be done asynchronously by the client.
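The round-trip arithmetic can be checked with a short sketch that chains the second to sixth messages into one request. It models only the counting; which commands may actually be chained together is governed by the CIFS specification and is not modeled here. `SEQUENCE` and `chain` are hypothetical names.

```python
from typing import List, Tuple, Union

SEQUENCE = [
    "SMB_COM_NEGOTIATE",
    "SMB_COM_SESSION_SETUP_ANDX",
    "SMB_COM_TREE_CONNECT_ANDX",
    "SMB_COM_OPEN_ANDX",
    "SMB_COM_READ",
    "SMB_COM_CLOSE",
    "SMB_COM_TREE_DISCONNECT",
]

Step = Union[str, Tuple[str, ...]]


def chain(seq: List[str], first: int, last: int) -> List[Step]:
    """Replace seq[first..last] (0-based, inclusive) with one chained request."""
    return [*seq[:first], tuple(seq[first:last + 1]), *seq[last + 1:]]


# Messages 2 to 6 of the sequence are indices 1 to 5.
batched = chain(SEQUENCE, 1, 5)
print(len(SEQUENCE), "->", len(batched))   # 7 -> 3 round trips
```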
FIGS. 16 to 18 together show a procedure used by the namespace server for responding to a client request. In a first step 151, the namespace server decodes the client request. In step 152, if the request is in accordance with a connection-oriented protocol such as CIFS, then execution continues to step 153. If a connection with the client has not already been established for handling the request, then execution branches from step 153 to step 154. In step 154, the namespace server sets up a new connection in a client connection database in the namespace server. If a connection has been established with the client, then execution continues from step 153 to step 155 to find the connection status in the client connection database. Execution continues from steps 154 and 155 to step 156. Execution also continues to step 156 from step 152 if the request is not in accordance with a connection-oriented protocol.
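Steps 152 to 155 can be sketched as follows. The sketch assumes that CIFS is the only connection-oriented protocol handled and that a client is identified by a string key (for example, its network address); `CONNECTION_ORIENTED`, `ClientConnectionDatabase`, and `prepare` are hypothetical names.

```python
from typing import Dict, Optional

# Assumption: only CIFS is treated as connection-oriented in this sketch.
CONNECTION_ORIENTED = {"CIFS"}


class ClientConnectionDatabase:
    def __init__(self) -> None:
        self.connections: Dict[str, Dict[str, str]] = {}

    def prepare(self, protocol: str, client_id: str) -> Optional[Dict[str, str]]:
        """Steps 152 to 155; the caller then proceeds to step 156."""
        if protocol not in CONNECTION_ORIENTED:     # step 152: no connection state
            return None
        status = self.connections.get(client_id)    # step 153: connection exists?
        if status is None:                          # step 154: set up a new connection
            status = {"state": "new"}
            self.connections[client_id] = status
        return status                               # step 155: current connection status
```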
In step 156, if the request requires a directory lookup, then execution continues to step 157. For example, for a NFS client, the namespace server performs a directory lookup for a server share or a root file system in response to a mount request, and for a file in response to a file name lookup request, resulting in the return of a file handle to the client. For a CIFS client, the namespace server performs a directory lookup for a server share in response to a SMB_COM_TREE_CONNECT request, and for a file in response to a SMB_COM_OPEN request. In step 157, the namespace server searches down the namespace tree along the path specified by the pathname in the client request until an offline inode is reached. Once an offline inode is reached, in step 158 the namespace server accesses the offline inode to find a backend NAS network pathname of a server in which the search will be continued. In addition to the server address, the offline inode has a pointer to protocol and connection information for this server. In step 159, this pointer is used to obtain this protocol and connection information from the connection database. In step 160, this protocol and connection information is used to formulate and transmit a server share or file lookup request for obtaining a Tid, Fid, or file handle corresponding to the backend NAS network pathname from the offline inode.
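The control flow of steps 151 to 160 can be sketched in Python as follows. The classes `OfflineInode`, `VirtualInode`, and `NamespaceServer`, the dictionary layout of the connection database, and the rule of appending the unresolved remainder of the client pathname to the offline inode's backend pathname are all illustrative assumptions rather than the implementation described here; the sketch returns the lookup request it would send instead of sending it.

```python
from dataclasses import dataclass, field


@dataclass
class OfflineInode:
    backend_path: str   # backend NAS network pathname (hypothetical format)
    server_key: str     # pointer to protocol/connection info for that server


@dataclass
class VirtualInode:
    children: dict = field(default_factory=dict)   # name -> child inode


@dataclass
class NamespaceServer:
    root: VirtualInode
    server_info: dict                                      # server_key -> info
    client_connections: dict = field(default_factory=dict)

    def handle(self, client_id, protocol, path):
        # Steps 152-155: track connections only for connection-oriented protocols.
        if protocol == "CIFS":
            self.client_connections.setdefault(client_id, {"state": "open"})
        # Steps 156-157: walk down the tree until an offline inode is reached.
        parts = [p for p in path.split("/") if p]
        node = self.root
        for i, name in enumerate(parts):
            node = node.children[name]   # KeyError if the name does not exist
            if isinstance(node, OfflineInode):
                # Step 158: the search continues in the backend file server.
                remainder = "/".join(parts[i + 1:])
                backend = node.backend_path + ("/" + remainder if remainder else "")
                # Step 159: follow the inode's pointer to connection info.
                info = self.server_info[node.server_key]
                # Step 160: formulate the share or file lookup request.
                return {"protocol": info["protocol"],
                        "address": info["address"],
                        "lookup": backend}
        raise LookupError(f"{path!r} does not reach an offline inode")
```

Usage, with hypothetical names: a tree `/eng/proj` whose `proj` entry is an offline inode for `/fs1/proj` on an NFS server turns a request for `/eng/proj/a.txt` into an NFS lookup of `/fs1/proj/a.txt`.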
The search of the namespace tree in the namespace server may reach an inode having entries that point to the contents of directories in more than one of the file servers. In this case, in step 160, it is possible for the namespace server to forward concurrently a pathname search request to each of the file servers. As soon as any one of the servers returns a reply indicating that a successful match has been found, the namespace server could issue a request canceling the searches by the other file servers.
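A minimal Python sketch of this concurrent search, assuming a hypothetical `lookup(server, pathname)` callable that returns a file identifier or None, uses `concurrent.futures` to keep the first successful reply. `Future.cancel()` only prevents searches that have not yet started; cancelling a search already running in a file server would require an explicit cancel request, which this sketch leaves out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_lookup(servers, pathname, lookup):
    """Send the same pathname search to every server; keep the first match.

    Returns (server, file_identifier) for the first successful reply,
    or None if no server finds a match.
    """
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = {pool.submit(lookup, s, pathname): s for s in servers}
        for fut in as_completed(futures):
            result = fut.result()
            if result is not None:
                # Cancel searches that have not started yet; running ones
                # finish in the background and their results are ignored.
                for other in futures:
                    other.cancel()
                return futures[fut], result
    return None
```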
In step 161 of
For the case of a SMB_COM_SESSION_SETUP request as well as a mount request, the actual authentication and authorization of a client could be deferred until the client specifies a share or file system and a search of the pathname for the specified share or root file system is performed in the file server for the specified share or root file system. In this case, a client would have only read-only access to information in the namespace server until the client is authenticated and authorized by one of the file servers. However, an entirely separate authentication mechanism could be used in the tree management programming (56 in
In step 156 of
In the preferred implementation in which a file identifier (i.e., file handle or Fid) from or to a client identifies an inode in the namespace tree, if a request or reply received by the namespace server includes a file identifier, then the namespace server will perform a file identifier substitution, because the corresponding file identifier to or from a file server identifies a different inode in a file system maintained by the file server. In order to facilitate this file identifier substitution, when a file server returns a file identifier to the namespace server as a result of a directory lookup for an object specified by a backend NAS network pathname, the namespace server stores the file identifier in the object's inode in the namespace tree. Also, the corresponding file system handle or Tid for accessing the object in the file server is associated with the object's inode in the namespace tree if this inode is an offline inode; otherwise, it is associated with the offline inode that is a predecessor of the object's inode in the namespace tree.
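File identifier substitution amounts to maintaining a two-way mapping. In this sketch the client-facing identifier is a simple counter standing in for the namespace-tree inode number, and the backend identifier is an opaque value paired with the name of the file server that issued it; the class `HandleMap` and its methods are hypothetical.

```python
import itertools


class HandleMap:
    """Bidirectional map between client-facing and backend file identifiers."""

    def __init__(self):
        self._next = itertools.count(1)   # stand-in for namespace inode numbers
        self._to_backend = {}             # client id -> (server, backend id)
        self._to_client = {}              # (server, backend id) -> client id

    def register(self, server, backend_id):
        """Record a backend id returned by a lookup; return the client id."""
        key = (server, backend_id)
        if key not in self._to_client:
            client_id = next(self._next)
            self._to_client[key] = client_id
            self._to_backend[client_id] = key
        return self._to_client[key]

    def outbound(self, client_id):
        """Substitute the identifier in a client request before forwarding."""
        return self._to_backend[client_id]

    def inbound(self, server, backend_id):
        """Substitute the identifier in a server reply before returning it."""
        return self._to_client[(server, backend_id)]
```

Keying backend identifiers by server as well as value matters because two file servers may independently issue identical handles.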
In step 166, for a read or write request, execution continues to step 167. In step 167, the read or write data passes through the namespace server. For a read request, the requested data passes through the namespace server from the backend NAS network to the client-server network. For a write request, the data to be written passes through the namespace server from the client-server network to the backend NAS network.
In step 166, if the client request is not a read or write request, then execution continues to step 168. In step 168, if the client request is a request to add, delete, or rename a share, directory, or file, then execution continues to step 169. A typical user may have authority to add, delete, or rename a share, directory, or file in one of the file servers. In this case, the file server will check the user's authority, and if the user has authority, the file server will perform the requested operation. If the requested operation requires a corresponding change or deletion of a backend NAS network pathname in the namespace tree, then the namespace server performs the corresponding change upon receipt of a confirmation from the file server. A deletion of a backend NAS network pathname from an offline inode may result in an offline inode empty of entries, in which case the offline inode may be deleted along with deletion of a pointer to it in its parent inode in the namespace tree.
The namespace server may also respond to client requests for metadata of virtual inodes in the namespace tree. Virtual inodes can serve as namespace junctions that are not written into, but which aggregate file systems. Once the metadata information in the namespace tree becomes too large for a single physical file system to hold, a virtual inode can be used to link together more than one large physical file system in order to continue to scale the available namespace. In many cases the metadata of a virtual inode can be computed or reconstructed from metadata stored in the file servers that contain the objects referenced by the offline inodes that are descendants of the virtual inode. Once this metadata is computed or reconstructed, it can be cached in the namespace tree. The virtual inodes could also have metadata that is configured by the system administrator or updated in response to file access. For example, the system administrator could configure a quota for a virtual directory, and a “bytes used” could be maintained for the virtual directory, and updated and checked against the quota each time a descendant file is added, deleted, extended, or truncated.
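The quota bookkeeping for a virtual directory might look like the following Python sketch. `VirtualDirectory`, `QuotaExceeded`, and the choice to check the quota only when usage grows (so that deletions and truncations always succeed) are assumptions made for illustration.

```python
class QuotaExceeded(Exception):
    """Raised when a change would push a virtual directory over its quota."""


class VirtualDirectory:
    """Virtual inode with an administrator-configured quota (hypothetical)."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.bytes_used = 0

    def apply_change(self, delta):
        """Account for a descendant file being added, deleted, extended, or
        truncated; `delta` is the change in bytes (negative when shrinking)."""
        new_total = self.bytes_used + delta
        if delta > 0 and new_total > self.quota_bytes:
            raise QuotaExceeded(f"{new_total} > quota {self.quota_bytes}")
        self.bytes_used = max(0, new_total)
        return self.bytes_used
```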
The namespace server may also respond to tree management commands from an authorized system administrator, or from a policy engine or file migration service of a file server in the backend NAS network. For example, file migration that is transparent to the clients at some point requires a change in the storage area pathname in an offline inode. If the new or old storage area pathname refers to a CIFS server, the server connection status should also be updated.
The namespace server may also respond to a backend NAS network pathname change request from the backend NAS network for changing the translation of a client-server network pathname from a specified old backend NAS network pathname to a specified new backend NAS network pathname. The namespace server searches for offline inode or inodes in the namespace tree from which the old backend NAS network pathname is reached. Upon finding such an offline inode, if an entry of the inode includes the old backend NAS network pathname, then the entry is changed to specify the new backend NAS network pathname.
The namespace tree could be constructed so that the pathname of every physical file in every file server is found in at least one offline inode of the namespace tree. This would simplify the process of changing backend NAS network pathnames, but it would result in the namespace server having to store and access a very large directory structure. For the general case where the offline inodes represent shares or directories, an entry of an offline inode may specify merely a beginning portion of the old backend NAS network pathname. In this case, this offline inode represents a “mount point” or root directory of a file tree that includes the object identified by the old backend NAS network pathname. The remaining portion of the old backend NAS network pathname is the same as an end portion of the client-server pathname. In this case, the namespace tree is reconfigured by the addition of inodes to perform the same client-server network to storage-area network namespace translation as before and so that the old backend NAS network pathname appears in an entry in an added offline inode. Then, the old backend NAS network pathname in this added offline inode is changed to the new backend NAS network pathname. A specific example of this process was described above with reference to
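Determining whether an old backend NAS network pathname lies below a "mount point" offline inode, and which intermediate components would need new inodes, can be sketched with `pathlib.PurePosixPath`. Comparing component by component avoids false matches such as `/fs1/projX` against `/fs1/proj`. The POSIX-style backend pathnames and the function name `split_at_mount_point` are assumptions.

```python
from pathlib import PurePosixPath


def split_at_mount_point(mount_client_path, mount_backend_path, old_backend_path):
    """Return (client_path, components) for an object below a mount point.

    `components` are the path elements between the mount point's offline
    inode and the object; inodes for these would be added to the namespace
    tree so the old pathname can appear in an offline inode of its own.
    Returns None if old_backend_path is not below the mount point.
    """
    mount = PurePosixPath(mount_backend_path)
    old = PurePosixPath(old_backend_path)
    try:
        rest = old.relative_to(mount)   # component-wise prefix test
    except ValueError:
        return None
    client = PurePosixPath(mount_client_path).joinpath(*rest.parts)
    return str(client), list(rest.parts)
```

The returned client path confirms the observation above: the remainder of the old backend pathname is the same as the end portion of the client-server pathname.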
In the general case, the namespace tree is reconfigured to perform the same namespace translation as before by adding a new offline inode to contain the old backend NAS network pathname. In addition, the offline inode representing the "mount point" is changed to a virtual inode containing entries pointing to newly added offline inodes for all of the objects in the root inode that are not the object having the old backend NAS network pathname or a predecessor directory for the object having the old storage area pathname. In a similar fashion, a virtual inode is created in the namespace tree for each directory name in the pathname between the virtual inode of the "mount point" and the offline inode for the object having the old backend NAS network pathname. Each of these virtual inodes is provided with entries pointing to new offline inodes for the files or directories that are not the object having the old backend NAS network pathname or a predecessor directory for the object having the old storage area pathname.
To facilitate the search for offline inode or inodes in the namespace tree from which the old backend NAS network pathname is reached, the namespace server may maintain an index to the backend NAS network pathnames in the offline inodes. For example, this index could be maintained as a hash index. Alternatively, the index could be a table of entries, in which each entry includes a pathname and a pointer to the offline inode where the pathname appears. The entries could be maintained in alphabetical order of the pathnames, in order to facilitate a binary search.
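The sorted-table form of the index can be sketched with Python's `bisect` module. The class `PathnameIndex` is hypothetical; Python string comparison orders by code point, which matches alphabetical order only for simple ASCII pathnames.

```python
import bisect


class PathnameIndex:
    """Sorted table of (backend pathname, offline inode id) for binary search."""

    def __init__(self):
        self._keys = []     # pathnames, kept in sorted order
        self._inodes = []   # parallel list of offline inode ids

    def add(self, pathname, inode_id):
        # bisect_right keeps duplicates of a pathname in insertion order.
        i = bisect.bisect_right(self._keys, pathname)
        self._keys.insert(i, pathname)
        self._inodes.insert(i, inode_id)

    def find(self, pathname):
        """Return the ids of all offline inodes in which `pathname` appears."""
        lo = bisect.bisect_left(self._keys, pathname)
        hi = bisect.bisect_right(self._keys, pathname)
        return self._inodes[lo:hi]

    def rename(self, old, new):
        """Re-index every entry for `old` under `new`; return affected ids."""
        lo = bisect.bisect_left(self._keys, old)
        hi = bisect.bisect_right(self._keys, old)
        moved = self._inodes[lo:hi]
        del self._keys[lo:hi]
        del self._inodes[lo:hi]
        for inode_id in moved:
            self.add(new, inode_id)
        return moved
```

The ids returned by `rename` identify the offline inodes whose entries must also be rewritten from the old to the new backend NAS network pathname.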
In step 175, the source file server receives the ready signal, and sends a backend NAS network pathname change request to the namespace server. In step 176, the namespace server responds to the namespace change request by growing the namespace tree if needed for the old pathname to appear in an offline inode of the namespace tree, and changing the old pathname to the new pathname wherever the old pathname appears in the offline inodes of the namespace tree. In step 177, the source file server receives a reply from the namespace server, suspends further access to the file system by the namespace server or by clients other than the migration process of the target file server, and sends a "migration start" request to the target file server. In step 178, the target file server responds to the "migration start" request by migrating files of the file system on a priority basis in response to client access to the files and in a background process of fetching files of the file system from the source file system.
The policy engine could also be involved in a background process of pruning the namespace tree by migrating all files in the same virtual directory of the namespace tree to the same file server, creating a directory in the file server corresponding to the virtual directory, replacing the virtual directory with an offline inode, and then removing the offline inodes of the files from the namespace tree.
In the above examples, each offline inode in the namespace tree has had a single entry pointing to an object of a file server. When the offline inode represents a file, it may be appropriate to permit the offline inode to have one or more entries, each designating a separate physical copy of the file at a different physical location. When reading the file, if the file is not available at one location because of failure or a heavy access loading or loss of a network connection, then the file can be accessed at one of the other locations. When writing to the file, the file can be written to at all locations, as shown and further described below with reference to
The write operation will complete without error, and the namespace server will return an acknowledgement of successful completion to the client, only after all of the copies have been updated successfully, and acknowledgements of such successful completion have been returned by the file servers at all of the locations to the namespace server. See, for example, the discussion of synchronous remote mirroring in Yanai et al., U.S. Pat. No. 6,502,205 issued Dec. 31, 2002, incorporated herein by reference. The writing of the file to all of the locations could also be done by the namespace server writing to a local file, and using a replication service to replicate the changes in the local file to file servers in the backend NAS network. See, for example, Raman et al., “Replication of remote copy data for internet protocol (IP) transmission,” U.S. patent application publication no. 20030217119 published Nov. 20, 2003, incorporated herein by reference.
If the write operation fails to complete at any one location, then the copy at that location becomes invalid. In this case the corresponding entry in the offline inode can be removed or flagged as invalid. The number of copies that should be made and maintained for a file could be dynamically adjusted by the policy engine server. For example, the namespace server could collect access statistics and store the access statistics in the offline inodes as file attributes. The policy engine server could collect and compare these statistics among the files in order to dynamically adjust the number of copies that should be made.
The point-in-time versions are also known as snapshots or checkpoints. A snapshot copy facility can create a point-in-time copy of a file while permitting concurrent read-write access to the file. Such a snapshot copy facility, for example, is described in Kedem U.S. Pat. No. 6,076,148 issued Jun. 13, 2000, incorporated herein by reference, and in Armangau et al., U.S. Pat. No. 6,792,518, issued Sep. 14, 2004, incorporated herein by reference. The service level attribute is a numeric value indicating an ordering of the copies in terms of accessibility for primary and secondary copies, and time of creation for the point-in-time versions.
For an offline inode having more than one entry, the namespace server may access the file type and service level attributes in order to determine which copy or version of the file to access in response to a client request. For example, the namespace server will usually reply to a file access request from a client by accessing the primary copy having the highest level of accessibility, as indicated by the service level attribute, unless this primary copy is already busy servicing a prior file access request from the namespace server. An appropriate scheduling procedure, such as “round-robin” weighted by the service level attribute, is used for selecting the primary copy to access for the case of concurrent access.
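The text does not specify the weighting scheme, so the following sketch uses smooth weighted round-robin as one plausible choice. It assumes that a larger service level value means a more accessible copy and uses that value directly as the weight; copies that are currently busy are skipped as long as any idle copy remains. The class `WeightedRoundRobin` is hypothetical.

```python
class WeightedRoundRobin:
    """Smooth weighted round-robin over the primary copies of one file."""

    def __init__(self, service_levels):
        self.weights = dict(service_levels)            # copy -> weight
        self.current = {k: 0 for k in self.weights}    # running credit
        self.total = sum(self.weights.values())

    def pick(self, busy=frozenset()):
        idle = [k for k in self.weights if k not in busy]
        candidates = idle or list(self.weights)
        # Every copy earns credit equal to its weight; the candidate with the
        # most credit is chosen and pays back the total, so over time each
        # copy is chosen in proportion to its weight.
        for k in self.weights:
            self.current[k] += self.weights[k]
        choice = max(candidates, key=lambda k: self.current[k])
        self.current[choice] -= self.total
        return choice
```

With service levels `{"a": 3, "b": 1}`, four consecutive picks select copy `a` three times and copy `b` once, interleaved rather than in a burst.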
In step 193, if the file access request is not a read request, then execution continues to step 199. In step 199, if the file access request is a write request, then execution continues to step 200 to write to all of the primary copies by sending write requests to all of the file servers containing the primary copies, as indicated by the backend NAS network pathnames for the primary copies. In step 201, if all servers reply that the write operations were successful, then execution returns. If there was a write failure, execution continues to step 202. In step 202, the namespace server invalidates each copy having a write failure, for example by marking as invalid each entry in the offline inode for each invalid primary copy.
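Steps 200 to 202 can be sketched as follows, assuming a hypothetical `write(backend_path, data)` callable that returns True when the file server acknowledges success. For brevity the writes are issued one after another rather than concurrently; the essential property is that the function reports success, and so permits an acknowledgement to the client, only when every valid primary copy was written.

```python
from dataclasses import dataclass


@dataclass
class CopyEntry:
    backend_path: str    # backend NAS network pathname of one primary copy
    valid: bool = True   # cleared when a write to this copy fails


def write_all_primaries(entries, data, write):
    """Write `data` to every valid primary copy (steps 200-202).

    Returns True only if all attempted writes succeeded.
    """
    targets = [e for e in entries if e.valid]
    ok = bool(targets)
    for e in targets:
        if not write(e.backend_path, data):
            e.valid = False   # step 202: flag the entry in the offline inode
            ok = False
    return ok
```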
If the namespace server finds that there are no primary copies of a file to be accessed or if the primary copies are found to be inaccessible, then the namespace server may access a secondary copy. If a primary copy is found to be inaccessible, this fact is reported to the policy engine, and the policy engine may choose to select a file server for creating a new primary copy and initiate a migration process to create a primary copy from a secondary copy.
If the namespace server finds that there are no accessible primary or secondary copies of a file to be accessed, then the namespace server reports this fact to the policy engine. The policy engine may choose to initiate a recovery operation that may involve accessing the point-in-time versions, starting with the most recent point-in-time version, and re-doing transactions upon the point-in-time version. If the recovery operation is successful, an entry will be put into the offline inode pointing to the location of the recovered file in primary storage, and then the namespace server will access the recovered file.
The configured portion of the namespace tree 218 from the local disk storage 216 is cached in the memory 215 together with cached inodes of the namespace tree for any outstanding file handles or fids. When the namespace tree needs to be reconfigured, the processor 214 obtains write locks on the inodes of the namespace tree that need to be modified. The write locks include local write locks on the inodes of the namespace tree 218 in the namespace server 210 and also remote write locks on the inodes of the namespace tree 228 in the other namespace server 220. If the inodes to be write locked are also cached in the memories 215, 225, these cached inode copies are invalidated. Then changes are first written to the logs 219, 229 and then written to the write-locked inodes of namespace trees 218, 228 in the local disk storage 216, 226 in each of the namespace servers 210, 220. In this fashion, the two namespace servers 210, 220 are clustered together for bi-directional synchronous mirroring of the configured inodes in the namespace trees.
If one of the namespace servers should crash, it could be re-booted and the namespace configuration information could either be recovered from the other namespace server or recovered from its local log. Also, each of the namespace servers could monitor the health of the other, and if one of the namespace servers does not recover upon reboot from a crash, the other namespace server could service the clients that would otherwise be serviced by the failed namespace server. Monitoring and fail-over of service from one of the namespace servers to the other could also use methods described in Duso et al. U.S. Pat. No. 6,625,750 issued Sep. 23, 2003, incorporated herein by reference.
In
The redirection agent 244 could further function as a proxy agent, so that the NFS client 22 may function as a proxy server for other network clients such as the NFS client 24. For example, the redirection agent 244 may forward file access requests from the other network clients to the namespace server 44 in order to perform a share lookup. The redirection agent 244 may also forward file access requests from the other network clients to the file servers 28, 29 or 41 after a share lookup and redirection from the namespace server 44a. The redirection agent may also directly access network attached data storage on behalf of the other clients in response to metadata from the namespace server 44 or from the file servers 28, 29 or 41.
The client 241 is operated by a user 245 and has a direct link 246 to the backend NAS network 40. The client 241 uses the NFS version 4 file access protocol (NFSv4), which supports redirection of file access requests. The NFSv4 protocol is described in S. Shepler et al., “Network File System (NFS) version 4 Protocol,” Request for Comments: 3530, Network Working Group, Sun Microsystems, Inc., Mountain View, Calif., April 2003. In NFSv4, the redirection of file access requests is supported to enable migration and replication of file systems. A file system locations attribute provides a method for the client to probe the file server about the location of a file system. In the event of a migration of a file system, the client will receive an error when operating on the file system, and the client can then query as to the new file system location.
The client 241 includes an installable metadata agent 247 as described in the above-cited Xu et al. U.S. Pat. No. 6,324,581. The metadata agent 247 collects metadata about a file by sending a metadata request to the namespace server. This metadata, for example, specifies the backend NAS network address of a NAS file server where the metadata agent 247 may read or write the data, for example, by sending Internet Protocol Small Computer Systems Interface (iSCSI) commands over the link 246 to the backend NAS network 40.
The client 242 is operated by a user 248 and has a direct link 249 to the backend NAS network 40. The client 242 uses the CIFS protocol and also may use Microsoft's Distributed File System (DFS) namespace service. Microsoft's DFS provides a mechanism for administrators to create logical views of directories and files, regardless of where those files physically reside in the network. This logical view could be set up by creating a DFS Share on a server. In the system of
In step 251, if the offline inode does not specify one or more of a plurality of components of a virtual file, then execution continues to step 253. In step 253, if the client does not support redirection, then execution branches to step 252 so that the namespace server accesses the offline object or objects indicated by the offline inode. The namespace server can determine the client's protocol from the client request, and decide that the client supports redirection if the protocol is NFSv4 or CIFS-DFS. The namespace server may also determine whether the client may recognize a redirection request regardless of the protocol of the client's request by accessing client information configured in the client connection database (53 in
In step 254, if the offline file server does not support the client's redirection, then execution continues to step 252 so that the namespace server accesses the offline object or objects indicated by the offline inode. The offline server can support the client's redirection only if the client and the offline server have the capability of communicating with each other using compatible protocols. For example, a NFSv4 client may support redirection but a CIFS file server may not support this client's redirection. If the offline server can support the client's redirection, execution continues from step 254 to step 255.
In step 255, if the client is requesting the deletion or name change of an offline object (i.e., a share, directory, or file), execution branches to step 252 so that the namespace server accesses the offline object. This is done so that the namespace server will delete or rename the offline object in its namespace tree upon receiving confirmation that the offline file server has deleted or renamed the object. To ensure that the namespace server will be informed of deletion or name changes to offline objects referenced in the namespace tree, a permission attribute of each referenced offline object in each file server may be programmed so that only client requests forwarded from the namespace server would have permission to delete or rename such objects. A client's installable agent could be programmed so that if a client directly accesses such a referenced offline object and attempts to delete or rename it and the file server refuses to honor the deletion or rename request, then the client will reformulate the deletion or rename request in terms of the object's client-server network pathname and send the reformulated request to the namespace server. In step 255, if the client is not requesting the deletion or name change of an offline object, execution continues to step 256.
In step 256, if the offline inode does not designate a plurality of primary copies of a file, then execution continues to step 257 to formulate a redirection reply including an IP address or backend NAS network pathname to the offline physical object. Then in step 258 the namespace server returns the redirection reply to the client.
In step 256, if the offline inode designates a plurality of offline primary copies of a file, then execution branches to step 259. In step 259, if the primary copies are all read-only copies, then execution continues to step 260. In step 260, the namespace server selects one of the primary copies for the client to access. From step 260, execution continues to step 257 to formulate a redirection reply including a backend NAS network pathname to the selected primary copy. This redirection reply is returned to the client in step 258.
In step 259, if the primary copies are not all read-only, then execution continues to step 261. In step 261, the namespace server accesses the primary copies on behalf of the client, as shown in
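The decision sequence of steps 251 to 261 can be condensed into one function. This sketch simplifies the compatibility test of step 254 to set membership, treats NFSv4 and CIFS-DFS as the only redirection-capable client protocols (as the text suggests), and uses a hypothetical `choose` callable for the selection of step 260; the layout of `OfflineInode` here is illustrative.

```python
from dataclasses import dataclass

REDIRECTING_PROTOCOLS = {"NFSv4", "CIFS-DFS"}


@dataclass
class OfflineInode:
    targets: list              # backend NAS network pathnames of the object(s)
    server_protocols: set      # protocols the offline file server supports
    virtual_file: bool = False # entries are components of a virtual file
    read_only: bool = True     # relevant when several primary copies exist


def decide(inode, client_protocol, op, choose=lambda paths: paths[0]):
    """Return ("proxy", None) or ("redirect", backend_path)."""
    if inode.virtual_file:                              # step 251
        return ("proxy", None)
    if client_protocol not in REDIRECTING_PROTOCOLS:    # step 253
        return ("proxy", None)
    if client_protocol not in inode.server_protocols:   # step 254
        return ("proxy", None)
    if op in ("delete", "rename"):                      # step 255
        return ("proxy", None)
    if len(inode.targets) == 1:                         # step 256
        return ("redirect", inode.targets[0])           # steps 257-258
    if inode.read_only:                                 # step 259
        return ("redirect", choose(inode.targets))      # steps 260, 257-258
    return ("proxy", None)                              # step 261
```

Here "proxy" stands for step 252 or step 261, in which the namespace server itself accesses the offline object or copies on behalf of the client.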
As introduced above with respect to step 255, a redirection capable client could not only be redirected by the namespace server to a server when it is appropriate for the client to directly access a file server, but also redirected by the file server back to the namespace server when it is appropriate to do so. This is further shown in the example of
In a first step 271 of
In general, the redirection capable client retains a memory of the namespace translation in each redirection reply from the namespace server, and if this namespace translation is applicable to a subsequent request, the redirection capable client will use this namespace translation to direct the subsequent request directly to the NAS network pathname of the applicable physical share, directory, or file, without access to the namespace server. Thus, a redirection reply for access to a share provides a namespace translation for the share that can be used for access to any directories or files in the share. A redirection reply for access to a directory provides a namespace translation for the directory that can be used for any subdirectories or files contained in or descendant from the directory. In general, because subsequent client access can be sent directly to the same file server containing descendants of the same share or directory once a client is redirected, aggregate performance can scale with capacity.
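A redirection capable client could keep these namespace translations in a small cache and apply the longest matching prefix, so that a translation remembered for a share or directory also covers its descendants. `TranslationCache` is a hypothetical client-side structure, and pathnames are assumed to be slash-separated.

```python
class TranslationCache:
    """Client-side memory of namespace translations from redirection replies.

    Keys are client-server pathname prefixes (a share or directory); values
    are the backend NAS network pathnames they were redirected to.
    """

    def __init__(self):
        self._map = {}

    def remember(self, client_prefix, backend_prefix):
        self._map[client_prefix.rstrip("/")] = backend_prefix.rstrip("/")

    def resolve(self, client_path):
        """Translate via the longest remembered prefix, or return None so
        that the request is sent to the namespace server instead."""
        parts = client_path.rstrip("/").split("/")
        for n in range(len(parts), 0, -1):
            prefix = "/".join(parts[:n])
            if prefix in self._map:
                return "/".join([self._map[prefix]] + parts[n:])
        return None
```

Matching whole path components, rather than raw string prefixes, keeps a translation for `/eng/proj` from being applied to `/eng/projX`.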
In step 274, when the client attempts to delete or rename a share, directory, or file that is referenced by an offline inode of the namespace tree, or the client attempts to access a file system object (i.e., a share, directory, or file) that is offline for migration, the server returns a redirection reply or an access denied error. In step 275, the client responds to the redirection reply or access denied error by resending the request to the namespace server and specifying the directory or file in terms of its client-server network pathname. In step 276, the namespace server responds by deleting or renaming the share, directory, or file, or by directing the request to the target of the migration.
The namespace server may be provided with or without certain capabilities in order to ensure compatibility with or simplify implementation for various file access protocols that support redirection. For example, to be compatible with CIFS-DFS, if an object referenced in an offline inode of the namespace tree is in a file server that does not support CIFS-DFS, then that object should not be visible to a client when that client is using the CIFS-DFS protocol. To be compatible with NFSv4, if an object referenced in an offline inode of the namespace tree is in a file server that does not support NFSv4, then that object should not be visible to a client when that client is using the NFSv4 protocol. To be compatible with NFSv4, the namespace tree may provide virtual interconnects between disjoint portions of the namespace that support the NFSv4 protocol. For example, in a tree "/a/b/c", if "a" and "c" support the NFSv4 protocol, then the namespace tree may provide attributes when the NFSv4 protocol accesses attributes for "b".
In general, it should be possible for the namespace server to share or export the root of the namespace tree to allow all supported and authorized clients to connect to it. To simplify the implementation of the namespace tree, however, the namespace tree may only provide metadata access and access to an internal file buffer. In this case, clients will not be allowed to write files to the root of the namespace tree.
Although the namespace tree can be constructed from a UNIX-based file system as described above, an alternative implementation could be based on a modification of a DFS share facility. This alternative implementation would be most advantageous if one would want to provide redirection only for CIFS-DFS clients. The DFS share facility would be modified to specify the protocols associated with leaf nodes in the virtual namespace tree. For example, the DFS share facility provides a target definition for each leaf node. Each target definition includes a server name, a share name on that server, and a comment field. To provide redirection, the DFS share facility is modified by inserting protocol keywords in the comment field. If the comment field is blank, then the protocol is assumed to be CIFS-DFS. To associate additional information with each leaf node, a pointer to the additional information could be put into the comment field.
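Because the text does not fix a syntax for the protocol keywords, the following sketch assumes a hypothetical form such as `proto=NFSv4; info=<pointer>` in the comment field. Only the rule that a blank comment means CIFS-DFS comes from the text; `DfsTarget` and `parse_target` are illustrative names, and treating a non-blank comment without a `proto` keyword as CIFS-DFS is a further assumption.

```python
from dataclasses import dataclass


@dataclass
class DfsTarget:
    server: str        # server name of the leaf node's target
    share: str         # share name on that server
    comment: str = ""  # comment field, reused here for protocol keywords


def parse_target(target):
    """Return (protocols, info_pointer) for a modified DFS leaf target."""
    protocols, info = ["CIFS-DFS"], None
    if not target.comment.strip():
        return protocols, info                 # blank comment: CIFS-DFS
    for item in target.comment.split(";"):
        key, _, value = item.partition("=")
        key, value = key.strip().lower(), value.strip()
        if key == "proto" and value:
            protocols = [p.strip() for p in value.split(",") if p.strip()]
        elif key == "info" and value:
            info = value                       # pointer to additional info
    return protocols, info
```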
In step 283, upon finding that the client is requesting access to a metadata file, the namespace server checks that the client supports direct access using metadata, and if so, the namespace server returns metadata to the metadata agent. The metadata specifies the data storage locations for the data to be read or written. For example, the specification could include a backend NAS network pathname for a set of storage units of the NAS file server, and a block mapping table specifying logical unit numbers, block addresses, and extents of storage in the NAS file server for respective offsets and extents in the file. The specification could also designate a particular way of striping the data across multiple storage units to form a RAID set. If the namespace server receives a request to read or write data to a metadata file from a client that does not support direct access using metadata, then the namespace server may access the metadata file and use metadata in the metadata file to read or write data to the data storage locations specified by the metadata. In other words, the namespace server itself may function as a metadata agent on behalf of a client that does not have its own metadata agent.
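A block mapping table of the kind described could be consulted as in this sketch, which translates a byte range of the file into pieces located on logical units. The `Extent` layout, the 512-byte block size, and the function `map_range` are assumptions; striping across a RAID set is not modeled.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    file_offset: int   # byte offset in the file where this extent starts
    length: int        # extent length in bytes
    lun: int           # logical unit number in the NAS file server
    block: int         # starting block address on that logical unit


def map_range(table, offset, length, block_size=512):
    """Translate a file byte range into (lun, block, byte_in_block, nbytes).

    `table` must be sorted by file_offset with non-overlapping extents.
    Raises ValueError if part of the range is not mapped.
    """
    starts = [e.file_offset for e in table]
    pieces = []
    end = offset + length
    while offset < end:
        i = bisect.bisect_right(starts, offset) - 1
        if i < 0 or offset >= table[i].file_offset + table[i].length:
            raise ValueError(f"offset {offset} is not mapped")
        e = table[i]
        within = offset - e.file_offset
        n = min(end, e.file_offset + e.length) - offset
        blk, byte = divmod(within, block_size)
        pieces.append((e.lun, e.block + blk, byte, n))
        offset += n
    return pieces
```

A metadata agent, or the namespace server acting as one, would turn each returned piece into a read or write command to the corresponding logical unit.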
In step 284, the metadata agent formulates read or write requests by using the metadata specifying the data storage locations to be read or written. In step 285, the metadata agent sends the read or write requests directly to the backend NAS network, and the data that is read or written is transferred between the client and the storage without passing through the namespace server. For example, the read or write requests are iSCSI commands sent to a NAS file server. Finally, in step 286, if the write operation changes the metadata for the file, then the metadata agent sends a write request to the namespace server to update the metadata in the named file. For example, if the write operation extends the file, the metadata agent will send such a write request to the namespace server.
The two-level redirection in
In view of the above, there has been described a namespace server that can receive client requests for access to files referenced by pathnames in a client-server namespace, and can translate the requests from the client into translated requests sent from the network namespace server to a file server for access to files referenced by pathnames in a backend NAS network namespace. Therefore it is possible to scale the namespace capacity seamlessly, by abstracting the namespace management and representation from the actual data storage locations. The namespace server also has the capability of changing the translation of a client-server network pathname from an old backend NAS network pathname to a new backend NAS network pathname during concurrent client read-write access. This allows for transparent data re-distribution for balancing storage utilization, performance balancing, and resource management. The namespace server can perform a translation between different file access protocols, so that a NFS client can access files serviced by a CIFS file server, and a CIFS client can access files serviced by a NFS file server. If a client supports redirection and is requesting access to a file in a file server that supports the client's redirection, then the namespace server may redirect the client to the NAS network pathname of the file. For example, a request from an NFSv4 client may be redirected for access to an NFSv4 file server, and a request from a CIFS-DFS client may be redirected to a CIFS file server. Since subsequent client file access can be directly sent to the same share, directory, or file once a client is redirected, aggregate performance can scale with capacity.
The namespace server provides a unified repository for namespace information. The namespace information includes a hierarchy of storage objects (i.e., shares, directories, or files). The repository includes the NAS network location and protocol information for each storage object. For example, the NAS network location is a Uniform Resource Locator (URL) specifying the file server and pathname that can be used to retrieve the object via the specified protocol. When receiving a client request to access an object in the namespace repository, the namespace server examines the location information for the object, and the access method of the client. If the access method of the client is a protocol that supports redirection and the protocol information for the object shows that the object can be accessed using the client's redirection, then the namespace server returns an appropriate kind of redirection reply to the client. If the access method of the client is a protocol that supports redirection and the protocol information for the object shows that the object cannot be accessed using the client's redirection, then the namespace server translates the client's request and functions as a proxy server by forwarding the translated request to the file server that contains the object to be accessed. In the preferred implementation, however, the namespace server will not redirect a request for access to a virtual file component, a request for deletion or name change of an offline object, or a request for write access to copies maintained by namespace server in a state of coherency. A file server may redirect a redirection-capable client's access back to the namespace server for access to a share, directory, or file that is offline for migration, or for a deletion or name change that would require a change in translation information in the namespace server.