The following description relates to computing in general and to file systems in particular.
Computers or other information processing devices typically store data on or in a storage medium such as a hard disk drive. A file system is typically used to organize, store, retrieve, and manage the stored data. As used herein, the term “volume” refers to the logical entity on which a file system operates. A volume is physically stored on or in one or more items of storage media.
In one configuration, a computer accesses a “local” volume that is physically stored on storage media that is local to or directly coupled to the computer (for example, storage media that is a part of the computer). In another configuration, multiple computers access a “shared” volume that is physically stored on storage media that the computers access over a network (for example, a local area network or a storage area network) in addition to or instead of any local volumes used by the computers. In one such configuration, one of the computers (referred to here as a “file server”) maintains information related to the shared volume (for example, file system meta data) and controls access to the storage media on which the shared volume is stored. The physical storage media on which a shared volume is stored is also referred to here as the “shared storage media.”
In one example of such a shared-volume configuration, when a client wishes to write data to or read data from the shared volume, the client sends to the file server a request that such a write or read operation be performed by the file server on behalf of the client. In the case of a write, the client sends to the file server the data to be written to the shared volume, which the file server receives and writes to the shared storage media. In the case of a read, the file server reads the requested data from the shared storage media and sends the read data to the client.
In order to reduce the overhead associated with communicating data between clients and the file server in connection with such operations, some shared-volume configurations also support “direct” input/output (I/O) operations in which a client is able to write or read data directly to or from the shared storage media. When a client opens a file for writing, the file server sends the client information indicating where on the shared storage media that file is located. The client uses the location information provided by the file server to directly write data to the shared storage media.
Some file systems include functionality that allows a “snapshot” of a volume (also referred to here in this context as the “live volume”) to be created at a given point in time. A snapshot maintains a copy of the live volume as the volume existed at the time the snapshot was created. In order to reduce the amount of resources used to create and store a snapshot, a “copy-on-write” technique is typically used to create and maintain the snapshot. Initially, when the file system first “creates” the snapshot, data is not copied from the live volume to the snapshot. Instead, the snapshot contains meta data that references the same physical data stored on the storage media for the live volume. After the snapshot is created, when a write operation intends to overwrite data stored in the live volume at a particular location on the storage media, the data stored on the storage media at that location is first copied to a new location on the storage media. The meta data stored in the snapshot for that file (which previously referred to the first location on the storage media) is updated to refer to the new location on the storage media. After this “copy-on-write” is completed, the write operation is performed, which overwrites the data stored at the first location on the storage media.
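The copy-on-write sequence described above can be illustrated with a minimal sketch. It is only a toy model under stated assumptions: the `Volume` class, its dictionary-based storage medium, and every method name are hypothetical and do not come from any particular file system.

```python
class Volume:
    """Toy storage medium with a live map and an optional snapshot map.

    Assumptions (hypothetical, for illustration only): the medium is a
    dict of physical location -> bytes, and each map is a dict of
    logical block number -> physical location.
    """

    def __init__(self):
        self.media = {}        # physical location -> data
        self.live = {}         # logical block -> physical location
        self.snapshot = None   # same shape as self.live once created
        self._next_loc = 0

    def _allocate(self, data):
        # Store data at a fresh physical location and return that location.
        loc = self._next_loc
        self._next_loc += 1
        self.media[loc] = data
        return loc

    def create_block(self, block, data):
        self.live[block] = self._allocate(data)

    def create_snapshot(self):
        # No data is copied: the snapshot only holds metadata that
        # references the same physical locations as the live volume.
        self.snapshot = dict(self.live)

    def write(self, block, data):
        loc = self.live[block]
        if self.snapshot is not None and self.snapshot.get(block) == loc:
            # Copy-on-write: copy the old data to a new location and
            # repoint the snapshot metadata there before overwriting.
            self.snapshot[block] = self._allocate(self.media[loc])
        self.media[loc] = data  # overwrite in place at the first location

    def read_live(self, block):
        return self.media[self.live[block]]

    def read_snapshot(self, block):
        return self.media[self.snapshot[block]]


volume = Volume()
volume.create_block(0, b"old")
volume.create_snapshot()        # metadata only, still one stored block
volume.write(0, b"new")         # first overwrite triggers the copy
print(volume.read_live(0), volume.read_snapshot(0))
```

In this sketch only the first overwrite after the snapshot pays for a copy; later writes to the same block find that the snapshot already points elsewhere and overwrite in place.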
In a shared-volume configuration, when snapshots are created and maintained using such copy-on-write techniques, the clients are not typically allowed to perform direct write operations to the shared storage media on which the live shared volume is stored. Instead, in such a configuration, all write operations are performed by the file server on behalf of the client, which requires the client to send the data to be written to the file server. Transferring data from the client to the file server in order to perform a write reduces the performance of the write.
In one embodiment, a method comprises maintaining information indicative of which, if any, data stored on a storage medium, before being changed, needs to be copied to a snapshot. The method further comprises communicating, to a client, at least a portion of the information for use by the client in determining whether to perform a direct input/output operation on the storage medium that would change data stored thereon.
In another embodiment, a method comprises, at a client that is communicatively coupled to a file server and a storage medium on which data are stored, receiving, from the file server, information indicative of which, if any, of at least a subset of the data need to be copied to a snapshot before being changed on the storage medium. The method further comprises, when the client intends to perform an input/output operation that would change any data included in the subset, determining, by the client based on the received information, if any data included in the subset needs to be copied to the snapshot before being changed on the storage medium. The method further comprises, when the client intends to perform the input/output operation, if any data included in the subset needs to be copied to the snapshot before being changed on the storage medium, requesting, by the client, that the file server copy to the snapshot the data included in the subset that needs to be copied to the snapshot before being changed on the storage medium and requesting that the file server perform the input/output operation on behalf of the client. The method further comprises, when the client intends to perform the input/output operation, if none of the data included in the subset needs to be copied to the snapshot before being changed on the storage medium, performing the input/output operation directly on the storage medium.
In another embodiment, a file server comprises a storage medium interface to communicatively couple the file server to a storage medium on which a file is stored and a client interface to communicatively couple the file server to at least one client. The file server provides, to the client, information indicative of whether any part of the file needs a copy-on-write to be performed therefor for use by the client in determining whether to perform a direct input/output operation to the file.
In another embodiment, a device comprises a storage medium interface to communicatively couple the device to a storage medium on which a file is stored and a file server interface to communicatively couple the device to a file server. The device receives, from the file server, information indicative of whether any part of the file needs a copy-on-write to be performed therefor. The device, when the device intends to perform an input/output operation on the file that would change at least a part of the file, uses the information to determine if the at least a part of the file needs a copy-on-write to be performed therefor. If the at least a part of the file needs a copy-on-write to be performed therefor, the device requests that the file server perform the copy-on-write for the at least a part of the file and that the file server perform the input/output operation on the file on behalf of the device. If no part of the file needs a copy-on-write to be performed therefor, the device performs the input/output operation directly to the file.
The details of various embodiments of the claimed invention are set forth in the accompanying drawings and the description below. Other features and advantages will become apparent from the description, the drawings, and the claims.
Like reference numbers and designations in the various drawings indicate like elements.
In the embodiment shown in
The shared storage device 106 is communicatively coupled to the clients 102 and the file server 108 using a storage area network (SAN) 144. The shared storage device 106 comprises an interface 145 (also referred to here as the “SAN” interface 145) that communicatively couples the shared storage device 106 to the SAN 144 and to the other devices communicatively coupled thereto (that is, the clients 102 and the file server 108). Each client 102 comprises an interface 147 (also referred to here as the “SAN” interface 147 or the “storage device” interface 147) that communicatively couples the client 102 to the SAN 144 and to the shared storage device 106. The file server 108 comprises an interface 149 (also referred to here as the “SAN” interface 149 or the “storage device” interface 149) that communicatively couples the file server 108 to the SAN 144 and to the shared storage device 106. In one implementation, the storage area network 144 comprises a fiber channel storage-area network having, for example, a point-to-point or switched topology. In such an implementation, the interface 145, each interface 147, and the interface 149 each comprise a fiber channel network interface for coupling the respective device to such a fiber channel SAN.
In other embodiments, the clients 102, the shared storage device 106, and the file server 108 are communicatively coupled in other ways.
In the embodiment shown in
In the embodiment shown in
Data is stored on the storage media 105 of the shared storage device 106 in a plurality of physical storage units. A file system 107 is used to organize, store, retrieve, and manage the data stored in the physical storage units on the storage media 105. In the embodiment shown in
In the embodiment shown in
For each file 130 that existed in the live volume 104 at the time the snapshot 136 was initially created, the snapshot storage map 138 contains one or more entries that point to (or otherwise reference) one or more extents 132 stored on the storage media 105 that contain the data stored in that file 130 at the time the snapshot 136 was created. If a new file 130 is created and stored in the live volume 104 after the snapshot 136 was created, that new file 130 is not copied to the snapshot 136 and the snapshot storage map 138 does not contain an entry that references the new file 130. The new file 130 is not a part of the snapshot 136 because the new file 130 was not stored in the live volume 104 at the time the snapshot 136 was created.
Before a copy-on-write is performed for a particular part of a file 130 that is contained in the snapshot 136, the entries in the snapshot storage map 138 that correspond to that part of the file 130 point to the same one or more extents 132 that are pointed to by the entries in the live storage map 134 that correspond to that part of the file 130. A copy-on-write is performed for a particular part of a file 130 the first time, after the snapshot 136 was created, that the particular part of the file 130 is changed. For example, a copy-on-write is performed for a part of a file 130 before that part of the file 130 is written to. When a copy-on-write is performed on a part of a file 130, the data stored in that part of the file 130 is copied from the one or more extents 132 in which that data is stored to one or more new extents 132. The one or more entries in the snapshot storage map 138 for that part of the file 130 are updated to point to the new extents 132.
In the particular embodiment shown in
The file server 108 maintains information that is indicative of which of the extents 132 (and/or the logical entity corresponding thereto) on the shared storage device 106 need a copy-on-write performed therefor. In the embodiment shown in
Also, when a new file 130 is added to the live storage volume 104 after the snapshot 136 was created, the entries in the live storage map 134 for that new file 130 indicate that a copy-on-write does not need to be performed for any part of the new file 130 (or for any of the one or more extents 132 at which the new file 130 is stored).
When a client 102 wishes to make a change to a file 130, the file server 108 sends to the client 102 information indicating which part or parts of the file 130 (and the one or more extents 132 in which those parts are stored) need a copy-on-write performed therefor. Any such copy-on-write needs to be performed by the file server 108 before any data stored in such a part (or corresponding extent 132) is changed. The client 102, in connection with performing an input/output operation that would change a part of the file 130 (for example, a write), uses this information to determine if that part of the file 130 needs a copy-on-write to be performed for that part. If a copy-on-write needs to be performed for that part of the file 130, the client 102 requests that the file server 108 perform any copy-on-writes that are needed and that the file server 108 perform the input/output operation on the client's behalf. However, if a copy-on-write does not need to be performed for that part of the file 130, the client 102 can perform the input/output operation directly on the shared storage device 106. Input/output operations performed directly by the client 102 typically are performed more quickly than input/output operations performed by the file server 108.
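The client-side decision described above might be sketched as follows. This is a simplified model rather than the described implementation: `MapEntry`, `FileServer`, `client_write`, and the dictionary standing in for the shared storage device are all hypothetical names, the old data is kept in a plain dictionary instead of a new extent, and the assumption that the server hands back a refreshed entry after a delegated write is made only so the client's view stays current in the example.

```python
from dataclasses import dataclass, replace


@dataclass
class MapEntry:
    extent: int       # location of the extent on the shared storage
    needs_cow: bool   # True if a copy-on-write must happen before a change


class FileServer:
    """Hypothetical stand-in for the file server."""

    def __init__(self, storage, entries):
        self.storage = storage        # shared storage: extent -> data
        self.entries = entries        # server's entries: part -> MapEntry
        self.snapshot_copies = {}     # old data preserved for the snapshot

    def cow_and_write(self, part, data):
        entry = self.entries[part]
        if entry.needs_cow:
            # Preserve the old data for the snapshot before changing it.
            self.snapshot_copies[part] = self.storage[entry.extent]
            entry.needs_cow = False
        self.storage[entry.extent] = data
        return replace(entry)         # copy of the updated entry


def client_write(entries, server, storage, part, data):
    """Write one part of a file, directly if no copy-on-write is pending."""
    if entries[part].needs_cow:
        # Delegate: the server performs the copy-on-write and the write.
        entries[part] = server.cow_and_write(part, data)
        return "server"
    # No copy-on-write pending: write straight to the shared storage.
    storage[entries[part].extent] = data
    return "direct"
```

A short usage example under the same assumptions:

```python
storage = {10: b"a", 11: b"b"}
server_entries = {0: MapEntry(10, True), 1: MapEntry(11, False)}
server = FileServer(storage, server_entries)
# The client works from its own copy of the entries sent by the server.
client_entries = {part: replace(e) for part, e in server_entries.items()}

print(client_write(client_entries, server, storage, 1, b"B"))   # direct
print(client_write(client_entries, server, storage, 0, b"A"))   # server
print(client_write(client_entries, server, storage, 0, b"AA"))  # direct
```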
In one embodiment, a predetermined bit contained within an entry in the live storage map 134 is set in order to indicate whether a copy-on-write needs to be performed for the part of a file 130 (and the corresponding extent 132 pointed to by that entry). In one implementation of such an embodiment, the most-significant bit of each entry in the live storage map 134 is set to indicate that a copy-on-write does not need to be performed for the part of a file 130 (and the corresponding extent 132 pointed to by that entry). One such embodiment is illustrated in
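One way to picture the most-significant-bit convention is the sketch below. The 64-bit entry width, the use of the remaining bits as an extent address, and all function names are assumptions made for illustration; the text specifies only that a set most-significant bit indicates that no copy-on-write is needed.

```python
ENTRY_BITS = 64                        # assumed width of a map entry
NO_COW_FLAG = 1 << (ENTRY_BITS - 1)    # most-significant bit of the entry
ADDRESS_MASK = NO_COW_FLAG - 1         # remaining low-order bits


def make_entry(address, needs_cow):
    """Pack an extent address and the copy-on-write state into one entry."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("extent address does not fit in the entry")
    # The flag is set when a copy-on-write does NOT need to be performed.
    return address if needs_cow else address | NO_COW_FLAG


def needs_copy_on_write(entry):
    """Return True when the most-significant bit is clear."""
    return (entry & NO_COW_FLAG) == 0


def extent_address(entry):
    """Return the entry with the flag bit masked off."""
    return entry & ADDRESS_MASK


def mark_copied(entry):
    """Set the flag once the copy-on-write has been performed."""
    return entry | NO_COW_FLAG


entry = make_entry(0x1234, needs_cow=True)
print(needs_copy_on_write(entry))                 # True
print(needs_copy_on_write(mark_copied(entry)))    # False
```

Packing the flag into the entry itself means a client that already holds the entry needs no extra lookup to decide whether a direct write is permitted.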
In the example shown in
When a client 102 wishes to open a file 130 for writing (checked in block 302 of
When the file server 108 receives the request from the client 102 (checked in block 352 of
If the client 102 receives a message indicating that the file 130 is locked (checked in block 306 of
If the file 130 is not locked, the client 102 receives from the file server 108 the one or more entries from the live storage map 134 that correspond to the file 130 (block 310) and opens the file for writing (block 312).
When the client 102 wishes to write to a particular region of the opened file 130 (checked in block 314), the client 102 uses the received entries to determine if a copy-on-write needs to be performed for any part of that region of the file 130 (checked in block 316). The region of the file 130 to which the client 102 wishes to write is also referred to here as the “targeted” region of the file 130. Any part of the targeted region for which a copy-on-write needs to be performed is also referred to here as an “uncopied” part of the targeted region. In the embodiment shown in
If a copy-on-write needs to be performed for a part of the targeted region of the file 130, the client 102 sends a request to the file server 108 requesting the file server 108 perform any needed copy-on-writes and perform the write on behalf of the client 102 (block 318). The client 102 identifies, for the file server 108, the targeted region of the file 130 and sends to the file server 108 the data to be written to the targeted region of the file 130. The data that is to be written to the targeted region of the file 130 is also referred to here as the “write data.”
When the file server 108 receives the write request (checked in block 362 of
In one implementation of such an embodiment, when an uncopied part of the targeted region is stored in less than all of the physical storage units that make up a particular extent 132 (referred to here as the “original extent” 132), the file server 108 performs a copy-on-write for only those storage units in which the uncopied part of the targeted region is stored and “splits” the original extent 132 into two extents as described below in connection with
After the copy-on-write is complete, the file server 108 writes the write data to the targeted part of the file 130 (block 368 of
The client 102 receives the updated entries from the live storage map 134 for the opened file 130 (block 320 of
When the client 102 wishes to write to a particular part of the opened file 130 and the client 102 (based on the entries from the live storage map 134) determines that a copy-on-write does not need to be performed for the targeted region of the file 130, the client 102 directly writes the write data to the one or more extents 132 in which the targeted region of the file 130 is stored on the storage media 105 (block 322). In this way, the client 102 is able to perform direct writes to the storage media 105 when the targeted region has already been copied into the snapshot 136. As a result, the write data need not be transferred to the file server 108 over the cluster interconnect 142 in order to carry out a write to the storage media 105.
The operation of one implementation of the embodiment of method 350 shown in
As shown in
As shown in
The methods and techniques described here may be implemented in digital electronic circuitry, or with a programmable processor (for example, a special-purpose processor or a general-purpose processor such as a computer), firmware, software, or in combinations of them. Apparatus embodying these techniques may include appropriate input and output devices, a programmable processor, and a storage medium tangibly embodying program instructions for execution by the programmable processor. A process embodying these techniques may be performed by a programmable processor executing a program of instructions to perform desired functions by operating on input data and generating appropriate output. The techniques may advantageously be implemented in one or more programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory previously or now known or later developed, including by way of example semiconductor memory devices, such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and DVD disks. Any of the foregoing may be supplemented by, or incorporated in, specially-designed application-specific integrated circuits (ASICs).