Systems and methods for a read only mode for a portion of a storage system

Information

  • Patent Grant
  • 7953709
  • Patent Number
    7,953,709
  • Date Filed
    Thursday, March 27, 2008
  • Date Issued
    Tuesday, May 31, 2011
Abstract
In general, embodiments of the invention relate to reading data from and writing data to a storage system. Specifically, embodiments of the invention relate to a read only mode for a portion of a storage system. In one embodiment, a selective read-only mode for a portion of a storage system is implemented by monitoring a condition that affects a subset of persistent storage in a storage system, by detecting the condition, by entering a read-only mode for the subset, and by enforcing a policy of processing write requests and read requests to the storage system, which includes processing the write requests without modifying user data stored on the subset and processing the read requests, including requests for user data stored on the subset.
Description
LIMITED COPYRIGHT AUTHORIZATION

A portion of the disclosure of this patent document includes material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyrights whatsoever.


CROSS-REFERENCED APPLICATIONS

This application was filed on the same day as the following applications: U.S. patent application Ser. No. 12/057,298, entitled “SYSTEMS AND METHODS FOR MANAGING STALLED STORAGE DEVICES”, U.S. patent application Ser. No. 12/057,302, entitled “SYSTEMS AND METHODS FOR MANAGING STALLED STORAGE DEVICES”, and U.S. patent application Ser. No. 12/057,303, entitled “SYSTEMS AND METHODS FOR A READ ONLY MODE FOR A PORTION OF A STORAGE SYSTEM”, all of which are hereby incorporated by reference in their entirety.


BACKGROUND

1. Field of the Disclosure


In general, embodiments relate to reading data from and writing data to a storage system. Specifically, embodiments relate to a read only mode for a portion of a storage system.


2. Description of the Related Art


The increase in processing power of computer systems has ushered in a new era in which information is accessed on a constant basis. In many computing environments, storage systems are in continuous, or near-continuous, use. During such use, storage systems may experience conditions that negatively affect the durability of data in the storage system. For example, servicing a storage system may place certain storage components at risk by creating a possibility of an inadvertent loss of power to one or more of the storage components.


SUMMARY OF THE DISCLOSURE

Because a condition that endangers data, such as servicing that risks an inadvertent loss of power, may affect only some of the components of the storage system, it may be preferable to allow the storage system to isolate its response to that condition. Hence, there is a need for responding to conditions that may negatively affect a portion of the storage system while allowing the remainder of the storage system to operate normally. In general, embodiments of the invention relate to reading data from and writing data to a storage system. Specifically, embodiments of the invention relate to a read only mode for a portion of a storage system.


In one embodiment, a method of implementing a selective read-only mode in a storage system is disclosed. The method includes monitoring a condition that affects a subset of persistent storage in a storage system; detecting the condition; entering a read-only mode for the subset; and enforcing a policy of processing write requests and read requests to the storage system, which policy includes processing the write requests without modifying user data stored on the subset and processing the read requests, including requests for user data stored on the subset.


In another embodiment, a computer-readable medium having instructions stored thereon is disclosed. The instructions include the method of implementing a selective read-only mode in a storage system described above.


In yet another embodiment, a distributed storage system configured to implement a selective read-only mode is disclosed. The distributed storage system includes a plurality of storage modules configured to communicate via a network, each of the plurality of storage modules configured to process at least one of read and write requests on behalf of the entire distributed storage system, wherein the storage modules configured to handle write requests are configured to individually enter a read-only mode if at least one condition is detected that affects persistent storage on a respective storage module; and an allocation module configured to exclude storage modules operating in read-only mode from processing write requests.


In yet another embodiment, a storage module configured to communicate with other storage modules and to enter a read-only mode is disclosed. The storage module includes persistent storage with user data stored thereon; memory with temporary data stored thereon; a monitoring module configured to detect at least one condition that affects the persistent storage; and a read-only module configured to place the storage module into a read-only mode after the monitoring module detects the at least one condition.


In yet another embodiment, a method of implementing a selective read-only mode in a storage system with a transaction journal is disclosed. The method includes: monitoring a condition that affects data stored in a transaction journal corresponding to a portion of a storage system, wherein the transaction journal records in a persistent storage transactions that modify user data stored in the portion; detecting the condition; entering a read-only mode for the portion of the storage system; and enforcing a policy of processing write requests and read requests to the storage system, which policy includes processing the write requests without modifying user data on the portion and processing the read requests, including requests for user data stored on the portion.


In yet another embodiment, a computer-readable medium having instructions stored thereon is disclosed. The instructions include the method of implementing a selective read-only mode in a storage system with a transaction journal described above.


In yet another embodiment, a distributed storage system configured to implement a selective read-only mode is disclosed. The distributed storage system includes: a plurality of storage modules configured to communicate via a network, each of the plurality of storage modules configured to process at least one of read and write requests on behalf of the entire distributed storage system, wherein the storage modules configured to handle write requests are configured to individually enter a read-only mode if at least one condition is detected that affects data stored in a transaction journal corresponding to a portion of a storage system, wherein the transaction journal records in persistent storage transactions that modify user data stored in the portion; and an allocation module configured to exclude storage modules operating in read-only mode from processing write requests.


In yet another embodiment, a storage module configured to communicate with other storage modules and to enter a read-only mode is disclosed. The storage module includes: a first persistent storage with user data stored thereon; a second persistent storage with transaction data stored thereon; memory with temporary data stored thereon; a monitoring module configured to detect at least one condition that affects the second persistent storage; and a read-only module configured to place the storage module into a read-only mode after the monitoring module detects the at least one condition.


In yet another embodiment, a method of entering a read-only mode for a portion of a storage system is disclosed. The method includes: marking as unwritable a transaction journal associated with a portion of a storage system entering a read-only mode; backing up the transaction journal without writing to the portion; continuing to process unresolved transactions affecting the portion by marking as commit deferred, without writing to the transaction journal, those transactions for which a commit message is received after entering the read-only mode; and processing those transactions marked commit deferred, after leaving the read-only mode for the portion, by marking each of those transactions as committed in the transaction journal, and sending committed messages to other affected portions of the storage system.
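The deferral protocol summarized above can be illustrated with a minimal Python sketch. The `Portion` class, its fields, and the in-memory `marks` dictionary are hypothetical names chosen for illustration; the sketch assumes a single process and models "committed" messages to other portions as entries in an `outbox` list rather than any real network protocol.

```python
from dataclasses import dataclass, field
from enum import Enum


class TxState(Enum):
    PREPARED = "prepared"
    COMMITTED = "committed"
    COMMIT_DEFERRED = "commit deferred"


@dataclass
class Portion:
    """Hypothetical model of one portion (e.g. a node) of a storage system."""
    name: str
    journal: dict = field(default_factory=dict)  # txn id -> TxState, persisted
    marks: dict = field(default_factory=dict)    # in-memory marks, never journaled
    outbox: list = field(default_factory=list)   # (txn id, message) for other portions
    read_only: bool = False                      # doubles as "journal unwritable"

    def enter_read_only(self):
        """Mark the journal unwritable and return a backup copy of it."""
        self.read_only = True
        return dict(self.journal)

    def on_commit_message(self, txn):
        if self.read_only:
            # Defer: remember the commit in memory without touching the journal.
            self.marks[txn] = TxState.COMMIT_DEFERRED
        else:
            self.journal[txn] = TxState.COMMITTED
            self.outbox.append((txn, "committed"))

    def leave_read_only(self):
        """Journal each deferred commit, then notify the other affected portions."""
        self.read_only = False
        for txn, mark in list(self.marks.items()):
            if mark is TxState.COMMIT_DEFERRED:
                self.journal[txn] = TxState.COMMITTED
                self.outbox.append((txn, "committed"))
                del self.marks[txn]
```

Because the deferred marks live only in memory, the backup taken on entry stays identical to the journal for the whole read-only period, which is the property the backup relies on.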


In yet another embodiment, a computer-readable medium having instructions stored thereon is disclosed. The instructions include the method of entering a read-only mode for a portion of a storage system described above.


For purposes of this summary, certain aspects, advantages, and novel features are described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment. Thus, for example, those skilled in the art will recognize that the systems and methods may be embodied or carried out in a manner that achieves one advantage or group of advantages as taught herein without necessarily achieving other advantages as may be taught or suggested herein. Furthermore, embodiments may include several novel features, no single one of which is solely responsible for the embodiment's desirable attributes or which is essential to practicing the systems and methods described herein. Additionally, in any method or process disclosed herein, the acts or operations of the method or process may be performed in any suitable sequence and are not necessarily limited to any particular disclosed sequence.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates one embodiment of a storage system configured to implement a read-only mode for a portion of the storage system.



FIG. 2 illustrates one embodiment of a problem that arises when a portion of journal data is lost.



FIG. 3 illustrates one embodiment of a string of bits that reflect conditions that may negatively affect a portion of the storage system.



FIGS. 4A, 4B, 4C, and 4D illustrate embodiments of a read-only mode for a portion of a storage system.



FIGS. 5A and 5B illustrate one embodiment of a process for writing data in a storage system when a portion of the storage system operates in a read-only mode.



FIGS. 6A and 6B illustrate a problem that arises when entering a read-only mode with unresolved transactions in some embodiments.



FIGS. 6C and 6D illustrate one embodiment of a protocol for deferring operations associated with unresolved transactions during a read-only mode.



FIG. 7A illustrates one embodiment of a protocol for mounting a file system in a read-write mode following a temporary power failure.



FIG. 7B illustrates one embodiment of a protocol for mounting a file system in a read-only mode following a temporary power failure.



FIG. 8 illustrates one embodiment of a state diagram for a protocol implementing a read-only mode for a portion of a storage system.



FIG. 9 illustrates one embodiment of a monitor process for detecting conditions relevant to a read-only mode for a portion of a storage system.



FIG. 10 illustrates one embodiment of a state change process for responding to a condition relevant to a read-only mode for a portion of a storage system.



FIG. 11 illustrates one embodiment of a file system process that implements a read-only mode for a portion of a storage system.



FIG. 12 illustrates one embodiment of a process for mounting a file system to a portion of the storage system.



FIG. 13 illustrates one embodiment of a process for processing deferred transactions after exiting a read-only mode for a portion of the storage system.





These and other features will now be described with reference to the drawings summarized above. The drawings and the associated descriptions are provided to illustrate embodiments of the invention and not to limit the scope of the invention. Throughout the drawings, reference numbers may be re-used to indicate correspondence between referenced elements. In addition, the first digit of each reference number generally indicates the figure in which the element first appears.


DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Systems and methods which represent one embodiment of an example application of the invention will now be described with reference to the drawings. Variations to the systems and methods which represent other embodiments will also be described.


For purposes of illustration, some embodiments will be described in a context of a distributed file system. Embodiments of a distributed file system suitable for accommodating embodiments of a read only mode for a portion of a storage system disclosed herein are disclosed in: U.S. patent application Ser. No. 10/007,003, titled “SYSTEMS AND METHODS FOR PROVIDING A DISTRIBUTED FILE SYSTEM UTILIZING METADATA TO TRACK INFORMATION ABOUT DATA STORED THROUGHOUT THE SYSTEM,” filed Nov. 9, 2001, which claims priority to Application No. 60/309,803 filed Aug. 3, 2001; U.S. patent application Ser. No. 10/281,467 entitled “SYSTEMS AND METHODS FOR PROVIDING A DISTRIBUTED FILE SYSTEM INCORPORATING A VIRTUAL HOT SPARE,” filed Oct. 25, 2002, which issued as U.S. Pat. No. 7,146,524, on Dec. 5, 2006; and U.S. patent application Ser. No. 10/714,326 entitled “SYSTEMS AND METHODS FOR RESTRIPING FILES IN A DISTRIBUTED FILE SYSTEM,” filed Nov. 14, 2003, which claims priority to Application No. 60/426,464, filed Nov. 14, 2002, all of which are hereby incorporated by reference herein in their entirety.


The present invention is not limited by the type of environment in which the systems and methods are used, however. The systems and methods may be used in other environments, such as, for example, other file systems, other distributed systems, the Internet, the World-Wide Web, a private network for hospitals, a broadcast network for a government agency, an internal network for a corporate enterprise, an intranet, a local area network (LAN), a wide area network (WAN), a wired network, a wireless network, and so forth. Furthermore, some embodiments of the invention may include distributed databases and other distributed systems.


Some of the figures and descriptions, however, relate to embodiments of a distributed file system. It is also recognized that, in some embodiments, the systems and methods disclosed herein may be implemented as a single module and/or implemented in conjunction with a variety of other modules and the like. Furthermore, the methods and processes disclosed herein may be embodied in, and fully automated via, software code modules executed by one or more computers or processors. Computer code modules may comprise computer-readable media with instructions stored thereon. The code modules may be stored in any type of computer-readable medium or other computer storage device. Some or all of the systems and methods may alternatively be embodied in specialized computer hardware. Moreover, the specific implementations described herein are set forth in order to illustrate, and not to limit, the invention.


The word module refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, C or C++, and stored in a computer-readable medium. A software module may be compiled and linked into an executable program, installed in a dynamically linked library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules may comprise connected logic units, such as gates and flip-flops, and/or may comprise programmable units, such as programmable gate arrays or processors. The modules described herein are preferably implemented as software modules, but may be represented in hardware or firmware. Moreover, although in some embodiments a module may be separately compiled, in other embodiments a module may represent a subset of instructions of a separately compiled program, and may not have an interface available to other logical program units.


I. Overview


In general, embodiments relate to reading from and writing to data storage systems. Specifically, embodiments implement a read-only mode for a portion of a data storage system. In some circumstances, it may be advantageous for a data storage system to enforce a read-only mode for only a portion of the system. A portion-specific read-only mode may provide an efficient solution in various circumstances. During a portion-specific read-only mode, read operations may be processed on the entire storage system, and write operations may be processed except on the portion in a read-only mode. Thus, a portion-specific read-only mode allows for an efficient way to prevent writing to only a portion of a storage system when it may be advantageous to prevent writing to that portion, to permit writing to the remaining portions, and/or to permit reading from the entire storage system.


In some circumstances, a portion-specific read-only mode may be advantageous when certain components of the system may affect only a portion of the system without affecting another portion of the system in the same way. For example, in a distributed storage system that includes a cluster of data storage nodes, a power supply for one of the nodes may affect that node without directly affecting the other nodes. In some circumstances, a loss of power may lead to the loss of data that would compromise the reliability and/or consistency of a data storage system. Thus, it may be advantageous to detect conditions indicating that a subsequent power loss would be likely to result in lost data for a particular node. When such a condition is detected, it may then be advantageous to enter a read-only mode for that node to prevent such data loss. For example, in a read-only mode, a node might make a permanent backup copy (unaffected by a subsequent power loss) and then prevent subsequent writes to data that would be affected by a subsequent power loss. Preventing subsequent writes to the jeopardized data ensures that the backup copy is consistent in the event that it is used to restore data following a subsequent power loss. Enforcing the read-only mode for just that node allows the remaining portions of the system (unaffected by a subsequent power loss) to continue as normal and also allows the affected node to participate in read operations. A node-level read-only mode, therefore, allows a system, for example, to take precautionary measures for only the affected portions of the system. Such narrowly tailored responses permit a system, for example, to provide efficient safety precautions, ensuring data integrity while allowing as much functionality as may be prudent in the circumstances. One skilled in the art will appreciate that there are many suitable circumstances in which a portion-specific read-only mode may be implemented for a data storage system.


An example architecture in which embodiments may be implemented will first be described. Then, a problem that arises when a portion of journal data is lost will be described. Next, certain conditions that may negatively affect the durability of a portion of the storage system will be described. Various embodiments of storage systems with identifiable portions will then be described, followed by a process for writing data to a storage system when a portion of the system is operating in a read-only mode. Because entering a read-only mode with unresolved transactions presents a particular problem, this problem and embodiments of a possible solution will then be described. Finally, processes for implementing the embodiments described herein will be described.


II. System Architecture



FIG. 1 illustrates one embodiment of a storage system with a read-only mode for a portion of the system. In the illustrated embodiment, the distributed storage system 100 includes a cluster of four nodes 102 connected via a switched network 104; each node 102 is a storage module with one or more processors and at least one storage device. Although in the illustrated embodiment the distributed storage system 100 is connected via a switched network 104, in other embodiments the distributed storage system may be connected in different ways, including, for example, direct wired connections between the nodes of the storage system. The nodes 102 (storage modules) of the distributed storage system 100 are divided conceptually into top halves 106 and bottom halves 108. Top halves 106 handle requests for the distributed storage system 100. For example, top halves 106 receive requests to deliver the contents of data stored on the distributed storage system 100, and then satisfy those requests by communicating the requested data to the respective requesters.


The bottom halves 108 manage the distribution of data across the distributed storage system 100. For example, the bottom halves 108 manage the storage of data by distributing portions of a file across the nodes 102. Some of the nodes 102 operate exclusively with the functionality of a top half 106. These nodes, for example, handle requests to the file system but do not participate in the storage of data on the distributed storage system 100. The distributed storage system 100 provides a node-level read-only mode. When a node 102 is in a read-only mode, its bottom half 108 participates in answering read requests but does not participate in handling write requests; the remaining nodes 102 handle the write requests. In general, top halves 106 are not affected by the node-level read-only mode. In other embodiments, top halves 106 or other portions of nodes 102 may also be affected.
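The division of labor during a node-level read-only mode can be sketched as a simple allocation rule. This is a minimal Python illustration under stated assumptions: the `Node` fields, the `width` parameter (how many nodes receive a new write), and the first-fit selection are all hypothetical simplifications, not the allocation policy of any particular file system.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    stores_data: bool = True   # False for nodes that run only a top half
    read_only: bool = False    # True while the node is in read-only mode


def write_targets(nodes, width):
    """Pick `width` nodes for a new write, excluding read-only and top-half-only nodes."""
    eligible = [n for n in nodes if n.stores_data and not n.read_only]
    if len(eligible) < width:
        raise RuntimeError("not enough writable nodes for the requested width")
    return eligible[:width]  # first-fit; a real allocator would balance load


def read_targets(nodes, holders):
    """Any node holding the data may serve a read, including read-only nodes."""
    return [n for n in nodes if n.stores_data and n.name in holders]
```

For example, in a four-node cluster where `n2` is read-only and `n4` runs only a top half, `write_targets` would choose `n1` and `n3` for a two-way write, while a read of data stored on `n1` and `n2` can still be served by both.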


In general, the distributed storage system 100 may include a variety of computer systems such as, for example, a computer, a server, a smart storage unit, and so forth. In one embodiment, the computer may be a general purpose computer using one or more microprocessors, such as, for example, an Intel® Pentium® processor, an Intel® Pentium® II processor, an Intel® Pentium® Pro processor, an Intel® Pentium® IV processor, an Intel® Pentium® D processor, an Intel® Core™ processor, an x86 processor, an 8051 processor, a MIPS processor, a Power PC processor, a SPARC processor, an Alpha processor, and so forth. The computer may run a variety of operating systems that perform standard operating system functions such as, for example, opening, reading, writing, and closing a file. It is recognized that other operating systems may be used, such as, for example, Microsoft® Windows® 3.X, Microsoft® Windows® 98, Microsoft® Windows® 2000, Microsoft® Windows® NT, Microsoft® Windows® CE, Microsoft® Windows® ME, Microsoft® Windows® XP, Palm Pilot OS, Apple® MacOS®, Disk Operating System (DOS), UNIX, IRIX, Solaris, SunOS, FreeBSD, Linux®, or IBM® OS/2® operating systems.


III. Losing Journal Data


During the operation of a distributed storage system, conditions arise that may negatively affect the durability of persistent storage. Persistent storage may be any suitable storage medium for storing the contents of data when a main power supply source is unavailable. For example, persistent storage may include non-volatile random access memory (NVRAM), flash memory, hard-disk drives, compact-disc read-only memory (CD-ROM), digital versatile discs (DVDs), optical storage, and magnetic tape drives. Persistent storage itself may depend on power supplies. Thus, the term persistent may be a relative term, in that some storage is persistent relative to a certain condition, such as the loss of a main power supply. For example, NVRAM may be battery-backed random access memory that persists even when a main power supply is unavailable. Persistent storage may also be independent of power supplies in general, such as flash memory, hard-disk drives, CD-ROMs, DVDs, optical storage devices, and tape drives. One skilled in the art will appreciate many suitable media for persistent storage.


When the durability of persistent storage is negatively affected, a storage system may become unreliable. For example, some persistent storage components may store a write journal that keeps track of write transactions in a storage system. FIG. 2 illustrates one embodiment of a problem that arises when a component storing journaled data loses power and the contents of the journal are lost. Data journals keep records of write transactions until the respective data has been completely written to a permanent storage location. For example, many hard-disk drives employ write caches to temporarily store data while the disk drive rotates the magnetic disks and encodes data values. If a hard-disk drive loses power before the contents of its write cache have been written to the disk platters, then the hard disk drive will be in an inconsistent state because the storage system expects that the data has been written, but in reality, the data has been lost. By keeping a data journal, the contents of the hard disk drive can be written again after a power failure. This is because a data journal may be kept in non-volatile memory, which persists even after a power failure. Embodiments of a data journal for accommodating embodiments of a read only mode for a portion of a storage system disclosed herein are disclosed in: U.S. patent application Ser. No. 11/506,597, titled, “SYSTEMS AND METHODS FOR PROVIDING NONLINEAR JOURNALING,” filed Aug. 18, 2006; U.S. patent application Ser. No. 11/507,073, titled, “SYSTEMS AND METHODS FOR PROVIDING NONLINEAR JOURNALING,” filed Aug. 18, 2006; U.S. patent application Ser. No. 11/507,070, titled, “SYSTEMS AND METHODS FOR PROVIDING NONLINEAR JOURNALING,” filed Aug. 18, 2006; and U.S. patent application Ser. No. 11/507,076, titled, “SYSTEMS AND METHODS FOR ALLOWING INCREMENTAL JOURNALING,” filed Aug. 18, 2006; all of which are hereby incorporated by reference herein in their entirety.
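Why a journal in persistent storage lets a drive's contents be written again can be shown with a toy model. In this minimal Python sketch, `Disk`, `Journal`, and `journaled_write` are hypothetical names; the disk's volatile write cache and the NVRAM journal are both plain dictionaries, and "power failure" simply clears the cache. It is an illustration of write-ahead journaling in general, not of the nonlinear journaling applications incorporated above.

```python
class Disk:
    """Toy hard-disk drive whose volatile write cache is lost on power failure."""

    def __init__(self):
        self.platters = {}   # durable contents: block number -> data
        self.cache = {}      # acknowledged but not yet durable

    def write(self, block, data):
        self.cache[block] = data

    def flush(self):
        self.platters.update(self.cache)
        self.cache.clear()

    def power_fail(self):
        self.cache.clear()


class Journal:
    """Stand-in for write records kept in battery-backed NVRAM."""

    def __init__(self):
        self.unflushed = {}

    def log(self, block, data):
        self.unflushed[block] = data

    def replay(self, disk):
        """Rewrite every journaled block, make it durable, then retire the records."""
        for block, data in self.unflushed.items():
            disk.write(block, data)
        disk.flush()
        self.unflushed.clear()


def journaled_write(journal, disk, block, data):
    journal.log(block, data)   # record in persistent storage first
    disk.write(block, data)    # then hand the data to the drive's cache
```

After `journaled_write(journal, disk, 7, b"hello")` followed by `disk.power_fail()`, block 7 is missing from the platters, and `journal.replay(disk)` restores it. The sketch also makes the failure mode of the next paragraph visible: if `journal.unflushed` is lost as well, nothing remains to replay.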


If the contents of a data journal are lost, however, the data storage system may reach an inconsistent state even if the remaining components are not affected. In some embodiments, data journals kept in non-volatile memory are subject to failure if the batteries supplying the non-volatile memory are interrupted. For example, many non-volatile memory components are backed up with power supplies, such as batteries, that are independent from the power source of the storage system, or from a portion of the storage system. Separate batteries, for example, may supply power to the non-volatile memory, allowing the contents to persist even when other components of the data storage system lack power. If these independent batteries are interrupted, however, then the journaled data may be lost when, for example, there is a loss of main power supply and there is no backup power supply.



FIG. 2 illustrates the consequences of losing journaled data. The journals 200 correspond to three separate portions of a storage system: portion A, portion B, and portion C. These portions are referred to, respectively, as NodeA, NodeB, and NodeC, which are three storage modules acting collectively to implement the storage system. Journals 200 include a record of three different transactions, each of which has data blocks spread across the three portions (the nodes) of the storage system. Each transaction block 204 corresponds to one of the three transactions and keeps a record of that transaction's state, such as “prepared,” “aborted,” or “committed.” Data blocks 206 include the data to be written to permanent storage, such as hard-disk drives, and are either flushed or unflushed. Unflushed data blocks 206 persist until they have been flushed. Flushed data blocks may be unlinked from the transaction, as their contents have been recorded in permanent storage. Although a journal does not keep a permanent record of data, losing the contents of a journal may cause the storage system to have an inconsistent state.
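The journal layout described for FIG. 2 can be modeled with two record types. The following Python sketch is a hypothetical data model, assuming that a transaction block owns a list of linked data blocks and that permanent storage is a dictionary keyed by an illustrative `disk_addr`; the actual on-NVRAM format is not specified here.

```python
from dataclasses import dataclass, field
from enum import Enum


class TxState(Enum):
    PREPARED = "prepared"
    ABORTED = "aborted"
    COMMITTED = "committed"


@dataclass
class DataBlock:
    disk_addr: int          # hypothetical location on permanent storage
    payload: bytes
    flushed: bool = False


@dataclass
class TransactionBlock:
    txn_id: str
    state: TxState = TxState.PREPARED
    blocks: list = field(default_factory=list)   # linked DataBlock records

    def flush_block(self, block, disk):
        """Copy one data block to permanent storage and mark it flushed."""
        disk[block.disk_addr] = block.payload
        block.flushed = True

    def unlink_flushed(self):
        """Drop flushed blocks; their contents now live on permanent storage."""
        self.blocks = [b for b in self.blocks if not b.flushed]
```

A committed transaction with one of two blocks flushed, like T3 in JournalB, would retain only the unflushed block after `unlink_flushed()`, so the journal no longer holds a full copy of the transaction's data.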



FIG. 2 illustrates how losing JournalB 200 would affect the contents of the storage system. The transaction blocks 204 corresponding to JournalA 200 and JournalC 200 have the same recorded transaction states for all three transactions. JournalB 200, however, has a different transaction state for two of the three transactions. This may be the result of NodeB having been a coordinator and/or initiator for the respective transactions. Transaction protocols involving coordinators and/or initiators are described in greater detail below with reference to FIGS. 7A, 7B, and 8.


JournalB 200 reports that transaction T2 has been aborted. In some embodiments, an aborted transaction may be marked aborted in its respective transaction block 204. Furthermore, in some embodiments, an aborted transaction may be unlinked from a transaction list. In the illustrated embodiment, the aborted transaction is unlinked from the list and marked aborted for clarity, though one skilled in the art will appreciate that an unlinked transaction need not be marked as aborted. If the information that transaction T2 aborted is lost during, for example, a power failure affecting JournalB, NodeA and NodeC will be unaware that T2 aborted. Furthermore, when NodeB comes back online, NodeA and NodeC will not be able to communicate to NodeB that transaction T2 aborted. With respect to transaction T3, only JournalB 200 records that the transaction committed. One of its data blocks 206 has also been flushed to disk, meaning that the permanent contents of the disk record the data corresponding to the transaction. If the journaled data in JournalB 200 is lost, NodeA and NodeC will not know that transaction T3 committed. Thus, the data storage system will be in an inconsistent state.
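The loss described above can be checked mechanically: a transaction's outcome is recoverable only if some surviving journal recorded it in a resolved state. This Python sketch uses a hypothetical `known_outcome` helper and plain string states; the journal contents for T2 and T3 follow the description of FIG. 2, with the added assumption that NodeA and NodeC still show both transactions as "prepared".

```python
RESOLVED = {"aborted", "committed"}


def known_outcome(journals, txn):
    """Return the resolved state recorded for txn by any available journal, else None."""
    for journal in journals.values():
        state = journal.get(txn)
        if state in RESOLVED:
            return state
    return None


# States for T2 and T3 as described for FIG. 2; NodeA and NodeC are assumed
# to still show both transactions as prepared.
journals = {
    "NodeA": {"T2": "prepared", "T3": "prepared"},
    "NodeB": {"T2": "aborted", "T3": "committed"},
    "NodeC": {"T2": "prepared", "T3": "prepared"},
}
surviving = {node: j for node, j in journals.items() if node != "NodeB"}

for txn in ("T2", "T3"):
    print(txn, known_outcome(journals, txn), known_outcome(surviving, txn))
# T2 aborted None
# T3 committed None
```

With all three journals, both outcomes are known; once JournalB is removed, neither can be determined, even though part of T3's data is already on disk.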


Because losing journaled data may affect the reliability and/or consistency of a data storage system, it may be advantageous to detect conditions that may negatively affect the persistent storage components that store journaled data. Furthermore, once such a condition is detected, it may be advantageous to take actions that would prevent the permanent loss of journaled data in the event that the persistent storage components lose their data. For example, it may be advantageous for a node in a distributed storage system to enter a read-only mode when a condition is detected that may lead to the failure of that node's persistent storage for journaled data. A read-only mode may provide a consistent state to which the journal may be restored after a failure affecting the persistent storage storing the journal. In some embodiments, a read-only module may cause a portion of a storage system to enter a read-only mode.


For example, entering the read-only mode may include waiting until transactions in the node's journal sufficiently resolve, marking the node's journal as unwritable, and backing up the contents of the node's journal. In read-only mode, subsequent write transactions are prevented from modifying the node's journal. Processing write transactions in a data storage system implementing a portion-specific read-only mode is discussed in greater detail below with respect to FIGS. 5A and 5B. If that node's persistent storage for the journal then fails, the backup copy of the journal can be used because the backup is consistent with the lost journal data, which was not modified during the read-only mode.
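The last two steps of entering the read-only mode, marking the journal unwritable and then backing it up, can be sketched as follows. `NodeJournal`, `JournalUnwritable`, and the in-memory `backup` attribute are hypothetical stand-ins (a real backup would go to durable storage such as disk), and the sketch omits the preceding wait for transactions to resolve.

```python
import copy


class JournalUnwritable(Exception):
    """Raised when a write targets a journal that is in read-only mode."""


class NodeJournal:
    def __init__(self, entries):
        self.entries = entries     # stand-in for the journal in NVRAM
        self.writable = True
        self.backup = None         # stand-in for a copy on durable storage

    def record(self, txn, state):
        if not self.writable:
            raise JournalUnwritable(f"cannot record {txn}: journal is read-only")
        self.entries[txn] = state

    def enter_read_only(self):
        self.writable = False                        # mark the journal unwritable
        self.backup = copy.deepcopy(self.entries)    # then back up its contents

    def restore_after_failure(self):
        # Safe because the journal could not change after the backup was taken.
        self.entries = copy.deepcopy(self.backup)
```

Marking the journal unwritable before copying it means no write can slip in between the copy and the start of enforcement, so the backup and the journal stay identical until the read-only mode ends.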


As discussed above, entering a read-only mode may include waiting until transactions in the node's journal sufficiently resolve. Determining when a transaction has sufficiently resolved may depend on the transaction protocol used to resolve the transactions. Transaction protocols suitable for implementing the embodiments described herein are disclosed in: U.S. patent application Ser. No. 11/262,306, titled, “NON-BLOCKING COMMIT PROTOCOL SYSTEMS AND METHODS,” filed Oct. 28, 2005, which is incorporated by reference herein in its entirety. In some embodiments, transactions have sufficiently resolved when they are either aborted or committed. In some embodiments, if a transaction has not resolved into an aborted or committed state—that is, the transaction is in, for example, a prepared state—it may be disadvantageous to prevent writes to the journal during the read-only mode. For example, if a transaction is in a prepared state when a journal is marked unwritable, the transaction may not resolve during the read-only mode, which may prevent the other nodes from processing the transaction. Thus, it may be advantageous to allow transactions to resolve before marking a node's journal as unwritable and saving a backup copy.
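The entry sequence described above can be illustrated with a short Python sketch. It is a single-process model under stated assumptions, not the described implementation: the `Journal` class, the `TxState` values, and `enter_read_only` are hypothetical names, a transaction counts as sufficiently resolved once it is aborted or committed, and a timeout is added so the wait cannot hang indefinitely.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class TxState(Enum):
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class Journal:
    """Hypothetical in-memory stand-in for one node's journal."""
    transactions: dict = field(default_factory=dict)  # transaction id -> TxState
    writable: bool = True

    def write(self, txn_id, state):
        # Once the journal is marked unwritable, no transaction may modify it.
        if not self.writable:
            raise PermissionError("journal is in read-only mode")
        self.transactions[txn_id] = state

    def unresolved(self):
        # "Sufficiently resolved" is taken here to mean aborted or committed.
        return [t for t, s in self.transactions.items()
                if s not in (TxState.COMMITTED, TxState.ABORTED)]


def enter_read_only(journal, poll_interval=0.01, timeout=1.0):
    """Wait for resolution, mark the journal unwritable, return a backup copy."""
    deadline = time.monotonic() + timeout
    # Step 1: wait until every journaled transaction has resolved; in a real
    # system, other threads or nodes would be resolving them meanwhile.
    while journal.unresolved():
        if time.monotonic() > deadline:
            raise TimeoutError(f"still unresolved: {journal.unresolved()}")
        time.sleep(poll_interval)
    # Step 2: mark the journal unwritable.
    journal.writable = False
    # Step 3: back up the journal. Because writes are now blocked, the backup
    # stays consistent with the journal for the whole read-only period.
    return dict(journal.transactions)
```

Ordering is the point of the sketch: the backup is taken only after writes are blocked, so a later loss of the journal can be repaired from a copy that nothing has since modified.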


Enforcing a node-level read-only mode is advantageous because it allows a portion of the system to respond to conditions that may only affect journal data for a portion of the system. During the read-only mode, the node may continue to participate in read transactions and the remainder of the system can participate in both read and write transactions. One skilled in the art will appreciate that a portion-specific, read-only mode may be advantageous for other circumstances in addition to possible loss of journaled data.


IV. Detecting Conditions


As described above, it may be advantageous to enter a portion-specific read-only mode when certain conditions are detected that affect a particular portion of a storage system. In some embodiments, a monitoring module may detect certain conditions that trigger a read-only mode.


In some embodiments, it may be advantageous to respond to conditions that may affect the durability of power supplies providing back-up power to non-volatile random access memory (“NVRAM”). One skilled in the art will appreciate that there are various suitable ways to provide a backup power supply, including batteries. Furthermore, it will be appreciated by one skilled in the art that there are many ways in which the durability of NVRAM may be negatively affected. For example, the power supply of NVRAM may be hot-swappable, meaning that, for example, batteries can be replaced without interrupting the use of the memory. During a hot-swap, the non-volatile memory effectively becomes volatile memory, and the contents of the memory are temporarily unprotected. Opening the chassis of a storage node is another condition that creates the possibility for NVRAM to be negatively affected. For example, while the chassis is opened in order to service cooling fans, there is a possibility that the backup power supply for the NVRAM may become disconnected. Once disconnected, the non-volatile memory effectively becomes volatile memory, and the contents of the memory are temporarily unprotected should power to the node cease. Weak or failing power supplies, such as weak or failing batteries, are another condition that may negatively affect the durability of NVRAM. If the backup power supply for NVRAM becomes too weak or ceases to function altogether, then the non-volatile memory effectively becomes volatile memory, and its contents are unprotected: a power failure would cause the loss of data.


Although the conditions described above are directed to detecting conditions that negatively affect backup power supplies, there are also other conditions that may negatively affect the durability of a portion of a storage system, as well as other conditions that might trigger entry into a read-only mode. In general, embodiments described herein may include any relevant hardware failure that could negatively affect the durability of a portion of a storage system, as well as any other relevant reason for entering a read-only mode, such as a decision by a user to enter a read-only mode. For example, in some embodiments, an NVRAM component may be backed up with flash memory. Thus, some embodiments may monitor for conditions that negatively affect the back-up flash memory. As another example, non-volatile memory components may have support for auto-correcting single-bit errors. If the number of single-bit errors being corrected is high enough, it may indicate a higher likelihood of device failure. Thus, some embodiments may monitor for high error rates. Furthermore, although the conditions described above are directed to detecting conditions that negatively affect backup power supplies, some embodiments may monitor conditions related to the health of the main power supplies. For example, in some embodiments, storage system devices may have redundant power supplies. If one of the power supplies fails, then there is an increased likelihood of relying on the backup power supplies supporting the non-volatile memory. Thus, some embodiments may monitor for conditions related to the health of the main power supplies.



FIG. 3 illustrates one embodiment of a set of bits used to reflect the status of conditions that may negatively affect the durability of portions of the data storage system. For example, in some embodiments, each of three of the conditions discussed above (a removed battery, an open chassis, and a failed battery) may be assigned a single bit that reflects whether the condition is present. In some embodiments, there may be a bit corresponding to each one of the monitored conditions. For example, in a cluster of four nodes, each node may have a corresponding bit to reflect whether the chassis of the respective node has been opened, or whether the batteries corresponding to the non-volatile memory in the respective node have been removed or are failing. In some cases, more than one condition may be present, or no conditions may be present. In some embodiments, users may be able to trigger a read-only mode by selecting the mode from a graphical user interface (GUI) as well as from a hardware interface, such as a button on a device. In some embodiments, it is also possible to prevent portions of the data storage system from entering a read-only mode. This “disallow” feature may be imposed based on system defaults, may be configurable by the administrator, or may be selected by an individual user, among other possible implementations.



FIG. 3 illustrates several examples of bit strings reflecting conditions that may determine whether a portion of the file system is placed in a read-only mode. The first example is the bit string “101000,” which signifies that the “Battery Failing” condition and the “Open Chassis” condition are both satisfied. In the illustrated embodiment, the presence of these conditions triggers the read-only mode for the corresponding portion of the storage system. In the second example, the bit string is “001001.” The “Open Chassis” condition is satisfied, but the “Disallow” condition is also satisfied, so the corresponding portion of the storage system remains in a read-write mode. In the third example, the bit string is “000000.” Because none of the conditions are satisfied, in the illustrated embodiment, the corresponding portion of the data storage system is in a read-write mode. In the fourth example, the bit string is “000010.” The “User (software)” condition is satisfied, meaning that a user has chosen, through a software interface, to trigger the read-only mode for the corresponding portion.


One skilled in the art will appreciate that there are various suitable ways to use a bit string in connection with transitioning a portion of a storage system to a read-only mode. For example, a bit string may reflect the current status of conditions. Various processes may evaluate the bit string in order to execute appropriate operations in response to the presence or absence of particular conditions. Additionally and/or alternatively, various processes may set the bits in the string after detecting the condition and initiating operations in response to the presence or absence of particular conditions. Bitwise operations, such as a bitwise OR, may be used, for example, to determine whether to execute certain operations. One skilled in the art will appreciate that other suitable operations on the bit string may be used, and that there are other suitable uses for the bit string than those enumerated herein.
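As an illustration of evaluating the FIG. 3 bit strings, the Python sketch below reads the leftmost character as the most significant bit. Positions 1, 3, 5, and 6 follow the examples above (Battery Failing, Open Chassis, User (software), and Disallow); the names given to positions 2 and 4 are hypothetical, since the examples do not identify them. A bitwise OR builds one mask of trigger conditions, and a bitwise AND tests a bit string against it.

```python
from enum import IntFlag


class Condition(IntFlag):
    # Leftmost character of a FIG. 3 bit string is the most significant bit.
    BATTERY_FAILING = 0b100000
    BATTERY_REMOVED = 0b010000   # hypothetical: position 2 is not named in FIG. 3
    OPEN_CHASSIS = 0b001000
    USER_HARDWARE = 0b000100     # hypothetical: position 4 is not named in FIG. 3
    USER_SOFTWARE = 0b000010
    DISALLOW = 0b000001


# Bitwise OR combines every condition that can trigger the read-only mode.
TRIGGERS = (Condition.BATTERY_FAILING | Condition.BATTERY_REMOVED
            | Condition.OPEN_CHASSIS | Condition.USER_HARDWARE
            | Condition.USER_SOFTWARE)


def parse(bits: str) -> Condition:
    """Convert a six-character bit string such as "101000" into flags."""
    return Condition(int(bits, 2))


def is_read_only(conditions: Condition) -> bool:
    """Read-only if any trigger bit is set and the Disallow bit is clear."""
    if conditions & Condition.DISALLOW:
        return False
    return bool(conditions & TRIGGERS)
```

All four FIG. 3 examples evaluate as described above: `"101000"` and `"000010"` yield read-only, while `"001001"` and `"000000"` yield read-write.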


V. Hardware Configurations



FIGS. 4A, 4B, 4C, and 4D illustrate different hardware configurations of various embodiments of a storage system that implements the read-only mode for a portion of the storage system.



FIG. 4A illustrates a data storage system 400 that includes three data storage nodes 402 connected via a switched network 404. There is a non-volatile memory component 406 corresponding to each of the data storage nodes 402. The data storage system 400 is configured to implement a node-level read-only mode. In the illustrated embodiment, node B is in a read-only mode. The contents of non-volatile memory 406 corresponding to node B, therefore, may not be modified.



FIG. 4B illustrates a data storage system 400 that comprises a single unit. The data storage system 400 includes three non-volatile memory components 406 that correspond to three different portions of data storage system 400. Data storage system 400 is capable of implementing a read-only mode for each of the three different portions. In the illustrated embodiment, portion B is in a read-only mode. Thus, the contents of non-volatile memory component 406 corresponding to portion B may not be modified.



FIG. 4C illustrates a data storage system 400 comprising two data storage nodes 402 connected via a switched network 404. One of the data storage nodes 402 includes two non-volatile memory components 406, and the other data storage node 402 comprises only one non-volatile memory component 406. In the illustrated embodiment, each of the non-volatile memory components 406 corresponds to a different portion of the data storage system 400. Thus, two portions of the data storage system correspond to the first data storage node 402, and one of the portions corresponds to the second data storage node 402. In the illustrated embodiment, the B portion of the data storage system 400 is in a read-only mode. Thus, the non-volatile memory component 406 corresponding to portion B of the data storage system 400 may not be modified.



FIG. 4D illustrates a data storage system 400 that includes three data storage nodes 402 connected directly to one another in a fully connected topology. Each data storage node 402 includes a non-volatile memory component 406. The data storage system 400 implements a node-level read-only mode. In the illustrated embodiment, node B is in a read-only mode. Thus, the contents of non-volatile memory component 406 corresponding to node B may not be modified.


VI. Processing Write Requests



FIG. 5A illustrates one embodiment of processing write requests in a data storage system when a portion of the data storage system is in a read-only mode. Embodiments of the nodes 102 (storage modules) shown in FIG. 1 may be configured to store and/or implement embodiments of write process 500. Write process 500 may be included as part of an allocation module that determines where to write and/or rewrite data to a data storage system.


In state 502, write process 500 determines whether the data block to be written is stored on a portion of the data storage system that is currently in a read-only mode. For example, a write transaction may involve three data blocks distributed across three nodes: NodeA, NodeC, and NodeF. If NodeC is in a read-only mode, then the write process 500 determines that the data block corresponding to NodeC is on a read-only node. Because data blocks on a node in a read-only mode may not be written, the write process 500 must find an alternative location at which to write that data block. If the data block does not correspond to a read-only portion, then in state 504, the write process 500 writes the data block in its defined location. For example, the writes of the data blocks on NodeA and NodeF may proceed because the respective nodes are in a read-write mode. Thus, the data blocks corresponding to NodeA and NodeF are written to their respective locations on NodeA and NodeF.


With respect to nodes in a read-only mode, the write process 500, in state 506, writes the data block in a newly allocated location. This newly allocated location may be in an available location on another node that is not in a read-only mode. For example, write process 500 could write the data block corresponding to NodeC on one of the other nodes, such as on NodeA or on NodeF, or also on NodeD or on NodeE, and so forth, so long as these nodes are in a read-write mode. Because the data block previously corresponding to NodeC has been written to a newly allocated location, the metadata corresponding to the data block may be updated to reflect the new location. In other words, prior to the write request, the data block corresponding to NodeC was stored at a location on NodeC, and this location may have been recorded in various places in the data storage system to track the location of the data block. This metadata may be updated once the data block has been redirected to a new location.
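A minimal Python sketch of write process 500 follows, assuming a file's metadata is a simple map from block names to (node, block) locations. `Location`, `make_allocator`, and `write_blocks` are hypothetical names; the allocator just hands out the first free block on a read-write node, and the actual data transfer is not modeled. The usage at the end replays the File X example of FIG. 5B with NodeC in read-only mode (the free block on NodeC is an invented extra, included to show that it is skipped).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str
    block: int


def make_allocator(free_locations):
    """Hypothetical allocator: hand out the first free block on a read-write node."""
    pool = list(free_locations)

    def allocate(exclude):
        for i, loc in enumerate(pool):
            if loc.node not in exclude:
                return pool.pop(i)
        raise RuntimeError("no free block on a read-write node")

    return allocate


def write_blocks(file_map, names, read_only_nodes, allocate):
    """Sketch of write process 500 for the blocks in `names`.

    file_map maps block names to Locations and stands in for the file's
    metadata; it is updated in place when a block has to move.
    Returns the Location each block was written to.
    """
    written = {}
    for name in names:
        loc = file_map[name]
        if loc.node not in read_only_nodes:
            # State 504: the block's node is read-write, so write in place.
            written[name] = loc
        else:
            # State 506: write to a newly allocated location on a read-write
            # node, then update the metadata to record the new location.
            new_loc = allocate(exclude=read_only_nodes)
            file_map[name] = new_loc
            written[name] = new_loc
    return written


# File X from FIG. 5B, with NodeC in read-only mode.
file_x = {
    "Data Block 1": Location("NodeA", 3),
    "Data Block 2": Location("NodeC", 12),
    "Data Block 3": Location("NodeF", 16),
}
allocate = make_allocator([Location("NodeC", 40), Location("NodeE", 17)])
written = write_blocks(file_x, list(file_x), {"NodeC"}, allocate)
```

After the call, Data Block 2 is recorded at NodeE, block location 17, while Data Blocks 1 and 3 keep their original locations.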


As discussed above, with reference to FIG. 5A, when a data block corresponding to a read-only mode is reallocated to an available location on another node, the metadata corresponding to the respective data block is updated. FIG. 5B illustrates one example embodiment of updating metadata to reflect the new location for a data block. File X 550 includes three data blocks: Data Block 1, Data Block 2, and Data Block 3. These three data blocks are spread across the distributed storage system 552, which includes data storage nodes 554 connected via switch 556. Specifically, Data Block 1 is located on NodeA 554 at block location 3. Data Block 2 is located on NodeC 554 at block location 12. Data Block 3 is located on NodeF 554 at block location 16. At some point after Data Block 2 has been written to NodeC 554 at block location 12, NodeC enters a read-only mode. Accordingly, Data Block 2 can no longer be modified in place. If a write request corresponding to File X 550 attempts to modify Data Block 2, data storage system 552 allocates a new location for Data Block 2 and writes the updated value to the newly allocated location. The metadata for File X 550 is then updated to indicate the newly allocated location of Data Block 2. Thus, Data Block 2 is now located on NodeE 554 at block location 17, which is reflected in the metadata for File X 550. The location of Data Block 2 can be modified for File X 550, even though the old location of Data Block 2 was on a node in a read-only mode, because the metadata for File X 550 may be stored on different nodes than those on which the data blocks are stored. For a more detailed explanation of how the metadata and data of a file may be distributed among different nodes, see U.S. patent application Ser. No. 10/007,003, titled, “SYSTEMS AND METHODS FOR PROVIDING A DISTRIBUTED FILE SYSTEM UTILIZING METADATA TO TRACK INFORMATION ABOUT DATA STORED THROUGHOUT THE SYSTEM,” filed Nov. 9, 2001, which claims priority to Application No. 60/309,803 filed Aug. 3, 2001, and which was incorporated above by reference in its entirety.


VII. Mounting Read-Only


As described above with respect to FIG. 3, a portion of a storage system may enter a read-only mode by waiting until relevant transactions have sufficiently resolved, marking the portion unwritable, and backing up the portion. In some circumstances, it may be disadvantageous and/or not possible to wait for transactions to resolve as part of entering a read-only mode. For example, in some circumstances, it may be advantageous to enter a read-only mode when a file system mounts without waiting for the relevant transactions to resolve further. Thus, in some embodiments, a portion-specific read-only mode may be entered by marking the portion unwritable and backing up the portion, without first waiting for the relevant transactions to resolve. FIGS. 6A and 6B illustrate one embodiment of a problem that arises when a portion enters a read-only mode with unresolved transactions, such as when the file system is mounted. In some embodiments, a node in a read-only mode can defer operations associated with resolving the transactions until after leaving the read-only mode. FIGS. 6C and 6D illustrate embodiments of deferring transactions until after leaving a read-only mode.



FIG. 6A illustrates three data journals corresponding to three portions of a data storage system (three storage modules): JournalA corresponding to NodeA, JournalB corresponding to NodeB, and JournalC corresponding to NodeC. There are two global transactions recorded in the respective journals. Transaction blocks 604 indicate the transaction state for each respective transaction. Both transaction T1 and transaction T2 are in the “prepared” state. All of the data blocks 606 remain in the “unflushed” state because the transactions have not yet committed. FIG. 6A illustrates an event, such as a power failure, in which NodeB becomes temporarily unavailable.



FIG. 6B illustrates that NodeB returns to the cluster and mounts in a read-only mode. By this time, NodeA and NodeC of the data storage system have further resolved transaction T1 and transaction T2. Transaction T1 has been aborted and transaction T2 has committed. The respective transaction block 604 corresponding to transaction T1 has been unlinked from the respective journal block 602, and the respective data blocks 606 corresponding to transaction T2 have been flushed to permanent storage. NodeB can communicate with NodeA and NodeC to discover that transactions T1 and T2 have aborted and committed, respectively. Because NodeB is in a read-only mode, however, JournalB cannot be modified. Thus, the modifications to JournalB are deferred until NodeB exits the read-only mode. Furthermore, NodeB cannot communicate to NodeA and NodeC its acknowledgment that transaction T1 and transaction T2 have aborted and committed, respectively. If NodeB were to communicate this information to NodeA and NodeC, then JournalA and JournalC would remove the transaction state information related to transactions T1 and T2. Then, if NodeB were to lose the information that transactions T1 and T2 had aborted and committed, respectively, NodeB would not be able to regain this information because JournalA and JournalC would no longer keep a record that could be communicated to NodeB. In some embodiments, journal writes and messages to other portions of the data storage system are deferred until the relevant portion of the storage system exits the read-only mode.



FIG. 6C illustrates one embodiment of memory data structures that may be used to defer certain transaction processes until an affected portion exits a read-only mode. MemoryB 650 corresponds to NodeB of the data storage system. MemoryB 650 includes a deferred write queue (DWQ) 656 and a list of transaction states, such as transaction state 654 for transaction T2. The DWQ 656 includes deferred write commands 658, including a deferred write command 658 for unlinking the transaction block 604 corresponding to transaction T1. The DWQ 656 may also include other deferred write commands 658 (not illustrated). The transaction state block 654 is marked commit-deferred. This information allows the journal system to commit transaction T2 when NodeB exits the read-only mode.



FIG. 6D illustrates the contents of MemoryB 650 and JournalB 600 after processing the deferred transactions related to transactions T1 and T2. Once NodeB exits the read-only mode and enters the read-write mode, transactions T1 and T2 may be aborted and committed, respectively. The DWQ 656 is emptied by executing each of the deferred write commands 658. Thus, transaction T1's transaction block 604 is unlinked from the journal block 602 corresponding to JournalB 600. Furthermore, the transaction state blocks 654 are traversed, and any transactions marked “commit-deferred” are committed. Thus, transaction block 604 corresponding to transaction T2 is marked “committed,” and the respective data block 606 is subsequently flushed to disk. After the respective data block 606 is flushed and after NodeB has received commit confirmation from the other participants of transaction T2, T2's transaction block 604 can be unlinked from the journal block 602 corresponding to JournalB 600. In the illustrated embodiments, the DWQ 656 is used for deferred operations associated with an aborted transaction, and the list of transaction state blocks 654 is used for deferred operations associated with a committed transaction. In other embodiments, these data structures may be used in other ways to defer certain operations. Additionally and/or alternatively, other data structures may be used in order to defer certain operations.
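The DWQ and commit-deferred bookkeeping of FIGS. 6C and 6D might look like the following Python sketch. It is an assumption-laden model, not the described system: `NodeJournal` and its methods are hypothetical, the journal is a dictionary, and flushing data blocks and waiting for commit confirmations before unlinking T2 are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class NodeJournal:
    """Hypothetical model of JournalB together with the MemoryB structures."""
    blocks: dict = field(default_factory=dict)      # journal: txn -> recorded state
    read_only: bool = False
    dwq: deque = field(default_factory=deque)       # deferred write queue (DWQ)
    txn_states: dict = field(default_factory=dict)  # in-memory transaction states

    def unlink(self, txn):
        """Remove an aborted transaction's block, or queue the removal."""
        if self.read_only:
            self.dwq.append(("unlink", txn))
        else:
            self.blocks.pop(txn, None)

    def commit(self, txn):
        """Mark a transaction committed, or remember that the commit is deferred."""
        if self.read_only:
            self.txn_states[txn] = "commit-deferred"
        else:
            self.blocks[txn] = "committed"
            self.txn_states[txn] = "committed"

    def exit_read_only(self):
        self.read_only = False
        # Empty the DWQ by executing each deferred write in arrival order.
        while self.dwq:
            op, txn = self.dwq.popleft()
            if op == "unlink":
                self.unlink(txn)
        # Traverse the transaction states and commit anything marked deferred.
        for txn, state in list(self.txn_states.items()):
            if state == "commit-deferred":
                self.commit(txn)
```

Replaying the FIG. 6B scenario, unlinking T1 and committing T2 while read-only leave the journal unchanged; after `exit_read_only()`, T1's block is gone and T2 is recorded as committed.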



FIGS. 7A and 7B illustrate embodiments of protocols for mounting a portion of a data storage system after a temporary power failure. FIG. 7A illustrates one embodiment of a protocol for mounting in a read-write mode. Embodiments of protocols for managing interactions between nodes that process global transactions are disclosed in: U.S. patent application Ser. No. 11/262,306, titled, “NON-BLOCKING COMMIT PROTOCOL SYSTEMS AND METHODS,” filed Oct. 28, 2005, which was incorporated above by reference in its entirety. FIG. 7B illustrates one embodiment of a protocol for mounting in a read-only mode. In some embodiments, the protocols in the above-mentioned application may be adapted to accommodate protocols for mounting in a read-only mode.


In FIGS. 7A and 7B, a storage system 700 is illustrated. Two portions of the data storage system 700 are illustrated communicating with one another. These portions are referred to as nodes 702, which are storage modules with processors and at least one storage device. NodeC 702 has a special role as the initiator, coordinator, and participant of two transactions being processed by data storage system 700. NodeB 702 is a participant of the two transactions. The roles of the initiator, coordinator, and participants are described in greater detail in the above-mentioned application incorporated herein in its entirety. While processing the two transactions, NodeB experiences a power failure that temporarily interrupts its ability to process the transactions. FIGS. 7A and 7B illustrate embodiments of mounting a file system after a temporary power failure in read-write and read-only modes, respectively. Although recovering from a temporary power failure is one circumstance in which a portion of a storage system might remount a file system, there are other possible circumstances. For example, a storage node might be disconnected from a cluster of nodes and might then be unmounted from the file system. One skilled in the art will appreciate that there are other suitable scenarios in which a storage node may remount a file system following certain circumstances.



FIG. 7A illustrates one embodiment of a protocol to mount NodeB 702 in a read-write mode after experiencing a temporary power failure. In state 704, NodeC 702 transmits “Prepare” messages to NodeB 702 for transactions T1 and T2. In state 706, NodeB 702 prepares transactions T1 and T2. This preparation may include receiving data blocks from NodeC 702 corresponding to transactions T1 and T2 that are to be written in the permanent storage of NodeB 702. In state 708, NodeB 702 communicates to NodeC 702 that the transactions T1 and T2 have been prepared. In state 710, NodeB 702 experiences a temporary power failure. When NodeC 702 communicates, in state 712, to NodeB 702 that transaction T1 should abort and that transaction T2 should commit, NodeB 702 does not receive the messages. In state 714, NodeB 702 mounts in a read-write mode.


In state 716, NodeB 702 replays the contents of its journal. In other words, NodeB 702 traverses the transactional information recorded in the journal and begins to process the respective transactions. While in the process of replaying the journal, NodeB 702 recognizes that transactions T1 and T2 have been prepared. NodeB 702 then communicates to NodeC 702 that the transactions have been prepared. In state 718, these “Prepared” messages are communicated to NodeC 702. In state 720, NodeC 702 instructs NodeB 702 to abort transaction T1 and to commit transaction T2. In some embodiments, the message communicated may request that the recipient return a communication once it has finished processing the respective command. For example, NodeC 702 sends a “Committed′” message to NodeB 702, which asks NodeB 702 to return a “Committed” message once transaction T2 has been committed. In state 722, NodeB 702 aborts transaction T1. In state 724, NodeB 702 commits transaction T2. In state 726, NodeB 702 communicates to NodeC 702 that transaction T2 has been committed.



FIG. 7B illustrates one embodiment of a protocol for mounting in a read-only mode after a temporary power failure. States 754, 756, 758, 760, and 762 correspond to states 704, 706, 708, 710, and 712, described above with respect to FIG. 7A. In state 764, NodeB 702 mounts in read-only mode. States 766, 768, and 770 correspond to states 716, 718, and 720, described above with respect to FIG. 7A. When NodeB 702 receives the “Abort” message for transaction T1 and the “Committed′” message for transaction T2, NodeB 702 cannot process these commands as it would in read-write mode, because the contents of the journal may not be modified during the read-only mode. In state 772, NodeB 702 defers the write commands associated with the “Abort” message for transaction T1. Deferred writes are described in greater detail above with reference to FIGS. 6C and 6D. In state 774, NodeB 702 defers the operations associated with the “Committed′” message for transaction T2. The operations for deferring a “Commit” message are also described in greater detail above with reference to FIGS. 6C and 6D.


In state 776, NodeB 702 transitions to a read-write mode. In state 778, NodeB 702 empties the DWQ; the operations associated with emptying the DWQ are described in greater detail above with reference to FIGS. 6C and 6D. In state 780, NodeB 702 marks transaction T2 as committed in the journal, as also described in greater detail above with reference to FIGS. 6C and 6D. In state 782, NodeB 702 sends a “Committed” message to NodeC 702.
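The difference between FIGS. 7A and 7B can be reduced to when NodeB writes its journal and when it replies to NodeC. The Python sketch below is hypothetical (`ParticipantNode` and its message strings are illustrative names) and collapses the DWQ and the commit-deferred list into a single list of held-back decisions.

```python
class ParticipantNode:
    """Hypothetical model of NodeB handling NodeC's decisions after remounting."""

    def __init__(self, read_only):
        self.read_only = read_only
        self.journal = []    # journal operations actually performed, in order
        self.outbox = []     # messages sent back to the coordinator
        self.deferred = []   # decisions held back while read-only

    def receive(self, txn, msg):
        if self.read_only:
            # States 772/774: no journal write and no reply while read-only.
            self.deferred.append((txn, msg))
        else:
            # States 722-726: act on the decision immediately.
            self._apply(txn, msg)

    def exit_read_only(self):
        # State 776, then states 778-782: replay deferred work in arrival order.
        self.read_only = False
        pending, self.deferred = self.deferred, []
        for txn, msg in pending:
            self._apply(txn, msg)

    def _apply(self, txn, msg):
        if msg == "abort":
            self.journal.append(("abort", txn))
        elif msg == "committed'":
            self.journal.append(("commit", txn))
            # committed' asks the recipient to reply once the commit is done.
            self.outbox.append(("committed", txn))
        else:
            raise ValueError(f"unexpected message {msg!r}")
```

In read-write mode the “Committed” reply goes out as soon as T2 commits; in read-only mode the journal and the outbox stay empty until `exit_read_only()` is called, which plays the role of state 776.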



FIG. 8 illustrates one embodiment of a state diagram for a protocol that includes a read-only mode for a portion of a storage system. The states and transitions correspond to embodiments of a read-only mode described herein. A related state diagram was disclosed in: U.S. patent application Ser. No. 11/262,306, titled, “NON-BLOCKING COMMIT PROTOCOL SYSTEMS AND METHODS,” filed Oct. 28, 2005, which was incorporated above by reference in its entirety. The state diagram in FIG. 8 introduces a new state, CR, which corresponds to a commit deferred state in read-only mode. The following illustrates one embodiment of exemplary pseudocode for a protocol that includes a read-only mode for a portion of a storage system. The pseudocode corresponds to embodiments of a read-only mode described herein.

```
function abort(T):
    log abort
    send aborted to T ∪ {i, c}
    set state to U

function commit_r(C, P, S):
    send committed(Ø) to S ∪ {i}
    set state to (CR, C, P)

function commit_i(C, P, S):
    log commit
    send committed({p}) to S ∪ {i}
    set state to (Ci, C, P)

function commit_p(C, P):
    send committed’({p}) to P \ C
    set state to (Cp, C, P)

in state U:
    on committed’(X) from p:
        send committed({p}) to p
    on prepared from p:
        send aborted to p
    on start from i:
        log start
        set state to (I, Ø)
    on commit from c:
    on prepare from i:
    on aborted from c; i; p:

in state (I, T):
    on disconnect from i:
        abort(T)
    on local failure:
        abort(T)
    on aborted from i:
        abort(T)
    on prepared from p:
        set state to (I, T ∪ {p})
    on prepare(P, S) from i:
        log prepare(P, S)
        send prepared to c
        set state to (Pc, T, P, S)

in state (Pc, T, P, S):
    on disconnect from c:
        send prepared to P
        set state to (Pp, T, P, S)
    on aborted from c:
        abort(T)
    on commit from c:
        if RW:
            commit_i(Ø, P, S)
        else:
            commit_r(Ø, P, S)
    on committed’(X) from p ∈ P:
        if RW:
            commit_i(X, P, S)
            send committed({p}) to p
        else:
            commit_r(X, P, S)
    on prepared from p ∈ P:
        set state to (Pc, T ∪ {p}, P, S)

in state (Pp, T, P, S):
    if T = P \ S:
        abort(Ø)
    on connect to p ∈ P:
        send prepared to p
    on aborted from p ∈ P:
        abort(Ø)
    on commit from c:
    on committed’(X) from p ∈ P:
        if RW:
            log commit
            commit_p(X, P)
            send committed({p}) to {i, p}
        else:
            commit_r(X, P, Ø)
    on prepared from p ∈ P \ S:
        set state to (Pp, T ∪ {p}, P, S)

in state (CR, C, P):
    on RW:
        log commit
        send committed({p}) to C
        commit_p(C, P)
    on committed’(X) from p ∈ P:
        set state to (CR, C ∪ X, P)

in state (Ci, C, P):
    on committed(X) from i:
        commit_p(C ∪ X, P)
    on disconnect from i:
        commit_p(C, P)
    on commit from c:
    on committed’(X) from p ∈ P:
        set state to (Ci, C ∪ X, P)
        send committed({p}) to p
    on prepared from p ∈ P:

in state (Cp, C, P):
    if C = P:
        log done
        set state to U
    on connect to p ∈ P \ C:
        send committed’({p}) to p
    on commit from c:
    on committed(X) from p ∈ P:
        set state to (Cp, C ∪ X, P)
    on committed’(X) from p ∈ P:
        set state to (Cp, C ∪ X, P)
        send committed({p}) to p
    on prepared from p ∈ P:
```
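
To make the read-only path of the pseudocode concrete, here is a Python sketch of a participant p that receives commit from c while in state (Pc, T, P, S). It models only the Pc, CR, and Cp transitions, records messages instead of delivering them, and uses hypothetical class and method names; the remaining states and the coordinator are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical model of participant p on the Pc -> CR -> Cp path of FIG. 8."""
    name: str              # p
    initiator: str         # i
    P: frozenset           # participant set from prepare(P, S)
    S: frozenset           # S set from prepare(P, S)
    rw: bool               # True in read-write mode, False in read-only mode
    state: str = "Pc"
    C: set = field(default_factory=set)
    log: list = field(default_factory=list)   # journal records written
    sent: list = field(default_factory=list)  # (message, payload, recipients)

    def send(self, msg, payload, to):
        # Messages are recorded rather than delivered.
        self.sent.append((msg, frozenset(payload), frozenset(to)))

    def on_commit_from_c(self):
        # In state (Pc, T, P, S): "on commit from c".
        if self.rw:
            self.commit_i(set())
        else:
            self.commit_r(set())

    def commit_i(self, C):
        self.log.append("commit")
        self.send("committed", {self.name}, self.S | {self.initiator})
        self.state, self.C = "Ci", set(C)

    def commit_r(self, C):
        # Read-only: nothing is logged, and committed(Ø) names no participant.
        self.send("committed", set(), self.S | {self.initiator})
        self.state, self.C = "CR", set(C)

    def on_committed_prime(self, X):
        # In state (CR, C, P): "on committed'(X) from p in P" adds X to C.
        if self.state == "CR":
            self.C |= set(X)

    def on_rw(self):
        # In state (CR, C, P): "on RW" performs the deferred commit.
        if self.state != "CR":
            raise RuntimeError("on_rw is only meaningful in state CR")
        self.rw = True
        self.log.append("commit")
        self.send("committed", {self.name}, self.C)
        self.commit_p()

    def commit_p(self):
        self.send("committed'", {self.name}, self.P - self.C)
        self.state = "Cp"
```

A participant created with `rw=False` moves to CR without writing a log record, accumulates the participants it hears committed′ from, and only on `on_rw()` logs the commit and sends its own committed and committed′ messages.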










VIII. Example Processes



FIG. 9 illustrates one embodiment of a process to monitor conditions that may negatively affect a portion of a data storage system. Embodiments of the nodes 102 shown in FIG. 1 may be configured to store and/or implement embodiments of monitor process 900. In state 902, monitor process 900 determines whether a condition exists that negatively affects the durability of a portion of the storage system. For example, each condition that is tested for may correspond to one or more daemon processes that continuously monitor for the condition. It will be appreciated by one skilled in the art that there are many suitable ways to detect conditions that may negatively affect the durability of a portion of the storage system. For example, whether the chassis is open may be determined by a combination of hardware sensors and/or software monitors that detect the presence of the open chassis condition.


If a condition exists, the monitor process 900 proceeds to state 904, which sets the bit corresponding to the detected condition via a state-change process, described below with reference to FIG. 10. After the state has been changed, in state 906, the monitor process 900 sleeps temporarily for a certain amount of time. The amount of time may be defined or may be randomly determined. In state 908, the monitor process 900 determines whether the system is shutting down. If the system is shutting down, then the monitor process 900 terminates. If the system is not shutting down, the monitor process 900 determines whether the already detected condition still exists. If the already detected condition still exists, then the monitor process 900 returns to state 906 to await another period of testing for the condition. If the already detected condition no longer exists, then the monitor process 900 proceeds to state 912 in order to clear the bit corresponding to the no longer detected condition via the state-change process described in greater detail below with reference to FIG. 10.


If in state 902 the monitor process 900 determines that the condition does not exist, then the monitor process 900 proceeds to state 912 to clear the bit corresponding to the tested-for condition via the state-change process described in greater detail below with reference to FIG. 10. In state 914, the monitor process 900 sleeps temporarily. The sleep time may be a defined value, determined at random, or changed dynamically. In state 916, the monitor process 900 determines whether the system is shutting down. If the system is shutting down, then the monitor process 900 terminates. If the system is not shutting down, then, in state 918, the monitor process 900 determines whether the previously tested-for condition exists. If the condition still does not exist, then the monitor process 900 returns to state 914 until it tests again. If the condition now exists, then the monitor process 900 proceeds to state 904 in order to set the bit corresponding to the now-detected condition via the state-change process described in greater detail below with reference to FIG. 10.
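The control flow of monitor process 900 for a single condition can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the callbacks `condition_exists`, `set_bit`, `clear_bit`, and `shutting_down` are hypothetical stand-ins for the daemon's condition test, the state-change process of FIG. 10, and the system shutdown check. The random sleep interval is one of the options the description allows; `min_sleep` and `max_sleep` are illustrative parameters.

```python
import random
import time
from typing import Callable


def monitor_process(condition_exists: Callable[[], bool],
                    set_bit: Callable[[], None],
                    clear_bit: Callable[[], None],
                    shutting_down: Callable[[], bool],
                    sleep: Callable[[float], None] = time.sleep,
                    min_sleep: float = 1.0,
                    max_sleep: float = 5.0) -> None:
    """Watch one condition and keep its read-only bit in sync (sketch of FIG. 9)."""
    present = condition_exists()                   # state 902: initial test
    while True:
        # State 904 (condition present) or state 912 (condition absent):
        # hand the change to the (hypothetical) state-change callbacks.
        if present:
            set_bit()
        else:
            clear_bit()
        # Loop until the condition flips or the system shuts down.
        while True:
            sleep(random.uniform(min_sleep, max_sleep))  # states 906 / 914
            if shutting_down():                          # states 908 / 916
                return
            now_present = condition_exists()             # re-test (e.g. state 918)
            if now_present != present:
                present = now_present
                break                                    # go set or clear the bit
```

Because the bit is only touched when the test result changes, repeated detections of a condition that is already set do not generate redundant state-change requests.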



FIG. 10 illustrates one embodiment of a process to change the state of a portion of a data storage system. Embodiments of the nodes 102 shown in FIG. 1 may be configured to store and/or implement embodiments of state-change process 1000. In state 1002, the state-change process 1000 updates the read-only state as requested by the calling process, such as monitor process 900. In other words, if monitor process 900 requests that a bit corresponding to a particular condition be set, then state-change process 1000 sets that bit, thereby updating the read-only state as requested by the calling process. Alternatively, if monitor process 900 requests that a bit corresponding to a read-only mode condition be cleared, then state-change process 1000 clears that bit.


In some embodiments, more than one condition may trigger the read-only state. Thus, even when a condition is detected, the relevant portion of the storage system may not transition to a read-only mode because that portion is already in the read-only mode due to another condition. Furthermore, in some embodiments, a “Disallow” condition may prevent entry into the read-only mode. Therefore, in some embodiments, the data storage system may not enter the read-only mode when a condition is detected because entry is presently disallowed. It will be appreciated by one skilled in the art that there are various suitable ways to detect conditions and then to transition into a read-only mode.
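One way to model the condition bits and the “Disallow” condition is a bitmask: the portion is read-only when at least one triggering bit is set and the Disallow bit is clear. The bit names and values below are hypothetical (loosely drawn from the conditions listed in claim 4), and this sketch is an assumption about encoding, not the described system's data layout. `update` also reports whether the mode actually changed, which is the question state-change process 1000 asks in state 1004.

```python
# Hypothetical condition bits; the actual encoding is not specified in the text.
BATTERY_REMOVED = 1 << 0
BATTERY_FAILING = 1 << 1
CHASSIS_OPEN = 1 << 2
USER_REQUEST = 1 << 3
DISALLOW = 1 << 7  # when set, entry into read-only mode is disallowed


class ReadOnlyState:
    """Per-node set of read-only conditions (sketch)."""

    def __init__(self) -> None:
        self.bits = 0

    @property
    def read_only(self) -> bool:
        triggers = self.bits & ~DISALLOW
        # Read-only if any triggering condition exists and entry is not disallowed.
        return triggers != 0 and not (self.bits & DISALLOW)

    def update(self, condition: int, present: bool) -> bool:
        """Set or clear one condition bit; return True if the read-only mode changed."""
        before = self.read_only
        if present:
            self.bits |= condition
        else:
            self.bits &= ~condition
        return self.read_only != before
```

With this model, detecting a second condition while the node is already read-only returns `False`, matching the case above in which no transition occurs.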


In state 1004, the state-change process 1000 determines whether the node has transitioned into a read-only mode. If the node has not transitioned into a read-only mode, then the state-change process 1000 terminates. If the node has transitioned into a read-only mode, then the state-change process 1000 proceeds to state 1006, in which it determines whether the file system is mounted. If the file system is not mounted, then the state-change process 1000 proceeds to state 1012. If the file system is mounted, then the state-change process 1000 proceeds to state 1008, in which it notifies the file system process of the state change. The file system process is described in greater detail below with reference to FIG. 11. In state 1010, the state-change process 1000 waits for the journal to be locked. In some embodiments, waiting for the journal to be locked includes waiting for the relevant transactions in the journal to resolve sufficiently, which in some embodiments comprises draining the journal of prepared transactions. Once the journal has been locked, in state 1012, a backup of the journal is saved and the state-change process 1000 terminates. In some embodiments, saving a backup of the journal also includes marking the journal as unwritable.
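States 1004 through 1012 can be sketched as below. This is a minimal sketch under stated assumptions: `Journal` and `Node` are hypothetical in-memory models, `notify_fs` stands in for notifying file system process 1100 (which drains and locks the journal), and `wait_for_journal_lock` stands in for state 1010. Marking the journal unwritable reflects only the embodiments that do so.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Journal:
    # Hypothetical journal model: transaction id -> transaction state.
    txns: Dict[int, str] = field(default_factory=dict)
    locked: bool = False
    writable: bool = True


@dataclass
class Node:
    journal: Journal
    fs_mounted: bool
    journal_backup: Optional[Dict[int, str]] = None


def finish_state_change(node: Node, entered_read_only: bool,
                        notify_fs: Callable[[], None],
                        wait_for_journal_lock: Callable[[], None]) -> None:
    """Sketch of states 1004-1012 of state-change process 1000."""
    if not entered_read_only:           # state 1004: no transition, so terminate
        return
    if node.fs_mounted:                 # state 1006
        notify_fs()                     # state 1008: file system drains and locks
        wait_for_journal_lock()         # state 1010
    # State 1012: save a backup of the journal and, in some embodiments,
    # mark the journal as unwritable.
    node.journal_backup = copy.deepcopy(node.journal.txns)
    node.journal.writable = False
```

Taking the backup only after the lock is held means the saved copy reflects a journal in which no prepared transaction can still change state.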



FIG. 11 illustrates one embodiment of a process to manage a file system that implements a read-only mode for a portion of the storage system. Embodiments of the nodes 102 (storage modules) shown in FIG. 1 may be configured to store and/or implement embodiments of file system process 1100. In state 1102, file system process 1100 receives a request from a user to mount the file system on a portion of the data storage system. In some embodiments, the respective portions of the data storage system are nodes; in the description below, portions of the data storage system are referred to as nodes.


In state 1104, file system process 1100 determines whether the relevant node is in a read-only mode. If the node is not in a read-only mode, then the file system process 1100, in state 1106, mounts the file system on the relevant node in read-write mode. Mounting the file system in read-write mode is described in greater detail below with reference to FIG. 12. In state 1108, the file system process 1100 waits for notification of a state change. When the notification is received, the file system process 1100, in state 1110, determines whether the file system is still mounted. If the file system is no longer mounted, then the file system process 1100 terminates. If the file system is still mounted, then the file system process 1100, in state 1112, determines whether the node has entered a read-only mode. If the node has not entered a read-only mode, then the file system process 1100 returns to state 1108 and waits for notification of a state change. As described in greater detail above with respect to FIG. 3 and FIG. 10, some state changes will not cause a node in read-write mode to enter a read-only mode. If the node has entered a read-only mode, then file system process 1100, in state 1114, drains the journal of prepared transactions. Then, in state 1116, file system process 1100 locks the journal, which also prevents any changes to transaction states. The file system process 1100 then proceeds to state 1120.


If, after receiving a request from a user to mount the file system, the file system process 1100 determines that the relevant node is in a read-only mode, then file system process 1100, in state 1118, mounts the file system in read-only mode. Mounting the file system on a read-only node is described in greater detail below with reference to FIG. 12. In state 1120, the file system process 1100 waits for notification of a state change. In state 1122, file system process 1100 determines whether the file system is still mounted. If the file system is no longer mounted, then the file system process 1100 terminates. If the file system is still mounted, then file system process 1100, in state 1124, determines whether the node is still in a read-only mode. If the node is still in a read-only mode, then the file system process 1100 returns to state 1120 and waits for notification of a state change. As described in greater detail above with reference to FIG. 3 and FIG. 10, some state changes may not cause a node to exit a read-only mode. If the node is no longer in a read-only mode, then file system process 1100, in state 1126, invalidates the backup of the journal for the node. Then, in state 1128, the file system process 1100 processes any deferred operations related to aborted and committed transactions; in some cases, there may be no deferred operations. Processing deferred operations related to aborted and committed transactions is described in greater detail below with reference to FIG. 13. In state 1130, file system process 1100 performs garbage collection; that is, it collects any data blocks that were invalidated during the read-only mode. For example, as described in greater detail above with reference to FIGS. 5A and 5B, data blocks that correspond to read-only nodes are rewritten to other nodes, and the previous data block values stored on the read-only nodes are then invalidated. Once the relevant node has exited the read-only mode, the invalid data blocks may be collected.
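The two mode transitions handled by file system process 1100 (states 1114 to 1116 on entry, and states 1126 to 1130 on exit) could look roughly like the sketch below. `FsNode`, `resolve`, and `process_deferred` are hypothetical names, and unlocking the journal on exit is an assumption, since the text does not say where the lock is released.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class FsNode:
    # Hypothetical per-node state visible to file system process 1100.
    read_only: bool
    journal_locked: bool = False
    journal_backup_valid: bool = False
    prepared: List[int] = field(default_factory=list)   # prepared transaction ids
    invalid_blocks: Set[int] = field(default_factory=set)
    free_blocks: Set[int] = field(default_factory=set)


def handle_state_change(node: FsNode, now_read_only: bool,
                        resolve: Callable[[int], None],
                        process_deferred: Callable[[], None]) -> None:
    """React to a state-change notification (sketch of states 1112-1130)."""
    if now_read_only and not node.read_only:
        # State 1114: drain the journal of prepared transactions.
        while node.prepared:
            resolve(node.prepared.pop(0))
        # State 1116: lock the journal so transaction states cannot change.
        node.journal_locked = True
        node.read_only = True
    elif not now_read_only and node.read_only:
        # State 1126: the journal backup taken on entry is no longer needed.
        node.journal_backup_valid = False
        # State 1128: replay deferred commits and queued writes (see FIG. 13).
        process_deferred()
        # State 1130: reclaim blocks invalidated while the node was read-only.
        node.free_blocks |= node.invalid_blocks
        node.invalid_blocks.clear()
        # Assumption: the journal is unlocked once the node leaves read-only mode.
        node.journal_locked = False
        node.read_only = False
    # Otherwise the state change did not alter the mode; keep waiting.
```

Garbage collection runs last so that deferred operations, which may still reference the old block locations, finish before those blocks are reclaimed.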



FIG. 12 illustrates one embodiment of a mount process 1200 for mounting a file system on a portion of a data storage system in either a read-only or read-write mode. Embodiments of the nodes 102 shown in FIG. 1 may be configured to store and/or implement embodiments of mount process 1200. In the description of mount process 1200, portions of a storage system are referred to as nodes. Data journals may store information related to global transactions processed by the data storage system across different nodes. This information may include a global transaction list that records, for example, the respective transaction states of the global transactions, as well as copies of data values associated with the respective global transactions and other information.


In state 1202, the mount process 1200 rebuilds the global transaction list from the journal data. In state 1204, mount process 1200 establishes connections with the other nodes. The operations between states 1206 and 1210 are executed for each prepared transaction in the global transaction list: in state 1208, mount process 1200 communicates to the other nodes that the relevant transaction has been prepared. In state 1212, the mount process 1200 receives “Committed” and “Aborted” messages from the other nodes.


The operations between states 1214 and 1222 are executed for each aborted transaction. In state 1216, mount process 1200 determines whether the node is in a read-only mode. If the node is in a read-only mode, the mount process 1200 defers the abort of the relevant transaction by queuing writes into a DWQ. The DWQ is described in greater detail above with reference to FIGS. 6C and 6D. If the node is not in a read-only mode, then the relevant transaction is aborted and the aborted transaction is unlinked in the journal, in state 1222.


The operations between states 1224 and 1230 are executed for each committed transaction. In state 1226, the mount process 1200 determines whether the node is in a read-only mode. If the node is in a read-only mode, the mount process 1200 defers the commit of the relevant transaction by marking the transaction as commit-deferred in memory. Marking a transaction as commit-deferred in memory is described in greater detail above with reference to FIGS. 6C and 6D. If the node is not in a read-only mode, then the mount process 1200 sets the transaction state to committed and sends “Committed” messages to the other nodes participating in the transaction.
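The resolution phase of mount process 1200 can be sketched as follows, assuming a dictionary stands in for the rebuilt global transaction list. `MountState`, `mount_recovery`, the `"Prepared"` message string, and the `("unlink", txn)` queue entry are all hypothetical. The `aborted` and `committed` arguments represent the outcomes received in state 1212, and connection setup (state 1204) is omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Iterable, Set, Tuple


@dataclass
class MountState:
    read_only: bool
    # Hypothetical global transaction list rebuilt from the journal (state 1202):
    # transaction id -> persisted state.
    journal: Dict[int, str]
    dwq: Deque[Tuple[str, int]] = field(default_factory=deque)
    commit_deferred: Set[int] = field(default_factory=set)  # kept in memory only


def mount_recovery(st: MountState, aborted: Iterable[int], committed: Iterable[int],
                   send: Callable[[int, str], None]) -> None:
    """Resolve journaled transactions after mounting (sketch of states 1206-1230)."""
    # States 1206-1210: tell the other participants which transactions are prepared.
    for txn, state in list(st.journal.items()):
        if state == "prepared":
            send(txn, "Prepared")
    # States 1214-1222: transactions the other nodes reported as aborted.
    for txn in aborted:
        if st.read_only:
            st.dwq.append(("unlink", txn))   # defer the journal write
        else:
            st.journal.pop(txn, None)        # abort and unlink from the journal
    # States 1224-1230: transactions the other nodes reported as committed.
    for txn in committed:
        if st.read_only:
            st.commit_deferred.add(txn)      # journal is left untouched
        else:
            st.journal[txn] = "committed"
            send(txn, "Committed")
```

On a read-only node the journal dictionary is never modified; all pending work lands in the queue or the in-memory set, which is what allows the node to keep serving reads without touching its persistent storage.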



FIG. 13 illustrates one embodiment of a process for processing deferred operations related to aborted and committed transactions. Embodiments of the nodes 102 shown in FIG. 1 may be configured to store and/or implement embodiments of the process deferred transactions process 1300. The operations between states 1302 and 1310 are executed for each transaction in the global transaction list. In state 1304, the process deferred transactions process 1300 determines whether the relevant transaction is marked as “commit-deferred” in memory. If it is, then the process deferred transactions process 1300, in state 1306, marks the relevant transaction as committed in the journal. Then, in state 1308, the process deferred transactions process 1300 communicates to the other nodes that the relevant transaction has been committed. Processing deferred commits is also described above with respect to FIG. 6D. In state 1312, the process deferred transactions process 1300 determines whether the DWQ is empty. If the DWQ is not empty, then the process deferred transactions process 1300, in state 1314, processes the next operation placed in the DWQ. Emptying the DWQ is also described above with reference to FIG. 6D.
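Process 1300 can be sketched as two passes: first the deferred commits, then the deferred write queue in FIFO order. The function name, the `(operation, txn)` queue entries, and the `send`/`apply_write` callbacks are hypothetical; this is an illustration of the flow under those assumptions, not the actual implementation.

```python
from collections import deque
from typing import Callable, Deque, Dict, Set, Tuple


def process_deferred(journal: Dict[int, str], commit_deferred: Set[int],
                     dwq: Deque[Tuple[str, int]],
                     send: Callable[[int, str], None],
                     apply_write: Callable[[Tuple[str, int]], None]) -> None:
    """Replay work deferred during read-only mode (sketch of process 1300)."""
    # States 1302-1310: visit each transaction in the global transaction list.
    for txn in list(journal):
        if txn in commit_deferred:            # state 1304
            journal[txn] = "committed"        # state 1306: persist the commit
            send(txn, "Committed")            # state 1308: notify participants
            commit_deferred.discard(txn)
    # States 1312-1314: empty the deferred write queue in FIFO order.
    while dwq:
        apply_write(dwq.popleft())
```

Processing the queue in FIFO order preserves the order in which the deferred writes were generated while the node was read-only.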


IX. Additional Embodiments


While certain embodiments of the invention have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the present invention. For example, certain illustrative embodiments of the disclosed systems and methods have been described with reference to a read-only mode for a portion of a clustered storage system. The disclosed systems and methods are not limited by the illustrative examples. For example, in other embodiments, the disclosed systems and methods may implement a read-only mode for a portion of a non-distributed storage system. Many variations are contemplated.


For example, some embodiments may be directed to a distributed block device, which might resemble in functionality, for example, a large hard-disk drive. A distributed block device might be implemented on a cluster of nodes that provide the block-level functionality, for example, of a hard-disk drive. Thus, a distributed block device may be organized at the block level, rather than the file level. A cluster of processing nodes may work together to process read and write requests at specified block offsets, for example. Client processes may deliver read and write requests with an interface for block addresses, rather than filenames. In these embodiments, it may be advantageous to offer a read-only mode for member nodes of a cluster implementing a distributed block device. The embodiments described herein encompass such a distributed system, as well as other possible storage systems.
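For a block-level system, the allocation behavior described in claims 5, 7, and 8 (redirecting a write aimed at a read-only module to a writable one and updating the block's metadata) might be sketched as below. Everything here is hypothetical: the `Allocator` and `StorageNode` classes, the block-address-to-(node, slot) map standing in for metadata, and the "least-full node" placement policy, which is an assumption chosen only for illustration. The stale copy left on the read-only module is recorded for later garbage collection, and reads of it continue to succeed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class StorageNode:
    node_id: int
    read_only: bool = False
    blocks: Dict[int, bytes] = field(default_factory=dict)  # local slot -> data


class Allocator:
    """Hypothetical allocation module for a distributed block device."""

    def __init__(self, nodes: List[StorageNode]) -> None:
        self.nodes = {n.node_id: n for n in nodes}
        self.location: Dict[int, Tuple[int, int]] = {}  # block address -> (node id, slot)
        self.invalid: Set[Tuple[int, int]] = set()      # stale copies to collect later
        self._next_slot = 0

    def _writable_nodes(self) -> List[StorageNode]:
        # Modules in read-only mode are excluded from handling writes.
        return [n for n in self.nodes.values() if not n.read_only]

    def write(self, addr: int, data: bytes) -> Tuple[int, int]:
        loc = self.location.get(addr)
        if loc is not None and not self.nodes[loc[0]].read_only:
            node_id, slot = loc
            self.nodes[node_id].blocks[slot] = data      # overwrite in place
            return loc
        candidates = self._writable_nodes()
        if not candidates:
            raise OSError("no writable storage modules available")
        # Assumed placement policy: pick the node holding the fewest blocks.
        target = min(candidates, key=lambda n: len(n.blocks))
        slot = self._next_slot
        self._next_slot += 1
        target.blocks[slot] = data
        if loc is not None:
            self.invalid.add(loc)   # old copy on the read-only node is left untouched
        self.location[addr] = (target.node_id, slot)     # update metadata
        return self.location[addr]

    def read(self, addr: int) -> bytes:
        node_id, slot = self.location[addr]
        return self.nodes[node_id].blocks[slot]
```

Note that the read-only module's stored bytes are never modified; only the metadata map moves, which mirrors the policy of processing writes without modifying data on the affected subset.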


Embodiments of the disclosed systems and methods may be used and/or implemented with local and/or remote devices, components, and/or modules. The term “remote” may include devices, components, and/or modules not stored locally, for example, not accessible via a local bus. Thus, a remote device may include a device that is physically located in the same room and connected via a device such as a switch or a local area network. In other situations, a remote device may be located in a separate geographic area, for example, in a different location or country.

Claims
  • 1. A distributed storage system configured to implement a selective read-only mode, comprising: a plurality of storage modules configured to communicate via a network, each of the plurality of storage modules comprising at least one processor and configured to process at least one of read and write requests on behalf of the entire distributed storage system, wherein the storage modules configured to handle write requests are configured to individually enter a read-only mode if at least one condition is detected that affects persistent storage on a respective storage module; and an allocation module configured to exclude storage modules operating in read-only mode from processing write requests.
  • 2. The distributed storage system of claim 1, wherein the storage system is a distributed file system, and wherein file data and metadata of the file system are distributed across the plurality of storage modules.
  • 3. The distributed storage system of claim 1, wherein the persistent storage is at least one of the following: non-volatile random access memory (NVRAM), flash memory, hard-disk drive, compact-disc read-only memory (CD-ROM), digital versatile disc (DVD), optical storage, and magnetic tape drive.
  • 4. The distributed storage system of claim 1, wherein the at least one condition is at least one of the following: batteries supporting the persistent storage have been physically removed; batteries supporting the persistent storage have started to fail; a chassis housing the persistent storage has been opened; a user has requested read-only mode via a software interface; a user has requested read-only mode via a hardware interface; a hardware failure that negatively affects the durability of the persistent storage; a condition that negatively affects the durability of the persistent storage; a condition that negatively affects a main power supply; a condition that negatively affects backup flash memory; and a condition related to auto-correcting single-bit errors for non-volatile memory components.
  • 5. The distributed storage system of claim 1, the allocation module further configured to write data, which is addressed to a location within the persistent storage affected by the at least one condition, to another location in the storage system and to update that data's corresponding metadata to identify the new location.
  • 6. The distributed storage system of claim 1, wherein each of the storage modules configured to handle write requests is further configured to, while in read-only mode: process the write requests without modifying data stored on the storage module; and process the read requests, including read requests for data stored on the storage module.
  • 7. The distributed storage system of claim 1, wherein the allocation module is further configured to: receive a request to write data to a data block on one of the storage modules; determine whether the data block is on a storage module which is in the read-only mode; and if the data block is on a storage module which is in the read-only mode, write the data to one or more of the plurality of storage modules which is not in the read-only mode.
  • 8. The distributed storage system of claim 7, wherein the allocation module is further configured to update metadata associated with the data.
  • 9. The distributed storage system of claim 1, further comprising a monitor module configured to detect the at least one condition that affects persistent storage.
  • 10. The distributed storage system of claim 9, wherein the monitor module is further configured to maintain an indication of all existing conditions that affect persistent storage which the monitor module has detected.
  • 11. The distributed storage system of claim 1, wherein the write requests are associated with transactions, each transaction affecting one or more of the plurality of storage modules.
  • 12. The distributed storage system of claim 11, wherein each of the storage modules configured to handle write requests is further configured to wait for one or more of the transactions to resolve before entering the read-only mode.
  • 13. The distributed storage system of claim 11, wherein each of the storage modules configured to handle write requests is further configured to maintain a data journal which maintains information about one or more of the transactions.
  • 14. The distributed storage system of claim 13, wherein each of the storage modules configured to handle write requests is further configured to designate the data journal unwritable before entering the read-only mode.
  • 15. The distributed storage system of claim 13, wherein each of the storage modules configured to handle write requests is further configured to create a copy of the data journal before entering the read-only mode.
  • 16. The distributed storage system of claim 13, wherein each of the storage modules configured to handle write requests is further configured to, while in the read-only mode, defer modifications to the data journal until after leaving the read-only mode.
  • 17. The distributed storage system of claim 11, wherein each of the storage modules configured to handle write requests is further configured to, while in the read-only mode, defer operations associated with resolving one or more of the transactions until after leaving the read-only mode.
  • 18. The distributed storage system of claim 1, wherein each of the storage modules configured to handle write requests is further configured to maintain a queue while in the read-only mode, the queue configured to comprise deferred write commands associated with transactions.
  • 19. The distributed storage system of claim 18, wherein each of the storage modules configured to handle write requests is further configured to maintain a list of transaction states for the transactions.
  • 20. The distributed storage system of claim 18, wherein each of the storage modules configured to handle write requests is further configured to execute one or more of the deferred write commands after leaving the read-only mode.
Related Publications (1)
Number Date Country
20090248765 A1 Oct 2009 US