Embodiments described herein relate to fencing and, more particularly, to techniques for implementing reliable lease-based fencing mechanisms for shared storage devices.
Persistent reservations provide a mechanism for both heartbeating between several cluster members and performing access control to shared storage medium(s). However, existing approaches to fencing are unreliable due to shortcomings in common implementations. For example, a storage system may include a primary storage controller and a secondary storage controller coupled to a plurality of storage devices. When the primary storage controller experiences a delay or stall, the secondary storage controller may take over as the new primary storage controller. However, the old primary storage controller may return from the stall and attempt to perform an unauthorized access to the shared storage medium(s), potentially causing data corruption or other unintended behavior. To prevent this scenario and other similar scenarios, more reliable fencing mechanisms are needed.
Various embodiments of systems, apparatuses, methods, and computer readable storage mediums for implementing lease-based fencing are contemplated.
In one embodiment, multiple storage controllers may be coupled to one or more shared storage mediums for shared access and lease-based fencing may be utilized to control access to the shared storage mediums. Lease-based fencing may allow a primary storage controller to utilize a time-limited lease window during which accesses to the shared storage mediums are permitted, while accesses may be prohibited for expired leases. In one embodiment, write operations may be allowed for the primary controller with a current lease window and write operations may be prevented for one or more secondary controllers. In this embodiment, read operations may be allowed for all controllers.
In one embodiment, the primary storage controller may be configured to generate heartbeats at regularly spaced intervals (or heartbeat intervals). When the primary storage controller generates a successful heartbeat, the lease window may be extended, with the extension being calculated from a prior heartbeat. In one embodiment, the lease extension may be calculated from the previous (or most recent) heartbeat before the current heartbeat. In another embodiment, the lease extension may be calculated from a prior heartbeat performed two or more heartbeats before the current heartbeat. In a further embodiment, the lease extension may be calculated from a previous point in time when a prior lease extension was granted. In other embodiments, the lease extension may be calculated from a different point in time prior to the current heartbeat.
In one embodiment, if the secondary storage controller does not detect a heartbeat from the primary storage controller for a certain period of time (or takeover window), the secondary storage controller may take over as the new primary storage controller. If the old primary storage controller comes back up and generates a new heartbeat, it may be granted a lease extension which is calculated from a prior heartbeat. Accordingly, the lease extension will already be expired by the time it is granted if the takeover window is larger than the lease window. Therefore, the old primary storage controller will not be able to perform an unauthorized access to the shared storage medium(s) after the secondary storage controller has taken over as the new primary storage controller.
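The interplay of heartbeats, lease extension from a prior heartbeat, and stall recovery described above can be sketched as follows. This is an illustrative Python sketch only; the `Controller` class, the 0.5-second lease window, and the method names are assumptions introduced for this example, not part of any claimed embodiment.

```python
# Illustrative sketch: the lease extension is anchored at the heartbeat
# *before* the current one, so a controller returning from a long stall
# receives a lease that is already expired. Window sizes are assumptions.

LEASE_WINDOW = 0.5  # seconds a lease remains valid after its anchor point

class Controller:
    def __init__(self):
        self.prev_heartbeat = None   # heartbeat before the most recent one
        self.last_heartbeat = None   # most recent heartbeat
        self.lease_expiry = 0.0

    def heartbeat(self, now):
        """Generate a heartbeat and extend the lease from the prior heartbeat."""
        self.prev_heartbeat, self.last_heartbeat = self.last_heartbeat, now
        if self.prev_heartbeat is not None:
            # Anchored at the previous heartbeat, not at `now`: after a stall
            # longer than LEASE_WINDOW, the new lease is already expired.
            self.lease_expiry = self.prev_heartbeat + LEASE_WINDOW

    def lease_valid(self, now):
        return now < self.lease_expiry
```

With regular 0.1-second heartbeats the lease is continuously refreshed, while a heartbeat generated after a 1-second stall yields a lease anchored a full second in the past, which is expired on arrival.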
These and other embodiments will become apparent upon consideration of the following description and accompanying drawings.
While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, one having ordinary skill in the art should recognize that the invention might be practiced without these specific details. In some instances, well-known circuits, structures, signals, computer program instructions, and techniques have not been shown in detail to avoid obscuring the present invention. It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements.
This specification includes references to “one embodiment”. The appearance of the phrase “in one embodiment” in different contexts does not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure. Furthermore, as used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
Terminology. The following paragraphs provide definitions and/or context for terms found in this disclosure (including the appended claims):
“Comprising.” This term is open-ended. As used in the appended claims, this term does not foreclose additional structure or steps. Consider a claim that recites: “A system comprising a storage device . . . ” Such a claim does not foreclose the system from including additional components (e.g., a network interface, one or more processors, a storage controller).
“Configured To.” Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, “configured to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” language include hardware—for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/circuit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112, paragraph (f), for that unit/circuit/component. Additionally, “configured to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.
“Based On.” As used herein, this term is used to describe one or more factors that affect a determination. This term does not foreclose additional factors that may affect a determination. That is, a determination may be solely based on those factors or based, at least in part, on those factors. Consider the phrase “determine A based on B.” While B may be a factor that affects the determination of A, such a phrase does not foreclose the determination of A from also being based on C. In other instances, A may be determined based solely on B.
Referring now to
Each of storage controllers 105 and 110 may include software and/or hardware configured to provide access to storage devices 135A-N. Storage controllers 105 and 110 may be coupled directly to client computer system 125, and storage controllers 105 and 110 may be coupled remotely over network 120 to client computer system 115. Clients 115 and 125 are representative of any number of clients which may utilize storage controllers 105 and 110 for storing and accessing data in system 100. It is noted that some systems may include only a single client, connected directly or remotely to storage controllers 105 and 110. Although storage controllers 105 and 110 are shown as being separate from storage device groups 130 and 140, in some embodiments, portions or the entirety of storage controllers 105 and 110 may be located within one or each of storage device groups 130 and 140.
Storage controllers 105 and 110 may be connected to each other and to clients 115 and 125 using any suitable connection (e.g., local area network (LAN), storage area network (SAN)). In one embodiment, the connection between storage controllers 105 and 110 may be used for the delivery of heartbeat signals. Storage controllers 105 and 110 may also be connected to storage device groups 130 and 140 using any of a variety of connections. In one embodiment, the interfaces between controllers 105 and 110 and storage devices of storage device groups 130 and 140 may be custom designed interfaces. Alternatively, in other embodiments, these interfaces may utilize a standard communication protocol. For example, the interfaces between controllers 105 and 110 and storage devices of storage device groups 130 and 140 may utilize a Serial Advanced Technology Attachment (“SATA”) bus protocol, a Small Computer System Interface (“SCSI”) bus protocol, a Serial Attached SCSI (“SAS”) bus protocol, a Peripheral Component Interconnect Express (PCIe) bus protocol, and/or any of various other communication protocols.
In a typical configuration, one of storage controllers 105 and 110 may be the primary storage controller and the other of storage controllers 105 and 110 may be the secondary storage controller. In systems with three or more controllers, one controller may be the primary controller and the other controllers may be secondary controllers. The various connections to and from storage controllers 105 and 110 may be configured to allow a high availability configuration and to allow either of storage controllers 105 and 110 to serve as the primary storage controller. In this configuration, the primary storage controller may have read and write access to storage device groups 130 and 140 while the secondary storage controller may have only read access to storage device groups 130 and 140. In other embodiments, other configurations may be utilized. For example, one storage controller may have access to a first portion of storage device groups 130 and 140 and the other storage controller may have access to a second portion of storage device groups 130 and 140, and access may be shared to a third portion of storage device groups 130 and 140. Additional configurations are possible and are contemplated.
Storage controllers 105 and 110 may include or be coupled to a base operating system (OS), a volume manager, and additional control logic for implementing the various techniques disclosed herein. Storage controllers 105 and 110 may also include and/or execute on any number of processors and may include and/or execute on a single computing device or be spread across multiple computing devices, depending on the embodiment. The computing device(s) may be servers, workstations, or other types of computing devices. In some embodiments, storage controllers 105 and 110 may include or execute on one or more file servers and/or block servers. Storage controllers 105 and 110 may use any of various techniques for replicating data across devices 135A-N to prevent loss of data due to the failure of a device or the failure of storage locations within a device.
Storage controllers 105 and 110 may include software and/or control logic for implementing a reliable lease-based fencing mechanism for shared storage devices 135A-N. In one embodiment, one of the storage controllers 105 and 110 may be designated as the primary controller and the other may be designated as the secondary controller. The primary storage controller may be configured to extend lease windows after establishing ownership of the shared storage devices 135A-N and may also be configured to enforce access failures on expired leases. In one embodiment, a valid lease may permit a storage controller to perform a write operation to a shared storage device on behalf of a client without having to check for permission from another storage controller that has access to the storage device. For example, in various embodiments the lease defines a time period during which the primary storage controller can perform write operations without further reference to any other storage controllers.
In one embodiment, enforcement of the lease may occur in the storage controller operating system kernel, at the lowest layer of the storage driver stack. However, it is contemplated that such enforcement may occur elsewhere within a given system. In one embodiment, the storage driver stack may be a SCSI stack. In one embodiment, enforcement may include performing command descriptor block (CDB) parsing to split SCSI commands into non-state-changing (or harmless) commands (e.g., reads, TEST_UNIT_READY, READ_CAPACITY, PERSISTENT_RESERVE_IN, MAINTENANCE_IN) and state-changing (or dangerous) commands (e.g., writes, FORMAT_UNIT, ERASE, WRITE_SAME, UNMAP). The non-state-changing commands may be harmless no matter which or how many controllers perform them while the state-changing commands may become harmful when performed concurrently by multiple controllers. In some cases, there may be commands that are considered non-state-changing for a first system design which may be considered state-changing for a second system design. These commands may include SYNCHRONIZE_CACHE, SECURITY_PROTOCOL_OUT (SECURITY_UNLOCK), ENABLE_ADVANCED_POWER_MANAGEMENT, and SECURITY_DISABLE_PASSWORD. In some embodiments, non-state-changing commands may always be allowed while state-changing commands may only be permitted within the lease window.
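The command-splitting policy just described might be sketched as follows. The sets below use the command names from the text rather than real CDB opcode bytes, and the placement of the design-dependent commands in the state-changing set is one possible choice, not a requirement of any embodiment.

```python
# Sketch of CDB-based command classification: non-state-changing commands
# are always permitted; state-changing commands require a valid lease.
# Command names mirror the text; real enforcement would parse opcode bytes.

NON_STATE_CHANGING = {
    "READ", "TEST_UNIT_READY", "READ_CAPACITY",
    "PERSISTENT_RESERVE_IN", "MAINTENANCE_IN",
}

STATE_CHANGING = {
    "WRITE", "FORMAT_UNIT", "ERASE", "WRITE_SAME", "UNMAP",
    # Design-dependent commands, treated here as state-changing:
    "SYNCHRONIZE_CACHE", "SECURITY_PROTOCOL_OUT",
}

def permit(command, lease_valid):
    """Return True if the command may be sent to the shared device."""
    if command in NON_STATE_CHANGING:
        return True          # harmless regardless of which controller issues it
    # State-changing (and unknown, fail-safe) commands are fenced by the lease.
    return lease_valid
```

Treating unrecognized commands as state-changing is a fail-safe default under the stated assumption that misclassifying a harmless command only delays it, while misclassifying a dangerous one risks corruption.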
Queueing times may cause a delay between issuing a command from userspace and the command reaching the storage device. Accordingly, queueing inside the kernel may be minimized in order to minimize these delays. However, since queueing is still possible inside the storage devices, the lease window may be sized so that it is comfortably smaller than the takeover window, wherein the takeover window is defined as the amount of time without detecting a heartbeat that the secondary controller waits before taking over as the new primary controller.
It is noted that in alternative embodiments, the number and type of client computers, storage controllers, networks, storage device groups, and data storage devices is not limited to those shown in
Network 120 may utilize a variety of techniques including wireless connection, direct LAN connections, wide area network (WAN) connections such as the Internet, a router, storage area network, Ethernet, and others. Network 120 may comprise one or more LANs that may also be wireless. Network 120 may further include remote direct memory access (RDMA) hardware and/or software, transmission control protocol/internet protocol (TCP/IP) hardware and/or software, routers, repeaters, switches, grids, and/or others. Protocols such as Fibre Channel, Fibre Channel over Ethernet (FCoE), iSCSI, and so forth may be used in network 120. The network 120 may interface with a set of communications protocols used for the Internet such as the Transmission Control Protocol (TCP) and the Internet Protocol (IP), or TCP/IP.
Client computer systems 115 and 125 are representative of any number of stationary or mobile computers such as desktop personal computers (PCs), servers, server farms, workstations, laptops, handheld computers, personal digital assistants (PDAs), smart phones, and so forth. Generally speaking, client computer systems 115 and 125 include one or more processors comprising one or more processor cores. Each processor core includes circuitry for executing instructions according to a predefined general-purpose instruction set. For example, the x86 instruction set architecture may be selected. Alternatively, the ARM®, Alpha®, PowerPC®, SPARC®, or any other general-purpose instruction set architecture may be selected. The processor cores may access cache memory subsystems for data and computer program instructions. The cache subsystems may be coupled to a memory hierarchy comprising random access memory (RAM) and a storage device.
Turning now to
In one embodiment, computing device 205 may operate as a primary device with read and write access to resource 215 while computing device 210 may operate as a secondary device with only read access to resource 215. In various embodiments, the device designated as the primary device (computing device 205 in this example) may be configured to generate a heartbeat periodically (e.g., every N seconds, wherein N may be a fraction of a second). Depending on the embodiment, a heartbeat may be a signal or pulse generated by the computing device 205 and conveyed to computing device 210 and/or shared resource 215, a region of memory in resource 215 that is updated by computing device 205, or another suitable mechanism for indicating the health or status of computing device 205. In one embodiment, computing device 205 may generate a heartbeat every 100 milliseconds (ms) (or 0.1 seconds). In one embodiment, computing device 205 may issue itself a new lease window of M seconds every time it has generated a successful heartbeat, wherein M may be a fraction of a second, and wherein the lease window is extended from the time of a prior heartbeat rather than the current heartbeat. Alternatively, in another embodiment, the lease extension may be calculated from the time of a prior lease extension. In a further embodiment, the lease extension may be calculated from any point in time prior to the current heartbeat. For example, in this further embodiment, computing device 205 may take a timestamp and then perform the necessary verifications for confirming ownership of resource 215. After performing the necessary verifications, computing device 205 may extend the lease.
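The timestamp-before-verification sequence in the last embodiment above might be sketched as follows. The function and the `verify_ownership` callback are hypothetical stand-ins introduced for illustration; the 0.5-second lease window is likewise an assumption.

```python
# Sketch: the lease anchor is captured *before* ownership verification, so
# any delay spent verifying counts against the lease rather than extending it.
import time

LEASE_WINDOW = 0.5  # seconds; illustrative value

def extend_lease(verify_ownership):
    """Return the new lease expiry time, or None if verification fails.

    `verify_ownership` is a hypothetical callable that confirms this device
    still owns the shared resource (e.g., by checking reservations).
    """
    anchor = time.monotonic()       # timestamp taken first
    if not verify_ownership():      # may take arbitrarily long
        return None                 # ownership lost; no extension granted
    return anchor + LEASE_WINDOW    # lease extends from the earlier anchor
```

Anchoring at the pre-verification timestamp means a stall during verification shortens, and can entirely consume, the granted lease, which is the conservative behavior the fencing scheme relies on.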
In one embodiment, computing device 210 may detect when heartbeats are generated by computing device 205, and if computing device 210 does not detect a heartbeat for a certain period of time (i.e., the takeover window), then computing device 210 may take over as the primary computing device. The takeover window may be programmable and may vary depending on the embodiment. After computing device 210 takes over as the primary device, if computing device 205 were to perform an unauthorized access to resource 215, such an unauthorized access could result in data corruption or other unintended outcomes.
Therefore, to prevent both computing devices from attempting to simultaneously control resource 215, a new lease extension may be calculated from the time of a prior lease extension. Depending on the embodiment, the lease extension may be calculated from the time of the heartbeat immediately prior to the current heartbeat, two heartbeats ago, three heartbeats ago, four heartbeats ago, or some other point in time in the past. Under this scheme, at least a portion of the new lease window covers time that has already elapsed.
If computing device 205 experiences a stall longer than the lease window between two heartbeats, and the lease is extended from the previous heartbeat, then the lease will already be expired when the extension is granted. For example, in one embodiment, if the lease window is 500 ms, and if computing device 205 experiences a stall for 1 second between heartbeats at times of 0 seconds and 1 second, then the new lease window granted for the heartbeat generated at time 1 second will be calculated from the previous heartbeat at 0 seconds. Accordingly, the lease will expire at 500 ms, which is in the past, and computing device 205 will have an expired lease at time 1 second, preventing it from accessing resource 215. This scheme allows computing device 205 to make one wrong decision after a stall without causing a catastrophic failure. Therefore, when computing device 210 takes over as the new primary computing device at time 1 second, computing device 205 will be unable to access resource 215 and perform any actions which might corrupt data or otherwise cause unintended behavior.
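The arithmetic of the stall scenario above works out as follows; all values (500 ms lease window, heartbeats at 0 seconds and 1 second) come from the example in the text.

```python
# Worked numbers for the stall example: the lease granted at the heartbeat
# at time 1 s is anchored at the previous heartbeat at time 0 s, so it has
# already expired by the time it is issued.

LEASE_WINDOW = 0.5        # 500 ms lease window
prev_heartbeat = 0.0      # last heartbeat before the stall
current_heartbeat = 1.0   # first heartbeat after the 1-second stall

new_lease_expiry = prev_heartbeat + LEASE_WINDOW        # expires at 0.5 s
already_expired = new_lease_expiry <= current_heartbeat  # 0.5 s is in the past
```

Since the takeover window (1 second here) exceeds the lease window, any lease granted after a takeover-length stall necessarily expires before it is issued.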
Referring now to
The top row of
The third row of
Turning now to
Entry 405A corresponds to the successful heartbeat generated at 0.1 seconds by the primary storage controller as shown in
This pattern continues for entry 405B, which corresponds to the successful heartbeat performed at 0.2 seconds. As shown in entry 405B, the new lease window is valid from 0.1 seconds to 0.6 seconds. Similarly, entry 405C corresponds to the successful heartbeat performed at 0.3 seconds, and the lease window for this heartbeat will be calculated from the previous heartbeat of entry 405B (at 0.2 seconds) until 0.7 seconds when the new lease will expire. This pattern continues for entries 405D-E, which will have their leases extended until 0.8 seconds and 0.9 seconds, respectively.
It is noted that in other embodiments, the lease window could be calculated from the time corresponding to two or more heartbeats prior to the current heartbeat. In other words, instead of calculating the lease extension from the time of the previous entry of table 400, the lease extension could be calculated from the time of the entry which is two or more entries prior to the current entry of table 400. For example, for entry 405D corresponding to the successful heartbeat at 0.4 seconds, the lease window could be calculated from entry 405A performed at 0.1 seconds. Accordingly, for this example, the lease window would go from 0.1 seconds to 0.6 seconds, giving an effective lease life of 0.2 seconds from the current time of 0.4 seconds. In other embodiments, the lease window may be calculated from other points in time from the past which are correlated to previous heartbeats, previous lease extensions, or other events or timestamps from the past.
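The generalization described above can be sketched with the lease anchored K heartbeats before the current one; K=1 reproduces the earlier entries, while K=3 reproduces the 0.1-second anchor for the heartbeat at 0.4 seconds. The function name and the 0.5-second window are illustrative assumptions.

```python
# Sketch: lease extension anchored at the heartbeat k positions before the
# newest one. The heartbeat times below mirror table entries 405A-405D.

LEASE_WINDOW = 0.5  # seconds; illustrative value

def lease_expiry(heartbeat_times, k):
    """Extend the lease from the heartbeat k positions before the newest one.

    Returns None when fewer than k+1 heartbeats have been generated, in which
    case no lease (or only an initial lease) would be granted.
    """
    if len(heartbeat_times) <= k:
        return None
    return heartbeat_times[-1 - k] + LEASE_WINDOW

heartbeats = [0.1, 0.2, 0.3, 0.4]   # entries 405A-405D
```

For the heartbeat at 0.4 seconds, k=1 yields a lease valid until 0.8 seconds (matching entry 405D), while k=3 anchors at 0.1 seconds and yields expiry at 0.6 seconds, an effective lease life of 0.2 seconds from the current time.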
Turning now to
After the heartbeat at 0 seconds, it may be assumed that the primary controller stops functioning normally and stalls rather than generating new heartbeats. As shown, the primary controller does not generate a heartbeat from time 0 seconds until time 1.1 seconds. The secondary controller may detect heartbeats generated by the primary controller, and when the secondary controller detects no new heartbeats for a length of time equal to or greater than the takeover window, the secondary controller may take over as the new primary controller. It is noted that in some embodiments, the lease window may be the same size as or larger than the takeover window. In these embodiments, the secondary controller can perform part of a takeover, claiming all devices, before the lease expires. However, the secondary controller cannot trust data written to the devices before the lease expires. Therefore, the secondary controller may wait until the lease expires to perform any state-changing operations.
Accordingly, at time 1 second, a full takeover window has elapsed without a heartbeat, and so the secondary controller may generate a new heartbeat. Since this is the first heartbeat generated by the secondary controller, the secondary controller may not be given a lease for this heartbeat. On the next heartbeat generated by the secondary controller (the new primary controller) at 1.1 seconds, a new lease window may be granted and calculated from the previous heartbeat at time 1 second. Therefore, this lease window will extend from 1 second to 1.5 seconds. The secondary controller (or new primary controller) may continue to generate new heartbeats every 0.1 seconds and new lease windows may be granted for each new heartbeat, with the lease window being calculated from a previous heartbeat.
In some cases, the original primary controller may come back up after the stall and attempt to assert itself as the current primary controller. This is shown with the original primary controller generating a heartbeat at 1.1 seconds. For this heartbeat, a lease window may be granted, but this lease window will begin at the previous heartbeat generated by the original primary controller at 0 seconds. Accordingly, this lease window will already be expired at the time it is granted. Therefore, the original primary controller will be prevented from performing an access to the shared storage device(s) and a split-brain scenario will be avoided. In response to detecting that the new lease window is already expired at the time it is issued, the original primary controller may perform a check to determine if another storage controller has taken over as the new primary controller. In one embodiment, the original primary controller may detect heartbeats generated by the new primary controller and in response, the original primary controller may transition to the role of secondary controller.
Turning now to
As shown in the timing diagram of
Entry 615C corresponds to the heartbeat performed at time 1.2 seconds, and the lease window may be calculated from the time of the previous entry 615B (or 1.1 seconds). Accordingly, the new lease for entry 615C expires at 1.6 seconds. Similarly, entry 615D corresponds to the heartbeat performed at time 1.3 seconds, and the lease window may be calculated from the time of the previous entry 615C (or 1.2 seconds). Therefore, the new lease for entry 615D expires at 1.7 seconds. This pattern may continue for each newly generated heartbeat by the new primary controller. It is noted that tables 600 and 610 may be combined into a single table in some embodiments. To differentiate between heartbeats generated by different controllers, each entry may include an identifier (ID) of the controller which generated the heartbeat.
Turning now to
A heartbeat interval counter used for determining when to generate a heartbeat for a storage controller may be initialized (block 705). In one embodiment, the counter may be configured to count until 100 ms has elapsed. In other embodiments, the counter may be configured to count for other periods of time. Next, the storage controller may determine if the heartbeat interval counter has expired (conditional block 710).
If the heartbeat interval counter has expired (conditional block 710, “yes” leg), then the storage controller may generate a new heartbeat (block 715). After generating the heartbeat, the storage controller may extend the lease window from a point in time prior to the new heartbeat, wherein the lease window allows state-changing operations to be performed to the one or more shared storage device(s) (block 720). In one embodiment, the extension of the lease window may be calculated from a previously generated heartbeat rather than the heartbeat just generated. Alternatively, the lease window may be extended from the point in time when a previous lease extension was granted. For example, if the current heartbeat is generated at 1 second, the storage controller may determine when the heartbeat prior to the current heartbeat was generated, and then extend the lease window from that point in time. Accordingly, if the previous heartbeat was performed at 0.9 seconds, then the lease window may be extended from 0.9 seconds. The length of the lease window (e.g., 500 ms) may vary according to the embodiment.
If the heartbeat interval counter has not expired (conditional block 710, “no” leg), then the storage controller may determine if there are any pending state-changing operations targeting the shared storage device(s) (conditional block 725). If there are any pending state-changing operations targeting the shared storage device(s) (conditional block 725, “yes” leg), then the storage controller may determine if it has a valid lease for writing to the shared storage device(s) (conditional block 730). If there are no pending state-changing operations targeting the shared storage device(s) (conditional block 725, “no” leg), then method 700 may return to conditional block 710 to determine if the heartbeat interval counter has expired.
If the storage controller currently has a valid lease for writing to the shared storage device(s) (conditional block 730, “yes” leg), then the storage controller may perform the pending state-changing operations to the shared storage device(s) (block 735). After block 735, method 700 may return to conditional block 710 to determine if the heartbeat interval counter has expired. If the storage controller does not have a valid lease for writing to the shared storage device(s) (conditional block 730, “no” leg), then the storage controller may prevent the pending state-changing operations from being written to the shared storage device(s) (block 740). It is noted that a different storage controller which has a valid lease may perform the pending state-changing operations to the shared storage device(s).
After block 740, the storage controller may perform a check to determine if another storage controller has taken over as the new primary storage controller (conditional block 745). In various embodiments, the storage controller may determine if another storage controller has taken over as the new primary storage controller by detecting heartbeats generated by another controller, checking the heartbeat status table(s) (e.g., tables 400, 600, 610) for heartbeats generated by other controllers, querying the other controllers, and/or utilizing various other techniques. If another storage controller has taken over as the new primary storage controller (conditional block 745, “yes” leg), then the storage controller may operate as a secondary storage controller by performing only read operations to the shared storage device(s) (block 750). After block 750, method 700 may end. If no other storage controller has taken over as the new primary storage controller (conditional block 745, “no” leg), then method 700 may return to conditional block 710 to determine if the heartbeat interval counter has expired.
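One pass through the control flow of method 700 described above might be sketched as follows. The timer, lease, and takeover checks are reduced to booleans supplied by the caller, and the function and return-value names are illustrative assumptions rather than part of the method itself.

```python
# Sketch of one iteration of method 700. Each branch is annotated with the
# block number from the flow described in the text.

def fencing_step(interval_expired, pending_ops, lease_valid, other_took_over):
    if interval_expired:
        # Blocks 715/720: generate a heartbeat, then extend the lease
        # from a point in time prior to the new heartbeat.
        return "heartbeat_and_extend_lease"
    if not pending_ops:
        return "idle"                        # block 725 "no" leg: back to 710
    if lease_valid:
        return "perform_state_changing_ops"  # block 735
    # Blocks 740/745: ops are fenced off; check whether another
    # controller has taken over as the new primary.
    if other_took_over:
        return "become_secondary"            # block 750: read-only role
    return "retry"                           # back to conditional block 710
```

In a multi-process variant, the first branch would run in a dedicated heartbeat process while the remaining checks run in one or more worker processes, as the text notes.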
It is noted that method 700 may be implemented with multiple processes in some embodiments. For example, in another embodiment, the storage controller may constantly monitor the heartbeat interval counter with one process and use one or more other processes to determine whether there are state-changing operations pending, determine if there is a valid lease for performing state-changing operations, perform state-changing operations, and/or detect one or more other conditions.
Turning now to
A storage controller may detect a plurality of pending state-changing operations targeting one or more shared storage devices (block 805). In response to detecting the plurality of pending state-changing operations, the storage controller may determine if there is a valid lease for performing the pending state-changing operations to the shared storage device(s) (conditional block 810). If the lease is valid (conditional block 810, “yes” leg), then the storage controller may determine the maximum number of state-changing operations that can be performed (and allowed to be in progress) without risking loss or inadvertent corruption of data or storage devices should the fencing mechanism fail while state-changing operations are in flight (block 815). Alternatively, the number of state-changing operations that can safely be performed without causing loss or corruption of data or storage devices in the event of a failure may have previously been determined. In other words, the storage controller may determine how many and which of the pending state-changing operations could be recovered from if the lease or fencing mechanism fails and these pending state-changing operations were performed after the lease expires. In one scenario, the lease mechanism may fail if there is an unexpectedly long delay between checking the validity of the lease and performing the state-changing operation(s). In some cases, a valid lease may no longer exist when the state-changing operations are performed. If the lease is determined not to be valid (conditional block 810, “no” leg), then the storage controller may attempt to obtain a new lease (block 820). After block 820, method 800 may return to block 810 to determine if there is a valid lease for performing the pending state-changing operations.
The maximum number of the pending state-changing operations that can be performed without causing irrevocable damage if the lease mechanism were to fail may be based on the protection scheme being utilized for the shared storage devices. For example, if the protection scheme can withstand a single device failing or having corrupt data, then any number of state-changing operations can be performed to a single device. For such a single-failure protection scheme, if there are pending state-changing operations targeting multiple storage devices, then state-changing operations may only be performed on one device before the validity of the lease is checked again. Alternatively, if the protection scheme (e.g., RAID 6) provides fault tolerance for up to two failed devices, then state-changing operations may only be performed on two devices before checking the validity of the lease. For other types of protection schemes, the maximum number of state-changing operations which can be performed after checking the validity of the lease may vary based on the number of errors or failed storage devices the particular protection scheme can withstand if the lease mechanism were to fail. The number of allowed state-changing operations may also be reduced to allow for one or more devices to fail or be corrupted due to some unrelated issue, such as a failure within the device itself. For example, with RAID 6, only one outstanding state-changing operation may be allowed, so that one unrelated device failure can be tolerated in addition to one corruption resulting from a state-changing operation performed after lease expiry.
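The calculation described above can be sketched as a simple function (hypothetical names; `fault_tolerance` is the number of failed devices the protection scheme can withstand, and `reserved_failures` optionally sets aside headroom for unrelated device failures):

```python
def max_safe_operations(fault_tolerance, reserved_failures=0):
    """Hypothetical sketch: number of devices that may have in-flight
    state-changing operations, given a protection scheme tolerating
    `fault_tolerance` failed devices, with `reserved_failures` held
    back for unrelated device failures."""
    return max(fault_tolerance - reserved_failures, 0)

print(max_safe_operations(1))                        # single-failure scheme -> 1 device
print(max_safe_operations(2))                        # RAID 6 -> 2 devices
print(max_safe_operations(2, reserved_failures=1))   # RAID 6, one failure reserved -> 1
```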
In another embodiment, rather than determining a maximum number of pending state-changing operations which can be performed based on the current protection scheme, the storage controller may allow for a fixed number (which may be programmable) of state-changing operations to be performed after checking the validity of the lease. For example, in one embodiment, only a single state-changing operation may be allowed to be performed to a single storage device in response to determining the lease is valid. After performing the single state-changing operation, the validity of the lease may be checked again, and the next state-changing operation may be performed if the lease is still valid, with this process being repeated for each remaining pending state-changing operation.
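The fixed-number embodiment above, in which the lease is rechecked before each operation (or each small batch), can be sketched as follows (hypothetical names; `batch_size` corresponds to the programmable fixed number):

```python
def perform_with_recheck(pending_ops, lease_valid, perform, batch_size=1):
    """Hypothetical sketch: perform at most `batch_size` state-changing
    operations per lease check, rechecking validity before each batch."""
    completed = 0
    while pending_ops:
        if not lease_valid():
            break                                  # lease expired; stop performing
        for _ in range(min(batch_size, len(pending_ops))):
            perform(pending_ops.pop(0))
            completed += 1
    return completed

done = []
print(perform_with_recheck(["op1", "op2", "op3"], lambda: True, done.append))  # 3
print(done)  # ['op1', 'op2', 'op3']
```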
After block 815, the storage controller may perform the maximum number of permitted state-changing operations to the shared storage device(s) (block 825). Next, the storage controller may determine if there are still any pending state-changing operations (conditional block 830). If there are one or more pending state-changing operations (conditional block 830, “yes” leg), then method 800 may return to block 810 to determine if there is a valid lease for performing the pending state-changing operations. If there are no more pending state-changing operations (conditional block 830, “no” leg), then method 800 may end.
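The overall flow of blocks 810 through 830 can be sketched as a single driver loop (hypothetical names; the lease-acquisition retry is unbounded here for brevity):

```python
def method_800(pending_ops, lease_valid, obtain_new_lease, max_per_check, perform):
    """Hypothetical sketch of blocks 810-830: check the lease, perform up
    to the permitted number of operations, and repeat while any remain."""
    while pending_ops:                                        # conditional block 830
        while not lease_valid():                              # conditional block 810
            obtain_new_lease()                                # block 820
        for _ in range(min(max_per_check, len(pending_ops))): # block 825
            perform(pending_ops.pop(0))

performed = []
method_800(["w1", "w2", "w3"], lambda: True, lambda: None, 2, performed.append)
print(performed)  # ['w1', 'w2', 'w3']
```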
It is noted that the above-described embodiments may comprise software. In such an embodiment, the program instructions that implement the methods and/or mechanisms may be conveyed or stored on a non-transitory computer readable medium. Numerous types of media which are configured to store program instructions are available and include hard disks, floppy disks, CD-ROM, DVD, flash memory, Programmable ROMs (PROM), random access memory (RAM), and various other forms of volatile or non-volatile storage.
In various embodiments, one or more portions of the methods and mechanisms described herein may form part of a cloud-computing environment. In such embodiments, resources may be provided over the Internet as services according to one or more various models. Such models may include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). In IaaS, computer infrastructure is delivered as a service. In such a case, the computing equipment is generally owned and operated by the service provider. In the PaaS model, software tools and underlying equipment used by developers to develop software solutions may be provided as a service and hosted by the service provider. SaaS typically includes a service provider licensing software as a service on demand. The service provider may host the software, or may deploy the software to a customer for a given period of time. Numerous combinations of the above models are possible and are contemplated.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
This application is a continuation application of and claims priority from U.S. patent application Ser. No. 14/340,169, filed on Jul. 24, 2014.
Relation | Number | Date | Country
---|---|---|---
Parent | 14340169 | Jul 2014 | US
Child | 15254293 | | US