Current data storage devices such as volatile and non-volatile memory often include fault tolerance mechanisms to ensure that data remains available in the event of a device error or failure.
An example of a fault tolerance mechanism provided to current data storage devices is a redundant array of independent disks (RAID). RAID is a storage technology that controls multiple memory modules and provides fault tolerance by storing data with redundancy. RAID technology may store data with redundancy in a variety of ways. Examples of redundant data storage include duplicating data and storing the data in multiple memory modules and adding parity to store calculated error recovery bits. The multiple memory modules, which may include the data and associated parity, may be accessed concurrently by multiple redundancy controllers.
Another example of a fault tolerance mechanism provided to current data storage devices is an end-to-end retransmission scheme. The end-to-end retransmission scheme is utilized to create a reliable memory fabric that retransmits individual packets or entire routes that are lost en route to a protocol agent due to transient issues, such as electrical interference, or persistent issues, such as the failure of a routing component, cable, or connector.
Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, in which:
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to an example thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent, however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure. As used herein, the terms “a” and “an” are intended to denote at least one of a particular element, the term “includes” means includes but not limited to, the term “including” means including but not limited to, and the term “based on” means based at least in part on.
In addition, the following terms will be used throughout the remainder of the present disclosure. The term fabric may mean some combination of interconnected fabric devices used to convey packet-based information between endpoint components. The term memory fabric may mean a fabric used, at least in part, to provide connectivity between redundancy controllers and media controllers. The term protocol agents may mean endpoints (e.g., producers and consumers of data) that communicate with each other over a memory fabric. The term redundancy controller may mean a requesting protocol agent that acts on behalf of a central processing unit (CPU), input/output (I/O) device, or other user of memory, and generates requests such as read and write requests to one or more responding protocol agents (e.g., media controllers). The redundancy controller may be the attachment point where producers or consumers of data attach to the fabric. The redundancy controller may communicate with multiple media controllers and may implement redundant storage of data across more than one media controller on behalf of a CPU, I/O device, etc., such that the failure of a subset of the media controllers will not result in loss of data or interruption of system operation. The term media controller may mean a responding protocol agent that connects memory or storage devices to a memory fabric. The media controller may receive requests such as read and write requests, control the memory or storage devices, and return corresponding responses. The media controller may be the attachment point where data storage attaches to the memory fabric.
The term command may mean a transaction sent from a processor, I/O device or other source to a redundancy controller, which causes the redundancy controller to issue a sequence. The term primitive may mean a single request issued by a redundancy controller to a media controller along with its corresponding response from the media controller back to the redundancy controller. The term idempotent primitive may mean a primitive that has no additional effect if the packet is received more than once. The term non-idempotent primitive may mean a primitive that has an additional effect if the packet is received more than once. The term sequence may mean an ordered set of primitives issued by a redundancy controller to one or more media controllers to execute a command received from a processor, I/O device or other source. The term locked sequence may mean a sequence that requires atomic access to multiple media controllers. The term cacheline may mean a unit of data that may be read from or written to a media controller by a redundancy controller. The term is not intended to be restrictive. The cacheline may include any type or size of data, such as a disk sector, a solid-state drive (SSD) block, a RAID block or a processor cacheline. The term stripe may mean a set of one or more data cachelines, and associated redundancy information stored in one or more parity cachelines, that are distributed across multiple memory modules. The term RAID degraded mode may mean a mode of operation of a RAID redundancy controller following the failure of a survivable subset of the media controllers or their memory devices. In degraded mode, reads and writes access the surviving media controllers only.
Disclosed herein are examples of methods to modify a retransmission sequence involving non-idempotent primitives in a fault-tolerant memory. The fault-tolerant memory may be a packet-switched memory fabric that connects one or more requesting protocol agents to a plurality of responding protocol agents. The fault-tolerant memory may, for instance, implement RAID storage technology. A retransmission sequence is used to create a reliable fabric when individual packets or entire routes may be lost en route to a protocol agent. An example of a retransmission sequence is a protocol layer end-to-end retransmission sequence, where entire protocol-layer primitives (e.g., read-request-to-read-return) are timed by the protocol agent that issues the request, and the entire primitive is repeated in the event that the entire primitive does not complete.
The protocol layer end-to-end retransmission sequence, however, may be unsafe for the carriage of non-idempotent primitives from a requesting protocol agent to a responding protocol agent in the fault-tolerant memory. For example, when a response fails to arrive back at the requesting protocol agent, the requesting protocol agent does not know whether the initial request had already reached the responding protocol agent. Thus, when the requesting protocol agent resends the request, the same packet may be received twice by the responding protocol agent.
Accordingly, the disclosed examples provide a method to safely use non-idempotent primitives across a protocol-layer end-to-end retransmission-protected memory fabric by modifying the sequence to utilize storage redundancy to reconstruct lost packets from the non-idempotent primitives. According to an example, a redundancy controller may request a sequence to access a stripe in the fault-tolerant memory fabric, wherein the sequence involves a non-idempotent primitive. In response to determining an expiration of a time threshold for the non-idempotent primitive, the redundancy controller may read other data in other cachelines in the stripe, use RAID to regenerate data that was potentially corrupted by a first transmission of the non-idempotent primitive, write the regenerated data to the stripe, reissue the non-idempotent primitive on an alternate fabric route, calculate a new parity value by performing an idempotent exclusive-or primitive on the new data with the other data in the stripe, and write the new parity to the stripe using the idempotent write primitive.
End-to-end retransmission sequences may require packets to be stored in a costly “replay buffer” by protocol agents and that an explicit end-to-end acknowledgement is delivered. For example, each request packet that crosses the multi-hop memory fabric is acknowledged by a completion message crossing back in the other direction. The requesting protocol agent waits for the completion message while holding a copy of the requested packet in a replay buffer so that the requested packet may be resent if the completion message never arrives. The expiration of a predetermined time threshold while waiting for the completion message triggers a resend of the copy of the requested packet from the replay buffer, possibly via an alternate fabric route. Completion messages consume considerable memory fabric bandwidth, so it is advantageous to use a protocol-layer end-to-end retransmission scheme, where the protocol-layer responses serve double-purpose as delivery acknowledgements.
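For illustration, the replay-buffer bookkeeping described above may be sketched as follows. The class and method names are hypothetical stand-ins and are not part of the disclosed examples; a real fabric implementation would additionally track routes and sequence numbers.

```python
import time

class ReplayBuffer:
    """Holds a copy of each request packet until its completion message arrives."""

    def __init__(self, threshold):
        self.threshold = threshold   # predetermined time threshold, in seconds
        self.pending = {}            # request id -> (packet copy, send time)

    def sent(self, req_id, packet):
        # A copy of the request is retained so it may be resent if needed.
        self.pending[req_id] = (packet, time.monotonic())

    def completed(self, req_id):
        # The end-to-end acknowledgement frees the replay-buffer slot.
        self.pending.pop(req_id, None)

    def expired(self):
        # Packets whose completion message never arrived within the
        # threshold are candidates for resend, possibly via an alternate route.
        now = time.monotonic()
        return [pkt for pkt, t in self.pending.values()
                if now - t >= self.threshold]
```

The sketch shows the cost the disclosed examples avoid: every outstanding request occupies buffer storage, and every completion message consumes fabric bandwidth.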
The protocol-layer end-to-end retransmission sequence of the disclosed examples may structure all traffic as primitives of request and response packets and rely upon the storage redundancy inherent in RAID to correct the aftermath of a lost packet in a non-idempotent sequence. A timer in the redundancy controller may monitor how long each request has been outstanding before a response is received, and if the timer reaches a predetermined time threshold, either the request packet or the response packet is known to have been lost. This approach has the advantages that (a) no dedicated replay buffer is required for storing individual request and response packets, as the entire protocol layer may be replayed instead, and (b) fabric overhead associated with completion packets or acknowledgement fields in packets may be avoided.
As discussed above, this end-to-end retransmission sequence may not be safe for the carriage of non-idempotent packets. For example, simple read and write primitives to a single media controller are idempotent provided proper precautions are taken to avoid race conditions between multiple redundancy controllers. That is, a second read primitive immediately following a first one returns identical data, and a second write primitive immediately following a first one writes the same data. However, primitives such as swap and increment are non-idempotent. For example, an increment primitive increases a stored value by one, but when delivered twice, the increment primitive increases the stored value by two instead. Accordingly, when a response fails to arrive back at a requesting redundancy controller, the timer expiration informs the requesting redundancy controller that either the request packet or a resulting response packet has been lost. However, there is no way for the requesting redundancy controller to know which of these two packets was lost. As such, there is no way to know whether the request had already reached the media controller. Thus, the redundancy controller may resend the request, which would result in the same packet being received twice by the media controller.
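The distinction may be illustrated with a minimal sketch, in which a dictionary stands in for a single cacheline held by a media controller; the function names are hypothetical.

```python
def write(store, value):
    """Idempotent: a repeated delivery leaves the same result."""
    store["data"] = value

def increment(store):
    """Non-idempotent: a repeated delivery changes the result again."""
    store["data"] += 1

store = {"data": 5}
write(store, 7)
write(store, 7)            # duplicate delivery is harmless
assert store["data"] == 7

increment(store)
increment(store)           # duplicate delivery increments twice
assert store["data"] == 9  # a single increment would have yielded 8
```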
Non-idempotent primitives are, however, valuable in RAID storage. A write primitive to a RAID stripe typically includes a sequence of events whereby a redundancy controller reads old data from a media controller, then reads the old parity, typically from a different media controller, writes the new data to the first media controller, calculates the new parity value from the new and old data and old parity values, and finally writes the new value of the parity to the appropriate media controller. Since individual read and write primitives are idempotent, the entire sequence may be composed entirely of idempotent primitives. However, this is a very sub-optimal implementation. A more optimized implementation might, for example, merge the data read and data write into a single data swap primitive, which writes new data while returning the prior data, or use an exclusive-or (XOR) primitive in place of a parity read primitive and parity write primitive. This increases performance and lowers memory fabric traffic by combining two primitives into one. Unfortunately, the swap and XOR primitives are non-idempotent. That is, if the swap primitive is received a second time, it cannot return the correct old data, as the old data has already been overwritten by the first delivery of the swap primitive. Similarly, an XOR primitive is ineffective if received twice because the second transmission undoes the effect of the first transmission. Accordingly, the disclosed examples provide a method to safely use non-idempotent primitives in a protocol-layer end-to-end retransmission-protected memory fabric by modifying the retransmission sequence.
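As a sketch of why the merged primitives are non-idempotent, consider a hypothetical model of a minimal stripe whose parity covers a single data cacheline. The function names swap and xor_write are illustrative and do not represent the fabric's actual primitive encodings.

```python
def swap(cacheline, new_data):
    """Non-idempotent: returns the old data while storing the new data."""
    old = cacheline["value"]
    cacheline["value"] = new_data
    return old

def xor_write(cacheline, mask):
    """Non-idempotent: XORs the stored value with the supplied mask."""
    cacheline["value"] ^= mask

data = {"value": 0b1010}
parity = {"value": 0b1010}   # single-data-cacheline stripe: parity equals data

old = swap(data, 0b0110)                 # write new data, learn old data in one primitive
xor_write(parity, old ^ data["value"])   # fold the change into the parity
assert parity["value"] == data["value"]  # stripe consistent

# A duplicate delivery of the XOR primitive undoes the parity update:
xor_write(parity, old ^ data["value"])
assert parity["value"] != data["value"]  # stripe now corrupt
```

A duplicate delivery of swap is similarly harmful: the second delivery would return the new data, not the old data the caller intended to learn.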
The disclosed examples may provide the technical benefits and advantages of an end-to-end retransmission-protected memory fabric without incurring the memory fabric overhead and replay-buffer storage costs associated with packet delivery acknowledgment. Furthermore, the disclosed examples do not incur the performance and memory fabric overhead costs of foregoing the swap and XOR primitive optimizations. In this regard, the disclosed method takes advantage of non-idempotent primitive optimizations while avoiding repeat delivery hazards that are associated with the carriage of non-idempotent primitives across a protocol-layer end-to-end retransmission-protected memory fabric during a retransmission sequence.
With reference to
For example, the compute node 100 may include a processor 102, an input/output interface 106, a private memory 108, and a redundancy controller 110. In one example, the compute node 100 is a server but other types of compute nodes may be used. The compute node 100 may be a node of a distributed data storage system. For example, the compute node 100 may be part of a cluster of nodes that services queries and provides data storage for multiple users or systems, and the nodes may communicate with each other to service queries and store data. The cluster of nodes may provide data redundancy to prevent data loss and minimize down time in case of a node failure.
The processor 102 may be a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), field programmable gate array (FPGA), or other type of circuit to perform various processing functions. The private memory 108 may include volatile dynamic random access memory (DRAM) with or without battery backup, non-volatile phase change random access memory (PCRAM), spin transfer torque-magnetoresistive random access memory (STT-MRAM), resistive random access memory (reRAM), memristor, FLASH, or other types of memory devices. For example, the memory may be solid state, persistent, dense, fast memory. Fast memory can be memory having an access time similar to DRAM memory. The I/O interface 106 may include a hardware and/or a software interface. The I/O interface 106 may be a network interface connected to a network, such as the Internet, a local area network, etc. The compute node 100 may receive data and user-input through the I/O interface 106. Where examples herein describe redundancy controller behavior occurring in response to commands issued by the processor 102, this should not be taken restrictively. The examples are also applicable if such commands are issued by an I/O device via interface 106.
The components of the compute node 100 may be coupled by a bus 105, where the bus 105 may be a communication system that transfers data between the various components of the compute node 100. In examples, the bus 105 may be a Peripheral Component Interconnect (PCI), Industry Standard Architecture (ISA), PCI-Express, HyperTransport®, NuBus, a proprietary bus, and the like. Alternatively, the processor 102 may use multiple different fabrics to communicate with the various components, such as PCIe for I/O, DDR3 for memory, and QPI for the redundancy controller.
The redundancy controller 110, for example, may act on behalf of the processor 102 and generate sequences of primitives such as read, write, swap, XOR, lock, unlock, etc. requests to one or more responding protocol agents (e.g., media controllers 120A-M) as discussed further below with respect to
With reference to
The multiple compute nodes 100A-N may be coupled to the memory modules 104A-M through the network 140. The memory modules 104A-M may include media controllers 120A-M and memories 121A-M. Each media controller, for instance, may communicate with its associated memory and control access to the memory by the redundancy controllers 110A-N, which in turn are acting on behalf of the processors. The media controllers 120A-M provide access to regions of memory. The regions of memory are accessed by multiple redundancy controllers in the compute nodes 100A-N using access primitives such as read, write, lock, unlock, swap, XOR, etc. In order to support aggregation or sharing of memory, media controllers 120A-M may be accessed by multiple redundancy controllers (e.g., acting on behalf of multiple servers). Thus, there is a many-to-many relationship between redundancy controllers and media controllers. Each of the memories 121A-M may include volatile dynamic random access memory (DRAM) with battery backup, non-volatile phase change random access memory (PCRAM), spin transfer torque-magnetoresistive random access memory (STT-MRAM), resistive random access memory (reRAM), memristor, FLASH, or other types of memory devices. For example, the memory may be solid state, persistent, dense, fast memory. Fast memory can be memory having an access time similar to DRAM memory.
As described in the disclosed examples, the redundancy controllers 110A-N may maintain fault tolerance across the memory modules 104A-M. The redundancy controller 110 may receive commands from one or more processors 102, I/O devices, or other sources. In response to receipt of these commands, the redundancy controller 110 generates sequences of primitive accesses to multiple media controllers 120A-M. The redundancy controller 110 may also generate certain sequences of primitives independently, not directly resulting from processor commands. These include sequences used for scrubbing, initializing, migrating, or error-correcting memory. The media controllers 120A-M may then respond to the requested primitives with a completion message.
RAID stripe locks acquired and released by the redundancy controller 110 guarantee atomicity for locked sequences. Accordingly, the shortened terms “stripe” and “stripe lock” have been used throughout the text to describe RAID stripes and locks on RAID stripes, respectively.
The time module 112, for instance, may identify a time threshold for receiving an acknowledgement to a non-idempotent primitive. The sequence module 114, for instance, may initiate a sequence that includes at least one non-idempotent primitive to access a stripe in the fault-tolerant memory fabric. In response to a determination by the time module 112 that the time threshold for the non-idempotent primitive has expired, the modification module 116, for instance, may perform the following functions. The modification module 116 may request a read of other data in other cachelines in the stripe, request the use of RAID to regenerate data that was potentially corrupted by a first transmission of the non-idempotent primitive, request a write of the regenerated data to the stripe, request a computation of a new parity value, and request an idempotent write primitive to write the new parity to the stripe.
In this example, modules 112-116 are circuits implemented in hardware. In another example, the functions of modules 112-116 may be machine readable instructions stored on a non-transitory computer readable medium and executed by a processor, as discussed further below in
Referring to
A stripe may include a combination of data cachelines from at least one memory module and parity cachelines from at least one other memory module. In other words, a stripe may include memory blocks distributed across multiple modules which contain redundant information, and must be atomically accessed to maintain the consistency of the redundant information. For example, one stripe may include cachelines A1, A2, and Ap (stripe 1), another stripe may include cachelines B1, B2, and Bp (stripe 2), another stripe may include cachelines C1, C2, and Cp (stripe 3), and another stripe may include cachelines D1, D2, and Dp (stripe 4). The data cachelines in a stripe may or may not be sequential in the address space of the processor 102. A RAID memory group may include stripes 1-4. The example in
According to this example, if memory module 1 fails, the data cachelines from memory module 2 may be combined with the corresponding-stripe parity cachelines from memory module 3 (using the Boolean exclusive-or function) to reconstruct the missing cachelines. For instance, if memory module 1 fails, then stripe 1 may be reconstructed by performing an exclusive-or function on data cacheline A2 and parity cacheline Ap to determine data cacheline A1. In addition, the other stripes may be reconstructed in a similar manner using the fault tolerant scheme of this example. In general, a cacheline on a single failed memory module may be reconstructed by using the exclusive-or function on the corresponding-stripe cachelines on all of the surviving memory modules. The use of the simple exclusive-or operation in the reconstruction of missing data should not be taken restrictively. Different data-recovery operations may involve different mathematical techniques. For example, RAID-6 commonly uses Reed-Solomon codes.
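The reconstruction arithmetic for this example may be sketched directly; the cacheline names follow the example stripes above, and the byte values are arbitrary illustrations.

```python
# Stripe 1 of the example: A1 and A2 are data cachelines, Ap is their parity.
A1 = bytes([0x12, 0x34, 0x56, 0x78])          # data on memory module 1
A2 = bytes([0xAB, 0xCD, 0xEF, 0x01])          # data on memory module 2
Ap = bytes(a ^ b for a, b in zip(A1, A2))     # parity on memory module 3

# Memory module 1 fails; rebuild A1 by XORing the surviving cachelines.
rebuilt = bytes(a ^ b for a, b in zip(A2, Ap))
assert rebuilt == A1
```

By symmetry, the same XOR of the survivors would rebuild A2 or Ap had a different single module failed.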
As discussed above, when a response packet is lost in other end-to-end retransmission sequences (for instance, when the response packet is not received at a requesting redundancy controller within a predetermined time threshold), the requesting redundancy controller does not know whether the request packet was lost en route to the media controller or whether the resulting response packet was lost on a return route to the requesting redundancy controller. These two scenarios are illustrated in
At arc 302, the redundancy controller 110 issues a non-idempotent primitive request packet, such as a swap primitive, to the media controller 120A of the first data memory module 303 in stripe 301. The swap primitive is a non-idempotent primitive that reads the old data, and then writes the new data to the first data memory module 303 in one merged primitive. As such, the redundancy controller 110 requests to swap new data with old data in arc 302. In the example of
In this scenario, the non-idempotent swap primitive does not have a harmful effect upon the end-to-end retransmission sequence because the initial non-idempotent primitive packet in arc 302 never reached the media controller 120A. In other words, the old data in the first data memory module 303 in stripe 301 remains unchanged, and therefore the reissued non-idempotent swap primitive request packet in arc 304 will receive the correct old data as shown in arc 306.
On the other hand,
At arc 402, the redundancy controller 110 issues a non-idempotent primitive request packet, such as a swap primitive, to the media controller 120A of the first data memory module 303 in stripe 301. Upon receiving the non-idempotent swap primitive request packet, the media controller 120A may store the new data in memory, and return old data from the first data memory module 303 in a response packet to the redundancy controller 110, as shown in arc 404. However, on the return route, the response packet of arc 404 is lost due to a memory fabric failure.
After waiting for the expiration of a time threshold, the redundancy controller 110 may reissue the non-idempotent swap primitive request packet to the media controller 120A via an alternate fabric route in accordance with the end-to-end retransmission sequence protocol, as shown in arc 406. However, this reissued non-idempotent swap primitive request packet is harmful because the new data has already been written to the first data memory module 303 in stripe 301 by the initial swap primitive of arc 402. That is, the media controller 120A may receive the same non-idempotent swap primitive packet twice. In this regard, the media controller 120A returns a response packet to the redundancy controller 110 with corrupted data in arc 408. That is, the media controller 120A returns a response packet with incorrect “old” data that has already been overwritten with the new data by the initial swap primitive of arc 402.
Accordingly, in this scenario, the non-idempotent swap primitive has a harmful effect upon the end-to-end retransmission sequence because the initial primitive packet in arc 402 was received and executed by the media controller 120A. In other words, the old data in the first data memory module 303 in stripe 301 has been swapped with the new data, and therefore the reissued non-idempotent swap primitive request packet in arc 406 will receive incorrect and corrupt data, as shown in arc 408.
The methods disclosed below in
At arc 502, the redundancy controller 110 issues a non-idempotent swap primitive request packet to the media controller 120A of the first data memory module 303 in stripe 301. Upon receiving the non-idempotent swap primitive request packet, the media controller 120A may return old data from the first data memory module 303 in a response packet to the redundancy controller 110, as shown in arc 504. However, on the return route, the response packet of arc 504 may be lost due to a memory fabric failure.
After waiting for the expiration of a time threshold, the redundancy controller 110 may use redundancy to reconstruct and write the old data in the first data memory module 303. For example, the redundancy controller 110 may issue a read primitive to the second data memory module 307 and may issue a read primitive to the parity memory module 305, as shown in arcs 506 and 508. As a result, the redundancy controller 110 may receive the old parity from the media controller 120B and data from media controller 120C, as shown in arcs 510 and 512. At arc 514, the redundancy controller 110 may then reconstruct the old data by performing an exclusive-or function on the old parity and the data received in arcs 510 and 512 and write the old data to the first data memory module 303. In response to writing the old data, the redundancy controller 110 may receive a completion message from the media controller 120A at arc 516.
According to an example, the redundancy controller 110 may attempt an end-to-end retransmission of the non-idempotent swap primitive on an alternate fabric route, as shown at arc 518. Upon receiving the non-idempotent swap primitive request, the media controller 120A may return old data from the first data memory module 303 in a response packet to the redundancy controller 110, as shown at arc 520. As shown at arc 522, the redundancy controller 110 may then issue an idempotent write primitive to commit a new parity to the parity memory module 305 to keep the stripe 301 consistent. At arc 524, the redundancy controller 110 may receive a completion message from the media controller 120B. At this point, method 500 has successfully modified a retransmission sequence to utilize redundancy and an alternate fabric route to safely recover from a fabric failure.
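The sequence of arcs 506 through 524 may be sketched procedurally as follows. The Cacheline class, the stripe layout, and the method names are hypothetical stand-ins for media-controller primitives, and the example assumes the two-data-plus-parity stripe of the figure.

```python
class Cacheline:
    """Hypothetical model of one cacheline behind a media controller."""

    def __init__(self, value):
        self.value = value

    def read(self):                 # idempotent read primitive
        return self.value

    def write(self, value):         # idempotent write primitive
        self.value = value

    def swap(self, value):          # non-idempotent swap primitive
        old = self.value
        self.value = value
        return old

def recover_and_retry_swap(stripe, new_data):
    # Arcs 506-512: read the surviving cachelines of the stripe.
    other_data = stripe["data2"].read()
    old_parity = stripe["parity"].read()

    # Arc 514: regenerate the potentially corrupted old data by XOR and
    # rewrite it, regardless of whether the original swap was ever applied.
    old_data = other_data ^ old_parity
    stripe["data1"].write(old_data)

    # Arcs 518-520: reissue the swap, e.g. on an alternate fabric route.
    returned_old = stripe["data1"].swap(new_data)

    # Arc 522: commit the new parity with an idempotent write.
    stripe["parity"].write(old_parity ^ returned_old ^ new_data)
    return returned_old
```

Because the old data is first regenerated from the surviving cachelines, the reissued swap returns the correct old data whether or not the first swap had reached the media controller.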
With reference to
Prior to performing a sequence that involves at least one non-idempotent primitive, a determination is made as to whether the redundancy controller is in a degraded mode. The degraded mode is a mode of operation of the redundancy controller following the failure of a survivable subset of the media controllers or their memory devices. In the degraded mode, reads and writes access the surviving media controllers only. According to an example, in response to determining that the redundancy controller is in the degraded mode, an alternate sequence that involves only idempotent primitives may be requested. On the other hand, in response to determining that the redundancy controller is not in the degraded mode, the redundancy controller may perform the sequence that involves the at least one non-idempotent primitive.
In this regard, the redundancy controller 110 may request a lock from the media controller 120B. The stripe 301 may include data stored in at least one data memory module and parity stored in at least one parity module. Since there is typically no single point of serialization for multiple redundancy controllers that access a single stripe, a point of serialization may be created at the media controller 120B. The point of serialization may be created at the media controller 120B because any sequence that modifies the stripe 301 has to access the parity. Accordingly, the redundancy controller 110 may obtain the lock for the stripe 301. The lock may provide the redundancy controller 110 with exclusive access to the stripe. For instance, the lock prevents a second redundancy controller from concurrently performing a second sequence that accesses multiple memory modules in the stripe 301 during the locked sequence.
As shown in block 610, the redundancy controller, for instance, may request a sequence to access a stripe in the fault-tolerant memory fabric. In this example, the redundancy controller is not in a degraded mode and the sequence includes at least one non-idempotent primitive. The non-idempotent primitive may be, for example, an optimized swap primitive that merges a data read primitive and a data write primitive into a single primitive. In another example, the non-idempotent primitive may be an optimized exclusive-or primitive that merges a parity read primitive and a parity write primitive into a single primitive. Specifically, the optimized exclusive-or primitive includes performing an idempotent exclusive-or primitive on old data from the cacheline with the new data for the cacheline to determine a resulting exclusive-or value, performing the idempotent exclusive-or primitive on the resulting exclusive-or value with an old parity to determine a new parity, and writing the new parity to the stripe using the idempotent write primitive.
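The parity arithmetic behind the optimized exclusive-or primitive may be checked with a small numeric sketch; the values are arbitrary illustrations, not values prescribed by the disclosed examples.

```python
# Arbitrary illustrative values for one stripe position.
old_data, new_data = 0b0101, 0b0011
other = 0b1001                       # XOR of the other data cachelines in the stripe
old_parity = old_data ^ other        # parity covering the old data

delta = old_data ^ new_data          # step 1: XOR old data with new data
new_parity = delta ^ old_parity      # step 2: XOR the result with the old parity

# The result is exactly the parity that covers the new data.
assert new_parity == new_data ^ other
```

The old data thus cancels out of the computation, which is what allows the parity to be updated without re-reading every data cacheline in the stripe.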
According to an example, the redundancy controller may set a timer to monitor the duration between requesting a non-idempotent primitive and receiving a completion message for the requested non-idempotent primitive from an associated media controller. In block 620, the redundancy controller may determine that the duration for the non-idempotent primitive has exceeded a predetermined time threshold (e.g., the time threshold has expired). The time threshold, for instance, is a predefined duration in which a completion message to a requested primitive is expected to be received at the redundancy controller from the media controller.
In response to determining an expiration of a time threshold for the non-idempotent primitive, the redundancy controller may read other data in other cachelines in the stripe as shown in block 630 and use RAID to regenerate data that was potentially corrupted by a first transmission of the non-idempotent primitive as shown in block 640. As a result, the redundancy controller may request a write primitive to write the regenerated data to the stripe as shown in block 650. Furthermore, the redundancy controller may reissue the non-idempotent primitive on an alternate fabric route in block 660 and calculate a new parity value by performing an idempotent exclusive-or primitive on the new data with the other data in the stripe as shown in block 670. In this regard, for instance, the redundancy controller may request an idempotent write primitive to write the new parity to the stripe, as shown in block 680. Accordingly, the retransmission sequence is modified in method 600 to avoid data corruption in the stripe and safely use non-idempotent primitives in a protocol-layer end-to-end retransmission-protected memory fabric.
According to another example, if the redundancy controller receives the completion message within the predetermined time threshold, then the redundancy controller may acknowledge that the requested non-idempotent primitive has been completed on the stripe and proceed with the next primitive of the sequence. For example, the redundancy controller may request another non-idempotent primitive in the sequence and iteratively perform the steps in blocks 620-680 as necessary. Once the sequence has completed, the redundancy controller 110 may release the lock for the stripe 301.
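The overall control flow (acquire the stripe lock, issue each primitive, recover on a timeout, then release the lock) can be sketched as follows. The controller interface used here (`acquire_lock`, `issue`, `wait_for_completion`, `recover`, `release_lock`) is hypothetical, not an API from the disclosure:

```python
def run_sequence(controller, stripe, primitives, threshold_s):
    """Issue a locked sequence of primitives with timeout recovery."""
    controller.acquire_lock(stripe)
    try:
        for primitive in primitives:
            controller.issue(primitive)
            if controller.wait_for_completion(primitive, threshold_s):
                continue                   # completed in time: next primitive
            controller.recover(primitive)  # threshold expired: blocks 630-680
    finally:
        controller.release_lock(stripe)    # release the lock once the sequence ends
```

The `try`/`finally` mirrors the requirement that the stripe lock is released whether or not any primitive in the sequence required recovery.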
Some or all of the operations set forth in the methods 500 and 600 may be contained as utilities, programs, or subprograms, in any desired computer accessible medium. In addition, the methods 500 and 600 may be embodied by computer programs, which may exist in a variety of forms both active and inactive. For example, they may exist as machine readable instructions, including source code, object code, executable code or other formats. Any of the above may be embodied on a non-transitory computer readable storage medium.
Examples of non-transitory computer readable storage media include conventional computer system RAM, ROM, EPROM, EEPROM, and magnetic or optical disks or tapes. It is therefore to be understood that any electronic device capable of executing the above-described functions may perform them.
Turning now to FIG. 7, an example computer-readable medium 710 is depicted.
The computer-readable medium 710 may store instructions to perform methods 500 and 600. For example, the computer-readable medium 710 may include machine readable instructions such as lock instructions 712 to request and release a lock to perform a sequence with a non-idempotent primitive on the stripe, sequence instructions 714 to request a sequence with a non-idempotent primitive on the stripe, time threshold instructions 716 to determine whether a time threshold for receiving a completion message for the non-idempotent primitive has expired, and in response to determining the expiration of the time threshold, modifying instructions 718 to initiate a use of RAID to regenerate data that was potentially corrupted by a first transmission of the non-idempotent primitive, initiate a write of the regenerated data to the stripe, initiate a computation of a new parity value, and initiate the idempotent write primitive to write the new parity to the stripe. In this regard, the computer-readable medium 710 may include machine readable instructions to perform methods 500 and 600 when executed by the processor 702.
What has been described and illustrated herein are examples of the disclosure along with some variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the scope of the disclosure, which is intended to be defined by the following claims—and their equivalents—in which all terms are meant in their broadest reasonable sense unless otherwise indicated.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2015/013898 | 1/30/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2016/122637 | 8/4/2016 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5032744 | Wai Yeung | Jul 1991 | A |
5243592 | Perlman et al. | Sep 1993 | A |
5327553 | Jewett et al. | Jul 1994 | A |
5533999 | Hood et al. | Jul 1996 | A |
5546535 | Stallmo et al. | Aug 1996 | A |
5555266 | Buchholz et al. | Sep 1996 | A |
5633996 | Hayashi et al. | May 1997 | A |
5633999 | Clowes et al. | May 1997 | A |
5724046 | Martin et al. | Mar 1998 | A |
5905871 | Buskens et al. | May 1999 | A |
6073218 | DeKoning et al. | Jun 2000 | A |
6081907 | Witty et al. | Jun 2000 | A |
6092191 | Shimbo et al. | Jul 2000 | A |
6141324 | Abbott et al. | Oct 2000 | A |
6151659 | Solomon et al. | Nov 2000 | A |
6181704 | Drottar et al. | Jan 2001 | B1 |
6389373 | Ohya | May 2002 | B1 |
6457098 | DeKoning et al. | Sep 2002 | B1 |
6467024 | Bish et al. | Oct 2002 | B1 |
6490659 | McKean et al. | Dec 2002 | B1 |
6502165 | Kishi et al. | Dec 2002 | B1 |
6510500 | Sarkar et al. | Jan 2003 | B2 |
6542960 | Wong et al. | Apr 2003 | B1 |
6654830 | Taylor et al. | Nov 2003 | B1 |
6735645 | Weber et al. | May 2004 | B1 |
6826247 | Elliott et al. | Nov 2004 | B1 |
6834326 | Wang | Dec 2004 | B1 |
6911864 | Bakker et al. | Jun 2005 | B2 |
6938091 | Das Sharma | Aug 2005 | B2 |
6970987 | Ji | Nov 2005 | B1 |
7366808 | Kano et al. | Apr 2008 | B2 |
7506368 | Kersey et al. | Mar 2009 | B1 |
7738540 | Yamasaki et al. | Jun 2010 | B2 |
7839858 | Wiemann et al. | Nov 2010 | B2 |
7908513 | Ogasawara et al. | Mar 2011 | B2 |
7934055 | Flynn et al. | Apr 2011 | B2 |
7996608 | Chatterjee et al. | Aug 2011 | B1 |
8005051 | Watanabe | Aug 2011 | B2 |
8018890 | Venkatachalam et al. | Sep 2011 | B2 |
8054789 | Boariu et al. | Nov 2011 | B2 |
8103869 | Balandin et al. | Jan 2012 | B2 |
8135906 | Wright et al. | Mar 2012 | B2 |
8161236 | Noveck | Apr 2012 | B1 |
8169908 | Sluiter et al. | May 2012 | B1 |
8171227 | Goldschmidt et al. | May 2012 | B1 |
8332704 | Chang et al. | Dec 2012 | B2 |
8341459 | Kaushik et al. | Dec 2012 | B2 |
8386834 | Goel et al. | Feb 2013 | B1 |
8386838 | Byan et al. | Feb 2013 | B1 |
8462690 | Chang et al. | Jun 2013 | B2 |
8483116 | Chang et al. | Jul 2013 | B2 |
8605643 | Chang et al. | Dec 2013 | B2 |
8619606 | Nagaraja | Dec 2013 | B2 |
8621147 | Galloway et al. | Dec 2013 | B2 |
8667322 | Chatterjee et al. | Mar 2014 | B1 |
8700570 | Marathe et al. | Apr 2014 | B1 |
8793449 | Kimmel | Jul 2014 | B1 |
8812901 | Sheffield, Jr. | Aug 2014 | B2 |
9128948 | Raorane | Sep 2015 | B1 |
9166541 | Funato et al. | Oct 2015 | B2 |
9298549 | Camp et al. | Mar 2016 | B2 |
9621934 | Seastrom | Apr 2017 | B2 |
20010002480 | Dekoning et al. | May 2001 | A1 |
20020162076 | Talagala et al. | Oct 2002 | A1 |
20030037071 | Harris et al. | Feb 2003 | A1 |
20030126315 | Tan et al. | Jul 2003 | A1 |
20040133573 | Miloushev et al. | Jul 2004 | A1 |
20040233078 | Takehara | Nov 2004 | A1 |
20050027951 | Piccirillo et al. | Feb 2005 | A1 |
20050044162 | Liang | Feb 2005 | A1 |
20050120267 | Burton et al. | Jun 2005 | A1 |
20050144406 | Chong, Jr. | Jun 2005 | A1 |
20050157697 | Lee et al. | Jul 2005 | A1 |
20060112304 | Subramanian et al. | May 2006 | A1 |
20060129559 | Sankaran et al. | Jun 2006 | A1 |
20060264202 | Hagmeier et al. | Nov 2006 | A1 |
20070028041 | Hallyal et al. | Feb 2007 | A1 |
20070140692 | Decusatis et al. | Jun 2007 | A1 |
20070168693 | Pittman | Jul 2007 | A1 |
20070174657 | Ahmadian et al. | Jul 2007 | A1 |
20080060055 | Lau | Mar 2008 | A1 |
20080137669 | Balandina et al. | Jun 2008 | A1 |
20080201616 | Ashmore | Aug 2008 | A1 |
20080281876 | Mimatsu | Nov 2008 | A1 |
20090080432 | Kolakeri et al. | Mar 2009 | A1 |
20090259882 | Shellhamer | Oct 2009 | A1 |
20090313313 | Yokokawa et al. | Dec 2009 | A1 |
20100107003 | Kawaguchi | Apr 2010 | A1 |
20100114889 | Rabii | May 2010 | A1 |
20110109348 | Chen et al. | May 2011 | A1 |
20110173350 | Coronado et al. | Jul 2011 | A1 |
20110208994 | Chambliss et al. | Aug 2011 | A1 |
20110213928 | Grube et al. | Sep 2011 | A1 |
20110246819 | Callaway et al. | Oct 2011 | A1 |
20110314218 | Bert | Dec 2011 | A1 |
20120032718 | Chan et al. | Feb 2012 | A1 |
20120059800 | Guo | Mar 2012 | A1 |
20120096329 | Taranta, II | Apr 2012 | A1 |
20120137098 | Wang et al. | May 2012 | A1 |
20120166699 | Kumar et al. | Jun 2012 | A1 |
20120166909 | Schmisseur et al. | Jun 2012 | A1 |
20120201289 | Abdalla et al. | Aug 2012 | A1 |
20120204032 | Wilkins et al. | Aug 2012 | A1 |
20120213055 | Bajpai et al. | Aug 2012 | A1 |
20120297272 | Bakke et al. | Nov 2012 | A1 |
20120311255 | Chambliss | Dec 2012 | A1 |
20130060948 | Ulrich et al. | Mar 2013 | A1 |
20130128721 | Decusatis et al. | May 2013 | A1 |
20130128884 | Decusatis et al. | May 2013 | A1 |
20130138759 | Chen et al. | May 2013 | A1 |
20130148702 | Payne | Jun 2013 | A1 |
20130227216 | Cheng et al. | Aug 2013 | A1 |
20130246597 | Iizawa | Sep 2013 | A1 |
20130297976 | McMillen | Nov 2013 | A1 |
20130311822 | Kotzur et al. | Nov 2013 | A1 |
20130312082 | Izu et al. | Nov 2013 | A1 |
20140067984 | Danilak | Mar 2014 | A1 |
20140095865 | Yerra et al. | Apr 2014 | A1 |
20140115232 | Goss et al. | Apr 2014 | A1 |
20140136799 | Fortin | May 2014 | A1 |
20140269731 | Decusatis et al. | Sep 2014 | A1 |
20140281688 | Tiwari et al. | Sep 2014 | A1 |
20140304469 | Wu | Oct 2014 | A1 |
20140331297 | Innes et al. | Nov 2014 | A1 |
20150012699 | Rizzo et al. | Jan 2015 | A1 |
20150095596 | Yang et al. | Apr 2015 | A1 |
20150146614 | Yu et al. | May 2015 | A1 |
20150199244 | Venkatachalam et al. | Jul 2015 | A1 |
20150288752 | Voigt | Oct 2015 | A1 |
20160034186 | Weiner et al. | Feb 2016 | A1 |
20160170833 | Segura et al. | Jun 2016 | A1 |
20160196182 | Camp et al. | Jul 2016 | A1 |
20160226508 | Kurooka et al. | Aug 2016 | A1 |
20170253269 | Kanekawa et al. | Sep 2017 | A1 |
20170302409 | Sherlock | Oct 2017 | A1 |
20170346742 | Shahar et al. | Nov 2017 | A1 |
Number | Date | Country |
---|---|---|
101576805 | Nov 2009 | CN |
102521058 | Jun 2012 | CN |
10433358 | Feb 2015 | CN |
1347369 | Nov 2012 | EP |
1546MUM2013 | Mar 2015 | IN |
201346530 | Nov 2013 | TW |
WO-02091689 | Nov 2002 | WO |
2014120136 | Aug 2014 | WO |
Entry |
---|
International Search Report and Written Opinion; PCT/US2015/013898; dated Oct. 8, 2015; 11 pages. |
Paulo Sergio Almeida et al; Scalable Eventually Consistent Counters Over Unreliable Networks; Jul. 12, 2013; 32 Pages. |
Amiri, K. et al., Highly Concurrent Shared Storage, (Research Paper), Sep. 7, 1999, 25 Pages. |
International Search Report and Written Opinion; PCT/US2014/053704; dated May 15, 2015; 13 pages. |
Razavi, B. et al., “Design Techniques for High-Speed, High-Resolution Comparators,” (Research Paper), IEEE Journal of Solid-State Circuits 27.12, Dec. 12, 1992, pp. 1916-1926, http://www.seas.ucla.edu/brweb/paper/Journals/R%26WDec92_2.pdf. |
Xingyuan, T. et al., “An Offset Cancellation Technique in a Switched-Capacitor Comparator for SAR ADCs,” (Research Paper), Journal of Semiconductors 33.1, Jan. 2012, 5 pages, http://www.jos.ac.cn/bdtxbcn/ch/reader/create_pdf.aspx?file_no=11072501. |
Mao, Y. et al., A New Parity-based Migration Method to Expand Raid-5, (Research Paper), Nov. 4, 2013, 11 Pages. |
Kimura et al., “A 28 Gb/s 560 mW Multi-Standard SerDes With Single-Stage Analog Front-End and 14-Tap Decision Feedback Equalizer in 28 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 49, No. 12, Dec. 2014, pp. 3091-3103. |
International Searching Authority, The International Search Report and the Written Opinion, dated Feb. 26, 2015, 10 Pages. |
Kang, Y. et al., “Fault-Tolerant Flow Control in On-Chip Networks,” (Research Paper), Proceedings for IEEE, May 3-6, 2010, 8 pages, available at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.7865&rep=rep1&type=pdf. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2015/023708, dated Apr. 22, 2016, 11 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2014/062196, dated Jun. 30, 2015, 13 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2015/023708, dated Oct. 12, 2017, 10 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2015/013817, dated Aug. 10, 2017, 9 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2014/062196, dated May 4, 2017, 12 pages. |
EMC2; High Availability and Data Protection with EMC Isilon Scale-out NAS, (Research Paper); Nov. 2013; 36 Pages. |
Do I need a second RAID controller for fault-tolerance ?, (Research Paper); Aug. 22, 2010; 2 Pages; http://serverfault.com/questions/303869/do-i-need-a-second-raid-controller-for-fault-tolerance. |
Li, M. et al.: Toward I/O-Efficient Protection Against Silent Data Corruptions in RAID Arrays, (Research Paper); Jun. 2-6, 2014; 12 Pages. |
International Search Report and Written Opinion; PCT/US2015/013921; dated Oct. 28, 2015; 12 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2015/013921, dated Oct. 28, 2015, 10 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2015/013898, dated Oct. 8, 2015, 9 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2015/013817, dated Oct. 29, 2015, 11 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2014/053704, dated May 15, 2015, 11 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2014/049193, dated Feb. 26, 2015, 8 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2015/013921, dated Aug. 10, 2017, 9 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2015/013898, dated Aug. 10, 2017, 8 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2014/053704, dated Mar. 16, 2017, 10 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2014/049193, dated Feb. 9, 2017, 7 pages. |
Number | Date | Country | |
---|---|---|---|
20170242753 A1 | Aug 2017 | US |