The present disclosure is directed to a redundant array of independent disks (RAID) system, and more particularly to a RAID system configured to maintain data consistency during simultaneous input/output commands (requests) to overlapping blocks of data.
RAID systems comprise data storage virtualization technology that combines multiple disk drive components into a logical unit for the purposes of data redundancy and performance improvement. RAID systems can employ various configurations for storing and maintaining data redundancy. One of these configurations is referred to as RAID 5, which comprises block-level striping with distributed parity. The parity information is distributed among the drives that comprise the RAID 5 system.
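For purposes of illustration only, the following minimal sketch shows how a RAID 5 parity block may be computed as the bitwise exclusive OR of the data blocks in a stripe; the function and parameter names are illustrative and are not elements of the disclosed system.

#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative RAID 5 parity computation: the parity block of a stripe is
 * the bitwise XOR of that stripe's data blocks. A lost data block can be
 * rebuilt by XOR-ing the parity block with the surviving data blocks.
 */
static void compute_stripe_parity(const uint8_t *const data_blocks[],
                                  size_t num_data_blocks,
                                  uint8_t *parity_block,
                                  size_t block_len)
{
    for (size_t i = 0; i < block_len; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < num_data_blocks; d++)
            p ^= data_blocks[d][i];
        parity_block[i] = p;
    }
}

In a RAID 5 configuration, the drive that holds the parity block rotates from stripe to stripe, which distributes the parity update workload across the drives.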
A controller for maintaining data consistency without utilizing region locks is disclosed. In one or more embodiments, the controller is connected to multiple physical disk drives, and the physical disk drives include a data portion and a parity data portion that corresponds to the data portion. The controller can receive a first input/output (I/O) command from a first computing device for writing write data to the data portion and a second I/O command from a second computing device for accessing data from the data portion (e.g., another write operation, etc.). The first I/O command and the second I/O command are for accessing the data portion simultaneously. The controller allocates a first buffer for storing data associated with the first I/O command and allocates a second buffer for storing data associated with a logical operation for maintaining data consistency. The controller initiates the logical operation, which comprises an exclusive OR operation directed to the write data and the accessed data to obtain resultant exclusive OR data, causes the write data to be copied to the data portion, and causes the resultant exclusive OR data to be stored in the second buffer.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Written Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The Written Description is described with reference to the accompanying figures. The use of the same reference numbers in different instances in the description and the figures may indicate similar or identical items.
As shown in
In one or more embodiments of the present disclosure, the system 100 utilizes Small Computer System Interface (SCSI) communication protocols. For example, the system 100 may utilize XDWRITEREAD command protocols, XPWRITE command protocols, and the like.
As shown, the RAID system 100 also includes multiple cache buffers 114, 116, 118. In one or more embodiments of the present disclosure, the system 100 includes multiple buffers 114, buffers 116, and buffers 118. These buffers, as described in greater detail below, are utilized to store data associated with the physical disk drives 112. For instance, the buffer 114 may be a data buffer for storing data for a write back volume, the buffer 116 may be a parity buffer for storing parity data and/or resultant data of a logical operation (e.g., output data associated with an exclusive OR operation), and the buffer 118 may be a temporary buffer for storing data and parity data. The buffers 114, 116, 118 each have a respective buffer state for indicating the respective buffer's availability. For instance, a buffer state can comprise “Not in use” to indicate that the buffer is not allocated. A buffer state can also comprise “Ready to use” to indicate the buffer is allocated and there is no operation pending. A buffer state can also comprise “Busy” to indicate a read or write operation is in progress for the respective buffer and the buffer cannot be utilized (e.g., an XDWRITE or an XDWRITEREAD is in progress for the respective buffer). The buffer states can be maintained utilizing bits within the respective buffers 114, 116, 118.
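As an illustrative sketch only, the buffer types and buffer states described above may be represented by a descriptor such as the following; the type and field names are hypothetical and do not limit the disclosure.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer descriptor reflecting the buffer states described above. */
enum buffer_state {
    BUF_NOT_IN_USE,   /* buffer is not allocated                                     */
    BUF_READY_TO_USE, /* buffer is allocated and no operation is pending             */
    BUF_BUSY          /* a read/write (e.g., XDWRITE or XDWRITEREAD) is in progress  */
};

enum buffer_kind {
    BUF_DATA,   /* e.g., buffer 114: data for a write back volume         */
    BUF_PARITY, /* e.g., buffer 116: parity data / exclusive OR result    */
    BUF_TEMP    /* e.g., buffer 118: temporary data and parity data       */
};

struct cache_buffer {
    enum buffer_kind     kind;
    enum buffer_state    state;   /* maintained utilizing bits within the buffer          */
    struct cache_buffer *linked;  /* next buffer holding data for the same data portion   */
    uint8_t             *payload;
    size_t               len;
};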
As described herein in greater detail, the system 100 comprises a RAID controller 102 that maintains data integrity while processing simultaneous I/O requests (e.g., I/O requests issued to overlapping blocks of memory within the physical disk drives 112 at or near the same time) without utilizing region locks. For example, a first computing device 106 may issue an I/O request for accessing a first data stripe region, and a second computing device 106 may issue an I/O request for accessing the first data stripe region during the same time period of the I/O request from the first computing device 106. In some instances, based upon the I/O request, the corresponding parity data may need to be updated. In one or more embodiments of the present disclosure, the RAID controller 102 allocates a first data buffer 114 (and a first parity buffer 116) for the first I/O request issued from the first computing device 106 and allocates a second data buffer 114 (and a second parity buffer 116) for the second I/O request issued from the second computing device 106.
The first data buffer 114 stores data associated with the first I/O request, and the second data buffer 114 stores data associated with the second I/O request. For instance, if the first I/O request is a read request, the first data buffer 114 stores the read data from the corresponding region in the physical disk drive 112. If the second I/O request is a write request, the second data buffer 114 stores the write data for the corresponding region in the physical disk drive 112. In this instance, the data is consistent independent of which direct memory access (DMA) command (e.g., read command or write command) completes first.
In an embodiment of the present disclosure, the RAID controller 102 receives a write request from a computing device 106 (e.g., a host). The RAID controller 102 allocates a data buffer (e.g., a buffer 114) and a buffer for storing data associated with a logical operation (e.g., a buffer 116) in response to receiving the write request. The RAID controller 102 may copy the write data (e.g., data to be written per the write request) to the data buffer utilizing direct memory access protocols. The RAID controller 102 can then initiate a logical operation. For instance, the RAID controller 102 issues an XDWRITEREAD command to the corresponding data region of the physical disk drive 112 for the accessed stripe region (e.g., writes the write data to the data region with the source buffer indicated as the data buffer and the destination buffer indicated as the exclusive OR buffer). The source and destination buffer addresses are provided utilizing the XDWRITEREAD command such that the resultant XOR data generated by the memory device or physical disk drive can be copied to the destination buffer. Utilizing an XPWRITE command, the RAID controller 102 writes the resulting data to the corresponding parity data region in the physical disk drive 112. Thus, the RAID system 100 is configured to maintain parity data consistency without utilizing region locks.
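The write path described above may be sketched as follows. The sketch assumes hypothetical helper routines (alloc_data_buffer, alloc_xor_buffer, dma_copy_from_host, issue_xdwriteread, issue_xpwrite, parity_lba_for) standing in for the controller's buffer manager, DMA engine, and SCSI command layer; they are not an actual controller or driver interface.

#include <stdint.h>

struct raid_controller;                                /* hypothetical controller context  */
struct cache_buffer;                                   /* buffer descriptor sketched above */
struct host_write { uint64_t lba; uint32_t num_blocks; };

extern struct cache_buffer *alloc_data_buffer(struct raid_controller *c);
extern struct cache_buffer *alloc_xor_buffer(struct raid_controller *c);
extern void dma_copy_from_host(struct raid_controller *c, const struct host_write *req,
                               struct cache_buffer *dst);
extern void issue_xdwriteread(struct raid_controller *c, uint64_t lba, uint32_t nblocks,
                              struct cache_buffer *src, struct cache_buffer *xor_dst);
extern void issue_xpwrite(struct raid_controller *c, uint64_t parity_lba, uint32_t nblocks,
                          struct cache_buffer *xor_src);
extern uint64_t parity_lba_for(struct raid_controller *c, uint64_t lba);

int handle_host_write(struct raid_controller *ctrl, const struct host_write *req)
{
    /* Allocate a data buffer (e.g., buffer 114) and an exclusive OR buffer (e.g., buffer 116). */
    struct cache_buffer *data_buf = alloc_data_buffer(ctrl);
    struct cache_buffer *xor_buf  = alloc_xor_buffer(ctrl);
    if (data_buf == NULL || xor_buf == NULL)
        return -1;

    /* Copy the write data from the host into the data buffer via DMA. */
    dma_copy_from_host(ctrl, req, data_buf);

    /*
     * XDWRITEREAD: write the new data to the data region, with the data
     * buffer as the source and the exclusive OR buffer as the destination
     * for the resultant XOR data generated by the drive.
     */
    issue_xdwriteread(ctrl, req->lba, req->num_blocks, data_buf, xor_buf);

    /*
     * XPWRITE: the drive holding the parity region combines the supplied
     * XOR data with the existing parity, updating the parity without a
     * region lock.
     */
    issue_xpwrite(ctrl, parity_lba_for(ctrl, req->lba), req->num_blocks, xor_buf);

    return 0;
}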
As shown in
If the buffer state of the buffer is busy and there was a cache hit (YES from Decision Block 404), a temporary buffer is allocated (Block 410). For example, when the RAID controller 102 determines the buffer state of the buffer 114 is busy and there was a cache hit, the RAID controller 102 allocates a temporary buffer 118 (e.g., a second buffer), and the RAID controller 102 links the temporary buffer 118 with the buffer 114. The write data is transferred to the buffer (Block 408). For example, the RAID controller 102 utilizes direct memory access protocols to cause the write data associated with the write command to be stored in the buffer 118. The RAID controller 102 then indicates to the computing device 106 that the write command is complete.
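As a sketch only, the handling of a write that hits a busy cache buffer may resemble the following, reusing the hypothetical descriptor and helpers from the earlier sketches; cache_lookup behavior is implied, and alloc_temp_buffer and complete_write_to_host are likewise hypothetical.

extern struct cache_buffer *alloc_temp_buffer(struct raid_controller *c);
extern void complete_write_to_host(struct raid_controller *c, const struct host_write *req);

/* Hypothetical handling of a write that hits a cached buffer (Decision Block 404, Block 410). */
void handle_write_on_cache_hit(struct raid_controller *ctrl, const struct host_write *req,
                               struct cache_buffer *hit_buf)
{
    if (hit_buf->state == BUF_BUSY) {
        /* Busy buffer: allocate a temporary buffer (e.g., buffer 118) and link it
         * to the hit buffer so both refer to the same data portion. */
        struct cache_buffer *temp = alloc_temp_buffer(ctrl);
        temp->linked    = hit_buf->linked;
        hit_buf->linked = temp;
        dma_copy_from_host(ctrl, req, temp);
    } else {
        /* Buffer available: transfer the write data to it directly. */
        dma_copy_from_host(ctrl, req, hit_buf);
    }
    /* The write can then be reported to the computing device as complete. */
    complete_write_to_host(ctrl, req);
}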
A determination of whether linked buffers exist occurs (Decision Block 510). The RAID controller 102 determines whether linked buffers (e.g., data buffers) exist. Linked buffers comprise buffers that contain data for the same data portion within the physical disk drives 112. If linked buffers exist (YES from Decision Block 510), the next linked buffer is identified (Block 512). For example, the RAID controller 102 identifies the next linked buffer (e.g., linked buffer 114), which, as described above, is flushed. If there are no linked buffers (NO from Decision Block 510), the exclusive OR buffer is invalidated (Block 516). For instance, the RAID controller 102 invalidates the respective buffer 116 when the data stored in the buffer 116 has been stored in the corresponding parity drive portion of the physical disk drive. The RAID controller 102 determines whether there are any remaining dirty data buffers to be flushed. If there are remaining dirty data buffers to be flushed, the next dirty data buffer is identified and, as described above, is flushed. While the present disclosure discusses utilization of specific SCSI commands (e.g., XDWRITEREAD and XPWRITE), it is understood that other commands representing the same functionality may be utilized without departing from the spirit of the disclosure.
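The flush path discussed above may be sketched as follows, again with hypothetical helpers (flush_buffer_to_disk, invalidate_buffer) and the linked field from the earlier descriptor sketch; the sketch is illustrative only.

extern void flush_buffer_to_disk(struct raid_controller *c, struct cache_buffer *buf);
extern void invalidate_buffer(struct raid_controller *c, struct cache_buffer *buf);

/* Hypothetical flush of a chain of linked data buffers (Decision Block 510 onward). */
void flush_linked_buffers(struct raid_controller *ctrl,
                          struct cache_buffer *data_buf,
                          struct cache_buffer *xor_buf)
{
    struct cache_buffer *cur = data_buf;

    while (cur != NULL) {                 /* linked buffers exist (Decision Block 510)   */
        flush_buffer_to_disk(ctrl, cur);  /* flush the dirty data buffer                 */
        cur = cur->linked;                /* identify the next linked buffer (Block 512) */
    }

    /* No linked buffers remain: the parity drive portion holds the XOR
     * result, so the exclusive OR buffer can be invalidated (Block 516). */
    invalidate_buffer(ctrl, xor_buf);
}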
Generally, any of the functions described herein can be implemented using hardware (e.g., fixed logic circuitry such as integrated circuits), software, firmware, manual processing, or a combination of these embodiments. Thus, the blocks discussed in the above disclosure generally represent hardware (e.g., fixed logic circuitry such as integrated circuits), software, firmware, or a combination thereof. In the instance of a hardware embodiment, for instance, the various blocks discussed in the above disclosure may be implemented as integrated circuits along with other functionality. Such integrated circuits may include all of the functions of a given block, system or circuit, or a portion of the functions of the block, system or circuit. Further, elements of the blocks, systems or circuits may be implemented across multiple integrated circuits. Such integrated circuits may comprise various integrated circuits including, but not necessarily limited to: a monolithic integrated circuit, a flip chip integrated circuit, a multichip module integrated circuit, and/or a mixed signal integrated circuit. In the instance of a software embodiment, for instance, the various blocks discussed in the above disclosure represent executable instructions (e.g., program code) that perform specified tasks when executed on a processor. These executable instructions can be stored in one or more tangible computer readable media. In some such instances, the entire system, block or circuit may be implemented using its software or firmware equivalent. In other instances, one part of a given system, block or circuit may be implemented in software or firmware, while other parts are implemented in hardware.
Although the subject matter has been described in language specific to structural features and/or process operations, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.