The present invention relates to direct memory access engines, and more particularly, to a method and apparatus for transferring data between RAID controllers using a direct memory access engine.
Redundant Array of Inexpensive Disks (RAID) systems are well-known systems that can help increase the availability of stored data in network storage systems. Such systems typically include several hard disk drives that store data in such a way that, if one disk drive fails, the data can still be recovered from the system. Such systems generally have a network storage bridge, which acts as an interface between a host computer and an array of disk drives. To further enhance the availability of data in such a system, it is common to have redundant storage controllers within the network storage bridge, such that if one controller fails, the remaining controller can continue read and write operations to the array of disk drives.
In a fully redundant storage bridge, user data must be temporarily stored twice: once in a primary controller and once in its redundant, or secondary, counterpart. In this arrangement, if the primary controller is damaged and unable to continue operation, the secondary controller retains a copy of the user data. Data thus remains available if one controller fails, because either the primary or the secondary copy of the data will be delivered to its final destination, the array of disk drives.
Typically, and with reference to the prior art drawing, a traditional method for mirroring data between controllers is for the primary controller 10 to notify the secondary controller 18 that data is going to be mirrored. The primary controller 10 then transfers metadata to the secondary controller 18. The metadata is the data contained in the description tables. Following the metadata, the primary controller 10 transfers the user data to the secondary controller 18. The user data is stored in a local memory 26 associated with the primary controller 10 for transfer to a remote memory 30 associated with the secondary controller 18. In some cases, these transfers are initiated using direct memory access (DMA) and a DMA engine 34. In such a situation, a processing portion 38 within the primary controller 10 gives the DMA engine 34 a transfer command to transfer data from the primary controller 10 to the secondary controller 18. This transfer command typically includes the data contained in the description tables, which identifies the data to be transferred to the secondary controller 18. The processing portion 38 generally builds a DMA table containing the data from the description tables in a form usable by the DMA engine 34. This DMA table is then loaded into the DMA engine 34 along with the transfer command. The DMA engine 34 receives the data from the processing portion 38, stores it in a memory 42 associated with the DMA engine 34, and then conducts the data transfer.
As can be seen, the processing portion 38 must build a DMA table and transfer that table from the processing portion to the DMA engine 34. Thus, this data is stored in a memory 46 associated with the processing portion 38, configured into a form usable by the DMA engine 34, and then transferred to and stored in the memory 42 associated with the DMA engine 34. It would be advantageous to reduce the amount of processing overhead involved in a mirroring transaction, thereby improving system performance. Accordingly, it would be advantageous to have a DMA engine that does not require a processing portion to create a DMA table. Furthermore, it would be advantageous to reduce the internal memory required in a DMA engine, thereby reducing the silicon area required for such a DMA engine.
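To make the prior-art overhead concrete, the following is a minimal C sketch of the work the processing portion 38 must perform before the DMA engine 34 can start: every description-table entry is rebuilt into a DMA-specific descriptor and the whole table is then copied into the engine's private memory 42. All structure and function names here (desc_entry, dma_desc, prior_art_mirror, and so on) are hypothetical illustrations, not taken from the patent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry in the controller's description tables. */
struct desc_entry {
    uint64_t addr;   /* location of the user data in local memory 26 */
    uint32_t length; /* number of bytes to mirror                     */
};

/* Hypothetical descriptor format required by the prior-art DMA engine 34. */
struct dma_desc {
    uint64_t src;
    uint64_t dst;
    uint32_t length;
    uint32_t flags;
};
#define DMA_DESC_LAST 0x1u

/* The processing portion 38 rebuilds every description-table entry into a
 * DMA table in its own memory 46, then copies that table into the memory 42
 * of the DMA engine before issuing the transfer command. */
static void prior_art_mirror(const struct desc_entry *tbl, size_t n,
                             uint64_t remote_base,
                             struct dma_desc *cpu_copy,            /* memory 46 */
                             volatile struct dma_desc *engine_mem) /* memory 42 */
{
    for (size_t i = 0; i < n; i++) {
        cpu_copy[i].src    = tbl[i].addr;
        cpu_copy[i].dst    = remote_base + tbl[i].addr; /* mirrored offset */
        cpu_copy[i].length = tbl[i].length;
        cpu_copy[i].flags  = (i == n - 1) ? DMA_DESC_LAST : 0;
    }
    /* Second copy: the whole table is loaded into the engine's memory,
     * followed by writing the transfer command to the engine. */
    memcpy((void *)engine_mem, cpu_copy, n * sizeof(*cpu_copy));
}
```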
This invention allows a hardware-based DMA engine to use the description tables stored in the processing portion directly to transfer data from the primary controller to the secondary controller through a dedicated, redundant set of data paths internal to the data bridge subsystem. Thus, the processing portion of the primary controller is not required to build a DMA table for use by the DMA engine. This transfer is accomplished via an internal data path from the primary controller to the secondary controller, without the data ever leaving the storage bridge itself. Furthermore, the DMA engine uses the data tables as-is, without modification, thus reducing processor overhead. As the data is being transferred, the tables are updated, and upon completion of a transfer a completion status is posted to the processing portion. Since the DMA engine accesses the data table present in the processing portion's memory, the processing portion does not have to load or build the table in the DMA memory. This access is accomplished independently of the processing portion, thereby reducing processor overhead as well as allowing for a reduced memory size for the DMA engine, saving silicon area on the chip containing the DMA engine.
The engine is also capable of performing XOR (exclusive OR) operations on the user data based on the same tables as the DMA transfer. In fact, the XOR and the DMA of the data can be performed concurrently, reducing overhead further.
The DMA engine of this invention thus results in savings in the gate count and silicon area required for its implementation, a reduction in CPU (central processing unit) processing time and thus in command overhead, and a reduction in software development time, since no additional data structures are needed for the DMA engine, which transfers data independently of the processor.
The invention is implemented as a module inside a memory controller 50. This device, illustrated in the accompanying drawing, contains the DMA engine 70 described below and operates in conjunction with the system CPU 78 and its memory 74, the local DDR-SDRAM user data memory 82, and the PCI-XA and PCI-XB ports 54 and 58.
The data manipulated by the DMA engine 70 does not have to be contiguous. The CPU tables typically represent the scatter-gather nature of the RAID data storage (see the accompanying drawing).
The DMA table consists of destination entries only. Each table entry consists of sub-fields referred to as elements, and each element describes the location and the amount of data to copy. A “LAST” bit at the end of the table instructs the DMA engine 70 to complete the operation.
The data copy table consists of source and destination entries. This command is designed to copy data inside the local controller memory 82. Each element describes the location and the amount of data to copy. The source table describes the locations and amounts of data located in the primary controller 50, while the destination table describes the locations in the secondary controller to which the data is to be moved. The “LAST” bit at the end of the source and destination tables instructs the DMA engine 70 to complete the operation.
The XOR table resembles the data copy table, except that it has multiple source tables. The data locations and amounts described by these source tables are XORed, and the result is moved to the locations described by the destination table. Again, the “LAST” bit at the end of the source and destination tables instructs the DMA engine 70 to complete the operation. In addition, the “Source Count” field tells the DMA engine 70 the number of source data pointers to XOR. Since the tables are not stored inside the DMA engine 70 itself, the theoretical number of sources is unlimited and does not impact the size of the DMA engine 70.
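As a concrete illustration of these table formats, the following C sketch shows one plausible in-memory layout for a scatter-gather element and for the copy and XOR table groupings. The “LAST” flag and the “Source Count” field come from the description above; the field widths, ordering, and names (sg_element, copy_tables, xor_tables) are assumptions made purely for illustration.

```c
#include <stdint.h>

/* One scatter-gather element: where the data is and how much to move.
 * Field widths and ordering are assumed for illustration only. */
struct sg_element {
    uint64_t address;     /* location of the data                        */
    uint32_t byte_count;  /* amount of data to copy                      */
    uint32_t flags;       /* bit 0: "LAST" - final element of this table */
};
#define SG_FLAG_LAST 0x1u

/* A table is simply an array of elements terminated by an element whose
 * LAST flag is set. The DMA (mirror) table consists of destination
 * entries only. */

/* Data copy command: one source table and one destination table. */
struct copy_tables {
    struct sg_element *source;       /* data in the primary controller */
    struct sg_element *destination;  /* where the data is to be moved  */
};

/* XOR command: several source tables and one destination table. The
 * "Source Count" field tells the engine how many source tables to XOR;
 * because the tables stay in CPU memory, this count is not limited by
 * any storage inside the DMA engine. */
struct xor_tables {
    uint32_t            source_count;
    struct sg_element **sources;     /* source_count source tables */
    struct sg_element  *destination;
};
```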
Once the DMA engine 70 operation has started, the system CPU 78 has the option of enqueuing another command into the DMA engine 70, thus reducing command overhead to almost zero, or alternatively of monitoring the progress and waiting for a command completion interrupt. Once the DMA engine 70 has completed all required data transfers, it issues a maskable interrupt and a status to the CPU 78 and ceases operation.
The DMA engine 70 has additional system benefits. It can be used to quickly initialize all memory to a predetermined data pattern. Another benefit of the DMA engine 70 is that it can be used to quickly check memory for errors, if a checksum or an ECC is used in conjunction with the temporary storage memory, as is the case in typical redundant systems.
Both the XOR engine portion 86 and the DMA engine portion 90 of the DMA engine 70 use common scatter-gather (S-G) list structures. The S-G lists reside in the CPU memory 74, and the DMA engine 70 is responsible for extracting information from these lists and performing the necessary data movement and manipulation. Unlike the S-G lists, the source of the actual user data being manipulated is always the local DDR-SDRAM user data buffer 82. The data transfer destination for XOR operations is also the local DDR-SDRAM memory 82. The data destination for DMA operations is either the PCI-XA port 54 or the PCI-XB port 58, depending on the command.
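The routing described in this paragraph can be summarized by a small dispatch helper. This is a minimal sketch under the assumption of a simple command-type enumeration; the names (cmd_kind, data_destination, destination_for) are illustrative and not taken from the patent.

```c
/* Where the engine writes its output, per the description above: XOR
 * results always land in the local DDR-SDRAM buffer, while DMA transfers
 * go out one of the two PCI-X ports, chosen by the command itself.
 * All names here are hypothetical. */
enum cmd_kind        { CMD_XOR, CMD_DMA_PORT_A, CMD_DMA_PORT_B };
enum data_destination { DEST_LOCAL_DDR_SDRAM, DEST_PCI_XA_PORT, DEST_PCI_XB_PORT };

static enum data_destination destination_for(enum cmd_kind kind)
{
    switch (kind) {
    case CMD_XOR:        return DEST_LOCAL_DDR_SDRAM; /* memory 82 */
    case CMD_DMA_PORT_A: return DEST_PCI_XA_PORT;     /* port 54   */
    case CMD_DMA_PORT_B: return DEST_PCI_XB_PORT;     /* port 58   */
    }
    return DEST_LOCAL_DDR_SDRAM; /* unreachable with valid input */
}
```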
Within the command registers are several fields. The memory commands start immediately after the command (CMD) field of the enqueue register. The currently defined legal commands correspond to the table types described above (DMA, data copy, and XOR).
The source count field (SrcCnt) defines the number of source scatter-gather (S-G) lists that the command is required to process. There is only one destination S-G list. The source count field is relevant only for XOR commands; the DMA commands use the one allowed destination list as both source and destination and do not check the SrcCnt field. The current command register displays the current (live) source list number being processed.

The S-G address pointer field identifies which S-G list structure is to be used for a given command. The field is used both as the starting address of the list (when the command is enqueued) and as a command identifier (when the status register is examined). The current command register always shows a “live” version of the address pointing to the list currently being processed. Note that this address pointer is required to point to an existing address in the CPU memory range.

The stat field reflects command initialization (start) and completion statuses. A value of “00” in the status register signifies that there are no statuses pending. Clearing the status field automatically makes the register available for the initialization/completion status of the next command. Note that clearing this field also clears the interrupt associated with this command's initialization or completion, provided the interrupt is enabled.
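A minimal C sketch of how software might view these command register fields is shown below. The field names (CMD, SrcCnt, the S-G address pointer, and stat) come from the description above; the bit widths, packing, and the idea of a single register image are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Illustrative software view of a command register image. The layout is
 * assumed; only the field names follow the description above. */
struct dma_command {
    uint32_t cmd;      /* CMD: which memory command to run                   */
    uint32_t src_cnt;  /* SrcCnt: number of source S-G lists (XOR commands)  */
    uint64_t sg_addr;  /* S-G address pointer: start of the list in CPU
                          memory; also serves as the command identifier      */
    uint32_t stat;     /* stat: initialization/completion status,
                          0x00 means no status pending                       */
};

/* Enqueuing a command amounts to writing such an image into the enqueue
 * register; the current command register mirrors the command being
 * processed, with a "live" sg_addr and source list number. */
```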
The command progress register reflects the flow of the commands being enqueued, currently executed, and completed. As a command is enqueued, the CMD_ENQD bit in this register is asserted to signify that a command has been enqueued but has not yet started. This bit remains asserted until the command starts. If no other command is being executed, this occurs immediately; however, if another command in the queue is being executed, the bit remains set until the previous command's execution is completed. Another command can be enqueued only after the currently enqueued command has started and this bit has been cleared by the hardware.
Once the command starts, the CMD_INPROG bit is asserted. At the same time, a maskable “Command Started” interrupt is set. The significance of this event is that once this interrupt is set, another command can be enqueued. Again, this new command will remain enqueued until the current command's execution is completed.
Upon completion of either the DMA or the XOR operation, the CMD_COMPL bit is asserted, signifying that the command has completed all data movement and/or manipulation. If the command status register is empty (no previous command completion statuses are pending), assertion of this bit will be momentary. However, if the status register is busy, this bit will remain set, signifying that the command is waiting to post its completion status as soon as the status register becomes available for posting.
The command status register contains a valid status when a maskable command completion interrupt is set. The CMD_STAT bit in the command progress register will also be set at the same time. Clearing status bits in the command status register clears both the interrupt and the CMD_STAT bit. It also makes the status register available for the next command completion status.
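Putting the progress and status registers together, the following C sketch illustrates one plausible driver-side flow: wait for CMD_ENQD to clear before enqueuing the next command, and on a completion interrupt read and then clear the status, which also clears CMD_STAT and the interrupt. The register accessors, bit positions, and clear-by-writing-zero behavior are all hypothetical; only the bit names come from the description above.

```c
#include <stdint.h>

/* Bit names follow the description above; bit positions are assumed. */
#define CMD_ENQD   (1u << 0)  /* command enqueued but not yet started         */
#define CMD_INPROG (1u << 1)  /* command currently executing                  */
#define CMD_COMPL  (1u << 2)  /* data movement done, waiting to post status   */
#define CMD_STAT   (1u << 3)  /* valid status present in the status register  */

/* Hypothetical platform-provided register accessors. */
uint32_t read_progress_reg(void);
uint32_t read_status_reg(void);
void     write_status_reg(uint32_t value);
void     write_enqueue_reg(uint64_t sg_list_addr, uint32_t cmd);

/* Enqueue the next command as soon as the hardware allows it: CMD_ENQD must
 * be clear, meaning the previously enqueued command has already started. */
void enqueue_when_ready(uint64_t sg_list_addr, uint32_t cmd)
{
    while (read_progress_reg() & CMD_ENQD)
        ;                                  /* previous command not yet started */
    write_enqueue_reg(sg_list_addr, cmd);  /* keeps the engine busy            */
}

/* Completion interrupt handler: consume the posted status, then clear it,
 * which also clears CMD_STAT and the associated interrupt. */
void dma_completion_isr(void)
{
    if (read_progress_reg() & CMD_STAT) {
        uint32_t status = read_status_reg();
        (void)status;            /* inspect the completed command's status */
        write_status_reg(0);     /* assumed clear-by-writing-zero          */
    }
}
```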
The S-G list structure is illustrated in the accompanying drawing.
At block 124, the DMA engine portion 90 determines if the last element flag is set in the scatter/gather element. If the last element flag is set, the DMA engine portion 90 marks the command status as complete in the command queue and sends a command complete message to the CPU 78, as noted by block 128. The mirroring operation is then complete, as noted by block 132. If at block 124 the DMA engine portion 90 determines that the last element flag is not set, the DMA engine portion 90 retrieves the next sequential scatter/gather element from CPU memory 74, as noted by block 136. The DMA engine portion 90, according to block 140, then copies data from the local DDR-SDRAM memory 82 location indicated by the scatter/gather element to a remote DDR-SDRAM location which corresponds to the location indicated by the scatter/gather element. The DMA engine portion 90 then repeats the operations associated with blocks 124 through 140.
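The mirroring loop of blocks 124 through 140 can be sketched in C roughly as follows. The element layout and the next_element, copy_to_remote, and post_completion_to_cpu helpers are hypothetical stand-ins for the hardware's internal behavior; only the control flow (copy, check the LAST flag, advance to the next element, post completion) follows the description above.

```c
#include <stdint.h>

/* Hypothetical scatter-gather element as read from CPU memory 74. */
struct sg_element {
    uint64_t address;     /* location in local DDR-SDRAM memory 82 */
    uint32_t byte_count;  /* amount of data to mirror              */
    uint32_t flags;       /* bit 0: "LAST" element flag            */
};
#define SG_FLAG_LAST 0x1u

/* Hypothetical helpers standing in for the engine's internal operations. */
const struct sg_element *next_element(const struct sg_element *e);
void copy_to_remote(uint64_t local_addr, uint32_t len); /* local -> remote DDR-SDRAM */
void post_completion_to_cpu(void);

/* Blocks 124-140: after copying an element, check its LAST flag; if clear,
 * fetch the next sequential element and copy it too; otherwise post the
 * command-complete status to the CPU 78. */
void mirror_command(const struct sg_element *e)
{
    copy_to_remote(e->address, e->byte_count);      /* first element (earlier blocks) */
    while (!(e->flags & SG_FLAG_LAST)) {            /* block 124 */
        e = next_element(e);                        /* block 136 */
        copy_to_remote(e->address, e->byte_count);  /* block 140 */
    }
    post_completion_to_cpu();                       /* blocks 128-132 */
}
```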
If, at block 112, the DMA engine 70 determines that the DMA command is an XOR command, the XOR engine portion 86 of the DMA engine 70 reads the source count field from the DMA command register, as noted by block 144. The XOR engine portion 86 then, at block 148, retrieves the first scatter/gather element from the CPU memory address indicated in the DMA command. The XOR engine portion 86 performs XOR operations on the data contained in the memory locations determined from the memory location indicated by the first scatter/gather element and stores the XOR result in the destination memory location indicated by the first scatter/gather element, as indicated by block 152. The XOR engine portion 86 then determines, at block 156, whether the last element flag is set in the scatter/gather element. If the last element flag is set, the XOR engine portion 86, at block 160, marks the command as complete in the command register and sends a notification to the CPU 78 that the command is complete. The XOR operation is then complete, as noted by block 164. If, at block 156, the XOR engine portion 86 determines that the last element flag is not set in the scatter/gather element, the XOR engine portion 86, at block 168, retrieves the next sequential scatter/gather element from CPU memory 74. The XOR engine portion 86, at block 172, performs XOR operations on the data contained in the memory locations determined from the memory location indicated by the scatter/gather element and stores the XOR result in the destination memory location indicated by the scatter/gather element. The XOR engine portion 86 then repeats the operations associated with blocks 156 through 172.
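Similarly, the XOR flow of blocks 144 through 172 can be sketched as follows. The element layout and the helpers (next_element, clear_buffer, xor_into, post_completion_to_cpu) are hypothetical, as is the assumption that the source lists advance in lockstep with the destination list; only the control flow (read the source count, XOR the sources into the destination, continue until the LAST flag is reached) follows the description above.

```c
#include <stdint.h>

/* Hypothetical scatter-gather element layout (as in the earlier sketches). */
struct sg_element {
    uint64_t address;
    uint32_t byte_count;
    uint32_t flags;           /* bit 0: "LAST" element flag */
};
#define SG_FLAG_LAST 0x1u

/* Hypothetical helpers standing in for the engine's internal operations. */
const struct sg_element *next_element(const struct sg_element *e);
void clear_buffer(uint64_t addr, uint32_t len);
void xor_into(uint64_t dst_addr, uint64_t src_addr, uint32_t len);
void post_completion_to_cpu(void);

/* Blocks 144-172: for each destination element, XOR together the data
 * described by the src_cnt source lists (blocks 152/172), then either
 * finish (blocks 160-164) or advance to the next elements (block 168). */
void xor_command(const struct sg_element **sources, uint32_t src_cnt, /* block 144 */
                 const struct sg_element *dest)                       /* block 148 */
{
    for (;;) {
        clear_buffer(dest->address, dest->byte_count);
        for (uint32_t s = 0; s < src_cnt; s++) {
            xor_into(dest->address, sources[s]->address, dest->byte_count);
            sources[s] = next_element(sources[s]);
        }
        if (dest->flags & SG_FLAG_LAST)            /* block 156 */
            break;
        dest = next_element(dest);                 /* block 168 */
    }
    post_completion_to_cpu();                      /* blocks 160-164 */
}
```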
The foregoing discussion of the invention has been presented for purposes of illustration and description. The description is not intended to limit the invention to the form disclosed herein. Variations and modifications commensurate with the above teachings, within the skill and knowledge of the relevant art, are within the scope of the present invention. The embodiment described hereinabove is further intended to explain the best mode presently known of practicing the invention and to enable others skilled in the art to utilize the invention in such embodiment, or in other embodiments, and with the various modifications required by their particular application or use of the invention. It is intended that the appended claims be construed to include alternative embodiments to the extent permitted by the prior art.
This application is the U.S. national stage of International Application No. PCT/US02/35786, filed Nov. 7, 2002, which claims the benefit under 35 USC § 119(e) of U.S. Provisional Application Ser. No. 60/332,415, filed Nov. 9, 2001, both of which are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US02/35786 | 11/7/2002 | WO | 00 | 1/15/2004

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO03/043254 | 5/22/2003 | WO | A
Number | Date | Country
---|---|---
20040186931 A1 | Sep 2004 | US

Number | Date | Country
---|---|---
60332415 | Nov 2001 | US