One or more embodiments of the invention relate generally to the field of integrated circuit and computer system design. More particularly, one or more of the embodiments of the invention relate to a method and apparatus for a multi-function direct memory access core.
Data transfer between a peripheral device, such as an input/output (I/O) device, and system memory may be accomplished using programmed I/O transfers or direct memory access (DMA). Generally, programmed I/O transfers provide a less efficient method than DMA. For programmed I/O transfers, an I/O device generates an interrupt to inform a central processing unit (CPU) that the I/O device requires data transfer. Issuing of the interrupt causes the CPU to write data from the I/O device to system memory or read data from system memory and provide the data to the I/O device.
Generally, programmed I/O transfers are less efficient than DMA since they require the generation of at least two bus cycles by the CPU for each data transfer. In addition, programmed I/O transfers occupy the CPU with transferring the data, keeping the CPU from its primary function of executing application code. Conversely, DMA provides a more efficient method to accomplish transfer between an I/O device and system memory. To perform DMA, the I/O device requires designation as a bus master. A bus master I/O device may initiate a bus cycle to communicate with memory once the I/O device is awarded bus ownership via bus arbitration.
Generally, such I/O devices are not directly coupled to memory, but are coupled to a controller, such as, for example, an I/O controller hub, which performs the read/write to/from memory as directed by the I/O device. This bus master or DMA method of data transport is more efficient because the CPU is not involved in the data transfer and typically a single burst cycle is generated to move a block of data. To direct the controller to perform DMA, the I/O device may populate the fields of a DMA descriptor.
In operation, the DMA descriptor is read by the controller, which either reads or writes requested data to or from memory, referred to herein as “DMA data.” A controller optimized to perform block transfers of data between an I/O device bus and local processor memory is referred to herein as a “DMA controller.” In addition, some DMA controllers support descriptor chaining. Generally, DMA descriptors that describe one DMA transfer each can be linked together in, for example, I/O local memory to form a linked list. Each chain descriptor contains all the necessary information for transferring a block of DMA data and a pointer to the next descriptor in the chain. The end of the chain is indicated when the pointer is zero.
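Descriptor chaining as described above can be sketched in a few lines of Python. This is only an illustrative model, not an implementation of any particular controller: the `Descriptor` class, its field names, the `run_chain` function and the dictionary standing in for I/O local memory are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Descriptor:
    """Hypothetical chain descriptor; field names are assumptions."""
    src: int         # source address of the block
    dst: int         # destination address of the block
    byte_count: int  # number of bytes to transfer
    next_ptr: int    # address of the next descriptor; 0 ends the chain


def run_chain(memory: bytearray, descriptors: Dict[int, Descriptor], head: int) -> int:
    """Walk the chain starting at `head`, copy each block, return blocks moved."""
    moved = 0
    ptr = head
    while ptr != 0:  # a zero pointer marks the end of the chain
        d = descriptors[ptr]
        memory[d.dst:d.dst + d.byte_count] = memory[d.src:d.src + d.byte_count]
        moved += 1
        ptr = d.next_ptr
    return moved


# Two chained descriptors, keyed by the (made-up) address where each is stored.
memory = bytearray(b"ABCDEFGH" + bytes(8))
chain = {
    0x10: Descriptor(src=0, dst=8, byte_count=4, next_ptr=0x20),
    0x20: Descriptor(src=4, dst=12, byte_count=4, next_ptr=0),
}
blocks = run_chain(memory, chain, head=0x10)
```

Each descriptor in the sketch carries everything needed for its own block transfer, so the walker needs no state beyond the current pointer.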
The various embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
A method and apparatus for a multi-function direct memory access core are described. In one embodiment, the method includes the reading of a direct memory access (DMA) descriptor having associated DMA data to identify at least one micro-command. Once the micro-command is identified, the DMA data is processed according to the micro-command during DMA transfer of the DMA data. In one embodiment, control logic directs processing on the DMA data in transit within a DMA engine according to the identified micro-command. Hence, by defining a primitive set of micro-commands, a DMA engine within, for example, an input/output (I/O) controller hub (ICH) or I/O processor, can be used to perform a large number of complex operations on the DMA data as the DMA data flows through the ICH without introducing latency into the DMA transfer.
In the following description, certain terminology is used to describe features of the invention. For example, the term “logic” is representative of hardware and/or software configured to perform one or more functions. For instance, examples of “hardware” include, but are not limited or restricted to, an integrated circuit, a finite state machine or even combinatorial logic. The integrated circuit may take the form of a processor such as a microprocessor, application specific integrated circuit, a digital signal processor, a micro-controller, or the like.
An example of “software” includes executable code in the form of an application, an applet, a routine or even a series of instructions. In one embodiment, an article of manufacture includes a machine or computer-readable medium having stored thereon instructions to program a computer (or other electronic devices) to perform a process according to one embodiment. The computer or machine readable medium includes, but is not limited to: a programmable electronic circuit, a semiconductor memory device inclusive of volatile memory (e.g., random access memory, etc.) and/or non-volatile memory (e.g., any type of read-only memory “ROM,” flash memory), a floppy diskette, an optical disk (e.g., compact disk or digital video disk “DVD”), a hard drive disk, tape, or the like.
System
Representatively, chipset 130 may include memory controller hub 110 (MCH) coupled to graphics controller 150. In an alternative embodiment, graphics controller 150 is integrated into MCH, such that, in one embodiment, MCH 110 operates as an integrated graphics MCH (GMCH). Representatively, MCH 110 is also coupled to main memory 140 via memory bus 142. In one embodiment, main memory 140 may include, but is not limited to, random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), synchronous DRAM (SDRAM), double data rate (DDR) SDRAM (DDR-SDRAM), Rambus DRAM (RDRAM) or any device capable of supporting high-speed buffering of data.
As further illustrated, chipset 130 includes an input/output (I/O) controller hub (ICH) 120. Representatively, ICH 120 may include a universal serial bus (USB) link or interconnect 162 to couple one or more USB slots 160 to ICH 120. Likewise, a serial advanced technology attachment (SATA) 172 may couple hard disk drive devices (HDD) 170 to ICH 120. In addition, ICH 120 may include peripheral component interconnect (PCI)/PCI-X bus 182 to couple PCI slots 180 to ICH 120, such as small computer system interface (SCSI) 190 coupled to redundant array of independent disks (RAID) disk array 192. In one embodiment, system BIOS 106 initializes computer system 100.
Representatively, ICH 120 enables communication between the various peripheral devices coupled to ICH 120 and chipset 130. Each device or I/O card that resides on an I/O bus, such as USB bus 162 or PCI-X bus 182, is referred to herein as a “bus agent.” Bus agents are generally divided into symmetric agents and priority agents, such that priority agents are awarded ownership when competing with symmetric agents for bus ownership. Such arbitration is required since bus agents are generally not allowed to simultaneously drive the bus to issue transactions.
As described herein, the term “transaction” is defined as bus activity related to a single bus access request. Generally, a transaction may begin with bus arbitration and the assertion of a signal to propagate a transaction address. A transaction, as defined by the Intel® architecture (IA) specification, may include several phases, each phase using a specific set of signals to communicate a particular type of information. Phases may include at least an arbitration phase (for bus ownership), a request phase, a response phase and a data transfer phase.
Within computer systems, such as computer system 100, memory access latency, or the time required to write data to or read data from main memory 140, is often seen as a system bottleneck. Conventionally, main memory access by I/O devices is performed using programmed I/O transfers in which a CPU issues a bus transaction to either read or write data to/from memory for the I/O device. Accordingly, one technique for alleviating the memory bottleneck is DMA. DMA is a capability provided by advanced architectures which allows direct transmission of data from an attached device to main memory, without involving the CPU. As a result, the system's CPU is freed from involvement with the data transfer, thus speeding up overall computer operation.
Implementing DMA access within a computer system, such as computer system 100, requires the designation of devices with DMA access as bus masters. A bus master is a program either in a microprocessor or in a separate I/O controller that directs traffic on the system bus or input/output (I/O) paths. For example, as depicted with reference to
The OS is responsible for designating a certain area of memory 140 as DMA enabled memory. Within the DMA enabled memory area, the OS will assign portions of this area to the various bus masters within the system 100. Once the assignment is received, the bus master is said to have established a DMA channel between the bus master and the main memory 140. As a result, during operation, when an I/O device such as RAID 192 requires read-write access to main memory 140, the bus master 190 performs a DMA access request to chipset 130.
To direct a controller, such as ICH 120, to perform DMA, an I/O device may populate the fields of a DMA descriptor. The fields of a DMA descriptor may include a source address, a destination address, a byte count to transfer and other attributes. In operation, the DMA descriptor is read by the controller, which either reads or writes requested data to or from memory, referred to herein as “DMA data.” A controller optimized to perform block transfers of data between an I/O device bus and main memory is referred to herein as a “DMA controller.” DMA controllers are conventionally implemented within an I/O controller hub, such as ICH 120.
Conventional DMA controllers are generally limited to moving data from one memory, or I/O, location to another memory, or I/O, location. In contrast to conventional DMA controllers, ICH 120 includes DMA logic 200. In one embodiment, DMA logic 200 supports the use of DMA micro-commands selected by a bus master to direct DMA logic 200 to perform various functions. In one embodiment, DMA logic processes DMA data as the DMA data flows through DMA core 300 either to main memory 140 or from main memory 140, for example, as illustrated in
As shown in
Referring again to
In one embodiment, descriptor logic 210 utilizes command interface 220 to store DMA micro-commands within command queue 330 of DMA core 300. Accordingly, as a DMA data request is received from a bus master, DMA data associated with the DMA data request is processed by DMA core 300 according to at least one associated DMA micro-command contained within command queue 330. In one embodiment, control logic 302 decodes DMA micro-commands associated with a received DMA data request to form one or more DMA micro-operations. In response to such decoded DMA micro-operations, control logic 302 directs the various components of DMA core 300 to perform various functions on the DMA data as DMA data flows through data buffers 370.
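The queue-then-decode flow described above can be modeled as follows. This is a minimal sketch under stated assumptions: the micro-operation names, the `DECODE_TABLE` mapping, the `decode` function and the dictionary layout of a queued micro-command are hypothetical, chosen only to show one micro-command expanding into several micro-operations. The micro-command names `dma`, `buf_rd` and `buf_wr` are taken from this description.

```python
from collections import deque

# Hypothetical decode table: micro-command name -> ordered micro-operations.
# The micro-operation names are illustrative assumptions.
DECODE_TABLE = {
    "dma": ["read_source", "align", "write_destination"],
    "buf_rd": ["read_source", "align", "store_buffer"],
    "buf_wr": ["load_buffer", "write_destination"],  # no alignment step
}


def decode(micro_command):
    """Return the micro-operations for one queued micro-command."""
    return list(DECODE_TABLE[micro_command["name"]])


# Stand-in for the command queue; entries are filled in by descriptor logic.
command_queue = deque()
command_queue.append({"name": "buf_rd", "src": 0x1000, "dest_buf_id": 1})
command_queue.append({"name": "buf_wr", "dest_buf_id": 1, "dst": 0x2000})

# Stand-in for control logic: drain the queue in order and decode each entry.
issued = []
while command_queue:
    cmd = command_queue.popleft()
    issued.extend(decode(cmd))
```

A table-driven decode is used here only because it makes adding a new micro-command a one-line change; the description does not state how control logic 302 performs the decode internally.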
In one embodiment, the processing of DMA data associated with a received DMA data request is performed under the direction of control logic 302. Accordingly, once identified DMA micro-commands are decoded into one or more DMA micro-operations, control logic 302 directs the various components of DMA core 300, as illustrated in
In one embodiment, control logic 302 directs input DMA data logic 340 to perform data alignment with reference to a destination for received DMA data, as well as byte lane swapping and encryption according to the decoded DMA micro-command. In one embodiment, input DMA data logic 340 performs byte lane swapping of incoming data to support, for example, big-endian processing. Input DMA data logic 340 also supports cryptographic functions, such as encryption of incoming DMA data, to provide Galois multiplication functionality, using the encryption key specified in the encryption key attribute field provided with the DMA micro-command.
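Byte lane swapping of the kind mentioned above can be illustrated with a short Python sketch. The function name `swap_byte_lanes` and its `width` parameter (the number of bytes in each swapped group) are assumptions for the example; the description does not specify how the swap is realized in hardware.

```python
def swap_byte_lanes(data: bytes, width: int) -> bytes:
    """Reverse the byte order inside each `width`-byte group of `data`."""
    if width <= 0 or len(data) % width != 0:
        raise ValueError("data length must be a positive multiple of width")
    out = bytearray()
    for i in range(0, len(data), width):
        out += data[i:i + width][::-1]  # reverse one group of byte lanes
    return bytes(out)


# Swap an 8-byte stream in 4-byte groups (little-endian <-> big-endian words).
incoming = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
swapped = swap_byte_lanes(incoming, 4)
```

Applying the same swap twice returns the original data, which is why a single routine of this shape can serve both an incoming and an outgoing path.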
In one embodiment, control logic 302 may direct data integrity logic 350 to detect transmission errors of DMA data associated with received DMA data requests. In one embodiment, data integrity logic 350 computes a cyclic redundancy check (CRC), as well as checksums, to detect DMA data that is corrupted during transmission. Likewise, control logic 302 may direct computational logic 360 to perform one or more DMA exclusive-OR (XOR) logical operations. In one embodiment, computational logic 360 includes an XOR engine to XOR incoming DMA data or transformed DMA data (using, for instance, a Galois multiplier) with data contained within the data buffer, as specified by a buffer ID (attribute) received with the associated micro-command.
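The integrity and XOR functions described above can be approximated in software as follows. This is a sketch under several assumptions: the description does not name a CRC polynomial or checksum width, so the example uses the CRC-32 from Python's standard `zlib` module and a 16-bit additive checksum; the function names and the dictionary of buffers keyed by buffer ID are hypothetical. The `seed` argument is only an analogy for starting the CRC from a supplied value.

```python
import zlib
from typing import Dict


def crc_of(data: bytes, seed: int = 0) -> int:
    """CRC-32 of `data`, continuing from `seed` (assumed polynomial)."""
    return zlib.crc32(data, seed)


def checksum_of(data: bytes) -> int:
    """16-bit additive checksum (width is an assumption)."""
    return sum(data) & 0xFFFF


def xor_into_buffer(buffers: Dict[int, bytes], buf_id: int, incoming: bytes) -> None:
    """XOR `incoming` with the buffer selected by `buf_id`, storing the result."""
    current = buffers[buf_id]
    if len(current) != len(incoming):
        raise ValueError("buffer and incoming data must have the same length")
    buffers[buf_id] = bytes(a ^ b for a, b in zip(current, incoming))


# Seeding with the CRC of an earlier block continues the CRC across blocks.
whole = crc_of(b"hello world")
chained = crc_of(b"world", seed=crc_of(b"hello "))

buffers = {0: bytes([0x0F, 0xF0])}
xor_into_buffer(buffers, 0, bytes([0xFF, 0xFF]))
```

In the sketch, `whole` and `chained` are equal, which shows how a seed lets a CRC span data that arrives in separate transfers.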
In one embodiment, control logic 302 may direct output DMA data logic 390 to perform data alignment functionality for outbound DMA data. In one embodiment, byte lane swapping is supported in both the incoming (input DMA data logic 340) and outgoing (output DMA data logic 390) data paths to support big-endian applications. The endian byte swap can be performed according to the swap width (attribute field) provided with the micro-command. In one embodiment, control logic 302 decodes the following micro-commands to process DMA data in transit through DMA core 300 without actually copying data to another memory or I/O space:
dma—this micro-command can be used to perform a simple DMA operation. The DMA data is moved from a source address to a destination address. In one embodiment, CRC/Checksum/Encryption, etc., can also be computed for the DMA data by either input DMA logic 340 or data integrity logic 350.
dma_new_seed—this micro-command can be used to perform a simple DMA operation. The DMA data is moved from a source address to a destination address. In one embodiment, the CRC register (contained in data integrity logic 350) is loaded with the crc_seed provided with the micro-command (attribute field) before data integrity logic 350 computes the CRC for the DMA data.
buf_rd—this micro-command is used to move DMA data from the source address to one of the internal buffers (370-1, . . . , 370-N). The DMA data is stored aligned to the destination address. CRC/Checksum/Encryption, etc., can also be computed.
buf_rd_new_seed—this micro-command can be used to move DMA data from the source address to one of the internal buffers (370-1, . . . , 370-N). The DMA data is stored aligned to the destination address. CRC register is loaded with the new seed provided with the micro-command (attribute field), before computing CRC for the DMA data.
XOR—this micro-command can be used to read data from the source address and exclusive-OR (XOR) to the data in a buffer specified by the src_buf_id (attribute field) provided with the command, and store the XORed data in the data buffer specified by the dest_buf_id (attribute field) provided with the command. The XORed data may be stored in the same buffer. CRC/Checksum/Encryption, etc., can be computed for incoming data. In addition, control logic 302 verifies that data buffer is all-zero for the specified byte count.
In one embodiment, XOR commands are broken up into multiple specific XOR commands. All XOR sequences require the same destination address except for XOR LAST RAID 6 command.
XOR FIRST—this command is used to read DMA data from the source address and align the DMA data to the destination address as the DMA data is written into the data buffer 370. The XOR FIRST command implies the start of an XOR sequence. All XOR sequences start with the XOR FIRST command. The DMA data is written in the data buffer specified by the dest_buf_id (attribute field) provided with the command. CRC/Checksum/Encryption, etc., can also be computed.
XOR LAST—this command is used to read DMA data from the source address and align the DMA data to the destination address as the data is written into the data buffer. The XOR LAST command is used at the end of an XOR sequence. The data from the previous XOR or XOR FIRST command is read from the buffer specified by the src_buf_id (attribute field) provided with the command, bit-wise XORed with the newly read data, and written back to the data buffer specified by the dest_buf_id (attribute field) provided with the command. Once in the specified data buffer, the data can be written back out using the buffer write command. CRC/Checksum/Encryption, etc., can also be computed.
XOR ZERO CHECK—this command is identical to the XOR LAST command except that it performs a zero check on the resulting data. The result is reported on the zero_chk_fail signal along with dma_done. When the zero check fails, the zero_chk_fail signal is set.
XOR LAST RAID 6—this command is identical to the XOR LAST command except that this is an additional XOR command after the XOR LAST command. This calculates the diagonal parity. The destination address here is not required to be identical to the destination address of subsequent XOR commands. CRC/Checksum/Encryption, etc., can also be computed.
buf_wr—this micro-command can be used to write the data buffer specified by the dest_buf_id field provided with the micro-command to the destination address. No alignment operations are performed. It is assumed that the data in that buffer is already aligned to that destination address. CRC/Checksum/Encryption, etc., can be computed for outgoing data.
block_fill—this micro-command can be used to fill a block in the memory specified by the destination address with the fill data provided together with the micro-command.
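The XOR command sequence listed above can be modeled in software to show how parity is built and later verified. This is a minimal sketch, not the hardware behavior: the function names, the dictionary of buffers keyed by buffer ID and the sample data are assumptions, alignment and CRC steps are omitted, and the XOR and XOR LAST commands are modeled by the same function because they differ only in their position in the sequence.

```python
from typing import Dict


def xor_first(buffers: Dict[int, bytes], dest_buf_id: int, data: bytes) -> None:
    """Start a sequence: write the first block into the destination buffer."""
    buffers[dest_buf_id] = bytes(data)


def xor_step(buffers: Dict[int, bytes], src_buf_id: int, dest_buf_id: int, data: bytes) -> None:
    """Model of XOR and XOR LAST: XOR new data with the source buffer."""
    prev = buffers[src_buf_id]
    if len(prev) != len(data):
        raise ValueError("block sizes must match")
    buffers[dest_buf_id] = bytes(a ^ b for a, b in zip(prev, data))


def xor_zero_check(buffers: Dict[int, bytes], src_buf_id: int, dest_buf_id: int, data: bytes) -> bool:
    """Like XOR LAST, then return True when zero_chk_fail would be set."""
    xor_step(buffers, src_buf_id, dest_buf_id, data)
    return any(buffers[dest_buf_id])


d0 = bytes([0x01, 0x02, 0x03, 0x04])
d1 = bytes([0x10, 0x20, 0x30, 0x40])
d2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# Build parity: XOR FIRST, XOR, XOR LAST, all using buffer 0.
bufs: Dict[int, bytes] = {}
xor_first(bufs, 0, d0)
xor_step(bufs, 0, 0, d1)
xor_step(bufs, 0, 0, d2)
parity = bufs[0]

# Verify: XOR all data blocks, then XOR ZERO CHECK with the stored parity.
xor_first(bufs, 1, d0)
xor_step(bufs, 1, 1, d1)
xor_step(bufs, 1, 1, d2)
zero_chk_fail = xor_zero_check(bufs, 1, 1, parity)
```

With intact data the final buffer is all zero and the flag stays clear; substituting a corrupted block for any input leaves non-zero bytes and sets the flag.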
Hence, control logic 302, in one embodiment, decodes a received DMA micro-command to perform the following commands for DMA data received from input port 240: DMA, DMA with new seed, buffer read, buffer read with new seed, XOR first, XOR, XOR last, XOR zero check and XOR LAST RAID 6. In one embodiment, control logic 302 directs write port 250 to perform micro-commands from the command queue, such as the buf_wr and block_fill micro-commands. A command interface for DMA core 300 is shown in Table 2.
Although Table 2 lists a limited set of micro-commands, it is possible to add new micro-operations to enhance the features supported by DMA core 300. Procedural methods for implementing one or more of the above-described embodiments are now provided.
Operation
Accordingly, in one embodiment, a DMA core, as illustrated in
In any representation of the design, the data may be stored in any form of a machine readable medium. An optical or electrical wave 560 modulated or otherwise generated to transport such information, a memory 550 or a magnetic or optical storage 540, such as a disk, may be the machine readable medium. Any of these mediums may carry the design information. The term “carry” (e.g., a machine readable medium carrying information) thus covers information stored on a storage device or information encoded or modulated into or onto a carrier wave. The set of bits describing the design or a particular part of the design is (when embodied in a machine readable medium, such as a carrier or storage medium) an article that may be sold in and of itself, or used by others for further design or fabrication.
It will be appreciated that, for other embodiments, a different system configuration may be used. For example, while the system 100 includes a single CPU 102, for other embodiments, a multiprocessor system (where one or more processors may be similar in configuration and operation to the CPU 102 described above) may benefit from the multi-function DMA core of various embodiments. Further, a different type of system or different type of computer system, such as, for example, a server, a workstation, a desktop computer system, a gaming system, an embedded computer system, a blade server, etc., may be used for other embodiments.
Having disclosed embodiments and the best mode, modifications and variations may be made to the disclosed embodiments while remaining within the scope of the embodiments of the invention as defined by the following claims.