A direct memory access (DMA) system typically includes multiple DMA engines that access a central memory system. Because only one DMA engine may access the memory at a time, access to the memory is arbitrated. If multiple DMA engines attempt to transfer data at the same time, each engine must wait for the others to finish their transfers, introducing latency. In addition, the arbitration system typically requires multiplexers to select which DMA engine's data and address are sent to the central memory system.
A particularly desirable function in a DMA system with multiple channels is the ability to pass data in a buffer from one DMA channel to another DMA channel through the memory. Generally, the application must wait for the originating DMA channel to finish its transfer before starting the DMA channel that will consume the data, which imposes significant latency.
A direct memory access (DMA) system overcomes these problems by providing a single context-based DMA engine connected to the memory system. The context-based DMA engine implements the logic for each DMA function only once, and switches parameter sets as needed to service DMA requests from different channels. Arbitration is performed at the DMA request level. After a DMA channel is selected for service, the parameters for that channel's transfer are retrieved from a central context block, the data transfer is queued to the memory system, and the parameters are updated and stored back to the central context block. Data paths also are constructed to support context-based transfer, using buffer blocks, to allow the DMA engine and the memory system to access any channel's data through simple addressing of those buffer blocks.
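For illustration only, the service loop of such a context-based engine might be organized as in the following sketch. The structure fields, the helper routines and the burst size are assumptions made to keep the example self-contained, not features of any particular implementation.

```c
#include <stdint.h>

#define NUM_CHANNELS 64   /* illustrative channel count */

/* One parameter set per channel, held in the central context block. */
typedef struct {
    uint32_t address;         /* current address for the transfer     */
    uint32_t transfer_count;  /* amount of data remaining to transfer */
} dma_context;

static dma_context context_block[NUM_CHANNELS];

/* Stand-ins for the arbitration and memory-queueing hardware; their names
 * and behavior are assumptions made only so the sketch is self-contained. */
static int arbitrate_pending_requests(void)
{
    static int rr = 0;
    rr = (rr + 1) % NUM_CHANNELS;   /* simple round robin for illustration */
    return rr;
}

static uint32_t queue_transfer_to_memory(int channel, uint32_t address, uint32_t count)
{
    (void)channel; (void)address;
    return count < 512 ? count : 512;   /* pretend up to a 512-byte burst completes */
}

/* Single service loop: one set of DMA logic, many parameter sets. */
void service_dma_requests(void)
{
    for (;;) {
        int ch = arbitrate_pending_requests();   /* arbitration at the DMA request level */
        dma_context ctx = context_block[ch];     /* retrieve the channel's parameters    */
        uint32_t done = queue_transfer_to_memory(ch, ctx.address, ctx.transfer_count);
        ctx.address        += done;              /* update the parameters...             */
        ctx.transfer_count -= done;
        context_block[ch]   = ctx;               /* ...and store them back for next time */
    }
}
```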
The DMA system also may have a buffer control unit (BCU) that permits DMA channels to be linked together in a flow-controlled system to reduce latency. The buffer control unit allows independent flow control between write and read DMA channels accessing the same data, preventing underflow or overflow of data during simultaneous DMA operations. In particular, the large shared memory may be divided into several buffers. Buffers may be software-defined ring buffers of different sizes. A resource of the BCU is allocated to each of the buffers. DMA operations to or from a buffer are then linked to the BCU resource for that buffer. The BCU resource tracks the amount of data in the buffer and other buffer state information, and flow-controls the DMA engine(s) appropriately based on parameters that are set up within the BCU resource. Multiple read or write DMA channels also may be linked by the same BCU, so that, for example, two DMA channels could write into one buffer, which in turn is read out by one DMA channel that uses the data from both input channels. To control data flow, neither the sender nor the receiver requires any knowledge of the other; each uses only its knowledge of the BCU resource associated with the buffer being used by the given DMA channel.
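The flow-control principle can be sketched as follows; the field and function names are illustrative assumptions, and the unit in which data is counted is left abstract.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one BCU resource: a fill level and a capacity for
 * one software-defined ring buffer in the shared memory. */
typedef struct {
    uint32_t fill;      /* amount of data currently in the buffer */
    uint32_t capacity;  /* size of the ring buffer                */
} bcu_resource;

/* The writer side is flow controlled by the BCU alone: it stalls when the
 * buffer is full, with no knowledge of whoever reads the data. */
static bool writer_may_proceed(const bcu_resource *bcu)
{
    return bcu->fill < bcu->capacity;
}

/* The reader side likewise consults only the BCU: it stalls when the
 * buffer is empty, with no knowledge of whoever wrote the data. */
static bool reader_may_proceed(const bcu_resource *bcu)
{
    return bcu->fill > 0;
}
```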
In the drawings,
This system may be implemented as a peripheral device connected to a host computer through a standard interface such as a PCI interface 122. The PCI interface may include one or more channels as indicated by FIFOs 124a, 124b and 124c. An application executed on the host computer configures the buffers in the memory system 100, and their corresponding BCUs, and sets up DMA contexts to be used by the DMA controller 116.
More details of an example implementation of the DMA controller, DMA context information, buffer control units, memory arbiter and read and write data buffers will now be provided in connection with
When 8 bytes have accumulated in the port FIFO 400, the data is written as a single 64-bit word into the registers 403 for that channel in the word assembly register/multiplexer 402. The writing of different data streams by different channels into the multiplexer 402 is controlled by arbiter 405. The arbiter 405 may permit writing on a round robin basis or by using any other suitable arbitration technique, such as by assigning priorities to different channels. As a word is written to the registers for a channel in this multiplexer 402, a 2-bit counter 404 associated with that channel is incremented. When four 64-bit words have been written to a port's assembly area 403 in the multiplexer, the data is transferred to a burst assembly buffer 408 as a single 256-bit word, through one or more intermediate FIFOs. It may be desirable to force each channel to always transfer a complete group of four 64-bit words. Each channel has its own designated address range in the burst assembly buffer. A 5-bit counter 410 is associated with each port's designated address range within the burst assembly buffer 408 and tracks the amount of data currently in the buffers for that channel. After up to sixteen 256-bit words (512 bytes) have been written into one of the buffers defined for a given channel in the burst assembly buffer, as determined by counter 410, a burst of up to 512 bytes may be written into the memory system 411.
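The counter behavior in this write path can be summarized in the following sketch, which mirrors the 2-bit counter 404 and the 5-bit counter 410 described above; the function names and the data structure are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CHANNELS 64

typedef struct {
    uint8_t word_count;   /* 2-bit counter: 64-bit words assembled so far (0..3)      */
    uint8_t burst_count;  /* 5-bit counter: 256-bit words in the burst buffer (0..16) */
} channel_assembly_state;

static channel_assembly_state chan[NUM_CHANNELS];

/* Called each time a 64-bit word is written into a channel's assembly
 * registers. Returns true when a full 256-bit word (four 64-bit words)
 * should be moved into that channel's region of the burst assembly buffer. */
static bool on_64bit_word(int ch)
{
    chan[ch].word_count = (chan[ch].word_count + 1) & 0x3;  /* wraps like a 2-bit counter */
    return chan[ch].word_count == 0;                        /* four words collected       */
}

/* Called each time a 256-bit word lands in the burst assembly buffer.
 * Returns true when sixteen 256-bit words (512 bytes) are available and a
 * burst to the memory system may be requested. */
static bool on_256bit_word(int ch)
{
    chan[ch].burst_count++;                                 /* tracked by the 5-bit counter */
    return chan[ch].burst_count >= 16;
}

/* When the burst has been written to memory, the counter is drained. */
static void on_burst_complete(int ch)
{
    chan[ch].burst_count = 0;
}
```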
An arbiter 412 determines whether such a burst transfer to the memory system should be made for a channel. The arbiter can make this determination in any of a number of ways, including, but not limited to, round robin polling of each channel's counter, responding to the counter status as if it were an interrupt, or any other suitable prioritization scheme. Certain channels may be designated as high-priority channels, which are processed using interrupts (such as for live video data capture), whereas other channels, for which data flow may be delayed, can be processed using round robin arbitration. The buffer status is checked as data is transferred in or out of the buffer to determine whether a request is warranted.
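One plausible way to combine the interrupt-style handling of high-priority channels with round robin polling of the remaining channels is sketched below; the priority mask, the threshold and the routine name are assumptions, not details of the arbiter 412 itself.

```c
#include <stdint.h>

#define NUM_CHANNELS    64
#define BURST_THRESHOLD 16   /* 256-bit words available before a burst is requested */

static uint8_t  burst_count[NUM_CHANNELS];  /* mirrors the per-channel 5-bit counters */
static uint64_t high_priority_mask;         /* channels to treat like interrupts      */

/* Returns the next channel for which a burst to the memory system should be
 * requested, or -1 if no channel currently warrants a request. */
static int select_burst_channel(void)
{
    static int next_rr = 0;

    /* High-priority channels (e.g. live video capture) are serviced first,
     * as soon as their counters indicate enough data is ready. */
    for (int ch = 0; ch < NUM_CHANNELS; ch++)
        if (((high_priority_mask >> ch) & 1) && burst_count[ch] >= BURST_THRESHOLD)
            return ch;

    /* Remaining channels are polled round robin. */
    for (int i = 0; i < NUM_CHANNELS; i++) {
        int ch = (next_rr + i) % NUM_CHANNELS;
        if (burst_count[ch] >= BURST_THRESHOLD) {
            next_rr = (ch + 1) % NUM_CHANNELS;
            return ch;
        }
    }
    return -1;
}
```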
The requests from the arbiter are queued to the DMA controller 414 through one or more FIFOs. An integral arbiter within the DMA controller determines which of the (potentially many) requests it will service next. The DMA controller loads the appropriate parameters for the transfer from the DMA context RAM block 416 (CRB). Using this information, the buffer control unit 419 linked to the buffer for the transfer also is accessed and checked.
The contents of the DMA Context RAM block 416 and the buffer control unit 419 will now be described in more detail.
The DMA context RAM block is a memory that is divided into a number of units, where each unit is assigned to a DMA channel. Each unit may include one or more memory locations, for example, 16 memory locations. Each memory location is referred to as a DMA context block (DCB). For example, if there are 64 DMA channels and 16 DCBs per channel, there would be 1024 memory locations. One DCB per channel may be designated as the active or scratchpad DCB, which is the DCB that is loaded for that channel to perform a data transfer. The DCBs for each channel may be linked together so that, after one set of parameters from a DCB has been used, the next set of parameters from the next DCB for that channel is automatically loaded into the location for the current DCB. Additionally, the active DCB may be modified by the DMA controller if, for example, the DMA controller performs only a partial data transfer.
Each DCB includes a set of parameters that are programmable by the application program running on the host computer. The set of parameters is stored in a set of registers that hold control information used by the DMA controller to effect a data transfer. These parameters generally include an address for the data transfer and a transfer count (i.e., an amount of data to be transferred). A pointer or link to the next set of parameters for the channel also may be provided. All DCBs except the active DCB for a channel are programmable by the application program. In the active DCB, only the link (to the next set of parameters) should be programmed by the application program.
Examples of the kinds of data that may be stored in the set of registers of a DCB in one embodiment include the following:
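As an illustration only, a DCB consistent with the parameters discussed in this description (a transfer address, a transfer count, a link to the next DCB, a BCU pointer, a sequence number and a sequence-increment control bit) might be laid out as in the following sketch, in which every field name and width is an assumption rather than a register defined by the system.

```c
#include <stdint.h>

/* Hypothetical layout of one DMA context block (DCB). Field names and
 * widths are illustrative; the actual register set is implementation
 * specific. */
typedef struct {
    uint32_t transfer_address;   /* address in the memory system for the transfer  */
    uint32_t transfer_count;     /* amount of data to be transferred                */
    uint32_t chain_pointer;      /* link to the next DCB for this channel; 0 = none */
    uint16_t bcu_pointer;        /* selects the BCU context for the buffer accessed */
    uint16_t sequence_number;    /* must match the BCU sequence count to proceed    */
    uint32_t control;            /* control bits, e.g. a BCU sequence-increment bit */
} dma_context_block;

/* Example control bit: increment the BCU sequence count when this DCB
 * finishes executing (an assumed bit position). */
#define DCB_CTRL_SEQ_INCREMENT  (1u << 0)
```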
A DCB also may include information not used by the DMA controller but used by the port that is transferring data. This information may include, for example, data format information and control parameters for processing performed by the port, such as audio mixing settings. A separate memory may be provided for this additional port information. As noted below, such information could be used by any port that is reading or writing data. A client control bus 430 is provided to connect the DMA controller to all of the ports. The port information for a transfer may be sent over the bus 430 to the appropriate port. In one embodiment, bus 430 is a broadcast channel and port information is sent, preceded by a signal indicating the port for which the information is intended. There are numerous other ways to direct port information to the ports in the system.
As noted above, the memory system is dynamically organized into buffers by the application software. Each buffer is a region of memory, and may be used, for example, as temporary data storage between processing elements that are connected to the read and write channels. The size and many other characteristics of each buffer are programmable as noted above. A buffer has associated with it one or more buffer control unit entries (BCU entries). The BCU is the mechanism that controls the flow of data through the memory buffer, allowing the memory to be used as a FIFO with variable latency. Multiple BCU entries may be specified by the application at any given time. The BCU for a buffer tracks the amount of data written to and read from the buffer, counting the data in units called “slices”. A slice defines the granularity that the system uses to manage the buffers. The size of a slice is programmable within each BCU. For example, a slice may be a number of video lines, from 1 to 4096, or a number of supersamples (512-byte blocks) of audio data. The size of a given buffer is defined as the number of slices that the buffer can hold; a suitable limit for this size may be 4096 slices. If the size of a video line is also programmable, these parameters provide significant flexibility.
As an independent logical unit, the BCU is a resource which can be assigned to any of the DMA channels. As noted above, the DCB for a DMA channel references a specific buffer in the memory system (as defined by the transfer address) and includes a BCU pointer to identify the BCU associated with the buffer. The BCU keeps track of the number of slices in the buffer (0 to 4095), providing a full flag to stall the port-to-memory DMA channel and an empty flag to stall the memory-to-port DMA channel. Thus a buffer may be “filled” by one DMA channel, that DMA channel may be reassigned to other tasks using other buffers, and the BCU retains the “status” of the buffer until another DMA channel links to it to access the data in the buffer. The BCU function is used when an access to the memory system is requested. The BCU either allows or blocks the memory access, depending on the “fullness” of the buffer that is being accessed. Thus, an implementation may use only one physical BCU, which changes context for every memory access. Those contexts may be stored in four 512×32 RAMs, yielding 512 individual contexts. When a DMA channel attempts to access the memory system, the BCU pointer in the DMA channel's current DCB selects the BCU context for that channel. Application software assigns the BCU pointer to the channel when programming the DCB.
Thus, each entry or context in the BCU context RAM block generally includes state information, such as current read and write pointers and the buffer size, to permit the determination of the fullness of the buffer. In one embodiment using the concept of “slices” noted above, a BCU may include a read line count, a write line count, a buffer size, a slice size, a slice count, a sequence count and other control information. These parameters for the BCU are programmed by the application when the BCU is allocated to a specific buffer. The read line count and write line count represent the number of lines that have been read from or written to the next slice, respectively. The slice size parameter defines how many lines are in a slice, and the slice count indicates the number of valid slices in the buffer at any given moment. Slices are defined in terms of lines of video in order to place reasonable limits on the hardware resources required to implement these functions; finer granularity in the flow control may be achieved by defining the slices in terms of smaller units (for example, pixels or bytes), at the expense of providing larger counters and comparators.
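Collecting these fields, one BCU context entry might be modeled as in the following sketch. Only the field names listed above come from this description; the widths, the routine names and the simplified full/empty and line-counting logic are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one BCU context entry. */
typedef struct {
    uint16_t read_line_count;   /* lines already read from the next slice   */
    uint16_t write_line_count;  /* lines already written to the next slice  */
    uint16_t buffer_size;       /* buffer capacity, in slices (up to 4096)  */
    uint16_t slice_size;        /* lines per slice (1 to 4096)              */
    uint16_t slice_count;       /* valid slices currently in the buffer     */
    uint16_t sequence_count;    /* compared against the DCB sequence number */
    uint32_t control;           /* stop, go, read link, write link, etc.    */
} bcu_context;

/* The full flag stalls the port-to-memory (write) DMA channel. */
static bool bcu_full(const bcu_context *b)
{
    return b->slice_count >= b->buffer_size;
}

/* The empty flag stalls the memory-to-port (read) DMA channel. */
static bool bcu_empty(const bcu_context *b)
{
    return b->slice_count == 0;
}

/* Line-level accounting: a slice becomes valid once slice_size lines have
 * been written, and is consumed once slice_size lines have been read. */
static void bcu_line_written(bcu_context *b)
{
    if (++b->write_line_count >= b->slice_size) {
        b->write_line_count = 0;
        b->slice_count++;
    }
}

static void bcu_line_read(bcu_context *b)
{
    if (++b->read_line_count >= b->slice_size) {
        b->read_line_count = 0;
        b->slice_count--;
    }
}
```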
The sequence count field is another way in which the DCBs for a channel and a BCU entry for a buffer interact. The sequence count field may be used for buffer read or write operations, to allow synchronization of multiple sets of DMA engines using the same buffer. This field may be ignored for read operations in certain implementations. As noted above, a DCB for a DMA operation includes a sequence number as well as a BCU pointer. If the sequence number in the DCB does not match the sequence count in the BCU, then the DMA engine will not transfer data, just as if the BCU were reporting that the buffer was full or empty. The sequence count may optionally be incremented at the end of the execution of any given DCB by setting the BCU sequence increment bit in that DCB.
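In terms of such a model, the sequence check reduces to a simple comparison, sketched below with an assumed bit position for the sequence-increment control bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* The transfer is gated exactly as if the buffer were full or empty: a
 * mismatch between the DCB's sequence number and the BCU's sequence count
 * simply stalls the DMA engine. */
static bool sequence_allows_transfer(uint16_t dcb_sequence_number,
                                     uint16_t bcu_sequence_count)
{
    return dcb_sequence_number == bcu_sequence_count;
}

/* Optionally performed when a DCB finishes executing, if that DCB's
 * sequence-increment control bit is set. */
static void maybe_increment_sequence(uint32_t dcb_control,
                                     uint16_t *bcu_sequence_count)
{
    const uint32_t DCB_CTRL_SEQ_INCREMENT = 1u << 0;  /* assumed bit position */
    if (dcb_control & DCB_CTRL_SEQ_INCREMENT)
        (*bcu_sequence_count)++;
}
```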
The control field may include any control bits for functions available in the DMA engine. For example, these functions may include stop, go, write link and read link. The stop and go bits allow for direct host control so that the application may pause a transfer (by setting the stop bit) or allow a transfer to free-run (by setting the go bit).
The write link and read link operations are used to permit multiple ports to access the same buffer. For example, a video channel and an alpha channel may be merged into the same buffer, but data should not be read out of the buffer until both input channels have written into the buffer. To support this operation, multiple BCU contexts may be linked using the read link and write link control bits in the BCU mentioned above. Linked contexts reside in consecutive locations in the BCU context RAM. For example, DMA channel A, writing video to the buffer, is programmed to use BCU context 30. BCU context 30 would have its read link bit set. DMA channel B, writing alpha to the buffer, is programmed to use BCU context 31. Each DMA channel's write access to the buffer is independently controlled. The buffer read is performed by DMA channel C, whose DCB is set to use BCU context 30 (an implementation would set a convention as to whether the lowest-numbered or highest-numbered linked context is to be used). When BCU context 30 is accessed for the read operation, because the read link bit is set, the buffer status is checked, and then the next context (31) is also read and checked. Only if both level checks pass is the read memory access allowed to proceed. To link multiple buffer read operations, the same sequence applies, but the write link bit is set in each context that has a subsequent link.
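The linked-context check for the example above might proceed as in the following sketch, in which each linked context's level check is represented simply as a "not empty" test and the structure and routine names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BCU_CONTEXTS 512

/* Reduced view of a BCU context: only the fields needed for this check. */
typedef struct {
    uint16_t slice_count;   /* valid slices supplied by this context's writer */
    bool     read_link;     /* a further linked context follows in the RAM    */
} bcu_entry;

static bcu_entry bcu_ram[NUM_BCU_CONTEXTS];   /* the BCU context RAM */

/* Check for a buffer read that depends on several writers. Starting from the
 * context named in the reading channel's DCB (context 30 in the example),
 * walk the consecutively stored linked contexts; the read proceeds only if
 * every linked context passes its level check, i.e. none reports "empty". */
static bool read_access_allowed(int first_context)
{
    for (int idx = first_context; ; idx++) {
        if (bcu_ram[idx].slice_count == 0)   /* this writer has not yet supplied data */
            return false;
        if (!bcu_ram[idx].read_link)         /* no further linked context to check    */
            return true;
    }
}
```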
Given the parameters for the channel from the current DCB for the channel, the DMA controller effects the data transfer using the state information about the buffer from the BCU controller 418 and BCU context RAM block 419, in a manner described below in connection with
There is a 5-bit counter 514 associated with each port's designated address range within the burst disassembly buffer 504. After up to sixteen 256-bit words (512 bytes) have been written into the address range for a channel in the burst disassembly buffer 504, that data may be read out through disassembly buffers 516 to the appropriate channel. An arbiter 520 controls which channel is reading from the burst disassembly buffer 504 into its corresponding buffer, from which data is transferred to its corresponding channel. This arbiter may operate, for example, on a round robin basis, or under another suitable scheme, such as by assigning different priorities to different channels. The disassembly buffers 516 receive and store each 256-bit word in a FIFO memory for a channel as indicated at 526. A counter 528 for each channel determines when the FIFO is full or empty. Data in the FIFO is transferred to the client port 506 in four consecutive 64-bit chunks. The transferred data may be subjected to appropriate padding and formatting (indicated at 522) before being written to the FIFO 524 at the client port 506. As with write operations, the DMA controller also may send information about the transfer, over the client control bus 530, to the port that is reading the data, to be used by the counter and control logic 532.
If, in step 706, the transfer count is not greater than zero, then the current channel N is set (718) to be inactive. If the chain pointer in the active DCB is equal to zero, as determined in step 720, then the current port has no further operations to process, and the DMA controller returns to the idle state 701. Otherwise, the next DCB for the channel is fetched (722) using the chain pointer. Any port-specific parameters for the current port N are then sent (724) to that port, and the channel is set (726) to be active. The first set of the transfer parameters is then saved into the DCB 0 location in step 716, and the DMA controller returns to the idle state 701.
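These steps may be summarized in the following sketch, in which the per-channel records, the use of the chain pointer as an index into the context RAM and the representation of the client control bus are all assumptions made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CHANNELS 64

/* Hypothetical per-channel records mirroring the steps described above;
 * names are illustrative, not taken from the implementation. */
typedef struct {
    uint32_t transfer_count;
    uint32_t chain_pointer;    /* 0 means no further DCB is linked                 */
    uint32_t port_parameters;  /* port-specific information sent over the control bus */
} dcb;

static dcb      context_ram[1024];         /* all DCBs (e.g. 64 channels x 16 DCBs)       */
static dcb      active_dcb[NUM_CHANNELS];  /* the DCB 0 (scratchpad) location per channel */
static bool     channel_active[NUM_CHANNELS];
static uint32_t port_parameter_bus[NUM_CHANNELS];  /* stand-in for the client control bus */

/* Invoked when the active DCB for channel n has exhausted its transfer count
 * (step 706); decides whether to chain to the next DCB or return to idle. */
static void finish_dcb(int n)
{
    channel_active[n] = false;                     /* step 718: channel set inactive   */

    if (active_dcb[n].chain_pointer == 0)          /* step 720: no further operations  */
        return;                                    /* back to the idle state 701       */

    dcb next = context_ram[active_dcb[n].chain_pointer];  /* step 722: fetch next DCB  */
    port_parameter_bus[n] = next.port_parameters;  /* step 724: send port parameters   */
    channel_active[n] = true;                      /* step 726: channel set active     */
    active_dcb[n] = next;                          /* step 716: save into DCB 0        */
}
```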
In one embodiment, the DMA system described herein may be a peripheral device to a general-purpose computer system. Such a computer system typically includes a main unit connected to both an output device that displays information to a user and an input device that receives input from a user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device also are connected to the processor and memory system via the interconnection mechanism.
The computer system may be a general-purpose computer system that is programmable using a computer programming language. The computer system may also be specially programmed, special-purpose hardware. In a general-purpose computer system, the processor is typically a commercially available processor. The general-purpose computer also typically has an operating system, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management and memory management, and communication control and related services. A memory system in such a computer system typically includes a computer-readable medium. The medium may be volatile or nonvolatile, writeable or nonwriteable, and/or rewriteable or not rewriteable. A memory system stores data, typically in binary form. Such data may define an application program to be executed by the processor, or information stored on a disk to be processed by the application program.
One or more output devices may be connected to such a computer system. Example output devices include, but are not limited to, cathode ray tube displays, liquid crystal displays and other video output devices, printers, communication devices such as a modem, and storage devices such as disk or tape. One or more input devices may be connected to the computer system. Example input devices include, but are not limited to, a keyboard, keypad, trackball, mouse, pen and tablet, communication device, and data input devices. The invention is not limited to the particular input or output devices used in combination with the computer system or to those described herein.
Having now described a few embodiments, it should be apparent to those skilled in the art that the foregoing is merely illustrative and not limiting, having been presented by way of example only. Numerous modifications and other embodiments are within the scope of the invention.