The present invention relates generally to a crossbar data switch, and more particularly, to a hybrid crossbar partially non-blocking data switch with a single port per attached unit and multiple rings.
The transmission of data between multiple processing units within a single chip can be difficult. This problem has become important due to the proliferation of multiple processing units on a single chip. There are many specific problems relating to the transmission of data between these units on the same chip; data coherency, the substantial chip area required, and power consumption are a few of them. Furthermore, attempting to achieve higher transfer rates exacerbates these problems. Transfer rate becomes an especially difficult problem when the processing units are large enough that the time required to propagate a signal across one unit approaches the cycle time of the data bus in question.
Conventional methods and apparatuses designed to solve these problems have substantial drawbacks. Some solutions, such as a conventional shared processor local bus, do not achieve a high enough bandwidth, which limits the data transfer rate on the chip. Another conventional solution is a full crossbar switch, which connects each port to every other port. A full crossbar switch therefore requires N×N connections, adding to the complexity of the switch. This solution consumes too much chip area and requires extensive wiring resources. A new method or apparatus is therefore needed to enable the transmission of data between multiple processing units on the same chip while retaining a high data transfer rate.
The present invention provides a ring-based crossbar data switch, a method, and a computer program for transferring data between multiple bus units in a memory system. This hybrid crossbar partially non-blocking data switch is configured for a single port connection for each bus unit and multiple data rings. The data transfers on this ring-based crossbar data switch are managed by a central arbiter. Multiple data transfers on this crossbar data switch can be handled concurrently, which ensures a high bandwidth. Furthermore, unused segments of this crossbar data switch are not clocked, which results in a lower power consumption.
Each bus unit is connected to a corresponding data ramp with a simple control interface. A controller resides on each data ramp, which controls the data transfers from the data ramp to the bus unit and the data transfers between the data ramps. The central arbiter receives requests from the bus units, arbitrates the requests, and issues control signals. The controllers interpret the control signals and transfer the data accordingly. Each data ramp is only directly connected to the two adjacent data ramps, which reduces the amount of wiring resources. The data rings form the connection between all of the data ramps. This enables this ring-based crossbar data switch to transfer data from one bus unit to any other bus unit in the memory system. In a preferred embodiment, there are four data rings, wherein two data rings transfer data clockwise and two data rings transfer data counter-clockwise.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
In the following discussion, numerous specific details are set forth to provide a thorough understanding of the present invention. However, those skilled in the art will appreciate that the present invention may be practiced without such specific details. In other instances, well-known elements have been illustrated in block diagram form in order not to obscure the present invention in unnecessary detail. Additionally, for the most part, details concerning network communications, electro-magnetic signaling techniques, and the like, have been omitted inasmuch as such details are not considered necessary to obtain a complete understanding of the present invention, and are considered to be within the understanding of persons of ordinary skill in the relevant art.
Referring to
Data ramps and bus units are the devices described in this application, but many other similar devices may be utilized to achieve the same results as the present invention. For example, a "bus unit" is a generic term for any logical device that exchanges data with another logical device. A memory controller, an Ethernet controller, a central processing unit (CPU), a peripheral component interconnect (PCI) express controller, a universal serial bus controller, and a graphics adapter unit could each be a "bus unit" in this description. Furthermore, a "data ramp" is a generic term for a data transmission device in the data switch fabric.
Two data rings transfer data clockwise, Ring0 102 and Ring2 106, and two data rings transfer data counter-clockwise, Ring1 104 and Ring3 108. Each data ramp device is only connected to the two adjacent data ramp devices. For example, Ramp2 118 is only connected to Ramp1 116 and Ramp3 120. Ramp1 116 can transfer data to Ramp2 118 on Ring0 102 or Ring2 106, and can receive data from Ramp2 118 on Ring1 104 or Ring3 108. Conversely, Ramp3 120 can transfer data to Ramp2 118 on Ring1 104 or Ring3 108, and can receive data from Ramp2 118 on Ring0 102 or Ring2 106. Therefore, each data ramp device exchanges data directly only with the two data ramp devices adjacent to it.
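The adjacency rules above can be captured in a small model. This is a minimal sketch under stated assumptions, not part of the described hardware: it assumes 12 data ramps numbered 0 to 11 (the example transfer in this description wraps from Ramp11 to Ramp0) and treats "clockwise" as increasing ramp index. The names `next_ramp` and `neighbors` are hypothetical.

```python
# Hypothetical model of the ring topology in this embodiment.
# Assumption: 12 data ramps numbered 0..11; clockwise = increasing index.
NUM_RAMPS = 12

CLOCKWISE_RINGS = {0, 2}          # Ring0 and Ring2
COUNTER_CLOCKWISE_RINGS = {1, 3}  # Ring1 and Ring3


def next_ramp(ramp: int, ring: int, num_ramps: int = NUM_RAMPS) -> int:
    """Return the adjacent ramp that receives data sent by `ramp` on `ring`."""
    if ring in CLOCKWISE_RINGS:
        return (ramp + 1) % num_ramps
    if ring in COUNTER_CLOCKWISE_RINGS:
        return (ramp - 1) % num_ramps
    raise ValueError(f"unknown ring {ring}")


def neighbors(ramp: int, num_ramps: int = NUM_RAMPS) -> tuple[int, int]:
    """The only two ramps a given ramp is wired to."""
    return ((ramp - 1) % num_ramps, (ramp + 1) % num_ramps)
```

For example, `next_ramp(1, 0)` and `next_ramp(3, 1)` both return 2, matching the Ramp1/Ramp3 to Ramp2 cases described above, and `neighbors(0)` returns `(11, 1)` because the rings wrap around.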
The central arbiter 112 manages the flow of data around the rings, allowing the bus units to have a simple interface with the data ramp devices. For example, Ramp2 118 interfaces with Unit2 144, and Ramp8 130 interfaces with Unit8 156. This connection is a Request/Grant/Receive handshake control interface. Therefore, the bus unit can only request to send data, send data when permitted, or receive data. The bus units do not need any awareness of the actual structure of the data switch itself. The data switch is designed to move previously agreed-upon data packets.
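The bus-unit side of the Request/Grant/Receive handshake can be sketched as a three-state machine. This is an illustrative assumption, not the patented interface: the class `BusUnitPort`, the enum `UnitState`, and the rule that only one request may be outstanding are all hypothetical. The point it shows is that the unit sees only request, grant, send, and receive, and never a ring or ramp.

```python
from enum import Enum, auto


class UnitState(Enum):
    IDLE = auto()
    REQUESTING = auto()
    SENDING = auto()


class BusUnitPort:
    """Hypothetical bus-unit side of the Request/Grant/Receive interface.

    The unit can only raise a request, send once granted, or accept data
    it is told to receive; it has no knowledge of the rings or ramps.
    """

    def __init__(self, unit_id: int):
        self.unit_id = unit_id
        self.state = UnitState.IDLE
        self.dest = None
        self.received = []

    def request(self, dest: int) -> None:
        # Assumption: at most one outstanding request per unit.
        if self.state is not UnitState.IDLE:
            raise RuntimeError("a request is already outstanding")
        self.state = UnitState.REQUESTING
        self.dest = dest

    def grant(self) -> None:
        if self.state is not UnitState.REQUESTING:
            raise RuntimeError("grant without a pending request")
        self.state = UnitState.SENDING

    def send(self, packet):
        # Data may only be sent after the arbiter's grant.
        if self.state is not UnitState.SENDING:
            raise RuntimeError("cannot send before a grant")
        self.state = UnitState.IDLE
        return (self.unit_id, self.dest, packet)

    def receive(self, packet) -> None:
        self.received.append(packet)
```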
For example, if Unit10 160 wanted to send data to Unit1 142, it would send out a request. The request reaches the central arbiter 112, and the arbiter 112 begins to send out control signals 110 to the necessary data ramp devices: Ramp10 134, Ramp11 136, Ramp0 114, and Ramp1 116. The central arbiter 112 also selects an available data ring, in this case Ring0 102. Unit10 160 receives a grant from the central arbiter 112 and transmits the requested data to Ramp10 134. Ramp10 134 uses Ring0 102 to transmit this data to Ramp11 136. Ramp11 136 passes this data through to Ramp0 114, and Ramp0 114 passes it through to Ramp1 116. Ramp1 116 accepts this data and transmits it to Unit1 142.
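The set of ramps the arbiter must signal in this example can be derived mechanically from the source, the destination, and the chosen ring. The sketch below assumes 12 ramps, Unit k attached to Ramp k, and Ring0/Ring2 clockwise (increasing index); the function `plan_transfer` and the role names "inject", "passthru", and "deliver" are hypothetical labels.

```python
# Hypothetical sketch of the per-ramp roles the central arbiter assigns.
# Assumptions: 12 ramps, Unit k on Ramp k, Ring0/Ring2 clockwise
# (increasing index), Ring1/Ring3 counter-clockwise.
NUM_RAMPS = 12


def plan_transfer(src: int, dst: int, ring: int, num_ramps: int = NUM_RAMPS):
    """Return (ramp, role) pairs for the ramps that need control signals."""
    if ring not in (0, 1, 2, 3):
        raise ValueError(f"unknown ring {ring}")
    if src == dst:
        raise ValueError("source and destination must differ")
    step = 1 if ring in (0, 2) else -1
    plan = [(src, "inject")]             # takes Data In from the source unit
    ramp = (src + step) % num_ramps
    while ramp != dst:
        plan.append((ramp, "passthru"))  # forwards ring data unchanged
        ramp = (ramp + step) % num_ramps
    plan.append((dst, "deliver"))        # hands ring data to the destination unit
    return plan
```

Calling `plan_transfer(10, 1, 0)` yields Ramp10 as injector, Ramp11 and Ramp0 as passthru ramps, and Ramp1 as the delivering ramp, which is exactly the four ramps named in the example.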
Referring to
Ring0 In 206, Ring1 In 212, Ring2 In 214, and Ring3 In 220 are inputs to MUX0 230, MUX1 232, MUX2 234, and MUX3 236, respectively. Ring0 Out 208, Ring1 Out 210, Ring2 Out 216, and Ring3 Out 218 are outputs of Ramp Latch0 222, Ramp Latch1 224, Ramp Latch2 226, and Ramp Latch3 228, respectively. The Data In 202 signal is also an input to each of the multiplexors 230, 232, 234, and 236. Each of these multiplexors is split in half: the upper half receives the Data In 202 signal, and the lower half receives the corresponding ring in signal 206, 212, 214, or 220.
The controller latches 240, 242, and 244 reside on the ramp controller. Controller latches0 240 control the Data In 202 signal from the bus unit. If data is coming from the bus unit on Data In 202, controller latches0 240 direct the correct multiplexor, 230, 232, 234, or 236, to accept the data. Controller latches1 242 control the ring in signals 206, 212, 214, and 220. For example, if data is coming in on the Ring1 In line 212, controller latches1 242 direct MUX1 232 to accept the data. Each multiplexor can only accept data from one input at a time. Therefore, if data is coming from the bus unit on Data In 202 to MUX1 232, then data arriving on Ring1 104 cannot be passed through this ramp at the same time. Controller latches 240 and 242 thus control the data flow on this data ramp.
Once the proper data channel has been selected, the multiplexors transfer the data to the ramp latches. For example, if Ring1 In 212 data has been selected by controller latches1 242, then this data passes from MUX1 232 to Ramp Latch1 224. Alternatively, if data from the bus unit on Data In 202 has been selected by controller latches0 240 for transmission on Ring2 106, then this data passes from MUX2 234 to Ramp Latch2 226.
The outputs of the ramp latches are the ring out signals. Accordingly, Ramp Latch0 222 outputs Ring0 Out 208. The ramp latch outputs are also connected to another multiplexor 238, which transmits data to the bus unit 204. Controller latches2 244 control the multiplexor 238. For example, if the bus unit needs data from Ring3 108, then controller latches2 244 select the output of Ramp Latch3 228, and the multiplexor 238 transmits the data to the bus unit. If the bus unit does not need any data, then controller latches2 244 do not select any of the ramp latch outputs.
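The datapath of one data ramp (four input multiplexors, four ramp latches, and the output multiplexor 238) can be modeled as a single-cycle function. This is a behavioral sketch, not a gate-level description: the function name `ramp_cycle` and its parameters are hypothetical, `None` stands for "no valid data", and the bus unit's output is taken from the newly latched values for simplicity rather than modeling the exact extra cycle.

```python
from typing import Optional, Sequence

NUM_RINGS = 4


def ramp_cycle(data_in, ring_in: Sequence, inject_ring: Optional[int],
               pass_rings: set, deliver_ring: Optional[int]):
    """One bus cycle of a hypothetical data-ramp model.

    data_in      -- value on Data In from the attached bus unit
    ring_in      -- values on Ring0 In .. Ring3 In
    inject_ring  -- ring whose MUX takes Data In (controller latches0)
    pass_rings   -- rings whose MUX takes Ring In (controller latches1)
    deliver_ring -- ramp latch routed to the bus unit (controller latches2)
    Returns (latches, data_out): the new ramp latch values (Ring0..Ring3 Out)
    and the value the output multiplexor selects for the bus unit.
    """
    if inject_ring is not None and inject_ring in pass_rings:
        raise ValueError("a MUX can accept only one input per cycle")
    latches = [None] * NUM_RINGS
    for ring in range(NUM_RINGS):
        if ring == inject_ring:
            latches[ring] = data_in        # upper half of the MUX
        elif ring in pass_rings:
            latches[ring] = ring_in[ring]  # lower half of the MUX
    data_out = latches[deliver_ring] if deliver_ring is not None else None
    return latches, data_out
```

The `ValueError` branch encodes the conflict described above: if a ring's multiplexor is taking Data In, it cannot also forward that ring's incoming data in the same cycle.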
The controller latches 240, 242, and 244 control the data ramp device by organizing these data transactions. Only one latch of controller latches0 240 can be on at any given time, which means that only one ring can receive data from the bus unit at any given time. All of the latches of controller latches1 242 can be on at the same time; consequently, data can be transferred on all of the rings simultaneously. Only one latch of controller latches2 244 can be on at any given time; therefore, the multiplexor 238 can only transmit data from one data ring at any given time. These sets of controller latches can control multiple ramp building blocks.
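The three latch rules, together with the one-input-per-multiplexor rule, amount to a simple validity check on the controller state. The sketch below represents each set of controller latches as four booleans (one per ring); the function name `valid_latch_settings` is hypothetical. "At most one" is used for latches0 and latches2 because the text allows latches2 to select nothing when the bus unit needs no data.

```python
def valid_latch_settings(latches0, latches1, latches2) -> bool:
    """Check hypothetical controller-latch settings against the stated rules.

    Each argument is a list of four booleans, one per ring (Ring0..Ring3).
    latches0: at most one ring may take data from the bus unit.
    latches1: any number of rings may pass ring data.
    latches2: at most one ring may be routed to the bus unit.
    """
    if not all(len(s) == 4 for s in (latches0, latches1, latches2)):
        raise ValueError("expected one latch per ring")
    one_hot_ok = sum(latches0) <= 1 and sum(latches2) <= 1
    # A ring's MUX cannot select Data In and Ring In in the same cycle.
    no_mux_conflict = not any(a and b for a, b in zip(latches0, latches1))
    return one_hot_ok and no_mux_conflict
```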
As shown in
The data ramp devices give each bus unit a simple entry and exit port to the multiple-ring structure. It takes one bus cycle for data to pass from the bus unit to its ramp and one bus cycle for data to pass from one data ramp to the next data ramp in the ring; once the destination ramp is reached, it takes one more bus cycle for data to pass from that data ramp to the receiving device.
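These per-stage costs give a simple latency formula: one cycle into the source ramp, one cycle per ramp-to-ramp hop, and one cycle out to the receiver, so a transfer of h hops takes h + 2 bus cycles per beat. The sketch below applies it under the same assumptions as before (12 ramps, Unit k on Ramp k, Ring0/Ring2 clockwise with increasing index); the function name `transfer_latency` is hypothetical.

```python
def transfer_latency(src: int, dst: int, ring: int, num_ramps: int = 12) -> int:
    """Bus cycles for a beat to go from the source unit to the destination unit.

    One cycle unit -> ramp, one cycle per ramp-to-ramp hop, one cycle
    ramp -> receiving unit. Assumes 12 ramps, Unit k on Ramp k, and
    Ring0/Ring2 clockwise (increasing index), Ring1/Ring3 counter-clockwise.
    """
    if ring in (0, 2):
        hops = (dst - src) % num_ramps
    elif ring in (1, 3):
        hops = (src - dst) % num_ramps
    else:
        raise ValueError(f"unknown ring {ring}")
    return 1 + hops + 1
```

The Unit10 to Unit1 transfer on Ring0 has three hops and so takes 5 cycles, while sending the same data on counter-clockwise Ring1 would take 9 hops and 11 cycles, which illustrates why ring direction matters when the arbiter chooses a ring.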
Referring to
In this process the central arbiter 112 also sends flow control signals to the downstream data ramp devices. For the receiving data ramp device, the central arbiter sends an Early Data Valid (EDV) pulse. The EDV pulse is similar to the Grant pulse, but it cues the receiver to accept data. The receiver captures the DataTag signal from the requesting ramp output for one cycle. One cycle after the EDV pulse the receiver captures the DataTag data from the tag bus, and 3 cycles after the EDV pulse the receiver collects data for 8 cycles (one granule). The controller housed on the data ramp device receives a bus-specific EDV pulse and controls the ramp output multiplexors with the same timing constraints.
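The receiver timing described above reduces to fixed offsets from the EDV pulse: the tag is captured one cycle after EDV, and one 8-cycle granule of data is collected starting three cycles after EDV. The helper below, `receiver_windows`, is a hypothetical name; cycle numbers are relative to whatever cycle the EDV pulse arrives in.

```python
def receiver_windows(edv_cycle: int, granule_cycles: int = 8):
    """Cycles in which a receiving ramp samples each bus after an EDV pulse.

    DataTag is captured from the tag bus one cycle after EDV; data is
    collected for one granule (8 cycles) starting three cycles after EDV.
    """
    tag_cycle = edv_cycle + 1
    data_cycles = range(edv_cycle + 3, edv_cycle + 3 + granule_cycles)
    return tag_cycle, data_cycles
```

With EDV in cycle 0, the tag is captured in cycle 1 and data in cycles 3 through 10.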
During a data transfer, data ramps are also used as passthru devices, meaning that a given data ramp device only passes data on to the next data ramp. The central arbiter 112 sends out passthru pulses for data transfers that must traverse one or more data ramps. A data ramp receiving a passthru pulse passes data from the specified ring input to its output for 8 cycles, starting 1 cycle after the pulse is received for the tag bus and 3 cycles after the pulse for the data bus.
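A passthru schedule for a multi-ramp path can be sketched from these offsets. One assumption here is not stated in the text and is labeled as such: because data advances one ramp per bus cycle, the pulse for each successive passthru ramp is taken to be issued one cycle after the previous one. The function name `passthru_schedule` and the dictionary layout are hypothetical.

```python
def passthru_schedule(first_pulse: int, passthru_ramps, burst: int = 8):
    """Hypothetical passthru pulse schedule for the ramps along a path.

    Assumption (not stated in the text): because data advances one ramp per
    bus cycle, the pulse for each successive passthru ramp is issued one
    cycle after the previous one. Each ramp forwards the tag bus for 8
    cycles starting 1 cycle after its pulse, and the data bus for 8 cycles
    starting 3 cycles after its pulse.
    """
    schedule = {}
    for offset, ramp in enumerate(passthru_ramps):
        pulse = first_pulse + offset
        schedule[ramp] = {
            "pulse": pulse,
            "tag": range(pulse + 1, pulse + 1 + burst),
            "data": range(pulse + 3, pulse + 3 + burst),
        }
    return schedule
```

For the Ramp10 to Ramp1 example, `passthru_schedule(5, [11, 0])` would pulse Ramp11 in cycle 5 and Ramp0 in cycle 6 under this assumption.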
The central arbiter 112 controls this whole process. It collects the requests, arbitrates between them, chooses an appropriate ring, and grants the requests. The arbiter 112 does not grant a request if the new data transfer conflicts with another transfer that is already in progress. If part of a ring is in use by a transfer, the arbiter either allows non-overlapping transfers to proceed concurrently on other parts of that ring or allows the new transfer to follow sequentially after the trailing edge of the prior transfer. In this embodiment, an error bit and a partial transfer bit are also transmitted with the data packets. The error bit indicates whether there is an error with the data and is transferred with the data on the data bus. The partial transfer bit indicates whether the data transfer is less than 128 bytes and is transferred with the data on the tag bus.
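The arbiter's conflict rule can be sketched as link bookkeeping: a transfer occupies the ramp-to-ramp links between its source and destination on one ring, and it is granted only on a ring where none of those links are busy. This lets non-overlapping transfers share a ring. Several details are assumptions rather than the described design: 12 ramps, a preference for the shorter direction when choosing a ring, and links held until an explicit `release()` (the sequential "follow the trailing edge" case is not modeled). The names `segments` and `CentralArbiter` are hypothetical.

```python
NUM_RAMPS = 12
CLOCKWISE = {0, 2}  # Ring0, Ring2; Ring1 and Ring3 run counter-clockwise


def segments(src: int, dst: int, ring: int, n: int = NUM_RAMPS) -> set:
    """Ramp-to-ramp links a transfer occupies on `ring`."""
    step = 1 if ring in CLOCKWISE else -1
    links, ramp = set(), src
    while ramp != dst:
        nxt = (ramp + step) % n
        links.add((ramp, nxt))
        ramp = nxt
    return links


class CentralArbiter:
    """Hypothetical sketch: grant a transfer only on a ring whose needed links
    are free, so non-overlapping transfers can share a ring concurrently.

    Assumptions: ring choice prefers the shorter direction, and links are
    held until release(); queuing a transfer behind the trailing edge of a
    prior one is not modeled.
    """

    def __init__(self):
        self.busy = {ring: set() for ring in range(4)}

    def request(self, src: int, dst: int):
        cw = (dst - src) % NUM_RAMPS
        order = [0, 2, 1, 3] if cw <= NUM_RAMPS - cw else [1, 3, 0, 2]
        for ring in order:
            links = segments(src, dst, ring)
            if not links & self.busy[ring]:
                self.busy[ring] |= links
                return ring, links
        return None  # conflicts on every ring: no grant this cycle

    def release(self, ring: int, links: set) -> None:
        self.busy[ring] -= links
```

In the usage below, a Ramp2 to Ramp5 transfer shares Ring0 with the Ramp10 to Ramp1 transfer because their links do not overlap, while a Ramp11 to Ramp2 transfer overlaps on Ring0 and is moved to Ring2.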
Referring to
The second timing diagram,
The third timing diagram,
Referring to
The following details are implementation specific and only apply to this embodiment. Data transfer 516 begins with Unit 9 raising its data request line along with the destination unit ID. In this case the destination unit ID would identify Unit 6. The central arbiter collects this request and sends a grant to Unit 9 and Unit 6. Unit 9 then sends datatag data on the tag bus and subsequently sends data on the data bus to Ramp 9. Ramp 9 outputs this data on Ring3. Ramps 8, 7, and 6 receive passthru signals to allow this data to pass through on Ring3. The datatag data on the tag bus and the data on the data bus pass through Ramp 8, Ramp 7 and Ramp 6 on Ring3. During this process the central arbiter sends an EDV to Ramp 6. After the data has passed through Ramp 6, the controller on Ramp 6 passes the output to Unit 6. This is how a packet of data is transferred from Unit 9 to Unit 6 on Ring3. The procedure for data input and output between the bus unit and the ramp is shown in
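The walk-through above can be replayed as an ordered list of protocol events for any source, destination, and ring. This sketch assumes 12 ramps, Unit k on Ramp k, and Ring0/Ring2 clockwise; following the Unit 9 to Unit 6 description, every ramp after the source (including the destination ramp) receives a passthru signal. The function `transfer_trace` is hypothetical, and the list follows the order of the prose rather than exact cycle timing (the text only says the EDV is sent "during this process").

```python
def transfer_trace(src: int, dst: int, ring: int, n: int = 12) -> list:
    """Hypothetical ordered event list for one packet transfer.

    Assumes Unit k on Ramp k and Ring0/Ring2 clockwise, Ring1/Ring3
    counter-clockwise. Per the Unit 9 -> Unit 6 walk-through, every ramp
    after the source, including the destination, gets a passthru signal.
    """
    step = 1 if ring in (0, 2) else -1
    path, ramp = [], src
    while ramp != dst:
        ramp = (ramp + step) % n
        path.append(ramp)
    events = [
        f"Unit {src} raises request with destination ID {dst}",
        f"Arbiter grants Unit {src} and Unit {dst}",
        f"Unit {src} sends datatag on tag bus, then data on data bus, to Ramp {src}",
        f"Ramp {src} drives Ring{ring}",
    ]
    events += [f"Ramp {r} passthru on Ring{ring}" for r in path]
    events += [f"Arbiter sends EDV to Ramp {dst}",
               f"Ramp {dst} controller passes output to Unit {dst}"]
    return events
```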
This invention provides many advantages over the prior art. This hybrid crossbar data switch consumes less silicon area on the chip. Because each ramp interfaces with only two other ramps, the amount of logic and the number of buses on the chip are reduced. Only four sets of buses (rings) are needed, which also limits the corresponding logic and saves space. This invention also drastically reduces the number of wiring tracks. In a conventional crossbar data switch, each port must be wired to every other port; in this modified crossbar switch, each ramp only requires connections to its adjacent data ramps. Furthermore, this modified crossbar switch retains a high peak bandwidth. As shown in
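The wiring comparison can be made concrete with simple counts. This is a rough counting model, not a layout estimate: it uses the N×N connection count given for a full crossbar and counts each ring as N ramp-to-ramp bus segments, with four rings. The function names are hypothetical.

```python
def crossbar_connections(n_ports: int) -> int:
    """Port-to-port connections in a full crossbar, counted as N x N."""
    return n_ports * n_ports


def ring_segments(n_ramps: int, n_rings: int = 4) -> int:
    """Ramp-to-ramp bus segments: each ring links every ramp to one neighbor."""
    return n_rings * n_ramps
```

For 12 attached units this gives 144 crossbar connections against 48 ring segments; doubling to 24 units gives 576 against 96, since the crossbar count grows quadratically while the ring count grows linearly.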
Referring to
It is understood that the present invention can take many forms and embodiments. Accordingly, several variations of the present design may be made without departing from the scope of the invention. The capabilities outlined herein allow for the possibility of a variety of programming models. This disclosure should not be read as preferring any particular programming model, but is instead directed to the underlying concepts on which these programming models can be built.
Having thus described the present invention by reference to certain of its preferred embodiments, it is noted that the embodiments disclosed are illustrative rather than limiting in nature and that a wide range of variations, modifications, changes, and substitutions are contemplated in the foregoing disclosure and, in some instances, some features of the present invention may be employed without a corresponding use of the other features. Many such variations and modifications may be considered desirable by those skilled in the art based upon a review of the foregoing description of preferred embodiments. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention.