The disclosed invention is in the field of network-on-chip (NoC) for system-on-chip (SoC).
A NoC comprises logic in distant parts of a chip. The difficulty of controlling skew over long distances makes it impractical to use a synchronous clock tree across distant parts of modern chips. Some conventional solutions approach the problem by sending data through long channels in such a way that it is reliably sampled by a clock edge at the destination. This is done by driving an inverted clock along with the data so that the destination register samples on an edge that should fall in the middle of the data-stable window. This approach depends on the total skew between all data and clock signals being less than one-half cycle over the full distance. Such skew is very difficult to measure. It depends on capacitive coupling, which varies greatly depending on whether the bits all change in the same direction or in interleaved directions on each cycle. Controlling for such skew is difficult in place and route and requires chip-specific annotation of signals, which makes it difficult to design a NoC as generated register transfer language (RTL). The approach is unworkable except at low clock speeds.
A circuit that adapts a data channel between registers with different source and destination clocks is well known in the art as an asynchronous clock adapter. Because an asynchronous clock adapter imposes no requirements on the relationship between the two clocks, it is ideal for a data link between different parts of a chip.
Referring to
From the write and read count values, the read control unit 114 produces a read pointer (RdPtr), which is registered in the destination clock domain. The read pointer controls the mux 104 to select the next data element to read from the buffer 102. The timing path from the RdPtr register, through the mux 104, and to the data register 106 in the destination clock domain is typically a critical path. It is slow because it twice traverses the distance between the source and destination ends of the asynchronous clock adapter 100. The timing path significantly limits the clock speed of the destination clock domain.
The disclosed embodiments include an asynchronous clock adapter that does not limit the clock speed of a destination clock domain when wire delay increases. A number of data channels are connected between source and destination clock domains. Successive data elements from a buffer (e.g., a FIFO buffer) can be sent on the data channels in a rotating order on successive cycles of a destination clock. With multiple data elements (e.g., data words) being transmitted simultaneously on the data channels, each word transmission can take multiple cycles. By transmitting the elements in successive clock cycles, the elements can be captured in successive clock cycles, thus keeping the destination link fully utilized at a high clock speed.
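The rotating order described above can be illustrated with a minimal scheduling sketch. It is not taken from the disclosure: the function name `rotating_schedule` and the assumption that every transfer takes exactly as many destination clock cycles as there are channels are illustrative only.

```python
def rotating_schedule(num_elements, num_channels):
    """Return (element, channel, launch_cycle, capture_cycle) tuples.

    Assumption (illustrative): each transfer occupies its channel for
    exactly num_channels destination clock cycles.
    """
    schedule = []
    for element in range(num_elements):
        channel = element % num_channels       # rotating channel order
        launch_cycle = element                 # one launch per cycle
        capture_cycle = launch_cycle + num_channels
        schedule.append((element, channel, launch_cycle, capture_cycle))
    return schedule


if __name__ == "__main__":
    for entry in rotating_schedule(6, 2):
        print(entry)
```

Under these assumptions, a channel is reused only after its previous element has been captured, yet one element is captured on every destination clock cycle.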
The details of the disclosed implementations are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
The same reference symbol used in various drawings indicates like elements.
The disclosed asynchronous clock adapter uses parallelism to allow a lower clock speed for the path from read pointer to destination register. Multiple data elements (e.g., data words) from a buffer are sent simultaneously in parallel on separate data channels. In one embodiment, multiple read pointers each point to a data element in the buffer. In another embodiment, a single read pointer can be used to select a block of several data elements. In both embodiments, the timing path from each read pointer to the destination register can be a multi-cycle path in the destination clock domain.
In one embodiment, each channel can carry any data element from the buffer. This gives more flexibility to the read pointer, and is most efficient when the number of buffered data elements is not an integer multiple of the number of data channels. In another embodiment, only some data elements of the buffer are connected to each data channel. This latter embodiment has the benefit of reducing the number of inputs to a mux that selects between buffered data elements. Fewer mux inputs reduce both the required silicon area and the clock delay.
The example embodiment shown in
A write control module 206 controls the writing of sequentially received source data elements into data elements of the buffer 204. The write control module 206 receives a Gray coded read count value, RdCnt. The write control module 206 synchronizes RdCnt through at least two sequential register stages, clocked by the source clock, to settle out metastability of the flip-flops of the registers. The write control module 206 sends a Gray coded write count value, WrCnt, which is synchronized in registers 208 clocked by the destination clock. A read control unit 210 receives WrCnt and sends RdCnt in correspondence with the write control module 206. The read control unit 210 generates the read pointers, RdPtr_0 and RdPtr_1, to select the next buffer data element to transfer. The read control unit 210 also controls the mux 202 that selects between data channels.
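Gray coding suits counts that cross clock domains because consecutive values differ in exactly one bit, so a sample taken during a transition resolves to either the old count or the new one. The following is a minimal sketch of the standard binary-reflected Gray code; the function names and the helper `elements_pending` are illustrative and do not appear in the disclosure.

```python
def to_gray(n: int) -> int:
    """Convert a binary count to binary-reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Convert a binary-reflected Gray code value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def elements_pending(wr_cnt_gray: int, rd_cnt_gray: int, modulus: int) -> int:
    """Hypothetical helper: elements written but not yet read.

    Assumes both counters wrap at the same power-of-two modulus.
    """
    return (from_gray(wr_cnt_gray) - from_gray(rd_cnt_gray)) % modulus


if __name__ == "__main__":
    for count in range(8):
        print(count, format(to_gray(count), "03b"))
```

For a counter that wraps at a power of two, the one-bit-change property also holds across the wrap from the highest count back to zero.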
In the example embodiment of
In another embodiment, the inputs to the muxes 202a and 202b are connected to sequential (non-interleaved) data elements of the buffer 204. In such a configuration, the write control module 206 would assign incoming data elements to non-sequential locations within the buffer 204.
In another embodiment, the muxes 202a and 202b each have an input for each element of the buffer 204. In such a configuration, data elements of the buffer 204 can be transferred in any order. In this configuration, the buffer need not have a number of data elements that is equal to a multiple of the number of data channels.
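The three mux connectivity options can be compared with a small sketch. The mode names, the function names, and the specific mappings are assumptions for illustration: in particular, "interleaved" is assumed to mean that buffer element i feeds channel i modulo the number of channels, and `blocked_write_location` is only one possible way for write control to place incoming elements in the non-interleaved case.

```python
def mux_inputs(buffer_depth, num_channels, mode):
    """List the buffer element indices wired to each channel's mux."""
    if mode == "full":
        # Every channel can select any buffer element.
        return [list(range(buffer_depth)) for _ in range(num_channels)]
    if buffer_depth % num_channels != 0:
        raise ValueError("depth must be a multiple of the channel count")
    per_channel = buffer_depth // num_channels
    if mode == "interleaved":
        return [list(range(c, buffer_depth, num_channels))
                for c in range(num_channels)]
    if mode == "blocked":
        # Sequential (non-interleaved) elements per channel.
        return [list(range(c * per_channel, (c + 1) * per_channel))
                for c in range(num_channels)]
    raise ValueError(f"unknown mode: {mode}")


def blocked_write_location(k, buffer_depth, num_channels):
    """Hypothetical write placement for the k-th incoming element in
    'blocked' mode, so that channels are still used in rotating order."""
    per_channel = buffer_depth // num_channels
    return (k % num_channels) * per_channel + (k // num_channels) % per_channel


if __name__ == "__main__":
    print(mux_inputs(6, 2, "interleaved"))
    print(mux_inputs(6, 2, "blocked"))
    print([blocked_write_location(k, 6, 2) for k in range(6)])
```

In this sketch the partially connected modes need depth divided by channel count inputs per mux, while the fully connected mode needs one input per buffer element but accepts any buffer depth.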
Logic synthesis of the asynchronous clock adapter can use a max_delay constraint between each read pointer and the destination register 212 equal to the clock period times the number of channels. Likewise, a max_delay constraint between the buffer registers and the destination register can be set equal to the clock period times the number of channels.
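As a worked example of the constraint arithmetic, the sketch below computes the max_delay value and formats it as an SDC-style `set_max_delay` line. The cell names `rd_ptr_0_reg` and `dest_data_reg` are hypothetical placeholders, and the exact constraint syntax accepted by a given synthesis tool should be checked against that tool's documentation.

```python
def max_delay_ns(dest_clock_period_ns, num_channels):
    """max_delay = destination clock period times the number of channels."""
    return dest_clock_period_ns * num_channels


def sdc_max_delay_line(from_cell, to_cell, delay_ns):
    """Format an SDC-style constraint line; cell names are placeholders."""
    return (f"set_max_delay {delay_ns:.3f} "
            f"-from [get_cells {from_cell}] -to [get_cells {to_cell}]")


if __name__ == "__main__":
    delay = max_delay_ns(1.25, 4)   # 800 MHz destination clock, 4 channels
    print(sdc_max_delay_line("rd_ptr_0_reg", "dest_data_reg", delay))
```

With a 1.25 ns destination clock and four channels, the path from each read pointer to the destination register is allowed 5 ns rather than 1.25 ns.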
In an embodiment with four channels and a max_delay constraint of four times the destination clock period, data registered in the buffer 204 may become valid at the destination register 212 only after four clock periods. Since the WrCnt value is synchronized through only two registers, it is possible that RdPtr would cause the destination register 212 to receive the previous value of the pointed-to data element of the buffer 204. To avoid this possibility, an embodiment with more than three channels can include an additional clock cycle of delay, typically in the form of an additional write count register, for each channel beyond three. Such an embodiment can include the delay register(s) in the location shown as register D in
The Gray coded RdCnt can still be transferred in a single clock cycle of the source clock, and the Gray coded WrCnt can still be transferred in a single clock cycle of the destination clock, in order to avoid the possibility of multiple increments of a Gray coded counter occurring before the counter value is registered in its receiving synchronization register. However, the Gray coded counters can have relatively small buses, which can be implemented with wide wires and strong drivers.
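The rule given above for embodiments with more than three channels reduces to simple arithmetic on the number of write count register stages. The sketch below assumes a baseline of two synchronization registers, as described for WrCnt, and adds one register per channel beyond three; the function name is illustrative.

```python
def write_count_register_stages(num_channels, base_sync_stages=2):
    """Total write count registers in the destination clock domain.

    Assumption: two synchronizer stages by default, plus one added
    delay register for each channel beyond three.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    extra_delay_registers = max(0, num_channels - 3)
    return base_sync_stages + extra_delay_registers


if __name__ == "__main__":
    for channels in (2, 3, 4, 6):
        print(channels, write_count_register_stages(channels))
```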
In some implementations, process 300 includes registering a first pointer (302); transmitting a first data element from a buffer on a first data channel (304); registering a second pointer at least one clock cycle after registering the first pointer (306); and registering the first data element at least one cycle after registering the second pointer (308).
In some implementations, process 300 can also include the steps of accepting a write count; delaying the registering of the first pointer until the write count exceeds a read count; and delaying the acceptance of the write count for at least one cycle.
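The ordering of the steps of process 300 can be traced with a small cycle-level model. It is a sketch under stated assumptions rather than the disclosed circuit: the function name `simulate_reads`, the fixed capture latency equal to the number of channels, and the one-cycle write count delay are all illustrative choices.

```python
def simulate_reads(write_counts, num_channels=2, wr_delay=1):
    """Return a list of (cycle, event, element) tuples.

    write_counts[t] is the write count produced at cycle t. The read
    side sees it wr_delay cycles later (modeling delayed acceptance).
    Assumption: data is registered num_channels cycles after its pointer.
    """
    events = []
    rd_cnt = 0
    in_flight = []                      # (capture_cycle, element)
    total_cycles = len(write_counts) + wr_delay + num_channels
    for cycle in range(total_cycles):
        # Register any data element whose transfer has completed.
        for capture_cycle, element in list(in_flight):
            if capture_cycle == cycle:
                events.append((cycle, "data", element))
                in_flight.remove((capture_cycle, element))
        # Accept the (delayed) write count.
        idx = cycle - wr_delay
        visible = 0 if idx < 0 else write_counts[min(idx, len(write_counts) - 1)]
        # Register the next pointer only when the write count exceeds the read count.
        if visible > rd_cnt:
            events.append((cycle, "pointer", rd_cnt))
            in_flight.append((cycle + num_channels, rd_cnt))
            rd_cnt += 1
    return events


if __name__ == "__main__":
    for event in simulate_reads([2, 2, 2, 2]):
        print(event)
```

With two elements written and two channels, the model registers the second pointer one cycle after the first and registers the first data element one cycle after the second pointer, matching the ordering of steps 302 through 308.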
This application claims priority to U.S. Provisional Application No. 61/504,135, filed Jul. 1, 2011, entitled “GALS ASYNCHRONOUS CLOCK ADAPTER,” the entire contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
61504135 | Jul 2011 | US