The present invention relates generally to signaling between electrical components and in particular to a system and method for adaptively deskewing parallel data signals relative to a clock.
In the multiprocessor computer systems environment, clock pulses from a common source are distributed for controlling many widely separated circuit modules. Time delays associated with the passage of clock and data signals through parallel, but not identical, paths are not uniform; signals can arrive at their destination in skewed time relation to each other. Source synchronous clocking is often utilized whereby parallel data signals and a synchronous clock are distributed to widely separated circuit modules. The forwarded clock acts as a capture clock for data at the destination. The capture clock edge is optimally positioned between successive data edges so the receiving capturing device has equal setup and hold time margins. Often, finite time delay is added to each signal to correct for skew and to optimally position the forwarded capture clock edge relative to the deskewed data edges.
It is possible to limit a certain amount of signal skew by applying careful attention to layout and design. Examples of methods to reduce clock pulse skew are shown in U.S. Pat. Nos. 4,514,749 by Shoji and 4,926,066 by Maini et al. Such methods fail, however, to correct for skew from various divergent clock pulse path interconnections. In addition, such skew compensations, once implemented, cannot accommodate variations in skew caused by such factors as component aging, operating environment variations, and so forth.
Within a computer system, data is passed from register to register, with varying amounts of processing performed between registers. Registers store data present at their inputs either at a system clock transition or during a phase of the system clock. Skew in the system clock signal impacts register-to-register transfers, i.e., skew may cause a register to store data either before it has become valid or after it is no longer valid.
As system clock periods shrink there is increasing pressure on the computer architect to increase the amount of determinism in the system design. Clock skew, like setup time, hold time and propagation delay, increases the amount of time that data is in an indeterminable state. System designers must be careful that this indeterminable state does not fall within the sampling window of a register in order to preserve data integrity.
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for a system and method for reducing skew between parallel signals within electrical systems.
In the following drawings, where like numbers indicate similar function,
a-c illustrate a phase comparator which can be used in deskewing circuits according to the present invention;
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific preferred embodiments in which the inventions may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical and electrical changes may be made without departing from the spirit and scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the claims.
The system and method described below can be used to reduce skew between parallel data signals relative to a clock. In one embodiment, skew is reduced relative to an optimally positioned (orthogonal) capture clock edge as is described below.
One embodiment of deskewing circuit 100 is shown in more detail in
In another embodiment (not shown), synchronizer circuit 140 includes a sampler 144 and an output register 146. Sampler 144 is clocked with doubled channel clock 132. Output register 146 is clocked with core clock 150.
In one embodiment, such as is shown in
In one embodiment, channel clock interface 130 includes a delay line to allow for additional clock delay. In one such embodiment, delay line controller 120 processes skew indicator signals 118 to minimize the skew between data bits and to optimally delay the doubled channel clock with respect to a predetermined timing scheme. Delay line controller 120 determines the amount of delay a signal 105 requires and through one or more control lines 122 dictates the specific behavior of each delay line 112.
In one embodiment, each delay line 112 sends a processed channel data signal 108 to skew detection circuit 114. Skew detection circuit 114 compares the phase of the processed channel data signal 108 to the phase of the doubled channel clock 132 supplied by channel clock interface 130. At the completion of this phase comparison skew detection circuit 114 generates a skew indicator signal 118 representing skew detected in each data channel. In one such embodiment, skew indicator signal 118 includes a clock early signal which is active when the reference clock signal edge is early relative to the data edge and a data early signal which is active when the data edge is early relative to the reference clock signal edge.
Delay line controller 120 receives the phase comparison information via skew indicator signal 118 and determines whether additional delay adjustments are required. Since any individual phase comparison would be subject to significant error due to data edge jitter, a large number of samples are required before an updated estimate of data “early” or “late” can be made. (In one embodiment, a minimum of 256 samples are required before an updated estimate of data “early” or “late” can be made.)
In one embodiment, individual phase comparisons are digitally filtered inside delay line controller 120 prior to any delay adjustments being made to the clock or data signals.
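The filtering of individual phase comparisons can be pictured with a minimal sketch. The Python below is an illustration only and is not the circuit of the invention: it assumes a simple majority vote over a window of at least 256 comparisons, and the names `filter_phase_samples`, `MIN_SAMPLES` and the `margin` dead band are hypothetical.

```python
from typing import Iterable, Optional

MIN_SAMPLES = 256  # minimum number of comparisons cited in the text


def filter_phase_samples(samples: Iterable[str], margin: int = 32) -> Optional[str]:
    """Return 'data_early', 'clock_early', or None (no decision).

    Each sample is one raw phase comparison: 'data_early' or 'clock_early'.
    The margin is an assumed dead band so that jitter near the balance
    point does not trigger a delay adjustment.
    """
    samples = list(samples)
    if len(samples) < MIN_SAMPLES:
        return None  # too few comparisons observed; hold the current setting
    data_early = sum(1 for s in samples if s == "data_early")
    clock_early = sum(1 for s in samples if s == "clock_early")
    if data_early - clock_early > margin:
        return "data_early"
    if clock_early - data_early > margin:
        return "clock_early"
    return None  # votes are balanced within the dead band
```

A vote of this kind trades update speed for noise immunity: a single jittered comparison cannot move a delay line, but a consistent bias across the window can.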
In the embodiment shown in
In one embodiment, a duty cycle sense circuit 166 is used to ensure that doubled channel clock 132 has approximately a 50 percent duty cycle. In one such embodiment, doubled channel clock 132 has a positive duty cycle of 45-55%.
In one embodiment, serial to parallel converter 142 receives data from delay line 112 and converts the data to a parallel format. The data is then shifted, in parallel, to sampling circuit 144. In one embodiment, sampling circuit 144 samples the parallel data read from serial to parallel converter 142 such that it can be latched by output register 146. Output register 146 drives deskewed data signal 116 with a deskewed data signal synchronized to core clock 150.
In the embodiment shown in
In the embodiment shown in
In one embodiment, bit deskew and clock centering circuitry is added to independently center the capture clock within the center of each data eye. In one such embodiment, deskew is achieved by adding additional delay to “early” arriving signals so that they match the “latest” arriving signal.
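The idea of matching early signals to the latest arriving signal can be sketched as follows. This is a minimal illustration under assumed inputs: the function name `deskew_delays`, the signal names and the arrival times are hypothetical, and a real delay line would quantize the result to tap increments.

```python
from typing import Dict


def deskew_delays(arrival_ps: Dict[str, float]) -> Dict[str, float]:
    """Return the extra delay (ps) each signal needs to match the latest one."""
    latest = max(arrival_ps.values())
    return {name: latest - t for name, t in arrival_ps.items()}


# Hypothetical arrival times, in picoseconds, for three data bits.
arrivals = {"d0": 100.0, "d1": 340.0, "d2": 220.0}
delays = deskew_delays(arrivals)
```

In this example the latest signal, `d1`, receives no added delay, while `d0` and `d2` are delayed by 240 ps and 120 ps respectively.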
In one embodiment, delay is added to the clock or data signals to position the channel clock within the data eye. Delay line controller 120 maintains minimum latency through the delay lines once this objective is met.
In the embodiment shown in
In one embodiment, fine tune delay line 200 includes a number of differential delay circuits. In the embodiment described in the patent application entitled “A Programmable Differential Delay Circuit with Fine Tune Adjustment” discussed above, an internal multiplexing scheme eases many timing and physical design concerns encountered when selecting between tap points distributed along a long delay line.
In one embodiment, coarse tune delay line 210 provides a frequency dependent amount of additional delay (1, 2, or 3 clock cycles) which corresponds to a range of 2.5 ns at signaling rates of 800 Mb/s. The coarse tuning technique uses the frame signal shown in
In one embodiment, channel clock 115 is nominally delayed from channel data 105 by half of a bit duration. In one such embodiment, this delay takes place on the transmit side of the link either by launching channel clock 115 off of the opposite edge of the transmit clock than that used to launch channel data 105 or by launching clock 115 and data 105 off of the same transmit clock and then delaying clock 115 with additional PCB foil trace length.
In one embodiment, phase comparator 220 is a digital sample and hold phase comparator used to establish the phase relationship between doubled channel clock 132 and fine tuned deskewed data 204. Since, as is noted above, any individual phase comparison would be subject to significant error due to data edge jitter, a minimum of 256 samples are required before an updated estimate of data “early” or “late” can be made.
In one embodiment, an initial training sequence is required to deskew and center the data and clock. To facilitate this, in one such embodiment, the channel protocol includes an initial start-up sequence. The initial start-up sequence provides a sufficiently long sequence of data edges to guarantee that delay line controller 120 can deskew the data using fine tune delay line 200.
At the end of the start-up sequence, a one-time coarse tune sequence is initiated. The coarse tune sequence is required because the phase comparator has phase ambiguity if channel clock 115 is skewed from data 105 by more than ±Tbit/2. In other words, phase comparator 220 cannot distinguish whether the Nth clock edge is being compared to the Nth data eye or the (N−1)th or (N+1)th data eyes.
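The phase ambiguity can be illustrated with a simple model. The sketch below assumes that the comparator effectively observes the clock-to-data skew wrapped into one bit period; the function name `observed_phase_ps` is hypothetical, and the 1250 ps bit time is taken from the 800 Mb/s signaling rate mentioned elsewhere in this description.

```python
T_BIT_PS = 1250.0  # bit time at 800 Mb/s


def observed_phase_ps(true_skew_ps: float, t_bit_ps: float = T_BIT_PS) -> float:
    """Wrap the true clock-to-data skew into [-Tbit/2, +Tbit/2)."""
    half = t_bit_ps / 2.0
    return (true_skew_ps + half) % t_bit_ps - half


# Skews that differ by a whole bit time look identical to the comparator.
same_eye = observed_phase_ps(100.0)       # clock edge in the Nth data eye
next_eye = observed_phase_ps(1350.0)      # slipped by +1 bit
previous_eye = observed_phase_ps(-1150.0)  # slipped by -1 bit
```

All three calls return 100 ps in this model, which is why fine tuning alone cannot detect a slip of a whole bit and a separate coarse tune reference is needed.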
To counter this, in one embodiment, the one-time coarse tuning sequence is used to re-align all data bits which have slipped beyond the resolution of phase comparator 220. In one embodiment, logic within the frame data bit slice is designed to detect a unique coarse tuning sequence (e.g., ‘110011’) sent on the incoming frame signal. Upon detection, a CTUNE pulse is generated and fanned out to all the data bit slices, data ready and frame. The CTUNE pulse delays the incoming data by one, two or three doubled channel clocks 132 prior to entering the serial to parallel converter, after determining if the data is early, nominal or late with respect to the CTUNE pulse. An example of this correction is shown in
If none of the slices has late arriving data leading to cycle slip (determined, e.g., by a logical OR of all the data, data ready and frame ‘late’ signals), then, in one embodiment, all the data travels through one less coarse tune flip flop of delay to reduce the overall latency by one doubled channel clock cycle.
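The coarse tune selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual circuit: the mapping of early, nominal and late data to three, two and one doubled channel clock cycles is assumed (early data is held longest), and the names `BASE_CYCLES` and `coarse_tune` are hypothetical.

```python
from typing import Dict

# Assumed mapping: early data is held longest so that it lines up with late data.
BASE_CYCLES = {"early": 3, "nominal": 2, "late": 1}


def coarse_tune(status: Dict[str, str]) -> Dict[str, int]:
    """Return the coarse delay, in doubled channel clock cycles, per bit slice.

    status maps each slice (data, data ready, frame) to 'early', 'nominal'
    or 'late' relative to the CTUNE pulse.
    """
    any_late = any(s == "late" for s in status.values())  # logical OR of 'late' flags
    trim = 0 if any_late else 1  # no late slice: remove one flip-flop of latency
    return {name: BASE_CYCLES[s] - trim for name, s in status.items()}
```

For example, with a nominal frame, one early data bit and one late data bit, the sketch assigns two, three and one cycles respectively; if no slice is late, every slice is assigned one cycle less.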
In the embodiment discussed above, circuitry in coarse tuning delay circuit 210 can be used to deskew all data bits as long as there is not more than one clock cycle slip in either direction between any individual data or data ready bit relative to the frame signal (the frame signal acts as a coarse tune reference point). This range can easily be increased to any arbitrary limit with additional circuitry.
In one embodiment, each data, data-ready and frame signal is deskewed by a separate bit slice deskew circuit 110. Phase comparators 220 within each bit slice produce an output which indicates whether doubled channel clock 132 is early or late with respect to the optimal clock position. A simplified diagram of phase comparator 220 is shown in
In one embodiment, delay line controller 120 includes circuitry to adaptively deskew delays between all data, data ready and frame bits and to optimally position capture clock 132 between opening and closing edges of the data eye. The deskew circuitry continuously monitors phase comparators 220 inside all data bit slices and periodically adjusts the tap settings of data and clock fine tune delay lines (200, 160) to optimally position the sampling clock. Controller 120 maintains minimum latency through delay lines 200 and 160 to minimize jitter added by the delay lines themselves. An overview of a feedback control system which can be used to control the Data, Data_Ready, Frame, and Clock delay lines is shown in
As can be seen in
If, however, filtered “data early” is detected from any given bit slice, control moves to 408 and the corresponding DMC register is incremented by one. Control then moves to 406.
At 406 a determination is made of the minimum DMC value across all the bit slices. If the minimum DMC value is greater than or equal to zero, control moves to 410 and the clock delay is set to the minimum clock delay. Control then moves to 414.
If, however, the minimum DMC value is less than zero, control moves to 412 and the clock delay is set to the increment corresponding to the absolute value of the minimum DMC value. Control then moves to 414.
At 414, each bit slice delay line 200 is set to delay its data signal by the difference between its DMC value and the minimum DMC value. Control then moves to 402 and the process begins again.
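One pass through steps 406 to 414 can be sketched in Python as follows. This is a minimal illustration, not the controller logic itself. The handling of a filtered “data late” result is not spelled out in the steps above, so the decrement shown here is an assumption, as are the tap-index representation, the constant `MIN_CLOCK_DELAY` and the function name `update_delays`.

```python
from typing import List, Tuple

MIN_CLOCK_DELAY = 0  # assumed tap index for the minimum clock delay


def update_delays(dmc: List[int], verdicts: List[str]) -> Tuple[List[int], int, List[int]]:
    """Run one pass of the control loop.

    dmc      -- per-slice DMC register values
    verdicts -- filtered result per slice: 'data_early', 'data_late' or 'none'
    Returns (new DMC values, clock delay taps, per-slice data delay taps).
    """
    new_dmc = []
    for value, verdict in zip(dmc, verdicts):
        if verdict == "data_early":
            value += 1  # step 408: increment the DMC register
        elif verdict == "data_late":
            value -= 1  # assumption: a late verdict decrements the register
        new_dmc.append(value)

    min_dmc = min(new_dmc)  # step 406: minimum DMC across all bit slices
    if min_dmc >= 0:
        clock_delay = MIN_CLOCK_DELAY  # step 410: minimum clock delay
    else:
        clock_delay = abs(min_dmc)  # step 412: delay the clock by |min DMC|
    data_delays = [value - min_dmc for value in new_dmc]  # step 414
    return new_dmc, clock_delay, data_delays
```

Because each data delay is expressed relative to the minimum DMC value, at least one bit slice always sits at zero added delay, which is consistent with keeping latency through the delay lines to a minimum.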
Since, as is noted above, any given phase comparison is subject to data edge jitter (i.e., noise which may exceed ±200 ps), many samples are observed before an estimate of the relative channel clock/channel data relationship is made. In one embodiment, such as is shown in
The filter of
If there is not a sufficient number of data transitions, filter 262 will not allow the delay line to change state. In one embodiment, fine tune delay line 200 can update in as short a time as Tclk*1024=5 ns*1024 or 5.12 us. An individual update can cause the data delay to move relative to the clock delay by +/−one tap setting (45 ps/90 ps increments best case (BC)/worst case (WC)). In order to deskew 1250 ps of skew, one tap setting at a time (BC) will require approximately 150 us, assuming sufficient data transitions. This should be adequate for tracking delay variations due to environmental factors such as voltage and temperature.
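The update timing above can be checked with a short calculation. The sketch below uses only the figures given in the text (5 ns clock period, 1024 cycles per update, 45 ps and 90 ps tap increments, 1250 ps of skew); the variable and function names are hypothetical, and rounding the tap count up to a whole number is an assumption.

```python
import math

T_CLK_NS = 5.0        # clock period used in the text
UPDATE_CYCLES = 1024  # clock cycles per delay line update
SKEW_PS = 1250.0      # skew to be removed

# Time for one delay line update, in microseconds (5 ns * 1024 = 5.12 us).
UPDATE_US = T_CLK_NS * UPDATE_CYCLES / 1000.0


def deskew_time_us(tap_ps: float) -> float:
    """Time to remove SKEW_PS one tap at a time, assuming one tap per update."""
    taps = math.ceil(SKEW_PS / tap_ps)  # whole taps needed to cover the skew
    return taps * UPDATE_US


best_case_us = deskew_time_us(45.0)   # 28 taps of 45 ps
worst_case_us = deskew_time_us(90.0)  # 14 taps of 90 ps
```

Under these assumptions the 45 ps tap size needs 28 updates, or roughly 143 us, which is in line with the figure of about 150 us; the 90 ps tap size needs 14 updates, or roughly 72 us.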
In one embodiment, each of the devices 510, 530 includes an integral communications interface; each communications interface includes a signal deskewing circuit 100 (not shown) as described and presented in detail above in connection with
Thus, novel structures and methods for reducing the skew on signals transmitted between electrical components, while reducing both engineering and material costs related to achieving low skew in data signals, have been described.
When transferring parallel data across a data link, variations in data path delay or an imperfectly positioned capture clock edge limit the maximum rate at which data can be transferred. Consequently, a premium is spent in engineering design time and material cost to realize low skew data links with the proper clock-data phase relationship. In one particular area, electrical cables, some have been paying a very high premium for low skew properties. This invention should dramatically relax the low skew requirement of such cables and consequently reduce costs, since the cables become easier to manufacture and can be produced by more than a single vendor. One should also expect to achieve faster communication rates with this invention, and thus a higher premium on products that implement it.
In one embodiment, the present invention compensates for data path delays by adding additional delay to the early arriving signals until they match the delay of the latest arriving signal. Furthermore, if the clock which is to capture this data is early or late with respect to an optimal quadrature placement (depending on latch setup/hold requirements), additional data or clock path delay is added to optimally position all data with respect to the capturing clock.
This can be strategically important because it affords a way to either dramatically cut costs or achieve higher performance in an area where many in the affected industries could not do so without equivalent functionality. Much of system cost is based on commodity parts (e.g., microprocessors, memory), for which most industry participants pay an equal price. In areas where one uses unique parts (e.g., cables), it is therefore a strong advantage to find much less expensive solutions to the problem of variations in data path delay when transferring parallel data across a data link, in order to command higher product margins.
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement which is calculated to achieve the same purpose may be substituted for the specific embodiment shown. This application is intended to cover any adaptations or variations of the present invention. Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.
This application is a Divisional of U.S. patent application Ser. No. 11/405,387, filed on Apr. 17, 2006, now U.S. Pat. No. 7,433,441, which is a continuation of U.S. application Ser. No. 09/476,678, filed Dec. 30, 1999, now U.S. Pat. No. 7,031,420, the contents of which are incorporated herein by reference in their entirety.
The United States Government has rights to use this invention pursuant to subcontract B338314 issued by the University of California, which operates Lawrence Livermore National Laboratory for the United States Department of Energy under Contract No. W-7405-ENG-48.
Number | Name | Date | Kind |
---|---|---|---|
4375051 | Theall | Feb 1983 | A |
4477713 | Cook et al. | Oct 1984 | A |
4514749 | Shoji | Apr 1985 | A |
4525684 | Majefski | Jun 1985 | A |
4587445 | Kanuma | May 1986 | A |
4799214 | Kaku | Jan 1989 | A |
4823184 | Belmares-Sarabia et al. | Apr 1989 | A |
4896272 | Kurosawa | Jan 1990 | A |
4918685 | Tol et al. | Apr 1990 | A |
4926066 | Maini et al. | May 1990 | A |
4935741 | Reich | Jun 1990 | A |
4982428 | Agazzi et al. | Jan 1991 | A |
5124673 | Hershberger | Jun 1992 | A |
5144174 | Murakami | Sep 1992 | A |
5194765 | Dunlop et al. | Mar 1993 | A |
5274836 | Lux | Dec 1993 | A |
5295132 | Hashimoto et al. | Mar 1994 | A |
5315175 | Langner | May 1994 | A |
5394528 | Kobayashi et al. | Feb 1995 | A |
5410263 | Waizman | Apr 1995 | A |
5416606 | Katayama et al. | May 1995 | A |
5428806 | Pocrass | Jun 1995 | A |
5481567 | Betts et al. | Jan 1996 | A |
5487095 | Jordan et al. | Jan 1996 | A |
5490252 | Macera et al. | Feb 1996 | A |
5521836 | Hartong et al. | May 1996 | A |
5535223 | Horstmann et al. | Jul 1996 | A |
5537068 | Konno | Jul 1996 | A |
5544203 | Casasanta et al. | Aug 1996 | A |
5555188 | Chakradhar | Sep 1996 | A |
5579336 | Fitzgerald et al. | Nov 1996 | A |
5583454 | Hawkins et al. | Dec 1996 | A |
5598442 | Gregg et al. | Jan 1997 | A |
5604450 | Borkar et al. | Feb 1997 | A |
5621774 | Ishibashi et al. | Apr 1997 | A |
5631611 | Luu et al. | May 1997 | A |
5657346 | Lordi et al. | Aug 1997 | A |
5712883 | Miller et al. | Jan 1998 | A |
5757658 | Rodman et al. | May 1998 | A |
5760620 | Doluca | Jun 1998 | A |
5778214 | Taya et al. | Jul 1998 | A |
5778308 | Sroka et al. | Jul 1998 | A |
5787268 | Sugiyama et al. | Jul 1998 | A |
5790838 | Irish et al. | Aug 1998 | A |
5793259 | Chengson | Aug 1998 | A |
5802103 | Jeong | Sep 1998 | A |
5811997 | Chengson et al. | Sep 1998 | A |
5828833 | Belville et al. | Oct 1998 | A |
5832047 | Ferraiolo et al. | Nov 1998 | A |
5844954 | Casasanta et al. | Dec 1998 | A |
5847592 | Gleim et al. | Dec 1998 | A |
5870340 | Ohsawa | Feb 1999 | A |
5872471 | Ishibashi et al. | Feb 1999 | A |
5898729 | Boezen et al. | Apr 1999 | A |
5910898 | Johannsen | Jun 1999 | A |
5915104 | Miller | Jun 1999 | A |
5920213 | Graf, III | Jul 1999 | A |
5922076 | Garde | Jul 1999 | A |
5929717 | Richardson et al. | Jul 1999 | A |
5946712 | Lu et al. | Aug 1999 | A |
5948083 | Gervasi | Sep 1999 | A |
5982309 | Xi et al. | Nov 1999 | A |
6005895 | Perino et al. | Dec 1999 | A |
6016553 | Schneider et al. | Jan 2000 | A |
6029250 | Keeth | Feb 2000 | A |
6031847 | Collins et al. | Feb 2000 | A |
6075832 | Geannopoulos et al. | Jun 2000 | A |
6084930 | Dinteman | Jul 2000 | A |
6100735 | Lu | Aug 2000 | A |
6104223 | Chapman et al. | Aug 2000 | A |
6104228 | Lakshmikumar | Aug 2000 | A |
6127872 | Kumata | Oct 2000 | A |
6150875 | Tsinker | Nov 2000 | A |
6175598 | Yu et al. | Jan 2001 | B1 |
6178206 | Kelly et al. | Jan 2001 | B1 |
6181912 | Miller et al. | Jan 2001 | B1 |
6226330 | Mansur | May 2001 | B1 |
6229358 | Boerstler et al. | May 2001 | B1 |
6232946 | Brownlow et al. | May 2001 | B1 |
6259737 | Fung et al. | Jul 2001 | B1 |
6268841 | Cairns et al. | Jul 2001 | B1 |
6294924 | Ang et al. | Sep 2001 | B1 |
6294937 | Crafts et al. | Sep 2001 | B1 |
6310815 | Yamagata et al. | Oct 2001 | B1 |
6334163 | Dreps et al. | Dec 2001 | B1 |
6373908 | Chan | Apr 2002 | B2 |
6380878 | Pinna | Apr 2002 | B1 |
6417713 | DeRyckere et al. | Jul 2002 | B1 |
6421377 | Langberg et al. | Jul 2002 | B1 |
6430242 | Buchanan et al. | Aug 2002 | B1 |
6463548 | Bailey et al. | Oct 2002 | B1 |
6486723 | DeRyckere et al. | Nov 2002 | B1 |
6522173 | Otsuka | Feb 2003 | B1 |
6557110 | Sakamoto et al. | Apr 2003 | B2 |
6573764 | Taylor | Jun 2003 | B1 |
6574270 | Madkour et al. | Jun 2003 | B1 |
6597731 | Shuholm | Jul 2003 | B1 |
7031420 | Jenkins et al. | Apr 2006 | B1 |
7167523 | Mansur | Jan 2007 | B2 |
7248635 | Arneson et al. | Jul 2007 | B1 |
7433441 | Jenkins et al. | Oct 2008 | B2 |
20010033630 | Hassoun et al. | Oct 2001 | A1 |
20060188050 | Jenkins et al. | Aug 2006 | A1 |
Number | Date | Country |
---|---|---|
2003-008427 | Jan 2003 | JP |
Number | Date | Country |
---|---|---|
20090034673 A1 | Feb 2009 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11405387 | Apr 2006 | US |
Child | 12247122 | | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09476678 | Dec 1999 | US |
Child | 11405387 | | US |