The following prior applications are herein incorporated by reference in their entirety for all purposes:
U.S. Patent Publication 2018/0115410A1 of application Ser. No. 15/791,373, filed Oct. 23, 2017, naming Armin Tajalli and Amin Shokrollahi, entitled “Quadrature and Duty Cycle Correction in Matrix Phase Lock Loop”, hereinafter referred to as [Tajalli I].
The present invention relates to communications systems circuits generally, and more particularly to obtaining stable, correctly phased receiver clock signals from a high-speed multi-wire interface used for chip-to-chip communication.
In modern digital systems, digital information has to be processed in a reliable and efficient way. In this context, digital information is to be understood as information available in discrete, i.e., discontinuous values. Bits, collections of bits, but also numbers from a finite set can be used to represent digital information.
In most chip-to-chip, or device-to-device communication systems, communication takes place over a plurality of wires to increase the aggregate bandwidth. A single wire or a pair of these wires may be referred to as a channel or link, and multiple channels create a communication bus between the electronic components. At the physical circuitry level, in chip-to-chip communication systems, buses are typically made of electrical conductors in the package between chips and motherboards, on printed circuit boards (“PCBs”) or in cables and connectors between PCBs. In high frequency applications, microstrip or stripline PCB traces may be used.
Common methods for transmitting signals over bus wires include single-ended and differential signaling methods. In applications requiring high speed communications, those methods can be further optimized in terms of power consumption and pin-efficiency. More recently, vector signaling methods have been proposed to further optimize the trade-offs between power consumption, pin efficiency and noise robustness of chip-to-chip communication systems. In those vector signaling systems, digital information at the transmitter is transformed into a different representation space in the form of a vector codeword that is chosen in order to optimize the power consumption, pin-efficiency and speed trade-offs based on the transmission channel properties and communication system design constraints. Herein, this process is referred to as “encoding”. The encoded codeword is communicated as a group of signals from the transmitter to one or more receivers. At a receiver, the received signals corresponding to the codeword are transformed back into the original digital information representation space. Herein, this process is referred to as “decoding”.
Regardless of the encoding method used, the received signals presented to the receiving device must be sampled (or their signal value otherwise recorded) at intervals best representing the original transmitted values, regardless of transmission channel delays, interference, and noise. This Clock and Data Recovery (CDR) not only must determine the appropriate sample timing, but must continue to do so continuously, providing dynamic compensation for varying signal propagation conditions.
Many known CDR systems utilize a Phase-Locked Loop (PLL) or Delay-Locked Loop (DLL) to synthesize a local receive clock having an appropriate frequency and phase for accurate receive data sampling. In advanced embodiments, multiple local clocks with particular phase relationships may be generated, as one example to permit overlapping or parallel processing of received information by multiple instances of the receiver embodiment.
Data receivers require accurately adjusted local clocks to enable accurate signal detection, and advanced receiver designs may require generation of multiple clock phases, collectively having particular relationships with the received data signals, and fixed phase relationships among each other.
A common receiver clock subsystem utilizes a phase-locked loop (PLL) to produce a local clock having the desired frequency and phase relationship with a reference signal, generally obtained with or derived from the received data. Within the PLL, a voltage-controlled oscillator based on a ring-connected sequence of active elements conveniently produces multiple clock phases in a fixed relationship. However, variations among the ring's active elements can also induce periodic clock variations and thus result in undesirable duty cycle variations as well as skew between output clock phases.
A configurable clock buffer chain is described, allowing adjustment of clock duty cycle and overall delay by internally modifying the rise and fall time of signals propagating between buffer stages. These buffers are combined with a measurement subsystem capable of directly measuring clock duty cycle and inter-phase skew, to provide clean, accurately timed multiphase clock signals to the data receiver.
To reliably detect the data values transmitted over a communications system, a receiver must accurately measure the received signal value amplitudes at carefully selected times. Various methods are known to facilitate such receive measurements, including reception of one or more dedicated clock signals associated with the transmitted data stream, extraction of clock signals embedded within the transmitted data stream, and synthesis of a local receive clock from known attributes of the communicated data stream. In general, the receiver embodiments of such timing methods are described as Clock-Data Recovery (CDR) or alternatively as performing Clock-Data Alignment (CDA). These timing methods are often based on Phase-Locked Loop (PLL) or Delay-Locked Loop (DLL) synthesis of a local receive clock having the desired frequency and phase characteristics.
In both PLL and DLL embodiments, a Phase Detector compares the relative phase (and in some variations, the relative frequency) of a received reference signal and a local clock signal to produce an error signal, which is subsequently used to correct the phase and/or frequency of the local clock source and thus minimize the error. As this feedback loop behavior will lead to a given PLL embodiment producing a fixed phase relationship (as examples, 0 degrees or 90 degrees of phase offset) between the reference signal and the local clock, an additional fixed or variable phase adjustment is often introduced to permit the phase offset to be set to a different desired value (as one example, 45 degrees of phase offset) to facilitate receiver data detection.
Advanced receiver embodiments may require the generation of two or more local clocks having particular phase relationships. As one example, a so-called “four phase” embodiment incorporates four instances of detection apparatus configured to operate on consecutive unit intervals of the received signal, with the resulting parallelism providing extended detection time. In such a system, four phases of local clock signals may be required having a fixed frequency and phase relationship to the reference signal, and also having fixed relationships to each other.
PLL Overview
Phase Locked Loops are well represented in the literature. A typical PLL is composed of a phase detector that compares an external reference signal to an internal clock signal, a low pass filter that smooths the resulting error value to produce a clock control signal, and a variable frequency clock source (typically, a Voltage Controlled Oscillator or VCO) controlled by the smoothed error value, producing the internal clock signal presented to the phase detector.
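As an illustration only, the loop just described may be sketched as a phase-domain behavioral model. The sketch assumes an ideal linear phase detector, a first-order low pass filter, and normalized units (frequencies in cycles per unit time); the function name `simulate_pll` and all parameter values are hypothetical and are not taken from any embodiment.

```python
def simulate_pll(ref_freq=1.0, vco_center=0.9, loop_gain=1.0,
                 tau=0.25, dt=0.001, steps=20000):
    """Phase-domain PLL sketch: phase detector -> low pass filter -> VCO.

    All values are normalized and hypothetical. Returns the final VCO
    frequency and the final phase error (in cycles).
    """
    ref_phase = 0.0      # phase of the external reference, in cycles
    vco_phase = 0.0      # phase of the internal clock, in cycles
    filtered = 0.0       # low pass filter state (the clock control signal)
    vco_freq = vco_center
    error = 0.0
    for _ in range(steps):
        # Phase detector: compare reference and internal clock phase.
        error = ref_phase - vco_phase
        # First-order low pass filter smooths the error value.
        filtered += (error - filtered) * dt / tau
        # VCO: frequency is offset from its center by the control signal.
        vco_freq = vco_center + loop_gain * filtered
        # Advance both phases by one time step.
        ref_phase += ref_freq * dt
        vco_phase += vco_freq * dt
    return vco_freq, error


if __name__ == "__main__":
    freq, err = simulate_pll()
    print(f"locked frequency = {freq:.6f}, static phase error = {err:.6f} cycles")
```

With this simple filter the modeled loop settles to the reference frequency while retaining a fixed, nonzero phase error, which is one example of the fixed phase relationship a given PLL embodiment may produce between the reference signal and the local clock.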
In an alternative embodiment, the variable frequency clock source is replaced by a variable delay element, its (optionally multiple tapped) outputs thus representing one or more successive time-delayed versions of the original input signal rather than successive cycles of an oscillator to be phase compared to the reference input signal. For the purposes of this document, such Delay Locked Loops (DLL) are considered functionally equivalent to a PLL in such an application, and the tapped variable delay element of a DLL functionally equivalent to the ring of delay elements in a PLL ring-oscillator VCO.
In one embodiment, a ring oscillator composed of a sequence of identical gates in a closed loop is used as the internal Voltage Controlled Oscillator (VCO) timing source for the PLL. The VCO frequency is varied by analog adjustment of at least one of gate propagation delay, inter-gate rise and fall time, and gate switching threshold within the ring oscillator. As examples, the supply voltage or current provided to the ring oscillator elements may be adjusted to modify internal node switching time and thus the resulting oscillation frequency. Outputs taken at equal intervals (i.e. separated by equal numbers of ring oscillator gates) along the sequence of gates comprising the ring oscillator can provide multi-phase clocks having a fixed phase relationship. Such ring oscillators are well represented in the art, typically comprised of three to eight or more elements typically embodied as digital inverters, with both single-ended and differential signal variations described in the literature.
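The relationship between stage delay, oscillation frequency, and tap phase may be sketched as follows. The sketch assumes a differential ring in which every stage has the same delay, so that one period equals two traversals of the ring and each stage contributes 180/N degrees of phase; the function name `ring_oscillator` and the example values are hypothetical.

```python
def ring_oscillator(num_stages, stage_delay_s):
    """Return (frequency_hz, tap_phases_deg) for an idealized ring.

    Assumes identical stage delays and differential outputs, so the
    true and complement outputs of N stages give 2N evenly spaced phases.
    """
    if num_stages < 1 or stage_delay_s <= 0:
        raise ValueError("need at least one stage and a positive delay")
    # An edge must travel around the ring twice to complete one period.
    period_s = 2 * num_stages * stage_delay_s
    frequency_hz = 1.0 / period_s
    # Each stage delays the signal by 180/N degrees of the period.
    step_deg = 180.0 / num_stages
    tap_phases_deg = [k * step_deg for k in range(2 * num_stages)]
    return frequency_hz, tap_phases_deg


if __name__ == "__main__":
    freq, phases = ring_oscillator(num_stages=4, stage_delay_s=25e-12)
    print(f"{freq / 1e9:.2f} GHz, phases: {phases}")
```

In this idealized model, any mismatch between stage delays would shift the affected tap phases away from the even spacing, which is the kind of variation among ring elements that leads to inter-phase skew.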
The example embodiment illustrated in
It is known that periodic variations in edge timing of a receiver's local clock signals can lead to degraded signal detection quality, thus it is extremely desirable to minimize these effects. In the example of
[Tajalli I] describes a ring oscillator embodiment in which multiple ring oscillator output phases are compared against each other using a matrix phase detector. The resulting differential phase error information is used to incrementally adjust the delay of each ring oscillator element, above and beyond the overall frequency and phase error correction applied to the ring oscillator as a whole by the primary PLL phase detector.
One embodiment of the system shown in
Instead of directly manipulating ring oscillator elements, the system of
Measurement subsystem 200 observes the resulting outputs Clock Phase 1 and Clock Phase 2, measuring individual clock duty cycles Clk1_duty and Clk2_duty respectively, and the differential clock offset between Clk1 and Clk2, denoted Clk_skew. In some embodiments, the delay correction Clk_skew may include a rising-edge to rising-edge (RE-to-RE) component and a falling-edge to falling-edge (FE-to-FE) component. The control logic 240 provides multi-bit control signals for adjusting stages 121, 122, and 123 in configurable clock buffer chains 120 and 160 to maintain the desired result at their outputs.
As shown, control logic 240 includes a selection circuit 293, shown in
Digital divide-by-2 flip-flops 230 and 270 produce rising-edge (RE) and/or falling-edge (FE) triggered half-rate square wave signals 231 and 271 from rising/falling edges respectively of Clk1 and Clk2 which are then compared by phase detector 280, shown here as a simple XOR gate. In some embodiments, both RE and FE triggered half-rate clocks are generated, while alternative embodiments may utilize a single edge-triggered half-rate clock to reduce convergence time for duty cycle and clock skew.
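The divide-by-2 and XOR comparison may be illustrated with a sampled behavioral sketch. It assumes ideal rising-edge-triggered dividers that start in the same state, and it expresses skew as a whole number of samples; the function name `xor_skew_fraction` and the sample counts are hypothetical.

```python
def xor_skew_fraction(skew_samples, samples_per_period=1000, periods=8):
    """Average XOR output of two half-rate clocks, as a fraction of time.

    Clk1 has rising edges at multiples of samples_per_period; Clk2 is the
    same clock delayed by skew_samples. Each divider toggles on every
    rising edge of its input clock.
    """
    n = samples_per_period
    total = 2 * n * periods          # whole number of half-rate cycles
    differing = 0
    for i in range(total):
        half1 = (i // n) % 2                     # divided Clk1
        half2 = ((i - skew_samples) // n) % 2    # divided Clk2 (delayed)
        differing += half1 ^ half2               # XOR phase detector
    return differing / total


if __name__ == "__main__":
    # A skew of one quarter period yields an XOR output high 25% of the time.
    print(xor_skew_fraction(250))
```

In this model the averaged XOR output equals the skew divided by the full-rate clock period for skews between zero and one period. If one divider started in the opposite state, the averaged output would be inverted, which suggests why the order in which dividers are enabled can matter.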
One embodiment of 200 minimizes measurement errors by implementing all signals and signal processing elements differentially, with identical loading on both signal paths in each differential pair. Thus, as examples, differential signal Clk1 passes through a differential R-C low pass filter 210 to differential comparator 220.
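As a rough illustration of measuring duty cycle with a low pass filter followed by a comparator, the following sketch filters both halves of an ideal differential clock with identical first-order R-C sections and reports the sign of the filtered difference. The unit signal swing, time constant, and function name `duty_cycle_comparator` are assumptions made for the example.

```python
def duty_cycle_comparator(duty, cycles=200, samples_per_cycle=100,
                          rc_cycles=20.0):
    """Return (decision, filtered_difference) for an ideal differential clock.

    decision is +1 when the clock spends more than half of each cycle high,
    otherwise -1. The filtered difference settles near 2*duty - 1.
    """
    alpha = 1.0 / (rc_cycles * samples_per_cycle)   # per-sample RC step
    high_samples = round(duty * samples_per_cycle)
    v_p = v_n = 0.5                                 # filter capacitor voltages
    for i in range(cycles * samples_per_cycle):
        clk_p = 1.0 if (i % samples_per_cycle) < high_samples else 0.0
        clk_n = 1.0 - clk_p                         # complementary phase
        v_p += alpha * (clk_p - v_p)                # identical R-C sections
        v_n += alpha * (clk_n - v_n)
    difference = v_p - v_n
    decision = 1 if difference > 0 else -1          # differential comparator
    return decision, difference


if __name__ == "__main__":
    for d in (0.45, 0.55):
        print(d, duty_cycle_comparator(d))
```

Because both filter sections see the same loading in this model, any offset common to both paths cancels in the difference, which is consistent with the motivation given above for a fully differential implementation.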
As illustrated in
When both EnP and EnN are enabled in a given parallel stage 330, stage 330 acts in parallel with 320 to provide an increased output drive current for both rising and falling edges of signal transitions on node 335, thus incrementally reducing the effective overall propagation delay of 310. Enabling only EnP provides increased drive (and thus a faster transition time) only for rising transitions, and enabling only EnN provides increased drive (and thus a faster transition time) only for falling transitions. Other characteristics remaining constant, a faster rising transition time will incrementally increase the duration of active high levels of signal 321, and a faster falling transition time will incrementally increase the duration of active low levels of signal 312.
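The effect of enabling pull-up and pull-down augmentation on duty cycle can be illustrated with a toy model. The model assumes that each edge delay at a buffered node is inversely proportional to the total enabled drive for that edge; the base delay, unit drive increment, clock period, and the function name `buffered_duty_cycle` are all hypothetical values chosen for illustration.

```python
def buffered_duty_cycle(n_rise, n_fall, period_ps=200.0, input_duty=0.5,
                        base_delay_ps=20.0, unit_drive=0.1):
    """Toy model of one buffered node with augmented rise/fall drive.

    n_rise / n_fall: number of enabled parallel elements boosting the
    rising / falling edge. Returns (duty, rise_delay_ps, fall_delay_ps).
    """
    # Assumed: delay scales as 1 / (base drive + enabled augmentation).
    rise_delay = base_delay_ps / (1.0 + unit_drive * n_rise)
    fall_delay = base_delay_ps / (1.0 + unit_drive * n_fall)
    # The high interval starts after rise_delay and ends after fall_delay.
    high_time = input_duty * period_ps - rise_delay + fall_delay
    return high_time / period_ps, rise_delay, fall_delay


if __name__ == "__main__":
    print(buffered_duty_cycle(0, 0))   # baseline
    print(buffered_duty_cycle(3, 0))   # faster rising edge only
    print(buffered_duty_cycle(0, 3))   # faster falling edge only
    print(buffered_duty_cycle(3, 3))   # both: less delay, same duty cycle
```

Under these assumptions, boosting only the rising edge lengthens the high interval, boosting only the falling edge lengthens the low interval, and boosting both equally reduces propagation delay without changing duty cycle.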
Seven parallel instances of 330 are shown, thus if control signals EnP<13:7> and EnN<13:7> are thermometer-coded, seven distinct amounts of augmentation may be configured for each of the rising and falling edge rates seen at node 321. Similarly, the seven instances of 350 can be configured to augment 340, using control signals EnP<6:0> and EnN<6:0>.
In one embodiment, transistors 331 and 334 within 330 are twice the size and current drive capability of the comparable transistors 351 and 354 in stage 340. As the transistors in 320 are themselves scaled to be twice the size of those in 340, one may observe that each step of augmentation provided by 330 may be 4× (assuming a 2× increase per stage) that provided by 350, thus EnP<13:7> and EnN<13:7> may be considered as “coarse” adjustment controls, and EnPb<6:0> and EnNb<6:0> as “fine” adjustment controls over the rising and falling edge characteristics of the signals being buffered by their respective stages.
As the “fine” and “coarse” control signals are thermometer-encoded where they are applied to each augmentation group, incremental control signal changes within each group are glitch-free. One particular embodiment ensures that concurrent changes to both fine and coarse control signals are synchronized, by latching all control signals using a common clock. A further embodiment changes the amount of augmentation for a given edge transition only when the drivers for that edge are inactive.
In one embodiment, a finite state machine within the measurement subsystem initiates duty cycle and skew measurements, interprets the results, and adjusts the configurable clock buffer chains to minimize duty cycle and skew errors. To reduce power utilization, the measurement subsystem may operate periodically, rather than continuously. The finite state machine may sequentially perform duty cycle corrections for clock 1, duty cycle corrections for clock 2, and rising-edge to rising-edge and/or falling-edge to falling-edge delay corrections.
In a first startup mode, the duty cycle of each clock is rapidly optimized by simultaneously modifying both the rise time and the fall time configuration of its respective buffer chain after each measurement cycle. The skew between the two clocks is adjusted by modifying both the first and second clock rise times after each measurement cycle.
In a second operational mode, the duty cycle of each clock is adjusted non-intrusively, by incrementally modifying only the falling edge characteristics of the clock buffers after each measurement cycle. If required, the skew between the two clocks is adjusted by incrementally modifying the rise time for one clock or the other.
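The difference between the two modes may be illustrated with a simple step-per-measurement correction loop. The sketch assumes a hypothetical linear relationship between edge-control levels and duty cycle, a mid-scale starting level, and a sign-only measurement each cycle; the function name `cycles_to_correct` and all numeric values are illustrative only.

```python
def cycles_to_correct(initial_duty, mode, gain=0.005, mid=7,
                      max_level=14, limit=50):
    """Count measurement cycles needed to bring duty cycle near 50%.

    mode "startup": adjust rising and falling edge levels together.
    mode "operational": adjust only the falling edge level.
    Returns (cycles_used, final_duty).
    """
    if mode not in ("startup", "operational"):
        raise ValueError("mode must be 'startup' or 'operational'")
    rise = fall = mid
    duty = initial_duty
    for cycle in range(limit + 1):
        # Assumed plant: faster rise adds high time, faster fall removes it.
        duty = initial_duty + gain * (rise - fall)
        error = duty - 0.5
        if abs(error) < 0.75 * gain:
            return cycle, duty
        step = 1 if error > 0 else -1          # sign-only measurement
        fall = min(max_level, max(0, fall + step))
        if mode == "startup":
            rise = min(max_level, max(0, rise - step))
    return limit, duty


if __name__ == "__main__":
    print(cycles_to_correct(0.53, "startup"))
    print(cycles_to_correct(0.53, "operational"))
```

In this model the startup mode converges in roughly half as many measurement cycles, at the cost of moving the rising edge, while the operational mode leaves the rising edge untouched.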
To minimize the number of control signals needing to be routed, the measurement subsystem outputs binary control values. A Gray code is used to minimize glitching when incrementally increasing or decreasing a control value. The more-significant and less-significant portions of the control value are locally converted using Boolean logic from Gray code to thermometer code to control enabling of driver elements in 350 and 330, respectively. Clocked latches synchronize changes between more-significant and less-significant portions of the control value to minimize glitching.
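One possible form of the local conversion is sketched below. It assumes that the control value consists of two 3-bit fields, each independently Gray-coded, and that each field is expanded to seven thermometer bits; the field width and the function names are assumptions made for illustration.

```python
def gray_to_binary(gray):
    """Convert a reflected-binary Gray code value to plain binary."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


def thermometer(level, width=7):
    """Return a thermometer code with the lowest `level` bits set."""
    if not 0 <= level <= width:
        raise ValueError("level out of range")
    return (1 << level) - 1


def decode_control(gray_word, bits_per_field=3):
    """Split a control word into two Gray-coded fields and expand each.

    Returns (upper_thermometer, lower_thermometer). The 3-bit field
    width is a hypothetical choice that yields 7 thermometer bits.
    """
    mask = (1 << bits_per_field) - 1
    upper = gray_to_binary((gray_word >> bits_per_field) & mask)
    lower = gray_to_binary(gray_word & mask)
    width = (1 << bits_per_field) - 1
    return thermometer(upper, width), thermometer(lower, width)


if __name__ == "__main__":
    upper, lower = decode_control(0b110010)
    print(f"{upper:07b} {lower:07b}")
```

Both codes change exactly one bit per unit step, so a single increment or decrement within a field enables or disables exactly one driver element in this model.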
In some embodiments, generating the edge-triggered half-rate clocks includes generating half-rate single-ended clocks from the first and second clock signals. In such embodiments, generating the half-rate clock signals includes generating complements of the half-rate single-ended clocks using inverters, and retiming the complements and the half-rate single-ended signals according to the corresponding first and second clock signals.
In some embodiments, the edge-triggered half-rate clocks include RE-triggered half-rate clocks and FE-triggered half-rate clocks from dividers operating responsive to rising edges and falling edges, respectively, of the first and second clock signals. In such embodiments, the method includes generating delay corrections between the first and second clock signals responsive to inter-phase comparisons between the RE-triggered and FE-triggered half-rate clocks. In some embodiments, generating the edge-triggered half-rate clocks includes enabling the dividers in a predetermined order. Such a predetermined order may be implemented via a state machine or logic.
In some embodiments, the method further includes synchronizing the coarse and fine components of the multi-bit control signal. In some such embodiments, synchronization includes incrementally updating the set of multi-bit control signals by latching the coarse and fine components of the multi-bit control signal according to a flag signal. In some embodiments, the set of multi-bit control signals correspond to thermometer bits. In some such embodiments, the method further includes generating the thermometer bits from a Gray code.
In some embodiments, the duty cycle corrections and delay corrections are selected via a selection circuit to incrementally update the multi-bit control signals. In some embodiments, the method further includes low-pass filtering each duty cycle correction and each delay correction responsive to selection by the selection circuit. The selection circuit may include a shared low-pass filter to perform the low-pass filtering.
In some embodiments, the method includes generating the common mode signal associated with the respective clock signal by low-pass filtering the respective clock signal.
In some embodiments, generating the inter-phase comparisons includes exclusive-ORing (XORing) the edge-triggered half-rate clocks. In some such embodiments, the method further includes low-pass filtering the inter-phase comparisons between the edge-triggered half-rate clocks. Such low-pass filters may be local low-pass filters operating directly on the outputs of the XOR logic gate.
In some embodiments, the coarse inverter stage precedes the fine inverter stage in the sets of clock buffers. In some embodiments, each bit of the coarse component of a given set of multi-bit control signals is provided to multiple inverters connected in parallel in the coarse inverter stage. In some embodiments, each inverter stage of a given set of clock buffers comprises at least one transistor for controlling the rising edge of a given clock signal and at least one transistor for controlling the falling edge of the given clock signal. In some embodiments, at least one transistor for controlling the rising edge of the given clock signal and at least one transistor for controlling the falling edge of the given clock signal are inverted with respect to each other in the coarse and fine inverter stages.
In some embodiments, adjusting the respective coarse and fine inverter stages of the set of clock buffers includes simultaneously adjusting the rising edges and falling edges of the first and second reference signals responsive to corresponding duty cycle corrections during a start-up mode of operation.
In some embodiments, adjusting the respective coarse and fine inverter stages of the set of clock buffers includes adjusting the falling edges of the first and second reference signals responsive to corresponding duty cycle corrections during a mission mode of operation. Furthermore, adjusting the respective coarse and fine inverter stages of the set of clock buffers may include simultaneously adjusting the rising or falling edges of the first and second reference signals responsive to each delay correction during a mission mode of operation, depending on which edge was used to generate the edge-triggered half-rate clocks.
In some embodiments, the method further includes selecting the first and second clock signals from a main clock path or a phase-interpolator clock path.
Number | Name | Date | Kind |
---|---|---|---|
4839907 | Saneski | Jun 1989 | A |
5266907 | Dacus | Nov 1993 | A |
5528198 | Baba et al. | Jun 1996 | A |
5602884 | Wieczorkiewicz et al. | Feb 1997 | A |
5629651 | Mizuno | May 1997 | A |
5802356 | Gaskins et al. | Sep 1998 | A |
6026134 | Duffy et al. | Feb 2000 | A |
6307906 | Tanji et al. | Oct 2001 | B1 |
6316987 | Dally et al. | Nov 2001 | B1 |
6380783 | Chao et al. | Apr 2002 | B1 |
6389091 | Yamaguchi et al. | May 2002 | B1 |
6509773 | Buchwald et al. | Jan 2003 | B2 |
6717478 | Kim et al. | Apr 2004 | B1 |
7199728 | Dally et al. | Apr 2007 | B2 |
7336112 | Sha et al. | Feb 2008 | B1 |
7535957 | Ozawa et al. | May 2009 | B2 |
7616075 | Kushiyama | Nov 2009 | B2 |
7650525 | Chang et al. | Jan 2010 | B1 |
7688929 | Co | Mar 2010 | B2 |
7860190 | Feller | Dec 2010 | B2 |
8036300 | Evans et al. | Oct 2011 | B2 |
8253454 | Lin | Aug 2012 | B2 |
8791735 | Shibasaki | Jul 2014 | B1 |
9036764 | Hossain et al. | May 2015 | B1 |
9059816 | Simpson et al. | Jun 2015 | B1 |
9306621 | Zhang et al. | Apr 2016 | B2 |
9374250 | Musah et al. | Jun 2016 | B1 |
9397868 | Hossain et al. | Jul 2016 | B1 |
9438409 | Liao et al. | Sep 2016 | B1 |
9520883 | Shibasaki | Dec 2016 | B2 |
9565036 | Zerbe et al. | Feb 2017 | B2 |
9577815 | Simpson et al. | Feb 2017 | B1 |
9602111 | Shen | Mar 2017 | B1 |
9906358 | Tajalli | Feb 2018 | B1 |
9960902 | Lin et al. | May 2018 | B1 |
10055372 | Shokrollahi | Aug 2018 | B2 |
20030001557 | Pisipaty | Jan 2003 | A1 |
20030146783 | Bandy et al. | Aug 2003 | A1 |
20040092240 | Hayashi | May 2004 | A1 |
20050024117 | Kubo et al. | Feb 2005 | A1 |
20050084050 | Cheung et al. | Apr 2005 | A1 |
20050117404 | Savoj | Jun 2005 | A1 |
20050128018 | Meltzer | Jun 2005 | A1 |
20050201491 | Wei | Sep 2005 | A1 |
20050220182 | Kuwata | Oct 2005 | A1 |
20050275470 | Choi | Dec 2005 | A1 |
20060140324 | Casper et al. | Jun 2006 | A1 |
20060232461 | Felder | Oct 2006 | A1 |
20070001713 | Lin | Jan 2007 | A1 |
20070001723 | Lin | Jan 2007 | A1 |
20070047689 | Menolfi et al. | Mar 2007 | A1 |
20070147559 | Lapointe | Jun 2007 | A1 |
20070201597 | He et al. | Aug 2007 | A1 |
20080007367 | Kim | Jan 2008 | A1 |
20080165841 | Wall et al. | Jul 2008 | A1 |
20080181289 | Moll | Jul 2008 | A1 |
20080317188 | Staszewski et al. | Dec 2008 | A1 |
20090103675 | Yousefi et al. | Apr 2009 | A1 |
20090167389 | Reis | Jul 2009 | A1 |
20090195281 | Tamura et al. | Aug 2009 | A1 |
20090231006 | Jang | Sep 2009 | A1 |
20090262876 | Arima et al. | Oct 2009 | A1 |
20100156543 | Dubey | Jun 2010 | A1 |
20100180143 | Ware et al. | Jul 2010 | A1 |
20100220828 | Fuller et al. | Sep 2010 | A1 |
20110002181 | Wang et al. | Jan 2011 | A1 |
20110025392 | Wu | Feb 2011 | A1 |
20110311008 | Slezak et al. | Dec 2011 | A1 |
20120206177 | Colinet et al. | Aug 2012 | A1 |
20120327993 | Palmer | Dec 2012 | A1 |
20130088274 | Gu | Apr 2013 | A1 |
20130091392 | Valliappan et al. | Apr 2013 | A1 |
20130207706 | Yanagisawa | Aug 2013 | A1 |
20130243127 | Ito et al. | Sep 2013 | A1 |
20130271194 | Madoglio et al. | Oct 2013 | A1 |
20130285720 | Jibry | Oct 2013 | A1 |
20130314142 | Tamura et al. | Nov 2013 | A1 |
20140286381 | Shibasaki | Sep 2014 | A1 |
20150078495 | Hossain et al. | Mar 2015 | A1 |
20150117579 | Shibasaki | Apr 2015 | A1 |
20150180642 | Hsieh et al. | Jun 2015 | A1 |
20150220472 | Sengoku | Aug 2015 | A1 |
20150256326 | Simpson et al. | Sep 2015 | A1 |
20160056980 | Wang et al. | Feb 2016 | A1 |
20160134267 | Adachi | May 2016 | A1 |
20170310456 | Tajalli | Oct 2017 | A1 |
20180083763 | Black | Mar 2018 | A1 |
20180115410 | Tajalli et al. | Apr 2018 | A1 |
20180375693 | Zhou et al. | Dec 2018 | A1 |
Number | Date | Country |
---|---|---|
203675093 | Jun 2014 | CN |
0740423 | Oct 1996 | EP |
Entry |
---|
Loh, Matthew, et al., “A 3x9 Gb/s Shared, All-Digital CDR for High-Speed, High-Density I/O”, IEEE Journal of Solid-State Circuits, vol. 47, No. 3, Mar. 2012, 641-651 (11 pages). |
Riley, M. W., et al., “Cell Broadband Engine Processor: Design and Implementation”, IBM Journal of Research and Development, vol. 51, No. 5, Sep. 2007, 545-557 (13 pages). |