The following prior applications are herein incorporated by reference in their entirety for all purposes:
U.S. patent application Ser. No. 16/107,822 filed Aug. 21, 2018, entitled “High Performance Phase Locked Loop”, naming Armin Tajalli, hereinafter referred to as [Tajalli I].
The present embodiments relate to communications systems circuits generally, and more particularly to obtaining a stable, correctly phased receiver clock signal from a high-speed multi-wire interface used for chip-to-chip communication.
In modern digital systems, digital information has to be processed in a reliable and efficient way. In this context, digital information is to be understood as information available in discrete, i.e., discontinuous values. Bits, collections of bits, and numbers from a finite set can all be used to represent digital information.
In most chip-to-chip, or device-to-device communication systems, communication takes place over a plurality of wires to increase the aggregate bandwidth. A single wire or a pair of these wires may be referred to as a channel or link, and multiple channels create a communication bus between the electronic components. At the physical circuitry level, in chip-to-chip communication systems, buses are typically made of electrical conductors in the package between chips and motherboards, on printed circuit boards (“PCBs”), or in cables and connectors between PCBs. In high frequency applications, microstrip or stripline PCB traces may be used.
Common methods for transmitting signals over bus wires include single-ended and differential signaling methods. In applications requiring high-speed communications, those methods can be further optimized in terms of power consumption and pin-efficiency. More recently, vector signaling methods have been proposed to further optimize the trade-offs between power consumption, pin efficiency and noise robustness of chip-to-chip communication systems. In such vector signaling systems, digital information at the transmitter is transformed into a different representation space in the form of a vector codeword that is chosen in order to optimize the power consumption, pin-efficiency and speed trade-offs based on the transmission channel properties and communication system design constraints. Herein, this process is referred to as “encoding”. The encoded codeword is communicated as a group of signals from the transmitter to one or more receivers. At a receiver, the received signals corresponding to the codeword are transformed back into the original digital information representation space. Herein, this process is referred to as “decoding”.
Regardless of the encoding method used, the received signals presented to the receiving device must be sampled (or their signal value otherwise recorded) at intervals best representing the original transmitted values, regardless of transmission channel delays, interference, and noise. Such Clock and Data Recovery (CDR) not only determines the appropriate sample timing, but may continue to do so continuously, providing dynamic compensation for varying signal propagation conditions.
Many known CDR systems utilize a Phase-Locked Loop (PLL) or Delay-Locked Loop (DLL) to synthesize a local receive clock having an appropriate frequency and phase for accurate receive data sampling.
To reliably detect the data values transmitted over a communications system, a receiver measures the received signal value amplitudes at carefully selected times. Various methods are known to facilitate such receive measurements, including reception of one or more dedicated clock signals associated with the transmitted data stream, extraction of clock signals embedded within the transmitted data stream, and synthesis of a local receive clock from known attributes of the communicated data stream. In general, the receiver embodiments of such timing methods are described as Clock-Data Recovery (CDR) or alternatively as Clock-Data Alignment (CDA), often based on Phase-Locked Loop (PLL) or Delay-Locked Loop (DLL) synthesis of a local receive clock having the desired frequency and phase characteristics.
In both PLL and DLL embodiments, a phase comparator compares the relative phase (and in some variations, the relative frequency) of a received reference signal and a local clock signal to produce an error signal, which is subsequently used to correct the phase and/or frequency of the local clock source and thus minimize the error. As this feedback loop behavior will lead to a given PLL embodiment producing a fixed phase relationship (as examples, 0 degrees or 90 degrees of phase offset) between the reference signal and the local clock, an additional fixed or variable phase adjustment is often introduced to permit the phase offset to be set to a different desired value (as one example, 45 degrees of phase offset) to facilitate receiver data detection.
Methods and systems are described herein for generating a plurality of phases of an oscillator signal via two rings of a voltage controlled oscillator (VCO), each ring of the two rings (i) generating a subset of the phases of the plurality of phases of the oscillator signal at outputs of three stages of inverters, (ii) inverse-phase locking the subsets of phases of the plurality of phases of the oscillator signal of the two rings of each stage to a corresponding stage in an other ring of the two rings using inverters, and (iii) receiving inputs at each stage from a previous stage in the ring and a feed-forward signal from a successive stage in the other ring of the two rings, generating a tail current at a tail current supply, the tail current comprising a low-magnitude proportional component associated with high-frequency phase comparisons and a high-magnitude integral component associated with an accumulation of phase comparison, and supplying the two rings of the VCO with the tail current.
In some embodiments, synthesis of additional local clock phases is desirable to enable multi-phase or pipelined processing of received data values, facilitate phase interpolation, and/or provide additional inputs to the phase detection process to reduce clock jitter and/or improve PLL closed-loop bandwidth. As one example, [Tajalli I] describes an embodiment in which multiple voltage-controlled oscillator (VCO) phases and (optionally, multiple delayed) instances of a received clock reference are compared using a matrix of phase comparator elements, the multiple partial phase error signals of which are combined in a weighted summation to provide a VCO phase error correction.
VCOs utilizing ring-connected strings of active circuit elements are well represented in the art. The basic oscillation frequency of such a VCO is determined by the total propagation time of the string of active elements. Thus, to enable high-speed operation, a simple digital inverter having minimal propagation delay is typically used as the active element. Ring-connected strings of differential amplifiers or buffers are also known, with stable oscillation occurring as long as the overall phase shift at the desired oscillation frequency is an odd multiple of 180 degrees.
The ring oscillation frequency may be varied using a control signal that adjusts active circuit element delay, which may in turn be a function of circuit gain and switching threshold. Alternatively, the effective propagation delay may be increased by limiting the slew rate of signal transitions propagating between ring elements, either explicitly by constraining the output current drive of each active circuit element, or implicitly by constraining the supply voltage or current provided to each active circuit element.
Conveniently, an N-element ring oscillator inherently provides N multiple clock output phases, each typically offset by 180/N degrees of phase difference. Thus, a ring oscillator comprising three identical single-ended digital inverters can provide three distinct phases of oscillator output signal.
A typical PLL is composed of a phase comparator that compares an external reference signal to an internal clock signal, a low pass filter that smooths the resulting error value to produce a clock control signal, and a variable frequency clock source (typically, a Voltage Controlled Oscillator or VCO) controlled by the smoothed error value, producing the internal clock signal presented to the phase comparator. In a well-known variation, such a PLL design may incorporate a clock frequency divider between the VCO and the phase comparator, allowing a higher-frequency clock output to be phase locked to a lower-frequency reference signal.
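As a non-limiting illustration of the loop just described, the following minimal sketch models a PLL with a feedback divider in the phase domain. It is a behavioral approximation only, not the circuit of any embodiment: the function name `simulate_pll`, the gains `kp` and `ki`, the normalized frequency units, and the forward-Euler time stepping are all assumptions introduced for this example.

```python
def simulate_pll(f_ref, divide_by, f_free, kp, ki, dt, steps):
    """Phase-domain PLL model; returns the final VCO frequency.

    Phases are measured in cycles and frequencies in normalized units.
    kp and ki are hypothetical loop-filter gains, not values from the text.
    """
    ref_phase = 0.0   # accumulated phase of the reference signal
    vco_phase = 0.0   # accumulated phase of the VCO output
    integral = 0.0    # integral (long time constant) part of the loop filter
    f_vco = f_free
    for _ in range(steps):
        # Phase comparator: reference versus the divided-down VCO clock.
        error = ref_phase - vco_phase / divide_by
        # Loop filter: proportional term plus accumulated integral term.
        integral += ki * error * dt
        # VCO: free-running frequency adjusted by the control value.
        f_vco = f_free + kp * error + integral
        ref_phase += f_ref * dt
        vco_phase += f_vco * dt
    return f_vco


if __name__ == "__main__":
    # A VCO free-running at 3.0 is pulled to 4 x 1.0 by a divide-by-4 loop.
    print(simulate_pll(f_ref=1.0, divide_by=4, f_free=3.0,
                       kp=8.0, ki=16.0, dt=0.01, steps=5000))
```

With the divider set to 4, the modeled VCO settles at four times the reference frequency, which is the behavior the divider variation is intended to provide.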
In an alternative Delay-Locked Loop (DLL) embodiment, the variable frequency clock source is replaced by a variable delay element, having (optionally multiple tapped) outputs representing one or more successive time-delayed versions of the original input signal rather than successive cycles of an oscillator to be phase compared to the reference input signal. For the purposes of this document, the variable delay elements utilized in a DLL are considered functionally equivalent to the variable delay elements of a ring-connected oscillator VCO in a PLL embodiment.
In some embodiments, the PLL may synthesize additional local clock phases to enable multi-phase or pipelined processing of received data values, facilitate phase interpolation, and/or provide additional inputs to the phase detection process to reduce clock jitter and/or improve PLL closed-loop bandwidth. As one example, [Tajalli I] describes an embodiment in which multiple VCO phases and (optionally, multiple delayed) instances of a received clock reference are compared using a matrix of phase comparator elements, the multiple partial phase error signals of which are combined in a weighted summation to provide a VCO phase error correction.
A simple digital XOR gate may be used as a phase comparator. As a non-limiting example, an XOR between two square wave signals having a phase difference will result in a variable-duty-cycle waveform which, when low pass filtered into an analog error signal, results in a proportional error signal centered in its analog signal range when the two input signals have a 90-degree phase offset relationship. More complex finite state machine (FSM) phase comparators compare the relative arrival times of clock transitions, as one example using edge-triggered latches clocked respectively by the reference and internal clock signals, with the relative arrival time of the clock edges resulting in an “early” or “late” output signal that produces a corresponding correction of the VCO phase. Other FSM phase comparator embodiments are additionally sensitive to clock frequency differences, enabling faster initial loop lock at startup. Further embodiments accumulate multiple phase error measurements into an integral error result which may be used alone or in combination with a short-term proportional result as a VCO control value. Some embodiments incorporate some or all of these functions into firmware or software executing on a CPU, or implemented as a hardware state machine.
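The XOR phase comparator behavior described above may be illustrated numerically. The following sketch, offered without limitation, samples two ideal 50% duty cycle square waves over one period and averages their XOR, standing in for the low pass filter. The function name `xor_phase_detector_average` and the sampling resolution are assumptions made for this example.

```python
def xor_phase_detector_average(phase_offset_deg, samples_per_cycle=3600):
    """Average XOR output of two ideal square waves over one cycle.

    The second wave lags the first by phase_offset_deg. The average
    stands in for the low pass filtered analog error signal.
    """
    half = samples_per_cycle // 2
    shift = round(phase_offset_deg / 360.0 * samples_per_cycle)
    high_count = 0
    for n in range(samples_per_cycle):
        a = 1 if n % samples_per_cycle < half else 0            # reference
        b = 1 if (n - shift) % samples_per_cycle < half else 0  # local clock
        high_count += a ^ b
    return high_count / samples_per_cycle


if __name__ == "__main__":
    for offset in (0, 45, 90, 180):
        print(offset, xor_phase_detector_average(offset))
```

In this idealized model the averaged output rises linearly from 0 at a 0-degree offset to 1 at a 180-degree offset, so the mid-range value of 0.5 corresponds to the 90-degree relationship noted above.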
In most PLL embodiments, the error signal produced by the phase comparator is low pass filtered and applied as an analog control voltage used to adjust the VCO operating frequency.
The control signal used to adjust the VCO frequency may be composed of multiple components: a proportional component formed from phase comparisons of a reference and a local clock signal by a phase comparator, a matrix of multiple such comparisons as taught in [Tajalli I], the output of a FSM performing frequency or phase comparisons, as well as an integral component derived from an accumulated history of phase measurements. In some embodiments, a first control signal component may correspond to a long time constant correction and a second control signal component may correspond to a short time constant correction. Identical or different weights or scaling factors may be applied to said first and second control components when they are combined.
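One way to picture the weighted combination of a short time constant (proportional) component and a long time constant (integral) component is the following sketch. The class name `ControlCombiner`, its weights, and the simple running-sum accumulator are hypothetical choices made for illustration and are not taken from any embodiment.

```python
class ControlCombiner:
    """Combines proportional and integral components into one control value."""

    def __init__(self, proportional_weight, integral_weight):
        # Hypothetical scaling factors applied to each component.
        self.proportional_weight = proportional_weight
        self.integral_weight = integral_weight
        self.accumulated_error = 0.0  # history of phase measurements

    def update(self, phase_error):
        """Accepts one phase comparison result and returns the control value."""
        self.accumulated_error += phase_error
        proportional = self.proportional_weight * phase_error
        integral = self.integral_weight * self.accumulated_error
        return proportional + integral


if __name__ == "__main__":
    combiner = ControlCombiner(proportional_weight=0.5, integral_weight=2.0)
    for error in (1.0, -1.0, 1.0):
        print(combiner.update(error))
```

The proportional term responds only to the most recent comparison, while the integral term retains the accumulated history, so the two weights set the relative influence of short-term and long-term corrections.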
In an alternative embodiment, all or part of the filtering operation is subsumed into the same digital processing used for phase comparison, with the digital error result applied to a digital-to-analog converter (DAC) to obtain an analog VCO control signal. In further embodiments, all or a portion of the VCO control signal may be applied in the digital domain.
Voltage-controlled oscillators operate responsive to an initial signal transitioning and propagating down the string of connected elements, appearing at consecutive element outputs after a delay determined by the signal propagation delay of the active circuit element. Thus, as one example offered without limitation, the initial signal transition would appear at the end of a series-connected string of four active circuit elements after (4*prop_delay), corresponding to one half-cycle of the oscillator. If the output is inverted and applied to the input (thus, the term “ring-connected”) the oscillation will continue for another half period, resulting in a square wave output with a period of (2*Σprop_delay), determined by the total propagation time Σprop_delay of the string of active elements. The number of active circuit elements in the ring may vary, with oscillation occurring as long as the overall phase shift of the string at the desired oscillation frequency is an odd multiple of 180 degrees. Thus, to enable high-speed oscillation, the string length is kept short and minimal-delay active elements such as simple digital inverters, amplifiers, or buffers are used.
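The period relationship above, (2*Σprop_delay), can be checked with a short calculation. The following sketch is illustrative only; the function names and the 25 ps per-element delay are assumed values chosen for the example, not figures from any embodiment.

```python
def ring_period_ps(prop_delays_ps):
    """Oscillation period in picoseconds: twice the total propagation time."""
    return 2 * sum(prop_delays_ps)


def ring_frequency_ghz(prop_delays_ps):
    """Oscillation frequency in GHz (1000 / period in picoseconds)."""
    return 1000.0 / ring_period_ps(prop_delays_ps)


if __name__ == "__main__":
    # Four series-connected elements, each with an assumed 25 ps delay.
    delays = [25, 25, 25, 25]
    print(ring_period_ps(delays), "ps")
    print(ring_frequency_ghz(delays), "GHz")
```

Under these assumed delays the half-cycle is 100 ps, the full period is 200 ps, and the oscillation frequency is 5 GHz, showing why short strings of minimal-delay elements are preferred for high-speed operation.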
The ring oscillation frequency may be varied using a control signal that adjusts an active circuit element parameter affecting propagation delay, such as gain, switching threshold, or output drive; low frequency embodiments are also known that incorporate configurable passive delay elements between active stages to provide additional control. At high frequencies, a significant component of an active element's propagation time can be the node charge/discharge time required for an output state change in one element to charge or discharge the parasitic capacitance of the interconnecting node and reach the switching threshold of the subsequent element's input. Under these conditions, the effective propagation delay may be varied by limiting the slew rate of signal transitions propagating between ring elements, either by explicitly adjusting the output current drive capability or output impedance of each active circuit element, or by implicitly making such adjustment by varying the supply voltage or current provided to the active circuit elements.
Conveniently, an N-element ring oscillator inherently generates N multiple clock output phases as the output of consecutive active elements, each typically offset by an additional 180/N degrees of phase difference. In embodiments based on inverting active elements, an additional 180-degree offset (i.e. inversion) will be seen at odd-numbered outputs, using the input of the first element as the reference.
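The phase relationships among the N outputs may be tabulated as in the following sketch, which assumes identical elements and expresses each output as a phase lag relative to the input of the first element. The function name `ring_output_phases` is introduced here for illustration only.

```python
def ring_output_phases(n_elements, inverting=True):
    """Phase lag in degrees of each consecutive element output.

    Each element contributes 180/N degrees of delay. With inverting
    elements, odd-numbered outputs carry an additional 180-degree offset.
    """
    step = 180.0 / n_elements
    phases = []
    for k in range(1, n_elements + 1):   # outputs numbered from 1
        phase = k * step
        if inverting and k % 2 == 1:
            phase += 180.0               # net inversion at odd outputs
        phases.append(phase % 360.0)
    return phases


if __name__ == "__main__":
    print(ring_output_phases(3))
```

For a three-inverter ring this idealized model gives lags of 240, 120, and 0 degrees, the last output coinciding with the input of the first element as required for the ring to close on itself.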
In some embodiments, the inverters, e.g., 120 and 160, generating the subset of phases of the plurality of phases of the oscillator signal and the back-to-back inverters 125 and 126 cross coupling each stage to the corresponding stage in the other ring have different sizes. In some such embodiments, inverters 120 and 160 may have a relative size of 1×, and the back-to-back inverters 125 and 126 may have a relative size of <1×. In some embodiments, the back-to-back inverters 125 and 126 have a size of approximately 0.4× with respect to inverters 120 and 160. In some embodiments, feed-forward inverters 128 and 168 have a size of approximately 0.6× with respect to inverters 120 and 160.
In some embodiments, generating the tail current includes receiving control signals at input transistors and responsively generating the low-magnitude proportional component of the tail current and the high-magnitude integral component of the tail current. In some embodiments, the control signal for generating the low-magnitude proportional component of the tail current is received at an input transistor having a smaller size than an input transistor receiving the control signal for generating the high-magnitude integral component of the tail current. In some embodiments, the control signals for generating the low-magnitude proportional component of the tail current and for generating the high-magnitude integral component of the tail current are received at respective sets of equal-sized transistors. In some such embodiments, the respective set of equal-sized transistors receiving the control signal for generating the low-magnitude proportional component of the tail current includes fewer input transistors than the respective set of input transistors receiving the control signal for generating the high-magnitude integral component of the tail current. In some embodiments, the control signals are generated using a phase and frequency detection circuit (not shown).
One embodiment of a VCO includes two rings each having three stages of series-connected digital inverters, the two rings together providing differential (i.e. 180 degree offset) clock signals. Each ring feeds back on itself to maintain the three-inversion topology for oscillation with a (2*3*gate_delay) period. In the illustration of
To maintain an inverse-phase lock between the two rings, each ring node is cross-connected to its corresponding node on the other ring using back-to-back digital inverters, maintaining the desired 180 degree phase offset between corresponding nodes on the two rings. In such embodiments, the back-to-back inverters provide bidirectional synchronization between corresponding nodes on the two rings, as well as introducing a small amount of hysteresis into node switching transitions. Thus, the output of first ring node 120 is cross-connected to the output of second ring node 160 by inverters 125 and 126, the output of 130 is cross-connected to the output of 170 by inverters 135 and 136, and the output of 140 is cross-connected to the output of 180 by inverters 145 and 146.
As the desired ring oscillation frequency approaches the design limits of the embodiment's integrated circuit process, each ring node is also driven with a small amount of feed-forward signal from a node 60 degrees earlier in the oscillation cycle (which, in the example three inverting element two ring topology, may be obtained from a successive stage of the other ring.) This feed-forward signal begins to drive the node in anticipation of the switching transition, allowing operation at a higher frequency than would otherwise be possible. In
Such anticipatory signaling cannot exceed that of the primary signal path, or spurious high-frequency oscillation can occur. Similarly, signaling on the cross-coupling path introduces hysteresis which delays switching transitions, so must also be constrained to be significantly less than that of the primary signal path. The amount of anticipatory and cross-coupled signaling may be controlled by scaling the size of the transistors (and thus their current drive capability) of the inverters relative to the transistor size of the inverters used on the primary signal path.
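The phase relationships of the two-ring, three-stage topology may be verified with a small idealized model. The following sketch assumes every stage contributes an inversion plus 60 degrees of delay and that the second ring is locked exactly 180 degrees from the first. The 0-based ring and stage indices and all function names are conventions introduced for this example and do not correspond to the reference numerals of any figure.

```python
STAGES = 3
STAGE_DELAY_DEG = 180.0 / STAGES  # 60 degrees per stage in this model


def node_phase(ring, stage):
    """Phase lag in degrees of a stage output; ring 0, stage 0 is reference."""
    lag = stage * (180.0 + STAGE_DELAY_DEG)  # inversion plus delay per stage
    lag += 180.0 * ring                      # inverse-phase locked second ring
    return lag % 360.0


def primary_source(ring, stage):
    """Primary input: output of the previous stage in the same ring."""
    return ring, (stage - 1) % STAGES


def feed_forward_source(ring, stage):
    """Feed-forward input: output of the successive stage in the other ring."""
    return 1 - ring, (stage + 1) % STAGES


if __name__ == "__main__":
    for ring in (0, 1):
        for stage in range(STAGES):
            primary = node_phase(*primary_source(ring, stage))
            early = node_phase(*feed_forward_source(ring, stage))
            lead = (primary - early) % 360.0
            print(f"ring {ring} stage {stage}: feed-forward leads by {lead}")
```

In this model the feed-forward source leads the primary input of every stage by 60 degrees, consistent with obtaining the anticipatory signal from a successive stage of the other ring.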
In one particular embodiment, feed-forward signaling was found to be of benefit at approximately 60% of the drive level of the primary signal path, with cross-coupling at approximately 40% of the drive level of the primary signal path. Smaller amounts of feed-forward signaling provided correspondingly smaller speed-up benefit. Larger amounts of cross-coupling increased the effective propagation delay of the active ring elements, and significantly smaller amounts reduced the desired locked phase relationship between the first and second rings.
As the output current and thus the output slew rate of a CMOS inverter varies according to supply voltage, the varying output slew rate of each inverter into its output node capacitance will result in a variation of propagation delay with supply voltage, providing a mechanism for adjusting the ring oscillation frequency. As shown in
For the inverter structures shown in
The particular examples of three stages of active elements per ring and two rings do not imply a limitation in either minimum or maximum, although the available phase differences within a two element ring will generally preclude use of feed-forward speedup as described herein. Similarly, the CMOS ring inverters used for descriptive purposes above may alternatively utilize CML or other digital design conventions, or equivalent analog amplifier/buffer conventions.