A portion of the disclosure of this patent document contains material, which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention generally relates to jitter compensation clock and data recovery, and more specifically, to a four-level pulse amplitude modulation (PAM-4) receiver with jitter compensation clock and data recovery.
Driven by the proliferation of data-intensive applications such as 5G communications, cloud services, autonomous vehicles, deep neural networks and 8K display panels, high-speed and low-power data movement between processors and from processors to off-chip memories has become a crucial problem in high-performance computing systems. The explosive rise in processor I/O bandwidth demands massive low-power links with advanced signaling schemes such as four-level pulse amplitude modulation (PAM-4). As the data rate exceeds 50 Gb/s/lane with PAM-4 signaling, the signal quality becomes increasingly susceptible to jitter generated by both the channel and the circuits. Therefore, the clock distribution circuit requires extra design effort to handle the jitter issues and ensure robust system synchronization.
Source synchronous I/O is an attractive technique in chip-to-chip interconnections due to its low latency and high reliability in frequency recovery, wide jitter tolerance bandwidth and implementation simplicity. A widely adopted source synchronous I/O architecture consisting of a differential clock lane and a differential data lane is shown in
In order to achieve robust synchronization, several jitter and phase skew related issues need to be carefully handled in the clock distribution circuit. First, the data phase and clock phase are typically precisely aligned at the transmitter outputs in source synchronous I/Os. The correlated jitter between clock and data can be properly accommodated by the wideband MPLL or IJO. However, the difference in latencies between the data and clock lanes causes a phase skew between the equalized data and the clock reproduced by the Buf and DCC. The unfolded phase skew induced by the delays from the channels, equalizer (EQ), Buf, DCC, IJO/MPLL and PI can reach several UIs. Second, apart from phase skew, uncorrelated jitter exists between data and clock. For electrical interconnections, the uncorrelated jitter originates from ground and supply noise, temperature drift, front-end circuit flicker noise, channel coupling and electromagnetic interference (EMI). For optical interconnections, the uncorrelated jitter is mainly due to the noise of the photodetector and front-end circuit. Third, even when the uncorrelated jitter is sufficiently tracked, the jitter transferred to the recovered clock signal CLKREC and data signal DATAREC can still cause errors when they are synchronized to the local clock of the following digital processing systems. For massively parallel communication, the uncorrelated jitter differs from lane to lane, which also induces synchronization challenges.
Various solutions have been reported against the previously mentioned challenges. For example, a delay-locked loop (DLL) can be employed for multi-phase clock generation, together with a PI with coarse and fine phase selection to eliminate the skew between clock and data. To avoid the multi-phase clock mismatch produced by the DLL due to voltage-controlled delay line (VCDL) asymmetry, an IJO could be used for global I/Q phase generation with better phase matching using proper dummy and frequency calibration techniques, followed by a PI or a local IJO for phase skew elimination. A digitally controlled delay line in the data path or an IJO in the clock path could also be employed as a deskew method for parallel optical interconnections with a source synchronous clocking scheme. Although the above methods properly handled the static phase skew induced by the differences in lane latencies, the uncorrelated jitter between data and clock remained untracked. The resulting narrow jitter tolerance amplitude and bandwidth could stress the decoding circuit.
On the other hand, various techniques such as time-to-time phase update, clock and data alignment (CDA) and clock and data recovery (CDR) had been proposed to support sufficient jitter tracking as shown in
Jitter tolerance and jitter transfer decoupling techniques can be used to support a wide jitter tolerance bandwidth with suppressed jitter transfer to the recovered clock and data of each lane. A dual-loop configuration consisting of a wideband DLL and a narrow-band PLL was proposed to decouple the jitter tolerance and jitter transfer bandwidths. A low-pass loop filter with adjustable loop bandwidth for data and edge sampling was demonstrated to achieve a wide 20-MHz jitter tolerance bandwidth with a narrow 4-MHz jitter transfer bandwidth in a 40-Gb/s ¼-rate receiver (Rx) architecture. Nonetheless, the previously reported methods could only narrow the jitter transfer bandwidth down to a few MHz, which is not enough to filter out lower-frequency jitter from power and ground noise, temperature drift and CMOS device flicker noise. In addition, the jitter tolerance and jitter transfer decoupling technique had only been implemented in non-return-to-zero (NRZ) Rx architectures at data rates below 50 Gb/s.
To solve the above-mentioned challenges, the present invention provides a source synchronous 60-Gb/s ¼-rate PAM-4 receiver with a jitter compensation CDR (JCCDR) in 40-nm CMOS technology, achieving a wide jitter tolerance bandwidth (40 MHz) and an ultralow jitter transfer (<−8 dB).
According to one aspect of the present invention, the provided PAM-4 receiver includes a first-order delay-locked loop (DLL) which employs a bang-bang phase detector (BBPD) and a voltage-controlled delay line (VCDL) circuit supporting a 40-MHz jitter tracking bandwidth and static phase skew elimination. A second-order wideband phase-locked loop (WBPLL) using the ¼-rate reference clock provides multi-phase clock generation and ensures a sufficiently low input-to-output latency. To suppress the consequent jitter transfer, a jitter compensation circuit (JCC) acquires the jitter transfer amplitude and frequency information by detecting the DLL loop filter voltage signal VLF(s), and generates an inverted loop filter voltage signal, denoted as VLFINV(s). The VLFINV(s) signal modulates a group of complementary VCDLs (C-VCDLs) to attenuate the jitter transfer on both the recovered clock and the recovered data.
With the provided PAM-4 receiver, a jitter compensation ratio of up to 60% can be supported from DC to 4 MHz, with a −3-dB corner frequency of 40 MHz. Therefore, the present invention provides a solution to the three challenges in source synchronous I/O: clock phase deskew, wideband jitter tolerance and jitter transfer attenuation.
Aspects of the present disclosure may be readily understood from the following detailed description with reference to the accompanying figures. The illustrations may not necessarily be drawn to scale. That is, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion. There may be distinctions between the artistic renditions in the present disclosure and the actual apparatus due to manufacturing processes and tolerances. Common reference numerals may be used throughout the drawings and the detailed description to indicate the same or similar components.
In the following description, preferred examples of the present disclosure will be set forth as embodiments which are to be regarded as illustrative rather than restrictive. Specific details may be omitted so as not to obscure the present disclosure; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
The CTLE 110 is implemented as a front-end of the receiver 100 to compensate for moderate channel loss and is configured to equalize a PAM-4 input data signal (DATAIN). Referring to
Referring back to
Referring to
In some embodiments, the phase difference detection may be realized with an XOR phase detector, which produces a high voltage level signal when the states of CLKREC and CLKDLL differ from each other, and a low voltage level signal (typically equal to 0 V) when the states of CLKREC and CLKDLL are the same.
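The XOR behaviour described above can be sketched in a few lines of Python. This is a minimal behavioural model rather than the circuit of the embodiment: the function names, the 1.0 V high level, the sample counts and the ideal 50% duty-cycle square waves are all assumptions made for illustration.

```python
def square_wave(n_samples, period, phase_offset=0):
    """Ideal 50% duty-cycle clock as 0/1 samples (hypothetical helper)."""
    half = period // 2
    return [1 if ((i + phase_offset) % period) < half else 0
            for i in range(n_samples)]


def xor_phase_detector(clk_rec, clk_dll, v_high=1.0, v_low=0.0):
    """Return v_high where the two clock states differ, v_low where they match."""
    return [v_high if a != b else v_low for a, b in zip(clk_rec, clk_dll)]


def average(values):
    """Mean of the detector output, a stand-in for low-pass filtering."""
    return sum(values) / len(values)


# Assumed example: 10 clock periods of 40 samples each, with CLKDLL offset
# from CLKREC by a quarter period (90 degrees).
PERIOD = 40
clk_rec = square_wave(400, PERIOD)
clk_dll = square_wave(400, PERIOD, phase_offset=PERIOD // 4)
avg_90 = average(xor_phase_detector(clk_rec, clk_dll))
```

In this model the averaged output grows with the phase difference: it is zero for aligned clocks, half of the assumed high level at a quarter-period offset, and equal to the high level at a half-period offset.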
Referring to
In some implementations, the output clock frequency of the WBPLL may have a tuning range from 3.75 to 7.5 GHz to support 30-to-60-Gb/s PAM-4 operation. The WBPLL bandwidth is set at 400 MHz to ensure its phase and frequency updates can settle much faster than the quarter-rate delay-locked clock signal CLKDLL, whose bandwidth is designed to be 40 MHz for good jitter tolerance. The 400-MHz PLL bandwidth also supports wideband correlated jitter tracking and pattern-dependent uncorrelated jitter filtering.
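The stated tuning range follows from simple arithmetic, sketched below. The relationship used here (two bits per PAM-4 symbol and one clock cycle per four symbols in a ¼-rate architecture) is an assumption consistent with the description; the function and constant names are hypothetical.

```python
BITS_PER_PAM4_SYMBOL = 2  # four amplitude levels carry two bits per symbol


def quarter_rate_clock_ghz(data_rate_gbps, rate_divider=4):
    """Clock frequency in GHz for a sub-rate PAM-4 receiver (assumed relation)."""
    symbol_rate_gbaud = data_rate_gbps / BITS_PER_PAM4_SYMBOL
    return symbol_rate_gbaud / rate_divider


f_clk_low = quarter_rate_clock_ghz(30.0)   # lower end of the data-rate range
f_clk_high = quarter_rate_clock_ghz(60.0)  # upper end of the data-rate range

# Ratio of the WBPLL bandwidth to the DLL bandwidth quoted in the text.
bandwidth_ratio = 400.0 / 40.0
```

Under these assumptions, 30 Gb/s and 60 Gb/s map to 3.75 GHz and 7.5 GHz respectively, and the WBPLL bandwidth is ten times the DLL bandwidth.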
Referring back to
Referring to
For example, the input PAM-4 signal may be sampled and deserialized by four S/H circuits with the PH-0/90/180/270 CLKREC signals. Next, the sampled signals are decoded using the three StrongARM CMPs, with individual reference voltages generated from a 6-bit current-mode DAC for slicing the top, middle, and bottom data eyes. The offsets at the input MOSFET devices of the StrongARM CMPs are calibrated upon startup using a 6-bit DAC as the calibration circuit. The decoded 4×3-bit thermometer codes (Tcode) are then converted into 4×2-bit binary codes (Bcode) as MSBREC and LSBREC.
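The thermometer-to-binary conversion can be illustrated with a short Python sketch. It assumes a plain binary mapping (no Gray coding) and an ordering of (bottom, middle, top) comparator outputs; neither detail is specified in the description, and all names are hypothetical.

```python
# Valid 3-bit thermometer codes for the four PAM-4 levels, ordered as
# (bottom, middle, top) comparator decisions. The ordering is an assumption.
VALID_TCODES = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)}


def tcode_to_bcode(tcode):
    """Convert one 3-bit thermometer code into an (MSB, LSB) pair."""
    if tuple(tcode) not in VALID_TCODES:
        raise ValueError(f"not a valid thermometer code: {tcode}")
    level = sum(tcode)  # number of comparators that tripped: 0..3
    return (level >> 1) & 1, level & 1


def decode_quarter_rate(tcodes):
    """Decode the four Tcodes from the four sampling phases into MSB/LSB lists."""
    pairs = [tcode_to_bcode(t) for t in tcodes]
    msb_rec = [msb for msb, _ in pairs]
    lsb_rec = [lsb for _, lsb in pairs]
    return msb_rec, lsb_rec


# Four samples, one per clock phase, at levels 0, 1, 2 and 3.
msb_rec, lsb_rec = decode_quarter_rate(
    [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
)
```

Counting the tripped comparators keeps the mapping easy to check; a hardware decoder would typically use a few logic gates for the same result.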
Referring back to
Referring to
Referring back to
Referring to
Referring back to
The delay-line control voltage signal VLF(s) consists of a DC component VLFDC for fixing the locked timing point and an AC component VLFAC for tracking high-frequency jitter. Typically, the VLFDC varies from 0.15 V to 0.85 V, while VLFAC exhibits an amplitude of tens of mV and a bandwidth within 40 MHz.
Referring to
In some implementations, the CP may have an output current of 50 to 100 µA. Since the on-off switching of the CP current can cause a relatively large supply variation, the C-R-C loop filter decouples the variation on the CP power supply from the VCDL power supply. The VLF(s) signal regulates the VCDL to generate the CLKDLL signal, which tracks the jitter of the input PAM-4 signal.
The DLL 160 may further comprise a buffer (Buf) circuit and a duty cycle correction (DCC) circuit configured to correct duty cycles of the input clock signal CLKIN and convert the input clock signal from a single-ended clock signal to a differential clock signal.
Referring to
Referring to
Referring back to
Referring to
Referring to
In some embodiments, the voltage follower may include a first core amplifier (AMP) with rail-to-rail input and output connected in a unity-gain feedback configuration. That is, the first core amplifier may have a negative unity-gain feedback loop connected between an output of the AMP and an inverting input of the AMP so as to generate a unity gain.
In some embodiments, the inverting follower may comprise a second core amplifier (AMP) having a negative feedback loop formed with a feedback resistor Rfb coupled between an output of the second amplifier and an inverting input of the second amplifier; and an input resistor Rin coupled to the inverting input of the second amplifier. The feedback resistor Rfb and the input resistor Rin are set to have the same resistance (typically 10 kΩ) so as to generate an inverting unity gain (i.e., a gain close to −1).
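The gains of the two follower stages can be estimated with the standard closed-loop formulas for an operational amplifier with finite open-loop gain. The sketch below is illustrative only: the open-loop gain value is a hypothetical figure, not taken from the description, and the function names are assumptions.

```python
def follower_gain(open_loop_gain=float("inf")):
    """Closed-loop gain of a unity-gain voltage follower: A / (1 + A)."""
    return 1.0 / (1.0 + 1.0 / open_loop_gain)


def inverting_gain(r_fb, r_in, open_loop_gain=float("inf")):
    """Closed-loop gain of an inverting amplifier with finite open-loop gain."""
    ideal = -r_fb / r_in
    noise_gain = 1.0 + r_fb / r_in
    return ideal / (1.0 + noise_gain / open_loop_gain)


R_FB = 10e3  # ohms, value quoted in the text
R_IN = 10e3  # ohms, value quoted in the text
A_OL = 1000.0  # hypothetical open-loop gain (60 dB), assumed for illustration

ideal_inverting = inverting_gain(R_FB, R_IN)
real_inverting = inverting_gain(R_FB, R_IN, A_OL)
real_follower = follower_gain(A_OL)
```

With equal resistors the ideal inverting gain is exactly −1; a finite open-loop gain pulls both stages slightly below unity magnitude, which is one source of AC gain error in the inverted control voltage.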
In some embodiments, the SAR-ADC may include a comparator (CMP) with regeneration (RG) configured to receive the buffered delay-line control voltage signal VLFBuf(s) at a first input terminal; a SAR logic circuit coupled to an output of the comparator and configured to provide a digital output; and a digital-to-analog converter (DAC) (e.g., an R-2R DAC) configured to receive the digital output from the SAR logic circuit, convert the digital output into the analog delay-line control voltage VLFDAC(s), and feed the analog delay-line control voltage VLFDAC(s) back into a second input terminal of the comparator. As such, upon receiving the enabling signal VENABLE from the lock detector, the SAR-ADC can start operation to detect, reproduce, and maintain the DC level of VLFBuf on the R-2R DAC as VLFDAC, which can be designed to track VLFDC with an error of typically less than 7 mV.
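The successive-approximation search performed by the comparator, SAR logic and DAC can be sketched behaviourally as follows. The sketch assumes an ideal 8-bit DAC and a 1.0 V reference; the reference voltage, the example input and the function names are hypothetical, and comparator offset and noise are ignored.

```python
def r2r_dac(code, n_bits=8, v_ref=1.0):
    """Ideal DAC output voltage for a given digital code (assumed v_ref)."""
    return v_ref * code / (1 << n_bits)


def sar_convert(v_in, n_bits=8, v_ref=1.0):
    """Binary-search the code whose DAC output is the closest value <= v_in."""
    code = 0
    for bit in reversed(range(n_bits)):  # test MSB first, LSB last
        trial = code | (1 << bit)
        # Comparator decision: keep the bit if the DAC output does not
        # exceed the input voltage.
        if r2r_dac(trial, n_bits, v_ref) <= v_in:
            code = trial
    return code


V_LF_BUF_DC = 0.6  # hypothetical DC level of the buffered control voltage
code = sar_convert(V_LF_BUF_DC)
v_lf_dac = r2r_dac(code)
tracking_error = V_LF_BUF_DC - v_lf_dac
```

In this idealized model the tracking error is bounded by one LSB of the assumed 1.0 V range (about 3.9 mV); a real converter adds comparator offset and DAC nonlinearity on top of that.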
The 8-bit SAR logic circuit consists of eight identical SAR logic units. As shown in
As shown in
Referring back to
Referring to
In other words, the first C-VCDL circuit may include one or more complementary voltage-controlled delay cells for generating the jitter-compensated recovered clock signal CLKRECJC, which has a delay time proportional to the inverted delay-line control voltage signal VLFINV(s) with reference to the input clock signal CLKIN.
The second C-VCDL circuit may include one or more complementary voltage-controlled delay cells for generating the jitter-compensated recovered LSB signal LSBRECJC which has a delay time proportional to the inverted delay-line control voltage signal VLFINV(s) with reference to the input clock signal CLKIN.
The third C-VCDL circuit may include one or more complementary voltage-controlled delay cells for generating the jitter-compensated recovered MSB signal MSBRECJC which has a delay time proportional to the inverted delay-line control voltage signal VLFINV(s) with reference to the input clock signal CLKIN.
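A time-domain sketch can illustrate why a delay proportional to the inverted control voltage cancels the jitter tracked by the VCDL. The model below assumes linear, matched delay gains and a control voltage inverted about its DC level; the 100 ps/V gain, the 0.5 V DC level, the 20 mV amplitude and all names are hypothetical values chosen within the ranges the description gives for VLFDC and VLFAC.

```python
import math


def vlf_inverted(vlf, vlf_dc, k_csg=-1.0):
    """Invert the AC part of VLF about its DC level (k_csg = -1 is ideal)."""
    return vlf_dc + k_csg * (vlf - vlf_dc)


def combined_delay_ps(vlf, vlf_dc, k_vcdl=100.0, k_cvcdl=100.0, k_csg=-1.0):
    """VCDL delay plus C-VCDL delay, with assumed linear gains in ps/V."""
    return k_vcdl * vlf + k_cvcdl * vlf_inverted(vlf, vlf_dc, k_csg)


def peak_to_peak(values):
    return max(values) - min(values)


VLF_DC = 0.5        # volts, assumed DC level
VLF_AC_AMPL = 0.02  # volts, assumed jitter-tracking amplitude

# One period of sinusoidal jitter riding on the control voltage.
vlf_wave = [VLF_DC + VLF_AC_AMPL * math.sin(2 * math.pi * n / 64)
            for n in range(64)]

uncompensated = [100.0 * v for v in vlf_wave]  # VCDL delay alone
compensated = [combined_delay_ps(v, VLF_DC) for v in vlf_wave]
partial = [combined_delay_ps(v, VLF_DC, k_csg=-0.6) for v in vlf_wave]
```

With ideal inversion and matched gains, the combined delay stays constant while the VCDL delay alone swings by 4 ps; with a weaker inversion gain of −0.6, 40% of that swing remains.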
The principle of jitter compensation can also be illustrated using loop dynamic analysis. The closed-loop transfer function (CLTF) of the jitter transferred from the input data to the DLL can be derived as:
where Φin(s) stands for the input jitter and RT stands for the transition ratio (typically equal to 0.5). Ke and KCP represent the gains of the BBPD and the charge pump (CP), respectively. The effect of the WBPLL is not included, since its loop bandwidth is ten times higher than that of the DLL.
The CLTF from Φin(s) to the recovered clock phase ΦCLKREC(s) can be represented by:
where KVCDL represents the gain of the VCDL circuit.
Eq. (2) illustrates the jitter transfer behavior of the DLL. The 3-dB bandwidth of Eq. (2) determines the jitter tolerance bandwidth, defined as:
The CLTF from Φin(s) to the phase of jitter compensated clock ΦCLKRECJC(s) can be determined by:
where KCSG represents the gain of the CSG circuit and KPV represents the gain induced by process variation.
Ideally, KCSG is equal to −1, so that the VLFINV(s) signal has exactly the same amplitude as VLF(s) with inverted phase and complete jitter transfer compensation can be achieved. However, two non-ideal factors cause KCSG to deviate from −1: the AC gain errors in the voltage follower and inverting follower, and the DC offset between VLFDC and VLFINVDC. Therefore, KCSG can be represented as:
$$K_{CSG} = K_{ACgain} \cdot K_{DCoffset} \qquad (6)$$
where KACgain is the AC gain of the voltage follower and the inverting follower, and KDCoffset is the DC offset gain of the voltage follower and the SAR-ADC in the CSG circuit.
The DC offset gain may be calculated using the equation:
where KVCDL(VLFDC) represents the KVCDL value at VLFDC.
As described previously, the function of the CSG is to produce the VLFINV(s) signal with the same amplitude as VLF(s) and inverted phase. The mismatch factor between KVCDL and KCVCDL due to local process variation is included in KPV, which is close to 1 with a fully symmetric layout.
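The effect of gain error and mismatch on the compensation can be estimated numerically. The sketch below assumes that the residual jitter transfer scales as |1 + KCSG·KPV|, an inference from the statement that KCSG = −1 gives complete compensation; it is not a reproduction of the equations in this description, and the function names and example gain values are hypothetical.

```python
import math


def csg_gain(k_ac_gain, k_dc_offset):
    """K_CSG as the product of the AC gain and the DC offset gain, per Eq. (6)."""
    return k_ac_gain * k_dc_offset


def residual_factor(k_csg, k_pv=1.0):
    """Assumed fraction of jitter transfer left after compensation."""
    return abs(1.0 + k_csg * k_pv)


def residual_jitter_transfer_db(k_csg, k_pv=1.0):
    """Residual jitter transfer in dB relative to the uncompensated clock."""
    residual = residual_factor(k_csg, k_pv)
    if residual == 0.0:
        return float("-inf")  # complete cancellation in the ideal case
    return 20.0 * math.log10(residual)


def compensation_ratio(k_csg, k_pv=1.0):
    """Fraction of the jitter transfer removed by the compensation."""
    return 1.0 - residual_factor(k_csg, k_pv)


# Hypothetical non-ideal gains, for illustration only.
k_csg_example = csg_gain(k_ac_gain=-0.98, k_dc_offset=0.95)
residual_at_60_percent_db = residual_jitter_transfer_db(-0.6)
```

Under this assumption, a compensation ratio of 60% leaves a residual factor of 0.4, which is close to −8 dB and roughly in line with the jitter transfer figure quoted earlier in the description.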
In a real CMOS implementation, the offset and gain error in the CSG, and the mismatch between the VCDL and C-VCDL due to process variation, can degrade the jitter transfer compensation performance. In order to ensure better matching, the VCDL and C-VCDL circuits are placed close to each other and protected by dummies at both ends in the circuit layout, as shown in
The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. While the methods disclosed herein have been described with reference to particular operations performed in a particular order, it will be understood that these operations may be combined, sub-divided, or re-ordered to form an equivalent method without departing from the teachings of the present disclosure. Accordingly, unless specifically indicated herein, the order and grouping of the operations are not limitations. While the apparatuses disclosed herein have been described with reference to particular structures, shapes, materials, composition of matter and relationships . . . etc., these descriptions and illustrations are not limiting. Modifications may be made to adapt a particular situation to the objective, spirit and scope of the present disclosure. All such modifications are intended to be within the scope of the claims appended hereto.
The present application claims priority to U.S. Provisional Patent Application No. 63/190,829, filed May 20, 2021, the disclosure of which is incorporated herein by reference in its entirety.
Publication Number | Date | Country |
---|---|---|
20220385444 A1 | Dec 2022 | US |
Provisional Application Number | Date | Country |
---|---|---|
63190829 | May 2021 | US |