This application is a non-provisional of U.S. Provisional Application No. 61/773,732 filed Mar. 6, 2013 and incorporated herein in its entirety by this reference.
Clock-forwarded architectures help to achieve higher data rates because jitter is tracked between the transmitted data and the clock signals. Further, higher data rates can be achieved by using quadrature data rate, i.e., transmitting data on the rising and falling edges of a pair of quadrature clocks, e.g., ICLK/QCLK.
In this context, it is important to maintain a desired duty-cycle of the two clock signals, and a desired phase, e.g., a quadrature-phase (90° lag) relationship between them. In some situations, the transmitter-receiver loop traverses two or more devices. For example, control of the duty-cycle and phase shift may be performed in a Memory Controller PHY, while detection of duty-cycle/phase error is performed in the Memory PHY. A precise 50% duty cycle and quadrature-phase relation of the clock signals helps to reduce data recovery error rates and therefore improve effective transmission rates.
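As an illustration only, the following Python sketch computes the four edge positions of ICLK and QCLK within one clock period from each clock's duty cycle and the lag of QCLK behind ICLK, and reports the gaps between successive edges. It assumes ideal square waves with ICLK rising at 0°; the function name qdr_edge_gaps is hypothetical. With 50% duty cycles and a 90° lag, all four gaps are equal; a duty-cycle or phase error makes them unequal.

```python
def qdr_edge_gaps(i_duty, q_duty, q_lag_deg):
    """Return the gaps (degrees) between successive edges of ICLK and QCLK.

    Assumes ideal square waves with ICLK rising at 0 degrees. i_duty and
    q_duty are high-time fractions (0..1); q_lag_deg is QCLK's lag behind ICLK.
    """
    edges = sorted([
        0.0,                                   # ICLK rising edge
        i_duty * 360.0,                        # ICLK falling edge
        q_lag_deg % 360.0,                     # QCLK rising edge
        (q_lag_deg + q_duty * 360.0) % 360.0,  # QCLK falling edge
    ])
    # Gap from each edge to the next, wrapping around the 360-degree period.
    return [(edges[(k + 1) % 4] - edges[k]) % 360.0 for k in range(4)]

print(qdr_edge_gaps(0.5, 0.5, 90.0))   # → [90.0, 90.0, 90.0, 90.0]
print(qdr_edge_gaps(0.55, 0.5, 90.0))  # a 55% ICLK duty cycle skews the gaps
```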
This disclosure is not intended to be limited to any particular implementation. Rather, the implementation details discussed below are merely to illustrate the inventive concepts. Aspects of the present disclosure can be utilized, for example, in memory systems, or in any other systems where clock forwarding is employed. In some embodiments, the data transmitter may be in a memory controller, and the corresponding data receiver may be in a corresponding memory. Memory controllers may be standalone or integrated “on-chip,” for example in a microprocessor. Here we use the term “memory” in a broad sense to include, without limitation, one or more of various integrated circuits, components, modules, sub-systems, etc. Other aspects of this disclosure relate to assessing and reducing integral non-linearity (INL) in phase interpolator (PI) circuits. These aspects may be used in other contexts not limited to clock-forwarded architectures and applications.
Equal spacing of all four clock signal edges requires:
One step toward meeting these criteria is measuring the duty cycle of the clock signals.
The output of the PI is referred to as a sampling clock signal (SCLK) in the drawing, as it is used to sample the state of the clock signal under test 32. Each cycle of SCLK clocks, or triggers, the flip-flop 30 to output at node 54 the current state of the clock signal under test at its input 32. The PI shifts the falling edge of its input clock signal responsive to the current phase code input from source 50. A counter or the like (not shown) counts the number of 1s in the samples at node 54. This number is compared to the total number of samples (cycles of SCLK) to provide an indication of the duty-cycle of the input clock signal. For example, if 120 “1s” are counted out of a total of 200 cycles, the duty-cycle of the clock signal waveform may be estimated to be 120/200=60%.
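A minimal behavioral sketch of this measurement follows, assuming an ideal, linear PI with a hypothetical 64 phase codes per clock period, no jitter, and one sample per SCLK cycle while the phase code is stepped. The names clock_level and estimate_duty_cycle are illustrative, not taken from the disclosure. The estimate is quantized to one phase-code step.

```python
def clock_level(phase, duty):
    """Level of an ideal clock under test; phase is a fraction of one period."""
    return 1 if (phase % 1.0) < duty else 0

def estimate_duty_cycle(duty, codes_per_period=64, periods=4):
    """Step the phase code once per SCLK cycle and count the 1s sampled.

    Assumes an ideal, linear PI whose output phase is code / codes_per_period.
    """
    ones = 0
    total = 0
    for n in range(codes_per_period * periods):
        code = n % codes_per_period          # phase code from the code source
        phase = code / codes_per_period      # sampling instant within the period
        ones += clock_level(phase, duty)     # flip-flop output (node 54)
        total += 1                           # one sample per SCLK cycle
    return ones / total

print(estimate_duty_cycle(0.6))  # → 0.609375 (within one code step of 0.6)
```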
In use, a circuit of the type illustrated in
The mux 430 outputs a corrected clock signal at node 460, responsive to a select input at node 462. If the duty cycle of the uncorrected clock is greater than 50%, the select signal at node 462 causes the multiplexer 430 to select input number 1. Alternatively, if the duty cycle of the uncorrected clock is less than 50%, the select signal at node 462 is set to select input number 0. Using this arrangement, the corrected clock signal at node 460 will have a falling edge adjusted by Δ phase code counts relative to the input clock signal, to arrive at a 50% duty cycle. The circuitry of
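The select logic described above can be sketched in Python as follows. What mux inputs 0 and 1 actually carry is not stated in the text available here; the sketch assumes, hypothetically, that input 1 carries the falling edge advanced by Δ phase-code steps and input 0 carries it delayed by Δ steps, with a 64-code PI. Only the high time changes, so the rising edge stays in place.

```python
def select_mux_input(duty):
    """Select signal at node 462: input 1 if duty > 50%, otherwise input 0."""
    return 1 if duty > 0.5 else 0

def corrected_duty(duty, codes_per_period=64):
    """Duty cycle after the falling edge is moved by Delta phase-code steps.

    Hypothetical assumption: mux input 1 carries the falling edge advanced by
    Delta steps, and input 0 carries it delayed by Delta steps.
    """
    step = 1.0 / codes_per_period                      # one PI step, fraction of a period
    delta = round(abs(duty - 0.5) * codes_per_period)  # |Delta| in phase-code counts
    if select_mux_input(duty) == 1:
        return duty - delta * step   # earlier falling edge: shorter high time
    return duty + delta * step       # later falling edge: longer high time

for d in (0.40, 0.45, 0.55, 0.60):
    print(d, select_mux_input(d), corrected_duty(d))
```

The residual error after correction is at most half a phase-code step, since Δ is rounded to a whole number of codes.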
This diagram illustrates how the duty cycle of the received clock signal 606 may be measured in order to determine a duty cycle correction, or Delta, preferably expressed in terms of PI phase code counts. The correction or Delta (Δ) may be transmitted back to the transmitter side 600, as illustrated, to be applied to the phase interpolator 602 in order to adjust the clock signal so that it will have a 50% duty cycle at the receiver side 650. As explained above, the received clock signal is input (/4) to a PI 612 to generate sample clock signals at node 614. The PI 612 is controlled by FSM 616, which is arranged to step through the range of phase codes (Code=Code+1) so that each sample clock signal at 614 has a phase corresponding to the current state of the FSM. The sample clock signal is applied to the clock input of a flip-flop circuit 610 (D-type), to clock the input clock signal CLK/B to the flip-flop output at node 618. Counters 620 are arranged to count the number of 1s at node 618 (“Cnt1”) along with the number of samples (“Cnt2”), based on the sample clock signal 614. If the total number of samples divided by 2 equals the number of 1s, then the clock signal CLK/B has a 50% duty cycle. The error, or offset from 50%, is Delta=(Cnt2/2)−Cnt1. This metric is fed back to the transmitter side 600 to adjust PI 602.
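The Delta metric can be illustrated with a short Python model. It assumes an ideal, linear PI 612 with 64 codes per period (a hypothetical value) and one sample per code over exactly one period, so that each count corresponds to one phase-code step and Delta is directly in PI code counts. A positive Delta indicates a duty cycle below 50%; a negative Delta, above 50%.

```python
def measure_delta(duty, codes_per_period=64):
    """Sweep every phase code once (Code = Code + 1); return Delta = Cnt2/2 - Cnt1."""
    cnt1 = 0  # number of 1s at the flip-flop output (node 618)
    cnt2 = 0  # number of samples taken
    for code in range(codes_per_period):
        sample_phase = code / codes_per_period      # ideal, linear PI 612
        cnt1 += 1 if sample_phase < duty else 0     # CLK/B level at the sample instant
        cnt2 += 1
    return cnt2 / 2 - cnt1

print(measure_delta(0.6))  # → -7.0 (about -6.4 codes of true error, quantized)
```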
In a preferred embodiment, the correction or Delta may be expressed in units of the phase interpolator input code, or phase code. Preferably, the phase interpolator 612 that is used in the receiver circuitry 650 to measure the duty cycle error is matched to the phase interpolators 602 and 604 in the transmitter side circuitry 600, such that the phase code adjustment, or Delta (Δ), will have the same step size. In general, circuitry of the type disclosed can be used to estimate a delay for adjusting a received clock signal to achieve any desired duty cycle. The phase adjustment can be done as a single adjustment, as distinguished from an incremental or closed-loop feedback system.
In some embodiments, the received clock signal may comprise quadrature clock signals, I clock (ICLK) and Q clock (QCLK). The described circuits and methodology can be applied to at least one of the quadrature clock signals. In some embodiments, the desired duty cycle may be 50% for the I clock and also 50% for the Q clock, so as to support quad data rate sampling. In some embodiments, the clock signal delay is applied to adjust only the falling edge of at least one of the quadrature clock signals, so as to maintain alignment of the rising edges. Conversely, in other embodiments, the delay may be applied to a rising edge while maintaining alignment of the falling edges of the clock signals.
Referring again to
The QCLK correction circuitry 930 also may utilize a pair of PIs. A first PI 932 is controlled by phase code input=Qcode+IQ delta, while the second PI 934 receives phase code input=Qcode+IQ delta+Q delta. Operation of adjustment circuits of this general type was described in greater detail with regard to
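The phase codes applied to the two QCLK correction PIs can be sketched as below. The text does not say how the two PI outputs are combined; the sketch assumes, hypothetically, that PI 932 sets the rising edge of the corrected QCLK and PI 934 sets its falling edge (nominally half a period after PI 934's own phase), with 64 codes per period. Under that assumption, IQ delta shifts both edges together (a phase correction), while Q delta moves only the falling edge (a duty cycle correction).

```python
CODES_PER_PERIOD = 64                    # hypothetical PI resolution
DEG_PER_CODE = 360.0 / CODES_PER_PERIOD  # 5.625 degrees per code

def qclk_pi_codes(qcode, iq_delta, q_delta):
    """Phase codes for PI 932 and PI 934, wrapped to one clock period."""
    pi_932 = (qcode + iq_delta) % CODES_PER_PERIOD
    pi_934 = (qcode + iq_delta + q_delta) % CODES_PER_PERIOD
    return pi_932, pi_934

def qclk_edges_deg(qcode, iq_delta, q_delta):
    """Edge positions (degrees) of the corrected QCLK.

    Hypothetical: PI 932 sets the rising edge; PI 934 sets the falling edge,
    nominally half a period after its own phase.
    """
    pi_932, pi_934 = qclk_pi_codes(qcode, iq_delta, q_delta)
    rise = pi_932 * DEG_PER_CODE
    fall = (pi_934 * DEG_PER_CODE + 180.0) % 360.0
    return rise, fall

print(qclk_edges_deg(16, 0, 0))   # → (90.0, 270.0): nominal quadrature, 50% duty
print(qclk_edges_deg(16, 1, -2))  # → (95.625, 264.375)
```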
Referring now to the receiver side 950 in
In an embodiment, the forwarded clocks (or their divided versions) 966 are input to a PI 960. The PI 960 is controlled by an FSM or other counter source, not shown, to provide and increment phase codes 968. The output from the PI 960 is coupled via node 956 to a clock input of flip-flop (FF) 966. The flip-flop couples a signal currently selected by multiplexer 952 (one of the three signals described above) to one or more counters represented by cloud 970. Circuitry of this type, arranged to measure duty cycle of an input clock signal, was described above with regard to
The quadrature phase portion in this embodiment begins at block 1030. Here, the process calls for converting a quadrature phase error (QPE) to a duty cycle of an error clock signal IQCLK. At block 1034, the process continues by determining the delay necessary to achieve a selected duty cycle in the error clock signal, corresponding to a quadrature phase of the forwarded clock signal. Continuing to block 1036, the delay is expressed as a delta (Δ), i.e., a number of phase interpolator steps of a step size selected to match the transmitter side phase interpolator step size. Finally, at block 1038, the determined delay is sent back to the transmitter side via path 1040. The duty cycle correction and/or the QPE correction can be performed at various times. For example, they may be done at power up, at reset, periodically, or responsive to predetermined conditions as described below. These examples are merely illustrative and not intended to be limiting.
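The text does not specify how IQCLK is generated. One common construction, assumed here purely for illustration, is IQCLK = ICLK XOR QCLK: for 50% duty-cycle inputs and a lag between 0° and 180°, the XOR output runs at twice the clock frequency with a duty cycle equal to the lag divided by 180°, so an exact 90° lag yields 50%. The sketch samples IQCLK over one period, converts its duty cycle to a QPE in degrees, and expresses the correction in steps of an assumed 64-code PI. The sign convention (a larger phase code delays QCLK) is also an assumption.

```python
def xor_clock_level(phase, q_lag_deg):
    """IQCLK = ICLK XOR QCLK (an assumed construction); phase in fractions of a period."""
    i = 1 if (phase % 1.0) < 0.5 else 0
    q = 1 if ((phase - q_lag_deg / 360.0) % 1.0) < 0.5 else 0
    return i ^ q

def iq_delta_from_duty(q_lag_deg, samples=720, codes_per_period=64):
    """Measure the IQCLK duty cycle; convert it to a QPE and to a PI-step correction."""
    ones = sum(xor_clock_level(k / samples, q_lag_deg) for k in range(samples))
    duty = ones / samples                  # 50% when QCLK lags ICLK by exactly 90 degrees
    qpe_deg = (duty - 0.5) * 180.0         # quadrature phase error in degrees
    step_deg = 360.0 / codes_per_period    # matched to the transmitter PI step size
    iq_delta = -round(qpe_deg / step_deg)  # negative: advance QCLK if it lags too much
    return duty, qpe_deg, iq_delta

print(iq_delta_from_duty(90.0))   # → (0.5, 0.0, 0)
print(iq_delta_from_duty(100.0))  # about 10 degrees of QPE, corrected by -2 steps
```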
Referring now to
At block 1070, the process calls for estimating a quadrature phase error. Proceeding to decision 1072, the question is whether the quadrature phase error exceeds a predetermined threshold value, or whether a timer has elapsed. Either or both conditions can be implemented. If neither applicable condition is satisfied, the process loops back via path 1074 to repeat the quadrature phase error estimation at block 1070. If and when the predetermined condition is satisfied, the process exits the loop via path 1076 to block 1078. Here, the process calls for determining or estimating an error or Δ necessary to achieve a selected duty cycle in the error clock signal, corresponding to the quadrature phase of the forwarded clock signal. In other words, the process determines what change in the duty cycle of the error signal is necessary to adjust the quadrature phase of the forwarded clock signal to the desired value, 90° in most cases. Next, the method continues to block 1080, whereupon the determined error correction or Δ is forwarded back to the transmitter via path 1082. This correction corresponds to the quantity “IQ delta” input to the PI 934 in the example of
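The loop of blocks 1070 through 1080 can be sketched in Python as a polling routine. The QPE estimator and the return path to the transmitter are passed in as callables (hypothetical stand-ins for the hardware), the timer is modeled as a tick count, and the loop is bounded by max_ticks only so that the sketch terminates. The threshold and PI step values are illustrative; the threshold should exceed half a PI step, otherwise the rounded Δ may be zero.

```python
def qpe_update_loop(estimate_qpe_deg, send_to_transmitter, threshold_deg=4.0,
                    timer_ticks=100, step_deg=360.0 / 64, max_ticks=1000):
    """Blocks 1070-1080 as a polling loop; returns how many corrections were sent.

    estimate_qpe_deg and send_to_transmitter are hypothetical stand-ins for the
    QPE estimator and the return path to the transmitter.
    """
    corrections = 0
    ticks_since_update = 0
    for _ in range(max_ticks):                 # bounded only so the sketch terminates
        qpe = estimate_qpe_deg()               # block 1070: estimate the QPE
        ticks_since_update += 1
        threshold_hit = abs(qpe) > threshold_deg
        timer_elapsed = ticks_since_update >= timer_ticks
        if threshold_hit or timer_elapsed:     # decision 1072 (either condition)
            iq_delta = -round(qpe / step_deg)  # block 1078: correction in PI steps
            send_to_transmitter(iq_delta)      # block 1080: forward via path 1082
            corrections += 1
            ticks_since_update = 0
        # Otherwise loop back (path 1074) and estimate again.
    return corrections

sent = []
print(qpe_update_loop(lambda: 10.0, sent.append, max_ticks=5), sent)  # → 5 [-2, -2, -2, -2, -2]
```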
Like all devices, PIs are not perfect. In general, due to circuit fabrication variations, for example, PIs exhibit some integral non-linearity (“INL”). This property may be represented graphically as an INL curve. An example is shown in
One way to mitigate the effect of INL is illustrated in the flow diagram of
Referring again to
Estimating and Reducing Phase Interpolator Non-Linearity
We have found that CDR (clock data recovery) jitter is a major contributor to overall receiver jitter. We have also determined that CDR dither can be reduced by minimizing PI INL. Above, we mitigated INL effects by averaging over different parts of the INL curve. Next, we discuss ways to measure (quantify) INL and several ways to use that information to correct or reduce INL effects. It should be noted that correcting or minimizing PI INL, as described herein, may be useful in some clock-forwarded serial link applications. However, these aspects of the disclosure are not limited to serial link applications; estimating and reducing PI INL is useful for PIs in general.
First, we can estimate PI INL, in one embodiment as follows.
This clock signal 1222 is input to a flip-flop circuit 1224. The clock input to the flip-flop circuit is driven by any suitable signal for clocking the flip-flop. For example, it may be provided by a random number generator 1230. In one embodiment, the clocking signal may have a frequency substantially greater than the frequency of the input clock signals at node 1200, to implement direct sampling. In another embodiment, the sampling clock may have a lower frequency (sub-sampling). In one preferred embodiment, a pseudo random bit sequence (PRBS) generator provides improved performance. The output of the flip-flop circuit 1224 is provided to a counter circuit 1240, which is arranged to count the number of ones relative to the total count. This ratio of ones to the total count determines the actual, or measured, duty cycle of the clock signal at node 1222. The measured duty cycle is compared to the expected duty cycle corresponding to the selected PI input code at 1204. The difference between the measured and expected duty cycles represents the PI INL.
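The INL estimation can be modeled in Python as follows. The portion of the text describing how clock signal 1222 is formed is not available here, so the sketch assumes, hypothetically, that it goes high at the PI reference edge and low at the PI output edge, making its duty cycle equal to the PI output phase as a fraction of a period. Sampling instants drawn from Python's random module stand in for the random or PRBS sampling clock; because the instants are spread uniformly over the period, the fraction of ones converges to the duty cycle. The 64-code PI and the injected INL values are made up for the example.

```python
import random

CODES = 64  # hypothetical number of PI codes per clock period

def make_pi(inl_codes):
    """Hypothetical PI model: returns a function giving the actual output phase
    (fraction of a period) for each code, including a per-code INL in code units."""
    def pi_phase(code):
        return (code + inl_codes[code]) / CODES
    return pi_phase

def estimate_inl(pi_phase, code, n_samples=100_000, seed=1):
    """Estimate the INL (in code units) at one PI code by random sampling.

    Assumes the clock under test is high from the reference edge (phase 0) until
    the PI output edge, so its duty cycle equals the PI output phase.
    """
    rng = random.Random(seed)          # stands in for the random/PRBS sampling clock
    edge = pi_phase(code)              # falling edge of the clock under test
    ones = 0
    for _ in range(n_samples):
        t = rng.random()               # sampling instant, uniform over one period
        ones += 1 if t < edge else 0   # flip-flop output: 1 while the clock is high
    measured_duty = ones / n_samples   # counter: ones relative to the total count
    expected_duty = code / CODES       # ideal duty cycle for this input code
    return (measured_duty - expected_duty) * CODES

# Usage: inject made-up INL at two codes and recover it.
inl = [0.0] * CODES
inl[20] = 1.5
inl[40] = -1.0
pi = make_pi(inl)
print([round(estimate_inl(pi, c), 1) for c in (10, 20, 40)])
```

The statistical error shrinks roughly as one over the square root of the number of samples, so longer counts trade test time for INL resolution.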
Techniques to reduce the effective non-linearity of a PI are presented next. Recall that, in general, a PI may be used to generate an output clock signal whose phase is a mixture of the phases of two input clocks. PIs may use a unit-cell approach, in which case the mixing is done by combining x unit cells connected to clock1 with y unit cells connected to clock2. The PI cells may be variously implemented, for example, using current mode logic (CML), inverter stages, etc.
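A common first-order model of a unit-cell PI treats the output phase as the strength-weighted average of the two input phases. This is an idealization (it holds best when the input phase spacing is small compared with the edge transition times), but it shows how a mismatched unit cell pulls the output phase away from its nominal value. The function below is a hypothetical illustration, not a circuit model from the disclosure.

```python
def pi_output_phase(phase1_deg, phase2_deg, cells1, cells2):
    """First-order unit-cell PI: strength-weighted average of the two input phases.

    cells1 and cells2 list the actual strengths of the unit cells switched to
    clock1 (x cells) and clock2 (y cells); nominally every cell has strength 1.0.
    """
    w1 = sum(cells1)
    w2 = sum(cells2)
    return phase1_deg + (phase2_deg - phase1_deg) * w2 / (w1 + w2)

# x = 12 cells on clock1, y = 4 cells on clock2, inputs 90 degrees apart.
print(pi_output_phase(0.0, 90.0, [1.0] * 12, [1.0] * 4))             # → 22.5
print(pi_output_phase(0.0, 90.0, [1.0] * 12, [1.0, 1.0, 1.0, 1.2]))  # one strong cell
```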
Suppose that during manufacture, post silicon, the cells nominally sized 1U, 2U, 4U_1, 4U_2 and 4U_3 are measured as having actual values of 1U, 2U, 3U, 4U and 5U, respectively. (These values are listed in Table 1 in the row labeled Magnitude.) In other words, the actual 1U, 2U and 4U_2 cells are approximately equal to the corresponding nominal input code values. However, 4U_1 is low by one unit, and 4U_3 is high by one unit. In this case, conventional wisdom may suggest a hard-wired decoding of 4U=4U_1 and 8U=4U_2+4U_3. The outputs with that decoding would be 3U and 9U, respectively, with resulting INL errors of −1U and +1U. If nothing else, the “average INL” in this case may be said to be approximately zero. However, matching PIs using this approach can be very difficult.
We propose a new approach generally as follows. Our first illustrative method calls for evaluating all combinations of the available U cells and choosing the combination having the least INL. Here, to illustrate, a first decoding scheme may be the one described in the previous paragraph, in which each decoding, 4U and 8U, results in 1U INL. A second decoding scheme may comprise 4U=4U_2 and 8U=4U_3+4U_1, resulting in outputs 4U and 8U, for 0 INL. A third decoding scheme may comprise 4U=4U_3 and 8U=4U_1+4U_2, resulting in outputs 5U and 7U, again suffering 1U INL. Thus the optimal solution is the second decoding scheme, the one having the least INL. In this way, we can minimize INL by optimizing decoding based on on-chip testing.
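The exhaustive search over the three decoding schemes can be written in a few lines of Python, using the measured strengths from the example above (4U_1 = 3U, 4U_2 = 4U, 4U_3 = 5U; the 1U and 2U cells are exact and are omitted). The scoring rule, worst-case absolute INL over the 4U and 8U outputs, is one reasonable choice and is an assumption.

```python
# Measured strengths (in units U) of the three nominally 4U cells, from the example.
MEASURED = {"4U_1": 3, "4U_2": 4, "4U_3": 5}

def best_4u_8u_decoding(measured):
    """Try every way of assigning one 4U cell to the 4U weight and the remaining
    two to the 8U weight; keep the assignment with the smallest worst-case INL."""
    best = None
    for single in measured:
        pair = [name for name in measured if name != single]
        out4 = measured[single]                      # output for the 4U weight
        out8 = sum(measured[name] for name in pair)  # output for the 8U weight
        worst_inl = max(abs(out4 - 4), abs(out8 - 8))
        if best is None or worst_inl < best[0]:
            best = (worst_inl, single, tuple(pair))
    return best

print(best_4u_8u_decoding(MEASURED))  # → (0, '4U_2', ('4U_1', '4U_3'))
```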
Table 1 further illustrates an example in which the order of switching on the 4U units is fixed. This simplifies the implementation logic. The example below has a maximum INL of 1U.
In an alternative method, the order of switching on units may be relaxed, resulting in a significantly larger set of possible decoding schemes. With this type of method, determining an optimal decoding is likely to take longer. On the other hand, a reduced error (INL) may be achieved. An illustration is shown in Table 2 below. In this case, with the ability to switch units in any order, the INL can be reduced to zero.
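Because Tables 1 and 2 are not reproduced here, the following Python sketch assumes one plausible mapping for a 4-bit code (0 to 15): the two low-order bits drive the 1U and 2U cells, and code // 4 of the 4U cells are switched on. The fixed-order method picks the best single turn-on order for the 4U cells; the relaxed method picks the best subset independently for each code. With the example cell strengths, it reproduces the maximum INL of 1U for fixed order and 0 for relaxed order stated above.

```python
from itertools import combinations, permutations

LSB_CELLS = {"1U": 1, "2U": 2}                 # measured equal to nominal in the example
MSB_CELLS = {"4U_1": 3, "4U_2": 4, "4U_3": 5}  # measured strengths, in units of U

def lsb_output(code):
    """Output of the 1U and 2U cells for the two low-order code bits."""
    return (code & 1) * LSB_CELLS["1U"] + ((code >> 1) & 1) * LSB_CELLS["2U"]

def max_inl_fixed_order(order):
    """Worst-case INL over codes 0..15 when the 4U cells turn on in a fixed order."""
    worst = 0
    for code in range(16):
        k = code // 4                                   # number of 4U cells switched on
        out = lsb_output(code) + sum(MSB_CELLS[n] for n in order[:k])
        worst = max(worst, abs(out - code))
    return worst

def max_inl_any_order():
    """Worst-case INL when each code may use any k of the 4U cells."""
    worst = 0
    for code in range(16):
        k = code // 4
        best = min(abs(lsb_output(code) + sum(MSB_CELLS[n] for n in subset) - code)
                   for subset in combinations(MSB_CELLS, k))
        worst = max(worst, best)
    return worst

best_fixed = min(max_inl_fixed_order(p) for p in permutations(MSB_CELLS))
print(best_fixed, max_inl_any_order())  # → 1 0
```

The relaxed search grows combinatorially with the number of 4U cells, which is consistent with the observation above that it is likely to take longer.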
The optimum result, determined using either type of methodology, may be implemented, for example, using the multiplexer circuitry shown in
In another aspect of the disclosure, two PIs may be measured as described above, and the INL results used to select or match PIs to achieve accurate delay times between two clock paths, for example, ICLK and QCLK. In an embodiment, a determination is made as to which code Y of PI2 results in, say, a 90-degree delay with respect to a given code X of PI1. This is straightforward to implement once the INL of the two devices is known.
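The PI matching step can be sketched as a search over PI2 codes, given measured INL tables for both PIs. The 64-code resolution and the INL values below are hypothetical. Without INL knowledge, the nominal choice would simply be Y = X + 16 codes (90° at 64 codes per period); with INL knowledge, the search picks the code whose actual phase is nearest the target, using a wrap-around phase distance.

```python
CODES = 64                     # hypothetical PI resolution
DEG_PER_CODE = 360.0 / CODES

def actual_phase_deg(code, inl):
    """Actual phase of a PI output, given its measured INL table (code units)."""
    return (code + inl[code]) * DEG_PER_CODE

def code_for_quadrature(x, inl1, inl2, target_deg=90.0):
    """Code Y of PI2 whose actual phase is closest to PI1(code X) + target_deg."""
    target = (actual_phase_deg(x, inl1) + target_deg) % 360.0
    def error(y):
        d = (actual_phase_deg(y, inl2) - target) % 360.0
        return min(d, 360.0 - d)   # wrap-around phase distance
    return min(range(CODES), key=error)

inl1 = [0.0] * CODES
inl1[10] = 0.8                 # PI1 code 10 actually sits 0.8 codes late
inl2 = [0.0] * CODES
print(code_for_quadrature(10, inl1, inl2))  # → 27 (nominal choice would be 26)
```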
It will be obvious to those having skill in the art that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.
Number | Date | Country |
---|---|---|
20140253195 A1 | Sep 2014 | US |

Number | Date | Country |
---|---|---|
61773732 | Mar 2013 | US |