The present application is a national phase application of PCT Application No. PCT/EP2008/006852, filed on Aug. 20, 2008, and claims priority to German Application No. 10 2007 045 085.2, filed on Sep. 21, 2007, and German Application No. 10 2008 011 845.1, filed on Feb. 29, 2008, the entire contents of which are herein incorporated by reference.
1. Field of the Invention
The invention relates to a method for clock-data recovery and an associated device.
2. Discussion of the Background
In digital transmission technology, many binary data streams, especially serial data streams, are transmitted at a high bit rate without an accompanying clock signal. The goal of clock-data recovery (CDR) is to determine the frequency and phase of the underlying transmission clock from the received data stream.
In a conventional receiver, the recovered clock signal is used for decoding the transmitted bit sequence by sampling the received signal pulses exactly in the center in order to maximize the signal-to-noise ratio. In signal analysis, the recovered clock signal is used to evaluate the signal quality, typically with reference to so-called eye diagrams and mathematical tools for jitter analysis.
The transmitted clock pulse is often determined by means of a phase-locked loop (PLL). For the analysis of signal quality, various standards specify a standardized receiver in the form of PLL properties. In this context, the recovered clock signal determines the ideal bit starting point by definition. The evaluation of deviations between the zero passes in the received data stream and in the clock signal forms the basis of the data analysis.
For signal analysis or jitter analysis, the transmitted clock pulse is recovered, in principle, in two different ways:
Hardware PLLs known in the prior art can be subdivided into three categories: linear PLLs, digital PLLs and all-digital PLLs. The three types of PLL process and generate analog, time-continuous signals, wherein the digital and all-digital PLLs are adapted to the processing of binary serial data streams.
A computation rule that emulates the operation of a hardware PLL is generally referred to as a software PLL. One approach is to describe the operation of the analog components mathematically and to process a highly oversampled version of the received data stream with this description. A second approach is based on the observation that only the zero passes in the data stream contain the information relevant for the clock-data recovery. In this case, the positions of the zero passes are initially determined by interpolation of the stored data portion, and the zero passes of the clock signal are then calculated from these.
The x0(k) are sorted chronologically and processed sequentially. Initially, the time difference e(k) between data edges and clock edges is formed in the phase detector. Since no signal transition occurs when the transmitter transmits two or more identical bits in succession, the number of data edges is generally smaller than the number of bits transmitted. If the time-difference value is greater than one half of the bit period T0, a missing edge can be assumed, and, by way of example, e(k)=0 is set; otherwise, the phase detector passes on the time difference e(k) without change. The term e(k) is filtered with the loop filter F(q−1). F(q−1) describes a difference equation as a function of the delay operator q−1, for which the following applies by way of example: 2·q−1·e(k)=2·e(k−1). The resulting d(k), together with a constant T0, which indicates the nominal bit period of the data stream, provides an estimate of the momentary bit period of the data stream. The accumulator A(q−1) determines the position of the next clock edge by adding the momentary bit-period estimate to the last clock edge. The underlying method of functioning can be described algorithmically as follows:
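The algorithmic listing is not reproduced in this text. The following Python sketch shows one possible reading of the loop described above, not the original listing. The function name `sequential_software_pll`, the first-order loop filter F(q−1) = gain, and the start of the clock on the first data edge are assumptions made for illustration; spurious edges that lag the clock by more than half a bit period are not treated specially.

```python
def sequential_software_pll(x0, T0, gain=0.1):
    """Sequential software PLL sketch: returns one clock edge per bit period.

    x0   -- chronologically sorted data-edge times (edges may be missing)
    T0   -- nominal bit period
    gain -- assumed loop filter F(q^-1) = gain (first-order loop)
    """
    if not x0:
        return []
    clock_edges = []
    y = x0[0]      # assumption: the clock starts on the first data edge
    i = 0          # index of the next unprocessed data edge
    while i < len(x0):
        clock_edges.append(y)
        e = x0[i] - y          # phase detector: data edge minus clock edge
        if e > T0 / 2:         # next data edge belongs to a later bit period
            e = 0.0            # missing edge: apply no correction
        else:
            i += 1             # data edge is consumed by this clock edge
        d = gain * e           # loop filter
        y = y + T0 + d         # accumulator: last edge + bit-period estimate
    return clock_edges
```

Each pass through the loop needs the clock edge produced by the previous pass, which is the sequential dependency that limits the throughput of such a realization.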
Through the targeted selection of the coefficients of F(q−1) and A(q−1), the above software PLL can approximate the theoretical PLL transfer function very well, provided it operates offline. In a realization operating in real time, each of the above processing stages requires a certain processing time. The overall realization-specific delay distorts the transfer function of the phase-locked loop and can even endanger stability. As a rule of thumb, a real-time-capable software PLL according to the prior art can be used only for the analysis of data streams whose bit period is longer than the processing time for the calculation of a new clock edge.
Embodiments of the present invention advantageously provide a method and a device for clock-data recovery, which determine the clock-edge positions through the parallel processing of several data edges. The parallel processing allows a higher throughput than conventional software PLLs. The method and the device approximate the theoretical PLL transfer function, wherein the stability of the phase-locked loop is always guaranteed.
The stages of the method for clock-data recovery according to the invention are as follows:
In other words, the phase-locked loop from
If required, all functional blocks and processing stages can process a plurality of edges in parallel. Advantageous embodiments are outlined in the section below.
The drawings are as follows
The invention has its origin in the transformation of the classic block-circuit diagram of a software PLL.
In block-circuit diagram A,
Block-circuit diagram B from
The loop filter F(q−1) and the accumulator A(q−1) define conventional linear difference equations as a function of the delay operator q−1.
For example:
$$a(k) = \frac{1 + g_2\,q^{-1}}{1 - g_1\,q^{-1}}\; b(k)$$
describes the difference equation:
$$a(k) - g_1\,a(k-1) = b(k) + g_2\,b(k-1)$$
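As a minimal sketch, the first-order recursion above can be evaluated sample by sample as shown below. The function name `first_order_filter` and the zero initial conditions a(−1) = b(−1) = 0 are assumptions for illustration.

```python
def first_order_filter(b, g1, g2):
    """Evaluate a(k) - g1*a(k-1) = b(k) + g2*b(k-1) with zero initial state."""
    a = []
    a_prev = 0.0   # a(k-1), assumed zero before the first sample
    b_prev = 0.0   # b(k-1), assumed zero before the first sample
    for b_k in b:
        a_k = g1 * a_prev + b_k + g2 * b_prev
        a.append(a_k)
        a_prev, b_prev = a_k, b_k
    return a
```

With g1 = 1 and g2 = 0 the recursion reduces to a plain accumulator, so a constant input produces a straight line over the index k.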
According to block-circuit diagram A in
$$y(k) = A(q^{-1})\cdot\bigl(F(q^{-1})\cdot(x(k)-y(k)) + T_0\bigr) \qquad (1)$$
Let the following be defined:
$$\tilde{x}(k) = x(k) - A(q^{-1})\cdot T_0, \qquad \tilde{y}(k) = y(k) - A(q^{-1})\cdot T_0 \qquad (2)$$
In this context, it should be noted that the term t(k)=A(q−1)·T0 describes the accumulation or integration of a constant signal with growing edge index k and functionally defines a straight line of gradient T0.
The following is obtained from equations (1) and (2):
Finally, the following is obtained:
$$\tilde{y}(k) = A(q^{-1})\cdot F(q^{-1})\cdot\bigl(\tilde{x}(k)-\tilde{y}(k)\bigr) \qquad (4)$$
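The step from equations (1) and (2) to equation (4) can be checked as follows. This is a sketch that assumes only that A(q−1) and F(q−1) are linear operators; the intermediate form shown here is not necessarily the one used in the original derivation. Substituting x(k) and y(k) from equation (2) into equation (1) gives:

```latex
\begin{aligned}
\tilde{y}(k) + A(q^{-1})\,T_0
  &= A(q^{-1})\Bigl(F(q^{-1})\bigl(\tilde{x}(k) + A(q^{-1})\,T_0
     - \tilde{y}(k) - A(q^{-1})\,T_0\bigr) + T_0\Bigr) \\
  &= A(q^{-1})\,F(q^{-1})\bigl(\tilde{x}(k) - \tilde{y}(k)\bigr) + A(q^{-1})\,T_0
\end{aligned}
```

The trend terms cancel inside the phase difference, and subtracting A(q−1)·T0 from both sides leaves equation (4).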
Equation (4) corresponds to the phase-locked loop in block-circuit diagram B in
The consequence of the structure in block-circuit diagram B is that the transfer function of the phase-locked loop can be expressed as a linear, rational filter or, respectively, a linear difference equation:
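The explicit expression is not reproduced in this text. Solving equation (4) formally for the output gives the closed-loop form below, which appears to be what the text refers to as equation (5). The symbol H and the example choices A(q−1) = q−1/(1−q−1) and F(q−1) = g are assumptions introduced here for illustration.

```latex
\begin{aligned}
\tilde{y}(k) &= H(q^{-1})\,\tilde{x}(k), \qquad
H(q^{-1}) = \frac{A(q^{-1})\,F(q^{-1})}{1 + A(q^{-1})\,F(q^{-1})} \\[4pt]
\text{e.g. } A(q^{-1}) &= \frac{q^{-1}}{1 - q^{-1}},\; F(q^{-1}) = g
\;\Longrightarrow\;
H(q^{-1}) = \frac{g\,q^{-1}}{1 - (1-g)\,q^{-1}}
\end{aligned}
```

In this first-order example the loop reduces to the recursion ỹ(k) = (1−g)·ỹ(k−1) + g·x̃(k−1), which no longer contains an explicit feedback path around the phase detector.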
As a result of the transformation into a difference equation, the original structure of the software PLL from
One additional cost results from the pre-processing and post-processing stages. The term t(k)=A(q−1)·T0 from equation (2) describes a straight line of constant gradient T0 via the index k. The pre-processing block extracts this linear trend from the incoming data edges x(k) and the post-processing block adds it back again to the PLL output.
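The following Python sketch illustrates, under simplifying assumptions, that trend extraction, linear filtering and trend injection reproduce the clock edges of the sequential loop. It assumes a gap-free edge sequence x(k), a first-order loop with F(q−1) = g, an accumulator of the form y(k+1) = y(k) + T0 + d(k), and a nominal clock t(k) = k·T0; all function names are hypothetical.

```python
def sequential_loop(x, T0, g, y0=0.0):
    """Classic structure: y(k+1) = y(k) + T0 + g*(x(k) - y(k))."""
    y = [y0]
    for k in range(len(x) - 1):
        y.append(y[k] + T0 + g * (x[k] - y[k]))
    return y


def trend_filter_loop(x, T0, g, y0=0.0):
    """Transformed structure: extract trend, filter linearly, inject trend."""
    trend = [k * T0 for k in range(len(x))]           # nominal clock t(k)
    x_t = [x_k - t_k for x_k, t_k in zip(x, trend)]   # pre-processing
    y_t = [y0]                                        # PLL core state
    for k in range(len(x) - 1):
        # closed-loop linear filter for F = g and a pure accumulator
        y_t.append((1.0 - g) * y_t[k] + g * x_t[k])
    return [y_k + t_k for y_k, t_k in zip(y_t, trend)]  # post-processing
```

Because the core recursion is a plain linear filter, it can be handed to block-filtering techniques, whereas the classic structure carries T0 inside the feedback loop.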
The data-edge positions of the analyzed data stream always represent a straight line of gradient Tb over the index k. After the extraction of the linear trend, a residual trend of gradient (Tb−T0) remains. Consequently, the terms $|\tilde{x}(k)|$ and $|\tilde{y}(k)|$ grow without bound over time if the data-stream bit period Tb deviates from the nominal value T0. To ensure that $|\tilde{x}(k)|$ and $|\tilde{y}(k)|$ remain bounded, both values must be reset occasionally by a given offset. This can be implemented by a simultaneous increase in the auxiliary blocks and a reset of the state of the main phase-locked loop, also referred to below as the PLL core, by the same offset value. Accordingly, the difference $e(k)=\tilde{x}(k)-\tilde{y}(k)$ is preserved.
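A minimal sketch of such a reset is given below. It assumes a first-order PLL core with gain g, a nominal clock k·T0 plus an accumulated offset, and a reset rule that triggers when the trend-free data edge exceeds a threshold; the function name `pll_core_with_rebase` and the parameter `limit` are hypothetical.

```python
def pll_core_with_rebase(x, T0, g, limit):
    """First-order PLL core on trend-free edges with occasional offset reset."""
    base = 0.0   # offset currently carried by the nominal clock
    y_t = 0.0    # PLL core state: trend-free clock edge
    y = []
    for k, x_k in enumerate(x):
        nominal = k * T0 + base
        x_t = x_k - nominal            # trend extraction
        if abs(x_t) > limit:
            offset = x_t
            base += offset             # auxiliary blocks absorb the offset
            nominal += offset
            x_t -= offset              # trend-free data edge is reset ...
            y_t -= offset              # ... and the core state by the same amount
        y.append(y_t + nominal)        # trend injection
        y_t = (1.0 - g) * y_t + g * x_t
    return y
```

Since the same offset is removed from both trend-free quantities, their difference and hence the recovered clock edges are unchanged by the reset.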
In particular, the implementation of all functional blocks can take place in parallel in order to increase the rate of operation. This is understood to mean that several successive elements of the data-edge sequence x0(k) are processed in the same operational stage. The resulting clock-edge sequence y(k) is theoretically identical to the sequential processing of x0(k) with a conventional software PLL according to the prior art.
The parallel structure of the trend-extraction block and of the trend-injection block is uncomplicated, because the nominal clock continues to run with the nominal bit period T0 known in advance. For the parallel realization of the linear filter or, respectively, of the linear difference equation from equation (5), methods known from the literature, for example recursive block filtering (pipelined block filtering), can be used. The parallel edge assignment in the Edge Matching and Patching Unit (EMPU) uses a prediction of the clock edges y(k) as a basis. The method of functioning and favorable embodiments of the EMPU are explained below.
Method of Functioning of the EMPU
In theory, it is possible to distinguish between three cases:
Conventional software PLLs or hardware PLLs basically operate in a sequential manner and determine the clock edge y(k+1) by processing earlier data and clock edges up to the timing index k. For parallel edge assignment of the data-edge packet [x(k+1), x(k+2), . . . , x(k+N)], a prediction of several clock edges is necessary, that is to say, the terms [y(k+1), y(k+2), . . . , y(k+N)] must be estimated from the information up to the timing index k.
For this purpose, the EMPU defines a secondary clock signal, referred to below as the “Front Clock”. The Front Clock represents a prediction of the recovered clock edges y(k) and is used to subdivide the time axis for the edge assignment. The Front Clock is coupled to the PLL core and consequently to the recovered clock edge y(k), as indicated by the dotted line in the block-circuit diagrams of
In one possible embodiment, the Front Clock starts running immediately after the system initialisation with the nominal bit period $\hat{T}_b = T_0$. Only after a processing latency of L timing units or system-clock pulses does the PLL begin to process the data edges and to synchronize the clock edges to the received data stream by adapting $\hat{T}_b$. From this moment, the Front Clock and the recovered clock, also referred to below as the PLL clock, can operate in a coupled manner, because, for example, the Front Clock can use $\hat{T}_b$ in order to follow the excursion of the PLL clock.
With this procedure, the Front Clock estimates the future values of the PLL clock according to the principle “the PLL clock will continue to run with the nominal bit period for the next L system clock pulses”. If this assumption does not hold, a phase offset occurs between the two clocks. The phase offset after the settling of the PLL to a data stream with constant bit period Tb can be approximated as follows:
wherein
The phase offset brings about a displacement of the time intervals in a case, in which the PLL clock has been used instead of the Front Clock for the edge assignment. In the case illustrated in
An improved performance is generally obtained, if the Front Clock is determined according to the principle “the PLL clock will continue to run for the next L system clock pulses with the last estimated momentary bit period”. Other prediction principles are conceivable.
In the case of a system operating online, the data stream is constantly observed. A volume of new data edges is provided regularly to the clock-data recovery, for example, every system clock pulse, via an external auxiliary device. One system clock pulse defines a given window on the time axis. Against this background, the method of functioning of the EMPU can be subdivided into two sub-tasks. Initially, with the assistance of the Front Clock, the clock edges covered by the current system clock pulse or respectively current time window are determined. Following this, the received data edges are paired with the clock edges.
The example in
The Front Clock specifies the position of the clock edges tF(k). For the determination of the time intervals covered by the current time window, it is helpful to compare the upper limit of the k-th interval tF+(k), which is derived from the clock edges, for example, according to tF+(k)=tF(k)+T0/2, with the upper limit of the k-th system clock pulse tS+(j). With reference to
In summary, the effective number of clock edges in the j-th system clock pulse is determined with regard to how many tF+(k) fit between the timing points tS+(j−1) and tS+(j).
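The counting rule can be sketched in Python as follows. The function name `edges_in_window`, the example rule tF+(k) = tF(k) + T0/2, and the choice of a half-open window (exclusive lower limit, inclusive upper limit) are assumptions for illustration.

```python
def edges_in_window(t_front, T0, ts_prev, ts_curr):
    """Indices k whose upper interval limit tF+(k) lies in (ts_prev, ts_curr].

    t_front -- Front Clock edge positions tF(k)
    ts_prev -- upper limit tS+(j-1) of the previous system clock pulse
    ts_curr -- upper limit tS+(j) of the current system clock pulse
    """
    selected = []
    for k, t in enumerate(t_front):
        upper = t + T0 / 2             # tF+(k), upper limit of the k-th interval
        if ts_prev < upper <= ts_curr:
            selected.append(k)
    return selected
```

With a half-open window every interval is attributed to exactly one system clock pulse, so consecutive windows neither drop nor duplicate clock edges.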
After determining the relevant time intervals of the current time window, the assignment of the data and clock edges is implemented.
Method A connects data edges to clock edges according to the rule:
If $|D_a - C_b| \le \Delta$, then $D_a$ and $C_b$ fit together ($a, b \ge 0$), (7)
wherein Δ is selected in such a manner that the time axis is subdivided into non-mutually-overlapping intervals. Data edges, which are disposed in the regions not covered by the intervals, are simply ignored. According to
In formal terms, the edge assignment can be described in matrix form by the following table:
The clock edges and data edges are each sorted chronologically. On the assumption that a maximum of one data edge occurs per bit period, the calculation of the elements in the lower, shaded triangular matrix can be skipped in order to reduce the computational cost. However, this reduces the robustness of the assignment matrix in the event that several data edges per bit period occur, as can be the case, for example, during the settling of the PLL. Compromise solutions, where only the elements of the lowest diagonal are automatically set to zero (0), are conceivable.
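Since the table is not reproduced in this text, the following Python sketch shows one way to build an assignment matrix according to rule (7). The orientation (rows for data edges, columns for clock edges), the 0/1 entries and the function names are assumptions; the sketch computes the full matrix and does not skip the lower triangle.

```python
def assignment_matrix_a(data_edges, clock_edges, delta):
    """Entry [a][b] is 1 if |D_a - C_b| <= delta, otherwise 0."""
    return [
        [1 if abs(d - c) <= delta else 0 for c in clock_edges]
        for d in data_edges
    ]


def missing_clock_indices(matrix, n_clock):
    """Clock edges without any matching data edge (all-zero columns)."""
    return [b for b in range(n_clock) if not any(row[b] for row in matrix)]
```

Choosing delta smaller than half the bit period keeps the intervals around neighbouring clock edges from overlapping, so each data edge matches at most one clock edge.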
Method B presents a slight variation of the same principle. In this case, the time axis is subdivided over the timing points Qb in mutually-adjacent intervals. The Qb corresponds to the timing points tF+(k), which are calculated in order to determine the effective number of clock-edges. The data and clock edges are now linked to one another according to the rule:
If $Q_{b-1} < D_a \le Q_b$, then $D_a$ and $C_b$ fit together ($a, b \ge 0$), (8)
This leads to an assignment matrix as in Method A; in the example considered, both matrices correspond exactly.
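Rule (8) amounts to a search in the sorted interval limits, as the following Python sketch suggests. The function name `assign_method_b` and the explicit `lower` argument, which stands in for the limit below the first interval, are assumptions for illustration.

```python
from bisect import bisect_left


def assign_method_b(data_edges, upper_limits, lower):
    """Pair data edge a with clock edge b if Q[b-1] < D_a <= Q[b].

    upper_limits -- sorted interval limits Q_b, one per clock edge
    lower        -- limit below the first interval (plays the role of Q_{-1})
    """
    pairs = []
    for a, d in enumerate(data_edges):
        if d <= lower or d > upper_limits[-1]:
            continue                       # outside the current time window
        b = bisect_left(upper_limits, d)   # smallest b with Q[b] >= d
        pairs.append((a, b))
    return pairs
```

Because the intervals are adjacent, every data edge inside the window is assigned to exactly one clock edge, in contrast to Method A, where edges between the intervals are ignored.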
The identified missing edges are dealt with separately. The phase and timing error e(k) between data and clock edges represents the control difference of the phase-locked loop. In the case of missing edges, the phase error is not defined. In a conventional software PLL according to
The EMPU interpolates x0(k) and generates a gap-free data-edge sequence x(k), which is then processed by the PLL core. The interpolation is implemented, for example, by filling missing edges with an artificial edge. In order to approximate the case e(k)=0, a prediction of the PLL clock edges, such as the Front Clock, is used. Other approaches, such as e(k)=e(k−1) can be realized through an appropriate choice of the interpolating edges. Although the filling takes place in the EMPU for explanatory purposes, this can be realized dependent upon the implementation at one or more positions in the processing path between the missing-edge assignment and the PLL core.
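The filling step can be sketched in Python as follows, assuming that the edge assignment is available as a list of (data index, clock index) pairs and that a missing edge is replaced by the Front Clock edge of the same bit period. The function name `patch_missing_edges` and the returned flag list are hypothetical.

```python
def patch_missing_edges(pairs, data_edges, front_clock):
    """Return a gap-free edge sequence x(k) and a flag per edge.

    pairs       -- (a, b) tuples: data edge a belongs to clock edge b
    data_edges  -- measured data-edge positions x0
    front_clock -- predicted clock-edge positions, one per bit period
    """
    patched = list(front_clock)            # default: artificial edge
    is_real = [False] * len(front_clock)
    for a, b in pairs:
        patched[b] = data_edges[a]         # a measured edge replaces the filler
        is_real[b] = True                  # assumption: the last match wins
    return patched, is_real
```

The flags allow later stages to distinguish measured edges from artificial ones, for example if a different fill strategy is applied closer to the PLL core.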
Time Representation
In practice, all timing points, including the data edges and clock edges, are expressed with a finite bit-word width. The use of an absolute time reference is inappropriate for systems which are in operation for long periods of time. In this context, the processing of relative time data is advantageous. This can be implemented, inter alia, in two mutually-combinable ways:
It is assumed that the data edges are provided in an appropriate format. For example, at low bit rates (that is to say, long bit periods), the time axis is additionally scaled, so that the edge timing points can be represented with a limited word width.
The system initially comprises an Edge Matching and Patching Unit (EMPU) 110, where the assignment between data edges and clock edges is implemented. An internal clock signal, referred to as the Front Clock, indicates the approximate position of the clock edges. On this basis, the missing edges are identified and marked as such. The missing edges are interpolated in an appropriate manner, by way of example, here, but always before the block PCU 130, in order to obtain a gap-free data-edge sequence. The data edges are sorted and routed without modification.
Moreover, the system comprises a Trend Extraction Unit (TEU) 120. A linear trend is extracted from the data edges. The linear trend is provided by the so-called Nominal Clock, which is driven exclusively with the nominal bit period. The output of the TEU consists of the data-edge positions relative to the Nominal Clock.
Furthermore, the system contains a PLL Core Unit (PCU) 130. The PCU contains the PLL core, which processes a plurality of data edges in parallel. According to the invention, the PLL core can be represented as a linear filter or, respectively, a linear difference equation.
Because of the latency of the processing chain, two clock signals are used: on the one hand, the Front Clock in the EMPU for the assignment of data edges and clock edges, and on the other hand, the PLL clock supplied by the PCU, which is responsible for the calculation of the phase errors in the sense of e(k) in
Finally, the system comprises a Trend Injection Unit (TIU) 140. Here, the Nominal Clock is added to the clock edges from the PCU, in order to obtain the final clock-edge position. As in the TEU 120, the Nominal Clock describes a linear trend.
The PCU 130 is capable of processing a plurality of data edges in parallel through an appropriate implementation of the linear difference equation from equation (5). In the online operating mode, in which the data signal is observed constantly, the number of data edges per time unit can fluctuate slightly. For example, for a data stream with an average of 2.5 bit periods per system clock pulse, 3 and 2 edges may be processed alternately in parallel. The layout of the PCU is simplified if the PLL core is driven with a constant parallelism.
In this context,
In another embodiment of the PCU, the parallel realization of the difference equation (5) can be bypassed by initially decimating the data edges after appropriate lowpass filtering; the resulting data-edge stream is then processed with a PLL core of low or even single parallelism, and finally, the recovered, estimated clock edges are brought back to the original parallelism via an interpolation stage. For example, the decimation can be implemented by averaging over the elements in one data-edge packet. The clock edges can be recovered, for example, through linear interpolation of the decimated clock-edge sequence.
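The decimation and interpolation stages can be sketched in Python as follows. The sketch omits the PLL core between the two stages, assumes packets of equal length n and at least two packets, and treats each packet average as the value at the packet centre; the function names are hypothetical.

```python
import math


def decimate_by_mean(packets):
    """One value per data-edge packet: the average over its elements."""
    return [sum(p) / len(p) for p in packets]


def interpolate_edges(centres, n):
    """Linear interpolation from one value per packet back to n edges each.

    Edges before the first and after the last packet centre are obtained by
    linear extrapolation from the nearest pair of centres.
    """
    edges = []
    for i in range(len(centres) * n):
        pos = (i - (n - 1) / 2) / n                  # position in packet units
        j = min(max(math.floor(pos), 0), len(centres) - 2)
        frac = pos - j
        edges.append(centres[j] + frac * (centres[j + 1] - centres[j]))
    return edges
```

For edges lying on a straight line, averaging followed by linear interpolation returns the original positions, so the approximation error of this scheme is limited to the deviation of the edges from a locally linear course.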
The invention is not restricted to the exemplary embodiment presented. All the features described and/or illustrated can be combined with one another within the framework of the invention.
Number | Date | Country | Kind |
---|---|---|---|
10 2007 045 085 | Sep 2007 | DE | national |
10 2008 011 845 | Feb 2008 | DE | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2008/006852 | 8/20/2008 | WO | 00 | 10/8/2009 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2009/039924 | 4/2/2009 | WO | A |
Number | Date | Country | |
---|---|---|---|
20100141308 A1 | Jun 2010 | US |