The following references provide a brief description of the prior art:
[1] X. Zhou, “HW Efficient Carrier Recovery Algorithms for Single-Carrier QAM systems,” in SPPCOM'12, OSA, paper SpTu3A.1 (2012).
[2] N. Sigron, I. Tselniker, and M. Nazarathy, “Carrier phase estimation for optically coherent QPSK based on Wiener-optimal and adaptive Multi-Symbol Delay Detection (MSDD),” Opt. Express 20, 1981-2003 (2012).
[3] I. Tselniker, N. Sigron and M. Nazarathy “Joint phase noise and frequency offset estimation and mitigation for optically coherent QAM based on adaptive multi-symbol delay detection (MSDD),” Opt. Express 20, 10944-10962 (2012).
[4] N. Kikuchi, S. Sasaki, and T. Uda, “Improvement of tolerance to intra-channel non-linear effect of coherent higher-order multilevel signaling with digital delay detection,” in ECOC'12, We.3.C.1 (2012).
[5] S. Zhang, P.-yuen Kam, C. Yu, J. Chen, “Decision-aided carrier phase estimation for coherent optical communication,” JLT 28, 1597 (2010).
[6] X. Liu and M. Nazarathy, “Coherent, self-coherent, and differential detection systems,” Ch. 1 in “Impact of Nonlinearities on Fiber Optic Communications,” (ed. Kumar), Springer (2011).
[7] T. Pfau, S. Hoffmann, and R. Noe, “HW-efficient coherent digital receiver concept with feedforward carrier recovery for QAM constellations,” J. Lightwave Technol. 27, 989-999, (2009).
[8] J. Volder, “The CORDIC trigonometric computing technique,” IRE Tran. Electronic Computers EC-8, 330-334 (1959).
[9] R. Andraka, “A survey of CORDIC algorithms for FPGA based computers,” ACM/SIGDA FPGA '98, 191-200, (1998).
[10] Y. Atzmon, M. Nazarathy, “Laser Phase Noise in Coherent and Differential Optical Transmission Revisited in the Polar Domain,” J. Lightwave Technol. 27, 19-29 (2009).
[11] T. Pfau, X. Liu, S. Chandrasekhar, “Optimization of 16-ary Quadrature Amplitude Modulation Constellations for Phase Noise Impaired Channels,” paper Tu.3.A.6, European Conf. Opt. Comm., ECOC'11 (2011).
[12] M. Taylor, “Phase Estimation Methods for Optical Coherent Detection Using Digital Signal Processing,” J. Lightwave Technol. 24 (2009).
[13] Q. Zhuge et al, “Linewidth tolerant low-complexity pilot-aided phase recovery for M-QAM using superscalar parallelization,” OFC' 12.
[14] K. Itoh, “Analysis of the phase unwrapping algorithm,” Applied Optics, 21, p. 2470 (1982)
[15] Gdeisat and Lilley, “One-Dimensional Phase Unwrapping Problem,” available on the Internet at http://www.ljmu.ac.uk/GERI/CEORG_Docs/OneDimensionalPhase Unwrapping_Final.pdf.
Carrier recovery (CR) and in particular carrier phase and frequency estimation continue to pose performance and computational challenges, especially for higher order transmission constellations, imminent for deployment in the next phase of coherent optical communication systems upgrades for long-haul, metro and access applications.
A plethora of CR methods has been investigated [1]. Among those, Multi-Symbol Delay Detection (MSDD) [2-6] (alternatively referred to as Multi-Symbol Phase Estimation (MSPE) [6] or Maximum Likelihood (ML) phase estimation [5]) is gradually gaining recognition as capable of delivering superior performance-complexity tradeoffs. In the wireless transmission context where it originated, MSDD was proven optimal for detection in white noise. In the optical transmission context, MSDD copes well with the combination of ASE, laser and nonlinear phase noises (PN) [4]. For QPSK systems, MSDD [2] is free of cycle slips and provides a 1-2 dB OSNR lead over Viterbi & Viterbi CR, whereas for 16-QAM transmission, MSDD performance [3] trails by just a fraction of a dB below the extremely complex Blind Phase Search (BPS) CR [7], considered as a “benchmark”. Numerous CR variants have recently been investigated based on two-stage processing, using a coarse BPS first stage feeding a second CR stage realized by various methods [1]. Such CR systems claim substantial reductions of complexity vs. BPS at the expense of some performance degradation. To the best of our knowledge, the MSDD CR method for 16-QAM [3] outperforms these other CR methods while still offering lower complexity. However, there is still room for further complexity reduction of the MSDD CR sub-system.
There are provided a system, a receiver and a method.
According to an embodiment of the invention there is provided a polar multi symbol differential detection (MSDD) module. Some non-limiting examples of an MSDD module are provided in
The MSDD module (denoted 90, 100, 130 and 200 in
The MSDD module (90, 100, 130 and 200) may include a phase estimator that may be arranged to:
Receive (a) the current phase signal (302) and (b) an estimate (305) of a phase of a last input symbol that preceded the current input symbol.
Generate multiple partial phase estimates (denoted 311 in
Output a reconstructed phase 305 of the current input symbol, wherein the estimate of the reconstructed phase is responsive to, at least, the multiple partial phase estimates.
The phase estimator may include multiple (L) partial phase estimation circuits (such as 91(1)-91(L) of
The low limit may equal two. The phase estimator may include a first partial phase estimation circuit (91(1)) that may be arranged to calculate a first partial phase estimate that is a difference between the current phase signal and a last phase signal, wherein the last phase signal represents a phase of a last input symbol that preceded the current input symbol.
The phase estimator may include a slicer 17, a lookup table 18 and the like.
The phase estimator may include a phase unwrapping circuit (denoted 112 in
The phase estimator may include an averaging circuit (denoted 113 in
The phase estimator may include a phase unwrapping and averaging circuit (denoted 93 in
The phase estimator may include a slicer 17 that may be arranged to receive the average phase estimate and to output the estimate of the reconstructed phase of the current input symbol.
The phase estimator may include a carrier frequency offset (CFO) estimator. Various CFO estimators are illustrated in
The CFO estimator may include a constant CFO phase rotation circuit (see, for example boxes 104, 111, 122, 132 of
The CFO estimator may include an input port for receiving a CFO estimator input signal (see, for example signal 103 of
The CFO estimator may include a wrap unit (such as wrap unit 107 of
The CFO estimator wherein the constant CFO phase rotation circuit may be arranged to calculate the estimate of a constant CFO phase rotation by calculating a moving average (see, for example, boxes 104, 111, 122, 132 of
There may be provided a method for calculating a reconstructed phase, the method may include: receiving a current input symbol; calculating a current phase signal and current amplitude signal that represent a phase and an amplitude of the current input symbol, respectively; generating, in response to the current phase signal and an estimate of a phase of a last input symbol that preceded the current input symbol, multiple partial phase estimates, wherein a plurality of the multiple partial phase estimates of the multiple phase estimates are responsive to (i) phase signals representative of phases of a plurality of input symbols that preceded the current input symbol, and (ii) estimates of the phase of the plurality of input symbols; and calculating a reconstructed phase of the current input symbol, in response to, at least, the multiple partial phase estimates.
According to an embodiment of the invention there is provided a polar multi symbol differential detection (MSDD) module. Some non-limiting examples of an MSDD module are provided in FIGS. 12 and 16-19.
The polar multi symbol differential detection (MSDD) module may include an input unit (15) that may be arranged to receive a current input symbol; and output a current phase signal and current amplitude signal that represent a phase and amplitude of the current input symbol, respectively.
The MSDD module (120, 160, 170, 180 and 190 of
Receive (a) the current phase signal and (b) an estimate of a phase of a last input symbol that preceded the current input symbol.
Generate multiple partial references (denoted 310 in
The MSDD module may include a phase unwrap circuit (112) that may be arranged to receive the multiple partial references and calculate unwrapped partial references.
The MSDD module may include a carrier frequency offset (CFO) module that may be arranged to estimate a constant CFO phase rotation in response to the unwrapped partial references.
The MSDD module may include an output circuit that may be arranged to output a reconstructed phase of the current input symbol, wherein the estimate of the reconstructed phase is responsive to, at least, the estimate of the constant CFO phase rotation and to the unwrapped partial references.
The CFO estimator may include a moving average circuit (see, for example, boxes 111, 122 of
The CFO estimator may include a moving average circuit that may be arranged to receive only a part (see, for example
The polar MSDD module may include a first weighted sum module (see, for example box 123 of
The CFO estimator may include a moving average circuit (box 208 of
The output circuit may be arranged to subtract (see subtraction unit 119 of
The CFO estimator may include a moving average circuit (See box 131 of
The CFO estimator may include (see
The CFO estimator may include (See
The CFO estimator may include a wrap unit (see
There may be provided a method for calculating a reconstructed phase, the method may include: receiving a current input symbol; calculating a current phase signal and current amplitude signal that represent a phase and an amplitude of the current input symbol, respectively; generating, in response to the current phase signal and an estimate of a phase of a last input symbol that preceded the current input symbol, multiple partial references, wherein a plurality of partial references of the multiple partial references are responsive to (i) phase signals representative of phases of a plurality of input symbols that preceded the current input symbol, and (ii) estimates of the phase of the plurality of input symbols; calculating unwrapped partial references; estimating a constant carrier frequency offset (CFO) phase rotation in response to the unwrapped partial references; and calculating a reconstructed phase of the current input symbol, wherein the estimate of the reconstructed phase is responsive to, at least, the estimate of the constant CFO phase rotation and to the unwrapped partial references.
There is provided in real-time FPGA or ASIC a novel MSDD CR for 16-QAM coherent transmission which is multiplier-free yet attains the same performance as the U-notU variant of MSDD CR disclosed in [2,3].
Any reference to any type of integrated circuit should be interpreted as a reference to any other type of integrated circuit. For example, any reference to an FPGA should be also interpreted as a reference to an ASIC and vice versa.
We implement the new polar MSDD in FPGA or ASIC and demonstrate its real-time HW operation by having it embedded in an off-line pre-computed optical transmission chain which is nevertheless processed by the MSDD FPGA in real-time at full baud-rate speed (25 GBd for example).
The full-speed full-channel HW operation is enabled by a new technique for temporal parallelization of the MSDD HW processing, referred to here as Polyblock Parallelization.
I. Polar-Domain MSDD Carrier Recovery (Phase Recovery Only Version)
A. Theory of Operation
The novel polar-domain MSDD format for 16-QAM uses the CORDIC algorithm [8,9] (add-and-shift and simple logic, no multipliers) to extract the phase angle of the noisy symbol, $\underset{\sim}{r}_k$, incoming into the MSDD, and then performs all its internal manipulations in the angular (phase) domain, eliminating HW-intensive processing of complex numbers.
As a brief prerequisite: as in [2,3], as explained in our prior MSDD patent application (filed under the PCT), and as in the references listed above, the MSDD presumes a differential precoder (DP) in the transmitter.
We shall use, as in [2,3], the “inverted-moon” notation $\underset{\sim}{\check{x}} \equiv \underset{\sim}{x}/|\underset{\sim}{x}|$ to denote unity-modulus ($|\underset{\sim}{\check{x}}|=1$) normalization (which is angle-preserving, $\angle\underset{\sim}{\check{x}} \equiv \angle\underset{\sim}{x}$). In this notation the differential precoding (DP) we use at the transmitter is as follows: the information symbols $\underset{\sim}{S}_k$ of the QAM constellation alphabet are mapped by a modulus-preserving differential precoder (DP) [4] into line symbols, $\underset{\sim}{A}_k = \underset{\sim}{S}_k\,\underset{\sim}{\check{A}}_{k-1}$. Taking the phase argument of both sides, this amounts to
$$\angle\underset{\sim}{A}_k = \angle\underset{\sim}{S}_k + \angle\underset{\sim}{\check{A}}_{k-1} = \angle\underset{\sim}{S}_k + \angle\underset{\sim}{A}_{k-1} \;\Leftrightarrow\; \angle\underset{\sim}{S}_k = \angle\underset{\sim}{A}_k - \angle\underset{\sim}{\check{A}}_{k-1}, \qquad (1)$$
whereas taking the absolute value yields $|\underset{\sim}{A}_k| = |\underset{\sim}{S}_k|$.
This indicates that our DP differentially encodes phase as in Differential Phase Shift Keying systems ($\angle\underset{\sim}{S}_k = \angle\underset{\sim}{A}_k - \angle\underset{\sim}{\check{A}}_{k-1}$; thus information is encoded in the phase difference of the transmitted line samples), while the magnitude is preserved, so that DP may be applied to arbitrary constellations, such as m-QAM or ring constellations, which are then reconstructed in the receiver.
A conventional delay detector (or self-homodyne or differential phase detector) generates the decision variable $\underset{\sim}{\check{S}}_k = \underset{\sim}{r}_k\,\underset{\sim}{\check{r}}_{k-1}^{*}$, which in the phase domain corresponds to $\angle\underset{\sim}{\check{S}}_k = \angle\underset{\sim}{r}_k - \angle\underset{\sim}{r}_{k-1} = \angle\underset{\sim}{r}_k - \angle\underset{\sim}{\check{r}}_{k-1} \cong \angle\underset{\sim}{A}_k - \angle\underset{\sim}{\check{A}}_{k-1} = \angle\underset{\sim}{S}_k$. Thus, in the absence of noise the angle $\angle\underset{\sim}{\check{S}}_k$ of the sample which is sliced is ideally equal to the transmitted data angle $\angle\underset{\sim}{S}_k$.
Notice that the DP phase relation of Eq. (1) implies the following extended recursion (a result to be used in the sequel):
$$\angle\underset{\sim}{A}_k = \angle\underset{\sim}{A}_{k-i} + \angle\underset{\sim}{S}_{k-i+1} + \angle\underset{\sim}{S}_{k-i+2} + \dots + \angle\underset{\sim}{S}_k \qquad (2)$$
For example, $\angle\underset{\sim}{A}_k = \angle\underset{\sim}{A}_{k-2} + \angle\underset{\sim}{S}_{k-1} + \angle\underset{\sim}{S}_k$.
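The phase-domain view of the differential precoder and of conventional delay detection can be illustrated with a short numerical sketch. This is only an illustration under stated assumptions: the function names (`wrap`, `dp_encode_phase`, `delay_detect_phase`) and the choice of an initial line phase of zero are hypothetical and are not taken from the disclosure; only the phase part of the symbols is modeled, and the channel is noiseless.

```python
import numpy as np

def wrap(x):
    """Wrap angles to the principal interval [-pi, pi)."""
    return (np.asarray(x) + np.pi) % (2 * np.pi) - np.pi

def dp_encode_phase(info_phase, initial_line_phase=0.0):
    """Phase part of the DP, Eq. (1): angle(A_k) = angle(S_k) + angle(A_{k-1}).

    The running sum is the extended recursion of Eq. (2).
    initial_line_phase is an assumed starting phase (hypothetical).
    """
    return initial_line_phase + np.cumsum(info_phase)

def delay_detect_phase(rx_phase, initial_line_phase=0.0):
    """Delay detection in the angular domain: angle(r_k) - angle(r_{k-1})."""
    previous = np.concatenate(([initial_line_phase], rx_phase[:-1]))
    return wrap(rx_phase - previous)
```

In the noiseless case `delay_detect_phase(dp_encode_phase(info))` returns the wrapped information phases, which is the statement that the sliced angle ideally equals the transmitted data angle.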
The previously disclosed “complex-domain” MSDD CR (in its U-notU flavor) [2,3] generates an improved reference $\underset{\sim}{R}_{k-1}$, to be used instead of the previous sample $\underset{\sim}{r}_{k-1}$, in order to demodulate $\underset{\sim}{r}_k$ prior to decision: $\underset{\sim}{\hat{S}}_k = \underset{\sim}{r}_k\,\underset{\sim}{\check{R}}_{k-1}^{*}$, with $\underset{\sim}{R}_{k-1}$ expressed in terms of the slicer decisions $\underset{\sim}{\check{s}}_{k-i}$ (each made in response to the corresponding slicer input $\underset{\sim}{\hat{S}}_{k-i}$):
$$\underset{\sim}{R}_{k-1} = \underset{\sim}{\check{r}}_{k-1} + \underset{\sim}{\check{r}}_{k-2}\,\underset{\sim}{\check{s}}_{k-1} + \underset{\sim}{\check{r}}_{k-3}\,\underset{\sim}{\check{s}}_{k-2}\,\underset{\sim}{\check{s}}_{k-1} + \dots + \underset{\sim}{\check{r}}_{k-L}\,\underset{\sim}{\check{s}}_{k-L+1}\,\underset{\sim}{\check{s}}_{k-L+2}\cdots\underset{\sim}{\check{s}}_{k-1}. \qquad (3)$$
Thus, MSDD is a decision-feedback based CR (but quite different from a decision-driven PLL). This algorithm (see block-diagram in
Here we show that all the MSDD processing may be preferably performed in the angular domain, as per
The derivation of the angular-domain block diagram from the complex (Cartesian-domain) block diagram disclosed in [2,3] involves the following PN exponent commutation (PNEC) approximation:
A similar approximation (in continuous rather than discrete-time) was shown in [10] to be surprisingly accurate over a wide angular range. From the identity
it follows that Eq. (4) is equivalent to
indicating that PNEC effectively states that the geometric mean of L uni-modular phasors may be used in this case to well approximate their arithmetic mean (notice that for real-valued numbers the geometric mean generally falls under the arithmetic mean; here, for complex-valued unimodular numbers, the two means track each other well). Note that L-th root extraction in the complex domain introduces a phase ambiguity in integer multiples of 2π/L, requiring a disambiguation algorithm realized by simple Boolean logic, as detailed next.
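The relations referred to in this passage can be sketched as follows. This is a reconstruction from the surrounding description (arithmetic mean of phasors approximated by their geometric mean), not a verbatim copy of the original displays; $\psi_i$, $i=1,\dots,L$, is assumed to denote the angles of the L unimodular phasors being averaged, and the attachment of the tag (4) to the first line is likewise an assumption.

```latex
% PNEC approximation (assumed form of Eq. (4)):
\frac{1}{L}\sum_{i=1}^{L} e^{j\psi_i} \;\approx\; \exp\Big\{ j\,\frac{1}{L}\sum_{i=1}^{L}\psi_i \Big\} \tag{4}

% Identity linking the exponent of the mean angle to a geometric mean:
\exp\Big\{ j\,\frac{1}{L}\sum_{i=1}^{L}\psi_i \Big\} \;=\; \Big( \prod_{i=1}^{L} e^{j\psi_i} \Big)^{1/L}

% Hence Eq. (4) is equivalent to "arithmetic mean ~ geometric mean":
\frac{1}{L}\sum_{i=1}^{L} e^{j\psi_i} \;\approx\; \Big( \prod_{i=1}^{L} e^{j\psi_i} \Big)^{1/L}
```

The right-hand side always has unit modulus whereas the left-hand side does not, so the approximation is best read as a statement about angles, and it is accurate when the $\psi_i$ are clustered close to each other.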
B. Phase Disambiguation (Unwrap) and Averaging Algorithm
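The PNEC approximation (4) may be equivalently expressed by extracting the angle (phase) of both sides:
$$\angle\Big\{\frac{1}{L}\sum_{i=1}^{L} e^{j\psi_i}\Big\} \;\approx\; \frac{1}{L}\sum_{i=1}^{L}\psi_i \qquad (5)$$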
Let us assume that angles are represented modulo [−π,π), writing $\psi_i = \psi_i^0 + 2\pi N_i$ with $\psi_i^0 \in [-\pi,\pi)$ called the principal part of $\psi_i$ (equivalent results hold for a modulo [0,2π) representation). Evidently, the left-hand side of (5) is unaffected by the selection of the $N_i$ (due to the periodicity of the complex exponentials), whereas the right-hand side changes by integer multiples of 2π/L whenever the $\psi_i$ are represented with different $N_i$ factors. It turns out that there exist selections of $N_i$ which make the two sides of (5) approximately equal. These proper $N_i$ values may be determined in terms of the collection of angles $\{\psi_i^0\}_{i=1}^{L}$ according to the following disambiguation algorithm:
1. Classify the angles $\{\psi_i^0\}_{i=1}^{L}$ according to their quadrant, $Q_q \equiv [(q-1)\pi/2,\ q\pi/2)$ (taken modulo 2π), q=1,2,3,4. Let the subset of angles falling in the q-th quadrant be denoted by $A_q$ (thus $\{\psi_i^0\}_{i=1}^{L} = A_1 \cup A_2 \cup A_3 \cup A_4$). Further denote the number of angles falling in the q-th quadrant by $\#A_q$.
2. If $\#A_2 \geq 1$ and $\#A_3 \geq 1$ then represent the angles in $A_3$ as $\psi_i = \psi_i^0 + 2\pi$, i.e. add 2π to the principal parts of all angles in the third quadrant, whereas the angles in the other quadrants are just represented by their principal parts. Else (if either $\#A_2 = 0$ or $\#A_3 = 0$) represent all angles by their principal parts.
3. Take the arithmetic mean of all angles as represented in point 2: $\frac{1}{L}\sum_{i=1}^{L}\psi_i$.
This arithmetic mean then provides a good approximation for the angle of the arithmetic mean of the corresponding phasors.
The principle of operation of the algorithm is outlined in the next subsection. This phase disambiguation and averaging algorithm generically provides an excellent approximation for a collection of angles that are mostly close to each other (with the exception of a few, say one or two, isolated outliers), which is typically the case for phase-noisy reception.
The phase disambiguation described here is a simplified special case of more general phase unwrap algorithms to be introduced later in the disclosure to address the more demanding case when frequency offset is also present.
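The three-step disambiguation and averaging procedure can be sketched in a few lines of Python. This is a minimal software illustration, not the hardware realization: the function names are hypothetical, the quadrants are numbered 1 to 4 with Q2 = [π/2, π) and Q3 = [−π, −π/2) as used in the text, and `phasor_mean_angle` is included only as a reference against which the angular-domain result can be compared.

```python
import cmath
import math

def wrap(angle):
    """Principal part of an angle, in [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def quadrant(principal):
    """Quadrant index (1..4) of a principal-value angle."""
    if principal >= math.pi / 2:
        return 2
    if principal >= 0.0:
        return 1
    if principal >= -math.pi / 2:
        return 4
    return 3

def disambiguated_mean(angles):
    """Steps 1-3: classify by quadrant, shift Q3 by 2*pi if Q2 is also occupied, average."""
    principal = [wrap(a) for a in angles]
    quads = [quadrant(p) for p in principal]
    if 2 in quads and 3 in quads:
        principal = [p + 2 * math.pi if q == 3 else p
                     for p, q in zip(principal, quads)]
    return sum(principal) / len(principal)

def phasor_mean_angle(angles):
    """Reference value: angle of the sum of the corresponding unit phasors."""
    return cmath.phase(sum(cmath.exp(1j * a) for a in angles))
```

For two angles straddling the ±π boundary, e.g. `[math.pi - 0.1, -math.pi + 0.2]`, the disambiguated mean agrees with the phasor-sum angle modulo 2π, whereas the plain mean of the principal parts points in the opposite direction.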
In the context of polar MSDD, the disambiguated averaging algorithm is applied to select the proper representations of the following set of L angles,
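$$\angle\{\underset{\sim}{\check{r}}_{k-1}\},\ \ \angle\{\underset{\sim}{\check{r}}_{k-2}\,\underset{\sim}{\check{s}}_{k-1}\},\ \ \angle\{\underset{\sim}{\check{r}}_{k-3}\,\underset{\sim}{\check{s}}_{k-2}\,\underset{\sim}{\check{s}}_{k-1}\},\ \dots,\ \ \angle\{\underset{\sim}{\check{r}}_{k-L}\,\underset{\sim}{\check{s}}_{k-L+1}\,\underset{\sim}{\check{s}}_{k-L+2}\cdots\underset{\sim}{\check{s}}_{k-1}\} \qquad (6)$$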
as generated in the top part of
Disambiguation Algorithm Principle
To understand the principle of operation of the algorithm, let us first consider a specific example for $L=2$, $\#A_2 = 1 = \#A_3$, i.e., one angle in the second quadrant, represented by its principal part $\psi_1^0 = \pi - \delta\psi_1$, $\delta\psi_1 \in [0,\pi/2)$, and another angle in the third quadrant, $\psi_2^0 = -\pi + \delta\psi_2$, $\delta\psi_2 \in [0,\pi/2)$.
If we just averaged the angles as represented by their principal parts, we would obtain an angle pointing in the right half-plane (Q1∪Q4):
$$\tfrac{1}{2}(\psi_1+\psi_2) = \tfrac{1}{2}(\psi_1^0+\psi_2^0) = \tfrac{1}{2}\big[(\pi-\delta\psi_1)+(-\pi+\delta\psi_2)\big] = \tfrac{1}{2}(\delta\psi_2-\delta\psi_1) \in [-\tfrac{1}{2}\pi,\ \tfrac{1}{2}\pi] \qquad (11)$$
However, this average angle does not coincide with that of the resultant of the two phasors (which falls in the left half-plane), but is rather antipodal to it (this is readily exemplified by assuming small deviations, $\delta\psi_1, \delta\psi_2 \ll 1$, though the conclusion generally holds for any deviations in the range [0, π/2)). Notice that the resultant of two phasors both in the left half-plane, Q2∪Q3, always falls in the left half-plane, as the individual angles do. This indicates that we must modify the representation of at least one of the two input angles such that their mean ends up in the left half-plane. In this case, according to the disambiguation algorithm, we must add 2π to the angle falling in Q3, making the substitution $\psi_2 = \psi_2^0 + 2\pi$, while still representing the angle falling in Q2 by its principal part, $\psi_1 = \psi_1^0$. After this correction we have:
$$\tfrac{1}{2}(\psi_1+\psi_2) = \tfrac{1}{2}\big[(\pi-\delta\psi_1)+(\pi+\delta\psi_2)\big] = \pi + \tfrac{1}{2}(\delta\psi_2-\delta\psi_1), \qquad (12)$$
which points into the left half-plane, as required.
More generally, assuming first an arbitrary number of angles in Q2 and Q3 (at least one of them in each of these two quadrants), the justification of adding 2π to the principal part of each angle in Q3 is that each such angle, say $\psi_i \in A_3$, is going to be represented in the form
$$\psi_i = \psi_i^0 + 2\pi = (-\pi+\delta\psi_i) + 2\pi = \pi + \delta\psi_i, \qquad (13)$$
whereas each angle in Q2 is going to be represented in the form $\psi_j = \psi_j^0 = \pi - \delta\psi_j$. Both of these representations are in a CCW one-sided form, therefore their arithmetic mean correctly represents the mean of their corresponding phasors. More generally, one can verify that whenever all angles fall in a particular half-plane (Q1∪Q2, Q2∪Q3, Q3∪Q4 or Q4∪Q1), the proposed disambiguation algorithm functions perfectly. Finally, let us address the case where most of the angles are in a particular half-plane whereas a low number of outlier angles fall in the complementary half-plane. The most problematic case is again having the majority of the angles fall in Q2∪Q3 (the left half-plane) but having, say, one or two outlier angles fall in either Q1 or Q4. In this case the algorithm will still function well whenever the outlier angles are incapable of pulling the resultant of the phasors outside the left half-plane. As we assume that the number of outliers is small (one or at most two), this is a highly probable event.
Polar MSDD Hardware Realization Complexity
The proposed polar MSDD structure brings the CR sub-system complexity down to a minimum (without sacrificing performance), as we manage to eliminate all multipliers altogether. We even remove the complex multiplier used for demodulating the noisy signal prior to slicing (multiplication by exp{−jφkest} prior to slicing, where φkest is the estimated phase). Indeed, the conjugate multiplication entailed in the demodulation reduces in the angular domain to a simple subtraction of phases.
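The angular-domain processing can be sketched end to end in software: partial references are formed as sums of angles per Eq. (6), averaged with the quadrant-based disambiguation, and the demodulation is a plain phase subtraction. The sketch below is a simplified floating-point model under several assumptions that are not part of the disclosure: it handles phase only, it uses a QPSK angular slicer (`slice_qpsk_phase`, a hypothetical stand-in for the slicer and lookup table), all function names are illustrative, and no carrier frequency offset is present.

```python
import numpy as np

def wrap(x):
    """Wrap angles to [-pi, pi)."""
    return (np.asarray(x) + np.pi) % (2 * np.pi) - np.pi

def unwrap_and_average(angles):
    """Quadrant-based disambiguation followed by an arithmetic mean."""
    a = wrap(np.asarray(angles, dtype=float))
    if np.any(a >= np.pi / 2) and np.any(a < -np.pi / 2):
        a = np.where(a < -np.pi / 2, a + 2 * np.pi, a)  # shift Q3 angles by 2*pi
    return a.mean()

def slice_qpsk_phase(phi):
    """Nearest QPSK phase in {+-pi/4, +-3pi/4} (assumed constellation)."""
    return np.floor(wrap(phi) / (np.pi / 2)) * (np.pi / 2) + np.pi / 4

def polar_msdd(rx_phase, L=8):
    """Decision-feedback phase recovery using only additions on angles."""
    decisions = np.zeros(len(rx_phase))
    for k in range(1, len(rx_phase)):
        partial_refs = []
        decided_sum = 0.0  # angle(s_{k-i+1}) + ... + angle(s_{k-1})
        for i in range(1, min(L, k) + 1):
            partial_refs.append(wrap(rx_phase[k - i] + decided_sum))
            decided_sum += decisions[k - i]
        reference = unwrap_and_average(partial_refs)
        # Demodulation is a subtraction of phases, followed by slicing.
        decisions[k] = slice_qpsk_phase(rx_phase[k] - reference)
    return decisions
```

The inner loop corresponds to the L partial references; near the start of the sequence fewer than L references are available, so the window is simply shortened. In a fixed-point realization the division by L inside the mean would be a word shift when L is a power of two.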
The resulting multiplier-free CR comprises just a reasonably low quantity of simpler elementary operations: additions, comparators, a lookup table for the slicer angular outputs (e.g., 8-bit phase and 8-bit magnitude for the slicer input, i.e., a 16-bit input and 4-bit cells for 16-QAM, thus a LUT of 256 Kbit), simple comparator logic (for CORDIC and disambiguation), trivial digital word shifts (to divide or multiply by a power of two) and block serial-parallel data re-shuffling. These hardware operations are far less complex than multiple complex multiplications (as typically used in other CR schemes). By using the new MSDD, the available multiplier reservoir in the FPGA is freed up for other DSP functionalities (in ASIC realizations the area and power consumption would be reduced).
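The quoted slicer lookup-table size follows from the word widths given above, as the following short calculation illustrates. The variable names are illustrative only; the word widths are those stated in the text.

```python
# Slicer LUT sizing from the word widths quoted in the text.
phase_bits = 8        # quantized phase of the slicer input
magnitude_bits = 8    # quantized magnitude of the slicer input
cell_bits = 4         # log2(16) bits per 16-QAM decision

address_bits = phase_bits + magnitude_bits   # 16-bit LUT address
lut_cells = 1 << address_bits                # 65536 addressable cells
lut_bits = lut_cells * cell_bits             # 262144 bits
lut_kbit = lut_bits // 1024                  # 256 Kbit
```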
The optimal window is L=8 (in this case multiplications by 8 or by ⅛ become trivial word-shifts) however the HW itemization below is formulated in terms of a general L window size. Itemizing the required operations for the HW implementation of the averaging+disambiguation algorithm in an exemplary design (
Simulated Performance of the Polar
Simulated Polar-vs. Cartesian-MSDD OSNR-BER performance is shown in
It turns out that other proposed two-stage CR variants, e.g. combining coarse BPS with maximum likelihood or alternative CR methods, also typically fall behind the BPS benchmark performance by 0.2-0.4 dB. It follows that the performance of our polar MSDD is on par with these previous MSDD variants, whereas our proposed multiplier-free scheme provides the lowest complexity.
II. Polyblock HW Parallelization of the MSDD CR
We now disclose a new block-processing oriented technique for temporal parallelization of CR HW processing, applicable in particular to differential-detection, decision-feedback driven CR schemes such as the MSDD. The proposed MSDD real-time HW parallelization method, referred to here as Polyblock Parallelization (PBP), enables realizing the CR DSP with a slower clock for the FPGA or ASIC, while avoiding the conventional “distant-feedback” [2] phase-noise penalty due to parallelized processing. Such a penalty is incurred in polyphase temporal parallelization of decision-feedback based schemes (such as the MSDD) upon time de-interleaving the samples and processing the M polyphases via M parallel MSDDs, each slowed down by a factor of M [12]. The M-fold reduction in sampling rate per polyphase degrades the linewidth tolerance of the CR by a factor of M. However, parallelizing the MSDD by means of the new polyblock method essentially eliminates the distant-feedback parallelization penalty.
A. Polyblock MSDD with Initialization Overhead
A first variant of the novel parallelized hardware realization of the MSDD is described in
The degree of parallelization, P, is selected sufficiently large such that the MSDDs are implemented at a sampling rate R/P not exceeding the speed limitation of the HW platform. Each block, streaming out of a particular output port of the B_S/P, then represents a set of B contiguous samples of the original data stream (at the high rate), and can therefore be processed exactly as per
In the Tx (
$$\underset{\sim}{A}_k = \underset{\sim}{s}_k\,\underset{\sim}{\check{A}}_{k-1},\quad k=0,1,2,\dots,B-1;\qquad \underset{\sim}{\check{A}}_{-1} = 1 \qquad (7)$$
Here k is the discrete-time index of the incoming stream of information samples $\{\underset{\sim}{s}_k\}$. It is just that the physical time associated with the discrete time is slowed down by a factor of P in the parallelized realization, relative to a hypothetical full-speed direct implementation of (7), which is not attainable with current ASIC technology. The initialization $\underset{\sim}{\check{A}}_{-1} = 1$ implies that $\underset{\sim}{A}_0 = \underset{\sim}{s}_0$, next $\underset{\sim}{A}_1 = \underset{\sim}{s}_1\,\underset{\sim}{\check{A}}_0 = \underset{\sim}{s}_1\,\underset{\sim}{\check{s}}_0$, $\underset{\sim}{A}_2 = \underset{\sim}{s}_2\,\underset{\sim}{\check{A}}_1 = \underset{\sim}{s}_2\,\underset{\sim}{\check{s}}_1\,\underset{\sim}{\check{s}}_0$, amounting to a complex-valued multiplicative accumulator generating the line symbols out of the information symbols, which corresponds to an additive accumulator for the phases:
$$\underset{\sim}{A}_k = \underset{\sim}{s}_k \prod_{m=0}^{k-1}\underset{\sim}{\check{s}}_m \;\Leftrightarrow\; \angle\underset{\sim}{A}_k = \sum_{m=0}^{k}\angle\underset{\sim}{s}_m,\quad k=0,1,2,\dots,B-1 \qquad (8)$$
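The per-block precoder of Eq. (7) and its accumulator form of Eq. (8) can be checked numerically with a short sketch. The function name `dp_encode_block` is hypothetical, and the example assumes that no information symbol is zero so that the unit-modulus normalization is defined.

```python
import numpy as np

def dp_encode_block(info_symbols):
    """Modulus-preserving differential precoder over one block, Eq. (7).

    Each line symbol is the information symbol rotated by the phase of the
    previous line symbol; the block starts from the initialization 1.
    """
    s = np.asarray(info_symbols, dtype=complex)
    line = np.empty_like(s)
    previous_unit = 1.0 + 0.0j            # normalized A_{-1} = 1
    for k, symbol in enumerate(s):
        line[k] = symbol * previous_unit  # A_k = s_k * normalized A_{k-1}
        previous_unit = line[k] / abs(line[k])
    return line
```

For a block of 16-QAM symbols, the output moduli equal the input moduli, while the output phases equal the running sum of the input phases (modulo 2π), as stated by Eq. (8).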
In the Rx (
The sliding window processing of L prior samples implies some degradation in the quality of the first L−1 estimated symbols (to be input into the slicer) over the block head, as the k-th sample, with k<L, is preceded by only k non-zero samples over which to average the phase noise, rather than by L samples. The initial L−1 recovered samples are explicitly expressed as follows:
Notice that at time k=0 no data is conveyed as this is the initial symbol of the block, which does not have access to a phase reference ahead of it. It is only for k≧L that the block processor has access to a full window of L past symbols and may generate a “standard” MSDD estimate (or the corresponding polar-domain version of
$$\underset{\sim}{\hat{S}}_k = \underset{\sim}{r}_k\big(\underset{\sim}{\check{r}}_{k-1} + \underset{\sim}{\check{r}}_{k-2}\,\underset{\sim}{\check{s}}_{k-1} + \underset{\sim}{\check{r}}_{k-3}\,\underset{\sim}{\check{s}}_{k-2}\,\underset{\sim}{\check{s}}_{k-1} + \dots + \underset{\sim}{\check{r}}_{0}\,\underset{\sim}{\check{s}}_{1}\cdots\underset{\sim}{\check{s}}_{k-1}\big)^{*} \qquad (11)$$
The smaller k is (over the 1≤k<L initial interval), the more degraded its MSDD phase recovery is, due to insufficient white noise averaging over the shortened window. For example, the second symbol of the block, recovered as $\underset{\sim}{\hat{S}}_1 = \underset{\sim}{r}_1\,\underset{\sim}{\check{r}}_0^{*}$, amounts to delay detection, which has its white noise doubled. Fortunately, if the parallelization block size, B, is large enough, the higher error probability over the block head just slightly raises the average error probability over the overall block (not all symbols in the block incur a uniformly higher error probability, only the first L symbols in the block do; the average error probability is then slightly higher, with the errors slightly more likely to occur in the head of the block than elsewhere in the block). It is possible to further trade off this slightly higher error probability vs. a slight increase in computational load, by introducing an overlapped block strategy as described in the next subsection.
B. Initialization-Free, Block-Overlapped Polyblock MSDD
In the initial variant of the MSDD polyblock parallelization scheme as discussed above, the error rate is enhanced for the first L samples of each block, during the interval that the MSDD (as initialized by the training sample set to 1 starting each block), converges to steady-state performance.
We now introduce an alternative overlapped polyblock parallelization scheme (
In this overlapped scheme the block-parallel/serial module still partitions the incoming fast-rate samples into successive blocks of B samples arrayed into a 2D buffer of P rows of B samples each. However, ahead of this buffer an additional buffer of P rows of LOverlap samples each is prepended as indicated in the figure, forming a (B+LOverlap)×P buffer array. The B samples in each row of the B×P sub-array are written by the B_P/S, whereas the prepended array is handled as follows: as soon as the B incoming samples are stored in the p-th row, the last LOverlap samples out of these B samples are copied over (at the reduced rate of R/P, where R is the fast line rate) into the initial LOverlap samples of the (p+1)-th row (this occurs in parallel with the fast data being deposited into the B samples of the (p+1)-th row). Notice that the copy introduces an extra latency by a factor of (B+LOverlap)/B=1+LOverlap/B.
We may characterize the overlapped writing into (B+LOverlap)×P buffer array as consisting of writing into the B×P sub-array and prepending an overlap-prefix of LOverlap samples to each B-block, obtained by replicating ahead of the current B-block the last LOverlap samples of the previous B-block.
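The overlap-prefix construction can be sketched as a simple array operation. The function name `overlapped_blocks` is hypothetical, and the zero prefix of the very first row is an assumption made only for this illustration (in a streaming system that row would receive the tail of the preceding block).

```python
import numpy as np

def overlapped_blocks(samples, B, L_overlap):
    """Arrange a sample stream into P rows of (L_overlap + B) samples.

    Row p holds the p-th block of B samples, prefixed by the last
    L_overlap samples of block p-1. Assumes L_overlap <= B.
    """
    samples = np.asarray(samples)
    P = len(samples) // B
    blocks = samples[:P * B].reshape(P, B)
    buffer = np.zeros((P, L_overlap + B), dtype=samples.dtype)
    buffer[:, L_overlap:] = blocks                          # the B x P sub-array
    buffer[1:, :L_overlap] = blocks[:-1, B - L_overlap:]    # replicated tails
    return buffer
```

For example, with `B=6` and `L_overlap=2` applied to the stream 0, 1, ..., 23, the second row reads 4, 5, 6, ..., 11: the two samples replicated from the end of the first block followed by the six samples of the second block.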
Given the duplication of the samples, the processing of these particular LOverlap samples may in principle be performed either in the tail of the p-th block or in the appended head of the p+1-th block. It is advantageous to adopt the last mentioned option, since in this case the MSDD sliding window associated for each sample in the L samples tail of each block may now extend L symbols into the past. Thus, the MSDD exercises its normal operation over the last B samples of the p-th block, including the very last L samples. MSDD operation is oblivious to the L-th sample from the end actually being a 1 training symbol.
We explored the possibility of starting the MSDD without any training sequence (which would be desirable as there would be no need for synchronization of the MSDD with respect to the timing phase of the TS). Indeed, as borne out by simulation, as shown in
C. HW Complexity Considerations
Notice that the processing must be run at a slightly enhanced rate: B+L samples must now be processed during the time it takes to deposit B samples into the memory. Thus, the overlapped scheme must operate at a clock rate elevated by a factor of (B+L)/B=1+L/B, increasing the computational complexity (operations per unit time) by the same factor. However, as typically L≪B, the extra computational load and latency are relatively small.
The second discernible cost of using PBP is the allocation of a 1.28 Mbyte buffer in our FPGA and some parallel-serial block data shuffling. In detail, the incremental hardware cost of PBP is the inclusion of a two-dimensional block-parallel buffer of size P×B, with P the number of parallel paths and B the block size. PBP also incurs a fractional transmission overhead of 1/B (due to the initial symbol of each block, which is not useful for transmission). E.g., upon selecting a sufficiently large block size of B=4 Ksamp, a spectral efficiency loss of just 1/B≈0.02% is incurred. In our FPGA realization, the channel baud rate is 25 GBd, 64 times faster than our FPGA processing clock (25 GS/s ÷ 64 ≈ 390 MHz). As the HW takes 5 clock cycles to complete one iteration of the MSDD loop, we require a temporal parallelization factor of P=64×5=320. The buffer storage for the PBP realization, expressed in bytes, is then relatively modest (we use 8-bit words, i.e., one byte per phase sample):
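$$P \times B = 320 \times 4\,\mathrm{Ksamp} = 1.28\,\mathrm{Msamp} \;\rightarrow\; 1.28\,\mathrm{Msamp} \cdot 1\,\mathrm{byte/samp} = 1.28\,\mathrm{Mbyte} \qquad (10)$$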
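The sizing arithmetic above can be sketched as a few lines of Python. This is only an illustrative calculation, not part of the described hardware; the function name `pbp_sizing` and its parameters are hypothetical, and "4 Ksamp" is taken as 4000 samples so that the result matches the 1.28 Msamp figure of Eq. (10).

```python
import math

def pbp_sizing(baud_rate_hz, clock_hz, cycles_per_iteration, block_size,
               bytes_per_sample=1):
    """Return (parallel paths P, buffer size in bytes, overhead 1/B)."""
    # Samples arriving per processing clock cycle (64 for 25 GS/s at ~390.6 MHz).
    samples_per_clock = math.ceil(baud_rate_hz / clock_hz)
    # Each MSDD loop iteration takes several clocks, so P grows by that factor.
    parallel_paths = samples_per_clock * cycles_per_iteration
    buffer_bytes = parallel_paths * block_size * bytes_per_sample
    overhead = 1.0 / block_size  # one non-payload symbol per block
    return parallel_paths, buffer_bytes, overhead

# Assumed values: 25 GBd channel, clock = baud rate / 64, 5 clocks per iteration.
paths, size_bytes, overhead = pbp_sizing(25e9, 25e9 / 64, 5, 4000)
print(paths, size_bytes, overhead)
```

Under these assumptions the sketch reproduces P=320 and a 1.28 Mbyte buffer; a different block size or word width simply scales the buffer linearly.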
We should mention that the PBP technique presented here is somewhat similar to the superscalar parallelization used in other CR methods [13]; however, the PBP is specifically adapted to the current MSDD context.
FPGA Implementation and Real-Time Demo
We tested the hardware implementation of the new polar MSDD in an FPGA and established its real-time HW operation by embedding the new MSDD in an optical transmission chain using a single-carrier (SC) differentially encoded QPSK signal, occupying a total channel bandwidth of 25 GHz. Notice that it is just the MSDD CR that is demonstrated here in real-time HW; apart from the MSDD FPGA, the rest of the transmission chain (SC transmitter (Tx), fiber channel, receiver (Rx) front-end and DSP) is simulated offline, feeding symbols into the MSDD FPGA memory and reading decisions off the MSDD output memory. However, in between its input and output memory, the MSDD FPGA is demonstrated in real time at the full 25 GBd rate (parallelized over 320 paths).
The following design was implemented using Xilinx Virtex XC6VLX240T FPGA. The FPGA block diagram is shown in
The real-time FPGA_CR+offline_optical_link demo described here verifies the new MSDD as a suitable integrative CR HW solution, simplifying DSP ASIC design. The proposed carrier recovery algorithm is hardware efficient and achieves relatively high performance.
Alternative “pre-delta” embodiment of the Polar MSDD (phase-recovery-only version) and its tolerance to carrier frequency offset (CFO).
We now introduce an alternative equivalent embodiment of the polar MSDD as described in
In this section we shall determine the inherent tolerance to Carrier Frequency Offset (CFO) of the MSDD systems of
In coherent detection homodyne systems, be they optical or wireless, the CFO indicates an undesired difference between transmitter and receiver local oscillator frequencies. We model the impact of CFO as multiplication of the sequence of samples (indexed by discrete-time k) by the phase factor exp{jkθ} where
$$\theta = 2\pi\,\Delta\nu_{CFO}\,T_s = 2\pi\,\Delta\nu_{CFO}\,R_s^{-1} \qquad (12)$$
is the CFO-induced phase increment per sample, $\Delta\nu_{CFO}$ is the CFO in Hz, $T_s$ is the sampling interval, and $R_s=T_s^{-1}$ is the sampling rate (equal here to the baud rate).
Subsequently we introduce enhancements to the phase compensation MSDD schemes of
Let us briefly analyze system operation of the polar MSDD variant of
$$\tilde{r}_k = \tilde{A}_k + \tilde{n}_k = \left|1 + \mathrm{Re}\{\tilde{n}_k\}/|\tilde{A}_k|\right|\,\tilde{A}_k\,e^{j\phi_k}$$
The additive-white-Gaussian (AWG) induced phase noise is modeled as a Gaussian random process $\eta_k=\mathrm{Im}\{\tilde{n}_k\}/|\tilde{A}_k| \sim N[0,\sigma_{ASE}^2]$, where $\tilde{r}_k=\tilde{A}_k+\tilde{n}_k$, $\tilde{n}_k$ is the additive white zero-mean circular Gaussian (AWZMCG) noise affecting the transmitted symbols $\tilde{A}_k$, and Re/Im denote taking the real/imaginary part. The phase noise and distortion, $\phi_k$, comprises an additive white Gaussian (AWG) component $\eta_k$ (in optical transmission this is due to, e.g., Amplified Spontaneous Emission (ASE)) and random-walk phase noise (in optical transmission this is due to Laser Phase Noise (LPN)). The random-walk phase noise is modeled as a Wiener-Lévy random walk process generated by accumulating zero-mean Gaussian increments $\Omega_k$ of variance
$$\sigma_{LPN}^2 = \langle\Omega_k^2\rangle = 2\pi\,\Delta\nu_{LW}\,R_s^{-1}, \qquad (14)$$
where $\Delta\nu_{LW}$ is the laser linewidth.
Notice that in addition to the random phase walk by $\Omega_k$ steps, there is also a systematic phase walk-off, adding in each step a constant phase increment, θ, due to CFO (Eq. (12)). This yields the CFO-induced “ramp”, kθ, in the phase sequence $\phi_k$, which is now described by the overall expression
$$\phi_k = \eta_k + \mathrm{step}_k \otimes (\Omega_k + \theta) = \eta_k + \varphi_k + k\theta; \qquad \varphi_k \equiv \sum_{m \le k} \Omega_m$$
where $\mathrm{step}_k$ is a step sequence (equal to 1 for non-negative k, zero otherwise) describing the accumulator impulse response (IR), ⊗ denotes convolution, and $\varphi_k$ is the accumulated random-walk (LPN) phase.
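The phase model just described (white ASE-induced noise, plus a Wiener random walk, plus a CFO ramp) can be simulated with a short Python sketch. The function names, the seed handling and the choice of starting the ramp at k=0 are assumptions made for illustration only.

```python
import math
import random

def cfo_phase_increment(cfo_hz, symbol_rate_hz):
    """Per-sample CFO phase increment theta of Eq. (12)."""
    return 2.0 * math.pi * cfo_hz / symbol_rate_hz

def lpn_sigma(linewidth_hz, symbol_rate_hz):
    """Standard deviation of the random-walk increments, from Eq. (14)."""
    return math.sqrt(2.0 * math.pi * linewidth_hz / symbol_rate_hz)

def phase_noise_sequence(num_samples, theta, sigma_ase, sigma_lpn, seed=0):
    """Return eta_k + (accumulated Omega_m) + k*theta for k = 0..num_samples-1."""
    rng = random.Random(seed)
    walk = 0.0
    phases = []
    for k in range(num_samples):
        walk += rng.gauss(0.0, sigma_lpn)  # accumulator driven by Omega_k
        eta = rng.gauss(0.0, sigma_ase)    # white ASE-induced phase noise
        phases.append(eta + walk + k * theta)
    return phases
```

With both noise standard deviations set to zero the sequence reduces to the pure ramp kθ, which is a convenient sanity check before adding noise.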
The outputs ∠{tilde under (Ŝ)}k(i), i=1, 2, . . . L of the top row of subtractors in
where the differential precoding relations (1) and (2) were used.
Thus, in the absence of noise and CFO, all partial estimators $\angle\hat{\tilde{s}}_k^{(i)}$ ideally end up equal to the transmitted information symbol phase, $\angle\tilde{s}_k$. The white ASE noise contributes $\eta_k-\eta_{k-i}$, whereas the cumulative LPN contributes a degradation $\sum_{m=k-i+1}^{k}\Omega_m$ that grows with the lag i. The CFO effect is to generate an offset iθ in the i-th partial estimator. Upon subsequently averaging over all L partial estimators (to extract a quieter version of $\angle\tilde{s}_k$), these offsets yield a constant phase offset (a rotation of the constellation) at the slicer input, to be referred to as “CFO phase rotation”.
Ignoring phase disambiguation for the moment (i.e. pretending all phases are unwrapped phases not confined to a 2π interval), the phase averaging operation over the partial estimators, then yields:
where the laser phase noise contribution is given by
The phase averaging is then seen to reduce the variance of the white noise terms $\{\eta_{k-i}\}$ by a factor of L (but it does not improve $\eta_k$, which would be the ultimate white noise present with an ideal phase-noise-free local oscillator). The phase noise term $\varphi_k^{LPN}$ represents the degradation of the MSDD phase estimate due to the LPN, which increases with the window length, L (however, for moderate L, such as L=8 in our exemplary system, the LPN degradation is kept relatively small compared with the larger benefit of ASE noise reduction due to averaging).
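The displayed equations for the partial estimators and their average did not survive in this text. Collecting the terms exactly as they are described in the surrounding paragraphs (ASE difference, accumulated LPN increments, CFO offset iθ), they would presumably take the following form; this is a reconstruction under that assumption, not a quotation of the original equations, and $\varphi_k^{LPN}$ is used here for the averaged LPN term.

```latex
\begin{align}
\angle\hat{\tilde{s}}_k^{(i)} &= \angle\tilde{s}_k + \eta_k - \eta_{k-i}
    + \sum_{m=k-i+1}^{k}\Omega_m + i\theta, \qquad i = 1,\dots,L \\
\angle\hat{\tilde{s}}_k &= \frac{1}{L}\sum_{i=1}^{L}\angle\hat{\tilde{s}}_k^{(i)}
    = \angle\tilde{s}_k + \eta_k - \frac{1}{L}\sum_{i=1}^{L}\eta_{k-i}
    + \varphi_k^{LPN} + \frac{L+1}{2}\,\theta \\
\varphi_k^{LPN} &= \frac{1}{L}\sum_{i=1}^{L}\;\sum_{m=k-i+1}^{k}\Omega_m
\end{align}
```

The last term of the second line follows from $\frac{1}{L}\sum_{i=1}^{L} i = \frac{L+1}{2}$, which gives 4.5θ for L=8.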
The effect of the CFO is then to generate in $\angle\check{\tilde{s}}_k$ at the slicer input a fixed CFO-induced phase rotation:
The slicer performance is degraded whenever this phase rotation becomes excessive. Let us assess the CFO impairment of the polar MSDD CR sub-system. E.g., for an L=8 averaging window, bounding the CFO impairment to 10 mrad of constellation rotation would require restricting the CFO to
For an $R_s$=25 GBd symbol (and sampling) rate, this would yield a maximum CFO tolerance of $|\Delta\nu_{CFO}^{max}|$=8.8 MHz.
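The tolerance figure quoted above can be checked numerically. The sketch below assumes, as the uniform averaging over lags i=1..L implies, that the rotation at the slicer is ((L+1)/2)·θ, and inverts Eq. (12) to obtain the CFO in Hz; the function name is hypothetical.

```python
import math

def max_cfo_hz(max_rotation_rad, window_length, symbol_rate_hz):
    """Largest CFO keeping the constellation rotation below max_rotation_rad."""
    # The average of i*theta over i = 1..L equals ((L + 1) / 2) * theta.
    rotation_per_theta = (window_length + 1) / 2.0
    theta_max = max_rotation_rad / rotation_per_theta
    # Invert theta = 2*pi*cfo/Rs.
    return theta_max * symbol_rate_hz / (2.0 * math.pi)

tolerance = max_cfo_hz(10e-3, 8, 25e9)
print(f"{tolerance / 1e6:.2f} MHz")
```

For L=8, a 10 mrad budget and 25 GBd, this evaluates to roughly 8.8 MHz, in line with the value stated in the text.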
We next carry out a similar analysis for the alternative equivalent polar MSDD system depicted in
$$\angle\tilde{R}_{k-1}^{(1)} = \angle\tilde{r}_{k-1} = \angle\tilde{A}_{k-1} + \eta_{k-1} + \varphi_{k-1} + (k-1)\theta$$
$$\angle\tilde{R}_{k-1}^{(2)} = \angle\tilde{r}_{k-2} + \angle\check{\tilde{s}}_{k-1} = \angle\tilde{A}_{k-2} + \eta_{k-2} + \varphi_{k-2} + (k-2)\theta + \angle\check{\tilde{s}}_{k-1} = \angle\tilde{A}_{k-1} + \eta_{k-2} + \varphi_{k-2} - 2\theta + k\theta$$
$$\angle\tilde{R}_{k-1}^{(i)} = \angle\tilde{r}_{k-i} + \angle\check{\tilde{s}}_{k-i+1} + \angle\check{\tilde{s}}_{k-i+2} + \ldots + \angle\check{\tilde{s}}_{k-1} = \angle\tilde{A}_{k-i} + \angle\check{\tilde{s}}_{k-i+1} + \angle\check{\tilde{s}}_{k-i+2} + \ldots + \angle\check{\tilde{s}}_{k-1} + \eta_{k-i} + \varphi_{k-i} + (k-i)\theta = \angle\tilde{A}_{k-1} + \eta_{k-i} + \varphi_{k-i} - i\theta + k\theta \qquad (23)$$
where again the differential precoding relations (1) and (2) were used, and we also assumed that the slicer made no error, i.e.:
$$\angle\check{\tilde{s}}_{k-i+1} + \angle\check{\tilde{s}}_{k-i+2} + \ldots + \angle\check{\tilde{s}}_{k-1} = \angle\tilde{s}_{k-i+1} + \angle\tilde{s}_{k-i+2} + \ldots + \angle\tilde{s}_{k-1} \qquad (24)$$
It is apparent that all the partial references ideally equal $\angle\tilde{A}_{k-1}$ in the absence of noise and CFO. In the presence of noise the common phase $\angle\tilde{A}_{k-1}$ is corrupted by noise fluctuations, the uncorrelated (white noise) components of which are suppressed by averaging over all L partial references, yielding an improved reference, $\angle\tilde{R}_{k-1}$.
Here, in addition to fixed phase rotations, iθ, all partial references contain a phase ramp, kθ. Notice that this phase ramp was cancelled out in the analysis of
Ignoring phase disambiguation for the moment, the phase averaging operation over the partial references then yields:
Thus, the phase averager output, ∠{tilde under (R)}k−1, referred to as improved reference, is essentially the prior line symbol phase, ∠{tilde under (A)}k−1 (which may be used as reference for differential detection), degraded by an average of the white noise samples (with its variance suppressed by a factor of L due to the averaging effect), by the average phase noise, by a fixed phase rotation
and further containing a phase ramp, kθ. This improved reference phase is subtracted off the phase of the k-th sample, yielding the following estimator at the slicer input:
which reconstructs the same result as in Eq. (19). Notice in particular that the two phase ramps, kθ, in the two terms $\angle\tilde{r}_k$ and $\angle\tilde{R}_{k-1}$ have cancelled out, leaving just a constant phase rotation,
as induced by the CFO, which also appeared in the analysis of
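The ramp cancellation in the pre-delta variant can be illustrated with a noise-free Python sketch working entirely on unwrapped phases. It assumes error-free decisions and differential precoding in the form "line phase = running sum of information phases"; the function name and the test values are hypothetical.

```python
import math
import random

def pre_delta_slicer_inputs(symbol_phases, theta, window_length):
    """Noise-free pre-delta MSDD: received phase minus the improved reference."""
    # Differential precoding: the line phase accumulates the information phases.
    line_phases = []
    acc = 0.0
    for s in symbol_phases:
        acc += s
        line_phases.append(acc)
    # Received phase = line phase + CFO ramp k*theta.
    received = [a + k * theta for k, a in enumerate(line_phases)]
    outputs = {}
    for k in range(window_length, len(received)):
        partial_refs = []
        for i in range(1, window_length + 1):
            # Rotate r_{k-i} forward by the decisions s_{k-i+1} .. s_{k-1}.
            correction = sum(symbol_phases[k - i + 1:k])
            partial_refs.append(received[k - i] + correction)
        improved_ref = sum(partial_refs) / window_length
        outputs[k] = received[k] - improved_ref
    return outputs

rng = random.Random(1)
symbols = [rng.choice([0.0, math.pi / 2, math.pi, -math.pi / 2]) for _ in range(40)]
theta = 1e-3
slicer_inputs = pre_delta_slicer_inputs(symbols, theta, 8)
residuals = [slicer_inputs[k] - symbols[k] for k in slicer_inputs]
```

In this idealized setting every residual equals ((L+1)/2)·θ, i.e. 4.5θ for L=8, independent of k: the ramp kθ is gone and only the constant rotation remains.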
More General Phase Averaging with Non-Uniform Taps for Improved Phase Noise Mitigation
More general embodiments of the two polar MSDD variants of
In the current phase domain context, using unequal weights (taps) in the moving average of a phase-domain MSDD enables a better approximation of the Wiener filter (hence slightly improved performance). However, the tradeoff of using general taps (rather than having all taps uniformly equal to 1/L) is that additional complex multipliers (multipliers of a variable by a constant) are required (which may be implemented by shifts and adds). It may be possible to select simple tap values, such as n/8 where n=0,1,2, . . . ,8, such that the implementation of the multipliers remains relatively simple, yet performance is slightly improved. MSDD schemes based on non-uniform averaging taps will be shown in the sequel (also equipped to mitigate CFO); however, by removing the CFO mitigation sub-systems, these schemes may be downgraded to phase-noise-mitigating-only MSDDs with non-uniform taps, hence with slightly improved phase noise tolerance.
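A minimal sketch of the n/8 tap idea, assuming the unwrapped phases are held as fixed-point integers: each tap multiplication is done with shifts and adds only, and the final division by 8 is a right shift. The function names and the example tap set are hypothetical; the only constraint taken from the text is that the taps have the form n/8, and unity DC gain is enforced by requiring the numerators to sum to 8.

```python
def shift_add_multiply(value, numerator):
    """Multiply an integer by a small non-negative integer using shifts and adds."""
    result = 0
    bit = 0
    while numerator:
        if numerator & 1:
            result += value << bit
        numerator >>= 1
        bit += 1
    return result

def weighted_phase_average(phases, tap_numerators):
    """Average fixed-point phases with taps n_i/8 (numerators must sum to 8)."""
    if sum(tap_numerators) != 8:
        raise ValueError("tap numerators must sum to 8 for unity DC gain")
    acc = sum(shift_add_multiply(p, n) for p, n in zip(phases, tap_numerators))
    return acc >> 3  # divide by 8

# Example: heavier weight on the most recent lags (an arbitrary illustrative choice).
example_taps = [2, 2, 1, 1, 1, 1, 0, 0]
```

Because the numerators sum to 8, a constant input passes through unchanged, which is the "average of a constant is the same constant" property required of the phase averaging filter.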
4. Polar MSDD Extended to Also Mitigate Carrier Frequency Offset
The polar-MSDD (
CFO then manifests as a phase ramp θ,2θ,3θ, . . . , Lθ at successive i-lag differential detectors outputs (outputs of the Phase Unwrap module in
operations after phase unwrap) yields a phase offset
Precisely the same phase must be estimated by the CFO EST module and cancelled out in the subtractor before the slicer. The challenge is to perform the CFO estimation while leaking in little excess noise and not being affected by the common phase riding on the differential detector outputs. To this end, the CFO EST uses a novel L-tap MISO filter design, with taps given by +++ . . . −−− (where ± indicates ±1), with zero DC gain, yet with inherent noise averaging.
For L=8, the MISO filter output is given by (5θ+6θ+7θ+8θ)−(θ+2θ+3θ+4θ)=16θ (it features both a derivative and averaging), while for general L it is gθ, with $g=L^2/4$ (g=16 for L=8). The MISO filter output is smoothed through an additional $L_a$-point moving average (here $L_a$=128), which retains its DC value, and is then scaled by
(this factor is simply implemented by a single adder and two trivial bit-shifts), yielding the sought estimate
of the CFO induced phase offset, which precisely cancels out the CFO induced fixed angular offset 4.5θ, which is generated by the “regular” MSDD section, mitigating phase noise.
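The CFO EST arithmetic for L=8 can be sketched as follows. The scale factor is not legible in this text; the value 9/32 used below is an inference from the stated numbers (16θ at the filter output, 4.5θ as the target), and it happens to decompose as 1/4 + 1/32, which is consistent with "a single adder and two trivial bit-shifts". The smoothing moving average is omitted, and the function name is hypothetical.

```python
def cfo_rotation_estimate(detector_outputs):
    """Estimate the CFO-induced rotation 4.5*theta from 8 unwrapped lag outputs."""
    if len(detector_outputs) != 8:
        raise ValueError("this sketch hard-codes L = 8")
    lower_half = sum(detector_outputs[:4])  # lags 1..4, taps -1
    upper_half = sum(detector_outputs[4:])  # lags 5..8, taps +1
    # Zero DC gain: the common phase cancels, the CFO ramp leaves 16*theta.
    miso_output = upper_half - lower_half
    # Assumed scaling 9/32 = 1/4 + 1/32 (two shifts and one add in hardware).
    return miso_output / 4.0 + miso_output / 32.0

common_phase = 0.3
theta = 0.002
lag_outputs = [common_phase + i * theta for i in range(1, 9)]
estimate = cfo_rotation_estimate(lag_outputs)
```

With a noise-free ramp the estimate equals 4.5θ regardless of the common phase, which is what allows it to cancel the rotation generated by the uniform phase averaging.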
Having explained the functionality of the new CFO EST module, let us briefly review MSDD operation for phase estimation (assuming phase noise but zero CFO) for the benefit of readers who have not consulted prior MSDD papers. In the absence of noise, due to differential precoding (DP) we would have $\angle\tilde{r}_k-\angle\tilde{r}_{k-1}=\angle\tilde{s}_k$, retrieving the data angle fed into the transmitter DP. The other differential detectors generate sums of data angles, e.g., $\angle\tilde{r}_k-\angle\tilde{r}_{k-2}=\angle\tilde{s}_k+\angle\tilde{s}_{k-1}$, but those are corrected by sums of prior angular decisions, such that after the additive corrections all L=8 differential detector outputs (outputs of the phase unwrap module) would have a common phase equal to $\angle\tilde{r}_k-\angle\tilde{r}_{k-1}=\angle\tilde{s}_k$.
When phase noise of any source is present, the L common phase terms are perturbed by phase noises. The ASE components are independent and may be averaged out by the Σ and
operations after phase unwrap, generating an improved estimate of $\angle\tilde{s}_k$ that is fed into the slicer. The laser phase noises in the L averaged phases are actually correlated and degrade the quality of the estimate if the window L is excessively increased, but as the simulation in the next section indicates, L=8 provides a beneficial tradeoff between ASE improvement and laser phase noise degradation.
As indicated above, we now extend the polar MSDD such that it also compensates for carrier frequency offset (CFO) in addition to phase noise. We disclose two families of embodiments obtained by enhancing the block diagrams of
Using
In the embodiment of
One particular possible moving average embodiment is shown in the figure (with La equal taps each equal La−1, realized by the cascade of a “skip-La” discrete-time differentiator, yk=xk−xk−L
in the input to the CFO E&C, thus zero DC gain is useful as it cancels out this DC term at the output. Actually, the moving average path, which has unity DC gain, estimates the DC CFO phase rotation term
(while averaging out noise) at its output and subtracts it off the through path, in effect cancelling out the DC term,
The decision feedback (subtraction of the decision phase $\angle\check{\tilde{s}}_{k-1}$, fed from the slicer output, from the unit-delayed ($Z^{-1}$) output $\angle\hat{\tilde{s}}_k^{+CFO}$ of the phase averaging stage; here the +CFO superscript denotes that CFO is present in this signal, to be cleaned up by the CFO E&R module) is intended to strip off the data in the CFO estimation, as analyzed below.
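The moving-average realization mentioned earlier in this section, built around a “skip-La” differentiator, can be sketched in Python as below. The cascade with an accumulator and a final 1/La scaling is the standard recursive form and is assumed here, since the remainder of the original description is not legible; the class name is hypothetical.

```python
from collections import deque

class RecursiveMovingAverage:
    """La-point moving average: y_k = x_k - x_{k-La}, then accumulate and scale."""

    def __init__(self, window_length):
        self.window_length = window_length
        # Delay line initialised to zeros, so the filter starts from rest.
        self.delay_line = deque([0.0] * window_length, maxlen=window_length)
        self.accumulator = 0.0

    def step(self, sample):
        oldest = self.delay_line[0]          # x_{k-La}
        self.delay_line.append(sample)       # maxlen drops the oldest entry
        self.accumulator += sample - oldest  # accumulate the skip-La difference
        return self.accumulator / self.window_length
```

Only one subtraction and one addition are needed per sample regardless of La, which is why this form is attractive for long windows such as La=128. In a floating-point model the accumulator may drift slightly over very long runs; a fixed-point implementation does not have that issue.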
The final “WRAP” operation at the CFO E&C module output, prior to entering the moving average, is required in order to condition the input phase into the slicer to be mapped onto the [−π, π) interval. The WRAP operation is formally defined in terms of the W{ } operator acting on an unwrapped phase sequence {uk} to yield a wrapped phase sequence:
where Arg is the principal argument (angle) function mapping complex numbers to the (−π,π] angular range; the modulo operation with respect to the (−π,π] interval was defined in the last equality above as subtraction (or addition) of an integer multiple of 2π bringing the result within the principal (wrapped phase) interval (−π,π]; the superscript round indicates rounding its argument to the nearest integer.
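The two descriptions of WRAP given above (principal argument of exp(ju), and subtraction of the nearest integer multiple of 2π) can be compared in a short Python sketch. The function names are hypothetical. Note that the treatment of the exact boundary values ±π depends on the rounding rule of the platform, so the sketch should not be relied on at those two points.

```python
import cmath
import math

def wrap(unwrapped_phase):
    """Subtract the integer multiple of 2*pi nearest to u / (2*pi)."""
    turns = round(unwrapped_phase / (2.0 * math.pi))
    return unwrapped_phase - 2.0 * math.pi * turns

def wrap_via_arg(unwrapped_phase):
    """Same mapping expressed through the principal argument of exp(j*u)."""
    return cmath.phase(cmath.exp(1j * unwrapped_phase))
```

Away from the interval boundaries the two functions agree to within floating-point error, e.g. both map u=7.0 to 7.0−2π.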
As the WRAP mapping is many-to-one, it appears not to be invertible; however, by imposing additional a-priori restrictions on the unwrapped phase sequence, uk, one is nevertheless able to uniquely reconstruct {uk} out of {wk}, as discussed further below. The algorithms implementing the unique inversion of the WRAP operation are referred to as UNWRAP and are known art, e.g. [14], briefly reviewed further below to the extent required for explaining our innovative advance. In fact one such particular novel algorithm was described in the context of the MSDD for phase-mitigation-only (
For now let us assume that the output of the “Phase Disambiguation (Unwrap)” sub-block consists of unwrapped phase versions (attaining any possible value) of the L phase inputs into the “Phase Disambiguation (Unwrap)+Averaging” block. Let us derive a detailed mathematical analysis of the operation of the polar MSDD, including the CFO E&C scheme of
We denote the output of the phase averaging and disambiguation block by $\angle\hat{\tilde{s}}_k^{+CFO}$ (the +CFO superscript indicates that it is affected by CFO). Then by Eq. (26) (in the slightly different notation here):
A unit delay and subtraction of the slicer decision feedback $\angle\check{\tilde{s}}_{k-1}$ yields
where in the last expression we assumed the slicer decision is not in error: $\angle\tilde{s}_{k-1}=\angle\check{\tilde{s}}_{k-1}$.
The last expression in Eq. (29) indicates that the input $\angle\hat{\tilde{s}}_{k-1}^{+CFO}-\angle\check{\tilde{s}}_{k-1}$ into the moving average is data-independent, consisting entirely of the constant CFO-induced phase rotation
perturbed by phase noise terms. The moving average further reduces the variance of the phase noises in (29), extracting a relatively clean version of
which is then subtracted off the $\angle\hat{\tilde{s}}_{k-1}^{+CFO}$ term arriving along the through path, cancelling out the
CFO phase rotation term present in $\angle\hat{\tilde{s}}_k^{+CFO}$. Notice that the moving average path also leaks in some extra phase noise; however, because of the averaging action this noise contribution is relatively small, just slightly enhancing the overall phase noise in the $\angle\hat{\tilde{s}}_k$ estimate, while cancelling out the
CFO phase rotation term. This completes the top-level analysis of the scheme of
Alternative post-delta MSDD with discrete-gradient parallel CFO E&C
An alternative embodiment of an MSDD with CFO mitigation is introduced in
The CFO E&C in
Δi[1,L
the output of which is subsequently averaged in an La points moving average (MA) (the particular implementation of the MA is identical to the one within the CFO E&C in
Let us assess the difference Eq. (30) of partial references, accounting for why it may provide, after the moving average a good quality estimate, {hacek over (φ)}, of the CFO phase estimate. Using Eq. (23) for ∠{tilde under (R)}k−1(1) Eq. (30) simplifies to:
It follows that scaling by 1/LΔ (which is a multiplicative term in the overall gain following the moving average) and applying the moving average linear operation, denoted MA{ }, yields:
Thus, we have obtained an estimator {hacek over (φ)} for θ. The noise fluctuation φk{hacek over (φ)} in it is initially suppressed in power by a factor of
relative to those of ηk−1−ηL
Next, multiplication of {hacek over (φ)} by the constant ½(L+1) yields
which is an estimator for the CFO phase rotation of Eq. (21), repeated here:
This estimated phase rotation is subtracted off the $\angle\tilde{r}_k$ signal from the CORDIC module, thus a term
is added to $\angle\tilde{r}_k$ (Eq. (15)) yielding
where the term
is the phase noise leaking through the CFO E&R module.
Finally, the difference between this signal and the improved reference $\angle\tilde{R}_{k-1}^{+CFO}$ is generated and fed into the phase input of the polar domain slicer:
$$\angle\hat{\tilde{s}}_k = \angle\tilde{r}_k^{+CFO} - \angle\tilde{R}_{k-1}^{+CFO} \qquad (35)$$
Notice that a WRAP operation should also be applied to the input into the polar slicer (which by definition only accepts phase inputs in the [−π, π) range), but this WRAP operation is not explicitly shown, as it is assumed to be included within the slicer module.
In turn, $\angle\tilde{R}_{k-1}^{+CFO}$ coincides, up to notation, with Eq. (25):
Evaluating Eq. (35) by subtracting Eqs. (34) and (36) yields:
where the laser phase noise term φkLPN is given by Eq. (20), and we note the beneficial cancellation of the
upon subtracting the two signals. It is apparent that the slicer input is essentially the transmitted data symbol phase, corrupted by white noise (with reduced variance due to the
averaging, relative to a conventional delay detector, wherein the ASE induced white phase noise term would be ηk−ηk−1 rather than
here. The ratio of the powers of these two terms is obtained by observing that ASE noise samples at different discrete times may be assumed independent, hence add up on a power basis, yielding $\mathrm{var}\{\eta_k-\eta_{k-1}\}=2\sigma_\eta^2$ for the conventional delay detection case vs.
Thus, the ratio of the two variances is
which is the factor by which ASE noise is suppressed by the MSDD.
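The suppression factor can be evaluated numerically under the independence assumption stated above. The closed form used in the sketch, 2/(1+1/L) = 2L/(L+1), is derived here from that assumption (variance σ² for η_k plus σ²/L for the average of L independent past samples) rather than copied from the original equations, which are not legible in this text; the function name is hypothetical.

```python
import math

def ase_suppression_factor(window_length):
    """Ratio var{eta_k - eta_{k-1}} / var{eta_k - mean(eta_{k-1..k-L})}."""
    conventional = 2.0                # two independent samples, in units of sigma^2
    msdd = 1.0 + 1.0 / window_length  # eta_k plus the average of L independent samples
    return conventional / msdd

factor = ase_suppression_factor(8)
factor_db = 10.0 * math.log10(factor)
print(f"{factor:.3f} ({factor_db:.2f} dB)")
```

For L=8 this gives 16/9, i.e. roughly 2.5 dB of ASE phase noise suppression relative to a conventional delay detector, and the factor approaches 2 (3 dB) as L grows.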
At the same time one may derive an enhancement factor for the laser phase noise power relative to a simple delay detector, wherein the LPN term is $\varphi_k-\varphi_{k-1}=\Omega_k$, with variance given by $\sigma_\Omega^2$. To this end one may evaluate the variance of $\varphi_k^{LPN}$ as given by Eq. (20):
The derivation is deferred to further below, however simulations show that the degradation due to the enhanced LPN is less than the improvement due to averaging the white ASE noise, thus overall the MSDD acts to reduce the total noise. It is also important to assess the magnitude of the noise enhancement due to phase noise leaking through the CFO E&R, as represented by the
term. An evaluation of this term is also deferred at this point; however, one may get a sense that this term ought to be small given the action of the moving average, which reduces the overall phase noise. Notice that the differentiation occurring in the discrete gradient of Eq. (30) effectively whitens the phase noise, extracting the independent increments of the Wiener phase noise process and allowing the moving average to further reduce the resulting whitened noise. In fact the size of the averaging window may be selected sufficiently large such that significant suppression of the power of the $\varphi_k^{CFO\,E\&R}$ term is attained. A reason not to increase the moving average window indefinitely is the case when the CFO is not constant but drifts linearly (the presence of chirp). In this case the term related to the chirp is not cancelled by the CFO E&R, and in order to keep this term small in magnitude the averaging window may not be taken indefinitely large.
Generalized MSDD Embodiments with Both Phase Noise and CFO Mitigation
A generalization for the post-delta parallel CFO E&C embodiment of
of
with arbitrary coefficients, as shown in
The zero DC gain condition means that the taps should satisfy $\sum_{i=1}^{L} d_i=0$. We refer to the resulting filter as a generalized derivative, since the zero DC gain condition implies that the filter steady-state response to a constant is zero, hence the steady-state response to a ramp is a constant, just as the discrete-time derivative behaves.
The output of the generalized derivative is fed into a moving average linear time invariant filter with taps $\{a_i\}$ (finite or infinite). Moreover, we also generalize the phase averaging operation (used for suppression of phase noise, unrelated to the CFO) from the particular special case of having L constant taps each equal to 1/L to using L taps with arbitrary values $\{c_i\}_{i=1}^{L}$, which should satisfy $\sum_{i=1}^{L} c_i=1$, i.e., unity DC gain. This makes the filter behave like an average: the average of a constant sequence is the same constant sequence.
The coefficient g is selected so as to perfectly cancel the fixed CFO-induced phase rotation term; however, this coefficient may alternatively be absorbed within the $\{d_i\}_{i=1}^{L}$ coefficients, as the linear constraint $\sum_{i=1}^{L} d_i=0$ is still satisfied even when all coefficients are multiplied by an arbitrary factor g.
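The defining property of a generalized derivative (zero response to a constant, constant response to a ramp) can be checked with a few lines of Python. The helper names are hypothetical, and the two tap sets are only the examples that appear in this description (the half-and-half ±1 design and the alternating ±1 design); sign conventions depend on the direction of the ramp across the L inputs.

```python
def fir_apply(taps, inputs):
    """Multi-input FIR: inner product of the taps with the L parallel inputs."""
    return sum(t * x for t, x in zip(taps, inputs))

def is_generalized_derivative(taps, tol=1e-12):
    """True when the taps have zero DC gain (they sum to zero)."""
    return abs(sum(taps)) < tol

def ramp_gain(taps):
    """Filter output for the unit-slope ramp input x_i = i, i = 1..L."""
    return sum(i * t for i, t in enumerate(taps, start=1))

half_taps = [-1, -1, -1, -1, 1, 1, 1, 1]          # gain L^2/4 = 16 for L = 8
alternating_taps = [-1, 1, -1, 1, -1, 1, -1, 1]   # gain L/2 = 4 for L = 8

common, slope = 0.7, 0.01
lag_inputs = [common + slope * i for i in range(1, 9)]
half_output = fir_apply(half_taps, lag_inputs)
alternating_output = fir_apply(alternating_taps, lag_inputs)
```

Both outputs are independent of the common phase and proportional to the slope, with gains 16 and 4 respectively; dividing by the ramp gain (the role of the coefficient g) normalizes the CFO contribution to the slope itself.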
As a particularly useful example of the generalized post-delta parallel CFO E&C MSDD of Evidently, a filter with these taps qualifies as generalized derivative as it has zero DC gain (the taps sum up to zero).
Using Eq. (23), repeated here for convenience,
$$\angle\tilde{R}_{k-1}^{(i)} = \angle\tilde{A}_{k-1} + \eta_{k-i} + \varphi_{k-i} - i\theta + k\theta$$
each pair of adjacent taps, {1,−1}, in this taps design generates the following contribution to the FIR filter output:
Summing over all pairs of adjacent taps, the contribution of the laser phase noise term is $\sum_{n=0}^{L/2-1}\Omega_{2n+1}=\Omega_1+\Omega_3+\ldots+\Omega_{L-1}$ and the contribution of the CFO is (L/2)θ. Scaling the filter by
normalizes the CFO contribution to θ, but then the laser noise is scaled down to
A similar analysis may be made for the ASE noise contribution. The noise contributions are the same as the ones generated by the system of
As for a generalization of the MSDD carrier recovery system of the post-delta type of
The taps $\{c_i\}_{i=1}^{L}$ of the phase averaging filter satisfy unity DC gain, $\sum_{i=1}^{L} c_i=1$. The taps of the moving average $\{a_i\}_{i=0}^{\infty}$ (possibly a finite sequence) satisfy $\sum_{i=0}^{\infty} a_i=1$, such that both filters qualify as averages (the average of a constant sequence is the same constant sequence).
The taps $\{d_i\}_{i=1}^{L}$, $\{c_i\}_{i=1}^{L}$, $\{a_i\}_{i=0}^{\infty}$ for the generalized MSDDs of FIGS. 12 and 13 should preferably be selected for maximal suppression of the overall phase noise in the slicer input $\angle\check{\tilde{s}}_k$, subject to the constraint that the CFO term be cancelled out.
Polar MSDD CFO E&C as Frequency Detector in Phase/Frequency Locked Loops
The robust CFO estimator provided within the Δ-filter may be used as a phase (or rather frequency) detector element in a digital Phase Locked Loop (PLL) or digital-analog Frequency Locked Loop (FLL) correcting the CFO upstream of the polar MSDD, typically ahead of the chromatic dispersion (CD) and polarization equalizers. This is achieved either by tuning the frequency of the local oscillator (LO) laser (in the FLL case) or by synthesizing a time-varying phase to be applied to a digital multiplier located ahead of the equalizers (in the PLL case). The CFO estimator in the MSDD provides an estimate of θk, the instantaneous carrier frequency offset (which reduces to a constant θ for constant CFO). This auxiliary output of the CFO E&C is indicated in both
Now we briefly describe the PLL/FLL actuated by this auxiliary output.
The local oscillator (LO) laser essentially acts as a voltage tuned oscillator (VTO), driven by an electrical frequency-tuning control. As the frequency response of the laser tuning is substantially low-pass, to “close the loop” on the laser frequency it essentially suffices to low-pass filter the θk signal provided by the MSDD and apply the low-bandwidth output of the digital low-pass filter to a Digital to Analog Converter (DAC), the output of which drives the LO laser frequency tuning control. The low-pass filtering may be realized multiplier-free, with low complexity, by decimating the CFO estimator output θk. An efficient hardware implementation successively passes the CFO estimator output through a down-sampler by the factor La, followed by a sequence of decimators, each of which may consist of a moving average followed by a down-sampler (alternatively, a more general FIR or IIR filter may be used ahead of the down-sampler instead of the moving average). There is a need for substantial decimation within the FLL loop, since the sampling rate of the CFO E&C within the MSDD is very high (for optical systems, in the GS/s range) whereas the frequency response of the laser tuning is substantially low-pass, in the kHz range; thus a sampling rate reduction of about 6 orders of magnitude is required.
The resulting system also comprises an all-digital PLL, which may be realized by passing the CFO estimator output θk through a digital loop filter that should contain at least one digital integrator (accumulator) to convert the frequency estimates (phase increments) θk into phase samples $\varphi_k=\sum_{m=0}^{k}\theta_m$, used to demodulate the complex samples via multiplication by $\exp\{-j\varphi_k\}$. In its simplest form the PLL loop filter driven by θk consists of just the accumulator. More generally, additional loop filtering may be inserted to modify the loop dynamics (e.g., an additional integrator may allow the loop to follow slow but steady CFO ramps). To summarize this section, the CFO E&C module of the polar MSDD provides a fast and accurate sensor of the instantaneous frequency, which may be used in externally actuating frequency or phase control, closing the loop in an FLL/PLL in the receiver system.
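The simplest loop filter described above, a single accumulator followed by a complex derotation, can be sketched in Python as follows. This is an open-loop illustration with externally supplied frequency estimates, not a closed-loop PLL model, and the function name is hypothetical.

```python
import cmath

def derotate(samples, frequency_estimates):
    """Accumulate phase increments theta_k and multiply samples by exp(-j*phi_k)."""
    phase = 0.0
    corrected = []
    for sample, theta_k in zip(samples, frequency_estimates):
        phase += theta_k  # phi_k = sum of theta_m for m = 0..k
        corrected.append(sample * cmath.exp(-1j * phase))
    return corrected

theta = 0.01
# Test tone whose phase at index k is (k+1)*theta, matching phi_k above.
rotating = [cmath.exp(1j * theta * (k + 1)) for k in range(50)]
flattened = derotate(rotating, [theta] * 50)
```

With a perfect, constant estimate the rotating tone is flattened to a constant; in a real loop the estimates would be noisy and delayed, and the extra loop filtering mentioned in the text would shape how the accumulator responds.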
FIG. 14: PLL/FLL for CFO mitigation using the MSDD CFO E&C as the CFO sensor. The input at the left is the input fiber. The X-pol and Y-pol outputs of the optical Rx front end are connected to the two digital multipliers ahead of the Rx back-end CD/POL equalizer. The loop contains three down-samplers (marked by down-arrows).
Design Tradeoffs for the MSDD System of
The three design parameters at our disposal in the MSDD system of
of the Δ-filter); however, there are adverse tradeoffs limiting their increase, as follows: LΔ may not exceed L, so let us set it to L. Making La too large, i.e., using too long an averaging window, allows chirp (a CFO ramp) to accumulate, as discussed in the next section. For agile burst receivers, a design “sweet-spot” consists of L=8, LΔ=8, La=128. With these values, the resulting noise enhancement factor (NEF) is small. On the other hand, for a non-agile (non-burst) receiver, La may be selected much larger, making the NEF much closer to unity.
Embodiments of Post-Delta MSDD with Parallel CFO E&C Detailing the UNWRAP Module
In this particular case it will also work to apply the FIR with +1,−1,+1,−1, . . . taps in the unwrapped phase domain. This is because such an FIR may be viewed as a summation of increments between each pair of adjacent inputs (partitioning the L inputs into L/2 disjoint pairs, taking the difference within each pair, then summing up). This is equivalent to evaluating discrete gradients within each pair, in the unwrapped phase domain. However, by Itoh's theorem, increments in the unwrapped domain equal increments in the wrapped domain, hence the FIR with +1,−1,+1,−1 taps may indeed be equivalently applied in the wrapped phase domain.
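The argument above can be illustrated in Python. The sketch relies on the usual condition for Itoh's result, namely that true increments between adjacent samples are smaller than π in magnitude, which is an assumption stated here rather than in the text; all function names are hypothetical.

```python
import math

def wrap(phase):
    """Map a phase to the principal interval by removing the nearest multiple of 2*pi."""
    return phase - 2.0 * math.pi * round(phase / (2.0 * math.pi))

def wrapped_increments(wrapped_phases):
    """Re-wrapped differences between adjacent wrapped samples."""
    return [wrap(b - a) for a, b in zip(wrapped_phases, wrapped_phases[1:])]

def unwrap(wrapped_phases):
    """Rebuild the unwrapped sequence by accumulating the wrapped increments."""
    out = [wrapped_phases[0]]
    for increment in wrapped_increments(wrapped_phases):
        out.append(out[-1] + increment)
    return out

def alternating_fir_wrapped(wrapped_phases):
    """Taps +1,-1,+1,-1,... applied pairwise, re-wrapping each pair difference."""
    return sum(wrap(wrapped_phases[n] - wrapped_phases[n + 1])
               for n in range(0, len(wrapped_phases) - 1, 2))

true_phases = [0.5 * k for k in range(20)]       # increments of 0.5 rad < pi
wrapped = [wrap(u) for u in true_phases]
recovered = unwrap(wrapped)
fir_wrapped = alternating_fir_wrapped(wrapped)
fir_unwrapped = sum(true_phases[n] - true_phases[n + 1] for n in range(0, 19, 2))
```

Here the alternating-tap result computed from wrapped phases matches the one computed from the true unwrapped phases, and accumulating the wrapped increments recovers the original sequence.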
Moreover, the terms “front,” “back,” “top,” “bottom,” “over,” “under” and the like in the description and in the claims, if any, are used for descriptive purposes and not necessarily for describing permanent relative positions. It is understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein.
The connections as discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals. Therefore, many options exist for transferring signals.
Each signal described herein may be designed as positive or negative logic. In the case of a negative logic signal, the signal is active low where the logically true state corresponds to a logic level zero. In the case of a positive logic signal, the signal is active high where the logically true state corresponds to a logic level one. Note that any of the signals described herein may be designed as either negative or positive logic signals. Therefore, in alternate embodiments, those signals described as positive logic signals may be implemented as negative logic signals, and those signals described as negative logic signals may be implemented as positive logic signals. Furthermore, the terms “assert” or “set” and “negate” (or “deassert” or “clear”) are used herein when referring to the rendering of a signal, status bit, or similar apparatus into its logically true or logically false state, respectively. If the logically true state is a logic level one, the logically false state is a logic level zero. And if the logically true state is a logic level zero, the logically false state is a logic level one.
Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures may be implemented which achieve the same functionality. Any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality may be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality.
Furthermore, those skilled in the art will recognize that boundaries between the above-described operations are merely illustrative. The multiple operations may be combined into a single operation, a single operation may be distributed in additional operations and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.
Also for example, in one embodiment, the illustrated examples may be implemented as circuitry located on a single integrated circuit or within a same device. Alternatively, the examples may be implemented as any number of separate integrated circuits or separate devices interconnected with each other in a suitable manner.
Also for example, the examples, or portions thereof, may be implemented as software or code representations of physical circuitry or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type.
Also, the invention is not limited to physical devices or units implemented in non-programmable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code, such as mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices, commonly denoted in this application as ‘computer systems’.
However, other modifications, variations and alternatives are also possible. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. Furthermore, the terms “a” or “an,” as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an.” The same holds true for the use of definite articles. Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements.
The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage.
While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
This application claims the benefit of U.S. provisional patent application Ser. No. 61/775,709, filed Mar. 11, 2013, which is incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
5740204 | Nagashima | Apr 1998 | A |
6674814 | Tanada | Jan 2004 | B2 |

Number | Date | Country |
---|---|---|
20140254723 A1 | Sep 2014 | US |

Number | Date | Country |
---|---|---|
61775709 | Mar 2013 | US |