The prior art may be disclosed in some of the following publications. In the specification a prior art reference will be addressed by referring to the number of that prior art reference in the list. For example the first prior art reference in the list will be referred to as [1]:
There may be provided a receiver that may include one or more carrier recovery modules, wherein a carrier recovery module may include: a port arranged to receive a receiver input signal that is representative of an optically coherent signal that was received by the receiver as a result of a transmission, by an optical transmitter, of a transmitter signal that is a carrier signal that is being modulated by information; a reference signal generator that may be arranged to generate a reference signal that estimates the carrier signal; a decision module that may be arranged to demodulate the receiver input signal by the reference signal to provide a demodulated signal and to evaluate the demodulated signal to provide a decision module output signal that estimates the carrier signal; wherein the reference signal generator may include: a delay and rotation module that may be arranged to delay receiver input signals to provide delayed receiver input signals and to align the delayed receiver input signals by a rotation that is responsive to the decision module output signal thereby providing aligned signals; and a multiplication and summation module that may be arranged to generate the reference signal by calculating a weighted sum of the aligned signals.
The decision module may be a slicer.
The decision module may include a normalizing module that may be arranged to normalize the decision output signal to provide a normalized output signal that is used to rotate at least some of the delayed receiver input signals.
The multiplication and summation module may include multiple adders and only a single multiplier; wherein the multiple adders are arranged to add the aligned signals to provide a first sum and wherein the single multiplier may be arranged to multiply the first sum by a single coefficient to provide the weighted sum of the aligned signals. The single coefficient may equal 1/L, wherein L is a number of the aligned signals.
The multiplication and summation module may include fewer than L multipliers, wherein L is a number of the aligned signals.
The multiplication and summation module may include multipliers that are arranged to multiply the aligned signals by coefficients.
The receiver may include a coefficient calculator that may be arranged to calculate the coefficients.
The coefficient calculator is fed by the decision module output signal.
The coefficient calculator may be arranged to calculate the coefficients by applying a Wiener optimization process.
The coefficient calculator may be arranged to calculate the coefficients by applying a least mean square error optimization process.
The coefficient calculator may be arranged to calculate a current value of a certain coefficient, which is to be multiplied by a certain aligned signal, in response to a last value of the certain coefficient, a value of the certain aligned signal and a certain delayed receiver input signal that is associated with the certain aligned signal.
The receiver may include an input module, an output module and multiple carrier recovery modules coupled between the input and output modules; wherein the input module may be arranged to receive a sequence of receiver input signals and to send to each of the multiple carrier recovery modules a sub-sequence of receiver input signals; wherein the multiple carrier recovery modules are arranged to output decision module output signals; and wherein the output module may be arranged to receive the decision module output signals from the multiple carrier recovery modules and to output a sequence of decision module output signals.
Each sub-sequence of receiver input signals may include at least one thousand consecutive input receiver signals.
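As a rough illustration of the input module and output module arrangement described above, the following Python sketch deals blocks of consecutive samples to several carrier recovery modules and re-assembles the per-module outputs in the original order. The round-robin block assignment, the function names `split_into_blocks` and `merge_blocks`, and the block length of 1000 are illustrative assumptions and are not taken from the specification; the carrier recovery processing itself is replaced by a pass-through.

```python
def split_into_blocks(samples, num_modules, block_len):
    """Input module (assumed behaviour): deal consecutive blocks of
    block_len samples to num_modules lanes in round-robin order."""
    lanes = [[] for _ in range(num_modules)]
    for start in range(0, len(samples), block_len):
        lane = (start // block_len) % num_modules
        lanes[lane].append(samples[start:start + block_len])
    return lanes


def merge_blocks(lanes):
    """Output module (assumed behaviour): re-interleave the per-lane
    blocks so that the output sequence follows the input order."""
    merged = []
    depth = max(len(lane) for lane in lanes)
    for i in range(depth):
        for lane in lanes:
            if i < len(lane):
                merged.extend(lane[i])
    return merged


# Stand-in data: sample indices instead of complex receiver input signals.
samples = list(range(10_000))
lanes = split_into_blocks(samples, num_modules=4, block_len=1000)

# Each lane would run its own carrier recovery module on its blocks;
# here the processing is a pass-through so only the routing is shown.
restored = merge_blocks(lanes)
```

Keeping each sub-sequence long (here 1000 consecutive samples) means that each module starts up its reference only once per block rather than once per sample.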
There may be provided a method for carrier recovery, the method may include: receiving a receiver input signal that is representative of an optically coherent signal that was received by the receiver as a result of a transmission, by an optical transmitter, of a transmitter signal that is a carrier signal that is being modulated by information; generating, by a reference signal generator, a reference signal that estimates the carrier signal; demodulating the receiver input signal by the reference signal to provide a demodulated signal; evaluating the demodulated signal, by a decision module, to provide a decision module output signal that estimates the carrier signal; wherein the generating of the reference signal may include: delaying, by a delay and rotation module, receiver input signals to provide delayed receiver input signals; aligning the delayed receiver input signals by a rotation that is responsive to the decision module output signal thereby providing aligned signals; and calculating, by a multiplication and summation module, a weighted sum of the aligned signals to provide the reference signal.
There may be provided a non-transitory computer readable medium that stores instructions to be executed by a receiver, the instructions are for: receiving a receiver input signal that is representative of an optically coherent signal that was received by the receiver as a result of a transmission, by an optical transmitter, of a transmitter signal that is a carrier signal that is being modulated by information; generating, by a reference signal generator, a reference signal that estimates the carrier signal; demodulating the receiver input signal by the reference signal to provide a demodulated signal; evaluating the demodulated signal, by a decision module, to provide a decision module output signal that estimates the carrier signal; wherein the generating of the reference signal may include: delaying, by a delay and rotation module, receiver input signals to provide delayed receiver input signals; aligning the delayed receiver input signals by a rotation that is responsive to the decision module output signal thereby providing aligned signals; and calculating, by a multiplication and summation module, a weighted sum of the aligned signals to provide the reference signal.
The decision module may be a slicer.
The method may include normalizing, by a normalizing module of the decision module, the decision output signal to provide a normalized output signal that is used to rotate at least some of the delayed receiver input signals.
The calculating of the weighted sum may include adding the aligned signals to provide a first sum and multiplying the first sum by a single coefficient to provide the weighted sum of the aligned signals.
The single coefficient may equal 1/L, wherein L is a number of the aligned signals.
The multiplication and summation module may include fewer than L multipliers, wherein L is a number of the aligned signals.
The calculating of the weighted sum may include multiplying the aligned signals by coefficients.
The method may include calculating, by a coefficient calculator, the coefficients.
The calculating of the coefficients may include receiving the decision module output signal.
The calculating of the coefficients may include applying a Wiener optimization process.
The calculating of the coefficients may include applying a least mean square error optimization process.
The calculating of the coefficients may include calculating a current value of a certain coefficient, which is to be multiplied by a certain aligned signal, in response to a last value of the certain coefficient, a value of the certain aligned signal and a certain delayed receiver input signal that is associated with the certain aligned signal.
The method may include: receiving, by an input module, a sequence of receiver input signals; sending to each carrier recovery module of multiple carrier recovery modules a sub-sequence of receiver input signals; outputting, by the multiple carrier recovery modules, decision module output signals to an output module; and outputting, by the output module, a sequence of decision module output signals.
Each sub-sequence of receiver input signals may include at least one thousand consecutive input receiver signals.
The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
Carrier Recovery
Carrier Recovery (CR) is a critical component in modern DSP-oriented coherent receivers (Rx) for 100-400 G transmission and beyond. Multiple carrier phase estimation (CPE) methods have been heretofore considered for QPSK transmission, among them [1-10]. One of the most popular CPE techniques for QPSK coherent detection is the Viterbi&Viterbi algorithm [1], which is conceptually elegant, yet suffers from phase wrap-around effects, cycle slips and noise enhancement due to the non-linear M-th power and scaled argument extraction operations.
The Multi-Symbol-Delay Detection (MSDD) carrier phase estimation technique is derived here for optically coherent QPSK transmission, introducing the principle of operation while providing intuitive insight in terms of a multi-symbol extension of naïve delay-detection. We derive here for the first time Wiener-optimized and LMS-adapted versions of MSDD, introduce simplified hardware realizations, and evaluate complexity and numerical performance tradeoffs of this highly robust and low-complexity carrier phase recovery method. A multiplier-free carrier phase recovery version of the MSDD provides nearly optimal performance for linewidths up to ~0.5 MHz, whereas for wider linewidths, the Wiener or LMS versions provide optimal performance at about 9 taps, using 1 or 2 complex multipliers per tap.
There is provided a novel carrier recovery (CR) technique for QPSK optically coherent links, based on Multi-Symbol-Delay Detection (MSDD). The technique is called Multi-Symbol-Differential Detection (also abbreviated MSDD) in the wireless literature, and is alternatively referred to by the synonymous term Multi-Symbol-Phase Estimation (MSPE), which has also been used in photonic applications.
Historically, MSDD was introduced in the electrical communications context more than two decades ago [11-14]. More recently, the MSDD method was applied to carrier phase estimation for coherent receivers under the name Maximum Likelihood Phase Estimation, by a group from the National Univ. of Singapore [15-17]. While those works applied the MSDD technique to coherent optical detection, prior applications of MSDD were already introduced in optical communication by multiple groups since 2005, in the related context of self-coherent detection (coherent-grade incoherent detection without a local oscillator) [18-28], on which topic a review chapter recently appeared in [29]. Our interest here is MSDD for coherent rather than incoherent or self-coherent detection, but it should be mentioned that the mathematics of self-coherent and coherent MSDD are formally similar. The applicability of MSDD to coherent detection was recently previewed in our brief expositions [30], [31] which explored applications to both QPSK and QAM coherent receivers. N. Kikuchi et al also recently ported their self-coherent or incoherent detection MSDD approach (called in their language “delay detection”) to the realization of a CR sub-system for coherent detection [32].
It turns out that, beyond QPSK, our MSDD methodology is also applicable to QAM coherent detection, as well as to carrier frequency offset (CFO) estimation in addition to phase estimation. Nevertheless, for ease of exposition of the initial concept, in this work we focus exclusively on thoroughly deriving and explaining the MSDD carrier phase estimation (CPE) principle for QPSK coherent detection, relegating to a future publication the additional MSDD extensions to QAM and to CFO tracking and correction. The MSDD CPE method is theoretically derived and simulated here in the QPSK transmission context; however, we emphasize that our method is actually “QAM-ready”: the block diagrams developed here will function for QAM as well, although QAM extensions are outside the scope of this QPSK-oriented paper.
We aim to establish MSDD as a preferred alternative for accurate yet simple QPSK carrier phase estimation and correction. Unlike prior methods, our MSDD method is optimal in the Minimum-Mean-Square-Error (MMSE) sense, in the presence of channel statistics consisting of a combination of ASE-induced phase noise (PN) and laser phase noise (LPN), i.e. the MSDD CR will exhibit the best possible OSNR performance and tolerance to laser linewidth (LW). The adaptive LMS version, as derived here in detail for the first time, requires no prior knowledge of channel statistics: it learns the channel whatever the relative strengths of ASE and LPN (OSNR vs. LW) are, automatically adjusting the taps for optimal performance.
Notice that we inevitably require multiple, L, taps in order to suppress the phase noise by an effective averaging effect. The computational complexity of our optimized algorithm is about one complex multiplier (CM) per tap for the Wiener-optimal version with fixed coefficients and about 2 CMs per tap for the LMS-adaptive version. However, at the expense of a slight (or in some cases negligible) reduction in performance, if we give up optimized coefficients and instead make all tap coefficients equal to unity, we obtain an MSDD variant of ultimate simplicity: the CPE becomes multiplier-free. This version has a negligible performance penalty relative to a fully optimized MSDD in the prevalent scenario in which coherent-grade lasers with 100 kHz linewidth are used in the transmitter and for the LO, and even up to 0.5 MHz linewidth for a parallelization factor of 16. In addition to performance and complexity metrics, we should also mention that the MSDD CR method is robust, providing uninterrupted operation, as MSDD processing is essentially linear time-varying, rather than non-linear; thus cycle slips and other non-linear phase-wrapping artifacts of the competing leading M-th power (Viterbi&Viterbi) method for QPSK CR are completely eliminated.
The paper is structured as follows: Section 2 reviews generic CR concepts and discusses the naïve Delay Detector (DD), which is extended in section 3 to the more advanced MSDD concept, explaining the MSDD principle of operation. Section 4 develops a Wiener filtering solution, optimizing the MSDD coefficients for a channel affected by both ASE-induced and laser source phase-noises. In Section 5 we derive an LMS adaptive algorithm for the MSDD coefficients. Section 6 introduces efficient implementations and evaluates computational complexity of the MSDD. Two hardware structures are derived: a very low complexity multiplier-free CPE which is non-adaptive and non-optimized (but displays nearly optimal performance for low linewidths) and a more complex optimally performing Wiener or LMS-adaptive version. Section 7 develops the polyphase hardware parallelization of the MSDD. Section 8 presents numeric simulation performance results and Section 9 concludes the paper.
Appendix A reviews some differential precoding mathematical properties, Appendix B details the derivation of the Wiener optimal solution and Appendix C collects the relevant abbreviations used in this paper.
2. Carrier Recovery (CR) Concepts: Naïve Delay Detector (DD)
2.1 Differential Precoding
Differential precoding is used in Direct Detection Differential Phase Shift Keying (DPSK) systems, yet here we are interested in CR for coherent rather than direct detection. Our motivation for reviewing and expanding the DP concept is that MSDD carrier recovery may be viewed as a generalization of DPSK, retaining some of the DPSK advantages while overcoming the sensitivity disadvantage of DPSK. A coherent QPSK transmitter (Tx) intended to operate with an MSDD based receiver (Rx) should include a Differential Precoder (DP), described by the recursion:
|{tilde under (A)}k|=|{tilde under (A)}k-1|=A; ∠{tilde under (A)}k=∠{tilde under (s)}k+∠{tilde under (A)}k-1 ⇔ ∠{tilde under (s)}k=∠{tilde under (A)}k−∠{tilde under (A)}k-1 (1)
The line symbols {tilde under (A)}k generated at the DP output are pulse-shaped and optically transmitted.
The DP recursion (1) amounts to an additive accumulator in the phase domain: The QPSK information phase ∠{tilde under (s)}k sets the difference between two successive phases of the line symbols, i.e. information is encoded in the phase differences transmitted on the line.
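The phase accumulation of recursion (1) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `differential_precode`, the choice of QPSK phases at π/4 + mπ/2, and the initial line phase of zero are assumptions made for the example and are not specified in the text.

```python
import cmath
import math


def differential_precode(info_symbols, amplitude=1.0, initial_phase=0.0):
    """Accumulate information phases: angle(A_k) = angle(s_k) + angle(A_{k-1}),
    with constant modulus |A_k| = amplitude, as in recursion (1)."""
    line_symbols = []
    phase = initial_phase
    for s in info_symbols:
        phase = (phase + cmath.phase(s)) % (2 * math.pi)
        line_symbols.append(cmath.rect(amplitude, phase))
    return line_symbols


# Assumed QPSK constellation with phases pi/4 + m*pi/2.
QPSK = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]
info = [QPSK[m] for m in (0, 3, 1, 2, 2)]
line = differential_precode(info)

# The information is carried by phase differences of successive line symbols:
# angle(s_k) = angle(A_k) - angle(A_{k-1}).
recovered = []
prev = 1 + 0j  # corresponds to the assumed initial phase of zero
for a in line:
    recovered.append(a * prev.conjugate())
    prev = a
```

Multiplying each line symbol by the conjugate of its predecessor returns the information symbol, which is the second form of relation (1).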
A more mathematically abstract formulation of the DP, amenable to generalizing the current QPSK MSDD to a higher-order QAM constellation, is obtained in terms of the following unimodular normalization operation, referred to as “Uop”,
{tilde under ({hacek over (z)}≡U{{tilde under (z)}}≡{tilde under (z)}/|{tilde under (z)}| (2)
which normalizes a given phasor (complex number {tilde under (z)}) into a unimodular output phasor (unimodular means unity modulus, |{tilde under ({hacek over (z)}|≡1), retaining the same angle (argument): ∠{tilde under ({hacek over (z)}=∠{tilde under (z)}.
A modulus-preserving differential precoder (MP-DP) applicable to both QPSK and QAM was proposed by N. Kikuchi [26]. In the polar domain this MP-DP is described as accumulating the phase,
∠{tilde under (A)}k=∠{tilde under (s)}k+∠{tilde under (A)}k-1,
while preserving the modulus, |{tilde under (A)}k|=|{tilde under (s)}k|. In the Uop based complex notation, the MP-DP is compactly represented as:
{tilde under (A)}k={tilde under (s)}k{tilde under ({hacek over (A)}k-1 ⇔ |{tilde under (A)}k|=|{tilde under (s)}k| and ∠{tilde under (A)}k=∠{tilde under (s)}k+∠{tilde under (A)}k-1. (3)
Here we prefer to express Kikuchi's MP-DP description, originally expressed in the polar (magnitude and phase) domain, in the more abstract equivalent form {tilde under (A)}k={tilde under (s)}k{tilde under ({hacek over (A)}k-1, formulated in terms of the Uop. The mathematics of the Uop and MP-DP modules are developed in Appendix A.
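The Uop and the modulus-preserving recursion (3) can be illustrated with the following Python sketch. The helper names `uop` and `mp_dp`, the initial line symbol of 1, and the 16-QAM-like sample points are assumptions chosen for the example.

```python
def uop(z):
    """Unimodular normalization: keep the angle of z, force its modulus to 1.
    The input is assumed to be non-zero."""
    return z / abs(z)


def mp_dp(info_symbols, initial=1 + 0j):
    """Modulus-preserving differential precoder: A_k = s_k * U{A_{k-1}}."""
    line_symbols = []
    prev = initial
    for s in info_symbols:
        a = s * uop(prev)
        line_symbols.append(a)
        prev = a
    return line_symbols


# Illustrative points with unequal moduli (16-QAM-like).
symbols = [3 + 1j, -1 + 1j, 1 - 3j, -3 - 3j]
line = mp_dp(symbols)

# Undo the precoding: s_k = A_k * conj(U{A_{k-1}}).
recovered = []
prev = 1 + 0j
for a in line:
    recovered.append(a * uop(prev).conjugate())
    prev = a
```

Each line symbol keeps the modulus of its information symbol while the phases accumulate, so the same structure serves QPSK (constant modulus) and QAM.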
In this paper our exclusive focus is on QPSK coherent transmission. It is readily verified that the MP-DP transformation, (3), generally applicable to QAM, reduces to (1) in the special case of a QPSK constellation, wherein |{tilde under (s)}k|=A. Henceforth, for brevity, we use the term DP in the sense of MP-DP.
2.2 Link Model Including the CR
The QPSK Tx (
where φkLPN are the samples of the laser phase noise (LPN), e^{jθk} describes the spinning of the constellation due to laser frequency offset (FO), and {tilde under (n)}k is additive circularly symmetric white Gaussian noise due to Amplified Spontaneous Emission (ASE) (possibly also including other smaller white noise contributions such as ADC quantization and thermal noise). In this paper we do not further consider the FO impairment, but rather focus exclusively on phase noise mitigation, thus (4) reduces to the simple memoryless channel model
with the LPN φkLPN given by a Wiener-Lévy random process, accumulating independent Gaussian phase noise increments:
with T the sampling interval and the normalized combined lasers linewidth. The normalized angular ASE noise introduced in (5) is also circular Gaussian, with scaled-down variance σ{tilde under (η)}≡σ{tilde under (n)}/|A|2. The total phase noise effect is compactly encapsulated in the PN multiplicative noise sequence {tilde under (p)}k comprising LPN and ASE additive contributions:
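For readers who wish to experiment with the phase-noise channel described in this subsection, a minimal Python sketch follows. Because the channel equations are not reproduced here, the sketch assumes the common textbook form r_k = A_k·exp(jφ_k) + n_k, with φ_k a random walk whose Gaussian increments have variance 2π·(linewidth·T). The function name `simulate_channel` and its parameters are hypothetical, and `ase_sigma` is taken as the standard deviation per quadrature.

```python
import cmath
import math
import random


def simulate_channel(line_symbols, linewidth_times_T, ase_sigma, rng):
    """Apply Wiener laser phase noise and additive circular Gaussian noise.

    Assumed model (not taken verbatim from the text):
        phi_k = phi_{k-1} + N(0, 2*pi*linewidth_times_T)
        r_k   = A_k * exp(j*phi_k) + n_k
    """
    increment_sigma = math.sqrt(2 * math.pi * linewidth_times_T)
    phase = 0.0
    received = []
    for a in line_symbols:
        phase += rng.gauss(0.0, increment_sigma)  # random-walk laser phase
        noise = complex(rng.gauss(0.0, ase_sigma), rng.gauss(0.0, ase_sigma))
        received.append(a * cmath.exp(1j * phase) + noise)
    return received


rng = random.Random(1)
line = [cmath.exp(1j * math.pi / 4)] * 1000

# With both impairments switched off the channel is transparent.
clean = simulate_channel(line, linewidth_times_T=0.0, ase_sigma=0.0, rng=rng)
# Illustrative impairment levels, chosen arbitrarily.
noisy = simulate_channel(line, linewidth_times_T=1e-4, ase_sigma=0.1, rng=rng)
```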
2.3 Naïve Delay Detector
A simple CPE strategy is to use delay-detection (DD). The simple CR
is variously referred to as delay detector, differential detector or delay demodulator (all abbreviated as DD). The received signal {tilde under (r)}k=|{tilde under (r)}k|ej(∠{tilde under (A)}
yielding
where the differential precoding relation (1) was used in the last equality. The rounded hat on the CR output indicates that this is an “analog” estimate of the transmission symbol {tilde under (s)}k, attempting to approximate at least the phase ∠{tilde under (s)}k=∠{tilde under (A)}k−∠{tilde under (A)}k-1 as faithfully as possible. This noisy estimate is then sliced (its phase is quantized) in order to extract the decision {tilde under (ŝ)}k (a pointed hat denotes a decision, whereas a rounded hat denotes the CR output, i.e. the noisy estimate of {tilde under (s)}k that is input into the slicer). As is well known, the naïve DD is too noisy (it approximately doubles the input ASE noise power) and thus fails to provide a useful CR for coherent detection. Nevertheless, the delay-detection concept is the starting point leading to the high-performance MSDD CR realization, which is interpreted as a generalization of the naïve DD.
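The naïve delay detector can be sketched as follows: each received sample is multiplied by the conjugate of the previous one and the result is sliced. The function names, the QPSK phases at π/4 + mπ/2, and the initial reference of 1 are assumptions made for this illustration; the demo uses a noise-free channel with an unknown constant carrier phase.

```python
import cmath
import math


def slice_qpsk(z):
    """Quantize the phase of z to the nearest QPSK point
    (constellation assumed at phases pi/4 + m*pi/2)."""
    return complex(math.copysign(1.0, z.real),
                   math.copysign(1.0, z.imag)) / math.sqrt(2)


def naive_delay_detect(received, initial_ref=1 + 0j):
    """Demodulate each sample with its predecessor:
    estimate_k = r_k * conj(r_{k-1}), followed by slicing."""
    estimates, decisions = [], []
    prev = initial_ref
    for r in received:
        est = r * prev.conjugate()
        estimates.append(est)
        decisions.append(slice_qpsk(est))
        prev = r
    return estimates, decisions


# Differentially precoded QPSK through a channel with a 1 rad phase offset.
qpsk = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]
info = [qpsk[m] for m in (0, 2, 1, 3, 3, 0, 1)]
line, prev = [], 1 + 0j
for s in info:
    prev = s * prev          # A_k = s_k * A_{k-1} for unit-modulus symbols
    line.append(prev)
received = [a * cmath.exp(1j * 1.0) for a in line]

estimates, decisions = naive_delay_detect(received)
```

The common carrier phase cancels in the product of a sample with the conjugate of its predecessor, so every symbol after the first (which has no valid reference) is recovered in this noise-free case.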
The output signal of delay unit 84 is also sent to adder 94. The receiver input signal and the complex conjugate of the output signal of adder 94 are fed to multiplier 96 that is followed by slicer 98.
Any of these rotated-into-alignment past samples may be selected to serve as a “partial” reference. The improved reference will be obtained by taking a linear combination of all of the partial references.
The receiver input signal and the complex conjugate of the output signal of adder 118 are fed to multiplier 119 that is followed by slicer 120.
3. From the Naïve DD to MSDD Carrier Recovery
3.1 MSDD Principle: Generation of an Improved Reference from Prior Received Samples
In a naïve DD, the last sample, {tilde under (r)}k-1, is just too noisy a phase reference. Let us then also process the earlier samples, i.e. generate our CPE by acting on a moving window of L past samples, {tilde under (r)}k-1, {tilde under (r)}k-2, {tilde under (r)}k-3, . . . , {tilde under (r)}k-L, in order to form an improved reference, {tilde under (R)}k-1, and demodulate the received samples with it, forming an improved decision variable to be presented to the slicer. The past samples, rotated into alignment by the intervening transmission symbols,
{tilde under (R)}k-1(i)≡{tilde under (s)}k-1{tilde under (s)}k-2 . . . {tilde under (s)}k-i+1{tilde under (r)}k-i, i=1,2, . . . ,L (9)
may be used as alternative phase references (for i=1 we simply retrieve the original reference {tilde under (R)}k-1(1)≡{tilde under (r)}k-1). Each of these references is nearly aligned with {tilde under (r)}k-1 and may be used instead of the originally considered {tilde under (R)}k-1(1)≡{tilde under (r)}k-1 reference in order to delay-demodulate the received symbol {tilde under (r)}k. Demodulation with {tilde under (R)}k-1(i) yields the “partial estimates”
each of which may be sliced in order to extract the k-th decision, {tilde under (ŝ)}k.
At this point let us clarify the usage of decision feedback (DF). The partial reference (9) presumes that the transmission symbols are known at the Rx, which is evidently not the case (as then we would just use them for decisions, setting {tilde under (ŝ)}k={tilde under (s)}k). In the absence of a “genie” whispering to us what the transmitted symbols are, the next best approximation is to use the slicer decisions {tilde under (ŝ)}k as estimates of the true {tilde under (s)}k. Thus, in an actual implementation, we replace the partial reference (9), by a decision-feedback derived one (just placing hats over the s-es),
{tilde under (R)}k-1(i)≡{tilde under (ŝ)}k-1{tilde under (ŝ)}k-2 . . . {tilde under (ŝ)}k-i+1{tilde under (r)}k-i, i=1,2, . . . ,L (10)
As long as there is no decision error, this is perfect; however, decision errors, i.e. events {tilde under (ŝ)}k≠{tilde under (s)}k, will generate erroneous rotation of the past symbols, such that some of the partial references will be incorrectly aligned. A theoretical analysis of the DF error propagation is outside the scope of the paper, but our numerical simulations invariably indicate that the impact of the error propagation is small. Our simulations in section 8 fully account for the error propagation effect.
Which of the alternative phase references {tilde under (R)}k-1(i), i=2, 3, . . . should be used for demodulation? It turns out that no particular one is preferred; however, the question arises whether we can take advantage of them all, combining these partial references into an improved reference generating a higher quality decision. In the case that ASE-induced PN is a significant component of the overall PN (which usually holds when coherent-grade lasers are used), as white ASE noise is dominant, the partial references are essentially mutually independent. In this case it is advantageous to form a linear combination of these partial references (in the simplest case take their sum), generating an improved reference, as follows:
{tilde under (R)}k-1≡Σ(i=1 to L) ci{tilde under (R)}k-1(i) (11)
This improved reference is used to demodulate the received samples, generating an improved decision variable
which is then input into the slicer. The resulting CR system, as illustrated in
The improved reference {tilde under (R)}k-1 (11) is seen to be formed as a linear combination of L partial references, namely prior samples, phase rotated into alignment. A phasor diagram presenting the rotation (alignment) process of the various past samples is shown in
It appears advantageous to accrue the noise averaging effect over arbitrarily long windows (though in practice, we would get diminishing returns beyond a certain window size, and the computational complexity must also be taken into account). However, when LPN is present, an opposite effect is at work, namely the longer the record of past samples used in forming the improved reference, the worse the LPN induced degradation. Thus, a “block length” effect emerges—it does not pay to increase the block length L indefinitely, but there is an optimal block length, L, as determined by the balance of the ASE and laser phase noises. In this simplified analysis we assumed equal coefficients, ci (taken as unity without loss of generality), but more generally the linear combination coefficients may be arbitrarily selected, in the combined presence of ASE and LPN phase noise sources. In section 4 we apply Wiener filtering theory in order to determine unequal optimal coefficients ĉi which yield the best performance for any given block length L, striking the best balance between the opposing effects of ASE and LPN.
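The decision-feedback construction of the improved reference can be sketched in Python as below. The sketch keeps a list of the L most recent samples already rotated into alignment: after each decision, the new sample enters the list and every older entry is rotated by that decision, which reproduces the products in (10). Equal unity coefficients are the default. The function names, the QPSK constellation at π/4 + mπ/2, the start-up reference of 1, and the noise level are assumptions chosen for illustration only.

```python
import cmath
import math
import random


def slice_qpsk(z):
    """Quantize the phase of z to the nearest QPSK point (assumed constellation)."""
    return complex(math.copysign(1.0, z.real),
                   math.copysign(1.0, z.imag)) / math.sqrt(2)


def msdd_detect(received, num_taps, coeffs=None):
    """MSDD with decision feedback; num_taps=1 reduces to the naive DD."""
    if coeffs is None:
        coeffs = [1.0] * num_taps          # equal taps: multiplier-free variant
    aligned = []                            # partial references R^(1), R^(2), ...
    decisions = []
    for r in received:
        if aligned:
            reference = sum(c * a for c, a in zip(coeffs, aligned))
        else:
            reference = 1 + 0j              # arbitrary start-up reference
        decision = slice_qpsk(r * reference.conjugate())
        decisions.append(decision)
        # Newest sample becomes R^(1); older ones are rotated by the decision.
        aligned = [r] + [decision * a for a in aligned[:num_taps - 1]]
    return decisions


def count_errors(decisions, info):
    """Count symbol errors, skipping the first symbol (no valid reference)."""
    return sum(1 for d, s in zip(decisions[1:], info[1:]) if abs(d - s) > 1e-6)


rng = random.Random(7)
qpsk = [cmath.exp(1j * (math.pi / 4 + m * math.pi / 2)) for m in range(4)]
info = [rng.choice(qpsk) for _ in range(2000)]
line, prev = [], 1 + 0j
for s in info:
    prev = s * prev                         # differential precoding
    line.append(prev)
offset = cmath.exp(1j * 1.0)                # unknown carrier phase
clean = [a * offset for a in line]
noisy = [a * offset + complex(rng.gauss(0, 0.25), rng.gauss(0, 0.25)) for a in line]

errors_clean = count_errors(msdd_detect(clean, num_taps=8), info)
errors_dd = count_errors(msdd_detect(noisy, num_taps=1), info)
errors_msdd = count_errors(msdd_detect(noisy, num_taps=8), info)
```

Comparing `errors_dd` with `errors_msdd` for different `num_taps` values gives a quick feel for the averaging effect discussed above; adding laser phase noise to the demo would expose the opposing block-length effect.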
3.2 MSDD Alternative Formulation in Terms of Partial DD Estimators
To derive an alternative point of view of the MSDD demodulation process, let us substitute the improved reference (11) into the demodulation relation,
yielding
where we used (9) for the i-th partial reference and introduced the i-th partial estimator
It is apparent in (12) that the MSDD estimate of the transmitted symbol may be expressed as a linear combination of partial estimators , each obtained by demodulation with a partial reference, each of which could by itself provide a valid, albeit noisier, estimate for the information symbol, as described in the block diagram of
4. Optimal Wiener-Filtering Based Minimum Mean Square Error (MMSE) Solution
In this section we derive the MMSE optimal solution, which aims at minimizing the Mean Square Error (MSE) between the QPSK or QAM symbols, {tilde under (s)}k, and their estimates, {hacek over ({tilde under (s)}k, as generated at the MSDD output (slicer input). Introducing the estimation error, {tilde under (ε)}k≡{tilde under (s)}k−{hacek over ({tilde under (s)}k, we seek the optimal MSDD coefficients minimizing the MSE, |{tilde under (ε)}k|^2=|{tilde under (s)}k−{hacek over ({tilde under (s)}k|^2.
Note that, for the purpose of QPSK detection, we have heretofore ignored the magnitude (modulus) of the improved estimate {hacek over ({tilde under (s)}k={tilde under (r)}k{tilde under (R)}k-1*, which is generated by mixing the received symbol {tilde under (r)}k with the improved reference {tilde under (R)}k-1 of (11),
in the generation of which the magnitudes were not normalized. As the QPSK slicer essentially acts on the angle of {hacek over ({tilde under (s)}k, ignoring the magnitude |{hacek over ({tilde under (s)}k| does not pose a problem. A different length of the reference phasor will just scale the modulus of the estimate {hacek over ({tilde under (s)}k without affecting its phase. However, once QPSK transmission is extended to QAM, the reference magnitudes do become important. Even in the current QPSK context, proper processing of the reference magnitudes becomes essential in the MMSE formulation and derivation. Indeed, although the phase of our slicer input {hacek over ({tilde under (s)}k generated by the MSDD tends to be close to that of the actual transmission symbol, {tilde under (s)}k, if the magnitudes of {hacek over ({tilde under (s)}k and {tilde under (s)}k are disparate, then a large MSE deviation may still be generated, defeating the minimization process. Thus, in order to properly optimize the MSDD coefficients, it is imperative to properly scale magnitudes, such that the estimate {hacek over ({tilde under (s)}k is made to approach {tilde under (s)}k not only in phase but also in modulus, so that a small residual estimation error is generated. Here we use the Uop normalization (2) as a key step enabling us to devise a modified MSDD structure for QPSK (also applicable to QAM), suitable for attaining the MMSE condition. To this end, we propose to apply the Uop to the partial references, {tilde under (R)}k-1(i), now to be replaced by Uop-normalized versions {tilde under ({hacek over (R)}k-1(i) (which preserve the original angles of {tilde under (R)}k-1(i), i.e., are still nearly aligned with {tilde under (r)}k-1, hence are also suitable to form an improved reference):
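{tilde under ({hacek over (R)}k-1(i)=U{{tilde under (r)}k-i{tilde under (s)}k-i+1{tilde under (s)}k-i+2 . . . {tilde under (s)}k-1}={tilde under ({hacek over (r)}k-i{tilde under ({hacek over (s)}k-i+1{tilde under ({hacek over (s)}k-i+2 . . . {tilde under ({hacek over (s)}k-1 (14)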
The resulting MSDD improved reference is then formed by the linear combination
Here, the inverted under-hat at the bottom of the improved reference symbol, though resembling the inverted over-hat used to denote the Uop, does not actually indicate that the improved reference is a normalized unimodular quantity; rather, it signifies that it is formed as a linear combination of quantities which are themselves unimodular. Notice that a linear combination of Uop-normalized quantities is generally not unimodular itself (unimodularity is not preserved under a linear combination). In fact, whether or not the improved reference is unimodular depends on the selection of the c-coefficients, which, for the MMSE solution pursued below, will assume optimal values that make it nearly unimodular.
This modified version of the MSDD will be referred to as “U-notU”, as the partial references are Uop normalized, whereas the improved reference is not necessarily Uop normalized.
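A minimal Python sketch of the “U-notU” reference follows: each past sample and each past symbol is Uop-normalized before the rotation into alignment, and the normalized partial references are then combined linearly without a final normalization. The function name `u_not_u_reference`, the argument layout, the equal coefficients 1/L, and the 16-QAM-like test points are assumptions for illustration; in a receiver the past symbols would be replaced by slicer decisions.

```python
import cmath


def uop(z):
    """Unimodular normalization of a non-zero phasor."""
    return z / abs(z)


def u_not_u_reference(past_samples, past_symbols, coeffs):
    """Linear combination of Uop-normalized partial references.

    past_samples[i-1] holds r_{k-i}; past_symbols[i-1] holds the symbol
    (or its decision) s_{k-i}. The i-th partial reference is
    U{r_{k-i}} * U{s_{k-1}} * ... * U{s_{k-i+1}}.
    """
    reference = 0j
    rotation = 1 + 0j
    for c, r_past, s_past in zip(coeffs, past_samples, past_symbols):
        reference += c * uop(r_past) * rotation
        rotation *= uop(s_past)     # extend the rotation for the next tap
    return reference


# Noise-free check with modulus-preserving precoding and a constant phase.
symbols = [3 + 1j, -1 + 1j, 1 - 3j, -3 - 3j, 1 + 3j]
line, prev = [], 1 + 0j
for s in symbols:
    prev = s * uop(prev)            # A_k = s_k * U{A_{k-1}}
    line.append(prev)
received = [a * cmath.exp(0.4j) for a in line]

k, L = 4, 3
past_samples = [received[k - i] for i in range(1, L + 1)]
past_symbols = [symbols[k - i] for i in range(1, L + 1)]
reference = u_not_u_reference(past_samples, past_symbols, [1.0 / L] * L)
estimate = received[k] * reference.conjugate()
```

In this noise-free case all normalized partial references coincide, so coefficients summing to one give a unimodular reference and the estimate reproduces the transmitted symbol in both phase and modulus.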
Using the U-notU magnitude normalizations proposed here, the modified “U-notU” MSDD is analyzed in Appendix B in terms of the phase-noisy memoryless channel model.
The overall improved estimate (12) is a linear combination of the partial estimates. It is then useful to explicitly express dependency of the partial estimates on the phase noise (7):
where in the last expression we used the generalized delay-detection relation (34) derived in Appendix A, namely
{tilde under (s)}k={tilde under (A)}k{tilde under ({hacek over (A)}k-i*{tilde under ({hacek over (s)}k-i+1*{tilde under ({hacek over (s)}k-i+2* . . . {tilde under ({hacek over (s)}k-1*.
The resulting Eq. (16) indicates that the partial estimates indeed qualify as partial estimators for {tilde under (s)}k, as they essentially coincide with the transmitted symbols {tilde under (s)}k, apart from multiplicative phase noise perturbations {tilde under (p)}k{tilde under ({hacek over (P)}k-i.
Considering the U-notU modified MSDD structure, as introduced above, we now address the problem of optimizing the c-coefficients so as to minimize the Mean Square Error (MSE) between the transmitted symbol, {tilde under (s)}k, and its MSDD estimate, {hacek over ({tilde under (s)}k. First, we compactly express the MSDD estimate in terms of inner products between a coefficients vector and vectors of partial estimates and partial references (here † denotes the conjugate transpose, while the overbar is an alternative notation for the complex conjugate):
The estimation error is then expressed as, and the MSE is written as:
|{tilde under (ε)}k|2={tilde under (s)}k−k|2=|{tilde under (s)}k−c†k|2
We seek the optimal coefficients vector minimizing the MSE. Even prior to deriving the rigorous MMSE solution, based on an approximate magnitude preservation condition:
we can infer an approximate constraint on the coefficients:
The Wiener formulation is standard, leading to the Wiener-Hopf (W-H) equations for the optimal coefficients c:
where ΓAB=AB† generically denotes the correlation matrix of two column vectors (in particular A or B might be scalar), the autocorrelation matrix is denoted, and the components of the correlation matrices are defined as
Most of Appendix B is devoted to evaluating the joint second order statistics (20) for our optical channel (5), resulting in expressions (48) which are substituted into (19), reducing to the following operational form of the W-H linear system of equations in the L unknowns cj:
where we defined the time-dependent transmission SNR, with the expectation taken over all constellation points. Thus, the inverse expected SNR parameter featuring in (21) is given by:
This derivation is generally applicable to QAM, though our interest in this paper is in QPSK. Our final form (21) of the W-H equations for the U-notU optimal coefficients cj may be solved numerically offline, provided that the statistical/physical parameters (signal power, ASE noise variance and laser linewidth) have been estimated. A more practical approach pursued next is to derive an LMS adaptation scheme for the coefficients, such that the coefficients are iteratively adjusted, approximately converging to the optimal MMSE values mandated by the W-H equation, automatically learning the phase-noise channel statistics.
5. LMS Algorithm for the MSDD Coefficients
In practice, the channel phase-noise statistics (the balance of laser phase noise, ASE, and also nonlinear phase noise contributions) are unknown and may even be time-varying. Therefore, it is advantageous to devise an adaptive method to approach the optimal MSDD coefficients automatically. Here we derive an LMS algorithm for the “U-notU” MSDD coefficients.
Conjugate-transposing the orthogonality relation (37) yields
{tilde under (ε)}k†=0{tilde under (ε)}k*=({tilde under (s)}k*−cT*)=0 (23)
Let us introduce the updates vector and substitute ={tilde under (r)}kk-1*:
where the last expression in (24) was obtained substituting (14). In light of (23), the updates vector has zero expectation, U[k]={tilde under (ε)}k*=0, whenever the coefficients are MMSE optimal. The elements of the update vector provide the coefficient updates for the LMS algorithm associated with the MMSE problem. When its expectation is not zero, the update vector tells us in which direction to adjust the coefficients in order to advance to zero expectation, i.e. to optimal coefficients. To verify that the proper coefficients update vector for the LMS algorithm is indeed given by (24), we evaluate the squared error (SE) gradient,
$$|\underset{\sim}{\varepsilon}_k|^2 = \underset{\sim}{\varepsilon}_k\,\underset{\sim}{\varepsilon}_k^{*}$$
(i.e., without taking the expectation). Using the Wirtinger complex-conjugate derivative technique as described in [33], the SE gradient ∇c≡[∂c1, ∂c2, . . . , ∂cL]T with respect to the coefficient vector is derived as follows:
Here ∇
Substituting the coefficient update (24) yields our final result, the LMS coefficients recursion:
with
while the error is expressed in terms of as per (15):
{tilde under (ε)}k={tilde under (s)}k−={tilde under (s)}k−{tilde under (r)}kk-1* (29)
This leads to an adaptive U-notU version of the MSDD LMS CR, as implemented in
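The adaptive scheme described in this section can be illustrated with a minimal simulation sketch. It is not the implementation of the figures; it assumes QPSK, the differential precoding A_k = s_k·Uop(A_{k-1}), a Wiener phase-noise plus white-noise channel, a known preamble of L symbols, and a decision-directed error (the decision standing in for the transmitted symbol). Since the recursion (28) is not reproduced above, the update used here, c_i ← c_i + μ·ε_k*·r_k·conj(ref_i), is the steepest-descent step obtained from the squared-error gradient for an estimate of the form r_k·conj(Σ c_i·ref_i). All names (`run_msdd_lms`, `uop`, `slicer`, `mu`, `pn_std`) are hypothetical.

```python
import cmath
import math
import random

# QPSK constellation on the unit circle (assumed modulation format).
QPSK = [cmath.exp(1j * (math.pi / 4 + n * math.pi / 2)) for n in range(4)]


def uop(z):
    """Unimodular normalization z/|z| (returns 1 for z == 0)."""
    m = abs(z)
    return z / m if m > 0 else 1 + 0j


def slicer(z):
    """Hard decision: nearest QPSK constellation point."""
    return min(QPSK, key=lambda q: abs(z - q))


def run_msdd_lms(n_sym=3000, L=4, snr_db=15.0, pn_std=0.01, mu=0.01, seed=1):
    """Simulate a U-notU style MSDD with LMS-adapted coefficients.

    Returns (symbol error rate, final coefficient list).
    """
    rng = random.Random(seed)
    s = [rng.choice(QPSK) for _ in range(n_sym)]

    # Differential precoding: A_k = s_k * Uop(A_{k-1}).
    A = [s[0]]
    for k in range(1, n_sym):
        A.append(s[k] * uop(A[k - 1]))

    # Channel: Wiener phase noise plus complex white Gaussian noise.
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)
    phi, r = 0.0, []
    for k in range(n_sym):
        phi += rng.gauss(0.0, pn_std)
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        r.append(A[k] * cmath.exp(1j * phi) + noise)

    c = [complex(1.0 / L)] * L   # start from uniform coefficients
    dec = list(s[:L])            # preamble: first L symbols assumed known
    errors = 0
    for k in range(L, n_sym):
        # Partial references: Uop(r_{k-i}) rotated by the decisions
        # s_{k-i+1} ... s_{k-1}, so that all align with index k-1.
        refs, rot = [], 1 + 0j
        for i in range(1, L + 1):
            refs.append(uop(r[k - i]) * rot)
            rot *= dec[k - i]
        reference = sum(ci * ref for ci, ref in zip(c, refs))
        estimate = r[k] * reference.conjugate()   # demodulation
        decision = slicer(estimate)
        dec.append(decision)
        errors += decision != s[k]
        # Decision-directed LMS update of the coefficients.
        eps = decision - estimate
        common = mu * eps.conjugate() * r[k]
        c = [ci + common * ref.conjugate() for ci, ref in zip(c, refs)]
    return errors / (n_sym - L), c
```

Calling `run_msdd_lms()` returns the measured symbol error rate and the adapted coefficients. The step size `mu` trades convergence speed against steady-state coefficient noise, and the value used here is only an illustrative choice.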
6. Efficient Hardware Implementations
In this section we derive an efficient hardware implementation for the MSDD sub-system, as illustrated in
seems to require i−1 multiplications by s-symbols to be applied to {tilde under (r)}k-i, per clock cycle. The diagram of
In addition, we modify the block diagram of
The block diagram further features a coefficients control module tasked with generating the optimal coefficients, ci, whether by an offline MMSE calculation (solution of the W-H equation as derived in the last section), or alternatively (preferably) by means of the adaptive LMS algorithm (28). In addition, in order to implement the U-notU MSDD modification, a Uop acting on the received samples, {tilde under (r)}k, is inserted ahead of the partial references delay line at the top of the figure.
6.1 MSDD Hardware Realization Complexity (Excluding the Adaptive Coefficients Control)
Inspecting
The delay and rotation module 190 includes a sequence of normalizing module 262, delay unit 208, delay unit 210, multiplier 212, delay unit 214, multiplier 216, delay unit 218, delay unit 220 and multiplier 222.
The multiplication and summation module 195 is illustrated as including multipliers 230, 232, 234 and 236 and adders 240, 242, 244 and 246. The output signal of adder 246 and a complex conjugate of the input receiver signal are fed to multiplier (Demodulator) 252 that has its output signal sent to decision module 260. The normalized decision module output signal is fed to multipliers 222, 216 and 212.
A simplified system is obtained for ci=1/L, replacing the L coefficients multiplications by a single scaling multiplication performed prior to the demodulation, as indicated in
The carrier recovery module of
6.2 MSDD with Adaptive Coefficients Control and its Total Complexity
At the high-end extreme, consider a high performance system with its coefficients LMS-optimized, as described in
In
Accounting now for the contribution of the coefficients adaptation to complexity, we must consider additional CMs: Another full-fledged CM, {tilde under (ε)}k*{tilde under (r)}k, an easy CM generating fixed scaling by μ (which may be quantized to a convenient value, with few one-bits, which is simple to multiply by, so it will not be counted), then another full-fledged CM {tilde under (r)}kk-1 required for generating the error, (29), plus L full-fledged CMs generating the coefficient updates,
Thus L+2 extra full-fledged CMs are required for the adaptive part, which, when added to the L+2⅓ non-adaptive multiplications, yields a total of 2L+4⅓ full-fledged complex multiplications for the high-end adaptive CR realization of
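The tally of this subsection can be checked with a few lines of exact arithmetic. The sketch below only adds the two counts stated in the text (L+2⅓ non-adaptive and L+2 adaptive complex multiplications); the function name `msdd_cm_count` is hypothetical.

```python
from fractions import Fraction


def msdd_cm_count(L):
    """Return (non_adaptive, adaptive, total) complex-multiplier counts.

    Uses the figures quoted in the text: L + 2 1/3 CMs for the
    non-adaptive MSDD and L + 2 CMs for the LMS coefficient adaptation.
    """
    non_adaptive = L + 2 + Fraction(1, 3)
    adaptive = Fraction(L + 2)
    return non_adaptive, adaptive, non_adaptive + adaptive


for window in (4, 8, 16):
    non_adaptive, adaptive, total = msdd_cm_count(window)
    print(window, non_adaptive, adaptive, total)
```

For any window size the total equals 2L+4⅓, so the adaptive control roughly doubles the multiplier count of the non-adaptive structure.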
7. Polyphase Parallelization
Due to its usage of decision-feedback, the MSDD algorithm poses an implementation challenge for coherent optical receivers operating at tens of GBd rates, given that the fastest multipliers currently available with state-of-the-art ASIC technology operate at the rate of 2 to 3 GHz. As shown in [9], decision feedback (DF) based algorithms are not directly amenable to parallelization. Indeed, DF creates a dependency between modules, precluding independent parallel operation of identical processing sub-modules. Thus, a polyphase decomposition, i.e., time-parallelization of the processing using identical processing units operating on the polyphase components, would not be equivalent to (in fact would have reduced performance relative to) our nominal MSDD, hypothetically operating at the full high rate. Nevertheless, realization-wise we adopt such a parallelization strategy as shown in
In order to enable MSDD polyphase operation at the Rx, the Tx is modified to also support a polyphase version of differential precoding, comprising P parallel MP-DP modules, each operating at a rate reduced by a factor of P, as shown in
The coherent receiver backend 310 is followed by coherent receiver front end, polyphase demultiplexer 314, S/P module 316, multiple parallel MSDD polyphase sub-modules 318 and P/S 320.
In general the processing is partitioned into P parallel sub-modules, each acting on a received polyphase component. Notice that the clock rates of the DP modules in the Tx and MSDD modules in the Rx are reduced by a factor of P.
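The polyphase partitioning can be sketched in a noise-free setting as follows. This is an illustrative sketch only: it assumes QPSK symbols, the precoding A_k = s_k·Uop(A_{k-1}) applied independently within each polyphase component, an initial reference of 1 in every component, and plain delay detection standing in for the full MSDD sub-modules. All function names are hypothetical.

```python
import cmath
import math
import random

QPSK = [cmath.exp(1j * (math.pi / 4 + n * math.pi / 2)) for n in range(4)]


def uop(z):
    """Unimodular normalization z/|z| of a nonzero complex number."""
    return z / abs(z)


def serial_to_parallel(x, P):
    """S/P: split a sequence into its P polyphase components."""
    return [x[p::P] for p in range(P)]


def parallel_to_serial(phases):
    """P/S: interleave the polyphase components back into one sequence."""
    P = len(phases)
    out = [None] * sum(len(ph) for ph in phases)
    for p, ph in enumerate(phases):
        out[p::P] = ph
    return out


def dp_encode(symbols):
    """Differential precoding of one polyphase: A_k = s_k * Uop(A_{k-1})."""
    line, prev = [], 1 + 0j
    for s in symbols:
        prev = s * uop(prev)
        line.append(prev)
    return line


def delay_detect(line):
    """Delay detection of one polyphase: A_k * conj(Uop(A_{k-1}))."""
    out, prev = [], 1 + 0j
    for a in line:
        out.append(a * uop(prev).conjugate())
        prev = a
    return out


def polyphase_dp_encode(symbols, P):
    """Tx side: P parallel precoders, each running at 1/P of the rate."""
    return parallel_to_serial([dp_encode(ph) for ph in serial_to_parallel(symbols, P)])


def polyphase_delay_detect(line, P):
    """Rx side: P parallel detectors acting on the received polyphases."""
    return parallel_to_serial([delay_detect(ph) for ph in serial_to_parallel(line, P)])
```

Each sub-module only ever sees samples that are P symbol intervals apart, which is what allows its clock rate to be reduced by the factor P.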
7.1 The Distant Feedback (DF) Problem in Parallelized MSDD Processing
When using the polyphase implementation just introduced, the inputs to each MSDD sub-module arrive in jumps of P. The larger separation between MSDD input samples does not affect the white ASE noise performance, as there is no correlation between distinct samples of white noise, no matter how far apart they are. However, LPN performance is degraded under the polyphase implementation, as samples further away from each other are less correlated, and their relative phase noise is increased. The laser phase noise is a Wiener process with independent increments, Ωk=φk−φk-1, whose variance is proportional to the time interval T between samples (processing latency), i.e. inversely proportional to the sampling rate. It follows that a reduction in sampling rate by a factor of P, due to parallelization, increases the variance of the laser phase noise increments by a factor of P. This amounts to having an effective laser linewidth P times wider. We refer to this laser phase noise tolerance penalty as the distant feedback effect, exacting a penalty due to the multiple parallel processing paths, which are inevitable at current CMOS clock speeds. Thus, the LPN tolerance will be degraded by a factor of P due to the parallelization. Nevertheless, as the normalized phase noise tolerance of the MSDD method is very high to begin with (unless the laser phase noise is dominant relative to the ASE), the penalty will be seen to be small.
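The factor-of-P growth of the phase-noise increment variance can be checked with a short Monte-Carlo sketch. It assumes the standard Wiener model in which the per-sample increment variance is 2π·Δν·T for a linewidth Δν and sampling interval T; the numerical values (100 kHz linewidth, 28 GBd, P=16) and the function names are illustrative assumptions.

```python
import math
import random


def wiener_phase(n, linewidth_hz, baud_rate, rng):
    """Generate n samples of a Wiener phase-noise process.

    Each increment is zero-mean Gaussian with variance
    2*pi*linewidth/baud_rate (assumed standard laser phase-noise model).
    """
    sigma = math.sqrt(2 * math.pi * linewidth_hz / baud_rate)
    phi, out = 0.0, []
    for _ in range(n):
        phi += rng.gauss(0.0, sigma)
        out.append(phi)
    return out


def increment_variance(phase, step):
    """Sample variance of phase differences taken `step` samples apart."""
    d = [phase[k] - phase[k - step] for k in range(step, len(phase), step)]
    mean = sum(d) / len(d)
    return sum((x - mean) ** 2 for x in d) / (len(d) - 1)


rng = random.Random(7)
phase = wiener_phase(160000, 100e3, 28e9, rng)
full_rate = increment_variance(phase, 1)    # sample-to-sample increments
polyphase = increment_variance(phase, 16)   # increments seen by one sub-module
ratio = polyphase / full_rate               # expected to be close to P = 16
```

The ratio of the two variances comes out close to P, which is the sense in which one polyphase sub-module sees an effective linewidth P times wider.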
8. Simulation Results
The simple channel model of subsection 2.2 is assumed here (
The performance of the Viterbi&Viterbi M-power QPSK CR 360 is compared to that of various carrier recovery modules according to an embodiment of the invention.
In all Monte-Carlo and LMS simulations we assume a 100 G PDM-QPSK system at 28 GBd baudrate per polarization, simulating a single polarization. We also assume a parallelization factor of P=16, i.e. the DP transmission and MSDD detection is parallelized, as per
Curve 401 of
Notice that the robustness of HDD is much higher than that of soft differential decoding, {tilde under (r)}k{tilde under (r)}k-1*, which corresponds to L=1 (i.e., the window of past samples just includes the last sample), as the HDD hard decision is in error when either of the {tilde under (ŝ)}k, {tilde under (ŝ)}k-1 hard decisions is in error, which occurs with probability double that of either of them being in error. A linear factor of 2 on the BER scale corresponds to about 0.8 dB penalty at BER=10^-3, which is much smaller than the ~3 dB penalty of the differential decoder, as derived in sub-section 2.4. The ~2.2 dB gap between soft and hard differential decoding is bridged over by the MSDD: the larger the window size L, the more closely the HDD limit is approached. Here, in the absence of LPN, the white-noise performance is monotonically increasing in L.
The final performance attainable with uniform (all equal to 1/L) coefficients, vs. Wiener-optimal and LMS coefficients is shown in
9. Conclusions
In this paper we introduced the MSDD principle, explaining in detail how a moving window of L prior symbols may be linearly processed in order to generate a cleaner demodulation reference, relative to other carrier-recovery methods. The two MSDD versions presented here (multiplier-free vs. optimized) provide the least complex CR system vs. the best performance, as borne out by numeric simulations indicating up to 1.9 dB advantage over the Viterbi&Viterbi algorithms and an ultra-low complexity multiplier-free CPE realization.
Moreover, the MSDD features linear (time-varying) processing hence is free of cycle-slips and other phase unwrapping impairments.
The only weakness of MSDD is its reliance on decision-feedback, which exacts a “distant-feedback” linewidth penalty upon polyphase parallelization. Nevertheless, the simulated performance indicates that the resulting degradation is negligible up to ~0.5 MHz linewidth; thus, for practical coherent systems, the limited linewidth tolerance may not be an issue. It is the improved resilience in the lower OSNR regime that makes MSDD the preferred scheme.
This work was devoted to coherent QPSK transmission, yet the MSDD CPE method may be extended to higher modulation formats. MSDD QAM operation was previewed in [30][31], however this extensive key topic will be fully elaborated in a future publication, covering unique aspects of adaptive MSDD for QAM: consolidation of carrier phase and carrier frequency estimation in a single MSDD system, seamless transition between QAM constellation sizes and automatic adaptive scaling of the received QAM constellation.
Despite the proliferation of CR techniques, e.g. [1-10], we are convinced that the MSDD approach features the best performance-complexity tradeoffs and will evolve to be increasingly adopted as the carrier recovery method of choice.
Method 1500 starts by stage 1510 of receiving a receiver input signal that is representative of an optically coherent signal that was received by the receiver as a result of a transmission, by an optical transmitter, of a transmitter signal that is a carrier signal that is being modulated by information.
Stage 1510 may be followed by stage 1520 of generating, by a reference signal generator, a reference signal that estimates the carrier signal.
Stage 1520 may be followed by stage 1530 of demodulating the receiver input signal by the reference signal to provide a demodulated signal.
Stage 1530 may be followed by stage 1540 of evaluating the demodulated signal, by a decision module, to provide a decision module output signal that estimates the carrier signal.
Stage 1520 may include delaying, by a delay and rotation module, receiver input signals to provide delayed receiver input signals; aligning the delayed receiver input signals by a rotation that is responsive to the decision module output signal thereby providing aligned signals; and calculating, by a multiplication and summation module, a weighted sum of the aligned signals to provide the reference signal.
Some Uop properties: The Uop distributes over products, i.e. the Uop of a product is the product of Uops: $\underset{\sim}{\nu}=\underset{\sim}{z}\,\underset{\sim}{w} \Rightarrow \underset{\sim}{\check{\nu}}=\underset{\sim}{\check{z}}\,\underset{\sim}{\check{w}}$. The Uop is an idempotent operation, i.e. applying it to an already normalized quantity leaves that quantity unchanged. The last two relations lead to $\underset{\sim}{\nu}=\underset{\sim}{z}\,\underset{\sim}{\check{w}} \Rightarrow \underset{\sim}{\check{\nu}}=\underset{\sim}{\check{z}}\,\underset{\sim}{\check{w}}$.
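These properties are easy to confirm numerically. The following sketch assumes the Uop is the normalization z/|z| of a nonzero complex number; the helper name `uop` and the test values are hypothetical. It also illustrates the earlier remark that a linear combination of unimodular quantities is generally not unimodular.

```python
def uop(z):
    """Unimodular normalization z/|z| of a nonzero complex number."""
    return z / abs(z)


z, w = 3 - 4j, 0.2 + 0.1j

# Uop of a product equals the product of Uops.
product_rule_gap = abs(uop(z * w) - uop(z) * uop(w))

# Idempotence: normalizing twice is the same as normalizing once.
idempotence_gap = abs(uop(uop(z)) - uop(z))

# Combined rule: Uop(z * Uop(w)) = Uop(z) * Uop(w).
mixed_rule_gap = abs(uop(z * uop(w)) - uop(z) * uop(w))

# A linear combination of unimodular quantities is generally not unimodular.
combo_magnitude = abs(0.5 * uop(z) + 0.5 * uop(w))
```

The three gaps vanish up to floating-point rounding, while `combo_magnitude` is strictly below 1 for these inputs.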
Next, let us evaluate the computational complexity of generating the Uop,
The operational form above indicates that we require a complex-real multiplier (i.e. two real-multipliers (RMs)), a look-up table (LUT) and the absolute square operation comprising two RMs. Just counting multipliers, the resulting overall Uop complexity is 4 RMs. As a single complex multiplier takes three real-multipliers to execute, it is apparent that the Uop complexity essentially amounts to 1⅓ CMs.
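A LUT-based Uop along these lines might be sketched as follows. This is an assumption-laden illustration rather than the operational form itself: it supposes that the LUT stores the inverse square root of the squared magnitude, and the table size, input range and the name `uop_lut` are hypothetical choices.

```python
import math

LUT_BITS = 8    # assumed table address width
X_MAX = 4.0     # assumed upper bound on |z|^2 covered by the table

# Table of 1/sqrt(x), sampled at the center of each quantization bin.
INV_SQRT_LUT = [
    1.0 / math.sqrt((n + 0.5) * X_MAX / 2 ** LUT_BITS)
    for n in range(2 ** LUT_BITS)
]


def uop_lut(z):
    """Approximate z/|z| using 4 real multiplications and one table lookup."""
    x = z.real * z.real + z.imag * z.imag      # absolute square: 2 RMs
    idx = min(int(x * 2 ** LUT_BITS / X_MAX), 2 ** LUT_BITS - 1)
    g = INV_SQRT_LUT[idx]                      # LUT: 1/sqrt(|z|^2)
    return complex(z.real * g, z.imag * g)     # complex-real multiply: 2 RMs
```

The residual magnitude error is set by the table resolution; a finer table or a narrower input range reduces it at the cost of memory.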
Consider now the DP recursion (3), relating two line symbols separated by one discrete-time unit. More generally by repeated application of (3), shifted back in time we have
$$\underset{\sim}{\check{A}}_{k-i}\,\underset{\sim}{s}_{k-i+1}=\underset{\sim}{A}_{k-i+1},\quad \underset{\sim}{\check{A}}_{k-i+1}\,\underset{\sim}{s}_{k-i+2}=\underset{\sim}{A}_{k-i+2},\ \ldots,\ \underset{\sim}{\check{A}}_{k-2}\,\underset{\sim}{s}_{k-1}=\underset{\sim}{A}_{k-1} \qquad (31)$$
i.e. we have a more general recursion, essentially relating two line symbols which are i time units apart by a complex rotation through the unimodular product:
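$$\underset{\sim}{\check{A}}_{k-i}\,\underset{\sim}{\check{s}}_{k-i+1}\,\underset{\sim}{\check{s}}_{k-i+2}\cdots\underset{\sim}{\check{s}}_{k-1}=\underset{\sim}{\check{A}}_{k-1} \qquad (32)$$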
It is readily verified that the delay-detection operation, {tilde under (A)}k{tilde under ({hacek over (A)}k-1*, undoes DP. Indeed,
$$\underset{\sim}{A}_k\,\underset{\sim}{\check{A}}_{k-1}^{*}=\left(\underset{\sim}{s}_k\,\underset{\sim}{\check{A}}_{k-1}\right)\underset{\sim}{\check{A}}_{k-1}^{*}=\underset{\sim}{s}_k\left(\underset{\sim}{\check{A}}_{k-1}\,\underset{\sim}{\check{A}}_{k-1}^{*}\right)=\underset{\sim}{s}_k. \qquad (33)$$
Thus, more generally, we have the recursion
$$\underset{\sim}{s}_k=\underset{\sim}{A}_k\,\underset{\sim}{\check{A}}_{k-i}^{*}\,\underset{\sim}{\check{s}}_{k-i+1}^{*}\,\underset{\sim}{\check{s}}_{k-i+2}^{*}\cdots\underset{\sim}{\check{s}}_{k-1}^{*} \qquad (34)$$
which is readily proven using (32), as follows:
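As a numerical sanity check of the generalized delay-detection relation (34), the following sketch precodes a random symbol stream with A_k = s_k·Uop(A_{k-1}) and compares both sides of (34) for several delays i. It assumes 16-QAM-like symbols on odd integer levels, so that non-unimodular symbols are exercised; the function names are hypothetical.

```python
import random


def uop(z):
    """Unimodular normalization z/|z| of a nonzero complex number."""
    return z / abs(z)


def check_generalized_delay_detection(n=50, i_max=6, seed=3):
    """Return the largest deviation between the two sides of relation (34)."""
    rng = random.Random(seed)
    levels = (-3, -1, 1, 3)
    s = [complex(rng.choice(levels), rng.choice(levels)) for _ in range(n)]

    # Differential precoding: A_k = s_k * Uop(A_{k-1}).
    A = [s[0]]
    for k in range(1, n):
        A.append(s[k] * uop(A[k - 1]))

    worst = 0.0
    for k in range(i_max, n):
        for i in range(1, i_max + 1):
            # Right-hand side of (34): A_k * conj(Uop(A_{k-i})) multiplied
            # by conj(Uop(s_j)) for j = k-i+1 ... k-1.
            rhs = A[k] * uop(A[k - i]).conjugate()
            for j in range(k - i + 1, k):
                rhs *= uop(s[j]).conjugate()
            worst = max(worst, abs(rhs - s[k]))
    return worst
```

For i=1 the product over j is empty and the check reduces to the plain delay detection of (33).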
To derive the MMSE solution, minimizing (18), we invoke the orthogonality principle of linear estimation. The optimal coefficients vector is obtained from the condition that c†{tilde under (ŝ)}k be the projection of the estimation target {tilde under (s)}k onto the “observations” subspace, i.e. the estimation error be orthogonal to each of the “observations” (which correspond here to the inputs into the linear estimator
Substituting the estimation error {tilde under (ε)}k={tilde under (s)}k−c† into the last equation yields the W-H equation:
It remains to evaluate the second-order statistics. Working out the cross-correlation vector first, its i-th element is given by
Next evaluating the autocorrelation matrix elements, we have
These statistics are in turn determined by the second- and fourth-order statistics of the multiplicative phase noise sequence (7), {tilde under (p)}k. Start with evaluating the conjugate product:
where we introduced a notation for the LPN increment between two discrete-times:
This phase noise increment is zero-mean Gaussian distributed with variance:
Using well-known Wiener phase noise statistical techniques [35] the expected phase noise exponent in (40) is then given by:
exp[jφk
Taking the expectation of (40) yields the autocorrelation of the PN factors:
We now expand the quadruple product of multiplicative PN factors:
We next take the expectation of this quadruple product. Since the phase noise is independent of η, the expectation factors out over the LPN exponents and the sum of η terms and products thereof, out of which the only terms not having null expectation are the conjugate double products (to show that the other double, triple and quadruple products have zero mean, one invokes the whiteness and circularity of the noise sequence):
In the special case of interest, (46) reduces to:
Substituting (44), (47) of the p-sequence into (38), (39) respectively, yields our final results for the second-order statistics required in formulating the Wiener-Hopf equations:
The two leftmost columns list the 18 abbreviations specific to this paper; the third column contains abbreviations in general use.
The invention may also be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a programmable apparatus, such as a computer system or enabling a programmable apparatus to perform functions of a device or system according to the invention.
A computer program is a list of instructions such as a particular application program and/or an operating system. The computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system. The computer program may be stored internally on a non-transitory computer readable medium. All or some of the computer program may be provided on computer readable media permanently, removably or remotely coupled to an information processing system. The computer readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.
A computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process. An operating system (OS) is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.
The computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices. When executing the computer program, the computer system processes information according to the computer program and produces resultant output information via I/O devices.
In the foregoing specification, the invention has been described with reference to specific examples of embodiments of the invention. It will, however, be evident that various modifications and changes may be made therein without departing from the broader spirit and scope of the invention as set forth in the appended claims.
The terms “front,” “back,” “top,” “bottom,” “over,” “under” and the like in the description and in the claims, if any, are used for descriptive purposes and not necessarily for describing permanent relative positions. It is understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein.
The connections as discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections.
However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals.
Therefore, many options exist for transferring signals.
Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality.
Any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality. Furthermore, those skilled in the art will recognize that boundaries between the above described operations are merely illustrative. The multiple operations may be combined into a single operation, a single operation may be distributed in additional operations and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. Also for example, in one embodiment, the illustrated examples may be implemented as circuitry located on a single integrated circuit or within a same device. Alternatively, the examples may be implemented as any number of separate integrated circuits or separate devices interconnected with each other in a suitable manner.
Also for example, the examples, or portions thereof, may be implemented as soft or code representations of physical circuitry or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type. Also, the invention is not limited to physical devices or units implemented in non-programmable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code, such as mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices, commonly denoted in this application as ‘computer systems’.
However, other modifications, variations and alternatives are also possible.
The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. Furthermore, the terms “a” or “an,” as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an.”
The same holds true for the use of definite articles. Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage. While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
Any of the systems above can be arranged to execute a method. Any method illustrated in the specification or the drawings can be implemented by executing instructions that are stored in a non-transitory computer readable medium.
This patent application claims priority from U.S. provisional patent Ser. No. 61/577,107 filing date Dec. 19, 2011 which is incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
8041233 | Hueda et al. | Oct 2011 | B2 |
8385747 | Roberts et al. | Feb 2013 | B2 |
20080025733 | Nazarathy et al. | Jan 2008 | A1 |
20090208224 | Kikuchi | Aug 2009 | A1 |
20100177815 | Garg et al. | Jul 2010 | A1 |
20100322637 | Hayee et al. | Dec 2010 | A1 |
Number | Date | Country | |
---|---|---|---|
20140050493 A1 | Feb 2014 | US |
Number | Date | Country | |
---|---|---|---|
61577107 | Dec 2011 | US |