The present invention relates generally to optical fiber transmission systems and, more particularly, to methods and apparatus for measuring and controlling polarization mode dispersion (PMD) in such optical fiber transmission systems.
Optical communication systems increasingly employ wavelength division multiplexing (WDM) techniques to transmit multiple information signals on the same fiber at different optical frequencies. WDM techniques are being used to meet the increasing demands for improved speed and bandwidth in optical transmission applications, including fiber optic communication systems.
Polarization mode dispersion (PMD) is a fundamental problem in single-mode fiber optic communication systems that has limited the channel capacity. Fiber asymmetry gives rise to random birefringence effects, resulting in differential group delay (DGD) between the two principal polarization modes. Wavelength-dependent coupling of the modes then leads to polarization mode dispersion, which becomes the dominant limiting factor of transmission rate as bandwidth increases.
A need therefore exists for improved techniques for compensating for polarization mode dispersion in order to further improve the efficiency and channel capacity of high-speed fiber optic communication.
Generally, a method and apparatus are disclosed for compensating for polarization mode dispersion using cascades of all-pass filters and directional couplers. The disclosed PMD compensator adjusts the coefficients of an adaptive filter structure involving all-pass filters and directional couplers based on a minimized cost function. In one implementation, a stochastic gradient algorithm, also referred to as the least mean square (LMS) algorithm, is employed to sequentially reduce the value of the cost function by the method of steepest descent. In another implementation, convergence is improved by employing a Newton algorithm that uses second derivatives to accelerate convergence.
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
The present invention provides a method and apparatus for compensating for polarization mode dispersion using cascades of all-pass filters and directional couplers. The disclosed PMD compensator adjusts the coefficients of an adaptive filter structure involving all-pass filters and directional couplers based on a minimized cost function. Initially, the phase and amplitude of polarization components are evaluated in order to characterize the PMD of channels and adjust tunable PMD compensators.
As shown in
At a receiver, demultiplexer 226, preferably at the downstream end of optical transmission path 213, separates the wavelength channels. Optical coupler 227 provides a wavelength channel to the polarization mode dispersion compensator (PMDC) 110 (the adaptive optical filter). Coupler 227 also provides light to the PMD measuring apparatus 300. The channel estimate made at PMD measuring apparatus 300 generates an adaptive PMD correction signal to control PMDC 110. Receiver (RX) 100 receives the compensated wavelength channel signal. The RF detectors (not shown in
Both polarizations see the same narrowband filter but in counterpropagating directions. The filters are tuned, for example, by thermo-optic phase shifters. The APFs are identical in principle, but any variations can be handled by calibration. Each APF is designed to provide a very sharp transition in its phase response from 0 to 2π near resonance. On-resonance, the phase is π. Off-resonance, the phase is ideally 0 or 2π. As the resonant frequency is shifted via a phase shifter in the feedback path, the phase response is translated across the channel spectrum and the RF detectors X 306 and Y 307 record different linear combinations of beats between adjacent tones. An RF reference signal 313 is obtained from a light signal tapped before the polarization beam splitter 301 and fed to an RF detector and a phase locked loop incorporating a voltage controlled oscillator 314. Blocks 311 and 312 measure the phase of the X and Y components as detected by 306 and 307 with respect to the RF reference 313. For a detailed discussion of the PMD measuring apparatus 300, see U.S. patent application Ser. No. 10/180,842, entitled “Apparatus and Method for Measurement and Adaptive Control of Polarization Mode Dispersion in Optical Fiber Transmission Systems,” incorporated by reference herein.
According to another aspect of the invention, adaptive algorithms are provided for two-channel (two-input/two-output) structures consisting of multiple cascades of all-pass filters and directional couplers. While this exemplary architecture is employed for compensation of polarization mode dispersion, the results apply more generally to this class of filters, as would be apparent to a person of ordinary skill in the art. For a detailed discussion of an application of this architecture to PMD compensation using multiple cascades of all-pass filters and directional couplers, see C. K. Madsen, “Optical All-Pass Filters for Polarization Mode Dispersion Compensation,” Optics Letters, Vol. 25, No. 12, 878-80 (June, 2000), incorporated by reference herein.
Models for PMD Generation and Compensation
is a unitary matrix representing a directional coupler 510, 530 with parameter value 0.5 and j=√(−1). In this manner, the simplified PMD compensator 600 of
The multistage all-pass filter A 540 is mathematically expressed as:
where P is the number of stages and the response of each stage is written as follows:
where ap is a complex number that specifies the pole location and * denotes complex conjugate. (The complementary zero location is at 1/a*p.) The sections Ap(z) are all-pass functions so that |Ap(z)|=1 at all frequencies.
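The all-pass property stated above can be checked numerically. The following is a minimal sketch, assuming the single-stage form Ap(z) = (z^-1 − a*p)/(1 − ap z^-1), which places the pole at ap and the zero at 1/a*p as described; the sign and normalization convention of the actual stage response may differ. The function names and the pole values are hypothetical.

```python
import cmath
import math

def allpass_stage(a, z):
    """One all-pass section, assumed form (1/z - conj(a)) / (1 - a/z).

    The pole is at z = a and the zero is at z = 1/conj(a).
    """
    z_inv = 1.0 / z
    return (z_inv - a.conjugate()) / (1.0 - a * z_inv)

def allpass_cascade(poles, z):
    """Multistage response: the product of the single-stage responses."""
    response = 1.0 + 0.0j
    for a in poles:
        response *= allpass_stage(a, z)
    return response

# Hypothetical pole locations inside the unit circle (P = 3 stages).
poles = [0.5 + 0.2j, -0.3 + 0.6j, 0.7j]

# Evaluate on the unit circle; the magnitude should be 1 at every frequency.
for k in range(8):
    z = cmath.exp(1j * 2.0 * math.pi * k / 8)
    magnitude = abs(allpass_cascade(poles, z))
    print(f"omega = 2*pi*{k}/8  |A(z)| = {magnitude:.12f}")
```

Only the phase of such a cascade varies with frequency, which is why adapting the pole locations reshapes the delay response without changing the amplitude.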
Likewise, a Q-stage filter B 520 is defined as
where Q is the number of stages and the response of each stage is written as follows:
Cascading the above in the order of
This shows how the input polarization signals are coupled to the output signals, as determined by the all-pass model filters Aopt 540 and Bopt 520. When Aopt=Bopt=−1, y=x and there is no distortion.
Now consider the response of the PMD compensator 600 of
Again, note that x̂=y for A=B=−1. Also note that for A=Aopt and B=Bopt, the matrices in Equations (6) and (7) are conjugate transpose pairs; in fact, it can be easily demonstrated that in this case they are unitary so that their product is the identity matrix and x̂=x. Thus, the compensator completely removes the modeled PMD for A=Aopt and B=Bopt.
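The two observations above (no distortion for A=B=−1, and exact recovery when the compensator is the conjugate transpose of the model) can be illustrated at a single frequency. This is a sketch under stated assumptions: the coupler matrix is taken as (1/√2)[[1, j],[j, 1]], each all-pass filter acts on the first arm only, and the cascade order is coupler, filter B, coupler, filter A. These choices reproduce y=x for A=B=−1 but are not confirmed as the exact form of Equations (6) and (7). All function names are hypothetical.

```python
import cmath
import math

def matmul(m, n):
    """Product of two 2x2 complex matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conj_transpose(m):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply(m, v):
    """Apply a 2x2 matrix to a 2-element vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

S = 1.0 / math.sqrt(2.0)
COUPLER = [[S + 0j, S * 1j], [S * 1j, S + 0j]]  # assumed 50/50 coupler matrix

def filter_arm(h):
    """All-pass response h on the first arm, straight-through second arm (assumed)."""
    return [[h, 0j], [0j, 1 + 0j]]

def pmd_model(a_val, b_val):
    """Assumed cascade: coupler, filter B, coupler, filter A (rightmost acts first)."""
    m = matmul(filter_arm(b_val), COUPLER)
    m = matmul(COUPLER, m)
    return matmul(filter_arm(a_val), m)

def compensator(a_val, b_val):
    """Compensator taken as the conjugate transpose of the model matrix."""
    return conj_transpose(pmd_model(a_val, b_val))

# Unit-modulus values standing in for Aopt(z) and Bopt(z) at one frequency.
a_opt = cmath.exp(0.3j)
b_opt = cmath.exp(-1.1j)

x = [1.0 + 0.0j, 0.5j]
y = apply(pmd_model(a_opt, b_opt), x)        # distorted signal
x_hat = apply(compensator(a_opt, b_opt), y)  # recovered estimate
print("recovery error:", max(abs(x[i] - x_hat[i]) for i in range(2)))
```

Because every factor is unitary when A and B have unit modulus, the compensator matrix is the inverse of the model matrix, so the recovery error is at the level of floating-point rounding.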
The difference between the ideal input signal vector x and its recovered estimate x̂ forms an error signal vector
e(z)=x(z)−x̂(z) (8)
which should be small. In other words, the error signal vector, e(z), (having two values, one for each principal polarization mode) characterizes the difference between the signal that was transmitted, x(z), (which is known) and the response, x̂(z), of the PMD compensator. Here, it is assumed that compensation is performed over a discrete set of K representative frequencies zk, k=1, . . . , K, and the weighted cost function is formed:
Equation (9) looks at the two components of the error signal vector, e(z), as functions of frequency (zk). Generally, Equation (9) determines a weighted sum of the squared error, that is summed over K frequencies.
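The weighted squared-error sum described above can be sketched as follows, assuming one real, non-negative weight per frequency and the form J = Σk wk (|e1(zk)|² + |e2(zk)|²); the exact weighting used in Equation (9) may differ. The function name and the sample values are hypothetical.

```python
def weighted_cost(x, x_hat, weights):
    """J = sum over k of w_k * (|e1(z_k)|^2 + |e2(z_k)|^2), with e = x - x_hat.

    x and x_hat are lists of (component 1, component 2) pairs, one pair per
    frequency z_k; weights holds one real weight per frequency (assumed form).
    """
    total = 0.0
    for (x1, x2), (h1, h2), w in zip(x, x_hat, weights):
        e1 = x1 - h1
        e2 = x2 - h2
        total += w * (abs(e1) ** 2 + abs(e2) ** 2)
    return total

# Two frequencies (K = 2); the estimate is exact at the first one only.
x = [(1 + 0j, 0j), (0j, 1 + 0j)]
x_hat = [(1 + 0j, 0j), (0.5j, 1 + 0j)]
weights = [1.0, 2.0]
print(weighted_cost(x, x_hat, weights))
```

Raising the weight at a given frequency makes residual error at that frequency more costly, which is one way to emphasize the part of the channel spectrum that matters most.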
As discussed hereinafter, the present invention provides adaptive algorithms for adjusting A and B to minimize this cost function, J. The complex (amplitude and phase) measurements of the optical PMD compensator output x̂ for each frequency zk are provided by the detector 310 of
According to another aspect of the invention, two exemplary algorithms are presented for adapting the coefficients of a two-channel adaptive filter structure involving two all-pass filters and two directional couplers. Both exemplary adaptive algorithms minimize the cost function of equation (9). The first adaptive algorithm is a stochastic gradient algorithm, also referred to as the least mean square algorithm, and sequentially reduces the value of J by the method of steepest descent. For the simplified model of
As is well known, the LMS algorithm has the disadvantage of slow convergence for some applications in which the effect of the adjustable coefficients is closely coupled. As shown hereinafter, such behavior results for the all-pass PMD compensation context of the present invention. Therefore, a Newton algorithm is also derived for this application that makes use of second derivatives to accelerate convergence. For both algorithms, it will be necessary to calculate derivatives of the error signals e1 and e2 in (9).
LMS Algorithm
The LMS algorithm adapts the (complex) coefficients of the PMD compensator all-pass filters employed by equations (3) and (5) as follows:
where μ is the step size and
denotes the complex derivative of J with respect to the complex variable c, defined as:
where Re(c) and Im(c) are, respectively, the real and imaginary parts of c=Re(c)+j·Im(c).
As previously indicated, these updates are deterministic for the simplified model. However, in the case of random signals, this also provides instantaneous gradient updates which are subsequently smoothed by the recursive nature of the algorithm, as controlled by the selection of μ. In general, the selection of the step size value must take into account the competing objectives of rapid convergence/tracking, and maintaining stability (and smoothing in the case of random signals).
Differentiating equation (9) with respect to ap, it can be shown that:
where apr and api are, respectively, the real and imaginary parts of ap=apr+japi,
is a real valued function, independent of the index p. Combining equations (12a) and (12b), the complex derivative can be more compactly written as
The complex derivative can be alternatively calculated by considering ap and a*p to be independent variables and differentiating with respect to a*p. However, the derivatives are separately derived with respect to the real and imaginary parts of ap because they will be needed in the next section to calculate the second derivatives.
Likewise, for B, the following is obtained:
A composite coefficient vector is defined as:
where a≡[a1 a2 . . . aP]T and b≡[b1 b2 . . . bQ]T, then equation (10) can be compactly written as:
is the (P+Q)×1 complex gradient of J with respect to w, and
It has been observed that the LMS algorithm demonstrates slow convergence. This can be addressed by using the Newton algorithm, discussed hereinafter, to accelerate convergence.
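The steepest-descent update on complex coefficients can be sketched as follows. This sketch assumes the complex derivative is ∂J/∂Re(c) + j·∂J/∂Im(c) and estimates it by central finite differences rather than by the closed-form expressions derived above. The quadratic cost is a toy stand-in, not the compensator cost of equation (9), and all names are hypothetical.

```python
def complex_gradient(cost, w, h=1e-6):
    """Finite-difference estimate of dJ/dRe(c) + j*dJ/dIm(c) for each coefficient."""
    grad = []
    for i in range(len(w)):
        def shifted(delta):
            trial = list(w)
            trial[i] = trial[i] + delta
            return cost(trial)
        d_re = (shifted(h) - shifted(-h)) / (2 * h)
        d_im = (shifted(1j * h) - shifted(-1j * h)) / (2 * h)
        grad.append(complex(d_re, d_im))
    return grad

def lms_step(cost, w, mu):
    """One steepest-descent update: w(n+1) = w(n) - mu * gradient."""
    grad = complex_gradient(cost, w)
    return [wi - mu * gi for wi, gi in zip(w, grad)]

# Toy real-valued cost with a known minimizer (illustration only).
target = [0.4 - 0.2j, -0.1 + 0.5j]

def toy_cost(w):
    return sum(abs(wi - ti) ** 2 for wi, ti in zip(w, target))

w = [0j, 0j]
for _ in range(60):
    w = lms_step(toy_cost, w, mu=0.25)
print(w)
```

For this toy cost each update halves the distance to the minimizer; a larger step size converges faster until the update overshoots, which is the stability trade-off noted for μ.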
Newton Algorithm
Newton's method involves multiplying the gradient by the inverse matrix of second derivatives, or Hessian. It has been found that it is not generally possible to apply this to the complex LMS algorithm defined by equation (21), whereby the Hessian is a complex matrix, even in the simplest case with a single complex coefficient. Therefore, the real and imaginary components of the weight update, as in equations (12) and (16), must be dealt with separately.
Taking second derivatives of equation (9) with respect to the real and imaginary parts of ap and bq, it can be shown that:
where Gp and Hq are defined, respectively, in (13) and (17), δpq=1 for p=q and 0 otherwise, and
are real functions.
It is noted that as e1→0 and e2→0, G11 and G22 are non-negative, and F1→0 and F2→0. Therefore, as the error goes to zero, equations (24) and (25) show that the diagonal terms
are all non-negative, thus establishing a quadratic minimum.
The P×P Hessian sub-matrix is now assembled from equation (24a) as
Likewise,
are formed. Then, defining the 2P×1 coefficient vector as:
where ar≡[a1r a2r . . . aPr] and ai≡[a1i a2i . . . aPi], we form the 2P×2P Hessian matrix:
Similarly, from equations (25) and (26), the 2Q×2Q matrix is obtained as follows:
and the 2Q×2P matrix is expressed as follows:
The above results are consolidated and the Newton algorithm is then formulated. First, a composite real coefficient vector is defined as follows:
and the composite (2P+2Q)×(2P+2Q) Hessian matrix is formed as follows:
Similarly, using equations (12) and (16), the real gradient of J can be assembled with respect to the composite real coefficient vector (35) as follows:
Finally, with the above definitions, the Newton algorithm is written as
w(n+1)=w(n)−μH−1∇(J). (38)
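The update of equation (38) on a real coefficient vector can be sketched as follows. This sketch assumes finite-difference estimates of the gradient and Hessian in place of the closed-form expressions, and it solves H·s = ∇J by Gaussian elimination instead of forming the inverse explicitly. The coupled quadratic cost is a toy stand-in for J, and all names are hypothetical.

```python
def gradient(cost, w, h=1e-4):
    """Central-difference estimate of the real gradient."""
    g = []
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        dn = list(w)
        dn[i] -= h
        g.append((cost(up) - cost(dn)) / (2 * h))
    return g

def hessian(cost, w, h=1e-4):
    """Central-difference estimate of the matrix of second derivatives."""
    n = len(w)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def at(di, dj):
                t = list(w)
                t[i] += di
                t[j] += dj
                return cost(t)
            hess[i][j] = (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4 * h * h)
    return hess

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x

def newton_step(cost, w, mu=1.0):
    """One update w(n+1) = w(n) - mu * H^-1 * grad(J)."""
    step = solve(hessian(cost, w), gradient(cost, w))
    return [wi - mu * si for wi, si in zip(w, step)]

# Toy cost with closely coupled coefficients; the minimum is at [1, -1].
def coupled_cost(w):
    return (w[0] - 1.0) ** 2 + 10.0 * (w[0] + w[1]) ** 2

w_new = newton_step(coupled_cost, [0.0, 0.0])
print(w_new)
```

On an exactly quadratic cost a single full step (μ=1) lands on the minimum regardless of how strongly the coefficients are coupled, whereas steepest descent would need many small steps along the narrow valley.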
Generally, it has been observed that with the Newton algorithm, the error converges more quickly and steadily to zero, relative to the LMS algorithm which takes a longer time to converge completely. Even for 10% initialization with the Newton algorithm, convergence to zero was rapidly attained, although there may be some potential instability at some intermediate point.
In one variation, the capability of the Newton algorithm is extended by first running the LMS algorithm for a small number of samples and then using the coefficient values to initialize the Newton algorithm. It is noted that both the LMS and Newton algorithms are susceptible to getting stuck in a local minimum. The theoretical basis for this phenomenon is related to the nature of the MSE gradient, which for the all-pass structure turns out to have components that only differ by a function that depends on the coefficient value. Therefore, if any two coefficients ever reach the same value they will remain locked together, thereby leading to the local minimum. One way around this problem is to compute multiple solutions starting with a suitable number of initial guesses spaced out over the feasible parameter space.
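The multiple-initial-guess strategy can be sketched as follows. The sketch uses plain steepest descent as the local optimizer and a one-dimensional toy cost with one local and one global minimum; neither is taken from the compensator model, and all names are hypothetical.

```python
def descend(grad, x0, mu=0.01, steps=2000):
    """Steepest descent from a single initial guess."""
    x = x0
    for _ in range(steps):
        x -= mu * grad(x)
    return x

def multi_start(cost, grad, guesses):
    """Run the local optimizer from every guess and keep the lowest-cost result."""
    candidates = [descend(grad, g) for g in guesses]
    return min(candidates, key=cost)

# Toy cost: local minimum near x = 0.96, global minimum near x = -1.04.
def double_well(x):
    return (x * x - 1.0) ** 2 + 0.3 * x

def double_well_grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

guesses = [-1.5, -0.5, 0.5, 1.5]
best = multi_start(double_well, double_well_grad, guesses)
print(best, double_well(best))
```

A run started at 0.5 or 1.5 settles in the shallower well, so only the comparison across starting points finds the lower minimum. The same wrapper could take an LMS-then-Newton run as its local optimizer.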
As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture that itself comprises a computer readable medium having computer readable code means embodied thereon. The computer readable program code means is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a recordable medium (e.g., floppy disks, hard drives, compact disks such as DVD, or memory cards) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel). Any medium known or developed that can store information suitable for use with a computer system may be used. The computer readable code means is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic medium or height variations on the surface of a compact disk, such as a DVD.
It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.