The present invention generally relates to devices and methods for digital communication testing, and more particularly relates to a method and device for detecting and analyzing bit error and bit slip error patterns for high speed serial and parallel data links.
High speed serial single- and multi-lane data links and associated devices often introduce distortion in digital signals, which may result in bit errors and bit slip errors in a digital receiver. Examples of data links that can cause bit errors include links within components, for example a 28 Gbit/s quad retimer; links between components on a PCB, for example an electrical CAUI-4 interface used for 100 Gbit/s transponders; board-to-board links within a system, for example electrical backplane links; and system-to-system links, for example an optical 100 Gbit/s Ethernet 100GBASE-LR4 link. Diagnosing and analyzing the root causes of such bit errors and bit slips on high speed data links is often difficult and time consuming. The problem is often exacerbated when the errors occur infrequently, for example once a day.
A conventional method of diagnosing such problems is to tap the signal and to analyze it with a high speed Digital Sampling Oscilloscope (DSO) or other suitable analyzer tools. However, tapping of high speed signals is often not possible, for example when the error occurs within a component or a closed subsystem, or because tapping severely distorts the signal. On multi-lane links a necessity to tap multiple signals in parallel might exacerbate the difficulties. In addition, the signals are often severely distorted and judgment of the signal quality may not be possible without complex preprocessing, for example by means of an equalizer. Even if such preprocessing is available, it is often not possible or at least difficult to deduce which portion of the signal causes bit errors at the receiver. Furthermore, DSOs or similar test equipment with a high enough measurement bandwidth can be extremely expensive or simply non-existent, such as in the case of very high speed links. As a result, the root causes of bit errors often remain unclear.
Therefore, technicians are often forced to work in the dark when trying to determine and remove the root cause of bit errors in a data link. Typically a trial and error approach is used, which includes tuning a number of parameters, such as output level, de-emphasis, equalizer, slicer level, sampler phase, etc., while making bit error rate (BER) measurements for every parameter combination tried. This process is often very time consuming, in part because each BER measurement can take a long time when errors are infrequent, and also because tuning of the various parameters influences the measurement result in a hard to predict and mutually dependent manner.
An object of the present invention is to provide a method and/or device for bit error analysis that correlates bit errors with specific bit patterns and related signal characteristics thereby enabling a quick estimation of likely causes of the bit errors.
Accordingly, one aspect of the present invention relates to a method of testing a data link that enables a non-intrusive identification of probable causes of bit errors by firstly identifying bit patterns that are likely to cause bit errors and secondly by determining and providing to the user specific signal properties of the bit error patterns that are indicative of the probable causes of the bit errors. The method comprises: a) providing a first pseudo-random bit sequence (PRBS) to the input port of the data link; b) using a first PRBS analyzer connected to the output port of the data link to detect bit error events in a first received bit sequence, wherein the first received bit sequence corresponds to the first PRBS transmitted over the data link; c) for each bit error event detected by the first PRBS analyzer in at least a portion of the first received bit sequence, writing bit error information into an error buffer, wherein the bit error information comprises PRBS analyzer state information corresponding to the detected bit error event; d) using an error pattern analyzer to read the bit error information from the error buffer, to associate detected errors with specific bit patterns, and to generate therefrom error pattern analysis information that is indicative of a cause of the detected bit errors; and, e) providing the error pattern analysis information to a user.
One aspect of the present invention relates to a bit error pattern tester that implements the method of the present invention for testing digital signal transmission through a data link. The bit error pattern tester comprises a PRBS generator for feeding a PRBS signal into an input port of the data link, a PRBS analyzer for detecting bit errors in a received bit sequence, wherein the received bit sequence corresponds to the PRBS signal received from an output port of the data link after transmission over the data link, and an error data buffer. Further provided is an error data generator that is operatively connected to the PRBS analyzer for receiving therefrom bit error information for each detected bit error event, and for writing the bit error information into the error data buffer. The bit error pattern tester further comprises an error pattern analyzer that is operatively connected to the error data buffer and is configured to associate detected errors with specific bit patterns based on the bit error information saved in the error data buffer, and to generate bit error pattern analysis information that is indicative of a cause of the detected bit errors; and, an output device for providing the bit error pattern analysis information to the user.
The invention will be described in greater detail with reference to the accompanying drawings which represent preferred embodiments thereof, in which like elements are indicated with like reference numerals, and wherein:
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular circuits, circuit components, techniques, etc. in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known methods, devices, and circuits are omitted so as not to obscure the description of the present invention.
Note that as used herein, the terms “first”, “second” and so forth are not intended to imply sequential ordering, but rather are intended to distinguish one element from another unless explicitly stated. The term “data link” as used herein may refer to any transmission-type device, including but not limited to a transmission line, that has an input single-lane or multi-lane port for receiving a stream or streams of binary data, and an output single-lane or multi-lane port for outputting the stream or streams of binary data after its propagation in the device. The terms ‘data link’ and ‘device under test’ (DUT) are used herein interchangeably. When a plurality of digital signals are transmitted through a device or data link in parallel, the signal path of each of the digital signals in the device is referred to herein as a lane. When the plurality of parallel digital signals are bit-synchronized, an ordered set of time-synchronous bits from all lanes is referred to herein as an inter-lane word, or simply as word where it cannot lead to a confusion.
One aspect of the present invention relates to an indirect, non-intrusive method of diagnosis and analysis of bit errors and bit slips in high-speed data links, which is based on establishing correlations between bit errors and the bit and word patterns present in the link when errors occur. Embodiments of the method use a single or multi-lane bit error rate test set (BERT), which is augmented by an error pattern analyzer. Pseudo-random bit sequences (PRBS) generated by the BERT are used as test signals. With high speed data links, the mechanisms which lead to bit errors or CDR (clock and data recovery) slips are often related to specific data signal patterns carried over the link. Examples of such mechanisms are inter-symbol interference caused by bandwidth limitations, distortions caused by reflections, and baseline wander caused by AC coupling. If this is the case, a correlation between the bit pattern sequence and the occurrence of errors exists, and the method disclosed herein enables these correlations to be established. In the case of multi-lane links, errors may be caused not only by the bit pattern sequence transported over the particular lane in error but also by the bit pattern sequences transported over the other lanes of the link. In this case a correlation between the word pattern sequence and the occurrence of errors exists. Examples of such mechanisms are crosstalk and simultaneous switching noise.
With reference to
Turning now to
As illustrated in
Every time a bit error occurs on any lane, information related to the bit error is generated by the raw error data generator 15 and is written to the error buffer 16. In one embodiment, the data stored in the error buffer 16 contain PRBS analyzer state information as well as additional information. From the data stored in the buffer 16, EPA 18 is able to reconstruct all intra-lane error bit patterns and inter-lane error word patterns as well as their exact location on the time line. From these reconstructed patterns and their location on the time line, EPA 18 is able to calculate further error pattern analysis information, such as transition density wander, baseline wander, neighbor lane activity, etc., which is indicative of a cause of the detected bit errors. The data storage buffer 16 may be sized to store a high number of errors, possibly occurring over a long period of time. In one embodiment, EPA 18 is configured, for example programmed to execute a suitable algorithm, to distinguish between bit errors and bit slips. As a result of the analysis process executed by EPA 18, a large amount of information may be provided to the user. This information enables the user to deduce a likely root cause for errors in DUT 9, and to decide which modifications of DUT 9 or of its parameter settings are required to eliminate the cause of the errors.
Referring now to
The PRBS analyzers 130 make bit decisions on the PRBS signals 31 received from the DUT 9, thereby transforming the received signals into synchronous bit sequence 32i, i=1, . . . , L, which are illustrated in
Referring back to
Referring now to
In one embodiment, the PRBS seed is the state information of the PRBS analyzer 130. A PRBS seed width K, i.e. the number of bit positions in the PRBS seed, may correspond to the degree of the generator polynomial of the PRBS. By way of example, a PRBS with the generator polynomial G(x) = 1 + x^{28} + x^{31} has a seed width K equal to 31 bits.
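The relation between the generator polynomial and the seed width can be illustrated with a minimal software sketch of a PRBS31 shift register. The function names `prbs_step` and `prbs_bits`, and the convention that bit j-1 of the state holds the bit emitted j steps earlier, are assumptions made for illustration and are not taken from the described embodiment.

```python
K = 31            # seed width = degree of G(x) = 1 + x^28 + x^31
TAPS = (28, 31)   # exponents of the feedback terms
MASK = (1 << K) - 1

def prbs_step(state):
    """Advance the register by one bit; return (new_state, output_bit).

    Assumed convention: bit j-1 of `state` is the bit emitted j steps ago,
    so the new bit is bit[n-28] XOR bit[n-31].
    """
    new_bit = ((state >> (TAPS[0] - 1)) ^ (state >> (TAPS[1] - 1))) & 1
    state = ((state << 1) | new_bit) & MASK
    return state, new_bit

def prbs_bits(seed, n):
    """Generate n bits starting from a non-zero K-bit seed."""
    if seed & MASK == 0:
        raise ValueError("an all-zero seed never leaves the zero state")
    state, out = seed & MASK, []
    for _ in range(n):
        state, bit = prbs_step(state)
        out.append(bit)
    return out
```

Because the register holds exactly K = 31 bits, any 31 consecutive bits of the sequence fully determine all later bits, which is why the seed can serve as a compact identifier of a position in the PRBS.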
In one embodiment, a bit error signal 131 is generated by each PRBS analyzer 130 and passed on to an error detection circuit 140. The bit error signal 131 may be a binary signal that indicates whether an error event has been detected (‘1’) or not (‘0’) by a particular PRBS analyzer 130 in a current cycle of the PRBS analyzer operation. If at least one PRBS analyzer 130 has detected a bit error in the current clock cycle, the error detection circuit 140 sends an error signal 141 to the REDG 15. In one embodiment, the error detection circuit 140 may implement a logical “OR” on all analyzer bit error signals 131. When the error signal 141 so generated is a logical “true” (‘1’), that is, when at least one of the PRBS analyzers 130i has detected a bit error event, the raw error data generator 15 writes the error vector 132 and PRBS seed data 133 from each PRBS analyzer 130i to an error data buffer 160i associated with the ith data lane.
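The capture rule described above can be sketched in software as follows. This is only an illustrative model: the names `LaneSample` and `capture_cycle` are hypothetical, and the per-lane bit error signal is assumed here to be simply "error vector is non-zero".

```python
from dataclasses import dataclass

@dataclass
class LaneSample:
    error_vector: int  # one bit per received bit in this cycle; 1 marks a mismatch
    seed: int          # PRBS analyzer state (seed) for this cycle

def capture_cycle(samples, buffers):
    """Model one analyzer clock cycle.

    samples: one LaneSample per lane; buffers: one list per lane.
    Returns the combined error signal (logical OR over all lanes).
    """
    error_signal = any(s.error_vector != 0 for s in samples)
    if error_signal:
        # Every lane is written, not only the lane in error, so that
        # entries at the same index stay time-synchronous across lanes.
        for lane_buffer, s in zip(buffers, samples):
            lane_buffer.append((s.error_vector, s.seed))
    return error_signal
```

Writing all lanes on any error keeps the buffers aligned, which is what later allows inter-lane words to be reconstructed from entries at the same address.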
Referring now to
A PRBS analyzer 130 may lose synchronization to the incoming bit pattern due to a bit slip, when a PRBS framer in the PRBS analyzer ‘slips’ relative to a corresponding PRBS by one or more bits. As a result of a bit slip, the error rate in the error vector 132 typically becomes very high. Accordingly, one embodiment of the invention implements a special bit slip handling process. This process may include detecting and verifying a bit slip, and a mechanism for limiting the number of data entries 165 written to the buffers 160 so as not to flood the error data buffers 160.
Turning now to
The error pattern analyzer 18 may require information about the “dirty” error data entries stored in the buffers 160 during the first ‘dirty’ interval 2103. In order to provide this information, in one embodiment the raw error data generator 15 generates a list of dirty data pointers 151. This list is written to a corresponding dirty data pointer FIFO 170. The pointers 151 may be, for example, in the form of the actual error data buffer addresses 161. In one embodiment, for every ‘dirty’ segment 223 of the received bit sequence that is saved in the buffer 160, a pair of pointers 151 is written to the FIFO 170, which identify the beginning and the end of the dirty data entries in the buffers 160. In one embodiment, the pair of pointers includes a pointer to the first dirty entry 165 in the error data buffer 160, which in the example of
The error pattern analyzer (EPA) 18 reads the bit error data stored in the error data buffers 160 and, in some embodiments, the ‘dirty data’ pointers 151 stored in FIFO 170. Based on these inputs, it associates detected bit errors with specific bit patterns and generates therefrom error pattern analysis information, as described hereinbelow. The error analysis processes implemented in EPA 18 can run either in parallel with the acquisition of the PRBS signals 31, i.e. in an online mode, or after the raw data acquisition is stopped, i.e. in an offline mode. EPA 18 may be implemented using software or hardware logic. In one embodiment, EPA 18 is implemented in software, i.e. is in the form of a set of computer executable instructions that are saved in a computer-readable memory and are executed by a digital processor.
The operation of EPA 18 will now be described at first with reference to a first PRBS signal 31 that is received by the first PRBS analyzer 1301 in the first lane, i=1, while the other (L−1) PRBS signals 31i, i=2, . . . , L, will be referred to as the second PRBS signals, and the corresponding bit sequences 32i, obtained therefrom by the PRBS analyzers 130i, i=2, . . . , L, will be referred to as the second bit sequences. It will be appreciated that the terms ‘first’ and ‘second’ do not imply a particular position of the respective lanes relative to other lanes in the multi-lane data link, but are simply labels that are used to distinguish between lanes and signals for the sake of clarity and convenience.
With reference to
Referring now to
Turning now also to
Referring again to
The total count of bit errors detected in the first lane corresponds to the number of entries in the bit level seeds list 183. The total number of different bit error patterns detected corresponds to the number of entries in the unique bit level seeds list 193. The top N bit error patterns correspond to the top N entries in the sorted bit level seeds list 193. The actual bit pattern 431 wherein the bit error occurred is generated by loading a PRBS generator 413 with the corresponding bit level PRBS seed value 185 or 186 and shifting it by a desired number of bits to the left and to the right so as to provide a bit error pattern of a width that is suitable for further analysis by a pattern analysis module 450 and/or as desired for displaying to a user. The resulting bit pattern 431, which corresponds to a segment of the received bit sequence 32 where the bit error occurred, is referred to herein as the bit error pattern 431. By way of example,
With reference to
In one embodiment, EPA 18 is further configured to perform an inter-lane word error pattern analysis wherein it identifies, in the received parallel multi-lane stream of L bit sequences 32, specific inter-lane word patterns that are more likely than other word patterns to be associated with bit errors. In this mode of operation, EPA 18 operates on error data in all L lanes wherein the parser 401 reads time-synchronous error data entries 165 from all L buffers 160, excluding entries in dirty segments. For each bit error detected in the first bit sequence 32 received in the first lane, a word error pattern identifier is generated. This word error pattern identifier uniquely identifies an inter-lane word that is composed of bits from each of the L bit sequences 32 that are synchronous with the bit error. Similarly to the bit error pattern analysis described hereinabove, top N most frequently occurring word error patterns may be identified and displayed to the user.
With reference to
Similarly to the bit error pattern analysis described hereinabove, the sorter 430 may be configured to identify word error entries 515 that appear multiple times, count the number of occurrences of unique word error entries 516 in the list 583, and order them in accordance with the frequency of their occurrences, for example in descending order of the word entry count 198. The total count of word errors detected corresponds to the number of entries 515 in the bit level seeds list 583. The total number of different word error patterns detected corresponds to the number of entries in the unique bit level seeds list 593. When the entries 516 are ordered in descending order of the count 198, the top N word error patterns correspond to the top N entries in the sorted word entries list 593.
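The counting and ordering performed by the sorter can be sketched with a standard frequency counter. In this hedged example each word error entry is assumed to be a tuple of L bit-level seeds (a single-lane bit error entry would be a plain integer seed); the function name `top_n_patterns` and the returned dictionary keys are hypothetical.

```python
from collections import Counter

def top_n_patterns(entries, n):
    """Summarize a list of error entries.

    entries: hashable identifiers, e.g. tuples of per-lane bit-level seeds.
    Returns the total error count, the number of distinct patterns, and the
    n most frequent patterns in descending order of occurrence.
    """
    counts = Counter(entries)
    return {
        "total_errors": len(entries),
        "unique_patterns": len(counts),
        "top": counts.most_common(n),
    }

# Usage: six word errors over two lanes, three distinct words.
summary = top_n_patterns([(3, 7), (1, 2), (3, 7), (3, 7), (1, 2), (9, 9)], 2)
```

Because a seed uniquely identifies a position in the PRBS, equal seeds imply equal surrounding bit patterns, so counting seeds is equivalent to counting patterns.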
The actual word pattern belonging to a set of corresponding seeds 185 is reconstructed by loading the set of L simulated PRBS generators 413 with the corresponding bit level seed values of all lanes as read from a word entry W(k) 515. This process is similar to the bit pattern reconstruction described hereinabove but generates correlated bit patterns for all lanes in parallel, with the bit error in the middle of the first bit pattern provided that the PRBS generators 413 are shifted +/−symmetrically. By way of example,
The process described hereinabove corresponds to performing, for each error data entry in the buffer 1601 of the first lane, the following sequence of steps:
a) find bit error offsets [k]=k1 . . . km from the error vector 132 read from the first buffer,
b) feed the PRBS generators 403 of all L lanes with the PRBS seeds 133 that are read from their respective buffers 160 at the same error data entry address;
c) shift all L PRBS generators by the same set of bit offsets k1 . . . km to obtain m rows of L bit-level PRBS seeds SEED(1, kj) . . . SEED(L, kj), j=1, . . . , m. The L bit-level PRBS seeds SEED(i, k), i=1, . . . , L, that are obtained by shifting the respective PRBS generators 403 by the same number of bits, for example k1, correspond to the same word, and may be written in the same row of the list 593.
The aforedescribed procedure identifies all word patterns having an error at the bit position corresponding to the first lane. In one embodiment, steps (a)-(c) may be repeated for any new bit error positions l1 . . . lm from the error vectors 132 that are read from the time-synchronous data entries 165 in the buffers 160 of all other lanes.
In one embodiment, the parser 401 may be configured to read all time-synchronous entries from the buffers 160, identify bit error offsets k in each of the L error vectors 132 and compose a list [k] of all bit offsets k that are encountered at least once in the L time-synchronous error vectors, and then perform steps (b) and (c) for each offset from the list [k]. The resulting M rows of L bit-level PRBS seeds {SEED(1, kj) . . . SEED(L, kj)}, j=1, . . . , M, define all word errors encountered in a particular clock cycle of the PRBS analyzers 130.
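The offset collection and seed shifting described in the preceding steps can be sketched as follows. For brevity the sketch assumes a PRBS7 register with G(x) = 1 + x^6 + x^7 rather than PRBS31, and assumes that bit k of the error vector integer corresponds to bit offset k; the names `shift_seed`, `error_offsets` and `word_seeds` are hypothetical.

```python
K = 7                    # PRBS7 chosen only to keep the example small
MASK = (1 << K) - 1

def shift_seed(seed, k):
    """Advance a PRBS7 state by k bits (bit[n] = bit[n-6] XOR bit[n-7])."""
    for _ in range(k):
        new_bit = ((seed >> 5) ^ (seed >> 6)) & 1
        seed = ((seed << 1) | new_bit) & MASK
    return seed

def error_offsets(error_vector, width):
    """Step (a): offsets of all set bits in one error vector."""
    return [k for k in range(width) if (error_vector >> k) & 1]

def word_seeds(entries, width):
    """entries: time-synchronous (error_vector, seed) pairs, one per lane.

    Collects every offset that is in error on at least one lane, then
    shifts the seeds of all lanes by that same offset (steps (b), (c)).
    Each returned row holds the L bit-level seeds of one word error.
    """
    offsets = sorted({k for ev, _ in entries for k in error_offsets(ev, width)})
    return [tuple(shift_seed(seed, k) for _, seed in entries) for k in offsets]

rows = word_seeds([(0b0101, 0x11), (0b0100, 0x22)], width=4)
```

Shifting every lane by the same offset is what keeps the seeds in one row time-synchronous, so that each row identifies a single inter-lane word.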
In one embodiment, EPA 18 may be configured to display top N most frequently occurring word error patterns to a user.
In one embodiment, EPA 18 further includes a bit slip detector functionality that is configured to analyze the ‘dirty data’ segment 223 saved in the buffer 160 in order to check for bit slips and to find the bit slip pattern, i.e. a bit pattern that caused the bit slip in the PRBS analyzer 130. Referring again to
The bit slip detection functionality of EPA 18 will now be described with reference to
The method starts at step 501, wherein the first entry in the dirty data segment 223 is selected by the parser 401. In this step, the error data parser 401 reads the dirty data pointer 151, which points to the address ‘A1’ of the first entry in the dirty data segment 223. In order to verify that the excess bit errors in the dirty data segment 223 were caused by a bit slip, at step 502 a bit pattern of a suitable length from the error vector 1321 of the first entry in the dirty segment 223 at address A1 is used as the seed of the PRBS generator 403. The PRBS bit pattern generated by this generator is sent to the bit slip detector 440, which compares it to the error vector bit pattern 1321 stored in the dirty segment 223. Since the generated PRBS is in phase with the error vector 1321, the comparison may start with the portion of the error vector 1321 which was used to seed the PRBS generator. The number of bits compared should be at least equal to the width of the PRBS seed, i.e. to the degree of the polynomial of the PRBS generator. However, more bits can be compared for added reliability, since the “dirty” data may have been caused not by a slip but by a long error burst. In one embodiment, all bits until the end of the “dirty” segment 223 are compared. If both patterns match at step 503, then the dirty data entry at the address A1 was caused by a bit slip, and the error data buffer entry {1321, 1331} stored as the first entry A1 of the dirty data segment 223 is used to recover the bit level seed 185 of the bit pattern causing the bit slip. This is done by loading in step 505 the PRBS generator 403 with the stored PRBS seed 1331 and shifting it by the number of bits that is equal to the offset of the first bit error within the error vector 1321, which results in the generation of a bit-level bit slip corrected PRBS seed 525. By feeding this bit-level PRBS seed 525 to the PRBS generator 413 in step 507, a bit slip pattern 535 may be generated.
If in step 503 the patterns do not match, at step 504 a next entry in the dirty data segment 223 is selected, and the check is repeated with the 2nd error vector 1322 from the second entry in the dirty data segment 223, and so on until the end ‘AK’ of the dirty data segment 223 is reached. This mechanism accounts for multi-bit slips. If no matching patterns can be found until the end of the segment 223 is reached, it is assumed that the dirty data was not caused by a slip. If matching patterns are found in step 503 a bit slip is assumed.
By going through all dirty data segments, a bit level PRBS seed list similar to the list 183 in the aforedescribed bit error pattern analysis is built. This list contains one entry for every bit slip. Further processing and analysis of this list is similar to bit error pattern analysis, and may include identifying top N most frequently encountered bit slip patterns, and generating signal characteristics therefor.
The aforedescribed bit slip verification approach is based upon the PRBS property that, in the case of a bit slip, the bit error pattern, i.e. the pattern of ‘1’s and ‘0’s in the error vector 132, is also a PRBS of the same type as the original PRBS. Therefore, when the PRBS generator is seeded with the bit error pattern from the error vector 132 and the resulting bit sequence generated by the PRBS generator is identical to the bit error pattern stored in the buffer 160, the bit errors are due to a bit slip.
The number of consecutive bits from the error vector 132 that are used to seed the PRBS generator in step 502 is defined by the order of the PRBS generator, or in other words by the width of the PRBS seed. By way of example, for a PRBS31, 31 consecutive bits from the error vector 132 have to be used as a PRBS seed in step 502, while for a PRBS7, 7 consecutive bits from the error vector 132 have to be used. For an implementation where the width of the error vector 132 is smaller than the width of the PRBS seed, two or more error vectors are concatenated to get these bits.
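The bit slip verification principle can be illustrated with a short sketch. It assumes a PRBS7 with G(x) = 1 + x^6 + x^7 for compactness, represents the error vector as a list of bits, and assumes the first K error bits are not all zero; the names `prbs_extend` and `looks_like_slip` are hypothetical and this is not presented as the implementation of the bit slip detector 440.

```python
K = 7             # PRBS7: seed width 7 (a PRBS31 would use K = 31)
TAPS = (6, 7)     # bit[n] = bit[n-6] XOR bit[n-7]

def prbs_extend(first_bits, n):
    """Extend K known bits to n bits using the PRBS recurrence."""
    bits = list(first_bits)
    while len(bits) < n:
        bits.append(bits[-TAPS[0]] ^ bits[-TAPS[1]])
    return bits[:n]

def looks_like_slip(error_bits):
    """Seed a generator with the first K error bits and compare the
    regenerated sequence with the whole stored error pattern."""
    if len(error_bits) <= K or not any(error_bits[:K]):
        return False
    return prbs_extend(error_bits[:K], len(error_bits)) == list(error_bits)

# A one-bit slip: the received stream is the reference advanced by one bit.
tx = prbs_extend([1, 0, 0, 0, 0, 0, 0], 200)
slip_errors = [a ^ b for a, b in zip(tx[1:], tx[:-1])]
burst_errors = [1, 1, 0, 1, 0, 0, 1] + [0] * 50   # isolated error burst
```

Here `looks_like_slip(slip_errors)` holds because the XOR of two shifted copies of the same PRBS obeys the same linear recurrence, whereas the burst pattern fails the comparison once more than K bits are checked.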
Bit slip word pattern analysis is similar to bit slip bit pattern analysis. However, as with the word error pattern analysis described hereinabove, not only are the bit level seeds of the analyzed lane wherein the bit slip occurred generated, but also the corresponding time-synchronous bit level seeds of all the other lanes. The process used is similar to the word pattern analysis described hereinabove. Further processing of the bit slip word patterns is similar to the word error pattern analysis.
In one embodiment, EPA 18 includes a bit slip & bit error pattern characterization (BSBEPC) module 450, which is also referred to herein simply as a characterization module 450 and which includes logic for determining one or more signal characteristics for the bit error & slip patterns and the word error & slip patterns. Examples of the signal characteristics that can be computed include baseline wander, transition density, and transition density wander. The baseline wander represents a variation of a DC component of a signal over time and is obtained by computing a running average of a bit pattern with an averaging window several bits wide. The transition density is an average number of bit transitions between logical ‘1’ and ‘0’ for a window P>1 bits wide; it can be computed by dividing the number of bit transitions that occur over a window P bits wide by the number of bits P in the window. The transition density wander, which is also referred to as clock wander, is a low pass filtered deviation of the transition density from a long time average.
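The three characteristics defined above can be sketched as simple sliding-window computations. Several details here are assumptions rather than part of the described embodiment: bits are mapped to levels of -1 and +1 before averaging, the long time average is taken over the supplied pattern, and a plain moving average stands in for the low pass filter. The function names are hypothetical.

```python
def baseline_wander(bits, window):
    """Running average of the pattern (0 -> -1, 1 -> +1) over `window` bits."""
    levels = [2 * b - 1 for b in bits]
    return [sum(levels[i:i + window]) / window
            for i in range(len(bits) - window + 1)]

def transition_density(bits, window):
    """Transitions inside each window of `window` bits, divided by `window`."""
    trans = [a ^ b for a, b in zip(bits, bits[1:])]
    return [sum(trans[i:i + window - 1]) / window
            for i in range(len(bits) - window + 1)]

def transition_density_wander(bits, window, smooth):
    """Deviation of the transition density from its mean, smoothed by a
    moving average of length `smooth` (assumed low pass filter)."""
    td = transition_density(bits, window)
    mean = sum(td) / len(td)
    dev = [x - mean for x in td]
    return [sum(dev[i:i + smooth]) / smooth
            for i in range(len(dev) - smooth + 1)]
```

For a pattern such as 1,1,1,1,0,0,0,0 with a window of 4 bits, the baseline wander moves from +1 to -1 across the pattern, which is the kind of excursion that may be correlated with error positions.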
Accordingly, embodiments of the characterization module 450 may include one or more of the following modules: a baseline wander computing module 431 which computes the baseline wander characteristic for a bit error pattern provided thereto from a PRBS generator 413 and a transition density wander computing module 431 for a bit error pattern, or a bit slip pattern, or a word error pattern or a word bit slip pattern provided thereto from one or more PRBS generators 413. Methods and algorithms for computing the transition density, transition density wander and baseline wander from a given bit pattern are known in the art and are described, for example, in a publication of the Optical Internetworking Forum (OIF) “CEI Short Stress Patterns White Paper”, by Pete Anslow et al, which is available from the OIF website “oiforum.com”, which is included herein by reference.
With reference to
In one embodiment the characterization module 450 may further include logic 433 for computing transition probability versus bit slip or bit error position, which can then be displayed to a user. The transition probability is defined as the probability of a lane to transition between ‘1’ and ‘0’ when a bit error occurs. It may be computed, for example, by counting the number of transitions between ‘1’ and ‘0’ at the position of the bit error for all bit error patterns of a lane, and then dividing the transition count by the number of error patterns. This is done for the lane in error itself as well as for any other lanes in the link. By repeating this procedure for bit positions in the vicinity of the bit in error, a transition probability curve is obtained for any lane at and in the vicinity of the bit in error.
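The counting procedure described above can be sketched as follows. The sketch assumes that all reconstructed patterns of a lane have equal length with the bit in error at the same index, and that a "transition at a position" means the bit differs from the bit immediately preceding it; both are illustrative assumptions, as is the function name `transition_probability`.

```python
def transition_probability(patterns, error_index, span):
    """Transition probability at and around the bit in error.

    patterns: equal-length bit lists for one lane, one per detected error.
    Returns {offset: probability} for offsets -span .. +span relative to
    the error position. Requires error_index - span >= 1.
    """
    curve = {}
    for offset in range(-span, span + 1):
        pos = error_index + offset
        transitions = sum(1 for p in patterns if p[pos] != p[pos - 1])
        curve[offset] = transitions / len(patterns)
    return curve

patterns = [
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
curve = transition_probability(patterns, error_index=2, span=1)
```

Running the same function on the time-synchronous patterns of a neighboring lane gives the curve for that lane, which may help to reveal inter-lane effects such as crosstalk.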
In one embodiment, the characterization module 450 may further include logic 434 for generating bit error and bit slip probability histograms versus any of the derived signal characteristics. Examples are histograms for bit error or bit slip probability versus transition density wander, versus baseline wander or versus transition density.
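A histogram of this kind can be sketched by binning the value of a signal characteristic sampled at each error position. This example only computes the relative frequency of errors per bin; the binning scheme, the function name `error_histogram` and its parameters are assumptions made for illustration.

```python
def error_histogram(values, lo, hi, bins):
    """Relative frequency of errors per bin of a signal characteristic.

    values: the characteristic (e.g. baseline wander) evaluated at the
    position of each detected error; values outside [lo, hi] are ignored.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            index = min(int((v - lo) / width), bins - 1)  # hi falls in last bin
            counts[index] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

hist = error_histogram([-1.0, -0.9, 0.0, 0.9, 1.0, 1.0], lo=-1.0, hi=1.0, bins=4)
```

In this toy data set the errors cluster in the outermost bins, the kind of shape that would suggest a correlation between errors and baseline wander peaks.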
The aforedescribed method and device for analyzing bit error and bit slip patterns provide an indirect, non-intrusive means of bit error and bit slip diagnosis and analysis, wherein bit and word patterns that are likely to cause errors and bit slips in the receiver are identified and analyzed. The method uses a single or multi-lane bit error rate test set (BERT) which is augmented by an error pattern analyzer. Pseudo-random bit sequences (PRBS) are used as test signals. The method provides a number of advantages over previous approaches, including the following: a) root causes of bit errors and bit slips which occur in the DUT may be identified without the need to tap signals from inside the DUT; b) every error is captured, even if its rate of occurrence is very low; c) it is applicable to data links that are not directly accessible, e.g. links inside a component, and to a number of topologies, e.g. point-to-point, loopback; d) it does not require high bandwidth analog measurement equipment; e) it enables identification of which portion of the signal is actually causing errors, as the instrument's BERT receiver is directly detecting the errors; f) it provides error pattern analysis results that can be directly mapped to link features and parameters.
For example, if bit errors correlate with both positive and negative baseline wander peaks, it may signal that the bandwidth of AC coupling in the DUT is too high. If bit errors correlate with either positive or negative baseline wander peaks, it may signal that the input to an amplifier in the DUT is incorrectly biased, or that a slicer level in the receiver is not set to an optimum value. If bit errors occur for single ones/zeros embedded in longer blocks of zeros/ones, i.e. where the transition probability is low, it may signal that there is not enough bandwidth, that the DUT includes a receiver with too little equalization, or that de-emphasis in a transmitter in the DUT is insufficient. If bit errors/slips correlate with peaks in the transition density wander, it may signal that the CDR control loop bandwidth is too high, or that CDR phase noise is too high. If bit errors correlate with either positive or negative transition density wander peaks, it may signal that the CDR sampling phase is not set to an optimum value.
The above-described exemplary embodiments are intended to be illustrative in all respects, rather than restrictive, of the present invention. For example, although embodiments of the invention have been described hereinabove with reference to a multi-lane error pattern analyzer, a single-lane error pattern analyzer and method is also within the scope of the present invention. Furthermore, although the operation of the error pattern analyzer of the present invention has been described with reference to NRZ signals, it will be appreciated that the method of the present invention is also applicable to other modulation formats such as RZ and PAM-4. Thus the present invention is capable of many variations in detailed implementation that can be derived from the description contained herein by a person skilled in the art. All such variations and modifications are considered to be within the scope and spirit of the present invention as defined by the following claims.
The present invention claims priority from U.S. Patent Application No. 61/774,427 filed Mar. 7, 2013, which is incorporated herein by reference.
Number | Date | Country
---|---|---
61774427 | Mar 2013 | US