Many measurement instruments need to sample and convert an analog input signal into a digital form before further processing can be performed. Examples of measurement instruments include, but are not limited to: oscilloscopes, network analyzers, spectrum analyzers, signal analyzers, protocol analyzers, printed circuit board testers, atomic force microscopes, frequency counters, time-domain reflectometers, mass spectrometers, liquid or gas chromatographs, power analyzers, data acquisition cards, ultrasonographs, optical distributed temperature sensing systems, polarization analyzers, digital communications analyzers, and jitter analyzers. The conversion from analog to digital form is typically accomplished by an analog-to-digital converter (ADC), sometimes also referred to as a “data converter”. As the speed of ADCs increases, the amount of digital data that needs to be stored into memory gets larger. Processing these large amounts of digital data also takes longer.
The data samples (16, 36) in the systems of
The ADC 62 receives an analog signal at input 66 and converts it into parallel digital data samples at outputs 68. There are a total of N outputs 68. During each cycle of the ADC 62, N data samples are presented at substantially the same time, in parallel, at the outputs 68. Each of the outputs 68 is represented by a single line in the figure, but it should be noted that each single line 68 may also represent a multi-line bus of M bits. For example, each line 68 may represent an 8-line bus for representing an 8-bit digital data sample. Each single line 68 may also carry serial output. The ellipses ( . . . ) in
The N outputs 68 are received by parallel computation block 64. The outputs 68 from the ADC 62 will be referred to hereinafter as ADC links x0 through xN−1. The parallel computation block 64 can be implemented using one or more field programmable gate arrays (FPGAs), programmable logic devices (PLDs), dedicated hardware such as an application specific integrated circuit (ASIC), digital signal processors (DSPs), microprocessors, or in software running on one or more processors, or any combination thereof. The parallel computation block 64 pre-processes the data samples carried on each of the ADC communication links x0 through xN−1. Then, the processed data is stored into a memory 65, where a processor 67 can access the data for further computations. Alternatively, the processed data from the parallel computation block 64 may be passed on to other computation elements before being stored into memory 65.
In one embodiment, each computation sub-block 70 is implemented with its own dedicated device, e.g. its own dedicated FPGA, PLD, ASIC, DSP, or microprocessor. When the computation sub-block 70 is implemented with its own FPGA, PLD, DSP, microprocessor, or other reprogrammable device, the computations performed by the computation sub-block 70 can be changed in mid-operation of the measurement instrument.
Various computations can be pre-processed using a parallel computation block 64. In one embodiment, a parallel computation block computes a histogram on the data samples received on the ADC links x0 through xN−1. To compute a histogram, a set of data samples is sorted by value into groups, or “buckets”. The number of data samples that fall within each bucket is counted. Each bucket in a histogram counts a unique value (or a unique range of values).
It should be noted that the word “histogram” is often used to refer to a visual graph that represents a frequency distribution of values in a set of data. However, the word “histogram” will be used in this application to refer to the actual raw counts of data that are used to produce such a visual graph.
Each sub-histogram 92 computes a histogram on the data samples it receives on its own ADC link, xr, where r is an integer such that 0≦r≦N−1. The sub-histograms 92 are also indexed by r, such that the rth sub-histogram 92 receives the ADC link xr. Let M be the number of “buckets” desired in the histogram. For example, M can be the number of discretization levels that are produced by the ADC 62. Each counter in each sub-histogram can be uniquely identified as Cr,i, where 0≦r≦N−1 and 0≦i≦M−1.
Each sub-histogram has M counter outputs, Cr,0 through Cr,M−1. Each counter Cr,0 through Cr,M−1 in a sub-histogram counts a unique data value x (or a unique range of data values), where x is any data value that can be generated by the ADC 62. The data value(s) counted by each counter will depend on the application. Counters across sub-histograms that share a common index i count the same value(s). For example, counter C1,1 counts the same value(s) as counters C2,1 and C3,1; counter C1,2 counts the same value(s) as counters C2,2, C3,2, etc. Let Cmax be the maximum count that can be made without overflow by any counter Cr,0 through Cr,M−1.
Each time a data sample with a value x is received by a sub-histogram 92, the sub-histogram 92 increments the appropriate counter (Cr,0 through Cr,M−1) that counts the value x. A clear signal 100 is provided that signals the sub-histograms 92 to reset the counters of the sub-histogram 92 to zero to begin counting data values again. A read signal 102 is provided to latch and transmit the values from the sub-histogram 92 to the histogram adder 94.
When the read signal 102 is asserted, the outputs from the sub-histograms 92 are added and combined in the histogram adder 94 according to the following equation to produce a histogram Hi of sample values:

$$H_i = \sum_{r=0}^{N-1} C_{r,i} \tag{1}$$
where Cr,i represents the output of the ith counter from the rth sub-histogram 92. Hi is the number of times since the last assertion of the clear signal 100 that the sample value counted by counters having the index i was obtained from the ADC 62. The histogram adder 94 can be implemented with a number of adders that add up the counters Cr,i for r=0 to r=N−1 for each given i from i=0 to i=M−1.
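The sub-histogram and adder arrangement described above can be sketched in software. This is a minimal illustration, not the hardware implementation: the function name `parallel_histogram`, the `cycles` input layout, and the choice that bucket i counts exactly the sample value i are all assumptions made for the example.

```python
from typing import List


def parallel_histogram(cycles: List[List[int]], num_buckets: int) -> List[int]:
    """Compute H_i from N per-link sub-histograms.

    cycles[t][r] is the sample presented on ADC link x_r during ADC cycle t.
    Assumption: bucket i counts exactly the sample value i.
    """
    num_links = len(cycles[0])
    # counters[r][i] plays the role of counter C_{r,i} in sub-histogram r.
    counters = [[0] * num_buckets for _ in range(num_links)]

    for samples in cycles:
        # In hardware the N increments of one cycle happen at the same time;
        # here they are simply performed one after another.
        for r, x in enumerate(samples):
            counters[r][x] += 1

    # Histogram adder: H_i is the sum over r of C_{r,i}.
    return [
        sum(counters[r][i] for r in range(num_links))
        for i in range(num_buckets)
    ]


if __name__ == "__main__":
    # Two ADC cycles on N = 4 links, M = 4 buckets.
    print(parallel_histogram([[0, 1, 1, 3], [2, 1, 0, 3]], num_buckets=4))
```

Each sub-histogram only ever touches its own row of counters, which is what allows the N increments to proceed independently until the read signal combines them.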
Note that the clear signal 100 and the read signal 102 should be asserted at least once every Cmax samples in order not to lose any data samples and to prevent the counters in the sub-histograms 92 from overflowing or rolling over. This does not limit the number of samples that can be accumulated in a histogram, however. To create longer histograms, simply add together multiple histograms Hi in memory.
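The read, clear, and accumulate cycle described above might look as follows in software. This is a sketch under stated assumptions: `accumulate_long_histogram`, `c_max`, and the policy of reading exactly every `c_max` ADC cycles are illustrative choices, and each sub-histogram is assumed to receive one sample per ADC cycle, so no counter can grow by more than one per cycle.

```python
from typing import List


def accumulate_long_histogram(
    cycles: List[List[int]], num_buckets: int, c_max: int
) -> List[int]:
    """Build a histogram longer than the counters alone could hold.

    cycles[t][r] is the sample on ADC link x_r during cycle t.
    c_max is the largest count a hardware counter holds without overflow.
    """
    num_links = len(cycles[0])
    counters = [[0] * num_buckets for _ in range(num_links)]
    total = [0] * num_buckets  # running sum of histograms H_i kept in memory

    def read_and_clear() -> None:
        # Read signal: latch the counters and add them into the total.
        for i in range(num_buckets):
            total[i] += sum(counters[r][i] for r in range(num_links))
        # Clear signal: reset every counter to zero.
        for r in range(num_links):
            counters[r] = [0] * num_buckets

    cycles_since_clear = 0
    for samples in cycles:
        for r, x in enumerate(samples):
            counters[r][x] += 1
            if counters[r][x] > c_max:
                raise OverflowError("counter exceeded c_max")
        cycles_since_clear += 1
        if cycles_since_clear == c_max:
            read_and_clear()
            cycles_since_clear = 0

    read_and_clear()  # flush whatever remains in the counters
    return total


if __name__ == "__main__":
    # Ten cycles of the same value on two links, with tiny 3-count counters.
    print(accumulate_long_histogram([[1, 1]] * 10, num_buckets=2, c_max=3))
```

The total held in memory can use a much wider integer type than the hardware counters, which is why the overall histogram length is not bounded by Cmax.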
The parallel histogram 90 can be designed so that no samples from the ADC are ever omitted. The clear signal 100 and read signal 102 are combined so that the counter values are latched and the counters immediately cleared in between the arrival of two consecutive data samples on the ADC links. This requires the use of counters and adders that are fast enough to respond in the time between arrival of consecutive data samples.
In one embodiment, the parallel computation block computes a portion of a Discrete Fourier Transform in parallel on the data samples received from the ADC links x0 through xN−1. It is well-known that the Discrete Fourier Transform (DFT) of an array X of n real or complex numbers is another array Y of n complex numbers, expressed by the following equations:

$$Y[k] = \sum_{j=0}^{n-1} X[j]\,\omega_n^{jk}, \qquad 0 \le k < n \tag{2}$$

where

$$\omega_n = e^{-2\pi\sqrt{-1}/n} \tag{3}$$
Equation 2 can also be rewritten using the following decomposition:
Let

$$n = n_1 n_2, \tag{4}$$

$$j = j_1 n_2 + j_2, \text{ and} \tag{5}$$

$$k = k_1 + k_2 n_1. \tag{6}$$
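Then equation (2) can be rewritten as follows:

$$Y[k_1 + k_2 n_1] = \sum_{j_2=0}^{n_2-1} \left[ \left( \sum_{j_1=0}^{n_1-1} X[j_1 n_2 + j_2]\,\omega_{n_1}^{j_1 k_1} \right) \omega_n^{j_2 k_1} \right] \omega_{n_2}^{j_2 k_2} \tag{7}$$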
See, for example, Matteo Frigo and Steven G. Johnson, “The Design and Implementation of FFTW3”, Proceedings of the IEEE vol. 93, no. 2, pp 216-231 (2005), especially equation (2) of the paper.
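Notice that there are actually two Fourier Transforms in equation (7): an inner one indexed by j1, and an outer one indexed by j2, with multiplication by a so-called “twiddle factor” of $\omega_n^{j_2 k_1}$ in between. The inner Fourier Transform (“inner FT”), with the twiddle factor folded in, can be written as follows:

$$Z_{j_2}[k_1] = \omega_n^{j_2 k_1} \sum_{j_1=0}^{n_1-1} X[j_1 n_2 + j_2]\,\omega_{n_1}^{j_1 k_1} \tag{8}$$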
The outer Fourier Transform (“outer FT”) can then be rewritten as follows:

$$Y[k_1 + k_2 n_1] = \sum_{j_2=0}^{n_2-1} Z_{j_2}[k_1]\,\omega_{n_2}^{j_2 k_2} \tag{9}$$
As above, let N be the number of ADC links. Let Nn1 be the number of complex multiply and accumulate (MAC) operations that may be performed in parallel. n1 will generally be chosen on the basis of cost: the larger n1 is, the faster and more expensive the implementation. Let n2=n/n1. The inner FT in equation (8) can be applied in parallel to n/n2 sets of data arriving on the N ADC links x0 through xN−1. The array Zj
This system can save time over a standard Fast Fourier Transform (FFT) algorithm by computing the inner FT in parallel. The standard FFT algorithm takes an amount of time proportional to n log n (all logarithms used here are in base 2). The computation of the inner FT in equation (8) will be complete after a constant delay, namely the amount of time required to accumulate the final samples into the dot products of equation (8). Then the outer FT in equation (9) will take an additional amount of time proportional to n2 log n2. Therefore, the system will speed up the calculation of a Discrete Fourier Transform by up to

$$\frac{n \log n}{n_2 \log n_2}$$

times. The speed-up factor is smaller for the same number n when smaller n1 and m are chosen to reduce hardware costs.
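As a rough numerical illustration of this bound, the ratio of n log n to n2 log n2 can be evaluated directly. The helper name `speedup_bound` is hypothetical, and the result ignores the constant inner-FT delay and any constant factors, so it is an upper estimate rather than a measured figure.

```python
import math


def speedup_bound(n: int, n1: int) -> float:
    """Upper bound (n log2 n) / (n2 log2 n2) on the speed-up, with n2 = n / n1."""
    if n % n1 != 0:
        raise ValueError("n1 must divide n")
    n2 = n // n1
    if n2 < 2:
        raise ValueError("n2 must be at least 2 so that log2(n2) is non-zero")
    return (n * math.log2(n)) / (n2 * math.log2(n2))


if __name__ == "__main__":
    # For n = 1024, a larger n1 (more parallel hardware) gives a larger bound.
    for n1 in (4, 32):
        print(n1, speedup_bound(1024, n1))
```

For n = 1024, choosing n1 = 32 gives a bound of 64, while n1 = 4 gives only 5, which matches the remark that a smaller n1 reduces the speed-up.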
At time t, the value X[n2t+r] is transmitted from the ADC to the parallel FT 110 on ADC link xr, for t=0,1,2, . . . , n1−1.
Refer back to equation (8). Let j1 be the index for time t ranging from 0 to n1−1; let j2 be the index for the inner FT sub-blocks 112, which ranges from 0 to n2−1; let k1 range from 0 to n1−1. As each value arrives on an ADC link xr, it is sent to each Z-calculator 116.
For example, consider the first Z-element Z0[k1], which is calculated by the first inner FT sub-block 112. The first term in the first Z-element Z0[k1] is Z0[0], which is calculated by the Z-calculator 116A in
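$$Z_0[0] = \omega_n^{0 \cdot 0} \left( \left( X[0 \cdot n_2 + 0]\,\omega_{n_1}^{0 \cdot 0} \right) + \left( X[1 \cdot n_2 + 0]\,\omega_{n_1}^{1 \cdot 0} \right) + \cdots + \left( X[(n_1 - 1) \cdot n_2 + 0]\,\omega_{n_1}^{(n_1 - 1) \cdot 0} \right) \right)$$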
Notice that as soon as each value arrives on the ADC link x0, it can be immediately plugged into calculations for the relevant product term X[j1n2+j2]·ωn
As another example, consider the second term Z0[1] in the first Z-element Z0[k1]. Z0[1] is calculated by a Z-calculator that is not explicitly shown in
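$$Z_0[1] = \omega_n^{0 \cdot 1} \left( \left( X[0 \cdot n_2 + 0]\,\omega_{n_1}^{0 \cdot 1} \right) + \left( X[1 \cdot n_2 + 0]\,\omega_{n_1}^{1 \cdot 1} \right) + \cdots + \left( X[(n_1 - 1) \cdot n_2 + 0]\,\omega_{n_1}^{(n_1 - 1) \cdot 1} \right) \right)$$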
As each value arrives on the ADC link x0, the calculations for Z0[1] can be done in parallel with the calculations for Z0[0], and with all of the calculations for Z0[k1]. Additionally, all of the calculations for Z0[k1] are done in parallel with the calculations of Z1[k1], Z2[k1], and the other Zj
Then the final values of Z0[0], . . . Z0[n1−1],Z1[0], . . . , Z1[n1−1], . . . , ZN−1[0], . . . , ZN−1[n1−1] are read out of the Z-calculators 116 and into a memory 113. The memory 113 is accessible by computation device 114, which computes the outer Fourier Transform Y according to equation (9). It is well-known in the art how to compute equation (9) using a Fast Fourier Transform.
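The streaming accumulation in the Z-calculators, followed by the outer transform, can be checked numerically with a short sketch. Several things here are assumptions for illustration: the names `parallel_ft` and `direct_dft`, the sign convention ω_m = exp(−2π√−1/m), taking the number of ADC links equal to n2 so that link r carries X[n2·t + r] at time t, and evaluating the outer transform by a direct sum rather than an FFT.

```python
import cmath
from typing import List, Sequence


def omega(m: int, exponent: int) -> complex:
    """Return omega_m ** exponent, assuming omega_m = exp(-2*pi*sqrt(-1)/m)."""
    return cmath.exp(-2j * cmath.pi * exponent / m)


def parallel_ft(x: Sequence[complex], n1: int, n2: int) -> List[complex]:
    """DFT of x (length n1*n2) via streamed inner sums and an outer transform."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # z[j2][k1] is the accumulator of one Z-calculator.
    z = [[0j] * n1 for _ in range(n2)]

    # Inner FT: at time t (= j1), link r (= j2) delivers X[n2*t + r].
    # Every Z-calculator on that link performs one multiply-accumulate.
    for t in range(n1):
        for r in range(n2):
            sample = x[n2 * t + r]
            for k1 in range(n1):
                z[r][k1] += sample * omega(n1, t * k1)

    # Twiddle factor omega_n^(j2*k1), applied once all samples have arrived.
    for j2 in range(n2):
        for k1 in range(n1):
            z[j2][k1] *= omega(n, j2 * k1)

    # Outer FT over j2 (direct sum here for clarity; an FFT could be used).
    y = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            y[k1 + k2 * n1] = sum(
                z[j2][k1] * omega(n2, j2 * k2) for j2 in range(n2)
            )
    return y


def direct_dft(x: Sequence[complex]) -> List[complex]:
    """Reference DFT computed straight from the defining sum."""
    n = len(x)
    return [sum(x[j] * omega(n, j * k) for j in range(n)) for k in range(n)]


if __name__ == "__main__":
    data = [complex(v) for v in range(12)]
    fast = parallel_ft(data, n1=3, n2=4)
    ref = direct_dft(data)
    print(max(abs(a - b) for a, b in zip(fast, ref)))
```

The innermost loop over k1 is the part that the Z-calculators perform simultaneously in hardware; in this sketch it runs sequentially, so only the arithmetic, not the timing, is reproduced.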
In one embodiment, the parallel computation block is used to filter the data received from the ADC links x0 through xN−1. Such filtering may need to be done for any number of reasons: for calibration correction (including correction of timing and digitization level inaccuracies within the ADC), for impedance mismatch correction between the instrument probe and the signal line being probed, for estimation of the signal shape at a location removed from the probing point, and many other reasons.
The parallel FIR filter 120 in
Only a few selected filter sub-blocks 122 are outlined in
The samples x(0), x(N), x(2N), . . . etc. arrive on ADC link x0. The samples x(1), x(N+1), x(2N+1), . . . etc. arrive on ADC link x1, and so on and so forth. Therefore, the samples x(r), x(N+r), x(2N+r), . . . etc. arrive on ADC link xr for 0≦r≦N−1. Similarly, the filter results y(r), y(N+r), y(2N+r), . . . etc. are generated on the outputs labeled yr for 0≦r≦N−1.
Refer now to the rth filter sub-block 122r in
Filter sub-block 122r receives a sequence of data samples x(r), x(N+r), x(2N+r), . . . etc. on ADC link xr. Filter sub-block 122r also receives a second sequence of data samples x(r+1), x(N+r+1), x(2N+r+1), . . . etc. on ADC link xr+1. For now, consider just the first data sample in each of these two sequences, namely x(r) and x(r+1). Each of these data samples, x(r) and x(r+1), is sent through a delay element 124. After passing through the delay elements 124, each data sample is multiplied with its respective filter coefficient. Multipliers 126 are used to multiply x(r) by a0, and x(r+1) by a1. The products a0·x(r) and a1·x(r+1) are summed together by an adder 128 to generate y(r). The remaining data samples in the sequences are processed in the same manner.
Refer now to the (N−1)th filter sub-block 122N−1 in
The (N−1)th filter sub-block 122N−1 is able to receive both x(N−1) and x(N) because there is no delay element 124 in the path of the look-ahead link 130, while there is a delay element 124 in the path of ADC link xN−1. The data samples are then multiplied with their respective filter coefficients. Multipliers 126 are used to multiply x(N−1) by a0, and x(N) by a1. The products a0·x(N−1) and a1·x(N) are summed together by an adder 128 to generate y(N−1). The remaining data samples in the sequences are processed in the same manner.
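The two-tap case just described, including the undelayed look-ahead path into the last sub-block, can be sketched as follows. The function name `parallel_fir_2tap` and the `cycles` layout (cycles[k][r] holding x(kN + r)) are assumptions for illustration; the delay elements are modeled by holding the previous ADC cycle in a variable.

```python
from typing import List


def parallel_fir_2tap(
    cycles: List[List[float]], a0: float, a1: float
) -> List[List[float]]:
    """Two-tap parallel FIR: y(m) = a0*x(m) + a1*x(m+1).

    cycles[k][r] is the sample x(k*N + r) arriving on ADC link x_r.
    Outputs for cycle k are produced once cycle k+1 has arrived.
    """
    num_links = len(cycles[0])
    outputs = []
    delayed = None  # contents of the delay elements (the previous cycle)

    for current in cycles:
        if delayed is not None:
            y = []
            for r in range(num_links):
                if r < num_links - 1:
                    # Both inputs come through delay elements.
                    nxt = delayed[r + 1]
                else:
                    # Last sub-block: the look-ahead link taps link x_0
                    # without a delay, so it sees the next cycle's sample.
                    nxt = current[0]
                y.append(a0 * delayed[r] + a1 * nxt)
            outputs.append(y)
        delayed = current
    return outputs


if __name__ == "__main__":
    # N = 4 links, samples x(0)..x(7), coefficients a0 = 1 and a1 = 10.
    print(parallel_fir_2tap([[0, 1, 2, 3], [4, 5, 6, 7]], 1, 10))
```

The final cycle of input produces no output in this sketch because the sample it would need from the following cycle has not yet arrived.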
Expanding the number of taps M up to N is straightforward. There will be M−1 of the filter output computations taking inputs from the ADC links x0 through xN−1. If 2N≧M>N, then a second layer of delay elements 124 must be added to the ADC links. If 3N≧M>2N, then a third layer of delay elements 124 must be added, and so on and so forth.
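A software sketch of the M-tap generalization is given below. It assumes the forward-looking form y(m) = a0·x(m) + a1·x(m+1) + … + a(M−1)·x(m+M−1), which extends the two-tap case, and it models the delay elements with a sliding window of buffered ADC cycles. The name `parallel_fir` and the window depth used here (just enough cycles for the arithmetic) are choices of this sketch and are not meant to reproduce the exact delay-layer count of the hardware.

```python
from collections import deque
from typing import List, Sequence


def parallel_fir(
    cycles: List[List[float]], coeffs: Sequence[float]
) -> List[List[float]]:
    """M-tap parallel FIR over N links: y(m) = sum_i coeffs[i] * x(m + i).

    cycles[k][r] is the sample x(k*N + r) arriving on ADC link x_r.
    """
    num_links = len(cycles[0])
    num_taps = len(coeffs)
    # Future cycles needed before a block of N outputs can be computed:
    # ceil((M - 1) / N), written with integer arithmetic.
    lookahead = -(-(num_taps - 1) // num_links)
    window = deque(maxlen=lookahead + 1)  # oldest buffered cycle comes first
    outputs = []

    for current in cycles:
        window.append(current)
        if len(window) < lookahead + 1:
            continue  # still filling the buffer
        # Flatten the buffered cycles into consecutive samples.
        flat = [sample for cycle in window for sample in cycle]
        # One output per link; the N sums are independent of one another.
        outputs.append([
            sum(a * flat[r + i] for i, a in enumerate(coeffs))
            for r in range(num_links)
        ])
    return outputs


if __name__ == "__main__":
    # N = 2 links, M = 3 taps, so each output block needs one future cycle.
    print(parallel_fir([[0, 1], [2, 3], [4, 5]], [1, 1, 1]))
```

With coefficients `[a0, a1]` this reduces to the two-tap structure, where the only sample taken from the following cycle is the one on link x0.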
Other types of filters may be implemented using this parallel structure as well. By appending appropriate further stages of delays, adders, and multipliers to the y outputs in a manner well-known to those of ordinary skill in the art, Infinite Impulse Response (IIR) filters may also be implemented.
In one embodiment, time-dependent filters can also be implemented using a parallel computation block. Time-dependent filters implement modulations such as multiplying each sample by a function of time f(t). Examples of f(t) commonly used include exp(2πjt), sin(2πt), cos(2πt), etc., where j=√(−1).
Data samples are received on N ADC links xr, where r is an integer such that 0≦r≦N−1. A multiplier 144 multiplies each data sample by a time-dependent function. Let t be the time that the data sample arriving on x0 was taken. Then the time-dependent filter 140 can be described by the following equation:
$$y_r = f(t + r) \cdot x_r \tag{11}$$
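Equation (11) can be illustrated with a short sketch. The function name `time_dependent_filter` is hypothetical, and time is assumed to be measured in sample periods, so the sample on link x0 in ADC cycle k is taken at t = k·N.

```python
from typing import Callable, List


def time_dependent_filter(
    cycles: List[List[complex]], f: Callable[[float], complex]
) -> List[List[complex]]:
    """Apply y_r = f(t + r) * x_r to every ADC cycle.

    cycles[k][r] is the sample on ADC link x_r during cycle k.
    Assumption: time is counted in sample periods, so t = k * N.
    """
    num_links = len(cycles[0])
    outputs = []
    for k, samples in enumerate(cycles):
        t = k * num_links  # time at which the sample on x_0 was taken
        # The N multiplications of one cycle are independent of each other.
        outputs.append([f(t + r) * x for r, x in enumerate(samples)])
    return outputs


if __name__ == "__main__":
    # With f(t) = t and all-ones input, the output is the time index itself.
    print(time_dependent_filter([[1, 1], [1, 1]], lambda t: t))
```

A modulation such as exp(2πjt) would be passed in the same way, for example as `lambda t: cmath.exp(2j * cmath.pi * t)` after importing `cmath`, with any frequency scaling folded into the function.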
There are many other kinds of computations that can be performed in parallel in the parallel computation block 64 of
Although the present invention has been described in detail with reference to particular embodiments, persons possessing ordinary skill in the art to which this invention pertains will appreciate that various modifications and enhancements may be made without departing from the spirit and scope of the claims that follow.