The present disclosure relates to digital sample rate conversion and in particular to methods, structures and computer program products for sample rate conversion, whereby an input digital sample with a first frequency is converted to an output sample with a second frequency.
In order to be processed by digital systems, a continuously varying signal needs to be converted to a set of discrete samples. A sample is a value or a set of values at a point in the domain in which the continuously varying signal is sampled.
It is often desired to convert the sampling rate of a digital signal from one rate to another, for example in audio, video or image processing systems where data needs to be processed by different sub-systems or components which require different sampling rates.
Sample rate conversion can be implemented with a structure that provides various electronic components which are arranged to store and perform arithmetic operations on data to implement algorithms for sample rate conversion, converting an input signal having a first sample rate to an output signal having a second sample rate. Sample rate conversion structures may be suitably provided as a digital signal processor (DSP) device or as a component part of a DSP device which also performs other functions. A DSP device provides a suitable software and hardware architecture for power-efficient processing of algorithms in portable devices or other applications where power efficiency is important.
When designing a sample rate conversion structure, there is a trade-off between quality and computational complexity. In this context, quality may be defined as the ratio between the (wanted) signal power and the (unwanted) noise power. Computational complexity may be defined as the average number of arithmetic operations (such as multiplications or additions) required to generate one output sample. A higher computational complexity will generally lead to higher power consumption and a larger footprint (in terms of the required amount of memory and the required physical circuit area). It is desired to achieve a better trade-off between these factors.

According to a first aspect of the disclosure there is provided a method of converting a stream of input samples to a stream of output samples, comprising deriving each output sample by convolution of a continuous time interpolation kernel with a continuous time step function representing the input sample stream.
Optionally, each input sample is separated by an input sample interval and convolution of a continuous time interpolation kernel with a continuous time step function representing the input sample stream comprises calculating a weighted sum of the continuous time impulse response integrated over all values of the input sample stream over the output sample interval.
Optionally, each output sample is separated by an output sample interval; and:
(a) at the start of each output sample interval the last known input sample is stored;
(b) if a new input sample arrives during the course of the output sample interval, the stored value is updated;
(c) step (b) is repeated for any other new input samples; and
(d) at the end of the output sample interval the output sample is calculated based on the stored or updated value.
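Steps (a) to (d) can be illustrated with a minimal Python sketch. It assumes the simplest possible kernel, a rectangular (box) kernel spanning exactly one output interval T2, so that each output sample is the time average of the step function over that interval. The function name `hold_and_refine` and the choice of kernel are illustrative assumptions, not part of the disclosure; exact `Fraction` arithmetic is used so the results can be checked by hand.

```python
from fractions import Fraction

def hold_and_refine(x, t1, t2):
    """Box-kernel resampling of a step-function input (illustrative sketch).

    x[k] is held from k*t1 until (k+1)*t1; t1 and t2 are the input and output
    sample periods. Returns y[m] = (1/t2) * integral of the step function over
    [m*t2, (m+1)*t2), for every output interval fully covered by the input.
    """
    outputs = []
    x_prev = 0                      # signal assumed to be zero before x[0]
    k = 0                           # index of the next input sample to arrive
    end_time = len(x) * t1
    m = 0
    while (m + 1) * t2 <= end_time:
        start = m * t2
        # (a) initial guess: the last known sample holds for the whole interval
        acc = Fraction(x_prev)
        # (b), (c) refine once for every input sample arriving in the interval
        while k < len(x) and k * t1 < (m + 1) * t2:
            phi = (k * t1 - start) / t2        # arrival position in [0, 1)
            acc += (x[k] - x_prev) * (1 - phi)  # correction for the rest of the interval
            x_prev = x[k]                       # update the stored value
            k += 1
        # (d) end of the interval: the accumulator holds the output sample
        outputs.append(acc)
        m += 1
    return outputs
```

For example, `hold_and_refine([0, 3], Fraction(1), Fraction(2))` averages a 0 held for one time unit and a 3 held for one time unit over a two-unit output interval, giving `[Fraction(3, 2)]`.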
Optionally, the last known input sample is stored using an accumulate and load unit.
An accumulate and load unit may be any suitable circuit, device or code that provides for the storage and summation of a plurality of values and arranged so that upon receipt of a trigger signal it makes the accumulated value available and resets itself to a defined value.
Optionally, the interpolation kernel comprises a piecewise polynomial function of a given polynomial order, and comprises matrix coefficients which are generated from coefficients of a transposed Farrow structure of a polynomial order lower than said given polynomial order.
Optionally, the interpolation kernel comprises a B-spline interpolator.
Optionally, the interpolator coefficients define a symmetric phase range.
According to a second aspect of the disclosure there is provided a structure for converting a stream of input samples to a stream of output samples, being arranged to derive each output sample by convolution of a continuous time interpolation kernel with a continuous time step function representing the input sample stream.
Optionally, the structure implements a polynomial interpolator and comprises:
a phase generation unit, that computes a new phase value by adding a phase change to a previous phase and triggers generation of an output sample;
a comb filter that computes a differential signal, formed by subtracting a previous input from a current input value;
one or more multipliers that multiply the differential signal by powers of the new phase value;
an accumulate-and-load unit, which is loaded with the previous input sample value upon generation of an output sample;
one or more accumulate and dump units, which are reset to zero upon generation of a new output sample;
a matrix multiplication unit, that forms multiple outputs using constant coefficient multiplication of values from the accumulate and load and the accumulate and dump units; and
a delay-and-add unit, that adds delayed versions of the outputs of the matrix multiplication unit.
According to a third aspect of the disclosure there is provided a non-transitory computer program product storing instructions that, when executed by a computing device, enable the computing device to convert a stream of input samples to a stream of output samples, comprising deriving each output sample by convolution of a continuous time interpolation kernel with a continuous time step function representing the input sample stream.
The computer program product may be stored on or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fibre optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infra-red, radio, and microwave, then the coaxial cable, fibre optic cable, twisted pair, DSL, or wireless technologies such as infra-red, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. The instructions or code associated with a computer-readable medium of the computer program product may be executed by a computer, e.g., by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry.
The disclosure will be described below, by way of example only, with reference to the accompanying drawings, in which:
It is often desired to convert between arbitrary sample rates, and to enable this an approximation of the continuous signal needs to be created and then sampled. This process is called interpolation.
Consider an input signal x(kT1) sampled at one rate, which needs to be converted to an output signal y(mT2) sampled at a different rate. This is shown schematically in the accompanying drawings.
A sample rate conversion (SRC) equation relating the output signal to the input signal, and describing the filtering and re-sampling of SRC for the case illustrated, is:

$$y(mT_2) = \sum_{k} x(kT_1)\, h(mT_2 - kT_1) \qquad [1]$$
h(t) is the continuous-time impulse response of the filter that delivers the signal y(mT2) at the new sample rate from the input signal x(kT1) at the old sample rate. Put another way, h(t) is a function representing the action the filter performs on the input signal to convert it to the output signal. Equation [1] describes a time-varying system.
It should be noted that we have Rational Factor SRC if

$$\frac{T_1}{T_2} = \frac{L}{M}$$

with L, M ∈ ℕ⁺. If L=1 or M=1, we have Integer Factor SRC (i.e. one of the sample rates is an exact integer multiple of the other). If L=M=1, we have discrete-time convolution (i.e. no SRC at all).
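As a worked example of this classification, assuming the ratio is expressed as T1/T2 = L/M in lowest terms (so that, with T1 = 1/fin and T2 = 1/fout, T1/T2 = fout/fin), the following Python sketch classifies a pair of sample rates. The function name `classify_src` is hypothetical.

```python
from fractions import Fraction

def classify_src(f_in, f_out):
    """Classify a conversion from f_in to f_out (integer rates in Hz).

    T1 = 1/f_in and T2 = 1/f_out, so T1/T2 = f_out/f_in, reduced to L/M.
    """
    ratio = Fraction(f_out, f_in)       # Fraction reduces to lowest terms
    L, M = ratio.numerator, ratio.denominator
    if L == 1 and M == 1:
        kind = "discrete-time convolution (no SRC)"
    elif L == 1 or M == 1:
        kind = "integer factor SRC"
    else:
        kind = "rational factor SRC"
    return L, M, kind
```

For the common audio conversion from 44.1 kHz to 48 kHz this gives L = 160 and M = 147, a rational factor, whereas 48 kHz to 96 kHz gives L = 2 and M = 1, an integer factor.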
Generally, the complete continuous-time impulse response h(t) must be known. For rational and integer factor SRC, described above, the system described by equation [1] varies periodically with time, so only certain values of h(t) are actually required for the computation.
In order to simplify calculating the samples of h(t) which are required, we look for simple functions describing continuous time impulse responses. It has been found that, since polynomials are an example of such simple functions, polynomial filters are useful in SRC. We limit the class of polynomial filters to piecewise polynomial impulse responses composed from pieces of equal length. Given polynomial pieces of degree Q and length Δ, we may derive an expression for h(t) as follows:
Where cq is the coefficient for the q-th order polynomial.
There are two choices for Δ which simplify equation [2], namely Δ=T1 and Δ=T2. Using equation [2] and equation [1] above and setting Δ=T1 gives:
⌊a⌋ denotes the floor operation, i.e. ⌊a⌋ denotes the greatest integer a′ such that a′ ≤ a.
μm is the inter-sample position, i.e. the distance between the previous input sample and the current output sample (see the accompanying drawings).
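The base input index and the inter-sample position used by the Farrow structure with Δ=T1 can be computed as in the following sketch. It assumes the usual normalisation in which μm is measured in units of T1, i.e. μm = mT2/T1 − ⌊mT2/T1⌋, so that 0 ≤ μm < 1. The function name `farrow_position` is hypothetical.

```python
from fractions import Fraction
from math import floor

def farrow_position(m, t1, t2):
    """Return (n, mu) for output sample m (assumed normalisation to t1).

    n  : index of the last input sample at or before time m*t2
    mu : distance from that input sample to the output sample, in units of t1
    """
    ratio = m * t2 / t1     # output time measured in input sample periods
    n = floor(ratio)        # greatest integer not exceeding ratio
    return n, ratio - n
```

For 44.1 kHz to 48 kHz, `farrow_position(2, Fraction(1, 44100), Fraction(1, 48000))` returns `(1, Fraction(67, 80))`: output sample m = 2 lies 0.8375 input periods after input sample 1.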
The above equations [3], [4] and [5] represent an implementation of SRC known as the Farrow structure and
The Farrow structure shown in
In the Farrow structure equations [3], [4] and [5] above, the higher the order of the polynomial pieces, the better the impulse response h(t) can be matched to the application. If high order polynomials are not feasible, it is also possible to use shorter polynomials of lower order. In that case, more “reference” polyphase branches (i.e. starting points of polynomial pieces) are required. This can be achieved by decreasing the length of the polynomial pieces by a factor J, that is:

$$\Delta = \frac{T_1}{J} \qquad [6]$$
This may be thought of as a generalization of the Farrow structure so is commonly known as the generalized Farrow structure.
Since equation [6] provides a generalization of the case where Δ=T1, we can skip the case where Δ=T2 and, instead, immediately set

$$\Delta = \frac{T_2}{J} \qquad [7]$$
Substituting equations [6] and [7] into equation [2] above, we have:
Note that when J=1, we return to the original Farrow structure and μk becomes equal to μm, the previously defined inter-sample position.
Equations [8], [9] and [10] above describe the Farrow structure when the piece length Δ of the polynomial pieces is defined by equation [7] above; this variant is known as the Transposed Farrow Structure (TFS).
The Farrow Structure provides an efficient way to implement a sampling rate increase between arbitrary sampling rates, and can be seen as a polynomial polyphase interpolator. However, the transfer zeros of the filter are clustered around the integer multiples of the input sampling rate which means that the Farrow structure is subject to aliasing when implementing a sampling rate decrease. The Transposed Farrow structure is suitable for implementing a sample rate decrease, because its transfer zeros are clustered around the integer multiples of the output sample rate, while using the same polynomial functions as the Farrow structure. It can be seen as a polynomial polyphase decimator which can provide anti-aliasing.
While both the Farrow structure and the Transposed Farrow structure can in principle convert between arbitrary sample rates, in practice, because of imaging and aliasing problems, a DSP can be provided either with a Farrow structure and used only for increasing a sample rate, or with a Transposed Farrow structure and used only for decreasing a sample rate. If a DSP is to be provided which can handle arbitrary sample rate conversion with improved imaging or aliasing properties, then it must have both a Farrow structure and a Transposed Farrow structure, together with a detection device that compares the sample rate of an incoming signal with the sample rate of a target output signal and selects either the Farrow structure or the Transposed Farrow Structure depending on whether the output sample rate is respectively higher or lower than the input sample rate. Implementing both structures is costly in terms of power consumption and die area, while implementing only one of the structures limits the functionality of the DSP. It is desired to provide a sample rate converter with one or more of improved functionality, reduced power consumption, reduced computational complexity or reduced die area.
A modified Transposed Farrow Structure is described in Babic, D.; Vesma, J.; Saramaki, T.; Renfors, M., “Implementation of the transposed Farrow structure,” ISCAS 2002, IEEE International Symposium on Circuits and Systems, vol. 4, pp. IV-5 to IV-8, which is hereby incorporated by reference. It describes a Transposed Farrow Structure that is implemented using fewer multipliers for cases where the decimation ratio is high. If the input sampling rate is high relative to the output sampling rate, the number of bits required to represent the fractional interval for each input sample is small, and so the multiplication of a high-sample-rate signal by the fractional interval can be realised with simple additions. The structure for this modified Transposed Farrow Structure is shown in
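The point that a product with a short fractional interval reduces to additions can be illustrated with a small sketch. It assumes the fractional interval is held as an unsigned B-bit integer `mu_bits` representing mu_bits/2^B; multiplying an integer sample by it then needs at most one shifted addition per set bit. The function is a hypothetical illustration, not the circuit of the cited paper.

```python
def mul_by_short_fraction(x, mu_bits, n_bits):
    """Return floor(x * mu_bits / 2**n_bits) using only shifts and additions.

    x       : integer sample value (may be negative)
    mu_bits : fractional interval as an unsigned n_bits-wide integer
    """
    if not 0 <= mu_bits < (1 << n_bits):
        raise ValueError("mu_bits must fit in n_bits bits")
    acc = 0
    for b in range(n_bits):
        if (mu_bits >> b) & 1:      # one addition per set bit of the fraction
            acc += x << b
    return acc >> n_bits            # arithmetic shift: floor division by 2**n_bits
```

With B = 2, for instance, `mu_bits = 3` represents 3/4 and costs two additions: `mul_by_short_fraction(100, 3, 2)` returns 75. The smaller B is, the fewer additions are needed per input sample.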
However, while this is more computationally efficient than the Transposed Farrow Structure of
In
x=the input sample
y=the output sample
ω=phase increment
ϕ=phase
A&L=Accumulate-and-load unit
A&D=Accumulate-and-dump unit
=constant coefficient multiplier
z−1=delay element
ci,j=constant coefficient values
+=adder with optional negated inputs (marked with ‘−’).
⊗=variable input multiplier
t=trigger input (generated upon phase wrap around)
This structure includes:
A phase generation unit 500, that computes a new phase value by adding a phase change to a previous phase and triggers generation of an output sample.
A comb filter 502 that computes a differential signal, formed by subtracting the previous input from the current input value.
One or more multipliers 504 that multiply the differential signal from the comb filter 502 by powers of the wrapped phase value.
An accumulate-and-load unit (A&L), which is loaded with the previous input sample value upon generation of an output sample.
One or more accumulate and dump units (A&D), which are reset to zero upon generation of a new output sample.
A matrix multiplication unit 506, that forms multiple outputs using constant coefficient multiplication of the accumulate-and-load and accumulate-and-dump values. The coefficients of the matrix are ci,j, where i is an index of the order of the polynomials and j is an index of the polynomial pieces.
A delay-and-add unit 508, that adds delayed versions of the outputs of the matrix multiplication unit.
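The interaction of these units can be illustrated with a behavioural Python sketch. Several details are assumptions rather than part of the description above: the phase increment ω is taken as T1/T2 (so a wrap-around marks the end of an output interval), a sample arriving in the same step as a trigger is counted in the new interval, the output of matrix row j is added to the output j output periods later, and the coefficient matrix is supplied by the caller. The function name `etfs_sketch` is hypothetical. For testing, the single-row matrix [[1, −1]] is used; it is derived here for the simplest case of a rectangular kernel one output period wide (the output is then the accumulate-and-load value minus the phase-weighted differences) and is not one of the matrices of the disclosure.

```python
from fractions import Fraction

def etfs_sketch(x, omega, matrix):
    """Behavioural sketch of the data path listed above (hypothetical).

    x      : input samples, one per step.
    omega  : phase increment per input sample, assumed to be T1/T2.
    matrix : rows = polynomial pieces, K columns; column 0 weights the
             accumulate-and-load value, column q >= 1 weights the q-th
             accumulate-and-dump value (differences times phase**q).
    Returns the output samples produced at each phase wrap-around.
    """
    n_pieces, order = len(matrix), len(matrix[0]) - 1
    phase = -omega              # so the first input sample lands at phase 0
    x_prev = 0
    acc_load = 0                # accumulate-and-load (A&L) unit
    acc_dump = [0] * order      # accumulate-and-dump (A&D) units
    pending = [0] * n_pieces    # delay-and-add unit: pending[j] is due j outputs later
    outputs = []
    for sample in x:
        phase += omega                              # phase generation unit
        while phase >= 1:                           # wrap-around raises trigger t
            phase -= 1
            # matrix multiplication unit: one value per polynomial piece
            values = [row[0] * acc_load + sum(c * a for c, a in zip(row[1:], acc_dump))
                      for row in matrix]
            pending = [p + v for p, v in zip(pending, values)]
            outputs.append(pending[0])              # delay-and-add output
            pending = pending[1:] + [0]
            acc_load = x_prev                       # A&L loaded with previous input
            acc_dump = [0] * order                  # A&D units dumped to zero
        diff = sample - x_prev                      # comb filter
        acc_load += diff
        for q in range(order):                      # multipliers: powers of phase
            acc_dump[q] += diff * phase ** (q + 1)
        x_prev = sample
    return outputs
```

For example, `etfs_sketch([0, 3, 3], Fraction(1, 2), [[1, -1]])` returns `[Fraction(3, 2)]`, the average of the held values 0 and 3 over one output interval. Note that the variable multipliers only ever multiply the comb filter output by powers of the phase, which is why no multiplier for ω is needed.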
As described above, a general equation describing the filtering and resampling of a sample rate converter is:

$$y(mT_2) = \sum_{k} x(kT_1)\, h(mT_2 - kT_1)$$
Where h(t) is a continuous time impulse response, T1 is the input sampling period, T2 is the output sampling period, x(kT1) is the input sample sequence, and y(mT2) is the output sample sequence. In effect, an output sample is a weighted sum of values of the continuous time impulse response h(τ) at the discrete time differences τ=mT2−kT1, weighted by the input samples.
The present disclosure provides an enhanced resampling scheme, where the continuous time response h(t) is integrated over all values instead of only summed at discrete time differences:
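One way to write this, assuming the step function x_s(t) holds each input value x(kT1) over the interval [kT1, (k+1)T1) and applying no normalisation factor (both assumptions made here for illustration), is to replace the point value h(mT2 − kT1) in equation [1] by the integral of h over the span covered by each held sample:

$$
y(mT_2) = \int_{-\infty}^{\infty} h(mT_2 - t)\, x_s(t)\, dt
= \sum_{k} x(kT_1) \int_{mT_2-(k+1)T_1}^{mT_2-kT_1} h(\tau)\, d\tau ,
\qquad x_s(t) = x(kT_1) \ \text{for} \ kT_1 \le t < (k+1)T_1 .
$$

Each input sample is thus weighted by the area of h(τ) over a window of width T1 ending at τ = mT2 − kT1, rather than by a single value of h(τ).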
In the Farrow Structure, the function h(t) is a piecewise polynomial function:
The polynomial sections hj(t) can be defined as:
For each polynomial section j we can define the following function:
Hj(t) is a function that applies a coefficient matrix to the polynomial sections of the continuous time response h(t).
The enhanced sample rate converter evaluates the definite integral mentioned earlier by using an initial guess, followed by refinement stages.
At the start of each interval T2, the initial guess is that there will be no new incoming sample over that interval. The last known sample, xprev, is loaded into the first accumulator, and the other accumulators are reset to zero. Whenever an incoming sample xnew arrives, this guess needs to be refined. The refinement is (xnew−xprev)·Hj(ti), where ti is the arrival time of the incoming sample, again using the assumption that there will be no additional incoming sample over the rest of the current interval. Subsequently the last known sample is updated to the value of the incoming sample. At the end of the interval T2 an output sample is computed and the next interval T2 starts.
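Why an initial guess followed by refinements yields the exact integral can be shown with a short derivation. The notation is introduced here for illustration only: the current output interval is normalised to [0, 1), w(τ) denotes the part of the kernel weighting that interval, x_0 is the last known sample at the start of the interval, and samples x_1, …, x_n arrive at positions 0 ≤ τ_1 < … < τ_n < 1. Writing the step function with the unit step u,

$$
x(\tau) = x_0 + \sum_{i=1}^{n} (x_i - x_{i-1})\, u(\tau - \tau_i),
$$

so that

$$
\int_0^1 w(\tau)\, x(\tau)\, d\tau = x_0\, W(0) + \sum_{i=1}^{n} (x_i - x_{i-1})\, W(\tau_i),
\qquad W(\tau) = \int_{\tau}^{1} w(s)\, ds .
$$

The first term is the initial guess (no new sample during the interval), and each term of the sum is one refinement, formed from the difference between the new and the previous sample and the kernel integrated over the remainder of the interval. For a rectangular kernel, w(τ) = 1 and W(τ) = 1 − τ.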
A structure according to the disclosure (which may be referred to as an “enhanced” Transposed Farrow Structure, or ETFS) significantly reduces computational complexity compared to a Transposed Farrow Structure of the same order. The Enhanced Transposed Farrow Structure of order 2 requires 2 variable multipliers and 6 constant coefficient multipliers, whereas the Transposed Farrow Structure uses 3 variable multipliers and 9 constant coefficient multipliers. Note that the Transposed Farrow Structure requires a dedicated multiplier for multiplication with the phase increment ω, which is not required for the Enhanced Transposed Farrow Structure. The difference in complexity between the accumulate-and-load unit and the accumulate-and-dump unit is very small: for the accumulate-and-load unit the accumulator is set to a starting value when triggered, and for the accumulate-and-dump unit the accumulator is reset to zero.
In practice the variable multipliers can be implemented using a single physical multiplier that performs all the required multiplications, by using a system clock that is higher than the input sample rate. In the example where both multipliers are applied (as shown in
The disclosure is applicable to any polynomial order N≥1 and any number of segments L≥1. The size of the corresponding coefficient matrix is L×K, with K=N+1. A convenient way to generate the matrix coefficients for the ETFS of order N is to derive them from the coefficients of the TFS of lower polynomial order N−1, using the following equation:
Where E is a coefficient matrix with dimensions L×K of the Enhanced Transposed Farrow Structure, and F is a coefficient matrix with dimensions L×(K−1) of the Transposed Farrow Structure.
The disclosure is not limited to any particular interpolation method and there are many methods that can be used. However, in a preferred embodiment a B-spline interpolation is used because this results in coefficients which can be expressed in only a few powers of two (and a common gain factor). As a result, the constant coefficient multipliers can be efficiently implemented as a combination of shift-and-adds.
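The pieces of a uniform B-spline of order N can be generated exactly from the standard truncated-power formula B_N(t) = (1/N!) Σ_{i=0}^{N+1} (−1)^i C(N+1, i) (t − i)_+^N. The sketch below (hypothetical function name `bspline_pieces`) returns, for each unit-length piece j, the coefficients of that piece as a polynomial in the local position u ∈ [0, 1). The common factor 1/N! is kept separate to show that the remaining coefficients are small integers. The row and column ordering, and the absence of any further scaling, are assumptions and need not match the convention of the coefficient matrices ci,j used in this disclosure.

```python
from fractions import Fraction
from math import comb, factorial

def bspline_pieces(order):
    """Return (gain, rows) for a uniform B-spline of order N = order.

    Piece j (0 <= j <= N) covers t in [j, j + 1). With u = t - j,
        B(t) = gain * sum(rows[j][q] * u**q for q in range(N + 1)),
    where gain = 1/N! and every rows[j][q] is an integer.
    """
    n = order
    rows = []
    for j in range(n + 1):
        row = [0] * (n + 1)
        # Only the truncated powers (t - i)_+^N with i <= j are nonzero on piece j.
        for i in range(j + 1):
            weight = (-1) ** i * comb(n + 1, i)
            shift = j - i  # (t - i) = (u + shift), expanded binomially below
            for q in range(n + 1):
                row[q] += weight * comb(n, q) * shift ** (n - q)
        rows.append(row)
    return Fraction(1, factorial(n)), rows
```

For order 3, for instance, the integer coefficients are 0, 1, 3, 4 and 6 in magnitude, each either a power of two or the sum of two powers of two, multiplied by the common gain 1/6, so each constant multiplication needs at most one shift-and-add.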
The coefficient matrices for B-spline interpolation order 0, 1 and 2 are:
The coefficients corresponding to Enhanced B-spline interpolation of orders 1, 2 and 3 are:
Transposed Farrow Structures of the type illustrated in
The coefficients corresponding to Modified B-spline interpolation of orders 0, 1, 2 and 3 are:
For an Enhanced Transposed Farrow Structure according to the disclosure, the symmetric phase range modification is also possible. The general formula for transforming a coefficient matrix MTF with dimensions L×(K−1) for a Modified Transposed Farrow Structure to a coefficient matrix EMTF with dimensions L×K for the Enhanced Modified Transposed Farrow Structure is:
As a result, all columns with index j>1 will be either symmetric or anti-symmetric.
For the same polynomial order, the proposed structure gives a much better quality of result than the conventional Farrow structure, with lower computational complexity. For a given quality target, the invention allows lower oversampling ratios and/or lower-order interpolation to be used than with existing structures, resulting in lower operating frequencies and hence lower power consumption and a smaller footprint.
The power spectra in
The spectra 1100 through 1110 were obtained from simulations using four full-scale sinusoidal input signals with a frequency of 1 kHz, an input sample rate of 44.1 kHz for each of the input signals, and an output sample rate of 48 kHz for each of the output signals. The simulations were done in 64-bit floating point arithmetic, so quantization noise is expected to be around −300 dBFS. It can be seen from the spectra 1100 through 1110 that, for order 2 and order 3, the Enhanced Transposed Farrow Structure provides approximately 50-100 dB better suppression of spurious noise in the audible region (100 Hz to 20 kHz). Similar advantages are also obtained for higher orders.
The time domain is also illustrated, in
It can be seen that the first and second order Transposed Farrow Structures show large spikes in the output, which are avoided with the Enhanced Transposed Farrow Structure according to the disclosure. In the conventional Transposed Farrow Structure the input signal is represented by a series of Dirac delta pulses, which are spaced apart by an interval T1. The output samples are computed by convolving the input signal with a continuous time impulse response h(t). If the width of h(t) is less than the input sample spacing T1, the output will consist of individually convolved delta pulses, resulting in spikes (fast variations in the output). For an input signal that is constant (nonzero), the output signal will therefore be time varying.
In the Enhanced Transposed Farrow Structure the input signal is effectively represented by a stepwise function, which is convolved with a continuous time impulse response h(t). For an input signal that is constant (nonzero), the output signal will then also be constant.
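The difference between the two input representations can be reproduced numerically. The sketch below assumes, for simplicity, a rectangular kernel one output period T2 wide (not one of the kernels of the disclosure) and gives each Dirac pulse of the conventional model an area of x[k]·T1; the function name `box_kernel_outputs` is hypothetical. For a constant input at 44.1 kHz resampled to 48 kHz, the impulse model produces outputs that jump between 0 and T1/T2, while the step model produces a constant output.

```python
from fractions import Fraction
from math import floor

def box_kernel_outputs(x, t1, t2):
    """Return (impulse_model, step_model) outputs for a box kernel of width t2.

    impulse_model: x[k] is a Dirac pulse of area x[k] * t1 at time k * t1.
    step_model:    x[k] is held constant over [k * t1, (k + 1) * t1).
    Output m is the input integrated over [m * t2, (m + 1) * t2), divided by t2.
    """
    n_out = floor(len(x) * t1 / t2)
    impulse_model, step_model = [], []
    for m in range(n_out):
        lo, hi = m * t2, (m + 1) * t2
        # pulses falling inside this output interval
        pulses = sum(x[k] * t1 for k in range(len(x)) if lo <= k * t1 < hi)
        # overlap of each held sample with this output interval
        held = sum(x[k] * max(Fraction(0), min(hi, (k + 1) * t1) - max(lo, k * t1))
                   for k in range(len(x)))
        impulse_model.append(pulses / t2)
        step_model.append(held / t2)
    return impulse_model, step_model
```

With `x = [1] * 441`, `t1 = Fraction(1, 44100)` and `t2 = Fraction(1, 48000)`, 441 of the 480 output intervals contain one pulse and 39 contain none, so the impulse model alternates between 160/147 and 0, whereas every output of the step model equals 1.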
While these spikes in the output of the first and second order Transposed Farrow Structures are associated with spectral components that are outside the audible region, additional headroom in the data path is still required to prevent saturation or wrap-around. This is because variations in the output signal result in an increased dynamic range (larger peak-to-peak values). As a result, more bits are required to represent the signal accurately (preventing clipping), and hence larger multipliers and adders are needed to process the signal, resulting in a larger physical area. This additional headroom can be omitted when using the Enhanced Transposed Farrow Structure, which allows for a more power- and area-efficient implementation.
To further aid the understanding of the disclosure,
Compared with the embodiment of
An accumulate and dump (A&D) unit may be any suitable circuit, device or code that provides for the storage and summation of a plurality of values and arranged so that upon receipt of a trigger signal it resets the accumulated value to zero, and can output the accumulated value. As an example,
The accumulate and dump unit 800 comprises an adder 802, a delay element 804 acting as an accumulator, and a multiplexer 806. When the multiplexer 806 receives a trigger signal, t, it resets a value stored at the delay element to zero, as shown by the input 808 at t=1 and outputs the accumulated value, which in this case is a phase value ϕ.
An accumulate and load unit may be any suitable circuit, device or code that provides for the storage and summation of a plurality of values and arranged so that upon receipt of a trigger signal it makes the accumulated value available and resets itself to a defined value. As an example,
The accumulate and load unit 900 comprises an adder 802, a delay element 804 acting as an accumulator, and a multiplexer 806. When the multiplexer 806 receives a trigger signal, t, it resets a value stored at the delay element to an initial accumulator value as shown by the input 900 and outputs the accumulated value, which in this case is a phase value ϕ.
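Both units can be modelled behaviourally with one small class, since they differ only in the value the multiplexer writes into the delay element on a trigger. In this sketch (the class name `TriggeredAccumulator` is hypothetical) an input arriving in the same step as the trigger is added to the new sum; the opposite convention is equally possible in hardware.

```python
class TriggeredAccumulator:
    """Behavioural model of an accumulate-and-dump / accumulate-and-load unit.

    Adder + delay element: every step adds the input to the stored value.
    Multiplexer: on a trigger, the stored value is output and replaced by
    load_value (0 for accumulate-and-dump, an initial accumulator value for
    accumulate-and-load) before the input of that step is added.
    """

    def __init__(self, initial=0):
        self.state = initial          # contents of the delay element

    def step(self, value, trigger=False, load_value=0):
        out = None
        if trigger:
            out = self.state          # make the accumulated value available
            self.state = load_value   # dump to zero, or load a defined value
        self.state += value           # adder feeding the delay element
        return out
```

Used as an accumulate-and-dump unit, the caller leaves `load_value` at 0; used as an accumulate-and-load unit, it passes the initial accumulator value, for example the previous input sample.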
Various modifications and improvements can be made to the above without departing from the scope of the disclosure.
It should be understood that the logic code, programs, modules, processes, methods, and the order in which the respective elements of each method are performed are purely exemplary. Depending on the implementation, they may be performed in any order or in parallel, unless indicated otherwise in the present disclosure. Further, the logic code is not related, or limited to any particular programming language, and may comprise one or more modules that execute on one or more processors in a distributed, non-distributed, or multiprocessing environment.
The method as described above may be used in the fabrication of integrated circuit chips. The resulting integrated circuit chips can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare die, or in a packaged form. In the latter case, the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multi-chip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections). In any case, the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product. The end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard or other input device, and a central processor.
While aspects of the invention have been described with reference to at least one exemplary embodiment, it is to be clearly understood by those skilled in the art that the invention is not limited thereto. Rather, the scope of the invention is to be interpreted only in conjunction with the appended claims and it is made clear, here, that the inventor(s) believe that the claimed subject matter is the invention.
Number | Date | Country | Kind |
---|---|---|---|
1611083 | Jun 2016 | GB | national |
This application is a Continuation of U.S. application Ser. No. 15/631,067 which was filed on Jun. 23, 2017, assigned to a common assignee, and which is herein incorporated by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5473555 | Potter | Dec 1995 | A |
20060248133 | Li | Nov 2006 | A1 |
20080155000 | Mehrseresht | Jun 2008 | A1 |
20090319065 | Risbo | Dec 2009 | A1 |
20100329229 | Lipka | Dec 2010 | A1 |
20200204192 | Hamlin | Jun 2020 | A1 |
Entry |
---|
“Implementation of the Transposed Farrow Structure,” by Djordje Babic et al., IEEE International Symposium on Circuits and Systems, 2002. ISCAS 2002, May 26-29, 2002, pp. IV-5 to IV-8. |
“Sample Rate Conversion by Trapezoidal Interpolation for Software Defined Radio,” by Xiaojing Huang et al., 14th IEEE Proceedings on Personal, Indoor and Mobile Radio Communications, 2003, PIMRC 2003, Sep. 7-10, 2003, pp. 135-139. |
“Polynomial Interpolators for High-Quality Resampling of Oversampled Audio,” by Olli Niemitalo, http://www.student.oulu.fi/~oniemita/DSP/INDEX.HTM, oniemitalo@sublevel3.org, pp. 1-65. |
“Continuous-Time Digital Filters for Sample-Rate Conversion in Reconfigurable Radio Terminals,” by Tim Hentschel et al., Frequenz, Journal of RF-Engineering and Telecommunications, vol. 55, Issue 5-6, May 2001, 6 pgs. |
Number | Date | Country | |
---|---|---|---|
20200012707 A1 | Jan 2020 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15631067 | Jun 2017 | US |
Child | 16574256 | US |