The present invention is related to digital processing techniques and, more particularly, to techniques for vector convolution.
A vector processor implements an instruction set containing instructions that operate on vectors (i.e., one-dimensional arrays of data). Scalar digital signal processors (DSPs), on the other hand, have instructions that operate on single data items. Vector processors offer improved performance on workloads in which the same operation is applied to many data items.
Digital processors, such as DSPs and vector processors, often incorporate specialized hardware to perform software operations that are required for math-intensive processing applications, such as addition, multiplication, multiply-accumulate (MAC), and shift-accumulate. A Multiply-Accumulate architecture, for example, recognizes that many common data processing operations involve multiplying two numbers together, adding the resulting value to another value and then accumulating the result. Such basic operations can be efficiently carried out utilizing specialized high-speed multipliers and accumulators.
Existing DSPs and vector processors, however, do not provide specialized instructions to support vector convolution of an input signal by a filter having an impulse response, even though the need for vector convolution operations in processors is increasing. In the FIR filter domain, for example, convolution processes an input waveform signal and the impulse response of the filter as a function of an applied time lag (delay). A convolution processor typically receives and processes a time shifted input signal and the impulse response of the filter and produces one output value for each time shifted version (each time lag). Such convolution computation is used extensively, for example, in FIR filter applications. For an input sequence of length L and W time lags, the required computational complexity is O(L*W). Because of the large number of calculations required, it is highly desirable to accelerate convolution computation in many applications.
A need therefore exists for digital processors, such as vector processors, having an instruction set that supports a vector convolution function.
Generally, a vector processor is provided having an instruction set with a vector convolution function. According to one aspect of the invention, the disclosed vector processor performs a convolution function between an input signal and a filter impulse response by obtaining a vector comprised of at least N1+N2-1 input samples; obtaining N2 time shifted versions of the vector (including a zero shifted version), wherein each time shifted version comprises N1 samples; performing a weighted sum of each of the time shifted versions of the vector by a vector of N1 coefficients; and producing an output vector comprising one output value for each of the weighted sums. The vector processor performs the method, for example, in response to one or more vector convolution software instructions having a vector comprised of the N1+N2-1 input samples.
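By way of illustration only, the steps above can be modeled in software. The following is a minimal sketch, not the disclosed hardware: the function name vec_conv_reference and the sample values are hypothetical, and the coefficients are applied to each window in the order given (a sliding weighted sum). Whether the coefficient vector is stored time-reversed is an assumption left to the caller.

```python
def vec_conv_reference(x, h, n2):
    """Software model: n2 outputs, each a weighted sum of n1 = len(h) samples.

    x  : at least n1 + n2 - 1 input samples (real or complex)
    h  : n1 coefficients (real or complex)
    n2 : number of time shifted versions, including the zero shift
    """
    n1 = len(h)
    if len(x) < n1 + n2 - 1:
        raise ValueError("need at least N1 + N2 - 1 input samples")
    outputs = []
    for shift in range(n2):
        window = x[shift:shift + n1]      # time shifted version, N1 samples
        acc = 0
        for coeff, sample in zip(h, window):
            acc += coeff * sample         # one multiply-accumulate (MAC)
        outputs.append(acc)               # one output value per time shift
    return outputs


# Hypothetical example with N1 = 3 coefficients and N2 = 4 time shifts,
# which requires N1 + N2 - 1 = 6 input samples.
x = [1, 2, 3, 4, 5, 6]
h = [1, 2, 3]
y = vec_conv_reference(x, h, 4)
```

Because the model relies only on multiplication and addition, the same function accepts complex samples and complex coefficients without modification.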
The vector can comprise a plurality of real or complex input samples and the filter impulse response can be expressed using a plurality of coefficients that are real or complex. The coefficients can be processed a reduced number of bits at a time over a plurality of iterations, with an output of each iteration being shifted and accumulated, until all bits of the coefficients are processed.
In a further embodiment, when a number of coefficients supported by the convolution is less than a number of coefficients in a filter being processed, smaller chunks of the larger filter are iteratively processed and an output of each iteration is accumulated for each chunk until all of the larger filter is processed.
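The chunked processing described above can be sketched in software as follows. This is an illustrative model under stated assumptions: the names sliding_dot, chunked_filter and max_taps are hypothetical, and each chunk is assumed to read its input window starting at an offset equal to the chunk's position within the larger filter.

```python
def sliding_dot(x, h, n_out):
    """n_out weighted sums of x against len(h) coefficients, one per time shift."""
    return [sum(c * s for c, s in zip(h, x[i:i + len(h)])) for i in range(n_out)]


def chunked_filter(x, h, n_out, max_taps):
    """Process a long filter h in chunks of at most max_taps coefficients."""
    acc = [0] * n_out
    for start in range(0, len(h), max_taps):
        chunk = h[start:start + max_taps]
        # Coefficient h[start + j] multiplies sample x[i + start + j], so the
        # input window for this chunk begins 'start' samples later.
        partial = sliding_dot(x[start:], chunk, n_out)
        # Accumulate the output of this iteration into the running result.
        acc = [a + p for a, p in zip(acc, partial)]
    return acc


# Hypothetical example: a 5-coefficient filter on a unit that supports 2 taps.
x = list(range(1, 11))          # 5 + 6 - 1 = 10 input samples
h = [1, 2, 3, 4, 5]
y_chunked = chunked_filter(x, h, 6, 2)
y_direct = sliding_dot(x, h, 6)
```

In this sketch the accumulated result of the three chunks (2, 2 and 1 coefficients) matches the result of processing all five coefficients at once.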
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
Aspects of the present invention provide a vector processor that supports a vector convolution function. A convolution instruction typically receives and processes a time shifted input signal and the impulse response of the filter and produces a vector having one output value for each time shifted version. The elementary MAC operations can be performed with complex or real inputs and coefficients. Thus, both the input samples and the coefficients can be real or complex numbers. The disclosed specialized vector convolution instruction can be used to implement, for example, a channel filter, an RF equalizer, IQ imbalance correction, and convolutions for digital pre-distortion (DPD) parameter estimation in digital front-end signal processing. As used herein, the term “vector processor” shall mean a processor that executes vector instructions on vector data in program code.
The present invention can be applied, for example, in handsets, base stations and other network elements.
Generally, if the vector processor 100 is processing software code that includes a predefined instruction keyword corresponding to a vector convolution function and the appropriate operands for the function (i.e., the input samples), the instruction decoder must trigger the appropriate vector convolution functional unit(s) 110 required to process the vector convolution instruction. It is noted that a vector convolution functional unit 110 can be shared by more than one instruction.
Generally, aspects of the present invention extend conventional vector processors to provide an enhanced instruction set that supports vector convolution functions. The vector processor 100 in accordance with aspects of the present invention receives an input vector having real or complex inputs, applies a complex vector convolution function to the input and generates a vector having one output value for each time shift.
The disclosed vector processors 100 have a vector architecture, as discussed hereinafter in conjunction with
In the exemplary embodiment of
The disclosed vector convolution function (vec_conv( )) accelerates the FIR filter within the vector convolution function 200 where the coefficients are, e.g., binary values (such as 2-bit, 4-bit, etc.). Additionally, the operation can be further accelerated and performed in a single cycle using a sufficient number of bits for the coefficient, such as 18 bits. Generally, each time shifted operation comprises an FIR filtering of the shifted input value 220 and the coefficient.
For an exemplary convolution with 2-bit values, an FIR filter/convolution operation can be written as follows:
where h(k) indicates the coefficients and x(n-k) indicates the time shifted input values. In the case of a multi-phase filter, the coefficients h(k) can be changed for each phase of the filter.
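In standard notation, an FIR filter output consistent with the definitions of h(k) and x(n-k) given above can be written as shown below. The symbol N, denoting the number of coefficients, and the summation limits are assumptions introduced here for illustration.

```latex
y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)
```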
The convolution of an input signal x by a filter having an impulse response h can be written as follows:
The correlation or cross-correlation of an input signal x with an input signal y can be written as follows (where signal x and/or signal y can be a known reference signal such as a pilot signal or a CDMA binary/bipodal code):
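The relationship between the two operations can be illustrated in software. The sketch below uses one common textbook convention, which is an assumption here since conventions differ as to which signal is conjugated and the sign of the lag: convolution is y(n) = sum over k of h(k) x(n-k), and cross-correlation is r(m) = sum over k of x(k+m) conj(y(k)). The function names are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y


def cross_correlate(x, y):
    """Full cross-correlation: r[m] = sum_k x[k + m] * conj(y[k]).

    Lags m run from -(len(y) - 1) to len(x) - 1.
    """
    out = []
    for lag in range(-(len(y) - 1), len(x)):
        acc = 0
        for k in range(len(y)):
            if 0 <= k + lag < len(x):
                acc += x[k + lag] * complex(y[k]).conjugate()
        out.append(acc)
    return out


# Hypothetical signals. Under the convention above, correlating x with y
# equals convolving x with the time-reversed, conjugated copy of y.
x = [1, 2, 3, 4]
y = [1, -1, 2]
r = cross_correlate(x, y)
y_reversed_conj = [complex(v).conjugate() for v in reversed(y)]
c = convolve(x, y_reversed_conj)
```

Under this convention, a unit that computes sliding weighted sums can serve either operation, with the choice determined by how the coefficient vector is ordered and conjugated before it is supplied.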
For an exemplary convolution with a 12-bit representation of the coefficients, there are 6 iterations to compute the FIR filter output (6 times 2-bit values).
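The six iterations can be sketched in software as follows. This is a minimal model under stated assumptions: the coefficients are taken to be unsigned 12-bit integers and are consumed two bits at a time starting from the least significant bits; signed coefficients would need additional handling of the most significant slice, which is not shown. The names sliding_dot and bit_sliced_filter are hypothetical.

```python
def sliding_dot(x, h, n_out):
    """n_out weighted sums of x against len(h) coefficients, one per time shift."""
    return [sum(c * s for c, s in zip(h, x[i:i + len(h)])) for i in range(n_out)]


def bit_sliced_filter(x, h, n_out, coeff_bits=12, slice_bits=2):
    """Filter with unsigned coeff_bits-wide coefficients, slice_bits at a time."""
    if coeff_bits % slice_bits:
        raise ValueError("coeff_bits must be a multiple of slice_bits")
    mask = (1 << slice_bits) - 1
    iterations = coeff_bits // slice_bits            # 12 // 2 = 6 iterations
    acc = [0] * n_out
    for it in range(iterations):
        shift = it * slice_bits
        # Extract the current 2-bit digit of every coefficient.
        h_slice = [(c >> shift) & mask for c in h]
        partial = sliding_dot(x, h_slice, n_out)
        # Shift the iteration output to its bit weight and accumulate.
        acc = [a + p * (1 << shift) for a, p in zip(acc, partial)]
    return acc


# Hypothetical example: three 12-bit coefficients, four output values.
x = [3, -1, 4, 1, -5, 9]
h = [4095, 1000, 7]
y_sliced = bit_sliced_filter(x, h, 4)
y_direct = sliding_dot(x, h, 4)
```

Each iteration multiplies the inputs only by 2-bit values, and the shift-and-accumulate step restores the full-precision result after the sixth iteration.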
Generally, the vector-based digital processor 300 processes a vector of inputs x and generates a vector of outputs, y(n). An exemplary vector-based digital processor 300 for N1=32 and N2=37 can be expressed as:
(y1, y2, . . . , y37) = vec_cor32×37(x1, x2, . . . , x68).
While exemplary embodiments of the present invention have been described with respect to digital logic blocks and memory tables within a digital processor, as would be apparent to one skilled in the art, various functions may be implemented in the digital domain as processing steps in a software program, in hardware by circuit elements or state machines, or in combination of both software and hardware. Such software may be employed in, for example, a digital signal processor, application specific integrated circuit or micro-controller. Such hardware and software may be embodied within circuits implemented within an integrated circuit.
Thus, the functions of the present invention can be embodied in the form of methods and apparatuses for practicing those methods. One or more aspects of the present invention can be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a processor, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a device that operates analogously to specific logic circuits. The invention can also be implemented in one or more of an integrated circuit, a digital processor, a microprocessor, and a micro-controller.
It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.
The present application claims priority to U.S. Patent Provisional Application Ser. No. 61/552,242, filed Oct. 27, 2011, entitled “Software Digital Front End (SoftDFE) Signal Processing and Digital Radio,” incorporated by reference herein. The present application is related to U.S. patent application Ser. No. 12/849142, filed Aug. 3, 2010, entitled “System and Method for Providing Memory Bandwidth Efficient Correlation Acceleration,” incorporated by reference herein.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US12/62182 | 10/26/2012 | WO | 00 | 4/24/2013 |
Number | Date | Country |
---|---|---|
61552242 | Oct 2011 | US |