The invention relates to programmable processing methods and systems for use in communications applications. More particularly, the invention relates to performing communications processing functions on programmable parallel processors.
Generally, the modulation and demodulation required in modern communications devices use many different processing steps to convert data (digital, analog, or other information that can be expressed in digital form) into a waveform signal that is generated at the transmitter, conveyed by some means to a receiver, and tolerant of channel impairments and path losses between the transmitter and receiver. High performance communication systems are known to be very processing intensive. In the prior art, these processing steps were performed with dedicated hardware developed specifically for that purpose. More recently, it has become known to partition the processing steps, assigning different functions to individual processors such as programmable Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), and/or Field Programmable Gate Array (FPGA) devices. This type of architecture is ad hoc, has limited flexibility once the partitioning has been committed to hardware, and is specific to the modulation format. The inflexibility inherent in these ad hoc designs has been a major impediment to the development of a Software Defined Radio (SDR).
From efforts to make hardware more flexible for different applications and standards, the concept of the Software Defined Radio (SDR) arose. The SDR implementations to date have not fully realized the potential or vision of a fully programmable hardware/software architecture. Providing hardware flexible enough for future modulation schemes and other foreseeable requirements that are undefined at the time of design has been nearly impossible. Difficulties in approaching these ideal goals are further compounded by the very short real-time schedules for the processing required in most applications. On the one hand, making the software more portable and structured degrades performance, which has limited the application of the SDR concept. On the other hand, performing many of the functions in FPGAs whose configuration code is downloadable from a host processor provides some flexibility with good performance, but this approach requires much more development effort than pure software, lengthens time to market, and imposes yet more design restrictions. Each implementation has limited reuse potential, such that nearly every change in waveform calls for a completely new design. Also, FPGA implementations tend to have higher power consumption and cost than full ASIC implementations. Base stations claiming SDR functionality have been introduced to the market, but the portability and performance of SDR systems known in the art are limited. The designs current in the art use a combination of DSPs and FPGAs, which limits design flexibility and the development cost reductions attainable.
Due to the foregoing and possibly additional problems, improved methods and systems for processing communications signals using parallel processing systems and techniques would be a useful contribution to the arts.
The invention provides systems and methods for digitally modulating and demodulating communication signals using parallel processing. The invention may be used for the purpose of transforming bit streams, or other information that can be represented as a sequence of numbers, into waveforms for transmission and reception on a communication channel, and for processing the received waveforms to extract the information stream using a plurality of processing elements in the described architecture. For example, the invention may be used to enable mobile phones or other mobile devices to communicate with a network access point or base station. The systems and methods may also be used for signal processing within a network access point or base station. Scalability potential is also provided for large scale communications processing solutions.
According to one aspect of the invention, in a preferred embodiment of a communications processing system, a plurality of functionally identical processing elements are interconnected by shared memory interfaces. The shared memory is coupled with a host General Purpose Processor (GPP) for communications and/or control of the processing elements. Each of the processing elements is connected to a local private memory, increasing total memory bandwidth for the processing elements. A digital interface to one or more antennas is also provided.
According to other aspects of the invention, in an example of a preferred embodiment, a communications processing system includes processors for performing computations used for one or more processing functions, including dynamic spectrum awareness for spectrum allocation optimization, computing metrics for routing decisions between wireless nodes, utilizing multiple antenna resources for improved performance, and computing metrics for improved system performance with multiple base stations.
According to another aspect of the invention, a communication signal processing system in a preferred embodiment includes numerous processor elements. Each of the processor elements has local memory and an arithmetic unit, an interface for communications, and a control block that may control individual processing elements or clusters of processing elements. One or more devices provide communication between the processor elements. A host processor is provided for programming and controlling the processor elements, and an interface with one or more antennas completes the system.
According to additional aspects of the invention, in exemplary embodiments, a processing system is disclosed in which at least one GPP using an operating system is coupled with at least one General Purpose Graphics Processing Unit (GPGPU) for communications processing, an interface to at least one radio resource, and an interface to at least one communications network. The system may include a GPP and its operating system configured in such a way as to establish virtual machines for partitioning services in various ways according to operational parameters and/or service objectives.
The invention may be understood from the following detailed description when read in connection with the following figures:
References in the detailed description correspond to like references in the various drawings unless otherwise noted. Descriptive and directional terms used in the written description such as front, back, top, bottom, et cetera, refer to the drawings themselves as laid out on the paper and not to physical limitations of the invention unless specifically noted. The drawings are not to scale, and some features of embodiments shown and discussed are simplified or amplified for illustrating principles and features as well as advantages of the invention.
Communication applications require and will continue to require increasing amounts of data to be transmitted over wireless systems. Systems and methods are disclosed that provide very flexible communications capabilities wherein the hardware is scalable and supportive of communication approaches known in the arts and is designed to support future modifications. Preferably, communication is accomplished using a selection from among several known protocols for voice and/or data transmission, for example, CDMA, WCDMA, TDMA, GSM, EDGE, 3G, 4G, LTE, WiMax, 802.16e, 802.11b, 802.11g, Bluetooth, Zigbee, WLAN, WPAN, WWAN, and the like. The invention is not limited to these modulation and demodulation methods. The individual communication devices may be cell phones or other devices, including wireless portable email terminals, computers, both fixed and portable, such as laptops and palm computers, smart phones, fixed location, handheld, and vehicle mounted telephone equipment, personal internet browsing devices, video equipment, and other communications or data receiver or transmitter applications. In these exemplary applications, and potentially others, all of the necessary communication processing is preferably performed using the standard hardware architecture described. An advantage of the approach is that nearly any communications standard or method can be implemented on a low-cost, high-performance commodity hardware platform. This allows easy field upgrades and standard changeover as required to upgrade systems for performance or standards reasons. Additionally, multiple standards may be supported simultaneously on the same platform and/or multiple service providers may share the same hardware resource for more cost effective solutions. Also, the architecture components are commonly available components so that costs may be reduced by using components also used in other high volume industries. 
Further advantages include one or more of: general programmability; reduced development costs; rapid remote field upgrades and new waveform modes without physical investment; partitionable processing, accommodating multiple standards, operators, and virtual base stations simultaneously; accommodation of developing standards without hardware changeover; a scalable architecture in which only new processing elements need to be added for additional performance; reduced latency through parallel processing; and the use of readily available low-cost, high-performance interconnect and switching hardware, such as Infiniband or similar technologies, for scaling across multiple processing blocks. In general, the invention provides communication signal processing using an implementation of parallel processing, preferably massively parallel processing. The processing systems and methods preferably use readily available components, maintain the required performance, and are sufficiently programmable and adaptable to reduce the investment required to implement many existing standards and future modifications. The system and methods described herein may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The described embodiments are therefore to be considered in all respects illustrative and not limiting. The invention described is one potential implementation of a software defined radio (SDR).
A GPGPU as used in preferred implementations of the invention is a processing system that may include a plurality of processing elements interconnected by shared memory interfaces, and a shared memory connected to a host general purpose processor (GPP) for control of the shared memory. Each processing element is connected to a local memory to increase total memory bandwidth for processing. The processing system efficiently performs communications processing. The GPGPUs are preferably massively parallel, with hundreds or thousands of processors, which changes the processing paradigm. Each processing element may be a vector processor using a single instruction stream with a separate data stream for each element. One or more devices are included for providing communication between the processor elements. A host processor is utilized for programming and controlling the processor elements. Each processor element has local memory, and the processor elements may each perform communications signal processing.
An example of a preferred implementation of the methods and systems of the invention is described with respect to use in the context of wideband cellular, i.e., wireless, communications, but the invention is not limited to such applications. A typical exemplary application considered is a cellular telephone base station and data access point. Graphics Processor Units (GPUs) have been generalized to address a wider range of applications beyond computer graphics and have sometimes been renamed General Purpose Graphics Processor Units (GPGPUs). These processors have been applied to many traditional high performance computing applications, such as, not surprisingly, graphics processing. It has been noted by the inventors that these processors to date have not been applied to communications. Modern GPGPUs offer floating point arithmetic, reducing the engineering effort in the implementation of many algorithms. They also support fixed point arithmetic, so that algorithms may utilize this capability for higher speed processing where deemed feasible or to ease the porting of software already using fixed point arithmetic. Examples of communications functions that may be provided according to the invention include but are not limited to: channelizer/polyphase filters; equalization filters; Fast Fourier Transforms/Inverse Fast Fourier Transforms (FFT/IFFT); forward error correction (FEC) encoding and decoding (where the code may include convolutional codes, LDPC codes, Turbo codes, and algebraic codes); interleaving/de-interleaving; matched filtering; numerically controlled oscillator/quadrature mixers; Automatic Gain Control (AGC); clock/carrier recovery; CDMA spreading/despreading; rake receiver; sample rate conversion; preamble insertion/removal; preamble correlations; and generation of quality metrics (such as EVM and ACLR, for example). According to the invention, all of these functions may be performed with GPGPUs or similar processors.
The processors may also be used for higher layer processing required in a complete communications system such as a base station. One example is the mapping of MAC addresses to IP addresses. This mapping can be significantly accelerated on a parallel or massively parallel processing architecture, as in a GPGPU, by assigning a search range to each processing element and then collecting the information in a central point with the ‘winning’ processor reporting the match found. Distributed algorithms may also be used for routing, using a distributed Dijkstra algorithm as an example. Alternatively, the L2/L3 functionality may be provided using multi-core microprocessors.
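The range-partitioned address search described above can be sketched in CUDA as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the `MacIpEntry` layout, the function names `mac_lookup_kernel` and `lookup_ip_gpu`, the launch dimensions, and the use of an interleaved (grid-stride) search range per thread are all hypothetical choices. The "central point" is modeled as a single integer in device memory that the matching thread updates atomically. Error checking of CUDA runtime calls is omitted for brevity.

```cuda
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical table entry: a 48-bit MAC address held in 64 bits, and an IPv4 address.
struct MacIpEntry {
    uint64_t mac;
    uint32_t ip;
};

// Each thread searches its own interleaved range: tid, tid + stride, tid + 2*stride, ...
__global__ void mac_lookup_kernel(const MacIpEntry* table, int num_entries,
                                  uint64_t target_mac, int* match_index)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = tid; i < num_entries; i += stride) {
        if (table[i].mac == target_mac) {
            // The 'winning' thread reports its index to the central location.
            // atomicMin keeps the lowest index if the table holds duplicates.
            atomicMin(match_index, i);
        }
    }
}

// Returns true and writes the IPv4 address to *ip_out if target_mac is in the table.
bool lookup_ip_gpu(const std::vector<MacIpEntry>& table, uint64_t target_mac,
                   uint32_t* ip_out)
{
    assert(ip_out != nullptr);
    if (table.empty()) return false;
    const int n = static_cast<int>(table.size());

    MacIpEntry* d_table = nullptr;
    int* d_match = nullptr;
    cudaMalloc(&d_table, n * sizeof(MacIpEntry));
    cudaMalloc(&d_match, sizeof(int));
    cudaMemcpy(d_table, table.data(), n * sizeof(MacIpEntry), cudaMemcpyHostToDevice);

    int match = INT_MAX;  // sentinel meaning "no match found"
    cudaMemcpy(d_match, &match, sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = std::min((n + threads - 1) / threads, 64);
    mac_lookup_kernel<<<blocks, threads>>>(d_table, n, target_mac, d_match);

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(&match, d_match, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_table);
    cudaFree(d_match);

    if (match == INT_MAX) return false;
    *ip_out = table[match].ip;
    return true;
}
```

In a deployed system the table would normally stay resident in device memory between lookups, and many lookups would be batched per launch; the per-call copy here only keeps the sketch self-contained.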
In
In some applications more processing will be required than can be provided by a single GPGPU. Now referring primarily to
In the example of virtualization of a base station, each service provider may be given a physical GPP resource, and the GPGPU processing may be managed in the host processor. However, despite some reduction in performance, it may be preferred for the GPP pool to use the virtual processor pool so that the system can benefit from this approach. The GPPs may be allocated based on the virtual processing load, for example, where a specific vendor requires a portion of a GPP or several across the array. The system then also benefits in that redundancy may be built into the operation of the system so that failed units can be reported and the work dynamically reassigned to functional units. Consider the application illustrated in
The processing may be distributed to accommodate processing loads that are not feasible with the current state of the art in a number of different ways or some combination of ways. The processing loads may be split using at least one of the following.
In all of these cases, the resources may be statically or dynamically allocated in any combination. Static allocations are the simplest but may not be the most efficient use of the processing resources. Dynamic allocation utilizes the resources more efficiently but an overhead is incurred in the allocation of the resources.
In the shared resource model many resources may be deployed for the implementation of the base station. With multiple processing modules or multiple remote radio heads (RRHs), the system may include a switching fabric to route data between resources for load balancing. The introduction of a switching fabric allows the base station to be scaled to nearly any size as may be required.
With the possibility of supporting multiple service providers on a single platform, the base station may be provided as a service itself to a cellular service provider or an agent of the service provider. These services may be one of the following, or a combination of the following.
An exemplary processing flow used in the signal processing of the transmission path is shown in
In general, a system that uses a plurality of parallel processors to provide the functions required in a high performance system for waveform processing may include a plurality of functions that are parameterized such that the required processing steps are partitioned among a plurality of processing elements. The plurality of functions have inputs, outputs, and parameters in accordance with a common protocol such that the processing functions and control functions are separated along these lines. A hierarchy of communications methods between processors and groups of parallel processors, efficient for the functions considered, may also include multi-ported memories or switch fabrics. The processing functions of the system can be scheduled in any order using the common interface rules to accomplish the system function desired. The processing elements or blocks may process vectors using a SIMD or SIMT (single instruction multiple thread) architecture and may contain multiple SIMD/SIMT blocks. The processing system may be connected to a plurality of antenna elements to facilitate MIMO operation, multiple virtual base stations, multiple service providers, or multiple radio standards simultaneously or in any combination thereof. The system work load may be partitioned by radio standard, service provider, antennas, or other logical or arbitrary partition or in any combination thereof. The work load may be dynamic, allocating resources optimally in some sense to reduce operating costs, power, size or other appropriate metric or in any combination thereof. The system may enable hoteling (placing remote radio heads on multiple antenna masts). Processors may be synchronized using semaphores or equivalent synchronization methods on a multi-processor system. The allocation of computing resources can be dynamic, using task queues whose tasks are allocated to available processing elements according to a priority schedule.
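One way to picture the common protocol for inputs, outputs, and parameters is a single function signature shared by every processing stage, so that a control routine can schedule stages in any order without knowing what each one computes. The following CUDA sketch illustrates that idea only. The `StageFn` signature, the `Stage` record, the two toy stages (`scale_stage` and `offset_stage`), and the `run_pipeline` scheduler are all hypothetical and are not taken from the described system. Error checking of CUDA runtime calls is omitted for brevity.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical common stage interface: device input, device output, length, one parameter.
using StageFn = void (*)(const float* d_in, float* d_out, int n, float param);

__global__ void scale_kernel(const float* in, float* out, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = gain * in[i];
}

__global__ void offset_kernel(const float* in, float* out, int n, float offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + offset;
}

// Number of 256-thread blocks needed to cover n elements.
static int blocks_for(int n) { return (n + 255) / 256; }

// Processing functions: both expose exactly the same signature.
void scale_stage(const float* d_in, float* d_out, int n, float gain)
{
    scale_kernel<<<blocks_for(n), 256>>>(d_in, d_out, n, gain);
}

void offset_stage(const float* d_in, float* d_out, int n, float offset)
{
    offset_kernel<<<blocks_for(n), 256>>>(d_in, d_out, n, offset);
}

// Control record: which function to run and with what parameter.
struct Stage {
    StageFn fn;
    float param;
};

// Control function: runs the stages in the order given, ping-ponging two device buffers.
std::vector<float> run_pipeline(const std::vector<float>& input,
                                const std::vector<Stage>& stages)
{
    assert(!input.empty());
    const int n = static_cast<int>(input.size());
    float* d_a = nullptr;
    float* d_b = nullptr;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMemcpy(d_a, input.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    for (const Stage& s : stages) {
        s.fn(d_a, d_b, n, s.param);  // kernels on the default stream run in issue order
        std::swap(d_a, d_b);         // the output of this stage feeds the next one
    }

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return out;
}
```

Because the scheduler sees only `Stage` records, reordering the list reorders the processing: scaling by 2 and then adding 1 maps the input {1, 2} to {3, 5}, while adding 1 and then scaling by 2 yields {4, 6}.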
The processing system may also be used to accelerate higher layer protocol elements. The higher layer functions may be performed on more conventional general purpose processors (GPPs) that may themselves be multi-processors. The processing system may include a GPP for control, scheduling and synchronization of processing tasks. The processing system may include antenna elements whose received signals are amplified, digitized, and presented to the processing system, and digitized signals may be presented to an antenna element for transmission. Digitized data may be time stamped to align or identify data where timing is required to perform the processing correctly. The processing system may include an ASIC that has multiple processing elements, or a system composed of multiple ASICs of this type to achieve a larger processing capability. The processing system may include a graphics processing unit (GPU) or general purpose graphics processing unit (GPGPU). The processing system may include ADC and DAC interfaces for the source and destination signal streams, a plurality of ADC and DAC interfaces, or another more direct interface to an RF upconversion/downconversion interface. The processing system may include dynamic spectrum awareness by performing operations required for decisions in allocating spectrum to maximize or minimize an objective function. The processing system may perform processing required to drive cognitive radio decisions (e.g., sufficiently computationally intelligent radio resources and related computer-to-computer communications to detect user communications needs as a function of use context, and to provide radio resources and wireless services most appropriate to those needs). The processing system may compute metrics used in mesh network routing and compute optimal routes according to an objective function.
The processing system may utilize a hierarchy of switching elements to create a switching fabric that allows communications between any pair of processing elements, either directly or indirectly, using the fabric. The processing system may use virtual machines for partitioning the processing between different service providers.
In order to further illustrate the principles and practice of the invention, a specific example of an FIR filter using the GPGPU in accordance with the presently preferred embodiments is shown below using the programming language CUDA which is a multiprocessor extension to C:
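The following is a minimal illustrative sketch of a direct-form FIR filter in CUDA and should not be read as the inventors' original listing. Each thread computes one output sample y[n] as the sum over k of h[k] x[n - k]. The names `fir_kernel` and `fir_filter_gpu`, the block size of 256 threads, and the treatment of samples before the start of the input as zero are assumptions made for illustration. Error checking of CUDA runtime calls is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One thread per output sample: y[n] = sum_k h[k] * x[n - k].
// Samples before the start of the input are treated as zero.
__global__ void fir_kernel(const float* x, const float* h, float* y,
                           int num_samples, int num_taps)
{
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= num_samples) return;

    float acc = 0.0f;
    for (int k = 0; k < num_taps; ++k) {
        int idx = n - k;
        if (idx >= 0) acc += h[k] * x[idx];
    }
    y[n] = acc;
}

// Hypothetical host wrapper: copies the input and taps to the device,
// launches the kernel, and copies the filtered samples back.
std::vector<float> fir_filter_gpu(const std::vector<float>& x,
                                  const std::vector<float>& h)
{
    assert(!x.empty() && !h.empty());
    const int num_samples = static_cast<int>(x.size());
    const int num_taps = static_cast<int>(h.size());

    float* d_x = nullptr;
    float* d_h = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_x, num_samples * sizeof(float));
    cudaMalloc(&d_h, num_taps * sizeof(float));
    cudaMalloc(&d_y, num_samples * sizeof(float));
    cudaMemcpy(d_x, x.data(), num_samples * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_h, h.data(), num_taps * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (num_samples + threads - 1) / threads;
    fir_kernel<<<blocks, threads>>>(d_x, d_h, d_y, num_samples, num_taps);

    std::vector<float> y(num_samples);
    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(y.data(), d_y, num_samples * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_h);
    cudaFree(d_y);
    return y;
}
```

For example, filtering the input {1, 2, 3, 4} with the two-tap averaging filter {0.5, 0.5} gives {0.5, 1.5, 2.5, 3.5}. A production filter would more likely stage the taps in constant or shared memory and process complex samples; the sketch favors readability over throughput.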
A portable system may include: an RF up conversion and down conversion component interfacing to a digital processor and an antenna; a digital processor including a plurality of processing elements; and a transducer for communications with the local environment that includes at least one of the following elements: a speaker and microphone; a digital interface for communications with another processor or storage device; a second wireless communications device; an analog to digital converter and a digital to analog converter for providing an analog interface; digital processing elements that can be programmed to support a plurality of communications waveforms; digital processing elements that can be programmed to support an image processing function.
The systems and methods of the invention provide one or more advantages including, but not limited to, improved communications efficiency and reduced costs. While the invention has been described with reference to certain illustrative embodiments, those described herein are not intended to be construed in a limiting sense. For example, variations or combinations of features or materials in the embodiments shown and described may be used in particular cases without departure from the invention. Although the presently preferred embodiments are described herein in terms of particular examples, modifications and combinations of the illustrative embodiments as well as other advantages and embodiments of the invention will be apparent to persons skilled in the arts upon reference to the drawings, description, and claims.