1. Field of the Invention
The disclosed technology relates to a data transfer device for transferring data on a platform, in particular for simultaneously transferring data between different components of the platform.
2. Description of the Related Technology
The continuously growing variety of wireless standards and the increasing costs related to IC design and handset integration make implementation of wireless standards on reconfigurable radio platforms the only viable option in the near future.
In the concept of cognitive reconfigurable radio (CRR), various communication modes need to be supported. The required flexibility and high performance lead to heterogeneous multiprocessor platforms. By platform is meant the framework on which applications may be run. CRR is an effective way to provide the performance and flexibility necessary therefor. A cognitive radio, broadly defined, is a radio that can autonomously change its transmission and reception parameters based on interaction with and learning of the environment in which it operates. A more spectrum-centric definition denotes a radio that co-exists with other wireless systems using the same spectrum resources without significantly interfering with them (also referred to as opportunistic radio). Both are considered in parallel.
Another type of cognitive radio is a software-defined radio (SDR) system, which is a radio communication system where components that previously were implemented in hardware are now instead implemented using software on a computing system, such as for example an embedded computing device. A basic SDR system may comprise a computing device equipped with a sound card, or another analog to digital converter, preceded by some form of RF front end. Significant amounts of signal processing are handed over to a general purpose processor of the computing device, rather than being done in special-purpose hardware. Such a design produces a radio that can receive and transmit different radio protocols based solely on the software used.
The wireless standards in the scope of CRR or SDR are LTE evolutions, WLAN evolutions and broadcasting standards. The goal is to support 4G connectivity requirements which include support of 1 Gbps and 100 Mbps as well as support of 4×4 MIMO operations with advanced detection capabilities. The 3GPP LTE standard is a very flexible standard and dimensioning a platform largely depends on the mode subset supported by the platform. The interconnection bandwidth between the baseband engines and the front-end interfaces on the one hand and between the baseband engines and the outer modem blocks on the other hand, both during reception and transmission, as well as the computational requirements for the baseband engines and the outer modem blocks, largely depend on the envisioned communication modes. In the 802.11x set of standards, and more specifically in the 802.11n standard, the functional requirements for the platform, in terms of required interconnection bandwidth (between digital front-end interface and baseband engines on the one hand, and between the baseband engines and the outer modem blocks on the other hand) and in terms of the computational requirements of the inner and outer modem processing, depend on the chosen communication mode.
Most commonly, as for example described in WO 2007/132016, a bus infrastructure such as AHB (Advanced High Performance Bus), AHB-Lite (a subset of the full AHB specification intended for use in designs where only a single bus master is used) or AXI (Advanced eXtensible Interface) is used as interconnection. Both in gate count and in programming paradigm, AXI and AHB are heavier than what is needed. Further, predictability of the bus architecture is also desired. For broadcasting from one source to multiple destinations this type of bus becomes complex and should even be avoided. Most interconnects in the art have one or more of the following problems:
Another common technique is a point-to-point connection, which is not flexible enough to support different parallelization schemes.
WO 2008/103850 describes a video surveillance system including a plurality of input ports for coupling a camera, synchronization logic blocks coupled to the input ports, an image sharing logic block coupled to the camera ports, and an output port coupled to the image sharing logic block. In the system described it is desired to synchronize image capture and/or subsequent transfer between multiple cameras. The surveillance system makes sure all the input ports are synchronized, and then sends the information. However, as the data that will enter the system is unpredictable, such a system needs to have overdesigned memory space at the output in order to prevent a buffer data overflow at the output. This is not desired because overdesigning memory space consumes area and prevents the system from being low power.
Certain inventive aspects relate to a device for energy and latency efficient communication between different components on a platform.
One inventive aspect relates to a data transfer device adapted for simultaneous transfer of data between at least 3 ports of which at least one is an input port and at least one is an output port. The data transfer device comprises at least two controllers (IC1, IC2) for executing instructions that transfer data between an input and an output port. The controllers are adapted for receiving a synchronization instruction for synchronizing between input and output ports.
In a data transfer device according to one inventive aspect, the controllers may furthermore be adapted for receiving a synchronization instruction for synchronizing between the controllers.
In one aspect, each controller is connected to one output port.
In one aspect, the data transfer device comprises at least two program memories for storing transfer instructions. The data transfer device may comprise as many program memories as there are controllers.
In an embodiment, the data transfer device further comprises a controller interface for programming the at least two program memories.
The proposed device provides an efficient and predictable means of synchronized and un-synchronized communication between different components on the platform. The device supports efficient communication between multiple cores with low, predictable latency as well as power. Furthermore, multiple streams, even of multiple (transmit and/or receive) standards, can run in parallel with the required freedom to be provided to ensure different code parallelization strategies between the different cores. A distributed and programmable stream control architecture is presented that can manage multiple synchronous or asynchronous communication streams in parallel. Flow control is implemented between source and destination as well as between streams.
It is an advantage of certain inventive aspects that they may be used when designing a reconfigurable platform solution that supports CRR and SDR systems. The platform may support co-existence of multiple standards and the handover between the standards. At baseband level, the flexibility is provided to support this during run-time by run-time reconfiguration of the platform, so that any change in parallelism/mode of operation at run-time can be obtained.
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Features from the dependent claims may be combined with features of the independent claims and with features of other dependent claims as appropriate and not merely as explicitly set out in the claims.
Illustrative embodiments are described below in conjunction with the appended drawing figures, wherein like reference numerals refer to like elements in the various figures, and wherein:
The drawings are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes.
Any reference signs in the claims shall not be construed as limiting the scope.
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto.
Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. The terms are interchangeable under appropriate circumstances and the embodiments of the invention can operate in other sequences than described or illustrated herein.
Moreover, the terms top, bottom, over, under and the like in the description and the claims are used for descriptive purposes and not necessarily for describing relative positions. The terms so used are interchangeable under appropriate circumstances and the embodiments of the invention described herein can operate in other orientations than described or illustrated herein.
The term “comprising”, used in the claims, should not be interpreted as being restricted to the means listed thereafter; it does not exclude other elements or steps. It needs to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the scope of the expression “a device comprising means A and B” should not be limited to devices consisting of only components A and B. It means that with respect to the present invention, the only relevant components of the device are A and B.
A data transfer device 10 is presented which is adapted for simultaneous transfer of data on a platform. The data transfer device 10 serves as an interconnect between different components of a platform; e.g. the interconnect between the baseband engines (e.g. CGA) 12 and the front-end interfaces (e.g. DFE) 11 on the one hand and between the baseband engines 12 and the outer modem blocks (e.g. FEC) 12 on the other hand. In particular, the data transfer device 10 comprises at least three ports of which at least one is an input port, e.g. ports 13, 14, 15 in the embodiment illustrated, and at least one is an output port, e.g. ports 16, 17, 18 in the embodiment illustrated. The data transfer device 10 comprises at least two controllers 20, 21 for executing instructions that transfer data between an input port 13, 14, and an output port 16, 17, 18. The intelligence of the data transfer device 10 according to one embodiment is in the programmable interconnect controllers 20, 21. The controllers 20, 21 are adapted for receiving a synchronization instruction for synchronizing between the controllers 20, 21 and/or a synchronization instruction for synchronizing input ports 13, 14, and output ports 16, 17, 18.
Possible types of data flows between the different components are for example (the example for illustration purposes only being specific to an SDR platform):
The proposed transfer device 10 according to one embodiment provides an efficient and predictable means of synchronized and un-synchronized communication between different components on the platform. The transfer device 10 supports efficient communication between multiple cores with low, predictable latency as well as power. Furthermore, multiple streams can run in parallel with the required freedom to be provided to ensure different code parallelization strategies between the different cores. Multiple streams may be multiple transmit or receive streams or both. A distributed and programmable stream control architecture is presented that can manage multiple synchronous or asynchronous communication streams in parallel. Flow control is implemented between source and destination as well as between streams. The distributed control mechanism also refers to the possibility to decouple data and control traffic and/or to decouple data traffic to avoid reuse of the interconnect.
One of the biggest changes compared to the previous generation platform is the addition of a custom interconnect for data communication between the different cores. In wireless CRR (cognitive reconfigurable radio)/SDR (software defined radio) systems data and control communication between different components are known at design time. Using a DMA to perform the traffic not only requires the ARM processor to program it at a very fine granularity (every symbol or few symbols), but also doubles the traffic on the bus.
The platform according to one embodiment as illustrated in
The control plane architecture has different functions: exchanging state information and control data between different processing units in the data path, and configuring the different processing cores in the data path to set up a burst. The control processor 28 may be solely responsible for packet level control. It may set up the data plane to process a complete packet and may only be interrupted when data is available that is useful for the software PHY layer or MAC layer.
The data transfer device 10 according to one embodiment is a custom interconnect for data communication between different cores on the platform. It comprises FIFOs 25, 26 connected to a crossbar 27. The FIFOs 25, 26 allow having flow control over the complete transmit or receive chain. The FIFOs 25, 26 can have any suitable implementation, for example they can be implemented as software or as hardware, or even just as memories. In case of memories, the interconnect controller acts as a DMA with its own program to transfer data at appropriate moments in time from source to destination over the data transfer device 10. Because of the decentralized control by means of interconnect controllers 20, 21, the control processor 28 of the platform can program the interconnect controllers 20, 21 for a complete burst of symbols. This allows the data flow to be set up and running during the burst itself without any further intervention. This implies that only cores that need to communicate with each other can do so (just enough flexibility).
Advantages of the data transfer device 10 according to one embodiment include decoupled data and control traffic between the different cores on the platform, flow control, flexibility, low power consumption, high throughput and low latency interconnect, reduction of the load of the control processor 28 of the platform to reprogram transfers. The low power consumption may be obtained because the data transfer device 10 may act as a dedicated control for transfer of data between components, thus ensuring that timing of this transfer and amount of data transferred is appropriate. A low latency interconnect may be obtained by FIFO connections 25, 26 at either end of the crossbar 27. Low latency may furthermore be obtained by programmability of the data transfer device 10, such that the transfer can be timed when the throughput would be high, such that latency of the transfer is minimized.
The details of the interfaces 30, 31 between the different blocks are now specified. Table 1 describes the AHBhandler module interface.
It is to be noted that the read and write signals are mutually exclusive; asserting both in the same clock cycle causes an error. The writedata is only relevant if a write transaction is requested (i.e. if write is asserted) and the readdata is only valid if produce is asserted. The address is used if read or write is asserted. If the AHBhandler 42, 43 de-asserts the accept signal, the read or write request (if any) on the data transfer device 10 is not being handled in this clock cycle and should not be overwritten with a next one until the accept signal is asserted again. The consume signal can be de-asserted to prevent the AHBhandler 42, 43 from removing data from its FIFO. By doing so, the AHBhandler module 42, 43 will keep offering the readdata until the consume signal is asserted again. The produce signal is set by the AHBhandler 42, 43 if readdata is being produced (a data valid signal).
Table 2 shows the instruction set of the interconnect controllers 20, 21. The first 4 instructions (with opcode 0 to 3) are control constructs that do not cause data to be transferred. The last 4 instructions (with opcode 4 to 7) cause data to be transferred on the data transfer device 10. One instruction (LOADCONST) takes an operand on the next program line, which is also a 16-bit word.
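By way of illustration only, the request rules stated above can be sketched as a small behavioural model in Python. The function names `check_request` and `drive` and the per-cycle tuple representation are hypothetical and are not part of the AHBhandler module interface; only the two rules taken from the text are modelled (read and write are mutually exclusive, and a request is held unchanged while accept is de-asserted).

```python
def check_request(read: bool, write: bool) -> None:
    """Reject a cycle in which read and write are asserted together."""
    if read and write:
        raise ValueError("read and write asserted in the same clock cycle")


def drive(requests, accept_pattern):
    """Present requests in order, holding each one until accept is asserted.

    requests       -- list of (read, write, address) tuples (hypothetical form)
    accept_pattern -- value of the accept signal in each successive clock cycle
    Returns a list of (cycle, address) pairs for the requests that were handled.
    """
    pending = list(requests)
    handled = []
    for cycle, accept in enumerate(accept_pattern):
        if not pending:
            break
        read, write, address = pending[0]
        check_request(read, write)
        if accept:
            # The request is handled in this cycle; the next one may follow.
            handled.append((cycle, address))
            pending.pop(0)
        # Otherwise the same request stays on the interface unchanged.
    return handled


log = drive([(True, False, 0x10), (False, True, 0x14)], [False, True, True])
```

In this sketch the first request is stalled for one cycle because accept is low, and is only replaced by the second request once it has been accepted.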
In this section, the coding of the individual instructions is presented, together with a more detailed description of the behavior of the instruction. As a general remark, instructions are coded in 16-bit words, of which the most significant three bits denote the instruction's opcode as presented in Table 2.
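By way of illustration only, the general coding rule can be sketched in Python as follows. The opcode values are those stated for the individual instructions in this description; the names `Opcode`, `decode_opcode` and `transfers_data` are hypothetical helpers, and the layout of the remaining 13 bits, which differs per instruction, is deliberately not modelled.

```python
from enum import IntEnum


class Opcode(IntEnum):
    """Opcodes as stated in the text for each instruction."""
    SYNC = 0
    JUMPNZ = 1
    LOADBB = 2
    LOADCNT = 3
    LOADCONST = 4
    INSCONST = 5
    FIFO2NULL = 6
    XFER = 7


def decode_opcode(word: int) -> Opcode:
    """Return the opcode held in the three most significant bits of a 16-bit word."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction words are 16 bits wide")
    return Opcode(word >> 13)


def transfers_data(word: int) -> bool:
    """Per the text, opcodes 0 to 3 are control constructs and 4 to 7 transfer data."""
    return decode_opcode(word) >= Opcode.LOADCONST
```

For example, under this sketch a word of 0xE000 decodes to XFER, and an all-zero word decodes to SYNC with all parameters set to 0.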
The “SYNC” instruction can be used for synchronization purposes. Its format is depicted in Table 3. The opcode for this instruction is 0, the other parameters are:
A special note is required for the SYNC instruction with all parameters set to 0. This instruction, coded as 0, triggers no functionality at all in the interconnect controller 20, 21. It will stall at this instruction until the instruction is reloaded with non-zero instruction data. It is to be noted that this instruction has no significance for synchronization anyway.
The “JUMPNZ” instruction checks whether a counter has reached 0. If so, program execution continues with the next program line; if not, the counter is decremented and program execution is continued at a new location. Together with the “LOADCNT” instruction, this instruction provides for iterations in the program. The number of nested loops is limited by the number of counters available. The instruction encoding provides 3 bits for the counter identifier, so the maximum number of counters is 8. Table 4 shows the encoding of the “JUMPNZ” instruction. Its opcode is 1, the other parameters are:
The “LOADBB” loads the initial parameters to be used for accessing the baseband 12 (with the “XFER” and “INSCONST” instructions). It sets an initial address and an increment value for this address. Its encoding is shown in Table 5. Its opcode is 2, the other parameters are:
The “LOADCNT” instruction loads a value into a counter. Together with the “JUMPNZ” instruction, it can be used to insert iterations in a program. Its encoding is shown in Table 6. Its opcode is 3, the other parameters are:
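By way of illustration only, the interplay of “LOADCNT” and “JUMPNZ” can be sketched with a small symbolic interpreter in Python. Instructions are written as tuples rather than as encoded 16-bit words, because the bit-level encodings are given in the tables; the function name `run` and the tuple form are hypothetical. Under the behaviour described above, a loop body closed by “JUMPNZ” executes once more than the value loaded into the counter.

```python
def run(program, num_counters=8, max_steps=10_000):
    """Execute a symbolic program and return the non-control instructions executed.

    ("LOADCNT", counter, value) loads a counter.
    ("JUMPNZ", counter, target) falls through if the counter is 0; otherwise it
    decrements the counter and continues at program line `target`.
    Any other tuple is recorded in the returned trace.
    """
    counters = [0] * num_counters
    trace = []
    pc = 0
    steps = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")
        op, *args = program[pc]
        if op == "LOADCNT":
            counter, value = args
            counters[counter] = value
            pc += 1
        elif op == "JUMPNZ":
            counter, target = args
            if counters[counter] == 0:
                pc += 1
            else:
                counters[counter] -= 1
                pc = target
        else:
            trace.append(op)
            pc += 1
    return trace


# A single loop: counter 0 is loaded with 3, so the body runs 4 times.
single = run([("LOADCNT", 0, 3), ("XFER",), ("JUMPNZ", 0, 1)])

# Two nested loops use two counters: 2 outer passes of 3 inner passes each.
nested = run([
    ("LOADCNT", 0, 1),
    ("LOADCNT", 1, 2),
    ("XFER",),
    ("JUMPNZ", 1, 2),
    ("JUMPNZ", 0, 1),
])
```

In the nested case the outer “JUMPNZ” jumps back to the inner “LOADCNT”, so that the inner counter is reloaded on every outer pass; each level of nesting therefore occupies one counter.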
The “LOADCONST” instruction is used to load the constant to be used by the “INSCONST” instruction to insert a constant value in the baseband memory or a FIFO. The constant is a 32-bit constant, of which the “LOADCONST” instruction can initialize either the 16 LSBs or the 16 MSBs. It takes two “LOADCONST” instructions to load the complete 32-bit constant. The “LOADCONST” takes a 16-bit operand on the next program line. The encoding of the “LOADCONST” instruction is shown in Table 7. Its opcode is 4, the other parameter is:
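By way of illustration only, the assembly of the 32-bit constant from two 16-bit operands can be sketched in Python as follows. The function name `load_const` and the boolean `high_half`, which stands in for the instruction parameter selecting the half to be initialized, are hypothetical.

```python
MASK16 = 0xFFFF


def load_const(current: int, operand: int, high_half: bool) -> int:
    """Return the 32-bit constant after one LOADCONST with a 16-bit operand.

    Only the selected half is overwritten; the other half keeps its value.
    """
    if not 0 <= operand <= MASK16:
        raise ValueError("the operand is a 16-bit word")
    if high_half:
        return (current & MASK16) | (operand << 16)
    return (current & (MASK16 << 16)) | operand


constant = 0
constant = load_const(constant, 0xBEEF, high_half=False)  # 16 LSBs
constant = load_const(constant, 0xDEAD, high_half=True)   # 16 MSBs
```

Since each “LOADCONST” is followed by its operand on the next program line, loading a complete constant in this way occupies four 16-bit program words.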
The “INSCONST” instruction inserts the value previously loaded with the “LOADCONST” instruction a number of times in the baseband memory or in a FIFO. It always inserts 32-bit values, but depending on settings part of it can be 0. This can e.g. be used to add a signature to a number of datawords transferred, to allow the baseband processor 12 to detect that all required input data is available and that it can start. Its encoding is shown in Table 8. It has opcode 5, the other parameters are:
The “FIFO2NULL” instruction removes a number of datawords from a FIFO 25, 26 and discards them. The instruction encoding is shown in Table 9. It has opcode 6, the other parameters are:
The “XFER” instruction moves a number of datawords from the baseband memory to a FIFO 25, 26 or vice versa. The instruction encoding is shown in Table 10. Its opcode is 7, the other parameters are:
Two examples of code to be loaded into an interconnect controller in accordance with one embodiment are given below. It should be noted that both of these examples illustrate a trade-off between throughput and latency.
The above code illustrates how transfers happen from 4 sources to one destination in chunks of 40 elements. In the above code, each ‘XFER’ instruction transfers 40 elements from one source to the destination. These transfers are in an (inner) loop of size 4 (0 to 3), which allows a total of 160 elements (4 × 40) to be transferred from source 0 to destination 1. It can be noted that, in steady state, a steady set of transfers can be done at high speed. These fine-grain transfers allow the latency of the transfer to be hidden, making it quite efficient, such that the buffers at the source side can be kept small.
The above source code shows a coarser set of transfers from sources 0 to 3 to the destination port compared to example 1. There is a loop of count 1 only, such that there are effectively only 4 transfers, each of 160 elements, from a source to the destination. Although the first transfer takes more cycles (due to setup at source 0), the following transfers are quite efficient. The interconnect controller 20 waits for source 0 to be ready before the transfers are made; therefore an extra number of cycles is required. This mode of transfer achieves higher throughput, as much more data is transferred per instruction; however, there is more latency for transferring data from source 3.
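By way of illustration only, the trade-off between the two examples can be made concrete with the following arithmetic sketch in Python. It assumes, as a simplification not stated in the text, that in example 1 each loop pass visits sources 0 to 3 in order, that transfers are strictly sequential, and that setup cycles are ignored; the names `schedule` and `summarize` are hypothetical.

```python
def schedule(chunk, iterations, sources=4):
    """Return the ordered list of (source, elements) transfers for one burst."""
    return [(src, chunk) for _ in range(iterations) for src in range(sources)]


def summarize(transfers, last_source=3):
    """Count executed transfers, total elements, and the elements moved
    before the first transfer from the last source begins."""
    total = sum(n for _, n in transfers)
    before_last = 0
    for src, n in transfers:
        if src == last_source:
            break
        before_last += n
    return {
        "xfers_executed": len(transfers),
        "total_elements": total,
        "elements_before_last_source": before_last,
    }


fine = summarize(schedule(chunk=40, iterations=4))     # example 1
coarse = summarize(schedule(chunk=160, iterations=1))  # example 2
```

Under these assumptions both schedules move 640 elements, but the fine-grain schedule executes 16 transfers where the coarse one executes 4, whereas source 3 is first served after 120 elements in the fine-grain case and after 480 elements in the coarse case.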
The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention may be practiced in many ways. It should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated.
While the above detailed description has shown, described, and pointed out novel features of the invention as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the technology without departing from the spirit of the invention. The scope of the invention is indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Number | Date | Country | Kind |
---|---|---|---|
10160036.9 | Apr 2010 | EP | regional |
This application is a continuation of PCT Application No. PCT/EP2010/066993, filed Nov. 8, 2010, which claims priority under 35 U.S.C. §119(e) to U.S. provisional patent application 61/259,441 filed Nov. 9, 2009. Each of the above applications is incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
61259441 | Nov 2009 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2010/066993 | Nov 2010 | US |
Child | 13465277 | | US |