The proliferation of computer networks has led to an increasing demand for high-performance computing systems. For example, there is a growing demand for computing devices capable of handling multiple communications protocols, thereby enabling a single device—such as a personal computer, cellular telephone, personal digital assistant (PDA), or the like—to switch seamlessly between any of a variety of communication protocols (e.g., 802.11, General Packet Radio Service (GPRS), Bluetooth, Ultra Wideband (UWB), etc.). Such a capability might, for example, enable a user to maintain a continuous connection to the Internet or a virtual private network (VPN) as the user moved his laptop computer between a cable modem connection in his apartment, to a wireless local area network (WLAN) connection in his apartment complex, to a mobile connection while riding the train to work, to a local area network connection at his office. As another example, the ability to switch between a variety of communication protocols may be useful on a business trip, as a user moves between countries or regions that have adopted different communications standards.
Computer systems typically include a combination of hardware and software, although the relative roles and proportions of each will often vary among systems. Software-based systems typically operate by executing computer-readable instructions on general-purpose hardware. Hardware-based systems, on the other hand, are typically comprised of circuitry specially designed to perform specific operations (e.g., application specific integrated circuits (ASICs)). As a result, hardware-based systems generally have higher performance than software-based systems, although they also typically lack the flexibility to perform tasks other than the specific task(s) for which they were designed.
Reconfigurable systems represent a hybrid approach, in which software is used to reconfigure specially designed hardware to achieve performance approaching that offered by custom hardware. Reconfigurable systems also provide the flexibility of software-based systems, including the ability to adapt to new requirements, protocols, and standards. Thus, for example, a reconfigurable system could be used to efficiently process a variety of communications protocols, without the need for dedicated, ASIC-based digital signal processors (DSPs) for each protocol, resulting in savings in chip-size, cost, and/or power consumption.
Reference will be made to the following drawings, in which:
Systems and methods are disclosed for improving the integration of processing components in multi-processing unit systems, such as programmable or reconfigurable processors. It should be appreciated that these systems and methods can be implemented in numerous ways, several examples of which are described below. The following description is presented to enable any person skilled in the art to make and use the inventive body of work. The general principles defined herein may be applied to other embodiments and applications. Descriptions of specific embodiments and applications are thus provided only as examples, and various modifications will be readily apparent to those skilled in the art. For example, although several examples are provided in the context of the reconfigurable communications architecture, it will be appreciated that the same principles can be readily applied to other contexts as well. Accordingly, the following description is to be accorded the widest scope, encompassing numerous alternatives, modifications, and equivalents. For purposes of clarity, technical material that is known in the art has not been described in detail so as not to unnecessarily obscure the inventive body of work.
In some computer architectures, multiple execution units are used to perform complex calculations, with the results generated by one execution unit used as input to other execution units. Calculations can thus be divided among hardware elements, such that different parts of a calculation are assigned to the execution units upon which they are most efficiently carried out.
For example, the physical layer processing performed by many wireless and wired communications systems often involves a combination of numerically intensive computations and somewhat less intensive, but more general-purpose, computations. This is particularly true of protocols that use packetized data where fast acquisition is often needed. For example, processing a 802.11 a preamble typically entails fast preamble detection, fast automatic gain control (AGC) adjustment, and fast timing synchronization. These computations can advantageously be performed by processors that include a combination of datapath execution units capable of efficiently performing the intensive numerical computations, and integer units capable of performing the general purpose computations, preferably operating in parallel to reduce latency and enhance overall system performance.
When multiple execution units operate in parallel to perform a given function, it will often be desirable to pass intermediate results from one execution unit to another. Systems and methods described herein provide the ability to pass computed results from one execution unit (e.g., a datapath unit) to other, parallel execution units (e.g., integer units), in a manner that minimizes overhead (e.g., requires few clock cycles).
The operation of processor 100 is controlled by control unit 106, which sends function control signals (typically derived from instructions that the control unit is executing) to the various components of the system. For example, control unit 106 may send function control signals to integer unit 102 and datapaths 104 over dedicated control lines 112, specifying the operations to be performed on data read from memory 110.
Datapaths 104 are generally designed to perform numerically intensive operations, such as those involved in digital signal processing (DSP) calculations, while integer unit 102 performs somewhat less intensive, but more general-purpose, integer operations, such as those performed by reduced instruction set (RISC) type processors. Integer unit 102 and datapath units 104 perform their processing in parallel, and it will often be desirable to share results among them. For example, it may be desirable for datapath units 104 to pass results to integer unit(s) 102 for further processing, as might be the case if datapath units 104 provide intermediate results for a larger calculation.
As shown in
This could be accomplished in a variety of ways. One technique would be to copy the accumulator data to shared local data memory 110, where it could be retrieved by integer unit 102. Another technique would be to copy the accumulator data to an external register that is shared by the datapath unit and the integer unit. A problem with both of these approaches, however, is that they are relatively slow, since each involves multiple steps which will typically be performed on separate clock cycles (e.g., one clock cycle to copy data from accumulator register 114 to shared local data memory 110, another clock cycle to write data from shared local data memory 110 to integer unit 102).
Thus, in one embodiment the data that is written to the datapath accumulator registers 114 is also copied directly to parallel, “shadow registers” 115 in integer unit 102. As shown in
The logic shown in datapath 104a is illustrative of the type of logic that might be found in each of the datapaths 104, although it will be appreciated that any suitable logic could be used. As shown in
In some embodiments, the control unit 106 itself may be reconfigurable. Alternatively, or in addition, the elements in the datapath units may be reconfigurable, at least in the sense of performing operations in accordance with control signals received from control unit 106. In one embodiment, the signals used to reconfigure the various execution units (e.g., the signals used to specify the functions they are to perform) are sent on each clock cycle by a state machine run on control unit 106.
It should be appreciated that
It should be appreciated that a variety of changes or additions could be made to the basic process shown in
A combination of an integer unit and multiple datapath units operating in parallel with no shared execution hardware, such as that shown in
Thus, systems and methods have been described that can be used to improve the coupling of parallel integer and datapath units without requiring shared execution hardware, shared external memory, shared register hardware, or the use of data move instructions that consume extra clock cycles. A tight coupling of two processing units improves processing efficiency by reducing the overhead associated with inter-processing unit data transfers, and can thus be used to improve the efficiency of physical layer processing on a common set of hardware, thereby enabling programmable or reconfigurable processors to compete more effectively with dedicated hardware systems.
The techniques described above can be used in a variety of computing systems. For example, a processor such as that shown in
As shown in
It should be appreciated that