The present invention generally relates to embedded systems, and more particularly relates to communication between subsystems of an embedded system.
Embedded systems are special-purpose computer systems designed to perform one or more dedicated functions. Examples of embedded systems include cell phones, handheld devices, calculators, GPS (global positioning system) receivers, printers, network devices, digital cameras, controllers, etc. Embedded systems are often fabricated on a single semiconductor substrate, typically referred to as a System-on-Chip (SoC) or Network-on-Chip (NoC) system. Embedded systems are often highly complex, including multiple processor-based subsystems. The processor subsystems often share common resources such as memory, busses and peripherals to improve system cost and reduce power and packaging constraints. However, a greater burden is placed on the processor subsystems as the number of subsystems increases and more resources are shared. For example, one or more of the processor subsystems must arbitrate requests for the same common resource and maintain data coherency. As the number of processor subsystems increases, so too does the complexity of the arbitration and coherency processes that must be managed. Packaging constraints such as pin count often result in external memory resources also being shared, further complicating the arbitration and coherency schemes.
In addition to managing the use of shared resources, the processor subsystems must also be aware of which subsystems are powered down during low power or sleep modes. Otherwise, erroneous system operation may result. As a result, embedded system design is often a tradeoff between many variables such as bandwidth, efficiency, system performance, power consumption and cost. Bandwidth and power are of particular concern for handheld and mobile embedded systems where the processor subsystems are under a greater burden to meet performance requirements created by increasingly higher user demand.
The processor subsystems attempt to meet increasing user demand, but in doing so place a greater stress on the underlying embedded support system. In particular, the internal bus architecture or ‘bus fabric’, together with embedded subsystems such as DMA (direct memory access) controllers and interrupt handlers, bears a greater burden for providing transparent and efficient use of limited common resources. However, conventional bus fabrics are not fully transparent, requiring a central process to manage resource use at a relatively low level. For example, one or more processor subsystems are conventionally responsible for low-level functions such as data flow (including DMA), arbitration, interrupt handling, inter-processor communication, power management, etc. This results in a master-slave type arrangement. Yet the processor subsystems must also satisfy stringent embedded system performance requirements, requiring a greater emphasis on higher-level functions. Allocating limited processor resources between low-level and high-level functions has a tremendous effect on overall embedded system performance. For example, processor subsystems become slow and cannot efficiently handle high-level tasks when too many processor resources are allocated to low-level master-slave functions. On the other hand, bottlenecks arise in the bus fabric and between shared resources when too few processor resources are allocated to the low-level tasks.
Conventional embedded bus fabric architectures are based on a master-slave arrangement in which main components, such as the DMA unit of a processor, are bus masters that originate bus traffic. The masters communicate with bus slaves such as memory and peripherals (UART, USB, etc.). The bus slaves cannot generate traffic and only respond to memory requests from a master. A bus master accesses slave devices using two functions: read and write. In both cases the transfer originates with the master and is controlled across the bus fabric by the master. Additional functionality in slave devices is achieved using memory-mapped registers that can be read and/or written to drive the additional functions.
The bus fabric is structured as a memory space where all slave devices are assigned a physical address, the address being issued as part of a memory read/write by the master to identify the device it wants to access. Each master in the system does not have an address or assigned location within the memory map unless the master also has a slave port as is the case with certain devices like accelerators such as a DMA unit. Conventional bus systems include the ability to pipeline operations and share bus data paths between masters (e.g., interleaving, out of order data transfer, etc.) but these abilities are aimed at increasing efficiency and do not change the fundamental master-slave operation of the bus design.
According to the methods and apparatus taught herein, control and data flow functions are managed in an embedded system using a peer-to-peer access scheme instead of a master-slave topology. In doing so, such low-level functions are distributed more evenly across the system. This frees up processor resources for higher-level functions, improving embedded system performance without creating bottlenecks in the bus fabric or between subsystems. The bus fabric may include any preexisting type of bus structure. A peer-to-peer communication matrix is formed using the bus structures by inserting communication nodes at different points in the bus fabric. These nodes, referred to herein as non-terminating nodes, are interconnected with terminating nodes associated with the subsystems (e.g., memory, processors, peripherals, etc.) to complete the peer-to-peer matrix. The peer-to-peer matrix enables all subsystems to communicate with the bus fabric on the same level. The subsystems request execution of low-level control and data flow tasks by issuing messages to the other subsystems. The messages are routed over the peer-to-peer matrix by the non-terminating nodes until arriving at the proper destination for execution. The non-terminating nodes also manage other functions such as arbitration and interrupt handling, alleviating the processor subsystems of these tasks.
According to one embodiment, an embedded system includes at least one processor, memory and peripheral subsystem. Each subsystem has a terminating node configured to issue and receive messages for the subsystem. A bus fabric interconnects the subsystems and includes a plurality of non-terminating nodes located at different points in the bus fabric and interconnected with the terminating nodes to form a peer-to-peer communication matrix between the subsystems. The non-terminating nodes route the messages over the peer-to-peer matrix so that instructions included in the messages are delivered to the terminating nodes identified in the messages for execution. Each node is assigned one or more unique object identifiers for identifying the nodes and the instructions included in the messages identify different control and data flow functions supported by different ones of the subsystems.
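The arrangement described above can be illustrated with a minimal sketch. The following Python model is hypothetical (the class names, identifiers such as "cpu0" and "mem0", and the message fields are illustrative assumptions, not the claimed implementation): terminating nodes issue and receive messages, while a non-terminating fabric node delivers each message to the terminating node named by its object identifier.

```python
# Hypothetical model of the peer-to-peer matrix: terminating nodes issue and
# receive messages; a non-terminating node forwards them by object identifier.
from dataclasses import dataclass, field

@dataclass
class Message:
    target_id: str          # unique object identifier of the destination node
    instruction: str        # e.g. "READ", "WRITE", "INITIATE"
    data: dict = field(default_factory=dict)

class TerminatingNode:
    def __init__(self, object_id):
        self.object_id = object_id
        self.received = []          # messages delivered to this subsystem

    def receive(self, msg):
        self.received.append(msg)

class NonTerminatingNode:
    """Routes messages toward the terminating node named in the message."""
    def __init__(self):
        self.links = {}             # object identifier -> attached node

    def attach(self, node):
        self.links[node.object_id] = node

    def route(self, msg):
        self.links[msg.target_id].receive(msg)

# Two subsystems joined through one fabric node (identifiers hypothetical)
fabric = NonTerminatingNode()
cpu = TerminatingNode("cpu0")
mem = TerminatingNode("mem0")
fabric.attach(cpu)
fabric.attach(mem)

fabric.route(Message("mem0", "READ", {"address": 0x1000, "words": 4}))
```

In this sketch the processor and memory nodes sit at the same level of the matrix; neither is a bus master, and the fabric node alone decides delivery.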
Of course, the present invention is not limited to the above features and advantages. Those skilled in the art will recognize additional features and advantages upon reading the following detailed description, and upon viewing the accompanying drawings.
In more detail, the bus fabric 114 includes a plurality of nodes 118-124 located at different points in the fabric 114. Each node 118-124 of the bus fabric 114 is connected to other nodes within the fabric 114, to nodes 126-132 associated with the subsystems 102-108, or both. Interconnecting the nodes 118-132 this way creates a peer-to-peer communication matrix formed between the subsystems 102-108. Messages are routed from one subsystem 102-108 to another over the peer-to-peer matrix irrespective of the underlying bus topology. This way, peer-to-peer access can occur within the embedded system 100 using any type of preexisting bus architectures. In one embodiment, connections between the nodes 118-132 are unidirectional and either point-to-point or point-to-multi-point. A node 118-132 can be connected to the peer-to-peer matrix using any underlying bus structure (not shown) capable of transferring messages. The underlying bus structure also carries a clock for synchronous operation, strobes for delineating and transferring the messages and control signals such as wait, ready and acknowledge signals for controlling message flow. Other bus lines can be used to indicate the structure of the messages as will be described in more detail later. Any number of conventional bus protocols can be used to implement the access scheme.
The messages are routed over the peer-to-peer matrix based on unique object identifiers included in the messages. Each node 118-132 is identified using one or more of the unique object identifiers, not an address space. As such, memory mapping is not needed each time a node 118-132 is added, deleted or otherwise modified. Instead, the nodes 118-132 need only be aware of which object identifiers are valid. Multiple unique object identifiers can be assigned to the same node 118-132 for identifying a different function or group of functions supported by the corresponding subsystem 102-108. This way, different functions supported by the same subsystem 102-108 can be accessed using different object identifiers assigned to the same subsystem 102-108. The subsystems 102-108 can be re-used in other embedded designs without substantive software or hardware revisions when the object identification techniques disclosed herein are used for subsystem identification instead of a memory mapping technique.
In addition to having unique object identifiers, the messages also include instructions corresponding to low-level functions to be executed by the receiving terminating nodes 126-132. Each instruction identifies which function or functions should be executed, the node 126-132 to execute the instruction and data associated with the function(s). One or more instructions can be included in a single message. Each node 118-132 is capable of handling received messages, whether by processing the instructions included in the messages, routing the messages over the peer-to-peer matrix or performing other tasks such as arbitration or interrupt handling. If a node 118-132 cannot execute one or more instructions included in a received message, the receiving node issues a message to the originating node indicating that the receiving node is not configured to execute the instruction(s).
Messages originate and terminate at the subsystem nodes 126-132, hereinafter referred to as terminating nodes. Each terminating node 126-132 comprises an issuer 134, a receiver 136 and an interface controller 138. The issuer 134 generates new messages and sends the messages to the bus fabric 114. Messages are received from the bus fabric 114 by the receiver 136 and decoded. The interface controller 138 manages interaction between the corresponding subsystem 102-108 and the issuer and receiver 134, 136. The interface controller 138 receives commands from the subsystem 102-108 identifying new instructions. In response, the interface controller 138 instructs the issuer 134 to generate new messages including the instructions. The interface controller 138 also accepts decoded messages from the receiver 136 and initiates the instructions included in the decoded messages. The instructions are passed to the subsystem 102-108 for execution when appropriate. The interface controllers 138 can be integrated as part of the subsystem logic or can be add-on components.
Each non-terminating node 118-124 similarly comprises an issuer 142, a receiver 144 and an interface controller 146. The receiver 144 passes messages received from one of the terminating nodes 126-132 to one or more other ones of the non-terminating nodes 118-124. The issuer 142 receives messages from one or more other ones of the non-terminating nodes 118-124 and passes the messages to one of the terminating nodes 126-132. The interface controller 146 determines how the messages are routed. The interface controller 146 included in the non-terminating nodes 118-124 may perform other tasks such as interrupt handling and arbitration, alleviating the processor subsystems 102, 104 of these tasks. The main role of the non-terminating nodes 118-124 is routing messages from source to destination over the peer-to-peer matrix. Messages can be routed from a single source to a single destination. Alternatively, the non-terminating nodes 118-124 can receive a message and decode the routing destination not as a single terminating node but as a group of terminating nodes connected by a common factor in their respective node identities. This enables the non-terminating nodes 118-124 to broadcast messages for system or group wide functions such as reset.
According to one embodiment, the interface controller 146 included in the non-terminating nodes 118-124 determines a preferred routing path by accessing a link map 148 associating the unique object identifiers with different routing paths. In one embodiment, the link map 148 is fixed in hardware. In another embodiment, the link map 148 is a programmable routing table arranged similar to a conventional IP network routing table. Conventional routing tables include at least the destination network ID, cost of the path through which the packet is to be sent and the next network station to which the packet is to be sent on the way to destination. However, the link map 148 accessed by the non-terminating nodes 118-124 corresponds to embedded system messages and not IP packets. Also, the link map 148 can be used along with accumulated statistics to determine arbitration priority. The link map 148 can be used in this way to balance system latencies. In one embodiment, the link map 148 is modified to reflect a new arbitration scheme or to reprioritize the current arbitration scheme. For example, the arbitration scheme or priority may be changed when a portion of the bus fabric 114 is disabled, e.g., when the bus fabric 114 is powered down or placed in a low power mode such as sleep mode. In one embodiment, one or more of the nodes 118-132 in the peer-to-peer matrix issues a message configured to update the link map 148 when part of the bus fabric 114 is disabled. This way, messages are not routed through non-terminating nodes 118-124 located in the disabled portion of the bus fabric 114. Modifying the arbitration scheme or priority in this way also enables the embedded system 100 to maintain acceptable throughput levels even though part of the bus fabric 114 is disabled, thus better balancing system latencies.
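A link map of the kind described above can be sketched as a small routing table. This Python fragment is a hypothetical illustration (the node names and costs are assumptions): each destination identifier maps to candidate next hops with costs, and entries through a disabled region of the fabric are skipped so traffic re-routes around it.

```python
# Sketch of a programmable link map arranged like an IP routing table:
# destination object identifier -> list of (next hop, path cost). A node
# picks the cheapest path that does not pass through a disabled region.
link_map = {
    "mem0":  [("node_A", 1), ("node_B", 3)],   # two candidate paths
    "uart0": [("node_B", 2)],
}
disabled = set()

def next_hop(dest):
    routes = [(hop, cost) for hop, cost in link_map.get(dest, [])
              if hop not in disabled]
    if not routes:
        return None
    return min(routes, key=lambda r: r[1])[0]

# Power down node_A: traffic to mem0 re-routes through node_B.
disabled.add("node_A")
```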
In another embodiment, messages are routed by the non-terminating nodes 118-124 based on status and traffic information exchanged between the non-terminating nodes 118-124. The information may be used to automatically modify the fabric arbitration scheme or priority so that new routes are created when bottlenecks occur within the bus fabric 114 or between the subsystems 102-108. New routes may also be created when sections of the bus fabric 114 are in sleep or power down operation or otherwise disabled. Routes that avoid disabled regions of the bus fabric 114 may be pre-programmed into the non-terminating nodes 118-124 to address known fixed system power saving states or application specific states. For example, low latency paths can be automatically created through the bus fabric 114 when particular operating states occur. This information may also be used to better balance system latencies by modifying the fabric arbitration scheme or priority accordingly.
In yet another message routing embodiment, a dedicated controller is provided for managing the bus fabric 114, modifying the fabric arbitration scheme or priority and/or reconfiguring the routing paths of the peer-to-peer matrix based on changes in subsystem activity. The dedicated controller can be part of a non-terminating node 118-124 or can be a stand-alone controller (not shown). In either case, the dedicated controller receives status messages originated by the terminating nodes 126-132 indicating subsystem activity. The non-terminating nodes 118-124 may also issue messages to the dedicated controller for indicating bus fabric activity. In response, the dedicated controller issues messages that tailor the behavior of the bus fabric 114 to particular power and application demands. The distributed nature of the peer-to-peer matrix allows the dedicated controller to function transparently with respect to the processor subsystems 102, 104. In still another embodiment, message routing information is hard-wired into each non-terminating node 118-124. In each of these embodiments, the non-terminating nodes 118-124 route messages from source to destination over the peer-to-peer matrix so that instructions included in the messages can be executed in a more distributed and timely manner across the bus fabric 114 and the subsystems 102-108.
The instructions included in the messages not only support conventional read and write data flow functions, but other data flow functions and certain control functions. The instructions can broadly relate to any type of low-level control and data flow function such as reads/writes, DMA, arbitration, interrupt handling, inter-processor communication, power management, etc. The terminating nodes 126-132 directly execute low-level functions indicated by the instructions included in the messages routed over the peer-to-peer matrix, improving data transfer, automating common processes and reducing the need for centralized control. The peer-to-peer access techniques disclosed herein enable low-level functions to be distributed more evenly across the bus fabric 114 and subsystems 102-108.
The second field 202 identifies the instruction to be executed. The instruction can correspond to one of several supported control and data flow functions. Several basic instructions are available. Additional instructions may also be supported depending on the type of embedded system 100. One of the base instructions is the READ instruction. The READ instruction requests a target node to provide data. The target node is identified in the third field 204 of the message (the issuing node may be identified in an optional fourth field 206). A data field 208 of the message gives further specifics such as an address within an address space of the target node, number of words, and additional actions after the data is sent. In one embodiment, the terminating node 130 associated with the memory subsystem 106 can use the data field to store the address of required data so that conventional memory mapping can be transparently implemented without processor subsystem control.
Another base instruction is the WRITE instruction. The WRITE instruction requests the node identified in the third field 204 to store data included in the data field. The data field 208 may also include further details relating to the WRITE instruction. The access scheme can also be used to maintain coherency when read and write instructions are issued. In one embodiment, the terminating node 130 of the memory subsystem 106 issues a message to the processor subsystems 102, 104 indicating when a shared region of the memory array 112 has been accessed as a result of a READ, WRITE or other memory-based instruction. In one embodiment, the memory subsystem 106 maintains a map (not shown) identifying different shared regions of the memory array 112 to determine whether a shared region of the array 112 has been accessed.
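The coherency notification described above can be sketched as follows. In this hypothetical Python model (the shared-region addresses and message fields are illustrative assumptions), the memory node consults its map of shared regions on each WRITE and issues a status message when a shared region is accessed:

```python
# Sketch: the memory subsystem's terminating node keeps a map of shared
# regions of the memory array and, on a write into one, issues a message
# indicating the shared region has been accessed.
shared_regions = [(0x8000, 0x8FFF)]     # (start, end) of a shared region

def write(address, value, memory, notify):
    memory[address] = value
    for start, end in shared_regions:
        if start <= address <= end:
            notify.append({"instruction": "MY_STATUS",
                           "data": {"shared_access": address}})

memory, notifications = {}, []
write(0x8010, 0xAB, memory, notifications)   # shared region: message issued
write(0x0100, 0xCD, memory, notifications)   # private region: no message
```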
DMA read and write instructions are also supported. The data field 208 indicates the target node for the DMA operation and other setup information. The DMA instructions cause the node identified in the third field 204 to directly initiate a read or write operation with the node indicated in the data field 208 as part of a DMA-type transfer. One of the processor subsystem terminating nodes 126, 128 can initiate a DMA exchange between peripheral and memory subsystems 106, 108 by issuing a DMA instruction. In response, the terminating nodes 130, 132 of the peripheral and memory subsystems 106, 108 directly execute the DMA exchange over the peer-to-peer matrix without intervention from the processor subsystems 102, 104.
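The peer-executed transfer above can be sketched briefly. This Python fragment is a hypothetical model (the function name and addresses are illustrative): once a processor node has issued the DMA instruction, the peripheral and memory nodes complete the word transfer directly, with no further processor involvement.

```python
# Sketch: after a processor node issues a DMA instruction, the peripheral
# and memory terminating nodes execute the exchange directly over the
# peer-to-peer matrix without processor intervention.
def dma_write(source_data, memory, base_address):
    """Peripheral-to-memory transfer carried out by the target nodes."""
    for offset, word in enumerate(source_data):
        memory[base_address + offset] = word
    return len(source_data)             # words transferred

peripheral_fifo = [0x11, 0x22, 0x33]    # data pending in the peripheral
memory = {}
transferred = dma_write(peripheral_fifo, memory, 0x2000)
```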
Non-data flow instructions are also supported. One such instruction is the INITIATE instruction. The INITIATE instruction is a control instruction that causes the subsystem 102-108 of the identified target node to begin a process or routine. The data field 208 of the message contains the details of the process or routine to be executed and any associated parameters. In one embodiment, the INITIATE instruction is used to notify a first terminating node that the peripheral subsystem 108 has an interrupt request for the subsystem of the first terminating node. The first terminating node can then send a message to the terminating node 124 of the peripheral subsystem 108 confirming receipt of the interrupt request. The INITIATE instruction can also be used as part of a daisy-chained operation for initiating a sequence of operations within a pipelined process without processor subsystem intervention, e.g., for various power-down scenarios.
In one embodiment, a control node such as one of the non-terminating nodes 118-124 or a stand-alone node (not shown) manages complex pipeline operations. The control node receives a message such as the INITIATE instruction indicating a number of commands are to be executed in a particular sequence. The control node issues new instructions to different ones of the terminating nodes 126-132 as prior instructions are executed as indicated by status messages received by the control node. In another embodiment, pipelined operations are more distributed. According to this embodiment, each terminating node 126-132 identified in a pipelined operation executes one or more functions assigned to the terminating node and then triggers the next terminating node identified in the pipelined operation to execute one or more additional functions until all functions associated with the pipelined process are executed. The INITIATE instruction can be used to initiate either of these pipeline operations.
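The distributed pipeline embodiment can be sketched as a chain of hand-offs. In this hypothetical Python model (the stage node identifiers and functions are illustrative assumptions), each terminating node executes its assigned function and then triggers the next node in the sequence, mirroring the daisy-chained INITIATE behavior:

```python
# Sketch of the distributed pipeline: each terminating node executes its
# assigned function, then triggers the next node in the chain, until all
# functions in the pipelined process have run.
def run_pipeline(stages, value):
    """stages: ordered list of (node_id, function). Returns the final
    value and the trace of hand-offs from node to node."""
    trace = []
    for node_id, fn in stages:
        value = fn(value)
        trace.append(node_id)           # this node triggers the next stage
    return value, trace

stages = [("dsp0", lambda v: v * 2),    # hypothetical processing stages
          ("mem0", lambda v: v + 1),
          ("uart0", lambda v: v)]
result, trace = run_pipeline(stages, 10)
```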
Another supported control instruction is the MY_STATUS instruction. The MY_STATUS instruction is used to send data, e.g., via the data field 208, indicating the status of the originating node. This can be used to acknowledge INITIATE instructions or DMA instructions, or to communicate between nodes 118-132 for controlling operation of the bus fabric 114. A RESET instruction is a system-wide instruction; when received by a non-terminating node 118-124, it is broadcast to all attached nodes for executing a cold reset and returning the affected nodes to a default state. The RESET instruction ripples through the peer-to-peer matrix in a set sequence based on where the instruction enters the matrix. The data field 208 may contain additional parameters that can be modified by intervening nodes to achieve a structured reset. The RESET_NODE instruction is similar to the global RESET instruction, but is more selective. The RESET_NODE instruction is used to reset the terminating node 126, 128 of a processor subsystem 102, 104 to a known state determined by the data field. More than one node can be specified. The POWER instruction places nodes 118-132 or subsystems 102-108 into, or removes them from, various power states, e.g., sleep, low power, etc.
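The rippling behavior of the RESET instruction can be sketched as a traversal of the node graph. This Python fragment is a hypothetical model (the fabric topology is an illustrative assumption): a non-terminating node that receives RESET re-broadcasts it to all attached nodes, so the instruction propagates in a sequence determined by where it entered the matrix.

```python
# Sketch: a RESET received by a non-terminating node is re-broadcast to all
# attached nodes, rippling through the peer-to-peer matrix in a sequence
# set by the instruction's entry point.
def broadcast_reset(fabric, entry, visited=None):
    """fabric: node -> list of attached nodes. Returns the reset order."""
    if visited is None:
        visited = []
    if entry in visited:
        return visited                  # node already reset; stop the ripple
    visited.append(entry)
    for neighbor in fabric.get(entry, []):
        broadcast_reset(fabric, neighbor, visited)
    return visited

# Hypothetical topology: node_A fans out to node_B and cpu0; node_B fans
# out to mem0 and uart0.
fabric = {"node_A": ["node_B", "cpu0"], "node_B": ["mem0", "uart0"]}
order = broadcast_reset(fabric, "node_A")
```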
Any other type of instruction can be supported using the instruction field 202 of the message. The message may have additional fields.
More than one instruction can be included in a message by concatenating multiple instructions to produce a compound message.
With the above range of variations and applications in mind, it should be understood that the present invention is not limited by the foregoing description, nor is it limited by the accompanying drawings. Instead, the present invention is limited only by the following claims, and their legal equivalents.