1. Field of the Invention
The present invention relates generally to systems on chips, and in particular to methods and mechanisms for routing transactions in a system on chip.
2. Description of the Related Art
Systems on chips (SoCs) are increasing in complexity and size due to continual technological advances in the electronics industry. A common SoC may include multiple input/output (I/O) devices connected to a processor complex. The processor complex may typically include one or more processors and one or more caches, and the processor complex may be coupled to a CPU port of a memory controller through which the processor complex may access a memory. The I/O devices may be coupled to a coherency port on the processor complex and access memory through the CPU port of the memory controller.
A portion of the traffic from the I/O devices may be cache coherent. Another portion of the traffic from the I/O devices may be low-performance transactions, and some of the low-performance transactions may be directed to non-shareable memory. Typically, the cost of checking every transaction for cache coherency is high, in terms of hardware, performance, and power. In addition, the traffic from the I/O devices may compete with the processor complex for memory bandwidth on the CPU port on the memory controller. Furthermore, the traffic from the I/O devices may also unnecessarily cause snoop activity to take place in the processor complex.
In one embodiment, an apparatus may include one or more processors, a memory controller, one or more I/O devices, and a coherence switch. The one or more processors may be located in a processor complex, and the processor complex may be coupled to a real-time port of the memory controller. The processor complex may include one or more levels of caches, and the processor complex may also include a coherency port coupled to the coherence switch. The coherence switch may be coupled to the one or more I/O devices, to the processor complex, and to the memory controller. In some embodiments, the apparatus may include a non-real-time (NRT) block, and the coherence switch may be coupled to the memory controller via the NRT block. The apparatus may also include a multiplexer, and the coherence switch may be coupled to the one or more I/O devices via the multiplexer.
The coherence switch may receive transactions from the I/O device(s), and the coherence switch may route received transactions to the memory controller on two separate paths within the apparatus. The first path may pass through a coherency port on the processor complex and through a first port of the memory controller. Traffic from sources that are known to be coherent may be routed to memory via the first path. A second path may pass through a NRT block and through a second port of the memory controller. Traffic from sources that are known to be non-coherent may be routed to memory via the second path. The sources that are known to be non-coherent may generate transactions that access only non-shareable memory. In one embodiment, the coherence switch may determine on which path to route a transaction based on an identifier that accompanies the transaction.
In various embodiments, the coherence switch may be configured to dynamically reallocate traffic from the coherent path to the non-coherent path or from the non-coherent path to the coherent path. The coherence switch may maintain a configuration register, and the configuration register may store an indicator for each transaction identifier. The indicator may specify whether the corresponding transaction should be routed to memory via the coherent or non-coherent path. In one embodiment, the coherence switch may maintain two copies of the configuration register. The first copy of the configuration register may be a software-writeable copy and the second copy of the configuration register may be a working copy. The working copy may also be referred to as a shadow copy.
Having two copies of the configuration register may facilitate dynamic switching of traffic. For example, the two copies of the configuration register may allow software to initiate a change to the configuration register, while the coherence switch hardware may control actual changes to system behavior. Specifically, a software application may update the software-writeable copy of the configuration register to reallocate traffic flows on a transaction identifier basis. The coherence switch may detect the update to the software-writeable copy, and then the coherence switch may update the working copy which is used to actually implement the new routing.
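The division of labor between the two copies may be illustrated with a minimal software sketch. The class and method names below are hypothetical, the polarity (0 for the coherent path) follows one embodiment described later, and detecting an update by comparing the two copies is an assumption made for illustration rather than a mechanism stated in this disclosure.

```python
class TwoCopyConfigRegister:
    """Hypothetical model of a configuration register kept in two copies."""

    def __init__(self, num_ids):
        # One routing indicator per transaction identifier.
        # Assumed polarity: 0 = coherent path, 1 = non-coherent path.
        self.software_copy = [0] * num_ids
        self.working_copy = [0] * num_ids

    def software_write(self, txn_id, indicator):
        # Software only ever touches the software-writeable copy.
        self.software_copy[txn_id] = indicator

    def update_pending(self):
        # Assumption: an update is detected by comparing the two copies.
        return self.software_copy != self.working_copy

    def hardware_commit(self):
        # The switch hardware decides when the new routing takes effect.
        self.working_copy = list(self.software_copy)

    def routing_indicator(self, txn_id):
        # Routing decisions always read the working (shadow) copy.
        return self.working_copy[txn_id]


reg = TwoCopyConfigRegister(num_ids=8)
reg.software_write(3, 1)
before_commit = reg.routing_indicator(3)  # still the old routing
pending = reg.update_pending()
reg.hardware_commit()
after_commit = reg.routing_indicator(3)   # new routing now in effect
```

In this sketch the software write alone does not change routing; the change becomes visible only after the hardware-controlled commit.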
These and other features and advantages will become apparent to those of ordinary skill in the art in view of the following detailed descriptions of the approaches presented herein.
The above and further advantages of the methods and mechanisms may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:
In the following description, numerous specific details are set forth to provide a thorough understanding of the methods and mechanisms presented herein. However, one having ordinary skill in the art should recognize that the various embodiments may be practiced without these specific details. In some instances, well-known structures, components, signals, computer program instructions, and techniques have not been shown in detail to avoid obscuring the approaches described herein. It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements.
This specification includes references to “one embodiment”. The appearance of the phrase “in one embodiment” in different contexts does not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure. Furthermore, as used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
Terminology. The following paragraphs provide definitions and/or context for terms found in this disclosure (including the appended claims):
“Comprising.” This term is open-ended. As used in the appended claims, this term does not foreclose additional structure or steps. Consider a claim that recites: “A processor comprising a cache . . . .” Such a claim does not foreclose the processor from including additional components (e.g., a network interface, a crossbar).
“Configured To.” Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, “configured to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” language include hardware (for example, circuits, memory storing program instructions executable to implement the operation, etc.). Reciting that a unit/circuit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. §112, sixth paragraph, for that unit/circuit/component. Additionally, “configured to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.
“First,” “Second,” etc. As used herein, these terms are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical) unless explicitly defined as such. For example, in a memory controller having five ports, the terms “first” and “second” ports can be used to refer to any two of the five ports.
“Based On.” As used herein, this term is used to describe one or more factors that affect a determination. This term does not foreclose additional factors that may affect a determination. That is, a determination may be solely based on those factors or based, at least in part, on those factors. Consider the phrase “determine A based on B.” While B may be a factor that affects the determination of A, such a phrase does not foreclose the determination of A from also being based on C. In other instances, A may be determined based solely on B.
Referring now to
Components shown within IC 10 may be coupled to each other using any suitable bus and/or interface mechanism. In some embodiments, these components may be connected using the Advanced Microcontroller Bus Architecture (AMBA®) protocol (from ARM® Holdings) or any other suitable on-chip interconnect specification for the connection and management of logic blocks. Examples of AMBA buses and/or interfaces may include Advanced eXtensible Interface (AXI), Advanced High-performance Bus (AHB), Advanced System Bus (ASB), Advanced Peripheral Bus (APB), and Advanced Trace Bus (ATB).
IC 10 includes coherence switch 12, and coherence switch 12 may be a programmable switch that software can configure dynamically. As shown in
Coherence switch 12 is also coupled to multiplexer 14, and multiplexer 14 is coupled to DMA controller 15 and I/O devices 16 and 20. Multiplexer 14 is also coupled to I/O device 18 via DMA controller 15. Multiplexer 14 may include one or more buffers for buffering data from I/O devices 16-20 and/or DMA controller 15. In one embodiment, multiplexer 14 may be a PL301 High Performance Matrix from ARM Holdings. I/O devices 16-20 are representative of any number of I/O devices, and the various I/O devices may be coupled to multiplexer 14 in a variety of ways, such as directly, through DMA controller 15, and/or through another device. Variations of the types of connections between I/O devices 16-20 and multiplexer 14 are possible and are contemplated. In other embodiments, multiplexer 14 may be coupled to an I/O processor, peripheral I/O queues, and/or one or more other devices not shown in
Coherence switch 12 may receive transactions from the I/O devices 16-20 and may convey the transactions to processor complex 22 or NRT block 26. In some embodiments, in response to receiving transactions, coherence switch 12 may issue corresponding memory requests to processor complex 22 or NRT block 26. Generally speaking, a transaction may comprise a memory request, and the term “memory request” is not limited to requests that are ultimately responded to by memory, but can also include requests that are satisfied by a cache. It is noted that the terms “memory request”, “transaction”, and “memory operation” may be used interchangeably throughout this disclosure.
Although not shown in
Memory controller 30 includes ports 32, 34, and 36, which are representative of any number of ports. Port 32 may be coupled to processor complex 22. In one embodiment, port 32 may be designated to receive real-time (RT) memory requests. Port 36 may be coupled to NRT block 26. In one embodiment, port 36 may be designated to receive NRT memory requests. Generally speaking, NRT memory requests may be treated as a lower priority than RT memory requests by memory controller 30. Port 34 may be coupled to another block (not shown) of IC 10. For example, in one embodiment, port 34 may be coupled to a RT peripheral block. In another embodiment, port 34 may be coupled to a graphics controller.
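The relative priority of RT and NRT memory requests may be illustrated with a short sketch. This disclosure states only that NRT requests may be treated as lower priority; the strict-priority policy, the class name, and the request labels below are assumptions chosen for illustration, and an actual memory controller may use a different arbitration scheme.

```python
from collections import deque


class PortArbiter:
    """Hypothetical strict-priority arbiter between an RT and an NRT port."""

    def __init__(self):
        self.rt_queue = deque()   # requests arriving on the RT port
        self.nrt_queue = deque()  # requests arriving on the NRT port

    def submit(self, request, real_time):
        # Place the request in the queue of the port it arrived on.
        if real_time:
            self.rt_queue.append(request)
        else:
            self.nrt_queue.append(request)

    def next_request(self):
        # Assumed policy: RT requests are always served before NRT requests.
        if self.rt_queue:
            return self.rt_queue.popleft()
        if self.nrt_queue:
            return self.nrt_queue.popleft()
        return None


arbiter = PortArbiter()
arbiter.submit("nrt-write-A", real_time=False)
arbiter.submit("rt-read-B", real_time=True)
arbiter.submit("nrt-read-C", real_time=False)
service_order = [arbiter.next_request() for _ in range(3)]
```

Under this assumed policy the RT request is served first even though it arrived second. A strict-priority scheme can starve NRT traffic, which is one reason a real design may choose a weighted or aged policy instead.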
The memory controller 30 may include circuitry configured to interface to memory (not shown). For example, the memory controller 30 may be configured to interface to dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), double data rate (DDR) SDRAM, DDR2 SDRAM, Rambus DRAM (RDRAM), etc. Memory controller 30 may also be coupled to memory physical interface circuits (PHYs) 38 and 40. Memory PHYs 38 and 40 are representative of any number of memory PHYs which may be coupled to memory controller 30. The memory PHYs 38 and 40 may be configured to interface to memories. The memory controller 30 may receive memory requests from processor complex 22, NRT block 26, and other blocks (not shown), and memory controller 30 may perform the corresponding read and write operations to the memory.
The coherence switch 12 may determine if a transaction received from an I/O device (via multiplexer 14) is a cache-coherent or non-cache-coherent transaction using a variety of methods. Throughout this disclosure, a cache-coherent transaction may be referred to as a “coherent transaction” or “coherent memory request”, and a non-cache-coherent transaction may be referred to as a “non-coherent transaction” or “non-coherent memory request”. Generally speaking, a non-coherent transaction may correspond to a memory operation that is not checked against a cache. In one embodiment, the coherence switch 12 may determine if a transaction is coherent or non-coherent based on the I/O device from which the transaction is received. A first portion of I/O devices 16-20 may be designated as coherent devices, and a second portion of I/O devices 16-20 may be designated as non-coherent devices. In another embodiment, the coherence switch 12 may determine if a transaction is coherent or non-coherent based on a transaction identifier. Each I/O device may be assigned a range of transaction identifiers, and the identifiers may be designated for use as coherent or non-coherent transactions.
In various embodiments, the I/O devices 16-20 may obtain access to memory via multiplexer 14, coherence switch 12, and then through either processor complex 22 or NRT block 26. For example, an originating I/O device may issue a read or write request to memory. The request may pass through multiplexer 14 and then coherence switch 12 may receive the request and determine if the request should be routed to the processor complex 22 (for coherent requests) or to NRT block 26 (for non-coherent requests). For coherent traffic, the processor complex 22 may provide a mechanism to snoop the cache. If there is a cache hit, the processor complex 22 may provide a response to coherence switch 12. If there is a cache miss, the processor complex 22 may forward the request to memory. For non-coherent traffic, coherence switch 12 may forward the request to NRT block 26 and then NRT block 26 may forward the request to memory (via memory controller 30).
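The read flow described above may be sketched in software as follows. The function name, the dictionary-based cache and memory, and the returned labels are hypothetical; the sketch models only where the data for a read comes from, not timing or write handling.

```python
def access_memory(address, cache, memory, coherent):
    """Return (data, source) for a read on the chosen path (illustrative)."""
    if coherent:
        # Coherent path: the processor complex snoops its cache first.
        if address in cache:
            return cache[address], "cache"
        # Cache miss: the processor complex forwards the request to memory.
        return memory[address], "memory via processor complex"
    # Non-coherent path: the request is not checked against any cache.
    return memory[address], "memory via NRT block"


cache = {0x100: "fresh"}
memory = {0x100: "stale", 0x200: "data"}

hit = access_memory(0x100, cache, memory, coherent=True)
miss = access_memory(0x200, cache, memory, coherent=True)
bypass = access_memory(0x200, cache, memory, coherent=False)
```

The example also suggests why non-coherent routing is appropriate only for non-shareable memory: a non-coherent read of address 0x100 in this sketch would return the stale copy held in memory.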
It is noted that other embodiments may include other combinations of components, including subsets or supersets of the components shown in
Turning now to
Coherence switch 12 may receive the transactions from multiplexer 14 and the coherent and non-coherent transactions may be intermingled. As shown in
Referring now to
ACP queue 56 may store coherent transactions coupled from egress port multiplexer 52, and then ACP queue 56 may convey the coherent transactions to the ACP of the processor complex (not shown). Similarly, NRT queue 58 may store non-coherent transactions coupled from egress port multiplexer 52, and then NRT queue 58 may convey the non-coherent transactions to the NRT block (not shown). ACP queue 60 may store coherent transaction return data coupled from the processor complex, and then ACP queue 60 may convey the return data to ingress port multiplexer 54. Similarly, NRT queue 62 may store non-coherent transaction return data coupled from the NRT block, and then NRT queue 62 may convey the return data to ingress port multiplexer 54. Although not shown in
In one embodiment, ingress port multiplexer 54 may intermingle data associated with coherent and non-coherent transactions on the return path to the I/O devices. Ingress port multiplexer 54 may intermingle the data associated with coherent and non-coherent transactions in the order in which the data is received from ACP queue 60 and NRT queue 62. In various embodiments, the queues 56-62 may be any of various sizes to store any number of transactions or any amount of return data associated with transactions.
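Intermingling return data in order of receipt may be modeled as a merge of two time-ordered queues. In the sketch below, the arrival-cycle tags, the function name, and the tie-breaking rule (ACP data first when two items arrive in the same cycle) are assumptions introduced for illustration.

```python
import heapq


def intermingle(acp_returns, nrt_returns):
    """Merge two return queues by arrival cycle.

    Each input is a list of (arrival_cycle, payload) tuples that is
    already in arrival order, as a FIFO queue would hold them.
    """
    tagged_acp = [(cycle, "ACP", payload) for cycle, payload in acp_returns]
    tagged_nrt = [(cycle, "NRT", payload) for cycle, payload in nrt_returns]
    # heapq.merge is stable, so ties on the key favor the first iterable.
    merged = heapq.merge(tagged_acp, tagged_nrt, key=lambda item: item[0])
    return [(source, payload) for _, source, payload in merged]


acp_queue = [(1, "a0"), (4, "a1")]
nrt_queue = [(2, "n0"), (3, "n1")]
return_stream = intermingle(acp_queue, nrt_queue)
```

The resulting stream carries coherent and non-coherent return data interleaved by arrival cycle rather than grouped by path.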
In one embodiment, configuration unit 50 may be accessible via an advanced peripheral bus (APB) interface or the like. For example, a software application running on an external device or processor (not shown) may utilize the APB interface 64 for programming or configuring the configuration unit 50. The APB interface 64 may be independent of the transactions and data that pass through coherence switch 12.
Turning now to
In one embodiment, in response to a system or software reset, all of the values in registers 72 and 74 may be set to zero, wherein a value of zero corresponds to the coherent path. As a result of the reset, each of the values of registers 72 and 74 may indicate that the coherent path should be taken for each transaction identifier. This may be the default setting for each transaction identifier. It is noted that in other embodiments, a value of one in registers 72 and 74 may correspond to the coherent path, and zero may correspond to the non-coherent path.
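The reset behavior may be illustrated by modeling a register as a bit vector with one bit per transaction identifier. The class, constant, and method names below are hypothetical, and the sketch models only the embodiment in which zero corresponds to the coherent path.

```python
COHERENT_PATH = "coherent"
NON_COHERENT_PATH = "non-coherent"


class RoutingRegister:
    """Hypothetical bit-vector register: bit i is the indicator for ID i."""

    def __init__(self, num_ids):
        self.num_ids = num_ids
        self.bits = 0

    def reset(self):
        # Reset clears every indicator, selecting the coherent path (0).
        self.bits = 0

    def set_indicator(self, txn_id, value):
        if not 0 <= txn_id < self.num_ids:
            raise IndexError("transaction identifier out of range")
        if value:
            self.bits |= 1 << txn_id
        else:
            self.bits &= ~(1 << txn_id)

    def path_for(self, txn_id):
        # Assumed polarity: 0 = coherent path, 1 = non-coherent path.
        if (self.bits >> txn_id) & 1:
            return NON_COHERENT_PATH
        return COHERENT_PATH


register = RoutingRegister(num_ids=32)
register.set_indicator(5, 1)
path_before_reset = register.path_for(5)
register.reset()
paths_after_reset = {register.path_for(i) for i in range(32)}
```

After the reset, every identifier in this sketch maps to the coherent path, which corresponds to the default setting described above.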
In one embodiment, CPU 76 or 78 may write to software-writeable register 72 via APB interface 64 to change the routing settings for transaction identifiers assigned to one or more I/O devices. In another embodiment, I/O processor 80 may write to software-writeable register 72 via APB interface 64 to change the routing settings for various transaction identifiers. In a further embodiment, another device (not shown) may write to software-writeable register 72 via APB interface 64 to change the routing settings for one or more transaction identifiers. As shown in
In one embodiment, after detecting a change to the software-writeable register 72, coherence switch 12 may stop accepting new transactions from I/O devices 16-20 (of
Configuration unit 50 may keep track of outstanding transactions through the use of one or more counters (not shown). In one embodiment, configuration unit 50 may utilize a first counter to maintain a count of the outstanding write transactions, and configuration unit 50 may utilize a second counter to maintain a count of the outstanding read transactions. When an update to the software-writeable register 72 is detected, configuration unit 50 may stop accepting new transactions until all outstanding write transactions have been processed. Configuration unit 50 may utilize the first counter to determine when all of the write transactions have been processed. The routing indicators in the software-writeable register 72 may not affect the return path of the read transactions, and so in some embodiments, the number of outstanding read transactions may not be monitored. In another embodiment, configuration unit 50 may maintain separate counters for the number of outstanding coherent write transactions and for the number of outstanding non-coherent write transactions.
In one embodiment, a coherence switch may include a split-bus architecture with separate address and data buses for write transactions. In such an embodiment, configuration unit 50 may utilize a counter to detect whether or not there are any pending write transactions. In one embodiment, the counter may be initialized to a particular value which represents a state in which no transactions are pending. For example, for an 8-bit counter that counts from 0 to 255, the particular (initial) value of the counter may be set to 128. The counter may be incremented when the address portion of a transaction is received, and the increment may be proportional to the amount of data associated with the transaction. Furthermore, the counter may be decremented each time a write data beat is received. When the counter is equal to its initial (particular) value, this will indicate that all of the address and data of the outstanding write transactions have been received by the coherence switch and no write transactions are outstanding. Additionally, if a change to the software-writeable copy has been detected, configuration unit 50 may update the shadow copy of the configuration register to match the software-writeable copy once the counter is back to its initial value.
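The counter mechanism may be sketched as follows, using the 8-bit example with an initial value of 128. The class and method names are hypothetical. One plausible reason for a mid-range initial value, which is an assumption and is not stated in this disclosure, is that on a split bus a data beat may arrive before its address, so the counter may need to move below the idle value as well as above it.

```python
class WriteBeatCounter:
    """Hypothetical 8-bit counter tracking outstanding write data beats."""

    IDLE_VALUE = 128  # represents the state with no pending write transactions
    MAX_VALUE = 255

    def __init__(self):
        self.value = self.IDLE_VALUE

    def _store(self, new_value):
        # Guard the 8-bit range instead of silently wrapping around.
        if not 0 <= new_value <= self.MAX_VALUE:
            raise OverflowError("counter left its 8-bit range")
        self.value = new_value

    def address_received(self, num_beats):
        # Increment in proportion to the amount of data for the transaction.
        self._store(self.value + num_beats)

    def data_beat_received(self):
        # Decrement once per write data beat.
        self._store(self.value - 1)

    def idle(self):
        return self.value == self.IDLE_VALUE


counter = WriteBeatCounter()
counter.data_beat_received()       # a data beat arrives ahead of its address
idle_after_early_beat = counter.idle()
counter.address_received(4)        # address for a four-beat write arrives
for _ in range(3):                 # the remaining three beats arrive
    counter.data_beat_received()
idle_at_end = counter.idle()
```

In this sketch the counter returns to 128 only after both the address and all four data beats have been seen, regardless of the order in which they arrived.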
Referring now to
In another embodiment, shadow register 74 may be split up into multiple registers, with each register corresponding to an address range which is a portion of the total address range. Shadow register 74 is shown in
In various embodiments, each I/O device may be assigned a range of addresses to be used for transactions. For example, one I/O device may be assigned addresses 0-15 for its transactions, another I/O device may be assigned addresses 16-23, and so on. In some embodiments, each I/O device may be designated as either a source of coherent transactions or as a source of non-coherent transactions. In other embodiments, an individual I/O device may be a source of both coherent and non-coherent transactions.
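The range assignment in the example above may be illustrated with a simple lookup. The device names and the table layout are hypothetical; only the two ranges 0-15 and 16-23 come from the example in the text.

```python
import bisect

# (first, last, device name), sorted by the first value of each range.
# The device names are placeholders introduced for this sketch.
ASSIGNED_RANGES = [
    (0, 15, "io_device_a"),
    (16, 23, "io_device_b"),
]
_RANGE_STARTS = [first for first, _, _ in ASSIGNED_RANGES]


def device_for(identifier):
    """Return the device owning the identifier, or None if unassigned."""
    index = bisect.bisect_right(_RANGE_STARTS, identifier) - 1
    if index >= 0:
        first, last, name = ASSIGNED_RANGES[index]
        if first <= identifier <= last:
            return name
    return None


owner_of_15 = device_for(15)
owner_of_16 = device_for(16)
owner_of_24 = device_for(24)
```

A binary search is used here only to keep the lookup compact; hardware would more likely decode the range directly from the upper bits of the identifier or address.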
In further embodiments, a specific transaction identifier or address may be designated as a coherent transaction. An I/O device may utilize the specific transaction identifier for a coherent transaction, and then at some point in the future, the I/O device may decide to use the specific transaction identifier for a non-coherent transaction. The I/O device may update the software-writeable copy of the configuration register to change the routing indicator for the specific transaction, and then in one embodiment, the I/O device may send a barrier instruction to the coherence switch prior to sending the non-coherent transaction. The barrier instruction may allow the coherence switch time to update the shadow copy of the configuration register to match the software-writeable copy. The barrier instruction may also serve as notice to the coherence switch that an update to the shadow copy of the configuration register has taken place.
In a still further embodiment, the update to the software-writeable copy of the configuration register may occur after the coherence switch has received only the address portion of a particular transaction. In this case, the coherence switch may not update the shadow copy of the configuration register until all of the beats from the relevant data traffic have been received for this particular transaction. The coherence switch may utilize the previously-described counter mechanism to determine when there are no transactions outstanding.
Turning now to
In one embodiment, a plurality of transactions, including first and second transactions, may be received at a coherence switch (block 90). The transactions may be generated by one or more I/O devices. The I/O device(s) may be coupled to the coherence switch through a multiplexer, DMA controller, and/or other devices. The first and second transactions may be accompanied by first and second identifiers, respectively. After receiving the first and second transactions, the coherence switch may access first and second routing indicators in a configuration register (block 92). In one embodiment, the coherence switch may utilize the first and second identifiers as indices or addresses into the configuration register to access the first and second routing indicators, respectively.
The coherence switch may route the first transaction on a first path in response to determining the first routing indicator has a first value (block 94). In one embodiment, the first value may be ‘0’ indicating the first transaction is a coherent request. The first path may go from the coherence switch to the ACP of a processor complex. The coherence switch may route the second transaction on a second path in response to determining the second routing indicator has a second value (block 96). In one embodiment, the second value may be ‘1’ indicating the second transaction is a non-coherent request. The second path may go from the coherence switch to a non-real-time (NRT) block and then to a NRT port of a memory controller.
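The routing method of blocks 90-96 may be sketched in software as follows. The function name, the dictionary representation of a transaction, and the list-based queues are assumptions for illustration; the polarity (0 for coherent, 1 for non-coherent) follows the embodiment described above.

```python
def route_transactions(transactions, config_register):
    """Split received transactions between the two paths (illustrative).

    config_register is indexed by transaction identifier and holds one
    routing indicator per identifier: 0 = coherent, 1 = non-coherent.
    """
    acp_path, nrt_path = [], []
    for txn in transactions:
        # Use the identifier as an index into the configuration register.
        indicator = config_register[txn["id"]]
        if indicator == 0:
            acp_path.append(txn)  # first path: ACP of the processor complex
        else:
            nrt_path.append(txn)  # second path: NRT block, then the NRT port
    return acp_path, nrt_path


config_register = [0, 0, 1, 1]
received = [
    {"id": 0, "op": "read"},
    {"id": 2, "op": "write"},
    {"id": 1, "op": "write"},
]
acp_path, nrt_path = route_transactions(received, config_register)
```

Here the transactions with identifiers 0 and 1 take the coherent path, while the transaction with identifier 2 takes the non-coherent path.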
Turning now to
The coherence switch may be configured to detect an update to a software-writeable copy of the configuration register (block 100). Then, after detecting the update, the coherence switch may stop accepting new transactions from the connected I/O devices (block 102). Next, the coherence switch may determine if there are any outstanding transactions still in-flight (conditional block 104). In one embodiment, the coherence switch may utilize a counter mechanism such as that discussed above to determine if there are any outstanding transactions. As previously described, if the counter is equal to a predetermined value, this may indicate that all pending transactions have been processed. In other embodiments, the coherence switch may utilize other mechanisms to determine whether or not there are any outstanding transactions that need to be completed.
If there are not any outstanding transactions still in-flight (conditional block 104), then the coherence switch may update a shadow copy of the configuration register (block 106). If the coherence switch determines there are outstanding transactions still in-flight (conditional block 104), then the coherence switch may wait until all outstanding transactions are completed before updating the shadow copy. After block 106, the coherence switch may begin accepting new transactions (block 108). Then, the coherence switch may receive a new transaction (block 110). The coherence switch may route the new transaction based on the updated shadow copy of the configuration register (block 112). In one embodiment, the transaction may include an identifier, and the coherence switch may look up the identifier in the shadow copy of the configuration register to find a corresponding routing indicator.
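The update method of blocks 100-112 may be modeled as a small state machine. The class and method names are hypothetical, a single outstanding-transaction count stands in for the counter mechanism discussed above, and returning None for a stalled transaction is a simplification of whatever backpressure the hardware would apply.

```python
class CoherenceSwitchModel:
    """Hypothetical model of the drain-then-update sequence."""

    def __init__(self, num_ids):
        self.software_copy = [0] * num_ids
        self.shadow_copy = [0] * num_ids
        self.outstanding = 0
        self.accepting = True

    def software_write(self, txn_id, indicator):
        self.software_copy[txn_id] = indicator
        if self.software_copy != self.shadow_copy:
            # Blocks 100-102: update detected, stop accepting transactions.
            self.accepting = False
            self._try_commit()

    def accept(self, txn_id):
        if not self.accepting:
            return None  # stalled while the update is pending
        self.outstanding += 1
        # Blocks 110-112: route using the shadow copy (0 = coherent).
        return "coherent" if self.shadow_copy[txn_id] == 0 else "non-coherent"

    def complete_one(self):
        self.outstanding -= 1
        self._try_commit()

    def _try_commit(self):
        # Blocks 104-108: once drained, update the shadow copy and resume.
        if not self.accepting and self.outstanding == 0:
            self.shadow_copy = list(self.software_copy)
            self.accepting = True


switch = CoherenceSwitchModel(num_ids=4)
first_route = switch.accept(2)     # routed with the old setting
switch.software_write(2, 1)        # update arrives while one is in flight
stalled_route = switch.accept(2)   # not accepted during the drain
switch.complete_one()              # drain completes, shadow copy updated
new_route = switch.accept(2)       # routed with the new setting
```

In this sketch the in-flight transaction completes under the old routing, and only transactions accepted after the shadow copy is updated follow the new routing.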
Referring now to
The memory 122 may be any type of memory, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM (including mobile versions of the SDRAMs such as mDDR3, etc., and/or low power versions of the SDRAMs such as LPDDR2, etc.), RAMBUS DRAM (RDRAM), static RAM (SRAM), etc. One or more memory devices may be coupled onto a circuit board to form memory modules such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc.
The peripherals 124 may include any desired circuitry, depending on the type of system 120. For example, in one embodiment, the system 120 may be a mobile device (e.g., personal digital assistant (PDA), smart phone, electronic reading device) and the peripherals 124 may include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system, etc. The peripherals 124 may also include additional storage, including RAM storage, solid state storage, or disk storage. The peripherals 124 may include user interface devices such as a display screen, including touch display screens or multitouch display screens, keyboard or other input devices, microphones, speakers, etc. In other embodiments, the system 120 may be any type of computing system (e.g., desktop personal computer, laptop, workstation, video game console, television, nettop).
Turning now to
Generally, the data structure(s) of the circuitry on the computer readable medium 130 may be read by a program and used, directly or indirectly, to fabricate the hardware comprising the circuitry. For example, the data structure(s) may include one or more behavioral-level descriptions or register-transfer level (RTL) descriptions of the hardware functionality in a high level design language (HDL) such as Verilog or VHDL. The description(s) may be read by a synthesis tool which may synthesize the description to produce one or more netlists comprising lists of gates from a synthesis library. The netlist(s) comprise a set of gates which also represent the functionality of the hardware comprising the circuitry. The netlist(s) may then be placed and routed to produce one or more data sets describing geometric shapes to be applied to masks. The masks may then be used in various semiconductor fabrication steps to produce a semiconductor circuit or circuits corresponding to the circuitry. Alternatively, the data structure(s) on computer readable medium 130 may be the netlist(s) (with or without the synthesis library) or the data set(s), as desired. In yet another alternative, the data structures may comprise the output of a schematic program, or netlist(s) or data set(s) derived therefrom.
While computer readable medium 130 includes a representation of the IC 10, other embodiments may include a representation of any portion or combination of portions of the IC 10 (e.g., coherence switch 12, multiplexer 14, processor complex 22, NRT block 26, memory controller 30).
It should be emphasized that the above-described embodiments are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.