The present disclosure relates generally to processing systems having multiple coherency domains and more particularly to routing coherency messages between multiple coherency domains.
In processing systems having multiple processors, it often is advantageous to maintain cache coherence—that is, to provide mechanisms that ensure consistency in the data shared between the processors. When one processor modifies its local copy of shared data, a coherency protocol is utilized to make the modified data available to the other processors. This coherency protocol typically is implemented as coherency messages transmitted between the processors via one or more coherency interconnects.
In larger systems, the coherency message traffic can overwhelm the bandwidth of the coherency interconnect when the coherency messages are broadcast to all coherent components in the system. Accordingly, in some conventional systems, coherent components of the system are assigned to one or more coherency domains and the broadcast of coherency messages can be limited to those coherency agents of a particular coherency domain. In such systems, an indicator of the coherency domain for particular cached data is stored at the cache, and when the cached data is modified, the coherency agent can speculatively assign the corresponding coherency domain identified from the cache to a coherency message generated as a result of the modification of the cached data. In the event that the coherency domain was incorrectly speculated, the coherency agent expands the scope of the coherency message to include more coherency domains or broader coherency domains and retransmits the coherency message. While this speculative process can reduce system-wide coherency message traffic when the coherency domain is correctly speculated, the rebroadcast of coherency messages for incorrectly speculated coherency domains can result in increased coherency message traffic, thereby contributing to the bottleneck at the coherency interconnect. Accordingly, an improved technique for domain-specific coherency message transmission would be advantageous.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
The use of the same reference symbols in different drawings indicates similar or identical items.
In accordance with one aspect of the present disclosure, a method is provided in a processing system comprising a plurality of coherency domains and a plurality of coherency agents. Each coherency agent is associated with at least one of the plurality of coherency domains. The method includes performing, at a select coherency agent of the plurality of coherency agents, an address translation for a coherency message using a first memory address to generate a second memory address. The method further includes determining, at the select coherency agent, a select coherency domain of the plurality of coherency domains associated with the coherency message based on the address translation. The method additionally includes providing the coherency message and a coherency domain identifier of the select coherency domain to a coherency interconnect for distribution to at least one of the plurality of coherency agents based on the coherency domain identifier.
In accordance with another aspect of the present disclosure, a processor device is provided. The processor device includes a coherency agent and a memory management unit. The memory management unit includes an address translation table comprising a plurality of entries. Each entry includes a first field to store a corresponding address value and a second field to store a coherency domain identifier of a corresponding coherency domain of a plurality of coherency domains.
In accordance with yet another aspect of the present disclosure, a system is provided. The system includes a plurality of coherency agents. Each coherency agent is associated with at least one of a plurality of coherency domains and comprises an address translation table. Each coherency agent is configured to generate a coherency message in response to a cache access at the coherency agent and determine a coherency domain identifier for the coherency message based on the address translation table and a first memory address associated with the cache access. The coherency domain identifier is associated with a select coherency domain of the plurality of coherency domains. The system further includes a coherency interconnect configured to distribute coherency messages between select ones of the plurality of coherency agents based on the coherency domain identifier associated with each coherency message.
The term “coherency agent,” as used herein, refers to a component of a system that stores, accesses, or modifies shared data of one or more coherent memories in a processing system, or that otherwise participates in the coherency protocol with other components of the system (e.g., other coherency agents). Examples of coherency agents include, but are not limited to, processor cores with associated caches, stand-alone caches, and the like. For ease of discussion, certain aspects of the techniques disclosed herein are described in the illustrative context of coherency management by a processor core. However, the disclosed techniques can be implemented by other types of coherency agents using the guidelines provided herein without departing from the scope of the present disclosure. Further, the memory address translation techniques are described herein in the context of a memory management unit (MMU) for ease of illustration. These memory address translation techniques can be utilized in other contexts without departing from the scope of the disclosure.
Each of the coherency agents 101-108 includes an address translation component 120 for translating virtual memory addresses to physical memory addresses. The address translation component 120 can be implemented as, for example, a memory management unit (MMU), as described in greater detail herein with reference to
In the illustrated example, the multiple-processor system 100 is divided into three coherency domains (coherency domains 1-3), wherein the coherency agents 101 and 102 are assigned to coherency domain 1, the coherency agents 103 and 104 are assigned to coherency domain 2, and coherency agent 105, coherency agent 106, coherency agent 107, and coherency agent 108 are assigned to coherency domain 3. In one embodiment, the software executed at the multiple-processor system 100 controls which addresses are in which domains. Based on this coherency domain assignment, the address translation tables of the address translation components 120 of the coherency agents 101-108 are configured such that each virtual address entry includes a DID for the corresponding coherency domain.
In response to an operation that involves shared data (e.g., a read operation or a write operation) at one of the coherency agents 101-108, the coherency agent generates a coherency message for the operation. As part of the coherency message generation, the virtual address associated with the shared data is converted to a physical address by the address translation component 120 of the coherency agent. The address translation involves indexing an entry of the address translation table based on the virtual address and accessing a corresponding physical address portion, which is then used to generate the physical address. Further, the DID field of the indexed entry of the address translation table is accessed to determine the one or more DIDs associated with the virtual address. The coherency agent then provides a coherency message with the physical address to the system interconnect 114 along with the determined DIDs for transmission to the coherency agents assigned to the coherency domains identified by the determined DIDs. The DIDs can be provided as part of the coherency message, or the DIDs can be provided as a separate input to the system interconnect 114.
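The flow described above can be illustrated with a minimal sketch. The table layout, page size, and all names below are assumptions for illustration only; the disclosure does not prescribe any particular encoding.

```python
# Hypothetical sketch: address translation that also yields the coherency
# domain identifiers (DIDs) for the accessed page. Page size, table layout,
# and function names are illustrative assumptions.
PAGE_SHIFT = 12  # assume 4 KiB pages

# Address translation table: virtual page number -> (physical page number, DIDs)
translation_table = {
    0x00040: (0x0A100, {1}),  # example page assigned to coherency domain 1
    0x00041: (0x0A101, {3}),  # example page assigned to coherency domain 3
}

def translate_and_tag(virtual_address):
    """Translate a virtual address and return (physical_address, DIDs)."""
    vpn = virtual_address >> PAGE_SHIFT
    offset = virtual_address & ((1 << PAGE_SHIFT) - 1)
    ppn, dids = translation_table[vpn]
    return (ppn << PAGE_SHIFT) | offset, dids

def make_coherency_message(virtual_address, op):
    """Generate a coherency message carrying the physical address and DIDs."""
    physical_address, dids = translate_and_tag(virtual_address)
    # The DIDs may travel inside the message, as here, or be provided as a
    # separate sideband input to the system interconnect.
    return {"op": op, "address": physical_address, "dids": dids}
```

Because the DID is read from the same table entry used for the translation itself, no separate lookup or speculation is needed to scope the message.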
To facilitate routing of coherency domain-specific coherency messages, the system interconnect 114 includes a routing table 122 that identifies the correspondence between coherency agents and DIDs. Table 1 illustrates a basic implementation of the routing table 122 for the example of
Thus, the system interconnect 114 can limit the distribution of the coherency message to only those coherency agents associated with coherency domains identified by the coherency message based on a mapping of the DID(s) supplied with a coherency message to the routing information of the routing table 122. In the event that no DID is supplied (or a default or global DID “—” for the entire system is supplied), the coherency message can be broadcast to all coherency agents of the multiple-processor system 100.
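The routing-table behavior just described can be sketched as follows. The dictionary form, agent names, and the empty-set convention for the global/default DID are assumptions for the sketch, not details of the disclosure.

```python
# Illustrative routing table for the three-domain example: DID -> agents.
routing_table = {
    1: ["agent101", "agent102"],
    2: ["agent103", "agent104"],
    3: ["agent105", "agent106", "agent107", "agent108"],
}
ALL_AGENTS = {a for agents in routing_table.values() for a in agents}

def route(message_dids):
    """Return the set of agents that should receive the coherency message.

    An empty DID set stands in for the 'no DID / global DID' case and
    triggers a system-wide broadcast.
    """
    if not message_dids:
        return set(ALL_AGENTS)
    targets = set()
    for did in message_dids:
        targets.update(routing_table[did])
    return targets
```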
In the illustrated example, the multiple-processor system 200 is divided into three coherency domains (coherency domains 1-3), wherein the coherency agents 201 and 202 are assigned to coherency domain 1, the coherency agents 203 and 204 are assigned to coherency domain 2, and coherency agents 202 and 204 are assigned to coherency domain 3. Thus, the coherency agent 202 is assigned to two coherency domains, coherency domain 1 and coherency domain 3, and the coherency agent 204 is also assigned to two coherency domains, coherency domain 2 and coherency domain 3. Based on this domain assignment, the address translation tables of the address translation components 220 of the coherency agents 201-204 are configured such that each virtual address entry includes one or more DIDs for the one or more corresponding coherency domains.
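The overlapping membership in this example can be captured in a small sketch. The agent and domain numbers follow the text; the dictionary representation and helper function are illustrative assumptions.

```python
# Sketch of overlapping coherency-domain membership for the four agents:
# an agent may belong to more than one domain.
domain_membership = {
    "agent201": {1},
    "agent202": {1, 3},  # member of two domains
    "agent203": {2},
    "agent204": {2, 3},  # member of two domains
}

def agents_in_domain(did):
    """All agents that must receive messages scoped to the given DID."""
    return {agent for agent, domains in domain_membership.items() if did in domains}
```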
To facilitate routing of coherency domain-specific coherency messages between the coherency agents 201-204, the system interconnect 214 includes a routing table 222 (corresponding to routing table 122,
In the illustrated example, the multiple-processor system 300 is divided into two coherency domains (coherency domains 1 and 2), one for each processing node, wherein the coherency agents 301 and 302 are assigned to coherency domain 1 and the coherency agents 303 and 304 are assigned to coherency domain 2. Based on this coherency domain assignment, each of the address translation tables of the address translation components 320 is configured such that each virtual address entry includes a DID for the corresponding coherency domain.
The intra-node interconnect 315 includes a routing table 323 to facilitate routing of coherency messages between the coherency agents 301 and 302 and the system interconnect 314. Likewise, the intra-node interconnect 316 includes a routing table 324 to facilitate routing of coherency messages between the coherency agents 303 and 304 and the system interconnect 314. The system interconnect 314 includes a routing table 322 to facilitate routing of coherency messages between the intra-node interconnect 315 and the intra-node interconnect 316. Tables 3-5 illustrate basic implementations of the routing tables 322, 323, and 324, respectively, that can be used to limit the distribution of the coherency message to only those coherency agents associated with coherency domains identified by the coherency message based on a mapping of the DID(s) supplied with a coherency message to the routing information of the routing tables 322-324.
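A minimal sketch of this two-level routing follows. The table contents mirror the two-node example in spirit only; interconnect names, table shapes, and the lookup order are assumptions for illustration.

```python
# Hierarchical routing sketch: each intra-node interconnect delivers
# locally when the DID is in its own routing table, and otherwise
# forwards up through the system interconnect to the other node.
node_tables = {
    "intra315": {1: ["agent301", "agent302"]},  # routing table 323
    "intra316": {2: ["agent303", "agent304"]},  # routing table 324
}
system_table = {1: "intra315", 2: "intra316"}   # routing table 322

def route_from_node(source_node, did):
    """Return the destination agents for a message entering at source_node."""
    local_targets = node_tables[source_node].get(did)
    if local_targets is not None:
        return local_targets           # DID is local: stay within the node
    target_node = system_table[did]    # otherwise escalate to the system level
    return node_tables[target_node][did]
```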
In the event of a load operation or a store operation, the LSU 416 provides a virtual address 420 to the data MMU 410 (along with write data in the event of a store operation). The data MMU 410 translates the virtual address 420 to a physical address 422 using a translation lookaside buffer (TLB) 424 or other address translation table. The data MMU 410 then provides the physical address 422 to the data cache 406 to identify the cache location involved with the load/store operation. Further, as part of the address translation, the data MMU 410 can identify one or more coherency domains associated with the virtual address 420 and provide the DID 426 of each of the identified coherency domains to the BIU 412.
In the event that the load/store operation to the cache location specified by the physical address 422 has coherency ramifications, the data cache 406 can provide a coherency indicator 428 to the BIU 412 to direct the BIU 412 to generate a coherency message. The coherency indicator 428 can include, for example, the physical address 422, the data value of the cache location prior to modification, the data value of the cache location after modification, the one or more DIDs identified by the data MMU 410, and the like.
In response to the coherency indicator 428, the BIU 412 generates a coherency message 430 with the relevant information and provides the coherency message 430 to the coherency interconnect for transmission to the appropriate coherency agents. Further, the BIU 412 provides the one or more DIDs 426 identified by the data MMU 410 during the address translation to the coherency interconnect, either as a separate signal or as part of the coherency message 430 itself. The coherency interconnect then can use the provided DIDs 426 to limit the transmission of the coherency message 430 to only the identified coherency domains.
In one embodiment, the virtual address 420 includes a virtual page number 522 that identifies a particular virtual page and a page offset 524 that identifies a particular offset within the virtual page. The TLB 424 indexes an entry 526 of the address translation table 502 using the virtual page number and the virtual page number field 504. The TLB 424 then accesses a physical page number 528 from the physical page number field 510 of the indexed entry 526 and combines the physical page number 528 with the page offset 524 to generate a unique address value for the physical address 422. Further, the TLB 424 accesses the DID field 508 of the indexed entry 526 to obtain one or more DIDs 426 associated with the corresponding virtual page and outputs the DIDs 426 to a BIU or other coherency interface as described above.
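The described entry fields and the combining of the physical page number with the page offset can be sketched as follows. The field widths, page size, and entry representation are illustrative assumptions; the disclosure does not fix a particular format.

```python
# Illustrative TLB entry carrying the three described fields: a virtual
# page number (field 504), a DID field (field 508), and a physical page
# number (field 510). Widths and the namedtuple form are assumptions.
from collections import namedtuple

TLBEntry = namedtuple("TLBEntry", ["virtual_page_number", "physical_page_number", "dids"])

PAGE_SHIFT = 12  # assume 4 KiB pages

tlb = [
    TLBEntry(virtual_page_number=0x00040, physical_page_number=0x0A100, dids=(1,)),
    TLBEntry(virtual_page_number=0x00041, physical_page_number=0x0A101, dids=(1, 3)),
]

def tlb_lookup(virtual_address):
    """Return (physical_address, DIDs) for a hit; raise on a miss."""
    vpn = virtual_address >> PAGE_SHIFT
    offset = virtual_address & ((1 << PAGE_SHIFT) - 1)
    for entry in tlb:
        if entry.virtual_page_number == vpn:  # match against field 504
            # Combine the physical page number (field 510) with the offset.
            physical = (entry.physical_page_number << PAGE_SHIFT) | offset
            return physical, entry.dids       # DIDs come from field 508
    raise KeyError("TLB miss")  # would trigger a page-table walk in practice
```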
In the example of
In one embodiment, a DID of “0” is used to signify a local coherency domain (e.g., the coherency domain of each of the intra-node interconnects 706 and 708) and a DID of “1” is used to signify a global coherency domain of all coherency agents of the multiple-processor system 700. Accordingly, the intra-node interconnect 706 is configured to route coherency messages having a DID of “0” to only those coherency agents connected to the intra-node interconnect 706 and to route coherency messages having a DID of “1” to both those coherency agents connected to the intra-node interconnect 706 and to the system interconnect 714 to distribute to other coherency agents directly or indirectly connected to the system interconnect 714. Likewise, the intra-node interconnect 708 is configured to route coherency messages having a DID of “0” to only those coherency agents connected to the intra-node interconnect 708 and to route coherency messages having a DID of “1” to both those coherency agents connected to the intra-node interconnect 708 and to the system interconnect 714 to distribute to other coherency agents directly or indirectly connected to the system interconnect 714. Thus, a DID of “0” serves to limit the transmission of a coherency message to only the local coherency domain and a DID of “1” serves to broadcast a coherency message to all coherency agents of the multiple-processor system 700.
In the illustrated example, the coherency agent 701 provides the coherency messages CM1 and CM2 to the intra-node interconnect 706 and the coherency agent 703 provides the coherency message CM3 to the intra-node interconnect 708. The coherency messages CM1, CM2, and CM3 have DIDs of “0”, “1”, and “0,” respectively. Based on the DIDs of the coherency messages CM1 and CM2, the intra-node interconnect 706 transmits the coherency message CM1 to only the coherency agent 702, but transmits the coherency message CM2 to both the coherency agent 702 and to the system interconnect 714, which provides it to the intra-node interconnect 708 for transmission to the coherency agents 703 and 704. Based on the DID of the coherency message CM3, the intra-node interconnect 708 transmits the coherency message CM3 to only the coherency agent 704.
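The local/global convention and the CM1-CM3 example above can be sketched in a few lines. The flattened delivery function and the choice to exclude the sender from its own message are simplifying assumptions; in the described system the DID-1 path actually traverses the system interconnect 714.

```python
# Sketch of the local ("0") / global ("1") DID convention: DID 0 stays
# within the sender's node, DID 1 reaches every agent in the system.
nodes = {
    "intra706": ["agent701", "agent702"],
    "intra708": ["agent703", "agent704"],
}

def deliver(source_agent, source_node, did):
    """Agents that receive a message sent by source_agent with the given DID."""
    if did == 0:  # local coherency domain: the sender's node only
        targets = set(nodes[source_node])
    else:         # global coherency domain: broadcast system-wide
        targets = {a for agents in nodes.values() for a in agents}
    targets.discard(source_agent)  # assume the sender does not snoop itself
    return targets
```

Running the example from the text: CM1 (DID 0 from agent 701) reaches only agent 702, CM2 (DID 1 from agent 701) reaches agents 702, 703, and 704, and CM3 (DID 0 from agent 703) reaches only agent 704.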
The terms “comprises”, “comprising”, or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The term “another”, as used herein, is defined as at least a second or more. The terms “including”, “having”, or any variation thereof, as used herein, are defined as comprising. The term “coupled”, as used herein with reference to electro-optical technology, is defined as connected, although not necessarily directly, and not necessarily mechanically.
Other embodiments, uses, and advantages of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. The specification and drawings should be considered exemplary only, and the scope of the disclosure is accordingly intended to be limited only by the following claims and equivalents thereof.