Examples of the present disclosure generally relate to performing strict and relaxed ordered requests using a network on a chip (NoC).
A system on chip (SoC) (e.g., a field programmable gate array (FPGA), a programmable logic device (PLD), or an application specific integrated circuit (ASIC)) can contain a packet network structure known as a network on a chip (NoC) to route data packets between logic blocks in the SoC—e.g., programmable logic blocks, processors, memory, and the like.
The NoC can include ingress logic blocks (e.g., masters) that execute read or write requests to egress logic blocks (e.g., servants). An ingress logic block can receive multiple requests. If strict ordering is required, the ingress logic block may have to wait until a response to the first request is received from a first egress logic block before a second request can be transmitted to a different egress logic block. That is, strict ordering requires the requests to different egress logic blocks to occur sequentially. As such, this can cause substantial delay where the ingress logic block waits for a response from each egress logic block before issuing another read or write request.
Techniques for defining relaxed order requests are described. One example is an integrated circuit that includes a first hardware entity, a second hardware entity, and a network on a chip (NoC) that provides connectivity between the first and second hardware entities. The NoC includes an ingress logic block coupled to the first hardware entity and an egress logic block coupled to the second hardware entity where the ingress logic block includes a write tracker configured to receive a first request from the first hardware entity to write data to the second hardware entity and determine whether the first request is one of a relaxed ordered request or a strict ordered request, wherein the relaxed ordered request can be executed in parallel with a subsequently received request while the strict ordered request cannot be executed in parallel with a subsequently received request that has a different destination than the first request.
One example described herein is a method that includes receiving a first request from a first hardware entity to write data to a second hardware entity where the first hardware entity and the second hardware entity are communicatively coupled by a NoC and determining, at an ingress logic block in the NoC, whether the first request is one of a relaxed ordered request or a strict ordered request, where the relaxed ordered request can be executed in parallel with a subsequently received request while the strict ordered request cannot be executed in parallel with a subsequently received request that has a different destination than the first request.
So that the manner in which the above recited features can be understood in detail, a more particular description, briefly summarized above, may be had by reference to example implementations, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical example implementations and are therefore not to be considered limiting of its scope.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements of one example may be beneficially incorporated in other examples.
Various features are described hereinafter with reference to the figures. It should be noted that the figures may or may not be drawn to scale and that the elements of similar structures or functions are represented by like reference numerals throughout the figures. It should be noted that the figures are only intended to facilitate the description of the features. They are not intended as an exhaustive description of the description or as a limitation on the scope of the claims. In addition, an illustrated example need not have all the aspects or advantages shown. An aspect or an advantage described in conjunction with a particular example is not necessarily limited to that example and can be practiced in any other examples even if not so illustrated, or if not so explicitly described.
Embodiments herein describe a SoC that includes a NoC that supports both strict and relaxed ordering requests. That is, some applications may require strict ordering (such as many processor type operations) where requests transmitted from the same ingress logic block (also referred to as a NoC Master Unit (NMU)) to different egress logic blocks (also referred to as NoC Slave Units or NoC Servant Units (NSU)) are performed sequentially. However, other applications may not require strict ordering, such as interleaved writes to memory. In those applications, relaxed ordering can be used where the same ingress logic block can transmit multiple requests to different egress logic blocks in parallel. For example, an ingress logic block may receive a first request that is indicated as being a relaxed ordered request. After transmitting the request to a corresponding egress logic block, the ingress logic block may receive a second request to the same or different destination as the first request. The ingress logic block can transmit the second request to its destination without waiting for a response for the first request. In this manner, designating requests as relaxed ordering can avoid the delay caused by strict ordering.
In one embodiment, the ingress logic block still returns completion notices to the hardware entity submitting the request in order. Continuing the example above, if the second request completes before the first request (e.g., the ingress logic block receives a response from the egress logic block corresponding to the second request before receiving a response from the egress logic block corresponding to the first request), the ingress logic block waits to inform the entity submitting the request that the second request is complete until after the first request has completed. However, this may be a protocol specific requirement, and thus, may depend on the communication protocol used to transmit data on the NoC.
As shown, the NoC 105 interconnects a programmable logic (PL) block 125A, a PL block 125B, a processor 110, and a memory 120. That is, the NoC 105 can be used in the SoC 100 to permit different hardened and programmable circuitry elements in the SoC 100 to communicate. For example, the PL block 125A may use one ingress logic block 115 (e.g., an NMU) to communicate with the PL block 125B and another ingress logic block 115 to communicate with the processor 110. However, in another embodiment, the PL block 125A may use the same ingress logic block 115 to communicate with both the PL block 125B and the processor 110 (assuming the endpoints use the same communication protocol). The PL block 125A can transmit the data to the respective egress logic blocks 140 (e.g., NSUs) for the PL block 125B and the processor 110, which can determine whether the data is intended for them based on an address (if using a memory mapped protocol) or a destination ID (if using a streaming protocol).
The PL block 125A may include egress logic blocks 140 for receiving data transmitted by the PL block 125B and the processor 110. In one embodiment, the hardware logic blocks are able to communicate with all the other hardware logic blocks that are also connected to the NoC 105, but in other embodiments, the hardware logic blocks may communicate with only a sub-portion of the other hardware logic blocks connected to the NoC 105. For example, the memory 120 may be able to communicate with the PL block 125A but not with the PL block 125B.
As described above, the ingress and egress logic blocks 115, 140 may all use the same communication protocol to communicate with the PL blocks 125, the processor 110, and the memory 120, or can use different communication protocols. For example, the PL block 125A may use a memory mapped protocol to communicate with the PL block 125B while the processor 110 uses a streaming protocol to communicate with the memory 120. In one embodiment, a transfer network 130 in the NoC 105 can support multiple protocols.
In one embodiment, the SoC 100 is an FPGA which configures the PL blocks 125 according to a user design. That is, in this example, the FPGA includes both programmable and hardened logic blocks. However, in other embodiments, the SoC 100 may be an ASIC that includes only hardened logic blocks. That is, the SoC 100 may not include the PL blocks 125. Even though in that example the logic blocks are non-programmable, the NoC 105 may still be programmable so that the hardened logic blocks (e.g., the processor 110 and the memory 120) can switch between different communication protocols, change data widths at the interface, or adjust the frequency.
The NoC 105 permits entities (e.g., the PL blocks 125, the processor 110, and the memory 120) to submit write requests using strict or relaxed ordering. For example, the processor 110 may always use strict ordering when transmitting data across the NoC 105. However, the PL block 125A may include memory controllers that can use relaxed ordering to store data in the memory 120. Using the embodiments herein, a user can customize the SoC 100 so that certain writes facilitated by the NoC 105 are done using strict ordering or relaxed ordering.
Each of the ingress logic blocks 115 can include a write tracker 145 that tracks the write requests transmitted by the ingress logic blocks to the egress logic blocks 140. The write tracker includes the linked list 150 and all the status information about the write requests and received write responses. In one embodiment, the write tracker can handle a maximum number of requests (e.g., 64 requests) but this number can depend on the implementation.
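As a minimal software sketch of the bookkeeping described above (the hardware structure itself is not specified here), the write tracker can be modeled as a bounded, ordered list of entries whose oldest entry is the head and whose newest entry is the tail. The names WriteEntry, WriteTracker, and MAX_TRACKED, and the choice of fields, are assumptions made only for illustration.

```python
from collections import deque
from dataclasses import dataclass

MAX_TRACKED = 64  # example capacity from the text; implementation dependent


@dataclass
class WriteEntry:
    """One tracked write request (field names are hypothetical)."""
    name: str
    axi_id: int
    dest_id: int
    relaxed: bool            # True = relaxed ordered, False = strict ordered
    responded: bool = False  # set once the write response has been received


class WriteTracker:
    """Software model of a write tracker holding an ordered list of entries."""

    def __init__(self, capacity: int = MAX_TRACKED):
        self.capacity = capacity
        self.linked_list = deque()  # left end = head (oldest), right end = tail

    def append(self, entry: WriteEntry) -> bool:
        """Add an entry at the tail; return False when the tracker is full."""
        if len(self.linked_list) >= self.capacity:
            return False
        self.linked_list.append(entry)
        return True

    @property
    def head(self):
        return self.linked_list[0] if self.linked_list else None

    @property
    def tail(self):
        return self.linked_list[-1] if self.linked_list else None
```

A deque is used here only because it gives cheap insertion at the tail and removal at the head; any ordered container would serve the same purpose in a model.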
The write tracker 145 includes a linked list 150 which has a head and a tail. The write requests can be added to the linked list 150 as they are received. Further, as discussed below, the write requests may be added to the linked list 150 depending on whether they are strict ordered or relaxed ordered requests. For example, if the linked list 150 already includes a strict ordered request, a subsequently received strict request may be blocked (e.g., not added to the linked list 150). The details for explaining how requests are added and removed from the linked list 150 are described in
The locations of the PL blocks 125, the processor 110, and the memory 120 in the physical layout of the SoC 100 are just one example of arranging these hardware elements. Further, the SoC 100 can include more hardware elements than shown. For instance, the SoC 100 may include additional PL blocks, processors, and memory that are disposed at different locations on the SoC 100. Further, the SoC 100 can include other hardware elements such as I/O modules and a memory controller which may, or may not, be coupled to the NoC 105 using respective ingress and egress logic blocks 115 and 140. For example, the I/O modules may be disposed around a periphery of the SoC 100.
At block 310, the ingress logic block determines whether the request is blocked. A request can be blocked for multiple reasons. In one example, the linked list may already include a strict ordered request that is not yet complete. If the request received at block 305 is to a different destination than the strict ordered request already in the linked list, then the new request is blocked if it is also a strict ordered request. In another example, a request may be subdivided into different “chops” (e.g., a 512-byte write request is divided into two 256-byte chops). If the chops of a strict ordered request are for two different destinations, the second chop may be blocked while the first chop is transmitted to its destination. That is, the first chop can be transmitted to its egress logic block while the second chop has to wait. The memory system can also have a mode bit which can force a strict request to wait until all relax-ordered requests, with the same AXI ID, are retired from the write tracker. These examples are not intended to cover all scenarios where a request would be (at least partially) blocked by the ingress logic block. The types of scenarios may vary depending on the implementation of the NoC and the communication protocol being used.
There are also many situations where requests are not blocked. For example, the Advanced eXtensible Interface (AXI) communication protocol permits requests with different AXI IDs to occur in parallel (e.g., without strict ordering). Thus, a strict or relaxed ordered request to the same or different destination as another request that has a different AXI ID would not be blocked. Furthermore, AXI permits requests with the same AXI ID and the same destination ID (e.g., the same destination egress logic block) to occur in parallel. Thus, even if the linked list has a previous strict ordered request that has the same destination as a new strict ordered request received at block 305, the new request is not blocked. Stated differently, a previously received strict ordered request blocks a new strict ordered request (with the same AXI ID) only if the new request has a different destination than the previously received request. Further, if the previous request is a relaxed ordered request, a new request (whether strict ordered or relaxed ordered) is not blocked by the previous request regardless of whether the new request has the same destination or a different destination. Again, the memory system can also have a mode bit which forces a strict request to wait until all relax-ordered requests, with the same AXI ID, are retired from the write tracker. These examples are not intended to cover all scenarios where a request would not be blocked at the ingress logic block.
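The blocking and non-blocking cases described above can be condensed into a single predicate. The following is only a sketch under stated assumptions: the Request type, the is_blocked function, and the block_en parameter (standing in for the optional mode bit) are hypothetical names, and the sketch covers only the example scenarios listed, not every case an implementation might handle.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    """Hypothetical view of a write request as seen by the write tracker."""
    axi_id: int
    dest_id: int
    relaxed: bool  # True = relaxed ordered, False = strict ordered


def is_blocked(new: Request, pending: Iterable[Request],
               block_en: bool = False) -> bool:
    """Return True if `new` must wait for a request that is still pending.

    `block_en` models the optional mode bit that makes a strict request wait
    for earlier relaxed requests with the same AXI ID (assumed name).
    """
    if new.relaxed:
        return False  # a new relaxed ordered request is not blocked
    for prev in pending:
        if prev.axi_id != new.axi_id:
            continue  # different AXI IDs may proceed in parallel
        if not prev.relaxed and prev.dest_id != new.dest_id:
            return True  # strict after strict, different destination
        if prev.relaxed and block_en:
            return True  # mode bit: strict waits for earlier relaxed requests
    return False
```

For instance, is_blocked(Request(0, 2, False), [Request(0, 1, False)]) evaluates to True, while the same call with a matching destination or a different AXI ID evaluates to False.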
If the request is blocked at block 310, the method 300 proceeds to block 315 where the ingress logic block adds a temporary entry in the write tracker for the request. That is, the ingress logic block does not add an entry to the linked list but may nonetheless store a temporary entry for the request where the request can wait until it is unblocked.
At block 320, the ingress logic block adds the request to the linked list after a previous blocking request has been serviced (i.e., is complete). For example, if a newly received request is blocked by a previously received strict ordered request, after a response is received from the destination corresponding to the previously received request, the ingress logic block can add the temporary entry for the new request to the linked list and transmit the request to its destination egress logic block.
Returning to block 310, if the request is not blocked, the method 300 proceeds to block 325 where the ingress logic block adds an entry corresponding to the request to the tail of the linked list. If the request is the only request currently being tracked in the linked list, its corresponding entry will be both the head and the tail of the linked list.
At block 330, the ingress logic block transmits the request without waiting for a response related to a previous request. That is, if there are other entries in the linked list corresponding to previously received requests, the ingress logic block can nonetheless transmit the newly received request to its destination without waiting for a response to the previously received request(s). In this manner, relaxed ordering can reduce delay between requests. That is, if a previous request is relaxed ordering, a new request can be transmitted without waiting for the ingress logic block to receive a response to the previous request. In contrast, if the previous request is strict ordering, a new strict order request that has the same AXI ID but a different destination ID is blocked. Thus, by providing the user with the ability to designate which requests are strict ordering and which are relaxed ordering, the user can avoid the delay caused by strict ordering which blocks requests with the same AXI ID but different destination IDs.
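The flow of blocks 305 through 330 can be sketched in software as follows. This is an illustrative model only: the Req and Ingress names are hypothetical, a single AXI ID is assumed, the blocking check looks only at entries already in the linked list, and the point at which a serviced request leaves the list is simplified.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Req:
    name: str
    dest_id: int
    relaxed: bool  # True = relaxed ordered, False = strict ordered


class Ingress:
    """Simplified model of the receive path for one AXI ID."""

    def __init__(self):
        self.linked_list = deque()  # tracked requests, head at the left
        self.temporary = deque()    # blocked requests waiting to be released
        self.sent = []              # names in the order they enter the NoC

    def _blocked(self, req: Req) -> bool:
        # Block 310: strict request behind a strict request to another dest.
        return (not req.relaxed) and any(
            (not p.relaxed) and p.dest_id != req.dest_id
            for p in self.linked_list)

    def receive(self, req: Req) -> str:
        if self._blocked(req):
            self.temporary.append(req)   # block 315: temporary entry
            return "held"
        self.linked_list.append(req)     # block 325: add at the tail
        self.sent.append(req.name)       # block 330: transmit immediately
        return "sent"

    def on_serviced(self, req: Req) -> None:
        """Called once a previously sent request is complete."""
        self.linked_list.remove(req)
        waiting, self.temporary = self.temporary, deque()
        for t in waiting:                # block 320: release if unblocked
            if self._blocked(t):
                self.temporary.append(t)
            else:
                self.linked_list.append(t)
                self.sent.append(t.name)
```

In this model a strict request A to destination 1 followed by a strict request B to destination 2 leaves B held until A is serviced, while a relaxed request C received in between is sent right away.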
There are multiple different ways to designate a request as relaxed ordering or strict ordering. In one embodiment, a new bit, referred to below as a RELAX bit, is added to each entry in an address map table maintained in the ingress logic blocks. This bit indicates whether the address region requires enforcement of the strict or the relaxed AXI write order rule. That is, the user can designate which address regions should follow strict or relaxed ordering and the entity submitting the request (e.g., a PL block, processor, or memory controller) can assign the relax_order_en bit accordingly. In another embodiment, two bits of the AWUSER signals of the AXI write address channel are used to force either Relaxed Write Order or Strict Write Order. In yet another embodiment, the relax_order_en bit is also added to the write tracker entry data structure in the ingress logic block. This bit is set based on the address map table lookup (or a remap operation). In another embodiment, another bit can be added to enforce blocking between a previous relaxed order request and a new strict order request (i.e., a block_en bit), which is discussed below.
To illustrate how the RELAX bit can be used, in one embodiment, when a new strict AXI write request arrives at an ingress logic block, and its dest-ID/RELAX bit is selected by the address map, the write tracker checks whether the linked list contains any VALID entries with a matching AXI-ID and the RELAX bit set to 0 (indicating the previous request is a strict ordering request). If any matching entry has a different dest-ID than the new strict ordered request, the new request is blocked until the matching entry receives all its NoC responses and is retired, i.e., removed from the linked list. In contrast, when a new relaxed AXI write request arrives at an ingress logic block, and its dest-ID/RELAX bit is selected by the address map, the write tracker does not block the write request from being sent to the NoC.
At block 410, the ingress logic block determines whether an entry corresponding to the request is at the head of the linked list. That is, assuming the linked list only stores requests with the same AXI ID, the write tracker determines whether the request is the oldest request stored on the linked list (e.g., the request was received before all the other requests represented in the linked list). As mentioned above, at least for AXI, the ingress logic block informs the entity that submitted the request in the order the requests were received. Thus, if a request in the linked list finishes before a previously received request, the method 400 moves to block 415 where the write tracker waits until all previous responses in the linked list have been reported out.
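One way to picture the address map lookup is a table of address regions, each carrying a destination ID and a RELAX bit. The sketch below is an assumption-laden illustration: the AddressMapEntry layout, the region boundaries, the destination IDs, and the lookup function are all hypothetical and are not taken from any actual address map format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AddressMapEntry:
    base: int     # first address of the region
    size: int     # region size in bytes
    dest_id: int  # destination egress logic block
    relax: int    # RELAX bit: 1 = relaxed ordering, 0 = strict ordering


# Hypothetical regions chosen only for this example.
ADDRESS_MAP = [
    AddressMapEntry(base=0x0000_0000, size=0x1000_0000, dest_id=1, relax=0),
    AddressMapEntry(base=0x1000_0000, size=0x1000_0000, dest_id=2, relax=1),
]


def lookup(addr: int) -> Optional[Tuple[int, int]]:
    """Return (dest_id, relax) for the region containing addr, else None."""
    for entry in ADDRESS_MAP:
        if entry.base <= addr < entry.base + entry.size:
            return entry.dest_id, entry.relax
    return None
```

With these example regions, a write to 0x0000_1000 would be treated as strict ordered and routed to destination 1, while a write to 0x1800_0000 would be treated as relaxed ordered and routed to destination 2.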
Once that is done (or if the response was at the head of the linked list), the method 400 proceeds to block 420 where the write tracker reports the request as being complete—i.e., the request is retired.
At block 425, the write tracker deletes the entry from the linked list. That is, the head of the linked list is moved to the next entry in the linked list.
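The in-order reporting of blocks 410 through 425 can be modeled as retiring entries only from the head of the list. The following sketch assumes a single AXI ID and uses hypothetical names (RetireQueue, on_response, reported); it illustrates the ordering rule rather than any particular hardware implementation.

```python
from collections import deque


class RetireQueue:
    """Reports completions in arrival order, even if responses arrive early."""

    def __init__(self):
        self.entries = deque()  # each item is [name, response_received]
        self.reported = []      # names in the order they were reported

    def add(self, name: str) -> None:
        self.entries.append([name, False])  # new entry goes to the tail

    def on_response(self, name: str) -> None:
        for entry in self.entries:
            if entry[0] == name:
                entry[1] = True
                break
        # Blocks 410-425: only the head may be reported and deleted; a
        # finished entry behind an unfinished head keeps waiting (block 415).
        while self.entries and self.entries[0][1]:
            self.reported.append(self.entries.popleft()[0])
```

If Req1 and Req2 are added in that order and the response for Req2 arrives first, nothing is reported until the response for Req1 arrives, at which point both are reported, Req1 first.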
As shown, the entries in
Further,
Because Req2 is a relaxed ordered request, it is not blocked by the strict ordered request Req1. Thus, the write tracker can add an entry to the tail of the linked list 150 for the Req2 and forward its two chops to their respective destinations (which may be the same destination or different destinations). That is, the two chops can be forwarded in parallel to their destinations.
In this embodiment, the two chops of Req2 are represented by the same (i.e., single) entry in the linked list 150. That is, because Req2 is a relaxed ordered request, it does not matter in what order the chops are transmitted or the order in which responses to the two chops are received at the write tracker. Thus, the write tracker can use the same entry for Req2 to ensure that it receives two responses (as indicated by the RESP section) before it retires Req2, but does not care in what order the responses to the chops are received.
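A single entry that covers several chops only needs to count responses, not order them. The sketch below illustrates this under assumptions: split_into_chops, RelaxedEntry, and the default 256-byte chop size are hypothetical names and values, the latter taken from the 512-byte example given earlier.

```python
from typing import List, Tuple


def split_into_chops(addr: int, length: int,
                     chop_size: int = 256) -> List[Tuple[int, int]]:
    """Divide a write into (address, length) chops of at most chop_size."""
    return [(addr + offset, min(chop_size, length - offset))
            for offset in range(0, length, chop_size)]


class RelaxedEntry:
    """One linked-list entry that tracks all chops of a relaxed request."""

    def __init__(self, name: str, num_chops: int):
        self.name = name
        self.expected = num_chops  # responses needed before retirement
        self.received = 0

    def on_chop_response(self) -> bool:
        """Record one response; return True once every chop has responded."""
        self.received += 1
        return self.received == self.expected
```

For a 512-byte write starting at 0x1000, split_into_chops yields two 256-byte chops, and an entry created with num_chops=2 becomes eligible for retirement after its second response, whichever chop that response belongs to.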
After the entry for Req1 is removed, the entry for Req2 is now the head of the linked list 150. Because the responses for Req2 have already been received (as shown in
Although Req4 is strict ordered, Req4 Chop1 is not blocked because Req3 is relaxed ordered. However, because Req4 Chop2 has a different destination than Req4 Chop1, Chop2 is blocked from being transmitted in parallel with Chop1. If both chops of Req4 had the same destination, then both chops could have been transmitted in parallel with Req3. Instead, the write tracker proceeds to forward Chop1 on the NoC and stores entries for the two chops as temporary entries 505B and 505C. That is, the write tracker might not add the chops of Req4 to the linked list 150.
Further, because the response for Req3 has been received, the write tracker can report that Req3 is complete and remove its entry from the linked list 150. Also, because the response to Chop1 is received, Req4 Chop2 is no longer blocked by AXI strict ordering and can be sent to the NoC.
In this embodiment, because Req5 has a different AXI ID than the previous requests, the write tracker stores an entry for Req5 in the linked list 510. In one embodiment, the write tracker maintains a different linked list for each AXI ID. For example, an ingress logic block may correspond to multiple AXI IDs. The ingress logic block can maintain a respective linked list for each of the AXI
As mentioned above, AXI permits requests with different AXI IDs to be sent in parallel, regardless of whether those requests are strict or relaxed ordered. That is, Req5 is never blocked by Req4 since they are assigned to different AXI IDs. Thus, Req5 can be transmitted to the NoC without first waiting for the response to Chop2 to be received. Further, the write tracker can report receiving the responses to Req4 and Req5 in any order since they are assigned to different AXI IDs (and stored in different linked lists). That is, if the write tracker receives the response to Req5 before the response to Req4 Chop2, the write tracker can report that Req5 is complete to the entity that submitted it without first waiting to report Req4 as complete, which is compliant with AXI.
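Keeping one list per AXI ID means in-order reporting is enforced only among requests that share an ID. The sketch below models that behavior; PerIdTracker and its methods are hypothetical names, and chops are omitted for brevity so each request has a single response.

```python
from collections import defaultdict, deque


class PerIdTracker:
    """Maintains a separate ordered list for each AXI ID."""

    def __init__(self):
        self.lists = defaultdict(deque)  # axi_id -> deque of [name, done]
        self.reported = []               # completion order seen by the sender

    def add(self, axi_id: int, name: str) -> None:
        self.lists[axi_id].append([name, False])

    def on_response(self, axi_id: int, name: str) -> None:
        entries = self.lists[axi_id]
        for entry in entries:
            if entry[0] == name:
                entry[1] = True
                break
        # Ordering is enforced only within the list for this AXI ID.
        while entries and entries[0][1]:
            self.reported.append(entries.popleft()[0])
```

With Req4 tracked under one AXI ID and Req5 under another, a response for Req5 can be reported immediately even though Req4 is older and still outstanding.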
While the foregoing is directed to specific examples, other and further examples may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.