This disclosure is related generally to the field of network on chip interconnects for systems on chip.
A network on chip (NoC) connects one or more intellectual property (IP) block initiator interfaces to one or more IP target interfaces. An example of an initiator IP is a central processing unit (CPU) and an example of a target IP is a memory controller. Initiators request read and write transactions from targets. The target gives responses (data for reads and in many systems acknowledgements for writes) to the transactions. The NoC transports requests and responses between initiators and targets. The time from which an initiator requests a transaction until it receives a response is usually multiple clock cycles. Often it is ten or more cycles and sometimes more than 100 cycles. It is possible, and in fact common, for an initiator to have more than one transaction pending simultaneously. Furthermore, if transactions are directed to different targets or if they access different data within a single target then responses may arrive at initiators out of order.
A NoC associates responses with their requests and therefore, at the interface to the initiator, stores some identification information. The amount of storage limits the number of simultaneously pending transactions that can be supported. If an initiator requests a transaction while the maximum supported number of transactions is already pending, then the NoC signals the initiator that it is not ready. In another case, if the target interface supports a smaller number of pending transactions than the initiator interface and that smaller limit is reached, the NoC signals the initiator that it is not ready. In a third case, if more than one initiator simultaneously makes requests to the same target, then there is contention between the initiators for access. One initiator will have to wait. To that initiator the NoC will signal that it is not ready.
Open Core Protocol (OCP) and Advanced Microcontroller Bus Architecture (AMBA) Advanced Extensible Interface (AXI) are examples of widely used industry standard transaction interfaces. They use a handshake protocol in which a valid (vld) signal from the sender and a ready (rdy) signal from the receiver together indicate a data transfer. As shown in
A NoC is, internally, a network. It is therefore necessary to generate one or more transport packets for each transaction request. As indicated in
State of the art probes only gather statistics within the transport network topology. To optimize the performance of the system it is useful to know certain statistics about transactions that are only available within the NIU. Four are:
The time from initiator request vld for the first word of a transaction to NoC request rdy (the request acceptance latency);
The time from initiator request vld and NoC request rdy for the first word of a transaction to NoC response vld for the first word of the transaction (the response latency);
The time from initiator request vld for the first word of a transaction to NoC response vld for the last word of the transaction (the total transaction latency); and
The number of pending transactions, which indicates the utilization of the NoC by the initiator.
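The three latencies listed above can be illustrated with a small behavioral sketch. It assumes that, for each transaction, the cycle numbers of four interface events have been recorded. The class and field names (TransactionEvents, req_vld_first and so on) are hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class TransactionEvents:
    """Cycle numbers of interface events for one transaction (hypothetical names)."""
    req_vld_first: int  # initiator asserts request vld for the first word
    req_rdy_first: int  # NoC asserts request rdy for the first word (handshake)
    rsp_vld_first: int  # NoC asserts response vld for the first word
    rsp_vld_last: int   # NoC asserts response vld for the last word


def request_acceptance_latency(t: TransactionEvents) -> int:
    # Cycles the initiator waits before the NoC accepts the first request word.
    return t.req_rdy_first - t.req_vld_first


def response_latency(t: TransactionEvents) -> int:
    # Cycles from the accepted request (vld and rdy) to the first response word.
    return t.rsp_vld_first - t.req_rdy_first


def total_transaction_latency(t: TransactionEvents) -> int:
    # Cycles from the first request vld to the last response word.
    return t.rsp_vld_last - t.req_vld_first


example = TransactionEvents(req_vld_first=10, req_rdy_first=13,
                            rsp_vld_first=40, rsp_vld_last=47)
```

For the example values, the request acceptance latency is 3 cycles, the response latency is 27 cycles, and the total transaction latency is 37 cycles.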
An example of the behavior of an initiator NIU with multiple pending transactions is shown in
The latency statistics for a single given transaction, or number of pending transactions for a single given clock cycle are not very interesting. However, the average over many transactions is useful, for example, to adjust the priority of requests from different initiators or to design the behavior of IPs in order to achieve certain design goals. A histogram of transactions per request acceptance latency, transactions per response latency, or clock cycles per number of pending transactions is even more useful for system performance optimization.
Simulations of the functions of an SoC are easily programmed to gather and report transaction statistics. However, simulations that accurately model the behavior of the SoC run slowly. Useful simulations are impractical during software development and impossible at run time.
The disclosed invention is a system, device and method to gather data about transactions in order to calculate statistics, particularly histograms of latencies and numbers of pending transactions.
The details of the disclosed implementations are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
The same reference symbol used in various drawings indicates like elements.
A probe within an initiator interface of a NoC for gathering transaction statistics data is disclosed. The probe provides a set of registers containing count values, each of which corresponds to a bin of a histogram. The bin count statistics can be used during system performance analysis, software debug, and real-time operation.
Referring to
In some implementations, the values of the thresholds between bins are reprogrammable under software control. This provides for different scopes and different ranges of data in different use cases. For example, transactions to a fast target might typically receive responses within ten cycles, whereas transactions to a slow target might typically take 100 to 200 cycles to receive a response. In the first case, histogram bins representing transactions over latency would be separated by thresholds in the 1 to 10 cycle range, whereas in the second case the same bin count registers could be used with thresholds in the 100 to 200 cycle range.
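As an illustration only, the reuse of one set of bin count registers with different thresholds can be modeled in software. The sketch below is a behavioral model, not the disclosed hardware. The class name HistogramBins, the choice to clear the counts when thresholds are reprogrammed, and the convention that a value equal to a threshold falls into the upper bin are all assumptions.

```python
from bisect import bisect_right


class HistogramBins:
    """Behavioral model of bin counters separated by programmable thresholds."""

    def __init__(self, thresholds):
        self.set_thresholds(thresholds)

    def set_thresholds(self, thresholds):
        # Models software reprogramming the thresholds.
        # Assumption: counts are cleared whenever thresholds change.
        self.thresholds = sorted(thresholds)
        self.counts = [0] * (len(self.thresholds) + 1)

    def record(self, latency_cycles):
        # Bin 0 holds values below the first threshold; the last bin holds
        # values at or above the last threshold.
        self.counts[bisect_right(self.thresholds, latency_cycles)] += 1


bins = HistogramBins([2, 5, 10])          # fast target: thresholds within 1 to 10
for latency in [1, 3, 4, 7, 12, 10]:
    bins.record(latency)
fast_counts = list(bins.counts)

bins.set_thresholds([100, 150, 200])      # slow target: same counters, new range
for latency in [90, 120, 180, 250]:
    bins.record(latency)
slow_counts = list(bins.counts)
```

With three thresholds there are four bins, and the same four counters serve both the fast-target and the slow-target measurement.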
In some implementations, the type of histogram data to be gathered in each bin can be reprogrammed under software control. More than one kind of statistics can be gathered simultaneously in different bins. In one embodiment, the histogram data that can be gathered are a number of elapsed clock cycles with a number of pending transactions in defined range bins, and a number of transactions with cycles of latency in defined range bins.
Histogram data for number of elapsed clock cycles with a number of pending transactions in defined bins having a range with a minimum and maximum are gathered on a clock cycle by incrementing histogram bin counters. In one embodiment, shown in
Histogram data for number of transactions with cycles of latency in defined bins of min/max range are gathered on the completion of latency periods by incrementing histogram bin counters. In one embodiment, shown in
In the embodiment shown in
To reduce the amount of hardware in a NoC, especially the number of timers, one embodiment shares timers between more than one initiator NIU. This can be implemented with a crossbar switch that connects the Vld, Rdy, Head, and Tail control signals of the request and response paths of different initiators. While each initiator NIU can complete no more than one transaction per cycle, multiple initiator NIUs can complete multiple transactions per cycle. To allow multiple transaction completion, timers can be arranged in banks. Each bank can have one value and an incr output signal. A reverse crossbar switch can connect the value and incr signals to threshold bin counters. Timer banks can be arranged in groups of four timers. This configuration provides a good balance between the number of crossbar switch ports and the ability to allocate an optimal number of timers to NIUs.
In one embodiment the crossbar switch control that allows the allocation of banks to different NIUs is software programmable. The reverse crossbar switch control that allows the allocation of bin counters to banks can also be software programmable.
Note that the number of timers allocated to an initiator NIU may be less than the total number of pending transactions. In one embodiment, when such a configuration is programmed, then at the start of a transaction when no timers are available the transaction is disregarded by the probe and a software accessible flag is set to indicate that a transaction was disregarded.
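The handling of a transaction that starts when no timer is available can be sketched as follows. This is a minimal behavioral model under stated assumptions: the class name TimerPool, the use of a transaction identifier as a lookup key, and the method names are hypothetical and are not part of this disclosure.

```python
class TimerPool:
    """Behavioral model of a fixed number of timers allocated to one NIU."""

    def __init__(self, num_timers):
        self.free_timers = num_timers
        self.start_cycle = {}        # transaction id -> cycle the timer started
        self.disregarded_flag = False  # software accessible flag

    def start(self, txn_id, cycle):
        if self.free_timers == 0:
            # No timer available: disregard the transaction and set the flag.
            self.disregarded_flag = True
            return False
        self.free_timers -= 1
        self.start_cycle[txn_id] = cycle
        return True

    def finish(self, txn_id, cycle):
        started = self.start_cycle.pop(txn_id, None)
        if started is None:
            return None              # transaction was disregarded at its start
        self.free_timers += 1
        return cycle - started       # measured latency in cycles


pool = TimerPool(num_timers=2)
accepted = [pool.start("a", 0), pool.start("b", 1), pool.start("c", 2)]
latency_a = pool.finish("a", 10)
latency_c = pool.finish("c", 12)
```

In this example the third transaction starts while both timers are busy, so it is not measured and the flag tells software that the gathered histogram is incomplete.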
In one embodiment, a programmable filter is applied to the incr output of the module that gathers an enumeration of the number of pending transactions. This allows software to control criteria of which cycles will increment pending bins. In the embodiment shown, the criteria are every cycle and cycles in which the number of pending transactions is greater than zero.
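The two filter criteria can be illustrated with a short behavioral sketch. It assumes a per-cycle trace of the number of pending transactions is available. The function name pending_histogram, the parameter busy_cycles_only, and the threshold convention (a value equal to a threshold falls into the upper bin) are hypothetical.

```python
def pending_histogram(pending_per_cycle, thresholds, busy_cycles_only=False):
    """Count clock cycles per pending-transaction bin.

    busy_cycles_only=False models the "every cycle" criterion.
    busy_cycles_only=True counts only cycles with at least one pending transaction.
    """
    counts = [0] * (len(thresholds) + 1)
    for pending in pending_per_cycle:
        if busy_cycles_only and pending == 0:
            continue  # the filter suppresses the increment for idle cycles
        bin_index = sum(1 for t in thresholds if pending >= t)
        counts[bin_index] += 1
    return counts


trace = [0, 1, 2, 2, 0, 3, 1, 0]   # pending transactions in each clock cycle
every_cycle = pending_histogram(trace, thresholds=[1, 2])
busy_only = pending_histogram(trace, thresholds=[1, 2], busy_cycles_only=True)
```

Filtering out idle cycles leaves the non-zero bins unchanged and removes the idle cycles from the lowest bin, which can make utilization during active periods easier to read.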
In one embodiment, a software programmable filter is applied to the transactions to be observed. Transactions not meeting filter criteria can be disregarded. Filter criteria can include but are not limited to transaction sideband signals, target identifier, address bits, opcode, security bits, burst size, and ID.
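One way to picture such a filter is as a set of mask and match values, one pair per observed field. The sketch below is only an assumption about how the criteria might be expressed. The field names, the dictionary layout, and the function name passes_filter are hypothetical and are not taken from this disclosure.

```python
def passes_filter(filter_config, transaction):
    """Return True if every configured field matches under its mask.

    filter_config maps a field name to a (mask, match) pair.
    Fields that are not configured are not checked.
    """
    return all(
        (transaction[field] & mask) == (match & mask)
        for field, (mask, match) in filter_config.items()
    )


# Observe only transactions to target 2 in the address region 0x8000 to 0x8FFF.
config = {"target": (0xF, 0x2), "address": (0xF000, 0x8000)}

observed = passes_filter(config, {"target": 2, "address": 0x8ABC, "opcode": 1})
wrong_target = passes_filter(config, {"target": 3, "address": 0x8ABC, "opcode": 1})
wrong_region = passes_filter(config, {"target": 2, "address": 0x1ABC, "opcode": 1})
no_filter = passes_filter({}, {"target": 7, "address": 0, "opcode": 0})
```

An empty configuration passes every transaction, which corresponds to observing all traffic.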
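In one embodiment, the number of cycles for which a transaction is pending can exceed the range of the timer, that is, log2 of the number of cycles can exceed the number of bits in the timer. To handle this, a time scaling module can be implemented. The scaling module causes the timer to increment only once per time window of multiple clock cycles, trading timing resolution for range.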
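The effect of time scaling can be sketched as a prescaler in front of the timer. This is a behavioral illustration under assumptions: the class name ScaledTimer and the parameter window_cycles are hypothetical, and the disclosure does not state how the window length is chosen.

```python
class ScaledTimer:
    """Timer that increments once per window of window_cycles clock cycles."""

    def __init__(self, window_cycles):
        self.window_cycles = window_cycles
        self.prescale = 0   # cycles elapsed in the current window
        self.value = 0      # timer value, in units of one window

    def tick(self):
        # Called once per clock cycle.
        self.prescale += 1
        if self.prescale == self.window_cycles:
            self.prescale = 0
            self.value += 1


timer = ScaledTimer(window_cycles=16)
for _ in range(1000):
    timer.tick()
```

After 1000 cycles the timer holds 62, which fits in an 8-bit timer, whereas an unscaled 8-bit timer could represent at most 255 cycles.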
When the latency probe logic receives transaction event information from initiator NIUs in more than one clock domain, the probe can be placed in the fastest of all connected clock domains. This ensures that its sampling frequency is at least as high as the frequency of the received transaction signaling, so that no transactions are missed. In one embodiment, a clock domain adapter is implemented between initiator NIUs and the probe.
In one embodiment, a timer saturates at its maximum value. In one embodiment, a bin counter can overflow. A software resettable status flag indicates overflow for each bin. When counters overflow they can set their overflow flag and saturate their count value.
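The saturating behavior with an overflow flag can be modeled as follows. This is a minimal sketch: the class name SaturatingCounter, the bit width parameter, and the method names are hypothetical.

```python
class SaturatingCounter:
    """Counter that holds its maximum value and sets a flag on overflow."""

    def __init__(self, bits):
        self.max_value = (1 << bits) - 1
        self.value = 0
        self.overflow_flag = False

    def increment(self):
        if self.value == self.max_value:
            # The count stays at its maximum and the flag records the overflow.
            self.overflow_flag = True
        else:
            self.value += 1

    def clear_overflow(self):
        # Models software resetting the status flag.
        self.overflow_flag = False


counter = SaturatingCounter(bits=2)
for _ in range(5):
    counter.increment()
value_after = counter.value
flag_after = counter.overflow_flag
counter.clear_overflow()
```

Saturating rather than wrapping means that a full bin still reads as its maximum count, and the flag tells software that the true count is at least that large.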
In one embodiment the probe comprises clock gating. Clocks can be disabled to flip-flops on transaction timers and enumerators of pending transactions when not in use. A programmable configuration register can cause the disconnection of power to the rest of the probe and another configuration register can disable the clock signal globally to the rest of the probe. These configurations allow power savings during operation, under software control, when statistics gathering is not necessary.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of one or more implementations may be combined, deleted, modified, or supplemented to form further implementations. As another example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.
This application claims priority to U.S. Provisional Application No. 61/500,078, filed Jun. 22, 2011, entitled “Latency Probe,” the entire contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
61500078 | Jun 2011 | US