The present technique relates to the field of data processing.
Data processing systems may maintain a time count which can be used for various purposes, such as scheduling tasks on a processing unit, tracking correlations between events, or managing coherent access to data for example. However, larger data processing systems may have two or more separate integrated circuits. Hence, it may be desirable for a common view of time to be maintained across different integrated circuits.
Viewed from one aspect, the present technique provides a method for correlating a first local time count of a first integrated circuit and a second local time count of a second integrated circuit, the first integrated circuit and the second integrated circuit configured to communicate via a communication network within a data processing apparatus; the method comprising:
determining a signal propagation latency associated with propagation of a latency determining signal between the first integrated circuit and the second integrated circuit on a time control signal path separate from the communication network; and
correlating the first local time count and the second local time count in dependence on the signal propagation latency and at least one of a transmission time and a reception time of a time correlating signal transmitted between the first integrated circuit and the second integrated circuit on the time control signal path.
Viewed from another aspect, the present technique provides a data processing apparatus comprising:
a first integrated circuit to maintain a first local time count;
a second integrated circuit to maintain a second local time count;
a communication network to provide communication between the first integrated circuit and the second integrated circuit; and
a time control signal path between the first integrated circuit and the second integrated circuit separate from the communication network;
wherein at least one of the first integrated circuit and the second integrated circuit comprises a time control agent to determine a signal propagation latency associated with propagation of a latency determining signal between the first integrated circuit and the second integrated circuit on the time control signal path, and to correlate the first local time count and the second local time count in dependence on the signal propagation latency and at least one of a transmission time and a reception time of a time correlating signal transmitted between the first integrated circuit and the second integrated circuit on the time control signal path.
Further aspects, features and advantages of the present technique will be apparent from the following description of examples, which is to be read in conjunction with the accompanying drawings.
A data processing system may include multiple processing units or other devices which may require a common view of time in order to function. For example, a local time count may be used for task scheduling, for measuring correlations between events for debugging purposes, or for managing coherency in accesses to data. When all of the devices requiring access to the common time count are on a single integrated circuit or system-on-chip (SoC), distributing a global view of time may be challenging but achievable, as the system design constraints are known once the integrated circuit design is complete. For example, signal propagation delays along certain paths can be characterised so that signal propagation delays associated with distributing the time count can be accounted for, enabling different components of the integrated circuit to see aligned local time counts.
However, data processing systems are increasingly becoming larger, and so a data processing apparatus may include more than one integrated circuit, with elements on different integrated circuits which may nevertheless require a common view of time. Hence, there may be a desire to correlate a first local time count on a first integrated circuit with a second local time count on a second integrated circuit. This can be difficult since the latencies associated with communication between different integrated circuits would not be precisely known at design time, since they may depend on factors such as the particular way in which the integrated circuits have been connected together on a printed circuit board (e.g. the size of solder bumps coupling ports of the integrated circuit to communication paths, the distance separating the integrated circuits on the printed circuit board, and so on). Also, communication networks provided to communicate between integrated circuits may be used for general data transmissions, and so signal propagation delays on the network may be variable depending on the current bandwidth required for the data transmissions. Also, even if identical integrated circuits are brought out of reset at the same time, local time counters on each circuit may not start in a synchronised manner due to local timing variations caused by manufacturing process variation. Hence, it is typically difficult to maintain a common view of time across different integrated circuits.
The technique described in this application provides a time control signal path which is separate from the communication network used for communication between the first and second integrated circuits. A signal propagation latency associated with propagation of a latency determining signal between the first and second integrated circuits on the time control signal path is determined. First and second local time counts on the first and second integrated circuits respectively are correlated in dependence on the signal propagation latency and at least one of a transmission time and a reception time of a time correlating signal transmitted between the first and second integrated circuits on the time control signal path.
Hence, a dedicated time measuring signal path can be provided separate from the regular communication network, so that a deterministic latency associated with a signal passing on that path can be measured. A time correlating signal transmitted on that signal path can then be used to correlate the first and second local times. This enables correlation of the first and second local time counts in a more precise manner than is currently available. This can help to support larger data processing systems with multiple processor cores spread over several integrated circuits, while still supporting features such as debugging or task scheduling which may require a common view of time across the different integrated circuits.
The correlation between the first and second local time counts may be achieved in different ways. In some cases the correlating may comprise aligning the first and second local time counts, so that one (or both) of the first and second local time counts is adjusted to map to some common time value. This approach may use hardware circuits to adjust the local time count on one of the integrated circuits. This can make it simpler for software to use the local time counts, since there is no need for further adjustment in software—the software can be unaware of the alignment being performed in hardware.
The correlating may comprise storing or outputting an offset value which represents an offset between the first and second local time counts. In some implementations, the offset can be used by hardware to update one of the local time counts to bring it in line with the other local time count. For example, a time generating circuit may be provided on at least one of the first and second integrated circuits. The time generating circuit may generate the corresponding one of the first and second local time counts in dependence on the offset value.
In other examples, the offset may simply be stored or output for access by software. Hence, even if the first and second local time counts themselves are not adjusted in hardware, by generating an offset value to characterise the difference between the first and second local time counts, software can then take this into account when using the time counts (e.g. during task scheduling or debugging). For example, the software can add or subtract the offset value from one of the first and second local time counts to align it with the other local time count.
The signal propagation latency can be determined in different ways. In one example the signal propagation latency may be determined in dependence on a round trip time of the latency determining signal and a response to the latency determining signal transmitted between the first and second integrated circuits. For example, one of the first and second integrated circuits may transmit the latency determining signal to the other, and the other integrated circuit may then respond as soon as it receives the latency determining signal. If the propagation latencies associated with the latency determining signal and the response are assumed to be approximately equal, then the signal propagation latency can be determined as half the round trip time. Alternatively, in some implementations, the receiving integrated circuit could wait for a certain constant delay period D between reception of the latency determining signal and transmission of the response, and in this case the signal propagation latency may correspond to ½*(RTT−D), where RTT is the round trip time. Using a delay period D allows the time control signal path to be implemented as a single bidirectional channel shared between transmission and reception in both directions between the first and second integrated circuits, rather than two unidirectional wires dedicated to transmission in opposite directions between the first and second integrated circuits.
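As an illustrative sketch of the round trip calculation described above, the following Python fragment computes the signal propagation latency from a recorded transmission time, the arrival time of the response and an optional delay period D. The function name and parameter names are hypothetical and are not part of the described apparatus. The sketch assumes equal latency in both directions and that all values are counted in the same units.

```python
def propagation_latency(t_sent, t_response, turnaround_delay=0):
    """Estimate one-way latency as (RTT - D) / 2.

    Assumes equal latency in each direction; names are illustrative only.
    """
    rtt = t_response - t_sent
    if rtt < turnaround_delay:
        raise ValueError("round trip time is shorter than the delay period")
    return (rtt - turnaround_delay) / 2


# Immediate response: a round trip of 12 counts gives a latency of 6 counts.
print(propagation_latency(100, 112))
# Response delayed by D = 20 counts: (32 - 20) / 2 also gives 6 counts.
print(propagation_latency(100, 132, turnaround_delay=20))
```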
Alternatively, it may not be necessary to actually transmit a signal on the time control signal path at the time of correlating the first and second time counts. Instead, the signal propagation latency may have been determined previously by transmitting a latency determining signal on the time control signal path, with the signal propagation latency stored to a storage element which is accessible to at least one of the first and second integrated circuits. Hence, at the time of correlating the first and second local time counts, the signal propagation latency may be determined by reading the signal propagation latency from a storage element.
As discussed above, the correlation of the first and second local time counts can be performed in different ways. The examples below assume that the time correlating signal is transmitted from the first integrated circuit to the second integrated circuit (although it will be appreciated that the labels “first” and “second” are arbitrary and so could be mapped either way to a pair of integrated circuits). Hence, the first integrated circuit is the integrated circuit which initiates the time correlating signal and the second integrated circuit is the one that receives it.
In one example the correlating may comprise recording a value T1 of the first local time count when the time correlating signal is transmitted at the first integrated circuit, recording a value T2 of the second local time count when the time correlating signal is received at the second integrated circuit, and determining an offset value for correlating the first local time count and the second local time count in dependence on T1, T2 and the signal propagation latency TL. Hence, the correlation is dependent on the transmission time T1 and reception time T2 of the time correlating signal as measured using the first and second local time counts respectively and the signal propagation latency TL determined previously. For example, the offset value may comprise a value corresponding to a difference between T1 and (T2−TL). Note that this difference could be calculated in different ways, e.g. as T1−(T2−TL) or as T1+TL−T2. As discussed above, the offset value could simply be output or stored for access by software which can use it to correlate the timings on the different integrated circuits, or alternatively the offset value may be used to adjust one of the first and second local time counts in hardware. For example, at least one of the integrated circuits may include a local time generating circuit which updates a previous offset being applied to the corresponding local time count based on the newly correlated offset value. While the adjustment could be applied to either the first or the second local time count, it may be desirable to ensure that no adjustment to a local time count can result in time going backwards, because running time backwards could result in a non-unique view of time (where two separate moments in time correspond to the same count value), which could be problematic for scheduling of events or debugging purposes. 
Time running backwards can be avoided by applying the adjustment using the correlated offset value to whichever of the first and second local time counts has the lower time count value. Hence, the lower of the first and second time count values is updated to move forward in time to match the other of the local time counts, or alternatively both local time counts could be updated to some future value.
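The offset calculation and the forward-only adjustment described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the offset is taken as T1+TL−T2, and it is assumed that both counts advance at the same rate.

```python
def correlation_offset(t1, t2, latency):
    """Offset of the first count relative to the second: T1 + TL - T2."""
    return t1 + latency - t2


def forward_only_adjustment(offset):
    """Return (adjust_first, adjust_second), both non-negative.

    A positive offset means the first count is ahead, so the second count
    is advanced; a negative offset means the second count is ahead, so the
    first count is advanced. Neither count is ever moved backwards.
    """
    if offset >= 0:
        return 0, offset
    return -offset, 0


# Signal sent at T1 = 1000, received at T2 = 940, with TL = 6.
offset = correlation_offset(t1=1000, t2=940, latency=6)   # 1000 + 6 - 940
adjust_first, adjust_second = forward_only_adjustment(offset)
print(offset, adjust_first, adjust_second)
```

In this example the first count reads 1006 at the moment of reception, so advancing the second count by 66 brings it from 940 to 1006.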
Another approach for correlating the first and second local time counts may be to transmit the time correlating signal from the first integrated circuit to the second integrated circuit when the first local time count reaches a predetermined count value which corresponds to a difference between a future count value and the signal propagation latency, and then to set the second local time count to the future count value in response to receipt of the time correlating signal at the second integrated circuit.
Hence, rather than measuring transmission and reception times of the time correlating signal and calculating an offset based on the difference, with the second approach the first integrated circuit determines a future instant of time which is to be set to the second local time count and transmits the time correlating signal at a transmission timing selected such that by the time the time correlating signal reaches the second integrated circuit, the first local time count will be at the future count value. Hence, when the second integrated circuit responds to receipt of the time correlating signal by setting the second local time count to the future count value, this results in correlation of the first and second local time counts. This approach can be particularly useful in cases where one integrated circuit acts as the primary integrated circuit which controls time generation across the entire system, and the other integrated circuits act as secondary integrated circuits which take their view of time from the primary integrated circuit. For example, this approach can be useful in systems where integrated circuits can be placed in a power saving state when not needed, to vary the number of active integrated circuits depending on the processing workload required. For example, the primary circuit may be always powered, but depending on current workloads may allocate tasks to other integrated circuits, with variable numbers of secondary integrated circuits powered up as required. Hence, at the time of powering up one of the secondary integrated circuits, the primary integrated circuit may push its view of time to the secondary integrated circuit to start processing on the secondary integrated circuit with a view of time aligned to the primary integrated circuit.
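The timing relationship in this second approach can be illustrated with a short sketch. The function name and the example values are hypothetical. The sketch assumes the signal propagation latency TL has already been determined and that the first local time count advances by TL counts while the signal is in flight.

```python
def push_time(first_count_now, future_count, latency):
    """Model the first circuit pushing its view of time to the second.

    Returns (send_at, first_count_on_receipt, second_count_on_receipt).
    All names and values are illustrative only.
    """
    send_at = future_count - latency          # predetermined count TF - TL
    if send_at < first_count_now:
        raise ValueError("future count is too close to be reached in time")
    first_on_receipt = send_at + latency      # first count keeps running
    second_on_receipt = future_count          # second circuit loads TF
    return send_at, first_on_receipt, second_on_receipt


# With TF = 600 and TL = 6 the signal is sent when the first count is 594.
print(push_time(first_count_now=500, future_count=600, latency=6))
```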
In some implementations the first integrated circuit may transmit an indication of the future count value to the second integrated circuit before transmitting the time correlating signal. The transmitted indication may be any information which enables the future count value to be determined by the second integrated circuit (the future count value does not have to be explicitly indicated). In some cases, there may be a risk that the future count value takes a long time to reach the second integrated circuit. For example, the time control signal path used to transmit the time correlating signal may be a relatively simple interface which is not capable of transmitting binary values with multiple bits that may be necessary to represent the future count value. Hence, the future count value may be transmitted using the regular communications network and if the current load conditions on the communications network are high then the future count value may take longer to reach the second integrated circuit. If the time correlating signal is received before the second integrated circuit has received the future count value, this could lead to incorrect setting of the second local time count. This issue can be addressed in different ways.
In one example, the second integrated circuit may acknowledge receipt of the future time count value. If the first local count value is detected by the first integrated circuit as reaching the predetermined count value before the second integrated circuit has acknowledged receipt of the future time count value, then the first integrated circuit may restart the correlating process. For example, the restarting may comprise determining another future count value and transmitting it again. The first integrated circuit may then proceed to transmit the time correlating signal on an attempt when the second integrated circuit does acknowledge receipt of the future time count value before the first local count value reaches the predetermined count value. This approach places the burden of determining whether the correlating has been successful on the first integrated circuit.
Alternatively, the responsibility could be placed on the second integrated circuit to request a restart of the correlating process if the time correlating signal is received by the second integrated circuit before the future time count value has been received by the second integrated circuit. This second approach avoids the need for an acknowledgement to be sent by the second integrated circuit, saving some bandwidth on the communications network for example.
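The acknowledgement-based restart described above, in which the first integrated circuit decides whether an attempt succeeded, might be modelled as in the sketch below. This is an assumption-laden illustration rather than a description of any particular implementation: the function name, the `margin` used to choose each future count value, and the list of per-attempt acknowledgement delays are all hypothetical.

```python
def correlate_with_ack(now, margin, latency, ack_delays):
    """First-circuit-side retry loop (illustrative only).

    now        -- current first local time count
    margin     -- how far ahead each candidate future count TF is chosen
    latency    -- previously determined signal propagation latency TL
    ack_delays -- per-attempt counts until the acknowledgement returns
                  over the variable-latency communication network
    Returns (future_count, send_time, attempts) for the successful attempt.
    """
    for attempt, ack_delay in enumerate(ack_delays, start=1):
        future_count = now + margin
        send_time = future_count - latency    # predetermined count TF - TL
        ack_time = now + ack_delay
        if ack_time < send_time:
            # Acknowledged in time: transmit the time correlating signal.
            return future_count, send_time, attempt
        # Deadline reached without an acknowledgement: restart from here.
        now = send_time
    raise RuntimeError("no attempt was acknowledged in time")


# The first attempt is acknowledged too late, the second one succeeds.
print(correlate_with_ack(now=1000, margin=50, latency=6, ack_delays=[80, 30]))
```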
Alternatively, other techniques may not require the first integrated circuit to transmit an indication of the future count value at all. For example, if the correlating is triggered by some predetermined event such as a reset operation then the future count value could simply be a count value which is a fixed number of cycles after the timing of the reset event and in this case the first and second integrated circuits may effectively be hardwired to agree on some particular value for the future count value. This approach may be sufficient in systems where the correlation is typically only performed following certain events such as a reset. However, the approach discussed above where the future count value is transmitted from the first to the second integrated circuit can be more flexible in permitting correlations at more arbitrarily defined timings.
In the approach where the correlation is performed based on controlling the transmission timing of the time correlating signal in dependence on the signal propagation latency, the correlation could be performed after a reset or power up event for the second integrated circuit, in which case the second local time count value may not yet have been initialised at the time the correlation is performed. In this case, it may be sufficient to update the second local time count to match the future time count.
However, if the correlation is performed subsequently at a time when the second integrated circuit is already counting time using its second local time count, then there is a possibility that the second local time count may already be ahead of the future time count value chosen by the first integrated circuit. As mentioned above, it can be undesirable for time to be seen to run backwards. Therefore, in some embodiments, the second integrated circuit may determine whether the future count value is less than a reception time of the time correlating signal as measured using the second local time count. When the future count value is greater than or equal to the reception time, the second integrated circuit may align its second local time count with the first local time count by setting the second local time count to the future count value. However, when the future count value is less than the reception time, the second integrated circuit may align the second local time count with the first local time count by pausing the second local time count for a number of cycles corresponding to the difference between the reception time and the future count value. In this way, by the time the second local time count is restarted after the pause, the first local time count value will have caught up with the second local time count, to align the first and second local time counts.
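The decision made by the second integrated circuit on receipt of the time correlating signal can be sketched as follows. The function name and return convention are hypothetical, and both counts are assumed to advance at the same rate.

```python
def align_second_count(reception_time, future_count):
    """Return (second_count_after_receipt, pause_cycles).

    reception_time is the second local time count when the time
    correlating signal is received; future_count is TF.
    """
    if future_count >= reception_time:
        # Jump forward (or stay put) so the second count equals TF.
        return future_count, 0
    # The second count is ahead: hold it until the first count catches up.
    return reception_time, reception_time - future_count


print(align_second_count(reception_time=590, future_count=600))
print(align_second_count(reception_time=612, future_count=600))
```

In the second call the second count is held at 612 for 12 cycles, during which the first count advances from 600 to 612, so that neither count ever moves backwards.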
Regardless of which of the options discussed above is used for correlating the first and second local time counts, the technique can be applied to any data processing system having multiple integrated circuits which maintain separate local time counts. However, it can be particularly useful in a case where at least one of the first integrated circuit and the second integrated circuit comprises coherency managing circuitry to manage coherency between data accessed by the first integrated circuit and data accessed by the second integrated circuit. Some coherency protocols may rely on the local time stamps providing a common view of time. To allow such coherency schemes to span the first and second integrated circuits, this may require correlation of the local time stamps maintained by each integrated circuit.
Also, the technique discussed above can be particularly useful where at least one of the first and second integrated circuits includes debug circuitry for capturing or outputting information tracking timings of events occurring in the first integrated circuit or the second integrated circuit (in particular if both integrated circuits have their own debug circuitry). Without correlating the first and second local time stamps it may be difficult for a debugger to determine whether an event occurring in the first integrated circuit occurred before or after an event occurring in the second integrated circuit. By enabling the first and second local time stamps to be correlated (and optionally aligned) this can simplify debugging of systems spanning multiple integrated circuits.
The above examples discuss a first integrated circuit and a second integrated circuit, but the technique can also be applied to systems including three or more integrated circuits. In this case, the first and second integrated circuits discussed above may be any two of the three or more integrated circuits. The three or more integrated circuits may be connected by respective time control signal paths in a topology having the property that any two of the integrated circuits are connected either directly by one of the time control signal paths or indirectly by time control signal paths passing via at least one other integrated circuit. Hence, it is not necessary to connect every single pair of integrated circuits via a dedicated time control signal path. Instead, it is enough that it is possible to align or correlate the time stamps on any two integrated circuits directly or indirectly via correlations with other circuits. The three or more integrated circuits could be connected in a number of different topologies, such as a ring topology, line topology, tree topology or star topology, for example.
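One way to picture indirect correlation across three or more integrated circuits is to chain pairwise offsets along the time control signal paths. The sketch below is purely illustrative: the function name, the dictionary representation of the links and the breadth-first traversal are assumptions and are not taken from the described apparatus.

```python
from collections import deque


def offsets_from_root(links, root):
    """Derive each circuit's offset relative to `root` by chaining links.

    links maps (a, b) to the measured offset count_b - count_a for a pair
    of circuits joined directly by a time control signal path.
    """
    neighbours = {}
    for (a, b), off in links.items():
        neighbours.setdefault(a, []).append((b, off))
        neighbours.setdefault(b, []).append((a, -off))

    offsets = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for other, off in neighbours.get(node, []):
            if other not in offsets:
                offsets[other] = offsets[node] + off
                queue.append(other)
    return offsets


# Line topology A - B - C: A and C are only indirectly connected.
links = {("A", "B"): 15, ("B", "C"): -4}
print(offsets_from_root(links, "A"))
```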
The time control signal path may comprise a bidirectional communication path with a deterministic propagation delay. Unlike a bus network where a requesting master may need to request access to the bus and the time taken to gain access to the bus may depend on what other data transmissions are currently pending or on relative priority between different masters asserting signals on the bus, with the time control signal path the signal propagation delay may be deterministic so that the time between requesting a transmission and the other integrated circuit receiving the transmission may be relatively repeatable between transmissions.
In some examples, the bidirectional communication path may comprise a pair of wired signal paths which pass along corresponding routes between the first and second integrated circuits. Hence, one of the signal paths may be used for signals transmitted by the first integrated circuit to the second integrated circuit and the other signal path may be used for signals transmitted from the second integrated circuit to the first integrated circuit. This can simplify characterisation of the signal propagation delay since it is relatively straightforward to provide hardware on one of the first and second integrated circuits for mirroring a received signal to generate a transmitted signal on the other of the pair of wired signal paths.
However, it is also possible for the time control signal path to comprise a single bidirectional wire so that at any given time either the first integrated circuit may be transmitting signals to the second integrated circuit or the second integrated circuit may be transmitting signals to the first integrated circuit, but not both at the same time. By providing for a predetermined delay period between transmission and reception, a protocol can be used to measure the signal propagation delay on the time control signal path even if it is not possible for both integrated circuits to be simultaneously transmitting and receiving. An example is shown in
The time control signal path could also comprise a wireless signal path. For example, integrated circuits may communicate via transmission of radio signals within the data processing system or by transmission of pulses of light. The communication network used for high level data communication between the integrated circuits could also use wireless communication.
Another approach may be that the respective integrated circuits correspond to different layers within a three dimensional integrated circuit. The time control signal path could include through vias which pass vertically between different layers of the three dimensional structure through the substrate of at least one layer.
The first and second integrated circuits may comprise any separate integrated circuits assembled into a data processing system. For example, each integrated circuit may correspond to a set of components formed on a single piece of silicon or other semiconductor. Each integrated circuit may be such that the propagation delays associated with a given integrated circuit may be characterised at design time (subject to natural variation caused by manufacturing process variation), while the signal propagation delays between different integrated circuits cannot be characterised at design time. The respective integrated circuits may be mounted on a common printed circuit board and connected to the circuit board via solder bumps, conductors etc.
The first SoC 4 has a data communications interface 20 for communicating with the second SoC 5 via a communications network 22. Also, a coherency interface 24 is provided for communicating with a corresponding coherency interface of the second SoC 5 via a coherency signal path 26, so that a coherent view of data accessed by both SoCs 4, 5 can be maintained. For example, snoop transactions and snoop responses could be transmitted over the coherency signal path 26. While this example has separate communication paths for data communications and coherency messages, other examples could share the same path for both data and coherency communications, so that the data communications interface 20 and the coherency interface 24 could be a single component. The first SoC 4 includes a local time generator 30(4) for generating a first local time count which provides a view of time which is common across the first SoC 4. Time stamps generated by the local time generator 30(4) can be routed to various components of the first SoC 4, such as the coherency circuitry 10, the CPUs 6, and trace unit 32 (debug module) for monitoring events occurring within the first SoC 4 and generating a stream of trace packets output to an external debugger via a debug port 34. For example, the trace packets may be timestamped with a value of the first local time count generated by the local time generator 30(4), and may provide information on events occurring within the SoC 4, such as the addresses of executed instructions, addresses of data accesses, occurrence of exceptions, faults, cache misses, branch mispredictions or other events. While not shown in
The second SoC 5 includes the same components as the first SoC 4 in this example, so these are not described again for conciseness. However, it will be appreciated that this is not essential, and in many cases the components of the two SoCs 4, 5 may differ. In general, the provision of multiple SoCs within the same data processing system 2 enables larger, more powerful processing systems to be provided with a greater number of processor cores 6 whilst still enabling coherency to be maintained across the different SoCs, so that the system as a whole behaves as a single larger processor cluster even though it is distributed across multiple integrated circuits 4, 5.
However, when multiple integrated circuits 4, 5 need to agree on a common view of time, for example to correlate the events tracked by the trace units 32 on the respective SoCs 4, 5 or to schedule tasks across the processors 6 in the two SoCs 4, 5, this can create challenges in managing a common time count. While the first and second local time counts generated by each local time generator 30(4), 30(5) may be distributed within a SoC to provide a common view of time across the SoC, because the design constraints associated with each SoC may be known at design time, the communication delays between the SoCs on the data communications network 22 cannot be determined at design time, because they depend on current bus utilisation as well as on the particular way in which the integrated circuits have been connected together. Hence, the techniques used within a SoC to align time counts seen by different components cannot be applied to components in different SoCs.
A time control agent 40 is provided on each of the respective SoCs 4, 5. The time control agents 40(4), 40(5) communicate with each other via a time control signal path 42. Although not shown in
The first time control agent 40(4) transmits the latency determining signal 60 on the time control signal path 42 and records the transmission time TA using its first local time count 30(4). The first time control agent 40(4) waits for the predetermined delay time TD, and switches to receiving mode once that delay period has expired to start listening for the response 62 sent by the second time control agent 40(5).
On the other hand, the second time control agent 40(5) is initially in the receiving mode, and when the latency determining signal 60 is detected by the second time control agent it starts to count time until the predetermined delay period TD has expired. Although the time counts of the first and second SoCs 4, 5 may not be synchronised, they may still count time at the same rate (for example they may be controlled by a common clock signal). When the predetermined delay TD expires, the second time control agent 40(5) switches to driving mode, and transmits the response 62 to the latency determining signal 60 to the first time control agent 40(4). On receipt of the response 62 the first time control agent 40(4) records the arrival time TB of the response 62 using the first local time count generated by local time generator 30(4). The first time control agent 40(4) can then calculate the signal propagation delay TL according to (TB−TA−TD)/2.
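The exchange described above can be modelled in a few lines to show why the two local time counts need not be synchronised for the measurement to work: TA and TB are both read from the first local time count, and the second time control agent only measures the interval TD. The function name, the `skew` parameter and the example values are hypothetical, and the sketch assumes equal latency in both directions and equal count rates.

```python
def measure_latency_half_duplex(ta, skew, true_latency, turnaround):
    """Simulate the exchange and return the calculated TL.

    skew is (second count - first count). It is unknown to both agents
    and appears here only to show that it cancels out of the result.
    """
    # Second agent detects the signal and times TD on its own count.
    rx_second = ta + true_latency + skew
    tx_second = rx_second + turnaround
    # Response arrival, expressed on the first local time count.
    tb = (tx_second - skew) + true_latency
    return (tb - ta - turnaround) / 2


print(measure_latency_half_duplex(200, skew=0, true_latency=7, turnaround=16))
print(measure_latency_half_duplex(200, skew=12345, true_latency=7, turnaround=16))
```

Both calls recover the same latency of 7 counts regardless of the skew between the two counts.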
In some cases, the calculation of the signal propagation delay TL in
Alternatively, at step 50 of
Also, as shown in
Note that, when updating the offset register 86 following a correlation event, the update depends on whether the time used to compute the new offset already included the contents of that offset register. If the new offset between T1 and T2 is computed from a time to which the offset register to be updated has been applied, then the offset register is updated by adding or subtracting the newly computed offset to/from the current offset register contents. If the new offset between T1 and T2 is computed from a time to which the offset register to be updated has not been applied, then the new offset register value is simply the computed offset. Also, while the adjustment could be applied to either of the first and second local time counts, it may be preferable to adjust whichever of the local time counts has the lower current time value, moving it forward to match the time count on the other SoC. This prevents time being seen to run backwards. Hence, if the first local time count is ahead of the second local time count, then the second local time count is corrected by updating the offset value 86 in the second SoC's local time generator 30(5). If the first local time count is lagging behind the second local time count, then the offset value 86 in the first SoC's local time generator 30(4) would be updated instead. Either way, the local time counts on both SoCs can be aligned.
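The two update rules for the offset register, and the choice of which local time count to move forward, can be sketched as follows. This is an illustration only: it assumes T1 and T2 are readings of the two local time counts that refer to the same instant, and the function names, the `"first"`/`"second"` labels and the example values are hypothetical.

```python
def new_offset_register(current_offset: int, computed_offset: int,
                        offset_was_applied: bool) -> int:
    """Return the value to write to the offset register after a correlation event."""
    if offset_was_applied:
        # The compared time already included the register contents,
        # so the computed offset is a correction on top of them.
        return current_offset + computed_offset
    # The compared time was the raw count, so the computed offset replaces
    # the register contents outright.
    return computed_offset


def select_count_to_adjust(t1: int, t2: int):
    """Pick the lagging count and the forward step needed to catch up."""
    if t1 > t2:
        return "second", t1 - t2
    if t2 > t1:
        return "first", t2 - t1
    return None, 0


# Hypothetical values: the first SoC's raw count is 1000 with offset 50,
# so its visible time is 1050; the second SoC's visible time is 1080.
which, step = select_count_to_adjust(t1=1050, t2=1080)
print(which, step)                             # prints: first 30
print(new_offset_register(50, step, True))     # prints 80
print(new_offset_register(50, 1080 - 1000, False))  # prints 80
```

In this example both rules arrive at the same register value, which is a useful consistency check: they differ only in whether the comparison was made against the adjusted time or the raw count.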
When the first local time count reaches a value TF−TL, which corresponds to the difference between the future time count TF and the previously determined signal propagation latency TL, a time correlating signal 104 is transmitted on the time control signal path 42 from the first time control agent 40(4) to the second time control agent 40(5). Due to the timing at which the time correlating signal 104 is transmitted, it is expected that at the time when the time correlating signal 104 is received by the second time control agent 40(5), the first local time count will be equal to the predetermined future time TF. On receiving the time correlating signal 104, the second time control agent 40(5) checks whether the current value T2 of the second local time count is greater than the future time TF. If T2≤TF, then the second time control agent 40(5) sets the second local time count equal to TF, and this is expected to align the first and second local time counts to one another. If T2>TF, i.e. the second local time count is already ahead of the predetermined future time, then setting the second local time count to TF would cause time to be seen to run backwards, which would be undesirable. Hence, if T2>TF, then instead the second local time generator 30(5) is paused for T2−TF cycles and then restarted, at which point the first local time count will have reached T2, and so the first and second local time counts are now aligned. In embodiments where the correlation is only performed at a time of resetting or powering up the second integrated circuit, it may be assumed that T2 cannot be greater than TF, as the second time generator may not have started counting yet. In this case it may not be necessary to check whether T2 is greater than TF, and instead the second local time count can simply be set equal to TF on receipt of the time correlating signal (in the same way as the T2≤TF case described above).
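The decision made by the second time control agent on receipt of the time correlating signal can be sketched as below. The sketch assumes all quantities are integer cycles of clocks running at the same rate; the function names and the returned pair (value to hold in the second local time count, number of cycles to pause) are illustrative assumptions.

```python
from typing import Tuple


def transmit_time(tf: int, tl: int) -> int:
    """First-count value at which the time correlating signal is sent."""
    return tf - tl


def on_time_correlating_signal(t2: int, tf: int) -> Tuple[int, int]:
    """Return (second local time count value, cycles to pause) on receipt."""
    if t2 <= tf:
        # Second count is at or behind TF: jump forward to TF, no pause.
        return tf, 0
    # Second count is ahead: hold it at T2 for T2 - TF cycles so that the
    # first count catches up and time is never seen to run backwards.
    return t2, t2 - tf


tf, tl = 1000, 7
print(transmit_time(tf, tl))                 # prints 993
print(on_time_correlating_signal(950, tf))   # prints (1000, 0)

held, pause = on_time_correlating_signal(1012, tf)
first_count_after_pause = tf + pause         # first count keeps running
print(held, first_count_after_pause)         # prints: 1012 1012
```

In the second case the first local time count is TF when the signal arrives and advances by one per cycle during the pause, so both counts read T2 when the second local time generator restarts.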
With the approach shown in
The transmission of the future time 100 as shown in
Hence, the techniques discussed above provide a way of characterising chip-to-chip time stamp skew and correlating or aligning the different local time counts across the system, without needing to exchange time stamps between the chips explicitly over a high-level communications network, which would be problematic because the chip-to-chip data link may have high latency and large variations in that latency. The techniques also scale better to systems with multiple SoCs on the same circuit board. As shown in
As shown in
Alternatively, as shown in
In the present application, the words “configured to . . . ” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.
Number | Date | Country | |
---|---|---|---|
20180309565 A1 | Oct 2018 | US |