The present invention relates to data processing. More particularly, the present invention relates to an apparatus, a system, a chip-containing product, a method, and a medium.
Some data processing apparatuses are able to perform processing operations in a plurality of processing contexts. In such apparatuses, synchronisation of memory subsequent to one or more maintenance operations can be costly in terms of both time and resources.
In a first example configuration there is provided an apparatus comprising one or more processing elements, each processing element of the one or more processing elements comprising:
In a second example configuration there is provided a system comprising:
In a third example configuration there is provided a chip-containing product comprising the system of the second example configuration assembled on a further board with at least one other product component.
In a fourth example configuration there is provided a method of operating an apparatus comprising one or more processing elements, the method comprising with the one or more processing elements:
In a further example configuration there is provided a non-transitory computer readable storage medium to store computer-readable code for fabrication of an apparatus comprising one or more processing elements, each processing element of the one or more processing elements comprising:
The present invention will be described further, by way of example only, with reference to configurations thereof as illustrated in the accompanying drawings, in which:
Before discussing the configurations with reference to the accompanying figures, the following description of configurations is provided.
In some configurations there is provided an apparatus comprising one or more processing elements, each processing element of the one or more processing elements comprising processing circuitry configured to perform processing operations. The processing operations are carried out in one of a plurality of processing contexts. Each of the one or more processing elements also comprises context tracking circuitry configured to store context tracking data indicative of active contexts of the plurality of processing contexts in which the processing operations have been carried out by the processing circuitry. Each of the one or more processing elements also comprises control circuitry responsive to a request for a memory synchronisation occurring subsequent to at least one maintenance operation, the at least one maintenance operation associated with a given set of one or more contexts of the plurality of processing contexts, to determine whether at least one of the given set of one or more contexts is indicated in the context tracking data. The control circuitry is configured, in response to the determination: when the at least one of the given set of one or more contexts is determined to be indicated in the context tracking data, to implement a delay before performing the memory synchronisation, the delay continuing until one or more pending memory updates have been performed; and when each of the given set of one or more contexts is determined to be absent from the context tracking data, to perform the memory synchronisation without implementing the delay.
In the current disclosure, a “processing context” should be understood as an operating environment in which the processing element can operate, according to which the components of the processing element are provided with a self-consistent view of not only the components of the processing element itself, but of the whole of the apparatus in which the processing element is found, for example, including one or more further components such as a memory system to which the data processing apparatus is connected. The view of the processing system is complete from the point of view of the processing context. In other words, the processing context has all the information that is required by the processing circuitry for processing operations to be performed in that processing context. However, the processing context may not include information that is not required for processing operations to be performed in that processing context. For example, the memory system with which the data processing apparatus interacts may in fact contain a wider range of address locations than the processing circuitry of the data processing apparatus is able to see when operating in a particular context, yet the processing circuitry, when operating in that particular context, has no awareness that other inaccessible memory locations in the memory system exist. Each of the plurality of processing contexts may correspond, for example, to a process that is being carried out by the apparatus. In addition to each of the plurality of processing elements being configured to perform processing in one of the plurality of contexts, each of the plurality of contexts may be processed by one or more of the plurality of processing elements either in parallel or sequentially. 
For example, a given processing context may initially be processed on a first processing element of the plurality of processing elements but, subsequent to one or more context switching operations, the processing context may be processed on a second processing element (different to the first processing element) of the plurality of processing elements.
Because a single processing context may be processed by a plurality of processing elements and a single processing element may process a plurality of processing contexts, data associated with one or more processing contexts may be retained by the processing elements once the processing element has finished processing the context with which that data was associated. During processing, a given processing context may issue one or more maintenance operations relating to the data that is associated with the set of one or more contexts. For example, in some configurations a context operating at a higher exception level may issue a maintenance operation relating to one or more contexts operating at a lower exception level. These maintenance operations may require one or more memory updates to be completed by the processing elements where those processing elements have outstanding memory accesses relating to the given set of one or more contexts. In some use cases, the maintenance operations may be followed by a request for a memory synchronisation requiring that all memory updates have been completed before the synchronisation can be completed. The inventor has realised that, for example dependent on how the outstanding memory accesses are tracked, it may not always be possible to quickly determine from the tracking itself, which memory accesses on the processing element (if any) are related to the given set of one or more contexts. Therefore, it is possible that a memory synchronisation received by a processing element could be delayed unnecessarily, for example, if the memory synchronisation follows a maintenance operation specifying a given set of one or more contexts where none of the given set of one or more contexts have been processed by that processing element since the last memory synchronisation. This could result in unnecessary latency and power consumption. 
The processing element is therefore provided with context tracking circuitry that stores context tracking data. The context tracking data indicates a subset of the plurality of processing contexts that are treated as “active” contexts. An active context is one that has been processed by the processing element since the last synchronisation operation. The control circuitry is configured, in response to receipt of the synchronisation operation subsequent to one or more maintenance operations associated with a given set of one or more contexts, to determine whether any of (at least one of) the given set of one or more contexts is indicated in the context tracking data and, if so, to delay the memory synchronisation until one or more pending memory updates have been performed. When it is determined that none of the given set of one or more contexts are indicated in the context tracking data, the delay can be omitted and the memory synchronisation can be dealt with without delay resulting in reduced latency. In some configurations the control circuitry is configured, when implementing the delay, to delay the memory synchronisation until one or more pending memory updates associated with the at least one maintenance operation have been performed. In some configurations the delay is implemented until all pending memory updates associated with the at least one maintenance operation have been performed. In other configurations, the delay is implemented until all pending memory updates have been performed, independent as to whether those memory updates are associated with the at least one maintenance operation or not.
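The decision described above can be illustrated with a minimal sketch. It assumes the context tracking data is held as a plain set of context identifiers and models pending memory updates as a simple list; the class and method names are hypothetical and are not taken from the described apparatus. For brevity the sketch does not model the removal of contexts from the tracking data after a synchronisation.

```python
class ProcessingElement:
    """Toy model of one processing element (all names are illustrative)."""

    def __init__(self):
        # Context tracking data: identifiers of contexts treated as active.
        self.active_contexts = set()
        # Stand-in for outstanding memory updates on this element.
        self.pending_updates = []

    def run_in_context(self, context_id):
        # Record that processing operations were carried out in this context.
        self.active_contexts.add(context_id)

    def handle_sync(self, maintained_contexts):
        """Handle a synchronisation request that follows maintenance
        operations associated with `maintained_contexts`.

        Returns True if the delay was implemented, False otherwise.
        """
        delay_needed = any(c in self.active_contexts for c in maintained_contexts)
        if delay_needed:
            # Model the delay: pending updates are performed before the sync.
            self.pending_updates.clear()
        return delay_needed
```

In this sketch, a synchronisation that follows maintenance on contexts never seen by the element returns immediately and leaves the pending updates untouched, which corresponds to performing the memory synchronisation without implementing the delay.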
The delay is associated with the time for the pending memory accesses to be performed, and may also be associated with the time taken to determine whether or not there are any pending memory updates. Where a result of the determination is that there are no such pending memory updates, the delay is merely the time taken to determine that there are no such pending memory updates. For example, in some use cases one or more maintenance operations associated with one or more contexts may be received by a processing element which are associated with pending memory accesses. However, if there is a sufficient delay between the maintenance operations and the request for memory synchronisation, each of the one or more memory accesses associated with the maintenance operation may already have been completed by the time at which the request for memory synchronisation is received and, as a result, there is no delay associated with the updates being performed.
The different circuitry blocks that are comprised in each of the one or more processing elements (e.g., processing circuitry, control circuitry, and context tracking circuitry) may be provided as discrete circuitry blocks, each configured to provide the functions defined in association with that circuitry block. Alternatively, the circuitry blocks of each of the processing elements may each be provided as part of one or more discrete circuitry blocks that together perform the functions described in relation to the circuitry blocks.
In some configurations each processing element comprises a translation lookaside buffer (TLB) configured to store translations between a first address space associated with one of the plurality of processing contexts and a second address space, and the at least one maintenance operation comprises a translation lookaside buffer maintenance operation. In some configurations the first address may be a virtual address and the second address may be a physical address. In some configurations at least one of the first address and the second address may be an intermediate physical address with the TLB storing translations between a virtual address and an intermediate physical address and between an intermediate physical address and a physical address. The TLB maintenance operation may, for example, be a request requiring invalidation of one or more entries of a TLB that are associated with the given set of one or more contexts. In such configurations, any outstanding read/write requests may need to be completed before the TLB maintenance operation. Because the read/write requests are typically tagged by a second address (e.g., a physical address or an intermediate physical address), it may be difficult to determine whether a given read/write transaction is associated with the given set of one or more contexts, which may be identified using a first address (e.g., an intermediate physical address or a virtual address).
The format of the context tracking data can be variously defined. In some configurations the context tracking data comprises a set of context identifiers. For example, the context tracking data could be provided by a region of storage space capable of storing a predetermined number of context identifiers with a new context identifier being added to the storage space each time a new one of the plurality of contexts is recognised by the control circuitry.
In some configurations the context tracking data is stored as a filtered data set generated by applying a filter to context identifiers associated with each of the active contexts; and the control circuitry is configured to determine whether at least one of the given set of one or more contexts is included in the filtered data set by applying the filter to the given set of one or more contexts to generate a corresponding filtered context identifier, and comparing the corresponding filtered context identifier to the context tracking data. In some configurations, rather than storing the entire context identifier, the filter may be arranged to select a subset of bits of the context identifiers to be stored. For example, if the context identifier is represented as an N-bit value, the filter may filter the context identifiers to generate N-M bits (N minus M bits) that can be stored as context tracking data so that each stored value corresponds to 2^M possible context identifiers. Whilst this could potentially result in false positives when the determination is made as to whether a given context is identified in the context tracking data, this approach would ensure that there are no false negatives and would reduce the amount of storage space required for the context identifiers. Furthermore, this approach would avoid placing an upper bound on the number of contexts that could be tracked.
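As an illustration of the bit-selection filter, the following sketch assumes a 16-bit context identifier (N = 16) from which the 4 least significant bits are discarded (M = 4). The widths, the choice of which bits to drop, and the function names are assumptions made for the example only.

```python
N_BITS = 16  # assumed width of a context identifier
M_BITS = 4   # assumed number of low-order bits discarded by the filter

tracked = set()  # filtered context tracking data (N - M bits per entry)


def filter_context_id(context_id):
    """Keep the upper N - M bits of an N-bit context identifier."""
    return (context_id & ((1 << N_BITS) - 1)) >> M_BITS


def track(context_id):
    # Store only the filtered identifier.
    tracked.add(filter_context_id(context_id))


def may_be_active(context_id):
    # False positives are possible, false negatives are not.
    return filter_context_id(context_id) in tracked
```

Every identifier that shares the same upper 12 bits maps to the same stored value, so one entry stands for 2 to the power M (here 16) identifiers. A lookup for an untracked identifier in the same group reports a match, whereas a tracked identifier is never missed.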
In some configurations the context tracking data comprises a single bit vector generated as a logical OR of each filtered context identifier associated with the active contexts. Therefore, rather than storing a list of context identifiers, the single bit vector may be provided indicative of a plurality of context identifiers. In some configurations, the single bit vector is generated by converting the context data into a one-hot representation and generating the single bit vector using the logical OR function. Subsequently, when determining if a given context is comprised in the context tracking data, it can simply be determined if the one-hot representation of the context identifier is included in the single bit vector. This approach provides a particularly compact solution for storing the context tracking data.
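A minimal sketch of the single bit vector approach follows. It assumes a 32-bit vector and maps a filtered identifier to a bit position by taking it modulo the vector width; both choices, and all names, are assumptions of the example rather than details of the described configurations.

```python
VECTOR_BITS = 32  # assumed width of the single bit vector


def one_hot(filtered_id):
    """One-hot representation: exactly one bit set per filtered identifier."""
    return 1 << (filtered_id % VECTOR_BITS)


class BitVectorTracker:
    def __init__(self):
        self.vector = 0  # context tracking data

    def add(self, filtered_id):
        # Logical OR of the one-hot value into the single bit vector.
        self.vector |= one_hot(filtered_id)

    def may_contain(self, filtered_id):
        # The context is possibly active if its one-hot bit is set.
        return bool(self.vector & one_hot(filtered_id))
```

Adding identifiers 3 and 5 sets bits 3 and 5 only, so a query for identifier 4 reports no match. An identifier that maps to an already set bit (for example 35 under the assumed modulo mapping) is reported as possibly active, which is the false positive behaviour expected of a filtered data set.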
In some configurations the filter circuitry implements a Bloom filter. A Bloom filter is a probabilistic data structure that is based on hashing. Typically, a Bloom filter adds elements to a set and can then perform a test to determine if an element is in the set. The elements themselves are not added to the set. Instead, a hash of the elements is added to the set. When testing if an element is in the Bloom filter, false positives are possible but false negatives are not. For example, a Bloom filter will determine that an element is definitely not in the set or that it is possible the element is in the set. The Bloom filter is a particularly space-efficient approach to implementing a filter.
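For illustration, a small software Bloom filter might look as follows. The vector size, the number of hash functions, and the use of SHA-256 with a seed prefix are assumptions chosen for a runnable example; a hardware implementation would be expected to use much simpler hash functions.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=64, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored as an integer

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        # Only hash-derived bits are stored, never the item itself.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # All bits set: possibly present. Any bit clear: definitely absent.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

An item that has been added always tests as possibly present, because the same bit positions are recomputed on lookup. An item that was never added may still test as possibly present if other items happen to have set all of its bits.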
Whilst the plurality of processing contexts may, in general, each relate to a different one of a plurality of processes, in some configurations the apparatus is configured to provide a virtualized operating environment supporting a plurality of virtual machines each associated with a virtual machine identifier; and each virtual machine of the virtual machines corresponds to one or more of the plurality of processing contexts identified by the virtual machine identifier associated with that virtual machine. Accordingly, the virtualized operating environment provides one manner in which the processing element can operate (e.g., execute data processing instructions) in more than one context. A given virtual machine (typically comprising a particular guest operating system and set of applications which run on that guest operating system) interacts with the hardware of the processing element (e.g., in particular in the present context the processing circuitry and memory system interaction circuitry) when operation of that virtual machine is the present context of operation for the processing element. The virtual machine identifier is a unique identifier associated with that virtual machine and that can be used to identify a plurality of processing contexts that correspond to that virtual machine. In some configurations each context identifier may comprise a virtual machine identifier and an address space identifier (ASID) uniquely specifying the context associated with the virtual machine identifier. In such configurations the context tracking data may be associated only with the virtual machine identifier so that all contexts having that virtual machine identifier are indicated in the context tracking data.
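Tracking by virtual machine identifier alone can be sketched as below. The `ContextId` tuple and the function names are hypothetical; the sketch only shows that dropping the ASID causes every context of a tracked virtual machine to be treated as indicated.

```python
from typing import NamedTuple


class ContextId(NamedTuple):
    vmid: int  # virtual machine identifier
    asid: int  # address space identifier within that virtual machine


active_vmids = set()  # context tracking data keyed on the VMID only


def track(ctx):
    # The ASID is deliberately omitted from the tracking data.
    active_vmids.add(ctx.vmid)


def is_indicated(ctx):
    # Any context sharing a tracked VMID is treated as indicated.
    return ctx.vmid in active_vmids
```

After tracking the context with VMID 3 and ASID 10, a query for VMID 3 with any other ASID also matches, while a query for VMID 4 does not.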
In some configurations at least one of the virtual machines is configured to implement a distributed virtual memory and the memory synchronisation is a distributed virtual memory synchronisation. Distributed virtual memory improves parallelisation by maintaining local copies of data items. For example, a virtual machine implementation may run one or more processes or applications across multiple processing elements. For example, an application may be multi-threaded with each thread running in parallel on different processing elements, or sequentially with different threads run one after another on the same processing element. In such a situation, the operating system of the virtual machine has to manage memory, for example, by assigning and reclaiming pages and may issue maintenance operations in respect of those applications. The distributed virtual memory synchronisation (DVM sync) is an operation that allows the operating system of the virtual machine to ensure that the application is being executed properly whilst its pages are being remapped.
In some configurations the context tracking data comprises, for each active context of the active contexts, information indicative of at least one of: a virtual machine identifier associated with the active context; a security state associated with the active context; and an exception level associated with the active context. In some configurations there may be plural contexts associated with each virtual machine identifier. For example, the context may be identified through a combination of a virtual machine identifier and an ASID with the context tracking data indicating the virtual machine identifier and omitting the ASID. As a result, the context tracking data identifies all active contexts associated with a given VMID. In some configurations the context tracking data may be split into virtual machine tracking context data and other context data comprising information indicative of the security state and/or the exception level associated with the active context. In some configurations the control circuitry may be configured to omit the check of the virtual machine tracking context data dependent on the other context data. For example, if the other context data indicates a security state and/or exception level that is not associated with virtual machines that were active on the processing element then it may not be necessary to check the virtual machine tracking context data.
In some configurations the control circuitry is responsive to the context tracking data meeting a predetermined condition to perform a tracking data reset procedure comprising clearing the context tracking data. For configurations in which a more compact form of the context tracking data has been used, the context tracking data may saturate, for example, after multiple different processing contexts have been processed on any one processing element. As the context tracking data gets closer to saturating, the likelihood that, for a maintenance operation specifying any given context, the determination as to whether the given context is included in the context tracking data will return a false positive increases. The inventors have realised that the performance lost as the number of false positives increases may be greater than the performance cost associated with performing the tracking data reset procedure to reset the context tracking data. Hence, overall performance is increased through the provision of the reset procedure.
The predetermined condition may be variously defined. In some configurations the context tracking circuitry comprises storage space for a first number of context identifiers and the predetermined condition is met when storage of a new context identifier in the context tracking circuitry would exceed the storage space. In some configurations the context tracking data is stored as a filtered data set; the control circuitry is configured to calculate a likelihood that the determination could result in a false positive; and the predetermined condition is met when the likelihood exceeds a predetermined threshold. For example, the predetermined threshold may be when the likelihood of a false positive exceeds 70%, 80%, etc.
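The two example conditions can be sketched as simple predicates. The likelihood estimate below assumes the one-hot bit vector form of tracking data and uniformly distributed identifiers, in which case the chance of a false positive is approximated by the fraction of bits that are set; this estimator, the default threshold of 80%, and the function names are assumptions of the example.

```python
def would_exceed_storage(stored_ids, new_id, capacity):
    """Capacity condition: storing a new identifier would exceed the space."""
    return new_id not in stored_ids and len(stored_ids) >= capacity


def false_positive_likelihood(vector, vector_bits):
    """Assumed estimator: fraction of set bits in a one-hot OR bit vector."""
    return bin(vector).count("1") / vector_bits


def likelihood_condition_met(vector, vector_bits, threshold=0.8):
    """Likelihood condition: estimated false positive rate exceeds threshold."""
    return false_positive_likelihood(vector, vector_bits) > threshold
```

With an 8-bit vector, four set bits give an estimate of 0.5 and do not trigger a reset at the assumed threshold, whereas seven set bits give 0.875 and do.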
The reset procedure may handle synchronisation operations subsequent to the reset in a variety of ways. In some configurations the tracking data reset procedure comprises setting a saturation flag to indicate that the context tracking data has been cleared; and the control circuitry is responsive to the at least one maintenance operation, when the saturation flag indicates that the context tracking data has been cleared, to convert the at least one maintenance operation associated with the given set of one or more contexts to at least one maintenance operation associated with each of the plurality of processing contexts and to clear the saturation flag. The time at which the saturation flag is set can be dependent on the implementation. The saturation flag may be set at the point at which the tracking data is cleared or may be set immediately thereafter. The saturation flag may be a dedicated flag or encoded into a plurality of flag bits incorporated as part of a general flag register. The inventors have realised that any performance penalty associated with converting the maintenance operation to one associated with each of the plurality of processing contexts can be outweighed by the benefits of being able to quickly determine whether a synchronisation operation can be performed without implementing the delay. By implementing the reset procedure in this way, the amount of context tracking data that is required to be stored can be reduced providing an implementation with low circuitry overhead.
In some configurations the control circuitry is responsive to receipt of the at least one maintenance operation subsequent to an earlier memory synchronisation request and when the at least one of the given set of one or more contexts is indicated in the context tracking data, to set a maintenance flag; the control circuitry is configured, when responding to the request for the memory synchronisation, to determine whether the at least one of the given set of one or more contexts is indicated in the context tracking data based on a value of the maintenance flag; and the control circuitry is responsive to completion of the memory synchronisation, to clear the maintenance flag. If the maintenance operation(s) correspond to those included in the context tracking data, the control circuitry sets the maintenance flag to indicate that a maintenance operation has been received corresponding to a context that has been active on that processing element since the last memory synchronisation operation. Once the memory synchronisation operation is complete, the maintenance flag is reset (cleared). Because the maintenance flag is only set when there is a match, a clear maintenance flag indicates that no maintenance operations matching the context tracking data have been received since that memory synchronisation operation. The maintenance flag may be provided as a dedicated flag or may be encoded in a plurality of flag bits stored in a general flag register.
In some configurations the control circuitry is responsive to receipt of the request for the at least one maintenance operation, when the saturation flag indicates that the context tracking data has been cleared, to set the maintenance flag independent of whether any of the given set of one or more contexts are indicated in the context tracking data. When the saturation flag indicates that the context tracking data has been cleared (e.g., the saturation flag is set), the control circuitry may be configured to convert the at least one maintenance operation associated with the given set of one or more contexts to at least one maintenance operation associated with each of the plurality of processing contexts and to clear the saturation flag. In this way, the control circuitry is able to ensure that pending memory updates associated with any of the plurality of contexts have been performed before the memory synchronisation.
Whilst in some configurations the given set of one or more contexts may include the context that is currently active on the processing circuitry of the processing element, in some configurations at least one of the given set of one or more contexts is different from a current context being processed by the processing circuitry. Each maintenance operation of the at least one maintenance operation may be received from any other processing element comprised in the apparatus. In some configurations each of the given set of one or more contexts is different from the current context being processed by the processing circuitry.
Particular configurations will now be described with reference to the figures.
The processing circuitry 52 is arranged to perform processing operations in one of the plurality of contexts. The processing circuitry 52 may store a context identifier 54 indicative of a current processing context of the processing circuitry 52. The context tracking circuitry 64 is arranged to store context tracking data 70 indicative of processing contexts that have been actively processed by the processing circuitry 52 of the processing element 50. In the illustrated configuration, the context tracking circuitry 64 is storing context tracking data 70 indicating that contexts identified by context identifier 1 and context identifier 2 have been actively processed by the processing circuitry 52. The context tracking circuitry 64 is also provided with a saturation flag 68 and a maintenance flag 66. The saturation flag 68 is set in response to the context tracking data being cleared and the maintenance flag 66 is set when it is determined that the processing circuitry has received a maintenance operation identifying one of the contexts that is indicated in the context tracking data 70. The control circuitry 56 is responsive to requests for memory synchronisation to determine if, since a previous memory synchronisation, any maintenance operations, for example, the pending maintenance operation 58 indicating context identifier 60, have been received where the indicated context identifier 60 is also indicated in the context tracking data 70 maintained by the context tracking circuitry 64.
On receipt of a maintenance operation 58 the control circuitry 56 determines whether the maintenance operation 58 has a context identifier 60 that corresponds to one of the context identifiers stored in the context tracking data 70. If a match is determined, then the control circuitry sets the maintenance flag 66 to indicate that a maintenance operation has been received since a previous synchronisation operation. If the pending maintenance operation 58 indicates a context identifier 60 that is not included in the context tracking data 70, then the control circuitry does not modify a value of the maintenance flag 66 (i.e., if the maintenance flag 66 was previously set, then it remains set and if the maintenance flag 66 was previously clear then it remains clear). In this way, the control circuitry 56 is able to use the maintenance flag 66 to track whether there have been any maintenance operations, indicating a context identifier 60 that is included in the context tracking data 70, received since a previous memory synchronisation. On receipt of a synchronisation operation the control circuitry 56 determines whether or not the maintenance flag 66 is set. If the maintenance flag 66 is not set, then the control circuitry 56 causes the memory synchronisation to be performed without delay, e.g., at the next available opportunity. On the other hand, if the maintenance flag 66 is set, then the control circuitry 56 implements a delay before performing the memory synchronisation. The control circuitry 56 is configured to cause the delay to continue until pending memory updates associated with the maintenance operations have been completed. The control circuitry 56 is configured to cause the maintenance flag 66 to be cleared once the memory synchronisation has occurred.
The saturation flag 68 is provided to indicate that the context tracking data 70 has been cleared as a result of the context tracking data 70 saturating. In the illustrated configuration the context tracking circuitry 64 is provided with storage to store four context identifiers as part of the context tracking data 70. It would be readily apparent to the skilled person that the context tracking circuitry 64 could be provided with any amount of storage for any type/format of context tracking data dependent on the implementation. When the context tracking data 70 saturates, it will no longer be possible to add new context identifiers when the processing element 50 performs processing in a different processing context. The control circuitry 56 is responsive to the saturation of the context tracking circuitry (for example, responsive to receipt of a new context identifier that is not included in the context tracking data 70 when the context tracking data 70 already stores a context identifier in each possible storage location of the context tracking data 70), to clear the context tracking data 70 and to set the saturation flag 68. Subsequently, when the saturation flag 68 is set, the control circuitry 56 is responsive to receipt of a maintenance operation 58 specifying a particular context identifier 60 to convert the maintenance operation 58 to a maintenance operation specifying each context and to set the maintenance flag 66. The saturation flag 68 is cleared once the maintenance operation 58 has been upgraded to a maintenance operation specifying each context and the maintenance flag 66 has been set.
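The interaction between the context tracking data 70, the saturation flag 68 and the maintenance flag 66 can be summarised by the following sketch. It is a behavioural model only; the class, the `ALL_CONTEXTS` sentinel, and the choice to record the new context identifier immediately after the tracking data is cleared are assumptions of the example and are not stated in the description.

```python
ALL_CONTEXTS = "ALL"  # hypothetical sentinel: operation targets every context


class ContextTracker:
    def __init__(self, capacity=4):
        self.capacity = capacity       # four identifiers as in the illustration
        self.tracking_data = set()     # context tracking data 70
        self.saturation_flag = False   # saturation flag 68
        self.maintenance_flag = False  # maintenance flag 66

    def note_context(self, context_id):
        """Called when processing is performed in `context_id`."""
        if context_id in self.tracking_data:
            return
        if len(self.tracking_data) >= self.capacity:
            # Saturated: clear the tracking data and set the saturation flag.
            self.tracking_data.clear()
            self.saturation_flag = True
        # Assumption: the new context is recorded after any clearing.
        self.tracking_data.add(context_id)

    def on_maintenance(self, context_ids):
        """Return the contexts the operation targets after any conversion."""
        if self.saturation_flag:
            # Upgrade to all contexts, set maintenance flag, clear saturation.
            self.maintenance_flag = True
            self.saturation_flag = False
            return ALL_CONTEXTS
        if any(c in self.tracking_data for c in context_ids):
            self.maintenance_flag = True
        # No match leaves the maintenance flag unchanged.
        return set(context_ids)

    def on_sync(self):
        """Return True if the delay is implemented before synchronising."""
        delay = self.maintenance_flag
        self.maintenance_flag = False  # cleared once the sync has occurred
        return delay
```

In this model a synchronisation request only consults the maintenance flag, so the decision to delay costs a single flag check regardless of how the tracking data itself is stored.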
In
In
If, at step S118, it is determined that any of the set of one or more contexts associated with the maintenance operation is indicated in the context tracking data then flow proceeds to step S120. At step S120 it is determined if the maintenance flag is set. If, at step S120, it is determined that the maintenance flag is not set, then flow proceeds to step S122 where the maintenance flag is set before flow proceeds to step S124. If, at step S120, it is determined that the maintenance flag is set, then flow proceeds to step S124. At step S124 the maintenance operation associated with the context is triggered to be performed before flow returns to step S110.
If, at step S112, it was determined that the saturation flag was set, then flow proceeds to step S114 where the maintenance operation associated with the set of one or more contexts is converted to a maintenance operation that is associated with each of the plurality of contexts before flow proceeds to step S116. At step S116, the saturation flag is cleared before flow proceeds to step S122 where flow proceeds as described above.
Concepts described herein may be embodied in a system comprising at least one packaged chip. The apparatus described earlier is implemented in the at least one packaged chip (either being implemented in one specific chip of the system, or distributed over more than one packaged chip). The at least one packaged chip is assembled on a board with at least one system component. A chip-containing product may comprise the system assembled on a further board with at least one other product component. The system or the chip-containing product may be assembled into a housing or onto a structural support (such as a frame or blade).
As shown in
In some examples, a collection of chiplets (i.e. small modular chips with particular functionality) may itself be referred to as a chip. A chiplet may be packaged individually in a semiconductor package and/or together with other chiplets into a multi-chiplet semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chiplet product comprising two or more vertically stacked integrated circuit layers).
The one or more packaged chips 400 are assembled on a board 402 together with at least one system component 404 to provide a system 406. For example, the board may comprise a printed circuit board. The board substrate may be made of any of a variety of materials, e.g. plastic, glass, ceramic, or a flexible substrate material such as paper, plastic or textile material. The at least one system component 404 comprises one or more external components which are not part of the one or more packaged chips 400. For example, the at least one system component 404 could include any one or more of the following: another packaged chip (e.g. provided by a different manufacturer or produced on a different process node), an interface module, a resistor, a capacitor, an inductor, a transformer, a diode, a transistor and/or a sensor.
A chip-containing product 416 is manufactured comprising the system 406 (including the board 402, the one or more packaged chips 400 and the at least one system component 404) and one or more product components 412. The product components 412 comprise one or more further components which are not part of the system 406. As a non-exhaustive list of examples, the one or more product components 412 could include a user input/output device such as a keypad, touch screen, microphone, loudspeaker, display screen, haptic device, etc.; a wireless communication transmitter/receiver; a sensor; an actuator for actuating mechanical motion; a thermal control device; a further packaged chip; an interface module; a resistor; a capacitor; an inductor; a transformer; a diode; and/or a transistor. The system 406 and one or more product components 412 may be assembled on to a further board 414.
The board 402 or the further board 414 may be provided on or within a device housing or other structural support (e.g. a frame or blade) to provide a product which can be handled by a user and/or is intended for operational use by a person or company.
The system 406 or the chip-containing product 416 may be at least one of: an end-user product, a machine, a medical device, a computing or telecommunications infrastructure product, or an automation control system. As a non-exhaustive list of examples, the chip-containing product could be any of the following: a telecommunications device, a mobile phone, a tablet, a laptop, a computer, a server (e.g. a rack server or blade server), an infrastructure device, networking equipment, a vehicle or other automotive product, industrial machinery, consumer device, smart card, credit card, smart glasses, avionics device, robotics device, camera, television, smart television, DVD player, set top box, wearable device, domestic appliance, smart meter, medical device, heating/lighting control device, sensor, and/or a control system for controlling public infrastructure equipment such as smart motorways or traffic lights.
Concepts described herein may be embodied in computer-readable code for fabrication of an apparatus that embodies the described concepts. For example, the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts. The above computer-readable code may additionally or alternatively enable the definition, modelling, simulation, verification and/or testing of an apparatus embodying the concepts described herein.
For example, the computer-readable code for fabrication of an apparatus embodying the concepts described herein can be embodied in code defining a hardware description language (HDL) representation of the concepts. For example, the code may define a register-transfer-level (RTL) abstraction of one or more logic circuits for defining an apparatus embodying the concepts. The code may define an HDL representation of the one or more logic circuits embodying the apparatus in Verilog, SystemVerilog, Chisel, or VHDL (Very High-Speed Integrated Circuit Hardware Description Language) as well as intermediate representations such as FIRRTL. Computer-readable code may provide definitions embodying the concept using system-level modelling languages such as SystemC and SystemVerilog or other behavioural representations of the concepts that can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.
Additionally or alternatively, the computer-readable code may define a low-level description of integrated circuit components that embody concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII. The one or more netlists or other computer-readable representation of integrated circuit components may be generated by applying one or more logic synthesis processes to an RTL representation to generate definitions for use in fabrication of an apparatus embodying the invention. Alternatively or additionally, the one or more logic synthesis processes can generate from the computer-readable code a bitstream to be loaded into a field programmable gate array (FPGA) to configure the FPGA to embody the described concepts. The FPGA may be deployed for the purposes of verification and test of the concepts prior to fabrication in an integrated circuit or the FPGA may be deployed in a product directly.
The computer-readable code may comprise a mix of code representations for fabrication of an apparatus, for example including a mix of one or more of an RTL representation, a netlist representation, or another computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus embodying the invention. Alternatively or additionally, the concept may be defined in a combination of a computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus and computer-readable code defining instructions which are to be executed by the defined apparatus once fabricated.
Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc. An integrated circuit fabricated using the computer-readable code may comprise components such as one or more of a central processing unit, graphics processing unit, neural processing unit, digital signal processor or other components that individually or collectively embody the concept.
In brief overall summary there is provided an apparatus, system, method, and medium. The apparatus comprises one or more processing elements, each processing element comprising processing circuitry to perform processing operations in one of a plurality of processing contexts. Each processing element further comprises context tracking circuitry to store context tracking data indicative of active contexts. Each processing element comprises control circuitry responsive to a request for a memory synchronisation occurring subsequent to at least one maintenance operation associated with a given set of one or more contexts of the plurality of processing contexts, to determine whether at least one of the given set of one or more contexts is indicated in the context tracking data. The control circuitry is configured, when the at least one of the given set of one or more contexts is determined to be indicated in the context tracking data, to implement a delay before performing the memory synchronisation, and when each of the given set of one or more contexts is determined to be absent from the context tracking data, to perform the memory synchronisation without implementing the delay.
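As a non-limiting illustration of the summarised behaviour, the decision taken by the control circuitry in response to a request for memory synchronisation may be sketched as follows. The function names `any_context_tracked` and `synchronise`, and the representation of pending memory updates as a queue of labels, are assumptions made for the purpose of the sketch. The delay is modelled as draining the pending memory updates before the synchronisation is recorded.

```python
from collections import deque


def any_context_tracked(maintenance_contexts, tracked_contexts) -> bool:
    """True if at least one maintenance context is in the tracking data."""
    return any(ctx in tracked_contexts for ctx in maintenance_contexts)


def synchronise(maintenance_contexts, tracked_contexts, pending_updates: deque):
    """Return the ordered events produced by a synchronisation request."""
    events = []
    if any_context_tracked(maintenance_contexts, tracked_contexts):
        # Delay: the pending memory updates are performed first.
        while pending_updates:
            events.append(("update", pending_updates.popleft()))
    # With no tracked context affected, synchronisation proceeds at once.
    events.append(("sync", None))
    return events
```

Where none of the contexts associated with the maintenance operation has been active on the processing element, the sketch performs the synchronisation immediately and leaves the pending updates untouched, which corresponds to performing the memory synchronisation without implementing the delay.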
In the present application, the words “configured to ...” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
In the present application, lists of features preceded with the phrase “at least one of” mean that any one or more of those features can be provided either individually or in combination. For example, “at least one of: [A], [B] and [C]” encompasses any of the following options: A alone (without B or C), B alone (without A or C), C alone (without A or B), A and B in combination (without C), A and C in combination (without B), B and C in combination (without A), or A, B and C in combination.
Although illustrative configurations of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise configurations, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.
Some configurations of the invention are set out in the following numbered clauses: