This invention relates to data processing systems. More particularly, this invention relates to the management of a local memory and a main memory within a data processing system.
It is known to provide data processing systems with processing elements having local memory for improvement in speed/power/performance. One normal way of achieving this is to use a local RAM or TCM in the address map of the processing element. This approach has the advantage that the processing element concerned is able to directly access the local memory and accordingly there is little overhead associated with such accesses. However, a significant problem with this approach is that the data held in that local memory is normally a copy of the data in the main memory rather than a cache of that data. The main and local memories are visible in different parts of the address space, and data items are copied between them. This copy is usually explicitly controlled by a processing element in the system. This distributed memory model means there is generally no mechanism for providing coherency between the memories.
A cache memory provides an alternative model where data items within the local memory are cached copies of items in main memory. Data items in the local memory, usually arranged as cache lines, have corresponding address TAGs which are set up to indicate to which data items in main memory they correspond. The caching operation usually occurs implicitly due to a memory request by a processing element. This may include allocation of a cache line, fetching the data, and setup of the TAGs. These TAGs enable various mechanisms to monitor the multiple copies of data items in the system and ensure coherency can be maintained. Within a multiprocessing environment, snooping can be used to ensure updates to one copy of the data are appropriately seen by other entities accessing that data. Whilst caches do address the problems of providing coherency, they bring with them some associated disadvantages.
The primary disadvantage is that the cache requires the TAGs to be accessed to determine if and where in the cache the data item exists. This cache TAG lookup consumes time and power compared to an access made directly to a RAM or TCM.
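The difference between the two access paths can be illustrated with a minimal sketch. It assumes a direct-mapped cache with a hypothetical line size and line count; the class and function names are illustrative only and do not come from any described embodiment. The cache read must split the address and compare a TAG before returning data, whereas the RAM or TCM read simply indexes the memory.

```python
LINE_BYTES = 32   # assumed line size, for illustration only
NUM_LINES = 8     # assumed number of cache lines


class DirectMappedCache:
    """Toy direct-mapped cache: every read needs a TAG comparison."""

    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.lines = [bytearray(LINE_BYTES) for _ in range(NUM_LINES)]

    def split(self, addr):
        # Break a main-memory address into (tag, index, offset).
        offset = addr % LINE_BYTES
        index = (addr // LINE_BYTES) % NUM_LINES
        tag = addr // (LINE_BYTES * NUM_LINES)
        return tag, index, offset

    def fill(self, addr, data):
        # Allocate the line for addr and set up its TAG.
        assert len(data) == LINE_BYTES
        tag, index, _ = self.split(addr)
        self.tags[index] = tag
        self.lines[index][:] = data

    def read(self, addr):
        tag, index, offset = self.split(addr)
        if self.tags[index] != tag:   # the TAG lookup on every access
            return None               # miss
        return self.lines[index][offset]


def tcm_read(tcm, local_addr):
    """RAM/TCM access: the local address indexes the memory directly."""
    return tcm[local_addr]
```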
In addition, it may be desired to support virtual addressing. In this case, virtual-to-physical address translation is required. This necessitates additional overhead in terms of circuit area and/or power consumption. In the case of physically addressed cache memory, an additional lookup is required prior to being able to access the cache.
Viewed from one aspect the present invention provides apparatus for processing data comprising:
a main memory having a main-memory address space;
a local memory having a local-memory physical address space;
processing logic coupled to said local memory and operable to output a physical address within said local-memory physical address space to said local memory to directly address a memory location within said local memory;
a translation store operable to store mapping data identifying a mapping between a plurality of regions of said local-memory physical address space and respective corresponding regions within said main-memory address space;
a local-memory control mechanism operable to transfer data between said main memory and said local memory and to maintain said mapping data; and
a coherency management mechanism responsive to said mapping data to manage coherence between data stored in corresponding regions of said local-memory physical address space and said main-memory address space.
The present technique can be considered to provide a reverse tagged local memory which is addressed using its own physical address space by the processing logic. More than one set of processing logic could share the local memory if properly coordinated. Since the local memory is directly addressed by the processing element, no address translation or TAG lookup is required. In addition, mapping data is stored and maintained to provide a mapping of regions of the local memory to corresponding regions in the main memory such that coherency management can be performed. The local memory is flat mapped from the point of view of the processing logic enabling it to perform with a high degree of efficiency. Furthermore, the processing logic can control which data is moved into the local memory to suit a particular application or environment. Coherency management can be offloaded from the processing logic and dealt with elsewhere using the mapping data.
The present technique caches data by copying it into local address space, and setting up tags which can be used for coherency back to the main address space. Accesses to the local memory do not require a tag lookup, as it is known where the data was placed within the local memory. Coherency management mechanisms can still use the tags to ensure coherency with main or other memory (which can be physically addressed, virtually addressed or provide respective virtual address spaces for different programs being executed, e.g. using ASIDs). This mechanism allows direct lookup in the local memory, in a manner similar to a RAM or TCM (i.e. with no tag lookup), but also provides coherency support in the same way as can be provided with caching mechanisms (the tags can be snooped). In the context of virtually addressed systems, the local address space can be virtually addressed since the processing logic has control over where data was placed in that local memory. The coherency tags can be populated with physical addresses which have been translated from virtual addresses for the purposes of coherency management. Local accesses to the local memory can use the virtual addresses and no address translation is required.
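The reverse tagging idea described above might be modelled as follows. This is only a sketch under stated assumptions: the line size, the number of lines, and the names `ReverseTaggedLocalMemory`, `place`, `read` and `snoop` are hypothetical, and the tags are assumed to hold line-aligned main-memory addresses. Local reads index the memory directly, while only the coherency path searches the tags.

```python
LINE_BYTES = 16   # assumed line size
NUM_LINES = 4     # assumed number of local-memory lines


class ReverseTaggedLocalMemory:
    """Local memory addressed directly; tags point back at main memory."""

    def __init__(self):
        self.data = bytearray(LINE_BYTES * NUM_LINES)
        # One tag per local line: the main-memory line address it mirrors.
        self.tags = [None] * NUM_LINES

    def place(self, local_line, main_addr, payload):
        # Copy one line into a chosen local line and record its origin.
        assert len(payload) == LINE_BYTES
        assert main_addr % LINE_BYTES == 0
        base = local_line * LINE_BYTES
        self.data[base:base + LINE_BYTES] = payload
        self.tags[local_line] = main_addr

    def read(self, local_addr):
        # Processing-logic path: no tag lookup, no translation.
        return self.data[local_addr]

    def snoop(self, main_addr):
        # Coherency path: find the local address mirroring main_addr, if any.
        offset = main_addr % LINE_BYTES
        line_base = main_addr - offset
        for local_line, tag in enumerate(self.tags):
            if tag == line_base:
                return local_line * LINE_BYTES + offset
        return None
```

In this sketch the cost of the tag search falls on `snoop`, which is only exercised by coherency traffic, rather than on `read`, which is the path used by the processing logic.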
As will be appreciated, the coherency management mechanism is useful when further processing logic operates to access the main memory and accordingly will need coherency management to be performed between corresponding data in the main memory and the local memory.
The translation store can also be used to store coherency control data. Such coherency control data may, for example, be of a MESI protocol form.
Whilst the local memory and the main memory can vary in their capacity, it is normal for the main memory to have a greater capacity than the local memory.
The local memory controller will in preferred embodiments be responsive to one or more memory control instructions executed by the processing logic to copy data between the main memory and the local memory. Thus, software control of the data stored in the local memory is given to the processing logic.
The processing logic will typically perform data processing operations upon the data values stored within the local memory. The processing logic could be a data engine, a coprocessor, a general purpose microprocessor or some other data processing device.
In preferred embodiments, the local memory is configured as a plurality of local-memory lines with the regions within the local memory physical address space each corresponding to one or more local-memory lines.
The mapping data may typically be TAG data specifying respective main-memory physical address ranges corresponding to the regions of the local-memory physical address space mapped.
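As an illustration of TAG data covering regions of one or more local-memory lines, the following sketch records, for each region, its first local line, its length in lines and the base of the corresponding main-memory range. The `RegionTag` name, its fields and the line size are assumptions made for this example rather than a format specified here.

```python
from dataclasses import dataclass

LINE_BYTES = 32   # assumed local-memory line size


@dataclass
class RegionTag:
    """Hypothetical TAG entry mapping a local region to a main-memory range."""
    first_line: int   # first local-memory line of the region
    line_count: int   # region length in lines (one or more)
    main_base: int    # start of the corresponding main-memory range

    def local_range(self):
        # Half-open [start, end) range of local-memory physical addresses.
        start = self.first_line * LINE_BYTES
        return start, start + self.line_count * LINE_BYTES

    def main_range(self):
        # Half-open [start, end) range of main-memory addresses.
        return self.main_base, self.main_base + self.line_count * LINE_BYTES

    def to_main(self, local_addr):
        # Translate a local address inside the region to its main address.
        start, end = self.local_range()
        if not start <= local_addr < end:
            raise ValueError("local address outside this region")
        return self.main_base + (local_addr - start)
```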
The local-memory control mechanism could take a variety of different forms such as, for example, a DMA unit, the processing logic operating under software control and/or a further processor operating under software control. In a similar way, the coherency management mechanism can take a variety of forms including a hardware unit snooping access to the main memory and the local memory, the processing logic operating under software control and/or a further processor operating under software control.
Viewed from another aspect the present invention provides a method of processing data using a main memory having a main-memory address space and a local memory having a local-memory physical address space; said method comprising the steps of:
outputting from processing logic a physical address within said local-memory physical address space to said local memory to directly address a memory location within said local memory;
storing mapping data identifying a mapping between a plurality of regions of said local-memory physical address space and respective corresponding regions within said main-memory address space;
transferring data between said main memory and said local memory and maintaining said mapping data; and
in dependence upon said mapping data, managing coherence between data stored in corresponding regions of said local-memory physical address space and said main-memory address space.
Viewed from a further aspect the present invention provides a computer program product comprising a computer readable medium storing a computer program executable by processing logic coupled to a main memory having a main-memory address space and a local memory having a local-memory physical address space to control the steps of:
outputting from said processing logic a physical address within said local-memory physical address space to said local memory to directly address a memory location within said local memory;
storing mapping data identifying a mapping between a plurality of regions of said local-memory physical address space and respective corresponding regions within said main-memory address space; and
transferring data between said main memory and said local memory and maintaining said mapping data.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
The main memory 6 in
The data engine 8 executes program instructions (as part of its computer program) which trigger the DMA unit 14 to move data values from regions within the main memory 6 to specified regions within the local memory 4. At the same time, the physical address TAG data and coherency management data within the translation store 16 will be updated to reflect the mapping between the region of the local memory into which the data values have been copied and the corresponding region within the main memory together with the coherency management information. The updating of the TAG data and the MESI data may be performed by a separate program instruction(s) executed by the data engine 8 or may be automatically performed by dedicated hardware monitoring the transfers being performed. In this way, the data engine 8 may select data from within the main memory 6 and copy it into regions within the local memory 4. The data engine 8 when accessing that data within the local memory 4 uses local-memory physical addresses and considers the local memory to be a flat mapped physical address space which is directly accessible. This gives highly efficient access to the data within the local memory 4. Furthermore, the data engine 8 is able to control what data is present within the local memory 4 at any given time by copying that data into the local memory or copying it back to the main memory 6 (or simply invalidating it within the local memory 4 or overwriting it). More than one data engine could share the local memory 4 if desired and suitably co-ordinated.
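The transfer behaviour described above could be sketched as follows, with the TAG and MESI entries updated at the same time as each copy. The class names, the single-line transfer granularity, the choice of the Exclusive state after a copy-in, and the invalidation after a copy-back are all assumptions made for illustration; an actual embodiment may choose differently.

```python
LINE_BYTES = 16   # assumed line size


class TranslationStore:
    """Per-line physical address TAG plus a MESI state letter."""

    def __init__(self, num_lines):
        self.tag = [None] * num_lines
        self.state = ["I"] * num_lines   # every line starts Invalid


class DmaUnit:
    """Toy DMA unit that keeps the translation store in step with copies."""

    def __init__(self, main, local, store):
        self.main, self.local, self.store = main, local, store

    def copy_in(self, main_addr, local_line):
        # Move one line from main memory into the chosen local line.
        base = local_line * LINE_BYTES
        self.local[base:base + LINE_BYTES] = \
            self.main[main_addr:main_addr + LINE_BYTES]
        self.store.tag[local_line] = main_addr
        self.store.state[local_line] = "E"   # assumption: clean, sole copy

    def copy_back(self, local_line):
        # Write the line back to its recorded main-memory address.
        main_addr = self.store.tag[local_line]
        base = local_line * LINE_BYTES
        self.main[main_addr:main_addr + LINE_BYTES] = \
            self.local[base:base + LINE_BYTES]
        self.invalidate(local_line)          # assumption: drop after copy-back

    def invalidate(self, local_line):
        # Discard the mapping without writing any data back.
        self.store.tag[local_line] = None
        self.store.state[local_line] = "I"
```

A data engine in this model would call `copy_in` before working on a line through plain local addresses, and `copy_back` or `invalidate` when it no longer needs the line.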
As shown in
The main memory 6 has a greater memory storage capacity than the local memory 4. The local memory 4 may be conveniently divided into equal size regions each corresponding to one line of data which is mapped by a physical address TAG to a corresponding line of data within the main memory 6. It will be appreciated that the regions within the local memory 4 could be divided in many different ways and could be multiple lines in length or could each vary in length. In the example discussed above, the local-memory control mechanism is provided by the data engine 8 acting in cooperation with the DMA unit 14. However, the local-memory control mechanism could be provided in other ways and with other combinations, such as including operation of a further processor within the system, e.g. the microprocessor 10. In a similar way, the coherency management mechanism 18 is described as a dedicated hardware element in the example embodiment of
At step 26, the coherency management mechanism 18 snoops transactions on the bus 12 to identify accesses to the local memory 4 and the main memory 6. When these accesses are material to coherency management, then the coherency management mechanism 18 at step 28 updates the MESI data stored within the translation store 16 in association with the physical address TAGs.
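A simplified view of how snooped transactions might update the MESI data is sketched below. The transitions follow the conventional MESI protocol in outline only; the function names, the transaction kinds, and the returned write-back flag are hypothetical, and the data movement needed when a modified line is hit is left out of the model.

```python
from enum import Enum

LINE_BYTES = 16   # assumed line size


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def snoop_remote_access(tags, states, kind, main_addr):
    """Handle a main-memory access by another bus master.

    kind is "read" or "write". Returns (local_line, needs_writeback),
    or (None, False) if no valid local line maps main_addr.
    """
    line_base = main_addr - main_addr % LINE_BYTES
    for line, tag in enumerate(tags):
        if tag != line_base or states[line] is Mesi.INVALID:
            continue
        # A modified local copy must reach main memory before others use it.
        needs_writeback = states[line] is Mesi.MODIFIED
        states[line] = Mesi.INVALID if kind == "write" else Mesi.SHARED
        return line, needs_writeback
    return None, False


def note_local_write(states, local_line):
    """Handle a write by the local processing logic to a valid line."""
    if states[local_line] is not Mesi.INVALID:
        # Assumption: any other sharers are invalidated elsewhere.
        states[local_line] = Mesi.MODIFIED
```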
At step 30, the microprocessor 10 accesses data values stored within the main memory 6 by issuing main memory physical addresses to the main memory 6.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/GB2006/001139 | 3/29/2006 | WO | 00 | 7/8/2008 |