The present invention relates to a data processing system, a method for prefetching data and/or instructions, a method for loading data and/or instructions into a memory as well as to an electronic device.
Today's data processing systems or processors are based on a memory hierarchy comprising memories of different speeds and sizes. As fast memories are expensive, the memory hierarchy is organized into several levels, wherein each level is smaller, faster and more expensive per byte than the next lower level. Usually, all data in one level can also be found in the level below it, and all data in that lower level can again be found in the level below it, and so on until the bottom of the hierarchy is reached.
A cache memory may constitute the first level of the memory hierarchy, i.e. it is the memory closest to a central processing unit (CPU) or a processing unit. If the CPU requests a data item which can be found in the cache, a so-called cache hit has occurred. However, if the data item requested by the CPU cannot be found in the cache, a so-called cache miss has occurred. The time needed to service the cache miss and fetch the requested data item depends on the latency and the bandwidth of the memory: the latency corresponds to the time for retrieving the first word of a block, while the bandwidth relates to the time needed to retrieve the rest of the block. The basic idea of a cache is to fetch those data items which will be needed during upcoming processing cycles before they are actually processed.
The memory bandwidth can be exploited by replacing a whole cache line at a time when a cache miss occurs; making better use of the available memory bandwidth in this way calls for an increased cache-line size. Large cache lines are advantageous in particular with regard to prefetching. However, if the size of the cache lines increases, the performance of the system may decrease when programs do not have sufficient spatial locality and cache misses frequently take place.
In “Dynamically Variable Line-Size Cache Exploiting High On-Chip Memory Bandwidth of Merged DRAM/Logic LSIs” by K. Inoue et al., Proceedings of the Fifth International Symposium on High-Performance Computer Architecture (HPCA-5), January 1999, a cache is described whose line size is changed at run time according to the characteristics of the application currently being executed.
Algorithms which may be processed within a data processing system will differ with respect to their locality of reference for instructions as well as for data. The locality of reference is a property of the applications running on a processor and indicates how the different memory regions are accessed by an application. A distinction is made between spatial locality of reference and temporal locality of reference. An application has a good spatial locality of reference if there is a great likelihood that data locations in close proximity to a recently accessed data location will be accessed in the near future. A good temporal locality of reference indicates that a recently accessed data location is likely to be accessed again in the near future. Therefore, while some algorithms will have a good locality of reference (spatial, temporal or both), others exhibit a poor locality of reference. Accordingly, some algorithms will have a good cache hit rate while others will have a rather poor cache hit rate. It should be noted that cache misses cannot be avoided entirely. However, the cache miss rate should be reduced to a minimum in order to reduce the cache miss penalty. If the processed data exhibit rich spatial locality, larger cache lines are advantageous.
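As a minimal sketch of these two kinds of locality (purely illustrative and not forming part of the invention; the function names and access patterns are assumptions), consider the following two loops:

```c
#include <stddef.h>

/* Good spatial locality: consecutive addresses are accessed one after the
 * other, so every word of a fetched cache line is eventually used. */
long sum_sequential(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];              /* the neighbours of a[i] are accessed next */
    return s;
}

/* Good temporal locality: the same few locations are reused many times,
 * so they remain resident in the cache once fetched. */
long sum_repeated(const long *a, size_t n, int rounds)
{
    long s = 0;
    for (int r = 0; r < rounds; r++)
        for (size_t i = 0; i < n && i < 8; i++)
            s += a[i];          /* the same (at most) eight words every round */
    return s;
}
```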
It is an object of the invention to provide a data processing system and a method for prefetching data and/or instructions with a reduced cache miss penalty.
This object is solved by a data processing system according to claim 1, a method for loading data and/or instructions into a memory according to claim 5, a method for prefetching data and/or instructions according to claim 6 and an electronic device according to claim 8.
Therefore, a data processing system for processing at least one application is provided. The data processing system comprises a processor for executing the application. The system furthermore comprises a cache memory being associated to the processor for caching data and/or instructions for the processor. The system furthermore comprises a memory unit for storing data and/or instructions for the application. The memory unit comprises a plurality of memory partitions. Data with similar data attributes are stored in the same memory partition. A predefined prefetching pattern is associated to each of the memory partitions.
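A hypothetical sketch of such a partitioned memory unit is given below; the field names, address ranges and word counts are assumptions chosen for illustration and are not prescribed by the invention.

```c
#include <stdint.h>

/* Hypothetical descriptor for one memory partition. */
typedef struct {
    uint32_t start_addr;   /* first address of the partition              */
    uint32_t end_addr;     /* last address of the partition               */
    uint32_t fetch_words;  /* prefetching pattern: words fetched per miss */
} partition_desc_t;

/* Example memory map with four partitions; data with similar data
 * attributes (e.g. a similar locality of reference) share a partition. */
static const partition_desc_t partitions[] = {
    { 0x00000000u, 0x0000FFFFu, 16u }, /* rich spatial locality: large blocks */
    { 0x00010000u, 0x0001FFFFu,  8u },
    { 0x00020000u, 0x0002FFFFu,  2u },
    { 0x00030000u, 0x0003FFFFu,  1u }, /* poor locality: requested word only  */
};
```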
According to an aspect of the invention, the cache memory comprises a plurality of registers, each of which is associated to one of the memory partitions of the memory. The registers are used to store the predefined prefetching pattern associated to the memory partitions. Data and/or instructions are prefetched according to the prefetching pattern stored in the registers. Hence, the prefetching of a data item can be customized for that data item, in particular with regard to its data attributes.
According to a further aspect of the invention, data with a similar locality of reference are stored in the same memory partition. Accordingly, the cache miss penalty can be reduced as only those data items which are required will be prefetched.
According to still a further aspect of the invention, data stored in a memory partition having a high locality of reference are fetched as a complete block of data, while from a memory partition having a low locality of reference merely the requested data is fetched.
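Expressed as a sketch under the same illustrative assumptions, this fetch policy might look as follows; the block size of 16 words is an arbitrary example value, not a value fixed by the invention.

```c
/* Fetch policy sketch: a miss in a high-locality partition pulls in a
 * complete block, a miss in a low-locality partition fetches only the
 * requested word. BLOCK_WORDS is an assumed, illustrative constant. */
#define BLOCK_WORDS 16u

enum locality { LOCALITY_HIGH, LOCALITY_LOW };

static unsigned words_to_fetch(enum locality l)
{
    return (l == LOCALITY_HIGH) ? BLOCK_WORDS : 1u;
}
```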
The invention also relates to a method for loading data and/or instructions of an application into a memory unit. The memory unit comprises a plurality of memory partitions. Data and/or instructions with similar data attributes are loaded into the same memory partition. Accordingly, the memory and the data stored therein will be organized according to the data attributes.
The invention furthermore relates to a method for prefetching data and/or instructions of an application from a memory unit which comprises a plurality of memory partitions. The data from the memory unit is prefetched into a cache memory associated to a processor. Data with similar data attributes are stored in the same memory partition. A predefined prefetching pattern is applied to each of the memory partitions.
The invention also relates to an electronic device for processing an application. The electronic device comprises at least one processor for executing the application. The electronic device furthermore comprises a cache memory associated to at least one of the processors for caching data and/or instructions received from a memory unit having a plurality of memory partitions. Data with similar data attributes are stored in the same memory partition. A predefined prefetching pattern is associated to each of the memory partitions.
The invention relates to the idea of partitioning a memory space into different regions, wherein instructions and/or data with similar cache performance are placed together in the same region. The regions may also be based on the number of words being fetched during a cache miss. Accordingly, by reorganizing the storage of data in the memory, a substantial gain can be achieved. This may lead to a better performance and a reduced execution time.
The embodiments of the invention as well as the advantages thereof are described below in more detail with reference to the drawings.
The cache 200 may further comprise (configurable) registers 240. Preferably, a register is associated to each of the partitions. The register serves to store information with regard to its partition. This information may contain the start and end address of the partition as well as the number of words to be fetched if data or instructions are accessed from such a partition.
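One conceivable layout for such a register 240 is sketched below; the bit allocation and the notion of a "page" as a coarse address granule are purely assumptions, since the description above does not fix an encoding.

```c
#include <stdint.h>

/* Assumed packing of one per-partition configuration register 240:
 * bits [31:20] start page, [19:8] end page, [7:0] words to fetch
 * on a miss. */
static inline uint32_t part_reg_pack(uint32_t start_page,
                                     uint32_t end_page,
                                     uint32_t fetch_words)
{
    return ((start_page & 0xFFFu) << 20) |
           ((end_page   & 0xFFFu) <<  8) |
            (fetch_words & 0xFFu);
}

/* Extract the number of words to fetch for a miss in this partition. */
static inline uint32_t part_reg_fetch_words(uint32_t reg)
{
    return reg & 0xFFu;
}
```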
The processor 100 will issue a command to the cache 200 requesting to read data from a specified address. If this data is already prefetched into the cache 200, a cache hit will occur and the data is forwarded from the cache 200 to the processor 100. However, if this data is not present in the cache 200, a cache miss will occur. The cache controller 210 of the cache 200 may determine the partition or memory region 401-404 of the address within the memory 400 and issue a fetch operation in order to fetch a number of words which is associated with this partition. The data from the partition or the memory subsystem is then forwarded to the cache 200 according to the predefined prefetching pattern for this region 401-404. The status of the cache block is then updated in order to indicate whether valid data is present in the cache block.
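The miss handling just described can be summarized in the following hedged sketch, which reuses the hypothetical partition table from above; fetch_from_memory() and mark_block_valid() are stand-ins for the actual fetch and status-update logic of the cache controller 210, not functions named by the invention.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fetch and status-update logic. */
void fetch_from_memory(uint32_t addr, uint32_t words);
void mark_block_valid(uint32_t addr);

/* Find the partition (memory region 401-404) containing the address. */
static const partition_desc_t *find_partition(uint32_t addr)
{
    for (size_t i = 0; i < sizeof partitions / sizeof partitions[0]; i++)
        if (addr >= partitions[i].start_addr && addr <= partitions[i].end_addr)
            return &partitions[i];
    return NULL;                        /* address outside all partitions */
}

/* On a cache miss: fetch the number of words configured for the
 * partition, then mark the cache block as holding valid data. */
static void handle_miss(uint32_t addr)
{
    const partition_desc_t *p = find_partition(addr);
    uint32_t n = p ? p->fetch_words : 1u;  /* fall back to a single word */
    fetch_from_memory(addr, n);
    mark_block_valid(addr);
}
```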
According to the invention, the memory space is partitioned or divided into different memory regions, wherein instructions and/or data are placed into one of the memory regions together with other instructions and/or data which have a similar cache performance, such as a similar locality of reference. The memory region where data is stored indicates the number of words which will be fetched during a cache miss.
The above-described architecture may be implemented in a multi-processor system-on-chip. Accordingly, applications exhibiting a poor spatial locality of reference can also be mapped onto such a system.
The invention also relates to a method for categorizing data and instructions of different behaviors and for creating corresponding memory partitions within a memory. According to this information, a linker or a loader application, which loads the application object code (binary file) into the system memory during boot-up time, may organize the actual data into the particular memory regions as instructed. Accordingly, a compiler, a linker and/or a loader unit may be provided to enable the above-mentioned categorizing and creation. A predefined prefetching pattern is associated to each of the memory partitions or regions.
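As a hedged example of how a toolchain could realize such a placement, the GCC/Clang section attribute can steer objects into named sections, which a linker script would then map onto the memory partitions; the section names below are assumptions for illustration, not names used by the invention.

```c
/* Data with rich spatial locality (streamed sequentially) is placed in a
 * section that a linker script would map onto a large-fetch partition. */
__attribute__((section(".part_high_locality")))
static int lookup_table[1024];

/* Pointer-chased data with poor locality goes into a section mapped onto
 * a partition whose prefetching pattern fetches only the requested word. */
__attribute__((section(".part_low_locality")))
static int sparse_nodes[256];
```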
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Furthermore, any reference signs in the claims shall not be construed as limiting the scope of the claims.
Number | Date | Country | Kind
---|---|---|---
06110435.2 | Feb 2006 | EP | regional
Filing Document | Filing Date | Country | Kind | 371(c) Date
---|---|---|---|---
PCT/IB2007/050604 | 2/26/2007 | WO | 00 | 8/27/2008