This application claims the benefit of Taiwan application Serial No. 107119551, filed Jun. 6, 2018, the subject matter of which is incorporated herein by reference.
The invention relates to an image processing system, and more particularly to a technique for enhancing memory utilization efficiency in an image processing system.
To buffer data used in an image processing process, many image processing systems use dynamic random access memory (DRAM) as a main memory, and use static random access memory (SRAM) as a cache. Compared to a main memory, a cache has a faster data access speed but a higher hardware cost. Thus, a cache is only used for storing a small amount of image data recently used or soon to be used, whereas a main memory is for storing the complete image data of one or multiple video frames.
Many memory controllers 120 adopt a prefetch technique; that is, the image data that the image processing circuit 110 may need is predicted, and such image data is duplicated in advance from the main memory 140 to the cache 130.
Starting from when the memory controller 120 instructs the main memory 140 to read data at a particular address to when the main memory 140 actually outputs the data, the main time delay in between is referred to as column address strobe latency (to be referred to as CAS latency), which is a critical indicator for evaluating memory efficiency. In a current DRAM, the main memory 140 includes multiple memory banks, and only one of these memory banks is active at any time point. In general, the CAS latency consists of two delay periods. If the memory bank storing the required data is originally inactive, the memory bank first needs to be switched to an active state; such switching time is the first delay period (T1). The second delay period (T2) is the time needed by the active memory bank to transmit the data to an output terminal of the main memory 140. For the same main memory 140, the first delay period is a constant value irrelevant to the amount of data to be fetched, whereas the length of the second delay period is variable and directly proportional to the amount of data to be fetched.
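Purely for illustration, the two-part latency described above can be modeled as follows; the timing values and data rate are hypothetical assumptions, not taken from any actual DRAM specification:

```python
# Illustrative model of the CAS latency described above (all timing
# values are hypothetical; real DRAM parameters vary by device).
T1 = 15.0   # first delay period: bank activation time, constant (ns)
RATE = 1.6  # assumed data rate: units of data transferred per ns

def cas_latency(amount):
    """Total latency to fetch `amount` units of data from an inactive
    memory bank: a fixed activation delay (T1) plus a transfer delay
    (T2) proportional to the amount fetched."""
    t2 = amount / RATE          # second delay period grows with amount
    return T1 + t2

# Fetching one large burst amortizes T1 better than many small ones:
one_burst = cas_latency(64)        # pay T1 once
four_bursts = 4 * cas_latency(16)  # pay T1 four times
```

Under these assumed numbers, the single 64-unit burst completes sooner than four fragmented 16-unit fetches, which previews the fragmentation issue discussed below.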
With the progress in manufacturing processes, the data rate of newer-generation DRAMs keeps getting higher, meaning that the above time length T2 becomes shorter. However, the absolute length of the first delay period T1 is not proportionally reduced along with the increase in the data rate. Because the proportion of the first delay period T1 in the CAS latency cannot be overlooked, appropriately planning fetch behaviors on the main memory 140 (e.g., consecutively fetching multiple sets of data in one single fetch whenever possible) becomes even more critical.
One issue of a current prefetch mechanism is that the utilization efficiency of the main memory 140 is not taken into account; the memory controller 120 may fetch image data from the main memory 140 multiple times in a fragmented manner, resulting in degraded utilization efficiency of the main memory 140.
To resolve the above issue, the present invention provides an image processing system and a memory managing method thereof.
An image processing system suitable for accessing a main memory is provided according to an embodiment of the present invention. The image processing system includes a cache, an image processing circuit and a memory controller. The memory controller includes a hit rate calculating circuit, a deciding circuit and a fetching circuit. In response to a data request issued by the image processing circuit for a set of target image data, the hit rate calculating circuit calculates a cache hit rate of the set of target image data in the cache. The deciding circuit generates a prefetch decision according to the cache hit rate to indicate whether a prefetch procedure is to be performed. The fetching circuit selectively performs the prefetch procedure on the main memory according to the prefetch decision.
A memory managing method cooperating with an image processing system is provided according to another embodiment of the present invention. The image processing system is suitable for accessing a main memory, and includes a cache and an image processing circuit. The memory managing method includes: (a) in response to a data request issued by the image processing circuit for a set of target image data, calculating a cache hit rate of the set of target image data in the cache; (b) generating a prefetch decision according to the cache hit rate to indicate whether a prefetch procedure is to be performed; and (c) selectively performing the prefetch procedure on the main memory according to the prefetch decision.
The above and other aspects of the invention will become better understood with regard to the following detailed description of the preferred but non-limiting embodiments. The following description is made with reference to the accompanying drawings.
It should be noted that the drawings of the present invention include functional block diagrams of multiple functional modules related to one another. These drawings are not detailed circuit diagrams, and connection lines therein are for indicating signal flows only. The interactions between the functional elements and/or processes are not necessarily achieved through direct electrical connections. Further, functions of the individual elements are not necessarily distributed as depicted in the drawings, and separate blocks are not necessarily implemented by separate electronic elements.
The image processing circuit 410 performs one or more image processing processes. For example, if the image processing system 400 is a video signal receiving terminal, the image processing circuit 410 may include a motion compensation circuit for sequentially reconstructing multiple image blocks according to multiple sets of motion vectors and residuals. Each time an image processing process is to be performed, the image processing circuit 410 issues to the memory controller 420 a data request for image data (to be referred to as a set of target image data) needed for the image processing process, and informs the memory controller 420 of position information of the set of target image data.
In response to the data request issued by the image processing circuit 410, the hit rate calculating circuit 421 calculates a cache hit rate of the set of target image data in the cache 430. In a current cache memory structure, a cache includes multiple cache lines, and each cache line includes multiple fields, including correctness, tag, index, offset and data. When a batch of data is duplicated from the main memory 900 to the cache 430, the original address of the batch of data in the main memory 900 is divided into three parts, which are distributed and stored in the three fields of tag, index and offset. In other words, by combining the contents of the three fields of tag, index and offset, the complete address of the batch of data can be obtained. In practice, the hit rate calculating circuit 421 may calculate the cache hit rate according to the contents of these fields. Associated details are given below.
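Purely for illustration, the division of a main-memory address into the three fields may be sketched as below; the field widths are assumptions (actual widths depend on the cache line size and the number of cache lines):

```python
# Sketch of splitting a main-memory address into tag, index and offset,
# and recombining them into the complete original address.
# Field widths are assumptions for illustration only.
OFFSET_BITS = 6   # assumed 64-byte cache lines
INDEX_BITS = 8    # assumed 256 cache lines

def split_address(addr):
    """Divide an address into the (tag, index, offset) parts stored in
    the corresponding cache line fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def join_address(tag, index, offset):
    """Combining the three field contents recovers the complete address."""
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS) | offset
```

The round trip `join_address(*split_address(addr)) == addr` holds for any address, which is the property the description above relies on.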
Assume the set of target image data is distributed over multiple addresses in the main memory 900. If the cache 430 is a single-set cache, the hit rate calculating circuit 421 may search the correctness field, tag field and index field in the cache 430 according to each of the multiple addresses, so as to determine whether each address has a cache hit, and to further calculate the overall cache hit rate of the set of target image data.
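Purely for illustration, this per-address field search for a single-set cache may be sketched as follows; the cache model, field widths and dictionary representation are assumptions, not a description of the actual circuit:

```python
# Sketch of the overall cache hit rate calculation for a single-set
# cache: for each address, consult the correctness (valid) field and
# compare the stored tag at the indexed cache line.
OFFSET_BITS, INDEX_BITS = 6, 8  # assumed field widths

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def cache_hit_rate(addresses, cache_lines):
    """Overall hit rate of a set of target image data: hits / total.
    `cache_lines` maps an index to its correctness bit and tag."""
    hits = 0
    for addr in addresses:
        tag, index, _ = split_address(addr)
        line = cache_lines.get(index)
        if line is not None and line["valid"] and line["tag"] == tag:
            hits += 1
    return hits / len(addresses) if addresses else 1.0
```

For example, with two valid lines holding tag 0 at indices 0 and 1, the addresses 0x0 and 0x40 hit while 0x4000 (tag 1) misses, giving a hit rate of 2/3.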
If the cache 430 is a multi-set cache and a least recently used (LRU) algorithm is used as its data replacement policy, the hit rate calculating circuit 421 may be designed to perform searching without triggering the replacement mechanism of the cache 430, or to perform searching without replacing any field contents of the cache 430, thus avoiding any interference with the data importance ordering of the cache 430.
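Purely for illustration, the distinction between a normal access (which updates LRU recency and may trigger replacement) and a non-interfering search may be sketched as below; the class and method names are illustrative assumptions:

```python
from collections import OrderedDict

class LRUSet:
    """Minimal model of one set of a multi-set cache with LRU
    replacement (illustrative only)."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> data; order tracks recency

    def access(self, tag):
        """Normal access: updates recency and may trigger replacement."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # promote to most recent
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[tag] = None
        return False

    def probe(self, tag):
        """Hit-rate search: checks presence WITHOUT updating recency or
        replacing anything, so the data importance ordering of the set
        is left undisturbed."""
        return tag in self.lines
```

Because `probe` does not promote the searched tag, a later real access still evicts the genuinely least recently used line, which is exactly the non-interference property described above.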
In another embodiment, to avoid interference with the data importance ordering of the cache 430, the hit rate calculating circuit 421 is designed to search duplicates of the address-related fields of the cache 430 through a simulation mechanism, rather than directly searching the address-related fields of the cache 430.
It should be noted that, if the data request issued by the image processing circuit 410 directly includes the address of the set of target image data in the main memory 900, the converting circuit 421C in
It is seen from the above description that the inquiry task of the searching circuit 421D is to obtain the hit rate rather than to physically fetch data from the cache 430. Having the searching circuit 421D search the address table 421A1 instead of directly inquiring (fetching) the tag field and index field of the cache 430 avoids any interference with the data importance ordering of the cache 430. It should be noted that, because the other fields of the cache 430 need not also be duplicated into the buffer 421A, the buffer 421A does not require a large capacity.
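Purely for illustration, the inquiry path through such an address table may be sketched as below; only the address-related fields are mirrored, which is why the buffer stays small (the class and method names are illustrative, not from the embodiments):

```python
class AddressTable:
    """Illustrative mirror of only the tag/index contents of a cache;
    hit-rate searches inquire the mirror, never the cache itself."""
    def __init__(self):
        self._entries = set()            # (tag, index) pairs only

    def on_cache_fill(self, tag, index):
        """Called when data is duplicated into the cache."""
        self._entries.add((tag, index))

    def on_cache_evict(self, tag, index):
        """Called when a cache line is replaced."""
        self._entries.discard((tag, index))

    def search(self, tag, index):
        """Report hit/miss without touching the cache at all."""
        return (tag, index) in self._entries
```

Because each entry is just an address-field pair rather than a full cache line, the storage cost is a small fraction of the cache's own capacity.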
As shown in
It is seen from the above details that whether the prefetch procedure is to be performed is determined according to whether the cache hit rate is 100%. However, in other embodiments of the present invention, the deciding circuit may generate the prefetch decision by comparing the cache hit rate against a threshold other than 100%.
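Purely for illustration, the decision rule reduces to a single comparison; the threshold parameter is an assumption capturing the alternative embodiments mentioned above:

```python
def prefetch_decision(hit_rate, threshold=1.0):
    """Return True if the prefetch procedure is to be performed.
    The embodiments above use threshold == 1.0 (prefetch unless the
    cache hit rate is 100%); other embodiments may use a different
    threshold (illustrative parameter)."""
    return hit_rate < threshold
```

With the default threshold, a 100% hit rate yields no prefetch, while any miss triggers the prefetch procedure; lowering the threshold makes prefetching less frequent.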
It is seen from the above description that the memory controller 420 does not perform a prefetch procedure each time a data request issued by the image processing circuit 410 is received. In the above embodiments, each time the memory controller 420 fetches image data from the main memory 900, the fetch target necessarily includes both the cache-miss part of the target image data and the image data to be prefetched. In other words, the memory controller 420 neither performs the fetch procedure on the main memory 900 only for the cache-miss part of the target image data, nor performs the fetch procedure on the main memory 900 only for the image data to be prefetched. One advantage of this approach is that, on average, the memory controller 420 successively fetches a sufficient number of sets of data in each single burst, such that the utilization efficiency of the main memory 900 is effectively enhanced.
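Purely for illustration, the assembly of such a combined fetch target may be sketched as follows; the function and variable names are illustrative assumptions:

```python
def plan_fetch(miss_addresses, prefetch_addresses):
    """Assemble one combined fetch target: the cache-miss part of the
    target image data and the image data to be prefetched are always
    fetched together in a single burst, never separately."""
    if not miss_addresses and not prefetch_addresses:
        return None   # nothing needed; the main memory is not accessed
    # One combined, ordered burst amortizes the bank-activation delay
    # (the first delay period T1) over more data.
    return sorted(set(miss_addresses) | set(prefetch_addresses))
```

For example, one cache-missed address and two prefetch addresses produce a single three-address burst rather than two separate accesses to the main memory.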
As shown in
The present invention does not limit the image processing system 400 to any specific configuration or architecture. A person skilled in the art can understand that there are numerous circuit configurations and components for realizing the concept of the present invention without departing from the spirit of the present invention. In practice, the foregoing circuits may be implemented by various control and processing platforms, including fixed and programmable logic circuits such as programmable logic gate arrays, application-specific integrated circuits, microcontrollers, microprocessors and digital signal processors. Further, these circuits may also be designed to complete their tasks through executing processor instructions stored in a memory.
A person skilled in the art can conceive of applying the operation variations in the description associated with the image processing system 400 to the memory managing method in
While the invention has been described by way of example and in terms of the preferred embodiments, it is to be understood that the invention is not limited thereto. On the contrary, it is intended to cover various modifications and similar arrangements and procedures, and the scope of the appended claims therefore should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements and procedures.
Number | Date | Country | Kind |
---|---|---|---|
107119551 | Jun 2018 | TW | national |