The present application claims priority of the Chinese Patent Application No. 202210648905.4, filed on Jun. 9, 2022, the disclosure of which is incorporated herein by reference in its entirety as part of the present application.
Embodiments of the present disclosure relate to a cache system simulating method, an apparatus, a device and a storage medium.
In a common computer architecture, the instructions and data of a program are all stored in a memory, and the operation frequency of a processor is much higher than that of the memory, so acquiring data or instructions from the memory may take hundreds of clock cycles, which usually causes the processor to idle because it cannot continue running related instructions, resulting in performance loss. In order to run programs quickly and access data efficiently, a high-speed cache storage apparatus (or briefly referred to as a cache) is usually adopted to save part of the data for high-speed reading by the processor. The data may be, for example, recently accessed data, data pre-fetched according to program operation rules, etc.
At least one embodiment of the present disclosure provides a cache system simulating method, which includes: acquiring a cache system model; acquiring an instruction information record, in which the instruction information record includes a plurality of entries, each entry of the plurality of entries includes a request instruction and a first addressing address corresponding to the request instruction; reading at least one entry of the plurality of entries from the instruction information record; simulating access to the cache system model by using the request instruction and the first addressing address in each entry of the at least one entry to acquire statistical data of the cache system model; and updating the cache system model based on the statistical data.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, simulating the access to the cache system model by using the request instruction and the first addressing address in each entry of the at least one entry to acquire the statistical data of the cache system model, includes: mapping the first addressing address to the cache system model to acquire a count value in a statistics counter, in which the cache system model is set to have a first configuration parameter; and acquiring the statistical data according to the count value.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, updating the cache system model based on the statistical data, includes: comparing the statistical data with target data to update the first configuration parameter.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the count value includes a first count value, the statistical data includes a first statistical value, mapping the first addressing address to the cache system model to acquire the count value in the statistics counter, includes: mapping m first addressing addresses into the cache system model, where m is an integer greater than 1; comparing the m first addressing addresses with address segments in a plurality of corresponding cache lines in the cache system model; and in response to a comparison result of i first addressing addresses being cache hit, updating the first count value in the statistics counter to i, where i is a positive integer not greater than m.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, acquiring the statistical data according to the count value, includes: acquiring the first statistical value as i/m according to the first count value.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the target data includes a first target value, comparing the statistical data with the target data to update the first configuration parameter, includes: in response to the first statistical value being greater than or equal to the first target value, outputting the first configuration parameter as a target first configuration parameter; or in response to the first statistical value being less than the first target value, modifying the first configuration parameter.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the first statistical value is a hit ratio, and the first target value is a target hit ratio.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the count value includes a second count value, and the statistical data includes a second statistical value, mapping the first addressing address to the cache system model to acquire the count value in the statistics counter, includes: mapping n first addressing addresses into the cache system model, where n is an integer greater than 1; comparing the n first addressing addresses with address segments in a plurality of corresponding cache lines in the cache system model; and in response to a comparison result of j first addressing addresses being bank conflict, updating the second count value in the statistics counter to j, where j is a positive integer not greater than n.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, acquiring the statistical data according to the count value includes: acquiring the second statistical value as j/n according to the second count value.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the target data includes a second target value, comparing the statistical data with the target data to update the first configuration parameter includes: in response to the second statistical value being less than or equal to the second target value, outputting the first configuration parameter as the target first configuration parameter; or in response to the second statistical value being greater than the second target value, modifying the first configuration parameter.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the second statistical value is a bank conflict ratio, and the second target value is a target bank conflict ratio.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the first configuration parameter includes way, set, bank or replacement strategy.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the request instruction includes a load request instruction or a store request instruction.
For example, the cache system simulating method provided in at least one embodiment of the present disclosure further includes: creating the cache system model by using a script language.
For example, in the cache system simulating method provided in at least one embodiment of the present disclosure, the instruction information record includes trace log instruction information.
At least one embodiment of the present disclosure further provides an apparatus for cache system simulation, which includes: an acquiring circuit, configured to acquire a cache system model and acquire an instruction information record, in which the instruction information record includes a plurality of entries, and each entry of the plurality of entries includes a request instruction and a first addressing address corresponding to the request instruction; a simulating access circuit, configured to read at least one entry of the plurality of entries from the instruction information record, and simulate access to the cache system model by using the request instruction and the first addressing address in each entry of the at least one entry to acquire statistical data of the cache system model; and an updating circuit, configured to update the cache system model based on the statistical data.
At least one embodiment of the present disclosure further provides a device for cache system simulation, which includes: a processor; and a memory, in which computer programs are stored; the computer programs are configured to be executed by the processor to implement the simulating method provided by any embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a storage medium, which is configured to store non-transitory computer readable instructions; the non-transitory computer readable instructions, when executed by a computer, implement the simulating method provided by any embodiment of the present disclosure.
In order to clearly illustrate the technical solution of the embodiments of the invention, the drawings of the embodiments will be briefly described in the following; it is obvious that the described drawings are only related to some embodiments of the invention and thus are not limitative of the invention.
In order to make the objects, technical details and advantages of the embodiments of the invention apparent, the technical solutions of the embodiments will be described clearly and fully in connection with the drawings related to the embodiments of the invention. Apparently, the described embodiments are just a part but not all of the embodiments of the invention. Based on the described embodiments herein, those skilled in the art can obtain other embodiment(s), without any inventive work, which should be within the scope of the invention.
Unless otherwise defined, all the technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which the present disclosure belongs. The terms “first,” “second,” etc., which are used in the present disclosure, are not intended to indicate any sequence, amount or importance, but distinguish various components. The terms “comprise,” “comprising,” “include,” “including,” etc., are intended to specify that the elements or the objects stated before these terms encompass the elements or the objects and equivalents thereof listed after these terms, but do not preclude the other elements or objects. The phrases “connect”, “connected”, etc., are not intended to define a physical connection or mechanical connection, but may include an electrical connection, directly or indirectly. “On,” “under,” “right,” “left” and the like are only used to indicate relative position relationship, and when the position of the object which is described is changed, the relative position relationship may be changed accordingly.
The present disclosure is described below through several specific embodiments. To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of well-known functions and well-known components may be omitted. When any component of an embodiment of the present disclosure appears in more than one drawing, the component is denoted by the same reference numeral in each drawing.
For example, in a computing system shown in
Capacity of a cache is very small; content saved by a cache is only a subset of content of the main memory; and data exchange between the cache and the main memory is in blocks. To cache data in the main memory into the cache, for example, a certain function is used to locate a main memory address into the cache, which is referred to as address mapping. After the data in the main memory is cached in the cache according to the mapping relationship, the CPU converts the main memory address in a program into a cache address when executing the program. Address mapping modes of different types of caches usually include a direct mapping, a full association mapping, and a set association mapping.
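As an illustrative sketch of the set-association mapping described above, the following code splits a main memory address into a tag, a set index, and a block offset; the line size and set count are assumed values for illustration only, not parameters fixed by the present disclosure.

```python
# Sketch of set-associative address mapping (illustrative parameters).
# A main memory address is decomposed as: tag | set index | block offset.

LINE_SIZE = 64   # bytes per cache line (assumed, matching the 64-byte example below)
NUM_SETS = 128   # number of sets in the cache (assumed)

def map_address(addr):
    """Split a main memory address into (tag, set_index, offset)."""
    offset = addr % LINE_SIZE                      # position within the cache line
    set_index = (addr // LINE_SIZE) % NUM_SETS     # which set the line maps into
    tag = addr // (LINE_SIZE * NUM_SETS)           # identifies the memory block
    return tag, set_index, offset
```

With these assumed parameters, two addresses whose set indices coincide compete for the ways of the same set, which is what the mapping function locates.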
Although the cache has a smaller capacity than that of the main memory, it is much faster than the main memory; therefore, a main function of the cache is to store data that the processor may need to access frequently in the near future. In this way, the processor may directly read the data from the cache without frequently accessing the slower main memory, so as to improve the speed at which the processor accesses memory. A basic unit of a cache is a cache line, which may also be referred to as a cache block or a cache row. Corresponding to the division of the cache into a plurality of cache blocks, the data stored in the main memory is divided in a similar manner; the data blocks divided from the memory are referred to as memory blocks. Usually, a memory block may be 64 bytes in size, and a cache block may also be 64 bytes in size. It may be understood that, in practical applications, the sizes of the memory block and the cache line may be set to other values, for example, 32 bytes to 256 bytes, as long as the size of the memory block is the same as that of the cache block.
In order to improve the hit ratio of the cache, it is necessary to store the most recently used data in the cache as much as possible. Because the cache capacity is limited, when the cache space is full, a cache replacement strategy may be adopted to delete some data from the cache, and then write new data into the space freed. The cache replacement strategy is essentially a data obsolescence mechanism; using a reasonable cache replacement strategy may effectively improve the hit ratio. Common cache replacement strategies include, but are not limited to, first-in-first-out scheduling, least-recently-used scheduling, least-frequently-used scheduling, etc., which are not limited in the embodiments of the present disclosure.
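A least-recently-used scheduling for one cache set can be sketched as follows; the class name and the way count are illustrative assumptions, not part of the present disclosure.

```python
from collections import OrderedDict

# Minimal sketch of least-recently-used (LRU) replacement for one cache set.
# The number of ways per set is an assumed illustrative parameter.

class LRUSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> line; insertion order tracks recency

    def access(self, tag):
        """Return True on a cache hit; on a miss, insert the tag,
        evicting the least recently used line if the set is full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = None
        return False
```

A first-in-first-out scheduling would differ only in omitting the `move_to_end` call, so that recency of use never changes the eviction order.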
For example, in a superscalar processor, in order to improve performance, the processor needs to be capable of simultaneously executing a plurality of load/store instructions in each cycle, which requires a multi-port cache. However, because of the greater capacity of the multi-port cache and the use of a multi-port design, such a cache has a great negative impact on the area and speed of the chip; therefore, a multi-bank structure may be adopted.
For example, as shown in
With respect to the caches shown in
At least one embodiment of the present disclosure provides a cache system simulating method. The method includes: acquiring a cache system model; acquiring an instruction information record, in which the instruction information record includes a plurality of entries, each entry of the plurality of entries includes a request instruction and a first addressing address corresponding to the request instruction; reading at least one entry of the plurality of entries from the instruction information record; simulating access to the cache system model by using the request instruction and the first addressing address in each entry of the at least one entry to acquire statistical data of the cache system model; and updating the cache system model based on the statistical data.
A plurality of embodiments of the present disclosure further provide an apparatus, a device, and a storage medium corresponding to the above-described cache system simulating method.
The cache system simulating method, the apparatus, the device and the storage medium provided by at least one embodiment of the present disclosure separately model the cache system based on the instruction information record, without modeling the entire IP of the CPU or the GPU, which greatly reduces the workload for modeling and shortens the model convergence time, so that the performance data of the cache can be acquired quickly.
Hereinafter, at least one embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the same reference signs will be used in different drawings to refer to the same elements that have been described.
For example, as shown in
Step S10: acquiring a cache system model;
Step S20: acquiring an instruction information record;
Step S30: reading at least one entry of the plurality of entries from the instruction information record;
Step S40: simulating access to the cache system model by using a request instruction and a first addressing address in each entry to acquire statistical data of the cache system model; and
Step S50: updating the cache system model based on the statistical data.
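The flow of steps S10 to S50 can be sketched as follows; the two-field entry format follows the description of the instruction information record, while the direct-mapped placeholder model and its parameters are illustrative assumptions, not the disclosed implementation.

```python
# Sketch of steps S10-S50: replay trace entries against a cache system model
# and acquire statistical data. The model below is an assumed placeholder.

class DirectMappedModel:
    """Toy direct-mapped cache model (illustrative parameters)."""
    def __init__(self, num_lines=4, line_size=64):
        self.num_lines, self.line_size = num_lines, line_size
        self.tags = [None] * num_lines   # one tag per cache line

    def access(self, addr):
        """Return True on hit; on miss, fill the line and return False."""
        idx = (addr // self.line_size) % self.num_lines
        tag = addr // (self.line_size * self.num_lines)
        if self.tags[idx] == tag:
            return True
        self.tags[idx] = tag
        return False

def simulate(entries, model, target_hit_ratio):
    """Steps S30-S50: read entries, simulate access, gather statistics,
    and report whether the statistical data reaches the target data."""
    hits = 0
    for request, address in entries:     # each entry: (request instruction, address)
        if model.access(address):        # step S40: simulate access to the model
            hits += 1
    hit_ratio = hits / len(entries)      # statistical data
    return hit_ratio, hit_ratio >= target_hit_ratio
```

When the returned flag is false, step S50 would modify the configuration parameters of the model and the trace would be replayed again.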
For example, in step S10, the acquired cache system model may be, for example, a multi-level cache as shown in
For example, the cache system simulating method provided by at least one embodiment of the present disclosure further includes: creating the cache system model in step S10, for example, by using a script language. For example, the script language may be Perl or Python, or may also be another script language that can implement the function of modeling the cache system, which is not limited in the embodiments of the present disclosure.
For example, in step S20, the instruction information record includes a plurality of entries; and each entry of the plurality of entries includes a request instruction (request) and a first addressing address (address) corresponding to the request instruction. For example, the request instruction includes a load request instruction (load) or a store request instruction (store); and the first addressing address may be an address carried by the load request instruction or the store request instruction.
For example, the instruction information record may include trace log instruction information (trace log); the trace log instruction information may be directly acquired through a hardware platform or an open source website. For example, exemplary trace log instruction information may include the following contents:
In the embodiment of the present disclosure, the instruction information record, for example, the trace log instruction information, may be acquired through a hardware platform or an open source website, so that the cache system may be independently modeled by using the instruction information record. Since the instruction information record is easy to acquire, cache system simulation based on the instruction information has higher computing efficiency, and may undergo customized optimization as required by customers.
For example, in step S30, the at least one entry of the plurality of entries is read from the instruction information record, to acquire the request instruction and the first addressing address in each entry of the at least one entry. For example, the script language includes a system function for executing file reading; and information in the instruction information record may be directly read by calling the system function. For example, each time an entry in the instruction information record (e.g., a line in the trace log instruction information) is read, information such as the request instruction in the entry, the first addressing address corresponding to the request instruction, etc. may be acquired.
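Reading an entry to acquire the request instruction and the first addressing address might look as follows; the two-field line format is an assumption, since the present disclosure does not fix a concrete trace log layout.

```python
# Sketch of reading one entry (one trace log line) in step S30.
# The assumed line format is "<request> <hex address>", e.g. "load 0x1a2b".

def parse_entry(line):
    """Parse one trace log line into (request instruction, first addressing address)."""
    request, addr_text = line.split()
    return request, int(addr_text, 16)   # address given in hexadecimal (assumed)
```

In an actual script, each call to the file-reading system function would yield one such line, which is then parsed before simulating the access.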
For example, in step S40, by using the request instruction and the first addressing address in each entry read, a process of accessing the cache system model may be simulated; this mainly completes the mapping of the first addressing address corresponding to the request instruction to the ways, sets, banks, etc. in the cache. Specifically, the first addressing address may be compared with the address segments (tags) in a plurality of corresponding cache lines of the cache system model, to acquire statistical data of the cache system model.
For example, the statistical data may be a hit ratio or a bank conflict ratio of the cache, or may also be other data which reflects a functional state of the cache system, which is not limited in the embodiments of the present disclosure.
For example, in step S50, one or more configuration parameters in the cache system model are updated based on the statistical data acquired in step S40; for example, the address mapping or replacement strategies of the ways, sets, banks, etc. in the cache are updated to achieve an optimal cache hit ratio and minimal bank conflicts.
In the cache system simulating method provided by the embodiments of the present disclosure, the cache system may be modeled independently based on the instruction information record, without modeling the entire IP of the CPU or the GPU, which greatly reduces the workload for modeling and shortens the model convergence time, so that the performance data of the cache can be acquired quickly.
For example, by using the request instruction included in each entry of the at least one entry read from the instruction information record in step S30 and the first addressing address corresponding to the request instruction, access to the cache system model may be simulated to acquire the statistical data of the cache system model. For example, as shown in
Step S410: mapping the first addressing address to the cache system model to acquire a count value in a statistics counter;
Step S420: acquiring the statistical data according to the count value.
For example, in the embodiment of the present disclosure, the cache system model is set to have a first configuration parameter; and the first configuration parameter includes way, set, bank or replacement strategy, etc. For example, in step S410, the first addressing address may be mapped to the cache system model set to the first configuration parameter, by providing the statistics counter in the cache system model, to update the count value of the statistics counter. For example, in step S420, the statistical data is acquired according to the count value; and the method further includes: comparing the statistical data with target data to update the first configuration parameter. For example, the first configuration parameter of the cache is updated to make the statistical data reach an allowable range of the target data.
For example, as shown in
For example, as shown in
Then, step S30 as shown in
For example, in step S32, the entries in the instruction information record are read one by one. For example, the script language includes a system function (e.g., a $readfile function) for executing file reading. By calling the system function, the information in the instruction information record may be directly read. For example, in step S32, an entry in the instruction information record (e.g., a line in the trace log instruction information) may be read, to acquire information such as a request instruction in the entry, a first addressing address corresponding to the request instruction, etc.
For example, continue to execute step S40 as shown in
For example, as shown in
For example, in step S43, it is judged whether the comparison result of the first addressing address is cache hit: in response to the comparison result of the first addressing address being cache hit, in step S44, the count value of the counter is added by 1, and then proceed to step S45; in response to the comparison result of the first addressing address being cache miss, the count value of the counter remains unchanged, and step S45 is directly performed.
For example, in step S45, it is judged whether reading of the entries to be read in the instruction information record is completed: in response to the reading being completed, step S46 is directly performed; in response to the reading not yet being completed, the flow returns to step S32, in order to read a next entry in the instruction information record and execute the process of steps S41 to S45 for the entry.
For example, in step S46, the number of first addressing addresses mapped to the cache system model is m (e.g., m is the number of entries to be read in the information records counted in step S31); and a final update result of the first count value in the statistics counter is i, that is, a comparison result of i first addressing addresses is cache hit, so that the first statistical value (hit ratio) is acquired as i/m.
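The counting of steps S41 to S46 reduces to the following sketch, where the per-address comparison results are assumed to be available as booleans.

```python
# Sketch of steps S41-S46: accumulate the first count value i over m
# mapped first addressing addresses and return the first statistical value i/m.

def first_statistical_value(comparison_results):
    """comparison_results: one boolean per mapped address (True = cache hit)."""
    m = len(comparison_results)   # number of first addressing addresses mapped
    i = 0                         # first count value in the statistics counter
    for hit in comparison_results:
        if hit:
            i += 1                # step S44: count value is added by 1 on a hit
    return i / m                  # step S46: first statistical value (hit ratio)
```
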
For example, continue to execute step S50 as shown in
For example, after modifying the first configuration parameter, the cache system simulating method provided by at least one embodiment of the present disclosure is executed again until the acquired first statistical value is greater than or equal to the first target value (i.e., an optimal first statistical value is acquired).
For example, the first statistical value is the hit ratio; and the first target value is the target hit ratio. For example, the modifying the first configuration parameter may be modifying ways, sets, banks, or replacement strategies, etc. in the cache system model, to optimize the cache hit ratio.
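The comparison of the first statistical value with the first target value in step S50 can be sketched as follows; representing the first configuration parameter as a dictionary, and doubling the way count when the target is missed, are illustrative assumptions about how the parameter might be modified.

```python
# Sketch of step S50 for the hit ratio: output the first configuration
# parameter if the target is met, otherwise modify it and iterate.

def update_config(first_statistical_value, first_target_value, config):
    """Return (config, True) when the hit ratio meets the target hit ratio,
    or (modified config, False) when the model must be simulated again."""
    if first_statistical_value >= first_target_value:
        return config, True   # output as the target first configuration parameter
    # Illustrative modification: double the number of ways and retry.
    return dict(config, ways=config["ways"] * 2), False
```

In practice the modification could instead change sets, banks, or the replacement strategy, as the disclosure lists.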
For example, as shown in
For example, as shown in
Then, step S30 as shown in
For example, in step S302, the entries in the instruction information record are read one by one. For example, the script language includes a system function (e.g., a $readfile function) for executing file reading. By calling the system function, the information in the instruction information record may be directly read. For example, in step S302, an entry in the instruction information record (e.g., a line in the trace log instruction information) may be read, to acquire information such as a request instruction in the entry, a first addressing address corresponding to the request instruction, etc.
For example, continue to execute step S40 as shown in
For example, as shown in
For example, in step S403, it is judged whether the comparison result of the first addressing address is bank conflict: in response to the comparison result of the first addressing address being bank conflict, in step S404, the count value of the counter is added by 1, and then proceed to step S405; in response to the comparison result of the first addressing address being not bank conflict, the count value of the counter remains unchanged, and step S405 is directly performed.
For example, in step S405, it is judged whether reading of the entries to be read in the instruction information record is completed: in response to the reading being completed, step S406 is directly performed; in response to the reading not yet being completed, the flow returns to step S302, in order to read a next entry in the instruction information record and execute the process of steps S401 to S405 for the entry.
For example, in step S406, the number of first addressing addresses mapped to the cache system model is n (e.g., n is the number of entries to be read in the information records counted in step S301), and a final update result of the second count value in the statistics counter is j, that is, a comparison result of j first addressing addresses is bank conflict, so that the second statistical value (bank conflict ratio) is acquired as j/n.
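Bank conflict counting in steps S401 to S406 can be sketched as follows; modeling a conflict as two same-cycle accesses selecting the same bank, together with the assumed bank count and bank-select bits, are illustrative assumptions rather than definitions from the present disclosure.

```python
# Sketch of steps S401-S406: count bank conflicts (second count value j)
# among n mapped first addressing addresses, grouped by issue cycle.

NUM_BANKS = 8   # assumed number of banks
LINE = 64       # assumed cache line size in bytes

def bank_of(addr):
    """Assumed bank-select function: low line-index bits choose the bank."""
    return (addr // LINE) % NUM_BANKS

def bank_conflict_ratio(access_groups):
    """Each group holds the addresses issued in one cycle; an address whose
    bank is already in use that cycle counts as a bank conflict."""
    n = j = 0
    for group in access_groups:
        used = set()
        for addr in group:
            n += 1
            bank = bank_of(addr)
            if bank in used:
                j += 1            # step S404: second count value added by 1
            used.add(bank)
    return j / n                  # step S406: second statistical value j/n
```
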
For example, continue to execute step S50 as shown in
For example, after modifying the first configuration parameter, the cache system simulating method provided by at least one embodiment of the present disclosure is executed again until the second statistical value acquired is less than or equal to the second target value (i.e., an optimal second statistical value is acquired).
For example, the second statistical value is the bank conflict ratio, and the second target value is the target bank conflict ratio. For example, the modifying the first configuration parameter may be modifying ways, sets, banks, or replacement strategies, etc. in the cache system model to minimize bank conflicts.
For example, at least one embodiment of the present disclosure provides an apparatus for cache system simulation. As shown in
For example, the acquiring circuit 210 is configured to acquire a cache system model and acquire an instruction information record. For example, the instruction information record includes a plurality of entries; each entry of the plurality of entries includes a request instruction and a first addressing address corresponding to the request instruction. That is, the acquiring circuit 210 may be configured to execute steps S10 to S20 shown in
For example, the simulating access circuit 220 is configured to read at least one entry of the plurality of entries from the instruction information record, simulate access to the cache system model by using the request instruction and the first addressing address in each entry of at least one entry to acquire statistical data of the cache system model. That is, the simulating access circuit 220 may be configured to execute steps S30 to S40 shown in
For example, the updating circuit 230 is configured to update the cache system model based on the statistical data. That is, the updating circuit 230 may be configured to execute step S50 shown in
Since in the process of, for example, the cache system simulating method shown in
It should be noted that the above-described respective circuits in the apparatus 200 for cache system simulation shown in
In addition, although the apparatus 200 for cache system simulation is divided into circuits respectively configured to execute corresponding processing when described above, it is clear to those skilled in the art that the processing executed by respective circuits may also be executed without any specific circuit division in the apparatus or any clear demarcation between the respective circuits. In addition, the apparatus 200 for cache system simulation as described above with reference to
At least one embodiment of the present disclosure further provides a device for cache system simulation; the device includes a processor and a memory; the memory includes computer programs; the computer programs are stored in the memory and configured to be executed by the processor; and the computer programs are used to implement the above-described cache system simulating method provided by embodiments of the present disclosure.
For example, as shown in
For example, the processor 310 may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or other form of processing unit having a data processing capability and/or a program execution capability, for example, a Field Programmable Gate Array (FPGA), etc.; for example, the Central Processing Unit (CPU) may be of an X86 or ARM architecture. The processor 310 may be a general purpose processor or a special purpose processor, and may control other components in the device 300 for cache system simulation to execute desired functions.
For example, the memory 320 may include any combination of one or more computer program products; and the computer program products may include various forms of computer readable storage media, for example, a volatile memory and/or a non-volatile memory. The volatile memory may include, for example, a Random Access Memory (RAM) and/or a cache, or the like. The non-volatile memory may include, for example, a Read Only Memory (ROM), a hard disk, an Erasable Programmable Read Only Memory (EPROM), a Portable Compact Disk Read Only Memory (CD-ROM), a USB memory, a flash memory, or the like. Computer programs may be stored on the computer readable storage medium, and the processor 310 may run the computer programs, to implement various functions of the device 300. Various applications and various data, as well as various data used and/or generated by the applications may also be stored on the computer readable storage medium.
It should be noted that in the embodiments of the present disclosure, the above description of the cache system simulating method provided by at least one embodiment of the present disclosure may be referred to for specific functions and technical effects of the device 300 for cache system simulation, and no details will be repeated here.
For example, as shown in
For example, as shown in
Although
The above description of the cache system simulating method may be referred to for detailed description and technical effects of the device 400 for cache system simulation, and no details will be repeated here.
For example, as shown in
For example, the storage medium 500 may be applied to the above-described device 300 for cache system simulation. For example, the storage medium 500 may be a memory 320 in the device 300 shown in
The above description of the cache system simulating method and the device for cache system simulation in the embodiments may be referred to for the technical effects of the storage medium provided by the embodiments of the present disclosure, and no details will be repeated here.
The following points need to be noted:
(1) In the drawings of the embodiments of the present disclosure, only the structures related to the embodiments of the present disclosure are involved, and other structures may refer to the common design(s).
(2) In case of no conflict, features in one embodiment or in different embodiments of the present disclosure may be combined.
The above are merely particular embodiments of the present disclosure but are not limitative of the scope of the present disclosure; anyone skilled in the related arts may easily conceive variations and substitutions within the technical scopes disclosed by the present disclosure, which should be encompassed in the protection scope of the present disclosure. Therefore, the scope of the present disclosure should be defined by the appended claims.