The present application claims priority to Korean Patent Application No. 10-2023-0179295, filed on Dec. 12, 2023, the entire contents of which are incorporated herein for all purposes by this reference.
The present disclosure is derived from research conducted as part of the SW Computing Industry Core Technology Development of the Ministry of Science and ICT (Project Unique Number: 1711193834, Project Serial Number: 2021-0-00754-003, Project Management Agency: National IT Industry Promotion Agency, Research Project Name: Artificial Intelligence Semiconductor Design SW (Software) Development, Project Execution Agency Name: Korea Advanced Institute of Science and Technology, Research Period: 2023.01.01~2023.12.31), supported by the Ministry of Trade, Industry and Energy of the Republic of Korea.
Meanwhile, the government of the Republic of Korea holds no property interest in any aspect of the inventive concept.
The present disclosure relates to a method for generating training data and, more particularly, to a method for generating training data for training an artificial intelligence model capable of automatically designing a cache memory structure.
Automating the design of cache memory structures is necessary for various reasons. First, manually designing an appropriate cache memory structure is difficult because of complex hardware components and diverse application requirements. A number of variables and constraints must be considered, and many experiments are required to obtain optimal performance.
In addition, new architectures and memory hierarchy structures continuously emerge as technology evolves. Automated tools make it possible to keep pace with the latest technologies and to shorten development and design cycles.
Automated design of cache memory structures can be efficiently performed by utilizing machine learning and optimization algorithms. Through this, it is possible to automatically find and adjust memory structures optimized for specific applications or tasks. This can contribute to improving performance and energy efficiency, and to relieving developers of the burden of finding the optimal memory structure one by one.
A task of the present disclosure relates to a method for generating training data for a cache memory design.
In a method for generating training data for cache memory design performed by at least one processor, the method for generating training data for cache memory design according to an exemplary embodiment includes setting a reuse profile, setting a first selection reuse distance based on the reuse profile, setting a first load index and a first real reuse distance based on the first selection reuse distance, modifying the reuse profile according to setting the first real reuse distance, and setting a second selection reuse distance based on the modified reuse profile.
Herein, the setting the reuse profile may include setting the number of reuse distances and a minimum reuse distance, setting a first reuse distance and a second reuse distance based on a preset mathematical formula, the number of reuse distances and the minimum reuse distance, and setting a first probability for the first reuse distance and a second probability for the second reuse distance.
Herein, the first probability and the second probability may be set based on random number generation, and a sum of probabilities corresponding to each reuse distance included in the reuse profile may be “1”.
Herein, the setting the reuse profile may include setting the number of reuse distances and a maximum reuse distance, setting a plurality of reuse distances corresponding to the number of reuse distances within a numerical value range of the maximum reuse distance by using a random function, and setting a probability for each of the plurality of reuse distances by using the random function.
Herein, the setting the first load index and the first real reuse distance may include checking whether the first selection reuse distance has ever been used, setting a default value as the first real reuse distance when the first selection reuse distance has never been used, and setting the first load index corresponding to the first real reuse distance by using a numerical value within a range of an index.
Herein, setting a second real reuse distance having the same numerical value as the first selection reuse distance after setting the first real reuse distance may be included, wherein the number of real reuse distances present between the first real reuse distance and the second real reuse distance may be the same as the numerical value of the first selection reuse distance.
Herein, setting a second load index corresponding to the second real reuse distance may be further included, wherein the second load index may have the same numerical value as the first load index.
Herein, further included may be setting the second selection reuse distance based on the modified reuse profile after setting the second real reuse distance, determining whether the number of non-duplicated numerical values from the second load index to a third load index is the same as a value obtained by adding “1” to a numerical value of the second selection reuse distance, the number of hops between the second load index and the third load index being the same as the numerical value of the second selection reuse distance, and setting a fourth load index corresponding to the second selection reuse distance to be the same as the third load index when a result of the determination is positive.
Herein, further included may be setting the second selection reuse distance based on the modified reuse profile after setting the second real reuse distance, determining whether the number of non-duplicated numerical values from the second load index to a third load index is the same as a value obtained by adding “1” to a numerical value of the second selection reuse distance, the number of hops between the second load index and the third load index being the same as the numerical value of the second selection reuse distance, setting the default value to a third real reuse distance corresponding to the second selection reuse distance when a result of the determination is negative, and setting a fifth load index corresponding to the third real reuse distance by using a numerical value within a range of the index, the fifth load index being different from the first load index and the second load index.
Herein, the setting the first load index and the first real reuse distance may include checking whether the first selection reuse distance has ever been used, and setting the first load index corresponding to the first selection reuse distance to be the same as a sixth load index when the first selection reuse distance has ever been used, wherein the number of non-duplicated numerical values from the first load index to the sixth load index may be the same as a value obtained by adding “1” to the numerical value of the first selection reuse distance.
Herein, the modifying the reuse profile may include subtracting the number of uses of the first real reuse distance from a first access number for the first real reuse distance included in the reuse profile, and modifying a first probability for the first real reuse distance based on the modified first access number.
Herein, a computer program stored in a computer-readable recording medium may be provided in order to execute the method for generating training data for the cache memory design.
A computing device according to an exemplary embodiment may include a memory, and at least one processor connected to the memory and configured to execute at least one computer-readable program included in the memory, wherein the at least one program includes instructions for setting a reuse profile, setting a first selection reuse distance based on the reuse profile, setting a first load index and a first real reuse distance based on the first selection reuse distance, modifying the reuse profile according to setting the first real reuse distance, and setting a second selection reuse distance based on the modified reuse profile.
A system for generating training data for a cache memory design based on artificial intelligence according to an exemplary embodiment includes a training data generation unit for generating training data by using a reuse distance, and a cache structure design unit configured to design a cache structure for an application by using an artificial intelligence model trained based on the training data, wherein the training data generation unit includes a first module for generating the reuse profile, and a second module for setting a first selection reuse distance based on the reuse profile, setting a first load index and a first real reuse distance based on the first selection reuse distance, modifying the reuse profile according to setting the first real reuse distance, and setting a second selection reuse distance based on the modified reuse profile.
A method of generating training data for a cache memory design may be provided according to an exemplary embodiment of the present disclosure.
The exemplary embodiments described in the present specification are intended to clearly explain the idea of the present disclosure to those skilled in the art to which the present disclosure belongs, so the present disclosure is not limited to the exemplary embodiments described in the present specification, and the scope of the present disclosure should be interpreted as including modifications or variations that do not depart from the spirit of the present disclosure.
The terms used in this specification are, as far as possible, general terms that are currently in wide use and are selected in consideration of the functions of the present disclosure, but their meaning may vary depending on the intention of a person with ordinary knowledge in the technical field to which the present disclosure belongs, precedents, or the emergence of new technologies. However, when a specific term is defined and used in an arbitrary meaning, the meaning of the term will be described separately. Therefore, the terms used in the present specification should be interpreted based on the practical meaning of the term and the overall contents of the present specification, not just the name of the term.
The drawings attached to the present specification are intended to facilitate the description of the present disclosure, and the shape shown in the drawings may be exaggerated and displayed as necessary to help understanding of the present disclosure, and thus the present disclosure is not limited by the drawings.
In the present specification, when it is determined that a detailed description of a known configuration or function related to the present disclosure may obscure the gist of the present disclosure, a detailed description thereof will be omitted as necessary.
Referring to
Specifically, the memory trace module may extract a memory trace of an application in the AI accelerator. The feature extraction module may extract data locality features from the memory trace extracted by the memory trace module. The simulation module may measure the performance of various cache structures and label the training dataset accordingly. The model training unit may train the artificial intelligence model with the configured training dataset. Accordingly, the inference part may include a module capable of automatically designing a cache memory structure by using the trained artificial intelligence model.
However, because the conventional system for cache memory design based on artificial intelligence extracts features directly from an application, it may have the following problems. First, the training model may utilize a lot of information related to input data, but this may lead to an overfitting problem where the model fits the training data too closely.
In addition, the model may be more likely to learn features that are specific to a particular task or application, so the ability to generalize the training model may decrease, and the performance for other applications may be degraded in the inference process. In addition, feature extraction may be a time-consuming and expensive process, so there may be a disadvantage of consuming a lot of time and money since the amount of memory traces in an application is huge.
Accordingly, the present disclosure proposes a method for generating training data for training an artificial intelligence model by introducing a trace generator without directly extracting features from an application.
Referring to
The first module may generate a reuse profile. The second module may set a first selection reuse distance based on the reuse profile, set a first load index and a first real reuse distance based on the first selection reuse distance, modify the reuse profile according to setting the first real reuse distance, and set a second selection reuse distance based on the modified reuse profile.
The training of the artificial intelligence model may be performed by using the training data generated based on the first module and the second module. The trained artificial intelligence model may design a cache memory structure in the inference part.
The present disclosure may generate training data for a cache memory design based on a reuse distance. Hereinafter, a reuse distance will be described.
In order to efficiently utilize the cache, the hit rate of the cache should be maximized. As shown in
The cache may have a limited size, so when a memory address that does not exist in the cache is accessed, the cache may fetch the data from the main memory. In this case, the cache may evict as much data as it fetches from the main memory, and may write the evicted data back to the main memory. There may be a number of cache replacement policies, including LRU, FIFO, and RR, which are methods for determining which data the cache evicts and writes back, and the reuse distance may be one of the main indicators. Accordingly, a reuse distance analysis may be used as one of the best ways of characterizing the memory operation of an application, which enables optimizing the cache.
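The link between a replacement policy and the reuse distance can be illustrated with a minimal sketch. It assumes a fully associative LRU cache that holds whole addresses; the function name `lru_hits` and the short trace are illustrative and are not taken from the disclosure. Under these assumptions, an access hits exactly when the number of unique addresses touched since the previous access to the same address is smaller than the cache capacity.

```python
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Count hits for a fully associative LRU cache holding `capacity` addresses."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used address
            cache[addr] = True
    return hits

# Illustrative trace; with capacity 3 only reuses at distance 0, 1, or 2 can hit.
trace = ["a", "b", "c", "a", "a", "c", "b", "d", "a"]
print(lru_hits(trace, 3))  # → 4
```

Raising the capacity to 4 turns the final access to "a" into a hit as well, because three unique addresses were touched since its previous access.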
Referring to
Specifically, referring to
When the fourth address is “a”, “a” has been accessed before, so it may be necessary to check the number of accesses of unique (non-duplicated) addresses between the first address and the fourth address. Since the number of accesses of unique addresses between the first address and the fourth address is “2” (“b” and “c”), the fourth reuse distance may become “2”.
When the fifth address is “a”, “a” has been accessed before, so it may be necessary to check the number of accesses of unique addresses between the fifth address and the fourth address (“a” accessed just before). Since the number of accesses of unique addresses between the fifth address and the fourth address may be “0”, the fifth reuse distance may become “0”.
When the sixth address is “c”, “c” has been accessed before, so it may be necessary to check the number of accesses of unique addresses between the sixth address and the third address (“c” accessed just before). Since the number of accesses of unique addresses between the sixth address and the third address is “1” (“a”), the sixth reuse distance may become “1”.
When the seventh address is “b”, “b” has been accessed before, so it may be necessary to check the number of accesses of unique addresses between the seventh address and the second address (“b” accessed just before). Since the number of accesses of unique addresses between the seventh address and the second address is “2” (“a” and “c”), the seventh reuse distance may become “2”.
When the eighth address is “d”, the eighth reuse distance may become “−1” since “d” is the memory address that has been accessed for the first time.
When the ninth address is “a”, “a” has been accessed before, so it may be necessary to check the number of accesses of unique addresses between the ninth address and the fifth address (“a” accessed just before). Since the number of accesses of unique addresses between the ninth address and the fifth address is “3” (“b”, “c”, “d”), the ninth reuse distance may become “3”.
As described above, the reuse distance may be determined based on the number of accesses of unique addresses between accessed addresses, thereby becoming an indicator representing the locality of the memory access patterns.
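The walkthrough above can be reproduced with a short sketch. The function name `reuse_distances` is an assumption made for illustration; the address sequence is the one implied by the description (first to ninth address), and “−1” marks a first access as in the text. The quadratic scan is chosen for readability rather than speed.

```python
def reuse_distances(trace, first_access=-1):
    """Return the reuse distance of every access in `trace`.

    The reuse distance is the number of unique addresses accessed between
    the current access and the previous access to the same address.
    """
    last_seen = {}  # address -> position of its most recent access
    result = []
    for pos, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1 : pos]
            result.append(len(set(between)))
        else:
            result.append(first_access)  # address accessed for the first time
        last_seen[addr] = pos
    return result

trace = ["a", "b", "c", "a", "a", "c", "b", "d", "a"]
print(reuse_distances(trace))  # → [-1, -1, -1, 2, 0, 1, 2, -1, 3]
```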
Referring to
The reuse profile may include, for each reuse distance, information on the number of accesses (appearances) set for that reuse distance. In addition, the reuse profile may include information on a probability based on the number of accesses of the reuse distance. In this case, a sum of probabilities for each reuse distance may be “1”. The number of accesses of the reuse distance may be randomly set, or may be set based on a predetermined equation. The present disclosure may generate a variety of training data by controlling the number of accesses of the reuse distance, the number of reuse distances, and the numerical value of the reuse distance.
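As a minimal sketch of such a profile, the access numbers can be kept in a dictionary and normalized into probabilities. The function name `build_profile` and the access numbers used here are hypothetical values chosen for illustration, not figures from the drawings.

```python
import math

def build_profile(access_counts):
    """Map each reuse distance to its access number and probability."""
    total = sum(access_counts.values())
    if total <= 0:
        raise ValueError("the profile needs at least one access")
    return {
        rd: {"count": count, "prob": count / total}
        for rd, count in access_counts.items()
    }

# Hypothetical access numbers; -1 is the default (first-access) reuse distance.
profile = build_profile({-1: 20, 4: 30, 8: 50})
print(profile[4]["prob"])  # → 0.3
print(math.isclose(sum(entry["prob"] for entry in profile.values()), 1.0))  # → True
```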
Referring to
The setting the reuse profile (S100) may be generating the reuse profile as shown in
Referring to
The setting the number of reuse distances and the minimum reuse distance (S111) may be setting how many reuse distances are to be included and used in the reuse profile and the minimum of the numerical value (a numerical value excluding the default value of “−1”) of the reuse distance value. Referring to
The setting the first reuse distance and the second reuse distance (S112) may be setting the first reuse distance and the second reuse distance based on a preset mathematical formula. In the example of
For example, when “m” is “2”, the second reuse distance of “49” may be set by using 2b − 1. Likewise, when “m” is “3”, the third reuse distance of “74” may be set by using 3b − 1. By repeating this, the reuse distance may be set until the number of reuse distances excluding the default value (“−1”) becomes “n”.
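The formula-based setting can be sketched as follows. The value b = 25 is inferred here from the stated results (2b − 1 = 49 and 3b − 1 = 74); treating the first reuse distance as b − 1 is an assumption that follows the same pattern with m = 1, and the function name is illustrative.

```python
def formula_reuse_distances(n, b):
    """Return n reuse distances of the form m * b - 1 for m = 1 .. n."""
    if n < 1 or b < 1:
        raise ValueError("n and b must be positive")
    return [m * b - 1 for m in range(1, n + 1)]

# b = 25 is inferred from the examples in the text, not stated in it.
print(formula_reuse_distances(3, 25))  # → [24, 49, 74]
```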
The setting the first probability and the second probability (S113) may be setting the number of accesses (appearances) for each reuse distance set in the step S112 and the probability based thereon. In the example of
Referring to
The setting the number of reuse distances and the maximum reuse distance (S121) may be setting how many reuse distances are to be included and used in the reuse profile and the maximum of the numerical value (a numerical value excluding the default value of “−1”) of the reuse distance value. Referring to
The setting the plurality of reuse distances corresponding to the number of reuse distances (S122) may be setting a plurality of reuse distances corresponding to the number of reuse distances within a numerical range of the maximum reuse distance by using a random function. Referring to
The setting the probability for each reuse distance (S123) may be setting the probability for each of the plurality of reuse distances by using a random function. Referring to
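A minimal sketch of the random variant (steps S121 to S123) is given below. It assumes that reuse distances are drawn without repetition from 0 up to the maximum reuse distance, and that probabilities come from random access numbers that are then normalized; the function name, the range of the access numbers, and the seed are illustrative choices.

```python
import math
import random

def random_profile(n, max_rd, rng=None):
    """Draw n distinct reuse distances in [0, max_rd] with random probabilities."""
    rng = rng or random.Random()
    # random.sample raises ValueError if n exceeds the number of candidates.
    distances = sorted(rng.sample(range(max_rd + 1), n))
    counts = [rng.randint(1, 1000) for _ in distances]  # random access numbers
    total = sum(counts)
    return {rd: count / total for rd, count in zip(distances, counts)}

profile = random_profile(n=5, max_rd=100, rng=random.Random(0))
print(len(profile), math.isclose(sum(profile.values()), 1.0))  # → 5 True
```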
Referring back to
The setting the first load index and the first real reuse distance (S300) may be setting the first load index and the first real reuse distance corresponding to the first selection reuse distance based on the first selection reuse distance set in the step S200.
The modifying the reuse profile (S400) may be modifying the reuse profile generated in the step S100 in consideration of the parameters set in the steps S200 and S300. Specifically, the step S400 may be modifying an access number for the reuse distance and a probability.
The setting the second selection reuse distance (S500) may be setting the selection reuse distance as in the step S200 by the reuse profile modified in the step S400.
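One way to realize the selection in the steps S200 and S500 is weighted random sampling over the remaining access numbers. This is an assumption: the text states only that the selection is based on the reuse profile. The function name and the access numbers below are hypothetical.

```python
import random

def select_reuse_distance(counts, rng):
    """Pick a reuse distance with probability proportional to its remaining count."""
    candidates = [rd for rd, count in counts.items() if count > 0]
    if not candidates:
        raise ValueError("every access number in the profile is exhausted")
    weights = [counts[rd] for rd in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {-1: 20, 4: 30, 8: 50}  # hypothetical access numbers
selected = select_reuse_distance(counts, rng)
print(selected in counts)  # → True
```

Because exhausted reuse distances are filtered out, a reuse distance whose access number has reached zero can no longer be selected.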
The steps S200 to S500 will be described in detail below with reference to the flowchart of
Referring to
After the step S324, determining whether the number of non-duplicated numerical values from the second load index to the third load index is the same as a value obtained by adding “1” to the numerical value of the second selection reuse distance may be performed (S325). When the result of the step S325 is positive, setting the fourth load index to be the same as the third load index may be performed (S326). When the result of the step S325 is negative, setting a fifth load index corresponding to the third real reuse distance may be performed (S327).
The steps S310 to S331 are described using names such as the first selection reuse distance, the second real reuse distance, and the like, but the first selection reuse distance may become the second selection reuse distance or the third selection reuse distance. That is, the steps of
Referring to
The processor may set the first load index 21 and the first real reuse distance 31 according to the first selection reuse distance. Specifically, the processor may set the first real reuse distance 31 based on whether the first selection reuse distance 11 has ever been used (S310). In the example of
The processor may set the first load index 21 corresponding to the first real reuse distance 31. Specifically, the processor may randomly set a numerical value of a first load index 21 according to the first selection reuse distance 11 within a range of a predetermined index (e.g., “0” to “10000”). In the example of
Thereafter, the processor may set the second real reuse distance 32. The processor may set the numerical value of the second real reuse distance 32 to be the same as the numerical value of the first selection reuse distance 11. In this case, the number of real reuse distances present between the first real reuse distance 31 and the second real reuse distance 32 may be the same as the numerical value of the first selection reuse distance 11. In the example of
The processor may set the second load index 22 corresponding to the second real reuse distance 32. In this case, the second load index 22 may have the same numerical value as the first load index 21. This is because the number of non-duplicated load indices from the first real reuse distance 31 to the second real reuse distance 32 (wherein the first real reuse distance 31 and the second real reuse distance 32 are included) should be the same as the sum of the numerical value of the second real reuse distance 32 and “1”. This may be due to the definition of the reuse distance, and the second load index 22 may be automatically set by the definition of the reuse distance since the second real reuse distance 32 is set.
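The relationship between the two equal load indices can be sketched as follows. The sketch assumes that the slots between the first and second load index are filled with fresh, mutually distinct indices drawn from the index range “0” to “10000” mentioned above; the function name `place_pair` is illustrative and the sketch covers only this never-used case.

```python
import random

def place_pair(d, index_max=10000, rng=None):
    """Return d + 2 load indices whose last entry reuses the first at reuse distance d."""
    rng = rng or random.Random()
    picks = rng.sample(range(index_max + 1), d + 1)  # d + 1 distinct indices
    first, between = picks[0], picks[1:]
    return [first] + between + [first]

segment = place_pair(4, rng=random.Random(1))
# The non-duplicated load indices, both ends included, number d + 1.
print(len(segment), len(set(segment)))  # → 6 5
```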
After setting the real reuse distance, the processor may modify the reuse profile. This process may be performed whenever the real reuse distance is set.
Referring to
In the example of
In addition, for example, since the real reuse distance is set to “4” at the 5-th length, the processor may subtract “1”, the number of uses of “4”, from the access number corresponding to “4” in the reuse profile. Accordingly, the access number of “4” may be modified from “30” to “29”. After subtracting the number of uses from the access number, the processor may recalculate the probability for each reuse distance based on the modified access number.
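The profile update can be sketched as a decrement followed by renormalization. Only the access number “30” for the reuse distance “4” comes from the example; the other access numbers and the function name `consume` are hypothetical.

```python
import math

def consume(counts, rd, uses=1):
    """Subtract `uses` from the access number of `rd` and recompute probabilities."""
    if counts.get(rd, 0) < uses:
        raise ValueError(f"reuse distance {rd} has no remaining accesses")
    updated = dict(counts)  # leave the caller's profile untouched
    updated[rd] -= uses
    total = sum(updated.values())
    probs = {k: (v / total if total else 0.0) for k, v in updated.items()}
    return updated, probs

counts = {-1: 20, 4: 30, 8: 50}  # hypothetical profile; "4" starts at 30
updated, probs = consume(counts, 4)
print(updated[4])  # → 29
print(math.isclose(sum(probs.values()), 1.0))  # → True
```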
Referring back to
In order to set a fourth load index corresponding to the second selection reuse distance 12, the processor may check whether the number of non-duplicated numerical values from the second load index 22 to the third load index 23 is the same as the value obtained by adding “1” to the numerical value of the second selection reuse distance 12 (S325). In this case, the third load index 23 may refer to the load index whose number of hops from the second load index 22 is the same as the numerical value of the second selection reuse distance 12.
In the example of
In the example of
However, the number of non-duplicated numerical values from the second load index 22 to the third load index 23 may not be the same as the value obtained by adding “1” to the numerical value of the second selection reuse distance 12. This case may be described below with reference to the fourth selection reuse distance 14.
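The check in the step S325 can be sketched as a window test over the most recently set load indices. The function name and the sample indices are illustrative assumptions; the sketch returns the load index to reuse when the check is positive and `None` when it is negative, in which case the default value and a fresh load index would be used instead.

```python
def next_load_index(load_indices, d):
    """Return the load index to append for reuse distance d, or None if the check fails."""
    if d < 0 or d + 1 > len(load_indices):
        return None
    window = load_indices[-(d + 1):]  # last index and the d hops before it
    if len(set(window)) == d + 1:     # all values in the window are distinct
        return window[0]              # reuse the index found d hops back
    return None

history = [7, 1, 2, 3, 4, 7]  # hypothetical load indices set so far
print(next_load_index(history, 4))  # → 1
print(next_load_index(history, 5))  # → None
```

In the first call the window `[1, 2, 3, 4, 7]` holds five distinct values, so appending `1` yields a reuse distance of 4. In the second call the value `7` appears twice in the window, so the count of non-duplicated values is 5 rather than 6 and the check is negative.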
After setting the fourth load index 24, the processor may set the third selection reuse distance 13 based on the modified reuse profile. In the example of
Since “4” at the 0-th length has already been used as the first selection reuse distance 11, the processor may set a load index corresponding to the third selection reuse distance 13 by performing the step S331. Accordingly, the processor may determine the numerical value of the load index corresponding to the third selection reuse distance 13 as “16” and determine the real reuse distance as “4”, such that the number of non-duplicated numerical values between load indices becomes “5” (“4”+“1”).
Thereafter, the processor may set the fourth selection reuse distance 14 based on the modified reuse profile. In the example of
Since the numerical value “8” has never been used, the processor may set the third real reuse distance corresponding to the fourth selection reuse distance 14 to the default value of “−1”. In addition, the processor may perform the step S325 for the fourth selection reuse distance 14. In the example of
Referring to
Referring to
On the other hand, the present disclosure may ensure that the entire area is used evenly without the coverage area by setting the access number (N) and the maximum reuse distance (RD) as shown in
The method according to an exemplary embodiment may be implemented in the form of program commands that may be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program commands, data files, data structures, and the like alone or in combination. Program commands recorded on the medium may be specially designed and configured for exemplary embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media may include magnetic media such as hard disks, floppy disks and magnetic tapes, optical media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices specifically configured to store and perform program commands such as ROMs, RAMs, and flash memories. Examples of program commands may include not only machine language codes such as those produced by a compiler, but also high-level language codes that are executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate as one or more software modules in order to perform the operations of the exemplary embodiments, and vice versa.
As described above, the exemplary embodiments have been described by limited exemplary embodiments and drawings, but various modifications and variations from the above description may be possible to those of ordinary skill in the art. For example, appropriate results may be achieved even when the described techniques are performed in a different order from the described method, and/or the described components such as systems, structures, devices, circuits, etc. may be combined or assembled in a different form from the described method, or may be replaced or substituted by other components or equivalents.
Therefore, other implementations, other exemplary embodiments, and those equivalent to the patent claims may fall within the scope of the claims described below.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0179295 | Dec 2023 | KR | national |