This application is a National Stage application under 35 U.S.C. § 371 of International Application No. PCT/JP2019/029789, having an International Filing Date of Jul. 30, 2019. The disclosure of the prior application is considered part of the disclosure of this application, and is incorporated in its entirety into this application.
The present invention relates to a cache usage indicator calculation apparatus, a cache usage indicator calculation method, and a cache usage indicator calculation program.
A cache memory is a high-speed small-capacity memory that is used to conceal delay in a main memory when a central processing unit (CPU) accesses data, a command, or the like. A cache memory is important when a computer increases the speed of processing of an application.
The horizontal axis of the graph indicates the cache occupancy amount, and the vertical axis of the graph indicates the normalized performance. The diamond-shaped plots are plots relating to Povray. The square plots are plots relating to Bzip2. The triangular plots are plots relating to MCF. The X-shaped plots are plots relating to Bwaves.
As indicated by this graph, when the occupancy amount of the cache decreases, the cache miss count increases. With Bwaves, the performance decreases by about 60% when the cache occupancy amount is 0 MB compared to when the cache occupancy amount is 15 MB.
Cache Allocation Technology refers to a function by which a CPU can restrict and control use of a final-level cache for each application.
As described in NPL 2, Cache Allocation Technology, a function by which use of a final-level cache can be restricted and controlled for each application, has been installed in processors since the Intel Xeon (registered trademark) processor E5 2600 v3 product family released in September 2014.
With these processors, it is possible to determine the level of Class of Service (CLOS) for each application process, and the range up to which the final-level cache is to be used for each CLOS is controlled with a capacity mask.
In the example shown in
For CLOS[1], four 1s are set from the 11th bit to the 8th bit, which indicates that ¼ of the final-level cache is used. Note that 1s are set from the 11th bit to the 8th bit only for CLOS[1], which indicates that an application of CLOS[1] occupies the regions corresponding to these bits.
For CLOS[2], six 1s are set from the 7th bit to the 2nd bit, which indicates that ⅜ of the final-level cache is used. Note that 1s are similarly set from the 7th bit to the 2nd bit for CLOS[3] as well, which indicates that applications of CLOS[2] and CLOS[3] share the regions corresponding to these bits.
For CLOS[3], eight 1s are set from the 7th bit to the 0th bit, which indicates that ½ of the final-level cache is used. Note that 1s are similarly set from the 7th bit to the 2nd bit for CLOS[2] as well, which indicates that an application of CLOS[2] and an application of CLOS[3] share the regions corresponding to these bits. In addition, 1s are set from the 1st bit to the 0th bit only for CLOS[3], which indicates that an application of CLOS[3] occupies the regions corresponding to these bits.
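The capacity-mask arithmetic in this example can be checked with a short sketch. It assumes a 16-bit capacity mask, which is inferred from the stated fractions (four 1s corresponding to ¼ of the cache) rather than stated in the text. The function names are hypothetical and are not part of any processor interface.

```python
# Assumption: the capacity mask is 16 bits wide (inferred from "four 1s = 1/4").
MASK_BITS = 16


def mask_from_range(high_bit, low_bit):
    """Return a mask with 1s set from high_bit down to low_bit, inclusive."""
    width = high_bit - low_bit + 1
    return ((1 << width) - 1) << low_bit


# Masks taken from the example in the text.
clos_masks = {
    1: mask_from_range(11, 8),  # bits 11..8
    2: mask_from_range(7, 2),   # bits 7..2
    3: mask_from_range(7, 0),   # bits 7..0
}


def cache_fraction(mask):
    """Fraction of the final-level cache covered by the mask."""
    return bin(mask).count("1") / MASK_BITS


def shared_bits(mask_a, mask_b):
    """Regions that two classes of service both may use."""
    return mask_a & mask_b


def exclusive_bits(clos, masks):
    """Regions that only the given class of service may use."""
    others = 0
    for other, mask in masks.items():
        if other != clos:
            others |= mask
    return masks[clos] & ~others
```

Under these assumptions, `shared_bits(clos_masks[2], clos_masks[3])` covers bits 7 to 2, and `exclusive_bits(3, clos_masks)` covers bits 1 and 0, matching the description above.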
The degree to which the performance obtained when multiple applications operate deteriorates relative to the performance obtained when a single application operates is called the cache sensitivity of the application. When performing tuning using Cache Allocation Technology, the inventors thought of a policy of allocating a large amount of cache memory to an application with a high cache sensitivity. This makes it possible to improve the performance of the application.
The degree to which the performance of another application 11 deteriorates when multiple applications operate is called the cache pollutivity of that application. The inventors thought of a policy of allocating a small amount of cache memory to an application with a high cache pollutivity. This makes it possible to suppress the influence on another application when the application operates.
In view of this, the present invention aims to calculate the cache sensitivity and the cache pollutivity based on the cache usage state of each application.
In order to solve the above-described problem, a cache usage indicator calculation apparatus of the present invention includes:
a memory for reading and writing data;
a cache that can be accessed more rapidly than the memory;
a central processing unit configured to read and write from and to the memory and the cache and execute processing;
a usage state measurement unit configured to measure a usage state of the cache used by an application executed by the central processing unit;
a performance measurement unit configured to measure a cache sensitivity and a cache pollutivity relating to an application; and
an indicator calculation unit configured to, based on performance deterioration of a pre-selected plurality of applications and the usage state of the cache, calculate an indicator for the cache sensitivity and/or an indicator for the cache pollutivity of each application.
Other means will be described in the mode for carrying out the invention.
According to the present invention, it is possible to calculate the cache sensitivity and the cache pollutivity based on the cache usage state of each application.
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
If multiple virtual machines operate at the same time on a physical host, there is a risk that a cache conflict will occur and the performance will decrease.
A physical host 3 is constituted by including CPU (Central Processing Unit) cores 31a to 31d, primary cache memories 32a to 32d, secondary cache memories 33a to 33d, a tertiary cache memory 34, and a main memory 35. The main memory 35 is a RAM (Random Access Memory) for reading and writing data. The primary cache memories 32a to 32d, the secondary cache memories 33a to 33d, and the tertiary cache memory 34 are storage regions that can be accessed more rapidly than the main memory 35. The CPU cores 31a to 31d are central processing units that execute processing by reading and writing from and to the primary cache memories 32a to 32d, the secondary cache memories 33a to 33d, the tertiary cache memory 34, and the main memory 35.
In a multi-core configuration, which is mainstream for current CPUs, multiple CPU cores 31a to 31d commonly share a lower-order cache such as the tertiary cache memory 34. Hereinafter, when no particular distinction is made between the CPU cores 31a to 31d, they will simply be described as “CPU cores 31”.
Multiple applications 11a to 11d operate on the physical host 3 shown in
The application 11a occupies the CPU core 31a, the primary cache memory 32a, and the secondary cache memory 33a, and further shares part of the tertiary cache memory 34.
The application 11b occupies the CPU core 31b, the primary cache memory 32b, and the secondary cache memory 33b, and further shares part of the tertiary cache memory 34.
The application 11c occupies the CPU core 31c, the primary cache memory 32c, and the secondary cache memory 33c, and further shares part of the tertiary cache memory 34.
The application 11d occupies the CPU core 31d, the primary cache memory 32d, and the secondary cache memory 33d, and further shares part of the tertiary cache memory 34.
When multiple applications 11a to 11d run on the same node, the tertiary cache memory 34 is polluted by the applications 11a to 11d and the number of cache misses relatively increases. A CPU core 31 in which a cache miss has occurred needs to reference the main memory 35, which takes more than double the time to access compared to the tertiary cache memory 34. The applications 11 suffer a decrease in performance resulting from the penalties for such cache misses. This is called a cache conflict caused by multiple applications 11.
In contrast to this, a single application 11a operates on the physical host 3 shown in
The cache evaluation indicator acquisition system 1 is constituted by including an OS (Operating System) 2 that operates on the physical host 3, multiple applications 11a and 11b that operate on the OS 2, and a cache usage indicator calculation unit 12. Note that in
The physical host 3 is a combination of a central processing unit, caches, and memories for causing the OS 2, the applications 11a and 11b, and the cache usage indicator calculation unit 12 to operate, and for example, is configured similarly to the physical host 3 shown in
The OS 2 is basic software for controlling execution of a program that operates on the physical host 3, and performs job management, input/output control, data management, and processing relating thereto.
The applications 11a and 11b are programs that operate in the environments of the physical host 3 and the OS 2 respectively. The applications 11a and 11b may also be virtual machines or containers.
A desired amount of cache memory can be allocated to the applications 11a and 11b by setting up a capacity mask of the CLOS corresponding to the applications 11a and 11b. Hereinafter, when no particular distinction is made between the applications 11a and 11b, they are simply written as “applications 11”.
The cache usage indicator calculation unit 12 is a portion that tunes the cache allocation amounts of the applications 11 that operate on the physical host 3. The function of the cache usage indicator calculation unit 12 is realized by a processor (not shown) of the physical host 3 executing a program for cache tuning.
The cache usage indicator calculation unit 12 is constituted by including a cache usage amount measurement unit 121, an application performance measurement unit 122, a cache evaluation indicator calculation unit 123, a sensitivity calculation unit 124, a pollutivity calculation unit 125, and a database unit 126.
The cache usage amount measurement unit 121 acquires values relating to the cache and cache usage parameters regarding how the cache is being used by the applications 11. That is, the cache usage amount measurement unit 121 functions as a usage state measurement unit for measuring the usage state of the tertiary cache memory 34 by the applications 11 executed by the CPU cores 31. These applications 11 include virtual machines, containers, and the like.
The values relating to the cache and the cache usage parameters are, for example, a cache prefetch count, a cache reference count, a cache hit count, a cache miss count, a cache miss rate, a dTLB load count, a dTLB miss count, and a dTLB miss rate. The values relating to the cache and the cache usage parameters of the measured applications 11 are stored in the database unit 126.
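As an illustration of how the measured values for one application might be held before being stored in the database unit 126, the following is a minimal sketch. The class name, field names, and the definitions of the two rates (misses divided by references, and dTLB misses divided by dTLB loads) are assumptions made for this example. The text lists the rates as parameters but does not define how they are derived.

```python
from dataclasses import dataclass


@dataclass
class CacheUsage:
    """Hypothetical record of the cache usage parameters of one application."""
    prefetch_count: int
    reference_count: int
    hit_count: int
    miss_count: int
    dtlb_load_count: int
    dtlb_miss_count: int

    @property
    def miss_rate(self):
        # Assumed definition: cache misses per cache reference.
        if self.reference_count == 0:
            return 0.0
        return self.miss_count / self.reference_count

    @property
    def dtlb_miss_rate(self):
        # Assumed definition: dTLB misses per dTLB load.
        if self.dtlb_load_count == 0:
            return 0.0
        return self.dtlb_miss_count / self.dtlb_load_count

    def as_vector(self):
        """Return the eight parameters in the order listed in the text."""
        return [
            float(self.prefetch_count),
            float(self.reference_count),
            float(self.hit_count),
            float(self.miss_count),
            self.miss_rate,
            float(self.dtlb_load_count),
            float(self.dtlb_miss_count),
            self.dtlb_miss_rate,
        ]
```

A vector of this form is one possible way to supply the cache usage parameters of an application to later calculations.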
The application performance measurement unit 122 measures the performance of each application 11. The application performance measurement unit 122 measures the performance in the case where each application 11 operates alone, and measures the performance in the case where two applications 11 operate in combination with each other. The measured performance of each application 11 is stored in the database unit 126. Accordingly, the application performance measurement unit 122 calculates the performance deterioration that occurs when multiple applications 11 operate compared to the performance obtained when each application 11 operates alone. That is, the application performance measurement unit 122 functions as a performance measurement unit that measures the cache sensitivities and the cache pollutivities relating to the applications 11.
The cache evaluation indicator calculation unit 123 calculates indicators for the cache sensitivities and indicators for the cache pollutivities based on the performance deterioration obtained by the application performance measurement unit 122 and the measurement amount obtained by the cache usage amount measurement unit 121.
The sensitivity calculation unit 124 calculates the sensitivities of these applications 11 by statistically processing the performances obtained when the applications 11 operate alone and the performances obtained when multiple applications 11 operate, the applications 11 having been extracted as representative applications. Furthermore, the sensitivity calculation unit 124 calculates the cache sensitivity of a new application 11 based on how the application 11 is using the cache, and the indicator for the cache sensitivity of the application 11.
The pollutivity calculation unit 125 calculates the pollutivities of these applications 11 by statistically processing the performances obtained when the applications 11 operate alone and the performances obtained when multiple applications 11 operate, the applications 11 having been extracted as representative applications. Furthermore, the pollutivity calculation unit 125 calculates the cache pollutivity of a new application 11 based on how the application 11 is using the cache, and the indicator for the cache pollutivity of the application 11.
The sensitivity calculation unit 124 and the pollutivity calculation unit 125 function as calculation units that calculate the cache sensitivities and/or the cache pollutivities of the applications based on the partial regression coefficients and the cache usage state.
In steps S10 to S12, the cache usage indicator calculation unit 12 repeatedly performs processing regarding each application 11 extracted as being representative. Here, it is desirable that as many applications as possible that are to be used as the CPU benchmark are extracted and processed by the cache usage indicator calculation unit 12.
In step S11, the application performance measurement unit 122 measures the performances obtained when the applications operate alone, and the cache usage amount measurement unit 121 measures the cache usage parameters.
In step S12, the cache usage indicator calculation unit 12 determines whether or not all of the applications 11 have been processed. If there is an unprocessed application 11, the cache usage indicator calculation unit 12 returns to step S10 and repeatedly performs processing. If all of the applications 11 have been processed, the cache usage indicator calculation unit 12 advances to step S13.
In steps S13 to S16, the cache usage indicator calculation unit 12 repeatedly performs processing on combinations of two applications whose performances obtained during individual operation were measured.
The application performance measurement unit 122 measures the performance by causing both combined applications 11 to operate (S14), and stores the measured performance data in the database unit 126.
Furthermore, the application performance measurement unit 122 calculates the deterioration degree of the performances of both applications 11 (S15) and stores the calculated performance deterioration degree in the database unit 126.
In step S16, the cache usage indicator calculation unit 12 determines whether or not all combinations of the applications 11 have been processed. If there is an unprocessed combination of applications 11, the cache usage indicator calculation unit 12 returns to step S13 and repeatedly performs processing. If all combinations of the applications 11 have been processed, the cache usage indicator calculation unit 12 advances to step S17.
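The loop over combinations in steps S13 to S16 can be sketched as follows. The text does not give a formula for the deterioration degree, so this sketch assumes it is one minus the ratio of combined performance to solo performance, with higher performance values being better. The `measure_pair` callback and all numbers are hypothetical stand-ins for real measurements.

```python
from itertools import combinations


def deterioration_degree(solo, combined):
    """Assumed definition: relative loss of performance versus solo operation."""
    return 1.0 - combined / solo


def build_deterioration_table(solo_perf, measure_pair):
    """Run every pair of applications once and record both deterioration degrees.

    The returned dict maps (measured application, co-running application)
    to the deterioration degree of the measured application.
    """
    table = {}
    for a, b in combinations(solo_perf, 2):
        perf_a, perf_b = measure_pair(a, b)  # performance of a and b run together
        table[(a, b)] = deterioration_degree(solo_perf[a], perf_a)
        table[(b, a)] = deterioration_degree(solo_perf[b], perf_b)
    return table


# Made-up numbers for illustration only, not measurement results.
solo_perf = {"A": 100.0, "B": 200.0, "C": 50.0}
fake_pairs = {
    ("A", "B"): (80.0, 190.0),
    ("A", "C"): (90.0, 25.0),
    ("B", "C"): (160.0, 40.0),
}
table = build_deterioration_table(solo_perf, lambda a, b: fake_pairs[(a, b)])
```

Keying the table by (measured application, co-running application) keeps both directions of each pairing, which is what the later averaging per application needs.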
In steps S17 to S20, the cache usage indicator calculation unit 12 repeatedly performs processing for each application 11 extracted as being representative.
The sensitivity calculation unit 124 calculates the cache sensitivity by finding the average of the degree by which the performance obtained when multiple applications 11 operate deteriorates relative to the performance obtained when an application 11 operates alone (S18).
The pollutivity calculation unit 125 calculates the cache pollutivity by finding the average of the degrees to which the performances of the other applications 11 deteriorate when multiple applications 11 operate (S19).
In step S20, the cache usage indicator calculation unit 12 determines whether or not all of the applications 11 have been processed. If there is an unprocessed application 11, the cache usage indicator calculation unit 12 returns to step S17 and repeatedly performs processing. If all of the applications 11 have been processed, the cache usage indicator calculation unit 12 advances to step S21.
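Steps S18 and S19 reduce to two averages over the recorded deterioration degrees. A minimal sketch follows, assuming the degrees are held in a dict keyed by (measured application, co-running application). The layout, the function names, and the sample values are hypothetical.

```python
def cache_sensitivity(app, table):
    """Average deterioration suffered by app across all co-running applications."""
    values = [d for (measured, _), d in table.items() if measured == app]
    return sum(values) / len(values)


def cache_pollutivity(app, table):
    """Average deterioration that app causes in the other applications."""
    values = [d for (_, co_runner), d in table.items() if co_runner == app]
    return sum(values) / len(values)


# Made-up deterioration degrees for illustration only.
sample_table = {
    ("A", "B"): 0.2, ("A", "C"): 0.1,
    ("B", "A"): 0.05, ("B", "C"): 0.2,
    ("C", "A"): 0.5, ("C", "B"): 0.2,
}
```

With these sample values, application A is only moderately sensitive (it loses 0.15 on average) but comparatively polluting (the others lose 0.275 on average when A runs alongside them).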
The description will continue with reference to
The cache evaluation indicator calculation unit 123 sets the cache sensitivity of each application 11 as a target variable Y (S21).
The cache evaluation indicator calculation unit 123 sets the cache usage parameters of each application as descriptive variables X0, X1, . . . (S22). Here, the cache prefetch count, the cache reference count, the cache hit count, the cache miss count, the cache miss rate, the dTLB load count, the dTLB miss count, the dTLB miss rate, and the like are allocated to each descriptive variable X0, X1, . . . . Note that the target variable Y and the descriptive variables X0, X1, . . . satisfy the following Formula (1).
[Math. 1]
$$Y = b_0 X_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots \tag{1}$$
Here, the partial regression coefficients b0, b1, . . . are indicators for the cache sensitivity.
The cache evaluation indicator calculation unit 123 calculates the partial regression coefficients b0, b1, . . . based on the target variable Y and the descriptive variables X0, X1, . . . (S23), and records the calculated partial regression coefficients b0, b1, . . . in the database unit 126 (S24).
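The text does not state how the partial regression coefficients are fitted. One common choice is ordinary least squares, sketched below in pure Python by solving the normal equations. Following Formula (1) as written, no intercept term is included. The function names and the sample data are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptive variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def partial_regression_coefficients(xs, ys):
    """Least-squares fit of Y = b0*X0 + b1*X1 + ... (no intercept).

    xs holds one row of descriptive variables per application,
    ys holds the target variable for the same applications.
    """
    k = len(xs[0])
    xtx = [[sum(row[i] * row[j] for row in xs) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from Y = 2*X0 + 3*X1, for illustration only.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0, 3.0, 5.0, 7.0]
coefficients = partial_regression_coefficients(xs, ys)
```

In practice the descriptive variables differ widely in scale (raw counts versus rates), so standardizing them before fitting may be worthwhile. That step is not described in the text and is omitted here.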
The cache evaluation indicator calculation unit 123 sets the cache pollutivity of each application 11 as the target variable Y (S25) and sets the cache usage parameters of each application 11 as the descriptive variables X0, X1, . . . (S26). Note that the target variable Y and the descriptive variables X0, X1, . . . satisfy the following Formula (2).
[Math. 2]
$$Y = c_0 X_0 + c_1 X_1 + c_2 X_2 + c_3 X_3 + \cdots \tag{2}$$
Here, the partial regression coefficients c0, c1, . . . are indicators for the cache pollutivity.
The cache evaluation indicator calculation unit 123 calculates the partial regression coefficients c0, c1, . . . based on the target variable Y and the descriptive variables X0, X1, . . . (S27). That is, the cache usage parameter that relates to the cache pollutivity of the application 11 is found, and weighting is performed. The cache evaluation indicator calculation unit 123 records the calculated partial regression coefficients c0, c1, . . . in the database unit 126 (S28), and the processing of
With this processing, the partial regression coefficients b0, b1, . . . are calculated as indicators of the cache sensitivity, and the partial regression coefficients c0, c1, . . . are calculated as indicators of the cache pollutivity. For an unknown application 11, the cache sensitivity and the cache pollutivity corresponding to the target variable Y can be calculated from these partial regression coefficients and the measured cache usage parameters corresponding to the descriptive variables X0, X1, . . . .
The second to thirteenth rows indicate the applications for which the degree of performance deterioration was measured. The second to thirteenth columns indicate other applications that were operated in combination with the applications indicated in the rows.
The fourteenth row indicates the average values of the values from the second row to the thirteenth row of each column. These are the measurement values of the cache pollutivity for the applications indicated in the first row of the columns.
The fourteenth column indicates the average value of the values from the second column to the thirteenth column of each row. This is the result of measuring the cache sensitivity for the application indicated in the first column of that row.
First, the cache usage amount measurement unit 121 measures the cache usage parameter obtained when a given application 11 is operated (S40).
The sensitivity calculation unit 124 calculates the cache sensitivity of the application 11 based on the partial regression coefficients b0, b1, . . . that were calculated in advance and the cache usage parameters of the application 11 (S41).
The pollutivity calculation unit 125 calculates the cache pollutivity of the application 11 based on the partial regression coefficients c0, c1, . . . that were calculated in advance and the cache usage parameters of the application 11 (S42), and ends the processing of
The cache sensitivity and cache pollutivity of a given application can be measured easily. This makes it possible to provide indicators for tuning of a cache memory to a user.
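Once the coefficients are available, estimating both values for a newly measured application amounts to evaluating Formulas (1) and (2). A minimal sketch follows. The function name and all numbers are hypothetical, and the coefficient values are placeholders rather than results from the embodiment.

```python
def predict_indicator(coefficients, usage_parameters):
    """Evaluate Y = sum(coefficient_i * X_i) for one application."""
    if len(coefficients) != len(usage_parameters):
        raise ValueError("coefficients and usage parameters differ in length")
    return sum(b * x for b, x in zip(coefficients, usage_parameters))


# Placeholder values for illustration only.
b = [0.5, 0.25]           # coefficients for the cache sensitivity
c = [0.1, 0.2]            # coefficients for the cache pollutivity
new_app_usage = [0.2, 0.4]  # measured cache usage parameters X0, X1

sensitivity = predict_indicator(b, new_app_usage)
pollutivity = predict_indicator(c, new_app_usage)
```

The same measured parameter vector is used for both estimates; only the coefficient set changes.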
(1) The cache usage indicator calculation apparatus is characterized by including:
a memory for reading and writing data;
a cache that can be accessed more rapidly than the memory;
a central processing unit configured to read and write from and to the memory and the cache and execute processing;
a usage state measurement unit configured to measure a usage state of the cache used by an application executed by the central processing unit;
a performance measurement unit configured to measure a cache sensitivity and a cache pollutivity relating to an application; and
an indicator calculation unit configured to, based on performance deterioration of a pre-selected plurality of applications and the usage state of the cache, calculate an indicator for the cache sensitivity and/or an indicator for the cache pollutivity of each application.
This makes it possible to calculate the cache sensitivity and cache pollutivity based on the cache usage state of each application.
(2) The performance measurement unit of the cache usage indicator calculation apparatus according to (1) is characterized in that
the performance measurement unit uses an average of degrees of performance deterioration of a first application among a pre-selected plurality of applications obtained when the first application is operated in combination with a second application among the plurality of applications, with respect to a performance obtained when the first application is operated alone, as the cache sensitivity of the first application.
This makes it possible to measure the cache sensitivity of the application.
(3) The performance measurement unit of the cache usage indicator calculation apparatus according to (1) is characterized in that
the performance measurement unit uses an average of degrees of performance deterioration of a first application among a pre-selected plurality of applications obtained when the first application is operated in combination with a second application among the plurality of applications, with respect to a performance obtained when the first application is operated alone, as the cache pollutivity of the second application.
This makes it possible to measure the cache pollutivity of the application.
(4) The indicator calculation unit of the cache usage indicator calculation apparatus according to any one of (1) to (3) is characterized in that
the indicator calculation unit performs multiple regression analysis using the performance deterioration of each application as a target variable and the usage state of the cache as a descriptive variable, and calculates a partial regression coefficient.
This makes it possible to specify the usage state of the cache that contributes to the cache sensitivity and/or the cache pollutivity of the application.
(5) The cache usage indicator calculation apparatus according to (4) is characterized by further including
a calculation unit configured to calculate the cache sensitivity and/or the cache pollutivity of an application based on the partial regression coefficient and the usage state of the cache.
This makes it possible to calculate the cache sensitivity and/or the cache pollutivity by measuring the cache usage state of a given application.
(6) A cache usage indicator calculation method is characterized in that
a computer including
a memory for reading and writing data,
a cache that can be accessed more rapidly than the memory, and
a central processing unit configured to read and write from and to the memory and the cache and execute processing,
executes:
This makes it possible to calculate the cache sensitivity and cache pollutivity based on the cache usage state of each application.
(7) The cache usage indicator calculation method according to (6) is characterized by further executing
a step of calculating the cache sensitivity and/or the cache pollutivity of an application based on the indicator and the usage state of the cache.
This makes it possible to calculate the cache sensitivity and/or the cache pollutivity by measuring the cache usage state of a given application.
(8) A cache usage indicator calculation program is for causing
a computer including
a memory for reading and writing data,
a cache that can be accessed more rapidly than the memory, and
a central processing unit configured to read and write from and to the memory and the cache and execute processing,
to execute:
This makes it possible to calculate the cache sensitivity and cache pollutivity based on the cache usage state of each application.
The present invention is not limited to the above-described embodiment, can be modified without departing from the gist of the present invention, and for example, variations such as the following (a) to (c) are possible.
(a) The applications shown in
(b) A virtual machine or a container may also be included in the application being measured.
(c) Either one of the cache sensitivity and the cache pollutivity may be calculated.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/029789 | 7/30/2019 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2021/019674 | 2/4/2021 | WO | A |
Entry |
---|
Nakamura et al., “Mitigating CPU Cache Contention in Virtual Environments,” 2016 IEICE Communication Society Conference, Sep. 20, 2016, 3 pages (with English Translation). |
Nguyen, “Intel's Cache Monitoring Technology: Use Models and Data,” Intel Corporation, Mar. 31, 2016, retrieved from URL <https://software.intel.com/en-us/blogs/2014/12/11/intels-cache-monitoring-technology-use-models-and-data>, 14 pages. |
Nguyen, “Usage Models for Cache Allocation Technology in the Intel Xeon Processor E5 v4 family,” Intel Corporation, Feb. 11, 2016, retrieved from URL <https://software.intel.com/en-us/articles/cache-allocation-technology-usage-models>, 6 pages. |
Number | Date | Country | |
---|---|---|---|
20220261348 A1 | Aug 2022 | US |