The present invention relates to an information processing apparatus, an information processing system, and an information processing method.
A multi-access edge computing (MEC) server that provides a cloud computing function and an IT service environment at the edge of a network has many computing resources such as a memory, a central processing unit (CPU), and a graphics processing unit (GPU).
The MEC server and terminals are connected to each other through a network, and various users allocate tasks to the MEC server. The MEC server then analyzes data obtained from each terminal according to the task and sends feedback. In particular, a use case is assumed in which an MEC server installed on-premises, such as in a factory, is connected to a video distribution terminal in order to monitor the status of workers, products, and the like; such a use case requires high-speed and stable processing of video data.
Conventionally, a GPU used for image processing contains a large number of calculation units. Therefore, it is known that, by applying GPUs to general-purpose calculations, simple parallel calculations can be executed more efficiently than with CPUs. In recent years, GPUs have been applied not only to image processing but also to processing of various applications. A GPU used for general-purpose calculations is specifically called a GPGPU (general-purpose graphics processing unit). However, in this specification and the diagrams, both the GPU and the GPGPU are treated as processors for speeding up parallel calculations, including image processing, and both are referred to as GPUs without distinction. If the GPU arranged on the MEC server can be used effectively, it is expected that image processing applications or AI applications can be processed at high speed.
However, when multiple tasks are executed simultaneously in the same computing environment, the tasks may compete for processor resources as their resource demands change, causing performance degradation.
JP 2017-37533 A proposes a method of evaluating the occurrence of overhead in a parallel calculation environment using heterogeneous processors, based on the sum of the task execution time and its deviation, and of allocating processors so as to reduce the variation in task processing that occurs depending on the number or frequency of the tasks to be executed.
However, when tasks with different average execution times are executed simultaneously, this method may erroneously determine that the processing overhead of a processor running many applications with long average execution times is large even though resource contention is small. For this reason, the magnitude of the processing variation cannot be evaluated correctly. As a result, when real-time video analysis such as posture estimation is executed multiple times, degradation of the application operation cannot be detected, which may lower production safety.
It is an object of the invention to allocate tasks to processors so that resource contention is reduced even when tasks with different average execution times are executed in parallel in an information processing apparatus.
An information processing apparatus according to one aspect of the invention includes: processors to which a plurality of tasks are respectively allocated; a task execution history recording unit that records a task execution time of each of the plurality of tasks as a history; a score calculation unit that calculates a score indicating a degree of resource contention of the processor based on the task execution time; and a processor allocation unit that changes allocation of the task based on the score.
According to one aspect of the invention, it is possible to allocate tasks to processors so that resource contention is reduced even when tasks with different average execution times are executed in parallel.
Hereinafter, embodiments will be described with reference to the diagrams.
In a first embodiment, an example in which a plurality of GPUs are arranged will be described below. In addition, although the invention will be described with the GPU as an example in the first embodiment, the invention may be applied to a CPU.
A storage 100, which uses, for example, a magnetic disk as a medium, is connected to a CPU 120 and records a task execution time 101 and a processor utilization rate 102. A task execution file 104 for tasks to be executed on the GPU is arranged in the storage 100, as is a task priority table 105 that links an execution priority to each task. In addition, there is an F distribution table 103 that is used for changing GPU processor allocation. Some or all of the programs or data may be stored in the storage 100 in advance, or may be introduced from a non-transitory storage medium or from an information processing apparatus including an external non-transitory storage device through a network.
A GPU driver 115 is installed on an OS 110 that runs on the CPU 120, and the utilization rate of the GPU 10 (the percentage of time during which the GPU is occupied by some task within a predetermined period) can be acquired through the GPU driver 115. A task execution history recording unit 114 acquires the task execution time of the GPU and records it in the task execution time 101 of the storage 100. A score calculation unit 113 calculates the score of the GPU 10 based on the task execution time 101 recorded on the storage 100, and determines whether or not to change the processor allocated to the task by referring to the F distribution table 103 on the storage 100. A processor allocation unit 111 is responsible for suspending and resuming tasks when the task allocation needs to be changed.
The management terminal 4 includes a parameter input unit 130 for the end user or system administrator to input parameters necessary for system operation and a task execution status monitoring unit 131 that allows the end user or system administrator to visually check the task execution status.
Each processing unit described above is realized by executing a program read from the storage 100 by the CPU 120.
In the present embodiment, the information processing apparatus 1 includes functional units necessary for the invention, but this is an example of the embodiment of the invention, and functional units other than the processor for task processing do not necessarily have to be located inside the information processing apparatus 1. Therefore, it may be devised to provide the functional units on the management terminal 4 that communicates with the information processing apparatus 1.
In
The significance level 1301 is a level at which GPU resource contention degree differences are considered to occur by chance. The smaller the significance level, the less error is allowed in the test, and the larger the F-value, which is a threshold value used to determine the significant difference by the F-test. Therefore, processor allocation changes are less likely to occur. Conversely, the larger the significance level, the more error is allowed in the test. Therefore, processor allocation changes are more likely to occur. The sample size 1302 is the number of samples used for the F-test, and the processing time of each task immediately before is read from the task execution time 101 on the storage 100 by the number of the sample size 1302 at predetermined time intervals.
Task information is acquired through the GPU driver 115 on the OS 110. Whether each task is being executed or stopped is displayed according to the acquired task execution status.
The task execution time 101 immediately before recorded on the storage 100 is read by the number of the sample size 1302 designated by the parameter input unit 130. Each read task execution time 101 is associated with the execution priority of the task 1000.
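The fixed-size history described above can be sketched in Python as follows; this is a minimal illustration, and the names `SAMPLE_SIZE`, `history`, and `record` are assumptions for the sketch, not elements of the embodiment:

```python
from collections import deque

SAMPLE_SIZE = 20  # corresponds to the sample size 1302 from the parameter input

# task id -> the most recent SAMPLE_SIZE execution times of that task
history = {}

def record(task_id, exec_time):
    # A bounded deque discards the oldest sample automatically, so only the
    # execution times recorded "immediately before" are kept for each task.
    history.setdefault(task_id, deque(maxlen=SAMPLE_SIZE)).append(exec_time)
```

Once a task has accumulated `SAMPLE_SIZE` samples, the buffer always holds exactly the latest window, which matches reading the sample-size number of most recent execution times from the task execution time 101.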
Specific calculation of the score 1124 in the score calculation unit 113 is performed by using the following (Equation 1) to (Equation 4), and the score 1124 is calculated from the variation coefficient of the execution time of the task 1000.
In (Equation 1) to (Equation 4), t is a task execution time, p is a task number, tp is an average execution time, i is an index of the execution, n is a sample size, sp is a standard deviation, CVp is a variation coefficient, g is a GPU, SCg is a score, and m is the number of tasks on the GPU.
The variation coefficient 1123 is a value obtained by dividing the standard deviation 1122 of each task by the average execution time 1121, and is therefore not affected by the average execution time. Accordingly, even between tasks with different average execution times 1121, the magnitude of variation alone can be objectively compared.
If the degree of resource contention increases, task processing becomes unstable and accordingly, the variation coefficient of the task execution time 101 also naturally increases. For this reason, the average value of the variation coefficients 1123 of the task execution time 101 is calculated for each GPU and set as the score 1124 indicating the degree of resource contention of the GPU 10. Similarly, the score 1124 calculated from the average of the variation coefficients 1123 is also a value that is not affected by the average execution time 1121 of the task 1000 on each GPU. Therefore, scores between different GPUs can be objectively compared.
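The calculation in (Equation 1) to (Equation 4) can be sketched in Python as follows, assuming the sample standard deviation is used; `gpu_score` and the data layout are illustrative names for the sketch, not elements of the embodiment:

```python
import statistics

def gpu_score(task_times):
    """Score SCg of one GPU: the mean of the variation coefficients CVp.

    task_times maps a task id to that task's last n recorded execution times.
    """
    cvs = []
    for times in task_times.values():
        mean_t = statistics.mean(times)   # (Equation 1): average execution time
        sd = statistics.stdev(times)      # (Equation 2): standard deviation
        cvs.append(sd / mean_t)           # (Equation 3): variation coefficient
    return statistics.mean(cvs)           # (Equation 4): score of the GPU
```

Because each term is normalized by its own average execution time before averaging, two GPUs running tasks with very different average execution times still yield directly comparable scores.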
In the first embodiment, the sample size 1302 of the execution time of the task 1000 extracted from the task execution time 101 is uniform for all tasks. For example, if the significance level 1301 and the sample size 1302 input from the parameter input unit 130 in advance are 5% and 20, respectively, the F-value, which is a threshold value for determining whether there is a significant difference in variance, is 2.12 according to the F distribution table 103.
Here, the F-test is a test of whether there is a significant difference between population variances. The score 1124 calculated by averaging the variation coefficients of tasks on each GPU is obtained by averaging the standard deviation 1122 of the task 1000 on the GPU when the average execution time 1121 of the task 1000 is set to 1. Since the variance is the square of the standard deviation, as long as the execution time of the task 1000 follows a normal distribution, it is possible to determine whether there is a significant difference by the F-test of the score 1124. The F-test of the score is performed by using the GPU_A 10a with the highest score and the GPU_B 10b with the lowest score, and a test value F0 is calculated by the following (Equation 5). SCa in (Equation 5) is the score of the GPU_A, SCb is the score of the GPU_B, and F0 is a test value.
In the first embodiment, when the value of the score 1124 is calculated by (Equation 5), the test value F0 is 3.24 and F≤F0 is satisfied, so that it is determined by the F-test that there is a significant difference. In addition, in the present embodiment, the F-value is calculated in advance and summarized as the F distribution table 103, and the F distribution table 103 is referred to when calculating the score. However, the F-value may be calculated based on the input significance level 1301 and the sample size 1302 when calculating the score without using the F distribution table.
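The test of (Equation 5) can be sketched as follows; the function name is an assumption, and deriving the F-value directly from the significance level and sample size (the alternative to the F distribution table 103 mentioned above) would require a statistics library such as SciPy's `scipy.stats.f.ppf`, which is only noted in a comment:

```python
def f_test(sc_a, sc_b, f_value):
    """Test value F0 of (Equation 5): since each score behaves like a
    standard deviation, the scores are squared to compare variances."""
    f0 = sc_a ** 2 / sc_b ** 2
    # Instead of the precomputed table, the threshold could be derived as
    # scipy.stats.f.ppf(1 - alpha, n - 1, n - 1) from the significance
    # level and the sample size.
    return f0, f0 >= f_value
```

For the numerical example in the text, a score ratio of 1.8 gives F0 = 3.24, which satisfies F ≤ F0 for F = 2.12, so a significant difference is found.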
If the value of the sample size 1302 used for the calculation of the score 1124 and for the F-test is small, a sufficient sample of execution times cannot be obtained. In this case, since the accuracy of the score is not always good, it is desirable that the sample size 1302 be a predetermined size. When the task 1000 executed on the GPU 10 has been processed as many times as the sample size 1302, it is determined that the significance determination of the score 1324 by the F-test has reached a desired accuracy, and the score 1324 is calculated and the F-test is performed. Hereinafter, details of each step in the sequence will be described by using the case of the numerical values shown in
In step s2001, the task execution history recording unit 114 records the execution time of each task in the task execution time 101 on the storage.
In step s2002, it is determined whether or not the number of recordings of the execution time of the task 1000 exceeds the sample size 1302. If Yes (s2002: Yes), the process proceeds to step s2003. If No (s2002: No), the process proceeds to step s2001.
In step s2003, the score calculation unit 113 on the CPU 120 calculates the score 1324 indicating resource contention of each GPU by using (Equation 1) to (Equation 4).
In step s2004, the test value F0 is calculated by Equation (5). Then, the F-value is determined from the sample size 1302 and the significance level 1301 input in the parameter input unit 130 and the F distribution table 103 on the storage 100. By using the determined F-value as a threshold value for determining whether or not there is a significant difference, it is determined by the F-test whether or not F≤F0 is satisfied. If Yes (s2004: Yes), it is determined that there is a significant difference in score, and the process proceeds to step s2005. If No (s2004: No), it is determined that there is no significant difference, and the process proceeds to step s2010.
In step s2005, the task C 1000c with the lowest execution priority on the GPU_A 10a with the highest score is suspended by the processor allocation unit 111. As a result, the resource contention of the GPU_A 10a having the highest degree of resource contention is reduced.
In step s2006, when resuming the suspended task C 1000c, the utilization rate of each GPU is acquired by referring to the processor utilization rate 102 of the GPU 10 recorded on the storage 100 through the GPU driver 115 on the OS 110.
In step s2007, an attempt is made to resume the operation of the suspended task on the GPU_B 10b with the lowest score 1124. At this time, if it is possible to resume the operation of the suspended task (s2007: Yes), the process proceeds to step s2009. If there is no room to resume the operation of the suspended task because the GPU is occupied by another task (s2007: No), the process proceeds to step s2008.
In step s2008, in order to resume the suspended task C 1000c, the process waits for a predetermined period of time for the utilization rate of the GPU 10 to change before executing the next step.
In step s2009, the processor allocation unit 111 resumes the operation of the suspended task on the GPU_B 10b with the lowest score.
In step s2010, the execution time recorded in the task execution time 101 is deleted to make the recorded content empty.
In step s2011, it is determined whether or not to terminate the system. If the system administrator or the end user inputs any command from the management terminal 4 (s2011: Yes), the sequence ends. If there is no command input (s2011: No), the process returns to the start.
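The flow of steps s2003 to s2009 can be summarized in a simplified sketch; it omits the wait of step s2008 and the utilization-rate check of step s2006, and all names and the data layout are illustrative assumptions:

```python
def rebalance(gpus, f_value):
    """gpus: gpu id -> {"score": float, "tasks": [(priority, task id), ...]}.

    When the F-test finds a significant difference, the lowest-priority task
    on the highest-score GPU is moved to the lowest-score GPU (a larger
    priority value means a higher execution priority in this sketch).
    """
    gpu_a = max(gpus, key=lambda g: gpus[g]["score"])  # highest contention
    gpu_b = min(gpus, key=lambda g: gpus[g]["score"])  # lowest contention
    f0 = gpus[gpu_a]["score"] ** 2 / gpus[gpu_b]["score"] ** 2  # step s2004
    if f0 >= f_value:
        task = min(gpus[gpu_a]["tasks"])      # lowest priority, step s2005
        gpus[gpu_a]["tasks"].remove(task)     # suspend on GPU_A
        gpus[gpu_b]["tasks"].append(task)     # resume on GPU_B, step s2009
    return gpus
```

In the real sequence the resume is conditional: if GPU_B is occupied by another task, step s2008 waits before retrying, which the sketch leaves out.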
In a second embodiment, an example in which only one GPU is arranged will be described. In addition, although the invention will be described with the GPU as an example in the second embodiment, the invention may be applied to a CPU.
Differences in the system configuration from
Of the GPUs 10 arranged in
Differences from
Since the F-test is not performed in the second embodiment, the significance level 1301 is not input. Instead, an allowable value 1303 is input, as an arbitrary value chosen by the end user or the system administrator. If the score calculated by using (Equation 1) to (Equation 4) is equal to or greater than the allowable value, it is determined that the allowable resource contention is exceeded, and tasks with low execution priority on the GPU_A 10a are suspended.
Referring to
Since the F-test is not performed in the second embodiment, it is not always necessary to unify the sample size for all the tasks 1000. In
In step s2013, in order to reduce the resource contention on the GPU_A 10a, the task C 1000c with the lowest execution priority among the tasks 1000 executed on the GPU_A 10a is suspended.
In step s2014, when resuming the suspended task C 1000c, it is determined whether or not the task C 1000c can be resumed with reference to the processor utilization rate 102 of the GPU_A 10a obtained in step s2006. If it is possible to resume the task C 1000c (s2014: Yes), the process proceeds to step s2015. If there is no room to resume the task C 1000c because the GPU is occupied by another task (s2014: No), the process proceeds to step s2008.
In step s2015, the operation of the task C 1000c that has been suspended on the GPU_A 10a resumes.
In step s2016, it is determined whether or not the execution period of the task 1000 exceeds the sample period 1304. As the value of the task execution period, the sum of the execution times of tasks recorded in the task execution time 101 is used. If Yes (s2016: Yes), the process proceeds to step s2003. If No (s2016: No), the process proceeds to step s2001.
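The second-embodiment check, which replaces the F-test with the allowable value 1303 and triggers once the sample period 1304 is exceeded, might be sketched as follows for a single GPU; the function name and data layout are illustrative assumptions:

```python
import statistics

def check_single_gpu(exec_times_by_task, elapsed, sample_period, allowable):
    """Returns the id of the task to suspend, or None.

    exec_times_by_task: task id -> (priority, [execution times]);
    elapsed is the sum of the recorded execution times (the execution period).
    """
    if elapsed < sample_period:
        return None                      # keep recording (step s2016: No)
    cvs = []
    for _, times in exec_times_by_task.values():
        cvs.append(statistics.stdev(times) / statistics.mean(times))
    score = statistics.mean(cvs)
    if score < allowable:
        return None                      # contention within the allowable value
    # Suspend the task with the lowest execution priority (step s2013).
    return min(exec_times_by_task, key=lambda t: exec_times_by_task[t][0])
```

Because no F-test is involved, the per-task sample sizes may differ; the coefficient of variation of each task is computed over whatever samples fell inside the sample period.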
According to the embodiment described above, tasks can be executed with good efficiency even when applications with different average execution times are executed simultaneously in the same computing environment. Therefore, it is possible to suppress variations in application processing due to resource contention. This is effective for real-time video analysis such as posture estimation.
In addition, the invention is not limited to the embodiments described above, and includes various modification examples. For example, the above embodiments have been described in detail for easy understanding of the invention, but the invention is not necessarily limited to having all the components described above. In addition, some of the components in one embodiment can be replaced with the components in another embodiment, and the components in another embodiment can be added to the components in one embodiment. In addition, for some of the components in each embodiment, addition, removal, and replacement of other components are possible.
Number | Date | Country | Kind |
---|---|---|---|
2022-101681 | Jun 2022 | JP | national |