This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2021-22593, filed on Feb. 16, 2021, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a storage medium storing a multiple control program, an information processing apparatus, and a multiple control method.
In recent years, systems that execute artificial intelligence (AI) processing using a graphical processing unit (GPU) have been increasing. For example, there is a system that performs object detection or the like by AI processing of a video.
In such a system, one GPU processes videos transferred from one camera. However, since the videos are sent at regular intervals, idle time during which the GPU is not used occurs between pieces of processing. It is therefore expected that one GPU accommodates and processes videos transferred from a plurality of cameras so that such idle time does not occur and the GPU is used efficiently.
Japanese Laid-open Patent Publication No. 2020-109890, Japanese Laid-open Patent Publication No. 2020-135061, and Japanese Laid-open Patent Publication No. 2019-175292 are disclosed as related art.
According to an aspect of the embodiments, a non-transitory computer-readable storage medium storing a multiple control program that causes at least one computer to execute a process, the process includes storing a processing time of a first step in processes of a plurality of applications as a first threshold in a storage unit when the processes are executed in an overlapping manner; and when receiving an execution request from a subsequent application during execution of a process of the plurality of applications, delaying start of a process of the subsequent application by the first threshold or more from start of a process of a preceding application being executed.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
When one GPU processes a plurality of videos, in some cases, a plurality of processes are executed by one GPU in an overlapping manner. In such cases, there is a problem in which processing time increases due to interference between the processes.
A case in which processing time increases due to interference between processes will be described with reference to
When a GPU executes one process for inference processing of videos, the GPU executes inference processing at predetermined regular intervals. However, when the GPU executes four processes in parallel for inference processing of videos, pieces of inference processing may interfere with each other, causing an increase in processing time. The degree of increase in processing time varies depending on the details of the inference processing and the manner of overlapping. For example, the degree of increase in processing time is larger when the overlap between pieces of inference processing is larger and the number of overlapping pieces of inference processing is larger. Since the start timings of inference processing are different from each other, when many pieces of inference processing happen to start at close timings, the number of overlapping pieces of inference processing increases, the degree of increase in processing time increases, and the processing time of inference processing exceeds a fixed period. For example, there arises a problem in which processing time increases due to interference between processes.
In one aspect, an object of the present disclosure is to suppress an increase in processing time due to overlapping execution of processes even when one GPU executes a plurality of processes in an overlapping manner.
Hereinafter, the embodiments of a multiple control program, an information processing apparatus, and a multiple control method disclosed in the present application will be described in detail with reference to the drawings. The present disclosure is not limited by the embodiments.
[Configuration of System]
The storage server 3 includes a data source 31 of videos output respectively from the plurality of cameras 5, and the inference model 32. The inference model 32 is a model used for inference processing of the inference process 11 and is based on a predetermined algorithm. In the first embodiment, the inference model 32 based on the same algorithm is used by a plurality of inference processes 11.
In the execution server 1, a GPU use control unit 12 is provided between the plurality of inference processes 11 on one side, and a GPU driver 13 and the AI framework 14 on the other. The execution server 1 includes profile information 15.
The GPU driver 13 is dedicated software for controlling the GPU. For example, the GPU driver 13 transmits a GPU use request requested from the GPU use control unit 12 to the AI framework 14. The GPU driver 13 transmits the processing result returned from the AI framework 14 to the GPU use control unit 12.
The AI framework 14 executes inference processing of the inference process 11. The AI framework 14 is a library for performing inference processing on a video, and is incorporated in the inference process 11 (application). The AI framework 14 is called by the inference process 11, and executes inference processing via the GPU driver 13. Examples of the AI framework 14 include TensorFlow, MXNet, PyTorch, and the like.
The GPU use control unit 12 monitors a GPU use request from the inference process 11 (application), and changes the start timing of GPU use in the inference process 11. For example, when a plurality of inference processes 11 are executed in an overlapping manner, the GPU use control unit 12 controls the use of the GPU by delaying the start of a subsequent inference process 11 based on a predetermined threshold. In the first embodiment, the predetermined threshold is a value of processing time of a phase, among a plurality of phases included in the inference process 11, having a large influence on processing time when executed in an overlapping manner (with interference). For example, the predetermined threshold is a value of processing time of a phase, among a plurality of phases included in the inference process 11, that increases the processing time when overlapping (interference) occurs. When two inference processes 11 are executed at close timings, the GPU use control unit 12 delays the start of the subsequent inference process 11 by the predetermined threshold from the start of the preceding inference process 11 to suppress an increase in processing time due to interference. In the first embodiment, since the same inference model 32 (algorithm) is used in a plurality of inference processes 11, the processing times of the plurality of phases in each of the plurality of inference processes 11 are the same.
The profile information 15 stores a predetermined threshold. For example, the predetermined threshold is the processing time of convolution processing described later. As an example, the GPU use control unit 12 measures the processing time of convolution processing in advance, and records the processing time in the profile information 15. The profile information 15 is an example of a storage unit.
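As an illustrative, non-limiting sketch of how such a threshold could be prepared, the following Python code times the convolution phase of one inference process executed without overlap and records the elapsed time as the threshold. The function run_convolution_once and the JSON file used to hold the profile information 15 are assumptions for illustration only.

```python
# Minimal sketch (assumption, not the actual implementation): measure the
# processing time of the convolution phase executed alone and store it as
# the predetermined threshold in the profile information.
import json
import time

def measure_convolution_threshold(run_convolution_once, profile_path="profile_info.json"):
    start = time.monotonic()
    run_convolution_once()                    # convolution phase executed without interference
    elapsed = time.monotonic() - start        # processing time of the convolution phase
    with open(profile_path, "w") as f:
        json.dump({"threshold": elapsed}, f)  # recorded as the threshold in the profile information
    return elapsed
```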
Multiple control according to the first embodiment will be described with reference to
When a plurality of inference processes 11 are executed in an overlapping manner, the influence on an increase in processing time varies depending on the combination of overlapping phases. When phases of the same type overlap, the increase in processing time is large. When phases of different types overlap, the increase in processing time is small. As illustrated in the left diagram of
For example, when a plurality of inference processes 11 are executed at close timings, the GPU use control unit 12 delays the start of a subsequent inference process 11 by a threshold or more, with the processing time of convolution processing in the inference process 11 as the threshold. The processing time of convolution processing used as the threshold is the processing time of convolution processing measured in a state where the inference process 11 does not overlap another inference process 11, and may be measured in advance.
As illustrated in
The GPU use control unit 12 delays the start of inference processing of the application c subsequent to the application b by the threshold or more from the start of the inference processing of the application b executed immediately before, transmits a start request (GPU use request) of the application c to the AI framework 14, and causes the AI framework to execute inference processing. Thus, the GPU use control unit 12 may perform control such that the convolution processing of the application a, the convolution processing of the application b, and the convolution processing of the application c do not overlap.
[Functional Configuration of GPU Use Control Unit]
The use detection unit 121 detects a GPU use request (application start request) from the inference process 11 (application). The GPU use request includes the name of the inference model 32 and the identifier of the data source 31. The use detection unit 121 outputs the process ID of the inference process 11 that has made the detected GPU use request to the delay execution determination unit 123.
The reading unit 122 reads a threshold from the profile information 15. The reading unit 122 outputs the read threshold to the delay execution determination unit 123 described later.
An example of the profile information 15 according to the first embodiment will be described with reference to
Referring back to
When the request queue 125 is not empty, the delay execution determination unit 123 accumulates the GPU use request in the request queue 125. An example of the data structure of the request queue 125 will be described with reference to
Referring back to
The use request transmission unit 126 transmits a GPU use request to the AI framework 14 via the GPU driver 13. For example, the use request transmission unit 126 updates the latest time of GPU use (GPU latest use time) to the current time. The use request transmission unit 126 records the requesting process ID of the GPU use request in association with the GPU latest use time. The association between the GPU latest use time and the requesting process ID is recorded in a storage unit (not illustrated). The use request transmission unit 126 transmits the GPU use request to the GPU driver 13.
The processing result reception unit 127 receives a processing result processed by the AI framework 14 via the GPU driver 13.
The processing result transmission destination determination unit 128 determines a transmission destination of the processing result. For example, the processing result transmission destination determination unit 128 acquires, from the use request transmission unit 126, the requesting process ID associated with the recorded GPU latest use time as the transmission destination of the processing result.
The processing result transmission unit 129 transmits the processing result to the inference process 11 corresponding to the requesting process ID determined by the processing result transmission destination determination unit 128.
[Hardware Configuration of Execution Server]
The network interface 25 is a network interface card or the like, and communicates with other devices such as the storage server 3. The hard disk 24 stores the profile information 15 and a program for operating the functions illustrated in
The CPU 21 reads, from the hard disk 24 or the like, a program for executing the same processing as that of each processing unit illustrated in
The GPU 22 reads, from the hard disk 24 or the like, a program for executing inference processing of the inference process 11 by using the AI framework 14 illustrated in
[Flowchart of GPU Use Control]
A flowchart of GPU use control processing according to the first embodiment will be described with reference to
[Flowchart of Delay Execution Determination Processing]
Next, the delay execution determination unit 123 determines whether the request queue 125 that accumulates waiting use requests is empty (step S13). When it is determined that the request queue 125 is empty (Yes in step S13), the delay execution determination unit 123 acquires the GPU latest use time recorded in the storage unit (not illustrated) (step S14). The GPU latest use time is the latest time of GPU use, and is, for example, a time at which a GPU use request has been most recently transmitted. The GPU latest use time is recorded by the use request transmission unit 126.
The delay execution determination unit 123 acquires a threshold from the profile information 15 (step S15). The delay execution determination unit 123 acquires the current time from a system (operating system (OS)) (step S16). The delay execution determination unit 123 calculates a waiting time from the following formula (1) (step S17).
Waiting time=(GPU latest use time+threshold)−current time (1)
The delay execution determination unit 123 determines whether the waiting time is larger than 0 (step S18). When it is determined that the waiting time is equal to or smaller than 0 (No in step S18), the delay execution determination unit 123 outputs the detected GPU use request and the PID to the use request transmission unit 126, and requests for transmission of the request (step S19). For example, when the waiting time is equal to or smaller than 0, the GPU latest use time is earlier than the current time by the threshold or more. Thus, the delay execution determination unit 123 determines that the subsequent inference process 11 does not overlap the convolution processing of the preceding inference process 11, and makes a GPU use request for the subsequent inference process 11. The delay execution determination unit 123 ends the delay execution determination processing.
On the other hand, when it is determined that the waiting time is larger than 0 (Yes in step S18), the delay execution determination unit 123 adds the GPU use request information and the PID to the request queue 125 (step S20). The delay execution determination unit 123 sets the waiting time in the delay-waiting request management unit 124 (step S21). For example, the delay execution determination unit 123 performs control to delay the start timing of the (subsequent) inference process 11 for which a GPU use request is detected by the threshold or more from the start of use of the preceding inference process 11. For example, the delay execution determination unit 123 performs control so that the convolution processing of the inference process 11 for which the GPU use request is made does not overlap the convolution processing of the preceding inference process 11. The delay execution determination unit 123 ends the delay execution determination processing.
When it is determined in step S13 that the request queue 125 is not empty (No in step S13), the delay execution determination unit 123 adds the GPU use request information and the PID to the end of the request queue 125 (step S22). The delay execution determination unit 123 ends the delay execution determination processing.
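The delay execution determination processing of steps S13 to S22 may be summarized by the following non-limiting Python sketch. The class and callback names, the use of a monotonic clock in place of the system time, and the in-memory request queue are illustrative assumptions.

```python
# Sketch of the delay execution determination processing (steps S13 to S22);
# names are illustrative, not the actual implementation.
import time
from collections import deque

class DelayExecutionDeterminer:
    def __init__(self, threshold, transmit, set_waiting_time):
        self.threshold = threshold                # threshold read from the profile information 15
        self.transmit = transmit                  # callback to the use request transmission unit 126
        self.set_waiting_time = set_waiting_time  # callback to the delay-waiting request management unit 124
        self.request_queue = deque()              # request queue 125
        self.gpu_latest_use_time = 0.0            # updated by the use request transmission unit 126

    def on_gpu_use_request(self, request, pid):
        if self.request_queue:                    # step S13: queue is not empty
            self.request_queue.append((request, pid))     # step S22: add to the end of the queue
            return
        current_time = time.monotonic()                   # step S16
        waiting_time = (self.gpu_latest_use_time + self.threshold) - current_time  # formula (1), step S17
        if waiting_time <= 0:                             # step S18: No
            self.transmit(request, pid)                   # step S19
        else:                                             # step S18: Yes
            self.request_queue.append((request, pid))     # step S20
            self.set_waiting_time(waiting_time)           # step S21
```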
[Flowchart of Delay-Waiting Request Management Processing]
On the other hand, when it is determined that the waiting time has been set (Yes in step S31), the delay-waiting request management unit 124 waits until the set time passes (step S32). After waiting until the set time passes, the delay-waiting request management unit 124 outputs the first request in the request queue 125 and the PID to the use request transmission unit 126, and requests for transmission of the request (step S33).
The delay-waiting request management unit 124 determines whether the request queue 125 is empty (step S34). When it is determined that the request queue 125 is not empty (No in step S34), the delay-waiting request management unit 124 acquires the threshold from the profile information 15 (step S35). The delay-waiting request management unit 124 sets the threshold as a waiting time in order for the next request to wait (step S36). For example, the delay-waiting request management unit 124 performs control to delay the start timing of the inference process 11 for which the next GPU use request is made by the threshold or more from the start of the use of the preceding inference process 11. The delay-waiting request management unit 124 proceeds to step S32.
On the other hand, when it is determined that the request queue 125 is empty (Yes in step S34), the delay-waiting request management unit 124 ends the delay-waiting request management processing.
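A corresponding non-limiting sketch of the delay-waiting request management processing (steps S32 to S36) is given below; it assumes the same illustrative names as the previous sketch and that a waiting time has already been set.

```python
# Sketch of the delay-waiting request management processing; names are illustrative.
import time

def manage_delay_waiting(waiting_time, request_queue, threshold, transmit):
    while True:
        time.sleep(max(waiting_time, 0.0))        # step S32: wait until the set time passes
        request, pid = request_queue.popleft()    # step S33: first request in the request queue 125
        transmit(request, pid)                    # request transmission of the GPU use request
        if not request_queue:                     # step S34: queue is empty, so end the processing
            return
        waiting_time = threshold                  # steps S35 and S36: the next request waits for the threshold
```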
[Flowchart of Use Request Transmission Processing]
On the other hand, when it is determined that there has been a request for transmission of a GPU use request (Yes in step S41), the use request transmission unit 126 acquires the current time from the system (OS) (step S42). The use request transmission unit 126 updates the GPU latest use time to the current time (step S43). The use request transmission unit 126 records the requesting PID in association with the GPU latest use time (step S44).
The use request transmission unit 126 transmits the GPU use request to the GPU driver 13 (step S45). The use request transmission unit 126 ends the use request transmission processing.
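The use request transmission processing of steps S42 to S45 could be sketched as follows; the gpu_driver.submit call stands in for handing the GPU use request to the GPU driver 13 and is an assumption for illustration.

```python
# Sketch of the use request transmission processing (steps S42 to S45); illustrative only.
import time

class UseRequestTransmitter:
    def __init__(self, gpu_driver):
        self.gpu_driver = gpu_driver
        self.gpu_latest_use_time = None   # GPU latest use time
        self.latest_pid = None            # requesting PID associated with the latest use time

    def transmit(self, request, pid):
        self.gpu_latest_use_time = time.monotonic()  # steps S42 and S43: update the GPU latest use time
        self.latest_pid = pid                        # step S44: record the requesting PID
        self.gpu_driver.submit(request)              # step S45: transmit the GPU use request (placeholder call)
```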
[Flowchart of Processing Result Transmission Destination Determination Processing]
On the other hand, when it is determined that the processing result has been received (Yes in step S51), the processing result transmission destination determination unit 128 acquires the recorded requesting PID from the use request transmission unit 126 (step S52). The processing result transmission destination determination unit 128 transmits the processing result to the application (the inference process 11) corresponding to the acquired PID (step S53). The processing result transmission destination determination unit 128 ends the processing result transmission destination determination processing.
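As a small illustrative sketch of steps S52 and S53, the processing result is routed back by using the PID recorded together with the GPU latest use time; send_to_process is a hypothetical delivery function, not part of the embodiment.

```python
# Sketch of the processing result transmission destination determination (steps S52 and S53).
def forward_processing_result(result, transmitter, send_to_process):
    pid = transmitter.latest_pid     # requesting PID recorded by the use request transmission unit 126
    send_to_process(pid, result)     # transmit the result to the inference process 11 with that PID
```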
As described above, in the first embodiment, when processes of a plurality of applications are executed in an overlapping manner, the execution server 1 records, in the profile information 15, the processing time of the first step in the processes of the plurality of applications as a threshold. When receiving an execution request from a subsequent application during execution of a process of any application among the plurality of applications, the execution server 1 delays the start of the process of the subsequent application by the threshold or more from the start of the process of the preceding application being executed. With such a configuration, the execution server 1 may perform control such that the first steps do not overlap, and may suppress an increase in processing time due to overlapping execution of the first steps.
In the first embodiment, the execution server 1 delays the start of the process of the subsequent application by a value obtained by subtracting the time of the timing of the execution request of the subsequent application from the value obtained by adding the threshold to the start time of the preceding application being executed, or more. With such a configuration, the execution server 1 may delay the start of the process of the subsequent application by such a length of time that the first steps do not overlap, or longer.
In the first embodiment, when processes of a plurality of applications use the same algorithm, the execution server 1 sets a value obtained by measuring the processing time of the first step as the threshold. With such a configuration, by using the value obtained by measuring the processing time of the first step as the threshold, the execution server 1 may suppress an increase in processing time due to overlapping execution of the first steps.
In the first embodiment, when a plurality of inference processes 11 are executed in an overlapping manner, the same inference model 32 (algorithm) is used in the inference processes 11. For example, the execution server 1 measures the processing time of the convolution processing of any inference process 11 and records the processing time as a threshold in the profile information 15, and delays the start timing of a subsequent inference process 11 by the threshold or more from the start of use of a preceding inference process 11. However, without being limited to the case of the first embodiment, different inference models 32 (algorithms) may be used in a plurality of inference processes 11 when the inference processes 11 are executed in an overlapping manner.
In the second embodiment, a case will be described in which different inference models 32 (algorithms) are used in a plurality of inference processes 11 when the inference processes 11 are executed in an overlapping manner.
[Functional Configuration of GPU Use Control Unit]
The profile information 15A stores the processing time of preprocessing and the processing time of convolution processing for each inference model 32 (algorithm). As an example, the GPU use control unit 12 measures the processing time of preprocessing and the processing time of convolution processing for each inference model 32 in advance, and records them in the profile information 15A.
An example of the profile information 15A according to the second embodiment will be described with reference to
As an example, when model name is “model A”, “Tb_A” is stored as the preprocessing time and “Tt_A” is stored as the convolution processing time. When model name is “model B”, “Tb_B” is stored as the preprocessing time and “Tt_B” is stored as the convolution processing time. When model name is “model C”, “Tb_C” is stored as the preprocessing time and “Tt_C” is stored as the convolution processing time. “Tb_A”, “Tt_A”, “Tb_B”, “Tt_B”, “Tb_C”, and “Tt_C” are positive integers.
Referring back to
For example, the delay execution determination unit 123A acquires the model name of the inference model 32 included in the GPU use request. The delay execution determination unit 123A determines whether the request queue 125 that accumulates GPU use requests is empty. When the request queue 125 is empty, the delay execution determination unit 123A acquires the latest time of GPU use (GPU latest use time) and the model name of the latest used inference model 32. For example, the delay execution determination unit 123A acquires the model name of the inference model 32 used in the inference process 11 executed immediately before (preceding inference process). The delay execution determination unit 123A acquires, from the profile information 15A, the preprocessing time and the convolution processing time corresponding to the model name of the inference model 32 used in the preceding inference process 11. The delay execution determination unit 123A acquires, from the profile information 15A, the preprocessing time and the convolution processing time corresponding to the model name of the inference model 32 used in the requesting (subsequent) inference process 11.
The delay execution determination unit 123A calculates, as a threshold, a value obtained by subtracting the preprocessing time corresponding to the inference model 32 used in the subsequent inference process 11 from the value obtained by adding the preprocessing time and the convolution processing time corresponding to the inference model 32 used in the preceding inference process 11. For example, the delay execution determination unit 123A calculates the threshold based on the combination of the inference model 32 used in the preceding inference process 11 and the inference model 32 used in the subsequent inference process 11.
The delay execution determination unit 123A calculates, as a waiting time, a time obtained by subtracting the current time from the time obtained by adding the threshold to the latest use time. When the waiting time is larger than 0, the delay execution determination unit 123A accumulates the GPU use request in the request queue 125, and sets the waiting time in the delay-waiting request management unit 124A. For example, the delay execution determination unit 123A performs control to delay the start timing of the (subsequent) inference process 11 for which the GPU use request is made by the threshold or more from the start of use of the preceding inference process 11. For example, the delay execution determination unit 123A performs control such that the convolution processing of the inference process 11 for which the GPU use request is made does not overlap the convolution processing of the preceding inference process 11. When the waiting time is equal to or smaller than 0, the delay execution determination unit 123A makes the GPU use request to the use request transmission unit 126. For example, when the waiting time is equal to or smaller than 0, the GPU latest use time is earlier than the current time by the threshold or more. Thus, the delay execution determination unit 123A determines that the subsequent inference process 11 does not overlap the convolution processing of the preceding inference process 11, and makes a GPU use request for the subsequent inference process 11.
The delay-waiting request management unit 124A manages the GPU use requests waiting for delay. For example, the delay-waiting request management unit 124A waits until a waiting time set by the delay execution determination unit 123A passes. After waiting until the waiting time passes, the delay-waiting request management unit 124A makes the first GPU use request in the request queue 125 to the use request transmission unit 126. The delay-waiting request management unit 124A determines whether the request queue 125 is empty. When the request queue 125 is not empty, the delay-waiting request management unit 124A acquires the inference model name of the first request in the request queue 125. The delay-waiting request management unit 124A acquires the model name of the inference model 32 used in the inference process 11 executed immediately before (preceding inference process). The delay-waiting request management unit 124A acquires, from the profile information 15A, the preprocessing time and the convolution processing time corresponding to the inference model name of the request. The delay-waiting request management unit 124A acquires, from the profile information 15A, the preprocessing time and the convolution processing time corresponding to the model name of the inference model 32 used in the preceding inference process 11.
The delay-waiting request management unit 124A calculates, as a threshold, a value obtained by subtracting the preprocessing time corresponding to the inference model name of the request from the value obtained by adding the preprocessing time and the convolution processing time corresponding to the inference model 32 used in the preceding inference process 11. For example, the delay-waiting request management unit 124A calculates the threshold based on the combination of the inference model 32 used in the preceding inference process 11 and the inference model 32 used in the inference process 11 for which the request is made.
The delay-waiting request management unit 124A sets the calculated threshold value as the waiting time. For example, the delay-waiting request management unit 124A performs control to delay the start timing of the subsequent inference process 11 by the threshold from the start of use of the currently transmitted inference process 11 so that the convolution processing of the subsequent inference process 11 and the convolution processing of the preceding inference process 11 do not overlap.
[Flowchart of GPU Use Control]
A flowchart of delay execution determination processing according to the second embodiment will be described with reference to
Next, the delay execution determination unit 123A determines whether the request queue 125 that accumulates waiting use requests is empty (step S63). When it is determined that the request queue 125 is empty (Yes in step S63), the delay execution determination unit 123A acquires the recorded GPU latest use time and latest use model name (step S64). In this case, the latest use model name is “model B”. The GPU latest use time and the latest use model name are recorded by the use request transmission unit 126.
The delay execution determination unit 123A acquires information corresponding to the model name from the profile information 15A (step S65). In this case, the delay execution determination unit 123A acquires, from the profile information 15A, the preprocessing time and the convolution processing time corresponding to the latest use model name (model B). The delay execution determination unit 123A acquires, from the profile information 15A, the preprocessing time and the convolution processing time corresponding to the model name corresponding to the request (model A).
The delay execution determination unit 123A acquires the current time from the system (OS) (step S66). The delay execution determination unit 123A calculates a threshold from the following formula (2), and calculates a waiting time from formula (3) by using the calculated threshold (step S67). Formula (3) is the same as formula (1).
Threshold=model B preprocessing time+model B convolution processing time−model A preprocessing time (2)
Waiting time=(GPU latest use time+threshold)−current time (3)
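As a non-limiting illustration of formulas (2) and (3), the following sketch assumes that the profile information 15A is held as a dictionary keyed by model name; the numerical times are placeholders rather than measured values.

```python
# Illustrative computation of formulas (2) and (3); values and names are assumptions.
import time

profile_info_15a = {
    "model A": {"preprocessing": 0.010, "convolution": 0.040},  # Tb_A, Tt_A (placeholder seconds)
    "model B": {"preprocessing": 0.015, "convolution": 0.060},  # Tb_B, Tt_B (placeholder seconds)
}

def waiting_time_for(preceding_model, subsequent_model, gpu_latest_use_time):
    prev = profile_info_15a[preceding_model]
    nxt = profile_info_15a[subsequent_model]
    # Formula (2): threshold = preceding preprocessing time + preceding convolution time
    #              - subsequent preprocessing time
    threshold = prev["preprocessing"] + prev["convolution"] - nxt["preprocessing"]
    # Formula (3): waiting time = (GPU latest use time + threshold) - current time
    return (gpu_latest_use_time + threshold) - time.monotonic()
```

In the example of this flowchart, the preceding inference process 11 uses model B and the requesting inference process 11 uses model A, which corresponds to calling waiting_time_for("model B", "model A", gpu_latest_use_time).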
The delay execution determination unit 123A determines whether the waiting time is larger than 0 (step S68). When it is determined that the waiting time is equal to or smaller than 0 (No in step S68), the delay execution determination unit 123A outputs the detected GPU use request and the PID to the use request transmission unit 126, and requests for transmission of the request (step S69). For example, when the waiting time is equal to or smaller than 0, the GPU latest use time is earlier than the current time by the threshold or more. Thus, the delay execution determination unit 123A determines that the subsequent inference process 11 does not overlap the convolution processing of the preceding inference process 11, and makes a GPU use request for the subsequent inference process 11. The delay execution determination unit 123A ends the delay execution determination processing.
On the other hand, when it is determined that the waiting time is larger than 0 (Yes in step S68), the delay execution determination unit 123A adds the GPU use request information and the PID to the request queue 125 (step S70). The delay execution determination unit 123A sets the waiting time in the delay-waiting request management unit 124A (step S71). For example, the delay execution determination unit 123A performs control to delay the start timing of the subsequent inference process 11 by the threshold or more from the start of use of the preceding inference process 11 so that the subsequent inference process 11 does not overlap the convolution processing of the preceding inference process 11 that largely affects the processing time. The delay execution determination unit 123A ends the delay execution determination processing.
When it is determined in step S63 that the request queue 125 is not empty (No in step S63), the delay execution determination unit 123A adds the GPU use request information and the PID to the end of the request queue 125 (step S72). The delay execution determination unit 123A ends the delay execution determination processing.
On the other hand, when it is determined that the waiting time has been set (Yes in step S81), the delay-waiting request management unit 124A waits until the set time passes (step S82). After waiting until the set time passes, the delay-waiting request management unit 124A outputs the first request in the request queue 125 and the PID to the use request transmission unit 126, and requests for transmission of the request (step S83).
The delay-waiting request management unit 124A determines whether the request queue 125 is empty (step S84). When it is determined that the request queue 125 is not empty (No in step S84), the delay-waiting request management unit 124A acquires the model name of the first request in the request queue 125 (step S85). In this case, the model name of the first request is model A. The delay-waiting request management unit 124A acquires the model name corresponding to the transmission request having been made immediately before (step S86). In this case, the model name corresponding to the transmission request having been made immediately before is model B. The delay-waiting request management unit 124A may acquire the model name associated with the GPU latest use time as the model name corresponding to the transmission request having been made immediately before.
The delay-waiting request management unit 124A acquires information corresponding to the model name from the profile information 15A (step S87). In this case, the delay-waiting request management unit 124A acquires the preprocessing time and the convolution processing time corresponding to model A, and acquires the preprocessing time and the convolution processing time corresponding to model B, from the profile information 15A.
The delay-waiting request management unit 124A calculates a threshold from the above-described formula (2) (step S88). The delay-waiting request management unit 124A sets the threshold as a waiting time in order for the next request to wait (step S89). The delay-waiting request management unit 124A proceeds to step S82.
On the other hand, when it is determined that the request queue 125 is empty (Yes in step S84), the delay-waiting request management unit 124A ends the delay-waiting request management processing.
As described above, in the second embodiment, when processes of a plurality of applications use different algorithms, the execution server 1 records, for each algorithm, the processing time of the first step and the processing time of the second step before the first step in the profile information 15A. The execution server 1 calculates a threshold from the processing time of the first step and the processing time of the second step corresponding to the algorithm in the process of the preceding application being executed, and the processing time of the first step corresponding to the algorithm in the process of the subsequent application. The execution server 1 delays the start of the process of the subsequent application by the threshold or more from the start of the process of the preceding application being executed. With such a configuration, even when processes of a plurality of applications use different algorithms, the execution server 1 may suppress an increase in processing time due to overlapping execution of the first steps.
In the first embodiment, the execution server 1 measures the processing time of the convolution processing of any inference process 11 and records the processing time in the profile information 15 as a threshold in advance, and reads and uses the threshold to perform control of delaying the start timing of the subsequent inference process 11. However, the GPU that measures a threshold in advance may be different from the GPU that actually executes GPU use control processing.
In the third embodiment, description will be given for GPU use control processing executed when the GPU that measures a threshold in advance is different from the GPU that actually executes the GPU use control processing.
[Functional Configuration of GPU Use Control Unit]
The profile information 15B stores processing time in addition to a predetermined threshold. The profile information 15B also stores a coefficient for each inference process 11. A threshold is a value obtained by measuring the processing time of convolution processing in advance using a first GPU. Processing time is the entire execution time taken when the inference process 11 is executed by using the first GPU in advance. A coefficient is a ratio between the entire execution time measured in advance using the first GPU and actual processing time taken when the processing is actually executed using a second GPU. Actual processing time and coefficient are calculated by the processing result transmission destination determination unit 128B.
An example of the profile information 15B according to the third embodiment will be described with reference to
As an example, “nn” is stored as the threshold. “t0” is stored as the processing time. “nn” and “t0” are positive integers. When PID is “PID_A”, “coefficient A” is stored as the coefficient.
Referring back to
When the request queue 125 is not empty, the delay execution determination unit 123B accumulates the GPU use request in the request queue 125.
When the coefficient corresponding to the process ID is not set in the profile information 15B, the delay execution determination unit 123B requests the use request transmission unit 126B to execute the GPU use request if the GPU is available. This is to cause the processing result transmission destination determination unit 128B to calculate the actual processing time by causing the target use request to be executed at a timing when no load is applied to the GPU, and to calculate the coefficient corresponding to the process ID of the inference process 11 that has issued the target use request.
The delay-waiting request management unit 124B manages the GPU use requests waiting for delay. For example, the delay-waiting request management unit 124B waits until a waiting time set by the delay execution determination unit 123B passes. After waiting until the waiting time passes, the delay-waiting request management unit 124B makes the first GPU use request in the request queue 125 to the use request transmission unit 126B. The delay-waiting request management unit 124B determines whether the request queue 125 is empty. When the request queue 125 is not empty, the delay-waiting request management unit 124B acquires, from the profile information 15B, the threshold and the coefficient corresponding to the first process ID accumulated in the request queue 125. The delay-waiting request management unit 124B sets, as a waiting time, a new threshold obtained by multiplying the threshold by the coefficient.
When the coefficient corresponding to the process ID is not set in the profile information 15B, the delay-waiting request management unit 124B requests the use request transmission unit 126B to execute the GPU use request if the GPU is available. This is to cause the processing result transmission destination determination unit 128B to calculate the actual processing time by causing the target use request to be executed at a timing when no load is applied to the GPU, and to calculate the coefficient corresponding to the process ID of the inference process 11 that has issued the target use request.
The use request transmission unit 126B transmits a GPU use request to the AI framework 14 via the GPU driver 13. For example, the use request transmission unit 126B updates the latest time of GPU use (GPU latest use time) to the current time. The use request transmission unit 126B records the requesting process ID of the GPU use request in association with the GPU latest use time. The use request transmission unit 126B transmits the GPU use request to the GPU driver 13. The use request transmission unit 126B records the processing state of GPU as “processing”.
The processing result transmission destination determination unit 128B determines a transmission destination of the processing result.
For example, the processing result transmission destination determination unit 128B records the processing state of GPU as “available” indicating that the GPU is not processing. The processing result transmission destination determination unit 128B acquires, as the transmission destination of the processing result, the recorded requesting process ID associated with the GPU latest use time from the use request transmission unit 126B. The processing result transmission destination determination unit 128B transmits the processing result to the inference process 11 corresponding to the requesting process ID via the processing result transmission unit 129.
When the coefficient corresponding to the process ID is not set in the profile information 15B, the processing result transmission destination determination unit 128B calculates the coefficient corresponding to the process ID. As an example, the processing result transmission destination determination unit 128B calculates an actual processing time obtained by subtracting the latest use time from the current time. The processing result transmission destination determination unit 128B calculates a value obtained by dividing the actual processing time by the processing time set in the profile information 15B as a coefficient, and records the value in the profile information 15B.
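The derivation of the coefficient and the application of the scaled threshold may be sketched as follows; the dictionary layout of the profile information 15B, the placeholder values, and the function names are illustrative assumptions.

```python
# Sketch of coefficient calculation and of applying the scaled threshold (formula (4));
# illustrative only.
import time

profile_info_15b = {
    "threshold": 0.040,        # convolution time measured in advance with the first GPU (placeholder)
    "processing_time": 0.200,  # total processing time measured in advance with the first GPU (placeholder)
    "coefficients": {},        # coefficient recorded per requesting process ID
}

def record_coefficient(pid, gpu_latest_use_time):
    # Actual processing time = current time - GPU latest use time, observed on the second GPU
    actual_processing_time = time.monotonic() - gpu_latest_use_time
    profile_info_15b["coefficients"][pid] = actual_processing_time / profile_info_15b["processing_time"]

def scaled_waiting_time(pid, gpu_latest_use_time):
    # Formula (4): waiting time = (GPU latest use time + threshold x coefficient) - current time
    coefficient = profile_info_15b["coefficients"][pid]
    return (gpu_latest_use_time + profile_info_15b["threshold"] * coefficient) - time.monotonic()
```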
[Flowchart of Delay Execution Determination Processing]
Next, the delay execution determination unit 123B determines whether the request queue 125 that accumulates waiting use requests is empty (step S93). When it is determined that the request queue 125 is empty (Yes in step S93), the delay execution determination unit 123B acquires the recorded GPU latest use time (step S94). The GPU latest use time is the latest time of GPU use, and is, for example, a time at which a GPU use request has been most recently transmitted. The GPU latest use time is recorded by the use request transmission unit 126B.
The delay execution determination unit 123B acquires a threshold from the profile information 15B (step S95). The delay execution determination unit 123B acquires the current time from the system (OS) (step S96). The delay execution determination unit 123B acquires the coefficient corresponding to the PID from the profile information 15B (step S97).
The delay execution determination unit 123B determines whether coefficient is empty (step S98). When it is determined that coefficient is empty (Yes in step S98), the delay execution determination unit 123B acquires the processing state of GPU (step S99). The delay execution determination unit 123B determines whether the processing state is “processing” (step S100). When it is determined that the processing state is not “processing” (No in step S100), the delay execution determination unit 123B proceeds to step S102 to request for transmission of the GPU use request. This is to cause the processing result transmission destination determination unit 128B to calculate the actual processing time by causing the target use request to be executed at a timing when no load is applied to the GPU, and to calculate the coefficient corresponding to the process ID of the inference process 11 that has issued the target use request.
On the other hand, when it is determined that the processing state is “processing” (Yes in step S100), the delay execution determination unit 123B adds the GPU use request information and the requesting process ID to the request queue 125 (step S101). In such a case, since a coefficient is not set, the delay execution determination unit 123B may not calculate a waiting time and does not set the waiting time in the delay-waiting request management unit 124B. The delay execution determination unit 123B ends the delay execution determination processing.
When it is determined in step S98 that coefficient is not empty (No in step S98), the delay execution determination unit 123B calculates a waiting time from the following formula (4) (step S103).
Waiting time=(GPU latest use time+threshold×coefficient)−current time (4)
The delay execution determination unit 123B determines whether the waiting time is larger than 0 (step S104). When it is determined that the waiting time is equal to or smaller than 0 (No in step S104), the delay execution determination unit 123B outputs the detected GPU use request and the PID to the use request transmission unit 126B, and requests for transmission of the request (step S102). The delay execution determination unit 123B ends the delay execution determination processing.
On the other hand, when it is determined that the waiting time is larger than 0 (Yes in step S104), the delay execution determination unit 123B adds the GPU use request information and the PID to the request queue 125 (step S105). The delay execution determination unit 123B sets the waiting time in the delay-waiting request management unit 124B (step S106). The delay execution determination unit 123B ends the delay execution determination processing.
When it is determined in step S93 that the request queue 125 is not empty (No in step S93), the delay execution determination unit 123B adds the GPU use request information and the PID to the end of the request queue 125 (step S107). The delay execution determination unit 123B ends the delay execution determination processing.
[Flowchart of Delay-Waiting Request Management Processing]
On the other hand, when it is determined that the waiting time has been set (Yes in step S111), the delay-waiting request management unit 124B waits until the set time passes (step S112). After waiting until the set time passes, the delay-waiting request management unit 124B outputs the first request in the request queue 125 and the PID to the use request transmission unit 126B, and requests for transmission of the request (step S113).
The delay-waiting request management unit 124B determines whether the request queue 125 is empty (step S114). When it is determined that the request queue 125 is not empty (No in step S114), the delay-waiting request management unit 124B acquires the threshold from the profile information 15B (step S115). The delay-waiting request management unit 124B acquires the coefficient corresponding to the PID of the first request in the request queue 125 (step S116).
The delay-waiting request management unit 124B determines whether coefficient is empty (step S117). When it is determined that coefficient is not empty (No in step S117), the delay-waiting request management unit 124B sets, as a waiting time, a value obtained by multiplying the threshold by the coefficient in order for the next request to wait (step S117A). The delay-waiting request management unit 124B proceeds to step S112.
On the other hand, when it is determined that coefficient is empty (Yes in step S117), the delay-waiting request management unit 124B acquires the processing state of GPU (step S118A). The delay-waiting request management unit 124B determines whether the processing state is “processing” (step S118B). When it is determined that the processing state is “processing” (Yes in step S118B), the delay-waiting request management unit 124B ends the delay-waiting request management processing.
On the other hand, when it is determined that the processing state is not “processing” (No in step S118B), the delay-waiting request management unit 124B outputs the first request in the request queue 125 and the PID to the use request transmission unit 126B, and requests for transmission of the request (step S118C). This is to cause the processing result transmission destination determination unit 128B to calculate the actual processing time by causing the target use request to be executed at a timing when no load is applied to the GPU, and to calculate the coefficient corresponding to the process ID of the inference process 11 that has issued the target use request. The delay-waiting request management unit 124B ends the delay-waiting request management processing.
When it is determined in step S114 that the request queue 125 is empty (Yes in step S114), the delay-waiting request management unit 124B ends the delay-waiting request management processing.
[Flowchart of Use Request Transmission Processing]
On the other hand, when it is determined that there has been a request for transmission of a GPU use request (Yes in step S121), the use request transmission unit 126B acquires the current time from the system (OS) (step S122). The use request transmission unit 126B updates the GPU latest use time to the current time (step S123). The use request transmission unit 126B records the requesting PID in association with the GPU latest use time (step S124).
The use request transmission unit 126B transmits the GPU use request to the GPU driver 13 (step S125). The use request transmission unit 126B records the processing state of GPU as “processing” (step S126). The use request transmission unit 126B ends the use request transmission processing.
[Flowchart of Processing Result Transmission Destination Determination Processing]
On the other hand, when it is determined that the processing result has been received (Yes in step S131), the processing result transmission destination determination unit 128B records the processing state of GPU as “available” (step S132). The processing result transmission destination determination unit 128B acquires the recorded requesting PID from the use request transmission unit 126B (step S133). The processing result transmission destination determination unit 128B acquires, from the profile information 15B, the coefficient corresponding to the acquired PID (step S134).
Next, the processing result transmission destination determination unit 128B determines whether coefficient is empty (step S135). When it is determined that coefficient is empty (Yes in step S135), the processing result transmission destination determination unit 128B acquires the current time from the system (OS) (step S136). The processing result transmission destination determination unit 128B calculates a value obtained by subtracting the GPU latest use time from the current time as the actual processing time (step S137).
The processing result transmission destination determination unit 128B acquires the processing time from the profile information 15B (step S138). The processing result transmission destination determination unit 128B records (actual processing time/processing time) in the profile information 15B as the coefficient corresponding to the PID (step S139).
The processing result transmission destination determination unit 128B determines whether the request queue is empty (step S140). When it is determined that the request queue is empty (Yes in step S140), the processing result transmission destination determination unit 128B proceeds to step S142.
On the other hand, when it is determined that the request queue is not empty (No in step S140), the processing result transmission destination determination unit 128B sets the waiting time to 0 in the delay-waiting request management unit 124B to immediately start the next request (step S141). The processing result transmission destination determination unit 128B proceeds to step S142.
In step S142, the processing result transmission destination determination unit 128B transmits the processing result to the application (inference process 11) corresponding to the acquired PID (step S142). The processing result transmission destination determination unit 128B ends the processing result transmission destination determination processing.
[Use of Multiple Control]
As described above, in the third embodiment, when processes of a plurality of applications use the same algorithm, the execution server 1 sets, as the threshold, a value obtained by measuring the processing time of the first step with the first GPU. The execution server 1 further records, in the profile information 15B, the total processing time of the process of any application executed with the first GPU. When a process is executed with the second GPU different from the first GPU, the execution server 1 performs control such that the first process of an application does not overlap the process of another application, and measures the total processing time of the process. The execution server 1 calculates a ratio between the total processing time stored in the profile information 15B and the measured total processing time, and uses, as a new threshold, a value obtained by multiplying the threshold by the calculated ratio. With such a configuration, even when the GPU that executes a process is changed, the execution server 1 may suppress an increase in processing time due to overlapping execution.
In the third embodiment, description is given for multiple control performed by the execution server 1 when a plurality of inference processes 11 use the same algorithm. However, the execution server 1 may also perform multiple control when a plurality of inference processes 11 use different algorithms. For example, when processes of a plurality of applications use different algorithms, the execution server 1 measures the total processing time of the process of an application executed with the first GPU for each algorithm, and records the total processing time in the profile information 15B. When a process is executed with the second GPU different from the first GPU, the execution server 1 performs control such that the first process of an application does not overlap the process of another application, and measures the total processing time of the process for each algorithm. The execution server 1 calculates a ratio (coefficient) for each algorithm from the total processing time for each algorithm stored in the profile information 15B and the measured total processing time for each algorithm, and calculates a new threshold using the calculated ratio for each algorithm and the threshold. The execution server 1 may calculate a waiting time of the corresponding inference process 11 by using the new threshold corresponding to the algorithm. Thus, even when a plurality of inference processes 11 use different algorithms and the GPU that executes a process is changed, the execution server 1 may suppress an increase in processing time due to overlapping execution.
Each component of the GPU use control unit 12 included in the execution server 1 illustrated in the drawings does not necessarily have to be physically configured as illustrated in the drawings. For example, specific forms of separation and integration of each device are not limited to those illustrated in the drawings, and all or a part thereof may be functionally or physically separated and integrated in any unit depending on various loads, usage states, and the like. For example, the reading unit 122 and the delay execution determination unit 123 may be integrated as one unit. The delay-waiting request management unit 124 may be separated into a waiting unit that causes a GPU use request to wait for a set waiting time and a setting unit that calculates and sets a waiting time for the next GPU use request. A storage unit (not illustrated) that stores the profile information 15 and the like may be coupled via a network as an external device of the execution server 1.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.