This application claims the benefit under 35 U.S.C. § 119 of Korean Patent Application No. 10-2021-0153402, filed on Nov. 9, 2021, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
The present disclosure relates to a system and method for scheduling machine learning jobs, and more particularly, to a system and method that calculate the expected time required for machine learning jobs and allocate cloud resources to each machine learning job requested for execution using the calculation results.
In order to execute machine learning jobs, a large amount of computational resources has to be invested. Accordingly, a machine learning job is generally performed by allocating resources through a cloud system consisting of multiple computational resources. Therefore, there is a clear need for scheduling that determines the execution order of each machine learning job.
For efficient scheduling, it is necessary to predict the expected execution time of each machine learning job. Conventionally, the execution time of a machine learning job was predicted using a statistical model based on the floating point operation (FLOP) count of the model execution, or using a model in which the execution time was machine-learned according to the FLOP count. However, since the execution time of a machine learning job is determined by various factors other than the FLOP count, a prediction of the expected execution time based only on the FLOP count of the model execution is difficult to trust. Consequently, scheduling to determine the execution order of each machine learning job has also not been performed efficiently.
In a paper entitled “Predicting the Computational Cost of Deep Learning Models”, Daniel Justus et al. provided a method for machine-learning a model that outputs the computational cost of training a deep learning model using several layer features and hardware features. However, factors other than the layer features and hardware features also affect the expected execution time of machine learning jobs; therefore, despite Daniel Justus's paper, there is still a need to accurately predict the execution time of machine learning jobs.
In addition, in a system requested to execute machine learning jobs, there is a need for a system that, to increase user convenience, receives only a minimum amount of information, obtains an accurate expected execution time using that information, and also handles automatic scheduling based on the expected execution time.
Aspects of the present disclosure provide a system and method for scheduling machine learning jobs that automatically determine an expected execution time using an analysis result of the machine learning jobs, and schedule the machine learning jobs using the determined expected execution time, in a system for processing a request for an execution of the machine learning jobs.
Aspects of the present disclosure also provide a system and method that accurately determine the expected execution time of machine learning jobs using new input factors that have not been previously considered.
The technical aspects of the present disclosure are not restricted to those set forth herein, and other unmentioned technical aspects will be clearly understood by one of ordinary skill in the art to which the present disclosure pertains by referencing the detailed description of the present disclosure given below.
According to some embodiments of the present disclosure, there is provided a system for scheduling machine learning jobs. The system includes a user interface provision unit configured to transmit, to a user terminal, data for forming an input interface to receive job information including information indicative of a checkpoint file of a model to be executed and required resource information, a resource management unit configured to manage a job queue including items, wherein each item of the job queue includes the job information and an annotation including an expected execution time, complement an item in which the expected execution time is not recorded in the annotation by receiving the expected execution time from an execution time expectation unit, and execute job scheduling using the job information of each item of the job queue, and the execution time expectation unit configured to calculate memory usage necessary for executing the model using information on the checkpoint file and determine the expected execution time of the model using the memory usage, wherein the checkpoint file is a file output by a machine learning framework for storing the model to be executed.
According to other embodiments of the present disclosure, there is provided a method for scheduling machine learning jobs performed by a computing device. The method includes receiving job information including information indicative of a checkpoint file of a model to be executed and required resource information, calculating, using information on the checkpoint file, the memory usage necessary for executing the model and determining an expected execution time of the model using the memory usage, and automatically performing job scheduling for the model using the expected execution time of the model and the required resource information, wherein the checkpoint file is a file output by a machine learning framework for storing the model to be executed.
According to still other embodiments of the present disclosure, there is provided a computer program coupled to a computing device, and a computer-readable medium storing the computer program. The computer program includes instructions for receiving job information including information indicative of a checkpoint file of a model and required resource information, wherein the checkpoint file is a file output by a machine learning framework for storing the model to be executed, determining an expected execution time of the model using memory usage after calculating the memory usage necessary for executing the model using information on the checkpoint file, and automatically performing job scheduling for the model using the expected execution time of the model and the required resource information.
The above and other aspects and features of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
Hereinafter, preferred embodiments of the present disclosure will be described with reference to the attached drawings. Advantages and features of the present disclosure and methods of accomplishing the same may be understood more readily by reference to the following detailed description of preferred embodiments and the accompanying drawings.
The present disclosure may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the disclosure to those skilled in the art, and the present disclosure will only be defined by the appended claims.
In adding reference numerals to the components of each drawing, it should be noted that the same reference numerals are assigned to the same components as much as possible even though they are shown in different drawings. In addition, in describing the present disclosure, when it is determined that the detailed description of the related well-known configuration or function may obscure the gist of the present disclosure, the detailed description thereof will be omitted.
Unless otherwise defined, all terms used in the present specification (including technical and scientific terms) may be used in a sense that can be commonly understood by those skilled in the art. In addition, the terms defined in the commonly used dictionaries are not ideally or excessively interpreted unless they are specifically defined clearly. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. In this specification, the singular also includes the plural unless specifically stated otherwise in the phrase.
In addition, in describing the component of this disclosure, terms, such as first, second, A, B, (a), (b), can be used. These terms are only for distinguishing the components from other components, and the nature or order of the components is not limited by the terms. If a component is described as being “connected,” “coupled” or “contacted” to another component, that component may be directly connected to or contacted with that other component, but it should be understood that another component also may be “connected,” “coupled” or “contacted” between each component.
The terms “comprise”, “include”, “have”, etc., when used in this specification, specify the presence of stated features, integers, steps, operations, elements, components, and/or combinations of them, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
Hereinafter, embodiments of the present disclosure will be described with reference to the attached drawings.
Referring to
As illustrated in
A machine learning resource 30 may be understood as a cloud service-based server farm composed of a plurality of physical servers or virtual servers. The machine learning resource 30 may perform a machine learning job under the control of the resource management unit 200.
In addition, the machine learning resource 30 may perform a machine learning job using training data stored in a user data storage 20. The user data storage 20 is accessed by a user terminal 10 to input and output data for machine learning, such as training data. While
The resource management unit 200 monitors idle resources of the machine learning resource 30 and processes a request for performing a machine learning job input through the user interface provision unit 300 using the monitoring results. In addition, the resource management unit 200 registers, in a job queue, the machine learning job that was requested for execution, then requests and receives an expected execution time of the machine learning job from the execution time expectation unit 100, and, finally, determines when each machine learning job will be performed in the machine learning resource 30 using the expected execution time and the results of monitoring the idle resources of the machine learning resource 30.
An operation of the resource management unit 200 will be described in more detail with reference to
A job queue 210 is a data structure operating in a first-in-first-out (FIFO) scheme, and job information on the machine learning job may be stored in each item of the job queue 210.
A job queue management module 220 manages the job queue 210. In other words, the job queue management module 220 receives job information on a new machine learning job input through the user interface provision unit 300, sets the job information indicative of the new machine learning job, and inserts the set job information into the job queue 210. In that case, the job information set by the job queue management module 220 may include information indicative of a checkpoint file of a model to be executed and required resource information, and both the information indicative of the checkpoint file and the required resource information are information input through the user interface provision unit 300.
In addition, the job queue management module 220 may insert the job information and an annotation together into the job queue 210. The annotation is detailed information on the job, and at least some of the information recorded in the annotation may be used to schedule the job. The annotation may include a field indicating an expected execution time. The job queue management module 220 may insert, into the job queue 210, the job information on the new machine learning job together with an annotation in which a value of the expected execution time is not yet recorded.
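As an illustration only, and not the claimed implementation, the job queue and its items described above can be sketched in Python; the field names and resource keys are assumptions introduced for this sketch:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Annotation:
    # Expected execution time (e.g. in seconds); stays None until the
    # complementary job described above records a value.
    expected_execution_time: Optional[float] = None

@dataclass
class JobItem:
    checkpoint_path: str       # information indicative of the checkpoint file
    required_resources: dict   # required resource information, e.g. {"gpu_cores": 1}
    annotation: Annotation = field(default_factory=Annotation)

# The job queue operates in a first-in-first-out (FIFO) scheme.
job_queue = deque()
job_queue.append(JobItem("ckpt/model_a.pt", {"gpu_cores": 1}))
job_queue.append(JobItem("ckpt/model_b.pt", {"gpu_cores": 2}))

first = job_queue.popleft()  # the item inserted first leaves first
```

A newly inserted item carries an annotation whose expected execution time is unrecorded, matching the state described above.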
In addition, the job queue management module 220 may execute a complementary job on a new item in which the expected execution time is not recorded in the annotation among the items included in the job queue 210. The complementary job denotes requesting and receiving the expected execution time for the new item from the execution time expectation unit 100 and recording the expected execution time in the annotation of the new item.
A scheduler 230 receives idle resource information of the machine learning resource 30 from a resource monitor 240 and executes job scheduling using the idle resource information and job information of each item of the job queue 210.
The scheduler 230 may request the job queue management module 220 to execute the complementary job. The scheduler 230 may periodically perform a scheduling process, and may request the job queue management module 220 to perform the complementary job at a start point of the scheduling process.
In other words, the scheduling process may include providing, to the execution time expectation unit, an expected execution time request for a new item in which the expected execution time is not recorded in the annotation among the items of the job queue, receiving the expected execution time from the execution time expectation unit in response to the expected execution time request, and recording the received expected execution time as the expected execution time in the annotation of the new item. The expected execution time for the new machine learning job can be determined immediately before performing the scheduling, thereby determining the expected execution time using the most recently updated information.
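The complementary step described above can be sketched as follows; `estimate_execution_time` is a hypothetical stand-in for the execution time expectation unit, and the item structure is an assumption of this sketch:

```python
def complement_queue(job_queue, estimate_execution_time):
    """Fill in the expected execution time for every item whose
    annotation does not yet record one (the 'complementary job')."""
    for item in job_queue:
        if item.annotation.get("expected_execution_time") is None:
            item.annotation["expected_execution_time"] = (
                estimate_execution_time(item)
            )

# Items are plain objects with a dict annotation for this sketch.
class Item:
    def __init__(self, name):
        self.name = name
        self.annotation = {"expected_execution_time": None}

queue = [Item("job-1"), Item("job-2")]
queue[0].annotation["expected_execution_time"] = 120.0  # already annotated

# Only job-2 is complemented; job-1 keeps its recorded value.
complement_queue(queue, lambda item: 300.0)
```

Running the complementary step immediately before scheduling, as described above, means the estimate reflects the latest available information.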
Meanwhile, in some embodiments, scheduling using the items in which the expected execution time is recorded in the annotation is performed first, and then, when there are idle resources, additional scheduling may be performed by further using the new items in which the expected execution time is not recorded in the annotation.
In other words, the scheduling process may include allocating resources using the job information of each item in which the expected execution time is recorded in the annotation and, when idle resources remain even after that allocation, allocating resources using the job information of the new items in which the expected execution time is not recorded in the annotation. Furthermore, allocating resources using the job information of a new item may include providing the execution time expectation unit with an expected execution time request for the new item, receiving the expected execution time from the execution time expectation unit in response to the request, and recording the received expected execution time in the annotation. As such, scheduling using the items in which the expected execution time is already recorded can be performed first, thereby obtaining the effect of allocating idle resources to machine learning jobs as soon as possible. In other words, in this case, a delay in the allocation of resources caused by the calculation of the expected execution time can be minimized even when idle resources are present.
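A minimal sketch of this two-phase ordering, under the assumption that items carry an `expected_time` field and that `allocate` and `get_expected_time` stand in for the scheduler and the execution time expectation unit:

```python
def two_phase_schedule(queue, idle_count, allocate, get_expected_time):
    # Phase 1: items whose annotation already records an expected
    # execution time are scheduled first, so idle resources are
    # handed out without waiting on any expected-time calculation.
    annotated = [i for i in queue if i["expected_time"] is not None]
    new_items = [i for i in queue if i["expected_time"] is None]
    for item in annotated:
        allocate(item)
    # Phase 2: only while idle resources remain, new items are
    # complemented with an expected time and then scheduled.
    for item in new_items:
        if idle_count() <= 0:
            break
        item["expected_time"] = get_expected_time(item)
        allocate(item)

state = {"idle": 2, "order": []}
def allocate(item):
    state["idle"] -= 1
    state["order"].append(item["name"])

queue = [
    {"name": "new-job", "expected_time": None},
    {"name": "old-job", "expected_time": 60.0},
]
two_phase_schedule(queue, lambda: state["idle"], allocate, lambda i: 90.0)
# "old-job" is allocated before "new-job" despite arriving later
```

The ordering, not the data structures, is the point of the sketch: annotated items never wait behind an expected-time calculation.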
Next, an operation of the execution time expectation unit 100 will be described in more detail with reference to
The execution time expectation unit 100 receives, from the resource management unit 200, the job information input through the user interface provision unit 300. The job information includes the information on the checkpoint file and the required resource information. For example, the information on the checkpoint file may be a path of the checkpoint file on the user data storage. The checkpoint file may be a file output by a machine learning framework for storing the model to be executed. The machine learning framework may be, for example, any one of general-purpose machine learning frameworks such as TensorFlow, PyTorch, scikit-learn, SparkML, Torch, Huggingface, and Keras. The checkpoint file should be understood as encompassing all forms of files generated when the machine learning framework exports the machine learning result of the model; a file is not excluded from being a checkpoint file according to the present disclosure merely because it is named something other than a checkpoint file.
A conversion module 110 may convert the checkpoint file into an intermediate representation that represents the model to be executed by using the information on the checkpoint file. In addition, an analysis module 120 may analyze the intermediate representation and extract parameter information of the model to be executed.
In addition, an expectation module 130 may obtain memory usage necessary for executing the model to be executed, using parameters of the model to be executed, and determine an expected execution time of the model to be executed using the memory usage.
In some embodiments, the expectation module 130 can determine the expected execution time using the memory usage and the required resource information, thus determining the expected execution time differently according to input resources even when the memory usage is the same.
A more detailed idea of the operation of the execution time expectation unit 100 may be understood with reference to embodiments described below.
The input area of the required resource information 310b is an area where information on hardware resources that have to be allocated to the model to be executed is input.
An input window of path information of the checkpoint file may be included in the input area of the machine learning information 310c. Although
In some embodiments, the user interface provision unit 300 may further display information on a scheduling result of a job requested to be executed and the expected execution time. Accordingly, a user who requested to execute the job may confirm that the processing result for his or her request is updated. In addition, if job scheduling is performed periodically, the user interface provision unit 300 may further display the remaining time until the scheduling result of the job requested to be executed is updated.
An example in which the checkpoint file 40 is converted into the intermediate representation 111 will be described with reference to
The format of the intermediate representation 111 may include an arrangement of pairs of keys and values. The keys may include input data 111a, a kernel 111b, output data 111c, padding 111d, and a stride 111e.
A value of the input data 111a may include a first axis size 111a-1 of the input data, a second axis size 111a-2 of the input data, and the number of input channels 111a-3 of the input data, and a value of the kernel 111b may include the number of input channels 111b-1 of the kernel, a first axis size 111b-2 of the kernel, a second axis size 111b-3 of the kernel, and the number of channels 111b-4 output as a result of a convolution operation.
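As a purely illustrative sketch of the key/value arrangement described above, one could represent a single 2D convolution layer like this; the field names and concrete sizes are assumptions, not the format of the disclosure:

```python
# Hypothetical intermediate representation for one 2D convolution layer,
# expressed as an arrangement of key/value pairs: input data, kernel,
# output data, padding, and stride.
intermediate_representation = {
    "input":   {"x": 224, "y": 224, "channels_in": 3},
    "kernel":  {"channels_in": 3, "x": 3, "y": 3, "channels_out": 64},
    "output":  {"x": 224, "y": 224, "channels_out": 64},
    # One padding value per direction for the batch, x, y, and channel
    # dimensions: four values for one direction, four for the opposite.
    "padding": [0, 1, 1, 0, 0, 1, 1, 0],
    "stride":  [1, 1],
}
```

Because the keys are framework-neutral, values parsed from either a PyTorch or a TensorFlow checkpoint file can populate the same structure, which is the role the intermediate representation plays in the text.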
A process in which each value of the intermediate representation 111 is set will be described below.
First, as a result of analyzing the checkpoint file 40a of PyTorch, the parameters of each layer may be identified. For example, a first layer recorded in the checkpoint file 40a is a convolution layer that performs a 2D convolution operation. As a result of parsing the syntax indicative of the convolution layer, the number of input channels 40a-1 of the kernel, the number of output channels 40a-2 of the kernel, first and second axis sizes 40a-3 of the kernel, a stride 40a-4, and padding 40a-5 are extracted, respectively. Since the extracted values correspond to the values for each key of the intermediate representation 111, they are used to set each value of the intermediate representation 111.
Next, the parameters of each layer may be identified as a result of analyzing the checkpoint file 40b of TensorFlow. The layer recorded in the checkpoint file 40b is also a convolution layer that performs the 2D convolution operation. As a result of parsing the syntax indicative of the convolution layer, data on input data 40b-1, padding 40b-2, a stride 40b-3, and a kernel 40b-4 may be extracted. Since each extracted piece of data includes values corresponding to the values for each key of the intermediate representation 111, they are used to set each value of the intermediate representation 111.
However, in the case of the padding 40b-2, the value is marked as “SAME”, meaning that the padding is set so that the shapes of the input and output data remain identical even after the convolution is performed. Accordingly, the padding values 111d of the intermediate representation are calculated from the equation (input data size − kernel size + 2 × padding size)/stride + 1 = output data size = input data size. First to fourth values of the padding values 111d denote padding values in one direction with respect to the batch, x, y, and channel dimensions, respectively. In addition, fifth to eighth values of the padding values 111d refer to the padding values in the opposite direction.
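The “SAME” resolution above can be sketched by solving the convolution output-size relation for the padding. The helper below is an illustrative sketch that assumes a stride of 1 and symmetric padding, the common case; it is not the exact routine of the disclosure:

```python
def same_padding(input_size, kernel_size, stride=1):
    """Padding that keeps the output size equal to the input size,
    solved from (input - kernel + 2*padding)/stride + 1 = input."""
    pad = ((input_size - 1) * stride - input_size + kernel_size) / 2
    # Non-integer results would require asymmetric padding, which this
    # simplified sketch does not handle.
    assert pad == int(pad), "asymmetric padding needed"
    return int(pad)

print(same_padding(224, 3))  # a 3x3 kernel needs padding of 1 per side
print(same_padding(224, 5))  # a 5x5 kernel needs padding of 2 per side
```

For odd kernel sizes and stride 1 this reduces to (kernel − 1)/2 per side, which is why the x and y entries of the padding values come out equal in both directions.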
The meanings of each parameter illustrated in
Ix = first axis size of input data
Iy = second axis size of input data
Cin = number of channels of input data
Kx = first axis size of kernel
Ky = second axis size of kernel
Cout = number of channels output as a result of convolution operation
Ox = first axis size of output data
Oy = second axis size of output data
precision = number of bytes per element
The parameters of
In some embodiments, the memory usage of the model to be executed is calculated as (number of elements of the input data, the kernel used for the convolution operation, and the output data) × (size of each element). That is, the memory usage is {(Ix×Iy×Cin)+(Kx×Ky×Cin×Cout)+(Ox×Oy×Cout)}×(precision), and the memory usage according to the example of the intermediate representation illustrated in
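The memory usage equation above translates directly into code; the concrete layer sizes in the example call are illustrative and not taken from the drawings:

```python
def conv_layer_memory_bytes(Ix, Iy, Cin, Kx, Ky, Cout, Ox, Oy, precision):
    """Memory usage of one convolution layer:
    {(Ix*Iy*Cin) + (Kx*Ky*Cin*Cout) + (Ox*Oy*Cout)} * precision,
    i.e. (elements of input + kernel + output) x bytes per element."""
    input_elems  = Ix * Iy * Cin
    kernel_elems = Kx * Ky * Cin * Cout
    output_elems = Ox * Oy * Cout
    return (input_elems + kernel_elems + output_elems) * precision

# Assumed example: 224x224x3 input, 3x3 kernel producing 64 channels,
# SAME-padded output, float32 elements (precision = 4 bytes).
usage = conv_layer_memory_bytes(224, 224, 3, 3, 3, 64, 224, 224, 4)
```

Summing this quantity over all layers extracted from the intermediate representation gives the per-model memory usage that the expectation module feeds into the execution time estimate.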
The expectation module 130 may calculate the time required for each layer using at least one of an analytical model 51 and a data-driven model 52 such as a deep learning model.
The analytical model 51 refers to an equation that derives the required time for each layer from the layer's floating point operation (FLOP) count. In the case of the convolution layer illustrated in
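The exact per-layer equation of the disclosure is given in the drawings; as a stand-in, the sketch below uses a commonly cited FLOP estimate for a 2D convolution layer (two FLOPs per multiply-accumulate) together with a simple throughput-based time model. Both formulas are assumptions of this sketch:

```python
def conv2d_flops(Kx, Ky, Cin, Ox, Oy, Cout):
    """Common FLOP estimate for a 2D convolution layer: every output
    element requires Kx*Ky*Cin multiply-accumulates (2 FLOPs each)."""
    return 2 * Kx * Ky * Cin * Ox * Oy * Cout

def analytical_time_seconds(flops, device_flops_per_second):
    # Analytical model: required time from the layer's FLOP count and
    # an assumed sustained throughput of the allocated device.
    return flops / device_flops_per_second

# Assumed example: 3x3 kernel, 3 input channels, 224x224 output, 64
# output channels, on a device sustaining 1 TFLOP/s.
t = analytical_time_seconds(conv2d_flops(3, 3, 3, 224, 224, 64), 1e12)
```

As the surrounding text notes, such a FLOP-only model ignores memory effects, which is precisely the gap the data-driven model is meant to close.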
Meanwhile, the data-driven model 52 may be a deep learning model that is machine-learned to receive an additional feature along with the parameters of each layer and output a required time. The additional feature may include the memory usage. In addition, the additional feature may further include the FLOP count. Given that the memory usage can be a new input factor that has not previously been considered in determining the time required for machine learning jobs, and that, as GPU performance improves, the time required to load training data into memory can become a major bottleneck, the data-driven model 52 that receives the memory usage as a feature may accurately infer the time required for each layer.
The expectation module 130 may finally determine the time required for the layers by using at least one of the time required for the layer output from the analytical model 51 and the time required for the layer output from the data-driven model 52.
In some embodiments, the expectation module 130 may determine the time required for the layer output from the data-driven model 52 in which the memory usage is input as a feature, as a final time required for the layer.
In some embodiments, the expectation module 130 may finally determine the time required for the layer by weighting and adding up the time required for the layer output from the analytical model 51 and the time required for the layer output from the data-driven model 52, and weights of the analytical model 51 and the data-driven model 52 may be dynamically adjusted based on information on idle resources.
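A sketch of the weighted combination follows. How the weights are derived from the idle-resource information is not specified above, so the mapping below (leaning more on the data-driven estimate as idle resources grow scarce) is purely an illustrative assumption:

```python
def combined_layer_time(t_analytical, t_data_driven, idle_ratio):
    """Weighted sum of the two per-layer time estimates.
    idle_ratio in [0, 1] is the fraction of resources currently idle;
    the weight assignment below is an assumption of this sketch."""
    w_analytical = idle_ratio        # plentiful idle resources: the
    w_data = 1.0 - idle_ratio        # cheap analytical model suffices;
    return w_analytical * t_analytical + w_data * t_data_driven

# With all resources idle the analytical estimate dominates; with
# none idle the data-driven estimate dominates.
t_busy = combined_layer_time(10.0, 20.0, 0.0)
t_free = combined_layer_time(10.0, 20.0, 1.0)
```

Because the weights sum to one, the combined estimate always lies between the two model outputs, and adjusting them dynamically only shifts where in that interval it falls.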
The methods described thus far can more accurately determine the expected execution time of the model to be executed. An expected execution time that can be accurately determined, and therefore trusted, enables scheduling of machine learning jobs that satisfies both fairness and efficiency.
Next, a method for scheduling machine learning jobs according to another embodiment of the present disclosure will be described with reference to
First, this will be described with reference to
When the job information including the information on the checkpoint file and the required resource information is input (S100), the job information is inserted into the job queue (S110). At this time, the job information and the annotation are inserted into the job queue together, and the expected execution time included in the annotation is not yet recorded.
Next, each item inserted into the job queue is checked (S120). According to the checking results, when there is an item including an annotation in which the expected execution time is not recorded (S130), the expected execution time is determined using the job information of the item according to the aforementioned embodiments (S140).
Next, job scheduling proceeds using the idle resource status according to the resource monitoring information and the information of the job queue (S150). When the job scheduling for the corresponding period is completed, waiting continues during the period of the scheduling cycle (S160). The job scheduling takes into account both efficiency and fairness and will be described in more detail with reference to
Basically, the job scheduling is performed on items to which the resources have not been allocated among all items of the job queue. An index of the job queue is initialized (S1500), and the job information of the current index is retrieved from the job queue (S1510).
It is determined whether the amount of current idle resources according to the resource monitoring information exceeds the amount of required resources according to the job information of the current index (S1520). For example, when the current idle resources are two cores of the GPU and the required resources according to the job information of the current index are one core of the GPU, it will be determined that the amount of current idle resources exceeds the amount of required resources according to the job information of the current index.
However, even when the amount of current idle resources exceeds the amount of required resources according to the job information of the current index, resources are not unconditionally allocated to the machine learning job of the current index, because the resource allocation may cause a delay in already-scheduled, pre-registered jobs. In order to accurately determine whether or not the resource allocation causes such a delay, the expected execution time has to be accurately predicted.
Accordingly, even if the amount of current idle resources exceeds the amount of required resources according to the job information of the current index, the resources are allocated (S1540) only when the resource allocation does not cause the delay in the scheduled pre-registration jobs at the time of allocating the resources to the job of the current index (S1530).
Next, a next index of the job queue will be sequentially processed (S1550) and, when the processing of all items of the job queue is completed, the scheduling of the corresponding period will be completed (S1560).
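Steps S1500 to S1560 can be sketched as a single pass over the job queue. Here `causes_delay` is a hypothetical stand-in for the delay check of step S1530, and the dictionary-based resource representation is an assumption of this sketch:

```python
def schedule_period(job_queue, idle, causes_delay):
    """One scheduling period over the job queue (S1500-S1560).
    `idle` maps resource names to free amounts; `causes_delay(job, idle)`
    answers whether allocating to this job would delay scheduled
    pre-registered jobs (the S1530 check)."""
    allocated = []
    for index, job in enumerate(job_queue):          # S1500 / S1550
        if job.get("allocated"):                     # skip allocated items
            continue
        req = job["required"]                        # S1510
        fits = all(idle.get(r, 0) >= amount          # S1520
                   for r, amount in req.items())
        if fits and not causes_delay(job, idle):     # S1530
            for r, amount in req.items():            # S1540
                idle[r] -= amount
            job["allocated"] = True
            allocated.append(index)
    return allocated                                 # S1560: period done

idle = {"gpu_cores": 2}
queue = [
    {"required": {"gpu_cores": 1}},
    {"required": {"gpu_cores": 2}},   # no longer fits after job 0
    {"required": {"gpu_cores": 1}},
]
result = schedule_period(queue, idle, lambda job, idle: False)
```

Mirroring the GPU example above, the first job consumes one of the two idle cores, the second no longer fits, and the third takes the remaining core.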
Meanwhile, in some embodiments, as illustrated in
In other words, in that case, when there are idle resources, each item of the job queue is checked (S120), a complementary action of recording the expected execution time (S140) is performed on items in which the expected execution time is not recorded (S130), and additional scheduling may be performed using the new items in which the expected execution time was not recorded in the annotation among the items of the job queue (S141). When the scheduling is completed in this way, waiting may continue during the period of the scheduling cycle (S142).
As such, the scheduling can be performed first using the items in which the expected execution time is recorded in the annotation among the items of the job queue, thereby obtaining the effect of allocating idle resources to machine learning jobs as soon as possible. In other words, in that case, a delay in the resource allocation caused by the calculation of the expected execution time can be minimized even when idle resources are present.
The technical idea of the present disclosure described with reference to
Hereinafter, a hardware configuration of an exemplary computing device according to some embodiments of the present disclosure will be described with reference to
The processor 1100 controls the overall operation of each component of the computing device 1000. The processor 1100 may be understood as a central processing unit (CPU). Furthermore, the processor 1100 may perform an arithmetic operation on at least one application or program for performing the methods/operations according to various embodiments of the present disclosure.
The memory 1400 stores different types of data, instructions, and/or information. The memory 1400 may load one or more programs 1500 from the storage 1300 in order to perform the methods/operations according to various embodiments of the present disclosure. An example of the memory 1400 may be a random access memory (RAM), but the present disclosure is not limited thereto.
The system bus 1600 provides a communication function between components of the computing device 1000. The system bus 1600 may be implemented as various types of buses such as an address bus, a data bus, and a control bus. The network interface 1200 supports wired/wireless Internet communication of the computing device 1000.
The storage 1300 may non-transitorily store one or more computer programs 1500. The storage 1300 may include a nonvolatile memory such as a flash memory, a hard disk, a removable disk, or any type of computer-readable recording medium well known in the technical field to which the present disclosure belongs.
The computer program 1500 may include one or more instructions implementing the methods/operations according to various embodiments of the present disclosure. When the computer program 1500 is loaded into the memory 1400, the processor 1100 may execute the one or more instructions to perform methods according to various embodiments of the present disclosure.
The computer program 1500 may include an instruction of transmitting, to the user terminal, data for forming an input interface to receive the job information including information indicative of the checkpoint file of the model to be executed and the required resource information, an instruction of managing the job queue, wherein each item of the job queue includes the job information and the annotation, and the annotation includes the expected execution time, an instruction of complementing for items in which the expected execution time is not recorded in the annotation by receiving the expected execution time from the execution time expectation unit, and performing the job scheduling using the job information of each item of the job queue, and an instruction of calculating the memory usage necessary for executing the model to be executed, using the information on the checkpoint file, and determining the expected execution time of the model to be executed by using the memory usage.
The technical features of the present disclosure described so far may be embodied as computer readable codes on a computer readable medium. The computer readable medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, computer-equipped hard disk). The computer program recorded on the computer readable medium may be transmitted to another computing device via a network such as the Internet and installed in the other computing device, thereby being usable in the other computing device.
Although operations are shown in a specific order in the drawings, it should not be understood that desired results can be obtained when the operations must be performed in the specific order or sequential order or when all of the operations must be performed. In certain situations, multitasking and parallel processing may be advantageous. According to the above-described embodiments, it should not be understood that the separation of various configurations is necessarily required, and it should be understood that the described program components and systems may generally be integrated together into a single software product or be packaged into multiple software products.
In concluding the detailed description, those skilled in the art will appreciate that many variations and modifications can be made to the preferred embodiments without substantially departing from the principles of the present disclosure. Therefore, the disclosed preferred embodiments of the disclosure are used in a generic and descriptive sense only and not for purposes of limitation.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0153402 | Nov 2021 | KR | national |