The field relates generally to information processing systems, and more particularly to techniques for processing data using such systems.
Deep learning techniques typically include a training phase and an inference phase. The training phase commonly involves a process of creating a machine learning model and/or training a created machine learning model, which are often compute-intensive procedures. The inference phase commonly involves a process of using the trained machine learning model to generate a prediction. Also, the inference phase can occur in both edge devices (e.g., laptops, mobile devices, etc.) and datacenters.
Inference servers in datacenters often have common attributes and/or functionalities, such as, for example, obtaining queries from one or more sources and sending back predicted results within one or more latency constraints without degrading the quality of the prediction(s). Also, as more models are trained, implementing and/or deploying such models at scale presents challenges related to hyperparameters. Conventional deep learning-related approaches include utilizing the same set of hyperparameters across multiple models regardless of the differing topologies associated with the models, which can limit and/or reduce model performance. Additionally, conventional deep learning-related approaches typically perform hyperparameter tuning exclusively during the training phase, and not during the inference phase.
Illustrative embodiments of the disclosure provide techniques for automated topology-aware deep learning inference tuning. An exemplary computer-implemented method includes obtaining input information from one or more systems associated with a datacenter, and detecting topological information associated with at least a portion of the one or more systems by processing at least a portion of the input information, wherein the topological information is related to hardware topology. The method also includes automatically selecting one or more of multiple hyperparameters of at least one deep learning model based at least in part on the detected topological information, and determining a status of at least a portion of the detected topological information by processing, during an inference phase of the at least one deep learning model, the detected topological information and data from at least one systems-related database. Further, the method additionally includes performing, in connection with at least a portion of the one or more selected hyperparameters of the at least one deep learning model, one or more automated actions based at least in part on the determining.
Illustrative embodiments can provide significant advantages relative to conventional deep learning-related approaches. For example, problems associated with performing topology-indifferent hyperparameter tuning exclusively during the training phase are overcome in one or more embodiments through automatically performing topology-aware tuning of deep learning models during an inference phase.
These and other illustrative embodiments described herein include, without limitation, methods, apparatus, systems, and computer program products comprising processor-readable storage media.
Illustrative embodiments will be described herein with reference to exemplary computer networks and associated computers, servers, network devices or other types of processing devices. It is to be appreciated, however, that these and other embodiments are not restricted to use with the particular illustrative network and device configurations shown. Accordingly, the term “computer network” as used herein is intended to be broadly construed, so as to encompass, for example, any system comprising multiple networked processing devices.
The user devices 102 may comprise, for example, mobile telephones, laptop computers, tablet computers, desktop computers or other types of computing devices. Such devices are examples of what are more generally referred to herein as “processing devices.” Some of these processing devices are also generally referred to herein as “computers.”
The user devices 102 in some embodiments comprise respective computers associated with a particular company, organization or other enterprise. In addition, at least portions of the computer network 100 may also be referred to herein as collectively comprising an “enterprise network.” Numerous other operating scenarios involving a wide variety of different types and arrangements of processing devices and networks are possible, as will be appreciated by those skilled in the art.
Also, it is to be appreciated that the term “user” in this context and elsewhere herein is intended to be broadly construed so as to encompass, for example, human, hardware, software or firmware entities, as well as various combinations of such entities.
The network 104 is assumed to comprise a portion of a global computer network such as the Internet, although other types of networks can be part of the computer network 100, including a wide area network (WAN), a local area network (LAN), a satellite network, a telephone or cable network, a cellular network, a wireless network such as a Wi-Fi or WiMAX network, or various portions or combinations of these and other types of networks. The computer network 100 in some embodiments therefore comprises combinations of multiple different types of networks, each comprising processing devices configured to communicate using internet protocol (IP) or other related communication protocols.
Additionally, automated deep learning inference tuning system 105 can have an associated machine learning model-related database 106 configured to store data pertaining to hyperparameters, hyperparameter values, model attributes, system configuration data, etc.
The database 106 in the present embodiment is implemented using one or more storage systems associated with automated deep learning inference tuning system 105. Such storage systems can comprise any of a variety of different types of storage including network-attached storage (NAS), storage area networks (SANs), direct-attached storage (DAS) and distributed DAS, as well as combinations of these and other storage types, including software-defined storage.
Also associated with automated deep learning inference tuning system 105 are one or more input-output devices, which illustratively comprise keyboards, displays or other types of input-output devices in any combination. Such input-output devices can be used, for example, to support one or more user interfaces to automated deep learning inference tuning system 105, as well as to support communication between automated deep learning inference tuning system 105 and other related systems and devices not explicitly shown.
Additionally, automated deep learning inference tuning system 105 in the present embodiment is assumed to be implemented using at least one processing device.
More particularly, automated deep learning inference tuning system 105 in this embodiment can comprise a processor coupled to a memory and a network interface.
The processor illustratively comprises a graphics processing unit (GPU) such as, for example, a general-purpose graphics processing unit (GPGPU) or other accelerator, a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other type of processing circuitry, as well as portions or combinations of such circuitry elements.
The memory illustratively comprises random access memory (RAM), read-only memory (ROM) or other types of memory, in any combination. The memory and other memories disclosed herein may be viewed as examples of what are more generally referred to as “processor-readable storage media” storing executable computer program code or other types of software programs.
One or more embodiments include articles of manufacture, such as computer-readable storage media. Examples of an article of manufacture include, without limitation, a storage device such as a storage disk, a storage array or an integrated circuit containing memory, as well as a wide variety of other types of computer program products. The term “article of manufacture” as used herein should be understood to exclude transitory, propagating signals. These and other references to “disks” herein are intended to refer generally to storage devices, including solid-state drives (SSDs), and should therefore not be viewed as limited in any way to spinning magnetic media.
The network interface allows automated deep learning inference tuning system 105 to communicate over the network 104 with the user devices 102, and illustratively comprises one or more conventional transceivers.
The automated deep learning inference tuning system 105 further comprises a load balancer 112, an optimization engine 114, and an inference engine 116.
It is to be appreciated that this particular arrangement of elements 112, 114 and 116 illustrated in automated deep learning inference tuning system 105 of the
At least portions of elements 112, 114 and 116 may be implemented at least in part in the form of software that is stored in memory and executed by a processor.
It is to be understood that the particular set of elements shown in
An exemplary process utilizing elements 112, 114 and 116 of an example automated deep learning inference tuning system 105 in computer network 100 will be described in more detail with reference to the flow diagram of
Accordingly, at least one embodiment includes automated topology-aware deep learning inference tuning methods for one or more servers in a datacenter (which can include one or more collections of systems such as, for example, geographically-distributed computing systems, enterprise computing systems, etc.). Such an embodiment includes utilizing a real-time inference loop to check with at least one database to determine if a given set of topological information (e.g., hardware-related topological information) associated with a machine learning model is new, and if the topological information is not new, retrieving one or more known values from the database(s) without needing to rerun an optimization technique. Such topological information can include, for example, the number of central processing units (CPUs) and/or GPUs in a given system (e.g., a given accelerator), how the CPUs and/or GPUs are connected (e.g., one CPU directly connected to one GPU, one CPU connected to two GPUs via a peripheral component interconnect express (PCIe) switch, etc.), overall system connection information with respect to at least one given accelerator, etc.
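By way of a non-limiting illustration, the following Python sketch shows one possible structure for such a check: hardware-topology information is collected (here assumed to be gathered via the lscpu and nvidia-smi utilities), reduced to a fingerprint, and used to query a store of previously tuned hyperparameter sets, wherein a miss indicates that the topology is new and that the optimization technique should be run. The function names, the use of SQLite, and the schema shown are hypothetical and are provided merely as a sketch rather than as a definitive implementation of the embodiments described herein.

import hashlib
import json
import sqlite3
import subprocess

def _run(cmd):
    """Run a command and return its stdout, or 'unavailable' if it fails."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return "unavailable"

def detect_topology():
    """Collect a coarse hardware-topology description of the local system."""
    return {
        "cpu": _run(["lscpu"]),                     # CPU/socket/NUMA layout (Linux)
        "gpu": _run(["nvidia-smi", "topo", "-m"]),  # GPU/CPU affinity and PCIe/NVLink links
    }

def topology_fingerprint(topology):
    """Reduce the topology description to a stable lookup key.

    A production version would normalize transient fields (e.g., CPU MHz)
    before hashing; this sketch hashes the raw output for brevity.
    """
    canonical = json.dumps(topology, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def lookup_known_hyperparameters(db_path, fingerprint):
    """Return previously tuned hyperparameters for this topology, or None if new."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS tuned "
                     "(fingerprint TEXT PRIMARY KEY, hyperparameters TEXT)")
        row = conn.execute(
            "SELECT hyperparameters FROM tuned WHERE fingerprint = ?",
            (fingerprint,)).fetchone()
    return json.loads(row[0]) if row else None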
Additionally, one or more embodiments can include linking and/or associating with at least one user's machine learning operations (MLOps) pipeline such that, for example, a hardware-specific optimization layer is triggered only if the pipeline is triggered. As used herein, a pipeline is a concept used in a Kubernetes context (e.g., in connection with deep learning techniques). Specifically, a “pipeline” refers to the sequence of operations carried out in a given system or platform (e.g., an MLOps platform). In connection with such a pipeline, users (e.g., customers, machine learning engineers, etc.) can utilize a set of well-defined and established steps spanning data preprocessing through model production. Also, such a pipeline can include sequences of elements that are engaged (or “triggered”) as and when there is a reason for the pipeline to be engaged or triggered. Such reasons can include, for example, that a dataset was changed (e.g., a given dataset does not come from the same distribution as previously and/or as another dataset, etc.), a given model has been retrained and is performing better than a given baseline, a bottleneck step in a given process has been reduced and/or eliminated, the base working case of an existing setup was altered, etc.
In other words, techniques detailed herein in connection with one or more embodiments will not disrupt a given user's working setup and will not need to be triggered every time there is a need to perform inferencing. In such an embodiment, the techniques will only be carried out when a given pipeline is triggered.
As illustrated in
As further detailed herein, in one or more embodiments, optimization engine 214 performs one or more optimization techniques based at least in part on the hardware topology associated with user device(s) 202 and one or more policy sets. By way of example, in one or more embodiments, a policy set can include aspects such as behaviors of the system, wherein accelerator-specific implementation details are examined across different parts of a stack, and the appropriate algorithm is selected to tune for the best hyperparameters and make intelligent choices that enable faster inference processing by reducing latencies. An example policy set can be built to be extensible and to allow later modifications to accommodate new algorithms and/or techniques. As is to be appreciated by one skilled in the art, deep learning models and other artificial intelligence and/or machine learning algorithms commonly include model parameters and model hyperparameters. Model parameters are typically learned from training data (e.g., in a linear regression, the coefficients are model parameters), while model hyperparameters typically vary from algorithm to algorithm and can be tuned in an attempt to optimize the performance and accuracy of the algorithm. By way merely of example, three potential hyperparameters for a gradient boosting regressor algorithm, with corresponding ranges of values, can include the following: criterion: ‘mse’, ‘mae’, ‘friedman_mse’; max_features: ‘auto’, ‘sqrt’, ‘log2’; and min_samples_leaf: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
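As a further non-limiting illustration of tuning such hyperparameters, the following Python sketch searches the example space noted above for a gradient boosting regressor using scikit-learn's randomized search. Note that the sketch substitutes the criterion and max_features strings accepted by recent scikit-learn releases (e.g., 'squared_error' rather than 'mse'), as the valid values are version-dependent; the synthetic dataset and search settings are likewise illustrative only.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Example search space mirroring the hyperparameters listed above.
# Valid criterion/max_features strings vary by scikit-learn version.
search_space = {
    "criterion": ["friedman_mse", "squared_error"],
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": list(range(1, 12)),
}

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=search_space,
    n_iter=20,   # number of hyperparameter combinations sampled
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)  # tuned hyperparameters; the fitted trees are model parameters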
Accordingly, in one or more embodiments, once improved hyperparameter sets (e.g., optimal hyperparameter sets) are identified (e.g., within a given period of time), automated deep learning inference tuning system 205 can, for example, automatically implement the identified hyperparameter values and/or provide those values to one or more production systems. Additionally or alternatively, a system of the same type can directly use known hyperparameter sets (i.e., those of the same type of system) if the hyperparameters are known and/or have been used before on the same configuration(s).
As illustrated,
It is to be appreciated that this particular example code snippet shows just one example implementation of a JSON file generated by a configurator for a deep learning model, and alternative implementations of the process can be used in other embodiments.
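Although the particular contents of such a JSON file are implementation-specific, a configurator output of this general kind might, purely hypothetically, resemble the following Python sketch; the field names shown are illustrative assumptions and are not taken from the example code snippet itself.

import json

# Hypothetical configurator output for a deep learning inference engine;
# the field names below are illustrative and not drawn from the example snippet.
config = {
    "system_id": "example-server-01",
    "benchmark": "resnet50",
    "gpu_batch_size": 256,
    "gpu_copy_streams": 2,
    "gpu_inference_streams": 2,
    "precision": "int8",
}

with open("inference_config.json", "w") as f:
    json.dump(config, f, indent=2)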
Referring again to
By way of example, if there are interconnections between parameters, such interconnections can be implemented in connection with the controller 332. For instance, if it is desired that gpu_copy_streams always be less than or equal to gpu_inference_streams, then the following can be set as one policy: config[“gpu_copy_streams”]<=config[“gpu_inference_streams”]. By way of further example, an engineer can adjust runs, such as by providing better initial values and/or assigning a certain range to each parameter to limit the search range, so that the number of runs can be reduced and/or runs can be finished faster. Additionally or alternatively, a walltime can be set in the policy as well, which can be useful, for example, when only two hours can be given to the optimization; the software will then attempt to find the best parameters within the given time. Such a circumstance can be controlled by setting the walltime as a stop point, wherein the best values found in the given time range can be automatically updated to production servers. A policy can also be set, for example, to determine whether the inference engine needs to be rebuilt, and/or whether changing certain hyperparameters requires the inference engine to be rebuilt. The policy can apply such changes based at least in part on relevant rules.
Also, one or more conditions can be extended, and based at least in part on such conditions, the policy can help to reduce the time spent finding the best hyperparameter set. In such an embodiment, example conditions (i.e., constraints that are to be met while executing) can include domain expert recommendations (e.g., a recommendation to run only batch sizes that are multiples of 64 between 256 and 512), the type of deployments to which the inference system(s) is/are subjected, whether to optimize for quality of service or system throughput, how model sparsity is addressed, whether a human in the loop is needed, etc.
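The policy behaviors and conditions described above can be encoded, for example, as simple predicate and budget rules. The following Python sketch is one hypothetical encoding, assuming the parameter names used in the examples above (gpu_copy_streams, gpu_inference_streams, gpu_batch_size); it is a sketch of the general idea rather than a definitive implementation of controller 332.

import time

# Hypothetical policy encoding of the rules discussed above.
POLICY = {
    # Interconnection rule: copy streams may never exceed inference streams.
    "constraints": [
        lambda config: config["gpu_copy_streams"] <= config["gpu_inference_streams"],
    ],
    # Domain-expert condition: batch sizes are multiples of 64 between 256 and 512.
    "allowed_batch_sizes": [b for b in range(256, 513) if b % 64 == 0],
    # Walltime budget in seconds (e.g., two hours); the search stops when exhausted.
    "walltime_seconds": 2 * 60 * 60,
    # Hyperparameters whose change requires rebuilding the inference engine.
    "rebuild_required": {"precision", "gpu_batch_size"},
}

def is_candidate_allowed(config, policy=POLICY):
    """Reject candidate configurations that violate any policy constraint."""
    if config.get("gpu_batch_size") not in policy["allowed_batch_sizes"]:
        return False
    return all(rule(config) for rule in policy["constraints"])

def within_budget(start_time, policy=POLICY):
    """True while the optimization is still inside its walltime budget."""
    return (time.time() - start_time) < policy["walltime_seconds"]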
Referring again to
As also depicted in
As detailed herein, one or more embodiments include incorporating topology awareness to determine the best values for multiple systems with different layouts, configurations, etc. Additionally or alternatively, at least one embodiment includes running at least a portion of the techniques detailed herein on top of a software development kit for deep learning inference (e.g., TensorRT), wherein customized optimization can be carried out on each type of system. Also, such an embodiment includes reducing, relative to conventional approaches, the time required to determine optimal hyperparameter sets as well as reducing the human errors related thereto.
In one or more embodiments, an optimization engine can be added and/or incorporated, for example, into deep learning pipelines in Kubeflow or a similar platform. Such an embodiment can include combining the optimization engine with a software development kit for a deep learning inference server Docker container, forming a component that allows users to download and/or provide tuned pre-installed inference servers. In connection with a datacenter that runs an inference workload on a large number of servers with exactly the same configuration, an example embodiment can include gathering all configuration data of the system(s) preemptively and performing one or more of the techniques detailed herein such that the best values can be saved in a database ahead of time. Accordingly, performance improves across the entire datacenter with no additional hardware cost and no additional run time. Additionally or alternatively, such a datacenter can use idle resources during non-peak hours for optimization.
It is to be appreciated that a “model,” as used herein, refers to an electronic digitally stored set of executable instructions and data values, associated with one another, which are capable of receiving and responding to a programmatic or other digital call, invocation, and/or request for resolution based upon specified input values, to yield one or more output values that can serve as the basis of computer-implemented recommendations, output data displays, machine control, etc. Persons of skill in the field may find it convenient to express models using mathematical equations, but that form of expression does not confine the model(s) disclosed herein to abstract concepts; instead, each model herein has a practical application in a processing device in the form of stored executable instructions and data that implement the model using the processing device.
In this embodiment, the process includes steps 500 through 508. These steps are assumed to be performed by automated deep learning inference tuning system 105 utilizing elements 112, 114 and 116.
Step 500 includes obtaining input information from one or more systems associated with a datacenter. In at least one embodiment, obtaining input information includes communicating with at least one load balancing component associated with the datacenter. Also, in one or more embodiments, the one or more systems include multiple systems with multiple different layouts and multiple different configurations.
Step 502 includes detecting topological information associated with at least a portion of the one or more systems by processing at least a portion of the input information, wherein the topological information is related to hardware topology. Step 504 includes automatically selecting one or more of multiple hyperparameters of at least one deep learning model based at least in part on the detected topological information. In at least one embodiment, such an automatic selection step can be based at least in part on the detected topological information and one or more performance variables. Such performance variables can include, for example, maintenance of a given level of quality of service associated with the model, increased throughput associated with the model, accuracy of the model, latency associated with the model, etc. Also, in at least one embodiment, the at least one deep learning model includes one or more of at least one binary search model, at least one genetic algorithm, at least one Bayesian model, at least one MetaRecentering model, at least one covariance matrix adaptation (CMA) model, at least one Nelder-Mead model, and at least one differential evolution model.
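By way of a non-limiting illustration, several of the optimizer families noted above (e.g., CMA, Nelder-Mead, differential evolution, MetaRecentering) are available in open-source derivative-free optimization libraries such as Nevergrad. The following Python sketch, which assumes that Nevergrad (and, for CMA, the cma package) is installed, tunes the example inference parameters against a placeholder latency objective; the objective function and parameter bounds are illustrative assumptions rather than part of the claimed techniques.

import nevergrad as ng  # assumed available; CMA additionally requires the "cma" package

# Search space for the example inference hyperparameters used earlier.
parametrization = ng.p.Instrumentation(
    gpu_copy_streams=ng.p.Scalar(lower=1, upper=4).set_integer_casting(),
    gpu_inference_streams=ng.p.Scalar(lower=1, upper=4).set_integer_casting(),
    gpu_batch_size=ng.p.Choice([256, 320, 384, 448, 512]),
)

def measured_latency(gpu_copy_streams, gpu_inference_streams, gpu_batch_size):
    # Placeholder objective: a real deployment would launch the inference
    # engine with this candidate configuration and return observed latency.
    return (abs(gpu_copy_streams - 2)
            + abs(gpu_inference_streams - 2)
            + abs(gpu_batch_size - 384) / 64.0)

# Any registered optimizer family can be swapped in by name
# (e.g., "CMA", "NelderMead", "TwoPointsDE", "MetaRecentering").
optimizer_cls = ng.optimizers.registry["CMA"]
optimizer = optimizer_cls(parametrization=parametrization, budget=50)
recommendation = optimizer.minimize(measured_latency)
print(recommendation.kwargs)  # best hyperparameter values found within the budget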
Step 506 includes determining a status of at least a portion of the detected topological information by processing, during an inference phase of the at least one deep learning model, the detected topological information and data from at least one systems-related database. Step 508 includes performing, in connection with at least a portion of the one or more selected hyperparameters of the at least one deep learning model, one or more automated actions based at least in part on the determining. In at least one embodiment, determining a status includes determining a first status indicating that the at least a portion of the detected topological information is part of previous topological information, and performing one or more automated actions includes automatically retrieving one or more values from the at least one systems-related database upon determining the first status. Additionally or alternatively, determining a status can include determining a second status indicating that the at least a portion of the detected topological information is not part of previous topological information, and in such an embodiment, performing one or more automated actions can include determining one or more hyperparameter values for the one or more selected hyperparameters of the at least one deep learning model upon determining the second status, wherein determining the one or more hyperparameter values is based at least in part on analyzing a set of one or more rules. It is to be appreciated that such noted status indications are merely examples implemented in connection with one or more embodiments, and other examples of a status can include new, not new, previously existing and not previously existing.
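A minimal Python sketch of the branch described in connection with steps 506 and 508, assuming a fingerprint-keyed store of previously tuned values, is shown below; the function and variable names are illustrative only.

def resolve_hyperparameters(fingerprint, known_db, optimize_fn):
    """Sketch of the steps 506/508 branch.

    known_db maps topology fingerprints to previously tuned hyperparameter
    sets; optimize_fn runs the (policy-constrained) optimization for a new
    topology and returns the values it finds.
    """
    if fingerprint in known_db:
        # First status: topology seen before -- reuse stored values, no rerun.
        return known_db[fingerprint], "known"
    # Second status: new topology -- run the optimization and persist the result.
    tuned = optimize_fn()
    known_db[fingerprint] = tuned
    return tuned, "new"

# Hypothetical usage with an in-memory "database" and a stub optimizer.
db = {"abc123": {"gpu_copy_streams": 2, "gpu_inference_streams": 2}}
values, status = resolve_hyperparameters(
    "abc123", db,
    optimize_fn=lambda: {"gpu_copy_streams": 1, "gpu_inference_streams": 2})
print(status, values)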
At least one embodiment can further include automatically implementing the one or more determined hyperparameter values in the at least one deep learning model and/or outputting the one or more determined hyperparameter values to one or more production systems associated with the datacenter. Additionally or alternatively, such an embodiment can include automatically generating data pertaining to the one or more determined hyperparameter values in JSON format.
In at least one embodiment, performing one or more automated actions includes translating results of the determining and outputting at least a portion of the translated results via at least one user interface. In such an embodiment, outputting at least a portion of the translated results via at least one user interface can include outputting the at least a portion of the translated results via at least one web graphical user interface.
Accordingly, the particular processing operations and other functionality described in conjunction with the flow diagram of
The above-described illustrative embodiments provide significant advantages relative to conventional approaches. For example, some embodiments are configured to automatically perform topology-aware tuning of deep learning models during an inference phase. These and other embodiments can effectively overcome problems associated with performing topology-indifferent hyperparameter tuning exclusively during the training phase.
It is to be appreciated that the particular advantages described above and elsewhere herein are associated with particular illustrative embodiments and need not be present in other embodiments. Also, the particular types of information processing system features and functionality as illustrated in the drawings and described above are exemplary only, and numerous other arrangements may be used in other embodiments.
As mentioned previously, at least portions of the information processing system 100 can be implemented using one or more processing platforms. A given such processing platform comprises at least one processing device comprising a processor coupled to a memory. The processor and memory in some embodiments comprise respective processor and memory elements of a virtual machine or container provided using one or more underlying physical machines. The term “processing device” as used herein is intended to be broadly construed so as to encompass a wide variety of different arrangements of physical processors, memories and other device components as well as virtual instances of such components. For example, a “processing device” in some embodiments can comprise or be executed across one or more virtual processors. Processing devices can therefore be physical or virtual and can be executed across one or more physical or virtual processors. It should also be noted that a given virtual device can be mapped to a portion of a physical one.
Some illustrative embodiments of a processing platform used to implement at least a portion of an information processing system comprises cloud infrastructure including virtual machines implemented using a hypervisor that runs on physical infrastructure. The cloud infrastructure further comprises sets of applications running on respective ones of the virtual machines under the control of the hypervisor. It is also possible to use multiple hypervisors each providing a set of virtual machines using at least one underlying physical machine. Different sets of virtual machines provided by one or more hypervisors may be utilized in configuring multiple instances of various components of the system.
These and other types of cloud infrastructure can be used to provide what is also referred to herein as a multi-tenant environment. One or more system components, or portions thereof, are illustratively implemented for use by tenants of such a multi-tenant environment.
As mentioned previously, cloud infrastructure as disclosed herein can include cloud-based systems. Virtual machines provided in such systems can be used to implement at least portions of a computer system in illustrative embodiments.
In some embodiments, the cloud infrastructure additionally or alternatively comprises a plurality of containers implemented using container host devices. For example, as detailed herein, a given container of cloud infrastructure illustratively comprises a Docker container or other type of Linux Container (LXC). The containers are run on virtual machines in a multi-tenant environment, although other arrangements are possible. The containers are utilized to implement a variety of different types of functionality within the system 100. For example, containers can be used to implement respective processing devices providing compute and/or storage services of a cloud-based system. Again, containers may be used in combination with other virtualization infrastructure such as virtual machines implemented using a hypervisor.
Illustrative embodiments of processing platforms will now be described in greater detail with reference to
The cloud infrastructure 600 further comprises sets of applications 610-1, 610-2, . . . 610-L running on respective ones of the VMs/container sets 602-1, 602-2, . . . 602-L under the control of the virtualization infrastructure 604. The VMs/container sets 602 comprise respective VMs, respective sets of one or more containers, or respective sets of one or more containers running in VMs. In some implementations of the
A hypervisor platform may be used to implement a hypervisor within the virtualization infrastructure 604, wherein the hypervisor platform has an associated virtual infrastructure management system. The underlying physical machines comprise one or more distributed processing platforms that include one or more storage systems.
In other implementations of the
As is apparent from the above, one or more of the processing modules or other components of system 100 may each run on a computer, server, storage device or other processing platform element. A given such element is viewed as an example of what is more generally referred to herein as a “processing device.” The cloud infrastructure 600 shown in
The processing platform 700 in this embodiment comprises a portion of system 100 and includes a plurality of processing devices, denoted 702-1, 702-2, 702-3, . . . 702-K, which communicate with one another over a network 704.
The network 704 comprises any type of network, including by way of example a global computer network such as the Internet, a WAN, a LAN, a satellite network, a telephone or cable network, a cellular network, a wireless network such as a Wi-Fi or WiMAX network, or various portions or combinations of these and other types of networks.
The processing device 702-1 in the processing platform 700 comprises a processor 710 coupled to a memory 712.
The processor 710 comprises a microprocessor, a microcontroller, an ASIC, an FPGA or other type of processing circuitry, as well as portions or combinations of such circuitry elements.
The memory 712 comprises RAM, ROM or other types of memory, in any combination.
The memory 712 and other memories disclosed herein should be viewed as illustrative examples of what are more generally referred to as “processor-readable storage media” storing executable program code of one or more software programs.
Articles of manufacture comprising such processor-readable storage media are considered illustrative embodiments. A given such article of manufacture comprises, for example, a storage array, a storage disk or an integrated circuit containing RAM, ROM or other electronic memory, or any of a wide variety of other types of computer program products. The term “article of manufacture” as used herein should be understood to exclude transitory, propagating signals. Numerous other types of computer program products comprising processor-readable storage media can be used.
Also included in the processing device 702-1 is network interface circuitry 714, which is used to interface the processing device with the network 704 and other system components, and may comprise conventional transceivers.
The other processing devices 702 of the processing platform 700 are assumed to be configured in a manner similar to that shown for processing device 702-1 in the figure.
Again, the particular processing platform 700 shown in the figure is presented by way of example only, and system 100 may include additional or alternative processing platforms, as well as numerous distinct processing platforms in any combination, with each such platform comprising one or more computers, servers, storage devices or other processing devices.
For example, other processing platforms used to implement illustrative embodiments can comprise different types of virtualization infrastructure, in place of or in addition to virtualization infrastructure comprising virtual machines. Such virtualization infrastructure illustratively includes container-based virtualization infrastructure configured to provide Docker containers or other types of LXCs.
As another example, portions of a given processing platform in some embodiments can comprise converged infrastructure.
It should therefore be understood that in other embodiments different arrangements of additional or alternative elements may be used. At least a subset of these elements may be collectively implemented on a common processing platform, or each such element may be implemented on a separate processing platform.
Also, numerous other arrangements of computers, servers, storage products or devices, or other components are possible in the information processing system 100. Such components can communicate with other elements of the information processing system 100 over any type of network or other communication media.
For example, particular types of storage products that can be used in implementing a given storage system of a distributed processing system in an illustrative embodiment include all-flash and hybrid flash storage arrays, scale-out all-flash storage arrays, scale-out NAS clusters, or other types of storage arrays. Combinations of multiple ones of these and other storage products can also be used in implementing a given storage system in an illustrative embodiment.
It should again be emphasized that the above-described embodiments are presented for purposes of illustration only. Many variations and other alternative embodiments may be used. Also, the particular configurations of system and device elements and associated processing operations illustratively shown in the drawings can be varied in other embodiments. Thus, for example, the particular types of processing devices, modules, systems and resources deployed in a given embodiment and their respective configurations may be varied. Moreover, the various assumptions made above in the course of describing the illustrative embodiments should also be viewed as exemplary rather than as requirements or limitations of the disclosure. Numerous other alternative embodiments within the scope of the appended claims will be readily apparent to those skilled in the art.