Machine learning models, such as regression models, hidden Markov models, neural networks like convolutional neural networks or recurrent neural networks, and other types of machine learning models, are trained to fine-tune parameters of the machine learning model (e.g., weights of the machine learning model). The model may undergo training via many epochs, where each epoch is an iteration including inputting training data to the model and adjusting the parameters of the model.
Some implementations described herein relate to a method. The method may include receiving, by a device, a configuration associated with a machine learning model. The method may include receiving, by the device, a first hyperparameter set associated with the machine learning model. The method may include estimating, by the device, a first quantity of floating-point operations (FLOPs) associated with one or more epochs, for the machine learning model, based on the first hyperparameter set. The method may include outputting, to a user, an indication of a first energy consumption associated with training the machine learning model based on the first quantity of FLOPs.
Some implementations described herein relate to a device. The device may include one or more memories and one or more processors communicatively coupled to the one or more memories. The one or more processors may be configured to receive a configuration associated with a machine learning model. The one or more processors may be configured to receive a first hyperparameter set associated with the machine learning model. The one or more processors may be configured to estimate a first quantity of FLOPs associated with one or more epochs, for the machine learning model, based on the first hyperparameter set. The one or more processors may be configured to output, to a user, an indication of a first energy consumption associated with training the machine learning model based on the first quantity of FLOPs.
Some implementations described herein relate to a non-transitory computer-readable medium that stores a set of instructions for a device. The set of instructions, when executed by one or more processors of the device, may cause the device to receive a configuration associated with a machine learning model. The set of instructions, when executed by one or more processors of the device, may cause the device to receive a first hyperparameter set associated with the machine learning model. The set of instructions, when executed by one or more processors of the device, may cause the device to estimate a first quantity of FLOPs associated with one or more epochs, for the machine learning model, based on the first hyperparameter set. The set of instructions, when executed by one or more processors of the device, may cause the device to output, to a user, an indication of a first energy consumption associated with training the machine learning model based on the first quantity of FLOPs.
The following detailed description of example implementations refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
Machine learning models consume large amounts of energy during training. However, energy consumption can vary significantly across model types, model architectures, hyperparameter sets, and epochs used. Additionally, energy consumption may vary across different types of hardware.
By estimating energy consumption before training a machine learning model, methods and apparatus described herein help conserve power and processing resources when the model is actually trained. Some implementations described herein enable energy associated with training a machine learning model to be estimated while the model is being designed. As a result, power and processing resources are conserved during training of the model, for example, by adjusting the hyperparameter sets and epochs used for the model.
As shown by reference number 110, the user device may transmit, and the model analysis system may receive, input including a statement associated with a machine learning model to be trained. For example, the input may be a string encoding the statement. The statement may include keywords (e.g., one or more keywords) associated with a goal for the machine learning model (e.g., “image identification,” “data categorization,” “text prediction,” and/or “speech-to-text transcription,” among other examples) or a natural language indication of a problem for the machine learning model to solve (e.g., “The model will predict a next word in a sentence while a user types,” “The model should identify cats within images,” and/or “The model will parse data from comma-separated values (CSV) files and categorize the data into spreadsheets,” among other examples). In some implementations, the model analysis system may receive the input using an interface as described in connection with
Accordingly, the model analysis system may process the input using natural language processing (NLP) and/or another type of text interpretation model. In some implementations, the model analysis system may process the input using a model trained and applied as described in connection with
In addition to, or as an alternative to, the machine learning architectures, the model analysis system may receive (e.g., from the machine learning database) indications of optimization algorithms (e.g., one or more indications of one or more optimization algorithms) that are identified as relevant to the input. For example, the optimization algorithms may be identified as relevant based on mapping keywords in the input to keywords stored in the machine learning database in association with indications of optimization algorithms. Additionally, or alternatively, the optimization algorithms may be identified as relevant based on output from a model trained and applied as described in connection with
Therefore, as shown by reference number 130, the model analysis system may transmit, and the user device may receive, indications of recommended architectures and/or optimization algorithms. For example, the recommended architectures and/or optimization algorithms may include the relevant architectures and/or optimization algorithms, respectively, received from the machine learning database, as described in connection with reference number 120.
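As a minimal sketch of the keyword mapping described above, the following Python example looks up words from the input statement in a stored keyword index and returns the associated architectures. The index contents, the name KEYWORD_INDEX, and the function recommend_architectures are hypothetical and are provided only for illustration; an actual machine learning database may store different keywords and associations.

```python
# Hypothetical keyword index standing in for the machine learning database.
# The keywords and architectures listed here are illustrative assumptions.
KEYWORD_INDEX = {
    "image": ["convolutional neural network"],
    "images": ["convolutional neural network"],
    "sentence": ["recurrent neural network"],
    "speech": ["recurrent neural network"],
    "categorize": ["regression model"],
}


def recommend_architectures(statement, index=KEYWORD_INDEX):
    """Return architectures whose stored keywords appear in the statement."""
    # Normalize the statement into a set of lowercase words without punctuation.
    words = {w.strip('.,;:!?"').lower() for w in statement.split()}
    recommended = []
    for keyword, architectures in index.items():
        if keyword in words:
            for architecture in architectures:
                # Preserve order and avoid duplicate recommendations.
                if architecture not in recommended:
                    recommended.append(architecture)
    return recommended
```

For example, under this assumed index, the statement "The model should identify cats within images" maps to a convolutional neural network, and a statement with no stored keywords yields an empty list.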
Accordingly, as shown by reference number 140, the user device may transmit, and the model analysis system may receive, a selection from the recommended architectures and/or optimization algorithms. Additionally, or alternatively, the user device may transmit, and the model analysis system may receive, a custom architecture and/or optimization algorithm. For example, the model analysis system may use an interface as described in connection with
Furthermore, the user device may transmit, and the model analysis system may receive, a hyperparameter set associated with the machine learning model. For example, the model analysis system may use an interface as described in connection with
Accordingly, as shown by reference number 150, the model analysis system may estimate an energy consumption associated with training the machine learning model. For example, the energy consumption may include a quantity of floating-point operations (FLOPs) associated with one or more epochs, for the machine learning model, based on the hyperparameter set. As described in connection with
Additionally, or alternatively, the model analysis system may estimate the energy consumption in Joules (J), kilowatt-hours (kWh), and/or another unit associated with energy. For example, as described in connection with
Accordingly, the model analysis system may transmit, and the user device may receive, an indication of the energy consumption associated with training the machine learning model based on the quantity of FLOPs. In some implementations, as shown by reference number 160a in
Additionally, or alternatively, and as shown by reference number 160b, the model analysis system may transmit, and the user device may receive, a recommendation based on the energy consumption. For example, the model analysis system may recommend a different hyperparameter set (e.g., to decrease energy consumption and/or to increase accuracy), a different quantity of epochs (e.g., to decrease energy consumption and/or to increase accuracy), and/or a different optimization algorithm (e.g., to decrease energy consumption and/or to increase accuracy).
Accordingly, as shown by reference number 170, the user device may transmit, and the model analysis system may receive, a selection based on the recommendation. For example, the user device may select a new hyperparameter set, a new quantity of epochs, and/or a new optimization algorithm. In some implementations, the user device and the model analysis system may iteratively perform operations associated with reference numbers 150, 160a and/or 160b, and 170 to estimate new energy consumptions based on modifications to the hyperparameter set, the quantity of epochs, and/or the optimization algorithm.
Therefore, as shown by reference number 180, the model analysis system may initiate training of the machine learning model when the user device indicates a final selection of the hyperparameter set, the quantity of epochs, and the optimization algorithm. For example, the model analysis system may store files (e.g., one or more files) that may be executed or otherwise used to train the machine learning model according to the final selection. Additionally, or alternatively, the model analysis system may transmit instructions to hardware (e.g., indicated by the user device) to begin training the machine learning model according to the final selection.
By using techniques as described in connection with
As indicated above,
As shown by reference number 210a, the model analysis system may receive (e.g., from the machine learning database) an indication of a pre-trained model to use as a configuration for the machine learning model. For example, the model analysis system may identify the pre-trained model as relevant based on input from the user device (e.g., as described in connection with reference number 110 of
Additionally, or alternatively, as shown by reference number 210b, the user device may transmit, and the model analysis system may receive, definitions (e.g., one or more definitions) of layers (e.g., one or more layers) associated with the machine learning model. For example, the user device may build a configuration for the machine learning model indicating the definitions of the layers that form an architecture of the machine learning model. In some implementations, the user device may modify definitions of the layers of the pre-trained model in order to modify the configuration for the pre-trained model. The model analysis system may receive the configuration indicating the definitions using an interface as described in connection with
Based on the configuration for the machine learning model, the model analysis system may receive (e.g., from the optimization algorithms database) indications of optimization algorithms (e.g., one or more indications of one or more optimization algorithms). For example, the optimization algorithms may be selected based on mapping layer definitions and/or a base architecture indicated by the configuration to layer definitions and/or base architectures stored in the optimization algorithms database in association with indications of optimization algorithms. Additionally, or alternatively, the optimization algorithms may be selected based on output from a model trained and applied as described in connection with
Accordingly, as shown by reference number 230, the model analysis system may transmit, and the user device may receive, indications of recommended optimization algorithms. For example, the recommended optimization algorithms may include the optimization algorithms received from the optimization algorithms database, as described in connection with reference number 220.
Further, as shown by reference number 240, the user device may transmit, and the model analysis system may receive, a selection from the recommended optimization algorithms. Additionally, or alternatively, the user device may transmit, and the model analysis system may receive, a custom optimization algorithm. For example, the model analysis system may use an interface as described in connection with
As shown in
As shown by reference number 260, the model analysis system may estimate a quantity of FLOPs associated with training the machine learning model. For example, the model analysis system may identify a quantity of multiply-accumulate (MAC) operations based on an architecture of the machine learning model and the selected optimization algorithm. The model analysis system may identify the quantity of MAC operations by estimating, for an epoch, a quantity of layers through which a training data set will pass, and activation functions and weights that will be applied in each layer. Additionally, the model analysis system may identify the quantity of MAC operations by estimating, for an epoch, how many calculations will be used by the selected optimization algorithm to fine-tune the weights.
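As one illustrative sketch of such an estimate, the following Python example counts MAC operations for a fully connected network and converts them to FLOPs for a single epoch. The function name estimate_epoch_flops, the treatment of one MAC as two FLOPs, the rule of thumb that a backward pass costs about twice a forward pass, and the fixed per-weight cost of the optimizer update are all assumptions made for this example; other architectures and optimization algorithms would use different counts.

```python
def estimate_epoch_flops(layer_sizes, num_samples, batch_size,
                         backward_factor=2.0, optimizer_flops_per_weight=2):
    """Roughly estimate FLOPs for one training epoch of a dense network.

    layer_sizes: widths of consecutive layers, e.g. [784, 128, 10].
    backward_factor and optimizer_flops_per_weight are assumed constants.
    """
    # MACs for one forward pass of one sample: one MAC per weight (biases ignored).
    forward_macs = sum(n_in * n_out
                       for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    num_weights = forward_macs

    # One MAC is counted as one multiply plus one add, i.e. two FLOPs.
    forward_flops = 2 * forward_macs

    # Forward pass plus an assumed backward-pass cost, per sample.
    flops_per_sample = forward_flops * (1 + backward_factor)

    # The optimizer updates every weight once per batch (ceiling division).
    steps = -(-num_samples // batch_size)
    update_flops = steps * num_weights * optimizer_flops_per_weight

    return flops_per_sample * num_samples + update_flops
```

Multiplying the returned value by a quantity of epochs gives a total for training, under the further assumption that each epoch processes the same training data set.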
In some implementations, the model analysis system may generate a plurality of estimates, where each estimated quantity of FLOPs is associated with a unique quantity of epochs. Accordingly, the model analysis system may output a visualization associated with energy consumption (expressed in FLOPs) relative to epochs, as described in connection with reference number 290a. Additionally, or alternatively, the model analysis system may output a recommended quantity of epochs, as described in connection with reference number 290b, based on the estimated quantities of FLOPs.
In some implementations, the model analysis system may further estimate a plurality of accuracy values, where each accuracy value is associated with a unique, corresponding quantity of epochs. Accordingly, the model analysis system may output a visualization associated with energy consumption (expressed in FLOPs) relative to accuracy values, as described in connection with reference number 290a. Additionally, or alternatively, the model analysis system may output a recommended quantity of epochs, as described in connection with reference number 290b, based on the estimated accuracy values. For example, the model analysis system may balance energy conservation with accuracy importance. The model analysis system may apply an energy threshold and an accuracy threshold to determine the recommended quantity of epochs. Alternatively, the model analysis system may determine the recommended quantity of epochs based on output from a model trained and applied as described in connection with
Additionally, or alternatively, the model analysis system may generate a plurality of estimates, where each estimated quantity of FLOPs is associated with a unique hyperparameter set. Accordingly, the model analysis system may output a visualization associated with energy consumption (expressed in FLOPs) relative to hyperparameter sets, as described in connection with reference number 290a. Additionally, or alternatively, the model analysis system may output a recommended hyperparameter set, as described in connection with reference number 290b, based on the estimated quantities of FLOPs.
In some implementations, the model analysis system may further estimate a plurality of accuracy values, where each accuracy value is associated with a unique, corresponding hyperparameter set. Accordingly, the model analysis system may output a visualization associated with energy consumption (expressed in FLOPs) relative to hyperparameter sets, as described in connection with reference number 290a. Additionally, or alternatively, the model analysis system may output a recommended hyperparameter set, as described in connection with reference number 290b, based on the estimated accuracy values. For example, the model analysis system may balance energy conservation with accuracy importance. The model analysis system may apply an energy threshold and an accuracy threshold to determine the recommended hyperparameter set. Alternatively, the model analysis system may determine the recommended hyperparameter set based on output from a model trained and applied as described in connection with
The model analysis system may combine these analyses to calculate a corresponding estimate of FLOPs for each unique combination of hyperparameter set and quantity of epochs. Accordingly, the model analysis system may generate a three-dimensional visual graph as described in connection with
As shown by reference number 270, the model analysis system may receive (e.g., from the hardware database) an indication of a thermal design power (TDP) associated with hardware for training the machine learning model. For example, the user device may indicate the hardware (e.g., via a serial number, a model number, and/or another indication of the hardware intended to be used for training the machine learning model).
Accordingly, as shown by reference number 280, the model analysis system may estimate energy consumption in J, kWh, and/or another unit associated with energy rather than FLOPs. For example, the model analysis system may perform any estimates described in connection with reference number 260 but with additional converting of the quantities of FLOPs to energy using the TDP associated with the hardware for training the machine learning model. Although described as using the TDP associated with the hardware, the hardware database may additionally or alternatively store algorithms associated with different types of hardware that the model analysis system uses to convert FLOPs to energy. For example, the algorithms may account for energy efficiency of particular hardware types as determined by factory specifications and/or experimental results associated with the types of hardware.
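A minimal sketch of one possible conversion is shown below. It treats the TDP as the power drawn during training and divides the quantity of FLOPs by an effective throughput to obtain a training time. The peak throughput, the utilization fraction, and the function name flops_to_energy are assumptions for this example and are not specified above; because TDP is a design limit rather than a measured power draw, the result is an approximation.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def flops_to_energy(total_flops, tdp_watts, peak_flops_per_second,
                    utilization=0.5):
    """Convert a FLOP count to (joules, kilowatt-hours).

    peak_flops_per_second and utilization are assumed hardware properties;
    the hardware is assumed to draw its full TDP for the whole run.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the interval (0, 1]")
    # Time to execute the FLOPs at the assumed effective throughput.
    seconds = total_flops / (peak_flops_per_second * utilization)
    # Energy is power multiplied by time.
    joules = tdp_watts * seconds
    return joules, joules / JOULES_PER_KWH
```

A hardware-specific algorithm, such as one derived from experimental results, could replace the body of this function while keeping the same inputs and outputs.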
Accordingly, as shown by reference number 290a, the model analysis system may transmit, and the user device may receive, a visualization (e.g., one or more visualizations) indicating energy consumption associated with training the machine learning model. For example, the model analysis system may generate visual graphs as described in connection with
Additionally, or alternatively, as shown by reference number 290b, the model analysis system may transmit, and the user device may receive, a recommendation (e.g., one or more recommendations) associated with which hyperparameter set (or sets) and/or a quantity (or quantities) of epochs to use for the machine learning model. For example, as described above, the model analysis system may balance energy conservation with accuracy importance to determine the recommendation.
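One simple way to balance energy conservation with accuracy importance is to apply an energy threshold and an accuracy threshold to a set of candidates, as in the following sketch. The function name recommend_epochs, the tuple layout of the candidates, and the tie-breaking rule of choosing the lowest-energy candidate that satisfies both thresholds are assumptions for this example.

```python
def recommend_epochs(candidates, energy_threshold, accuracy_threshold):
    """Pick a quantity of epochs that satisfies both thresholds.

    candidates: list of (epochs, estimated_energy, estimated_accuracy) tuples,
    where the energy may be expressed in FLOPs, J, or kWh, as long as the
    threshold uses the same unit.
    Returns the epochs of the lowest-energy feasible candidate, or None.
    """
    feasible = [
        (epochs, energy, accuracy)
        for epochs, energy, accuracy in candidates
        if energy <= energy_threshold and accuracy >= accuracy_threshold
    ]
    if not feasible:
        return None  # No candidate satisfies both thresholds.
    # Among feasible candidates, prefer the one that consumes the least energy.
    return min(feasible, key=lambda candidate: candidate[1])[0]
```

The same selection may be applied to hyperparameter sets by replacing the quantity of epochs in each tuple with an identifier of a hyperparameter set.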
In some implementations, as described in connection with reference number 180 of
By using techniques as described in connection with
As indicated above,
As shown by reference number 305, a machine learning model may be trained using a set of observations. The set of observations may be obtained and/or input from training data (e.g., historical data), such as data gathered during one or more processes described herein. For example, the set of observations may include data gathered from a machine learning database, an optimization algorithms database, a hyperparameter set database, and/or a hardware database, as described elsewhere herein. In some implementations, the machine learning system may receive the set of observations (e.g., as input) from a user device, as described elsewhere herein.
As shown by reference number 310, a feature set may be derived from the set of observations. The feature set may include a set of variables. A variable may be referred to as a feature. A specific observation may include a set of variable values corresponding to the set of variables. A set of variable values may be specific to an observation. In some cases, different observations may be associated with different sets of variable values, sometimes referred to as feature values. In some implementations, the machine learning system may determine variables for a set of observations and/or variable values for a specific observation based on input received from the machine learning database, the optimization algorithms database, the hyperparameter set database, the hardware database, and/or the user device. For example, the machine learning system may identify a feature set (e.g., one or more features and/or corresponding feature values) from structured data input to the machine learning system, such as by extracting data from a particular column of a table, extracting data from a particular field of a form and/or a message, and/or extracting data received in a structured data format. Additionally, or alternatively, the machine learning system may receive input from an operator to determine features and/or feature values. In some implementations, the machine learning system may perform natural language processing and/or another feature identification technique to extract features (e.g., variables) and/or feature values (e.g., variable values) from text (e.g., unstructured data) input to the machine learning system, such as by identifying keywords and/or values associated with those keywords from the text.
As an example, a feature set for a set of observations may include a first feature of an input statement, a second feature of an accuracy importance, a third feature of an energy importance, and so on. As shown, for a first observation, the first feature may have a value of “image ID” (or image identification), the second feature may have a value of medium, the third feature may have a value of high, and so on. These features and feature values are provided as examples, and may differ in other examples. For example, the feature set may include one or more of the following features: a hardware configuration, a selected architecture, a quantity of epochs, a desired accuracy, a selected hyperparameter set, and/or a selected optimization algorithm, among other examples. In some implementations, the machine learning system may pre-process and/or perform dimensionality reduction to reduce the feature set and/or combine features of the feature set to a minimum feature set. A machine learning model may be trained on the minimum feature set, thereby conserving resources of the machine learning system (e.g., processing resources and/or memory resources) used to train the machine learning model.
As shown by reference number 315, the set of observations may be associated with a target variable. The target variable may represent a variable having a numeric value (e.g., an integer value or a floating point value), may represent a variable having a numeric value that falls within a range of values or has some discrete possible values, may represent a variable that is selectable from one of multiple options (e.g., one of multiple classes, classifications, or labels), or may represent a variable having a Boolean value (e.g., 0 or 1, True or False, Yes or No), among other examples. A target variable may be associated with a target variable value, and a target variable value may be specific to an observation. In some cases, different observations may be associated with different target variable values. In example 300, the target variable is a recommended architecture, which has a value of convolutional neural network (CNN) for the first observation.
The feature set and target variable described above are provided as examples, and other examples may differ from what is described above. For example, for a target variable of recommended optimization algorithm, the feature set may include an accuracy importance, an energy importance, a selected architecture, and/or a selected hyperparameter set. In another example, for a target variable of recommended hyperparameter set, the feature set may include an accuracy importance, an energy importance, a selected architecture, and/or a selected optimization algorithm. In another example, for a target variable of recommended quantity of epochs, the feature set may include an accuracy importance, an energy importance, a selected architecture, a selected hyperparameter set, and/or a selected optimization algorithm.
The target variable may represent a value that a machine learning model is being trained to predict, and the feature set may represent the variables that are input to a trained machine learning model to predict a value for the target variable. The set of observations may include target variable values so that the machine learning model can be trained to recognize patterns in the feature set that lead to a target variable value. A machine learning model that is trained to predict a target variable value may be referred to as a supervised learning model or a predictive model. When the target variable is associated with continuous target variable values (e.g., a range of numbers), the machine learning model may employ a regression technique. When the target variable is associated with categorical target variable values (e.g., classes or labels), the machine learning model may employ a classification technique.
In some implementations, the machine learning model may be trained on a set of observations that do not include a target variable (or that include a target variable, but the machine learning model is not being executed to predict the target variable). This may be referred to as an unsupervised learning model, an automated data analysis model, or an automated signal extraction model. In this case, the machine learning model may learn patterns from the set of observations without labeling or supervision, and may provide output that indicates such patterns, such as by using clustering and/or association to identify related groups of items within the set of observations.
As further shown, the machine learning system may partition the set of observations into a training set 320 that includes a first subset of observations, of the set of observations, and a test set 325 that includes a second subset of observations of the set of observations. The training set 320 may be used to train (e.g., fit or tune) the machine learning model, while the test set 325 may be used to evaluate a machine learning model that is trained using the training set 320. For example, for supervised learning, the training set 320 may be used for initial model training using the first subset of observations, and the test set 325 may be used to test whether the trained model accurately predicts target variables in the second subset of observations. In some implementations, the machine learning system may partition the set of observations into the training set 320 and the test set 325 by including a first portion or a first percentage of the set of observations in the training set 320 (e.g., 75%, 80%, or 85%, among other examples) and including a second portion or a second percentage of the set of observations in the test set 325 (e.g., 25%, 20%, or 15%, among other examples). In some implementations, the machine learning system may randomly select observations to be included in the training set 320 and/or the test set 325.
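The random partition described above may be sketched as follows. The function name partition, the default 80% training fraction, and the fixed random seed are choices made for this example only.

```python
import random


def partition(observations, train_fraction=0.8, seed=0):
    """Randomly split observations into a training set and a test set."""
    shuffled = list(observations)
    # A seeded generator makes the random selection reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    # The first portion becomes the training set; the remainder is the test set.
    return shuffled[:cut], shuffled[cut:]
```

Every observation lands in exactly one of the two sets, so the test set contains only observations that were not used for training.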
As shown by reference number 330, the machine learning system may train a machine learning model using the training set 320. This training may include executing, by the machine learning system, a machine learning algorithm to determine a set of model parameters based on the training set 320. In some implementations, the machine learning algorithm may include a regression algorithm (e.g., linear regression or logistic regression), which may include a regularized regression algorithm (e.g., Lasso regression, Ridge regression, or Elastic-Net regression). Additionally, or alternatively, the machine learning algorithm may include a decision tree algorithm, which may include a tree ensemble algorithm (e.g., generated using bagging and/or boosting), a random forest algorithm, or a boosted trees algorithm. A model parameter may include an attribute of a machine learning model that is learned from data input into the model (e.g., the training set 320). For example, for a regression algorithm, a model parameter may include a regression coefficient (e.g., a weight). For a decision tree algorithm, a model parameter may include a decision tree split location, as an example.
As shown by reference number 335, the machine learning system may use one or more hyperparameter sets 340 to tune the machine learning model. A hyperparameter may include a structural parameter that controls execution of a machine learning algorithm by the machine learning system, such as a constraint applied to the machine learning algorithm. Unlike a model parameter, a hyperparameter is not learned from data input into the model. An example hyperparameter for a regularized regression algorithm includes a strength (e.g., a weight) of a penalty applied to a regression coefficient to mitigate overfitting of the machine learning model to the training set 320. The penalty may be applied based on a size of a coefficient value (e.g., for Lasso regression, such as to penalize large coefficient values), may be applied based on a squared size of a coefficient value (e.g., for Ridge regression, such as to penalize large squared coefficient values), may be applied based on a ratio of the size and the squared size (e.g., for Elastic-Net regression), and/or may be applied by setting one or more feature values to zero (e.g., for automatic feature selection). Example hyperparameters for a decision tree algorithm include a tree ensemble technique to be applied (e.g., bagging, boosting, a random forest algorithm, and/or a boosted trees algorithm), a number of features to evaluate, a number of observations to use, a maximum depth of each decision tree (e.g., a number of branches permitted for the decision tree), or a number of decision trees to include in a random forest algorithm.
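For illustration, the penalties for regularized regression may be computed as in the following sketch, where strength is the hyperparameter that scales the penalty. The function names are hypothetical, and the Elastic-Net form shown here (a weighted mix of the Lasso and Ridge penalties controlled by l1_ratio) is one common convention assumed for this example; libraries differ in the exact scaling of each term.

```python
def lasso_penalty(coefficients, strength):
    """Penalty based on the size (absolute value) of each coefficient."""
    return strength * sum(abs(c) for c in coefficients)


def ridge_penalty(coefficients, strength):
    """Penalty based on the squared size of each coefficient."""
    return strength * sum(c * c for c in coefficients)


def elastic_net_penalty(coefficients, strength, l1_ratio=0.5):
    """Assumed convention: weighted mix of the Lasso and Ridge penalties."""
    return (l1_ratio * lasso_penalty(coefficients, strength)
            + (1 - l1_ratio) * ridge_penalty(coefficients, strength))
```

During training, the selected penalty would be added to the regression loss, so a larger strength pushes coefficient values toward zero and reduces overfitting at the cost of a looser fit to the training set.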
To train a machine learning model, the machine learning system may identify a set of machine learning algorithms to be trained (e.g., based on operator input that identifies the one or more machine learning algorithms and/or based on random selection of a set of machine learning algorithms), and may train the set of machine learning algorithms (e.g., independently for each machine learning algorithm in the set) using the training set 320. The machine learning system may tune each machine learning algorithm using one or more hyperparameter sets 340 (e.g., based on operator input that identifies hyperparameter sets 340 to be used and/or based on randomly generating hyperparameter values). The machine learning system may train a particular machine learning model using a specific machine learning algorithm and a corresponding hyperparameter set 340. In some implementations, the machine learning system may train multiple machine learning models to generate a set of model parameters for each machine learning model, where each machine learning model corresponds to a different combination of a machine learning algorithm and a hyperparameter set 340 for that machine learning algorithm.
In some implementations, the machine learning system may perform cross-validation when training a machine learning model. Cross-validation can be used to obtain a reliable estimate of machine learning model performance using only the training set 320, and without using the test set 325, such as by splitting the training set 320 into a number of groups (e.g., based on operator input that identifies the number of groups and/or based on randomly selecting a number of groups) and using those groups to estimate model performance. For example, using k-fold cross-validation, observations in the training set 320 may be split into k groups (e.g., in order or at random). For a training procedure, one group may be marked as a hold-out group, and the remaining groups may be marked as training groups. For the training procedure, the machine learning system may train a machine learning model on the training groups and then test the machine learning model on the hold-out group to generate a cross-validation score. The machine learning system may repeat this training procedure using different hold-out groups and different training groups to generate a cross-validation score for each training procedure. In some implementations, the machine learning system may independently train the machine learning model k times, with each individual group being used as a hold-out group once and being used as a training group k−1 times. The machine learning system may combine the cross-validation scores for each training procedure to generate an overall cross-validation score for the machine learning model. The overall cross-validation score may include, for example, an average cross-validation score (e.g., across all training procedures), a standard deviation across cross-validation scores, or a standard error across cross-validation scores.
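The k-fold procedure described above may be sketched as follows. The function names, the interleaved assignment of observations to groups, and the toy model (which predicts the mean of its training values and is scored by mean absolute error) are assumptions made to keep the example self-contained.

```python
from statistics import mean, stdev


def k_fold_scores(observations, k, train_fn, score_fn):
    """Return one cross-validation score per hold-out group."""
    if not 2 <= k <= len(observations):
        raise ValueError("k must be between 2 and the number of observations")
    # Split the observations into k groups (interleaved assignment).
    folds = [observations[i::k] for i in range(k)]
    scores = []
    for i, hold_out in enumerate(folds):
        # Every group except the hold-out group is a training group.
        training = [obs for j, fold in enumerate(folds) if j != i
                    for obs in fold]
        model = train_fn(training)
        scores.append(score_fn(model, hold_out))
    return scores


def overall_score(scores):
    """Combine per-procedure scores into (average, standard deviation)."""
    return mean(scores), stdev(scores)


def fit_mean(training):
    """Toy model for illustration: predict the mean of the training values."""
    return mean(training)


def mean_abs_error(model, hold_out):
    """Toy score for illustration: mean absolute error on the hold-out group."""
    return mean([abs(value - model) for value in hold_out])
```

Each group serves as the hold-out group once and as a training group k−1 times, matching the procedure described above.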
In some implementations, the machine learning system may perform cross-validation when training a machine learning model by splitting the training set into a number of groups (e.g., based on operator input that identifies the number of groups and/or based on randomly selecting a number of groups). The machine learning system may perform multiple training procedures and may generate a cross-validation score for each training procedure. The machine learning system may generate an overall cross-validation score for each hyperparameter set 340 associated with a particular machine learning algorithm. The machine learning system may compare the overall cross-validation scores for different hyperparameter sets 340 associated with the particular machine learning algorithm, and may select the hyperparameter set 340 with the best (e.g., highest accuracy, lowest error, or closest to a desired threshold) overall cross-validation score for training the machine learning model. The machine learning system may then train the machine learning model using the selected hyperparameter set 340, without cross-validation (e.g., using all of the data in the training set 320 without any hold-out groups), to generate a single machine learning model for a particular machine learning algorithm. The machine learning system may then test this machine learning model using the test set 325 to generate a performance score, such as a mean squared error (e.g., for regression), a mean absolute error (e.g., for regression), or an area under receiver operating characteristic curve (e.g., for classification). If the machine learning model performs adequately (e.g., with a performance score that satisfies a threshold), then the machine learning system may store that machine learning model as a trained machine learning model 345 to be used to analyze new observations, as described below in connection with
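As a non-limiting sketch of the selection step described above, the following Python example averages the cross-validation scores for each hyperparameter set and selects the set with the best overall score. The function name, the dictionary keys, and the score values are hypothetical, and the average is only one of the possible overall scores.

```python
import statistics


def select_hyperparameter_set(cv_scores_by_set, lower_is_better=True):
    """Return the name of the best hyperparameter set and all overall scores.

    cv_scores_by_set maps a hyperparameter-set name to the list of
    cross-validation scores from its training procedures.
    """
    overall = {name: statistics.mean(scores)
               for name, scores in cv_scores_by_set.items()}
    choose = min if lower_is_better else max  # e.g., lowest error or highest accuracy
    best = choose(overall, key=overall.get)
    return best, overall


# Hypothetical error scores for two hyperparameter sets.
best_set, overall_scores = select_hyperparameter_set({
    "learning_rate=0.1": [0.30, 0.34, 0.32],
    "learning_rate=0.01": [0.21, 0.25, 0.20],
})
```

After selection, the model would be retrained on the full training set with the selected hyperparameter set and then evaluated on the test set, as described above.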
In some implementations, the machine learning system may perform cross-validation, as described above, for multiple machine learning algorithms (e.g., independently), such as a regularized regression algorithm, different types of regularized regression algorithms, a decision tree algorithm, or different types of decision tree algorithms. Based on performing cross-validation for multiple machine learning algorithms, the machine learning system may generate multiple machine learning models, where each machine learning model has the best overall cross-validation score for a corresponding machine learning algorithm. The machine learning system may then train each machine learning model using the entire training set 320 (e.g., without cross-validation), and may test each machine learning model using the test set 325 to generate a corresponding performance score for each machine learning model. The machine learning system may compare the performance scores for each machine learning model, and may select the machine learning model with the best (e.g., highest accuracy, lowest error, or closest to a desired threshold) performance score as the trained machine learning model 345.
In some implementations, the trained machine learning model 345 may predict a value of CNN for the target variable of recommended architecture for the new observation, as shown by reference number 355. Based on this prediction (e.g., based on the value having a particular label or classification or based on the value satisfying or failing to satisfy a threshold), the machine learning system may provide a recommendation and/or output for determination of a recommendation, such as a recommended pre-trained CNN to use. Additionally, or alternatively, the machine learning system may perform an automated action and/or may cause an automated action to be performed (e.g., by instructing another device to perform the automated action), such as selecting a CNN base architecture. As another example, if the machine learning system were to predict a value of recurrent neural network (RNN) for the target variable of recommended architecture, then the machine learning system may provide a different recommendation (e.g., a recommended pre-trained RNN to use) and/or may perform or cause performance of a different automated action (e.g., selecting an RNN base architecture). In some implementations, the recommendation and/or the automated action may be based on the target variable value having a particular label (e.g., classification or categorization) and/or may be based on whether the target variable value satisfies one or more thresholds (e.g., whether the target variable value is greater than a threshold, is less than a threshold, is equal to a threshold, or falls within a range of threshold values).
In some implementations, the trained machine learning model 345 may classify (e.g., cluster) the new observation in a cluster, as shown by reference number 360. The observations within a cluster may have a threshold degree of similarity. As an example, if the machine learning system classifies the new observation in a first cluster (e.g., energy conscious), then the machine learning system may provide a first recommendation, such as a CNN architecture. Additionally, or alternatively, the machine learning system may perform a first automated action and/or may cause a first automated action to be performed (e.g., by instructing another device to perform the automated action) based on classifying the new observation in the first cluster, such as selecting a CNN base architecture. As another example, if the machine learning system were to classify the new observation in a second cluster (e.g., accuracy conscious), then the machine learning system may provide a second (e.g., different) recommendation (e.g., an RNN architecture) and/or may perform or cause performance of a second (e.g., different) automated action, such as selecting an RNN base architecture.
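As a non-limiting sketch, the mapping from a predicted label or a cluster to a recommendation and an automated action may be expressed as a lookup. The table contents mirror the CNN/RNN and energy conscious/accuracy conscious examples above; the function name and the dictionary structure are assumptions made only for illustration.

```python
# Hypothetical lookup tables built from the examples in the text.
RECOMMENDATION_BY_LABEL = {
    "CNN": ("a recommended pre-trained CNN", "select a CNN base architecture"),
    "RNN": ("a recommended pre-trained RNN", "select an RNN base architecture"),
}
RECOMMENDATION_BY_CLUSTER = {
    "energy conscious": ("a CNN architecture", "select a CNN base architecture"),
    "accuracy conscious": ("an RNN architecture", "select an RNN base architecture"),
}


def recommend(predicted_label=None, cluster=None):
    """Map a predicted target-variable label or a cluster to a recommendation and action."""
    if predicted_label is not None:
        table, key = RECOMMENDATION_BY_LABEL, predicted_label
    elif cluster is not None:
        table, key = RECOMMENDATION_BY_CLUSTER, cluster
    else:
        raise ValueError("provide a predicted label or a cluster")
    recommendation, action = table[key]
    return {"recommendation": recommendation, "automated_action": action}


by_label = recommend(predicted_label="CNN")
by_cluster = recommend(cluster="accuracy conscious")
```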
The recommendations, actions, and clusters described above are provided as examples, and other examples may differ from what is described above. For example, the recommendations associated with text-related statements may include a hidden Markov model architecture. The actions associated with text-related statements may include, for example, selecting a Markov model base architecture. The clusters associated with text-related statements may include, for example, energy conscious and accuracy conscious clusters.
In this way, the machine learning system may apply a rigorous and automated process to recommending machine learning model architectures, hyperparameter sets, optimization algorithms, and/or quantities of epochs. The machine learning system enables recognition and/or identification of tens, hundreds, thousands, or millions of features and/or feature values for tens, hundreds, thousands, or millions of observations, thereby increasing accuracy and consistency and reducing delay associated with building a machine learning model relative to requiring computing resources to be allocated for tens, hundreds, or thousands of operators to build and test multiple different architectures, hyperparameter sets, optimization algorithms, and/or quantities of epochs using the features or feature values.
The cloud computing system 702 includes computing hardware 703, a resource management component 704, a host operating system (OS) 705, and/or one or more virtual computing systems 706. The cloud computing system 702 may execute on, for example, an Amazon Web Services platform, a Microsoft Azure platform, or a Snowflake platform. The resource management component 704 may perform virtualization (e.g., abstraction) of computing hardware 703 to create the one or more virtual computing systems 706. Using virtualization, the resource management component 704 enables a single computing device (e.g., a computer or a server) to operate like multiple computing devices, such as by creating multiple isolated virtual computing systems 706 from computing hardware 703 of the single computing device. In this way, computing hardware 703 can operate more efficiently, with lower power consumption, higher reliability, higher availability, higher utilization, greater flexibility, and lower cost than using separate computing devices.
Computing hardware 703 includes hardware and corresponding resources from one or more computing devices. For example, computing hardware 703 may include hardware from a single computing device (e.g., a single server) or from multiple computing devices (e.g., multiple servers), such as multiple computing devices in one or more data centers. As shown, computing hardware 703 may include one or more processors 707, one or more memories 708, and/or one or more networking components 709. Examples of a processor, a memory, and a networking component (e.g., a communication component) are described elsewhere herein.
The resource management component 704 includes a virtualization application (e.g., executing on hardware, such as computing hardware 703) capable of virtualizing computing hardware 703 to start, stop, and/or manage one or more virtual computing systems 706. For example, the resource management component 704 may include a hypervisor (e.g., a bare-metal or Type 1 hypervisor, a hosted or Type 2 hypervisor, or another type of hypervisor) or a virtual machine monitor, such as when the virtual computing systems 706 are virtual machines 710. Additionally, or alternatively, the resource management component 704 may include a container manager, such as when the virtual computing systems 706 are containers 711. In some implementations, the resource management component 704 executes within and/or in coordination with a host operating system 705.
A virtual computing system 706 includes a virtual environment that enables cloud-based execution of operations and/or processes described herein using computing hardware 703. As shown, a virtual computing system 706 may include a virtual machine 710, a container 711, or a hybrid environment 712 that includes a virtual machine and a container, among other examples. A virtual computing system 706 may execute one or more applications using a file system that includes binary files, software libraries, and/or other resources required to execute applications on a guest operating system (e.g., within the virtual computing system 706) or the host operating system 705.
Although the model analysis system 701 may include one or more elements 703-712 of the cloud computing system 702, may execute within the cloud computing system 702, and/or may be hosted within the cloud computing system 702, in some implementations, the model analysis system 701 may not be cloud-based (e.g., may be implemented outside of a cloud computing system) or may be partially cloud-based. For example, the model analysis system 701 may include one or more devices that are not part of the cloud computing system 702, such as device 800 of
Network 720 includes one or more wired and/or wireless networks. For example, network 720 may include a cellular network, a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a private network, the Internet, and/or a combination of these or other types of networks. The network 720 enables communication among the devices of environment 700.
The machine learning database 730 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with machine learning model architectures, as described elsewhere herein. The machine learning database 730 may include a communication device and/or a computing device. For example, the machine learning database 730 may include a database, a server, a database server, an application server, a client server, a web server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), a server in a cloud computing system, a device that includes computing hardware used in a cloud computing environment, or a similar type of device. The machine learning database 730 may communicate with one or more other devices of environment 700, as described elsewhere herein.
The optimization algorithms database 740 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with optimization algorithms (e.g., loss functions), as described elsewhere herein. The optimization algorithms database 740 may include a communication device and/or a computing device. For example, the optimization algorithms database 740 may include a database, a server, a database server, an application server, a client server, a web server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), a server in a cloud computing system, a device that includes computing hardware used in a cloud computing environment, or a similar type of device. The optimization algorithms database 740 may communicate with one or more other devices of environment 700, as described elsewhere herein.
The hyperparameter set database 750 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with hyperparameter sets, as described elsewhere herein. The hyperparameter set database 750 may include a communication device and/or a computing device. For example, the hyperparameter set database 750 may include a database, a server, a database server, an application server, a client server, a web server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), a server in a cloud computing system, a device that includes computing hardware used in a cloud computing environment, or a similar type of device. The hyperparameter set database 750 may communicate with one or more other devices of environment 700, as described elsewhere herein.
The hardware database 760 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with hardware properties (e.g., TDP values), as described elsewhere herein. The hardware database 760 may include a communication device and/or a computing device. For example, the hardware database 760 may include a database, a server, a database server, an application server, a client server, a web server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), a server in a cloud computing system, a device that includes computing hardware used in a cloud computing environment, or a similar type of device. The hardware database 760 may communicate with one or more other devices of environment 700, as described elsewhere herein.
The user device 770 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with machine learning models, as described elsewhere herein. The user device 770 may include a communication device and/or a computing device. For example, the user device 770 may include a wireless communication device, a mobile phone, a user equipment, a laptop computer, a tablet computer, a desktop computer, a gaming console, a set-top box, a wearable communication device (e.g., a smart wristwatch, a pair of smart eyeglasses, a head mounted display, or a virtual reality headset), or a similar type of device.
The number and arrangement of devices and networks shown in
Bus 810 includes one or more components that enable wired and/or wireless communication among the components of device 800. Bus 810 may couple together two or more components of
Memory 830 includes volatile and/or nonvolatile memory. For example, memory 830 may include random access memory (RAM), read only memory (ROM), a hard disk drive, and/or another type of memory (e.g., a flash memory, a magnetic memory, and/or an optical memory). Memory 830 may include internal memory (e.g., RAM, ROM, or a hard disk drive) and/or removable memory (e.g., removable via a universal serial bus connection). Memory 830 may be a non-transitory computer-readable medium. Memory 830 stores information, instructions, and/or software (e.g., one or more software applications) related to the operation of device 800. In some implementations, memory 830 includes one or more memories that are coupled to one or more processors (e.g., processor 820), such as via bus 810.
Input component 840 enables device 800 to receive input, such as user input and/or sensed input. For example, input component 840 may include a touch screen, a keyboard, a keypad, a mouse, a button, a microphone, a switch, a sensor, a global positioning system sensor, an accelerometer, a gyroscope, and/or an actuator. Output component 850 enables device 800 to provide output, such as via a display, a speaker, and/or a light-emitting diode. Communication component 860 enables device 800 to communicate with other devices via a wired connection and/or a wireless connection. For example, communication component 860 may include a receiver, a transmitter, a transceiver, a modem, a network interface card, and/or an antenna.
Device 800 may perform one or more operations or processes described herein. For example, a non-transitory computer-readable medium (e.g., memory 830) may store a set of instructions (e.g., one or more instructions or code) for execution by processor 820. Processor 820 may execute the set of instructions to perform one or more operations or processes described herein. In some implementations, execution of the set of instructions, by one or more processors 820, causes the one or more processors 820 and/or the device 800 to perform one or more operations or processes described herein. In some implementations, hardwired circuitry is used instead of or in combination with the instructions to perform one or more operations or processes described herein. Additionally, or alternatively, processor 820 may be configured to perform one or more operations or processes described herein. Thus, implementations described herein are not limited to any specific combination of hardware circuitry and software.
The number and arrangement of components shown in
Process 900 may include additional implementations, such as any single implementation or any combination of implementations described below and/or in connection with one or more other processes described elsewhere herein.
In a first implementation, process 900 further includes receiving an indication of hardware to be used for training the machine learning model, and determining the first energy consumption associated with training the machine learning model based on a TDP associated with the hardware.
In a second implementation, alone or in combination with the first implementation, process 900 further includes outputting, to the user, an indication of a recommended optimization algorithm for the machine learning model, where the configuration associated with the machine learning model includes an optimization algorithm selected by the user.
In a third implementation, alone or in combination with one or more of the first and second implementations, process 900 further includes estimating a second quantity of FLOPs associated with the one or more epochs, for the machine learning model, based on a second hyperparameter set, and outputting, to the user, an indication of a second energy consumption associated with training the machine learning model based on the second quantity of FLOPs.
In a fourth implementation, alone or in combination with one or more of the first through third implementations, outputting the indication of the first energy consumption and outputting the indication of the second energy consumption include outputting a visual graph of the first energy consumption and the second energy consumption relative to the first hyperparameter set and the second hyperparameter set.
In a fifth implementation, alone or in combination with one or more of the first through fourth implementations, the visual graph further includes variations of the first energy consumption and the second energy consumption relative to quantities of the one or more epochs.
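As a non-limiting sketch of the first and third through fifth implementations, the following Python example converts a quantity of FLOPs into an energy estimate using a TDP value and builds, for two hyperparameter sets, the energy values that a graph could plot against quantities of epochs. The conversion shown (the hardware draws its TDP for the whole run and sustains a fixed fraction of its peak throughput) is an assumption, as are the function names, the utilization parameter, and all numeric values.

```python
JOULES_PER_KWH = 3.6e6


def estimate_energy_kwh(total_flops, peak_flops_per_second, tdp_watts, utilization=1.0):
    """Assumed model: energy = TDP x (FLOPs / sustained throughput), in kWh."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    seconds = total_flops / (peak_flops_per_second * utilization)
    return tdp_watts * seconds / JOULES_PER_KWH


def energy_series(flops_per_epoch_by_set, epoch_counts, peak_flops_per_second, tdp_watts):
    """Energy per hyperparameter set for each quantity of epochs (data for a graph)."""
    return {
        name: [estimate_energy_kwh(flops_per_epoch * epochs,
                                   peak_flops_per_second, tdp_watts)
               for epochs in epoch_counts]
        for name, flops_per_epoch in flops_per_epoch_by_set.items()
    }


# Hypothetical per-epoch FLOP estimates for a first and a second hyperparameter set.
series = energy_series(
    {"first hyperparameter set": 3.6e15, "second hyperparameter set": 7.2e15},
    epoch_counts=[1, 2, 5, 10],
    peak_flops_per_second=1e12,
    tdp_watts=1000.0,
)
for name, values in series.items():
    print(name, values)
```

Returning plain lists keeps the sketch independent of any plotting library; the values could be passed to any charting tool to produce the visual graph.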
In a sixth implementation, alone or in combination with one or more of the first through fifth implementations, process 900 further includes estimating a plurality of accuracy values associated with corresponding quantities of epochs, for the machine learning model, based on the first hyperparameter set, and determining a plurality of energy consumptions, including the first energy consumption, associated with training the machine learning model and corresponding to the plurality of accuracy values.
In a seventh implementation, alone or in combination with one or more of the first through sixth implementations, outputting the indication of the first energy consumption includes outputting a visual graph of the plurality of accuracy values relative to the plurality of energy consumptions.
In an eighth implementation, alone or in combination with one or more of the first through seventh implementations, process 900 further includes indicating, on the visual graph, a portion associated with an inflection point.
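As a non-limiting sketch of the sixth through eighth implementations, the following Python example pairs estimated accuracy values with energy consumptions and locates a candidate inflection point. The text above does not specify how the inflection point is found; this sketch assumes equally spaced energy values and uses the first change of sign, from positive to non-positive, in the discrete second difference of the accuracy values. The function names and the sample values are hypothetical.

```python
def find_inflection_index(accuracies):
    """Index where the discrete second difference turns from positive to non-positive."""
    first = [b - a for a, b in zip(accuracies, accuracies[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    for i in range(1, len(second)):
        if second[i - 1] > 0 and second[i] <= 0:
            return i + 1  # second[i] is centered on accuracies[i + 1]
    return None  # no sign change found


def inflection_point(energies_kwh, accuracies):
    """Return the (energy, accuracy) pair to highlight on the graph, if any."""
    if len(energies_kwh) != len(accuracies):
        raise ValueError("energies and accuracies must have the same length")
    index = find_inflection_index(accuracies)
    return None if index is None else (energies_kwh[index], accuracies[index])


# Hypothetical estimates for seven quantities of epochs.
energies = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
accuracies = [0.10, 0.15, 0.30, 0.55, 0.75, 0.85, 0.90]
highlight = inflection_point(energies, accuracies)
```

Beyond the highlighted point, each additional unit of energy buys a smaller gain in accuracy, which is the portion of the graph a user weighing energy against accuracy is likely to care about.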
The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Modifications may be made in light of the above disclosure or may be acquired from practice of the implementations.
As used herein, the term “component” is intended to be broadly construed as hardware, firmware, or a combination of hardware and software. It will be apparent that systems and/or methods described herein may be implemented in different forms of hardware, firmware, and/or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the implementations. Thus, the operation and behavior of the systems and/or methods are described herein without reference to specific software code—it being understood that software and hardware can be used to implement the systems and/or methods based on the description herein.
As used herein, satisfying a threshold may, depending on the context, refer to a value being greater than the threshold, greater than or equal to the threshold, less than the threshold, less than or equal to the threshold, equal to the threshold, not equal to the threshold, or the like.
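As a non-limiting sketch, the context-dependent meaning of satisfying a threshold may be represented by a configurable comparison. The mode names and the function name are assumptions made only for illustration.

```python
import operator

# Each mode corresponds to one of the comparisons listed above.
COMPARATORS = {
    "gt": operator.gt,  # greater than the threshold
    "ge": operator.ge,  # greater than or equal to the threshold
    "lt": operator.lt,  # less than the threshold
    "le": operator.le,  # less than or equal to the threshold
    "eq": operator.eq,  # equal to the threshold
    "ne": operator.ne,  # not equal to the threshold
}


def satisfies_threshold(value, threshold, mode="ge"):
    """Return True if value satisfies threshold under the chosen comparison mode."""
    if mode not in COMPARATORS:
        raise ValueError(f"unknown mode: {mode!r}")
    return COMPARATORS[mode](value, threshold)


# A hypothetical performance score checked against a hypothetical threshold.
adequate = satisfies_threshold(0.92, 0.90, mode="ge")
```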
Although particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of various implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of various implementations includes each dependent claim in combination with every other claim in the claim set. As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiple of the same item.
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items, and may be used interchangeably with “one or more.” Further, as used herein, the article “the” is intended to include one or more items referenced in connection with the article “the” and may be used interchangeably with “the one or more.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, or a combination of related and unrelated items), and may be used interchangeably with “one or more.” Where only one item is intended, the phrase “only one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise. Also, as used herein, the term “or” is intended to be inclusive when used in a series and may be used interchangeably with “and/or,” unless explicitly stated otherwise (e.g., if used in combination with “either” or “only one of”).