Machine learning often involves tuning parameters of computer-readable parametric models using training data. Training data can be operated upon using an initial parametric model, and the results of that operation can be processed to reveal calculated errors in the initial model parameters. The model parameters can be adjusted, or tuned, to reduce the error. More training data can be operated upon using the resulting newly-tuned model, and the results of that operation can be processed to reveal calculated errors in the newly-tuned model parameters. This processing of training data using the model and then adjusting the model parameters can be repeated many times in an iterative process to tune the model. For example, such tuning may be performed on parametric models for identifying features such as visual objects in digital images (such as identifying that an image includes a tree, or that it includes a house) or words in digital audio (speech recognition).
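As a loose, non-limiting illustration of such an iterative tuning loop (a toy one-parameter linear model in Python; the function and variable names are assumptions for illustration only), the process of operating on a batch of training data, computing errors, and adjusting the parameter might look like this:

```python
def tune_model(parameter, training_batches, learning_rate=0.01):
    """Toy sketch of iterative parameter tuning for a one-parameter linear model."""
    for batch in training_batches:                        # each batch: list of (x, y) pairs
        predictions = [parameter * x for x, _ in batch]   # operate on training data with the model
        errors = [p - y for p, (_, y) in zip(predictions, batch)]   # calculated errors
        gradient = sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
        parameter -= learning_rate * gradient             # adjust (tune) to reduce the error
    return parameter
```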
The parameter tuning operations for parametric models in machine learning are governed by parameters other than those in the model being tuned. Such governing parameters are referred to as hyperparameters. In machine learning, such hyperparameters have been selected by administrative users. Such users often utilize a trial-and-error approach, where a user may input some hyperparameters into a computer system, which uses the hyperparameters to govern a model parameter tuning process. A user may then input some different hyperparameter values to find out if those different values result in a better tuning process. Some machine learning frameworks have provided visualizations and other results that allow comparisons between different tuning process jobs, each of which may use different sets of hyperparameter values. For example, a machine learning framework may provide a graph of how precision of a model changed over time for different tuning jobs, with different graph lines for different jobs being overlaid on the same graph for comparison. As another example, a machine learning framework may provide a list of error values for different parameter tuning jobs.
The tools and techniques discussed herein relate to a computerized hyperparameter tuning tool, which can provide improved efficiencies for users and computer systems in comparing effectiveness of different sets of hyperparameter values. Also, such efficiencies can allow for a scaled-up tuning process for tuning the hyperparameters that are used in tuning computer-readable parametric models for machine learning. This can facilitate selection of hyperparameter values that are more effective in tuning models in a machine learning process.
In one aspect, the tools and techniques can include performing a technique via a hyperparameter tuning tool. The technique can include receiving computer-readable values for each of one or more hyperparameters that govern operation of a computerized parameter tuning system in tuning parameters in a machine learning operation. The technique can further include defining multiple computer-readable hyperparameter sets that each includes a set of the computer-readable values. The defining of the hyperparameter sets can include using the computer-readable values to generate different combinations of the computer-readable values, with each hyperparameter set including one of the computer-readable values for each of the one or more hyperparameters. The technique can further include receiving a computer-readable request to start an overall hyperparameter tuning operation. The technique can also include responding to that request to start by performing the overall hyperparameter tuning operation via the hyperparameter tuning tool, with the overall hyperparameter tuning operation including a tuning job for each of the hyperparameter sets. Performing the tuning operation can include, for each of the tuning jobs, performing a parameter tuning operation on a set of parameters in a parameter model as governed by the hyperparameter set using the parameter tuning system, with the parameter tuning operation operating on computer-readable training data using the parameter model. Performing the tuning operation can also include, for each of the tuning jobs, generating computer-readable results of the parameter tuning operation for the hyperparameter set, with the results of the parameter tuning operation representing a level of effectiveness of the parameter tuning operation using the hyperparameter set. The technique of
Another aspect of the tools and techniques can also include performing a technique via a hyperparameter tuning tool. In this technique, computer-readable values can be received for each of one or more hyperparameters that govern operation of a computerized parameter tuning system in tuning parameters in a machine learning operation. Multiple different computer-readable hyperparameter sets can be defined using the hyperparameter values, with each hyperparameter set including a different set of the hyperparameter values, and with each hyperparameter set including one of the hyperparameter values for each of the one or more hyperparameters. The technique can also include generating computer-readable tuning job requests using the hyperparameter sets, with the computer-readable tuning job requests each defining a different one of the hyperparameter sets to govern a parameter tuning job. Each tuning job request can be sent to the computerized parameter tuning system, with each tuning job request instructing the parameter tuning system to conduct a parameter tuning job that includes tuning a parameter model as governed by a corresponding hyperparameter set defined in the tuning job request. The technique can also include retrieving a comparison of results of the parameter tuning jobs, with the results indicating effectiveness of the different hyperparameter sets in tuning the parameter model. Further, the technique can include presenting a representation of the comparison using a computer output device.
This Summary is provided to introduce a selection of concepts in a simplified form. The concepts are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Similarly, the invention is not limited to implementations that address the particular techniques, tools, environments, disadvantages, or advantages discussed in the Background, the Detailed Description, or the attached drawings.
Aspects described herein are directed to techniques and tools for tuning hyperparameters for use in governing machine learning processes. Such improvements may result from the use of various techniques and tools separately or in combination.
Such techniques and tools may include a hyperparameter tuning tool, which can automate generation, submission, and/or monitoring of multiple different parametric model tuning operations, or jobs, each of which can have a different hyperparameter set, with the hyperparameter sets including different combinations of input hyperparameter values. The tool can also facilitate the retrieval and display of results of those tuning operations, as well as comparisons of the results of the different tuning operations with different sets of hyperparameter values.
The tuning operations may be performed in a computer cluster, such as a graphics processing unit (GPU) cluster, and the tuning tool can be configured to work with multiple different parameter tuning applications, such as instances of multiple different types of deep learning frameworks. The parametric models being tuned may be artificial neural networks, such as deep neural networks. The tuning tool may handle training job failures (i.e., failures of the tuning jobs for the different hyperparameter sets) by monitoring job status and re-trying failed jobs automatically.
Various hyperparameters and values may be chosen using the hyperparameter tuning tool. After a user input request (such as clicking a “Submit” button following the input of hyperparameter values to be used), the tool can submit and monitor jobs with combinations of the hyperparameter values (such as with a different job for each different possible combination of the entered hyperparameter values, with each combination being used as a hyperparameter set).
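As a rough sketch of this combination-and-submission behavior (the function names and the submit_job callable are illustrative assumptions, not the tool's actual interfaces), the hyperparameter sets can be produced as a Cartesian product of the entered values, with one job submitted per set:

```python
import itertools

def submit_tuning_jobs(entered_values, submit_job):
    """Sketch: expand entered hyperparameter values into sets and submit one job per set.

    entered_values: e.g. {"LR": [0.001, 0.005, 0.01], "BS": [32]}
    submit_job: assumed callable that sends a single parameter tuning job request.
    """
    names = sorted(entered_values)
    hyperparameter_sets = [
        dict(zip(names, combination))
        for combination in itertools.product(*(entered_values[name] for name in names))
    ]
    return [submit_job(hp_set) for hp_set in hyperparameter_sets]
```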
The hyperparameter tuning tool may also facilitate comparison of results of jobs using the different hyperparameter sets. For example, the tool may receive a request to retrieve comparisons of selected hyperparameter sets, such as a user input request to visualize results of jobs. This may yield training curves that illustrate how effectiveness of the parameter models being trained compares between different hyperparameter value sets. As an example, this may yield a displayed graph of overlaid training curves (such as training curves showing the change in precision over time) for selected jobs, along with listings of error values for the jobs using the different hyperparameter sets.
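Purely as an illustration of such a comparison view (matplotlib and the job_results structure here are assumptions; the tool may render its graphs differently), overlaid precision learning curves could be drawn along these lines:

```python
import matplotlib.pyplot as plt

def plot_precision_curves(job_results):
    """Sketch: overlay precision learning curves for selected tuning jobs.

    job_results: assumed mapping of job name -> list of (step, precision) points.
    """
    for job_name, points in job_results.items():
        steps, precisions = zip(*points)
        plt.plot(steps, precisions, label=job_name)   # one curve per job, overlaid
    plt.xlabel("training steps")
    plt.ylabel("precision")
    plt.legend()
    plt.show()
```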
Additionally, the tool may monitor the tuning jobs and can automatically retry jobs that fail. The tool may also provide notifications, such as by email, when the status of a job changes.
Accordingly, one or more substantial benefits can be realized from the tools and techniques described herein using the hyperparameter tuning tool. For example, using the tuning tool to define hyperparameter sets from input values, and to generate and submit the corresponding job requests, can provide several benefits. Such use of the tuning tool can allow more combinations of hyperparameters to be tested, and the results of those combinations to be compared. Additionally, because a user need not manually enter each hyperparameter value combination, the tuning tool can reduce typographical errors in the entry of hyperparameter values, thereby increasing the reliability of selecting the hyperparameter value combinations that are most effective. These benefits can be further improved by automatically retrying failed tuning jobs and by facilitating comparison of tuning results. Accordingly, the hyperparameter tuning tool can result in better hyperparameter value selection, which can yield better parameter model tuning and overall better machine learning results, with a simplified interface and less time and effort by the computer users who oversee the machine learning operations, such as deep neural network model training processes.
The subject matter defined in the appended claims is not necessarily limited to the benefits described herein. A particular implementation of the invention may provide all, some, or none of the benefits described herein. Although operations for the various techniques are described herein in a particular, sequential order for the sake of presentation, it should be understood that this manner of description encompasses rearrangements in the order of operations, unless a particular ordering is required. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, flowcharts may not show the various ways in which particular techniques can be used in conjunction with other techniques.
Techniques described herein may be used with one or more of the systems described herein and/or with one or more other systems. For example, the various procedures described herein may be implemented with hardware or software, or a combination of both. For instance, the processor, memory, storage, output device(s), input device(s), and/or communication connections discussed below with reference to
The computing environment (100) is not intended to suggest any limitation as to scope of use or functionality of the invention, as the present invention may be implemented in diverse types of computing environments.
With reference to
Although the various blocks of
A computing environment (100) may have additional features. In
The memory (120) can include storage (140) (though they are depicted separately in
The input device(s) (150) may be one or more of various different input devices. For example, the input device(s) (150) may include a user device such as a mouse, keyboard, trackball, etc. The input device(s) (150) may implement one or more natural user interface techniques, such as speech recognition, touch and stylus recognition, recognition of gestures in contact with the input device(s) (150) and adjacent to the input device(s) (150), recognition of air gestures, head and eye tracking, voice and speech recognition, sensing user brain activity (e.g., using EEG and related methods), and machine intelligence (e.g., using machine intelligence to understand user intentions and goals). As other examples, the input device(s) (150) may include a scanning device; a network adapter; a CD/DVD reader; or another device that provides input to the computing environment (100). The output device(s) (160) may be a display, printer, speaker, CD/DVD-writer, network adapter, or another device that provides output from the computing environment (100). The input device(s) (150) and output device(s) (160) may be incorporated in a single system or device, such as a touch screen or a virtual reality system.
The communication connection(s) (170) enable communication over a communication medium to another computing entity. Additionally, functionality of the components of the computing environment (100) may be implemented in a single computing machine or in multiple computing machines that are able to communicate over communication connections. Thus, the computing environment (100) may operate in a networked environment using logical connections to one or more remote computing devices, such as a handheld computing device, a personal computer, a server, a router, a network PC, a peer device or another common network node. The communication medium conveys information such as data or computer-executable instructions or requests in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
The tools and techniques can be described in the general context of computer-readable media, which may be storage media or communication media. Computer-readable storage media are any available storage media that can be accessed within a computing environment, but the term computer-readable storage media does not refer to propagated signals per se. By way of example, and not limitation, with the computing environment (100), computer-readable storage media include memory (120), storage (140), and combinations of the above.
The tools and techniques can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various aspects. Computer-executable instructions for program modules may be executed within a local or distributed computing environment. In a distributed computing environment, program modules may be located in both local and remote computer storage media.
For the sake of presentation, the detailed description uses terms like “determine,” “perform,” “choose,” “adjust,” “define,” “generate,” and “operate” to describe computer operations in a computing environment. These and other similar terms are high-level descriptions for operations performed by a computer, and should not be confused with acts performed by a human being, unless performance of an act by a human being (such as a “user”) is explicitly noted. The actual computer operations corresponding to these terms vary depending on the implementation.
Referring still to
A. Example Cluster-Based Hyperparameter Tuning System
Referring now to
The parameter tuning system (240) can be configured to perform machine learning operations, such as tuning parameters of machine learning models. As an example, the parameter tuning system (240) can include a computer cluster (242), which may be a combination of multiple sub-clusters configured to operate together. The cluster (242) can be managed by a cluster manager (244), which can be running in the cluster (242). For example, the cluster manager (244) can distribute and monitor jobs being performed by one or more processors in the cluster (242). In one example, the cluster may include graphics processing units that operate together as dictated by the cluster manager (244) to execute jobs submitted to the cluster (242). The cluster (242) can run one or more parameter tuning applications (250), such as operating instances of deep learning frameworks. The cluster (242) can also run a second tool part (260) of the hyperparameter tuning tool (220). The second tool part (260) can act as an intermediary between the first tool part (222) in the tool server system (218) and the parameter tuning application(s) (250) in the cluster (242). As an example, the first tool part (222) may be implemented as a Web-based application that utilizes programming such as computer script in one or more scripting languages to perform actions discussed herein. Additionally, the second tool part (260) may be a script running in the computer cluster (242). The interface (210) can utilize a Web browser application to interface with the first tool part (222). The first tool part (222) and the second tool part (260) can communicate requests to each other, and can be programmed to process such requests and provide responses over the network (230), which can be facilitated by the cluster manager (244).
The second tool part (260) can communicate with the parameter tuning application(s) (250) using application programming interfaces that are exposed by the parameter tuning application(s) (250). Different parameter tuning applications (250) can dictate the use of different application programming interface calls and responses. Accordingly, if the second tool part (260) is to interact with multiple different parameter tuning applications (250), the second tool part can include alternative programming code for translating requests and responses differently for different parameter tuning applications (250). For example, the second tool part (260) can include multiple scripts running in the cluster manager (244), with one script for handling requests and responses for each different parameter tuning application (250). In this instance, the communications sent from the first tool part (222) to the second tool part (260) can be addressed to a specified script in the second tool part (260). For example, the first tool part (222) may send a parameter tuning job request to a specified script in the second tool part (260), to be forwarded to a specified parameter tuning application (250). That script can be programmed in a scripting language to respond to the request by generating one or more application programming interface calls that are formatted for the corresponding parameter tuning application, and sending those application programming interface calls as a job request to the parameter tuning application (250).
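The following simplified sketch illustrates this kind of per-application translation (the framework names, payload fields, and dispatch table are hypothetical; in practice, each translator may be a separate script running in the cluster, as described above):

```python
def translate_for_framework_a(job_data):
    # Hypothetical mapping onto one framework's job-submission payload.
    return {"trainingData": job_data["training_data"],
            "hyperParameters": job_data["hyperparameter_set"],
            "model": job_data["model"]}

def translate_for_framework_b(job_data):
    # Hypothetical mapping onto a second framework's differently named fields.
    return {"data_path": job_data["training_data"],
            "config": {"hparams": job_data["hyperparameter_set"],
                       "model_def": job_data["model"]}}

TRANSLATORS = {"framework_a": translate_for_framework_a,
               "framework_b": translate_for_framework_b}

def forward_job_request(intra_tool_request):
    """Sketch: translate an intra-tool job request for the named parameter tuning application."""
    translator = TRANSLATORS[intra_tool_request["framework"]]
    payload = translator(intra_tool_request["job_data"])
    return payload  # in practice, sent as application programming interface calls
```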
A hyperparameter tuning system may be configured in various alternative ways that are different from what is illustrated in
B. Hyperparameter Tuning System Communications and Operations
Additional details of communications and operations of components in a hyperparameter tuning system will now be discussed. In the discussion of embodiments herein, communications between the various devices and components can be sent using computer system hardware, such as hardware within a single computing device, hardware in multiple computing devices, and/or computer network hardware. A communication or data item may be considered to be sent to a destination by a component if that component passes the communication or data item to the system in a manner that directs the system to route the item or communication to the destination, such as by including an appropriate identifier or address associated with the destination. Also, a data item may be sent in multiple ways, such as by directly sending the item or by sending a notification that includes an address or pointer for use by the receiver to access the data item. In addition, multiple requests may be sent by sending a single request that requests performance of multiple tasks.
Referring now to
Referring to
Referring to
The configuration data (432) in an intra-tool job request (426) can also include a parameter model (436) to be tuned in the requested parameter tuning job. The first tool part (222) can generate the intra-tool job requests (426), with one job request for each of the defined hyperparameter sets. This generating can include inserting the job data (428) in the request, in a format that the second tool part (260) is programmed to understand. The intra-tool job requests (426) can be sent from the first tool part (222) to the second tool part (260). The second tool part (260) may simply translate each intra-tool job request (426) into a format that can be recognized and processed by the parameter tuning application (250), such as into the form of application programming interface calls that are exposed and published for the parameter tuning application (250). For example, this may be done by mapping the items of job data (428) in the intra-tool job requests (426) onto corresponding items of parameter tuning application job requests (322). This translating can generate a parameter tuning application job request (322) for each intra-tool job request (426). The parameter tuning application job requests (322) can be formatted to be understood by the selected parameter tuning application (250), such as by complying with available application programming interface call requirements for the parameter tuning application (250).
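A minimal sketch of generating one intra-tool job request per hyperparameter set might look like the following (the field names loosely mirror the job data items described above but are otherwise assumptions):

```python
def build_intra_tool_requests(training_data_location, parameter_model, hyperparameter_sets):
    """Sketch: generate one intra-tool job request per defined hyperparameter set."""
    requests = []
    for hp_set in hyperparameter_sets:
        requests.append({
            "job_data": {
                "training_data": training_data_location,
                "configuration_data": {
                    "job_hyperparameter_set": hp_set,
                    "parameter_model": parameter_model,
                },
            },
        })
    return requests
```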
Each parameter tuning application job request (322) can include job data (448), which can include training data (450) and configuration data (452). The configuration data (452) can include a job hyperparameter set (454) and a parameter model (456) for the corresponding parameter tuning job. The second tool part (260) can send each parameter tuning application job request (322) to the parameter tuning application (250), requesting the parameter tuning application (250) to perform the requested jobs.
Referring to
Referring still to
If the tool (220) receives an indication that a job's status has changed (e.g., from running to successful completion), the tool (220) can respond by sending a status update (346) to the interface (210). In some implementations, the status updates (346) may be in the form of emails that are sent from the tool (220) to a registered email address corresponding to the submitted jobs (332) (e.g., an email address associated with the logged-in user profile that submitted the start request (312)). In response to receiving a status update (346), the interface (210) can present the status update (346), such as by displaying an email that indicates the status update.
If the tool (220) identifies (348) failure of a job (332), such as by receiving an indication that the job's status has changed from running to failed, then the tool (220) can respond by sending a retry failed job request (322). This request can be the same as, or at least similar to, the original job request (322) sent for the failed job (332). In response to receiving the retry failed job request (322), the parameter tuning application (250) can retry performing the job (332). In some embodiments, the tool (220) may only send a retry failed job request (322) if the failed job (332) made at least some progress prior to its failure, or only if at least some progress has been made in at least some of the jobs (332) in the overall operation (330). For example, this can be determined from information that the tool (220) retrieves from the parameter tuning application (250), or from some other component, such as the cluster manager (244) (which may report data such as progress values for jobs, as discussed below).
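The monitoring, notification, and retry behavior described above might be sketched as follows (the helper callables such as get_job_status and resubmit_job are assumptions standing in for whatever interfaces the parameter tuning application and cluster manager expose):

```python
import time

def monitor_jobs(job_ids, get_job_status, send_status_email, resubmit_job,
                 job_made_progress, poll_seconds=60):
    """Sketch: poll job statuses, notify on changes, and retry failed jobs.

    The callables (get_job_status, send_status_email, resubmit_job,
    job_made_progress) are assumed helpers, not any framework's actual API.
    """
    last_status = {job_id: None for job_id in job_ids}
    # Keep polling until every job reports successful completion (failed jobs are retried).
    while not all(status == "succeeded" for status in last_status.values()):
        for job_id in job_ids:
            status = get_job_status(job_id)        # e.g. "running", "succeeded", "failed"
            if status != last_status[job_id]:
                send_status_email(job_id, status)  # notify of the status change
                last_status[job_id] = status
            if status == "failed" and job_made_progress(job_id):
                resubmit_job(job_id)               # retry with the same job request
        time.sleep(poll_seconds)
```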
Referring still to
The parameter tuning application (250) can return the results (360) and comparison (362) to the tool (220), and the tool (220) can forward results (364) and a comparison (366) to the interface (210), which can present the results (368) and the comparison (370) to a user, such as on a computer display. In one implementation, where the results and comparison are rendered by the parameter tuning application (250), the parameter tuning application (250) may store a rendered page (such as a Web page) in a location, and provide an address, such as a uniform resource locator, to allow the interface (210) to retrieve the rendered page.
C. Hyperparameter Tuning System User Interface Examples
Examples of user interface displays for use with the hyperparameter tuning tool will now be discussed with reference to
1. Job Submission Display
In the illustrated example, a value of JOB_NAME is entered adjacent to a NAME identifier for the name of the overall hyperparameter tuning process. A value of /FOLDER/SUBFOLDER/SUBFOLDER is entered adjacent to a DATA identifier for a location, such as a filesystem path to a location for training data to be used in the parameter tuning jobs of the hyperparameter tuning operation. A value of CONFIGNAME.SH is entered adjacent to a CONFIG identifier for the location of a configuration file that can include configuration data for the hyperparameter tuning operation, which can include a model to be tuned by the jobs of the operation as well as other configuration data. A value of TUNEAPPNAME is entered adjacent to a DOCKER identifier for identifying a container for the parameter tuning application to be used in the overall tuning operation, such as a container for the parameter tuning application in a computer cluster. This can be used to select which of multiple available parameter tuning applications is to be used for the hyperparameter tuning process, and the hyperparameter tuning tool can direct its communications to that application and format its communications for the application. For example, a different second tool part can be used for each different type of parameter tuning application. Thus, each second tool part can be configured to communicate with a different type of parameter tuning application, such as using different application programming interface calls for each application.
Referring still to
The job submission display (500) can also specify hyperparameters (510) and values (520) entered adjacent to the indicators for the corresponding hyperparameters (510). User input can be provided for each hyperparameter (510) to enter a single value or multiple values. In this example, multiple values for a hyperparameter are separated by spaces within the text entry box. The tool can be configured to parse such data and identify the values within each box. The hyperparameters (510) are indicated by the “OPTION” text, and additional text entry boxes for options (additional hyperparameters (510)) can be provided in response to user input selecting the ADD OPTIONS button at the top of the job submission display (500). In the illustrated example, values are entered for four hyperparameters. One is DF, which can be a decay factor (which could be specified as “decay_factor” rather than DF); another is DS, which can be decay steps (which could be specified as “decay_steps” rather than DS); another is LR, which can be an initial learning rate; and another is BS, which can be a batch size.
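For instance, parsing a space-separated entry such as “0.001 0.005 0.01” into individual values could be as simple as the following sketch (the function name is illustrative only):

```python
def parse_option_values(text_box_contents):
    """Sketch: split a space-separated entry into individual hyperparameter values."""
    return [float(token) for token in text_box_contents.split()]

parse_option_values("0.001 0.005 0.01")  # -> [0.001, 0.005, 0.01]
```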
The initial learning rate (LR) is the initial fraction of the learning error that is corrected for the parameters in a model being tuned in a machine learning process. For example, the learning rate may be used with backpropagation in tuning parameter models that are artificial neural networks, where backpropagation is a technique that can be used to calculate an error contribution of each neuron in an artificial neural network after a batch of training data is processed. Typically, only a fraction of this calculated error contribution is corrected when tuning the artificial neural network model, and that initial fraction is the initial learning rate hyperparameter.
The number of items in a batch of training data (such as the number of images for image recognition) is dictated by the batch size (BS) hyperparameter. The decay steps (DS) hyperparameter is the number of steps (processed training data batches) between dropping the value of the learning rate, and the decay factor (DF) is the ratio indicating how much the learning rate is dropped after the number of steps in the decay steps hyperparameter. These are merely examples of hyperparameters that can be tuned using the hyperparameter tuning tool. Values of other hyperparameters may be entered and tuned in addition to, or instead of, these hyperparameters.
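Under these definitions, a stepwise learning-rate schedule could be computed roughly as follows (this reflects one common interpretation of such decay hyperparameters and is not necessarily the exact schedule used by any particular parameter tuning application):

```python
def learning_rate_at_step(step, initial_lr, decay_steps, decay_factor):
    """Sketch: drop the learning rate by decay_factor once every decay_steps batches."""
    return initial_lr * (decay_factor ** (step // decay_steps))

# For example, with initial_lr=0.01, decay_steps=1000, decay_factor=0.5:
#   steps 0-999 use 0.01, steps 1000-1999 use 0.005, steps 2000-2999 use 0.0025, ...
```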
In the example, a single value of 0.0125 is entered for the decay factor (DF); values of 1000, 2000, and 4000 are entered for the decay steps (DS); values of 0.001, 0.005, and 0.01 are entered for the initial learning rate (LR); and a single value of 32 is entered for the batch size (BS). Because each hyperparameter set includes one value for each hyperparameter, these entries yield 1 × 3 × 3 × 1 = 9 different combinations, and thus nine parameter tuning jobs. A user input start request can be provided for the hyperparameter tuning process indicated in the job submission display (500) by selecting the SUBMIT button on the job submission display (500). The hyperparameter tuning tool can respond to that start request by defining hyperparameter sets for corresponding jobs, generating the requests for the corresponding jobs, and sending the job requests to the parameter tuning application indicated on the job submission display.
2. Running Jobs Display
Referring now to
CUST-R-JOB_NAME_DF0.0125_DS100000_LR0.001_BS32!~!~!1
This name indicates that the hyperparameter set for this job includes a decay factor (DF) value of 0.0125, a decay steps (DS) value of 100000, an initial learning rate (LR) value of 0.001, and a batch size (BS) of 32. As can be seen from the hyperparameter values in the job names, the hyperparameter tuning tool has defined a hyperparameter set for each combination of the hyperparameter values entered by user input through the job submission display (500), and has submitted a parameter tuning job for each of them. Accordingly, rather than having a user manually enter and re-enter values for each of these possible hyperparameter sets, a user can simply enter each hyperparameter value one time in the job submission display, and request the hyperparameter tuning tool to define the hyperparameter sets from combinations of those values, and submit the corresponding parameter tuning jobs to the parameter tuning application. This can save substantial time and effort on the part of computer users, and can speed up the process of defining and submitting the jobs.
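A job name in this style could be assembled from a hyperparameter set roughly as in the following sketch (the prefix and separator conventions are taken from the example name above; the function itself is illustrative only):

```python
def build_job_name(base_name, hp_set, index):
    """Sketch: encode a job's hyperparameter set into its name, e.g.
    CUST-R-JOB_NAME_DF0.0125_DS100000_LR0.001_BS32!~!~!1
    """
    options = "_".join(f"{key}{value}" for key, value in hp_set.items())
    return f"CUST-R-{base_name}_{options}!~!~!{index}"

build_job_name("JOB_NAME", {"DF": 0.0125, "DS": 100000, "LR": 0.001, "BS": 32}, 1)
```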
Still referring to
Each job listing (610) can also include an entry in a progress column (PROG), which can indicate how much progress has been made in running the model, with higher numbers indicating more progress. The job listings (610) can also each include an entry in a portal (PRTL) column, which can be a link to a portal for a computer system running the parameter tuning application, such as a link to a Web portal for a cluster that is running the parameter tuning application. Additionally, the job listings (610) can include an entry in the model column, which can be a link to a location of the parameter model used in the corresponding parameter tuning job. Also, the “ETC” column entry for each job listing (610) can include a link to a storage location that includes related stored resources for the corresponding job.
Also, each job listing (610) can include an entry in the visualization (VIS) column that can be a link to be selected to retrieve and display a visualization for that job, such as a page that displays results for that job, including a listing of the loss for the job and a precision learning curve for the job. Each job listing (610) can also include an entry in the clone column, which can be a link that can be selected to generate entries in the job submission display that are the same as for that job. For example, these may include all the values other than the option values in the job submission display (500) discussed above, and they may even include pre-populated values (the same values as in the job) for the options (hyperparameter values such as DF, DS, LR, and BS in
In addition, each job listing (610) can include a checkbox in the selection (SEL) column, which can be checked with user input to select that job for actions to be selected from control buttons (620) on the running jobs display (600). For example, user input selecting one or more job listings (610) and selecting the button labeled KILL can request that the tool terminate the selected jobs, with the tool responding by sending requests to the parameter tuning application to terminate those jobs. Selecting the MONITOR button while job listings (610) are selected can generate and send monitor requests (342) discussed above with reference to
3. Visualization Display
If job listings (610) are selected and the VISUALIZE button is selected, a results request (352) can be sent to the tool (see
The job listings (730) can also include loss values (740) for the selected jobs, with those loss values also being results that can indicate effectiveness of the corresponding hyperparameter sets of the jobs. Accordingly, the table of job listings can be a textual (rather than graphical) comparison of the effectiveness of the different jobs and corresponding hyperparameter sets. For example, a lower loss value can indicate a more effective hyperparameter set, and a higher loss value can indicate a less effective hyperparameter set. Conversely, a greater increase in the precision values illustrated in the precision learning curve of the comparison graph (i.e., a curve that has a greater upward trend) can indicate a more effective corresponding hyperparameter set, and less of an increase in the precision values illustrated in the precision curve of the comparison graph can indicate a less effective corresponding hyperparameter set.
The visualization display (700) may be presented while the corresponding jobs are still running and/or after the corresponding jobs are complete. Using the results and comparisons of results, a hyperparameter set can be selected, such as a hyperparameter set in a job that exhibits the lowest loss and/or the greatest precision gain during the tuning. This selection may be received as user input after the results comparisons are presented, or it may be provided as an automated identification and selection. For example, the hyperparameter set with the lowest loss may be identified and selected by the hyperparameter tuning system for use in subsequently tuning parameter models. This identification and selection can include analyzing the loss values in the results. As an alternative, a hyperparameter set with the greatest gain in precision may be identified and selected by the hyperparameter tuning system by analyzing the precision learning curves, or directly analyzing values for precision from the job results. Other alternative selection criteria may be used, such as a weighted combination that factors in the loss values and the gain in precision, to provide a computer-readable score for each job, which can then be compared between jobs to identify and select a best scoring job and corresponding hyperparameter set.
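One possible automated selection along these lines is sketched below (the weighting, field names, and job_results structure are illustrative assumptions rather than a fixed scoring method):

```python
def select_best_hyperparameter_set(job_results, loss_weight=0.5, precision_weight=0.5):
    """Sketch: score each job from its loss and precision gain, then pick the best set.

    job_results: assumed mapping of job name -> {"loss": float,
                 "precision_gain": float, "hyperparameter_set": dict}.
    """
    def score(result):
        # Lower loss and greater precision gain both improve the score.
        return precision_weight * result["precision_gain"] - loss_weight * result["loss"]

    best_job = max(job_results, key=lambda name: score(job_results[name]))
    return job_results[best_job]["hyperparameter_set"]
```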
The selected hyperparameter set can be used in subsequent tuning operations for tuning parameter models. For example, a hyperparameter set may be used in tuning general speech recognition models, user-specific speech recognition models, image recognition models, or other machine learning models. Using the hyperparameter tuning tool discussed herein can allow better tuning of the hyperparameters, which can in turn produce better parameter model tuning. Indeed, the use of hyperparameters selected with a hyperparameter tuning tool as discussed herein has been shown to improve the accuracy of image classification. Specifically, such a tool was used to tune over 100 models used in image classification, with different hyperparameter sets selected using the tool. The image classification accuracy of the models tuned with those hyperparameters was greater than that of previous models tuned with hyperparameters that had not been selected using the hyperparameter tuning tool.
Several hyperparameter tuning tool techniques will now be discussed. Each of these techniques can be performed in a computing environment. For example, each technique may be performed in a computer system that includes at least one processor and memory including instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform the technique (the memory stores instructions (e.g., object code), and when the processor(s) execute those instructions, the processor(s) perform the technique). Similarly, one or more computer-readable memories may have computer-executable instructions embodied thereon that, when executed by at least one processor, cause the at least one processor to perform the technique. The techniques discussed below may be performed at least in part by hardware logic. Additionally, the different features of the techniques discussed below may be used with each other in different combinations, as the different features can provide benefits when used alone and/or in combination with other features.
Referring to
The performing (840) of the overall hyperparameter tuning operation can be done in a computer cluster, as discussed above. At least part of the hyperparameter tuning tool can be located outside the computer cluster. For example, a first part of the hyperparameter tuning tool can be located outside the computer cluster, and a second part of the hyperparameter tuning tool can be located inside the computer cluster. The technique can include, responsive to a request to start the overall hyperparameter tuning operation, sending a first set of one or more requests from the first part of the hyperparameter tuning tool to the second part of the hyperparameter tuning tool. Also, the technique can include, responsive to the first set of one or more requests, sending a second set of requests corresponding to the first set of one or more requests from the second part of the hyperparameter tuning tool to a machine learning framework running in the computer cluster, with the second set of one or more requests instructing the machine learning framework to perform the tuning jobs.
The technique of
Referring now to
The technique can also include defining (920) multiple different computer-readable hyperparameter sets using the hyperparameter values, with each hyperparameter set including a different set of the hyperparameter values, and with each hyperparameter set including one of the hyperparameter values for each of the one or more hyperparameters. The defining (920) of the hyperparameter sets can include using the computer-readable values to generate different combinations of the computer-readable values.
The technique of
In the technique of
Also, in the
The defining (920) of the hyperparameter sets, the generating (930) of the tuning job requests, and/or the sending (940) of the tuning job requests to the parameter tuning system may all be performed in response to receiving a single computer-readable request, such as in response to receiving a single user input request.
The technique of
The technique of
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.