This U.S. patent application claims priority under 35 U.S.C. § 119 to: India application No. 202321067480, filed on Oct. 9, 2023. The entire contents of the aforementioned application are incorporated herein by reference.
The embodiments herein generally relate to the field of optimization of Deep Neural Networks (DNNs) and, more particularly, to a method and system for integrated platform enabling rapid automated generation of optimized DNNs for edge devices.
Edge computing is becoming a preferred choice for Artificial Intelligence (AI)/Machine Learning (ML) model deployment, as it reduces the cost of running expensive Graphics Processing Unit (GPU) servers in the cloud. Along with reduced server cost, it helps protect users' data privacy. Edge computing requires building optimized ML models, also referred to as Tiny ML models or Edge AI models, for resource-constrained edge devices. Generating the Tiny ML models requires a huge amount of workflow setup, engineering skill, and research skill. Hence, there is a need to automate the process of generating Tiny ML models and thereby contribute to making analytics Green. Various research paths are being explored by AI researchers to design algorithms and hardware to build optimized Tiny ML models quickly and efficiently. Some of the techniques in the literature are listed below:
Reduction in end-to-end time to build and deploy a model at the Edge involves huge engineering (embedded systems hardware and software knowledge), workflow setup, and research effort. Therefore, it is a technical challenge to choose the correct technique for the combination of data, task, and hardware at hand. Further, it involves immense cost in terms of hiring niche talent, Research and Development (R and D), and systems engineering.
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
For example, in one embodiment, a method for rapid generation of optimized Deep Neural Networks (DNNs) for edge devices is provided.
The method includes receiving a plurality of input parameters comprising target hardware metric constraints of a target edge device, and at least one of i) a training dataset, and ii) a pretrained DNN model trained for a target application.
Further, the method includes selecting one of i) a Fast-Neural Architecture Search (F-NAS) based optimization technique and ii) a non-F-NAS based optimization technique for generating an optimized DNN model customized for deployment on the target edge device to run the target application. The selection is based on availability of the training dataset, availability of the pretrained DNN model, and the target hardware metric constraints. The F-NAS based optimization technique comprises i) a Fast-Neural Architecture Search (F-NAS), and ii) an LTH+F-NAS; and the non-F-NAS based optimization technique comprises i) an EvoPrune technique, ii) a Lottery Ticket Hypothesis (LTH), and iii) a utilities based approach in accordance with the target application.
Furthermore, the method includes generating the optimized DNN model using the F-NAS based optimization technique comprising one of: the LTH+F-NAS if the pretrained DNN model is available and the training dataset is available, wherein the pretrained DNN model is pruned using the LTH pipeline and passed to the F-NAS in the form of a student model to generate a search space; and the F-NAS directly if the pretrained DNN model is unavailable and the training dataset is available. The F-NAS is based on a Reinforcement Learning (RL)-NAS technique using an RL-NAS reward comprising a sigmoid based multi-objective reward function with an exponential decay, generalized for maximization and minimization of Key Performance metrics. The F-NAS comprises: (a) receiving the plurality of input parameters; (b) initializing a RL search space based on a default search space or the student model; (c) initializing hyperparameters based on default values or values derived from user inputs; (d) receiving a ratio of Scaled Hamming Distance (SHD) episodes among a total number of episodes; (e) initializing a set of default hyperparameters for model training and evaluation of intermediate models generated during a NAS process; and (f) generating the optimized DNN model by combining a RL-based NAS technique with a SHD based metric that reduces a search time. The Hamming distance based metric is scaled between bounds 0 and 1 to be used in the RL-NAS reward.
Further, the method includes deploying the optimized DNN model on the target edge device to run the target application.
In another aspect, a system for rapid generation of optimized Deep Neural Networks (DNNs) for edge devices is provided. The system comprises a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to perform the following.
The one or more hardware processors are configured to receive a plurality of input parameters comprising target hardware metric constraints of a target edge device, and at least one of i) a training dataset, and ii) a pretrained DNN model trained for a target application.
Further, the one or more hardware processors are configured to select one of i) a Fast-Neural Architecture Search (F-NAS) based optimization technique and ii) a non-F-NAS based optimization technique for generating an optimized DNN model customized for deployment on the target edge device to run the target application. The selection is based on availability of the training dataset, availability of the pretrained DNN model, and the target hardware metric constraints. The F-NAS based optimization technique comprises i) a Fast-Neural Architecture Search (F-NAS), and ii) an LTH+F-NAS; and the non-F-NAS based optimization technique comprises i) an EvoPrune technique, ii) a Lottery Ticket Hypothesis (LTH), and iii) a utilities based approach in accordance with the target application.
Furthermore, the one or more hardware processors are configured to generate the optimized DNN model using the F-NAS based optimization technique comprising one of: the LTH+F-NAS if the pretrained DNN model is available and the training dataset is available, wherein the pretrained DNN model is pruned using the LTH pipeline and passed to the F-NAS in the form of a student model to generate a search space; and the F-NAS directly if the pretrained DNN model is unavailable and the training dataset is available. The F-NAS is based on a Reinforcement Learning (RL)-NAS technique using an RL-NAS reward comprising a sigmoid based multi-objective reward function with an exponential decay, generalized for maximization and minimization of Key Performance metrics. The F-NAS comprises: (a) receiving the plurality of input parameters; (b) initializing a RL search space based on a default search space or the student model; (c) initializing hyperparameters based on default values or values derived from user inputs; (d) receiving a ratio of Scaled Hamming Distance (SHD) episodes among a total number of episodes; (e) initializing a set of default hyperparameters for model training and evaluation of intermediate models generated during a NAS process; and (f) generating the optimized DNN model by combining a RL-based NAS technique with a SHD based metric that reduces a search time. The Hamming distance based metric is scaled between bounds 0 and 1 to be used in the RL-NAS reward.
Further, the one or more hardware processors are configured to deploy the optimized DNN model on the target edge device to run the target application.
In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors cause the one or more hardware processors to perform a method for rapid generation of optimized Deep Neural Networks (DNNs) for edge devices.
The method includes receiving a plurality of input parameters comprising target hardware metric constraints of a target edge device, and at least one of i) a training dataset, and ii) a pretrained DNN model trained for a target application.
Further, the method includes selecting one of i) a Fast-Neural Architecture Search (F-NAS) based optimization technique and ii) a non-F-NAS based optimization technique for generating an optimized DNN model customized for deployment on the target edge device to run the target application. The selection is based on availability of the training dataset, availability of the pretrained DNN model, and the target hardware metric constraints. The F-NAS based optimization technique comprises i) a Fast-Neural Architecture Search (F-NAS), and ii) an LTH+F-NAS; and the non-F-NAS based optimization technique comprises i) an EvoPrune technique, ii) a Lottery Ticket Hypothesis (LTH), and iii) a utilities based approach in accordance with the target application.
Furthermore, the method includes generating the optimized DNN model using the F-NAS based optimization technique comprising one of: the LTH+F-NAS if the pretrained DNN model is available and the training dataset is available, wherein the pretrained DNN model is pruned using the LTH pipeline and passed to the F-NAS in the form of a student model to generate a search space; and the F-NAS directly if the pretrained DNN model is unavailable and the training dataset is available. The F-NAS is based on a Reinforcement Learning (RL)-NAS technique using an RL-NAS reward comprising a sigmoid based multi-objective reward function with an exponential decay, generalized for maximization and minimization of Key Performance metrics. The F-NAS comprises: (a) receiving the plurality of input parameters; (b) initializing a RL search space based on a default search space or the student model; (c) initializing hyperparameters based on default values or values derived from user inputs; (d) receiving a ratio of Scaled Hamming Distance (SHD) episodes among a total number of episodes; (e) initializing a set of default hyperparameters for model training and evaluation of intermediate models generated during a NAS process; and (f) generating the optimized DNN model by combining a RL-based NAS technique with a SHD based metric that reduces a search time. The Hamming distance based metric is scaled between bounds 0 and 1 to be used in the RL-NAS reward.
Further, the method includes deploying the optimized DNN model on the target edge device to run the target application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Many factors, such as the Artificial Intelligence (AI)/Machine Learning (ML) task, the availability of a training dataset, the availability of existing larger Neural Network models, the computing constraints of an edge device or target device, the type of platform for the final deployment, and so on, are critical in selecting the right technique for Neural Network (NN) model optimization. Furthermore, the selection of an optimization technique, the generation of the Tiny or optimized model, and its deployment are desired to be time efficient, effectively contributing to making analytics Green. Reduction in end-to-end time to build and deploy a model at the edge involves huge engineering (embedded systems hardware and software knowledge), workflow setup, and research effort.
Embodiments of the present disclosure provide a method and system providing an integrated platform enabling rapid automated generation of optimized Deep Neural Networks (DNNs) for edge devices. The system, also referred to as Edge Wizard, provides an automation platform to generate optimized DNNs or Tiny models for resource-constrained target hardware. The system augments and integrates knowledge from domain experts, data scientists, ML engineers, or developers to select or recommend the most appropriate optimization technique based on the target application (the AI/ML task and the domain where the task is applied), the availability of a training dataset, the availability of existing larger Neural Network (NN) models, the computing constraints of the target device, and the type of platform for the final deployment. An optimized DNN model is then generated in accordance with the recommended optimization technique. The integrated platform disclosed herein thus saves implementation and maintenance cost for business enterprises by offloading the burden of the huge amount of workflow setup, engineering skill, and research skill required for edge AI development.
Further, one of the optimization techniques disclosed herein, referred to as Fast-Neural Architecture Search (F-NAS), upgrades and accelerates Reinforcement Learning based Neural Architecture Search (RL-NAS), allowing it to be generalized across multiple applications with 95% less computational power (Graphics Processing Unit (GPU) hours). The F-NAS improves existing RL-NAS by introducing a model performance evaluation technique based on a new metric, a new reward function with adaptive parameters, an early-exit strategy to further expedite the optimization process, and a new NAS flow enhanced with AutoML (Hyper-Parameter Optimization) to minimize human intervention.
Most existing platforms focus on one ML task and one technique to generate an optimized ML model for resource-constrained target hardware. In contrast, the platform/system disclosed herein supports many ML tasks/use-cases and automatically chooses an appropriate technique from a pool of techniques to generate the optimized ML model, based on an intelligent rule-based system. It also supports model generation for a variety of edge computing hardware. The platform/system is a GUI based tool that reduces development time, required manpower, and GPU server expense, and simplifies the developer's workflow.
Existing approaches to Neural Architecture Search (NAS) often involve high-capacity GPU systems, where the major overhead is the training of candidate models. While the optimization process has been accelerated by various approaches such as differentiable architecture search techniques (DARTS, PDARTS), it still has a substantial GPU requirement to complete the optimization process. State-of-the-art approaches do not provide search techniques that can run on non-GPU devices. The Faster-NAS (F-NAS) disclosed herein can also run on a CPU server to search Neural Network architectures quickly and efficiently.
Referring now to the drawings, and more particularly to
In an embodiment, the system 100, also referred to as Edge Wizard, includes a processor(s) 104, communication interface device(s), alternatively referred as input/output (I/O) interface(s) 106, and one or more data storage devices or a memory 102 operatively coupled to the processor(s) 104. The system 100 with one or more hardware processors is configured to execute functions of one or more functional blocks of the system 100.
Referring to the components of system 100, in an embodiment, the processor(s) 104, can be one or more hardware processors 104. In an embodiment, the one or more hardware processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 104 are configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems including laptop computers, notebooks, hand-held devices such as mobile phones, workstations, mainframe computers, servers, and the like.
The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface enabling users to communicate interactively with the system 100, and the like. Further, the I/O interface 106 can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular and the like. In an embodiment, the I/O interface(s) 106 can include one or more ports for connecting to a number of external devices or to another server or devices.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
In an embodiment, the memory 102 includes a plurality of modules 110, such as an optimization technique selection module to select an appropriate approach based on the input parameters. The techniques provided by the integrated platform, which are intelligently and automatically selected by the optimization technique selection module for generating an optimized DNN model based on availability of the training dataset, availability of the pretrained DNN model, and the target hardware metric constraints, include i) a Fast-Neural Architecture Search (F-NAS) based optimization technique and ii) a non-F-NAS based optimization technique. The F-NAS based optimization technique comprises i) a Fast-Neural Architecture Search (F-NAS), and ii) a Lottery Ticket Hypothesis (LTH)+F-NAS. The non-F-NAS based optimization technique comprises i) an EvoPrune technique, ii) a Lottery Ticket Hypothesis (LTH), and iii) a utilities based approach in accordance with the target application.
The system 100 executes the optimization technique selection module based on a rule engine and recommends an optimization technique. However, with additional inputs/confirmation from the data scientist and domain expert, as depicted in the architectural overview of the system 100, the appropriate technique can be selected, and the respective module corresponding to that technique enables generation of the optimized DNN model to be deployed on the target device. Thus, the system 100 offers multiple optimization technique implementations at a single point, reducing the effort of new setups for different optimization technique requirements. The rule engine is built based on knowledge derived from data scientists, domain experts, and the like. An example rule engine is depicted in the table within the optimization technique selection module. The rule engine can be further expanded to scale to newer optimization techniques and newer rules. Furthermore, the system 100 interacts with a user via the UI and requests the additional input parameters required by the selected optimization technique.
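For illustration only (this is not the actual rule engine of the system 100), the following Python sketch shows how the availability of a training dataset, the availability of a pretrained DNN model, and a framework-conversion requirement could map to the disclosed techniques; the function name, the parameter names, and the exact rules are assumptions.

```python
def select_optimization_technique(has_training_data: bool,
                                  has_pretrained_model: bool,
                                  needs_framework_conversion: bool = False,
                                  prefer_pruning_only: bool = False) -> str:
    """Hypothetical rule engine mirroring the selection logic described above."""
    if needs_framework_conversion:
        # Conversion of model structures between ML frameworks is handled by the
        # utilities module, irrespective of dataset/model availability.
        return "UTILITIES"
    if has_pretrained_model and not has_training_data:
        return "EVOPRUNE"          # prune the existing model without training data
    if has_pretrained_model and has_training_data:
        # LTH alone when only pruning is preferred; otherwise prune with LTH and
        # refine the resulting student model with F-NAS.
        return "LTH" if prefer_pruning_only else "LTH+F-NAS"
    if has_training_data:
        return "F-NAS"             # search an architecture from scratch
    raise ValueError("At least a training dataset or a pretrained DNN model is required.")
```

In the system 100, such a recommendation is further refined with the target hardware metric constraints and confirmation from the data scientist or domain expert.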
Optimization techniques of the Edge Wizard (system 100):
EvoPrune: This optimization technique is specifically selected when a pretrained DNN is available, but a training dataset is not available for generation of the optimized DNN model. EvoPrune generates pruned DNN models satisfying the target edge device constraints. The pruned DNN model can then be deployed on the target hardware. The technique is based on the applicant's earlier filed patent application titled Method and System for Jointly Pruning and Hardware Acceleration of Pre-Trained Deep Learning Models, Indian Patent Application number 202221043520, filed on 29 Jul. 2022 at IPO. Herein, the user, on a system request/prompt, is requested to select (i) a plurality of deep neural network (DNN) models, (ii) a plurality of hardware accelerators comprising one or more processors, a plurality of target performance indicators comprising a target accuracy, a target inference latency, a target model size, a target network sparsity, and a target energy, and (iii) a plurality of user options comprising a first pruning search and a secondary pruning search. The plurality of DNN models and the plurality of hardware accelerators are transformed into a plurality of pruned hardware accelerated DNN models based on at least one of the user options. The first pruning search option executes a hardware pruning search technique to perform a search on each DNN model and each processor based on at least one of a performance indicator and an optimal pruning ratio. The second pruning search option executes an optimal pruning search technique to perform a search on each layer with its corresponding pruning ratio. Further, an optimal layer associated with the pruned hardware accelerated DNN model is identified based on the user option. The layer assignment sequence technique creates a static load distributor by partitioning the optimal layer of the DNN model into a plurality of layer sequences and assigning each layer sequence to the corresponding processing element of the hardware accelerators.
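Purely as an illustration of the user-facing selections described above (not the actual interface of the earlier-filed application), the EvoPrune inputs could be captured in a structure such as the following; all field names and units are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvoPruneRequest:
    """Hypothetical container for the EvoPrune user selections described above."""
    dnn_models: List[str]              # identifiers/paths of pretrained DNN models
    hardware_accelerators: List[str]   # e.g. ["cpu", "gpu", "dsp"]
    target_accuracy: float             # fraction, e.g. 0.95
    target_inference_latency_ms: float
    target_model_size_kb: float
    target_network_sparsity: float
    target_energy_mj: float
    pruning_search: str = "first"      # "first" (hardware pruning search) or
                                       # "secondary" (optimal per-layer pruning search)
```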
LTH: This optimization technique is selected when a pruning-based method is preferred for optimized model generation and the training dataset is available.
F-NAS: This technique is used when no pretrained DNN model is available and the optimized DNN model is to be generated for the target device using the available training dataset. This optimization approach is based on a NAS algorithm and provides a technical improvement in search speed over the RL-based NAS of the applicant's earlier filed patent application titled Automated Creation Of Tiny Deep Learning Models Based On Multi-Objective Reward Function, Indian Patent Application number 202221022177, filed at IPO on Apr. 13, 2022. The F-NAS is explained in detail in conjunction with
LTH+F-NAS: Reduces compute time significantly for complex use cases. LTH+F-NAS is useful in cases where a large model with good accuracy is available but is not suitable for the target device, e.g., the ECG AF and Arrhythmia classification use case (4-class single lead ECG dataset, Physionet 2017). Here, the benchmark model has good accuracy but a size of 40 MB. This model is taken by the LTH module, reduced, and converted into a student model by knowledge distillation/pruning. The student model is then used to generate the search space for NAS. Details of the module are available in the patent 497675-046. The upgrade to this module is the implementation of the additional F-NAS features on NAS to further reduce search and optimization time. The LTH+NAS approach is based on the applicant's earlier filed patent application titled METHOD AND SYSTEM FOR CREATING TINY DEEP NEURAL NETWORK MODEL, Indian Provisional Patent Application No. 202321025696, filed at IPO on 5 Apr. 2023.
Utilities based approach: This approach is used when conversion of model structures between platforms is required, irrespective of whether pretrained models or a training dataset are available. Utilities is an engineering enablement module that provides conversion of model structures between ML frameworks/platforms, such as TensorFlow (TF)™ to PyTorch™, PyTorch to TF, PyTorch to C, or the like, as needed for deployment on the target device.
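As one illustrative example of such a conversion step (not the actual utilities module), a PyTorch model can be exported to the ONNX interchange format, from which existing converters can target a TensorFlow or C/embedded runtime; the model, input shape, and file name below are placeholders.

```python
import torch
import torch.nn as nn

# A small example network standing in for a pretrained or optimized DNN model.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))
model.eval()

dummy_input = torch.randn(1, 128)  # example input matching the model's expected shape

# Export to ONNX, a common interchange format used when moving models between
# frameworks or toward C/embedded deployment toolchains.
torch.onnx.export(model, dummy_input, "optimized_model.onnx",
                  input_names=["input"], output_names=["output"])
```

From the ONNX file, framework-specific converters (for example, onnx-tensorflow for TF, or a code generator for C) can complete the platform-specific deployment step.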
The plurality of modules 110 can further include modules implementing each of the optimization techniques, such as a F-NAS module, a LTH module, an EvoPrune module, and a utilities based approach module.
Further, the plurality of modules 110 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing the different steps involved in the process of rapid automated generation of optimized Deep Neural Networks (DNNs) for edge devices performed by the system 100. The plurality of modules 110, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 110 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 110 can be implemented by hardware, by computer-readable instructions executed by the one or more hardware processors 104, or by a combination thereof. The plurality of modules 110 can include various sub-modules (not shown).
Further, the memory 102 may comprise information pertaining to input(s)/output(s) of each step performed by the processor(s) 104 of the system 100 and methods of the present disclosure.
Further, the memory 102 includes a database 108. The database (or repository) 108 may include a plurality of abstracted pieces of code for refinement and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 110.
Although the database 108 is shown internal to the system 100, it will be noted that, in alternate embodiments, the database 108 can also be implemented external to the system 100 and communicatively coupled to the system 100. The data contained within such an external database may be periodically updated. For example, new data may be added into the database (not shown in
In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 200 by the processor(s) or one or more hardware processors 104. The steps of the method 200 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in
Referring to the steps of the method 200, at step 202 of the method 200, the one or more hardware processors 104 are configured by the instructions to receive the plurality of input parameters. One of the necessary input parameters is the set of target hardware metric constraints defining an accuracy, a latency, a runtime memory usage, a size of the optimized DNN model (also referred to as a Tiny model or TinyML model), and the like, in accordance with the target device. The model generated is to be customized for the target device. Further, based on availability, the system 100 can optionally also receive at least one of i) a training dataset, and ii) a pretrained DNN model trained for a target application. It can be understood that the pretrained DNN model exceeds the target hardware metric constraints and hence needs the optimization to be applied before deployment on the target device.
At step 204 of the method 200, the one or more hardware processors 104 are configured by the instructions to select one of i) the Fast-Neural Architecture Search (F-NAS) based optimization technique and ii) the non-F-NAS based optimization technique for generating the optimized DNN model, customized for deployment on the edge device to run the target application, wherein the selection is based on availability of the training dataset, availability of the pretrained DNN model, and the target hardware metric constraints. As depicted in
Selection of the appropriate technique is implemented as discussed in conjunction with the optimization technique selection module's rule engine depicted in
At step 206 of the method 200, the one or more hardware processors 104 are configured by the instructions to generate the optimized DNN model. The F-NAS based optimization technique comprises: i) the LTH+F-NAS if the pretrained DNN model is available and the training dataset is available, wherein the pretrained DNN model is pruned using the LTH pipeline and passed to the F-NAS in the form of a student model to generate a search space; and ii) the F-NAS directly if the pretrained DNN model is unavailable and the training dataset is available. The F-NAS is built over a Reinforcement Learning (RL)-NAS technique using an RL-NAS reward comprising a sigmoid based multi-objective reward function with an exponential decay, generalized for maximization and minimization of Key Performance metrics (refer to equations 5 and 6). The Key Performance metrics of the RL reward constitute the target hardware constraint metrics. Accuracy is used alternately with SHD; other hardware metrics are size, latency, runtime memory, multiply-accumulate operations, power, and so on. The mathematical representation and explanation of the multi-objective reward function is explained later. The F-NAS approach comprises the following steps 206a through 206f, further elaborated in conjunction with
The F-NAS utilizes an early exit strategy by terminating search for the intermediate models upon reaching a Target Reward Value, which is a function of the input device constraints and target hardware metric constraints.
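Equations (5) and (6) referenced above are not reproduced in this text, so the Python sketch below shows only one plausible form of a sigmoid based multi-objective reward that supports both maximization (e.g., accuracy or SHD) and minimization (e.g., size, latency, runtime memory) of metrics; the function names, the exact functional form, and all numerical values are assumptions for illustration.

```python
import math

def metric_term(value, target, weight, scale, maximize=True):
    """One sigmoid-shaped term of a multi-objective reward (illustrative form only).

    For a metric to be maximized (e.g. accuracy or SHD), the term grows toward `weight`
    as the value exceeds its target; for a metric to be minimized (e.g. model size,
    latency, runtime memory), the deviation is negated so smaller values score higher.
    """
    deviation = (value - target) if maximize else (target - value)
    return weight / (1.0 + math.exp(-scale * deviation))   # sigmoid of the scaled deviation

def rl_nas_reward(metrics, targets, weights, scales, directions):
    """Combine the per-metric sigmoid terms into a single scalar reward in roughly [0, 1]."""
    total = sum(metric_term(metrics[m], targets[m], weights[m], scales[m], directions[m])
                for m in metrics)
    return total / sum(weights.values())

# Hypothetical example: SHD is maximized; model size and latency are minimized.
metrics    = {"shd": 0.82, "size_kb": 90.0, "latency_ms": 20.0}
targets    = {"shd": 0.80, "size_kb": 800.0, "latency_ms": 40.0}   # e.g. 80% of hard limits
weights    = {"shd": 2.0,  "size_kb": 1.0,   "latency_ms": 1.0}
scales     = {"shd": 10.0, "size_kb": 0.01,  "latency_ms": 0.1}
directions = {"shd": True, "size_kb": False, "latency_ms": False}
print(rl_nas_reward(metrics, targets, weights, scales, directions))
```

In the disclosed method, the weights and scaling factors of each metric are additionally adapted from statistics collected during the explore episodes, as described later.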
Once the optimized DNN model is generated, then at step 208 of the method 200, the one or more hardware processors 104 are configured by the instructions to deploy the optimized DNN model on the target edge device to run the target application.
The F-NAS enables reduction in architecture search time and computational overhead by implementing a technique to evaluate model performance without the requirement of training the models in each episode of RL based NAS. This involves the disclosed step of combining the RL-based Neural Architecture Search technique with a Scaled Hamming Distance based metric, to reduce search time by at least 95% (in general the acceleration can be 20× and higher). Instead of training the newly sampled model in each episode, the disclosed approach uses the Hamming Distance based metric in place of the metric value that would be obtained after training. This Hamming Distance based metric is scaled between bounds 0 and 1, so that it is convenient to use in the NAS reward function.
The F-NAS process flow in conjunction with
Equation 1 refers to a matrix of size N×N, where N is the number of data samples chosen from the training set (a subset of the training dataset). Each row in the matrix KHl corresponds to the ith sample from the N samples, and each column in that row indicates how similar the ith sample is to the jth sample. Here, dH is the Hamming distance between the activation value vectors of the ith and jth samples for hidden layer "l", and NA is the number of neurons in the activation layer of hidden layer "l". The higher the value (NA−dH), the higher the activation similarity. Equation 2 indicates the summation over layers of the layer-wise KHl (N×N) matrices.
Equation 3 depicts normalizing the K matrix in equation 2, such that all values are between 0 & 1.
For the N×N matrix above, three row vectors j, r, and r2 are initialized: j is a row vector of all ones; r contains the values from 1 to N; and r2 is the element-wise square of r, containing the square of each value from 1 to N.
Thus, r=(1, 2, . . . , N) and r2=(1², 2², . . . , N²). Then:
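The closed-form expression that follows is not reproduced in this text. As a minimal Python sketch of the parts described above, the per-layer Hamming-distance kernels, their layer-wise sum, and the 0-to-1 normalization can be computed as below; the final scalar reduction (derived in the disclosure via the j, r, and r2 vectors) is replaced here by a simple mean purely as an illustrative stand-in.

```python
import numpy as np

def layer_kernel(binary_codes: np.ndarray) -> np.ndarray:
    """KHl for one hidden layer.

    binary_codes: (N, NA) array of 0/1 activation indicators for N data samples and
    NA neurons in the layer. Entry (i, j) is NA minus the Hamming distance between
    sample i's and sample j's activation codes, so higher means more similar.
    """
    n_a = binary_codes.shape[1]
    codes = binary_codes.astype(np.int32)
    # Hamming distance between rows i and j, computed for all pairs at once.
    d_h = (codes[:, None, :] != codes[None, :, :]).sum(axis=-1)
    return n_a - d_h

def scaled_hamming_distance(per_layer_codes) -> float:
    """Sum the per-layer kernels, normalize to [0, 1], and reduce to a scalar.

    The mean used for the scalar reduction is an assumption for illustration only.
    """
    k = sum(layer_kernel(codes) for codes in per_layer_codes)       # Equation-2 style sum
    k_norm = (k - k.min()) / (k.max() - k.min() + 1e-12)            # Equation-3 style scaling
    return float(k_norm.mean())

# Example with random binary activation codes for two hidden layers and N = 8 samples.
rng = np.random.default_rng(0)
codes_layer1 = rng.integers(0, 2, size=(8, 32))
codes_layer2 = rng.integers(0, 2, size=(8, 16))
print(scaled_hamming_distance([codes_layer1, codes_layer2]))
```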
Implementation and Results: The system 100 disclosed herein has been implemented in a lossless image compression application, where the described accelerated architecture search technique was used to generate tiny, fully connected neural networks for image compression. The optimization of performance metrics and the behavior of the reward function are described in Table 1 below.
Thus, the method and system disclosed herein provides:
Faster-NAS (F-NAS) due to a new model performance evaluation metric: The F-NAS disclosed herein enables reduction in the model architecture search time and computational overhead by implementing a novel technique to evaluate model performance, eliminating the requirement to train the models in each episode of RL based NAS. The system combines the RL-based Neural Architecture Search technique with a scaled Hamming Distance based metric to reduce search time by at least 95% (in general the acceleration can be 20× and higher). Instead of training the newly sampled model in each episode, the Hamming Distance based metric replaces the metric value that would be obtained after training. This Hamming Distance based metric is scaled between bounds 0 and 1, so that it is convenient to use in the NAS reward function.
Sigmoid based multi-objective reward function with adaptively set reward hyperparameters (weight and scaling factors of each metric in the reward function): This RL-NAS reward is a revised version of the exponential decay based multi-objective expression from the applicant's earlier filed Indian Patent Application number 202221022177. It is further generalized for maximization and minimization of Key Performance metrics. As the RL based NAS algorithm performs random exploration in the early episodes and exploitation in the remaining episodes, the random exploration episodes are referred to as explore episodes. The statistics collected for each metric term of the reward function during the explore episodes are then used to adaptively change the weight and scaling factors of each metric in the reward.
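The exact adaptation rule is not reproduced here; as one hypothetical illustration of the mechanism, the explore-episode statistics could be used to set per-metric scaling factors by normalizing each metric's deviation by its observed spread.

```python
import statistics

def adapt_reward_hyperparameters(explore_history):
    """Derive per-metric weight and scaling factors from explore-episode statistics.

    explore_history: dict mapping metric name -> values observed during the random
    exploration episodes. The rule below (scale = 1/std, equal weights) is a
    hypothetical choice shown only to illustrate the adaptive mechanism.
    """
    weights, scales = {}, {}
    for name, values in explore_history.items():
        spread = statistics.pstdev(values) or 1.0   # guard against zero spread
        scales[name] = 1.0 / spread                 # flat metrics get sharper sigmoids
        weights[name] = 1.0
    return weights, scales

# Example: statistics collected over a few explore episodes (hypothetical values).
history = {"shd": [0.41, 0.55, 0.62, 0.48], "size_kb": [350.0, 610.0, 220.0, 480.0]}
print(adapt_reward_hyperparameters(history))
```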
Generalized NAS framework: A generalized case, where both the SHD-based reward and the accuracy-based reward can be used in the exploration-exploitation episodes as per requirement. Due to this generalization, the system offers three ways to run NAS. First, the accuracy-based reward is used in both exploration and exploitation episodes. Second, the F-NAS, where the SHD-based reward is used in both exploration and exploitation episodes. Third, a combination of the SHD-based reward in exploration episodes and the accuracy-based reward in exploitation episodes is used. The ratio of exploration to exploitation episodes can also be tuned by the user of the system.
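The three modes and the tunable explore/exploit split could be expressed as in the following sketch; the function and mode names are illustrative, not part of the disclosure.

```python
def reward_type_for_episode(episode, total_episodes, shd_ratio, mode="hybrid"):
    """Pick the reward signal for a given episode.

    mode:
      "accuracy" - accuracy-based reward in both exploration and exploitation
      "shd"      - SHD-based reward in both phases (the F-NAS setting)
      "hybrid"   - SHD-based reward for the first `shd_ratio` fraction of episodes
                   (exploration), accuracy-based reward afterwards (exploitation)
    """
    if mode == "accuracy":
        return "accuracy"
    if mode == "shd":
        return "shd"
    return "shd" if episode < int(shd_ratio * total_episodes) else "accuracy"

# Example: 100 episodes, 70% SHD-based exploration, 30% accuracy-based exploitation.
print([reward_type_for_episode(e, 100, 0.7) for e in (0, 69, 70, 99)])
```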
Early exit search: The early-exit strategy is based on the adaptive reward function. The target hardware metric constraints (values) input by the user are used to scale the reward in an adaptive manner. This involves creating an intermediate target at x % of the user-specified hard target constraint (for example, 80% of the available space on a given hardware platform) and scaling the reward to have a value of 1 at that target, which serves as the early exit criterion for the NAS algorithm.
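A minimal sketch of this criterion, assuming intermediate targets at a fraction of the hard constraints and a reward already scaled so that reaching the target yields a value of 1; the names and the 80% fraction are illustrative.

```python
def intermediate_targets(hard_constraints, fraction=0.8):
    """Set intermediate targets at a fraction (e.g. 80%) of the user's hard constraints."""
    return {name: fraction * value for name, value in hard_constraints.items()}

def should_exit_early(reward, target_reward=1.0):
    """Terminate the NAS search once the (scaled) reward reaches the target value."""
    return reward >= target_reward

# Example: 1 MB of available memory and a 50 ms latency budget on the target device.
hard = {"size_kb": 1024.0, "latency_ms": 50.0}
print(intermediate_targets(hard))    # {'size_kb': 819.2, 'latency_ms': 40.0}
print(should_exit_early(1.02))       # True -> stop the search loop
```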
NAS flow enhanced with AutoML (Hyperparameter Optimization): The addition of AutoML (Hyperparameter Optimization or HPO) enables optimization of the training parameters of the NAS-generated models (intermediate models) and minimizes the effort required from the developer/domain expert/data scientist/ML engineer. For the accuracy metric based reward function and the SHD metric based reward function mentioned in the generalized NAS framework, once the NAS has generated the final model with the highest reward, AutoML is run on the best NAS-generated model to train it and extract the best performance from it. For the reward function combining SHD and accuracy, AutoML (HPO) can be run on the best model found in the explore episodes, and the best-found hyperparameters are then used for the remaining exploitation episodes.
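As one possible realization of the HPO step (the disclosure does not name a specific library), an Optuna-style search over the training hyperparameters of the best NAS-generated model could look like the sketch below; `train_and_evaluate` is a placeholder that would normally train the model and return its validation accuracy.

```python
import optuna

def train_and_evaluate(learning_rate, batch_size, epochs):
    """Placeholder for training the best NAS-generated model and returning its
    validation accuracy. Replaced here by a synthetic score so the sketch runs."""
    return 1.0 / (1.0 + abs(learning_rate - 0.01)) - 0.001 * abs(batch_size - 32)

def objective(trial):
    # Search space for the training hyperparameters of the NAS-generated model.
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    epochs = trial.suggest_int("epochs", 5, 50)
    return train_and_evaluate(lr, batch_size, epochs)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```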
Use case example depicting the significance of the system 100 (Edge Wizard) in time efficient and computation efficient optimized model generation: One example where the Edge Wizard (system 100) has been used to generate smaller, faster models with the same accuracy performance, and with less time and effort in model design, is the use-case of 'Unobtrusive vision-based Assembly line Quality Inspection' for a car manufacturing company. The problem addressed here is an image classification problem in a controlled environment, where multiple models might be required with changes in design, and low inference latency is necessary to reduce loss. Following a conventional approach, building a suitable model by a DL expert took nearly 3-5 weeks, during which a suitable small model was designed and optimized manually by the domain experts with the help of available tools. The target device constraint was a Raspberry Pi (R-pi)™ board with an available memory of 1 MB. A model with 100% accuracy was designed, with a 655 KB model size and 59 milliseconds (ms) inference latency.
The same task was passed to the Edge Wizard with the dataset as input and the target device constraint as the R-pi board with an available memory of 1 MB. With the Edge Wizard, there is no need for a new setup for a new optimization approach. Since the input dataset and device constraints were available, the Edge Wizard, providing a ready setup for multiple optimization approaches, recommended the NAS path. The model optimization took at most 2 days, with 100% accuracy, a 90 KB model size, and 20.35 milliseconds (ms) inference latency. This clearly shows the utility of the Edge Wizard in real-life applications. It can be noted that 2 days was the time taken for NAS; when F-NAS is used, this time is reduced to a few hours.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202321067480 | Oct 2023 | IN | national |