The disclosure generally relates to the field of neural networks, and more particularly, to a method and a system for selecting an artificial intelligence (AI) model in neural architecture search (NAS).
With advancements in artificial intelligence technologies, the overall design of neural network structures is shifting from manual design to automatic machine design. Developers can be assisted in finding an ideal neural network architecture by using neural architecture search (NAS). NAS is a technique for automating the construction of neural networks (NNs), such as artificial neural networks (ANNs) and deep neural networks (DNNs), which are popular models in machine learning. NAS has been used to design networks that are on par with, or outperform, hand-designed architectures. Even though NAS is able to design tailored NNs effectively, there is still a major challenge with current NAS approaches: NAS is very compute-intensive and time-consuming, because current NAS approaches simply try out a plurality of NN models/designs and select one of them.
Some solutions have been proposed to address the above-discussed problems. One such solution is to use a proxy method to speed up the NAS. The proxy method is a reduced form of training in which the number of training epochs is reduced, a smaller NN model is used, and/or the training data is subsampled. One such proxy method is the "zero-cost" proxy method, which estimates the performance of an NN model without training it.
where the gradient of the loss with respect to the weight (w_i) is denoted by ∂L/∂w_i.
Each layer is assigned the same weight, but each layer may have a different receptive field. Assigning the same weight to every layer therefore results in an inappropriate score, and the best model may not be selected.
Provided are a method and a system for selecting an artificial intelligence (AI) model in neural architecture search (NAS).
This summary is provided to introduce a selection of concepts in a simplified format that are further described in the detailed description of the disclosure. This summary is not intended to identify key or essential concepts of the disclosure, nor is it intended for determining the scope of the disclosure.
According to an aspect of the disclosure, a method for selecting an artificial intelligence (AI) model in neural architecture search (NAS), may include: measuring a scale of receptive field for a plurality of neural network layers corresponding to each of a plurality of candidate AI models; determining a first score for a first group of neural network layers among the plurality of neural network layers based on the scale of the receptive field for the first group of neural network layers, the scale of the receptive field for each of the first group of neural network layers being smaller than a size of an object; determining a second score for a second group of neural network layers among the plurality of neural network layers based on the scale of the receptive field for the second group of neural network layers, the scale of the receptive field for each of the second group of neural network layers being greater than the size of the object; determining a third score for each of the plurality of candidate AI models as a function of the first score and the second score; and selecting, based on the third score, a candidate AI model among the plurality of candidate AI models for training and deployment, the candidate AI model having a highest third score among the third scores of the plurality of candidate AI models.
The second group of neural network layers may have a depth that is greater than a depth of the first group of neural network layers.
The determining the first score may include: determining a first weightage value associated with each layer of the first group of neural network layers based on the scale of the receptive field for the first group of neural network layers; and determining the first score based on the first weightage value.
The determining the second score may include: determining a second weightage value associated with each layer of the second group of neural network layers based on the scale of the receptive field for the second group of neural network layers; and determining the second score based on the second weightage value.
The first score corresponds to a sum of a first group of scores corresponding to the first group of neural network layers.
The second score corresponds to a sum of a second group of scores corresponding to the second group of neural network layers.
The measuring the scale of the receptive field for the plurality of neural network layers may include adding information including the scale of the receptive field of the candidate AI model in a feature map.
Each of the plurality of candidate AI models may be a zero-cost proxy model.
The plurality of candidate AI models may be generated by an NAS controller.
According to an aspect of the disclosure, a system for selecting an artificial intelligence (AI) model in neural architecture search (NAS), includes: a memory storing instructions; and at least one processor operatively connected to the memory and configured to execute the instructions to: measure a scale of receptive field for a plurality of neural network layers corresponding to each of a plurality of candidate AI models; determine a first score for a first group of neural network layers among the plurality of neural network layers based on the scale of the receptive field for the first group of neural network layers, the scale of the receptive field for each of the first group of neural network layers being smaller than a size of an object, determine a second score for a second group of neural network layers among the plurality of neural network layers based on the scale of the receptive field for the second group of neural network layers, the scale of the receptive field for each of the second group of neural network layers being greater than the size of the object, determine a third score for each of the plurality of candidate AI models as a function of the first score and the second score, and select, based on the third score, a candidate AI model among the plurality of candidate AI models for training and deployment, the candidate AI model having a highest third score among the third scores of the plurality of candidate AI models.
The second group of neural network layers may have a depth that is greater than a depth of the first group of neural network layers.
The at least one processor may be further configured to execute the instructions to determine the first score by: determining a first weightage value associated with each layer of the first group of neural network layers based on the scale of the receptive field for the first group of neural network layers; and determining the first score based on the first weightage value.
The at least one processor may be further configured to execute the instructions to determine the second score by: determining a second weightage value associated with each layer of the second group of neural network layers based on the scale of the receptive field for the second group of neural network layers; and determining the second score based on the second weightage value.
The at least one processor may be further configured to execute the instructions to determine the first score by performing a summing operation on a first group of scores corresponding to the first group of neural network layers.
The at least one processor may be further configured to execute the instructions to determine the second score by performing a summing operation on a second group of scores corresponding to the second group of neural network layers.
The at least one processor may be further configured to execute the instructions to measure the scale of the receptive field for the plurality of neural network layers by adding information including the scale of the receptive field of the candidate AI model in a feature map.
Each of the plurality of candidate AI models may be a zero-cost proxy model.
The at least one processor may include a NAS controller configured to generate the plurality of candidate AI models.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
Example embodiments of the disclosure are described below with reference to the drawings. It is to be understood that no limitation of the scope of the disclosure is thereby intended; such alterations and further modifications in the disclosed embodiments, and such further applications of the principles of the disclosure as illustrated therein, are contemplated as would normally occur to one skilled in the art to which the disclosure relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are explanatory of the disclosure and are not intended to be restrictive thereof. Reference throughout this specification to “an aspect”, “another aspect” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Thus, appearances of the phrase “in an embodiment”, “in another embodiment” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms “comprises”, “comprising”, “includes”, “including”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more systems or sub-systems or elements or structures or components preceded by “comprises . . . a” does not, without more constraints, preclude the existence of other systems or other sub-systems or other elements or other structures or other components or additional systems or additional sub-systems or additional elements or additional structures or additional components.
As shown in
The processor 401 can be a single processing unit or several units, all of which could include multiple computing units. The processor 401 may be implemented as one or more microprocessors, microcomputers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 401 is configured to fetch and execute computer-readable instructions and data stored in the memory 403.
The memory 403 may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
In an embodiment, the system 400 may be a part of an electronic device. In another embodiment, the system 400 may be coupled to the electronic device. It should be noted that the term “electronic device” refers to any electronic devices used by a user such as a mobile device, a desktop, a laptop, a personal digital assistant (PDA) or similar devices.
Referring to
In another embodiment, the processor 401 may measure the scale of the receptive field for each of the NN layers based on a feature map of that layer. For example, as shown in
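By way of illustration only, the scale of the receptive field for a stack of convolutional layers may be computed from each layer's kernel size and stride. The following is a minimal, non-limiting sketch, assuming the standard receptive-field recurrence for stacked convolutions (r_out = r_in + (k − 1) · j_in, where j is the cumulative stride of the feature map); the function name and inputs are illustrative, not part of the disclosure:

```python
def receptive_field_scales(kernel_sizes, strides):
    """Scale of the receptive field after each layer of a stacked
    convolutional network. Uses the standard recurrence:
      r_out = r_in + (k - 1) * j_in   (receptive-field size)
      j_out = j_in * s                (cumulative stride, or "jump")
    """
    r, j = 1, 1          # input pixel sees itself; unit jump
    scales = []
    for k, s in zip(kernel_sizes, strides):
        r += (k - 1) * j  # each layer widens the field by (k-1) jumps
        j *= s            # stride compounds across layers
        scales.append(r)
    return scales
```

For example, three 3×3 convolutions with strides 1, 2, 1 yield per-layer scales of 3, 5, and 9, illustrating how deeper layers perceive progressively larger windows of the input.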
Further, it should be noted that
Referring back to
At operation 305, the method 300 may comprise determining a second score for a second group of neural network layers among the plurality of neural network layers based on the measured receptive field for the second group of neural network layers. In an embodiment, the measured scale of the receptive field for each of the second group of neural network layers is greater than the size of the object. Operations 303 and 305 are further explained in reference to
In an embodiment, the second group of neural network layers (709) have a depth greater than a depth of the first group of neural network layers (701-707), as shown in
Further, in an embodiment, the first and the second weightage value may be determined using one of a plurality of weightage functions. In an embodiment, the plurality of weightage functions may include a Gaussian weightage function, a triangular weightage function, and a linear weightage function.
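The three weightage functions mentioned above may, purely as a non-limiting sketch, be parameterized by the layer's receptive-field scale and the object size; the parameter names (`sigma`, `width`, `max_scale`) and exact forms below are assumptions for illustration:

```python
import math

def gaussian_weightage(scale, object_size, sigma=2.0):
    """Weightage peaks when the layer's receptive-field scale
    matches the object size, and decays smoothly away from it."""
    return math.exp(-((scale - object_size) ** 2) / (2 * sigma ** 2))

def triangular_weightage(scale, object_size, width=4.0):
    """Weightage falls off linearly, reaching zero at `width`
    away from the object size."""
    return max(0.0, 1.0 - abs(scale - object_size) / width)

def linear_weightage(scale, max_scale):
    """Weightage grows linearly with the receptive-field scale."""
    return scale / max_scale
```

Any of these can supply the first weightage value α_i (or the second weightage value α_j) for a layer, depending on which emphasis on the receptive field is desired.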
wherein α_i is the first weightage value for each of the NN layers of the first group, and w_i is the corresponding weight of each of the NN layers of the first group.
Hence, the first score corresponds to a sum of a first group of scores corresponding to the first group of neural network layers.
Similarly, the second score may be calculated as:
wherein α_j is the second weightage value for each of the NN layers of the second group, and w_j is the corresponding weight of each of the NN layers of the second group.
Hence, the second score corresponds to a sum of a second group of scores corresponding to the second group of neural network layers.
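The two weighted sums above can be sketched, in non-limiting illustrative form, as a single helper applied once per group, where `weightages` are the α values and `layer_scores` are the per-layer zero-cost metrics (both hypothetical names):

```python
def group_score(weightages, layer_scores):
    """First (or second) score: the weightage-weighted sum of the
    per-layer zero-cost metrics over a group, i.e. S = sum_i alpha_i * s_i."""
    return sum(a * s for a, s in zip(weightages, layer_scores))
```

Calling it with the first group's weightages yields the first score, and with the second group's weightages the second score.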
Referring back to
In another embodiment, a value of a classifier may also be added to compute the third score.
Thereafter, at operation 309, the method 300 may comprise selecting, based on the calculated third score, a candidate AI model among the plurality of candidate AI models for training and deployment, wherein the selected candidate AI model has the highest third score among the calculated third scores of the plurality of candidate AI models. In particular, the third score is calculated for each of the candidate AI models, and the candidate AI model with the highest third score is selected for training and deployment on the electronic device.
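The selection steps described above can be sketched end to end as follows. This is a minimal, non-limiting illustration: it assumes a simple sum as the combining function for the third score (a classifier value may optionally be added, as noted above), and the candidate representation and function names are hypothetical:

```python
def third_score(scales, metrics, object_size):
    """Combine the first score (layers whose receptive field is smaller
    than the object) and the second score (layers whose receptive field
    is greater). A simple sum is assumed as the combining function."""
    first = sum(m for r, m in zip(scales, metrics) if r < object_size)
    second = sum(m for r, m in zip(scales, metrics) if r > object_size)
    return first + second

def select_candidate(candidates, object_size):
    """`candidates` maps a model name to (per-layer receptive-field
    scales, per-layer zero-cost metrics); the candidate AI model with
    the highest third score is selected."""
    return max(candidates, key=lambda n: third_score(*candidates[n], object_size))
```

For instance, with an object size of 6, a candidate whose layers have scales (3, 5, 9) gains both a first-group and a second-group contribution, and would be preferred over a candidate whose layers all fall on one side of the object size with a lower total.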
Hence, the present disclosure considers the perception window of each neuron of an NN layer and uses it in the formulation of a more interpretable and reliable zero-cost proxy. Further, the significance of the receptive field is mathematically transformed into weights on the individual metrics obtained from each layer.
Accordingly, the present disclosure provides the following technical advantages:
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.
While the disclosure has been particularly shown and described with reference to embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202241060811 | Oct 2022 | IN | national
20241060811 | May 2023 | IN | national |
This application is a Bypass Continuation Application of International Application No. PCT/KR2023/095066, filed on Oct. 25, 2023, which is based on and claims priority to Indian Patent Application No. 202241060811, filed on Oct. 25, 2022 in the Indian Intellectual Property Office, and to Indian Patent Application No. 202241060811 filed on May 29, 2023 in the Indian Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entireties.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/KR23/95066 | Oct 2023 | US |
Child | 18414068 | US |