Embodiments of the invention relate to neural networks; more specifically, to automatic searches for network spaces.
Recent architectural advances in deep convolutional neural networks consider several factors for network designs (e.g., types of convolutions, network depths, filter sizes, etc.), which are combined to form a network space. One can leverage such network spaces to design favorable networks or utilize them as the search spaces for Neural Architecture Search (NAS). In industry, efficiency considerations for architectures are also required for deploying products on various platforms, such as mobile, augmented reality (AR), and virtual reality (VR) devices.
Design spaces have lately been demonstrated to be a decisive factor in designing networks. Accordingly, several design principles are proposed to deliver promising networks. However, these design principles are based on human expertise and require extensive experiments for validation. In contrast to handcrafted designs, NAS automatically searches for favorable architectures within a predefined search space. The choice of the search space is a critical factor affecting the performance and efficiency of NAS approaches. It is common to reuse tailored search spaces developed in previous works. However, these approaches ignore the potential of exploring untailored spaces. On the other hand, defining a new, effective search space involves tremendous prior knowledge and/or manual effort. Hence, there is a need for automatic network space discovery.
In one embodiment, a method is provided for network space search. The method comprises the step of partitioning an expanded search space into a plurality of network spaces. Each network space includes multiple network architectures and is characterized by a first range of network depths and a second range of network widths. The method further comprises the step of evaluating performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function. The evaluated performance is indicated as a probability associated with each network space. The method further comprises the steps of identifying a subset of the network spaces that has highest probabilities, and selecting a target network space from the subset based on model complexity.
In another embodiment, a system is provided for network space search. The system includes one or more processors, and a memory that stores instructions which, when executed by the one or more processors, cause the system to partition an expanded search space into multiple network spaces. Each network space includes a plurality of network architectures and is characterized by a first range of network depths and a second range of network widths. The instructions, when executed by the one or more processors, further cause the system to evaluate performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function, wherein the evaluated performance is indicated as a probability associated with each network space; identify a subset of the network spaces that has highest probabilities; and select a target network space from the subset based on model complexity.
Other aspects and features will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description. It will be appreciated, however, by one skilled in the art, that the invention may be practiced without such specific details. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
A method and a system are provided for Network Space Search (NSS). The NSS method is performed automatically on an Expanded Search Space, which is a search space scalable with minimal assumptions in network designs. The NSS method automatically searches for Pareto-efficient network spaces in Expanded Search Space, instead of searching for a single architecture. The search for network spaces takes into account efficiency and computational costs. The NSS method is based upon differentiable approaches and incorporates multi-objectives into the search process to search for network spaces under given complexity constraints.
The network spaces output by the NSS method, named Elite Spaces, are Pareto-efficient spaces aligned with the Pareto front with respect to performance (e.g., error rates) and complexity (e.g., number of floating-point operations (FLOPs)). Moreover, Elite Spaces can further serve as NAS search spaces to improve NAS performance. Experimental results using the CIFAR-100 dataset show that NAS searches in Elite Spaces result in an average 2.3% lower error rate and complexity 3.7% closer to the target than the baseline (e.g., Expanded Search Space), with around 90% fewer samples required to find satisfactory networks. Finally, the NSS method can search for superior spaces from various search spaces with different complexity, showing its applicability to unexplored and untailored spaces. The NSS method automatically searches for favorable network spaces, reducing the human expertise involved in both designing networks and defining NAS search spaces.
Expanded Search Space 110 is a large-scale space with two main properties: automatability (i.e., minimal human expertise) and scalability (i.e., capability of scaling networks). Expanded Search Space 110 serves as a search space for NSS to search for network spaces.
Expanded Search Space is much more complex than conventional NAS search spaces in terms of the difficulty of selecting among candidates, since there are dmax possible block counts in network depth and wmax possible channel counts in network width. Moreover, Expanded Search Space can potentially be extended by replacing its building blocks with more sophisticated ones (e.g., complex bottleneck blocks). Thus, Expanded Search Space meets the goals of scalability in network designs and automatability with minimal human expertise.
After defining Expanded Search Space, the following question is addressed: how to search for network spaces given Expanded Search Space? To answer this, NSS is formulated as a differentiable problem of searching for an entire network space:

min_{A ∈ 𝔸} min_{w_A} ℒ(A, w_A) (1)

where the optimal network space A* ∈ 𝔸 is obtained from 𝔸 along with its weights w_{A*} to achieve the minimal loss ℒ(A*, w_{A*}). Here 𝔸 is a space without any prior knowledge imposed in network designs (e.g., Expanded Search Space). To reduce the computational cost, probability sampling is adopted and Objective (1) is rewritten to:

min_Θ E_{A∼P_Θ}[min_{w_A} ℒ(A, w_A)] (2)
where Θ contains parameters for sampling spaces A ∈ 𝔸. Although Objective (2), which is relaxed from Objective (1), can be used for optimization, the estimation of the expected loss for each space A is still lacking. To solve this, distributional sampling is adopted to optimize (2) for the inference of super networks. A super network is a network with dmax blocks in each stage and wmax channels in each block. More specifically, from a sampled space A ∈ 𝔸 in (2), architectures a ∈ A are sampled to evaluate the expected loss of A. Therefore, Objective (2) is further extended accordingly:

min_Θ E_{A∼P_Θ}[min_{w_A} E_{a∼P_θ}[ℒ(a, w_a)]] (3)

where P_θ is a uniform distribution and θ contains the parameters that determine the sampling probability P_θ of each architecture a. Objective (3) is to be optimized for network space search, and the evaluation of the expected loss of a sampled space is based on (3) as well.
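The inner expectation in Objective (3) can be estimated by Monte Carlo sampling. A minimal sketch of this idea follows, where `space` (a list of candidate architectures) and `loss_fn` (a callback evaluating one architecture under the super network) are hypothetical stand-ins for the actual machinery:

```python
import random

def expected_space_loss(space, loss_fn, num_samples=8, seed=0):
    """Estimate E_{a ~ P_theta}[L(a, w_a)] for a sampled space A by
    uniformly drawing architectures a from the space and averaging
    their losses. `space` and `loss_fn` are illustrative stand-ins."""
    rng = random.Random(seed)
    samples = [rng.choice(space) for _ in range(num_samples)]
    return sum(loss_fn(a) for a in samples) / num_samples
```

In practice the per-architecture loss would come from a forward pass of the shared-weight super network rather than a plain callback.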
Instead of regarding a network space A as a set of individual architectures, A can be represented with the components in Expanded Search Space. Recalling that Expanded Search Space is composed of searchable network depths di and widths wi, a network space A can therefore be viewed as a subset of all possible numbers of blocks and channels. More formally, the network space is expressed as A = {d_i^A ∈ d, w_i^A ∈ w}, i = 1, . . . , N, where d = {1, 2, . . . , dmax}, w = {1, 2, . . . , wmax}, and d_i^A and w_i^A respectively denote the sets of possible numbers of blocks and channels in A. After the searching process, d_i^A and w_i^A are retained to represent the discovered network space.
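This per-stage representation maps naturally onto a small data structure. The helper below is an illustrative sketch, assuming each stage of the space is given by a collection of allowed depths and widths:

```python
def make_space(depth_ranges, width_ranges):
    """Represent a network space A = {d_i^A, w_i^A}, i = 1..N, as
    per-stage sets of allowed block counts and channel counts.
    The function name and layout are illustrative, not canonical."""
    return [{"depths": set(d), "widths": set(w)}
            for d, w in zip(depth_ranges, width_ranges)]

# A 3-stage space: depths 1-4 per stage, widths 32 or 64 per stage.
space = make_space([range(1, 5)] * 3, [(32, 64)] * 3)
```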
The NSS method searches for network spaces that satisfy a multi-objective loss function for further use in designing networks or defining NAS search spaces. In this way, the searched spaces enable downstream tasks to reduce the effort of refining tradeoffs and concentrate on fine-grained objectives instead. In one embodiment, the NSS method discovers networks with satisfactory tradeoffs between accuracy and model complexity. The multi-objectives search incorporates model complexity in terms of FLOPs into Objective (1) to search for network spaces fulfilling the constraints. The FLOPs loss is defined as:
ℒ_FLOPs(A) = |FLOPs(A)/FLOPs_target − 1| (4)

where |·| denotes the absolute value and FLOPs_target is the FLOPs constraint to be satisfied. The multi-objective losses are combined by weighted summation, and therefore ℒ in (1) can be replaced with the following equation:

ℒ(A, w_A) = ℒ_task(A, w_A) + λ·ℒ_FLOPs(A) (5)

where ℒ_task is the ordinary task-specific loss in (1), which can be optimized with (3) in practice, and λ is the hyperparameter controlling the strength of the FLOPs constraint.
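Equations (4) and (5) translate directly into code. The sketch below assumes the FLOPs counts are already available as plain numbers; the function names and the default λ are illustrative:

```python
def flops_loss(flops, flops_target):
    # Eq. (4): relative deviation of the measured FLOPs from the target.
    return abs(flops / flops_target - 1.0)

def multi_objective_loss(task_loss, flops, flops_target, lam=0.1):
    # Eq. (5): weighted sum of the task-specific loss and the FLOPs
    # loss, with lam (lambda) controlling the constraint strength.
    return task_loss + lam * flops_loss(flops, flops_target)
```

Note that the FLOPs term penalizes both over- and under-shooting the target, which steers the search toward spaces near the desired complexity rather than simply toward smaller models.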
By optimizing (5), the NSS method produces the network spaces satisfying a multi-objective loss function. Elite Spaces are derived from the optimized probability distribution PΘ after the searching process. From PΘ, the n spaces having the highest probabilities are sampled. The one space that is closest to the FLOPs constraint is selected as Elite Space.
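The Elite Space selection step can be sketched as follows, where `flops_of` is a hypothetical callback returning a representative (e.g., average sampled) FLOPs count for a space:

```python
def select_elite_space(spaces, probs, flops_of, flops_target, n=5):
    """Keep the n spaces with the highest probabilities under the
    optimized distribution, then return the one whose FLOPs count is
    closest to the constraint. Names here are illustrative."""
    ranked = sorted(zip(spaces, probs), key=lambda sp: sp[1], reverse=True)
    top_n = [s for s, _ in ranked[:n]]
    return min(top_n, key=lambda s: abs(flops_of(s) - flops_target))
```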
To improve the efficiency of the NSS framework, weight sharing techniques can be adopted in two aspects: 1) masking techniques can be adopted to simulate various numbers of blocks and channels by sharing a portion of the super components, and 2) warmup techniques can be applied to both the block search and the channel search to ensure well-trained super networks.
As Expanded Search Space includes a wide range of possible network depths and widths, simply enumerating every candidate is memory-prohibitive, whether for the kernels with various channel sizes or for the stages with various block counts. A masking technique can be used to efficiently search for channel sizes and block depths. A single super kernel is constructed with the largest possible number of channels (i.e., wmax). A smaller channel size w≤wmax is simulated by retaining the first w channels and zeroing out the remaining ones. Similarly, a single deepest stage with the largest possible number of blocks (i.e., dmax) is constructed, and a shallower depth d≤dmax is simulated by taking the output of the dth block as the output of the corresponding stage. The masking technique achieves the lower bound of memory consumption and, more importantly, is differentiation-friendly.
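A minimal sketch of the two masking operations, using plain Python lists in place of a real deep learning framework's tensors:

```python
def masked_kernel(super_kernel, w):
    """Simulate a kernel of width w <= w_max by keeping the first w
    output-channel slices of the super kernel and zeroing the rest.
    `super_kernel` is modeled as a list of per-channel weight slices;
    this representation is illustrative only."""
    zero = [0.0] * len(super_kernel[0]) if super_kernel else []
    return [s if i < w else list(zero) for i, s in enumerate(super_kernel)]

def stage_output(block_outputs, d):
    """Simulate a stage of depth d <= d_max by taking the output of
    the d-th block as the output of the whole stage."""
    return block_outputs[d - 1]
```

Because both operations act on a single shared super kernel and a single deepest stage, memory usage stays at that of the largest candidate, and the masking is compatible with gradient-based optimization.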
To provide the maximum flexibility in network space search, a super network in Expanded Search Space is constructed to have dmax blocks in each stage and wmax channels in each convolutional kernel. Super network weights need to be sufficiently well-trained to ensure reliable performance estimation of each candidate network space. Therefore, several warmup techniques can be used to improve the quality of super network weights. For example, in the first 25% of epochs, only the network weights are updated and network space search is disabled since network weights cannot appropriately guide the searching process in the early period.
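The warmup gating described above reduces to a simple predicate; the 25% fraction follows the example in the text, and the function name is illustrative:

```python
def space_search_enabled(epoch, total_epochs, warmup_frac=0.25):
    """During the first warmup_frac of epochs, only the network
    weights are updated and the space-search parameters stay frozen;
    afterwards the space search is enabled."""
    return epoch >= int(total_epochs * warmup_frac)
```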
The following description provides a non-limiting example of an experimental setup for NSS. A super network in Expanded Search Space is constructed to have dmax=16 blocks in each stage and wmax=512 channels in each convolutional kernel of all 3 stages. Each network space in Expanded Search Space is defined as a continuous range of network depths and widths for simplicity. As an example, each network space includes 4 and 32 possible blocks and channels, respectively, and therefore Expanded Search Space results in (16/4)^3×(512/32)^3=2^18 possible network spaces. A searching process is performed on the 2^18 network spaces, with each network space assigned a probability according to a probability distribution. The probability assigned to each network space is updated by gradient descent. The top n network spaces having the highest probabilities are selected for further evaluation; e.g., n=5. In one embodiment, the network architectures in the n spaces are sampled. The network space having a FLOPs count closest to a predetermined FLOPs constraint is chosen as Elite Space.
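The count of candidate spaces follows directly from this grouping arithmetic; a quick check in Python, with the constants taken from the example above:

```python
# Each stage offers 16/4 = 4 depth groups and 512/32 = 16 width groups;
# with 3 stages the number of candidate network spaces is
# (16/4)^3 * (512/32)^3 = 4^3 * 16^3 = 2^18.
d_max, w_max, num_stages = 16, 512, 3
depth_group, width_group = 4, 32
num_spaces = ((d_max // depth_group) ** num_stages
              * (w_max // width_group) ** num_stages)
assert num_spaces == 2 ** 18  # 262,144 candidate network spaces
```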
The images in each of the CIFAR-10 and CIFAR-100 datasets are equally split into a training set and a validation set, which are used for training the super network and searching for network spaces, respectively. The batch size is set to 64. The searching process lasts for 50 epochs, with the first 15 epochs reserved for warmup. The temperature for Gumbel-Softmax is initialized to 5 and linearly annealed down to 0.001 throughout the searching process. Under these settings, the search cost for a single run of the NSS process is roughly 0.5 days; the subsequent NAS performed on Expanded Search Space requires 0.5 days, while NAS on Elite Spaces completes a searching process in merely several hours.
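The linear temperature schedule for Gumbel-Softmax can be sketched as follows; the function name and epoch-indexing convention are assumptions:

```python
def gumbel_temperature(epoch, total_epochs=50, t_start=5.0, t_end=0.001):
    """Linearly anneal the Gumbel-Softmax temperature from t_start at
    the first epoch down to t_end at the last epoch."""
    frac = epoch / max(total_epochs - 1, 1)
    return t_start + (t_end - t_start) * frac
```

A high initial temperature keeps the sampling close to uniform early in the search, while the near-zero final temperature makes the sampled distribution approach a discrete choice of space.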
The performance of Elite Spaces is evaluated by the performance of their comprised architectures. The NSS method consistently discovers promising network spaces across different FLOPs constraints on both the CIFAR-10 and CIFAR-100 datasets. Elite Spaces achieve satisfactory tradeoffs between error rates and meeting the FLOPs constraints, and are aligned with the Pareto front of Expanded Search Space. Since Elite Spaces discovered by the NSS method are guaranteed to consist of superior networks in various FLOPs regimes, they can be utilized for designing promising networks. More importantly, Elite Spaces are searched by NSS automatically, so the human effort involved in network designs is significantly reduced.
The system at step 420 evaluates the performance of the network spaces by sampling respective network architectures with respect to a multi-objective loss function. The evaluated performance is indicated as a probability associated with each network space. The system at step 430 identifies a subset of the network spaces that has the highest probabilities. The system at step 440 selects a target network space from the subset based on model complexity. In one embodiment, the target network space selected at step 440 is referred to as Elite network space.
The processing hardware 610 is coupled to a memory 620, which may include memory devices such as dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, and other non-transitory machine-readable storage media; e.g., volatile or non-volatile memory devices. To simplify the illustration, the memory 620 is represented as one block; however, it is understood that the memory 620 may represent a hierarchy of memory components such as cache memory, system memory, solid-state or magnetic storage devices, etc. The processing hardware 610 executes instructions stored in the memory 620 to perform operating system functionalities and run user applications. For example, the memory 620 may store NSS parameters 625, which may be used by method 400 in
In some embodiments, the memory 620 may store instructions which, when executed by the processing hardware 610, cause the processing hardware 610 to perform network space search operations according to method 400 in
The operations of the flow diagrams of
Various functional components, blocks, or modules have been described herein. As will be appreciated by persons skilled in the art, the functional blocks or modules may be implemented through circuits (either dedicated circuits or general-purpose circuits, which operate under the control of one or more processors and coded instructions), which will typically comprise transistors that are configured in such a way as to control the operation of the circuitry in accordance with the functions and operations described herein.
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.
This application claims the benefit of U.S. Provisional Application No. 63/235,221 filed on Aug. 20, 2021, the entirety of which is incorporated by reference herein.