The present disclosure relates to a machine learning technique.
In recent years, a technique called Neural Architecture Search (NAS) has been attracting attention in the field of machine learning (refer to “A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions”, P. REN et al. (ACM Comput. Surv., Vol. 37, No. 4, Article 111)). NAS is a technique of searching for and determining an architecture (including the type of calculation to be performed in each layer, and connection states between layers) in a hierarchical network in order to obtain higher performance. In NAS, an optimum architecture is searched for, and simultaneously a weight coefficient and other parameters corresponding to the architecture found in the search are learned.
Architecture search methods using NAS include the following. A method using an evolutionary algorithm (EA) is discussed in “Large-scale evolution of image classifiers”, E. Real et al. (ICML 2017). In the EA method, an evolutionary algorithm is used to search for an optimum architecture. More specifically, a group of candidate architectures is prepared, and evolutionary operations such as mutation and selection are applied to this group, whereby a higher-performance architecture is eventually determined.
A method using reinforcement learning (RL) is discussed in “Neural Architecture Search with Reinforcement Learning”, B. Zoph et al. (ICLR 2017). In the RL method, an architecture is generated using a controller recurrent neural network (RNN). Subsequently, using accuracy (a correct answer rate) in the generated architecture as a reward, the controller RNN is updated by a policy gradient method to perform learning.
A method using gradient descent (GD) is discussed in each of “DARTS: Differentiable Architecture Search”, H. Liu et al. (ICLR 2019), and “SNAS: Stochastic Neural Architecture Search”, S. Xie et al. (ICLR 2019). In the GD method, a space to be searched for an architecture is expressed by a directed acyclic graph (DAG), a maximum graph expression is used as a parent graph, and an optimum architecture is searched for in the range of a child graph to be a subset thereof.
In the GD method, unlike in the RL method described above, the calculations to be executed in the respective layers of an architecture are not selected discretely; instead, a continuous expression in which the calculations are mixed is used. Thus, calculations (e.g., a convolution calculation, a pooling calculation, and a zero calculation) that can be selected in each layer are added together and expressed using a softmax function (refer to “DARTS: Differentiable Architecture Search”, H. Liu et al. (ICLR 2019)) or concrete distribution (refer to “SNAS: Stochastic Neural Architecture Search”, S. Xie et al. (ICLR 2019)). Subsequently, the architecture expression and the weight coefficient are optimized. As a loss function for optimizing the architecture expression, a validation loss (refer to “DARTS: Differentiable Architecture Search”, H. Liu et al. (ICLR 2019)) or a generic loss (refer to “SNAS: Stochastic Neural Architecture Search”, S. Xie et al. (ICLR 2019)) is used.
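As a rough illustration of this continuous relaxation, the following is a minimal sketch, in plain Python, of mixing candidate calculations with softmax weights. The stand-in candidate calculations and their definitions are assumptions for illustration, not the calculations used in the cited papers.

```python
import math

def softmax(alphas):
    # Numerically stable softmax over the architecture weights.
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate calculations on a scalar input x (assumptions).
candidate_ops = [
    lambda x: 3.0 * x,      # stand-in for a convolution calculation
    lambda x: max(x, 0.0),  # stand-in for a pooling-like calculation
    lambda x: 0.0,          # the zero calculation (disconnection)
]

def mixed_op(x, alphas):
    # Continuous expression: softmax-weighted sum of all candidates.
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))

print(mixed_op(2.0, [0.0, 0.0, 0.0]))  # equal weights mix 6.0, 2.0, and 0.0
```

Driving one component of the weight vector large (e.g., the zero calculation's) makes the mixture approach a discrete choice, which is how the continuous search space can still converge toward a single calculation per connection.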
In the conventional NAS methods described above, the performance of a network model in an architecture search process is evaluated by monitoring a temporal change in accuracy (correct answer rate). However, even if the temporal change in accuracy (correct answer rate) is monitored, a change in network connection state cannot be grasped. In addition, even if a temporal change in an individual network connection state is monitored, a change in the connection state of the entire network model cannot be grasped. Thus, the performance of the network model in the architecture search process cannot be precisely evaluated, which makes it difficult to search for an architecture efficiently.
The present disclosure is directed to improving performance evaluation in a process of searching for an architecture of a network model.
According to an aspect of the present disclosure, an information processing apparatus configured to optimize an architecture of a network model includes at least one memory storing instructions, and at least one processor that, upon execution of the instructions, is configured to perform a search for the architecture based on data for learning, evaluate a topology of the network model that corresponds to the architecture obtained in a process of the search, and perform control to output a change in an evaluation result of the topology as the search progresses.
Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Exemplary embodiments of the present disclosure will be described below with reference to the attached drawings. Configurations described in the following exemplary embodiments are merely examples, and the present disclosure is not limited to the illustrated configurations.
In a first exemplary embodiment, a network model is optimized using a technique of Neural Architecture Search (NAS).
The network model is assumed to be a hierarchical network such as a neural network. In NAS, an architecture (including the type of calculation to be performed in each layer, and connection states between layers) of a hierarchical network is searched for and determined in order to obtain higher performance. The following description will be given assuming a case where a unit structure called CELL (a micro-architecture) of an architecture (a macro-architecture) of the entire network is searched for.
The input device 113 is a human interface device or the like, and inputs user operation information, which indicates an operation performed by a user, to the information processing apparatus 100. The display device 114 is a display or the like, and displays a result of performing processing according to the present exemplary embodiment under control of the control device 111. The communication I/F 115 performs a communication connection with an external apparatus by wire or wirelessly, and exchanges data with the external apparatus under the control of the control device 111.
While
The operation acceptance unit 301 accepts the user operation information via the input device 113. The architecture search unit 305 performs various types of processing, such as continuation and stoppage of an architecture search, based on the user operation information accepted by the operation acceptance unit 301. The user performs various operations while viewing a result of monitoring by the search process monitoring unit 307, which is displayed on the display device 114.
The display control unit 302 performs processing for controlling a screen to be displayed on the display device 114. More specifically, the display control unit 302 controls the display device 114 to display a result of monitoring by the search process monitoring unit 307 or a graphical user interface (GUI) to be used for operations by the user.
The data storage unit 303 stores training data and evaluation data.
The architecture parameter storage unit 304 stores information about an architecture obtained as a result of a search by the architecture search unit 305.
The architecture search unit 305 performs a NAS-based search using the training data (the data for learning) held in the data storage unit 303, and stores in the architecture parameter storage unit 304 information about an architecture finally obtained as a result of the search. Using the evaluation data held in the data storage unit 303, the architecture search unit 305 also calculates validation accuracy (a correct answer rate on the evaluation data, hereinafter referred to as “val_acc”) of a network obtained in the search process.
The topology evaluation unit 306 acquires from the architecture search unit 305 information about the network obtained in the search process, and evaluates a network topology based on the acquired information.
The search process monitoring unit 307 acquires from the architecture search unit 305 information about the correct answer rate of the network. The search process monitoring unit 307 also acquires from the topology evaluation unit 306 information about the evaluation of the network topology. The search process monitoring unit 307 monitors a temporal change in these evaluation results, and outputs a result of the monitoring to the display control unit 302.
Next, overall processing for an architecture search by the information processing apparatus 100 according to the present exemplary embodiment will be described.
In the present exemplary embodiment, steps S101 to S104 in
In step S101, the architecture search unit 305 performs a search for an architecture using NAS. More specifically, the architecture search unit 305 reads appropriate data for learning from the data storage unit 303, and uses the read data to search for an architecture and perform learning of a weight coefficient held by the network corresponding to the architecture found in the search. The architecture search is performed by learning of architecture-related parameters.
In this step, the learning is performed using the data for learning, including input data and training data. The input data is input to a target network, and an error of an output value obtained as a result of a feedforward calculation is back-propagated in the network, so that the architecture-related parameters and the weight coefficient of the network are updated. The training data is used to calculate the error of the output value described above. The training data represents desired output (a label value or distribution thereof) corresponding to the input data.
In step S201, the architecture search unit 305 performs a search for an optimum architecture expression. The architecture expression is such that, in a case where a search target network is expressed by connection states between a plurality of contact points (hereinafter referred to as “nodes”) and calculations between the nodes, all calculations possible on the nodes are added together along with being weighted. The nodes are an example of elements of the network. The following description will be given of a case, as an example, where a search for a CELL structure including four nodes is performed assuming four calculations O (o1, o2, o3, and 0: zero calculation) and using a gradient descent (GD) method (refer to “DARTS: Differentiable Architecture Search”, H. Liu et al. (ICLR 2019) and “SNAS: Stochastic Neural Architecture Search”, S. Xie et al. (ICLR 2019)). The zero calculation influences the connection states between the nodes and, in a case where its value is 1, expresses a state where the two nodes are disconnected.
where i and j each represent a node number.
Instead of being expressed in a discrete manner as one specific calculation, the calculation o(i,j) between the nodes i and j in the formula (1) can be expressed in a continuous manner in which the calculations are mixed, by adding the calculations o1, o2, and o3 together using, for example, a softmax function, as indicated by the following formula (2).
In the formula (2), α(i,j) is a vector expressing weighting for each calculation, and αo(i,j) is a component thereof. The calculation o(i,j) between the nodes i and j is expressed by a set α(i,j) as indicated by the following formula (3).
As described above, the continuous expression is adopted, whereby a space searched with respect to the calculations can be made continuous, and a gradient method can be applied at the time of a search. For simplicity of description, an architecture-related parameter set represented by the architecture expression matrix is expressed by α. A weighting parameter set held by the network is expressed by w. A loss value (a training loss) calculated using the training data is expressed by Ltrain. A loss value (a validation loss) calculated using the evaluation data is expressed by Lval. The loss values Ltrain and Lval are determined by the parameter sets α and w. In the architecture search, a pair of the parameter sets α and w minimizing the loss values Ltrain and Lval is searched for, as expressed by the following formula (4).
The description will continue referring back to
In step S201, the architecture search unit 305 performs learning using the appropriate data for learning read from the data storage unit 303, for the network formed by the architecture expression matrix obtained in the last search (step S101). More specifically, using the following formula (5), a gradient for the loss value Lval is calculated, and the architecture-related parameter set α is updated.
In step S202, the architecture search unit 305 performs learning using the appropriate data for learning read from the data storage unit 303, for the network having the architecture found in the search in step S101. More specifically, using the architecture-related parameter set α updated in step S201 and the following formula (6), a gradient for the loss value Ltrain is calculated, and the weighting parameter set w of the network is updated.
As described above, in step S101, the architecture search unit 305 performs learning of the architecture-related parameters and learning of the weight coefficient held by the network. Repeating steps S101 to S104 updates the architecture expression matrix. Also in step S101, the architecture search unit 305 outputs the val_acc of the network calculated using the evaluation data, to the search process monitoring unit 307. The processing then proceeds to step S102.
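The alternating updates in steps S201 and S202 can be sketched as follows. This is a minimal toy sketch assuming scalar parameter sets and quadratic stand-in losses for Lval and Ltrain; the numerical-gradient helper and the loss definitions are illustrative assumptions, not the formulas (5) and (6) themselves.

```python
def grad(f, params, eps=1e-6):
    # Forward-difference numerical gradient of f with respect to params.
    g = []
    for i in range(len(params)):
        p = list(params)
        p[i] += eps
        g.append((f(p) - f(params)) / eps)
    return g

def step(params, g, lr):
    # One gradient-descent update.
    return [p - lr * gi for p, gi in zip(params, g)]

# Toy stand-ins for the validation loss Lval(alpha, w) and
# the training loss Ltrain(alpha, w) (assumptions for illustration).
def L_val(alpha, w):
    return (alpha[0] - 1.0) ** 2 + 0.1 * w[0] ** 2

def L_train(alpha, w):
    return (w[0] - alpha[0]) ** 2

alpha, w = [0.0], [0.0]
for _ in range(200):
    # Step S201: update the architecture parameters against the validation loss.
    alpha = step(alpha, grad(lambda a: L_val(a, w), alpha), 0.1)
    # Step S202: update the network weights against the training loss.
    w = step(w, grad(lambda ww: L_train(alpha, ww), w), 0.1)

print(round(alpha[0], 2), round(w[0], 2))  # both settle near 1.0
```

The alternation mirrors the bilevel structure of formula (4): the architecture parameters chase the validation objective while the weights chase the training objective under the current architecture.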
The description will continue referring back to
In step S102, the topology evaluation unit 306 acquires information about the connection state of the network from the architecture search unit 305, and evaluates the network topology.
A frame 1001 in the architecture expression matrix illustrated in the upper part of
In the present exemplary embodiment, to indicate each of the connection states in a simple manner, the value of the zero calculation is expressed as 0 or 1, but actually, a certain threshold (e.g., 0.8) is set, and a value more than or equal to the threshold is determined as 1 (a disconnected state). This also applies to the other architecture expression matrices illustrated in
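The thresholding described above can be sketched as follows, assuming the zero-calculation weights between node pairs are held in a small matrix; the matrix values are illustrative, and the threshold of 0.8 follows the example in the text.

```python
THRESHOLD = 0.8  # example threshold from the text

def binarize_zero_weights(zero_weights):
    # zero_weights[i][j]: weight of the zero calculation between nodes i and j.
    # A value at or above the threshold is determined as 1 (disconnected).
    return [[1 if wij >= THRESHOLD else 0 for wij in row] for row in zero_weights]

zero_weights = [
    [0.0, 0.92, 0.15],
    [0.92, 0.0, 0.81],
    [0.15, 0.81, 0.0],
]
print(binarize_zero_weights(zero_weights))
```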
In step S103, each time steps S101 to S104 in
In step S104, the display control unit 302 controls the display device 114 to display the monitoring result stored in step S103. The monitoring result displayed on the display device 114 is updated to the latest state as the architecture search process progresses. The display content of the display device 114 is presented to the user. The user can perform various determinations, such as continuation and stoppage of the architecture search, while viewing the monitoring result. The architecture search unit 305 performs various types of processing, such as continuation and stoppage of the architecture search, based on the user operation information accepted by the operation acceptance unit 301. The output destination of the monitoring result is not limited to the display device 114. The control device 111 can control the storage device 112 to store the val_acc and the topological invariant, which are acquired in the search process, in association with the epoch number. The transition of the val_acc and the topological invariant in the search process can be thereby analyzed based on the data read out from the storage device 112.
In the architecture search process, the user can refer to the information displayed in step S104 to evaluate the performance of the network based on both the val_acc and the topological invariant. Thus, the accuracy of evaluating the performance of the network increases. In addition, in a case where the topological invariant settles near a fixed value, the search for the network topology can be considered to have settled, and the architecture search unit 305 can continue the learning of the weight coefficient after fixing the architecture-related parameters. This makes it possible to reduce the search space, thereby improving the efficiency of the search. In a case where the val_acc and the topological invariant each settle near a fixed value, the search can be regarded as having settled on a stable solution, and the architecture search unit 305 can stop the learning. Whether the topological invariant has settled near the fixed value can be determined by the search process monitoring unit 307 based on the amount of change in the topological invariant corresponding to a predetermined epoch number, or can be determined by the operation acceptance unit 301 based on an instruction from the user.
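One possible way for the search process monitoring unit 307 to decide that the topological invariant has settled near a fixed value is sketched below; the window length and tolerance are assumptions for illustration, not values from the disclosure.

```python
def has_settled(history, k=5, tol=0):
    # history: the topological invariant (e.g., number of holes) per epoch.
    # Settled if its total variation over the last k epochs is within tol.
    if len(history) < k:
        return False
    window = history[-k:]
    return max(window) - min(window) <= tol

holes_per_epoch = [4, 3, 3, 2, 2, 2, 2, 2]
print(has_settled(holes_per_epoch))  # the last five values are all 2
```

An analogous check on the val_acc history would support the combined stopping criterion described above, where both quantities settling indicates a stable solution.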
In step S105, the architecture search unit 305 determines whether to end the architecture search. In the present exemplary embodiment, the architecture search unit 305 determines whether an instruction to stop the architecture search is received from the operation acceptance unit 301. In this determination, a determination as to whether the epoch number has reached a predetermined number can also be made, and a determination as to whether at least a certain level of performance has been achieved or whether the loss values each have become a fixed value or less using the evaluation data can also be made. In a case where the architecture search unit 305 determines to end the architecture search (YES in step S105), information about the finally obtained architecture is stored into the architecture parameter storage unit 304, and the series of steps in the flowchart ends. In a case where the architecture search unit 305 determines to continue the architecture search (NO in step S105), the information processing apparatus 100 repeats steps S101 to S104.
As described above, in the present exemplary embodiment, the state of the macro form of the network model can be appropriately evaluated by the evaluation of the network topology in the NAS-based architecture search process.
While the above description has been given assuming the case where the CELL structure with respect to the entire network is searched for, the present exemplary embodiment is also applicable to a case where the structure of the architecture of the entire network is searched for. While the above description has been given using the number of holes as a specific example of the topological invariant, the number of connected components or the number of hollow areas can also be used as the topological invariant.
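As a minimal sketch of computing one such topological invariant, the number of connected components can be obtained from a binary connection matrix by depth-first search; the matrix layout (1 meaning the two nodes are connected) is an assumption for illustration.

```python
def num_connected_components(connected):
    # connected[i][j] == 1 means nodes i and j are connected.
    n = len(connected)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        stack = [start]  # depth-first search over one component
        while stack:
            u = stack.pop()
            if seen[u]:
                continue
            seen[u] = True
            stack.extend(v for v in range(n) if connected[u][v] and not seen[v])
    return components

# Four nodes: 0-1 connected and 2-3 connected, giving two components.
connected = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(num_connected_components(connected))  # 2
```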
As a modification of the present exemplary embodiment, the search process monitoring unit 307 can instruct the architecture search unit 305 to perform the learning of the weight coefficient after fixing the architecture-related parameters, in a case where the search process monitoring unit 307 determines that the topological invariant has settled near a fixed value.
This makes it possible to reduce the search space at the timing when the network topology is determined to have settled in a specific state, thereby improving the efficiency of the search. In this case, the display control unit 302 may not necessarily display on the display device 114 the monitoring result stored in step S103.
In a second exemplary embodiment, a description will be given of a case where, during the architecture search according to the first exemplary embodiment, the network topology is intentionally changed by a user operation, and a response thereto is reflected in the monitoring result. Description of a part common to the first exemplary embodiment will be omitted, and a difference from the first exemplary embodiment will be mainly described.
Overall processing for an architecture search performed by an information processing apparatus according to the present exemplary embodiment will be described.
In step S301, the control device 111 changes the network topology at present, based on the user operation information accepted by the operation acceptance unit 301. More specifically, the user operates the GUI to change a value (0 or 1) in the part about the connection states between the nodes in the architecture expression matrix updated in step S101. The network topology is thereby changed. Alternatively, the user can operate the GUI to change the network topology, and the control device 111 can reflect the change in the architecture expression matrix. The search process monitoring unit 307 monitors the val_acc and the topological invariant acquired in the search before and after the network topology is changed. In other words, the monitoring result reflects a change in the val_acc and the topological invariant before and after the change of the network topology. The user can thereby verify the global stability of the search result at present while viewing the monitoring result.
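The topology change described above, toggling the 0/1 connection value between two nodes, can be sketched as follows; keeping the matrix symmetric is an assumption about how the connection part of the architecture expression matrix is stored.

```python
def toggle_connection(matrix, i, j):
    # Flip the 0/1 connection value between nodes i and j,
    # mirroring the change so the matrix stays symmetric.
    matrix[i][j] = 1 - matrix[i][j]
    matrix[j][i] = matrix[i][j]
    return matrix

m = [[0, 1], [1, 0]]
toggle_connection(m, 0, 1)
print(m)  # the 0-1 connection value is flipped
```

After such a change, re-running the search loop and watching how quickly the val_acc and the topological invariant return to their previous values gives the stability check described above.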
A method for verifying the global stability of the search result will be described with reference to
A shift from the position of the epoch number N on the graph in
As described above, according to the present exemplary embodiment, the global stability of the search result can be verified in the process of searching for the architecture of the network.
In the second exemplary embodiment described above, the method of verifying the global stability of the search result after the network topology is changed based on the operation by the user has been described. In a third exemplary embodiment, a method of verifying the global stability of the search result while the network topology is periodically changed by the information processing apparatus 100 will be described. Description of a part common to the second exemplary embodiment will be omitted, and a difference from the second exemplary embodiment will be mainly described. The following description will be given of a case where two holes are assumed to be a standard state of the topological invariant (the number of holes) and a fluctuation of plus or minus 1 is given thereto.
In the present exemplary embodiment, in step S301, the control device 111 changes the network connection state by giving a fluctuation to the topological invariant at present. The search process monitoring unit 307 monitors the val_acc and the topological invariant acquired in the state where the fluctuation is given. A global change in the form of the network can be thereby verified more directly than in giving a fluctuation to a value of the architecture expression matrix. In the present exemplary embodiment, the fluctuation of the topological invariant is reflected in the architecture expression matrix, but the fluctuation can be given to a value itself in the part about the connection state of the network in the architecture expression matrix.
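A minimal sketch of giving such a periodic fluctuation is shown below, assuming a standard value of two holes and a plus-or-minus-1 perturbation applied once per fixed number of epochs; the period and the alternating schedule are illustrative assumptions.

```python
def fluctuated_holes(epoch, standard=2, period=10):
    # Keep the standard hole count except on period boundaries, where the
    # count is perturbed by +1 or -1 with alternating sign.
    if epoch % period != 0:
        return standard
    return standard + (1 if (epoch // period) % 2 == 0 else -1)

print([fluctuated_holes(e) for e in range(0, 25, 5)])  # [3, 2, 1, 2, 3]
```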
As described above, according to the present exemplary embodiment, the global stability of the search result can be verified in the process of searching for the architecture of the network.
The exemplary embodiments of the present disclosure include a case where the functions according to the above-described exemplary embodiments are implemented by supplying a software program to a system or an apparatus directly or remotely, and causing a computer of the system or the apparatus to read out the supplied program and execute the read-out program. In this case, the supplied program is a computer readable program corresponding to the flowchart illustrated in each of the exemplary embodiments. Further, besides being implemented by the execution of the read-out program by the computer, the functions according to the above-described exemplary embodiments can be implemented in cooperation with an operating system (OS) or the like running on the computer, based on instructions of the program. In this case, the OS or the like performs part or all of actual processing, and the functions according to the above-described exemplary embodiments are implemented by the processing.
According to the exemplary embodiments of the present disclosure, performance evaluation can be appropriately performed in the process of searching for the architecture of the network model.
Embodiment(s) of the present disclosure can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2022-065813, filed Apr. 12, 2022, which is hereby incorporated by reference herein in its entirety.