The present disclosure relates to a neural network construction device, a neural network construction method, an image processing device, and an image processing method.
A learning model generation device described in Patent Literature 1, an example of a neural network construction device, replaces a layer constituting a learning model with a structure having a reduced operation amount, so that the accuracy of recognition or the like by the learning model is maintained even when the computational resources of the computer executing the learning model are relatively small.
Patent Literature 1: JP 2020-107042 A
Because the above-described learning model generation device reduces the operation amount of the learning model, the accuracy of recognition or the like originally requested for the learning model is satisfied. However, there is a problem that the speed, that is, the necessary time originally requested for the learning model, may not be satisfied.
An object of the present disclosure is to provide a neural network construction device capable of constructing a neural network model that satisfies both a necessary time and accuracy for inference, which are originally requested, even if resources for executing a learning model are relatively small.
In order to solve the above problem, a neural network construction device according to the present disclosure includes: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of specifying a plurality of operations allocatable to a plurality of layers constituting a neural network model on the basis of a construction condition for constructing the neural network model, generating a first operation combination by allocating a corresponding one of the plurality of operations to each of the plurality of layers, determining whether or not a total necessary time, which is inferred on the basis of the allocated operations and is a total of the necessary times of the respective operations, satisfies a predetermined requested necessary time, training the neural network model having the generated first operation combination, determining whether or not accuracy of inference by the trained neural network model satisfies predetermined requested inference accuracy, and, when it is determined that the total necessary time does not satisfy the requested necessary time or that the accuracy of the inference does not satisfy the requested inference accuracy, generating a second operation combination different from the first operation combination.
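For illustration only, the processes above can be read as a search loop over candidate operation combinations. The following Python sketch shows one possible form of that loop; the exhaustive enumeration, the function names (`construct_model`, `train_and_evaluate`), and the data structures are hypothetical and not part of the present disclosure.

```python
from itertools import product

def construct_model(operations, op_time, requested_time, requested_accuracy,
                    train_and_evaluate, num_layers=7):
    """Hypothetical search loop: generate an operation combination, check the
    inferred total necessary time, train, then check the inference accuracy."""
    for combination in product(operations, repeat=num_layers):
        # First determination: the total necessary time inferred from the
        # per-operation necessary times, before any training is performed.
        if sum(op_time[op] for op in combination) > requested_time:
            continue  # requested necessary time not satisfied; try another
        # Train the model having this combination, then make the second
        # determination on its inference accuracy.
        model, accuracy = train_and_evaluate(combination)
        if accuracy >= requested_accuracy:
            return combination, model  # both requests are satisfied
    return None  # the search space contains no satisfying combination
```

Here `train_and_evaluate` stands in for the training and accuracy-measurement steps described in the embodiments below.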
With the neural network construction device according to the present disclosure, it is possible to construct a neural network model that satisfies both a necessary time and accuracy for inference, which are originally requested, even if resources for executing a learning model are relatively small.
Embodiments of a neural network construction device and an image processing device according to the present disclosure will be described.
Neural network construction devices NNK of first and second embodiments, each of which constructs a neural network model NNM, will be described with reference to the drawings.
The neural network construction device NNK of the first embodiment will be described.
The neural network construction device NNK includes an interface unit IF, an acquisition unit SY, a specification unit TO, a generation unit SE, a training unit KU, a first determination unit HA1, a second determination unit HA2, and a storage unit KI.
The neural network model NNM to be constructed includes an input layer Lin, seven intermediate layers L1 to L7, and an output layer Lout, and has weights w between the layers.
Operations (not illustrated) are allocated to the input layer Lin and the output layer Lout in advance, and these operations are fixed. An input/output relationship related to each operation (for example, the number of pieces of data input for the operation and the number of pieces of data output as a result of the operation) is determined in the process of constructing the neural network model NNM by the neural network construction device NNK.
Operations and input/output relationships are not allocated to the seven intermediate layers L1 to L7 in advance, and the operations and the input/output relationships thereof are determined in the process of constructing the neural network model NNM by the neural network construction device NNK.
A value of the weight w is determined in the process of constructing the neural network model NNM by the neural network construction device NNK.
Hereinafter, the value of the weight w is referred to as a “weight value wt”. Although each weight originally has its own unique value, the single term “weight value wt” is used herein to refer collectively to the plurality of weight values, for ease of description and understanding.
The storage unit KI stores a table TB. The table TB associates the condition information JJ with a search space TK.
The condition information JJ includes conditions for constructing the neural network model NNM, specifically, a task name, an input/output size, and hardware specifications.
As described above, the task name is a task that the neural network model NNM may be requested to execute, for example, “class classification”.
The input/output size includes the number of pieces of data input to the input layer Lin and the number of pieces of data output from the output layer Lout when the neural network model NNM executes one of the tasks.
The hardware specifications are specifications of the hardware that operates the neural network model NNM, and specifically include a computer model, a type of processing unit (CPU/GPU/FPGA/DPU), and the like.
The table TB is not necessarily stored in advance in the storage unit KI; the table TB may instead be acquired from outside the neural network construction device NNK.
The contents of the search space TK are specified by the condition information JJ. Specifically, the search space TK indicates, for each set of condition information JJ in the table TB, the allocatable operations, their input/output relationships, their connection conditions, and their necessary times.
Construction data CD generated by the neural network construction device NNK includes an operation combination EK and the weight value wt.
The operation combination EK indicates a combination of operations allocated to the intermediate layers L1 to L7 of the neural network model NNM. The weight value wt is a value of a weight between layers, as described above.
Next, the function of each unit of the neural network construction device NNK will be described.
The interface unit IF receives, from a user, input of the condition information JJ, a requested necessary time YSJ, requested inference accuracy YSS, and training data KD.
The requested necessary time YSJ and the requested inference accuracy YSS are the necessary time and the inference accuracy that are to be satisfied at the time of inference, that is, when the neural network model NNM actually performs a task.
The user inputs a request indicating that, for example, the neural network model NNM has to execute the task “class classification” within the requested necessary time YSJ and with at least the requested inference accuracy YSS.
The training data KD is data used when the neural network construction device NNK constructs the neural network model NNM, more specifically, when training is performed. The training data KD is labeled data for performing supervised learning, for example, as conventionally known.
The acquisition unit SY acquires the condition information JJ, the requested necessary time YSJ, the requested inference accuracy YSS, and the training data KD via the interface unit IF.
The specification unit TO specifies the search space TK by referring to the table TB on the basis of the condition information JJ acquired by the acquisition unit SY.
For example, it is assumed that the acquisition unit SY acquires the condition information JJ of the task name “class classification”, the input size “1”, the output size “1”, and the hardware specifications “HW1” (hereinafter referred to as “condition information JJ1”). In this case, the specification unit TO specifies a search space TK1 corresponding to the condition information JJ1 by referring to the table TB.
The search space TK1 indicates the following under the condition information JJ1 described above.
(1) Four operations OP1, OP2, OP3, and OP5 can be allocated to the intermediate layers L1 to L7.
(2) For example, in the operation OP1, the number of pieces of data input to the operation OP1 is one, and the number of pieces of data output from the operation OP1 is one.
(3) For example, the connection condition of the operation OP1 is that the operation OP1 is allocated to a position before the operation OP2.
(4) For example, executing the operation OP1 on the hardware HW1 takes the necessary time T1.
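For illustration, items (1) to (4) above could be held in a small data structure such as the following sketch; the field names and the concrete values are hypothetical stand-ins, not values taken from the disclosure.

```python
# Illustrative representation of the search space TK1. Field names and
# concrete values are hypothetical; only the four operations and the
# kinds of items (1)-(4) come from the description above.
search_space_tk1 = {
    # (1) operations allocatable to the intermediate layers L1 to L7
    "operations": ["OP1", "OP2", "OP3", "OP5"],
    # (2) input/output relationship as (inputs, outputs); only OP1 is
    # stated above to be one-in/one-out, the others are assumed equal
    "io": {"OP1": (1, 1), "OP2": (1, 1), "OP3": (1, 1), "OP5": (1, 1)},
    # (3) connection condition, e.g. OP1 must come before OP2
    "connection": {"OP1": {"before": "OP2"}},
    # (4) necessary times on the hardware HW1 (numbers standing in for
    # T1, T2, T3, and T5)
    "time_on_hw1": {"OP1": 1.0, "OP2": 2.0, "OP3": 3.0, "OP5": 5.0},
}
```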
The operation OP1 and the like in the table TB include, for example, a Convolution operation and a Maxpooling operation which are conventionally well-known. In addition to the above operations, the operation OP1 and the like include, for example, the Skip Connection operation and the No Connection operation. The Skip Connection operation outputs input data as it is without performing any processing, and the No Connection operation does not output any data even if some data is input.
Hereinafter, in order to facilitate description and understanding, it is assumed that the necessary time T1 or the like of the operation OP1 or the like is fixed regardless of the internal structure of the operation, its input/output relationship with the outside, or the like. Thus, a total necessary time GSJ (described in detail later) when the neural network model NNM executes the task “class classification” on the hardware HW1 under the condition information JJ1 is the sum of the necessary times (T1, T2, T3, and T5) of the operations (OP1, OP2, OP3, and OP5) allocated to the respective intermediate layers L1 to L7.
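Under this fixed-time assumption, the total necessary time GSJ reduces to a simple sum; a minimal sketch follows, in which the numeric times are placeholders for T1, T2, T3, and T5.

```python
# Placeholder necessary times standing in for T1, T2, T3, T5 on HW1.
# The NO and SKIP operations introduced later take zero time.
NECESSARY_TIME = {"OP1": 1.0, "OP2": 2.0, "OP3": 3.0, "OP5": 5.0,
                  "NO": 0.0, "SKIP": 0.0}

def total_necessary_time(combination):
    """Total necessary time GSJ: the sum over the intermediate layers."""
    return sum(NECESSARY_TIME[op] for op in combination)

# e.g. the first operation combination of step ST14 below,
# GSJ = T1 + T1 + T2 + T3 + T2 + T5 + T5
gsj = total_necessary_time(["OP1", "OP1", "OP2", "OP3", "OP2", "OP5", "OP5"])
```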
For example, the generation unit SE generates an operation combination EK, and thus the neural network model NNM, by allocating the operation OP1 and the like in the search space TK1 to the intermediate layers L1 to L7.
The training unit KU trains the neural network model NNM generated by the generation unit SE. More specifically, for example, the training unit KU trains the neural network model NNM, in which the operation OP1 and the like in the search space TK1 are allocated to the intermediate layers L1 to L7, by adjusting the weight value wt using the training data KD.
The first determination unit HA1 determines whether or not the total necessary time GSJ of the neural network model NNM generated by the generation unit SE satisfies the requested necessary time YSJ.
The second determination unit HA2 determines whether or not the inference accuracy SS of the neural network model NNM trained by the training unit KU satisfies the requested inference accuracy YSS.
The neural network construction device NNK may be, for example, either a centralized system or a distributed system.
The specification unit TO corresponds to a “specification unit”, the generation unit SE corresponds to a “generation unit”, the first determination unit HA1 corresponds to a “first determination unit”, the second determination unit HA2 corresponds to a “second determination unit”, and the training unit KU corresponds to a “training unit”. The condition information JJ corresponds to a “construction condition”.
As its hardware configuration, the neural network construction device NNK includes a processor PC, a memory MM, a storage medium KB, an input unit NY, and an output unit SY.
The processor PC is the well-known core of a computer that operates hardware in accordance with software. The memory MM includes, for example, a dynamic random access memory (DRAM) and a static random access memory (SRAM). The storage medium KB includes, for example, a hard disk drive (HDD), a solid state drive (SSD), and a read only memory (ROM). The storage medium KB stores a program PR. The program PR is a command group that defines contents of processing to be executed by the processor PC. The input unit NY includes, for example, a keyboard, a mouse, and a touch panel. The output unit SY includes, for example, a liquid crystal monitor, a printer, and a touch panel.
With regard to the relationship between the function and the configuration of the neural network construction device NNK, on the hardware, the processor PC executes the program PR stored in the storage medium KB on the memory MM and controls operations of the input unit NY and the output unit SY as necessary, thereby implementing the functions of the respective units from the interface unit IF to the second determination unit HA2.
Hereinafter, in order to facilitate the description and understanding, it is assumed that the user inputs the condition information JJ1, the requested necessary time YSJ1, the requested inference accuracy YSS1, and the training data KD1 described above as the condition information JJ, the requested necessary time YSJ, the requested inference accuracy YSS, and the training data KD.
Step ST11: the acquisition unit SY acquires the condition information JJ1, the requested necessary time YSJ1, the requested inference accuracy YSS1, and the training data KD1 via the interface unit IF.
Step ST12: when the condition information JJ1, the requested necessary time YSJ1, the requested inference accuracy YSS1, and the training data KD1 are acquired in step ST11, the specification unit TO specifies the search space TK1 by referring to the table TB on the basis of the condition information JJ1.
Step ST13: when the search space TK1 is specified in step ST12, the generation unit SE generates a first operation combination EK by allocating the operation OP1 and the like in the search space TK1 to the intermediate layers L1 to L7.
Step ST14: when the first operation combination EK is generated by allocating the operation OP1 and the like to the intermediate layers L1 to L7 in step ST13, the first determination unit HA1 calculates the total necessary time GSJ of the first operation combination EK, for example, T1+T1+T2+T3+T2+T5+T5.
In addition to the above calculation, the first determination unit HA1 determines whether or not a necessary condition that the total necessary time GSJ (T1+T1+T2+T3+T2+T5+T5) is equal to or less than the requested necessary time YSJ1 acquired by the acquisition unit SY is satisfied. When it is determined that the condition is satisfied, the process proceeds to step ST15, and on the other hand, when it is determined that the condition is not satisfied, the process returns to step ST13.
Here, the following is assumed.
(1) The generation unit SE and the first determination unit HA1 repeat steps ST13 and ST14.
(2) After the repetition of (1), the generation unit SE allocates the operation OP1 to the intermediate layer L1, allocates the operation OP1 to the intermediate layer L2, allocates the operation OP3 to the intermediate layer L3, allocates the operation NO to the intermediate layer L4, allocates the operation OP3 to the intermediate layer L5, allocates the operation OP5 to the intermediate layer L6, and allocates the operation OP5 to the intermediate layer L7 as an m-th (m is any positive integer) operation combination EK, for example, as illustrated in the middle part of FIG. 7.
(3) After the generation of the m-th operation combination EK in (2), the first determination unit HA1 determines that the total necessary time GSJ (T1+T1+T3+0+T3+T5+T5) of the m-th operation combination EK satisfies the requested necessary time YSJ1.
“NO operation” means the No Connection operation. As described above, the “NO operation” has a function of outputting no data even when receiving input of some data. The necessary time of the “NO operation” is 0.
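The two zero-time operations, the No Connection operation described here and the Skip Connection operation used later in step ST16, could be modeled as trivial layers. A minimal sketch with hypothetical class names follows.

```python
# Hypothetical modeling of the two zero-time operations.
class NoConnection:
    """NO operation: outputs no data even if some data is input."""
    necessary_time = 0.0
    def __call__(self, data):
        return None   # nothing is passed on to the next layer

class SkipConnection:
    """SKIP operation: outputs input data as-is, without any processing."""
    necessary_time = 0.0
    def __call__(self, data):
        return data   # input is forwarded unchanged
```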
Step ST15: when it is determined in the immediately preceding step ST14 that the total necessary time GSJ (T1+T1+T3+0+T3+T5+T5) satisfies the requested necessary time YSJ1, the training unit KU trains the neural network model NNM having the m-th operation combination EK by adjusting the weight value wt using the training data KD1.
The training unit KU ends the above training after continuing the training for a predetermined time or a predetermined number of times, or ends the training after continuing the training depending on the progress of the learning effect by the training.
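A plain Python sketch of a training loop with the stopping rules just described (a predetermined number of iterations, a predetermined time, or stagnation of the learning effect) is shown below; the `step` callable and all thresholds are hypothetical.

```python
import time

def train(model, data, step, max_epochs=100, max_seconds=600.0, patience=5):
    """Hypothetical training loop; `step` performs one training pass that
    adjusts the weight value wt and returns the current loss."""
    start = time.perf_counter()
    best_loss, stale = float("inf"), 0
    for _ in range(max_epochs):                        # predetermined count
        loss = step(model, data)
        if time.perf_counter() - start > max_seconds:  # predetermined time
            break
        if loss < best_loss:
            best_loss, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:    # learning effect no longer progressing
                break
    return model
```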
Step ST16: when the training of the neural network model NNM is completed in step ST15, the second determination unit HA2 determines whether or not the inference accuracy SS of the trained neural network model NNM satisfies the requested inference accuracy YSS1. When it is determined that the requested inference accuracy YSS1 is satisfied, the process proceeds to step ST17; otherwise, the process returns to step ST13.
Here, the following is assumed.
(1) The generation unit SE, the first determination unit HA1, the training unit KU, and the second determination unit HA2 repeat steps ST13 to ST16.
(2) After the repetition of (1), the generation unit SE allocates the operation OP1 to the intermediate layer L1, allocates the operation OP1 to the intermediate layer L2, allocates the operation OP2 to the intermediate layer L3, allocates the operation OP2 to the intermediate layer L4, allocates the operation SKIP to the intermediate layer L5, allocates the operation OP5 to the intermediate layer L6, and allocates the operation OP1 to the intermediate layer L7 as an n-th (n is an integer greater than m) operation combination EK, for example, as illustrated in the lower part of FIG. 7.
(3) After generation of the n-th operation combination EK in (2), the first determination unit HA1 determines that the total necessary time GSJ (T1+T1+T2+T2+0+T5+T1) of the n-th operation combination EK satisfies the requested necessary time YSJ1.
“SKIP operation” means the Skip Connection operation. As described above, the “SKIP operation” has a function of outputting input data as it is without performing any processing. The necessary time of the “SKIP operation” is 0.
(4) After the determination of (3), the training unit KU adjusts the weight value wt of the neural network model NNM having the n-th operation combination EK, for example, sets the weight value wt to wta, then changes it to wtb, and then changes it to wtc.
(5) After the training of (4), the second determination unit HA2 determines that the inference accuracy SS of the n-th operation combination EK satisfies the requested inference accuracy YSS1.
Step ST17: when it is determined in the immediately preceding step ST16 that the inference accuracy SS of the neural network model NNM having the n-th operation combination EK satisfies the requested inference accuracy YSS1, the second determination unit HA2 stores the construction data CD, that is, the n-th operation combination EK (illustrated in the lower part of FIG. 7) and the weight value wt (here, wtc).
As described above, in the neural network construction device NNK of the first embodiment, the allocatable operation OP1 and the like in the search space TK1 specified by the condition information JJ1 input by the user are allocated to the intermediate layers L1 to L7 of the neural network model NNM to be constructed. Thus, it is possible to increase the possibility of constructing the neural network model NNM that satisfies the requested necessary time YSJ1.
The neural network construction device NNK of the first embodiment also performs training using the training data KD1. Thus, it is possible to increase the possibility of constructing the neural network model NNM that satisfies the requested inference accuracy YSS1.
Further, in the neural network construction device NNK of the first embodiment, allocating the No Connection operation and the Skip Connection operation to one or more of the intermediate layers L1 to L7 of the neural network model NNM increases the number of layers that do not substantially function, in other words, reduces the number of layers that substantially function. Thereby, the overall operation amount of the neural network model NNM can be reduced.
The neural network construction device NNK of the first embodiment is mounted on a vehicle such as a private car in order to perform, for example, image processing for autonomous driving. The manufacturer stores image data in the neural network construction device NNK in advance at the manufacturing stage of the vehicle, and the device continues to accumulate image data while the driver operates the vehicle. The neural network construction device NNK can construct a neural network model NNM that executes class classification on both kinds of images, for example, classification as to whether a detected object is a person, an animal, a tree, or the like, in such a way as to satisfy the requested necessary time YSJ1 and the requested inference accuracy YSS1 described above.
Unlike in the neural network construction device NNK of the first embodiment, which sums the necessary times of only the intermediate layers L1 to L7, the total necessary time GSJ may be obtained by summing the necessary times of the intermediate layers L1 to L7 together with the necessary times of the input layer Lin and the output layer Lout, to which no operation OP from the search space is allocated; it may then be determined whether or not this total necessary time GSJ satisfies the requested necessary time YSJ1.
Unlike the neural network construction device NNK of the first embodiment, which generates only the neural network model NNM having the n-th operation combination EK, a plurality of operation combinations EK, such as a p-th (p is an integer greater than n) operation combination EK, a q-th (q is an integer greater than p) operation combination EK, and so on, may be generated. Then, the operation combination EK that is most excellent from the viewpoint of the requested necessary time YSJ1 and the requested inference accuracy YSS1 may be selected from among the plurality of operation combinations EK, that is, the n-th, p-th, and q-th operation combinations EK, as in the sketch below.
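One possible selection criterion, assuming each candidate is recorded with its total necessary time and inference accuracy, is sketched here; since “most excellent” is not defined in the disclosure, preferring higher accuracy and breaking ties by shorter time is purely an illustrative choice.

```python
def select_best(candidates, requested_time, requested_accuracy):
    """Hypothetical selection among candidate combinations, where each
    candidate is a tuple (combination, total_time, accuracy)."""
    feasible = [c for c in candidates
                if c[1] <= requested_time and c[2] >= requested_accuracy]
    if not feasible:
        return None  # no candidate satisfies both requests
    # Prefer the highest accuracy; break ties with the shortest time.
    return max(feasible, key=lambda c: (c[2], -c[1]))
```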
A neural network construction device NNK of a second embodiment will be described.
Unlike the neural network construction device NNK of the first embodiment, which uses the total necessary time GSJ inferred from the necessary times of the respective operations, the neural network construction device NNK of the second embodiment uses an actual necessary time JSJ obtained by actually executing the neural network model NNM.
In addition to the units of the first embodiment, the neural network construction device NNK of the second embodiment includes an execution unit JI.
The function of each unit from the interface unit IF to the second determination unit HA2 of the second embodiment is similar to that of each unit from the interface unit IF to the second determination unit HA2 of the first embodiment.
The execution unit JI is provided, for example, on the hardware “HW1” specified by the condition information JJ1. The execution unit JI executes the neural network model NNM generated by the generation unit SE and measures the actual necessary time JSJ of the execution.
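A minimal sketch of such a measurement follows, assuming the model is callable on a sample input; the warm-up run and the number of repetitions are illustrative choices, not part of the disclosure.

```python
import time

def measure_actual_necessary_time(model, sample_input, repeats=10):
    """Hypothetical measurement of the actual necessary time JSJ on the
    target hardware, using a monotonic high-resolution clock."""
    model(sample_input)  # warm-up so initialization does not skew timing
    start = time.perf_counter()
    for _ in range(repeats):
        model(sample_input)
    return (time.perf_counter() - start) / repeats  # mean time per run
```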
The configuration of the neural network construction device NNK of the second embodiment is similar to the configuration of the neural network construction device NNK of the first embodiment.
For ease of description and understanding, the following is assumed.
(1) As in the first embodiment, the user inputs the condition information JJ1, the requested necessary time YSJ1, the requested inference accuracy YSS1, and the training data KD.
(2) As in the first embodiment, the n-th operation combination EK is generated after the first operation combination EK, the m-th operation combination EK, and the like.
Steps ST21 to ST23: the acquisition unit SY, the specification unit TO, and the generation unit SE perform processes similar to those in steps ST11 to ST13 of the first embodiment.
Step ST24: when the allocatable operation OP1 and the like in the search space TK1 specified by the condition information JJ1 are allocated to the intermediate layers L1 to L7 in step ST23, that is, when the neural network model NNM having the n-th operation combination EK (illustrated in the lower part of FIG. 7) is generated, the execution unit JI executes the neural network model NNM on the hardware HW1 and measures the actual necessary time JSJ.
Step ST25: when the actual necessary time JSJ is measured in step ST24, the first determination unit HA1 determines whether or not the actual necessary time JSJ of the n-th operation combination EK satisfies the requested necessary time YSJ1.
Steps ST26 to ST28: the training unit KU and the second determination unit HA2 perform processes similar to those in steps ST15 to ST17 of the first embodiment.
As described above, the neural network construction device NNK of the second embodiment measures and uses the actual necessary time JSJ of the neural network model NNM, for example, the actual necessary time JSJ when the neural network model NNM having the n-th operation combination EK actually operates on the hardware HW1. Thereby, it is possible to construct a neural network model NNM with a higher possibility of satisfying the requested necessary time YSJ1 than the neural network model NNM of the first embodiment, which is constructed using the necessary times of the respective operations stored in the table TB.
An image processing device of a third embodiment will be described.
The image processing device GSS of the third embodiment includes a storage unit KO, an input unit NB, an inference unit SU, and an output unit SB.
The storage unit KO stores the neural network model NNM constructed under conditions such as the condition information JJ1 by the neural network construction device NNK of any one of the first embodiment and the second embodiment, more precisely, the n-th operation combination EK and the weight value wt.
The input unit NB is, for example, an imaging camera. The input unit NB acquires, for example, image data GD.
The inference unit SU is provided, for example, on the hardware HW1 specified by the condition information JJ1. The inference unit SU performs inference as conventionally known, and specifically, executes the neural network model NNM using the image data GD.
The output unit SB is, for example, a liquid crystal monitor, a speaker, or the like. The output unit SB outputs an inference result SK that is a result of the inference performed by the inference unit SU.
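For illustration, the flow from the input unit NB through the inference unit SU to the output unit SB might look like the following loop; `camera.capture` and `monitor.show` are hypothetical interfaces standing in for the actual devices.

```python
def image_processing_loop(camera, model, monitor):
    """Hypothetical main loop of the image processing device GSS."""
    while True:
        image_data = camera.capture()         # input unit NB acquires GD
        if image_data is None:
            break                             # no more frames
        inference_result = model(image_data)  # inference unit SU runs NNM
        monitor.show(inference_result)        # output unit SB outputs SK
```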
The storage unit KO corresponds to a “memory”, the input unit NB corresponds to a “camera”, the output unit SB corresponds to a “monitor or speaker”, the image data GD corresponds to “image data”, and the “inference result SK” corresponds to a “result of inference”.
As described above, the image processing device GSS of the third embodiment executes, on the hardware HW1 specified by the condition information JJ1, the neural network model NNM constructed by the neural network construction device NNK of the first or second embodiment and having the n-th operation combination EK satisfying the requested necessary time YSJ1 and the requested inference accuracy YSS1. Thus, it is possible to perform inference that satisfies the requested necessary time YSJ1 and the requested inference accuracy YSS1.
The above-described embodiments may be combined, components in each embodiment may be appropriately omitted or changed, or other components may be added, without departing from the gist of the present disclosure.
The neural network construction device according to the present disclosure can be applied to a neural network construction device capable of constructing a neural network model that can satisfy the necessary time and accuracy for inference.
CD: construction data, EK: operation combination, GD: image data, GSJ: total necessary time, GSS: image processing device, HA1: first determination unit, HA2: second determination unit, HW: hardware, IF: interface unit, JI: execution unit, JJ: condition information, JSJ: actual necessary time, KB: storage medium, KD: training data, KI: storage unit, KO: storage unit, KU: training unit, L: intermediate layer, Lin: input layer, Lout: output layer, MM: memory, NB: input unit, NNK: neural network construction device, NNM: neural network model, NY: input unit, OP: operation, PR: program, SB: output unit, SE: generation unit, SK: inference result, SS: inference accuracy, SU: inference unit, SY: acquisition unit, SY: output unit, TB: table, TK: search space, TO: specification unit, w: weight, wt: weight value, YSJ: requested necessary time, YSS: requested inference accuracy.
This application is a Continuation of PCT International Application No. PCT/JP2021/022425 filed on Jun. 14, 2021, which is hereby expressly incorporated by reference into the present application.
Parent application: PCT/JP2021/022425, filed June 2021 (US). Child application: 18381703 (US).