The present disclosure relates to a model constructing method of a neural network model, a model constructing system, and a non-transitory computer readable storage medium. More particularly, the present disclosure relates to a model constructing method of a neural network model, a model constructing system, and a non-transitory computer readable storage medium for optimizing the neural network construction dynamically.
In recent years, neural networks have been applied effectively to many technical fields. However, existing methods for neural network training require a pre-defined model architecture. These methods do not learn the connections between layers; they simply use pre-defined connection paths between the layers, without dynamically searching for the best model architecture.
One aspect of the present disclosure relates to a model constructing method for a neural network model applicable to image recognition processing. The model constructing method includes the following operation: updating, by a processor, a plurality of connection variables between a plurality of layers of the neural network model, according to a plurality of inputs and a plurality of outputs of the neural network model. The plurality of outputs represent a plurality of image recognition results. The plurality of connection variables represent a plurality of connection intensities between each two of the plurality of layers.
Another aspect of the present disclosure is related to a model constructing system for a neural network model applicable for image recognition processing. The model constructing system includes a memory and a processor. The memory is configured to store at least one instruction. The processor is coupled to the memory, in which the processor is configured to access and process the at least one instruction to: update a plurality of connection variables between a plurality of layers of the neural network model, according to a plurality of inputs and a plurality of outputs of the neural network model. The plurality of outputs represent a plurality of image recognition results. The plurality of connection variables represent a plurality of connection intensities between each two of the plurality of layers.
Another aspect of the present disclosure is related to a non-transitory computer readable storage medium storing one or more programs comprising instructions, which, when executed, cause one or more processing components to perform operations including the following operations: updating a plurality of connection variables between a plurality of layers of a neural network model applicable to image recognition processing, according to a plurality of inputs and a plurality of outputs of the neural network model. The plurality of outputs represent a plurality of image recognition results. The plurality of connection variables represent a plurality of connection intensities between each two of the plurality of layers.
Through the operations of one embodiment described above, whether to keep or abandon a connection between layers that are not adjacent to each other may be learned dynamically. The connection intensity between layers that are not adjacent to each other may likewise be adjusted dynamically during training. Better accuracy and performance of the neural network structure may thereby be achieved.
The invention can be more fully understood by reading the following detailed description of the embodiments, with reference made to the accompanying drawings as follows:
Reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
It will be understood that, in the description herein and throughout the claims that follow, when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present. Moreover, “electrically connect” or “connect” can further refer to the interoperation or interaction between two or more elements.
It will be understood that, in the description herein and throughout the claims that follow, the terms “comprise” or “comprising,” “include” or “including,” “have” or “having,” “contain” or “containing” and the like used herein are to be understood to be open-ended, i.e., to mean including but not limited to.
It will be understood that, in the description herein and throughout the claims that follow, the phrase “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that, in the description herein and throughout the claims that follow, unless otherwise defined, all terms (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. § 112(f). In particular, the use of “step of” in the claims herein is not intended to invoke the provisions of 35 U.S.C. § 112(f).
In some embodiments, the memory 110 can be a flash memory, an HDD (Hard Disk Drive), an SSD (Solid State Drive), a DRAM (Dynamic Random Access Memory), or an SRAM (Static Random Access Memory). In some embodiments, the memory 110 can be a non-transitory computer readable medium storing at least one instruction associated with a machine learning method. The at least one instruction can be accessed and executed by the processor 130.
In some embodiments, the processor 130 can be, but is not limited to being, a single processor or an integration of multiple microprocessors such as CPUs or GPUs. The microprocessors are electrically coupled to the memory 110 in order to access the at least one instruction. According to the at least one instruction, the above-mentioned machine learning method can be performed. For better understanding, details of the machine learning method will be described in the following paragraphs.
Details of the present disclosure are described in the paragraphs below with reference to a model constructing method in
Reference is made to
It should be noted that the model constructing method can be applied to the model constructing system 100 shown in
It should be noted that, in some embodiments, the method may be implemented as a computer program. When the computer program is executed by a computer, an electronic device, or the one or more processors 130 in
In addition, it should be noted that in the operations of the following method, no particular sequence is required unless otherwise specified. Moreover, the following operations also may be performed simultaneously or the execution times thereof may at least partially overlap.
Furthermore, the operations of the following method may be added to, replaced, and/or eliminated as appropriate, in accordance with various embodiments of the present disclosure.
Reference is made to
In operation S210, inputting several inputs into a neural network model and obtaining several outputs according to the inputs. In some embodiments, the operation S210 may be operated by the processor 130 in
Reference is made to
For example, the connection variable V13 exists between the layers L1 and L3 and represents the connection intensity between the layers L1 and L3, and so on.
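As a minimal sketch of how the pairwise connection variables might be held in memory, one option is a mapping from a pair of layer indices to an intensity value. The function and variable names below are illustrative assumptions, not taken from the disclosure:

```python
def init_connection_variables(num_layers, value=1.0):
    """Create one connection variable for every pair of distinct layers.

    conn[(i, j)] holds the connection intensity between layer i and
    layer j (i < j). Storing all pairs covers both adjacent and
    non-adjacent layers.
    """
    conn = {}
    for i in range(1, num_layers + 1):
        for j in range(i + 1, num_layers + 1):
            conn[(i, j)] = value
    return conn

# For a five-layer model such as L1 to L5, there are 10 layer pairs.
conn = init_connection_variables(5)
v13 = conn[(1, 3)]  # the intensity between layers L1 and L3
```

A dictionary keyed by index pairs keeps the lookup for any two layers constant-time and makes the non-adjacent connections (e.g., (1, 3), (1, 4)) first-class entries rather than special cases.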
In operation S230, updating several connection variables of the neural network model according to the inputs and the outputs. In some embodiments, the operation S230 may be operated by the processor 130 in
Various methods may be implemented for operation S230. Reference is made to
In operation S232A, calculating a batch variance of several layer outputs of one of the layers. In some embodiments, the operation S232A may be operated by the processor 130 in
In operation S234A, updating a first connection variable of the connection variables according to the batch variance, in which the first connection variable represents a connection intensity between the one of the layers and another one of the layers. In some embodiments, the operation S234A may be operated by the processor 130 in
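Operations S232A and S234A can be sketched as follows. The disclosure states only that the connection variable is updated according to the batch variance of one layer's outputs; the specific rule below, which nudges the variable toward the measured variance at a fixed rate, is an assumption for illustration:

```python
from statistics import pvariance


def update_by_batch_variance(layer_outputs, conn_var, rate=0.1):
    """Sketch of operations S232A/S234A.

    S232A: compute the batch (population) variance of one layer's
    outputs over a batch of samples.
    S234A: update the connection variable according to that variance.
    The exponential-moving-average style update used here is an
    assumed mapping, not the disclosed formula.
    """
    batch_var = pvariance(layer_outputs)             # operation S232A
    return conn_var + rate * (batch_var - conn_var)  # operation S234A


# A layer whose outputs vary little across the batch pulls the
# connection intensity downward:
v13 = update_by_batch_variance([0.2, 0.4, 0.6, 0.8], conn_var=0.5)
```

The intuition behind a variance-driven rule is that a layer whose outputs barely change across a batch carries little discriminative signal, so the intensity of connections fed by it can be reduced.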
An example of operation S234A is as follows. Reference is made to
Reference is made to
In operation S232B, setting a first connection variable to be a first value, in which the first value represents that a first connection intensity corresponding to the first connection variable is high. In some embodiments, the operation S232B may be operated by the processor 130 in
In operation S234B, setting a second connection variable to be a second value, in which the second value represents that a second connection intensity corresponding to the second connection variable is low. In some embodiments, the operation S234B may be operated by the processor 130 in
In operation S236B, generating a first output according to the first connection variable and the second connection variable. In some embodiments, the operation S236B may be operated by the processor 130 in
In operation S238B, updating the first connection variable and the second connection variable according to the first output. In some embodiments, the operation S238B may be operated by the processor 130 in
Examples of operations S232B to S238B are as follows. Reference is made to
In accordance with the above, after setting at least one of the connection variables V13 to V35, the processor 130 inputs an input MI1 into the neural network model 300 and generates an output MO1 through the neural network model 300 corresponding to the input MI1, with the connection variable V14 being 1 and the connection variable V24 being 0. Based on the output MO1, the processor 130 updates the connection variables V14 and V24 according to the backward gradient. For example, in some embodiments, the connection variable V14 may be updated to 0.5, and the connection variable V24 may be updated to 1.
With the updated connection variables V14 and V24, the processor 130 further inputs an input MI2 into the neural network model 300 and generates an output MO2 through the neural network model 300 corresponding to the input MI2 with the connection variable V14 being 0.5 and the connection variable V24 being 1. According to the output MO2, the processor 130 updates the connection variables V14 and V24 again.
In some embodiments, according to the output MO1, the processor 130 generates a backward gradient, in which the backward gradient represents the gradient by which the connection variables should be tuned.
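The forward-and-update cycle of operations S236B and S238B can be sketched on a toy scalar network. The layer arithmetic, the squared-error loss, and the learning rate below are illustrative assumptions, not the disclosed architecture; only the pattern (forward pass with weighted skip connections, then a gradient step on the connection variables) follows the description above:

```python
def forward(x, v14, v24):
    """Toy forward pass: layers L1..L3 each double their input, and
    layer L4 adds skip connections from L1 and L2 scaled by the
    connection variables v14 and v24 (an assumed, simplified model)."""
    h1 = x                           # output of L1
    h2 = 2.0 * h1                    # output of L2
    h3 = 2.0 * h2                    # output of L3
    return h3 + v14 * h1 + v24 * h2  # L4 mixes the skip connections


def update_connections(x, target, v14, v24, lr=0.01):
    """One gradient step on the connection variables (cf. S238B),
    using a squared-error loss (y - target)**2 as an assumption."""
    y = forward(x, v14, v24)
    err = 2.0 * (y - target)         # d(loss)/dy
    g14 = err * x                    # dy/dv14 = h1 = x
    g24 = err * (2.0 * x)            # dy/dv24 = h2 = 2x
    return v14 - lr * g14, v24 - lr * g24


# Start from the hard settings V14 = 1 and V24 = 0, then refine both
# according to the backward gradient of the first output:
v14, v24 = update_connections(x=1.0, target=4.0, v14=1.0, v24=0.0)
```

Because both variables receive nonzero gradients, even a connection initialized to 0 can be revived by training, which matches the idea that the connectivity itself is learned rather than fixed.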
It should be noted that, in some embodiments, in operations S232B and S234B, the at least one of the connection variables V13 to V35 to be set to the first value, for example 1, is selected randomly by the processor 130. Similarly, the at least one of the connection variables V13 to V35 to be set to the second value, for example 0, is selected randomly by the processor 130.
In some embodiments, the values of the connection variables V13 to V35 lie between two boundary values. For example, the connection variables V13 to V35 may be between 0 and 1, with 1 representing the highest connection intensity and 0 representing the lowest connection intensity. For another example, the connection variables V13 to V35 may be between −1 and 1, or between any other two values.
In some embodiments, the connection variables V13 to V35 may have only two states, for example, connected or non-connected. For example, the connection variables V13 to V35 may take the values 1 and 0 only, in which the value 1 represents that the corresponding layers are connected, and the value 0 represents that the corresponding layers are not connected. If the connection variable V13 is 1, the corresponding layers L1 and L3 are connected. If the connection variable V13 is 0, the corresponding layers L1 and L3 are not connected.
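One simple way to obtain the two-state form from continuous connection variables is thresholding. The threshold value of 0.5 below is an assumption for illustration; the disclosure does not specify how the binary states are derived:

```python
def binarize(conn, threshold=0.5):
    """Map continuous connection variables to the two-state form:
    1 means the corresponding layers are connected, 0 means they are
    not. The 0.5 cutoff is an assumed choice."""
    return {pair: 1 if value >= threshold else 0
            for pair, value in conn.items()}


# V13 survives as a connection, V14 is pruned, V24 is kept:
states = binarize({(1, 3): 0.9, (1, 4): 0.2, (2, 4): 0.5})
```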
In some embodiments, any two of the layers L1 to L5 that are adjacent to each other are connected originally. The embodiments of the present disclosure train the connection variables between two of the layers L1 to L5 that are not adjacent to each other.
In some embodiments, before the training of the neural network model 300 is started, the processor 130 is configured to connect every two of the layers L1 to L5 to each other. That is, each two of the layers L1 to L5 are connected to each other initially by the processor 130. Moreover, for each of the connection relationships, the processor 130 presets a connection variable. For example, for the neural network model 300, the processor 130 connects each two of the layers L1 to L5 and presets the connection variables V13 to V35. In some embodiments, the connection variables V13 to V35 are preset randomly.
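The random preset described above can be sketched as follows. The uniform [0, 1) range and the seeding are assumptions chosen so the example is reproducible:

```python
import random


def preset_connection_variables(num_layers, seed=0):
    """Sketch of the initialization described above: connect each pair
    of layers and preset every connection variable to a random
    intensity. A uniform draw in [0, 1) is an assumed choice; the
    disclosure only states that the presets are random."""
    rng = random.Random(seed)  # seeded for reproducibility
    return {(i, j): rng.random()
            for i in range(1, num_layers + 1)
            for j in range(i + 1, num_layers + 1)}


# Fully connect a five-layer model before training starts:
conn = preset_connection_variables(5)
```

Starting fully connected lets the subsequent updates decide which of the preset connections to strengthen, weaken, or effectively abandon, rather than committing to a topology in advance.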
Reference is made to
As illustrated in
In some embodiments, the processor 130 as illustrated in
Details of updating the sub-connection variables of the neural network model 600 are similar to the methods of updating the connection variables of the neural network model 300, and are not described again herein.
It should be noted that not only may the sub-layers at different layers be connected to each other, but the sub-layers at the same layer may be connected to each other as well, as illustrated in
In some embodiments, the way to activate and update the connection variables between the layers which are not adjacent to each other is not limited to the operations as mentioned in
In some embodiments in which the sizes of the layers of the neural network model are different, methods such as pooling, convolution, or deconvolution may be introduced to match the feature sizes, and the embodiments of dynamically updating the connection variables between the layers may also be applied.
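As a simplified illustration of the size-matching step, average pooling over a 1-D feature list can shrink a larger layer's output to a smaller layer's input size before the weighted connection is applied. The 1-D form and the non-overlapping windows are assumptions; real models would pool 2-D or 3-D feature maps:

```python
def average_pool(features, factor):
    """Downsample a 1-D feature list by averaging non-overlapping
    windows of the given factor, so a larger layer's output can match
    a smaller layer's input size. Pooling is one of the size-matching
    options mentioned above; this 1-D version is a simplified sketch
    and assumes len(features) is a multiple of factor."""
    return [sum(features[i:i + factor]) / factor
            for i in range(0, len(features), factor)]


# An 8-element feature map pooled by 2 matches a 4-element layer:
matched = average_pool([1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0], factor=2)
```

Convolution or deconvolution would instead learn the resizing, but pooling shows the essential point: once the feature sizes agree, the same connection-variable updates apply to layers of unequal size.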
It should be noted that the neural network model 300 in
Through the operations of the embodiments described above, whether to keep or abandon a connection between layers that are not adjacent to each other may be learned dynamically. The connection intensity between layers that are not adjacent to each other may likewise be adjusted dynamically during training. Better accuracy and performance of the neural network structure may thereby be achieved.
Although the present invention has been described in considerable detail with reference to certain embodiments thereof, other embodiments are possible. Therefore, the scope of the appended claims should not be limited to the description of the embodiments contained herein.
This application claims priority to U.S. Provisional Application Ser. No. 62/676,291, filed May 25, 2018, which is herein incorporated by reference.
Published as US 2019/0362230 A1, November 2019, United States.