The present document claims priority to Chinese Patent Application No. 202210278192.7, titled “METHOD AND APPARATUS FOR GENERATING NEURAL NETWORK,” filed on Mar. 21, 2022, the content of which is incorporated herein by reference in its entirety.
The present document relates generally to the technical field of neural networks and, more particularly, to a method and an apparatus for generating a neural network.
In recent years, with the rapid development of deep learning, increasingly demanding requirements have been placed on the performance parameters of neural networks, such as accuracy, number of parameters, and running speed. However, designing neural networks manually requires designer expertise, and a large number of experiments are needed to verify network performance. Automatic design of efficient neural networks has therefore attracted attention in recent years, and neural architecture search (NAS) has been increasingly favored for its high performance, high degree of automation, and other advantages.
Typically, NAS samples and trains candidate network structures in a search space, evaluates the candidate network structures in terms of a single performance parameter, and determines a target neural network from the obtained data in terms of that single performance parameter. Such an approach cannot perform searching under constraints.
The present document presents a technique for generating a neural network that can enable searching under constraints.
A summary of the document is given below to provide a basic understanding of some aspects of the document. It should be understood that this summary is neither an exhaustive overview of the document, nor intended to identify key or critical elements of the document or define the scope of the document. It is intended solely to present some concepts in a simplified form as a prelude to the more detailed description that follows.
According to an aspect of the present document, a method for generating a neural network is provided, including: training a plurality of neural networks for a plurality of performance parameters to obtain a plurality of parameter values for each performance parameter; training a plurality of neural network predictors based on the parameter values and the neural networks; and determining a target neural network using the trained neural network predictors.
According to another aspect of the present document, an apparatus for generating a neural network is provided, including: a first training unit configured to train a plurality of neural networks for a plurality of performance parameters to obtain a plurality of parameter values for each performance parameter; a second training unit configured to train a plurality of neural network predictors based on the parameter values and the neural networks; and a determination unit configured to determine a target neural network using the trained neural network predictors.
According to another aspect of the present document, a computer program for enabling the above method for generating a neural network is provided. Furthermore, a computer program product in the form of at least a computer-readable medium recording computer program codes for implementing the above method for generating a neural network is provided.
According to another aspect of the present document, an electronic device is provided, including a processor and memory, wherein the memory stores a program which, when executed by the processor, causes the processor to perform the above method for generating a neural network.
According to another aspect of the present document, a data processing method is provided, including: receiving data; and processing the data using the target neural network determined according to the above method for generating a neural network to achieve at least one of data classification, semantic segmentation, or target detection.
According to the technique for generating a neural network herein, a plurality of neural network predictors may be trained to determine a target neural network, and one or more of the plurality of neural network predictors may represent a constraint (the one or more neural network predictors representing a constraint are also referred to as auxiliary predictors), thereby enabling an automatic search for a network structure satisfying a preset constraint in a search space of network structures.
The above and other objects, features and advantages of the present document will be more readily understood by reference to the following description of embodiments of the document taken in conjunction with the accompanying drawings, in which:
Hereinafter, some embodiments of the present document will be described in detail with reference to the accompanying illustrative drawings. Although an element may appear in different drawings, it will be referred to by the same reference numeral throughout. Furthermore, in the following description of the present document, detailed descriptions of known functions and configurations incorporated herein will be omitted to avoid rendering the subject matter of the present document unclear.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this document. As used herein, the singular forms of terms are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprise”, “include”, and “have” herein are taken to specify the presence of stated features, entities, operations, and/or components, but do not preclude the presence or addition of one or more other features, entities, operations, and/or components.
Unless otherwise defined, all the terms including technical and scientific terms herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present document. The present document may be implemented without some or all of these specific details. In other instances, to avoid obscuring the document by unnecessary detail, only features that are germane to aspects according to the document are shown in the drawings, and other details that are not germane to the document are omitted.
Hereinafter, a technique for generating a neural network according to the present document will be described in detail in conjunction with embodiments of the present document with reference to the accompanying drawings.
According to some embodiments of the present document, the method 100 may include:
According to some embodiments of the present document, the method 100 may optionally include step S105 of determining a set of network structures, where each network structure in the set of network structures characterizes a neural network, as indicated in a dashed box.
A neural network, also known as an artificial neural network (ANN), is an algorithmic mathematical model for distributed parallel information processing that imitates the behavioral characteristics of animal neural networks. Such a network relies on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes. Since neural networks and their network structures are known to those skilled in the art, they are not described in more detail herein for the sake of brevity. Furthermore, in the context herein, "neural network structure" and "network structure" have the same meaning (both characterize a neural network) and are therefore used interchangeably in the description.
Steps S105, S110, S120, and S130 of the method 100 are described in more detail below in connection with
According to some embodiments of the present document, in step S105 of the method 100, the set of network structures is determined. The set of network structures may also be referred to as a search space of network structures. Each network structure in the set of network structures contains information such as a depth, width and/or size of a convolution kernel of the neural network to which the network structure corresponds (also referred to as the neural network characterized by the network structure), and thus selecting a network structure is equivalent to selecting the neural network to which the network structure corresponds.
According to some embodiments of the present document, the network structure in the set of network structures may be a network structure based on a network topology and/or a network structure based on a network size. Accordingly, the set of network structures may include a subset of network structures based on the network topology and/or a subset of network structures based on the network size.
According to some embodiments of the present document, the network structures based on the network topology may include, for example, a network structure represented by a directed acyclic graph (DAG). The directed acyclic graph refers to a directed graph in which no loops exist. In other words, a directed graph is a directed acyclic graph if a path cannot start from a node and go back to the same node through several edges. Since directed acyclic graphs are known to those skilled in the art, the details of directed acyclic graphs are not described in more detail herein for the sake of brevity.
According to some embodiments of the present document, nodes of the directed acyclic graph may represent different types of operations of a neural network, and one node may represent one operation. An edge of the directed acyclic graph may represent a connection relationship between nodes of the directed acyclic graph. One edge typically corresponds to two nodes (e.g., the two nodes connected by the edge) to represent a connection relationship between the two nodes.
According to some embodiments of the present document, each operation represented by a node in the directed acyclic graph may be one of inputting, convolution, pooling, reduce-summing, skipping, zeroizing, and outputting. Herein, the convolution can include group convolution, separable convolution, or dilated convolution; the pooling may include max pooling or average pooling; the reduce-summing may include addition along a channel dimension or a spatial dimension. Furthermore, according to some embodiments of the present document, the size of the convolution (i.e., the size of the convolution kernel) and the size of the pooling may be set for a particular target task.
According to some embodiments of the present document, the edge in the directed acyclic graph may be directed to indicate an order of execution of the operations represented by the corresponding two nodes.
The directed acyclic graph shown in
As shown in
As described above, the skipping and zeroizing operations may be omitted since they involve no substantive consumption of computational resources. In the example of
According to some embodiments of the present document, the operations represented by all the nodes may be encoded using one-hot codes to form an operation matrix representing the nodes of the directed acyclic graph. For the example shown in
Furthermore, according to some embodiments of the present document, connection relationships between the nodes in the directed acyclic graph may be encoded to form a connection matrix representing the edges of the directed acyclic graph. In the example of
The connection matrix and the operation matrix derived from the directed acyclic graph on the left side of
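Independently of any specific figure, the encoding described above can be sketched as follows. The operation vocabulary and the example four-node graph below are illustrative assumptions, not taken from the document's drawings:

```python
import numpy as np

# Hypothetical operation vocabulary; the actual set is task-specific.
OPS = ["input", "conv3x3", "pool", "reduce-sum", "output"]

def encode_dag(node_ops, edges):
    """Encode a DAG as a one-hot operation matrix and a connection matrix.

    node_ops: list of operation names, one per node.
    edges: list of (src, dst) pairs; dst executes after src.
    """
    n = len(node_ops)
    op_matrix = np.zeros((n, len(OPS)), dtype=int)
    for i, op in enumerate(node_ops):
        op_matrix[i, OPS.index(op)] = 1  # one-hot row per node
    conn_matrix = np.zeros((n, n), dtype=int)
    for src, dst in edges:
        conn_matrix[src, dst] = 1  # directed edge src -> dst
    return op_matrix, conn_matrix

# Illustrative 4-node graph: input -> conv -> pool -> output
ops = ["input", "conv3x3", "pool", "output"]
edges = [(0, 1), (1, 2), (2, 3)]
op_m, conn_m = encode_dag(ops, edges)
```

Together, the pair (op_m, conn_m) uniquely identifies the directed acyclic graph and hence the network structure.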
Furthermore, according to some embodiments of the present document, network structures based on the network size may include, for example, a network structure represented by a one-dimensional vector. In the case of the network structure represented by the one-dimensional vector, the topological structure of the network is not considered; only the size of the network, such as its width and depth, is of interest.
According to some embodiments of the present document, a network structure based on the network size may be represented by a one-dimensional vector that may be constructed by concatenating numerical values representing the sizes of the neural network characterized by the network structure at different stages. For example, if the neural network characterized by the network structure has four stages, the width at each stage is 64, 128, 256, and 512 sequentially, and the depth at each stage is 4, 3, 3, and 4 sequentially, then the network structure can be represented by a one-dimensional vector, i.e., {64, 128, 256, 512, 4, 3, 3, 4}, constructed by concatenating the above values.
Thus, according to some embodiments of the present document, a network structure based on the network size may be represented as a vector v=[w1,w2,...wS,d1,d2,...dS], where ws and ds represent the width and depth of the network structure at the s-th stage, respectively, 1≤s≤S, and S represents the total number of stages of the network structure. Thus, according to some embodiments of the present document, the network structure represented by the vector v can be updated by updating the vector v.
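The construction of the vector v from the example above can be sketched as follows (the helper name `encode_size` is an assumption for illustration):

```python
def encode_size(widths, depths):
    """Concatenate per-stage widths and depths into the vector v."""
    assert len(widths) == len(depths), "one width and one depth per stage"
    return list(widths) + list(depths)

# Four stages with widths 64/128/256/512 and depths 4/3/3/4.
v = encode_size([64, 128, 256, 512], [4, 3, 3, 4])
# v == [64, 128, 256, 512, 4, 3, 3, 4]
```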
It will be appreciated by those skilled in the art that the network structure characterizing the neural network is not limited to those defined by an encoding manner such as a matrix based on the network topology, or a vector based on the network size described above as an example. Given the teachings and concepts of the present document, one of ordinary skill in the art may devise other encoding solutions to define the network structure characterizing the neural network, and all such variations are intended to be within the scope of the present document.
Next, according to some embodiments of the present document, in step S110 of the method 100, a plurality of neural networks are trained for a plurality of performance parameters to obtain a plurality of parameter values for each performance parameter.
A plurality of network structures characterizing the plurality of neural networks may be selected (e.g., sampled) from the set of network structures determined in step S105 and trained to obtain a plurality of parameter values for a plurality of performance parameters. The parameter values may be divided into a plurality of groups, each group including a number of parameter values, with one performance parameter corresponding to one group of parameter values. In other words, each trained neural network generates one parameter value for each performance parameter. For any of the plurality of performance parameters, a group of parameter values is obtained by training the plurality of network structures, where the number of parameter values in the group equals the number of trained neural networks. Since the training of neural networks is known to those skilled in the art, its details are not described in greater detail herein for the sake of brevity.
According to some embodiments of the present document, the plurality of performance parameters may include at least two of an accuracy, a number of parameters, an amount of delay at run-time, and an amount of computation needed at run-time (e.g., a number of floating-point operations) of the neural network for a particular target task.
According to some embodiments of the present document, examples of the particular target task may be data classification (e.g., image analysis), semantic segmentation, target detection, etc.
For example, in the case where the particular target task is target detection, a first performance parameter may be the accuracy for target detection of a corresponding trained neural network, a second performance parameter may be the number of parameters (such as weights) of the corresponding trained neural network, a third performance parameter may be the amount of delay at run-time when the corresponding trained neural network performs target detection, and a fourth performance parameter may be the amount of computation needed when the corresponding trained neural network performs target detection. It will be understood by those skilled in the art that there may be more or fewer performance parameters, not limited to four.
According to some embodiments of the present document, in step S110, it is assumed that L network structures (represented by grey boxes in
According to some embodiments of the present document, through the operation performed in step S110 above, for each network structure selected from the set of network structures, parameter values of a plurality of performance parameters corresponding to the network structure may be obtained such that the network structure and the parameter values of the corresponding performance parameters constitute data pairs. For example, assuming that the neural networks characterized by the selected network structures are trained for four performance parameters, a plurality of data pairs, such as a first data pair (ai (or vi), Pi1), a second data pair (ai (or vi), Pi2), a third data pair (ai (or vi), Pi3), and a fourth data pair (ai (or vi), Pi4), can be obtained by training a selected i-th (1≤i≤L) network structure, where Pi1 represents the parameter value of the first performance parameter, Pi2 represents the parameter value of the second performance parameter, Pi3 represents the parameter value of the third performance parameter, and Pi4 represents the parameter value of the fourth performance parameter. In this way, four groups of data pairs can be obtained, the first group including L first data pairs, the second group including L second data pairs, the third group including L third data pairs, and the fourth group including L fourth data pairs.
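The grouping of data pairs in step S110 can be sketched as follows; the training routine is replaced by a stand-in stub returning random values, and the structure names and group count are illustrative assumptions:

```python
import random

def train_and_measure(structure):
    """Stand-in for actually training the neural network characterized by
    `structure` and measuring it; returns one value per performance
    parameter (e.g., accuracy, #params, latency, FLOPs).
    Random values are used here purely for illustration."""
    return [random.random() for _ in range(4)]

structures = [f"net_{i}" for i in range(8)]  # L sampled structures (L = 8)
# groups[k] collects the k-th group of data pairs (a_i, P_ik).
groups = [[] for _ in range(4)]
for s in structures:
    values = train_and_measure(s)
    for k, p in enumerate(values):
        groups[k].append((s, p))
```

Each group then holds exactly L data pairs, one per trained structure, and serves as the training set for one predictor in step S120.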
Next, according to some embodiments of the present document, in step S120 of the method 100, a plurality of neural network predictors are trained based on the plurality of neural networks and the plurality of parameter values, where each neural network predictor is used for predicting one performance parameter for the neural networks. For example, a plurality of network structures corresponding to the plurality of neural networks and a corresponding plurality of groups of parameter values may be provided to the plurality of neural network predictors to train the neural network predictors. Each neural network predictor corresponds to one performance parameter, such that a group of parameter values obtained for a particular performance parameter is used to train one neural network predictor to which the performance parameter corresponds.
According to some embodiments of the present document, the number of neural network predictors trained in step S120 corresponds to the number of performance parameters. For example, if in step S110 parameter values are obtained only for two performance parameters, that is, two groups of parameter values are obtained, the number of neural network predictors trained in step S120 is also two. If parameter values are obtained for four performance parameters in step S110, that is, four groups of parameter values are obtained, the number of neural network predictors trained in step S120 is four.
Note that each neural network predictor corresponds to one performance parameter, and different neural network predictors correspond to different performance parameters. Thus, each neural network predictor is used to predict a parameter value of one performance parameter for the neural network, and different neural network predictors are used to predict parameter values of different performance parameters for the neural network. As described above, it is assumed that, in step S110, L network structures are selected from the set of network structures defined in step S105, the neural networks characterized by the L network structures are trained for four performance parameters, and four groups of data pairs are obtained; the four groups of data pairs are respectively used for training a corresponding first neural network predictor, second neural network predictor, third neural network predictor, and fourth neural network predictor.
According to some embodiments of the present document, a neural network predictor may be trained through a regression analysis method using Huber loss. Since regression analysis using Huber loss is known to those skilled in the art, its details are not described in more detail herein for the sake of brevity. Moreover, those skilled in the art will recognize that while embodiments of the present document are described above by taking regression analysis with Huber loss as an example, the present document is not so limited. In light of the teachings and concepts of the present document, one of ordinary skill in the art can devise other methods to train corresponding neural network predictors based on data pairs, and all such variations are intended to be within the scope of the present document.
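A minimal sketch of predictor training with Huber loss follows. It fits a linear predictor by gradient descent; the linear model, learning rate, and synthetic data are illustrative assumptions (an actual predictor could be any regression model):

```python
import numpy as np

def huber_grad(residual, delta=1.0):
    """Gradient of the Huber loss w.r.t. the prediction: linear in the
    residual near zero, clipped to +/- delta for outliers."""
    return np.where(np.abs(residual) <= delta,
                    residual, delta * np.sign(residual))

def fit_predictor(X, y, lr=0.01, steps=2000, delta=1.0):
    """Fit a linear predictor by minimizing Huber loss with gradient
    descent; rows of X are flattened structure encodings, y holds the
    measured parameter values of one performance parameter."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        r = X @ w + b - y            # per-sample residuals
        g = huber_grad(r, delta)     # robust per-sample gradient
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

# Illustrative data: encodings with a linear relation to the metric.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + 0.7
w, b = fit_predictor(X, y)
```

The clipping in `huber_grad` is what makes the fit robust to the occasional badly measured sample, which is why Huber loss is a common choice for performance predictors trained on few samples.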
According to some embodiments of the present document, the neural network predictor trained in step S120 may be used to predict a performance parameter of the neural network. In other words, the trained neural network predictor may predict the performance parameters of each network structure in the set of network structures defined in step S105. Specifically, for example, if the first neural network predictor is trained using L network structures selected from the set of network structures and a group of parameter values of the corresponding first performance parameter (e.g., the first data pair (ai (or vi), Pi1) described above), then the first neural network predictor may be used to predict the first performance parameter of network structures in the set other than the L network structures. In fact, the neural network predictor can learn, through training, the underlying relationship between network structures and their performance, and can then drive updates of the network structures so as to obtain network structures with higher predicted performance. In other embodiments, the trained neural network predictor may predict the performance parameter not only of each network structure in the set of network structures defined in step S105, but also of other network structures associated with the network structures in the set (e.g., network structures generated through multiple iterations using the trained neural network predictors, described below, which may include network structures associated with, but not belonging to, the network structures in the set).
Furthermore, according to some embodiments of the present document, the plurality of neural network predictors trained in step S120 may include a main predictor and at least one auxiliary predictor. According to some embodiments of the present document, the selection of the main predictor or auxiliary predictors may be determined according to the particular target task. For example, for a particular target task that is accuracy-sensitive, the main predictor may be a predictor that predicts the performance parameter of accuracy for a neural network; the auxiliary predictor may be a predictor that predicts other performance parameters for the neural network, such as the number of parameters, the amount of delay at run-time, or the amount of computation needed at run-time. According to some embodiments of the present document, the main predictor may play a dominant role in determining the final network structure (i.e., a target neural network), while the auxiliary predictor may play a subordinate role in determining the final network structure. This will be described in further detail below.
Next, according to some embodiments of the present document, in step S130 of the method 100, a target neural network is determined using a trained plurality of neural network predictors.
Specifically, according to some embodiments of the present document, in step S130, the target neural network may be determined using the trained plurality of neural network predictors, including the main predictor and the auxiliary predictor. As shown in
In sub-step S131, multiple iterations are performed using the trained neural network predictors, where the number of iterations may be determined empirically. In each iteration, the trained plurality of neural network predictors are used to determine a plurality of gradient structures respectively corresponding to the plurality of neural network predictors based on the network structure obtained in a previous iteration, and a network structure for this iteration is obtained based on the network structure obtained in the previous iteration and the plurality of gradient structures.
According to some embodiments of the present document, in each iteration, different weights are assigned to the gradient structures corresponding to the main predictor and the auxiliary predictor, respectively. In other words, the weights reflect the different roles that the main predictor and the auxiliary predictor play in determining the final network structure (i.e., the target neural network). For example, according to some embodiments of the present document, in each iteration, a relatively large weight may be assigned to the gradient structure corresponding to the main predictor and a relatively small weight may be assigned to the gradient structure corresponding to the auxiliary predictor, that is, the weight assigned to the gradient structure corresponding to the auxiliary predictor is smaller than the weight assigned to the gradient structure corresponding to the main predictor.
Specifically, according to some embodiments of the present document, the above iterations may be represented by Equation (1) below:

at+1 = PΩ(at + η(∂fm(at)/∂at + w·∂faux(at)/∂at))   (1)

In Equation (1), a represents a network structure encoded as a matrix. In other embodiments of the present document, the encoded matrix a of a network structure in Equation (1) may be replaced with an encoded vector v of a network structure. In Equation (1), at+1 represents the network structure for this iteration, and at represents the network structure obtained in the previous iteration. Furthermore, in Equation (1), PΩ is a function of projecting a network structure in an encoded form back into the search space (i.e., the set of network structures determined in step S105), η is a learning rate, the subscript m denotes the main predictor, and the subscript aux denotes the auxiliary predictor. In addition, ∂fm(at)/∂at represents a gradient structure corresponding to the main predictor, ∂faux(at)/∂at represents a gradient structure corresponding to the auxiliary predictor, and w represents a weight corresponding to the auxiliary predictor (or a weight of the gradient structure corresponding to the auxiliary predictor) and may have any value selected empirically. The value of w may be determined empirically, for example, according to the desired number of parameters or throughput of the neural network (e.g., the neural network corresponding to the network structure identified after searching). Note that in Equation (1), the weight corresponding to the main predictor (or the weight of the gradient structure corresponding to the main predictor) is 1; those skilled in the art will understand that the weight corresponding to the main predictor may also be any value selected empirically.
Those skilled in the art will recognize that although Equation (1) includes only one gradient structure corresponding to an auxiliary predictor, the present document is not so limited. According to the teachings and concepts of the present document, Equation (1) may also include a plurality of gradient structures corresponding to auxiliary predictors, where the number of the gradient structures corresponds to the number of the auxiliary predictors, and each of the plurality of gradient structures corresponding to the auxiliary predictors has a corresponding weight. According to some embodiments of the present document, the value of the weight of the gradient structure corresponding to the auxiliary predictor may be determined according to the particular target task.
According to some embodiments of the present document, searching under constraints for a network structure is achieved by adding gradient structure terms corresponding to a plurality of neural network predictors into Equation (1). In other words, by converting the constraint into the gradient structure term corresponding to the auxiliary predictor, the efficiency of searching can be improved while satisfying the preset constraint.
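The weighted-gradient iteration can be sketched numerically as follows. Here simple quadratic surrogates stand in for the trained main and auxiliary predictors, and clipping stands in for the projection PΩ; the optima c_m and c_aux, the learning rate, and the weight are illustrative assumptions:

```python
import numpy as np

# Surrogate predictors (assumptions): the main predictor rewards
# proximity to c_m, the auxiliary predictor proximity to c_aux.
c_m = np.array([3.0, 3.0])
c_aux = np.array([1.0, 1.0])

def grad_main(a):   # gradient of f_m(a) = -||a - c_m||^2
    return -2.0 * (a - c_m)

def grad_aux(a):    # gradient of f_aux(a) = -||a - c_aux||^2
    return -2.0 * (a - c_aux)

def project(a, lo=0.0, hi=4.0):
    """Stand-in for the projection P_Omega: clip the candidate encoding
    back into the (here box-shaped) search space."""
    return np.clip(a, lo, hi)

a = np.array([0.0, 4.0])   # a_0: randomly chosen initial structure
eta, w = 0.1, 0.3          # learning rate and auxiliary weight
for _ in range(50):        # number of iterations chosen empirically
    a = project(a + eta * (grad_main(a) + w * grad_aux(a)))
```

With these surrogates the iterate converges to a compromise between the two optima, (c_m + w·c_aux)/(1 + w), illustrating how the auxiliary term pulls the search toward structures satisfying the constraint while the main term dominates.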
According to some embodiments of the present document, in a first iteration, a network structure, which may be denoted as a0, may be randomly selected (e.g., sampled) from the set of network structures, i.e., the search space, as an initial point for the iteration. Subsequently, the network structure is updated using Equation (1).
According to some embodiments of the present document, as shown in Equation (1), the step of obtaining the network structure at+1 for this iteration based on the network structure at obtained in the previous iteration and the gradient structures ∂fm(at)/∂at and ∂faux(at)/∂at may include: modifying the network structure at obtained in the previous iteration using the gradient structures; determining whether the modified network structure, for example, at + η(∂fm(at)/∂at + w·∂faux(at)/∂at), belongs to the set of network structures; and in response to the modified network structure not belonging to the set of network structures, projecting the modified network structure onto the set of network structures to obtain the network structure at+1 for this iteration. In this regard, it will be appreciated by those skilled in the art that the function PΩ serves to avoid a situation in which the modified network structure falls outside the set of network structures (i.e., the search space). According to some embodiments of the present document, the function PΩ may be, for example, the argmax function.
Specifically, according to some embodiments of the present document, when the modified network structure falls outside the set of network structures, a network structure from the set that is closest to the modified network structure may be determined as the network structure at+1 for this iteration. For example, a network structure from the set of network structures, i.e., the search space, that has the shortest distance from the modified network structure may be determined as the network structure at+1 for this iteration, and the distance may be, for example, a Euclidean distance. In some embodiments, if the modified network structure falls outside the set of network structures, the modified network structure may be subjected to a rounding operation to obtain a corresponding network structure in the set. In some embodiments, the modified network structure is still determined as the network structure for this iteration if its distance to the set of network structures is within a preset threshold range, even though it falls outside the set; this is particularly applicable where the number of network structures in the set is small.
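The nearest-structure projection described above can be sketched as follows; the toy width/depth search space and the helper name `project_to_set` are illustrative assumptions:

```python
import numpy as np

def project_to_set(candidate, structures):
    """Project a modified (possibly out-of-space) encoding onto the
    nearest structure in the search space by Euclidean distance."""
    structures = np.asarray(structures, dtype=float)
    dists = np.linalg.norm(structures - candidate, axis=1)
    return structures[int(np.argmin(dists))]

# Toy search space of [width, depth] encodings.
space = [[64, 4], [128, 3], [256, 3]]
nearest = project_to_set(np.array([120.0, 3.4]), space)
```

The candidate [120.0, 3.4] produced by a gradient step is not itself a valid structure, so the projection snaps it to the closest member of the set.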
In sub-step S132, the target neural network is determined. In this step, a network structure characterizing the target neural network may be selected according to a predetermined rule from the network structures obtained in the multiple iterations in sub-step S131.
According to some embodiments of the present document, the neural networks characterized by the network structures obtained through the multiple iterations can be trained for the performance parameter corresponding to the main predictor (namely, the performance parameter predicted by the main predictor) to obtain a parameter value for each network structure. A network structure is then selected according to the parameter values (for example, the network structure corresponding to the maximum parameter value is selected), and the neural network characterized by that network structure is taken as the target neural network. For example, the network structure obtained in each iteration described above may be determined as a candidate target network structure. That is to say, J candidate target network structures can be obtained in J iterations (and can constitute a set of candidate target network structures). According to some embodiments of the present document, the neural networks characterized by the J candidate target network structures can be trained, and an optimal candidate target network structure is determined as the network structure characterizing the target neural network based on a comparison of the parameter values of a performance parameter (e.g., the performance parameter corresponding to the main predictor, such as accuracy) of the J trained neural networks.
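The selection rule above (train each candidate, then keep the structure with the best value of the main performance parameter) can be sketched as follows; `train_and_evaluate` is a hypothetical callable standing in for training a candidate network and measuring, e.g., its accuracy.

```python
def select_target_structure(candidates, train_and_evaluate):
    """Pick the target network structure from J candidates.

    candidates: the J candidate target network structures collected
        over the J iterations.
    train_and_evaluate: hypothetical callable that trains the neural
        network characterized by a structure and returns the value of
        the main performance parameter (e.g., accuracy).
    """
    scores = [train_and_evaluate(a) for a in candidates]
    # The structure with the maximum parameter value characterizes
    # the target neural network.
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

Any comparison rule over the measured parameter values could be substituted for the maximum here, depending on the performance parameter being compared.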
According to the technique for generating a neural network of the present document, automatic searching for a network structure satisfying a preset constraint can be achieved for different tasks in a search space of network structures while consuming fewer computational resources. Specifically, according to the technique for generating a neural network herein, efficient searching for a network structure is achieved without training a large number of samples, by introducing a search strategy based on gradient updates. For example, according to the technique for generating a neural network herein, a neural network structure with good performance that meets the constraint can be found using only a few tens of samples, enabling cost-effective automatic searching for a network structure without relying on manual design of the neural network.
In sub-step S231, a group of network structures is obtained using the trained neural network predictors. In this step, to obtain the group of network structures, multiple iterations are performed using the trained neural network predictors, where the number of iterations can be determined empirically. This step is the same as sub-step S131 of the method 100 described above.
In sub-step S236, the neural network predictors are retrained. This step is similar to step S120 of the method 100 described above.
In some embodiments, the neural network predictors are trained based on the network structures obtained in sub-step S231 and the parameter values obtained in sub-step S235. In some embodiments, the neural network predictors are trained based not only on the network structures obtained in sub-step S231 and the corresponding parameter values obtained in sub-step S235, but also on the network structures selected from the set of network structures in step S110 and the corresponding parameter values obtained in step S110.
In sub-step S237, a determination is made as to whether the neural network predictors have been retrained for a predetermined number of times, where the predetermined number of times may be determined empirically and may be any integer greater than or equal to 1. A counter may be provided in some embodiments to count the number of times the neural network predictors are retrained. An initial value of the counter is 0, and upon each iteration through sub-step S236, the value of the counter is incremented by 1.
If, in sub-step S237, a determination is made that the neural network predictors have been retrained for the predetermined number of times, the method proceeds to sub-step S232 to determine the target neural network. Sub-step S232 is the same as sub-step S132 of the method 100 described above.
If a determination is made in sub-step S237 that the neural network predictors have not been retrained for the predetermined number of times, the method returns to sub-step S231 to begin the next iteration for training the neural network predictors.
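The control flow of sub-steps S231 through S237 amounts to a counted retraining loop. The following is a minimal sketch under that reading, in which `search_structures`, `train_and_measure`, and `retrain_predictors` are hypothetical callables standing in for sub-steps S231, S235, and S236, respectively.

```python
def predictor_retraining_loop(predictors, search_structures,
                              train_and_measure, retrain_predictors,
                              predetermined_times):
    """Retrain the neural network predictors a predetermined number of times."""
    history = []   # accumulates (network structures, parameter values) pairs
    count = 0      # counter described in sub-step S237, initially 0
    while count < predetermined_times:                        # S237 check
        structures = search_structures(predictors)            # S231
        values = train_and_measure(structures)                # S235
        history.append((structures, values))
        # Retrain on all data gathered so far, as in sub-step S236.
        predictors = retrain_predictors(predictors, history)
        count += 1                                            # one pass through S236
    # Proceed to sub-step S232 with the final predictors.
    return predictors, history
```

The initial structures and parameter values from step S110 could be seeded into `history` before the loop, matching the embodiment that reuses them during retraining.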
Furthermore, the present document provides an apparatus 400 for generating a neural network.
As shown in the figure, the apparatus 400 includes a first training unit 410, a second training unit 420, and a first determination unit 430.
Furthermore, according to some embodiments of the present document, the apparatus 400 may optionally include a second determination unit 405, as indicated by a dashed box, configured to determine a set of network structures.
According to some embodiments of the present document, the second determination unit 405, the first training unit 410, the second training unit 420, and the first determination unit 430 included in the apparatus 400 above may respectively perform the operations in steps S105, S110, S120, and S130 included in the method 100 for generating a neural network described above.
According to the technique for generating a neural network of the present document, automatic searching for a network structure satisfying a preset constraint can be achieved for different tasks in a huge search space of network structures while consuming fewer computational resources. Specifically, according to the technique for generating a neural network herein, efficient searching for a network structure is achieved without training a large number of samples, by introducing a search strategy based on gradient updates. For example, according to the technique for generating a neural network herein, a neural network structure with good performance that meets the constraint can be found using only a few tens of samples, enabling cost-effective automatic searching for a network structure without relying on manual design of the neural network.
In
The following components are also connected to the input/output interface 505: an input component 506 (including a keyboard, a mouse, etc.), an output component 507 (including a display such as a CRT and an LCD, and a speaker, etc.), a storage component 508 (including a hard disk, etc.), and a communication component 509 (including a network interface card such as a LAN card, and a modem, etc.). The communication component 509 performs communication processing via a network such as the Internet. A drive 510 may also be connected to the input/output interface 505 as desired. A removable medium 511 such as a magnetic disk, optical disk, magneto-optical disk, and semiconductor memory may be installed on the drive 510 as desired so that a computer program read therefrom may be installed in the storage component 508 as desired.
In the case where the above-described series of processes is implemented by software, the program constituting the software may be installed from a network such as the Internet or from a storage medium such as the removable medium 511.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 511 described above.
Furthermore, the present document provides a program product storing machine-readable instruction code. The instruction code, when read and executed by a machine, may perform the data processing method and the method for generating a neural network according to the present document described above. Accordingly, the various storage media listed above for carrying such a program product are also included within the scope of the present document.
The technique for generating a neural network according to the present document may be applied to any technical field of information or data processing using neural networks. For example, according to some embodiments of the present document, data processing (e.g., image processing) may be performed using the target neural network determined by the method and apparatus for generating a neural network described above to enable, for example, data classification (e.g., image classification), semantic segmentation, and/or object detection.
For example, according to some embodiments of the present document, in the apparatus 400 for generating a neural network, the first training unit 410 may train a plurality of neural networks using labeled image data to obtain a plurality of parameter values for a plurality of performance parameters of the plurality of neural networks. The second training unit 420 may train, based on the plurality of neural networks and the plurality of parameter values, a plurality of neural network predictors configured to predict the performance parameters of the neural networks, the plurality of neural network predictors including a main predictor and auxiliary predictors. The first determination unit 430 may determine the target neural network using the trained plurality of neural network predictors. The target neural network as determined may be used to perform image classification, semantic segmentation, and/or object detection.
The foregoing detailed description has described the implementations of the apparatus and/or method according to embodiments of the present document through block diagrams, flowcharts, and/or embodiments. When such block diagrams, flowcharts, and/or embodiments include one or more functions and/or operations, those skilled in the art will appreciate that each function and/or operation in such block diagrams, flowcharts, and/or embodiments may be implemented individually and/or collectively by various hardware, software, firmware, or virtually any combination thereof. In some embodiments, portions of the subject matter described in this specification may be implemented in the form of an application-specific integrated circuit (ASIC), field programmable gate array (FPGA), digital signal processor (DSP), or other integrated forms. However, those skilled in the art will recognize that some aspects of the embodiments described in this specification can be equivalently implemented, in whole or in part, in the form of one or more computer programs running on one or more computers (e.g., in the form of one or more computer programs running on one or more computer systems), in the form of one or more programs running on one or more processors (e.g., in the form of one or more programs running on one or more microprocessors), in the form of firmware, or substantially any combination thereof. Moreover, it is well within the ability of those skilled in the art, given this document, to design circuitry and/or write code for the software and/or firmware of the present document.
Although the present document is described above through the detailed description of embodiments thereof, it should be understood that various modifications, improvements, or equivalents thereof may be devised by those skilled in the art within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents shall also be considered to be within the scope of this document.
Number | Date | Country | Kind |
---|---|---|---|
202210278192.7 | Mar 2022 | CN | national |