The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. 10 2021 207 937.7 filed on Jul. 23, 2021, which is expressly incorporated herein by reference in its entirety.
The present invention relates to a method for creating a machine learning system, for example, for segmentation and object detection, a corresponding computer program and a machine-readable memory medium including the computer program.
The aim of an architecture search for neural networks is to fully automatically find a preferably good network architecture in terms of a key performance indicator/metric for a predefined data set.
In order to design the automatic architecture search in a computationally efficient manner, various architectures in the search space may share the weights of their operations as, for example, in a One-Shot NAS model, described by Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018), "Efficient neural architecture search via parameter sharing," arXiv preprint arXiv:1802.03268.
The One-Shot model in this case is typically constructed as a directed graph, in which the nodes represent data and the edges represent operations, which represent a calculation rule that converts the input node of the edge into the output node. The search space in this case is made up of subgraphs (for example, paths) in the One-Shot model. Since the One-Shot model may be very large, individual architectures may be drawn, in particular, randomly from the One-Shot model for the training such as, for example, described by Cai, H., Zhu, L., & Han, S. (2018), “Proxylessnas: Direct neural architecture search on target task and hardware,” arXiv preprint arXiv:1812.00332. This typically occurs by drawing a single path from an established input node to an output node of the network such as, for example, described by Guo, Z., Zhang, X., Mu, H., Heng, W., Liu, Z., Wei, Y., & Sun, J. (2019), “Single path one-shot neural architecture search with uniform sampling,” arXiv preprint arXiv:1904.00420.
For particular tasks such as object detection, or in the case of multi-task networks, it is necessary for the network to include multiple outputs. In this case, gradient-based training of the complete One-Shot model may be modified for this case such as, for example, described by Chen, W., Gong, X., Liu, X., Zhang, Q., Li, Y., & Wang, Z. (2019), "FasterSeg: Searching for Faster Real-Time Semantic Segmentation," arXiv preprint arXiv:1912.10917. This, however, is in turn not memory-efficient and does not address the drawing of architectures with branches and with different outputs during the training within the scope of the architecture search.
The authors Cai, et al. describe in their paper "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware," retrievable online at: https://arxiv.org/abs/1812.00332, an architecture search, which takes hardware properties into account.
The present invention improves the architecture search for multi-task networks including multiple outputs, whose architectures include multiple paths, by drawing paths from a One-Shot model at the start of the architecture search without implicitly favoring individual paths. In this way, all architectures of the search space are initially drawn with an equal probability and the search space is thus impartially explored. This has the advantage that ultimately better architectures may be found for multi-task networks, which would not be discovered during normal initialization of the (draw) probabilities of the edges and/or nodes of the One-Shot model.
In a first aspect, the present invention relates to a computer-implemented method for creating a machine learning system, which is configured for the segmentation and object detection of images, the machine learning system including an input for receiving the image and one or multiple outputs, a first output outputting the segmentation of the image and a second output outputting the object detection. The second output may alternatively output another object description of the objects on the image, such as an object classification.
In accordance with an example embodiment of the present invention, the method includes the following steps: providing a directed graph, the directed graph including an input and output node and a plurality of further nodes. It is noted that it is also possible that the graph includes a plurality of input nodes and/or output nodes. The input and output nodes are connected via the further nodes with the aid of directed edges and the nodes represent data and the edges represent operations, which may be calculation rules that convert a first node of the edges into a further node connected to the respective edge. It is noted that the assignment of the data and calculation rules may also be reversed.
The edges are each assigned a probability, which characterizes with what probability the respective edge is drawn, in particular, when selecting a path through the graph. The selection of the path is preferably a random, in particular iterative, drawing of the edges as a function of their probabilities.
This is followed by a selection of a path through the graph. In the process, a subset is initially determined from the plurality of the further nodes, which all satisfy a predefined property, for example, with respect to a data resolution, or a depth of the nodes relative to the input node, in order to ensure a predefined receptive field. At least one additional node (Node of Interest, NOI) is selected from this subset, which may serve directly as a second output. A path through the graph from the input node along the edges via the additional node (NOI) up to the output node is thereupon selected. The path may be understood here to mean a sequence of selected nodes of the directed graph, in which in each case two consecutive nodes are connected by one edge. The path in this case may fully characterize the architecture if the architecture includes only one path through the directed graph. It is also possible that the path describes only a part of the architecture, namely when the architecture includes multiple paths through the graph.
It is noted that this additional node may serve as an output and here, for example, an object classification is output, or that an object classification head (detection head) may be connected at this output, which then ascertains the object classification, in particular, object description, as a function of the output of the additional node.
In accordance with an example embodiment of the present invention, the selection of the path may take place iteratively, at each node the following edge being randomly selected, as a function of its assigned probability, from the possible following edges that are connected to this node. This process is also referred to below as drawing the path. The path thus represents a direct connection of the input node to the output node.
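The iterative drawing described above may be sketched as follows; the graph, node names, and probabilities are illustrative assumptions, not the structure of an actual One-Shot model:

```python
import random

# Hypothetical One-Shot graph: each node maps to its outgoing edges,
# each edge carrying an assigned draw probability.
graph = {
    "in":  [("a", 0.5), ("b", 0.5)],
    "a":   [("out", 1.0)],
    "b":   [("out", 1.0)],
    "out": [],
}

def draw_path(graph, start="in", end="out"):
    """Iteratively draw a path: at each node, the following edge is
    selected at random as a function of its assigned probability."""
    path, node = [start], start
    while node != end:
        successors, weights = zip(*graph[node])
        node = random.choices(successors, weights=weights, k=1)[0]
        path.append(node)
    return path

print(draw_path(graph))  # e.g. ['in', 'a', 'out']
```

The drawn node sequence fully determines one architecture when the architecture consists of a single path.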
If the architecture is to include multiple paths, correspondingly multiple repetitions of the preceding step “selecting a path” may take place and a creation of the machine learning system may subsequently take place based on the multiple drawn paths.
In accordance with an example embodiment of the present invention, a creation of a machine learning system then follows as a function of the architecture corresponding to the selected paths and training of the created machine learning system, adapted parameters of the machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the architecture being adapted. The adaptation of the probabilities may take place in this case via a so-called Black-Box optimizer, which applies the REINFORCE algorithm, for example, (see in this regard, for example, the above-mentioned paper “ProxylessNAS”), in order to estimate gradients for adapting the probabilities of the edges.
The steps of selecting the path/the paths and creating the machine learning system as well as the training may be carried out multiple times in succession, preferably until a change of the parameters is sufficiently small.
The drawing of the path in the last step, in particular after the optimization of the probabilities of the edges, may take place randomly or the edges with the highest probabilities are specifically selected.
In accordance with the present invention, a particular feature of the method is that the directed graph is provided in such a way that the probabilities of the edges are initially set in such a way that all paths through the directed graph, in particular architectures, are drawn with equal probability. "Initially" may be understood here to mean that this initialization of the probabilities of the edges takes place before the first step of training is carried out. During training, these probabilities may be adapted so that after the training process, high probabilities of the edges signal that these edges are relevant for the architecture of the machine learning system. This particular initializing of the probabilities yields the advantage that as a result of the initial normalization of the probabilities before the training, it is ensured that an exploration of all possible architectures is impartially carried out. This results in the advantageous effect that as a result of the particular initializing, better architectures may be discovered, which would not have been identified without the normalization, since these are initially not explored or only insufficiently explored.
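This initialization may be illustrated with a small sketch (graph and node names are hypothetical): if each edge's draw probability is set proportional to the number of paths that continue through it, every complete path through the graph becomes equally likely, which a uniform per-node initialization would not achieve.

```python
from math import prod

# Hypothetical search graph: node 'b' has two continuations, 'a' only one,
# so a uniform per-node choice would favor the single path via 'a'.
graph = {"in": ["a", "b"], "a": ["out"], "b": ["c", "d"],
         "c": ["out"], "d": ["out"], "out": []}

def paths_to_out(v):
    """Number of directed paths from v to the output node."""
    if v == "out":
        return 1
    return sum(paths_to_out(w) for w in graph[v])

# Initialize each edge (u, v) proportionally to the number of paths that
# continue through v, rather than uniformly across the outgoing edges.
prob = {(u, v): paths_to_out(v) / paths_to_out(u)
        for u, vs in graph.items() for v in vs}

def all_paths(v="in"):
    if v == "out":
        yield ["out"]
    for w in graph[v]:
        for rest in all_paths(w):
            yield [v] + rest

for p in all_paths():
    pr = prod(prob[(u, v)] for u, v in zip(p, p[1:]))
    print(p, pr)  # each of the three paths has probability 1/3
```

With this initialization, the path via 'a' and both paths via 'b' are each drawn with probability 1/3, i.e., the search space is explored impartially.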
Thus, it may be said that the provided method has the advantage that with this method, a particularly efficient machine learning system, in particular, an artificial neural network, may be discovered for multi-task tasks for image processing (for example, gesture recognition or object distance estimation, etc.).
In addition or alternatively, the tasks for the artificial neural network may be as follows: natural language processing, autoencoder, generative models, etc., the different outputs each characterizing different properties of the input signal with respect to the task.
In accordance with an example embodiment of the present invention, it is provided that for each node of the subset, a total number of first subpaths from the respective node of the subset up to the input node and a total number of second subpaths to the output node are counted. The probabilities of those edges contained in the first subpaths are preferably ascertained as a function of the total number of the first subpaths and the probabilities of those edges contained in the second subpaths are ascertained as a function of the total number of the second subpaths. For example, the probabilities of those edges contained in the first subpaths may be set in each case initially to a number of the possible paths, which connect the input node to the respective node of the subset and which extend over the respective edge, divided by the total number of the first subpaths. In the same way, the probabilities of those edges contained in the second subpaths are set in each case initially to a number of the possible paths, which connect the output node to the respective node of the subset and which extend over the respective edge, divided by the total number of the second subpaths.
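The counting of subpaths and the resulting initialization may be sketched as follows (the DAG and the position of the NOI are hypothetical): the number of paths between two nodes is counted by dynamic programming, and the probability of an edge on a first subpath is the number of input-to-NOI paths running over that edge, divided by the total number of first subpaths.

```python
from functools import lru_cache

# Hypothetical DAG (not the patent's figure): edges point from the input
# toward the output, with one candidate NOI at partial depth.
edges = {"in": ["a", "b"], "a": ["noi"], "b": ["noi"],
         "noi": ["c", "d"], "c": ["out"], "d": ["out"], "out": []}

def count_paths(src, dst):
    """Number of directed paths from src to dst, via dynamic programming."""
    @lru_cache(maxsize=None)
    def n(v):
        if v == dst:
            return 1
        return sum(n(w) for w in edges[v])
    return n(src)

# Total number of first subpaths (input -> NOI) and second subpaths (NOI -> output).
total_first = count_paths("in", "noi")
total_second = count_paths("noi", "out")

def edge_prob_first(u, v):
    """Initial probability of edge (u, v) on a first subpath: the number of
    input-to-NOI paths running over this edge, divided by the total."""
    return count_paths("in", u) * count_paths(v, "noi") / total_first

print(total_first, total_second)   # 2 2
print(edge_prob_first("in", "a"))  # 0.5
```

Since the counts for the first and second subpaths are independent of one another, they may also be computed in parallel, as noted below.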
It is noted that the ascertainment of the total number of the subpaths may take place in such a way that one first subpath is initially created starting from the additional node (NOI) backwards through the graph to the input node, and one second subpath is created from the additional node (NOI) to the output node. This procedure is then repeated until all possible first and second subpaths have been detected. For this purpose, it is noted that the first subpath and the second subpath together result in the path.
The separate searching of first and second subpaths and the related procedure in order to achieve the normalization of the draw probabilities has the advantage that with respect to the corresponding NOI, it is possible to thereby select the suitable probabilities. Furthermore, this procedure may be carried out in parallel, since the separate searching of the subpaths is independent of one another. The method may thus be particularly easily carried out on parallel computing architectures.
It is further provided that the further nodes of the subset, all of which satisfy a predefined property with respect to a data resolution, are also each assigned a probability, this probability being normalized. The probabilities of the nodes of the subset are, in particular, normalized in such a way that the drawing of a path starting with the drawing of a node of the subset and then drawing of the path results in an equally probable drawing of all possible paths. The normalization of the probabilities of the nodes of the subset preferably takes place regardless of the normalization of the probabilities of the edges. “Normalized” may be understood to mean that a drawing of the respective elements is equally probable, i.e., initially there is no preference for certain NOIs and/or edges and/or paths present.
As with the probabilities of the edges, when drawing the path, the additional node (NOI) is drawn as a function of its assigned probability. Furthermore, this probability is also adapted during training, for example, with the Black-Box optimizer.
In accordance with an example embodiment of the present invention, it is further provided that the probabilities of the nodes of the subset of the further nodes are initially set to the number of paths through the respective node of the subset of the further nodes divided by the total number of paths through the directed graph.
In other words, the probability for the drawing of a node of the subset is then adjusted to the number of paths through the respective node of the subset divided by the total number of paths through all of the nodes of the subset.
Applying the approach of the separate searching of first and second subpaths, the probability of a node s of the subset may be defined by

p_s = (N_s^A · N_s^D) / Σ_s' (N_s'^A · N_s'^D),

N_s^A representing the number of first subpaths of node s, N_s^D representing the number of second subpaths, and the sum extending over all nodes s' of the subset.
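A minimal numerical sketch of this definition (the subpath counts are hypothetical): each candidate NOI s receives the initial probability N_s^A · N_s^D divided by the sum of these products over all candidates, since the number of complete paths through s is the product of its first and second subpath counts.

```python
# Hypothetical subpath counts (N_s^A, N_s^D) for three candidate NOIs;
# the number of complete paths through a node s is N_s^A * N_s^D.
subpath_counts = {"s1": (2, 3), "s2": (1, 4), "s3": (3, 1)}

total = sum(na * nd for na, nd in subpath_counts.values())
node_prob = {s: na * nd / total for s, (na, nd) in subpath_counts.items()}

print(node_prob)  # s1: 6/13, s2: 4/13, s3: 3/13
```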
A softmax function is preferably applied to the probabilities of the edges and/or the nodes of the subset, and the edges/nodes are then randomly drawn as a function of the outputs of the softmax function. This has the advantage that the softmax function guarantees that an accumulation of the probabilities of the edges/nodes of the subset always yields 1. This advantage produces the advantageous effect that the probabilistic character of the drawing of the paths is maintained and an optimal architecture is thus more reliably discovered.
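A minimal sketch of this normalization: a numerically stable softmax maps arbitrary (for example, learned) probability parameters onto a distribution that always sums to 1.

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical parameters of three outgoing edges of one node: the softmax
# output is a proper distribution, so the accumulated probability is 1.
print(softmax([0.0, 0.0, 0.0]))  # three equal probabilities of 1/3
```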
An advantage of a normalization of the probabilities of the additional nodes (NOIs) is that a full equal treatment of all possible architectures in the One-Shot model is achieved as a result, in particular, any architecture with a probability of one over the entire number of possible architectures of the One-Shot model may be drawn.
In accordance with an example embodiment of the present invention, it is further provided that the probabilities of the nodes (NOIs) of the subset of the further nodes of the graph are initially set to the same value, so that all nodes of the subset are initially drawn with the same probability. As a result, no longer are all architectures equally probable, but only all architectures that have the same NOIs. Nevertheless, it has been found that this configuration in numerous applications also results in good architectures. The previous step may thus be omitted, namely that the probability of the additional nodes (NOIs) is ascertained as a function of the number of paths through the additional nodes (NOIs).
In accordance with an example embodiment of the present invention, it is provided that at least two additional nodes (NOI) are selected and the architecture includes at least two paths, each of which extends via one of the additional nodes to the output node. The architecture thus includes at least one branch, according to which pieces of information, when propagated through the machine learning system, arrive via different ways at the outputs of the machine learning system. The second additional node may output a further image property. The two paths may be created independently of one another, starting at the additional nodes and extending back to the input node. It may be said that a subgraph is determined, which effectively describes at least two intersecting paths or one splitting path through the graph. It is further noted that the paths may each be made up of a first and a second subpath and may be drawn accordingly.
In accordance with an example embodiment of the present invention, it is further provided that when a second path of the two paths meets the previously drawn first path of the two paths, the remaining section of the first path is used for the second path.
In accordance with an example embodiment of the present invention, it is further provided that further paths up to the output node are created starting from the additional nodes. It is noted that the paths together result in a subgraph of the directed graph.
In accordance with an example embodiment of the present invention, it is further provided that further paths are drawn independently of one another, and when the paths meet, the previously drawn path continues to be used.
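The reuse rule may be sketched as follows (graph and node names are hypothetical): a second path is drawn edge by edge and, as soon as it meets a node of the previously drawn path, the remaining section of that path is adopted.

```python
import random

random.seed(2)  # reproducible sketch

# Hypothetical graph in which both branches meet again at node 'c'.
graph = {"in": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["out"], "out": []}

def draw_path_with_merge(graph, start, end, existing):
    """Draw a path node by node; as soon as it reaches a node of the
    previously drawn path, the remaining section of that path is reused."""
    path, node = [start], start
    while node != end:
        if node in existing and node != start:
            i = existing.index(node)
            return path[:-1] + existing[i:]  # adopt the tail of the first path
        node = random.choice(graph[node])
        path.append(node)
    return path

first = ["in", "a", "c", "out"]                        # previously drawn path
second = draw_path_with_merge(graph, "in", "out", first)
print(second)  # e.g. ['in', 'b', 'c', 'out']
```

From the meeting point onward, both paths share the same operations, which keeps the drawn subgraph small.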
An advantage of this is that smaller and, on average, better architectures may be discovered with this procedure.
In accordance with an example embodiment of the present invention, it is further provided that a cost function is optimized during the training of the machine learning system, the cost function including a first function, which evaluates an efficiency of the machine learning system with respect to its segmentation and object recognition/object description, and including a second function, which estimates a latency or the like of the machine learning system as a function of a length of the path and of the operations of the edges.
In accordance with an example embodiment of the present invention, it is further provided that when creating the machine learning system, at least one output layer is appended to the additional node (NOI). The output layer is preferably a softmax layer.
In further aspects, the present invention relates to a computer program, which is configured to carry out the above method and to a machine-readable memory medium, on which this computer program is stored.
Specific embodiments of the present invention are explained in greater detail below with reference to the figures.
In order to find good architectures of deep neural networks for a predefined data set, automatic methods for the architecture search, so-called neural architecture search methods, may be applied. For this purpose, a search space of possible architectures of neural networks is explicitly or implicitly defined.
The term “operation” will be used below for describing a search space, which describes a calculation rule that converts one or multiple n-dimensional input data tensors into one or multiple output data tensors and, in the process, may have adaptable parameters. In image processing, operations used are, for example, frequently convolutions with various kernel sizes and different types of convolutions (regular convolution, depthwise-separable convolution) and pooling operations.
Furthermore, a calculation graph (the so-called One-Shot model) will be defined, which includes all architectures in the search space as subgraphs. Since the One-Shot model may be very large, individual architectures may be drawn from the One-Shot model for the training. This occurs typically by drawing individual paths from an established input node to an established output node of the network.
In the simplest case, when the calculation graph is made up of a chain of nodes, each pair of which may be connected via various operations, it is sufficient to draw, for each two consecutive nodes, the operation which connects them.
If the One-Shot model is more generally a directed graph, a path may be drawn iteratively: starting at the input, the next node and the connecting operation are drawn, and this procedure is then continued iteratively up to the target node.
The One-Shot model with drawing may then be trained by drawing an architecture for each mini-batch and by adapting the weights of the operations in the drawn architecture with the aid of a standard gradient step method. Finding the best architecture may take place either as a separate step after the training of the weights or may be carried out alternatingly with the training of the weights.
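The alternating training may be sketched as follows; the graph, rewards, learning rate, and the simplified REINFORCE-style update are illustrative assumptions, not the exact procedure of the cited papers.

```python
import math
import random

random.seed(0)  # reproducible sketch

# Minimal sketch (an assumed setup): per-node logits parameterize the edge
# draw probabilities; after each drawn architecture, a REINFORCE-style
# update nudges the logits of the drawn edges in proportion to a reward
# (here a fixed lookup standing in for the trained network's performance).
graph = {"in": ["a", "b"], "a": ["out"], "b": ["out"], "out": []}
logits = {u: [0.0] * len(vs) for u, vs in graph.items() if vs}
reward = {"a": 1.0, "b": 0.2}  # hypothetical: the path via 'a' performs better

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def draw():
    """Draw one architecture (here: a single path) edge by edge."""
    u, path = "in", ["in"]
    while u != "out":
        p = softmax(logits[u])
        i = random.choices(range(len(p)), weights=p, k=1)[0]
        u = graph[u][i]
        path.append(u)
    return path

lr = 0.5
for _ in range(200):
    path = draw()
    r = reward.get(path[1], 0.0)  # reward of the drawn architecture
    for u, v in zip(path, path[1:]):
        p = softmax(logits[u])
        i = graph[u].index(v)
        # REINFORCE gradient of log p(edge i): one-hot(i) - p
        for j in range(len(p)):
            logits[u][j] += lr * r * ((1.0 if j == i else 0.0) - p[j])

print(softmax(logits["in"]))  # mass shifts toward the better edge 'a'
```

In practice, the reward would be the validation performance of the drawn architecture after a weight-training step on a mini-batch, and the weight updates and probability updates would alternate as described above.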
In order to draw architectures from a One-Shot model, which has branches and multiple outputs, a sampling model may be used in one specific embodiment for paths in the opposite direction. For this purpose, one path may be drawn for each output of the One-Shot model, which leads, starting from the output, to the input of the One-Shot model. For drawing the paths, the transposed One-Shot model may be considered for this purpose, in which all directed edges point in the opposite direction as in the original One-Shot model.
Once the first path has been drawn, it may happen that the next path reaches a node of the previous path. In this case, the drawing of the instantaneous path may be terminated, since a path from the shared node to the input already exists. Alternatively, it is possible to nevertheless draw the path further and to possibly obtain a second path to the input node.
In addition, the case is to be considered that the drawn architectures include one or multiple node(s) of the One-Shot model, which is/are not situated at the full depth of the network and is/are referred to below as NOI (“Nodes of Interest”), as well as an output at full depth of the One-Shot model. In this case, the creation of the path may take place by a back-directed drawing for the NOIs in order to connect these to the input. In addition, a forward-directed drawing is also carried out for each NOI, which leads to the output of the One-Shot model. As in the case of the back-directed drawing, the drawing in the case of the forward-directed drawing may be stopped once a path is reached that already leads to the output.
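The back-directed and forward-directed drawing for an NOI may be sketched as follows (graph and node names are hypothetical; the stop criterion is modeled by the target set):

```python
import random

random.seed(1)  # reproducible sketch

# Hypothetical One-Shot graph with an NOI at partial depth, and its
# transpose, in which all directed edges point in the opposite direction.
graph = {"in": ["a", "b"], "a": ["noi"], "b": ["noi"],
         "noi": ["c", "d"], "c": ["out"], "d": ["out"], "out": []}
transposed = {v: [] for v in graph}
for u, vs in graph.items():
    for v in vs:
        transposed[v].append(u)

def draw_until(adj, start, targets):
    """Random walk over adj from start until a node in targets is reached.
    The target set also models the stop rule: it may contain the nodes of
    an already drawn path, so drawing halts as soon as such a path is met."""
    path, node = [start], start
    while node not in targets:
        node = random.choice(adj[node])
        path.append(node)
    return path

# Back-directed drawing: connect the NOI to the input via the transposed graph.
back = draw_until(transposed, "noi", {"in"})
# Forward-directed drawing: connect the NOI to the output in the original graph.
fwd = draw_until(graph, "noi", {"out"})
full_path = back[::-1] + fwd[1:]
print(full_path)  # e.g. ['in', 'a', 'noi', 'c', 'out']
```

The reversed back-directed subpath and the forward-directed subpath together yield one complete path through the NOI, as described above.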
As an alternative to the back-directed drawing, a purely forward-directed drawing may take place by drawing for each NOI a path from the input to the corresponding NOI. This is achieved in that the drawing is carried out only on the subgraph, which is made up of all nodes that lie on a path from the input of the network to the instantaneous NOI, as well as all edges of the One-shot model between these nodes.
One exemplary embodiment is a multi-task network for object detection and semantic segmentation. The NOIs in this case are nodes at which an object classification output (detection head or object detection head) may be attached. In addition, one more output is used for the semantic segmentation at the output at the full depth of the network.
The automatic architecture search initially requires the creation of a search space (step S21).
For each node in G, a probability distribution across the outgoing edges is defined. For each node and for each path, preferably for each set of NOIs, a separate probability distribution may be defined within one architecture. This means that the different paths within one architecture use different probabilities. In addition, the transposed One-Shot model Gt is considered, which has the same nodes, but in which all directed edges point in the reverse direction. On Gt, a probability distribution across the outgoing edges is also introduced for each node (this corresponds to a probability distribution across incoming edges in G).
For the back-directed drawing, a path is drawn in Gt for the first NOI (23), starting from this NOI and leading to the input.
With each drawing of an architecture, the NOIs may vary, since these NOIs may also be randomly drawn.
Based on graph G, an artificial neural network 60 may be created.
The method may start with step S21, in which graph G is provided.
This is followed by step S22. In this step, the probabilities of the edges are initialized as explained above.
Step S23 then follows. In this step, the architectures are drawn from the graph as a function of the probabilities of the edges. This is followed by the steps of training the drawn architecture as well as of the transfer of the optimized parameters and probabilities as a result of the training into the graph.
Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input images x (alternatively, each sensor signal S may also be directly adopted as input image x). Input image x may, for example, be a section or a further processing of sensor signal S. Input image x includes individual frames of a video recording. In other words, input image x is ascertained as a function of sensor signal S. The sequence of input images x is fed to a machine learning system, in the exemplary embodiment, to an artificial neural network 60, which has been created, for example, according to the method described above.
Artificial neural network 60 is preferably parameterized by parameters ϕ, which are stored in a parameter memory St1 and are provided by the latter.
Artificial neural network 60 ascertains from input images x output variables y. These output variables y may include, in particular, a classification and semantic segmentation of input images x. Output variables y are fed to an optional forming unit 80, which ascertains therefrom activation signals A, which are fed to actuator 10 in order to activate actuator 10 accordingly. Output variable y includes pieces of information about objects that sensor 30 has detected.
Actuator 10 receives activation signals A, is activated accordingly and carries out a corresponding action. Actuator 10 in this case may include a (not necessarily structurally integrated) activation logic, which ascertains a second activation signal from activation signal A, with which actuator 10 is then activated.
In further specific embodiments, control system 40 includes sensor 30. In still further specific embodiments, control system 40 includes alternatively or in addition also actuator 10.
In further preferred specific embodiments, control system 40 includes one or a plurality of processors 45 and at least one machine-readable memory medium 46, on which instructions are stored which, when they are carried out on processors 45, prompt control system 40 to carry out the method according to the present invention.
In alternative specific embodiments, a display unit 10a is provided alternatively or in addition to actuator 10.
Sensor 30 may, for example, be a video sensor preferably situated in motor vehicle 100.
Artificial neural network 60 is configured to reliably identify objects from input images x.
Actuator 10 preferably situated in motor vehicle 100 may, for example, be a brake, a drive or a steering of motor vehicle 100. Activation signal A may be ascertained in such a way that the actuator or actuators 10 is/are activated in such a way that motor vehicle 100 prevents, for example, a collision with the objects reliably identified by artificial neural network 60, in particular, if it involves objects of particular classes, for example, pedestrians.
Alternatively, the at least one semi-autonomous robot may also be another mobile robot (not depicted), for example, one which moves by flying, floating, diving or pacing. The mobile robot may, for example, also be an at least semi-autonomous lawn mower or an at least semi-autonomous cleaning robot. In these cases as well, activation signal A may be ascertained in such a way that the drive and/or steering of the mobile robot may be activated in such a way that the at least semi-autonomous robot prevents, for example, a collision with objects identified by artificial neural network 60.
Alternatively or in addition, display unit 10a may be activated with activation signal A and, for example, the ascertained safe areas may be shown. It is also possible, for example, in a motor vehicle 100 with non-automated steering, that display unit 10a is activated with activation signal A in such a way that it outputs a visual or acoustic warning signal if it is ascertained that motor vehicle 100 threatens to collide with one of the reliably identified objects.
Sensor 30 may then, for example, be an optical sensor, which detects, for example, properties of manufactured products 12a, 12b. It is possible that these manufactured products 12a, 12b are movable. It is possible that actuator 10 controlling manufacturing machine 11 is activated as a function of an assignment of detected manufactured products 12a, 12b, so that manufacturing machine 11 accordingly carries out a subsequent processing step of the correct one of manufactured products 12a, 12b. It is also possible that by identifying the correct properties of the same one of manufactured products 12a, 12b (i.e., without a misclassification), manufacturing machine 11 accordingly adapts the same manufacturing step for a processing of a subsequent manufactured product.
Control system 40 ascertains as a function of the signals of sensor 30 an activation signal A of personal assistant 250, for example, by the neural network carrying out a gesture recognition. This ascertained activation signal A is then conveyed to personal assistant 250 and it is thus activated accordingly. This ascertained activation signal A may be selected, in particular, in such a way that it corresponds to a presumed desired activation by user 249. This presumed desired activation may be ascertained as a function of the gesture recognized by artificial neural network 60. Control system 40 may then select activation signal A as a function of the presumed desired activation for the conveyance to personal assistant 250.
This corresponding activation may, for example, entail personal assistant 250 retrieving pieces of information from a database and intelligibly reproducing these for user 249.
Instead of personal assistant 250, a household appliance (not depicted), in particular, a washing machine, a stove, an oven, a microwave or a dishwasher may also be provided in order to be activated accordingly.
The methods carried out by training system 140 implemented as computer program may be stored on a machine-readable memory medium 147 and carried out by a processor 148.
It is, of course, not necessary to classify whole images. It is possible that, for example, image details are classified as objects using a detection algorithm, that these image details are then cut out, optionally a new image detail is generated and used in the associated image instead of the cut out image detail.
The term “computer” includes arbitrary devices for processing predefinable calculation rules. These calculation rules may be present in the form of software or in the form of hardware or also in a mixture of software and hardware.