The present invention pertains to the technical field of image recognition, and particularly relates to an image recognition method and system based on multi-population alternate evolution neural architecture search.
Analysis of image datasets is an emerging interdisciplinary field that requires expertise in both computer vision and various application domains, posing significant challenges for newcomers to computer vision or to those specialized fields. In particular, analyzing multiple datasets with different modalities can be difficult because such datasets are rarely standardized. Deep learning has long dominated research and applications in image analysis, but the constant manual adjustment of deep learning models is costly in terms of labor and finances. Therefore, automated image classification has become increasingly important.
Currently, the technological approach of automated machine learning is adopted, using neural architecture search (NAS) to process image datasets. NAS is a method that automatically searches for and optimizes neural network structures using machine learning techniques, aiming to improve the efficiency or performance of deep learning models by finding better network architectures. The design of the search space is a key element of NAS and plays a crucial role in determining the optimal configuration.
One strategy of NAS is to explore all possible combinations of nodes and connections within neural networks, while another strategy is to divide the network into basic units and construct more complex networks by stacking these units together.
Given the scale of the search space, the search strategies of NAS require a substantial amount of computational resources and time. Although the second strategy reduces the complexity of the search and enhances structural adaptability, the cell-based stacking structure harms the diversity of the network structure and does not fully consider the characteristics and limitations of each part of the entire network. When attempting to enhance the diversity of the network structure, additional search costs seem inevitable, presenting certain limitations.
The patent with the publication number CN 109299142 A discloses a method for searching a convolutional neural network structure based on an evolutionary algorithm, including: inputting a dataset and setting preset parameters to obtain an initial population; pushing, by a controller TC as the main thread, the initial population into a queue Q and activating a queue manager TQ and a message manager TM; after the queue manager TQ is activated, popping out untrained chromosomes from the queue Q for decoding, and then activating a worker manager TW as an independent temporary thread for training and calculating the fitness; and completing the parallel search of the convolutional neural network structure based on the evolutionary algorithm through the collaboration of the controller TC, the queue manager TQ, the worker manager TW, and the message manager TM, and outputting the best model. However, when analyzing multiple image datasets with different modalities, the complexity of the search space is high, and the search efficiency of this method needs further improvement.
In response to the above technical issues, the purpose of the present invention is to provide an image recognition method and system based on multi-population alternate evolution neural architecture search. This method not only benefits from an expandable network structure but also allows different layer structures to be searched without incurring additional costs, efficiently finding excellent image recognition network models for image recognition.
The technical solution of the present invention is as follows.
An image recognition method based on multi-population alternate evolution neural architecture search, including the following steps:
S01: Acquire image data and determine a search network according to a target task;
S02: Construct a supernet and pre-train the supernet according to preset parameters;
S03: Divide a network structure search space into multiple sub-spaces through an L-layer structure of a neural network, and randomly select N candidate sub-networks from the sub-spaces to form an initialized population;
S04: Sample multiple populations from the multiple search sub-spaces for alternate evolution, and select frontier individuals from a merged population in a multi-objective environment to generate the next parent population for multi-population alternate evolution; and
S05: Obtain an optimal neural network model for image recognition.
In a preferred technical solution, a method for constructing a supernet in step S02 includes:
The entire search space pool A is represented as a directed acyclic graph (DAG) of L layers, denoted by the formula A = ∏_{l=1}^{L} E_l, where E_l represents the available operations in the l-th layer of the DAG. A neural network within the search space is denoted by a = ∏_{l=1}^{L} e_l, where e_l ⊆ E_l.
Each layer e_l of the neural network a is composed of multiple operations {op_k} selected from K candidate operations, denoted as e_l = o_g = {op_k | g_k = 1, k ∈ {1, . . . , K}}, where g represents a specific set of operation configurations {g_k} and the binary gate g_k ∈ {0,1} indicates whether the k-th operation is selected. The number of selected operations in o_g is Σ_{k=1}^{K} g_k, the number of possible operation combinations per layer is 2^K, and the total number of possible operation combinations contained in the L-layer neural network is (2^K)^L.
In a preferred technical solution, the supernet is pre-trained through uniform sampling of sub-network structures for training. Each sub-network structure in the supernet S is denoted by s_i. The weights W_S(s_i) of the sub-network structure are inherited from the supernet weights W_S. The optimization of the supernet weights W_S is denoted as:

W_S* = argmin_{W_S} E_{s_i∼U_S}[L_C(N(s_i, W_S(s_i)))]

where E[·] represents the expectation, L_C(·) represents the cross-entropy loss, N(s_i, W_S(s_i)) represents the network with sub-network structure s_i and weights W_S(s_i), and s_i ∼ U_S indicates that the sub-network s_i is sampled from the supernet space S, which follows a uniform distribution U_S.
The minimization of the expectation E[·] is achieved by sampling sub-network structures s_i from the supernet space S and then updating the corresponding weights W_S(s_i) using a stochastic gradient descent method.
In a preferred technical solution, the genetic codes of individuals in the initialized population in step S03 are denoted by a V×E matrix, where V = {v_i}_{i=1:M} represents the set of data nodes in each layer of the neural structure, with M indicating the number of data nodes in each layer of the network, and E = {(v_i, v_j)}_{i,j=1:M} is the set of edges describing connections between data nodes across layers, where an edge between data nodes indicates an operation. The value corresponding to (v_i, v_j) in the matrix is the operation code value of the edge connecting data nodes v_i and v_j.
In a preferred technical solution, the multi-population alternate evolution in step S04 includes:
S41: Generate a current offspring population Ql according to preset crossover and mutation parameters as well as offspring generation strategies;
S42: Migrate excellent individuals from other populations to a current evolution population to obtain a migrated population Ml; and
S43: Merge the parent population Pl, the offspring population Ql, and the migrated population Ml to form a merged population, decode individuals within the merged population into corresponding sub-network structures si, inherit the weights Ws(si) from the supernet S, and then conduct fine-tuning training on the training dataset followed by an evaluation of accuracy performance indexes.
In a preferred technical solution, the fine-tuning training process of the sub-network structure s_i is a process of updating the weights of the supernet. Given a multi-population pops, a complete sub-network structure s_i is sampled from the supernet S by sampling individuals p from the multi-population pops. The sampling process of the sub-network structure s_i is as follows:

s_i = ∏_{l∈ℒ} decode(p_i^l)

where ℒ represents the index set of the layers of the L-layer sub-network, and also indexes the L populations, decode(·) is a decoding function, and p_i^l represents individual p_i sampled from the l-th population.
In a preferred technical solution, the method for obtaining a migrated population Ml in step S42 includes:
Maintain migration archives, select excellent individuals from the contemporary population into the migration archive set according to the multi-objective evolutionary algorithm;
Determine the number of migrated individuals according to the adjacent distance of each population;
Select the migrated individuals of the population according to the degree of similarity between the individual and the population. The degree of similarity between individual Gena in population Pa and population Pb is represented by the following formula:
Where D represents the number of best individuals selected; Genbi represents the genetic code of the ith best individual in population Pb, Len(Gen) is the length of the genetic code; Gena×Genbi is the sum of the products of the values of genes of two individuals at the corresponding bits, representing the degree of similarity between the two individuals; and Sim(Gena,Pb) is used to determine the degree of similarity between individual Gena and population Pb.
The present invention also discloses an image recognition system based on multi-population alternate evolution neural architecture search, which includes:
an image acquisition module, configured to acquire image data and determine a search network according to a target task;
a supernet constructing and training module, configured to construct a supernet and pre-train the supernet according to preset parameters;
an initialization module, configured to divide a network structure search space into multiple sub-spaces through an L-layer structure of a neural network, and randomly select N candidate sub-networks from the sub-spaces to form an initialized population;
a multi-population alternate evolution module, configured to sample multiple populations from the multiple search sub-spaces for alternate evolution, and select frontier individuals from a merged population in a multi-objective environment to generate the next parent population for multi-population alternate evolution; and
an image recognition module, configured to obtain an optimal neural network model for image recognition.
In a preferred technical solution, the multi-population alternate evolution in the multi-population alternate evolution module includes:
S41: Generate a current offspring population Ql according to preset crossover and mutation parameters as well as offspring generation strategies;
S42: Migrate excellent individuals from other populations to a current evolution population to obtain a migrated population Ml; and
S43: Merge the parent population Pl, the offspring population Ql, and the migrated population Ml to form a merged population, decode individuals within the merged population into corresponding sub-network structures si, inherit the weights Ws(si) from the supernet S, and then conduct fine-tuning training on the training dataset followed by an evaluation of accuracy performance indexes.
The present invention also discloses a computer storage medium on which a computer program is stored. When the computer program is executed, the above image recognition method based on multi-population alternate evolution neural architecture search is implemented.
Compared with the prior art, the present invention has the following beneficial effects.
1. The method not only benefits from an extensible network structure but also allows different layer structures to be searched without incurring additional costs, so that excellent image recognition network models can be efficiently searched for image recognition.
2. The method defines the entire search space as multiple independent cell spaces, conducts searches sequentially within these cell spaces, meets the diversified needs of modules at a smaller search cost, and finds a balance between search cost and cell diversity. By simplifying the search space according to multiple populations and evenly dividing lengthy network codes into each population, the search space required for a single image dataset is reduced. The module diversification is realized at a smaller search cost, the complexity of the search space is significantly reduced, and the automated processing of image analysis is promoted.
3. Additionally, the method introduces a population migration mechanism, leveraging the knowledge and experience retained by each population to accelerate the evolutionary process, significantly speeding up the convergence rate of the populations.
The present invention will be further described below with reference to the accompanying drawings and examples:
To make the objectives, technical solutions, and advantages of the present invention more clear and understandable, the present invention is further described in detail below with reference to specific implementations and the accompanying drawings. It should be understood that these descriptions are exemplary and are not intended to limit the scope of the present invention. Furthermore, in the following description, descriptions of well-known structures and technologies are omitted to avoid unnecessarily confusing the concepts of the present invention.
As shown in the accompanying drawings, an image recognition method based on multi-population alternate evolution neural architecture search includes the following steps:
S01: Acquire image data and determine a search network according to a target task;
S02: Construct a supernet and pre-train the supernet according to preset parameters;
S03: Divide a network structure search space into multiple sub-spaces through an L-layer structure of a neural network, and randomly select N candidate sub-networks from the sub-spaces to form an initialized population;
S04: Sample multiple populations from the multiple sub-spaces for alternate evolution, and select frontier individuals from a merged population in a multi-objective environment to generate the next parent population for multi-population alternate evolution; and
S05: Obtain the optimal neural network model for image recognition.
Specifically, in step S01, preset parameters can also be set, which include dataset-related parameters, network training-related parameters, and search algorithm-related parameters.
The dataset-related parameters include: a) the division ratio of a training set and a validation set; b) the batch size of the training set; c) the batch size of the validation set.
The network training-related parameters include: a) learning rate; b) gradient clipping rate for weights; c) weight decay rate; d) number of pre-training epochs for the supernet; e) total number of training epochs for the supernet; and f) number of fine-tuning training epochs for individuals in the population during the evolution.
The search algorithm-related parameters include: a) the number of populations L′; b) the population size N′; c) the maximum number of iterations T; d) the individual gene crossover rate; e) the individual gene mutation rate; and f) the size of the migration archive set.
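For reference, the preset parameters listed above can be collected into a single configuration object. The following is a minimal sketch; every concrete value in it is an illustrative assumption rather than a value prescribed by the invention.

```python
# Minimal sketch of the preset parameters; all concrete values below are
# illustrative assumptions, not values fixed by the method itself.
preset_params = {
    "dataset": {
        "train_val_split": 0.5,        # division ratio of training set and validation set
        "train_batch_size": 96,        # batch size of the training set
        "val_batch_size": 96,          # batch size of the validation set
    },
    "training": {
        "learning_rate": 0.025,
        "grad_clip": 5.0,              # gradient clipping rate for weights
        "weight_decay": 3e-4,
        "supernet_warmup_epochs": 50,  # pre-training epochs for the supernet
        "supernet_total_epochs": 500,  # total training epochs for the supernet
        "finetune_epochs": 1,          # fine-tuning epochs per individual during evolution
    },
    "search": {
        "num_populations": 8,          # L': one population per searched layer
        "population_size": 20,         # N'
        "max_iterations": 30,          # T
        "crossover_rate": 0.9,
        "mutation_rate": 0.1,
        "archive_size": 5,             # size of the migration archive set
    },
}
```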
In a preferred example, the method for constructing a supernet in step S02 includes:
The entire search space pool A is represented as a directed acyclic graph (DAG) of L layers, denoted by the formula A = ∏_{l=1}^{L} E_l, where E_l represents the available operations in the l-th layer of the DAG. A neural network within the search space is denoted by a = ∏_{l=1}^{L} e_l, where e_l ⊆ E_l.
Each layer e_l of the neural network a is composed of multiple operations {op_k} selected from K candidate operations, denoted as e_l = o_g = {op_k | g_k = 1, k ∈ {1, . . . , K}}, where g represents a specific set of operation configurations {g_k} and the binary gate g_k ∈ {0,1} indicates whether the k-th operation is selected. The number of selected operations in o_g is Σ_{k=1}^{K} g_k, the number of possible operation combinations per layer is 2^K, and the total number of possible operation combinations contained in the L-layer neural network is (2^K)^L.
In a preferred example, the supernet is pre-trained through uniform sampling of sub-network structures for training. Each sub-network structure in the supernet S is denoted by s_i. The weights W_S(s_i) of the sub-network structure are inherited from the supernet weights W_S. The optimization of the supernet weights W_S is denoted as:

W_S* = argmin_{W_S} E_{s_i∼U_S}[L_C(N(s_i, W_S(s_i)))]

where E[·] represents the expectation, L_C(·) represents the cross-entropy loss, N(s_i, W_S(s_i)) represents the network with sub-network structure s_i and weights W_S(s_i), and s_i ∼ U_S indicates that the sub-network s_i is sampled from the supernet space S, which follows a uniform distribution U_S.
The minimization of the expectation E[·] is achieved by sampling sub-network structures s_i from the supernet space S and then updating the corresponding weights W_S(s_i) using a stochastic gradient descent method.
In a preferred example, the genetic codes of individuals in the initialized population in step S03 are denoted by a V×E matrix, where V = {v_i}_{i=1:M} represents the set of data nodes in each layer of the neural structure, with M indicating the number of data nodes in each layer of the network, and E = {(v_i, v_j)}_{i,j=1:M} is the set of edges describing connections between data nodes across layers, where an edge between data nodes indicates an operational action (such as convolution, pooling and other operations). The value corresponding to (v_i, v_j) in the matrix is the operation code value of the edge connecting data nodes v_i and v_j.
In a preferred example, the multi-population alternate evolution in step S04 includes:
S41: Generate a current offspring population Ql according to preset crossover and mutation parameters as well as offspring generation strategies;
S42: Migrate excellent individuals from other populations to a current evolution population to obtain a migrated population Ml; and
S43: Merge the parent population Pl, the offspring population Ql, and the migrated population Ml to form a merged population, decode individuals within the merged population into corresponding sub-network structures si, inherit the weights Ws(si) from the supernet S, and then conduct fine-tuning training on the training dataset followed by an evaluation of accuracy performance indexes.
In a preferred example, the fine-tuning training process of the sub-network structure s_i is a process of updating the weights of the supernet. Given a multi-population pops, a complete sub-network structure s_i is sampled from the supernet S by sampling individuals p from the multi-population pops. The sampling process of the sub-network structure s_i is as follows:

s_i = ∏_{l∈ℒ} decode(p_i^l)

where ℒ represents the index set of the layers of the L-layer sub-network, and also indexes the L populations, decode(·) is a decoding function, and p_i^l represents individual p_i sampled from the l-th population.
In a preferred example, the method for obtaining a migrated population Ml in step S42 includes:
Maintain migration archives, and select excellent individuals from the contemporary population into the migration archive set according to the multi-objective evolutionary algorithm;
Determine the number of migrated individuals according to the adjacent distance of each population;
Select the migrated individuals of the population according to the degree of similarity between the individual and the population. The degree of similarity between individual Gena in population Pa and population Pb is represented by the following formula:
Where D represents the number of best individuals selected; Genbi represents the genetic code of the ith best individual in population Pb, Len(Gen) is the length of the genetic code; Gena×Genbi is the sum of the products of the values of genes of two individuals at the corresponding bits, representing the degree of similarity between the two individuals; and Sim(Gena,Pb) is used to judge the degree of similarity between individual Gena and population Pb.
In another example, a computer storage medium is provided on which a computer program is stored. When the computer program is executed, the above image recognition method based on multi-population alternate evolution neural architecture search is implemented. The specific method is consistent with the image recognition method based on multi-population alternate evolution neural architecture search described above, which will not be repeated here.
In another example, as shown in the accompanying drawings, an image recognition system based on multi-population alternate evolution neural architecture search includes:
an image acquisition module 10, configured to acquire image data and determine a search network according to a target task;
a supernet constructing and training module 20, configured to construct a supernet and pre-train the supernet according to preset parameters;
an initialization module 30, configured to divide a network structure search space into multiple sub-spaces through an L-layer structure of a neural network, and randomly select N candidate sub-networks from the sub-spaces to form an initialized population;
a multi-population alternate evolution module 40, configured to sample multiple populations from the multiple search sub-spaces for alternate evolution, and select frontier individuals from a merged population in a multi-objective environment to generate the next parent population for multi-population alternate evolution; and
an image recognition module 50, configured to obtain an optimal neural network model for image recognition.
The workflow of the image recognition system based on multi-population alternate evolution neural architecture search is described in detail below with a preferred example, as shown in the accompanying drawings:
Step 1: Input the dataset and set preset parameters.
Step 2: Construct the supernet and perform pre-training: perform pre-training on the supernet according to preset parameters.
Step 3: Initialize multiple populations and migration archive sets: initialize multiple populations and migration archive sets according to preset parameters.
Loop Judgment ①: Enter the multi-population alternate evolution phase and perform T multi-population alternate evolution cycles according to the preset maximum number of iterations. At the same time, determine whether the current number of iterations t has reached the maximum number of iterations T. If so, proceed to Step 9 to output the optimal network structure and end; otherwise, select population Pl to start the single-population evolution process (the nested loop structure is sketched in the code after Step 9 below).
Step 4: Generate offspring. Select the current population Pl to be evolved, and generate the current offspring population Ql according to the preset crossover and mutation parameters as well as the offspring generation strategy.
Step 5: Migrate populations. According to the population migration mechanism, migrate excellent individuals from other populations to the current evolution population to obtain a migrated population Ml.
Step 6: Train and evaluate merged populations. Evaluate the network individuals in the parent population Pl, the offspring population Ql, and the migrated population Ml according to the weight inheritance strategy.
Step 7: Update supernet. Synchronously update the weight parameters of the supernet during the training of individuals in the population in Step 6.
Step 8: Update populations and migration archive sets. If the preset termination generation is met, proceed to Step 9; otherwise, return to Step 4.
Loop Judgment ②: Determine whether the multi-population alternate evolution process of the current t-th generation has ended. If so, proceed to the (t+1)-th generation of the multi-population alternate evolution process; otherwise, select the next population Pl+1 in sequence for the single-population evolution process.
Step 9: Output the optimal network model and end.
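The nesting of Loop Judgments ① and ② around Steps 4 to 8 can be summarized in the following Python sketch. It is a structural sketch only: the operator callbacks passed in stand for Steps 4 to 8 of the workflow and are assumptions, not a fixed API of the invention.

```python
from typing import Any, Callable, List, Tuple

Population = List[Any]

def multi_population_alternate_evolution(
    populations: List[Population],          # one population per layer of the network
    archives: List[Population],             # migration archive set per population
    T: int,                                 # maximum number of iterations
    generate_offspring: Callable[[Population], Population],                       # Step 4
    migrate: Callable[[List[Population], List[Population], int], Population],     # Step 5
    evaluate: Callable[[Population], None],                                       # Steps 6-7
    environmental_selection: Callable[[Population], Tuple[Population, Population]],  # Step 8
) -> List[Population]:
    """Structural sketch of the workflow above (Steps 4-8 inside two nested loops)."""
    for _ in range(T):                              # Loop Judgment 1: generations t = 1 .. T
        for l in range(len(populations)):           # Loop Judgment 2: alternate over populations
            P_l = populations[l]
            Q_l = generate_offspring(P_l)                          # Step 4: offspring
            M_l = migrate(populations, archives, l)                # Step 5: migrated individuals
            C_l = P_l + Q_l + M_l                                  # Step 6: merged population
            evaluate(C_l)                                          # Steps 6-7: train, update supernet
            populations[l], archives[l] = environmental_selection(C_l)  # Step 8: update
    return populations                              # Step 9: best network read from final populations
```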
The preset parameters in Step 1 include dataset-related parameters, network training-related parameters, and search algorithm-related parameters.
The dataset-related parameters include: a) the division ratio of a training set and a validation set; b) the batch size of the training set; c) the batch size of the validation set.
The network training-related parameters include: a) learning rate; b) gradient clipping rate for weights; c) weight decay rate; d) number of pre-training epochs for the supernet; e) total number of training epochs for the supernet; and f) number of fine-tuning training epochs for individuals in the population during the evolution.
The search algorithm-related parameters include: a) the number of populations L; b) the population size N′; c) the maximum number of iterations T; d) the individual gene crossover rate; e) the individual gene mutation rate; f) the size of the migration archive set.
The construction of the supernet in Step 2 is to construct a larger network (SuperNet) including all predefined operations. Since neural architectures typically use feedforward structures, in this example the entire search space pool A is represented as a directed acyclic graph (DAG) of L layers, denoted by the formula A = ∏_{l=1}^{L} E_l, where E_l represents the available operations in the l-th layer of the DAG (such as convolution, pooling, and other operations). Therefore, a neural network within the search space is denoted by a = ∏_{l=1}^{L} e_l, where e_l ⊆ E_l. Each layer e_l of the neural network a is composed of multiple operations {op_k} selected from K candidate operations, denoted as e_l = o_g = {op_k | g_k = 1, k ∈ {1, . . . , K}}, where g represents a specific set of operation configurations {g_k} and the binary gate g_k ∈ {0,1} indicates whether the k-th operation is selected. In this case, the number of selected operations in o_g is Σ_{k=1}^{K} g_k and the number of possible operation combinations per layer is 2^K, while the total number of possible operation combinations contained in the L-layer neural network is (2^K)^L.
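As an illustration of the search space size described above, each layer can be encoded as a binary gate vector of length K, and a network as L such vectors. The sketch below counts the (2^K)^L possible structures for small, assumed values of K and L; the candidate operation names are illustrative only.

```python
from itertools import product

# Assumed candidate operations for one layer (illustrative only).
CANDIDATE_OPS = ["conv3x3", "conv5x5", "maxpool3x3", "skip_connect"]  # K = 4
K = len(CANDIDATE_OPS)
L = 3  # assumed number of layers, for illustration

def layer_ops(gates):
    """Map a binary gate vector g to the operation set o_g = {op_k | g_k = 1}."""
    return [op for op, g in zip(CANDIDATE_OPS, gates) if g == 1]

# Every gate configuration of a single layer: 2^K combinations.
single_layer_configs = list(product([0, 1], repeat=K))
assert len(single_layer_configs) == 2 ** K

# A network chooses one gate configuration per layer, so the search space
# contains (2^K)^L possible structures, matching the formula in the text.
print("structures in the search space:", (2 ** K) ** L)

# Example: decode one gate vector into its selected operations.
print(layer_ops((1, 0, 1, 0)))   # -> ['conv3x3', 'maxpool3x3']
```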
In Step 2, the supernet is pre-trained through uniform sampling of sub-network structures for training. Each sub-network structure in the supernet S is denoted by s_i, and the weights W_S(s_i) of the sub-network structure are inherited from the supernet weights W_S. The optimization of the supernet weights W_S is denoted as:

W_S* = argmin_{W_S} E_{s_i∼U_S}[L_C(N(s_i, W_S(s_i)))]    (1)

where E[·] represents the expectation, L_C(·) represents the cross-entropy loss, N(s_i, W_S(s_i)) represents the network with sub-network structure s_i and weights W_S(s_i), and s_i ∼ U_S indicates that the sub-network s_i is sampled from the supernet space S, which follows a uniform distribution U_S. The minimization of the expectation E[·] is achieved by sampling sub-network structures s_i from the supernet space S and then updating the corresponding weights W_S(s_i) using a stochastic gradient descent method. In this example, uniform sampling is performed for each possible architecture, and the sampling probability for the sub-network structure follows p_i ∼ Bernoulli(0.5), where Bernoulli(·) is the Bernoulli distribution.
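A minimal PyTorch-style sketch of the uniform-sampling pre-training corresponding to formula (1) is given below. The supernet here is a toy two-layer model with three parallel candidate operations per layer, and a sub-network is a uniformly sampled single path (one operation per layer), which is a simplification of the gate-based formulation above; all module names, sizes, and hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuperNet(nn.Module):
    """Toy supernet: each layer holds candidate operations sharing the supernet weights W_S."""
    def __init__(self, channels=16, num_layers=2, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        def candidates():
            return nn.ModuleList([
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.Conv2d(channels, channels, 5, padding=2),
                nn.Identity(),                      # skip connection
            ])
        self.layers = nn.ModuleList([candidates() for _ in range(num_layers)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, path):
        # path[l] is the index of the operation sampled for layer l (the sub-network s_i).
        x = F.relu(self.stem(x))
        for layer, op_idx in zip(self.layers, path):
            x = F.relu(layer[op_idx](x))
        x = x.mean(dim=(2, 3))                      # global average pooling
        return self.head(x)

supernet = SuperNet()
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.025, momentum=0.9)

# One pre-training step on a random batch standing in for D_train.
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
path = [torch.randint(0, 3, (1,)).item() for _ in supernet.layers]   # uniform sampling, s_i ~ U_S
loss = F.cross_entropy(supernet(images, path), labels)               # L_C(N(s_i, W_S(s_i)))
optimizer.zero_grad()
loss.backward()
optimizer.step()   # unsampled operations receive no gradient, so only the sampled path is updated
```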
In Step 3, the initialization of L populations corresponds to the sub-network sampling encoding for each layer of the L-layer neural network. According to the L-layer structure of the neural network, the search space A is divided into L subset spaces Al. Then, N candidate sub-networks are randomly selected from each sub-space Al to form a population. The genetic codes of individuals in the population are denoted by a V×E matrix, where V = {v_i}_{i=1:M} represents the set of data nodes in each layer of the neural structure, with M indicating the number of data nodes in each layer of the network; E = {(v_i, v_j)}_{i,j=1:M} is the set of edges describing connections between data nodes across layers, where an edge between data nodes indicates an operational action (such as convolution, pooling, and other operations); and the value corresponding to (v_i, v_j) in the matrix indicates the operation code value of the edge connecting data nodes v_i and v_j.
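The matrix encoding described above can be illustrated with a small example. Below, a layer with an assumed number of data nodes M is encoded as an M×M integer matrix whose entry (i, j) is the operation code of the edge from node v_i to v_j, with 0 meaning no connection; the operation code table is an assumption for illustration.

```python
import numpy as np

# Assumed operation code table for edges between data nodes (illustrative).
OP_CODES = {0: "none", 1: "conv3x3", 2: "conv5x5", 3: "maxpool3x3", 4: "skip"}

M = 4  # assumed number of data nodes in one layer

# Genetic code of one individual for one layer: entry (i, j) is the operation
# code on the edge v_i -> v_j; only forward edges (i < j) are used so the
# structure remains a directed acyclic graph.
gene = np.zeros((M, M), dtype=int)
gene[0, 1] = 1     # v_0 -> v_1 via conv3x3
gene[0, 2] = 3     # v_0 -> v_2 via maxpool3x3
gene[1, 3] = 2     # v_1 -> v_3 via conv5x5
gene[2, 3] = 4     # v_2 -> v_3 via skip connection

def decode_layer(gene_matrix):
    """Decode a gene matrix into a list of (source node, target node, operation)."""
    edges = []
    for i, j in zip(*np.nonzero(gene_matrix)):
        edges.append((int(i), int(j), OP_CODES[int(gene_matrix[i, j])]))
    return edges

print(decode_layer(gene))
# [(0, 1, 'conv3x3'), (0, 2, 'maxpool3x3'), (1, 3, 'conv5x5'), (2, 3, 'skip')]
```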
The initialization of the migration archive set is to randomly select m excellent individuals to form the migration archive set Ml for population Pl.
In Step 4, offspring generation is achieved through three operators: selection, crossover, and mutation. The selection operator picks excellent individuals according to the fitness values from the previous evolution for crossover and mutation, thus generating offspring. The selection strategy is optionally one of three methods: roulette wheel selection, tournament selection, or probabilistic selection. The crossover method can be either single-point crossover or multi-point crossover. In single-point crossover, two parent individuals select the same point in their binary-encoded genes for crossover to produce two entirely new offspring individuals, while multi-point crossover selects multiple points for crossover. Mutation uses multi-point mutation.
According to the mutation probability in the preset parameters in Step 1, it is determined whether each binary gene value needs to mutate from 0 to 1 or from 1 to 0. The current population popl undergoes the process of selection, crossover, and mutation repeatedly until the predefined maximum number of offspring is reached, at which point the process ends, obtaining the current offspring population Ql.
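A minimal sketch of the single-point crossover and multi-point mutation described above, operating on flattened binary gene strings, is given below; the gene length and rates are assumptions.

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Swap the tails of two binary gene strings at one shared crossover point."""
    point = random.randrange(1, len(parent_a))          # same point for both parents
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def multi_point_mutation(gene, mutation_rate):
    """Flip each bit (0 -> 1 or 1 -> 0) independently with the preset mutation rate."""
    return [1 - g if random.random() < mutation_rate else g for g in gene]

random.seed(0)
p_a = [0, 1, 1, 0, 1, 0, 0, 1]   # assumed binary genetic codes of two parents
p_b = [1, 0, 0, 1, 0, 1, 1, 0]
c_a, c_b = single_point_crossover(p_a, p_b)
offspring = [multi_point_mutation(c, mutation_rate=0.1) for c in (c_a, c_b)]
print(offspring)
```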
In Step 5, the population migration mechanism consists of three aspects: maintaining migration archives (Step 8), determining the number of migrated individuals for each population, and selecting migrated individuals (Step 5). The migration mechanism determines the number of migrated individuals according to the adjacent distance of each population. The adjacent distance between the populations is the difference between the network layer numbers corresponding to each population. At the same time, the migrated individuals of the population are selected according to the degree of similarity between the individual and the population. The degree of similarity between individual Gena in population Pa and population Pb is represented by the following formula:
Where D represents the number of best individuals selected; Genbi represents the genetic code of the ith best individual in population Pb; Gena×Genbi is the sum of the products of the gene values of the two individuals at corresponding bits, representing the degree of similarity between the two individuals; Len(Gen) is the length of the genetic code; and Sim(Gena,Pb) is used to judge the degree of similarity between individual Gena and population Pb. The smaller the value of Sim(Gena,Pb), the lower the degree of similarity between the selected migration individual Gena from population Pa and population Pb, the aim being to increase the diversity of population Pb while ensuring the individual's fitness.
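The similarity measure can be sketched as follows. Since the exact formula is not reproduced above, the normalization used here (averaging the bitwise dot products over the D best individuals and dividing by the gene length) is an assumption; only the ingredients named in the text (D, Gena×Genbi, Len(Gen)) are taken from it.

```python
def dot_similarity(gen_a, gen_b):
    """Gen_a x Gen_b^i: sum of products of gene values at corresponding bits."""
    return sum(a * b for a, b in zip(gen_a, gen_b))

def sim(gen_a, best_of_pb, gene_length):
    """Sim(Gen_a, P_b): similarity between individual Gen_a and population P_b.

    best_of_pb holds the genetic codes of the D best individuals of P_b.
    The averaging and the division by Len(Gen) are assumed normalizations.
    """
    D = len(best_of_pb)
    return sum(dot_similarity(gen_a, gen_b) for gen_b in best_of_pb) / (D * gene_length)

# Candidate migrants from P_a; the one with the smallest Sim value is migrated
# into P_b to increase the diversity of P_b.
candidates = [[1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1]]
best_of_pb = [[1, 0, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0]]     # D = 2 best individuals of P_b
migrant = min(candidates, key=lambda g: sim(g, best_of_pb, gene_length=6))
print(migrant)   # -> [0, 1, 0, 0, 1, 1]
```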
In Step 6, the training of the merged population and the weight update of the supernet in Step 7 are conducted alternately. The merged population refers to merging the parent population Pl, the offspring population Ql, and the migrated population Ml to form a population Cl, denoted by Cl = Pl ∪ Ql ∪ Ml. The individuals in the merged population Cl are first decoded into the corresponding sub-network structures s_i and inherit the weights W_S(s_i) from the supernet S. After that, they undergo a small number of epochs of fine-tuning training on the training dataset Dtrain, followed by an evaluation of accuracy performance indexes on the validation dataset Dvalid. The fine-tuning training process of the sub-network structure s_i is the weight update process of the supernet, and its optimization process is the same as formula (1) in Step 2. The process of sampling a complete sub-network structure s_i from the supernet S for a given multi-population pops is achieved by sampling individuals p from the multi-population pops. The sampling process of the sub-network structure s_i can be defined as follows:

s_i = ∏_{l∈ℒ} decode(p_i^l)

where ℒ represents the index set of the layers of the L-layer sub-network, and also indexes the L populations, decode(·) is a decoding function, and p_i^l represents individual p_i sampled from the l-th population.
In Step 8, population updating is achieved through the multi-objective evolutionary algorithm NSGA-III. From the merged population Cl, the NSGA-III algorithm and at least two optional predefined objectives (accuracy, number of model parameters, FLOPS) are used to select a predefined number N of individuals as the next generation's parent population.
The update of the migration archive set is also to select excellent individuals from the current population according to the multi-objective evolutionary algorithm to enter the migration archive set and cover the previous individuals.
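Population updating relies on NSGA-III; the sketch below uses plain non-dominated sorting as a simplified stand-in (it omits NSGA-III's reference-point niching) to show how a fixed number N of parents can be selected from the merged population under two objectives, here accuracy to maximize and parameter count to minimize. All names and values are assumptions.

```python
def dominates(obj_a, obj_b):
    """True if obj_a Pareto-dominates obj_b (both objectives treated as minimized)."""
    return all(a <= b for a, b in zip(obj_a, obj_b)) and any(a < b for a, b in zip(obj_a, obj_b))

def select_parents(merged, N):
    """Simplified stand-in for NSGA-III environmental selection.

    merged is a list of (individual, accuracy, n_params). Accuracy is turned
    into a minimization objective by negation; individuals are taken front by
    front until N parents are collected (no reference-point niching).
    """
    objs = [(-acc, params) for _, acc, params in merged]
    remaining = list(range(len(merged)))
    selected = []
    while remaining and len(selected) < N:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        selected.extend(front[: N - len(selected)])
        remaining = [i for i in remaining if i not in front]
    return [merged[i][0] for i in selected]

merged_population = [          # (individual, accuracy, number of parameters) - illustrative values
    ("net_a", 0.91, 3.2e6),
    ("net_b", 0.93, 4.8e6),
    ("net_c", 0.90, 2.1e6),
    ("net_d", 0.88, 5.0e6),
]
print(select_parents(merged_population, N=2))
```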
After completing Step 8, it is determined whether the preset number of termination generations has been reached. If yes, Step 9 is proceeded to output the optimal network model; otherwise, Step 4 is returned.
In the example, comparative experimental results on the CIFAR dataset with other algorithms are provided, as shown in Table 1 below. In the example, the CIFAR-10 and CIFAR-100 training sets are divided into two parts, with 25,000 images used for the training dataset Dtrain and 25,000 images for the validation dataset Dvalid. A total of 500 epochs were searched, with the supernet parameter preheating phase lasting for the first 10% of the period (50 epochs).
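The 25,000/25,000 split of the CIFAR training set into Dtrain and Dvalid could be produced as in the following sketch, assuming the torchvision CIFAR-10 loader; the transform, seed, and batch size are assumptions.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()                      # assumed minimal transform
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)

# Split the 50,000 training images into D_train and D_valid (25,000 each).
generator = torch.Generator().manual_seed(0)           # assumed seed for reproducibility
d_train, d_valid = random_split(full_train, [25000, 25000], generator=generator)

train_loader = DataLoader(d_train, batch_size=96, shuffle=True)    # assumed batch size
valid_loader = DataLoader(d_valid, batch_size=96, shuffle=False)
print(len(d_train), len(d_valid))   # 25000 25000
```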
From the table, it can be seen that the optimal model searched by the method of this example on the CIFAR-10 and CIFAR-100 datasets achieves highly competitive results in both model accuracy (ACC) and search time (GDs, GPU days), outperforming most competitors. On the CIFAR-10 and CIFAR-100 datasets, the optimal network model MPAE-C found by the algorithm has a classification accuracy of up to 97.51% and 84.12%, respectively, surpassing all peer competitors considered in the experiment; and the search cost is only 0.4 GDs, far less than the computational resources consumed by the AmoebaNet-A and NASNet-A models (0.4 GDs ≪ 3150 GDs, 0.4 GDs ≪ 1800 GDs).
The automated search and classification of medical images faces the problems that analysis is unfriendly to non-experts and difficult to master, and current algorithms are often very costly, consuming considerable human and financial resources. The image recognition method based on multi-population alternate evolution neural architecture search of the present invention is applied to the automated search and classification of medical images, and can automatically search an excellent medical image recognition network model from the sampling dataset to solve these problems.
By utilizing a collection of publicly available medical open datasets, standardized medical image processing-related sampling data is obtained. The sampling dataset includes MedMNIST, which consists of 10 pre-processed datasets from selected sources, and covers major data forms (X-ray, OCT, ultrasound, CT), various classification tasks (binary/multi-class, ordinal regression, and multi-label), and data scales (ranging from 100 to 100,000).
Based on the above content, the flowchart of the multi-population alternate evolution search algorithm provided in this example for the recognition process in the field of medical images is shown in the accompanying drawings, and the process includes the following steps:
Step S201: Acquire standardized medical image processing-related sampling data through a collection of publicly available medical open datasets.
Step S202: Automatically search an excellent network structure from a medical sampling training set according to a multi-population alternate evolution neural architecture search algorithm (MPAE) and a supernet model, where a search process uses a multi-population to represent different modules, and each module is alternately optimized. The specific implementation mode is the same as that in Example 1.
Step S203: Obtain a complete medical image recognition network model by finally training the searched network structure on the medical dataset.
From the MedMNIST publicly available medical open dataset, the following datasets are available: PathMNIST, a dataset for predicting survival outcomes from colorectal cancer histology slides; DermaMNIST, a dataset of dermatoscopic images of common pigmented skin lesions from multiple sources; OCTMNIST, a dataset of optical coherence tomography (OCT) images related to retinal diseases; OrganMNIST {Axial, Coronal, Sagittal}, a dataset based on 3D computed tomography (CT) images from the liver tumor segmentation benchmark (LiTS); and other medical datasets.
Step S203 is to retrain the optimal network structure on the complete MedMNIST dataset (including both the training set and the test set) to obtain the accurate network model weight parameters and the final recognition accuracy results. The comparison of the final experimental results with the experimental results of other algorithms is shown in Table 2 below.
It can be seen from the above table that the medical image recognition network model searched by the algorithm of the present invention has high accuracy.
The image recognition method based on multi-population alternate evolution neural architecture search of the present invention can also be applied to the automated search and classification of car images, automatically searching an excellent car image recognition network model for the sampling dataset.
By utilizing a collection of publicly available car datasets, standardized car image processing-related sampling data is obtained. The sampling dataset includes the Stanford Cars dataset and the CompCars dataset. The Stanford Cars dataset is a fine-grained classification dataset specifically designed for car image recognition tasks; it contains 16,185 images of 196 car models, with 8,144 images in the training set and 8,041 images in the test set, covering car images at different angles, scales, and lighting conditions. The Comprehensive Cars (CompCars) dataset includes data from both web and surveillance scenarios. The web image data contains 163 car brands and 1,716 car models, with a total of 136,726 whole-car images and 27,618 car-part images. The surveillance image data contains 50,000 car images captured in a frontal view.
Based on the above content, the flowchart of the multi-population alternate evolution search algorithm provided in this example for the recognition process in the field of car images is shown in the accompanying drawings, and the process includes the following steps:
Step S301: Acquire the car image datasets of Stanford Cars and CompCars through a collection of publicly available car open datasets and conduct pre-processing. The pre-processing optionally includes methods such as CenterCrop, Resize, Normalize, and data augmentation.
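A sketch of the preprocessing pipeline mentioned in Step S301, composed with torchvision transforms, is shown below; the crop and resize sizes, the normalization statistics (ImageNet means and standard deviations), and the choice of a random horizontal flip as data augmentation are all assumptions.

```python
from torchvision import transforms

# Assumed preprocessing for the car image datasets; all sizes and statistics
# below are illustrative, not values prescribed by the method.
preprocess = transforms.Compose([
    transforms.Resize(256),                  # Resize
    transforms.CenterCrop(224),              # CenterCrop
    transforms.RandomHorizontalFlip(),       # simple data augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Normalize
])

# Usage: passing transform=preprocess to a Stanford Cars or CompCars image
# dataset loader (e.g. an ImageFolder over the extracted images) applies this
# pipeline to every image before it enters the supernet.
```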
Step S302: Input the car image dataset into a supernet model that includes all operators within the entire network for training to prepare for weight sharing in Step S303.
Step S303: Automatically search an excellent network structure for an inputted car sampling training set according to the multi-population alternate evolution neural architecture search algorithm (MPAE) and the supernet model. The specific implementation mode is the same as that in Example 1.
Step S304: Evaluate and iterate the neural network models continuously generated by MPAE in Step S303, and determine whether the maximum number of iterations has been reached; if so, proceed to the next step, otherwise continue iterating.
Step S305: Optimal model training. Retrain the optimal network structure on the complete car datasets (including the training set and the test set) to obtain the accurate weight parameters of the network model and the final recognition accuracy results. The comparison of the final experimental results on the Stanford Cars and CompCars datasets with the experimental results of other algorithms is shown in Table 3 below.
It can be seen from the above table that the car image recognition network model searched by the algorithm of the present invention has high accuracy.
It should be understood that the above specific examples of the present invention are merely exemplary for illustrative or explanatory purposes, and do not constitute a limitation on the present invention. Therefore, any modifications, equivalent substitutions, improvements, etc., made without departing from the spirit and scope of the present invention should be included within the scope of protection of the present invention. In addition, the claims appended to the present invention are intended to cover all variations and modifications falling within the scope and boundaries of the appended claims, or their equivalent forms.
Foreign application priority data: 202410095592.3, January 2024, CN, national.