The present invention relates to the field of AI accelerator technology, and particularly to an optimizing method for an AI accelerator and to an AI accelerator.
In recent years, supported by the accumulation of massive data and increasingly powerful computing hardware, deep learning (DL) has achieved significant success in fields such as image processing, text understanding, and recommendation systems. In particular, neural networks (NN) can automatically extract and process large numbers of features, often achieving better performance than hand-engineered approaches. Currently, artificial intelligence (AI) based on various neural network models is widely applied across industries. Because neural networks require large amounts of data and computing power, dedicated AI accelerators are often needed to process such tasks. An AI accelerator is a specialized hardware accelerator or computer system designed to accelerate AI applications. However, when deep learning is applied to a new research task, the design of the neural network model for an AI accelerator typically relies on past experience and manual debugging. Furthermore, as model size increases, the search space over all weight parameters and feature parameters grows exponentially, so the time required to tune parameters may also increase exponentially. Such AI accelerator design methods consume a great deal of researchers' time; efficiency would be significantly improved if this work were automated.
Neural architecture search (NAS) is a technology, studied specifically for AI accelerators, that aims to automate the design of high-performance deep neural network architectures without manual debugging. It does not require users to have extensive expert experience, and it has attracted widespread attention because it replaces the human design of neural network hyperparameters with the automated generation of neural networks. NAS reduces labor-intensive work, allowing researchers to focus their attention and effort on other, more meaningful research. Moreover, relevant studies have shown that the performance of neural networks found by NAS can be superior to that of manually designed network structures.
Currently, research on neural network architecture search is mainly divided into three directions: search space, search strategy, and evaluation strategy, among which the search strategy has received the most attention. However, current neural network architecture search methods consume a large amount of computation time. Moreover, such purely data-driven, black-box methods leave the generated neural networks lacking interpretability, further limiting their application in real life. Inspired by the different intelligent behaviors of biological systems, evolutionary algorithms mimic these behaviors in algorithmic form to solve optimization problems in mathematics and engineering. Among them, the genetic algorithm (GA), based on Darwin's theory of evolution, has been proven effective for solving optimization problems and is widely used. It applies selection, crossover, and mutation operators to make the feasible solutions in a population converge to the optimal solution. With continuing research, the particle swarm optimization (PSO) algorithm, the ant colony optimization (ACO) algorithm, and other algorithms have been designed and proposed, and have been proven effective in various practical applications. Although evolutionary algorithms used in AI accelerators can often find the globally optimal solution in the search space and are suitable for both continuous and discrete problems, they also have significant drawbacks, such as long execution time and high computational cost.
In the process of implementing the present invention, the inventors have found that there are at least the following problems in the prior art:
Existing evolutionary algorithms for AI accelerators are time-consuming and computationally expensive. They are difficult to adapt to the developing needs of AI accelerators, and further optimization and improvement are urgently needed.
The objective of the present invention is to provide an optimizing method for an AI accelerator and an AI accelerator, to address the technical problems existing in the prior art: the evolutionary algorithms of AI accelerators are time-consuming and computationally expensive, are difficult to adapt to the developing needs of AI accelerators, and urgently need further optimization and improvement. The technical effects of the preferred technical solutions among the various technical solutions provided by the present invention are described in detail below.
To fulfill the above objective, the present invention provides the following technical solution:
The present invention provides an optimizing method for an AI accelerator, characterized in that the AI accelerator is optimized by obtaining a target neural network architecture through genetic programming, the method including the following steps:
In one embodiment, the functional layers include an input layer, a preprocessing layer, a feature extraction layer, a feature concatenation layer, and an output layer; the input layer is used to input raw data, the preprocessing layer is used to preprocess the raw data according to its type, the feature extraction layer extracts features of the raw data through a feature extraction network, the feature concatenation layer is used to concatenate the different features extracted by the feature extraction layer, and the output layer returns output results based on the features extracted by the feature extraction layer.
In one embodiment, the step S40 comprises the following steps:
In one embodiment, the optimizing method is based on GPU computation, and comprises the following steps:
In one embodiment, the optimizing method uses a tree-like parameter server structure for parameter aggregation; each parameter server receives the parameters of its child nodes and performs aggregation along the tree-like parameter server structure; when all the data is aggregated to a root node, the root node performs a gradient descent operation and updates model parameters of the target neural network architecture; and the updated model parameters are distributed to each parameter server.
In one embodiment, the optimizing method further optimizes dataset size and batch size of the target neural network architecture, and comprises the following steps:
In one embodiment, the optimizing method optimizes the computational performance of the parameter servers, and the optimizing method comprises: taking the dataset size of each parameter server as the dependent variable and the working time and idle waiting time of each parameter server as the fitness function values, evaluating the performance of each parameter server, and optimizing the workload of each parameter server based on the performance evaluation results using an adaptive genetic algorithm.
The present invention provides an AI accelerator, which is obtained by the above-mentioned optimizing method for the AI accelerator.
Implementation of any one of the above technical solutions of the present invention provides the following advantageous effects:
The present invention addresses the issues of interpretability and understandability in traditional neural network generation by leveraging the encoding capabilities of genetic programming. It also utilizes the optimization performance of genetic programming as an evolutionary algorithm to search for the optimal weight and feature precision in the search space of different weight precisions and feature precisions at each layer, resulting in a neural network architecture with optimal weight and feature precision, thereby reducing computational costs.
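The per-layer precision search described above can be illustrated with a minimal sketch. The following is a hypothetical toy example, not the claimed method: the candidate bit-widths, the layer count, and the cost function (total bit-width, standing in for a real accuracy-plus-hardware-cost measurement) are all assumed names and values chosen for illustration.

```python
import random

# Hypothetical sketch: each individual assigns a (weight_bits, feature_bits)
# pair to every layer; here lower total bit-width stands in for lower cost.
PRECISIONS = [4, 8, 16, 32]   # candidate bit-widths (assumed values)
NUM_LAYERS = 5

def random_individual():
    return [(random.choice(PRECISIONS), random.choice(PRECISIONS))
            for _ in range(NUM_LAYERS)]

def cost(individual):
    # Stand-in fitness: total bits across layers. A real system would
    # combine measured accuracy with measured hardware cost here.
    return sum(w + f for w, f in individual)

def mutate(individual, rate=0.2):
    # Re-sample both precisions of a layer with probability `rate`.
    return [(random.choice(PRECISIONS), random.choice(PRECISIONS))
            if random.random() < rate else (w, f)
            for w, f in individual]

def search(generations=50, pop_size=20):
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[:pop_size // 2]          # selection
        children = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + children
    return min(population, key=cost)

best = search()
```

The evolutionary loop keeps the lowest-cost half of the population each generation, so the best precision assignment never worsens.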
To more clearly expound the technical solution of embodiment of the present invention, a brief description will be provided below for the drawings that are necessary for the illustration of the embodiments. It is appreciated that the drawings described below show only some of the embodiments of the present invention, and those having ordinary skill in the art may envisage other drawings based on the attached drawings, without creative endeavor. In the drawings:
To better expound the objectives, the technical solution, and the advantages of the application, a description of various illustrative embodiments is provided below with reference to the corresponding drawings. The drawings form a part of the illustrative embodiments and illustrate various illustrative embodiments that may be adopted to realize the present invention. Unless otherwise indicated, identical reference numerals used throughout the various drawings designate the same or similar elements. The ways of implementation described in the following illustrative embodiments do not represent all the embodiments that are consistent with the disclosure. It is noted that they are provided only as examples that illustrate processes, methods, and devices in accord with some of the aspects defined in detail in the appended claims and disclosed in the present invention; other feasible embodiments may also be available, and modifications with respect to the structures and functions involved in the embodiments listed in the disclosure may be made without departing from the scope and essence of the present invention.
In the description of the present invention, it is noted that terms, such as “central”, “longitudinal”, and “transverse”, are used to indicate directional or positional relationships interpreted on the basis of the illustrations of the drawings and are applied to ease the description of the present invention and to simplify the illustration thereof, and are not intended to indicate or imply a designated element must be of a specific direction, or must be structured and operated in a specific direction. Terms, such as “first” and “second”, are adopted only for the purposes of description and should not be interpreted as indicating or implying relative importance or implicitly suggesting the quantity of a technical feature indicated thereby. Terms, such as “plurality”, bear meaning of having a quantity of two or more than two. Terms, such as “interconnect” and “connect”, should be interpreted in a broad sense, such as being fixedly connected, detachably connected, integrally connected, mechanically connected, electrically connected, in communication connection, directly connected, and indirectly connected through an intermediate medium, and may be regarded as communication between interiors of two elements or an interacting relationship of two elements. Terms, such as “and/or”, include any and all combinations of one or multiple ones of listed items. For those having ordinary skill in the field, the specific meaning of the above-named terms can be appreciated from the context of the present invention based on specific situations.
To explain the technical solution provided in the present invention, in the following, description is made with respect to specific embodiments, yet only the parts that are associated with the implementation of the present invention are illustrated.
As shown in
Optionally, in step S20, genetic programming uses a tree structure to perform neural network architecture search. The tree structure defines input and output of each layer in genetic programming, thus enabling compatibility with inputs and outputs in different data types. The tree structure also defines an order of different layers in the neural network architecture, the input and output relationships between different layers, and the overall input and output formats.
Optionally, in step S20, the function set includes the different functional layers of the tree structure, each functional layer corresponding to a different function and including a set number of units. The number of units may be fixed or non-fixed and can be designed according to the specific task. The functional layers include an input layer, a preprocessing layer, a feature extraction layer, a feature concatenation layer, and an output layer. The input layer is used to input raw data. The preprocessing layer is used to preprocess the raw data according to its type; for example, image data can be subjected to grayscale transformation. The feature extraction layer extracts the features of the raw data through a feature extraction network. The feature concatenation layer concatenates the different features extracted by the feature extraction layer. The output layer returns output results based on the features extracted by the feature extraction layer, or it can return a specific type of result. In step S20, the terminal set defines the parameters of the different functional layers: for example, the input layer defines the size of the image, the preprocessing layer defines the image preprocessing method and preprocessing parameters, and the feature extraction layer defines the network parameters used, so that the input and output types of the functional layers match each other and meet the requirements of the genetic programming algorithm.
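The relationship between the function set (the available functional layers) and the terminal set (the parameter choices each layer draws from) can be sketched as follows. All concrete names and parameter values here (`FUNCTION_SET`, `TERMINAL_SET`, the candidate sizes and methods) are illustrative assumptions, not values specified by the method.

```python
import random

# Hypothetical sketch: the function set lists the available functional layers,
# and the terminal set supplies the parameter choices for each of them.
FUNCTION_SET = ["input", "preprocess", "feature_extraction", "concat", "output"]

TERMINAL_SET = {
    "input":              {"image_size": [28, 32, 64]},
    "preprocess":         {"method": ["grayscale", "normalize"]},
    "feature_extraction": {"kernel_size": [3, 5], "channels": [16, 32, 64]},
    "concat":             {},     # concatenation needs no extra parameters
    "output":             {"num_classes": [10]},
}

def sample_layer(kind):
    # Pick one concrete parameterization of a functional layer from the
    # terminal set, as genetic programming would when building a tree.
    params = {name: random.choice(values)
              for name, values in TERMINAL_SET[kind].items()}
    return {"layer": kind, **params}

pipeline = [sample_layer(k) for k in FUNCTION_SET]
```

Because every layer only draws parameters from its own entry in the terminal set, any sampled pipeline automatically satisfies the per-layer parameter constraints.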
Optionally, as shown in
Optionally, the search method is computed on a GPU. Because the evolutionary algorithm involves a large number of feasible solutions, and the evolution of each feasible solution can be processed in parallel in different threads, the method is well suited to running on a GPU. Switching from a CPU+GPU computing framework to a pure-GPU computing framework greatly reduces the latency caused by end-to-end communication and data transmission, which can significantly improve the speed of automated neural network design and yield better neural networks. As shown in
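The per-candidate parallelism described above can be sketched with a thread pool standing in for GPU threads. This is an illustrative analogy only: the toy `evaluate` function and the list-of-bit-widths candidates are assumptions, and in the described method the evaluation would run entirely on the GPU.

```python
# Hypothetical sketch: each candidate's evaluation is independent, so a whole
# population can be scored in parallel. Here a CPU thread pool stands in for
# the GPU threads the method would actually use.
from concurrent.futures import ThreadPoolExecutor

def evaluate(candidate):
    # Stand-in for training/validating one candidate architecture.
    return sum(candidate)          # toy fitness: smaller is better

population = [[8, 8, 16], [4, 32, 8], [16, 16, 16], [4, 4, 4]]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order, so fitness[i] belongs to population[i].
    fitness = list(pool.map(evaluate, population))

best = population[fitness.index(min(fitness))]
```

Because the candidates share no state, no synchronization is needed between evaluations; only the final selection step reads all fitness values.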
Optionally, the optimizing method adopts a tree-structured parameter server architecture for parameter aggregation. In the tree-structured parameter server architecture, each parameter server receives the parameters from its child nodes and aggregates them. When all data is aggregated to the root node, the root node performs gradient descent and updates the model parameters of the target neural network architecture. Finally, the updated model parameters are distributed to each parameter server. As shown in
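The aggregate-at-root, distribute-down flow described above can be sketched as follows. The class and function names (`ParamServer`, `train_step`) and the scalar gradients are illustrative assumptions; a real system would aggregate gradient tensors over the network.

```python
# Hypothetical sketch of the tree-structured parameter server: gradients flow
# up the tree, the root applies gradient descent, and the updated parameters
# flow back down to every server.
class ParamServer:
    def __init__(self, local_grad, children=()):
        self.local_grad = local_grad   # gradient from this server's data shard
        self.children = list(children)
        self.params = None

    def aggregate(self):
        # Sum this node's gradient with the aggregated gradients of its subtree.
        return self.local_grad + sum(c.aggregate() for c in self.children)

    def distribute(self, params):
        # Push the root's updated parameters down to every server.
        self.params = params
        for c in self.children:
            c.distribute(params)

def train_step(root, params, lr=0.1):
    grad = root.aggregate()            # all data aggregated at the root
    params = params - lr * grad        # root performs gradient descent
    root.distribute(params)            # updated parameters fan back out
    return params

leaf1, leaf2 = ParamServer(1.0), ParamServer(2.0)
mid = ParamServer(0.5, [leaf1, leaf2])
root = ParamServer(0.5, [mid])
new_params = train_step(root, params=10.0)
```

Each server communicates only with its parent and children, so the root never handles more messages than its own child count, which is the point of the tree topology over a flat all-to-root design.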
Optionally, as shown in
Optionally, the optimizing method optimizes the computational performance of the parameter servers, and the specific optimization method is as follows: using the dataset size of each parameter server as the dependent variable and the working time and idle waiting time of each parameter server as the fitness function values, the performance of each parameter server is evaluated; based on the performance evaluation results, an adaptive genetic algorithm is used to optimize the workload of each parameter server. This avoids manual debugging to obtain the optimal solution, reduces the idle waiting time of the parameter servers, increases the dataset size each parameter server can handle, and thus fully utilizes their computational performance.
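The workload balancing described above can be sketched with a simple genetic algorithm. The server speeds, population size, and fixed mutation step are illustrative assumptions; an adaptive variant as described in the method would additionally adjust its operator rates, and the timing values would come from real measurements rather than a formula.

```python
import random

# Hypothetical sketch: a genetic algorithm rebalances dataset shares across
# parameter servers so that faster servers receive more data and the total
# idle waiting time shrinks.
SPEEDS = [1.0, 2.0, 4.0]          # assumed items/second for three servers
TOTAL_ITEMS = 700

def idle_time(shares):
    # Each server waits from when it finishes until the slowest one finishes.
    work = [s / v for s, v in zip(shares, SPEEDS)]
    return sum(max(work) - w for w in work)

def random_shares():
    # Split TOTAL_ITEMS at random cut points into one share per server.
    cuts = sorted(random.uniform(0, TOTAL_ITEMS) for _ in range(len(SPEEDS) - 1))
    bounds = [0.0] + cuts + [float(TOTAL_ITEMS)]
    return [bounds[i + 1] - bounds[i] for i in range(len(SPEEDS))]

def mutate(shares):
    # Move a random amount of data from one server to another.
    i, j = random.sample(range(len(shares)), 2)
    delta = random.uniform(0, shares[i])
    out = list(shares)
    out[i] -= delta
    out[j] += delta
    return out

population = [random_shares() for _ in range(30)]
for _ in range(200):
    population.sort(key=idle_time)
    population = population[:15] + [mutate(random.choice(population[:15]))
                                    for _ in range(15)]
best = min(population, key=idle_time)
```

The mutation operator conserves the total dataset size, so every individual remains a valid partition; selection then drives the shares toward being proportional to the server speeds, where idle time approaches zero.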
This embodiment is merely a specific example and does not indicate that the invention can be implemented only in this way.
An AI accelerator, obtained through the optimizing method for the AI accelerator of Embodiment 1. Through the encoding capability of genetic programming, the AI accelerator solves the problems of the lack of interpretability and understandability in the neural network generation of AI accelerators. At the same time, it utilizes the optimization performance of genetic programming as an evolutionary algorithm to effectively reduce the computation time of AI accelerators. In the search space of different weight precisions and feature precisions at each layer, it searches for a neural network architecture with optimal weight precision and feature precision, reducing the computational cost of the AI accelerator.
The above illustrates only some of the preferred embodiments of the present invention. Those skilled in the art may appreciate that various changes and equivalent substitutions can be made to the features and embodiments without departing from the spirit and scope of the present invention. Further, with the teaching of the present invention, such features and embodiments can be modified to adapt to specific situations and materials without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the specific embodiments disclosed herein, and all embodiments that fall within the scope of the claims of the application should be considered as belonging to the scope of protection of the present invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202311744044.0 | Dec 2023 | CN | national |
| Number | Date | Country | |
|---|---|---|---|
| Parent | PCT/CN2024/098529 | Jun 2024 | WO |
| Child | 19026584 | US |