This application relates to the field of artificial intelligence, and in particular, to a neural network tiling method, a prediction method, and a related apparatus.
Artificial intelligence (AI) is a theory, a method, a technology, or an application system that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by the digital computer, to perceive an environment, obtain knowledge, and achieve an optimal result based on the knowledge. In other words, artificial intelligence is a branch of computer science that is intended to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have perceiving, inference, and decision-making functions. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and inference, human-machine interaction, recommendation and search, AI basic theories, and the like. Machine learning is a main manner of implementing artificial intelligence.
In the field of machine learning and cognitive science, a neural network (NN for short) or an artificial neural network is a mathematical model or calculation model that mimics the structure and function of a biological neural network (the central nervous system, especially the brain, of an animal), and is configured to estimate or approximate a function. The neural network performs calculation through a large quantity of interconnected artificial neurons. Currently, there are mainly two methods for calculating the neural network: (1) a graphics processing unit (GPU); and (2) an application-specific integrated circuit (ASIC). However, regardless of the GPU or the ASIC, a conventional technical solution in which the neural network is used to process a processing task is as follows: A single vertex (that is, a neuron) in the neural network is used as a basic unit to perform calculation layer-by-layer. Because calculation result data of the single vertex in the neural network is usually relatively large and cannot be stored into on-chip storage, a calculation result needs to be exported to off-chip storage. Therefore, the off-chip storage needs to be accessed to store an output result of the single vertex in the neural network. Because a quantity of vertices in the neural network is usually relatively large, the off-chip storage needs to be frequently accessed in a calculation process of the neural network. When the off-chip storage is frequently accessed, calculation performance is limited to some extent by the bandwidth of the off-chip storage, and system power consumption is high.
Embodiments of this application provide a neural network tiling method, a prediction method, and a related apparatus. A neural network graph is tiled to obtain a depth subgraph, so as to generate a depth subnetwork. The depth subnetwork does not need to access an external memory in a process of performing a processing operation, which effectively decreases a quantity of times of accessing the external memory or even avoids accessing the external memory.
According to a first aspect, an embodiment of this application provides a neural network tiling method. The method may include: obtaining a neural network graph, where the neural network graph is used to represent a neural network, the neural network graph includes a plurality of vertices, and each vertex represents a calculation unit in the neural network; and tiling the neural network graph to obtain a depth subgraph, where the depth subgraph is used to represent a depth subnetwork, a plurality of vertices included in the depth subnetwork exchange data with each other by reading and writing an on-chip buffer, the depth subnetwork is configured to successively process at least two groups of data obtained by tiling first input data, to obtain first output data, the first input data is input data of the depth subnetwork, and the first input data includes one or more signals that can be processed by a computer.
The method is executed by a neural network tiling apparatus, and the neural network tiling apparatus may be a server, a terminal such as a mobile phone, or another computer device. In actual application, the neural network tiling apparatus may tile a neural network graph to obtain one or more depth subgraphs, so as to generate one or more depth subnetworks based on the one or more depth subgraphs. The neural network graph represents a neural network, and the essence of tiling the neural network graph is tiling the neural network. These depth subnetworks may be understood as subnetworks obtained by tiling the neural network, that is, each depth subnetwork includes a part of the vertices in the neural network. The tiling of the neural network described herein is only logical tiling. To be specific, only a procedure in which each vertex in the neural network processes input data of the vertex and a data reading and writing procedure are adjusted, instead of tiling the neural network into several parts. Vertices included in a tiled neural network (that is, these depth subnetworks) are vertices in the neural network, and a processing operation implemented by each vertex does not change. The neural network is configured to execute a target task, and each depth subnetwork is configured to execute a subtask included in the target task. It can be understood that these depth subnetworks can implement the target task. For example, a reference prediction result may be obtained by inputting reference input data to the neural network for prediction processing. The same reference prediction result is obtained by inputting the reference input data, for processing, to the one or more depth subnetworks obtained by tiling the neural network. In other words, a processing operation implemented by using the depth subnetworks obtained by tiling the neural network is the same as a processing operation implemented by the neural network. The depth subnetwork does not need to access an external memory in a process of processing input data of the depth subnetwork. Therefore, when the depth subnetwork obtained by tiling the neural network is used to execute the target task, a quantity of times of accessing the external memory can be decreased or access to the external memory can even be avoided. The external memory is a memory other than the on-chip buffer. Because the quantity of times of accessing the external memory is decreased, power consumption can be further reduced when a processing task is executed by using the tiled neural network.
In this embodiment of this application, the neural network tiling apparatus tiles a neural network graph to obtain one or more depth subgraphs, so as to generate one or more depth subnetworks based on the one or more depth subgraphs. When these depth subnetworks are used to perform a processing task of the neural network, a quantity of times of accessing the external memory can be greatly decreased, and power consumption can be reduced.
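For intuition, the tiled execution described above may be sketched as follows. This is a minimal illustrative sketch only, not the claimed implementation: the vertex functions, the buffer capacity, and the elementwise operations are all hypothetical, and a real depth subnetwork (for example, one containing convolutions) would need overlapping groups.

```python
ON_CHIP_CAPACITY = 4  # hypothetical on-chip buffer capacity, in elements

def tile(data, group_size):
    """Tile the first input data into groups that each fit the buffer."""
    return [data[i:i + group_size] for i in range(0, len(data), group_size)]

def run_depth_subnetwork(vertices, first_input):
    """Successively process tiled groups; intermediates stay on-chip."""
    first_output = []
    for group in tile(first_input, ON_CHIP_CAPACITY):
        on_chip = group             # the group is loaded into the on-chip buffer
        for vertex in vertices:     # vertices exchange data via the buffer only
            on_chip = vertex(on_chip)
        first_output.extend(on_chip)  # only final results leave the chip
    return first_output

# Two toy calculation units standing in for vertices of a depth subnetwork.
vertices = [lambda xs: [x * 2 for x in xs], lambda xs: [x + 1 for x in xs]]
print(run_depth_subnetwork(vertices, list(range(10))))  # input exceeds capacity
```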
In an optional manner, the method further includes: tiling the neural network graph to obtain a direct subgraph, where the direct subgraph is used to represent a direct subnetwork, a plurality of vertices included in the direct subnetwork exchange data with each other by reading and writing the on-chip buffer, the direct subnetwork is configured to process second input data as a whole to obtain second output data, and the second input data is input data of the direct subnetwork. The neural network tiling apparatus can obtain a depth subgraph and a direct subgraph by tiling the neural network graph. A depth subnetwork generated by using the depth subgraph can tile input data of the depth subnetwork into at least two groups of data for successive processing, and a direct subnetwork generated by using the direct subgraph can process input data of the direct subnetwork as a whole. Because the direct subnetwork processes the input data of the direct subnetwork as a whole, each vertex in the direct subnetwork needs to perform only one processing operation, which takes a short time.
In this implementation, the neural network tiling apparatus tiles the neural network graph to obtain the direct subgraph, so as to generate the direct subnetwork based on the direct subgraph. The direct subnetwork can process the input data of the direct subnetwork as a whole, thereby effectively reducing processing time.
In an optional manner, storage space required by the first input data is larger than available storage space of the on-chip buffer, storage space required by each of the at least two groups of data is not larger than the available storage space of the on-chip buffer, and storage space required by the second input data is not larger than the available storage space of the on-chip buffer.
In this implementation, in one aspect, when storage space required by input data is larger than the available storage space of the on-chip buffer, the input data is tiled into at least two groups of data for successive processing, so that storage space required by each group of data is not larger than the available storage space of the on-chip buffer, thereby avoiding accessing the external memory. In another aspect, when the storage space required by the input data is not larger than the available storage space of the on-chip buffer, the input data is processed as a whole, so that processing time can be reduced while the external memory is not accessed.
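The size rule in this implementation can be summarized by a short predicate. The sketch below is illustrative only; the names and sizes are hypothetical.

```python
def choose_mode(input_size, available_on_chip):
    """Return which kind of subnetwork should handle input of this size."""
    if input_size > available_on_chip:
        return "depth"   # tile into at least two groups, process successively
    return "direct"      # process the input data as a whole

assert choose_mode(input_size=1024, available_on_chip=256) == "depth"
assert choose_mode(input_size=128, available_on_chip=256) == "direct"
```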
In an optional manner, each of the plurality of vertices included in the depth subnetwork performs at least two processing operations in a process of processing the first input data, and at least one vertex in the direct subnetwork performs one processing operation in a process of processing the second input data.
In this implementation, the first input data is tiled into at least two groups for successive processing, so as to avoid accessing the external memory.
In an optional manner, the tiling the neural network graph to obtain a depth subgraph includes: obtaining a first reference subgraph, where the first reference subgraph includes a first vertex and a second vertex, the first vertex is a current to-be-allocated vertex in the neural network graph, and the second vertex is a next vertex of the first vertex in the neural network graph; adding a third vertex to the first reference subgraph to obtain a second reference subgraph, where the third vertex is a next vertex of the second vertex in the neural network graph, and the second reference subgraph is used to process third input data; allocating an address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph processes the third input data; and when the address of the on-chip buffer is successfully allocated to the second reference subgraph and the third vertex is an end vertex, using the second reference subgraph as the depth subgraph. A reference subgraph is a subgraph that is not determined as the depth subgraph or the direct subgraph and includes at least one vertex, and a start vertex of the subgraph is a current to-be-allocated vertex in the neural network graph. In other words, the reference subgraph is a subgraph that includes a current to-be-allocated vertex in the neural network graph, or includes a current to-be-allocated vertex in the neural network graph and a next vertex of the to-be-allocated vertex. In a procedure of a neural network graph tiling method, the neural network graph tiling apparatus may add one or more unallocated vertices in the neural network graph based on a reference subgraph to obtain a direct subgraph or a depth subgraph; and after obtaining one direct subgraph or one depth subgraph, the neural network graph tiling apparatus adds one or more unallocated vertices in the neural network graph based on another reference subgraph (a subgraph including a current to-be-allocated vertex in the neural network graph, or a subgraph including a current to-be-allocated vertex in the neural network graph and a next vertex of the to-be-allocated vertex) to obtain a new direct subgraph or a new depth subgraph. It can be understood that each reference subgraph includes a current to-be-allocated vertex in the neural network graph, and a direct subgraph or a depth subgraph can be obtained based on each reference subgraph.
In this implementation, the neural network tiling apparatus can quickly and accurately generate a depth subgraph based on a result of allocating the address of the on-chip buffer and whether a vertex is an end vertex.
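Under stated assumptions, the growth procedure in this implementation may be sketched as follows. `try_allocate` and `is_end_vertex` are hypothetical stand-ins for the on-chip address allocation check and the end-vertex check, and the sketch assumes at least two unallocated vertices remain when a first reference subgraph is formed.

```python
def grow_depth_subgraph(graph, start, try_allocate, is_end_vertex):
    """Grow a reference subgraph vertex by vertex into a depth subgraph."""
    subgraph = [graph[start], graph[start + 1]]  # first reference subgraph
    i = start + 2
    while i < len(graph):
        subgraph.append(graph[i])      # forms a second reference subgraph
        if not try_allocate(subgraph):  # on-chip allocation failed:
            subgraph.pop()              # fall back to the prior subgraph,
            return ("direct", subgraph)  # which is used as a direct subgraph
        if is_end_vertex(graph[i]):
            return ("depth", subgraph)  # allocation succeeded on the end vertex
        i += 1
    return ("depth", subgraph)          # ran out of vertices to add

graph = ["v1", "v2", "v3", "v4"]
print(grow_depth_subgraph(graph, 0,
                          try_allocate=lambda sg: len(sg) <= 3,
                          is_end_vertex=lambda v: v == "v4"))
# -> ('direct', ['v1', 'v2', 'v3']): adding v4 made allocation fail
```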
In an optional manner, after the allocating an address of the on-chip buffer to the second reference subgraph, the method further includes: when the address of the on-chip buffer fails to be allocated to the second reference subgraph, using the first reference subgraph as the direct subgraph.
A case in which the address of the on-chip buffer fails to be allocated to the second reference subgraph includes the following: The address of the on-chip buffer fails to be allocated to the second reference subgraph in a process in which at least two groups of data obtained by tiling the third input data are successively processed. That the address of the on-chip buffer cannot be successfully allocated to the second reference subgraph means that the second reference subgraph cannot be used as the depth subgraph. In this implementation, when the address of the on-chip buffer fails to be allocated to the second reference subgraph, the first reference subgraph is used as the direct subgraph, thereby avoiding generating a wrong depth subgraph.
In an optional manner, the allocating an address of the on-chip buffer to the second reference subgraph includes: determining whether the address of the on-chip buffer is successfully allocated to the second reference subgraph in a process in which the second reference subgraph processes the third input data as a whole; and when the address of the on-chip buffer fails to be allocated to the second reference subgraph, allocating the address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph successively processes at least two groups of data obtained by tiling the third input data.
In this implementation, if the address of the on-chip buffer is successfully allocated to the second reference subgraph in the process in which the second reference subgraph processes the third input data as a whole, a subsequent operation does not need to be performed, thereby reducing operations.
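The two-phase attempt in this implementation reduces to a simple fallback, sketched below with hypothetical helpers `alloc_whole` and `alloc_tiled` standing in for the on-chip address manager.

```python
def allocate_onchip(subgraph, alloc_whole, alloc_tiled):
    """Try whole-input allocation first; fall back to tiled allocation."""
    if alloc_whole(subgraph):     # whole-input allocation succeeded:
        return True               # no further operation is needed
    return alloc_tiled(subgraph)  # retry with the input tiled into groups

# Example: whole-input allocation fails, tiled allocation succeeds.
print(allocate_onchip(["v1", "v2"],
                      alloc_whole=lambda sg: False,
                      alloc_tiled=lambda sg: True))  # -> True
```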
In an optional manner, after the tiling the neural network graph to obtain a depth subgraph, the method further includes: when a quantity of vertices included in the depth subgraph is not less than a first threshold, tiling the depth subgraph to obtain a first second-order subgraph and a second second-order subgraph, where the first second-order subgraph is used to represent a first second-order subnetwork, the second second-order subgraph is used to represent a second second-order subnetwork, both the first second-order subnetwork and the second second-order subnetwork are included in the depth subnetwork, and vertices included in the first second-order subnetwork are all different from vertices included in the second second-order subnetwork.
The first threshold may be 5, 6, 7, 8, or the like. This is not limited in this embodiment of this application. The first second-order subgraph and the second second-order subgraph are subgraphs obtained by tiling the depth subgraph. The first second-order subnetwork and the second second-order subnetwork may be understood as two subnetworks obtained by tiling the depth subnetwork. For example, one of the first second-order subgraph and the second second-order subgraph is a direct subgraph, and the other one is a depth subgraph. In other words, a new depth subgraph and a new direct subgraph may be obtained by tiling a depth subgraph that includes vertices whose quantity is not less than the first threshold. For example, both the first second-order subgraph and the second second-order subgraph are depth subgraphs. In other words, two new depth subgraphs may be obtained by tiling a depth subgraph that includes vertices whose quantity is not less than the first threshold. For example, a depth subgraph includes a vertex 1, a vertex 2, a vertex 3, a vertex 4, a vertex 5, and a vertex 6. The depth subgraph is tiled to obtain a depth subgraph including the vertex 1, the vertex 2, and the vertex 3, and a depth subgraph including the vertex 4, the vertex 5, and the vertex 6. For example, both the first second-order subgraph and the second second-order subgraph are direct subgraphs. In other words, two new direct subgraphs may be obtained by tiling a depth subgraph that includes vertices whose quantity is not less than the first threshold. In some embodiments, the neural network tiling apparatus first tiles the neural network graph to obtain at least one depth subgraph, and then tiles a depth subgraph that is in the at least one depth subgraph and that includes vertices whose quantity is not less than the first threshold, so that a quantity of vertices included in each depth subgraph is less than the first threshold. It can be understood that a larger quantity of vertices included in a depth subnetwork indicates a larger repeated calculation amount of the depth subnetwork. The repeated calculation amount can be effectively reduced by decreasing a quantity of vertices included in a depth subnetwork.
In this implementation, a depth subgraph that includes vertices whose quantity is not less than the first threshold is tiled, so that a quantity of depth subgraphs that include vertices whose quantity exceeds the first threshold can be decreased, thereby reducing a repeated calculation amount of a depth subnetwork generated based on the depth subgraph.
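A hedged sketch of this second-order tiling follows. The midpoint split is a simplification: the application selects the tiling point by depth difference, as described in a later implementation, and the threshold value here is only an example.

```python
FIRST_THRESHOLD = 6  # may be 5, 6, 7, 8, or the like; not limited here

def second_order_tile(depth_subgraph):
    """Split an over-threshold depth subgraph into two disjoint subgraphs."""
    if len(depth_subgraph) < FIRST_THRESHOLD:
        return [depth_subgraph]                 # small enough: keep as-is
    mid = len(depth_subgraph) // 2
    return [depth_subgraph[:mid], depth_subgraph[mid:]]

print(second_order_tile(["v1", "v2", "v3", "v4", "v5", "v6"]))
# -> [['v1', 'v2', 'v3'], ['v4', 'v5', 'v6']], matching the example above
```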
In an optional manner, a plurality of vertices included in the first second-order subnetwork exchange data with each other by reading and writing the on-chip buffer, and a plurality of vertices included in the second second-order subnetwork exchange data with each other by reading and writing the on-chip buffer.
In this implementation, a quantity of times of accessing the external memory can be decreased, thereby reducing power consumption.
In an optional manner, input data of the first second-order subnetwork is the first input data, output data of the second second-order subnetwork is the first output data, the first second-order subnetwork is configured to store, into a middle buffer, first intermediate data obtained by processing the first input data, the second second-order subnetwork is configured to process the first intermediate data obtained from the middle buffer, and the middle buffer is not the on-chip buffer.
Optionally, the middle buffer may be an external buffer whose reading and writing speed is lower than that of the on-chip buffer but higher than that of the external memory. In other words, performance of the middle buffer is between that of the on-chip buffer and that of the external memory: reading and writing performance of the middle buffer is better than that of the external memory but lower than that of the on-chip buffer, and storage space of the middle buffer is smaller than storage space of the external memory but larger than storage space of the on-chip buffer.
In this implementation, the middle buffer temporarily stores output data of the first second-order subnetwork, so as to decrease a quantity of depth subgraphs that include vertices whose quantity exceeds the first threshold.
In an optional manner, the middle buffer is an off-chip buffer whose reading and writing speed is lower than that of the on-chip buffer.
In an optional manner, after the tiling the depth subgraph to obtain a first second-order subgraph and a second second-order subgraph, the method further includes: when the second second-order subnetwork is configured to process the first intermediate data as a whole, combining the second second-order subgraph and a first direct subgraph to obtain a second direct subgraph, where the first direct subgraph is used to represent a first direct subnetwork, input data of the first direct subnetwork is the first output data that is output by the second second-order subnetwork, the first direct subnetwork is configured to process the first output data as a whole to obtain third output data, the second direct subgraph is used to represent a second direct subnetwork, and the second direct subnetwork is configured to process the first intermediate data as a whole to obtain the third output data.
In this implementation, two neighboring direct subgraphs are combined into one direct subgraph, so that the two neighboring direct subgraphs are used as a whole to generate a corresponding instruction.
In an optional implementation, the tiling the depth subgraph to obtain a first second-order subgraph and a second second-order subgraph includes: determining at least one reference vertex that is in a plurality of vertices included in the depth subgraph and whose output data needs to occupy storage space smaller than available storage space of the middle buffer; and tiling the depth subgraph by using an output of an intermediate vertex in the at least one reference vertex as a tiling point to obtain the first second-order subgraph and the second second-order subgraph, where the intermediate vertex is any reference vertex in the at least one reference vertex, and output data of the intermediate vertex is output data of the first second-order subgraph and is input data of the second second-order subgraph.
In this implementation, one depth subgraph can be quickly tiled into two second-order subgraphs, and it can be ensured that output data of a subnetwork represented by the first second-order subgraph can be stored into the middle buffer.
In an optional implementation, before the tiling the depth subgraph by using an output of an intermediate vertex in the at least one reference vertex as a tiling point to obtain the first second-order subgraph and the second second-order subgraph, the method further includes: obtaining a depth difference between two second-order subgraphs that are obtained by tiling the depth subgraph by separately using an output of the at least one reference vertex as a tiling point, to obtain at least one depth difference, where the at least one reference vertex is in a one-to-one correspondence with the at least one depth difference; and determining that the output of the intermediate vertex that is in the at least one reference vertex and that corresponds to a depth difference less than a depth difference threshold is used as a tiling point to tile the depth subgraph. Optionally, it is determined that the output of the intermediate vertex that is in the at least one reference vertex and that corresponds to a minimum depth difference is used as a tiling point to tile the depth subgraph. The depth difference threshold may be 1, 2, 3, or the like. A depth of a subgraph may be a quantity of vertices included in the subgraph. In this implementation, the depth subgraph can be quickly tiled into two second-order subgraphs between which a depth difference is less than the depth difference threshold.
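The tiling-point selection in this implementation may be sketched as follows, under the assumption that the depth of a subgraph is its vertex count. `out_size` and `middle_buffer_space` are hypothetical names for the per-vertex output size and the available middle-buffer space.

```python
def pick_tiling_point(depth_subgraph, out_size, middle_buffer_space):
    """Pick the reference vertex whose split minimizes the depth difference."""
    # Reference vertices: vertices whose output data fits the middle buffer.
    # The last vertex is excluded, since splitting after it leaves an empty
    # second subgraph.
    candidates = [i for i, v in enumerate(depth_subgraph[:-1])
                  if out_size(v) < middle_buffer_space]
    if not candidates:
        return None

    def depth_diff(i):
        first, second = depth_subgraph[:i + 1], depth_subgraph[i + 1:]
        return abs(len(first) - len(second))  # depth taken as vertex count

    return min(candidates, key=depth_diff)

graph = ["v1", "v2", "v3", "v4", "v5", "v6"]
split = pick_tiling_point(graph, out_size=lambda v: 1, middle_buffer_space=2)
print(graph[:split + 1], graph[split + 1:])
# -> ['v1', 'v2', 'v3'] ['v4', 'v5', 'v6'], a depth difference of zero
```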
In an optional manner, after the tiling the neural network graph to obtain a depth subgraph, the method further includes: generating a target instruction corresponding to the depth subgraph, where the target instruction is used to execute a target subtask, the neural network is configured to execute a target task, and the target subtask is a part of the target task.
In this implementation, the target instruction corresponding to the depth subgraph is generated, so that the target subtask can be implemented by executing the target instruction.
According to a second aspect, an embodiment of this application provides a neural network-based prediction method. The method may include: obtaining original input data, where the original input data includes one or more signals that can be processed by a computer; inputting the original input data to a neural network for prediction processing to obtain a prediction result, where the prediction processing includes: successively inputting, to a depth subnetwork for processing, at least two groups of data obtained by tiling first input data, where the depth subnetwork is included in the neural network and includes a part of vertices in the neural network, each vertex represents a calculation unit in the neural network, and a plurality of vertices included in the depth subnetwork exchange data with each other by reading and writing an on-chip buffer, and the first input data is obtained in the process of inputting the original input data to the neural network for prediction processing; and outputting the prediction result.
In this embodiment of this application, in the process of inputting the original input data to the neural network for prediction processing, the depth subnetwork is used to execute a processing task of the neural network. Because the depth subnetwork does not need to access an external memory in a process of processing input data of the depth subnetwork, a quantity of times of accessing the external memory can be greatly decreased, and power consumption can be reduced.
In an optional manner, the prediction processing further includes: processing, by a direct subnetwork, second input data as a whole, where the direct subnetwork is included in the neural network and includes a part of vertices in the neural network, and the second input data is obtained in the process of inputting the original input data to the neural network for prediction processing.
In this implementation, the direct subnetwork processes the input data of the direct subnetwork as a whole, so that a processing speed is fast.
In an optional manner, storage space required by the second input data is not larger than available storage space of the on-chip buffer.
In this implementation, when storage space required by input data is not larger than the available storage space of the on-chip buffer, the input data is processed as a whole, so that processing time can be reduced while the external memory is not accessed.
In an optional manner, at least one vertex in the direct subnetwork performs one processing operation in a process of processing the second input data.
In this implementation, processing time can be reduced.
In an optional manner, storage space required by the first input data is larger than the available storage space of the on-chip buffer, and storage space required by each of the at least two groups of data is not larger than the available storage space of the on-chip buffer.
In this implementation, when storage space required by input data is larger than the available storage space of the on-chip buffer, the input data is tiled into at least two groups of data for successive processing, so that storage space required by each group of data is not larger than the available storage space of the on-chip buffer, thereby avoiding accessing the external memory.
In an optional manner, each of the plurality of vertices included in the depth subnetwork performs at least two processing operations in a process of processing the first input data.
In this implementation, each vertex in the depth subnetwork successively performs at least two processing operations, so as to avoid accessing the external memory.
In an optional manner, an off-chip memory does not need to be accessed in the process of inputting the original input data to the neural network for prediction processing to obtain the prediction result.
In an optional manner, the one or more signals that can be processed by a computer include at least one of a voice signal, a text signal, or an image signal.
According to a third aspect, an embodiment of this application provides a neural network graph tiling apparatus, and the apparatus includes a memory and a processor. The memory is configured to store code. The processor is configured to perform the following operations by reading the code stored in the memory: obtaining a neural network graph, where the neural network graph is used to represent a neural network, the neural network graph includes a plurality of vertices, and each vertex represents a calculation unit in the neural network; and tiling the neural network graph to obtain a depth subgraph, where the depth subgraph is used to represent a depth subnetwork, a plurality of vertices included in the depth subnetwork exchange data with each other by reading and writing an on-chip buffer, the depth subnetwork is configured to successively process at least two groups of data obtained by tiling first input data, to obtain first output data, the first input data is input data of the depth subnetwork, and the first input data includes one or more signals that can be processed by a computer.
In an optional manner, the processor is further configured to tile the neural network graph to obtain a direct subgraph, where the direct subgraph is used to represent a direct subnetwork, a plurality of vertices included in the direct subnetwork exchange data with each other by reading and writing the on-chip buffer, the direct subnetwork is configured to process second input data as a whole to obtain second output data, and the second input data is input data of the direct subnetwork.
In an optional manner, storage space required by the first input data is larger than available storage space of the on-chip buffer, storage space required by each of the at least two groups of data is not larger than the available storage space of the on-chip buffer, and storage space required by the second input data is not larger than the available storage space of the on-chip buffer.
In an optional manner, each of the plurality of vertices included in the depth subnetwork performs at least two processing operations in a process of processing the first input data, and at least one vertex in the direct subnetwork performs one processing operation in a process of processing the second input data.
In an optional manner, the processor is specifically configured to: obtain a first reference subgraph, where the first reference subgraph includes a first vertex and a second vertex, the first vertex is a current to-be-allocated vertex in the neural network graph, and the second vertex is a next vertex of the first vertex in the neural network graph; add a third vertex to the first reference subgraph to obtain a second reference subgraph, where the third vertex is a next vertex of the second vertex in the neural network graph, and the second reference subgraph is used to process third input data; determine whether an on-chip address manager successfully allocates an address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph processes the third input data; and when the address of the on-chip buffer is successfully allocated to the second reference subgraph and the third vertex is an end vertex, use the second reference subgraph as the depth subgraph.
In an optional manner, the processor is further configured to: when the address of the on-chip buffer fails to be allocated to the second reference subgraph, use the first reference subgraph as the direct subgraph.
In an optional manner, the on-chip address manager is specifically configured to allocate the address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph processes the third input data as a whole. The processor is specifically configured to: when the on-chip address manager fails to allocate the address of the on-chip buffer to the second reference subgraph, allocate the address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph successively processes at least two groups of data obtained by tiling the third input data.
In an optional manner, the processor is further configured to generate a target instruction corresponding to the depth subgraph, where the target instruction is used to execute a target subtask, the neural network is configured to execute a target task, and the target subtask is a part of the target task.
According to a fourth aspect, an embodiment of this application provides a data processing apparatus, and the data processing apparatus includes a memory and a processor. The memory is configured to store code and original input data. The processor is configured to perform the following operations by reading the code stored in the memory: obtaining the original input data, where the original input data includes one or more signals that can be processed by a computer; and inputting the original input data to a neural network for prediction processing to obtain a prediction result, where the prediction processing includes: successively inputting, to a depth subnetwork for processing, at least two groups of data obtained by tiling first input data, where the depth subnetwork is included in the neural network and includes a part of vertices in the neural network, each vertex represents a calculation unit in the neural network, a plurality of vertices included in the depth subnetwork exchange data with each other by reading and writing an on-chip buffer, and the first input data is obtained in the process of inputting the original input data to the neural network for prediction processing; and outputting the prediction result.
In an optional manner, the prediction processing further includes: processing, by a direct subnetwork, second input data as a whole, where the direct subnetwork is included in the neural network and includes a part of vertices in the neural network, and the second input data is obtained in the process of inputting the original input data to the neural network for prediction processing.
In an optional manner, storage space required by the second input data is not larger than available storage space of the on-chip buffer.
In an optional manner, at least one vertex in the direct subnetwork performs one processing operation in a process of processing the second input data.
In an optional manner, storage space required by the first input data is larger than the available storage space of the on-chip buffer, and storage space required by each of the at least two groups of data is not larger than the available storage space of the on-chip buffer.
In an optional manner, each of the plurality of vertices included in the depth subnetwork performs at least two processing operations in a process of processing the first input data.
In an optional manner, the processor does not need to access an off-chip memory in the process of inputting the original input data to the neural network for prediction processing to obtain the prediction result.
In an optional manner, the one or more signals that can be processed by a computer include at least one of a voice signal, a text signal, or an image signal.
According to a fifth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program includes program instructions. When the program instructions are executed by a processor, the processor is enabled to perform the method according to the first aspect, the second aspect, or the optional implementations thereof.
To make a person skilled in the art better understand the technical solutions in this application, the following clearly describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application. It is clear that the described embodiments are merely a part but not all of the embodiments of this application.
In the specification, the claims, and the accompanying drawings of this application, the terms “first”, “second”, “third”, and the like are intended to distinguish between similar objects, but do not necessarily indicate a specific order or sequence. In addition, the terms “including” and “having” and any variants thereof are intended to cover non-exclusive inclusion, for example, inclusion of a series of steps or units. A process, method, system, product, or device is not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or are inherent to such a process, method, product, or device.
Currently, a conventional technical solution in which a neural network is used to perform a processing operation (for example, image processing or voice processing) is as follows: A single vertex (that is, a neuron or a calculation unit in the neural network) in the neural network is used as a basic unit to perform calculation layer-by-layer, and a calculation result is exported to external storage. A data amount of the calculation result of the single vertex in the neural network is usually large, and the calculation result cannot be stored into an internal storage unit, that is, an on-chip buffer. Therefore, the external storage needs to be accessed to store an output result of the single vertex in the neural network.
A main principle of this application is converting an expression form of the NN from using the OP (operation, that is, a single vertex or calculation unit) as a basic unit into using a depth subgraph and a direct subgraph as basic units. The depth subgraph represents a depth subnetwork, and the direct subgraph represents a direct subnetwork. Each depth subgraph and each direct subgraph carries information about on-chip address allocation (which is completed by an on-chip address management module) and the like, and the depth subgraph also carries information about data tiling (which is completed by a data tiling module). Each depth subgraph and each direct subgraph includes a plurality of OPs, and the plurality of OPs included in each subgraph exchange data with each other by reading and writing the on-chip buffer. In other words, each depth subgraph is equivalent to an OP in the classical calculation form, and each direct subgraph is also equivalent to an OP in the classical calculation form. The AI chip may complete instruction mapping based on the new expression form (a form including the depth subgraph and the direct subgraph), that is, may complete calculation of the entire NN in an on-chip memory (buffer). Each depth subgraph may be mapped to at least two calculation instruction pipelines, each direct subgraph may be mapped to one calculation instruction pipeline, and each calculation instruction pipeline is used to perform a series of operations.
A difference between the depth subgraph and the direct subgraph lies in the following: The depth subgraph corresponds to at least two calculation instruction pipelines, and each calculation instruction pipeline is used to perform a processing operation based on a part of input data of the depth subgraph. The direct subgraph corresponds to only one calculation instruction pipeline, and the calculation instruction pipeline corresponding to the direct subgraph is used to perform a processing operation based on all input data of the direct subgraph. It can be understood that the plurality of PipeLines corresponding to the depth subgraph are used to successively process at least two groups of data obtained by tiling the input data, to obtain output data, that is, each PipeLine processes a part of the input data.
For a depth subgraph, an amount of output data obtained by performing a processing operation by a calculation instruction pipeline corresponding to the depth subgraph is fixed. A larger quantity of pieces of data obtained by tiling input data of the depth subgraph indicates smaller storage overheads required for performing a processing operation by each calculation instruction pipeline. In addition, in a process of performing processing operations by the plurality of PipeLines corresponding to the depth subgraph, some addresses in the on-chip buffer can be multiplexed. To be specific, in this process, storage space occupied by invalid data (data not required in a subsequent processing operation) is released, and only valid data (data required in a subsequent processing operation) is retained. In this way, each PipeLine requires small storage overheads of the on-chip buffer when performing the processing operation. It can be understood that at least two groups of data obtained by tiling input data are separately processed to obtain output data, so that the occupied storage space of the on-chip buffer can be greatly reduced. An execution operation that originally (when the data is not tiled) needs to be completed by accessing an external memory can be implemented simply by reading and writing the on-chip buffer, thereby reducing access to the external memory. For a specific vertex, when storage space required when the vertex processes input data of the vertex exceeds current available storage space of the on-chip buffer, the input data of the vertex is tiled, and a plurality of groups of data obtained by tiling the input data are separately processed. In this way, the vertex can complete a processing operation without accessing the external memory.
In addition, processing operations performed by the PipeLines in the plurality of PipeLines corresponding to the depth subgraph are similar to each other, and a larger quantity of pieces of data obtained by tiling the input data of the depth subgraph indicates a larger quantity of PipeLines corresponding to the depth subgraph. Because the PipeLines successively perform processing (that is, perform processing in series), a larger quantity of PipeLines corresponding to the depth subgraph indicates longer time required by these PipeLines to complete processing operations. To consider both storage overheads and calculation efficiency, the input data of the depth subgraph needs to be tiled properly. Each direct subgraph corresponds to one PipeLine, and a vertex corresponding to the direct subgraph needs to perform only one processing operation. The depth subgraph corresponds to a plurality of PipeLines, and a vertex corresponding to the depth subgraph needs to perform a plurality of processing operations.
In conclusion, it can be learned that calculation efficiency of the vertex corresponding to the depth subgraph and overheads of on-chip storage space required by the vertex to perform a processing operation are both lower than those of the vertex corresponding to the direct subgraph. Therefore, the neural network graph needs to be properly tiled to obtain a depth subgraph and a direct subgraph, and PipeLines are generated based on the depth subgraph and the direct subgraph, so as to improve calculation efficiency while reducing access to the external memory. In addition, when the OP is used as a basic calculation unit, a PipeLine corresponding to the neural network needs to frequently access external storage in a process of performing a processing operation. As a result, power consumption is high, and the AI chip cannot meet a requirement in a mobile phone terminal scenario, such as face detection and recognition in a screen-off scenario. When the depth subgraph and/or the direct subgraph are/is used as basic calculation units, in the process of performing a processing operation, the PipeLine corresponding to the neural network does not need to frequently access the external storage, so that power consumption is low, and a requirement in a screen-off scenario or the like can be well met.
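The mapping described above may be illustrated with a short sketch. It is not the chip's actual instruction mapping; it only shows that a direct subgraph yields one PipeLine over the whole input, while a depth subgraph yields one PipeLine per tiled group, executed in series. All names and sizes are hypothetical.

```python
def map_to_pipelines(subgraph_kind, input_data, num_tiles):
    """Return the per-PipeLine inputs for a direct or depth subgraph."""
    if subgraph_kind == "direct":
        return [input_data]                        # exactly one PipeLine
    size = -(-len(input_data) // num_tiles)        # ceiling division
    return [input_data[i:i + size]                 # one PipeLine per group,
            for i in range(0, len(input_data), size)]  # executed in series

print(len(map_to_pipelines("direct", list(range(8)), 1)))  # -> 1 PipeLine
print(len(map_to_pipelines("depth", list(range(8)), 4)))   # -> 4 PipeLines
```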
A manner of tiling a neural network graph to obtain a depth subgraph and a direct subgraph provided in an embodiment of this application is described below.
A process of tiling a neural network graph to obtain a direct subgraph and a depth subgraph is briefly described below with reference to
In some embodiments, after performing graph tiling (that is, first-level graph tiling) on the neural network graph by using the neural network graph tiling processing system in
A process of first tiling a neural network graph to obtain a direct subgraph and a depth subgraph, and then tiling a depth subgraph whose depth is not less than a first threshold is described below with reference to
Further, the control module 401 controls the on-chip address management module 404 to allocate an address to a Tensor inside the depth subgraph. It can be learned from
A procedure in which the control module 401 controls the graph tiling module 402, the data tiling module 403, and the on-chip address management module 404 to tile a neural network graph to obtain a depth subgraph and a direct subgraph, and an address allocation procedure are first described below.
501. A neural network graph tiling apparatus obtains a neural network graph.
The neural network graph (NN Graph) is an expression form of a neural network. The neural network graph may include information about a processing operation performed by the neural network, and information such as a size of storage space that is occupied by input/output data of each OP in the neural network in a process in which the OP processes a processing operation. That the neural network graph tiling apparatus obtains the neural network graph may be as follows: The neural network graph tiling apparatus receives the neural network graph that is input by a user; the neural network graph tiling apparatus obtains program code corresponding to the neural network, and determines the neural network graph based on the program code; the neural network graph tiling apparatus obtains reference information representing the neural network graph or the neural network, and determines the neural network graph based on the reference information; or the neural network graph tiling apparatus obtains the neural network graph in another manner. This is not limited in this application.
502. The neural network graph tiling apparatus determines whether traversal of all vertices in the neural network graph is completed.
A process in which the neural network graph tiling apparatus tiles the neural network graph may be successively placing vertices included in the neural network graph into a direct subgraph or a depth subgraph through tiling in a sequence of performing processing operations by the vertices, until traversal of all the vertices is completed. Determining whether traversal of all the vertices in the neural network graph is completed may be determining whether a last vertex in the neural network graph is placed into a specific direct subgraph or depth subgraph through tiling. Step 513 is performed if traversal of all the vertices in the neural network graph is completed; or step 503 is performed if traversal of all the vertices in the neural network graph is incomplete.
503. The neural network graph tiling apparatus uses a current vertex as a start vertex in a reference subgraph.
The current vertex is a current to-be-tiled vertex in the neural network graph. To be specific, each vertex that is in the neural network graph and that performs a processing operation before the current vertex is placed into a direct subgraph or a depth subgraph through tiling, and all other vertices need to perform a processing operation after the current vertex.
504. The neural network graph tiling apparatus determines whether the current vertex is a last vertex in the neural network graph.
Step 506 is performed if the current vertex is the last vertex in the neural network graph; or step 505 is performed if the current vertex is not the last vertex in the neural network graph.
505. The neural network graph tiling apparatus adds a next vertex of the current vertex to the reference subgraph.
The next vertex of the current vertex may be a vertex that is in the neural network graph and that first performs a processing operation after the current vertex performs a processing operation.
506. The neural network graph tiling apparatus determines whether data tiling is successfully performed.
Step 507 is performed if data tiling is successfully performed; or step 508 is performed if data tiling fails to be performed. An implementation of step 506 is described in detail in a subsequent embodiment. Details are not described herein.
507. The neural network graph tiling apparatus constructs a subgraph by using a direct mode.
The neural network graph tiling apparatus constructs a subgraph by using a direct mode based on the reference subgraph obtained in step 504 or step 505. For example, the reference subgraph includes the vertex 5 and the vertex 6 in
508. The neural network graph tiling apparatus constructs a subgraph by using a depth mode.
The neural network graph tiling apparatus constructs a subgraph by using a depth mode based on the reference subgraph obtained in step 504 or step 505. For example, the reference subgraph includes the vertex 1 and the vertex 2 in
509. The neural network graph tiling apparatus determines whether subgraph construction succeeds.
Step 510 is performed if subgraph construction succeeds; or step 512 is performed if subgraph construction fails. That the neural network graph tiling apparatus determines whether subgraph construction succeeds may be determining whether a direct subgraph or a depth subgraph is successfully constructed. Specifically, if a depth subgraph is obtained after the neural network graph tiling apparatus performs step 508, it is determined that subgraph construction succeeds. Alternatively, if information indicating that subgraph construction fails is output after the neural network graph tiling apparatus performs step 508, it is determined that subgraph construction fails. Similarly, if a direct subgraph is obtained after the neural network graph tiling apparatus performs step 507, it is determined that subgraph construction succeeds. Alternatively, if information indicating that subgraph construction fails is output after the neural network graph tiling apparatus performs step 507, it is determined that subgraph construction fails.
510. The neural network graph tiling apparatus outputs a subgraph and address allocation information.
The address allocation information is obtained in a process of constructing the subgraph. The address allocation information is an on-chip address that is allocated to each vertex in the subgraph in a process of performing a processing operation by the vertex.
511. The neural network graph tiling apparatus uses a next vertex of the subgraph as a start vertex of a next reference subgraph.
For example, the subgraph that is output in step 510 is the depth subgraph in
512. The neural network graph tiling apparatus returns a tiling failure.
That the neural network graph tiling apparatus returns a tiling failure may be outputting information indicating that neural network graph tiling fails.
513. Output all subgraphs and address allocation information.
From a perspective of reducing overheads of an on-chip buffer, the neural network graph may be tiled into a plurality of depth subgraphs. In other words, subgraphs obtained by tiling the neural network graph do not include the direct subgraph.
In this embodiment of this application, the neural network graph tiling apparatus tiles the neural network graph in two subgraph construction manners, which not only can effectively decrease a quantity of times of accessing an external memory, but also can ensure that calculation performance is not greatly affected.
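For reference, steps 501 to 513 may be restated as the following simplified sketch. The helpers `data_tiling_succeeds`, `build_direct`, and `build_depth` are hypothetical stand-ins for the modules in the text, and the real procedure also outputs address allocation information for each subgraph.

```python
def tile_neural_network_graph(graph, data_tiling_succeeds,
                              build_direct, build_depth):
    """Traverse all vertices, emitting direct or depth subgraphs in order."""
    subgraphs, current = [], 0
    while current < len(graph):                      # 502: traversal complete?
        ref = [graph[current]]                       # 503: start vertex
        if current + 1 < len(graph):                 # 504: last vertex check
            ref.append(graph[current + 1])           # 505: add the next vertex
        if data_tiling_succeeds(ref):                # 506
            subgraph = build_direct(ref)             # 507: direct mode
        else:
            subgraph = build_depth(ref)              # 508: depth mode
        if subgraph is None:                         # 509: construction check
            raise RuntimeError("tiling failure")     # 512: return failure
        subgraphs.append(subgraph)                   # 510: output the subgraph
        current += len(subgraph)                     # 511: next start vertex
    return subgraphs                                 # 513: output all subgraphs
```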
A specific implementation of step 506 in
701. A neural network graph tiling apparatus obtains a reference subgraph.
That the neural network graph tiling apparatus obtains the reference subgraph may be as follows: An on-chip address management module receives the reference subgraph (that is, the reference subgraph obtained in step 504 or step 505) from a control module. The method procedure in
702. The neural network graph tiling apparatus determines whether to allocate an address of an on-chip buffer to input/output data inside the reference subgraph.
Step 703 is performed if the neural network graph tiling apparatus determines to allocate the address of the on-chip buffer to the input/output data inside the reference subgraph; or step 707 is performed if the neural network graph tiling apparatus determines not to allocate the address of the on-chip buffer to the input/output data inside the reference subgraph. The input/output data inside the reference subgraph is input/output data, other than input data and output data of the reference subgraph, of each vertex in the reference subgraph in a process of performing a processing operation by the vertex. For example, the depth subgraph in
703. The neural network graph tiling apparatus allocates the address of the on-chip buffer to the input/output data inside the reference subgraph.
That the neural network graph tiling apparatus allocates the address of the on-chip buffer to the input/output data inside the reference subgraph may be as follows: The control module invokes the on-chip address management module to allocate the address of the on-chip buffer to the input/output data inside the reference subgraph. An implementation of step 703 is subsequently described in detail.
704. The neural network graph tiling apparatus determines whether to allocate an address to output data of the reference subgraph.
Step 705 is performed if the neural network graph tiling apparatus determines to allocate an address to the output data of the reference subgraph; or step 707 is performed if the neural network graph tiling apparatus determines not to allocate an address to the output data of the reference subgraph. That the neural network graph tiling apparatus determines whether to allocate an address to the output data of the reference subgraph may be as follows: The neural network graph tiling apparatus determines whether current available storage space of the on-chip buffer is larger than storage space required by the output data of the reference subgraph. If the current available storage space of the on-chip buffer is larger than the storage space required by the output data of the reference subgraph, the neural network graph tiling apparatus allocates an address to the output data of the reference subgraph; or if the current available storage space of the on-chip buffer is not larger than the storage space required by the output data of the reference subgraph, the neural network graph tiling apparatus does not allocate an address to the output data of the reference subgraph.
705. The neural network graph tiling apparatus allocates an address to the output data of the reference subgraph.
That the neural network graph tiling apparatus allocates an address to the output data of the reference subgraph may be as follows: The control module invokes the on-chip address management module to allocate an address of the on-chip buffer to the output data of the reference subgraph.
706. The neural network graph tiling apparatus recycles all addresses allocated to the reference subgraph.
That the neural network graph tiling apparatus recycles all the addresses allocated to the reference subgraph may be as follows: The on-chip address management module releases storage space occupied by current invalid data (that is, data not required in a subsequent processing operation).
707. The neural network graph tiling apparatus resets the on-chip address management module.
That the neural network graph tiling apparatus resets the on-chip address management module may be releasing the address that is of the on-chip buffer and that is allocated by performing step 703 and/or step 705.
708. The neural network graph tiling apparatus determines whether to allocate an address to output data of the reference subgraph.
Step 709 is performed if the neural network graph tiling apparatus determines to allocate an address to the output data of the reference subgraph; or step 713 is performed if the neural network graph tiling apparatus determines not to allocate an address to the output data of the reference subgraph. An implementation of step 708 may be the same as that of step 704.
709. The neural network graph tiling apparatus allocates an address to the output data of the reference subgraph.
An implementation of step 709 may be the same as that of step 705.
710. The neural network graph tiling apparatus tiles input data of the reference subgraph according to a tiling rule.
That the neural network graph tiling apparatus tiles the input data of the reference subgraph according to the tiling rule may be as follows: The control module invokes the data tiling module to tile the input data of the reference subgraph according to the tiling rule. An implementation of step 710 is subsequently described in detail.
711. When the input data of the reference subgraph is tiled into at least two parts, the neural network graph tiling apparatus allocates the address of the on-chip buffer to the input/output data inside the reference subgraph.
That when the input data of the reference subgraph is tiled into at least two parts, the neural network graph tiling apparatus allocates the address of the on-chip buffer to the input/output data inside the reference subgraph may be as follows: The control module invokes the on-chip address management module to allocate the address of the on-chip buffer to the input/output data inside the reference subgraph when the input data of the reference subgraph is tiled into at least two parts. It is assumed that the input data of the reference subgraph is tiled into a first part of data, a second part of data, and a third part of data. When the first part of data is used as the input data of the reference subgraph, the on-chip address management module allocates the address of the on-chip buffer to the input/output data inside the reference subgraph and the output data of the reference subgraph. In a process of allocating the address, the on-chip address management module may release an address occupied by invalid data (for example, data obtained in a process of processing the first part of data). When the second part of data is used as the input data of the reference subgraph, the on-chip address management module allocates the address of the on-chip buffer to the input/output data inside the reference subgraph and the output data of the reference subgraph. In a process of allocating the address, the on-chip address management module may release an address occupied by invalid data (for example, data obtained in a process of processing the second part of data). When the third part of data is used as the input data of the reference subgraph, the on-chip address management module allocates the address of the on-chip buffer to the input/output data inside the reference subgraph and the output data of the reference subgraph. Step 711 is a process of allocating the address of the on-chip buffer, but allocation may fail in this process due to insufficient current available storage space of the on-chip buffer. In other words, step 711 does not necessarily succeed, but is a process of trying to perform allocation, because before performing step 711, the neural network graph tiling apparatus cannot accurately determine whether the address of the on-chip buffer can be successfully allocated to the input/output data inside the reference subgraph when the input data of the reference subgraph is tiled into at least two parts.
712. The neural network graph tiling apparatus determines whether the address of the on-chip buffer is successfully allocated to the input/output data inside the reference subgraph.
An implementation of step 712 may be the same as that of step 703. Step 714 is performed if the address of the on-chip buffer is successfully allocated to the input/output data inside the reference subgraph; or step 713 is performed if the address of the on-chip buffer fails to be allocated to the input/output data inside the reference subgraph.
713. The neural network graph tiling apparatus returns “not support”.
That the neural network graph tiling apparatus returns “not support” may be returning information indicating that data tiling is not supported. If the neural network graph tiling apparatus returns “not support”, a result of determining performed in step 506 is that data tiling fails.
714. The neural network graph tiling apparatus outputs an address allocation result of the reference subgraph.
The address allocation result that is of the reference subgraph and that is output by the neural network graph tiling apparatus may be address allocation information obtained in step 702 to step 705; or may be address allocation information obtained in step 708 to step 711. If the neural network graph tiling apparatus outputs the address allocation result of the reference subgraph, a result of determining performed in step 506 is that data tiling succeeds.
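The control flow of step 701 to step 714 can be summarized as a two-phase allocation attempt: first try to place the untiled working set of the reference subgraph on chip, and fall back to allocating for a tiled input only if that fails. The following Python sketch is illustrative only; it models the on-chip buffer as a single free-space counter, and all names are hypothetical rather than part of this application.

```python
def data_tiling_determining(free_space, untiled_size, output_size, tiled_size):
    """Hypothetical sketch of steps 701-714 (sizes in bytes).

    untiled_size: working set of the reference subgraph when the input
                  is processed as a whole (steps 702-703)
    tiled_size:   working set of one part when the input is tiled (step 711)
    """
    # Steps 702-705: try the untiled working set plus the subgraph output.
    if untiled_size + output_size <= free_space:
        return ("success", "input not tiled")          # steps 706, 714
    # Step 707: reset the on-chip address management module.
    # Steps 708-711: allocate the output first, then tile the input and
    # try to allocate addresses for the per-part working set (step 712).
    if output_size <= free_space and tiled_size + output_size <= free_space:
        return ("success", "input tiled")              # step 714
    return ("not support", None)                       # step 713

# A 512 KB buffer cannot hold the whole 600 KB working set, but can hold
# one 200 KB tiled part together with the 100 KB output.
print(data_tiling_determining(512, 600, 100, 200))  # ('success', 'input tiled')
```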
It can be learned from
An implementation of step 507 in
801. A neural network graph tiling apparatus obtains a reference subgraph.
That the neural network graph tiling apparatus obtains the reference subgraph may be as follows: A graph tiling module receives the reference subgraph (that is, the reference subgraph obtained in step 504 or step 505) from a control module. The method procedure in
802. The neural network graph tiling apparatus adds a current to-be-tiled vertex in a neural network graph to the reference subgraph.
That the neural network graph tiling apparatus adds the current to-be-tiled vertex in the neural network graph to the reference subgraph may be as follows: The graph tiling module adds the current to-be-tiled vertex in the neural network graph to the reference subgraph. Adding the current to-be-tiled vertex in the neural network graph to the reference subgraph may be understood as expanding the reference subgraph by one vertex. For example, vertices included in the reference subgraph are the vertex 1 and the vertex 2 in
803. The neural network graph tiling apparatus performs data tiling determining.
That the neural network graph tiling apparatus performs data tiling determining may be as follows: The control module invokes a data tiling module and the on-chip address management module to perform the method procedure in
804. The neural network graph tiling apparatus determines whether data tiling is successfully performed.
Step 804 is the same as step 506 in
805. The neural network graph tiling apparatus determines whether input data of the reference subgraph is tiled.
Step 806 is performed if the input data of the reference subgraph is tiled; or step 802 is performed if the input data of the reference subgraph is not tiled. When successfully performing data tiling, the neural network graph tiling apparatus outputs the address allocation result of the reference subgraph. Optionally, the address allocation result may include information indicating whether the input data of the reference subgraph is tiled. The neural network graph tiling apparatus determines, based on the information, whether the input data of the reference subgraph is tiled. It can be understood that an address allocation result that is output when the input data of the reference subgraph is tiled is different from an address allocation result that is output when the input data of the reference subgraph is not tiled. Optionally, the neural network graph tiling apparatus determines, based on the address allocation result, whether the input data of the reference subgraph is tiled.
806. The neural network graph tiling apparatus generates a direct subgraph or a depth subgraph.
If data tiling fails to be performed, the reference subgraph is used as a direct subgraph. Alternatively, if data tiling is successfully performed and the input data of the reference subgraph is tiled, the reference subgraph is used as a depth subgraph.
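The Direct-mode procedure in step 801 to step 806 can be viewed as growing the reference subgraph one vertex at a time until data tiling fails or the input data is tiled. The following Python sketch illustrates this loop under stated assumptions; tiling_ok and input_tiled are hypothetical callbacks standing in for the procedures of steps 803 to 805, and excluding the vertex that caused the failure follows the fallback behavior described later for the first reference subgraph.

```python
def construct_subgraph_direct(vertices, tiling_ok, input_tiled):
    """Hypothetical sketch of steps 801-806 (Direct mode).

    vertices:    to-be-tiled vertices of the neural network graph, in order
    tiling_ok:   callable(subgraph) -> bool, models steps 803-804
    input_tiled: callable(subgraph) -> bool, models step 805
    """
    subgraph = []                           # step 801: reference subgraph
    for v in vertices:
        subgraph.append(v)                  # step 802: expand by one vertex
        if not tiling_ok(subgraph):         # steps 803-804
            subgraph.pop()                  # drop the vertex that caused failure
            return ("direct", subgraph)     # step 806: direct subgraph
        if input_tiled(subgraph):           # step 805
            return ("depth", subgraph)      # step 806: depth subgraph
    return ("direct", subgraph)

# Toy example: tiling succeeds up to 3 vertices; the input is first tiled
# when the subgraph holds 3 vertices.
print(construct_subgraph_direct(
    [1, 2, 3, 4],
    tiling_ok=lambda sg: len(sg) <= 3,
    input_tiled=lambda sg: len(sg) == 3))   # ('depth', [1, 2, 3])
```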
In this embodiment of this application, a subgraph is constructed by using a Direct mode, so that a direct subgraph or a depth subgraph can be quickly constructed.
An implementation of step 508 in
901. A neural network graph tiling apparatus obtains a reference subgraph.
That the neural network graph tiling apparatus obtains the reference subgraph may be as follows: A graph tiling module receives the reference subgraph (that is, the reference subgraph obtained in step 504 or step 505) from a control module. The method procedure in
902. The neural network graph tiling apparatus adds a current to-be-tiled vertex in a neural network graph to the reference subgraph.
That the neural network graph tiling apparatus adds the current to-be-tiled vertex in the neural network graph to the reference subgraph may be as follows: The graph tiling module adds the current to-be-tiled vertex in the neural network graph to the reference subgraph. Adding the current to-be-tiled vertex in the neural network graph to the reference subgraph may be understood as expanding the reference subgraph by one vertex.
903. The neural network graph tiling apparatus performs data tiling determining.
That the neural network graph tiling apparatus performs data tiling determining may be as follows: The control module invokes the data tiling module and the on-chip address management module to perform the method procedure in
904. The neural network graph tiling apparatus determines whether data tiling is successfully performed.
Step 905 is performed if data tiling is successfully performed; or step 907 is performed if data tiling fails to be performed. Step 901 to step 904 successively correspond to step 801 to step 804.
905. The neural network graph tiling apparatus determines whether a current vertex is a special vertex.
The special vertex may include a Pooling vertex and the like. The current vertex is the vertex most recently added in step 902 (that is, a vertex by which the reference subgraph is expanded in step 902). Step 906 is performed if the current vertex is the special vertex; or step 902 is performed if the current vertex is not the special vertex. Determining whether the current vertex is the special vertex is only an optional manner provided in this embodiment of this application to determine whether to output the current reference subgraph as a depth subgraph. In actual application, the neural network graph tiling apparatus may output the current reference subgraph as a depth subgraph when the current reference subgraph meets another condition.
906. The neural network graph tiling apparatus uses the reference subgraph as a depth subgraph.
907. The neural network graph tiling apparatus determines that subgraph construction by using a depth mode fails.
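Similarly, the Depth-mode procedure in step 901 to step 907 keeps expanding the reference subgraph and, provided data tiling keeps succeeding, stops as soon as a special vertex is absorbed. A minimal Python sketch follows, with tiling_ok and is_special as hypothetical stand-ins for steps 903 to 905:

```python
def construct_subgraph_depth(vertices, tiling_ok, is_special):
    """Hypothetical sketch of steps 901-907 (Depth mode)."""
    subgraph = []                        # step 901: reference subgraph
    for v in vertices:
        subgraph.append(v)               # step 902: expand by one vertex
        if not tiling_ok(subgraph):      # steps 903-904
            return None                  # step 907: Depth-mode construction fails
        if is_special(v):                # step 905, e.g. a Pooling vertex
            return subgraph              # step 906: output as a depth subgraph
    return None                          # no special vertex was reached

# Toy example: the third vertex is a Pooling vertex.
print(construct_subgraph_depth(
    ["conv1", "conv2", "pool1", "conv3"],
    tiling_ok=lambda sg: True,
    is_special=lambda v: v.startswith("pool")))  # ['conv1', 'conv2', 'pool1']
```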
In this embodiment of this application, a subgraph is constructed by using a Depth mode, so that a depth subgraph can be quickly constructed, and when a subgraph fails to be constructed, it is determined in time that subgraph construction fails.
An implementation of step 710 in
1001. A neural network graph tiling apparatus determines whether a height of output data of a depth subgraph is a height threshold.
The height threshold may be 1. Assuming that the output data is a three-dimensional matrix of [5, 10, 256], the height of the output data is 10. This procedure ends if the height of the output data of the depth subgraph is the height threshold; or step 1002 is performed if the height of the output data of the depth subgraph is not the height threshold.
1002. The neural network graph tiling apparatus subtracts a target value from the height of the output data of the depth subgraph to obtain a reference height.
The target value may be 1, 2, 3, or the like. This is not limited in this application. The reference height is a height obtained after the target value is subtracted from the height of the output data. The target value represents a height value by which the height of the output data is decreased each time. For example, the output data is a three-dimensional matrix of [5, 10, 256], and the target value is 1. A height (corresponding to the reference height) obtained after the target value 1 is subtracted from the height 10 of the output data is 9.
1003. The neural network graph tiling apparatus tiles input data of the depth subgraph based on the reference height.
That the neural network graph tiling apparatus tiles the input data of the depth subgraph based on the reference height may be as follows: A data tiling module divides a complete height (that is, an original height) of the output data of the depth subgraph by the reference height, and rounds up a calculation result to obtain a quantity of PipeLines to be obtained through tiling; and tiles the input data of the depth subgraph based on the quantity of PipeLines. Each PipeLine corresponds to a group of data. The complete height of the output data of the depth subgraph is a height of the output data of the depth subgraph before the height of the output data of the depth subgraph is adjusted. For example, the original height of the output data of the depth subgraph is 7, and an adjusted height of the output data is 4. 7/4 is calculated and rounded up to obtain 2 (a quantity of PipeLines), and the input data of the depth subgraph is tiled into two parts.
1004. The neural network graph tiling apparatus allocates an address of an on-chip buffer to the depth subgraph.
That the neural network graph tiling apparatus allocates the address of the on-chip buffer to the depth subgraph may be as follows: The on-chip address management module allocates the address of the on-chip buffer to input/output data that is of a plurality of PipeLines corresponding to the depth subgraph and that is used in a process of performing processing operations by the PipeLines. It can be understood that the on-chip address management module allocates the address of the on-chip buffer to the input/output data based on information about storage space required by the input/output data of the plurality of PipeLines corresponding to the depth subgraph in the process of performing processing operations; that is, the plurality of PipeLines corresponding to the depth subgraph do not need to actually perform the processing operations for the addresses to be allocated.
1005. The neural network graph tiling apparatus recycles an on-chip address (the address of the on-chip buffer) of invalid data.
The invalid data is data not required in a subsequent processing operation. For example, the input data of the depth subgraph is tiled into two parts, that is, a first part and a second part. When the first part is processed as the input data to obtain first output, storage space occupied by other data used in a process of processing the first part is released, and only the first output is retained. When the second part is processed as the input data to obtain second output, storage space occupied by other data used in a process of processing the second part is released, and only the second output is retained. In other words, the on-chip address management module may continuously recycle current available storage space of the on-chip buffer by using a memory multiplexing mechanism.
1006. The neural network graph tiling apparatus determines whether the address is successfully allocated.
This procedure ends if the address is successfully allocated; or step 1001 is performed if the address fails to be allocated.
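The loop formed by step 1001, step 1002, and step 1006 amounts to lowering the height of the output data by the target value until allocation succeeds or the height threshold is reached. The following Python sketch reproduces this loop together with the PipeLine-count calculation of step 1003; the fits callback is a hypothetical stand-in for the allocation attempt of steps 1004 to 1006.

```python
import math

def tile_by_height(out_height, target, fits):
    """Hypothetical sketch of steps 1001-1006 (assumes target << out_height).

    out_height: complete (original) height of the depth subgraph's output
    target:     height decrement per attempt (step 1002)
    fits:       callable(height) -> bool, models steps 1004-1006
    """
    height = out_height
    while height > 1:                               # step 1001 (threshold = 1)
        height -= target                            # step 1002: reference height
        pipelines = math.ceil(out_height / height)  # step 1003: round up
        if fits(height):                            # steps 1004-1006
            return height, pipelines
    return None                                     # reached the height threshold

# Worked example from the text: complete output height 7; allocation first
# succeeds at height 4, giving ceil(7 / 4) = 2 PipeLines.
print(tile_by_height(7, 1, fits=lambda h: h <= 4))  # (4, 2)
```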
The input data of the depth subgraph has three forms after being tiled: (1) a head (Graph Head) form; (2) a body (Graph Body) form; and (3) a tail (Graph Tail) form. If an OP corresponding to the depth subgraph is a convolution operator or the like, the input data usually includes padding (Padding) information: the Graph Head includes up padding, the Graph Body includes neither up padding nor down padding, and the Graph Tail includes down padding. After the input data of the depth subgraph is tiled, a corresponding address offset needs to be calculated for a corresponding PipeLine (after the input data is tiled, different storage space needs to be accessed when blocks (groups of data) obtained after tiling are independently processed). After the input data of the depth subgraph is tiled, an overlapping part may or may not exist between at least two obtained groups of data. That overlapping exists between two groups of data means that two neighboring groups of data obtained through tiling include same data.
GraphHead addr = input Tensor addr;
GraphBody addr = GraphHead addr + GraphHead data size − overlapSize; and
GraphTail addr = GraphBody addr + (GraphBody data size − overlapSize) * (loop − 2).
In the foregoing formulas, “input Tensor addr” represents a base address of the input data, “GraphHead addr” represents a base address of the head data, “GraphHead data size” represents a size of storage space occupied by the head data, “overlapSize” represents a size of storage space occupied by overlapped data between two neighboring pieces of data, “GraphBody addr” represents a base address of the body data, “GraphTail addr” represents a base address of the tail data, “GraphBody data size” represents a size of storage space occupied by one piece of body data, and “loop” represents a quantity of pieces of data obtained by tiling the input data.
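As an illustration only, the three formulas can be expressed as the following Python sketch (the function name and the example numbers are hypothetical, chosen to satisfy the formulas):

```python
def base_addresses(input_addr, head_size, body_size, overlap, loop):
    """Hypothetical sketch of the three base-address formulas above.

    All sizes are in bytes; loop is the quantity of pieces of data
    obtained by tiling the input data.
    """
    graph_head_addr = input_addr
    graph_body_addr = graph_head_addr + head_size - overlap
    graph_tail_addr = graph_body_addr + (body_size - overlap) * (loop - 2)
    return graph_head_addr, graph_body_addr, graph_tail_addr

# Example: 4 pieces, a 1024 B head, 1024 B bodies, and a 256 B overlap
# between neighboring pieces.
print(base_addresses(input_addr=0, head_size=1024, body_size=1024,
                     overlap=256, loop=4))  # (0, 768, 2304)
```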
In this embodiment of this application, the data tiling module performs processing to obtain a maximum height that is of the output data of the depth subgraph and that can be currently supported by the on-chip buffer, so that a quantity of PipeLines mapped to the depth subgraph can be decreased, thereby improving calculation efficiency.
An implementation of step 703 in
1201. A neural network graph tiling apparatus obtains a current vertex.
The current vertex is a vertex that is in a subgraph and to which an on-chip address is to be allocated in current allocation. The neural network graph tiling apparatus may successively allocate storage space of an on-chip buffer to output data (that is, an output edge) of vertices in the subgraph based on a sequence of performing processing operations by the vertices.
1202. The neural network graph tiling apparatus determines whether the current vertex is a last vertex in a subgraph.
Step 1203 is performed if the current vertex is not the last vertex in the subgraph; or step 1210 is performed if the current vertex is the last vertex in the subgraph.
1203. The neural network graph tiling apparatus recycles current releasable storage space of the on-chip buffer.
The current releasable storage space of the on-chip buffer is storage space that is of the on-chip buffer and that is currently occupied by invalid data. The current releasable storage space of the on-chip buffer is recycled, so that a multiplexing rate of the on-chip address can be greatly improved.
1204. The neural network graph tiling apparatus obtains an input edge of the current vertex.
In this application, an edge is input data or output data of a vertex. An input edge is input data of the vertex, and an output edge is output data of the vertex.
1205. The neural network graph tiling apparatus determines whether an output edge of the current vertex is an output edge of the subgraph.
The output edge of the subgraph is an output edge of the last vertex in the subgraph. Using the depth subgraph in
1206. The neural network graph tiling apparatus allocates an address of the on-chip buffer to the output edge of the current vertex.
1207. The neural network graph tiling apparatus determines whether the address of the on-chip buffer is successfully allocated to the output edge of the current vertex.
Step 1208 is performed if the address of the on-chip buffer is successfully allocated to the output edge of the current vertex; or step 1210 is performed if the address of the on-chip buffer fails to be allocated to the output edge of the current vertex.
1208. The neural network graph tiling apparatus records a processed vertex.
That the neural network graph tiling apparatus records a processed vertex may be recording the current vertex as a processed vertex.
1209. The neural network graph tiling apparatus records a processed edge.
1210. The neural network graph tiling apparatus releases all allocated edges.
That the neural network graph tiling apparatus releases all the allocated edges may be releasing storage space that is of the on-chip buffer and that is occupied by all input data and output data corresponding to the subgraph. In addition to the output data of the subgraph, on-chip addresses further need to be allocated to input/output data of each vertex in the subgraph. If a specific subgraph cannot be stored in the on-chip buffer, all the allocated edges are released, and a failure is directly returned to indicate that a neural network graph cannot be tiled.
In this embodiment of this application, the neural network graph tiling apparatus allocates an on-chip address to an output edge of each vertex in the subgraph by using the memory multiplexing mechanism, which can greatly reduce on-chip address overheads, and is simple to implement.
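The walk in step 1201 to step 1210 can be sketched as follows; the dict-based buffer model and all helper names are hypothetical simplifications, not the on-chip address management module itself:

```python
def allocate_subgraph_edges(vertices, allocator):
    """Hypothetical sketch of steps 1201-1210.

    vertices:  list of (name, output_size) pairs, in execution order
    allocator: dict with keys "free" (free bytes) and optionally
               "releasable" (bytes occupied by invalid data)
    """
    allocated = {}
    for i, (name, out_size) in enumerate(vertices):
        allocator["free"] += allocator.pop("releasable", 0)   # step 1203
        if i == len(vertices) - 1:        # steps 1202/1205: the output edge of
            break                         # the subgraph is handled separately
        if out_size > allocator["free"]:  # steps 1206-1207: allocation fails
            allocator["free"] += sum(allocated.values())      # step 1210
            return None
        allocator["free"] -= out_size     # step 1206: allocate the output edge
        allocated[name] = out_size        # steps 1208-1209: record vertex/edge
    return allocated

buf = {"free": 300, "releasable": 0}
print(allocate_subgraph_edges(
    [("v1", 100), ("v2", 150), ("v3", 80)], buf))  # {'v1': 100, 'v2': 150}
```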
A manner in which the neural network graph tiling apparatus recycles current releasable storage space of the on-chip buffer is described below.
1301. A neural network graph tiling apparatus obtains a current edge of a subgraph.
The current edge is input data or output data of a current vertex. The neural network graph tiling apparatus may successively obtain edges of vertices in the subgraph in a sequence of performing processing operations by the vertices. The depth subgraph in
1302. The neural network graph tiling apparatus determines whether the current edge is an output edge of the subgraph.
Step 1301 is performed if the current edge is the output edge of the subgraph; or step 1303 is performed if the current edge is not the output edge of the subgraph.
1303. The neural network graph tiling apparatus releases storage space that is of the on-chip buffer and that is occupied by the current edge.
Optionally, in a process of processing a PipeLine, the neural network graph tiling apparatus may release current releasable storage space of the on-chip buffer by using a memory multiplexing mechanism.
In this embodiment of this application, the neural network graph tiling apparatus releases the current releasable storage space of the on-chip buffer, so as to resolve a problem that storage space of the on-chip buffer is insufficient.
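A minimal Python sketch of step 1301 to step 1303, modeling edges as a mapping from edge name to occupied size (all names hypothetical):

```python
def recycle_releasable(edges, subgraph_output):
    """Hypothetical sketch of steps 1301-1303: release the space of every
    edge except the output edge of the subgraph, returning the total freed."""
    released = 0
    for name in list(edges):            # step 1301: obtain the current edge
        if name == subgraph_output:     # step 1302: keep the subgraph output
            continue
        released += edges.pop(name)     # step 1303: release its storage space
    return released

print(recycle_releasable({"e1": 64, "e2": 128, "out": 256}, "out"))  # 192
```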
Some operations such as step 702, step 704, and step 708 in the foregoing embodiment relate to on-chip address allocation. An on-chip address allocation procedure is described below. The on-chip address allocation procedure may be divided into two stages. In a first stage, an optimal discrete block (that is, optimal storage space) that can currently store to-be-stored data is found from an on-chip buffer. In a second stage, address space that is in the optimal discrete block and that is relatively suitable for storing the to-be-stored data is determined. Each discrete block corresponds to continuous storage space in the on-chip buffer.
1401. A neural network graph tiling apparatus determines an optimal discrete block that is in an on-chip buffer and that can currently store to-be-stored data.
The to-be-stored data may be output data or input data of a specific vertex in a subgraph. That the neural network graph tiling apparatus determines the optimal discrete block that is in the on-chip buffer and that can currently store the to-be-stored data may be as follows: An on-chip address management module traverses currently available discrete blocks in the on-chip buffer in ascending order of sizes, so as to use a smallest discrete block that can store the to-be-stored data as the optimal discrete block. A size of a discrete block is a size of storage space of the discrete block.
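This first stage is essentially a best-fit search. A minimal sketch, assuming free blocks are given as (base address, size) pairs:

```python
def best_fit_block(free_blocks, required):
    """Hypothetical sketch of step 1401: return the smallest currently
    available discrete block that can hold the to-be-stored data."""
    for base, size in sorted(free_blocks, key=lambda b: b[1]):
        if size >= required:
            return (base, size)   # the optimal discrete block
    return None                   # no available block is large enough

# Example: 40 KB of data fits best into the 64 KB block.
print(best_fit_block([(0, 256), (512, 64), (600, 16)], 40))  # (512, 64)
```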
1402. The neural network graph tiling apparatus determines target address space in the optimal discrete block.
The target address space is used to store the to-be-stored data. In actual application, storage space of the optimal discrete block is often larger than storage space required for storing the to-be-stored data. To reduce overheads of on-chip storage space, the neural network graph tiling apparatus may allocate the target address space in the optimal discrete block to the to-be-stored data. According to a ping-pong allocation policy provided in this application, a part of space that is in the optimal discrete block and that is more suitable for storing the to-be-stored data may be further determined, so as to reduce overheads of on-chip storage space.
In this embodiment of this application, the optimal discrete block is first determined, and the target address space in the optimal discrete block is further determined, which can greatly reduce overheads of on-chip storage space.
An implementation of step 1402 is not described in detail in
1501. A neural network graph tiling apparatus obtains an optimal discrete block.
That the neural network graph tiling apparatus obtains the optimal discrete block may be as follows: An on-chip address management module obtains the optimal discrete block. The optimal discrete block is a discrete block determined to store data of a current vertex.
1502. The neural network graph tiling apparatus sorts discrete blocks in an on-chip buffer in ascending order of base addresses.
The discrete blocks in the on-chip buffer include an available discrete block and an occupied discrete block.
1503. The neural network graph tiling apparatus determines whether total discrete address space of the on-chip buffer is complete.
That the neural network graph tiling apparatus determines whether the total discrete address space of the on-chip buffer is complete may be determining whether storage space of the on-chip buffer is occupied. If all storage space of the on-chip buffer is unoccupied, the total discrete address space of the on-chip buffer is complete; or if the storage space of the on-chip buffer is occupied, the total discrete address space of the on-chip buffer is incomplete. Step 1511 is performed if the total discrete address space of the on-chip buffer is complete; or step 1504 is performed if the total discrete address space of the on-chip buffer is incomplete.
1504. The neural network graph tiling apparatus determines whether the optimal discrete block is located at two ends of storage space of the on-chip buffer.
Addresses of the storage space of the on-chip buffer are successively ranked from a base address to an end address in ascending order. One end of the storage space of the on-chip buffer may be continuous storage space including the base address of the on-chip buffer, and the other end may be continuous storage space including the end address of the on-chip buffer. For example, one end of the storage space of the on-chip buffer may be storage space from the base address to a first address, and the other end of the storage space of the on-chip buffer may be storage space from a second address to the end address. A size of the storage space from the base address to the first address is, for example, one tenth or one eighth of a size of the entire storage space of the on-chip buffer, and a size of the storage space from the second address to the end address is, for example, one tenth or one eighth of the size of the entire storage space of the on-chip buffer. Step 1505 is performed if the optimal discrete block is located at the two ends of the storage space of the on-chip buffer; or step 1508 is performed if the optimal discrete block is not located at the two ends of the storage space of the on-chip buffer.
1505. The neural network graph tiling apparatus determines whether to-be-stored data occupies an address of the on-chip buffer for a long time.
That the neural network graph tiling apparatus determines whether the to-be-stored data occupies the address of the on-chip buffer for a long time may be determining whether the to-be-stored data is required only in recently performed M PipeLines. If the to-be-stored data is required only in the recently performed M PipeLines, the neural network graph tiling apparatus determines that the to-be-stored data does not occupy the address of the on-chip buffer for a long time; or if the to-be-stored data is also required beyond the recently performed M PipeLines, the neural network graph tiling apparatus determines that the to-be-stored data occupies the address of the on-chip buffer for a long time. M is an integer greater than 0, for example, 1, 2, 3, or 4. Step 1506 is performed if the to-be-stored data occupies the address of the on-chip buffer for a long time; or step 1507 is performed if the to-be-stored data does not occupy the address of the on-chip buffer for a long time.
1506. The neural network graph tiling apparatus allocates storage addresses at two ends of the optimal discrete block to the to-be-stored data.
One end of the optimal discrete block may be continuous storage space including a base address of the optimal discrete block, and the other end may be continuous storage space including an end address of the optimal discrete block.
1507. The neural network graph tiling apparatus allocates, to the to-be-stored data, a storage address that is of the optimal discrete block and that is away from the two ends.
The storage address that is of the optimal discrete block and that is away from the two ends is a storage address, other than the storage addresses at the two ends of the optimal discrete block, in the storage addresses corresponding to the optimal discrete block. The storage address that is of the optimal discrete block and that is away from the two ends is allocated to the to-be-stored data that does not need to occupy storage space for a long time, so that this part of the address space can be recycled in time.
1508. The neural network graph tiling apparatus determines whether a forward address block stores input data of a current vertex.
The forward address block (corresponding to third storage space) is a discrete block, in the storage space of the on-chip buffer, that is adjacent to the optimal discrete block (corresponding to second storage space) and that is located before the optimal discrete block. The to-be-stored data may be output data of the current vertex. Step 1510 is performed if the forward address block stores the input data of the current vertex; or step 1509 is performed if the forward address block does not store the input data of the current vertex.
1509. The neural network graph tiling apparatus allocates a low address (corresponding to a third address) of the optimal discrete block to the to-be-stored data.
The low address of the optimal discrete block may be the half of the address space that includes the base address, and a high address of the optimal discrete block may be the other half of the address space that includes the end address. For example, if a size of the optimal discrete block is 100 KB, and the to-be-stored data needs to occupy 40 KB, the neural network graph tiling apparatus allocates the first 40 KB of the 100 KB to the to-be-stored data.
1510. The neural network graph tiling apparatus allocates a high address (corresponding to a fourth address) of the optimal discrete block to the to-be-stored data.
When the forward address block stores the input data of the current vertex, the high address of the optimal discrete block is allocated to the to-be-stored data, so that when the forward address block is released, larger continuous storage space can be formed together with storage space corresponding to the low address of the optimal discrete block.
1511. The neural network graph tiling apparatus allocates a low address of the optimal discrete block to the to-be-stored data.
1512. The neural network graph tiling apparatus obtains an allocated address.
In this embodiment of this application, when the storage space of the optimal discrete block is larger than the storage space required for storing the to-be-stored data, address space that is in the optimal discrete block and that is more suitable for storing the to-be-stored data is determined, so as to further decrease a quantity of discrete blocks in the on-chip buffer.
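Reduced to its Boolean decisions, the second stage in step 1503 to step 1511 selects a placement inside the optimal discrete block as follows (an illustrative sketch; the string results merely name the branches above):

```python
def place_in_block(buffer_complete, at_buffer_ends, long_lived,
                   forward_holds_input):
    """Hypothetical sketch of the placement policy in steps 1503-1511."""
    if buffer_complete:                  # step 1503: buffer fully unoccupied
        return "low address"             # step 1511
    if at_buffer_ends:                   # step 1504
        if long_lived:                   # step 1505
            return "two ends of the optimal discrete block"   # step 1506
        return "away from the two ends"  # step 1507
    if forward_holds_input:              # step 1508
        return "high address"            # step 1510
    return "low address"                 # step 1509

# Example: buffer partly occupied, block not at the buffer ends, and the
# forward address block holds the current vertex's input data.
print(place_in_block(False, False, False, True))  # high address
```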
The foregoing embodiment describes an implementation in which the neural network graph tiling apparatus performs graph tiling (that is, first-level graph tiling) on a neural network graph by using the neural network graph tiling processing system in
In an optional implementation, after tiling the neural network graph to obtain the depth subgraph (that is, a first-order subgraph), the neural network graph tiling apparatus may further perform the following operations: when a quantity of vertices included in the depth subgraph is not less than the first threshold, tiling the depth subgraph to obtain a first second-order subgraph and a second second-order subgraph, where the first second-order subgraph is used to represent a first second-order subnetwork, the second second-order subgraph is used to represent a second second-order subnetwork, both the first second-order subnetwork and the second second-order subnetwork are included in the depth subnetwork, and vertices included in the first second-order subnetwork are all different from vertices included in the second second-order subnetwork. Optionally, input data of the first second-order subnetwork is the first input data, output data of the second second-order subnetwork is the first output data, the first second-order subnetwork is configured to store, into a middle buffer, first intermediate data obtained by processing the first input data, the second second-order subnetwork is configured to process the first intermediate data obtained from the middle buffer, and the middle buffer is not the on-chip buffer, that is, the middle buffer is located outside an AI chip. For example, the middle buffer is an off-chip buffer whose reading and writing speed is lower than that of the on-chip buffer, and the reading and writing speed of the middle buffer is faster than that of an external memory, such as a DDR.
For example, a manner in which the neural network graph tiling apparatus tiles the depth subgraph to obtain the first second-order subgraph and the second second-order subgraph is as follows: determining at least one reference vertex that is in a plurality of vertices included in the depth subgraph and whose output data needs to occupy storage space smaller than available storage space of the middle buffer; and tiling the depth subgraph by using an output of an intermediate vertex in the at least one reference vertex as a tiling point to obtain the first second-order subgraph and the second second-order subgraph, where the intermediate vertex is any reference vertex in the at least one reference vertex, output data of the intermediate vertex is output data of the first second-order subgraph and is input data of the second second-order subgraph. It should be understood that the neural network graph tiling apparatus may further tile, in another manner, at least one depth subgraph whose depth is not less than the first threshold to obtain at least two second-order subgraphs. This is not limited in this embodiment of this application. The depth subgraph may be any depth subgraph obtained by tiling the neural network graph. It should be understood that the neural network graph tiling apparatus may tile, in a similar manner, each depth subgraph whose depth is not less than the first threshold. Optionally, the neural network graph tiling apparatus performs a subgraph address allocation attempt (refer to
Optionally, before tiling the depth subgraph by using the output of the intermediate vertex in the at least one reference vertex as a tiling point to obtain the first second-order subgraph and the second second-order subgraph, the neural network graph tiling apparatus may perform the following operations: obtaining a depth difference between two second-order subgraphs that are obtained by tiling the depth subgraph by separately using the at least one reference vertex as a tiling point, to obtain at least one depth difference, where the at least one reference vertex is in a one-to-one correspondence with the at least one depth difference; and determining that the output of the intermediate vertex that is in the at least one reference vertex and that corresponds to a depth difference less than a depth difference threshold is used as a tiling point to tile the depth subgraph. Optionally, it is determined that the intermediate vertex that is in the at least one reference vertex and that corresponds to a minimum depth difference is used as a tiling point to tile the depth subgraph. The depth difference threshold may be 1, 2, 3, or the like. For example, vertices in the depth subgraph are successively a vertex 1, a vertex 2, a vertex 3, a vertex 4, and a vertex 5 in an execution sequence. A second-order subgraph (whose depth is 1) including the vertex 1 and a second-order subgraph (whose depth is 4) including the vertex 2, the vertex 3, the vertex 4, and the vertex 5 may be obtained by tiling the depth subgraph by using an output of the vertex 1 as a tiling point. A depth difference between the two second-order subgraphs is 3. For example, vertices in the depth subgraph are successively a vertex 1, a vertex 2, a vertex 3, a vertex 4, and a vertex 5 in an execution sequence. A second-order subgraph (whose depth is 3) including the vertex 1, the vertex 2, and the vertex 3 and a second-order subgraph (whose depth is 2) including the vertex 4 and the vertex 5 may be obtained by tiling the depth subgraph by using an output of the vertex 3 as a tiling point. A depth difference between the two second-order subgraphs is 1. In this implementation, the depth subgraph can be quickly tiled into two second-order subgraphs between which a depth difference is less than the depth difference threshold.
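The selection of a tiling point can be sketched as picking, among the reference vertices, the one that best balances the two resulting second-order subgraphs. The following Python fragment is a hypothetical illustration in which vertices are numbered 1 to depth in execution order:

```python
def choose_tiling_point(depth, reference_vertices):
    """Hypothetical sketch: pick the reference vertex whose tiling point
    yields the minimum depth difference between the two second-order
    subgraphs (vertices numbered 1..depth in execution order)."""
    def depth_diff(i):
        # Tiling after vertex i yields depths i and (depth - i).
        return abs(i - (depth - i))
    return min(reference_vertices, key=depth_diff)

# Example from the text: 5 vertices; candidates 1-4 give depth differences
# 3, 1, 1, 3, so vertex 2 (equivalently vertex 3) is selected.
print(choose_tiling_point(5, [1, 2, 3, 4]))  # 2
```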
An example of tiling a depth subgraph whose tiling depth is not less than the first threshold is described below. After completing first-level graph tiling on the neural network graph, the neural network graph tiling apparatus may retain an address of input data and an address of output data of each depth subgraph and each direct subgraph.
It is assumed that after tiling the neural network graph, the neural network graph tiling apparatus obtains a depth subgraph including a vertex 1 to a vertex 5 and a direct subgraph including a vertex 6. The neural network graph tiling apparatus may first determine a specific vertex that is in the depth subgraph and whose output data can be stored into the middle buffer. A specific process is as follows: The neural network graph tiling apparatus determines whether an internal output in the depth subgraph, such as output data of the vertex 1, the vertex 2, the vertex 3, and the vertex 4, can be placed into the middle buffer, and then selects each vertex whose output data can be placed into the middle buffer, to obtain an output list. A depth difference between two second-order subgraphs obtained by tiling the depth subgraph is calculated based on the output list. For example, the depth subgraph is tiled by using an output of an OP 1 as a tiling point to obtain two second-order subgraphs. One second-order subgraph includes the vertex 1 and has a depth of 1, and the other second-order subgraph includes the vertex 2 to the vertex 5 and has a depth of 4. In this case, a depth difference between the second-order subgraphs is 3. By analogy, a corresponding depth difference obtained when the output of the OP 1 is used as a tiling point is 3, a corresponding depth difference obtained when an output of an OP 2 is used as a tiling point is 1, a corresponding depth difference obtained when an output of an OP 3 is used as a tiling point is 1, and a corresponding depth difference obtained when an output of an OP 4 is used as a tiling point is 3. For example, an output of a vertex whose depth difference is minimum and whose output data is relatively large is determined from the OPs as a tiling point. In this example, the depth difference of the output of the OP 2 and that of the output of the OP 3 are both 1. If the output data of the OP 3 is larger than that of the OP 2, the output of the OP 3 is used as a tiling point of the depth subgraph, and the output of the OP 3 is placed in the middle buffer. In this way, two second-order subgraphs may be formed: (OP 1-OP 3) and (OP 4-OP 5). Each second-order subgraph obtained by tiling a depth subgraph may be a direct subgraph, or may be a depth subgraph. Optionally, after obtaining at least one direct subgraph by tiling the depth subgraph, the neural network graph tiling apparatus may combine a direct subgraph before the at least one direct subgraph, the at least one direct subgraph, and a direct subgraph after the at least one direct subgraph to form a new direct subgraph. For example, in
In some embodiments, if N depth subgraphs obtained by tiling the neural network graph by the neural network graph tiling apparatus have a depth not less than the first threshold, any one depth subgraph, two depth subgraphs, three depth subgraphs, . . . , (N−1) depth subgraphs, or N depth subgraphs in the N depth subgraphs are tiled in the foregoing manner, so that W neural network graph tiling results can be obtained. Together with a case in which second-level graph tiling is not performed, there are (W+1) cases in total. In some embodiments, the neural network graph tiling apparatus may separately generate AI chip-executable files corresponding to the (W+1) neural network graph tiling results, and separately test, by using test data, execution time of processing the test data by each AI chip-executable file. The neural network graph tiling apparatus may compare the execution time of the (W+1) AI chip-executable files (calculated input data has a same scale), and select a case with the smallest execution time as a final result to be deployed on a terminal device (such as a mobile phone or a smart monitoring device) or even on a cloud. As a whole, for a specific neural network graph tiling result, because a depth of the depth subgraph becomes smaller, an amount of repeatedly calculated data (for example, overlapping brought by a convolution operator) is decreased, thereby further improving entire network performance.
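The selection among the (W+1) candidates is a straightforward measure-and-pick loop. A hypothetical sketch follows; time.perf_counter stands in for whatever profiling the test harness uses, and the toy run callable merely simulates workloads of different sizes:

```python
import time

def pick_fastest(tiling_results, run):
    """Hypothetical sketch: execute each candidate tiling result on the
    same test data and keep the one with the smallest execution time."""
    best, best_time = None, float("inf")
    for result in tiling_results:
        start = time.perf_counter()
        run(result)                       # process the test data once
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = result, elapsed
    return best

# Toy example: candidate 1 does the least simulated work, so it wins.
print(pick_fastest([3, 1, 2], run=lambda n: sum(range(n * 100_000))))  # 1
```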
A difference between a calculation form of a vertex included in a depth subgraph (that is, a first-order depth subgraph) and a calculation form of a vertex included in a second-order subgraph is described below with reference to the accompanying drawings.
Because a middle buffer that can be read and written is introduced into a calculation process, a processing procedure performed by the vertices in
Theoretically, a quantity of pieces of data obtained by tiling data when the vertex included in the second-order subgraph performs a processing operation should be less than a quantity of pieces of data obtained by tiling data when the vertex included in the first-order subgraph performs a processing operation. Therefore, an amount of repeatedly loaded data in the calculation process of the AI chip can be decreased overall. Examples in
To reduce storage load of the on-chip buffer, a Concat vertex optimization solution is introduced in this application. A schematic diagram of a structure of a neural network graph tiling apparatus is shown in
In actual application, in a process of allocating an address of the on-chip buffer to input/output data inside a reference subgraph, if a function of a specific vertex in the reference subgraph is transmitting, to one or more vertices in the reference subgraph, target data obtained by concatenating output data of at least two vertices in the reference subgraph, an address of the output data of the at least two vertices in the on-chip buffer is used as an address of output data of the vertex.
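In effect, the Concat vertex becomes a zero-copy aliasing of its inputs. A minimal sketch, assuming the inputs were allocated back-to-back in the on-chip buffer (names and layout are hypothetical):

```python
def fold_concat(addresses, concat_inputs):
    """Hypothetical sketch of the Concat optimization: the Concat output
    aliases the addresses of its inputs instead of copying them.

    addresses:     mapping tensor name -> (base address, size), with the
                   inputs assumed to be allocated back-to-back on chip
    concat_inputs: input tensor names in concatenation order
    """
    base = addresses[concat_inputs[0]][0]
    size = sum(addresses[name][1] for name in concat_inputs)
    return (base, size)   # address of the Concat output; no data is moved

addrs = {"a": (0, 100), "b": (100, 50)}
print(fold_concat(addrs, ["a", "b"]))  # (0, 150)
```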
When the foregoing solution is used, the following objectives can be achieved: (1) Occupied on-chip storage space is reduced, and entire network integration gains are improved. (2) Input data of the Concat vertex is prevented from being migrated, thereby further improving entire network calculation performance of the neural network.
To further reduce occupied storage space of the on-chip buffer and reduce a calculation amount, this application provides a DetectionOutput vertex optimization solution based on the foregoing embodiment. A schematic diagram of a structure of a neural network graph tiling apparatus is shown in
Using
In actual application, before a neural network graph is tiled, the foregoing DetectionOutput vertex optimization solution may be used to process the neural network graph.
In this embodiment of this application, the Concat vertex optimization solution and the DetectionOutput vertex optimization solution in the neural network may be separately introduced into this embodiment, or may be simultaneously introduced into this embodiment.
The foregoing embodiment describes a manner in which a neural network graph is tiled to obtain a depth subgraph and a direct subgraph, and two types of subgraphs obtained through tiling are mapped to AI chip instructions by using a unified PipeLine form. A neural network corresponds to a group of AI chip instructions in a computer device; that is, the group of AI chip instructions is run to implement a function of the neural network. In the method in the foregoing embodiment, a group of AI chip instructions corresponding to the neural network may be adjusted to another group of AI chip instructions. Compared with the group of AI chip instructions before adjustment, the adjusted group of AI chip instructions implements a same processing operation, but a quantity of times of accessing an external memory is greatly decreased and power consumption is greatly reduced. In actual application, a computer device such as a server or a terminal may implement a processing operation by using AI chip instructions corresponding to a depth subgraph and a direct subgraph. A manner of executing a target task by using a depth subgraph (corresponding to a depth subnetwork) and a direct subgraph (corresponding to a direct subnetwork) that are obtained by tiling a neural network graph (corresponding to a neural network) is described below.
2401. A data processing apparatus obtains original input data.
The original input data includes one or more signals that can be processed by a computer. The one or more signals that can be processed by a computer include at least one of a voice signal, a text signal, or an image signal. Optionally, the original input data is image data, such as a face image. The data processing apparatus may be a terminal, a server, or another computer device. That the data processing apparatus obtains the original input data may be as follows: The data processing apparatus obtains the original input data from another device such as a server through a communications interface; or collects image data by using a camera to obtain the original input data; or collects voice data by using an audio device to obtain the original input data.
2402. The data processing apparatus inputs the original input data to a neural network for prediction processing to obtain a prediction result.
The prediction processing may be prediction processing that can be implemented by the neural network, such as face detection, face feature point detection, image enhancement, super-resolution processing of an image, natural language text processing, and semantic segmentation of an image. Correspondingly, the prediction result may be a prediction result that can be obtained by the neural network, such as a face detection result, a face feature point detection result, an enhanced image, an image obtained after super-resolution processing, a natural language text processing result, and a semantic segmentation result of an image.
The prediction processing includes: successively inputting, to a depth subnetwork for processing, at least two groups of data obtained by tiling first input data, where the depth subnetwork is included in the neural network and includes a part of vertices in the neural network, each vertex represents a calculation unit in the neural network, and a plurality of vertices included in the depth subnetwork exchange data with each other by reading and writing an on-chip buffer, and the first input data is obtained in the process of inputting the original input data to the neural network for prediction processing. For a specific example of this process, refer to
The prediction processing further includes: processing, by a direct subnetwork, second input data as a whole, where the direct subnetwork is included in the neural network and includes a part of vertices in the neural network, and the second input data is obtained in the process of inputting the original input data to the neural network for prediction processing. For a specific example of this process, refer to
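The two processing forms can be contrasted with a toy Python sketch; process stands in for whatever computation the subnetwork's vertices perform, and the chunking is a hypothetical simplification of height-wise tiling:

```python
def run_depth_subnetwork(process, first_input, num_groups):
    """Depth-subnetwork form: the first input data is tiled into groups
    that are processed successively, keeping intermediate data on chip."""
    size = -(-len(first_input) // num_groups)   # ceiling division
    groups = [first_input[i:i + size]
              for i in range(0, len(first_input), size)]
    return [process(g) for g in groups]

def run_direct_subnetwork(process, second_input):
    """Direct-subnetwork form: the second input data is processed as a whole."""
    return process(second_input)

data = list(range(8))
print(run_depth_subnetwork(sum, data, 2))  # [6, 22]
print(run_direct_subnetwork(sum, data))    # 28
```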
It can be understood that neither the depth subnetwork nor the direct subnetwork needs to access the external memory in a process of processing their respective input data. In some embodiments, inputting the original input data to the neural network for prediction processing to obtain the prediction result may be understood as: inputting, for processing, the original input data to a depth subnetwork and a direct subnetwork that are obtained by tiling the neural network. Actually, the depth subnetwork and the direct subnetwork are merely abstracted concepts, and the depth subnetwork and the direct subnetwork correspond to two different forms of processing procedures in a process in which the neural network performs prediction processing on the original input data. In other words, the depth subnetwork implements some processing operations of the neural network, and the direct subnetwork also implements some processing operations of the neural network. It can be understood that each depth subnetwork and each direct subnetwork that are obtained by tiling the neural network are used to implement some operations of the neural network, and these depth subnetworks and direct subnetworks can implement a function of the neural network.
In some embodiments, the prediction processing further includes: processing, by a third second-order subnetwork, fourth input data to obtain second intermediate data; and storing the second intermediate data into a middle buffer, where the middle buffer is not the on-chip buffer; and processing, by a fourth second-order subnetwork, the second intermediate data obtained from the middle buffer to obtain fourth output data, where both the third second-order subnetwork and the fourth second-order subnetwork are included in the neural network, vertices included in the third second-order subnetwork are all different from vertices included in the fourth second-order subnetwork, and the fourth input data is obtained in the process of inputting the original input data to the neural network for prediction processing. A plurality of vertices included in the third second-order subnetwork exchange data with each other by reading and writing the on-chip buffer, and a plurality of vertices included in the fourth second-order subnetwork exchange data with each other by reading and writing the on-chip buffer. Optionally, both a quantity of vertices included in the third second-order subnetwork and a quantity of vertices included in the fourth second-order subnetwork are less than a first threshold. For example, the middle buffer is an off-chip buffer whose reading and writing speed is lower than that of the on-chip buffer, and reading and writing performance of the middle buffer is better than that of the external memory, such as a DDR. An example of the third second-order subnetwork is a depth subnetwork including a vertex 1, a vertex 2, and a vertex 3 in
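A toy sketch of this two-stage form, with the middle buffer modeled as a plain Python dict (all names hypothetical):

```python
def run_second_order(stage1, stage2, fourth_input):
    """Hypothetical sketch of the second-order form: the third second-order
    subnetwork writes its intermediate result to a middle buffer, and the
    fourth second-order subnetwork reads it back from there."""
    middle_buffer = {}
    middle_buffer["intermediate"] = stage1(fourth_input)  # second intermediate data
    return stage2(middle_buffer["intermediate"])          # fourth output data

print(run_second_order(lambda x: [v * 2 for v in x], sum, [1, 2, 3]))  # 12
```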
In some embodiments, the data processing apparatus may have two working modes. In a first working mode, the data processing apparatus only reads and writes the on-chip buffer when inputting the original input data to the neural network for prediction processing. In a second working mode, the data processing apparatus reads and writes the on-chip buffer and the middle buffer when inputting the original input data to the neural network for prediction processing. Optionally, the data processing apparatus may store two AI chip-executable files. The data processing apparatus performs prediction processing in the first working mode when executing one AI chip-executable file, and the data processing apparatus performs prediction processing in the second working mode when executing the other AI chip-executable file. The data processing apparatus may switch a working mode based on a control instruction from a user. It should be understood that power consumption of the first working mode is lower than that of the second working mode, and calculation efficiency of the second working mode is higher than that of the first working mode. The two working modes have different advantages, and a user may set a corresponding working mode based on an actual requirement. For example, if the data processing apparatus uses the first working mode, an example of a processing procedure performed by the data processing apparatus is a processing procedure in
2403. The data processing apparatus outputs the prediction result.
The neural network does not need to access the external memory in a process of performing prediction processing on the original input data, so that power consumption of the data processing apparatus can be effectively reduced. It can be learned that the data processing apparatus may run various prediction methods such as face detection and face feature point detection in an ultra-low power consumption form in a screen-off scenario and the like.
In this embodiment of this application, in the process of inputting the original input data to the neural network for prediction processing, the depth subnetwork is used to execute a processing task of the neural network. Because the depth subnetwork does not need to access the external memory in a process of processing input data of the depth subnetwork, a quantity of times of accessing the external memory can be greatly decreased, and power consumption can be reduced.
In an optional manner, the processor 2502 is further configured to tile the neural network graph to obtain a direct subgraph, where the direct subgraph is used to represent a direct subnetwork, a plurality of vertices included in the direct subnetwork exchange data with each other by reading and writing the on-chip buffer, the direct subnetwork is configured to process second input data as a whole to obtain second output data, and the second input data is input data of the direct subnetwork.
In an optional manner, the processor 2502 is specifically configured to: obtain a first reference subgraph, where the first reference subgraph includes a first vertex and a second vertex, the first vertex is a current to-be-allocated vertex in the neural network graph, and the second vertex is a next vertex of the first vertex in the neural network graph; and add a third vertex to the first reference subgraph to obtain a second reference subgraph, where the third vertex is a next vertex of the second vertex in the neural network graph, and the second reference subgraph is used to process third input data. The apparatus further includes: an on-chip address manager 2403, configured to allocate an address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph processes the third input data.
The processor 2502 is specifically configured to: when the address of the on-chip buffer is successfully allocated to the second reference subgraph and the third vertex is an end vertex, use the second reference subgraph as the depth subgraph. The processor is specifically configured to: when the address of the on-chip buffer is successfully allocated to the second reference subgraph and the third vertex is not the end vertex, add a fourth vertex to the second reference subgraph to obtain a fourth reference subgraph, where the fourth vertex is a next vertex of the third vertex in the neural network graph. The on-chip address manager 2403 is configured to implement a function of the on-chip address management module 404 in
In an optional manner, the processor 2502 is further configured to: when the address of the on-chip buffer fails to be allocated to the second reference subgraph, use the first reference subgraph as the direct subgraph.
In an optional manner, the on-chip address manager 2403 is specifically configured to allocate the address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph processes the third input data as a whole.
The processor 2502 is specifically configured to: when the on-chip address manager fails to allocate the address of the on-chip buffer to the second reference subgraph, allocate the address of the on-chip buffer to the second reference subgraph in a process in which the second reference subgraph successively processes at least two groups of data obtained by tiling the third input data.
In an optional manner, the processor 2502 is further configured to generate a target instruction corresponding to the depth subgraph, where the target instruction is used to execute a target subtask, the neural network is configured to execute a target task, and the target subtask is a part of the target task.
The processor 2602 is configured to implement the method in step 2401 to step 2403 in
In an optional manner, the processor 2602 is further configured to: process, by a third second-order subnetwork, fourth input data to obtain second intermediate data; store the second intermediate data into a middle buffer, where the middle buffer is not the on-chip buffer; and process, by a fourth second-order subnetwork, the second intermediate data obtained from the middle buffer to obtain fourth output data, where both the third second-order subnetwork and the fourth second-order subnetwork are included in the neural network, the vertices included in the third second-order subnetwork are all different from the vertices included in the fourth second-order subnetwork, and the fourth input data is obtained in the process of inputting the original input data to the neural network for prediction processing.
In an optional manner, a plurality of vertices included in the third second-order subnetwork exchange data with each other by reading and writing the on-chip buffer, and a plurality of vertices included in the fourth second-order subnetwork exchange data with each other by reading and writing the on-chip buffer.
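The data handoff described in the two preceding paragraphs may be illustrated as follows, assuming a hypothetical run() callable that executes one second-order subnetwork (with its internal data exchange going through the on-chip buffer) and a plain Python list standing in for the middle buffer, which, as stated above, is not the on-chip buffer. All names here are illustrative.

```python
# Illustrative only: handoff between two second-order subnetworks through a
# middle buffer that is not the on-chip buffer.
def run_two_stage(run, third_subnet, fourth_subnet, fourth_input):
    second_intermediate = run(third_subnet, fourth_input)     # on-chip exchange inside the subnetwork
    middle_buffer = [second_intermediate]                     # spilled outside the on-chip buffer
    fourth_output = run(fourth_subnet, middle_buffer.pop())   # read back and continue processing
    return fourth_output
```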
In the embodiments of this application, the memory 2501 and the memory 2601 each may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). Each memory may store an operating system and other application programs.
The processor 2502 and the processor 2602 each may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a GPU, or one or more integrated circuits, and each is configured to execute a related program, so as to implement the functions that need to be performed by the units in the neural network graph tiling apparatus and/or the data processing apparatus in the embodiments of this application, or to perform the neural network graph tiling method and/or the prediction method provided in the method embodiments of this application. The processor may also be an integrated circuit chip with a signal processing capability.
In an implementation process, the steps of the methods provided in this application may be completed by using an integrated logic circuit of hardware in the processor or by using instructions in a form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logical block diagrams that are disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of this application may be directly performed and completed by a hardware decoding processor, or may be performed and completed by using a combination of hardware and software modules in the decoding processor. The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
An embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program includes software program instructions. When the program instructions are executed by a processor in a data processing device, the neural network graph tiling method and/or the prediction method in the foregoing embodiments are/is implemented.
All or a part of the foregoing embodiments may be implemented by using software, hardware, firmware, or any combination thereof. When the software is used to implement the embodiments, all or a part of the embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or some of the procedures or functions according to the embodiments of this application are generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or may be transmitted by using the computer-readable storage medium. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state disk (SSD)), or the like.
The foregoing descriptions are merely specific embodiments of this application, but are not intended to limit the protection scope of this application. Any equivalent modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Number | Date | Country | Kind
---|---|---|---
PCT/CN2019/097501 | Jul 2019 | CN | national
This application is a continuation of International Application No. PCT/CN2019/128915, filed on Dec. 26, 2019, which claims priority to International Application No. PCT/CN2019/097501, filed on Jul. 24, 2019. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2019/128915 | Dec 2019 | US
Child | 17583053 | | US