The present disclosure relates to the technical field of computers, and in particular, relates to a neural network computing method, system and device therefor.
In the era of big data, more and more devices, such as industrial robots, driverless cars and mobile devices, require increasingly complex processing of real-time input from the real world. These tasks mostly belong to the field of machine learning, in which most operations are vector or matrix operations with a high degree of parallelism. Compared with conventional GPU/CPU acceleration schemes, the hardware ASIC accelerator is currently the most popular acceleration scheme: on one hand, it can provide a high degree of parallelism and thus achieve high performance; on the other hand, it has high energy efficiency.
However, bandwidth becomes a bottleneck that limits the performance of the accelerator, and the common solutions balance the bandwidth disequilibrium by means of an on-chip cache. These common solutions do not optimize the reading and writing of data and cannot exploit the characteristics of the data well, so that the cost of on-chip storage, as well as the cost of reading and writing data, is too high. As for current common machine learning algorithms, on one hand the data size is huge while the hardware resources are quite limited, so a huge network cannot complete its computation at once; on the other hand, most of the data are reusable, i.e., the same data will be used many times, so that the data share the same characteristics.
In conclusion, the current neural network computing technology clearly has inconveniences and deficiencies in practical use, so it is necessary to make improvements.
With respect to the above deficiencies, an object of the present disclosure is to provide a neural network computing method, system and device therefor, so as to improve the computing efficiency of the neural network.
In order to achieve the object, the present disclosure provides a neural network computing method, comprising the following steps:
A. dividing a neural network into a plurality of subnetworks having consistent internal data characteristics;
B. computing each of the subnetworks to obtain a first computation result for each subnetwork; and
C. computing a total computation result of the neural network on the basis of the first computation result of each subnetwork.
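For illustration only, the following minimal sketch mirrors steps A to C for a single fully connected layer, taking division by output neurons with splicing as the combining rule; the function names, the layer sizes and the use of NumPy are assumptions made for this sketch and are not mandated by the present disclosure.

```python
import numpy as np

def divide_network(weights, rows_per_subnetwork):
    """Step A: divide by output neurons, i.e. split the weight matrix into row blocks."""
    return [weights[i:i + rows_per_subnetwork]
            for i in range(0, weights.shape[0], rows_per_subnetwork)]

def compute_subnetwork(sub_weights, inputs):
    """Step B: compute one subnetwork (a plain matrix-vector product here)."""
    return sub_weights @ inputs

def combine_results(first_results):
    """Step C: splice the first computation results into the total result."""
    return np.concatenate(first_results)

weights = np.random.randn(512, 128)          # the whole network's weights
inputs = np.random.randn(128)
subnetworks = divide_network(weights, rows_per_subnetwork=256)
total = combine_results([compute_subnetwork(s, inputs) for s in subnetworks])
assert np.allclose(total, weights @ inputs)  # splicing reproduces the full result
```

Because each subnetwork here owns a disjoint block of output neurons, splicing the first computation results exactly reproduces the total computation result of the undivided network.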
According to the computing method, the step A comprises:
A1. dividing the neural network into a plurality of subnetworks having consistent internal data characteristics on the basis of output neurons of the neural network;
A2. dividing the neural network into a plurality of subnetworks having consistent internal data characteristics on the basis of input neurons of the neural network; and
A3. dividing the neural network into a plurality of subnetworks having consistent internal data characteristics on the basis of neuron weights of the neural network.
According to the computing method, the step A3 comprises:
According to the computing method, in the step C, the first computation result of each subnetwork is spliced or weighted to compute the total computation result of the neural network.
According to any one of the computing methods above, data of the neural network is stored in an off-chip storage medium, and data of the subnetwork is stored in an on-chip storage medium.
In order to achieve another object of the present disclosure, the present disclosure further provides a neural network computing system, comprising:
According to the computing system, the division module comprises:
According to the computing system, the third division submodule divides the neural network into a plurality of subnetworks having consistent internal data characteristics on the basis of distribution of the neuron weights of the neural network; or
According to the computing system, the second computation module splices or weights the first computation result of each subnetwork to compute the total computation result of the neural network;
In order to achieve another object of the present disclosure, the present disclosure further provides a device for any one of the computing systems described above, comprising:
In order to clarify the object, the technical solution and the advantages of the present disclosure, the present disclosure is further explained in detail below with reference to the drawings and the examples. It shall be understood that the specific examples described here are only intended to explain the present disclosure, not to limit it.
Referring to
In this example, a neural network computing system 100 is provided, through which a neural network is first divided into a plurality of subnetworks. According to different division principles, the neural network can be divided into different subnetworks, and different division methods result in subnetworks with different characteristics, wherein data of the neural network is stored in an off-chip storage medium and data of the subnetworks is stored in an on-chip storage medium. Specifically, the division module 10 divides the neural network into different subnetworks according to different division principles. The division principles ensure that the internal data characteristics of the same subnetwork are consistent, whereas data of different subnetworks may have different properties, and different subnetworks may be stored in different mediums, for example on and off the chip, so that they can be dispatched by the hardware for computation at different times. The first computation module 20 performs subnetwork computation by computing each of the subnetworks to obtain a first computation result for each subnetwork. Generally, the limited on-chip resources rule out computing all the data simultaneously, so the data is divided: a large storage medium (cheap, slow) is placed off the chip, and a small storage medium (expensive, fast) is integrated on the chip. Data is stored in the off-chip storage according to the subnetworks and is carried to the computation module at different times for the related subnetwork operations. Although the neural network itself may be a complex, huge network, the computation of each subnetwork is consistent with that of the original network itself. Finally, the second computation module 30 computes the total computation result of the neural network by splicing or weighting the first computation results of the subnetworks; different operations are performed on different subnetworks according to the division principles, for example the second computation module 30 simply splices the results, or performs further computation, to obtain the final computation result of the total network. Therefore, the computing efficiency of the neural network is improved.
Referring to
In this example, the subnetwork division principles of the present disclosure comprise division according to output neurons, division according to input neurons, and division according to weights. The first division submodule 11, the second division submodule 12 and the third division submodule 13 perform the division according to these different principles. As for the subnetwork division method shown in
In addition, the third division submodule 13 divides the neural network into a plurality of subnetworks having consistent internal data characteristics on the basis of distribution of the neuron weights of the neural network; or
As for subnetwork division shown in
In one example of the present disclosure, as for subnetwork division principle in one example shown in
In the examples aforesaid, a plurality of modules of the neural network computing system 100 can be software units, hardware units, or software and hardware combined units.
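As a concrete, non-limiting illustration of the first two division principles described above, the following sketch contrasts division according to output neurons (whose first computation results are spliced) with division according to input neurons (whose first computation results are accumulated); the matrix sizes and the random data are assumptions chosen only for this example.

```python
import numpy as np

W = np.random.randn(6, 8)   # a fully connected layer: 6 output neurons, 8 input neurons
x = np.random.randn(8)

# Division according to output neurons: each subnetwork owns a slice of the
# output neurons (rows of W); the first computation results are spliced.
sub_out = [W[:3] @ x, W[3:] @ x]
y_by_output = np.concatenate(sub_out)

# Division according to input neurons: each subnetwork owns a slice of the
# input neurons (columns of W); the first computation results are accumulated.
sub_in = [W[:, :4] @ x[:4], W[:, 4:] @ x[4:]]
y_by_input = sub_in[0] + sub_in[1]

assert np.allclose(y_by_output, W @ x)
assert np.allclose(y_by_input, W @ x)
```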
Referring to
In this example, the device 101 of the neural network computing system comprises the on-chip storage and addressing module 1011, the on-chip address index module 1012 and the on-chip computation module 1013. The on-chip address index module 1012 indexes data stored on the chip; a data read-out interface of the on-chip storage and addressing module 1011 is an output port for the indexed data; a data write-in interface of the on-chip storage and addressing module 1011 is an interface through which data of the storage unit is written into the corresponding storage position according to a write address. The on-chip storage and addressing module 1011 adopts a design with separated read and write ports, so that reading and writing of data are independent of each other and may be performed simultaneously. Therefore, repetitive addressing within the on-chip address space can be performed efficiently, and addressing of off-chip addresses can also be performed.

Specifically, there are an on-chip storage medium, an off-chip storage medium, an address index unit, channels between on-chip and off-chip data, and on-chip data channels. The on-chip storage medium comprises common storage mediums, such as a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), an Enhanced Dynamic Random Access Memory (eDRAM), a Register File (RF) and the like, and can also be a novel storage device, such as a Non-Volatile Memory (NVM), a 3D storage device and the like; the on-chip storage medium is not limited to these specific storage mediums. The off-chip storage medium likewise comprises common storage mediums, such as an SRAM, a DRAM, an eDRAM, a Register File and the like, and can also be a novel storage device, such as an NVM, a 3D storage device and the like. The address space is divided into an off-chip data space and an on-chip data space. The address space division has strong flexibility and is not limited to a particular size of the address space. The channels between on-chip and off-chip data comprise interconnection techniques such as PCI, PCIE, HT and the like, and are not limited to these interconnection techniques. The on-chip data channels comprise interconnection techniques such as FAT-TREE, H-TREE and the like, and are not limited to these interconnection techniques.

Data of the neural network and the subnetworks can be read and written once or many times, and the data can be read to one or more on-chip computation units. The on-chip storage medium can be read and written once or many times from outside, and can also be read and written once or many times from inside. The off-chip storage medium can be read and written once or many times, and data of the off-chip storage medium can be read to one or more on-chip computation units. The off-chip storage medium can be read and written once or many times from outside, and can also be read and written once or many times from inside. Data in the on-chip storage medium can be replaced once or many times, and the data replacement strategy of the on-chip storage medium comprises sequential replacement, reversed replacement, random replacement and the like.
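The following is a simplified software model, under assumptions of our own (the capacity, the method names and the choice of the sequential replacement rule are illustrative only), of how an on-chip storage and addressing module with separated read and write ports over an address-indexed storage and a replacement strategy could behave.

```python
import numpy as np

class OnChipScratchpad:
    """Toy model of an address-indexed on-chip storage with separate ports."""

    def __init__(self, capacity):
        self.data = np.zeros(capacity)   # on-chip storage medium (SRAM-like)
        self.capacity = capacity
        self.next_victim = 0             # cursor used by sequential replacement

    def write(self, address, values):
        """Write port: store a data block at the indexed on-chip address."""
        self.data[address:address + len(values)] = values

    def read(self, address, length):
        """Read port: return the data block at the indexed on-chip address."""
        return self.data[address:address + length]

    def replace(self, values):
        """Sequential replacement: overwrite the next region in address order,
        wrapping to address 0 when the block would run past the end."""
        if self.next_victim + len(values) > self.capacity:
            self.next_victim = 0
        address = self.next_victim
        self.write(address, values)
        self.next_victim = (address + len(values)) % self.capacity
        return address

pad = OnChipScratchpad(capacity=8)
addr = pad.replace(np.arange(3.0))   # data brought on chip via replacement
block = pad.read(addr, 3)            # repetitive addressing of the same block
assert np.allclose(block, [0.0, 1.0, 2.0])
```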
Referring to
In this example, the neural network is divided by the division module 10 into subnetworks, so that by accelerating each individual subnetwork, the computation of the subnetworks can be completed rapidly and efficiently by a chip, and the computation of the total network is likewise rapid and efficient. According to different division principles, the neural network is divided into different subnetworks, which are organized by the first computation module 20 and the second computation module 30 for computation. In addition, data of the neural network is stored in an off-chip storage medium, and data of the subnetwork is stored in an on-chip storage medium. The first computation result of each subnetwork is spliced or weighted to compute the total computation result of the neural network. The present disclosure can effectively exploit the reusability of the data, fulfill the requirements of flexible addressing, efficiently satisfy the requirements for hardware resources such as bandwidth, and can be adapted to different scenarios.
In another example of the present disclosure, the step S901 comprises:
The third division submodule 13 divides the neural network into a plurality of subnetworks having consistent internal data characteristics on the basis of distribution of the neuron weights of the neural network; or
As for a heterogeneous platform, the amount of data that can be stored on the chip of an accelerator is quite limited. The neural network nowadays often has a large scale, so it is required to divide the entire neural network into different subnetworks for computation, and the required data blocks are read in or written out through the interaction of data between the off-chip large storage medium and the on-chip small storage medium. Finally, the total network result is computed on the basis of the computation results of the different subnetworks. On-chip and off-chip data connection shown in
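A minimal sketch of this off-chip/on-chip interaction is given below, assuming an off-chip weight array, an illustrative on-chip capacity of 256 rows and division by output neurons; these sizes and the NumPy-based simulation are assumptions for illustration, not parts of the disclosure.

```python
import numpy as np

OFF_CHIP_WEIGHTS = np.random.randn(1024, 64)   # large, slow off-chip storage medium
ON_CHIP_CAPACITY = 256                          # rows of weights that fit on chip at once

def run_network(inputs):
    first_results = []
    for start in range(0, OFF_CHIP_WEIGHTS.shape[0], ON_CHIP_CAPACITY):
        # carry one subnetwork's data block from off-chip to on-chip storage
        on_chip_block = OFF_CHIP_WEIGHTS[start:start + ON_CHIP_CAPACITY].copy()
        # compute this subnetwork using on-chip data only
        first_results.append(on_chip_block @ inputs)
    # compute the total network result on the basis of the subnetwork results
    return np.concatenate(first_results)

x = np.random.randn(64)
assert np.allclose(run_network(x), OFF_CHIP_WEIGHTS @ x)
```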
In one example of the present disclosure, the computing flow of the neural network shown in
In this example, the neural network is divided into subnetworks, and the weights of the neural network are clustered into 356 types, i.e., 356 weight values. Assuming that the weight cache on the chip can only store 256 values, the neural network is naturally divided into two subnetworks: one is the network connected using the first 256 weight values, i.e., subnetwork 1, and the other is the network connected using the remaining 100 weight values, i.e., subnetwork 2. Hence, for the final neuron result, it is only required to add the accumulation results of subnetworks 1 and 2 to obtain the final result of the total network. After computation begins, the first 256 weight values are loaded onto the chip, all output neurons are addressed one by one according to the input neurons and computed until all of these weights are used, and the computation of subnetwork 1 is completed. Similarly, the computation of subnetwork 2 is completed. The results of subnetworks 1 and 2 are added to obtain the final result of the total network. It shall be noted that the storage devices in the various examples of the present disclosure are not limited to specific storage mediums; they may be common storage mediums, such as a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), an Enhanced Dynamic Random Access Memory (eDRAM), a Register File (RF) and the like, and can also be novel storage devices, such as a Non-Volatile Memory (NVM), a 3D storage device and the like.
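The worked example above can be reproduced numerically as follows; the 356 clustered weight values and the on-chip capacity of 256 come from the text, whereas the network dimensions and random data are assumptions introduced only for this sketch.

```python
import numpy as np

num_in, num_out, num_clusters, cache_size = 64, 32, 356, 256
codebook = np.random.randn(num_clusters)                      # 356 clustered weight values
weight_ids = np.random.randint(num_clusters, size=(num_out, num_in))
x = np.random.randn(num_in)

def subnetwork_result(id_low, id_high):
    """Accumulate only connections whose weight index lies in [id_low, id_high)."""
    mask = (weight_ids >= id_low) & (weight_ids < id_high)
    return (codebook[weight_ids] * mask) @ x

y1 = subnetwork_result(0, cache_size)             # subnetwork 1: first 256 weight values on chip
y2 = subnetwork_result(cache_size, num_clusters)  # subnetwork 2: remaining 100 weight values
assert np.allclose(y1 + y2, codebook[weight_ids] @ x)  # sum gives the total network result
```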
In conclusion, by dividing a neural network into a plurality of subnetworks having consistent internal data characteristics, computing each of the subnetworks to obtain a first computation result for each subnetwork, and computing a total computation result of the neural network on the basis of the first computation result of each subnetwork, the present disclosure can reduce the on-chip cache overhead by dispatching data reasonably, so as to provide a more efficient accelerator design. Owing to the effective division of the large-scale data, the requirements for hardware resources, such as the requirement for access bandwidth, are reduced while good flexibility is provided, and the problem of efficiently reading and writing repetitive data is solved.
Certainly, the present disclosure may also have multiple other examples, and without departing from the spirit and substance of the present disclosure, those skilled in the art can make various corresponding modifications and variations according to the present disclosure; however, these corresponding modifications and variations shall fall within the scope protected by the appended claims.
By dividing a neural network into a plurality of subnetworks having consistent internal data characteristics, computing each of the subnetworks to obtain a first computation result for each subnetwork, and computing a total computation result of the neural network on the basis of the first computation result of each subnetwork, the present disclosure can reduce the on-chip cache overhead by dispatching data reasonably, so as to provide a more efficient accelerator design. Owing to the effective division of the large-scale data, the requirements for hardware resources, such as the requirement for access bandwidth, are reduced while good flexibility is provided, the problem of efficiently reading and writing repetitive data is solved, and the computing efficiency of the neural network is improved.