The present disclosure relates to a method and system for compressing kernels. More particularly, it relates to compressing kernels for use with a convolutional neural network, wherein the kernels exhibit cyclic rotation.
A system may be used to compress kernels for use with convolutional neural networks. It is desirable to reduce the network's memory footprint, the amount of data to be fetched, and the number of memory fetches. It is also desirable to reduce the power consumption of such systems.
According to a first aspect of the present disclosure, there is provided a method of compressing kernels; the method comprising detecting a plurality of replicated kernels; generating a composite kernel from the replicated kernels, the composite kernel comprising kernel data and meta data; and storing the composite kernel.
According to a second aspect of the present disclosure, there is provided a system for compressing kernels, the system comprising a detection module for detecting a plurality of replicated kernels; a generation module for generating composite kernels from the replicated kernels; and storage for storing the composite kernels.
According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium comprising computer-executable instructions stored thereon which, when executed by at least one processor, cause the at least one processor to compress kernels, the instructions comprising detecting a plurality of replicated kernels; generating a composite kernel from the replicated kernels, the composite kernel comprising kernel data and meta data; and storing the composite kernel.
Further features and advantages will become apparent from the following description of preferred embodiments, given by way of example only, which is made with reference to the accompanying drawings in which like reference numerals are used to denote like features.
Details of systems and methods according to examples will become apparent from the following description with reference to the Figures. In this description, for the purposes of explanation, numerous specific details of certain examples are set forth. Reference in the specification to ‘an example’ or similar language means that a feature, structure, or characteristic described in connection with the example is included in at least that one example but not necessarily in other examples. It should further be noted that certain examples are described schematically with certain features omitted and/or necessarily simplified for the ease of explanation and understanding of the concepts underlying the examples.
Convolutional neural networks typically comprise an input layer, a plurality of convolutional layers, a number of fully connected layers and an output layer. The input layer for example corresponds with an input to the neural network, such as image data. The convolutional layers are arranged to extract particular features from the input data to create feature maps and may only operate on a small portion of the input data. The fully connected layers then use the feature maps for classification.
In general, neural networks, such as the one described above, may undergo a training phase, in which the neural network is trained for a particular purpose. The internal state of a neuron within the neural network (sometimes referred to as the activation) typically depends on an input received by the neuron. The output of said neuron then depends on the input, kernel, bias, and the activation. The output of some neurons is connected to the input of other neurons, forming a directed, weighted graph in which the vertices (corresponding to neurons) and edges (corresponding to connections) of the graph are associated with weights. The weights may be adjusted throughout the training, altering the output of individual neurons and hence of the neural network as a whole.
When training neural networks, one or more kernels are generated. The kernels are associated with at least some of the layers of the network. The kernels, for example, allow features of an image to be identified. Some kernels may be used to identify edges in the input and others may be used to identify horizontal or vertical features in the image (although this is not limiting, and other kernels are possible). The precise features that the kernels identify will depend on the object that the neural network is trained to identify. Kernels may be three-dimensional volumes having a width, height and depth, for example 3×3×64.
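By way of illustration only, the following is a minimal Python/NumPy sketch (all names are hypothetical and not part of the present disclosure) of how a single two-dimensional kernel slice is convolved over an input plane to produce a feature map, together with a kernel stored as a three-dimensional volume of the 3×3×64 size mentioned above:

```python
import numpy as np

def conv2d_single(input_plane, kernel_slice):
    """Slide a 2-D kernel slice over a 2-D input plane to produce a feature map.

    A minimal, unoptimised sketch: no padding and a stride of one.
    """
    kh, kw = kernel_slice.shape
    ih, iw = input_plane.shape
    feature_map = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(feature_map.shape[0]):
        for x in range(feature_map.shape[1]):
            feature_map[y, x] = np.sum(input_plane[y:y + kh, x:x + kw] * kernel_slice)
    return feature_map

# A kernel may be a three-dimensional volume, e.g. 3 x 3 x 64.
kernel_volume = np.random.randn(3, 3, 64)
feature_map = conv2d_single(np.random.rand(8, 8), kernel_volume[:, :, 0])
print(kernel_volume.shape, feature_map.shape)  # (3, 3, 64) (6, 6)
```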
During supervised training, a training data set is used. The training data set comprises input and output data and is used to train the neural network by providing the inputs to the network, determining an output, and then comparing the determined output to the known output provided by the training data set.
In general, the more training data items available in the training data set, the more accurate a trained neural network will be at identifying features and/or objects. It is not uncommon for training data to be augmented by applying several transformations to the original training data items, thereby expanding the training data set without the need for obtaining additional training data items. For example, the training data set may be augmented by applying several different transformations to the original training data items, such as rotation, shifting, rescaling, mirroring/flipping, shearing, stretching, adjusting the colour, and adding noise. Expanding the training data set in this way enables the neural network to more accurately classify objects which do not match the training data, for example when the object to be identified is at a different orientation, under different lighting conditions, and/or of a different size to the items in the training data.
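For example, the 90-degree rotation augmentation described above might be sketched as follows, assuming NumPy arrays for the training images; the function name is illustrative only:

```python
import numpy as np

def augment_with_rotations(image):
    """Return the original training image plus its 90-, 180- and 270-degree
    rotations. One of several possible augmentations (shifting, flipping,
    adding noise, ...) that may be used to expand a training data set."""
    return [np.rot90(image, k) for k in range(4)]

# Example: one 32 x 32 training image becomes four training data items.
augmented = augment_with_rotations(np.random.rand(32, 32))
print(len(augmented))  # 4
```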
As a result of augmenting the training data in this way, it is not uncommon for the kernels generated to exhibit similar modifications. In particular, when applying rotation transformations to the training data, such as 90-degree rotations, the kernels generated may also exhibit such rotational similarities. This is a result of the equivariance required of the neural network when handling such 90-degree rotations. Therefore, storing kernels which exhibit such similarities requires an increased memory footprint and increases the power consumption of any system arranged to implement the neural network.
Kernels may be compared for such similarities by comparing the entire volume of one kernel with the entire volume of another kernel. Alternatively, kernels may be compared one portion at a time. A kernel may be separated into slices, such as a 3×3×1 slice. Each slice may then be compared against a slice of another kernel volume. For example, the 10th slice of one kernel volume may be compared to the 10th slice of another kernel volume.
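A minimal sketch of such a comparison, assuming square kernel slices stored as NumPy arrays and an exact (within tolerance) match; the function name is an assumption rather than part of the present disclosure:

```python
import numpy as np

def rotation_relating(kernel_a, kernel_b, atol=1e-6):
    """Return the number of 90-degree rotations relating two kernel volumes,
    or None if they are not rotated copies of one another.

    Assumes square slices (e.g. 3x3xD volumes); each 2-D slice of kernel_a,
    rotated in the plane of the first two axes, is compared against the
    correspondingly indexed slice of kernel_b.
    """
    for k in range(4):
        rotated = np.rot90(kernel_a, k, axes=(0, 1))
        if all(np.allclose(rotated[:, :, d], kernel_b[:, :, d], atol=atol)
               for d in range(kernel_a.shape[2])):
            return k
    return None

# Example: a kernel and its 90-degree rotated copy.
base = np.random.randn(3, 3, 64)
print(rotation_relating(base, np.rot90(base, 1, axes=(0, 1))))  # 1
```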
For example, as shown in FIG. 1, a kernel set may comprise groups of kernels which exhibit such cyclic rotation.
The kernels may be of any size depending on the function of the neural network. The kernels 110a, 110b, 110c, 110d of FIG. 1 are shown by way of example only.
At item 210 of FIG. 2, a plurality of replicated kernels exhibiting cyclic rotation, such as the group 110, is detected.
Item 220 of FIG. 2 comprises generating a composite kernel from the detected replicated kernels, the composite kernel comprising kernel data and meta data.
Once the composite kernel has been generated, at item 230 of FIG. 2, the composite kernel is stored in a new kernel set.
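The three items might be sketched, by way of example only, as follows; the two callables stand in for the detection and generation steps and are hypothetical, not defined by the present disclosure:

```python
def compress_kernel_set(kernel_set, detect_replicated_groups, make_composite):
    """Sketch of items 210-230: detect groups of replicated (rotated) kernels,
    generate a composite kernel plus meta data for each group, and store the
    result in a new kernel set.

    The two callables are hypothetical stand-ins for the detection (item 210)
    and generation (item 220) steps described above.
    """
    new_kernel_set = []
    groups, remaining = detect_replicated_groups(kernel_set)   # item 210
    for group in groups:
        kernel_data, meta_data = make_composite(group)         # item 220
        new_kernel_set.append((kernel_data, meta_data))        # item 230
    # Kernels that do not exhibit cyclic rotation are carried over unchanged.
    new_kernel_set.extend((kernel, None) for kernel in remaining)
    return new_kernel_set
```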
Generating composite kernels as described above, and storing them in a new kernel set for use by the neural network during the processing of inputs, reduces the total number of kernels required to be stored in memory for implementing the neural network. This reduces the memory footprint of the neural network, as well as the amount of data that needs to be fetched from the memory, thereby saving power. Furthermore, by storing only the new kernel set, the amount of on-chip memory required is reduced, increasing efficiency and decreasing the number of memory fetches from on-chip memory, thereby resulting in a power saving.
In some embodiments, the generation of composite kernels may be undertaken when the neural network is being trained. For example, during the training process occurrences of rotated kernels may be detected and subsequently optimized. Alternatively, a fully trained network may be provided, the rotated kernels will then be detected and optimized, before the network is retrained. Retraining may occur when the neural network is implemented using a neural network accelerator or neural network processor, since the processor may use different data types, such as an 8-bit integer, than the trained network, which may use, for example, a floating-point data type. In yet other embodiments, a pre-trained network may be provided to a driver, such as the driver described below in relation to FIG. 6.
Items 210 and 230 of the method 300 are identical to those discussed above in relation to FIG. 2. At item 320, it is determined whether a difference associated with the group of replicated kernels exceeds a threshold; if the difference exceeds the threshold, the composite kernel is made to equal an average of the kernels.
Alternatively, if at item 320 the difference does not exceed the threshold, the composite kernel is made to equal the original kernel. The threshold may be, for example, any weight in the neural network which does not change by more than two bits. Alternatively, the threshold may be the sum of different weights in a slice or volume of the kernel which does not change by a predetermined value. It will be appreciated that the threshold may be the combination of the two options described above or may be determined using a different metric. In some embodiments where the kernels are generated using the method described in relation to
Once it is determined whether the composite kernel is equal to an average of the kernels or equal to the original kernel, as described above in relation to FIG. 3, the composite kernel is stored in the new kernel set at item 230.
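One possible reading of the branch at item 320 is sketched below; the interpretation of the thresholds as a maximum per-weight change and a maximum summed change, and the parameter names, are assumptions:

```python
import numpy as np

def choose_composite(original, aligned_kernels, max_weight_change=2,
                     max_sum_change=None):
    """One possible reading of the branch at item 320.

    `aligned_kernels` are the replicated kernels rotated back into the
    orientation of `original`. If the difference introduced by averaging
    exceeds the threshold, the composite kernel is made equal to the average;
    otherwise it is made equal to the original kernel. Both threshold
    parameters stand in for the per-weight and summed-difference thresholds
    described above.
    """
    average = np.mean([original] + list(aligned_kernels), axis=0)
    difference = np.abs(average - original)
    exceeds = difference.max() > max_weight_change
    if max_sum_change is not None:
        exceeds = exceeds or difference.sum() > max_sum_change
    return average if exceeds else original
```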
In FIG. 4, a group of replicated kernels exhibiting cyclic rotation, such as the group 110, is combined to produce an average kernel 110z.
Along with each average kernel, for example 110z, meta data 110m is also produced. The meta data 110m indicates whether the kernel is rotated. The average kernel 110z and meta data 110m are stored in a new kernel set 150 along with average kernels 112z,114z and meta data 112m,114m for other groups of replicated kernels in the kernel set. Storing kernels in this way may result in an approximately 75% reduction of memory requirements.
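By way of example only, the generation of an average kernel and its meta data might be sketched as follows; the (kernel, rotation) input format and the flag layout of the meta data are assumptions:

```python
import numpy as np

def make_average_kernel(group):
    """Produce an average kernel and its meta data for a group of replicated
    kernels.

    `group` is assumed to be a list of (kernel, k) pairs, k being the number
    of 90-degree rotations relating that kernel to the first kernel of the
    group. Each kernel is rotated back into a common orientation before the
    weights are averaged.
    """
    aligned = [np.rot90(kernel, -k, axes=(0, 1)) for kernel, k in group]
    average_kernel = np.mean(aligned, axis=0)
    rotations = {k for _, k in group}
    # Meta data in the form of the set described below: the first entry stands
    # for the stored, unrotated kernel; the remaining entries flag 90-, 180-
    # and 270-degree rotated copies present in the original kernel set.
    meta_data = [0] + [1 if k in rotations else 0 for k in (1, 2, 3)]
    return average_kernel, meta_data
```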
When implementing the neural network, for example using the neural network to identify/classify items in an image, each kernel 110z,112z,114z of the new kernel set 150 is processed. The kernel 110z,112z,114z and the meta data 110m,112m,114m are fetched from storage (as will be described below). The meta data 110m,112m,114m is interrogated and it is determined whether the kernel 110z,112z,114z exhibits cyclic rotation. If so, the kernel 110z,112z,114z may be processed for each rotation indicated in the meta data 110m,112m,114m.
For example, the meta data may be a set comprising a binary representation of the rotations, such as {0, 1, 1, 0}, which would indicate that there are three kernels exhibiting cyclic rotation: the first 0 represents the unrotated kernel, the first 1 represents a 90-degree rotated kernel, the second 1 represents a 180-degree rotated kernel, and the final 0 indicates that there is no 270-degree rotated kernel in the kernel set.
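At inference time the composite kernel may then be expanded according to the meta data, for example as in the following sketch, which assumes the stored kernel is always processed and that the remaining entries flag 90-, 180- and 270-degree rotated copies:

```python
import numpy as np

def expand_composite(average_kernel, meta_data):
    """Yield the kernels to be processed for one composite kernel.

    This sketch assumes the stored (unrotated) kernel is always processed and
    that the remaining meta data entries flag 90-, 180- and 270-degree rotated
    copies; for the example set {0, 1, 1, 0} above, three kernels are yielded.
    """
    yield average_kernel
    for k, present in enumerate(meta_data[1:], start=1):
        if present:
            yield np.rot90(average_kernel, k, axes=(0, 1))

# Example: the set {0, 1, 1, 0} expands the stored kernel into three kernels.
kernels = list(expand_composite(np.random.randn(3, 3, 64), [0, 1, 1, 0]))
print(len(kernels))  # 3
```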
Items 210 and 230 of the method 500 are identical to those described above in relation to FIG. 2. At item 520, a further kernel of the group is aligned with the first kernel of the group, and a delta kernel representing the difference between the two kernels is produced and stored within the new kernel set.
At item 550, it is determined whether all kernels from the group have been processed. If not, the method loops back to item 520, where a further kernel, for example kernel 110c, is aligned with the first kernel, and a delta kernel is produced and stored within the new kernel set.
Once all kernels from the group have been processed, the method loops back to item 210, where a next group, such as group 112, is detected and the process is repeated. Once all groups of kernels have been processed, any remaining kernels in the original kernel set that do not exhibit cyclic rotation are added to the new kernel set without processing.
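A sketch of the delta-kernel generation described above; the (kernel, rotation) input format and the returned structure are assumptions rather than details of the present disclosure:

```python
import numpy as np

def make_delta_kernels(group):
    """Sketch of items 520-550: the first kernel of a group is kept as-is;
    every further kernel is aligned with it (undoing its 90-degree rotation)
    and only the difference -- the delta kernel -- is stored.

    `group` is assumed to be a list of (kernel, k) pairs, k being the number
    of 90-degree rotations relative to the first kernel of the group.
    """
    (first_kernel, _), *rest = group
    deltas = []
    for kernel, k in rest:
        aligned = np.rot90(kernel, -k, axes=(0, 1))
        deltas.append((k, aligned - first_kernel))
    return first_kernel, deltas
```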
In some embodiments, the delta kernel may be compressed, for example using a lossless compression algorithm such as Golomb-Rice coding, to further reduce the memory requirement of the method. As mentioned above, reducing the amount of memory required for the method has additional benefits. In particular, it reduces the network size and the number of memory fetches required, thereby reducing the power requirements of implementing the neural network.
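By way of illustration, a minimal Golomb-Rice encoder for signed integer weight differences is sketched below; the zig-zag mapping of signed values and the fixed parameter k are assumptions rather than details of the present disclosure:

```python
def rice_encode(values, k=2):
    """Encode a sequence of signed integers (for example quantised delta-kernel
    weights) as a Golomb-Rice bit string.

    Signed values are first mapped to non-negative ones (zig-zag mapping); each
    is then written as a unary quotient followed by k remainder bits. In
    practice k would be tuned to the statistics of the delta kernels.
    """
    bits = []
    for value in values:
        n = 2 * value if value >= 0 else -2 * value - 1   # zig-zag mapping
        quotient, remainder = n >> k, n & ((1 << k) - 1)
        bits.append("1" * quotient + "0")                  # unary quotient
        bits.append(format(remainder, f"0{k}b"))           # k remainder bits
    return "".join(bits)

# Small delta values (the common case for similar kernels) give short codes.
print(rice_encode([0, -1, 2, 0, 1], k=1))  # '0001110000100'
```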
When implementing the neural network, for example for identifying/classifying items in an image, where the new kernel set has been generated using the method 500 described above in relation to FIG. 5, the first kernel of each group and its associated delta kernels are fetched from storage, and the remaining kernels of the group are reconstructed by applying each delta kernel to the first kernel before processing.
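Reconstruction at inference time might then proceed as in the following sketch, which assumes the (rotation, delta) pairs produced by the delta-kernel sketch above:

```python
import numpy as np

def reconstruct_kernels(first_kernel, deltas):
    """Rebuild the original group of kernels from the stored first kernel and
    its delta kernels, re-applying each kernel's 90-degree rotation.

    `deltas` is assumed to be the list of (rotation, delta_kernel) pairs
    produced by the delta-kernel sketch above.
    """
    kernels = [first_kernel]
    for k, delta in deltas:
        kernels.append(np.rot90(first_kernel + delta, k, axes=(0, 1)))
    return kernels
```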
The neural network may be processed, for example, on a neural network accelerator, or other processor designed to process neural networks. The detection of replicated kernels may occur during the training process, for example whilst training a neural network on the neural network accelerator and/or neural network processor. In this embodiment, any kernels that exhibit 90-degree cyclic rotation in comparison to other kernels may be detected, grouped together, and then processed further as described above.
In an alternative embodiment, the kernel set may be processed prior to training the neural network. The neural network accelerator may include an interface via which inputs to the neural network may be received, for example from other components of a computer device.
The CPU 610 of FIG. 6 comprises a driver 612.
In the alternative embodiment described above, the driver 612 of the CPU 610 may be configured to process, using the CPU 610, the kernel set to produce the new kernel set prior to the training of the network in accordance with any of the methods 200,300,500 previously described in relation to FIGS. 2, 3 and 5.
The computer device 600 also includes a dynamic memory controller (DMC) 630 which may be used to control access to storage 640 of the computer device 600. The storage 640 is for example external to the neural network accelerator 620 and may be a random-access memory (RAM) such as DDR-SDRAM (double data rate synchronous dynamic random-access memory). In other examples, the storage 640 may be or include a non-volatile memory such as Read Only Memory (ROM) or a solid-state drive (SSD) such as Flash memory. The storage 640 in examples may include further storage devices, for example magnetic, optical or tape media, compact disc (CD), digital versatile disc (DVD) or other data storage media. The storage 640 may be removable or non-removable from the computer device 600. In some embodiments, the storage may be used for storing the original and new kernel sets. Alternatively, the original and new kernel sets may be stored in on-chip memory within the neural network accelerator 620, or other component of the computer device 600.
The components of the computer device 600 in the example of FIG. 6 are interconnected using a systems bus, allowing data to be transferred between the various components.
The system 700 comprises storage 710 for holding a plurality of kernels generated by training of a neural network. The kernels may exhibit cyclic rotation. The system 700 also comprises a compression module 720, further comprising a detection module 722 and a generation module 724. The detection module 722 retrieves kernels from the storage 710, and is arranged to determine whether any of the kernels exhibit cyclic rotation. Once kernels exhibiting cyclic rotation have been detected, the generation module 724 is arranged to produce an average/composite kernel. This average/composite kernel, as described above, is also stored with meta data indicating whether the kernel exhibits cyclic rotation.
Once the generation module 724 has produced an average/composite kernel, it is stored in further storage 730 as part of the new kernel set, for use when implementing a convolutional neural network. The further storage 730 may be the same as the storage 710 holding the original kernels, or alternatively, may be separate storage.
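A structural sketch of the detection module 722 and generation module 724 is given below; it reuses the hypothetical rotation_relating() and make_average_kernel() helpers from the earlier sketches, and the class and method names are illustrative only:

```python
class DetectionModule:
    """Retrieves kernels and groups those related by cyclic rotation, using
    the rotation_relating() helper from the comparison sketch above."""

    def detect(self, kernels):
        groups, used = [], set()
        for i, first in enumerate(kernels):
            if i in used:
                continue
            group = [(first, 0)]
            for j in range(i + 1, len(kernels)):
                if j in used:
                    continue
                k = rotation_relating(first, kernels[j])
                if k is not None:
                    group.append((kernels[j], k))
                    used.add(j)
            if len(group) > 1:          # kernels without rotated copies are
                groups.append(group)    # carried over to the new set unchanged
                used.add(i)
        return groups


class GenerationModule:
    """Produces an average/composite kernel and meta data for each group,
    here simply deferring to the averaging sketch above."""

    def generate(self, group):
        return make_average_kernel(group)
```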
The order of processing steps in the examples described above is merely an example. In other examples, these processing steps may be performed in a different order.
It is to be understood that any feature described in relation to any one example may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the examples, or any combination of any other of the examples. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the accompanying claims.