The present disclosure relates to a method and an electronic system for inferring a Morphological Neural Network based on inference data including an input data vector.
Artificial Neural Networks (ANNs) are widely used as an easy and generic way to implement solutions for digital signal processing and analysis. Such networks obtain their outputs through the cooperative operation of individual neurons, each neuron comprising a linear filtering process followed by a non-linear activation function. However, alternatives to classic Artificial Neural Networks have been developed using a set of mathematical operations (such as those based on the rules of tropical algebra) simpler than those used by classic ANNs.
One such alternative is the Morphological Neural Network (MNN). These networks are a type of ANN wherein each neuron can identify a range of values of the input data by using a set of lower and upper boundaries for the regions of interest. These sets of boundaries are defined by the weights of each morphological neuron, in the form of morphological filters (i.e., each implemented by a morphological neuron, as opposed to classic ANN filters, which are implemented by classical artificial neurons). Morphological filters (i.e., morphological neurons) use maximum and minimum operations, without the complex non-linear activation functions associated with classical ANNs.
Furthermore, MNNs are characterized by defining their outputs as a combination of orthogonal hyperplanes, which is a more complex definition than the simpler hyperplanes defined by classical ANNs. Therefore, by virtue of such hyperplanes and assuming the same number of neurons per network, MNNs can classify the input data into a larger number of regions than classical ANNs can.
An example of a software architecture developed for morphological networks (F. Arce, E. Zamora, H. Sossa, R. Barrón, 2016) consists of a network comprising two layers: a first hidden layer formed by complementary morphological neurons, each defining the boundaries of a different range of values of the input data of the network, and a second layer comprising morphological neurons followed by a non-linear activation function. The second layer has the function of identifying the class to which the input data belongs. In said example, each neuron performs the following operation:
That is, determining the minimum (denoted by operator ⊕′) between two previously obtained minima. In the equation, yj for j={1 . . . K} is the output of the j-th neuron, and xi for i={1 . . . Q} is the i-th attribute of each of the Q-dimensional inputs which form the input database (i.e., the inputs of the network) to be classified. The weights ωij and νij are normally estimated by heuristic methods, in contrast to the way in which classical ANNs are trained with the most powerful algorithm currently in use: gradient-descent backpropagation.
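The min-of-minima operation described above can be sketched in a few lines; the input and weight values below are illustrative assumptions, not taken from the cited work:

```python
import numpy as np

def morphological_neuron(x, w_j, v_j):
    # Output of neuron j: the minimum of two previously obtained minima,
    # each taken over additions of the input attributes with one weight set
    # (no multiplications, no smooth activation function).
    return min(np.min(x + w_j), np.min(x + v_j))

x = np.array([0.2, 0.8, 0.5])        # Q = 3 input attributes (illustrative)
w = np.array([0.1, -0.3, 0.0])       # weight set w_ij for neuron j (illustrative)
v = np.array([0.4, 0.2, -0.1])       # weight set v_ij for neuron j (illustrative)
y_j = morphological_neuron(x, w, v)  # min(min(x + w), min(x + v))
```

Here min(x + w) = 0.3 and min(x + v) = 0.4, so the neuron outputs 0.3.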
Recent advances in morphological networks tend to combine morphological layers (i.e., a set of morphological neurons) with conventional ones (i.e., a set of classical artificial neurons). Such networks may be defined as hybrid morphological-classical neural networks. Said advances claim that a network consisting of a set of combined morphological layers and classical artificial linear neuron layers can approximate any smooth function. Furthermore, its approximation error becomes smaller as the number of neurons (both classical and morphological), which act as extractors of the input attributes, increases.
Morphological networks use the principles of Tropical Algebra to implement at least one morphological layer. Such a layer implements the maximum (or minimum) of additions instead of the sum of products. As a result, the nature of each morphological layer computation is nonlinear. This is in contrast to the classical ANN scheme, in which the non-linearity is local to each artificial neuron rather than a global non-linear operation associated with a single layer (as is the case for the morphological layer).
Therefore, the properties of morphological neural networks are structurally different from those of classical neural network models. In addition, a hybrid morphological-classical neural network can be trained using the same type of algorithms, such as gradient-descent backpropagation.
The software implementation of such hybrid morphological-classical neural networks requires a high computational cost from a controller. This becomes a significant drawback when such neural networks are used within edge computing devices.
Edge Computing may be defined as all the data processing that is performed at the edge of any network, i.e., near the physical data-collection point in the real world. Generally, when machine learning needs to be applied to said data, there are two possible strategies: sending the data from an edge device to a cloud (or server) where the AI processing is implemented; or performing the AI processing within the edge device, and then sending the output data to the cloud.
If an edge device has to send the data to be processed by a neural network to a cloud or server, it requires a strong data connection (such as a standard cable Internet connection or a wifi connection) and enough upstream and downstream bandwidth to send and receive such data. This may be difficult because of the typically large size of the data involved (for example, in image recognition systems, sending images with proper resolution uses a lot of bandwidth). Furthermore, edge devices may require high energy consumption to send and receive such large amounts of data.
Therefore, there is a need for a hybrid morphological neural network implementation which can be used within edge computing devices without a high computational cost and energy consumption, and with an acceptable performance.
Despite the progress that has been made in terms of morphological architectures, no application-specific circuitry implementations of such hybrid morphological neural networks have been produced to date.
According to a first aspect, a method of inferring of a Morphological Neural Network (MNN) is presented, the method being performed by an electronic system, based on inference data including an input data vector, wherein the MNN comprises at least one morphological hidden layer and an output linear layer, a first matrix of neuron weights associated with the morphological hidden layer, and a second matrix of neuron weights associated with the output linear layer, and wherein the method comprises the steps of:
The input data vector may be defined as the input data to be processed by the MNN Network to infer an output. Furthermore, the input data vector may be codified in binary, and can be taken directly from the sensors (input sensed signals) of the network or be the result of a pre-processing of the input sensed signals.
Classically, a binary number is defined by using a set of N bits, each one with a weight related to 2 raised to the relative position of the bit inside the bit-stream. Two widely used binary codifications may be defined: signed binary and unsigned binary. In the signed binary codification, an N-bit number can represent a number between −2^(N−1) and 2^(N−1)−1. This codification is also known as two's complement. On the other hand, in the unsigned binary codification, an N-bit number can represent a number between 0 and 2^N−1. However, the unsigned codification does not allow defining negative numbers. Therefore, the input data vector may use either of said binary codifications.
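As a quick check of these two ranges, a small sketch (the function name is chosen for illustration):

```python
def binary_ranges(n_bits):
    # Signed (two's complement): [-2^(N-1), 2^(N-1) - 1]
    signed = (-(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1)
    # Unsigned: [0, 2^N - 1]
    unsigned = (0, (1 << n_bits) - 1)
    return signed, unsigned

signed8, unsigned8 = binary_ranges(8)   # ((-128, 127), (0, 255))
```

For N = 8 the signed range is −128 to 127 and the unsigned range is 0 to 255.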
Furthermore, the first matrix of neuron weights associated with the hidden layer may be defined as the connectivity matrix of weights from the input of the network (where the input data vector is inputted) to the morphological hidden layer of the network, corresponding to all the morphological neurons of the hidden layer.
The number of neurons of the morphological hidden layer can be selected during the training stage of the network, in order to achieve the best fitting for the purpose of the neural network. Furthermore, the number of neurons of the output linear layer is determined by the purpose of the network itself (i.e., the number of categories outputted by the neural network).
The second matrix of neuron weights associated with the output linear layer may be defined as the connectivity matrix of weights from the morphological hidden layer to the output linear layer of the network itself. If the MNN network comprises more than one hidden layer, the penultimate layer of the network would be a morphological hidden layer and the last layer of the network would be an output linear layer; thus, the second matrix of neuron weights would be the connectivity matrix of weights from the output of the last morphological hidden layer within the network to the output linear layer of the MNN network. The function of the second matrix is to perform a simple linear transformation.
The matrix of addition components may be an intermediate matrix, calculated by performing a vector-matrix addition between the input data vector and the first matrix of neuron weights.
As an example, the following mathematical notation, used in the rest of the present disclosure, may be defined for a generic matrix A with M rows and N columns and a vector {right arrow over (b)} with N components. A vector-matrix operation may be defined as follows:
By performing such a vector-matrix operation, a matrix may be generated by operating a vector and a matrix. In the above equation, a generic operation "(op)" is defined. From equation (1), a vector-matrix addition may be defined by equation (2) as:
Similarly, a vector-matrix product may be defined as:
Over any matrix A, the operators maximum or minimum component of a row (Mr and mr) may be defined as follows:
Finally, a sum over rows may be defined as:
Both vector-matrix operations, along with operators Mr, mr and Sr may be used below.
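These operators can be sketched with NumPy broadcasting; the matrix and vector values below are arbitrary illustrative choices:

```python
import numpy as np

A = np.array([[1, 4, 2],
              [3, 0, 5]])      # generic M x N matrix (M = 2, N = 3)
b = np.array([10, 20, 30])     # vector with N components

add  = A + b                   # vector-matrix addition: each a_ij plus b_j
prod = A * b                   # vector-matrix product: each a_ij times b_j
Mr   = A.max(axis=1)           # maximum component of each row
mr   = A.min(axis=1)           # minimum component of each row
Sr   = A.sum(axis=1)           # sum over rows
```

Broadcasting applies the vector component-wise to every row of the matrix, which matches the element-wise definition of the vector-matrix operation.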
Regarding the steps of the method of inferring of a MNN, in step a, the vector-matrix addition between the input data vector ({right arrow over (x)}) and the first matrix of neuron weights may be performed by using an array of binary digital adders of the electronic system.
Furthermore, in step b, the components outputted by the array of binary digital adders, which are the result of the vector-matrix addition performed in step a, are further encoded in a temporal signal.
Many temporal coding possibilities exist. According to a common example, a temporal coding uses a single bit which oscillates along time, wherein the codified value depends on the frequency of the oscillation of the single bit between the high and low bit states.
According to a specific example, the temporal encoding may be performed by comparing each component of the matrix of addition components with a counter signal, using an array of digital comparators of the electronic system.
The counter signal may be, for example, a continuous counter signal. That is, it may be a signal which counts consecutive values. This signal can also be, for example, cyclical, i.e., it can count from a first value to a last value, and then start back at the first value.
Alternatively, the counter signal may be a random signal. That is, it may be a signal which delivers a series of random values. This signal can also be, for example, cyclical, i.e., it can run from a first value to a last value (wherein the values are not consecutive), and then start back at the first value, thus being considered "pseudo-random". A random or pseudo-random signal is used in encoding styles such as stochastic computing encoding.
According to an example, the array of multiplying logic gates may comprise at least one XNOR logic gate.
Furthermore, according to another example, the array of multiplying logic gates may comprise at least one AND logic gate.
When multiplying two quantities using a temporal codification such as, for example, a stochastic bipolar or unipolar codification, the gate needed to perform such an operation is the XNOR gate or the AND gate, respectively. Depending on which type of temporal codification is used, the array of multiplying logic gates may comprise one or more XNOR gates or one or more AND gates.
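A minimal simulation sketch of both gate choices, assuming ideal uncorrelated bit-streams; the stream length and operand values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                       # bit-stream length (illustrative)

def unipolar(p):                  # value p in [0, 1] -> P(bit = 1) = p
    return rng.random(N) < p

def bipolar(v):                   # value v in [-1, 1] -> P(bit = 1) = (v + 1) / 2
    return rng.random(N) < (v + 1) / 2

# AND gate multiplies two uncorrelated unipolar streams
and_out = unipolar(0.6) & unipolar(0.5)
prod_uni = and_out.mean()                     # close to 0.6 * 0.5 = 0.3

# XNOR gate multiplies two uncorrelated bipolar streams
xnor_out = ~(bipolar(0.4) ^ bipolar(-0.5))
prod_bip = 2 * xnor_out.mean() - 1            # close to 0.4 * (-0.5) = -0.2
```

Because the streams are random, the decoded products only approximate the exact values, illustrating the accuracy trade-off discussed later.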
More precisely, according to another example, the encoding of each component of the matrix of addition components into a temporal signal may be performed in several ways:
A digital comparator or magnitude comparator is a hardware electronic device that takes two numbers as input in binary form and determines whether one number is greater than, less than or equal to the other according to a specific codification (one of the most widely used being two's complement). Comparators are widely used in central processing units (CPUs) and microcontrollers (MCUs). Examples of digital comparators comprise designs like the ones used in the CMOS 4063 and 4585 or the TTL 7485 and 74682.
To temporally encode each signal (each binary value of each component of the matrix of addition components), both the input signal to be encoded and the reference counter signal are inputted in a digital comparator, where the reference counter signal changes its binary value over a series of values. This way, when the signal to be encoded is greater than the reference counter signal, a high output (for example, a first output binary signal being equal to 1) is outputted by the comparator. Otherwise, the output of the comparator will be a low value (output binary signal being equal to 0).
According to specific examples of the present disclosure, the reference counter signal may be a continuous counter signal (i.e., a cyclical counter), thus resulting in a signal at the output of the comparator which has two continuous states (either one and changing to zero, or zero and changing to one). Such encoding could be defined as a two-state temporal encoding.
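This comparator-based two-state encoding can be sketched as follows; the counter period of 16 is an illustrative choice:

```python
def two_state_encode(value, period=16):
    # Comparator output: high while the value exceeds the cyclic counter,
    # low for the rest of the period -> two continuous states per cycle.
    return [1 if value > c else 0 for c in range(period)]

bits = two_state_encode(5)
high_cycles = sum(bits)          # 5 high cycles out of 16 encode the value 5
```

The stream stays high for the first five counter values and low afterwards, so the encoded value is recovered simply by counting the high cycles.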
Alternatively, the reference counter signal may also be a random or pseudo-random clock signal. Such a signal may be obtained from, for example, a linear-feedback shift register (LFSR). Such a module is a shift register whose input bit is a linear function of its previous state.
In this specific example, the initial value of the LFSR may be called the seed, and because the operation of the register is deterministic, the stream of values produced by the register is completely determined by its current (or previous) state. Likewise, because the register has a finite number of possible states, it must eventually enter a repeating cycle. However, an LFSR with a well-chosen feedback function can produce a sequence of bits that appears random and has a very long cycle.
General applications of LFSRs may include generating pseudo-random numbers, pseudo-noise sequences, fast digital counters, and whitening sequences. Both hardware and software implementations of LFSRs are common.
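A minimal Fibonacci LFSR sketch; the tap positions below correspond to the maximal-length polynomial x^4 + x + 1, and the seed and lengths are illustrative assumptions:

```python
def lfsr_bits(seed, taps, n_bits, length):
    # Fibonacci LFSR: output the LSB, then shift right and feed the XOR
    # of the tapped bits back into the MSB position.
    state, out = seed, []
    for _ in range(length):
        out.append(state & 1)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (n_bits - 1))
    return out

# A 4-bit maximal-length LFSR cycles through 2^4 - 1 = 15 states before repeating.
seq = lfsr_bits(seed=0b1001, taps=(0, 1), n_bits=4, length=30)
```

The output bit-stream looks irregular but is fully deterministic and repeats with period 15, illustrating the "appears random, very long cycle" behaviour described above.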
Therefore, the signal resulting from using a random or pseudo-random number signal (generated by an LFSR or another digital pseudo-random number generator, such as a Sobol Sequence Generator) may be a stochastic temporal encoded signal. More specifically, stochastic temporal encoding may be defined as a type of encoding wherein information is represented and processed in the form of digitized probabilities. Stochastic computing (SC) was proposed in the 1960s as a low-cost alternative to the von Neumann architecture. However, because of its intrinsically random nature, the resulting lower accuracy discouraged its use in digital computing. Nevertheless, when used within the hardware implementation of the inferring process of an MNN, it may be useful because it substantially reduces the amount of hardware needed to perform the inference. This dramatic decrease in hardware comes with a decrease in accuracy and computing speed, which is acceptable for most applications of such neural networks, such as pattern recognition networks, wherein the neural network may not need a very high accuracy in all its intermediate processing steps to provide an accurate pattern-recognition response.
As previously described, an example of a temporal coding uses a single bit which oscillates along time, the codified value depending on the frequency of the oscillation of the single bit between the high and low bit states. Regarding the specific example of the temporal encoding being a stochastic temporal encoding, in which this oscillation follows an apparently random pattern, the two most widely used codifications are the unipolar and the bipolar codes.
Unipolar codification may be performed by adding all the high binary values of a signal over time, and dividing the resulting quantity by the total number of high and low values. Therefore, in unipolar-type stochastic encoding, the number codified is directly the probability of having a high value in the bit signal. Coded numbers are bounded between 0 and 1, with the disadvantage of not including negative numbers.
On the other hand, bipolar codification may be performed by adding all the high values and subtracting from this quantity the number of low values; the result of this subtraction is then divided by the total number of zeroes and ones. In the bipolar case, the possible values are bounded between −1 and 1.
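For a short illustrative bit-stream, the two codifications decode as follows:

```python
bits = [1, 0, 1, 1, 0, 1, 0, 1]          # illustrative temporal bit-stream
ones = sum(bits)
zeros = len(bits) - ones

# Unipolar: fraction of high bits, bounded to [0, 1]
unipolar_value = ones / len(bits)             # 5 / 8 = 0.625

# Bipolar: (ones - zeros) / total, bounded to [-1, 1]
bipolar_value = (ones - zeros) / len(bits)    # (5 - 3) / 8 = 0.25
```

The same physical bit-stream therefore represents different numbers depending on which codification the decoder assumes.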
After temporally encoding the components of the matrix of addition components (for example, either with a two-state temporal encoding or a stochastic temporal encoding), a step c is performed wherein the maximum or minimum value among the components of each row of the matrix of addition components is selected (previously denoted in generic equations (4) and (5) by operators Mr and mr respectively). Furthermore, a vector of outputs of the hidden layer is generated from said selection.
More precisely, the vector of outputs of the hidden layer is generated by selecting, for each row within the matrix of addition components, the maximum or minimum component of the row, and placing it in the same row of the vector of outputs as the row of the matrix of addition components from which it has been calculated.
The selection, for each row of the matrix of addition components, of a maximum or minimum value among the components of the row, is performed in the following way:
More precisely, all the components in each row of the matrix of addition components are inputted in a corresponding logic gate, to obtain at the output of the logic gate the desired maximum or minimum value within said row.
The maximum or the minimum between the values of two different temporally encoded signals can be obtained by using OR and AND gates respectively. This is possible when all the signals involved in the operation are correlated. This correlation is obtained when the same counter is used while converting each number from classical binary notation (unsigned or signed) to a temporal codification. The coherence between the switching of all the signals involved implies that an OR gate selects the signal that is switching the most (the maximum) and that an AND gate selects the signal that is switching the least.
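This correlated-stream selection can be sketched by reusing the comparator encoding against one shared counter; the values 9 and 4 and the period of 16 are illustrative:

```python
def encode(value, period=16):
    # Same counter for every signal -> fully correlated streams
    return [1 if value > c else 0 for c in range(period)]

a, b = encode(9), encode(4)
maximum = sum(x | y for x, y in zip(a, b))   # OR gate selects the maximum
minimum = sum(x & y for x, y in zip(a, b))   # AND gate selects the minimum
```

Because both streams switch coherently against the same counter, the OR output counts 9 high cycles and the AND output counts 4, i.e., the maximum and minimum of the two encoded values.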
Therefore, by performing steps a to d, a resulting vector of outputs of the hidden layer is inferred through the morphological hidden layer, with an input of the network (i.e., the input data vector).
Afterwards, step e is performed, wherein a matrix of multiplying components is generated by performing the vector-matrix product of the vector of outputs of the hidden layer with the second matrix of neuron weights.
Therefore, in step e, the vector-matrix product between the vector of outputs of the hidden layer and the second matrix of neuron weights may be performed by using an array of multiplying logic gates of the electronic system. Such array of multiplying logic gates may comprise one or more XNOR logic gates and/or AND logic gates of the electronic system.
Finally, in step f, an output data vector for the neural network is generated by performing a sum over the rows (previously denoted in generic equation (6) by operator Sr) of the matrix of multiplying components.
Therefore, each component of the same row within the matrix of multiplying components is inputted in an Accumulated Parallel Counter (APC), to obtain, for each row, the sum of all the corresponding components of said row. That is, a sum over rows is performed over the matrix of multiplying components. The result of such a sum over rows is a one-column vector with a number of components equal to the number of rows of the matrix of multiplying components. Such resulting vector is defined as the output data vector.
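A behavioural sketch of one APC, summing the temporally encoded components of a single row; the bit-streams are illustrative:

```python
def apc(bitstreams):
    # Each clock cycle, count how many of the parallel input bits are high,
    # and accumulate the count over the whole stream length.
    total = 0
    for cycle in zip(*bitstreams):
        total += sum(cycle)
    return total

row = [[1, 1, 0, 0],      # component encoding the value 2
       [1, 0, 0, 0],      # component encoding the value 1
       [1, 1, 1, 0]]      # component encoding the value 3
row_sum = apc(row)        # 2 + 1 + 3 = 6
```

One such counter per row yields the components of the output data vector.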
The output data vector may be considered the output of the Neural Network, which may be interpreted as a set of categories into which the input is classified. The number of possible outputs of the neural network corresponds to the number of rows of the output data vector.
Consequently, the hardware implementation of an MNN network according to the present disclosure allows the use of digital binary adders along with maxima and minima operating modules (using OR and AND logic gate arrays) instead of a micro-controller (wherein the network would be programmed). Furthermore, the use of the temporal codification allows merging both layers (morphological and classical linear layers) in a highly compact way (i.e., there is no encoding or decoding of the data between layers of the MNN network), thus reducing both the energy consumed by the inferring of the network and the complexity of the hardware used to implement the network.
Furthermore, by choosing a hardware implementation of an MNN network according to the present disclosure, instead of a hardware implementation of other ANN networks, the hardware resources used may be considerably reduced since in the network according to the present disclosure there is no need to implement any non-linear activation function for each neuron, with the associated saving in terms of hardware and energy resources.
Also, when an edge computing device performs the AI processing (i.e., infers a neural network according to the present disclosure) within itself, and sends the output data to the cloud, only a small quantity of information needs to be sent from node to server, thus reducing the energy consumption considerably. This matters because data transmission is highly costly in terms of the energy consumption of an edge device. In this sense, the low-power AI hardware implementation according to the present disclosure minimizes the processing steps in the AI modules, thus increasing the battery lifetime of battery-powered devices.
More precisely, the use of the method of the present disclosure may minimize the cost of inferring data in an MNN neural network by means of an electronic system (such as, for example, a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC)), thus avoiding high-cost digital blocks. Such high-cost digital blocks are usually implemented with large numbers of logic gates, as is the case, for example, for digital multipliers. Furthermore, the implementation of costly non-linear functions, such as the hyperbolic tangent or the sigmoidal functions, is also avoided. As a result, it is possible to avoid the use of hardware-consuming digital blocks.
According to another example of the present disclosure, step e of the method of inferring of a Morphological Neural Network may further comprise temporally encoding the components of the second matrix of neuron weights, the encoding being performed by comparing each component of the second matrix of neuron weights with a second counter signal, using a second digital comparison module of the electronic system.
Therefore, the operations between the outputs of one layer and the inputs of the subsequent layer do not require coding and decoding in between them, resulting in a more compact (i.e., less hardware-consuming) implementation. More specifically, the operations of the selection of minimum or maximum and the generation of the matrix of multiplying components are linked with no codification or decodification between them, thus compacting the hardware used to implement them and decreasing the computation time.
According to a specific example, the first counter signal and the second counter signal are statistically decorrelated.
This is because both signals (the outputs of the hidden morphological layers and the multiplying weights from the second matrix of neuron weights), which are codified in the temporal domain, are combined with multiplying gates to generate the matrix of multiplying components. In this case, the time-domain signals have to be temporally uncorrelated (and therefore have to be generated by different counters) in order to provide the product at the output of the array of multiplying logic gates.
Furthermore, the second counter signal may be a continuous counter signal or a random signal if a stochastic codification is used. Regarding the stochastic option, if a bipolar codification is selected, the multiplying logic gates used to implement the product may be XNOR gates. Otherwise, if a unipolar codification is used, the multiplying logic gates to be used may be AND gates.
According to another example of the present disclosure, the method of inferring of a Morphological Neural Network may further comprise generating two operating sub-matrices from the first matrix of neuron weights, each sub-matrix comprising a unique set of weights from the first matrix of neuron weights, and wherein:
This way, by dividing the first matrix of neuron weights into two sub-sets of weights, each subset may be used to subsequently select a maximum and a minimum among its rows. By doing so, the non-linearity of the operations is enhanced, and the precision of the overall network increases significantly.
Another advantage of the use of morphological layers comprising both maximum and minimum operations may be that, when applying a classical back-propagation algorithm to the full network, it would naturally discard several morphological weights. That is, during the network's training, when lowering (or increasing) the sub-set of weights that are used to estimate the maximum (or the minimum) of each row of the matrix of addition components, some weights (and therefore some connections) could be discarded from the final hardware implementation since they may have no influence on the final value of said operations. This natural "pruning" of connections may be implemented by simply applying the gradient descent backpropagation algorithm.
According to another example, the network may comprise a plurality of hidden layers and a plurality of output linear layers, each hidden layer comprising a matrix of neuron weights associated with the hidden morphological layer, and each output linear layer comprising a matrix of neuron weights associated with the output linear layer, wherein each hidden layer is connected with a following output linear layer in an alternate way, the last output linear layer being the output linear layer of the network.
This way, an MNN network may be implemented wherein a plurality of morphological layers and linear layers alternate, resulting in an overall output of the neural network capable of dealing with complex problems, such as pattern recognition or regression problems. The overall accuracy provided by the present method is similar to, or even better than, that obtained with classical ANNs, while using a compact low-power digital system and also providing a natural form of connectivity pruning when backpropagation is applied to the full network, thus further reducing the overall area/power dissipation.
According to another aspect of the present disclosure, an electronic system for inferring of a Morphological Neural Network (MNN) is presented, the system using inference data including an input data vector. More precisely, the MNN comprises at least one morphological hidden layer and an output linear layer, a first matrix of neuron weights associated with the morphological hidden layer, and a second matrix of neuron weights associated with the output linear layer, and the system comprises:
The use of such an electronic system, implemented in, for example, a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), may minimize the cost of inferring data in an MNN neural network, thus avoiding high-cost digital blocks. Such high-cost digital blocks are usually implemented with large numbers of logic gates, as is the case, for example, for digital multipliers. Furthermore, as previously noted, the implementation of costly non-linear functions, such as the hyperbolic tangent or the sigmoidal functions, is also avoided. As a result, it is possible to avoid the use of hardware-consuming digital blocks.
In general, ANNs are highly dependent on the use of Multiply-and-Accumulate (MAC) blocks, normally implemented using classical digital codifications such as two's complement. However, by using the system according to the present disclosure, many MAC blocks may be substituted by add-and-maximize or add-and-minimize blocks, which use a lower number of hardware resources. Also, the temporal codification of the signals used therein allows the morphological and linear layers to be merged in a highly compact way.
According to another example, the temporal encoder may comprise an array of digital comparators configured to temporally encode the components of the matrix of addition components, and wherein the encoding is performed by comparing each component of the matrix of addition components with a counter signal.
This enables implementing subsequent maximum and minimum functions in a compact way, since those functions may be implemented with only a single logic gate in the temporal domain.
Furthermore, the electronic system may also comprise a linear-feedback shift register (LFSR) for generating a random counter signal. Using such a register in place of other random-generating modules (such as software-implemented modules) leads to a significantly lower hardware cost and also provides a faster method of generating a non-repeating sequence.
Also, in another example, the selection module may comprise a plurality of OR logic gates to select a maximum value. Furthermore, according to another example, the selection module may comprise a plurality of AND logic gates to select a minimum value.
The maximum or the minimum between two different temporally-coded signals can be obtained by using OR or AND gates respectively. This is the case when all the numbers involved in the operation are correlated. This correlation is obtained when the same counter is used while converting each number from the classical binary notation (unsigned or signed) to the temporal one. The coherence between the switching of all the signals involved implies that an OR gate selects the signal that is switching the most (the maximum) and that an AND gate selects the signal that is switching the least.
Furthermore, the array of multiplying logic gates may also comprise a plurality of XNOR logic gates, and/or a plurality of AND logic gates.
When multiplying two quantities using a temporal codification such as, for example, the stochastic bipolar or unipolar codification, the gates needed to perform such an operation are the XNOR gate or the AND gate, respectively.
Non-limiting examples of the present disclosure will be described in the following, with reference to the appended drawings, in which:
According to a first example of the present disclosure, a method, and an electronic system for inferring a morphological neural network are presented. More precisely,
Furthermore,
As seen in
Furthermore, as seen in
In the present example, as in the following examples, vectors are considered to be column vectors, although a change to row vectors could be performed, transposing the matrices involved in the operations accordingly.
Furthermore, in the present example, parameter M may be the number of morphological neurons of the morphological hidden layer 105, and it may usually be selected during the training stage of the network, in order to achieve the best fitting for the purpose of the neural network or for the type of input data used.
The MNN network of
Thus, in step 115, the following equation is performed:
Matrix H is an intermediate matrix wherein the input of the network, as defined by the input data vector {right arrow over (x)} 101, is added to the first matrix of neuron weights VM×Q 104, which is the matrix of connectivity between the input layer (i.e., the input data vector {right arrow over (x)} 101) and the hidden layer 105. Said addition is a vector-matrix addition (a previously defined operation), resulting in a matrix H of M by Q dimensions.
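The vector-matrix addition above can be sketched as a software model (an illustrative sketch, not the hardware array of adders; the dimensions M, Q and the values are arbitrary):

```python
import numpy as np

# Software model of the vector-matrix addition of step 115: each component
# x_j of the input data vector is added to column j of every row of V,
# yielding the M x Q matrix of addition components H.
M, Q = 3, 4
rng = np.random.default_rng(0)
V = rng.integers(-8, 8, size=(M, Q))   # first matrix of neuron weights
x = rng.integers(-8, 8, size=Q)        # input data vector
H = V + x                              # numpy broadcasting performs the M*Q sums
```

In hardware, each of these M × Q sums is computed by one binary digital adder of the array.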
Said vector-matrix addition of step 115 is performed using the array of binary digital adders 601A of the electronic system. More precisely, the components (x1 to xQ) of the input data vector {right arrow over (x)} 101 and the components of the first matrix of neuron weights VM×Q 104 are inputted in the inputs of the array 601A, in order to perform each sum as defined in equation (A1). Therefore, the outputs 601B of the array 601A comprise each sum, i.e., each component of the resulting matrix of addition components H. In order to show where each component of matrix H is obtained within the electronic system, in
Afterwards, the following step is performed:
Thus, in step 116, each output of the array of binary digital adders 601A is encoded by inputting it to an array of digital comparators 602A and comparing each output with a counter signal R1 (as seen in
Afterwards, the following steps are performed:
Thus, in step 117, the following equation is performed:
By performing steps 117 and 118, a vector of outputs {right arrow over (h)} 106 is obtained, each component of the vector 106 being the output of the corresponding neuron (i.e., for each neuron i, a corresponding hi is obtained, for all M neurons). As seen in the previous equation (A2), and following the previously defined operation of selecting the maximum component of a row of a matrix (Mr), each component of vector {right arrow over (h)} 106 is obtained by selecting, in this example, a maximum among the components of each row of the matrix of addition components H. Therefore, for row 1 of matrix H, the maximum value among all the elements of the row is selected, and placed in row 1 of the vector of outputs {right arrow over (h)} 106. This is performed for each row of H, placing the maximum value of each row of the matrix H in the corresponding same row of the vector of outputs {right arrow over (h)} 106, for all M components of the vector of outputs {right arrow over (h)} 106 (and thus, generating an output for all M neurons of the hidden layer 105).
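The row-wise maximum described above can be sketched in software (example values; the hardware uses OR gates over the temporally-encoded components instead):

```python
import numpy as np

# Software model of step 117: each neuron output h_i is the maximum of
# row i of the matrix of addition components H.
H = np.array([[1, 5, 2],
              [7, 0, 3]])   # example H with M = 2 rows and Q = 3 columns
h = H.max(axis=1)           # vector of outputs, one component per neuron
```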
In the present example, in the electronic system (as seen in
Therefore, the vector of outputs {right arrow over (h)} 106 is generated by gathering lines h1, h2, etc . . . , which are the components that form the vector of outputs {right arrow over (h)} 106.
Once the outputs (h1, h2, etc . . . ) of the hidden layer 105 of the network are generated, the following steps are performed:
By performing steps 119 and 120, the vector of outputs {right arrow over (h)} 106 is operated with a second matrix of neuron weights UK×M 108, by performing a vector-matrix product between them, which using the previously introduced mathematical notation can be represented by the following expressions:
Which develops the following way:
Therefore, an intermediate matrix of multiplying components Y is generated by performing a vector-matrix product between the second matrix of neuron weights UK×M 108 (which is the connectivity matrix between hidden layer 105 and the output linear layer 109) and the vector of outputs {right arrow over (h)} 106. This means that each component of a row of Y is the component of UK×M at the same row multiplied by the component of the corresponding row of the vector of outputs {right arrow over (h)} 106, and all of this for each row of UK×M 108.
In the present example, in the electronic system (as seen in
Furthermore, once the matrix of multiplying components Y is generated, a sum over rows is performed over matrix Y. Such sum over rows (as previously defined in its generic form in equation (6)) can be represented by the following equation:
Therefore, each component of a row of matrix Y is summed over the same row, thus obtaining a K-dimensional vector at the output.
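The two steps of the output linear layer, the component-wise product and the sum over rows, can be sketched together in software (example dimensions and values):

```python
import numpy as np

# Software model of steps 119-120 and the sum over rows: Y[k, m] = U[k, m] * h[m],
# then each row of Y is summed, which together amount to the product y = U h.
U = np.array([[1, -2],
              [3,  0]])     # example second matrix of neuron weights (K=2, M=2)
h = np.array([5, 7])        # example vector of outputs of the hidden layer
Y = U * h                   # intermediate matrix of multiplying components
y = Y.sum(axis=1)           # sum over rows -> K-dimensional output vector
```

Note that the two operations combined are equivalent to the ordinary matrix-vector product `U @ h`.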
The implementation of these average values over the outputs of the hidden layer provides an additional degree of freedom to the machine-learning process. Furthermore, a training process of the network would select the best averaging for each specific output related to the specific machine-learning task to implement.
According to a second example of the present disclosure, a method, and an electronic system for inferring a morphological neural network are presented. More precisely,
Furthermore,
As seen in
In the present example, parameter M may be the number of morphological neurons of the morphological hidden layer 205, and it may usually be selected during the training stage of the network in order to ensure the best fitting for the purpose of the neural network, or for the type of input data used. Parameter M′ may be set as half of M for simplicity of the present example (so that M=2×M′). This parameter can be selected during the training of the network, for example by applying a backpropagation algorithm.
The MNN network of
In this example, two operating sub-matrices VM′×Q 204a and V(M−M′)×Q 204b are generated from the first matrix of neuron weights VM×Q. Each sub-matrix VM′×Q 204a and V(M−M′)×Q 204b comprises a unique set of weights from the first matrix of neuron weights VM×Q; and as seen in the graphical representation of
Furthermore, the following step is performed:
H1 and H2 are intermediate matrices wherein the input of the network, as defined by the input data vector {right arrow over (x)} 201, is added to both VM′×Q 204a and V(M−M′)×Q 204b, which are two portions of the matrix of connectivity between the input layer (i.e., the input data vector {right arrow over (x)} 201) and the hidden layer 205. Said addition is a vector-matrix addition (an operation as previously defined), resulting in a matrix H1 of M′ rows and Q columns, while matrix H2 is composed of (M−M′) rows and Q columns.
Said vector-matrix additions of step 215 are performed using the array of binary digital adders 701A of the electronic system. More precisely, the components (x1 to xQ) of the input data vector {right arrow over (x)} 201 and the components of the first operating sub-matrix VM′×Q 204a are inputted in the inputs of the array 701A (in the portion covered by the group of signals 706A to 706B), in order to perform each sum of the matrix as defined in equation (B11). Furthermore, the components of the input data vector {right arrow over (x)} 201 and the components of the second operating sub-matrix V(M−M′)×Q 204b are inputted in the inputs of the array 701A (in the portion covered by the group of signals 706C to 706D), in order to perform each sum as defined in equation (B12).
Therefore, the outputs 701B of the array 701A comprise each sum, i.e., each component of the resulting matrices of addition components H1 (to generate outputs h1 to hM′) and H2 (to generate outputs hM′+1 to hM). In order to show where each component of matrices H1 and H2 is obtained within the electronic system, in
Afterwards, the following step is performed:
Thus, in step 216, each output of the array of binary digital adders 701A is encoded by inputting it to an array of digital comparators 702A, and comparing each output with a counter signal R1 (as seen in
After encoding all the components of both matrices H1 and H2, the following steps are performed:
Thus, in step 217, the following equation is performed:
By performing steps 217 and 218, a vector of outputs {right arrow over (h)} 206 is obtained, each component of the vector 206 being the output of the corresponding neuron of the morphological hidden layer 205 (i.e., for each neuron i, a corresponding hi is obtained, for all M morphological neurons). As seen in the previous equations (B2 and B3), and following the previously defined operation of selecting the minimum and maximum component of each row of a specific matrix (mr and Mr), each component of vector {right arrow over (h)} 206 is obtained by selecting, in this example, a maximum among the components of each row of the first matrix of addition components H1, and a minimum among the components of each row of the second matrix of addition components H2. Therefore, for row 1 of matrix H1, the maximum value among all the elements of the row is selected, and placed in row 1 of the first vector of outputs {right arrow over (h)}M′×1. This is performed for each row of H1, placing the maximum value of each row of the matrix H1 in the corresponding same row of the first vector of outputs {right arrow over (h)}M′×1, for the first M′ components of the vector of outputs {right arrow over (h)} 206. Analogously, the same is performed for each row of the second matrix of addition components H2, thus forming the second vector of outputs {right arrow over (h)}(M−M′)×1. Thus, combining them in order in correspondence with the neurons of the morphological hidden layer used to generate them, a single vector of outputs {right arrow over (h)} 206 of the hidden layer 205 is generated. That is, the elements of the first vector of outputs {right arrow over (h)}M′×1 are the first M′ elements of the single vector of outputs {right arrow over (h)} 206, and the following (M−M′) elements of the single vector of outputs {right arrow over (h)} 206 are the elements of the second vector of outputs {right arrow over (h)}(M−M′)×1.
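The combination of the row-wise maximum over H1 and the row-wise minimum over H2 can be sketched in software (example dimensions and values; in hardware the maxima are obtained with OR gates and the minima with AND gates):

```python
import numpy as np

# Software model of steps 217-218: the first M' neurons take the row-wise
# maximum of H1, the remaining (M - M') neurons take the row-wise minimum
# of H2, and both partial vectors are concatenated in order into h.
H1 = np.array([[1, 4],
               [3, 2]])     # first matrix of addition components (M' = 2 rows)
H2 = np.array([[5, 0],
               [6, 7]])     # second matrix of addition components (M - M' = 2 rows)
h = np.concatenate([H1.max(axis=1), H2.min(axis=1)])
```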
In the present example, in the electronic system (as seen in
Therefore, the vector of outputs {right arrow over (h)} 206 is generated by gathering lines h1, h2 . . . to hM, which are the components that form the vector of outputs {right arrow over (h)} 206.
Once the outputs (h1, h2, etc . . . ) of the hidden layer 205 of the network are generated, the following steps are performed:
By performing steps 219 and 220, the vector of outputs {right arrow over (h)} 206 is operated with a second matrix of neuron weights UK×M 208, by performing a vector-matrix product between them, which can be represented by the following equation:
Which develops in the following way:
Therefore, an intermediate matrix of multiplying components Y is generated by performing a vector-matrix product between the second matrix of neuron weights UK×M 208 (which is the connectivity matrix between hidden layer 205 and the output linear layer 209) and the vector of outputs {right arrow over (h)} 206. This means that each component of a row of Y is the component of UK×M 208 of the same row multiplied by the component of the corresponding row of the vector of outputs {right arrow over (h)} 206, and all of this for each row of UK×M 208.
Furthermore, once the matrix of multiplying components Y is generated, a sum over rows is performed over matrix Y. Such sum over rows (as previously defined in its generic form in equation (6)) can be represented by the following equation:
Therefore, each component of a row of matrix Y is added over the same row, thus obtaining the same row component of vector {right arrow over (y)}.
The implementation of these average values over the outputs of the hidden layer provides an additional degree of freedom to the machine-learning process. Furthermore, a training process of the network would select the best averaging for each specific output related to the specific machine-learning task to implement.
In this example, the APC output is bounded to a specific precision. If the output is codified in two-complement, then the possible output values (codified with N bits) are bounded between −2^(N−1) and 2^(N−1)−1. In case the incoming bits to be added by the APCs present a high (or low) activity (that is, there is a large fraction of logic high (or low) values in the input M-bit signal), the output of the APC saturates to its upper (or lower) allowed bound. This natural saturation of the APC to its upper or lower bounds may be included in the software model used to train a full ANN network simply as hard-tanh functions after each neuron of the linear layer. However, the presence of these APC upper and lower bounds implies a restriction on the range of possible values that are inputted to each morphological layer of the present MNN network. This restriction helps to further improve the natural pruning of connections in morphological layers during the application of a back-propagation training algorithm to the MNN network.
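In a software training model, this saturation can be sketched as a hard clipping to the two-complement bounds (an illustrative assumption for the model, not the APC circuit itself):

```python
# Sketch: the APC's natural saturation behaves like a hard clipping
# (hard-tanh-like) to the N-bit two-complement range, applied after
# each neuron of the linear layer.
def apc_saturate(value, n_bits):
    lower, upper = -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1
    return max(lower, min(upper, value))

print(apc_saturate(200, 8), apc_saturate(-300, 8), apc_saturate(17, 8))
# 127 -128 17
```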
According to a third example of the present disclosure, a method and an electronic system for inferring a morphological neural network are presented. More precisely,
More specifically, in this example, each morphological layer 304, 311 to 318 has M(1), M(2), to M(L) morphological neurons respectively, and each output linear layer 307, 314 to 321 has K(1), K(2), to K(L) neurons respectively.
The advantages of including a plurality of mixed morphological/linear layers are like those obtained when implementing classical multi-layer ANNs such as Convolutional Neural Networks. That is, the first layers are used to detect and emphasize the low-level characteristics of the data (such as edge detection in image processing, or high-frequency detection in sound processing), while the last layers are used to further classify and discriminate the information with a high level of abstraction. For example, in the last layers of the network, the processed information may be answering high-level questions such as: is there an animal in my photograph? or has the letter 'a' been uttered?
Further examples of the present disclosure, and other clarifications, are hereby presented.
There are many ways of implementing a Morphological Neural Network (MNN) in hardware. However, among all these strategies, the proposed Stochastic Computing (SC) technique is likely the most effective one because it is particularly well suited to MNNs. As seen in the description of the present disclosure, morphological neurons use four fundamental mathematical operations: the maximum, the minimum, the product and the addition. Three of these operations, namely the maximum, the minimum and the product, are highly straightforward to implement using SC, as they can be realized through simple logic gates. Specifically, the multiplication can be implemented by a simple XNOR gate, a maximum by a simple OR gate and a minimum by a simple AND gate.
This is the main reason behind the use of the SC technique for implementing MNNs in hardware. The suitability of SC for MNNs leads to a significant shrinkage of the hardware resources and, as a consequence, a substantial reduction of the energy consumption and power dissipation.
In addition, the SC design proposed in the present disclosure to implement MNNs in hardware makes it possible to merge morphological layers and classical linear layers in a highly compact way. In some examples of the present disclosure, there may be no encoding or decoding of data between stacked layers of the MNN network, thereby reducing the hardware complexity and therefore the energy consumption.
As regards the advantages of using morphological networks instead of classical linear networks, it should be recalled that morphological neurons segment the phase space much better than classical ones, meaning that similar results in terms of accuracy can be achieved with fewer neurons. The reduction in the number of neurons in the hidden layer leads to a decrease in the hardware resources needed. On top of that, the non-linearity inherent to morphological neurons makes it possible to remove the typical activation function of the linear neurons, with the associated saving in terms of hardware and energy resources. It should also be recalled that the implementation of digital blocks for activation functions such as the hyperbolic tangent or the sigmoid is very costly in terms of hardware resources. These activation function blocks, or a digital multiplier for example (to mention another common block in neural networks), are high-cost digital blocks that require large numbers of logic gates.
To sum up, the combination of MNNs and the proposed SC design may dramatically reduce the complexity of the hardware implementation, with a shrinkage of the number of necessary neural units and trainable parameters (weights), while increasing the compactness of the design. This may be achieved thanks to the following factors exposed above: a) the use of morphological neurons instead of classical neurons leads to a better segmentation of the pattern space with fewer neurons, meaning that with fewer morphological neurons in the hidden layer, results similar in accuracy to those obtained with classical neurons can be reached; b) the use of a hybrid morphological and classical-linear network makes it possible to remove the typical activation function of neural networks, saving this sort of high-cost digital block; c) given that the proposed SC design is particularly well suited to MNNs, three out of the four main operations of an MNN (improvement of the addition is still a challenge for the Stochastic Computing community), namely the maximum, the minimum and the product, can be easily implemented when working in bitwise (bit-by-bit) mode. All these factors contribute to reducing the amount of hardware resources and the complexity of the hardware design and, as a consequence, to also reducing the energy consumption and power dissipation. Additionally, the proposed SC-based MNN may also be very competitive in terms of number of inferences per second, while hardly affecting the accuracy of the inference when compared to other SC-based neural networks such as SC-based Convolutional Neural Networks.
Many temporal coding possibilities exist. Classically, a binary number is defined by using a set of b bits, each one with a weight related to 2 raised to the relative position of the bit inside the bitstream. Two widely used binary codifications may be defined: signed binary and unsigned binary. In the signed binary codification, a b-bit number can represent a number between −2^(b−1) and 2^(b−1)−1. This codification is also known as two-complement. On the other hand, in the unsigned binary codification, a b-bit number can represent a number between 0 and 2^b−1. However, the unsigned codification does not allow defining negative numbers. Unlike the above classical coding possibilities, Stochastic Computing (SC) represents and processes the information as digitized probabilities. Stochastic Computing was proposed in the 1960s as a low-cost alternative to the Von Neumann architecture. However, because of its intrinsically random nature, a resulting lower accuracy discouraged its use in digital computing. Nevertheless, when used within the hardware implementation of the inference process of an MNN, it may be useful since, as seen above, it substantially reduces the amount of hardware needed to perform the inference. This dramatic decrease in hardware resources comes with a decrease in accuracy and computing speed, which is acceptable for most applications of such neural networks, such as pattern recognition networks, given that neural networks are systems that may not need a very high accuracy in all their intermediate processing steps to provide an accurate pattern recognition response.
To work in the SC domain, binary numbers may first be converted to stochastic pulsed signals through blocks called Binary-to-Stochastic Converters (BSC). A BSC is a simple comparator which compares the b-bit binary number X (classically codified by using two-complement, for example) with a random (or pseudorandom) reference signal R(t) which varies every clock cycle among random values bound to N=2^b levels. For instance, a 3-bit binary number at the input of the BSC will be compared every clock cycle with a value R(t) among N=2^3=8 levels generated randomly by a Random Number Generator (RNG), as shown in
However, instead of purely random sequences generated by an RNG, pseudorandom sequences are used owing to the lower complexity of pseudorandom generators such as, for example, Linear Feedback Shift Registers (LFSR) or Sobol Sequence Generators.
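The BSC comparison can be sketched as a software model (Python's pseudorandom generator stands in for the hardware RNG/LFSR; the values X = 5, b = 3 and the stream length are arbitrary choices for the example):

```python
import random

# Software model of a Binary-to-Stochastic Converter (BSC): the value X is
# compared each clock cycle with a pseudorandom reference among N = 2**b
# levels; the output bit is high when X exceeds the reference.
def bsc(x, b, n_cycles, rng):
    return [1 if x > rng.randrange(2 ** b) else 0 for _ in range(n_cycles)]

rng = random.Random(42)
bits = bsc(5, 3, 4096, rng)    # unipolar-coded bitstream for X = 5, b = 3
p = sum(bits) / len(bits)      # activation probability, ideally 5/8
```

Counting the high bits and dividing by the stream length recovers the coded probability, here approaching 5/8.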
The temporal codification as explained above is called unipolar codification. The value coded in unipolar codification can be computed by adding all the high binary values of the signal over time, and dividing the resulting quantity by the total number of high and low values of the bitstream signal, namely, the total of N=2^b bits of the sequence. Therefore, in unipolar-type stochastic encoding, the number codified is directly the probability of having a high value in the 2^b-bit bitstream signal or sequence. Coded numbers are bound between 0 and 1, and have the disadvantage of not including negative numbers.
To overcome the constraint of not including negative numbers, another codification called bipolar codification was formulated in SC. The possible codified values may now be bound between −1 and 1. Bipolar codification may be performed by adding all the high values and subtracting from this quantity the number of low values; finally, the result of this subtraction is divided by the total number of zeroes and ones, namely N=2^b. The relationship between the unipolar coding (noted as p) and the bipolar coding (noted as p*) comes from the following change of variable:
Stochastic numbers in unipolar codification p coincide with the probability of having a "one" within the N-bit sequence. On the contrary, a bipolar-coded stochastic number p* is not actually a probability (it may be negative!) but can be associated with the probability p after applying the former change of variable. An activation probability of p=0.5 in unipolar codification is equivalent to the stochastic number p*=0 in bipolar codification. A probability of p=0 in unipolar corresponds to the stochastic number p*=−1 in bipolar, a probability of p=1 in unipolar to p*=1 in bipolar, and a probability of p=0.3 in unipolar to p*=−0.4 in bipolar codification.
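These correspondences can be checked with a one-line software sketch; the formula p* = 2p − 1 is inferred here from the p to p* pairs listed above:

```python
# Sketch of the change of variable relating unipolar (p) and bipolar (p*)
# codifications, consistent with the listed correspondences:
# p=0.5 -> p*=0, p=0 -> p*=-1, p=1 -> p*=1, p=0.3 -> p*=-0.4.
def to_bipolar(p):
    return 2 * p - 1

for p in (0.5, 0.0, 1.0, 0.3):
    print(p, "->", to_bipolar(p))
```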
On the other hand, the conversion from a stochastic-coded bitstream signal to a classically coded binary number may be performed by a simple digital counter. This counter may be incremented every high pulse of the bitstream during the evaluation period which may be, as exposed above, of N=2^b clock cycles.
Actually, a unipolar N-bit stream is only able to represent the following exact values: [0/N, 1/N, 2/N, . . . , (N−1)/N, N/N]. Nevertheless, this feature can be exploited since the precision needed can be adjusted without hardware modifications, just by changing the number of clock cycles (or bits) N of the sequence. The higher N is, the greater the accuracy.
Since the formulation of the SC-based temporal codification in the sixties, several stochastic circuits have been presented for the implementation of common mathematical operations. As a first example, let us show that the multiplication of two probabilities px and py associated with the unipolar-coded stochastic signals x(t) and y(t) can be performed by a simple AND logic gate when these two input signals are totally uncorrelated. The probability of having a high state at the output pz will be
where z(t) is the stochastic output bitstream and pz its associated activation probability. As a second example, let us take a look at the XNOR gate, which in bipolar codification performs the product of two stochastic numbers (px*·py*) when the two input signals are also totally uncorrelated:
After operating algebraically on this expression, we conclude that:
Before going forward, let us make a necessary parenthesis about the impact of correlation between stochastic signals. While uncorrelated stochastic signals are generated by different RNGs, totally correlated stochastic signals are generated by the same RNG. When SC was born in the 60s, the effect of correlation between stochastic signals was regarded as a problem to be avoided and a severe restriction to SC's success. All the stochastic signals employed in an SC-based design had to be uncorrelated, namely, without statistical similarity. That is why many research efforts were focused on attaining ever more independent Random Number Generators in order to avoid correlation. Unfortunately, this approach meant employing a high amount of hardware resources in the conversion circuits (binary numbers to bitstreams and vice versa), constraining the advantages of Stochastic Computing in hybrid circuits that mixed classical binary and SC-based circuitries. Later on, it was demonstrated that correlation could be exploited to realize some operations. Even more, depending upon the correlation/decorrelation between signals, the same logic gate could behave in one way or another. For instance, while an AND logic gate performs the multiplication when its input signals are uncorrelated, the same AND logic gate will perform the minimum operation whenever the two stochastic sequences are totally correlated. Something similar occurs with the XNOR gate. As demonstrated above, an XNOR gate performs the product when its input signals are uncorrelated (statistically independent). By contrast, if they are totally correlated (that is, generated by the same RNG), the XNOR gate produces
where px* and py* are the stochastic numbers linked to the stochastic signals x(t) and y(t), both coded in bipolar codification.
To demonstrate how to compute the maximum and minimum operations in bipolar codification, a little more algebraic effort is required, and it remains out of the scope of this text. The maximum among different probabilities can be obtained by using a simple OR logic gate as long as the input signals are totally correlated. In turn, the minimum among different probabilities is achieved by using a simple AND logic gate, also with the input signals totally correlated. This is valid for both bipolar and unipolar codification, as shown in Table 1.
To recap, depending on the correlation degree and the codification used (either unipolar or bipolar), the same logic gate may perform different operations which are summarized in the following table:
The stochastic addition is still a challenge for the SC community. Different circuits have been proposed to add stochastic numbers: the OR gate, the multiplexer circuit (producing a weighted sum) and the Accumulative Parallel Counter (APC). Though the APC is by far the most complex circuit of the three in terms of hardware cost, it is also the most accurate one. Additionally, it has another advantage: its output is already coded in binary two-complement, thus not needing a Stochastic-to-Binary converter to convert its output into a classical binary signal. The APC adds up different SC bitstreams and provides a classically-coded signal at its output (such as two-complement). An APC is composed of a parallel counter and an accumulator circuit. The APC is the block finally chosen in the present designs since it is also used implicitly to convert from the SC coding domain to the classical two-complement domain.
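The counting-and-accumulating behaviour of the APC can be sketched as a software model (an illustrative sketch; the example streams are arbitrary, and any bipolar offset or scaling applied by a real design is omitted):

```python
# Software model of an Accumulative Parallel Counter (APC): each clock cycle
# a parallel counter counts the high bits across the M input bitstreams, and
# an accumulator adds that count, producing a classically coded binary total.
def apc(bitstreams):
    acc = 0
    for cycle_bits in zip(*bitstreams):   # one M-bit slice per clock cycle
        acc += sum(cycle_bits)            # parallel count, then accumulate
    return acc

streams = [[1, 0, 1, 1],
           [0, 1, 1, 0],
           [1, 1, 1, 1]]                  # three example 4-cycle bitstreams
total = apc(streams)                      # 3 + 2 + 4 = 9
```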
As stated in the description of the hybrid neural network described herein, its hidden layer is the morphological layer. This layer is formed by M neurons, of which M′ are neurons that calculate the maximum of their respective inputs and (M−M′) is the number of neurons computing the minimum. Once the M′ and M parameters are set (a first criterion to begin working with is usually to test a network with a number of total weights or trainable parameters similar to that of other neural networks to compare with), the neural network will be trained to find those particular weights (also known as trainable parameters) which minimize the loss function considered (such as the cross-entropy loss function, for instance), providing an accuracy or any other figure of merit which evaluates the performance of the network. Since the choice of M and M′ obviously affects the performance of the MNN, a sweep of different M and M′ parameters will be tested until this performance is optimized according to the chosen figures of merit. As seen in the description, two examples are taken: a) the first one for M=2M′, meaning that the two submatrices of weights have the same dimensions, that is, the number of neurons computing the maximum is the same as the number computing the minimum; b) the second one for M′=M, meaning that all the neurons of the hidden layer calculate the maximum and none of them the minimum. To summarize, the choice of how many neurons calculate the maximum and how many the minimum is a parameter to be set during the training process in order to maximize the figures of merit taken into consideration.
For reasons of completeness, various aspects of the present disclosure are set out in the following numbered clauses:
Clause 1. Method of inferring of a Morphological Neural Network (MNN), the method being performed by an electronic system, based on inference data including an input data vector, wherein the MNN comprises at least one morphological hidden layer and an output linear layer, a first matrix of neuron weights associated with the morphological hidden layer, and a second matrix of neuron weights associated with the output linear layer, and wherein the method comprises the steps of:
Clause 2. Method according to clause 1, wherein in step b, the temporal encoding is performed by comparing each component of the matrix of addition components with a counter signal, using an array of digital comparators of the electronic system;
Clause 3. Method according to clause 2, wherein the counter signal is a continuous counter signal.
Clause 4. Method according to clause 2, wherein the counter signal is a random signal.
Clause 5. Method according to any of clauses 1 to 4, wherein the array of multiplying logic gates comprises at least one XNOR logic gate.
Clause 6. Method according to any of clauses 1 to 5, wherein the array of multiplying logic gates comprises at least one AND logic gate.
Clause 7. Method according to any of clauses 1 to 6, wherein step e further comprises temporally encoding the components of the second matrix of neuron weights, the encoding being performed by comparing each component of the second matrix of neuron weights with a second counter signal, using a second digital comparison module of the electronic system.
Clause 8. Method according to clause 7, wherein the first counter signal and the second counter signal are statistically decorrelated.
Clause 9. Method according to clause 7 or 8, wherein the second counter signal is a continuous counter signal or a random signal.
Clause 10. Method according to any of clauses 1 to 9, further comprising generating two operating sub-matrices from the first matrix of neuron weights, each sub-matrix comprising a unique set of weights from the first matrix of neuron weights, and wherein:
Clause 11. Method according to any of clauses 1 to 10, wherein the network comprises a plurality of hidden layers and a plurality of output linear layers, each hidden layer comprising a matrix of neuron weights associated with the hidden morphological layer, and each output linear layer comprising a matrix of neuron weights associated with the output linear layer, wherein each hidden layer is connected with a following output linear layer in an alternate way, the last output linear layer being the output linear layer of the network.
Clause 12. Electronic system for inferring of a Morphological Neural Network (MNN), using inference data including an input data vector, wherein the MNN comprises at least one morphological hidden layer and an output linear layer, a first matrix of neuron weights associated with the morphological hidden layer, and a second matrix of neuron weights associated with the output linear layer, and wherein the system comprises:
Clause 13. Electronic system according to clause 12, wherein the temporal encoder comprises an array of digital comparators configured to temporally encode the components of the matrix of addition components, and wherein the encoding is performed by comparing each component of the matrix of addition components with a counter signal.
Clause 14. Electronic system according to clause 13, further comprising a linear-feedback shift register (LFSR) for generating a random counter signal.
Clause 15. Electronic system according to any of clauses 12 to 14, wherein the selection module comprises a plurality of OR logic gates, to select a maximum value.
Clause 16. Electronic system according to any of clauses 12 to 15, wherein the selection module comprises a plurality of AND logic gates, to select a minimum value.
Clause 17. Electronic system according to any of clauses 12 to 16, wherein the array of multiplying logic gates comprises a plurality of XNOR logic gates.
Clause 18. Electronic system according to any of clauses 12 to 17, wherein the array of multiplying logic gates comprises a plurality of AND logic gates.
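The clauses above combine several well-known stochastic-computing building blocks: values are temporally encoded by comparing them against a counter signal, an LFSR can supply a random counter, and, on streams encoded against the same counter (i.e., correlated streams), a bitwise OR selects the maximum while a bitwise AND selects the minimum. The following sketch is purely illustrative and is not the claimed hardware; all function names are chosen here for illustration, and it simply demonstrates, in software, why those gate-level operations behave as the clauses state.

```python
def lfsr16(seed=0xACE1):
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11), a common choice for
    generating a pseudo-random counter signal in hardware."""
    state = seed
    while True:
        bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def encode(value, counter_values):
    """Temporal encoding via a digital comparator: the output bit is 1
    while the counter is below the encoded value."""
    return [1 if c < value else 0 for c in counter_values]

def decode(stream):
    """The encoded value is recovered as the number of 1s in the stream."""
    return sum(stream)

# On streams encoded against the SAME counter (fully correlated streams),
# OR selects the maximum and AND selects the minimum, bit by bit.
def stream_max(s1, s2):
    return [a | b for a, b in zip(s1, s2)]

def stream_min(s1, s2):
    return [a & b for a, b in zip(s1, s2)]

N = 256
ramp = list(range(N))          # continuous (ramp) counter signal
x = encode(100, ramp)
w = encode(180, ramp)
print(decode(stream_max(x, w)))  # OR gates recover the maximum: 180
print(decode(stream_min(x, w)))  # AND gates recover the minimum: 100

# A random counter from the LFSR gives a stochastic (approximate) encoding
# of the same value; decorrelated counters would be used where clause 8
# calls for statistically decorrelated signals.
gen = lfsr16()
random_counter = [next(gen) & 0xFF for _ in range(N)]
x_rand = encode(100, random_counter)   # decode(x_rand) approximates 100
```

With a ramp counter the encoding is exact, which is why the max/min identities hold bit-for-bit; with an LFSR-driven random counter the decoded value is only statistically close to the input, the usual accuracy/hardware-cost trade-off in stochastic computing.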
Although only a number of examples have been disclosed herein, other alternatives, modifications, uses and/or equivalents thereof are possible. Furthermore, all possible combinations of the described examples are also covered. Thus, the scope of the present disclosure should not be limited by particular examples, but should be determined only by a fair reading of the claims that follow. If reference signs related to drawings are placed in parentheses in a claim, they are solely intended to increase the intelligibility of the claim and shall not be construed as limiting its scope.
Further, although the examples described with reference to the drawings comprise computing apparatus/systems and processes performed in computing apparatus/systems, the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the system into practice.
Number | Date | Country | Kind
---|---|---|---
22382198.4 | Mar 2022 | EP | regional

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2023/054880 | Feb 27, 2023 | WO |