This disclosure relates to a method to perform convolutions between arbitrary vectors using clusters of weakly coupled oscillators.
A large number of signal processing applications, ranging from surveillance cameras to automobiles to airplanes to UAVs, need to perform filtering operations on large volumes of signals that are acquired by numerous sensors in real time. For example, prior art object recognition algorithms employ a deep learning network whose fundamental computation is a convolution operation.
In the prior art, coupled oscillators have been used to compute a degree-of-match (DoM) between two vectors, as described in References [2], [7], and [8], below, which are incorporated herein by reference. A DoM is computed from the difference between the two vectors, and is based on the dynamics of spontaneous synchronization among the coupled oscillators. The concept is that if the vectors have similar values, such that the match is high and the differences are small, then the oscillators synchronize in frequency and phase more quickly.
Depending on the oscillator behavior, for example as described in References [7] and [8] for a CMOS relaxation oscillator, and coupling architecture, which may be a ring as described in Reference [2], the output DoM measure has been shown to roughly correlate with some Lp norm of the distance between the vectors.
A person skilled in the art understands the formula for an Lp norm, and knows that the formula for an L2 norm of a vector x with elements $x_i$ is
$$\|x\|_2=\sqrt{\sum_i x_i^2},$$
while the formula for an L1 norm is
$$\|x\|_1=\sum_i |x_i|.$$
It has been challenging to characterize the DoM measure with a closed-form analytic function of an Lp norm that is differentiable everywhere. Being able to perform such a characterization of DoM is critical because the prior art pattern recognition and machine learning algorithms, which may for example use convolutional nets, or a hierarchy of auto-encoders, are trained using variants of gradient descent, which may for example be delta rule and back-propagation. Delta rule and back-propagation work only for differentiable activation functions for each of the vast number of units in the network. For this reason, existing attempts at exploiting the concept of oscillator clusters to build complex visual object recognition systems have achieved only limited success, as described in References [3] and [4], below, which are incorporated herein by reference. However, methods that use oscillator clusters but which do not depend on gradient descent training have been more successful, as described in References [7] and [8].
The following references are incorporated by reference.
What is needed is an improved method to compute inner products, dot products, and convolutions using a cluster of weakly coupled oscillators. The embodiments of the present disclosure answer these and other needs.
The convolution of two vectors is essentially the dot product between the two vectors. Computing the dot product directly is computationally costly: each pair of corresponding elements must be multiplied and the products then summed. When a large number of data points is involved, this straightforward computation becomes too costly, and a faster, computationally cheaper method is needed. One such method is to compute an approximate dot product of two vectors from the Degree of Match (DoM) between the two vectors and the DoM between each vector and the zero vector. A cluster of weakly coupled oscillators is a computationally inexpensive way to compute the DoM between two vectors. A first cluster is used for the DoM between the first vector and the second vector, a second cluster is used for the DoM between the first vector and the zero vector, and a third cluster is used for the DoM between the second vector and the zero vector. Each DoM is used as the input to a precomputed piecewise continuous and differentiable function to produce an estimate of the squared magnitude of the difference between the corresponding pair of vectors. The approximate dot product is then a straightforward combination of the three squared-magnitude estimates, as described below.
In a first embodiment disclosed herein, a method to perform convolutions between arbitrary vectors comprises estimating a first degree of match for a difference between a first vector having a plurality of first elements and a second vector having a plurality of second elements using a first cluster of weakly coupled oscillators, estimating a second degree of match for the first vector using a second cluster of weakly coupled oscillators, estimating a third degree of match for the second vector using a third cluster of weakly coupled oscillators, deriving a first squared L2 norm (the square of the magnitude of the difference between the first vector and the second vector) from the first degree of match, deriving a second squared L2 norm from the second degree of match, deriving a third squared L2 norm from the third degree of match, adding the second squared L2 norm and the third squared L2 norm, and subtracting the first squared L2 norm to form a sum, and dividing the sum by two. The end result is an estimate of the convolution (dot product) between the first vector and the second vector.
In another embodiment disclosed herein, a method to perform convolutions between arbitrary vectors $\vec{X}$ and $\vec{T}$ comprises calculating the formula
$$\vec{X}\cdot\vec{T}=\tfrac{1}{2}\left\{\|\vec{X}'-\vec{0}\|^2+\|\vec{T}'-\vec{0}\|^2-\|\vec{X}'-\vec{T}'\|^2-2\alpha_X\beta_T\|\vec{X}\|_1-2\alpha_T\beta_X\|\vec{T}\|_1-2\beta_X\beta_T\right\}$$
wherein $\|\vec{X}'-\vec{T}'\|^2$ is derived from a first degree of match for a difference between the vector $\vec{X}'$ and the vector $\vec{T}'$ using a first cluster of weakly coupled oscillators, wherein $\|\vec{X}'-\vec{0}\|^2$ is derived from a second degree of match for a difference between the vector $\vec{X}'$ and a zero vector using a second cluster of weakly coupled oscillators, wherein $\|\vec{T}'-\vec{0}\|^2$ is derived from a third degree of match for a difference between the vector $\vec{T}'$ and a zero vector using a third cluster of weakly coupled oscillators, wherein each element of the vector $\vec{X}$ and the vector $\vec{T}$ is linearly scaled to range between −1 and +1 to form a scaled vector $\vec{X}'$ and to form a scaled vector $\vec{T}'$ ($\vec{X}'=\alpha_X\vec{X}+\beta_X$ and $\vec{T}'=\alpha_T\vec{T}+\beta_T$), wherein $\|\vec{X}\|_1$ is the L1 norm of vector $\vec{X}$, and wherein $\|\vec{T}\|_1$ is the L1 norm of vector $\vec{T}$.
In yet another embodiment disclosed herein, a device to perform convolutions between arbitrary vectors $\vec{X}$ and $\vec{T}$ comprises a processor for calculating the formula
$$\vec{X}\cdot\vec{T}=\tfrac{1}{2}\left\{\|\vec{X}'-\vec{0}\|^2+\|\vec{T}'-\vec{0}\|^2-\|\vec{X}'-\vec{T}'\|^2-2\alpha_X\beta_T\|\vec{X}\|_1-2\alpha_T\beta_X\|\vec{T}\|_1-2\beta_X\beta_T\right\},$$
a first cluster of weakly coupled oscillators for determining a first degree of match for a difference between the vector $\vec{X}'$ and the vector $\vec{T}'$ to derive $\|\vec{X}'-\vec{T}'\|^2$, a second cluster of weakly coupled oscillators for determining a second degree of match for a difference between the vector $\vec{X}'$ and a zero vector to derive $\|\vec{X}'-\vec{0}\|^2$, a third cluster of weakly coupled oscillators for determining a third degree of match for a difference between the vector $\vec{T}'$ and a zero vector to derive $\|\vec{T}'-\vec{0}\|^2$, wherein each element of the vector $\vec{X}$ and the vector $\vec{T}$ is linearly scaled to range between −1 and +1 to form a scaled vector $\vec{X}'$ and to form a scaled vector $\vec{T}'$, wherein $\|\vec{X}\|_1$ is the L1 norm of vector $\vec{X}$, and wherein $\|\vec{T}\|_1$ is the L1 norm of vector $\vec{T}$.
These and other features and advantages will become further apparent from the detailed description and accompanying figures that follow. In the figures and description, numerals indicate the various features, like numerals referring to like features throughout both the drawings and the description.
In the following description, numerous specific details are set forth to clearly describe various specific embodiments disclosed herein. One skilled in the art, however, will understand that the presently claimed invention may be practiced without all of the specific details discussed below. In other instances, well known features have not been described so as not to obscure the invention.
The present disclosure describes an analog method to compute inner products, and thereby convolutions, using a cluster of weakly coupled oscillators. The oscillators may be nanoscale oscillator devices, such as resonant body oscillators (RBOs) and spin torque oscillators (STOs). The method of the present disclosure would require $10^4$ times less power than that needed for conventional Boolean arithmetic-based convolution. Also, the processing speed of the method of the present disclosure would be $10^3$ times faster than computing convolutions using conventional Boolean arithmetic. Therefore, a large improvement with respect to size, weight, area, and power (SWAP) is possible.
The present disclosure is a method to approximate the computation of a dot product, for which closed-form optimal weight update equations exist for training deep learning networks. For instance, in convolution nets, as described in References [5] and [6], which are incorporated herein by reference, the activity of each unit in the feature matching layers is governed by a sigmoidal signal function that operates on the dot product between its fan-in weight template vector and the inputs in its immediate receptive field. The present disclosure relies on approximating an L2 norm as a function of a DoM with piecewise linear functions, where the number of segments in the piecewise linear function is a variable that improves performance monotonically, as further described below.
Given two high-dimensional vectors $\vec{X}$ and $\vec{T}$ with arbitrary ranges of values, the method of the present disclosure can provide a fast computation of the inner product of the two vectors, $\vec{X}\cdot\vec{T}$, based on the following Equation (1),
$$\|\vec{X}-\vec{T}\|^2=\|\vec{X}\|^2+\|\vec{T}\|^2-2(\vec{X}\cdot\vec{T}) \qquad (1)$$
which can be rearranged as Equation (2),
$$\vec{X}\cdot\vec{T}=\tfrac{1}{2}\left\{\|\vec{X}\|^2+\|\vec{T}\|^2-\|\vec{X}-\vec{T}\|^2\right\} \qquad (2)$$
which is equivalent to Equation (3),
$$\vec{X}\cdot\vec{T}=\tfrac{1}{2}\left\{\|\vec{X}-\vec{0}\|^2+\|\vec{T}-\vec{0}\|^2-\|\vec{X}-\vec{T}\|^2\right\} \qquad (3).$$
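The identity in Equations (1) to (3) can be checked numerically. The following is a minimal sketch in which exact squared L2 norms stand in for the oscillator-derived estimates; the function names and the sample vectors are illustrative assumptions and do not come from the disclosure.

```python
def squared_l2(u, v):
    """Exact squared L2 norm of the difference u - v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dot_from_norms(x, t):
    """Dot product from three squared L2 norms, following Equation (3)."""
    zero = [0.0] * len(x)
    return 0.5 * (squared_l2(x, zero) + squared_l2(t, zero) - squared_l2(x, t))


# Illustrative vectors with elements already in the range -1 to +1.
x = [0.5, -1.0, 0.25, 1.0]
t = [-0.5, 0.75, 1.0, 0.0]

direct = sum(a * b for a, b in zip(x, t))  # conventional multiply-and-sum
print(direct, dot_from_norms(x, t))  # -0.75 -0.75
```

In an oscillator-based implementation, each call to `squared_l2` would presumably be replaced by an estimate obtained from a DoM measurement, so the result would be approximate rather than exact.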
The method of the present disclosure extracts estimates for the required squared L2 norms, namely the squared L2 norms $\|\vec{X}-\vec{0}\|^2$, $\|\vec{T}-\vec{0}\|^2$, and $\|\vec{X}-\vec{T}\|^2$, from the DoM outputs of the oscillator clusters for the corresponding three pairs of vectors, namely, $(\vec{X},\vec{0})$, $(\vec{T},\vec{0})$, and $(\vec{X},\vec{T})$, respectively. As a person skilled in the art would understand, a given value of an L2 norm, for example $\|\vec{X}-\vec{T}\|^2$, may be the result of many different pairs of $\vec{X}$ and $\vec{T}$, an ambiguity that is exacerbated for high-dimensional vectors.
The method of the present disclosure is applicable to any oscillator cluster technology that computes DoM between two vectors using the physics of spontaneous synchronization.
The method, as shown in
In the offline procedure, a cluster of weakly coupled oscillators, such as the cluster of weakly coupled oscillators 20, shown in
In the method, in order to characterize the DoM outputs 22 for different squared L2 norms, it is assumed, without loss of generality as further described below, that each element in each vector $\vec{X}$ and $\vec{T}$ ranges between −1 and +1. This ensures that each squared L2 norm lies between 0 and 4N, where N is the dimensionality of the vectors, because each element-wise difference has a magnitude of at most 2. For the purpose of characterizing the DoM outputs 22 for the squared L2 norms $\|\vec{X}-\vec{0}\|^2$, $\|\vec{T}-\vec{0}\|^2$, and $\|\vec{X}-\vec{T}\|^2$, this range of −1 to +1 for each vector $\vec{X}$ and $\vec{T}$ preferably is sampled uniformly across the range.
Once the DoM outputs 22 across the range of −1 and +1 for each vector $\vec{X}$ and $\vec{T}$, a graph 24, as shown in
By performing the above steps in an offline procedure, the graph 24 with piecewise linear segments 25 may be used to look up, or immediately estimate, a squared L2 norm for a DoM generated by the cluster of weakly coupled oscillators.
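The lookup step can be illustrated with a short sketch. It assumes a hypothetical DoM response, `simulated_dom`, that falls monotonically as the squared L2 norm grows; a real response would come from measurements on the oscillator cluster. The breakpoint placement, the dimensionality, and all function names are likewise assumptions made for illustration only.

```python
import math
from bisect import bisect_right

N = 16  # illustrative vector dimensionality


def simulated_dom(sq_norm, n=N):
    """Hypothetical stand-in for a measured DoM (not the disclosed circuit)."""
    return math.exp(-sq_norm / (2.0 * n))


def build_lookup(num_segments, n=N):
    """Offline step: tabulate (DoM, squared norm) breakpoints over 0..4N."""
    sq_knots = [4.0 * n * i / num_segments for i in range(num_segments + 1)]
    pairs = sorted((simulated_dom(s, n), s) for s in sq_knots)
    return [d for d, _ in pairs], [s for _, s in pairs]


def estimate_sq_norm(dom, lookup):
    """Online step: piecewise linear interpolation from DoM to squared norm."""
    dom_knots, sq_knots = lookup
    if dom <= dom_knots[0]:
        return sq_knots[0]
    if dom >= dom_knots[-1]:
        return sq_knots[-1]
    i = bisect_right(dom_knots, dom) - 1
    d0, d1 = dom_knots[i], dom_knots[i + 1]
    s0, s1 = sq_knots[i], sq_knots[i + 1]
    return s0 + (s1 - s0) * (dom - d0) / (d1 - d0)


def max_error(num_segments, samples=1001):
    """Worst-case estimation error over a dense sweep of true squared norms."""
    lookup = build_lookup(num_segments)
    worst = 0.0
    for k in range(samples):
        true_sq = 4.0 * N * k / (samples - 1)
        est = estimate_sq_norm(simulated_dom(true_sq), lookup)
        worst = max(worst, abs(est - true_sq))
    return worst


for segments in (2, 4, 8):
    print(segments, max_error(segments))
```

For this assumed response, the worst-case error shrinks as the number of segments doubles, which is in line with the statement that the number of segments is a variable that improves performance.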
If vectors $\vec{X}$ and $\vec{T}$ have arbitrarily valued elements, rather than elements ranging from −1 to +1, the vectors may be linearly scaled and shifted to the range of −1 to +1. This can be trivially achieved based on the minimum and maximum values across the elements for each vector. The linear transformation functions are as follows:
$$\vec{X}'=\alpha_X\vec{X}+\beta_X \qquad (4)$$
and
$$\vec{T}'=\alpha_T\vec{T}+\beta_T \qquad (5).$$
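As an illustration of Equations (4) and (5), the following sketch chooses the scale and shift from the minimum and maximum element of a vector so that the result spans −1 to +1. The specific formulas for α and β, the handling of a constant vector, and the function name are assumptions of this sketch; the disclosure states only that the scaling is based on the minimum and maximum values.

```python
def affine_to_unit_range(v):
    """Return (alpha, beta) such that alpha * e + beta lies in [-1, +1]
    for every element e of v (assumed min/max-based choice)."""
    lo, hi = min(v), max(v)
    if hi == lo:
        # Assumed convention for a constant vector: map every element to 0.
        return 1.0, -lo
    alpha = 2.0 / (hi - lo)
    beta = -(hi + lo) / (hi - lo)
    return alpha, beta


values = [2.0, 4.0, 10.0]  # illustrative data
alpha, beta = affine_to_unit_range(values)
scaled = [alpha * e + beta for e in values]
print(alpha, beta, scaled)  # 0.25 -1.5 [-1.0, -0.5, 1.0]
```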
Combining Equations (3)-(5), the following Equation (6) can be derived:
$$\vec{X}\cdot\vec{T}=\tfrac{1}{2}\left\{\|\vec{X}'-\vec{0}\|^2+\|\vec{T}'-\vec{0}\|^2-\|\vec{X}'-\vec{T}'\|^2-2\alpha_X\beta_T\|\vec{X}\|_1-2\alpha_T\beta_X\|\vec{T}\|_1-2\beta_X\beta_T\right\} \qquad (6).$$
Equation (6), above, shows that the dot product 26 can be estimated using the concept of coupled oscillators wherein the first three terms of Equation (6) are three squared L2 norms. These squared L2 norms are derived by using the cluster of weakly coupled oscillators 20 to compute a DoM 22 for each of the first three terms in Equation (6), as shown in
Equation (6) also requires computing the L1 norms of the two vectors (i.e., $\|\vec{X}\|_1$ and $\|\vec{T}\|_1$), which are relatively less expensive computationally compared to multiplication, because the L1 norm of a vector, as discussed above, is merely the sum of the absolute values of the elements in the vector.
The computations of Equation (6) may be performed by any processor, computer, or microprocessor having storage and computing elements whether digital or analog.
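The overall flow (scale, obtain three squared norms, combine, and undo the scaling) can be sketched end to end as follows. Exact squared norms stand in for the oscillator-derived estimates, and the `sq_norm` parameter marks where such estimates would be substituted. The correction terms in this sketch use signed element sums, the dimension N, and a final division by the product of the two scale factors, which is what expanding the scaled dot product gives when β is added to every element under Equations (4) and (5); treat that as this sketch's own working under those assumptions rather than as a restatement of Equation (6). All names are hypothetical.

```python
def squared_l2(u, v):
    """Exact squared L2 norm of u - v (stand-in for a DoM-derived estimate)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def scale_params(v):
    """Assumed min/max-based scale and shift mapping v into [-1, +1]."""
    lo, hi = min(v), max(v)
    if hi == lo:
        return 1.0, -lo
    return 2.0 / (hi - lo), -(hi + lo) / (hi - lo)


def dot_via_scaled_norms(x, t, sq_norm=squared_l2):
    """Recover x . t from three squared norms of the scaled vectors."""
    n = len(x)
    ax, bx = scale_params(x)
    at, bt = scale_params(t)
    xs = [ax * e + bx for e in x]
    ts = [at * e + bt for e in t]
    zero = [0.0] * n
    # Dot product of the scaled vectors from three squared norms.
    scaled_dot = 0.5 * (sq_norm(xs, zero) + sq_norm(ts, zero) - sq_norm(xs, ts))
    # Remove the cross terms introduced by the shifts, then undo the scales.
    correction = ax * bt * sum(x) + at * bx * sum(t) + n * bx * bt
    return (scaled_dot - correction) / (ax * at)


x = [3.0, -2.0, 7.5, 0.0]   # illustrative, arbitrary-range data
t = [10.0, 4.0, -6.0, 1.0]
print(dot_via_scaled_norms(x, t), sum(a * b for a, b in zip(x, t)))
```

Passing a different `sq_norm` callable, for example one that converts a measured DoM through a piecewise linear lookup, would turn this exact computation into the approximate one.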
The generation of the DoM outputs 22 for the different squared L2 norms $\|\vec{X}-\vec{0}\|^2$, $\|\vec{T}-\vec{0}\|^2$, and $\|\vec{X}-\vec{T}\|^2$ may be implemented in two ways, serially or in parallel. In the first implementation, the same cluster of weakly coupled oscillators 20 is used to perform the characterization of the required squared L2 norms in sequence. In the second implementation, three clusters of weakly coupled oscillators are used in parallel for the characterization of the three squared L2 norms.
As shown in
The integrators 42 may be implemented with capacitors, and the hysteresis quantizers 44 and the 1-bit DACs 46 may be implemented with CMOS transistors. The output 48 of the time encoder is an asynchronous pulse-type signal that has only two possible values: high and low. This type of oscillator, with only two binary amplitude values, can be implemented efficiently in CMOS technology with low voltage swings.
The output 47 of each time encoder is an input to an averager circuit 50, which includes transconductance amplifiers 51 each connected to resistor 52. The resistor 52 may be connected to a reference voltage VREF1. The transconductance amplifiers 51 convert the voltage outputs 47 of the time encoder oscillators 32 into currents. The currents of all the transconductance amplifiers 51 may be summed together by wire merging and are connected to resistor 52 to form the oscillatory signal y 54.
Then a match circuit is used to convert the oscillatory signal y 54 into the output signal d 34, which has a higher voltage when $\vec{X}$ is close to $\vec{T}$ and a lower voltage when $\vec{X}$ is not close to $\vec{T}$. The match circuit includes a buffer 60, a diode 62, a capacitor 64, a current source circuit 66, and an integrator 68. The buffer 60 is used to produce a voltage signal yb 70 with the same voltage value as the signal y 54 produced by the averager circuit 50. The buffer 60 ensures that the current flowing through the diode 62 does not have any effect on the output voltage signal y 54 of the averager circuit 50. The diode 62, capacitor 64, and current source circuit 66 are used to rectify the signal yb 70. The result of the rectification is a voltage signal yc 72 that follows the peak values of the oscillatory signal yb 70. The integrator 68 is used to integrate yc 72. The integrator 68 can be reset by a reset signal 74. The output of the integrator is voltage signal d 34. The voltage of this signal d 34, measured at a certain fixed time period after the reset signal 74 is enabled, represents the degree of match (DoM) between the input $\vec{X}$ and the target $\vec{T}$ vectors. The time period to measure the signal d 34 can be on the order of fifty (50) times larger than a typical average oscillation cycle time of the oscillators 32.
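The signal chain just described (average, peak-follow, integrate, sample after about fifty cycles) can be mimicked with a toy behavioural model. This is only a sketch under strong assumptions: the oscillators are ideal two-level square waves, each frequency shifts linearly with the element difference, no feedback coupling is modelled, and the parameters `f0`, `gain`, `leak`, and `dt` are hypothetical values chosen for illustration. It is not a model of the disclosed circuit.

```python
import math


def degree_of_match(x, t, f0=1.0, gain=0.3, leak=1.0, cycles=50, dt=0.002):
    """Toy DoM: integrate the peak envelope of the averaged oscillator outputs."""
    # Assumed frequency law: each oscillator detunes with its element difference.
    freqs = [f0 + gain * (xi - ti) for xi, ti in zip(x, t)]
    steps = round(cycles / (f0 * dt))
    yc = 0.0  # peak-follower state (rectified signal)
    d = 0.0   # integrator state, taken as reset at time zero
    for k in range(steps):
        now = k * dt
        # Averager: mean of the two-level (high/low) oscillator outputs.
        y = sum(1.0 if math.sin(2.0 * math.pi * f * now) >= 0.0 else -1.0
                for f in freqs) / len(freqs)
        # Diode, capacitor, and current source: follow peaks, droop linearly.
        yc = max(y, yc - leak * dt)
        # Integrator: accumulate the rectified signal.
        d += yc * dt
    return d


x = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.25, -0.75]
t_far = [-1.0, 1.0, -0.5, 0.5, 0.0, 0.5, -1.0, 1.0]

d_match = degree_of_match(x, x)         # identical vectors
d_mismatch = degree_of_match(x, t_far)  # clearly different vectors
print(d_match, d_mismatch)
```

In this toy model the matched case yields the larger sampled value because all oscillators stay aligned and the averaged signal keeps reaching its full peak, which mirrors the qualitative behaviour described for signal d 34.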
For the circuit of
In some embodiments the feedback signal 54 is between 1% and 36% of the arithmetic average of all oscillator outputs 47. Another range for the value of the feedback signal is 0.04 to 0.50 of the arithmetic average of the outputs 47 of the oscillators 32. A typical feedback signal may be 0.36*(Output_of_Oscillator_1+Output_of_Oscillator_2+ . . . +Output_of_Oscillator_M)/M. The number of oscillators M is arbitrary. In
The circuit of
The coupled oscillator cluster of
In another embodiment, a simpler coupled oscillator 100, as shown in
The input Vin 102 to each voltage-controlled CMOS oscillator 100 is from analog voltage difference circuit 101 and is an analog voltage difference $(X_i-T_i)$ of elements $X_i$ and $T_i$ of two vectors $\vec{X}$ and $\vec{T}$. The outputs 104 of the voltage-controlled CMOS oscillators 100 may be combined or summed by direct electrical connection at connection 106, and then buffered by buffer 108 and integrated by integrator 110 to form output 120.
When $\vec{X}$ and $\vec{T}$ match, the voltage-controlled CMOS oscillators 100 are more synchronized. When they do not match, the voltage-controlled CMOS oscillators 100 are less synchronized. The output 120 depends on the amount of synchronization and the degree of match (DoM) between the input vector $\vec{X}$ and the target vector $\vec{T}$. The integrated waveform at the output 120 and the sampled voltage of the output 120 have the characteristic of the squared L2 norm, which may be expressed as the $L_2^2$ norm, as shown in
The DoM circuit shown in
Therefore, vector convolution may be implemented with oscillators by making a simple algebraic transformation of Equation (7). By expanding and rearranging this equation, an expression for the convolution of A and B in terms of three oscillator-based DoM circuits can be derived, as shown in Equation (8).
Equation (8) shows that the dot product or convolution of two vectors A and B can be computed by using three oscillator clusters, each computing a DoM. One oscillator cluster computes the DoM between vector A and vector B, DOM(A,B), the second oscillator cluster computes the DoM between the vector A and a zero vector, DOM(A,0), and the third oscillator cluster computes the DoM between the vector B and a zero vector, DOM(B,0). Then a simple subtractor and scale operator can be used to produce the dot product of vector A and vector B, as shown in Equation (8). The quality of the derived dot product, or a measure of how closely it matches the mathematical ideal, is a function of the oscillators, the coupling, the DoM circuitry, and how close the sampled response is to the $L_2^2$ norm.
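The expansion and rearrangement described above can be written out as a worked derivation. It assumes the idealized case in which the sampled DoM response equals the squared L2 norm exactly; the notation DOM(·,·) follows the text, and the lines below are a sketch consistent with Equation (3) rather than a reproduction of the numbered equations.

```latex
\begin{aligned}
\mathrm{DOM}(A,B) &= \|A-B\|^2 = \sum_i (A_i-B_i)^2 \\
 &= \sum_i A_i^2 + \sum_i B_i^2 - 2\sum_i A_i B_i \\
 &= \mathrm{DOM}(A,0) + \mathrm{DOM}(B,0) - 2\,(A\cdot B), \\
A\cdot B &= \tfrac{1}{2}\left[\mathrm{DOM}(A,0) + \mathrm{DOM}(B,0) - \mathrm{DOM}(A,B)\right].
\end{aligned}
```

The subtraction and the factor of one half in the last line correspond to the subtractor and scale operator mentioned in the text.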
For the circuit described above with reference to
The present disclosure has described methods and apparatus to compute inner products, and thereby convolutions, between arbitrary vectors. Any oscillator cluster technology that computes a degree-of-match between two vectors using spontaneous synchronization dynamics may be used, including those described in
Having now described the invention in accordance with the requirements of the patent statutes, those skilled in this art will understand how to make changes and modifications to the present invention to meet their specific requirements or conditions. Such changes and modifications may be made without departing from the scope and spirit of the invention as disclosed herein.
The foregoing Detailed Description of exemplary and preferred embodiments is presented for purposes of illustration and disclosure in accordance with the requirements of the law. It is not intended to be exhaustive nor to limit the invention to the precise form(s) described, but only to enable others skilled in the art to understand how the invention may be suited for a particular use or implementation. The possibility of modifications and variations will be apparent to practitioners skilled in the art. No limitation is intended by the description of exemplary embodiments which may have included tolerances, feature dimensions, specific operating conditions, engineering specifications, or the like, and which may vary between implementations or with changes to the state of the art, and no limitation should be implied therefrom. Applicant has made this disclosure with respect to the current state of the art, but also contemplates advancements and that adaptations in the future may take into consideration of those advancements, namely in accordance with the then current state of the art. It is intended that the scope of the invention be defined by the Claims as written and equivalents as applicable. Reference to a claim element in the singular is not intended to mean “one and only one” unless explicitly so stated. Moreover, no element, component, nor method or process step in this disclosure is intended to be dedicated to the public regardless of whether the element, component, or step is explicitly recited in the Claims. No claim element herein is to be construed under the provisions of 35 U.S.C. Sec. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for . . . ” and no method or process step herein is to be construed under those provisions unless the step, or steps, are expressly recited using the phrase “comprising the step(s) of . . . .”
This application relates to U.S. patent application Ser. No. 14/202,200, filed Mar. 10, 2014, which is incorporated herein as though set forth in full.
This invention was made under U.S. Government contract HR0011-13-C-0052. The U.S. Government has certain rights in this invention.