The present disclosure relates to artificial neural networks and in particular to hyperdimensional computing that is adaptive to changes in environment, data complexity, and data uncertainty.
Hyperdimensional computing (HDC) has been introduced as a computational model mimicking brain properties towards robust and efficient cognitive learning. The main component of HDC is an encoder that transforms data into knowledge that can be learned and processed at very low cost. Inspired by the human brain, the encoder maps data points into a high-dimensional holographic neural representation. Although the quality of HDC learning directly depends on the encoding module, the lack of flexibility and reliability arising from the deterministic nature of HDC encoding often significantly affects the quality and reliability of the hyperdimensional learning models. Therefore, a need remains for an HDC encoder that provides flexibility and reliability for hyperdimensional computing that is adaptive to changes in environment, data complexity, and data uncertainty.
A hyperdimensional learning framework is disclosed with a variational encoder (VAE) module that is configured to generate variational autoencoding and to generate an unsupervised network that receives a data input and learns to predict the same data in an output layer. A hyperdimensional computing (HDC) learning module is coupled to the unsupervised network through a data bus, wherein the HDC learning module is configured to receive data from the VAE module and update an HDC model of the HDC learning module.
The disclosed hyperdimensional learning framework provides a foundation for a new class of variational autoencoder that ensures that latent space has an ideal representation for hyperdimensional learning. Disclosed embodiments adaptively learn a better HDC representation depending on the changes in the environment, the complexity of the data, and uncertainty in data. Further disclosed is a hyperdimensional classification that directly operates over encoded data and enables robust single-pass and iterative learning while defining a first formal loss function and training method for HDC. Evaluation over large-scale data shows that the disclosed embodiments not only achieve faster and higher quality of learning but also provide inherent robustness to deal with dynamic and uncertain data.
In another aspect, any of the foregoing aspects individually or together, and/or various separate aspects and features as described herein, may be combined for additional advantage. Any of the various features and elements as disclosed herein may be combined with one or more other disclosed features and elements unless indicated to the contrary herein.
Those skilled in the art will appreciate the scope of the present disclosure and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the embodiments and illustrate the best mode of practicing the embodiments. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element such as a layer, region, or substrate is referred to as being “on” or extending “onto” another element, it can be directly on or extend directly onto the other element or intervening elements may also be present. In contrast, when an element is referred to as being “directly on” or extending “directly onto” another element, there are no intervening elements present. Likewise, it will be understood that when an element such as a layer, region, or substrate is referred to as being “over” or extending “over” another element, it can be directly over or extend directly over the other element or intervening elements may also be present. In contrast, when an element is referred to as being “directly over” or extending “directly over” another element, there are no intervening elements present. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present.
Relative terms such as “below” or “above” or “upper” or “lower” or “horizontal” or “vertical” may be used herein to describe a relationship of one element, layer, or region to another element, layer, or region as illustrated in the Figures. It will be understood that these terms and those discussed above are intended to encompass different orientations of the device in addition to the orientation depicted in the Figures.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including” when used herein specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Embodiments are described herein with reference to schematic illustrations of embodiments of the disclosure. As such, the actual dimensions of the layers and elements can be different, and variations from the shapes of the illustrations as a result, for example, of manufacturing techniques and/or tolerances, are expected. For example, a region illustrated or described as square or rectangular can have rounded or curved features, and regions shown as straight lines may have some irregularity. Thus, the regions illustrated in the figures are schematic and their shapes are not intended to illustrate the precise shape of a region of a device and are not intended to limit the scope of the disclosure. Additionally, sizes of structures or regions may be exaggerated relative to other structures or regions for illustrative purposes and, thus, are provided to illustrate the general structures of the present subject matter and may or may not be drawn to scale. Common elements between figures may be shown herein with common element numbers and may not be subsequently re-described.
The need for efficient processing of the vast volume of data generated in the Internet of Things (IoT) for diverse cognitive tasks is increasing. In particular, there is a crucial need for scalable methods for learning on embedded or edge devices. However, technical challenges make it difficult to process data on these devices. One technical challenge is computational efficiency. For example, running machine learning or data processing algorithms on edge devices often results in extremely slow processing speed and high energy consumption, while other machine learning algorithms require a large cluster of application-specific integrated circuits, such as deep learning on Google tensor processing units. Another technical challenge is a lack of robustness to noise. For example, edge devices often rely on unreliable power sources and noisy wireless communications, and modern machine learning systems have almost no robustness to such noise and typically fail as a result.
Nevertheless, hyperdimensional computing (HDC) has shown great potential to outperform deep learning solutions in terms of energy efficiency and robustness, while ensuring a better or comparable quality of learning. Hyperdimensional computing is introduced as an alternative computational model that mimics important brain functionalities towards high-efficiency and noise-tolerant computation. Hyperdimensional computing is motivated by the observation that the human brain operates on high-dimensional data representations. In HDC, objects are thereby encoded with high-dimensional vectors, called hypervectors, which have thousands of elements. HDC incorporates learning capability along with typical memory functions of storing/loading information, and HDC mimics several important functionalities of the human memory model with vector operations that are computationally tractable and mathematically rigorous in describing human cognition.
HDC shows several advantages compared with the conventional deep learning solutions for learning in IoT systems. One advantage is that HDC is suitable for on-device learning based on hardware acceleration due to HDC's highly parallel nature. Another advantage is that hidden features of information can be well-exposed, thereby empowering both training and inference with the light-weight computation and a small number of iterations. Yet another advantage is that the hypervector representation inherently exhibits strong robustness against the noise and corrupted data. As a result, HDC may be employable as a part of many applications, including activity and gesture recognition, genomics, signal processing, robotics, and sensor fusion. Other advantages of HDC allow learning with a single iteration or very few iterations and learning with few samples while having inherent robustness to noise in hardware.
Regardless of the HDC functionality, transforming data into high-dimensional representation by encoding is a first step that uses randomly generated hypervectors. The quality of HDC learning depends on the encoding module. Many IoT systems deal with dynamic and uncertain data, mostly observed through imperfect data acquired from sensors. However, the lack of flexibility and reliability arising from the deterministic nature of the existing HDC encoding often substantially affects the quality and reliability of the model. Particularly, all previous HDC encoding methods are static and unreliable and thus cannot deal with the dynamic and uncertain data that exist in most real-world problems.
Hyperdimensional computing is a neurally inspired model of computation based on the observation that the human brain operates on high-dimensional and distributed representations of data. The fundamental units of computation in HDC are high-dimensional data or hypervectors, which are constructed from raw signals using an encoding procedure.
A first step in HDC is to map each data point into high-dimensional space. The mapping procedure is often referred to as encoding. Assume an input vector with n features. The encoding module maps this vector into a high-dimensional vector, H ∈ {−1, +1}D, where D≫n. Three common methods for HDC encoding are the inclusive encoder, the random projection encoder, and the non-linear encoder.
The foregoing encoding methods provide different qualities of learning and computational complexity. The inclusive encoder is the fastest encoder because it predominately uses bitwise operations. The random projection encoder is the second lowest cost encoder, for its projection matrix is still a binary/bipolar matrix. In a non-linear encoder, both bases and feature values are non-binary, so the non-linear encoder incurs a slightly higher computational cost. However, in terms of quality of learning, the non-linear encoder is considered state-of-the-art, with exceptional capability to extract knowledge from data.
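As an illustration, the random projection and non-linear encoders described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the dimensionality D, the Gaussian sampling of the base vectors, and the cosine non-linearity are illustrative choices, not the disclosed implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_projection_encode(x, proj):
    """Random projection encoding: multiply by a bipolar {-1,+1}
    projection matrix, then binarize the result with sign()."""
    return np.sign(proj @ x)

def nonlinear_encode(x, bases):
    """Non-linear encoding: pass the dot product with randomly drawn
    base vectors through a non-linear function (cosine here)."""
    return np.cos(bases @ x)

n, D = 16, 1000                               # n input features, D hypervector dimensions
proj = rng.choice([-1.0, 1.0], size=(D, n))   # bipolar projection matrix
bases = rng.normal(size=(D, n))               # Gaussian base vectors (assumption)

x = rng.normal(size=n)                        # a raw feature vector
h_rp = random_projection_encode(x, proj)      # bipolar hypervector
h_nl = nonlinear_encode(x, bases)             # real-valued hypervector
```

The cost ordering described above is visible here: the projection encoder needs only sign flips and additions, while the non-linear encoder evaluates a transcendental function per dimension.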
Despite these strengths, all existing HDC encoders are static and unreliable and thus cannot deal with the dynamic and uncertain data that exist in most real-world systems. In IoT systems, the environment and data points are dynamically changing. For example, as one moves through winter, spring, summer, and autumn, outdoor images that include foliage have different backgrounds, and temperature sensors collect different ranges of values. Beside these seasonal changes in IoT systems, data points may undergo unpredictable changes, generating various unseen or variational data. Machine learning algorithms, including HDC, require labeled data to train a suitable model to adapt to a new environment. However, it is impractical and often infeasible to collect labels for data observed during inference.
An ideal encoder for HDC should be able to find a better representation given new unlabeled data.
The AutoHD encoder 10 was evaluated on a wide range of learning and cognitive problems. The results show that the AutoHD encoder 10 not only achieves faster and higher quality of learning but also provides inherent robustness to deal with dynamic and uncertain data. Over a traditional non-noisy data set, the AutoHD encoder 10 achieves, on average, 7.7% higher quality of learning compared with state-of-the-art HDC learning methods.
The AutoHD encoder 10 is a uniquely trainable variational encoder for HDC that is configured to dynamically change representation to adapt to changes in data. The AutoHD encoder 10 has a VAE module 12 and a hyperdimensional computing (HDC) learning module 14. Instead of using a static HDC encoder to map data into high-dimensional space as do traditional HDC encoders, the AutoHD encoder 10 employs the VAE module 12 in combination with a dynamic high-dimensional representation. The disclosed VAE module 12 is configured to generate variational autoencoding and generates an unsupervised network that receives a data input and learns to predict the same data in an output layer. During operation, the AutoHD encoder 10 fills VAE latent space with a relatively rich representation that considers the correlation of all inputted data. Traditionally, VAE latent space learns a low-dimensional representation of data. In contrast, the approach according to the present disclosure makes a unique modification to the unsupervised network of the VAE module 12 to learn a high-dimensional representation that can be directly used by an HDC model 16.
Learning in the AutoHD encoder 10 proceeds in two phases: (1) the VAE module 12 is trained, without supervision, so that its latent space provides a high-dimensional holographic representation of the data, and (2) the HDC learning module 14 trains the HDC model 16 over the encoded data, in either a single pass or iteratively.
Variational autoencoding is a form of unsupervised learning in which a compact latent space of a data set is learned. In particular, autoencoding focuses on the training of the encoder that maps data to the latent space and the decoder that does the opposite. Variational autoencoding learns a distribution of the latent variables such that a sampling in the distribution is decoded into an item that resembles the training data. Conventionally, the distribution of the latent variables is in a low-dimensional space and has a Gaussian distribution. The present disclosure relates to a solution that uses VAE latent space to generate a holographic representation for hyperdimensional learning. Variational autoencoding can dynamically capture the correlative distance of data points in latent space depending on the data complexity. In addition, VAE training is fully unsupervised, requiring no labeled data.
The VAE module 12 assumes that input data x comes from an unknown distribution p*(x) and seeks to approximate such a distribution with a generative neural network with parameters θ that defines a distribution pθ(x)≈p*(x). Another assumption is that data has latent variables z and pθ(x)=∫pθ(x, z) dz. Using traditional variational Bayes methods to optimize θ is not ideal since the intractable posterior pθ(z|x) needs to be approximated. Additional parameters are introduced: ϕ of an encoder neural network 18 to define the distribution qϕ(z|x) such that qϕ(z|x)≈pθ(z|x). This framework allows optimization of θ and ϕ simultaneously.
To train the VAE module 12, the maximization function is defined as the variational lower bound:
ℒ(θ, ϕ; x)=log pθ(x)−DKL(qϕ(z|x)∥pθ(z|x))
The maximizing function ensures that the parameters θ of the generative model pθ(x) are the most likely, given the data. At the same time, the KL-divergence draws the approximate posterior qϕ(z|x) closer to the true intractable distribution pθ(z|x). This maximization objective can be rewritten as follows:
ℒ(θ, ϕ; x)=𝔼z˜qϕ(z|x)[log pθ(x|z)]−DKL(qϕ(z|x)∥pθ(z)),

where the first term indicates the error between the input and the reconstructed data, and the second term of the loss function is related to the closeness of the latent space to the VAE prior pθ(z). This term gets a higher value when the approximate posterior distribution is similar to the subjective prior. Previous work has modified this optimization objective by adding a hyperparameter β>0 to adjust the importance of each term. This model, known as β-VAE, maximizes

ℒ(θ, ϕ; x)=𝔼z˜qϕ(z|x)[log pθ(x|z)]−β·DKL(qϕ(z|x)∥pθ(z)).
Depending on the distribution of the original data, the negative reconstruction error takes different forms. For example, if the input data come from multivariate independent Bernoulli distributions, x˜Bernoulli(p), then the negative reconstruction error yields the cross-entropy loss function

log pθ(x|z)=Σi=1M xi log oi+(1−xi) log(1−oi),

where o=(oi)i=1M is the output of the VAE. In the case of x˜𝒩(o, I), then log pθ(x|z)=−∥x−o∥22+C.
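When the approximate posterior is 𝒩(μ, σI) and the prior is 𝒩(0, I), the KL term of the β-VAE objective has a closed form, so both loss terms can be computed directly. The sketch below is a minimal NumPy illustration under stated assumptions: the Bernoulli decoder outputs, the β value, and the toy inputs are all hypothetical.

```python
import numpy as np

def bernoulli_recon_loglik(x, o, eps=1e-9):
    """log p(x|z) for a Bernoulli decoder with output probabilities o."""
    return np.sum(x * np.log(o + eps) + (1 - x) * np.log(1 - o + eps))

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2 I) || N(0, I) )."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

def beta_vae_loss(x, o, mu, sigma, beta=4.0):
    """Negative beta-VAE lower bound: reconstruction error + beta * KL."""
    return -bernoulli_recon_loglik(x, o) + beta * kl_to_standard_normal(mu, sigma)

x = np.array([1.0, 0.0, 1.0, 1.0])     # binary input (hypothetical)
o = np.array([0.9, 0.1, 0.8, 0.7])     # decoder output probabilities (hypothetical)
mu = np.zeros(8)                       # latent posterior mean
sigma = np.ones(8)                     # latent posterior std: KL term is 0 here
loss = beta_vae_loss(x, o, mu, sigma)
```

With μ=0 and σ=1 the KL term vanishes, so the loss reduces to the reconstruction cross-entropy alone; moving μ or σ away from the prior makes the β-weighted term grow.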
VAE Hyperdimensional Representation: In HDC, hypervectors are holographic and (pseudo)random with independent and identically distributed components. A hypervector contains all the information combined and spread across all its components in a full holistic representation so that no component is more responsible for storing any piece of information than another. To ensure that the VAE generates HDC data, it must be shown that the latent space distribution holds an independent and holographic representation. In particular, the latent space of the VAE qϕ(z|x) is parametrized with a fixed distribution by design. This distribution is often a multivariate normal distribution 𝒩(μ, σI) with the prior being 𝒩(0, I). This distribution is useful for HDC because, by design, the latent space is drawn from normal distributions, and the components are independent of one another.
To ensure holographic representation, neurons in the latent space should correspond to all input features. However, VAEs tend toward a non-holographic representation as the dimensionality of the latent space grows. To eliminate that, a dropout layer is added right before a decoder neural network 20. Because any latent neuron may be dropped during training, no single neuron can be solely responsible for reconstructing any input feature, which forces the information to spread across the whole latent vector.
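The effect of that dropout layer can be illustrated directly on a latent vector. This is a minimal NumPy sketch; the dropout rate and latent dimensionality are illustrative assumptions, and the inverted-dropout rescaling is the standard technique, not a detail stated in the disclosure.

```python
import numpy as np

def latent_dropout(z, p=0.5, rng=None):
    """Inverted dropout on the latent vector: each neuron is zeroed with
    probability p, and survivors are rescaled by 1/(1-p) so the decoder
    sees the same expected activation during training and inference."""
    rng = rng or np.random.default_rng()
    mask = rng.random(z.shape) >= p      # True = neuron survives
    return z * mask / (1.0 - p)

rng = np.random.default_rng(0)
z = rng.normal(size=4096)                # high-dimensional latent sample
z_drop = latent_dropout(z, p=0.5, rng=rng)
```

Since roughly half of the latent neurons vanish on every training step, the decoder can only reconstruct the input reliably if every feature's information is replicated across many neurons, which is exactly the holographic property hypervectors require.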
Existing HDC learning methods first generate all encoded hypervectors belonging to a class/label l and then compute the class hypervector {right arrow over (C)}l by bundling (adding) them. Assuming there are J inputs having label l:

{right arrow over (C)}l=Σj=1J {right arrow over (H)}jl,

where {right arrow over (H)}jl denotes the jth encoded hypervector with label l.
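The bundling step above amounts to a per-label sum over the encoded hypervectors. The sketch below is a minimal single-pass trainer in NumPy; the random bipolar "encodings" stand in for the VAE output and are an illustrative assumption.

```python
import numpy as np

def single_pass_train(H, labels, num_classes):
    """Bundle (add) all encoded hypervectors that share a label into
    one class hypervector per class."""
    D = H.shape[1]
    C = np.zeros((num_classes, D))
    for h, l in zip(H, labels):
        C[l] += h                               # bundling = elementwise addition
    return C

rng = np.random.default_rng(1)
H = rng.choice([-1.0, 1.0], size=(100, 512))    # 100 encoded hypervectors (stand-in)
labels = rng.integers(0, 4, size=100)           # 4 classes
C = single_pass_train(H, labels, num_classes=4)
```

Because dominant patterns keep accumulating into the same class vector, this naive sum exhibits exactly the saturation problem discussed next.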
Observe that the existing single-pass training methods saturate the class hypervectors in an HDC model. In a naive single-pass model, the encoded data that are more dominant saturate the class hypervectors, so less common training data have a lower chance to be represented in the model. One solution to address this issue is to iterate over the training data and adjust the class hypervectors. The model adjustment increases the weight of input data that are likely to be misclassified with the current HDC model 16.
Iterative Training: Assume {right arrow over (H)} is a new training data point. The AutoHD encoder 10 is configured to compute the cosine similarity of {right arrow over (H)} with the class hypervector that has the same label as {right arrow over (H)}. If the data point corresponds to the lth class, the similarity of the data point is computed with {right arrow over (C)}l as δ({right arrow over (H)}, {right arrow over (C)}l), where δ denotes the cosine similarity. Instead of naively adding data points to the model, the HDC learning module 14 is configured to update the HDC model 16 based on the δ similarity. For example, if an input data point has label l but is misclassified as label l′, the HDC model 16 updates as follows:

{right arrow over (C)}l←{right arrow over (C)}l+η(1−δl)×{right arrow over (H)}

{right arrow over (C)}l′←{right arrow over (C)}l′−η(1−δl′)×{right arrow over (H)}

where η is a learning rate. A large δl indicates that the input is a common data point that already exists in the model. Therefore, the update adds only a very small portion of the encoded query to the model to eliminate model saturation (1−δl≅0).
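The similarity-weighted update above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the learning rate, the random class hypervectors, and the sign of the update on the wrongly matched class (subtraction, to push it away from the query) follow the usual adaptive-HDC convention rather than being spelled out in the disclosure.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity, the delta in the update rule."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def adaptive_update(C, h, true_l, eta=0.05):
    """On a misprediction, add a (1 - similarity)-weighted copy of the
    query to the correct class and subtract it from the predicted class."""
    pred = int(np.argmax(C @ h))          # dot-product inference
    if pred != true_l:
        C[true_l] += eta * (1 - cos_sim(h, C[true_l])) * h
        C[pred]   -= eta * (1 - cos_sim(h, C[pred])) * h
    return C, pred

rng = np.random.default_rng(2)
C = rng.normal(size=(3, 256))             # 3 class hypervectors (hypothetical)
h = C[2] + 0.1 * rng.normal(size=256)     # query resembling class 2, true label 1
updated, pred = adaptive_update(C.copy(), h, true_l=1)
```

The (1 − δ) weight is what prevents saturation: a query already well represented in its class vector (δ ≈ 1) barely changes the model, while a poorly represented query produces a large correction.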
The HDC training methods explained above are slow to converge because they update only two class hypervectors for each misclassification. However, the mispredicted class hypervector may not be the only class responsible for the wrong prediction. In other words, when adjusting the pattern of a mispredicted class, other class hypervectors that may wrongly match a query may also need to be adjusted. This increases the number of iterations required to update the HDC model 16. To create a clear margin between the class hypervectors, for the first time, a formal loss function is defined for the HDC model 16 that enables updating of all class hypervectors for each misprediction. For each sample of data during retraining, the loss function computes the likelihood that the data correspond to each class. Then, based on the data label, the loss function adaptively updates all class hypervectors.
As discussed earlier, HDC inference predicts the label of an encoded query {right arrow over (H)} as the class with the highest dot-product similarity:

Argmaxi=1k {right arrow over (H)}·{right arrow over (C)}i
Using the dot product makes existing loss functions directly applicable to the HDC learning module 14, which comes with several benefits.
The present disclosure also focuses on two loss functions: hinge loss and logarithmic loss. The hinge loss is commonly observed in support vector machines. This function seeks to maintain all similarity predictions (dot product) of the correct class larger than a predefined value, commonly 1, compared with all the other classes. Thus, there are penalties not only on mispredictions but also on correct predictions with very low confidence scores. For this reason, this function is also known for maximum margin classification and yields robust linear classifiers.
ℒhinge(o, y)=Σi≠y max(0, 1+oi−oy),

where o=(oi)i=1k is the vector of similarity scores oi={right arrow over (H)}·{right arrow over (C)}i, and y is the true class label.
The logarithmic loss, also known as cross-entropy loss, transforms the similarity scores into a probability distribution and pushes the classification probability of the correct class toward 1, regardless of whether the samples are misclassified:

ℒlog(o, y)=−Σi=1k pi log qi,

where pi=1 if i=y, and zero otherwise, and q=(qi)i=1k is obtained by applying the softmax function to the outputs: qi=eoi/Σj=1k eoj.
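Both loss functions operate directly on the vector of dot-product similarity scores. The sketch below is a minimal NumPy illustration; the margin of 1 matches the "predefined value, commonly 1" mentioned above, and the example scores are hypothetical.

```python
import numpy as np

def hinge_loss(o, y, margin=1.0):
    """Multi-class hinge loss on similarity scores o: penalize any class
    whose score comes within `margin` of the true class's score."""
    losses = np.maximum(0.0, margin + o - o[y])
    losses[y] = 0.0                      # no self-penalty for the true class
    return np.sum(losses)

def log_loss(o, y):
    """Cross-entropy on softmax-normalized similarity scores."""
    q = np.exp(o - np.max(o))            # shift for numerical stability
    q /= np.sum(q)
    return -np.log(q[y])

o = np.array([2.5, -0.3, 0.1])           # similarity scores H . C_i (hypothetical)
confident = hinge_loss(o, y=0)           # 0: true class wins by more than the margin
uncertain = hinge_loss(np.array([0.0, 0.5, 0.0]), y=0)  # positive: margin violated
```

Note the behavioral difference described above: the hinge loss goes exactly to zero once every wrong class trails by the margin, while the log loss keeps pushing the correct-class probability toward 1 even on confidently correct samples.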
The following show the impact of different loss functions on the accuracy and efficiency of the AutoHD encoder 10.
Although the HDC model 16 can be used for online learning with a limited number of parameters, how one should select the best parameters is not clear. Disclosed is a Bayesian framework that identifies optimal hyperparameters of the AutoHD encoder 10 with limited sample data. The framework is used for at least two purposes: (1) finding the best hyperparameters for the AutoHD encoder 10 to maximize learning accuracy, which with the Bayesian framework can be performed using a very small number of samples; and (2) finding default parameters for the AutoHD encoder 10 to map into a new problem, which is necessary for problems for which not enough resources or time are available to optimize the AutoHD encoder 10 for each given data set.
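The disclosure does not detail the internals of the Bayesian framework, so the sketch below substitutes a plain exhaustive grid search as a stand-in to illustrate the hyperparameter-selection loop; the search space, the scoring function, and its peak location are all hypothetical, and a real framework would train and validate the model inside `evaluate`.

```python
from itertools import product

# Hypothetical hyperparameter grid for an AutoHD-style model.
SPACE = {
    "D":   [1024, 2048, 4096, 8192],   # hypervector dimensionality
    "eta": [0.01, 0.05, 0.1, 0.5],     # learning rate
}

def evaluate(params):
    """Placeholder validation score; peaks at D=4096, eta=0.05 by
    construction so the example is deterministic."""
    return 1.0 - abs(params["D"] - 4096) / 8192 - abs(params["eta"] - 0.05)

def select_hyperparameters():
    """Score every configuration and keep the best one. A Bayesian
    optimizer would instead model the score surface and need far
    fewer evaluations, which is the point made in the text above."""
    best, best_score = None, float("-inf")
    for D, eta in product(SPACE["D"], SPACE["eta"]):
        params = {"D": D, "eta": eta}
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best

best = select_hyperparameters()
```

The grid search needs one model evaluation per configuration, which is exactly the cost a sample-efficient Bayesian search is meant to avoid when only a small number of evaluations is affordable.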
An embodiment according to the present disclosure has been implemented with two co-designed modules: software implementation and hardware acceleration. In software, the effectiveness of the framework of the AutoHD encoder 10 was verified on large-scale learning problems. In hardware, training and testing of the AutoHD encoder 10 were implemented on central processing units (CPUs) and field-programmable gate arrays (FPGAs). For the FPGA, functional blocks of the AutoHD encoder 10 were created using Verilog and synthesized using the Xilinx Vivado Design Suite. The synthesis of the functional blocks was implemented on the Kintex-7 FPGA KC705 Evaluation Kit and verified to be more efficient than an existing automated FPGA implementation. For the CPU, the code for the AutoHD encoder 10 was written in C++ and optimized for performance. The code has been implemented on a Raspberry Pi (RPi) 3B+ using an ARM Cortex A53 CPU, with power consumption collected by a Hioki 3337 power meter. Accuracy and efficiency of the AutoHD encoder 10 were evaluated on several popular data sets (listed in Table 1), ranging from small data sets collected in a small IoT network to a large data set that includes hundreds of thousands of data points.
State-of-the-Art Machine Learning Algorithms:
Comparison with Existing HDC Algorithms:
Evaluation shows that the AutoHD encoder 10 provides a significantly higher quality of learning compared with existing encoders. The AutoHD encoder 10 uses VAE to preserve the correlation of all data points in the latent space, which gives the HDC model 16 a higher capacity to store correlative data and learn a suitable functionality. The results indicate that the AutoHD encoder 10 provides, on average, 19.6%, 17.3%, and 7.7% higher classification accuracy compared with associate-based, permutation-based, and random projection encoders, respectively.
The quality of learning for the AutoHD encoder 10 was compared across the following training methods:
Naïve Training, which updates the HDC model 16 for each misprediction. The update only affects two class hypervectors and does not consider how far or marginal the misprediction occurred.
Adaptive Training, which updates the HDC model 16 using the two introduced loss functions: hinge and log. During adaptive training, all class hypervectors are updated on each misprediction as well as on correct predictions with low confidence. This method maximizes the margin between the class hypervectors during training, ensuring a higher quality of learning with fewer required iterations.
Dimensionality:
VAE Depth: The AutoHD encoder 10 uses the VAE as an HDC encoding module. The quality of the VAE latent space has a direct impact on the learning accuracy of the AutoHD encoder 10.
The present disclosure discloses the AutoHD encoder 10, which is a uniquely adaptive and trainable HDC encoding module that dynamically adjusts the similarity of the objects in high-dimensional space. The AutoHD encoder 10 develops a new class of variational autoencoder that ensures the latent space has an ideal representation for hyperdimensional learning. The AutoHD encoder 10 adaptively learns a better HDC representation depending on the changes in the environment, the complexity of the data, and uncertainty in the data. Also disclosed is a hyperdimensional classification that directly operates over encoded data and enables robust single-pass and iterative learning while defining the first formal loss function and training method for HDC. Evaluation shows that the AutoHD encoder 10 not only achieves faster and higher quality of learning but also provides inherent robustness to deal with dynamic and uncertain data.
It is contemplated that any of the foregoing aspects, and/or various separate aspects and features as described herein, may be combined for additional advantage. Any of the various embodiments as disclosed herein may be combined with one or more other disclosed embodiments unless indicated to the contrary herein.
Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.
This application claims the benefit of provisional patent application Ser. No. 63/237,648, filed Aug. 27, 2021, the disclosure of which is hereby incorporated herein by reference in its entirety.
This invention was made with government funds under grant number N000142112225 awarded by the Department of the Navy, Office of Naval Research. The U.S. Government has rights in this invention.