The present disclosure is directed at methods, systems, and techniques for kernel continual learning.
Dataset classification can be performed by computer-implemented machine learning models, such as artificial neural networks. For example, when a dataset comprises images of objects, a computer may be used to implement an object classifier configured to classify each of the objects using an artificial neural network by determining which of several different types of objects each of the depicted objects most closely resembles. The artificial neural network performs feature extraction and classification of the depicted objects.
A classifier can be used to perform multiple tasks, such as classifying different types of objects. Using the same classifier for multiple tasks can, in some circumstances, result in a problem known as “catastrophic forgetting”, which generally refers to the catastrophic loss of previously learned information. For example, this may include the tendency of an artificial neural network to forget past knowledge and previously learned information upon learning new information.
According to a first aspect, there is provided a method comprising: obtaining a dataset corresponding to a classification task; performing feature extraction on the dataset using an artificial neural network; and constructing a kernel using features extracted during the feature extraction for use in performing the classification task.
The dataset may be a current task dataset and the classification task may be a current classification task, and the method may further comprise selecting a coreset dataset from the current task dataset, wherein the feature extraction is performed on the coreset dataset, and wherein the kernel is constructed using the features extracted from the coreset dataset.
The method may further comprise performing the current classification task by applying the kernel to features extracted from the current task dataset.
The feature extraction may also be performed on elements of the current task dataset other than the coreset dataset, and performing the current classification task may comprise applying the kernel to features extracted from elements of the current task dataset other than the coreset.
The coreset dataset may be selected uniformly between existing classes of the current task dataset.
The dataset may be an input query dataset, and the method may further comprise: obtaining a task identifier that corresponds to the input query dataset; retrieving, using the task identifier, a coreset dataset corresponding to a classification task to be performed on the input query dataset, wherein the feature extraction is performed on the coreset dataset and on the input query dataset, and wherein the kernel is constructed using the features extracted from the coreset dataset; and classifying the input query dataset by applying the kernel to the features extracted from the input query dataset.
The dataset may comprise an image.
Constructing the kernel may comprise applying kernel ridge regression.
The artificial neural network may comprise at least one of a convolutional neural network and a multilayer perceptron.
The method may further comprise determining random Fourier features from the coreset dataset, and the kernel may be constructed using the random Fourier features.
The coreset dataset may be selected uniformly between existing classes of the input query dataset.
The feature extraction may be performed using a backbone network shared across multiple classification tasks.
According to another aspect, there is provided a system comprising: a processor; a non-transitory computer readable medium communicatively coupled to the processor and having stored thereon computer program code that is executable by the processor and that, when executed by the processor, causes the processor to perform the method of any of the foregoing aspects or suitable combinations thereof.
The system may also comprise a memory communicatively coupled to the processor for storing the coreset dataset.
According to another aspect, there is provided a non-transitory computer readable medium having stored thereon computer program code that is executable by a processor and that, when executed by the processor, causes the processor to perform the method of any of the foregoing aspects or suitable combinations thereof.
This summary does not necessarily describe the entire scope of all aspects. Other aspects, features and advantages will be apparent to those of ordinary skill in the art upon review of the following description of specific embodiments.
In the accompanying drawings, which illustrate one or more example embodiments:
Artificial intelligence agents are known to suffer from catastrophic forgetting when learning over non-stationary data distributions. Continual learning, also known as life-long learning, was introduced to deal with catastrophic forgetting. It refers to an agent able to continually learn to solve a sequence of non-stationary tasks by accommodating new information, while remaining able to complete past experienced tasks with minimal performance reduction. The fundamental challenge in continual learning is catastrophic forgetting, which is caused by the interference among tasks from heterogeneous data distributions.
Task interference is almost unavoidable when model parameters, like the feature extractor and the classifier, are shared by all tasks. At the same time, it is practically infeasible to keep separate sets of model parameters for each individual task when learning with an arbitrarily long sequence of tasks. Moreover, in deep neural networks knowledge tends to be shared and transferred across tasks more in the lower layers than in the higher layers. This motivates non-parametric classifiers that automatically avoid task interference without sharing any parameters across tasks. Kernel methods, which have proven to be a powerful technique in the machine learning toolbox, provide a well-suited tool due to their non-parametric nature. Kernels were shown to be effective in the scenarios of incremental and multi-task learning with support vector machines. Recently, they have also been demonstrated to be strong learners in tandem with deep neural networks, especially when learning from limited data. Inspired by the success of kernels in machine learning, in at least some example embodiments herein, there are provided methods and systems that decouple the feature extractor from the classifier and introduce task-specific classifiers based on kernels for continual learning.
“Kernel continual learning” is used herein to deal with catastrophic forgetting in continual learning. Specifically, non-parametric classifiers are learned based on kernel ridge regression. To do so, an episodic memory is deployed to store a subset of samples from the training data per task, the “coreset dataset” (hereinafter simply referred to as the “coreset”), and to learn the classifier based on kernel ridge regression. Using kernels in this fashion may, in at least some embodiments, be beneficial for several reasons. The direct interference of classifiers is naturally avoided as kernels are established in a non-parametric way per task and no classifier parameters are shared across tasks. Moreover, in contrast to conventional memory replay methods, kernel continual learning does not need to replay data from previous tasks for training the current task, which averts task interference while enabling more efficient optimization. In order to achieve adaptive kernels per task, random Fourier features are used to learn kernels in a data-driven manner. To be more specific, kernel continual learning with random Fourier features is formalized as a variational inference problem, where the random Fourier basis is treated as a latent variable and inferred from the coreset of each task. The variational inference formulation naturally induces a regularization term that encourages the model to learn adaptive kernels per task from the coreset only. Consequently, a more compact memory is achieved, which alleviates the storage overhead.
The technical problem solved by at least some embodiments of kernel continual learning herein is catastrophic forgetting in classifiers due to task interference. In continual learning for visual object recognition tasks, the classifier parameters of different tasks interfere with one another over the course of learning, causing knowledge of previous tasks to be forgotten. At least some embodiments of kernel continual learning herein are directed at non-parametric classifiers based on kernels. No classifier parameters are shared among tasks, thereby avoiding interference between classifiers. This enables kernel continual learning to continually solve recognition tasks while remaining able to solve previously learned tasks without a significant performance drop.
As described further below, experiments in accordance with at least some embodiments are performed on four benchmark datasets: Rotated MNIST, Permuted MNIST, Split CIFAR100 and miniImageNet. The results demonstrate the effectiveness of kernel continual learning.
Conventional methods differ in the way they deal with catastrophic forgetting; they are briefly reviewed below in terms of regularization, dynamic architectures, and experience replay.
Regularization methods determine the importance of each of a model's parameters per task, which is used to prevent important parameters from being updated for new tasks. For example, each weight's importance may be specified with the Fisher information matrix. Alternatively, parameter importance may be determined by gradient magnitude. These methods can be explored through the lens of Bayesian optimization. For instance, a regularization technique inspired by variational inference may be used to protect against forgetting. Bayesian or not, regularization methods address catastrophic forgetting by adding a regularization term to the main loss function. The penalty terms proposed in such methods are unable to prevent drift in the loss landscape of previous tasks. While alleviating forgetting, the penalty also limits the plasticity needed to absorb new information from future tasks learned over a long timescale.
Dynamic architectures allocate a subset of the model parameters per task. This is achieved by a gating mechanism, or by incrementally adding new parameters to the model. Incremental learning and pruning is another possibility. Given an over-parameterized model with the ability to learn quite a few tasks, model expansion can also be achieved by pruning the parameters that do not contribute to the performance of the current task, while keeping them available for future tasks. These methods are preferred when there is no memory usage constraint and final model performance is prioritized. They offer an effective way to avoid task interference and catastrophic forgetting, at the expense of suffering from potentially unbounded model expansion and of preventing positive knowledge transfer across tasks.
Experience replay methods assume it is possible to access data from previous tasks by having a fixed-size memory or a generative model able to produce samples from old tasks. A model may be augmented with a fixed-size memory, which accumulates samples in the proximity of each class center. Alternatively, another memory-based model may be implemented by exploiting a reservoir sampling strategy in the raw input data selection phase. Rather than storing the original samples, certain other models accumulate the parameter gradients during task learning. Certain other models incorporate a generative model into a continual learning model to alleviate catastrophic forgetting by producing samples from previous tasks and retraining the model using data from previous tasks and the current task. These methods assume an extra neural network, such as a generative model or a memory, is available; otherwise, they cannot be exploited. These replay-based methods benefit from a memory to retrain their model over previous tasks. In contrast, in at least some example embodiments, kernel continual learning only uses memory to store data as a task identifier proxy at inference time without the need of replay for training, which mitigates the optimization cost of memory-based methods.
In a traditional supervised learning setting, a model or agent f is learned to map input data from the input space to its target in the corresponding output space, f: 𝒳 → 𝒴, where samples (X, Y) ∈ 𝒳 × 𝒴 are assumed to be drawn from the same data distribution. In the case of an image classification problem, X are the images and Y are the associated class labels. Instead of solving a single task, continual learning aims to solve a sequence of different tasks, T1, T2, . . . , Tn, from non-stationary data distributions, where n stands for the number of tasks and each task is an individual classification problem. A continual learner is required to continually solve each of those tasks once trained on its labeled data, while remaining able to solve previous tasks with no or limited access to their data.
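To make the protocol concrete, the following minimal Python sketch iterates over a task sequence and records the accuracies a_{t,i} used by the evaluation metrics discussed further below. The helper names (train_on_task, evaluate) are illustrative placeholders and are not part of the disclosed system.

```python
from typing import Callable, List, Tuple

def continual_learning_protocol(
    tasks: List[Tuple[object, object]],          # [(train_data_t, test_data_t), ...]
    train_on_task: Callable[[object], None],     # updates the learner on one task
    evaluate: Callable[[object], float],         # returns accuracy on one test set
) -> List[List[float]]:
    """Train on tasks sequentially; after each task, evaluate on every task seen so far."""
    accuracy_matrix: List[List[float]] = []
    seen_test_sets: List[object] = []
    for train_data, test_data in tasks:
        train_on_task(train_data)                # only the current task's labeled data is available
        seen_test_sets.append(test_data)
        accuracy_matrix.append([evaluate(ts) for ts in seen_test_sets])
    return accuracy_matrix                       # row t, column i holds a_{t,i}
```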
Generally, a continual learning model based on a neural network can be regarded as a feature extractor hθ and a classifier fc. The feature extractor is a convolutional architecture before the last fully-connected layer that is shared across tasks. The classifier is the last fully-connected layer. In at least some embodiments herein, a task-specific, non-parametric classifier is implemented based on kernel ridge regression.
The model is trained on the current task t. Given its training data D_t, a subset of data is uniformly chosen between existing classes in the current task t, which is called the “coreset” and denoted as C_t = {(x_i, y_i)}_{i=1}^{N_C}. The task-specific classifier f_c is learned on the coreset by kernel ridge regression, which minimizes the objective:

ℒ_krr = Σ_{(x_i, y_i) ∈ C_t} ∥y_i − f_c(ψ(x_i))∥² + λ∥f_c∥²,  (1)

where λ is the weight decay parameter and ψ(·) denotes the feature maps produced by the feature extractor. Based on the Representer theorem:

f_c(ψ(x)) = Σ_{i=1}^{N_C} α_i k(x, x_i),  (2)

where k(·,·) is the kernel function. Then α_t can be calculated in closed form:

α_t = Y(λI + K)^{−1},  (3)

where α_t = [α_1, . . . , α_i, . . . , α_{N_C}], Y denotes the labels of the coreset samples, and K is the kernel matrix computed on the coreset, with K_{ij} = k(x_i, x_j) = ψ(x_i)ψ(x_j)^⊤.
To jointly learn the feature extractor h_θ, the total loss function is minimized over samples from the remaining set D_t\C_t:

ℒ_t = Σ_{(x′, y′) ∈ D_t\C_t} ℒ(y′, ỹ′),  (4)

Here ℒ(·) is the cross-entropy loss function and the predicted output ỹ′ is computed by

ỹ′ = f_{c_α}(ψ(x′)) = Softmax(α_t K̃),  (5)

where K̃ = ψ(X)ψ(x′)^⊤, ψ(X) denotes the feature maps of samples in the coreset, and Softmax(·) is the softmax function applied to the output of the kernel ridge regression.
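For illustration, the closed-form classifier of Eqs. (3) and (5) may be implemented along the following lines in PyTorch. The function name and the solve-based formulation (the transpose of Eq. (3), which is equivalent because K is symmetric) are choices made for this sketch rather than a definitive implementation.

```python
import torch

def kernel_ridge_classifier(psi_coreset, y_onehot, psi_query, lam=0.1):
    """Task-specific classifier via kernel ridge regression (cf. Eqs. (3) and (5)).

    psi_coreset: (N, D) feature maps psi(X) of the coreset samples
    y_onehot:    (N, C) one-hot labels Y of the coreset samples
    psi_query:   (M, D) feature maps psi(x') of query samples
    lam:         weight decay parameter lambda
    """
    n = psi_coreset.shape[0]
    K = psi_coreset @ psi_coreset.T                          # (N, N) kernel matrix on the coreset
    eye = torch.eye(n, dtype=K.dtype, device=K.device)
    # Solve (lam*I + K) alpha = Y, the transposed but equivalent form of Eq. (3).
    alpha = torch.linalg.solve(lam * eye + K, y_onehot)      # (N, C)
    K_tilde = psi_query @ psi_coreset.T                      # (M, N) kernel between queries and coreset
    return torch.softmax(K_tilde @ alpha, dim=-1)            # Eq. (5): softmax over the KRR output
```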
Any semi-positive definite kernel, e.g., a radial basis function (RBF) kernel or a dot product linear kernel, may be used to construct the classifier. In at least some example embodiments, random Fourier features are introduced to train data-driven kernels, which have previously demonstrated success in regular learning tasks. Data-driven kernels by random Fourier features provide an appealing technique to train strong classifiers with a relatively small memory footprint for continual learning based on episodic memory.
One of the key ingredients when finding a mapping function in non-parametric approaches, such as kernel ridge regression, is the kernel function. Translation-invariant kernels may be approximated using explicit feature maps; this approach is underpinned by Bochner's theorem, in which a continuous, real-valued, symmetric and shift-invariant function k(x, x′) = k(x − x′) on ℝ^d is a positive definite kernel if and only if it is the Fourier transform of a positive finite measure p(ω), such that:

k(x, x′) = ∫_{ℝ^d} p(ω) e^{iω^⊤(x − x′)} dω = E_ω[ζ_ω(x) ζ_ω(x′)*],  (6)

where ζ_ω(x) = e^{iω^⊤x}. With a sufficient number of samples ω drawn from p(ω), an unbiased estimate of k(x, x′) is ζ_ω(x) ζ_ω(x′)*.
Based on Eq. (6), D samples {ω_i}_{i=1}^D and {b_i}_{i=1}^D are drawn from a normal distribution and a uniform distribution over [0, 2π], respectively, and the random Fourier features (RFFs) are constructed for each data point x as:

ψ(x) = √(2/D) [cos(ω_1^⊤x + b_1), . . . , cos(ω_D^⊤x + b_D)],  (7)

Having the random Fourier features, the kernel matrix is approximated as k(x, x′) = ψ(x)ψ(x′)^⊤.
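A minimal sketch of this construction is given below, assuming the standard cosine form of random Fourier features with Gaussian frequencies and uniform phases; the dimensions shown are examples only.

```python
import math
import torch

def random_fourier_features(x, omega, b):
    """Map inputs x to D-dimensional random Fourier features (cf. Eq. (7))."""
    D = omega.shape[1]
    return math.sqrt(2.0 / D) * torch.cos(x @ omega + b)

# Usage: draw the bases once, then approximate k(x, x') as psi(x) psi(x')^T.
d, D = 256, 1024
omega = torch.randn(d, D)              # frequencies drawn from a normal distribution
b = 2 * math.pi * torch.rand(D)        # phases drawn uniformly from [0, 2*pi]
x = torch.randn(8, d)
psi = random_fourier_features(x, omega, b)   # shape (8, 1024)
```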
Traditionally, the shift-invariant kernel is constructed based on random Fourier features, where the Fourier basis is drawn from a Gaussian distribution transformed from a pre-defined kernel. This results in kernels that are agnostic to the task. In continual learning, however, tasks are provided sequentially from non-stationary data distributions, which makes it sub-optimal to share the same kernel function across tasks. To address this problem, task-specific kernels are trained in a data-driven manner. This is suitable for continual learning as it is desirable to train informative kernels by using a coreset of minimum size. This is formulated as a variational inference problem, where the random Fourier basis ω is treated as a latent variable.
From a probabilistic perspective, it is desirable to maximize the following conditional predictive log-likelihood for the current task t:

max Σ_{(x, y) ∈ D_t\C_t} log p(y | x, D_t\C_t),  (8)

which amounts to making maximally accurate predictions on x based on D_t\C_t.
Introducing the random Fourier basis ω in Eq. (8), treated as a latent variable, results in:

log p(y | x, D_t\C_t) = log ∫ p(y | x, ω) p(ω | D_t\C_t) dω.  (9)
Data is used to infer the distribution over the latent variable ω, whose prior is conditioned on the data. The data and ω are combined to generate kernels to classify x based on kernel ridge regression. An uninformative prior of a standard Gaussian distribution can be placed over the latent variable ω, as described further below in respect of the experiments.
It is intractable to directly solve for the true posterior p(ω | x, y, D_t\C_t) over ω; therefore a variational posterior q_φ(ω | C_t) is introduced and conditioned solely on the coreset C_t, because the coreset will be stored as episodic memory for the inference of each corresponding task.
By incorporating the variational posterior into Eq. (9) and applying Jensen's inequality, the evidence lower bound (ELBO) is established as follows:

log p(y | x, D_t\C_t) ≥ E_{q_φ(ω|C_t)}[log p(y | x, ω)] − D_KL(q_φ(ω|C_t) ∥ p_γ(ω|D_t\C_t)).  (10)
Therefore, maximizing the ELBO amounts to maximizing the conditional log-likelihood in Eq. (8).
In the continual learning setting, the model must be able to make predictions based solely on the coreset C_t that is stored in memory. That is, the conditional log-likelihood is conditioned on the coreset only. Based on the ELBO in Eq. (10), the following empirical objective function is established and minimized by the overall training procedure:

ℒ = Σ_{(x, y) ∈ D_t\C_t} [ −(1/L) Σ_{ℓ=1}^{L} log p(y | x, ω^{(ℓ)}) ] + D_KL(q_φ(ω|C_t) ∥ p_γ(ω|D_t\C_t)),  with ω^{(ℓ)} ∼ q_φ(ω|C_t),  (11)
where in the first term, the Monte Carlo method is used to draw samples from the variational posterior q_φ(ω|C_t) to estimate the log-likelihood, and L is the number of Monte Carlo samples. In the second term, the conditional prior serves as a regularizer that ensures the inferred random Fourier basis is always relevant to the current task. Minimizing the KL divergence encourages the distribution of random Fourier bases inferred from the coreset to be close to the one inferred from the training set. Moreover, the KL term enables generation of informative kernels adapted to each task while using a relatively small memory.
In practice, the conditional distributions q_φ(ω|C_t) and p_γ(ω|D_t\C_t) are assumed to be Gaussian and may be implemented using the amortization technique. That is, multilayer perceptrons are used to generate the distribution parameters, μ and σ, by taking the conditions as input. In the experiments, two separate amortization networks are deployed: the inference network f_φ for the variational posterior and the prior network f_γ for the prior. In addition, to demonstrate the effectiveness of data-driven kernels, a variant of variational random features is implemented by replacing the conditional prior in Eq. (11) with an uninformative one, i.e., an isotropic Gaussian distribution 𝒩(0, I). In this case, kernels are still learned in a data-driven way from the coreset, but without being regularized by the training data from the task.
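As one hedged illustration of the amortization technique, the inference and prior networks may be sketched as follows. The mean pooling over the conditioning set, the layer sizes, and the helper names are assumptions of this example rather than details fixed by the disclosure.

```python
import torch
import torch.nn as nn

class AmortizedGaussian(nn.Module):
    """Amortization network producing mu and sigma of a factorized Gaussian over omega.

    Sketch of an inference network f_phi (or prior network f_gamma): it consumes a set
    of features (coreset or remaining data) and outputs the Gaussian parameters of the
    (d x D)-dimensional random Fourier basis.
    """
    def __init__(self, in_dim, hidden_dim, d, D):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ELU(),
        )
        self.mu = nn.Linear(hidden_dim, d * D)
        self.log_sigma = nn.Linear(hidden_dim, d * D)
        self.d, self.D = d, D

    def forward(self, feats):                       # feats: (N, in_dim)
        h = self.net(feats).mean(dim=0)             # permutation-invariant pooling over the set
        mu = self.mu(h).view(self.d, self.D)
        sigma = self.log_sigma(h).view(self.d, self.D).exp()
        return mu, sigma

def sample_omega(mu, sigma):
    """Reparameterized sample omega = mu + sigma * eps, with eps ~ N(0, I)."""
    return mu + sigma * torch.randn_like(sigma)

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for factorized Gaussians (the Eq. (11) regularizer)."""
    return (torch.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5).sum()
```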
Referring now to
The method 100 selects a representative subset of the current task's dataset at block 104. More particularly, at block 104, a subset of one of the datasets, Dt, obtained from the training database 102, is randomly and uniformly chosen. This subset of the dataset Dt is the coreset, Ct. The coreset is stored in memory 114 for subsequent use at inference time and is excluded from Dt; the resulting dataset is denoted as Dt\Ct herein.
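A simple sketch of such a per-class uniform selection is shown below; the function and variable names are illustrative.

```python
import random
from collections import defaultdict

def select_coreset(dataset, per_class):
    """Uniformly select `per_class` samples from each class to form the coreset Ct.

    dataset: iterable of (x, y) pairs for the current task Dt.
    Returns (coreset, remainder), where remainder plays the role of the remaining set.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append((x, y))
    coreset, remainder = [], []
    for y, items in by_class.items():
        random.shuffle(items)                  # uniform random choice within each class
        coreset.extend(items[:per_class])
        remainder.extend(items[per_class:])
    return coreset, remainder
```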
After selecting the coreset, the method 100 comprises performing feature extraction at block 106 on the coreset Ct and dataset Dt\Ct to respectively map features from the coreset Ct and dataset Dt\Ct to a feature space. An artificial neural network, such as a convolutional neural network or multilayer perceptron, may be used to perform this feature extraction.
Extracted features are mapped to a Hilbert space at block 108. Advantageously, and in at least some implementations, mapping features from a Cartesian space to a Hilbert space allows a kernel for a task to be computed in a more efficient manner. Also at block 108, random Fourier features are computed using the features extracted from the coreset Ct at block 106. At block 110, a task-specific kernel is determined using the random Fourier features determined at block 108 over Ct. At block 112, the features determined over Dt\Ct at block 106 are classified and their corresponding labels are predicted. After label prediction, the predicted labels are compared with the ground truth and the cross-entropy loss is determined; in this way, the performance of the method is evaluated and penalized accordingly. By backpropagating the loss, the feature extractor used at block 106 and/or the random feature generation performed at block 108 may be improved and ideally optimized. Thus, at a high level and in some implementations, the training phase includes receiving a task dataset for the current task; representing the current task using a representative dataset that is a subset of the task dataset (the coreset dataset); extracting representative features; and using the extracted features to perform random feature generation, with the generated random features used to compute the kernel and thereby construct a classifier, as sketched in the example below. Notably, for each task observed, a kernel is computed to represent that task and may be viewed in the Hilbert space.
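The example referred to above is sketched here: it combines the shared feature extractor, the random Fourier feature map, the closed-form kernel ridge regression classifier, and the cross-entropy loss into one training step. The helper names (backbone, rff_map) and the optimizer handling are assumptions of the sketch, not limitations of the method.

```python
import torch
import torch.nn.functional as F

def train_step_current_task(backbone, rff_map, coreset_x, coreset_y,
                            batch_x, batch_y, lam=0.1, optimizer=None):
    """One training step for the current task, tying blocks 106-112 together.

    backbone is the shared feature extractor h_theta; rff_map maps backbone features
    to random Fourier features psi(.). coreset_* hold the coreset Ct, batch_* a
    mini-batch from the remaining set. All names here are illustrative.
    """
    psi_core = rff_map(backbone(coreset_x))                  # feature extraction + RFFs on Ct (blocks 106/108)
    psi_batch = rff_map(backbone(batch_x))                   # feature extraction + RFFs on the remaining set
    n = psi_core.shape[0]
    K = psi_core @ psi_core.T                                # task-specific kernel (block 110)
    eye = torch.eye(n, dtype=K.dtype, device=K.device)
    y_onehot = F.one_hot(coreset_y).to(K.dtype)
    alpha = torch.linalg.solve(lam * eye + K, y_onehot)      # closed-form KRR coefficients
    logits = (psi_batch @ psi_core.T) @ alpha                # classify the remaining samples (block 112)
    loss = F.cross_entropy(logits, batch_y)                  # softmax + cross-entropy against ground truth
    if optimizer is not None:
        optimizer.zero_grad()
        loss.backward()                                      # backpropagate to improve h_theta and the RFF map
        optimizer.step()
    return loss
```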
Referring now to
Referring now to
Experiments are conducted on four benchmark datasets for continual learning. Ablation studies are performed to demonstrate the effectiveness of kernels for continual learning as well as the benefit of variational random features in learning data-driven kernels. Four different datasets are used: Permuted MNIST, Rotated MNIST, Split CIFAR100, and Split miniImageNet.
Permuted MNIST: Following Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al., Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences, 114(13):3521-3526, 2017, 20 different MNIST datasets are generated. Each dataset is created by a specific pixel permutation of the input images, without changing their corresponding labels. Each dataset has its own permutation, determined by its own random seed.
Rotated MNIST: Similar to Permuted MNIST, Rotated MNIST has 20 tasks, as in Mirzadeh, S. I., Farajtabar, M., Pascanu, R., and Ghasemzadeh, H., Understanding the role of training regimes in continual learning, arXiv preprint arXiv:2006.06958, 2020. Each task's dataset is a specific rotation of the original MNIST dataset (e.g., task 1, task 2, and task 3 are the original MNIST dataset, a 10-degree rotation, and a 20-degree rotation, respectively). Each task's dataset is accordingly a ten-degree rotation of the previous task's dataset.
Split CIFAR100: As described in Zenke, F., Poole, B., and Ganguli, S., Continual learning through synaptic intelligence, Proceedings of Machine Learning Research, 70:3987, 2017, this benchmark is generated by dividing the CIFAR100 dataset into 20 sections. Each section comprises 5 out of the 100 labels (without replacement) from CIFAR100. Hence, the benchmark contains 20 tasks, and each task is a 5-way classification problem.
Split miniImageNet: Similar to Split CIFAR100, the miniImageNet benchmark, as described in Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., and Wierstra, D., Matching Networks for One Shot Learning, arXiv:1606.04080v2 [cs.LG], 2017, contains 100 classes, a subset of the original ImageNet dataset described in Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A., and Fei-Fei, L., ImageNet Large Scale Visual Recognition Challenge, arXiv:1409.0575v3 [cs.CV]. It has 20 disjoint tasks, and each task contains 5 classes.
The “average accuracy” and “average forgetting” metrics are used to evaluate performance, as described below.
Average Accuracy: This score reports the model's accuracy after training on t consecutive tasks is finished. That is:

A_t = (1/t) Σ_{i=1}^{t} a_{t,i},

where a_{t,i} refers to the model's performance on task i after it has been trained on task t.
Average Forgetting: This metric measures the per-task decline in accuracy between the highest accuracy attained for that task and the final accuracy reached after model training is finished.
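For clarity, both metrics may be computed from the accuracy matrix produced by the earlier protocol sketch; the indexing convention (0-based rows, row t holding the accuracies recorded after training on task t+1) is an assumption of this example.

```python
def average_accuracy(acc_matrix, t):
    """A_t = (1/t) * sum_i a_{t,i}: mean accuracy over tasks 1..t after training on task t."""
    return sum(acc_matrix[t - 1][:t]) / t

def average_forgetting(acc_matrix):
    """Mean drop per task between its best accuracy ever reached and its final accuracy."""
    T = len(acc_matrix)
    drops = []
    for i in range(T - 1):                                  # the final task cannot yet be forgotten
        best = max(acc_matrix[t][i] for t in range(i, T))
        drops.append(best - acc_matrix[T - 1][i])
    return sum(drops) / len(drops) if drops else 0.0
```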
Taken together, the two metrics allow the assessment of how well a continual learner achieves its classification target while overcoming forgetting.
In at least some example embodiments, the system for kernel continual learning comprises three networks: a shared backbone hθ, a posterior network fϕ, and a prior network fγ. An overview of the system 1100 for kernel continual learning is depicted in
On the left of
In at least some embodiments, the system 1100 of
For the Permuted MNIST and Rotated MNIST benchmarks, hθ contains only two hidden layers, each with 256 neurons, followed by a ReLU activation function. For Split CIFAR100, a ResNet18 architecture similar to that of Mirzadeh, S. I., Farajtabar, M., Pascanu, R., and Ghasemzadeh, H., Understanding the role of training regimes in continual learning, arXiv preprint arXiv:2006.06958, 2020 is used, and for miniImageNet, a ResNet18 architecture similar to that of Chaudhry, A., Khan, N., Dokania, P., and Torr, P. H. S., Continual Learning in Low-rank Orthogonal Subspaces, arXiv:2010.11635v2 [cs.LG] is used. With regard to the fγ and fϕ networks, three hidden layers followed by an ELU activation function are used. The number of neurons in each layer depends on the benchmark: on Permuted MNIST and Rotated MNIST, there are 256 neurons per layer, while 160 and 512 neurons per layer are used for Split CIFAR100 and miniImageNet, respectively. To make fair comparisons, the model is trained for only one epoch per task, namely, each sample in the dataset is observed only once, and the batch size is set to 10. Other optimization settings such as weight decay, learning rate decay, and dropout are set to the same values as in Mirzadeh, S. I., Farajtabar, M., Pascanu, R., and Ghasemzadeh, H., Understanding the role of training regimes in continual learning, arXiv preprint arXiv:2006.06958, 2020. The model is implemented in PyTorch.
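As a hedged illustration only, the MNIST-benchmark backbone described above may look like the following; the input dimension of 784 assumes flattened 28x28 MNIST images, and the ResNet18 backbones used for the other benchmarks are not reproduced here.

```python
import torch.nn as nn

class MLPBackbone(nn.Module):
    """Two hidden layers of 256 units with ReLU, matching the MNIST-benchmark description."""
    def __init__(self, in_dim=784, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x.flatten(1))     # flatten 28x28 images to 784-dimensional vectors
```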
To demonstrate the effectiveness of kernels for continual learning, classifiers based on kernel ridge regression using commonly-used linear, polynomial, radial basis function (RBF) kernels, and the above-described variational random Fourier features are established. Results are reported on Split CIFAR100, where 5 different random seeds are sampled. For each random seed, the model is trained over different kernels. Finally, the result for each kernel is estimated by averaging over their corresponding random seeds. For fair comparison, all kernels are computed using the same coreset of size 20.
The results are shown in Table 1, below. All kernels perform well: the radial basis function (RBF) kernel obtains a modest average accuracy in comparison to the other basic kernels, such as the linear and polynomial kernels. The linear and polynomial kernels perform similarly. The kernels obtained from variational random features (VRF) achieve the best performance in comparison to the other kernels, and the data-driven VRF kernel works better than its uninformative counterpart. This emphasizes that the prior incorporated in VRF is more informative because it is data-driven.
Regarding VRF,
More particularly, the robustness of kernel continual learning when the number of tasks increases is considered in
To further demonstrate the memory benefit of data-driven kernel learning, variational random features are compared with a predefined RBF kernel in
Since kernel continual learning does not need to replay and only uses memory for inference, the coreset size plays a crucial role. Its influence is therefore ablated on Rotated MNIST, Permuted MNIST, and Split CIFAR100 by varying the coreset size over 1, 2, 5, 10, 20, 30, 40, and 50 samples. Here, the number of random bases is set to 1024 for Rotated MNIST and Permuted MNIST, and 2048 for Split CIFAR100. The results in
When approximating VRF kernels, the number of random Fourier bases is a hyperparameter. In principle, a larger number of random Fourier bases achieves a better approximation of the kernel, leading to better classification accuracy. Here its effect on continual learning accuracy is investigated. Results with different numbers of bases are shown in
Kernel continual learning is compared with alternative methods on four benchmarks. The accuracy and forgetting scores in Table 5, below, for Rotated MNIST, Permuted MNIST, and Split CIFAR100 are all adopted from Mirzadeh, S. I., Farajtabar, M., Pascanu, R., and Ghasemzadeh, H., Understanding the role of training regimes in continual learning, arXiv preprint arXiv:2006.06958, 2020, and results for miniImageNet are from Chaudhry, A., Khan, N., Dokania, P., and Torr, P. H. S., Continual Learning in Low-rank Orthogonal Subspaces, arXiv:2010.11635v2 [cs.LG]. The column “if” indicates whether a model utilizes a memory, and if so, the column “when” denotes whether the memory data are used at training time or at test time. Kernel continual learning achieves better performance in terms of both average accuracy and average forgetting. Moreover, as compared to memory-based methods such as A-GEM and ER-Reservoir, which replay over previous tasks (when=Train), kernel continual learning does not require replay, enabling kernel continual learning of at least some embodiments to be efficient during training time. Also, for the most challenging miniImageNet dataset, kernel continual learning performs better than the other methods, both in terms of accuracy and forgetting. In
As another example application, a kernel-based classifier as described herein may be applied to perform recognition of hand-written digits at different rotation angles. Each rotation angle corresponds to a task, and those tasks are analyzed sequentially. Once trained on the current task of a certain angle, the kernel-based classifier recognizes digits at the various angles on which it has previously been trained, without a need to retrain the model.
As described herein, kernel continual learning is a simple but effective variation of continual learning with kernel-based classifiers. To mitigate catastrophic forgetting, instead of using shared classifiers across tasks, task-specific classifiers are trained based on kernel ridge regression. Specifically, an episodic memory is used to store a subset of training samples for each task, which is referred to as the coreset. Kernel learning is formulated as a variational inference problem by treating random Fourier bases as the latent variable to be inferred from the coreset. By doing so, an adaptive kernel is generated for each task while requiring a relatively small memory size.
The processor used in the foregoing embodiments may comprise, for example, a processing unit (such as a processor, microprocessor, or programmable logic controller) or a microcontroller (which comprises both a processing unit and a non-transitory computer readable medium). Examples of computer readable media that are non-transitory include disc-based media such as CD-ROMs and DVDs, magnetic media such as hard drives and other forms of magnetic disk storage, semiconductor based media such as flash media, random access memory (including DRAM and SRAM), and read only memory. As an alternative to an implementation that relies on processor-executed computer program code, a hardware-based implementation may be used. For example, an application-specific integrated circuit (ASIC), field programmable gate array (FPGA), system-on-a-chip (SoC), or other suitable type of hardware implementation may be used as an alternative to or to supplement an implementation that relies primarily on a processor executing computer program code stored on a computer medium.
The embodiments have been described above with reference to flow, sequence, and block diagrams of methods, apparatuses, systems, and computer program products. In this regard, the depicted flow, sequence, and block diagrams illustrate the architecture, functionality, and operation of implementations of various embodiments. For instance, each block of the flow and block diagrams and operation in the sequence diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified action(s). In some alternative embodiments, the action(s) noted in that block or operation may occur out of the order noted in those figures. For example, two blocks or operations shown in succession may, in some embodiments, be executed substantially concurrently, or the blocks or operations may sometimes be executed in the reverse order, depending upon the functionality involved. Some specific examples of the foregoing have been noted above but those noted examples are not necessarily the only examples. Each block of the flow and block diagrams and operation of the sequence diagrams, and combinations of those blocks and operations, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. Accordingly, as used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise (e.g., a reference in the claims to “a challenge” or “the challenge” does not exclude embodiments in which multiple challenges are used). It will be further understood that the terms “comprises” and “comprising”, when used in this specification, specify the presence of one or more stated features, integers, steps, operations, elements, and components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and groups. Directional terms such as “top”, “bottom”, “upwards”, “downwards”, “vertically”, and “laterally” are used in the following description for the purpose of providing relative reference only, and are not intended to suggest any limitations on how any article is to be positioned during use, or to be mounted in an assembly or relative to an environment. Additionally, the term “connect” and variants of it such as “connected”, “connects”, and “connecting” as used in this description are intended to include indirect and direct connections unless otherwise indicated. For example, if a first device is connected to a second device, that coupling may be through a direct connection or through an indirect connection via other devices and connections. Similarly, if the first device is communicatively connected to the second device, communication may be through a direct connection or through an indirect connection via other devices and connections. The term “and/or” as used herein in conjunction with a list means any one or more items from that list. For example, “A, B, and/or C” means “any one or more of A, B, and C”.
It is contemplated that any part of any aspect or embodiment discussed in this specification can be implemented or combined with any part of any other aspect or embodiment discussed in this specification.
The scope of the claims should not be limited by the embodiments set forth in the above examples, but should be given the broadest interpretation consistent with the description as a whole.
It should be recognized that features and aspects of the various examples provided above can be combined into further examples that also fall within the scope of the present disclosure. In addition, the figures are not to scale and may have size and shape exaggerated for illustrative purposes.