The present invention relates to the field of digital computer systems, and more specifically, to a method for factorizing hypervectors.
Hypervectors may be factorized using resonator networks and bundling operations. Resonator networks are a type of recurrent neural network that interleaves vector symbolic architecture multiplication operations and pattern completion. Given a hypervector formed from an element-wise product of two or more atomic hypervectors (each from a fixed codebook), the resonator network may find its factors. The resonator network may iteratively search over the alternatives for each factor individually rather than all possible combinations until a set of factors is found that agrees with the input hypervector.
Various embodiments provide a method for performing factorization using a factorization system, computer program product and computer system as described by the subject matter of the independent claims. Advantageous embodiments are described in the dependent claims. Embodiments of the present invention can be freely combined with each other if they are not mutually exclusive.
In one aspect, the invention relates to a method for performing factorization using a factorization system, the factorization system comprising a resonator network that is configured for performing an iterative process in order to factorize an input hypervector into individual hypervectors representing a set of concepts respectively, the iterative process comprising for each concept of the set of concepts at least: an inference step for computing an unbound version of a hypervector representing the concept by an unbinding operation between the input hypervector and estimate hypervectors of the other concepts, a similarity step to compute a similarity vector indicating a similarity of the unbound version with each candidate code hypervector of the concept, and a superposition step to generate an estimate of a hypervector representing the concept; the method comprising: providing for each step of the iterative process alternative implementations of the step; receiving an input hypervector representing a data structure; selecting from the provided implementations for each step of the iterative process a specific implementation of the step; executing the iterative process using the selected implementations, thereby factorizing the input hypervector.
In one aspect the invention relates to a factorization system, the factorization system comprising a resonator network that is configured for performing an iterative process in order to factorize an input hypervector into individual hypervectors representing a set of concepts respectively, the iterative process comprising for each concept of the set of concepts at least: an inference step for computing an unbound version of a hypervector representing the concept by an unbinding operation between the input hypervector and estimate hypervectors of the other concepts, a similarity step to compute a similarity vector indicating a similarity of the unbound version with each candidate code hypervector of the concept, and a superposition step to generate an estimate of a hypervector representing the concept; the factorization system being configured for: providing for each step of the iterative process alternative implementations of the step; receiving an input hypervector representing a data structure; selecting from the provided implementations for each step of the iterative process a specific implementation of the step; executing the iterative process using the selected implementations, thereby factorizing the input hypervector.
Embodiments may further include approaches to factorize an input hypervector representing a plurality of concepts into individual hypervectors each representing a concept from the plurality of concepts, through iterative processing of a resonator network. Further, embodiments may involve receiving an input hypervector representing a data structure comprised of a plurality of concepts. Further, embodiments may involve unbinding the input hypervector into a plurality of unbound hypervectors, wherein each of the plurality of unbound hypervectors corresponds to a single concept from the plurality of concepts of the data structure. Further, embodiments may involve generating one or more similarity vectors for each of the unbound hypervectors, wherein each of the one or more similarity vectors is based on the similarity of the unbound hypervector and one or more candidate code hypervectors representing each of the plurality of concepts. Further, embodiments may involve generating a plurality of principal hypervectors, wherein each principal hypervector is an estimate which represents one concept of the plurality of concepts, based at least in part on the one or more similarity vectors corresponding to the one concept.
Further, embodiments may involve generating a similarity vector based on one of the following: a dot product, L1 norm, L2 norm, or L∞ norm. Further, embodiments may involve unbinding the input hypervector based on a circular convolution or an addition and modulo operation. Further, embodiments may involve adding noise to the similarity vector of the hypervector, wherein the noise is gaussian noise or uniform noise. Further, embodiments may comprise separating the similarity vectors based on a softmax operation or an identity operation. Further, embodiments may comprise sparsifying one or more elements of the similarity vectors, wherein sparsifying is based on one of the following: a pre-determined threshold, a dynamic threshold, a Top-A operation, or an absolute value larger than a mean of all elements. Further, in an embodiment, generating the plurality of principal hypervectors comprises combining, through a linear combination, each of the candidate code hypervectors with weights based on the sparsified similarity vector corresponding to the candidate code hypervector to generate a plurality of bundled weights, and applying the plurality of bundled weights to a selection function. Further, in an embodiment, the approach that factorizes an input hypervector representing a plurality of concepts into individual hypervectors, each representing a concept from the plurality of concepts, through iterative processing of a resonator network performs a plurality of iterations until a convergence criterion is fulfilled. In an embodiment, the convergence criterion is fulfilled when a value of at least one element of each of the plurality of similarity scores exceeds a threshold, and in another embodiment the convergence criterion is fulfilled when a predefined number of iterations is reached.
An embodiment of the invention may be a computer program product comprising one or more computer readable storage devices and program instructions stored on the one or more computer readable storage devices, wherein the program instructions are executable by a computer processor to perform one or more operations or processes described throughout this specification.
An embodiment of the invention may be a computer system comprising one or more computer processors, one or more computer readable storage devices, and program instructions stored on the one or more computer readable storage devices for execution by at least one of the one or more computer processors to perform one or more of the operations or processes described throughout this specification.
While the embodiments described herein are amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the particular embodiments described are not to be taken in a limiting sense. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure.
The descriptions of the various embodiments of the present invention will be presented for purposes of illustration but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Hyperdimensional computing (HDC) represents data as large vectors called hypervectors. An entity may be represented using these hypervectors. A hypervector may be a vector of bits, integers, real or complex numbers. The hypervector is a vector having a dimension D higher than a minimum dimension, e.g., D>100. The hypervector according to the present subject matter may be a sparse hypervector. The sparse hypervector may comprise a fraction of non-zeros which is smaller than a predefined maximum fraction (e.g., the maximum fraction may be 10%). The sparsity of the hypervectors may be chosen or may be dictated by the encoder (e.g., such as a neural network) that produced the hypervectors. HDC may enable computations on hypervectors via a set of mathematical operations. These operations may include a bundling operation. The bundling operation may also be referred to as addition, superposition, chunking, or merging. The bundling operation may combine several hypervectors into a single hypervector.
In one example, the hypervector may be segmented according to the present subject matter into a set of blocks so that a hypervector comprises a set of S blocks, each block having a dimension L, wherein D=S×L and S is the number of blocks in a hypervector. In one example, the hypervector may comprise a number S of blocks, wherein the block size may be higher than one, L>1. That is, each block of each hypervector may comprise L elements. The processing of the hypervectors may be performed blockwise.
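For illustration only, the following minimal sketch (assuming NumPy and arbitrarily chosen values of D, S and L) shows how a hypervector of dimension D may be viewed as S blocks of size L with D=S×L:

```python
import numpy as np

D, S = 12, 4                    # illustrative dimension and number of blocks
L = D // S                      # block size, so that D = S * L

hv = np.zeros(D, dtype=np.int8)
hv[[1, 3, 8, 11]] = 1           # a sparse hypervector with one non-zero per block

blocks = hv.reshape(S, L)       # blockwise view: row i is the i-th block
print(blocks)
```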
According to one example, the hypervector comprises binary values {0, 1}^D and has a sparsity smaller than a sparsity threshold. The sparsity may, for example, be the fraction of non-zero values in the hypervector. The present subject matter may enable an efficient bundling of hypervectors with controlled sparsity. According to one example, the sparsity threshold is in a range of 0.3% to 50%. Example values of the sparsity threshold may be 0.39%, 0.4%, 1% or 13%.
The factorization system may be used for querying data structures through their hypervector representations. Data structures may enable the representation of cognitive concepts, such as colours, shapes, positions, etc. Each cognitive concept may comprise items; e.g., items of the colour concept may comprise red, green, blue, etc. The data structure may contain a combination (e.g., product) of multiple components each representing a cognitive concept. For example, the data structure may be an image of a red disk in the bottom right and a green rectangle in the top left, wherein the cognitive concepts may be the colour, shape, and position. In another example, a data structure may form a distributed representation of a tree, wherein each leaf in the tree may represent a concept, and each type of traversal operation in the tree may represent a concept. The data structure may be encoded by an encoder into a hypervector that uniquely represents the data structure.
The encoder may combine hypervectors that represent individual concepts with operations in order to represent a data structure. For example, the above-mentioned image may be described as a combination of multiplication (or binding) and addition (or superposition) operations as follows: (bottom right*red*disk)+(top left*green*rectangle). The encoder may represent the image using hypervectors that represent the individual concepts and said operations to obtain the representation of the image as a single hypervector that distinctively represents the knowledge that the disk is red and placed at the bottom right and the rectangle is green and placed at the top left. The encoder may be defined by a vector space of a set of hypervectors which encode a set of cognitive concepts and algebraic operations on this set. The algebraic operations may, for example, comprise a superposition or bundling operation and a binding operation. In addition, the algebraic operations may comprise a permutation operation. The vector space may, for example, be a D-dimensional space, where D>100. The hypervector may be a D-dimensional vector comprising D numbers that define the coordinates of a point in the vector space. The D-dimensional hypervectors may be in {0,1}^D. For example, a hypervector may be understood as a line drawn from the origin to the coordinates specified by the hypervector. The length of the line may be the hypervector's magnitude. The direction of the hypervector may encode the meaning of the representation. The similarity in meaning may be measured by the size of the angles between hypervectors. This may typically be quantified as a dot product between hypervectors. The encoder may be a decomposable (i.e., factored) model to represent the data structures. This may be advantageous as access to the hypervectors may be decomposed into the primitive or atomic hypervectors that represent the individual items of the concepts in the data structure. For example, the encoder may use a Vector Symbolic Architecture (VSA) technique in order to represent the data structure by a hypervector. The encoder may enable an element-wise multiply operation to be performed. The encoder may, for example, comprise a trained feed-forward neural network.
Hence, the encoding of data structures may be based on a predefined set of F concepts, where F>1, and candidate items that belong to each of the F concepts. Each candidate item may be represented by a respective hypervector. Each concept may be represented by a matrix of the hypervectors representing candidate items of the concept, e.g., each column of the matrix may be a distinct hypervector. The matrix may be referred to as a codebook, and the hypervector representing one item of the concept may be referred to as a code hypervector. The components of the code hypervector may, for example, be randomly chosen. For example, a codebook representing the concept of colours may comprise seven possible colours as candidate items, a codebook representing the concept of shapes may comprise 26 possible shapes as candidate items, etc. The codebooks representing the set of concepts may be referred to as X1, X2 . . . XF respectively. Each i-th codebook Xi may comprise Mi code hypervectors.
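By way of a hedged illustration, the following sketch builds codebooks as matrices whose columns are randomly chosen sparse block-code hypervectors (one non-zero per block); the function name random_codebook and the chosen sizes are illustrative assumptions, not part of the present subject matter:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_codebook(num_items, S, L):
    """Return a (S*L) x num_items matrix; each column is a random sparse
    block-code hypervector with exactly one active element per block."""
    X = np.zeros((S * L, num_items))
    for m in range(num_items):
        offsets = rng.integers(0, L, size=S)           # one active position per block
        X[np.arange(S) * L + offsets, m] = 1.0
    return X

S, L = 64, 16                                          # illustrative block structure
colour_codebook = random_codebook(7, S, L)             # e.g., 7 possible colours
shape_codebook = random_codebook(26, S, L)             # e.g., 26 possible shapes
```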
Querying such data structures through their hypervector representations may require decoding the hypervectors. Decoding such hypervectors may be performed by testing every combination of code hypervectors. However, this may be very resource consuming. The present subject matter may solve this issue by using the factorization system. The factorization system may perform factorization using a resonator network. The resonator network may be an iterative approach. In particular, the resonator network can efficiently decode a given hypervector without needing to directly test every combination of factors making use of the fact that the superposition operation is used for the encoding of multiple concept items in the given hypervector and the fact that randomized code hypervectors may be highly likely to be close to orthogonal in the vector space, meaning that they can be superposed without much interference. For that, the resonator network may search for possible factorizations of the given hypervector by combining a strategy of superposition and clean-up memory. The clean-up memory may reduce some crosstalk noise between the superposed concept items. The resonator network combines the strategy of superposition and clean-up memory to efficiently search over the combinatorially large space of possible factorizations.
Thus, in each iteration, the resonator network may be configured to execute a sequence of steps (or processing steps) such as the inference step, the similarity step etc. For each step, the present subject matter may provide alternative possible implementations of the step. For example, the similarity step may be implemented by using dot product, L1 norm, L2 norm, L∞ norm etc. Thus, before executing the resonator network, a specific implementation may be selected for each step of the iterative process. In one selection example, the implementation of a given step may be randomly selected from the possible implementations of the step. This may particularly be advantageous in case the implementations are equally performant. In one selection example, the implementation of a given step may be selected based on received user input (e.g., the user input may indicate which implementation to use for the given step). The resonator may thus be executed to factorize hypervectors using the selected implementations.
The term “implementation” of a step X as used herein refers to a software module. The software module may be a group of code representing instructions that can be executed at a computing system or processor to perform the step X. The software module may be provided as a software application, a Dynamic Link Library (DLL), a software object, a software function, a software engine, an executable binary software file or the like. For example, the implementation of the similarity step may be a software module whose execution would compute the similarity step using a specific technique such as L2 norm that is associated to the implementation.
However, hypervectors may be sparse, meaning that they contain a small fraction of non-zeros. This may render operations such as binding of hypervectors problematic, and thus the factorization using the resonator network may not be accurate. The sparse hypervector may be a hypervector comprising a fraction of non-zeros which is smaller than a predefined maximum fraction (e.g., the maximum fraction may be 10%). The fraction of non-zeros may be the ratio of the non-zeros to the total number D of elements of the hypervector. The present subject matter may solve this issue by processing the hypervectors at block level rather than at individual element level during the iterative process. For that, the hypervector may be segmented into a set of blocks so that a hypervector comprises a set of S blocks, each block having a dimension L, wherein D=S×L. S is the number of blocks in a hypervector, which may also be the number of non-zeros in the hypervector. The blocks may enable blockwise operations, that is, an operation involving hypervectors may be performed on corresponding blocks of the hypervectors. For example, an addition of two hypervectors may comprise addition of pairs of blocks of the two hypervectors, wherein each pair comprises two corresponding blocks (e.g., the first block of the first hypervector is processed with the first block of the second hypervector, the second block of the first hypervector is processed with the second block of the second hypervector, and so forth). The iterative process may process the hypervectors blockwise in one or more steps of the iterative process. These steps may involve binding and bundling operations. The blockwise binding and unbinding operations of two hypervectors x and y may be performed on corresponding blocks of the hypervectors. For example, the binding operation x⊙y, where ⊙ refers to the binding operation, may be performed using the circular convolution of hypervectors x and y. The unbinding operation x⊘y, where ⊘ refers to the unbinding operation, may be performed using the circular correlation of hypervectors x and y. The iterative process may stop if a convergence criterion is fulfilled. The convergence criterion may, for example, require a predefined number of iterations to be reached.
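The following is a minimal, non-limiting sketch (assuming NumPy and fully sparse binary blocks) of how the blockwise binding and unbinding operations may be realized with circular convolution and circular correlation per block; the function names are illustrative:

```python
import numpy as np

def block_bind(x, y, S, L):
    """Blockwise binding x (*) y: circular convolution of corresponding blocks."""
    xb, yb = x.reshape(S, L), y.reshape(S, L)
    return np.concatenate([np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
                           for a, b in zip(xb, yb)])

def block_unbind(x, y, S, L):
    """Blockwise unbinding x (/) y: circular correlation of corresponding blocks."""
    xb, yb = x.reshape(S, L), y.reshape(S, L)
    return np.concatenate([np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))
                           for a, b in zip(xb, yb)])

# For fully sparse binary blocks (one 1 per block), unbinding inverts binding:
# block_unbind(block_bind(x, y, S, L), y, S, L) recovers x (up to numerical noise).
```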
Thus, the iterative process may comprise multiple steps which are executed in each iteration of the process. Each of these steps may be associated with multiple possible implementations. The present subject matter may efficiently factorize the hypervector representing a data structure into the primitives from which it is composed. For example, given a hypervector formed from an element-wise product of two or more hypervectors, its factors (i.e., the two or more hypervectors) may be efficiently found. This way, a nearest-neighbour lookup may need only search over the alternatives for each factor individually rather than over all possible combinations. This may reduce the number of operations involved in every iteration of the resonator network and hence reduce the complexity of execution. This may also enable solving larger problems (at a fixed dimension) and improve the robustness against noisy input hypervectors.
Assuming, for a simplified description of the iterative process of the resonator network, that the set of concepts comprises three concepts, i.e., F=3, although it is not limited thereto. The codebooks/matrices representing the set of concepts may be referred to as X, Y and Z respectively (i.e., X=X1, Y=X2 and Z=X3). The codebook X may comprise Mx code hypervectors x1 . . . xMx; similarly, the codebook Y may comprise My code hypervectors and the codebook Z may comprise Mz code hypervectors.
Given the hypervector s that represents the data structure and given the set of predefined concepts, an initialization step may be performed by initializing an estimate of the hypervector that represents each concept of the set of concepts. The initial estimates x̂(0), ŷ(0) and ẑ(0) may, for example, be defined as a superposition of all candidate code hypervectors of the respective concept, e.g., x̂(0)=g(Σi=1, . . . , Mx xi), and similarly for ŷ(0) and ẑ(0).
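As a hedged illustration of this initialization (assuming the codebook is stored as a matrix whose columns are the code hypervectors, and that a selection function g may optionally be applied):

```python
import numpy as np

def init_estimate(codebook, g=None):
    """Initial estimate of a concept: superposition (element-wise sum) of all
    candidate code hypervectors, optionally passed through a selection function g."""
    estimate = codebook.sum(axis=1)
    return g(estimate) if g is not None else estimate

# x_hat0 = init_estimate(X); y_hat0 = init_estimate(Y); z_hat0 = init_estimate(Z)
```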
And, for each current iteration t of the iterative process, the following may be performed. Unbound hypervectors x̃(t), ỹ(t) and z̃(t) may be computed. Each of the unbound hypervectors may be an estimate of the hypervector that represents the respective concept of the set of concepts. Each of the unbound hypervectors may be inferred from the hypervector s based on the estimates of hypervectors for the other remaining F−1 concepts of the set of concepts. The unbound hypervectors may be computed as follows: x̃(t)=s⊘ŷ(t)⊘ẑ(t), ỹ(t)=s⊘x̂(t)⊘ẑ(t) and z̃(t)=s⊘x̂(t)⊘ŷ(t), where ⊘ refers to the unbinding operation. The unbinding operation of two hypervectors may be performed using different implementations such as a circular convolution of the two hypervectors or an addition and modulo operation. For example, if the circular convolution implementation is selected, the unbinding operation for producing the unbound hypervectors may be performed using the circular convolution. This may be referred to as an inference step. An example implementation of the addition and modulo operation is described below.
The inference step may, however, be noisy if many estimates (e.g., F−1 is high) are tested simultaneously. The unbound hypervectors x̃(t), ỹ(t) and z̃(t) may be noisy. This noise may result from crosstalk of many quasi-orthogonal code hypervectors, and may be reduced through a clean-up memory. After providing the unbound version of a hypervector of a given concept, the clean-up memory may be used to find the similarity of each code hypervector of said concept to the unbound version of the hypervector. This may be referred to as a similarity step. The similarity may be computed using a selected one of different implementations such as a dot product, L2 norm and L1 norm. For example, if the dot product implementation is selected, the similarity may be computed as a dot product of the codebook that represents said concept by the unbound version of the hypervector, resulting in an attention vector ax(t), ay(t) and az(t) respectively. The attention vector may be referred to herein as a similarity vector. The similarity vectors ax(t), ay(t) and az(t) have sizes Mx, My and Mz respectively and may be obtained as follows: ax(t)=XTx̃(t), ay(t)=YTỹ(t) and az(t)=ZTz̃(t). The obtained similarity vectors are provided using a dot product similarity; however, the similarity step is not limited thereto, as other similarity metrics may be used and may be implemented differently. For example, the similarity vector ax(t) may indicate a similarity of the unbound hypervector x̃(t) with each candidate code hypervector of the concept (X), e.g., the largest element of ax(t) may indicate the code hypervector which matches best the unbound hypervector x̃(t). The similarity vector ay(t) may indicate a similarity of the unbound hypervector ỹ(t) with each candidate code hypervector of the concept (Y), e.g., the largest element of ay(t) may indicate the code hypervector which matches best the unbound hypervector ỹ(t). The similarity vector az(t) may indicate a similarity of the unbound hypervector z̃(t) with each candidate code hypervector of the concept (Z), e.g., the largest element of az(t) indicates the code hypervector which matches best the unbound hypervector z̃(t).
A superposition (or bundling) using the similarity vectors ax(t), ay(t) and az(t) as weights may be performed, optionally followed by the application of a selection function g. This may be referred to as the superposition step. This superposition step may be performed using the similarity vectors ax(t), ay(t) and az(t) as follows: x̂(t+1)=g(Xax(t)), ŷ(t+1)=g(Yay(t)) and ẑ(t+1)=g(Zaz(t)) respectively, in order to obtain the current estimates x̂(t+1), ŷ(t+1) and ẑ(t+1) respectively of the hypervectors that represent the set of concepts. In other words, the superposition step generates each of the estimates x̂(t+1), ŷ(t+1) and ẑ(t+1) representing the respective concept by a linear combination of the candidate code hypervectors (provided in respective matrices X, Y and Z), with weights given by the respective similarity vectors ax(t), ay(t) and az(t), optionally followed by the application of the selection function g. The superposition step may involve a bundling operation which may be performed according to the present method. The bundling operation may be performed using a selected implementation of different possible implementations as described herein. For example, the bundling operation may be performed by weighted additive superposition of code hypervectors, where the weights are given by the values of the similarity vector. The superposition may, for example, involve one or more bundling operations which are performed by the present method. Hence, the current estimates of the hypervectors representing the set of concepts respectively may be defined as follows: x̂(t+1)=g(XXT(s⊘ŷ(t)⊘ẑ(t))), ŷ(t+1)=g(YYT(s⊘x̂(t)⊘ẑ(t))) and ẑ(t+1)=g(ZZT(s⊘x̂(t)⊘ŷ(t))), where g is the selection function, for example, an argmax function.
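To tie the three steps together, the following condensed sketch (an assumption-laden illustration, not the only possible implementation) performs one resonator iteration for F=3 using blockwise circular correlation for unbinding, dot-product similarity, and a per-block winner-take-all as the selection function g:

```python
import numpy as np

def block_unbind(x, y, S, L):
    """Blockwise unbinding via circular correlation of corresponding blocks."""
    xb, yb = x.reshape(S, L), y.reshape(S, L)
    return np.concatenate([np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))
                           for a, b in zip(xb, yb)])

def select(v, S, L):
    """Selection function g: keep the largest element per block (winner-take-all)."""
    blocks = v.reshape(S, L)
    out = np.zeros_like(blocks)
    out[np.arange(S), blocks.argmax(axis=1)] = 1.0
    return out.reshape(-1)

def resonator_step(s, X, Y, Z, x_hat, y_hat, z_hat, S, L):
    # inference step: unbind the input with the estimates of the other concepts
    x_til = block_unbind(block_unbind(s, y_hat, S, L), z_hat, S, L)
    y_til = block_unbind(block_unbind(s, x_hat, S, L), z_hat, S, L)
    z_til = block_unbind(block_unbind(s, x_hat, S, L), y_hat, S, L)
    # similarity step: dot product with every candidate code hypervector
    a_x, a_y, a_z = X.T @ x_til, Y.T @ y_til, Z.T @ z_til
    # superposition step: weighted bundling of code hypervectors, then selection
    return select(X @ a_x, S, L), select(Y @ a_y, S, L), select(Z @ a_z, S, L)
```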
The iterative process may stop if a convergence criterion is fulfilled. The convergence criterion may, for example, require that the value of at least one element of each similarity vector ax(t), ay(t) and az(t) exceeds a threshold. In another example, the convergence criterion may require a predefined number of iterations to be reached.
The present subject matter may enable an efficient bundling of hypervectors with arbitrary sparsity. In one example implementation of the bundling operation, given hypervectors A, B and C of size D each, the bundling operation B=A+C may be performed as follows: each of the hypervectors A and C may be segmented into S blocks, the element-wise sum of the hypervectors A and C may be performed to obtain the sum hypervector B, elements of each block of the sum hypervector B that fulfill a selection criterion may be selected and thus preserved, and the non-selected elements may be set to zero in the sum hypervector B. In addition, the present subject matter may provide different implementations of the bundling operation by using different selection criteria. According to one example implementation of the bundling operation, the selection criterion requires a predefined number a of largest elements per block of the sum hypervector. In one example, the predefined number a may be a hyperparameter with a randomly selected value or a user-defined value. This example may enable a bundling procedure which may have a significantly larger bundling capacity compared to the previous state of the art. According to another example implementation of the bundling operation, the selection criterion requires that an element of the block exceed a threshold value, wherein if no element exceeds the threshold value in a block, all elements of the block are selected. This example may provide a threshold-based bundling operation which may be more amenable to hardware implementations as it may reduce to element-wise comparisons. According to one example, the method further comprises normalizing the bundled hypervector per block.
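A minimal sketch of the top-a selection criterion, assuming NumPy and a list (or array) of equally sized hypervectors; the function name bundle_top_a is illustrative:

```python
import numpy as np

def bundle_top_a(hypervectors, S, L, a=1):
    """Bundle by element-wise summation, then keep only the a largest elements
    per block of the sum hypervector and set the remaining elements to zero."""
    total = np.sum(hypervectors, axis=0).astype(float).reshape(S, L)
    out = np.zeros_like(total)
    top = np.argsort(total, axis=1)[:, -a:]            # indices of the a largest per block
    rows = np.repeat(np.arange(S)[:, None], a, axis=1)
    out[rows, top] = total[rows, top]
    return out.reshape(-1)
```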
The present resonator implementation may mitigate the following limitations of an existing resonator. The existing sparse resonator may not factorize S=4 sparse hypervectors at problem sizes larger than M=10^3 for a number of 3 codebooks with a size of 10 per codebook. This may limit the deep neural network (DNN) application to problems with up to 10^3 classes, making extreme classification problems (>100 k classes) not viable with S=4. The existing resonator may not factorize S=2 sparse hypervectors at any problem size. Concretely, at problem size M=1024, the existing resonator may achieve less than 10% factorization accuracy. The architecture of the existing resonator may be rigid and may not allow for simple modifications which may boost factorization accuracy depending on the problem at hand.
According to one example, the similarity step may further comprise a step of adding noise to the similarity vectors. The adding noise step may be performed using a selected one of different implementations such as an implementation that adds gaussian noise and another implementation that adds uniform noise.
According to one example, the similarity step may further comprise a similarity separation step for applying a function to the similarity vectors. The similarity separation step may be performed using a selected one of different implementations such as an implementation that uses a softmax function and another implementation that uses an identity function.
According to one example, the similarity step comprises a step of sparsifying the similarity vector before the superposition step is performed on the sparsified similarity vector. That is, the similarity vectors ax(t), ay(t) and az(t) are sparsified in order to obtain the sparsified similarity vectors a′x(t), a′y(t) and a′z(t) respectively. The sparsification step may, for example, be implemented using one of multiple possible implementations, such as an implementation as described below. The sparsification of the similarity vector may be performed, in accordance with a selected implementation, by activating a portion of the elements of the similarity vector and deactivating the remaining portion of the elements of the similarity vector. Activating an element of the similarity vector means that the element may be used or considered when an operation is performed on the similarity vector. Deactivating an element of the similarity vector means that the element may not be used or considered when an operation is performed on the similarity vector. For example, a′x(t)=kact(ax(t)), a′y(t)=kact(ay(t)) and a′z(t)=kact(az(t)), where kact is an activation function. In this case, the superposition step described above may be performed on the sparsified similarity vectors a′x(t), a′y(t) and a′z(t) (instead of the similarity vectors ax(t), ay(t) and az(t)) as follows: x̂(t+1)=g(Xa′x(t)), ŷ(t+1)=g(Ya′y(t)) and ẑ(t+1)=g(Za′z(t)) respectively, in order to obtain the current estimates x̂(t+1), ŷ(t+1) and ẑ(t+1) respectively of the hypervectors that represent the set of concepts. In other words, the superposition step generates each of the estimates x̂(t+1), ŷ(t+1) and ẑ(t+1) representing the respective concept by a linear combination of the candidate code hypervectors (provided in respective matrices X, Y and Z), with weights given by the respective sparsified similarity vectors a′x(t), a′y(t) and a′z(t), followed by the application of the selection function g. The weights given by the sparsified similarity vector are the values of the sparsified similarity vector. Hence, the current estimates of the hypervectors representing the set of concepts respectively may be defined as follows: x̂(t+1)=g(Xkact(XT(s⊘ŷ(t)⊘ẑ(t)))), ŷ(t+1)=g(Ykact(YT(s⊘x̂(t)⊘ẑ(t)))) and ẑ(t+1)=g(Zkact(ZT(s⊘x̂(t)⊘ŷ(t)))).
This example may be advantageous because the sparsification may result in doing only a part of vector multiplication-addition operations instead of all Mx, My or Mz operations and thus may save processing resources. The present subject matter may provide different implementations for performing the sparsification step.
In one first example implementation of the sparsification step, the activation function kact may only activate the top j values in each of the similarity vectors ax(t), ay(t) and az(t), where j<<Mx, j<<My and j<<Mz respectively, and deactivate the rest of elements by setting them to a given value (e.g., zero) to produce a′x(t), a′y(t) and a′z(t) respectively. The top j values of a similarity vector may be obtained by sorting the values of the similarity vector and selecting the j first ranked values. j may, for example, be a configurable parameter whose value may change e.g., depending on available resources.
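For instance, a top-j sparsification of a similarity vector might be sketched as follows (the value of j and the function name are illustrative):

```python
import numpy as np

def sparsify_top_j(a, j):
    """Keep the j largest entries of the similarity vector a and zero the rest."""
    out = np.zeros_like(a)
    idx = np.argpartition(a, -j)[-j:]       # indices of the top-j values (unsorted)
    out[idx] = a[idx]
    return out

# e.g., a_x_sparse = sparsify_top_j(a_x, j=4)
```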
This example may be advantageous because the sparsification may reduce the amount of computations, increase the size of solvable problems by an order of magnitude at a fixed vector dimension, and improve the robustness against noisy input vectors.
In a second example implementation of the sparsification step, the activation function kact may activate each element in each of the similarity vectors ax(t), ay(t) and az(t) only if its absolute value is larger than the mean of all elements of the respective similarity vector. The mean is determined using the absolute values of the similarity vector.
This example may be advantageous because the sparsification may improve the computational complexity of the first example implementation by removing the sort operation needed to find the top-j elements.
In a third example implementation of the sparsification step, the activation function kact may be implemented as follows: in case the maximum value of the sparsified similarity vector exceeds a predefined threshold, the maximum value may be maintained and remaining elements of the sparsified similarity vector may be set to zero. This may be referred to as a pullup activation.
The following is an example of the addition and modulo operation. For example, for binding two blocks of two hypervectors, addition and modulo binding may only work if the blocks are fully sparse, meaning that each block has only one non-zero element. Additionally, this operation assumes that the non-zero element is a 1, that is, the blocks are binary blocks. For example, the first block is [0, 1, 0, 0] and the second block is [0, 0, 0, 1]. Because the blocks are fully sparse, they can equivalently be represented by only showing the location within the block of the non-zero element; this may be called the offset representation. This may result in the following representations of the two blocks: [0, 1, 0, 0]<->1 and [0, 0, 0, 1]<->3. The representation “1” indicates that the second element of the first block has value 1. The representation “3” indicates that the fourth element of the second block has value 1. In this case, the two hypervectors may be bound to each other by computing the sum of their offset representations. However, simply summing the two offset representations may result in a block which is longer than the two inputs to the binding operation. To alleviate this, the final bound block is calculated as the sum of the offset representations modulo the block length. In this case, the block length L is 4, so the binding results in: (1+3) mod 4=0, which is equivalent to the vector [1, 0, 0, 0], because the offset representation “0” refers to the first element of the block. Circular convolution may implement this exact operation in case the inputs are fully sparse and binary. It is, however, not restricted to such inputs; it is well-defined for all finite-length blocks with bounded elements.
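The worked example above may be reproduced with the following short sketch (function names are illustrative; the blocks are assumed to be fully sparse and binary):

```python
import numpy as np

L = 4                                        # block length from the example

def to_offset(block):
    """Offset representation of a fully sparse binary block (a single 1)."""
    return int(np.argmax(block))

def bind_offsets(block_a, block_b, L):
    """Addition-and-modulo binding of two fully sparse binary blocks."""
    offset = (to_offset(block_a) + to_offset(block_b)) % L
    bound = np.zeros(L)
    bound[offset] = 1
    return bound

a = np.array([0, 1, 0, 0])                   # offset representation 1
b = np.array([0, 0, 0, 1])                   # offset representation 3
print(bind_offsets(a, b, L))                 # (1 + 3) mod 4 = 0 -> [1. 0. 0. 0.]
```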
The resonator network system 100 may be configured to execute a resonator network to decode hypervectors that are encoded in a vector space defined by three concepts. The codebooks representing the set of concepts may be referred to as X, Y and Z respectively. The codebook X may comprise Mx code hypervectors x1 . . . xMx; similarly, the codebook Y may comprise My code hypervectors and the codebook Z may comprise Mz code hypervectors.
An input hypervector 101 named s may be received by the resonator network system 100. The input hypervector s may be the result of encoding a data structure such as a coloured image comprising MNIST digits. The encoding may be performed by a VSA technique. At an initial state t=0, the resonator network system 100 may initialize an estimate of the hypervector that represents each concept of the set of concepts as a superposition of all candidate code hypervectors of said concept as follows: x̂(0)=g(Σi=1, . . . , Mx xi), and similarly for ŷ(0) and ẑ(0).
The operation of the resonator network system 100 may be described for a current iteration t. The network nodes 102x, 102y and 102z may receive simultaneously or substantially simultaneously the respective triplets (s, ŷ(t), ẑ(t)), (s, x̂(t), ẑ(t)) and (s, x̂(t), ŷ(t)). The three network nodes may compute the unbound versions x̃(t), ỹ(t) and z̃(t) of the hypervectors that represent the set of concepts respectively as follows: x̃(t)=s⊘ŷ(t)⊘ẑ(t), ỹ(t)=s⊘x̂(t)⊘ẑ(t) and z̃(t)=s⊘x̂(t)⊘ŷ(t), where ⊘ refers to blockwise unbinding. This may be referred to as an inference step. That is, the nodes may perform the inference step on their respective input triplets. The blockwise unbinding of hypervectors may, for example, be performed using the circular correlation between the hypervectors.
The similarity of the unbound version x̃(t) with each of the Mx code hypervectors x1 . . . xMx may be computed by multiplying the hypervector x̃(t) by the matrix XT. The similarity of the unbound version ỹ(t) with each of the My code hypervectors y1 . . . yMy may be computed by multiplying the hypervector ỹ(t) by the matrix YT. The similarity of the unbound version z̃(t) with each of the Mz code hypervectors z1 . . . zMz may be computed by multiplying the hypervector z̃(t) by the matrix ZT. The resulting vectors ax(t), ay(t) and az(t) may be named similarity vectors or attention vectors. The largest element of each of the similarity vectors ax(t), ay(t) and az(t) indicates the code hypervector which matches best the unbound version x̃(t), ỹ(t) and z̃(t) respectively.
After computing the similarity vectors, the similarity vectors ax(t), ay(t) and az(t) may optionally be sparsified using the activation function kact implemented by the activation units 106x, 106y and 106z respectively. The sparsification of the similarity vector may be performed by activating a portion of the elements of the similarity vector. For that, the activation function kact may be used to activate said portion of elements as follows: a′x(t)=kact(ax(t)), a′y(t)=kact(ay(t)) and a′z(t)=kact(az(t)). The modified/sparsified similarity vectors a′x(t), a′y(t) and a′z(t) may be the output of the similarity step. Thus, for each concept of the set of concepts, the similarity step may receive as input the respective one of the unbound versions x̃(t), ỹ(t) and z̃(t) and provide as output the respective one of the modified similarity vectors a′x(t), a′y(t) and a′z(t).
After obtaining the modified similarity vectors a′x(t), a′y(t) and a′z(t), a superposition step may be applied on the modified similarity vectors a′x(t), a′y(t) and a′z(t). In case the sparsification is not performed, the superposition step may be performed on the similarity vectors ax(t), ay(t) and az(t).
In one first example implementation of the superposition step, a weighted superposition with the modified similarity vectors a′x(t), a′y(t) and a′z(t) as weights may be performed using the codebooks XT, YT and ZT stored in memories 108x, 108y, and 108z respectively. This may be performed by the following matrix-vector multiplications: Xa′x(t), Ya′y(t) and Za′z(t). The resulting hypervectors Xa′x(t), Ya′y(t) and Za′z(t) may be fed to the selection units 110x, 110y and 110z respectively. This may enable obtaining the estimates of the hypervectors x̂(t+1), ŷ(t+1) and ẑ(t+1) respectively for the next iteration t+1 as follows: x̂(t+1)=g(Xa′x(t)), ŷ(t+1)=g(Ya′y(t)) and ẑ(t+1)=g(Za′z(t)). This may enable the superposition step of the iterative process. For each concept of the concepts, the superposition step may receive as input the respective one of the modified similarity vectors a′x(t), a′y(t) and a′z(t) and provide as output the respective one of the hypervectors x̂(t+1), ŷ(t+1) and ẑ(t+1). Hence, the estimates of the hypervectors representing the set of concepts respectively may be defined according to the present system as follows: x̂(t+1)=g(Xkact(XT(s⊘ŷ(t)⊘ẑ(t)))), ŷ(t+1)=g(Ykact(YT(s⊘x̂(t)⊘ẑ(t)))) and ẑ(t+1)=g(Zkact(ZT(s⊘x̂(t)⊘ŷ(t)))), where g is the selection function.
The iterative process may stop if a stopping criterion is fulfilled. The stopping criterion may, for example, require that x̂(t+1)=x̂(t), ŷ(t+1)=ŷ(t) and ẑ(t+1)=ẑ(t), or that a maximum number of iterations is reached.
The factorization system 300 may, for example, be a resonator network as described with reference to
Each step of the iterative process may be implemented using alternative implementations or methods. This is indicated in
Hence, appropriately selected and configured modules can have a significant positive impact on both factorization accuracy and noise resilience.
To configure the factorization system 300, an optimal number of the factors and codebooks may be chosen. To find the optimal number of the factors and codebooks, a grid-search-related algorithm can be used. For example, a search may be performed so that the number of the factors F may be significantly smaller than the codebook size M, i.e., F<<M.
Multiple characteristics to identify the modular sparse factorization aspect of the factorization system 300 may be as follows. Any system factorizing a given D-dimensional product vector in less time than a brute-force approach. The input may be in the sparse domain. The input does not have to lie within the domain of a binary sparse block code or a sparse block code. Any product vector with a relatively small number of non-zero, real-valued elements may be applicable. The system's output defines a set of F factors, necessarily revealing the number of factors the system operates with. The number of output factors F should be in the range of the optimal number of factors.
The element-wise sum of the set of hypervectors may be computed in step 11. This may result in a hypervector named sum hypervector. The blocks of the sum hypervector may be determined in step 13. For example, the sum hypervector may be segmented into a defined number of blocks. Alternatively, the set of hypervectors may be segmented into the defined number of blocks, and the sum hypervector may automatically obtain the block structure. Elements of each block of the sum hypervector that fulfill a selection criterion may be selected in step 15. The non-selected elements of the sum hypervector may be set in step 17 to zero in the sum hypervector, resulting in a bundled hypervector. The selected elements are preserved.
Four hypervectors 501.1 through 501.4 are to be bundled. Each hypervector of the four hypervectors 501.1 through 501.4 may comprise two blocks. The sum of the four hypervectors 501.1 through 501.4 may be performed element-wise to obtain the sum hypervector 502. A selection of the a-many largest elements per block of the sum hypervector 502 may be performed. In the example of
Four hypervectors 601.1 through 601.4 are to be bundled. Each hypervector of the four hypervectors 601.1 through 601.4 may comprise two blocks. The sum of the four hypervectors 601.1 through 601.4 may be performed element-wise to obtain the sum hypervector 602. All elements of the sum hypervector 602 may be compared with a fixed threshold value. The elements that exceed the threshold are preserved, while all others are set to 0. In each block, if no element exceeds the threshold, the entire block is preserved. This may result in the hypervector 603. The blocks of the hypervector 603 are then normalized in order to obtain the bundled hypervector 604. This threshold-based bundling operation may be more amenable to hardware implementations as it may reduce to element-wise comparisons. The variance in the number of activated elements may be significantly larger than that of the top-a based implementation shown in
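The threshold-based variant may be sketched as follows (a hedged illustration assuming NumPy; the per-block L2 normalization is one possible choice of normalization):

```python
import numpy as np

def bundle_threshold(hypervectors, S, L, threshold):
    """Bundle by element-wise summation; per block, keep elements that exceed the
    threshold (or the whole block if none exceeds it), then normalize each block."""
    total = np.sum(hypervectors, axis=0).astype(float).reshape(S, L)
    mask = total > threshold
    mask[~mask.any(axis=1), :] = True                  # keep whole block if nothing exceeds
    kept = total * mask
    norms = np.linalg.norm(kept, axis=1, keepdims=True)
    kept = np.divide(kept, norms, out=np.zeros_like(kept), where=norms > 0)
    return kept.reshape(-1)
```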
A granularity of hypervectors may be determined in step 801 so that a hypervector comprises a set of S blocks, each block having size L≥1, wherein D=S×L. For example, the block size may be higher than one, L>1. In other words, step 801 comprises determining for each hypervector a set of S blocks, each block having size L, where D=S×L. For example, the hypervector may be segmented or divided into a number of blocks that is equal to the number of non-zero values (e.g., non-zero value=1) in the hypervector so that each block may comprise one non-zero value. Each processed hypervector may have the same number S of blocks, but the positions/indices of the non-zero values within blocks may differ between the hypervectors.
A data structure may be represented in step 803 by a hypervector s using an encoder such as a VSA based encoder. The data structure may, for example, be a query image representing a visual scene. The encoder may be a feed-forward neural network that is trained to produce the hypervector s as a compound hypervector describing the input visual image. The image may comprise coloured MNIST digits. The components of the image may be the colour, shape, vertical and horizontal locations of the letters in the image. The encoder may, for example, be configured to compute a hypervector for each letter in the image by multiplying the related quasi-orthogonal hypervectors drawn from four fixed codebooks of four concepts: colour codebook (with 7 possible colours), shape codebook (with 26 possible shapes), vertical codebook (with 50 locations), and horizontal codebook (with 50 locations). The product vectors for every letter are added (component-wise) to produce the hypervector s describing the whole image.
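A schematic sketch of such an encoding (not the trained neural-network encoder itself) may bind one code hypervector per concept for each object and add the resulting product vectors; the codebook sizes follow the text, while the concrete indices, block structure and helper names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
S, L = 64, 16                                          # illustrative block structure
D = S * L

def random_codebook(num_items):
    X = np.zeros((D, num_items))
    for m in range(num_items):
        X[np.arange(S) * L + rng.integers(0, L, size=S), m] = 1.0
    return X

def block_bind(x, y):
    """Blockwise binding via circular convolution of corresponding blocks."""
    xb, yb = x.reshape(S, L), y.reshape(S, L)
    return np.concatenate([np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
                           for a, b in zip(xb, yb)])

colour, shape = random_codebook(7), random_codebook(26)      # 7 colours, 26 shapes
vert, horiz = random_codebook(50), random_codebook(50)       # 50 + 50 locations

# one object: colour 2, shape 5, vertical location 10, horizontal location 30
obj1 = block_bind(block_bind(colour[:, 2], shape[:, 5]),
                  block_bind(vert[:, 10], horiz[:, 30]))
# a second object; the product vectors are added component-wise
obj2 = block_bind(block_bind(colour[:, 0], shape[:, 12]),
                  block_bind(vert[:, 3], horiz[:, 44]))
s = obj1 + obj2                                        # compound hypervector for the scene
```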
For each step of the iterative process of the resonator network a desired implementation may be selected in step 804. For example, as shown in
The hypervector s may be decomposed in step 805 using the resonator network in accordance with the chosen or selected implementations and the determined blocks. The resonator network is configured to receive the input hypervector s and to perform an iterative process to factorize the input hypervector into individual hypervectors representing the set of concepts respectively. The iterative process comprises for each concept of the set of concepts: an inference step for computing an unbound version of a hypervector representing the concept by a blockwise unbinding operation between the input hypervector and estimate hypervectors of the other concepts, a similarity step to compute a similarity vector indicating a similarity of the unbound version with each candidate code hypervector of the concept, and a superposition step to generate an estimate of a hypervector representing the concept by a linear combination of the candidate code hypervectors, with weights given by the similarity vector. The superposition step may, for example, be performed as described with reference to
Computing environment 1800 contains an example of an environment for the execution of at least some of the computer code involved in performing the inventive methods, such as a code for hypervector factorization engine 1900. In addition to block 1900, computing environment 1800 includes, for example, computer 1801, wide area network (WAN) 1802, end user device (EUD) 1803, remote server 1804, public cloud 1805, and private cloud 1806. In this embodiment, computer 1801 includes processor set 1810 (including processing circuitry 1820 and cache 1821), communication fabric 1811, volatile memory 1812, persistent storage 1813 (including operating system 1822 and block 1900, as identified above), peripheral device set 1814 (including user interface (UI) device set 1823, storage 1824, and Internet of Things (IoT) sensor set 1825), and network module 1815. Remote server 1804 includes remote database 1830. Public cloud 1805 includes gateway 1840, cloud orchestration module 1841, host physical machine set 1842, virtual machine set 1843, and container set 1844.
COMPUTER 1801 may take the form of a desktop computer, laptop computer, tablet computer, smart phone, smart watch or other wearable computer, mainframe computer, quantum computer or any other form of computer or mobile device now known or to be developed in the future that is capable of running a program, accessing a network or querying a database, such as remote database 1830. As is well understood in the art of computer technology, and depending upon the technology, performance of a computer-implemented method may be distributed among multiple computers and/or between multiple locations. On the other hand, in this presentation of computing environment 1800, detailed discussion is focused on a single computer, specifically computer 1801, to keep the presentation as simple as possible. Computer 1801 may be located in a cloud, even though it is not shown in a cloud in
PROCESSOR SET 1810 includes one, or more, computer processors of any type now known or to be developed in the future. Processing circuitry 1820 may be distributed over multiple packages, for example, multiple, coordinated integrated circuit chips. Processing circuitry 1820 may implement multiple processor threads and/or multiple processor cores. Cache 1821 is memory that is located in the processor chip package(s) and is typically used for data or code that should be available for rapid access by the threads or cores running on processor set 1810. Cache memories are typically organized into multiple levels depending upon relative proximity to the processing circuitry. Alternatively, some, or all, of the cache for the processor set may be located “off chip.” In some computing environments, processor set 1810 may be designed for working with qubits and performing quantum computing.
Computer readable program instructions are typically loaded onto computer 1801 to cause a series of operational steps to be performed by processor set 1810 of computer 1801 and thereby effect a computer-implemented method, such that the instructions thus executed will instantiate the methods specified in flowcharts and/or narrative descriptions of computer-implemented methods included in this document (collectively referred to as “the inventive methods”). These computer readable program instructions are stored in various types of computer readable storage media, such as cache 1821 and the other storage media discussed below. The program instructions, and associated data, are accessed by processor set 1810 to control and direct performance of the inventive methods. In computing environment 1800, at least some of the instructions for performing the inventive methods may be stored in block 1900 in persistent storage 1813.
COMMUNICATION FABRIC 1811 is the signal conduction path that allows the various components of computer 1801 to communicate with each other. Typically, this fabric is made of switches and electrically conductive paths, such as the switches and electrically conductive paths that make up busses, bridges, physical input/output ports and the like. Other types of signal communication paths may be used, such as fiber optic communication paths and/or wireless communication paths.
VOLATILE MEMORY 1812 is any type of volatile memory now known or to be developed in the future. Examples include dynamic type random access memory (RAM) or static type RAM. Typically, volatile memory 1812 is characterized by random access, but this is not required unless affirmatively indicated. In computer 1801, the volatile memory 1812 is located in a single package and is internal to computer 1801, but, alternatively or additionally, the volatile memory may be distributed over multiple packages and/or located externally with respect to computer 1801.
PERSISTENT STORAGE 1813 is any form of non-volatile storage for computers that is now known or to be developed in the future. The non-volatility of this storage means that the stored data is maintained regardless of whether power is being supplied to computer 1801 and/or directly to persistent storage 1813. Persistent storage 1813 may be a read only memory (ROM), but typically at least a portion of the persistent storage allows writing of data, deletion of data and re-writing of data. Some familiar forms of persistent storage include magnetic disks and solid state storage devices. Operating system 1822 may take several forms, such as various known proprietary operating systems or open source Portable Operating System Interface-type operating systems that employ a kernel. The code included in block 1900 typically includes at least some of the computer code involved in performing the inventive methods.
PERIPHERAL DEVICE SET 1814 includes the set of peripheral devices of computer 1801. Data communication connections between the peripheral devices and the other components of computer 1801 may be implemented in various ways, such as Bluetooth connections, Near-Field Communication (NFC) connections, connections made by cables (such as universal serial bus (USB) type cables), insertion-type connections (for example, secure digital (SD) card), connections made through local area communication networks and even connections made through wide area networks such as the internet. In various embodiments, UI device set 1823 may include components such as a display screen, speaker, microphone, wearable devices (such as goggles and smart watches), keyboard, mouse, printer, touchpad, game controllers, and haptic devices. Storage 1824 is external storage, such as an external hard drive, or insertable storage, such as an SD card. Storage 1824 may be persistent and/or volatile. In some embodiments, storage 1824 may take the form of a quantum computing storage device for storing data in the form of qubits. In embodiments where computer 1801 is required to have a large amount of storage (for example, where computer 1801 locally stores and manages a large database) then this storage may be provided by peripheral storage devices designed for storing very large amounts of data, such as a storage area network (SAN) that is shared by multiple, geographically distributed computers. IoT sensor set 1825 is made up of sensors that can be used in Internet of Things applications. For example, one sensor may be a thermometer and another sensor may be a motion detector.
NETWORK MODULE 1815 is the collection of computer software, hardware, and firmware that allows computer 1801 to communicate with other computers through WAN 1802. Network module 1815 may include hardware, such as modems or Wi-Fi signal transceivers, software for packetizing and/or de-packetizing data for communication network transmission, and/or web browser software for communicating data over the internet. In some embodiments, network control functions and network forwarding functions of network module 1815 are performed on the same physical hardware device. In other embodiments (for example, embodiments that utilize software-defined networking (SDN)), the control functions and the forwarding functions of network module 1815 are performed on physically separate devices, such that the control functions manage several different network hardware devices. Computer readable program instructions for performing the inventive methods can typically be downloaded to computer 1801 from an external computer or external storage device through a network adapter card or network interface included in network module 1815.
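As a purely illustrative, non-limiting sketch of the packetizing and de-packetizing software mentioned above (the function names below are hypothetical and do not define network module 1815), a simple length-prefixed framing of a byte payload may be implemented as follows:

```python
# Illustrative sketch only: length-prefixed packetizing/de-packetizing of a byte
# payload, of the kind a network module's software might perform before handing
# data to a transport for transmission over a network such as WAN 1802.
import struct


def packetize(payload: bytes, max_size: int = 1024) -> list:
    """Split a payload into packets, each prefixed with a 4-byte big-endian length."""
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)]
    return [struct.pack(">I", len(chunk)) + chunk for chunk in chunks]


def depacketize(packets: list) -> bytes:
    """Reassemble the original payload from length-prefixed packets."""
    payload = bytearray()
    for packet in packets:
        (length,) = struct.unpack(">I", packet[:4])
        payload.extend(packet[4:4 + length])
    return bytes(payload)


if __name__ == "__main__":
    data = b"computer readable program instructions" * 100
    assert depacketize(packetize(data)) == data
```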
WAN 1802 is any wide area network (for example, the internet) capable of communicating computer data over non-local distances by any technology for communicating computer data, now known or to be developed in the future. In some embodiments, the WAN 1802 may be replaced and/or supplemented by local area networks (LANs) designed to communicate data between devices located in a local area, such as a Wi-Fi network. The WAN and/or LANs typically include computer hardware such as copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and edge servers.
END USER DEVICE (EUD) 1803 is any computer system that is used and controlled by an end user (for example, a customer of an enterprise that operates computer 1801), and may take any of the forms discussed above in connection with computer 1801. EUD 1803 typically receives helpful and useful data from the operations of computer 1801. For example, in a hypothetical case where computer 1801 is designed to provide a recommendation to an end user, this recommendation would typically be communicated from network module 1815 of computer 1801 through WAN 1802 to EUD 1803. In this way, EUD 1803 can display, or otherwise present, the recommendation to an end user. In some embodiments, EUD 1803 may be a client device, such as a thin client, heavy client, mainframe computer, desktop computer, and so on.
REMOTE SERVER 1804 is any computer system that serves at least some data and/or functionality to computer 1801. Remote server 1804 may be controlled and used by the same entity that operates computer 1801. Remote server 1804 represents the machine(s) that collect and store helpful and useful data for use by other computers, such as computer 1801. For example, in a hypothetical case where computer 1801 is designed and programmed to provide a recommendation based on historical data, then this historical data may be provided to computer 1801 from remote database 1830 of remote server 1804.
PUBLIC CLOUD 1805 is any computer system available for use by multiple entities that provides on-demand availability of computer system resources and/or other computer capabilities, especially data storage (cloud storage) and computing power, without direct active management by the user. Cloud computing typically leverages sharing of resources to achieve coherence and economies of scale. The direct and active management of the computing resources of public cloud 1805 is performed by the computer hardware and/or software of cloud orchestration module 1841. The computing resources provided by public cloud 1805 are typically implemented by virtual computing environments that run on the various computers making up host physical machine set 1842, which is the universe of physical computers in and/or available to public cloud 1805. The virtual computing environments (VCEs) typically take the form of virtual machines from virtual machine set 1843 and/or containers from container set 1844. It is understood that these VCEs may be stored as images and may be transferred among and between the various physical machine hosts, either as images or after instantiation of the VCE. Cloud orchestration module 1841 manages the transfer and storage of images, deploys new instantiations of VCEs and manages active instantiations of VCE deployments. Gateway 1840 is the collection of computer software, hardware, and firmware that allows public cloud 1805 to communicate through WAN 1802.
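By way of a non-limiting illustration of the relationship between stored images and active instantiations described above (the classes and names below are hypothetical and do not define cloud orchestration module 1841), a minimal sketch of storing images and deploying instantiations from them may look as follows:

```python
# Hypothetical sketch only: a VCE is stored as an image, and new active
# instantiations are deployed from that image onto physical hosts.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VCEImage:
    """A stored image from which a virtual machine or container can be instantiated."""
    name: str
    kind: str  # for example, "virtual_machine" or "container"


@dataclass
class Orchestrator:
    """Tracks stored images and the active instantiations deployed from them."""
    images: dict = field(default_factory=dict)
    active: list = field(default_factory=list)

    def store_image(self, image: VCEImage) -> None:
        # Manage the storage of an image so that it can later be instantiated.
        self.images[image.name] = image

    def instantiate(self, image_name: str, host: str) -> str:
        # Deploy a new instantiation of the VCE on one of the physical hosts.
        image = self.images[image_name]
        instance_id = "%s/%s@%s#%d" % (image.kind, image.name, host, len(self.active))
        self.active.append(instance_id)
        return instance_id


if __name__ == "__main__":
    orchestrator = Orchestrator()
    orchestrator.store_image(VCEImage(name="example-image", kind="container"))
    print(orchestrator.instantiate("example-image", host="example-host"))
```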
Some further explanation of VCEs will now be provided. VCEs can be stored as “images.” A new active instance of the VCE can be instantiated from the image. Two familiar types of VCEs are virtual machines and containers. A container is a VCE that uses operating-system-level virtualization. This refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances, called containers. These isolated user-space instances typically behave as real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can utilize all resources of that computer, such as connected devices, files and folders, network shares, CPU power, and quantifiable hardware capabilities. However, programs running inside a container can only use the contents of the container and devices assigned to the container, a feature which is known as containerization.
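As a non-limiting illustration of this isolation (a minimal sketch assuming a Linux host with cgroup v2 mounted at /sys/fs/cgroup; the helper function below is hypothetical), a program running inside a container can inspect the resource limits that the kernel assigns to its container rather than the capabilities of the whole machine:

```python
# Minimal sketch, assuming a Linux host with cgroup v2 mounted at /sys/fs/cgroup.
# A containerized process reads the limits the kernel assigns to its own cgroup;
# on other systems the control files are simply reported as unavailable.
from pathlib import Path


def read_cgroup_limit(control_file: str) -> str:
    """Return the raw value of a cgroup v2 control file, or 'unavailable'."""
    path = Path("/sys/fs/cgroup") / control_file
    try:
        return path.read_text().strip()
    except OSError:
        return "unavailable"


if __name__ == "__main__":
    # "max" means no explicit limit; inside a container these values typically
    # reflect the container's quota, not the resources of the host machine.
    print("memory.max:", read_cgroup_limit("memory.max"))
    print("cpu.max:   ", read_cgroup_limit("cpu.max"))
```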
PRIVATE CLOUD 1806 is similar to public cloud 1805, except that the computing resources are only available for use by a single enterprise. While private cloud 1806 is depicted as being in communication with WAN 1802, in other embodiments a private cloud may be disconnected from the internet entirely and only accessible through a local/private network. A hybrid cloud is a composition of multiple clouds of different types (for example, private, community or public cloud types), often respectively implemented by different vendors. Each of the multiple clouds remains a separate and discrete entity, but the larger hybrid cloud architecture is bound together by standardized or proprietary technology that enables orchestration, management, and/or data/application portability between the multiple constituent clouds. In this embodiment, public cloud 1805 and private cloud 1806 are both part of a larger hybrid cloud.
Various aspects of the present disclosure are described by narrative text, flowcharts, block diagrams of computer systems and/or block diagrams of the machine logic included in computer program product (CPP) embodiments. With respect to any flowcharts, depending upon the technology involved, the operations can be performed in a different order than what is shown in a given flowchart. For example, again depending upon the technology involved, two operations shown in successive flowchart blocks may be performed in reverse order, as a single integrated step, concurrently, or in a manner at least partially overlapping in time.
A computer program product embodiment (“CPP embodiment” or “CPP”) is a term used in the present disclosure to describe any set of one, or more, storage media (also called “mediums”) collectively included in a set of one, or more, storage devices that collectively include machine readable code corresponding to instructions and/or data for performing computer operations specified in a given CPP claim. A “storage device” is any tangible device that can retain and store instructions for use by a computer processor. Without limitation, the computer readable storage medium may be an electronic storage medium, a magnetic storage medium, an optical storage medium, an electromagnetic storage medium, a semiconductor storage medium, a mechanical storage medium, or any suitable combination of the foregoing. Some known types of storage devices that include these mediums include: diskette, hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or Flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanically encoded device (such as punch cards or pits/lands formed in a major surface of a disc) or any suitable combination of the foregoing. A computer readable storage medium, as that term is used in the present disclosure, is not to be construed as storage in the form of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide, light pulses passing through a fiber optic cable, electrical signals communicated through a wire, and/or other transmission media. As will be understood by those of skill in the art, data is typically moved at some occasional points in time during normal operations of a storage device, such as during access, de-fragmentation or garbage collection, but this does not render the storage device transitory because the data is not transitory while it is stored.