With rapid advances in technology, computing systems are increasingly prevalent in society today. Vast computing systems execute and support applications that communicate and process immense amounts of data, often under performance constraints imposed by the increasing demands of users. Increasing the efficiency, speed, and effectiveness of computing systems will further improve user experience.
Certain examples are described in the following detailed description and in reference to the drawings.
The discussion below refers to input vectors and feature vectors. An input vector may refer to any vector or set of values in an input space that represents an object and a feature vector may refer to a vector or set of values that represents the object in a feature space. Various transformational techniques may be used to map input vectors in the input space to feature vectors in the feature space. Kernel methods, for example, may rely on the mapping between the input space and the feature space such that the inner product of feature vectors in the feature space can be computed through a kernel function (which may also be denoted as the “kernel trick”). One such example is support vector machine (SVM) classification through the Gaussian kernel. Kernel methods, however, may be inefficient in that the direct mapping from the input space to the feature space is computationally expensive, or in some cases impossible (for example in the case of a Gaussian kernel where the feature space is infinite-dimensional).
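For reference, the Gaussian kernel referred to above may be written as follows, where A and B are input vectors and σ is a width parameter (this standard form is provided only as context):

K(A, B) = exp(−‖A − B‖² / (2σ²)) = ⟨φ(A), φ(B)⟩

Here φ denotes the mapping from the input space to the (infinite-dimensional) feature space induced by the kernel; the kernel function evaluates the inner product of φ(A) and φ(B) without ever forming φ(A) or φ(B) explicitly, which is the essence of the kernel trick noted above.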
Linear kernels are another form of machine learning that utilize input vectors, and may operate with increased effectiveness on specific types of input vectors (e.g., sparse, high-dimensional input vectors). However, when the input vectors are not of the specific types upon which such linear kernels can effectively operate, the accuracy of linear kernels may decrease. For linear kernels, no input-to-feature space mapping is performed (or the input-to-feature mapping is an identity mapping), and thus the effectiveness of linear kernels is largely dependent on the input vectors being in a format that linear kernels can effectively utilize. As such, real-time processing using linear kernels may provide increased speed and efficiency, but the accuracy of the linear kernel may be insufficient for application or user-specified requirements.
Examples consistent with the present disclosure may support generation of feature vectors using concomitant rank order (CRO) hash sets. As described below, a CRO hash set for an input vector may be computed with high efficiency, and using the CRO hash set to map an input vector to a corresponding feature vector may also yield accuracy comparable to use of non-linear kernel methods. In that regard, feature vector generation using CRO hash sets may provide a strong balance between the accuracy of non-linear kernel methods and the efficiency of linear kernels. As such, the features described herein may result in increased computational efficiency, reduced consumption of processing resources, and improvements in the efficiency and accuracy of real-time processing using machine learning. The features described herein may be useful for real-time applications that require both accuracy and speed in data processing, such as anomaly detection in video streaming, high frequency trading, and fraud detection.
The system 100 may generate feature vectors by mapping input vectors in an input space to feature vectors in a feature space. For a particular set of input vectors, the system 100 may generate a corresponding set of feature vectors. As described in greater detail below, the system 100 may generate sparse binary feature vectors from input vectors through use of concomitant rank order (CRO) hash sets determined for the input vectors. The system 100 may determine the CRO hash sets and generate the feature vectors in linear time, e.g., without costly vector product operations or other non-linear kernel training mechanisms that may consume significant processing resources.
Nonetheless, the feature vectors generated by the system 100 using the determined CRO hash sets may exhibit characteristics that approximate non-linear kernels trained using kernel methods or “kernel tricks”, including the Gaussian kernel in some examples. That is, the feature vectors generated by the system 100 may provide an accuracy similar to that of non-linear kernel methods, but also take the sparse binary form useful for linear kernels to support machine-learning applications with increased speed and efficiency. Such an accuracy may be unexpected as the feature vectors are generated without actual application of a non-linear kernel method. To further explain, the feature vectors generated by the system 100 may be generated efficiently, without the computationally expensive vector product operations required for non-linear kernel methods, yet provide an unexpected accuracy usually characteristic of such non-linear kernel methods. The system 100 may thus support feature vector generation with the accuracy of, for example, the Gaussian kernel, but also support the efficiency of linear kernels and other linear machine-learning mechanisms.
The system 100 may implement various engines to provide or support any of the features described herein. In the example shown in
The hardware for the engines 108, 110, and 112 may include a processing resource to execute programming instructions. A processing resource may include any number of processors with single or multiple cores, and a processing resource may be implemented through a single-processor or multi-processor architecture. In some examples, the system 100 implements multiple engines using the same system features or hardware components (e.g., a common processing resource).
The input engine 108, mapping engine 110, and application engine 112 may include components to support the generation and application of feature vectors. In the example implementation shown in
These and other aspects of feature vector generation using CRO hash sets are discussed in greater detail next.
The input vectors 210 may characterize elements of a physical system in any number of ways. In some implementations, the input vectors 210 characterize elements of a physical system through a multi-dimensional vector storing vector element values representing various characteristics or aspects of the physical system elements. In the example shown in
The mapping engine 110 may transform the input vectors 210 into the feature vectors 220. For each input vector received by the input engine 108, the mapping engine 110 may generate a corresponding feature vector, and do so by mapping the input vector in an input space to a corresponding feature vector in a feature space. In the example shown in
To generate the feature vector 221 from the input vector 211, the mapping engine 110 may determine a CRO hash set of the input vector 211. The CRO hash set of an input vector may include a predetermined number of hash values computed through application of a CRO hash function, which is described in greater detail below. In
The mapping engine 110 may determine a CRO hash set for an input vector according to any number of parameters. Two examples are shown in
Table 1 below illustrates an example process by which the mapping engine 110 may determine the CRO hash set for an input vector A. In Table 1, the input vector A may be defined as A ∈ R^N. In implementing or performing the example process, the mapping engine 110 may map input vectors to a CRO hash set with hash values chosen from the universe of 1 to U, where U is specified via the dimensionality parameter 231. The mapping engine 110 may also compute CRO hash sets using the hash numeral parameter 232, which may specify the number of hash values to compute for an input vector and which may be denoted as τ. As another part of the example CRO hash set computation process shown in Table 1, the mapping engine 110 may access, compute, or use a random permutation π of the integers 1 to U. The mapping engine 110 may utilize the same random permutation π for a particular set of input vectors or for input vectors of a particular source or particular vector type.
Referring now to Table 1 below, the vector −A represents the input vector A multiplied by −1, and the notation A, B, C, ... represents the concatenation of the vectors A, B, C, and so on.
Table 2 below illustrates example pseudo-code that the mapping engine 110 may implement or execute to determine CRO hash sets for input vectors. The pseudo-code below may be consistent with the form of Matlab code, but other implementations are possible.
As such, the mapping engine 110 may determine (e.g., compute) CRO hash sets for each of the input vectors 210.
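Tables 1 and 2 are not reproduced in this section. Purely for illustration, the following Python sketch shows a hypothetical rank-order-style hash with the same interface as the parameters described above (an input vector A, a universe size U given by the dimensionality parameter 231, and a hash count τ given by the hash numeral parameter 232). The seeded Gaussian projection used here is a stand-in assumption, not the CRO procedure of Table 1 or Table 2, and is not written for efficiency.

import numpy as np

def cro_style_hash_set(A, U, tau, seed=0):
    """Hypothetical stand-in for determining a hash set of tau values in 1..U.

    A fixed seed plays the role of the shared random permutation/projection,
    so every input vector in a set is hashed against the same randomness.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    G = rng.standard_normal((U, A.size))       # one random projection direction per candidate hash value
    scores = G @ A                             # score each of the U candidates against the input vector
    return set(np.argsort(scores)[:tau] + 1)   # indices of the tau smallest scores, reported 1-based

Under such a rank-order scheme, similar input vectors would tend to produce overlapping hash sets, which is the property the feature vectors described below rely on.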
Upon determining the CRO hash set for a particular input vector, the mapping engine 110 may generate a corresponding feature vector from the CRO hash set. In particular, the mapping engine 110 may generate the corresponding feature vector as a vector with dimensionality U (that is, the dimensionality parameter 231). Accordingly, the corresponding feature vector may have a number of vector elements (or, phrased another way, a vector length) equal to the dimensionality parameter 231. The mapping engine 110 may assign values to the U vector elements in the corresponding feature vector according to the CRO hash set determined for the input vector from which the feature vector is generated.
To illustrate, the CRO hash set determined for an input vector may include a number of hash values, each between 1 and U, and the mapping engine 110 may use the hash values in the CRO hash set as vector indices into the feature vector. For each vector element with a vector index represented by a hash value of the CRO hash set, the mapping engine 110 may assign a non-zero value in the feature vector (e.g., a ‘1’ value). For other vector elements with vector indices in the feature vector not represented by the hash values of the determined CRO hash set, the mapping engine 110 may assign a zero value (also denoted as a ‘0’ value). Such an example is shown in
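Expressed as a minimal Python sketch (the function name and use of a NumPy array are illustrative assumptions, not part of the disclosure), the hash-set-to-feature-vector assignment described above might look as follows:

import numpy as np

def feature_vector_from_hash_set(hash_set, U):
    """Build a length-U binary feature vector from a CRO hash set of values in 1..U."""
    feature = np.zeros(U, dtype=np.uint8)
    for h in hash_set:
        feature[h - 1] = 1    # 1-based hash value -> 0-based vector index, assigned a '1' value
    return feature            # all other vector elements keep their '0' value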
In some implementations, feature vectors generated by the mapping engine 110 using CRO hash sets may be sparse, binary, and high-dimensional. The sparsity, high dimensionality, and binary form of feature vectors generated by the mapping engine 110 may provide increased efficiency in subsequent machine-learning or other processing using the feature vectors.
Regarding sparsity, the sparsity of a feature vector may be measured as the ratio of non-zero vector elements present in the feature vector (which may be equal to the hash numeral parameter 232) to the total number of elements in the feature vector (which may be equal to the dimensionality parameter 231). Thus, the sparsity of the feature vector 221 may be measured as the value of the hash numeral parameter 232 divided by the value of the dimensionality parameter 231. For instance, a feature vector with 100 non-zero elements out of 100,000 total elements has a sparsity of 0.1%. Generated feature vectors may be considered sparse when the sparsity of the feature vector is less than a sparsity threshold, e.g., less than 0.25% or any other configurable or predetermined value.
Regarding dimensionality, the generated feature vectors may be high-dimensional when the vector length of the feature vectors exceeds a high-dimensional threshold. As noted above, the vector length of feature vectors generated by the mapping engine 110 may be controlled through the dimensionality parameter 231. Thus, generated feature vectors may be high-dimensional when the dimensionality parameter 231 (and thus the number of elements in the feature vectors) is set to a value that exceeds the high-dimensional threshold. As an example, a feature vector may be high-dimensional when the vector length exceeds 50,000 elements or any other configurable threshold. Regarding the binary vector characteristic, the mapping engine 110 may generate feature vectors to be binary by assigning a ‘1’ value to the vector elements with vector indices represented by the hash values of computed CRO hash sets. Such binary vectors may be subsequently processed with increased efficiency, and thus the mapping engine 110 may improve computer performance for data processing and various machine-learning tasks.
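As a purely hypothetical numerical illustration of these characteristics (the parameter values below are examples only and are not required by this disclosure):

U = 100_000    # dimensionality parameter 231: vector length of each generated feature vector
tau = 128      # hash numeral parameter 232: number of hash values, and thus of non-zero elements

sparsity = tau / U                   # 0.00128, i.e., 0.128%
is_sparse = sparsity < 0.0025        # below an example 0.25% sparsity threshold -> True
is_high_dimensional = U > 50_000     # exceeds an example 50,000-element threshold -> True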
As described above, the mapping engine 110 may generate a set of feature vectors from a set of input vectors using the CRO hash sets determined for the input vectors. The resulting set of feature vectors may exhibit various characteristics that may be beneficial to subsequent processing or use. In particular, feature vectors generated using CRO hash sets may correlate to (e.g., approximate or equate to) an “implicit” kernel. Such a kernel is referred to as “implicit” because the mapping engine 110 may generate feature vectors without explicit application of a kernel, without vector product operations, and without various other costly computations used in non-linear kernel methods. However, the generated feature vectors may be correlated to (e.g., characterized by) this implicit kernel in that the inner product of a pair of generated feature vectors yields the value of this implicit kernel.
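It follows from the construction described above that, for two input vectors A and B with CRO hash sets H(A) and H(B) and corresponding binary feature vectors f(A) and f(B), the implicit kernel value is the number of hash values the two sets share:

⟨f(A), f(B)⟩ = |H(A) ∩ H(B)|

Evaluating this quantity involves comparing at most τ hash values per vector, which may be one reason the approach avoids the dense vector products of explicit non-linear kernel evaluations.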
The implicit kernel (correlated to feature vectors generated using CRO hash sets) may approximate other kernels used in non-linear kernel methods. In some examples, the implicit kernel approximates the Gaussian kernel, which may also be referred to as the radial basis function (RBF) kernel. The implicit kernel may approximate the Gaussian kernel (or other kernels) within a difference threshold. The difference threshold may refer to a tolerance for the difference between kernel values of the implicit kernel and the Gaussian kernel, and may be expressed in absolute values (e.g., the difference is within 0.001) or in percentage (e.g., the difference is within 5%). One such comparison is shown in
Thus, the example graph 300 may illustrate that, across the x-axis values of the graph 300 (shown as cos(A, B)), the difference in kernel value between the implicit kernel and the Gaussian kernel does not exceed a difference threshold (e.g., a 0.001 value or 5%) at any point.
By approximating the Gaussian kernel, the implicit kernel may exhibit increased accuracy in applications of feature vectors generated using CRO hash sets (to which the implicit kernel is correlated). In that regard, the mapping engine 110 may generate feature vectors using CRO hash sets with increased efficiency and lower computational times (as no vector product operations are necessary), but nonetheless provide the accuracy and utility of non-linear kernel methods. As noted above, such a combination of accuracy and speed may be unexpected, as linear kernels lack the accuracy and effectiveness exhibited by feature vectors generated using CRO hash sets, while input-to-feature mappings through non-linear kernel methods are much more computationally expensive. Such feature vectors may thus provide elegant and efficient elements for use in machine learning, classification, clustering, and regression, and particularly for real-time analysis of large data sets in applications such as streaming analytics, fraud detection, and high-frequency trading.
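As one hypothetical illustration of such downstream use (the choice of SciPy, scikit-learn, and a linear support vector classifier here is an assumption made for the sketch, not a requirement of this disclosure), sparse binary feature vectors of the kind described above might be assembled and consumed as follows:

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import LinearSVC

def to_sparse_matrix(hash_sets, U):
    """Stack CRO hash sets into a CSR matrix of sparse binary feature vectors."""
    rows, cols = [], []
    for row, hash_set in enumerate(hash_sets):
        for h in hash_set:
            rows.append(row)
            cols.append(h - 1)                 # 1-based hash values -> 0-based column indices
    data = np.ones(len(rows), dtype=np.uint8)
    return csr_matrix((data, (rows, cols)), shape=(len(hash_sets), U))

# Toy, made-up hash sets and labels, used only to show the interface.
U = 1000
hash_sets = [{3, 17, 250}, {3, 18, 400}, {512, 600, 999}, {511, 600, 998}]
labels = [0, 0, 1, 1]

X = to_sparse_matrix(hash_sets, U)
classifier = LinearSVC().fit(X, labels)        # a linear model operating directly on the sparse binary features
print(classifier.predict(to_sparse_matrix([{3, 17, 400}], U)))

Because the feature vectors are binary and sparse, a linear model of this kind touches only the handful of non-zero entries per row, which is consistent with the efficiency benefits described above.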
The method 400 may include accessing input vectors in an input space, the input vectors characterizing elements of a physical system (402). In some examples, the input engine 108 may access the input vectors in real-time, for example as a data stream for anomaly detection in video data, as data characterizing high frequency trading, as image recognition data, or various online applications.
The method 400 may also include generating feature vectors from the input vectors (404), for example by the mapping engine 110. The feature vectors generated by the mapping engine 110 may correlate to input-to-feature vector transformations using an implicit kernel. Thus, an inner product of a pair of the feature vectors may correlate to an implicit kernel for the pair of feature vectors, and the implicit kernel may approximate a Gaussian kernel within a difference threshold. Moreover, the mapping engine 110 may generate the feature vectors without any vector product operations performed between any of the input vectors, which may allow for efficient feature vector computations with increased and unexpected accuracy.
As shown in
Although one example was shown in
The method 500 may include generating feature vectors from input vectors (502), for example by the mapping engine 110. The mapping engine 110 may generate the feature vectors in any of the ways described herein. For instance, for the method 500 shown in
The feature vectors generated by the mapping engine 110 may be high-dimensional, binary, and sparse. For instance, the dimensionality parameter accessed by the mapping engine 110 may exceed a high-dimension threshold, which may thus cause the mapping engine 110 to generate high-dimensional feature vectors. As another example, the mapping engine 110 may access the parameters such that a ratio between the hash numeral parameter and the dimensionality parameter is less than a sparsity threshold. In such examples, the mapping engine 110 may generate the corresponding set of feature vectors as sparse binary feature vectors.
Although one example was shown in
The system 600 may execute instructions stored on the machine-readable medium 620 through the processing resource 610. Executing the instructions may cause the system 600 to perform any of the features described herein, including according to any features of the input engine 108, the mapping engine 110, the application engine 112, or combinations thereof.
For example, execution of the instructions 622 and 624 by the processing resource 610 may cause the system 600 to access input vectors in an input space, the input vectors characterizing elements of a physical system (instructions 622) and generate, from the input vectors, sparse binary feature vectors in a feature space different from the input space (instructions 624). An inner product of a pair of the generated sparse binary feature vectors may correlate to an implicit kernel for the pair, and the implicit kernel may approximate a Gaussian kernel within a difference threshold, e.g., for the unit sphere. Generation of each sparse binary feature vector may be performed without any vector product operations, including without any vector product operations amongst the input vectors. Instead, generation of the sparse binary feature vectors may include determination of a CRO hash set for an input vector corresponding to a sparse binary feature vector; assignment of a ‘1’ value for vector elements of the sparse binary feature vector with vector indices equal to hash values of the CRO hash set; and assignment of a ‘0’ value for other vector elements of the sparse binary feature vector. In some implementations, each of the generated sparse binary feature vectors is sparse by having a ratio of vector elements with a ‘1’ value to total vector elements that is less than a sparsity threshold.
Continuing the example of
The systems, methods, devices, engines, and logic described above, including the input engine 108, mapping engine 110, and application engine 112, may be implemented in many different ways in many different combinations of hardware, logic, circuitry, and executable instructions stored on a machine-readable medium. For example, the input engine 108, the mapping engine 110, the application engine 112, or any combination thereof, may include circuitry in a controller, a microprocessor, or an application specific integrated circuit (ASIC), or may be implemented with discrete logic or components, or a combination of other types of analog or digital circuitry, combined on a single integrated circuit or distributed among multiple integrated circuits. A product, such as a computer program product, may include a storage medium and machine readable instructions stored on the medium, which when executed in an endpoint, computer system, or other device, cause the device to perform operations according to any of the description above, including according to any features of the input engine 108, mapping engine 110, and application engine 112.
The processing capability of the systems, devices, and engines described herein, including the input engine 108, mapping engine 110, and application engine 112, may be distributed among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems. Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in many ways, including data structures such as linked lists, hash tables, or implicit storage mechanisms. Programs may be parts (e.g., subroutines) of a single program, separate programs, distributed across several memories and processors, or implemented in many different ways, such as in a library (e.g., a shared library).
While various examples have been described above, many more implementations are possible.