Embodiments generally relate to machine learning. More particularly, embodiments relate to an automatic configurable sequence similarity inference system.
A machine learning (ML) network may provide a prediction or classification based on input data. The ML network may be trained with a manually curated set of training data, training algorithms, parameters, models, etc. selected for a particular application domain. However, existing solutions are designed to solve specific problems associated with a specific domain (e.g., ML modeling and analysis techniques developed for specific time-series/sequential data). The disadvantages of domain-specific solutions are worsened when dealing with an additional temporal dimension in the input data.
The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Turning now to
Embodiments of each of the above processor 11, memory 12, logic 13, and other system components may be implemented in hardware, software, or any suitable combination thereof. For example, hardware implementations may include configurable logic such as, for example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), or fixed-functionality logic hardware using circuit technology such as, for example, application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof.
Alternatively, or additionally, all or portions of these components may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more operating system (OS) applicable/appropriate programming languages, including an object-oriented programming language such as PYTHON, PERL, JAVA, SMALLTALK, C++, C# or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. For example, the memory 12, persistent storage media, or other system memory may store a set of instructions which when executed by the processor 11 cause the system 10 to implement one or more components, features, or aspects of the system 10 (e.g., the logic 13, testing the target query for universal similarity metrics over a range of parameters, transforming the sequence information, selecting training algorithms, selecting the set of parameters based on results of the test, automatically configuring the universal sequence model, etc.).
Turning now to
Embodiments of logic 22, and other components of the apparatus 20, may be implemented in hardware, software, or any combination thereof including at least a partial implementation in hardware. For example, hardware implementations may include configurable logic such as, for example, PLAs, FPGAs, CPLDs, or fixed-functionality logic hardware using circuit technology such as, for example, ASIC, CMOS, or TTL technology, or any combination thereof. Additionally, portions of these components may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as RAM, ROM, PROM, firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more OS applicable/appropriate programming languages, including an object-oriented programming language such as PYTHON, PERL, JAVA, SMALLTALK, C++, C# or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
The apparatus 20 (
Turning now to
Embodiments of the method 30 may be implemented in a system, apparatus, computer, device, etc., for example, such as those described herein. More particularly, hardware implementations of the method 30 may include configurable logic such as, for example, PLAs, FPGAs, CPLDs, or in fixed-functionality logic hardware using circuit technology such as, for example, ASIC, CMOS, or TTL technology, or any combination thereof. Alternatively, or additionally, the method 30 may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as RAM, ROM, PROM, firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more OS applicable/appropriate programming languages, including an object-oriented programming language such as PYTHON, PERL, JAVA, SMALLTALK, C++, C# or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
For example, the method 30 may be implemented on a computer readable medium as described in connection with Examples 20 to 25 below. Embodiments or portions of the method 30 may be implemented in firmware, applications (e.g., through an application programming interface (API)), or driver software running on an operating system (OS).
Some embodiments may advantageously provide an automatic configurable sequence similarity inference system. ML may be important for various intelligent software applications and/or devices. With many different application domains, however, scientists and engineers may dedicate a large amount of time to develop mostly single-domain application solutions for specific problems. Advantageously, some embodiments may provide a multiple-domain or universal domain sequence similarity inference system.
Some other ML and/or data analysis systems may be domain specific. Those skilled in the art may recognize that no single prediction algorithm is optimal for all problems. Accordingly, a significant amount of a data scientist's time may be spent on understanding characteristics of the data, cleansing and normalizing the data, and selecting proper parameters, features, and models for analysis and prediction. These steps may often be iterative and necessary to achieve good results. The problem may be exacerbated when dealing with additional temporal dimensions in the data. Such data may include time-series/sequential data existing in many application domains dealing with sensor readings, discrete events, DNA sequences, language processing, etc. Many modeling and analysis techniques have been developed for various domain-specific time-series/sequential data. Without disregarding the potential usefulness of domain-specific knowledge, some embodiments may provide a domain-independent sequence similarity inference system that may advantageously be automatically configured to achieve good accuracy for major prediction tasks on sequences, such as classification and next-state/event prediction. Some embodiments may also be extended to other tasks such as finding nearest neighbors, clustering, outlier detection, etc., based on the abstraction of similarity inferences. Some embodiments may provide analysis results comparable to domain-specific counterparts with a significant reduction in solution development costs.
Because of potential wide applicability and abstraction uniformity, some embodiments may be more readily implemented in hardware for a variety of applications. For example, some embodiments may combine and/or improve a variety of training/modelling techniques to produce an end-to-end universal sequence similarity inference system that may be automatically adapted into a wide range of sequence analysis problems and achieve competitive prediction results as compared to domain specific solutions.
In general, data analysis may be an iterative process in which data scientists examine the data, conduct tests and experiments, select models, adjust parameters, evaluate, and improve the process to reduce prediction errors. Some other systems may automate aspects of the iterative process for specific problems or may utilize meta-learning to select algorithms. These other systems may either not be general enough (e.g., ad hoc for a specific problem) or may be much more complex to set up and configure (e.g., meta-learning). Because of the non-uniformity of the underlying classifiers, the problem-specific systems may be more difficult to extend. In addition, performance-based selection may tend to be less predictable and may require a wider range of coverage to achieve good selection. Advantageously, some embodiments may provide automatic selection and configuration of training algorithms across one common abstraction and/or representation. In some embodiments, automatic configuration may need to be exercised only once per use case and may be resilient to incremental updates. In addition, some embodiments may be easier to implement. For example, optimizing a collection of training algorithms across a single representation for a hardware implementation may be easier than using a collection of disparate algorithms with different representations.
Some embodiments may provide an end-to-end ML system based on context-tree models and universal similarity metrics that may be configured automatically to adapt to different prediction/recommendation tasks for time-series and sequential event data, achieving high accuracy. Beyond usage in scalable software systems, an ML component in some embodiments may be a common component embedded into intelligent devices/software for learning/predicting time-series/sequential event data. For example, some embodiments may provide a universal ML platform for time-series/sequential event data based on a set of training algorithms, a variable-order context-tree model, and averaged log-loss similarity metrics. In some embodiments, a universal learning model may be automatically configured to adapt to the data sets and various prediction/recommendation tasks by selecting parameters from a bootstrapped universal model for the target queries. Advantageously, some embodiments may provide a complete solution which produces high quality results with much less human effort.
Some embodiments may provide an automatic configurable sequence similarity inference system with a framework that may achieve high accuracy (e.g., comparable to domain-specific systems) for a variety of prediction/recommendation tasks for a wide range of time-series/sequential event data with little domain specific tuning effort through an automatic configuration process. In some embodiments, the combination of training algorithms, context-tree model, universal similarity metrics, and automatic configuration may provide good accuracy with little human effort.
Some embodiments of a ML system may include a time-series data transformation, a universal sequence model, similarity metrics, training algorithms, built-in queries, and an automatic configurator for selecting parameters. In some embodiments, the process may start with the automatic configurator using a portion of the training data to test the target query using the similarity metrics over a range of parameters for the time-series data transformation, the universal sequence model, and the training algorithms, to select the best set of parameters for training, testing, and evaluating the final model. In some embodiments, a sequence may refer to a set of related events, movements, or things that follow each other in a particular order. For example, a sequence may include time-series data (e.g., sensor data), temporal event sequences (e.g., calendar events), and/or symbolic sequences (e.g., DNA, English sentences, etc.). In some embodiments, a similarity inference may reach a conclusion or recommendation on the basis of evidence and reasoning using similarity metrics. For example, a similarity inference may cover applications in classifying or clustering sequences (e.g., by finding sequences with similar patterns) and recommending next items in sequences (e.g., extending existing partial sequences).
System Overview Examples
Turning now to
Turning now to
Time-Series Data Transformation Examples
Data binning may refer to a data-processing technique to group values into bins. Data binning may be used to cluster values and tame noise and variance in the data. Some embodiments may apply one or more binning techniques to map numerical values into bins represented by a finite set of symbols (e.g., alphabets or small integers). Any suitable binning techniques may be utilized depending on the application. In some embodiments, the binning technique may meet the following criteria: domain independent binning (e.g., only based on the numerical values in the data); suitable for online processing (e.g., to minimize restrictions on use cases); and fast and scalable (e.g., time-series data may be large in some applications). Some embodiments may utilize a symbolic aggregate approximation (SAX) binning technique to bin numerical variables. SAX may be a quantile binning technique to map numerical values into symbols such as ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’ (e.g., or integers 1-5). A small binning number (e.g., 3 to 7) may be sufficient for most classification problems. In some embodiments, the numerical values may be binned based on the following equations:
Break points, β1, . . . , β4 may correspond to [−0.84, −0.25, 0.25, 0.84] and average values may be binned as 3(C), while medium low numbers may be binned as 2(B), very low numbers may be binned as 1(A), medium high numbers may be binned as 4(D), and very high numbers may be binned as 5(E). The best binning number for the SAX binning technique may be determined by the automatic configurator (e.g., as described below).
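For illustration, the SAX-style binning described above may be sketched as follows. The sketch z-normalizes the series and maps each value to one of five symbols using the break points cited above; the function name and the zero-variance guard are illustrative details, not from the specification.

```python
import statistics

# Standard-normal break points for five bins, matching the values above
BREAKPOINTS = [-0.84, -0.25, 0.25, 0.84]
SYMBOLS = "ABCDE"

def sax_bin(values):
    """Z-normalize the series, then map each value to a symbol A-E."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0  # guard against zero variance
    symbols = []
    for v in values:
        z = (v - mean) / stdev
        # number of break points exceeded gives the bin index 0..4
        idx = sum(z > b for b in BREAKPOINTS)
        symbols.append(SYMBOLS[idx])
    return "".join(symbols)
```

Consistent with the text, average values map to ‘C’, very low values to ‘A’, and very high values to ‘E’.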
Turning now to
Universal Sequence Model Examples
To increase or maximize applicability, in some embodiments the representation of a model may be general and flexible in capturing characteristics of a wide range of sequences, but may not be open-ended. For example, an applicable sequence may be assumed to have items that depend only on a finite number of previous items (e.g., a small integer number in practice), and/or to possess static or stable properties (e.g., no divergent, chaotic sequences). In some embodiments, the universal sequence model may be based on a variable-order Markov model (VOM), also known as context trees. Some embodiments of the model may include the following properties: capable of modeling arbitrarily high order sequences; resilient to time/phase variances in a sequence (only current and preceding states matter); compact knowledge representation of sequences; and/or fast and scalable construction.
In some embodiments, the model may capture occurrences of unique prefixes in training sequences. The model may be represented as a prefix tree (e.g., or a context tree) where paths in the tree are prefixes. Each node in the tree may store the last symbol of a prefix represented by the path from a root node “ε” and the frequency count for the prefix. The model may also be represented as a hash table with prefixes as the keys and occurrence counts as the values. In various embodiments, a context tree and a hash table may provide equivalent representations of the model. Depending on the usage, one representation may be more convenient or more efficient than the other. For example, if the application needs to check for existence of a prefix, a hash table may provide a constant time lookup. On the other hand, some embodiments may benefit from checking a context tree for sorted prefixes. Accordingly, some embodiments of a universal sequence model may be referred to as a prefix table and/or a context tree.
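For illustration, the two equivalent views named above may be sketched as follows: the same prefix counts held in a hash table (constant-time existence checks) and listed in sorted order, where prefixes sharing a path appear adjacent, as a depth-first walk of the context tree would visit them. The counts and helper names are hypothetical.

```python
# Hypothetical prefix counts for a small model
model = {"A": 4, "AB": 2, "ABD": 2, "B": 2, "C": 1}

def has_prefix(table, prefix):
    """Hash-table view: constant-time lookup of a prefix."""
    return prefix in table

def sorted_prefixes(table):
    """Tree-like view: sorting groups prefixes by shared path."""
    return sorted(table)
```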
Turning now to
where D may correspond to the depth of the tree, and σ may correspond to an input string. For example, the input string σ may correspond to “ABDACADABDAABDACADAB” for both
The maximum depth of the tree may be referred to as context-tree length (e.g., denoted as ‘D’ in
Training Algorithm Examples
In general, given a sequence, training algorithms may build prefixes along the order of the sequence to train the model. The algorithm may track a current working prefix (e.g., a segment of symbols), which may get updated as the process moves forward. The existence of a prefix may be checked against the model by either traversing the context tree or looking up the prefix in the hash table (e.g., prefix dictionary). If the prefix exists in the model, the frequency count of the prefix may be incremented. Otherwise, the model may be updated by adding a new child node (e.g., or a new key in the hash table) with a frequency count of one. Training algorithms used in some embodiments of an SSI may differ in how prefixes are formed and how the counts are updated.
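The general training loop above may be sketched as follows for the baseline LZ78-style case. The reset-on-new-prefix behavior follows classic LZ78 parsing; the function name and exact details are illustrative, not mandated by the specification.

```python
def ctx1_train(sequence, table=None):
    """Baseline LZ78-style incremental training (a CTX-1 sketch).

    A working prefix grows one symbol at a time: a known prefix has its
    count incremented; an unseen prefix is added with a count of one,
    after which the working prefix resets to start a new phrase.
    """
    if table is None:
        table = {}
    prefix = ""
    for symbol in sequence:
        prefix += symbol
        if prefix in table:
            table[prefix] += 1
        else:
            table[prefix] = 1
            prefix = ""  # start a new phrase, as in LZ78 parsing
    return table
```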
Turning now to
In some embodiments, accuracy of prediction may be closely related to the context-tree length as well as to how prefixes are generated and how counts are updated. Advantageously, some embodiments include the selection of training algorithms as a part of the framework to adapt to the use cases to achieve good results. Some embodiments may include a variety of training algorithms (e.g., including improvements on CTX-1) to achieve much better accuracy performance. Some embodiments may also utilize the automatic configurator to streamline an iterative model tuning process. For example, some embodiments may include four training algorithms in the SSI (e.g., CTX-1 through CTX-4, as discussed below). The CTX-1 algorithm may provide a baseline for comparison. The other algorithms, CTX-2, CTX-3, and CTX-4, may include enhanced variations of CTX-1 to boost prediction performance for the built-in queries (e.g., the main use cases). For example, the CTX-2 algorithm may be similar to the CTX-1 algorithm with a count enhancement. The CTX-3 algorithm, for example, may be similar to the CTX-1 algorithm with a second-pass CTX-2 re-training. For example, the CTX-4 algorithm may be trained with a fixed-size moving window (e.g., prefix) over input sequences. Some embodiments may advantageously provide a flexible framework which may include any of a number of other training algorithm variations.
In some embodiments, similarity inference may be based on the rankings and scores of predictions. The ranking may be represented as an ordered list based on the metrics computed on test sequences. In some embodiments of the SSI, averaged log-loss may be utilized as the similarity metric (e.g., as discussed below). The averaged log-loss may be computed from conditional probabilities (e.g., see Eq. 3 through 5 above). Advantageously, the different training algorithms may tailor prefixes and occurrence counts to reflect hypotheses encouraging higher ranking of desirable patterns.
CTX-2 Training Algorithm Examples
Classifying sequences effectively may depend on how well the model can relate and differentiate a sequence across class labels. As differences become smaller and patterns become subtler, classification accuracy may tend to drop due to confusion of the model, especially on boundary cases. For challenging classification, emphasizing rarer patterns (e.g., signature patterns) may help class identification and separation. In general, longer patterns may be rarer. However, the longer the prefix is, the less support it may have as a pattern. Some embodiments may reinforce the counts along the sub-prefixes leading to a prefix such that common paths across prefixes may be emphasized, allowing common patterns to emerge in the ranking.
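The count enhancement just described may be sketched as follows. The exact update rule is an assumption: here, whenever a new prefix is added, each sub-prefix on its path is also credited, so shared paths gain support.

```python
def ctx2_train(sequence, table=None):
    """CTX-2 sketch (illustrative): CTX-1 plus a count enhancement.

    When a new prefix is added, every sub-prefix leading to it is also
    credited; the LZ78-style growth guarantees those sub-prefixes are
    already present in the table.
    """
    if table is None:
        table = {}
    prefix = ""
    for symbol in sequence:
        prefix += symbol
        if prefix in table:
            table[prefix] += 1
        else:
            table[prefix] = 1
            # reinforce the counts along the sub-prefixes leading here
            for end in range(1, len(prefix)):
                table[prefix[:end]] += 1
            prefix = ""
    return table
```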
Turning now to
CTX-3 Training Algorithm Examples
As discussed above, CTX-1 may use an LZ78 scheme iterating over symbols to build up the prefix table incrementally. An artifact of LZ78 may be that, to add a prefix, the routine must have previously encountered the sub-prefix leading to the prefix. In other words, the prefix "ABC" would not be added unless "AB" was already in the table. The eventual count for "ABC" would be one less than if "ABC" already existed in the table. The effect may appear inconsequential on the surface, because the impact of the effect diminishes as the sequence becomes longer. However, for rare patterns or shorter sequences, inference from a support of one may be considered incidental, while a support of two or more may rise as a potential pattern. Because potentially missing longer (distinct) patterns may have a negative impact on performance, CTX-3 may advantageously address the artifact by capturing longer prefixes better. In some embodiments, CTX-3 may deploy a two-pass approach.
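The two-pass idea may be sketched as follows. For simplicity this sketch runs the baseline CTX-1 update in both passes, whereas the specification's second pass uses the CTX-2 enhancement; the point illustrated is that the second pass, starting from a populated table, captures longer prefixes that the LZ78 artifact blocked in the first pass.

```python
def ctx1_pass(sequence, table):
    """One LZ78-style pass (CTX-1 behavior): credit known prefixes,
    add unseen ones with a count of one, then reset the prefix."""
    prefix = ""
    for symbol in sequence:
        prefix += symbol
        if prefix in table:
            table[prefix] += 1
        else:
            table[prefix] = 1
            prefix = ""
    return table

def ctx3_train(sequence):
    """Two-pass CTX-3 sketch (simplified, illustrative)."""
    return ctx1_pass(sequence, ctx1_pass(sequence, {}))
```

On "ABAB", the first pass never sees the prefix "ABA", but the second pass does, because "AB" is already in the table.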
Turning now to
CTX-4 Training Algorithm Examples
A technique for computing moving averages and mining sequential patterns may include a moving window technique. Training algorithm CTX-4 may train the model like CTX-1, with prefixes captured in a fixed-size moving window (e.g., the window size may be equal to the context-tree length) rather than growing incrementally using LZ78. The moving windows may ensure that all prefixes with the size of the context-tree length are captured. The CTX-4 training algorithm may provide advantages in predicting a next symbol by minimizing effects of missing patterns due to the artifact of processing prefixes starting at different points or phases.
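The moving-window variant may be sketched as follows. The exact counting scheme is an assumption: here, each window position contributes its full-length prefix and the shorter prefixes inside it, so every prefix of window (context-tree) length is captured regardless of phase.

```python
def ctx4_train(sequence, window):
    """CTX-4 sketch (illustrative): slide a fixed-size window over the
    sequence and count the prefixes at each window position."""
    table = {}
    for i in range(len(sequence) - window + 1):
        for d in range(1, window + 1):
            key = sequence[i:i + d]
            table[key] = table.get(key, 0) + 1
    return table
```

Unlike the LZ78-based sketch, "BA" is captured here even though it starts at an odd phase of "ABAB".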
Turning now to
Similarity Metrics Examples
Similarity metrics may be viewed as distance measures. Specifically, some embodiments may utilize probability-based distance measures to compute probabilities from the model. Any suitable probability-based distance measure may be utilized (e.g., Kullback-Leibler divergence, Bhattacharyya coefficient, Kolmogorov metrics, etc.). Some embodiments may select one universal similarity metric, such that the automatic configurator may not need to perform a computation for selecting a best similarity metric. For example, a similarity metric (e.g., a normalized compression distance) based on Kolmogorov complexity providing an information-based compression distance may be a suitable universal similarity metric in accordance with some embodiments.
In some embodiments, a CTX-1 classification may be based on prediction by partial matching (PPM). The CTX-1 training algorithm may use LZ78 compression to build a prefix table and may optimize over a metric called average log-loss. For example, average log-loss may be defined as follows: A test sequence of a variable x with length T may be denoted as x1T = x1, x2, . . . , xT. The notation P̂(σ|s) may represent the conditional probability distribution of symbol σ given the sequence s, which may be calculated from the context tree model based on the formula from Eq. 4 above. Minimizing the average log-loss may be considered as equivalent to maximizing a probability assignment for the entire test sequence. Advantageously, some embodiments may use average log-loss to select a most probable next symbol for a sequence as well as for classification. In some embodiments, average log-loss may be determined as follows:
In accordance with some embodiments, average log-loss may be effective as a similarity (e.g., distance) measure between two sequences, and may advantageously also be used to identify an amount of regularity (e.g., patterns) in sequences through compressibility (e.g., similar to a Kolmogorov complexity). Some embodiments may advantageously utilize average log-loss as a universal similarity metric.
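The average log-loss computation may be sketched as follows against a prefix table. The conditional probability estimate uses add-one smoothing as an illustrative stand-in for the exact Eq. 4 formula; a lower loss means the model finds the test sequence more compressible (more similar).

```python
import math

def cond_prob(table, context, symbol, alphabet):
    """Estimate P(symbol | context) from prefix counts, with add-one
    smoothing so unseen extensions keep a nonzero probability."""
    num = table.get(context + symbol, 0) + 1
    den = sum(table.get(context + a, 0) for a in alphabet) + len(alphabet)
    return num / den

def avg_log_loss(table, sequence, depth, alphabet):
    """Average of -log2 P(x_t | context) over the test sequence."""
    total = 0.0
    for t, symbol in enumerate(sequence):
        context = sequence[max(0, t - depth + 1):t]
        total += -math.log2(cond_prob(table, context, symbol, alphabet))
    return total / len(sequence)
```

A sequence whose prefixes match the table scores a lower loss than one whose prefixes do not, which is what makes the metric usable as a distance measure.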
Built-In Query Examples
Some embodiments may provide a framework for automatic configuration and execution of prediction tasks. The built-in queries may define the problem space for the system. For example, each query may represent a specific type of prediction task. Although more queries may be added, some embodiments may include two built-in queries. In particular, some embodiments may include a multivariate sequence classification query, and a multivariate next-state prediction query. The two queries may exemplify the major use cases of some embodiments of the SSI including, for example, classifying sequences and predicting a next item in a sequence. In addition to the direct use cases, other applications may be supported applying these two queries. For example, the two queries may support applications such as finding nearest neighbors, recommendation, planning, conversation, situation awareness, etc. For different use cases and data, the best parameters and salient variables (e.g., features) may be determined by the automatic configurator.
Examples of Training a Context Tree Model for Multivariate Episodes
Turning now to
Multivariate Sequence Classification Examples
Sequence classification may include predicting most likely class labels for a given test sequence. There may often be many variables for a given problem space. An embodiment of a multivariate sequence classification query may predict top-m class labels of a given test episode containing the test sequences corresponding to the variables in the problem space. In some embodiments, the ranking of prediction may be performed by tallying the votes of top-m predicted class labels from each variable sequence. The top-m predicted class labels for each variable sequence may be produced by sorting average log-losses computed from the test sequence against k context-tree models for the variable.
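The vote-tallying step above may be sketched as follows. Each inner list is a hypothetical per-variable ranking of class labels, ordered best-first (e.g., by average log-loss against the k class models); the tally produces the episode's top-m prediction.

```python
from collections import Counter

def vote_top_m(per_variable_rankings, m):
    """Tally top-m class labels voted by each variable's ranking."""
    votes = Counter()
    for ranking in per_variable_rankings:
        for label in ranking[:m]:
            votes[label] += 1
    # labels with the most votes across variables win
    return [label for label, _ in votes.most_common(m)]
```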
Turning now to
Multivariate Next-State Prediction Examples
The next-state prediction may predict the next symbol (σ) in a given sequence (i.e., given {xiT}, predict {xiT+1 = xiTσ}), with the assumption that the current symbol is related to one or more previous symbols in the sequence. Next-state prediction may be used in numerous applications, such as planning, conversation, fault anticipation, intelligent devices, etc. The problem space often involves multiple variables. To cover most use cases, some embodiments may expand next-state prediction from univariate prediction to multivariate prediction. Next-state prediction without using class labels may be referred to as prediction by partial matching (PPM). In some embodiments, the next symbol may be predicted by taking the last D−1 symbols (e.g., where D corresponds to the context-tree length) in the given test sequence and extending it with one additional symbol from the pool of all symbols (σ1, . . . , σs), forming s test sequences of size D. For each extended sequence, the query may compute the average log-loss and may recommend the last symbol of the extended sequence with the smallest average log-loss. Some embodiments may enhance PPM by taking advantage of class labels (e.g., if available) to provide better predictions, using context trees of classes to first recommend the class labels of the test sequence and then using the context trees of the recommended classes to predict the next state as in PPM.
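The PPM-style extension step may be sketched as follows. Here `score` is a stand-in for the model's average log-loss computation; the function name and signature are illustrative.

```python
def predict_next(score, test_sequence, depth, alphabet):
    """Extend the last D-1 symbols with each candidate symbol and
    return the candidate whose extension scores the smallest average
    log-loss (i.e., the most probable extension)."""
    context = test_sequence[-(depth - 1):]
    return min(alphabet, key=lambda s: score(context + s))
```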
Turning now to
Automatic Configurator Examples
Part of a data scientist's task may include selecting appropriate models, algorithms, and parameters for a data set to achieve good analysis and prediction results. The selection task may be time consuming due to investigating possible combinations in various aspects of the analysis. Given well defined algorithms, a general model, and robust similarity metrics in SSI, the selection task may be automated against the target data set and queries. Advantageously, some embodiments may provide a framework for automatically selecting the best SSI configurations and parameters that may adapt to different sequence data and analysis.
Some embodiments of an automatic configurator may make a selection from a range of parameters by evaluating results experimentally using an embedded SSI and portions of the training data. Some embodiments may advantageously split the training set into an internal training set and a configuration test set to determine a selection of configurations and parameters through evaluating results of the target query on the training data prior to executing final tests and evaluation on the test data. For example, some embodiments may make a three-to-one random split for the internal training set and a two-to-one random split for the configuration test set. The automatic configuration may only need to be performed once per given data set, and may be resilient to incremental data updates. Advantageously, some embodiments may adapt to most sequence data sets and queries without human intervention. In production, running the automatic configurator again after the initial run may be beneficial if the data characteristics have changed significantly due to updates or if accuracy has degraded below a predefined threshold.
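One way the configurator's splits may be realized is sketched below. The ratios follow the text (a 3:1 random split for the internal training set, then a 2:1 split of the remainder for the configuration test set); the mechanics, function name, and seeding are assumptions for illustration.

```python
import random

def config_split(samples, seed=0):
    """Shuffle, then carve out the internal training set (3:1) and a
    configuration test set (2:1 of the remainder); the rest is held
    out untouched for final evaluation."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    train_end = (3 * len(shuffled)) // 4          # 3:1 internal training
    rest = shuffled[train_end:]
    cfg_end = (2 * len(rest)) // 3                # 2:1 configuration test
    return shuffled[:train_end], rest[:cfg_end], rest[cfg_end:]
```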
Turning now to
The selection process may be an iterative process over a range of configurations and parameters. The order of processing may be determined by minimizing the computation overheads and speed to convergence of a selection. Default configurations and parameter values may be used if the selection for a specific aspect has not occurred. The selection may follow an order of training algorithm (e.g., see
Selection of Training Algorithm Examples
Turning now to
Selection of Context-Tree Length Examples
Turning now to
Selection of Bin Number Examples
Turning now to
Selection of Salient Variables Examples
In some embodiments, variable selection (e.g., feature selection) may be included to improve classification performance. Any suitable variable/feature selection technology may be applied in a SSI framework in accordance with some embodiments. For example, minimum-redundancy-maximum-relevance (mRMR) feature selection technology may be adopted for the SSI in some embodiments. However, computation of mRMR can be expensive for combinations of a large number of variables. Some embodiments may find it beneficial to reduce computation overheads as well as improve accuracy by removing highly redundant variables (e.g., over 99% correlation) for various classification algorithms. Any suitable technology may be utilized to reduce the number of variables including, for example, principal component analysis (PCA), correlation matrix, mutual information, etc. Some embodiments may advantageously balance computation overheads and accuracy. For example, some embodiments of a SSI may utilize a greedy algorithm in ranking and selecting variables using average log-loss to measure classification power of variables.
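The greedy ranking-and-selection loop may be sketched as follows. Here `score_set` is a stand-in for evaluating the classification power of a candidate variable set with average log-loss (lower is better); all names are hypothetical.

```python
def greedy_select(variables, score_set):
    """Greedily add the variable that most improves the selected set's
    score; stop when no candidate improves it further."""
    selected, best = [], float("inf")
    remaining = list(variables)
    while remaining:
        candidate = min(remaining, key=lambda v: score_set(selected + [v]))
        new_score = score_set(selected + [candidate])
        if new_score >= best:
            break  # no improvement: stop adding variables
        selected.append(candidate)
        remaining.remove(candidate)
        best = new_score
    return selected
```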
Turning now to
Turning now to
The processor core 200 is shown including execution logic 250 having a set of execution units 255-1 through 255-N. Some embodiments may include a number of execution units dedicated to specific functions or sets of functions. Other embodiments may include only one execution unit or one execution unit that can perform a particular function. The illustrated execution logic 250 performs the operations specified by code instructions.
After completion of execution of the operations specified by the code instructions, back end logic 260 retires the instructions of the code 213. In one embodiment, the processor core 200 allows out of order execution but requires in order retirement of instructions. Retirement logic 265 may take a variety of forms as known to those of skill in the art (e.g., re-order buffers or the like). In this manner, the processor core 200 is transformed during execution of the code 213, at least in terms of the output generated by the decoder, the hardware registers and tables utilized by the register renaming logic 225, and any registers (not shown) modified by the execution logic 250.
Although not illustrated in
Referring now to
The system 1000 is illustrated as a point-to-point interconnect system, wherein the first processing element 1070 and the second processing element 1080 are coupled via a point-to-point interconnect 1050. It should be understood that any or all of the interconnects illustrated in
As shown in
Each processing element 1070, 1080 may include at least one shared cache 1896a, 1896b (e.g., static random access memory/SRAM). The shared cache 1896a, 1896b may store data (e.g., objects, instructions) that are utilized by one or more components of the processor, such as the cores 1074a, 1074b and 1084a, 1084b, respectively. For example, the shared cache 1896a, 1896b may locally cache data stored in a memory 1032, 1034 for faster access by components of the processor. In one or more embodiments, the shared cache 1896a, 1896b may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, a last level cache (LLC), and/or combinations thereof.
While shown with only two processing elements 1070, 1080, it is to be understood that the scope of the embodiments is not so limited. In other embodiments, one or more additional processing elements may be present in a given processor. Alternatively, one or more of the processing elements 1070, 1080 may be an element other than a processor, such as an accelerator or a field programmable gate array. For example, additional processing element(s) may include additional processor(s) that are the same as the first processing element 1070, additional processor(s) that are heterogeneous or asymmetric to the first processing element 1070, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processing element. There can be a variety of differences between the processing elements 1070, 1080 in terms of a spectrum of metrics of merit including architectural, microarchitectural, thermal, power consumption characteristics, and the like. These differences may effectively manifest themselves as asymmetry and heterogeneity amongst the processing elements 1070, 1080. For at least one embodiment, the various processing elements 1070, 1080 may reside in the same die package.
The first processing element 1070 may further include memory controller logic (MC) 1072 and point-to-point (P-P) interfaces 1076 and 1078. Similarly, the second processing element 1080 may include a MC 1082 and P-P interfaces 1086 and 1088. As shown in
The first processing element 1070 and the second processing element 1080 may be coupled to an I/O subsystem 1090 via P-P interconnects 1076 and 1086, respectively. As shown in
In turn, I/O subsystem 1090 may be coupled to a first bus 1016 via an interface 1096. In one embodiment, the first bus 1016 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or another third generation I/O interconnect bus, although the scope of the embodiments is not so limited.
As shown in
Note that other embodiments are contemplated. For example, instead of the point-to-point architecture of
Example 1 may include an electronic processing system, comprising a processor, memory communicatively coupled to the processor, and logic communicatively coupled to the processor to test a target query for one or more similarity metrics over a range of parameters for one or more sets of sequence related information, a multi-domain sequence model, and one or more training routines, and select a set of parameters based on a result of the test.
Example 2 may include the system of Example 1, wherein the logic is further to automatically configure the multi-domain sequence model to adapt to one or more of respective data sets, respective prediction tasks, and respective recommendation tasks based on the selected parameters.
Example 3 may include the system of Example 1, wherein the one or more similarity metrics comprise multi-domain similarity metrics.
Example 4 may include the system of Example 3, wherein the multi-domain similarity metrics comprise averaged log-loss similarity metrics.
Example 5 may include the system of any of Examples 1 to 4, wherein the multi-domain sequence model comprises a variable order context-tree model.
Example 6 may include the system of Example 5, wherein the sequence related information includes one or more of time series data, temporal event sequence information, and symbolic sequence information.
Example 7 may include a semiconductor package apparatus, comprising one or more substrates, and logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic and fixed-functionality hardware logic, the logic coupled to the one or more substrates to test a target query for one or more similarity metrics over a range of parameters for one or more sets of sequence related information, a multi-domain sequence model, and one or more training routines, and select a set of parameters based on a result of the test.
Example 8 may include the apparatus of Example 7, wherein the logic is further to automatically configure the multi-domain sequence model to adapt to one or more of respective data sets, respective prediction tasks, and respective recommendation tasks based on the selected parameters.
Example 9 may include the apparatus of Example 7, wherein the one or more similarity metrics comprise multi-domain similarity metrics.
Example 10 may include the apparatus of Example 9, wherein the multi-domain similarity metrics comprise averaged log-loss similarity metrics.
Example 11 may include the apparatus of any of Examples 7 to 10, wherein the multi-domain sequence model comprises a variable order context-tree model.
Example 12 may include the apparatus of Example 11, wherein the sequence related information includes one or more of time series data, temporal event sequence information, and symbolic sequence information.
Example 13 may include the apparatus of Example 7, wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.
Example 14 may include a method of automatically configuring a model, comprising testing a target query for one or more similarity metrics over a range of parameters for one or more sets of sequence related information, a multi-domain sequence model, and one or more training routines, and selecting a set of parameters based on a result of the test.
Example 15 may include the method of Example 14, further comprising automatically configuring the multi-domain sequence model to adapt to one or more of respective data sets, respective prediction tasks, and respective recommendation tasks based on the selected parameters.
Example 16 may include the method of Example 14, wherein the one or more similarity metrics comprise multi-domain similarity metrics.
Example 17 may include the method of Example 16, wherein the multi-domain similarity metrics comprise averaged log-loss similarity metrics.
Example 18 may include the method of any of Examples 14 to 17, wherein the multi-domain sequence model comprises a variable order context-tree model.
Example 19 may include the method of Example 18, wherein the sequence related information includes one or more of time series data, temporal event sequence information, and symbolic sequence information.
Example 20 may include at least one computer readable medium, comprising a set of instructions, which when executed by a computing device, cause the computing device to test a target query for one or more similarity metrics over a range of parameters for one or more sets of sequence related information, a multi-domain sequence model, and one or more training routines, and select a set of parameters based on a result of the test.
Example 21 may include the at least one computer readable medium of Example 20, comprising a further set of instructions, which when executed by the computing device, cause the computing device to automatically configure the multi-domain sequence model to adapt to one or more of respective data sets, respective prediction tasks, and respective recommendation tasks based on the selected parameters.
Example 22 may include the at least one computer readable medium of Example 20, wherein the one or more similarity metrics comprise multi-domain similarity metrics.
Example 23 may include the at least one computer readable medium of Example 22, wherein the multi-domain similarity metrics comprise averaged log-loss similarity metrics.
Example 24 may include the at least one computer readable medium of any of Examples 20 to 23, wherein the multi-domain sequence model comprises a variable order context-tree model.
Example 25 may include the at least one computer readable medium of Example 24, wherein the sequence related information includes one or more of time series data, temporal event sequence information, and symbolic sequence information.
Example 26 may include an automatic configuration apparatus, comprising means for testing a target query for one or more similarity metrics over a range of parameters for one or more sets of sequence related information, a multi-domain sequence model, and one or more training routines, and means for selecting a set of parameters based on a result of the test.
Example 27 may include the apparatus of Example 26, further comprising means for automatically configuring the multi-domain sequence model to adapt to one or more of respective data sets, respective prediction tasks, and respective recommendation tasks based on the selected parameters.
Example 28 may include the apparatus of Example 26, wherein the one or more similarity metrics comprise multi-domain similarity metrics.
Example 29 may include the apparatus of Example 28, wherein the multi-domain similarity metrics comprise averaged log-loss similarity metrics.
Example 30 may include the apparatus of any of Examples 26 to 29, wherein the multi-domain sequence model comprises a variable order context-tree model.
Example 31 may include the apparatus of Example 30, wherein the sequence related information includes one or more of time series data, temporal event sequence information, and symbolic sequence information.
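The test-and-select flow recited across the examples above (test a target query for one or more similarity metrics over a range of parameters, then select a set of parameters based on a result of the test) can be sketched as a small grid search scored by an averaged log-loss metric. This is an illustrative sketch only: the function names are hypothetical, and a toy smoothed-unigram predictor stands in for the multi-domain sequence model (e.g., a variable order context-tree model).

```python
import itertools
import math

def averaged_log_loss(predict, queries):
    """Score a configured model by the averaged log-loss it assigns to
    the next symbol of each held-out query (lower is better)."""
    eps = 1e-12
    losses = []
    for context, next_symbol in queries:
        p = max(predict(context).get(next_symbol, 0.0), eps)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

def select_parameters(param_grid, build_model, queries):
    """Test the queries over the full range of parameter combinations
    and select the set with the best (lowest) averaged log-loss."""
    best_params, best_score = None, float("inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = averaged_log_loss(build_model(params), queries)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-in model: a smoothed unigram predictor over training
# symbols; the smoothing constant `alpha` is the swept parameter.
train = list("ababbab")
alphabet = sorted(set(train))

def build_model(params):
    alpha = params["alpha"]
    total = len(train) + alpha * len(alphabet)
    probs = {s: (train.count(s) + alpha) / total for s in alphabet}
    return lambda context: probs  # context ignored by this toy model

queries = [("ab", "a"), ("ba", "b"), ("bb", "a")]
best, score = select_parameters({"alpha": [0.1, 1.0, 10.0]}, build_model, queries)
```

In a full system, `build_model` would train the multi-domain sequence model with the given training routine and parameters, and the same loop could also sweep over candidate similarity metrics and transformations of the sequence information.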
Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
As used in this application and in the claims, a list of items joined by the term “one or more of” may mean any combination of the listed terms. For example, the phrase “one or more of A, B, and C” and the phrase “one or more of A, B, or C” both may mean A; B; C; A and B; A and C; B and C; or A, B and C.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.