This disclosure relates generally to perpetual problem analytics, and more particularly to a system and method for the joint selection of pattern recognition algorithms and data features.
The accelerating data avalanche is gaining unimpeded momentum that is enabled by the commoditization of computing storage, devices, bandwidth, connectivity, processor parallelization, and processor speed. Consequently, numerous data mining algorithms are becoming available to sift through massive amounts of information. Businesses and governments that do not embrace advanced data analytics will not survive within an environment of highly connected and intelligent enterprise.
Along with the advancement of data mining tools, applying the right algorithm to a problem is critical. For example, a practitioner might choose a familiar algorithm that produces a suboptimal solution for a specific problem, whereas a highly tuned system continually determines the best algorithm to apply to the problem. Equally important, the diversity and dimensionality of data are becoming more challenging and are already intractable. Dimensionality reduction and variable selection are required to select the most important traits of the data from an exhaustive set of features. However, different algorithms will perform differently as feature sets change. Accurately selecting an algorithm and a set of features is critical to achieving optimal performance.
The present invention relates to a system, method and program product for identifying an algorithm and feature set to solve a problem. In a first aspect, the invention provides a perpetual analytics system for a joint selection of an algorithm and feature set to solve a problem, comprising: an evolutionary computing engine for processing data encoded as chromosomes, wherein each chromosome encodes an algorithm and a feature set; a domain knowledge store that maintains a plurality of algorithms and a plurality of features; a system for applying a generation of chromosomes to a set of data to provide a set of results; and a fitness function for evaluating the set of results to rate a performance of each chromosome in the set of chromosomes; wherein the evolutionary computing engine is adapted to evolve a subset of the set of chromosomes into a new generation of chromosomes.
In a second aspect, the invention provides a method of selecting an algorithm and feature set to solve a problem, comprising: providing an initial generation of chromosomes, wherein each chromosome encodes an algorithm and a feature set; applying each chromosome from the initial generation of chromosomes to a set of data to provide a set of results; evaluating the set of results with a fitness function to rate a performance of each chromosome in the initial set of chromosomes; and evolving a subset of chromosomes to create a new generation of chromosomes.
In a third aspect, the invention provides a program product stored on a computer readable storage medium for selecting an algorithm and feature set to solve a problem, comprising: program code for providing an initial generation of chromosomes, wherein each chromosome encodes an algorithm and a feature set; program code for applying each chromosome from the initial generation of chromosomes to a set of data to provide a set of results; program code for evaluating the set of results with a fitness function to rate a performance of each chromosome in the initial set of chromosomes; and program code for evolving a subset of chromosomes to create a new generation of chromosomes.
The illustrative aspects of the present invention are designed to solve the problems herein described and other problems not discussed.
These and other features of this invention will be more readily understood from the following detailed description of the various aspects of the invention taken in conjunction with the accompanying drawings.
a-c depicts scatter plots of data correlations in accordance with an embodiment of the invention.
The drawings are merely schematic representations, not intended to portray specific parameters of the invention. The drawings are intended to depict only typical embodiments of the invention, and therefore should not be considered as limiting the scope of the invention. In the drawings, like numbering represents like elements.
Currently, purely academic, complex and novel algorithms are distilling information into knowledge to solve difficult and real business challenges. However, with the growing number of regressors, classifiers and density estimators, determining which algorithm to implement on a particular problem domain currently requires extensive domain expertise. For example, illustrative algorithms include techniques based on Support Vector Machines (SVM), Neural Networks, Bayesian Belief Networks, numerous clustering algorithms, Hidden Markov Models, Case Based Reasoning, Reinforcement Learning, Regression, Mixture Models, Kernels, etc. The field of statistics produces similarly diverse methods such as Principal Component Analysis, Probability Density Functions, Discrete and Continuous distributions, hypothesis testing, etc. The present invention addresses the process of selecting an analytic algorithm or model and the features of the data to process. A framework for feature and algorithm selection is herein described for perpetual knowledge generation.
The selection of features and algorithms to apply for a specific problem must be robust such that they can be encoded into a search and optimization problem. High dimensional searching requires careful consideration to explore the least amount of space while finding a best solution or a Pareto optimal hull. Evolutionary algorithms, modeled from nature, provide a parameterized framework for such searching. Nature provides natural systems that evolve over time within the context of an ecology. Within natural selection, the fittest members or groups of species pass their respective genes to the next generation. As such, the entire species adapts and changes as ecologies change. The best fit members, defined as a combination of data features and algorithms, are those suited to a given environment or problem.
Genetic algorithms are inspired by biology and provide robust search and optimization techniques. Genetic algorithms utilize a fitness function to measure the utility of a chromosome. Generally, a fitness function is either applied directly to a chromosome or the phenotype of the translated representation. A chromosome is a concise and natural data representation of a set of parameters. The entire set of chromosomes creates a generation. The genetic operators within the framework of the algorithm can be either binary or probabilistic. Reproduction, crossover and mutation are the three core functions that generally exist within a genetic algorithm.
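The crossover and mutation operators described above can be sketched as follows. This is a minimal illustration that assumes chromosomes are fixed-length lists of bits; the function names, the chromosome length of 8 and the mutation probability of 0.05 are hypothetical and are not taken from the disclosure.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    cut = rng.randrange(1, len(parent_a))  # cut lies strictly inside the chromosome
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(chromosome, p_mutation, rng):
    """Flip each bit independently with probability p_mutation."""
    return [bit ^ 1 if rng.random() < p_mutation else bit for bit in chromosome]


# A generation is simply a collection of chromosomes (hypothetical sizes).
rng = random.Random(0)
generation = [[rng.randint(0, 1) for _ in range(8)] for _ in range(4)]
child_a, child_b = one_point_crossover(generation[0], generation[1], rng)
child_a = mutate(child_a, 0.05, rng)
```

Reproduction then amounts to choosing which chromosomes are allowed to act as parents, which depends on the selection scheme and the fitness function.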
Domain Knowledge
The perpetual analytic system 10 is implemented using the genetic algorithm 32. The framework itself requires domain knowledge maintained within domain knowledge store 18. The No Free Lunch Theorem states that, summed over all problems, the distribution of results is equal for any pair of algorithms. In other words, an algorithm that performs well on one set of problems will perform poorly on the remaining set. More formally,
$$\sum_f P(d_m^y \mid f, m, a_1) = \sum_f P(d_m^y \mid f, m, a_2) \qquad (1)$$
where $a_1$ and $a_2$ are a pair of differing algorithms, m are distinct points within the problem space, and $d_m^y$ is the associated cost or objective value of sample m.
If knowledge is not provided for an algorithm, there is no guarantee that a solution will be effective. As such, the domain knowledge store 18 abstracts specific algorithms 20 from the framework in such a way that a problem is not coupled with the genetic algorithm 32 yet maintains high domain cohesion. Domain knowledge store 18 encompasses algorithms 20 that are designed to run on a specific problem 12. For example, within speech recognition, a binary search tree may be used specifically for n-gram processing. Domain specific questions, such as “Do you like sports?”, accumulate evidence for a particular branch point within the tree. Further, complexity is encoded within each of the questions and can be designed such that the higher the tree ply level, the more complex the n-gram becomes. The aforementioned algorithm would be one of many algorithms 20 within the domain knowledge store 18. Equations 2 and 3 define two sets such that
$\forall a_n \in A_{max}$ (2)
$A_i \subset A_{max}$ (3)
$a_n$ denotes a specific domain algorithm that resides within the complete set of algorithms $A_{max}$. Subsets of algorithms, denoted $A_i$, are subsets of the entire collection of algorithms.
Coupled with the algorithms 20, an exhaustive list of features 22 is stored within the domain knowledge store 18. The feature list encompasses all of the features 22 maintained by a dataset, e.g., database 30. Pattern recognition algorithms ingest the entire set or subsets of the feature space. An evolved feature selector determines the subset of features that form the feature space for each algorithm.
$\forall f_n \in F_{max}$ (4)
$F_i \subset F_{max}$ (5)
$f_n$ denotes a specific domain feature that is a member of the complete set of features $F_{max}$. Subsets of features, denoted by $F_i$, are subsets from $F_{max}$.
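One way to picture how a single chromosome can name both an algorithm from the complete set of algorithms and a subset of the complete feature set is the sketch below. The layout (two leading bits indexing the algorithm, followed by one selector bit per feature) and all names are assumptions made for illustration only; the disclosure does not specify this encoding.

```python
# Hypothetical stand-ins for the complete sets A_max and F_max.
ALGORITHMS = ["svm", "neural_network", "hidden_markov_model", "regression"]
FEATURES = ["f1", "f2", "f3", "f4", "f5"]


def decode(chromosome):
    """Split a bit list into an algorithm choice a_n and a feature subset F_i."""
    algo_bits, feature_bits = chromosome[:2], chromosome[2:]
    algo_index = algo_bits[0] * 2 + algo_bits[1]  # two bits index four algorithms
    selected = [name for name, bit in zip(FEATURES, feature_bits) if bit == 1]
    return ALGORITHMS[algo_index], selected


algorithm, features = decode([1, 0, 1, 0, 0, 1, 1])
```

With this layout, crossover and mutation can change the algorithm choice and the feature subset at the same time, which is what makes the selection joint.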
Fitness Functions
Another domain knowledge factor within the perpetual analytics system 10 is a fitness function 24. A myriad of fitness functions 24 that relate to data retrieval or pure chromosome structure produce the fitness landscape and can be applied to the output of each algorithm 20, which is the phenotype of a specific chromosome. Each fitness function 24, denoted $e_n$, measures the optimality of a chromosome.
Several fitness functions, $E_i$, can be combined for an aggregate optimality score.
$\forall e_n \in E_{max}$ (6)
$E_i \subset E_{max}$ (7)
Genetic Algorithm Parameters
The parameters of the genetic algorithm 32 define the granularity and scope for finding the best algorithm 14 and set of features 16 for a given problem 12. The evolutionary framework is bootstrapped by the number of chromosomes within each generation, the crossover type and rate, the mutation rate, and the number of generations or an exit criterion. In effect, the genetic algorithm parameters describe how to search the space that the chromosomes encode. The domain algorithms 20 and features 22 create the contours within the search space that will be evaluated by domain fitness functions 24. As is standard within genetic algorithms 32, a fitness score is assigned to each chromosome. The a priori information with respect to reproduction ensures the fitness criterion is maximized before the generation of offspring. In this way, low scoring individuals are minimized within the population. As a result, the best or close to best algorithm and set of features will emerge.
The perpetual analytic system 10 is designed so that common data mining pitfalls are minimized. Each algorithm 20 within the domain knowledge store 18 is trained and evaluated by training system 28 on separate data sets to protect against overfitting. Training and evaluation can follow the n-fold process for each independent algorithm. The plurality of algorithms (or models) within the genetic algorithm 32 construct eliminates the risk of relying on one model; instead, models and features compete for eventual implementation. Though the experimenter must ask the right question, the feature selector determines the complexity of the question to ask. External knowledge and wisdom are encapsulated by the collection of algorithms 20 and the fitness function(s) 24. As more data is acquired, each algorithm 20 can be retrained and/or the chromosome fitness functions 24 modified. In this way, the modelers will not become stuck on a single model or set of features. Instead, the genetic algorithm 32 will adapt to the accumulation of data and select an algorithm 14 and sets of features 16 given the data. If two or more models are recommended by the algorithm selector 34, those models can exist within an ensemble.
An implementation of a genetic algorithm within a people parsing context problem is shown within
The null hypothesis, A, states that an optimal selection of a subset of facial attributes, a hierarchical algorithm, and decision tree from the domain store will not group similar probes and gallery samples. Decomposing A, a1, a2 and a3 assert that cluster quality, search efficiency, and name search quality will not be optimized. The alternative hypothesis, β, believes that a set of optimized data features and algorithms will be an output from the perpetual analytic system 10 (
Within
Genetic algorithm parameters form a GA infrastructure and include elements such as generation number, population size, crossover type, mutation rate and chromosome encoding that is passed into a genetic computing infrastructure, such as the Evolutionary Computing in Java (ECJ Infrastructure 46). The evolutionary chromosomes 48 from ECJ infrastructure 46 represent binary feature selectors. A position within the chromosome, e.g., such as that shown in
Referring again to
Perpetual Analytics
As more data is ingested into the gallery, the gradients of the search space change. By extension, the algorithm and feature selector will need to search the contours of the new search space. An offline genetic algorithm framework is designed to run on a static gallery space. The resulting gallery model and feature selections are pushed to a production environment that is continually accumulating new data outside of the explored gallery. To maintain a high fidelity gallery model, the next run of the genetic algorithm framework will include both the previous gallery and the newly introduced members. In addition to data change, new algorithms can be introduced into the offline system. The framework is extensible with independent variables, algorithms and features.
Genetic Algorithm Theoretical Foundation
The design of a genetic algorithm depends heavily on the underlying selection scheme, its parameter values and the values of evolutionary parameters, such as mutation and crossover rates. In this section a case is made for using tournament selection, Holland's Schema Theorem is then introduced as a general theoretical tool for determining the values of evolutionary parameters, and the Schema Theorem is then adapted to tournament selection. This will allow us to find an estimate of the mutation and crossover rates for our case, dependent on the proportion of the population finally occupied by the schema of highest fitness found. Finally we will derive a way to obtain the optimal population size given that a certain minimum level of mutational change has to be retained between generations.
Tournament Selection
There are various selection schemes one can choose from when designing a genetic algorithm. The most popular selection schemes are fitness proportionate (also called roulette-wheel), stochastic universal sampling, ranking, local selection, truncation selection, and tournament selection. Stochastic universal sampling and tournament selection are the selection schemes most used in practice, since they are easy to implement and are associated with low stochastic noise. Tournament selection has the advantage over stochastic universal sampling in that it can be easily adapted to parallel computing architectures.
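Tournament selection as discussed above can be sketched in a few lines. The sketch assumes that each tournament draws k distinct individuals uniformly at random and keeps the fittest one; the function names and the toy population are hypothetical.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Return one parent: the fittest of k individuals sampled without replacement."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)


def next_parents(population, fitness, k, rng):
    """Run one tournament per slot so the parent pool keeps the population size."""
    return [tournament_select(population, fitness, k, rng) for _ in population]


# Toy usage: individuals are integers and fitness is the value itself.
rng = random.Random(0)
population = list(range(10))
parents = next_parents(population, lambda x: x, 3, rng)
```

Because every tournament only reads the population and is independent of the others, the calls inside `next_parents` could be distributed across workers, which is the property that makes the scheme easy to parallelize.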
Genetic Algorithm Parameter Determination
Holland's Schema Theorem is applicable to a population, so it can be used to derive estimates for crossover and mutation rates. The first step, however, is to transform the Schema Theorem into a version that is applicable to tournament selection, since Holland's original version was aimed at fitness proportionate selection.
Schema Theorem: The expected number of schema ξ at generation t+1 after one point crossover and mutation is
where $P[\xi,t]$ is the probability of crossing with the same schema $\xi$, $p_c$ is the crossover probability, $p_{M0}$ is the positional probability of no mutation occurring, $h(\xi)$ is the schema order, $\hat{\mu}(t)$ is defined in equation (10), $l(\xi)$ is the unconstrained distance of the schema, $l$ is the length of the genome, $E[\xi,t]$ is the number of schema $\xi$ at time $t$, and $\hat{\mu}_\xi$ is the absolute fitness of schema $\xi$ (note that the absolute fitness of a schema is not dependent on time) and
where $n_t$ is the number of schemata in the population at time t.
Holland derived his Schema Theorem for the case of fitness proportionate selection, and so we have to modify it such that it fits our tournament selection scenario. The expression
used in the Schema Theorem betrays its fitness proportionate bias and we need to rewrite this term using tournament selection scheme parameters.
is the expected number of copies of schema ξ in the t+1-th generation before cross over and mutation events occur.
Let an initial population of size N be given and let the tournament size be k. The tournament selection process we are using consists of the following steps:
possibilities to arrange the i best schema copies along N positions and for each of these cases there are
possibilities to arrive at the given pattern. Since there are overall
possibilities to draw sets of k schemata out of N the probability of drawing the best schema exactly i times calculates to
which after simplification results in
This means that the number of copies of the best schema in the daughter population is binomially distributed
So the expected number of best schema copies in the daughter population is k, and the standard deviation is
Note that
Our result so far says that if we have just one copy of the best schema in the population, then under tournament selection we expect k copies in the next generation. We now want to generalize this result to having at least one copy of the best schema, so let the number of copies of the best schema in the parent population be s with $s \in \mathbb{N}$, where $\mathbb{N}$ is the set of the positive integers. Then the probability of not obtaining a best schema during a single draw of a tournament set is
and since we take exactly one schema from each tournament set the probability of obtaining exactly one best schema from a tournament set is
So we obtain in the daughter population exactly i best schemata with the probability
which again is a binomial probability. The expected value of the number of best schemata in the daughter generation if there are s best schemata in the parent population is
This expression is not easily simplified into a more approachable form, but when we take the limit over the population size then we obtain:
We can see this by the following calculation:
Since
for all $N \in \mathbb{N}$, we see that the expected value is increasing in N.
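The behaviour of the expected value discussed above can be checked numerically. The sketch below is a reconstruction under stated assumptions, not the disclosure's own formula: it assumes that N tournament sets are drawn, that each set consists of k distinct schemata out of the N parents, and that a set yields a best schema whenever it contains at least one of the s best copies. Under those assumptions the probability that a set misses every best copy is C(N-s, k)/C(N, k). The function name is hypothetical.

```python
from math import comb


def expected_best_copies(N, s, k):
    """Expected number of best-schema copies in the daughter population,
    for N tournaments of size k and s best copies among the N parents."""
    p_miss = comb(N - s, k) / comb(N, k)  # a tournament set contains no best copy
    return N * (1.0 - p_miss)


single_copy = expected_best_copies(100, 1, 7)     # one best copy in the parents
small_pop = expected_best_copies(50, 3, 7)        # three best copies, small N
large_pop = expected_best_copies(10**6, 3, 7)     # three best copies, large N
```

Under these assumptions the single-copy case returns k, and for s copies the value grows with N towards s·k, which agrees with replacing the fitness proportionate term by k·E[ξ,t].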
As before we are also interested in the standard deviation which is
And we obtain
as before in the simpler case of s=1. We see this by going through the following calculation:
The generalization shows us that the number of best copies grows linearly with the generation count with intensity k, and we obtain that we have to replace
by the expression k·E[ξ,t], and as final result we obtain for the schema theorem under tournament selection the following inequality:
A major problem in running evolutionary algorithms is to estimate the mutation rates properly. If the mutation rate is too low, then the process gets easily trapped in sub-optimal situations, while if it is too high, optimal situations might not be realized due to rapid fluctuations. The goal of this section is to provide reasonable estimates for the rate of mutation and, linked with that, the rate of crossover. The following paragraph discusses the effect of mutation and crossover on the schema of highest fitness, which provides us with an approach to the estimation problems.
In a population without crossover and mutation the deterministic part of the evolutionary process, namely reproduction and selection will result eventually in the population consisting entirely of the schema with the highest fitness. Adding crossover to this deterministic system will not change the eventual outcome if the crossover does not destroy every occurrence of the highest fitness schema during the initial time period where that schema frequency is low. Crossover will delay the final outcome, but will not change it.
The outcome changes, however, when adding mutation. In the following we will assume that each defined position of the schema experiences mutation with the same probability and that mutation occurs independently across those positions. Let the number of the defined positions of schema ξ be denoted by h(ξ), and denote with $p_{M0}$ the probability that at any given defined position and time point no mutation occurs. Now let us assume for the moment that the whole population consists initially exclusively of copies of the highest fitness schema; then within one generation mutation will retain just a proportion of that population, and ignoring crossover that proportion will be $(p_{M0})^{h(\xi)}$. As the process continues, selection will keep eliminating all other schemata but ξ, while mutation will convert copies of ξ into other schemata. We expect, therefore, that in a population with mutation (and no crossover) the schema of highest fitness eventually will occupy a proportion of about $(p_{M0})^{h(\xi)}$ of the population. Adding crossover means that schemata added by mutation possibly interact with ξ, but if they do, the results will be selected against as if they had been generated by mutation. So crossover has an effect like boosting the mutation rate, which means lowering the final proportion of schema ξ.
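As a worked illustration of the retained proportion, consider hypothetical values that are not taken from the disclosure: a schema with 10 defined positions and a positional probability of no mutation of 0.99.

```latex
(p_{M0})^{h(\xi)} = 0.99^{10} \approx 0.904
```

Under these assumed values, roughly 90% of the copies of the schema survive mutation in one generation, even though each individual position mutates only 1% of the time. The retained proportion therefore drops quickly as the schema order grows.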
We return now to the issue mentioned in the introduction to this section, namely the proper estimate of the mutation rate, with an underestimation of that rate leading to entrapment, while an overestimation results in loss of information, that is, a lack of convergence.
One approach to avoid the latter case is to define a target proportion for the schema of highest fitness, which then allows us, according to the discussion above, to derive criteria for the mutation and crossover rate. This approach also will provide a lower bound for the mutation rate, and so avoid the first case of getting trapped in too sub-optimal solutions, since defining a target proportion less than 1 forces the mutation rate above a minimum value. Assuming a target proportion has also implicitly the effect that the process is forced to converge, and that indefinite fluctuation is avoided.
In the following we will take the approach of defining a target proportion pξ for the highest fitness schema ξ, and we will use the Schema Theorem to obtain estimates for the crossover rate pc and the positional non-mutation rate pM0. We denote as pM1 the probability that a mutation occurs, so pM1:=1−pM0, so pM1 is the rate of mutation of the mutational process.
We require $p_{M0}$ and $p_c$ to be chosen such that once the value of P[ξ,t] is sufficiently close to pξ, say |P[ξ,t]−pξ|<ε, convergence to the limit population dominates the process, which means that
for almost all t≧T0, where T0 is the first generation for which |P[ξ,t]−pξ|<ε. Taking the limit we obtain:
which means that
Therefore
Solving for (pM0)h(ξ) we obtain
and finally
[The last inequality was obtained from the fact that
for any cε(0,1], and
therefore
Indeed the underlying equality is
Solving now for pc we obtain
and finally
This concludes the derivation of the relationship between the evolutionary dynamics parameters for the purpose of their estimation.
Given the parameters of the genetic algorithm with tournament selection, k=7, pξ=0.85, δ(H)=34, and l=34, the terms N, pM0 and pc can be calculated. From equation X,
With l=34 and with schemata defined from equation X, h(ξ)=34, which implies that l(ξ)=33, to yield
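The equation simplifies to $46.6667\cdot(p_{M0})^{34} - 6.6667 \ge p_c$. We estimate $p_{M0}$ first. Since $p_c > 0$ we obtain that $46.6667\cdot(p_{M0})^{34} > 6.6667$, and, therefore,
$$p_{M0} > \left(\frac{6.6667}{46.6667}\right)^{1/34}$$
or $p_{M0} > 0.94437$. With the upper bound on $p_c$, where $p_c \le 1$, the inequality simplifies to $46.6667\cdot(p_{M0})^{34} \le 7.6667$, yielding
$$p_{M0} \le \left(\frac{7.6667}{46.6667}\right)^{1/34}$$
giving $p_{M0} < 0.94826$. Overall the probability of no mutation is $p_{M0} \in (0.94437, 0.94826)$. The midpoint of the interval gives $p_{M0} = 0.94632$. Using the midpoint of $p_{M0}$,
$p_c \le 46.6667\cdot(0.94632)^{34} - 6.6667$, so that $p_c \le 0.4831$.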
To ensure that the schema ξ of highest fitness maximally occupies 85% of the limit population we have to set the mutation rate to pM1=0.05368 and the crossover rate to pc≦0.4831. The larger the rate of crossover the longer it will take the population to consist of ξ around 85%, and for lower pc the limit proportion is higher. However, the higher the mutation and crossover rate, the more ξ schemata of highest fitness are discovered during the evolutionary process. The schema ξ is dependent on the initial population distribution and changes as higher fit schemata are discovered.
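The arithmetic of this worked example can be reproduced with the short script below. The coefficients 46.6667 and 6.6667 from the text coincide with k/(1−pξ) and 1/(1−pξ) for k=7 and pξ=0.85; treating them as instances of that general form is an assumption made here for illustration, and the variable names are hypothetical.

```python
k = 7             # tournament size
p_target = 0.85   # target proportion of the highest fitness schema
h = 34            # schema order h(xi)

slope = k / (1 - p_target)    # about 46.6667 (assumed general form)
offset = 1 / (1 - p_target)   # about 6.6667 (assumed general form)

lower = (offset / slope) ** (1 / h)         # from requiring p_c > 0
upper = ((offset + 1) / slope) ** (1 / h)   # from requiring p_c <= 1
p_m0 = (lower + upper) / 2                  # midpoint of the admissible interval
p_c_max = slope * p_m0 ** h - offset        # upper bound on the crossover rate
p_m1 = 1 - p_m0                             # positional mutation rate

print(f"p_M0 in ({lower:.5f}, {upper:.5f}), midpoint {p_m0:.5f}")
print(f"p_M1 = {p_m1:.5f}, p_c <= {p_c_max:.4f}")
```

The crossover bound is sensitive to rounding of the no-mutation probability, because that probability is raised to the 34th power, so the last digit of the bound can shift slightly depending on how many decimals are carried.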
Choosing the appropriate population size when running a GA is necessary for two reasons. First, if the population size is chosen too large, then the GA might not terminate in reasonable time due to the immense processing effort. Second, if the population size is chosen too small, then mutation cannot introduce new schemata into the population at a sufficient rate, and the algorithm may not converge even to a local maximum but instead get stuck well before having reached a peak.
The question we are trying to answer in this section is: What is the best population size to choose when running a genetic algorithm? From what has been mentioned before, it is clear that the best population size is the smallest one that satisfies a certain requirement regarding maintaining mutational change. We tried to answer questions of the following type: Given a certain mutation rate, how large does a population have to be such that mutation will generate at least Y changed schemata with at least probability X (level of confidence X·100%) in the next generation?
Let the mutation rate per schema location be denoted by $p_M$; then the probability that a schema ξ is transformed by mutation into another schema is $p_M^{\mathrm{eff}} := 1-(1-p_M)^l$.
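Let a population of size N be given and let $n \in \{0, 1, 2, \ldots, N\}$. Then the probability of exactly n schemata changing due to mutation from the mother to the daughter generation is:
$$P(Y=n) = \binom{N}{n}\,(p_M^{\mathrm{eff}})^n\,(1-p_M^{\mathrm{eff}})^{N-n}$$
where Y is the random variable for the number of changed schemata.
From here we obtain that the probability of at least n schemata changing is
$$P(Y \ge n) = 1 - \sum_{j=0}^{n-1} \binom{N}{j}\,(p_M^{\mathrm{eff}})^j\,(1-p_M^{\mathrm{eff}})^{N-j}$$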
So we can rewrite our initial question as follows:
Given a certain schema location mutation rate $p_M$ and schema length l, how large does the size N of a population have to be such that mutation will generate at least n changed schemata with probability $P(Y \ge n) \ge X$?
So the parameters we have to supply are mutation rate pM, schema length l, minimum number of schemata n which are supposed to change, and the level of confidence X.
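The search for the smallest adequate population can be sketched as below for the case of a fixed minimum count n of changed schemata. The sketch assumes that each of the N schemata changes independently with the effective mutation probability, so that the number of changed schemata is binomially distributed; the function names and the search limit are hypothetical.

```python
from math import comb


def effective_mutation_rate(p_m, length):
    """Probability that at least one of `length` positions mutates."""
    return 1.0 - (1.0 - p_m) ** length


def prob_at_least(n, N, p_eff):
    """P(Y >= n) for Y ~ Binomial(N, p_eff)."""
    below = sum(comb(N, j) * p_eff**j * (1 - p_eff) ** (N - j) for j in range(n))
    return 1.0 - below


def smallest_population(n, p_eff, confidence, n_max=1000):
    """Smallest N <= n_max with P(Y >= n) >= confidence, or None if none exists."""
    for N in range(max(n, 1), n_max + 1):
        if prob_at_least(n, N, p_eff) >= confidence:
            return N
    return None


# Toy usage with hypothetical inputs: at least one change with 95% confidence.
toy_size = smallest_population(1, 0.5, 0.95)
```

For a fixed count n the tail probability only grows with N, so the first N that meets the confidence level is the smallest one. When the requirement is a percentage of the population instead, n changes with N and each candidate N has to be checked with its own n.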
The table and graph depicted in
From the table and the accompanying graph we can see that as population size increases, the probability that a certain proportion of the population changes due to mutation increases as well. This makes intuitive sense, since in an infinitely large population we expect the proportion changed to be exactly the effective mutation rate, namely 27.5% here, and as the population size increases this proportion is expected to be met more and more closely.
Since we asked for a 95% confidence that at least 20% of the population changes, we obtain as the best population size N=80 (N=75, the last number before 80 for which 20% is an integer, yields a confidence of just 94.7%). The reasoning behind taking the smallest population size that just satisfies the requirements is that any larger population size will require more processing effort.
The process goes analogously if one wants to have a certain minimum amount (instead of percentage) of changes to occur. The table and graph in
The best population size in this case is actually N=118 with a confidence level of about 95.3%.
A final remark shall clarify the difference between mutational change and variability. Mutational change addresses the probability that a schema will mutate into another schema during the process of generating the daughter population. A high probability of mutational change does not always mean high increase in variability. If the probability of mutating into an already present schema is large, and the parent generation shows large variability, then the increase in variability, even with a large probability of change, might be small. This is for example the case when the population size is close to the total number of possible schemata, and nearly all schemata are already present in a parent generation. In our case the number of schemata in the population will be low compared to the number of possible schemata, so the probability is high that mutation actually generates new schemata, and change indeed introduces more variability.
Genetic algorithm population estimation is derived from the above equations, which assert that the population size is directly related to the mutation rate, the schema length, the minimum number of schemata that should change, and a level of confidence for parameter estimation.
Given the probability of no mutation, pM0=0.94632, simulation results with the selected number of changed schemata as 25 are shown within the tables shown in
From the table shown in
Following the building blocks model and bounding $N = \chi^k (k \log \chi + \log m)$, the lower bound population number follows,
$2^1(1 \cdot \log(2) + \log(33)) = 8.4 = N$. (11)
Models that predict the size of populations for Bayesian Optimization Algorithms (BOA) bound the total number of schemata to initialize within the genetic algorithm to a large range of [40.55, 1644.8] chromosomes.
$O(m^{1.05}) \le n \le O(m^{2.1})$
$O(34^{1.05}) \le n \le O(34^{2.1})$, that is, $40.55 \le n \le 1644.8$
Goldberg asserted that $O(m^{1.4}) \approx N$, which estimates a population number for general genetic algorithms and which is within the building block and bounding model.
$O(34^{1.4}) = 139.3 = N$ (12)
Clearly, N=60 is greater than 8.4 and lies within the interval [40.55, 1644.8].
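The population size estimates above can be evaluated directly, as in the following sketch. It assumes natural logarithms in the building block bound and treats the order-of-magnitude expressions as plain powers of m=34 with a constant factor of one; the function and variable names are hypothetical.

```python
import math

m = 34  # genome length used in the worked example


def building_block_bound(chi, k, m_blocks):
    """chi^k * (k*log(chi) + log(m)), evaluated with natural logarithms."""
    return chi**k * (k * math.log(chi) + math.log(m_blocks))


lower_bb = building_block_bound(2, 1, 33)   # building block lower bound
boa_low, boa_high = m**1.05, m**2.1         # BOA-style range, constant factor 1
goldberg = m**1.4                           # Goldberg's general estimate

print(f"building block bound: {lower_bb:.1f}")
print(f"BOA range: [{boa_low:.2f}, {boa_high:.1f}]")
print(f"Goldberg estimate: {goldberg:.1f}")
print("N=60 inside BOA range:", boa_low <= 60 <= boa_high)
```

A chosen population size can then be compared against all three estimates at once, which is how the value N=60 is placed relative to the bounds.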
As a compromise between growth rate and run time, the Data Mining Feature and Algorithm Selector system utilizes the tournament selection approach. In addition, the selection pressure of a tournament scheme is equal to the tournament size. If the selection pressure is increased for the tournament scheme, the growth ratios and the upper bounds of the mutation and crossover probabilities increase. Another important aspect of the tournament scheme is the ability to parallelize processes.
Finally, the initial parameters of the genetic algorithm include N=60, pM1=0.05368, pc≦0.4831, k=7, pξ=0.85, δ(H)=34, and l=34.
Fitness Function
A measure of natural selection determines which individuals survive to the next generation. Such a utility function provides a numerical metric value that can be contrasted with those of other members of a population. Three weighted metrics provide a fitness value for the encoding of data features and an algorithm. A cluster quality score is determined from a chromosome's structural phenotype or cluster space. The second metric calculates the efficiency of a phenotype for a specific problem. Finally, a name quality score includes precision and recall values for a specific set of features.
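Combining the three metrics into one fitness value can be sketched as a normalized weighted sum. The weights, the metric names used as dictionary keys and the assumption that each score already lies between 0 and 1 are illustrative choices, not values given in the disclosure.

```python
def aggregate_fitness(scores, weights):
    """Normalized weighted sum of metric scores, each assumed to lie in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must name the same metrics")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Hypothetical scores for one chromosome and hypothetical weights.
scores = {"cluster_quality": 1.0, "search_efficiency": 0.0, "name_quality": 0.5}
weights = {"cluster_quality": 2, "search_efficiency": 1, "name_quality": 1}
fitness = aggregate_fitness(scores, weights)
```

Dividing by the total weight keeps the aggregate on the same 0 to 1 scale as the individual metrics, so chromosomes remain comparable if the weights are later retuned.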
Intuitively, a good cluster space contains similar items within clusters that have low variance, with the clusters spread far apart from each other. The cluster space is produced from the translation of a chromosome into a phenotype as described herein. The cluster quality metric measures the space ratio of a phenotype or cluster space. The best quality measure maximizes the distance between clusters while minimizing the distance of members within a cluster. Clusters are not necessarily spherical, so the distance between clusters is in general dependent on the orientation of the clusters relative to each other. As such, the measure takes the orientation of the clusters towards each other into account. The measure between clusters can be calculated by sample to mean or sample to sample. Even though the sample based measure is computationally intensive, the clustering measure is accurate without the risk of outlier skew, because outliers have already been eliminated by the clustering algorithm. As described below, the ratio of Db, the between cluster spread, to Dw, the within cluster spread, is fundamental for the quality measure.
where n is the overall number of elements in the space (total number of images), N is the number of clusters, nk is the number of elements in cluster k, k=1, 2, . . . , N, d(x,y) is the Euclidean distance between vectors x and y.
Using limited return dynamics,
with a being a parameter whose value is to be chosen such that it imposes a reasonable speed of growth on ρa. The halfway value of ρa, ρa(a)=½, is where for r=a half of the maximum possible quality intensity has been measured. The limited return dynamics grows fairly linearly and then bends to approach 1. Until intensity ½, the dynamics is close to linear yielding a good candidate for the halfway value when r reaches a maximum value and r is limited. However, r can grow indefinitely or clusters can be arbitrarily far apart. Instead, r is bounded by good clustering criteria. Db≦Dw should not occur because two clusters satisfying that condition would never be separated by a clustering approach. As such, Db>Dw. Qualitatively good clustering is defined by Db≧2·Dw where
or
Finally, a=2 is the halfway value and the cluster quality formula becomes
The second feature affecting the usefulness of the selected algorithm and data features is search efficiency. The search efficiency measure is the expected number of search steps a probe has to exhaust in order to find the desired picture or sample. The reciprocal of the step number is a measure for the efficiency with which a clustering can be searched.
where Cc is a cluster c, c=1, 2, . . . , N and Px is the set of pictures in element x, 0≦w≦1. Within this formula, the only entity changing is N since
is the number of all pictures in the database and
is the number of all feature vectors which are constant. Every variable can be computed a priori except for the number of clusters. The search efficiency depends on the number of clusters if it is measured within a picture database with one fixed set of feature vectors.
Analogous to the cluster quality, limited return dynamics yields a measure for the search efficiency intensity. To calculate the halfway value, the derivative of
with regard to x provides x=√c as a maximum for f(x). The largest value for w is attained when
The smallest value for f(x) is
attained when x=0, because
As such,
Because the original measure is not limited,
supremum of x, xsup, provides
Finally,
where π is a measure for the search efficiency.
The name search result quality yields a measure of quality from the results of an image feature vector search. Each feature vector within the gallery or search space has an associated person name or target. The resulting confusion matrix is utilized to calculate precision and recall values.
where the F-Score is
and where
and
X={x1, x2, . . . , xM} is the set of names; #xf=Number of occurrences of name x in feature vector f; Sx
Clearly m is between 0 and 1. Limiting return dynamics is used to standardize the score. The maximum possible value for m is 1, which is, for example, always the case when M=1. Finally, the name search result quality function is
A fitness function measures the performance of individuals relative to each other. The measure of overall performance can be based on several independent dimensional quantities. For example, in nature an individual of a species might be selected for size, parenting skills, ability to cooperate, and fur length. Each of the features is utilized within a fitness function that comprehends the diverse qualities. Within the perpetual analytics system 10, such a fitness function combines the cluster quality, search efficiency, and name result quality measures for a given cluster space. Recall that the instructions for the creation of the cluster space have been decoded from a chromosome. The combined metric provides a fitness score for the chromosome.
The creation of a final fitness function involves three consolidation steps. First, a common scale is imposed on each of the quantity measures. Within the system, a quantity is absent or present with an associated unlimited magnitude. The common scale is within the interval [0,1] or within the range of 0 to 100%. Second, the unlimited magnitude is mapped to a limited range [0,1] such that each quantity has been normalized to the same range. A limiting return dynamics function provides a limit on measures. Algebraically, the simplest form of limiting return is given by
where x is the originally measured quantity, and a is the value of that quantity x for which the term y becomes ½. The choice of a determines how fast y grows as x increases. The variable a is a rescaling parameter which can be used to normalize diverse quantity measures even if the ranges of quantities are vastly different. The second consolidation step consists of finding for each measure the proper value of its parameter a. If the original measure is not unlimited, the supremum xsup of the possible x values is determined and
is used. Lastly, relative weights of each metric encode a contribution to the overall fitness score.
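The normalization step can be sketched in Python as follows. The sketch assumes the limiting return function has the form y = x/(x+a), which is the simplest algebraic form satisfying the stated property that y = ½ when x = a; the function name is hypothetical and the exact formula of the system is not reproduced here.

```python
def limiting_return(x, a):
    """Map an unbounded, non-negative quantity x into [0, 1).

    Assumed form: y = x / (x + a), so that y equals 0.5 when x == a.
    The parameter a rescales the measure and sets how fast y grows.
    """
    if x < 0 or a <= 0:
        raise ValueError("x must be non-negative and a must be positive")
    return x / (x + a)

# With the halfway value a = 2, a raw ratio of 2 maps to one half and
# ever larger ratios approach, but never reach, 1.
halfway = limiting_return(2.0, 2.0)
large = limiting_return(1000.0, 2.0)
```

Under this assumption, measures with very different raw ranges become comparable once each is given its own halfway value a.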
The three fitness functions established are search space performance measures, so in order to establish a performance measure based on their combination we have to combine them via their harmonic mean.
We, therefore, define the final function φ(Ω) as follows:
Since ρ2(rΩ), π(wΩ), λ(mΩ) ∈ [0,1], each of the reciprocals is ≧1, and so the sum of those reciprocals is ≧3, guaranteeing that φ(Ω) ∈ [0,1] as well.
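The combination by harmonic mean can be sketched as below. The sketch assumes the unweighted form φ = 3/(1/ρ + 1/π + 1/λ), which is the standard harmonic mean of three values and is consistent with the bound argument above; treating a zero score as an overall fitness of zero is an added assumption, and the function name is hypothetical.

```python
def harmonic_fitness(rho, pi, lam):
    """Harmonic mean of three normalized scores, each in [0, 1].

    rho: cluster quality, pi: search efficiency, lam: name result quality.
    A zero score is treated as zero overall fitness (an assumption made
    here to avoid dividing by zero).
    """
    scores = (rho, pi, lam)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("all scores must lie in [0, 1]")
    if any(s == 0.0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)

# One weak measure pulls the combined score down sharply.
balanced = harmonic_fitness(0.5, 0.5, 0.5)
skewed = harmonic_fitness(1.0, 0.5, 0.25)
```

The harmonic mean rewards cluster spaces that do reasonably well on all three measures over those that excel on one and fail on another.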
If we want to weigh fitness measures differently, we can extend the final fitness function as follows:
where 0≦a≦1, 0≦β≦1, and 0≦a+β≦1.
Analogously to before, we conclude that φa,β(Ω) ∈ [0,1] since a·ρ2(rΩ) ∈ [0,a]∩[0,1], β·π(wΩ) ∈ [0,β]∩[0,1], and (1−a−β)·λ(mΩ) ∈ [0,(1−a−β)]∩[0,1], which again means that the sum of their reciprocals is ≧3.
Algorithms
Classification algorithms are defined by the structure of the data to be processed and by the behavior that governs how the data should be processed. With respect to data structure, classifiers follow a strict taxonomy. At the first level, a classifier can be either exclusive or overlapping. If each object belongs to one class, the classification is exclusive. However, if cluster sets are not disjoint, the classifier becomes overlapping. The cluster creation process is either intrinsic or extrinsic. An algorithm is considered intrinsic if a proximity or feature matrix is solely used to learn classes within data. However, data labels or targets indicate that an extrinsic measure will produce clusters. Intrinsic is commonly known as unsupervised learning while extrinsic is synonymous with supervised learning. K-means clustering is an example of an exclusive and intrinsic algorithm while C-means clustering is an overlapping and intrinsic implementation. Decision or classification trees are examples of extrinsic clustering. A third division within the taxonomy for exclusive and intrinsic algorithms includes hierarchical and partitional. Hierarchical clustering is a chain of partitions where each ply or hierarchy level is a partition.
Following data structure decisions, classification algorithms have several differing behaviors for processing the data. Within any classification algorithm, any number of features can be selected at any iteration. A monothetic algorithm will use one data feature at a time. For example, within hierarchical clustering, a set of partitions might use the first feature while the following set selects the second feature.
Alternatively, the use of all data features during classification is referred to as polythetic. Data feature processing helps to guide an algorithm in deciding whether to merge or split a class or cluster. Class splitting is referred to as divisive while merging is called agglomerative. If all data begins within one class, the method will divide the least correlated data into separate class(es). However, agglomerative behavior initializes a cluster for each data object and merges like objects into classes. As new classes are formed, the center of the space can be updated after all data elements have been grouped (parallel) or after a single data element has been grouped (serial).
Dp(Clik, Clik+1) = D1(Clik, Clik+1)
A typical hierarchical clustering is implemented as described in table 2. The algorithm belongs to the exclusive, intrinsic, and hierarchical taxonomy while maintaining agglomerative, serial, and polythetic behavior. A hierarchical level or cluster space, Cn, is defined by a series of partitions, Pni. Each cluster, Clik, belongs to the ith partition and contains the lth data member,
xiklεClikεPni;i≧0;k≧0;l≧0;n≧0,
that is the kth cluster. The intersection of two clusters,
Clik∩Cli(k+l)=Φ,
produces an empty set because each data element belongs to one and only one cluster. The hierarchical clustering algorithm stops either when all of the data elements are merged into a cluster or when a halting criterion is reached.
The dendrogram depicted in
Dp(Clik,Clik+1).
Two commonly used cluster similarity measures utilize the single link or complete link scores. The single link determines the minimum, Dp, of all pairwise distances between two clusters. The complete link selects the maximum distance, Dp, of all pairwise points between two clusters. Both algorithms run in O(m*n) between two clusters. The single link is more versatile such that it can extract concentric circles from a cluster space. However, the clusters created by complete link are more compact. Even with the PubFig database reduced and correlated into 32 features, the feature space is highly complex. As such, the single link implementation was used so that any feature vector landscape, including concentric circles, could be captured. After the Dp metric is calculated between each cluster, the pair with minimum Dp within the similarity matrix is merged. According to Anderberg, the proximity indices calculated by Dp must satisfy:
a) Dp(Clik,Clik+1)≧0,∀k,k+1
b) Dissimilarity: Dp(Clik,Clik)=0,∀k
c) Similarity Dp(Clik,Clik)≧max Dp(Clik,Clik+1),∀k,k+1
d) Dp(Clik,Clik−1)=Dp(Clik−1,Clik),∀k,k+1
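The single link and complete link scores described above can be sketched in Python as follows. The sketch assumes Euclidean distance between members and represents a cluster as a plain list of numeric tuples; the function names are hypothetical.

```python
import math
from itertools import product

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_link(cluster_a, cluster_b, dist=euclidean):
    """Minimum pairwise distance between members of two clusters."""
    return min(dist(x, y) for x, y in product(cluster_a, cluster_b))

def complete_link(cluster_a, cluster_b, dist=euclidean):
    """Maximum pairwise distance between members of two clusters."""
    return max(dist(x, y) for x, y in product(cluster_a, cluster_b))

# Both scores visit every pair once, i.e. O(m*n) for clusters of size m and n.
left = [(0, 0), (0, 1)]
right = [(3, 0), (5, 0)]
nearest = single_link(left, right)
farthest = complete_link(left, right)
```

Both functions are symmetric in their arguments and non-negative, in line with conditions a) and d).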
The similarity function implements a Cartesian distance metric. Within binary space, the Cartesian distance and Hamming distance are equivalent. However, given a clustering threshold greater than 0, the epicenters of the resulting clusters will not be within binary space. With the introduction of continuous variables within the epicenter of clusters, Cartesian distance provides a continuous variable as output. The following equation depicts a Hamming distance measure within binary space and a Cartesian function for all other values.
The variable y is an element within the centroid of a cluster.
A second classification algorithm, a version of a decision tree, is implemented within the system. The decision tree uses a growing method such as Chi-squared Automatic Interaction Detection (CHAID) or Classification and Regression Trees (CRT) to create a tree-based classification model. The model creates groups or predicts values of a target based on predictor variables. The decision tree is a form of supervised learning since the target variables are defined a priori. Typically, decision trees are used for prediction, segmentation, stratification, data reduction, and grouping. Traditionally, each branch or node within the tree is represented by a decision rule. The decision rule is also a cut in space or classification process. The decision tree space forms a loose type of clustering space so that homogeneous clusters are formed.
Based on known work, the decision tree algorithm was altered to produce a clustering algorithm. Within the context of problem analytics, the cluster space should be produced from sets of selected feature vectors. To achieve such an algorithm, the decision tree pre- and post-processing was modified. Despite the alterations, the tree algorithm is still divisive, monothetic, and nonoverlapping.
The pre-processing of the data creates the a priori targets based on selected data features. Each data element's feature vector is projected by the chromosome onto a resulting feature space. The feature space for each data element will contain, at most, the original number of traits. After each data element has produced a projection, the target values are created. Target values are either non-existing points, Pne, or existing points, Pe. A non-existing point means that after a feature projection, the resulting set is not within the data set. An existing point is defined by the existence of a feature vector within the data set. Every record within the data set has a target value of Pne or Pe. The number of tree levels is dependent on the list of features to be used.
After the construction of the tree cluster space, all of the tree terminals or leaf nodes are collected as clusters. A two-step approach smooths the cluster space.
Pruning: min_e specifies the minimum number of existing points that must be present within a cluster for it to survive, where a% is a percentage threshold.
min_e=|Pe|*a%
Merging: min_dist determines the minimum distance between clusters before they are to be joined, where h is the schema order and b% is a percentage threshold.
min_dist=h*b%
The overall algorithm is found in table 4.
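The two smoothing thresholds and the pruning step can be sketched as follows. The sketch assumes that |Pe| is the total number of existing points in the data set, that a% and b% are given as percentages, and that each leaf cluster is represented as a list of target labels; all names are hypothetical, and the merging step itself is omitted because it depends on the centroid representation.

```python
def prune_threshold(num_existing_points, a_percent):
    """min_e = |Pe| * a%: existing points a cluster needs in order to survive."""
    return num_existing_points * a_percent / 100.0

def merge_threshold(schema_order, b_percent):
    """min_dist = h * b%: clusters closer than this are candidates to be joined."""
    return schema_order * b_percent / 100.0

def prune(clusters, min_e):
    """Keep only the clusters holding at least min_e existing points.

    Each cluster is a list of target labels, "Pe" (existing point) or
    "Pne" (non-existing point); this representation is an assumption.
    """
    return [c for c in clusters if sum(1 for t in c if t == "Pe") >= min_e]

# Hypothetical leaf clusters with three existing points overall.
leaves = [["Pe", "Pe", "Pne"], ["Pne"], ["Pe"]]
min_e = prune_threshold(3, 50)      # 1.5 existing points required
survivors = prune(leaves, min_e)
min_dist = merge_threshold(33, 10)  # schema order h = 33, b% = 10
```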
Within the perpetual analytic system 10, the Euclidean distance is utilized with all similarity matrix computations. However, within binary space, the Cartesian measure is equivalent to the Hamming measure. As such, the use of a Hamming distance metric within binary space while implementing the Cartesian metric for all others is equivalent to implementing the Euclidean measure for all domains. Even though the Hamming distance is ideal for comparing binary vectors, which are the feature encodings for the PubFig dataset, the metric does not have the concept of a mean vector.
A centroid for a cluster is a mean vector of data elements. The genetic algorithm fitness function weights a cluster efficiency metric that relies on mean vectors. As a result, the Euclidean measure was chosen. Even so, the Euclidean measure of distance is equivalent to the Hamming measure of difference on bitwise comparisons.
Let two binary vectors X=(x1, . . . , xn) and Y=(y1, . . . , yn) be given, which means that xi, yi ∈ {0,1} for i=1, . . . , n. Then the formula for the Hamming Distance H(X,Y) is as follows:
where
This formula can, in the case of xi, yi ∈ {0,1} for i=1, . . . , n, be translated into
The formula for the Euclidean Distance D(X,Y) is:
Since |xi−yi| is either 0 or 1 for i=1, . . . , n, we obtain that |xi−yi|=(xi−yi)², and so the Euclidean Distance becomes in the binary case:
which means that H(X,Y)=D(X,Y)².
The Euclidean Distance can be used in binary space instead of the Hamming Distance when comparing vectors while also supporting constructs such as the mean vector, which in general is not a vector in binary space, but whose components show the proportions of 1s of all the binary vectors that contribute to the mean. The Euclidean distance formula accepts any mean vector, whereas the Hamming Distance formula would not produce distance measures for it. Formula (13) can be applied to vectors which are not in binary space; however, the result would not be the Hamming Distance.
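The relationship between the two distances can be checked with a short Python sketch. It assumes the usual definitions: the Hamming distance counts differing positions and the Euclidean distance is the square root of the summed squared differences. The function names are hypothetical.

```python
import math

def hamming(x, y):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def euclidean(x, y):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_vector(vectors):
    """Component-wise mean; for binary inputs, the proportion of 1s per position."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

x = [1, 0, 1, 1]
y = [0, 0, 1, 0]
h = hamming(x, y)                 # two positions differ
d_squared = euclidean(x, y) ** 2  # matches h up to rounding in the binary case
centroid = mean_vector([x, y])    # not a binary vector in general
to_centroid = euclidean(x, centroid)
```

The last line shows the practical benefit: the Euclidean formula still yields a distance to the non-binary centroid.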
Each of the photos is described by 73 features obtained by attribute classifiers. The 73 attributes from each image were reduced to 33. Each attribute type, not value, is encoded onto a chromosome as a gene. A chromosome allele is in the set {1,0}. The 0 value means to exclude the attribute from a system iteration while 1 is inclusive. From equation x, a feature vector,
f(ū,cx)=ū′
ū=(1, Gender, Ethnicity, . . . , Generation)
cx=(1, 0, 1, . . . 0)
ū′=(1, null, Ethnicity, . . . , null)
The resulting feature vector is a subset of the original feature vector. Each distinct chromosome pattern selects a unique set of features. A chromosome with length 33 has 2^33 possible combinations or possible translations into feature sets. Table 5 is an enumeration of all 33 features.
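The projection f(ū,cx)=ū′ can be sketched in Python as below. The sketch assumes that excluded attributes are represented by None, mirroring the null entries above; the function name and the sample attribute values are hypothetical.

```python
def project(features, chromosome):
    """Apply a binary chromosome as a mask over a feature vector.

    A gene of 1 keeps the attribute; a gene of 0 replaces it with None.
    """
    if len(features) != len(chromosome):
        raise ValueError("feature vector and chromosome must have equal length")
    return [value if gene == 1 else None
            for value, gene in zip(features, chromosome)]

# Hypothetical four-attribute example in the spirit of the vectors above.
u = [1, "Male", "Asian", "Senior"]
c = [1, 0, 1, 0]
u_prime = project(u, c)

# Size of the search space for a chromosome of length 33.
search_space = 2 ** 33
```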
Data Preprocessing: Dimensionality Reduction and Variable Selection
The science of data analytics includes the analysis of data for the generation of insights resulting in predictive decisions. As data becomes more complex and heterogeneous, data analysis becomes intractable. The unsustainability of data complexity is known as the curse of dimensionality. Such a high-dimensional feature space requires an increasing amount of computational cycles. As dimensionality increases, algorithms on the high feature space become computationally intense. As a result, two general classes of dimensionality reduction techniques along with variable selection methods are implemented within data analytics.
The first class of dimensionality reduction includes lossy algorithms that project features into a lower dimensional space. The projection truncates the remainder such that the data cannot be recovered. For example, Principal Component Analysis (PCA) is a lossy compression algorithm that discards data that has a low impact on the overall datagram. The operation is noninvertible such that the original source cannot be retrieved. A familiar example of lossy image compression is the Joint Photographic Experts Group (JPEG) standard recognized by the International Organization for Standardization (ISO), which applies a discrete cosine transform rather than PCA but discards low-impact components in a similar way. With each successive application of the compression algorithm, the data becomes fuzzier. Data loss is not a problem as long as:
Secondly, lossless algorithms apply patterns or statistical models to data that map the source into a lower dimensional space. The combination of the mapping and resulting data is smaller than the original. However, the algorithm is invertible so that the original data can be recovered. Many wavelet compression or ensemble combinations of algorithms provide lossless or near lossless steps.
Jointly with or independently of dimensionality reduction, variable selection chooses which variables to include during analysis. Statistical methods become overwhelmed with an increased number of observations and number of features within each observation. Each variable of an observation defines a single dimension. Statisticians utilize the term variable or attribute while computer scientists identify with the term feature. Several statistical methods for attribute selection include the Pearson Correlation Coefficient, t-test and other anomaly detection metrics. Even though computer science and statistical language have counterparts, dimensionality reduction and variable selection are very different.
The two approaches are complementary: variable selection performs well on non-correlated data while dimensionality reduction is suitable for highly correlated data. Both methods can be combined for attribute or feature selection. Dimensionality reduction techniques are best used to rank correlation. Attribute selection is optimal for choosing informative features.
Through both dimensionality reduction and attribute selection methods, the Perpetual Problem Analytic System implements the Pearson correlation, significance testing, and continuous variable thresholding to simplify the data. All of the continuous variables were reduced to 0, absent, or 1, present, values. Of the 73 reduced attributes for each person photograph within the PubFig dataset, 33 attributes were selected to be encoded onto a chromosome.
People Dataset
The embodiment, people parsing, of problem analytics required a set of labeled images that contained enough detail for feature extraction. Several person centric databases are available within the public domain. The PubFig dataset contains 58,797 images of 200 people. Alternatively, a much shallower dataset called Labeled Faces in the Wild (LFW) contains over 13,000 images of over 5,700 people. Both PubFig and LFW utilize existing images selected from the web and are split into a combination of training, validation and testing sets. Each of the LFW samples is labeled with a person name through a manual process. However, LFW did not have benchmarked attributes or features such as pose, clothing, gender, etc. associated with each picture. A third dataset by Carnegie Mellon University called the Pose Illumination and Expression (PIE) dataset contained 41,368 images of 68 people. The dataset is very deep and contains 60 feature descriptions. However, the images were acquired in a controlled setting.
Unlike PIE, PubFig contained people images from any type of environment. Further, in contrast to LFW, PubFig has 73 attributes obtained from feature classifiers. The authors of PubFig utilized Amazon's Mechanical Turk, a crowd sourcing labor market, to label the attributes of each person. Each photo was submitted to three people for voting. The PubFig paper provides the accuracy of each classifier. As a result, the PubFig dataset provides a good compromise between a person deep and wide dataset within a natural acquisition with many extracted image features. The PubFig development set contains 60 people with 16,336 images, with the evaluation set comprising 140 people with 42,461 images.
Person traits from the PubFig database were discovered by attribute classifiers. The knowledge discovery process of person traits has been called People Parsing or attribute based people search. Kumar et al. applied 73 attribute classifiers to the entire 60,000 facial samples of the PubFig dataset. The classifier training data was obtained from crowd sourcing photographs to the Mechanical Turk. The labor cloud produced over 6.5 million inputs from 3 different people. Only the labels in which all 3 labelers agreed were retained. The mean accuracy of the classifiers is 84.09% while the variance was 0.006, which is very good on faces found on the internet.
The classification score is on the continuous interval [−1,1]. Attributes with a score s≧0 accept the alternate hypothesis, Ha, while those with s<0 accept the null hypothesis, H0. The null hypothesis claims that the classifier is not correct while the alternative hypothesis supports the contrary. The linear threshold at s=0 reduces the continuous attributes into H0 or Ha classes.
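The thresholding at s=0 can be sketched in Python as follows. The sketch assumes that the Ha class is encoded as 1 (present) and the H0 class as 0 (absent), in line with the binary reduction of continuous variables described earlier; the function names are hypothetical.

```python
def binarize(score):
    """Reduce a classifier score in [-1, 1] to 1 (s >= 0) or 0 (s < 0)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    return 1 if score >= 0 else 0

def binarize_all(scores):
    """Apply the s = 0 threshold to every attribute score of one image."""
    return [binarize(s) for s in scores]

# Hypothetical attribute scores for a single photograph.
bits = binarize_all([-1.0, 0.3, 0.0, -0.2])
```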
Variable Selection
Each of the 73 variables describes a feature of a given photograph. Several of the features belong to the same class type such as generation. An individual must be in an exclusive category such as Baby, Child, Youth, Middle Aged or Senior. A person cannot be both a Baby and a Senior. However, to determine groups of variable classes, attribute states were aggregated. Categories such as hair color, skin type, gender, generation, facial expression and eye wear type quickly formed. A few cases were ambiguous such as hair type. Could a person have both curly and wavy hair? To resolve non-obvious state relationships, the r-score or the Pearson Correlation Coefficient indicates correlation.
From Equation 14, a covariance matrix was formed with
The denominator of the calculation multiplies the standard deviations of the two variables, denoted as sx and sy respectively. The Pearson Correlation value will be in the interval r ∈ [−1,1]. A score of 1 indicates that the two variables are perfectly and positively correlated. Alternatively, an r-score of −1 means the two variables are perfectly and negatively correlated. A percentage indicator of correlation is found by squaring the r-score. Table 6 summarizes common statistical r-score meanings. Within the context of people attributes, the r-score provided empirical evidence of attribute states.
Equally important is the significance test or 2-tailed t-test on the r-score. The significance tests provided the probability that an r-score could occur given the facial dataset. The hypothesis testing space is summarized with equations 15.
The two-tailed test has N−2 degrees of freedom, from equation 16. Following equation 17, the t-score is related to both the degrees of freedom and the r-score. Any t-test significance value (p-value) less than 0.05 is significant, which rejects the null hypothesis.
The confidence of attaining an r-score is another test that proved of importance. The z-score of every r-score was calculated. A z-score centers all of the data around 0 with a standard deviation of 1. Equation 18 achieves the z-score from the r-score.
Each z-score's standard error was the standard deviation, s, divided by the square root of the number of samples. The standard error is an input parameter for the calculation of the 95% confidence interval as shown in equations 20 and 22. The 1.96 value is determined from a z-score table that can be found in any statistical textbook. Equation 21 is utilized to compute the r-score given the z-score to find the r-score confidence interval.
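The correlation, significance, and confidence interval calculations can be sketched in Python as follows. The sketch uses the textbook forms, which are assumptions with respect to the numbered equations: r as covariance over the product of standard deviations, t = r·sqrt((N−2)/(1−r²)), and a Fisher transformation z = atanh(r) with standard error 1/sqrt(N−3). The last choice differs from the standard error described above and is used only for illustration; all function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t-score for an r-score with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for r via the Fisher z transformation.

    Assumes standard error 1 / sqrt(n - 3); z_crit = 1.96 is the
    two-sided 95% critical value of the standard normal distribution.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

r = pearson_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])
t = t_statistic(0.5, 27)
low, high = r_confidence_interval(0.5, 28)
```

Converting the t-score into a p-value requires the t distribution with N−2 degrees of freedom, which is left out of this sketch.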
Selected Variables and Features
Variable selection and dimensionality reduction techniques formed a two-step process for data simplification. The data preprocessing reduces the search space of the perpetual analytic system such that several data feature and variable combinations are pruned. Prior and expert domain knowledge about a specific problem is utilized to logically group variables that could be states of a class. Scatter plots shown in
possible combinations to
Tables 7 and 8 depict Pearson Correlation and 2-tailed Significance testing for two candidate variable classes: Teeth and Hair Type. Clearly, the two attributes for Teeth are extremely negatively correlated with an r-score of −0.986. The significance is virtually 100% for the alternate hypothesis or agreeing with the r-score. In addition, the Teeth variable r-score is within the 95% confidence interval.
For the Hair Type class, the r-score showed that the attribute Curly Hair was not, at minimum, moderately correlated with either Wavy or Straight Hair, with −0.164 and 0.045 respectively. The attribute Curly Hair was hoisted out of the Hair Type class and moved into an independent class. However, Wavy Hair and Straight Hair were combined as attributes for the Hair Type class. Both r-score confidence interval scores are calculated from equations 17 and 19.
After the variables were selected from a priori data source knowledge, scatter plots and statistical analysis, dimensionality reduction produced features. Within a class such as Hair Type, the attribute with the highest classification score was kept as the attribute feature. Equation 20 depicts the equation for all or nothing. If a class had only 1 attribute, the feature became Boolean.
The total of 73 features was grouped and reduced to 35 attributes.
System Training
The Perpetual Problem Analytic System utilizes supervised and unsupervised training techniques, resulting in an overall semi-supervised approach. Algorithm parameters are trained on half of the training data while the feature and algorithm selectors learn from the remaining data.
Semi Supervised Algorithm Threshold Training
Clustering algorithms require a few input parameters that must be determined a priori. For example, K Means clustering is a powerful and efficient pattern classification technique. The term K must be defined before the start of clustering. Alternatively, a proximity score and threshold can be used to dynamically determine the number of clusters during clustering. Hierarchical clustering contains levels of partitions with varying cluster numbers. The halting criterion is defined by a threshold on member proximity scores. As such, the final partition level of a hierarchical clustering dendrogram is determined by a threshold value that translates to a cluster number.
A semi-supervised learning approach was trained using the overall fitness function found above. The cluster quality metric is unsupervised such that external knowledge about the samples is not required. Both the search efficiency and name search result quality require labels within the data. These metrics are a form of supervised learning. By mixing unsupervised (non-labeled data) and supervised (labeled data) metrics together, the overall fitness is a form of semi-supervised learning. Traditional agglomerative hierarchical clustering continues to iterate while more than one cluster is present. Within semi-supervised trained hierarchical clustering, the final step checks a halting condition or the acceptable maximum number of clusters. The semi-supervised training approach is described in table 9.
The semi-supervised training approach was run on each chromosome that encodes features and algorithms. The training data was split into two independent sets to be used for hierarchical threshold training and overall chromosome ranking. SPSS provided the mechanism for the training data division. Prior works provide the foundations of the approach.
Semi-Supervised Fitness Training
The overall semi-supervised genotype training is outlined in table 10.
Half of the training data was utilized within the entire genotype learning process. Both labeled and non-labeled data were within the training set. The process of table 10 produces a recommendation of a set of features and an algorithm to apply to a problem.
Results—Semi Supervised Algorithm Threshold Learning
The hierarchical clustering maximum number of clusters threshold was trained on a subset of labeled development data from PubFig. SPSS Statistics split the development dataset approximately in half with random sampling. A set of parameters establishes the search space proximity matrix range and a step function to evaluate possible cases. A maximum, Dmax, and minimum, Dmin, Cartesian proximity measure between clusters was specified a priori. Within the algorithm, the maximum proximity measure of possible clusters is equal to l where all gene positions are different between a pair of chromosomes. The minimum score is 0, which implies equality. Since the smallest increase of Di to Di+1 is 1, the step size for all possible thresholds between Clmax and Clmin is 1, resulting in l possible hierarchical thresholds. Each of the possible thresholds is applied to an agglomerative k-means clustering algorithm. After cluster convergence, the cluster space is measured with φa(Ω):=a·ρ2(rΩ)+β·π(wΩ)+(1−a−β)·λ(mΩ) from equation.
Both the final cluster space score φa(Ω) and the number of clusters are retained. After all possible l steps, the corresponding cluster number for the highest scored cluster space is returned as the |C| halting criteria. The algorithm is repeated for each chromosome since the projection of each unique chromosome creates different spaces. The number of clustering runs is determined by l*n.
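The threshold sweep can be sketched in Python as below. The clustering routine and the fitness function are passed in as callables because their details are defined elsewhere; the function name, the callables, and the toy stand-ins in the example are hypothetical and only illustrate the control flow of trying every threshold and retaining the best-scoring cluster count.

```python
def select_halting_cluster_count(thresholds, cluster_fn, fitness_fn):
    """Try every threshold and return (cluster count, score) of the best space.

    cluster_fn(threshold) is assumed to return a list of clusters and
    fitness_fn(clusters) a score in [0, 1]; ties keep the earliest threshold.
    """
    best_score, best_count = None, None
    for threshold in thresholds:
        clusters = cluster_fn(threshold)
        score = fitness_fn(clusters)
        if best_score is None or score > best_score:
            best_score, best_count = score, len(clusters)
    return best_count, best_score

# Toy stand-ins: a larger threshold yields fewer clusters, and the
# hypothetical fitness peaks when exactly three clusters remain.
def toy_cluster(threshold):
    return [[i] for i in range(5 - threshold)]

def toy_fitness(clusters):
    return 1.0 / (1.0 + abs(len(clusters) - 3))

count, score = select_halting_cluster_count(range(5), toy_cluster, toy_fitness)
```

Running this once per chromosome reproduces the l*n clustering runs noted above, with l thresholds and n chromosomes.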
According to the space fitness evaluation score, the halting criterion |C| is selected. Clearly, by optimizing on φa(Ω), the algorithm produces a compromise between the number of clusters and a selected threshold. As depicted herein, the threshold is equivalent to the Hamming distance between chromosomes or the Cartesian distance in binary space.
Referring again to
I/O may comprise any system for exchanging information to/from an external resource. External devices/resources may comprise any known type of external device, including a monitor/display, speakers, storage, another computer system, a hand-held device, keyboard, mouse, voice recognition system, speech output system, printer, facsimile, pager, etc. The bus provides a communication link between each of the components in the computing device and likewise may comprise any known type of transmission link, including electrical, optical, wireless, etc. Although not shown, additional components, such as cache memory, communication systems, system software, etc., may be incorporated.
Access may be provided over a network such as the Internet, a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), etc. Communication could occur via a direct hardwired connection (e.g., serial port), or via an addressable connection that may utilize any combination of wireline and/or wireless transmission methods. Moreover, conventional network connectivity, such as Token Ring, Ethernet, WiFi or other conventional communications standards could be used. Still yet, connectivity could be provided by conventional TCP/IP sockets-based protocol. In this instance, an Internet service provider could be used to establish interconnectivity. Further, as indicated above, communication could occur in a client-server or server-server environment.
It should be appreciated that the teachings of the present invention could be offered as a business method on a subscription or fee basis. For example, a computer system comprising a perpetual analytics system 10 could be created, maintained and/or deployed by a service provider that offers the functions described herein for customers. That is, a service provider could offer to deploy or provide the ability to jointly select an algorithm and feature set as described above.
It is understood that, in addition to being implemented as a system and method, the features may be provided as one or more program products stored on a computer-readable storage medium, which, when run, enable a computer system to provide a perpetual analytics system. To this extent, the computer-readable storage medium may include program code, which implements the processes and systems described herein. It is understood that the term “computer-readable storage medium” comprises one or more of any type of physical embodiment of the program code. In particular, the computer-readable storage medium can comprise program code embodied on one or more portable storage articles of manufacture (e.g., a compact disc, a magnetic disk, a tape, etc.), or on one or more data storage portions of a computing device, such as memory and/or a storage system.
As used herein, it is understood that the terms “program code” and “computer program code” are synonymous and mean any expression, in any language, code or notation, of a set of instructions that cause a computing device having an information processing capability to perform a particular function either directly or after any combination of the following: (a) conversion to another language, code or notation; (b) reproduction in a different material form; and/or (c) decompression. To this extent, program code can be embodied as one or more types of program products, such as an application/software program, component software/a library of functions, an operating system, a basic I/O system/driver for a particular computing and/or I/O device, and the like. Further, it is understood that terms such as “component”, “subsystem” and “system” are synonymous as used herein and represent any combination of hardware and/or software capable of performing some function(s).
The block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art appreciate that any arrangement which is calculated to achieve the same purpose may be substituted for the specific embodiments shown and that the invention has other applications in other environments. This application is intended to cover any adaptations or variations of the present invention. The following claims are in no way intended to limit the scope of the invention to the specific embodiments described herein.
Number | Date | Country | |
---|---|---|---|
20130073490 A1 | Mar 2013 | US |