The present invention relates generally to pattern recognition systems.
There are numerous types of practical pattern recognition systems including, by way of example: facial recognition and fingerprint recognition systems, which are useful for security; speech recognition and handwriting recognition systems, which provide alternatives to keyboard-based human-machine interfacing; radar target recognition systems; and vector quantization systems, which are useful for digital compression and digital communications.
Generally, pattern recognition works by using sensors to collect data (e.g., image, audio) and using an application-specific feature vector extraction process to produce one or more feature vectors that characterize the collected data. The nature of the feature extraction process varies depending on the nature of the data. Once the feature vectors have been extracted, a particular classification sub-system is used to determine a vector subspace to which the extracted feature vector belongs. Each vector subspace corresponds to one possible identity of what was measured using the sensors. For example, in facial recognition each vector subspace can correspond to a particular person. In handwriting recognition each vector subspace can correspond to a particular letter or writing stroke, and in speech recognition each subspace can correspond to a particular phoneme, an atom of human speech.
One type of pattern recognition algorithm that is used to map feature vectors to a particular vector sub-space is called a Support Vector Machine (SVM). Support Vector Machines are based on a formulation of the task of finding decision boundaries as an inequality constrained optimization problem. The goal of Support Vector Machines is to find a decision boundary for which the distance (margin) between the decision boundary and exemplars of classes on both sides of the boundary is maximized. The earliest SVM algorithms were aimed at finding a linear boundary defined by a vector w and a bias w0 in a feature vector space. More recently the so-called 'kernel trick', which replaces the vector dot products used in the SVM with non-linear functions, has been proposed. Examples of kernel functions K(X,Y) that have been used in Support Vector Machines include the Linear kernel K(X,Y)=XᵀY, the Polynomial kernel K(X,Y)=(γXᵀY+δ)^d, the Radial Basis Function kernel K(X,Y)=exp(−γ∥X−Y∥²), and the Sigmoid kernel K(X,Y)=tanh(γXᵀY+δ), where γ, δ, and d are fixed parameters for their corresponding kernels. However, these kernel functions are typically used with fixed values of configuration parameters (e.g., d=2 for the Polynomial kernel and a fixed value of γ for the Radial Basis Function) so that a simplified Quadratic Programming method can be used to determine the unknown Support Vectors characterizing the input data.
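By way of illustration, the four kernel families listed above can be sketched as follows (a minimal Python sketch; the function names and default parameter values are illustrative and not part of the disclosure):

```python
import math

def dot(x, y):
    """Inner product X^T Y of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """Linear kernel: K(X, Y) = X^T Y."""
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, delta=1.0, d=2):
    """Polynomial kernel: K(X, Y) = (gamma * X^T Y + delta)^d."""
    return (gamma * dot(x, y) + delta) ** d

def rbf_kernel(x, y, gamma=1.0):
    """Radial Basis Function kernel: K(X, Y) = exp(-gamma * ||X - Y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def sigmoid_kernel(x, y, gamma=1.0, delta=0.0):
    """Sigmoid kernel: K(X, Y) = tanh(gamma * X^T Y + delta)."""
    return math.tanh(gamma * dot(x, y) + delta)
```

In each case γ, δ, and d play the role of the fixed configuration parameters mentioned above.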
A new distance metric function called the Q-Metric which is useful in pattern recognition is disclosed in co-pending patent application Ser. No. 11/554,643 filed Oct. 31, 2006, entitled "System For Pattern Recognition With Q-Metrics" by Magdi Mohamed et al. One mathematical expression of the Q-Metric is given by:

dλ(x,y)=[Πi=1 . . . N(1+λ|xi−yi|)−1]/λ EQU. 1

where, xi and yi are the ith coordinates of a first and a second vector respectively, N is the dimensionality of the vectors, and λ is a metric configuration parameter in the range −1≤λ≤0 (the λ=0 case being evaluated as a limit).
For computationally intensive pattern recognition applications, such as training with large, high dimensional training data sets, and on-line recognition at high data rates, the Q-Metric offers the advantage that it only involves elementary arithmetic operations, e.g., addition, subtraction, multiplication, and division. This is unlike the Sigmoid and Gaussian functions mentioned above and unlike the P-Metric:

dp(x,y)=[Σi=1 . . . N|xi−yi|^p]^(1/p)

where, xi and yi are the ith coordinates of a first and second vector respectively. The P-Metric involves raising to the power p (which may range up to a high value) and taking a pth root. Since p may take on arbitrary values within a specified range, evaluating the P-Metric, especially the pth root, is computationally intensive and can be numerically unstable. This is in contrast to the Q-Metric, which as stated involves only elementary operations. However, like the P-Metric, the Q-Metric is configurable through a range of functions, from a setting with λ=0 approximating the Manhattan (Taxi Cab) distance to a configuration with λ=−1 approximating the P-Metric with p=∞ (the Chebyshev distance). Thus, the Q-Metric has advantages in terms of metric property versatility yet has a low computational cost.
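To illustrate the computational contrast just described, the following sketch evaluates both metrics (the closed-form product expression for the Q-Metric is an assumption consistent with the recursion of EQU. 2 below; the variable names are illustrative):

```python
def p_metric(x, y, p):
    """Minkowski P-Metric: requires pth powers and a pth root."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def q_metric(x, y, lam):
    """Q-Metric using only elementary arithmetic operations.

    Closed form (assumed, consistent with the recursion of EQU. 2):
    d_lam(x, y) = (prod_i(1 + lam*|x_i - y_i|) - 1) / lam for lam != 0,
    tending to the Manhattan distance as lam -> 0.
    """
    if lam == 0.0:
        return sum(abs(a - b) for a, b in zip(x, y))  # limiting case
    prod = 1.0
    for a, b in zip(x, y):
        prod *= 1.0 + lam * abs(a - b)  # elementary operations only
    return (prod - 1.0) / lam

x, y = [0.2, 0.5, 0.1], [0.4, 0.1, 0.3]
manhattan = p_metric(x, y, 1)           # P-Metric with p = 1 (Taxi Cab)
near_manhattan = q_metric(x, y, -1e-9)  # Q-Metric with lam near 0
```

Note that the Q-Metric loop uses only multiplication, addition, subtraction, and one division, whereas the P-Metric requires pth powers and a pth root.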
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views and which together with the detailed description below are incorporated in and form part of the specification, serve to further illustrate various embodiments and to explain various principles and advantages all in accordance with the present invention.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
Before describing in detail embodiments that are in accordance with the present invention, it should be observed that the embodiments reside primarily in combinations of method steps and apparatus components related to Support Vector Machines. Accordingly, the apparatus components and method steps have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
In this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated that embodiments of the invention described herein may be comprised of one or more conventional processors and unique stored program instructions that control the one or more processors to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of Support Vector Machines described herein. The non-processor circuits may include, but are not limited to, signal drivers, clock circuits, power source circuits, and user input devices. As such, these functions may be interpreted as steps of a method to perform pattern recognition (classification and regression tasks). Alternatively, some or all functions could be implemented by a state machine that has no stored program instructions, or in one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), in which each function or some combinations of certain of the functions are implemented as custom logic. Of course, a combination of the two approaches could be used. Thus, methods and means for these functions have been described herein. Further, it is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions and programs and Integrated Circuits (ICs) with minimal experimentation.
The sensors 102 are coupled to one or more analog-to-digital converters (A/D) 106. The A/D 106 is used to digitize the data collected by the sensors 102. Multiple A/D's 106 or multi-channel A/D's 106 may be used if multiple sensors 102 are used. By way of example, the output of the A/D 106 can take the form of time series data or images. The A/D 106 is coupled to a feature vector extractor 108. The feature vector extractor 108 performs lossy compression on the digitized data output by the A/D 106 to produce a feature vector which compactly represents information derived from the subject 104. Various feature vector extraction programs that are specific to particular types of subjects are known to persons having ordinary skill in the relevant arts.
The feature vector extractor 108 is coupled to a Support Vector Machine 110. Assigning a feature vector to a sub-space completes the task of classifying the subject. The SVM 110 is coupled to a Q-Metric computer 112. The Q-Metric computer 112 is used to compute Q-Metric distances that are used as Kernel function values by the SVM 110. The Q-Metric computer 112 may be implemented in software, hardware, or a combination of hardware and software.
An identification output 114 is coupled to the SVM 110. Information identifying a particular vector-subspace (which corresponds to a particular class or individual) is output via the output 114. The identification output 114 can, for example, comprise a computer monitor.
Another way to compute the Q-Metric distance given by EQU. 1 is by evaluating the following recursion relation.
Ψi=Ψi-1+|xi−yi|+λΨi-1|xi−yi| EQU. 2
starting with an initial function value:
Ψ0=0
up to subscript N where N is the dimensionality of the feature vectors. The Q-Metric distance given by EQU. 1 is then determined by

dλ(x,y)=ΨN
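By way of illustration, the recursion of EQU. 2 can be evaluated in software as follows (a minimal sketch; the hardware engine described below performs the same computation cycle by cycle):

```python
def q_metric_recursive(x, y, lam):
    """Evaluate the Q-Metric distance via the recursion of EQU. 2.

    Psi_i = Psi_{i-1} + |x_i - y_i| + lam * Psi_{i-1} * |x_i - y_i|,
    starting from Psi_0 = 0; the distance d_lam(x, y) equals Psi_N.
    """
    psi = 0.0                  # Psi_0 = 0
    for xi, yi in zip(x, y):   # i = 1 .. N
        delta = abs(xi - yi)
        psi = psi + delta + lam * psi * delta
    return psi                 # Psi_N

taxi = q_metric_recursive([0.2, 0.5], [0.4, 0.1], 0.0)  # lam = 0: Manhattan
```

With λ=0 the recursion reduces to accumulating the absolute coordinate differences, i.e., the Manhattan distance.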
The first multiplier 304 receives the metric control parameter λ at a second input 308. The metric control parameter λ is received through a second input 384 of the recursive lambda rule engine 399 from a parameter register 386. The first multiplier 304 outputs a series of products λδi at an output 310.
The output 310 of the first multiplier 304 is coupled to a first input 312 of a second multiplier 314. The first multiplier 304 in combination with the second multiplier 314 form a three input multiplier. One skilled in the art will appreciate that signals input to the first multiplier 304 and the second multiplier 314 may be permuted among the inputs of the first multiplier 304 and second multiplier 314 without changing the functioning of the engine 399. An output 316 of the second multiplier 314 is coupled to a first input 318 of a first adder 320. A second input 322 of the first adder 320 sequentially receives absolute values of the differences δi directly from the first input 382 of the lambda rule engine 399. An output 324 of the first adder 320 is coupled to a first input 326 of a second adder 328. Accordingly, the first adder 320 and the second adder 328 form a three input adder.
An output 330 of the second adder 328 is coupled to a first input 332 of a multiplexer 334. A second input 336 of the multiplexer 334 is coupled to an initial value register 388. A control input 338 of the multiplexer 334 (controlled by a supervisory controller not shown) determines which of the first input 332 and second input 336 is coupled to an output 340 of the multiplexer 334. Initially the second input 336 which is coupled to the initial value register 388 is coupled to the output 340. For subsequent cycles of operation of the recursive lambda rule engine 399 the first input 332 of the multiplexer 334 which is coupled to the output 330 of the second adder 328, is coupled to the output of the multiplexer 334 so that the engine 399 operates in a recursive manner.
The output 340 of the multiplexer 334 is coupled to an input 342 of a shift register 344. An output 346 of the shift register 344 is coupled to a second input 348 of the second multiplier 314 and to a second input 350 of the second adder 328.
During each cycle of operation, the output of the first multiplier 304 is λδi, the output of the second multiplier 314 is λδiψi-1 (the third term of EQU. 2), the output of the first adder 320 is δi+λδiψi-1, and the output of the second adder 328 is ψi-1+δi+λδiψi-1, which is the right-hand side of EQU. 2. After N cycles of operation the output of the second adder 328 will be the Q-Metric distance.
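The cycle-by-cycle behavior of the datapath just described can be simulated as follows (a software sketch only; the names m1, m2, and a1 are illustrative stand-ins for the signals at outputs 310, 316, and 324):

```python
def lambda_rule_engine(deltas, lam, initial_value=0.0):
    """Cycle-by-cycle software sketch of the recursive lambda rule engine.

    Each loop iteration models one cycle: the first multiplier forms
    lam*delta_i, the second multiplier forms lam*delta_i*psi, the first
    adder forms delta_i + lam*delta_i*psi, and the second adder adds psi,
    whose output is fed back through the multiplexer and shift register.
    """
    psi = initial_value        # multiplexer initially selects register 388
    for delta in deltas:       # one cycle per absolute difference delta_i
        m1 = lam * delta       # output 310 of the first multiplier 304
        m2 = m1 * psi          # output 316 of the second multiplier 314
        a1 = m2 + delta        # output 324 of the first adder 320
        psi = a1 + psi         # output 330 of the second adder 328, fed back
    return psi                 # Q-Metric distance after N cycles
```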
In block 508 an initial population of vectors of numerical parameters of a non-linear SVM is generated. Each vector includes slack variables ξi, Lagrange multipliers αi, and the configuration parameter λ. The values of the numerical parameters in the vectors in the initial population may be random numbers within predetermined bounds. Each vector includes a total of 2m+1 numerical parameters, where m is the number of the feature vectors read in block 502.
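By way of illustration, such an initialization might be sketched as follows (the bound C, the population size, and the function name are illustrative assumptions):

```python
import random

def init_population(m, pop_size, C=10.0, seed=None):
    """Generate an initial population of parameter vectors for block 508.

    Each member holds 2m + 1 parameters: m slack variables xi_i in [0, C],
    m Lagrange multipliers alpha_i in [0, C], and one Q-Metric
    configuration parameter lam in [-1, 0].
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        slacks = [rng.uniform(0.0, C) for _ in range(m)]
        alphas = [rng.uniform(0.0, C) for _ in range(m)]
        lam = rng.uniform(-1.0, 0.0)
        population.append(slacks + alphas + [lam])
    return population
```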
In block 510 an objective function to be optimized is evaluated with each vector in a current population (which initially is the initial population). The objective function is a dual-form Lagrangian for a classification SVM and can, for example, take the form:

L(α)=Σi=1 . . . mαi−½Σi=1 . . . mΣj=1 . . . mαiαjyiyjK(λ,xi,xj)

where m is the number of training data points, yi∈{−1,+1} is the class label of the ith training vector, and K(λ,xi,xj) is the Q-Metric based kernel function.
The value of the dual form Lagrangian serves as a measure of fitness of each vector for the purpose of Differential Evolution.
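By way of illustration, this fitness evaluation can be sketched as follows (the standard dual-form Lagrangian for a soft-margin classification SVM is assumed; names are illustrative):

```python
def dual_lagrangian(alphas, labels, kernel_matrix):
    """Evaluate the dual-form SVM Lagrangian used as the fitness measure.

    W(alpha) = sum_i alpha_i
             - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j K(x_i, x_j)
    """
    m = len(alphas)
    value = sum(alphas)
    for i in range(m):
        for j in range(m):
            value -= 0.5 * (alphas[i] * alphas[j]
                            * labels[i] * labels[j]
                            * kernel_matrix[i][j])
    return value
```

The kernel matrix entries would be computed with the Q-Metric based kernel K(λ,xi,xj) for the candidate value of λ carried in each population member.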
Next, block 512 tests whether a stopping criterion has been met. The stopping criterion can include a variety of tests including but not limited to: a comparison of a maximum fitness or a population average fitness to a predetermined goal, or a comparison of the current generation's maximum or population average fitness to the best maximum or population average fitness among preceding generations (e.g., stop when fitness degradation or no substantial improvement is detected). Other fitness tests are known to persons of ordinary skill in the art of Differential Evolution.
If it is determined in block 512 that the stopping criterion has not yet been met, then in block 514 population members, i.e., vectors, are selected to be used in forming a next generation based on fitness. For example, the stochastic remainder method may be used. Typically, high fitness population members may be selected multiple times in order to maintain a constant population size.
In block 516 Differential Evolution evolutionary operations are performed on the vectors that have been selected in block 514. Such operations include one-point crossover, two-point crossover, Differential Evolution mutation, and circular shifting. These operations are selectively applied using pre-programmed rates or adaptive rates that may be changed during the optimization process. For each population member and for each operation, a random number (e.g., between 0 and 1) is generated and compared to a pre-programmed rate (e.g., 0.11); if the random number is less than the given rate, the operation is performed on the population member.
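By way of illustration, the rate-gated application of such operators might be sketched as follows (the mutation scale factor F and the rates are illustrative assumptions):

```python
import random

def de_mutation(a, b, c, F=0.5):
    """Differential Evolution mutation: mutant = a + F * (b - c)."""
    return [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]

def one_point_crossover(u, v, point):
    """One-point crossover: swap the tails of two vectors at `point`."""
    return u[:point] + v[point:], v[:point] + u[point:]

def maybe_apply(rate, rng, operation, *args):
    """Apply `operation` only if a uniform draw falls below its rate."""
    if rng.random() < rate:
        return operation(*args)
    return None
```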
The numerical parameters ξi, αi, and λ are restricted to certain bounds, i.e.:

0≧λ≧−1

C≧αi≧0

ξi≧0
The Differential Evolution operations that are performed in block 516 may cause certain parameter values to go out of bounds. In block 518 the values of any parameters that have gone out of bounds are corrected. For example, if the new value of λ is larger than 0, λ can be corrected by substituting a small negative random value within the permissible range. If the new value of λ is less than −1, λ can be corrected by substituting a random value slightly greater than −1, also within the permissible range. The constraints on the other variables can be imposed in a similar manner. Following block 518 the training program 500 returns to block 510 to evaluate the new generation which has been generated in blocks 514, 516 and 518.
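By way of illustration, the bounds correction of block 518 might be sketched as follows (the size of the random perturbation is an illustrative assumption):

```python
import random

def correct_lambda(lam, rng):
    """Repair an out-of-bounds Q-Metric parameter so that -1 <= lam <= 0.

    A value above 0 is replaced by a small negative random value; a value
    below -1 by a random value slightly greater than -1.
    """
    if lam > 0.0:
        return -rng.uniform(0.0, 0.01)        # small negative value
    if lam < -1.0:
        return -1.0 + rng.uniform(0.0, 0.01)  # slightly greater than -1
    return lam

def clip(value, low, high):
    """Impose a box constraint such as 0 <= alpha_i <= C or xi_i >= 0."""
    return max(low, min(high, value))
```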
If in block 512 it is determined that the stopping criterion has been met, the training program 500 branches to block 520 in which support vectors are identified. Support Vectors are those training vectors xi for which the corresponding Lagrange multipliers αi are non-zero or, practically speaking, to accommodate the imprecision of numerical calculation, those training vectors for which the corresponding Lagrange multipliers αi are larger than a preset error tolerance ε.
In block 522 a decision function bias denoted b is computed by evaluating the following function:

b=(1/M)Σj=1 . . . M[yj−Σi=1 . . . MαiyiK(λ,xi,xj)]

where M is the number of Support Vectors (e.g., those for which αi≠0), the sums run over the Support Vectors, and K(λ,xi,xj) is a Q-Metric based kernel function of the vectors xi and xj, with the value of the Q-Metric configuration parameter λ determined by the Differential Evolution procedure or any other optimization method.
In block 524 the bias and the support vectors are stored for use in on-line pattern recognition.
Alternatively rather than using all of the training data in one run through the training program 500, subsets of the training data can be run through the training program successively. After each run with a subset only the support vectors are retained for use in successive runs. This approach can avoid having to solve a high dimensional non-linear optimization problem in one Differential Evolution run.
The discriminant function corresponding to the separating hyperplane is given by

D(x)=Σi=1 . . . MαiyiK(λ,xi,x)+b

where the sum runs over the M Support Vectors. The received feature vector x is classified based on whether the discriminant function D(x) is positive or negative.
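By way of illustration, this classification step can be sketched as follows (the standard SVM discriminant form is assumed; any kernel function, including a Q-Metric based kernel, can be supplied):

```python
def discriminant(x, support_vectors, alphas, labels, bias, kernel):
    """D(x) = sum_i alpha_i y_i K(x_i, x) + b over the Support Vectors."""
    return bias + sum(a * y * kernel(sv, x)
                      for sv, a, y in zip(support_vectors, alphas, labels))

def classify(x, support_vectors, alphas, labels, bias, kernel):
    """Assign class +1 or -1 from the sign of the discriminant."""
    d = discriminant(x, support_vectors, alphas, labels, bias, kernel)
    return 1 if d > 0 else -1
```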
According to alternative embodiments of the invention SVM kernel functions that include the Q-Metric composed with another function are used. For example:
K(λ,xi,xj)=exp(−dλ(xi,xj))
On the other hand, in the case shown in
The training program 500 shown in
where,
For the Q-Metric SVM regression the discriminant function bias is given by:
and the regression function is given by:
where, ξ is a pre-programmed small number, effectively a tolerance on zero for numerical computation. The Support Vectors for the regression task are those training vectors xj for which the corresponding Lagrange multipliers αj or αj* are larger than ξ.
A modified form of the training program 500 shown in
K(x,y)=exp(−γ∥x−y∥²)
with an initialized allowable margin ε of 0.1 and a fixed value of the parameter γ=1.0.
The result shown in
Support Vector Machine regression using a Q-Metric kernel can be used for a variety of applications including, for example, location-based services and echo noise cancellation. In echo noise cancellation, for each noisy signal vector there is a desired signal that will be used to cancel the noise. Thus, the noisy signal vector becomes the input feature vector, and the desired signal becomes the target value for a regression machine.
In the foregoing specification, specific embodiments of the present invention have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all the claims. The invention is defined solely by the appended claims including any amendments made during the pendency of this application and all equivalents of those claims as issued.