The present invention relates to the classification of objects and patterns and, more particularly, to a classification system with a generic classification algorithm that is automatically optimized for classifying specific kinds of objects or patterns.
It often is useful to classify the members of a set of objects or patterns into two or more classes according to features of the objects or patterns. If the features have numerical values, the classes can be defined in terms of a “feature space”: a multidimensional space whose coordinate axes correspond to the features. The feature space is partitioned into portions corresponding to the classes. Each object's respective set of feature values is a vector in this feature space. An object is classified by determining which portion of the feature space its feature value vector lies in.
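By way of a simple, non-limiting illustration, the following sketch (in Python, which is an assumption of this example rather than a requirement of the invention) classifies a two-dimensional feature vector by the side of a straight-line boundary on which it falls; the boundary coefficients are hypothetical:

```python
# Illustrative sketch only: a two-dimensional feature space partitioned into
# two classes by a straight-line boundary w . x + b = 0. The boundary
# coefficients below are hypothetical.
import numpy as np

w = np.array([0.8, 0.6])   # normal vector of the partition boundary
b = -1.0                   # offset of the boundary

def classify(feature_vector):
    """Classify a vector by which side of the boundary it lies on."""
    return "class A" if w @ feature_vector + b >= 0 else "class B"

print(classify(np.array([1.2, 0.9])))   # falls on the "class A" side
```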
Feature space 10 is a “binary” feature space with two classes. The number of classes into which a feature space is partitioned is application-specific. For example, it may be useful to partition a geographic feature space whose coordinate axes correspond to the features “latitude” and “altitude” into three climate classes: “hot”, “temperate” and “cold”.
Feature space 10 is a simple two-dimensional feature space that is easy to visualize and so easy to partition. In practical applications, feature spaces may have tens of feature coordinates. Such high-dimensional spaces are difficult or impossible to visualize. Therefore, classification algorithms have been developed for partitioning high-dimensional feature spaces into classes according to training sets of vectors. Examples of such algorithms include nearest neighbor algorithms, support vector machines and least squares algorithms. See Richard O. Duda et al., Pattern Classification (Wiley Interscience, 2000). For any specific application, a set of training vectors is chosen and is classified manually. The algorithm chosen for partitioning the feature space then selects the boundaries (e.g. hyperplanes) that partition the feature space into classes in accordance with the training vectors. Subsequently, given a new vector of feature values, the algorithm decides which class that vector belongs to.
Note that the partitioning of the feature space need not be, and often is not, explicit. In other words, the algorithm chosen for partitioning the feature space actually operates by determining values of algorithm parameters in accordance with the manual classification of the training vectors. These parameter values define the partition boundaries implicitly, in the sense that, given a new vector of feature values, the algorithm decides on which side of the boundaries the new vector falls.
In higher-dimensional examples, the selection of the best classification algorithm to use for a specific application is a difficult task even for a specialist. So, for example, a manufacturer of digital cameras who wishes to include in each camera a chip that advises the user about the quality of each photograph (acceptable vs. unacceptable, for example) would have to invest in an expensive research and development effort to select and optimize the appropriate classification algorithm. There is thus a widely recognized need for, and it would be highly advantageous to have, a system that could be trained by a non-specialist to perform near-optimum classifications for any particular application.
According to the present invention there is provided a classification system, including: (a) a training device for: (i) selecting which one of a plurality of training classification algorithms best classifies a set of training vectors, and (ii) finding a set of values, of parameters of a generic classification algorithm, that enable the generic classification algorithm to substantially emulate the selected training classification algorithm; and (b) at least one classification device for classifying at least one vector other than the training vectors, using the generic classification algorithm with the values.
According to the present invention there is provided a classification system, including: (a) a training device for selecting which one of a plurality of classification algorithms best classifies a set of training vectors; and (b) at least one classification device for classifying at least one vector other than the training vectors, using the selected classification algorithm.
The basic system of a first embodiment of the present invention includes two kinds of devices: a training device and one or more (preferably more than one) classification devices. The training device selects which one of a set of two or more training classification algorithms best classifies a set of training vectors, and then finds a set of values, of parameters of a generic classification algorithm, that enable the generic classification algorithm to substantially emulate the training classification algorithm that best classifies the set of training vectors. The classification device(s) use(s) the generic classification algorithm, parameterized with the values that the training device found, to classify other vectors.
Preferably, the classification device(s) is/are reversibly operationally connectable to the training device to receive the generic classification algorithm parameter values that the training device finds.
Preferably, the training device finds the generic classification algorithm parameter values by steps including resampling the feature space of the training vectors, thereby obtaining a set of resampling vectors, and then classifying the resampling vectors using the training classification algorithm that best classifies the set of training vectors. Most preferably, the resampling by the training device resamples the feature space more densely than does the set of training vectors.
Preferably, the training device has the option of dimensionally reducing the set of training vectors before selecting the training classification algorithm that best classifies the set of training vectors, and the classification device(s) likewise has/have the option of similarly dimensionally reducing the other vectors to be classified.
Preferably, the system also includes, for each classification device, a respective memory for storing the generic classification algorithm parameter values. Each memory is reversibly operationally connectable to the training device and to the memory's classification device.
Preferably, each classification device includes a mechanism for executing the generic classification algorithm. In various embodiments of the classification device, the mechanism includes a general purpose processor and a nonvolatile memory for storing the program code of the generic classification algorithm, or a field programmable gate array, or an application-specific integrated circuit.
Preferably, the generic classification algorithm is a k-nearest-neighbors algorithm.
Preferably, the training device includes a nonvolatile memory for storing program code for effecting the selection of the best training classification algorithm and the finding of the corresponding parameters of the generic classification algorithm. Most preferably, at least a portion of such code is included in a dynamically linked library.
The basic system of a second embodiment of the present invention also includes a training device and one or more classification devices. The training device selects which one of a set of two or more classification algorithms best classifies a set of training vectors. The classification device(s) use the selected classification algorithm to classify other vectors.
Preferably, each classification device includes a mechanism for executing the selected classification algorithm and a memory for storing an indication of which one of the classification algorithms has been selected by the training device. Alternatively, each classification device itself does not include such a memory. Instead the system includes, for each classification device, a respective memory, for storing the indication of which one of the classification algorithms has been selected by the training device, that is reversibly operationally connectable to the training device and to the memory's classification device. Under either alternative, most preferably the memory also is for storing at least one parameter of the classification algorithm that has been selected by the training device.
The invention is herein described, by way of example only, with reference to the accompanying drawings.
The present invention is of a system which can be trained by a non-specialist to classify objects for any specific application.
The principles and operation of a classification system according to the present invention may be better understood with reference to the drawings and the accompanying description.
Referring again to the drawings, training device 30 is represented functionally as a set of blocks.
The input to training device 30 is a set 22 of training vectors that have been classified manually. In block 34, set 22 is used as input to several classification algorithms, each run independently, to determine which of these algorithms is the best algorithm to use in the application from which the training vectors of set 22 have been selected. These classification algorithms are called “training classification algorithms” herein. One way to determine the best algorithm to use is the “leave one out” method. Given a set 22 of N training vectors, each training classification algorithm is run N times on set 22, each time leaving out one of the vectors, training on the remaining N-1 vectors, and then classifying the left-out vector. The training classification algorithm that duplicates the manual classification of the largest number of “left out” vectors is selected in block 36 as the “best” training classification algorithm for the application at hand.
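A hedged sketch of this selection step follows; Python and the scikit-learn library are assumptions of the example, and the candidate classifiers merely stand in for the nearest neighbor, support vector machine and least squares algorithms mentioned above:

```python
# Sketch of blocks 34 and 36: "leave one out" selection of the best training
# classification algorithm. scikit-learn is an assumption of this example.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import RidgeClassifier   # a least-squares-style classifier

def select_best_algorithm(X, y, candidates):
    """Run each candidate N times on the N training vectors, each time
    training on N-1 vectors and classifying the left-out vector; return the
    candidate that reproduces the manual classification most often."""
    best_algo, best_hits = None, -1
    for algo in candidates:
        hits = 0
        for train_idx, test_idx in LeaveOneOut().split(X):
            algo.fit(X[train_idx], y[train_idx])
            hits += int(algo.predict(X[test_idx])[0] == y[test_idx][0])
        if hits > best_hits:
            best_algo, best_hits = algo, hits
    return best_algo

candidates = [KNeighborsClassifier(n_neighbors=3), SVC(), RidgeClassifier()]
# best = select_best_algorithm(X_train, y_train, candidates)
```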
In block 38, values of parameters of a generic classification algorithm are selected that enable that generic classification algorithm to emulate the best training classification algorithm. One way to do this is to resample the feature space that is sampled by training set 22, to use the best training classification algorithm to classify the resampling vectors, and to use the resampling vectors thus classified to train the generic classification algorithm. Preferably, the resampling vectors are distributed randomly in the feature space. The resulting parameter values are output as a generic parameter set 24.
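The following sketch illustrates one way block 38 might be realized under the stated assumptions (Python and scikit-learn assumed; the resampling bounds and density are illustrative, with the resampling vectors drawn uniformly at random):

```python
# Sketch of block 38: resample the feature space at random (more densely
# than set 22), classify the resampling vectors with the best training
# algorithm, and fit the generic algorithm (k-nearest-neighbors, per the
# text) to the result. Bounds and density are illustrative assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_generic_algorithm(best_algo, X_train, n_resample=10_000, k=5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    X_res = rng.uniform(lo, hi, size=(n_resample, X_train.shape[1]))
    y_res = best_algo.predict(X_res)        # classify the resampling vectors
    generic = KNeighborsClassifier(n_neighbors=k)
    generic.fit(X_res, y_res)               # fitted values play the role of set 24
    return generic
```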
Classification device 40 includes a non-volatile memory 42 for storing the generic parameter values of set 24 and a classifying mechanism 44 that uses the generic classification algorithm, as parameterized by the parameter values stored in memory 42, to classify any new feature vector that is presented to classifying mechanism 44. Classifying mechanism 44 may be implemented in hardware, firmware or software. For example, a preferred software implementation of classifying mechanism 44 includes a non-volatile memory for storing program code of the generic classification algorithm, a general purpose processor for executing the code, and a random access memory into which the program code instructions are loaded in order to be executed. A preferred hardware implementation of classifying mechanism 44 includes a field programmable gate array or an application-specific integrated circuit that is hardwired to implement the generic classification algorithm.
The preferred generic classification algorithm is a k-nearest-neighbors algorithm.
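For illustration, a minimal k-nearest-neighbors classifier is sketched below; under this sketch's assumptions, the values stored for the generic algorithm would be the reference vectors, their class labels and the number k:

```python
# Minimal k-nearest-neighbors sketch. ref_vectors is an (n, d) NumPy array;
# ref_labels is a length-n NumPy array of class labels.
import numpy as np
from collections import Counter

def knn_classify(x, ref_vectors, ref_labels, k=5):
    """Return the majority class among the k stored vectors nearest to x."""
    distances = np.linalg.norm(ref_vectors - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(ref_labels[nearest]).most_common(1)[0][0]
```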
Some features of the vectors of training set 22 may be irrelevant to the classification of these vectors. For example, the color of a person's hair has no bearing on whether that person is obese. Including values of the feature “hair color” in the vectors of a training set for training a classification algorithm to partition feature space 10 would just introduce noise to the training process. It is obvious in this simple example not to include a “hair color” feature in an obesity training set; but in practical cases of higher dimensionality it is not obvious what features or combination of features to exclude from the training vectors. Therefore, optionally, before the training classification algorithms are executed in block 34, the dimensionality of the feature space described by training set 22 is reduced in block 32, using a procedure such as principal component analysis that culls irrelevant dimensions from the feature space. For example, given a three-dimensional set of vectors of values of the features “height”, “weight” and “hair color”, classified manually into the classes “obese” and “non-obese”, principal component analysis would determine that the “hair color” feature is irrelevant and reduce the dimensionality of the feature space of the set to two.
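A hedged sketch of block 32 follows (Python and scikit-learn assumed); note that the same reduction found on training set 22 must later be applied to each new vector before classification:

```python
# Sketch of block 32: principal component analysis culls directions that
# carry little information, e.g. reducing the three-dimensional
# (height, weight, hair color) example to two dimensions. scikit-learn
# is an assumption of this example.
import numpy as np
from sklearn.decomposition import PCA

def reduce_dimensionality(X_train, n_components=2):
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X_train)   # performed in the training device
    return X_reduced, pca                    # keep pca to transform new vectors

# The classification device applies the same reduction to each new vector:
#     x_reduced = pca.transform(x_new.reshape(1, -1))
```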
Classification device 40 preferably is physically separate from training device 30 and is reversibly operationally connected to training device 30 only for the purpose of loading generic parameter set 24 into memory 42. For example, the manufacturer of digital cameras mentioned above trains the generic classification algorithm using training device 30 and an appropriate set 22 of training vectors, and then equips each one of the cameras with its own classification device 40, implemented e.g. as a set of integrated circuits in a multi-chip package, with the parameter values of generic parameter set 24 loaded in its memory 42.
Because training device 30 preferably is a general purpose computer, it is easy to replace the training classification algorithms and the generic classification algorithm with improved algorithms. This replacement is most conveniently done by downloading the new algorithms from an external source such as the Internet.
Because system 20 is self-contained, allowing a user to implement near-optimal classification without the assistance of specialists, the user can maintain the confidentiality of training set 22.
An alternative embodiment of the present invention lacks the generic classification algorithm. Instead, both training device 30 and classifying mechanism 44 share the same set of classification algorithms. Training device 30 selects the classification algorithm that best classifies training set 22, as above. Then, instead of selecting parameters for a generic classification algorithm, training device 30 prepares a bit string that indicates which of the classification algorithms is the best algorithm. This bit string is transferred to classification device 40, which therefore subsequently knows which of its classification algorithms to use to classify new feature vectors. Along with this bit string, training device 30 sends classification device 40 a set of parameters that defines for classification device 40 the partition of the feature space that the best algorithm determined. For example, if the best algorithm is a least squares algorithm, then training device 30 sends classification device 40 the parameters of a hypersurface analogous to line 12.
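The following sketch shows one possible encoding of the bit string and accompanying parameters; the byte layout and algorithm list are illustrative assumptions, not part of the invention:

```python
# Sketch of the alternative embodiment: the training device transfers only
# an algorithm index (the "bit string") plus that algorithm's fitted
# parameters, e.g. least-squares hyperplane coefficients. The encoding and
# the algorithm list below are illustrative assumptions.
import struct
import numpy as np

ALGORITHMS = ("k_nearest_neighbors", "support_vector_machine", "least_squares")

def encode_selection(algo_index, hyperplane):
    """Pack the selected-algorithm index and the hyperplane coefficients
    for transfer from training device 30 to classification device 40."""
    return struct.pack("B", algo_index) + hyperplane.astype("<f8").tobytes()

def decode_selection(blob):
    """Recover, in the classification device, which algorithm to run and
    the parameters that define the partition of the feature space."""
    algo_index = blob[0]
    params = np.frombuffer(blob[1:], dtype="<f8")
    return ALGORITHMS[algo_index], params
```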
As noted above, one of the strengths of the present invention is its ability to enable non-specialists to perform near-optimal classification of vectors in feature spaces of high dimension. There also are low-dimension cases that, because of their complexity, benefit from the present invention. Consider, for example, the following “verification” problem. Access to a facility must be restricted to authorized personnel. For that purpose, the facility is equipped with three biometric authentication devices: the first measures the iris patterns of people who seek access to the facility, the second measures their facial features, and the third measures their fingerprints. Each biometric authentication device also reads an identity card of a person seeking access (which, of course, must be the identity card of a person who is authorized to have access to the facility), compares its biometric measurement to a corresponding measurement in a database of such measurements made on people with authorized access, and produces a number representative of the probability that the person seeking access is the person identified by the identity card. The facility manager wants to combine the three biometric measurements in order to minimize false positives and false negatives. The present invention allows the facility manager to do this without being, or hiring, a classification specialist.

In the context of the present invention, the probability produced by each biometric authentication device is a value of a corresponding feature in a three-dimensional feature space. The facility manager generates a training set 22 for system 20 by assembling a suitably large and varied population of people and by using the biometric authentication devices to make many measurements of respective biometric signatures of each member of the population. For each member of the population, one of these measurements is designated as a reference measurement, and the remaining measurements are transformed into corresponding training vectors by combining them with the reference measurement of that member of the population. These training vectors are classified as “access authorized”. Then the remaining measurements of that member of the population are transformed into another set of corresponding training vectors by combining them with the reference measurement of a different member of the population who is selected at random. These training vectors are classified as “access denied”. System 20 then is trained and implemented as described above.
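The following sketch illustrates how such a training set might be assembled; the data layout and the match_scores() stand-in for the three devices' probability outputs are assumptions of the example:

```python
# Illustrative sketch of assembling training set 22 for the verification
# problem. In practice the three biometric devices produce the three
# probabilities; match_scores() below is a hypothetical stand-in.
import numpy as np

def match_scores(probe, reference):
    """Hypothetical stand-in for the three devices: one score per modality
    (iris, face, fingerprint), each approximating a match probability."""
    return np.array([1.0 / (1.0 + np.linalg.norm(p - r))
                     for p, r in zip(probe, reference)])

def build_training_set(signatures, seed=0):
    """signatures: {person_id: list of measurements}, each measurement being
    a tuple of three modality vectors; the first measurement per person is
    designated the reference measurement."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    ids = list(signatures)
    for pid in ids:
        ref, probes = signatures[pid][0], signatures[pid][1:]
        others = [i for i in ids if i != pid]
        impostor_ref = signatures[others[rng.integers(len(others))]][0]
        for probe in probes:
            X.append(match_scores(probe, ref))           # genuine pair
            y.append("access authorized")
            X.append(match_scores(probe, impostor_ref))  # mismatched pair
            y.append("access denied")
    return np.array(X), np.array(y)
```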
The present invention, in addition to being useful to users who lack the expertise to develop classification algorithms that are optimized for their own specific applications, also is useful to users who do have such expertise. The present invention, by its generic nature, spares such a user the time and expense of developing and manufacturing a classification device 40 that is custom-tailored to that user's specific needs, even if the user is capable of doing so.
While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 168091 | Apr 2005 | IL | national |

| Filing Document | Filing Date | Country | Kind | 371c Date |
| --- | --- | --- | --- | --- |
| PCT/IL06/00470 | 4/11/2006 | WO | 00 | 10/25/2009 |