The present invention relates to machine learning in artificial intelligence (AI) and, more particularly, to an eclectic classifier and a level of confidence associated therewith.
As is well known, machine learning builds a hypothetical model based on sample data for a computer to make a prediction or a decision. The hypothetical model may be implemented as a classifier, which approximates a mapping function from input variables to output variables. The goal of machine learning is to make the hypothetical model as close as possible to a target function which always gives correct answers. This goal may be achieved by training the hypothetical model with more sample data.
Machine learning approaches are commonly divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. Various models have been developed for machine learning, such as convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM) network, YOLO, ResNet, ResNet-18, ResNet-34, VGG16, GoogLeNet, LeNet, MobileNet, decision trees, and support vector machine (SVM).
However, in the traditional approach, a classifier relies on only a single model. As shown in
Therefore, it is desirable to provide an improved classifier to mitigate and/or obviate the aforementioned problems.
Indeed, the proposed invention considers several models simultaneously. It combines the results of these models and outputs a balanced answer. The invention thus gives an eclectic solution to the classification problem, together with a byproduct called the “level of confidence,” although it takes more computation and time.
To our knowledge, there is no single classifier model (or algorithm) that solves every classification problem with the highest accuracy. Thus, according to a first aspect of the present invention, a method is provided to implement an eclectic classifier.
The eclectic classifier of the present invention may be implemented in a cloud server or a local computer, as hardware or software (i.e., a computer program), either as separate circuit devices on a set of chips or as an integrated circuit device on a single chip.
Before implementing the main steps of the eclectic classifier of the present invention, several preliminary steps should be performed in advance.
(Preliminary Step P1: Preparing a Training Set)
Let Ω⊂ℝp be a collection of data (or observations) which is composed of m memberships (or categories) of elements, and the m memberships are digitized as 1, 2, . . . , m.
A part of the data Ωtr⊂Ω, typically called a “training set,” and another part of the data Ωtt⊂Ω, typically called a “test set,” are prepared from the data Ω. The collection of data Ω may optionally include more parts, such as a remaining set Ωth.
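For illustration only, and assuming the collection of data Ω is available in memory as a Python list of elements, the preparation of the training set Ωtr, the test set Ωtt, and the remaining set Ωth may be sketched as follows; the split fractions used here are hypothetical and problem-dependent.

    import random

    def split_collection(omega, train_frac=0.7, test_frac=0.2, seed=0):
        """Split the collection of data omega into a training set, a test set,
        and a remaining set (the fractions are illustrative only)."""
        data = list(omega)
        random.Random(seed).shuffle(data)
        n = len(data)
        n_tr = int(train_frac * n)
        n_tt = int(test_frac * n)
        omega_tr = data[:n_tr]              # training set
        omega_tt = data[n_tr:n_tr + n_tt]   # test set
        omega_th = data[n_tr + n_tt:]       # remaining set
        return omega_tr, omega_tt, omega_th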
(Preliminary Step P2: Setting a Membership Function)
Let y:Ω→S={1, 2, . . . , m} be a membership function (also regarded as a target function) so that y(x) gives precisely the membership of x.
(Preliminary Step P3: Training a Developed Classifier)
The goal of the classification problem is to use the training set Ωtr to derive a classifier ŷ(x) that serves as a good approximation of y(x).
(Preliminary Step P4: Decomposing the Training Set into Subsets)
Clearly, y(x) and ŷ(x) produce two decompositions of Ωtr as disjoint unions of subsets:
Ωtr=Ωtr(1)∪ . . . ∪Ωtr(m) and Ωtr=Ω̂tr(1)∪ . . . ∪Ω̂tr(m)
where, for j=1, . . . , m,
Ωtr(j)={x∈Ωtr:y(x)=j}
which is the genuine classification of the elements,
and
Ω̂tr(j)={x∈Ωtr:ŷ(x)=j}
which is the approximate classification of the elements.
Define the cardinalities ntr=|Ωtr| and ntr(j)=|Ωtr(j)|, and obviously, ntr=Σj=1mntr(j). The cardinality |A| of a set A is simply the number of elements in the set A.
(Preliminary Step P5: Preparing a Test Set)
The test set Ωtt is used to determine the accuracy of ŷ, where the accuracy may refer to the percentage (%) of x's in Ωtt such that ŷ(x)=y(x), for example. It is assumed that both Ωtr and Ωtt are sufficiently large and share the full characteristics represented by the whole data Ω.
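For illustration only, and assuming the classifier ŷ and the membership function y are available as Python callables, the accuracy on the test set Ωtt may be computed as in the following sketch.

    def accuracy(y_hat, y, omega_tt):
        """Percentage of elements x in the test set for which y_hat(x) == y(x)."""
        correct = sum(1 for x in omega_tt if y_hat(x) == y(x))
        return 100.0 * correct / len(omega_tt)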
(Main Step Q1: Combining Developed Classifiers)
Suppose that there are k developed classifiers, ŷ1, . . . , ŷk, k≥2. A vector function is defined as:
V(x)=(ŷ1(x), . . . ,ŷk(x))∈Sk,x∈Ω
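As a minimal sketch, assuming the k developed classifiers are available as Python callables each returning a membership in S, the vector function V may be formed as follows; the identity is represented as a tuple.

    def make_vector_function(classifiers):
        """Combine the developed classifiers y_1, ..., y_k into the vector
        function V(x) = (y_1(x), ..., y_k(x))."""
        def V(x):
            return tuple(clf(x) for clf in classifiers)
        return V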
(Main Step Q2: Creating Buckets with Identities)
As y and ŷ induce partitions of Ωtr, so does the vector function V. That is:
Ωtr=∪I∈Sk B(I)
where, for any I∈Sk,
B(I)={x∈Ωtr:V(x)=I}
Now, I is called an “identity” and B(I) is called a “bucket” in Ωtr with the identity I. In the following description, when an element x is said to be distributed to B(I), it means that V(x)=I.
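Continuing the sketch under the same assumptions, the buckets B(I) may be created by grouping the training elements according to their identities I=V(x).

    from collections import defaultdict

    def create_buckets(omega_tr, V):
        """Partition the training set into buckets B(I), keyed by the identity I = V(x)."""
        buckets = defaultdict(list)
        for x in omega_tr:
            buckets[V(x)].append(x)   # x is distributed to B(V(x))
        return dict(buckets)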
(Main Step Q3: Merging Buckets)
It can be understood that there are mk (m to the k-th power) buckets in total. The plan is to assign a membership to each bucket, instead of each individual element. Certainly, such an assignment is determined by the composition of the elements in the bucket. This raises a question: how can it be done if a bucket is empty? Furthermore, buckets having only a few elements usually carry poor information, and thus likely lead to incorrect answers. Therefore, empty buckets and small buckets with very few elements need to be merged into large buckets. For this purpose, define nB(I)=|B(I)| and nB(I)(j)=|B(I)∩Ωtr(j)|, and obviously, nB(I)=Σj=1mnB(I)(j). In one possible way, a merged bucket B may be obtained in such a way that the condition:
holds for a certain predetermined positive constant α. The choice of α may be problem-dependent. A merged bucket will still be denoted as B(I), with I being any one of the identities for which B(I) is part of this merged bucket. Consequently, a merged bucket has more than one representation. (For example, when B((1,2,2,3)) and B((1,2,3,3)) are merged into a large bucket, B((1,2,2,3)) may be chosen to denote the merged bucket, for the sake of simplifying the representation of the merged bucket. However, it is still possible to choose B((1,2,3,3)) as an alternative representation of the merged bucket.)
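The exact merging condition is governed by the constant α and is intentionally left open above. Purely as a hedged illustration, the sketch below merges every bucket whose size falls below a fraction α of the training set into the nearest sufficiently large bucket, with nearness measured by the Hamming distance between identities; this particular criterion is an assumption made for illustration and is not the condition of the invention.

    def merge_small_buckets(buckets, alpha):
        """Illustrative merging rule (an assumption): a bucket with fewer than
        alpha * n_tr elements is merged into the closest large bucket, where
        closeness is the Hamming distance between identities."""
        n_tr = sum(len(xs) for xs in buckets.values())
        large = {I: list(xs) for I, xs in buckets.items() if len(xs) >= alpha * n_tr}
        small = {I: xs for I, xs in buckets.items() if len(xs) < alpha * n_tr}
        if not large:
            raise ValueError("alpha is too large: no bucket meets the size threshold")
        merged_into = {}   # maps each merged-away identity to its representative identity
        for I, xs in small.items():
            target = min(large, key=lambda J: sum(a != b for a, b in zip(I, J)))
            large[target].extend(xs)
            merged_into[I] = target
        return large, merged_into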
(Main Step Q4: Assigning Memberships)
Then, memberships are assigned respectively to the buckets. Such assignment may be done in many ways. One possible approach is illustrated in the following description.
Let a bucket B(I) in Ωtr with identity I be given. Assign the bucket a membership j if the ratio of the number of elements with membership j in B(I) to |Ωtr(j)| is maximal among the ratios of all memberships. This defines a function Y:{B(I)}→S on the collection of buckets such that:
Y(B(I))=arg maxj∈S(nB(I)(j)/ntr(j))
It should be emphasized that there are many ways to determine the membership of a bucket, which then result in different functions Y.
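One such way, namely the maximal-ratio rule described above, may be sketched as follows under the assumptions of the earlier sketches; the function returns, for each bucket, the membership j that maximizes nB(I)(j)/ntr(j).

    def assign_memberships(buckets, y, omega_tr):
        """Assign each bucket B(I) the membership j maximizing n_B(I)(j) / n_tr(j)."""
        n_tr_per_class = {}
        for x in omega_tr:
            n_tr_per_class[y(x)] = n_tr_per_class.get(y(x), 0) + 1
        membership_of = {}
        for I, xs in buckets.items():
            counts = {}
            for x in xs:
                counts[y(x)] = counts.get(y(x), 0) + 1
            membership_of[I] = max(
                n_tr_per_class,
                key=lambda j: counts.get(j, 0) / n_tr_per_class[j],
            )
        return membership_of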
(Main Step Q5: Configuring an Eclectic Classifier)
The following is then the formal definition of the eclectic classifier ỹ:Ω→S of the present invention:
ỹ(x)=Y(B(V(x))),x∈Ω
In summary, the present invention solves the classification problem as follows: Given any element x∈Ω, apply V on x to obtain its identity I=V(x)∈Sk. Accordingly, x is distributed to the bucket B(I) which has the membership Y(B(I)). Finally, the eclectic classifier asserts that Y(B(I)) is also the membership of x. In other words, every element inherits the membership of the bucket to which it is distributed.
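Putting the pieces together, a minimal sketch of the eclectic classifier ỹ follows. The handling of an identity that was never observed in the training set (i.e., an empty bucket that was merged away) is resolved here by falling back to the nearest known identity, which is an assumption made for illustration only.

    def make_eclectic_classifier(V, membership_of, merged_into=None):
        """y_tilde(x) = Y(B(V(x))): x inherits the membership of the bucket
        to which it is distributed."""
        def y_tilde(x):
            I = V(x)
            if merged_into and I in merged_into:
                I = merged_into[I]               # redirect to the merged bucket
            if I not in membership_of:           # unseen identity: fall back to the
                I = min(membership_of,           # nearest known identity (assumption)
                        key=lambda J: sum(a != b for a, b in zip(I, J)))
            return membership_of[I]
        return y_tilde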
Next, according to a second aspect of the present invention, a level of confidence (LOC) associated with the aforementioned membership assignment is introduced.
It should be emphasized that LOC is an attribute of each element of Ω with respect to the training set Ωtr. An LOC can be formulated, computed, and utilized toward a better solution of the classification problem. For each bucket B(I), an LOC with respect to the training set Ωtr, denoted by μ, is designated to both the bucket and each element distributed to it as follows:
μ(B(I))=|B(I)∩Ωtr(Y(B(I)))|/|B(I)| and μ(x)=μ(B(V(x))) for every x distributed to B(I)
With the aforementioned assumption that both Ωtr and Ωtt are sufficiently large and share the full characteristics represented by the whole data Ω, the application of LOC may be interpreted as follows:
Let B be a non-merged bucket and let T be the set containing all elements in Ωtt which are distributed to B. Then, the accuracy of ỹ on T is approximately equal to the LOC of B. That is to say, the percentage of x's in T for which the equation ỹ(x)=y(x) holds is approximately equal to the LOC of B.
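As a sketch of the LOC formula above, and assuming the buckets, the assigned memberships, and the membership function y are available as in the previous sketches, μ may be computed per bucket and then looked up per element.

    def level_of_confidence(buckets, membership_of, y):
        """LOC of each bucket: the fraction of its training elements whose true
        membership equals the membership assigned to the bucket."""
        return {
            I: sum(1 for x in xs if y(x) == membership_of[I]) / len(xs)
            for I, xs in buckets.items()
        }

    def loc_of_element(x, V, loc_of_bucket):
        """mu(x) = mu(B(V(x))): an element inherits the LOC of its bucket."""
        return loc_of_bucket[V(x)]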
It should be noted that the accuracy of a classifier and the LOC of an element are two different concepts. The former is one of the criteria used to evaluate the performance of a classifier, while the latter is, heuristically, an index of the element describing the effectiveness of membership recognition with respect to the training set.
Other objects, advantages, and novel features of the invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings.
Different embodiments of the present invention are provided in the following description. These embodiments are meant to explain the technical content of the present invention, but not meant to limit the scope of the present invention. A feature described in an embodiment may be applied to other embodiments by suitable modification, substitution, combination, or separation.
It should be noted that, in the present specification, when a component is described to have an element, it means that the component may have one or more of the elements, and it does not mean that the component has only one of the elements, except otherwise specified.
Moreover, in the present specification, the ordinal numbers, such as “first” or “second”, are used to distinguish a plurality of elements having the same name, and it does not mean that there is essentially a level, a rank, an executing order, or a manufacturing order among the elements, except otherwise specified. A “first” element and a “second” element may exist together in the same component, or alternatively, they may exist in different components, respectively. The existence of an element described by a greater ordinal number does not essentially mean the existence of another element described by a smaller ordinal number.
Moreover, in the present specification, the terms, such as “preferably” or “advantageously”, are used to describe an optional or additional element or feature, and in other words, the element or the feature is not an essential element, and may be ignored in some embodiments.
Moreover, each component may be realized as a single circuit or an integrated circuit in suitable ways, and may include one or more active elements, such as transistors or logic gates, or one or more passive elements, such as resistors, capacitors, or inductors, but not limited thereto. Each component may be connected to each other in suitable ways, for example, by using one or more traces to form series connection or parallel connection, especially to satisfy the requirements of input terminal and output terminal. Furthermore, each component may allow transmitting or receiving input signals or output signals in sequence or in parallel. The aforementioned configurations may be realized depending on practical applications.
Moreover, in the present specification, the terms, such as “system”, “apparatus”, “device”, “module”, or “unit”, refer to an electronic element, or a digital circuit, an analog circuit, or other general circuit, composed of a plurality of electronic elements, and there is not essentially a level or a rank among the aforementioned terms, except otherwise specified.
Moreover, in the present specification, two elements may be electrically connected to each other directly or indirectly, except otherwise specified. In an indirect connection, one or more elements may exist between the two elements.
(Eclectic Classifier)
As shown, the eclectic classifier 1 of the present invention, provided in the context of machine learning, includes an input module 10, a data collection module 20, a classifier combination module 30, a bucket creation module 40, a bucket merger module 50, a membership assignment module 60, and an output module 70.
It can be understood that the modules are illustrated here for the purpose of explaining the present invention, and the modules may be integrated or separated into other forms as hardware or software, in separate circuit devices on a set of chips or in an integrated circuit device on a single chip. The eclectic classifier 1 may be implemented in a cloud server or a local computer.
The input module 10 is configured to receive sample data (or an element) x. The input module 10 may be a sensor, a camera, a microphone, and so on, that can detect physical phenomena, or it may be a data receiver.
The data collection module 20 is connected to the input module 10 and configured to store a collection of data Ω from the input module 10. The collection of data Ω⊂ℝp includes a training set Ωtr and/or a test set Ωtt and/or a remaining set Ωth. Here, ℝ is the set of real numbers, and the expression Ω⊂ℝp means that the collection of data Ω belongs to ℝp, the space of p-dimensional real vectors.
With a supervised approach, a membership function y:Ω→S={1, 2, . . . , m} can be found so that y(x) gives precisely the membership of the input data x. Accordingly, the collection of data Ω is composed of m memberships (or data categories), and the m memberships are digitized as 1, 2, . . . , m. To specifically explain the meaning of the data categories, for example, when a classifier is used to recognize animal pictures, membership “1” may indicate “dog”, membership “2” may indicate “cat”, . . . , and membership “m” may indicate “rabbit”; here, “dog”, “cat”, and “rabbit” are regarded as the data categories.
The classifier combination module 30 is connected to the data collection module 20 and configured to combine k developed classifiers ŷ1, . . . , ŷk, k≥2, trained with the training set Ωtr, wherein k is the number of developed classifiers. Each of the developed classifiers ŷ1, . . . , ŷk may employ one model from convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM) network, YOLO, ResNet, ResNet-18, ResNet-34, VGG16, GoogLeNet, LeNet, MobileNet, decision trees, or support vector machine (SVM), but not limited thereto. The developed classifiers ŷ1, . . . , ŷk should be adjusted or trained to have different architectures (regarding the number of neurons, their connections, weights, or biases) even if they employ the same model from the aforementioned models.
However, the developed classifiers ŷ1, . . . , ŷk typically handle the same type of data; for example, they all handle image recognition, or they all handle sound recognition, and so on.
In particular, the developed classifiers ŷ1, . . . , ŷk are combined to form a vector function defined as:
V(x)=(ŷ1(x), . . . ,ŷk(x))∈Sk,x∈Ω
Here, each V(x) is a preliminary result given by the developed classifiers ŷ1, . . . , ŷk, and it is a k-dimensional vector, and Sk={(j1, . . . , jk):j1, . . . , jk∈S} collects the preliminary results for x∈Ω. The preliminary results will be further processed as follows.
The bucket creation module 40 is connected to the classifier combination module 30 and configured to partition the training set Ωtr into buckets B(I) with identities I. That is:
Ωtr=∪I∈Sk B(I)
where, for any identity I∈Sk,
B(I)={x∈Ωtr:V(x)=I}
When an element x is said to be distributed to B(I), it means that V(x)=I. The identities I are associated with characteristics of the data.
It can be understood that the buckets are also data sets created to realize the classification according to the present invention. To specifically explain the meaning of the bucket B(I) and its identity I, for example, in the case of m=3 and k=4, a possible form of the identity may be I=(1,2,2,3), and a possible form of the bucket may be B(I)=B((1,2,2,3))={x∈Ωtr: ŷ1(x)=1, ŷ2(x)=2, ŷ3(x)=2, ŷ4(x)=3}.
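For concreteness, the following toy snippet, with hypothetical classifiers defined on integer inputs purely for illustration, shows which training elements fall into the bucket B((1,2,2,3)) in the case of m=3 and k=4.

    # Hypothetical toy classifiers for illustration only (m = 3, k = 4).
    y1 = lambda x: 1
    y2 = lambda x: 2
    y3 = lambda x: 2
    y4 = lambda x: 3 if x % 2 else 1

    omega_tr = list(range(10))
    I = (1, 2, 2, 3)
    bucket = [x for x in omega_tr if (y1(x), y2(x), y3(x), y4(x)) == I]
    print(bucket)   # the odd integers: on these elements, V(x) = (1, 2, 2, 3)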
The bucket merger module 50 is connected to the bucket creation module 40 and configured to merge empty buckets and/or small buckets into large buckets, for example, according to their cardinalities, so as to reduce the bias caused by the rareness of data therein.
In particular, it is possible to define nB(I)=|B(I)| and nB(I)(j)=|B(I)∩Ωtr(j)|, and obviously, nB(I)=Σj=1mnB(I)(j). The bucket merger module 50 is then further configured to denote the cardinality nB(I)(j) of a bucket B(I) with a membership j and the cardinality ntr(j) of a subset of the training set Ωtr with the membership j, and to perform the merger such that the condition:
holds for a certain predetermined positive constant α between 0 and 1. The choice of the constant α may be problem-dependent, so a specific value of α will not be given in the present description.
The membership assignment module 60 is indirectly connected to the bucket creation module 40 through the bucket merger module 50 and configured to assign respective memberships j's to the respective buckets B(I), for example, according to their cardinalities. The memberships j's refer to data categories of the training set Ωtr.
One possible approach is as follows: let a bucket B(I) in the training set Ωtr with identity I be given. Assign the bucket B(I) a membership j if the ratio of the number of sample data (or elements) x's with membership j in B(I) to the cardinality |Ωtr(j)| of the subset Ωtr(j) of the training set Ωtr with membership j is maximal among the ratios of all memberships. This defines a function Y from the collection of buckets B(I) to S such that:
Y(B(I))=arg maxj∈S(nB(I)(j)/ntr(j))
It should be emphasized that there are many ways to determine the membership of a bucket, which then result in different functions Y.
The output module 70 is indirectly connected to the classifier combination module 30 through the bucket creation module 40, the bucket merger module 50, and the membership assignment module 60, and configured to derive an output result after the sample data x is processed through the classifier combination module 30. The output result may be directly the membership j, or converted to the data category, such as “dog”, “cat”, or “rabbit” indicated by the membership.
The eclectic classifier 1 of the present invention can be expressed by the following formal definition:
ỹ(x)=Y(B(V(x))),x∈Ω
In summary, the present invention solves the classification problem as follows: Given any sample data x∈Ω (Ω may include the training set Ωtr and/or the test set Ωtt and/or the remaining set Ωth), apply the vector function V on the sample data x to obtain its identity I=V(x)∈Sk. Accordingly, x is distributed to the bucket B(I), which has the membership Y(B(I)). Naturally, the sample data x receives the same membership as the bucket B(I), namely Y(B(I)).
(Level of Confidence)
With the aforementioned implementation, the eclectic classifier 1 of the present invention can further produce a level of confidence (LOC) associated with the membership assignment module 60, as shown in
For each bucket B(I), an LOC with respect to the training set Ωtr, denoted by μ, is designated to the bucket B(I) as the ratio of the cardinality of the intersection of the bucket B(I) and Ωtr(Y(B(I))), the subset of the training set Ωtr with membership Y(B(I)), to the cardinality of the bucket B(I), that is:
μ(B(I))=|B(I)∩Ωtr(Y(B(I)))|/|B(I)|
This LOC, defined for buckets as given above, is then designated to each sample data x distributed to the bucket B(I). In this way, LOC is defined for every element:
μ(x)=μ(B(V(x))),x∈Ω
(Method to Implement an Eclectic Classifier)
The respective modules and the structure of the eclectic classifier 1 of the present invention have been discussed above. However, in the aspect of software, the eclectic classifier 1 may be implemented by a sequence of steps, as introduced above. Therefore, the method of the present invention essentially includes the following steps, executed in order:
(a) preparing a training set Ωtr from a collection of data Ω. Preferably, the training set Ωtr is further decomposed into subsets Ωtr(j). This step may be executed by the aforementioned data collection module 20.
(b) training k developed classifiers ŷ1, . . . , ŷk, k≥2, with the training set (Ωtr). The classifiers ŷ1, . . . , ŷk may be trained individually by conventional approaches.
(c) combining the developed classifiers ŷ1, . . . , ŷk to form a vector function V(x)=(ŷ1(x), . . . , ŷk(x)). This step may be executed by the aforementioned classifier combination module 30.
(d) creating buckets B(I) with identities I, wherein when sample data x is said to be distributed to a bucket B(I), it means that V(x)=I. This step may be executed by the aforementioned bucket creation module 40.
(e) merging empty buckets and/or small buckets into large buckets. This step may be executed by the aforementioned bucket merger module 50.
(f) assigning memberships j's respectively to the buckets, the memberships j's referring to data categories of the training set Ωtr. This step may be executed by the aforementioned membership assignment module 60.
(g) deriving an output result {tilde over (y)}(x) (its data category) after sample data x is processed through the vector function V(x). This step may be executed by the aforementioned output module 70.
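A compact sketch tying steps (c), (d), (f), and (g) together is given below. It reuses the conventions of the earlier sketches, assumes the developed classifiers and the membership function y are Python callables, and omits the merging of step (e) for brevity; in a full implementation, an identity unseen during training would be handled by that merging step rather than raising a lookup error.

    from collections import Counter, defaultdict

    def eclectic_fit(omega_tr, y, classifiers):
        """Build the vector function V, create the buckets, assign each bucket a
        membership, and return the resulting eclectic classifier y_tilde."""
        V = lambda x: tuple(clf(x) for clf in classifiers)          # step (c)
        n_tr_per_class = Counter(y(x) for x in omega_tr)
        buckets = defaultdict(list)
        for x in omega_tr:                                          # step (d)
            buckets[V(x)].append(x)
        membership_of = {                                           # step (f)
            I: max(n_tr_per_class,
                   key=lambda j: sum(1 for x in xs if y(x) == j) / n_tr_per_class[j])
            for I, xs in buckets.items()
        }
        def y_tilde(x):                                             # step (g)
            return membership_of[V(x)]
        return y_tilde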
In conclusion, the present invention provides an eclectic classifier, which combines the results from several developed classifiers and takes the maximal ratio, or the majority, of their predictions or decisions as an optimal answer. In this way, the extreme influence of the disadvantages of any single developed classifier can be avoided, and the advantages of the developed classifiers can be jointly taken into consideration.
Although the present invention has been explained in relation to its preferred embodiment, it is to be understood that many other possible modifications and variations can be made without departing from the spirit and scope of the invention as hereinafter claimed.