A portion of the disclosure of this patent document contains material to which a claim for copyright is made. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but reserves all other copyright rights whatsoever.
Embodiments of the present invention relate to artificial intelligence systems for training classifiers.
There are numerous reasons for classifying entities. Binary classification indicates whether or not an entity is in a particular class. Classification can be based on the publications of an entity, including its social media publications. The publications are analyzed for the presence of indicators, such as key words. The presence or absence of an indicator may be digitally stored as a binary value of 1 if said indicator is present and 0 if said indicator is not present. Prior art systems have assigned different weights to different indicators, recognizing that some indicators are stronger than others. It has been discovered, however, that when an entity's publications contain a large number of low-weight indicators, prior art systems tend to overpredict the probability that the entity is in a particular class. There is a need, therefore, for an artificial intelligence system for training a classifier that will not overpredict due to large numbers of low-weight indicators.
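A minimal Python sketch of the untempered, prior-art style of scoring illustrates the mechanism. The weights and the mapping from score to probability (1 - exp(-score)) are hypothetical choices for illustration, not values from any prior art system: eight weak indicators add up to a larger untempered score than one strong indicator, so they produce the higher probability.

```python
import math

def prior_art_score(tokens, weights):
    """Untempered weighted sum of binary tokens (1 = indicator present, 0 = absent)."""
    return sum(w * t for w, t in zip(weights, tokens))

def to_probability(score):
    """Hypothetical map from a non-negative score to [0, 1)."""
    return 1.0 - math.exp(-score)

# Hypothetical weights: indicator 0 is strong, indicators 1-8 are weak.
weights = [2.0] + [0.3] * 8
one_strong = [1] + [0] * 8  # only the strong indicator present
many_weak = [0] + [1] * 8   # only the eight weak indicators present

p_strong = to_probability(prior_art_score(one_strong, weights))  # about 0.86
p_weak = to_probability(prior_art_score(many_weak, weights))     # about 0.91
```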
The summary of the invention is provided as a guide to understanding the invention. It does not necessarily describe the most generic embodiment of the invention or the broadest range of alternative embodiments.
A system for training a classifier has a database of training data and a modeling system for building a classification model based on the training data. The database holds a binary class for each entity and binary tokens indicating whether or not each of one or more indicators about the entity is true. The classification model is based on a tempered indication of the tokens. The tempered indication is the weighted sum of the tokens for each entity divided by a tempering factor, and the tempering factor is a function of the unweighted sum of the tokens for that entity. Thus, when large numbers of low-weight tokens are present, the tempering factor reduces the indication so that the model does not overpredict the probability of an entity being in a class.
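In symbols, the summary amounts to the following, where t_ij is the binary token i for entity j and w_i is the weight of token i (symbols chosen here for illustration; the exact formulas appear in the figures):

```latex
S_j = \sum_i w_i \, t_{ij}, \qquad
N_j = \sum_i t_{ij}, \qquad
I_j = \frac{S_j}{T(N_j)}
```

Here S_j is the weighted sum, N_j the unweighted sum (the number of indicators present), I_j the tempered indication, and T the tempering factor. The summary does not fix the form of T; for it to damp indications built from many weak tokens, it presumably grows with N_j.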
The detailed description describes non-limiting exemplary embodiments. Any individual features may be combined with other features as required by different applications for at least the benefits described herein.
As used herein, the term “about” means plus or minus 10% of a given value unless specifically indicated otherwise.
As used herein, a “computer-based system”, “computer system”, “database” or “engine” comprises an input device for receiving data, an output device for outputting data, a permanent memory for storing data as well as computer code, and a microprocessor for executing computer code. The computer code resident in said permanent memory will physically cause said microprocessor to read-in data via said input device, process said data within said microprocessor, and output said processed data via said output device.
As used herein a “binary value” is any type of computer data that can have two states. Said data may be, but is not limited to, a bit, an integer, a character string, or a floating point number. A binary value of “1” or “true” is interpreted as the number 1 for arithmetic calculations. A binary value of “0” or “false” is interpreted as the number 0 for arithmetic calculations.
As used herein, the symbols “i” and “j” refer to index numbers for one of a plurality of objects. Thus the term “entity j” refers to a jth entity in a plurality of said entities. The term “token i” refers to an ith token in a plurality of said tokens.
As used herein, the term “adjudicated class” means a classification that has been made independently, in at least some respect, of the data used to train a classifier. Referring to
The computer implemented modeling engine 120 comprises a microprocessor and computer readable instructions stored on a permanent memory. The computer readable instructions are operable to cause said microprocessor to physically carry out the steps of:
The output is useful for an automated classification system that will read in token data for prospective entities and use said model for determining a probability of said prospective entity being in said class.
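A sketch of such an automated classification step follows, assuming the fitted model is available as token weights plus a tempering function and a normalized asymptotic transformation. The function name score_prospects, the prospect identifiers, and the particular tempering and transformation lambdas in the usage are hypothetical, chosen only to make the example runnable.

```python
import math
from typing import Callable, Dict, Sequence

def score_prospects(
    prospects: Dict[str, Sequence[int]],
    weights: Sequence[float],
    tempering: Callable[[int], float],
    transform: Callable[[float], float],
) -> Dict[str, float]:
    """Return a class-membership probability for each prospective entity."""
    probabilities = {}
    for entity_id, tokens in prospects.items():
        if len(tokens) != len(weights):
            raise ValueError(f"{entity_id}: expected {len(weights)} tokens")
        n = sum(tokens)  # unweighted sum: number of indicators present
        weighted = sum(w * t for w, t in zip(weights, tokens))
        # With no indicators present the tempered indication is taken as 0.
        indication = weighted / tempering(n) if n > 0 else 0.0
        probabilities[entity_id] = transform(indication)
    return probabilities

# Hypothetical fitted model: one strong indicator, eight weak ones.
weights = [2.0] + [0.3] * 8
probs = score_prospects(
    {"prospect_a": [1] + [0] * 8, "prospect_b": [0] + [1] * 8},
    weights,
    tempering=lambda n: 1.0 + 0.5 * (n - 1),  # hypothetical T(N)
    transform=lambda x: 1.0 - math.exp(-x),   # hypothetical transformation
)
```

With these assumed parameters, prospect_b's eight weak indicators yield a lower probability than prospect_a's single strong one, the reverse of the untempered ordering.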
The tempering factor has a value of 1 when only one indicator is found in an entity's data (i.e., when the unweighted sum of said tokens i for said entity j has a value of 1). Ensuring this is the function of the offset factor 146.
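The exact tempering formula appears only in the figures, so the sketch below assumes a hypothetical form, T(N) = 1 + a * (N - offset)^b with a >= 0, b > 0 and offset = 1, purely to show how subtracting an offset pins the tempering factor at 1 for a single indicator.

```python
def tempering_factor(n: int, a: float, b: float, offset: int = 1) -> float:
    """Hypothetical tempering factor T(N) = 1 + a * (N - offset) ** b.

    n is the unweighted sum of an entity's tokens (indicators present).
    With offset = 1, T(1) = 1, so a single indicator is not tempered.
    """
    if a < 0 or b <= 0:
        raise ValueError("assumed constraints: a >= 0 and b > 0")
    if n < offset:
        raise ValueError("tempering factor undefined for fewer than `offset` tokens")
    return 1.0 + a * (n - offset) ** b

single = tempering_factor(1, a=0.5, b=1.0)               # 1.0: no tempering
several = tempering_factor(5, a=0.5, b=1.0)              # 3.0
no_offset = tempering_factor(1, a=0.5, b=1.0, offset=0)  # 1.5
```

Without the offset (offset=0), the same parameters would give T(1) = 1 + a, damping even a lone strong indicator.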
The formula for the tempered indication shown in
In order to compare the tempered indication to a binary class, the tempered indication may be transformed to a real value between 0 and 1 by a normalized asymptotic transformation. A particular normalized asymptotic transformation 152 is shown in
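Two candidate normalized asymptotic transformations are sketched below; neither is claimed to be transformation 152, whose formula is in the figures. Both assume a non-negative tempered indication (for example, non-negative token weights) and map 0 to 0 while approaching, but never reaching, 1.

```python
import math

def exponential_saturation(x: float) -> float:
    """1 - exp(-x): 0 at x = 0, rising monotonically toward 1."""
    if x < 0:
        raise ValueError("tempered indication assumed non-negative")
    return 1.0 - math.exp(-x)

def rational_saturation(x: float) -> float:
    """x / (1 + x): same range, but approaches 1 more slowly."""
    if x < 0:
        raise ValueError("tempered indication assumed non-negative")
    return x / (1.0 + x)
```

Either output can then be compared directly with the 0 or 1 adjudicated class of each training entity.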
A particular function 162 is shown for calculating an error function. This function is the unweighted sum of squares (i.e., “SSQ”) of the residuals. However, any error function that provides an aggregate measure of how well the model fits the data may be used. An alternative error function might be a weighted sum of squares of the residuals, where the weights reflect the size or importance of the training entities j relative to each other.
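The unweighted SSQ and its weighted alternative can be written as one helper. The residual is taken here as the adjudicated class minus the model's probability, and the entity_weights argument is a hypothetical stand-in for the size or importance weights mentioned above.

```python
from typing import Optional, Sequence

def sum_of_squares(
    classes: Sequence[int],
    probabilities: Sequence[float],
    entity_weights: Optional[Sequence[float]] = None,
) -> float:
    """SSQ of residuals (class - probability); weighted if entity_weights is given."""
    if len(classes) != len(probabilities):
        raise ValueError("classes and probabilities must have the same length")
    if entity_weights is None:
        entity_weights = [1.0] * len(classes)  # unweighted SSQ
    if len(entity_weights) != len(classes):
        raise ValueError("one weight is needed per training entity")
    return sum(v * (c - p) ** 2 for c, p, v in zip(classes, probabilities, entity_weights))

unweighted = sum_of_squares([1, 0], [0.75, 0.25])            # 0.125
weighted = sum_of_squares([1, 0], [0.75, 0.25], [3.0, 1.0])  # 0.25
```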
A set of 100 entities was adjudicated to determine which class each belonged to. The social media sites of the entities were then analyzed to identify the presence or absence of six words indicative of said class. The class of each entity j was associated with an event date, and the publications used for each entity were dated after that entity's event date. The classes and indicator tokens were then stored in a training database. A modeling engine then read the data in. The token weights and tempering parameters of a tempered indication were then calculated based on the model shown in
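The calculation step can be sketched as a small end-to-end fit that minimizes the unweighted SSQ. Everything here is an assumption for illustration: the tempering form 1 + a * (N - 1)^b, the 1 - exp(-x) transformation, the non-negativity constraints, the synthetic six-entity dataset (not the 100-entity study), and the use of a simple coordinate pattern search in place of whatever optimizer was actually used.

```python
import math

# Hypothetical model: tempering T(N) = 1 + a*(N - 1)**b, transform 1 - exp(-x).
def predict(tokens, weights, a, b):
    n = sum(tokens)
    if n == 0:
        return 0.0  # no indicators present: zero indication
    weighted = sum(w * t for w, t in zip(weights, tokens))
    return 1.0 - math.exp(-weighted / (1.0 + a * (n - 1) ** b))

def ssq(params, data, k):
    """Unweighted sum of squared residuals; params = k weights followed by a, b."""
    weights, a, b = params[:k], params[k], params[k + 1]
    return sum((cls - predict(tokens, weights, a, b)) ** 2 for tokens, cls in data)

def fit(data, k, step=0.5, min_step=1e-4, max_passes=5000):
    """Coordinate pattern search (a simple stand-in optimizer) minimizing SSQ."""
    params = [0.5] * k + [0.5, 1.0]
    best = ssq(params, data, k)
    for _ in range(max_passes):
        if step < min_step:
            break
        improved = False
        for idx in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[idx] += delta
                # Assumed constraints: weights and a non-negative, b strictly positive.
                if trial[idx] < 0 or (idx == k + 1 and trial[idx] <= 0):
                    continue
                err = ssq(trial, data, k)
                if err < best:
                    params, best = trial, err
                    improved = True
        if not improved:
            step /= 2  # refine once no move at this step size helps
    return params, best

# Synthetic toy data (tokens, adjudicated class) with k = 3 indicators.
data = [
    ([1, 0, 0], 1),
    ([1, 1, 0], 1),
    ([1, 0, 1], 1),
    ([0, 1, 0], 0),
    ([0, 0, 1], 0),
    ([0, 1, 1], 0),
]
initial_error = ssq([0.5] * 3 + [0.5, 1.0], data, 3)
params, error = fit(data, k=3)
```

A gradient-based or library optimizer with bounds would be the more usual choice in practice; the pattern search is used here only because it needs no dependencies.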
While the disclosure has been described with reference to one or more different exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the disclosure. In addition, many modifications may be made to adapt to a particular situation without departing from the essential scope or teachings thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention. For example, the methods described herein may be applied to multi-valued or even scalar classes of entities. They can also be extended to tokens that are scalars, such as the number of times a particular indicator is present in a publication, or the degree to which an indicator is present.