Knowledge corroboration involves trying to exploit the “wisdom of the crowds” by using knowledge from many different people, organizations, committees, enterprises, automated systems or other entities. For example, a majority voting system, in which many people vote on an issue and the majority vote is used as the outcome, is one form of knowledge corroboration.
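As a baseline, majority voting can be sketched in a few lines of Python. The function name `majority_vote` and the choice to resolve ties as `False` are assumptions made for this illustration only.

```python
from collections import Counter


def majority_vote(answers: dict[str, bool]) -> bool:
    """Return the answer given by most judges.

    answers maps a (hypothetical) judge identifier to that judge's True/False answer.
    Ties are resolved as False, an arbitrary choice made for this sketch.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    counts = Counter(answers.values())
    return counts[True] > counts[False]


print(majority_vote({"judge_a": True, "judge_b": True, "judge_c": False}))
```

Every answer counts equally here, which is exactly the property that makes majority voting fragile when most judges are poorly informed.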
Knowledge corroboration is difficult where the knowledge of the individuals to be corroborated is poor or highly variable, and where some individuals may act maliciously by deliberately reporting false information. In addition, exploiting knowledge corroboration requires large-scale systems that can corroborate knowledge from large numbers of entities. Providing knowledge corroboration solutions that are efficient and that scale up to huge amounts of data is difficult.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known systems for knowledge corroboration.
The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
Knowledge corroboration is described. In an embodiment many judges provide answers to many questions so that at least one answer is provided to each question and at least some of the questions have answers from more than one judge. In an example a probabilistic learning system takes features describing the judges or the questions or both and uses those features to learn an expertise of each judge. For example, the probabilistic learning system has a graphical assessment component which aggregates the answers in a manner which takes into account the learnt expertise in order to determine enhanced answers. In an example the enhanced answers are used for knowledge base clean-up or web-page classification and the learnt expertise is used to select judges for future questions. In an example the probabilistic learning system has a logical component that propagates answers according to logical relations between the questions.
Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
Like reference numerals are used to designate like parts in the accompanying drawings.
The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
Although the present examples are described and illustrated herein as being implemented in a knowledge base clean up system and a web page classification system, the systems described are provided as examples and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different types of knowledge corroboration systems.
Judge features 110, which are characteristics describing the judges, are available to the probabilistic learning system 112. For example, the judge features may be an identifier of each judge and/or attributes of the judges such as their profession, age, nationality, gender etc.
Question features 108 are available to the probabilistic learning system 112. For example, question features 108 may be an identifier of each question and/or attributes of the questions such as the type of question, the area of knowledge, the appearance of certain words in the question etc.
The probabilistic learning system provides enhanced answers to the questions which exploit the “wisdom of the crowds”. The enhanced answers are formed by aggregating the judges' answers while taking into account the expertise (or trustworthiness) of the judges. The expertise of the judges 116 is also learnt by the probabilistic learning system. In some examples the expertise of each judge for each question is learnt and in other examples an overall expertise of each judge is learnt. The enhanced answers may be fed back into the knowledge base 100. Also, the learnt expertise may be used by a judge selection system 118 to select judges 106 for future questions or to reward judges differently depending on their expertise or trustworthiness.
The probabilistic learning system is able to determine the enhanced answers and learn the expertise of the judges without the need for any ground truth knowledge about the judge expertise and/or the answers to the questions. The probabilistic learning system is arranged to take into account uncertainty in the answers provided by the judges and so is able to cope with judges who guess, judges who give poor answers and judges who give malicious answers. Judges may often act inconsistently or unreliably and give inaccurate feedback across knowledge domains, and the background knowledge of judges may vary across knowledge domains. For example, the majority of people from the “crowd” may not know that Barack Obama has won a Grammy Award. In such a case the probabilistic learning system is able to identify the few experts in the crowd of judges who may know the truth. The probabilistic learning system is able to give enhanced answers which are better than those a simple majority voting system would achieve when aggregating the answers of the judges. This is the case, for example, where there is at least one answer for each question and where there are multiple answers from different judges for at least a plurality of the questions. Generally speaking, the more answers there are from different judges for the same questions, the better the accuracy of the enhanced answers. In some examples, the probabilistic learning system has a logical component which is able to exploit logical relations which exist between the questions in order to make deductions and add to the knowledge base 100, and also to propagate knowledge in the probabilistic learning system according to the logical relations.
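To illustrate why taking expertise into account can beat majority voting, the following sketch aggregates answers in log-odds space, weighting each judge by the log-odds of their reliability (the probability that they answer correctly). It is a naive-Bayes aggregation that assumes the reliabilities are already known and that judges are conditionally independent given the truth; it is not the probabilistic learning system described here, which learns expertise rather than taking it as input. All names are hypothetical.

```python
import math


def weighted_vote(answers, reliability, prior_true=0.5):
    """Posterior probability that a statement is true, given judges' answers.

    answers: dict judge -> bool; reliability: dict judge -> probability in (0, 1)
    that the judge answers correctly. Judges are assumed conditionally
    independent given the truth value (naive-Bayes aggregation).
    """
    log_odds = math.log(prior_true / (1.0 - prior_true))
    for judge, answer in answers.items():
        r = reliability[judge]
        weight = math.log(r / (1.0 - r))  # evidence carried by one answer
        log_odds += weight if answer else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two reliable judges say True; five weakly informed judges say False.
answers = {"expert_1": True, "expert_2": True,
           **{f"crowd_{k}": False for k in range(5)}}
reliability = {"expert_1": 0.95, "expert_2": 0.95,
               **{f"crowd_{k}": 0.55 for k in range(5)}}
print(weighted_vote(answers, reliability))
```

With these numbers a simple majority vote returns False (five answers against two), while the weighted posterior favours True because each reliable judge carries far more evidence per answer than a near-chance judge.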
A plurality of judges 210 provide truth assessments 208 of the triple statements in the knowledge base. The judges may be any type of judges as described above with reference to
The probabilistic learning system determines enhanced truth assessments 218 in a similar manner as for the enhanced answers described above with reference to
The probabilistic learning system also provides learnt judge expertise 220 information which may be used by a judge selection system 224 to select judges for assessing further triple statements and/or to reward judges based on their expertise.
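A judge selection system could, for example, rank judges by a scalar summary of their learnt expertise. The sketch below assumes such a summary (for instance a posterior mean) is available per judge; `select_judges` and its `exclude` parameter are hypothetical.

```python
import heapq


def select_judges(expertise, k, exclude=frozenset()):
    """Return up to k judge identifiers with the highest learnt expertise, best first.

    expertise: dict judge -> scalar expertise summary (assumed available).
    exclude: judges that should not be asked (hypothetical parameter).
    """
    candidates = ((score, judge) for judge, score in expertise.items()
                  if judge not in exclude)
    return [judge for _, judge in heapq.nlargest(k, candidates)]


print(select_judges({"ann": 0.9, "bob": 0.6, "cy": 0.8}, k=2))
```

A reward scheme could use the same scores, for example paying judges in proportion to their expertise, but the source does not fix any particular rule.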
More detail about a probabilistic learning system 400 is now given with reference to
In some examples the inference engine 410 carries out message passing over the factor graph data structure components according to an inference schedule. For example, this may comprise running inference on the expertise component and then switching iteratively between that component and the remaining components of the probabilistic learning system. However, any suitable inference schedule may be used.
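The schedule above alternates between components of a factor graph. A much simpler way to see why alternating updates can learn both truth values and judge expertise without ground truth is expectation maximisation for a “one-coin” judge model in the style of Dawid and Skene. The sketch below is that swapped-in technique, not the message-passing inference carried out by the inference engine 410; the Beta pseudo-counts `a0` and `b0` are hypothetical smoothing choices.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def em_corroborate(answers, n_iter=50, a0=2.0, b0=1.0):
    """Alternately estimate truth probabilities and judge reliabilities.

    answers: dict mapping (judge, question) -> bool answer.
    a0 > b0 starts every judge as better than chance, which breaks the symmetry
    with the "everyone is wrong" solution and keeps reliabilities inside (0, 1).
    Returns (p_true, reliability), dicts keyed by question and by judge.
    """
    judges = {i for i, _ in answers}
    questions = {j for _, j in answers}
    reliability = {i: a0 / (a0 + b0) for i in judges}
    p_true = {}
    for _ in range(n_iter):
        # Update 1: posterior P(t_j = True), uniform prior, independent judges.
        log_odds = dict.fromkeys(questions, 0.0)
        for (i, j), a in answers.items():
            w = math.log(reliability[i] / (1.0 - reliability[i]))
            log_odds[j] += w if a else -w
        p_true = {j: _sigmoid(lo) for j, lo in log_odds.items()}
        # Update 2: reliability = smoothed expected fraction of correct answers.
        correct = dict.fromkeys(judges, a0)
        total = dict.fromkeys(judges, a0 + b0)
        for (i, j), a in answers.items():
            correct[i] += p_true[j] if a else 1.0 - p_true[j]
            total[i] += 1.0
        reliability = {i: correct[i] / total[i] for i in judges}
    return p_true, reliability


# Three reliable judges and one judge who always answers wrongly.
truth = {"q0": True, "q1": False, "q2": True, "q3": True, "q4": False, "q5": False}
answers = {}
for q, t in truth.items():
    for judge in ("g1", "g2", "g3"):
        answers[(judge, q)] = t
    answers[("bad", q)] = not t
p_true, rel = em_corroborate(answers)
```

No ground truth is passed in; the adversarial judge ends up with a reliability below 0.5, so its answers are effectively inverted rather than merely ignored.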
In an example the assessment component 406 comprises a graphical data structure 502 as illustrated schematically in
The assessment component 406 thus comprises a graphical structure which comprises a node representing a learnt estimated true answer connected to one node for each observed answer to be aggregated into the estimated true answer. Each of the nodes for the observed answers may be connected to at least one node representing a learnt expertise indicator of a judge who gave the observed answer.
In some examples the assessment component is able to cope with situations where judges guess the answers to questions in the event that they do not know the answer. This may be achieved by using a node 600 representing a guessing probability which is learnt by the probabilistic learning system.
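One way to read the guessing node, consistent with the later definitions of the truth value $t_j$, the correctness $u_{ij}$ of a judge's assessment and the guessing probability $q$, is sketched below: a judge whose assessment is correct reports the truth, and otherwise guesses and happens to be right with probability $q$. This reading and the function names are assumptions of the sketch.

```python
def answer_likelihood(answer: bool, truth: bool, correct: bool, q: float) -> float:
    """p(a_ij | t_j, u_ij, q) under the guessing reading assumed here."""
    if correct:  # u_ij = T: the judge reports the truth
        return 1.0 if answer == truth else 0.0
    # u_ij = F: the judge guesses and is right with probability q
    return q if answer == truth else 1.0 - q


def prob_correct(rho: float, q: float) -> float:
    """Marginal p(a_ij = t_j) when p(u_ij = T) = rho, i.e. rho + (1 - rho) * q."""
    return sum(
        (rho if u else 1.0 - rho) * answer_likelihood(True, True, u, q)
        for u in (True, False)
    )


print(prob_correct(rho=0.5, q=0.5))
```

Because a guessing judge is still right a fraction $q$ of the time, agreement with the truth alone overstates what a judge actually knows; modelling $q$ separately lets the system discount lucky guesses.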
As mentioned above, in some examples the assessment component uses two nodes 700, 702 to represent a learnt expertise indicator of a judge which is not question specific as illustrated in
In some embodiments the factor graph data structure 402 comprises an expertise component 800 which is arranged to map judges and questions into a one-, two- or higher-dimensional trait space. The proximity of judges and questions in trait space can indicate the expertise of a judge for a type of question. The assessment component 406 is then able to use that information about the expertise of judges as it aggregates answers from judges. An optional logical component 408 propagates answers made by judges through logical relations between questions. It is not essential to use an expertise component which maps judges and questions into a trait space; in that case features of the judges and/or questions are used to assess the expertise of judges, the features allowing generalization across judges and/or questions. Using features in this way helps mitigate data sparsity and can also help with the cold start problem when new judges join the feedback crowd or when new questions are added to the knowledge base. However, it has been found that using an expertise component which maps judges and questions into a trait space considerably improves the corroboration process. The mapping into trait space may be a linear mapping although this is not essential.
In some embodiments an expertise component which maps judges and questions into a trait space is based on U.S. patent application Ser. No. 12/253,854 filed on 17 Oct. 2008 which is incorporated herein by reference in its entirety.
An example expertise component 1000 is illustrated in
The expertise component of
A detailed example of a probabilistic learning system such as that illustrated in
In this example, assessments $a_{ij} \in \{T,F\}$ that judges $i$ make about questions $j$ are used in order to infer truth values $t_j \in \{T,F\}$ of questions $j$. The probabilistic learning system may comprise three interacting components:
An assessment component 1104 which relates an assessment $a_{ij}$ with the truth value $t_j$ of the question, the correctness $u_{ij}$ of judge $i$'s assessment, and the guessing probability $q$ of judges.
A logical component 1106 which describes the dependency between the truth value $t_j$ of question $j$ and the truth values $t \in D_j$ of questions from which $j$ can be derived.
An expertise component 1108 which learns the expertise $\tilde{u}_{ij}$ of judge $i$ for question $j$ in terms of judge features $x_i$ and question features $y_j$, which interact via a latent expertise space.
The following notation may be used: the partially observed matrix of assessments is denoted by $A \in \{T,F\}^{n \times m}$, the vector of truth values of the questions by $t \in \{T,F\}^{m}$, and the matrix of correctness of judge assessments by $U \in \{T,F\}^{n \times m}$. A variable $q \sim \mathrm{Beta}(\alpha, \beta)$ represents the probability that judges will guess the correct answer. The matrix of question-based judge expertise is denoted by $\tilde{U} \in \mathbb{R}^{n \times m}$, and the tuple of real-valued parameters of the judge-question expertise component by $\Theta := (r_0, v, w, V, W)$.
Here $\Omega = \{\{D_j\}_{j=1}^{m}, \{\pi_j\}_{j=1}^{m}, \{x_i\}_{i=1}^{n}, \{y_j\}_{j=1}^{m}, \alpha, \beta\}$ represents the parameters of the prior distributions, and the parameters of the prior distributions over the components $\Theta$ of the expertise component are jointly denoted by $\Sigma$. A complete reference of notation is given in
Two assessment component of
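$$p(A, U, q \mid t, \tilde{U}, \alpha, \beta) = p(q \mid \alpha, \beta) \prod_{i=1}^{n} \prod_{j=1}^{m} p(a_{ij} \mid t_j, u_{ij}, q)\, p(u_{ij} \mid \tilde{u}_{ij}) \tag{7}$$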
where Ũ holds the corresponding prior parameters for the components of U. The function uij maps T and F to 1 and 0, respectively.
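A factorization such as Eq. (7) can be read generatively. The sketch below draws truth values, correctness indicators, a guessing probability and assessments by ancestral sampling. The uniform prior over truth values, the single guessing probability shared by all judges, and the logistic sigmoid that turns the real-valued expertise $\tilde{u}_{ij}$ into $p(u_{ij} = T)$ are all assumptions of this illustration, as is the function name.

```python
import math
import random


def sample_assessments(expertise, n_questions, alpha=1.0, beta=1.0, seed=0):
    """Ancestral sampling of (t, U, q, A).

    expertise: list of rows (one per judge) of length n_questions, where
    expertise[i][j] plays the role of u~_ij. Assumed: uniform prior on t_j,
    one q ~ Beta(alpha, beta) shared by all judges, p(u_ij = T) = sigmoid(u~_ij).
    """
    rng = random.Random(seed)
    t = [rng.random() < 0.5 for _ in range(n_questions)]
    q = rng.betavariate(alpha, beta)
    U, A = [], []
    for row in expertise:
        u_row, a_row = [], []
        for j, u_tilde in enumerate(row):
            u_ij = rng.random() < 1.0 / (1.0 + math.exp(-u_tilde))  # assessment correct?
            right = u_ij or rng.random() < q  # otherwise guess, right with prob. q
            u_row.append(u_ij)
            a_row.append(t[j] if right else not t[j])
        U.append(u_row)
        A.append(a_row)
    return t, U, q, A


t, U, q, A = sample_assessments([[2.0, -1.0, 0.0], [0.5, 0.5, 0.5]], n_questions=3)
```

Synthetic data drawn this way has a known ground truth, which makes it a convenient way to check that an inference procedure recovers $t$ and the judges' expertise.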
The variables $u_{ij} \in \{T,F\}$ represent the expertise of judge $i$ for a specific statement $j$. In some embodiments of the assessment component these are represented independently of the question $j$ as a general reliability $u_{ij} = u_i$ of judge $i$. In other embodiments, such as that illustrated in
More detail about the logical component of the probabilistic learning system is now given. Logical relations exist between the questions and, in an example, the questions are stored in a knowledge base using a semantic-web formalism. An example of such a semantic-web formalism is now given, although any suitable formalism which assigns a probabilistic value to statements made in the formalism may be used.
Using this type of semantic-web formalism provides the benefit that the deductive closure of a knowledge base using this formalism can be constructed in time polynomial in the size of the knowledge base.
In an example a knowledge base $K$ is provided with statements $f_1, f_2, \ldots, f_m$ using this type of semantic-web formalism. The deductive rules (as described above) provide logical dependencies among the questions' truth values. Furthermore, consider judges $u_1, u_2, \ldots, u_n$ who give feedback on the questions. Given descriptive judge and question features, the probabilistic learning system may jointly learn the truth values of the questions and the expertise of the judges by leveraging the logical dependencies among questions and the latent affinities between questions and judges.
In some embodiments the assessment, expertise and logical components of the probabilistic learning system comprise graphical structures which map to factor graph data structures. For example, these graphical structures may be Bayesian networks. In embodiments using the logical component the probabilistic learning system is arranged to convert logical expressions or relationships expressed using the semantic-web formalism into graphical structures of the logical component. This provides the benefit that the logical component is able to integrate with the other graphical components of the probabilistic learning system. Also, knowledge may be efficiently propagated through the graphical structure of the logical component. An example of how the relationships expressed using a semantic web formalism are translated into a Bayesian network is now given.
A knowledge base $K$ formed using the semantic-web formalism, in which $p(f) = 1$ for each question $f$, is consistent when there is no cycle along deduction paths or, in other words, when no question of the form $\langle X, R, X \rangle$ (for any $X \in \mathit{Ent}$) can be derived by grounding the above rules. This is herein denoted logical consistency. However, when $p(f) \neq 1$ for some questions $f$ in $K$, logical consistency is no longer defined and an alternative notion of probabilistic consistency is used, in which case the deduction rules are viewed as soft constraints.
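The rules of the formalism are not reproduced in this section, so the following sketch assumes a single hypothetical rule, transitivity of a relation $R$: $\langle X, R, Y \rangle \wedge \langle Y, R, Z \rangle \rightarrow \langle X, R, Z \rangle$. Each such deduction has two premises, matching the $(a \wedge b) \rightarrow c$ form used for deductions in this description. Forward chaining reaches a fixed point after polynomially many passes, and the knowledge base is logically consistent when no triple $\langle X, R, X \rangle$ appears in the closure. Function names are illustrative.

```python
def deductive_closure(facts, transitive_relations):
    """Forward-chain <X, R, Y> and <Y, R, Z> => <X, R, Z> to a fixed point.

    Only a transitivity rule is modelled (a stand-in for the formalism's rules).
    At most |Ent|^2 * |Rel| triples exist, so the loop runs polynomially often.
    """
    closure = set(facts)
    while True:
        objects = {}  # (subject, relation) -> list of objects
        for s, r, o in closure:
            objects.setdefault((s, r), []).append(o)
        new = {
            (x, r, z)
            for (x, r, y) in closure
            if r in transitive_relations
            for z in objects.get((y, r), [])
        } - closure
        if not new:
            return closure
        closure |= new


def is_logically_consistent(facts, transitive_relations):
    """True when no triple of the form <X, R, X> is derivable."""
    return all(s != o for s, _, o in deductive_closure(facts, transitive_relations))


kb = {("Singer", "subClassOf", "Artist"), ("Artist", "subClassOf", "Person")}
print(is_logically_consistent(kb, {"subClassOf"}))
```

Adding `("Person", "subClassOf", "Singer")` to `kb` would close a cycle, making `("Singer", "subClassOf", "Singer")` derivable and the knowledge base logically inconsistent.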
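Consider the purely logical case. Let $c$ be a question in $K$ and let
$$(a_1 \wedge b_1) \rightarrow c, \quad \ldots, \quad (a_i \wedge b_i) \rightarrow c \tag{1}$$
be all deductions of the conclusion $c$ in $K$, where the $a_i$ and the $b_i$ can be previously derived. The following must hold:
$$c \leftrightarrow (a_1 \wedge b_1) \vee \cdots \vee (a_i \wedge b_i) \vee d \tag{2}$$
where $d$ represents the missing evidence, i.e., all missing deductions that could lead to $c$ and only $c$. Note that this semantic interpretation of the variable $d$ makes the equivalence $(a_1 \wedge b_1) \vee \cdots \vee (a_i \wedge b_i) \vee d \leftrightarrow c$ possible.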
The probabilistic learning system can turn the logical formula into a Bayesian network using deterministic conditional probability tables (CPTs) that represent the logical relationships.
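The conditional probability at a node $ab_i$, which represents the conjunction $a_i \wedge b_i$, is given by:
$$p(ab_i = T \mid a_i, b_i) = \begin{cases} 1 & \text{if } a_i = b_i = T \\ 0 & \text{otherwise} \end{cases} \tag{3}$$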
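This reduces the disjunctive normal form to the expression $c = ab_1 \vee \cdots \vee ab_i \vee d$. Finally, the system connects $c$ with all the variables in the disjunctive normal form by a conditional probability:
$$p(c = T \mid ab_1, \ldots, ab_i, d) = \begin{cases} 0 & \text{if } ab_1 = \cdots = ab_i = d = F \\ 1 & \text{otherwise} \end{cases} \tag{4}$$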
In an example of a logical component, each question $f_j$ in a knowledge base $K$ is assigned a binary variable $t_j \in \{T,F\}$. For example, each truth value $t_j$ of a question that can be logically deduced by a set of premises $D_j$ from $K$ is connected with the truth values of questions in $D_j$ through $p(t_j \mid D_j, \tilde{t}_j)$ as described above. Each question $f_j$ for which there exist no premises in $K$ is assigned $p(\tilde{t}_j) := \mathrm{Bernoulli}(\pi_j)$ as a prior, where the binary variable $\tilde{t}_j$ accounts for the deduction through missing premises. Defining
the “prior” distribution for t factorizes as
where Equations (3) and (4) specify the conditional distribution $p(t_j \mid \tilde{t}_j, D_j)$.
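The deterministic conditional probability tables for the conjunction nodes $ab_i$ and the disjunction node $c$ can be illustrated by brute-force enumeration. The sketch below assumes independent prior marginals for the premise variables and for the missing-evidence variable $d$ (playing the role of $\tilde{t}_j$), and computes $p(c = T)$ exactly. It is exponential in the number of variables and only meant to show the semantics, not how the inference engine computes marginals; the function name is hypothetical.

```python
from itertools import product


def prob_conclusion(deductions, prior):
    """p(c = T) for c = (a_1 AND b_1) OR ... OR (a_i AND b_i) OR d.

    deductions: list of (a, b) premise-variable names, one pair per deduction of c.
    prior: independent marginal p(v = T) for every premise variable and for "d".
    Enumerating joint assignments keeps the result exact even when the same
    premise variable appears in several deductions.
    """
    names = sorted(prior)
    total = 0.0
    for values in product((False, True), repeat=len(names)):
        x = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= prior[name] if x[name] else 1.0 - prior[name]
        ab = [x[a] and x[b] for a, b in deductions]  # deterministic AND nodes ab_i
        if any(ab) or x["d"]:  # deterministic OR node c
            total += weight
    return total


# Two deductions sharing the premise "a": c <- (a AND b), c <- (a AND e).
print(prob_conclusion([("a", "b"), ("a", "e")],
                      {"a": 0.5, "b": 0.5, "e": 0.5, "d": 0.0}))
```

With $d$ fixed to false the result is $p(a)\,p(b \vee e) = 0.5 \times 0.75$; a non-zero prior on $d$ raises $p(c = T)$, reflecting deductions missing from $K$.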
More detail about an example of an expertise model is now given. The mathematical expression for the expertise model given above factorizes as:
where $\pi(r_0, v, w, V, W)$ is a fully factorizing Gaussian prior over $r_0$, $v$, $w$, $V$ and $W$ whose parameters are jointly denoted by $\Sigma$ above. Intuitively, the model can be thought of as mapping both judges $x_i$ and questions $y_j$ into a $k$-dimensional latent knowledge space, with $s_i = V^\top x_i$ and $z_j = W^\top y_j$. The question-dependent judge expertise $\tilde{u}_{ij}$ is then modeled as the inner product $s_i^\top z_j$ between the latent expertise vectors. In addition, purely judge- or question-related effects are modeled with linear models, $x_i^\top v$ and $y_j^\top w$, together with an overall threshold $r_0$.
The judge and question features, as represented by the vectors $x_i$ and $y_j$, can be seen as characterizing descriptions that allow generalizations across judges and questions.
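A point-estimate version of the bilinear expertise model can be written with NumPy. The sketch assumes that the latent inner product, the two linear terms and the threshold $r_0$ are simply added, and that the sigmoid factor linking the expertise and assessment models is the logistic function; the actual model places Gaussian distributions over $r_0, v, w, V, W$ instead of using fixed values. Function names and numbers are illustrative.

```python
import numpy as np


def expertise_score(x_i, y_j, r0, v, w, V, W):
    """Point estimate of the question-dependent expertise u~_ij (terms assumed summed)."""
    s_i = V.T @ x_i  # judge mapped into the k-dimensional latent space
    z_j = W.T @ y_j  # question mapped into the same space
    return float(s_i @ z_j + x_i @ v + y_j @ w + r0)


def prob_correct_from_expertise(u_tilde):
    """Assumed logistic sigmoid linking u~_ij to p(u_ij = T)."""
    return float(1.0 / (1.0 + np.exp(-u_tilde)))


x_i = np.array([1.0, 0.0, 0.0])                      # one-hot judge identifier
y_j = np.array([0.0, 1.0])                           # one-hot question identifier
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # 3 judge features -> k = 2
W = np.array([[0.0, 0.0], [2.0, 0.0]])               # 2 question features -> k = 2
score = expertise_score(x_i, y_j, r0=0.1, v=np.array([0.5, 0.0, 0.0]),
                        w=np.array([0.0, -1.0]), V=V, W=W)
p = prob_correct_from_expertise(score)
```

With one-hot identifier features, $s_i$ and $z_j$ are simply rows of $V$ and $W$, one latent vector per judge and per question; richer features such as profession or area of knowledge let judges and questions with similar descriptions share latent structure.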
In this example, the assessment model and the expertise model may be connected by a sigmoid factor
Since the expertise model achieves efficient inference using a fully factorized Gaussian approximation, the system approximates the marginal distribution $p(\tilde{u}_{ij})$ using a Gaussian. This is achieved by using a Laplace approximation to a Beta distribution $\mathrm{Beta}(p; a, b)$ after changing the basis of the Beta distribution via the sigmoid, following. For $p(\tilde{u}_{ij}) = \mathcal{N}(\mu, \sigma^2)$ this gives
As mentioned above the graphical structure of the components of the probabilistic learning system may be implemented as a factor graph. In an example such as the example of
In the examples described above the ground truth of the questions is not known and the true expertise of the judges is not known. However, in some embodiments it is possible to present the judges with a mixture of questions where the answers to some of the questions are known as ground truth. This enables the reliability of the judges to be assessed against the ground truth for some of the questions, and this knowledge may be used by the expertise component. In this way the expertise component has enhanced accuracy and is able to provide input to the assessment component, which in turn enhances the answers to the questions where ground truth is not available.
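Where some questions have known answers, a simple way to score judges on them is a Beta-Bernoulli posterior mean per judge. The sketch below assumes a Beta(`alpha`, `beta`) prior with hypothetical defaults and ignores questions without ground truth; how such estimates are passed to the expertise component is not specified here, and the function name is illustrative.

```python
def gold_reliability(answers, gold, alpha=1.0, beta=1.0):
    """Posterior mean reliability of each judge, using ground-truth questions only.

    answers: dict (judge, question) -> bool; gold: dict question -> known truth.
    Judges with no answered gold question are omitted (they keep the prior mean).
    """
    counts = {}
    for (judge, question), answer in answers.items():
        if question in gold:
            right, seen = counts.get(judge, (0, 0))
            counts[judge] = (right + (answer == gold[question]), seen + 1)
    return {j: (alpha + r) / (alpha + beta + n) for j, (r, n) in counts.items()}


answers = {("ann", "g1"): True, ("ann", "g2"): False,
           ("bob", "g1"): False, ("ann", "open_q"): True}
print(gold_reliability(answers, gold={"g1": True, "g2": False}))
```

The prior pseudo-counts keep a judge who has answered only one or two gold questions from being scored as perfectly reliable or perfectly unreliable.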
Computing-based device 1500 comprises one or more processors 1502 which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to carry out knowledge corroboration. In some examples, for example where a system on a chip architecture is used, the processors 1502 may include one or more fixed function blocks which implement a part of the knowledge corroboration method in hardware (rather than software or firmware). Platform software comprising an operating system 1508 or any other suitable platform software may be provided at the computing-based device to enable application software 1510 to be executed on the device. An inference engine 1512 for carrying out inference on a factor graph data structure 1514 is provided. The factor graph data structure 1514 comprises an expertise component 1516 and an assessment component 1518. It optionally comprises a logical component 1520.
The computer executable instructions may be provided using any computer-readable media that is accessible by computing based device 1500. Computer-readable media may include, for example, computer storage media such as memory 1506 and communications media. Computer storage media, such as memory 1506, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transport mechanism. Although the computer storage media (memory 1506) is shown within the computing-based device 1500 it will be appreciated that the storage may be distributed or located remotely and accessed via a network 1528 or other communication link (e.g. using communication interface 1504).
An output is also provided such as an audio and/or video output to a display system integral with or in communication with the computing-based device. The display system may provide a graphical user interface, or other user interface of any suitable type although this is not essential.
The computing-based device 1500 also comprises an input/output controller 1522 arranged to output display information to a display device 1524 which may be separate from or integral to the computing-based device 1500. The display information may provide a graphical user interface. The input/output controller 1522 is also arranged to receive and process input from one or more devices, such as a user input device 1526 (e.g. a mouse or a keyboard). This user input may be used to input user designed management scenarios. In an embodiment the display device 1524 may also act as the user input device 1526 if it is a touch sensitive display device. The input/output controller 1522 may also output data to devices other than the display device, e.g. a locally connected printing device.
The term ‘computer’ is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term ‘computer’ includes PCs, servers, mobile telephones, personal digital assistants and many other devices.
The methods described herein may be performed by software in machine readable form on a tangible storage medium e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory etc and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
This acknowledges that software can be a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.
Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that by utilizing conventional techniques known to those skilled in the art that all, or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to ‘an’ item refers to one or more of those items.
The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.
The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.
It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Although various embodiments of the invention have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.
Number | Date | Country |
---|---|---|
20120150771 A1 | Jun 2012 | US |