Interactive learning-based document annotation

Information

  • Patent Application
  • Publication Number
    20070150801
  • Date Filed
    December 23, 2005
  • Date Published
    June 28, 2007
Abstract
A document annotation system 10 includes a graphical user interface 22 used by an annotator 30 to annotate documents. An active learning component 24 trains an annotation model and proposes annotations to documents based on the annotation model. A request handler 26, 32, 34, 42 conveys annotation requests from the graphical user interface 22 to the active learning component 24, conveys proposed annotations from the active learning component 24 to the graphical user interface 22, and selectably conveys evaluation requests from the graphical user interface 22 to a domain expert 40. During annotation, at least some low probability proposed annotations are presented to the annotator 30 by the graphical user interface 22. The presented low probability proposed annotations enhance training of the annotation model by the active learning component 24.
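The role of low probability proposals can be illustrated with a minimal sketch. It assumes each proposed annotation carries a probability of acceptance, and it chooses which proposals to present depending on whether the goal is to train the model or to annotate quickly. The names Proposal and select_proposals, the mode strings, and the cutoff k are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    element_id: str      # hypothetical identifier of a document element
    label: str           # proposed annotation for that element
    probability: float   # model's estimated probability of acceptance


def select_proposals(proposals, mode, k=3):
    """Return up to k proposals to present to the annotator.

    mode "training":   least confident proposals first, so that the
                       annotator's answers are most informative for training.
    mode "annotation": most confident proposals first, so that the
                       annotator can accept them quickly.
    """
    if mode not in ("training", "annotation"):
        raise ValueError("mode must be 'training' or 'annotation'")
    highest_first = mode == "annotation"
    ranked = sorted(proposals, key=lambda p: p.probability, reverse=highest_first)
    return ranked[:k]


if __name__ == "__main__":
    candidates = [
        Proposal("e1", "title", 0.9),
        Proposal("e2", "author", 0.4),
        Proposal("e3", "date", 0.6),
    ]
    for p in select_proposals(candidates, "training", k=2):
        print(p.element_id, p.label, p.probability)
```

Sorting by probability is only one possible selection rule; the application does not commit to a particular ranking or cutoff.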
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 diagrammatically depicts a document annotation system including an embedded active learning component.

FIG. 2 plots the accuracy of proposed annotations produced by selected learning strategies optionally employed by the embedded active learning component of the document annotation system of FIG. 1.

Claims
  • 1. A document annotation system comprising: a graphical user interface for annotating documents; an active learning component for training an annotation model and for proposing annotations to documents based on the annotation model; and a request handler for conveying annotation requests from the graphical user interface to the active learning component and for conveying proposed annotations from the active learning component to the graphical user interface.
  • 2. The document annotation system as set forth in claim 1, wherein the documents being annotated are one of (i) XML documents, the annotation model being a target XML schema, and (ii) HTML documents.
  • 3. The document annotation system as set forth in claim 1, wherein the active learning component comprises: a probabilistic active learning component that outputs a probability of acceptance associated with each proposed annotation.
  • 4. The document annotation system as set forth in claim 3, wherein the request handler comprises: a mode selector that selects at least between (i) a training mode in which low probability proposed annotations are presented by the graphical user interface and (ii) an annotation mode in which high probability proposed annotations are presented by the graphical user interface.
  • 5. The document annotation system as set forth in claim 4, wherein the mode selector is switchable during a document annotation to switchably effectuate both (i) rapid training of the annotation model through presentation of low probability proposed annotations in the training mode and (ii) rapid annotation through presentation of high probability proposed annotations in the annotation mode.
  • 6. The document annotation system as set forth in claim 4, wherein (i) in the training mode the graphical user interface requires one or more user operations to make an annotation and (ii) in the annotation mode the graphical user interface requires a single user operation to annotate a plurality of elements.
  • 7. The document annotation system as set forth in claim 3, wherein the probabilistic active learning component comprises a probabilistic classifier that probabilistically classifies unannotated document elements respective to classes corresponding to annotations.
  • 8. The document annotation system as set forth in claim 7, wherein the probabilistic classifier is selected from a group consisting of: a k-nearest neighbor classifier, a maximum entropy classifier, and an assembly method classifier.
  • 9. The document annotation system as set forth in claim 3, wherein at least some low probability proposed annotations are presented by the graphical user interface, the presented low probability proposed annotations enhancing training of the annotation model by the active learning component.
  • 10. The document annotation system as set forth in claim 3, wherein the request handler further conveys learning requests from the graphical user interface to the active learning component, each learning request using previously annotated documents or document portions for training of the annotation model.
  • 11. The document annotation system as set forth in claim 1, wherein the request handler comprises: an asynchronous request handler that (i) buffers annotation requests conveyed from the graphical user interface to the active learning component and (ii) buffers proposed annotations to documents conveyed from the active learning component to the graphical user interface.
  • 12. The document annotation system as set forth in claim 1, wherein the request handler further comprises: a domain expert request handler for conveying evaluation requests from the graphical user interface to a human domain expert and for conveying responses from the human domain expert to the graphical user interface.
  • 13. The document annotation system as set forth in claim 12, wherein the evaluation request conveyed by the domain expert request handler includes (i) at least one proposed annotation to a document generated by the active learning component and (ii) the document or a link to the document.
  • 14. The document annotation system as set forth in claim 13, wherein the domain expert request handler comprises: an automated email message generator that generates an email addressed to the domain expert and having content including at least (i) the at least one proposed annotation and (ii) the document or the link to the document.
  • 15. A document annotation method comprising: (i) annotating initially unannotated documents, the annotating including applying an active learning component to propose annotations based on an annotation model and accepting or rejecting the proposed annotations via a graphical user interface; (ii) training the annotation model by applying the active learning component to train or update training of the annotation model based on previous annotations generated by the annotating (i); and (iii) alternating between the annotating (i) and the training (ii) to concurrently train the annotation model and annotate the initially unannotated documents.
  • 16. The document annotation method as set forth in claim 15, wherein the annotating (i) further comprises: for a selected proposed annotation, constructing an evaluation request including at least the selected proposed annotation and the document to which the selected proposed annotation pertains or a link to said document; communicating the evaluation request to a domain expert having expertise in a subject matter domain to which said document pertains; and accepting or rejecting the selected proposed annotation based on a response of the domain expert to the communicated evaluation request.
  • 17. The document annotation method as set forth in claim 15, wherein the annotating (i) further comprises: selectively biasing proposed annotations toward having a low probability of acceptance to selectively enhance the training (ii).
  • 18. The document annotation method as set forth in claim 15, wherein the applying of the active learning component to propose annotations in the annotating (i) comprises: assigning probabilistic classifications to unannotated document elements, the classifications corresponding to the proposed annotations.
  • 19. The document annotation method as set forth in claim 15, wherein the active learning component and the graphical user interface operate substantially asynchronously respective to one another, the method further comprising: buffering requests conveyed from the graphical user interface to the active learning component, the applying operations of the annotating (i) and training (ii) being performed by the active learning component responsive to buffered requests; and buffering proposed annotations conveyed from the active learning component to the graphical user interface, the accepting or rejecting operation of the annotating (i) being performed responsive to buffered proposed annotations.
  • 20. A document annotation system comprising: a graphical user interface for accepting or rejecting proposed document annotations; an active learning component for training an annotation model and for generating proposed document annotations based on the annotation model; and a request handler for (i) constructing evaluation requests including at least proposed document annotations and corresponding documents or links to corresponding documents, and (ii) conveying said evaluation requests to a human domain expert via an automated messaging pathway.
  • 21. The document annotation system as set forth in claim 20, wherein the automated messaging pathway includes an email system, and the request handler further comprises: an automated email message generator for generating emails addressed to the human domain expert having as content the constructed evaluation requests.
  • 22. The document annotation system as set forth in claim 20, wherein the constructed evaluation requests include rendered views in which the proposed document annotations are embedded in the corresponding documents.
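
The buffering recited in claims 11 and 19 and the emailed evaluation request recited in claims 14 and 21 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation described in the application: the class BufferedRequestHandler, the function build_evaluation_email, the message wording, and the example addresses are all hypothetical. Only the Python standard library (queue and email.message) is used, and no message is actually sent.

```python
import queue
from email.message import EmailMessage


class BufferedRequestHandler:
    """Two FIFO buffers between a GUI and a learner (hypothetical design)."""

    def __init__(self):
        self.requests = queue.Queue()   # GUI -> active learning component
        self.proposals = queue.Queue()  # active learning component -> GUI

    def submit_request(self, request):
        """Called by the GUI side; returns immediately after buffering."""
        self.requests.put(request)

    def process_pending(self, learner):
        """Drain buffered requests and buffer whatever the learner proposes.

        `learner` is any callable that maps a request to an iterable of
        proposed annotations. Returns the number of requests handled.
        """
        handled = 0
        while True:
            try:
                request = self.requests.get_nowait()
            except queue.Empty:
                return handled
            for proposal in learner(request):
                self.proposals.put(proposal)
            handled += 1

    def next_proposal(self):
        """Return the oldest buffered proposal, or None if none is waiting."""
        try:
            return self.proposals.get_nowait()
        except queue.Empty:
            return None


def build_evaluation_email(expert_address, proposal, document_link):
    """Build (but do not send) an email asking a domain expert to evaluate
    one proposed annotation; the content includes the proposal and a link."""
    msg = EmailMessage()
    msg["To"] = expert_address
    msg["Subject"] = "Evaluation request for a proposed annotation"
    msg.set_content(
        f"Proposed annotation: {proposal}\n"
        f"Document: {document_link}\n"
        "Please reply to accept or reject this annotation.\n"
    )
    return msg


if __name__ == "__main__":
    handler = BufferedRequestHandler()
    handler.submit_request("doc1")
    handler.process_pending(lambda req: [req + ":title"])
    print(handler.next_proposal())
    print(build_evaluation_email("expert@example.org", "title",
                                 "http://example.org/doc1"))
```

In this sketch the two queues decouple the pace of the graphical user interface from that of the learner; in a real deployment the two sides would typically run in separate threads or processes, which queue.Queue supports because it is thread-safe.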