Method and system to index captioned objects in published literature for information discovery tasks

Information

  • Patent Application
  • Publication Number
    20070219970
  • Date Filed
    March 13, 2007
  • Date Published
    September 20, 2007
Abstract
The present invention relates to the identification, extraction, linking, storage and provisioning of data that constitute the captioned components of published or “print ready” literature for computerized information discovery activities including search, browse and data mining. These components, or objects, include the tabular presentation of data (“tables”) and graphics such as “figures”, “images” and “illustrations” typically used to supplement the textual narrative of the publication.
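As a minimal illustration of the identification, extraction and linking steps described in the abstract, the sketch below scans plain document text for caption lines such as "Table 1." or "Figure 2:". It builds one record per captioned object, carrying the object's own identifier and the identifier of the source document. The caption pattern, the `ObjectRecord` fields and the identifier format are hypothetical assumptions for illustration only. The extraction rule actually shown in FIG. 5 is not reproduced in this text.

```python
import re
from dataclasses import dataclass

# Hypothetical caption rule (not the rule of FIG. 5): a line beginning with
# "Table", "Figure", "Fig." or "FIG.", then a number, an optional "." or ":",
# then the caption text up to the end of that line.
CAPTION_RE = re.compile(
    r"^(?P<kind>Table|Figure|Fig\.|FIG\.)[ \t]+(?P<num>\d+[A-Za-z]?)[.:]?[ \t]+(?P<caption>.+)$",
    re.MULTILINE,
)


@dataclass
class ObjectRecord:
    object_id: str    # first identifier: the captioned object itself
    document_id: str  # second identifier: the document that contains it
    kind: str         # "table" or "figure"
    label: str        # e.g. "Table 1"
    caption: str      # extracted caption text


def extract_objects(document_id: str, text: str) -> list[ObjectRecord]:
    """Build one record per caption line, linked to its source document."""
    records = []
    for i, m in enumerate(CAPTION_RE.finditer(text), start=1):
        records.append(ObjectRecord(
            object_id=f"{document_id}-obj{i}",  # hypothetical id scheme
            document_id=document_id,
            kind="table" if m["kind"] == "Table" else "figure",
            label=f"{m['kind']} {m['num']}",
            caption=m["caption"].strip(),
        ))
    return records


if __name__ == "__main__":
    sample = ("Results\n"
              "Table 1. Yield of compound A at three temperatures\n"
              "Figure 2: Reaction rate versus pH\n")
    for rec in extract_objects("doc-42", sample):
        print(rec.object_id, rec.document_id, rec.label, rec.caption, sep=" | ")
```

On the sample text this produces two records: `doc-42-obj1` for Table 1 and `doc-42-obj2` for Figure 2. Both are linked to `doc-42`. A real system would also need to handle captions that wrap across lines or appear inside images, which a single-line pattern like this cannot detect.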
Description

BRIEF DESCRIPTION OF THE DRAWINGS

Various aspects of a system for indexing and locating captioned objects are illustrated by way of example, and not by way of limitation, in the accompanying drawings, wherein:

FIGS. 1A and 1B illustrate an exemplary document having a captioned object along with a detailed view of the captioned object;

FIGS. 2A and 2B illustrate another exemplary document having a captioned object along with a detailed view of that captioned object;

FIG. 2C illustrates an exemplary section of a document referencing a captioned object;

FIG. 3 depicts an exemplary computer system on which an embodiment of the present invention may be implemented;

FIG. 4 depicts a flowchart of an exemplary algorithm of indexing captioned objects according to the principles of the present invention;

FIG. 5 depicts an exemplary extraction rule;

FIG. 6 depicts an exemplary system for extracting, indexing, searching and retrieving captioned objects in accordance with the principles of the present invention;

FIG. 7 illustrates an exemplary extracted object as XML;

FIG. 8 illustrates an exemplary editorial screen for extracting information about captioned objects in accordance with the principles of the present invention;

FIG. 9 graphically depicts an association between related objects and abstracts;

FIG. 10 provides a table that illustrates relationships between objects, attributes, and abstracts that are identifiable according to the principles of the present invention;

FIGS. 11A-11E depict exemplary interface screen shots of a search application involving captioned objects;

FIGS. 12A and 12B depict exemplary interface screen shots of another search application;

FIGS. 13A-13I depict exemplary captioned objects that may be used in different embodiments of the present invention to provide advantages over merely textual abstracting and indexing; and

FIGS. 14A-14E depict exemplary interface screen shots of another search application involving captioned objects, including an enhanced abstract.
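FIG. 7 presents an extracted object as XML, but the figure's markup is not reproduced in this text. The sketch below shows one way such a record could be serialized with Python's standard `xml.etree.ElementTree`: an object record carrying its own identifier, its link to the containing document, and its index descriptors. It assumes a hypothetical schema. The element and attribute names `captionedObject`, `objectId`, `type`, `documentId`, `caption`, `descriptors` and `descriptor` are illustrative inventions, not the format shown in the figure.

```python
import xml.etree.ElementTree as ET


def object_to_xml(object_id: str, document_id: str, kind: str,
                  caption: str, descriptors: list[str]) -> str:
    """Serialize one captioned-object record as XML (hypothetical schema)."""
    root = ET.Element("captionedObject", {"objectId": object_id, "type": kind})
    # Link the object back to the document that contains it.
    ET.SubElement(root, "documentId").text = document_id
    ET.SubElement(root, "caption").text = caption
    # Index descriptors assigned to the object, one element per term.
    terms = ET.SubElement(root, "descriptors")
    for term in descriptors:
        ET.SubElement(terms, "descriptor").text = term
    return ET.tostring(root, encoding="unicode")


xml_text = object_to_xml("doc-42-obj1", "doc-42", "table",
                         "Yield of compound A at three temperatures",
                         ["yield", "temperature"])
print(xml_text)
```

Parsing the output back with `ET.fromstring` recovers the same fields. This is a convenient check that a record survives storage and retrieval unchanged.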


Claims
  • 1. A method for identifying information comprising the steps of: extracting respective data from one or more objects associated with one of a plurality of documents; aggregating the extracted data within a database; determining if a relationship exists between any of the extracted data; and based on the respective extracted data associated with a particular object, identifying others of the plurality of objects related to the particular object.
  • 2. The method of claim 1, further comprising the step of: based on the related objects, identifying respective documents associated therewith.
  • 3. A method for processing a document, the document including at least one object, the method comprising the steps of: generating descriptive data related to the at least one object; storing the descriptive data, a first identifier of the at least one object, and a second identifier of the document; and creating within an automated computer system a link that associates the descriptive data, the first identifier, and the second identifier.
  • 4. The method of claim 3, wherein the link is searchable by a user to identify one or both of the at least one object and the document.
  • 5. A system for identifying information comprising: a programmable computer configured to execute software that provides: extracting respective data from one or more objects associated with one of a plurality of documents; aggregating the extracted data within a database; determining if a relationship exists between any of the extracted data; and based on the respective extracted data associated with a particular object, identifying others of the plurality of objects related to the particular object.
  • 6. An automated method of processing at least one document containing at least one object, comprising the steps of: extracting data from each of the objects in each of the at least one document into an object record; linking the extracted information in each object record with its associated one of the at least one document; and storing each linked object record in a computer readable medium.
  • 7. An automated method as claimed in claim 6, further comprising the step of assigning at least one descriptor to each linked object record.
  • 8. An automated method as claimed in claim 7, further comprising the step of determining whether any one of the at least one object is related to any other of the at least one object.
  • 9. An automated method as claimed in claim 8, further comprising the step of identifying a first object that is related to a second object.
  • 10. An automated method as claimed in claim 9, further comprising the step of relationally linking the first object and the second object.
  • 11. A method as claimed in claim 10, wherein the storing step includes storing the relational linkage information linking the first object and the second object.
  • 12. A method as claimed in claim 11, further comprising a second storing step to store the relational linkage information linking the first object and the second object.
  • 13. A method for processing a plurality of documents, the plurality of documents containing a plurality of objects, the method comprising: extracting data from each of the plurality of objects into an object data record; providing a first identifier to each of the plurality of objects; providing each of the plurality of documents with a second identifier; linking each object data record with its associated one of the plurality of documents; assigning at least one index descriptor to each of the object data records; storing each object data record, each first identifier, each second identifier, and each at least one index descriptor in a computer readable medium; and creating a link in the computer readable medium between each object data record, its first identifier, and its associated second identifier.
  • 14. A method as claimed in claim 13, further comprising the step of verifying each object data record before storing each object data record in the computer readable medium.
  • 15. A method as claimed in claim 14, wherein the creating step further includes the steps of: determining whether relationships exist between any of the plurality of objects; identifying relationships among the plurality of objects; and creating in the computer readable medium a relational link between related ones of the plurality of objects.
  • 16. A computer system for processing information from a plurality of objects contained in a plurality of documents, including a computer that extracts data from a first of the plurality of objects into a first object record; links the extracted information in the first object record with its associated one of the plurality of documents; and stores the linked first object record in a computer readable medium.
  • 17. A computer system as claimed in claim 16, further including a network interface for communicating the extracted information to and from an external user of the computer system.
  • 18. A system for processing information from a plurality of objects contained in a plurality of documents, the system comprising: an objects content processing system including a processor to extract data from the plurality of objects; an image repository system including computer readable media that stores object images and images of the plurality of documents; and an index that stores data extracted from the plurality of objects, associations between the plurality of objects, and index descriptors assigned to each of the plurality of objects.
  • 19. A system as claimed in claim 18, further comprising a first interface that receives queries to search the index for extracted data responsive to each of the queries.
  • 20. A system as claimed in claim 19, further comprising a second interface that displays objects and documents in response to a request.
  • 21. A system as claimed in claim 18, wherein the objects content processing system further includes a computer program that links each of the plurality of objects with a respective one of the plurality of documents.
  • 22. A system as claimed in claim 21, wherein the objects content processing system further includes a user interface for accessing data extracted from the plurality of objects and links between each of the plurality of objects and a respective one of the plurality of documents.
  • 23. A system as claimed in claim 21, wherein the user interface is adapted to allow for indexing the plurality of objects based on the data extracted from the plurality of objects.
  • 24. A system as claimed in claim 21, wherein the objects content processing system further includes a computer program that indexes the plurality of objects based on data extracted from the plurality of objects.
  • 25. A system as claimed in claim 21, further comprising an output generator that outputs information from the objects content processing system for storing information regarding the linked objects and the respective one of the plurality of documents in the index.
  • 26. A system as claimed in claim 25, wherein the output generator outputs images from the objects content processing system for storing the linked objects and the respective one of the plurality of documents in the image repository.
  • 27. A computer implemented method of compiling information from a plurality of documents, some of the plurality of documents containing at least one object, comprising the steps of: extracting data from the objects; and storing the extracted data in a computer searchable medium.
  • 28. A computer implemented method as claimed in claim 27, wherein the extracting step includes the step of providing an object identifier to each object.
  • 29. A computer implemented method as claimed in claim 28, wherein the extracted data includes textual information extracted from the objects.
  • 30. A computer implemented method as claimed in claim 29, wherein the textual information includes caption data or axes label information.
  • 31. A computer implemented method as claimed in claim 28, further comprising the step of assigning at least one descriptor to each linked object record.
  • 32. A computer implemented method as claimed in claim 31, further comprising the step of linking the extracted data from each object with its associated one of the plurality of documents.
  • 33. A computer implemented method as claimed in claim 28, further comprising the step of providing a full text record identifier to each of the plurality of documents.
  • 34. A computer implemented method as claimed in claim 33 further comprising the step of providing an abstract identifier to an abstract of each of the plurality of documents.
  • 35. A computer implemented method as claimed in claim 34 further comprising the step of linking the extracted data from each object with its associated one of the plurality of documents.
  • 36. A computer implemented method as claimed in claim 35 wherein the linking step includes linking each object identifier with the abstract identifier of the abstract of the associated one of the plurality of documents.
  • 37. A computer implemented method as claimed in claim 35 wherein the linking step includes linking each object identifier with the full text record identifier of the associated one of the plurality of documents.
  • 38. A computer implemented method as claimed in claim 36, further comprising storing the object and associated linkages in a computer readable medium after the linking step.
  • 39. A computer implemented method as claimed in claim 37 further comprising storing the object and associated linkages in a computer readable medium after the linking step.
  • 40. A computer implemented method as claimed in claim 36 further comprising verifying the linking step.
  • 41. A computer implemented method as claimed in claim 37 further comprising verifying the linking step.
  • 42. A computer implemented method as claimed in claim 27 further comprising the step of verifying the extracted object data before the storing step.
  • 43. A computer implemented method as claimed in claim 33 further comprising the step of determining if there are any relationships amongst extracted object data.
  • 44. A computer implemented method as claimed in claim 43 further comprising the step of identifying related objects.
  • 45. A computer implemented method as claimed in claim 44 further comprising storing the identifications of related objects.
  • 46. A computer implemented method as claimed in claim 44 further comprising identifying related ones of the plurality of documents based on the results of the step of identifying related objects.
  • 47. A system for collecting data from a plurality of objects contained in a plurality of documents, the system comprising: a computer; means in said computer for extracting data from the plurality of objects; means in said computer for linking the data extracted from each of the plurality of objects with an associated one of the plurality of documents; and means in said computer to store an object record, which includes each of the plurality of objects and related linking information.
  • 48. A method of identifying information in a database responsive to a query from a user, the database containing information regarding a plurality of documents, at least some of the plurality of documents containing objects, containing extracted information regarding the objects, and containing assigned index descriptors relating to the information contained in the objects, the method comprising: receiving the query from the user; accessing the database in response to the query; determining whether any objects are responsive to the query; and transmitting information regarding the responsive objects to the user.
  • 49. The method of claim 48, further comprising the steps of determining whether any of the plurality of documents is responsive to the query and transmitting information regarding the responsive one or more of a plurality of documents to the user.
  • 50. The method of claim 49, further comprising the step of transmitting the responsive one or more of the plurality of documents in full to the user in response to a request.
  • 51. The method of claim 48, wherein the step of transmitting information regarding the responsive objects to the user includes providing the user with a link to each object.
  • 52. The method of claim 49, wherein the steps of transmitting information regarding the responsive objects to the user and transmitting information regarding the responsive one or more of a plurality of documents to the user are performed simultaneously.
  • 53. The method of claim 50, wherein the steps of transmitting information regarding the responsive objects to the user and transmitting information regarding the responsive one or more of a plurality of documents to the user are performed by displaying summary information regarding the responsive objects and the responsive one or more of a plurality of documents to a user on a single display page.
  • 54. The method of claim 50, wherein the steps of transmitting information regarding the responsive objects to the user and transmitting information regarding the responsive one or more of a plurality of documents to the user are performed by displaying summary information regarding the responsive objects and the responsive one or more of a plurality of documents to a user on multiple display pages.
  • 55. The method of claim 49 further comprising the step of determining whether a relationship exists between at least one of the transmitted responsive objects and additional objects in the database based on a request from the user.
  • 56. The method of claim 55 further comprising the step of transmitting information regarding objects related to the at least one of the responsive objects to the user.
  • 57. The method of claim 49 wherein the transmitting step includes displaying an objects search results page.
  • 58. The method of claim 50, wherein, based on results of the step of determining whether any objects are responsive to the query, further determining whether any of the plurality of documents is responsive to the query.
  • 59. The method of claim 50, wherein the transmitting step includes providing information from the database linked to the responsive objects.
  • 60. The method of claim 50, wherein the transmitting step includes providing the user with at least one link to access additional information related to each responsive object.
  • 61. A method of identifying information in a database responsive to a query from a user, the database containing information regarding a plurality of documents, at least some of the plurality of documents containing objects, the method comprising: receiving the query from the user; accessing the database in response to the query; determining whether any objects are responsive to the query; and transmitting information regarding those of the plurality of documents containing responsive objects to the user.
  • 62. A method of identifying information in a database responsive to a query from a user, the database containing information regarding a plurality of documents and data extracted from a plurality of objects contained in the plurality of documents, the method comprising: receiving the query from the user; accessing the database in response to the query; determining whether any of the data extracted from the plurality of objects is responsive to the query; and transmitting information regarding the responsive ones of the plurality of objects to the user.
  • 63. A system for providing information in response to a query from a user, the system comprising: at least one computer containing data associated with a plurality of documents, some of the plurality of documents containing objects, and containing data associated with each of the objects and data extracted from each of the objects; a data transmission line coupled to the computer for receiving a query from the user; and a computer program implemented in at least one of the at least one computer for accessing the data extracted from each of the objects and determining whether any of the objects is responsive to the query from the user.
  • 64. A system for providing information as claimed in claim 63, wherein the data associated with each of the objects includes an object identifier.
  • 65. A system for providing information as claimed in claim 64, wherein the data associated with each of the objects is linked to the data of the one of the plurality of documents in which the object is contained.
  • 66. A system for providing information as claimed in claim 63, wherein the data extracted from each of the objects includes textual information contained in each of the objects.
  • 67. A system for providing information as claimed in claim 66, wherein the textual information includes captions.
  • 68. A system for providing information as claimed in claim 66, wherein the textual information includes labels of axes contained in objects.
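Claims 48 to 62 describe answering a user query against a database of extracted object data and returning the responsive objects and the documents that contain them. Claims 1, 43 to 46 and 55 to 56 describe identifying objects related to a given object. The following is a minimal in-memory sketch of those steps. The `ObjectIndex` class is a hypothetical construct, not the patented system. It matches an object only if every query word appears in its caption or descriptors, and it treats two objects as related if they share an index descriptor. Both rules are illustrative simplifications, because the claims do not define a matching or relatedness criterion.

```python
from dataclasses import dataclass, field
from collections import defaultdict


@dataclass
class ObjectEntry:
    object_id: str    # identifier of the captioned object
    document_id: str  # identifier of the document containing it
    caption: str      # extracted caption text
    descriptors: set[str] = field(default_factory=set)  # assigned index descriptors


class ObjectIndex:
    """Toy in-memory index over extracted object data (hypothetical design)."""

    def __init__(self) -> None:
        self.objects: dict[str, ObjectEntry] = {}
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> object ids

    def add(self, entry: ObjectEntry) -> None:
        entry.descriptors = {d.lower() for d in entry.descriptors}
        self.objects[entry.object_id] = entry
        # Caption words and assigned descriptors are indexed as the same kind of term.
        for term in set(entry.caption.lower().split()) | entry.descriptors:
            self.postings[term].add(entry.object_id)

    def search(self, query: str) -> list[str]:
        """Ids of objects whose indexed terms include every word of the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)

    def documents_for(self, object_ids: list[str]) -> list[str]:
        """Identifiers of the documents that contain the given objects."""
        return sorted({self.objects[o].document_id for o in object_ids})

    def related(self, object_id: str) -> list[str]:
        """Other objects sharing at least one index descriptor with object_id."""
        mine = self.objects[object_id].descriptors
        return sorted(
            other.object_id
            for other in self.objects.values()
            if other.object_id != object_id and mine & other.descriptors
        )


index = ObjectIndex()
index.add(ObjectEntry("d1-t1", "d1", "Yield of compound A versus temperature", {"yield", "temperature"}))
index.add(ObjectEntry("d2-f1", "d2", "Reaction rate versus pH", {"kinetics", "ph"}))
index.add(ObjectEntry("d3-f2", "d3", "Yield under catalyst B", {"yield", "catalysis"}))

hits = index.search("yield")        # ['d1-t1', 'd3-f2']
docs = index.documents_for(hits)    # ['d1', 'd3']
rel = index.related("d1-t1")        # ['d3-f2']
print(hits, docs, rel)
```

The sketch keeps object records and their document links side by side, so a single query can return objects, their containing documents, and related objects together. This loosely mirrors the combined results contemplated in claims 52 to 54. A production system would use a persistent store and a proper text analyzer instead of whitespace splitting.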
Provisional Applications (1)
  • Number: 60783459; Date: Mar 2006; Country: US