INFORMATION EXTRACTING APPARATUS, AND INFORMATION EXTRACTING METHOD

Information

  • Patent Application
  • Publication Number
    20070233465
  • Date Filed
    March 19, 2007
  • Date Published
    October 04, 2007
Abstract
The information extracting apparatus includes an analyzer, an element extracting unit, and a supplementary-information obtaining unit. The analyzer analyzes text in input data. The supplementary-information obtaining unit obtains information that accompanies the data, such as property information. The element extracting unit supplements the analysis result with this accompanying information and extracts, from the text, information on five elements (When, Where, Who, What, and How) together with predicate information.
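The supplementing step described in the abstract can be illustrated with a minimal sketch. It is not the implementation disclosed in the application: the `Extraction` record, the `supplement` function, and the mapping from property names such as `author` and `created` to elements are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass
class Extraction:
    """One 4W1H-plus-predicate record; None marks an element the text did not supply."""
    when: Optional[str] = None
    where: Optional[str] = None
    who: Optional[str] = None
    what: Optional[str] = None
    how: Optional[str] = None
    predicate: Optional[str] = None


# Assumption: which document property may fill which element.
PROPERTY_TO_ELEMENT = {"author": "who", "created": "when"}


def supplement(record: Extraction, properties: Mapping[str, str]) -> Extraction:
    """Return a copy of `record` whose missing elements are filled from `properties`."""
    filled = replace(record)  # shallow copy; the input record is left untouched
    for prop, element in PROPERTY_TO_ELEMENT.items():
        # Only fill gaps; values extracted from the text take precedence.
        if getattr(filled, element) is None and prop in properties:
            setattr(filled, element, properties[prop])
    return filled
```

In this sketch, values extracted from the text always win over property values; that precedence is a design assumption, not something the abstract states.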
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a functional block diagram of an information extracting apparatus according to a first embodiment of the present invention;

FIG. 2 is an example of a description in a knowledge dictionary shown in FIG. 1;

FIG. 3 is an example of 4W1H-plus-predicate information extracted by an element extracting unit shown in FIG. 1;

FIG. 4 is a schematic for explaining an example in which a supplementary-information obtaining unit shown in FIG. 1 supplements the 4W1H-plus-predicate information from attribute information;

FIG. 5 is a schematic for explaining a definition of a document;

FIG. 6 is a schematic for explaining an example in which the supplementary-information obtaining unit extracts information from other parts of text for information supplement;

FIG. 7 is a schematic for explaining an example in which the supplementary-information obtaining unit extracts information from other parts of the text and a document property for information supplement;

FIG. 8 is an output example of the extracted data shown in FIGS. 3, 4, 6, and 7;

FIG. 9 is a flowchart of a 4W1H-plus-predicate information extraction process according to the first embodiment;

FIG. 10 is a flowchart of an analysis process;

FIG. 11 is another flowchart of the 4W1H-plus-predicate information extraction process;

FIG. 12 is a functional block diagram of an information extracting apparatus according to a second embodiment of the present invention;

FIG. 13 is a schematic for explaining conversion examples in which an obtained extraction element is converted into an RDF/XML syntax and an RDF graph by a converter shown in FIG. 12;

FIG. 14 is a functional block diagram of an information extracting apparatus according to a third embodiment of the present invention;

FIG. 15 is a schematic for explaining a document-relationship specifying rule applied to specify an inter-document relationship by a document-relationship specifying unit shown in FIG. 14;

FIG. 16 is another example of a description in the knowledge dictionary shown in FIG. 14;

FIG. 17 is a schematic for explaining extraction of an inter-document relationship in an email document group by the information extracting apparatus shown in FIG. 14;

FIG. 18 is a schematic for explaining extraction of 4W1H-plus-predicate information from a document B shown in FIG. 17;

FIG. 19 is a schematic for explaining extraction of 4W1H-plus-predicate information from documents B and C shown in FIG. 17;

FIG. 20 is a schematic for explaining reconstruction of elements from documents A, B, and C in FIG. 17 by an element reconstructing unit shown in FIG. 14;

FIG. 21 is a flowchart of an information extraction process according to the third embodiment;

FIG. 22 is a flowchart of a document-relationship specifying process;

FIG. 23 is a flowchart of a process in which the element reconstructing unit reconstructs 4W1H-plus-predicate information;

FIG. 24 is a schematic for explaining conversion examples in which 4W1H-plus-predicate information is converted into an RDF syntax and an RDF graph by a converter of an information extracting apparatus according to a fourth embodiment of the present invention;

FIG. 25 is a block diagram of a hardware configuration of the information extracting apparatus according to the embodiments;

FIG. 26 is still another example of a description in the knowledge dictionary;

FIG. 27 is an example of 4W1H-plus-predicate information extracted from an English sentence;

FIG. 28 is an example of a document property;

FIG. 29 is a schematic for explaining an example in which the supplementary-information obtaining unit extracts information from the document property for information supplement; and

FIG. 30 is an output example of the data extracted from Example 1 and Example 2 shown in FIGS. 27 and 29.


Claims
  • 1. An information extracting apparatus comprising: an analyzer that analyzes a syntactic structure of text information contained in first data; and an extracting unit that extracts information on five elements, When, Where, Who, What and How, and a predicate from the text information based on the syntactic structure.
  • 2. The information extracting apparatus according to claim 1, further comprising: a storage unit that stores therein extracted information associated with the text information; and a display unit that displays the extracted information associated with the text information.
  • 3. The information extracting apparatus according to claim 1, further comprising: a dictionary that contains at least one of a part of speech of a word and a combination of parts of speech of words in a clause, relationship information between a destination that the clause modifies and modification applied to the destination, and interpretation rules for determining which of the five elements and the predicate corresponds to the relationship information, wherein the extracting unit extracts the information from the text information by using the dictionary.
  • 4. The information extracting apparatus according to claim 3, wherein the relationship information is related to a range.
  • 5. The information extracting apparatus according to claim 1, further comprising a supplementary-information obtaining unit that obtains attribute information accompanying the first data as supplementary information, wherein the extracting unit supplements extracted information based on the supplementary information.
  • 6. The information extracting apparatus according to claim 1, further comprising a supplementary-information obtaining unit that obtains another text information in the first data as supplementary information, wherein the extracting unit supplements extracted information based on the supplementary information.
  • 7. The information extracting apparatus according to claim 1, further comprising: a supplementary-information obtaining unit that obtains peripheral information and information on five elements and a predicate from a second data as supplementary information; and a relationship specifying unit that specifies a relationship between the first data and the second data; a rearranging unit that rearranges the information on the five elements and the predicate, wherein the extracting unit supplements extracted information based on the supplementary information, and the rearranging unit rearranges the extracted information based on the relationship specified by the relationship specifying unit.
  • 8. The information extracting apparatus according to claim 7, wherein the rearranging unit selects, when pieces of the supplementary information overlap by an amount equal to or greater than a predetermined threshold, one of the pieces of the supplementary information to rearrange the extracted information.
  • 9. The information extracting apparatus according to claim 7, wherein the rearranging unit selects, when pieces of the supplementary information overlap by an amount equal to or greater than a predetermined threshold, one of the pieces of the supplementary information, and selects, when pieces of the information on the five elements and the predicate extracted from the first data and the second data overlap by an amount equal to or greater than a predetermined threshold, one of the pieces of the information to rearrange the extracted information.
  • 10. The information extracting apparatus according to claim 7, wherein the rearranging unit rearranges the extracted information based on the information on the five elements and the predicate extracted from the second data.
  • 11. An information extracting method comprising: analyzing a syntactic structure of text information contained in first data; and extracting information on five elements, When, Where, Who, What and How, and a predicate from the text information based on the syntactic structure.
  • 12. The information extracting method according to claim 11, further comprising: storing extracted information associated with the text information; and displaying the extracted information associated with the text information.
  • 13. The information extracting method according to claim 11, further comprising: storing a dictionary that contains at least one of a part of speech of a word and a combination of parts of speech of words in a clause, relationship information between a destination that the clause modifies and modification applied to the destination, and interpretation rules for determining which of the five elements and the predicate corresponds to the relationship information, wherein the extracting includes extracting the information from the text information by using the dictionary.
  • 14. The information extracting method according to claim 13, wherein the relationship information is related to a range.
  • 15. The information extracting method according to claim 11, further comprising obtaining attribute information accompanying the first data as supplementary information, wherein the extracting includes supplementing extracted information based on the supplementary information.
  • 16. The information extracting method according to claim 11, further comprising obtaining another text information in the first data as supplementary information, wherein the extracting includes supplementing extracted information based on the supplementary information.
  • 17. The information extracting method according to claim 11, further comprising: obtaining peripheral information and information on five elements and a predicate from a second data as supplementary information; and specifying a relationship between the first data and the second data; rearranging the information on the five elements and the predicate, wherein the extracting includes supplementing extracted information based on the supplementary information, and the rearranging includes rearranging the extracted information based on the relationship specified at the specifying.
  • 18. The information extracting method according to claim 17, wherein the rearranging includes selecting, when pieces of the supplementary information overlap by an amount equal to or greater than a predetermined threshold, one of the pieces of the supplementary information to rearrange the extracted information.
  • 19. The information extracting method according to claim 17, wherein the rearranging includes selecting, when pieces of the supplementary information overlap by an amount equal to or greater than a predetermined threshold, one of the pieces of the supplementary information, and selecting, when pieces of the information on the five elements and the predicate extracted from the first data and the second data overlap by an amount equal to or greater than a predetermined threshold, one of the pieces of the information to rearrange the extracted information.
  • 20. The information extracting method according to claim 17, wherein the rearranging includes rearranging the extracted information based on the information on the five
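
The threshold-based selection recited in claims 8 and 18 can be sketched as follows. This is only an illustration under stated assumptions: the claims do not define how overlap is measured, so the word-level Jaccard ratio used here, the function names `overlap` and `select_pieces`, the default threshold of 0.5, and the keep-the-first-piece policy are all hypothetical choices.

```python
from typing import List


def overlap(a: str, b: str) -> float:
    """Word-level Jaccard overlap in [0, 1]; an assumed measure, not one the claims specify."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_pieces(pieces: List[str], threshold: float = 0.5) -> List[str]:
    """Keep one piece from every group whose mutual overlap reaches the threshold."""
    kept: List[str] = []
    for piece in pieces:
        # A piece is dropped when it overlaps an already kept piece by an
        # amount equal to or greater than the threshold.
        if all(overlap(piece, existing) < threshold for existing in kept):
            kept.append(piece)
    return kept
```

Keeping the first piece encountered is the simplest policy; a rule that prefers the longer or more recent piece would fit the same claim language equally well.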
Priority Claims (2)
Number       Date      Country  Kind
2006-077740  Mar 2006  JP       national
2007-038235  Feb 2007  JP       national