This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2019-037285 filed Mar. 1, 2019.
The present invention relates to an information processing apparatus and a non-transitory computer readable medium storing a program.
In order to read, for example, a professional book that requires professional knowledge, a user has to have certain knowledge, and other knowledge, while not indispensable, makes the content of the professional book easier to understand. In many cases, however, such knowledge is the personal knowledge and know-how of professionals. In recent years, so that the personal knowledge and the like of professionals may be used effectively, information in which concepts representing events, relationships, and the like related to knowledge are related to each other in a hierarchical structure has come to be stored in databases. For example, databases based on the concept of a knowledge graph have been developed in recent years.
JP2017-182457A and JP2018-005690A are examples of the related art.
In order to use information based on knowledge stored in a database, a user has to know the location where the information is stored in the database and the way of extracting the information.
Aspects of non-limiting embodiments of the present disclosure relate to an information processing apparatus and a non-transitory computer readable medium storing a program extracting information matching an extraction condition designated by a user and presenting the information to the user even in a case where the user does not know a storage location and an extraction method of information related to a document.
Aspects of certain non-limiting embodiments of the present disclosure overcome the above disadvantages and/or other disadvantages not described above. However, aspects of the non-limiting embodiments are not required to overcome the disadvantages described above, and aspects of the non-limiting embodiments of the present disclosure may not overcome any of the disadvantages described above.
According to an aspect of the present disclosure, there is provided an information processing apparatus including a segment obtaining section that obtains a segment described in a document designated by a user, an extraction condition obtaining section that obtains an extraction condition for extracting information including a concept related to the segment as knowledge information from a concept structure information storage section storing concept structure information in which concepts representing events and relationships related to knowledge are related to each other in a hierarchical structure, a specifying section that specifies a storage location of the knowledge information in the concept structure information storage section and an extraction method for the concept included in the knowledge information from a designated content of the extraction condition, an extraction section that extracts the knowledge information in accordance with the specified extraction method from the storage location specified by the specifying section, and a presentation section that presents the knowledge information to the user.
Exemplary embodiment(s) of the present invention will be described in detail based on the following figures, wherein:
Hereinafter, an exemplary embodiment of the present invention will be described based on the drawings.
The information processing apparatus 10 in the present exemplary embodiment includes a document-related KG generation processing unit 1, a preprocessing unit 2, a knowledge graph (KG) 3, and a category dictionary 4. Constituents not used in the description of the present exemplary embodiment are not illustrated in
The knowledge graph 3 is a database storing concept structure information in which concepts representing events and relationships related to knowledge are related to each other in a hierarchical structure. The knowledge graph 3 generally stores various knowledge bases. The “knowledge base” is a database in which knowledge is described based on a specific representation format. The knowledge base corresponds to DB1, DB2, and the like included in the knowledge graph 3 illustrated in
In the present exemplary embodiment, knowledge information that is formed by extracting concepts matching user information (user information corresponds to an extraction condition for the knowledge information) designated by the user and a structural relationship between the extracted concepts from the knowledge graph 3 is presented to the user. The knowledge information presented to the user is formed by partial extraction from the knowledge base included in the knowledge graph 3 and thus, is also a knowledge graph. In
The document-related KG generation processing unit 1 executes a basic process for presenting the knowledge graph (that is, the knowledge information) customized for the user by extracting information from the knowledge graph 3, more specifically, by obtaining a part of information defined in the knowledge base based on a segment included in a document designated by the user. The preprocessing unit 2 provides additional information in the generation of the knowledge information by the document-related KG generation processing unit 1. The category dictionary 4 stores a type of category indicating the industry, the field, and the like and typically used by the user, and information (for example, a material name) related to the category.
The document-related KG generation processing unit 1 includes a single word extraction unit 11, a category label assigning unit 12, a supporting knowledge extraction unit 13, a KG extraction method selection unit 14, a KG extraction processing unit 15, a presentation processing unit 16, a use case database (DB) 31, a supporting knowledge case database (DB) 32, a professional know-how database (DB) 33, and a KG extraction method database (DB) 34.
The single word extraction unit 11 functions as a segment obtaining section and obtains a single word described in the document designated by the user. The “segment” means a word or a phrase. Not only a word (having the same meaning as the “single word”) but also a phrase may be obtained by extraction from the document. In the present exemplary embodiment, a case of extracting the single word will be illustratively described. The category label assigning unit 12 functions as a category linking section and, by referring to the category dictionary 4, links each single word obtained by the single word extraction unit 11 to the category to which the single word belongs. The supporting knowledge extraction unit 13 functions as an extraction condition obtaining section and obtains the user information input and designated by the user. The user information in the present exemplary embodiment corresponds to the extraction condition for extracting, from the knowledge graph 3 as the knowledge information, information including a concept related to the single word extracted from the document.
The KG extraction method selection unit 14 functions as a specifying section and specifies a storage location of the knowledge information in the knowledge graph 3 and an extraction method (in a strict sense, an extraction method for concepts included in the knowledge information) for the knowledge information from the designated content of the extraction condition. The KG extraction processing unit 15 functions as an extraction section and extracts the knowledge information in accordance with the specified extraction method from the storage location specified by the KG extraction method selection unit 14. The presentation processing unit 16 presents the knowledge information extracted by the KG extraction processing unit 15 to the user. As will be described in detail, the knowledge information may be presented in a graph format or a sentence format in the present exemplary embodiment.
The content of data registered in each of the databases 31 to 34 will be described along with a description of processes.
The preprocessing unit 2 includes a single word extraction unit 21 and a category label assigning unit 22. The single word extraction unit 21 and the category label assigning unit 22 have the same processing functions as the single word extraction unit 11 and the category label assigning unit 12 of the document-related KG generation processing unit 1.
The category dictionary 4 stores category information in which segments and categories are associated with each other in advance.
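The segment obtaining and category linking described above may be sketched as follows. This is a minimal illustration only, assuming that single words are matched against a set of known terms and that the category dictionary is a simple mapping; the dictionary entries, the sample sentence, and the function names `extract_single_words` and `assign_category_labels` are hypothetical and do not appear in the embodiment.

```python
import re

# Hypothetical category dictionary: segment -> category (illustrative entries only).
CATEGORY_DICTIONARY = {
    "sodium": "alkali metal",
    "water": "liquid",
    "steel": "metal",
}


def extract_single_words(text, known_terms):
    """Return the known terms that occur in the text, in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z]+", text.lower()):
        if token in known_terms and token not in found:
            found.append(token)
    return found


def assign_category_labels(words, dictionary):
    """Link each single word to its category; words without an entry get None."""
    return {word: dictionary.get(word) for word in words}


# Hypothetical target document text.
document = "Sodium reacts violently with water. Keep sodium away from water."
words = extract_single_words(document, set(CATEGORY_DICTIONARY))
labels = assign_category_labels(words, CATEGORY_DICTIONARY)
```

In this sketch, `labels` plays the role of a single word set with category labels; an actual implementation might also extract phrases rather than only single words.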
Each of the constituents 11 to 16, 21, and 22 in the information processing apparatus 10 is implemented by a cooperative operation between a computer forming the information processing apparatus 10 and a program operated by a CPU mounted in the computer. In addition, each of the storage sections 3, 4, and 31 to 34 is implemented in an HDD mounted in the information processing apparatus 10. Alternatively, a RAM or an external storage section may be used through a network.
In addition, the program used in the present exemplary embodiment may be provided by a communication section and may also be provided by storing the program in a computer readable recording medium such as a CD-ROM or a USB memory. The program provided from the communication section or the recording medium is installed on the computer. The CPU of the computer implements various processes by sequentially executing the program.
For example, when the user of the information processing apparatus 10 in the present exemplary embodiment reads a professional book, the user may not understand the content of the professional book due to insufficient professional knowledge in the professional field. Even in a case where the user desires to obtain knowledge necessary for understanding, the necessary knowledge may be professional knowledge and, in general, the know-how of a professional. Even in a case where the knowledge is stored as information in a database such as the knowledge graph 3 of the present exemplary embodiment, the location where necessary information is stored in the database and the way of extracting the information may not be known without knowledge of how to handle the database.
Therefore, in the present exemplary embodiment, knowledge such as the know-how of the professional is accumulated in the knowledge graph 3 and may be used by the user. Information necessary for the user may be presented as the knowledge information even in a case where the user does not know the location where the information is stored in the knowledge graph 3 or the way of extracting the information from the storage location.
Furthermore, in the present exemplary embodiment, the information necessary for the user is not presented as a uniform content. The information necessary for the user may be presented as a content corresponding to the purpose of the user and a level matching a knowledge level specified from the user information designated by the user.
Hereinafter, a process of presenting the knowledge graph (that is, the knowledge information) necessary for the user in the present exemplary embodiment will be described using the flowchart illustrated in
In a case where the user inputs a document (professional book illustrated above; corresponds to a target document illustrated in
In a case where the document is obtained, the single word extraction unit 11 extracts single words from the obtained document (step S120). It is assumed that single words related to material for which information is registered in the knowledge graph 3 are extracted. A summary of a process of extracting the single words is illustrated in
The single word extraction unit 11 extracts texts indicating material by referring to the knowledge graph 3 and extracts single words matching the extracted texts. The single word extraction unit 11 further extracts a document name of the document in which the extracted single words (in
While the single words are extracted from the document in the present exemplary embodiment, the single words may be extracted using sentences included in the range of a part of the document such as a range designated by the user as a target and not using the whole document as a target. For example, a text area for copying sentences is disposed in a separate window, and a part of the sentences copied in the text area is used as a target of the single word extraction. In addition, while the single word extraction unit 11 automatically extracts corresponding single words, the user may designate the single words.
In a case where the single word extraction unit 11 extracts the single words, next, the category label assigning unit 12 assigns a category label to each single word (step S130).
The knowledge graph 3 includes a domain ontology describing concepts related to individual target areas (that is, categories). The category is linked to the single word using the domain ontology. Information in which the category label is assigned to each single word in the above manner corresponds to a target document single word set+category label 52 illustrated in
In a case where the user inputs the document, the supporting knowledge extraction unit 13 then causes the user to designate the user information. The designated user information is information including concepts related to the single words extracted by the single word extraction unit 11, that is, the extraction condition for the knowledge information to be presented to the user. In the present exemplary embodiment, a case where the user designates items of “purpose”, “required information quality”, and “category” as the extraction condition for the knowledge information is considered.
The supporting knowledge extraction unit 13 displays concepts (“risk check” and the like in “purpose”, “all” and the like in “required information quality”, and “steel” and the like in “category”) related to each item set in the use case database 31 on a screen as selection candidates. The user selects item values matching the purpose and the like of the user for each concept from the displayed item values. Thus, the user information is said to be information indicating a relationship between the user and the target document. The supporting knowledge extraction unit 13 obtains the user information by causing the user to select the item values (step S140). The user information designated by the user is the extraction condition for the knowledge information. In a strict sense, the user information is the extraction condition for concepts included in the knowledge information. Thus, in the following description, the obtained user information will be referred to as the “extraction condition for the knowledge information” or simply the “extraction condition”.
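The way the user information becomes an extraction condition may be sketched as follows. This is an illustrative assumption, not the embodiment's data model: the three items are taken from the description, but the candidate values other than “risk check”, “all”, and “steel”, as well as the names `USE_CASE_CANDIDATES`, `ExtractionCondition`, and `obtain_extraction_condition`, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical selection candidates. Only "risk check", "all", and "steel"
# appear in the description; the other values are placeholders.
USE_CASE_CANDIDATES = {
    "purpose": ["risk check", "overview"],
    "required information quality": ["all", "verified only"],
    "category": ["steel", "chemical"],
}


@dataclass(frozen=True)
class ExtractionCondition:
    purpose: str
    required_information_quality: str
    category: str


def obtain_extraction_condition(selection):
    """Check the selected item values against the candidates and build the condition."""
    for item, candidates in USE_CASE_CANDIDATES.items():
        if selection.get(item) not in candidates:
            raise ValueError(f"invalid value for {item!r}: {selection.get(item)!r}")
    return ExtractionCondition(
        purpose=selection["purpose"],
        required_information_quality=selection["required information quality"],
        category=selection["category"],
    )


condition = obtain_extraction_condition(
    {"purpose": "risk check", "required information quality": "all", "category": "steel"}
)
```

Restricting the user to displayed candidates, as the sketch does with a validation step, means the later lookup only ever receives item values that are registered in advance.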
Next, the supporting knowledge extraction unit 13 extracts a supporting knowledge case corresponding to the extraction condition designated by the user from supporting knowledge cases registered in the supporting knowledge case database 32 (step S150). In order to present the user with concepts matching the extraction condition designated by the user, it is necessary to clarify the target of search. In the supporting knowledge case, knowledge association information in which the extraction condition designated by the user is associated with the target of search and an action for the search is defined. The action corresponds to the extraction method for the knowledge information. The professional know-how database 33 illustrated in
As described thus far, the supporting knowledge extraction unit 13 extracts the supporting knowledge case corresponding to the extraction condition designated by the user. By extracting the supporting knowledge case, the supporting knowledge extraction unit 13 specifies the action for the way of extracting concepts included in the knowledge information, in other words, the concepts to be extracted and included in the knowledge information, based on the extraction condition from the professional know-how database 33.
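The lookup of a supporting knowledge case from the extraction condition may be sketched as follows. The table layout, the matching rule (all listed item values must match), the second entry, and the function name `extract_supporting_knowledge_case` are assumptions made for illustration; only the two action names in the first entry are taken from the description.

```python
# Hypothetical professional know-how entries: a condition pattern and
# actions paired with a priority (1 = highest).
PROFESSIONAL_KNOW_HOW = [
    {
        "condition": {"purpose": "risk check", "category": "steel"},
        "actions": [
            (2, "presentation of dangerous substance material"),
            (1, "presentation of explosive chemical reactions between materials"),
        ],
    },
    {
        "condition": {"purpose": "overview"},
        "actions": [(1, "presentation of material properties")],
    },
]


def extract_supporting_knowledge_case(condition, know_how):
    """Return the actions of the first matching entry, ordered by priority."""
    for entry in know_how:
        if all(condition.get(key) == value for key, value in entry["condition"].items()):
            return [action for _, action in sorted(entry["actions"])]
    return []


supporting_case = extract_supporting_knowledge_case(
    {"purpose": "risk check", "required information quality": "all", "category": "steel"},
    PROFESSIONAL_KNOW_HOW,
)
```

A different selection of item values would match a different entry, so the contents, number, and order of the returned actions change with the user information.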
As illustrated in
According to the setting example illustrated in
The extracted supporting knowledge case varies depending on the item values included in the user information designated by the user. Accordingly, the contents and the number of extracted actions may vary, and the priority of each action may vary even in a case where the same actions are extracted.
By extracting the supporting knowledge case 53, the supporting knowledge extraction unit 13 specifies that it is necessary to search for knowledge to be presented to the user, that is, professional knowledge and know-how to be searched by the user such as knowledge (referred to as “information”) related to explosive chemical reactions between materials and dangerous substance material in the above example. Next, the KG extraction method selection unit 14 selects a KG extraction method for linking the knowledge to be searched to the knowledge case included in the knowledge graph 3 as the storage location of the knowledge (step S160).
First, the KG extraction method selection unit 14 recognizes that “presentation of explosive chemical reactions between materials” is earlier than “presentation of dangerous substance material” in a search order (that is, the order of actions to be executed) by referring to the extracted supporting knowledge case 53. Two KG extraction methods are linked to “presentation of dangerous substance material”. Priority orders are set from the information defined in the KG extraction methods.
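The ordering described above, in which the search order of the actions is applied first and the priority order of the KG extraction methods is applied within each action, may be sketched as follows. The record layout, the method names “hypothetical method A” and “hypothetical method B”, and the assignment of DB1 and DB2 to the methods are assumptions for illustration only.

```python
from typing import NamedTuple


class KGExtractionMethod(NamedTuple):
    action: str
    method: str
    knowledge_base: str  # storage location of the knowledge information
    priority: int        # priority within the action (1 = highest)


# Hypothetical contents of a KG extraction method table.
KG_EXTRACTION_METHODS = [
    KGExtractionMethod(
        "presentation of dangerous substance material", "hypothetical method B", "DB2", 2
    ),
    KGExtractionMethod(
        "presentation of explosive chemical reactions between materials",
        'extraction of relationship graph between entities (materials) '
        'based on relation "explosive chemical reaction"',
        "DB1",
        1,
    ),
    KGExtractionMethod(
        "presentation of dangerous substance material", "hypothetical method A", "DB1", 1
    ),
]


def select_kg_extraction_methods(ordered_actions, methods):
    """Order methods by the search order of their action, then by method priority."""
    action_rank = {action: rank for rank, action in enumerate(ordered_actions)}
    applicable = [m for m in methods if m.action in action_rank]
    return sorted(applicable, key=lambda m: (action_rank[m.action], m.priority))


selected = select_kg_extraction_methods(
    [
        "presentation of explosive chemical reactions between materials",
        "presentation of dangerous substance material",
    ],
    KG_EXTRACTION_METHODS,
)
```

Sorting on the pair (action order, method priority) is one simple way to combine the two priority orders; the embodiment may combine them differently.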
According to the data structure of the KG extraction method database 34 illustrated in
The KG extraction method selection unit 14 specifies the knowledge base including concepts necessary for generating the knowledge information, that is, the storage location of the knowledge information, by referring to the KG extraction method database 34. In addition, the priority order of the knowledge base is specified considering both the priority order set in the professional know-how database 33 and the priority order set in the KG extraction method. The KG extraction method including the storage location and the storage method of the generated knowledge information and the priority order of the knowledge base corresponds to a KG extraction method 54 illustrated in
The preprocessing unit 2 includes the single word extraction unit 21 and the category label assigning unit 22 equivalent to the single word extraction unit 11 and the category label assigning unit 12 of the document-related KG generation processing unit 1. Accordingly, in the same manner as the document-related KG generation processing unit 1 generates the target document single word set+category label 52 from the target document, the preprocessing unit 2 performs preprocessing of generating a reference document single word set 55 from a reference document and generating a reference document single word set+category label 56 (step S170). For example, the “reference document” is desirably a professional book belonging to the same professional field as the target document. Plural professional books may be set as the reference document.
In a case where the storage location, in other words, the knowledge base (DB1 and the like in the above example) of the knowledge information to be presented to the user in the knowledge graph 3 is specified in the above manner, the KG extraction processing unit 15 executes a KG extraction process of extracting concepts included in the knowledge information to be presented to the user from the storage location (step S180). That is, in a case where the user inputs the document and the user information (that is, the extraction condition for the knowledge information), the KG extraction processing unit 15 automatically extracts information (that is, information matching the extraction condition for the knowledge information) considered to be necessary for the user from the large size knowledge graph 3 and presents the information to the user. Hereinafter, details of the KG extraction process performed by the KG extraction processing unit 15 in the present exemplary embodiment will be described using the flowchart illustrated in
First, the structure of the knowledge graph 3 used in the description of the KG extraction process is illustrated in
In step S160, the KG extraction method selection unit 14 selects “extraction of relationship graph between entities (materials) based on relation “explosive chemical reaction”” as the KG extraction method having the highest priority order.
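A relation-based extraction of this kind may be sketched as follows, under the assumption that the knowledge base can be viewed as a list of (entity, relation, entity) triples. The triples and the function name `extract_relation_graph` are hypothetical, and the actual structure of the knowledge graph 3 may differ.

```python
# Hypothetical triples standing in for part of a knowledge base.
TRIPLES = [
    ("sodium", "explosive chemical reaction", "water"),
    ("sodium", "is a", "alkali metal"),
    ("iron", "component of", "steel"),
]


def extract_relation_graph(triples, relation, target_words):
    """Keep triples with the given relation that touch at least one target word."""
    return [
        (head, rel, tail)
        for head, rel, tail in triples
        if rel == relation and (head in target_words or tail in target_words)
    ]


relation_graph = extract_relation_graph(
    TRIPLES, "explosive chemical reaction", {"sodium", "water"}
)
```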
First, as illustrated by enclosures with broken lines 64, 65, and 66 in
Next, as illustrated by enclosures with broken lines 67, 68, and 69 in
Next, as illustrated by enclosures with broken lines 73, 74, and 75 in
Materials may have various representations in a case where the materials are described using text strings. For example, a material “sodium” has a representation “Na” different from “sodium”. Therefore, the extraction may extend to representations other than the text “sodium”.
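Extending the extraction to other representations may be sketched as follows, assuming an alias table that maps a material name to its alternative representations. Only the pair “sodium” and “Na” comes from the description; the table layout and the function name `expand_representations` are hypothetical.

```python
# Hypothetical alias table: material name -> alternative representations.
ALIASES = {
    "sodium": {"Na"},
}


def expand_representations(words, aliases):
    """Return the words together with every alternative representation known for them."""
    expanded = set()
    for word in words:
        expanded.add(word)
        expanded |= aliases.get(word, set())
    return expanded


search_terms = expand_representations(["sodium", "water"], ALIASES)
```

Searching with `search_terms` instead of the original single words allows entities registered under “Na” to be found even though the document only contains “sodium”.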
That is, the KG extraction processing unit 15 extends information to be extracted as illustrated by enclosures with broken lines 76, 77, and 78 in
In the present exemplary embodiment, a contribution degree of material linked to the role is also presented as information. Thus, an entity “contribution degree” 79 is also extracted.
The KG extraction processing unit 15 extracts candidates of information to be presented to the user in the above manner and also deletes information not necessary for the user. That is, while, in step S130, the category label is assigned to each single word extracted in step S120, the KG extraction processing unit 15 deletes information related to a category not assigned to each single word. Specifically, as illustrated in
In the relation information extraction process in step S183, entities (referred to as “nodes”) positioned between an entity “Na explosion” 83 linked to alkali metal explosion and the entity “Na” 70 are extracted. However, as illustrated in
In the above manner, the KG extraction processing unit 15 extracts information to be presented to the user based on the single word set+category label 52 including the single words extracted from the target document. Furthermore, in the present exemplary embodiment, the single word set+category label 56 is generated from the reference document. Therefore, as illustrated by an enclosure with a broken line 85 in
The knowledge information generated in the above manner is knowledge related to “presentation of explosive chemical reactions between materials” having the highest priority. While the knowledge information to be presented to the user may be generated using only the KG extraction method having the highest priority, the knowledge information may be generated for other priorities and merged into the knowledge information illustrated in
The presentation processing unit 16 presents the knowledge graph, that is, the knowledge information, extracted from the knowledge graph 3 in the above manner to the user (step S190). For example, the knowledge information may be transmitted to a terminal device used by the user and displayed on the terminal device. The user may further understand the professional book designated as the target document by referring to the presented knowledge information.
The presentation processing unit 16 is not limited to a presentation method of presenting the knowledge information in a graph format as illustrated in
In the sentence format illustrated in
In addition, for example, the type of information may be displayed in an easily identifiable manner by changing a display form such as differentiating a display color depending on the type of concept and the type of document such as the target document and the reference document.
While
For example, the presentation processing unit 16 displays the target document on the terminal device. In the above example, in a case where single words (hereinafter, “target single words”) such as “sodium” and “water” as a presentation target of knowledge in the knowledge information are displayed on the screen, the target single words are displayed as selectable single words. For example, the target single words are displayed in a selectable manner by changing the display form of the target single words from the display forms of other single words such as changing the display color of the target single words or underlining the target single words.
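Changing the display form of the target single words may be sketched as follows. Square brackets are used here purely as a stand-in for a change of display color or underlining, and the function name `mark_target_words` and the sample sentence are hypothetical.

```python
import re


def mark_target_words(text, target_words, left="[", right="]"):
    """Wrap each whole-word occurrence of a target single word in markers."""
    if not target_words:
        return text
    # Longer words first so that a longer target is preferred over a shorter one.
    alternatives = "|".join(
        re.escape(word) for word in sorted(target_words, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: f"{left}{match.group(0)}{right}", text)


marked = mark_target_words("Sodium reacts with water.", {"sodium", "water"})
```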
In a case where the user selects any target single word, information related to the selected target single word is extracted from the knowledge information illustrated in
As another example, the presentation processing unit 16 displays the target document on the terminal device. In a case where the target single word is displayed on the screen by the user scrolling the target document, the presentation processing unit 16 extracts information related to the target single word displayed on the screen from the knowledge information illustrated in
While the user operation is assumed to be an operation using the mouse in the above description, the user operation is not limited thereto. For example, an augmented reality (AR) technology may be used. In a case where the user points at the target single word, information related to the target single word is extracted and displayed near the pointed target single word. Alternatively, in a case where it is detected that the user is looking at the target single word, information related to the target single word may be extracted and displayed near the target single word at which the user is looking.
The knowledge information to be displayed on the screen may be displayed such that information positioned in the lower layer is not displayed and information in the higher layer is displayed as described above, or information positioned in the lower layer is expanded from the beginning.
In addition, while the target document is the processing target in the above description, other documents such as the reference document may be the processing target.
While one target document is set as a generation target of the knowledge information in the above description, plural documents may be collectively set as the generation target. This process corresponds to a modification example of step S110 illustrated in
For example, a target document selection processing section is disposed. The target document selection processing section displays a document content screen and a document selection list screen on the terminal device. Document names of documents designated by the user are displayed in a desired order of reading on the document selection list screen. The display order of the document name list displayed on the document selection list screen may be switched by a predetermined operation. The content of the document selected to be read by the user from the list displayed on the document selection list screen is displayed on the document content screen.
In addition, on the document selection list screen, the document name of the document of which the content is displayed on the document content screen, that is, the currently read document, is displayed in a first color (for example, red), and the document name (document name displayed immediately below the document name of the currently read document) of the subsequently read document is displayed in a second color (for example, yellow). In addition, in a case where a document that is read immediately previously to the currently read document is present, the document name (document name displayed immediately above the document name of the currently read document) of the document is displayed in a third color (for example, gray).
The terminal device of the user further displays a display screen of the knowledge information. In a case where the knowledge information is generated using the currently read document as the target document, the presentation processing unit 16 displays the knowledge information in the sentence format or the graph format. In the case of displaying the knowledge information in the sentence format, the knowledge information related to the target single word selected by the user may be displayed, or the corresponding knowledge information may be displayed in response to a scroll operation as described above. The same applies to the following description.
In a case where the user selects a “subsequent document” button displayed on the screen, the document name of the document to be subsequently read is displayed in red, and the document name of the read document is switched to a gray display. In addition, the document name displayed immediately below the document name displayed in red is displayed in yellow. The target document selection processing section displays the content of the new document selected as the currently read document on the document content screen. In addition, the presentation processing unit 16 displays the knowledge information generated for the new document as the target document on the display screen of the knowledge information.
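The color assignment on the document selection list screen and the effect of the “subsequent document” button may be sketched as follows. The function names, the sample document names, the use of `None` for documents without a special color, and the behavior at the end of the list are assumptions; the colors follow the examples given above.

```python
def document_colors(documents, current_index):
    """Return one color per document: current, subsequent, previous, or None."""
    colors = []
    for index in range(len(documents)):
        if index == current_index:
            colors.append("red")      # currently read document
        elif index == current_index + 1:
            colors.append("yellow")   # document to be read subsequently
        elif index == current_index - 1:
            colors.append("gray")     # document read immediately previously
        else:
            colors.append(None)       # no special color (assumption)
    return colors


def select_subsequent_document(documents, current_index):
    """Advance to the next document; stay on the last one at the end (assumption)."""
    return min(current_index + 1, len(documents) - 1)


documents = ["Document A", "Document B", "Document C", "Document D"]
current = 0
before = document_colors(documents, current)
current = select_subsequent_document(documents, current)
after = document_colors(documents, current)
```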
While the knowledge information may be generated using one document selected as the currently read document from the document name list as the target document as described above, plural documents may be handled as the target document.
For example, while the immediately previously read document, the currently read document, and the document to be subsequently read are identifiable from each other by color in the above description, these three documents may be collectively set as the target document. That is, the document-related KG generation processing unit 1 generates the knowledge information using three documents including the currently read document and the immediately previous and subsequent documents as the target document. The document-related KG generation processing unit 1 generates a single piece of knowledge information by unifying the three documents.
In the above description, the target document is selected by designating the currently read document together with the one document immediately previous and the one document immediately subsequent to the currently read document, that is, a range of ±1 from the currently read document. However, the number of target documents may be adjusted by appropriately setting the range.
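Selecting the target documents by a range around the currently read document may be sketched as follows. The function name `select_target_documents`, the parameter name `span`, and the clipping of the range at both ends of the list are assumptions for illustration.

```python
def select_target_documents(documents, current_index, span=1):
    """Return the documents within +/- span of the currently read document."""
    start = max(0, current_index - span)
    end = min(len(documents), current_index + span + 1)
    return documents[start:end]


reading_list = ["A", "B", "C", "D", "E"]
targets = select_target_documents(reading_list, 2)          # range of +/-1
wider_targets = select_target_documents(reading_list, 2, 2) # range of +/-2
```

With `span=1` the sketch reproduces the three-document case described above; a larger `span` increases the number of target documents.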
The foregoing description of the exemplary embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2019-037285 | Mar 2019 | JP | national |