This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2015-126922 filed on Jun. 24, 2015.
1. Technical Field
The present invention relates to an object classification device and a non-transitory computer readable medium.
2. Related Art
There is a technology in which plural keywords extracted from a document group are visualized in a two-dimensional map.
An aspect of the present invention provides an object classification device including: a keyword determining unit that determines keywords for one or plural objects; a keyword ordering unit that orders the plural determined keywords based on a conceptual hierarchical structure, which is a structure that hierarchically represents the concepts of words; and a classifying unit that classifies the objects so as to be associated with the ordered keywords.
Exemplary embodiment(s) of the present invention will be described in detail based on the following figures, wherein
Hereinafter, an exemplary embodiment of the present invention will be described with reference to the drawings.
[Hardware Configuration]
An object classification device 10 according to the present embodiment can be realized as, for example, an information processing device such as a personal computer, and
The control unit 11 includes a program control device such as a CPU, and executes various information processes according to programs stored in the storage unit 12.
The storage unit 12 includes a memory device such as a RAM or a ROM, and a hard disk, and stores programs executed by the control unit 11. The storage unit 12 functions as a work memory of the control unit 11.
The communication unit 13 is a network interface such as a LAN card, and transmits and receives information to and from another information processing device via communication means such as a LAN or a wireless communication network.
The display unit 14 is a display device such as a liquid crystal display, and displays information according to an instruction input from the control unit 11.
The operation unit 15 is a mouse, a keyboard, or a touch panel, and receives the operation of a user to output an operation signal to the control unit 11.
[Functional Block Diagram]
The object obtaining unit 21 obtains an object from a storage device such as a hard disk that stores object data. Here, the obtained object may be an electronic document, a figure, or a table. The object obtaining unit 21 may obtain an object by downloading the object via a network, or may obtain an object by performing OCR on an object image obtained by a scanner.
In the present embodiment, it is assumed that the objects obtained by the object obtaining unit 21 include an object retrieved under a specific condition (referred to as a retrieval object) and an object similar to the retrieval object (referred to as a similar object). The retrieval object is, for example, an object retrieved based on a retrieval condition input by a user, and in the present embodiment, the object obtaining unit 21 obtains one or more objects satisfying the retrieval condition input by the user, as the retrieval objects. In the present embodiment, the object obtaining unit 21 also obtains, for each retrieval object, one or more objects similar to that retrieval object, as the similar objects. For example, the object obtaining unit 21 calculates the degree of similarity between the retrieval object and another object based on an element such as a word included in the retrieval object, and obtains, as a similar object, an object whose calculated degree of similarity exceeds a predetermined degree of similarity. Here, the predetermined degree of similarity may be changeably set. If it is set to be high, a limited set of similar objects that are more similar to the retrieval object is obtained, and if it is set to be low, a broad set of similar objects that are similar to the retrieval object to some extent is obtained. In the present embodiment, it is assumed that the one or more retrieval objects and the one or more similar objects obtained by the object obtaining unit 21 are divided into plural object groups. For example, it is assumed that a set of one retrieval object and the one or more similar objects similar to that retrieval object is treated as one object group.
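The grouping described above can be illustrated by the following minimal sketch. It is not the implementation of the object obtaining unit 21. The description does not specify the similarity measure, so the cosine similarity over word counts, the whitespace tokenization, and the names word_vector, cosine_similarity, and build_object_groups are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_vector(text):
    """Bag-of-words counts for one object (assumed tokenization: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (an assumed similarity measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_object_groups(retrieval_objects, other_objects, threshold):
    """Form one object group per retrieval object.

    An object is taken as a similar object when its degree of similarity
    to the retrieval object exceeds the predetermined threshold.
    """
    groups = []
    for retrieval in retrieval_objects:
        rv = word_vector(retrieval)
        similar = [o for o in other_objects
                   if cosine_similarity(rv, word_vector(o)) > threshold]
        groups.append({"retrieval": retrieval, "similar": similar})
    return groups
```

Raising the threshold narrows each object group to closer matches, and lowering it broadens the group, which corresponds to the changeable setting of the degree of similarity described above.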
The keyword determining unit 22 determines one keyword which is representative of the object group obtained by the object obtaining unit 21. Here, it is assumed that the keyword determining unit 22 determines one keyword which is representative of the object group for each of the plural object groups. Specifically, the keyword determining unit 22 may extract a word having a high appearance frequency from the objects included in the obtained object group and determine the extracted word as the keyword, or may further take into account the structure of the object or the syntactic importance of the extracted word to determine the keyword.
When the keyword determining unit 22 determines the keyword by the aforementioned method, plural words having a high appearance frequency may be extracted from the objects included in the object group in some cases. When plural words that are candidates for the keyword (referred to as candidate keywords) are extracted, it is assumed that the keyword determining unit 22 determines one keyword which is representative of the object group based on the plural candidate keywords and a conceptual hierarchical structure. A process of determining the keyword which is representative of the object group by the keyword determining unit 22 will be described below.
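The frequency-based extraction can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the whitespace tokenization, and the names STOPWORDS and candidate_keywords are hypothetical, and the structural or syntactic weighting mentioned above is not modeled. Words that share the same frequency rank are all returned, so more than one word may result.

```python
from collections import Counter

# Hypothetical stop-word list; the description does not define one.
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "in", "to", "is"})


def candidate_keywords(object_group, top_n=1):
    """Return the words whose appearance frequency is within the top_n distinct counts.

    object_group is a list of object texts. Ties are kept, so several
    words may be returned even when top_n is 1.
    """
    counts = Counter(
        word
        for text in object_group
        for word in text.lower().split()
        if word not in STOPWORDS
    )
    if not counts:
        return []
    distinct = sorted(set(counts.values()), reverse=True)
    cutoff = distinct[min(top_n, len(distinct)) - 1]
    return sorted(w for w, c in counts.items() if c >= cutoff)
```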
Initially, the conceptual hierarchical structure is a structure that hierarchically represents the concept of words.
In the general conceptual hierarchical structure shown in
Thus, in the present embodiment, it is assumed that the arrangement order in the horizontal direction in the conceptual hierarchical structure has conceptual continuity by conceptually ordering the words within the same hierarchy of the conceptual hierarchical structure.
The ordering of words in the horizontal direction in the conceptual hierarchical structure will be described with reference to a schematic diagram showing a general conceptual hierarchical structure shown in
For example, the relevance of a word (W-1)m to a word Wn can be evaluated by calculating the degree of similarity between two object groups taken from the objects stored in the storage device: the object group including the word Wn, and the object group including the group of words associated as the subordinate concepts of the word (W-1)m. In this case, when the feature vector of the object group including the word Wn is expressed as dn, and the feature vector of the object group including the group of words associated as the subordinate concepts of the word (W-1)m is expressed as dm, the inner product of the feature vector dn and the feature vector dm is calculated as the degree of similarity. That is, the relevance of the word Wn to the word (W-1)m is evaluated using the degree of similarity S(n, m) = dn·dm: the higher the degree of similarity S(n, m), the higher the relevance, and the lower the degree of similarity S(n, m), the lower the relevance. The feature vector dn and the feature vector dm may each be the sum of the feature vectors of the respective objects included in the corresponding object group. Here, when t keywords Ki (i = 1, 2, ..., t) are extracted from an object group consisting of N objects, the feature vector of each object Ej (j = 1, 2, 3, ..., N) included in the object group is expressed as a t-dimensional vector such as (0, 1, 1, ..., 0)^t, in which the i-th element is “1” when the keyword Ki is included in the object and “0” when it is not.
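The calculation of the degree of similarity S(n, m) = dn·dm can be illustrated by the following minimal sketch. Each object is represented here as a set of words, and the function names object_feature_vector, group_feature_vector, and similarity are assumptions introduced only for this illustration.

```python
def object_feature_vector(object_words, keywords):
    """t-dimensional 0/1 vector: element i is 1 when keyword Ki is included in the object."""
    return [1 if k in object_words else 0 for k in keywords]


def group_feature_vector(objects, keywords):
    """Feature vector of an object group, taken as the sum of the vectors of its objects."""
    total = [0] * len(keywords)
    for object_words in objects:
        for i, bit in enumerate(object_feature_vector(object_words, keywords)):
            total[i] += bit
    return total


def similarity(dn, dm):
    """Degree of similarity S(n, m) as the inner product of dn and dm."""
    return sum(a * b for a, b in zip(dn, dm))
```

For instance, with the keywords ("engine", "wheel", "apple"), a group of two objects containing {engine, wheel} and {engine} yields dn = (2, 1, 0), a group containing {wheel, apple} and {engine, apple} yields dm = (1, 1, 2), and the degree of similarity is 2·1 + 1·1 + 0·2 = 3.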
The words are ordered by rearranging the respective words of the words Wn={W8, W9, W10, W11} such that the word Wn is positioned closely to the word (W-1)m evaluated as having high relevance to the word Wn depending on the degree of similarity S(n, m) calculated as stated above and the position in the conceptual hierarchical structure. By using such a method, it is possible to construct the conceptual hierarchical structure with conceptual continuity in the arrangement order in the horizontal direction by ordering the words through the rearrangement of the words in the respective hierarchies in the order from the highest hierarchy.
As another method of calculating the degree of similarity S(n, m), the degree of similarity may be calculated by additionally taking into account the relevance of the word (W-1)m to the word (W-1)p that is associated as the superordinate concept of the word Wn. For example, when the relevance of the word (W-1)m to the word (W-1)p is expressed as Rm, the degree of similarity is calculated as S(n, m) = Rm·dn·dm. Here, the relevance Rm may be obtained by quantifying the conceptual relevance of the word (W-1)m to the word (W-1)p. For example, the relevance Rm may be the number of steps in the horizontal direction in the conceptual hierarchical structure from the word (W-1)p to the word (W-1)m. That is, in
An example of a method of determining one keyword which is representative of the object group by the keyword determining unit 22 based on the conceptual hierarchical structure in which the words are conceptually ordered in the horizontal direction will be described. For example, in the conceptual hierarchical structure in which the words are conceptually ordered in the horizontal direction as shown in
Another method of determining one keyword which is representative of the object group by the keyword determining unit 22 based on the conceptual hierarchical structure in which the words are conceptually ordered in the horizontal direction will be described. For example, in the conceptual hierarchical structure in which the words are conceptually ordered in the horizontal direction shown in
The keyword ordering unit 23 orders the keywords determined for the plural object groups by the keyword determining unit 22 based on the conceptual hierarchical structure. Specifically, the keyword ordering unit 23 orders the keywords which are representative of the respective object groups based on their positions in the conceptual hierarchical structure in which the words are conceptually ordered in the horizontal direction. That is, the keyword ordering unit 23 obtains the position of each keyword in that conceptual hierarchical structure and associates the positions of the keywords with the arrangement order in the horizontal direction, so that the respective keywords can be ordered in an order having conceptual continuity.
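The ordering of the keywords by their positions can be sketched as follows. This is an illustration under assumptions, not the method of the keyword ordering unit 23: the conceptual hierarchical structure is modeled as a dictionary that maps each word to its already-ordered subordinate words, and the horizontal position of a word is assumed to be its index in a left-to-right, depth-first traversal. The names horizontal_positions and order_keywords are hypothetical.

```python
def horizontal_positions(hierarchy, root):
    """Assign each word an index by left-to-right depth-first traversal.

    hierarchy maps a word to the list of its subordinate words, which are
    assumed to be already conceptually ordered in the horizontal direction.
    """
    positions = {}
    stack = [root]
    while stack:
        word = stack.pop()
        positions[word] = len(positions)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(hierarchy.get(word, [])))
    return positions


def order_keywords(keywords, hierarchy, root):
    """Sort keywords by their assumed horizontal positions in the hierarchy."""
    positions = horizontal_positions(hierarchy, root)
    return sorted(keywords, key=lambda k: positions[k])
```

A keyword that does not appear in the hierarchy raises a KeyError in this sketch; how such a keyword should be handled is not stated in the description.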
The object classifying unit 24 classifies the object obtained by the object obtaining unit 21 so as to be associated with any of the keywords determined by the keyword determining unit 22.
The graph displaying unit 25 arranges the keywords ordered by the keyword ordering unit 23 at positions of a matrix in that order, and displays, on the display unit 14, a two-dimensional table in which the objects classified so as to be associated with the respective keywords are arranged as elements. Although it has been described in the present embodiment that the graph displaying unit 25 displays the two-dimensional table on the display unit 14, the present invention is not limited to this example. For example, a coordinate plane or a three-dimensional table may be displayed.
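The classification and the table arrangement can be illustrated by the following minimal sketch. The description does not state the rule by which an object is associated with a keyword, so the rule used here (an object is associated with the first ordered keyword that appears in its text) is purely an assumption, as are the names classify_objects and render_table and the one-row-per-keyword text layout.

```python
def classify_objects(objects, ordered_keywords):
    """Associate each object with one keyword.

    objects maps an object name to its text. Assumed rule: the first
    ordered keyword found in the text wins. Objects containing no
    keyword are returned separately.
    """
    table = {keyword: [] for keyword in ordered_keywords}
    unclassified = []
    for name, text in objects.items():
        words = set(text.lower().split())
        for keyword in ordered_keywords:
            if keyword in words:
                table[keyword].append(name)
                break
        else:
            unclassified.append(name)
    return table, unclassified


def render_table(table):
    """Render one row per keyword, in keyword order, with its objects as elements."""
    return "\n".join(f"{keyword} | {', '.join(names)}" for keyword, names in table.items())
```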
[Object Classifying Process]
Here, an example of a flow of an object classifying process performed by the object classification device according to the present embodiment will be described with reference to a flowchart shown in
Initially, when the user inputs the retrieval condition, the object obtaining unit 21 obtains the retrieval object retrieved based on the input retrieval condition (S101). Here, it is assumed that the object obtaining unit 21 obtains plural retrieval objects retrieved based on the input retrieval condition.
The object obtaining unit 21 obtains the similar object similar to the retrieval object for each of the plural retrieval objects obtained in process S101 (S102). Here, it is assumed that the object obtaining unit 21 obtains one or more similar objects for each retrieval object depending on a predetermined degree of similarity. It is assumed that a set of one retrieval object and similar objects similar to the retrieval object is one object group.
The keyword determining unit 22 determines a keyword which is representative of the object group based on the word included in the objects (the retrieval object and the similar objects) of the object group for each object group (S103).
The keyword ordering unit 23 orders keywords determined for each object group by the keyword determining unit 22 based on the conceptual hierarchical structure (S104).
The object classifying unit 24 classifies the respective objects included in the object group obtained by the object obtaining unit 21 so as to be associated with any keywords (S105).
The graph displaying unit 25 displays, on the display unit 14, the object classifying table in which the keywords ordered by the keyword ordering unit 23 are arranged at positions of the matrix in that order and the objects classified by the object classifying unit 24 are arranged in the respective elements (S106), and ends the object classifying process.
[Re-retrieval Process using Object Classifying Table]
An object re-retrieval process performed using the object classifying table will be described. It has been described in the object classifying table shown in
Here, an example of a flow of the object re-retrieval process performed by the object classification device according to the present embodiment will be described with reference to a flowchart shown in
Initially, a range including one or more objects is designated from the object classifying table in response to a mouse operation of the user (S201). Here, the user may select a desired range of the object classifying table, and thus, the range may be designated. Alternatively, the user may select keywords, and thus, rows of the selected keywords may be designated as the range.
The object obtaining unit 21 obtains the similar objects similar to the object for each object included in the range designated in process S201 (S202). Specifically, the object obtaining unit 21 obtains the similar objects similar to the object E2,2 included in the range R designated in
The object classifying unit 24 classifies the similar objects obtained in process S202 so as to be associated with any keywords displayed in the object classifying table (S203).
The graph displaying unit 25 displays, on the display unit 14, the object classifying table in which the object included in the range designated in process S201 and the similar objects obtained in process S202 are arranged (S204).
Here, when the object obtaining unit 21 obtains the similar objects with a decreased degree of similarity in process S202, similar objects that are to be classified in positions far away from the range R may be obtained. For example, objects which are similar to the object included in the range R but are not associated with the keyword corresponding to the range R may be obtained in some cases. In this case, the obtained objects are classified so as to be associated with a keyword far away from the range R, such as a keyword W14. In general, it is undesirable for a user who wants to retrieve objects on the periphery of the range R to obtain objects classified in positions far away from the range R. Thus, it is assumed that the object classifying unit 24 does not classify the similar objects associated with a keyword that is away from the designated range by a predetermined threshold or more. Here, the predetermined threshold may be determined based on a distance that depends on the conceptual continuity of the keywords. The graph displaying unit 25 may refrain from displaying an object classified into a keyword in a position away from the designated range by the predetermined threshold or more. Accordingly, the object desired by the user can be displayed in the object classifying table through the re-retrieval.
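The exclusion of similar objects far from the designated range can be sketched as follows. The description states only that the threshold may be based on a distance that depends on the conceptual continuity of the keywords, so measuring the distance as the difference between indices in the ordered keyword list is an assumption made for illustration, as is the name filter_by_range_distance.

```python
def filter_by_range_distance(classified, ordered_keywords, range_keywords, threshold):
    """Keep only the objects whose keyword lies near the designated range.

    classified maps an object name to its associated keyword. The distance
    is assumed to be the smallest index difference, in the ordered keyword
    list, between the object's keyword and any keyword of the range.
    Objects at a distance of threshold or more are excluded.
    """
    index = {keyword: i for i, keyword in enumerate(ordered_keywords)}
    range_indices = [index[keyword] for keyword in range_keywords]
    kept = {}
    for name, keyword in classified.items():
        distance = min(abs(index[keyword] - r) for r in range_indices)
        if distance < threshold:
            kept[name] = keyword
    return kept
```

With ordered keywords W1 to W5, a designated range corresponding to W2, and a threshold of 2, an object associated with W3 (distance 1) is kept, whereas an object associated with W5 (distance 3) is excluded.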
The foregoing description of the exemplary embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2015-126922 | Jun 2015 | JP | national |