The present invention relates to a character string search system and a control method therefor.
Organizations such as companies consist of a plurality of departments, each of which is a business unit, and each department often holds information related to its business. In such organizations, it is sometimes necessary to analyze information across a plurality of departments. In this case, since departments use different expressions for information having the same meaning, it is difficult to search for and extract information having a specific meaning for analysis without omission.
In regard to a character string search of different expressions, for example, Patent Document 1 states “a database is constructed by keywords selected from limited vocabularies using a limited vocabulary dictionary (thesaurus). Then, at the time of search, a search expert selects keywords including extension of synonyms with respect to requested information from the thesaurus.”
Patent Document 1: JP 62-011932 A
As stated in Patent Document 1, when an expert searches for information using the thesaurus, it is possible to search even if different expressions are used for information having the same meaning. However, this requires experts who are familiar with the thesaurus, and there are few experts or people having a skill close to that of an expert. Further, a great deal of time and effort is required to keep the thesaurus up to date.
In this regard, it is an object of the present invention to provide useful terms among search terms used in the past in relation to a designated search term so that the provided useful terms are easily used.
A representative character string search system according to the present invention includes a storage device that associates and stores an attribute of a searcher who inputs a search term for a first search and a term group including one or more search terms used for the first search as a rule, a processor, and a display device, wherein the processor specifies a first rule including the search term input by the searcher in a second search from among rules stored in the storage device, and specifies a second rule including a searcher having an attribute close to the attribute of the searcher in the second search from the first rule.
According to the present invention, it is possible to provide useful terms among search terms used in the past in relation to a designated search term so that the provided useful terms are easily used.
Hereinafter, exemplary embodiments will be described with reference to the appended drawings. In the following description, information will be described using the expression “table,” but such information need not necessarily be expressed using a table and may be expressed using a data structure other than a table. A process described using a processor as a subject may be a process performed by a computer or an information processing device. All or a part of a process performed by a processor executing a program may be implemented by dedicated hardware. Further, various kinds of programs may be installed in a system or a computer through a program distribution server or a computer readable storage medium.
As a preparatory process before a search, a search system (hereinafter also referred to as a “system”) acquires all character strings included in a text of a search target. Then, the system selects character strings by development of lowercase characters and uppercase characters of “Mc-KEY” and development of the presence or absence of a hyphen from all character strings of a text of a search target, for example, centering on “Mc-KEY,” and creates a term group 100. Here, when the system acquires all the character strings of the text of the search target, the system may acquire a position of each character string in the text.
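The preparatory expansion described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the sample corpus, and the inclusion of an underscore among the separators are assumptions (the underscore is added only so that a string such as “Mc_KEY” can qualify as a variant).

```python
from itertools import product


def expand_variants(seed, separators=("-", "")):
    """Generate case and separator variants of a seed such as 'Mc-KEY'."""
    parts = seed.split("-")
    # For each part, consider the original, all-lowercase, and all-uppercase forms.
    part_forms = [sorted({p, p.lower(), p.upper()}) for p in parts]
    variants = set()
    for combo in product(*part_forms):
        for sep in separators:
            # An empty separator models the absence of the hyphen.
            variants.add(sep.join(combo))
    return variants


def build_term_group(seed, corpus_strings, separators=("-", "")):
    """Keep only the variants that actually occur in the search target."""
    candidates = expand_variants(seed, separators)
    return sorted(s for s in set(corpus_strings) if s in candidates)


# Hypothetical character strings acquired from the text of the search target.
corpus = ["mc-key", "Mc-KEY", "McKEY", "MCKEY", "Mc_KEY", "donkey"]

# Assumption: "_" is treated as one more separator variant.
term_group = build_term_group("Mc-KEY", corpus, separators=("-", "", "_"))
print(term_group)
```

Filtering the generated variants against the strings actually present in the text keeps the term group limited to terms that can produce hits.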
The searcher first performs a search based on the term group 100. In an example of
This is because the close state of the attribute of the searcher is set so that target information is easily acquired using the same search term as the searcher. In the example of
Thus, the new searcher can search using the search terms of the term group 101. Further, the new searcher can revise the term group 101, that is, add or delete terms, so that appropriate search terms are obtained. This is because the purpose of the search does not always match perfectly, and a character string serving as a search target changes over time, or a text of a search target is newly added. For example, the new searcher can add a term that is not included in the term group 101 but is included in the term group 100, for example, “Mc_KEY,” and can add a term that is not included in the term group 100, for example, “KEY-Mc.”
Further, when the new searcher determines that “McKEY” is not appropriate as a search term, it is possible to exclude “McKEY” included in the term group 101 from the search target. The new searcher can perform a search using the term group 103 created as described above. The term group 103 can easily include “MCKEY” when the term group 101 rather than the term group 102 is used. Then, the term group 103 can be used for a search to be performed later.
As described above, the search term can be easily selected using the previous term group. Particularly, since the previous term group is selected based on the attribute of the searcher, it is possible to suppress selection of a useless search term. Further, it is possible to revise a term group to be suitable for a search, and the revised term group can be further used.
An input device 203 is a device, such as a keyboard or a mouse, that receives an input from the searcher (user). An output device 204 is a device, such as a display device, that performs an output to the searcher (user). Input content and output content will be further described later. Instead of the input device 203 and the output device 204, a computer connected to the network 208 may receive an input from the searcher and perform an output to the searcher, in which case the input is received and the output is transmitted through the network IF 202.
The storage device 205 stores programs and data. The storage device 205 may be a Random Access Memory (RAM) or may include a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like in addition to a RAM. Although a RAM and an HDD are separate devices, they are collectively described as the storage device 205. The storage device 205 stores rule information 206 and character string appearance information 207 as information to be used for a search. Other information necessary for a search may be stored.
The rule information 206 is information of a term group including one or more search terms, and in the example of
The rule information 206 includes a user name, a user attribute, and a creation date and time. Since term groups including the same search terms may be created by different users, or by the same user at different dates and times for different purposes, this information is included so that the rules can be distinguished. For this reason, one line of the rule information 206 illustrated in
The character string appearance information 207 is information of a character string included in the text of the search target. Since information of a character string is also used as information of a search term as described above, the character string appearance information 207 includes a relevant rule indicating a rule corresponding to a term group in which a character string is included as a search term. In the example of
The relevant rule of the character string appearance information 207 is information about a rule rather than direct information about a term group because it has a structure depending on the structure of the rule information 206. When the term group including the same search terms has a plurality of rules, the relevant rule substantially designates the term group by having pointers to all of a plurality of rules. One line of character string appearance information 207 illustrated in
Either or both of the rule information 206 and the character string appearance information 207 may be stored in a device connected to the network 208 instead of the storage device 205 of the search system 200 and accessed via the network IF 202. The processor 201 selects the search term with reference to the rule information 206 and the character string appearance information 207, and specifies a position of a character string which is identical to the search term in the text of the search target. Then, the processor 201 stores information related to a new term group in the rule information 206 and the character string appearance information 207. The rule information 206 and the character string appearance information 207 will be described later.
The character string position list indicates a list of positions at which a character string appears in the text of the search target. For example, in the case of “mc-key,” since the number of appearances is 67, the character string position list may include 67 pieces of position information. Here, the position information may be the number of characters counted from the first character of the text of the search target. Further, when the number of appearances is large, the data amount of the character string position list increases. In that case, instead of the number of characters from the first character of the text of the search target, the text may be divided into units of 100 characters, and information indicating whether or not the character string is present in each 100-character unit may be used.
When there is such information, it is possible to specify the position of the character string by pattern matching from the beginning of each 100-character unit in which the character string is present. Since the method of deciding a search term does not depend on the content of the character string position list, further description is omitted. Further, “#” of the character string appearance information 207a and 207b indicates the number of a term, “m” indicates an integer of 9 or more, and the rows shown by dotted lines in the character string appearance information 207b are the same as the corresponding information of the character string appearance information 207a.
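The two forms of the character string position list can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, positions are taken to be 0-based character offsets, and a 100-character unit is flagged when an occurrence starts inside it.

```python
BLOCK = 100  # unit size in characters, as in the description


def find_all(text, term, start=0, end=None):
    """Offsets of every occurrence of term lying entirely within text[start:end]."""
    end = len(text) if end is None else end
    offsets = []
    i = text.find(term, start, end)
    while i != -1:
        offsets.append(i)
        i = text.find(term, i + 1, end)
    return offsets


def block_flags(text, term, block=BLOCK):
    """Coarse list: indices of the units in which an occurrence starts."""
    return sorted({pos // block for pos in find_all(text, term)})


def positions_from_blocks(text, term, blocks, block=BLOCK):
    """Recover exact offsets by pattern matching only inside flagged units."""
    found = []
    for b in blocks:
        start = b * block
        # Extend the window by len(term) - 1 so that an occurrence that
        # starts in this unit but crosses into the next one is still found.
        end = min(len(text), start + block + len(term) - 1)
        found.extend(find_all(text, term, start, end))
    return found


# Hypothetical text: one occurrence crosses the boundary between units 0 and 1.
text = "x" * 95 + "mc-key" + "y" * 150 + "mc-key"
exact = find_all(text, "mc-key")
blocks = block_flags(text, "mc-key")
print(exact, blocks, positions_from_blocks(text, "mc-key", blocks))
```

The coarse list stores at most one entry per unit regardless of how many occurrences the unit holds, at the cost of a bounded amount of pattern matching when exact positions are needed.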
A process of performing reading and writing of the information illustrated in
In step 602, the processor 201 causes a window for inputting a search term to be displayed on the display device 204. The window is a window 900a illustrated in
When the user inputs the search term to the input field 901 and clicks a button 903 by operating the mouse of the input device 203, in step 603, the processor 201 acquires the search term input to the input field 901. The search term may be similarly acquired by detecting an input to the input field 901 even when the button 903 is not pressed down. In the example of
This is because when the same searcher inputs the same search term, the searcher often intends to obtain a similar search result, and thus a term group previously created by the same searcher is made more likely to be used. Further, this is because a newly created term group is highly likely to suit the purpose of the search. In the example of
In step 703, the processor 201 determines whether or not “1A4G,” the department of the searcher, is among the departments of the rules whose rule numbers in the rule information 206a are “0” and “2,” and creates a list in the creation date and time order when “1A4G” appears in a plurality of rules. This is because searchers belonging to the same department often search using the same product name, and thus a term group from the same department is made more likely to be used. In the example of
In step 704, the processor 201 determines whether or not, among the rules whose rule numbers in the rule information 206a are “0” and “2,” there is a rule whose creator name and department both differ from the user name and the department of the searcher, that is, a rule not placed in a list in step 702 or 703, and creates a list in the creation date and time order when there are a plurality of such rules. In the example of
In step 705, the processor 201 combines the list created in step 702, the list created in step 703, and the list created in step 704 in this order, and creates one list. In this example, a rule in which the rule number is “2” and the rule in which the rule number is “0” become a rule list in this order. Here, the processor 201 may add a rule indicating only the search term input in step 603 to the rule list.
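The ordering of steps 702 to 705 can be sketched as follows. This is a minimal sketch rather than the embodiment's implementation: the `Rule` fields, the names “HANAKO” and “JIRO,” the department “2B1G,” the year, and the department assigned to rule “2” are hypothetical, and sorting newest first is an assumed reading of “creation date and time order.”

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Rule:
    number: int
    creator: str
    department: str
    created: datetime


def newest_first(rules):
    # Assumption: a more recently created rule is listed earlier.
    return sorted(rules, key=lambda r: r.created, reverse=True)


def build_rule_list(candidates, user, department):
    """Order candidate rules: same creator, then same department, then the rest."""
    same_user = [r for r in candidates if r.creator == user]            # step 702
    same_dept = [r for r in candidates
                 if r.creator != user and r.department == department]   # step 703
    others = [r for r in candidates
              if r.creator != user and r.department != department]      # step 704
    # Step 705: concatenate the three lists in this order.
    return newest_first(same_user) + newest_first(same_dept) + newest_first(others)


# Hypothetical rules that contain the input search term.
candidates = [
    Rule(0, "HANAKO", "2B1G", datetime(2014, 7, 1, 9, 0)),
    Rule(2, "TARO", "1A4G", datetime(2014, 7, 8, 16, 16)),
]
ordered = build_rule_list(candidates, user="JIRO", department="1A4G")
print([r.number for r in ordered])
```

With these hypothetical values, the rule whose rule number is “2” precedes the rule whose rule number is “0,” matching the order stated in the description.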
In the example of
In step 605, the processor 201 displays the rule list created in step 604 on the menu 904 of the window 900a. For example, the menu 904 may be displayed by arranging the rules from the top to the bottom so that the rules are easily selected in the order of the list. In the example of
Further, when a rule indicating only the input search term has been added in step 705, “no expansion (system)” is displayed on the menu 904. When a plurality of rule lists are created in step 604, a plurality of rules may be displayable together with a menu “more view” as indicated by the window 900a. Further, when one rule in the menu 904 is selected by the mouse of the input device 203, and a button 902 is clicked, the processor 201 performs step 606.
In step 606, the processor 201 acquires the rule selection. In this example, “sales analysis of Mc-KEY (TARO Jul-8 16:16)” is selected, and information indicating that the rule number “2” is selected is acquired. In step 607, the processor 201 displays a window for revising the selected term, that is, the search term of the term group. This is the window 900b illustrated in
In step 802, the processor 201 causes the character string and the number of appearances acquired in step 801 to be displayed on display fields 907b and 907c. As illustrated in
In step 803, the processor 201 calculates a sum of the number of appearances of the character strings included in the rule selected in step 606 based on the number of appearances acquired in step 801. In this example, 2,900,538 which is the sum of the number of appearances of the term numbers “4,” “7,” and “8” is calculated. Then, the calculated value is displayed in the display field 906. Here, information indicating that this value is a “result of the selected rule” may be displayed in the display field 906. Further, content of the rule may be acquired from the memo of the rule information 206a and displayed in the display field 906 as “sales analysis of Mc-KEY.” Further, “2,900,000” may be displayed by rounding to two significant figures.
In step 804, the processor 201 acquires the term numbers from the term groups of all the rules included in the list created in step 705, and causes all the character strings of the acquired search terms to be displayed in the display field 907b. In the example of
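The calculation of step 803 can be sketched as follows. The per-term counts below are hypothetical values chosen only so that they add up to the total of 2,900,538 given in the description, and the function names are assumptions.

```python
from math import floor, log10


def total_appearances(term_numbers, appearances):
    """Sum the number of appearances over the terms of the selected rule."""
    return sum(appearances[t] for t in term_numbers)


def round_significant(value, digits=2):
    """Round a non-negative count to the given number of significant figures."""
    if value == 0:
        return 0
    exponent = floor(log10(abs(value)))
    factor = 10 ** (exponent - digits + 1)
    return int(round(value / factor) * factor)


# Hypothetical counts per term number; only their total comes from the description.
appearances = {4: 2_899_000, 7: 1_471, 8: 67}

total = total_appearances([4, 7, 8], appearances)
print(f"{total:,}")                      # exact sum
print(f"{round_significant(total):,}")   # two significant figures
```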
Then, the processor 201 acquires the term numbers from the term group of the rule selected in step 606, and sets the display of the character strings of the acquired search term to a selected state by default. In the example of
Instead of acquiring the term numbers from the term group of the rule selected in step 606, the term numbers may be acquired from the term group of the rule or the user determined in step 703, and the display of the character string of the acquired search term may be set to the selected state by default.
In step 805, the processor 201 acquires an input for cancelling the selection of the character string as the search term or adding selection. In the example of
In step 806, the processor 201 also acquires an input of a new search term to an input field 910c. In the example of
Through steps 805 and 806, in the example of
The processor 201 may perform a process to return to the window 900a when a button 905 is clicked in the windows 900b and 900c. For example, when the user who has seen the display field 906 or the display field 907b desires to change the selected rule because the number of appearances is larger than expected, the user can select the rule again by clicking the button 905. When a button 909 is clicked in the window 900c, the search term input or selected in the input field 910c or the display field 907c is saved as a new rule.
In other words, in step 609, the processor 201 stores information related to the new rule in the rule information 206 and the character string appearance information 207. In this example, a rule in which the rule number is “n” is added as indicated in the rule information 206b in addition to the rule information 206a. In the term group of this rule, “7” (“McKEY”) of the term group in which the rule number is “2” is released, and “8” (“Mc_KEY”) and “m” (“KEY-Mc”) are added, so that “4,” “6,” “8,” and “m” are obtained. Further, the memo of the rule that was selected and serves as a basis may be copied to the memo of the rule whose rule number in the rule information 206b is “n,” added as “sales analysis of Mc-KEY,” and revised later, or a memo input field (not illustrated) may be provided in the window 900c so that a memo can be input.
Further, in this example, the rule number “n” is added to the relevant rules of the term numbers “4,” “6,” and “8” as indicated in the character string appearance information 207b in addition to the character string appearance information 207a. When the character string appearance information 207a has been created in advance for all the character strings included in the text of the search target, the term number “m” already exists because the character string “KEY-Mc” has appeared as often as the number of appearances “p,” and thus the rule number “n” is added to the relevant rule of the term number “m.” When the character string appearance information 207a has not been created in advance for all the character strings included in the text of the search target, and the character string “KEY-Mc” is not included in the character string appearance information 207a, the term number may be set to “m,” and the character string “KEY-Mc” and the relevant rule “n” may be added. By revising the rule information 206b and the character string appearance information 207b, the rule whose rule number is “n” can be used for a next search.
When the user clicks a button 908 in the window 900c, the processor 201 performs a search in step 610. This search is performed using the character string position list of the character string appearance information 207b as described above. Here, when the character string “KEY-Mc” is not already included in the character string appearance information 207a, the number of appearances and the character string position list of the term number “m” of the character string appearance information 207b may be created by performing pattern matching on the text of the search target using the character string “KEY-Mc.”
In step 611, the processor 201 displays a result of the search in step 610 and ends the search process. Here, the process may return to step 601 without ending the process.
As described above, the search term used in the past in relation to the designated search term is displayed, and thus it is possible to easily reuse the search term used in the past. Particularly, since the display order of the search terms to be reused is controlled based on the attribute of the searcher, and the search terms to be reused enter the selected state by default, it is possible to provide useful search terms so that the search terms can be easily used. Further, it is possible to revise the search term, and the revised search term can be used for the next search.
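The update of step 609 can be sketched as follows. This is a sketch under assumptions: the dictionary layout, the function name, the user name “JIRO,” the concrete values chosen for “m” and “n,” and the character strings shown for the term numbers “4” and “6” are all hypothetical placeholders.

```python
def save_new_rule(rule_info, appearance_info, base, new_number,
                  removed, added, user, memo=None):
    """Derive a new rule from a base rule and register it as a relevant rule."""
    terms = set(rule_info[base]["terms"]) - set(removed)
    terms |= set(added)  # 'added' maps term number -> character string
    rule_info[new_number] = {
        "terms": sorted(terms),
        "creator": user,
        # Copy the memo of the base rule unless a new memo is supplied.
        "memo": memo if memo is not None else rule_info[base]["memo"],
    }
    for number in terms:
        # Create the entry when the character string was not registered yet.
        entry = appearance_info.setdefault(
            number, {"string": added.get(number), "relevant_rules": []})
        if new_number not in entry["relevant_rules"]:
            entry["relevant_rules"].append(new_number)
    return rule_info[new_number]


M, N = 9, 3  # hypothetical concrete values for the term number m and rule number n

rule_info = {2: {"terms": [4, 6, 7], "creator": "TARO",
                 "memo": "sales analysis of Mc-KEY"}}
appearance_info = {
    4: {"string": "Mc-KEY", "relevant_rules": [2]},   # placeholder string
    6: {"string": "mc-key", "relevant_rules": [2]},   # placeholder string
    7: {"string": "McKEY", "relevant_rules": [2]},
    8: {"string": "Mc_KEY", "relevant_rules": []},
}

new_rule = save_new_rule(rule_info, appearance_info, base=2, new_number=N,
                         removed={7}, added={8: "Mc_KEY", M: "KEY-Mc"},
                         user="JIRO")
print(new_rule["terms"], appearance_info[M])
```

Here the entry for the term number “m” is created on demand, which corresponds to the case in which “KEY-Mc” was not yet included in the character string appearance information.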
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2014/081792 | 12/1/2014 | WO | 00 |