1. Technical Field
Embodiments of the present disclosure relate to file search technology, and particularly to an electronic device and method for searching for related terms using the electronic device.
2. Description of Related Art
Related terms of preset query terms can be obtained with a natural language processing (NLP) method that calculates a relevance score between every two of the query terms. Two methods are generally used to calculate this relevance score. In a first method, the relevance score of two query terms is calculated from the angle between the vectors that represent the two terms: the smaller the angle between the two vectors, the larger the relevance score of the two terms.
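The angle-based method is commonly realized as cosine similarity. The sketch below assumes each term is already represented as a numeric vector; the source does not say how the vectors are built, so the co-occurrence counts and the function name `cosine_relevance` are hypothetical illustrations only.

```python
import math
from typing import Sequence


def cosine_relevance(u: Sequence[float], v: Sequence[float]) -> float:
    """Relevance of two terms as the cosine of the angle between their vectors.

    cos(theta) falls as theta grows from 0 to 180 degrees, so a smaller angle
    gives a larger score, matching the first method described above.
    """
    if len(u) != len(v):
        raise ValueError("term vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


# Hypothetical term vectors (e.g., co-occurrence counts over three context words).
mobile_phone = [3.0, 1.0, 0.0]
handheld_phone = [2.0, 1.0, 0.0]
battery = [0.0, 1.0, 4.0]

print(round(cosine_relevance(mobile_phone, handheld_phone), 2))  # 0.99
print(round(cosine_relevance(mobile_phone, battery), 2))  # 0.08
```

With these made-up vectors, "mobile phone" and "handheld phone" point in nearly the same direction and score about 0.99, while "mobile phone" and "battery" score about 0.08.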
In a second method, the relevance score of two query terms is obtained by calculating a conditional probability between the two terms: the larger the conditional probability, the larger the relevance score. Therefore, a new method for searching related terms is desired.
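One common way to estimate such a conditional probability is from document co-occurrence. The sketch below assumes that estimate, P(a | b) = (documents containing both a and b) / (documents containing b); the source does not specify the estimator, and the corpus and the name `conditional_relevance` are hypothetical.

```python
from typing import Iterable, Set


def conditional_relevance(term_a: str, term_b: str, documents: Iterable[Set[str]]) -> float:
    """Estimate P(term_a | term_b) as the fraction of documents containing term_b
    that also contain term_a."""
    containing_b = [doc for doc in documents if term_b in doc]
    if not containing_b:
        return 0.0  # term_b never occurs, so the estimate is undefined; use 0
    both = sum(1 for doc in containing_b if term_a in doc)
    return both / len(containing_b)


# Hypothetical corpus: each document is reduced to the set of terms it contains.
corpus = [
    {"mobile phone", "handheld phone"},
    {"mobile phone", "battery"},
    {"mobile phone", "handheld phone", "slide"},
    {"battery"},
]

print(conditional_relevance("handheld phone", "mobile phone", corpus))
print(conditional_relevance("mobile phone", "handheld phone", corpus))
```

Unlike the cosine score, this measure is asymmetric: in the toy corpus, P("handheld phone" | "mobile phone") is 2/3, while P("mobile phone" | "handheld phone") is 1.0.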
All of the processes described below may be embodied in, and fully automated via, functional code modules executed by one or more general purpose electronic devices or processors. The code modules may be stored in any type of non-transitory computer-readable medium or other storage device. Some or all of the methods may alternatively be embodied in specialized hardware. Depending on the embodiment, the non-transitory computer-readable medium may be a hard disk drive, a compact disc, a digital video disc, a tape drive or other storage medium.
The display device 20 may be used to display search results matching the preset query terms, and the input device 22 may be a mouse or a keyboard used to input computer-readable data. The storage device 23 may be a non-volatile storage device, such as a hard disk or a flash memory card.
The related term search system 24 determines hyponym terms of query terms input by a user, and obtains related terms of the query terms according to the determined hyponym terms. In one embodiment, a hyponym term is a word or phrase whose semantic field is included within that of another word; for example, oak is a hyponym of tree, and dog is a hyponym of animal. The related terms have direct or indirect relationships with the query terms; for example, handheld phone is a related term of mobile phone. In one embodiment, the related term search system 24 may include computerized instructions in the form of one or more programs that are stored in the storage device 23 (or memory) and executed by the at least one processor 25. A detailed description of the related term search system 24 is given in the following paragraphs.
In step S1, the receiving module 201 receives a plurality of query terms input by the user from a client computer, for example, via a mouse, a touch screen, or a keyboard. In one embodiment, unimportant words (e.g., stop words) are removed from the query terms, so that the query terms include only core terms that are important to the search operation. In one embodiment, the unimportant words include at least articles, adverbs, and quantifiers, such as “a”, “the”, and “this”.
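A minimal sketch of the stop-word filtering in step S1 follows. The stop-word list `STOP_WORDS` and the function `remove_stop_words` are hypothetical, as is the choice to lowercase input and to drop stop words from inside multi-word terms; the source only states that articles, adverbs, and quantifiers are removed.

```python
from typing import Iterable, List

# Hypothetical stop-word list: a few articles, adverbs, and quantifiers.
STOP_WORDS = frozenset({"a", "an", "the", "this", "that", "very", "some", "many"})


def remove_stop_words(raw_terms: Iterable[str]) -> List[str]:
    """Keep only the core words of each input term; drop terms left empty."""
    core_terms = []
    for term in raw_terms:
        words = [w for w in term.lower().split() if w not in STOP_WORDS]
        if words:  # a term made only of stop words is discarded entirely
            core_terms.append(" ".join(words))
    return core_terms


print(remove_stop_words(["the", "slide", "a mobile phone"]))
```

Here the input `["the", "slide", "a mobile phone"]` reduces to the core terms `["slide", "mobile phone"]`.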
In step S2, the first determining module 202 determines hyponym terms of each of the query terms from the storage device 23. In one embodiment, the hyponym terms of each of the query terms may be pre-established (or pre-determined) manually and stored in the storage device 23, and the first determining module 202 may obtain the hyponym terms corresponding to each query term from the storage device 23.
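Because the hyponym terms are pre-established and stored, step S2 amounts to a lookup. In the sketch below, an in-memory dictionary `HYPONYM_STORE` stands in for the storage device 23, and `determine_hyponyms` is a hypothetical name; a query term with no stored entry is assumed to map to an empty list.

```python
from typing import Dict, Iterable, List

# Hypothetical manually pre-established mapping: query term -> hyponym terms.
HYPONYM_STORE: Dict[str, List[str]] = {
    "tree": ["oak", "pine"],
    "animal": ["dog", "cat"],
}


def determine_hyponyms(query_terms: Iterable[str],
                       store: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the stored hyponym terms for each query term (empty if none stored)."""
    # list(...) copies each entry so callers cannot mutate the stored lists.
    return {term: list(store.get(term, [])) for term in query_terms}


print(determine_hyponyms(["tree", "rock"], HYPONYM_STORE))
```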
In step S3, the calculating module 203 merges all the hyponym terms of the query terms into a set of hyponym terms and calculates a weight factor for each hyponym term in the set. In one embodiment, the number of times a hyponym term occurs in the merged set (hereinafter referred to as the “occurrence number”) is used as the weight factor of that hyponym term.
For example, suppose that the user inputs four query terms whose hyponym sets are Hyponym1=(h1, h2, h5) for the first query term, Hyponym2=(h2, h4, h5, h7) for the second query term, Hyponym3=(h1, h6) for the third query term, and Hyponym4=(h1, h7, h8) for the fourth query term. The occurrence number of each hyponym term in the merged set of the four hyponym sets is then Hyponym=(h1:3, h2:2, h4:1, h5:2, h6:1, h7:2, h8:1). That is, the weight factors of the hyponym terms are h1=3, h2=2, h4=1, h5=2, h6=1, h7=2, and h8=1.
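The merge-and-count of step S3 maps directly onto Python's `collections.Counter`. This sketch reproduces the worked example above; the function name `weight_hyponyms` is hypothetical, and counting a duplicate inside a single query term's list twice is an assumption, since the source example has no such duplicates.

```python
from collections import Counter
from typing import Dict, Iterable, List


def weight_hyponyms(hyponym_sets: Iterable[List[str]]) -> Dict[str, int]:
    """Merge per-query hyponym lists and use each term's occurrence count as its weight."""
    weights: Counter = Counter()
    for hyponyms in hyponym_sets:
        weights.update(hyponyms)  # counts every occurrence, including duplicates
    return dict(weights)


example = [
    ["h1", "h2", "h5"],        # Hyponym1
    ["h2", "h4", "h5", "h7"],  # Hyponym2
    ["h1", "h6"],              # Hyponym3
    ["h1", "h7", "h8"],        # Hyponym4
]
print(sorted(weight_hyponyms(example).items()))
```

Sorting the result by term name yields the same weights as the example: h1=3, h2=2, h4=1, h5=2, h6=1, h7=2, and h8=1.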
In step S4, the second determining module 204 selects a specified number of hyponym terms from the set of hyponym terms according to their weight factors. For example, the second determining module 204 arranges the hyponym terms in descending order of weight factor and selects the specified number of hyponym terms from the top of that order. In other embodiments, the second determining module 204 may arrange the hyponym terms in another specified order (e.g., ascending order of weight factor) and select the specified number of hyponym terms according to that order.
For example, the rearranged set of the above-mentioned hyponym terms is Hyponym=(h1:3, h2:2, h5:2, h7:2, h4:1, h6:1, h8:1). If the specified number is three, the second determining module 204 selects the hyponym terms h1, h2, and h5.
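The selection in step S4 can be sketched as a sort by weight followed by taking the first N entries. The function name `select_hyponyms` is hypothetical, and so is the tie-breaking rule: Python's `sorted` is stable (also with `reverse=True`), so terms with equal weights keep their input order, which reproduces the ordering h2, h5, h7 in the example above.

```python
from typing import Dict, List


def select_hyponyms(weights: Dict[str, int], count: int, descending: bool = True) -> List[str]:
    """Rank hyponym terms by weight factor and keep the first `count` of them.

    The sort is stable, so equal weights keep the order of `weights`.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=descending)
    return [term for term, _ in ranked[:count]]


weights = {"h1": 3, "h2": 2, "h4": 1, "h5": 2, "h6": 1, "h7": 2, "h8": 1}
print(select_hyponyms(weights, 3))  # ['h1', 'h2', 'h5']
```

Passing `descending=False` covers the ascending-order embodiment mentioned in step S4.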
Because hyponym terms with lower weight factors are filtered out by the second determining module 204, search results based on the hyponym terms of the query terms are more accurate. For example, if the user inputs the two query terms “slide” and “mobile phone”, the present disclosure can determine the accurate hyponym term “slide mobile phone”, and further determine related terms such as “slide smart phone” and “slide handheld phone”. A hyponym term such as “slide battery panel” may be filtered out if it has a lower weight factor. An accurate search operation may then be performed based on the accurate hyponym term, the related terms, and the query terms.
In step S5, the searching module 205 adds the determined hyponym terms to the related terms of the query terms, performs a search operation on a data source (e.g., USPTO) based on the hyponym terms, the related terms, and the query terms to obtain search results, and displays the search results on the display device 20.
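Step S5 can be sketched as building one deduplicated term list and matching it against a data source. The source does not describe the data source's query interface, so `search_documents` is a hypothetical in-memory stand-in that does case-insensitive substring matching, and `build_search_terms` is likewise a hypothetical helper; neither reflects any real search API.

```python
from typing import Iterable, List


def build_search_terms(query_terms: Iterable[str], hyponyms: Iterable[str],
                       related_terms: Iterable[str]) -> List[str]:
    """Add the hyponyms to the related terms, then combine them with the query terms.

    dict.fromkeys drops duplicates while keeping first-seen order.
    """
    related = list(related_terms) + list(hyponyms)
    return list(dict.fromkeys(list(query_terms) + related))


def search_documents(documents: Iterable[str], terms: Iterable[str]) -> List[str]:
    """Toy data source: return documents that mention any of the terms."""
    lowered = [t.lower() for t in terms]
    return [doc for doc in documents if any(t in doc.lower() for t in lowered)]


terms = build_search_terms(
    ["slide", "mobile phone"],
    ["slide mobile phone"],
    ["slide smart phone", "slide handheld phone"],
)
docs = ["A slide smart phone with a large screen", "Battery panel design", "Mobile phone antenna"]
print(search_documents(docs, terms))
```

In this toy run, the first and third documents match, while "Battery panel design" matches none of the terms.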
It should be emphasized that the above-described embodiments of the present disclosure, particularly any embodiments, are merely possible examples of implementations, set forth only for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) of the disclosure without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 201210044065.7 | Feb 2012 | CN | national |