The present invention relates to an apparatus and a method for retrieving contents, and more particularly to an apparatus and a method for retrieving indefinite contents that have some relation to a keyword.
With the spread of information terminals such as cellular phones and personal computers, huge amounts of contents, including movies, images, music, games and electronic books, have become readily available to anyone. There is therefore an increasing demand for using information terminals to retrieve contents, and many proposals have been made to help users efficiently retrieve contents that meet their expectations.
As an example of such improvements, JPA2005-310094 suggests using history data, in which keywords relevant to contents obtained by a user are recorded in association with identification data of the user, to extend the search range to include keywords that are relevant to an input keyword, called extension keywords. Contents retrieved on the basis of the extension keywords may then be provided as extended information or recommendations.
The above prior art also suggests changing the display condition of the retrieved contents depending upon the degree of affinity of the extension keywords with the input keyword. For example, only those contents that contain extension keywords highly similar to the input keyword are displayed. Also, the more similar an extension keyword is to the input keyword, the higher the corresponding contents are placed in the display, or the more prominently they are marked, e.g. with a larger number of asterisks. The degree of similarity is calculated from the number of extension keywords contained in the information attached to the retrieved contents and the letter-string length (number of bytes) of the attached information, using a relational formula involving the number of extension keywords.
In the above prior art, the mutual relations of the extension keywords to the input keyword are not reflected in the displayed retrieval result, so it is not always apparent to the user how and why a given retrieval result was derived from the input keyword.
In view of the foregoing, a primary object of the present invention is to provide a contents-retrieval apparatus and a contents-retrieval method that are more convenient for users and make it easy to understand the relations between the retrieved contents and the input keyword.
According to the invention, an apparatus for contents retrieval comprises a contents storage device storing data of a variety of contents; a thesaurus storage device storing data of a thesaurus that classifies and organizes words according to their mutual relations; a search query input device; a keyword determining device for determining a keyword from a search query input through the search query input device; a related word searching device for searching the thesaurus for words related to the determined keyword, the related word searching device further obtaining information on relations of the retrieved related words to the keyword from the thesaurus; a contents searching device for retrieving contents that correspond to the keyword and the retrieved related words from the contents storage device; and a display device for displaying a result of retrieval by the contents searching device, the display device displaying the retrieved contents on a screen in a manner variable according to the relations between the keyword and the related words, which correspond to the displayed contents.
According to a preferred embodiment, the display device decides positions of the retrieved contents on the screen to reflect the relations between the corresponding keyword and related words. Preferably, the display device displays the contents corresponding to the keyword in a center area of the screen.
It is also preferable that the display device displays the contents corresponding to the keyword in a larger size than the contents corresponding to the related words, especially where the contents are images.
The related word searching device preferably scores the degree of relevancy of each related word to the keyword.
The contents storage device preferably stores additional information on the contents in association with the respective contents, so that the contents searching device searches for the corresponding contents by comparing the keyword and the related words with the additional information.
The contents retrieval apparatus of the invention is preferably provided with a device for regularizing spelling variations in the keyword and/or a device for searching for synonyms of the keyword, so that the related word searching device treats the synonyms as keywords as well and searches for words related to each of these keywords.
It is also preferable to provide the contents retrieval apparatus of the invention with a device for judging whether a word retrieved as a related word by the related word searching device is an antonym of the keyword, so that the related word searching device excludes the retrieved word from the related words if it is judged to be antonymous to the keyword.
According to the invention, a method for contents retrieval comprises steps of:
determining a keyword from an input search query;
retrieving related words to the determined keyword from a thesaurus that classifies and organizes words according to their mutual relations;
obtaining information on relations of the retrieved related words to the keyword from the thesaurus;
retrieving contents that correspond to the keyword and the retrieved related words from among a variety of contents; and
displaying the retrieved contents on a screen in a manner variable according to the relations of the related words to the keyword, which correspond to the displayed contents.
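The five method steps can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory thesaurus (`THESAURUS`) that already carries a relation label and a relevancy score for each related word, and a hypothetical contents store (`CONTENTS`) whose items carry tag words as additional information; the function names are illustrative, not taken from the specification. Keyword determination is reduced to a placeholder, and the display step is reduced to returning the results in display order together with the relation that produced each hit.

```python
# Hypothetical toy thesaurus: keyword -> [(related word, relation, relevancy score)].
THESAURUS = {
    "dog": [("puppy", "narrower", 98), ("animal", "broader", 98), ("hound", "synonym", 99)],
}
# Hypothetical toy contents store: each item carries tag words as additional information.
CONTENTS = [
    {"id": 1, "tags": {"dog"}},
    {"id": 2, "tags": {"puppy"}},
    {"id": 3, "tags": {"animal", "cat"}},
    {"id": 4, "tags": {"car"}},
]


def determine_keyword(query: str) -> str:
    # Step 1 placeholder: a real system would apply syntactic and morphemic analysis.
    return query.strip().lower()


def retrieve(query: str) -> list:
    keyword = determine_keyword(query)                    # step 1
    related = THESAURUS.get(keyword, [])                  # steps 2 and 3: words plus relation info
    terms = [(keyword, "keyword", 100)] + related
    terms.sort(key=lambda t: -t[2])                       # most relevant terms first (stable sort)
    results, seen = [], set()
    for word, relation, score in terms:                   # step 4: match contents to each term
        for item in CONTENTS:
            if word in item["tags"] and item["id"] not in seen:
                seen.add(item["id"])
                results.append((item["id"], word, relation, score))
    # Step 5: results come back in display order; keeping the relation with each
    # item lets a display step vary position or size according to that relation.
    return results
```

For example, `retrieve("Dog")` returns the item tagged "dog" first, followed by the items found through the narrower term "puppy" and the broader term "animal".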
Since the contents retrieved based on the keyword and its related words are displayed on a screen in a manner variable according to the relations of the related words to the keyword, the present invention facilitates understanding the relations between the retrieved contents and the input keyword, and thus helps the users retrieve expected contents conveniently.
The above and other objects and advantages of the present invention will be more apparent from the following detailed description of the preferred embodiments when read in connection with the accompanying drawings, wherein like reference numerals designate like or corresponding parts throughout the several views.
For mutual data communication, the digital camera 10 is connected to the personal computer 12 through a communication cable, e.g. of an IEEE1394 type or a USB (universal serial bus) type, or through a wireless LAN. The recording medium 11 can also exchange data with the personal computer 12 through a specific driver.
The personal computer 12 is provided with a monitor 15 and an operating section 16 consisting of a keyboard and a mouse. Referring to
The HDD 23 stores various programs and data for operating the personal computer 12, a viewer software program that integrally handles registration and retrieval of images, and a number of image data files read out from the digital camera 10 and the recording medium 11. The CPU 20 reads a program out of the HDD 23, loads it into the RAM 22 and executes it sequentially. The CPU 20 controls the respective components of the personal computer 12 according to operational signals input through the operating section 16.
The communication interface 24 handles data communication with external instruments such as the digital camera 10 and with communication networks such as the Internet 13. The display controller 25 controls the monitor 15 to display appropriate screens.
As shown in
The thesaurus dictionary classifies words in a tree structure, wherein subordinate words, i.e. words with narrower meanings, are branched from a super-ordinate word, i.e. a word with a broader meaning. As shown for example in
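A tree-structured thesaurus of this kind can be represented with parent links alone. The following minimal Python sketch assumes a hypothetical `PARENT` mapping from each word to its superordinate word, and derives whether one word is a broader or narrower term of another and how many levels separate them; the sample words and function names are illustrative only and do not describe the actual thesaurus database.

```python
# Hypothetical thesaurus tree: each word points to its superordinate (broader) word.
PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "sports car": "car",
    "vehicle": None,
}


def ancestors(word):
    """Broader terms of `word`, nearest first."""
    chain = []
    parent = PARENT.get(word)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain


def children(word):
    """Narrower terms branched directly from `word`."""
    return [child for child, parent in PARENT.items() if parent == word]


def relation(word, other):
    """('broader', n) if `other` lies n levels above `word`,
    ('narrower', n) if it lies n levels below, otherwise None."""
    up = ancestors(word)
    if other in up:
        return ("broader", up.index(other) + 1)
    down = ancestors(other)
    if word in down:
        return ("narrower", down.index(word) + 1)
    return None
```

In this sketch, sibling words such as "car" and "bicycle" are not directly related; they are reachable only through their common broader term.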
To register or retrieve images, the user starts up the viewer software by operating the operating section 16. When the viewer software starts up, an authentication procedure is carried out to authorize access to the server 14. Once access is authorized, images can be registered and retrieved. The viewer software has an image registration mode and an image retrieval mode. To register images, a list of the images stored in the HDD 23 is displayed as thumbnails on the monitor 15; the user selects images from the list by operating the operating section 16 and inputs additional information, such as a title, tags and comments, for each selected image. The selected images are then registered. To retrieve images, the user operates the operating section 16 to input a string of letters or characters as a search query.
When the image retrieval mode of the viewer software is selected, the CPU 20 constructs a keyword determining section 40, a related word searching section 41 and an image searching section 42, as shown in
The keyword determining section 40 analyzes the letter string input through the operating section 16 to determine keywords for searching. For example, if the input letter string is a noun such as “flower” or “lion”, the keyword determining section 40 recognizes the input letter string itself as a keyword. If the input letter string is a sentence such as “I am looking for pictures of red cars”, the keyword determining section 40 subjects it to a syntactic analysis, which analyzes the grammatical construction of the sentence, and a morphemic analysis, which divides the sentence into morphemes (the smallest meaningful linguistic units) and parses them. On the basis of the analysis results, the keyword determining section 40 extracts a term or word that is appropriate as a search term or keyword; in this example, “red car” is determined to be the keyword. If the input letter string directly designates a kind of image, e.g. “images of Tokyo tower”, a word that is probably contained in the additional information on the designated images, “Tokyo tower” in this example, is regarded as a keyword, the additional information having been input by the user at the time of registering each image. The keyword determining section 40 outputs data of the determined keyword to the related word searching section 41 and to the image searching section 42.
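The three cases above (a bare noun, a sentence, and a direct designation of a kind of image) can be imitated for English input with a toy Python sketch. It is explicitly not the syntactic and morphemic analysis described above: it substitutes a hypothetical stop-word list (`STOP_WORDS`), a hypothetical designation pattern (`DESIGNATION`) and crude plural stripping, which a real system would replace with a proper morphological analyser, particularly for Japanese text.

```python
import re

# Hypothetical stop words and designation pattern standing in for real
# syntactic/morphemic analysis; both are assumptions for illustration only.
STOP_WORDS = {"i", "am", "looking", "for", "pictures", "picture", "images", "image",
              "photos", "of", "the", "a", "an", "some", "show", "me", "please"}
DESIGNATION = re.compile(r"^(?:pictures|images|photos)\s+of\s+(.+)$", re.IGNORECASE)


def determine_keyword(query: str) -> str:
    text = query.strip().rstrip(".?!")
    match = DESIGNATION.match(text)
    if match:                                   # e.g. "images of Tokyo tower"
        return match.group(1)
    words = text.split()
    if len(words) == 1:                         # a single noun such as "flower"
        return words[0]
    # A sentence: keep content words only, then strip a trailing plural "s".
    content = [w for w in words if w.lower() not in STOP_WORDS]
    content = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in content]
    return " ".join(content)
```

Applied to the three examples in the text, this sketch yields “flower”, “red car” and “Tokyo tower” respectively.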
The related word searching section 41 accesses the thesaurus database 30 through the communication interface 24 and searches it for words related to the keyword determined by the keyword determining section 40. The related word searching section 41 also retrieves relevancy information on these related words from the thesaurus database 30. The relevancy information indicates how each related word relates to the input keyword. As for the example of
On retrieving the related words, the related word searching section 41 scores the degree of relevancy of each word to the input keyword. For example, the related word searching section 41 converts the semantic distance between the input keyword and the retrieved related word into a numerical value. Specifically, the input keyword is assigned a perfect score, e.g. 100 points, and its related words are scored by a mark-back (deduction) system: a synonym is −1 point, a broader term and a narrower term are −2 points each, and an antonym is −3 points. A narrower term subordinate to a synonym of the keyword is (−1)+(−2)=−3 points, and a term still narrower than a narrower term is (−2)+(−2)=−4 points. Referring again to the example of
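The mark-back arithmetic can be written out as a short Python sketch. It uses the example point values given above and assumes, generalizing from the two worked examples, that deductions simply add up along the chain of relations leading from the keyword to the related word; `relevancy_score` and the relation labels are hypothetical names.

```python
# Deductions per relation step, using the example values from the text.
PENALTY = {"synonym": 1, "broader": 2, "narrower": 2, "antonym": 3}
PERFECT = 100  # score assigned to the input keyword itself


def relevancy_score(path):
    """path: relation steps from the keyword to the related word,
    e.g. ["synonym", "narrower"] for a narrower term of a synonym."""
    return PERFECT - sum(PENALTY[step] for step in path)
```

With these values, `relevancy_score(["synonym", "narrower"])` gives 97 and `relevancy_score(["narrower", "narrower"])` gives 96, matching the −3 and −4 point examples.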
The image searching section 42 receives information on the keyword from the keyword determining section 40 and information on the related words from the related word searching section 41, and accesses the image database 31 through the communication interface 24 to search the image database 31 for images that match the received keyword and related words. The image searching section 42 compares the keyword and related words with the additional information of the stored images to check whether the keyword or any related word is contained in the title, the tags or the comments of each image. Thus, the image searching section 42 retrieves a predetermined number of images and outputs data of the retrieved images to the display controller 25.
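The matching against the additional information can be sketched in Python as follows. The `Image` record, the `search_images` function and the matching rules (case-insensitive substring match on the title and comments, exact match on tags) are assumptions made for illustration; a substring match would, for instance, also find "cat" inside "category", which a production system would need to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical stored image with its user-entered additional information."""
    image_id: str
    title: str = ""
    tags: list = field(default_factory=list)
    comments: str = ""


def matches(image, word):
    w = word.lower()
    return (w in image.title.lower()
            or any(w == tag.lower() for tag in image.tags)
            or w in image.comments.lower())


def search_images(images, terms, limit):
    """terms: the keyword first, then related words in order of relevancy.
    Returns up to `limit` (matched term, image_id) pairs, each image at most once."""
    found, seen = [], set()
    for term in terms:
        for image in images:
            if image.image_id not in seen and matches(image, term):
                found.append((term, image.image_id))
                seen.add(image.image_id)
                if len(found) == limit:          # predetermined number reached
                    return found
    return found
```

Recording the matched term with each image keeps the link between every hit and the keyword or related word that produced it, which the display step needs.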
On the basis of the image data from the image searching section 42, as well as the respective scores of the retrieved related words and their relevancy information stored in the RAM 22, the display controller 25 controls the monitor 15 to display a search result display window 50, as shown for example in
Now the processing procedure of the image registering retrieving system 2 as constructed above will be described with reference to the flowchart of
The user operates the operating section 16 to input an arbitrary string of letters for searching. Then, the keyword determining section 40 analyzes the input letter string through the syntactic and morphemic analyses to determine a keyword for searching. Information on the determined keyword is output to the related word searching section 41 and the image searching section 42.
Upon receipt of the keyword from the keyword determining section 40, the related word searching section 41 accesses the thesaurus database 30 through the communication interface 24 to retrieve related words to the keyword, together with their relevancy information, from the thesaurus database 30. The related word searching section 41 scores the degree of relevancy of each related word to the keyword. The related word searching section 41 continues retrieving related words and scoring their relevancy until it has retrieved a predetermined number of related words or the related words scoring above a predetermined score. After the retrieval is completed, information on the retrieved related words is output to the image searching section 42. The score and the relevancy information of each related word are temporarily stored in the RAM 22.
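One way to realize "retrieve and score until a predetermined number or a score threshold is reached" is a best-first search over the thesaurus relations, which is the technique chosen in the Python sketch below. The graph `EDGES`, the function `related_words` and its parameters `max_words` and `min_score` are hypothetical; the deductions follow the mark-back example values. Because every step deducts at least one point, candidates leave the priority queue in order of decreasing score, so the search can stop as soon as the best remaining candidate falls below `min_score`.

```python
import heapq

PENALTY = {"synonym": 1, "broader": 2, "narrower": 2, "antonym": 3}

# Hypothetical thesaurus graph: word -> [(neighbour, relation of neighbour to word)].
EDGES = {
    "car": [("automobile", "synonym"), ("vehicle", "broader"), ("sports car", "narrower")],
    "automobile": [("car", "synonym"), ("electric car", "narrower")],
    "vehicle": [("car", "narrower"), ("bicycle", "narrower")],
    "sports car": [("car", "broader")],
}


def related_words(keyword, max_words, min_score):
    """Best-first expansion from the keyword (score 100). Returns
    [(word, relation path, score)], stopping after `max_words` words or
    once the best remaining candidate scores below `min_score`."""
    best = {keyword: 100}
    heap = [(-100, keyword, ())]
    found = []
    while heap and len(found) < max_words:
        neg_score, word, path = heapq.heappop(heap)
        score = -neg_score
        if score < best[word]:
            continue                      # stale entry: a better path was found later
        if score < min_score:
            break                         # all remaining candidates score lower still
        if word != keyword:
            found.append((word, path, score))
        for neighbour, relation in EDGES.get(word, []):
            s = score - PENALTY[relation]
            if s > best.get(neighbour, float("-inf")):
                best[neighbour] = s
                heapq.heappush(heap, (-s, neighbour, path + (relation,)))
    return found
```

Keeping the full relation path with each word (for example `("synonym", "narrower")` for "electric car") preserves the relevancy information that the display step uses. This sketch does not exclude antonyms.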
In response to the information on the input keyword from the keyword determining section 40 and the information on the related words from the related word searching section 41, the image searching section 42 accesses the image database 31 through the communication interface 24 to search for keyword images and related images. After a predetermined number of images have been retrieved, data of the retrieved images and the score and relevancy information of each related word, as stored in the RAM 22, are sent to the display controller 25, and the display controller 25 controls the monitor 15 to display the search result display window 50.
When the user puts a pointer 53 on one of the keyword images 51 in the search result display window 50 and clicks the mouse of the operating section 16 on it, the related word searching section 41 retrieves related words other than those already retrieved, or as indicated by dashed lines in
When the user puts the pointer 53 on one of the related images 52 and clicks on it, the keyword determining section 40 determines the word corresponding to the chosen related image 52 to be a new keyword. Then, words related to the new keyword, and images corresponding to the new keyword and its related words, are retrieved in the same way as described above, and the display contents of the search result display window 50 change accordingly. A history of the display contents of the search result display window 50 and of the chosen images 51 and 52 is temporarily stored in the RAM 22, so that the search result display window 50 can return to a previous display screen. Thus, by choosing the images 51 and 52 one after another, the user can browse a wide variety of images, including those retrieved based on words related to the related words of the keyword, without needing to input another keyword or letter string. Since the images corresponding to the keyword and the related words are retrieved simultaneously and displayed on the same screen, the user may enjoy browsing the images just like surfing a sea of information. The image registering retrieving system 2 is thus very convenient for searching for indefinite contents.
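The click-to-re-search behaviour with its return path amounts to a history stack. The Python sketch below models it with a hypothetical `BrowseSession` class that takes any search function mapping a keyword to screen contents; it covers only choosing a related image as the new keyword and going back, not the expansion triggered by clicking a keyword image.

```python
class BrowseSession:
    """Hypothetical model of re-searching from a chosen related image,
    with a stack of previous screens so the window can return to them."""

    def __init__(self, search):
        self.search = search      # callable: keyword -> screen contents
        self.history = []         # previous (keyword, screen) pairs, newest last
        self.current = None

    def start(self, keyword):
        self.current = (keyword, self.search(keyword))

    def choose_related(self, word):
        # The word behind the chosen related image becomes the new keyword.
        self.history.append(self.current)
        self.current = (word, self.search(word))

    def back(self):
        # Return to the previous screen; stay put if there is none.
        if self.history:
            self.current = self.history.pop()
        return self.current
```

A stack suits this behaviour because returning always restores the most recently left screen first.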
As described so far, the images 51 and 52 are displayed according to the degrees of relevancy of the related words to the keyword, so that the relations between the keyword images 51 and the related images 52 are visually recognizable. This further improves convenience for the user when searching.
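A layout that reflects relevancy can be sketched in Python as follows, following the preferred arrangement in which keyword images are placed in the center area at a larger size. The rules for distance and size (`40` and `20` pixels per deducted point), the screen dimensions and the function name `layout` are all hypothetical choices made for illustration, not values from the embodiment.

```python
import math


def layout(items, width=800, height=600, base_size=160):
    """items: list of (image_id, score); a score of 100 marks a keyword image.
    Returns {image_id: (x, y, size)} in screen pixels."""
    cx, cy = width / 2, height / 2
    max_r = min(width, height) / 2 - base_size / 4        # keep thumbnails roughly on screen
    keyword = [i for i, s in items if s >= 100]
    related = [(i, s) for i, s in items if s < 100]
    positions = {}
    for k, image_id in enumerate(keyword):
        # Keyword images: centre area, full size, spread horizontally if several.
        offset = (k - (len(keyword) - 1) / 2) * base_size
        positions[image_id] = (cx + offset, cy, base_size)
    for k, (image_id, score) in enumerate(related):
        deficit = 100 - score                              # points deducted from the perfect score
        r = min(max_r, base_size + 40 * deficit)           # hypothetical rule: less relevant = farther out
        size = max(40, base_size - 20 * deficit)           # hypothetical rule: less relevant = smaller
        angle = 2 * math.pi * k / len(related)             # spread related images around the centre
        positions[image_id] = (cx + r * math.cos(angle), cy + r * math.sin(angle), size)
    return positions
```

Both distance from the center and thumbnail size are driven by the same deducted points, so a related image's position and size each indicate how closely its word relates to the keyword.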
In the Japanese writing system, letter strings that may be input as search terms (keywords or key phrases) can be written in different scripts and character forms, such as kanji (Chinese characters), hiragana, katakana and roman letters, in single-byte or double-byte characters, and in uppercase or lowercase letters. That is, the user can express the same word in different scripts. For example, the word for “dog” may be written in hiragana, katakana or kanji characters. Furthermore, different kanji characters are sometimes used to spell the same word depending on the field of application. For example, the term for “superconductivity” is written with different kanji characters in the JIS standards, i.e. in the industrial field, and in academic parlance, i.e. in the academic field. Also, several different katakana spellings are used for some foreign or loan words, for example, “interface”. There are also spelling variations between modern and traditional forms of kanji characters, as well as differences in inflectional kana endings (okurigana) and in punctuation. Besides, the input letter strings can contain typographical errors and omitted characters.
Moreover, the Japanese language is rich in synonyms compared with other languages. Taking as an example the word “I”, the pronoun a speaker uses to refer to himself or herself, there are dozens of synonyms in Japanese: “watashi”, “boku”, “ore”, “wagahai”, “shousei”, etc. There are also a huge number of abbreviations, like “Toudai” for “Tokyo-daigaku” (the University of Tokyo) or “Souri” for “Naikaku-Souridaijin” (prime minister), as well as a huge number of common names, like “Shusho” for “Naikaku-Souridaijin”. Hereinafter, the above-mentioned tolerated variations and potential spelling mistakes will be collectively called spelling variation.
For this reason, it is preferable to provide a spelling variation regularizing section 60 and a synonym search section 61 downstream of the keyword determining section 40, as shown in
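Part of the spelling variation regularizing and synonym search steps can be sketched with Python's standard `unicodedata` module. NFKC normalization folds full-width Latin letters and digits to their half-width forms and half-width katakana to full-width katakana; katakana is then mapped to hiragana by the fixed code-point offset between the two blocks. The `VARIANTS` and `SYNONYMS` tables and the function names are hypothetical stand-ins for real dictionaries; kanji variants and kanji readings cannot be resolved this way and would need a dictionary of their own.

```python
import unicodedata

# Hypothetical variant table: abbreviations and common names -> one canonical form.
VARIANTS = {
    "toudai": "tokyo-daigaku",
    "souri": "naikaku-souridaijin",
    "shusho": "naikaku-souridaijin",
}
# Hypothetical synonym groups used by the synonym search step.
SYNONYMS = {
    "watashi": ["boku", "ore", "wagahai", "shousei"],
}


def regularize(text: str) -> str:
    # NFKC: full-width Latin -> half-width, half-width katakana -> full-width katakana.
    s = unicodedata.normalize("NFKC", text).casefold()
    # Katakana U+30A1..U+30F6 sit exactly 0x60 above hiragana U+3041..U+3096.
    s = "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in s)
    return VARIANTS.get(s, s)


def expand_keywords(text: str) -> list:
    """Regularized keyword followed by its synonyms, each usable as a keyword."""
    key = regularize(text)
    return [key] + SYNONYMS.get(key, [])
```

For instance, a full-width “ＴＯＵＤＡＩ” is regularized to “tokyo-daigaku”, and half-width and full-width katakana spellings of “inu” (dog) both reduce to the same hiragana string.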
It is also possible to provide an antonym distinguishing section 70, as shown in
Although the number of related words and the number of images retrieved based on a keyword are predetermined in the above embodiment, the user may be allowed to preset and change these numbers. For scoring the degree of relevancy of each related word to the keyword, a method other than the above-mentioned mark-back system may be used. For example, scores for the respective words may be registered in advance in the thesaurus database 30. It is also possible to weight the scores word by word, so that a word which is distant in meaning from a keyword but is closely associated with it, e.g. “top of Japan” for “Mt. Fuji”, gets a high score. Furthermore, a history of choices of the related images 52 may be recorded, so that frequently chosen words get higher scores.
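The alternative scoring methods can be combined in one function, as in the Python sketch below. The combination rule is a hypothetical choice: a score registered in advance overrides the mark-back score, a per-word weight is added for closely associated words, and a choice-history bonus grows logarithmically with the number of times the word's related image was chosen (`alpha` is an assumed tuning constant), so that frequent choices raise a score without letting it grow without bound too quickly.

```python
import math
from collections import Counter


def combined_score(word, deduction_score, registered=None, weights=None,
                   clicks=None, alpha=2.0):
    """Hypothetical combination of the scoring alternatives.
    registered: scores stored in advance in the thesaurus (override the deduction score)
    weights:    per-word bonus for closely associated words
    clicks:     Counter of how often each word's related image was chosen"""
    registered = registered or {}
    weights = weights or {}
    clicks = clicks or Counter()
    base = registered.get(word, deduction_score)
    return base + weights.get(word, 0) + alpha * math.log1p(clicks[word])
```

For example, with a weight of 8 for “top of Japan”, that word reaches 98 points even if its mark-back score is only 90.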
Although the above-described embodiment refers to images as the contents to be retrieved, the present invention is applicable to other cases where the retrieval contents are movies, music, games, electronic books and the like. The present invention is also applicable to retrieval of commercial articles on websites.
In the above-described embodiment, the devices 40 to 42 for executing the image retrieval are constructed by the CPU 20 when the user starts up the viewer software program and selects the image retrieval mode. However, the devices 40 to 42 may instead be mounted in the personal computer 12 as hardware components, e.g. in the form of discrete circuits or an FPGA (field programmable gate array). It is also possible to construct the devices 40 to 42 as separate units that are connectable to the personal computer 12. The spelling variation regularizing section 60 and the synonym searching section 61 may be provided in the keyword determining section 40. Also, the antonym distinguishing section 70 may be provided in the related word searching section 41.
Thus, the present invention is not to be limited to the above-described embodiments but, on the contrary, various modifications will be possible without departing from the scope of the claims appended hereto.
Number | Date | Country | Kind |
---|---|---|---|
2007-041112 | Feb 2007 | JP | national |