This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2014-190225, filed Sep. 18, 2014, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a tag adding apparatus and tag adding method.
A function of adding a tag (also referred to as an annotation, notes, or the like) is provided to apparatuses or services dealing with electronic content as a way to classify/arrange electronic content such as web pages, electronic documents, electronic books, etc. With such an environment, the user can add a desired tag to the electronic content by using a text entered by the user, text entered by some other user, or mechanically determined text. The tag is utilized, for example, to search for content.
However, when the user adds a tag, not much consideration is given to ease of searching in many cases. As methods of retrieving content to which a tag is added, search refinement by selecting a tag and full-text searching similar to general document searching are generally employed. When the tag added by the user is inappropriate, it is difficult to find the target content by search refinement. In such a case, the user eventually resorts to using full-text searching, and the added tag is not utilized. A technique for enhancing search potential while allowing the user to have the flexibility to personally add a tag to the content becomes necessary.
According to one embodiment, a tag adding apparatus includes an input unit, a storage unit, a search unit, an analyzer, a determination unit, and a registration unit. The input unit inputs an input tag added to an input content item. The storage unit stores registered content items in association with registered tags added to the registered content items. The search unit retrieves a first content aggregate and a second content aggregate from the storage unit, the first content aggregate being an aggregate of registered content items to which registered tags matching the input tag are added, the second content aggregate being an aggregate of registered content items to which registered tags matching an additional tag candidate are added. The analyzer analyzes the number of registered content items of the first content aggregate and an inclusion relationship between the first content aggregate and the second content aggregate. The determination unit determines an additional tag to be additionally added to the input content item based on a result of analysis by the analyzer. The registration unit registers the input content item in the storage unit in association with the input tag and the additional tag.
Hereinafter, various embodiments will be described with reference to the drawings. In the following embodiments, identical elements are denoted by identical reference symbols, and duplicated descriptions are omitted.
The tag adding apparatus 100 determines an additional tag, which is a tag to be additionally added to the electronic content, based on an input tag that is a tag added to the electronic content by the user, and preserves the electronic content in association with the input tag and the additional tag. The additional tag makes it easy for the user to retrieve desired content, i.e., the additional tag enhances search potential. Examples of the electronic content include web pages, electronic documents, TV or other programs, still images, moving images, etc. A tag adding operation is included in the function of classifying/arranging the electronic content such as a storage function of the electronic content, bookmarking function, etc. Such functions are provided to devices or services dealing with electronic content. In the following descriptions, electronic content will simply be called content.
As shown in
The tag input unit 101 receives an input tag added to the content item. In the following, a content item to which an input tag is added by the user is called an input content item. The input tag is designated by the user when the content item is stored in the content storage unit 102. In an example, the user directly inputs text by using a keyboard or a software keyboard. In another example, the user designates an input tag by speaking. In this case, the speech of the user is converted into text by a speech-recognition technique. In still another example, the user designates an input tag by handwritten character input using a touch panel. In this case, the handwritten characters of the user are converted into text by a character recognition technique. In still another example, the user selects one of tag candidates recommended and presented by an application.
The content storage unit 102 stores therein the content items in association with the tags added to the content items. In the following, the content item stored in the content storage unit 102 is called the registered content item, and the tag added to the registered content item is called the registered tag.
The search unit 103 retrieves from the content storage unit 102 a plurality of content aggregates including a first content aggregate and second content aggregate, based on the input tag. The first content aggregate is an aggregate of registered content items to which registered tags matching (i.e., coincident with or similar to) the input tag are added. The search unit 103 searches the content storage unit 102 by using the input tag as a search query to thereby acquire the first content aggregate. As the search query, an additional tag candidate to be set based on a result of an analysis of the first content aggregate carried out by the content aggregate analyzer 104 can also be used in addition to the input tag. The search unit 103 searches the content storage unit 102 by using an additional tag candidate as a search query to thereby acquire an aggregate of registered content items to which registered tags matching the additional tag candidate are added as a second content aggregate. When there is a plurality of additional tag candidates, a second content aggregate is created for each of the additional tag candidates.
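The retrieval of the first and second content aggregates described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: the in-memory dictionary standing in for the content storage unit 102, the function names, and the sample data are all hypothetical, matching is simplified to case-insensitive equality (the description also allows similar tags), and every other tag found in the first content aggregate is treated as an additional tag candidate.

```python
from typing import Dict, Set

# Hypothetical stand-in for the content storage unit 102:
# content item ID -> set of registered tags added to that item.
Storage = Dict[str, Set[str]]


def search_by_tag(storage: Storage, tag: str) -> Set[str]:
    """Return IDs of registered content items having a tag that matches `tag`.

    Assumption: "matching" is reduced to case-insensitive equality.
    """
    wanted = tag.casefold()
    return {cid for cid, tags in storage.items()
            if any(t.casefold() == wanted for t in tags)}


def second_aggregates(storage: Storage, first: Set[str],
                      input_tag: str) -> Dict[str, Set[str]]:
    """Build one second content aggregate per additional tag candidate.

    Assumption: every registered tag in the first aggregate, other than the
    input tag itself, is used as a candidate.
    """
    skip = input_tag.casefold()
    candidates = {t for cid in first for t in storage[cid]
                  if t.casefold() != skip}
    return {tag: search_by_tag(storage, tag) for tag in candidates}


# Hypothetical sample data.
storage: Storage = {
    "doc1": {"science", "biology"},
    "doc2": {"science", "math"},
    "doc3": {"education", "biology"},
}
first = search_by_tag(storage, "Science")
second = second_aggregates(storage, first, "science")
```

Here `first` holds the items tagged "science", and `second` maps each candidate tag to the aggregate retrieved for it, so one second content aggregate exists per candidate.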
The content aggregate analyzer 104 analyzes a plurality of content aggregates retrieved by the search unit 103. Specifically, the content aggregate analyzer 104 analyzes the number of content items of the first content aggregate, and an inclusion relationship between the first content aggregate and second content aggregate.
The additional tag determination unit 105 determines an additional tag to be additionally added to the input content item on the basis of a result of the analysis carried out by the content aggregate analyzer 104. The content registration unit 106 registers the input content item in the content storage unit 102 in association with the input tag and additional tag.
As described above, the content item is preserved in the content storage unit 102 together with the tag (input tag) added by the user, and a tag (additional tag) recommended by the tag adding apparatus 100. In general, the user does not always add a tag to the content item after taking the ease of searching into consideration. Accordingly, when only a tag added by the user is simply added to the content item, a problem exemplified in the following is caused at the time of searching in some cases.
When falling into such situations, the user eventually uses, for example, full-text searching, and the added tag is not utilized. The tag adding apparatus 100 according to this embodiment adds an additional tag in order to enhance the search potential while allowing the user to have the flexibility to personally add a tag.
Next, an operation of the tag adding apparatus 100 will be described below.
In step S303, at least one of registered tags added to the registered content items in the first content aggregate is set to an additional tag candidate, and further searching is carried out by using the additional tag candidate. An aggregate of registered content items acquired for each additional tag candidate is output as a second content aggregate. Specifically, the content aggregate analyzer 104 creates one or more sub-aggregates from the first content aggregate. In an example of a method of creating a sub-aggregate, a registered tag contributory to (i.e., useful for) search refinement of the input content item in the first content aggregate is selected from registered tags added to the registered content items in the first content aggregate, and an aggregate of the registered content items to each of which the selected registered tag is added is set to a sub-aggregate. As the evaluation criterion, criteria utilized in the decision tree construction such as ID3, C4.5, and the like can be used. The registered tag forming the sub-aggregate is set to an additional tag candidate. It should be noted that a word in the input content item may be set to the additional tag candidate.
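One way to score how useful a registered tag is for search refinement is sketched below. It is a simplified stand-in and not ID3 or C4.5 themselves: those criteria measure information gain against a class label, which the description does not specify, so this sketch instead uses the entropy of the split between items that have a tag and items that lack it. A tag that splits the first content aggregate evenly scores highest. The function names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def split_entropy(n_with: int, n_total: int) -> float:
    """Binary entropy (in bits) of the has-tag / lacks-tag split."""
    if n_total == 0 or n_with in (0, n_total):
        return 0.0
    p = n_with / n_total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_candidate_tags(first_aggregate: Dict[str, Set[str]],
                        input_tag: str) -> List[Tuple[str, float]]:
    """Rank registered tags of the first aggregate by split entropy.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(t for tags in first_aggregate.values()
                     for t in tags if t != input_tag)
    n = len(first_aggregate)
    scored = [(tag, split_entropy(c, n)) for tag, c in counts.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical first content aggregate for the input tag "science".
first_aggregate = {
    "d1": {"science", "biology"},
    "d2": {"science", "biology"},
    "d3": {"science", "math"},
    "d4": {"science", "english"},
}
ranked = rank_candidate_tags(first_aggregate, "science")
```

The highest-ranked tags would then form the sub-aggregates and serve as additional tag candidates.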
In step S304, the content aggregate analyzer 104 determines whether or not a second content aggregate including the first content aggregate is present. In one embodiment, the statement that the second content aggregate includes the first content aggregate indicates that the whole first content aggregate is included in the second content aggregate. In another embodiment, this condition may be relaxed. That is, the second content aggregate can be regarded as including the first content aggregate when a ratio of the registered content items included in both the first content aggregate and the second content aggregate to all of the registered content items in the first content aggregate is equal to or greater than a threshold. When a second content aggregate including the first content aggregate is present, the processing advances to step S305 and, when a second content aggregate including the first content aggregate is not present, the processing advances to step S306.
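The strict and relaxed inclusion conditions of step S304 can be expressed with a single ratio test, as in the following sketch. The function name and the handling of an empty first content aggregate (treated as not included) are assumptions made for illustration.

```python
from typing import Set


def includes(first: Set[str], second: Set[str], threshold: float = 1.0) -> bool:
    """Return True when `second` includes `first` in the sense of step S304.

    threshold=1.0 gives the strict condition (the whole first aggregate is
    contained); a smaller value gives the relaxed, ratio-based condition.
    Assumption: an empty first aggregate is never considered included.
    """
    if not first:
        return False
    ratio = len(first & second) / len(first)
    return ratio >= threshold


strict_hit = includes({"a", "b"}, {"a", "b", "c"})
strict_miss = includes({"a", "b", "c", "d"}, {"a", "b", "c"})
relaxed_hit = includes({"a", "b", "c", "d"}, {"a", "b", "c"}, threshold=0.75)
```

In the last call three of the four items of the first aggregate are shared, so the ratio of 0.75 meets the relaxed threshold although strict inclusion fails.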
In step S305, the additional tag determination unit 105 determines a registered tag forming the second content aggregate including the first content aggregate as an additional tag. The additional tag determined in step S305 corresponds to superordinate conception of the input tag.
In step S306, the content aggregate analyzer 104 determines whether or not the number of content items of the first content aggregate is equal to or greater than a threshold. This threshold may be a constant determined in advance, or may be changeable, for example, by adjusting it according to the number of registered content items stored in the content storage unit 102. When the number of the content items of the first content aggregate is equal to or greater than the threshold, the processing advances to step S307 and, when the number of the content items of the first content aggregate is smaller than the threshold, the processing advances to step S308.
In step S307, the additional tag determination unit 105 determines, as an additional tag, at least one of the registered tags that are added to the registered content items in the first content aggregate and that are contributory to search refinement of the input content item in the first content aggregate. Specifically, the additional tag determination unit 105 determines, as an additional tag, the registered tag forming the sub-aggregate, among the sub-aggregates, to which the input content item conforms. Alternatively, a registered tag which is high in the Inverse Document Frequency (IDF) may be selected from among registered tags added to the registered content items in the first content aggregate as an additional tag. Thereby, it becomes even easier to refine the input content in the sub-aggregate. Alternatively, the additional tag determination unit 105 may determine, as an additional tag, a word in the input content item which is contributory to search refinement of the input content item in the first content aggregate. The additional tag determined in step S307 corresponds to subordinate conception of the input tag.
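The IDF-based alternative in step S307 can be sketched as follows. The description does not state over which collection the document frequency is counted; this sketch assumes it is counted over all registered content items, uses the common form log(N / df), and restricts candidates to tags found in the first content aggregate. The names and sample data are hypothetical.

```python
import math
from typing import Dict, Optional, Set


def tag_idf(storage: Dict[str, Set[str]], tag: str) -> float:
    """IDF of a tag, log(N / df), over all registered content items."""
    df = sum(1 for tags in storage.values() if tag in tags)
    return math.log(len(storage) / df) if df else 0.0


def highest_idf_tag(storage: Dict[str, Set[str]], first: Set[str],
                    input_tag: str) -> Optional[str]:
    """Pick the registered tag in the first aggregate with the highest IDF.

    Ties are broken alphabetically; None is returned without candidates.
    """
    candidates = {t for cid in first for t in storage[cid]} - {input_tag}
    if not candidates:
        return None
    return max(sorted(candidates), key=lambda t: tag_idf(storage, t))


# Hypothetical sample data.
storage = {
    "d1": {"science", "biology"},
    "d2": {"science", "math"},
    "d3": {"science", "math"},
    "d4": {"education", "math"},
}
first = {"d1", "d2", "d3"}
chosen = highest_idf_tag(storage, first, "science")
```

In this data "biology" appears on one of four items and "math" on three, so "biology" has the higher IDF and is the more selective refinement tag.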
In step S308, the content registration unit 106 registers the input content item in the content storage unit 102 in association with the input tag and the additional tag determined by the additional tag determination unit 105.
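The branching of steps S304 to S308 can be summarized in the following sketch. It rests on several assumptions: the processing is taken to proceed from step S305 directly to registration, the tag selected in step S307 is passed in precomputed as `narrowing_tag`, and the storage is a plain dictionary. All names and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set


def determine_additional_tags(first: Set[str],
                              second_by_tag: Dict[str, Set[str]],
                              narrowing_tag: Optional[str],
                              min_items: int,
                              include_ratio: float = 1.0) -> List[str]:
    """Decide the additional tags following steps S304 to S307."""
    # S304/S305: a candidate whose aggregate includes the first aggregate
    # becomes a broader (superordinate) additional tag.
    if first:
        broader = sorted(
            tag for tag, second in second_by_tag.items()
            if len(first & second) / len(first) >= include_ratio
        )
        if broader:
            return broader
    # S306/S307: with enough items in the first aggregate, add a narrower
    # (subordinate) tag that helps refine the search.
    if len(first) >= min_items and narrowing_tag is not None:
        return [narrowing_tag]
    return []


def register(storage: Dict[str, Set[str]], content_id: str,
             input_tag: str, additional: List[str]) -> None:
    """S308: store the item with its input tag and additional tags."""
    storage[content_id] = {input_tag, *additional}


# Hypothetical sample data.
storage = {
    "d1": {"tablet", "digital device"},
    "d2": {"tablet", "digital device"},
    "d3": {"smartphone", "digital device"},
}
first = {"d1", "d2"}
second_by_tag = {"digital device": {"d1", "d2", "d3"}}
extra = determine_additional_tags(first, second_by_tag, None, min_items=10)
register(storage, "new", "tablet", extra)
```

Because the "digital device" aggregate contains every item tagged "tablet", it is chosen in the S305 branch and stored together with the input tag.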
In this way, the tag adding apparatus 100 according to this embodiment determines an additional tag based on the input tag designated by the user, and registers the input content item in association with the input tag and the determined additional tag. Therefore, it is possible to carry out tag supplementation in consideration of the intention of the user.
On the other hand, it is assumed that the user has added an input tag “tablet”. In this case, a tag “digital device” corresponding to the superordinate conception is added, and a tag “education utilization” corresponding to the subordinate conception, and being one of methods of utilization of the tablet terminal, is added.
It should be noted that the tag names “information education” and “tablet” mentioned herein are to be added by the user, and so the names to be specifically added are changeable depending on the utilization form of the user.
The case where the user newly stores the content item associated with Information Technology (IT), and the content item associated with education in the content storage unit 102 will be described below with reference to
A case is assumed where the user adds the tag “science” to the content item associated with “What should education be with respect to the evolution of living things”, and stores the content item. In the example shown in
In the case where the number of registered content items to which registered tags “science” are added, i.e., the number of content items of the first content aggregate is equal to or greater than a threshold; “biology”, “math”, “English”, and the like are extracted as registered tags forming the sub-aggregates. A similarity between each of the sub-aggregates and input content item is calculated. As the basis of calculation, anything may be used if it is an indicator based on the feature forming the content item such as cosine similarity of a document vector, BM25, and the like. If it is assumed that a sub-aggregate corresponding to the tag “biology” among the above is closest to the input content item, the tag “biology” is determined as an additional tag.
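The similarity calculation between the input content item and each sub-aggregate can be sketched with cosine similarity over simple word-count vectors. This is only one of the indicators the description permits; whitespace tokenization, representing a sub-aggregate by the sum of its item vectors, and the sample texts are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Bag-of-words vector using lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_sub_aggregate(input_text: str,
                          sub_aggregates: Dict[str, List[str]]) -> str:
    """Return the tag whose sub-aggregate is most similar to the input."""
    query = vectorize(input_text)
    scores = {}
    for tag, texts in sub_aggregates.items():
        combined = Counter()
        for text in texts:
            combined.update(vectorize(text))  # sum of item vectors
        scores[tag] = cosine(query, combined)
    # Alphabetical order makes ties deterministic.
    return max(sorted(scores), key=lambda tag: scores[tag])


# Hypothetical texts for the sub-aggregates of the tag "science".
sub_aggregates = {
    "biology": ["evolution of living things and cells", "genes and evolution"],
    "math": ["algebra and geometry basics"],
    "English": ["grammar and vocabulary drills"],
}
best = closest_sub_aggregate(
    "education on the evolution of living things", sub_aggregates)
```

Only the "biology" texts share words with the input item, so that sub-aggregate is closest and its tag would be determined as the additional tag.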
Note that when the number of the content items of the first content aggregate is equal to or greater than the threshold, the additional tag may be selected from words (character strings) included in the input content item. The first content aggregate is classified into one or more clusters by clustering, a word representing the cluster including the input content item is extracted by using an indicator such as IDF or the like, and the extracted word is determined as an additional tag. As the clustering method, a generally used method such as hierarchical clustering, k-means, etc. can be used. In this example, the words in the input content item such as “living things”, “evolution”, and the like can be determined as the additional tags.
Next, a case is assumed where the user adds a tag “IT” to the content item associated with “Present state and issue of information education using tablet terminals”, and stores the content item. When the number of registered content items to which tags “IT” are added is equal to or greater than the threshold, the registered tag “education” forming the sub-aggregate closest to the input content item, the word “information education”, etc. in the input content item are determined as the additional tags.
In the example of
It should be noted that when a registered tag having a name different from the input tag is added to the registered content item having details similar to the input content item, the additional tag determination unit 105 regards the registered tag as a synonym for the input tag. For example, “information technology” and “IT” are regarded as synonyms. Specifically, the additional tag determination unit 105 can determine this registered tag as an additional tag, or can unify the names of the registered tag and input tag into one of the names. In the case of the former, a registered tag having a name identical to the input tag can be added to the registered content item.
Next, a content search apparatus that retrieves the content item desired by the user from an aggregate of content items to which tags are added according to the above-mentioned system will be described below.
The display unit 704 displays thereon various screens such as an input screen of a search query, screen of a search result, and the like. For example, as shown in
As described above, through the tag input to be carried out by the user, the tag adding apparatus 100 according to this embodiment has flexibility from the viewpoint of arrangement. Furthermore, the tag adding apparatus 100 complements a tag in a hierarchical relationship in consideration of search refinement. Thereby, both of ease of classification, and a reduction in search effort are realized.
The tag adding apparatus 100 according to this embodiment adds an additional tag at a timing at which the user adds the input tag. In this case, the additional tag may differ between the case where there are few registered content items and registered tags, and the case where there are numerous registered content items and registered tags, even when the same input tag is added to the same input content item. Especially in the former case, there is a possibility of the added additional tag being insufficient.
In the modification example of this embodiment, the processing of adding an additional tag to the input content item is executed at predetermined timing or at a timing at which the user carries out an explicit operation. The predetermined timing is, for example, a point in time for carrying out a periodic review. The timing at which the user carries out an explicit operation is, for example, a point in time at which the user executes a search operation of the content item. In this case, the adding of the additional tags is executed in the range of the entire registered content items or within a limited range of, for example, registered content items to which registered tags coincident with the tag designated as the retrieval query are added. A criterion for determination as to whether or not additional-tag adding processing should be executed again can be based on, for example, the time elapsed from execution of the last tag adding processing. Specifically, additional-tag adding processing may be executed again for registered content items for which a predetermined period of time has elapsed since the last tag adding processing has been executed.
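The elapsed-time criterion for re-executing the additional-tag adding processing can be sketched as follows. The record of when each registered content item was last processed, the function name, and the 30-day interval are hypothetical; the description only states that a predetermined period is used.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def items_due_for_retagging(last_tagged: Dict[str, datetime],
                            now: datetime,
                            min_interval: timedelta) -> List[str]:
    """Return IDs of items whose last tag adding processing is old enough."""
    return sorted(cid for cid, when in last_tagged.items()
                  if now - when >= min_interval)


# Hypothetical record of the last tag adding processing per content item.
last_tagged = {
    "doc1": datetime(2024, 1, 1),
    "doc2": datetime(2024, 3, 1),
}
due = items_due_for_retagging(
    last_tagged,
    now=datetime(2024, 3, 15),
    min_interval=timedelta(days=30),
)
```

Such a check could run at the periodic review or when the user executes a search, with `last_tagged` limited beforehand to the items matching the tag designated in the query.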
Regarding the case where a periodic review is also carried out, the selection criterion of the content item can be identical to the above. Apart from the above, it is also conceivable that the additional tag may be reviewed at a timing at which the content aggregate itself significantly changes. For example, a case where a television program is treated as a content item is assumed. In this case, in addition to tags personally added by the user, information described in metadata such as a program guide, for example, a name of a leading actor/actress, genre, broadcasting station, etc., can be a tag. Regarding programs personally recorded and managed by the user, it is assumed that the user inputs or selects a tag, and an additional tag is correspondingly added. Other programs also exist which are not recorded, but are listed in the program guide. In such a case, new programs are automatically registered every day, and old programs are deleted. Also, when there is a large change in the content group of the management object such as rearrangement of the program configuration, and new construction of a broadcasting station, it is conceivable that the details of programs to be managed are significantly changed at one time. Concurrent with the change, the management details of tags are also influenced, and thus it is necessary to conduct a complete review of the configuration of tags.
Although the tag adding apparatus according to the embodiment is assumed to be implemented in a portable hardware device, part of its functions may be executed on an external server connected thereto through a network. It is also possible to implement the tag adding apparatus in a general computer including a control device such as a CPU, storage devices such as a ROM and a RAM, an external storage device such as an HDD, a display device such as a liquid crystal display device, and input devices such as a keyboard and a mouse.
Instructions shown in the processing procedure shown in the above embodiment can be executed on the basis of a software program. A general-purpose computer system stores therein this program in advance, and reads this program, whereby it is also possible to obtain an advantage identical to the advantage obtained by the above-mentioned tag adding apparatus. Instructions described in the above-mentioned embodiment are recorded on a magnetic disk (flexible disk, hard disk, etc.), optical disk (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, Blu-ray (registered trade mark) Disc, etc.), and semiconductor memory or other recording media similar to this as a program which can be executed by a computer. If these recording media are recording media which can be read by a computer or an embedded system, their storage form may have any configuration. The computer can realize an operation identical to the above-mentioned tag adding apparatus by reading the program from this recording medium, and causing the CPU to execute the instructions described in the program on the basis of the program. Of course, when acquiring the program or when reading the program, the computer may acquire or read the program through a network. The middleware (MW) or the like such as an operating system (OS), database-management software, network, etc. operating on the computer on the basis of instructions of the program installed from the recording medium onto the computer or the embedded system may execute part of each of the processing items for realizing the embodiment.
Furthermore, the recording medium in the embodiment is not limited to a medium independent of the computer or the embedded system, and a recording medium storing or temporarily storing therein a downloaded program transmitted through a LAN, the Internet, or the like is also included in the scope of the recording medium in the embodiment.
Also, the recording medium is not limited to one recording medium and, when the processing of the embodiment is executed on the basis of a plurality of media, the media are included in the scope of the recording medium in the embodiment, and the configuration of each medium may be any type of configuration.
It should be noted that the computer or the embedded system in the embodiment is designed to execute each processing item in the embodiment on the basis of the program stored in the recording medium, and may have one of the configurations of a device formed by one of a personal computer and microcomputer, and a system in which a plurality of devices are network-connected.
In addition, the computer in the embodiment is not limited to a personal computer, and includes an arithmetic processing unit, microcomputer, etc. included in an information processing apparatus, and apparatuses and devices capable of realizing the functions in the embodiment by means of a program are generically called a computer in the embodiment.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2014-190225 | Sep 2014 | JP | national |