This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2013-059028, filed Mar. 21, 2013, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an annotation search apparatus and method.
Terminal devices such as a PC (Personal Computer) and tablet terminal, which include a pen input interface, provide an annotation function which allows the user to annotate an electronic document (for example, a Web page, electronic book, and the like). In such an environment, the user can easily annotate an electronic document of interest at any time via a display device and input device which electronically imitate a familiar paper sheet and pen.
An annotation function enables the user to collect information of interest. However, once a large number of annotation information items related to annotated documents have accumulated, it is difficult for the user, when performing a job such as document creation using those items, to find an annotation information item that is useful for creating the document. For this reason, the user needs to be able to search for annotation information items that are highly useful.
According to an embodiment, an annotation search apparatus includes a feature extractor and an annotation search unit. The feature extractor is configured to extract an annotation feature from an input document and an annotation appended by a user to the input document. The annotation search unit is configured to search annotation information items to retrieve at least one of the annotation information items according to an intended purpose of the user, one of the annotation information items corresponding to the input document and including the annotation feature.
Various embodiments will be described hereinafter with reference to the accompanying drawings.
The annotation search apparatus 100 stores, i.e., accumulates annotation information items to be used later, which are related to annotated electronic documents (to be also simply referred to as documents hereinafter), and searches the accumulated annotation information items for an annotation information item according to a use destination, i.e., a user's intended purpose. Thus, when the user uses an annotation information item, an annotation information item which is useful for the user can be presented.
Examples of annotations according to this embodiment include recorded bookmarks of documents and images of a Web page, electronic book, electronic magazine, and the like, and handwritten annotations such as an enclosing figure, underline, character string (for example, a comment), and symbols (for example, ◯, ⋆, etc.). In this embodiment, a document includes, for example, a Web page, electronic book, electronic magazine, and the like, and can include text and images. Note that a document may also be an electronic document obtained by reading a paper document (for example, a magazine) using an optical reader such as a camera or scanner and then applying OCR (Optical Character Recognition) processing.
As shown in
The annotation input unit 101 inputs a document used (for example, browsed or created) by the user and an annotation appended to this document by the user. In this embodiment, the user inputs a desired annotation at a desired position on a document displayed on the display screen using the pen.
The feature extractor 102 extracts an annotation feature from the document and annotation input by the annotation input unit 101. The annotation feature includes an annotation type (for example, an enclosing figure, underline, or character string), an object to which an annotation is appended (for example, text, an image, or an entire document), and a position where an annotation is appended in a document (for example, the coordinates of a document as a whole, coordinates on the display screen, line number, paragraph number, or XPath). An object to which an annotation is appended will also be referred to as an annotated object hereinafter.
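The annotation feature described above could be held in a small record. The following is a minimal sketch only: the names `AnnotationFeature`, `AnnotationType`, and `TargetKind`, and the choice of optional position fields, are hypothetical and are not specified by the embodiment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class AnnotationType(Enum):
    """Annotation types named in the description."""
    ENCLOSING_FIGURE = "enclosing_figure"
    UNDERLINE = "underline"
    CHARACTER_STRING = "character_string"


class TargetKind(Enum):
    """Kinds of annotated object named in the description."""
    TEXT = "text"
    IMAGE = "image"
    ENTIRE_DOCUMENT = "entire_document"


@dataclass(frozen=True)
class AnnotationFeature:
    annotation_type: AnnotationType
    target_kind: TargetKind
    # The position may be expressed in several ways; this sketch assumes
    # that any subset of the following fields may be filled in.
    document_xy: Optional[Tuple[float, float]] = None
    paragraph_number: Optional[int] = None
    xpath: Optional[str] = None


# Example: an underline drawn on text in the third paragraph.
feature = AnnotationFeature(
    annotation_type=AnnotationType.UNDERLINE,
    target_kind=TargetKind.TEXT,
    paragraph_number=3,
)
```

Keeping the position fields optional reflects that the description lists alternative position representations without requiring all of them.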
The document classification unit 103 classifies the input document into a category among a plurality of predetermined categories, based on the contents of the entire document and the annotated object. Alternatively, the document classification unit 103 classifies the input document, based on the contents of the entire document and the annotated object, into a cluster among a plurality of clusters determined based on the set of annotation information items stored in the annotation storage unit 104.
The annotation storage unit 104 stores annotation information items. Each annotation information item includes the document and annotation input by the annotation input unit 101, the annotation feature extracted by the feature extractor 102, and the category or cluster of the input document classified by the document classification unit 103.
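One possible in-memory shape for an annotation information item and its store is sketched below. The class names `AnnotationInfoItem` and `AnnotationStore`, the field names, and the `by_category` lookup are assumptions made for illustration; the embodiment does not prescribe a storage layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotationInfoItem:
    document_text: str      # the input document (simplified to plain text here)
    annotation_type: str    # e.g. "underline" or "enclosing_figure"
    annotated_object: str   # the content to which the annotation is appended
    category: str           # category or cluster assigned by classification


class AnnotationStore:
    """Hypothetical stand-in for the annotation storage unit."""

    def __init__(self) -> None:
        self._items: List[AnnotationInfoItem] = []

    def add(self, item: AnnotationInfoItem) -> None:
        self._items.append(item)

    def by_category(self, category: str) -> List[AnnotationInfoItem]:
        # Return the stored items whose classification result matches.
        return [item for item in self._items if item.category == category]


store = AnnotationStore()
store.add(AnnotationInfoItem("Guide to Finland ...", "enclosing_figure",
                             "aurora photo", "travel"))
store.add(AnnotationInfoItem("Review of a novel ...", "underline",
                             "a quoted sentence", "book"))
travel_items = store.by_category("travel")
```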
The template storage unit 105 stores one or more templates (patterns or formats) for a document generated by the user using an annotation information item. A document generated by the user using an annotation information item will be referred to as a note hereinafter.
The template selection unit 106 presents templates stored in the template storage unit 105 so as to allow the user to select a template. More specifically, the user selects a desired one of templates presented by the template selection unit 106 using, for example, the pen, and the template selection unit 106 receives a user operation for selecting the template.
The character input unit 107 receives an input from the user with respect to a note. In this embodiment, the user can input text to a note using a keyboard and can input handwritten characters and figures on a note using the pen.
The annotation search unit 108 searches the annotation information items stored in the annotation storage unit 104 to retrieve at least one of the annotation information items according to the user's intended purpose. More specifically, the annotation search unit 108 searches the annotation storage unit 104 for one or more available annotation information items based on the type of the template selected by the template selection unit 106 and the input contents from the character input unit 107.
The annotation selection unit 109 presents the annotation information item found by the annotation search unit 108 so as to allow the user to select the annotation information item. In this embodiment, an annotation information item is presented to the user by displaying a document appended with an annotation. A document appended with an annotation will be referred to as an annotated document hereinafter. An annotated document is generated based on the annotation information item, and is displayed. More specifically, an annotated document can be generated from a document and annotation included in an annotation information item. In this case, the user selects a desired annotated document from those presented by the annotation selection unit 109 using the pen. The annotation selection unit 109 receives a user operation for selecting an annotated document.
The annotation operation unit 110 receives a user operation for an annotated document pasted on the note. The display unit 111 displays an annotation information item (more specifically, annotated document) selected by the annotation selection unit 109 on the note.
The annotation input unit 101, feature extractor 102, document classification unit 103, template selection unit 106, character input unit 107, annotation search unit 108, annotation selection unit 109, annotation operation unit 110, and display unit 111 may be implemented by a CPU (Central Processing Unit) and a memory used by the CPU. The annotation storage unit 104 and template storage unit 105 may be implemented by the memory used by the CPU and/or an auxiliary storage device.
In step S202, the document classification unit 103 classifies the document based on the contents of the entire document and the annotated object. As a classification method, a method of classifying a document to one or more categories of a plurality of predetermined categories (for example, travel, commercial product, health, economy, and book), a method of classifying a document to one or more clusters of a plurality of clusters obtained as a result of clustering of annotation information items already stored in the annotation storage unit 104, and the like can be used. In the former method, the document classification unit 103 identifies a category to which the document belongs using, for example, classifiers such as a support vector machine. In the latter method, the document classification unit 103 classifies the document to a cluster using, for example, hierarchical clustering.
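The cluster-based method can be illustrated with a small agglomerative (bottom-up hierarchical) clustering routine. This is a sketch under stated assumptions: documents are reduced to sets of words, distance is Jaccard distance, clusters are merged by single linkage, and merging stops at a distance threshold. None of these choices is stated in the embodiment, which names hierarchical clustering only as an example, and all function names are hypothetical.

```python
from typing import List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def linkage(c1: List[int], c2: List[int], docs: List[Set[str]]) -> float:
    # Single linkage: distance between the closest pair of members.
    return min(jaccard_distance(docs[i], docs[j]) for i in c1 for j in c2)


def agglomerative_clusters(docs: List[Set[str]], threshold: float) -> List[List[int]]:
    """Merge the closest clusters until the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], docs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def assign_cluster(new_doc: Set[str], docs: List[Set[str]],
                   clusters: List[List[int]]) -> int:
    """Index of the cluster whose nearest member is closest to new_doc."""
    return min(range(len(clusters)),
               key=lambda k: min(jaccard_distance(new_doc, docs[i])
                                 for i in clusters[k]))


stored = [
    {"finland", "aurora", "hotel"},
    {"finland", "sauna", "hotel"},
    {"camera", "lens", "price"},
    {"camera", "price", "review"},
]
clusters = agglomerative_clusters(stored, threshold=0.6)
cluster_index = assign_cluster({"finland", "helsinki"}, stored, clusters)
```

With these four word sets, the two Finland-related documents and the two camera-related documents end up in separate clusters, and the new document is assigned to the Finland cluster.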
In step S203, an annotation information item including the annotation feature extracted in step S201 and the document classification result obtained in step S202 is stored in the annotation storage unit 104 together with the input document and input annotation.
Note that the annotation information item may also include other information (for example, URL (Uniform Resource Locator)) associated with the input document. Also, the annotation information item may also include an annotated document generated from the input document and input annotation. In this case, the annotation information item need not include the input document and input annotation.
In this manner, the annotation information item associated with the document annotated by the user is stored in the annotation storage unit 104. By repetitively executing the processing shown in
A document 301 shown in
An annotation information item stored in the annotation storage unit 104 by the processing shown in
As a presentation method of an annotated document, a method of displaying the entire document, a method of clipping and displaying an annotated object, and the like can be used. For example, when an image is enclosed by a line, as shown in
Initially, templates stored in the template storage unit 105 are displayed. The user selects a desired template from those displayed by the template selection unit 106 (step S401). As templates, formats which assume intended purposes such as a travel note, commercial product comparison note, reading note, and foodie note are prepared. For example, when a travel note is selected, information items such as a destination to visit, transportation, and souvenir are assumed to be input. Note that the template storage unit 105 may store a template which does not assume any specific intended purpose.
The annotation search unit 108 searches for an annotation information item according to the type of the template selected in step S401 and a user operation for the note (step S402). For example, when the template for a travel note is selected, the annotation search unit 108 determines that a use destination (user's intended purpose) is a travel category based on the type of the selected template, and generates a search query used to search for an annotation information item related to a document classified to the travel category. When the user inputs “Finland” to the note, a search query is generated to search for an annotation information item related to a document that includes “Finland” in or around the annotated object. When a template for a reading note is selected, a search query is generated to search for an annotation information item related to a document classified to a book category. In this case, among the annotation information items related to documents classified to the book category, the search query is set to preferentially retrieve those related to documents to which an underline and a comment are appended. In this manner, the search priority may be set based on the type of the template and the type of the annotation.
When an annotated document corresponding to the retrieved annotation information item is displayed, the user selects that annotated document via the annotation selection unit 109 to paste the selected annotated document to the note (step S403). As a pasting method, a method of simply touching the displayed annotated document with the finger or pen, a method of designating a layout position by dragging and dropping the document, and the like are available. When a plurality of annotated documents are selected and laid out, two or more annotated documents can be pasted at the same time. Furthermore, a part (for example, a character string or image) of the annotated document can be selected and pasted to the note (for example, to a memo field in the note).
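The query generation described in this step can be sketched as follows. This is an illustration under assumptions, not the embodiment's implementation: the mapping tables, the `Item` and `SearchQuery` records, and the functions `build_query` and `search` are all hypothetical, and keyword matching is simplified to a substring test on the annotated text only (the description also allows matches around the annotated object).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Hypothetical encodings of the templates and categories named in the text.
TEMPLATE_TO_CATEGORY = {"travel note": "travel", "reading note": "book"}
PREFERRED_TYPES = {"reading note": frozenset({"underline", "comment"})}


@dataclass(frozen=True)
class Item:
    category: str
    annotation_type: str
    annotated_text: str


@dataclass(frozen=True)
class SearchQuery:
    category: Optional[str]          # None means no category restriction
    keywords: List[str]              # lower-cased words typed into the note
    preferred_types: FrozenSet[str]  # annotation types to rank first


def build_query(template: str, note_input: str) -> SearchQuery:
    return SearchQuery(
        category=TEMPLATE_TO_CATEGORY.get(template),
        keywords=note_input.lower().split(),
        preferred_types=PREFERRED_TYPES.get(template, frozenset()),
    )


def search(items: List[Item], query: SearchQuery) -> List[Item]:
    hits = [
        item for item in items
        if (query.category is None or item.category == query.category)
        and all(k in item.annotated_text.lower() for k in query.keywords)
    ]
    # False sorts before True, so preferred annotation types come first;
    # sorted() is stable, so ties keep their stored order.
    return sorted(hits, key=lambda i: i.annotation_type not in query.preferred_types)


items = [
    Item("travel", "enclosing_figure", "Aurora tours in Finland"),
    Item("travel", "underline", "Kyoto temple guide"),
    Item("book", "enclosing_figure", "Cover of a novel"),
    Item("book", "underline", "A memorable quotation"),
]
travel_hits = search(items, build_query("travel note", "Finland"))
reading_hits = search(items, build_query("reading note", ""))
```

Here the travel query keeps only the travel item mentioning Finland, while the reading query keeps both book items and ranks the underlined one first.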
The annotated document pasted on the note is displayed by the display unit 111. As a display method, a method of displaying a document without changing its size, a method of displaying a document in an enlarged or reduced scale according to the size of a template or a frame in the template, a method of displaying a document by adjusting a marking range in correspondence with a shape of a frame in a template, and the like can be used.
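For the enlarged or reduced display, one simple approach is to pick a single scale factor so that the document fits inside the frame. The sketch below assumes that the aspect ratio is preserved and that sizes are in pixels; both are assumptions, and the function names are hypothetical.

```python
from typing import Tuple


def fit_scale(doc_w: float, doc_h: float, frame_w: float, frame_h: float) -> float:
    """Largest uniform scale at which the document still fits the frame."""
    if doc_w <= 0 or doc_h <= 0:
        raise ValueError("document size must be positive")
    return min(frame_w / doc_w, frame_h / doc_h)


def scaled_size(doc_w: float, doc_h: float,
                frame_w: float, frame_h: float) -> Tuple[int, int]:
    s = fit_scale(doc_w, doc_h, frame_w, frame_h)
    return round(doc_w * s), round(doc_h * s)


reduced = scaled_size(800, 600, 400, 400)   # document larger than the frame
enlarged = scaled_size(100, 50, 400, 400)   # document smaller than the frame
```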
In this embodiment, an annotation information item is displayed with the appearance it had when it was annotated kept intact. That is, the annotated document is displayed. Furthermore, a function of displaying processing results of the feature extractor 102 and document classification unit 103, information of an annotated document, and the like in response to a predetermined operation (“touching”, “double-tapping”, “turning over”, “reading contents described in lower layer by flipping”, etc.) may be included.
For example, when “travel note” is selected (step S551), annotation information items (more specifically, annotated documents) which are more likely to be pasted on the travel note are displayed in turn. In the example of
Next, an example upon selection of “reading note” will be described below. In the example of
Note that the embodiment is not limited to the example in which an annotation information item classified to a category according to the type of the selected template is displayed as an available annotation information item. For example, when the travel note is selected, an annotation information item classified to the travel category may be displayed, and an annotation information item which is not classified to the travel category, that is, an unavailable annotation information item, may be displayed with a lower priority. The annotation information search operation may be executed automatically when a predetermined operation (for example, an operation for pasting the annotated document on the note) is made, in addition to the search operation explicitly conducted by the user.
As a method of displaying a list of annotated documents, a method of displaying annotated documents in priority order, starting from those corresponding to the annotation information items retrieved preferentially, a method of expressing priority levels by degrees of emphasis such as size or color, and the like can be used.
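Both display methods can be combined in a short routine that orders annotated documents by priority and maps priority to a display size. This is a sketch only: the numeric priority scores, the linear mapping, the pixel bounds, and the function name `layout_by_priority` are assumptions not found in the embodiment.

```python
from typing import List, Tuple


def layout_by_priority(scored: List[Tuple[str, float]],
                       min_size: int = 80,
                       max_size: int = 200) -> List[Tuple[str, int]]:
    """Return (name, display size) pairs, highest priority first.

    The size is interpolated linearly between min_size and max_size
    according to where the score lies between the lowest and highest score.
    """
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ordered:
        return []
    top, bottom = ordered[0][1], ordered[-1][1]
    span = top - bottom
    result = []
    for name, score in ordered:
        # When all scores are equal, show everything at full size.
        frac = 1.0 if span == 0 else (score - bottom) / span
        result.append((name, round(min_size + frac * (max_size - min_size))))
    return result


layout = layout_by_priority([("doc_a", 1), ("doc_b", 5), ("doc_c", 3)])
```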
A method of implementing predetermined functions by making predetermined operations on the annotated document pasted on the note will be described below with reference to
Also, operations for an annotated document may be determined according to the template (that is, note) as a destination where that document is pasted. For example,
Note that in
The annotation search apparatus 100 can include a document generation unit 901 (shown in
Annotation information items collected by the user and the generated note can be browsed by the user himself or herself, and can also be shared with other users. In this case, the user can also comment on a note shared by another user.
A sharing example of annotation information items and a note will be described below with reference to
The annotation search unit 108 can present an annotation information item generated by another user to the user. The annotation search unit 108 may retrieve an annotation information item by another user for the same document as that corresponding to an annotation information item by the user, and may retrieve an annotation information item corresponding to another document having the same layout as that of the document corresponding to the retrieved annotation information item.
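The two retrieval cases above can be sketched as simple filters over shared items. The `SharedItem` record, its `document_id` and `layout_id` fields, and both function names are hypothetical; in particular, the embodiment does not say how two documents are judged to have the same layout, so an opaque layout identifier is assumed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SharedItem:
    user: str
    document_id: str
    layout_id: str   # assumed identifier for the document's layout
    comment: str


def items_from_other_users(items: List[SharedItem], user: str,
                           document_id: str) -> List[SharedItem]:
    """Items that other users created for the same document."""
    return [i for i in items
            if i.document_id == document_id and i.user != user]


def items_with_same_layout(items: List[SharedItem],
                           reference: SharedItem) -> List[SharedItem]:
    """Items for a different document that shares the reference's layout."""
    return [i for i in items
            if i.layout_id == reference.layout_id
            and i.document_id != reference.document_id]


shared = [
    SharedItem("alice", "doc-1", "layout-A", "nice hotel"),
    SharedItem("bob", "doc-1", "layout-A", "good view"),
    SharedItem("bob", "doc-2", "layout-A", "cheap flights"),
    SharedItem("carol", "doc-3", "layout-B", "unrelated"),
]
others = items_from_other_users(shared, "alice", "doc-1")
same_layout = items_with_same_layout(shared, others[0])
```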
Note that the annotation storage unit 104 need not always be arranged in the annotation search apparatus 100, and may be arranged in another apparatus (for example, a server) which can communicate with the annotation search apparatus 100. With this arrangement, a plurality of users can share an annotation information item.
As described above, the annotation search apparatus according to this embodiment accumulates annotation information items related to documents of interest to the user, and searches the accumulated annotation information items for an annotation information item according to a use destination (user's intended purpose), thus allowing the user to easily generate a note using the annotation information items. That is, the user can easily retrieve a useful annotation information item. Furthermore, predetermined processing is executed in response to an operation (for example, an operation using the pen) for a document included in an annotation information item or for an annotated document. Thus, the user can execute processing such as keyword extraction and related information search without inputting any keyword using the keyboard. That is, required information can be easily extracted using the contents of an original document (input document).
In the aforementioned example of this embodiment, an annotation information item (annotated document) is presented after template selection. Alternatively, when an annotated document is selected, an available template may be presented.
The annotation search apparatus of this embodiment assumes implementation by a portable hardware apparatus. Alternatively, some functions of the annotation search apparatus of this embodiment may be executed on an external server connected to a network. Also, the annotation search apparatus of this embodiment can be implemented by a general computer which includes a control device such as a CPU, a storage device such as a ROM and RAM, an external storage device such as an HDD, a display device such as a display, and an input device such as a keyboard and mouse.
Instructions in the processing sequences described in the aforementioned embodiment can be executed based on a program as software. A general-purpose computer system stores this program in advance and loads the stored program, thus obtaining the same effects as those by the annotation search apparatus of the aforementioned embodiment. The instructions described in the aforementioned embodiment are recorded, as a program which can be executed by a computer, in a magnetic disk (flexible disk, hard disk, etc.), optical disk (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, etc.), semiconductor memory, or similar recording medium. The storage format of a recording medium is not particularly limited as long as that recording medium is readable by a computer or embedded system. The computer loads the program from this recording medium, and controls a CPU to execute instructions described in the program based on this program, thus implementing the same operation as the annotation search apparatus of the aforementioned embodiment. Of course, the computer may acquire or load the program via a network.
Also, an OS (Operating System), database management software, MW (middleware) for a network, or the like, which runs on a computer, may execute some of processes required to implement this embodiment based on instructions of a program installed from the recording medium in a computer or embedded system.
Furthermore, the recording medium of this embodiment is not limited to a medium separate from a computer or embedded system, and includes a recording medium, which stores or temporarily stores a program downloaded via a LAN, the Internet, or the like.
The number of recording media is not limited to one, and the recording medium of this embodiment includes the case in which the processing of this embodiment is executed from a plurality of media. That is, the medium configuration is not particularly limited.
Note that the computer or embedded system of this embodiment is used to execute respective processes of this embodiment based on the program stored in the recording medium, and may have an arbitrary arrangement such as a single apparatus (for example, a personal computer, microcomputer, etc.), or a system in which a plurality of apparatuses are connected via a network.
The computer of this embodiment is not limited to a personal computer, and includes an arithmetic processing device, microcomputer, or the like included in an information processing apparatus, and is a generic name of a device and apparatus, which can implement the functions of this embodiment based on the program.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2013-059028 | Mar 2013 | JP | national |