1. Field of the Invention
The present invention relates to an image retrieval apparatus, a method for retrieving an image, and a control program for the image retrieval apparatus. In this image retrieval apparatus, document image data is input by an inputting unit such as a scanner, and is accumulated and stored in a storage unit such as a hard disk. Specified document image data is retrieved and output from among the document image data stored in the storage unit in response to a user's specification.
2. Description of the Related Art
Large-capacity memory devices such as hard disks, and inputting units such as scanners for electronically reading document image data, are in widespread use. As a result, a large-scale document image database can be constructed and stored. Such a document image database is applicable to electronic books, medical documents, administrative records, electronic scraps, maps, administrative formats, and manuals. Document image database systems are now widely used because, in general, storing the read document image in electronic media is less expensive than storing the document as it is.
Concerning the above described document image database system, Japanese Patent Application Laid-Open No. 2000-324331 discusses a method for compressing image data that is apt to become large, and effectively managing such data. In this method, first, an image of each page of the input document image data is divided into a plurality of regions according to the image attribute (e.g., text, graphic, table, and picture) contained in the target page. Then, the image in each region is subjected to a different compression process depending on the attribute to reduce the data amount of the entire page. Specifically, this method is performed according to the following procedures:
An expansion (decompression) process is applied to the compressed data of each attribute region obtained by dividing the image into small regions. The expanded data is pasted at the coordinate position of the region within the original page image, so that the image of the page is reproduced.
In handling a large-scale document image database system, it is necessary to distinguish and search for the desired document effectively. One method of querying the database for the desired document is to search for a text string, or a combination of text strings, presumed to be present in the desired document. However, since this method requires a highly accurate optical character recognition process, it is difficult to put into practical use.
There is another method of querying the database for the desired document, which assumes that the user has some knowledge of the appearance of the document to be retrieved. A method of using this appearance information to query the document image database is discussed in U.S. Pat. No. 5,933,823. The method is described hereinafter.
First, an example document image whose rough appearance is similar to the target document is generated by simple category selection or the like, and its image feature information is obtained. Second, the image feature information is used to search the database, and a plurality of documents whose appearance is similar to the example document image are displayed as the search result. Third, as a key for the next search, the user selects, from among the displayed search result, the document whose appearance is most similar to the desired document. The next search is performed with the selected key. The desired document is finally retrieved by repeating this process.
U.S. Pat. No. 5,933,823 proposes the following three methods for presenting the example document image, which is used as a key for the first search, to the image database system:
An embodiment of the present invention is directed to providing an image retrieval device in which a user can efficiently retrieve the desired document image data by a simple operation.
According to an aspect of the present invention, an apparatus includes an inputting unit configured to input document image data; a layout analysis unit configured to divide the input document image data of each page into a plurality of regions according to an attribute of an image of the page to generate layout information for each of the plurality of regions; a processing unit configured to classify the document image data of each page such that the document image data of each page belongs to one of a plurality of groups, based on the layout information stored in association with the document image data; a specification unit configured to specify one of the plurality of groups according to which a user requests to retrieve one or more pages of document image data; and a retrieval unit configured to retrieve one or more pages of document image data belonging to the group, which is specified by the specification unit, from among a plurality of pages of document image data.
According to an embodiment of the present invention, the user relies on his or her own memory of the layout of the page of document image data to be searched for, so that the search can be performed merely by specifying, according to the layout, the group to which the document image data of that page belongs. Therefore, the user can efficiently search for the desired document image data by a simple operation.
Further features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the attached drawings.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the invention and, together with the description, serve to explain the principles of the invention.
Various exemplary embodiments, features, and aspects of the invention will be described in detail below with reference to the drawings.
A mass-storage device 101 is a large-capacity storage unit that can register and store a large amount of document image data, and includes a hard disk device and the like. The document image data is accumulated in the mass-storage device 101 so as to constitute a document image database that can be searched by using the layout of an image of a page, as described hereinafter.
A central processing unit (CPU) 102 is a control unit for overseeing and controlling the entire system of the image retrieval apparatus. The CPU 102 functions as a layout analysis unit for analyzing the layout of an image of each page of the document image data to be input (as described hereinafter). Additionally, the CPU 102 functions as a processing unit for classifying the document image data page by page according to the layout, based on the layout information (layout analysis data) obtained in the layout analysis. Moreover, the CPU 102 also functions as a retrieval unit configured to retrieve the document image data in response to the user's specification based on the layout of the page.
A read only memory (ROM) 103 stores a control program that is executed by the CPU 102. The control program includes the programs corresponding to each process of control procedures (described hereinafter) illustrated in flow charts of
A random access memory (RAM) 104 temporarily stores data to be processed by the CPU 102.
A display unit 105 (output unit) includes a liquid crystal display device or the like capable of displaying bitmap image data.
An operation unit 106 includes input keys used for performing each input. The user uses the keys to operate the image retrieval apparatus. Some of the input keys (a cursor key 501 and a determination key 502 in
An inputting unit 107 is configured to input the document image data. Specifically, the inputting unit 107 is a scanner device for electronically reading the document image of a manuscript and converting the image into image data, or an interface device for receiving the document image data from an external apparatus (not shown) through an appropriate interface.
A printer 108 outputs the image of the document image data, which has been obtained as the retrieval result, by printing the image on a sheet.
In this configuration, the CPU 102 controls the overall operation in response to key input performed by the user on the operation unit 106. For instance, the CPU 102 registers the document image data, which is input by the inputting unit 107, in the mass-storage device 101; that is, the CPU 102 accumulates and stores the document image data. In addition, the CPU 102 searches the mass-storage device 101 and retrieves the document image data that matches the condition specified by the user. The CPU 102 outputs the retrieved document image data by displaying it on the display unit 105 or by printing it using the printer 108.
In the registration operation, the document image data input page by page is classified so that each page belongs to one of a plurality of groups according to the layout of the page, which has a text region and other image regions. In the retrieval operation, the group to which the document image data of the desired page belongs is specified according to the layout of the image of the page, based on the user's own memory of the layout of the page to be retrieved. Subsequently, the document image data of the pages belonging to the specified group is retrieved and output.
The registration operation, in which the document image data is classified according to the layout of the image of each page, and the retrieval operation, in which the document image data is retrieved in response to the group specified by the page layout, are described hereinafter in detail.
When the document image data is registered, first, in step S201, the inputting unit 107 inputs the document image data of a page as multi-valued color image data under control of the CPU 102. The multi-valued color image data is represented, for example, as 24-bit data, and is temporarily stored in the RAM 104.
In step S202, the CPU 102 converts the input multi-valued image data into binary image data. The conversion at this stage generates the binary image data in addition to the multi-valued image data, while the multi-valued image data is kept for later use.
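The conversion of step S202 can be sketched as follows. This is only a minimal illustration: the description does not state how the binary image data is derived, so the fixed luminance threshold, the function name `to_binary`, and the representation of a page as rows of (r, g, b) tuples are all assumptions made for the example.

```python
def to_binary(pixels, threshold=128):
    """Return a binary copy of a 24-bit color image.

    pixels: list of rows, each row a list of (r, g, b) tuples with
    8 bits per channel. The input is not modified, so the multi-valued
    data remains available for later use.
    threshold: assumed fixed cut-off (not specified in the description).
    """
    binary = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # Integer approximation of the BT.601 luma weights.
            luma = (299 * r + 587 * g + 114 * b) // 1000
            # 1 marks a dark (foreground) pixel, 0 a light one.
            out_row.append(1 if luma < threshold else 0)
        binary.append(out_row)
    return binary


page = [[(0, 0, 0), (255, 255, 255)],
        [(200, 200, 200), (30, 30, 30)]]
binary_page = to_binary(page)
```

A real apparatus might instead use an adaptive threshold; the fixed value is chosen here only to keep the sketch short.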
In step S203, the CPU 102, as a layout analysis unit, performs the layout analysis corresponding to the image attributes contained in the document image of the page, based on the binary image data. Namely, first, the image attributes (e.g., text, graphic, table, picture, and photo) are determined. The image of the page is divided into a plurality of regions (n regions) according to the determined image attributes, and data of the n regions is obtained. Then, layout information data, that is, the x and y coordinates of the point of origin (the point at the upper-left corner), the width, and the height, together with the attribute data, is generated for each divided region. Hereinafter, these data and the data of the n regions are collectively referred to as layout analysis data. The layout analysis is described in Japanese Patent Application Laid-Open No. 2000-324331. Further, the determination of the image attributes of a page is initially performed by dividing the image of the page into many small regions, as described in U.S. Pat. No. 5,907,835.
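One possible in-memory shape for the layout information of a single region is sketched below. The class name `Region`, the field names, and the validation rules are hypothetical; the description specifies only that each region carries the x and y coordinates of its upper-left corner, a width, a height, and an attribute.

```python
from dataclasses import dataclass

# Attribute names taken from the examples given in the description.
ATTRIBUTES = ("text", "graphic", "table", "picture", "photo")


@dataclass(frozen=True)
class Region:
    """Layout information for one divided region (hypothetical structure)."""
    x: int          # x coordinate of the upper-left corner
    y: int          # y coordinate of the upper-left corner
    width: int
    height: int
    attribute: str  # one of ATTRIBUTES

    def __post_init__(self):
        if self.attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {self.attribute!r}")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")

    @property
    def area(self):
        return self.width * self.height


# Layout information for a page with n = 2 regions.
layout = [
    Region(x=40, y=30, width=500, height=120, attribute="text"),
    Region(x=40, y=180, width=300, height=200, attribute="photo"),
]
```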
An example of the layout analysis process performed in step S203 is illustrated in
After the layout analysis is carried out, in step S204 in
Next, in step S205, the CPU 102 executes the classifying process using the layout on the stored document image data of a page. The classifying process is performed based on the layout analysis data (layout information) of the plurality of regions in the page, which has been stored in association with the stored document image data of the page. In the classifying of the present exemplary embodiment, first, one page is divided into four equal regions in a 2×2 arrangement. In each of the four regions, the area of the text or blank region is compared with the area of the image region other than text. When the area of the text or blank region is larger than that of the image region, the region is determined to be a text or blank portion; when the area of the image region other than text is larger than that of the text or blank region, the region is determined to be an image portion other than text. Then, it is determined into which pattern of layout images (1) to (16) illustrated in
The classifying process is performed to obtain the information about the layout representative image 400 to which the layout of the image of a page is applied. Namely, the discrimination information of the group, into which the document image data of a page is classified and which is allocated according to the layout, is obtained. The discrimination information is referred to as a page layout group, and is represented by the numbers 1 to 16 which are attached to each layout representative image. For instance, while the document image 301 illustrated in
In step S206, the CPU 102 stores the numeric data of the page layout group of the image of a page which has been obtained in step S205, in the mass-storage device 101. The numeric data is stored in association with the layout analysis data of this page already stored therein, i.e., in association with the already stored document image data of this page.
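The 2×2 classification of step S205 can be sketched as follows. Several points are assumptions made for the example and are not stated in the description: regions are given as hypothetical (x, y, width, height, attribute) tuples that do not overlap one another, every attribute other than "text" counts as an image region, a tie is treated as a text or blank portion, and the four quadrant decisions are mapped to a group number by a bit encoding plus one. The actual correspondence between layout patterns and the numbers 1 to 16 is defined by the layout representative images and may differ.

```python
def overlap_area(ax, ay, aw, ah, bx, by, bw, bh):
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def classify_page(page_w, page_h, regions):
    """Return an assumed page layout group number in the range 1 to 16.

    regions: iterable of (x, y, width, height, attribute) tuples.
    """
    half_w, half_h = page_w / 2, page_h / 2
    # Quadrant order (assumed): upper-left, upper-right,
    # lower-left, lower-right.
    quadrants = [(0, 0), (half_w, 0), (0, half_h), (half_w, half_h)]
    bits = 0
    for bit, (qx, qy) in enumerate(quadrants):
        image_area = sum(
            overlap_area(qx, qy, half_w, half_h, x, y, w, h)
            for x, y, w, h, attr in regions
            if attr != "text"
        )
        # Everything in the quadrant that is not a non-text image
        # region is text or blank.
        text_or_blank_area = half_w * half_h - image_area
        if image_area > text_or_blank_area:
            bits |= 1 << bit  # quadrant is an image portion
    return bits + 1


group = classify_page(
    100, 100,
    [(0, 0, 50, 50, "photo"), (50, 0, 50, 50, "text")],
)
```

With this encoding, a page that is entirely text or blank falls into group 1 and a page dominated by images in all four quadrants falls into group 16.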
In step S207, to inform the user of the result of the classification using the layout, the CPU 102 displays, in the display unit 105, the layout representative image of the group into which the document image data of the page is classified, from among the sixteen patterns of the group illustrated in
First, in step S601, when the user operates the input key of the operation unit 106 to instruct the retrieval process, the CPU 102 starts the retrieval process of the document image data accumulated in the mass-storage device 101.
In step S602, the CPU 102 displays the layout representative images 400 having sixteen patterns illustrated in
In step S603, among the sixteen patterns of layout representative images 400 displayed in step S602, the user determines which is most similar to the layout of the text part and the image part of the document image page which the user desires to retrieve. Then, input is performed to select and specify the layout representative image determined to be most similar using the cursor key 501, the determination key 502 (see
In step S604, the CPU 102 retrieves the document image data of the page which belongs to the group specified by the layout representative image. Namely, first, the numeric value which is the same as the numeric value of the group of the layout representative image selected by the user in step S603 is retrieved from among the numeric value data of the page layout group of the document image of each page stored in the mass-storage device 101. Then, the document image data of one or a plurality of pages stored in association with the retrieved numeric data of the page layout group is searched and retrieved.
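The lookup of step S604 can be sketched as a search keyed by the page layout group number. The class `PageStore`, its method names, and the use of in-memory dictionaries in place of the mass-storage device 101 are assumptions made for the illustration.

```python
from collections import defaultdict


class PageStore:
    """Hypothetical stand-in for the document image database."""

    def __init__(self):
        self._pages = {}                    # page id -> image data
        self._by_group = defaultdict(list)  # group number -> page ids

    def register(self, page_id, image_data, group):
        """Store a page together with its page layout group number."""
        if not 1 <= group <= 16:
            raise ValueError("group must be between 1 and 16")
        self._pages[page_id] = image_data
        self._by_group[group].append(page_id)

    def retrieve(self, group):
        """Return (page id, image data) for every page in the group."""
        return [(pid, self._pages[pid])
                for pid in self._by_group.get(group, [])]


store = PageStore()
store.register("doc1-p1", b"<image bytes>", group=3)
store.register("doc1-p2", b"<image bytes>", group=7)
store.register("doc2-p1", b"<image bytes>", group=3)
hits = store.retrieve(3)
```

Keeping an index from group number to page identifiers avoids scanning every stored page on each retrieval; whether the apparatus does so is not stated in the description.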
In step S605, the CPU 102 outputs the document image data of one or a plurality of pages retrieved in step S604 to the display unit 105 to display the image of the page. Then, the process is finished.
The user selects the desired document image data from the displayed images of the one or more pages of document image data, and performs operations such as printing. When a large number of pages is retrieved, the document image data of the retrieved pages can be further searched and narrowed down by other methods.
As described above, according to the image retrieval apparatus of the present exemplary embodiment, when the user attempts to retrieve the desired document image data, the layout representative images of a plurality of pages are displayed. Then, in performing the retrieval process, the user only selects and specifies the displayed layout representative image whose layout is most similar to that of the document page which the user remembers and desires to retrieve, based on the text or blank regions and the image regions in the document page. Thus, the page which the user desires can be retrieved by an extremely simple operation that relies on the user's memory of the layout of the document image of the page.
In the present exemplary embodiment, the number of layout representative images, i.e., the number of groups corresponding to the image layout of a page, is sixteen. However, the number of groups is not limited to sixteen. In addition, the patterns of the layout representative images are not limited to those illustrated in
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all modifications, equivalent structures, and functions.
This application claims priority from Japanese Patent Application No. 2007-078107 filed Mar. 26, 2007, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2007-078107 | Mar 2007 | JP | national |