Document reflow involves reestablishing where the line breaks and page breaks occur in a text document. Reflow happens as a matter of course in any word processor or web browser when the font size or the page margins or dimensions are changed. This capability requires an online representation of the elements of the text. The ability to reflow images of text is important in a number of circumstances. Foremost is the growing business of repurposing scanned images of books for electronic reader devices (e.g., desktop computers and mobile devices, such as specialized electronic book (eBook) devices, laptops, mini-notebook “net-books,” and mobile telephones), in which the display area is smaller than a typical book page. Rather than requiring the user to scroll horizontally on each line to read the full width of the page, which is impractical, the text needs to be reflowed to fit the narrower page of the device. After reflowing, the user simply scrolls vertically to read through the reformatted book. Other purposes include reformatting a scanned text for printing with a larger font size, reproducing a scanned large-font book in a smaller font, reprinting a book with a different page size than the original, and using excerpts of a scanned text within a poster or other document in a way that calls for line breaks different from those in the original.
When an image of text needs to be reflowed, optical character recognition (OCR) typically is performed on the scanned text to produce a character stream representation that can be reflowed. This approach has the disadvantage that errors in the OCR results appear in the reflowed text. In another approach, page decomposition software determines the location of each word in each block of text by providing the location, height, and width of a bounding box for each word. The text is then reflowed by taking each successive word bounding box and adding it to the current line of text; if the bounding box of a word extends beyond the display width, a new line is started and the word is placed on the new line. Because it requires the identification of every word in a scanned document, this approach is computationally and memory intensive, making it less suitable for compact application environments in which processing and memory resources typically are significantly constrained.
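For illustration only, the word-bounding-box approach described above can be sketched as a greedy line filler. The function name `wrap_word_boxes`, the `space` parameter, and the representation of each word by its pixel width alone are assumptions made for this sketch; they are not taken from any particular page decomposition software.

```python
def wrap_word_boxes(word_widths, display_width, space=4):
    """Greedily assign word bounding boxes to lines.

    word_widths: width in pixels of each word's bounding box, in reading order.
    display_width: width in pixels available on every line (assumed constant).
    space: hypothetical gap in pixels inserted between adjacent words.
    Returns a list of lines, each a list of word indices.
    """
    lines, current, used = [], [], 0
    for index, width in enumerate(word_widths):
        # Length of the line if this word were appended to it.
        needed = width if not current else used + space + width
        if current and needed > display_width:
            # The word would extend beyond the display width: start a new line.
            lines.append(current)
            current, needed = [], width
        current.append(index)
        used = needed
    if current:
        lines.append(current)
    return lines


layout = wrap_word_boxes([30, 40, 20, 50, 10], display_width=80)
print(layout)
```

Note that the sketch needs one bounding box per word, which is the per-word bookkeeping that makes this approach costly for long documents.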
In the following description, like reference numbers are used to identify like elements. Furthermore, the drawings are intended to illustrate major features of example embodiments in a diagrammatic manner. The drawings are not intended to depict every feature of actual embodiments nor relative dimensions of the depicted elements, and are not drawn to scale.
A “computer” is any machine, device, or apparatus that processes data. Some types of computers process data according to computer-readable instructions that are stored on a computer-readable medium either temporarily or permanently. Example types of computers include server, desktop, and portable computers, electronic book readers, personal digital assistants (PDAs), multimedia players, game controllers, mobile telephones, pagers, image and video recording and playback devices (e.g., digital still and video cameras, VCRs, and DVRs), printers, and other embedded data processing environments (e.g., application-specific integrated circuits (ASICs)).
The term “reflow area” refers to an area where text can be displayed, including areas of a display screen and areas of a printed page.
The terms “text” and “textual” refer to a logical arrangement of text elements (e.g., glyphs, characters, or symbols) of a written composition or a score of a musical composition. Text may or may not be presented with divisions between logical aggregates (e.g., words, sentences, or musical bars) of the constituent text elements.
A “text line image” is an image of a line of text. A “line of text” refers to a sequential arrangement of text elements, typically in reading order, along a line that may be straight or curved. The term “maximum line length” refers to the maximum potential length of a respective reflow area line that is available for displaying one or more whole or partial text line images. The maximum line lengths of different lines of a reflow area may be the same or different.
The embodiments that are described herein provide document reflow apparatus and methods that can reflow text written in flowing script type languages (e.g., Hindi) and non-word-based texts (e.g., music), without requiring significant memory and computational resources. At least some of these embodiments do not require the use of OCR engines or the like in order to reflow text and therefore are not limited to reflowing text for which OCR engines are available. Due to their efficient use of processing and memory resources, some of these embodiments may be implemented in relatively small and inexpensive components that have modest processing power and modest memory capacity. As a result, these embodiments are highly suitable for incorporation into compact computer device environments that have significant size, processing, and memory constraints, including but not limited to mobile devices (e.g., electronic book readers, portable computers, personal digital assistants (PDAs), multimedia players, game controllers, mobile telephones, and pagers), image and video recording and playback devices (e.g., digital still and video cameras, VCRs, and DVRs), printers, and other embedded data processing environments (e.g., application-specific integrated circuits (ASICs)).
A. Overview
The document image 16 may be any type of image that contains one or more lines of text (e.g., a scanned image of a printed page of text). The reflow area 18 may be any type of area where images of lines of text may be displayed, including a computer display screen and a printed page.
Embodiments of the document reflow system 10 may be implemented by one or more discrete modules (or data processing components) that are not limited to any particular hardware, firmware, or software configuration. In the illustrated embodiments, these modules may be implemented in any type of computer environment, including in digital electronic circuitry (e.g., an application-specific integrated circuit, such as a digital signal processor (DSP)) or in computer hardware, firmware, device driver, or software. In some embodiments, the functionalities of the modules of the document reflow system 10 are combined into a single data processing component. In some embodiments, the respective functionalities of each of one or more of the modules of the document reflow system 10 are performed by a respective set of multiple data processing components.
The document decomposition module 12 and the text line image reflowing module 14 may be co-located on a single apparatus or they may be distributed across multiple apparatus. If distributed across multiple apparatus, the document decomposition module 12 and the text line image reflowing module 14 may communicate with each other over local wired or wireless connections, or they may communicate over global network connections (e.g., over the Internet). In some example embodiments, the document decomposition module 12 is located on a server computer and the text line image reflowing module 14 is located on a client computer terminal (e.g., a desktop computer or a portable computer, such as an eBook reader or a mobile telephone).
B. Decomposing A Document Image
The document decomposition module 12 decomposes the document image 16 to produce a decomposition specification that includes specifications of locations of text line images corresponding to complete lines of text in the document image 16 (see
In accordance with the method of
In some embodiments, the decomposition specification is in the form of a data structure (e.g., a table or a list) that is stored on a computer-readable medium in an XML (eXtensible Markup Language) file format. The decomposition specification may be associated with the document image 16 in a variety of different ways. For example, in some embodiments, the decomposition specification may be incorporated into a metadata header of the document image data file. In other embodiments, the decomposition specification may be stored in a separate data file that includes a reference (e.g., a hyperlink or a uniform resource locator) to the document image 16.
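By way of a hedged example, a decomposition specification stored as XML might be read as follows. The element name `textline`, the attribute names `x`, `y`, `width`, `height`, and `image`, and the file name `page_001.png` are all hypothetical; the description above does not define a schema, so this is only one plausible layout.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TextLineBox:
    """Bounding box of one complete line of text, in image pixels."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical specification: one element per text line image, plus a
# reference back to the document image it describes.
SPEC_XML = """
<decomposition image="page_001.png">
  <textline x="40" y="100" width="520" height="28"/>
  <textline x="40" y="134" width="515" height="28"/>
  <textline x="40" y="168" width="310" height="28"/>
</decomposition>
"""


def parse_decomposition(xml_text):
    """Return (image reference, list of TextLineBox) from the XML text."""
    root = ET.fromstring(xml_text.strip())
    boxes = [
        TextLineBox(
            x=int(element.get("x")),
            y=int(element.get("y")),
            width=int(element.get("width")),
            height=int(element.get("height")),
        )
        for element in root.iter("textline")
    ]
    return root.get("image"), boxes


image_ref, line_boxes = parse_decomposition(SPEC_XML)
print(image_ref, len(line_boxes))
```

Because the specification holds only one box per text line rather than one per word, it stays small even for dense pages.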
In some embodiments, the document decomposition module 12 identifies text blocks in the document image 16 and determines specifications of bounding boxes that respectively contain complete lines of text in each text block using any of a variety of different document decomposition processes that commonly are used in optical character recognition technology. Such processes typically include image binarization and text segmentation. The binarization process typically involves classifying image pixels as text or background based on adaptive thresholding and histogram analysis. The text segmentation process typically involves using connected components analysis or edge-based analysis to identify regions of text in the binarized image.
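The following is a minimal sketch of binarization followed by text line segmentation. It deliberately substitutes simpler techniques for the ones named above: a fixed global threshold stands in for adaptive thresholding, and a row projection (grouping consecutive rows that contain ink) stands in for connected components or edge-based analysis. The function names and the grayscale-list image representation are assumptions made for illustration.

```python
def binarize(gray, threshold=128):
    """Classify pixels: 1 = text (dark), 0 = background (light).

    gray is a non-empty list of rows of 0-255 intensities. A fixed global
    threshold is a simplification of adaptive thresholding.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def find_text_lines(binary):
    """Return one (x, y, width, height) box per run of rows containing ink."""
    boxes = []
    start = None
    blank_row = [0] * len(binary[0])
    # A trailing blank sentinel row closes a line that touches the bottom edge.
    for y, row in enumerate(binary + [blank_row]):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            rows = binary[start:y]
            inked = [x for x in range(len(blank_row)) if any(r[x] for r in rows)]
            boxes.append((inked[0], start, inked[-1] - inked[0] + 1, y - start))
            start = None
    return boxes


page = [
    [255, 255, 255, 255, 255, 255],
    [255, 0, 0, 255, 0, 255],
    [255, 0, 255, 255, 0, 255],
    [255, 255, 255, 255, 255, 255],
    [255, 255, 0, 0, 0, 0],
]
print(find_text_lines(binarize(page)))
```

A row projection of this kind only separates lines that have a fully blank row between them, so skewed or touching lines would need the more robust analyses described above.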
C. Reflowing Text Line Images
The text line image reflowing module 14 reflows the text line images specified in the document decomposition specification received from the document decomposition module 12 in the reflow area 18 (see
In accordance with the method of
In accordance with the method of
The text line image reflowing module 14 sets the packed length for the current line of the reflow area equal to zero (
If the packed length is not greater than the maximum line length specified for the current reflow area line (
If the packed length is greater than the maximum line length specified for the current reflow area line (
When the packed length exceeds the maximum line length specified for the current line of the reflow area (
With respect to textual content that is written in accordance with an orthography (e.g., an orthography of a language using an alphabetic script) that includes discernable breaks (e.g., in English or German orthographies, words are separated by space marks) between words, the text line image reflowing module 14 searches for the first such break starting at the maximal division location and moving toward the beginning of the current text line image in reverse reading order.
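One way to picture the packing and break search is sketched below, under stated assumptions. Each text line image is reduced to a string with one character per pixel column (`#` if the column contains ink, `.` if it is blank); a break is taken to be a run of at least `min_gap` blank columns; every reflow area line has the same maximum line length; and `spacing` blank columns separate pieces packed onto the same line. These representations, names, and parameters are hypothetical and chosen for readability; this is not presented as the exact procedure of the embodiments.

```python
def find_division(cols, maximal_division, min_gap=2):
    """Search backward from maximal_division for the first break.

    Returns g such that cols[:g] ends with ink, is at most maximal_division
    columns long, and is followed by at least min_gap blank columns.
    Returns None when no such break exists.
    """
    for g in range(min(maximal_division, len(cols) - min_gap), 0, -1):
        if cols[g - 1] == "#" and cols[g:g + min_gap] == "." * min_gap:
            return g
    return None


def reflow(lines, max_len, min_gap=2, spacing=2):
    """Pack text line images into reflow lines of at most max_len columns."""
    out, current, packed = [], [], 0
    for piece in lines:
        while piece:
            gap = spacing if current else 0
            room = max_len - packed - gap
            if len(piece) <= room:
                # The whole remaining piece fits on the current reflow line.
                current.append(piece)
                packed += gap + len(piece)
                piece = ""
                continue
            g = find_division(piece, room, min_gap)
            if g is not None:
                # Split at the break; the tail carries over to the next line.
                current.append(piece[:g])
                piece = piece[g:].lstrip(".")
            elif not current:
                # No usable break even on an empty line: place it unsplit.
                current.append(piece)
                piece = ""
            # The current reflow line is finished; start a new one.
            out.append(current)
            current, packed = [], 0
    if current:
        out.append(current)
    return out


for reflowed_line in reflow(["###..##..####", "##..###"], max_len=8):
    print(reflowed_line)
```

Only column-level ink information is consulted, so no characters or words need to be recognized to choose the division locations.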
An approach similar to the one used to detect boundaries between words written in languages using alphabetic scripts is used for reflowing document images of musical compositions. In some embodiments, the text line image reflowing module 14 reflows text line images of a musical composition based on the detection of bar lines between the measures of the musical composition. In these embodiments, the text line image reflowing module 14 selects any needed division locations in such text line images at bar lines separating bars of the musical composition.
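As a rough sketch of bar-line-based division, the heuristic below treats any pixel column that is inked across the full height of the staff as a bar line and divides immediately after the last bar line that fits within the maximal division location. The full-height-column heuristic, the function names, and the `#`/`.` row representation are assumptions made for this example; the description above does not specify how bar lines are detected, and real scores would need a more careful detector (e.g., to reject tall note stems).

```python
def bar_line_columns(rows):
    """Return the columns that contain ink in every row of the staff image.

    rows is a list of equal-length strings cropped to the staff height,
    with '#' for ink and '.' for background.
    """
    width = len(rows[0])
    return [x for x in range(width) if all(row[x] == "#" for row in rows)]


def division_at_bar_line(rows, maximal_division):
    """Return the division column just after the last bar line that fits.

    The head rows[y][:division] includes the bar line itself and is at most
    maximal_division columns wide. Returns None if no bar line fits.
    """
    candidates = [x for x in bar_line_columns(rows) if x < maximal_division]
    return candidates[-1] + 1 if candidates else None


staff = [
    "############",  # staff line
    "....#......#",  # space: bar lines at columns 4 and 11
    "############",  # staff line
    "..#.#...#..#",  # space: note heads at columns 2 and 8, plus bar lines
    "############",  # staff line
]
print(bar_line_columns(staff))
print(division_at_bar_line(staff, maximal_division=8))
```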
With respect to textual content written in accordance with an orthography that does not include discernable breaks between words, the text line image reflowing module 14 may determine the division location in the current text line image based on an analysis of the textual content (
In general, the document reflow system 10 typically includes one or more discrete data processing components, each of which may be in the form of any one of various commercially available data processing chips. In some implementations, the document reflow system 10 is embedded in the hardware of any one of a wide variety of digital and analog computer devices, including desktop and workstation computers, digital still image cameras, digital video cameras, printers, scanners, and portable electronic devices (e.g., mobile phones, laptop and notebook computers, and personal digital assistants). In some embodiments, the document reflow system 10 executes process instructions (e.g., machine-readable code, such as computer software) in the process of implementing the methods that are described herein. These process instructions, as well as the data generated in the course of their execution, are stored in one or more computer-readable media. Storage devices suitable for tangibly embodying these instructions and data include all forms of non-volatile computer-readable memory, including, for example, semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices, magnetic disks such as internal hard disks and removable hard disks, magneto-optical disks, DVD-ROM/RAM, and CD-ROM/RAM.
A user may interact (e.g., enter commands or data) with the computer system 140 using one or more input devices 150 (e.g., a keyboard, a computer mouse, a microphone, a joystick, and a touch pad). Information may be presented through a user interface that is displayed to a user on the display 151 (implemented by, e.g., a display monitor), which is controlled by a display controller 154 (implemented by, e.g., a video graphics card). The computer system 140 also typically includes peripheral output devices, such as speakers and a printer. One or more remote computers may be connected to the computer system 140 through a network interface card (NIC) 156.
As shown in
The embodiments that are described herein provide document reflow apparatus and methods that can reflow text written in flowing script type languages (e.g., Hindi) and non-word-based texts (e.g., music), without requiring significant memory and computational resources. Due to their efficient use of processing and memory resources, some of these embodiments may be implemented in relatively small and inexpensive components that have modest processing power and modest memory capacity. As a result, these embodiments are highly suitable for incorporation into compact computer device environments that have significant size, processing, and memory constraints.
Other embodiments are within the scope of the claims.