1. Field
The exemplary embodiments described herein are directed to systems and methods for capturing images, and more specifically, to systems and methods for capturing images of documents.
2. Related Art
One way to capture images of documents is by using digital cameras or smartphones as shown in
There exist several systems with interactive or automatic features for rectification of such problems. For example, some digital cameras have a keystone correction feature. While viewing a photo, the user can select a menu command for keystone correction. Such systems automatically detect a set of edges that form trapezoids. The user selects (with arrow buttons) the desired trapezoid edges, and then selects a menu command to perform the correction to rectify the image into a rectangle with perpendicular sides. A new image is created and the original image is also kept.
Other systems utilize numerous corrective functions including a perspective transform, which can be complicated for novice users. The user selects the image, and goes to the menu command for Perspective Transform. Anchor points appear and the user can drag them to the desired locations. The anchor points are coupled so that anchor points on opposite edges move in unison.
Certain exemplary embodiments of the invention described here are directed to methods and systems that substantially obviate one or more of the above and other problems associated with related art techniques for image rectification.
Aspects of these exemplary embodiments include a method of manipulating an image containing a document, which may involve creating a three dimensional viewport for the image; upon receipt of a gesture for rectifying the image, rectifying the image according to the gesture; wherein the gesture is received directly on the image.
Other aspects of these exemplary embodiments may further include a method of manipulating an image of a document, which may involve creating a three dimensional viewport for the image of the document; and receiving a command; wherein if the command is for scrolling the image, scrolling the image according to the gesture and correcting for geometric distortion by utilizing a numerical algorithm to solve for a geometric transform during the scrolling.
Further aspects of the exemplary embodiments may further include an apparatus, which may involve a touch display operable to display an image of a document in a three dimensional viewport; and a manipulation module or a processor; wherein upon receipt of a gesture directly on the image displayed on the touch display, the manipulation module or processor manipulates the image according to the gesture.
Additional aspects of the exemplary embodiments will be set forth in part in the description which follows, and in part will be apparent from the description, or may be learned by practice of the exemplary embodiments. Aspects of the exemplary embodiments may be realized and attained by means of the elements and combinations of various elements and aspects particularly pointed out in the following detailed description and the appended claims.
It is to be understood that both the foregoing and the following descriptions are exemplary and explanatory only and are not intended to limit the embodiments or the application thereof in any manner whatsoever.
The accompanying drawings, which are incorporated in and constitute a part of this specification exemplify the embodiments and, together with the description, serve to explain and illustrate principles of the exemplary embodiments.
FIGS. 3a to 3c illustrate rectification of a document contained in a photo object in accordance with an exemplary embodiment.
FIGS. 4a and 4b illustrate exemplary gestures that can be used to perform the rectification in accordance with an exemplary embodiment.
FIGS. 5a and 5b illustrate scaling up a document image to make it readable in accordance with an exemplary embodiment.
FIGS. 6a to 6e illustrate a comparison of skewing when an image is scrolled vertically versus correcting for the skewing in accordance with an exemplary embodiment.
FIGS. 11a to 11c illustrate scrolling an image along the horizontal axis.
In the following detailed description, reference will be made to the accompanying drawings, in which identical functional elements are designated with like numerals. The aforementioned accompanying drawings show by way of illustration, and not by way of limitation, exemplary embodiments and implementations consistent with principles of the exemplary embodiments. These implementations are described in sufficient detail to enable those skilled in the art to practice the exemplary embodiments, and it is to be understood that other implementations may be utilized and that structural changes and/or substitutions of various elements may be made without departing from the scope and spirit of the exemplary embodiments. The following detailed description is, therefore, not to be construed in a limited sense. Additionally, the exemplary embodiments as described may be implemented in the form of software running on a general purpose computer, in the form of specialized hardware, or in a combination of software and hardware.
There are several drawbacks in using known methods of interacting with electronic devices to manipulate or correct the images of documents captured or displayed on such devices. For example, related art systems often require entering and exiting different modes when working with images of documents. This not only hampers fluid interaction with the document but also makes the interactions susceptible to mode errors. Control points or control segments utilized in related art systems to manipulate images of documents detract from a clean user interface design. For example, it is difficult to select small targets on touch screens (the “Fat Finger Problem”) such as control points and lines within related art systems. Moreover, using control points and control segments requires excessive operations (e.g., mouse clicking and dragging). Extra images are generated when using such control points and segments, and since high resolution is required for document legibility, the extra images can take up a substantial amount of memory as the size of the photo collection of documents grows.
Although automatic rectification methods do exist in related systems, they require the edges of the bounding rectangle of the document to be visible inside the photo, which is not always the case. Some related art methods are based on detecting lines of text on the page, which would not work well for pictures, diagrams, and handwriting. Hence, even though automatic methods are available in related art systems, there will be occasions when they fail.
To address these problems, exemplary embodiments of the invention utilize several features: 1) a three dimensional (3D) viewport for displaying a captured image instead of a standard two dimensional (2D) viewport, 2) gestures and multi-touch gestures for performing manipulation/rectification of the captured image within the 3D viewport without utilizing control points or segments, and 3) numerical methods (such as Newton's method) within the 3D viewport to correct for distortion of the image without requiring intensive processing power that may not be available on mobile devices. Exemplary embodiments of the invention can thereby support fluid modeless interaction, eliminate control points or segments, and generate only a small amount of metadata to specify the view of the scene rather than creating a new high resolution image. By employing multi-touch displays and 3D graphics, it is possible for the exemplary embodiments to provide interactive techniques for rectification that are intuitive and fluid. With multi-touch displays, the user can directly manipulate an image object (e.g., a photo of a document) with a rich vocabulary of gestures. By placing the image object in a 3D scene/viewport with a perspective camera rendering mechanism, the user can easily correct for perspective distortions by simply manipulating the image object in the scene within the 3D viewport.
Previous systems utilized two dimensional rectangular viewing regions as viewports in order to perform rectification. However, the manipulation of the object is thereby restricted only along the x and y dimensions. The exemplary embodiments of the invention create a novel three dimensional viewing region for the image of the document as a viewport to allow the user to perform rectification with gestures and multi-touch gestures and permit manipulation in the 3D region. The exemplary embodiments of the invention utilize the x, y, and z dimensions to perform numerical methods that provide rectification while requiring less computational work than the previous systems.
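The effect of a perspective camera on an object in a 3D viewport can be illustrated with a minimal sketch. The code below is not taken from the exemplary embodiments; the function names, the pinhole camera model, the camera distance, and the page dimensions are all assumptions chosen for illustration. It rotates a rectangular page about the y-axis and projects its corners, showing that the edge nearer the camera is rendered taller than the far edge. This is the keystone shape, and a rotation in the opposite sense is what can counteract a keystone already present in a photo.

```python
import math

def rotate_y(point, theta):
    """Rotate a 3D point about the y-axis by theta radians."""
    x, y, z = point
    return (x * math.cos(theta) + z * math.sin(theta),
            y,
            -x * math.sin(theta) + z * math.cos(theta))

def project(point, camera_distance=5.0, focal_length=1.0):
    """Perspective-project a point onto the screen plane.

    Assumed camera model: the camera sits on the +z axis at
    camera_distance and looks toward -z, so depth = camera_distance - z.
    """
    x, y, z = point
    depth = camera_distance - z
    return (focal_length * x / depth, focal_length * y / depth)

def edge_heights(theta, width=2.0, height=3.0):
    """Projected heights of the left and right edges of a page
    centered at the origin and rotated by theta about the y-axis."""
    hw, hh = width / 2, height / 2
    corners = {
        "bl": (-hw, -hh, 0.0), "tl": (-hw, hh, 0.0),
        "br": (hw, -hh, 0.0), "tr": (hw, hh, 0.0),
    }
    p = {name: project(rotate_y(c, theta)) for name, c in corners.items()}
    left = p["tl"][1] - p["bl"][1]
    right = p["tr"][1] - p["br"][1]
    return left, right
```

With no rotation the two edges project to the same height, so the page appears as a rectangle. With a nonzero rotation the heights differ and the page appears as a trapezoid.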
Certain exemplary embodiments attempt rectification by focusing on the problem of perspective distortion (also known as keystone correction). This occurs when the photo of a document is taken at an angle, as shown in
FIGS. 3a-3c illustrate rectification of a photo object in accordance with an exemplary embodiment. To perform the rectification, the photo object is first rotated in the 3D viewport so that the contents on the document are straightened out, as shown in the transition from rotating
FIGS. 4a and 4b illustrate exemplary gestures that can be used to perform the rectification of
A side effect of the rotation operation is that the content on the document is oftentimes compressed horizontally, as shown in
FIGS. 5a and 5b illustrate scaling up a document image within the 3D viewport to make it readable in accordance with an exemplary embodiment. The document image can be scaled up to make it readable as shown in
Next, the user might want to scroll around on the page by using the familiar scroll bars around the viewport, or equivalently by touching and dragging the page around.
An alternative approach is to employ both a 2D viewport and a 3D viewport, where the 3D viewport is a special mode that the user enters to perform perspective correction and to create a new image for viewing, which can then be viewed back in the 2D viewport. However, there are several disadvantages such as more complexity for the user in terms of multiple UI metaphors, requiring modes, and taking up more memory with the generated images. A more technical problem is that on some platforms (e.g. WINDOWS 7), the rendering pipeline to produce the nice transformed image on the display, which leverages specialized graphics hardware such as the graphical processing unit (GPU), is different from the software rendering pipeline for processing bitmaps, and the results of the latter exhibit a noticeable decrease in image quality (such as poorer anti-aliasing).
Scrolling in the 3D scene under perspective rendering leads to some problems in 3D graphics. Examples in vertical scrolling are described below. For reference, the standard convention to describe the coordinates in 3D graphics is that the x-axis points to the right, the y-axis points up, and the z-axis points outward from the screen.
FIGS. 6a to 6e illustrate a comparison of skewing when an image is scrolled vertically versus correcting for the skewing in accordance with an exemplary embodiment.
Referring to
An approach according to an exemplary embodiment to compute the angle ψ for a given scroll value dy and angle θ is first to derive an equation relating them. This leads to terms involving sines and cosines of these angles, but the formulas do not appear to reduce to a simple closed-form expression for ψ. A numerical method such as Newton's Method can be applied instead, since the sine and cosine functions are differentiable.
Using trigonometry, an expression can be derived in the form:
f(ψ)=0 (1),
where f(ψ) is a polynomial in trigonometric functions of ψ and (τ+ψ), which also involves the parameters b, d, h, dy. The explicit formula is:
f(ψ)=(b cos ψ−dy)[−h sin(τ+ψ)−(d+dy tan ψ)]−(h cos(τ+ψ)−dy)[b sin ψ+(d+dy tan ψ)]=0. (2)
Formula (2) does not reduce to a simple formula for ψ, so f(ψ)=0 should be solved numerically, for example by applying an algorithm such as Newton's Method. In order to use Newton's Method, which is an iterative algorithm, the derivative f′(ψ) and an initial estimate ψ_0 of the solution are needed. Newton's Method then defines an iterative sequence of values {ψ_n} by the recurrence equation:
ψ_{n+1} = ψ_n − f(ψ_n)/f′(ψ_n).
The derivative f′(ψ) can be obtained from f(ψ), given by expression (2) above, using the chain rule from calculus along with the basic formulas for the derivative of the trigonometric functions.
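As a sketch of that calculation, formula (2) can be written as a difference of two products, f = AB − CD. The shorthand A, B, C, D is introduced here only for illustration and does not appear in the original description. Applying the product rule to each product, and the chain rule to the terms in (τ+ψ), gives:

```latex
\begin{aligned}
A &= b\cos\psi - \mathit{dy}, & A' &= -b\sin\psi,\\
B &= -h\sin(\tau+\psi) - (d + \mathit{dy}\tan\psi), & B' &= -h\cos(\tau+\psi) - \mathit{dy}\sec^2\psi,\\
C &= h\cos(\tau+\psi) - \mathit{dy}, & C' &= -h\sin(\tau+\psi),\\
D &= b\sin\psi + (d + \mathit{dy}\tan\psi), & D' &= b\cos\psi + \mathit{dy}\sec^2\psi,\\
f(\psi) &= AB - CD, & f'(\psi) &= A'B + AB' - C'D - CD'.
\end{aligned}
```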
The initial estimate ψ_0 of the solution can be made by taking a rough approximation of
In implementations of the exemplary embodiments, the terms are computed until the difference between successive terms is less than ε=0.00001. This typically requires only a few iterations, and it runs fast enough for real time interaction as the user clicks the scrollbar repeatedly.
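The iteration described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation of the exemplary embodiments: the function names, the iteration cap max_iter, and the error handling are hypothetical, and the caller must supply the initial estimate psi0. The function f evaluates the left-hand side of formula (2), f_prime is its derivative obtained with the product and chain rules, and the loop stops when successive terms differ by less than ε.

```python
import math

def f(psi, b, d, h, dy, tau):
    """Left-hand side of formula (2)."""
    t = d + dy * math.tan(psi)
    return ((b * math.cos(psi) - dy) * (-h * math.sin(tau + psi) - t)
            - (h * math.cos(tau + psi) - dy) * (b * math.sin(psi) + t))

def f_prime(psi, b, d, h, dy, tau):
    """Derivative of formula (2) with respect to psi (f = A*B - C*D)."""
    t = d + dy * math.tan(psi)
    dt = dy / math.cos(psi) ** 2  # derivative of d + dy*tan(psi)
    A, dA = b * math.cos(psi) - dy, -b * math.sin(psi)
    B, dB = -h * math.sin(tau + psi) - t, -h * math.cos(tau + psi) - dt
    C, dC = h * math.cos(tau + psi) - dy, -h * math.sin(tau + psi)
    D, dD = b * math.sin(psi) + t, b * math.cos(psi) + dt
    return dA * B + A * dB - dC * D - C * dD

def solve_psi(b, d, h, dy, tau, psi0, eps=1e-5, max_iter=50):
    """Newton's Method: iterate until successive terms differ by < eps.

    max_iter is a safety cap assumed for this sketch.
    """
    psi = psi0
    for _ in range(max_iter):
        slope = f_prime(psi, b, d, h, dy, tau)
        if slope == 0.0:
            raise ZeroDivisionError("f'(psi) is zero; choose another psi0")
        nxt = psi - f(psi, b, d, h, dy, tau) / slope
        if abs(nxt - psi) < eps:
            return nxt
        psi = nxt
    raise RuntimeError("Newton's Method did not converge")
```

Because each step needs only a handful of trigonometric evaluations, a solver of this kind is cheap enough to call on every scroll event. As with any use of Newton's Method, convergence depends on a reasonable initial estimate.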
FIGS. 11a to 11c illustrate scrolling an image along the horizontal axis. The image of the document in
To correct for the stretching shown in 1201 and to reach a correctly rectified image as shown in 1202 of
By similar triangles, we have
Since dz=dx tan θ, the solution is
Moreover, other implementations of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. Various aspects and/or components of the described embodiments may be used singly or in any combination in the image identification system. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.