The United States Patent and Trademark Office (USPTO) has published a notice effectively stating that the USPTO's computer programs require that patent applicants both reference a serial number and indicate whether an application is a continuation or continuation-in-part. See Stephen G. Kunin, Benefit of Prior-Filed Application, USPTO Official Gazette 18 Mar. 2003. The present Applicant Entity (hereinafter “Applicant”) has provided above a specific reference to the application(s) from which priority is being claimed as recited by statute. Applicant understands that the statute is unambiguous in its specific reference language and does not require either a serial number or any characterization, such as “continuation” or “continuation-in-part,” for claiming priority to U.S. patent applications. Notwithstanding the foregoing, Applicant understands that the USPTO's computer programs have certain data entry requirements, and hence Applicant is designating the present application as a continuation-in-part of its parent applications as set forth above, but expressly points out that such designations are not to be construed in any way as any type of commentary and/or admission as to whether or not the present application contains any new matter in addition to the matter of its parent application(s).
All subject matter of the Related Applications and of any and all parent, grandparent, great-grandparent, etc. applications of the Related Applications is incorporated herein by reference to the extent such subject matter is not inconsistent herewith.
1. Field
Embodiments relate to optical character and text recognition and to finger-tapping gestures for working with text in images.
2. Related Art
Various types of input devices perform operations in association with electronic devices such as mobile phones, tablets, scanners, personal computers, copiers, etc. Exemplary operations include moving a cursor and making selections on a display screen, paging, scrolling, panning, zooming, etc. Input devices include, for example, buttons, switches, keyboards, mice, trackballs, pointing sticks, joy sticks, touch surfaces (including touch pads and touch screens), etc.
Recently, integration of touch screens with electronic devices has provided tremendous flexibility for developers to emulate a wide range of functions (including the displaying of information) that can be performed by touching the screen. This is especially evident when dealing with small-form electronic devices (e.g., mobile phones, personal data assistants, tablets, netbooks, portable media players) and large electronic devices embedded with a small touch panel (e.g., multi-function printer/copiers and digital scanners).
Existing gesture-based emulation techniques are often ineffective or unavailable for the activities and operations of existing devices, software, and user interfaces. Further, it is difficult to select and manipulate text-based information shown on a screen using gestures, especially where the information is displayed in the form of an image. For example, operations such as selecting a correct letter, word, line, or sentence to be deleted, copied, inserted, or replaced often prove difficult or impossible using gestures.
Embodiments disclose a device with a touch sensitive screen that supports receiving input such as through tapping and other touch gestures. The device can identify, select or work with initially unrecognized text. Unrecognized text may be found in existing images or images dynamically displayed on the screen (such as through showing images captured by a camera lens in combination with video or photography software). Text is recognized and may be subsequently selected and/or processed.
A single tap gesture can cause a portion of a character string to be selected. A double tap gesture can cause the entire character string to be selected. A tap and hold gesture can cause the device to enter a cursor mode wherein a placement of a cursor relative to the characters in a character string can be adjusted. In a text selection mode, a finger can be used to move the cursor from a cursor start position to a cursor end position and to select text between the positions.
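By way of a non-limiting illustration, the gesture handling just described might be sketched as follows in Python. The GestureHandler class, the text_view object and its enter_cursor_mode/select_string/select_word methods, and the timing thresholds are assumptions made for this sketch rather than part of any particular platform's API.

```python
import time

# Illustrative thresholds; a real platform would expose its own gesture recognizer.
DOUBLE_TAP_WINDOW = 0.3   # max seconds between taps for a double tap
HOLD_THRESHOLD = 0.5      # min seconds of contact for a tap-and-hold

class GestureHandler:
    """Maps completed touch events to the selection behaviors described above."""

    def __init__(self, text_view):
        self.text_view = text_view   # hypothetical object exposing the selection operations
        self.last_tap_time = 0.0

    def on_touch_up(self, x, y, contact_duration):
        """Called when the finger lifts at (x, y) after contact_duration seconds."""
        now = time.monotonic()
        if contact_duration >= HOLD_THRESHOLD:
            # Tap-and-hold: enter cursor mode so the cursor position can be adjusted.
            self.text_view.enter_cursor_mode(x, y)
        elif now - self.last_tap_time <= DOUBLE_TAP_WINDOW:
            # Double tap: select the entire character string at the touch point.
            self.text_view.select_string(x, y)
            self.last_tap_time = 0.0
        else:
            # Single tap: select a portion (e.g., a word) of the character string.
            self.text_view.select_word(x, y)
            self.last_tap_time = now
```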
Selected or identified text can populate fields, control the device, etc. Recognition of text (e.g., through one or more optical character recognition functions) can be performed upon access to or capture of an image. Alternatively, recognition of text can be performed in response to the device detecting a tapping or other touch gesture on the touch sensitive screen of the device. Tapping is preferably on or near a portion of text that a user seeks to identify or recognize, and acquire, save, or process.
Other details and features will be apparent from the detailed description.
In the following description, for purposes of explanation, numerous specific details are set forth. Other embodiments and implementations are possible.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
Broadly, a technique described herein is to select or identify text based on gestures. The technique may be implemented on any electronic device with a touch interface to support gestures, or on devices that accept input or feedback from a user, or on devices through automated selection means (e.g., software or firmware algorithms). Advantageously, in one embodiment, once text is selected or identified, further processing is initiated based on the selected or identified text, as further explained.
While the category of electronic devices with a touch interface to support gestures is quite large, for illustrative purposes, reference is made to a multi-function printer/copier or scanner equipped with a touch sensitive screen. Hardware for such a device is described with reference to
In one embodiment, a tapping gesture is used for text selection or identification. The type of tapping gesture determines how text gets selected or how a portion of text is identified.
Referring to
If the user is not content with the initial cursor position “A”, the user does not release the finger 104 and does not enter text selection mode as described. Instead, the user maintains finger contact on the touch screen 100 to cause the device to remain in cursor mode. In cursor mode, the user can slide the finger 104 to move the cursor 112 and/or cursor control 110 to a desired location in the text 102. Typically, movement of the cursor control 110 causes a sympathetic or corresponding movement in the position of the cursor 112. In the example of
Text selection in text selection mode is illustrated with reference to
The above-described gesture-based methods may advantageously be implemented on a scanner to capture information from scanned documents. Alternatively, such gesture-based methods may be implemented on a touch-enabled display, such as on a smartphone, a tablet device, a laptop having a touch screen, etc.
With reference to
In some embodiments, a touch screen 100 may display an image comprising text that has not previously been subjected to optical character recognition (OCR). In such cases, an OCR operation is performed as described herein more fully. In summary, the OCR operation may be performed immediately after the device accesses, opens or captures the image or displays the image. Alternatively, the OCR operation may be performed over a portion of the image as a user interacts with the (unrecognized) image displayed on the touch screen 100.
The OCR operation may be performed according to or part of one of various scenarios, some of which are illustrated in the flowchart 700 of
In one embodiment, if an OCR operation is performed, the device (e.g., tablet, smartphone) identifies a relevant portion of the image that likely has text, and performs OCR on that entire portion of the image or on the entire image. This scenario involves segmenting the image into regions that likely have text and performing recognition on each of these regions—characters, words and/or paragraphs (regions) are located, and characters, words, etc. are recognized. This scenario occurs at step 706 when OCR is performed.
Alternatively, the capture portion of a device may send the image (or a portion of the image) to a component of the system, and the component of the system may perform one or more OCR functions and return the result to the device. For example, a smartphone could capture an image and send the image to a network-accessible computing component or device, and the computing component or device (e.g., a server or cloud-based service) would OCR the image and return a representation of the image and/or the recognized text back to the smartphone.
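As a rough sketch of this alternative, the capture device could post the image to a network-accessible OCR service over HTTP; the endpoint URL and the JSON response field used below are assumptions for illustration only.

```python
import requests

def remote_ocr(image_path, endpoint="https://ocr.example.com/recognize"):
    """Send a captured image to a (hypothetical) OCR service and return the recognized text."""
    with open(image_path, "rb") as image_file:
        response = requests.post(endpoint, files={"image": image_file}, timeout=30)
    response.raise_for_status()
    # The service is assumed to answer with JSON of the form {"text": "..."}.
    return response.json().get("text", "")
```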
Assuming that the system or device performed OCR function(s) on the image, at block 708, a user selects or identifies text on a representation of the image shown on the touch screen of the device by making a tapping or touch gesture to the touch screen of the device. The portion of text so selected or identified optionally could be highlighted or otherwise displayed in a way to show that the text was selected or identified. The highlighting or indication of selection could be shown until a further action is triggered, or the highlighting or indication of selection could be displayed for a short time, such as a temporary flashing of the selected, touched, or indicated text. Such highlighting could be done by any method known in the user interface programming art. After text is selected, at block 716, further processing may be performed with the selected or identified text (as explained further below). Preferably, such further processing occurs in consequence of selecting or identifying the text (at 708) such that further processing occurs directly after said selecting or identifying.
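Where recognition has already been performed with word coordinates retained, resolving a tap to the selected text can be a simple hit test against the recognized words' bounding boxes. A minimal sketch, assuming the open-source pytesseract package supplies the recognition results:

```python
import pytesseract
from PIL import Image

def word_at_point(image_path, x, y):
    """Return the recognized word whose bounding box contains the tapped point (x, y), if any."""
    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data["text"]):
        left, top = data["left"][i], data["top"][i]
        width, height = data["width"][i], data["height"][i]
        if word.strip() and left <= x <= left + width and top <= y <= top + height:
            return word
    return None
```

The returned word (and its bounding box) could then be highlighted on the touch screen or handed to the further processing of block 716.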
From block 706, when the system does not perform OCR on the entire image (initially), further processing is done at block 710. The further processing at block 710 includes, for example, allowing a user to identify a relevant portion or area of the image by issuing a tap or touch gesture to the touch screen of the device as shown in block 712. For example, a user could make a single tap gesture on or near a single word. In response to receiving the gesture, the device estimates an area of interest, identifies the relevant region containing or including the text (e.g., word, sentence, paragraph), performs one or more OCR functions, and recognizes the text corresponding to the gesture at block 713. The OCR of the text 713 preferably occurs in response to or directly after identifying a relevant portion of text or area of text.
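A sketch of the on-demand variant at blocks 712–713, in which only an estimated area of interest around the tap is recognized; the fixed crop size is an illustrative heuristic standing in for true region detection, and pytesseract is again assumed as the OCR engine.

```python
import pytesseract
from PIL import Image

def ocr_region_around_tap(image_path, x, y, half_width=150, half_height=40):
    """Estimate an area of interest around the tap point and recognize only that crop."""
    image = Image.open(image_path)
    box = (max(0, x - half_width), max(0, y - half_height),
           min(image.width, x + half_width), min(image.height, y + half_height))
    # A real implementation might instead grow the region to whitespace or layout boundaries.
    return pytesseract.image_to_string(image.crop(box)).strip()
```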
Alternatively, the further processing of block 710 could be receiving an indication of an area that includes text as shown at block 714. When a user selects an entire area, such as by communicating a rectangle gesture to the touch screen (and corresponding portion of the image), the device or part of the system performs OCR on the entire selected area of the image. For example, a user could select a column of text or a paragraph of text from an image of a page of a document.
Once text is selected, yet further processing may be performed at block 716. For example, a document (or email message, SMS text message, or other “document”-like implementation) may be populated or generated with some or all of the recognized text. Such a document may be automatically generated in response to the device receiving a tapping gesture at block 708. For example, if a user takes a picture of text that includes an email address, and then makes a double-tap gesture on or near the email address, the system may find the relevant area of the image, OCR (recognize) the text corresponding to the email address, recognize the text as an email address, open an application corresponding to an email message, and populate a “to” field with the email address. Such a sequence of events may occur even farther upstream in the process, such as at the point of triggering a smartphone to take a picture of text that includes an email address. Thus, an email application may be opened and pre-populated with the email address from the picture in response to just taking a picture. The same could be done with a phone number: a picture could be taken, and the phone number would be dialed or stored into a contact. No intermediate encoding of text, such as through use of a QR code, need be used, due in part to the OCR processing.
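The kind of dispatch described above, in which recognized text is classified and a corresponding action launched, might be sketched as follows; the regular expressions and the launch_email/dial callbacks are assumptions supplied for illustration, with the callbacks standing in for the host application's email and telephony functions.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def dispatch_recognized_text(text, launch_email, dial):
    """Classify recognized text and invoke the matching action callback."""
    email_match = EMAIL_RE.search(text)
    if email_match:
        launch_email(email_match.group())   # e.g., open a message with the "to" field pre-populated
        return "email"
    phone_match = PHONE_RE.search(text)
    if phone_match:
        dial(phone_match.group())           # e.g., dial the number or store it as a contact
        return "phone"
    return "text"
```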
The techniques described may be used with many file types. For example, image file types (e.g., .tiff, .jpg, and .png) may be used. Further, vector-based images that do not have encoded text present may be used. PDF format documents may or may not have text already encoded and available. At the time of opening a PDF document, a device using the techniques described herein can determine whether the document has encoded text information available or not, and can determine whether OCR is needed.
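For the PDF case, the check for already-encoded text might look like the following sketch, which assumes the pypdf library; a document whose pages yield no extractable text is treated as image-only and routed to OCR.

```python
from pypdf import PdfReader

def pdf_needs_ocr(pdf_path):
    """Return True if the PDF appears to contain no encoded text and therefore needs OCR."""
    reader = PdfReader(pdf_path)
    for page in reader.pages:
        if (page.extract_text() or "").strip():
            return False   # encoded text is already available; no OCR required
    return True            # no extractable text found; OCR the rendered pages instead
```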
The described tapping gestures are preferably performed on the middle panel 804 to select text. As shown in
Each time a user selects, identifies or interacts with the text in the middle panel 804, the user interface and software operating on the device may automatically determine the format of the text so selected. For example, if a user selects text that is a date, the OCR function determines or recognizes that the text is a date and populates the field with a date. The format associated with a “date” field in the left panel 802 may be used to format the text selected or extracted from the middle panel 804. In
Similarly, fonts and other text attributes may be modified consistent with a configuration for each data field in the left panel 802 as the text is identified or selected, and sent to the respective field. Thus, the font, text size, etc. of the text found or identified in the image of the middle panel 804 is not required to be perpetuated to the fields in the left panel 802, but may be. In such case, the user interface attempts to match the attributes of the recognized text of the image with a device-accessible, device-generated, or device-specific font, etc. For example, the Invoice Number from the image may correspond to an Arial typeface of size 14 generated by the operating system of a smartphone. With reference to
With reference to
The system 900 also may receive a number of inputs and outputs for communicating information externally. For interface with a user or operator, the system 900 may include one or more user input devices 906 (e.g., keyboard, mouse, imaging device, touch-sensitive display screen) and one or more output devices 908 (e.g., Liquid Crystal Display (LCD) panel, sound playback device (speaker, etc)).
For additional storage, the system 900 may also include one or more mass storage devices 910 (e.g., removable disk drive, hard disk drive, Direct Access Storage Device (DASD), optical drive (e.g., Compact Disk (CD) drive, Digital Versatile Disk (DVD) drive), tape drive). Further, the system 900 may include an interface with one or more networks 912 (e.g., local area network (LAN), wide area network (WAN), wireless network, Internet) to permit the communication of information with other computers coupled to the one or more networks. It should be appreciated that the system 900 may include suitable analog and digital interfaces between the processor 902 and each of the components 904, 906, 908, and 912 as may be known in the art.
The system 900 operates under the control of an operating system 914, and executes various computer software applications, components, programs, objects, modules, etc., to implement the techniques described. Moreover, various applications, components, programs, objects, etc., collectively indicated by Application Software 916 in
The routines executed to implement the embodiments may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as computer programs. The computer programs may comprise one or more sets of instructions stored at various times in various memory and storage devices in a computer that, when read and executed by one or more processors in a system, cause the system to perform operations necessary to execute elements involving the various aspects.
While the techniques have been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments are capable of being distributed as a program product in a variety of forms, and that the techniques operate regardless of the particular type of computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable-type media such as volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs), etc.), among others, and transmission-type media such as digital and analog communication links.
Further, the functionality operating on the cellular telephone 1006 may only momentarily, temporarily or in passing capture an image of some or all of the business card. For example, upon engaging the desired function, the cellular telephone 1006 may operate the camera 1008 in a video mode, capture one or more images, and recognize text in these images until the cellular telephone 1006 locates one or more telephone numbers. At that time, the cellular telephone 1006 discards any intermediate or temporary data and any captured video or images, and initiates the telephone call.
Alternatively, the cellular telephone 1006 may be placed into a telephone number capture mode. In such mode, the camera 1008 captures image(s), the cellular telephone 1006 extracts information, and the information is stored in a contact record. In such mode any amount of recognized data may be used to populate fields associated with the contact record such as first name, last name, street address and telephone number(s) (or other information available in an image of some or all of the business card 1002). A prompt confirming correct capture may be shown to the user on the touch screen 1016 prior to storing the contact record.
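A sketch of how recognized business-card text might be mapped into contact-record fields in the capture mode just described; the field heuristics are illustrative assumptions, and a real implementation would apply stronger classification before prompting the user for confirmation.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def contact_from_card_text(recognized_lines):
    """Map recognized business-card lines to contact-record fields (best-effort heuristics)."""
    contact = {"name": None, "phone": None, "email": None, "address": []}
    for line in (raw.strip() for raw in recognized_lines):
        if not line:
            continue
        if contact["email"] is None and EMAIL_RE.search(line):
            contact["email"] = EMAIL_RE.search(line).group()
        elif contact["phone"] is None and PHONE_RE.search(line):
            contact["phone"] = PHONE_RE.search(line).group()
        elif contact["name"] is None:
            contact["name"] = line           # assume the first unclassified line is the name
        else:
            contact["address"].append(line)  # remaining lines are treated as address text
    contact["address"] = ", ".join(contact["address"])
    return contact
```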
Alternatively, some or all of a page of the document (not shown) could be shown on the touch screen 1106. A user could select text from the document in the image 1108 shown on the touch screen 1106 according to the mechanism(s) described in reference to
Once an image is accessed or acquired, the software program segments the image into regions 1206. These regions are those that likely contain a group of textual elements (e.g., letters, words, sentences, paragraphs). Such segmentation may include calculating or identifying coordinates, relative to one or more positions in the image, of the textual elements. Such coordinates may be recorded or saved for further processing. Segmenting 1206 may include one or more other functions. After segmenting, one or more components perform optical character recognition (OCR) functions on each of the identified regions 1208. The OCR step 1208 may include one or more other related functions such as sharpening of regions of the acquired image, removing noise, etc. The software then waits for input (e.g., gesture, touch by a user) to a touch enabled display on a location of or near one of the segmented text regions. In a preferred implementation, at least a portion of the image, or a representation of the image, is shown on the touch enabled display. The displayed image may serve as a reference for determining where in the image a touch or gesture is given and interpreted. It is through the displayed image that interaction with the text of the image is possible.
In response to receiving an input or gesture 1210, the software interprets the input or gesture and then identifies a relevant text of the image 1212. Such identification may include, for example, selecting the relevant text, saving the relevant text to a memory or storage, copying the text to a memory, or passing a copy of the text to another function or software. Further processing 1214 is preferably performed on the identified text. Further processing 1214 may include such activities as populating a field of a database accessible by the software, dialing a telephone number, sending an SMS text message, launching an email program and populating one or more relevant fields, etc.
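An illustrative sketch of the up-front pass of this flow—segment the whole image and recognize every region, retaining coordinates so a later gesture can be resolved against the results. Here pytesseract's word-level output is used as a stand-in for the segmentation and recognition steps 1206 and 1208.

```python
import pytesseract
from PIL import Image

def recognize_all_regions(image_path):
    """Segment and recognize the entire image up front, keeping each word with its coordinates."""
    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=pytesseract.Output.DICT)
    regions = []
    for i, word in enumerate(data["text"]):
        if word.strip():
            regions.append({"text": word,
                            "box": (data["left"][i], data["top"][i],
                                    data["width"][i], data["height"][i])})
    return regions
```

A subsequent touch or gesture at 1210 can then be resolved against the stored boxes (as in the earlier hit-test sketch) without re-processing the image, and the identified text handed to the further processing 1214.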
Once an image is accessed or acquired, the software program partially segments the image into regions 1306. These regions are those that likely contain a group of textual elements (e.g., letters, words, sentences, paragraphs). Such partial segmentation may include calculating or identifying some possible coordinates, relative to one or more positions in the image, of the textual elements. Such coordinates may be recorded or saved for further processing. Partially segmenting the image 1306 may include one or more other functions. Partial segmentation may identify down to the level of each character, or may segment just down to each word, or just identify those few regions that contain a block of text. As to
Instead, at this stage of the exemplary method 1300, the software waits for and receives input (e.g., gesture, touch by a user) to the touch enabled display 1308 on a location of or near one of the segmented text regions. In a preferred implementation, at least a portion of the image, or a representation of the image, is shown on the touch enabled display. The displayed image may serve as a reference for determining where in the image a touch or gesture is given and interpreted. It is through the displayed image that interaction with the text of the image is possible.
In response to receiving the touch or gesture 1308, one or more components perform one or more optical character recognition (OCR) functions 1310 on an identified region that corresponds to the touch or gesture. The OCR step 1310 may include one or more other related functions such as sharpening of a relevant region of the acquired image, removing noise from the relevant region, etc. For example, a block or region of the image (that includes a word of text in bitmap format) “receives” a double-tap gesture and this block or region of the image is subjected to an OCR function through which the word is recognized and identified. Next, the relevant text is identified 1312. Continuing with the double-tap example, such identification involves identifying just a single word from a line of text where the tap gesture has been interpreted to refer to the particular word based on the location of the tap gesture. Identification may also include displaying the word on the touch enabled display or altering the pixels of the image that correspond to the word in the image. At this point, the displayed image or portion of the image is still preferably a bitmapped image, but may include a combination of bitmapped image and rendering to the display of encoded (i.e., recognized) text. The displaying of text may include addition of a highlighting characteristic, or a color change to each letter of the selected word. Such identification may also include, for example, selecting the relevant text, saving the relevant text to a memory or storage, copying the text to a memory, or passing a copy of the text to another function or software. Further processing 1314 is preferably performed on the identified text. Further processing 1314 may include such activities as populating a field of a database accessible by the software, dialing a telephone number, sending an SMS text message, launching an email program and populating one or more relevant fields, etc. The further processing may be dependent upon the interpretation of the recognized text. For example, if the word selected through a tap gesture is “open,” the further processing may involve launching of a function or dialogue for a user to open a document. In another example, if the word selected through a tap gesture is “send,” further processing may involve communicating to the instant or other software application to receive the command to “send.” In yet another example, if the text selected through a tap gesture is “call 650-123-4567”, further processing may involve causing the device to call the recognized phone number.
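A sketch of the deferred recognition at 1310–1312, in which only the block containing the tap is recognized; block_boxes is assumed to hold the rectangles produced by the earlier partial segmentation 1306, and pytesseract again stands in for the OCR function.

```python
import pytesseract
from PIL import Image

def ocr_tapped_block(image_path, block_boxes, x, y):
    """Recognize only the pre-segmented block, if any, that contains the tapped point (x, y)."""
    for left, top, right, bottom in block_boxes:
        if left <= x <= right and top <= y <= bottom:
            crop = Image.open(image_path).crop((left, top, right, bottom))
            return pytesseract.image_to_string(crop).strip()
    return None   # the tap did not land on a block that likely contains text
```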
At this stage of the exemplary method 1400, the software waits for and receives input (e.g., gesture, touch by a user) to the touch enabled display 1406 on a location of or near one of the segmented text regions. In a preferred implementation, at least a portion of the image, or a representation of the image, is shown on the touch enabled display when waiting for the input, gesture or touch. The displayed image may serve as a reference for determining where in the image a touch or gesture is given and interpreted. It is through the displayed image that interaction with the text of the image is possible.
In response to receiving the touch or gesture 1406, one or more components perform identification 1408 (such as a segmentation or a location identification) on a relevant portion (or entirety) of the image. Further, one or more components perform one or more OCR functions 1410 on an identified region that corresponds to the touch or gesture. The segmentation step 1408 or OCR step 1410 may include one or more other related functions such as sharpening of a relevant region of the acquired image, removing noise from the relevant region, etc. For example, a block or region of the image (that includes a word of text in bitmap format) “receives” a double-tap gesture and this block or region of the image is subjected to segmentation to identify a relevant region, and then to an OCR function through which the word is recognized and identified. Segmentation and OCR of the entire image need not be performed through this method 1400 if the gesture indicates less than the entire image. Accordingly, less computation by a processor is needed for a user to gain access to recognized (OCR'd) text of an image through this method 1400.
Next, the text of a relevant portion of the image is identified 1412. Continuing with the double-tap example, such identification involves identifying just a single word from a line of text where the tap gesture has been interpreted to refer to the particular word based on the location of the tap gesture. Identification may also include displaying the word on the touch enabled display or altering the pixels of the image that correspond to the word in the image. At this point, the displayed image or portion of the image is still preferably a bitmapped image, but may include a combination of bitmapped image and rendering to the display of encoded (i.e., recognized) text. The displaying of text may include addition of a highlighting characteristic, or a color change to each letter of the selected word. Such identification may also include, for example, selecting the relevant text, saving the relevant text to a memory or storage, copying the text to a memory, or passing a copy of the text to another function or software. Further processing 1414 is preferably performed on the identified text. Further processing 1414 may include such activities as populating a field of a database accessible by the software, dialing a telephone number, sending an SMS text message, launching an email program and populating one or more relevant fields, etc.
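The fully deferred flow of method 1400 can be summarized as a thin orchestration that does no work until a gesture arrives; the segment_near, recognize, and interpret callbacks below are hypothetical stand-ins for steps 1408, 1410/1412, and the further processing 1414.

```python
def handle_gesture_deferred(image, x, y, segment_near, recognize, interpret):
    """Locate, recognize, and act on only the portion of the image indicated by a gesture at (x, y)."""
    region = segment_near(image, x, y)   # identify the relevant region on demand (1408)
    if region is None:
        return None                      # the gesture did not indicate any text
    text = recognize(image, region)      # OCR and identify just that region (1410, 1412)
    return interpret(text)               # e.g., dial a number or populate a field (1414)
```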
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and not restrictive and that the techniques are not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. In this technology, where growth is fast and further advancements are not easily foreseen, the disclosed embodiments may be readily modifiable in arrangement and detail as facilitated by enabling technological advancements without departing from the principles of the present disclosure.
For purposes of the USPTO extra-statutory requirements, the present application constitutes a continuation-in-part of U.S. patent application Ser. No. 12/466,333 that was filed on 14 May 2009, which is currently co-pending, or is an application of which a currently co-pending application is entitled to the benefit of the filing date. The present application also constitutes a continuation-in-part of U.S. patent application Ser. No. 12/467,245 that was filed on 15 May 2009, which is currently co-pending, or is an application of which a currently co-pending application is entitled to the benefit of the filing date.
Relation | Number | Date | Country
---|---|---|---
Parent | 12/466,333 | May 2009 | US
Child | 13/361,713 | | US
Parent | 12/467,245 | May 2009 | US
Child | 12/466,333 | | US