This invention relates to the field of data extraction and, in particular, to integrated image detection and contextual commands.
Current technologies for searching for and identifying interesting patterns in a piece of text data locate specific structures in the text. A device performing a pattern search refers to a library containing a collection of structures, each structure defining a pattern that is to be recognized. A pattern is a sequence of so-called definition items. Each definition item specifies an element of the text pattern that the structure recognizes. A definition item may be a specific string or a structure defining another pattern using definition items in the form of strings or structures. For example, a structure may give the definition of what is to be identified as a US state code. According to the definition, a pattern in a text will be identified as a US state code if it corresponds to one of the strings that make up the associated definition items, such as “AL”, “AK”, “AS”, etc. Another example structure may be a telephone number. A pattern will be identified as a telephone number if it includes a string of three digits, followed by a hyphen or space, followed by a string of four digits.
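By way of illustration only, and not as a definitive implementation, the following sketch shows how such structures might be represented: the US state code structure as a set of string definition items, and the telephone number structure as a composition of digit strings and a separator. All names are hypothetical.

```python
import re

# String definition items for the US-state-code structure (list truncated
# here for brevity; a real library would hold every state code).
US_STATE_CODES = {"AL", "AK", "AS"}

def is_state_code(token: str) -> bool:
    """A token matches if it equals one of the structure's string items."""
    return token in US_STATE_CODES

# Telephone-number structure: three digits, a hyphen or space, four digits.
PHONE_PATTERN = re.compile(r"\b\d{3}[- ]\d{4}\b")

def find_phone_numbers(text: str) -> list[str]:
    """Return every substring matching the telephone-number structure."""
    return PHONE_PATTERN.findall(text)

print(is_state_code("AK"))                    # True
print(find_phone_numbers("Call 555-0123."))   # ['555-0123']
```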
These pattern detection technologies only work to identify patterns in pieces of text data. In modern data processing systems, however, important data may be contained in forms other than simple text. One example of such a form is an image, such as a JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), TIFF (Tagged Image File Format), or other image file format. An image may be received at a data processing system, for example in an email or multimedia messaging service (MMS) message, or the image may be captured by a camera attached to the device. The image may be of a document, sign, poster, etc. that contains interesting information. Current pattern detection technologies cannot identify patterns in the image that can be used by the data processing system to perform certain commands based on the context.
Embodiments are described to identify important information in an image that can be used by a data processing system to perform certain commands based on the context of the information. A text recognition module identifies textual information in the image. To identify the textual information, the text recognition module performs a text recognition process on image data corresponding to the image. The text recognition process may include optical character recognition (OCR). A data detection module identifies a pattern in the textual information and determines a data type of the pattern. The data detection module may compare the textual information to a definition of a known pattern structure. In certain embodiments, the data type may include one of a phone number, an email address, a website address, a street address, an ISBN (International Standard Book Number), a price value, a movie title, album art, and a barcode. A user interface provides a user with a contextual processing command option based on the data type of the pattern in the textual information. The data processing system executes the contextual processing command in an application of the system. In certain embodiments, the application may include one of a phone application, an SMS (Short Message Service) and MMS (Multimedia Messaging Service) messaging application, a chat application, an email application, a web browser application, a camera application, an address book application, a calendar application, a mapping application, a word processing application, and a photo application.
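By way of example only, the data types and detection results described above might be modeled as follows; every name here is illustrative rather than taken from any particular implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto

class DataType(Enum):
    """The data types a detected pattern may be classified as."""
    PHONE_NUMBER = auto()
    EMAIL_ADDRESS = auto()
    WEBSITE_ADDRESS = auto()
    STREET_ADDRESS = auto()
    ISBN = auto()
    PRICE_VALUE = auto()
    MOVIE_TITLE = auto()
    ALBUM_ART = auto()
    BARCODE = auto()

@dataclass
class DetectedPattern:
    data_type: DataType        # drives which contextual commands are offered
    text: str                  # the matched substring of the recognized text
    span: tuple[int, int]      # character offsets within the textual stream

example = DetectedPattern(DataType.PHONE_NUMBER, "555-0123", (5, 13))
print(example.data_type.name)  # PHONE_NUMBER
```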
In one embodiment, a facial recognition module scans the image and identifies a face in the image using facial recognition processing. The facial recognition processing extracts landmarks, such as the relative position, size, and/or shape of the eyes, nose, cheekbones, and jaw, from the face and compares the landmarks to a database of known faces. The user interface provides the user with a contextual processing command option based on the identification of the face in the image.
The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
In the following detailed description of embodiments of the invention, reference is made to the accompanying drawings in which like references indicate similar elements, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical, functional and other changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
Embodiments are described to identify important information in an image that can be used by a data processing system to perform certain commands based on the context of the information. In one embodiment, image data is received by the data processing system. The image data may be received, for example, in an email or multimedia messaging service (MMS) message, or the image may be captured by a camera attached to the device. A text recognition module in the data processing system performs character recognition on the image data to identify textual information in the image and create a textual data stream. The textual data stream is provided to a data detection module which identifies the type of data (e.g., date, telephone number, email address, etc.) based on the structure and recognized patterns. The data detection module causes a user interface of the data processing system to display a number of contextual processing options to the user based on the identified textual information.
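As a minimal sketch of this end-to-end flow, the stand-in functions below play the roles of the text recognition module, data detection module, and user interface; a real system would substitute an OCR engine and richer detectors.

```python
import re

def recognize_text(image_data: bytes) -> str:
    """Stand-in for the text recognition module; a real system runs OCR here.
    For demonstration, the 'image' is just encoded text."""
    return image_data.decode("utf-8", errors="ignore")

def detect_patterns(text: str) -> list[tuple[str, str]]:
    """Stand-in data detection: return (data_type, matched_text) pairs."""
    detectors = {
        "phone_number": re.compile(r"\b\d{3}[- ]\d{4}\b"),
        "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    }
    return [(dtype, m) for dtype, rx in detectors.items() for m in rx.findall(text)]

def contextual_options(patterns: list[tuple[str, str]]) -> list[str]:
    """Map each detected pattern to a user-facing command option."""
    labels = {"phone_number": "Call {0}", "email_address": "Email {0}"}
    return [labels[dtype].format(text) for dtype, text in patterns]

# Demo: simulate image data whose recognized text contains a phone number.
print(contextual_options(detect_patterns(recognize_text(b"Call 555-0123"))))
# ['Call 555-0123'] -> displayed as an option in the user interface
```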
In one embodiment, data processing system 100 includes text recognition module 120, data detection module 130, and user interface 140. Text recognition module 120 may perform text recognition processing on received image data 110. Image data 110 may be in any number of formats, such as, for example, JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), TIFF (Tagged Image File Format), or other image file format. Image data 110 may be received by data processing system 100 in a message, such as an email message, SMS (Short Message Service) message, MMS (Multimedia Messaging Service) message, chat message, or other message. The image data 110 may also correspond to an image in a web page presented by a web browser. Additionally, the image may be captured by an image capture device, such as a camera, integrated with or attached to data processing system 100. Generally, image data 110 may correspond to any image presented by a computing device to a user.
Upon receiving image data 110, text recognition module 120 may perform text recognition processing on the data to identify any textual data stored in the image represented by image data 110. In one embodiment, the text recognition processing includes OCR (optical character recognition). OCR is the recognition of printed or written text or characters by a computer. This involves photo scanning of the text, analysis of the scanned-in image, and then translation of the character image into character codes, such as Unicode or ASCII (American Standard Code for Information Interchange), commonly used in data processing. During OCR processing, the scanned-in image or bitmap is analyzed for light and dark areas in order to identify each alphabetic letter or numeric digit. When a character is recognized, it is converted into Unicode. Special circuit boards and computer chips (e.g., a digital signal processing (DSP) chip) designed expressly for OCR may be used to speed up the recognition process. In other embodiments, other text recognition processing techniques may be used.
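As one concrete possibility, the OCR step could be performed with the open-source Tesseract engine via its Python wrapper; the engine choice and the input filename below are assumptions for illustration, not requirements of the described system.

```python
from PIL import Image          # pip install pillow
import pytesseract             # pip install pytesseract; also requires a
                               # local install of the Tesseract OCR engine

def image_to_text(path: str) -> str:
    """Open the image and return the recognized characters as a Unicode
    string; Tesseract performs the light/dark analysis described above."""
    with Image.open(path) as img:
        return pytesseract.image_to_string(img)

print(image_to_text("poster.jpg"))  # hypothetical input image file
```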
Text recognition module 120 outputs a stream of character codes representing the textual data identified in the image. The stream is received by data detection module 130. In one embodiment, data detection module 130 identifies interesting patterns in the data stream, determines the type of data represented in the pattern and provides contextual processing commands to a user via user interface 140. Further details regarding the operation of data detection module 130 will be provided below.
The search by pattern search engine 232 yields a certain number of identified patterns 236. These patterns 236 are then presented to a user via user interface 140. For each identified pattern, the user interface 140 may suggest a certain number of contextual command options, to be implemented in an application 250. For example, if the identified pattern is a URL address, the interface 140 may suggest the action “open corresponding web page in a web browser” to the user. If the user selects the suggested action, a corresponding application 250 may be started, such as, in the given example, the web browser.
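A minimal sketch of this suggest-then-launch flow for the URL example, using Python's standard webbrowser module as a stand-in for starting the corresponding application (function names are illustrative):

```python
import re
import webbrowser

URL_PATTERN = re.compile(r"https?://\S+")

def suggest_url_actions(text: str):
    """For each URL found, pair a menu label with a callable action.
    The default-argument lambda binds each URL at definition time."""
    return [(f"Open {url} in a web browser", lambda u=url: webbrowser.open(u))
            for url in URL_PATTERN.findall(text)]

label, action = suggest_url_actions("See https://example.com for details")[0]
print(label)  # the suggested command shown to the user
# action()    # selecting the command starts the web browser application
```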
The suggested actions in the contextual commands preferably depend on the context 244 of the application with which the user manipulates the image data 110. More specifically, when performing an action, the system can take into account the application context 244, such as the type of the application (word processor, email client, etc.) or the information available through the application (time, date, sender, recipient, reference, etc.), to tailor the action and make it more useful or “intelligent” to the user. The type of suggested actions may also depend on the data type of the associated pattern. If the recognized pattern is a phone number, different actions will be suggested than if the recognized pattern is a street address.
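For instance, suggestions might be tailored by both the pattern's data type and the application context 244, as in the sketch below; the context keys and action labels are hypothetical.

```python
def actions_for(data_type: str, context: dict) -> list[str]:
    """Suggestions depend on the pattern's data type; application context
    (e.g., an email's sender) can make them more 'intelligent'."""
    if data_type == "phone_number":
        actions = ["Call number", "Send SMS/MMS message"]
        if context.get("app") == "email" and "sender" in context:
            # Use information available through the application to enrich
            # the action, e.g., pre-filling a contact entry.
            actions.append(f"Add number to contact card for {context['sender']}")
        return actions
    if data_type == "street_address":
        return ["Show on map", "Get directions from current location"]
    return []

print(actions_for("phone_number", {"app": "email", "sender": "Alice"}))
# ['Call number', 'Send SMS/MMS message', 'Add number to contact card for Alice']
```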
Referring to the flow diagram of method 300, the stream of character codes output by text recognition module 120 is received by pattern search engine 232, which searches the textual stream for known patterns at block 330. In one embodiment, the pattern search is done in the background without the user noticing it. In a data processing system having an attached pointing device, such as a mouse, when the user places the mouse pointer over a text element that has been recognized as an interesting pattern 236 having actions associated with it, the text element is visually highlighted in user interface 140. In a data processing system with a touch-screen, the patterns 236 identified in the text may be highlighted automatically, without the need for a user action. In some embodiments, the non-highlighted areas of the image may be darkened to increase the visual contrast. At block 340, method 300 presents a number of contextual command options to the user based on the detected data. The highlighted area may include a small arrow or other graphical element. The user can click on this arrow to visualize actions associated with the identified pattern 236 in a contextual menu. The user may select one of the suggested actions or commands, which is then executed in a corresponding application 250.
In one embodiment, as shown in the accompanying figure, the patterns identified in an image may include a movie title 402, an ISBN number 404, a price value 408, a date and time 409, album art 410, a phone number 412, an email address 414, a street address 416, and a barcode 418.
Although not illustrated, the following commands may be relevant to the various identified patterns described above. For movie title 402, the commands may include offering more information on the movie (e.g., showtimes, playing locations, trailer, ratings, reviews, etc.), which may be retrieved from one or more movie websites over a network, offering to purchase tickets to an upcoming showing of the movie if still playing in theaters, and offering to purchase or rent the movie from an online merchant, if available. For ISBN number 404, the commands may include offering more information on the book (e.g., title, author, publisher, reviews, excerpts, etc.) and offering to purchase the book from an online merchant, if available. For price value 408, the commands may include adding the price to an existing note (e.g., a shopping list) and comparing the price to prices for the same item at other retailers. For date and time 409, the commands may include adding an associated event to an entry in a calendar application or a task list, which may include adding it to an existing entry or creating a new calendar entry. For album art 410, the commands may include offering more information on the album (e.g., artist, release date, track list, reviews, etc.), offering to buy the album from an online merchant, and offering to buy concert tickets for the artist. For phone number 412, the commands may include calling the phone number, sending an SMS or MMS message to the phone number, and adding the phone number to an address book, which may include adding it to an existing contact entry or creating a new contact entry. For email address 414, the commands may include sending an email to the email address, and adding the email address to an address book, which may include adding it to an existing contact entry or creating a new contact entry. For street address 416, the commands may include showing the street address on a map, determining directions to/from the street address from/to a current location of the data processing system or other location, and adding the street address to an address book, which may include adding it to an existing contact entry or creating a new contact entry. For barcode 418, the commands may include offering more information on the product corresponding to the barcode, which may be retrieved from a website or other database, and offering to buy the product from an online merchant, if available. In response to the user selection of one of the provided contextual command options, the processing system may cause the action to be performed in an associated application.
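The associations enumerated above can be condensed into a lookup table keyed by data type; the sketch below abbreviates the command labels and is illustrative rather than exhaustive.

```python
# Illustrative command table; labels are shortened stand-ins for the
# contextual commands described above.
CONTEXTUAL_COMMANDS = {
    "movie_title":    ["Show showtimes/reviews", "Buy tickets", "Rent or buy movie"],
    "isbn":           ["Show book details", "Buy book online"],
    "price_value":    ["Add price to shopping list", "Compare prices elsewhere"],
    "date_and_time":  ["Add event to calendar", "Add to task list"],
    "album_art":      ["Show album details", "Buy album", "Buy concert tickets"],
    "phone_number":   ["Call", "Send SMS/MMS", "Add to address book"],
    "email_address":  ["Send email", "Add to address book"],
    "street_address": ["Show on map", "Get directions", "Add to address book"],
    "barcode":        ["Look up product", "Buy product online"],
}

def commands_for(data_type: str) -> list[str]:
    """Return the contextual command options for a detected data type."""
    return CONTEXTUAL_COMMANDS.get(data_type, [])

print(commands_for("phone_number"))  # ['Call', 'Send SMS/MMS', 'Add to address book']
```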
In one embodiment, facial recognition module 550 scans an image represented by image data 110 after text recognition module 120 has identified textual data and data detection module 130 has identified any recognizable patterns in the textual data. In other embodiments, however, facial recognition module 550 may scan the image before or in parallel with text recognition module 120 and/or data detection module 130.
Upon receiving image data 110, facial recognition module 550 may perform facial recognition processing on the data to identify any faces in the image represented by image data 110. In one embodiment, the facial recognition processing employs one or more facial recognition algorithms to identify faces by extracting landmarks, or features, from an image of the subject's face. For example, an algorithm may analyze the relative position, size, and/or shape of the eyes, nose, cheekbones, and jaw. These features are then used to search for other images with matching features. The features may be compared with known images in a database 552, which may be stored locally in data processing system 500 or remotely accessible over a network. Other algorithms normalize a gallery of face images and then compress the face data, saving only the data in the image that is useful for face detection. A probe image is then compared with the face data. Generally, facial recognition algorithms can be divided into two main approaches: geometric, which looks at distinguishing features, and photometric, which is a statistical approach that distills an image into values and compares those values with templates to eliminate variances. The facial recognition algorithms employed by facial recognition module 550 may include Principal Component Analysis with eigenface, Linear Discriminant Analysis, Elastic Bunch Graph Matching using the Fisherface algorithm, the Hidden Markov model, neuronal motivated dynamic link matching, or other algorithms.
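As a minimal sketch of the geometric approach described above, each face is reduced to a vector of landmark measurements and recognition is a nearest-neighbor search against a database of known faces. Landmark extraction is assumed to happen elsewhere, and the database values are made-up illustrative numbers.

```python
import numpy as np  # pip install numpy

# Hypothetical contents of database 552: each known face is a vector of
# landmark measurements (relative eye/nose/cheekbone/jaw geometry).
KNOWN_FACES = {
    "alice": np.array([0.42, 0.58, 0.31, 0.77]),
    "bob":   np.array([0.45, 0.52, 0.36, 0.69]),
}

def identify_face(landmarks: np.ndarray, threshold: float = 0.1):
    """Nearest-neighbor match against the database; returns None when no
    known face lies within the distance threshold."""
    best_name, best_dist = None, float("inf")
    for name, reference in KNOWN_FACES.items():
        dist = float(np.linalg.norm(landmarks - reference))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

print(identify_face(np.array([0.43, 0.57, 0.32, 0.76])))  # 'alice'
```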
Referring to the flow diagram of method 600, image data is received by the data processing system, and the character recognition and data detection processes described above are performed at blocks 620 and 630.
At block 640, method 600 performs a facial recognition scan on the image data. In one embodiment, the scan is performed by facial recognition module 550. Facial recognition module 550 may compare any recognized faces in the image to a database 552 of known faces in order to identify the recognized faces. In one embodiment, the facial recognition scan may be performed in parallel with the OCR and data detection processes performed at blocks 620 and 630. In a data processing system having an attached pointing device, such as a mouse, when the user places the mouse pointer over a text element or face that has been recognized, the text element or face is visually highlighted in user interface 140. In a data processing system with a touch-screen, the text elements and faces identified in the image may be highlighted automatically, without the need for a user action. At block 650, method 600 presents a number of contextual commands to the user based on the identified text elements and faces.
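One way to realize the parallel execution mentioned above is with a thread pool, as in the sketch below; the worker functions are stand-ins for the actual text and face pipelines.

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_and_detect(image_data: bytes) -> list[str]:
    return ["phone_number: 555-0123"]   # stand-in for blocks 620 and 630

def face_scan(image_data: bytes) -> list[str]:
    return ["face: alice"]              # stand-in for block 640

def analyze_image(image_data: bytes) -> list[str]:
    """Run the text pass and the facial recognition scan concurrently;
    the contextual commands at block 650 draw on both result sets."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(ocr_and_detect, image_data)
        face_future = pool.submit(face_scan, image_data)
        return text_future.result() + face_future.result()

print(analyze_image(b"..."))  # ['phone_number: 555-0123', 'face: alice']
```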
Memory 804 may include modules 812 and applications 818. In at least certain implementations of the system 800, the processor 802 may receive data from one or more of the modules 812 and applications 818 and may perform the processing of that data in the manner described herein. In at least certain embodiments, modules 812 may include text recognition module 120, data detection module 130, user interface 140, and facial recognition module 550. Processor 802 may execute instructions stored in memory on image data as described above with reference to these modules. Applications 818 may include a phone application, an SMS/MMS messaging application, a chat application, an email application, a web browser application, a camera application, an address book application, a calendar application, a mapping application, a word processing application, a photo application, or other applications. Upon receiving a selection of a contextual command through I/O device 820, processor 802 may execute the command in one of these corresponding applications.
Embodiments of the present invention include various operations described herein. These operations may be performed by hardware components, software, firmware, or a combination thereof. Any of the signals provided over various buses described herein may be time multiplexed with other signals and provided over one or more common buses. Additionally, the interconnection between circuit components or blocks may be shown as buses or as single signal lines. Each of the buses may alternatively be one or more single signal lines and each of the single signal lines may alternatively be buses.
Certain embodiments may be implemented as a computer program product that may include instructions stored on a machine-readable medium. These instructions may be used to program a general-purpose or special-purpose processor to perform the described operations. A machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable medium may include, but is not limited to, magnetic storage medium (e.g., floppy diskette); optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read-only memory (ROM); random-access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or another type of medium suitable for storing electronic instructions.
Additionally, some embodiments may be practiced in distributed computing environments where the machine-readable medium is stored on and/or executed by more than one computer system. In addition, the information transferred between computer systems may either be pulled or pushed across the communication medium connecting the computer systems.
The digital processing devices described herein may include one or more general-purpose processing devices such as a microprocessor or central processing unit, a controller, or the like. Alternatively, the digital processing device may include one or more special-purpose processing devices such as a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or the like. In an alternative embodiment, for example, the digital processing device may be a network processor having multiple processors including a core unit and multiple microengines. Additionally, the digital processing device may include any combination of general-purpose processing devices and special-purpose processing devices.
Although the operations of the methods herein are shown and described in a particular order, the order of the operations of each method may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed, at least in part, concurrently with other operations. In another embodiment, instructions or sub-operations of distinct operations may be performed in an intermittent and/or alternating manner.