The present invention relates to a system for the automated conversion of displayed text to audio.
A vast amount of information is available in “hardcopy” print media such as books, newspapers, leaflets, and mailings as well as electronic print media such as online documents. Many people, however, are unable to avail themselves of this information due to visual impairment or illiteracy.
A variety of techniques are employed to audibly convey the content of print media to those who cannot read it. For example, print media may be recorded onto tapes that are then made available for audio replay. However, this approach is highly inefficient and has found only limited use, for popular novels and certain educational materials.
One existing system captures an image of print media using a scanner or fax machine, recognizes the printed words in the image, and recites each word in the order printed by relying upon phonemes. In this system, the optical character recognition software requires that the text portion of the image be orthogonally oriented with respect to the boundaries of the image. In other words, if the text is diagonally skewed on the print media, the software in this system cannot interpret it. Accordingly, to ensure that the text portion of an image is properly oriented, this system physically aligns the print media in an orthogonal orientation using a tray, a frame, or another structure. The system then linearly scans the print media, while it is physically held in that orientation, by reading successive rows of pixels into memory. The scanned data is arranged in a digital image format; the system then processes the digital image, identifies the printed letters, forms words from the letters, matches each word to an associated audio file, and plays the audio files in the proper sequence.
Unfortunately, such a system is cumbersome to use. First, particularly with desktop flatbed scanners, a visually impaired person may have difficulty properly aligning the print media on the scanning surface. Second, desktop flatbed scanners and fax machines are often too bulky and/or heavy to be used in many social contexts, such as reading a menu in a restaurant or a magazine in a waiting room. Finally, such a system requires that print media be fed into the device page by page, which is impractical for many items such as menus, bound books, or magazines.
The present inventors considered the existing scanning technology and determined that the need to maintain the paper and imaging device in a precise orientation with respect to each other while obtaining the image is burdensome and ineffective. Further, the existing scanning devices are bulky and difficult to use in many situations, such as at a restaurant or at a bookstore.
The camera 16 is interconnected to a processing device 20, such as a device that includes a microprocessor, which processes the data received from the camera 16. The image may contain a portion of the book 12, one page of the book 12, or the book 12 overlaid on the surrounding background. The processing device 20 processes the image in a suitable manner, described in detail below, and provides an audio output to the audio device 14. The lens 14 and the camera 16 may be provided as a single unit, if desired, and interconnected to the processing device 20 by a suitable mechanism, such as a USB connection. The processing device 20 either includes the audio device 14 or is otherwise connected to it. The entire system may likewise be provided as a single unit.
If the user does not elect to update the electronic dictionary, the system 10 then proceeds to an image capture step 130. At step 130 the user may orient the camera 16 to obtain an image of the source, which preferably includes some textual information. The camera 16 may be triggered to capture the image by any appropriate electrical, mechanical, or electromechanical means, such as a button, lever, computer keystroke, touchpad, or voice command, depending on the particular device used. The image may be captured at any desired resolution, since text can be characterized even at relatively low resolutions.
As may be observed, the camera 16 obtains the image with the book 12 in a generally arbitrary orientation. Because the camera is at an arbitrary angular orientation with respect to the source material, the resulting image frequently will not show the source material squarely aligned with the image sensor, as it would be in a fixed scanning device. Accordingly, the image should undergo specialized processing to bring the source material into an orientation suitable for optical character recognition. Once the image is captured, it may be processed by a processing unit 134. The processing unit 134 may include an image analysis module 136, which modifies the captured images in a suitable manner.
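The description does not specify how skew is measured. As an illustration only, the sketch below uses a projection-profile approach, a well-known deskewing technique swapped in here rather than taken from this description: for each candidate angle the coordinates of the dark pixels are rotated, and the angle whose row histogram is most sharply peaked is taken as the text-line angle. It assumes the image has already been binarized into a NumPy boolean array (True = ink); the function name `estimate_skew` and the ±15° search range are hypothetical.

```python
import numpy as np


def estimate_skew(ink: np.ndarray, max_angle: float = 15.0, step: float = 0.5) -> float:
    """Return the text-line angle in degrees (image coordinates, y pointing down).

    ink: 2-D boolean array, True where a pixel is dark (text).
    """
    rows, cols = np.nonzero(ink)
    if rows.size == 0:
        return 0.0  # no ink, nothing to measure
    x = cols.astype(float)
    y = rows.astype(float)
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(angle)
        # Rotate by -angle: pixels on a line tilted by +angle share one y_rot value.
        y_rot = y * np.cos(theta) - x * np.sin(theta)
        # Row histogram of the rotated ink; aligned text gives tall, narrow peaks.
        profile = np.bincount(np.round(y_rot - y_rot.min()).astype(int))
        score = profile.var()
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

The image would then be rotated by the negative of the returned angle before character recognition; a smaller `step` trades speed for angular precision.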
The image analysis module 136 may also include a distortion adjustment module 202 that corrects for distortion in the lines of the text. Such distortion might arise, for example, from an image of the pages of a book that bulge significantly, or from an image of text that angles away from the lens 14, which distorts the edges of the text in a manner similar to keystoning. Such distortion is corrected in a manner similar to correcting the skew of the image, except that smaller portions of the text are examined and processed individually. Correcting keystone effects transforms the imprinted region of the image into a rectangular shape.
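Keystone correction maps the imprinted region, which appears as a trapezoid or other quadrilateral, onto a rectangle. One standard way to do this, assumed here rather than stated in the description, is a planar perspective transform (homography) fitted to the four corners of the region. The sketch assumes the corners have already been located (corner detection is not shown); `homography` and `warp_point` are illustrative names.

```python
import numpy as np


def homography(src, dst):
    """Solve for the 3x3 matrix H that maps four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear;
        # likewise for v. The bottom-right entry h33 is fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_point(h, x, y):
    """Map one (x, y) point through H using homogeneous coordinates."""
    u, v, w = h @ np.array([x, y, 1.0])
    return u / w, v / w


# A page angled away from the lens: the top edge looks narrower than the bottom.
corners = [(120, 50), (380, 50), (450, 400), (50, 400)]
rectangle = [(0, 0), (300, 0), (300, 400), (0, 400)]
H = homography(corners, rectangle)
```

To rectify a whole image, each output pixel would be mapped back through `np.linalg.inv(H)` and sampled from the captured image. Correcting a bulging page would apply the same idea piecewise to smaller portions of the text.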
The image may be processed by a segmentation module 204 to segment the image into text and graphics (i.e., non-text). Once the text has been identified in the image, the text portion of the image may be further analyzed. Any suitable segmentation technique may be used.
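Because any suitable segmentation technique may be used, the sketch below shows only one illustrative building block: a simplified form of horizontal run-length smoothing (RLSA), a classic document-layout technique assumed here rather than taken from this description. Short background gaps between ink pixels are filled so that characters on a line merge into solid blocks; those blocks could then be classified as text or graphics by features such as height and ink density (not shown). The function name and threshold are hypothetical.

```python
def smooth_row(row, threshold):
    """Simplified horizontal run-length smoothing of one binary row.

    row: sequence of 0/1 values (1 = ink). Background gaps of at most
    `threshold` pixels that lie between two ink pixels are filled with ink,
    so neighbouring characters merge into one block. Gaps at the row ends
    are left untouched in this simplified variant.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen
    for i, value in enumerate(row):
        if value:
            gap = i - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


print(smooth_row([1, 0, 0, 1, 0, 0, 0, 0, 0, 1], 3))  # [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
```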
After segmentation, the image analysis module 136 may include a focus module 206 that estimates the amount of blur in the image and then corrects for it. Any suitable blur estimation and correction technique may be used. The system 10 may estimate and correct the blur electronically, or alternatively the system 10 may include an auto-focusing lens 14 to reduce blur before the image is captured. The image processing steps may, of course, be performed in any suitable order, and fewer or additional image processing steps may be included, as desired.
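One widely used focus measure, swapped in here as an example of blur estimation, is the variance of the image's Laplacian: sharp edges produce large second differences, so a blurred image scores lower. The sketch assumes a 2-D grayscale NumPy array; the name `sharpness` and any threshold for deciding that an image is too blurred are assumptions, and the deblurring step itself is not shown.

```python
import numpy as np


def sharpness(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; lower values suggest a blurrier image."""
    g = gray.astype(float)
    # Discrete Laplacian on the interior pixels: up + down + left + right - 4 * centre.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())
```

A device might, for example, compare this score against an empirically chosen threshold and, when it is low, ask the user to retake the picture or trigger the auto-focusing lens.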
Once the image has been processed to reduce blur, skew, and distortion, and the non-text portions of the image have been segmented out, the image may be processed by an optical character recognition (OCR) module 208, which recognizes the characters of the text. Preferably, characters in a variety of fonts can be recognized. Once the characters are recognized, a word construction module 210 divides them into words. OCR and word construction techniques are well known.
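The word construction module 210 is described only functionally. A simple approach, assumed here for illustration, is to start a new word wherever the horizontal gap between neighbouring character boxes on a line is large relative to the typical character width. The tuple layout `(character, x_left, x_right)` and the `gap_factor` of 0.5 are hypothetical.

```python
from statistics import median


def build_words(chars, gap_factor=0.5):
    """Group recognized characters on one text line into words.

    chars: list of (character, x_left, x_right) tuples, sorted left to right.
    A new word starts when the gap to the previous character exceeds
    gap_factor times the median character width.
    """
    if not chars:
        return []
    width = median(right - left for _, left, right in chars)
    words, current = [], chars[0][0]
    for (_, _, prev_right), (ch, left, _) in zip(chars, chars[1:]):
        if left - prev_right > gap_factor * width:
            words.append(current)  # large gap: close the current word
            current = ch
        else:
            current += ch
    words.append(current)
    return words


line = [("T", 0, 8), ("h", 9, 17), ("e", 18, 26), ("c", 40, 48), ("a", 49, 57), ("t", 58, 66)]
print(build_words(line))  # ['The', 'cat']
```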
Once the image analysis module 136 has distinguished the individual words of text, a data stream of words is provided to a spell check module 138, which may correct spelling on a word-by-word basis in the same manner as a word processor; for example, a common misspelling such as “teh” is automatically corrected to “the.” The spell check module 138 may also optionally be programmable.
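A minimal sketch of word-by-word correction, assuming a lookup table of known misspellings; the table contents and the name `spell_check` are hypothetical. Making the module programmable then amounts to letting the user add entries to the table.

```python
# Hypothetical correction table; the user can add entries to it.
COMMON_FIXES = {"teh": "the", "adn": "and", "recieve": "receive"}


def spell_check(words, fixes=COMMON_FIXES):
    """Correct known misspellings word by word, keeping an initial capital."""
    corrected = []
    for word in words:
        key = word.lower()
        if key in fixes:
            fixed = fixes[key]
            if word[:1].isupper():
                fixed = fixed[:1].upper() + fixed[1:]
            corrected.append(fixed)
        else:
            corrected.append(word)  # unknown to the table: pass through unchanged
    return corrected


COMMON_FIXES["wnat"] = "want"  # a user-programmed addition
print(spell_check(["Teh", "soup", "adn", "salad"]))  # ['The', 'soup', 'and', 'salad']
```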
The corrected data stream of words may be forwarded to a word location module 140A-140C that attempts to match the words in the data stream to those in the electronic dictionary. If a word cannot be found in the dictionary, the word location module may make a “guess,” either by selecting from a series of dictionary words similar to the word in the data stream or by otherwise estimating the proper output. Alternatively, the word “unknown” could be substituted for any word in the data stream that cannot be located in the electronic dictionary. Optionally, the system 10 may store a list of unknown words during use for later display to a partially blind user, so that the user may elect to add selected unknown words to the programmable dictionary at a convenient time.
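The lookup-then-guess behaviour can be sketched with Python's standard `difflib.get_close_matches`, which ranks candidates by a sequence-similarity ratio. Using that ratio as the similarity measure, the 0.8 cutoff, and the name `locate_words` are assumptions for illustration; the description does not say how similarity is judged.

```python
import difflib


def locate_words(words, dictionary, unknown_log, cutoff=0.8):
    """Match each word to a dictionary entry, guessing a close match when needed.

    Words with no sufficiently similar entry become "unknown" and are logged
    so the user can later choose to add them to the dictionary.
    """
    located = []
    for word in words:
        key = word.lower()
        if key in dictionary:
            located.append(key)
            continue
        guesses = difflib.get_close_matches(key, dictionary, n=1, cutoff=cutoff)
        if guesses:
            located.append(guesses[0])  # best "guess" by similarity ratio
        else:
            located.append("unknown")
            unknown_log.append(word)  # kept for later review by the user
    return located


dictionary = ["the", "menu", "price", "salad", "soup", "unknown"]
log = []
print(locate_words(["Soup", "sallad", "xyzzy"], dictionary, log))  # ['soup', 'salad', 'unknown']
```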
Because the text, and hence the book 12 or other printed material, may be placed at an arbitrary orientation with respect to the camera 16, the system 10 does not require a tray, a feeder, or another bulky frame. Thus, the camera 16 may be compact and contained in any number of portable devices, such as digital cameras, cell phones, PDAs, and laptop computers. The system 10 may therefore be used in far more situations than existing text-to-audio systems. Further, because the print media being captured does not need to be fed into an optical device page by page, a far wider range of print media can be converted to audio using the system 10 than is presently possible. For example, if the system 10 is incorporated into a cell phone or a PDA, a visually impaired person could capture an image of a menu, a newspaper, a flyer received on the street, or even text shown on an LED or LCD display. The system 10 could then read each of these items aloud to the user.
Once the words in the data stream are identified, each successive word is sent, in sequence, to a word recitation module 142 that instructs the audio device 14 to play the audio file associated with each word. Some embodiments of the system 10 may permit a user to select one of several voices, or to adjust the volume or pitch of the audio device. Another embodiment of the system 10 may permit a user to delay playback by storing one or more audio streams in memory or a buffer until a convenient time arises to listen to them. If this embodiment is employed, it would be preferable to erase the electronic image file from memory once the associated word data stream is stored, because the electronic image is no longer needed and uses far more storage space than the word data stream, which could merely comprise a list of addresses in the electronic dictionary. Once the audio for an image has been played, or stored if desired, a page prompt module 144A-144C may prompt the user to turn the page or otherwise notify the user that the image has been processed.
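Storing the word data stream as a list of dictionary addresses, and later turning it back into an ordered set of audio files, might look like the sketch below. The index-based addresses, the `audio/<word>.wav` file layout, and both function names are hypothetical, and actual playback through the audio device is platform specific and not shown.

```python
def encode_stream(words, dictionary):
    """Store a located word stream compactly as indices ("addresses") into the dictionary."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [index[word] for word in words]


def playlist(addresses, dictionary, audio_dir="audio"):
    """Turn stored addresses back into the ordered list of audio files to play."""
    return [f"{audio_dir}/{dictionary[a]}.wav" for a in addresses]


dictionary = ["the", "soup", "of", "day", "unknown"]
stored = encode_stream(["soup", "of", "the", "day"], dictionary)
print(stored)                        # [1, 2, 0, 3]
print(playlist(stored, dictionary))  # ['audio/soup.wav', 'audio/of.wav', 'audio/the.wav', 'audio/day.wav']
```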
The system 10 may be able to recognize a page number based on its position in the header or footer of the image. If, after being prompted to turn the page, a user turns to a non-sequential page, i.e., skips a page or inadvertently turns the book back a page instead of forward, the system 10 could audibly recite a message that a page has been skipped or that the book has been paged backwards, and prompt the user either to verify that this was intended or to turn to the correct page.
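The page-sequence check can be sketched as a comparison between the page number expected next and the one read from the header or footer. This assumes the page number has already been extracted as an integer (or `None` if none was found); the function name and message wording are illustrative.

```python
def check_page(expected, found):
    """Compare the page number read from the image with the one expected next.

    Returns a message to recite, or None when the page is the expected one.
    """
    if found is None:
        return "Page number not found; please check the page."
    if found == expected:
        return None
    if found < expected:
        return f"You turned back to page {found}. Was that intended?"
    skipped = found - expected
    noun = "page was" if skipped == 1 else "pages were"
    return f"{skipped} {noun} skipped; now on page {found}. Was that intended?"


print(check_page(12, 14))  # 2 pages were skipped; now on page 14. Was that intended?
```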
Because the image analysis module 136 is capable of correcting the blur, the skew, the distortion, and other imperfections in a captured image, an image of print media may be captured and converted into audio by portable devices.
The cell phone 360 may be easily carried by a person and used to capture a wide variety of print media, including restaurant menus, newspapers, magazine pages, text on billboards, pages of books in libraries, screen shots on computer monitors, LCD and LED displays, and nutritional labels in grocery stores, among many other examples. The utility of such a device is apparent. Because the types of text available for audible recitation with the cell phone 360 (or any other portable device that includes the system 10, such as a PDA or a laptop computer) are so varied, the cell phone 360 or other device preferably includes software that implements a plurality of “templates” to distinguish among the types of print media that may be captured and audibly recited.
To illustrate the utility of this optional feature, assume that a visually impaired person is seated in a restaurant and handed a menu. Some menus are printed in dual-column format, with the entree selections in the left-hand column and the price of each entree in the right-hand column, across from the entree to which it pertains. Other menus are printed in single-column format, with the price of an entree simply written at the end of the entree's text. Still other menus group entrees by price, i.e., list a price and then all the entrees available at that price. In this scenario, the cell phone 360 may include a programmed series of templates, one of which is a “menu” template that the user could select, e.g., by pressing a button on the cell phone to cycle through audibly recited templates until the “menu” template is recited by the audio device. With that template selected, the image processing module would recognize that a captured image is of a menu, analyze the image to determine the format in which the menu is printed, and recite the text in the proper sequence.
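For the dual-column case, one way a “menu” template might pair entrees with prices, assumed here purely for illustration, is to treat lines that look like prices as the right-hand column and match each remaining line to the price nearest to it vertically. The `(text, x, y)` line tuples, the price pattern, and the 5-pixel tolerance are hypothetical; detecting which of the menu formats is present is not shown.

```python
import re

PRICE = re.compile(r"^\$?\d+(?:\.\d{2})?$")  # assumed price format, e.g. "$12.50"


def read_dual_column_menu(lines, tolerance=5.0):
    """Pair each entree with the price printed across from it.

    lines: list of (text, x, y) tuples from OCR, with y measured down the page.
    Returns "entree, price" strings in top-to-bottom reading order.
    """
    prices = [(y, text) for text, _, y in lines if PRICE.match(text)]
    entrees = sorted((y, text) for text, _, y in lines if not PRICE.match(text))
    spoken = []
    for y, entree in entrees:
        nearest = min(prices, key=lambda p: abs(p[0] - y), default=None)
        if nearest is not None and abs(nearest[0] - y) <= tolerance:
            spoken.append(f"{entree}, {nearest[1]}")
        else:
            spoken.append(entree)  # no price across from this line
    return spoken


menu = [("Garden salad", 20, 100), ("$8.50", 300, 101),
        ("Tomato soup", 20, 130), ("$6.00", 300, 129)]
print(read_dual_column_menu(menu))  # ['Garden salad, $8.50', 'Tomato soup, $6.00']
```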
Other types of templates may also be used. For example, the cell phone 360 or other such device could include templates for newspapers or phone books. If a phone book template is used with the cell phone 360, the template could not only identify the column format of the text but also give the user the option of one-touch dialing of the number or name just audibly recited.
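A phone book template might, for example, split each recognized line into the name to recite and the digits to dial. The sketch assumes a listing layout of a name, spaces or dot leaders, then a North American style number; the pattern and the function name are hypothetical, and the dialing call itself is platform specific and not shown.

```python
import re

# Assumed layout: "<name> <spaces or dot leaders> <optional area code> NNN-NNNN".
LISTING = re.compile(
    r"^(?P<name>.*?\S)[\s.]+(?P<number>(?:\d{3}[-\s])?\d{3}-\d{4})$"
)


def parse_listing(line):
    """Split a phone-book line into (name to recite, digits to dial), or None."""
    match = LISTING.match(line.strip())
    if match is None:
        return None
    digits = re.sub(r"\D", "", match.group("number"))
    return match.group("name"), digits


print(parse_listing("Smith John ...... 555-0142"))  # ('Smith John', '5550142')
```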
The terms and expressions that have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims that follow.
Number | Date | Country |
---|---|---|
20050071167 A1 | Mar 2005 | US |