This patent application relates to apparatuses and methods that process an image from a camera of a handheld device, to identify symbols therein.
Handheld devices such as a cell phone 108 (
Recognition of text in handheld camera captured image 107 (
MSERs are regions that are geometrically contiguous (one can go from any pixel to any other pixel in the region by traversing neighbors), with monotonic transformation in property values, and invariant to affine transformations (transformations that preserve straight lines and ratios of distances between points on those lines). Boundaries of MSERs may be used in the prior art as connected components (see act 114 in
One such method is described in, for example, an article entitled “Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions” by Chen et al., believed to be published in IEEE International Conference on Image Processing (ICIP), September 2011, which is incorporated by reference herein in its entirety as background. MSERs are believed to have been first described by Matas et al., e.g. in an article entitled “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions”, Proc. of British Machine Vision Conference, 2002, pages 384-393, which is incorporated by reference herein in its entirety. The method described by Matas et al. is known to be computationally expensive because of the time taken to identify MSERs in an image. That time can be reduced by use of a method of the type described by Nister, et al., “Linear Time Maximally Stable Extremal Regions”, ECCV, 2008, Part II, LNCS 5303, pp. 183-196, published by Springer-Verlag Berlin Heidelberg, which is also incorporated by reference herein in its entirety.
The current inventors note that prior art methods of the type described by Chen et al. or by Matas et al. or by Nister et al. identify hundreds of MSERs, and sometimes identify thousands of MSERs in an image 107 (
OCR methods of the prior art originate in the field of document processing, wherein the document image contains a series of lines of text oriented parallel to one another (e.g. 20 lines of text on a page). Such OCR methods extract a vector (called a “feature vector”) from binary values in each block, and this vector is then compared with a library of reference vectors generated ahead of time (based on training images of letters of an alphabet to be recognized). Next, the letter of the alphabet represented by the reference vector in the library that most closely matches the vector of the block is identified as recognized, to conclude OCR (“document” OCR).
The current inventors believe that MSER processing of the type described above, to detect a connected component for use in OCR, requires memory and processing power that is not normally available in today's handheld devices, such as a smart phone. Hence, there appears to be a need for methods and apparatuses to speed up MSER processing, of the type described below.
In several embodiments, intensities of pixels in an image of a scene in the real world are used to compute an attribute of a histogram of intensities, the histogram being a function of the number of pixels at each intensity level. Such a histogram attribute may be used in automatic selection, from the image, of one or more regions (in a process referred to as coarse localization) on which processing is to be performed to identify maximally stable extremal regions (MSERs) that are to be subject to OCR. An example of such an attribute is bimodality in the histogram (more specifically, the presence of two peaks distinct from one another), detection of which results in selection of the region for MSER processing.
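By way of illustration only, a bimodality test of this type may be sketched as follows; the specific peak-finding rule, the minimum peak separation, and the bin count are hypothetical choices made for concreteness, not requirements of the described embodiments.

```python
import numpy as np

def is_bimodal(block, bins=32, peak_frac=0.2, min_separation=8):
    """Return True if the intensity histogram of `block` has two distinct
    peaks -- a hypothetical test for selecting the block for MSER
    processing (coarse localization)."""
    hist, _ = np.histogram(block.ravel(), bins=bins, range=(0, 256))
    floor = peak_frac * hist.max()  # a peak must be a sizable fraction of the maximum
    # A "peak" here is a bin strictly greater than both neighbors and above the floor.
    peaks = [i for i in range(1, bins - 1)
             if hist[i] > hist[i - 1] and hist[i] > hist[i + 1] and hist[i] >= floor]
    # Require two peaks separated by several bins, so that the two modes
    # are "distinct from one another" as described above.
    return any(b - a >= min_separation for a in peaks for b in peaks if b > a)
```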
Another such histogram attribute may be used in automatic selection of one or more parameters used in MSER processing, e.g. the parameters Δ and Max Variation. A first example of such a histogram attribute (“support”) is the number of bins of the histogram whose corresponding counts of pixels exceed a threshold. In some embodiments, the MSER parameter Δ is varied inversely with the just-described support attribute, while the MSER parameter Max Variation is varied directly with it. A second example attribute is the variance in the histogram of pixel intensities, with which Δ is likewise varied inversely and Max Variation directly. A third example attribute is the area above mean in the histogram of pixel intensities, with which Δ is made to vary directly and Max Variation inversely.
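For illustration, the three example attributes may be computed from such a histogram as sketched below; the bin count, the 10% threshold (taken from the support example described later in this description), and the particular readings of “variance” and “area above mean” are assumptions made for concreteness.

```python
import numpy as np

def histogram_attributes(block, bins=32, thresh_frac=0.10):
    """Compute the three example histogram attributes described above.
    Returns (support, variance, area_above_mean)."""
    hist, _ = np.histogram(block.ravel(), bins=bins, range=(0, 256))
    # Support: number of bins whose pixel counts exceed a threshold, here
    # a fixed fraction (10%) of the peak count.
    support = int(np.count_nonzero(hist > thresh_frac * hist.max()))
    # Variance: one plausible reading is the variance of the pixel
    # intensities whose distribution the histogram represents.
    variance = float(block.astype(np.float64).var())
    # Area above mean: one plausible reading is the total histogram count
    # in bins whose counts exceed the mean bin count.
    mean_count = hist.mean()
    area_above_mean = int(hist[hist > mean_count].sum())
    return support, variance, area_above_mean
```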
Some embodiments make both uses of histogram attributes described above, using one or more attributes to select a region for MSER processing and also using one or more attributes to select the MSER parameters Δ and Max Variation. However, other embodiments make only a single use of such a histogram attribute, as described next. Certain embodiments use an attribute of the type described above to select a region for MSER processing, while the parameters Δ and Max Variation are selected by any method. In other embodiments, a region for MSER processing is selected by any method, after which an attribute of the type described above is used to select the MSER parameters Δ and Max Variation.
Accordingly, it is to be understood that several other aspects of the described embodiments will become readily apparent to those skilled in the art from the description herein, wherein various aspects are shown and described by way of illustration. The drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
In several aspects of the described embodiments, an image (also called “handheld camera captured image”) of a scene of a real world (e.g. see
Next, in act 215, one or more processors 404 execute fourth instructions to perform MSER processing, e.g. using at least one portion (or block) that has been selected in act 212A. The MSER processing by execution of the fourth instructions may use a look-up table in memory 329 to obtain one or more input parameters, in addition to the input identified by execution of the third instructions. The look-up table used in the fourth instructions may supply one or more specific combinations of values for the parameters Δ and Max Variation, which are input to an MSER method (also called MSER input parameters). Such a look-up table may be populated ahead of time with specific values for Δ and Max Variation, e.g. determined by experimentation to generate contours that are appropriate for recognition of text in a natural image (e.g. image 501), such as the value 8 for Δ and the value 0.07 for Max Variation. Depending on the embodiment, the look-up table may be indexed by any attribute of the type described herein, e.g. an attribute computed based on pixel intensities.
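A minimal sketch of such a look-up table follows; only the pair (Δ = 8, Max Variation = 0.07) comes from the example above, while the attribute ranges and the remaining rows are hypothetical placeholders. Note that, consistent with the relationships described earlier, Δ decreases and Max Variation increases as the support attribute increases.

```python
# Hypothetical lookup table: ranges of a histogram attribute (e.g. support)
# mapped to experimentally determined MSER input parameters.
MSER_PARAM_TABLE = [
    # (attribute_low, attribute_high, delta, max_variation)
    (0,  10, 12, 0.05),
    (10, 20,  8, 0.07),   # e.g. value 8 for delta, value 0.07 for max variation
    (20, 64,  5, 0.12),
]

def lookup_mser_params(attribute):
    """Index the table with an attribute computed from pixel intensities,
    returning (delta, max_variation) to input to an MSER method."""
    for low, high, delta, max_var in MSER_PARAM_TABLE:
        if low <= attribute < high:
            return delta, max_var
    return 8, 0.07  # fall back to the example values from the text
```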
In some embodiments, the MSER processing in act 215 performed by execution of the fourth instructions includes comparing a difference in intensities of a pair of pixels in image 501 to a predetermined limit, followed by execution of fifth instructions to add to a list in memory 329 (
Such a region Qi may be identified by execution of fifth instructions in act 215 (
Regions may be identified in act 215 by use of a method of the type described in the article entitled “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions” by Matas et al., incorporated by reference above. Alternatively, other methods can be used to perform connected component analysis and identification of regions in act 215, e.g. methods of the type described in an article entitled “Application of Floyd-Warshall Labelling Technique: Identification of Connected Pixel Components In Binary Image” by Hyunkyung Shin and Joong Sang Shin, published in Kangweon-Kyungki Math. Jour. 14 (2006), No. 1, pp. 47-55, which is incorporated by reference herein in its entirety, or as described in an article entitled “Fast Connected Component Labeling Algorithm Using A Divide and Conquer Technique” by Jung-Me Park, Carl G. Looney and Hui-Chuan Chen, believed to be published in Matrix (2000), Volume 4, Issue 1, Publisher: Elsevier Ltd, pages 4-7, which is also incorporated by reference herein in its entirety.
Hence, the specific manner in which regions of an image 501 are identified in act 215 by mobile device 401 can differ, depending on the embodiment. As noted above, in several embodiments, each region of image 501 that is identified by use of an MSER method of the type described above is represented in memory 329 by act 215 in the form of a list of pixels, with two coordinates for each pixel, namely the x-coordinate and the y-coordinate in the two-dimensional space of the image. The list of pixels is stored by act 215 in one or more memories, as a representation of a region Qi which is a maximally stable extremal region (MSER).
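Purely for illustration, the sketch below uses the MSER implementation in OpenCV (an assumption; the described embodiments are not tied to any particular library) to obtain each region as such a list of pixel coordinates.

```python
import cv2  # assumption: OpenCV is available; parameter names follow recent OpenCV 4.x

def detect_mser_regions(gray_block, delta=8, max_variation=0.07):
    """Run MSER on one selected image portion (a uint8 grayscale array) and
    return each region Qi as a list of (x, y) pixel coordinates, mirroring
    the per-pixel representation described above."""
    # Older OpenCV releases used underscore-prefixed names (_delta, _max_variation).
    mser = cv2.MSER_create(delta=delta, max_variation=max_variation)
    regions, _bboxes = mser.detectRegions(gray_block)
    # Each entry of `regions` is an N x 2 array of (x, y) coordinates.
    return [region.tolist() for region in regions]
```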
Act 215 is performed, in some embodiments, by one or more MSER processor(s) 352 (
In act 217, the one or more processors check if a plurality of portions of the entire image have been processed (evaluated for MSER processing), and if not return to act 212A (described above). If the entire image has been processed, then act 218 is performed by the one or more processors 404 to analyze the MSERs to identify one or more symbols in the image, e.g. by comparing with a library of symbols. For example, a binarized version of such an MSER is used in several described embodiments, as a connected component that is input to optical character recognition (OCR). Next, whichever one or more symbols are found in act 218 to be the closest match(es) is/are marked in one or more memories as being identified in the image, followed by returning to act 201. Specifically, in some embodiments, a predetermined number (e.g. 3) of symbols that are found to be closest to the input of OCR are identified by OCR, as alternatives to one another, while other embodiments of OCR identify a single symbol that is found to be closest to the OCR input.
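A minimal nearest-match sketch of this comparison is given below; the feature extraction itself and the reference library are placeholders, with only the notion of returning the closest symbol, or a predetermined number (e.g. 3) of closest symbols as alternatives to one another, taken from the description above.

```python
import numpy as np

def recognize(feature_vector, reference_vectors, labels, top_k=3):
    """Compare a feature vector (extracted from a binarized MSER) against a
    library of reference vectors generated ahead of time, returning the
    top_k closest symbols as alternatives to one another."""
    refs = np.asarray(reference_vectors, dtype=np.float64)
    query = np.asarray(feature_vector, dtype=np.float64)
    dists = np.linalg.norm(refs - query, axis=1)  # distance to each reference
    order = np.argsort(dists)[:top_k]             # indices of closest matches
    return [labels[i] for i in order]
```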
In some embodiments, a histogram attribute computed in act 211B is used in act 212B (
One illustration of a histogram attribute that is computed in act 211B (described above) is shown in
Threshold 302 is identified in a predetermined manner, e.g. set to a fixed percent (or fraction), such as 10% of the maximum count or peak 303 among the N bins of histogram 301. For example, if the maximum count or peak 303 is 80, then the threshold 302 has a value of 8 and therefore support 309 is determined as the number of bins S (from among the N bins) of histogram 301 which have corresponding counts of pixels exceeding the value 8 (of threshold 302). Some embodiments of processor(s) 404 crop the histogram by executing seventh instructions, using threshold 302 in order to determine the support 309.
Support 309 in the form of the number of bins S as described in the preceding paragraph is an attribute that may be used in act 212B (described above) with a lookup table 1023 (
Some embodiments described above perform the method of
Support 309 in
Support 309 may be used in a predetermined test of some embodiments, to determine whether a corresponding image portion (from which histogram 301 was extracted) should be selected for MSER processing, as per act 212A in
Another illustration of such an attribute that is computed in act 211 and used in act 212B (
Another such attribute computed in some embodiments of act 211B (
Several embodiments of the type described above in reference to
Certain embodiments perform coarse localization in act 212 to generate input (B) in the form of one or more image portions that are to be subject to MSER processing as shown in
Mobile device 401 of some embodiments that performs the method shown in
In addition to memory 329, mobile device 401 may include one or more other types of memory such as flash memory (or SD card) 1008 and/or a hard disk and/or an optical disk (also called “secondary memory”) to store data and/or software for loading into memory 329 (also called “main memory”) and/or for use by processor(s) 404. Mobile device 401 may further include a wireless transmitter and receiver in transceiver 1010 and/or any other communication interfaces 1009. It should be understood that mobile device 401 may be any portable electronic device such as a cellular or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop, camera, smartphone, tablet (such as the iPad available from Apple Inc.) or other suitable mobile platform that is capable of creating an augmented reality (AR) environment.
A mobile device 401 of the type described above may include other position determination methods such as object recognition using “computer vision” techniques. The mobile device 401 may also include means for remotely controlling a real world object, which may be a toy, in response to user input on mobile device 401, e.g. by use of a transmitter in transceiver 1010, which may be an IR or RF transmitter or a wireless transmitter enabled to transmit one or more signals over one or more types of wireless communication networks such as the Internet, WiFi, a cellular wireless network or other network. The mobile device 401 may further include, in a user interface, a microphone and a speaker (not labeled). Of course, mobile device 401 may include other elements unrelated to the present disclosure, such as a read-only memory 1007 which may be used to store firmware for use by processor 404.
Also, depending on the embodiment, a mobile device 401 may perform reference free tracking and/or reference based tracking using a local detector in mobile device 401 to detect predetermined symbols in images, in implementations that execute the OCR software 1014 to identify, e.g. characters of text in an image. The above-described identification of blocks for use by OCR software 1014 may be performed in software (executed by one or more processors or processor cores) or in hardware or in firmware, or in any combination thereof.
In some embodiments of mobile device 401, the above-described MSER input generator 351 and MSER processor 352 are included in OCR software 1014 that is implemented by a processor 404 executing the software 320 in memory 329 of mobile device 401, although in other embodiments any one or more of MSER input generator 351 and MSER processor 352 are implemented in any combination of hardware circuitry and/or firmware and/or software in mobile device 401. Hence, depending on the embodiment, various functions of the type described herein of OCR software may be implemented in software (executed by one or more processors or processor cores) or in dedicated hardware circuitry or in firmware, or in any combination thereof.
Although some embodiments of one or more processor(s) 404 perform MSER processing after performing either act 212A (
If the decision in act 714 is that the MSER method is to be performed, then act 715 is performed by processor(s) 404. In act 715, another attribute of the histogram of pixel intensities in the selected rectangular portion is computed by processor(s) 404. Then, in an act similar to above-described act 212B, another lookup table 1023 of thresholds (also called “second table”) is used with this attribute (also called “second attribute”) by processor(s) 404 to identify (in act 715) one or more parameters that are input to an MSER method (such as Δ and Max Variation). Thereafter, in act 716, the MSER method is performed, e.g. as described above in reference to act 215. Subsequently, in act 717, the one or more processor(s) 404 check whether all rectangular portions have been processed and, if not, return to act 712 to select another rectangular portion for processing. When all rectangular portions have been processed, the one or more processor(s) 404 go from act 717 to act 718 to analyze the MSER regions to identify one or more symbols in the image, followed by storing, in one or more memories, the symbols identified in the image.
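The per-portion flow of acts 712-718 may be summarized in the following sketch, which reuses the hypothetical helpers sketched earlier (is_bimodal, histogram_attributes, lookup_mser_params, detect_mser_regions); the act-714 decision rule and the symbol-analysis step are placeholders, not the prescribed tests.

```python
def analyze_for_symbols(regions):
    """Placeholder for act 718: OCR over binarized MSER regions (not shown)."""
    return regions

def process_image(rectangular_portions):
    """Illustrative flow of acts 712-718 over all rectangular portions."""
    all_regions = []
    for portion in rectangular_portions:               # act 712: select a portion
        if not is_bimodal(portion):                    # act 714: decide whether to run MSER
            continue
        support, _, _ = histogram_attributes(portion)  # act 715: compute second attribute
        delta, max_var = lookup_mser_params(support)   # act 715: look up second table
        all_regions += detect_mser_regions(portion, delta, max_var)  # act 716: MSER method
    # acts 717-718: after all portions are processed, analyze the MSER
    # regions to identify symbols, then store the identified symbols
    return analyze_for_symbols(all_regions)
```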
Accordingly, depending on the embodiment, any one or more of MSER input generator 351 and MSER processor 352 can, but need not necessarily, include one or more microprocessors, embedded processors, controllers, application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like. The term processor is intended to describe the functions implemented by the system rather than specific hardware. Moreover, as used herein the term “memory” refers to any type of computer storage medium, including long term, short term, or other memory associated with the mobile platform, and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
Hence, methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in firmware 1013 (
Any non-transitory machine-readable medium tangibly embodying software instructions (also called “computer instructions”) may be used in implementing the methodologies described herein. For example, software 320 (
Non-transitory computer-readable media include physical computer storage media. A non-transitory storage medium may be any available non-transitory medium that can be accessed by a computer. By way of example, and not limitation, such non-transitory computer-readable media can comprise RAM, ROM, flash memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium that can be used to store program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of non-transitory computer-readable media.
Although certain examples are illustrated in connection with specific embodiments for instructional purposes, the described embodiments are not limited thereto. Hence, although item 401 shown in
Depending on the embodiment, a user can receive different types of feedback based on a specific symbol recognized in a handheld camera captured image. Additionally, haptic feedback (e.g. by vibration of mobile device 401) is provided by triggering haptic feedback circuitry 1018 (
Accordingly, in some embodiments, one or more processor(s) 404 are programmed with software 320 in an apparatus to operate as means for receiving an image of a scene of real world, means for computing an attribute based on pixel intensities in the image, means for using the attribute to identify at least one input to be used in processing the image to identify at least one maximally stable extremal region therein, means for performing said processing to identify said at least one maximally stable extremal region based on said at least one input, and means for storing in one or more memories, the at least one maximally stable extremal region identified by said processing. In some of the just-described embodiments one or more processor(s) 404 are programmed with software 320 to operate as means for subsampling the image to obtain a subsampled version, means for identifying an additional maximally stable extremal region (also called “second maximally stable extremal region”) in the subsampled version and means for using a stroke width of the additional maximally stable extremal region to identify said portion to be subject to said processing.
Various adaptations and modifications may be made without departing from the scope of the described embodiments. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description. It is to be understood that several other aspects of the described embodiments will become readily apparent to those skilled in the art from the description herein, wherein various aspects are shown and described by way of illustration. The drawings and detailed description are to be regarded as illustrative in nature. Numerous modifications and adaptations of the described embodiments are encompassed by the attached claims.
This application claims priority under 35 USC §119(e) from U.S. Provisional Application No. 61/673,700 filed on Jul. 19, 2012 and entitled “Parameter Selection and Coarse Localization of Interest Regions for MSER Processing”, which is incorporated herein by reference in its entirety. This application claims priority under 35 USC §119(e) from U.S. Provisional Application No. 61/674,846 filed on Jul. 23, 2012 and entitled “Identifying A Maximally Stable Extremal Region (MSER) In An Image By Skipping Comparison Of Pixels In The Region”, which is incorporated herein by reference in its entirety. This application is related to commonly-owned and concurrently filed U.S. application Ser. No. 13/797,433, entitled “Identifying A Maximally Stable Extremal Region (MSER) In An Image By Skipping Comparison Of Pixels In The Region”, which is incorporated herein by reference in its entirety.
Chaudhuri B., Ed., “Digital Document Processing—Major Directions and Recent Advances”, 2007, Springer-Verlag London Limited, XP002715747, ISBN: 978-1-84628-501-1, pp. 103-106; p. 106, section “5.3.5 Zone Separation and Character Segmentation”, paragraph 1.
Chaudhuri B.B., et al., “An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)”, Proceedings of the 4th International Conference on Document Analysis and Recognition (ICDAR), Ulm, Germany, Aug. 18-20, 1997, Los Alamitos, IEEE Comput. Soc., US, vol. 2, Aug. 18, 1997, pp. 1011-1015, XP010244882, DOI: 10.1109/ICDAR.1997.620662, ISBN: 978-0-8186-7898-1, the whole document.
Chaudhury S. (Ed.): “OCR Technical Report for the project Development of Robust Document Analysis and Recognition System for Printed Indian Scripts”, 2008, pp. 149-153, XP002712777, Retrieved from the Internet: URL: http://researchweb.iiit.ac.in/~jinesh/ocrDesignDoc.pdf [retrieved on Sep. 5, 2013].
Chen Y.L., “A knowledge-based approach for textual information extraction from mixed text/graphics complex document images”, Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on, IEEE, Piscataway, NJ, USA, Oct. 10, 2010, pp. 3270-3277, XP031806156, ISBN: 978-1-4244-6586-6.
Dalal N., et al., “Histograms of oriented gradients for human detection”, Computer Vision and Pattern Recognition, 2005 IEEE Computer Society Conference on, IEEE, Piscataway, NJ, USA, Jun. 25, 2005, pp. 886-893, vol. 1, XP031330347, ISBN: 978-0-7695-2372-9, Section 6.3.
Forssen P.E., et al., “Shape Descriptors for Maximally Stable Extremal Regions”, Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, IEEE, PI, Oct. 1, 2007, pp. 1-8, XP031194514, ISBN: 978-1-4244-1630-1, abstract, Section 2, Multi-resolution MSER.
International Search Report and Written Opinion—PCT/US2013/049498—ISA/EPO—Oct. 29, 2013.
Minoru M., Ed., “Character Recognition”, Aug. 2010, Sciyo, XP002715748, ISBN: 978-953-307-105-3, pp. 91-95; p. 92, section “7.3 Baseline Detection Process”.
Pal U., et al., “Multi-skew detection of Indian script documents”, Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on, Seattle, WA, USA, Sep. 10-13, 2001, Los Alamitos, CA, USA, IEEE Comput. Soc., US, Sep. 10, 2001, pp. 292-296, XP10560519, DOI: 10.1109/ICDAR.2001.953801, ISBN: 978-0-7695-1263-1.
Pal U., et al., “OCR in Bangla: an Indo-Bangladeshi language”, Pattern Recognition, 1994, vol. 2—Conference B: Computer Vision & Image Processing, Proceedings of the 12th IAPR International Conference on, Jerusalem, Israel, Oct. 9-13, 1994, Los Alamitos, CA, USA, IEEE Comput. Soc., vol. 2, Oct. 9, 1994, pp. 269-273, XP010216292, DOI: 10.1109/ICPR.1994.576917, ISBN: 978-0-8186-6270-6, the whole document.
Premaratne H.L., et al., “Lexicon and hidden Markov model-based optimisation of the recognised Sinhala script”, Pattern Recognition Letters, Elsevier, Amsterdam, NL, vol. 27, No. 6, Apr. 15, 2006, pp. 696-705, XP027922538, ISSN: 0167-8655.
Ray A.K., et al., “Information Technology—Principles and Applications”, 2004, Prentice-Hall of India Private Limited, New Delhi, XP002712579, ISBN: 81-203-2184-7, pp. 529-531.
Senda S., et al., “Fast String Searching in a Character Lattice,” IEICE Transactions on Information and Systems, Information & Systems Society, Tokyo, JP, vol. E77-D, No. 7, Jul. 1, 1994, pp. 846-851, XP000445299, ISSN: 0916-8532.
Senk V., et al., “A new bidirectional algorithm for decoding trellis codes,” Eurocon '2001, Trends in Communications, International Conference on, Jul. 4-7, 2001, Piscataway, NJ, USA, IEEE, Jul. 4, 2001, pp. 34-36, vol. I, XP032155513, DOI: 10.1109/EURCON.2001.937757, ISBN: 978-0-7803-6490-5.
Sinha R.M.K., et al., “On Devanagari document processing”, Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century, IEEE International Conference on, Vancouver, BC, Canada, Oct. 22-25, 1995, New York, NY, USA, IEEE, US, vol. 2, Oct. 22, 1995, pp. 1621-1626, XP010194509, DOI: 10.1109/ICSMC.1995.538004, ISBN: 978-0-7803-2559-3, the whole document.
Song Y., et al., “A Handwritten Character Extraction Algorithm for Multi-language Document Image”, 2011 International Conference on Document Analysis and Recognition, Sep. 18, 2011, pp. 93-98, XP055068675, DOI: 10.1109/ICDAR.2011.28, ISBN: 978-1-4577-1350-7.
Uchida S., et al., “Skew Estimation by Instances”, 2008 The Eighth IAPR International Workshop on Document Analysis Systems, Sep. 1, 2008, pp. 201-208, XP055078375, DOI: 10.1109/DAS.2008.22, ISBN: 978-0-7695-3337-7.
Unser M., “Sum and Difference Histograms for Texture Classification”, Transactions on Pattern Analysis and Machine Intelligence, IEEE, Piscataway, USA, vol. 30, No. 1, Jan. 1, 1986, pp. 118-125, XP011242912, ISSN: 0162-8828, section A; p. 122, right-hand column; p. 123.
Wu V., et al., “TextFinder: An Automatic System to Detect and Recognize Text in Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, No. 11, Nov. 1, 1999, pp. 1224-1229, XP055068381.
“4.1 Points and patches” In: Szeliski, Richard: “Computer Vision—Algorithms and Applications”, 2011, Springer-Verlag, London, XP002696110, p. 195, ISBN: 978-1-84882-934-3.
Agrawal M., et al., “2 Base Devanagari OCR System” In: Govindaraju V., Srirangataj S. (Eds.): “Guide to OCR for Indic Scripts—Document Recognition and Retrieval”, 2009, Springer Science+Business Media, London, XP002696109, pp. 184-193, ISBN: 978-1-84888-329-3.
Chowdhury A.R., et al., “Text Detection of Two Major Indian Scripts in Natural Scene Images”, Sep. 22, 2011, Camera-Based Document Analysis and Recognition, Springer Berlin Heidelberg, pp. 42-57, XP019175802, ISBN: 978-3-642-29363-4.
Ghoshal R., et al., “Headline Based Text Extraction from Outdoor Images”, 4th International Conference on Pattern Recognition and Machine Intelligence, Springer LNCS, vol. 6744, Jun. 27, 2011, pp. 446-451, XP055060285.
Papandreou A., et al., “A Novel Skew Detection Technique Based on Vertical Projections”, International Conference on Document Analysis and Recognition, Sep. 18, 2011, pp. 384-388, XP055062043, DOI: 10.1109/ICDAR.2011.85, ISBN: 978-1-4577-1350-7.
Setlur, et al., “Creation of data resources and design of an evaluation test bed for Devanagari script recognition”, Research Issues in Data Engineering: Multi-lingual Information Management, RIDE-MLIM 2003, Proceedings, 13th International Workshop, 2003, pp. 55-61.
Agrawal, M., et al., “Generalization of Hindi OCR Using Adaptive Segmentation and Font Files,” in V. Govindaraju, S. Setlur (Eds.), Guide to OCR for Indic Scripts, Advances in Pattern Recognition, Springer-Verlag London Limited, 2009, pp. 181-207.
Dlagnekov, L., et al., “Detecting and Reading Text in Natural Scenes,” Oct. 2004, pp. 1-22.
Elgammal, A.M., et al., “Techniques for Language Identification for Hybrid Arabic-English Document Images,” believed to be published in 2001 in Proceedings of IEEE 6th International Conference on Document Analysis and Recognition, pp. 1-5.
Holmstrom, L., et al., “Neural and Statistical Classifiers—Taxonomy and Two Case Studies,” IEEE Transactions on Neural Networks, Jan. 1997, pp. 5-17, vol. 8 (1).
Jain, A.K., et al., “Automatic Text Location in Images and Video Frames,” believed to be published in Proceedings of Fourteenth International Conference on Pattern Recognition, vol. 2, Aug. 1998, pp. 1497-1499.
Machine Learning, retrieved from http://en.wikipedia.org/wiki/Machine_learning, May 7, 2012, pp. 1-8.
Moving Average, retrieved from http://en.wikipedia.org/wiki/Moving_average, Jan. 23, 2013, pp. 1-5.
Pardo, M., et al., “Learning From Data: A Tutorial With Emphasis on Modern Pattern Recognition Methods,” IEEE Sensors Journal, Jun. 2002, pp. 203-217, vol. 2 (3).
Renold, M., “Detecting and Reading Text in Natural Scenes,” Master's Thesis, May 2008, pp. 1-59.
Vedaldi, A., “An Implementation of Multi-Dimensional Maximally Stable Extremal Regions”, Feb. 7, 2007, pp. 1-7.
VLFeat—Tutorials—MSER, retrieved from http://www.vlfeat.org/overview/mser.html, Apr. 30, 2012, pp. 1-2.
Chaudhuri et al., “Skew Angle Detection of Digitized Indian Script Documents”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Feb. 1997, pp. 182-186, vol. 19, No. 2.
Chen, et al., “Detecting and reading text in natural scenes,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), 2004, pp. 1-8.
Epshtein et al., “Detecting text in natural scenes with stroke width transform,” Computer Vision and Pattern Recognition (CVPR) 2010, pp. 2963-2970 (as downloaded from “http://research.microsoft.com/pubs/149305/1509.pdf”).
Jain, et al., “Automatic text location in images and video frames”, Pattern Recognition, 1998, pp. 2055-2076, vol. 31, No. 12.
Jayadevan, et al., “Offline Recognition of Devanagari Script: A Survey”, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, 2010, pp. 1-15.
Kapoor et al., “Skew angle detection of a cursive handwritten Devanagari script character image”, Indian Institute of Science, May-Aug. 2002, pp. 161-175.
Lee, et al., “A new methodology for gray-scale character segmentation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Oct. 1996, pp. 1045-1050, vol. 18, No. 10.
Li et al., “Automatic Text Detection and Tracking in a Digital Video”, IEEE Transactions on Image Processing, Jan. 2000, pp. 147-156, vol. 9, No. 1.
Matas, et al., “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions”, Proc. of British Machine Vision Conference, 2002, pp. 384-393.
Mikulik, et al., “Construction of Precise Local Affine Frames,” Center for Machine Perception, Czech Technical University in Prague, Czech Republic, abstract and second paragraph of Section 1; Algorithms 1 & 2 of Section 2 and Section 4, International Conference on Pattern Recognition, 2010, pp. 1-5.
Pal, et al., “Indian script character recognition: a survey”, Pattern Recognition Society, published by Elsevier Ltd, 2004, pp. 1887-1899.
Chen et al., “Robust Text Detection in Natural Images With Edge-Enhanced Maximally Stable Extremal Regions”, believed to be published in IEEE International Conference on Image Processing (ICIP), Sep. 2011, pp. 1-4.
Nister, et al., “Linear Time Maximally Stable Extremal Regions”, ECCV, 2008, Part II, LNCS 5303, pp. 183-196, published by Springer-Verlag Berlin Heidelberg.
Shin et al., “Application of Floyd-Warshall Labelling Technique: Identification of Connected Pixel Components in Binary Image”, published in Kangweon-Kyungki Math. Jour. 14 (2006), No. 1, pp. 47-55.
Park et al., “Fast Connected Component Labeling Algorithm Using a Divide and Conquer Technique”, believed to be published in Matrix (2000), vol. 4, Issue 1, Publisher: Elsevier Ltd, pp. 4-7.
Lowe, D.G., “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, Jan. 5, 2004, 28 pp.
Newell, A.J., et al., “Multiscale histogram of oriented gradient descriptors for robust character recognition”, 2011 International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2011, 5 pp.
Wikipedia, “Connected-Component Labeling,” retrieved from http://en.wikipedia.org/wiki/Connected-component_labeling on May 14, 2012, date believed to be prior to Mar. 12, 2013, 7 pages.
Wikipedia, “Histogram of Oriented Gradients,” retrieved from http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients on Apr. 30, 2015, date believed to be prior to Mar. 12, 2013, 7 pages.
Kristensen, F., et al., “Real-Time Extraction of Maximally Stable Extremal Regions on an FPGA,” IEEE International Symposium on Circuits and Systems 2007 (ISCAS 2007), New Orleans, LA, May 27-30, 2007, pp. 165-168.