This patent application relates to apparatuses and methods that process an image to identify therein regions that differ from their surroundings.
Mobile devices such as a cell phone 108 (
MSERs are regions that are geometrically contiguous (and one can go from one pixel to any other pixel by traversing neighboring pixels in such a region) with monotonic transformation in property values, and invariant to affine transformations (transformations that preserve straight lines and ratios of distances between points on the straight lines). In prior art methods known to the current inventors, MSER detection evaluates intensities of all pixels in such a region (e.g. to ensure that the pixels contact one another, so that the region is contiguous).
After MSERs are identified, boundaries of MSERs may be used in the prior art as connected components (see act 114 in
MSERs are believed to have been first described by Matas et al., e.g. in an article entitled “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions”, Proc. of British Machine Vision Conference, 2002, pages 384-393 that is incorporated by reference herein in its entirety. The method described by Matas et al. is known to be computationally expensive, and identifying MSERs in an image with it normally takes a lot of time. The time taken to identify MSERs in an image can be reduced by use of a method of the type described by Nister, et al., “Linear Time Maximally Stable Extremal Regions”, ECCV, 2008, Part II, LNCS 5303, pp 183-196, published by Springer-Verlag Berlin Heidelberg that is also incorporated by reference herein in its entirety.
The current inventors note that prior art methods of the type described by Chen et al. or by Matas et al. or by Nister et al. identify hundreds of MSERs in an image. Such methods sometimes identify thousands of MSERs in an image 107 that includes details of natural features, such as leaves of a tree or leaves of plants, shrubs, and bushes.
Identifying such large numbers of MSERs on today's computers with methods of the type described above, while accurate, takes a significant amount of time, depending on the amount of detail in the portions of the image that contain natural features. The current inventors find such methods impractical for recognizing text on handheld devices, such as smart phones, because such devices have limited computation power and memory relative to computers. Hence, there appears to be a need for methods and apparatuses of the type described below.
In several embodiments, an image is processed to automatically identify regions to be subject to optical character recognition (OCR), as follows. One or more processors make comparisons using intensities (which are non-binary) of multiple pluralities of pixels (hereinafter compared pixels) that are located at corresponding positions in the image, to identify multiple sets of positions in multiple regions. At least two compared pixels identified in a given set of positions are separated from one another by one or more skipped pixels also included in the given set. Hence, each set (among the multiple sets) may be created by including therein positions of compared pixels that are used in comparisons, and additional positions of skipped pixels that are not used in any comparisons. Skipping of pixels, to create each set, reduces computation (relative to comparisons using all pixels that are identified in a set). Although pixels are skipped in creating a set, the compared pixels and the skipped pixels together identified in each set constitute a region that is contiguous in the image.
Comparisons that are made as described above, by using intensities of the compared pixels, can be different depending on the embodiment. For example, in certain embodiments, the intensity of each compared pixel is compared to a common threshold intensity i that is used to identify, in the image, a region Qi that is a maximally stable extremal region or MSER. In other embodiments, the intensity of one compared pixel is compared to the intensity of another compared pixel, with the positions of these two pixels being separated from one another by positions of skipped pixels whose number changes dynamically, so as to make the skipping of pixels adaptive, e.g. based on a difference in intensity between the two pixels compared in a prior iteration.
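The first variant can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiments' implementation: it extracts a single extremal region at one fixed threshold i (passed as `threshold`) and does not perform the stability analysis across thresholds that makes a region maximally stable; it compares only pixels on a grid with spacing `step`, so `step - 1` pixels are skipped between neighbouring compared pixels; and it assumes, hypothetically, that each accepted compared pixel stands for the `step x step` block of skipped pixels below and to the right of it. The name `strided_region` is illustrative.

```python
from collections import deque


def strided_region(img, seed, threshold, step):
    """Grow one extremal region (intensity <= threshold) from `seed`, comparing
    only pixels on a grid of spacing `step`; step - 1 pixels are skipped
    between neighbouring compared pixels. Returns (positions, comparisons)."""
    rows, cols = len(img), len(img[0])
    comparisons = 0
    accepted = []
    visited = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        comparisons += 1                      # one intensity comparison
        if img[r][c] > threshold:             # compared pixel lies outside Qi
            continue
        accepted.append((r, c))
        for dr, dc in ((step, 0), (-step, 0), (0, step), (0, -step)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append((nr, nc))
    positions = set()
    for r, c in accepted:
        # Assumption: each accepted pixel also stands for the skipped pixels
        # in the step x step block below and to the right of it.
        for dr in range(step):
            for dc in range(step):
                if r + dr < rows and c + dc < cols:
                    positions.add((r + dr, c + dc))
    return positions, comparisons
```

With `step=1` every pixel is compared and the function reduces to ordinary flood fill. For example, on a 5 x 9 image whose four left columns are dark, `strided_region(img, (0, 0), 100, 2)` makes 9 comparisons and still returns all 20 dark positions, which together form a contiguous region.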
Although computation is reduced by skipping comparison of certain pixels as noted above, the number of sets which are identified may still be quite large. Hence, in several embodiments, the number of sets created by the above-described comparison of pairs of pixels is automatically reduced, by merging two or more sets of positions, when one or more predetermined tests for merger are satisfied. Specifically, in certain embodiments, a first attribute of a first region identified by positions in a first set and a second attribute of a second region identified by positions in a second set are used in a test, and when the test is found to be satisfied, the first set and the second set are merged, to form a merged set.
A test which is used to merge two or more sets can be different, depending on the embodiment. In an example of such a test, the first attribute is a first line segment which is obtained by projection of the first region in a specific direction (e.g. horizontal) on to a specific axis (e.g. y-axis), and the second attribute is a second line segment which is obtained by projection of the second region in the same direction on to the same axis, and the test checks whether the first line segment completely overlaps the second line segment (or vice versa), e.g. by comparison of endpoints of the two line segments. When such a test is met by two sets, the positions in these two sets are aggregated, by grouping them together to form the merged set, which is then stored in one or more non-transitory computer readable storage media (e.g. non-volatile memory). Depending on the embodiment, a merged set of positions (which identifies a merged region) may itself be merged with any other set of positions of the type described above, to form a larger merged set, e.g. when the same test or another such test is found to be satisfied. One or more sets of the type described above (whether or not merged) are subsequently OCR processed, in the normal manner.
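A minimal Python sketch of this projection test, assuming regions are stored as sets of (row, col) positions; the helper names are hypothetical. Projecting a region horizontally onto the y-axis keeps only its row extent, so the test reduces to comparing the endpoints of two segments:

```python
def y_extent(region):
    """Project (row, col) positions horizontally onto the y-axis and return
    the resulting segment as inclusive (top, bottom) row indices."""
    rows = [r for r, _ in region]
    return min(rows), max(rows)


def one_projection_contains_other(region_a, region_b):
    """True when either projected segment completely overlaps the other,
    decided by comparing the segment endpoints."""
    a_top, a_bot = y_extent(region_a)
    b_top, b_bot = y_extent(region_b)
    a_covers_b = a_top <= b_top and b_bot <= a_bot
    b_covers_a = b_top <= a_top and a_bot <= b_bot
    return a_covers_b or b_covers_a


def merge(region_a, region_b):
    """Aggregate the positions of both sets into one merged set."""
    return set(region_a) | set(region_b)
```

A region spanning rows 0 to 10 and a small region spanning rows 3 to 5 pass the test in either order; a region spanning rows 8 to 14 only partially overlaps the first one and fails it.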
OCR processing of merged sets reduces the amount of processing that would be required in normal OCR of all sets resulting from comparison of pixels (whether or not the comparison was made by skipping pixels). For example, when two sets are merged together, to form a merged set as described above, OCR processing of the merged set requires less computation than OCR processing of each of the two sets individually. Moreover, in certain languages, OCR processing of such a merged set provides more accuracy than normal OCR processing of each of the two sets individually. For example, when an image has characters in Devanagari script, OCR of a first set with one or more positions indicative of a modifier of a character (called “maatra”), and OCR of a second set with positions indicative of the character is likely to be less accurate than OCR of a merged set of positions from both the first set and the second set together indicative of the modifier and the character in combination.
It is to be understood that several other aspects of the embodiments will become readily apparent to those skilled in the art from the description herein, wherein it is shown and described various aspects by way of illustration. The drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
In several aspects of the described embodiments, an image 201 (
Specifically, one or more processors 409 are programmed to perform an act 211 (
After a region Qi has been identified by act 211, the one or more processors 409 perform an act 212 in operation 210, to check whether all sets have been created, and if not return to act 211. When act 212 determines that no more sets can be created from image 301, operation 210 is completed. Subsequently, certain described embodiments may proceed to operation 220 to merge one or more sets. Skipping of comparisons (and pixels) to create a set of positions identifying a region that may be subject to OCR reduces comparisons and related computation that would be required in normal MSER, to create the set of positions. As noted above, the above-described act 211 is performed repeatedly, to generate for the image, multiple sets of positions identifying corresponding regions Qi, . . . Rj, . . . Sk etc, wherein at least some positions in each of these multiple sets are included without comparison of intensities of respective pixels, thereby to reduce overall computation.
Although computation is reduced by skipping comparison of some pixels in each of the sets that identify regions Qi, . . . Rj, . . . Sk as noted above, the number of sets which are identified may be too large for each set to be individually subject to OCR in mobile device 401 (which has limited processing power). Specifically, as noted in the background section above, the number of such sets depends on the amount of detail in image 201 (
Specifically, in certain embodiments, an operation 220 is performed to merge two regions Qi and Rj by checking in act 221 whether a test is satisfied by attributes of these two regions. When the test is found to be satisfied in act 221, the corresponding sets of positions are merged in act 222, to form a merged set. Thereafter, act 223 checks if all regions have been checked in act 221 (including any regions identified by merged sets), and if not, act 221 is performed again. Therefore, a merged set of positions (which identifies a merged region) created by act 222 may itself be merged with any other set of positions to form a larger merged set, e.g. when the same test of act 221 or another such test is found to be satisfied. When the answer in act 223 is yes, one or more merged sets are stored in one or more memories 329 (or other non-transitory computer-readable storage media) by one or more processors 409 (
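The loop of acts 221 to 223 can be sketched as follows. The test of act 221 is assumed here to be the projection-containment test described earlier, redefined so the block stands alone. Rescanning from the start after every merge is a simple, assumed strategy that lets a merged set be tested again; it is not tuned for speed.

```python
def projection_test(a, b):
    """Assumed form of act 221: one region's row extent contains the other's."""
    a_rows = [r for r, _ in a]
    b_rows = [r for r, _ in b]
    a_top, a_bot = min(a_rows), max(a_rows)
    b_top, b_bot = min(b_rows), max(b_rows)
    return (a_top <= b_top and b_bot <= a_bot) or (b_top <= a_top and a_bot <= b_bot)


def merge_until_stable(regions, test=projection_test):
    """Merge sets pairwise while any pair satisfies `test` (acts 221-223)."""
    sets = [set(r) for r in regions]
    changed = True
    while changed:                     # act 223: repeat until no pair passes
        changed = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if test(sets[i], sets[j]):
                    sets[i] |= sets[j]     # act 222: form the merged set
                    del sets[j]
                    changed = True
                    break
            if changed:
                break                  # rescan so the merged set is retested
    return sets
```

For regions spanning rows 0 to 4, rows 5 to 9 and rows 0 to 9, the first two fail the test against each other, but once the third absorbs the first, the merged set spans rows 0 to 9 and then absorbs the second, leaving a single set.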
Sets of the type described above (whether or not merged) are thereafter OCR processed in the normal manner, which depends on the specific embodiment. Specifically, after performance of merging in operation 220, the one or more processors 409 perform operation 230 to binarize in the normal manner, intensities of pixels in image 201 at the positions identified in a set (whether or not merged). Operation 230 is followed in some embodiments by another operation 240 wherein the region is classified as text/non-text when a predetermined test is satisfied and/or by use of a neural network that has been trained. The binary values and positions of regions that are classified as text are subsequently supplied in an operation 250 to an optical character recognition (OCR) system, e.g. to identify presence in the region of one or more letters of an alphabet of a predetermined language.
OCR processing of merged sets obtained by performing the operation 220 as described above reduces the amount of processing that would be otherwise required in normal OCR of all sets that are created by comparison of pixel intensities (whether or not the comparison was made by skipping pixels). For example, when sets of two regions Qi and Rj are merged together, to form a merged set as described above, OCR processing of the merged set requires less computation than OCR processing of each of the regions Qi and Rj individually.
Operation 210 of
Accordingly, several embodiments of the type described in the preceding paragraph skip a number D of pixels that increases at each iteration, and the increase may be a preset number δ of pixels (e.g. increase by δ=1 additional skipped pixel at each iteration), so that D increases by δ in each iteration. In other embodiments, an identical number D of pixels may be skipped in each iteration e.g. D=1 pixel may be skipped (i.e. only a single skipped pixel is located between each pair of pixels whose intensities are compared in each iteration of act 311). In still other embodiments, the number D of pixels that are skipped at each iteration may be varied, e.g. at random, or based on a change in intensity therebetween.
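The fixed and growing skip schedules can be illustrated along a single row. In this hypothetical helper, `first_skip` is the initial D and `delta` is δ; `delta=0` gives the constant-skip case. The random and intensity-driven variants are not shown.

```python
def compared_columns(width, first_skip, delta):
    """Columns of compared pixels along one row, starting at column 0.
    D pixels are skipped between consecutive compared pixels; D starts at
    `first_skip` and grows by `delta` after each comparison."""
    cols = []
    col, d = 0, first_skip
    while col < width:
        cols.append(col)
        col += d + 1        # jump over the D skipped pixels
        d += delta          # grow D by delta for the next iteration
    return cols
```

`compared_columns(10, 1, 0)` returns `[0, 2, 4, 6, 8]`, while `compared_columns(10, 1, 1)` returns `[0, 2, 5, 9]` because one, then two, then three pixels are skipped.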
In certain embodiments of the type described above, a difference in intensities (which are non-binary) of a pair of pixels such as pixels (1,0) and (3,0) is compared to a threshold, wherein the pair of pixels (1,0) and (3,0) are located in the image separated from one another by at least one pixel, namely pixel (2,0) between them. When a comparison finds that the threshold has been exceeded, a position of a pixel in the pair (“selected pixel”), such as pixel (1,0), is added to a set of positions that identify a region (or a portion of the region) to be subject to OCR. Also added to the set is at least one position of a pixel that is not used in the comparison and is located adjacent to the selected pixel, e.g. skipped pixel (2,0) when it is found to satisfy MSER constraints.
In some embodiments, image 201 is low-pass filtered and down-sampled as illustrated by image 202 (
Note that in most embodiments of the type described above, act 311 (
After performing the act 311, in act 312 in operation 310 (
When there are more pixels to check, one or more processors 409 perform act 311 on another pair of pixels (not including the skipped pixel). If the answer in act 312 is yes, then the one or more processors 409 in mobile device 401 perform act 313 to read temporary storage and add to a set of positions (to be subject to OCR) at least a position of a pixel in the pair (“selected pixel”). In act 313 in operation 310, the one or more processors 409 also add to the set of positions one or more additional positions (e.g. of one or more additional pixels) in the normal-sized image, e.g. image 301, that are located adjacent to the selected pixel and were saved in temporary storage (as per act 316), after which the temporary storage may be freed (as per act 313A). Depending on the embodiment, the additional pixel(s) may or may not include one or more skipped pixels.
In some embodiments, pixels identified in a set of positions (which may be implemented in a list) identify a region Qi that includes a local extremum in the image 201, such as a local maximum or a local minimum. Such a region Qi may be identified in act 312 (
After act 313, the one or more processors 409 perform act 314 in operation 310, to check if there are any more pixels to be checked for the current set of positions, e.g. check if there are any more unvisited neighbors (separated by one or more skipped pixels), and if so return to act 311. If in act 314 the one or more processors 409 find that there are no more pixels to visit, then the one or more processors 409 perform act 315, similar or identical to act 212 (described above).
In act 315 in operation 310, the one or more processors 409 check if all sets of positions that can be identified in image 301 have been identified, e.g. whether there are no more unvisited pixels in image 301 (other than skipped pixels). Typically, numerous sets of positions (e.g. tens of sets or even hundreds of sets) are identified by repeated performance of acts 311-314 described above on an image 201 (
Note that a region of non-text pixels indicative of natural features 303 in image 201 is now identified based on skipping pixels as described above (see
Specific acts performed during merging in operation 320 can be different in different embodiments, although acts 321-324 illustrated in
In
In some embodiments, in addition to the just-described checking of overlap, the test of act 321 may additionally check other conditions, e.g. proximity of the two regions Qi and Rj in the direction of the projection (in this example, proximity in the horizontal direction, or along the x-axis). For example, the test of act 321 may check whether a block that contains region Rj is located immediately above, or below, or to the left, or to the right of a block that contains region Qi with no intervening block therebetween. When the test of act 321 is satisfied, these two blocks of regions Qi and Rj are merged. Additional tests for merger of two adjacent blocks of regions Qi and Rj may be based on, for example, relative heights of the two blocks, and/or aspect ratio of either or both blocks.
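The proximity and height checks can be sketched for the left/right case as follows; the above/below case is symmetric. The bounding-box representation and the thresholds `max_gap` and `max_height_ratio` are hypothetical choices for illustration, since the text does not give their values.

```python
from typing import NamedTuple


class Box(NamedTuple):
    top: int
    left: int
    bottom: int   # inclusive
    right: int    # inclusive

    @property
    def height(self):
        return self.bottom - self.top + 1


def rows_overlap(a, b):
    return a.top <= b.bottom and b.top <= a.bottom


def horizontally_adjacent(a, b, others, max_gap=3, max_height_ratio=2.0):
    """Hypothetical merge test: b lies immediately left or right of a (small
    gap, overlapping rows), no other block lies in between, and the two
    heights are within max_height_ratio of each other."""
    if not rows_overlap(a, b):
        return False
    left, right = (a, b) if a.left <= b.left else (b, a)
    gap = right.left - left.right - 1
    if gap < 0 or gap > max_gap:
        return False
    for o in others:
        if o in (a, b):
            continue
        if rows_overlap(o, left) and left.right < o.left and o.right < right.left:
            return False          # an intervening block separates a and b
    ratio = max(a.height, b.height) / min(a.height, b.height)
    return ratio <= max_height_ratio
```

Two 10-pixel-tall blocks separated by a one-column gap pass; the same pair fails if a third block sits in the gap, if the gap is wider than `max_gap`, or if one block is three times as tall as the other.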
In one example illustrated in
When the test is satisfied in act 321, the one or more processors 409 go to act 322 to prepare a merged set of positions, by including therein positions in the first set and positions in the second set. For example blocks 361 and 362 shown in
In several embodiments of the type described above, two regions Qi and Rj that are merged with one another are both identified by skipping pixels. However, in other embodiments, only one of regions Qi and Rj is identified by skipping pixels while the other of regions Qi and Rj is identified without skipping pixels.
For text expressed in certain natural languages (such as Hindi), OCR processing of a merged set which is created as described above may provide more accuracy than normal OCR processing of each of the two sets individually. For example, when an image has characters in Devanagari script, OCR of a first set with one or more positions indicative of an accent mark (called “maatra”) 361 (see
Note that the number of skipped pixels is determined in some embodiments illustrated in
In some embodiments, skipping of pixels during comparison in act 211 (described above) is performed by obtaining the pixels by performance of an operation 410 (
However, image 202, which is a low-pass image, need not be generated in some embodiments, which simply perform a mapping to identify positions of N/D² pixels in the image 201 of normal size as constituting the image 202. Moreover, although image 201 is low-pass filtered (as per act 411) prior to mapping (as per act 412) in some embodiments, other embodiments may directly perform such mapping without low-pass filtering.
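A minimal sketch of both options follows: the plain mapping of positions, and a low-pass-then-map variant. The box (block-average) filter is one assumed choice of low-pass filter; the text does not say which filter act 411 uses, and the function names are hypothetical.

```python
def subsample_positions(height, width, d):
    """Positions in the normal-size image that constitute the scaled-down
    image: every d-th row and column, i.e. about N / d**2 of N pixels."""
    return [(r, c) for r in range(0, height, d) for c in range(0, width, d)]


def box_lowpass_subsample(img, d):
    """Optional low-pass step: average each d x d block (an assumed box
    filter) and keep one value per block."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, d):
        row = []
        for c in range(0, w, d):
            block = [img[rr][cc]
                     for rr in range(r, min(r + d, h))
                     for cc in range(c, min(c + d, w))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

For a 4 x 4 image and D = 2, `subsample_positions(4, 4, 2)` keeps the 4 positions (0,0), (0,2), (2,0) and (2,2), i.e. N/D² = 16/4.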
Omission of D²−1 pixels is illustrated in an example shown in
The scaled-down image 450 identified by positions in set 451 is then used in operation 420 (
An example of a region that results from operation 420 is shown in
The rest of set 431S is generated in act 432 in
In some embodiments, positions of the D²−1 pixels are not obtained by interpolation and instead are simply selected in a predetermined manner, for example, to form a square of D² positions in region 470, with the position being supplemented located, in one example, at the top left corner of the square or, in another example, at the center of the square (when the value of D is odd). Accordingly, the set 431S (e.g. stored in memory 329) of positions in image 440 (
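Supplementing positions to form D x D squares can be sketched as follows. The input positions are assumed to be already mapped to the normal-size image, `anchor` selects whether each known position is the top-left corner or (for odd D) the center of its square, and clipping to the image bounds is omitted. The name `supplement` is hypothetical.

```python
def supplement(positions, d, anchor="top_left"):
    """Expand each known full-resolution position into a d x d square of
    positions, adding the d*d - 1 omitted ones."""
    if anchor not in ("top_left", "center"):
        raise ValueError("anchor must be 'top_left' or 'center'")
    if anchor == "center" and d % 2 == 0:
        raise ValueError("a centered square needs an odd d")
    offset = d // 2 if anchor == "center" else 0
    square = set()
    for r, c in positions:
        top, left = r - offset, c - offset     # square's top-left corner
        for dr in range(d):
            for dc in range(d):
                square.add((top + dr, left + dc))
    return square
```

Each known position contributes D² positions, so a region of k non-overlapping known positions expands to k·D² positions.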
In some embodiments, region 470 is then classified as text or non-text in operation 240 followed by input to an operation 250 (
Certain embodiments perform acts illustrated in
In several embodiments, a value of the above-described subsampling factor D is known ahead of time (e.g. value 2 or 3 is predetermined) and this value is used in subsampling the entire image (e.g. in image subsampler 771 in
Performing OCR on such binary values, which have been generated by skipping pixels as described above, is unexpectedly found to be more effective than OCR on regions identified by using all pixels, i.e. more effective than including skipped pixels when identifying a region. Skipping of pixels to identify regions as described above eliminates certain regions (“non-text regions”) that may otherwise be identified, but that result in unnecessary work by the OCR system. Also, intensities of neighboring pixels within such non-text regions may be less uniform than in regions identified by skipping pixels. Skipping pixels as described herein not only eliminates some non-text regions, but also improves speed because fewer regions are identified, classified as text, and supplied to the OCR system.
In certain embodiments, subsampling and contour generation are performed independent of one another, with subsampling being performed on an entire image, followed by MSER identification being performed on the scaled-down image as described above. In other embodiments, MSER identification and subsampling are performed incrementally and iteratively within an image, so a value of subsampling factor D for a next iteration is calculated dynamically based on properties of pixels that surround a pixel in a current iteration (also called “current pixel”), as illustrated in
Specifically, certain embodiments of the type illustrated in
Several embodiments of the type illustrated in
Hence, in adaptive downsampling, the count of pixels between a current pixel and a pixel whose intensity is being compared changes depending at least partially on whether the deviation exceeds the predetermined threshold. In other words, the deviation is filtered spatially to determine the amount of down-sampling that can be done. Other adaptive downsampling approaches may derive other statistics that quantify variation. One example is the ratio of the frequency response at Fs/2 to the response at DC, i.e. the ratio H(Fs/2, Fs/2)/H(0,0), where Fs is the sampling frequency. In some embodiments, H is the 2-D discrete Fourier transform. In some adaptive downsampling embodiments, the down-sampling rate is kept constant at least within a small region, so that the complexity of dealing with varying down-sampling rates does not become too high.
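The ratio H(Fs/2, Fs/2)/H(0,0) can be evaluated for a single patch without a full FFT: at DC every DFT kernel term equals 1, and at (Fs/2, Fs/2) the kernel exp(-jπ(m+n)) equals (-1)^(m+n). The sketch below takes magnitudes and assumes an even-sized patch so that the Fs/2 bin exists; how the ratio is mapped to a value of D is not specified in the text and is left out.

```python
def nyquist_to_dc_ratio(patch):
    """|H(Fs/2, Fs/2)| / |H(0, 0)| of the 2-D DFT of an even-sized patch."""
    h, w = len(patch), len(patch[0])
    if h % 2 or w % 2:
        raise ValueError("patch dimensions must be even")
    dc = sum(sum(row) for row in patch)                 # kernel is 1 at DC
    nyq = sum(v * (-1) ** (m + n)                       # kernel is (-1)^(m+n)
              for m, row in enumerate(patch) for n, v in enumerate(row))
    if dc == 0:
        return float("inf") if nyq else 0.0
    return abs(nyq) / abs(dc)
```

A constant patch gives 0.0 (no high-frequency content, so heavier down-sampling may be acceptable), while a one-pixel checkerboard gives 1.0.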
In some embodiments that perform adaptive downsampling, a processor 409 starts in act 500 (
Thereafter, in act 503 (
If the answer in act 503 is no, processor 409 goes to act 504 to save the value of D (in the variable LastD), and then goes to act 505 to increase the value of D, e.g. by multiplying it by a predetermined rate (which may be 2, for example). The increased D is then used in performing the act 502 (described above). When the answer in act 503 is yes, processor 409 goes to act 506 to check if the difference between D and LastD is less than a limit, and if the answer is no, processor 409 goes to act 507. In act 507, the value of D is reduced, e.g. to the value (D+LastD)/2, and then act 502 is performed again. If the answer in act 506 is yes, then a local minimum has been found, and therefore processor 409 starts an operation 510 to create a set of positions that is to identify an MSER, as described next.
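The control flow of acts 503 to 507 can be sketched as below. The test made in act 503 is described in a passage not reproduced here, so it is passed in as a hypothetical predicate `exceeds(d)`; the initial LastD of 0, treating D as a real number, and the `max_steps` guard are also assumptions.

```python
def search_skip(exceeds, d=1.0, rate=2.0, limit=1.0, max_steps=100):
    """Adjust D as in acts 503-507, with `exceeds` standing in for act 503."""
    last_d = 0.0                       # assumed initial value of LastD
    for _ in range(max_steps):
        if not exceeds(d):             # act 503 answered "no"
            last_d = d                 # act 504: remember D in LastD
            d = d * rate               # act 505: increase D
        elif abs(d - last_d) < limit:  # act 506 answered "yes"
            return d
        else:
            d = (d + last_d) / 2.0     # act 507: reduce D toward LastD
    raise RuntimeError("D did not settle within max_steps")
```

With `exceeds = lambda d: d >= 5`, starting from D = 1 with rate 2 and limit 1, the loop overshoots to 8, repeatedly halves the distance toward LastD, and returns 5.0625, within the limit of the boundary.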
Specifically, in act 511 of operation 510, processor 409 adds the current pixel's position to a stack, to start creating a set for a current extremal region. Then, in act 512, processor 409 adds to the stack, any positions in the image of pixels that were just skipped in reaching the most-recently added position, from a previously added position (if any pixels were skipped). Subsequently, processor 409 pops the heap in act 513 and checks in act 514 if the heap was empty. If the answer in act 514 is yes, then operation 510 ends, and creation of the set of positions, to identify an MSER is completed, and the set is stored in memory.
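Act 512 can be illustrated with a small helper that lists the positions skipped between two consecutively added positions. It assumes, hypothetically, that consecutive positions lie on the same row or the same column; the function names are illustrative.

```python
def skipped_between(prev, cur):
    """Positions strictly between prev and cur on a shared row or column."""
    (r0, c0), (r1, c1) = prev, cur
    if r0 == r1:
        step = 1 if c1 > c0 else -1
        return [(r0, c) for c in range(c0 + step, c1, step)]
    if c0 == c1:
        step = 1 if r1 > r0 else -1
        return [(r, c0) for r in range(r0 + step, r1, step)]
    raise ValueError("this sketch only handles row- or column-aligned moves")


def add_position(stack, cur, prev=None):
    """Act 511 pushes the current position; act 512 then pushes any
    positions that were skipped on the way from the previous one."""
    stack.append(cur)
    if prev is not None:
        stack.extend(skipped_between(prev, cur))
    return stack
```

Moving from (2,0) to (2,3) pushes (2,3) followed by the skipped positions (2,1) and (2,2).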
If the answer in act 514 is no, then act 515 is performed to check if the intensity value of the current pixel (which has just been obtained by popping the heap) is less than the grey level of the second component on the stack. If the answer in act 515 is yes, then processor 409 returns to act 502 (described above). If the answer in act 515 is no, then processor 409 goes to act 516, to merge the last two connected components from the stack. Then processor 409 goes to act 517, to check if the grey level of the current pixel is greater than the grey level of the pixel at the top of the stack. If the answer in act 517 is no, processor 409 returns to act 502 (described above). If the answer in act 517 is yes, operation 510 is completed.
After completion of operation 510, processor 409 performs acts 521-522 of the type described by Nister et al. Specifically, in some embodiments, in act 521, processor 409 computes variation scores, followed by finding an MSER, followed by filtering. Then in act 522, processor 409 stores in memory 329 one or more MSERs, by storing the pixels that have been determined to be included in an MSER as well as any skipped pixels located between the just-described pixels. After act 522, processor 409 returns to act 515 as shown by branch 529, so the process from act 515 to act 522 continues as long as the value of the pixel popped from the heap is not less than the grey level of the second component on the stack.
Certain embodiments use a discrete wavelet transform (DWT) to determine a value of the downsampling factor D. An example (not shown) uses two dyadic CDF 5/3 wavelet stages of a predetermined image (called Castle image) to distinguish regions with almost zero edge information (which, depending on the embodiment, are down-sampled with a high value of factor D, e.g. value 8, or even completely segmented out) from regions where edge information is clearly preserved across one or even both levels of the wavelet decomposition (which are down-sampled at a low value of factor D, e.g. value 2). The ith level of the dyadic wavelet decomposition produces sub-bands down-sampled by 2ⁱ. Note that certain image portions containing prominent high frequency content may also be down-sampled significantly, because the edge information is sufficiently preserved at the lower resolutions as well, as observed in the three smaller “edge-map” images. Another benefit of using the DWT is that in some embodiments the (Low, Low) frequency sub-band at various stages of the wavelet decomposition is automatically used as a down-sampled image at a certain resolution, and an MSER method is run directly on portions of these down-sampled images.
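One dyadic CDF 5/3 stage can be sketched with lifting, using the same predict and update steps as the reversible 5/3 filter of JPEG 2000 but without integer rounding, and with symmetric extension at the borders. This is an illustrative sketch that assumes even image dimensions; it does not reproduce the Castle-image example or any rule for choosing D from the detail bands.

```python
def cdf53_1d(x):
    """One level of the CDF 5/3 wavelet by lifting. Returns (low, high)."""
    n = len(x)
    if n % 2 or n < 2:
        raise ValueError("length must be even and at least 2")
    half = n // 2
    high = []
    for i in range(half):
        # Predict: odd sample minus the mean of its even neighbours
        # (mirror x[n] = x[n - 2] at the right border).
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        high.append(x[2 * i + 1] - 0.5 * (x[2 * i] + right))
    low = []
    for i in range(half):
        # Update: even sample plus a quarter of the neighbouring details
        # (mirror high[-1] = high[0] at the left border).
        left = high[i - 1] if i > 0 else high[0]
        low.append(x[2 * i] + 0.25 * (left + high[i]))
    return low, high


def cdf53_2d(img):
    """One 2-D level: rows first, then columns. Returns (LL, LH, HL, HH),
    where the first letter is the row filter and the second the column filter."""
    lows, highs = zip(*(cdf53_1d(list(row)) for row in img))

    def columns(rows):
        lo, hi = zip(*(cdf53_1d(list(col)) for col in zip(*rows)))
        # transpose back so results are indexed [row][col]
        return [list(r) for r in zip(*lo)], [list(r) for r in zip(*hi)]

    ll, lh = columns(lows)
    hl, hh = columns(highs)
    return ll, lh, hl, hh
```

Applying `cdf53_2d` again to the returned LL band gives the second dyadic level, so an 8 x 8 input yields a 2 x 2 LL band, i.e. down-sampling by 2² = 4 per axis. Large magnitudes in the LH, HL and HH bands indicate edge information preserved at that level.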
Accordingly, some embodiments of mobile device 401 perform acts 611-617 as illustrated in
Mobile device 401 of some embodiments that performs the method shown in
In addition to memory 329, mobile device 401 may include one or more other types of memory such as flash memory (or SD card) 1008 and/or a hard disk and/or an optical disk (also called “secondary memory”) to store data and/or software for loading into memory 329 (also called “main memory”) and/or for use by processor(s) 404. Mobile device 401 may further include a wireless transmitter and receiver in transceiver 1010 and/or any other communication interfaces 1009. It should be understood that mobile device 401 may be any portable electronic device such as a cellular or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop, camera, smartphone, tablet (such as iPad available from Apple Inc) or other suitable mobile platform that is capable of creating an augmented reality (AR) environment.
A mobile device 401 of the type described above may include other position determination methods such as object recognition using “computer vision” techniques. The mobile device 401 may also include means for remotely controlling a real world object which may be a toy, in response to user input on mobile device 401, e.g. by use of the transmitter in transceiver 1010, which may be an IR or RF transmitter or a wireless transmitter enabled to transmit one or more signals over one or more types of wireless communication networks such as the Internet, WiFi, cellular wireless network or other network. The mobile device 401 may further include, in a user interface, a microphone and a speaker (not labeled). Of course, mobile device 401 may include other elements unrelated to the present disclosure, such as a read-only-memory 1007 which may be used to store firmware for use by processor 404.
Also, depending on the embodiment, a mobile device 401 may perform reference free tracking and/or reference based tracking using a local detector in mobile device 401 to detect text on objects (e.g. billboards) in a real world scene, in implementations that execute the OCR software 1014 to generate augmented text (e.g. a word in an image translated from Hindi into English) to display on screen 402. The above-described identification of letter candidates for use by OCR software 1014 may be performed in software (executed by one or more processors or processor cores) or in hardware or in firmware, or in any combination thereof.
In some embodiments of mobile device 401, the above-described contour upsampler 352, MSER Identifier 355 and image subsampler 351 are included in a text recognition module that is implemented by a processor 404 executing the software 328 in memory 329 of mobile device 401, although in other embodiments any one or more of contour upsampler 352, MSER Identifier 355 and image subsampler 351 are implemented in any combination of hardware circuitry and/or firmware and/or software in mobile device 401. Hence, depending on the embodiment, various functions of the type described herein of a text recognition module may be implemented in software (executed by one or more processors or processor cores) or in dedicated hardware circuitry or in firmware, or in any combination thereof.
Accordingly, depending on the embodiment, any one or more of contour upsampler 352, MSER Identifier 355 and image subsampler 351 can, but need not necessarily include, one or more microprocessors, embedded processors, controllers, application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like. The term processor is intended to describe the functions implemented by the system rather than specific hardware. Moreover, as used herein the term “memory” refers to any type of computer storage medium, including long term, short term, or other memory associated with the mobile platform, and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
Hence, methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in firmware 1013 (
Any non-transitory machine-readable medium tangibly embodying software instructions (also called “computer instructions”) may be used in implementing the methodologies described herein. For example, software 328 (
Non-transitory computer-readable media include physical computer storage media. A non-transitory storage medium may be any available non-transitory medium that can be accessed by a computer. By way of example, and not limitation, such non-transitory computer-readable media can comprise RAM, ROM, Flash Memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium that can be used to store program code in the form of instructions or data structures and that can be accessed by a computer; disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of non-transitory computer-readable media.
Although several aspects are illustrated in connection with specific embodiments for instructional purposes, the embodiments are not limited thereto. Hence, although item shown in
Although in some embodiments, a single image (e.g. from a still shot) is subsampled for MSER processing to identify regions, followed by upsampling identified regions that are then subject to text recognition, in other embodiments a sequence of images in a video are processed in the above-described manner, similar to processing of the single image.
Various adaptations and modifications may be made without departing from the scope of the described embodiments. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description. It is to be understood that several other aspects of the described embodiments will become readily apparent to those skilled in the art from the description herein, wherein it is shown and described various aspects by way of illustration. The drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
This application claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/674,846 filed on Jul. 23, 2012 and entitled “Identifying A Maximally Stable Extremal Region (MSER) In An Image By Skipping Comparison Of Pixels In The Region”, which is incorporated herein by reference in its entirety. This application also claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/673,700 filed on Jul. 19, 2012 and entitled “Parameter Selection And Coarse Localization Of Interest Regions For MSER Processing”, which is incorporated herein by reference in its entirety. This application is related to concurrently filed and co-owned U.S. application Ser. No. 13/796,729 entitled “Parameter Selection And Coarse Localization Of Interest Regions For MSER Processing” which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
3710321 | Rubenstein | Jan 1973 | A |
4654875 | Srihari et al. | Mar 1987 | A |
5321768 | Fenrich et al. | Jun 1994 | A |
5459739 | Handley et al. | Oct 1995 | A |
5465304 | Cullen et al. | Nov 1995 | A |
5519786 | Courtney et al. | May 1996 | A |
5563403 | Bessho et al. | Oct 1996 | A |
5633954 | Gupta et al. | May 1997 | A |
5751850 | Rindtorff | May 1998 | A |
5764799 | Hong et al. | Jun 1998 | A |
5768451 | Hisamitsu et al. | Jun 1998 | A |
5805747 | Bradford | Sep 1998 | A |
5835633 | Fujisaki et al. | Nov 1998 | A |
5844991 | Hochberg et al. | Dec 1998 | A |
5978443 | Patel | Nov 1999 | A |
6023536 | Visser | Feb 2000 | A |
6266439 | Pollard et al. | Jul 2001 | B1 |
6393443 | Rubin et al. | May 2002 | B1 |
6473517 | Tyan et al. | Oct 2002 | B1 |
6674919 | Ma et al. | Jan 2004 | B1 |
6678415 | Popat et al. | Jan 2004 | B1 |
6687421 | Navon | Feb 2004 | B1 |
6738512 | Chen et al. | May 2004 | B1 |
6954795 | Takao et al. | Oct 2005 | B2 |
7142727 | Notovitz et al. | Nov 2006 | B2 |
7263223 | Irwin | Aug 2007 | B2 |
7333676 | Myers et al. | Feb 2008 | B2 |
7403661 | Curry et al. | Jul 2008 | B2 |
7450268 | Martinez et al. | Nov 2008 | B2 |
7724957 | Abdulkader | May 2010 | B2 |
7738706 | Aradhye et al. | Jun 2010 | B2 |
7783117 | Liu et al. | Aug 2010 | B2 |
7817855 | Yuille et al. | Oct 2010 | B2 |
7889948 | Steedly et al. | Feb 2011 | B2 |
7961948 | Katsuyama | Jun 2011 | B2 |
7984076 | Kobayashi et al. | Jul 2011 | B2 |
8009928 | Manmatha et al. | Aug 2011 | B1 |
8189961 | Nijemcevic et al. | May 2012 | B2 |
8194983 | Al-Omari et al. | Jun 2012 | B2 |
8285082 | Heck | Oct 2012 | B2 |
8306325 | Chang | Nov 2012 | B2 |
8417059 | Yamada | Apr 2013 | B2 |
8542926 | Panjwani et al. | Sep 2013 | B2 |
8644646 | Heck | Feb 2014 | B2 |
20030026482 | Dance | Feb 2003 | A1 |
20030099395 | Wang et al. | May 2003 | A1 |
20030215137 | Wnek | Nov 2003 | A1 |
20040179734 | Okubo | Sep 2004 | A1 |
20040240737 | Lim et al. | Dec 2004 | A1 |
20050041121 | Steinberg et al. | Feb 2005 | A1 |
20050123199 | Mayzlin et al. | Jun 2005 | A1 |
20050238252 | Prakash et al. | Oct 2005 | A1 |
20060039605 | Koga | Feb 2006 | A1 |
20060215231 | Borrey et al. | Sep 2006 | A1 |
20060291692 | Nakao et al. | Dec 2006 | A1 |
20070110322 | Yuille et al. | May 2007 | A1 |
20070116360 | Jung et al. | May 2007 | A1 |
20070217676 | Grauman et al. | Sep 2007 | A1 |
20080008386 | Anisimovich et al. | Jan 2008 | A1 |
20080063273 | Shimodaira | Mar 2008 | A1 |
20080112614 | Fluck et al. | May 2008 | A1 |
20090060335 | Rodriguez et al. | Mar 2009 | A1 |
20090202152 | Takebe et al. | Aug 2009 | A1 |
20090232358 | Cross | Sep 2009 | A1 |
20090252437 | Li et al. | Oct 2009 | A1 |
20090316991 | Geva et al. | Dec 2009 | A1 |
20090317003 | Heilper et al. | Dec 2009 | A1 |
20100049711 | Singh et al. | Feb 2010 | A1 |
20100067826 | Honsinger et al. | Mar 2010 | A1 |
20100080462 | Miljanic et al. | Apr 2010 | A1 |
20100128131 | Tenchio et al. | May 2010 | A1 |
20100141788 | Hwang et al. | Jun 2010 | A1 |
20100144291 | Stylianou et al. | Jun 2010 | A1 |
20100172575 | Lukac et al. | Jul 2010 | A1 |
20100195933 | Nafarieh | Aug 2010 | A1 |
20100232697 | Mishima et al. | Sep 2010 | A1 |
20100239123 | Funayama et al. | Sep 2010 | A1 |
20100245870 | Shibata | Sep 2010 | A1 |
20100272361 | Khorsheed et al. | Oct 2010 | A1 |
20100296729 | Mossakowski | Nov 2010 | A1 |
20110052094 | Gao et al. | Mar 2011 | A1 |
20110081083 | Lee et al. | Apr 2011 | A1 |
20110188756 | Lee et al. | Aug 2011 | A1 |
20110215147 | Goncalves | Sep 2011 | A1 |
20110222768 | Galic et al. | Sep 2011 | A1 |
20110249897 | Chaki et al. | Oct 2011 | A1 |
20110274354 | Nijemcevic | Nov 2011 | A1 |
20110280484 | Ma et al. | Nov 2011 | A1 |
20110285873 | Showering | Nov 2011 | A1 |
20120051642 | Berrani et al. | Mar 2012 | A1 |
20120066213 | Ohguro | Mar 2012 | A1 |
20120092329 | Koo et al. | Apr 2012 | A1 |
20120114245 | Lakshmanan et al. | May 2012 | A1 |
20120155754 | Chen et al. | Jun 2012 | A1 |
20130001295 | Goncalves | Jan 2013 | A1 |
20130058575 | Koo et al. | Mar 2013 | A1 |
20130129216 | Tsai et al. | May 2013 | A1 |
20130194448 | Baheti et al. | Aug 2013 | A1 |
20130195315 | Baheti et al. | Aug 2013 | A1 |
20130195360 | Krishna Kumar et al. | Aug 2013 | A1 |
20130195376 | Baheti et al. | Aug 2013 | A1 |
20130308860 | Mainali et al. | Nov 2013 | A1 |
20140003709 | Ranganathan et al. | Jan 2014 | A1 |
20140022406 | Baheti et al. | Jan 2014 | A1 |
20140023270 | Baheti et al. | Jan 2014 | A1 |
20140023273 | Baheti et al. | Jan 2014 | A1 |
20140023274 | Barman et al. | Jan 2014 | A1 |
20140023275 | Krishna Kumar et al. | Jan 2014 | A1 |
20140023278 | Krishna Kumar et al. | Jan 2014 | A1 |
20140161365 | Acharya | Jun 2014 | A1 |
20140168478 | Baheti et al. | Jun 2014 | A1 |
Number | Date | Country |
---|---|---|
1146478 | Oct 2001 | EP |
1840798 | Oct 2007 | EP |
2192527 | Jun 2010 | EP |
2453366 | Apr 2009 | GB |
2468589 | Sep 2010 | GB |
2004077358 | Sep 2004 | WO |
Entry |
---|
Chaudhuri et al. “Skew Angle Detection of Digitized Indian Script Documents”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Feb. 1997, pp. 182-186, vol. 19, No. 2. |
Chen, et al. “Detecting and reading text in natural scenes,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), 2004, pp. 1-8. |
Epshtein et al. “Detecting text in natural scenes with stroke width transform,” Computer Vision and Pattern Recognition (CVPR) 2010, pp. 2963-2970, (as downloaded from “http://research.microsoft.com/pubs/149305/1509.pdf”). |
Jain, et al. “Automatic text location in images and video frames”, Pattern Recognition, 1998, pp. 2055-2076, vol. 31, No. 12. |
Jayadevan, et al. “Offline Recognition of Devanagari Script: A Survey”, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, 2010, pp. 1-15. |
Kapoor et al. “Skew angle detection of a cursive handwritten Devanagari script character image”, Indian Institute of Science, May-Aug. 2002, pp. 161-175. |
Lee, et al. “A new methodology for gray-scale character segmentation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Oct. 1996, pp. 1045-1050, vol. 18, No. 10. |
Li et al. “Automatic Text Detection and Tracking in a Digital Video”, IEEE Transactions on Image Processing, Jan. 2000, pp. 147-156, vol. 9, No. 1. |
Matas, et al. “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions”, Proc. of British Machine Vision Conference, 2002, pp. 384-393. |
Mikulik, et al. “Construction of Precise Local Affine Frames,” Center for Machine Perception, Czech Technical University in Prague, Czech Republic, Abstract and second paragraph of Section 1; Algorithms 1 & 2 of Section 2 and Section 4, International Conference on Pattern Recognition, 2010, pp. 1-5. |
Pal, et al. “Indian script character recognition: a survey”, Pattern Recognition Society, Published by Elsevier Ltd, 2004, pp. 1887-1899. |
Chen et al. “Robust Text Detection in Natural Images With Edge-Enhanced Maximally Stable Extremal Regions”, believed to be published in IEEE International Conference on Image Processing (ICIP), Sep. 2011, pp. 1-4. |
Nister, et al. “Linear Time Maximally Stable Extremal Regions”, ECCV, 2008, Part II, LNCS 5303, pp. 183-196, published by Springer-Verlag Berlin Heidelberg. |
Agrawal, M. et al. “Generalization of Hindi OCR Using Adaptive Segmentation and Font Files,” V. Govindaraju, S. Setlur (eds.), Guide to OCR for Indic Scripts, Advances in Pattern Recognition, Springer-Verlag London Limited 2009, pp. 181-207. |
Dlagnekov, L. et al. “Detecting and Reading Text in Natural Scenes,” Oct. 2004, pp. 1-22. |
Elgammal, A.M. et al. “Techniques for Language Identification for Hybrid Arabic-English Document Images,” believed to be published in 2001 in Proceedings of IEEE 6th International Conference on Document Analysis and Recognition, pp. 1-5. |
Holmstrom, L. et al. “Neural and Statistical Classifiers—Taxonomy and Two Case Studies,” IEEE Transactions on Neural Networks, Jan. 1997, pp. 5-17, vol. 8 (1). |
Jain, A. K. et al. “Automatic Text Location in Images and Video Frames,” believed to be published in Proceedings of Fourteenth International Conference on Pattern Recognition, vol. 2, Aug. 1998, pp. 1497-1499. |
Machine Learning, retrieved from http://en.wikipedia.org/wiki/Machine_learning, May 7, 2012, pp. 1-8. |
Moving Average, retrieved from http://en.wikipedia.org/wiki/Moving_average, Jan. 23, 2013, pp. 1-5. |
Pardo, M. et al. “Learning From Data: A Tutorial With Emphasis on Modern Pattern Recognition Methods,” IEEE Sensors Journal, Jun. 2002, pp. 203-217, vol. 2 (3). |
Park, J-M. et al. “Fast Connected Component Labeling Algorithm Using a Divide and Conquer Technique,” believed to be published in Matrix (2000), vol. 4 (1), pp. 4-7, Publisher: Elsevier Ltd. |
Renold, M. “Detecting and Reading Text in Natural Scenes,” Master's Thesis, May 2008, pp. 1-59. |
Shin, H. et al. “Application of Floyd-Warshall Labelling Technique: Identification of Connected Pixel Components in Binary Image,” Kangweon-Kyungki Math. Jour. 14(2006), No. 1, pp. 47-55. |
Vedaldi, A. “An Implementation of Multi-Dimensional Maximally Stable Extremal Regions”, Feb. 7, 2007, pp. 1-7. |
VLFeat—Tutorials—MSER, retrieved from http://www.vlfeat.org/overview/mser.html, Apr. 30, 2012, pp. 1-2. |
“4.1 Points and patches” In: Szeliski Richard: “Computer Vision—Algorithms and Applications”, 2011, Springer-Verlag, London, XP002696110, p. 195, ISBN: 978-1-84882-934-3. |
Agrawal M., et al., “2 Base Devanagari OCR System”, In: Govindaraju V., Srirangaraj S. (Eds.): “Guide to OCR for Indic Scripts: Document Recognition and Retrieval”, 2009, Springer Science+Business Media, London, XP002696109, pp. 184-193, ISBN: 978-1-84888-329-3. |
Chowdhury A.R., et al., “Text Detection of Two Major Indian Scripts in Natural Scene Images”, Sep. 22, 2011 (Sep. 2, 2011), Camera-Based Document Analysis and Recognition, Springer Berlin Heidelberg, pp. 42-57, XP019175802, ISBN: 978-3-642-29363-4. |
Ghoshal R., et al., “Headline Based Text Extraction from Outdoor Images”, 4th International Conference on Pattern Recognition and Machine Intelligence, Springer LNCS, vol. 6744, Jun. 27, 2011, pp. 446-451, XP055060285. |
Papandreou A. et al., “A Novel Skew Detection Technique Based on Vertical Projections”, International Conference on Document Analysis and Recognition, Sep. 18, 2011, pp. 384-388, XP055062043, DOI: 10.1109/ICDAR.2011.85, ISBN: 978-1-45-771350-7. |
Setlur, et al., “Creation of data resources and design of an evaluation test bed for Devanagari script recognition”, Research Issues in Data Engineering: Multi-lingual Information Management, RIDE-MLIM 2003. Proceedings. 13th International Workshop, 2003, pp. 55-61. |
Chaudhuri B., Ed., “Digital Document Processing: Major Directions and Recent Advances”, 2007, Springer-Verlag London Limited, XP002715747, ISBN: 978-1-84628-501-1, pp. 103-106, p. 106, section “5.3.5 Zone Separation and Character Segmentation”, paragraph 1. |
Chaudhuri B.B., et al., “An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)”, Proceedings of the 4th International Conference on Document Analysis and Recognition (ICDAR), Ulm, Germany, Aug. 18-20, 1997; [Proceedings of the ICDAR], Los Alamitos, IEEE Comp. Soc, US, vol. 2, Aug. 18, 1997, pp. 1011-1015, XP010244882, DOI: 10.1109/ICDAR.1997.620662, ISBN: 978-0-8186-7898-1, the whole document. |
Chaudhury S (Eds.): OCR Technical Report for the project “Development of Robust Document Analysis and Recognition System for Printed Indian Scripts”, 2008, pp. 149-153, XP002712777, Retrieved from the Internet: URL:http://researchweb.iiit.ac.inj-jinesh/ocrDesignDoc.pdf [retrieved on Sep. 5, 2013]. |
Chen Y.L., “A knowledge-based approach for textual information extraction from mixed text/graphics complex document images”, Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on, IEEE, Piscataway, NJ, USA, Oct. 10, 2010, pp. 3270-3277, XP031806156, ISBN: 978-1-4244-6586-6. |
Dalal N., et al., “Histograms of oriented gradients for human detection”, Computer Vision and Pattern Recognition, 2005 IEEE Computer Society Conference on, IEEE, Piscataway, NJ, USA, Jun. 25, 2005, pp. 886-893 vol. 1, XP031330347, ISBN: 978-0-7695-2372-9 Section 6.3. |
Forssen P.E., et al., “Shape Descriptors for Maximally Stable Extremal Regions”, Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, IEEE, PI, Oct. 1, 2007, pp. 1-8, XP031194514, ISBN: 978-1-4244-1630-1, abstract; Section 2, Multi-resolution MSER. |
International Search Report and Written Opinion—PCT/US2013/049499—ISA/EPO—Nov. 6, 2013. |
Minoru M., Ed., “Character Recognition”, Aug. 2010, Sciyo, XP002715748, ISBN: 978-953-307-105-3 pp. 91-95, p. 92, section “7.3 Baseline Detection Process”. |
Pal U., et al., “Multi-skew detection of Indian script documents”, Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on, Seattle, WA, USA, Sep. 10-13, 2001, Los Alamitos, CA, USA, IEEE Comput. Soc., US, Sep. 10, 2001, pp. 292-296, XP010560519, DOI: 10.1109/ICDAR.2001.953801, ISBN: 978-0-7695-1263-1. |
Pal U., et al., “OCR in Bangla: an Indo-Bangladeshi language”, Pattern Recognition, 1994. Vol. 2 - Conference B: Computer Vision & Image Processing, Proceedings of the 12th IAPR International Conference on, Jerusalem, Israel, Oct. 9-13, 1994, Los Alamitos, CA, USA, IEEE Comput. Soc, vol. 2, Oct. 9, 1994, pp. 269-273, XP010216292, DOI: 10.1109/ICPR.1994.576917, ISBN: 978-0-8186-6270-6, the whole document. |
Premaratne H.L., et al., “Lexicon and hidden Markov model-based optimisation of the recognised Sinhala script”, Pattern Recognition Letters, Elsevier, Amsterdam, NL, vol. 27, No. 6, Apr. 15, 2006, pp. 696-705, XP027922538, ISSN: 0167-8655. |
Ray A.K., et al., “Information Technology: Principles and Applications”, 2004, Prentice-Hall of India Private Limited, New Delhi, XP002712579, ISBN: 81-203-2184-7, pp. 529-531. |
Senda S., et al., “Fast String Searching in a Character Lattice,” IEICE Transactions on Information and Systems, Information & Systems Society, Tokyo, JP, vol. E77-D, No. 7, Jul. 1, 1994, pp. 846-851, XP000445299, ISSN: 0916-8532. |
Senk V., et al., “A new bidirectional algorithm for decoding trellis codes,” EUROCON'2001, Trends in Communications, International Conference on, Jul. 4-7, 2001, Piscataway, NJ, USA, IEEE, Jul. 4, 2001, pp. 34-36, vol. I, XP032155513, DOI: 10.1109/EURCON.2001.937757, ISBN: 978-0-7803-6490-5. |
Sinha R.M.K., et al., “On Devanagari document processing”, Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century, IEEE International Conference on, Vancouver, BC, Canada, Oct. 22-25, 1995, New York, NY, USA, IEEE, US, vol. 2, Oct. 22, 1995, pp. 1621-1626, XP010194509, DOI: 10.1109/ICSMC.1995.538004, ISBN: 978-0-7803-2559-3, the whole document. |
Song Y., et al., “A Handwritten Character Extraction Algorithm for Multi-language Document Image”, 2011 International Conference on Document Analysis and Recognition, Sep. 18, 2011, pp. 93-98, XP055068675, DOI: 10.1109/ICDAR.2011.28 ISBN: 978-1-45-771350-7. |
Uchida S et al., “Skew Estimation by Instances”, 2008 The Eighth IAPR International Workshop on Document Analysis Systems, Sep. 1, 2008, pp. 201-208, XP055078375, DOI: 10.1109/DAS.2008.22, ISBN: 978-0-76-953337-7. |
Unser M., “Sum and Difference Histograms for Texture Classification”, Transactions on Pattern Analysis and Machine Intelligence, IEEE, Piscataway, USA, vol. 30, No. 1, Jan. 1, 1986, pp. 118-125, XP011242912, ISSN: 0162-8828 section A; p. 122, right-hand column p. 123. |
Wu V., et al., “TextFinder: An Automatic System to Detect and Recognize Text in Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, No. 11, Nov. 1, 1999, pp. 1224-1229, XP055068381. |
Publication No. | Date | Country |
---|---|---|
20140023271 A1 | Jan 2014 | US |
Provisional Application No. | Date | Country |
---|---|---|
61/674,846 | Jul 2012 | US |
61/673,700 | Jul 2012 | US |