This disclosure relates generally to computing systems, and, more particularly, to methods and apparatus to detect a text region of interest in a digital image using machine-based analysis.
Image recognition involves computer-aided techniques to analyze pictures or photographs to determine and/or identify the content of the captured scene (e.g., the recognition of the general subject matter of the scene and/or the recognition of individual objects within the scene). Such techniques are useful in different applications across different industries. For example, retail establishments, product manufacturers, and other business establishments may take advantage of image recognition techniques applied to photographs of such establishments (e.g., pictures of product shelving) to identify quantities and/or types of products in inventory, to identify shelves that need to be restocked and/or the frequency with which products need restocking, to recognize and read product barcodes or textual information about the product, to assess product arrangements and displays, etc.
The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
Descriptors “first,” “second,” “third,” etc. are used herein when identifying multiple elements or components which may be referred to separately. Unless otherwise specified or understood based on their context of use, such descriptors are not intended to impute any meaning of priority, physical order or arrangement in a list, or ordering in time but are merely used as labels for referring to multiple elements or components separately for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for ease of referencing multiple elements or components.
Examples disclosed herein employ computer vision and machine-based deep learning to detect the context in which text is located (e.g., a text region of interest) in images. To identify locations of text regions of interest based on context of text, examples disclosed herein employ a CNN that is trained based on deep learning techniques to discern between different contexts in which text appears in an image. A CNN is a deep learning network relying on previously analyzed (e.g., training) images to analyze new images. For example, if an element of interest to be analyzed and/or detected is a product logo, a CNN may be trained using a plurality of images including the product logo to understand the significant elements of the logo (e.g., the shape, color, etc.) so that the CNN can detect, with a certain probability, that the logo appears in an image. CNNs typically perform such analysis using a pixel-by-pixel comparison algorithm. For example, a CNN may perform such analysis by extracting visual features from the image. However, text recognition performance of CNNs is substantially lower than their visual feature recognition performance due to the similarity of the visual features of text across different regions. To overcome the poor text recognition performance of CNNs and leverage their strengths in visual feature recognition performance, examples disclosed herein pre-process text in images to generate color-coded text-map images in which different color shadings are used to generate color-coded visual depictions of locations of text in an image. These color-coded text-maps operate as proxies for corresponding text when CNNs analyze the color-coded text-map images based on visual feature analysis.
In examples disclosed herein, CNN-based deep learning is used to analyze images that include text-based information or descriptions and identify text regions of interest by discerning such text regions of interest from other text regions not of interest. Techniques disclosed herein are useful in many areas including analyzing images having high densities of text that cannot be parsed, discerned, or identified based on text characteristics with a suitable accuracy by CNNs using prior techniques. In examples disclosed herein, color-coding or color-shading locations of text in text-maps facilitates visually perceiving high-density text in an image as, for example, paragraphs of text, tables of text, groupings of text in relatively small-sized fonts compared to the image as a whole, etc.
In some examples disclosed herein, a source image with different text regions is analyzed by generating text data from the source image, the source image including a first text region of interest and a second text region not of interest; generating a plurality of color-coded text-map images, the plurality of color-coded text-map images including color-coded segments with different colors, the color-coded segments corresponding to different text characteristics; and determining a first location in the source image as more likely to be the first text region of interest than a second location in the source image corresponding to the second text region that is not of interest based on performing a CNN analysis on the source image and the plurality of color-coded text-map images.
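For illustration only, the following minimal Python sketch outlines that flow; the helper names (ocr_engine, text_map_builder) and the Keras-style predict call are assumptions made for the sketch and are not elements of the disclosed examples.

```python
import numpy as np

def locate_text_region_of_interest(source_image, ocr_engine, text_map_builder, cnn_model):
    """Sketch of the disclosed flow: OCR text data -> color-coded text-maps -> CNN analysis."""
    # Generate text data (words and bounding boxes) from the source image.
    ocr_results = ocr_engine(source_image)  # e.g., [("fiber", (x, y, w, h)), ...]

    # Generate a plurality of color-coded text-map images, one per text characteristic,
    # each assumed to be returned as an HxWx3 array the same size as the source image.
    text_maps = text_map_builder(source_image.shape[:2], ocr_results)

    # Provide the source image and the text-maps to the CNN as stacked input channels.
    cnn_input = np.concatenate([source_image] + list(text_maps), axis=-1)[np.newaxis, ...]

    # The CNN output indicates which location is more likely to be the text region of interest.
    return cnn_model.predict(cnn_input)
```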
As used herein, a text characteristic is defined as an aspect or trait of text characters and/or words. For example, a text characteristic may be whether the text is punctuation, whether the text is numeric, whether the text appears more than a threshold number of times, whether the text matches a dictionary of known words, or any other suitable characteristic that can be measured. As used herein, text context or context of text is defined as the underlying setting that denotes the purpose or intent for which text appears on an image. For example, the text context or context of text may represent that text is in a text region to represent an ingredients list section on a food product label, that text is in a text region to represent a nutrition facts table on a food product label, that text is in a text region to identify artistic performers on an admissions ticket, that text is in a text region to represent a store address on a sales receipt, etc.
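A minimal sketch of such measurable text characteristics, assuming word strings have already been extracted by OCR; the predicate names and the example dictionary below are illustrative assumptions only.

```python
import string

KNOWN_WORDS = {"fiber", "water", "sugar", "salt"}  # illustrative dictionary of known words

def is_punctuation(token):
    # Text characteristic: the text is punctuation.
    return len(token) > 0 and all(ch in string.punctuation for ch in token)

def is_numeric(token):
    # Text characteristic: the text is numeric (e.g., "12" or "3.5").
    return token.replace(".", "", 1).isdigit()

def matches_dictionary(token, dictionary=KNOWN_WORDS):
    # Text characteristic: the text matches a dictionary of known words.
    return token.lower().strip(string.punctuation) in dictionary

def satisfies_occurrence_threshold(token, all_tokens, threshold=3):
    # Text characteristic: the text appears at least a threshold number of times in the image.
    return all_tokens.count(token) >= threshold
```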
As used herein, a text region of interest is defined as a region of text in an image that corresponds to a text context or context of text specified in a user input as a query or request for locating in an input image. For example, a user may specify in a configuration file or in an input parameter that an image analysis process should identify a text region of interest as a location of an ingredients list or a location of a nutrition facts table in an image of a food product label. Alternatively, if an example image is a sales receipt, the text region of interest may be a location of a product price or a location of a store address. In yet another example, if the input image is a product webpage for an online retailer, the text region of interest may be a location of a department list or a location of a clearance section. In examples disclosed herein, a CNN discerns between a text region of interest and other text regions that are not of interest in an input image. As used herein, text regions not of interest are regions of text in an input image that are not commensurate with the text context or context of text identified in the user input.
In examples disclosed herein, separate color-coded text-maps are generated using separate colors corresponding to different measured text characteristics. In examples disclosed herein, images of the color-coded text-maps are provided as input to a CNN to identify text context or context of text, and locate text regions of interest in a subject image.
In examples disclosed herein, color-coded text-maps represent locations of text characters based on text characteristics. Example color-coded text-maps disclosed herein are visual representations in which color highlighting, color shading, or color chips are placed at locations corresponding to text characters and/or words using color values (e.g., red, green, blue, magenta, cyan, yellow, etc.) depending on the relevance of these text characters and/or words to the text characteristics corresponding to those colors. For example, extracted text of interest matching a predetermined set of words (e.g., a dictionary containing known words or phrases that are likely to be in the requested text context such as the keyword fiber in the text context of ingredients lists) may be colored and/or highlighted with a first color. In a similar example, extracted text of interest satisfying a numerical threshold (e.g., numerical text less than 100) may be colored and/or highlighted with a second color. In yet another example, text or words appearing in an image a number of times satisfying (e.g., greater than or equal to) an occurrence ratio threshold or an occurrence threshold may be colored and/or highlighted with a third color. In examples disclosed herein, images of the color-coded text-maps are utilized as inputs to a CNN. In examples disclosed herein, the color-coding generated for a word or text is a color highlighting, color bar, color shading, or color chip that covers the footprint or area occupied by the corresponding word or text. In this manner, a text-map becomes a visually perceptible map of locations of words or text relative to one another within the boundary limits of an image.
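A minimal sketch of that color-coding step, assuming the Pillow imaging library and OCR-provided word bounding boxes; the color-to-characteristic pairings shown are illustrative assumptions rather than disclosed requirements.

```python
from PIL import Image, ImageDraw

# Illustrative pairings of colors with text characteristics (assumptions for this sketch).
CHARACTERISTIC_COLORS = {
    "dictionary_match": (255, 0, 0),       # first color for text matching known words
    "numeric_threshold": (0, 255, 0),      # second color for numerical text satisfying a threshold
    "occurrence_threshold": (0, 0, 255),   # third color for frequently occurring text
}

def render_text_map(image_size, word_boxes, predicate, color):
    """Place a color chip over the footprint of every word satisfying one text characteristic."""
    text_map = Image.new("RGB", image_size, (0, 0, 0))  # blank canvas with the image's bounds
    draw = ImageDraw.Draw(text_map)
    for word, (x, y, w, h) in word_boxes:
        if predicate(word):
            draw.rectangle([x, y, x + w, y + h], fill=color)
    return text_map
```

In a sketch of this kind, one text-map image is rendered per characteristic, so the set of rendered images can be supplied to the CNN as separate input channels.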
In the illustrated example of
In the example of
In the example of
In the example of
In the example of
In
In the example of
The example CNN 110 is trained during a training phase to detect a particular type of text region of interest. Based on such training, the CNN 110 analyzes the color-coded-component inputs of the input images 201, 203 to detect features located therein, and generate probability outputs indicative of likelihoods that different corresponding text regions of the nutritional image 201 are a text region of interest. In the example of
In the example of
In
In the example illustrated in
In the example of
After the OCR analysis on the image 402 (e.g., the recognition of text in the OCR-processed image of interest 404), the example second text-maps 502 (
In the example of
In the example of
In the example of
In the example of
In examples disclosed herein, the text-to-color filter 606 determines which extracted text of the image 102 satisfies different ones of the text characteristics (e.g., matches, satisfies a threshold, etc.). For example, a text characteristic may be punctuation such that any punctuation text satisfies the text characteristic. In such examples, the text-to-color filter 606 may determine the locations of all punctuation text on the image 102 and provide the locations to the color-coding generator 608 in association with the text characteristic. Furthermore, the text-to-color filter 606 may determine whether the extracted text on the image 102 satisfies a second text characteristic. For example, the second text characteristic may specify that text must match words in a dictionary. In the example of
In the example of
In the example of
In addition, the color-coding generator 608 can color-code using multiple levels of intensity. Such different color intensity levels can be based on how often particular text is known to appear in a particular context across different items relative to other text. For example, both water and apples may be in a dictionary of the text database 609 for the context of ingredients lists. However, the term water may be marked with a higher-intensity color shading than the term apples although both are marked with the same color. In such an example, the reason for the higher-intensity shading for water is that water is known to occur more often across ingredients lists than apples.
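A minimal sketch of such intensity scaling, assuming a hypothetical occurrence-rate lookup derived from how often each dictionary term appears across ingredients lists.

```python
def shade_intensity(term, occurrence_rates, base_color):
    """Scale a color's intensity by how often the term is known to appear in the target context."""
    rate = occurrence_rates.get(term.lower(), 0.0)  # e.g., {"water": 0.9, "apples": 0.3}
    # Same hue for both terms; more frequently occurring terms receive higher-intensity shading.
    return tuple(int(channel * (0.5 + 0.5 * rate)) for channel in base_color)

# Example: "water" (rate 0.9) yields a brighter red chip than "apples" (rate 0.3).
# shade_intensity("water", {"water": 0.9, "apples": 0.3}, (255, 0, 0))   -> (242, 0, 0)
# shade_intensity("apples", {"water": 0.9, "apples": 0.3}, (255, 0, 0))  -> (165, 0, 0)
```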
In the example of
In
In the example of
In the example of
In the example illustrated in
In the example of
Additionally, the text region of interest 801 represents the prediction region located in the second image 804, along with an example prediction performance. In this example, the prediction performance is 0.91, or 91 percent. Illustrated in Table 1 below, the prediction performance when utilizing the text-map generator 108 with the CNN 110 of
In Table 1 above, the Precision represents a performance rate that is the relationship between true positives and the sum of true positives and false positives predicted by a CNN with respect to locations of text regions of interest, the Recall represents a rate that is the relationship between true positives and the sum of true positives and false negatives predicted by a CNN with respect to locations of text regions of interest, and Accuracy represents the overall performance (e.g., the relationship between true positives and the sum of true positives, false positives, and false negatives) of the CNN. As shown in Table 1 above, across two contexts of text (e.g., ingredients and nutritional facts), the CNN with text-maps (e.g., the CNN 110 utilizing the text-map generator 108) is more accurate than a CNN without text-maps.
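For clarity, the rates described above reduce to the following relationships among true-positive (tp), false-positive (fp), and false-negative (fn) counts; this sketch merely restates the definitions above and introduces no additional data.

```python
def precision(tp, fp):
    # Relationship between true positives and the sum of true positives and false positives.
    return tp / (tp + fp)

def recall(tp, fn):
    # Relationship between true positives and the sum of true positives and false negatives.
    return tp / (tp + fn)

def accuracy(tp, fp, fn):
    # Overall performance: true positives relative to the sum of true positives,
    # false positives, and false negatives.
    return tp / (tp + fp + fn)
```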
While an example manner of implementing the text-map generator 108 of
A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the text-map generator 108 of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a packaged format, etc. Machine readable instructions as described herein may be stored as data (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, etc. in order to make them directly readable and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement a program such as that described herein. In another example, the machine readable instructions may be stored in a state in which they may be read by a computer, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, the disclosed machine readable instructions and/or corresponding program(s) are intended to encompass such machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
In response, the text-map generator 108 of
In the example of
At block 1120, the text-to-color filter 606 and the color-coding generator 608 of
The text-to-color filter 606 selects a corresponding color (block 1220). For example, for each selected text characteristic, the text-to-color filter 606 pairs an individual color. The text-to-color filter 606 determines text on the image that satisfies the text characteristic (block 1230). If the text-to-color filter 606 determines text that satisfies the text characteristic, the text-to-color filter 606 generates text location information of the identified text (block 1240). The color-coding generator 608 generates a color-coded text-map using color (e.g., the color selected in block 1220) to highlight the text satisfying the text characteristic (block 1250) based on the text location information from the text-to-color filter 606. For example, to execute block 1250, the color-coding generator 608 may generate a plurality of color-coded text-map images, the plurality of color-coded text-map images including color-coded segments with different colors, the color-coded segments corresponding to text having different text characteristics. If the color-coding generator 608 determines text in the image does not satisfy the text characteristic, or after creating the color-coded text-map at block 1250, the color-coding generator 608 determines whether another text characteristic is to be analyzed (block 1260).
If the color-coding generator 608 determines at block 1260 that another text characteristic is to be analyzed, then control returns to block 1210. Alternatively, if the color-coding generator 608 determines at block 1260 there is not another text characteristic to be analyzed, the color-coding generator 608 stores the generated color-coded text-map(s) in memory (block 1270). Control returns to a calling function or process such as the process implemented by the instructions represented by
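A condensed sketch of the loop represented by blocks 1210 through 1270, reusing the render_text_map sketch shown earlier; the (predicate, color) pairing structure is an assumption made for illustration, not a disclosed data format.

```python
def generate_color_coded_text_maps(image_size, word_boxes, characteristic_color_pairs):
    """Produce one color-coded text-map image per selected text characteristic."""
    text_maps = []
    for predicate, color in characteristic_color_pairs:   # blocks 1210/1220: characteristic and color
        matches = [(word, box) for word, box in word_boxes if predicate(word)]  # block 1230
        if matches:                                        # block 1240: text location information
            # block 1250: highlight the text satisfying the characteristic with the selected color.
            text_maps.append(render_text_map(image_size, matches, lambda _: True, color))
    return text_maps                                       # blocks 1260/1270: loop, then store results
```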
The processor platform 1300 of the illustrated example includes a processor 1312. The processor 1312 of the illustrated example is hardware. For example, the processor 1312 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example image interface 602, the example OCR text detector 604, the example text-to-color filter 606, the example color-coding generator 608, the example data interface 610, and/or, more generally, the example text-map generator 108 of
The processor 1312 of the illustrated example includes a local memory 1313 (e.g., a cache). The processor 1312 of the illustrated example is in communication with a main memory including a volatile memory 1314 and a non-volatile memory 1316 via a bus 1318. The volatile memory 1314 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1316 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1314, 1316 is controlled by a memory controller.
The processor platform 1300 of the illustrated example also includes an interface circuit 1320. The interface circuit 1320 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 1322 are connected to the interface circuit 1320. The input device(s) 1322 permit(s) a user to enter data and/or commands into the processor 1312. The input device(s) can be implemented by, for example, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, and/or an isopoint system.
One or more output devices 1324 are also connected to the interface circuit 1320 of the illustrated example. The output devices 1324 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or a speaker. The interface circuit 1320 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 1320 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1326. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 1300 of the illustrated example also includes one or more mass storage devices 1328 for storing software and/or data. Examples of such mass storage devices 1328 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
Machine executable instructions 1332 represented by the flowcharts of
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that improve a computer's accuracy in predicting text regions of interest in images including text characters and/or words using a convolutional neural network. The disclosed methods, apparatus and articles of manufacture increase the efficiency and accuracy of a computing device in detecting context of text by utilizing a plurality of color-coded text-maps generated by a text-map generator in combination with a convolutional neural network. The example disclosed methods, apparatus and articles of manufacture improve the efficiency of using a computing device by automatically identifying data relating to context of textual information in images. The example disclosed methods, apparatus and articles of manufacture are accordingly directed to one or more improvement(s) in the functioning of a computer.
Example methods, apparatus, systems, and articles of manufacture to detect a text region of interest in a digital image using machine-based analysis are disclosed herein. Further examples and combinations thereof include the following:
Example 1 includes an apparatus to analyze characteristics of text of interest, the apparatus comprising a text detector to provide text data from a first image, the first image including a first text region of interest and a second text region not of interest, a color-coding generator to generate a plurality of color-coded text-map images, the plurality of color-coded text-map images including color-coded segments with different colors, the color-coded segments corresponding to different text characteristics, and a convolutional neural network (CNN) to determine a first location in the first image as more likely to be the first text region of interest than a second location in the first image corresponding to the second text region that is not of interest based on performing a CNN analysis on the first image and the plurality of color-coded text-map images.
Example 2 includes the apparatus of example 1, wherein the plurality of color-coded text-map images includes a first color-coded text-map image and a second color-coded text-map image, the first color-coded text-map image including first color-coded segments of a first color, and the second color-coded text-map image including second color-coded segments of a second color.
Example 3 includes the apparatus of example 2, wherein the first color-coded segments correspond to a first text characteristic, and the second color-coded segments correspond to a second text characteristic.
Example 4 includes the apparatus of example 3, wherein the first color is different than the second color.
Example 5 includes the apparatus of example 1, wherein the CNN analysis identifies the second text region that is not of interest as separate from the first text region of interest when a same keyword appears in both the first text region of interest and the second text region that is not of interest.
Example 6 includes the apparatus of example 1, further including an interface to provide the plurality of color-coded text-map images to the CNN via a plurality of corresponding input channels of the CNN.
Example 7 includes the apparatus of example 1, wherein the first image is at least one of a food product label, a non-food product label, a sales receipt, a webpage, or a ticket.
Example 8 includes the apparatus of example 1, wherein the first text region of interest includes at least one of a nutrition facts table, a list of ingredients, a product description, candidate persons, numerical dates, or percentages.
Example 9 includes a non-transitory computer readable medium comprising computer readable instructions which, when executed, cause at least one processor to at least generate text data from a first image, the first image including a first text region of interest and a second text region not of interest, generate a plurality of color-coded text-map images, the plurality of color-coded text-map images including color-coded segments with different colors, the color-coded segments corresponding to different text characteristics, and determine a first location in the first image as more likely to be the first text region of interest than a second location in the first image corresponding to the second text region that is not of interest based on performing a CNN analysis on the first image and the plurality of color-coded text-map images.
Example 10 includes the computer readable medium of example 9, wherein the plurality of color-coded text-map images includes a first color-coded text-map image and a second color-coded text-map image, the first color-coded text-map image including first color-coded segments of a first color, and the second color-coded text-map image including second color-coded segments of a second color.
Example 11 includes the computer readable medium of example 10, wherein the first color-coded segments correspond to a first text characteristic, and the second color-coded segments correspond to a second text characteristic.
Example 12 includes the computer readable medium of example 11, wherein the first color is different than the second color.
Example 13 includes the computer readable medium of example 9, further including the at least one processor to identify the second text region that is not of interest as separate from the first text region of interest when a same keyword appears in both the first text region of interest and the second text region that is not of interest.
Example 14 includes the computer readable medium of example 9, further including the at least one processor to provide the plurality of color-coded text-map images to a CNN via a plurality of corresponding input channels of the CNN.
Example 15 includes the computer readable medium of example 9, wherein the first image is at least one of a food product label, a non-food product label, a sales receipt, a webpage, or a ticket.
Example 16 includes the computer readable medium of example 9, wherein the first text region of interest includes at least one of a nutrition facts table, a list of ingredients, a product description, candidate persons, numerical dates, or percentages.
Example 17 includes a method to analyze characteristics of text of interest, the method comprising generating text data from a first image, the first image including a first text region of interest and a second text region not of interest, generating a plurality of color-coded text-map images, the plurality of color-coded text-map images including color-coded segments with different colors, the color-coded segments corresponding to different text characteristics, and determining a first location in the first image as more likely to be the first text region of interest than a second location in the first image corresponding to the second text region that is not of interest based on performing a CNN analysis on the first image and the plurality of color-coded text-map images.
Example 18 includes the method of example 17, wherein the plurality of color-coded text-map images includes a first color-coded text-map image and a second color-coded text-map image, the first color-coded text-map image including first color-coded segments of a first color, and the second color-coded text-map image including second color-coded segments of a second color.
Example 19 includes the method of example 18, wherein the first color-coded segments correspond to a first text characteristic, and the second color-coded segments correspond to a second text characteristic.
Example 20 includes the method of example 19, wherein the first color is different than the second color.
Example 21 includes the method of example 17, further including identifying the second text region that is not of interest as separate from the first text region of interest when a same keyword appears in both the first text region of interest and the second text region that is not of interest.
Example 22 includes the method of example 17, further including providing the plurality of color-coded text-map images to the CNN via a plurality of corresponding input channels of the CNN.
Example 23 includes the method of example 17, wherein the first image is at least one of a food product label, a non-food product label, a sales receipt, a webpage, or a ticket.
Example 24 includes the method of example 17, wherein the first text region of interest includes at least one of a nutrition facts table, a list of ingredients, a product description, candidate persons, numerical dates, or percentages.
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2019/000299 | 3/28/2019 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2020/194004 | 10/1/2020 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
3323135 | Miller | Jun 1967 | A |
5410611 | Huttenlocher et al. | Apr 1995 | A |
5606690 | Hunter et al. | Feb 1997 | A |
7454063 | Kneisl et al. | Nov 2008 | B1 |
7792709 | Trandal et al. | Sep 2010 | B1 |
8285047 | Nagarajan et al. | Oct 2012 | B2 |
8494281 | Nagarajan | Jul 2013 | B2 |
8787695 | Wu | Jul 2014 | B2 |
8792141 | Moore et al. | Jul 2014 | B2 |
8983170 | Nepomniachtchi et al. | Mar 2015 | B2 |
9014432 | Fan et al. | Apr 2015 | B2 |
9158744 | Rao et al. | Oct 2015 | B2 |
9239952 | Hsu | Jan 2016 | B2 |
9262686 | Singer | Feb 2016 | B1 |
9290022 | Makabe | Mar 2016 | B2 |
9298685 | Barrus | Mar 2016 | B2 |
9298979 | Nepomniachtchi et al. | Mar 2016 | B2 |
9323135 | Veloso | Apr 2016 | B1 |
9324073 | Nepomniachtchi et al. | Apr 2016 | B2 |
9384389 | Sankaranarayanan | Jul 2016 | B1 |
9384839 | Avila et al. | Jul 2016 | B2 |
9396540 | Sampson | Jul 2016 | B1 |
9684842 | Deng | Jun 2017 | B2 |
9710702 | Nepomniachtchi et al. | Jul 2017 | B2 |
9747504 | Ma et al. | Aug 2017 | B2 |
9760786 | Sahagun et al. | Sep 2017 | B2 |
9824270 | Mao | Nov 2017 | B1 |
9875385 | Humphreys | Jan 2018 | B1 |
10032072 | Tran | Jul 2018 | B1 |
10157425 | Chelst et al. | Dec 2018 | B2 |
10235585 | Deng | Mar 2019 | B2 |
10242285 | Thrasher et al. | Mar 2019 | B2 |
10395772 | Lucas et al. | Aug 2019 | B1 |
10679283 | Pesce | Jun 2020 | B1 |
11257049 | Durazo Almeida | Feb 2022 | B1 |
11321956 | Geng | May 2022 | B1 |
11410446 | Shanmuganathan et al. | Aug 2022 | B2 |
11414053 | Tanaami et al. | Aug 2022 | B2 |
11468491 | Dalal | Oct 2022 | B2 |
11476981 | Wei et al. | Oct 2022 | B2 |
11562557 | Miginnis et al. | Jan 2023 | B2 |
11587148 | Elder | Feb 2023 | B2 |
11593552 | Sarkar | Feb 2023 | B2 |
11609956 | Jain | Mar 2023 | B2 |
11625930 | Rodriguez et al. | Apr 2023 | B2 |
11810383 | Patel et al. | Nov 2023 | B2 |
11842035 | Jahjah et al. | Dec 2023 | B2 |
20020037097 | Hoyos et al. | Mar 2002 | A1 |
20030185448 | Seeger et al. | Oct 2003 | A1 |
20060232619 | Otsuka et al. | Oct 2006 | A1 |
20070041642 | Romanoff et al. | Feb 2007 | A1 |
20080205759 | Zandifar et al. | Aug 2008 | A1 |
20090164422 | Pacella | Jun 2009 | A1 |
20100306080 | Trandal et al. | Dec 2010 | A1 |
20110122443 | Otsuka et al. | May 2011 | A1 |
20110243445 | Uzelac et al. | Oct 2011 | A1 |
20110289395 | Breuel et al. | Nov 2011 | A1 |
20110311145 | Bern et al. | Dec 2011 | A1 |
20120183211 | Hsu et al. | Jul 2012 | A1 |
20120274953 | Makabe | Nov 2012 | A1 |
20120330971 | Thomas et al. | Dec 2012 | A1 |
20130058575 | Koo et al. | Mar 2013 | A1 |
20130170741 | Hsu et al. | Jul 2013 | A9 |
20140002868 | Landa et al. | Jan 2014 | A1 |
20140064618 | Janssen, Jr. | Mar 2014 | A1 |
20140188647 | Argue | Jul 2014 | A1 |
20140195891 | Venkata Radha Krishna Rao et al. | Jul 2014 | A1 |
20150039479 | Gotanda | Feb 2015 | A1 |
20150127428 | Gharachorloo | May 2015 | A1 |
20150169951 | Khintsitskiy et al. | Jun 2015 | A1 |
20150254778 | Kmak et al. | Sep 2015 | A1 |
20150317642 | Argue | Nov 2015 | A1 |
20150363792 | Arini | Dec 2015 | A1 |
20150363822 | Rowe | Dec 2015 | A1 |
20160005189 | Gray | Jan 2016 | A1 |
20160034863 | Ross | Feb 2016 | A1 |
20160063469 | Etzion | Mar 2016 | A1 |
20160125383 | Chan | May 2016 | A1 |
20160171585 | Singh | Jun 2016 | A1 |
20160203625 | Khan et al. | Jul 2016 | A1 |
20160210507 | Abdollahian | Jul 2016 | A1 |
20160234431 | Kraft et al. | Aug 2016 | A1 |
20160307059 | Chaudhury et al. | Oct 2016 | A1 |
20160342863 | Kwon | Nov 2016 | A1 |
20170293819 | Deng | Oct 2017 | A1 |
20180005345 | Apodaca et al. | Jan 2018 | A1 |
20180053045 | Lorenzini et al. | Feb 2018 | A1 |
20180060302 | Liang et al. | Mar 2018 | A1 |
20180317116 | Komissarov et al. | Nov 2018 | A1 |
20190026803 | De Guzman | Jan 2019 | A1 |
20190050639 | Ast | Feb 2019 | A1 |
20190080207 | Chang | Mar 2019 | A1 |
20190171900 | Thrasher et al. | Jun 2019 | A1 |
20190244020 | Yoshino et al. | Aug 2019 | A1 |
20190325211 | Ordonez et al. | Oct 2019 | A1 |
20190332662 | Middendorf et al. | Oct 2019 | A1 |
20190354818 | Reisswig et al. | Nov 2019 | A1 |
20200097718 | Schäfer | Mar 2020 | A1 |
20200142856 | Neelamana | May 2020 | A1 |
20200151444 | Price et al. | May 2020 | A1 |
20200151902 | Almazán | May 2020 | A1 |
20200175267 | Schäfer et al. | Jun 2020 | A1 |
20200249803 | Sobel et al. | Aug 2020 | A1 |
20200364451 | Ammar et al. | Nov 2020 | A1 |
20200401798 | Foncubierta Rodriguez et al. | Dec 2020 | A1 |
20200410231 | Chua et al. | Dec 2020 | A1 |
20210004880 | Benkreira et al. | Jan 2021 | A1 |
20210019287 | Prasad et al. | Jan 2021 | A1 |
20210034856 | Torres et al. | Feb 2021 | A1 |
20210090694 | Colley et al. | Mar 2021 | A1 |
20210117665 | Simantov et al. | Apr 2021 | A1 |
20210117668 | Zhong et al. | Apr 2021 | A1 |
20210142092 | Zhao et al. | May 2021 | A1 |
20210149926 | Komninos et al. | May 2021 | A1 |
20210158038 | Shanmuganathan et al. | May 2021 | A1 |
20210216765 | Xu | Jul 2021 | A1 |
20210248420 | Zhong et al. | Aug 2021 | A1 |
20210295101 | Tang et al. | Sep 2021 | A1 |
20210319217 | Wang et al. | Oct 2021 | A1 |
20210334737 | Balaji | Oct 2021 | A1 |
20210343030 | Sagonas et al. | Nov 2021 | A1 |
20210357710 | Zhang et al. | Nov 2021 | A1 |
20210406533 | Arroyo et al. | Dec 2021 | A1 |
20220004756 | Jennings | Jan 2022 | A1 |
20220114821 | Arroyo et al. | Apr 2022 | A1 |
20220189190 | Arroyo et al. | Jun 2022 | A1 |
20220198185 | Prebble | Jun 2022 | A1 |
20220383651 | Shanmuganathan et al. | Dec 2022 | A1 |
20220397809 | Talpade et al. | Dec 2022 | A1 |
20220414630 | Yebes Torres et al. | Dec 2022 | A1 |
20230004748 | Rodriguez et al. | Jan 2023 | A1 |
20230005286 | Yebes Torres et al. | Jan 2023 | A1 |
20230008198 | Gadde et al. | Jan 2023 | A1 |
20230196806 | Ramalingam et al. | Jun 2023 | A1 |
20230214899 | Martínez Cebrián et al. | Jul 2023 | A1 |
20230230408 | Arroyo et al. | Jul 2023 | A1 |
20230394859 | Montero et al. | Dec 2023 | A1 |
Number | Date | Country |
---|---|---|
2957433 | Jun 2020 | CA |
103123685 | May 2013 | CN |
104866849 | Aug 2015 | CN |
104866849 | Aug 2015 | CN |
108229397 | Jun 2018 | CN |
108829397 | Nov 2018 | CN |
109389124 | Feb 2019 | CN |
112446351 | Mar 2021 | CN |
112560862 | Mar 2021 | CN |
202013005144 | Oct 2013 | DE |
2595412 | Nov 2021 | GB |
H0749529 | Feb 1995 | JP |
2008211850 | Sep 2008 | JP |
2019139737 | Aug 2019 | JP |
7049529 | Apr 2022 | JP |
10-1831204 | Feb 2018 | KR |
101831204 | Feb 2018 | KR |
2013044145 | Mar 2013 | WO |
WO-2018054326 | Mar 2018 | WO |
2018201423 | Nov 2018 | WO |
2020194004 | Oct 2020 | WO |
2022006295 | Jan 2022 | WO |
2022123199 | Jun 2022 | WO |
Entry |
---|
International Searching Authority, “International Search Report,” mailed in connection with International Patent Application No. PCT/IB2019/000299, on Dec. 23, 2019, 3 pages. |
International Searching Authority, “Written Opinion,” mailed in connection with International Patent Application No. PCT/IB2019/000299, on Dec. 23, 2019, 4 pages. |
Bartz et al., “STN-OCT: A Single Neural Network for Text Detection and Text Recognition,” Computer Science, Jul. 27, 2017, 9 pages. |
Ivan Ozhiganov, et al. “Deep Dive Into OCR for Receipt Recognition,” DZone, Jun. 21, 2017, 19, pages. |
NielsenIQ Brandbank, “Nielsen Brandbank Product Library,” Online Available. Retrieved on Apr. 1, 2022. 5 pages. [retrieved from: https://www.brandbank.com/us/product-library/]. |
Github, “FIAT tool—Fast Image Data Annotation Tool,” Github.com, downloaded on Apr. 1, 2022, 30 pages, [retrieved from: https://github.com/christopher5106/FastAimotation Tooll. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s)Due,” issued in connection with U.S. Appl. No. 16/692,797, issued Apr. 5, 2022, 10 pages. |
United States Patent and Trademark Office, “Corrected Notice of Allowability,” issued in connection with U.S. Appl. No. 16/692,797, dated Apr. 22, 2022, 3 pages. |
International Searching Authority, “International Preliminary Report on Patentability,” mailed in connection with International Patent Application No. PCT/US2020/061269, on May 17, 2022, 5 pages. |
Gu et al.,“ XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding,” Conference on Computer Vision and Pattern Recognition ( CVPR), Jun. 18, 2022, 10 pages. |
European Patent Office, “Communication pursuant to Rules 161(2) and 162 EPC,” issuedin connection with Application No. 20891012.5, dated Jun. 29, 2022, 3 pages. |
Datasetlist, “A tool using OpenCV to annotate images for image classification, optical character reading, . . . ,” Datasetlist.com, dated Jul. 13, 2022, 30 pages. |
Villota et al. “Text Classification Models for Form Entity Linking”, arXiv, 2021, 10 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 17/345,940, dated Aug. 18, 2022, 8 pages. |
United States Patent and Trademark Office, “Non-Final Office Action” , issued in connection with U.S. Appl. No. 17/075,675, issued Sep. 22, 2022, 12 Pages. |
Huang et al., “LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking,” 30th ACM International Conference on Multimedia, Oct. 2022, 10 pages. |
Zhang et al.,“Multimodal Pre-training Based on Graph Attention Network for Document Understanding.” IEEE Transactions on Multimedia, vol. 25, Oct. 12, 2022, 13 pages. |
European Patent Office, “Extended European Search Report,” issued in connection with European Patent Application No. 19921870, dated Oct. 12, 2022, 11 pages. |
International Searching Authority, “International Search Report,” issued in connection with International Patent Application No. PCT/US2022/034570, mailed on Oct. 20, 2022, 3 pages. |
International Searching Authority, “Written Opinion,” issued in connection with International Patent Application No. PCT/US2022/034570, mailed on Oct. 20, 2022, 5 pages. |
Kim et al., “Donut: Document Understanding Transformer without OCR”, arXiv, dated 2021, 29 pages. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s) Due,” issued in connection with U.S. Appl. No. 17/364,419, dated Nov. 4, 2022, 10 pages. |
Canadian Patent Office, “Examiner's Report,” issued in connection with Canadian Patent Application No. 3,124,868, mailed on Nov. 10, 2022, 4 pages. |
United States Patent and Trademark Office, “Corrected Notice of Allowability”, issued in connection with U.S. Appl. No. 17/364,419, dated Nov. 15, 2022, 2 pages. |
Zhong et al., “Hierarchical Message-Passing Graph Neural Networks,” Data Mining and Knowledge Discovery, Nov. 17, 2022, 28 pages. |
European Patent Office, “European Search Report,” issued in connection with European patent appl. No. 22180113.7-1207, Nov. 22, 2022, 25 pages. |
Dwivedi et al., “Benchmarking Graph Neural Networks,” Journal of Machine Learning Research, Dec. 2022, 49 pages. |
European Patent Office, “Extended European Search Report,” issued in connection with European Patent Application No. 22184405.3, dated Dec. 2, 2022, 4 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 17/379,280, dated Dec. 2, 2022, 14 Pages. |
International Searching Authority, “International Preliminary Report on Patentability,” issued in connection with PCT No. PCT/US2021/039931, issued Dec. 13, 2022, 5 pages. |
United States Patent and Trademark Office, “Corrected Notice of Allowability,” issued in connection with U.S. Appl. No. 17/364,419, dated Jan. 4, 2023, 2 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 17/883,309, dated Jan. 20, 2023, 14 pages. |
United States Patent and Trademark Office, “Corrected Notice of Allowability”, issued in connection with U.S. Appl. No. 17/364,419, filed Feb. 15, 2023, 2 pages. |
United Kingdom Patent Office, “Examination Report under section 18(3),” in connection with Great Britain Patent Application No. 2112299.9, issued Feb. 17, 2023, 2 pages. |
United States Patent and Trademark Office, “Final Action” issued in U.S. Appl. No. 17/075,675, on Mar. 7, 2023 (11 pages). |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 17/345,940, dated Mar. 16, 2023, 13 pages. |
United States Patent and Trademark Office, “Final Office Action,” issued in connection with U.S. Appl. No. 17/379,280, dated May 5, 2023, 17 pages. |
United States Patent and Trademark Office, “Notice of Allowance,” issued in connection with U.S. Appl. No. 17/883,309, mailed on May 11, 2023, 9 pages. |
European Patent Office, “Extended European Search Report,” issued in connection with European Patent Application No. 22214553.4, dated May 17, 2023, 9 pages. |
United States Patent and Trademark Office, “Advisory Action,” issued in connection with U.S. Appl. No. 17/075,675, issued May 30, 2023, 3 pages. |
International Searching Authority, International Search Report, issued in connection with International Patent Application No. PCT/US2023/011859, mailed on Jun. 1, 2023, 3 pages. |
International Searching Authority, Written Opinion, issued in connection with International Patent Application No. PCT/US2023/011859, mailed on Jun. 1, 2023, 4 pages. |
United States Patent and Trademark Office, “Notice of Allowance,” issued in U.S. Appl. No. 17/075,675, mailed on Jun. 26, 2023, 8 pages. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s) Due,” issued in connection with U.S. Appl. No. 17/345,940, mailed on Jul. 7, 2023, 8 pages. |
United Kingdom Intellectual Property Office, “Intention to Grant under Section 18(4),” issued in connection with United Kingdom Patent Application No. 2112299.9, dated Jul. 13, 2023, 2 pages. |
United States Patent and Trademark Office, “Advisory Action,” issued in connection with U.S. Appl. No. 17/379,280, mailed on Jul. 18, 2023, 3 pages. |
Gopal et al., “What is Intelligent Document Processing?” Nano Net Technologies, URL: [https://nanonets.com/blog/intelligent-document-processing/], Jul. 19, 2023, 21 pages. |
United States Patent and Trademark Office, “Corrected Notice of Allowability,” issued in connection with U.S. Appl. No. 17/345,940, dated Jul. 20, 2023, 3 pages. |
Canadian Intellectual Property Office, “Examiner's Report,” issued in connection with Canadian Patent Application No. 3, 124,868, dated Aug. 10, 2023, 5 pages. |
United States Patent and Trademark Office, “Corrected Notice of Allowability,” issued in connection with U.S. Appl. No. 17/883,309, mailed on Aug. 17, 2023, 2 Pages. |
United Kingdom Intellectual Property Office, “Notification of Grant,” issued in connection with United Kingdom Patent Application No. 2112299.9, dated Aug. 29, 2023, 2 pages. |
Amazon, “Intelligent Document Processing,” Amazon Web Services, https://aws.amazon.com/machine-learning/ml-use-cases/document-processing/fintech/, retrieved on Sep. 8, 2023, 6 pages. |
United States Patent and Trademark Office, “Corrected Notice of Allowability,” issued in connection with U.S. Appl. No. 17/075,675, mailed on Oct. 10, 2023, 2 pages. |
United States Patent and Trademark Office, “Non-Final Office Action, ” issued in connection with U.S. Appl. No. 17/710,538, dated Oct. 26, 2023, 6 Pages. |
European Patent Office, “Extended European Search Report,” issued in connection with European Patent Application No. 20891012.5, dated Nov. 17, 2023, 12 pages. |
International Searching Authority, “International Preliminary Report on Patentability,” issued in connection with International Patent Application No. PCT/US2022/034570, issued on Jan. 4, 2024, 7 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 18/191,642, dated Feb. 7, 2024, 18 pages. |
Yadati et al., “HyperGCN: Hypergraph Convolutional Networks for Semi-Supervised Classification,” Proceedings of the 33rd International Conference on Neural Information Processing Systems, Dec. 2019, 18 pages. |
Github, “Tesseract OCR,” Tesseract Repository on GitHub, retrieved from: https://github. com/tesseract-ocr/, dated 2020, 3 pages. |
Carbonell et al., “Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents,” 2020 International Conference on Pattern Recognition (ICPR), Jan. 10, 2021, 6 pages. |
Zacharias et al., “Image Processing Based Scene-Text Detection and Recognition with Tesseract,” arXiv (CoRR), dated Apr. 17, 2020, 6 pages. |
Liu et al. “RoBERTa: A Robustly Optimized BERT Pretraining Approach.” ArXiv abs/1907.11692, Jul. 26, 2019, 13 pages. |
Xu et al., “LayoutLM: Pre-training of Text and Layout for Document Image Understanding,” in International Conference on Knowledge Discovery & Data Mining (SIGKDD), Jun. 16, 2020, 9 pages. [retrieved from: https://arxiv.org/pdf/1912.13318.pdf]. |
Dong et al. “HNHN: Hypergraph Networks with Hyperedge Neurons,” ArXiv abs/2006.12278, dated 2020, 11 pages. |
Yu et al., “PICK: Processing Key Information Extraction from Documents using Improved Graph Learning—Convolutional Networks,” in International Conference on Pattern Recognition (ICPR), dated Jul. 18, 2020, 8 pages. [retrieved from: https://arxiv.org/pdf/2004.07464.pdf]. |
Chen et al., “HGMF: Heterogeneous Graph-Based Fusion for Multimodal Data with Incompleteness,” Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, dated Aug. 20, 2020, 11 pages. |
Wang et al., “DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding,” 2020 Conference Empirical Methods in Natural Language Processing (EMNLP), Nov. 16, 2020, 11 pages. |
Zhu et al., “Heterogeneous Mini-Graph Neural Network and Its Application to Fraud Invitation Detection.” 2020 IEEE International Conference on Data Mining (ICDM), Nov. 17, 2020, 9 pages. |
Bandyopadhyay et al., “Hypergraph Attention Isomorphism Network by Learning Line Graph Expansion.” 2020 IEEE International Conference on Big Data (Big Data) (2020): 669-678, 10 pages. |
Arroyo et al., “Multi-label classification of promotions in digital leaflets using textual and visual information,” Proceedings of the Workshop on Natural Language Processing in E-Commerce (EComNLP), pp. 11-20, Barcelona, Spain (Online), Dec. 12, 2020, 10 pages. |
DeepDive, “Distant Supervision,” Online available on Stanford University website, retrieved on Apr. 1, 2022, 2 pages, [retrieved from: http://deepdive.stanford.edu/distant supervision]. |
Nguyen et al. “End-to-End Hierarchical Relation Extraction for Generic Form Understanding”, in International Conference on Pattern Recognition (ICPR), pp. 5238-5245, 2021, 8 pages. |
International Searching Authority, “International Search Report,” mailed in connection with International Patent Application No. PCT/US2020/061269, dated Mar. 11, 2021, 3 pages. |
International Searching Authority, “Written Opinion,” mailed in connection with International Patent Application No. PCT/US2020/061269, on Mar. 11, 2021,4 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 16/692,797, dated Mar. 16, 2021, 12 pages. |
Google, “Detect Text in Images,” Mar. 29, 2021, 16 pages. Retrieved from http://cloud.google.com/vision/docs/ocr. |
Xu et al. “LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding”, arXiv, 2021, 10 pages. |
Ma et al., “Graph Attention Networks with Positional Embeddings.” ArXiv abs/2105.04037, 2021, 13 pages. |
Chen et al., “TextPolar: irregular scene text detection using polar representation,” International Journal on Document Analysis and Recognition (IJDAR), May 23, 2021, 9 pages. |
Li et al. “SelfDoc: Self-Supervised Document Representation Learning.” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 20, 2021, 10 pages. |
Hwang et al., “Spatial Dependency Parsing for Semi-Structured Document Information Extraction,” in International Joint Conference on Natural Language Processing (IJCNLP), Jul. 1, 2021, 14 pages. [retrieved from: https://arxiv.org/pdf/2005.00642.pdf]. |
Li et al., “StructuralLM: Structural Pre-training for Form Understanding.” 59th Annual Meeting of the Association for Computational Linguistics, Aug. 2021, 10 pages. |
Xu et al., “LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding.” ACL, dated 2021, 13 pages. |
Huang et al. “UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks.” IUCAI, 2021, 9 pages. |
Tang et al., “MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction”, in International Joint Conference on Artificial Intelligence (IJCAI), pp. 1039-1045, 2021, 7 pages. |
Qian et al., “A Region-Based Hypergraph Network for Joint Entity-Relation Extraction,” Knowledge-Based Systems. vol. 228, Sep. 2021, 8 pages. |
Prabhu et al., “MTL-FoUn: A Multi-Task Learning Approach to Form Understanding,” 2021 International Conference on Document Analysis and Recognition (ICDAR), Sep. 5, 2021, 5 pages. |
Davis et al., “Visual FUDGE: Form Understanding via Dynamic Graph Editing,” International Conference on Document Analysis and Recognition (ICDAR), Sep. 5, 2021, 16 pages. |
Shen et al., “LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis,” in International Conference on Document Analysis and Recognition (ICDAR), Sep. 5, 2021, 16 pages. [retrieved from: https://arxiv.org/pdf/2103.15348.pdf]. |
Garncarek et al. “Lambert: Layout-Aware Language Modeling for Information Extraction.” ICDAR, 2021, 16 pages. |
Powalski et al., “Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer,” International Conference on Document Analysis and Recognition, Sep. 5, 2021, 17 pages. |
Hong et al., “BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents,” arXiv (CoRR), Sep. 10, 2021, 13 pages. [retrieved from: https://arxiv.org/pdf/2108.04539.pdf]. |
Appalaraju et al., “DocFormer: End-to-End Transformer for Document Understanding,” arXiv (CoRR), Sep. 20, 2021, 22 pages. [retrieved from: https://arxiv.org/pdf/2106.11539.pdf]. |
International Searching Authority, “International Preliminary Report on Patentability,” issued in connection with International Patent Application No. PCT/IB2019/000299, mailed on Sep. 28, 2021, 5 pages. |
Li et al. “StrucTexT: Structured Text Understanding with Multi-Modal Transformers”, in ACM International Conference on Multimedia (ACM Multimedia), pp. 1912-1920. 2021. |
United States Patent and Trademark Office, “Final Office Action,” issued in connection with U.S. Appl. No. 16/692,797, dated Oct. 27, 2021, 14 pages. |
International Searching Authority, “Written Opinion,” issued in connection with International Patent Application No. PCT/US2021/039931, mailed on Nov. 4, 2021, 4 pages. |
International Searching Authority, “International Search Report,” issued in connection with International Patent Application No. PCT/US2021/039931, mailed on Nov. 4, 2021, 3 pages. |
European Patent Office, “Communication pursuant to Rules 161(2) and 162 EPC,” in connection with European Patent Application No. 19921870.2, issued Nov. 5, 2021, 3 pages. |
Zhang et al., “Entity Relation Extraction as Dependency Parsing in Visually Rich Documents,” Empirical Methods in Natural Language Processing (EMNLP), Nov. 7, 2021, 10 pages. |
Hwang et al., “Cost-Effective End-to-end Information Extraction for Semi-structured Document Images,” Empirical Methods in Natural Language Processing (EMNLP), Nov. 7, 2021, 9 pages. |
Gu et al., “UniDoc: Unified Pretraining Framework for Document Understanding,” Neural Information Processing Systems (NeurIPS), Dec. 6, 2021, 12 pages. |
Park et al. “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. In Workshop on Document Intelligence,” at NeurIPS 2019, 4 pages. |
Wang et al. “LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding”, in Annual Meeting of the Association for Computational Linguistics (ACL), 2022, 11 pages. |
United States Patent and Trademark Office, “Advisory Action,” issued inconnection with U.S. Appl. No. 16/692,797, dated Feb. 16, 2022, 4 pages. |
Github, “Doccano tool,” Github.com, downloaded on Apr. 1, 2022, 12 pages. [retrieved from: https://github.com/doccano/doccano]. |
Datasetlist, “Labeling tools—List of labeling tools,” Datasetlist.com, updated Dec. 2021, downloaded on Apr. 1, 2022, 14 pages. [retrieved from: https://www.datasetlist.com/tools/]. |
Levenshtein, “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals,” Soviet Physics—Doklady, Cybernetics and Control Theory, pp. 707-710, vol. 10, No. 8, Feb. 1966, 4 pages. |
Smith et al., “Identification of Common Molecular Subsequences,” Reprinted Journal of Molecular Biology, Academic Press Inc. (London) Ltd., pp. 195-197, dated 1981, 3 pages. |
Govindan et al., “Character Recognition—A Review,” Pattern Recognition, vol. 23, No. 7, pp. 671-683, published Jul. 20, 1990, 13 pages. |
Poulovassilis et al. “A nested-graph model for the representation and manipulation of complex objects.” ACM Trans. Inf. Syst. 12 (1994): 34 pages. |
Hochreiter et al. “Long Short-Term Memory.” Neural Computation 9 (1997): 1735-1780, 46 pages. |
Ng et al., “On Spectral Clustering: Analysis and an Algorithm,” NIPS'01: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, Jan. 2001, 8 pages. |
Crandall et al., “Extraction of special effects caption text events from digital video,” IJDAR, Department of Computer Science and Engineering, The Pennsylvania State University, 202 Pond Laboratory, University Park, PA, accepted Sep. 13, 2002, pp. 138-157, 20 pages. |
Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision (IJCV), published Jan. 5, 2004, 20 pages. |
Marinai, “Introduction to Document Analysis and Recognition,” Machine Learning in Document Analysis and Recognition, published 2008, 20 pages. |
Vogel et al., “Parallel Implementations of Word Alignment Tool,” Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49-57, Jun. 2008, 10 pages. |
O'Gorman et al., “Document Image Analysis,” IEEE Computer Society Executive Briefings, dated 2009, 125 pages. |
Oliveira et al., “A New Method for Text-Line Segmentation for Warped Documents,” International Conference Image Analysis and Recognition, Jun. 21, 2010, 11 pages. |
Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks,” In International Conference on Neural Information Processing Systems (NIPS), published 2012, 9 pages. |
Chung et al. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling,” ArXiv abs/1412.3555, dated 2014, 9 pages. |
Nshuti, “Mobile Scanner and OCR (A First Step Towards Receipt to Spreadsheet),” published 2015, 3 pages. |
Ronneberger et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation,” Medical Image Computing and Computer-Assisted Intervention (MICCAI), dated May 18, 2015, 8 pages. |
Lecun et al., “Deep Learning,” Nature, vol. 521, pp. 436-444, dated May 28, 2015, 9 pages. |
Genereux et al., “NLP Challenges in Dealing with OCR-ed Documents of Derogated Quality,” Workshop on Replicability and Reproducibility in Natural Language Processing, IJCAI 2015, dated Jul. 2015, 6 pages. |
Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” In International Conference on Neural Information Processing Systems (NIPS), pp. 91-99, dated Dec. 7, 2015, 14 pages. |
Kim et al., “Character-Aware Neural Language Models,” Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16), pp. 2741-2749, 2016, 9 pages. |
Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” In Conference on Computer Vision and Pattern Recognition (CVPR), dated May 9, 2016, 10 pages. |
Osindero et al., “Recursive Recurrent Nets with Attention Modeling for OCR in the Wild,” in Conference on Computer Vision and Pattern Recognition (CVPR), dated Jun. 27, 2016, 10 pages. |
Joulin et al., “Bag of Tricks for Efficient Text Classification,” In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, dated Aug. 9, 2016, 5 pages. |
Konda et al., “Magellan: Toward Building Entity Matching Management Systems Over Data Science Stacks,” Proceedings of the VLDB Endowment, vol. 9, No. 13, pp. 1581-1584, dated 2016, 4 pages. |
Kipf et al., “Semi-Supervised Classification with Graph Convolutional Networks,” 5th International Conference on Learning Representations, Apr. 24, 2017, 14 pages. |
Bojanowski et al., “Enriching Word Vectors with Subword Information,” in Transactions of the Association for Computational Linguistics, vol. 5, pp. 135-146, dated Jun. 2017, 12 pages. |
Vaswani et al., “Attention is all you need,” In Advances in Neural Information Processing Systems, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, last revised Dec. 6, 2017, 15 pages. |
Hui, “mAP (mean Average Precision) for Object Detection,” Mar. 6, 2018, 2 pages. [retrieved from: https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173 on May 11, 2020]. |
Velickovic et al. “Graph Attention Networks,” ArXiv abs/1710.10903, 2018, 12 pages. |
Mudgal et al., “Deep learning for entity matching: A design space exploration,” in Proceedings of the 2018 International Conference on Management of Data, dated Jun. 10-15, 2018, 16 pages. |
Wick et al., “Calamari—A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition,” Digital Humanities Quarterly, Jul. 5, 2018, 12 pages. [retrieved from: https://arxiv.org/ftp/arxiv/papers/1807/1807.02004.pdf]. |
Akbik et al., “Contextual String Embeddings for Sequence Labeling,” In Proceedings of the 27th International Conference on Computational Linguistics (COLING), dated Aug. 2018, 12 pages. |
Ray et al., “U-PC: Unsupervised Planogram Compliance,” in European Conference on Computer Vision (ECCV), 2018, 15 pages. [retrieved from: http://openaccess.thecvf.com/content_ECCV_2018/papers/Archan_Ray_U-PC_Unsupervised_Planogram_ECCV_2018_paper.pdf]. |
Follmann et al., “MVTec D2S: Densely Segmented Supermarket Dataset,” In European Conference on Computer Vision (ECCV), dated 2018, 17 pages. |
Li et al., “Extracting Figures and Captions from Scientific Publications,” Short Paper, CIKM18, Oct. 22-26, 2018, Torino, Italy, 4 pages. |
Elfwing et al. “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning,” Neural Networks: Journal of the International Neural Network Society, vol. 107, Nov. 2018, 18 pages. |
Huang et al., “Mask R-CNN with Pyramid Attention Network for Scene Text Detection”, arXiv:1811.09058v1, pp. 1-9, https://arxiv.org/abs/1811.09058, Nov. 22, 2018, 9 pages. |
Wikipedia, “Precision & Recall,” Dec. 17, 2018 revision, 12 pages. |
Artificial Intelligence & Image Analysis, “Intelligent Automation Eliminates Manual Data Entry From Complex Documents,” White Paper, accessed on Jan. 30, 2019, 3 pages. |
Artificial Intelligence & Image Analysis, “Historic Document Conversion,” Industry Paper, accessed on Jan. 30, 2019, 4 pages. |
Loshchilov et al., “Decoupled Weight Decay Regularization,” 2019 International Conference on Learning Representations, May 6, 2019, 19 pages. |
Nathancy, “How do I make masks to set all of image background, except the text, to white?”, stackoverflow.com, https://stackoverflow.com/questions/56465359/how-do-i-make-masks-to-set-all-of-image-background-except-the-text-to-white, Jun. 5, 2019, 5 pages. |
Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), dated Jun. 24, 2019, 16 pages. |
Qasim et al., “Rethinking Table Recognition using Graph Neural Networks,” In International Conference on Document Analysis and Recognition (ICDAR), dated Jul. 3, 2019, 6 pages. |
Feng et al., “Computer vision algorithms and hardware implementations: A survey,” Integration: the VLSI Journal, vol. 69, pp. 309-320, dated Jul. 27, 2019, 12 pages. |
Hu et al., “Semi-supervised Node Classification via Hierarchical Graph Convolutional Networks.” ArXiv abs/1902.06667, 2019, 8 pages. |
Oliveira et al., “dhSegment: A generic deep-learning approach for document segmentation,” In 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), dated Aug. 14, 2019, 6 pages. |
Zhong et al., “PubLayNet: largest dataset ever for document layout analysis,” In International Conference on Document Analysis and Recognition (ICDAR), dated Aug. 16, 2019, 8 pages. |
Jaume et al., “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents,” International Conference on Document Analysis and Recognition (ICDAR), Sep. 20, 2019, 6 pages. |
Leicester et al., “Using Scanner Technology to Collect Expenditure Data,” Fiscal Studies, vol. 30, Issue 3-4, 2009, 29 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 17/566,135, dated Mar. 27, 2024, 13 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 18/476,978, dated Apr. 18, 2024, 20 pages. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s) Due,” issued in connection with U.S. Appl. No. 17/710,538, dated Apr. 19, 2024, 8 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 17/710,660, on May 28, 2024, 9 pages. |
Canadian Intellectual Property Office, “Office Action,” issued in connection with Canadian Patent Application No. 3,182,471, dated May 28, 2024, 5 pages. |
Visich, “Bar Codes and Their Applications,” Research Foundation of State University of New York, 1990, 59 pages. |
European Patent Office, “Communication pursuant to Article 94(3) EPC,” issued in connection with European Patent Application No. 19 921 870.2-1207, on Apr. 9, 2024, 7 pages. |
United States Patent and Trademark Office, “Supplemental Notice of Allowability,” issued in connection with U.S. Appl. No. 17/710,538, dated May 8, 2024, 3 pages. |
United States Patent and Trademark Office, “Final Office Action,” issued in connection with U.S. Appl. No. 17/566,135, dated Jul. 25, 2024, 17 pages. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s) Due,” issued in connection with U.S. Appl. No. 18/191,642, dated Aug. 28, 2024, 7 pages. |
United States Patent and Trademark Office, “Final Office Action,” issued in connection with U.S. Appl. No. 18/476,978, dated Aug. 14, 2024, 22 pages. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s) Due,” issued in connection with U.S. Appl. No. 17/710,538, dated Aug. 14, 2024, 8 pages. |
United States Patent and Trademark Office, “Non-Final Office Action,” issued in connection with U.S. Appl. No. 17/710,649, dated Sep. 16, 2024, 12 pages. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s) Due,” issued in connection with U.S. Appl. No. 17/710,660, dated Sep. 25, 2024, 9 pages. |
International Searching Authority, “International Preliminary Report on Patentability,” issued in connection with International Application No. PCT/US2023/011859, mailed on Aug. 6, 2024, 6 pages. |
United States Patent and Trademark Office, “Supplemental Notice of Allowability,” issued in connection with U.S. Appl. No. 17/710,538, dated Sep. 11, 2024, 3 pages. |
United States Patent and Trademark Office, “Advisory Action,” issued in connection with U.S. Appl. No. 18/476,978, dated Oct. 7, 2024, 3 pages. |
United States Patent and Trademark Office, “Notice of Allowance and Fee(s) Due,” issued in connection with U.S. Appl. No. 17/566,135, dated Oct. 11, 2024, 9 pages. |
Number | Date | Country
---|---|---
20220189190 A1 | Jun 2022 | US