Some embodiments relate to searches. More specifically, some embodiments provide a mechanism to conduct a language independent search within scanned documents based on an invoice format of a supplier.
While some of the data items in an invoice may be common to many invoices, there may not be a set, fixed, or shared standard for configuring the data in invoices by companies and organizations. As such, a major concern with processing invoices is accurately recognizing and determining the relevant data items in an invoice. A number of systems, devices, and processes attempt to disambiguate invoice data by trying to recognize the language of the invoice and, through various methods and processes, interpret what the language means. Such systems and processes tend to be resource hungry, complex, and not reliably accurate. Some such systems include review and verification operations by a human in an effort to increase the accuracy in recognizing and interpreting the language of the invoices, since automated language recognition systems are not typically very accurate. However, such human interaction is also resource intensive and costly. Since there is no set standard configuration for organizing, configuring, or even naming data items on invoices, considerable effort and resources have been expended, and numerous techniques developed, in an attempt to better recognize the language components of invoices given their unstructured nature.
Accordingly, a language independent method and system for efficiently searching invoices are provided by some embodiments herein.
A computer system, device, application, or service may be used to generate a query statement or function, execute the query statement against a collection of data, and display the result of executing the query statement or function. In some instances, the query may relate to an invoice. More generally, the methods and systems herein may relate to, interface with, include, and comprise an invoicing system, application, service, or platform.
As shown, invoice 100 includes a variety and multitude of information thereon. Some of the information included in invoice 100 may be unique to the invoice, and some other information in invoice 100 may include data items that may be typically included in an invoice. For example, invoice 100 includes an invoice date comprising a descriptor or anchor 105 and an invoice term 110. As used herein, the term anchor may refer to a descriptor or label associated with an invoice term (e.g., 110). Additionally, the phrase invoice term refers to a value of an invoice data item. As illustrated in
Some of the other types and variety of data items on invoice 100 may include the anchor of “Invoice ID” 115 and an associated invoice term value of “GD-2311-0001” at 120; the invoice term of a supplier name at 125; the invoice term of an address for the supplier at 130; and a line item 135 including details of invoiced goods and services such as, for example, a cost of an item at 140 (12.50 USD), a quantity of the item delivered at 145 (7), and a total cost based on the cost and quantity of the item at 150. The total cost based on the cost and quantity of the item forms part of a line item detail at 135. As shown, the line item details reside on one line of invoice 100, although line item details may, in some embodiments, extend beyond a single line of an invoice.
In some embodiments, some of the data items on invoice 100 may be commonly found on invoices since the information may typically be needed, used, or desired for the processing and settling of the invoice. Data items such as, for example, an invoice date, the invoice ID, the name and address of the supplier, and line item details may be the type of invoice data items that a business, organization, or user may want to have for each and every invoice they process or intake. In some embodiments herein, the identification and use of anchors on a per supplier basis operates to provide a reliable and efficient mechanism for searching invoices that is language independent. That is, some embodiments herein use anchors specific to each supplier and their invoice configuration, thereby eliminating a need to recognize and understand the language of the invoice since the anchors can be used to retrieve relevant invoice data.
At S210, an invoice term associated with an invoice from the particular supplier associated with the invoice (i.e., the supplier that generated the invoice) is received. This invoice term may be received from a database. Moreover, the invoice term may have been established for the particular supplier of the invoice during the scanning and verification of the invoice at or before S205. Additionally, the invoice term for and associated with the supplier of the invoice may have been established at some other point prior to S210. As an example, the value “GD-2311-0001” may be received as the invoice term.
At S215, a comparison of the invoice term received from the database and the text file of the invoice being currently processed is invoked. A determination is made whether the invoice term is in the text file representation of the invoice. This comparison and determination is made in an effort to exactly determine the anchor for this invoice term (e.g., “GD-2311-0001”) for the supplier associated with the invoice.
Process 200 proceeds to determine at S220, in an instance it is determined the invoice term is in the text file representation of the invoice, an anchor term associated with the invoice term. This determination is accomplished by an examination of the text file representation of the invoice in a proximity of the searched invoice term. The terms and phrases, if any, in the vicinity of the searched invoice term are examined to determine the proper anchor for the invoice term for the particular supplier. For example, referring to
At S220, the determined anchor may be mapped to reference the invoice term. In some embodiments, the spatial relationship between the invoice term (e.g., “GD-2311-0001”) and anchor (e.g., “Invoice ID”) is noted and stored at S225 for future reference. The spatial relationship between the invoice term and the anchor may indicate that the anchor for the invoice term is located to the right or left of the invoice term, below the invoice term, above the invoice term, or some other relative position. The invoice term and the anchor term associated with the invoice term are stored in a record for the supplier.
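The anchor determination of S215 through S225 might be sketched as follows. This is a minimal illustration only, assuming the text file representation of the invoice is available as a list of lines; the function name `find_anchor`, the relation labels, and the rule of preferring text on the same line before looking at adjacent lines are hypothetical choices made for this sketch and are not prescribed by the embodiments.

```python
import re


def find_anchor(lines, invoice_term):
    """Return (anchor, relation) for invoice_term, or None if the term is absent.

    relation names where the anchor sits relative to the invoice term.
    The search order (left, right, above, below) is an assumption of this sketch.
    """
    for row, line in enumerate(lines):
        col = line.find(invoice_term)
        if col == -1:
            continue  # term not on this line
        # Text on the same line, left of the term; keep only the nearest chunk.
        left = line[:col].strip(" :\t")
        if left:
            return re.split(r"\s{2,}", left)[-1], "left"
        # Text on the same line, right of the term.
        right = line[col + len(invoice_term):].strip(" :\t")
        if right:
            return re.split(r"\s{2,}", right)[0], "right"
        # Otherwise fall back to the neighboring lines.
        if row > 0 and lines[row - 1].strip():
            return lines[row - 1].strip(" :\t"), "above"
        if row + 1 < len(lines) and lines[row + 1].strip():
            return lines[row + 1].strip(" :\t"), "below"
        return None
    return None


# Hypothetical per-supplier record holding the learned anchor.
invoice_text = ["Invoice Date: 23.11.2011", "Invoice ID: GD-2311-0001"]
anchor, relation = find_anchor(invoice_text, "GD-2311-0001")
supplier_record = {"invoice_id": {"anchor": anchor, "relation": relation}}
```

Once such a record exists for a supplier, later invoices from the same supplier could be searched for the stored anchor and the value read from the recorded relative position, without interpreting the language of the invoice.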
According to some embodiments, an invoice date for a scanned invoice may be determined by process 300. At S305, all of the dates in an invoice are determined. The invoice may be searched for anchors having the form of a date. For example, the anchors in this example may include various combinations of numerals and words, including abbreviations, that have been established as anchors for dates. Upwards of thirty (30) different date formats and configurations may be considered by an equal number of “invoice date” anchors. For example, the invoice may be searched for dates formatted as DD.MM.YYYY, MM.DD.YYYY, MM.DD.YY, DD.MM.YY, and other formats. Upon the discovery of any “invoice date” anchors, the date associated with each anchor is noted. The dates may be stored (at least temporarily).
At S310, a determination is made regarding which one of the dates resulting from operation S305 is most recent to but prior to the present processing of the invoice. This determined date is logically considered the invoice date, that is the date the invoice for the delivered goods and services was issued.
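Operations S305 and S310 might be sketched as follows. This is an illustrative sketch, not the described implementation: it handles only four dot-separated numeric formats rather than the thirty or so mentioned above, and the names `DATE_FORMATS`, `candidate_dates`, and `invoice_date` are hypothetical. A string that is valid under more than one format (e.g., 05.06.2011) yields more than one candidate date, and treating a date equal to the processing date as acceptable is an assumption of the sketch.

```python
import re
from datetime import date, datetime

# A small, illustrative subset of date formats (assumption of this sketch).
DATE_FORMATS = ["%d.%m.%Y", "%m.%d.%Y", "%d.%m.%y", "%m.%d.%y"]
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\d{1,2}\.(?:\d{4}|\d{2})\b")


def candidate_dates(text):
    """S305: collect every date the text can be read as, under any format."""
    found = set()
    for token in DATE_PATTERN.findall(text):
        for fmt in DATE_FORMATS:
            try:
                found.add(datetime.strptime(token, fmt).date())
            except ValueError:
                pass  # token is not a valid date under this format
    return found


def invoice_date(text, processing_date):
    """S310: the most recent candidate that is not after the processing date."""
    earlier = [d for d in candidate_dates(text) if d <= processing_date]
    return max(earlier) if earlier else None


sample = "Invoice Date: 23.11.2011  Due: 23.12.2011  Order date 01.11.2011"
result = invoice_date(sample, date(2011, 12, 1))
```

In the sample text, the due date falls after the processing date and is excluded, so the later of the remaining candidates is taken as the invoice date.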
In some aspects and embodiments, dates other than an invoice date may be determined in accordance with the steps and operations of process 300.
At S405, all numeric strings in an invoice are determined. As a part of S405 or prior to S405, the invoice may be searched for the type of currency associated therewith. Anchors representative of the previously found currency type may be used in searching for numeric strings in close proximity with the anchors. The close proximity between the anchors and the numeric strings may indicate the numeric strings are associated with the anchors. Upon the discovery of currency anchors, the currency amounts associated with each anchor are noted. The currency amounts may be stored (at least temporarily).
In some embodiments, a currency amount may be found by determining all numeric values in a scanned invoice document having a decimal separator such as a comma or a period (dot). In some embodiments, it does not matter whether a comma or a dot is used as the decimal separator. For example, each of the following numeric strings would be recognized as a currency amount: 100.16; 100,16; 100,254.76 or 100.254,76.
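Recognizing currency amounts regardless of whether a comma or a dot serves as the decimal separator might be sketched as follows. The sketch assumes amounts carry exactly two decimal places and that the last separator in a numeric string is the decimal separator; the names `AMOUNT_PATTERN`, `parse_amount`, and `currency_amounts` are hypothetical.

```python
import re
from decimal import Decimal

# Digits, optional groups of three, then a separator and exactly two digits.
# The lookarounds keep date-like strings such as 23.11.2011 from matching.
AMOUNT_PATTERN = re.compile(
    r"(?<![\d.,])\d+(?:[.,]\d{3})*[.,]\d{2}(?![\d.,])"
)


def parse_amount(token):
    """Treat the last separator as the decimal point and drop the others."""
    whole, fraction = token[:-3], token[-2:]
    whole = whole.replace(".", "").replace(",", "")
    return Decimal(f"{whole}.{fraction}")


def currency_amounts(text):
    """Return every currency-like amount found in the text, in order."""
    return [parse_amount(t) for t in AMOUNT_PATTERN.findall(text)]


amounts = currency_amounts("100.16  100,16  100,254.76  100.254,76  23.11.2011")
```

Using `Decimal` rather than floating point keeps the parsed amounts exact, which matters when amounts are later compared or subtracted.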
At S410, a determination is made regarding which one of the currency amounts resulting from operation S405 is the largest. This largest determined currency amount (e.g.,
At S505, all numeric strings in an invoice are determined. The invoice may be searched for anchors having the form of a currency. As a part of S505 or prior to S505, the invoice may be searched for the type of currency associated therewith. Anchors representative of the previously found currency type may be used in searching for numeric strings in close proximity with the anchors. The close proximity between the anchors and the numeric strings may indicate the numeric strings are associated with the anchors. Upon the discovery of currency anchors, the currency amounts associated with each anchor are noted. The currency amounts may be stored (at least temporarily).
At S510, a determination is made regarding which one of the currency amounts resulting from operation S505 is the largest amount and which is the next largest amount. The determination of operation S510 may include a sorting of the results of operation S505.
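The sorting-based determination of S510 might be sketched as follows, assuming the currency amounts from S505 are already available as exact decimal values. The function name `two_largest` is hypothetical, and collapsing repeated amounts before sorting (for instance, a total printed twice on the same invoice) is an assumption of this sketch rather than something stated above.

```python
from decimal import Decimal


def two_largest(amounts):
    """Return (largest, next_largest) among distinct amounts, or None."""
    # Duplicates are collapsed first; this is an assumption of the sketch.
    ordered = sorted(set(amounts), reverse=True)
    if len(ordered) < 2:
        return None
    return ordered[0], ordered[1]


# Made-up sample amounts for illustration only.
found = [Decimal("87.50"), Decimal("104.13"), Decimal("12.50"), Decimal("104.13")]
top_two = two_largest(found)
```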
At S515, a difference between the largest determined currency amount (e.g.,
At S605, all available or potential suppliers are determined from a master list, listing, or record that includes all of the potential suppliers for a user (e.g., a business organization). At S610, the invoice is searched for name matches (e.g.,
Turning to
At S705, a scanned invoice is searched (i.e., queried) for a relationship between terms that may be defined as (a*b=c). In some embodiments, all of the terms a, b, and c, are located on the same line of the scanned invoice. In the event a search of the scanned invoice reveals a group of invoice terms satisfying the (a*b=c) constraint, then the line of the invoice containing the terms is logically considered a “line item”. Invoice 100 may be used as an example of an invoice including a line item. Referring to
In some embodiments, the results of S705 may be verified by comparing the results with a purchase order (p.o.) corresponding to the scanned invoice. In some embodiments, the p.o. corresponding to an invoice is listed or otherwise included in the invoice. The comparison of the determined line item details with the known line item details of the p.o. may operate to verify an accuracy of operation S705.
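The (a*b=c) search of S705 might be sketched as follows, assuming the scanned invoice is available as a list of text lines and that all three terms appear on the same line. The names `NUMBER` and `find_line_items` are hypothetical, the sample line is made up around the cost (12.50 USD) and quantity (7) of invoice 100, and numbers written with thousands separators are simply skipped in this simplified version.

```python
import re
from decimal import Decimal
from itertools import permutations

# Plain numbers with at most one separator; grouped amounts are not matched.
NUMBER = re.compile(r"(?<![\d.,])\d+(?:[.,]\d+)?(?![\d.,])")


def find_line_items(lines):
    """Return (line_number, a, b, c) for each line holding terms with a*b == c."""
    items = []
    for line_no, line in enumerate(lines):
        # Accept either a comma or a dot as the decimal separator.
        values = [Decimal(t.replace(",", ".")) for t in NUMBER.findall(line)]
        for a, b, c in permutations(values, 3):
            if a * b == c:
                items.append((line_no, a, b, c))
                break  # one match is enough to treat the line as a line item
    return items


sample_lines = [
    "Invoice ID: GD-2311-0001",
    "Widget  12.50 USD  7  87.50 USD",
]
line_items = find_line_items(sample_lines)
```

Exact decimal arithmetic is used so that a product such as 12.50 × 7 compares equal to 87.50 without floating-point rounding. A verification step against a purchase order, as described above, could then compare each returned tuple with the known line item details.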
Client 805 may be associated with a Web browser to access services provided by business process platform 810 via Hypertext Transfer Protocol (HTTP) communication. For example, a user may manipulate a user interface of client 805 to select data items that indicate an instruction. Client 805, in response, may transmit a corresponding HTTP service request to the business service provider 810 as illustrated. A service-oriented architecture may conduct any processing required by the request (e.g., generating queries and executing the queries against a collection of data) and, after completing the processing, provide a response (e.g., search results) to client 805. Client 805 may comprise a Personal Computer (PC) or mobile device executing a Web client. Examples of a Web client include, but are not limited to, a Web browser, an execution engine (e.g., JAVA, Flash, Silverlight) to execute associated code in a Web browser, and/or a dedicated standalone application.
In some aspects,
All systems and processes discussed herein may be embodied in program code stored on one or more computer-readable media. Such media may include, for example, a floppy disk, a CD-ROM, a DVD-ROM, magnetic tape, and solid state Random Access Memory (RAM) or Read Only Memory (ROM) storage units. According to some embodiments, a memory storage unit may be associated with access patterns and may be independent from the device (e.g., magnetic, optoelectronic, semiconductor/solid-state, etc.). Moreover, in-memory technologies may be used such that databases, etc. may be completely operated in RAM at a processor. Embodiments are therefore not limited to any specific combination of hardware and software.
Client 805 may provide a user interface for presenting collections of data, such as search results, to a user and receive an indication of a selection of one or more of the data items presented in the user interface. In some embodiments, the data may be associated with data structures hosted by business service provider 810.
Accordingly, a method and mechanism for efficiently and automatically creating and executing a query or search of a scanned invoice from a supplier, where the search is conducted based on and using anchor terms, are provided by some embodiments herein.
Processor 905 communicates with a storage device 930. Storage device 930 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, and/or semiconductor memory devices.
Storage device 930 stores a program 935 for controlling the processor 905 and query engine application 945 for determining, constructing, and executing queries. Processor 905 performs instructions of the programs 935 and 945 and thereby operates in accordance with any of the embodiments described herein. Programs 935 and 945 may be stored in a compressed, uncompiled and/or encrypted format. Programs 935 and 945 may furthermore include other program elements, such as an operating system, a database management system, and/or device drivers used by the processor 905 to interface with peripheral devices.
In some embodiments (such as shown in
Although embodiments have been described with respect to web browser displays, note that embodiments may be associated with other types of user interface displays. For example, a user interface may be associated with a portable device such as a smart phone or a tablet computing device, with a user interface element.
Embodiments have been described herein solely for the purpose of illustration. Persons skilled in the art will recognize from this description that embodiments are not limited to those described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.