Claims
- 1. A method for automatically mathematically decomposing a table-structured document, the method comprising:
utilizing mathematical relationships, together with textual and positional clues to the mathematical relationships, in a collaborative manner, to derive a mathematical construct of the table-structured document.
- 2. The method of claim 1, wherein the table-structured document comprises at least one of: a balance sheet, an income statement and a cash flow statement.
- 3. The method of claim 1, wherein the steps are performed automatically by a computer system.
- 4. The method of claim 1, wherein the table-structured document is in the form of at least one of: an ASCII text document, an EBCDIC text document, a spreadsheet, a PDF file, a Postscript file, and an HTML document.
- 5. The method of claim 1, wherein the table-structured document comprises an electronic document.
- 6. The method of claim 5, wherein the electronic document is obtained electronically via at least one of: the Internet, an electronic mail message, an intranet, an extranet, and a scanner.
- 7. A method for automatically mathematically decomposing a table-structured document, wherein the table-structured document comprises rows of data, the method comprising:
identifying each row of data in the table-structured document by aggregating together successively larger sets of values from consecutive rows in the table-structured document, starting at the top of the document and considering possible negative and positive permutations of each value as necessary, to see if the sums thereof are substantially equal to a value in the next consecutive row in the table-structured document, wherein if the sum thereof is substantially equal to the value in the next consecutive row in the table-structured document, the value in that next consecutive row in the table-structured document is identified as a subtotal.
- 8. The method of claim 7, wherein the table-structured document comprises at least one of a cash flow statement and an income statement.
- 9. The method of claim 7, wherein the steps are performed automatically by a computer system.
- 10. A method for automatically mathematically decomposing a table-structured document comprising multiple tables having predefined mathematical relationships, the method comprising at least one of the following steps:
utilizing textual clues and the predefined mathematical relationships between the multiple tables to partition the document into multiple sub-tables;
identifying a value in each sub-table as a grand total for the sub-table utilizing at least one of positional and textual information;
assigning all line items within the sub-table, except the grand total for the sub-table, as children of the sub-table;
pre-identifying subtotals within each table by utilizing available textual clues;
pre-identifying a value sign for each value in each table by utilizing available textual clues;
identifying and validating mathematical relationships between the children in each sub-table by summing together all line item values within the sub-table, except the grand total value for the sub-table, to create a validation sum, and then subtracting a sum of successively larger sets of line item values from the validation sum until the result thereof equals the grand total value for the sub-table, wherein when the result thereof equals the grand total value for the sub-table, the values in the set of line item values are identified as subtotals of the sub-table.
- 11. The method of claim 10, wherein the table-structured document comprises a balance sheet.
- 12. The method of claim 10, wherein the steps are performed automatically by a computer system.
- 13. The method of claim 10, wherein the document is in the form of at least one of: an ASCII text document, an EBCDIC text document, a spreadsheet, a PDF file, a Postscript file, and an HTML document.
- 14. The method of claim 10, wherein the document comprises an electronic document.
- 15. The method of claim 14, wherein the electronic document is obtained electronically via at least one of: the Internet, an electronic mail message, an intranet, an extranet, and a scanner.
- 16. The method of claim 10, wherein the method is utilized to analyze at least one of: a company's financial health and the integrity of the financial statement.
- 17. A method for automatically mathematically decomposing a financial statement comprising rows of data, the method comprising:
identifying each row of data in the financial statement as an individual line item, a subtotal or a total, by summing together sequentially larger sets of values from consecutive rows, considering possible combinations of positive and negative permutations of each value as necessary, to see if the sum thereof equals the value in the next consecutive row in the financial statement; wherein if the sum of the set of values equals the value in the next consecutive row in the financial statement, the values in the set are identified as individual line items belonging to a category, and wherein the value in the next consecutive row in the financial statement is identified as the subtotal of the category.
- 18. The method of claim 17, wherein the financial statement comprises at least one of a cash flow statement and an income statement.
- 19. The method of claim 17, wherein the steps are performed automatically by a computer system.
- 20. The method of claim 17, wherein the financial statement is in the form of at least one of: an ASCII text document, an EBCDIC text document, a spreadsheet, a PDF file, a Postscript file, and an HTML document.
- 21. The method of claim 17, wherein the financial statement comprises an electronic document.
- 22. The method of claim 21, wherein the electronic document is obtained electronically via at least one of: the Internet, an electronic mail message, an intranet, an extranet, and a scanner.
- 23. The method of claim 17, wherein the method is utilized to analyze at least one of: a company's financial health and the integrity of the financial statement.
- 24. A system for automatically mathematically decomposing a table-structured document, the system comprising:
a means for utilizing mathematical relationships, together with textual and positional clues to the mathematical relationships, in a collaborative manner, to derive a mathematical construct of the table-structured document.
- 25. The system of claim 24, wherein the table-structured document comprises at least one of: a balance sheet, an income statement and a cash flow statement.
- 26. The system of claim 24, wherein a computer system automatically mathematically decomposes the table-structured document.
- 27. The system of claim 24, wherein the table-structured document is in the form of at least one of: an ASCII text document, an EBCDIC text document, a spreadsheet, a PDF file, a Postscript file, and an HTML document.
- 28. The system of claim 24, wherein the table-structured document comprises an electronic document.
- 29. The system of claim 28, wherein the electronic document is obtained electronically via at least one of: the Internet, an electronic mail message, an intranet, an extranet, and a scanner.
- 30. A system for automatically mathematically decomposing a table-structured document, wherein the table-structured document comprises rows of data, the system comprising:
a means for identifying each row of data in the table-structured document by aggregating together successively larger sets of values from consecutive rows in the table-structured document, starting at the top of the document and considering the possible negative and positive permutations of each value as necessary, to see if the sums thereof are substantially equal to a value in the next consecutive row in the table-structured document, wherein if the sum thereof is substantially equal to the value in the next consecutive row in the table-structured document, the value in that next consecutive row in the table-structured document is identified as a subtotal.
- 31. The system of claim 30, wherein the table-structured document comprises at least one of a cash flow statement and an income statement.
- 32. The system of claim 30, wherein a computer system automatically mathematically decomposes the table-structured document.
- 33. A system for automatically mathematically decomposing a table-structured document comprising multiple tables having predefined mathematical relationships, the system comprising at least one of the following:
a means for utilizing textual clues and the predefined mathematical relationships between the multiple tables to partition the document into multiple sub-tables;
a means for identifying a value in each sub-table as a grand total for the sub-table utilizing at least one of positional and textual information;
a means for assigning all line items within the sub-table, except the grand total for the sub-table, as children of the sub-table;
a means for pre-identifying subtotals within each table by utilizing available textual clues;
a means for pre-identifying a value sign for each value in each table by utilizing available textual clues;
a means for identifying and validating mathematical relationships between the children in each sub-table by summing together all line item values within the sub-table, except the grand total value for the sub-table, to create a validation sum, and then subtracting a sum of successively larger sets of line item values from the validation sum until the result thereof equals the grand total value for the sub-table, wherein when the result thereof equals the grand total value for the sub-table, the values in the set of line item values are identified as subtotals of the sub-table.
- 34. The system of claim 33, wherein a computer system automatically mathematically decomposes the table-structured document.
- 35. The system of claim 33, wherein the table-structured document is in the form of at least one of: an ASCII text document, an EBCDIC text document, a spreadsheet, a PDF file, a Postscript file, and an HTML document.
- 36. The system of claim 33, wherein the table-structured document comprises an electronic document.
- 37. The system of claim 36, wherein the electronic document is obtained electronically via at least one of: the Internet, an electronic mail message, an intranet, an extranet, and a scanner.
- 38. The system of claim 33, wherein the system is utilized to analyze at least one of: a company's financial health and the integrity of the financial statement.
- 39. A system for automatically mathematically decomposing a financial statement comprising rows of data, the system comprising:
a means for identifying each row of data in the financial statement as an individual line item, a subtotal or a total, by summing together sequentially larger sets of values from consecutive rows, considering the possible combinations of positive and negative permutations of each value as necessary, to see if the sum thereof equals the value in the next consecutive row in the financial statement; wherein if the sum of the set of values equals the value in the next consecutive row in the financial statement, the values in the set are identified as individual line items belonging to a category, and wherein the value in the next consecutive row in the financial statement is identified as the subtotal of the category.
- 40. The system of claim 39, wherein the financial statement comprises at least one of a cash flow statement and an income statement.
- 41. The system of claim 39, wherein a computer system automatically mathematically decomposes the financial statement.
- 42. The system of claim 39, wherein the financial statement is in the form of at least one of: an ASCII text document, an EBCDIC text document, a spreadsheet, a PDF file, a Postscript file, and an HTML document.
- 43. The system of claim 39, wherein the financial statement comprises an electronic document.
- 44. The system of claim 43, wherein the electronic document is obtained electronically via at least one of: the Internet, an electronic mail message, an intranet, an extranet, and a scanner.
- 45. The system of claim 39, wherein the system is utilized to analyze at least one of: a company's financial health and the integrity of the financial statement.
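By way of illustration only, the two arithmetic tests recited in the claims above can be sketched in Python. This is a minimal sketch under stated assumptions and not the implementation of the invention: the function names `find_running_subtotals` and `find_subtotals_by_validation`, the `tol` parameter (standing in for "substantially equal"), the minimum set size of two rows, and the rule that a detected subtotal becomes the first row of the next set are all hypothetical choices made for the example. The first function follows the consecutive-row test of claims 7 and 17; the second follows the validation-sum test of claims 10 and 33.

```python
from itertools import combinations, product


def find_running_subtotals(values, tol=0.5):
    """Consecutive-row test (claims 7 and 17), sketched.

    Grows a set of consecutive rows from the top and tries every +/- sign
    assignment to see whether the set sums to the next row. Returns
    {subtotal_row: (first_row_of_set, signs)}.
    Assumptions (not from the source): a set holds at least two rows, and a
    detected subtotal is carried forward as the first row of the next set.
    """
    found = {}
    start = 0
    i = start + 2                      # row being tested as a subtotal
    while i < len(values):
        group = values[start:i]
        match = None
        # product() yields the all-positive assignment first, so the signs
        # as printed are preferred when they already work.
        for signs in product((1, -1), repeat=len(group)):
            total = sum(s * v for s, v in zip(signs, group))
            if abs(total - values[i]) <= tol:
                match = signs
                break
        if match is not None:
            found[i] = (start, match)
            start = i                  # assumption: carry the subtotal forward
            i = start + 2
        else:
            i += 1                     # successively larger set
    return found


def find_subtotals_by_validation(items, grand_total, tol=0.5):
    """Validation-sum test (claims 10 and 33), sketched.

    `items` holds every line item value of one sub-table except the grand
    total. Sets of increasing size are subtracted from the validation sum
    until the remainder equals the grand total; the indices of that set are
    returned as the subtotals, or None if no set works.
    """
    validation_sum = sum(items)
    for size in range(len(items) + 1):
        for idx in combinations(range(len(items)), size):
            remainder = validation_sum - sum(items[j] for j in idx)
            if abs(remainder - grand_total) <= tol:
                return idx
    return None
```

With made-up figures, `find_running_subtotals([1000, 600, 400, 150, 50, 200])` reports rows 2 and 5 as subtotals (1000 - 600 = 400, then 400 - 150 - 50 = 200), and `find_subtotals_by_validation([100, 200, 300, 500, 50, 550], 850)` returns `(2, 5)`. Both searches are exhaustive and grow exponentially with the number of rows, and more than one set may satisfy the arithmetic, which is consistent with the claims pairing the mathematical test with textual and positional clues.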
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This invention is related to commonly-owned, co-pending U.S. patent application Ser. No. ______, entitled “Automated Understanding, Extraction and Structured Reformatting of Information in Electronic Files,” filed herewith on Mar. 27, 2003, which is hereby incorporated in full by reference. This invention is also related to commonly-owned, co-pending U.S. patent application Ser. No. ______, entitled “Automated Understanding and Decomposition of Table-Structured Electronic Documents,” filed herewith on Mar. 27, 2003, which is also hereby incorporated in full by reference.