The present invention generally relates to the comparison of tables within multiple documents, and more particularly, to a system and method for the comparison of content within tables separate from the form and structure of those tables.
Managing and mitigating risks in documents, including changes made to document versions by internal and external sources, has become an essential component of a number of business specialties that handle documents that may be sensitive in nature, such as those found in the legal, financial, government and accounting business sectors. Professionals in industries that consistently engage in document sharing and collaboration within and outside of their organizations find document comparison programs essential for identifying and addressing all changes made throughout a document's lifecycle and all sensitive metadata held within their documents.
Documents can be edited in a number of programs by multiple users. Changes can be made to text, tables, images, and other embedded objects, values and formulas, header and footer content, comments, and many other document aspects. Even documents that appear to be protected from change, such as PDF documents, are not secure from the possibility of being the recipients of changes or modifications. Users can edit those PDF documents in their native format or convert to a separate file type, edit the document, and then recreate a PDF of that document. A review of a document after it has been shared with an external source, either by humans or by computer programs, is thus necessitated to ensure any changes are accurately identified in the document content.
Document comparison programs, such as Litera Change-Pro, Workshare Professional or Deltaview, Soft Interface DiffDocs, DocsCorp Comparedocs and Esquire Innovations iRedline, are computer applications that compare two documents (e.g., Microsoft Word, Excel and PowerPoint documents, PDF documents, HTML documents, database tables, etc.), a task formerly reserved solely for humans. These programs identify differences between an original (first) and a modified (second) document and display those differences in a third document, commonly referred to as a ‘redline’ document.
Conventional document comparison programs that produce ‘redline’ documents have, to date, been limited in their capacity to incorporate context when reviewing changes made between original and modified documents in relation to information displayed within tables in Microsoft® Word, WordPerfect®, HTML, PDF and other document formats. Conventional methods and systems are limited in their ability to comprehend context within table layouts. They are only capable of comparing information presented within tables by comparing information stored at a cellular level. If a change is made to a cell, such as merging or splitting cells (both standard table layout processes), that change will be listed as a deletion or addition by the conventional methods and systems. The entire cell (including all content therein) will be displayed by the conventional methods and systems as having been changed. If multiple lines of text exist in a single cell in the original document and this text is moved into multiple cells, the conventional methods and systems would show all the text in the original cell as a deletion and all the text in the new cells as an addition.
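The cell-level behavior described above can be illustrated with a minimal sketch. The function name `compare_cells` and the sample row are hypothetical and chosen only for illustration; they do not reproduce any particular conventional product.

```python
def compare_cells(org_cells, mod_cells):
    """Naive cell-by-cell comparison (illustrative only).

    Cells are paired by position. Whenever the text of a pair differs,
    the whole original cell is reported as deleted and the whole
    modified cell as added, regardless of how many words are shared.
    """
    changes = []
    for i in range(max(len(org_cells), len(mod_cells))):
        org = org_cells[i] if i < len(org_cells) else None
        mod = mod_cells[i] if i < len(mod_cells) else None
        if org == mod:
            continue
        if org is not None:
            changes.append(("deleted", org))
        if mod is not None:
            changes.append(("added", mod))
    return changes


# One cell in the original row is split into two cells in the modified row.
org_row = ["Net revenue Operating costs"]
mod_row = ["Net revenue", "Operating costs"]
changes = compare_cells(org_row, mod_row)
for kind, text in changes:
    print(kind, "->", text)
```

Although no word has changed, every cell in this sketch is reported as a deletion or an addition, which is the limitation described above.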
This presentation of a change to the table cell, even in a scenario where the context of such a change does not affect the user's comprehension of that information, belies the way that users experience and engage with content within tables. The merging of two cells, both containing content, does not change the context of the content originally held within those two, separate cells. Conventional methods and systems, however, consider such a change to a table layout a change to the content itself and mark that content as changed (as a deletion and addition). This limits the user's ability to view a document and decipher which changes made to that document are contextually relevant.
Embodiments of the invention provide an improved method and system, including a novel algorithm, herein termed the Intelligent Algorithm, that recognizes the merging and splitting of table cells and compares content in tables in a first document and a second document across and within those merged and split table cells. In an exemplary embodiment, the system and method advantageously provide the ability to compare content within tables in the context of the additions and deletions of cells caused by the merging and splitting of cells, by disregarding table structure, with the exception of scenarios in which additional content and cells in combination have been added or existing content and cells in combination have been deleted. The merging or splitting of cells, the rearrangement of content into parallel cells and other similar amendments to layout, when not representative of contextual change, are not recorded or listed as changes to documents by the exemplary embodiment. The Intelligent Algorithm is able to contextualize changes made within tables by creating an array of text found within a table in both an original (first) and a modified (second) document, then comparing the text array in the original document to the text array in the modified document, and, finally, displaying the text back to a user. Only words that have been modified (added or deleted) will be displayed by the Intelligent Algorithm as changes to the user (and not the entire cell content, as is done in the prior art).
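The text-array comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Intelligent Algorithm itself: a table is assumed to be a list of rows of cell strings, the helper names `table_to_words` and `diff_table_content` are hypothetical, and Python's standard `difflib.SequenceMatcher` is used as a stand-in for the comparison step.

```python
from difflib import SequenceMatcher


def table_to_words(table):
    """Flatten a table (list of rows of cell strings) into one word array.

    Cell boundaries are discarded, so merging or splitting cells
    leaves the resulting array unchanged.
    """
    return [word for row in table for cell in row for word in cell.split()]


def diff_table_content(org_table, mod_table):
    """Return only the words that were deleted or added (assumed output shape)."""
    org_words = table_to_words(org_table)
    mod_words = table_to_words(mod_table)
    matcher = SequenceMatcher(a=org_words, b=mod_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.extend(("deleted", w) for w in org_words[i1:i2])
        if tag in ("insert", "replace"):
            changes.extend(("added", w) for w in mod_words[j1:j2])
    return changes


# The first cell is split in two and one word is edited.
org_table = [["Net revenue Operating costs", "2009"]]
mod_table = [["Net revenue", "Operating expenses", "2009"]]
changes = diff_table_content(org_table, mod_table)
print(changes)
```

In this sketch the split of the first cell produces no reported change; only the word that was actually edited is reported, once as a deletion and once as an addition.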
Accordingly, an exemplary system, method and computer program product are presented for comparison of content within tables, separate from the form and structure of those tables, including identifying tables in a first and second document, creating respective text arrays of content of the tables from the first and second documents, comparing the content of the respective text arrays to determine differences between the content of the tables, and displaying the determined differences between the content of the tables, regardless of form or structure of the tables.
Still other aspects, features, and advantages of the present invention are readily apparent from the following detailed description, by illustrating a number of exemplary embodiments and implementations, including the best mode contemplated for carrying out the present invention. The present invention is also capable of other and different embodiments, and its several details can be modified in various respects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and descriptions are to be regarded as illustrative in nature, and not as restrictive.
Embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification. In the drawings:
The various aspects are described hereafter in greater detail in connection with a number of exemplary embodiments to facilitate an understanding of the invention. However, the invention should not be construed as being limited to these embodiments. Rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The invention includes recognition that conventional systems detect changes made within tables (including changes to text and the addition and deletion of cells in tables) in a first document and a second document and allow users to view those changes in a third, ‘redline’ document. However, these systems fail to provide a way to disregard changes made to the form and structure of tables when those changes do not affect the presentation of content within those tables.
Generally, the exemplary embodiments include the capacity to compare content within tables in two documents separate from the form and structure of those tables. In the Intelligent Algorithm, the following steps are taken to provide this capacity: (1) the system and method first compares tables in documents using the traditional methods of the prior art; (2) when an added or deleted cell in a table is detected before or after an existing cell in a table, those cells are merged with the existing cell; (3) these merged and split cells in one document are connected to a corresponding cell in the other document by applying a dynamic programming matrix to the tables in the two documents. These merged cells are compared with the single comparable cell from the other document. Where similarity is discovered (that is, where a longest common subsequence is found between the cell text strings), the system and method considers those cells merged or split. For each merged or split cell, the cell merging/splitting algorithm is applied. The dynamic programming matrix is constructed. Matrix cells are filled with numbers indicating the number of similar words between a cell in a table from one document and a cell from the other, and columns of the matrix are filled with merged/split cells. The matrix uses the following code to fill each cell in that table:
Where I and J are the linear indexes of table cells, orgCell and modCell are two arrays that hold the table cells' strings, and dp-matrix is the dynamic programming matrix. LCS is the function that returns the longest common subsequence of two cell strings.
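The listing referred to above is not reproduced in this text, and the following is not that listing. It is a minimal sketch of one plausible reading of the description, under these assumptions: each matrix entry holds the word-level longest-common-subsequence length between one original cell and one modified cell, and a cell is treated as a split candidate when it shares words with more than one cell in the other document. The names `lcs_length`, `fill_similarity_matrix` and `split_candidates` are hypothetical; `org_cell` and `mod_cell` stand in for the orgCell and modCell arrays.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def fill_similarity_matrix(org_cell, mod_cell):
    """Assumed fill rule: matrix[i][j] is the number of words that
    org_cell[i] and mod_cell[j] have in common, in order."""
    return [[lcs_length(o.split(), m.split()) for m in mod_cell]
            for o in org_cell]


def split_candidates(matrix):
    """Map each original cell index to the modified cell indexes
    with which it shares at least one word."""
    return {i: [j for j, score in enumerate(row) if score > 0]
            for i, row in enumerate(matrix)}


org_cell = ["Net revenue Operating costs", "2009"]
mod_cell = ["Net revenue", "Operating costs", "2009"]
matrix = fill_similarity_matrix(org_cell, mod_cell)
candidates = split_candidates(matrix)
print(matrix)      # [[2, 2, 0], [0, 0, 1]]
print(candidates)  # {0: [0, 1], 1: [2]}
```

Here original cell 0 shares two words with each of modified cells 0 and 1, so under the stated assumption it would be treated as having been split into those two cells, while original cell 1 maps to a single modified cell.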
The above-described devices and subsystems of the exemplary embodiments can include, for example, any suitable servers, workstations, PCs, laptop computers, PDAs, Internet appliances, handheld devices, cellular telephones, wireless devices, other devices, and the like, capable of performing the processes of the exemplary embodiments. The devices and subsystems of the exemplary embodiments can communicate with each other using any suitable protocol and can be implemented using one or more programmed computer systems or devices.
One or more interface mechanisms can be used with the exemplary embodiments, including, for example, Internet access, telecommunications in any suitable form (e.g., voice, modem, and the like), wireless communications media, and the like. For example, employed communications networks or links can include one or more wireless communications networks, cellular communications networks, 3G communications networks, Public Switched Telephone Networks (PSTNs), Packet Data Networks (PDNs), the Internet, intranets, a combination thereof, and the like.
It is to be understood that the devices and subsystems of the exemplary embodiments are for exemplary purposes, as many variations of the specific hardware used to implement the exemplary embodiments are possible, as will be appreciated by those skilled in the relevant art(s). For example, the functionality of one or more of the devices and subsystems of the exemplary embodiments can be implemented via one or more programmed computer systems or devices.
To implement such variations as well as other variations, a single computer system can be programmed to perform the special purpose functions of one or more of the devices and subsystems of the exemplary embodiments. On the other hand, two or more programmed computer systems or devices can be substituted for any one of the devices and subsystems of the exemplary embodiments. Accordingly, principles and advantages of distributed processing, such as redundancy, replication, and the like, also can be implemented, as desired, to increase the robustness and performance of the devices and subsystems of the exemplary embodiments.
The devices and subsystems of the exemplary embodiments can store information relating to various processes described herein. This information can be stored in one or more memories, such as a hard disk, optical disk, magneto-optical disk, RAM, and the like, of the devices and subsystems of the exemplary embodiments. One or more databases of the devices and subsystems of the exemplary embodiments can store the information used to implement the exemplary embodiments of the present inventions. The databases can be organized using data structures (e.g., records, tables, arrays, fields, graphs, trees, lists, and the like) included in one or more memories or storage devices listed herein. The processes described with respect to the exemplary embodiments can include appropriate data structures for storing data collected and/or generated by the processes of the devices and subsystems of the exemplary embodiments in one or more databases thereof.
All or a portion of the devices and subsystems of the exemplary embodiments can be conveniently implemented using one or more general purpose computer systems, microprocessors, digital signal processors, micro-controllers, and the like, programmed according to the teachings of the exemplary embodiments of the present inventions, as will be appreciated by those skilled in the computer and software arts. Appropriate software can be readily prepared by programmers of ordinary skill based on the teachings of the exemplary embodiments, as will be appreciated by those skilled in the software art. Further, the devices and subsystems of the exemplary embodiments can be implemented on the World Wide Web. In addition, the devices and subsystems of the exemplary embodiments can be implemented by the preparation of application-specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be appreciated by those skilled in the electrical art(s). Thus, the exemplary embodiments are not limited to any specific combination of hardware circuitry and/or software.
Stored on any one or on a combination of computer readable media, the exemplary embodiments of the present inventions can include software for controlling the devices and subsystems of the exemplary embodiments, for driving the devices and subsystems of the exemplary embodiments, for enabling the devices and subsystems of the exemplary embodiments to interact with a human user, and the like. Such software can include, but is not limited to, device drivers, firmware, operating systems, development tools, applications software, and the like. Such computer readable media further can include the computer program product of an embodiment of the present inventions for performing all or a portion (if processing is distributed) of the processing performed in implementing the inventions. Computer code devices of the exemplary embodiments of the present inventions can include any suitable interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs), Java classes and applets, complete executable programs, Common Object Request Broker Architecture (CORBA) objects, and the like. Moreover, parts of the processing of the exemplary embodiments of the present inventions can be distributed for better performance, reliability, cost, and the like.
As stated above, the devices and subsystems of the exemplary embodiments can include computer readable medium or memories for holding instructions programmed according to the teachings of the present inventions and for holding data structures, tables, records, and/or other data described herein. Computer readable medium can include any suitable medium that participates in providing instructions to a processor for execution. Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, transmission media, and the like. Non-volatile media can include, for example, optical or magnetic disks, magneto-optical disks, and the like. Volatile media can include dynamic memories, and the like. Transmission media can include coaxial cables, copper wire, fiber optics, and the like. Transmission media also can take the form of acoustic, optical, electromagnetic waves, and the like, such as those generated during radio frequency (RF) communications, infrared (IR) data communications, and the like. Common forms of computer-readable media can include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other suitable magnetic medium, a CD-ROM, CDRW, DVD, any other suitable optical medium, punch cards, paper tape, optical mark sheets, any other suitable physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other suitable memory chip or cartridge, a carrier wave or any other suitable medium from which a computer can read.
While the present inventions have been described in connection with a number of exemplary embodiments and implementations, the present inventions are not so limited, but rather cover various modifications and equivalent arrangements, which fall within the purview of the appended claims.
Number | Date | Country |
---|---|---|
20100241943 A1 | Sep 2010 | US |