The present invention relates generally to systems and methods for audit log analysis, and in particular, to a system and method for analyzing data recorded in an audit log generated as part of an electronic discovery (e-Discovery) process in litigation.
The invention provided herein has a number of embodiments useful, for example, in utilizing audit logs and extending the role of audit logs to serve additional functions of interest in the context of e-Discovery. According to one or more embodiments of the present invention, a method, apparatus, and computer program product are provided for analyzing data recorded in an audit log generated as part of an electronic discovery (e-Discovery) process in litigation.
In one aspect of the present invention, a computer implemented method is provided for analyzing data recorded in an audit log generated as part of an electronic discovery (e-Discovery) process in litigation. On one or more computers, an audit log is retrieved from a storage system accessible from the one or more computers. The audit log comprises data regarding a chronological sequence of actions taken to produce case documents relevant to the litigation. The data in the audit log is analyzed, and a comprehensive overview of the electronic discovery process is compiled from the analyzed data for presentation to a user. The actions recorded in the audit log include any user-generated content (e.g. flags, comments, etc.) associated with the production of a case document, which may be recorded as additional metadata for that document.
In one embodiment of the invention, the computer implemented method further monitors, on one or more computers, activity in the electronic discovery process based on the analyzed data. In another embodiment of the invention, the computer implemented method further recovers, on one or more computers and based on the analyzed data, a previously produced case document that has been corrupted. Corruption of a case document includes lost or corrupted metadata associated with the case document (e.g. lost or corrupted flags, comments, etc.). In a further embodiment of the invention, the audit log is cached in the storage system to speed up the analysis of the data in the audit log. In another embodiment of the invention, the computer implemented method further controls, on one or more computers, the expiration of case documents produced during the electronic discovery process based on the analyzed data.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
In the following description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration one or more specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural and functional changes may be made without departing from the scope of the present invention.
An audit log is a chronological sequence of audit records, each of which provides evidence directly pertaining to and resulting from the execution of a business process or system function (see, e.g. http://en.wikipedia.org/wiki/Audit_trail). Audit logs play an important role in the electronic discovery (e-Discovery) process. During the e-Discovery process, documents relevant to litigation often need to be located and extracted from very large collections of company documents. When producing such documents as evidence during litigation, the process that led to the selection of those documents is also very important. The sequence of actions that reviewers take to produce the documents is generally captured in audit logs, which corroborate the relevance of the produced documents and are thus usually produced alongside the documents as evidence. Any action pertinent to the litigation process must be recorded in the audit log. This may include audit records and metadata corresponding to actions taken to create collections of business documents, such as emails, business reports, and memos, as well as actions taken to categorize, index, search, analyze, annotate, and print these documents. For this reason, audit logs are indispensable to the e-Discovery process and are generally retained at all costs as an essential component of an e-Discovery product.
Embodiments of the present invention provide for non-traditional applications of audit logs in the context of e-Discovery systems and processes. Systems and methods are provided for analyzing and managing audit logs and records, which relate to litigation as well as post-litigation processes.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
With reference now to
In the depicted example, server 104 is connected to network 102 along with storage unit 106. In addition, clients 108, 110, and 112 are connected to network 102. These clients 108, 110, and 112 may be, for example, personal computers or network computers. In the depicted example, server 104 provides data, such as boot files, operating system images, and programs to clients 108, 110 and 112. Clients 108, 110 and 112 are clients to server 104. Network data processing system 100 may include additional servers, clients, and other devices not shown. In the depicted example, network data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another.
Referring to
Peripheral component interconnect (PCI) bus bridge 214 connected to I/O bus 212 provides an interface to PCI local bus 216. A number of modems may be connected to PCI local bus 216. Typical PCI bus implementations will support four PCI expansion slots or add-in connectors. Communications links to network computers 108, 110 and 112 in
Those of ordinary skill in the art will appreciate that the hardware depicted in
The data processing system depicted in
Server 104 may provide a suitable website or other internet-based graphical user interface accessible by users to enable user interaction for aspects of an embodiment of the present invention. In one embodiment, a Netscape web server, the IBM WebSphere Internet tools suite, an IBM DB2 for Linux, Unix and Windows (also referred to as "IBM DB2 for LUW") platform and a Sybase database platform are used in conjunction with a Sun Solaris operating system platform. Additionally, components such as JDBC drivers, IBM connection pooling and IBM MQSeries connection methods may be used to provide data access to several sources. The term webpage as it is used herein is not meant to limit the type of documents and programs that might be used to interact with the user. For example, a typical website might include, in addition to standard HTML documents, various forms, Java applets, JavaScript, active server pages (ASP), JavaServer Pages (JSP), common gateway interface (CGI) scripts, extensible markup language (XML), dynamic HTML, cascading style sheets (CSS), helper programs, plug-ins, and the like.
With reference now to
Expansion bus interface 314 provides a connection for a keyboard and mouse adapter 320, modem 322, and additional memory 324. SCSI host bus adapter 312 provides a connection for hard disk drive 326, tape drive 328, and CD-ROM drive 330. Typical PCI local bus implementations will support three or four PCI expansion slots or add-in connectors.
An operating system runs on processor 302 and is used to coordinate and provide control of various components within data processing system 300 in
Those of ordinary skill in the art will appreciate that the hardware in
As another example, data processing system 300 may be a stand-alone system configured to be bootable without relying on some type of network communication interface, whether or not data processing system 300 comprises some type of network communication interface. As a further example, data processing system 300 may be a Personal Digital Assistant (PDA) device, which is configured with ROM and/or flash ROM in order to provide non-volatile memory for storing operating system files and/or user-generated data.
The depicted example in
Those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the present invention. Specifically, those skilled in the art will recognize that any combination of the above components, or any number of different components, including computer programs, peripherals, and other devices, may be used to implement the present invention, so long as similar functions are performed thereby.
For example, any type of computer, such as a mainframe, minicomputer, or personal computer, could be used with and for embodiments of the present invention. In addition, many types of applications other than caching applications could benefit from the present invention. Specifically, any application that performs remote access may benefit from the present invention.
Herein, the term “by” should be understood to be inclusive. That is, when reference is made to performing A by performing X and Y, it should be understood this may include performing A by performing X, Y and Z.
In block 402, an audit log is retrieved from a storage system accessible from one or more computers. The audit log comprises data regarding a chronological sequence of actions taken to produce case documents relevant in litigation.
In block 404, the data in the audit log is analyzed on one or more computers.
In block 406, a comprehensive overview of the electronic discovery process is compiled, on one or more computers, based on the analyzed data for presentation to a user.
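Blocks 402 through 406 can be sketched in a few lines. This is a minimal, non-authoritative illustration: the `AuditRecord` fields, the dictionary standing in for the storage system, and the particular statistics chosen for the overview are all assumptions, since the specification does not define a record schema or an overview format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record layout; the specification does not define a schema.
    timestamp: datetime
    reviewer: str
    action: str        # e.g. "flag", "comment", "export"
    document_id: str


def retrieve_audit_log(storage: dict, case_id: str) -> list[AuditRecord]:
    """Block 402: fetch a case's audit log and return it in chronological order."""
    return sorted(storage[case_id], key=lambda r: r.timestamp)


def analyze(records: Iterable[AuditRecord]) -> dict:
    """Block 404: derive simple aggregate statistics from the audit records."""
    records = list(records)
    return {
        "total_actions": len(records),
        "actions_by_type": Counter(r.action for r in records),
        "reviewers": sorted({r.reviewer for r in records}),
        "documents_touched": len({r.document_id for r in records}),
        "first": records[0].timestamp if records else None,
        "last": records[-1].timestamp if records else None,
    }


def compile_overview(case_id: str, stats: dict) -> str:
    """Block 406: render the analysis as a plain-text overview for a user."""
    lines = [f"Case {case_id}: {stats['total_actions']} actions on "
             f"{stats['documents_touched']} documents by "
             f"{len(stats['reviewers'])} reviewers"]
    for action, count in sorted(stats["actions_by_type"].items()):
        lines.append(f"  {action}: {count}")
    return "\n".join(lines)
```

In a deployed system the overview would more likely be rendered through the web interface of server 104; plain text simply keeps the sketch self-contained.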
According to a first embodiment of the present invention, the analyzed data in the audit log is used to monitor case activity during the litigation and e-Discovery process. According to a second embodiment, the analyzed data in the audit log is used to backup and recover lost or corrupted cases or documents involved in the litigation process, which include the metadata for the cases or documents as well as the audit actions leading to the generation of the metadata. According to a third embodiment, the analyzed data in the audit log is used to control case document expiration. According to a fourth embodiment, the audit log is cached to speed up audit analysis. In exemplary implementations, the systems and methods provided operate on top of an existing records management system, such as a FileNet P8™ or CM™ system provided by IBM®.
According to one aspect of the present invention, the computer implemented method of analyzing data recorded in an audit log generated as part of an e-Discovery process provides for the monitoring of case activity. Apart from producing a monolithic audit report at the end of the e-Discovery process, audit logs can also be used during the process to monitor, track, analyze, and optimize the process itself. In various embodiments, the monitoring of case activity includes reviewing the actions of a particular reviewer, for example, checking up on the actions of a new person assigned to an e-Discovery task. In other embodiments, the monitoring of case activity includes improving the efficiency of the case review/e-Discovery process by locating areas of inefficiency that can be redesigned. Further embodiments include tracking case review/e-Discovery activity and progress towards various goals. In one exemplary implementation, for e-Discovery tasks assigned to multiple reviewers, a supervisor can browse or search the audit log to oversee individual reviewer activity, track progress, detect potential problems, and locate process inefficiencies that can be optimized.
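The supervisor queries described above, following one reviewer's actions and tracking progress towards a goal, might look roughly like the following sketch. The record fields, the `assigned_docs` mapping, and the choice of which actions count as "reviewed" are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical, minimal audit record; field names are illustrative only.
    seq: int          # position in the chronological audit log
    reviewer: str
    action: str
    document_id: str


def reviewer_activity(log, reviewer):
    """Return one reviewer's actions in chronological order (e.g. a new team member)."""
    return [r for r in sorted(log, key=lambda r: r.seq) if r.reviewer == reviewer]


def review_progress(log, assigned_docs, review_actions=("tag", "flag")):
    """Fraction of each reviewer's assigned documents carrying a review action.

    assigned_docs maps reviewer -> set of document ids. Which actions count as
    "reviewed" is an assumption a supervisor would configure.
    """
    reviewed = defaultdict(set)
    for r in log:
        if r.action in review_actions:
            reviewed[r.reviewer].add(r.document_id)
    return {
        who: (len(reviewed[who] & docs) / len(docs) if docs else 1.0)
        for who, docs in assigned_docs.items()
    }
```

Work on documents outside a reviewer's assignment (for example, duplicated effort across reviewers) is ignored by `review_progress`; the same log could be scanned separately to surface that kind of inefficiency.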
This method and system also provide for early detection of abnormal activity (whether innocent or malicious) in the e-Discovery process, thus avoiding potentially serious and expensive consequences. Activity that is innocent or unintentional but harmful includes, for example, premature exports of documents and flagging too many documents. Malicious activity includes, for example, abuse of access privileges that compromises the security of the documents. Early detection of such abnormal activities is important in preventing undesirable consequences.
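A rule-based scan of the audit log is one simple way to surface the abnormal activities listed above. The sketch below is an assumption-laden illustration: the action names, the notion of a "review_complete" milestone, the flag threshold, and the `allowed_docs` permission map are hypothetical and not part of the specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record layout for illustration.
    seq: int
    user: str
    action: str        # e.g. "flag", "export", "view", "review_complete"
    document_id: str   # empty for case-level actions


def detect_anomalies(log, allowed_docs, max_flags_per_user=100):
    """Scan the log in chronological order and return (seq, user, reason) alerts.

    Assumed rules, chosen for illustration:
      * an export logged before any "review_complete" action is premature;
      * a user flagging more than max_flags_per_user documents is suspicious;
      * touching a document outside the user's allowed set suggests privilege abuse.
    """
    alerts = []
    review_done = False
    flags = Counter()
    for r in sorted(log, key=lambda r: r.seq):
        if r.action == "review_complete":
            review_done = True
        elif r.action == "export" and not review_done:
            alerts.append((r.seq, r.user, "premature export"))
        elif r.action == "flag":
            flags[r.user] += 1
            if flags[r.user] == max_flags_per_user + 1:  # report once, on crossing
                alerts.append((r.seq, r.user, "excessive flagging"))
        if r.document_id and r.document_id not in allowed_docs.get(r.user, set()):
            alerts.append((r.seq, r.user, "access outside privileges"))
    return alerts
```

Because each record is examined as it would arrive, the same loop could run incrementally as new audit records are written, which is what makes early detection possible.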
According to a second aspect of the present invention, the computer implemented method of analyzing data recorded in an audit log generated as part of an e-Discovery process provides for the recovery of lost or corrupted cases. Since the process of gathering evidence can be long and tedious, any loss or corruption of data can set the effort back significantly. A case or document can be lost or corrupted, for example, if the case is deleted or a fatal software or hardware failure occurs.
Full backups of case or document data structures can be large and expensive. In traditional backup recovery mechanisms, full backups of data need to be performed frequently, which incurs a recurring cost in terms of both resources and performance (e.g. disk space and CPU cycles). Furthermore, recovering from backups to a globally consistent state with minimal data loss is often a tricky endeavor.
Embodiments of the present invention provide a simple and cost-effective method and system for recovering lost or corrupted documents. The actions by reviewers that are applied to a case manipulate one or more data structures. The current state of the system is the cumulative result of all the actions taken by users of the system up to that point. Furthermore, any action that materially changes the contents of a case is recorded as an entry in the audit log. If the system determines that the audit log is intact, a lost or corrupted case can be recovered, regenerated or rebuilt from any starting point in the case's history to any consistent state prior to failure by replaying the actions in the audit log in chronological order. Since the audit log is transactional (i.e. actions or sets of actions are audited only after they are completed, thus leaving the case in a consistent state), recovery to any point in the audit trail returns the case to a globally consistent state from which the e-Discovery process can resume. Different case recovery needs are met by various embodiments of the invention, which include, for example, reverting a case to its initial state (e.g. just after creation), reverting a case to its last consistent state (e.g. just before the system crashed or the case was deleted), and reverting a case to any desired state in between (e.g. just before a major action was taken accidentally). As an additional advantage, the original contents of the audit log are retained even after recovery.
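Replay-based recovery can be sketched as folding the audited actions, in order, over an empty case. This is a minimal sketch assuming a hypothetical set of four action types and a simple in-memory `CaseState`; a real system would replay against its own case data structures. The `upto_seq` parameter selects the recovery point, and the hypothetical `skip` parameter lets a user omit actions known to be obsolete.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record: each entry is one completed (transactional) action.
    seq: int
    action: str        # "add_doc", "remove_doc", "flag", "comment"
    document_id: str
    value: str = ""


@dataclass
class CaseState:
    documents: set = field(default_factory=set)
    flags: dict = field(default_factory=dict)      # document_id -> set of flags
    comments: dict = field(default_factory=dict)   # document_id -> list of comments


def apply(state, rec):
    """Apply one audited action to the case state (the action set is illustrative)."""
    if rec.action == "add_doc":
        state.documents.add(rec.document_id)
    elif rec.action == "remove_doc":
        state.documents.discard(rec.document_id)
        state.flags.pop(rec.document_id, None)
        state.comments.pop(rec.document_id, None)
    elif rec.action == "flag":
        state.flags.setdefault(rec.document_id, set()).add(rec.value)
    elif rec.action == "comment":
        state.comments.setdefault(rec.document_id, []).append(rec.value)
    else:
        raise ValueError(f"unknown action {rec.action!r}")


def rebuild_case(log, upto_seq=None, skip=frozenset()):
    """Replay audit records in chronological order to rebuild a case.

    upto_seq=None replays everything (last consistent state); a smaller value
    reverts to an earlier point. skip holds sequence numbers of actions the
    user marks as obsolete. The log is only read, never modified.
    """
    state = CaseState()
    for rec in sorted(log, key=lambda r: r.seq):
        if upto_seq is not None and rec.seq > upto_seq:
            break
        if rec.seq not in skip:
            apply(state, rec)
    return state
```

For example, `rebuild_case(log, upto_seq=4)` would revert a case to the state just before an accidental action recorded with sequence number 5. Because every audited action is complete, stopping between any two records yields a consistent state.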
Because the audit log is already generated as part of the e-Discovery process, the recovery mechanism provided herein adds little or no overhead. Additionally, the audit log contains a record of user actions, which a user can easily monitor or analyze, rather than low-level database operations that are hardly human-readable. This gives the user unhindered control over, and input into, audit log-based database recovery. Furthermore, with the recovery mechanism provided herein, the system can be rebuilt entirely by replaying the actions recorded in the audit log, rather than relying on the availability of backup data, which may itself be inconsistent, to roll back to the last backed-up consistent state prior to a crash. This is especially useful if earlier parts of the audit log contain actions that are known to be obsolete, because the user can decide to skip them during recovery. Moreover, only the audit log needs to be intact and uncorrupted for the recovery mechanism to bring the system back to a consistent state, which is more cost-effective than performing regular backups.
According to a third aspect of the present invention, the computer implemented method of analyzing data recorded in an audit log generated as part of an e-Discovery process provides for control over the expiration of case documents. Documents in a case are released when they are no longer relevant to the litigation being carried out. Each document is assigned an expiration date. Typically when a case is deleted, so are all the documents in it. However, in some situations, documents in a deleted case may still need to be retained, for example, due to further litigation that may require them or until a statute of limitations expires.
Embodiments of the invention provide a method and system for preserving and accessing such documents after the case has been deleted. Unlike other case artifacts, the audit log is retained even after a case is deleted, and it contains references to each document on which some audit-worthy action was taken. These references provide the location and a handle or pointer for each document, thus retaining them for later access. Additionally, once the documents are accessed via the audit log, their expiration dates may be updated, and the new expiration dates are propagated down to an underlying records management system (e.g. IBM® Content Manager, IBM® FileNet Records Manager), which is responsible for classifying, storing, and disposing of these cases and documents. This may be accomplished through the use of some simple extensions.
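The retention flow described above can be sketched as: collect the document references that the retained audit log holds for the deleted case, then push a new expiration date for each one to the records management system. The `RecordsManager` protocol below is a hypothetical stand-in with a single assumed method; it is not the API of IBM Content Manager or FileNet Records Manager.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record: the audit log keeps a reference (handle) to each document.
    case_id: str
    action: str
    document_ref: str   # location/handle understood by the records manager


class RecordsManager(Protocol):
    # Hypothetical interface for the underlying records management system;
    # NOT a real product API, just the one operation this sketch needs.
    def set_expiration(self, document_ref: str, expires: date) -> None: ...


def documents_of_deleted_case(log, case_id):
    """Collect distinct document references for a case, in first-seen order."""
    return list(dict.fromkeys(r.document_ref for r in log if r.case_id == case_id))


def extend_retention(log, case_id, new_expiration, manager: RecordsManager):
    """Propagate a new expiration date for every document the log references."""
    refs = documents_of_deleted_case(log, case_id)
    for ref in refs:
        manager.set_expiration(ref, new_expiration)
    return refs
```

Only the audit log is consulted, so the sketch works even after every other artifact of the case has been deleted, which is the point made above.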
According to a fourth aspect of the present invention, the computer implemented method of analyzing data recorded in an audit log generated as part of an e-Discovery process provides for the caching of audit logs to speed up audit analysis, as illustrated in
In at least one embodiment, the audit log is stored in a content repository 512 (i.e. repository storage) or other backup system to ensure high availability and permanence. The repository 512 may be a remotely-located server. In an exemplary implementation, a queue-like structure 514 allows batch writes to the repository. The queue 514 is flushed periodically, at a rate set by the flush interval, which reduces the load on the repository.
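A queue-like structure such as queue 514 can be sketched as an in-memory buffer that hands the repository one batch per flush interval instead of one write per record. In this sketch, `repository_write` is a hypothetical callable that stores a list of records, and the flush check runs on each append rather than on a timer thread; both are simplifying assumptions.

```python
import time
from collections import deque


class BatchedRepositoryWriter:
    """Buffer audit records and write them to the repository in batches.

    repository_write: hypothetical callable storing a list of records in one call.
    flush_interval:   seconds between flushes (the "flush interval").
    clock:            injectable time source, so behavior can be tested.
    """

    def __init__(self, repository_write, flush_interval=5.0, clock=time.monotonic):
        self._write = repository_write
        self._interval = flush_interval
        self._clock = clock
        self._queue = deque()
        self._last_flush = clock()

    def append(self, record):
        self._queue.append(record)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        if self._queue:
            batch = list(self._queue)
            self._write(batch)      # one repository call per batch, not per record
            self._queue.clear()     # cleared only after a successful write
        self._last_flush = self._clock()
```

A longer interval means fewer repository calls but more records at risk if the local process fails before a flush, which is the performance/recoverability trade-off the flush interval controls.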
In other embodiments, fast access is needed for deep real-time analysis and monitoring of case activity via audit logs. In at least one embodiment, the audit log is stored locally on a disk-based index 510 (i.e. disk storage) to allow fast searching and analysis in answering queries of interest. While such a data structure facilitates interactive querying, it does not provide the same availability and recoverability guarantees that a content repository does.
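To show why an index speeds up interactive queries, the sketch below keeps, for each indexed field, a map from value to the set of matching record sequence numbers, so a multi-field query becomes a set intersection rather than a full scan of the log. It is an in-memory stand-in for the disk-based index 510; the indexed fields and the `query` interface are assumptions, and persistence to disk is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record layout for illustration.
    seq: int
    reviewer: str
    action: str
    document_id: str


class AuditIndex:
    """In-memory stand-in for the local disk-based index 510 (an inverted index)."""

    FIELDS = ("reviewer", "action", "document_id")

    def __init__(self):
        self._records = {}
        self._postings = {f: defaultdict(set) for f in self.FIELDS}

    def add(self, rec):
        self._records[rec.seq] = rec
        for f in self.FIELDS:
            self._postings[f][getattr(rec, f)].add(rec.seq)

    def query(self, **criteria):
        """Return matching records in chronological order, e.g. query(reviewer="ann")."""
        unknown = set(criteria) - set(self.FIELDS)
        if unknown:
            raise ValueError(f"unindexed fields: {sorted(unknown)}")
        if not criteria:
            hits = set(self._records)
        else:
            sets = [self._postings[f].get(v, set()) for f, v in criteria.items()]
            hits = set.intersection(*sets)
        return [self._records[s] for s in sorted(hits)]
```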
In preferred embodiments, the audit log is stored in both a repository 512 and a local disk 510 (i.e. a dual storage model) to balance performance and recoverability according to the needs of the user. In the dual storage model, audit records are eventually persisted on the repository 512 but in the meantime are also cached in an index on the local disk 510. Storing an audit log in the repository 512 ensures high availability and permanence of audit logs and records, while storing it on the local disk 510 allows faster searching and analysis of audit logs and records for user queries.
In at least one embodiment, as shown in
The two versions of the audit log (stored separately on the local disk and on the repository) must be synchronized periodically. During synchronization, the repository is queried to obtain its last committed or synchronized state. All audit records recorded after that state, up to the latest consistent state on the local disk, are then written to the repository. The synchronization is incremental and non-blocking: actions continue to be audited in real time while synchronization is taking place.
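The incremental, non-blocking synchronization can be sketched as follows. The `Repository` class and the use of a monotonically increasing sequence number as the "last committed state" are assumptions made for illustration. The lock is held only while new records are appended locally or a snapshot is taken, so auditing continues while the repository write is in progress.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    seq: int          # monotonically increasing position in the audit log
    payload: str


class Repository:
    """Hypothetical remote content repository (cf. repository 512)."""

    def __init__(self):
        self.records = []

    def last_committed_seq(self):
        return self.records[-1].seq if self.records else 0

    def write(self, batch):
        self.records.extend(batch)


class DualStoreAuditLog:
    """Records are cached locally first and pushed to the repository incrementally."""

    def __init__(self, repository):
        self.repository = repository
        self.local = []                  # stand-in for the local disk index 510
        self._lock = threading.Lock()    # guards self.local and the sequence counter
        self._next_seq = 1

    def audit(self, payload):
        with self._lock:
            rec = AuditRecord(self._next_seq, payload)
            self._next_seq += 1
            self.local.append(rec)
        return rec

    def synchronize(self):
        """Write local records newer than the repository's last committed seq.

        Assumes a single synchronizer at a time. The lock covers only the
        snapshot, so audit() can keep appending during the repository write;
        records appended meanwhile are picked up by the next call.
        """
        watermark = self.repository.last_committed_seq()
        with self._lock:
            pending = [r for r in self.local if r.seq > watermark]
        if pending:
            self.repository.write(pending)
        return len(pending)
```

Calling `synchronize()` once per flush interval would reproduce the trade-off described next: frequent calls keep the repository closer to the local copy at the cost of more repository traffic.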
The frequency of synchronization is governed by a “flush interval” which determines the balance between performance and recoverability.
This concludes the description of the preferred embodiments of the present invention. The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.