The present disclosure relates in general to databases, and, in particular, to systems and methods for mapping fields.
Electronic Discovery involves the exchange of electronic documents and emails between parties pursuant to litigation. The documents and emails are stored in databases, often referred to as document review platforms. Electronic Discovery requires importing and exporting documents between various document review platforms and Electronic Discovery tools. There is no concrete specification for data formats used in the exchange of data between the various document review platforms and Electronic Discovery tools. Therefore, an operator must undertake a time-intensive process of inspecting and manipulating the data for data imports and exports to conform to specifications.
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
The following description sets forth numerous specific details such as examples of specific systems, apparatuses, components, methods, and so forth, in order to provide a good understanding of several embodiments of the present invention. It will be apparent to one skilled in the art, however, that at least some embodiments of the present invention may be practiced without these specific details. In other instances, well-known components or methods are not described in detail or are presented in simple block diagram formats in order to avoid unnecessarily obscuring the present invention. Thus, the specific details set forth are merely exemplary. Particular implementations may vary from these exemplary details and still be contemplated to be within the spirit and scope of the present invention.
The server 104 consists of a data repository that stores data field names and data field categories in one or more databases 108 for use by the client devices 102 as described in detail below. The database 108 may be connected to the server 104 or directly to the network.
One server 104 may interact with a large number of client devices 102. Therefore, each server 104 is typically a high end computer with a large storage capacity, one or more fast microprocessors, and one or more high speed network connections. In contrast, each client device 102 typically includes less storage capacity, a single microprocessor, and a single network interface.
Each of the devices illustrated in
The example computing device 200 includes a main unit 202 which may include, if desired, one or more processing units 204 electrically coupled by an address/data bus 206 to one or more memories 208, other computer circuitry 210, and one or more interface circuits 212.
The processing unit 204 may include any suitable processor or plurality of processors. In addition, the processing unit 204 may include other components that support the one or more processors. For example, the processing unit 204 may include a central processing unit (CPU), a graphics processing unit (GPU), and/or a direct memory access (DMA) unit.
The memory 208 may include various types of non-transitory memory including volatile memory and/or non-volatile memory such as, but not limited to, distributed memory, read-only memory (ROM), random access memory (RAM), etc. The memory 208 typically stores a software program that interacts with the other devices in the system as described herein. This program may be executed by the processing unit 204 in any suitable manner.
The interface circuit 212 may be implemented using any suitable interface standard, such as an Ethernet interface and/or a Universal Serial Bus (USB) interface. One or more input devices 214 may be connected to the interface circuit 212 for entering data and commands into the main unit 202. For example, the input device 214 may be a keyboard, mouse, touch screen, track pad, voice recognition system, and/or any other suitable input device.
One or more displays, printers, speakers, monitors, televisions, high definition televisions, and/or other suitable high bandwidth output devices 216 may also be connected to the main unit 202 via the interface circuit 212. High bandwidth output devices 216 typically consume uncompressed data, such as uncompressed audio and/or video data. For example, a display for displaying decompressed video data may be a cathode ray tube (CRT), a liquid crystal display (LCD), an electronic ink (e-ink) display, and/or any other suitable type of display.
One or more storage devices 218 may also be connected to the main unit 202 via the interface circuit 212. For example, a hard drive, CD drive, DVD drive, and/or other storage device may be connected to the main unit 202. The storage device 218 may store any type of data used by the device 200.
The computing device 200 may also exchange data with one or more low bandwidth input/output (I/O) devices 220. Low bandwidth I/O devices 220 typically produce and/or consume compressed data, such as compressed audio and/or video data.
For example, low bandwidth I/O devices 220 may include network routers, thumb drives, and so on. The computing device 200 may also exchange data with other network devices 222 via a connection to a network 108 of
The user interface 302 receives input from the user and displays the current field map, field mapping status, a list of unmapped fields, user controls (e.g., buttons) for generating and updating the field map, and options for loading, saving, accepting, rejecting, or altering field maps. Via the user interface 302, the user may choose, from storage (e.g., memory or hard drive), a list of desired field names 310 for the final mapping result and one or more flat files 312 as inputs for the field mapping module 304. The list of desired field names 310 may be a file comprising the desired field names. The list of desired field names 310 may be located on the hard drive or located in memory. The flat files 312, defined by a data format (e.g., Concordance-compatible flat file (dat), comma separated values (csv), or binary format), may be a file on the hard drive. The client 102 may also receive the flat files 312 and list of desired field names 310 via an Application Programming Interface (API).
The field name repository 306 stores dictionaries of field names and their categories. For example, in the industry of electronic discovery, such field names may include the metadata captured by electronic discovery tools that represents the date when a particular email was sent. The field name repository 306 may also store the categorization of each of the field names it stores. For example, the field name repository 306 may store field names for the email metadata that represents the date an email was sent. In a flat file 312, that metadata field may be named any of the following: “Date Sent”, “Sent Date”, “EmailSentDate”, or “DateEmailSent”. Because those fields are named differently but have essentially the same meaning, the field name repository 306 would have each of those field names categorized under the same category. In this circumstance, the category may be named “Email Sent Date.”
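By way of illustration only, the categorization described above may be sketched as a lookup from field name to category. The sketch below is a minimal example and not a definitive implementation: the in-memory dictionary, the function names, and the normalization step (lowercasing and dropping non-alphanumeric characters) are assumptions introduced here and are not part of the disclosure.

```python
# Hypothetical in-memory stand-in for the field name repository:
# each category maps to the field names known to belong to it.
REPOSITORY = {
    "Email Sent Date": ["Date Sent", "Sent Date", "EmailSentDate", "DateEmailSent"],
}


def normalize(name):
    """Lowercase and drop non-alphanumerics (an assumed matching rule)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_index(repository):
    """Invert the repository into normalized field name -> category."""
    return {
        normalize(name): category
        for category, names in repository.items()
        for name in names
    }


def categorize(name, index):
    """Return the category for a field name, or None if it is unknown."""
    return index.get(normalize(name))


INDEX = build_index(REPOSITORY)
print(categorize("EmailSentDate", INDEX))  # Email Sent Date
print(categorize("Custodian", INDEX))      # None
```

Under the assumed normalization, spelling variants such as "sent_date" resolve to the same category as "Sent Date"; an embodiment may equally use exact matching.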
In one embodiment, the field name repository 306 resides only on the client 102. In this instance, the client 102 can generate a field map without the need for a network connection to a server 104. In another embodiment, the field name repository 306 resides only on the server 104. In this instance, a client 102 must have a network connection to the server 104 to generate a field map. In yet another embodiment, the field name repository 306 resides on both the client 102 and the server 104. In this instance, the client 102 may utilize the field name repository 306 residing on either the client 102 or the server 104 or both.
The field mapping module 304 performs the bulk of the field mapping process and receives input from several sources. These input sources include user decisions via user interface 302, field name repositories 306, the fields comprising the list of desired field names 310, flat files 312, and database servers 314.
The purpose of the field mapping module 304 is to map fields between one or more flat files 312 and one or more relational databases located on database servers 314, or between one or more flat files 312 and a list of desired field names 310, or between a list of desired field names 310 and one or more relational databases located on database servers 314. When mapping fields between the aforementioned sources, the field mapping module 304 adheres to rules defined by the user interface 302 and categories defined in the field name repository 306. The process performed by the field mapping module 304 involves categorizing each field from two or more of the aforementioned sources, comparing the resulting categorizations, and generating a field map. The field mapping module 304 also creates or updates the flat files 312 utilizing the desired field names in accordance with the mapping results determined by the field mapping module 304 and/or input from the user via the user interface 302. The field mapping module 304 also updates the field name repository 306 with new fields and categories. The field mapping module 304 may also assist, utilizing the resulting field map, with exporting or importing fields and accompanying data from relational databases to or from, respectively, flat files.
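The categorize, compare, and generate steps described above may be sketched as follows for the case of a flat file mapped against a list of desired field names. This is an illustrative sketch under stated assumptions: the function names, the name normalization, the choice of the first desired field per category, and the "Email Author" category with its field names are all hypothetical and do not come from the disclosure.

```python
def normalize(name):
    """Lowercase and drop non-alphanumerics (an assumed matching rule)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def categorize(name, repository):
    """Return the category whose known names include this field, else None."""
    key = normalize(name)
    for category, names in repository.items():
        if any(normalize(known) == key for known in names):
            return category
    return None


def generate_field_map(source_fields, desired_fields, repository):
    """Pair each source field with the desired field sharing its category.

    Returns (field_map, unmapped), where unmapped lists the source fields
    that could not be paired and would be left for the user to resolve.
    """
    # Categorize the desired field names; keep the first one per category.
    desired_by_category = {}
    for field in desired_fields:
        category = categorize(field, repository)
        if category is not None:
            desired_by_category.setdefault(category, field)

    # Categorize each source field and compare against the desired side.
    field_map, unmapped = {}, []
    for field in source_fields:
        category = categorize(field, repository)
        target = desired_by_category.get(category) if category is not None else None
        if target is None:
            unmapped.append(field)
        else:
            field_map[field] = target
    return field_map, unmapped


# Hypothetical repository contents for the example.
repository = {
    "Email Sent Date": ["Date Sent", "Sent Date", "EmailSentDate", "DateEmailSent"],
    "Email Author": ["From", "Author"],
}
field_map, unmapped = generate_field_map(
    ["DateEmailSent", "From", "Custodian"], ["Sent Date", "Author"], repository
)
print(field_map)  # {'DateEmailSent': 'Sent Date', 'From': 'Author'}
print(unmapped)   # ['Custodian']
```

In this sketch, fields that cannot be categorized are returned separately rather than guessed at, which corresponds to the list of unmapped fields presented to the user.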
If the field mapping module 304 can categorize any fields that were mapped by the user, then the field mapping module 304 may add those fields to their associated categories in the field name repository 306.
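One way this update may work is sketched below: when the user pairs two fields and exactly one of them already has a known category, the other field is filed under that category. The sketch is illustrative only. The function names, the normalization rule, the handling of conflicting categories, and the field name "Dt Sent" are assumptions made for the example.

```python
def normalize(name):
    """Lowercase and drop non-alphanumerics (an assumed matching rule)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def categorize(name, repository):
    """Return the category whose known names include this field, else None."""
    key = normalize(name)
    for category, names in repository.items():
        if any(normalize(known) == key for known in names):
            return category
    return None


def learn_from_user_pairing(repository, field_a, field_b):
    """Record a user-made pairing in the repository when a category is known.

    Returns the category used, or None when no single category can be
    determined (neither field is known, or the two categories conflict).
    """
    cat_a = categorize(field_a, repository)
    cat_b = categorize(field_b, repository)
    if cat_a is None and cat_b is None:
        return None  # neither field is known; the user would be prompted
    if cat_a is not None and cat_b is not None:
        # Both already known: nothing to add; report only if they agree.
        return cat_a if cat_a == cat_b else None
    category = cat_a if cat_a is not None else cat_b
    unknown = field_b if cat_a is not None else field_a
    repository[category].append(unknown)
    return category


repository = {"Email Sent Date": ["Date Sent", "Sent Date"]}
print(learn_from_user_pairing(repository, "Dt Sent", "Sent Date"))  # Email Sent Date
print(repository["Email Sent Date"])  # ['Date Sent', 'Sent Date', 'Dt Sent']
```

A return value of None corresponds to the case in which the categories for the paired fields cannot be determined.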
If the categories for the fields paired by the user cannot be determined, the field mapping module 304 prompts the user to categorize fields having undefined categories via the user interface 302 (block 455). Previously unknown field names and categories that have now been defined by the user may be added to the field name repository 306 (block 460). Because the field name repository 306 may be located on a server on a local network or the internet, many clients 102 may benefit from access to a field mapping engine that may be updated dynamically as new fields and categories are discovered. To avoid a situation where a field name repository 306 is populated with erroneous field categorization, an administrator may perform a verification or an algorithm may allow an update to the field name repository 306 only when certain conditions are met.
After field mapping is complete, the field management system saves the field map (block 465). The field management system updates the flat file 312 by replacing its original field names with the corresponding field names from the list of desired field names 310 according to the field map (block 470). The update may be performed in several ways including, but not limited to, direct modification of the original flat file or making a copy of the flat file and modifying the copy.
Once that is complete, the field mapping module 304 checks whether another flat file is queued to be mapped by the field mapping module 304 (block 475). If yes, the field mapping module 304 starts a new field mapping procedure starting at block 405. This allows for multiple flat files and relational databases, involving substantially the same subject matter, to be mapped in the same mapping session for speed and efficiency.
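The flat file update of block 470 may be sketched, for the copy-and-modify variant, as rewriting only the header row while passing the data rows through unchanged. The sketch assumes a comma separated values (csv) flat file whose first row holds the field names and uses Python's standard csv module; the function name and file paths are hypothetical, and other flat file formats would need their own delimiters.

```python
import csv


def write_renamed_copy(src_path, dst_path, field_map):
    """Copy a csv flat file, replacing header names according to field_map.

    Field names missing from field_map are kept unchanged. Assumes the
    first row of the file is the header row.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return  # empty input file: nothing to rename
        writer.writerow([field_map.get(name, name) for name in header])
        writer.writerows(reader)  # data rows pass through untouched
```

For example, calling `write_renamed_copy("export.csv", "export_mapped.csv", {"DateEmailSent": "Sent Date"})` would leave the original file intact and produce a copy whose "DateEmailSent" column is headed "Sent Date". Writing to a copy keeps the original available if the user later rejects the field map.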
Number | Date | Country
---|---|---
62360443 | Jul 2016 | US