This invention relates to obtaining data from messaging files and, more particularly, to methods and systems for extracting electronic data from such messaging files and storing the data in a different format.
Vast amounts of active and archived corporate electronic information exist on backup tape media. This information is increasingly becoming the target of opposing litigation attorneys or increasingly important as a source of information for knowledge management. Conventional methods of producing data from large quantities of backup tapes are difficult to implement, cost prohibitive, or both.
Managing data from backup media is particularly problematic for companies having many different tape backup systems using different backup environments. A previous attempt to solve the problem of retrieving information from backup tapes involves restoring the tapes using a “Native Environment” (NE) approach. The NE approach recreates the original backup environment from which the tape was generated so that data from the tapes can be restored, and then moves the restored data from the replicated environment to a target storage system for further analysis.
Replicating the NE in order to restore backup tapes requires that all server names, configurations, software versions, user names, and passwords are consistent with the environment as it stood at the time of the backup. Replicating all of this information becomes quite challenging as systems age, names of systems change, passwords change, software versions change, and administrators change. Furthermore, backup software is typically designed to restore data for the purposes of disaster recovery (an all or nothing proposition) and not to intelligently process large amounts of data from large numbers of media to obtain only relevant information.
Even if the backup environment can be recreated, all the records may need to be examined. Those records may be for over a thousand employees in a large company. Managing all this data is a nightmare even if the environment can be recreated. For many companies, the amount of information can exceed a terabyte. Storing over a terabyte of information takes a lot of memory space and consumes valuable computer resources during the storing operation.
Beyond trying to manage the sheer volume of data, other problems exist. Passwords of former employees may need to be replicated. Further, operating and backup applications become obsolete over time. In other instances, the information can only be backed up onto a specific machine that may no longer exist. Simply put, trying to extract any or all data from a large number of backup tapes generated from different backup environments is difficult.
Working with messaging systems adds further complexity. The database files for a single messaging system may have different formats and attributes depending on whether the database file is for a message, a calendar item, a contact (an address book entry), or the like. Currently, about the only way to extract the data is to replicate the messaging environment including individual email accounts. Replication of the messaging environment may be impossible, or if not impossible, replication is costly, time consuming, or both.
A method and system can be used to read and obtain data from messaging files regardless of the messaging environment used to generate the messaging files. The method and system can read part of a messaging file to identify the type of entry (e.g., email message, calendar item, address book entry, etc.) and access information on where information within the entry is located within the messaging file based on an identifying signature. The method and system can be used to obtain data from messaging files without having to recreate the messaging environment, including individual email accounts. The data can be stored in a target storage medium in a format that is more usable and more easily searched.
In one set of embodiments, a method of obtaining data from a messaging file can include identifying an entry within the messaging file. The method can also include extracting electronic data corresponding to the entry. The method can further include storing the electronic data in a different format.
In another set of embodiments, a data processing system readable medium can have code embodied therein. The code can comprise instructions for carrying out the methods described herein.
The foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as defined in the appended claims.
The present invention is illustrated by way of example and not limitation in the accompanying figures.
Skilled artisans appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
Reference is now made in detail to the exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts (elements).
The methods and systems described in more detail below can be used to extract data directly from messaging files and store the data in a more usable format within a target storage system. By circumventing recreation of the messaging environment, including individual email accounts, the methods and systems can eliminate the expense of hardware and time spent configuring software that is usually required in order to properly replicate a messaging environment.
Data extraction can now be performed in heterogeneous environments without having to recreate the messaging environment. A suite of software applications can be used to read tapes from essentially any environment, operating system, host platform, and backup system, process data from the messaging file, and store it in a target storage system without having to recreate the messaging environment, including individual email accounts. The method and system described herein can obtain the data from entries within a messaging file and interpret that data without reliance on the messaging environment used to create that file. After the data is extracted, the data can be stored in a more usable form, such as within a database that can be readily searched.
Optionally, filtering the electronic data based on parameter(s) of interest (dates, file type, keyword searching, any metadata, etc.) can be performed as the electronic data is read from the messaging file and before storing the electronic data on the virtual media or other media at the target sub-system (e.g., all data can be read from the messaging files and placed on the virtual or other media). The method and system can track the location of all original data as it is placed in a database at the target sub-system.
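As a non-limiting illustration, the optional filtering described above can be sketched as follows. The `Record` type, its field names, the `filter_records` function, and the sample values are hypothetical and are introduced only for this sketch; they are not part of the described system. The sketch keeps only entries that satisfy every supplied parameter and carries the original location of each entry along with it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional


@dataclass
class Record:
    """Hypothetical stand-in for one entry read from a messaging file."""
    source_offset: int  # location of the entry in the original file
    sent: date
    subject: str
    body: str


def filter_records(records: Iterable[Record],
                   start: Optional[date] = None,
                   end: Optional[date] = None,
                   keyword: Optional[str] = None) -> Iterator[Record]:
    """Yield only the records that match every supplied parameter."""
    for rec in records:
        if start is not None and rec.sent < start:
            continue
        if end is not None and rec.sent > end:
            continue
        if keyword is not None:
            text = (rec.subject + " " + rec.body).lower()
            if keyword.lower() not in text:
                continue
        yield rec


# Illustrative data only.
records = [
    Record(0x0438, date(2002, 3, 1), "Budget", "Q1 budget attached"),
    Record(0x04F0, date(2003, 6, 9), "Lunch", "Noon on Friday?"),
    Record(0x05A6, date(2003, 7, 2), "Budget review", "See revised numbers"),
]
kept = list(filter_records(records, start=date(2003, 1, 1), keyword="budget"))
```

Because the filter is a generator, entries can be discarded as they are read rather than after all of them have been stored.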
Before describing the system and method in more detail, a few terms are defined or clarified to aid in understanding of the invention. The term “messaging” is intended to mean related to communication or communications resources. For example, a messaging environment may include the hardware, software, database(s), files, or any combination thereof used to operate an email system. The messaging environment includes individual email accounts. A messaging file may include information including email messages, calendar items, address books, and the like.
The term “tape” is used generally to describe any form of electronic storage media from which electronic data can be retrieved. The term “backup tape” is used to refer to a tape used to store backup data or information. Backup tapes are not limited only to tape drives, and can include hard disk drives, CD-ROMs, flash cards, or other persistent data storage medium.
As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a method, process, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such method, process, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Also, “a” or “an” are employed to describe elements and components of the invention. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.
System 100 can further comprise file extraction engine 142, de-duplication engine 144, email extraction engine 146, collective database 162, and backup tape(s) 122, each of which is addressed below. File extraction engine 142 is coupled to backup tape(s) 122 and can separate email files from other files. Email extraction engine 146 is coupled to file extraction engine 142 and can extract data regarding the email without the need of recreating email accounts. De-duplication engine 144 is coupled electronically to network 124, file extraction engine 142, and email extraction engine 146 and can save one copy of content for a file or email entry and metadata for the file or email entry. Such metadata may include information regarding which computers (not shown in
The components shown in
Portions of the methods described herein may be implemented using a computer program that can operate on one or more computers. The computer program can include code that comprises instructions to carry out the method described herein. The computer program may be stored on a tangible medium, such as ROM, RAM, HD, a DASD array, magnetic tape, floppy diskette, optical storage device, or other appropriate computer readable medium.
In an illustrative embodiment of the invention, the computer-executable instructions may be lines of assembly code, compiled C, C++, Java, or other language code. Other architectures may be used. For example, the functions of any one of the computers may be performed by a different computer. Additionally, a computer program or its software components with such code may be embodied in more than one computer.
A principal focus of the system and method described herein is directed to email extraction engine 146. Still, other portions of system 100, and particularly file extraction engine 142, may be described to aid in understanding of the method and system described in more detail below. The method and system can involve taking electronic data from messaging files from backup tapes 122, network 124, or both without having to recreate the messaging environment or individual email accounts (e.g., *.pst files). The data from the messaging systems can be put into a useable, searchable form, so that later access to the data can be performed without needing to recreate the messaging environment.
A brief overview of one embodiment is described with respect to
A more detailed description of the method and system is given in accordance with non-limiting, exemplary embodiments of the present invention. At the beginning of the method described, in one embodiment, system 100 can include messaging files that reside within backup tape 122 or on network 124. The method includes identifying messaging files on backup tape 122 and network 124. The procedure for identifying the messaging files on backup tape 122 involves intervening activities. Therefore, the procedure for backup tape 122 is addressed before the procedure for network 124.
The backup tape 122 may have been generated during a routine backup of a computer (not shown) using a conventional backup system. Intervening activities are used to extract files from backup tape 122 using file extraction engine 142. Initially, data may have been backed up onto backup tape 122 from any number of systems. Those systems may have different backup environments. File extraction engine 142 can communicate with the hardware to understand the stored data formats/hardware protocols (e.g., SCSI) in order to read the raw data as shown in step 202. In this embodiment, file extraction engine 142 interprets/reverse engineers the data from backup tape 122 (e.g., extracts the data directly from backup tape 122 by understanding the system (e.g., UNIX TAR) and the protocols used in storing the data). A more detailed description of the communication and interpretation/reverse engineering is given in U.S. patent application Ser. Nos. 10/697,728 entitled “System and Method for Data Extraction in a Non-Native Environment” by Gardner et al. filed Oct. 30, 2003, and 60/440,855 entitled “System and Method for Data Extraction in a Non-Native Environment, Data De-Duplication, Database Creation and Manipulation, Image Back-up and PST File Monitoring” by Gardner et al. filed Jan. 17, 2003.
File extraction engine 142 identifies the backup system used to generate backup tape 122. Non-limiting examples of backup systems include Backup Exec™, ARCserve™ or UNIX TAR. After determining the backup system for generating backup tape 122, a lookup table can be used to determine where different portions of the data for a file are located.
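As a non-limiting illustration, a lookup table keyed by backup system can be sketched as follows. The `LAYOUTS` table and the `read_first_entry` function are hypothetical names introduced for this sketch. Only a UNIX TAR entry is filled in, using the name and size field positions of the standard tar header block; entries for other backup systems are not shown because their layouts are not given here. The in-memory archive merely stands in for raw data read from a tape.

```python
import io
import tarfile

# Hypothetical lookup table: backup system -> where portions of the data live.
# The UNIX TAR values are the field positions of the standard tar header.
LAYOUTS = {
    "UNIX TAR": {
        "block_size": 512,
        "name": (0, 100),   # (offset, length) of the file name field
        "size": (124, 12),  # (offset, length) of the octal size field
    },
}


def read_first_entry(raw: bytes, backup_system: str):
    """Return (file name, file content) for the first file in the raw data."""
    layout = LAYOUTS[backup_system]
    header = raw[:layout["block_size"]]
    off, length = layout["name"]
    name = header[off:off + length].split(b"\0", 1)[0].decode("ascii")
    off, length = layout["size"]
    size = int(header[off:off + length].strip(b"\0 ").decode("ascii"), 8)
    start = layout["block_size"]  # content follows the header block
    return name, raw[start:start + size]


# Build a small archive in memory to stand in for raw tape data.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    payload = b"example payload"
    info = tarfile.TarInfo(name="priv1.edb")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

name, content = read_first_entry(buf.getvalue(), "UNIX TAR")
```

Keeping the positions in a table rather than in the parsing code means that supporting another backup system is a matter of adding a table entry, provided its layout is known.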
In one embodiment, the output from reading part of backup tape 122 is illustrated in the hex editor display of
File extraction engine 142 can then use the file name extension to identify messaging files on backup tape 122 (step 204). The messaging files may be used with messaging systems, such as Microsoft Exchange™, Lotus Notes™, Eudora™, or the like. In one specific embodiment, Microsoft Exchange™ systems may be used, and Exchange Database (“EDB”) files are the messaging files being identified. The EDB files have a file name extension of “edb”, and therefore, file extraction engine 142 searches for EDB files, which have file names of “*.edb”, wherein “*” represents the base portion of the file name and may be one or more characters long. Other messaging systems, including Lotus Notes™, Eudora™, or the like, may have similar file name extensions to identify messaging files. The file illustrated in
Information on network 124 may include messaging files (part of EUI in network 124) that came from computers (not shown) backed up onto network 124. A server or other computer (not shown) used for operating network 124 may identify messaging files on network 124 in a manner similar to file extraction engine 142. The server or other computer may search for files having file name extensions of “edb”. After identifying the files, whether on backup tape 122, network 124, or both, the messaging files are sent from file extraction engine 142, network 124, or both to email extraction engine 146 at steps 206 and 222 in
The data extraction is performed by email extraction engine 146 in a manner similar to the file extraction performed by file extraction engine 142 discussed previously. Similar to file extraction, extraction of electronic data from a messaging file can be performed without having to recreate the messaging environment, such as recreating hardware, software, and individual email accounts for each of the users.
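As a non-limiting illustration, the identification of messaging files by file name extension described above can be sketched as follows. The `MESSAGING_EXTENSIONS` set and the `is_messaging_file` function are hypothetical names introduced for this sketch; only the “edb” extension is taken from the description, and the case-insensitive comparison is an assumption.

```python
from pathlib import PurePath

# Extensions treated as messaging files. Only "edb" comes from the text;
# extensions for other messaging systems would be added here.
MESSAGING_EXTENSIONS = {"edb"}


def is_messaging_file(file_name: str) -> bool:
    """True when the name matches "*.edb" with a non-empty base name."""
    suffix = PurePath(file_name).suffix  # e.g. ".edb", or "" if none
    return suffix[1:].lower() in MESSAGING_EXTENSIONS


# Illustrative file names only.
names = ["priv1.edb", "PUB1.EDB", "report.doc", "edb"]
found = [n for n in names if is_messaging_file(n)]
```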
As shown in one embodiment of
In one embodiment, the method and system can interpret/reverse engineer the raw data from the messaging files. The raw data and patterns of the raw data within the messaging files can be examined to determine the type of each entry (e.g., email message, calendar item, address book entry, etc.) in the messaging file based on the logical format of the data. Different types of entries within the messaging files will have different attributes. For example, an email message may have attributes including addressee, addressor, copies (whether actual or blind copies (“bcc”)), title, content, time sent, and potentially other attributes. A calendar item may have attributes including start time, end time, title, location, reminder, and potentially other attributes. An address book entry may have attributes including name, address, telephone number, fax number, mobile (cellular) number, email address, and potentially other attributes. Note that the email message may have only one time attribute, whereas a calendar item has two time attributes. Also, an address book entry has a telephone number attribute, which may not occur with an email message or calendar item. By examining the raw data and patterns within the messaging files, the type of entry can be determined.
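As a non-limiting illustration, the distinctions noted above (one time attribute versus two, and the presence of a telephone number) can be sketched as a simple classification. The `classify_entry` function is a hypothetical name, and the sketch assumes the attribute names present in an entry have already been recovered from the raw data; how that recovery is done is not shown.

```python
def classify_entry(attributes: set) -> str:
    """Guess the entry type from the names of the attributes it carries."""
    # A telephone number may not occur in an email message or calendar item.
    if "telephone number" in attributes:
        return "address book entry"
    # A calendar item has two time attributes.
    if {"start time", "end time"} <= attributes:
        return "calendar item"
    # An email message may have only one time attribute.
    if "time sent" in attributes:
        return "email message"
    return "unknown"


# Illustrative attribute sets only.
kinds = [
    classify_entry({"addressee", "addressor", "title", "content", "time sent"}),
    classify_entry({"start time", "end time", "title", "location"}),
    classify_entry({"name", "address", "telephone number"}),
]
```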
After the type of entry is determined, a lookup table can be accessed to determine the logical format for the electronic data corresponding to the particular entry. The lookup table may include information related to the type of entry and where the attributes for that entry are located within the messaging file. In one embodiment, the method can also include extracting data from the messaging files using email extraction engine 146 at step 244. The data can be extracted using the logical format to determine the attributes of the entry.
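As a non-limiting illustration, a lookup table of logical formats and its use during extraction can be sketched as follows. The `LOGICAL_FORMATS` table, the `extract_attributes` function, and every offset and length in the table are hypothetical values chosen for this sketch; they do not describe the layout of any actual messaging file.

```python
# Hypothetical lookup table: entry type -> (attribute, offset, length) triples.
LOGICAL_FORMATS = {
    "email message": [
        ("addressor", 0, 16),
        ("addressee", 16, 16),
        ("title", 32, 24),
    ],
    "calendar item": [
        ("title", 0, 24),
        ("location", 24, 16),
    ],
}


def extract_attributes(entry_type: str, raw: bytes) -> dict:
    """Slice each attribute out of the raw entry using the lookup table."""
    result = {}
    for name, offset, length in LOGICAL_FORMATS[entry_type]:
        field = raw[offset:offset + length]
        result[name] = field.rstrip(b"\0").decode("ascii")
    return result


# Synthetic raw entry built to match the hypothetical email layout.
raw = (b"alice".ljust(16, b"\0")
       + b"bob".ljust(16, b"\0")
       + b"Status report".ljust(24, b"\0"))
attrs = extract_attributes("email message", raw)
```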
As shown in the embodiment of
Note that not all of the activities described above are required, that an element within a specific activity may not be required, and that further activities may be performed in addition to those illustrated. Still further, the order in which the activities are listed is not necessarily the order in which they are performed. After reading this specification, skilled artisans will be capable of determining what activities can be used for their specific needs.
Similar to file extraction, extraction of electronic data from a messaging file can be performed without having to recreate the messaging environment, such as recreating hardware, software, and personal email accounts for each of the users. Recreation of the messaging environment to restore backup data involves the procurement, setup and operation of messaging software, servers upon which to load the software, and databases upon which to store the messaging files, all of which are expensive and time consuming. The method obviates the need for those operations and costs, saving considerable time and money.
The system and method can be designed to allow for output into nearly any electronic format desired, which adds considerably more flexibility in use for the end user, as opposed to simply having a *.pst file full of email. Thus, the data from the messaging files can be extracted and put directly into the database without having to create or recreate *.pst files.
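As a non-limiting illustration, placing extracted data directly into a searchable database can be sketched as follows. SQLite is used here only because it is readily available; the table name, column names, and the `store_entry` and `search` functions are hypothetical and are not the schema of collective database 162. The source file and offset columns reflect the tracking of the location of the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE entries (
           id INTEGER PRIMARY KEY,
           entry_type TEXT,
           source_file TEXT,       -- messaging file the entry came from
           source_offset INTEGER,  -- location of the entry in that file
           title TEXT,
           content TEXT)"""
)


def store_entry(conn, entry_type, source_file, source_offset, title, content):
    """Insert one extracted entry together with its original location."""
    conn.execute(
        "INSERT INTO entries (entry_type, source_file, source_offset, title, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (entry_type, source_file, source_offset, title, content),
    )


def search(conn, keyword):
    """Return the titles of entries whose content contains the keyword."""
    rows = conn.execute(
        "SELECT title FROM entries WHERE content LIKE ? ORDER BY id",
        ("%" + keyword + "%",),
    )
    return [row[0] for row in rows]


# Illustrative entries only.
store_entry(conn, "email message", "priv1.edb", 0x04F0, "Lunch", "Noon on Friday?")
store_entry(conn, "email message", "priv1.edb", 0x05A6, "Budget review",
            "See revised budget numbers")
hits = search(conn, "budget")
```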
The example illustrates how information can be extracted from an EDB file.
Referring to the embodiment of
Referring to
The first message corresponds to box 440 and starts with the “0700” characters at the end of the corresponding arrow in
Referring to the next row pointer (“B600F084”), the next message starts at 4F0 bytes from the beginning and has a length of B6 (hexadecimal) bytes. The file includes two other messages that correspond to the other two row pointers within box 460.
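As a non-limiting illustration, the row pointer “B600F084” can be decoded into the offset and length stated above as follows. The sketch assumes that the pointer holds two little-endian 16-bit words, length first and offset second, and that the upper three bits of the offset word are flag bits to be masked off. That masking is an inference from this single example rather than a stated convention, and `OFFSET_MASK`, `decode_row_pointer`, and `slice_message` are hypothetical names.

```python
import struct

# Assumption: the top three bits of the offset word are flags, not offset.
OFFSET_MASK = 0x1FFF


def decode_row_pointer(hex_text: str):
    """Return (offset, length) for a 4-byte row pointer given as hex text."""
    length, offset_word = struct.unpack("<HH", bytes.fromhex(hex_text))
    return offset_word & OFFSET_MASK, length


def slice_message(data: bytes, hex_text: str) -> bytes:
    """Return the bytes of the message that the row pointer refers to."""
    offset, length = decode_row_pointer(hex_text)
    return data[offset:offset + length]


offset, length = decode_row_pointer("B600F084")
# offset is 0x4F0 and length is 0xB6, matching the values in the text.
```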
The owning object ID to table identifier mapping and the column identifier to MAPI identifier mappings are contained in a Msysobj table.
By knowing the conventions used for email files, such as EDB files, the techniques described herein can be used to obtain information from such email files without having to recreate individual email accounts.
In the foregoing specification, the invention has been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims.
This application claims priority under 35 U.S.C. §119(e) to U.S. Patent Application Nos. 60/440,855 entitled “System and Method for Data Extraction in a Non-Native Environment, Data De-Duplication, Database Creation and Manipulation, Image Back-up and PST File Monitoring” by Gardner et al. filed Jan. 17, 2003, and 60/440,728 entitled “Method and System for Enterprise-Wide Retention of Digital or Electronic Data” by Robert Gomes filed Jan. 17, 2003. This application claims priority under 35 U.S.C. §120 to and is a continuation-in-part of U.S. patent application Ser. No. 10/697,728, entitled “System and Method for Data Extraction in a Non-Native Environment” by Gardner et al. filed Oct. 30, 2003 now abandoned. This application is related to U.S. patent application Ser. No. 10/759,622 entitled “Method and System for Enterprise-Wide Retention of Digital or Electronic Data” by Gomes filed on Jan. 16, 2004, Ser. No. 10/759,599, entitled “System and Method for Data De-Duplication” by Gardner et al. filed on Jan. 16, 2004, Ser. No. 10/759,623, entitled “System and Method for a Data Extraction and Backup Database” by Gardner et al. filed on Jan. 16, 2004, Ser. No. 10/759,643 entitled “Method and System for Forensic Imaging to Virtual Media” by Gardner et al. filed on Jan. 16, 2004, and Ser. No. 10/760,010 entitled “System and Method of Monitoring a Personal Folder File” by Gardner et al. filed on Jan. 16, 2004.
Number | Date | Country | |
---|---|---|---|
60440855 | Jan 2003 | US | |
60440728 | Jan 2003 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10697728 | Oct 2003 | US |
Child | 10759663 | US |